Downloading blob content

I missed this in the previous blog entry. The SDK makes it really trivial to extract blob data. For text content you can directly get the text, and for non-text content you can either download to a file or to a stream. The code snippet below shows how it’s done.

void DownloadBlobData()
{
  auto storage_account = cloud_storage_account::parse(
      U("UseDevelopmentStorage=true"));
  auto blob_client = storage_account.create_cloud_blob_client();
  auto container = blob_client.get_container_reference(
      U("textdata"));
  bool created = container.create_if_not_exists();

  // Read the text content directly
  auto text_blob1 = container.get_block_blob_reference(
      U("texts/text1"));
  auto text = text_blob1.download_text();
  ucout << text << endl;

  // Download the blob data to a file
  auto text_blob2 = container.get_block_blob_reference(
      U("texts/text2"));
  text_blob2.download_to_file(
      U("d:\\tmp\\blobdata.txt"));

  // Download the blob data to an ostream
  stringstreambuf buffer;
  concurrency::streams::ostream output(buffer);
  text_blob2.download_to_stream(output);
  cout << buffer.collection() << endl;
}

Using Azure blob storage from C++

Blob storage is for storing large amounts of semi-structured or unstructured data such as images, videos, and documents. The blob service lets you create named containers, each of which can hold one or more named blobs. Blobs can optionally be made publicly accessible via a URI.

Creating blobs

void CreateTextBlobs()
{
  auto storage_account = cloud_storage_account::parse(
      U("UseDevelopmentStorage=true"));
  auto blob_client = storage_account.create_cloud_blob_client();
  auto container = blob_client.get_container_reference(
      U("textdata"));
  bool created = container.create_if_not_exists();

  blob_container_permissions permissions;
  permissions.set_public_access(
      blob_container_public_access_type::container);
  container.upload_permissions(permissions);

  auto text_blob1 = container.get_block_blob_reference(
      U("texts/text1"));
  text_blob1.upload_text(U("This is some text - modified"));

  auto text_blob2 = container.get_block_blob_reference(
      U("texts/text2"));
  text_blob2.upload_from_file(U("./stdafx.h"));
}

The classes and methods used are very similar to those used for table storage. Notice the use of blob_container_permissions to set the public access level for the container. It’s off by default, and you can optionally set it to blob (clients can read blob data) or container (clients can list blobs in the container and read blob data). You can simulate directories in the blob names. In the above example, both text1 and text2 are under the texts directory. These are not physical directories; the prefix just affects the blob’s name and URI.
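Since a virtual directory is just the part of the blob name up to the last slash, the grouping can be reproduced with plain string handling. The snippet below is a minimal sketch of that idea; virtual_directory_of is a hypothetical helper written for illustration and is not part of the storage SDK.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an SDK function): returns the virtual directory
// prefix of a blob name, including the trailing '/'.
// A name without any '/' has no virtual directory, so the result is empty.
std::string virtual_directory_of(const std::string& blob_name)
{
  const auto pos = blob_name.rfind('/');
  return pos == std::string::npos ? std::string()
                                  : blob_name.substr(0, pos + 1);
}

void virtual_directory_demo()
{
  assert(virtual_directory_of("texts/text1") == "texts/");
  assert(virtual_directory_of("texts/archive/old") == "texts/archive/");
  assert(virtual_directory_of("readme").empty());
}
```

Both texts/text1 and texts/text2 map to the same texts/ prefix, which is why they appear under a single directory when the container is listed.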

Listing blobs

void ListTextBlobs()
{
  auto storage_account = cloud_storage_account::parse(
        U("UseDevelopmentStorage=true"));
  auto blob_client = storage_account.create_cloud_blob_client();
  auto container = blob_client.get_container_reference(
        U("textdata"));
  bool created = container.create_if_not_exists();

  continuation_token token;

  auto result = container.list_blobs_segmented(token);

  for (auto dir : result.directories())
  {
    ucout << U("Directory: ") << dir.uri().path() << endl;
    ucout << endl;

    continuation_token dir_token;
    auto resultInner = dir.list_blobs_segmented(dir_token);

    for (auto item : resultInner.blobs())
    {
      ucout << item.name() << endl;
      ucout << item.uri().path() << endl;
      ucout << item.properties().content_type() << endl;
      ucout << endl;
    }
  }
}

To list the blobs in a container, we need to use a continuation_token object. The container object supports a list_blobs_segmented method which takes this token. For each directory returned, we can then call list_blobs_segmented using a separate continuation_token object. As we iterate through the blobs in the directory, we can access properties like the name, URI, and content type. Here’s a sample output from calling the above function.

Directory: /devstoreaccount1/textdata/texts/

texts/text1
/devstoreaccount1/textdata/texts/text1
text/plain; charset=utf-8

texts/text2
/devstoreaccount1/textdata/texts/text2
application/octet-stream

Notice how the text data we uploaded is of type text/plain whereas the file we uploaded is of type application/octet-stream. If you paste the first blob’s URI in a browser, the text is directly returned, whereas in the second case the file is offered for download. This is browser-specific behavior based on the content type.
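One thing worth noting about the listing code is that it calls list_blobs_segmented once per level, so it only reads the first segment. With larger containers, the usual pattern is to keep requesting segments until the token reports that nothing is left. The sketch below shows that loop using stand-in types (fake_token, fake_segment, list_segmented) that I made up so the example can run without the SDK; how the real result object exposes its next token is something to confirm against the SDK headers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in types for illustration only; these are NOT the SDK's classes.
struct fake_token
{
  std::size_t next = 0;   // index of the first item not yet returned
  bool more = false;      // true when another segment is available
  bool empty() const { return !more; }
};

struct fake_segment
{
  std::vector<std::string> blobs;
  fake_token token;       // token to pass to the next call
};

// Returns at most page_size names, starting at the position held in token.
fake_segment list_segmented(const std::vector<std::string>& all,
                            const fake_token& token, std::size_t page_size)
{
  fake_segment segment;
  std::size_t i = token.next;
  for (; i < all.size() && segment.blobs.size() < page_size; ++i)
  {
    segment.blobs.push_back(all[i]);
  }
  segment.token.next = i;
  segment.token.more = i < all.size();
  return segment;
}

// Keeps requesting segments until the returned token reports no more data.
std::vector<std::string> list_all(const std::vector<std::string>& all,
                                  std::size_t page_size)
{
  assert(page_size > 0);  // a zero page size would never make progress
  std::vector<std::string> names;
  fake_token token;
  do
  {
    const fake_segment segment = list_segmented(all, token, page_size);
    names.insert(names.end(), segment.blobs.begin(), segment.blobs.end());
    token = segment.token;
  } while (!token.empty());
  return names;
}
```

The do/while shape matters here: the first request is always made with a default-constructed token, and the loop only continues while the returned token says there is more to fetch.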

Writing custom queries against an Azure table

While not as powerful as SQL, Azure tables do allow you to do minimal querying. With the native SDK, you’d do this using the table_query object’s set_filter_string function. Here’s a modified ReadTableData method from the previous blog entries.

void ReadTableData(string_t filter)
{
  auto storage_account = cloud_storage_account::parse(
      U("UseDevelopmentStorage=true"));
  auto table_client = storage_account.create_cloud_table_client();
  auto table = table_client.get_table_reference(U("Clients"));
  bool created = table.create_if_not_exists();

  table_query query;
  query.set_filter_string(filter);

  auto results = table.execute_query(query);

  for (auto item : results)
  {
    auto properties = item.properties();

    for (auto property : properties)
    {
      ucout << property.first << U(" = ") 
            << property.second.str() << U("\t");
    }

    ucout << endl;
  }
}

Here’s an example query that gets all rows within a range of RowKey values.

void ReadTableData(string_t rowKeyStart, string_t rowKeyEnd)
{
  auto filter = table_query::combine_filter_conditions(
    table_query::generate_filter_condition(
        U("RowKey"), 
        query_comparison_operator::greater_than_or_equal, 
        rowKeyStart),
    query_logical_operator::and,
    table_query::generate_filter_condition(
        U("RowKey"), 
        query_comparison_operator::less_than_or_equal, 
        rowKeyEnd));

  ReadTableData(filter);
}

The combine_filter_conditions function is used to create a query string. The query_comparison_operator class allows you to set comparison operators and the query_logical_operator class lets you set logical operators. In case you are wondering, that gets converted to the following string.

(RowKey ge '100') and (RowKey le '104')

Here’s a similar method, that queries against the Name column.

void ReadTableDataStartingWith(string_t letter1, string_t letter2)
{
  auto filter = table_query::combine_filter_conditions(
    table_query::generate_filter_condition(
        U("Name"), 
        query_comparison_operator::greater_than_or_equal, 
        letter1),
    query_logical_operator::and,
    table_query::generate_filter_condition(
        U("Name"), 
        query_comparison_operator::less_than, 
        letter2));

  ReadTableData(filter);
}

The generated query filter looks like this.

(Name ge 'D') and (Name lt 'E')

You can call it as follows:

ReadTableDataStartingWith(U("D"), U("E"));

That’d return all rows with names starting with a D. Querying against partition and row keys would be a faster approach. Also, for repeated querying against a set of data, you may want to fetch a larger subset of the data and then query it in memory using standard STL.
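As a rough sketch of the in-memory approach, suppose the rows have already been fetched and copied into a plain struct. The client_row type and the names_in_range function below are hypothetical names chosen for this example; they apply the same half-open range condition as the Name filter above, using std::copy_if.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical local copy of a row; the fields mirror the Clients table.
struct client_row
{
  int id;
  std::string name;
  std::string phone;
};

// Returns the rows whose name falls in the half-open range [from, to),
// i.e. the same condition as (Name ge from) and (Name lt to).
std::vector<client_row> names_in_range(const std::vector<client_row>& rows,
                                       const std::string& from,
                                       const std::string& to)
{
  std::vector<client_row> matches;
  std::copy_if(rows.begin(), rows.end(), std::back_inserter(matches),
               [&](const client_row& row)
               { return row.name >= from && row.name < to; });
  return matches;
}

void in_memory_query_demo()
{
  const std::vector<client_row> rows = {
    { 100, "John Brown", "777-1234" },
    { 101, "Mary Jane", "777-5678" },
    { 102, "Dan White", "777-0000" }  // made-up row for the example
  };

  const auto d_names = names_in_range(rows, "D", "E");
  assert(d_names.size() == 1);
  assert(d_names[0].id == 102);
}
```

This only pays off when the same data is queried repeatedly, since the initial fetch still has to pull every row across the wire.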

Modifying Azure table data

The last blog entry showed how to read data from Azure storage tables. This blog entry will show how to insert, update, and delete data in Azure table storage.

Inserting data

void InsertTableData(string_t key, int id, 
    string_t name, string_t phone)
{
  auto storage_account = cloud_storage_account::parse(
      U("UseDevelopmentStorage=true"));
  auto table_client = storage_account.create_cloud_table_client();
  auto table = table_client.get_table_reference(
      U("Clients"));
  bool created = table.create_if_not_exists();

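  // partitionKey is not declared in this snippet; it is assumed to be a
  // string_t defined elsewhere (for example, a file-level constant)
  table_entity entity(partitionKey, key);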
  auto& properties = entity.properties();
  properties.reserve(3);
  properties[U("Name")] = entity_property(name);
  properties[U("Phone")] = entity_property(phone);
  properties[U("Id")] = entity_property(id);

  auto operation = table_operation::insert_entity(entity);
  auto result = table.execute(operation);
}

Azure table data is really just about properties. So what’s involved is creating a new table entity, setting up some properties (named keys with values), and then executing an insert_entity operation against the table.

Updating data

void UpdateTableData(string_t key, int id, 
    string_t name, string_t phone)
{
  auto storage_account = cloud_storage_account::parse(
      U("UseDevelopmentStorage=true"));
  auto table_client = storage_account.create_cloud_table_client();
  auto table = table_client.get_table_reference(
      U("Clients"));
  bool created = table.create_if_not_exists();

  auto operation = table_operation::retrieve_entity(
      partitionKey, key);
  auto result = table.execute(operation);
  auto entity = result.entity();
  auto& properties = entity.properties();
  properties[U("Name")] = entity_property(name);
  properties[U("Phone")] = entity_property(phone);
  properties[U("Id")] = entity_property(id);

  auto operationUpdate = table_operation::replace_entity(entity);
  result = table.execute(operationUpdate);
}

This code is very similar. The retrieve_entity operation is used to access the entity we need to update. The property values are updated, and then a replace_entity operation is executed.

Deleting data

void DeleteTableData(string_t key)
{
  auto storage_account = cloud_storage_account::parse(
      U("UseDevelopmentStorage=true"));
  auto table_client = storage_account.create_cloud_table_client();
  auto table = table_client.get_table_reference(
      U("Clients"));
  bool created = table.create_if_not_exists();

  auto operation = table_operation::retrieve_entity(
      partitionKey, key);
  auto result = table.execute(operation);
  auto entity = result.entity();

  auto operationDelete = table_operation::delete_entity(entity);
  result = table.execute(operationDelete);
}

This is very similar to the update, except that a delete_entity operation is executed. Future blog entries will talk about accessing Azure blob storage using the native SDK.

Azure Storage access using C++

The Microsoft Azure Storage Client Library for C++ is a library built on top of the C++ REST SDK that lets you access Azure Storage from your C++ apps. You need to install it via NuGet. While installing it, I could not locate it using the NuGet packages UI in Visual Studio 2013. Instead I had to install it via the Package Manager Console. I assume it’s because the library is pre-release. The latest version is 0.3.0 (preview) dated May 16 2014. So when you install it, you need to add the -pre parameter.

install-package wastorage -pre

This will add the REST SDK and other dependencies like the wastorage.redist to your project. To test if it works, I created a table using a quickly put together C# project and then accessed it from a C++ console app. I just used the Azure storage emulator. Here are the includes you need to have in your project.

#include "../packages/wastorage.0.3.0-preview/build/native/include/was/storage_account.h"
#include "../packages/wastorage.0.3.0-preview/build/native/include/was/table.h"

You can also add this using namespace declaration to save some keystrokes.

using namespace azure::storage;

Here’s the code snippet, and I’ve added comments to make it easier to understand.

// This returns the storage account object
auto storage_account = cloud_storage_account::parse(U("UseDevelopmentStorage=true"));

// This returns the table client service object
auto table_client = storage_account.create_cloud_table_client();

// Now, we get a reference to a named table
auto table = table_client.get_table_reference(U("Clients"));

// This will create the table if it does not exist
bool created = table.create_if_not_exists();

// We get everything without a filter
table_query query;
auto results = table.execute_query(query);

// Each item represents a table entity
for (auto item : results)
{
  auto properties = item.properties();

  // Each property from the table entity is returned as a key-value pair
  for (auto property : properties)
  {
    ucout << property.first << U(" = ") << property.second.str() << U("\t");
  }

  ucout << endl;
}

Example output:

Id = 100        Name = John Brown       Phone = 777-1234
Id = 101        Name = Mary Jane        Phone = 777-5678

Now this was basically as easy as using C#. The one difference, and it’s a non-trivial one, is that with C# you can map the returned table_entity to a .NET type. So I can write something like this:

TableQuery<Client> query = new TableQuery<Client>();

foreach (var item in table.ExecuteQuery(query))
{
    Console.WriteLine("{0} - {1}", item.Name, item.Phone);
}

Here, Client is an entity object.

class Client : TableEntity
{
    public Client(int id)
    {
        this.Id = id;
        this.PartitionKey = id.ToString();
        this.RowKey = id.ToString();
    }

    public Client()
    {
    }

    public string Name { get; set; }
    public string Phone { get; set; }
    public int Id { get; set; }
}

I am guessing here, but it’s most likely done via reflection. With C++, you’d need to write plumbing code to map the returned data to your C++ objects. Not a big deal really, and more a matter of coding convenience. Probably more performant to do it the C++ way (that’s an unverified personal thought).
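To give an idea of what that plumbing code might look like, here is a minimal sketch. It uses a std::map of strings as a stand-in for the entity’s properties so that it can run without the SDK; the real library stores typed entity_property values, so the property_bag alias, the client struct, and both helper functions are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the name/value pairs of a table entity. The real SDK stores
// typed entity_property values; plain strings are an assumption made here
// to keep the sketch self-contained.
using property_bag = std::map<std::string, std::string>;

struct client
{
  int id = 0;
  std::string name;
  std::string phone;
};

// Returns the value stored under key, or an empty string when absent.
std::string value_or_empty(const property_bag& bag, const std::string& key)
{
  const auto it = bag.find(key);
  return it == bag.end() ? std::string() : it->second;
}

// Hand-written mapping from the property bag to a C++ object.
client to_client(const property_bag& bag)
{
  client c;
  const std::string id_text = value_or_empty(bag, "Id");
  if (!id_text.empty())
  {
    c.id = std::stoi(id_text);  // throws if the text is not a number
  }
  c.name = value_or_empty(bag, "Name");
  c.phone = value_or_empty(bag, "Phone");
  return c;
}

void mapping_demo()
{
  const property_bag bag = {
    { "Id", "100" }, { "Name", "John Brown" }, { "Phone", "777-1234" }
  };
  const client c = to_client(bag);
  assert(c.id == 100);
  assert(c.name == "John Brown");
  assert(c.phone == "777-1234");
}
```

Missing properties simply leave the corresponding member at its default, which is one of the decisions you have to make yourself when the mapping is written by hand.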

Using weak_ptr

The weak_ptr holds a weakly referenced pointer to an object that is managed by a shared_ptr (or by multiple shared_ptr instances). The weak_ptr does not affect the strong ref count. You typically construct a weak_ptr out of a shared_ptr, and then when you need to access the underlying object, you call lock() on the weak_ptr which gives you a shared_ptr (with ref count incremented). One use of weak_ptr types is to help avoid circular references, which often lead to memory leaks as objects continue to remain in memory. Another example would be cached access to objects that may or may not be alive in memory. So you’d store the weak_ptrs and whenever you need to access the object, you’d check to see if the object’s alive, and create a shared_ptr from the weak_ptr as needed. The code snippet below shows such a pattern.

class NumberStoreCache
{
private:
  unordered_map<int, weak_ptr<NumberStore>> cache;

  shared_ptr<NumberStore> AddToCache(int number)
  {
    shared_ptr<NumberStore> store = make_shared<NumberStore>(
        number);
    weak_ptr<NumberStore> weak(store);
    cache[number] = weak;
    return store;
  }

public:
  shared_ptr<NumberStore> GetNumberStore(int number)
  {
    if (cache.find(number) == cache.end())
    {
      return AddToCache(number); // call 1 
    }
    else
    {
      weak_ptr<NumberStore> weak = cache[number];
      if (weak.expired())
      {
        return AddToCache(number); // call 2
      }
      else
      {
        return weak.lock(); // call 3
      }
    }
  }
};

void Foo()
{
  NumberStoreCache nsCache;
  auto ns1 = nsCache.GetNumberStore(10); // call 1
  ns1.reset();
  auto ns2 = nsCache.GetNumberStore(10); // call 2
  auto ns3 = nsCache.GetNumberStore(10); // call 3
}
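The three calls behave differently because of how lock() and expired() respond to the lifetime of the underlying object. The short sketch below shows just that behavior in isolation, using an int instead of NumberStore to keep it small; weak_ptr_demo is simply a name picked for the example.

```cpp
#include <cassert>
#include <memory>

void weak_ptr_demo()
{
  std::weak_ptr<int> weak;
  {
    auto strong = std::make_shared<int>(10);
    weak = strong;
    assert(strong.use_count() == 1);  // the weak_ptr adds no strong ref

    auto locked = weak.lock();        // promotes to a shared_ptr
    assert(locked && *locked == 10);
    assert(strong.use_count() == 2);  // strong + locked
  }

  // Both shared_ptr instances are gone, so the object has been destroyed.
  assert(weak.expired());
  assert(weak.lock() == nullptr);     // lock() on an expired weak_ptr is empty
}
```

Because lock() returns an empty shared_ptr when the object is gone, a cache can also call lock() first and test the result, rather than calling expired() and then lock(). That single step is the safer choice if other threads might release the last shared_ptr in between.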

Using shared_ptr

While unique_ptr is meant for single-owner scenarios, shared_ptr is the reference counted smart pointer class that allows you to share the smart pointer around your code. Consider the code snippet below, which uses the NumberStore example from the previous blog entry.

void SharedPtrVersion()
{
  shared_ptr<NumberStore> number(
      new NumberStore(100)); // Ref Count : 1
  Foo(number); // Ref Count : 1 on function return
  shared_ptr<NumberStore> copy = number; // Ref Count : 2
  Foo(number); // Ref Count : 2 on function return
  copy.reset(); // Ref Count : 1
  Foo(number); // Ref Count : 1 on function return
  Foo(copy); // This will crash
}

The output will be:

NumberStore ctor
Foo : 100 Ref Count : 2
Foo : 100 Ref Count : 3
Foo : 100 Ref Count : 2
NumberStore dtor

The ref count goes up inside Foo’s body as Foo has received a copy of the shared_ptr. It goes back down as soon as Foo returns. Creating a copy increments the ref count as expected. Calling reset() decrements the ref count and releases ownership. Passing around a reset shared_ptr is fine in itself, but dereferencing it, as Foo does, is undefined behavior and will typically give you a crash / access violation.
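The shared_ptr overload of Foo is not shown above, so here is a sketch of what it presumably looks like, judging from the output. The return value and the check for an empty pointer are additions made for this example (the check is what lets the last call run safely instead of crashing), and NumberStore is a trimmed copy without the logging.

```cpp
#include <cassert>
#include <iostream>
#include <memory>

// Trimmed-down copy of NumberStore (constructor/destructor logging omitted).
class NumberStore
{
  int _num;

public:
  explicit NumberStore(int num = 0) : _num(num) { }
  int Num() const { return _num; }
};

// Assumed shape of Foo. Taking the shared_ptr by value is what makes the
// count rise inside the body. Returning the count and checking for an empty
// pointer are additions made for this sketch.
long Foo(std::shared_ptr<NumberStore> number)
{
  if (!number)
  {
    std::cout << "Foo : (empty)" << std::endl;
    return 0;
  }
  std::cout << "Foo : " << number->Num()
            << " Ref Count : " << number.use_count() << std::endl;
  return number.use_count();
}

void shared_ptr_demo()
{
  auto number = std::make_shared<NumberStore>(100);
  const long first = Foo(number);   // caller's pointer + Foo's parameter
  assert(first == 2);

  auto copy = number;
  const long second = Foo(number);  // number + copy + Foo's parameter
  assert(second == 3);

  copy.reset();
  const long third = Foo(number);
  assert(third == 2);

  const long fourth = Foo(copy);    // empty pointer: handled, not dereferenced
  assert(fourth == 0);
}
```

If Foo took a const shared_ptr<NumberStore>& instead, no copy would be made and the count would stay unchanged inside the function.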

The shared_ptr class also allows you to pass a lambda as the deletion method called in the destructor, so you can do something like this.

void SharedPtrArrayVersion()
{
  shared_ptr<NumberStore> number(
    new NumberStore[3], 
    [](NumberStore* pNumStore) { delete[] pNumStore; });
}
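To check that the custom deleter really destroys every element, a small counting type helps. The following is a sketch along those lines; Counted is a made-up type for this example, and the second half uses std::default_delete with an array type, which I believe is an equivalent alternative to writing the lambda by hand.

```cpp
#include <cassert>
#include <memory>

// Counts destructor calls so the effect of the deleter can be observed.
struct Counted
{
  static int destroyed;
  ~Counted() { ++destroyed; }
};
int Counted::destroyed = 0;

void array_deleter_demo()
{
  // Lambda deleter, as in the snippet above.
  Counted::destroyed = 0;
  {
    std::shared_ptr<Counted> items(
        new Counted[3],
        [](Counted* p) { delete[] p; });
  }
  assert(Counted::destroyed == 3);

  // Library-provided array deleter instead of a hand-written lambda.
  Counted::destroyed = 0;
  {
    std::shared_ptr<Counted> items(
        new Counted[3],
        std::default_delete<Counted[]>());
  }
  assert(Counted::destroyed == 3);
}
```

Without a deleter that uses delete[], the shared_ptr would call plain delete on memory allocated with new[], which is undefined behavior.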

Note: when you call reset, if the ref count drops to 0, the object is destroyed. Example:

void SharedPtrVersion2()
{
  shared_ptr<NumberStore> number(new NumberStore(100));
  shared_ptr<NumberStore> copy = number;
  number.reset();
  copy.reset(); // destructor is called here
  cout << "end of method" << endl;
}

The next blog entry will talk about the weak_ptr class (the last of the 3 smart pointers introduced in C++ 11).

Using unique_ptr instead of auto_ptr

I’ve had a bit of a blogging hiatus and hope to make amends for that. Going forward, I will continue to blog on modern C++ features and also focus on newer frameworks from Microsoft (primarily C++ focused, but non-C++ technologies that interest me will also be discussed).

For years, we’ve all used auto_ptr with all its pitfalls. A minor issue was its lack of support for an array of objects: it always calls delete rather than delete[], so using it with an array of objects is undefined behavior (in practice, often only the first object’s destructor would run). A much bigger issue is that the auto_ptr transfers ownership when it’s assigned to another auto_ptr. And because this happens in a non-obvious fashion, it’s fairly easy to introduce problems in your code. The unique_ptr solves both these problems. It has a partial specialization that correctly calls delete[] on an array of objects. It also emphasizes the point of unique ownership. So you have to explicitly move a unique_ptr into another unique_ptr.

Here are some code snippets that show this more clearly.

class NumberStore
{
  int _num;

public:
  NumberStore(int num = 0) : _num(num)
  { 
    cout << "NumberStore ctor" << endl;
  }

  ~NumberStore()
  {
    cout << "NumberStore dtor" << endl;
  }

  int Num()
  {
    return _num;
  }
};

void Foo(auto_ptr<NumberStore> number)
{
  cout << "Foo : " << number->Num() << endl;
}

void AutoPtrProblem()
{
  auto_ptr<NumberStore> number(new NumberStore(100));
  Foo(number);
  Foo(number); //This call will crash
}

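The 2nd call to Foo(...) will crash. The first call silently transferred ownership to Foo’s parameter, so the object was destroyed when that call returned and number was left holding a null pointer. Now consider the unique_ptr version.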

void Foo(unique_ptr<NumberStore> number)
{
  cout << "Foo : " << number->Num() << endl;
}

void UniquePtrVersion()
{
  unique_ptr<NumberStore> number(new NumberStore(100));
  // Foo(number); <-- This won't compile
  Foo(move(number));
  Foo(move(number)); //This is an obvious programmer error now
}

By forcing you to explicitly call move, the compiler makes it obvious that a transfer of ownership is happening there. It’ll take some conscious bad programming to create a similar crash as before. As for an array of objects, the following method will call the destructor thrice.

void UniquePtrArrayVersion()
{
  unique_ptr<NumberStore[]> number(new NumberStore[3]);
}
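The effect of move on the source pointer can be checked directly. In the sketch below, consume is a hypothetical function that takes ownership the same way Foo does, except that it tests for an empty pointer and returns -1 instead of dereferencing it.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Takes ownership; the int is destroyed when this function returns.
int consume(std::unique_ptr<int> value)
{
  return value ? *value : -1;  // -1 signals an empty pointer
}

void unique_ptr_move_demo()
{
  std::unique_ptr<int> number(new int(100));

  const int first = consume(std::move(number));
  assert(first == 100);
  assert(number == nullptr);   // a moved-from unique_ptr is guaranteed null

  // Moving it a second time hands over an empty pointer.
  const int second = consume(std::move(number));
  assert(second == -1);
}
```

The second call is still a programmer error, but it is one that is visible at the call site and easy to guard against with a null check.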

So, going forward, if you think you have a need for a single-owner smart pointer class, unique_ptr is the one to use, not auto_ptr. For other situations you still would not use auto_ptr; instead you’d use shared_ptr, which I’ll be blogging about shortly.

VC++ 2013 – Initializer lists and uniform initialization

We’ve always been able to use initializer lists with arrays. Now you can do it with any type that has a method that takes an argument of type std::initializer_list<T> (including constructors). The standard library collections have all been updated to support initializer lists.

void foo()
{
  vector<int> vecint = { 3, 5, 19, 2 };
  map<int, double> mapintdoub =
  {
    { 4, 2.3 },
    { 12, 4.1 },
    { 6, 0.7 }
  };
}

And it’s trivial to do this with your own functions.

void bar1(const initializer_list<int>& nums) 
{
  for (auto i : nums)
  {
    // use i
  }
}

bar1({ 1, 4, 6 });

You can also do it with your user defined types.

class bar2
{
public:
  bar2(initializer_list<int> nums) { }
};

class bar3
{
public:
  bar3(initializer_list<bar2> items) { }
};

bar2 b2 = { 3, 7, 88 };

bar3 b3 = { {1, 2}, { 14 }, { 11, 8 } };

Uniform initialization is a related feature that’s been added to C++ 11. It automatically uses the matching constructor.

class bar4
{
  int x;
  double y;
  string z;

public:
  bar4(int, double, string) { }
};

class bar5
{
public:
  bar5(int, bar4) { }
};

bar4 b4 { 12, 14.3, "apples" };

bar5 b5 { 10, { 1, 2.1, "bananas" } };

If there’s an initializer-list constructor, it takes precedence over another matching constructor.

class bar6
{
public:
  bar6(int, int) // (1)
  {
    // ...
  }

  bar6(initializer_list<int>) // (2)
  {
    // ...
  }
};
  
bar6 b6 { 10, 10 }; // --> calls (2) above
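The precedence rule only applies to brace initialization, so parentheses still pick the two-argument constructor. The sketch below adds a which member to bar6 (an addition for this example) to record which constructor ran, and shows the same effect with std::vector, where it is a well-known source of surprises.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

struct bar6
{
  int which;  // records which constructor ran (added for this example)

  bar6(int, int) : which(1) { }                    // (1)
  bar6(std::initializer_list<int>) : which(2) { }  // (2)
};

void precedence_demo()
{
  bar6 braces{ 10, 10 };   // braces prefer the initializer-list constructor
  bar6 parens(10, 10);     // parentheses never consider it for two ints
  assert(braces.which == 2);
  assert(parens.which == 1);

  std::vector<int> two{ 10, 10 };  // two elements, both 10
  std::vector<int> ten(10, 10);    // ten elements, all 10
  assert(two.size() == 2);
  assert(ten.size() == 10);
}
```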

VC++ 2013 : Explicit conversion operators

Continuing on with my blog series on C++ 11 support added in Visual C++ 2013, this one’s about explicit conversion operators. I remember a rather embarrassing day in August 2004 when I realized that despite considering myself to be a decent C++ programmer, I had not until then known about the explicit keyword. I have a blog entry from back then.

Just to summarize the use of explicit, consider the example below.

class Test1
{
public:
  explicit Test1(int) { }
};

void Foo()
{
  Test1 t1(20);
  Test1 t2 = 20; // will not compile
}

While this could be done with conversion constructors, there was no way to do this for conversion operators because the standard did not support it. The bad thing about this was that you could not design a class to have consistency between conversion constructors and conversion operators. Consider the example below.

class Test1
{
public:
  explicit Test1(int) { }
};

class Test2
{
  int x;
public:
  Test2(int i) : x(i) { }
  operator Test1() { return Test1(x); }
};

void Foo()
{
  Test2 t1 = 20;
  Test1 t2 = t1; // will compile
}

That compiles, even though Test1’s constructor is explicit, because the conversion operator on Test2 is implicit. With C++ 11, you can now apply explicit on your conversion operators too.

class Test2
{
  int x;
public:
  Test2(int i) : x(i) { }
  explicit operator Test1() { return Test1(x); }
};

void Foo()
{
  Test2 t1 = 20;
  Test1 t2 = (Test1)t1; // this compiles
  Test1 t3 = t1; // will not compile
}

Here’s a not so obvious behavior with bool conversion operators.

class Test3
{
public:
  operator bool() { return true; }
};

void Foo()
{
  Test3 t3;
  if (t3)
  {
  }

  bool b = t3;
}

That compiles fine. Now try adding explicit to the operator.

class Test3
{
public:
  explicit operator bool() { return true; }
};

void Foo()
{
  Test3 t3;
  if (t3) // this compiles!
  {
  }

  bool b = t3; // will not compile
}

As expected, the 2nd conversion failed to compile, but the first one did. That’s because the if construct’s bool conversion is treated as explicit (the standard calls this a contextual conversion to bool). So you need to be wary of this: just adding explicit to your bool conversion operator will not keep your type from being converted to bool in conditions such as if and while, or as an operand of !, && and ||.
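The places where an explicit bool operator still applies can be listed in code. This is a small sketch rather than an exhaustive list; contextual_conversions is a made-up function name, and the operator has been marked const here, which the original Test3 did not do.

```cpp
#include <cassert>
#include <type_traits>

class Test3
{
public:
  explicit operator bool() const { return true; }
};

// Implicit conversion (as in "bool b = t3;") is rejected...
static_assert(!std::is_convertible<Test3, bool>::value,
              "no implicit conversion to bool");
// ...while direct initialization and casts are still allowed.
static_assert(std::is_constructible<bool, Test3>::value,
              "explicit conversion to bool is available");

// Counts the contexts below in which the explicit operator was used.
int contextual_conversions()
{
  Test3 t3;
  int hits = 0;

  if (t3) { ++hits; }          // condition of an if statement
  if (t3 && true) { ++hits; }  // operand of &&
  if (!!t3) { ++hits; }        // operand of !
  hits += t3 ? 1 : 0;          // first operand of ?:

  const bool b = static_cast<bool>(t3);  // explicit cast
  if (b) { ++hits; }

  return hits;
}
```

Every one of these compiles with the explicit operator in place, whereas copy-initialization of a bool or passing t3 to a function that takes a bool parameter does not.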