Using Azure blob storage from C++

Blob storage is designed for large amounts of semi-structured or unstructured data such as images, videos, and documents. The blob service lets you create named containers, each of which can hold one or more named blobs that can optionally be made publicly accessible via a URI.

Creating blobs

void CreateTextBlobs()
{
  auto storage_account = cloud_storage_account::parse(
      U("UseDevelopmentStorage=true"));
  auto blob_client = storage_account.create_cloud_blob_client();
  auto container = blob_client.get_container_reference(
      U("textdata"));
  bool created = container.create_if_not_exists();

  blob_container_permissions permissions;
  permissions.set_public_access(
      blob_container_public_access_type::container);
  container.upload_permissions(permissions);

  auto text_blob1 = container.get_block_blob_reference(
      U("texts/text1"));
  text_blob1.upload_text(U("This is some text - modified"));

  auto text_blob2 = container.get_block_blob_reference(
      U("texts/text2"));
  text_blob2.upload_from_file(U("./stdafx.h"));
}

The classes and methods used are very similar to those used for table storage. Notice the use of blob_container_permissions to set the public access level for the container. Public access is off by default, and you can optionally set it to blob (clients can read blob data) or container (clients can list blobs in the container and read blob data). You can simulate directories in the blob names. In the above example, both text1 and text2 are under the texts directory. This only affects the URI; these are not physical directories.
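To make the point about simulated directories concrete, here is a minimal sketch in plain standard C++ that does not use the storage library at all. The helper name split_virtual_directory is hypothetical; it only illustrates that the "directory" is whatever precedes the last '/' in the blob name.

```cpp
#include <string>
#include <utility>

// Hypothetical helper, not part of the storage library: splits a blob name
// into its simulated directory prefix and the remaining leaf name.
std::pair<std::string, std::string> split_virtual_directory(const std::string& blob_name)
{
    const auto pos = blob_name.rfind('/');
    if (pos == std::string::npos)
    {
        // No delimiter: the blob sits directly in the container.
        return { std::string(), blob_name };
    }
    // Keep the trailing '/' on the prefix and return the rest as the leaf.
    return { blob_name.substr(0, pos + 1), blob_name.substr(pos + 1) };
}
```

Calling split_virtual_directory("texts/text1") yields the pair ("texts/", "text1"), while a name without a slash yields an empty prefix.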

Listing blobs

void ListTextBlobs()
{
  auto storage_account = cloud_storage_account::parse(
        U("UseDevelopmentStorage=true"));
  auto blob_client = storage_account.create_cloud_blob_client();
  auto container = blob_client.get_container_reference(
        U("textdata"));
  bool created = container.create_if_not_exists();

  continuation_token token;

  auto result = container.list_blobs_segmented(token);

  for (auto dir : result.directories())
  {
    ucout << U("Directory: ") << dir.uri().path() << endl;
    ucout << endl;

    continuation_token dir_token;
    auto resultInner = dir.list_blobs_segmented(dir_token);

    for (auto item : resultInner.blobs())
    {
      ucout << item.name() << endl;
      ucout << item.uri().path() << endl;
      ucout << item.properties().content_type() << endl;
      ucout << endl;
    }
  }
}

To list the blobs in a container, we need to use a continuation_token object. The container object supports a list_blobs_segmented method which takes this token. For each directory returned, we can then call list_blobs_segmented using a separate continuation_token object. As we iterate through the blobs in the directory, we can access properties like the name, URI, and content type. Here’s a sample output from calling the above function.

Directory: /devstoreaccount1/textdata/texts/

texts/text1
/devstoreaccount1/textdata/texts/text1
text/plain; charset=utf-8

texts/text2
/devstoreaccount1/textdata/texts/text2
application/octet-stream

Notice how the text data we uploaded is of type text/plain whereas the file we uploaded is of type application/octet-stream. If you paste the first blob’s URI into a browser, the text is returned directly, whereas in the second case the file is offered for download. This is browser-specific behavior based on the content type.
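The listing code above reads a single segment per call. As I understand it, a segmented listing is meant to be called repeatedly, passing the token from the previous segment, until nothing is left. The sketch below shows that loop shape in plain standard C++. The segment struct and the list_segment and list_all functions are hypothetical stand-ins, not the storage library's types, and the "token" here is just an index.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one segment of a listing result.
struct segment
{
    std::vector<std::string> items;  // results in this segment
    std::size_t next = 0;            // token: where the next segment starts
    bool has_more = false;           // false once the listing is exhausted
};

// Returns up to page_size names starting at the position given by token.
segment list_segment(const std::vector<std::string>& all,
                     std::size_t token, std::size_t page_size)
{
    if (page_size == 0) page_size = 1;  // avoid a loop that never advances
    segment s;
    const std::size_t end = std::min(all.size(), token + page_size);
    for (std::size_t i = token; i < end; ++i) s.items.push_back(all[i]);
    s.next = end;
    s.has_more = end < all.size();
    return s;
}

// Keeps requesting segments until the token says there is nothing left.
std::vector<std::string> list_all(const std::vector<std::string>& all,
                                  std::size_t page_size)
{
    std::vector<std::string> out;
    std::size_t token = 0;
    bool more = true;
    while (more)
    {
        const segment s = list_segment(all, token, page_size);
        out.insert(out.end(), s.items.begin(), s.items.end());
        token = s.next;
        more = s.has_more;
    }
    return out;
}
```

With the real library, the same loop would presumably feed the token carried by each returned segment into the next list_blobs_segmented call; check the headers of your installed version for the exact accessor names.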

Writing custom queries against an Azure table

While not as powerful as SQL, Azure tables do allow you to do minimal querying. With the native SDK, you’d do this using the table_query object’s set_filter_string function. Here’s a modified ReadTableData method from the previous blog entries.

void ReadTableData(string_t filter)
{
  auto storage_account = cloud_storage_account::parse(
      U("UseDevelopmentStorage=true"));
  auto table_client = storage_account.create_cloud_table_client();
  auto table = table_client.get_table_reference(U("Clients"));
  bool created = table.create_if_not_exists();

  table_query query;
  query.set_filter_string(filter);

  auto results = table.execute_query(query);

  for (auto item : results)
  {
    auto properties = item.properties();

    for (auto property : properties)
    {
      ucout << property.first << U(" = ") 
            << property.second.str() << U("\t");
    }

    ucout << endl;
  }
}

Here’s an example query that gets all rows within a range of RowKey values.

void ReadTableData(string_t rowKeyStart, string_t rowKeyEnd)
{
  auto filter = table_query::combine_filter_conditions(
    table_query::generate_filter_condition(
        U("RowKey"), 
        query_comparison_operator::greater_than_or_equal, 
        rowKeyStart),
    query_logical_operator::and,
    table_query::generate_filter_condition(
        U("RowKey"), 
        query_comparison_operator::less_than_or_equal, 
        rowKeyEnd));

  ReadTableData(filter);
}

The generate_filter_condition function builds each individual comparison, and combine_filter_conditions joins two of them into a single filter string. The query_comparison_operator class provides the comparison operators and the query_logical_operator class provides the logical operators. In case you are wondering, the call above gets converted to the following string.

(RowKey ge '100') and (RowKey le '104')
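To show how a filter string of that shape is put together, here is a minimal sketch in plain standard C++. The make_condition and combine functions are hypothetical helpers that only mimic the output format; they are not the library's implementation, and they do not escape quotes embedded in a value.

```cpp
#include <string>

// Hypothetical helper: builds one comparison such as RowKey ge '100'.
std::string make_condition(const std::string& property,
                           const std::string& op,
                           const std::string& value)
{
    // String values are wrapped in straight single quotes.
    return property + " " + op + " '" + value + "'";
}

// Hypothetical helper: parenthesizes both sides and joins them with a
// logical operator such as "and".
std::string combine(const std::string& left,
                    const std::string& logical_op,
                    const std::string& right)
{
    return "(" + left + ") " + logical_op + " (" + right + ")";
}
```

Combining make_condition("RowKey", "ge", "100") and make_condition("RowKey", "le", "104") with "and" produces the same text as the filter string shown above.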

Here’s a similar method, that queries against the Name column.

void ReadTableDataStartingWith(string_t letter1, string_t letter2)
{
  auto filter = table_query::combine_filter_conditions(
    table_query::generate_filter_condition(
        U("Name"), 
        query_comparison_operator::greater_than_or_equal, 
        letter1),
    query_logical_operator::and,
    table_query::generate_filter_condition(
        U("Name"), 
        query_comparison_operator::less_than, 
        letter2));

  ReadTableData(filter);
}

The generated query filter looks like this.

(Name ge 'D') and (Name lt 'E')

You can call it as follows:

ReadTableDataStartingWith(U("D"), U("E"));

That’d return all rows with names starting with D. Querying against partition and row keys would be a faster approach. Also, for repeated queries against the same set of data, you may want to fetch a larger subset once and then query it in memory using the STL.
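As a rough illustration of the in-memory approach, the sketch below filters rows that have already been fetched, using std::copy_if. The client_row struct and the names_starting_with function are hypothetical; they assume you have copied the Id, Name, and Phone values out of the returned entities into a plain vector.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical local record holding values already fetched from the table.
struct client_row
{
    int id;
    std::string name;
    std::string phone;
};

// Returns the rows whose name starts with the given letter.
std::vector<client_row> names_starting_with(const std::vector<client_row>& fetched,
                                            char letter)
{
    std::vector<client_row> matches;
    std::copy_if(fetched.begin(), fetched.end(), std::back_inserter(matches),
        [letter](const client_row& c)
        {
            return !c.name.empty() && c.name.front() == letter;
        });
    return matches;
}
```

This trades one larger fetch for cheap local lookups afterwards, which only pays off when the same data is queried several times.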

Modifying Azure table data

The last blog entry showed how to read data from Azure storage tables. This blog entry shows how to insert, update, and delete data in Azure table storage.

Inserting data

void InsertTableData(string_t key, int id, 
    string_t name, string_t phone)
{
  auto storage_account = cloud_storage_account::parse(
      U("UseDevelopmentStorage=true"));
  auto table_client = storage_account.create_cloud_table_client();
  auto table = table_client.get_table_reference(
      U("Clients"));
  bool created = table.create_if_not_exists();

  table_entity entity(partitionKey, key);
  auto& properties = entity.properties();
  properties.reserve(3);
  properties[U("Name")] = entity_property(name);
  properties[U("Phone")] = entity_property(phone);
  properties[U("Id")] = entity_property(id);

  auto operation = table_operation::insert_entity(entity);
  auto result = table.execute(operation);
}

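Azure table data is really just about properties. Inserting a row involves creating a new table entity, setting up some properties (named keys with values), and then executing an insert_entity operation against the table. The partitionKey variable is not declared in these snippets; it is assumed to be a string_t defined elsewhere.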

Updating data

void UpdateTableData(string_t key, int id, 
    string_t name, string_t phone)
{
  auto storage_account = cloud_storage_account::parse(
      U("UseDevelopmentStorage=true"));
  auto table_client = storage_account.create_cloud_table_client();
  auto table = table_client.get_table_reference(
      U("Clients"));
  bool created = table.create_if_not_exists();

  auto operation = table_operation::retrieve_entity(
      partitionKey, key);
  auto result = table.execute(operation);
  auto entity = result.entity();
  auto& properties = entity.properties();
  properties[U("Name")] = entity_property(name);
  properties[U("Phone")] = entity_property(phone);
  properties[U("Id")] = entity_property(id);

  auto operationUpdate = table_operation::replace_entity(entity);
  result = table.execute(operationUpdate);
}

This code is very similar. The retrieve_entity operation is used to access the entity we need to update. The property values are updated, and then a replace_entity operation is executed.

Deleting data

void DeleteTableData(string_t key)
{
  auto storage_account = cloud_storage_account::parse(
      U("UseDevelopmentStorage=true"));
  auto table_client = storage_account.create_cloud_table_client();
  auto table = table_client.get_table_reference(
      U("Clients"));
  bool created = table.create_if_not_exists();

  auto operation = table_operation::retrieve_entity(
      partitionKey, key);
  auto result = table.execute(operation);
  auto entity = result.entity();

  auto operationDelete = table_operation::delete_entity(entity);
  result = table.execute(operationDelete);
}

This is very similar to the update, except that a delete_entity operation is executed. Future blog entries will talk about accessing Azure blob storage using the native SDK.

Azure Storage access using C++

The Microsoft Azure Storage Client Library for C++ is built on top of the C++ REST SDK and lets you access Azure Storage from your C++ apps. You need to install it via NuGet. While installing it, I could not locate it using the NuGet packages UI in Visual Studio 2013. Instead, I had to install it via the Package Manager Console. I assume it’s because the library is pre-release. The latest version is 0.3.0 (preview), dated May 16, 2014. So when you install it, you need to add the -pre parameter.

install-package wastorage -pre

This will add the REST SDK and other dependencies like the wastorage.redist to your project. To test if it works, I created a table using a quickly put together C# project and then accessed it from a C++ console app. I just used the Azure storage emulator. Here are the includes you need to have in your project.

#include "../packages/wastorage.0.3.0-preview/build/native/include/was/storage_account.h"
#include "../packages/wastorage.0.3.0-preview/build/native/include/was/table.h"

You can also add this using namespace declaration to save some keystrokes.

using namespace azure::storage;

Here’s the code snippet, and I’ve added comments to make it easier to understand.

// This returns the storage account object
auto storage_account = cloud_storage_account::parse(U("UseDevelopmentStorage=true"));

// This returns the table client service object
auto table_client = storage_account.create_cloud_table_client();

// Now, we get a reference to a named table
auto table = table_client.get_table_reference(U("Clients"));

// This will create the table if it does not exist
bool created = table.create_if_not_exists();

// We get everything without a filter
table_query query;
auto results = table.execute_query(query);

// Each item represents a table entity
for (auto item : results)
{
  auto properties = item.properties();

  // Each property from the table entity is returned as a key-value pair
  for (auto property : properties)
  {
    ucout << property.first << U(" = ") << property.second.str() << U("\t");
  }

  ucout << endl;
}

Example output:

Id = 100        Name = John Brown       Phone = 777-1234
Id = 101        Name = Mary Jane        Phone = 777-5678

Now this was basically as easy as using C#. The one difference, and it’s a non-trivial one, is that with C# you can map the returned table_entity to a .NET type. So I can write something like this:

TableQuery<Client> query = new TableQuery<Client>();

foreach (var item in table.ExecuteQuery(query))
{
    Console.WriteLine("{0} - {1}", item.Name, item.Phone);
}

Here, Client is an entity object.

class Client : TableEntity
{
    public Client(int id)
    {
        this.Id = id;
        this.PartitionKey = id.ToString();
        this.RowKey = id.ToString();
    }

    public Client()
    {
    }

    public string Name { get; set; }
    public string Phone { get; set; }
    public int Id { get; set; }
}

I am guessing here, but it’s most likely done via reflection. With C++, you’d need to write plumbing code to map the returned data to your C++ objects. Not a big deal really, and more a matter of coding convenience. Probably more performant to do it the C++ way (that’s an unverified personal thought).
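For a sense of what that plumbing code might look like, here is a minimal sketch in plain standard C++. The client struct and the to_client function are hypothetical, and a std::map of strings stands in for the properties returned by a query; it is not the library's property type, so real code would read the values through the library's own accessors.

```cpp
#include <map>
#include <string>

// Hypothetical C++ counterpart of the C# Client class.
struct client
{
    int id = 0;
    std::string name;
    std::string phone;
};

// Maps a bag of name/value strings to a client.
// at() throws std::out_of_range when a property is missing, and
// std::stoi throws std::invalid_argument when Id is not a number.
client to_client(const std::map<std::string, std::string>& properties)
{
    client c;
    c.id = std::stoi(properties.at("Id"));
    c.name = properties.at("Name");
    c.phone = properties.at("Phone");
    return c;
}
```

One such function per entity type is the price of not having reflection, but the mapping is explicit and happens without any runtime type inspection.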