Windows Azure Storage: What’s in Your Blob?
Windows Azure Storage has some great features that make it very flexible, giving you lots of options for designing solutions. I will share some of those features and explain how they can be used.
The first thing to do is understand some of the terms and a little about the architecture so that everything can be put into context. In this post, I will be focusing on storage as it relates to IaaS (Infrastructure as a Service).
When you create a Windows Azure Virtual Machine, a VHD (Virtual Hard Disk) is created for the guest OS and stored as a page blob in your storage account. Any additional data disks you create are also stored as page blobs.
What is a blob, you ask?
Windows Azure Blob storage is a service for storing large amounts of unstructured data. Blobs, which are simply files of any type and size, come in two flavors: page blobs and block blobs. I am going to focus on page blobs because that is how VHDs are stored. Page blobs are optimized for random read/write operations, which makes them ideal for virtual hard disks. A page blob can be up to 1TB in size, but don’t fret if you are looking for larger disks inside your virtual machine: you can stripe multiple virtual disks together from within the guest OS.
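To make the striping idea concrete, here is a minimal sketch of how a RAID-0 style stripe set spreads one logical volume across several page-blob disks. The 64 KiB stripe unit and the helper names are hypothetical choices for illustration, not Windows or Azure defaults; only the 1TB-per-page-blob limit comes from the text above.

```python
# Hypothetical sketch: how a RAID-0 style stripe set maps one logical volume
# onto several data disks (page blobs). The 64 KiB stripe unit is an
# illustrative assumption, not an Azure or Windows default.

PAGE_BLOB_MAX_BYTES = 1 * 1024**4  # 1TB maximum per page blob


def disks_needed(volume_bytes: int, disk_bytes: int = PAGE_BLOB_MAX_BYTES) -> int:
    """Minimum number of equally sized disks whose combined size covers the volume."""
    return -(-volume_bytes // disk_bytes)  # ceiling division


def map_offset(logical_offset: int, num_disks: int, stripe_unit: int = 64 * 1024):
    """Map a byte offset in the striped volume to (disk index, offset on that disk)."""
    stripe_index, within = divmod(logical_offset, stripe_unit)
    disk = stripe_index % num_disks      # stripes rotate round-robin across disks
    row = stripe_index // num_disks      # how many full rows precede this stripe
    return disk, row * stripe_unit + within


if __name__ == "__main__":
    print(disks_needed(3 * 1024**4))
    print(disks_needed(3 * 1024**4 + 1))
    for off in (0, 64 * 1024, 4 * 64 * 1024 + 10):
        print(off, map_offset(off, num_disks=4))
```

Because consecutive stripes land on different disks, a large sequential read or write is serviced by several VHDs at once, which is where the extra throughput of a stripe set comes from.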
The next level up is called a container. The container provides a structure for grouping blobs together. By default, the VHDs you create will reside in a container called vhds. You can also organize your blobs into folders inside a container; these are virtual folders, created by including a “/” in the blob name, because the container itself has a flat structure.
The highest level of the storage namespace is called the Storage Account. All access to Windows Azure storage services is done through the Storage Account. You can create up to five storage accounts for your Windows Azure subscription, each containing up to 200TB of data.
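The hierarchy described above (storage account, then container, then blob) shows up directly in every blob’s address, which has the form https://<account>.blob.core.windows.net/<container>/<blob>. The sketch below builds that URL and extracts a virtual folder from a blob name; the account and blob names are hypothetical.

```python
# Sketch of the storage namespace: account -> container -> blob.
# "Folders" are just "/" characters inside the blob name.
# The account name and blob names used below are hypothetical.
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the endpoint URL for a blob in a storage account."""
    # quote() percent-encodes unsafe characters but leaves "/" alone by
    # default, so virtual folder paths stay readable.
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"


def virtual_folder(blob_name: str) -> str:
    """Return the 'folder' part of a blob name, or '' if there is none."""
    head, sep, _ = blob_name.rpartition("/")
    return head if sep else ""


url = blob_url("mystorageacct", "vhds", "web-tier/web01-os.vhd")
print(url)
print(virtual_folder("web-tier/web01-os.vhd"))
```

Grouping the disks of one tier under a common prefix such as web-tier/ is one way to keep related VHDs together inside the default vhds container.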
Now that some of the terms are defined and you can see the hierarchical nature of the architecture, let’s dig into some of the features. We will start with snapshots. Snapshots can be taken at the blob level but also at the container level. I like this because it gives me the ability to take a snapshot of a single virtual machine or a group of virtual machines. Snapshots can be promoted to revert to a specific point in time, and they are also portable, which means they can be copied and mounted by other virtual machines.
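A toy in-memory model can help show why snapshots are cheap and how promoting one reverts a disk. This is a hypothetical illustration, not how the service is implemented: it only relies on the fact that page blobs are written in 512-byte pages. Real snapshots are identified by a timestamp rather than the string label used here.

```python
# Toy in-memory model of page-blob snapshots (hypothetical; not the
# service's implementation). Pages are immutable bytes objects, so a
# snapshot can share them with the base blob instead of copying the data.
from typing import Dict

PAGE_SIZE = 512  # page blobs are written in 512-byte pages


class PageBlob:
    def __init__(self) -> None:
        self.pages: Dict[int, bytes] = {}                  # page index -> contents
        self.snapshots: Dict[str, Dict[int, bytes]] = {}   # label -> page map

    def write_page(self, index: int, data: bytes) -> None:
        if len(data) != PAGE_SIZE:
            raise ValueError("writes must cover exactly one 512-byte page")
        self.pages[index] = data

    def read_page(self, index: int) -> bytes:
        # Pages that were never written read back as zeros.
        return self.pages.get(index, bytes(PAGE_SIZE))

    def snapshot(self, label: str) -> None:
        # Copy only the page map; the page bytes themselves are shared.
        self.snapshots[label] = dict(self.pages)

    def promote(self, label: str) -> None:
        # Revert the base blob to the point in time captured by the snapshot.
        self.pages = dict(self.snapshots[label])


disk = PageBlob()
disk.write_page(0, b"A" * PAGE_SIZE)
disk.snapshot("before-patch")
disk.write_page(0, b"B" * PAGE_SIZE)
disk.promote("before-patch")
print(disk.read_page(0)[:1])  # b'A'
```

Taking the snapshot copies a small index rather than the disk contents, which is the same intuition behind the near-instant copies discussed below.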
All page blobs are stored in a sparse format, so when you format your data drives, make sure you use the quick format option. This not only saves time on the format process but also saves you money, because you pay only for the storage that is actually used. When uploading VHDs to your storage account, you can take advantage of the sparse format by using a tool that understands it and uploads only the portions of the VHD that contain data. Try the Add-AzureVhd cmdlet, available in the Windows Azure PowerShell module. This can be another real time-saver.
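The idea behind a sparse-aware upload can be sketched in a few lines: scan the VHD image in 512-byte pages and send only the ranges that contain non-zero data. This is an illustrative sketch under that assumption, not the code behind Add-AzureVhd or any other tool; merging adjacent pages into larger ranges is simply a design choice to reduce the number of writes.

```python
# Sketch of a sparse-aware uploader's first step: find the byte ranges of a
# disk image that hold non-zero data. Not the implementation of any
# specific Azure tool.
from typing import List, Tuple

PAGE_SIZE = 512


def nonzero_ranges(image: bytes, page_size: int = PAGE_SIZE) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges, end exclusive, that hold non-zero data."""
    ranges: List[Tuple[int, int]] = []
    zero_page = bytes(page_size)
    for start in range(0, len(image), page_size):
        page = image[start:start + page_size]
        if page == zero_page[:len(page)]:
            continue  # empty page: nothing to upload
        end = start + len(page)
        if ranges and ranges[-1][1] == start:
            ranges[-1] = (ranges[-1][0], end)  # extend the previous range
        else:
            ranges.append((start, end))
    return ranges


image = bytearray(8 * PAGE_SIZE)      # 4KB image, all zeros
image[0:10] = b"bootsector"           # data in page 0
image[3 * PAGE_SIZE] = 1              # data in page 3
image[4 * PAGE_SIZE + 100] = 1        # data in page 4
ranges = nonzero_ranges(bytes(image))
uploaded = sum(end - start for start, end in ranges)
print(ranges)  # [(0, 512), (1536, 2560)]
print(uploaded, "of", len(image), "bytes")
```

A freshly quick-formatted disk is mostly zeros, so the list of ranges stays short and most of the file never crosses the wire.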
When copying large VHDs within your storage account, you will notice it is very fast, nearly instantaneous. This is because you are really only creating pointers, so you are not moving large amounts of data or consuming storage that you would need to pay for. From a performance perspective, when reading from or writing to a VHD, you can expect around 60 MB/sec or up to 500 transactions per second per VHD. As I mentioned earlier, you can stripe multiple VHDs together inside the OS to achieve not only larger disks inside a virtual machine but also higher throughput, up to 8K IOPS on a single virtual machine. If you do create striped volumes, you won’t want to geo-replicate this data: there is no guarantee that a stripe set will be intact if data has to be restored from the remote site.
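For a rough sense of how striping scales, the sketch below plugs in the figures quoted above (about 60 MB/sec and 500 transactions per second per VHD, up to 8K IOPS per virtual machine). It assumes perfectly linear scaling and models no VM-level bandwidth limit, so treat the results as upper bounds rather than guarantees.

```python
# Back-of-the-envelope estimate for a stripe set, using the per-disk figures
# from the text. Assumes perfectly linear scaling; real workloads may not
# reach these numbers, and no VM-level bandwidth cap is modeled.
PER_DISK_MBPS = 60
PER_DISK_IOPS = 500
VM_IOPS_CAP = 8000


def stripe_estimate(num_disks: int) -> dict:
    iops = min(num_disks * PER_DISK_IOPS, VM_IOPS_CAP)
    return {
        "disks": num_disks,
        "capacity_tb": num_disks * 1,               # 1TB maximum per page blob
        "iops": iops,
        "mbps_upper_bound": num_disks * PER_DISK_MBPS,
    }


for n in (1, 4, 16, 20):
    print(stripe_estimate(n))
```

With these numbers, 16 disks reach the 8K IOPS ceiling; adding more disks beyond that still adds capacity but, under this model, no additional IOPS.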
That wraps things up for this week. As always, check back because I will be covering new Azure topics each week. Feel free to suggest a topic you would like to see.