BlobSync Nuget package released

After much tinkering about with Blob Updating, I’ve decided to release a Nuget package and see if anyone is interested.

The source is available on Github and the theory behind the logic used is in a previous post. What I’d like to describe here is the practical use of the newly released Nuget package “BlobSync”.

The BlobSync library is targeted to be cloud agnostic but in reality Azure Blob Storage is currently the only implementation available. So, to begin with, create a Windows Console application and add the Nuget package “BlobSync” to the reference assemblies. At the time of writing the latest (and only) version available is 0.1.0.

Next we’ll need to modify the App.config to include the Azure account information from the Azure Portal. Open the app.config and add the entries:

<appSettings>

  <add key=”AzureAccountKey” value=”Your Account Key”/>
  <add key=”AzureAccountName” value=”Your Account Name” />
  <add key=”SignatureSize” value=”100000” />
</appSettings>

 

Hopefully the key and name settings are self explanatory, but lets dig a bit deeper into “SignatureSize”.

Whenever a blob is uploaded it will be broken into “SignatureSize” sized chunks (Blocks in Azure lingo). So say I have a file that is 250000 bytes in size, this means I’ll end up with 2x100k blocks as well as a 50k block (the remaining bytes simply make up a block of the appropriate size).

This means that when we attempt to upload a modified version of the file later on, we’ll have the option of replacing 1 or more of these blocks. Now, in this particular case we wont be saving much, but this is just the beginning. Say we deal with very large files, then simply uploading 100k as opposed to uploading the entire 300M file is definitely a saving worth considering.

We also have the option of reducing the SignatureSize to something else (anywhere between 1 byte and 4M in reality). If we want finer grain replacing then we can reduce the SignatureSize to 1k (for example) but we need to remember that Azure Block blobs can only be constructed of 50000 blocks. This means that if all our blocks are 1k in size then the maximum size our block blob can be is 1k * 50000 == 50M, ie not that big. I’ve found that 100k is a good starting point.

 

Now, to get coding…

 

We’re going to use the class “BlobSync.AzureOps” for this example. You can see by intellisense that there are 6 methods of potential interest. In reality the calling code should only ever be concerned with 2 of them, “DownloadBlob” and “UploadFile”. I think the names are pretty self explanatory.

So to upload a file to Azure Blob Storage, we can do the following:


var blobSyncClient = new BlobSync.AzureOps();
blobSyncClient.UploadFile("mycontainer", "myblob",
"c:\\temp\\myfile.txt");

Assuming you have a container called “mycontainer” and a local file “c:\temp\myfile.txt” then you’ll end up with 2 blobs in Azure Blob Storage. The first will be called “myblob” and this has the same contents as “myfile.txt”. The second will be called “myblob.0.sig”, which I’ll call the Signature Blob. This signature blob contains information about “myblob” which will be used when any further uploads or downloads occur.

 

Say you now modify “c:\temp\myfile.txt” and want to update the version in the blob.

You can now execute the exact same 2 lines as before and this time the BlobSync library will perform a number of tasks:

 

1) Checks to see if a signature blob exists.

2) Downloads signature file

3) Uses the information in the signature file to determine which parts of the local file have been modified (compared to the existing blob).

4) Uploads the changes to Azure Blob Storage.

5) Generates a new signature file and uploads it.

 

Now the blob and the local file should be identical but with the minimum data transferred over the wire.

 

Downloading works pretty well much the same.

 

If you modify the local version and then decide that you want the version in the blob, you simply run the code:


var blobSyncClient = new BlobSync.AzureOps();
blobSyncClient.DownloadBlob("mycontainer", "myblob", "c:\\temp\\myfile.txt");

 

Then BlobSync library will perform the steps:

1) Checks to see if a signature blob exists.

2) Downloads signature file

3) Uses the information in the signature file to determine which parts of the local file have been modified (compared to the existing blob).

4) Downloads only those blocks from Azure Blob Storage that are not already available in the local file.

5) Reconstructs the local file based on the changes downloaded.

 

Currently BlobSync is here to reduce bandwidth requirements and isn’t optimised for quickest transfers. In reality it probably is quicker but it does not go out of its way to make downloads/uploads parallel etc. This is something I’ll be adding in soon to speed things up.

If anyone has any improvements or suggestions, please leave a comment.

Advertisements

Optimising Azure Blob Updating Part 2

In my previous post I covered the high level theory for an efficient method of updating Azure Block Blobs. (This theory can be used against most cloud storage providers as long as their blob API’s allow partial uploads/downloads).

The implementation of what was described is now available on Github.

I’ll go through the implementation specifics in a later post, but for those who want to try out BlobSync, simply clone the github repo. Compile (I’m using Visual Studio 2013) and you’ll end up with a binary “BlobSyncCmd.exe”. Some examples of using the command are:

 

blobsynccmd upload c:\temp\myfile  mycontainer  myblob

 

This will upload the local “myfile” to Azure, in the appropriate container with the appropriate name.

Then, feel free to modify your local file and upload it again. If you use any network monitoring tools you should see a dramatic reduction in the uploaded bytes (assuming you don’t modify the entire file).

Equally you can run the command:

 

blobsynccmd download c:\temp\myfile mycontainer myblob

 

This will download the blob and reuse as much of “myfile” as it can. Currently for testing purposes it won’t replace myfile but will create myfile.new

 

More to come.

Small AzureCopy update.

Just a quick update to both AzureCopy executable as well as the Nuget package. Previously the Skydrive code considered “types” of elements as “files” or “folders”, catch is Skydrive labels things differently. Previously if I’d tried to copy a png from Skydrive to anywhere it would never get detected and copied due to the fact Skydrive says this isn’t a “file” but an “image”. This is now rectified in AzureCopy.

 

Enjoy.