Iceberg Example

In my previous post I examined how to use BlobSync to create a tool that not only uploads/downloads deltas to Azure Blob Storage (and hence saving LOTS of bandwidth), but also how to keep multiple versions in the cloud easily.

As a sample file for uploading/downloading I’ve picked the entire Sherlock Holmes collection. Big enough that it can show the benefits of dealing with deltas for bandwidth savings, but small enough that it can be easily edited (text).

Firstly, I perform the original upload.

image

Here you can see that the original sherlock file about 3.6M and for the initial upload the entire file is uploaded (indicated by the “Uploaded 3868221 bytes” message).

Then I list the blobs and it shows I only have 1 version (called “sherlock” as expected).

iceberg2

Now, I edit the sherlock file and modify a few lines here and there, and reupload it.

 

iceberg3

We can instantly see that this time the upload only transferred 100003 bytes. Which is about 2.6% of the original file size. Which is a nice saving.

Then we list the blobs associated with “sherlock” again. This time we see 2 versions:

  • sherlock 8/01/2015 11:36:09 AM +00:00
  • sherlock.v1 8/01/2015 11:36:01 AM +00:00

Here we see sherlock and sherlock.v1.  The original sherlock blob that was uploaded was renamed to sherlock.v1. The new sherlock uploaded is now the vanilla “sherlock” blob.

Note: The timestamps still need a little work. The ones displayed are when blobs were copied/uploaded. This means that sherlock.v1 doesn’t have the original timestamp when sherlock was originally uploaded but when it was copied from sherlock to sherlock.v1. But I can live with that for the moment.

Now, say I realise that I really want to have a copy of the original sherlock. The problem is that my local version has been modified. No problems, now I can tell update my local file with the contents of sherlock.v1 (remember, thats the original one I uploaded).

iceberg4

The download was 99k (again, not the 3.6M of the full file). In my case the c:\temp\sherlock is now updated to be the same as the blob sherlock.v1 (ie the original file). How can I be sure?

Well, I happen to have a spare copy of the original sherlock file on my machine (c:\temp\sherlock-orig), and you can see from my file compare (fc.exe) that the original sherlock and my newly updated local copy are the same.

Now I can upload/download deltas AND have multiple versions available to me for future reference.

So, what happens with all my backups I don’t want? Well, you can always load up any Azure Storage Explorer program and delete the blobs you don’t want. Or you can use Icerberg to prune them for you.

Say I’ve created a few more versions of sherlock.

iceberg5

But I’ve decided that I only want to keep the latest 2 backups (ignoring the most current one). ie I want to keep sherlock, sherlock.v2 and sherlock.v3.

I can issue the prune command as such:

iceberg6

Here I tell it prune all but the latest 2 backups of the sherlock blob. I list the blobs afterwards and you can indeed see that apart from the latest (sherlock) there are only the 2 latest backups.

I’m starting to look at using this for more of my own personal backups. Hopefully this may be of use to others.

Advertisements

Versioned backups using BlobSync

As previously described, the BlobSync library (Github, Nuget, Blog) can be used to update Azure block blobs without having to upload the entire file/blobs. It perform an intelligent delta calculation and uploads the minimal data possible.

So, what’s next?

To show possible use cases for BlobSync, this post will outline how it is easily possible to create a backup application that not only uploads the minimal data required but also keeps a series of backups so you can always restore a previously saved blob.

The broad design of the program is as follows:

  • Allow uploading (updating) of blobs.
  • Allow downloading (updating of local files) of blobs
  • Allow multiple versions of blobs to exist and prune what we don’t want.

For this I’m using Visual Studio 2013, other versions may work fine but YMMV. The version of BlobSync I’m using is the latest available at time of writing (0.3.0) and can be installed through Nuget as per any other package (for those who are new to Nuget, please see the Nuget documentation).

Of the three requirements listed above only the last one really adds any new functionality above BlobSync. For the upload/download I really am just using a couple of equivalent methods in BlobSync. For the multiple versions we need to figure out which approach to use.

What I decided on (and has been working well) is that for updating of an existing blob, the following process is used:

  • Each blob will have a piece of metadata which has the latest version number of the blob
  • On upload the existing blob is copied to another blob with the name <original blob name>.v<latest version number>. (along with paired signature blob)
  • New delta is uploaded against existing blob.

For example, say we have a blob called “myfile”. This means we also have a “myfile.0.sig” which is the paired signature blob.

When we upload a new version of myfile the following happens:

  • copy myfile to myfile.v.1
  • copy myfile.0.sig to myfile.v.1.0.sig
  • upload delta against myfile

This means that myfile is now the latest version and myfile.v.1 is the version that previously existed. If we repeat this process then again myfile will be the latest and what used to be myfile will now be myfile.v.2 and so on. It should be noted that the copying of the blobs is performed by the brilliantly useful Azure CopyBlob API which allows Azure it copy the blob itself and doesn’t require any traffic between the application and Azure Blob Storage. This is a BIG time saver!

Now that we’d have myfile, myfile.v.1 and myfile.v.2 we should also be able to use this new project to download any version of the file. More importantly be able to just download the deltas to reduce bandwidth usage (since that is the aim of the game).

So this is the high level design in mind…   you might want to look at the implementation.