AzureCopy update.

*EDIT* Fixed binary link

A few updates have been added to the AzureCopy program.

Firstly, when copying to Azure you can specify the -blobcopy parameter, which tells Azure itself to do the copying directly from the source location to the Azure datacentre. This means that if you’re copying from another internet location (say S3) to Azure, all the traffic flows between the two datacentres and NOT via your machine. Saves bandwidth, sounds good eh?

The “CopyBlob” API is asynchronous. This means that AzureCopy will return before the blobs have actually finished copying. This may be fine, but if you’d prefer to monitor the status of the blobs as they copy you can supply the -m flag to the command. This will force AzureCopy to wait and monitor the copying.
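For example, a server-side copy that also waits for completion would look something like this (the urls are just placeholders):

azurecopy -blobcopy -m -i s3url -o azureurl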

Another recent change has been the ability to specify the blob type if the destination is Azure Blob Storage (don’t forget, Azure has Page and Block blobs). This can be set with the -destblobtype flag (e.g. -destblobtype page or -destblobtype block). This is only used if the destination is Azure AND the source was NOT Azure (if the source was Azure then we use the blob type of the source).
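So copying from S3 to Azure and forcing the result to be a page blob would (again with placeholder urls) be something like:

azurecopy -blobcopy -destblobtype page -i s3url -o azureurl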

Version 0.5 has been released; the binaries are available here and the source here.

S3 to Azure migration.

A colleague recently asked me the best way to transfer a large amount of S3 data (> 50TB) onto Azure Blob Storage. I’d always thought that if the need ever arose I’d probably take one of three approaches:

  1. Ship HDDs to Amazon, have the data written onto them, then send the HDDs to the Azure guys to import.
  2. Use some little tool that reads S3 blobs and writes to Azure Blobs.
  3. Write a tool myself if the existing ones didn’t do what I wanted.

Turns out option 1 has a couple of “gotchas”. Firstly, buying the drives and getting Amazon to write the data onto them is rather expensive. Also, Azure doesn’t provide a matching bulk import feature (that I’m aware of). So scratch option 1.

For the second option, I’m aware that Microsoft unofficially provides a tool for this (AzCopy), but it doesn’t have all the options I require. Yes, I could pester (umm, humbly request) the guy who maintains AzCopy to add new features, but being a coder I prefer option 3 (besides, I don’t have access to the AzCopy source myself, so I can’t extend it).

For option 3 I’ve decided to start from scratch using C#. Currently it has a number of functioning features as well as a larger number of planned features.

Currently it can copy between Azure and S3 (in either direction) and can handle signed urls (i.e. the urls don’t have to be public). Although my primary aim is to help people move from S3 to Azure, either direction is possible.

The most basic command is:

azurecopy -i inputurl -o outputurl

This will download the blob from the input url to the local machine and then upload it to the output url (I should probably rename that to destination url). This works, but is cumbersome. By default, it will store blobs in memory (fine for small/medium blobs, but obviously not a good idea once we’re talking hundreds of megabytes). To address this, we can modify the command to be:

azurecopy -d "c:\temp\tempblobs" -i inputurl -o outputurl

This will download a copy of the blob into c:\temp\tempblobs and then upload it from that file location. Currently it doesn’t clean up the download directory afterwards.
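For the curious, the staging approach boils down to something like the following minimal C# sketch. It uses the Azure Storage client library for both ends purely to keep the example short (azurecopy’s actual implementation may differ, and its S3 side goes through the AWS SDK); the urls, SAS tokens and file path are all placeholders.

using System;
using System.IO;
using Microsoft.WindowsAzure.Storage.Blob;

class StagedCopySketch
{
    static void Main()
    {
        // Source and destination blobs addressed by url (SAS token placeholders shown as <sas>).
        var source = new CloudBlockBlob(new Uri("https://sourceaccount.blob.core.windows.net/container/myblob?<sas>"));
        var dest = new CloudBlockBlob(new Uri("https://destaccount.blob.core.windows.net/container/myblob?<sas>"));
        var tempFile = @"c:\temp\tempblobs\myblob";

        // Stage to disk rather than memory, so large blobs don't blow out the process heap.
        using (var stream = File.OpenWrite(tempFile))
        {
            source.DownloadToStream(stream);
        }

        // Then upload to the destination from the staged file.
        using (var stream = File.OpenRead(tempFile))
        {
            dest.UploadFromStream(stream);
        }
    }
}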

Another obvious shortfall is that if we’re copying from some other location to Azure (this particular scenario is focused on Azure), we don’t really need to copy from the source url to the local machine and then from the local machine to Azure. Fortunately, for this scenario Azure provides the wonderfully useful CopyBlob API. Essentially you tell Azure where the source of the blob is (the S3 url) and the destination (the Azure url), then leave it to Azure to do the copying directly. For example, we can do:

azurecopy -blobcopy -i s3url -o azureurl

This returns immediately and currently does NOT check whether the copy has completed (this will soon be rectified). But what it does do is free up all the bandwidth that would otherwise have been used between the cloud environments and the local machine.
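Under the hood this is just the storage service’s asynchronous copy operation. A rough sketch of what such a call looks like with the Azure Storage .NET client library is below; the urls are placeholders, the polling loop is only there to show how completion could be checked, and azurecopy’s own internals may differ.

using System;
using System.Threading;
using Microsoft.WindowsAzure.Storage.Blob;

class BlobCopySketch
{
    static void Main()
    {
        // Destination blob; the url needs a SAS token (placeholder <sas>) or otherwise writable access.
        var dest = new CloudBlockBlob(new Uri("https://destaccount.blob.core.windows.net/mycontainer/myblob?<sas>"));

        // Source can be any readable url, e.g. a pre-signed S3 object url.
        var sourceUri = new Uri("https://mys3.amazonaws.com/mybucket/myblob");

        // Kick off the server-side copy; this returns once the copy is scheduled, not when it has finished.
        dest.StartCopyFromBlob(sourceUri);

        // Optional: poll the copy state until the service reports the copy is no longer pending.
        dest.FetchAttributes();
        while (dest.CopyState.Status == CopyStatus.Pending)
        {
            Thread.Sleep(1000);
            dest.FetchAttributes();
        }
        Console.WriteLine("Copy finished with status: " + dest.CopyState.Status);
    }
}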

In addition to copying individual blobs, you can also copy many blobs in one command. As long as a container/bucket url is provided, all the blobs in that container/bucket will be listed and copied.

azurecopy -i https://mys3.amazonaws.com/ -o https://testazure.windows.net/mycontainer

In this case the input url ends in ‘/’, which tells azurecopy to list all blobs in that container/bucket and copy each one.
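The listing itself is straightforward. On the Azure side it amounts to something like the sketch below (using the Azure Storage client library; the connection string and container name are made up), and on the S3 side the AWS SDK’s ListObjects call plays the same role.

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class ListBlobsSketch
{
    static void Main()
    {
        // Placeholder connection string; substitute a real account name and key.
        var account = CloudStorageAccount.Parse(
            "DefaultEndpointsProtocol=https;AccountName=testazure;AccountKey=<key>");
        var container = account.CreateCloudBlobClient().GetContainerReference("mycontainer");

        // Flat listing of every blob in the container; each url here would then be fed into a copy.
        foreach (IListBlobItem item in container.ListBlobs(null, true))
        {
            Console.WriteLine(item.Uri);
        }
    }
}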

The source for azurecopy is available here and the binaries are available here.

WARNING: This project was only started about 3 days ago and is very much at the “works for me” stage. I’ll continue developing and enhancing it as required, and it’s free to use.