Virtual Directories

Over the last few days I’ve been expanding AzureCopy to handle S3 and Azure Virtual Directories (and I’m sure the concept will be useful with other cloud storage providers as well).

The idea behind Virtual Directories (I just can’t bring myself to say I’ve been dabbling with VD) is to imitate the appearance of a regular filesystem structure, i.e. many levels of directories and files.

Both S3 and Azure have a fairly flat way of looking at the world. S3 has its “bucket” into which you upload all your blobs: no “sub-buckets”, no directories; everything simply sits in the bucket.

Azure is a little more refined with its “container” concept, in that you can make many containers, each with its own set of blobs, but you can’t have containers within containers, i.e. all containers must sit off the root.

Fortunately both Azure and S3 provide an out-of-the-box way to create the illusion of a complex tree structure. Bring on Virtual Directories (Azure terminology, but the concept is the same for S3). All that’s really happening is that the blob name itself is allowed to contain the ‘/’ character.

Simple…..

So if I have the URL https://myazureacct.blob.core.windows.net/mycontainer/dir1/dir2/myfile.txt then what’s really happening is that I have a container called “mycontainer” but the blob name is really “dir1/dir2/myfile.txt”.
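To make that concrete, here’s a minimal sketch (assuming the Microsoft.WindowsAzure.Storage client library; the connection string and file name are placeholders) showing that there’s no “create directory” step anywhere. The ‘/’ characters are simply part of the blob name:

var connectionString = "UseDevelopmentStorage=true"; // or your real connection string
var account = CloudStorageAccount.Parse(connectionString);
var client = account.CreateCloudBlobClient();
var container = client.GetContainerReference("mycontainer");

// The "directories" exist only as part of the blob name.
var blob = container.GetBlockBlobReference("dir1/dir2/myfile.txt");
using (var stream = File.OpenRead("myfile.txt"))
{
    blob.UploadFromStream(stream);
}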

If I wanted to list all the blobs in the container I’d use the usual process of:

var container = client.GetContainerReference("mycontainer");
var blobList = container.ListBlobs();

But if I wanted to list all the blobs that are in “mycontainer/dir1”, then we have a couple of options. The first would be to run the above code and then filter the results manually ourselves, with code like:

var newBlobs = from blob in blobList
               where blob.Uri.AbsoluteUri.StartsWith(".../mycontainer/dir1")
               select blob;

This would work fine, but fortunately the Azure Storage library provides a built-in alternative. We can get the container, then ask it for a Virtual Directory within that container:

var container = client.GetContainerReference("mycontainer");
var vd = container.GetDirectoryReference("dir1");
var blobList = vd.ListBlobs();

This will list the blobs in the Virtual Directory “dir1”.
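One caveat: by default the listing is hierarchical, so anything a level deeper (“dir2” in the earlier example) comes back as a directory entry rather than as its individual blobs. If you want every blob under the prefix regardless of depth, the storage library (the 2.x client at least) can do a flat listing. A quick sketch:

// useFlatBlobListing: true returns every blob under the prefix,
// however deep, rather than stopping at the next virtual directory level.
var allBlobs = container.ListBlobs(prefix: "dir1/", useFlatBlobListing: true);
foreach (var item in allBlobs)
{
    Console.WriteLine(item.Uri); // e.g. .../mycontainer/dir1/dir2/myfile.txt
}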

Obviously this post has had an Azure slant to it, but S3 provides a very similar piece of functionality.
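For the curious, here’s a sketch of the S3 version (assuming the AWS SDK for .NET; the bucket name is made up). You supply a key prefix plus a delimiter: the “files” come back in S3Objects and the “subdirectories” come back as CommonPrefixes:

var s3Client = new AmazonS3Client(); // credentials picked up from config

var request = new ListObjectsRequest
{
    BucketName = "mybucket",
    Prefix = "dir1/",
    Delimiter = "/"
};

var response = s3Client.ListObjects(request);

foreach (var s3Object in response.S3Objects)
    Console.WriteLine(s3Object.Key); // e.g. "dir1/myfile.txt"

foreach (var subDir in response.CommonPrefixes)
    Console.WriteLine(subDir); // e.g. "dir1/dir2/"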

My only question now is…   when the AzureCopy APIs return a blob name…   what do we say? “myfile.txt” or “dir1/dir2/myfile.txt”?

Am still trying to figure that out…


AzureCopy API update.

The AzureCopy library Nuget package has hit a milestone of 103 downloads! (Probably 6 of which are mine, admittedly.) But it appears people are at least curious about what it can provide.

So to celebrate I’ve decided to change the API. Better to do it sooner rather than later, I believe. The changes aren’t breaking ones; I’ve been adding some methods to simplify the process of reading and writing blobs.

Up ‘til now every call had to deal with URLs, and URLs aren’t fun when they’re potentially long and complex. To rectify this I’ve started having the library itself generate the URLs, requiring the user to provide only minimal input. This changes the way AzureCopy is used, but not in any critical fashion.

Using URLs meant you could (in theory) specify any URL for any Azure/S3/Skydrive account you liked. Of course, in practice your app.config file has the login details for only specific accounts, so this flexibility was never really there. AzureCopy now has the option of providing a base URL to the constructors of the various IBlobHandler implementations. This base URL is then used behind the scenes for constructing the full URLs at runtime.

e.g. if I supplied the base URL http://kenfaulkner.blob.core.windows.net and then started to copy blob ABC from container XYZ, the library would simply concatenate the details in the right order to get the correct URL.
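To illustrate (this is a purely hypothetical helper, not the actual AzureCopy internals):

// Hypothetical sketch of the base URL + container + blob concatenation.
string MakeUrl(string baseUrl, string container, string blobName)
{
    return string.Format("{0}/{1}/{2}", baseUrl.TrimEnd('/'), container, blobName);
}

// MakeUrl("http://kenfaulkner.blob.core.windows.net", "XYZ", "ABC")
//   => "http://kenfaulkner.blob.core.windows.net/XYZ/ABC"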

This means that an example I wrote earlier is still valid, but now there is an easier way.

var s3Url = "https://testabc123.s3-us-west-2.amazonaws.com";
var azureUrl = "https://myblobstorage.blob.core.windows.net";

var sourceHandler = new S3Handler(s3Url);
var targetHandler = new AzureHandler(azureUrl);

var blob = sourceHandler.ReadBlob("", "test.png");

targetHandler.WriteBlob("temp", blob);

This means that manual URLs need to be used only when creating a new instance of an IBlobHandler. In the above case it’s saying: copy “test.png” from my S3 account. The “” indicates the container to copy from (in this case it just means the root container). The blob will be copied to Azure, specifically into my “temp” container.

On a side note:

Speaking of containers, I’m still debating how to handle “fake” directories in S3. Keeping with what people are used to with S3, I think I’ll follow the herd and just concatenate the “directory” names into the blob name and pretend it’s a directory. Ugly, but it’s the status quo.

AzureCopy assembly how-to…

Now that AzureCopy has been split into a client executable and an associated dll, it’s possible to use the dll in your own applications. In this example I’ll show how to write a simple application that copies files from S3 to Azure Blob Storage. (Again, for practical reasons I’d suggest using the azurecopy executable itself… but hey, for those that want to code, below are the steps required.)

Firstly, learn to love Nuget. It’s the one-stop shop for all your .NET assemblies and other dependencies. Assuming you know how to use it (if not, please see the docs), add the “azurecopy” reference to your application. (In my case I’ll be creating a simple console app.)

The AzureCopy dll has a number of key classes that implement the same simple interface. IBlobHandler is simply:

public interface IBlobHandler
{
    Blob ReadBlob(string url, string filePath = "");

    void WriteBlob(string url, Blob blob, int parallelUploadFactor = 1, int chunkSizeInMB = 4);

    List<string> ListBlobsInContainer(string baseUrl);
}

I think the interface is fairly self-explanatory.

– ReadBlob reads the blob stored at a given URL (with an optional filePath if you want to cache it in local storage before copying it to another cloud location).

– WriteBlob writes a blob to a given URL. Uploading can be performed in parallel for some cloud providers; you can specify the level of parallelism as well as how large those parallel chunks should be (see the sketch after this list). These features will be expanded in future releases.

– ListBlobsInContainer, given a URL, lists the blob contents of that container (non-recursively).
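As a quick example of those optional WriteBlob parameters (the URLs, blob name and sizes here are made up):

// Read from S3, then write to Azure using 4 parallel chunks of 8MB each.
var s3Handler = new S3Handler();
var azureHandler = new AzureHandler();

var blob = s3Handler.ReadBlob("https://testabc123.s3-us-west-2.amazonaws.com/big.bin");

azureHandler.WriteBlob("https://myblobstorage.blob.core.windows.net/temp/",
                       blob,
                       parallelUploadFactor: 4,
                       chunkSizeInMB: 8);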

Currently AzureCopy reads all its configuration from an App.Config file. Ideally this configuration would be injected by the client app, but for now App.Config works fine.

Beyond the usual App.Config boilerplate, the entries needed are:

<add key="AzureAccountKey" value="" />
<add key="AWSAccessKeyID" value="" />
<add key="AWSSecretAccessKeyID" value="" />
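
For completeness, a minimal App.Config (with the keys left blank) would look something like:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <add key="AzureAccountKey" value="" />
    <add key="AWSAccessKeyID" value="" />
    <add key="AWSSecretAccessKeyID" value="" />
  </appSettings>
</configuration>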

The AzureAccountKey can be retrieved from the Azure Portal; likewise, the AWS (S3) details come directly from Amazon’s portal.

Now to the code. There are only 2 lines of “real” code: one to read the blob and one to write it (and hopefully you can figure out which is which).

var sourceHandler = new S3Handler();
var targetHandler = new AzureHandler();

var s3Url = "https://testabc123.s3-us-west-2.amazonaws.com/test.png";
var azureUrl = "https://myblobstorage.blob.core.windows.net/temp/";

var blob = sourceHandler.ReadBlob(s3Url);

targetHandler.WriteBlob(azureUrl, blob);

In this case I want to copy from the S3 URL and write to the Azure URL. The blob is stored in the S3 bucket “testabc123” (made up) and is written to the Azure Storage account “myblobstorage” (again, made up), into the “temp” container.

This is just a starter on how to use the AzureCopy assembly. In future posts I’ll cover more complex scenarios, such as copying multiple files and asking Azure to do the copying for you (so you don’t use any bandwidth!), among other things.

AzureCopy now on Nuget!

After a bit of refactoring I’ve now split azurecopy into an executable project as well as a core dll that does most of the work. The dll is now available via Nuget (search for azurecopy) for anyone who wants to start using the library in their own projects.

An example of how to use the library can be taken directly from the azurecopy executable project (https://github.com/kpfaulkner/azurecopy/blob/master/azurecopycommand/Program.cs).

At its simplest, you need to create two instances of BlobHandlers (AzureHandler, S3Handler, SkydriveHandler or FileSystemHandler). Each of these handlers implements the IBlobHandler interface, which provides the basic functionality (read, write, list).

For example you can do:

var azureBlobHandler = new AzureHandler();
var s3BlobHandler = new S3Handler();

var blob = s3BlobHandler.ReadBlob(<my s3 url...>);

azureBlobHandler.WriteBlob(<my azure url>, blob);

And that will simply copy from S3 to Azure.

The AzureCopy assembly uses an App.Config file to get its configuration. The client executable referencing this dll will need to provide the file. Eventually the configuration will be injected from the executable into the client lib, but due to recent refactoring that hasn’t been done yet.

As usual the source is here, the downloadable executable is here, and now the Nuget library is here.