AzureCopy (Go version) pre-release

As mentioned in previous posts, I’ve been writing a Go version of AzureCopy so people would have something that works cross-platform (Linux, MacOS and Windows). Today I’ve released the first pre-release just to test the waters. It supports Windows only (simply because I haven’t compiled for the other platforms yet), and only supports the local filesystem, S3 and Azure Blob Storage.

Baby steps.

The plan is to build for Linux and OSX, then start adding other cloud platforms. Meanwhile the original AzureCopy (Windows only, full .NET Framework) will still be developed (mainly from a NuGet/library point of view). If you just need an executable to perform copying, then I suggest using this newer version.

Some examples of using this newer version:

[Image: console output listing the contents of the testken123 S3 bucket]

In this case we’re just listing the contents of my testken123 (super secret) bucket. My AccessID and AccessSecret are passed in via command-line options. The output is a basic tree structure (I’ll add a bog-standard list format soon). In the above case the top of the tree is “testken123”, which is the bucket name. Under that we have 2 virtual directories (remember Azure/S3 etc. do not really have directories but fake them by using / as a delimiter). In this case there is a blob called “ken1/test1”, which treats the “ken1” part as a directory and “test1” as the blob name. The same applies to all the other results. Simple enough.
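
As a rough illustration of that delimiter trick (a minimal sketch, not the actual azurecopy code), splitting a blob key on the last “/” gives the virtual directory and the blob name:

package main

import (
    "fmt"
    "strings"
)

func main() {
    // "ken1/test1" is a single blob; the "/" only fakes a directory structure.
    blobKey := "ken1/test1"
    if idx := strings.LastIndex(blobKey, "/"); idx != -1 {
        fmt.Println("virtual directory:", blobKey[:idx])  // ken1
        fmt.Println("blob name:", blobKey[idx+1:])        // test1
    } else {
        fmt.Println("blob name:", blobKey) // no virtual directory at all
    }
}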

Then we have:

[Image: console output of copying from the local filesystem to the testken123 S3 bucket]

In this case we’re copying from my local filesystem (c:\temp\data\s3\) into the S3 bucket testken123. The console output is just to show what is going to be copied. Output will be modified to show progress.

Finally we have:

[Image: console output of copying from Azure Blob Storage to S3]

That’s copying from Azure Blob Storage to S3. Same deal, basic output.

For every command it is possible to pass the “-debug” flag. This makes things VERY verbose but is extremely useful for figuring out issues.

This is just a first step, pre-release, uber new version. Please give it a go and let me know if there are any issues. The plan is to start cranking out changes pretty frequently.

0.1.0 version


AzureCopy GO

The Go version of AzureCopy is slowly making progress. So far I’ve just been focusing on the local filesystem and Azure (since I can do those while offline on the train commute, thanks to the Azure Storage Emulator). The next plan is S3 integration, primarily because S3 -> Azure seems to be the big use case for the original AzureCopy.

I’m planning on frequent releases once the basic S3 code is added (hopefully within the next few days). Not all features from the original AzureCopy will be available; I’ll simply be focusing on 1) listing content and 2) copying content. There will be a few new additions, such as a “don’t overwrite” flag so copies can be continued after being stopped (which has been requested by a few people).

Of course, the original AzureCopy will still be developed (mainly from a NuGet packaging point of view) but if you just need a command line tool to copy (and maybe need it on multiple platforms) then this new version is probably the way to go.

Hopefully the S3 code will drop in a few days then I’ll have a first binary release for Linux, MacOS and Windows, and see how things proceed from there.

Let’s Go Profiling

Recently I’ve been doing some experimenting with the Go profiler (on Windows, YMMV for other platforms). This is both the best and worst profiler I’ve ever used. Firstly, I want to address the bad part (since that’s relatively small compared to the great stuff).

Not so good

The profiler is a sampling profiler (as many are these days) but I’ve found that in many runs of the profiler, entire chunks of code are being missed. In my particular case I’m trying to profile the put/get queue methods of the Azure SDK. I have a single method that puts a few thousand messages onto the queue then retrieves them.

In the resulting profile run the puts are recorded but the gets are completely missed. I rerun the test (literally “up arrow, enter”) and I’ll get both puts and gets. Turns out one recommended way to improve the situation is to allow the profiler to run for a longer period of time. This definitely improved things but I still often had cases where entire code branches were simply missed :/

I haven’t tried it on any platform other than Windows, so I’m not sure whether this is a common issue (I doubt it) or just a Windows-specific one.

The good stuff.

The simple command “web” is fricken awesome. It produces an SVG that gives you a tree of function calls with the appropriate statistics (memory/CPU usage).

The way we generate the profile information is by running the go executable in testing and benchmark mode. For example, for profiling the Azure SDK (queuing in particular) I run the command:

go test -bench=. -benchtime=10s -run=XXX -cpuprofile=prof10s.cpu

What this does is run the test files (ie the *_test.go files) and also run the benchmark methods in those files (any method whose name starts with “Benchmark” and which accepts the parameter *testing.B).

I’ve also told it to run for a minimum of 10 seconds. The result is a profile file called prof10s.cpu.
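
For reference, a benchmark method of the kind go test picks up looks roughly like this. This is a minimal sketch living in a *_test.go file; putMessage and getMessage are hypothetical stand-ins for the real Azure SDK queue calls, not the actual test code:

package storage

import "testing"

// Hypothetical stand-ins for the real queue put/get calls being profiled.
func putMessage(queue, msg string) error      { return nil }
func getMessage(queue string) (string, error) { return "", nil }

// Any method named Benchmark* taking *testing.B is run by "go test -bench=.".
func BenchmarkQueuePutGet(b *testing.B) {
    for i := 0; i < b.N; i++ {
        if err := putMessage("testqueue", "hello"); err != nil {
            b.Fatal(err)
        }
        if _, err := getMessage("testqueue"); err != nil {
            b.Fatal(err)
        }
    }
}

The file just needs to sit alongside the package’s other *_test.go files for go test to pick it up.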

To load it, simply run: go tool pprof .\storage.test.exe .\prof10s.cpu

Note, storage.test.exe was produced during the test/benchmark run.

Now things get interesting. Say we want the top 10 CPU-hungry function calls; the command “top -cum 10” can be used. In my case the results were:

[Image: “top -cum 10” output]

This information, although useful in theory, mostly shows me results for the functions that set up the tests, output results and so on, but not really the function that actually puts messages onto the queue. For that, we can specify where the profiler should focus, such as:

top -cum PutMessage

[Image: “top -cum PutMessage” output]

Here we can see more useful function calls, such as the ones in storage.Client.*; these are the ones I want to see the performance of.

Now, I’m a simple soul… (just ask my wife) and pictures paint a much nicer view of things. For that we can use the “web” command, or more specifically in this case “web PutMessage”. This generates a lovely SVG which makes things really clear to the user. A small snippet of the SVG looks like this:


[Image: snippet of the SVG produced by the “web PutMessage” command]

This is just a small snippet of a far larger graph, but you can clearly see the major code paths of the “PutMessage” function: where the time goes (in ms), plus the nice big bold boxes that subtly shout “LOOK HERE!!!”. This is very useful!

Others have blogged far more extensively than I ever will about the subject (eg Dave Cheney). I’m only starting out on Go let alone profiling, but it’s a seriously nice place to be.

GO-ing like a scalded… never mind

My personal Go projects have been temporarily put aside while I’ve been contributing to the Azure Go SDK. I haven’t had this much fun coding in ages. One pull request accepted and 4 currently being reviewed, not bad for a Go newbie. I have to say the Microsofties running that repo have been very patient with me, particularly around doing non-Go-ish ™ things in Go. Maybe I should have written more of my own stuff first, but I thought jumping in the deep end would be more exciting.

What I’ve learned:

  • Returning error instances as opposed to exceptions really isn’t that big a deal. A little more boilerplate code, but I can live with that.
  • More about godoc, which is so damn useful. Make sure you use it!
  • You can defer anonymous functions as opposed to just running a particular function. This is great if you want to capture the error of the deferred call (see the sketch after this list).
  • So much more about the general HTTP classes, existing Azure Go SDK and various string util functions.
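
To illustrate the defer point above, here’s a minimal sketch (assumed, not code from the SDK) of deferring an anonymous function so the error from the deferred Close can be captured via a named return value:

package main

import (
    "fmt"
    "os"
)

func writeFile(path string) (err error) {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    // Deferring an anonymous function (rather than f.Close directly) lets us
    // inspect and capture the error returned by Close.
    defer func() {
        if cerr := f.Close(); cerr != nil && err == nil {
            err = cerr
        }
    }()
    _, err = f.WriteString("hello")
    return err
}

func main() {
    fmt.Println(writeFile("example.txt"))
}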

Given I’m an Azure storage nutter, I’ve been completely focusing on Blob Storage issues for now; hopefully I’ll start work on some Table Storage issues soon. This is so damn enjoyable :)

Go… the adventure continues

So far so good in my Go exploring. I’m still mainly focusing on my Go port of AzureCopy, and I’ve got to say it’s been heaps of fun.

Firstly, looking back at my previous blog post to see what I’ve actually done against my planned outline:

  • Dev environment. Check… definitely up and working fine. An important note for Visual Studio Code users (on Windows at least): make sure you follow the VERY useful instructions on StackOverflow if you want to get debugging (via Delve) working.
  • Basic solution. Check…   the structure is fairly different from the original AzureCopy but this isn’t anything to do with C# vs Go… it’s purely down to experience and knowing what works and what would work better. The newer structure I have with the Go version could easily be applied to any other language, but for now I’m not going to start changing my C# version.
  • Haven’t hit the local filesystem yet… but have started with Azure. So far I can list containers and read blobs (again, all in the new structure). VERY happy with the results.


Ok, overall very happy with Go, but my complaints (which I think are just the usual bunch of complaints I’m seeing with people new to Go) are:

  • No exceptions, just heaps of bloody err != nil checking. Seems tedious… but I’m sure I’ll understand WHY they chose this (eventually)
  • Debugging with Delve isn’t that easy yet. I swear the debugger jumps about the place a little. So far I’m mostly relying on log files rather than real-time inspection of variables in a debugger.

Coding in Go really does give me a “retro rush” which makes me feel like I’m back in the late 90s coding C. I’m REALLY enjoying it. Yes, it can seem primitive (compared to C#) but at the same time it feels pure, simple enough to fit in my memory. This is good! :)

The adventure continues….

Adventures in GO!

I’ve dabbled (ok ok, writing and rewriting “hello world” many times) in Go for a few years but have never really given it a serious Go (boom boom!). But after buying Go in Action and going through a number of great Pluralsight courses (particularly by Nigel Poulton and Mike Van Sickle) I’ve decided to give it another crack.

Instead of going through various tutorials I’ve decided to try porting (well, more likely rewriting from scratch) my AzureCopy project. The original AzureCopy is all C# running on the .NET Framework 4.*. Although I DO (well, did until recently) want to get it migrated to .NET Core, I thought this would be a good chance to learn Go PROPERLY.

I’m still trying to get my head around OO in a “kinda-is, kinda-isn’t, sorta, maybe” OO language like Go. Going back to structs (ahh glory days of C/C++), interfaces and having the magic of pointers back is really giving me a nostalgia kick.
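
A tiny sketch of what that “kinda-is, kinda-isn’t” OO looks like (hypothetical names, not the actual azurecopy types): a struct, methods on a pointer receiver, and an interface that is satisfied implicitly:

package main

import "fmt"

// CloudHandler is a hypothetical interface a cloud-provider handler might satisfy.
type CloudHandler interface {
    ListContainers() []string
}

// AzureHandler is a plain struct; there is no "implements" keyword anywhere.
type AzureHandler struct {
    accountName string
}

// Methods hang off a (pointer) receiver rather than living inside a class.
func (h *AzureHandler) ListContainers() []string {
    return []string{"container1", "container2"}
}

func main() {
    var handler CloudHandler = &AzureHandler{accountName: "myaccount"}
    fmt.Println(handler.ListContainers())
}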

The rough outline for this AzureCopy rewrite is basically as follows:

  • Get my dev environment sorted out (currently VSCode)
  • Basic solution structure sorted, rough architecture
  • Be able to copy to/from the local filesystem to Azure Blob Storage
  • List blobs/containers in Azure
  • Add S3
  • Add DropBox
  • Add OneDrive

Really don’t think I’ll bother with SharePoint this time around; it was a bitch to maintain in the existing version.

I’m unsure what the Go support is like for those cloud providers. I know the Azure one seems mostly there (well, for the stuff I need) but I get the distinct impression it’s the poor cousin to .NET, Java, Python etc. I’ve yet to investigate S3’s Go offerings. Hopefully if these libs aren’t in great shape I might get a chance to finally get my name on a contributors list somewhere. :)

I’m sure my Go will suck… but I’m hoping it will get better. The new version of AzureCopy is of course on GitHub.

Azure Table Storage, a revisit

It’s been a while since I used Azure Table Storage (ATS) in anger. Well, kind of. I use it most days for various projects but it’s been about 18 months since I tried performing any bulk loads/querying/benchmarking etc.

The reason for my renewed interest is that a colleague mentioned that as they added more non indexed parameters to their ATS query, it was slowing down in a major fashion. This didn’t tally with my previous experience. So I wondered. Yes, if you don’t query ATS via the Partition Key and Row Key (ie the indexed fields) then it gets a lot slower, but everything is relative, right? It might be a lot slower, but it could still be “quick enough” for your purpose. So, I’ve coded some very very basic tests and tinkered.
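
For a rough idea of the difference (filter syntax only; these are not the actual test queries), a fully indexed lookup versus the kind of partially indexed query tested below look something like this:

$filter=PartitionKey eq 'partition1' and RowKey eq 'row0000123'      (both fields indexed)
$filter=PartitionKey eq 'partition1' and Field1 eq 'apple'           (Field1 not indexed, so the partition is scanned)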

My dummy data consisted of 2 million rows, split equally over 5 partitions. Each entity consisted of 10 fields (including partition key and row key). The fields (excluding row and partition) were 2 x strings, 2 x ints, 2 x doubles and 2 x dates. Currently my tests only focus on the strings and 1 of the ints (but I’ll be performing more tests in the near future).

The string fields were populated with a random word selected from a pool of 104. The ints and doubles were random values between 0 and 1 million. The datetimes were random dates between 1 Jan 1970 and 1 Jan 2016. I repeat, the doubles and dates have not yet been included in any tests.
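
A rough sketch of generating one of those dummy entities (in Go for consistency with the rest of these posts; hypothetical names, not the actual test code):

package main

import (
    "fmt"
    "math/rand"
    "time"
)

// In the real tests the pool contained 104 words.
var wordPool = []string{"apple", "banana", "cherry"}

// testEntity mirrors the schema described above: partition/row keys plus
// 2 strings, 2 ints, 2 doubles and 2 dates.
type testEntity struct {
    PartitionKey, RowKey string
    String1, String2     string
    Int1, Int2           int
    Double1, Double2     float64
    Date1, Date2         time.Time
}

// randomDate returns a random date between 1 Jan 1970 and 1 Jan 2016.
func randomDate() time.Time {
    max := time.Date(2016, 1, 1, 0, 0, 0, 0, time.UTC).Unix()
    return time.Unix(rand.Int63n(max), 0).UTC()
}

func randomEntity(partition, row int) testEntity {
    return testEntity{
        PartitionKey: fmt.Sprintf("partition%d", partition),
        RowKey:       fmt.Sprintf("row%07d", row),
        String1:      wordPool[rand.Intn(len(wordPool))],
        String2:      wordPool[rand.Intn(len(wordPool))],
        Int1:         rand.Intn(1000000),
        Int2:         rand.Intn(1000000),
        Double1:      rand.Float64() * 1000000,
        Double2:      rand.Float64() * 1000000,
        Date1:        randomDate(),
        Date2:        randomDate(),
    }
}

func main() {
    fmt.Printf("%+v\n", randomEntity(1, 1))
}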

I tested a few different types of queries, starting simple and getting slightly more complex with each change. First there is the query that takes the partition key and a value for one of the string fields. Interestingly, the results were:

Query                                        Average time
Partition key + field1                       4706 ms
Partition key + field1 + field2              5368 ms
Partition key + field1 + field2 + field3     7232 ms

So, despite only one field (the partition key) being indexed, adding the other fields into the search didn’t make Azure Table Storage completely unusable or slow. Yes, it *almost* doubled the query time, but it wasn’t the huge difference that my colleague had experienced.

One thing to remember, although I created 2 million rows, these were split over 5 partition keys, so in effect the above queries were *really* only going over 400k rows.

More tests to follow…..