AzureCopy GO

The Go version of AzureCopy is slowly making progress. So far I’ve been focusing on the local filesystem and Azure (since I can work on those offline during the train commute, thanks to the Azure Storage Emulator). Next up is S3 integration, primarily because S3 -> Azure seems to be the big use case for the original AzureCopy.

I’m planning on frequent releases once the basic S3 code is added (hopefully within the next few days). Not all features from the original AzureCopy will be available; the focus will simply be on 1) listing content and 2) copying content. There will be a few new additions, such as a “don’t overwrite” flag so that copies can be resumed after being stopped (something a few people have requested).

Of course, the original AzureCopy will still be developed (mainly from a NuGet packaging point of view), but if you just need a command line tool to copy (and maybe need it on multiple platforms), then this new version is probably the way to go.

Hopefully the S3 code will drop in a few days; then I’ll have a first binary release for Linux, macOS and Windows, and I’ll see how things proceed from there.


Let’s Go Profiling

Recently I’ve been experimenting with the Go profiler (on Windows; YMMV on other platforms). It is both the best and the worst profiler I’ve ever used. First, I want to address the bad part (since it’s relatively small compared to the great stuff).

Not so good

The profiler is a sampling profiler (as many are these days), but I’ve found that in many profiling runs entire chunks of code are missed. In my case I’m trying to profile the put/get queue methods of the Azure SDK. I have a single method that puts a few thousand messages onto the queue and then retrieves them.
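The test method is roughly of the following shape. To keep the sketch self-contained, Queue and memQueue are hypothetical stand-ins, not the Azure SDK API; in the real test the Put and Get calls would go to the storage client (backed by the Storage Emulator).

```go
package main

import (
	"fmt"
	"strconv"
)

// Queue is a hypothetical stand-in for the storage queue client.
type Queue interface {
	Put(msg string) error
	Get() (msg string, ok bool, err error) // ok is false when the queue is empty
}

// memQueue is an in-memory FIFO used only so the sketch runs offline.
type memQueue struct{ items []string }

func (q *memQueue) Put(msg string) error {
	q.items = append(q.items, msg)
	return nil
}

func (q *memQueue) Get() (string, bool, error) {
	if len(q.items) == 0 {
		return "", false, nil
	}
	msg := q.items[0]
	q.items = q.items[1:]
	return msg, true, nil
}

// putThenGet puts n messages onto q, then drains it, returning how many
// messages were retrieved.
func putThenGet(q Queue, n int) (int, error) {
	for i := 0; i < n; i++ {
		if err := q.Put("message " + strconv.Itoa(i)); err != nil {
			return 0, err
		}
	}
	got := 0
	for {
		_, ok, err := q.Get()
		if err != nil {
			return got, err
		}
		if !ok {
			return got, nil // queue drained
		}
		got++
	}
}

func main() {
	got, err := putThenGet(&memQueue{}, 3000)
	fmt.Println(got, err)
}
```

Writing the method against a small interface means a real client can be swapped in without touching the put/get logic.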

In one profile run the puts are recorded but the gets are completely missed. I rerun the test (literally “up arrow, enter”) and I get both puts and gets. It turns out one recommended way to improve the situation is to let the profiler run for longer. This definitely improved things, but I still often had cases where entire code branches were simply missed :/

I haven’t tried it on any platform other than Windows, so I don’t know whether this is a common issue (I doubt it) or just an OS-specific one.

The good stuff

The simple “web” command is fricken awesome. It produces an SVG showing a nice tree of function calls with the appropriate statistics (memory/CPU usage). It needs Graphviz installed to render the graph.

The profile information is generated by running the go tool in test and benchmark mode. For example, to profile the Azure SDK (queuing in particular) I run:

go test -bench=. -benchtime=10s -run=XXX -cpuprofile=prof10s.cpu

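This builds the test binary from the test files (i.e. the *_test.go files) and runs the benchmark functions in them (any function whose name starts with “Benchmark” and which accepts a *testing.B parameter). Because -run takes a regular expression and XXX matches none of the test names, the ordinary Test functions are skipped and only the benchmarks run.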
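A minimal benchmark following that naming rule is sketched below. BenchmarkJoin and its strings.Join workload are hypothetical placeholders for the real queue calls; in a project the function would live in a *_test.go file. The main function uses testing.Benchmark, which runs a benchmark outside “go test”, only so that the sketch runs as a plain program.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// BenchmarkJoin follows the rule go test looks for: the name starts with
// "Benchmark" and the only parameter is *testing.B. The workload is a
// placeholder for whatever is actually being measured.
func BenchmarkJoin(b *testing.B) {
	parts := []string{"put", "get", "delete"}
	// The framework chooses b.N, raising it until the benchmark has run
	// for long enough (1s by default, or whatever -benchtime says).
	for i := 0; i < b.N; i++ {
		_ = strings.Join(parts, ",")
	}
}

func main() {
	// Registers the testing flags; documented as needed when calling
	// functions such as Benchmark without "go test".
	testing.Init()
	res := testing.Benchmark(BenchmarkJoin)
	fmt.Println(res.N, "iterations,", res.NsPerOp(), "ns/op")
}
```

The b.N loop is the important part: since the framework keeps increasing b.N until the time budget is used up, a longer -benchtime means more iterations and therefore more profile samples.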

The -benchtime=10s flag makes each benchmark run for at least 10 seconds, and -cpuprofile writes the result to a profile file called prof10s.cpu.

To load it, simply run: go tool pprof .\storage.test.exe .\prof10s.cpu

Note that storage.test.exe (the compiled test binary) was produced during the test/benchmark run.

Now things get interesting. For example, to get the top 10 function calls by cumulative CPU time, use the command “top -cum 10”. In my case the results were:

[Screenshot: output of top -cum 10]
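One reason outer functions crowd the top of a cumulative listing: a sampling profiler records the whole call stack at each sample. A function’s flat count is the number of samples in which it was the running function, while its cumulative (-cum) count is the number of samples in which it appears anywhere on the stack, so callers such as main and the benchmark harness always score high. The sketch below models this with a simplified, hypothetical Sample type (not pprof’s real data format) and made-up function names.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is one profiler sample in this simplified model: the call stack
// at that moment, innermost (currently running) function first.
type Sample []string

// flatAndCum returns, per function, the flat count (samples where it was
// the running function) and the cumulative count (samples where it appears
// anywhere on the stack, counted once per sample even when recursive).
func flatAndCum(samples []Sample) (flat, cum map[string]int) {
	flat = map[string]int{}
	cum = map[string]int{}
	for _, s := range samples {
		if len(s) == 0 {
			continue
		}
		flat[s[0]]++
		seen := map[string]bool{}
		for _, fn := range s {
			if !seen[fn] {
				seen[fn] = true
				cum[fn]++
			}
		}
	}
	return flat, cum
}

// topCum returns function names sorted by cumulative count, highest first
// (ties broken alphabetically so the order is deterministic).
func topCum(cum map[string]int) []string {
	names := make([]string, 0, len(cum))
	for fn := range cum {
		names = append(names, fn)
	}
	sort.Slice(names, func(i, j int) bool {
		if cum[names[i]] != cum[names[j]] {
			return cum[names[i]] > cum[names[j]]
		}
		return names[i] < names[j]
	})
	return names
}

func main() {
	// Made-up stacks, innermost function first.
	samples := []Sample{
		{"encode", "PutMessage", "BenchmarkQueue", "main"},
		{"send", "PutMessage", "BenchmarkQueue", "main"},
		{"send", "GetMessages", "BenchmarkQueue", "main"},
		{"printResults", "main"},
	}
	flat, cum := flatAndCum(samples)
	for _, fn := range topCum(cum) {
		fmt.Printf("%-15s flat=%d cum=%d\n", fn, flat[fn], cum[fn])
	}
}
```

Here main and BenchmarkQueue head the cumulative list even though their flat counts are zero, which is the same effect as set-up functions dominating the real output.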

Although this information is useful in theory, it shows functions that set up the tests, output results, etc., rather than the functions that actually put messages onto the queue. To fix that, we can tell the profiler where to focus, for example:

top -cum PutMessage

[Screenshot: output of top -cum PutMessage]

Here we can see more useful function calls, such as the ones in storage.Client.*; these are the ones whose performance I want to see.

Now, I’m a simple soul… (just ask my wife) and pictures give a much nicer view of things. For that we can use the “web” command, or more specifically in this case “web PutMessage”. This generates a lovely SVG that makes things really clear. A small snippet of the SVG looks like this:

 

[Screenshot: part of the SVG call graph produced by web PutMessage]

This is just a small snippet of a far larger graph, but you can clearly see the major code paths of the “PutMessage” function: where the time goes (in ms), and the nice big bold boxes that subtly shout “LOOK HERE!!!”. This is very useful!

Others have blogged about the subject far more extensively than I ever will (e.g. Dave Cheney). I’m only just starting out with Go, let alone profiling, but it’s a seriously nice place to be.