Recently I’ve been experimenting with the Go profiler (on Windows, YMMV for other platforms). This is both the best and worst profiler I’ve ever used. First, I want to address the bad part (since it’s relatively small compared to the great stuff).
Not so good
The profiler is a sampling profiler (as many are these days) but I’ve found that in many runs of the profiler, entire chunks of code are being missed. In my particular case I’m trying to profile the put/get queue methods of the Azure SDK. I have a single method that puts a few thousand messages onto the queue then retrieves them.
In the resulting profile run the puts are recorded but the gets are completely missed. I rerun the test (literally “up arrow, enter”) and get both puts and gets. It turns out one recommended way to improve the situation is to let the profiler run for a longer period of time. This definitely improved things, but I still often had cases where entire code branches were simply missed.
I haven’t tried it on any platform other than Windows, so I don’t know whether this is a common issue (I doubt it) or just an OS-specific one.
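A rough sketch of why a longer run might help, assuming the profiler is sampling at Go's default rate of about 100 samples per second of CPU time. The on-CPU times below are made-up numbers for illustration, not measurements from the Azure SDK, and I haven't confirmed this is what caused my missing gets. The idea is simply that code which spends very little time actually executing (as opposed to waiting on the network) can easily end up with zero samples.

```go
package main

import "fmt"

// sampleRateHz is the Go CPU profiler's default sampling rate
// (roughly 100 samples per second of CPU time).
const sampleRateHz = 100.0

// expectedSamples estimates how many samples land in a piece of code that
// spends onCPUMillis milliseconds actually executing. Time spent blocked
// (for example, waiting on a network reply) is not counted.
func expectedSamples(onCPUMillis float64) float64 {
	return onCPUMillis / 1000 * sampleRateHz
}

func main() {
	// Hypothetical on-CPU times, not measurements from the Azure SDK.
	for _, ms := range []float64{5, 50, 2000} {
		fmt.Printf("%6.0f ms on CPU -> about %.1f samples\n", ms, expectedSamples(ms))
	}
}
```

With an expected count below one sample, whether a function shows up at all is down to luck, which would fit the "up arrow, enter" behaviour I saw.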
The good stuff
The simple command of “web” is fricken awesome. It produces a nice SVG that gives you a nice tree of function calls with the appropriate statistics (memory/CPU usage).
The way we generate the profile information is by running the go executable in testing and benchmark mode. For example, for profiling the Azure SDK (queuing in particular) I run the command:
go test -bench=. -benchtime=10s -run=XXX -cpuprofile=prof10s.cpu
What this does is build the test files (i.e. the *_test.go files) and run the benchmark functions in them (any function whose name starts with “Benchmark” and accepts a *testing.B parameter). The -run=XXX pattern matches none of my test names, so the regular tests are skipped and only the benchmarks run.
I’ve also told it to run each benchmark for a minimum of 10 seconds (-benchtime=10s). The result is a profile file called prof10s.cpu.
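For anyone who hasn't written one, here is a minimal sketch of what a benchmark function looks like. Everything in it is hypothetical: fakeQueue is an in-memory stand-in I made up for illustration, not the Azure SDK's queue client, and BenchmarkPutGet is not the benchmark I actually ran. In a real project the Benchmark function would live in a *_test.go file and be picked up by go test; the main function here only exists so the sketch runs on its own via testing.Benchmark.

```go
package main

import (
	"fmt"
	"testing"
)

// fakeQueue is a hypothetical in-memory FIFO queue standing in for a real
// queue client, so the example has no network dependency.
type fakeQueue struct{ msgs []string }

// Put appends a message to the back of the queue.
func (q *fakeQueue) Put(m string) { q.msgs = append(q.msgs, m) }

// Get removes and returns the message at the front of the queue.
// The boolean is false when the queue is empty.
func (q *fakeQueue) Get() (string, bool) {
	if len(q.msgs) == 0 {
		return "", false
	}
	m := q.msgs[0]
	q.msgs = q.msgs[1:]
	return m, true
}

// BenchmarkPutGet has the shape go test looks for: the name starts with
// "Benchmark" and it takes a *testing.B. The body runs b.N times.
func BenchmarkPutGet(b *testing.B) {
	q := &fakeQueue{}
	b.ResetTimer() // exclude setup from the measured time
	for i := 0; i < b.N; i++ {
		q.Put("hello")
		if _, ok := q.Get(); !ok {
			b.Fatal("queue unexpectedly empty")
		}
	}
}

func main() {
	// Run the benchmark directly, outside of go test, and print the result.
	r := testing.Benchmark(BenchmarkPutGet)
	fmt.Println(r)
}
```

The testing package picks b.N itself, raising it until the benchmark has run for long enough, which is what the -benchtime flag controls when running under go test.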
To load it, simply run: go tool pprof .\storage.test.exe .\prof10s.cpu
Note, storage.test.exe was produced during the test/benchmark run.
Now things get interesting. Say you want the top 10 CPU-hungry function calls: the command “top -cum 10” can be used. In my case the results were:
Although this information is useful in theory, it shows me results for the functions that set up the tests, output the results, etc., but not really the function that actually puts messages onto the queue. For that, we can tell the profiler where to focus, such as:
top -cum PutMessage
Here we can see more useful function calls, such as the ones in storage.Client.*; these are the ones whose performance I want to see.
Now, I’m a simple soul… (just ask my wife) and pictures paint such a nicer view on things. For that we can use the “web” command, or more specifically for this case “web PutMessage”. This generates a lovely SVG which makes things really clear to the user. A small snippet of the SVG looks like this:
This is just a small snippet of a far larger graph, but you can clearly see the major code paths of the “PutMessage” function: where the time goes (in ms), and also the nice big bold boxes that subtly shout “LOOK HERE!!!”. This is very useful!
Others have blogged far more extensively than I ever will about the subject (e.g. Dave Cheney). I’m only starting out on Go, let alone profiling, but it’s a seriously nice place to be.