Azure Table Storage, a revisit

It’s been a while since I used Azure Table Storage (ATS) in anger. Well, kind of. I use it most days for various projects but it’s been about 18 months since I tried performing any bulk loads/querying/benchmarking etc.

The reason for my renewed interest is that a colleague mentioned that as they added more non-indexed parameters to their ATS query, it slowed down dramatically. This didn’t tally with my previous experience, so I wondered. Yes, if you don’t query ATS via the Partition Key and Row Key (i.e. the indexed fields) then it gets a lot slower, but everything is relative, right? It might be a lot slower, but it could still be “quick enough” for your purpose. So I’ve coded some very basic tests and tinkered.

My dummy data consisted of 2 million rows, split equally over 5 partitions. Each entity consisted of 10 fields (including partition key and row key). The fields (excluding row and partition) were 2 x strings, 2 x ints, 2 x doubles and 2 x dates. Currently my tests only focus on the strings and 1 of the ints (but I’ll be performing more tests in the near future).

The string fields were populated with a random word selected from a pool of 104 words. The ints and doubles were random values between 0 and 1 million. The dates were random dates between 1 Jan 1970 and 1 Jan 2016. I repeat, the doubles and dates have not yet been included in any tests.
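To make the shape of the dummy data concrete, here is a minimal Python sketch of how one such entity could be generated. It is not the code I used for the tests: the field names (Field1 to Field8), the partition names, the row key format and the placeholder word pool are all made up for illustration.

```python
import random
from datetime import datetime, timedelta, timezone

# Assumption: placeholder pool standing in for the real 104 words.
WORD_POOL = [f"word{i}" for i in range(104)]
# Assumption: five hypothetical partition key values.
PARTITIONS = [f"partition{i}" for i in range(5)]

START = datetime(1970, 1, 1, tzinfo=timezone.utc)
END = datetime(2016, 1, 1, tzinfo=timezone.utc)


def random_date(rng):
    """Pick a random moment between 1 Jan 1970 and 1 Jan 2016."""
    span_seconds = int((END - START).total_seconds())
    return START + timedelta(seconds=rng.randrange(span_seconds))


def make_entity(index, rng):
    """Build one 10-field entity; round-robin keeps the partitions equal in size."""
    return {
        "PartitionKey": PARTITIONS[index % len(PARTITIONS)],
        "RowKey": f"{index:07d}",
        "Field1": rng.choice(WORD_POOL),       # string
        "Field2": rng.choice(WORD_POOL),       # string
        "Field3": rng.randint(0, 1_000_000),   # int
        "Field4": rng.randint(0, 1_000_000),   # int
        "Field5": rng.uniform(0, 1_000_000),   # double
        "Field6": rng.uniform(0, 1_000_000),   # double
        "Field7": random_date(rng),            # date
        "Field8": random_date(rng),            # date
    }


rng = random.Random(42)  # fixed seed so the sample is repeatable
sample = [make_entity(i, rng) for i in range(10)]
```

Scaling the range up to 2 million would give 400k entities per partition with this round-robin assignment.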

I tested a few different types of queries, starting simple and getting slightly more complex with each change. The first query takes the partition key and a value for field1 (a string). Interestingly, the results were:

Partition key and field1: 4706ms (avg)
Partition key, field1 and field2: 5368ms (avg)
Partition key, field1, field2 and field3: 7232ms (avg)
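For anyone wondering what these three queries look like, the sketch below builds the kind of OData-style filter strings that Table Storage accepts. The property names (Field1, Field2, Field3), the partition name and the search values are hypothetical, and I'm assuming here that field3 is one of the ints. The helper only handles string and integer values.

```python
def quote(value):
    """Render a str or int as a filter literal (strings get single quotes)."""
    if isinstance(value, str):
        # A single quote inside a string literal is escaped by doubling it.
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def build_filter(partition_key, **conditions):
    """Combine the partition key with extra equality checks using 'and'."""
    clauses = [f"PartitionKey eq {quote(partition_key)}"]
    clauses += [f"{name} eq {quote(value)}" for name, value in conditions.items()]
    return " and ".join(clauses)


# The three query shapes from the results above, with made-up values.
one_field = build_filter("partition0", Field1="apple")
two_fields = build_filter("partition0", Field1="apple", Field2="banana")
three_fields = build_filter("partition0", Field1="apple", Field2="banana", Field3=500000)

print(three_fields)
```

Only the PartitionKey clause can use the index; every extra clause has to be evaluated against each entity in that partition.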

So, despite the partition key being the only indexed field in the query, adding the other fields to the search didn’t make Azure Table Storage unusably slow. Yes, the three-field query took roughly 54% longer than the one-field query (7232ms vs 4706ms), but that wasn’t the huge difference my colleague had experienced.
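To put a number on the slowdown, here is a quick bit of arithmetic on the three averages above. The dictionary labels are just mine for readability.

```python
# Average query times from the tests above, in milliseconds.
timings_ms = {
    "field1": 4706,
    "field1 + field2": 5368,
    "field1 + field2 + field3": 7232,
}

baseline = timings_ms["field1"]

# Percentage increase of each query over the single-field query.
increase_pct = {
    label: (ms - baseline) / baseline * 100
    for label, ms in timings_ms.items()
}

for label, pct in increase_pct.items():
    print(f"{label}: +{pct:.1f}%")
```

That works out to roughly 14% slower with two fields and roughly 54% slower with three.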

One thing to remember, although I created 2 million rows, these were split over 5 partition keys, so in effect the above queries were *really* only going over 400k rows.
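As a rough sanity check, the rows-per-partition figure and an implied scan rate can be worked out as below. The scan rate assumes every row in the partition gets examined for each query, which is my assumption rather than something I measured.

```python
TOTAL_ROWS = 2_000_000
PARTITION_COUNT = 5

# Rows a single-partition query has to consider.
rows_per_partition = TOTAL_ROWS // PARTITION_COUNT

# Average time of the simplest query (partition key and field1), in seconds.
simplest_query_seconds = 4706 / 1000

# Assumption: the whole partition is scanned, so this is rows examined per second.
rows_per_second = rows_per_partition / simplest_query_seconds

print(rows_per_partition)
print(round(rows_per_second))
```

So the simplest query is getting through something in the region of 85k rows per second.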

More tests to follow…