It’s been a while since I used Azure Table Storage (ATS) in anger. Well, kind of. I use it most days for various projects but it’s been about 18 months since I tried performing any bulk loads/querying/benchmarking etc.
The reason for my renewed interest is that a colleague mentioned that as they added more non-indexed properties to their ATS query, it slowed down dramatically. This didn’t tally with my previous experience. So I wondered. Yes, if you don’t query ATS via the Partition Key and Row Key (i.e. the indexed fields) then it gets a lot slower, but everything is relative, right? It might be a lot slower, but it could still be “quick enough” for your purpose. So, I’ve coded some very basic tests and tinkered.
My dummy data consisted of 2 million rows, split equally over 5 partitions. Each entity consisted of 10 fields (including partition key and row key). The fields (excluding row and partition) were 2 x strings, 2 x ints, 2 x doubles and 2 x dates. Currently my tests only focus on the strings and 1 of the ints (but I’ll be performing more tests in the near future).
The string fields were populated with a random word selected from a pool of 104. The ints and doubles were random between 0 and 1 million. The dates were random between 1 Jan 1970 and 1 Jan 2016. I repeat, the doubles and dates have not yet been included in any tests.
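For anyone wanting to build a similar data set, here is a minimal Python sketch of the entity layout described above. It is not the code used for these tests. The property names (`Field1` to `Field8`), the partition names, the row key format and the placeholder word list are all assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta, timezone

# Hypothetical names throughout: the post does not give property or partition names.
WORDS = [f"word{i}" for i in range(104)]          # stand-in for the pool of 104 words
PARTITIONS = [f"partition{i}" for i in range(5)]  # 5 partitions, filled round-robin
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
END = datetime(2016, 1, 1, tzinfo=timezone.utc)
SPAN_SECONDS = int((END - EPOCH).total_seconds())


def random_date(rng):
    """Return a random UTC datetime in [1 Jan 1970, 1 Jan 2016)."""
    return EPOCH + timedelta(seconds=rng.randrange(SPAN_SECONDS))


def make_entity(index, rng):
    """Build one 10-field entity: 2 keys, 2 strings, 2 ints, 2 doubles, 2 dates."""
    return {
        "PartitionKey": PARTITIONS[index % len(PARTITIONS)],
        "RowKey": f"{index:07d}",
        "Field1": rng.choice(WORDS),
        "Field2": rng.choice(WORDS),
        "Field3": rng.randint(0, 1_000_000),
        "Field4": rng.randint(0, 1_000_000),
        "Field5": rng.uniform(0, 1_000_000),
        "Field6": rng.uniform(0, 1_000_000),
        "Field7": random_date(rng),
        "Field8": random_date(rng),
    }


def make_entities(count, seed=0):
    """Yield `count` entities, spread equally over the partitions."""
    rng = random.Random(seed)
    for index in range(count):
        yield make_entity(index, rng)
```

Calling `make_entities(2_000_000)` would give 400k entities per partition. Uploading them is left out here, since that depends on which SDK you use.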
I tested a few different types of queries, starting simple and getting slightly more complex with each change. The first query takes the partition key and a value for field1 (a string); each later query adds one more non-indexed field. Interestingly, the results were:
| Query filter | Average time |
| --- | --- |
| Partition key and field1 | 4706 ms |
| Partition key, field1 and field2 | 5368 ms |
| Partition key, field1, field2 and field3 | 7232 ms |
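To make the three query shapes concrete, here is a rough sketch of how the filters could be written as OData filter strings, which is the syntax Table Storage queries use. The property names `Field1`, `Field2` and `Field3`, the partition name and the sample values are hypothetical, and this is not the code behind the timings above.

```python
def odata_literal(value):
    """Format a Python value as an OData literal (strings and 32-bit ints only)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        # Plain digits are fine for Int32 values such as 0 to 1 million.
        return str(value)
    if isinstance(value, str):
        # Single quotes inside a string literal are escaped by doubling them.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def build_filter(partition_key, **conditions):
    """Combine the partition key with equality checks on other properties."""
    clauses = [f"PartitionKey eq {odata_literal(partition_key)}"]
    clauses += [f"{name} eq {odata_literal(value)}" for name, value in conditions.items()]
    return " and ".join(clauses)


# The three query shapes from the table, using made-up values.
print(build_filter("partition0", Field1="apple"))
print(build_filter("partition0", Field1="apple", Field2="pear"))
print(build_filter("partition0", Field1="apple", Field2="pear", Field3=42))
```

Only the `PartitionKey` clause can use the index. The other clauses are checked against each entity in that partition, which is why every extra field adds to the query time.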
So, despite only one field (partition key) being indexed, adding the other fields into the search didn’t make Azure Table Storage unusably slow. Yes, it increased the query time by roughly half (7232ms against 4706ms, about 54% more) but that wasn’t the huge difference that my colleague had experienced.
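As a quick check of the arithmetic, this small Python snippet works out the slowdown of each query relative to the simplest one, using only the averages reported above. The function name `slowdown` is just for illustration.

```python
# Average timings in milliseconds, as reported in the results above.
timings_ms = {
    "Partition key and field1": 4706,
    "Partition key, field1 and field2": 5368,
    "Partition key, field1, field2 and field3": 7232,
}


def slowdown(ms, baseline_ms):
    """Return how many times slower `ms` is than `baseline_ms`."""
    return ms / baseline_ms


baseline = timings_ms["Partition key and field1"]
for label, ms in timings_ms.items():
    ratio = slowdown(ms, baseline)
    print(f"{label}: {ms} ms, {ratio:.2f}x baseline ({100 * (ratio - 1):+.0f}%)")
```

The second query comes out at about 1.14x the baseline and the third at about 1.54x.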
One thing to remember, although I created 2 million rows, these were split over 5 partition keys, so in effect the above queries were *really* only going over 400k rows.
More tests to follow…