BlobSync now has parallel uploading of the binary deltas to Azure Blob Storage. Sounds like an obvious improvement (which I’ll continue to expand/improve) but wanted to make sure all the binary delta edge cases were working before adding tasks/threads into the mix. Currently the parallel factor is only 2 (this will be soon configurable) but it’s enough to prove it works. There have been some very tough bugs to squash since the 0.2.2 release, particularly around very small adjustments (byte or two) at the end of files being updated. These were being missed out previously, this is now fixed.
A small design change is how BlobSync uses small signatures when trying to determine how to match against new content. The problem is when we should and should NOT reuse small signatures.
For example (sorry for dodgy artwork), say we have a blob with some small signatures contained in it:
Then we extend the blob and during the update process we need to see if we have any existing signatures that can be reused in the new area:
The problem we have is that if these small signatures are a few bytes in size and they’re trying to find matches in the new area (yellow) there is a really good chance that they’ll get a match. After all, there are only 256 values to a byte! So what we’ll end up with is a new area that is potentially reusing a lot of small signatures instead of making a new block/signature and uploading the new data. Now strictly speaking we usually want to reuse as many signatures/blocks as we can but the problem with using so many tiny blocks is that we’ll soon fragment our blobs so much that we’ll end up not being able to update properly. Don’t forget a blob can only consist of 50000 blocks maximum.
So a rule BlobSync 0.3.0 has added is that if the byte range we’re looking at (yellow above) is greater than 1000 bytes and the block/signature we’re looking at is greater than 100 bytes then we’ll attempt to match OR if the byte range and the signature are exactly the same size. This way we’ll hopefully reduce the level of fragmentation and only add the volume of data being uploaded by a small percentage.
Sigexplorer has also been improved when you want to view the signatures being generated. Instead of rendering all signatures at once in the tree structure it will simply populate the “branches” as the user clicks on them. This reduces the load time significantly and makes the entire experience much quicker.