MySQL Compression
There are a lot of instances where it’s extremely convenient to blobs of data in a MySQL database. It’s not necessarily the most efficient storage, but it definitely has convenience points.
Speaking of convenience, MySQL actually includes compression functions that you can use right in your queries, allowing you to store the data compressed, but sill have introspection abilities without any special interface. Unfortunately, however, those functions aren’t directly compatible with, say, the PHP gzip extension, so you’re stuck packing all of the compression load on your perhaps already overloaded database.
That is, until now. A few days ago I was facing this exact problem at work, and with a little goading from the DBAs (i.e., they said it was “impossible”), I determined that I’d fix it, regardless of what it took.
After a few skinny dips into both the MySQL and gzip-extension source code, I discovered that MySQL was prepending a long integer onto the beginning of the compressed string. Luckily, this is easy to replicate in PHP:
<?php // compress compatible with MySQL UNCOMPRESS(...) $compressed = pack('L', strlen($string)).gzcompress($string); // uncompress compatible with MySQL COMPRESS(...) $uncompressed = gzuncompress(substr($compressed, 4)); ?>
Compression for all!
What about the possible space character at the end of the uncompressed string? Mysql documentation says: “If the string ends with space, an extra “.” character is added to avoid problems with endspace trimming…”
This will break in your case, wouldn’t it?
Olaf, while that could cause issues for the code above, it only occurs if you’re storing the compressed string in a CHAR or VARCHAR column. And the manual states:
Use of nonbinary string data types such as CHAR or VARCHAR to store compressed strings is not recommended … because character set conversion may occur. Use a VARBINARY or BLOB binary string column instead.
We’ve definitely ran into problems where I work by storing compressed data in a VARCHAR column, so I’d personally reiterate their warning.
This got me on the right track, but compress code needs to be the following to be compatible with MySQL UNCOMPRESS():
$compressed = pack(‘L’, strlen($string)).gzcompress($string);
Shaun: You’re correct, the code as listed before was incorrect. I must have copied it over wrong. I’ve updated the post; thanks for commenting!
pack(‘L’, strlen($string)) doesn’t work for me. I use pack(‘V’, strlen($string)). Any way thanks a lot for your example.
Carw: your comment raises a good point. The ‘L’ format of pack uses the machine byte-order, whereas N and V force big and little endian byte-orders (respectively). If you’re using different architectures between the machines running PHP and the machines running your database, you may need to explicitly specify the endianness.
Thanks for commenting!