Web Compression

I’ve alluded before to using gzip compression on web servers. HTML is very compressible, so servers that move tremendous amounts of text/HTML would see a major reduction in bandwidth. (Images and the like would not see much of a benefit, as they’re already compressed.)

As an example, I downloaded the main page of Wikipedia, retrieving only the HTML and none of the supporting elements (graphics, stylesheets, external JavaScript). It’s 53,190 bytes. (This, frankly, isn’t a lot.) After running it through “gzip -9” (strongest compression), it’s 13,512 bytes, just shy of a 75% reduction in size.
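For anyone who wants to repeat the measurement on their own pages, here is a minimal Python sketch. It assumes you already have the page's HTML as bytes; `compression_report` is a name made up for this example, and `gzip.compress` at level 9 stands in for running “gzip -9” by hand, so byte counts may differ slightly from the command-line tool.

```python
import gzip


def compression_report(data: bytes, level: int = 9) -> dict:
    """Gzip `data` in memory and report the sizes and the fraction saved."""
    if not data:
        raise ValueError("need at least one byte to compute a ratio")
    compressed = gzip.compress(data, compresslevel=level)
    return {
        "original": len(data),
        "compressed": len(compressed),
        "saved": 1 - len(compressed) / len(data),
    }


# Sanity check on the figures quoted above: 53,190 bytes down to 13,512.
print(f"{1 - 13512 / 53190:.1%}")  # prints 74.6%
```

Feeding it the bytes of any saved HTML file should give a similar picture, though the exact saving depends heavily on how repetitive the markup is.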

There are a few problems with gzip, though:

  • Not all clients support it, although frankly, I think most do. This isn’t a huge deal, though, as the client and server “negotiate” the content encoding (the client lists what it accepts in its Accept-Encoding request header), so gzip will only be used if it’s supported.
  • Not all servers support it. I don’t believe IIS supports it at all, although I could be wrong. Apache/PHP will merrily do it, but it has to be enabled, which means that lazy server admins won’t turn it on.
  • Although it really shouldn’t work that way, it looks to me as if it will ‘buffer’ the whole page then compress it, then send it. (gzip does support ‘streaming’ compression, just working in blocks.) Thus if you have a page that’s slow to load (e.g., it runs complex database queries that can’t be cached), it will appear even worse: users will get a blank page and then it will suddenly appear in front of them.
  • There’s overhead involved, so it looks like some admins keep it off due to server load. (Aside: it looks like Wikipedia compresses everything, even dynamically-generated content.)
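Two of the points above, the negotiation and the buffering, can be sketched in Python. This is only an illustration under assumptions, not how Apache or PHP actually implement it: `client_accepts_gzip` and `gzip_stream` are hypothetical names, and the header parsing is deliberately simplistic (it ignores wildcards such as `*`). The streaming half uses the standard `zlib` module with `wbits=31`, which produces gzip framing, and flushes after every chunk so the client receives data as soon as it is generated instead of waiting for the whole page.

```python
import zlib


def client_accepts_gzip(accept_encoding: str) -> bool:
    """Return True if an Accept-Encoding header value allows gzip."""
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.strip().lower().replace(" ", "")
        if params.startswith("q="):
            # "gzip;q=0" means the client explicitly refuses gzip.
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def gzip_stream(chunks):
    """Compress an iterable of byte chunks, yielding output per chunk."""
    # wbits=31 asks zlib for a gzip header and trailer.
    compressor = zlib.compressobj(9, zlib.DEFLATED, 31)
    for chunk in chunks:
        # Z_SYNC_FLUSH pushes out everything compressed so far, so a slow
        # page can start arriving before the later chunks even exist.
        yield compressor.compress(chunk) + compressor.flush(zlib.Z_SYNC_FLUSH)
    # The final flush finishes the stream and writes the gzip trailer.
    yield compressor.flush()
```

Flushing after every chunk costs a little compression, since each flush ends a block early, so a real server would probably flush only at sensible boundaries (say, after the page header and after each slow query).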

But I’ve come across something interesting: a hardware gzip compression card, apparently capable of handling 3 Gbits/second. I can’t find it for sale anywhere, nor any mention of a price, but I think it would be interesting to set up a sort of Squid proxy that would sit between clients and the back-end servers, seamlessly compressing outgoing content to save bandwidth.
