An Image Idea

Some of my favorite posts are the ones with images. I like to sometimes post photos I take, and they can really make a post much better. (As an example, Kyle’s recent headphones post.)

There are some ‘risks’ with allowing images, though:

  • Offsite images can change. You might post a little picture you found somewhere, and have the image on that server be replaced by a 1600×1200 porn image. Or an advertisement. Etc. Not a big deal with the way people are using them now, really.
  • Offsite images can slow the page down. Lately I’ve been working on benchmarking the site a lot, trying to get pages to load quickly. I can’t optimize the load times of images that aren’t on my server, though.
  • Offsite images can be used for tracking. The remote site gets the IP address, and lots of other information, of each visitor. This is probably a non-issue here, but images can be, and are, used for tracking purposes all the time.
  • I can’t really ‘regulate’ images: You could post a dozen 1600×1200 images as uncompressed TIFFs, and there’s nothing I can do about it. (Well, I could, and would, edit your post…)

It just occurred to me, though, that I could theoretically write some code to work around these issues, such as by doing the following:

  • Get the text from the database to display. (This is, of course, what happens so far.)
  • Scan the text for image tags.
  • If an image tag is found, see if we have the image cached already:
    • If so, we just change the image tag to point to our local cache instead.
    • If not, the server can go and download the image into its cache.
      • It can then ‘process’ it as needed, such as scaling it down and making sure it’s not an animated GIF.
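
The steps above could be sketched roughly like this in Python. This is only an illustration, not code the site runs: the cache directory, the URL prefix, the regular expression, and the `fetch` callback (which would do the downloading and post-processing) are all assumptions made up for the example.

```python
import hashlib
import re
from pathlib import Path
from urllib.parse import urlparse

CACHE_DIR = Path("image-cache")      # assumed local cache directory
CACHE_URL_PREFIX = "/image-cache/"   # assumed public URL for that directory

# Captures: everything up to the opening quote of src, the URL, the closing quote.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=["\'])([^"\']+)(["\'])', re.IGNORECASE)


def cache_name(url):
    """Derive a stable local file name from the remote URL."""
    suffix = Path(urlparse(url).path).suffix.lower() or ".img"
    return hashlib.sha1(url.encode("utf-8")).hexdigest() + suffix


def rewrite_images(html, fetch, cache_dir=CACHE_DIR):
    """Point every offsite <img> at the local cache, fetching on a miss.

    `fetch(url, dest)` is a hypothetical callback that downloads the image
    to `dest` and post-processes it (scaling, flattening animated GIFs...).
    """
    cache_dir = Path(cache_dir)

    def replace(match):
        url = match.group(2)
        if not url.startswith(("http://", "https://")):
            return match.group(0)        # already local: leave it alone
        name = cache_name(url)
        dest = cache_dir / name
        if not dest.exists():
            fetch(url, dest)             # cache miss: go and get it
        return match.group(1) + CACHE_URL_PREFIX + name + match.group(3)

    return IMG_SRC.sub(replace, html)
```

Hashing the URL for the file name sidesteps collisions between two different sites that both serve a `photo.jpg`. A real version would probably want a proper HTML parser rather than a regular expression.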

An even better extension of this idea would be to enclose the image tag in links to the full original. (Although this falls apart if the image is already linked.)
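
One possible way to handle the already-linked case is to match whole existing links first and leave them untouched, wrapping only the bare image tags. The sketch below assumes reasonably tidy HTML; the function name and the patterns are invented for the example.

```python
import re

# Match a whole existing link first so any image inside it is skipped;
# otherwise match a bare <img> tag.
LINK_OR_IMG = re.compile(r'<a\b.*?</a>|<img\b[^>]*>', re.IGNORECASE | re.DOTALL)
SRC = re.compile(r'\bsrc=["\']([^"\']+)["\']', re.IGNORECASE)


def link_to_original(html):
    """Wrap each unlinked <img> in a link to its original src."""
    def wrap(match):
        tag = match.group(0)
        if tag[:2].lower() == "<a":
            return tag                   # already inside a link: keep as-is
        src = SRC.search(tag)
        if src is None:
            return tag                   # no src attribute: nothing to link to
        return '<a href="%s">%s</a>' % (src.group(1), tag)

    return LINK_OR_IMG.sub(wrap, html)
```

This would have to run before the image tag is repointed at the local cache, so that the link still goes to the original URL.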

I guess there are a few issues (besides taking the time to implement it):

  • There may be legal issues, as I’m essentially saving and redisplaying someone else’s images. I don’t think this would really be a big deal.
  • The post-processing can’t look like crap. I have no idea what to expect.
  • It would raise the server’s bandwidth usage. If someone links to a bunch of images, the bandwidth comes from that server. When they’re hosted here, it’s my bandwidth. But since I’ve been coming about 999 GB short of hitting my 1,000 GB limit, this isn’t a big issue right now. (Also, half the goal is to reduce the size of the images, so the impact wouldn’t be as big.) In extreme cases, it would also increase resource usage: normally serving up a couple small images is peanuts, but if the site were to be hammered with traffic, it’d slow things down somewhat.
  • We need to somehow limit the size of the cache. This can be done simply, by just setting a limit on how large the cache can grow and deleting the oldest images when it exceeds that size. This isn’t a perfect solution, though; for example, it has the implicit assumption that newer images are more important to cache. This is probably accurate more often than not, but it’s not always the case. (Example: an old post with images is linked to from other sites, or comes up a lot in searches.)
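
The simple "delete the oldest images" approach might look something like the following. It is a sketch only, assuming the cache is a flat directory of files and that "oldest" means oldest modification time; `prune_cache` and its parameters are names made up for the example.

```python
from pathlib import Path


def prune_cache(cache_dir, max_bytes):
    """Delete the oldest files until the cache fits within max_bytes.

    Returns the list of paths that were removed.
    """
    files = [(p, p.stat()) for p in Path(cache_dir).iterdir() if p.is_file()]
    files.sort(key=lambda item: item[1].st_mtime)    # oldest first
    total = sum(st.st_size for _, st in files)

    removed = []
    for path, st in files:
        if total <= max_bytes:
            break                                    # small enough now
        path.unlink()
        total -= st.st_size
        removed.append(path)
    return removed
```

The sort key is the one line to change if a different notion of "least important" turns out to work better than age.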

5 thoughts on “An Image Idea”

  1. I’m in a contemplative mood…

    It’d be nice to process the post images as the post was being saved, rather than each time it got displayed (or cache the processed version). It would, however, require some magic to detect cache misses (special 404 handler for the images directory?).
    The cache garbage collection could rely on last accessed time, not creation/modification time, although this might prevent you from just using the file-system and add overhead to each image view. (Although you could also post-process the access logs.)
    As you’re adjusting the image you could also add some sort of copyright notice/original URL to the image itself. This would probably ease legal concerns.
    If you served the images with something like Lighty, resource usage would be minimal.
    Speaking of large amounts of traffic, you might want to think about some sort of locking mechanism, too: no sense in having two threads fetch the same image at the same time.
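
The locking suggestion could be sketched along these lines, assuming the fetching really does happen in threads within a single process (separate server processes would need something like a lock file instead). `fetch_once` and the `fetch` callback are hypothetical names for the example.

```python
import threading

_locks = {}                        # one lock per image URL
_locks_guard = threading.Lock()    # protects the _locks dictionary itself


def _lock_for(key):
    """Return the lock for this key, creating it on first use."""
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())


def fetch_once(url, dest, fetch):
    """Call fetch(url, dest) only if dest is still missing once we hold the lock.

    `dest` is expected to be a pathlib.Path (anything with .exists() works).
    """
    with _lock_for(url):
        if not dest.exists():      # re-check: another thread may have won
            fetch(url, dest)
    return dest
```

Threads asking for different images do not block one another; only requests for the same URL queue up. The lock table is never cleaned out here, which a long-running server would want to address.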
