Ultimate Boot CD

Ultimate Boot CD saves the day again! This time, my 500 GB drive with lots of important stuff backed up to it randomly stopped being detected. Windows saw it as a raw, unformatted disk, and Linux wouldn’t mount it, citing disk problems.

Of course, I had some problems at first… It’s a 500 GB drive, which puts it well past the old 137 GB (28-bit LBA) limit. It’s also mounted over USB, thanks to this brilliant piece of technology. So DOS-based file tools were understandably a bit confused. I ended up throwing the disk into my old desktop machine, where it could be used as a “real” IDE drive instead of a USB external drive. And it turns out that most of the programs can cope with it being 500 GB.

Of course, this is one of those classic problems where I have no idea what actually “fixed” it. I ran a bad block check (which takes forever on a 500GB disk!), and was actually somewhat irritated when it finished having found nary a bad block. But as I poked around looking at other options, I found that filesystem tools were showing me files on the drive. All my old data? Intact!

Seriously, burn yourself a copy of UBCD and keep it with your computers. It’ll save the day. Previously, I’ve used it to reset computer passwords for a professor, and to fix a broken (err, missing) bootloader.

memcached

Continuing my series of poking around at ways to improve performance…

I stumbled across something about memcached. The classic example is LiveJournal (which, incidentally, created memcached for its own needs). It’s extraordinarily database-intensive and spread across dozens of servers, and for what they were doing, caching fully generated HTML pages didn’t make much sense. So they did something creative: memcached is a cache (in the form of a hash table) that works across a network. You might have 2 GB of RAM to spare on your database server (actually, you shouldn’t?) and 1 GB of RAM you could use on each of 6 nodes. Voilà, 8 GB of cache. You modify your code to ask the cache for results first, and, if you don’t get a result, you go get it from the database (or whatever) as usual.
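
Just to make that pattern concrete, here’s a rough sketch of the “ask the cache first” dance in PHP, using the Memcache extension. (The key name, the 60-second expiry, and get_front_page_from_db() are all made up for illustration.)

<?php
// Connect to a local memcached instance; 11211 is the default port.
$cache = new Memcache();
$cache->connect('localhost', 11211);

// Ask the cache first.
$html = $cache->get('front_page');

if ($html === false) {
    // Cache miss: do the expensive work, then stash the result
    // for 60 seconds so the next request gets the cached copy.
    $html = get_front_page_from_db(); // hypothetical expensive query
    $cache->set('front_page', $html, 0, 60);
}

echo $html;
?>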

But what about situations like mine? I have one server, and I already use MySQL query caching. It turns out memcached is still useful. (One argument for it is that you can run multiple memcached instances on a single server to render moot any problems with addressing more than 4 GB on a 32-bit system… but I’m not lucky enough to have so much memory that addressing it is a problem.)

MySQL’s query cache has one really irritating gotcha: it doesn’t cache TEXT and BLOB records, since they’re of variable length. Remembering that this is a blog, consisting of lots and lots of text, you’ll quickly see my problem: nearly every request is a cache miss. (This is actually an oversimplification: there are lots of less obvious queries that do benefit, but I digress.) (WordPress complicates things by insisting on embedding the exact current timestamp in each query, which also renders a query cache useless.) I just use SuperCache on most pages to serve static HTML copies, which brings a tremendous speedup.

But the main page hits the database directly on every load. It holds up fine given our low traffic, but “no one uses it” isn’t a reason to have terrible performance. I’ve wanted to do some major revising anyway, so when I get to a rewrite in my spare time, I think I’ll experiment with memcached to improve performance.

Performance++

I guess I’ve become somewhat of a performance nut. Truthfully, a lot of the time is spent on nominal improvements: moving MySQL’s tmp directory into RAM has had no noticeable impact on performance, for example. Defragmenting log files doesn’t speed things up much either.

I was reading a bit about LiteSpeed, though. It’s got a web GUI to control it and is supposedly much faster than Apache. I’ve got it installed, but I’m having some permission issues right now. (The problem is that changing them will break Apache, so I’m going to have to try it with some insignificant pages first.) It’ll automatically build in APC or eAccelerator. It apparently has some improved security features, too, which is spiffy. And it’s compatible with Apache’s configuration, so I don’t have to start from scratch.

The base version is free, too (but not GPL). The “Enterprise” edition is $349/year, or $499 for an outright purchase. To me, it’s not worth it. But if I were a hosting company with many clients, I might view it differently, especially if the performance is as good as they say.

Filesystems

On my continuing obsession with squeezing every bit of performance out of this system… They say that Linux filesystems don’t get fragmented. I never understood this. They’re apparently smarter about where files are placed, but still, frag-proof? If it were that easy, other filesystems would have figured it out long ago. I figured the explanation was just over my head. In reality, the “explanation” is that it’s a myth.

oxygen bin # fragck.pl /home
2.19458018658374% non contiguous files, 1.03385162150155 average fragments.
oxygen bin # fragck.pl /var/log
56.3218390804598% non contiguous files, 28.9425287356322 average fragments.
oxygen bin # fragck.pl /var/www/
1.45061443222766% non contiguous files, 1.05527580153377 average fragments.
oxygen bin # fragck.pl /etc
2.18023255813953% non contiguous files, 1.05450581395349 average fragments.
oxygen bin # fragck.pl /var/lib/mysql/
16.5424739195231% non contiguous files, 2.93740685543964 average fragments.

The results kind of make sense: /var/log is full of files that constantly have a line or two appended to them, so it only stands to reason that, if the filesystem isn’t careful, fragmentation would build up. The other hotspot is /var/lib/mysql, where MySQL stores its data. It’s really the same deal as /var/log, in that its files are continually growing.

/var/log/messages, the system log file, is in 75 pieces. Its backup, messages.1.gz, was in 68.
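
(If you want to check an individual file, filefrag, which comes with e2fsprogs, will report how many extents it’s in: filefrag /var/log/messages.)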

Realistically, the performance hit is negligible. It’s not like a core system file is in hundreds of pieces. (Like, say, the paging file!) /bin has very low fragmentation, and log files can be fragmented and not impact anything. (Except my OCD.) I am a little concerned about MySQL’s data stores building up fragmentation, though. In theory I could bring the database down and shuffle the files around, but it’s probably best left alone right now.

Fortunately, there’s hope… By moving a file to another partition and back, you force it to be rewritten in a new physical location. Something like mv messages /tmp/ramdisk && mv /tmp/ramdisk/messages . will cause the file to be rewritten. (Granted, this particular command was an awful idea: syslog-ng keeps /var/log/messages open, and doesn’t like it when the file randomly disappears. The fact that it was only gone for a split second doesn’t change the fact that the file’s location has changed.) Don’t get too excited about this, though: for some reason, fragmentation sometimes ends up worse! access_log was in 60 pieces. Now it’s in 76.
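
A gentler variation, which I haven’t actually tried: copy the file and rename the copy over the original, something like cp messages messages.new && mv messages.new messages, then send syslog-ng a HUP (killall -HUP syslog-ng) so it reopens its log files. The rename is atomic, so the file never vanishes; the only downside is that anything logged between the rename and the HUP lands in the old, now-deleted copy.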

I’ve also heard it said that some fragmentation isn’t necessarily a bad thing: a few files close together on the disk with light fragmentation is better than frag-free files on opposite ends of the disk. But that doesn’t satisfy my OCD. I guess the moral of the story is to not muck around too much with things. Or, “if it ain’t broke, don’t fix it!”

Speeding up MySQL with tmpfs?

I’m still seeing a decent percentage of queries creating their temporary tables on disk, even though my tmp_table_size is an astonishing 128 MB. (The whole blogs database uses about 6 MB of disk.)

The problem is described here: queries involving TEXT and BLOB columns apparently can’t use in-memory temporary tables. This page explains it further.

The problem is that… these are blogs. Aside from some trivial cross-table stuff, every single query touches TEXT columns. Interestingly, the solution everyone proposes is a ramdisk. I was somewhat wary of that, though: for one, the procedure for creating one looked somewhat arcane, and one page discussing it mentioned that the author’s 16 MB ramdisk was almost as big as his 20 MB hard drive. (I think of my old 20 GB hard drive as ridiculously old.) The other reason is that a ramdisk is scary: it’s a fixed size. I’d love something like a 1 GB ramdisk for /tmp, but I don’t even have a gig of RAM, much less a gig to spare for file storage.

Enter tmpfs. In a nutshell, it’s like a ramdisk, but it only consumes memory for what’s actually stored, and it can spill over into swap, which means I don’t have to carve out a fixed chunk of RAM up front and hope I sized it right. Creation was eerily easy:

# Make a directory to use as a mountpoint, and give it a misleading name
mkdir /tmp/ramdisk

# Mount it as type tmpfs
mount -t tmpfs tmpfs /tmp/ramdisk -o size=16M,mode=1777

In my.cnf, it’s as easy as changing tmpdir=/tmp/ to tmpdir=/tmp/ramdisk/.
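
One caveat I should note: a mount like this doesn’t survive a reboot, so it probably belongs in /etc/fstab as well, with a line along the lines of tmpfs /tmp/ramdisk tmpfs size=16M,mode=1777 0 0. Otherwise, after a reboot, MySQL’s tmpdir points at a plain on-disk directory at best, and at a directory that no longer exists at worst (plenty of distros clean out /tmp at boot).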

And now, we let it run for a while and see how performance feels.
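
For keeping score, the counters MySQL exposes through SHOW STATUS should tell the story. Here’s a quick-and-dirty PHP snippet (placeholder credentials, and it assumes MySQL 5.0+ for the GLOBAL keyword) that prints how many temporary tables have been created in memory versus on disk since the server started:

<?php
// Placeholder credentials; substitute real ones.
mysql_connect('localhost', 'user', 'password');

// Created_tmp_disk_tables vs. Created_tmp_tables shows how many
// temporary tables spilled to disk instead of staying in memory.
$result = mysql_query("SHOW GLOBAL STATUS LIKE 'Created_tmp%'");
while ($row = mysql_fetch_row($result)) {
    echo $row[0] . ': ' . $row[1] . "\n";
}
?>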

Undoing bad tar files

Proper ‘etiquette’ for packaging a tar file is to have it extract into its own directory. But sometimes a tarball is packaged by an idiot and, when extracted, dumps its files right into whatever directory you’re in. (Which is fine, if you expect it.)

tar takes a “t” option (“t” for “list”; get it? I don’t…) to list the files in an archive. You can use it two ways:

  • Pre-emptively: tar ft file.tar will show you how it’d extract.
  • Retroactively: rm `tar ft file.tar` will list the files and pass them as arguments to rm, deleting the mess it just made. (But see the caveat below.)
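
One caveat with the backtick trick: it falls apart on filenames with spaces, and plain rm won’t remove any directories the tarball created. Something like tar tf file.tar | xargs -d '\n' rm -f -- copes with the spaces (GNU xargs), and any leftover directories can be cleaned up with rmdir.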

vim Trick of the Day

:g/nbsp/d

That command sets up a “range” of all lines (hence the g for global) that match “nbsp”, and runs the command “d” (delete) on them.

I was working with a file that a script had converted from HTML, but it carried over some whole lines of unnecessary junk… No lines with ‘desirable’ content had nbsp in them, just the junk ones. So we delete them all.
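
(The reverse works too: :v/nbsp/d, or equivalently :g!/nbsp/d, deletes every line that doesn’t match.)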

Kudos to this site for the inspiration.

vim tricks

If you’re cool like me, you spend a decent amount of time in vi editing files. Despite all the fancy IDEs and the like, nothing beats uploading your PHP script to the webserver and editing it in place. I don’t profess to be a vi expert; I’m far from it, in fact. But for those who are like me, comfortable working in it but far from being a master, here are a few tips:

  • Typing “G” (in command mode, but not as a : command!) takes you to the last line of the file.
  • ma, where a is a letter a-z, sets a as a ‘mark’. You can then issue commands that reference that mark. For example, I wanted to delete about 500 lines from a file, but I didn’t know exactly how many lines there were, so “500dd” wasn’t a viable option. In my case, I marked the last line I wanted to delete with a, went up to the first line I wanted to delete, and typed d'a to delete from the current line down to mark a. Note that setting a mark gives no visual indication that anything happened.
  • . (a single period) repeats the last command. It comes in handy way more often than I’d expect!
  • :wq is probably the most well-known command. But ZZ (not :ZZ) is easier and does the same thing!

This is a handy reference, by the way. So is the O’Reilly book, but you can’t Google your way through that.