Filesystems

On my continuing obsession with squeezing every bit of performance out of this system… They say that Linux filesystems don’t get fragmented. I never understood this. It’s apparently smarter about where files are placed. But still, frag-proof? If it was that easy, other filesystems would have figured it out long ago too. I figured that the explanation was just over my head. In reality, the “explanation” is that it’s a myth.

oxygen bin # fragck.pl /home
2.19458018658374% non contiguous files, 1.03385162150155 average fragments.
oxygen bin # fragck.pl /var/log
56.3218390804598% non contiguous files, 28.9425287356322 average fragments.
oxygen bin # fragck.pl /var/www/
1.45061443222766% non contiguous files, 1.05527580153377 average fragments.
oxygen bin # fragck.pl /etc
2.18023255813953% non contiguous files, 1.05450581395349 average fragments.
oxygen bin # fragck.pl /var/lib/mysql/
16.5424739195231% non contiguous files, 2.93740685543964 average fragments.

The results kind of make sense: /var/log is full of files where you’re constantly appending a line or two to various files, so it only stands to reason that, if the filesystem isn’t very careful, fragmentation would build up. The other one is /var/lib/mysql, where MySQL stores its data. It’s the same deal as /var/log, really, in that it’s continually appending to files.

/var/log/messages, the system log file, is in 75 pieces. Its backup, messages.1.gz, was in 68.
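For anyone curious how two numbers like these can be derived, here is a minimal Python sketch. It assumes you already have a list of per-file fragment counts (a contiguous file counts as 1); the function name and the sample list are made up for illustration, and fragck.pl itself may well compute its figures differently.

```python
from typing import Iterable, Tuple


def summarize_fragmentation(fragment_counts: Iterable[int]) -> Tuple[float, float]:
    """Return (percent of non-contiguous files, average fragments per file).

    Each entry is the number of on-disk pieces one file occupies; a
    contiguous file has a count of 1.
    """
    counts = [c for c in fragment_counts if c > 0]  # skip empty files
    if not counts:
        return 0.0, 0.0
    non_contiguous = sum(1 for c in counts if c > 1)
    percent = 100.0 * non_contiguous / len(counts)
    average = sum(counts) / len(counts)
    return percent, average


# Hypothetical sample: three contiguous files and one file in 5 pieces.
percent, average = summarize_fragmentation([1, 1, 1, 5])
print(f"{percent}% non contiguous files, {average} average fragments.")
```

Averaging over every file, not just the fragmented ones, would explain why a directory with a handful of badly fragmented logs can show a high average while most of its files are fine.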

Realistically the performance hit is negligible. It’s not like a core system file is in hundreds of pieces. (Like, say, the paging file!) /bin has very low fragmentation. Log files can be fragmented and not impact anything. (Except my OCD.) Although I am concerned about MySQL’s data stores building up fragmentation. In theory I can bring the database down and shuffle the files around, but it’s probably best left alone right now.

Fortunately, there’s hope… By moving a file to another partition, you cause it to move physical locations. Something like mv messages /tmp/ramdisk && mv /tmp/ramdisk/messages . will cause the file to be rewritten. (Granted, this particular command was an awful idea: syslog-ng keeps /var/log/messages open, and doesn’t like it when the file randomly disappears. The fact that it was only gone for a split-second doesn’t change the fact that the file’s location has changed.) Although don’t get too excited about this: for some reason, fragmentation sometimes ends up worse! access_log was in 60 pieces. Now it’s in 76.

I’ve also heard it said that some fragmentation isn’t necessarily a bad thing: a few files close together on the disk with light fragmentation is better than frag-free files on opposite ends of the disk. But that doesn’t satisfy my OCD. I guess the moral of the story is to not muck around too much with things. Or, “if it ain’t broke, don’t fix it!”

Speeding up MySQL with tmpfs?

I’m still getting a decent percent of files being created on disk in queries, even though my tmp_table_size is an astonishing 128MB. (The whole blogs database uses about 6MB of disk.)

The problem is described here: TEXT and BLOB queries apparently don’t like being in memory. This page explains it further.

The problem is that… These are blogs. Aside from some trivial cross-table type stuff, every single query touches TEXT columns. Interestingly, the solution everyone proposes is using a ramdisk. I was somewhat concerned about using a ramdisk, though: for one, the procedure for creating it looked somewhat arcane, and one place talking about it mentioned that his 16MB of ramdisk was almost as big as his 20MB hard drive. I think of my old 20GB hard drive as ridiculously old. The other reason, though, is that ramdisk is scary: it’s a finite size. I’d love something like a 1GB ramdisk for /tmp, but I don’t even have a gig of RAM, much less a gig to allocate for file storage.

Enter tmpfs. In a nutshell, it’s like a ramdisk, but the size can be dynamic, and it can swap, which means I don’t have to worry about my 16MB tmpfs partition trying to store 17MB of data and blowing up. Creation was eerily easy:

# Make a directory to use as a mountpoint, and give it a misleading name
mkdir /tmp/ramdisk

# Mount it as type tmpfs
mount -t tmpfs tmpfs /tmp/ramdisk -o size=16M,mode=1777

In my.cnf, it’s as easy as changing tmpdir=/tmp/ to tmpdir=/tmp/ramdisk/.
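Before pointing MySQL at the new directory, it may be worth confirming that it really is tmpfs and not just an ordinary directory on disk. Here is a rough Python sketch that parses text in /proc/mounts format; the function name and the sample mount table are invented for illustration.

```python
from typing import Optional


def filesystem_type(mounts_text: str, mountpoint: str) -> Optional[str]:
    """Return the filesystem type mounted at mountpoint, or None.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    wanted = mountpoint.rstrip("/") or "/"
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == wanted:
            found = fields[2]  # later mounts shadow earlier ones
    return found


# Hypothetical sample of what /proc/mounts might contain after the mount above.
sample = (
    "/dev/hda3 / ext3 rw 0 0\n"
    "tmpfs /tmp/ramdisk tmpfs rw,size=16384k,mode=1777 0 0\n"
)
print(filesystem_type(sample, "/tmp/ramdisk/"))
```

On a live Linux box you would pass in open("/proc/mounts").read() instead of the sample string.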

And now, we let it run for a while and see how performance feels.

DB Stats

I’ve been playing with phpMyAdmin and doing a bit of optimization of it. A few stats:

  • Since I upgraded the kernel, MySQL has been up for a little under 3 days and 11 hours.
  • The DB server has moved 841 MiB of traffic. This is 10 MiB an hour.
  • It’s processed 131,048 queries. This is about 1,580 an hour.
  • 132,000 inserted rows.
  • 96K queries served out of MySQL’s query cache.
  • 1,393 temporary tables created on disk to handle queries. This seems like a bottleneck, although it is only a tiny percentage.
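The hourly rates and the “tiny percentage” can be checked with a few lines of Python. This is just the arithmetic on the figures quoted above, rounding the uptime to 83 hours and assuming no query creates more than one on-disk temporary table.

```python
# Figures quoted in the list above.
uptime_hours = 3 * 24 + 11      # "a little under 3 days and 11 hours"
traffic_mib = 841
queries = 131_048
disk_tmp_tables = 1_393

mib_per_hour = traffic_mib / uptime_hours
queries_per_hour = queries / uptime_hours
# Assumption: at most one on-disk temporary table per query.
disk_tmp_percent = 100 * disk_tmp_tables / queries

print(f"{mib_per_hour:.1f} MiB/hour")
print(f"{queries_per_hour:.0f} queries/hour")
print(f"{disk_tmp_percent:.2f}% of queries hit an on-disk temporary table")
```

That works out to roughly 10 MiB and 1,580 queries an hour, with about 1% of queries spilling a temporary table to disk.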

I’ve just restarted MySQL to apply some configuration changes. (Actually, I could have changed them on the fly now that I think about it…) I tweaked the settings a bit: MySQL allows you to set limits on how much RAM it can use for various operations, and I tend to be very frugal. But I think I was shooting myself in the foot there: it was relying on disk a bit too much. It’s not like I’m running a load average of 25 and am moving gigs of traffic a day, where tuning is really vital, but it still bothers me that it’s not as efficient as it could be.

Geekery

One of my weird OCD concerns is that some of the scripts I host place a heavy load on the server. I want to make sure that, in busy times, they don’t weigh down things further. Here’s a neat little bit of PHP I wrote to simply have PHP abort the page load if the 1-minute load average is over 2.00:

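// Check the 1-minute load average first (the first field of /proc/loadavg)
$fh = fopen('/proc/loadavg', 'r');
$load = (float) fread($fh, 4);
fclose($fh);

// Bail out early if the server is already busy
if ($load > 2.0) {
    die("Sorry, we're too busy.");
}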

Rather than die(), you might throw a redirect to a cache or something else. And I should point out that, of course, running this code does take some CPU time… And that this script doesn’t always make sense: you’re basically forcing a failure before the server itself forces the failure. The time it makes sense is in the way I’m using it: when some unimportant, tangential project requires inordinate resources and you want to make sure it doesn’t slow the server down excessively, at the expense of the more important projects (e.g., the blogs).
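For scripts that aren’t PHP, the same idea can be sketched in Python. This is only an illustration: the function names are made up, and the sample string stands in for the real contents of /proc/loadavg so the snippet runs anywhere.

```python
def one_minute_load(loadavg_text: str) -> float:
    """Parse the 1-minute load average from /proc/loadavg contents."""
    return float(loadavg_text.split()[0])


def too_busy(loadavg_text: str, threshold: float = 2.0) -> bool:
    """True when the 1-minute load average exceeds the threshold."""
    return one_minute_load(loadavg_text) > threshold


# Hypothetical /proc/loadavg contents; on a live Linux system use
# open("/proc/loadavg").read() instead.
sample = "2.35 1.10 0.90 1/123 4567\n"
if too_busy(sample):
    print("Sorry, we're too busy.")
```

Splitting on whitespace, instead of reading a fixed number of bytes, keeps the parse correct even if the load average ever climbs past two digits.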

The Bible

I found a script that does the sort of Markov chains I mentioned. I use it in PHP.

I needed a large body of text, though. Just using someone’s blog posts, for example, just results in a lot of repetitiveness. It’s no good. For bonus points, I wanted a large body of text that sounded kind of strange no matter how it was read.

So I found the Bible. It’s doubly good because the wording is pretty archaic, so you’re used to having to carefully analyze it to divine some meaning. While a guy on a forum saying he recently spent an evening with a grain of salt comes across as nonsense, in the context of the Bible you might try to read into it. This is perfect for this script!

Here’s the page. A lot of it’s sheer nonsense, but some of it’s incredibly good. In lieu of actual verse numbers, the script picks up on the numbers and very consistently plugs in two numbers in front of text.

Some recent highlights:

22 7 And David said unto Saul, I saw gods ascending out of the land to bury with the passengers those that remain upon the face of the earth, and upon every high mountain…

The zombies are coming? To kill the living?

5 11 Woe unto them! for their day is come, the time that David was escaped from Keilah; and he forbare to go forth. 23 14 And he went through the corn fields on the sabbath days. 4 32 And they were offended in him. But Jesus stooped down, and with his mouth, and began at the same scripture, and

This is one of those ones that almost tells a ‘coherent’ story about David escaping from Keilah, running through cornfields even on the Sabbath, which offended people. But Jesus stooped down to begin scripture. I’m fairly certain that no such verse appears in the Bible, though.

22 3 And David prepared iron in abundance for God had made them rejoice with great joy

That’s not much of a party….

Of course, sometimes it seems to get in a sort of loop… Anyone who’s read the Bible will recall that it, at various times, launches into really lengthy lists of people’s names and the relations between them. So I cringe whenever it begins doing that, because sometimes it just doesn’t stop. Here’s a good illustration of that:

are honest, whatsoever things are lovely, whatsoever things are honest, whatsoever things are just, whatsoever things are pure, whatsoever things are true, whatsoever things are honest, whatsoever things are honest, whatsoever things are honest, whatsoever things are true, whatsoever things are lovely, whatsoever things are honest, whatsoever things are just, whatsoever things are pure, whatsoever things are true, whatsoever things are lovely, whatsoever things are lovely, whatsoever things are just, whatsoever things are pure, whatsoever

29 2 And he placed forces in all the coasts thereof, from two years old was Jehoash when he began to reign, and he reigned eleven years in Jerusalem.

My biblical history isn’t so hot, but I’m fairly certain that rulers had to be at least three to begin their reign.

15 6 In the morning sow thy seed, and in the water

o_O

40 4 And the glory of their strength in the tabernacles of Ham
16 59 For thus saith the Lord GOD; Behold, I will stand upon my watch, and set me in dark places, as they that must give account, that they may lay hold on bow and spear; they are cruel, and have no child, and her husband were dead, she bowed herself and travailed; for her pains came upon her.

Say what?

33 25 Wherefore say unto them, My little finger shall be thicker than my father’s loins.

Is that an actual verse? It sounds like it may have been the equivalent of a your-mom insult from the biblical era?

Anyway, go see for yourself. Just don’t expect every verse to be good.

vim Trick of the Day

:g/nbsp/d

That command sets up a “range” of all lines (hence the g for global) that match “nbsp”, and runs the command “d” (delete) on them.
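If the file isn’t open in vim, roughly the same filter can be sketched in Python. The function name and sample lines are made up, and it does a plain substring match where vim treats “nbsp” as a pattern (for this particular string the two behave the same).

```python
from typing import Iterable, List


def delete_matching_lines(lines: Iterable[str], needle: str = "nbsp") -> List[str]:
    """Keep only the lines that do not contain needle."""
    return [line for line in lines if needle not in line]


# Hypothetical sample of a file converted from HTML.
sample = ["First paragraph", "&nbsp;", "Second paragraph", "  &nbsp;&nbsp;"]
print(delete_matching_lines(sample))
```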

I’m working with a file that was converted by a script from HTML, but had some carryovers that were whole unnecessary lines… No lines with ‘desirable’ content had the nbsp in them, just the junk ones. So we delete them all.

Kudos to this site for the inspiration.

There Goes My Hero

Watch him as he goes! It was the usual “wasting time on Wikipedia” path — I started reading about nuclear fission, and then read about Los Alamos, and then read about the supercomputers, one of which ran Plan 9, so I read about Plan 9, and then its GUI, and then the guy who wrote the GUI. And there was an allusion to someone else, Mark V. Shaney. So I read about him.

In a nutshell, it was a script a few of the Plan 9 guys wrote that would process a lengthy body of text and do some statistical analysis, and use that to spit out writing. It was AI, of a sort, but “schizophrenic” is the best way I’ve seen it described. You read it and it’s one of those things where, for a minute, it makes sense, but then it radically shifts topics or draws some sort of completely irrelevant conclusion. Kind of like a lot of people on the Internet, actually.
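To make the “statistical analysis” part concrete, here is a minimal Python sketch of a word-level Markov chain text generator. It is not the original program: I’m assuming a two-word state (each pair of words maps to the words seen after it), and the function names and sample text are invented for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Chain = Dict[Tuple[str, str], List[str]]


def build_chain(text: str) -> Chain:
    """Map each pair of consecutive words to the words seen right after it."""
    words = text.split()
    chain: Chain = defaultdict(list)
    for first, second, third in zip(words, words[1:], words[2:]):
        chain[(first, second)].append(third)
    return chain


def generate(chain: Chain, length: int = 30, seed: Optional[int] = None) -> str:
    """Walk the chain from a random starting pair, emitting up to length words."""
    if not chain:
        return ""
    rng = random.Random(seed)
    pair = rng.choice(sorted(chain))
    output = list(pair)
    while len(output) < length:
        followers = chain.get(pair)
        if not followers:
            break  # dead end: this pair only appeared at the end of the text
        next_word = rng.choice(followers)
        output.append(next_word)
        pair = (pair[1], next_word)
    return " ".join(output[:length])


# Hypothetical sample text; a real run wants a far larger body of text.
sample_text = (
    "whatsoever things are true whatsoever things are honest "
    "whatsoever things are just whatsoever things are pure"
)
print(generate(build_chain(sample_text), length=12, seed=1))
```

Every three-word window in the output appeared somewhere in the source, which is why it reads sensibly for a few words at a time and then wanders off. A pair that recurs often in the source, like “things are” in the sample, has many followers that lead right back to it, which is one plausible reason the output can get stuck in list-like loops.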

They had some fun with textbooks. Here’s an example, in which the code was fed a basic arithmetic textbook:

Why do we count things in groups of five. When people learned how to count many things, they matched them against their fingers. First they counted out enough things to match the fingers of both hands. Then they put these things aside in one quart. A giant-size bottle that will hold four quarts is a three-digit number….

It starts off making good sense, but suddenly they go from counting on your fingers to putting “these things” in a quart, and it’s pretty incomprehensible from there.

Here’s another really funny one. You read it, and can kind of comprehend it. But the first reply summarizes it well: it suddenly shifts from constipation to understanding the 19th century, with no logical shift. I think that commenter may have been aware of what was going on. The second guy accurately nails what’s going on.

Finnegan’s Wake? This one cracks me up a lot. But you read this, and doesn’t it exactly sum up what’s wrong with Internet forums? The people just seem totally bonkers, and like they’re ranting but not really sure what they’re ranting about. He manages to talk about being good in bed and using the latest version of BSD in the same sentence. The reply is hilarious, because it’s exactly what you’d think if you didn’t know what was going on: that the “guy” posting was on some serious drugs.

This one, though, is my all-time favorite. It starts off as some religious rant, but clearly not a coherent one. But the fifth paragraph is the best paragraph ever written:

When I meet someone on a professional basis, I want them to shave their arms. While at a conference a few weeks back, I spent an interesting evening with a grain of salt. I wouldn’t take them seriously!

I’m fairly certain there are AI ‘bots’ out there that do this same thing, maybe in more coherent forms. I want to acquire one. Badly. I’ve always been interested in the ‘bounds’ of nonsense: when something kind of makes sense, you work with it. We “understand” people shaving their arms in professional settings, and we can visualize someone spending an evening with a grain of salt, and I surely wouldn’t take them seriously afterwards. But we’re making ‘sense’ out of sheer nonsense generated by a computer. How far will it go before we think, “This is complete nonsense”?

vim tricks

If you’re cool like me, you spend a decent amount of time in vi editing files. Despite all the fancy IDEs and the like, nothing beats uploading your PHP script to the webserver and editing in place. I don’t profess to be a vi expert. I’m far from it, in fact. But for those that are like me–comfortable working in it but far from being a master–here are a few tips:

  • Typing “G” (in command mode, but not as a : command!) takes you to the last line of the file.
  • ma, where a is a letter a-z, sets a as a ‘mark’. You can then issue commands reflecting that mark. For example, I wanted to delete about 500 lines from a file. But I didn’t know how many lines there were, so “500dd” wasn’t a viable option. In my case, I marked the last line I wanted to delete with a, went up to the first line I wanted to delete, and then typed d'a to delete from the current line to mark a. Note that, as you’re doing this, there’s no indication of it.
  • . (a single period) runs the last command again. Handy way more often than I’d expect!
  • :wq is probably the most well-known command. But ZZ (not :ZZ) is easier and does the same thing! (The only difference is that ZZ skips the write if nothing has changed.)
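Since the mark-and-delete trick gives no visual feedback, it may help to see what it does to the lines. Here is a tiny Python sketch of the linewise delete-to-mark behaviour, with a made-up function name and 0-based line indexes standing in for the cursor and the mark.

```python
from typing import List


def delete_to_mark(lines: List[str], cursor: int, mark: int) -> List[str]:
    """Return lines with everything between cursor and mark removed, inclusive.

    cursor and mark are 0-based indexes; as with vi's linewise delete to a
    mark, it does not matter which of the two comes first in the file.
    """
    start, end = sorted((cursor, mark))
    return lines[:start] + lines[end + 1:]


# Hypothetical five-line file: cursor on "two", mark a on "four".
sample = ["one", "two", "three", "four", "five"]
print(delete_to_mark(sample, cursor=1, mark=3))
```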

This is a handy reference, by the way. So is the O’Reilly book, but you can’t Google your way through that.

Dual Successes

We’ve spent weeks preparing for a group presentation in one of my classes. And tonight it all came down to the wire. We were all really nervous, and frankly, I thought we did pretty badly. But the professor’s a down-to-earth guy, so after class one of my group members mentioned, “We did so bad!” or something to that effect. And he glanced around to make sure the other groups that went weren’t around, and told us, “I can honestly tell you that your group’s presentation made my night.” And that made my night.

And then when I was upgrading Apache I screwed up and deleted the vhost configuration files. And it’s one of those things I never understood… I tried recreating them but they never behaved in a way that made any sense at all. I’d load them and get errors that made no sense, or the server would just act in strange ways. I finally got it to the point where the blogs worked, even though nothing else did, so I left it. While there are nice GUI tools for Apache, they’re not much good on a headless server. (And no sane person runs X remotely on a server, since it’s a needless waste of CPU and running VNC would make it even worse.)

I just spent some time reading up on vhost configuration, and just got it right… I had the syntax all wrong the first time around, to the point that I’m surprised the server was coming up at all. I think I’m actually going to put together a static page on how to properly set up vhosts, because in my searches for help I found a lot of people with similar problems.

Tech Tricks

Here are a few low-tech computer tricks I’ve started doing lately:

  • I’ll periodically bump the wrong keys and find keyboard shortcuts that I didn’t know existed for sending an e-mail mid-sentence. It’s one thing when you’re e-mailing a friend ramblings about cheese (they may even be glad the e-mail got cut short?), but when you start e-mailing important people, it becomes a bigger deal. The last thing you want to do is e-mail the chief of police and say, “I’m working on an article and I’d like to mee”… The simple ‘fix’ is to not let your e-mail program dictate how you compose a message. The “To:” line comes first. Do it last, so you can’t mess up.
  • When attaching files, do it before you write the e-mail. I can’t believe how often people (myself very much included) send e-mails referring to attachments, but forget to add the attachment. If you can get in the habit of attaching the file first, it’s a lot harder to mess up.
  • When downloading things from the Internet, always, always, always click “Save” instead of “Open.” I tend to do Open instead, because it seems like a needless step to save it to the Desktop and then open it. But in the past week I’ve lost two files because I click “Open” on a draft someone sends me. I spend a long time revising it, and hit Save every minute or so. But it gets saved to a temp directory that’s virtually impossible to find. Today I spent considerable time poking around the directories, and found that what’s stored is VERY limited. If you’ve visited any sites after you last saved the file, it’s practically assured that your file is 100% gone, because the cache will get purged. As I’ve said before, I’d consider this a fatal design flaw, and I can’t believe more people don’t have problems with this. So always, always, always save to your Desktop and then open. And, if you’re working on a file and about to close, don’t close it unless you’re positive you know where the file is being saved.

All of these are things that take some time getting used to. But I think they’re like, say, using a PDA: you have to commit to doing it 100%, or it’s utterly useless. If your calendar doesn’t contain everything you’re doing, it’s worse than having no calendar at all. I need to work on automatically clicking that “Save” box when downloading a file, and I need to work on re-ordering, into a non-intuitive way, the way I write e-mails. But if I can get the habit down right, the first time, in mid-sentence, I get an error that I can’t send an e-mail with no recipient named, it’s paid off. And the first time I don’t lose an hour’s worth of revisions and additions, it’s paid off.