Spam

My e-mail setup right now for n1zyy.com and ttwagner.com consists of just forwarding all e-mail to GMail. It works fine, and the spam filters there have been pretty much 100% effective. However, it bothers me that I’m forwarding dozens, if not hundreds, of e-mails just to have them ignored. Some basic spam filtering should really take place on my server.

I made a few basic configuration changes to Postfix, the MTA I run. In a nutshell, I tell it to require stricter compliance with e-mail RFCs: e-mails with HELO addresses that don’t exist (or just don’t make sense), and people sending multiple commands before the server replies to acknowledge them, for example, now results in mail delivery failing. The default configuration errs very much on the side of ‘safety’ in accepting mail, but the trick is to tighten it down enough that you’ll reject mail that’s egregious spam, but not reject anything that could be from a mailserver. And that’s where I’m at.

I also installed SpamAssassin. I’m currently using it in conjunction with procmail, and therefore wasn’t quite sure if it works. I set it up to make some changes to the headers, so that I can verify whether it’s working. But I ran into a problem I never thought I’d have: I’m not getting enough spam. I’m sitting here eagerly awaiting some to see what happens. And it’s just not coming.

Ultimate Boot CD

Ultimate Boot CD saves the day again! This time, my 500 GB drive with lots of important stuff backed up to it randomly wasn’t being detected. Windows saw it as a raw, unformated disk, and Linux wouldn’t mount it citing disk problems.

Of course, I had some problems at first… It’s a 500 GB drive, which is greater than 137 GB. It’s also mounted over USB, thanks to this brilliant piece of technology. So DOS-based file tools were understandably a bit confused. I ended up throwing the disk in my old desktop machine, where it was used as a “real” IDE drive instead of a USB external drive. And it turns out that most of the programs can cope with it being 500GB.

Of course, this is one of those classic problems where I have no idea what actually “fixed” it. I ran a bad block check (which takes forever on a 500GB disk!), and was actually somewhat irritated when it finished having found nary a bad block. But as I poked around looking at other options, I found that filesystem tools were showing me files on the drive. All my old data? Intact!

Seriously, burn yourself a copy of UBCD and keep it with your computers. It’ll save the day. Previously, I’ve used it to reset computer passwords for a professor, and to fix a broken (err, missing) bootloader.

The Time…

I’m pretty OCD and thus run an NTP server on this server. (It should respond to any hostname on this box.) Despite the server being in Texas, I keep the timezone set to EST.

So here’s a page displaying the time. Granted, having a clock that’s accurate down to a fraction of a second (synced to the atomic clock) is no longer that impressive. But tell me you’ve never wished for an easy way to find the correct time… Now you know.

memcached

On my continuing series of me poking around at ways to improve performance…

I accidentally stumbled across something on memcached. The classic example is LiveJournal (which, incidentally, created memcached for their needs). It’s extraordinarily database-intensive, and spread across dozens of servers. For what they were doing, generating HTML pages didn’t make sense that often. So it does something creative: it creates a cache (in the form of a hash table) that works across a network. You might have 2GB of RAM to spare on your database server (actually, you shouldn’t?) and 1GB RAM you could use on each of 6 nodes. Viola, 8 GB of cache. You modify your code to ask the cache for results, and, if you don’t get a result, then you go get it from the database (or whatever) as usual.

But what about situations like mine? I have one server. And I use MySQL query caching. But it turns out it’s useful. (One argument for using it is that you can just run multiple clients on a single server to render moot any problems with using more than 4GB on a 32-bit system… But I’m not lucky enough to have problems with not being able to address my memory.)

MySQL’s query cache has one really irritating “gotcha”–it doesn’t catch TEXT and BLOB records, since they’re of variable length. Remembering that this is a blog, consisting of lots and lots of text, you’ll quickly see my problem: nearly every request is a cache miss. (This is actually an oversimplification: there are lots of less obvious queries benefiting, but I digress.) (WordPress complicates things by insisting on using the exact timestamp in each query, which also renders a query cache useless.) I just use SuperCache on most pages, to generate HTML caches, which brings a tremendous speedup.

But on the main page, I’m just hitting the database directly on each load. It holds up fine given the low traffic we have, but “no one uses it” isn’t a reason to have terrible performance. I’ve wanted to do some major revising anyway, so I think a rewrite in my spare time is going to experiment with using memcached to improve performance.

Performance++

I guess I’ve become somewhat of a performance nut. Truthfully a lot of the time is spent doing things for nominal improvements: changing MySQL’s tmp directory to be in RAM has had no noticeable impact on performance, for example. Defragging log files doesn’t speed much up either.

I was reading a bit about LiteSpeed, though. It’s got a web GUI to control it, and is supposedly much faster than Apache. I’ve got it installed, but I’m having some permission issues right now. (The problem is that changing them will break Apache, so I’m going to have to try it with some insignificant pages first.) It’ll automatically build APC or eAccelerator in. It apparently has some improved security features, too, which is spiffy. And it’s compatible with Apache, so I don’t have to start from scratch.

The base version is free, too. (But not GPL.) The “Enterprise” edition is $349/year or $499 outright purchase. To me, it’s not worth it. But if I were a hosting company with many clients, I might be viewing it differently, especially if the performance is as good as they say.

Terrible Software

Two different things that boggled my mind today:

  • CCleaner offered clean up Symantec’s log files. All 5 gig of them. (?!?!)
  • Team Fortress 2 just crashed after spending about ten minutes “loading.” It complained that there wasn’t enough memory and that I probably had the paging file disabled. The latter is true: I never recreated it after disabling it since it was in 600 pieces. But RAM? I’ve got 2 GB of it. If you can’t write code to fit in that, you deserve to be stuck in a lift. A burning lift. With a corpse.

Seriously, 2 GB RAM isn’t enough to load the game? And you need 5 GB of log files?

MiniAjax — An Awesome Site

Web developers, check it out. My one complaint is that this is an awkward assortment of things ranging from little JavaScript snippets to free (GPL) apps to proprietary, expensive applications. But there are some very cool ones in there. (Psst! Heatmap is running on this site! It’s going to take a while to build up enough data worth sharing, but I’ll let you know when the time comes.) Some of the other ones are going to make their way into some projects I’m working on.

Malus Fide

I’ve always like the idea of rewarding douchebaggery with more douchebaggery. And one bit of douchebaggery that really bugs me is that, running a webserver, it’s always getting requests for pages that have never existed. What’s going on is that people are probing for common vulnerabilities. I don’t have a /phpmyadmin, but I get multiple requests a day for it. (I do have PHPMyAdmin, but it’s up to date, secure, and at an obscure URL.) Same goes for awstats.

What I’ve always wanted to do is respond to these requests with complete garbage. Unending garbage. My long-time dream was to link a page to /dev/random, a “file” in Linux that’s just, well, random. (It’s actually a device, a software random number generator.) The problem is that linking it is full of problems, and, when you finally get it working, you’ll realize that it’s smart enough to view it as a device and not a file.

So I took the lazy route and just created a 500MB file. You use dd to copy data from a disk, with /dev/urandom as the input and a file with a .html extension as output. I had it read 500 “blocks” of 1MB. Granted, this is a total waste of disk space, but right now I have some spare space.

Of course, I was left with a few resources I was concerned about: namely, RAM, CPU time, and network activity. I use thttpd for this idiotic venture, which lets me throttle network activity. I’ve got it at 16 KB/sec right now. (Which is an effective 128 kbps.) This ensures that if it gets hit a lot it won’t put me over my (1,000 GB!) bandwidth allocation.

Apparently, though, this throttling solves the problem: at first glance, it looks like it’s smart enough to just read 16KB chunks of the file and send them out, as opposed to trying to read it into memory, which would kill me on CPU time and RAM. So the net result is relatively minimal resource utilization.

Currently, it’s just sitting there at an obscure URL. But my eventual plan is to setup a /awstats and a /phpmyadmin and a /admin and a /drupal and have them all throw a redirect to this file.

The other bonus is that, at 16KB/sec, if a human gets there, they can just hit “stop” in their browser long before a crash is imminent. But, if it works as intended, infected systems looking to spread their worms/viruses won’t be smart enough to think, “This is complete gibberish and I’ve been downloading it for 30 minutes now” and will derail their attempts at propagating.

It’s not in motion yet, though… But I’ll keep you posted.