ISPs and Mirrors

Here’s something I’ve never understood… Why don’t ISPs run mirrors of popular things for their clients? I’m having Debian update its package info, and it’s taking a while because it’s seemingly using a crappy mirror. I can customize it (and will do so later), but I’m left wondering…

From my perspective, it’d be great, because it’d be closer to me and presumably faster. But from my ISP’s perspective, it’d be an even bigger win, because it would be bandwidth that never left their network, lowering their bandwidth bills. I’m sure that work with various Linux distros doesn’t account for that much of Comcast’s (or any other ISP’s) bandwidth, but I’m equally sure that I’m not the only Comcast user who ran an “apt-get update” today on Debian Lenny. (In fact, don’t Debian and Ubuntu desktops both go out daily to automatically update package listings?)

And it’s not like it’s huge overhead, either. Set up a single server with a few hundred gig of disks, and it’ll merrily keep everything up to date on its own via rsync. Put one in each of your major POPs and you’re done. Maybe $10,000 invested total.

As long as you’re at it, set them up with ntp. It’s always seemed like something an ISP should do. Time servers are much more accurate if they’re closer. (Northeast Comcast users should note that Comcast appears to peer with MIT; MIT’s public bonehed.lcs.mit.edu is about 8ms away from me.)

Cruft

I’m a pretty firm believer that, after several years, computers build up enough cruft that you need to start from scratch. With Windows the machine has gotten unbearably slow and there’s never enough disk space. With Linux, you’re running something really old, or just itching to try something new. With any OS, you’ve got a bunch of strange problems that have come up and you’ve just come to accept as normal.

These new installs tend to be great excuses for getting new hardware, too. Might as well hold off for a new hard drive if you’re short on disk space, and that way you don’t have to wipe anything. And you might as well upgrade to 4GB of RAM before you install a new OS.

I’m at a crossroads, though. I live in a UNIX world. I work on Linux at work all day. I come home and my computer runs Linux. I work on my website, running on a Linux machine. I love Linux, but it’s a little bit of a love-hate relationship. It has a few quirks that get under my skin. But going back to Windows would make no sense for me. I suppose I could maintain Linux boxes from a Windows machine, although it’s silly. But I’m way more comfortable in Linux these days anyway.

A lot of my coworkers run Macs, and it’s something I’ve been tempted by for a long time. It’s stable, slick, and it’s based on BSD. What’s not to like? Well, what’s not to like is the price. “OSx86” solves this (although it’s effectively software piracy), but sources say that it’s kind of like trying to install Linux 10 years ago: you’d better know every piece of hardware in your machine, and be comfortable finding and installing drivers for it.

My other battle is whether I want a laptop or a desktop. I love my Thinkpad, but I want a much bigger drive (RAID, really), and a 14.1″ LCD is comically small. And I’d like to start doing more with virtual machines, but this would require some more RAM.

I played with pricing. I can build a quad-core system with a few big SATA disks and onboard RAID (I think RAID 5 is even an option), 8 GB RAM, and a 22″ LCD for under a thousand dollars. That’s less than the cost of an entry-level Mac. (Excluding the Mini…)

But at the same time, I found a few potentially great upgrades to my laptop. A 128 GB SATA disk can be had for $200-300, and it’d cost about $50 to upgrade my Thinkpad to 4GB RAM. (From 2GB.) The jury’s still out on whether a Thinkpad T60 will see the full 4GB; some reports say that something in the BIOS or part of the chipset won’t go past 3, but others seem to suggest that the people saying that are just running OSs that can’t see 4GB on a 32-bit box.

So I’m more confused than ever about what I want. Should I upgrade my laptop or buy a new desktop? And, whatever I do, what is it going to run?

MySQL Replication Lessons Learned

A couple things I ran into today that I want to keep searchable here in case I run into them again, and that I figured might be useful to someone else someday:

Let’s say that you take down a MySQL server that’s a replicated slave to do a memory upgrade, and it takes a really long time to shut down, and then you find that the machine doesn’t like the “new” DIMMs, so you throw the old ones back in and power it up. Just hypothetically. You then restart MySQL and issue the START SLAVE command, but it dies with an error:

090127 14:53:17 [ERROR] Failed to open the relay log './mysqld-relay-bin.000023' (relay_log_pos 23726)
090127 14:53:17 [ERROR] Could not find target log during relay log initialization

The relay log and position were both wholly wrong. I poked around, and found a lot of people who ran into this; it seems to be a data corruption issue, but it also happens occasionally on a reboot. There are a bunch of suggested fixes out there that don’t actually work. One thing that does work, though, is deleting the relay logs on the slave. (Any time someone on the Internet tells you to delete a file, you should, of course, think “move to another directory” instead so you can undo it if need be.) Once I deleted the relay logs, it started right up.
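Here’s a minimal sketch of the “move to another directory” version of that fix. It assumes mysqld is stopped first, and the function name, the paths, and the glob pattern (guessed from the file name in the error above) are all illustrative assumptions, not anything MySQL ships. Note that the pattern also catches the relay log index file; whether that should be set aside too is a judgment call on my part.

```python
import shutil
from pathlib import Path


def set_aside_relay_logs(datadir, backup_dir, pattern="mysqld-relay-bin.*"):
    """Move relay log files out of datadir into backup_dir.

    Returns the names of the files that were moved. Nothing is deleted,
    so the move can be undone by copying the files back.
    """
    src = Path(datadir)
    dst = Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(src.glob(pattern)):
        if path.is_file():
            shutil.move(str(path), str(dst / path.name))
            moved.append(path.name)
    return moved
```

Something like `set_aside_relay_logs("/var/lib/mysql", "/root/relay-backup")` would be the call, with both paths being placeholders for wherever your data directory and scratch space actually live.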

Lesson #2? Now you’re about 6,000 seconds behind the master, and the replication lag counter is going down at a rate of about 1 second per second. You can wait a couple hours. That seems pretty pathetic, though.
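The “couple hours” figure is just arithmetic, sketched below. The function name and the idea of a constant net gain are my own simplifications; real lag rarely shrinks at a steady rate.

```python
def catch_up_seconds(lag_seconds, net_gain_per_second=1.0):
    """Estimate wall-clock seconds until the slave catches up.

    net_gain_per_second is how fast the lag counter shrinks, i.e. how
    many seconds of lag disappear per second of real time.
    """
    if net_gain_per_second <= 0:
        raise ValueError("slave is not gaining on the master")
    return lag_seconds / net_gain_per_second


minutes = catch_up_seconds(6000) / 60
print(f"{minutes:.0f} minutes")  # 100 minutes
```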

If your to-do list reads, “1.) Get slave running, and then 2.) Fine-tune my.cnf, currently stolen from another machine,” there’s a chance that you have sync_binlog=1 set. This is bad for two reasons. The first is simply what it’s designed to do: flush the binlog to disk on every write, which is very safe but also very slow. The second is that it’s apparently especially slow on ext3, so it’s doubly important not to use this option, at least not when write performance matters.

Mastering Technology

If you ever read a technical discussion board, you’ll quickly come to the realization that the breakdown of people is maybe 90% people who have kind of figured out how to use the technology, 7% people who are power users, and 3% people who are experts. It’s an arbitrary breakdown, but it seems about right intuitively.

Consider something like Excel. Most people can use it to keep tabular data, and most of those even know how to calculate sums. But very few might have a clue how to use Pivot Tables or formulas spanning multiple sheets, and even fewer will know how to extend it.

MySQL is definitely the same way. A lot of MySQL users can install it on their server and make phpBB use it. They might not understand what MyISAM and InnoDB are or how they’re different, much less the pros and cons of each. And even fewer could make a halfway decent DBA. But the good news with MySQL is that some of that elite 3% of experts are very, very vocal, and doing really, really neat things. Jeremy Zawodny is the first name that comes to mind, and check out his The New MySQL Landscape post. And don’t miss Percona’s announcement of their GPL’ed XtraDB, a replacement for InnoDB that’s supposed to be optimized for performance on more powerful machines. Seems like it’s very new and meant for MySQL 5.1, which some pretty smart people have said isn’t ready for prime-time. One of the MySQL guys at Google has a post about his patches to make MySQL better scale to ‘big iron’ type systems, too. And then there’s Our Delta (found on the Jeremy Zawodny blog) which distributes various patched versions of MySQL. Some are especially interesting to me, like Fast Master Promotion which is designed to allow a slave MySQL box to be promoted to master pretty much instantly, or the KILL IF IDLE command, allowing you to issue KILL statements to a connection and have them not affect non-idle connections. UserStats would be really helpful to run on a development machine to see what your code is impacting.

Indiana Jones

People who know me well will know that I’m not generally fond of movies. I can think of maybe a half-dozen movies I’ve seen lately. Borat is the only one I can think of that I’d recommend. Most movies just turn out to be a waste of my time. So I’m maybe not the normal movie-watcher.

That said, I’d like to review the latest Indiana Jones movie, which I watched last night. I’ll keep it nice and brief: F Minus. Made no sense.

I’m not sure how it started, but I had recently read something about greywater recycling, re-using water from things like your washing machine (as opposed to something gross, like your toilet) for other uses. It’s sometimes done in small homes, diverting the drain from your shower into your garden or the like, but it’s also done in big commercial places in the desert, where it’s filtered more heavily and reused.

At times, Indiana Jones got boring enough that I pulled out my iPhone and started reading some pages on greywater recycling. Did you know that you shouldn’t store greywater for more than 24 hours? As the action picked up a little more, I’d put it away, only to become seriously bored again and turn back to reading about recycling greywater on Wikipedia.

I don’t recall the last Indiana Jones movie I saw, but I seem to recall him as being a sort of macho, wild west hero who rides horses, kills bad guys with his six-shooter and a whip, and defends ancient historical sites. Good ol’ Americana that makes sense, albeit totally unrealistic. (He must have had about 10,000 bullets fired at him, and not a single one hit him: no special skill was involved, he was just running away and somehow machine gun fire from many guns never, ever made contact with him.)

But this one ended with a magnetized quartz (huh?) skull turning into an alien, which formed a giant UFO-vortex that sucked up the evil Russian lady, and turned what looked a lot like Machu Picchu into an ocean. And then, the end.

If anyone’s thinking of seeing it, I’d recommend you instead stay home. Here is the Wikipedia page on greywater, which includes some good links. Sure, I can think of much more interesting things to do than read up on greywater recycling. But watching this Indiana Jones movie isn’t one of them.

Sure, it had a few scenes that may have beat greywater recycling, but on average, it was slightly less interesting than reading about greywater recycling and how the various plumbing codes in the US regulate it. (Spoiler alert: some plumbing codes permit it, some do not, and most allow it with heavy regulation that usually makes it neither cost-effective nor environmentally friendly.) But besides being slightly more interesting, greywater has the advantage of making sense. Halfway through reading about greywater, it’s not suddenly going to become a magnetic skull made out of quartz, and it definitely won’t spontaneously turn into an alien, form a giant vortex, and flood Machu Picchu, sucking the evil Russians “into another dimension, the space in between spaces.”

The trouble with SPF

Unlike SAV (also known as challenge-response systems), SPF is generally a decent idea. Basically, you publish a DNS record for your domain that lists what IPs are allowed to send mail from your domain. This means that you can say that mail sent from the host ‘mail.yourdomain.com’ is valid, but if a spammer sends mail from a random hijacked box in Tijuana, it will be rejected via SPF. It doesn’t target spam directly, but rather, it targets spam that spoofs the domain. (Which is probably a very good percentage of spam.)

But I’ve recently noticed a problem I hadn’t considered before: forwarders. I can easily set up e-mail addresses on my n1zyy.com domain that will simply point elsewhere. So mail sent to helen@n1zyy.com (which is actually a spamtrap; don’t e-mail it) might just be automatically redirected to another e-mail address, say john.doe@example.com. The headers are rewritten so that the whole thing is transparent.

The problem is that, with SPF, the mailserver that redirects the mail is effectively “forging” the sender, which means that SPF will block it. If example@hotmail.com sends an e-mail to helen@n1zyy.com, and it gets redirected to john.doe@example.com, it will fail if Hotmail has an SPF record. This is because example.com gets mail saying it’s from hotmail.com, but it was actually delivered by n1zyy.com’s mailserver, which isn’t listed in Hotmail’s SPF record.
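To make the failure concrete, here’s a rough sketch of the check the receiving server does. It only models a list of allowed networks with a hard fail for everything else; real SPF records have more mechanisms and softer results than this. The record contents and IP addresses are made up (they’re documentation-only ranges), and `spf_check` is a hypothetical name, not any real library’s API.

```python
import ipaddress

# Hypothetical published SPF data: sender domain -> networks allowed to
# send mail for it. These are documentation-only ranges, not real servers.
SPF_RECORDS = {
    "hotmail.com": ["192.0.2.0/24"],
}


def spf_check(envelope_domain, connecting_ip, records=SPF_RECORDS):
    """Return "pass", "fail", or "none" for a simplified SPF lookup."""
    networks = records.get(envelope_domain)
    if networks is None:
        return "none"  # the domain publishes no SPF record
    ip = ipaddress.ip_address(connecting_ip)
    for net in networks:
        if ip in ipaddress.ip_network(net):
            return "pass"
    return "fail"


# Delivered directly by Hotmail's (made-up) server:
print(spf_check("hotmail.com", "192.0.2.10"))     # pass
# Same message, forwarded by n1zyy.com's (made-up) server:
print(spf_check("hotmail.com", "198.51.100.25"))  # fail
```

The message is identical in both cases; the only thing that changed is which machine handed it to example.com.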

There are a few workarounds, but most aren’t sustainable:

  • The person running the original domain could add an SPF record for the mailserver doing the forwarding. This is all well and good if you’re sending mail from n1zyy.com and wanted me to whitelist the Comcast mailserver or something, but consider the example I used, in which case you’d have to call up Hotmail and ask them to add mail.n1zyy.com’s IP to their SPF record. They’d laugh at you.
  • The recipient mailserver could override it. You could tell your example.com mailserver that, if the header says n1zyy.com, you shouldn’t check the SPF record. Again, good luck with this, unless you run the mailserver. Also, this is getting into “I could probably hack the Postfix source code to do that…” material.
  • The recipient mailserver could be configured so that SPF will check to see if any of the mailservers along the way are listed in the SPF record, and, if so, accept the mail. This sounds like a good idea to me, to be honest, but it’s deviating a bit from what SPF was meant to do.
  • The originating mailserver could stop using SPF, and this problem would go away. But then someone would send out a hundred million spam e-mails claiming to be from that domain, and they’d all go through.
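The third workaround (accept the mail if any mailserver along the way is listed in the SPF record) could look roughly like the sketch below. This is not how SPF actually works, and the function name and addresses are made up for illustration. It also assumes you have a list of hop IPs you can trust, which is a big assumption, since anything in the headers added before your own server touched the message can be forged.

```python
import ipaddress


def any_hop_authorized(hop_ips, allowed_networks):
    """Return True if at least one relay hop falls inside an allowed network."""
    nets = [ipaddress.ip_network(n) for n in allowed_networks]
    return any(
        ipaddress.ip_address(ip) in net
        for ip in hop_ips
        for net in nets
    )


# Made-up path: Hotmail's server first, then the n1zyy.com forwarder.
hops = ["192.0.2.10", "198.51.100.25"]
print(any_hop_authorized(hops, ["192.0.2.0/24"]))  # True
```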

Clearly, this is the type of thing that everyone is thinking about on a Friday night.

Conservative

I consider myself a moderate Democrat. I support the right of citizens to own guns, think government should be as small as possible, and hate paying taxes.

I think Obama is to the right of me, too. I think we’ve screwed up Gitmo so bad that we need to just free everyone today, and arrest those who committed torture. Obama thinks we need to focus on the future, not dwell on the past, left the military and CIA a bit of wiggle room to get confessions, and still has people jailed.

A small, lean government is important. We need most of what we have, though. Same deal with taxes: I hate paying them, and want them as low as possible. But there’s really not a ton of cruft. And something like 50% goes to the military.

I’m appalled by Obama’s opposition to gay marriage. I think anyone who opposes it must have failed civics: terms like “separation of church and state,” “equal protection of the law,” and “tyranny of the majority” all seem pretty applicable. I consider it a moral issue like letting women vote.

So what I really want to know is… If I’m a moderate Democrat, and Obama’s to my right… What is a conservative?

Thoughts on Water

I thought I’d piggyback on Mr. T’s work to ensure we gave fair coverage to more than one beverage. I actually have a couple different things to say.

  • A year or two ago, when gas was $4 a gallon, someone mentioned that people are furious paying $4/gallon for gasoline, but merrily pay $2 for a 16 ounce bottle of water, which is $16/gallon. Consider that one comes out of the faucet and is basically free, and the other involves negotiating with third-world cartels and massive amounts of refining.
  • Am I the only one who can’t drink water in restaurants? I think most restaurants use a little chlorine (?) in their water to keep things clean and sanitary. Except that I take a sip, feel like I’m drinking pool water, and spit it out all over the table*. It’s revolting. Not only that, but most places use the same water in their soda; if it’s light chlorine, it’s alright, but in some places, I can taste it in my soda, which is even more distressing to me. The thing that I don’t get is that it’s really easy to filter water. We have some fancy machine at work that’s sort of like a water bubbler hooked up to a sink, but it claims to use charcoal filters**, UV rays, and reverse osmosis. And that’s what water should taste like: nothing. I’ve started to drink it often. This is surely overkill, too. Throw a cheap filter in line with the water you’re serving to your customers, and voila! Strangely, there seems to be no correlation between restaurant quality and water quality: even in a place where you pay $50/person for dinner, their water might as well have been taken out of the toilet.

* Tremendous hyperbole. I furrow my brow and stop drinking the water.
** I am not a chemist, but doesn’t it seem counter-intuitive that running pure water through charcoal makes it cleaner?

MySQL Replication

A few thoughts on MySQL replication:

  • Since a “LOAD DATA FROM MASTER” will hold locks until the slave is in sync, it’s a horrible way to bring up a new slave in an existing setup. What’s not written about a lot, but isn’t really a big secret, is that you should use the latest backup/dump of the database to get the slave almost up to speed, and make sure you record its log position (from a SHOW SLAVE STATUS and SHOW MASTER STATUS). From there, you can just tell MySQL the binary log and the log position to resume at. In my case this means catching up on about 45,000 seconds, but it’s better than locking the database until it’s done.
  • MySQL replicates temporary tables. I didn’t believe this at first, but it’s true. This means that replicating to a machine used for lots of data mining that creates temporary tables from SELECTs that take 30 minutes to complete is a bad idea. As best as I can see, there’s no, “Don’t copy over temporary tables, silly” option, either.
  • Running STOP SLAVE will seemingly wait for the current query to finish. (You could kill it, but that’d be bad…)
  • If, in the process of bringing up a replicant for the first time, it’s trying to create temporary tables based on temporary tables…. You can tell it to ignore “table not found” errors by putting slave-skip-errors = 1146 in your my.cnf: 1146 is the error for missing tables. This will keep replication going. Risky, but if you restored from a recent, complete backup, you’ll just skip creating temporary tables…
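For the first point, “telling MySQL the binary log and the log position to resume at” is done with a CHANGE MASTER TO statement run on the slave, followed by START SLAVE. Below is a small sketch that just builds that statement as a string. The helper name, host, user, password, and log coordinates are all placeholders I made up; the real values come from whatever you recorded when the backup was taken.

```python
def change_master_sql(host, user, password, log_file, log_pos):
    """Build a CHANGE MASTER TO statement for the given coordinates."""

    def quote(value):
        # Minimal escaping for a single-quoted SQL string literal.
        escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"

    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST={quote(host)}, "
        f"MASTER_USER={quote(user)}, "
        f"MASTER_PASSWORD={quote(password)}, "
        f"MASTER_LOG_FILE={quote(log_file)}, "
        f"MASTER_LOG_POS={int(log_pos)};"
    )


# Placeholder values only; substitute the coordinates from your own backup.
print(change_master_sql("master.example.com", "repl", "secret",
                        "mysql-bin.000042", 23726))
```

Building the string separately makes it easy to eyeball the coordinates before pasting the statement into a mysql prompt on the slave.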

Shazam!

I got a bunch of kinda-neat apps for my iPhone. The free version of TouchTerm works well, but isn’t anything special. Twitterrific is an okay Twitter client. Pandora is kind of cool, but I wish you could put it in the background, and it’s really only good over WiFi, and when there’s WiFi, I’m almost always at a computer. Dictionaire is a nice little dictionary app, but the load time is horrendous. PW is a password wallet; I don’t use it often. “Units” is a nice unit conversion thing, but nothing special…

Shazam, though, is really cool. I listen to the radio a lot while driving, and I’ve come to the realization that, for one reason or another, announcing the names of the songs you’re playing has gone out of style. So if I hear a catchy tune I want to buy, chances are, they’re not going to announce what it is.

Shazam lets you record a brief clip of the song, sends it off to their servers, and then identifies the song. There are so many things that could go wrong with this. I figured it would be kind of like OCR software or voice recognition: a cool concept that’s usually horribly off. Except that Shazam has never, ever, been unable to identify a song. I hear a song, pick up the iPhone, start Shazam, and hit “Tag Now,” and about 30 seconds later, I’ve got the song name, some album art, and links to buy it on iTunes, as well as preview it on YouTube.

And it’s free. I’d highly recommend it for any iPhone user that listens to music.