Google Routing Fail

This morning we thought the Internet was down in our office. Many sites we use constantly didn’t load, and many others were extremely slow to load, including our own site. We knew, though, that our site was in tip-top shape, which had us baffled. Routing problem on our ISP’s end?

Nope, just Google being down. I searched for something and the results timed out. “Internet down for anyone else?” “Yeah, I can’t get into my GMail.” “I was just trying to check Google Analytics, and that’s not working either.” “YouTube, too!” “Our site’s loading really slow, too. Must be our pipe.”

It turns out that a ridiculous number of sites on the Internet are using Google Analytics, including my employer’s and my own. (And just this afternoon I fixed the tags on the blogs so stats will track properly.)

When my site goes offline, it’s not surprising. My host has to shut down periodically when they discover frayed wiring in the data center. If my employer went offline, it would be a bigger deal, but it’s something I could envision: our data center getting knocked offline, or me goofing the config on one of our production routers and pushing the bad changes out to all of them. But Google going down? That’s unheard of.

Arbor Networks seems to have monitoring equipment all over the place, which means they have some pretty good insight into things. As they posted today, Google apparently fudged a route and it propagated out, routing most of their traffic through some low-level provider in Asia. The graph, to me, is the coolest part. When have you ever seen a graph reflecting a drop-off of many gigabits per second in such a short period? It’s orders of magnitude more than anyone is used to.

Of course, times like this make me want to learn more about BGP4, the core routing protocol used on the Internet. Anyone remember when YouTube went down last year? It turned out that some country (Pakistan?) was trying to block YouTube, but they did so by advertising a bogus route internally, and a series of misconfigured routers let that route leak out until it had propagated pretty much everywhere.
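
The mechanics are easier to see with a toy example. Here’s a rough Python sketch of longest-prefix-match route selection (the prefixes are illustrative, not the actual announcements from that incident): a bogus, more-specific prefix beats the legitimate covering prefix, so traffic gets misdirected even though the real route is still being announced.

    import ipaddress

    # Toy routing table: a legitimate covering prefix and a bogus,
    # more-specific prefix leaked by a misconfigured network.
    # (Prefixes are illustrative, not the actual 2008 announcements.)
    routes = {
        ipaddress.ip_network("208.65.152.0/22"): "legitimate origin",
        ipaddress.ip_network("208.65.153.0/24"): "bogus, more-specific leak",
    }

    def best_route(destination):
        """Pick the matching route with the longest prefix, as IP forwarding does."""
        matches = [net for net in routes if destination in net]
        return max(matches, key=lambda net: net.prefixlen)

    destination = ipaddress.ip_address("208.65.153.238")
    chosen = best_route(destination)
    print(f"Traffic to {destination} follows {chosen}: {routes[chosen]}")
    # The /24 wins over the /22, so the bogus route carries the traffic.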

MacBook Memory Constraints

Since I had a hard time finding this, I’ll post it here in the hopes of helping others out. Older Core Duo MacBook Pro laptops can only address 2GB of RAM. Many people will try to tell you differently, perhaps not understanding exactly what’s going on. A Core 2 Duo will see plenty more, and newer MacBook Pros go well beyond 2GB; I’m pretty sure I read about someone who got 6GB into one.

But a 2006-era Core Duo (not Core 2 Duo) machine will not see more than 2GB RAM. If you’re cockily sure that this is in error and buy a 2GB stick to replace one of the 1GB sticks, you’ll have a laptop with 3GB RAM, but it won’t boot. Trust me on this.

The good news is that my Thinkpad takes the exact same memory. (The bad news is that I rarely use it…)

Dies ist ein Schmerz.

I ran into a particularly annoying problem at work, with a Rails plugin not working. I had no idea what was wrong, and eventually talked with a senior developer, who was equally stumped. (The problem probably has to do with the fact that acts_as_versioned, the plugin, was built in 2006. The new version, from 2008, introduces new problems that preclude us from using it right now.)

Anyway, I decided to turn to Google, and eventually found one thread where it sounds like someone ran into the same problem and may have found a solution. Unfortunately, Google Translate, usually very helpful, really falls down on the job here for some reason. Standing between me and a petty bug that’s breaking a lot of stuff are these instructions:

I just see find_versions that anyway model.versions.find
which indeed calls interestingly with (:all, options) which is
for your :first addition is not good.

Anyways, I think that your error is coming from somewhere else rather.

ciao, tom

I don’t think I’ll be getting this fixed anytime soon.

Monitor Deals, Again

I’ve been on the fence for a while about picking up a new monitor for work. The provided 1280×1024 LCD is just not enough for web development, where I typically have iTunes, TextMate, multiple terminals, Firefox, Firebug, Thunderbird, Adium, and Tweetie open.

I made up my mind one weekend but forgot about it, and when I remembered Monday morning, it turned out that NewEgg is serious when they call them weekend deals. This weekend, though, they’re serious when they call them deals! My problem this time is making up my mind. If you’re in the market for a big LCD, do it now!

  • 22″ Asus, 1680×1050, $149.99 plus a $20 rebate and free shipping.
  • 21.5″ Asus, 1920×1080, $169.99 plus a $10 rebate and free shipping.
  • 20″ Acer, 1680×1050, $139.99. (You pay shipping.)
  • 20″ Sceptre, 1680×1050, $129.99 and free shipping.
  • Big on name brands? HP 21.5″, 1920×1080, $204.99. (You pay shipping.)
  • Samsung’s ridiculous 23″, 2048×1152 monitor, $219.99 and free shipping.
  • Hanns-G 22″, 1680×1050, $139.99 plus a $10 coupon code (LCD581) and free shipping.

I’ve got the dirt-cheap Hanns-G in my cart right now, but it’s tempting to pay a tiny bit more and get the full 1080p Asus. Or another $50 and get the ludicrous 2048×1152 one.

Now with Tracking Goodness

I just added JavaScript tags to the global site templates for Google Analytics and Quantcast. Once both begin collecting meaningful data, I can set bloggers up with access to the stats if they have a Google account. Quantcast is a neat site mainly used by marketing firms, but accounts are free and I was curious.

Passphrases

So you all know the usual password advice. But I saw someone talking about “passphrases” the other day, and got interested. Many sites (though far from all) just take whatever you type and run it through a one-way cryptographic hash, so that what actually gets stored is a fixed-length string. Whether my password is blank or the most secure password on the planet, it’s going to look about the same in the database: something like 32 characters of text once it’s been through the one-way hash.
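
A quick sketch of that point in Python (using plain MD5 purely as an illustration; real sites should be using salted, slower hashes): the stored value is the same length whether the input is empty, a ten-character password, or a whole sentence.

    import hashlib

    # A one-way hash maps input of any length to a fixed-length digest.
    # (MD5 is used here only to show the fixed length; it's a poor choice
    # for real password storage.)
    samples = [
        "",
        "P@$$w0rDee",
        "I actually used a couple sentences for my password. Crack this one, n00bs!",
    ]
    for pw in samples:
        digest = hashlib.md5(pw.encode("utf-8")).hexdigest()
        print(f"{len(pw):3d} chars in -> {len(digest)} chars stored: {digest}")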

I think the word “password” brings in some artificial limits. How many people have a space in their password? I bet it’s astonishingly low, and probably because “password” implies that it should be a word.

But if it’s all just going to be hashed, meaning that there’s no reason for a maximum password length, why can’t “I actually used a couple sentences for my password. Crack this one, n00bs!” be my password? I have some rarely-used passwords for very important things that are probably 12+ characters long, and they’re extremely good in the sense that a cracker wouldn’t guess them. But I have to stop and think. Take P@$$w0rDee as a (fictitious) example: anything derived from “password” is bad, but ignore that. It’s ten characters, which is pretty good, and it’s slightly altered from the word it’s based on. And it’s easy to remember “password-ee.” But was it an @ or a 4 for the first “a”? And was it the “r” or the “D” that’s upper-case? For the ones I use every day, it’s all muscle memory. But for the ones I use rarely, it might take me a full minute to type out a ten-character password, because I have to think.

And that’s where “I bet that you can’t crack this password” comes into play as a maybe-worthwhile idea. It’s a plain English sentence that’s foolishly easy to remember, with nothing “weird” about it to hamper my memory. The fact that it’s all based on simple English words is somewhat offset by the fact that it’s so unreasonably long for a normal password that password crackers wouldn’t even bother going out that far.
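
To put rough numbers on that trade-off, here’s some back-of-the-envelope math (the word-list size and phrase length are assumptions, and a smart cracker attacks word-based phrases more cleverly than blind brute force, so treat these as loose estimates): sheer length buys a lot of search space even when the pieces are ordinary words.

    import math

    # Rough search-space comparison under blind brute force.
    printable_chars = 95              # printable ASCII
    ten_char_password = printable_chars ** 10

    word_list = 10_000                # hypothetical attacker dictionary size
    six_word_phrase = word_list ** 6

    print(f"10 random printable characters: ~2^{math.log2(ten_char_password):.0f} possibilities")
    print(f"6 words from a 10,000-word list: ~2^{math.log2(six_word_phrase):.0f} possibilities")

Even under those generous assumptions the six-word phrase comes out ahead, and it only gets better as the sentence grows.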

I think it would also make thematic passwords easier. It’s bad practice to use the same password everywhere, but no one in their right mind can use a different password for every site they visit. But suppose I had “I keep my money safe at the bank” for my bank, and “I take good care of my health and my privacy” as my password for my health insurance provider? (Again, these are fairly bizarre examples and you shouldn’t use anything close to them!) It’s much better if you mix in some non-standard English: “I keep my money safe in el banco” helps slightly. “I keep my $$ safe in el banco” is better.

There are lots and lots of places that don’t support this, and I’m not totally convinced that this is a great idea. But the concept has me pretty intrigued.

sync_binlog on ext3

I’ve mentioned before how the sync_binlog setting in MySQL can be especially slow on ext3. Of course, I wasn’t the first to discover this; the MySQL Performance Blog mentioned it months ago.

I was reading through some of the slides I mentioned in my last post and remembered that I’d left sync_binlog off on an in-house replicated slave. You can set it on the fly, so a quick set global sync_binlog=1 was all it took to make sure the binary log gets flushed to disk on every commit.
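
To get a feel for why that one setting hurts so much on ext3, here’s a rough micro-benchmark sketch in Python (the file paths are made up, and this only mimics the pattern of appending to a log and fsync()ing after every write; it isn’t MySQL). On ext3 an fsync() can end up flushing far more than the one file in question, which is why the hit is so pronounced.

    import os
    import time

    def append_writes(path, count=2000, chunk=b"x" * 512, fsync_each=False):
        """Append small chunks to a file, optionally fsync()ing after each one."""
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        start = time.time()
        for _ in range(count):
            os.write(fd, chunk)
            if fsync_each:
                os.fsync(fd)  # roughly what sync_binlog=1 asks of the binlog
        os.close(fd)
        return time.time() - start

    # Made-up scratch paths; point them at the filesystem you care about (e.g. ext3).
    print("buffered writes:   %.3fs" % append_writes("/tmp/binlog-test-buffered"))
    print("fsync every write: %.3fs" % append_writes("/tmp/binlog-test-fsync", fsync_each=True))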

A while later I blogged about dstat and thought to run it on the in-house MySQL slave. I was confused to notice that the CPU was about 50% idle, 50% “wai” (I/O wait). For a box that’s just replaying INSERTs from production, that’s really bad. Below is a graph of load on the system. Care to guess when I enabled sync_binlog?

Performance with and without sync_binlog enabled

Disk I/O does roughly the same thing, but it’s less pronounced, “only” doubling in volume. But the difference is still pretty insane!

I/O Under Linux

I really didn’t intend to do a post focusing on I/O, an absurdly boring topic to most people, but I’ve recently stumbled across a few related tools that Linux users might find interesting.

  • ionice is standard on newish CentOS, at least. I ran into a situation where I needed to back up an active NFS server. This sounds like it could spell disaster, with nfsd needing fast access and rsync wanting to touch hundreds of gigs of data as fast as it can. The nice command is meant to limit number-crunching, not disk-spinning. But never fear: ionice is here! Slapping ionice -c2 -n7 in front of my mammoth rsync seems to have done the trick: NFS stayed peppy for the duration of the transfer. (There’s a small sketch of this after the list.)
  • Something that I forget often: if you’re copying files around for the first time, or it’s been so long that the files are wholly different, use straight scp or something similar rather than rsync. I don’t have the numbers to back it up, but while rsync is great at copying over only what’s changed, that cleverness is just wasted effort when the source and destination files are completely different.
  • dstat is a colorful replacement for iostat (or, more accurately, not so much a replacement as a tool that folds iostat, vmstat, and some others into one). You can read the man page for dstat to find out plenty, but just typing dstat and letting it run is a good enough starting point. Network and interrupt stats, too!
  • Some of this interest comes from perusing the slideshows on the Percona Conference site. Also worth checking out are their presentation slides in general.
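
As a footnote to the ionice item above, here’s a tiny Python sketch of the same trick (the paths and host are made up; it just shells out to the ionice/rsync combination described in the list):

    import subprocess

    # Run the bulk copy as best-effort class 2, priority 7 (the lowest),
    # so nfsd and interactive work still get to the disk first.
    # Paths and host below are placeholders.
    cmd = [
        "ionice", "-c2", "-n7",
        "rsync", "-a", "/export/data/", "backuphost:/srv/backups/data/",
    ]
    subprocess.run(cmd, check=True)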

Seven Deadly Sins, Nationwide

Call it Six Degrees of Separation of the Seven Deadly Sins. Fark linked to The Atlantic, which linked to Neatorama, which linked to MetaFilter (yay!), which linked to a Las Vegas Sun article. Kansas State University geographers embarked on “a precision party trick — rigorous mapping of ridiculous data” by creating county-by-county maps of the frequency of the seven deadly sins across America. Many of the techniques are debatable, which is perhaps where the “rigorous mapping of ridiculous data” bit comes from: envy is graphed based on burglary and theft statistics, for example, while greed is based on a ratio comparing median income to the number of people living below the poverty line. Gluttony is based on the number of fast-food restaurants per capita, which renders just a few bright red spots in the nation. And pride is seemingly just the average of all the others, which hardly makes any sense.

Here are the full graphics, which can be viewed much larger in full-screen mode. Despite the “rigorous mapping of ridiculous data,” I can’t help but notice trends. Greed seems most prevalent in areas right on the water. Lust seems much more common in the “Bible belt,” but its neighbors just to the north are at the extreme opposite end of the scale. The South is high on wrath, while the North is abnormally low.

What does this data actually tell us? Next to nothing, I’d wager. And yet I can’t help but find it intriguing.

Facebook’s Police Force

Newsweek has an interesting piece this week, entitled Walking the Beat, about the 150 Facebook employees who police the site. They range from “porn cops,” who review uploaded images to make sure they comply with the site’s rules, to site security personnel (who proactively probe for vulnerabilities), to liaisons with the police, who handle 10-20 requests from police departments a day and, more intriguingly, claim to end up being involved in almost “half the crimes that attract national media attention.” Their “undercover” division mingles in online blackhat and spammer communities to keep the site’s defenses up.

An interesting takeaway from the article: 150 of the company’s 850 employees are involved in policing site content, meaning that the division accounts for nearly 20% of Facebook employees. Another interesting aspect for me is the bit about user reactions: their proactive policing has led to many protest groups, but they’re “not too worried: users may join a protest group, but the fact that they haven’t quit the site altogether shows how sticky Facebook can be.”