Cleaning up Maple Syrup

Even when I have maple syrup on a plate after breakfast, it’s a pain to get off. But that’s nothing compared to what happens if you spill a few drops of maple syrup in your car’s cupholder and don’t realize until the next day. I ended up leaving it for a few more days because I couldn’t think of a way to get it out. Cleaning it up with a napkin wouldn’t work since it’s far too sticky; scraping it wouldn’t work because it would leave a thin layer behind; cleaning solution wouldn’t help a lot, either. Time for a new car?

Driving home tonight, the perfect solution hit me. Freeze it and then chip it out!

Most of us (myself included) don’t have access to liquid nitrogen, but we have something almost as good: cans of compressed air, meant for cleaning out computers and such. Held perfectly upside down, it sprays a liquid that can cause nasty frostbite.

It wasn’t as easy as I’d hoped, but it was the same general plan: spray a bunch of the stuff on to freeze the crap out of it, then go in with a knife and chip the deep-frozen maple syrup chunks off. (Warning: the stuff is dangerously cold. The can carries a frostbite warning. Don’t touch it!) The problem I ran into is that it would thaw out in a matter of seconds, meaning that I had to spray a ton and then very quickly chip it out. (Because there’s nothing like rushing while you’re weilding a knife and some unknown chemical that’s ridiculously cool.)

Nerd node: The Wikipedia page Gas duster covers the contents of this canned air stuff in great detail, and also mentions something I’m noticing after the fact: they put a bitterant in to keep people from huffing the stuff (I’m not sure why, since it’s toxic?), but it’s much stronger if you hold the can upside down (since you’re basically releasing the liquid before it expands, hence using a much greater quantity). So consider doing this in an open area. At the very least, don’t eat nearby, and whatever you do, don’t end up putting your fingers in your mouth. Trust me.

snoopdogg

I saw someone following a user named “snoopdogg” on Twitter. I thought it was hilarious because the posts are exactly what you might expect Snoop Dogg to tweet: things like “bacc on the blocc what up ma hungry twizzles” and “mad love to everyone out here itz gonna be a dope dizzay.”

But with over 100,000 followers, it seems that snoopdogg really is Snoop Dogg. And in this case, it’s a neat use of Twitter for marketing. He just posted, “4 all ma djs get this track out there download link http://tinyurl.com/cgl2ka” — the link goes to a stream of Snoop Dogg Millionaire, what seems to be a hot-of-the-press new song with someone from Slum Dog Millionaire. I have to say that the song is awful (IMHO), but that’s not the point.

Removing Duplicates from MySQL

I Googled that a while ago. How do you remove duplicate rows from a SQL database? (MySQL specifically.)

If you have a ‘simple’ duplicate setup, it’s pretty easy, and lots of other sites go into great detail. (And in that case, you might look into how to require that fields be unique so you don’t have that problem anymore…)

In our case, we ran into an obscure problem where we expected the combination of two fields to be unique. Without publicizing anything secret, we had a summary table of user_id and month. (Not that it’s actually anything secret, but I refer to the table I did this on as “tablename,” just since blogging about our production database schemas seems like a bad idea…) A brief bug introduced some duplicates. My task? Remove the duplicate rows.

So I went about it somewhat backwards, and started wondering how do a SELECT DISTINCT across two columns. You can’t, really. Someone recommended something clever: CONCAT() across the two field names, and a DISTINCT on that: SELECT DISTINCT(CONCAT(user_id, month))… After toying with that a bit, I was able to create a temporary table housing a de-duped version of our data:


create temporary table nodupes
SELECT DISTINCT(CONCAT(user_id, month)) asdf, tablename.*
FROM tablename GROUP BY asdf;

But how does that help? The problem with temporary tables is that they’re, uhh, temporary. If you lived on the edge, you could probably “DROP TABLE original_table; CREATE TABLE original_table (SELECT * FROM temp_table)”, but this has a few horrible problems. One is that you’re throwing away all your disk-based data so you can populate it with a version that exists only in RAM. If anything went wrong, you’d lose the data. Furthermore, in our case, this is a live, busy table. Dropping it in production is awful, and it would take many minutes to repopulate it since it’s a huge table.

So then it was clear what I wanted to do: something like DELETE FROM original_table WHERE id NOT IN(SELECT * FROM temp_table_with_no_dupes), except not using an IN() on a table with hundreds of thousands of rows. And this is where it got tricky. The key is to use that temporary table to derive a new temporary table, housing only the duplicates. (Actually, speaking of “the key,” create one in your temporary table: CREATE INDEX gofast ON nodupes(id); — you’ll need this soon, as it’s the difference between the next query taking 10 seconds and me aborting it after 30 minutes.)

The temporary table of duplicate rows is basically derived by joining your ‘real’ table with the list of de-duped tables, with a rather obscure trick: joining where id=NULL. So here’s what I did:


CREATE TEMPORARY TABLE dupes
(SELECT tablename.*
FROM tablename
LEFT OUTER JOIN nodupes nd
ON nd.id=tablename.id WHERE nd.id IS NULL);

For bonus points, we’re selecting from the real table, so the silly CONCAT() field doesn’t come through into this temporary table.

So once again, I momentarily flashed a sly grin. This is going to be easy! DELETE FROM tablename WHERE id IN(SELECT id FROM dupes);. But then my sly grin faded. It turns out that there are two problems here. The first is that there are still a lot of rows. In my case, I was doing an IN() with about 6,000 rows. MySQL doesn’t like that. But there’s a bigger problem. (Thanks to Andrew for confirming that this is a real bug: #9090.) MySQL doesn’t use indexes when you use a subquery like that. It seems the bug is pretty narrow in scope, but this is precisely it. So that query is awful, basically doing 6,000 full table scans. And on a really big table.

So the solution comes from something I’ve never even thought about doing before. MySQL calls it the “multi-delete.” I call it the, “Holy smokes, this could go really, really wrong if I’m not careful” delete.


-- Don't run this yet
DELETE bb
FROM tablename AS bb
RIGHT JOIN dupes d
ON (d.id=bb.id);

That’s the command you want, but a few caveats. First, change the first line to “SELECT *” and run it to see what happens. Make sure it’s only grabbing the duplicate rows. You’re not able to use a LIMIT in a multidelete query, so you have to be extremely careful here. Second, we tested this multiple times on various test databases before doing it against our master database, and I was still uneasy when it came time to hit enter. Keep backups before you do this. And finally, in many cases, MySQL will expect an explicit “USE database_name” before the multi-delete, where database_name is the database housing the table you’re deleting from. Keep that in mind if you get an error that it can’t find the table, even if you explicitly reference it (via database_name.table_name).

Child Porn and Not-Child-Porn

So I think almost everyone would agree that child porn is awful, and those involved in it should be sent to jail for a really long time.

But there’s a constant barrage of news stories where prosecutors fail to use any common sense at all. A 16-year-old boy might send an explicit photo of himself to a 16-year-old girl, and a school administrator might confiscate the phone and alert police. And then the boy is arrested for creating child porn, the girl is arrested for possessing child porn, and the school administrator who confiscated the phone is also arrested. How does that make any sense at all? In another recent case, two minors (both around 16) videotaped themselves having (consensual) sex and were charged with statutory rape, since both were underage.

I think it’s a case of no one wanting to appear too lax on prosecuting child predators, so no one stops to consider that it makes no sense to arrest children as child predators. Let’s stop arresting children for child porn. And then, let’s get the media to forget the absurd phrase “sexting.” And then maybe law enforcement can start focusing their attention again on creepy old perverts who should be arrested for child porn.

Any time these news stories come up, the response is pretty much universal: “That’s absurd,” or, “They’re making way too big of a deal out of this.” I wish some legislators would have the courage to say the same thing.

Relentless Materialism

A few cool things I’ve found myself looking at today:

  • Leatherman Skeletool. I have one of their old-school multi-tools, but it’s huge. This is a trimmed-down version that looks much slicker.
  • TuneUp, a program to clean up your iTunes libraries. (Found via Uncrate.) It’s supposed to find names for your music to keep things accurately labeled, and also download missing cover art. I haven’t tested it yet, though.

Deal Roundup

I don’t know why I keep doing this, but here’s a roundup of current tech deals that seem enticing to me:

  • HP tx2z laptop, $650. 12.1″ LED-backlit screen (multi-touch), a Turion X2 processor (AMD dual-core), a 320GB disk and Radeon HD3200 video card, webcam, fingerprint reader, and a DeskJet printer. Here’s the link at HP, which doesn’t mention how you can save several hundred dollars.
  • Hitachi SATA 2.5″ notebook drive, 100GB 7200RPM with a 16MB cache. $39.99 at NewEgg.
  • Acer X203Wbd, 20″ LCD at 1680×1050, $139.99 at NewEgg. Fairly good reviews, too.
  • Linksys WRT54GL router, the one that runs Linux so well, $60 at NewEgg.
  • Toshiba laptop, dual-core Intel, 15.4″ (with a crappy 1280×800 resolution), 3GB RAM, 250GB (crappy 5400rpm) disk, Vista Home Premium, and a “SuperMulti” DVD drive, $499.99 at NewEgg.
  • USB-to-IDE/SATA adapter, $25 at NewEgg. Very handy. I guess it doesn’t support SATAII, but it’ll do SATAI and 2.5/3.5″ IDE, great for reading data off of old hard drives. (Or for people who want a backup hard drive but are too lazy to buy one in a case with a USB cord.) It includes a power supply so you can power the drives. Some people seem to be confused or have trouble, though I have not.

Google

Some of us at work were talking over lunch about Google’s servers. There are apparently hundreds of thousands of them, and it’s well-known that they’re based on commodity x86 hardware. Otherwise, they’ve been pretty secretive.

The article came out on April 1st, so many are a little doubtful about its veracity, but its seems fairly believable. Apparently, the Google machines have integrated 12V batteries in lieu of UPSs, and the servers are hosted in shipping crates.

The Digg conversation also makes mentions of GoogleFS, their proprietary in-house filesystem that apparently makes it easy to stripe data across many servers. (Hadoop is loosely an open-source version of the same, though it’s not entirely the same.)

Edit: I included the link this time. After a while I started to doubt the veracity of this article. I’ll expand on some of my thoughts about why it seems odd in teh comments, but wanted to link to this Twitter post that seems to confirm the veracity of this article. Note that it’s dated April 2, which makes me think there’s better than a 50/50 shot that this is real. And this guy is either in on the prank, or corroborates that this was actually presented at a workshop.

Foxfire

This blog post isn’t about Firefox, the leading* web browser. I really do mean Foxfire.

I was going to open with a really nerdy explanation of bioluminescence, the scientific term for biological organisms (i.e., things that aren’t chemically-altered glowsticks) that glow in the dark. But that doesn’t sound interesting. (But do click through the Wikipedia link to see glow-in-the-dark waves.)

So I’ll cut to the chase. Foxfire is a mushroom that glows in the dark, and is apparently fairly common. What’s weird is that it appears as if it was widely known by people like Benjamin Franklin, but is now entirely unknown to people who don’t read my blog.

* Among my tech-savvy peers, at least.

Thoughts on Nessus and Conficker

Does anyone else remember the days when Nessus was a GPL’ed application? It was a top-notch security scanning tool. While nmap is a really powerful port scanner, Nessus was more targeted at helping administrators and auditors spot vulnerabilities in their network. As I recall it, people kept taking GPL’ed Nessus and trying to pass it off as their own commercial product, making a pretty penny on GPL’ed software. The Nessus developers were understandably annoyed, but they did something I wish they hadn’t: they became a commercial service.

It’s still a free download, but it’s kind of like anti-virus software (actually, a lot like anti-virus software) in that they steer you into paying for updates. The version I downloaded appears to be several months behind.

The reason I downloaded it was that I had heard it had been updated to detect Conficker machines. The media had hyped Conficker as an incredibly destructive virus, so I thought I’d set myself up with some tools to detect infestations. It’s always important to be prepared against infestations.

I certainly don’t wish malice on anyone, but I have to say that I was disappointed to see what a failure Conficker was. I don’t know a single person affected by it. It’s kind of like SARS, which after months and months of being hyped as the end of the world, turned out to cause something like 20 fatalities across the world. It was practically a joke. (Except to the 20 people who died.)

If you download the latest nmap, it’s capable of picking up Conficker-infested machines, too. If you check out the changelog, don’t miss “The compile-time Nmap ASCII dragon is now more ferocious thanks to better teeth alignment.”

Uncloaking Treason

People who watch their logs closely (or who drive to the datacenter and end up looking at the console on a front-end webserver) may occasionally see messages like “TCP treason uncloaked” on Linux boxes.

The conspiracy theorists say that it’s hacking attempts and that it crashes Apache. Most of these conversations are at least 5 years old.

The networking experts point out that this is nonsense, and it’s actually just a notification that the remote host shrunk its receive window size more than expected. It’s kind of a nonsensical condition, but it’s not exactly, well, treason.

But here’s a new one: it’s also sometimes caused by a kernel bug. It’s been fixed since 2.6.14 (based on the last comment here), but many, many people run old versions in production. Though actually, the systems I’m seeing this error on are newer than that, meaning it’s not affected by the kernel bug.

So for people seeing the error, it’s usually pretty much worthless data. (Similarly, lp0 is not actually on fire.)