Spam

So my new policy is to keep spam ‘on file’ for three days. It’s filed away as spam so no one sees it, but it’s good for analysis and such, to protect against future spam. Several times a day, I run a little script to delete spam older than three days and optimize the tables, to keep things running fast.
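A minimal sketch of what such a cleanup script might look like. It assumes a stock WordPress comments table (`wp_comments`, with spam flagged as `comment_approved = 'spam'` and a `comment_date_gmt` column); the function name `spam_cleanup_queries` is made up for illustration and is not the actual script described above.

```php
<?php
// Build the two maintenance statements: delete old spam, then optimize.
// ASSUMPTION: stock WordPress schema (wp_comments, comment_approved = 'spam',
// comment_date_gmt). Adjust the names if your setup differs.
function spam_cleanup_queries(string $table = 'wp_comments', int $days = 3): array
{
    // Allow only simple identifiers so the table name can be interpolated safely.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
        throw new InvalidArgumentException("Unexpected table name: $table");
    }
    $days = max(1, $days); // never delete spam newer than one day

    return [
        "DELETE FROM `$table` WHERE comment_approved = 'spam' "
            . "AND comment_date_gmt < UTC_TIMESTAMP() - INTERVAL $days DAY",
        "OPTIMIZE TABLE `$table`",
    ];
}
```

Each returned string could then be handed to something like `$pdo->exec()` from a cron job that runs a few times a day.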

So this table is particularly telling of the spam problem. Akismet is catching just about all of it, so it’s not a big problem for me per se, but the fact remains that, with three days of spam and something like nine months of legitimate comments, spam accounts for right around two-thirds of all comments on my blog. Wow-a-wee-wow!

Geolocation

The concept of matching an IP to a country is known as IP geolocation, often just “IPGeo” or “GeoIP.” There are lots of reasons for using IP geolocation, ranging from the mundane (identifying countries in your webserver logfiles) to the questionable (banning countries from your server to cut down on spam) to the neat (doing it at firewall/router level and redirecting a user to the closest data center).

Most of the work is just done on a country level. You take an IP (72.36.178.234, my server) and look it up in a database, and get “UNITED STATES” as an answer. There do exist databases on finer levels, down to the city, but they’re expensive and often wrong. (I keep getting ads to find hot singles in Mashpee, more than 100 miles away and in a different state… Or maybe it’s Mattapan. Whatever the case, they’re not even close.)

It turns out that you can download a free database of IP-country mappings. It’s not infallible, but they say it’s 98% accurate. The download won’t do you any good on its own, though: it’s a compressed CSV (comma-separated values) file.

In the comments section here, there’s a snippet of PHP code to take the CSV and convert it to a huge series of SQL inserts, which you input into a database… (Hint: for whatever reason, his preg_match is imperfect and leaves a few instances of the word “error” in the middle of the file. It’s probably a bad idea, but I just commented out the “echo error” line.) I end up with a 5.7MB SQL file. You can also just download the thing directly here (warning: 5.7 MB SQL file). Note that, per the license terms, I disclose in the comments that it’s a derivative work of their CSV file.

The other important catch is that IPs are stored as long integers, not ‘normal’ IPs. You’ll presumably want to use PHP + MySQL to get the country associated with an IP, so I’ll provide pseudocode in a minute. PHP provides an ip2long() function, but it only takes you halfway: on 32-bit systems it can return a negative number, which leaves you with sign problems. (Argh!) It’s an easy fix, though, and you want something like the following:

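// "%u" reinterprets a possibly negative ip2long() result as an unsigned integer.
$long = sprintf("%u", ip2long($ip));
// $long contains only digits at this point, so it is safe to interpolate.
$query = "SELECT a2,a3,country FROM ip2c WHERE start <= $long AND end >= $long";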

You then, of course, run $query and parse through it… You get 2- and 3-letter country codes, as well as the full country name. I use it, with good results, in seeing what country comment spam is coming from. (Most of it comes from the US.)
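To make the conversion concrete, here is a small sketch that wraps the same `sprintf("%u", ip2long(...))` trick and checks it against the by-hand arithmetic (a·2^24 + b·2^16 + c·2^8 + d). The function names `ip_to_unsigned` and `ip_to_unsigned_manual` are made up for illustration.

```php
<?php
// Convert a dotted-quad IPv4 address to the unsigned integer form used by
// the IP-to-country table. Returns null if the address is not valid.
function ip_to_unsigned(string $ip): ?string
{
    $signed = ip2long($ip);        // false on invalid input
    if ($signed === false) {
        return null;
    }
    return sprintf("%u", $signed); // fixes the sign on 32-bit systems
}

// The same value computed by hand, one octet at a time.
// (Assumes a 64-bit PHP build so the sum fits in an int.)
function ip_to_unsigned_manual(string $ip): int
{
    [$a, $b, $c, $d] = array_map('intval', explode('.', $ip));
    return $a * 16777216 + $b * 65536 + $c * 256 + $d;
}

echo ip_to_unsigned("72.36.178.234"), "\n"; // prints 1210364650
```

Checking for `null` before building the query avoids silently looking up address 0 when someone passes in garbage.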

A MySQL query isn’t the proper way to do this: there exist binary files with the same data that result in faster lookups. But this is the simplest way to start doing IP geolocation in ten minutes’ time, and, with the query cache enabled, there’s not a ton of overhead.
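One common way to make range lookups like this fast is a binary search over ranges sorted by their start value. The sketch below shows the idea on an in-memory array; it is not a description of how any particular binary file format works, and the ranges and country codes in it are placeholders, not real data.

```php
<?php
// Hypothetical table, sorted by 'start', with non-overlapping ranges.
// The numbers and codes are made up purely for illustration.
$ranges = [
    ['start' => 100, 'end' => 199, 'country' => 'AA'],
    ['start' => 200, 'end' => 299, 'country' => 'BB'],
    ['start' => 500, 'end' => 599, 'country' => 'CC'],
];

// Return the country whose range contains $long, or null if none does.
function find_country(array $ranges, int $long): ?string
{
    $lo = 0;
    $hi = count($ranges) - 1;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($long < $ranges[$mid]['start']) {
            $hi = $mid - 1;              // target lies in the lower half
        } elseif ($long > $ranges[$mid]['end']) {
            $lo = $mid + 1;              // target lies in the upper half
        } else {
            return $ranges[$mid]['country'];
        }
    }
    return null;                         // falls in a gap between ranges
}
```

Each lookup touches only about log2(n) entries, which is why a sorted flat file can beat a general-purpose database query for this job.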

I’m tempted to write some scripts to allow people to ‘browse’ the database, either looking up an IP, or to view it by country.

Update: Weird Silence has a binary implementation of this same database that’s supposedly much faster. The main page is here, the PHP one is here, and the C one is (t)here. (I’m wondering if it makes sense to write a PHP script to call the C version, and what the performance implications would be?)

Update 2: Get your country flags here.

Amazon S3

I really didn’t pay it that much attention, or think about its full potential, at the time it was released. But Amazon’s Simple Storage Service (hence the “S3”) is really pretty neat. In a nutshell, it’s file hosting on Amazon’s proven network infrastructure. (When have you ever seen Amazon offline?) They provide HTTP and BitTorrent access to files.

Their charges do add up — it might cost a few hundred dollars a month to move a terabyte of data and store 80GB of content. But then again, the reliability (and scalability!) is probably much greater than what I can handle, and it’s apparently much cheaper than it would be to host it with a ‘real’ CDN service.

Sadly, I can’t think of a good use for this service. I suppose the average person really doesn’t need to hire a company to provide mirrors of their files for download. (It would make an awesome mirror for Linux/BSD distributions, but I think the typical mirror is someone with a lot of spare bandwidth and an extra server, not someone paying hundreds a month to mirror files for other people… I wonder if there’s a market for a ‘premium’ mirror service? I doubt it, since the existing ones seem to work fine?)

Islam

One thing I ran into in the Obama campaign was persistent rumors that he was a Muslim. I always thought it was pretty dumb that people were actually convinced of this, but it took me a while to realize that the real problem is what they don’t say, but surely think: they think that he’s Muslim and therefore a bad person.

I wish more people were at least marginally familiar with Islam. It’s a peaceful religion with a few fundamentalist nutjobs who interpret their scriptures in bizarre ways. Really not unlike Christianity.

There are two major sects, the Sunnis, with 85% of the Muslim population, and the Shi’a, accounting for around 15%.

Jihad itself is an interesting term. Thought to refer to “holy war,” it’s actually an ambiguous term referring to anything from holy war to a “struggle to improve one’s self and/or society” (per Wikipedia). And even when it does refer to holy war, there are lots of restrictions: it’s not supposed to include non-combatants, for example.

I don’t know half as much as I’d like to about Islam, given its increasing importance in the world. But I do wish that more people would at least stop labeling all Muslims as terrorists.

Business Geek

Tonight I ate at a small restaurant in Amherst, and had the most delicious bottle of root beer ever. Called Virgil’s, it’s kind of hard to put my finger on what makes it so good. As I read the bottle for clues, I noticed that they were publicly traded. I thought this was strange, given that I’ve never even heard of them.

But indeed, they’re REED on the NASDAQ. And they closed out 2006 with a -21% profit margin and a -124% return on average equity. The “past” quarter (ended September ’07–newer results aren’t in) was exceptionally bad, with an almost -40% margin. But as I dug deeper, I realized that this wasn’t such a bad thing. They retired (paid) $1.6 million of debt, after a capital infusion of several million dollars (“paid-in capital”). They still had an outstanding $8.24 million deficit, but it’s maybe a good sign.

I’d still have reservations, though: the past quarter saw $3.88 million in revenue, generated with $5.4 million of operating expense. They’ve got to find a way to either cut these costs, or grow revenues. (Or, preferably, do both!) Recent announcements suggest that Reed has found some new distributors and supermarkets to carry their products, which may be what they need to come into the black.
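As a quick sanity check on those two figures: if the $5.4 million is treated as the quarter's total operating cost (an assumption, since the breakdown isn't given here), the implied margin lands right around the "almost -40%" mentioned earlier. The function name `operating_margin` is just for illustration.

```php
<?php
// Operating margin = (revenue - operating expense) / revenue.
// ASSUMPTION: the operating expense figure covers all of the quarter's costs.
function operating_margin(float $revenue, float $operatingExpense): float
{
    return ($revenue - $operatingExpense) / $revenue;
}

// Figures in millions of dollars, taken from the paragraph above.
$margin = operating_margin(3.88, 5.4);
printf("%.1f%%\n", $margin * 100); // prints -39.2%
```

Put another way, they spent about $1.39 for every dollar of revenue that quarter.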

And after all of this, I realized something: I set out to see if I could buy their soda online. And I ended up scrutinizing the company’s financials.

Retail Politics

One of the things that rocks about New Hampshire is the so-called “retail politics,” where politicians have to get out and work to convince us that we should vote for them. Running TV ads and blowing Iowa and New Hampshire off doesn’t work, as Giuliani proved.

Last weekend, we went to a house party in Merrimack (hosted by a fellow ham, actually), where a few dozen people came to hear Massachusetts Governor Deval Patrick speak about Obama. If you look at the US as a whole, this is a terrible proposition: the governor of Massachusetts takes an hour out of his day (well, probably more like three, if you account for travel time and all) to talk to thirty or so people? And yet this is what it takes.

Governor Patrick, by the way, is an awesome guy. He came around and talked to each person in the room. I told him I was going to school in Massachusetts, and he thanked my mom for “loaning” me to them. He seemed to genuinely care.

[Photo: Governor Patrick in NH, by n1zyy, on Flickr]

He has this incredible way of, when talking to you, making it seem like you’re the only person in the room. Here’s the governor of Massachusetts, coming up to someone’s house in New Hampshire, and talking to my mom and me as if he’s an old friend.

He spent a good deal of time just mingling, before he finally addressed us as a crowd and talked about Obama. He kept that brief, and then asked us a lot of questions. At one point, he was talking, and happened to say something along the lines of, “And I’ll tell you why I–” right as the home phone rang. Being the awesome person he is, he added, “And I’ll tell whoever’s calling,” and then picked up their phone.

[Photo: Answering the Phone, by n1zyy, on Flickr]

“Hello, this is Governor Patrick.” I don’t really know what the person on the other end said, but I can only imagine they were somewhat confused. “We’ve got quite an enthusiastic crowd here for Obama,” he said, before asking the caller if they supported Obama. “No? Well then I’m afraid whoever you’re calling for isn’t home,” he joked before handing the phone over to the home’s residents.

Whoa’8

One thing that I find oddly fun is thinking about possible Pres-VP combinations.

Some that come to mind are obvious: Clinton-Edwards, Obama-Edwards… Each has its own nuances that are neat to explore. But there’s another reason I think it’s interesting. In the business world, if you have a fragmented market–many sellers in a market all competing–it makes sense to try to merge some of the small guys to become a powerhouse. (Obviously, you can take this too far and become an anti-competitive monopoly.) Where this tactic is especially important is when the markets are bad. (We’ve discussed at length whether Ford and GM should merge.)

I think the Democratic race is fragmented. (Republicans, too, but in a different way right now.) We have three candidates all attracting substantial support. I have to wonder what would happen if, say, Obama somehow convinced, say, Edwards to be his running mate. Would they form a powerhouse?

There are a lot of combinations that are laughably improbable. I don’t think we’ll ever see {Clinton, Obama}-{Romney, Giuliani}. They’re at opposite ends of the spectrum, and I think {Clinton, Obama} fans would be turned off that they’d picked {Romney, Giuliani} as a running mate, and vice versa. But I do like the idea of bipartisan couplings. I also don’t think that an Obama-Clinton (or Clinton-Obama) ticket is likely. They’ve spent so much time at each other’s throats that I can’t see it working.

But here are two that I find, to quote Kucinich, viable:

Obama-Richardson: They complement each other well, and, in my opinion, are both awesome candidates. Richardson is far behind in the polls, and thus doesn’t really stand a chance of getting the nomination; I’m far from the first to talk about him being in it for VP. Obama has Senate experience; Richardson has gubernatorial experience. Obama doesn’t have much foreign policy experience; Richardson has heaps of it. Obama brings an exciting, fresh perspective; Richardson brings decades of solid experience. (I’m not implying that Obama has no experience, nor that Richardson is ‘stale’–neither is true.) And neither of them is white, which is neat in a way.

Obama-Huckabee: Hear me out! Of the Republicans, I think Huckabee is my favorite. I certainly don’t agree with every position of his, but there are two things I really like about him. One is that he’s a good, honest guy. I think anything he does will be because he thinks it’s truly the right thing to do, not because it’ll make him rich. I think Obama-Huckabee would be the “cleanest” Administration in history. (Not in borderline-racist “clean and articulate” terms, but in “actually fighting for the American people and not doing anything crooked” terms.) And the second thing is that I love the way he views his faith–a call for him to do good on Earth. A religious, conservative Republican against the death penalty and in favor of helping the poor? Wow-a-wee-wow! There are some big differences between them, and I don’t know how reconcilable they are. But there comes a third benefit, too: done right, I think a bipartisan running ‘couple’ attracts the most votes. A Republican who would never go for Obama-Clinton might be convinced to vote for Obama-Huckabee. Not to mention centrist independents.

Desk Cleaning for the OCD

I tend to border on obsessive-compulsive. I wax the wheels on my car. I’m still not happy with the enormous performance gains I’ve gotten by tuning my WordPress setup to move from 4 to 400 (dynamic) pages per second.

So it stands to reason that how I clean my desk is… unique. The problem is that the desk is synthetic wood, so there exist a total of zero cleaning products that work on it. Using something like Windex cleans it but leaves it looking even more dull. And don’t even think about using a wood cleaner. I used Pledge once. It looked great, but it remained slippery for about a week. The Pledge is apparently supposed to sink into the wood. This is somewhat difficult when the desk isn’t made of wood, so you instead get an incredibly slick, oily desk.

I think I finally found the key, though: car wax. You clean with one of the many cleaners just to get junk off of it–I used Simple Green, but Windex would work just as well. And then you pour liquid car wax on and apply it with a cloth as if you were applying it to your car. Wait a while for it to dry, and wipe the white hazy stuff off. The result is a high sheen. It didn’t do as well as I’d like with filling in the scratches, but it looks much better overall.

I admit it: I’m a total dork. I just waxed my desk. But… It works.

Right Down through the Wire

It’s time! I’m going to go grab some lunch, but then I’m going out to cast my vote, run a couple errands, and then spend the rest of the day on Get Out The Vote activities. When the polls close at 8, I’ll breathe a sigh of relief that I can sit down, but I think my nerves will be shot, too, as I go somewhere with my fellow supporters to watch the results come in.

New Hampshire residents, don’t forget to vote!