Heh

Today’s little bit of “Now that I think about it, that makes sense…” wisdom: if your system happens to be a recursing nameserver, when running something to display open network connections, don’t let it resolve hostnames… Caching keeps it from becoming an infinite loop, but you will end up opening a new network connection for every nameserver in the chain… And each of those requires a DNS lookup…

Wikipedia Geek

I noticed Wikipedia was extraordinarily slow.

Instead of being normal and thinking, “Huh, I can’t reach Wikipedia,” I decided to investigate. One neat thing about Wikipedia is that it’s very open: not just in the sense that anyone can edit Wikipedia or that MediaWiki is open-source, but that you can view their Ganglia monitoring system and even an (off-site) server admin log, which in this case reveals the problem. (Reversing the order of entries so that it’s chronological):

# 14:00 RobH: updated redirects.conf and pushed change for orphaned domains.
# 14:01 RobH: Site is down, go me =[
# 14:06 RobH: Pushed out old redirects.conf and restarted apaches.
# 14:10 RobH: Site back up, slow as squids play catchup.

So there you have it. A broken configuration file got released, breaking all the backend Apache webservers. It was fixed, but it seems that the cache is still being rebuilt, so it might be slow for a while. In the meantime, why not read up on the Wikipedia server cluster? Or some graphs? For example, the daily bandwidth usage: pmtpa is their Tampa, FL colocation facility (serving the US) and knams is in the Netherlands (I think). (yaseo, the yellow one on the legend with no data, is a Yahoo data center in Korea.) You can see that the US cluster is hitting 3 Gbps, while the Europe cluster is exceeding 5 Gbps (!).

In any case, Wikipedia’s back up now. 😛

Blacklists

I don’t put a lot of faith in DNSBLs, which are blacklists of spammer IPs. (They’re hosted as nameserver entries; you’d submit a DNS lookup for 4.3.2.1.example.com, where example.com was the DNSBL, to see if 1.2.3.4 was in the list; if it was, you’d get an “A” record of 127.0.0.2 (customary) back as a match.)
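To make the lookup format concrete, here is a minimal Python sketch of the query described above. The function names are mine, and example.com is just the placeholder zone from the paragraph; treating every resolution failure as "not listed" is a simplifying assumption, not how a production mail filter should behave.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the name to look up: the IPv4 octets reversed, then the DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def dnsbl_lookup(ip, zone):
    """Return the A record the list answers with, or None if nothing resolves.

    This performs a real DNS query, so it needs network access.
    Assumption: any resolution failure is treated as "not listed".
    """
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return None


print(dnsbl_query_name("1.2.3.4", "example.com"))  # 4.3.2.1.example.com
```

A listed address would come back from dnsbl_lookup as something like 127.0.0.2; an unlisted one comes back as None.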

My concern is mostly that, historically, DNSBL providers have gotten carried away and started to list whole netblocks, and then whole netblocks of their enemies who aren’t sending spam… And pretty soon, you’re getting a lot of false positives. (Non-spammers who falsely test “positive” in spam checks.) In other words, you start rejecting legitimate e-mail because the blacklists tell you it’s spam. That’s a risk I’m not willing to take, and it’s an even more unacceptable risk for a business to take.

Other blacklists just don’t work. They match something like 10% of spammers. One blacklist I looked at rejects something like 40% of spam, and 50% of legitimate mail. (Yes, that’s right: it rejects more legitimate mail than spam.) So you probably won’t be surprised to learn that I don’t use any blacklists, other than a running list of people who have sent me obvious spam in the past 14 days. (I should probably lower the time period to something like 5 days, but I’m really not in a hurry to.)

But there are some blacklists that aren’t evil. Take these stats with a grain of salt, because they don’t check for false positives, and because they’re based on a limited sample, but I’ve found the following lists to be reliable:

  • zen.spamhaus.org: 100.00% matches, 101.77 ms. average response time. This merges all the Spamhaus zones, which include not only a list of known, persistent spammers, but also a list of exploited machines, and their “Policy Blacklist,” of things like cable modem netblocks.
  • t1.dnsbl.net.au: 100.00% matches, 260.61 ms. average response time. This is also an aggregate zone of an Australian DNSBL provider, with very good results.
  • karmasphere.email-sender.dnsbl.karmasphere.com: 100.00% matches, 96.31 ms. average response time.
  • hostkarma.junkemailfilter.com: 85.71% matches, 552.92 ms. average response time. It’s very slow to load for me, for some reason, but it has good results.
  • psbl.surriel.com: 50.00% matches, 394.72 ms. average response time. An automated blacklist based on Spamikaze. Incidentally, Spamikaze reports some other blacklists using their code, which I might want to evaluate, too.
  • ubl.unsubscore.com: 42.86% matches, 52.75 ms. average response time. A bit about the list is published on the excellent OpenRBL Wiki. Even though it comes after a list of DNSBLs with “100%” matches, 42.86% is actually very good in the real world.

Between the OpenRBL site and Spamikaze’s list, I do have some more that I’d like to experiment with. I should reiterate that this was a very non-scientific test; it evaluated fewer than 20 IP addresses which have been blacklisted by my servers in the past few days. It assumes that their servers get spam from the same sources that I do; given that many large blacklists contain millions of IPs, this isn’t an accurate assumption at all. All these statistics are really good for is pointing out blacklists that are worth taking a look at.

Spam

Since I was curious… A graph of spamming IPs I’ve encountered in the past 14 days (1,075) by country (78 total).

It’s only showing the top 11. (Excel’s decision.) Note that, as much as people love to blame China for spam (they are #1), the US is #9. (47 IPs, to China’s 139.) You could get the raw data (generated real-time) if you wanted to do your own analysis; Excel’s Pivot Tables proved quite handy in sorting and graphing the results. (Though I wish it was slightly easier to only include 10 values, yet have the values be for the whole data set… The top 10 account for less than 75% of total spam.)
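The "top N, but as a share of the whole data set" problem is easy to do outside of Excel. Here is a rough Python sketch; the function name and the sample country codes are made up for illustration and are not my real data.

```python
from collections import Counter


def top_countries(countries, n=10):
    """Return (rows, total): the n most common countries with their counts and
    their percentage of ALL entries, not just of the top n."""
    counts = Counter(countries)
    total = sum(counts.values())
    rows = [(country, count, 100.0 * count / total)
            for country, count in counts.most_common(n)]
    return rows, total


# Hypothetical sample: one country code per spamming IP.
sample = ["CN", "CN", "CN", "US", "US", "BR"]
rows, total = top_countries(sample, n=2)
for country, count, share in rows:
    print(f"{country}: {count} IPs ({share:.1f}% of {total})")
```

Summing the share column of the returned rows gives how much of the total the top N actually cover.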

I’d estimate that about 1,070 of these 1,075 IPs, by the way, are infected desktop machines. A new reason to keep your anti-virus current: so you stop sending spam!

The Google Phone

It looks like T-Mobile’s going to start selling a phone (made by HTC!) in about a month running Google’s Android. Engadget has more on the phone.

What interests me more than anything is its openness: Palm has an SDK, and so do most of the other smartphone makers. But none are totally free, or actively promoting their use. Android seems like the perfect phone for geeks.

They even had a Developer Challenge giving away lots of money for applications, with some neat ones as results. There’s another gallery of applications out there, too. T-Mobile has accidentally (?) published the manual online, and Engadget has an interview with the CTO, who basically says that, while they don’t really like unlocking phones / tethering, they don’t intend to cripple this phone.

iPhone? G1? Both?

McCain

So I’m admittedly biased against McCain, but I couldn’t help but find his decision to suspend his campaign to be… Strange. For some reason (maybe it’s because it’s what we did in school for years?), I couldn’t help but view it as a strategic move. And in that case, it was brilliant. For a candidate who admitted in January that “[t]he issue of economics is not something I’ve understood as well as I should,” and who, ten days ago, said, “The fundamentals of our economy are strong” despite the difficult times, the move to suspend his campaign shows just how seriously he takes the issue. It’s so important that he’s putting aside his campaign to valiantly fix it.

The move was doubly brilliant, because while McCain suddenly gave the appearance of being very concerned about protecting Americans who’ve lost it all in the economy, it also left Obama looking like he was selfishly continuing to campaign, ignoring the issue.

But Obama was quick to point out, “I think it is going to be part of the President’s job to deal with more than one thing at once… So in my mind, actually, it’s more important than ever that we present ourselves to the American people and try to describe where we want to take the country, and where we want to take the economy.”

The guy over at Electoral-Vote.com has an interesting piece up:

John McCain suspended his campaign, stopped running ads, and said he would not participate in the first debate scheduled for tomorrow at the University of Mississippi in Oxford, MS. He said that the nation is on the brink of a serious recession and this is no time for politics. McCain has been in the Senate 25 years. He knows precisely what will happen if he barges into the office of Sen. Chris Dodd (D-CT), chairman of the Senate banking committee and announces: “OK, Outta here, I’m taking over now.” …

So why did McCain propose cancelling the debate? In a word: politics. By flying into D.C. as the savior he might appear as a man of action to people who don’t know how the Senate works. The reality of course, is that Obama and McCain’s appearance in Dodd’s office would instantly turn the entire event into a political circus.

Of course, he’s not so kind to McCain as he continues:

Balz says that McCain is an impulsive gambler and sees his campaign stalled, what with Obama rising in the polls, so he goes for a Hail Mary again. This is actually the third such gamble McCain has taken in less than a month. First, he picked an inexperienced governor who runs a state with a quarter the population of Brooklyn as his running mate. Then he cancelled the first day of the Republican National Convention due to a weather emergency. Now he wants to cancel a debate due to a financial emergency. There is an increasing risk that the voters will see him as an impetuous and reckless politician whereas Obama comes off as stable and mature. …

The NY Times also has an analytic article on the politics of this. The view there is that Republican members of Congress know very well that throwing $700 billion at Wall St. in a big hurry with no oversight is not popular with the voters. On the other hand, they don’t want to buck their own President who still has a modicum of popularity with the Republican rank and file. They are hoping McCain can bail them out. Democrats don’t want to be seen as obstructionists, but they also see the bailout for what it really is: a ploy to spend so much money that a future President Obama’s hands would be tied for lack of money. In effect this move is Bush’s attempt to “rule from the grave” by severely constraining what the next President can do. Oddly, it might constrain McCain more than Bush since he (McCain) has spending plans, too.

He also links to a survey of 1,000 Americans, in which 86% of respondents say that the debate should still be held. (50% saying it should be “Held as Scheduled,” and 36% saying it should be “Held With Focus on Economy.”) A plurality of respondents (46%) further said that it would be bad for America to cancel the debate.

And the debate isn’t the only thing that McCain called off. McCain was supposed to be on the Letterman show, but called Dave up to cancel. In this clip on YouTube, if you can get past the awkward first minute in which it almost seems like Letterman is getting paid for how many times he can work “John McCain” into a sentence, it actually blossoms into a pretty interesting piece. For one, Letterman seems to actually admire McCain, talking at length about how he’s a genuine war hero. But he also goes on to point out that suspending the campaign was a downright bizarre move, and that it’s especially odd that he didn’t send in his #2. (Or one of her many body doubles from WomenWhoLookLikeSarahPalin.com?)

What interesting times we live in.

/dev/*random

I thought I’d share my latest discovery. Linux has two “random number generators” as pseudo-hardware devices (that is, they’re in /dev, but aren’t actual hardware, much like /dev/null). They’re called /dev/random and /dev/urandom. I never knew, or even thought much about, the difference.

/dev/random will “block” if it runs out of entropy. /dev/urandom never blocks: it keeps serving data, but once the entropy pool is depleted, that data comes from a pseudo-random series that is considered less secure.

The difference is quite useful. For example, when encrypting something, it’s important to have “good” random numbers, hence /dev/random is indicated. On the other hand, the caching resolver I’m running (localish-only) on this server uses /dev/urandom: randomness prevents cache poisoning, but I really don’t want my DNS queries waiting for the “entropy pool” to get refilled.
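If you want to poke at the two devices yourself, here is a small Python sketch. It assumes a Linux-style /dev and /proc layout; the function names are mine, and the blocking behavior you see from /dev/random depends on your kernel version, so treat this as something to experiment with rather than a reference.

```python
def read_random_bytes(n, device="/dev/urandom"):
    """Read exactly n bytes from a kernel random device.

    Unbuffered, so we never pull more from the device than we asked for.
    Pointing this at /dev/random may block on kernels that behave as
    described above.
    """
    chunks = []
    remaining = n
    with open(device, "rb", buffering=0) as f:
        while remaining > 0:
            chunk = f.read(remaining)
            if not chunk:
                raise OSError("unexpected end of data from " + device)
            chunks.append(chunk)
            remaining -= len(chunk)
    return b"".join(chunks)


def entropy_available(path="/proc/sys/kernel/random/entropy_avail"):
    """Return the kernel's entropy estimate in bits, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


print(read_random_bytes(16).hex())
print(entropy_available())
```

In everyday Python code you would normally just call os.urandom(n); opening the device by hand is only here to show which file is being read.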

As an aside, there are some tools to measure the effective randomness of your nameserver’s ports. Comcast, pretty impressively, ranks “Great” on the tests, as do the various caching nameservers in use on our webserver.

More on Spam Filtering

I tweaked the policyd rules and my main.cf a bit more, so that my mailserver lets PolicyD do most of the examinations. The net effect was that Postfix itself (my mailserver, or MTA to be more accurate) stopped rejecting as many hosts, instead allowing PolicyD, a plugin I use to do some more advanced filtering, to handle those hosts. And this is a good thing because Postfix would just reject the message, whereas PolicyD adds the host to a blacklist first.

I noticed an interesting change as a result, though: at any given time, fewer hosts were sitting in my greylist table, and more hosts were sitting in my blacklist. As I type this, there are 60 hosts in my greylisting table, and 985 hosts in my blacklist table. (This isn’t a totally fair comparison, as the blacklist keeps hosts for 14 days, while the greylist table keeps hosts for 3 days.)

I significantly revamped the page listing the banned hosts, both to cache the output (since each of the hundred hosts now involves running 6 DNS lookups and parsing two multi-meg text files), and to list a lot more output. I don’t currently use any DNSBLs (DNS blacklists), but set the page up to show whether a given host matches those blacklists.

At the time of this writing, 96 of the 100 most recent hosts have been in the Spamhaus XBL, which lists “hijacked PCs infected by illegal 3rd party exploits.” 96%! Spamcop is the next best blacklist, with 76% matching, followed by 64% matching results from the Spamhaus PBL, which lists IPs that are for “end-users,” like residential cable modems and dial-up lines: things mailservers shouldn’t be on. (If you’re like me, BTW, the answer is, “Yes, you can remove yourself” if you run a legitimate mailserver on those netblocks.) I’ve also had good results with the NiX Spam blacklist, which I found mentioned on the page for OpenBSD’s spamd.

I’m pretty strongly against using blacklists for anything definitive, as they’ve been historically fraught with problems and abuses, with many administrators eager to list whole netblocks, or even people they don’t like. And my simple setup seems to be just shy of 100% effective in stopping spam, so I have no incentive to go for blacklists anyway. Plus, this is an analysis of how spam shows up in blacklists, but that doesn’t tell us anything about how many legitimate e-mails show up in those blacklists, which is an equally important metric to consider.

But if I were to use DNSBLs, I’d give strong consideration to the following:

  • zen.spamhaus.org, which blends the XBL and PBL, plus a general list of spammers.
  • sorbs.net, which has myriad blacklists; pay special attention to “web” (127.0.0.7) and the dynamic IP list (127.0.0.10).
  • bl.spamcop.net, which is quite well-known.
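Since an aggregate list tells you which sub-list matched through the A record it returns, a page like mine has to translate those answers into names. Here is a minimal Python sketch using only the two return codes mentioned above; the dictionary and function names are hypothetical, and any other code is passed through unlabeled rather than guessed at.

```python
# Return codes taken from the list above; anything else is reported as-is.
SORBS_CODES = {
    "127.0.0.7": "web",
    "127.0.0.10": "dynamic IP",
}


def label_answers(answers, codes=SORBS_CODES):
    """Map each returned A record to a sub-list name where one is known."""
    return [codes.get(answer, "unlabeled (" + answer + ")") for answer in answers]


print(label_answers(["127.0.0.7", "127.0.0.10", "127.0.0.2"]))
```

Keeping the code-to-name mapping in a plain dictionary means a second provider's codes can be passed in through the codes argument without touching the function.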

Russia, Again

From the BBC today: Russian warships have set off for Venezuela for joint exercises unprecedented since the Cold War.

To my elected officials: please, please, please don’t allow for Cold War II.

I’ve long perceived that increasingly large parts of the world hate us. And it’s easy to think, “So what if Venezuela hates us?” or “So what if Iran’s crazy leaders hate us?” But what about when all of these countries start banding together, and get support from an increasingly-deranged superpower with plenty of nuclear weapons? Mutually assured destruction doesn’t work when the enemy is suicidal or just downright crazy. I’m really not convinced that the MAD theory would keep Hugo Chavez or Ahmadinejad from launching nuclear weapons.

Our current foreign policy reminds me of a schoolyard bully. We probably have the biggest and best military. We can stand up, flex our muscles, and get our way. But bullies don’t know how to use diplomacy and avoid fights, only how to come out bloodied but on top. We shouldn’t go praising Russia or Chavez anytime soon, but, uhh, it’s high time we did something other than goading them into war.