We’re Back!

Today was just a tremendous string of technical failures.

  • I updated some WordPress modules, and things became unusable. It turns out the virtual machine was just about out of RAM, because I’d never upped the small limit I used for testing.
  • I rebooted to do that, but it wouldn’t start. After at least an hour of debugging, I concluded that absolutely nothing had changed since the server’s virtual image last booted correctly. Restarting xend fixed it.
  • When it finally came back up, networking wasn’t configured. It took me a while to find the appropriate routing information.
  • When that was finally back up, the DNS server didn’t start. It doesn’t keep a log file, and it doesn’t print errors to the console. There were actually multiple problems that kept it from starting, and I’m not sure how they didn’t happen last time, to be honest.
  • Burst.net, our hosting company, inexplicably started having about 1500ms pings for incoming traffic, tried from multiple sources across the country. This didn’t ‘break’ anything fully, though I expect that it didn’t bode well for NTP.

So things are up and running again… We have more RAM now, and a lot of little things have been updated. Keep your fingers crossed there aren’t more things that are broken.

Robert Mueller

There’s a neat article up on Washingtonian.com, The Ultimate G-Man: Robert Mueller Remakes the FBI, talking about Mueller, the director of the FBI. The article’s author, Garrett Graff, commented on writing the article, “I was really surprised at how little institutional attention had been paid to the Bureau by the media.” And it’s so true: even Mueller’s Wikipedia page is practically barren, yet he’s the head of a gigantic government organization talked about in newspapers every day.

It’s not just a biography of the man sworn in on September 4, 2001 (note the date), but also gives some fascinating insight into the FBI, its changing mission, Washington politics, and even tells the incredible story of Alberto Gonzales’ rush to try to “convince” John Ashcroft, barely conscious in the hospital, to certify the legality of the White House’s wiretapping program.

It’s definitely an interesting read.

Chef Matt

Besides a passion for pouring Tabasco sauce all over food, I have another observation that I think qualifies me as a top-notch chef: crunchiness is an important quality of food.

Of course, you probably don’t want crunchiness on everything, but it’s worth considering.

Today’s Meat Sandwich? (Actually, a Meat Wrap, though it’s more because we’re out of bread than health-consciousness.) Pepperoni, turkey, cheese, Tabasco Green Pepper Sauce and…

A few ground-up corn chips. Incidentally, I think I nailed the perfect level of crunch the first time around. Too much and it’s like, err, eating ground-up corn chips, not eating a sandwich. But too few and it’s terribly bland.

I’m so totally going to open a restaurant. My food will come doused in Tabasco Green Pepper Sauce (maybe I can get sponsorship), and all of it will crunch.

About Time

My old LayeredTech box that’s been taken offline was a stratum 2 NTP server, and I kept upping the load until it was set at 100 Mbps in the pool, which had it handling about 15-20 queries/second. NTP was very light on system resources, so for a machine that sits in a data center and is allowed 1,000 GB of traffic a month, I was able to handle a lot of traffic. It was maybe 1-2 GB of bandwidth/month, though an instantaneous look would show a crazy amount of incoming connections. (It’s 75-80 bytes of UDP.)
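As a rough sanity check on those numbers, here is a back-of-the-envelope sketch. It assumes the quoted query rate is sustained around the clock for a 30-day month, which is my simplification rather than anything measured; the function name and the constant are made up for illustration.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def monthly_gb(queries_per_second: float, bytes_per_packet: float) -> float:
    """Traffic in one direction over a month, in GB (10**9 bytes)."""
    return queries_per_second * bytes_per_packet * SECONDS_PER_MONTH / 1e9


# Low and high ends of the figures quoted above.
low = monthly_gb(15, 75)
high = monthly_gb(20, 80)
print(f"{low:.1f} to {high:.1f} GB per direction per month")
# prints "2.9 to 4.1 GB per direction per month"
```

That lands a bit above my rough estimate, and it would be lower if the quoted rate is closer to a peak than an average. Either way it is well under 1% of a 1,000 GB allowance.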

So when we set up the new server (shared between Andrew and me), I conned Andrew into allowing me to run NTP in Dom0 (the ‘root’ domain, versus inside a virtual machine guest, as VMs, for probably-obvious reasons, aren’t allowed to manipulate the hardware clock) and put the server in the pool. (By some strange fluke, the geo-IP code saw the machine’s IP as being in Brazil… The server’s been added to the US zone, but also remains in the Brazil zone, as South America is very underserved: 17 servers in South America, versus 497 in the US alone.)

NTP is one of those things where the default configuration probably gets you 90% of the possible accuracy. Set up a machine and have it sync to pool.ntp.org and your clock will probably stay within 50ms of “true time.” (Assuming you use ‘real’ NTP, which polls up to every 1024 seconds, versus Windows’ conservative 7 day interval.)

However, you can squeeze more out of it. One thing is that, when you’re becoming a member of the pool, you don’t want to use the pool itself as your time source, or you risk forming a feedback loop of sorts, in which your server might ultimately end up looking to itself as its reference or whatnot. So you tend to hand-pick a nice array of servers.

NTP has a concept of stratums (“strata” is probably the correct plural). A server’s stratum essentially indicates the number of steps before you reach a “reference clock,” which is something definitively setting time, such as a GPS receiver (which can be accurate down to the nanosecond level), or even an actual atomic clock. When you sync to stratum 1s, you serve time as a stratum 2, and so on. A higher stratum doesn’t necessarily mean decreased accuracy, but that’s kind of like saying that more hops on a traceroute doesn’t necessarily mean increased latency: in practice, it does, though with NTP the difference is typically very small. NTP is very good, though, at evaluating its clocks.

So I recently redid the server lineup, and pulled out the entries for a couple stratum 2 servers, so that our server is consistently stratum 2. A decent number of stratum 1s are semi-private, but open to those running public servers, which means we can get away with syncing to really nice clocks.
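The stratum arithmetic behind that choice can be sketched in a few lines. This is only an illustration of the "one more than whoever you're synced to" rule; the lineups and function names are hypothetical, not our actual configuration.

```python
def served_stratum(selected_source_stratum: int) -> int:
    """Stratum we advertise: one more than the source we are currently synced to."""
    return selected_source_stratum + 1


def possible_strata(lineup):
    """Every stratum we could end up serving, depending on which source gets picked."""
    return sorted({served_stratum(s) for s in lineup})


# Hypothetical lineups: each number is the stratum of one configured upstream server.
mixed_lineup = [1, 1, 2, 2]
stratum1_only = [1, 1, 1, 1, 1, 1]

print(possible_strata(mixed_lineup))   # [2, 3]
print(possible_strata(stratum1_only))  # [2]
```

With stratum 2 servers in the mix, we could flip to stratum 3 whenever one of them happened to be selected; with only stratum 1 sources, we stay at stratum 2.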

Another thing that makes a big difference is latency. Latency itself is actually not a big deal: NTP looks at how long packets take round-trip and adjusts, so that syncing to servers in Africa is really no different than syncing to local ones. At least in theory, though I’ve found that, in practice, it’s fairly accurate. But what matters is variable latency, especially uneven latency, such as if outgoing packets take a different route than incoming ones, which is increasingly common. (It usually makes no difference, as most people don’t need to calculate round-trip latency precisely…) This is where it helps to have more local clocks, as round-trip latency is small enough that differences between outgoing and incoming routing are minimized.
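To make the asymmetry point concrete, here is a small simulation using the standard NTP offset and delay formulas over the four timestamps of one exchange (client send, server receive, server send, client receive). The delay values and the `simulate_exchange` helper are invented for illustration; this is a sketch, not a measurement of any real server.

```python
def simulate_exchange(true_offset, outbound_delay, return_delay):
    """Return (t1, t2, t3, t4) for a server whose clock is true_offset ahead of ours.

    t1 and t4 are read on the client clock, t2 and t3 on the server clock.
    Server processing time is assumed to be zero to keep things simple.
    """
    t1 = 0.0
    t2 = t1 + outbound_delay + true_offset
    t3 = t2
    t4 = t3 - true_offset + return_delay
    return t1, t2, t3, t4


def offset_and_delay(t1, t2, t3, t4):
    """Standard NTP estimates of clock offset and round-trip delay."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


TRUE_OFFSET = 0.005  # seconds; hypothetical

cases = {
    "far, symmetric": (0.150, 0.150),
    "near, asymmetric": (0.020, 0.010),
    "far, asymmetric": (0.200, 0.100),
}

for name, (out, back) in cases.items():
    offset, delay = offset_and_delay(*simulate_exchange(TRUE_OFFSET, out, back))
    error_ms = (offset - TRUE_OFFSET) * 1000
    print(f"{name}: delay {delay * 1000:.0f} ms, offset error {error_ms:.1f} ms")
```

The symmetric case recovers the true offset no matter how long the path is. In the asymmetric cases the error is half the difference between the two one-way delays, so it can never exceed half the round trip, which is why a nearby server limits the damage.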

A less-common worry, though one worth looking into IMHO, is diversity of time sources. GPS is very commonly used on stratum 1s, because it’s cheap and very accurate. However, it’s so commonly used that some worry NTP’s accuracy would be badly degraded if it were ever to go down. (Of course, if GPS were to go down, we’d probably have bigger problems than our clocks losing sync.)

So I’ve just redone our timeserver setup after a couple servers we used to use ended up being pretty crappy. (One was my old server, which is no longer online, and another inexplicably dropped to stratum 2.) We now sync to six different stratum 1 servers. All are geographically close, giving us about 30ms latency worst-case. A couple are set by CDMA (via cell towers, which get their time from GPS and tend to ‘filter’ it through a Rubidium reference), the PPS one syncs to GPS as I understand it, and two get time from ACTS, by dialing into NIST and syncing time over the phone lines, which apparently gives superb accuracy. (As you get much more controlled latency.) Oh, and the last server… It’s synced to the Naval Observatory’s atomic “master clocks.” (Somewhat to my annoyance, NTP seems to love the UDel server to the exclusion of the USNO clock.)

I’ll monitor things for a few days to see how things go, but I expect very good results. (Then again, we’ve had very good results in the past, too.) I worry that we might hop around between sources a bit, because we no longer have one server that stands head-and-shoulders above the others. Four of the six seem to be extremely local (in terms of latency) and extremely accurate (in terms of their agreement with each other). So far, though, I haven’t seen our root dispersion (roughly, the difference between the smallest and largest offsets, summed across all the servers in our path) go above 20ms. I am seeing a ~5ms spread in offsets between multiple “good” hosts, but then again, a 5ms offset is very good…

Out of Touch

Sean Combs, better known as rapper Puff Daddy P-Diddy Diddy, recently complained that high gas prices have forced him to fly on commercial flights, instead of using his private jet.

“That’s how high gas prices are. I’m at the gate right now. This is really happening, proof gas prices are too high. Tell whoever the next president is we need to bring gas prices down.”

Somehow, I don’t feel an awful lot of sympathy?