BitTorrent is Cool

Having recently pulled down some updates via BitTorrent, I discovered some neat things about the protocol. Obviously, it’s basically a peer-to-peer filesharing tool, but it has some clever mechanisms that keep it working well. Files are split up into many pieces, and each of those chunks can be downloaded from anyone. (Apparently, various file-integrity provisions exist, too, to help guard against people injecting garbage.)

The first neat thing is the concept of “choking” selfish peers. As I download chunks, my torrent client will automatically start sharing the completed chunks. If my client detects that you’re downloading completed pieces I have, but not sharing the completed pieces you have, you get “choked”: I stop uploading to you. (Periodically, an “optimistic unchoke” will kick in, giving you another chance.) This greatly increases the incentive for you to share files: otherwise, everyone would want to download only, meaning that very few people would actually have the file.
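
To make that concrete, here’s a rough sketch in Python of what the choke/unchoke bookkeeping might look like. The data structures, rates, and the four-slot limit are my own inventions for illustration; this is not how any particular client is actually written.

# Toy sketch of choking logic; peer structures and thresholds are made up.
import random

def update_chokes(peers, upload_slots=4):
    """Unchoke the peers that upload to us fastest; choke the rest."""
    # Sort peers by how quickly they've been sending us data recently.
    by_rate = sorted(peers, key=lambda p: p["download_rate_from_peer"], reverse=True)
    unchoked = set(p["id"] for p in by_rate[:upload_slots])

    # Optimistic unchoke: give one random choked peer a chance anyway,
    # so newcomers (who have nothing to trade yet) can get started.
    choked = by_rate[upload_slots:]
    if choked:
        unchoked.add(random.choice(choked)["id"])

    for p in peers:
        p["choked_by_us"] = p["id"] not in unchoked
    return peers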

The obvious problem is that the file is useless if even one piece is missing. If you take a random 1MB chunk out of the middle of Microsoft Office, the whole program will fail to work. (Not that I condone downloading MS Office via BitTorrent. After all, it’s free from school!) So it’s important to make sure that no pieces become unavailable, and most clients implement a neat algorithm, called “rarest first,” to handle this. The name sums it up pretty well: as each client advertises which pieces of the file it has, it goes out and grabs the least-available piece first. After I finish that piece (and, by necessity, begin advertising it to peers), I go and get the next-rarest piece. Since the order I acquire pieces in doesn’t matter to me, but the whole file is useless without every part, this lets each client help raise the availability of the scarcest pieces.
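
Sketched the same way (again, invented structures for illustration, not real client code), rarest-first amounts to counting how many peers advertise each piece and grabbing the least-common one I still need:

# Toy rarest-first piece picker; peer_bitfields maps peer id -> set of piece indexes.
def pick_next_piece(peer_bitfields, my_pieces, num_pieces):
    """Return the index of the rarest piece I still need, or None if nothing is available."""
    counts = [0] * num_pieces
    for pieces in peer_bitfields.values():
        for i in pieces:
            counts[i] += 1

    needed = [i for i in range(num_pieces)
              if i not in my_pieces and counts[i] > 0]
    if not needed:
        return None
    # Fewest copies among peers = rarest; ties are broken arbitrarily here.
    return min(needed, key=lambda i: counts[i])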

Overall, the more I read about the inner workings, the more impressed I am.

BitTorrent

A few tips, in the hopes that they’ll help someone else. (Aside: don’t download illegal stuff with BitTorrent. Do download the many awesome, legal things on BitTorrent, such as Ubuntu torrents.)

  • You can encrypt your BitTorrent traffic, which is aimed at circumventing ISPs that feel like being pains and blocking traffic. However, “Enabled” isn’t the value you want; you want “Forced.” In uTorrent, this is under Preferences -> BitTorrent.
  • If you don’t upload at all, other nodes will “choke” you by refusing to talk to you. It doesn’t have to be entirely equitable, though; I’ve capped my upload at a pretty small number, but am downloading around 100 kB/second (800 kbps).
  • You’ll have a port number for incoming connections. If this port isn’t getting through (such as if you have a “default-deny” policy on your firewall), things will work, but they’ll be unbearably slow. As an aside, if you’re behind an OpenBSD firewall (using pf), have a local IP of 192.168.1.79, and use the randomly-selected port 26689 as your local port for BitTorrent, the redirect rule looks like rdr on $ext_if proto tcp from any to any port 26689 -> 192.168.1.79 port 26689. Remember to flush the rules (pfctl -F rules) and then (possibly required? possibly done automatically with the flush?) load them back in (pfctl -f /etc/pf.conf).

With these three principles in mind, my (legitimate) download went from 0.8 kB/sec to 145 kB/sec.

Huh, a neat tip… If you pick a torrent from one site, but it’s identical to what other sites have, add the other sites’ trackers to the first download, which will give you more peers!

Oh, another tip: don’t arbitrarily set a download limit! My downloads wouldn’t break 145 kB/sec or so, until I realized that I’d set a limit of 150 kB/sec. I removed the limit and am suddenly at 400 kB/sec. (Incidentally, our available bandwidth has suddenly plunged to nothing…)

One final note: Peer Guardian is good, but don’t run it unnecessarily, since it blocks a lot of legitimate traffic. Including, oddly, Steam’s servers (for games like Counter-Strike and TF2), apparently because Steam uses Limelight’s CDN, and Peer Guardian’s lists have dubbed Limelight bad.

Clarity

I saw a reference to RAID 6 and didn’t recognize it, so I did what anyone would do: I Wikipedia’d it (I’m going to make that a verb):

RAID 6 extends RAID 5 by adding an additional parity block, thus it uses block-level striping with two parity blocks distributed across all member disks. It was not one of the original RAID levels.

So that’s why I hadn’t heard of it: it’s not an “original” RAID level. (I don’t subscribe to RAID trade publications, so I wasn’t aware of it.) The description is a good one-liner, but there’s more text that follows. Surely, it will give me good insight into exactly what this means and how it works in an applied setting.

RAID 5 can be seen as special case of a Reed-Solomon code.[5] RAID 5, being a degenerate case, requires only addition in the Galois field. Since we are operating on bits, the field used is a binary galois field GF(2). In cyclic representations of binary galois fields, addition is computed by a simple XOR.

After understanding RAID 5 as a special case of a Reed-Solomon code, it is easy to see that it is possible to extend the approach to produce redundancy simply by producing another syndrome; typically a polynomial in GF(2^8) (8 means we are operating on bytes). By adding additional syndromes it is possible to achieve any number of redundant disks, and recover from the failure of that many drives anywhere in the array, but RAID 6 refers to the specific case of two syndromes.

Wait, what? Reed-Solomon? Degenerate cases? Galois fields? Binary galois fields in cyclic representations? Special cases of the Reed-Solomon code? Polynomial notation of the Reed-Solomon field? I’m lost. Very lost, in fact. Here I was hoping for an expansion over a one-liner that I pretty much understood but that was somewhat vague. And instead I get… I’m not even sure what I got.

More on Time

I’m worried you’ll all think I’ve snapped and become obsessed with time. It’s not quite that bad. But here’s another post about time.

A lot of Windows machines seem to sync to time.windows.com, and Apple has its own time.apple.com service. I came across this interesting (to me) post about Apple’s service, and started doing some looking. First, an important bit of terminology (for those who don’t follow my every post): the stratum of an NTP server is basically its place in a hierarchy of systems. Stratum 1 is the top, meaning that it’s directly connected to an accurate reference (e.g., a GPS receiver or other hyper-accurate time source). Lots and lots of people sync their clocks to stratum 1 servers, and thus become stratum 2. And lots of the stratum 2 servers join the pool, meaning that people who sync to them become stratum 3. With each step down the hierarchy, you increment the stratum by one.

The impact of stratum varies: you become more and more removed with each step, which obviously introduces error. But the level of error varies: it’s conceivable that strata 1-3 would all be on the same LAN, but it’s also possible that the stratum 3 server I sync to is sitting on a 100 Mbps line in a data center in Boston, gets its time from a stratum 2 machine with a 128kbps satellite link in Zimbabwe, which gets its time from a stratum 1 on a 3600bps dialup line in Rhodesia. (Is that even a country anymore?) So the “loss” of being several strata down varies somewhere from maybe 1ms up through many seconds.

One final note: the following commands are being run on a node that’s itself a stratum 2 NTP box, so pay attention to the “offset” field. (And note that it’s in seconds, not milliseconds, just to add a healthy dose of confusion.)

# ntpdate -q time.windows.com
server 207.46.197.32, stratum 3, offset 0.016801, delay 0.08435
 9 Mar 22:47:55 ntpdate[22072]: adjust time server 207.46.197.32 offset 0.016801 sec

First, it’s inexplicably a stratum 3 host. This isn’t necessarily bad, as I discussed, so much as odd: you’d think that time.windows.com could at the very least be stratum 2, if not holding a GPS itself. (“time.windows.com” may actually be a number of machines sharing an IP, with GeoIP or whatnot.)

For comparison, the nodes I sync to:

#  ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-time-C.timefreq .ACTS.           1 u  142 1024  377   18.854    4.204   0.025
-india.colorado. .ACTS.           1 u   93 1024  377   23.777   11.419   3.751
+rrcs-64-183-56- .GPS.            1 u  408 1024  377   55.427    0.217   1.047
*tick.UH.EDU     .GPS.            1 u  483 1024  377   25.108   -2.053   0.082
+clock.xmission. .GPS.            1 u  145 1024  377   38.306   -0.861   0.019

Pay attention to the “offset” column, which here is in milliseconds. The timeservers range from pulling me back 2 milliseconds to advancing me 11 milliseconds. (Although also note the “-” next to that +11ms host, indicating that it’s considered a fairly bad source. By comparison, the “*” indicates the current ‘best’ server, which set my clock back 2ms.)

Microsoft’s server is trying to pull me 0.016801 seconds, or 16.8ms. As shown above, this is even worse than the server that’s being rejected for being too far off. (Of course, it’s worth repeating that this is less than 17 milliseconds, which is more than enough accuracy if your goal is simply to keep your clock right!)

How about Apple?

# ntpdate -q time.apple.com
server 17.254.0.28, stratum 2, offset 0.004297, delay 0.07193
server 17.254.0.31, stratum 2, offset 0.003697, delay 0.07074
server 17.254.0.26, stratum 2, offset 0.004346, delay 0.07195
server 17.254.0.27, stratum 3, offset 0.004369, delay 0.07195
 9 Mar 23:23:37 ntpdate[4321]: adjust time server 17.254.0.31 offset 0.003697 sec

This one looks different from the Microsoft one: there are four IPs for time.apple.com. Three of the four are at stratum 2 (one reports stratum 3). The offset here is better: 0.003697 seconds, or 3.7ms. It’s still pulling me in the wrong direction (-2ms was ruled the best, whereas this wants to advance me +4ms), but it’s much closer to accurate.

This got me wondering: Apple has 4 A records for time.apple.com… What does the NTP pool look like?

# ntpdate -q pool.ntp.org
server 67.201.12.252, stratum 3, offset 0.002990, delay 0.05959
server 69.36.240.252, stratum 2, offset 0.003088, delay 0.06898
server 98.172.38.232, stratum 2, offset 0.022841, delay 0.08508
server 209.67.219.106, stratum 3, offset 0.019785, delay 0.02614
server 216.184.20.83, stratum 3, offset 0.012001, delay 0.10208
 9 Mar 23:31:21 ntpdate[7524]: adjust time server 69.36.240.252 offset 0.003088 sec

It returns five hosts. And a quick aside:

;; ANSWER SECTION:
pool.ntp.org.           948     IN      A       209.67.219.106
pool.ntp.org.           948     IN      A       67.201.12.252
pool.ntp.org.           948     IN      A       98.172.38.232
pool.ntp.org.           948     IN      A       69.36.240.252
pool.ntp.org.           948     IN      A       216.184.20.83

They have a long expiry: 948 seconds, or about 16 minutes. Sure enough, asking again gets me the same 5, but with a decremented TTL. They do run GeoIP, though, so my other server is getting a different list, even though both are in the US. (But it keeps the same list, too.) But anyway, back to the ntpdate output…

I’d actually run this earlier, and gotten all stratum 1’s and stratum 2’s, which I was going to post. Now it’s all 2’s and 3’s. The dork in me wants to run this every ttl seconds and keep logs.

This shows where the NTP clock selection algorithm shines. I know from belonging to the pool that we get “monitored” by a server that checks our time a few times an hour and pulls us out of the pool if our clocks are off (I think a second is the tolerance). Here, the worst server is 0.022841 seconds (22.8ms) off, and one of the best is picked. (The “best,” the one closest to mine, is actually discounted because it’s a stratum 3, and the stratum 2 one is presumed to be more accurate. Closeness to your own clock is generally not a good metric, unless you’re running these on a server that’s already kept to good time.) But the net result is an offset of about 3ms: a slight improvement over Apple, and a big improvement over Microsoft.
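
As a toy illustration only (the real NTP selection and clustering algorithms are considerably more involved), you can think of it as ranking candidates by stratum first and network delay second, rather than by whoever happens to agree with your own clock. Plugging in the numbers from the ntpdate output above picks the same server it did:

# Toy server ranking; NOT the real NTP selection algorithm, just an illustration.
def rank_servers(servers):
    """servers: list of dicts with 'host', 'stratum', 'delay', 'offset' (seconds)."""
    # Prefer a lower stratum, then a lower round-trip delay; the winner's
    # offset is what you'd steer your clock by.
    return sorted(servers, key=lambda s: (s["stratum"], s["delay"]))

pool = [
    {"host": "67.201.12.252",  "stratum": 3, "delay": 0.05959, "offset": 0.002990},
    {"host": "69.36.240.252",  "stratum": 2, "delay": 0.06898, "offset": 0.003088},
    {"host": "98.172.38.232",  "stratum": 2, "delay": 0.08508, "offset": 0.022841},
    {"host": "209.67.219.106", "stratum": 3, "delay": 0.02614, "offset": 0.019785},
    {"host": "216.184.20.83",  "stratum": 3, "delay": 0.10208, "offset": 0.012001},
]
best = rank_servers(pool)[0]          # 69.36.240.252, the same host ntpdate chose
print(best["host"], best["offset"])   # offset ~0.003 s, the ~3ms figure above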

Aside: I just ran the Microsoft one again, and its offset had dropped to 2.3ms. So I ran ntpdate against Apple and Microsoft in one command, letting the NTP algorithm do its magic… Suddenly Microsoft was suggesting a -1ms offset, with Apple pushing a 4ms increase. They’ve all converged a bit, although the time.windows.com one seems to jump around a bit more than the others.

The moral of the story? All of these timeservers are really good if you’re just looking to keep the clock in your system tray right. If you’re a total nut for the right time, the pool is best, followed by Apple, then Microsoft.

Addendum: Microsoft hosts one of NIST’s stratum 1 servers, which makes it all the stranger that time.windows.com is all the way at stratum 3.

Sysadmin

I like to run a really good Windows machine. Firefox is my default browser (although IE’s come leaps and bounds since its “I’ll merrily install any program a webpage asks me to!” days), I keep the system free of viruses and spyware, I have a “background” disk defrag tool, I routinely run CCleaner and the like to purge accumulated cruft, and so forth. In short, I’m a system administrator’s dream. (Actually, I think I’m their nightmare, since the only time I contact them is when I have a really hard question, and I never do anything they expect… But I digress. If I administered a set of desktop nodes, I’d want them to be set up like mine.)

If I ran a computer network, though, I really wouldn’t trust normal people to keep up with these things. Virus definitions need to be updated, virus scans need to be run, recycle bins need to get emptied, stale caches need to get purged, clocks need to get synced, and disks need to be defragged. I do this naturally on my desktop machine, so I don’t think of it as taking a lot of time, but if you asked me to maintain a network of, say, 30 PCs, I’d want to cry.

There exist, of course, a bajillion different tools for administering clusters of PCs. But what I find interesting is that I can’t think of any that really do what I want: make sure certain programs are installed, and run them unattended periodically. Most solutions still seem like they’d require me to go to each PC and do my work, or they’d limit things: an increasingly common approach is to just reimage each computer when it reboots. In some cases, though, this is totally undesired: people might forget to use their network drive, losing all their work when they reboot. Or they might need to install a legitimate program for their work, and you’d end up losing a lot of productivity as they’re forced to reinstall every time they reboot. (Which means that they won’t reboot often, which complicates other things.)

Time

So I’ve mentioned before that I run an NTP server. It’s stratum 2, which means it gets its time from a stratum 1, which is itself synced directly to something reliable. The main goal of NTP is to keep clocks in sync, and it’s pretty accurate, down to a fraction of a second, which is more accuracy than most people need. All of my computers will now agree on the time to within a second.

The ultimate source, of course, is the atomic clock. But there isn’t an atomic clock, per se. There’s actually an array of them, each using cesium or hydrogen as an atomic reference. Collectively they form “the” atomic clock, which is used as a frequency standard.

It’s all well and good to keep your computer clock (and wristwatch, and microwave, and oven, and wall clock…) synced within a second, but some things need more accuracy. The USNO (US Naval Observatory, in charge of maintaining the atomic clock system) explains one common scenario well: systems for determining one’s location, such as GPS and LORAN “are based on the travel time of the electromagnetic signals: an accuracy of 10 nanoseconds (10 one-billionths of a second) corresponds to a position accuracy of 10 feet.” There are also lots of other scientific uses for extremely precise time, many of which I couldn’t even begin to understand the basic premise of. But suffice it to say that there are actually a lot of times when knowing the time down to the nanosecond is important.
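
The arithmetic behind that quote is easy to check; radio signals travel at roughly the speed of light, or about a foot per nanosecond:

# Quick sanity check on the USNO figure: light travels about a foot per nanosecond.
c = 299_792_458          # speed of light, meters per second
error_seconds = 10e-9    # 10 nanoseconds
error_meters = c * error_seconds
error_feet = error_meters / 0.3048
print(round(error_meters, 2), "m, or about", round(error_feet, 1), "feet")  # ~3.0 m, ~9.8 feet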

Things like NTP don’t cut it here. You can get down to the millisecond, but you need to be about a million times more accurate. (A millisecond is a thousand microseconds, and each microsecond is a thousand nanoseconds.) So how do you keep the exact time? It turns out that there are actually several ways. One way (decreasingly common) is to keep an atomic clock of your own. You can buy a “small” (the size of a computer…ish) device that has cesium or hydrogen or rubidium inside of it, which keeps pretty accurate time. Over time it’ll wander, but at least short-term, it’s quite accurate.

One of the first ways is WWV, a shortwave radio station. (And its Hawaiian sister station, WWVH.) They run continuously, disseminating the exact time via radio as observed from the atomic clock system. In the past I’ve synced my watch to this source. More notable, in a behind-the-scenes type of way, is WWVB, a low-frequency (60 kHz) radio broadcast. This is what all your “atomic wall clocks” sync to. (Incidentally, I’ve read that most of them are fairly cheaply built, meaning that their time is really not accurate to more than a second.) Another interesting sidenote is the deal with their antennas: a quarter-wavelength antenna at such a low frequency is about 1,250 meters tall, or about 4,100 feet (over three-quarters of a mile). But with some wacky designs they can overcome this (although pouring 50,000 Watts into it also helps).
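
The antenna figure checks out with the same sort of quick math:

# Quarter wavelength of WWVB's 60 kHz carrier.
c = 299_792_458            # speed of light, meters per second
f = 60_000                 # hertz
wavelength = c / f         # ~4,997 meters
quarter = wavelength / 4   # ~1,249 meters
print(round(quarter), "m, or about", round(quarter / 0.3048), "feet")  # ~1249 m, ~4098 feet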

The problem with “straight” receivers for WWVB, though, is that you have to figure in the time it takes for the signal to reach you, which is rarely done all that well (if at all). Instead, a more common technology is used: GPS.

It turns out that GPS carries insanely accurate time. Wikipedia has a really good article on it. Each GPS satellite carries an atomic clock onboard, and people on the ground keep it synced (with nanosecond accuracy) to the atomic clock system. There’s some funky correction going on (relativistic effects, among other things) to keep things accurate. GPS has a claimed accuracy of 100 nanoseconds, although people have found that it’s actually about ten times better, down to 10 nanoseconds or so.

As an aside, GPS in general is an interesting read. There’s a lot more going on than meets the eye. I recently dug up an old GPS and wondered if it needed an “update” to get new satellite positions: with ham satellites, we get periodic updates for our tracking software to account for changes in their orbits. GPS has a neat solution, though: the satellites broadcast this data. Actually, more accurately, each one broadcasts the data for all the satellites, so that seeing one satellite will fill you in on the whole constellation. Another tidbit: there used to be Selective Availability, basically a deliberate introduction of error into the signal. The premise was that we didn’t want enemy forces using it: imagine a GPS-guided rocket, for example. So we introduced error of about 30 meters for a while. Ironically, it was ended because our own troops (before Iraq) couldn’t get the military units, so they were just buying off-the-shelf civilian units and incurring the decreased accuracy. So Selective Availability has been turned off, and there are indications that the change is permanent. A third interesting tidbit is that the GPS satellites carry much more than might meet the eye, including equipment monitoring for nuclear detonations.

The timekeeping problem is what to do with the time once it reaches the GPS receiver, though. High-end GPS units will provide a pulse-per-second (PPS) signal, which you can hook up to a computer via serial and achieve great accuracy. But there are all sorts of considerations I never thought of. A little time passes between when the pin actually changes state and when the operating system has processed it, so there are special kernel modifications available for Linux and BSD to get the kernel directly monitoring the serial port and greatly speed up its processing. I also discovered the Precision Time Protocol (commonly known by its technical name, IEEE 1588), which is designed to keep extremely accurate time over Ethernet, but apparently requires special NICs to truly work well.

I’ve also learned another interesting tidbit of information. CDMA (which is a general standard, not just the cell phone technology that Verizon uses) apparently requires time down to the microsecond to keep everything in sync: the multiple towers and all the units (e.g., phones) have to transmit at the right times. The easiest way to keep all of the towers synced to a common standard was to put a GPS receiver at each tower and sync the system to that. Thus CDMA carries extremely accurate time derived from GPS, which has led to some interesting uses. It’s hard to get a GPS signal indoors, so they now make CDMA time units: they sit on a CDMA network in receive-only mode, getting the time but never taking the “next step” of actually affiliating with the network. This lets people get GPS-level accuracy inside buildings.

Public Safety

For those of you who don’t monitor police scanners regularly, I’d like to introduce what can be considered a fairly scary fact: their computer systems go down all the time.

Where it usually comes up is when they try to run a license plate or a person, or to query NCIC or similar. The officer calls it in and waits a few minutes before the dispatcher calls back that the (remote) system is down. When you’re monitoring multiple neighboring towns, you’ll often notice that they all lose it at once: the backend servers are going down.

This drives me nuts. It’s usually not a huge deal, but now just imagine that you’re the police officer, and the guy you pull over, but can’t run through the system, actually has a warrant out for his arrest. For murdering a police officer. But you have no clue, because the system is down. Of course this is extreme, but it’s always been said that traffic stops are among the most dangerous and unpredictable things an officer does. They never know whether it’s a nice old lady or someone with a warrant out for their arrest. A decent number of arrests come from pulling people over for traffic violations and finding further violations, like cocaine or guns, or an outstanding warrant.

My webserver sits in Texas on what’s basically an old desktop system. And it seems to have better uptime than these systems. As biased as I am in favor of my blogs, even I will admit that police databases are more important. Further, if my blogs were routinely unreachable, I’d be furious with my hosting company. Why is it tolerated when this happens?

Databases are fairly easy to replicate. Put a “cluster” of database nodes in a datacenter and you’re protected against a hardware failure. Of course, the datacenter is still a single point of failure, so put another database node in a separate datacenter. That alone is probably all you’ll ever need, but you can keep turning up more database nodes in different locations as budget permits. (I suspect budget is the limiting reactant.)

But you can take it one step further. Set up another database node, not in a lonely datacenter, but in a large dispatch facility. (The MA State Police apparently run a very large 911 answering center.) They’d get a database node there that doesn’t answer public queries, but that receives updates from the other database servers. And, in the event of some sort of catastrophic failure, remote dispatchers can call up and request that something be run.
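
To sketch the idea (a hypothetical illustration with invented hostnames, not a description of any real dispatch system), the query side just works down a list of replicas instead of giving up when the first one is unreachable:

# Hypothetical failover lookup; hostnames and query_node are invented for illustration.
DATABASE_NODES = [
    "db1.primary-datacenter.example",
    "db2.backup-datacenter.example",
    "db3.dispatch-center.example",   # replica sitting in the big dispatch facility
]

def run_lookup(plate, query_node):
    """query_node(host, plate) is assumed to raise ConnectionError when a node is down."""
    for host in DATABASE_NODES:
        try:
            return query_node(host, plate)
        except ConnectionError:
            continue  # that node (or its whole datacenter) is unreachable; try the next
    raise RuntimeError("all database nodes unreachable")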

I’m just really bothered that people seem to find it acceptable that, probably at least once a week, the system is unreachable for quite some time.

Saying It

I think I mentioned that I signed up for a $7 VPS account, more as a trial than anything. It was a relatively new company, but I figured I had, at most, $7 to lose.

They sent out a fairly terse e-mail saying that they were transferring the business to some organization in New Zealand. Understandably, I wasn’t too pleased, but I didn’t cancel immediately… I still had some time left on the month I’d signed up for. So today a new e-mail arrived, this one explaining what had happened. The tone suggests that they’ve come under harsh criticism and lost many customers. They go on to explain their rationale.

They seem to have entirely misjudged the situation. They could have announced it in one of two ways:

  • Sent out a terse, cryptic e-mail that they were being bought by another company and that my bills would be coming from a new company in New Zealand dollars, or
  • Sent out an upbeat e-mail explaining that, to improve quality of service, they were moving the virtual server division over to a different company, one with more experience maintaining servers, and expanding offerings, all the while keeping prices the same.

In reality, both reflect the same situation. But the first message almost seems guaranteed to scare away customers. Since I pay with PayPal, I don’t really care about currency; the exchange happens automatically. But inexplicably changing your billing currency adds a huge level of sketchiness. On the other hand, what they’re actually doing is upgrading their offerings and improving reliability and support, by selling their business to a better-established company that can look after all of this. How could you not want that?

It’s all in how you say things. They had the opportunity to make me eagerly want their products. Instead, they handed me an upgrade that caused half their customers to cancel.