BitTorrent is Cool

Having recently pulled down some updates via BitTorrent, I discovered some neat things about the protocol. Obviously, it’s basically a peer-to-peer file-sharing tool, but a few design choices keep it working well. Files are split up into many pieces, and each piece can be downloaded from any peer that has it. (Each piece is also checked against a SHA-1 hash stored in the .torrent file, which guards against people injecting garbage.)

The first neat thing is the concept of “choking” selfish peers. As I download pieces, my torrent client automatically starts sharing the completed ones. If my client sees that you’re downloading pieces I have but not sharing the pieces you have, you get “choked”: I stop uploading to you. (Periodically, an “optimistic unchoke” kicks in, giving you another chance.) This gives everyone a strong incentive to share: otherwise, everyone would want to download only, and very few people would end up with the file.
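As a rough sketch of that choking logic (not any particular client’s code), here’s a Python function that unchokes the peers uploading to us fastest, plus one random “optimistic” peer. The three regular slots and single optimistic slot are assumptions, and so is the function name; the original protocol description talks about re-evaluating roughly every 10 seconds and rotating the optimistic unchoke every 30 seconds, while this function just makes one round’s decision.

```python
import random

def choose_unchoked(upload_rates, regular_slots=3, rng=None):
    """Pick which peers we upload to for the next round.

    upload_rates maps peer id -> bytes/sec that peer recently sent *us*.
    The fastest `regular_slots` peers are unchoked (tit-for-tat), and one
    randomly chosen remaining peer gets an optimistic unchoke so newcomers
    (or previously stingy peers) get a chance to prove themselves.
    """
    rng = rng or random.Random()
    # Rank peers by how generously they have been uploading to us.
    ranked = sorted(upload_rates, key=upload_rates.get, reverse=True)
    unchoked = set(ranked[:regular_slots])
    choked = ranked[regular_slots:]
    if choked:
        # Optimistic unchoke: one random peer that didn't earn a slot.
        unchoked.add(rng.choice(choked))
    return unchoked
```

Everyone else stays choked until the next re-evaluation, so a peer that never uploads only ever gets the occasional optimistic slot.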

The obvious problem is that if even one piece is missing, the file is useless. If you take a random 1MB chunk out of the middle of Microsoft Office, the whole program will fail to work. (Not that I condone downloading MS Office via BitTorrent. After all, it’s free from school!) So it’s important to make sure that no piece becomes unavailable, and most clients implement a neat algorithm called “rarest first.” The name sums it up pretty well: as peers advertise which pieces of the file they have, my client grabs the least-available pieces first. After I finish that piece (and, necessarily, begin advertising it to peers), I go and get the next-rarest piece. Since the whole is useless until I have every part, the order I acquire them in doesn’t matter to me, so each client is free to pick the order that raises the availability of the rarest pieces.
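Here’s a minimal Python sketch of rarest-first selection, assuming we already know which pieces each connected peer has (a set of piece indices per peer, learned from their advertisements). The function names are my own. Real clients also break ties randomly (the optional rng argument covers that) and often grab a random first piece just to get started, which this leaves out.

```python
from collections import Counter

def piece_availability(peer_pieces):
    """Count how many peers advertise each piece index."""
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces)
    return counts

def next_piece(have, peer_pieces, rng=None):
    """Return the rarest piece we still need, or None if no peer has one."""
    counts = piece_availability(peer_pieces)
    needed = [p for p in counts if p not in have]
    if not needed:
        return None
    rarest = min(counts[p] for p in needed)
    tied = sorted(p for p in needed if counts[p] == rarest)
    # Break ties randomly if given an rng, otherwise deterministically.
    return rng.choice(tied) if rng else tied[0]

# Piece 2 is held by only one peer, so it gets fetched first.
peers = {"a": {0, 1, 2}, "b": {0, 1}, "c": {0}}
print(next_piece(set(), peers))
```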

Overall, the more I read about the inner workings, the more impressed I am.

BitTorrent

A few tips, in the hope that they’ll help someone else. (Aside: don’t download illegal stuff with BitTorrent. Do download the many awesome, legal things on BitTorrent, such as Ubuntu torrents.)

  • You can encrypt your BitTorrent traffic, which is meant to get around ISPs that throttle or block it. However, “Enabled” isn’t the value you want. You want “Forced.” (“Enabled” still allows unencrypted connections; “Forced” refuses them.) In uTorrent, this is under Preferences -> BitTorrent.
  • If you don’t upload at all, other peers will “choke” you by refusing to send you data. It doesn’t seem to have to be entirely equitable, though; I’ve capped my upload at a pretty small number, but am downloading around 100 kB/second (800 kbps).
  • You’ll have a port number for incoming connections. If this port isn’t getting through (such as if you have a “default-deny” policy), things will still work, but they’ll be unbearably slow. As an aside, if you’re behind an OpenBSD firewall (using pf), have a local IP of 192.168.1.79, and use the randomly-selected port 26689 as your local port for BitTorrent, the redirect rule looks like rdr on $ext_if proto tcp from any to any port 26689 -> 192.168.1.79 port 26689. Then reload the ruleset with pfctl -f /etc/pf.conf. Loading a file replaces the rules currently in effect, so there’s no need to flush first with pfctl -F rules.
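The rule above is simple enough to type by hand, but if you shuffle ports or machines around, a tiny generator keeps it consistent. This is a hypothetical helper (pf_rdr_rule is my own name, not part of any pf tooling) that assumes the same old-style rdr syntax shown above. It only builds the rdr line, so with a default-deny filter you’ll still need a matching pass rule.

```python
import ipaddress

def pf_rdr_rule(local_ip, port, ext_if="$ext_if"):
    """Build a pf rdr line forwarding TCP `port` on the external interface
    to the same port on `local_ip` (hypothetical helper)."""
    ipaddress.IPv4Address(local_ip)  # raises ValueError if malformed
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return (f"rdr on {ext_if} proto tcp from any to any port {port} "
            f"-> {local_ip} port {port}")

print(pf_rdr_rule("192.168.1.79", 26689))
```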

With these three principles in mind, my (legitimate) download went from 0.8 kB/sec to 145 kB/sec.

Huh, a neat tip… If you pick a torrent from one site, but other sites have the identical torrent, add their trackers to your download, which will give you more peers!
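Torrent files list their trackers in an “announce-list”: a list of tiers, each tier a list of tracker URLs (BEP 12). Here’s a minimal Python sketch of merging two of them, assuming both torrents really are the same download (same info-hash); the function name and the example URLs are made up.

```python
def merge_announce_lists(*announce_lists):
    """Merge BEP 12-style announce-lists (lists of tiers of tracker URLs),
    keeping the first torrent's tiers first and dropping duplicate URLs."""
    seen = set()
    merged = []
    for tiers in announce_lists:
        for tier in tiers:
            fresh = []
            for url in tier:
                if url not in seen:
                    seen.add(url)
                    fresh.append(url)
            if fresh:  # skip tiers that were entirely duplicates
                merged.append(fresh)
    return merged

# Hypothetical tracker URLs for two copies of the same torrent.
site_a = [["http://tracker.a.example/announce"], ["http://shared.example/announce"]]
site_b = [["http://shared.example/announce", "http://tracker.b.example/announce"]]
print(merge_announce_lists(site_a, site_b))
```

Keeping the original torrent’s tiers first means the client still tries the trackers it started with before falling back to the extra ones.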

Oh, another tip: don’t arbitrarily set a download limit! My downloads wouldn’t break 145 kB/sec or so, until I realized that I’d set a limit of 150 kB/sec. I removed the limit and am suddenly at 400 kB/sec. (Incidentally, our available bandwidth has suddenly plunged to nothing…)

One final note: PeerGuardian is good, but don’t run it unnecessarily, since it blocks a lot of legitimate traffic, including, oddly, Steam’s servers (for games like Counter-Strike and TF2), apparently because Steam uses Limelight’s CDN and the blocklists have dubbed Limelight bad?

Security Forces

I just finished a show on “NatGeo” about the private security firms working in Iraq. It was a really interesting watch. They’re not there to engage in combat, but they’re there for “security,” such as escorting construction materials for a new police station (something insurgents are eager to stop), and transporting VIPs around.

IEDs are apparently a huge problem, more so than the news portrays. One of the guys brought back his SUV, with the whole side blown in and full of bullet holes. The SUV was “reinforced,” meaning that it had bulletproof glass and huge steel plates over it, and yet it was still in terrible shape. He made it out all right, although the driver, an Iraqi, died. “That was my seventh IED,” he mentioned casually.

Most are apparently set on desolate roads and are basically tripped by any car; often they’re set off by tripwires rather than detonated manually. Which got me thinking of an old idea…

I want to build an “RC Car,” something radio-controlled. Except I don’t mean a little RC car. I mean an actual car that’s driven remotely. With GPS and a set of video cameras (plus a high-speed, low-latency data link), you could drive it pretty accurately. It probably wouldn’t be a good idea to remotely drive one of these down Route 3 (although I think you could design it to work pretty accurately). But I think they might rock in Iraq. You send one out a quarter-mile in front of your “real” convoy. No one’s in it; its main purpose would be to trip IEDs and do some scouting for you. From the back of a trailing van, or from a remote headquarters, people could watch for anything suspicious. And, “worst case,” it trips an IED, effectively wasting the IED on blowing up a vehicle with no one in it. The real people behind could either divert their course, or plow on through, knowing that the bomb had been detonated.

I’ve also thought RC planes would be interesting. These days they’re “UAVs,” unmanned aerial vehicles. What I have in mind isn’t the military UAV, a “real” airplane remotely controlled, but something a couple feet long with some cameras. Outfit it with GPS and various data links, so that it can stream video in real time, or even capture higher-resolution still images and transmit those. (Heck, fit a high-end camera on it, but have it transmit a 640×480 image, and just store the full-res version on an 8GB flash drive…)

I always thought it’d be cool to have as a pet project. Fly it around and go “sight-seeing” from your room, with what’s essentially a wireless webcam in the sky. I think they’d also be popular with places doing mapping / “satellite” imagery, as you could send these little things up and just have them run autonomously, snapping photos of an area until the batteries / gas ran low, at which point they’d return “home.”

But these things would rock in combat, too. Send these out over areas you’ve got to travel. (And areas you’re not travelling, to keep them guessing.) At a remote command post, someone can spot potential threats and identify them long before they become a problem. (You could even try grazing them with your mini RC plane.)

I don’t know what sort of radio infrastructure they have over there (well, I know they’re running CDM1250s and HT1250s, but I don’t know if they run repeaters or at what power), but you might even fit a portable repeater on the little UAV, ensuring that their portable radios could still keep in touch with their post miles away.

As an aside, the radios I saw them with in the show don’t support encryption, meaning that it really wouldn’t be hard for insurgents to tune in. Their bombs keep getting more and more complex, showing that they’ve got some technically-minded people on board. It seems like a pretty bad idea to me to not encrypt your radio traffic in those circumstances.

Activation

My debit card expires this month, so I just got a new one in the mail. It has a number you have to call to activate it. So I dialed, and it rang twice. (I’m used to auto-answer systems picking up on the first ring, but whatever.)

I expected something like, “Thanks for calling Visa! To activate a card, press 1…”

Instead, I got:

“November 8, 2007!!!”

[awkward pause]

[lengthy message in Spanish directing Spanish speakers to press 2]

“Here’s how I can help you.”

[awkward pause]

[To activate a card, say “Activate a card.” To report a lost or stolen card…]

Me: “Activate a card.”

“Okay.”

[awkward pause]

“Please say the last four digits of the card.”

Me: [does so]

“All cards associated with this account have been activated. Goodbye.”

It was the strangest thing. And while “normal people” may like it, I find it extremely awkward to speak to computers on the phone. I’d think it would be less error-prone if I were asked to key in the last four digits. And it would certainly feel less awkward than sitting in the living room saying, “Activate a card! 1-2-3-4!”

The worst is the greeting. Especially when it comes to credit cards, it’s important to at least pretend you’re a real company. Shouting (excitedly) a date and then having a couple seconds go by doesn’t inspire too much confidence.

A New Addiction?

I don’t normally watch TV. If I have time to waste, I find it much more satisfying to waste it on the computer. But yesterday my brother was watching TV. I ended up watching a bit of TV with him, and then he left. So I started flipping through the channels, and came across National Geographic in HD.

Whatever you might have thought National Geographic TV would be about, you’re probably wrong. I tuned in halfway through a show about mobsters controlling Las Vegas in the 80s (or maybe 70s), and the FBI’s work to bring them down. It was kind of a fascinating show, actually, but when it ended, I looked forward to quitting this “TV” thing and getting back to the computer.

But then the next show was about the Mafia working with biker gangs to sell cocaine in Canada, which got me hooked. After that, I finally escaped.

But then last night was another one of those “I finished the Internet” moments, and I really didn’t have the motivation to go do anything actually productive. So I tried another hit of TV.

National Geographic (which they call “Nat Geo,” a name that for some reason comes across as pretentious and irksome to me) had a program on “police technology,” the latest high-tech gear police are using. They talked about “ShotSpotter,” a system of microphones installed throughout a “bad neighborhood” in LA as a pilot program. They fired a test shot to demonstrate the system (apparently they test it often), and about a minute later a cruiser showed up. Apparently it works by measuring the (minuscule) differences in when the gunshot’s sound arrives at the various listening posts, and then using those differences to pinpoint the location with awfully good accuracy.

They also demonstrated some newer non-lethal weapons. They showed tasers, which, despite the whole “Don’t tase me, bro” bad reputation they’re getting, actually strike me as a good thing. (They’re basically there for times when an officer’s only other choice is to pull out his gun, and I’d certainly rather get tasered than shot in the face.) The new ones come with embedded video cameras that record audio too, for accountability purposes. They also had what’s basically a paintball gun, firing something like 10 rounds a second. There are a few different “balls” it can fire, including hard rubber balls intended to inflict a bit of injury for crowd control, and traditional paintballs for marking suspects. But the neat one was the one they apparently use the most: “paintballs” filled with something like mace in powdered form. They can fire a couple at a suspect and stop him pretty much instantly, or, for crowd control, fire a stream of them at the ground some distance away to keep people back.
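The listening-post idea is a time-difference-of-arrival problem: nobody knows exactly when the shot was fired, but the differences between arrival times at each microphone pin down the location (strictly speaking, that’s multilateration rather than triangulation). Below is a toy Python sketch, not how ShotSpotter actually does it: four hypothetical microphones at the corners of a 1 km square, a speed of sound assumed to be 343 m/s, and a brute-force grid search standing in for a proper least-squares solver.

```python
import math
from itertools import product

SPEED_OF_SOUND = 343.0  # m/s, roughly air at 20 °C (assumption)

def arrival_times(source, sensors, t0=0.0):
    """Simulate when a bang at `source`, fired at time t0, reaches each sensor."""
    return [t0 + math.dist(source, s) / SPEED_OF_SOUND for s in sensors]

def locate(sensors, times, extent=1000.0, step=5.0):
    """Grid-search the square [0, extent]^2 for the point whose predicted
    arrival-time differences (relative to sensor 0) best match the measured
    ones. The unknown firing time cancels out of the differences."""
    measured = [t - times[0] for t in times]
    best, best_err = None, math.inf
    n = int(extent / step) + 1
    for i, j in product(range(n), repeat=2):
        point = (i * step, j * step)
        d0 = math.dist(point, sensors[0])
        err = sum(
            ((math.dist(point, s) - d0) / SPEED_OF_SOUND - m) ** 2
            for s, m in zip(sensors, measured)
        )
        if err < best_err:
            best, best_err = point, err
    return best

sensors = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0), (1000.0, 1000.0)]
times = arrival_times((420.0, 615.0), sensors, t0=12.5)  # t0 unknown to locate()
print(locate(sensors, times))
```

With the simulated shot sitting exactly on a grid point, the search should land right on it; a real system has noisy timestamps and would fit a continuous solution instead.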

That finished up at 11. Finally, I could escape the TV. Except, darn them, the next program was about past CIA programs, including some insane attempts at brainwashing people / feeding them LSD, and the pretty blatant murder of one of their operatives, who’d told his superiors that he was very uncomfortable with them testing sarin on people, particularly after one died. So he “committed suicide” by jumping out of a 13th-story window. The CIA insisted on a closed-casket funeral (for the family’s protection, of course!), and apparently had a few CIA agents discreetly attend. The family later realized something wasn’t right and had the body exhumed, finding that he had suffered blunt trauma to the head (which entirely contradicted the medical reports). And the next year, the CIA released a “manual” for its agents, including a recommendation that, to kill someone, you should whack them in the head to knock them out, and then hurl them out a window to make it look like suicide.

Right now I’m just tuning into “Seconds from Disaster,” a show about a volcano in the Caribbean.

At 3pm there’s a show on the Green Berets, but I’m gone then. 4pm is a show about “hired guns” in Iraq, 7pm is about the shooting of Ronald Reagan, 8pm on Kent State, and 9pm is a program they’ve been hyping on the Oklahoma City bombing, with suggestions that the people convicted for it didn’t act alone. 10pm is about Columbine. I can skip 11pm, because it’s a repeat of Kent State. In fact, it’s repeats until 3am, when it’s about an old al-Qaeda attempt at blowing up a plane. 4am is another plane crash, and 5am is another al-Qaeda attack. Then 10am is “Military technology inspired by nature.”

I don’t think I can sleep anymore.

Seriously, this is an amazing TV network.

Clarity

I saw a reference to RAID 6 and didn’t recognize it, so I did what anyone would do–I Wikipediad (I’m going to make that a verb) it:

RAID 6 extends RAID 5 by adding an additional parity block, thus it uses block-level striping with two parity blocks distributed across all member disks. It was not one of the original RAID levels.

So that’s why I hadn’t heard of it–it’s not an “original” RAID level. (I don’t subscribe to RAID trade publications, so I wasn’t aware of it.) The description is a good one-liner, but there’s more text that follows. Surely, it will give me a good insight into exactly what this means and how it works in an applied setting.

RAID 5 can be seen as special case of a Reed-Solomon code.[5] RAID 5, being a degenerate case, requires only addition in the Galois field. Since we are operating on bits, the field used is a binary galois field GF(2). In cyclic representations of binary galois fields, addition is computed by a simple XOR.

After understanding RAID 5 as a special case of a Reed-Solomon code, it is easy to see that it is possible to extend the approach to produce redundancy simply by producing another syndrome; typically a polynomial in GF(2^8) (8 means we are operating on bytes). By adding additional syndromes it is possible to achieve any number of redundant disks, and recover from the failure of that many drives anywhere in the array, but RAID 6 refers to the specific case of two syndromes.

Wait, what? Reed-Solomon? Degenerate cases? Galois fields? Binary galois fields in cyclic representations? Special cases of the Reed-Solomon code? Polynomial notation of the Reed-Solomon field? I’m lost. Very lost, in fact. Here I was hoping for an expansion over a one-liner that I pretty much understood but that was somewhat vague. And instead I get… I’m not even sure what I got.
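For anyone else who got lost at the same spot, here’s a minimal Python sketch of what those two quoted paragraphs boil down to, with one byte standing in for each disk’s block. P is plain XOR (“addition in GF(2)”), and Q weights each disk by a power of 2 in GF(2^8). The reducing polynomial 0x11D and generator 2 are assumptions borrowed from the convention Linux’s md RAID 6 uses; other implementations may choose differently. Recovering two lost data disks at once means solving two such equations together, which I’ve left out.

```python
# One byte per "disk". Assumes GF(2^8) with reducing polynomial 0x11D and
# generator 2 (the Linux md RAID 6 convention); other systems may differ.

def gf_mul(a, b):
    """Multiply in GF(2^8): carry-less multiply, reducing by 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # "addition" in this field is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11D       # reduce back into 8 bits
        b >>= 1
    return result

def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    # The 255 nonzero elements form a multiplicative group, so a^254 == a^-1.
    return gf_pow(a, 254)

def parity_p(data):
    """RAID 5-style parity: XOR of all data bytes."""
    p = 0
    for d in data:
        p ^= d
    return p

def parity_q(data):
    """RAID 6 second syndrome: Q = XOR over i of 2^i * D_i in GF(2^8)."""
    q = 0
    for i, d in enumerate(data):
        q ^= gf_mul(gf_pow(2, i), d)
    return q

def recover_with_p(data, lost, p):
    """Rebuild data[lost] from P and the survivors (data[lost] is ignored)."""
    value = p
    for i, d in enumerate(data):
        if i != lost:
            value ^= d
    return value

def recover_with_q(data, lost, q):
    """Rebuild data[lost] from Q alone, e.g. if the P disk died too."""
    partial = q
    for i, d in enumerate(data):
        if i != lost:
            partial ^= gf_mul(gf_pow(2, i), d)
    # partial is now 2^lost * D_lost; divide by 2^lost.
    return gf_mul(partial, gf_inv(gf_pow(2, lost)))

data = [0x12, 0x34, 0x56, 0x78]
p, q = parity_p(data), parity_q(data)
damaged = data.copy()
damaged[2] = 0  # pretend disk 2 died
print(hex(recover_with_p(damaged, 2, p)), hex(recover_with_q(damaged, 2, q)))
```

With P alone you can rebuild any one lost disk (that’s RAID 5); Q gives you a second, independent equation, which is what lets RAID 6 survive two failures.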

More on Time

I’m worried you’ll all think I’ve snapped and become obsessed with time. It’s not quite that bad. But here’s another post about time.

A lot of Windows machines seem to sync to time.windows.com, and Apple has its own time.apple.com service. I came across this interesting (to me) post about Apple’s service and started doing some digging. First, an important bit of terminology (for those who don’t follow my every post): the stratum of an NTP server is basically its place in a hierarchy of systems. Stratum 1 is the top, meaning that it’s directly connected to an accurate reference clock (e.g., a GPS receiver or another hyper-accurate time source). Lots and lots of people sync their clocks to stratum 1 servers, and thus become stratum 2. And lots of the stratum 2 servers join the NTP pool (pool.ntp.org), meaning that people who sync to them become stratum 3. With each step down the hierarchy, the stratum goes up by one. Each step also removes you a little further from the reference clock, which introduces some error, but how much varies widely: it’s conceivable that strata 1-3 would all be on the same LAN, but it’s also possible that the stratum 3 server I sync to is sitting on a 100 Mbps line in a data center in Boston, gets its time from a stratum 2 machine with a 128kbps satellite link in Zimbabwe, which gets its time from a stratum 1 on a 3600bps dialup line in Rhodesia. (Is that even a country anymore?) So the “loss” from being several strata down can range from maybe 1ms up to many seconds.
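To put rough numbers on the Boston/Zimbabwe/Rhodesia example, here’s a small Python sketch. Each hop adds one to the stratum, and round-trip delays add up along the chain (NTP calls the total the root delay). Half of that total is a rough ceiling on how far asymmetric network paths alone could skew the time you end up with; real NTP also tracks dispersion, which this ignores. The delays and names are made-up illustrative figures, not measurements.

```python
from dataclasses import dataclass

@dataclass
class Hop:
    name: str    # the machine that syncs to the server above it
    rtt: float   # round-trip delay to that upstream server, seconds

def walk_chain(top_stratum, hops):
    """Follow a sync chain downward from a server at `top_stratum`.

    Returns (name, stratum, asymmetry_bound) per hop, where the bound is
    half the accumulated round-trip delay (the "root delay")."""
    stratum, root_delay, out = top_stratum, 0.0, []
    for hop in hops:
        stratum += 1
        root_delay += hop.rtt
        out.append((hop.name, stratum, root_delay / 2))
    return out

# Made-up delays for the Rhodesia -> Zimbabwe -> Boston -> me example.
chain = [Hop("Zimbabwe (via dialup)", 0.5),
         Hop("Boston (via satellite)", 0.75),
         Hop("me", 0.025)]
for name, stratum, bound in walk_chain(1, chain):
    print(f"{name}: stratum {stratum}, up to ~{bound:.3f} s from asymmetry")
```

The last fast hop barely matters; nearly all of the possible error was inherited from the slow links further up.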

One final note: the following commands are being run on a node that’s itself a stratum 2 NTP box, so pay attention to the “offset” field. (And note that it’s in seconds, not milliseconds, just to add a healthy dose of confusion.)

# ntpdate -q time.windows.com
server 207.46.197.32, stratum 3, offset 0.016801, delay 0.08435
 9 Mar 22:47:55 ntpdate[22072]: adjust time server 207.46.197.32 offset 0.016801 sec

First, it’s inexplicably a stratum 3 host. As I discussed, this isn’t necessarily bad so much as odd: you’d think that time.windows.com could at the very least be stratum 2, if not have a GPS receiver itself. (“time.windows.com” may actually be a number of machines sharing an IP, with GeoIP or whatnot.)

For comparison, the nodes I sync to:

#  ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-time-C.timefreq .ACTS.           1 u  142 1024  377   18.854    4.204   0.025
-india.colorado. .ACTS.           1 u   93 1024  377   23.777   11.419   3.751
+rrcs-64-183-56- .GPS.            1 u  408 1024  377   55.427    0.217   1.047
*tick.UH.EDU     .GPS.            1 u  483 1024  377   25.108   -2.053   0.082
+clock.xmission. .GPS.            1 u  145 1024  377   38.306   -0.861   0.019

Pay attention to the “offset” column, which here is in milliseconds. The timeservers range from pulling me back 2 milliseconds to advancing me 11 milliseconds. (Also note the “-” in front of the +11ms host, india.colorado., indicating that it’s considered a fairly bad source. By comparison, the “*” indicates the current ‘best’ server, which set my clock back 2ms.)
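If you want to keep an eye on these columns without squinting, a short parser helps. This is a sketch assuming the classic ntpq -c peers layout shown above (one tally character, then remote, refid, st, t, when, poll, reach, delay, offset, jitter). The tally descriptions paraphrase the common characters from the ntpq documentation; the function names are my own.

```python
# Meanings of the common tally characters in the first column.
TALLY = {
    "*": "system peer (currently steering the clock)",
    "+": "candidate (included in the combined estimate)",
    "-": "outlier (discarded by the clustering step)",
    "x": "falseticker (discarded by the intersection step)",
    " ": "rejected",
}

def parse_peers(text):
    """Parse `ntpq -c peers` output into dicts; delay/offset stay in ms."""
    peers = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("remote", "=")):
            continue  # skip blank lines, the header, and the ==== rule
        tally, fields = line[0], line[1:].split()
        # fields: remote refid st t when poll reach delay offset jitter
        peers.append({
            "tally": tally,
            "status": TALLY.get(tally, "other"),
            "remote": fields[0],
            "refid": fields[1],
            "stratum": int(fields[2]),
            "delay_ms": float(fields[7]),
            "offset_ms": float(fields[8]),
        })
    return peers

def system_peer(peers):
    """Return the peer marked with '*', or None if nothing is selected."""
    return next((p for p in peers if p["tally"] == "*"), None)
```

Note that ntpq truncates long host names, which is why the remote column reads “india.colorado.” rather than a full hostname.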

Microsoft’s server is trying to pull me forward 0.016801 seconds, or 16.8ms. As shown above, this is even worse than the server that’s being rejected for being too far off. (Of course, it’s worth repeating that it’s less than 17 milliseconds, which is more than enough accuracy if your goal is simply to keep your clock accurate!)
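Where do these offset numbers come from? NTP stamps four times per exchange (client send, server receive, server send, client receive) and derives the offset and round-trip delay from them with the standard formulas in RFC 5905. A minimal Python version follows; the function name is my own. A positive offset means the server is ahead, so your clock would be advanced, which matches how ntpdate reports it.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Clock offset and round-trip delay from one NTP exchange (RFC 5905).

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    Positive offset means the server is ahead, so the client should advance.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay
```

For example, ntp_offset_delay(0.0, 0.3125, 0.4375, 0.25) returns (0.25, 0.125): the server is 250ms ahead and the network round trip took 125ms. The formula assumes the outbound and return trips take equally long; if they don’t, the offset can be off by up to half the delay.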

How about Apple?

# ntpdate -q time.apple.com
server 17.254.0.28, stratum 2, offset 0.004297, delay 0.07193
server 17.254.0.31, stratum 2, offset 0.003697, delay 0.07074
server 17.254.0.26, stratum 2, offset 0.004346, delay 0.07195
server 17.254.0.27, stratum 3, offset 0.004369, delay 0.07195
 9 Mar 23:23:37 ntpdate[4321]: adjust time server 17.254.0.31 offset 0.003697 sec

This one looks different from the Microsoft one: there are four IPs for time.apple.com. Three of the four are at stratum 2 (17.254.0.27 reports stratum 3). The offset here is better: 0.003697 seconds, or 3.7ms. It’s still pulling me in the wrong direction (-2ms was ruled the best, whereas this wants to advance me about +4ms), but it’s much closer to accurate.

This got me wondering: Apple has 4 A records for time.apple.com… What does the NTP pool look like?

# ntpdate -q pool.ntp.org
server 67.201.12.252, stratum 3, offset 0.002990, delay 0.05959
server 69.36.240.252, stratum 2, offset 0.003088, delay 0.06898
server 98.172.38.232, stratum 2, offset 0.022841, delay 0.08508
server 209.67.219.106, stratum 3, offset 0.019785, delay 0.02614
server 216.184.20.83, stratum 3, offset 0.012001, delay 0.10208
 9 Mar 23:31:21 ntpdate[7524]: adjust time server 69.36.240.252 offset 0.003088 sec

It returns five hosts. And a quick aside:

;; ANSWER SECTION:
pool.ntp.org.           948     IN      A       209.67.219.106
pool.ntp.org.           948     IN      A       67.201.12.252
pool.ntp.org.           948     IN      A       98.172.38.232
pool.ntp.org.           948     IN      A       69.36.240.252
pool.ntp.org.           948     IN      A       216.184.20.83

They have a long expiry: 948 seconds, or about 16 minutes. Sure enough, asking again gets me the same 5, but with a decremented TTL. They do run GeoIP, though, so my other server is getting a different list, even though both are in the US. (But it keeps the same list, too.) But anyway, back to the ntpdate output…

I’d actually run this earlier and gotten all stratum 1s and 2s, which I was going to post. Now it’s all 2s and 3s. The dork in me wants to run this every TTL seconds and keep logs.
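For that logging itch, the part worth sketching is turning ntpdate -q output into records. This assumes the “server …, stratum …, offset …, delay …” lines shown above; the function name is hypothetical, and the loop that reruns ntpdate -q every TTL seconds and appends the results to a log is left out.

```python
import re

# Matches the per-server lines of `ntpdate -q`, not the final "adjust" line.
LINE = re.compile(
    r"server (?P<addr>[\d.]+), stratum (?P<stratum>\d+), "
    r"offset (?P<offset>-?[\d.]+), delay (?P<delay>[\d.]+)"
)

def parse_ntpdate(text):
    """Return one dict per server, converting seconds to milliseconds."""
    return [
        {
            "addr": m["addr"],
            "stratum": int(m["stratum"]),
            "offset_ms": float(m["offset"]) * 1000,
            "delay_ms": float(m["delay"]) * 1000,
        }
        for m in LINE.finditer(text)
    ]
```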

This shows where the NTP clock selection algorithm shines. I know from belonging to the pool that we get “monitored” by a server that checks our time a few times an hour and pulls us out of the pool if our servers are off (I think a second is the tolerance). Here, the worst server is 0.022841 seconds (22.8ms) off, and one of the best is picked. (The “best,” the one closest to mine, is actually discounted because it’s a stratum 3, and the stratum 2 one is presumed to be more accurate. Closeness to your own clock is generally not a metric, unless you’re running these on a server that’s already kept to good time.) But the net result is an offset of about 3ms: a slight improvement over Apple, and a big improvement over Microsoft.
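The heart of that selection step is an interval intersection: each server’s answer is treated as a range of plausible offsets, and the clock looks for the region most of them agree on, throwing out falsetickers that don’t overlap. Here’s a simplified Python sketch in the spirit of Marzullo’s algorithm, using offset ± delay/2 as each server’s range. That range is an assumption: ntpd’s real intersection algorithm uses root distance and is followed by clustering and combining steps, so treat this as an illustration only.

```python
def best_overlap(intervals):
    """Marzullo-style sweep: find the region covered by the most intervals.

    Returns (count, lo, hi). Endpoints that merely touch count as overlapping.
    """
    events = []
    for lo, hi in intervals:
        events.append((lo, -1))  # interval opens; -1 sorts before +1 at ties
        events.append((hi, +1))  # interval closes
    events.sort()
    best = count = 0
    best_lo = best_hi = None
    for i, (x, kind) in enumerate(events):
        if kind == -1:
            count += 1
            if count > best:
                # The overlap region runs until the next event.
                best, best_lo, best_hi = count, x, events[i + 1][0]
        else:
            count -= 1
    return best, best_lo, best_hi

def offset_ranges(samples):
    """Turn (offset, delay) pairs from ntpdate into offset ± delay/2 ranges."""
    return [(off - delay / 2, off + delay / 2) for off, delay in samples]

# The five pool.ntp.org answers above: (offset, delay) in seconds.
pool = [(0.002990, 0.05959), (0.003088, 0.06898), (0.022841, 0.08508),
        (0.019785, 0.02614), (0.012001, 0.10208)]
print(best_overlap(offset_ranges(pool)))
```

Fed the five pool.ntp.org results, all five ranges overlap, so none of them would be thrown out as a falseticker.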

Aside: I just ran the Microsoft one again, and its offset had dropped to 2.3ms. So I queried Apple and Microsoft in a single ntpdate run, letting the NTP algorithm do its magic… Suddenly Microsoft was suggesting a -1ms offset, with Apple pushing a 4ms increase. They’ve all converged a bit, although the time.windows.com one seems to jump around more than the others.

The moral of the story? All of these timeservers are really good if you’re just looking to keep the clock in your system tray right. If you’re a total nut for the right time, the pool is best, followed by Apple, then Microsoft.

Addendum: Microsoft hosts one of NIST’s stratum 1 servers, which makes it all the stranger that time.windows.com is all the way down at stratum 3.

Sysadmin

I like to run a really good Windows machine. Firefox is my default browser (although IE’s come leaps and bounds since its “I’ll merrily install any program a webpage asks me to!” days), I keep my system free of viruses and spyware, I have a “background” disk defrag tool, I routinely run CCleaner and the like to purge accumulated cruft, and so forth. In short, I’m a system administrator’s dream. (Actually, I think I’m their nightmare, since the only time I contact them is when I have a really hard question, and I never do anything they expect… But I digress. If I administered a set of desktop machines, I’d want them to be set up like mine.)

If I ran a computer network, though, I really wouldn’t trust normal people to do these things themselves. Virus definitions need to be updated, virus checks need to be run, recycle bins need to get emptied, stale caches need to get purged, clocks need to get synced, and disks need to be defragged. I do this naturally on my desktop machine, so I don’t think of it as taking a lot of time, but if you asked me to maintain a network of, say, 30 PCs, I’d want to cry.

There exist, of course, a bajillion different tools for administering clusters of PCs. But what I find interesting is that I can’t think of any that really do what I want. I want to make sure certain programs are installed, and run them unattended periodically. Most solutions still seem to require me to go to each PC and do my work, or they limit things: an increasingly common approach is simply to reimage each computer when it reboots. In some cases, though, this is totally undesirable: people might forget to use their network drive, losing all their work when they reboot. Or they might need to install a legitimate program for their work, and you’d end up losing a lot of productivity as they’re forced to reinstall it every time they reboot. (Which means that they won’t reboot often, which complicates other things.)
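The core of what I’m after is a list of maintenance jobs with intervals, plus something that works out which ones are overdue on each machine. Here’s a hypothetical Python sketch of just that bookkeeping; the task names and intervals are illustrative, and actually running the jobs (and pushing them out to 30 PCs) is the part a real tool would have to handle.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Task:
    name: str
    every: timedelta                   # how often the job should run
    last_run: Optional[datetime] = None  # None means "never run here"

def due_tasks(tasks, now):
    """Names of tasks that have never run or whose interval has elapsed."""
    return [t.name for t in tasks
            if t.last_run is None or now - t.last_run >= t.every]

def mark_done(tasks, names, now):
    """Record that the named tasks just finished."""
    for t in tasks:
        if t.name in names:
            t.last_run = now

# Illustrative schedule for one machine.
now = datetime(2008, 3, 10, 12, 0)
tasks = [
    Task("update-virus-definitions", timedelta(hours=6), now - timedelta(hours=7)),
    Task("defragment-disk", timedelta(days=7), now - timedelta(days=1)),
    Task("sync-clock", timedelta(hours=1)),
]
print(due_tasks(tasks, now))
```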