Weird Spam

I somehow came to read the “blog” of the Perl NOC one day–the network admins for the perl.org sites. They get some really amusing spam. And then there’s that category of things where you think it might be spam, but you’re not sure, like this one.

But anyway, today I was checking through my own mail that had been filtered as spam, and found the following:

from    Selma Orr 
to  helen@n1zyy.com,
date    Wed, Mar 26, 2008 at 12:27 PM
subject I hate you damm

Instant delivery worldwide. Certified by VISA and VeriSign.

http://irisembreyck.blogspot.com

They spam me, tell me they hate me, curse at me, and then expect me to buy whatever they’re selling? Also, why does “helen@n1zyy.com” get a lot of spam? That address never existed! (I should disclaim that this message wasn’t actually sent to helen@n1zyy.com, otherwise I wouldn’t have gotten it. But I get rejected mail to helen@n1zyy.com showing up in the logfiles daily!)

The Most Awesome Thing…

…Ever.

Hulu. You can watch TV shows online. In (seemingly, I don’t know the exact resolution) high def. That’s pretty cool. Plus, it’s legal. Oh, and, the most important part: there’s no catch… It’s free. You sign up and watch TV shows.

With shows like Arrested Development, The Office (only 9 episodes right now), House (only 2 episodes), Psych (5), Monk (6), Journeyman (13), I Dream of Jeannie, National Geographic Presents, and…

Alright, you know what? I started listing the cool shows to write a nice, proper review. But the truth is, I really don’t want to write this anymore. I have 8 episodes of The Office, and an episode each of House and Psych to watch. And that’s just of the first four series I’ve listed. Paging through the list of shows to list the ones I love, I realized that I’d much rather be watching Hulu than writing about it.

So sign up and come join me in what might be the single biggest blow ever dealt to American productivity.

PayPal

PayPal’s really been getting on my nerves.

About a month ago, they froze my account, citing concerns for my security: someone, they said, had attempted to access my account. I completed the first two verification steps, but now I’m waiting on a letter at home with a “security code” I have to enter to confirm it. Of course, it’s been almost two weeks with no mail from them.

So, through the PayPal site, I sent them a message asking what was up. I should clarify that I’m absolutely positive it was the “real” PayPal site. The certificate matches, and I initiated the access, so it’s not like I’m getting e-mail asking me to click a link (to paypal.com.this.is.a.scam.geocities.com).

It just bounced back to me, citing an unknown user on their end.

So I’m now approaching a month with no access to my account. I am not impressed, especially by their internal contact form bouncing back to me.

Professionalism

I frequent WebHostingTalk.com, a really good forum for people in the web hosting industry. There are lots of really knowledgeable people on there, but there are also some people without so much technical knowledge….

There was one guy a while back who announced that he was starting a video sharing site (a la Youtube) and that he’d need 450 petabytes of transfer a month. No one was quite sure how to respond, since this is orders of magnitude more than anyone measures anything in. I calculated that he’d be using about 1,400 Gbps. (And that’s an average… Real traffic patterns for big sites are more of a sine wave, so you’d probably want about 2,000 Gbps aggregate capacity, which you’d be filling at peak hours.) I’m fairly certain that even a site like Google doesn’t use anything like that. In fact, I’m fairly certain that even if a site like Google called up their providers and asked for 1,400 Gbps, they’d be laughed at. No one out there can provide that.

But some are just distressing. One guy posted, maybe a year ago, that he was getting a “private room” and didn’t know what he’d need for equipment. Did he need a router? Switches? A “private room” in a data center, by the way, is to host your many racks of servers, walled off from others for maximum security. You’ve got to be a very big place, with a very big budget, to be doing that. This is kind of like asking, “I’m buying a 500,000 square foot warehouse. What do I need? Do I need a forklift? Lights?” (A lot of answers were basically, “What do you need? You need an IT department, and someone who doesn’t have to ask this question.” Although my favorite answer was, “Padded walls.” Normally it annoys me when people give rude answers online, but I couldn’t help but burst out laughing.)

Today’s post is from a guy who seems to have about 30 servers with one company, running what I can only assume is a successful hosting company. He’ll fill one server and order another, but he’s having difficulty “managing” the traffic–he wanted to pool all of the bandwidth together. This is something that most big companies will do for you if you ask, since you’re a huge customer and they know that their competitors will do it if they don’t.

If you buy a dedicated server, you’re usually given a bandwidth allocation in GB/month. I’m allowed 1,000 GB a month, for example. (And I don’t even use 5% of it.) This comes out to using about 3 Mbps 24/7, but it’s much more convenient for me since I don’t have to worry about momentary usage, just the net amount of transfer moved. There are also subtleties here: I have 1,000 GB over a 10 Mbps line. 1,000 GB means that my average use can be up to 3 Mbps. But, in real life, as I mentioned, traffic patterns ebb and flow. If I were using 3 Mbps on average (I’m not), I might be using 5 Mbps during the day and 1 Mbps at night. So just giving me a 3 Mbps line wouldn’t cut it, since it’d be really crappy during the day.
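The GB/month to Mbps arithmetic behind both the 450 PB and the 1,000 GB figures can be sketched like this. It assumes a 30-day month and decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second); hosts that bill in binary units or calendar months would get slightly different numbers.

```python
# Convert between monthly transfer quotas and average line rates.
# Assumptions: a 30-day month and decimal (SI) units.

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 s


def gb_per_month_to_mbps(gb: float) -> float:
    """Average rate needed to move `gb` gigabytes in one month."""
    bits = gb * 10**9 * 8
    return bits / SECONDS_PER_MONTH / 10**6


def mbps_to_gb_per_month(mbps: float) -> float:
    """Transfer moved by running at `mbps` around the clock for a month."""
    bits = mbps * 10**6 * SECONDS_PER_MONTH
    return bits / 8 / 10**9


print(f"{gb_per_month_to_mbps(1_000):.2f} Mbps")           # 1,000 GB/month -> 3.09 Mbps
print(f"{gb_per_month_to_mbps(450e6) / 1000:,.0f} Gbps")   # 450 PB/month -> 1,389 Gbps
print(f"{mbps_to_gb_per_month(10):,.0f} GB")               # 10 Mbps flat out -> 3,240 GB
```

Running it the other way shows why the 10 Mbps port matters: saturated all month it could move about 3,240 GB, so a 1,000 GB quota leaves plenty of headroom for daytime peaks.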

But this guy’s host quoted him a price in Mbps. He was very confused by this. He was used to his GB/month, and didn’t know what to make of these foreign “Mbps” measurements.

Someone else just posted about how some guy with the IP 0.0.0.0 keeps connecting to him, and asked whether he should ban that IP, which he thinks is awfully suspicious. (It’s not as bad as the guy who was getting people with “blank IPs” connecting to him, and wondered if he could ban a null IP in his firewall… It turned out that he was running some random command that returned way more than just IPs, hence a number of blank lines…)
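For what it’s worth, 0.0.0.0 is the IPv4 “unspecified” address. It most commonly shows up in netstat-style output as the local address of a socket listening on every interface, not as a remote host connecting in. Python’s standard ipaddress module can tell it apart from a real address, and a blank line isn’t an address at all. This is a hypothetical log filter along those lines; the sample lines (including the documentation-range address 203.0.113.7) are made up.

```python
import ipaddress


def classify(line: str) -> str:
    """Label one line of (hypothetical) connection-log output."""
    text = line.strip()
    if not text:
        return "blank line, not an address"
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        return "not an IP address"
    if addr.is_unspecified:
        # 0.0.0.0 (or :: for IPv6) means "no particular address"
        return "unspecified address, not a real remote host"
    return "address"


# Made-up sample output: a real-looking IP, a blank line, 0.0.0.0, a header.
sample = ["203.0.113.7", "", "0.0.0.0", "Active Internet connections"]
for line in sample:
    print(repr(line), "->", classify(line))
```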

Who are these people? I wouldn’t write a blog post making fun of people who didn’t know otherwise obscure things, except that these should be basic little tasks for people in these positions. It’d be like a certified (not certifiable, but certified) sysadmin for Windows systems posting and saying, “I need to change my desktop background. How can I do this?” Or a car mechanic, who’s gone on and opened his third garage, posting and saying, “The oil in my car is old and dirty. Is it possible to somehow drain the old oil and put new oil in?” Or, for the more absurd requests we see, someone posting on a financial forum about how they’re starting a lemonade stand and think they need $750 billion in startup capital, wondering what bank will give them a better interest rate. It just shocks me that these people are successful and yet so clueless.

Activation

My debit card expires this month, so I just got a new one in the mail. It has a number you have to call to activate it. So I dialed, and it rang twice. (I’m used to auto-answer systems picking up on the first ring, but whatever.)

I expected something like, “Thanks for calling Visa! To activate a card, press 1…”

Instead, I got:

“November 8, 2007!!!”

[awkward pause]

[lengthy message in Spanish directing Spanish speakers to press 2]

“Here’s how I can help you.”

[awkward pause]

[To activate a card, say “Activate a card.” To report a lost or stolen card…]

Me: “Activate a card.”

“Okay.”

[awkward pause]

“Please say the last four digits of the card.”

Me: [does so]

“All cards associated with this account have been activated. Goodbye.”

It was the strangest thing. And while “normal people” may like it, I find it extremely awkward to speak to computers on the phone. I’d think it would be less error-prone if I were asked to key in the last four digits. And it would certainly feel less awkward than sitting in the living room saying, “Activate a card! 1-2-3-4!”

The worst is the greeting. Especially when it comes to credit cards, it’s important to at least pretend you’re a real company. Shouting (excitedly) a date and then having a couple seconds go by doesn’t inspire too much confidence.

Clarity

I saw a reference to RAID 6 and didn’t recognize it, so I did what anyone would do–I Wikipediad (I’m going to make that a verb) it:

RAID 6 extends RAID 5 by adding an additional parity block, thus it uses block-level striping with two parity blocks distributed across all member disks. It was not one of the original RAID levels.

So that’s why I hadn’t heard of it–it’s not an “original” RAID level. (I don’t subscribe to RAID trade publications, so I wasn’t aware of it.) The description is a good one-liner, but there’s more text that follows. Surely, it will give me a good insight into exactly what this means and how it works in an applied setting.

RAID 5 can be seen as special case of a Reed-Solomon code.[5] RAID 5, being a degenerate case, requires only addition in the Galois field. Since we are operating on bits, the field used is a binary galois field GF(2). In cyclic representations of binary galois fields, addition is computed by a simple XOR.

After understanding RAID 5 as a special case of a Reed-Solomon code, it is easy to see that it is possible to extend the approach to produce redundancy simply by producing another syndrome; typically a polynomial in GF(2^8) (8 means we are operating on bytes). By adding additional syndromes it is possible to achieve any number of redundant disks, and recover from the failure of that many drives anywhere in the array, but RAID 6 refers to the specific case of two syndromes.

Wait, what? Reed-Solomon? Degenerate cases? Galois fields? Binary galois fields in cyclic representations? Special cases of the Reed-Solomon code? Polynomial notation of the Reed-Solomon field? I’m lost. Very lost, in fact. Here I was hoping for an expansion over a one-liner that I pretty much understood but that was somewhat vague. And instead I get… I’m not even sure what I got.
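For anyone as lost as I was, here is a rough sketch of what that passage is describing. RAID 5’s parity P is just the XOR of the data (that XOR is the “addition in GF(2)”). RAID 6 adds a second syndrome Q, where each disk’s byte is first multiplied by a different power of a generator g in GF(2^8) before being XORed in. Two independent equations let you solve for two missing disks. This works on one byte per disk instead of whole blocks, and the field polynomial 0x11D with g = 2 is an assumption (a common choice, used for example by Linux software RAID 6), not something the quoted article specifies.

```python
from typing import List, Optional, Tuple

# Byte-wise sketch of RAID 5/6 parity. Assumptions: GF(2^8) reduced by
# x^8 + x^4 + x^3 + x^2 + 1 (0x11D), generator g = 2, one byte per disk.


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8): shift-and-add, where 'add' is XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:  # overflowed 8 bits: reduce by the field polynomial
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero element: a^255 == 1, so a^254 == a^-1."""
    return gf_pow(a, 254)


def syndromes(data: List[int]) -> Tuple[int, int]:
    """P is plain XOR parity (RAID 5); Q weights disk i by g^i (RAID 6)."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(survivors: List[Optional[int]], p: int, q: int) -> Tuple[int, int]:
    """Rebuild the two data bytes marked None, using P and Q."""
    x, y = [i for i, d in enumerate(survivors) if d is None]
    px, qx = syndromes([0 if d is None else d for d in survivors])
    a = p ^ px  # = D_x + D_y
    b = q ^ qx  # = g^x * D_x + g^y * D_y
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(b ^ gf_mul(gy, a), gf_inv(gx ^ gy))
    return dx, a ^ dx


data = [0x11, 0x22, 0x33, 0x44]
p, q = syndromes(data)
print(hex(p ^ 0x11 ^ 0x33 ^ 0x44))                  # RAID 5, one lost disk: 0x22
print(recover_two([0x11, None, 0x33, None], p, q))  # RAID 6, two lost: (34, 68)
```

The point of the “Galois field” machinery is just that every nonzero byte has a multiplicative inverse, so the division in `recover_two` always works.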

More on Time

I’m worried you’ll all think I’ve snapped and become obsessed with time. It’s not quite that bad. But here’s another post about time.

A lot of Windows machines seem to sync to time.windows.com, and Apple has its own time.apple.com service. I came across this interesting (to me) post about Apple’s service, and started doing some looking. First, an important bit of terminology (for those who don’t follow my every post): the stratum of an NTP server is basically its place in a hierarchy of systems. Stratum 1 is the top, meaning that it’s directly connected to an accurate time source (e.g., a GPS receiver or another hyper-accurate reference). Lots and lots of people sync their clocks to Stratum 1 servers, and thus become Stratum 2. And lots of the Stratum 2 servers join the pool, meaning that people who sync to them become stratum 3. With each step down the hierarchy, you increment the stratum by one. You become more and more removed from the source with each step, which obviously introduces error, but how much error varies: it’s conceivable that strata 1-3 would all be on the same LAN, but it’s also possible that the stratum 3 server I sync to is sitting on a 100 Mbps line in a data center in Boston, gets its time from a stratum 2 machine with a 128 kbps satellite link in Zimbabwe, which gets its time from a stratum 1 on a 3600 bps dialup line in Rhodesia. (Is that even a country anymore?) So the “loss” from being several strata down ranges from maybe 1ms up through many seconds.

One final note: the following commands are being run on a node that’s itself a stratum 2 NTP box, so pay attention to the “offset” field. (And note that it’s in seconds, not milliseconds, just to add a healthy dose of confusion.)

# ntpdate -q time.windows.com
server 207.46.197.32, stratum 3, offset 0.016801, delay 0.08435
 9 Mar 22:47:55 ntpdate[22072]: adjust time server 207.46.197.32 offset 0.016801 sec

First, it’s inexplicably a stratum 3 host. As I discussed, this isn’t necessarily bad so much as odd: you’d think that time.windows.com could at the very least be stratum 2, if not holding a GPS itself. (“time.windows.com” may actually be a number of machines sharing an IP, with GeoIP or whatnot.)

For comparison, the nodes I sync to:

#  ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-time-C.timefreq .ACTS.           1 u  142 1024  377   18.854    4.204   0.025
-india.colorado. .ACTS.           1 u   93 1024  377   23.777   11.419   3.751
+rrcs-64-183-56- .GPS.            1 u  408 1024  377   55.427    0.217   1.047
*tick.UH.EDU     .GPS.            1 u  483 1024  377   25.108   -2.053   0.082
+clock.xmission. .GPS.            1 u  145 1024  377   38.306   -0.861   0.019

Pay attention to the “offset” column, which here is in milliseconds. The timeservers range from pulling me back 2 milliseconds to advancing me 11 milliseconds. (Although also note the “-” in front of the second line, indicating that the +11ms host is considered a fairly bad source. By comparison, the “*” indicates the current ‘best’ server, which set my clock back 2ms.)

Microsoft’s server is trying to pull me forward 0.016801 seconds, or 16.8ms. As shown above, this is even worse than the server that’s being rejected for being too far off. (Of course, it’s worth repeating that it’s less than 17 milliseconds, which is more than enough accuracy if your goal is simply to keep your clock accurate!)
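For reference, the “offset” and “delay” numbers that ntpdate and ntpq print come from four timestamps per request, using the formulas in the NTPv4 spec (RFC 5905). A positive offset means the server is ahead and your clock should move forward. The timestamps in this sketch are hypothetical, chosen to produce roughly the 16.8ms offset and 84ms delay seen above; they are not the actual packets from my query.

```python
def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float):
    """Offset and round-trip delay from one NTP request/response (RFC 5905).

    t1: request leaves the client (client clock)
    t2: request arrives at the server (server clock)
    t3: reply leaves the server (server clock)
    t4: reply arrives at the client (client clock)
    """
    # Averages the two one-way differences; exact only if the path is symmetric.
    offset = ((t2 - t1) + (t3 - t4)) / 2
    # Total time on the wire, minus the time the server spent holding the packet.
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Hypothetical exchange: client clock 16.8 ms behind the server,
# 42 ms each way on the wire, 1 ms of server processing.
offset, delay = ntp_offset_delay(0.0, 0.0588, 0.0598, 0.0850)
print(f"offset {offset * 1000:+.1f} ms, delay {delay * 1000:.1f} ms")
```

The symmetric-path assumption is the catch: if the outbound and return legs take different amounts of time, half the difference shows up as offset error, which is one reason a far-away server can look “off” even when its clock is fine.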

How about Apple?

# ntpdate -q time.apple.com
server 17.254.0.28, stratum 2, offset 0.004297, delay 0.07193
server 17.254.0.31, stratum 2, offset 0.003697, delay 0.07074
server 17.254.0.26, stratum 2, offset 0.004346, delay 0.07195
server 17.254.0.27, stratum 3, offset 0.004369, delay 0.07195
 9 Mar 23:23:37 ntpdate[4321]: adjust time server 17.254.0.31 offset 0.003697 sec

This one looks different from the Microsoft one: there are four IPs for time.apple.com. Three of the four are stratum 2, and the fourth (17.254.0.27) is stratum 3. The offset here is better: 0.003697 seconds, or 3.7ms. It’s still pulling me in the wrong direction (-2ms was ruled the best, whereas this wants to advance me about +4ms), but it’s much closer to accurate.

This got me wondering: Apple has 4 A records for time.apple.com… What does the NTP pool look like?

# ntpdate -q pool.ntp.org
server 67.201.12.252, stratum 3, offset 0.002990, delay 0.05959
server 69.36.240.252, stratum 2, offset 0.003088, delay 0.06898
server 98.172.38.232, stratum 2, offset 0.022841, delay 0.08508
server 209.67.219.106, stratum 3, offset 0.019785, delay 0.02614
server 216.184.20.83, stratum 3, offset 0.012001, delay 0.10208
 9 Mar 23:31:21 ntpdate[7524]: adjust time server 69.36.240.252 offset 0.003088 sec

It returns five hosts. And a quick aside:

;; ANSWER SECTION:
pool.ntp.org.           948     IN      A       209.67.219.106
pool.ntp.org.           948     IN      A       67.201.12.252
pool.ntp.org.           948     IN      A       98.172.38.232
pool.ntp.org.           948     IN      A       69.36.240.252
pool.ntp.org.           948     IN      A       216.184.20.83

They have a long TTL: 948 seconds, or about 16 minutes. Sure enough, asking again gets me the same five, but with a decremented TTL. They do run GeoIP, though, so my other server is getting a different list, even though both are in the US. (But it keeps getting the same list, too.) But anyway, back to the ntpdate output…

I’d actually run this earlier and gotten all stratum 1s and stratum 2s, which I was going to post. Now it’s all 2s and 3s. The dork in me wants to run this every TTL seconds and keep logs.

This shows where the NTP clock selection algorithm shines. I know from belonging to the pool that we get “monitored” by a server that checks our time a few times an hour and pulls us out of the pool if our servers are off (I think a second is the tolerance). Here, the worst server is 0.022841 seconds (22.8ms) off, and one of the best is picked. (The “best,” the one closest to mine, is actually discounted because it’s a stratum 3, and the stratum 2 one is presumed to be more accurate. Closeness to your own clock is generally not a metric, unless you’re running these on a server that’s already kept to good time.) But the net result is an offset of about 3ms: a slight improvement over Apple, and a big improvement over Microsoft.
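The real selection logic in ntpdate and ntpd is more involved than this (it filters samples and discards outliers), but a deliberately simplified rule of “lowest stratum first, then shortest round-trip delay” happens to reproduce the server ntpdate picked in both the pool and Apple outputs above. This is a sketch of that simplified rule using those numbers, not ntpdate’s actual algorithm.

```python
from typing import List, NamedTuple


class Reply(NamedTuple):
    server: str
    stratum: int
    offset: float  # seconds
    delay: float   # seconds


def pick(replies: List[Reply]) -> Reply:
    """Simplified choice: lowest stratum first, then shortest round-trip delay."""
    return min(replies, key=lambda r: (r.stratum, r.delay))


pool = [
    Reply("67.201.12.252", 3, 0.002990, 0.05959),
    Reply("69.36.240.252", 2, 0.003088, 0.06898),
    Reply("98.172.38.232", 2, 0.022841, 0.08508),
    Reply("209.67.219.106", 3, 0.019785, 0.02614),
    Reply("216.184.20.83", 3, 0.012001, 0.10208),
]
apple = [
    Reply("17.254.0.28", 2, 0.004297, 0.07193),
    Reply("17.254.0.31", 2, 0.003697, 0.07074),
    Reply("17.254.0.26", 2, 0.004346, 0.07195),
    Reply("17.254.0.27", 3, 0.004369, 0.07195),
]

for name, replies in (("pool", pool), ("apple", apple)):
    best = pick(replies)
    print(f"{name}: {best.server} (offset {best.offset * 1000:.1f} ms)")
```

Note that offset never enters the key: as mentioned, closeness to your own clock isn’t what the client optimizes for.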

Aside: I just ran the Microsoft one again, and its offset had dropped to 2.3ms. So I ran it with Apple and Microsoft in one, letting the NTP algorithm do its magic… Suddenly Microsoft was suggesting a -1ms offset, with Apple pushing a 4ms increase. They’ve all converged a bit, although the time.windows.com one seems to jump around a bit more than the others?

The moral of the story? All of these timeservers are really good if you’re just looking to keep the clock in your system tray right. If you’re a total nut for the right time, the pool is best, followed by Apple, then Microsoft.

Addendum: Microsoft hosts one of NIST’s Stratum 1 servers, which makes it all the stranger that time.windows.com is all the way down at stratum 3.

Time

So I’ve mentioned before that I run an NTP server. Stratum 2, which means it gets its time from a “Stratum 1,” which is set directly to something reliable. The main goal of NTP is to keep clocks in sync, and it’s pretty accurate, down to a fraction of a second, which is more accuracy than most people need. All of my computers will now agree on the time down to a second.

The ultimate source, of course, is the atomic clock. But there isn’t an atomic clock, per se. There’s actually an array of them, each using cesium or hydrogen as an atomic reference. Collectively they form “the” atomic clock, which is used as a frequency standard.

It’s all well and good to keep your computer clock (and wristwatch, and microwave, and oven, and wall clock…) synced within a second, but some things need more accuracy. The USNO (US Naval Observatory, in charge of maintaining the atomic clock system) explains one common scenario well: systems for determining one’s location, such as GPS and LORAN “are based on the travel time of the electromagnetic signals: an accuracy of 10 nanoseconds (10 one-billionths of a second) corresponds to a position accuracy of 10 feet.” There are also lots of other scientific uses for extremely precise time, many of which I couldn’t even begin to understand the basic premise of. But suffice it to say that there are actually a lot of times when knowing the time down to the nanosecond is important.
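The USNO’s 10-nanoseconds-to-10-feet figure is just the speed of light times the timing error. This short sketch shows that scaling. Real GPS positioning combines many satellites and corrections, so treat it as an illustration of orders of magnitude only.

```python
C = 299_792_458  # speed of light in vacuum, m/s
FEET_PER_METER = 3.28084


def range_error_m(timing_error_s: float) -> float:
    """Distance a radio signal travels during a given timing error."""
    return C * timing_error_s


# 10 ns, 100 ns, and 1 ms (NTP-class accuracy), expressed in nanoseconds
for ns in (10, 100, 1_000_000):
    meters = range_error_m(ns * 1e-9)
    print(f"{ns:>9,} ns -> {meters:,.1f} m ({meters * FEET_PER_METER:,.1f} ft)")
```

At 10 ns that’s about 3 m (just under 10 feet); at a millisecond it’s roughly 300 km, which is why NTP-level accuracy is useless for navigation.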

Things like NTP don’t cut it here. You can get down to the millisecond, but you need to be about a million times more accurate. (A millisecond is a thousand microseconds, and a microsecond is a thousand nanoseconds.) So how do you keep the exact time? It turns out that there are actually several ways. One way (decreasingly common) is to keep an atomic clock of your own. You can buy a “small” (the size of a computer…ish) device that has cesium or hydrogen or rubidium inside of it, which keeps pretty accurate time. Over time it’ll wander, but at least short-term, it’s quite accurate.

One of the oldest methods is WWV, a shortwave radio station (and its Hawaiian sister station, WWVH). They run continuously, disseminating the exact time via radio as observed from the atomic clock system. In the past I’ve synced my watch to this source. More notable, in a behind-the-scenes sort of way, is WWVB, a low-frequency (60 kHz) radio broadcast. This is what all your “atomic wall clocks” sync to. (Incidentally, I’ve read that most of them are fairly cheaply built, meaning that their time is really not accurate to better than a second.) Another interesting sidenote is the deal with their antennas: a quarter-wavelength antenna at such a low frequency is 1,250 meters tall, or about 4,100 feet (roughly three-quarters of a mile). But with some wacky designs they can overcome this (although pouring 50,000 Watts into it also helps).
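The antenna figure falls straight out of wavelength = speed of light / frequency. This quick check uses the free-space speed of light; real antenna design involves corrections this ignores.

```python
C = 299_792_458        # speed of light in vacuum, m/s
FEET_PER_METER = 3.28084
METERS_PER_MILE = 1609.344


def quarter_wave_m(freq_hz: float) -> float:
    """Length of a quarter-wavelength antenna in free space."""
    return C / freq_hz / 4


meters = quarter_wave_m(60_000)  # WWVB broadcasts at 60 kHz
print(f"{meters:,.0f} m, {meters * FEET_PER_METER:,.0f} ft, "
      f"{meters / METERS_PER_MILE:.2f} mi")
```

That works out to about 1,249 m, 4,098 ft, or 0.78 miles.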

The problem with “straight” receivers for WWVB, though, is that you have to figure in the time it takes for the signal to reach you, which is rarely done all that well (if at all). Instead, a more common technology is used: GPS.

It turns out that GPS carries insanely accurate time. Wikipedia has a really good article on it. Each GPS satellite carries an atomic clock onboard, and people on the ground keep it synced (with nanosecond accuracy) to the atomic system. There’s some funky correction going on to keep things perfectly accurate. GPS has a claimed accuracy of 100 nanoseconds, although people have found that it’s actually about ten times better, down to 10 nanoseconds or so.

As an aside, GPS in general is an interesting read. There’s a lot more going on than meets the eye. I recently dug up an old GPS and wondered if it needed an “update” to get new satellite positions: with ham satellites, we get periodic updates for our tracking software to account for changes in their path. GPS has a neat solution, though: the satellites broadcast this data. More precisely, each one broadcasts the data for all the satellites, so that seeing one satellite will fill you in on the whole setup. There used to be Selective Availability, basically a deliberate introduction of error into the signal. The premise was that we didn’t want enemy forces using it: imagine a GPS-guided rocket, for example. So we introduced error of about 30 meters for a while. Ironically, it was ended because our own troops (before Iraq) couldn’t get enough military units, so they were just buying off-the-shelf civilian units and incurring the decreased accuracy. So Selective Availability has been turned off, and there are indications that the change is permanent. A third interesting tidbit is that the GPS satellites carry much more than might meet the eye, including equipment for detecting nuclear detonations.

The timekeeping problem, though, is what to do with the time once the GPS receiver has it. High-end GPS units will provide a pulse-per-second signal, which you can hook up to a computer via serial to achieve great accuracy. But there are all sorts of considerations I never thought of. It takes a little bit of time between the moment the pulse actually raises the voltage on the pin and the moment the operating system has processed it, so there are special kernel modifications available for Linux and BSD that basically have the kernel monitor the serial port directly, greatly speeding up its processing. I also discovered the Precision Time Protocol (commonly known by its technical name, IEEE 1588), which is designed to keep extremely accurate time over Ethernet, but apparently requires special NICs to truly work well.

I’ve also learned another interesting tidbit of information. CDMA (which is a general standard, not just the cell phone technology that Verizon uses) apparently requires time down to the microsecond to keep everything, such as your multiple towers and all the units (e.g., phones), in sync and transmitting at the right times. So the easiest way to keep all of their towers in sync to a common standard was to put a GPS receiver at each tower and sync the system to that. Thus CDMA carries extremely accurate time derived from GPS, which has led to some interesting uses. It’s hard to get a GPS signal indoors, so they now make CDMA time units: they sit on a CDMA network in receive-only mode, getting the time but never taking the “next step” of actually affiliating with the network. This lets people get GPS-level accuracy inside buildings.

I Told You…

Texas has problems.

Apparently, in the counties that got around to holding caucuses and primaries, no one was quite sure what they were doing. People waited hours to cast their ballots (wait, you cast ballots in a caucus? How is that different from a primary? Why do they hold them on the same day?), and the process apparently also confused a lot of people by, for some reason, asking them to select their sexual orientation?

The results (of candidates, not Texans’ sexuality) are still coming in….