Deal!

I’d posted before about my interest in picking up a low-capacity SSD for my laptop, to drastically speed up disk access. (This actually has nothing to do with my recent posts about slow hard drives…)

Newegg seems to have a 64 GB, 2.5″ SATA SSD for $240 after rebate. Interestingly, from the specs, it seems as if not only are the seek times nil (on account of being solid-state), but the throughput exceeds that of your average hard disk. It won’t be released for four days, however. (Found via FatWallet, which also links to a review here.)

For those who aren’t major geeks, SSD is short for “solid-state disk.” Your ordinary hard drive is a bunch of spinning platters, whereas solid-state is the technology you see in a USB thumb drive or the like: no moving parts. The major benefit of SSDs thus far has been seek time: with a normal hard disk, the head has to physically move to the right spot on the platter before it can read anything. Seek times average 8-10ms on most normal drives, and that adds up quickly with fragmentation or concurrent I/O. With an SSD, there are no moving parts, so “seek time” is pretty much non-existent: files are ready almost instantly. Early SSDs didn’t seem capable of moving as much data (in terms of MB/sec), though, meaning that SSDs were great for lots of small “random” accesses, but not so hot for handling big, contiguous files. Now, it’s looking as if OCZ has made SSDs kick butt over normal hard drives, and somehow offered the product at a fraction of what it normally costs. (This 64GB SSD is more normally-priced, to give you an idea of why they haven’t caught on so quickly.)
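(If you want to see the seek-time effect for yourself, here’s a rough Python sketch: it times sequential reads against random reads on a big file. The filename and sizes are placeholders, and the OS cache will skew things unless the file is much bigger than your RAM, so treat it as a back-of-the-envelope test rather than a real benchmark.)

    import os, random, time

    PATH = "testfile.bin"   # placeholder: any large file, ideally several times bigger than RAM
    BLOCK = 4096            # read 4 KB at a time
    COUNT = 2000            # reads per test

    size = os.path.getsize(PATH)

    # Sequential reads: the drive just streams data, so raw throughput dominates.
    with open(PATH, "rb") as f:
        start = time.time()
        for _ in range(COUNT):
            f.read(BLOCK)
        sequential = time.time() - start

    # Random reads: a spinning disk pays a seek penalty (8-10ms) on every read;
    # an SSD barely notices the difference.
    with open(PATH, "rb") as f:
        start = time.time()
        for _ in range(COUNT):
            f.seek(random.randrange(0, size - BLOCK))
            f.read(BLOCK)
        rand = time.time() - start

    print("sequential: %.3fs   random: %.3fs" % (sequential, rand))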

Incidentally, today I came across deals on two different notebooks for about $700, both of which have 4GB RAM but 1280×800-pixel screens. The RAM is incredible, as are most of the other specs (though they have 5400 RPM drives), but I think you can do much better on the resolution.

Location Error vs. Time Error

This post christens my newest category, Thinking Aloud. It’s meant to house random thoughts that pop into my head, versus fully fleshed-out ideas. It’s more an invitation for comments than something factual or informative, and is likely full of errors…

Aside from “time geeks,” those who deal with it professionally, and those intricately familiar with the technical details, most people are probably unaware that each of the GPS satellites carries an atomic clock on board. This is necessary because the system works, in a nutshell, by triangulating your position from various satellites, and an integral detail is knowing precisely where each satellite is at a given time. More precise time means a more precise location, and there’s not much margin of error here. The GPS satellites are also synchronized daily to the “main” atomic clock (actually a bunch of atomic clocks based on a few different standards), so the net result is that the time from a GPS satellite is accurate down to the nanosecond level: they’re within a few billionths of a second of the true time. Of course, GPS units, since they don’t cost millions of dollars, rarely output time this accurately, so even the best units seem to have “only” microsecond accuracy, or time down to a millionth of a second. Still, that’s pretty darn precise.

Thus many–in fact, most–of the stratum 1 NTP servers in the world derive their time from GPS, since it’s now pretty affordable and incredibly accurate.

The problem is that GPS isn’t perfect. Anyone with a GPS probably knows this. It’s liable to be anywhere from a foot off to something like a hundred feet off. This server (I feel bad linking, having just seen what colocation prices out there are like) keeps a scatter plot of its coordinates as reported by GPS. This basically shows the random noise (some would call it jitter) of the signal: the small inaccuracies in GPS are what result in the fixed server seemingly moving around.

We know that an error in location will also cause (or, really, is caused by) an error in time, even if it’s minuscule.

So here’s the wondering aloud part: we know that the server is not moving. (Or at least, we can reasonably assume it’s not.) So suppose we define one position as “right,” and any deviation in that as inaccurate. We could do what they did with Differential GPS and “precision-survey” the location, which would be very expensive. But we could also go for the cheap way, and just take an average. It looks like the center of that scatter graph is around -26.01255, 28.11445. (Unless I’m being dense, that graph seems ‘sideways’ from how we typically view a map, but I digress. The latitude was also stripped of its sign, which put it in Egypt… But again, I digress.)
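(To make the “just take an average” part concrete, here’s about all the code it would take, assuming you’d logged the reported fixes to a file; positions.csv is a made-up name and format.)

    # positions.csv: one "lat,lon" GPS fix per line, logged over a day or so (made-up file)
    lats, lons = [], []
    with open("positions.csv") as f:
        for line in f:
            lat, lon = map(float, line.split(","))
            lats.append(lat)
            lons.append(lon)

    print("assumed 'true' position: %.5f, %.5f" % (sum(lats) / len(lats),
                                                   sum(lons) / len(lons)))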

So suppose we just defined that as the “correct” location, as it’s a good median value. Could we not write code to take the difference in reported location and translate it into a shift in time? Say that six meters East is the same as running 2 microseconds fast? (Totally arbitrary example.) I think the complicating factor wouldn’t be whether it’s possible, but knowing what to use as ‘true time,’ since if you picked an inaccurate assumed-accurate location, you’d essentially be introducing error, albeit a constant one. The big question, though, is whether it’s worth it: GPS is quite accurate as it is. I’m a perfectionist, so there’s no such thing as “good enough” time, but I have to wonder whether the benefit would even show up.
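(For a sense of scale: radio waves cover about 0.3 meters per nanosecond, so dividing a position error by the speed of light gives the matching time error. This sketch ignores the geometry of multiple satellites, so it’s only a back-of-the-envelope figure, but it suggests a few meters of wander corresponds to tens of nanoseconds, not microseconds.)

    C = 299792458.0  # speed of light, meters per second

    def position_error_to_time_error(meters):
        """Very rough: the clock error that would explain a given position error
        along the line of sight to a single satellite."""
        return meters / C  # in seconds

    for m in (1, 6, 30):
        ns = position_error_to_time_error(m) * 1e9
        print("%3d m of position error ~ %5.1f ns of time error" % (m, ns))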

Building an Improvised CDN

From my “Random ideas I wish I had the resources to try out…” file…

The way the “pretty big” sites work is that they have a cluster of servers… A few are database servers, many are webservers, and a few are front-end caches. The theory is that the webservers do the ‘heavy lifting’ to generate a page… But many pages, such as the main page of a news site, Wikipedia, or even these blogs, don’t need to be generated every time. The main page only updates every now and then. So you have a caching server, which basically handles all of the connections. If the page is in cache (and still valid), it’s served right then and there. If the page isn’t in cache, the cache fetches it from the backend servers, serves it up, and then adds it to the cache.
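(In code, the caching server’s whole job is roughly this; a minimal sketch where generate_page stands in for the backend and the 30-second lifetime is just an example value.)

    import time

    CACHE = {}               # url -> (expires_at, html)
    CACHE_LIFETIME = 30      # seconds, just an example value

    def handle_request(url, generate_page):
        """Serve from cache when the copy is still valid; otherwise ask the backend."""
        now = time.time()
        entry = CACHE.get(url)
        if entry and entry[0] > now:           # cached and still valid
            return entry[1]
        html = generate_page(url)              # the expensive backend work
        CACHE[url] = (now + CACHE_LIFETIME, html)
        return html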

The way the “really big” sites work is that they have many data centers across the country and your browser hits the closest one. This improves load times and adds redundancy (data centers do periodically go offline: The Planet had it happen just last week when a transformer inside blew up and the fire marshals made them shut down all the generators). Depending on whether they’re filthy rich or not, they’ll either use GeoIP-based DNS, or have elaborate routing going on. Many companies offer these services, by the way. It’s called a CDN, or Content Delivery Network. Akamai is the most obvious one, though you’ve probably used LimeLight before, too, along with some other less-prominent ones.

I’ve been toying with SilverStripe a bit, which is very spiffy, but it has one fatal flaw in my mind: its out-of-box performance is atrocious. I was testing it on a VPS I hadn’t used before, so I don’t have a good frame of reference, but I got between 4 and 6 pages/second under benchmarking. That was after I turned on MySQL query caching and installed APC. Of course, I was using SilverStripe to build pages that would probably stay unchanged for weeks at a time. The 4-6 pages/second is similar to how WordPress behaved before I worked on optimizing it. For what it’s worth, static content (that is, stuff that doesn’t require talking to databases and running code) can be served at 300-1000 pages/second on my server, as some benchmarks I did demonstrated.

There were two main ways I thought of to enhance SilverStripe’s performance. (Well, a third option, too: realize that no one will visit my SilverStripe site and leave it as-is. But that’s no fun.) The first is to ‘fix’ SilverStripe itself. With WordPress, I tweaked MySQL and set up APC (which gave a bigger boost than with SilverStripe, but still not a huge gain). But then I ended up coding the main page from scratch, and it uses memcache to store the generated page in RAM for a period of time. Instantly, benchmarking showed that I could handle hundreds of pages a second on the meager hardware I’m hosted on. (Soon to change…)
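(The memcache trick is only a few lines; here’s a sketch of the shape of it using the python-memcached client, with build_front_page() standing in for whatever actually generates the page.)

    import memcache

    mc = memcache.Client(["127.0.0.1:11211"])         # assumes a local memcached instance

    def front_page():
        html = mc.get("front_page_html")
        if html is None:                              # cache miss: rebuild the slow way
            html = build_front_page()                 # stand-in for the real page generator
            mc.set("front_page_html", html, time=30)  # keep it in RAM for 30 seconds
        return html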

The other option, and one that may actually be preferable, is to just run the software normally, but stick it behind a cache. This might not be an instant fix, as I’m guessing the generated pages are tagged to not allow caching, but that can be fixed. (Aside: people seem to love setting huge expiry times for cached data, like having it cached for an hour. The main page here caches data for 30 seconds, which means that, worst case, the backend would be handling two pages a minute. Although if there were a network involved, I might bump it up or add a way to selectively purge pages from the cache.) squid is the most commonly-used one, but I’ve also heard interesting things about varnish, which was tailor-made for this purpose and is supposed to be a lot more efficient. There’s also pound, which seems interesting, but doesn’t cache on its own. varnish doesn’t yet support gzip compression of pages, which I think would be a major boost in throughput. (Although at the cost of server resources, of course… Unless you could get it working with a hardware gzip card!)
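(On the “tagged to not allow caching” point: it usually comes down to the Cache-Control headers the application emits. Here’s a sketch of what the backend would need to send so that squid, varnish, or a remote cache will hold a page for 30 seconds; it’s a bare WSGI app with render_page() as a stand-in.)

    def app(environ, start_response):
        html = render_page(environ)                 # stand-in for the real page generator
        start_response("200 OK", [
            ("Content-Type", "text/html; charset=utf-8"),
            # Tell any cache in front of us (squid, varnish, a remote reverse proxy)
            # that this response is cacheable, but only for 30 seconds.
            ("Cache-Control", "public, max-age=30"),
        ])
        return [html.encode("utf-8")]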

But then I started thinking… That caching frontend doesn’t have to be local! Pick up a machine in another data center as a ‘reverse proxy’ for your site. Viewers hit that, and it keeps an updated copy of your pages in its cache. Pick up a server when someone’s having a sale and set it up.

But then, you can take it one step further, and pick up boxes to act as your caches in multiple data centers. One on the East Coast, one in the South, one on the West Coast, and one in Europe. (Or whatever your needs call for.) Use PowerDNS with GeoIP to direct viewers to the closest cache. (Indeed, this is what Wikipedia does: they have servers in Florida, the Netherlands, and Korea… DNS hands out the closest server based on where your IP is registered.) You can also keep DNS records with a fairly short TTL, so if one of the cache servers goes offline, you can just pull it from the pool and it’ll stop receiving traffic. You can also use the cache nodes themselves as DNS servers, to help make sure DNS is highly redundant.
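(The “direct viewers to the closest cache” logic itself is almost trivial; PowerDNS’s pipe backend, for instance, can hand the decision to a script along these lines. country_of() is a stand-in for whatever GeoIP lookup you use, and the addresses and region split are placeholders.)

    # Which cache node should serve which rough region (placeholder addresses).
    NODES = {
        "us-east": "192.0.2.10",
        "us-west": "192.0.2.20",
        "europe":  "192.0.2.30",
    }

    REGION_OF_COUNTRY = {
        "US": "us-east",    # could be split further by coast or state
        "CA": "us-east",
        "GB": "europe",
        "DE": "europe",
        "FR": "europe",
    }

    def pick_node(client_ip):
        country = country_of(client_ip)                     # stand-in for a GeoIP lookup
        region = REGION_OF_COUNTRY.get(country, "us-east")  # default when we don't recognize the country
        return NODES[region]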

It seems to me that it’d be a fairly promising idea, although I think there are some potential kinks you’d have to work out. (Given that you’ll probably have 20-100ms latency in retrieving cache misses, do you set a longer cache duration? But then, do you have to wait an hour for your urgent change to get pushed out? Can you flush only one item from the cache? What about uncacheable content, such as when users have to log in? How do you monitor many nodes to make sure they’re serving the right data? Will ISPs obey your DNS TTLs? Most of these things have obvious solutions, really, but the point is that it’s not an off-the-shelf solution, but something you’d have to mold to fit your exact setup.)
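(For the “flush only one item” question above: squid and varnish can both be set up to accept PURGE requests, so pushing an urgent change is a matter of asking every node to drop one URL. A sketch, with hypothetical node hostnames, assuming the caches have been configured to allow PURGE from your address.)

    import http.client

    CACHE_NODES = ["cache-east.example.com", "cache-west.example.com", "cache-eu.example.com"]  # placeholders

    def purge(path, host="www.example.com"):
        """Ask every cache node to drop its cached copy of one URL."""
        for node in CACHE_NODES:
            conn = http.client.HTTPConnection(node, 80, timeout=5)
            conn.request("PURGE", path, headers={"Host": host})
            response = conn.getresponse()
            print("%s: %s %s" % (node, response.status, response.reason))
            conn.close()

    purge("/index.html")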

Aside: I’d like to put nginx, lighttpd, and Apache in a face-off. I’m reading good things about nginx.

Broken Windows

Last night we were unloading a shopping cart. When we were done, the cart return was pretty far away. But there were about ten other shopping carts littering the parking lot nearby, so I said, “Meh, what’s one more?”

As we got in the car, I proclaimed, “Broken Windows in action!” I think people were confused and assumed I was referring to a literal window that was broken. Instead, I was referring to the Broken Windows Theory, which is an interesting read. The basic premise is that researchers watched an abandoned warehouse. For weeks, no one vandalized the building. One day, one of the researchers (deliberately) broke one of the windows. In short order, vandals knocked out the rest of the windows. The theory is used a lot in policing, but I think it has applications in many other places. Such as parking lots: if you’re diligent in bringing in carts, I’d argue that you’d keep people from doing what I did. (I also felt the same way at the bowling alley: if we frequently picked up candy wrappers and popcorn from the floor, the place seemed pretty clean. If we slacked, it felt like the place was being trashed by everyone in short order.)

The theory does have its detractors, but it also has strange people who see applications of it in parking lots. Enjoy the photo of chives, which have nothing to do with anything, but I just took it and I like it.

Chives

Strange Antenna Challenge

You know those times when you decide to let yourself surf aimlessly? And an hour later, you have absolutely no idea how you got to where you did?

I found the K0S Strange Antenna Contest page from 2003, where some ham radio operators started using, well, strange things as antennas. Who’d think that a ladder works well? (No no, not ladder line, but an actual ladder.) In fact, after working some people off of a ladder, they got an even better idea, and stood several ladders up, using them to support a pair of extension ladders laid horizontally, forming a ladder dipole, with impressive results. Sadly, they report that combining two shopping carts to make a dipole did not get them any contacts, nor did a basketball hoop.
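(For a sense of how big these improvised antennas need to be: the usual rule of thumb for a half-wave dipole is length in feet ≈ 468 divided by the frequency in MHz, which ignores conductor thickness, height, and everything else that makes a ladder not quite a wire. Quick numbers:)

    def halfwave_dipole_feet(freq_mhz):
        """Classic rule of thumb for a half-wave dipole's end-to-end length."""
        return 468.0 / freq_mhz

    for f in (7.2, 14.2, 28.4):   # 40m, 20m, and 10m examples
        print("%5.1f MHz: about %4.1f feet" % (f, halfwave_dipole_feet(f)))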

This has me wondering what else would work… An aluminum chain link fence? A railing? Train tracks? Power lines? (Kidding on that one. Please do not try to attach anything to power lines.) Curtain rods? A couple of cars? A section of guardrail? A metal lamppost?

I poked around the site some more, to see if they did it in subsequent years. And they did. 2004, for example, saw my joke about using two cars come to fruition. (Okay, so they beat me to it by four years.) 2005 saw someone use a bronze statue, and, the next year, he was at it again with railroad tracks, albeit not full ones, but some sort of art exhibit / monument. (Aside: I’m pretty certain that trying to hook up a bunch of wires to actual train tracks would arouse a bit of suspicion from the police.) 2006 also saw a pair of exercise machines being used, with a note about how they weren’t very effective, but also the apt comment, “On the other hand, we did in fact make two contacts with a pair of exercise machines standing only a few inches above the earth!” And, confusing everything I know about antennas, someone used a tree. And a football stadium (which came with a comment about how the university police were initially slightly suspicious about someone getting out of their car and hooking wires up to the stadium for some reason). 2007 saw a bridge used as an antenna.

And 2008? Well, see, here’s the best thing. The 2008 Challenge is this weekend!

Of course, as a Technician-class licensee, I don’t have many HF privileges… The Technician license was (before the Morse code requirement was eliminated for all license classes) the only class that didn’t require a Morse code exam, so it’s somewhat ironic that almost all of the new HF privileges Techs were given are in the CW portions of various bands. I do get 28.3-28.5 MHz now, allowing SSB on HF…

Time to hit the books, I think. (I think mine–and that one–might be outdated, actually. Looks like the question pool got revised in 2007.) There are always sample exams online, and the feedback can be helpful. Study a bit and take an exam a day, and then review your answers. (Theoretically, actually, you could just learn the answers to each question without understanding the concepts, though that’s really missing the spirit and point of ham radio.)

That Wacky State

Can you guess the state?

  • Recently had about 100 students arrested, and several fraternities banned, after a massive drug dealing operation was busted at a state university.
  • Recently became the second state in the nation to give homosexuals equal rights.
  • Recently had 2 arrested at another school for selling body parts on the black market.

Okay, so the link gives it away. But this wasn’t really meant to stump people anyway.

A Little Irony?

This falls into the category of things very few people would notice, but….

Microsoft provides time.windows.com, a public NTP server, operating at stratum 2.

I just came across NTPmonitor, a novel Windows app to monitor a handful of NTP servers. (Sadly, it doesn’t offer the option to sync to any of them, probably because most people’s computers let them configure that already… Mine syncs to a domain controller which seems to want to give me the time, but not with too much accuracy.)

As with most full-featured NTP clients, it shows you what the remote timeserver reports as its reference clock. I’ve got my server in there, ttwagner.com, showing that it’s currently synced to clock.xmission.com. The “pool” server is pool.ntp.org; whichever of the many machines I connected to is synced to rubidium.broad.mit.edu. On the right we have time.nist.gov, synced to “ACTS,” NIST’s Automated Computer Time System.

On the left is time.windows.com, the Microsoft NTP server. Its upstream timeserver?

clock3.redhat.com.

Screenshot attached, since I wouldn’t believe it without one.

time.windows.com gets its time from clock3.redhat.com
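(If you’d rather check it yourself than trust a screenshot, the ntplib Python package makes it easy: ref_id_to_text() decodes the reference ID in the reply, which for stratum 2 and higher servers is the address of the upstream source.)

    import ntplib

    client = ntplib.NTPClient()
    for server in ("time.windows.com", "time.nist.gov", "pool.ntp.org"):
        response = client.request(server, version=3)
        upstream = ntplib.ref_id_to_text(response.ref_id, response.stratum)
        print("%-18s stratum %d, reference: %s" % (server, response.stratum, upstream))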

Big Iron

I keep coming across things like this eBay listing: a Sun Enterprise 4500 with 12 SPARC processors (400 MHz, 4MB cache) and 12 GB of RAM. This one looks to have a couple of Gigabit fiber NICs, too. (Although it’s fiber, so you’d need a pricier switch to use it on a “normal” copper home LAN.)

Even if you foolishly assume that a 400 MHz SPARC is no better than a 400 MHz Celeron, twelve processors still net you 4.8 GHz of aggregate clock. Of course, with a dozen processors, this is clearly best for something that’s very multi-threaded.

Of course, there’s one problem: these machines use SCSI disks. SCSI’s great and all, but it’s expensive, and you can be sure that, if this machine even comes with hard drives (none are listed?), they’re 9GB. So pick up one of these. What’s that you say? Oh, it’s ATA and won’t work with SCSI? No problem!

Nowhere that I can see does Sun mention whether Solaris 10 / OpenSolaris will run on older hardware like this, but I assume it will. Some Linux distros also run well on SPARC.

Now the real question: how much electricity does this thing use?