Big Iron

I keep coming across things like this eBay listing. Sun Enterprise 4500, 12 SPARC processors (400 MHz, 4MB cache) and 12 GB of RAM. This one looks to have a couple Gigabit fiber NICs, too. (Although it’s fiber, so you’d need a pricier switch to use it on a “normal” copper home LAN.)

Even if you foolishly assume that a 400 MHz SPARC is no better than a 400 MHz Celeron, with 12 processors, this is still a net of 4.8 GHz. With a dozen processors, this is clearly best for something that’s very multi-threaded.

Of course, there’s one problem: these machines use SCSI disks. SCSI’s great and all, but it’s expensive, and you can be sure that, if this machine even comes with hard drives (none are listed?), they’re 9GB. So pick up one of these. What’s that you say? Oh, it’s ATA and won’t work with SCSI? No problem!

Nowhere that I see does Sun mention whether Solaris 10 / OpenSolaris will run on older hardware, but I assume it will. Some Linux distros also excel at running on platforms like SPARC.

Now the real question: how much electricity does this thing use?

Faster Compression

It’s no secret that gzip is handy on UNIX systems for compressing files. But what I hadn’t really considered before is that you don’t have to create a huge file and then gzip it. You can simply pipe output through it and have it compressed on the fly.

For example:

[root@oxygen]# mysqldump --all-databases -p | gzip > 2008May10-alldbs.sql.gz

That backed up all the databases on this machine and compressed them. (It’s a 31MB file, but that’s nothing when you realize that one of my databases is about 90MB in size, and I have plenty others at 10-20MB each.)
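
Restoring works the same way, just reversed. A quick sketch, assuming a dump named like the one above (mysql -p will prompt for the password):

gunzip < 2008May10-alldbs.sql.gz | mysql -p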

Tip o’ the Day

The Web Developer toolbar, which is (1) the #1 hit on Google for “Web Developer,” and (2) now compatible with Firefox 3 beta, is totally awesome. You may recall that, in the past, if you had text after a bulleted list or similar on this page, the text would suddenly be mashed together. I never took the time to fully look into it, but it always irked me.

A quick “Outline… Outline Block Level Elements” drew colored boxes around each element of the page, which was exceptionally helpful. This shows the problem: posts start off inside a <p> tag, and adding a list or similar closes the <p> tag. This would have been an easy catch, except that the list looked fine. Upon closer review, it’s because the lists specified the same line-spacing, thus looking right. While I most likely could have solved this by staring at the code for a long time, Web Developer made it much easier to spot: the first text is inside one box, followed by the list, but the other text is floating outside, leading to a quick, “Oh, I should look at how the <p> is set up” thought, which ended up being exactly the problem. (There’s a bit of excessive space now, but that’s caused by me using PHP to inject linebreaks.)

Web Developer also includes a lot of other useful tools, including the ability to edit the HTML of the page you’re viewing, view server headers, resize non-resizeable elements and frames, show page comments, change GETs to POSTs and vice-versa, and much more. Whether you do design full-time or just occasionally fix things, it’s worth having. And you can’t beat the fact that it’s free.

Web Compression

I’ve alluded before to using gzip compression on a webserver. HTML is very compressible, so servers moving tremendous amounts of text/HTML would see a major reduction in bandwidth. (Images and such would not see much of a benefit, as they’re already compressed.)

As an example, I downloaded the main page of Wikipedia, retrieving only the HTML and none of the supporting elements (graphics, stylesheets, external JavaScript). It’s 53,190 bytes. (This, frankly, isn’t a lot.) After running it through “gzip -9” (strongest compression), it’s 13,512 bytes, just shy of a 75% reduction in size.
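
If you want to repeat the measurement, here’s a rough sketch (the URL is just an example; wc -c prints byte counts, so the two numbers give you before and after):

wget -q -O main.html http://en.wikipedia.org/wiki/Main_Page
wc -c main.html
gzip -9 -c main.html | wc -c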

There are a few problems with gzip, though:

  • Not all clients support it, although frankly, I think most do. This isn’t a huge deal, though, as the client and server “negotiate” the content encoding, so it’ll only be used if it’s supported.
  • Not all servers support it. I don’t believe IIS supports it at all, although I could be wrong. Apache/PHP will merrily do it, but it has to be enabled, which means that lazy server admins won’t turn it on. (You can check what any given server is doing from the command line; see the sketch after this list.)
  • Although it really shouldn’t work that way, it looks to me as if it will ‘buffer’ the whole page then compress it, then send it. (gzip does support ‘streaming’ compression, just working in blocks.) Thus if you have a page that’s slow to load (e.g., it runs complex database queries that can’t be cached), it will appear even worse: users will get a blank page and then it will suddenly appear in front of them.
  • There’s overhead involved, so it looks like some admins keep it off due to server load. (Aside: it looks like Wikipedia compresses everything, even dynamically-generated content.)
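
As for checking whether a particular server will compress for you: the easiest way is to ask for gzip yourself and look at the response headers. A quick sketch (the URL is just an example):

curl -s -o /dev/null -D - -H "Accept-Encoding: gzip" http://en.wikipedia.org/wiki/Main_Page | grep -i content-encoding

If the grep turns up a “Content-Encoding: gzip” line, the server compressed the response; if it prints nothing, you got it uncompressed.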

But I’ve come across something interesting… A Hardware gzip Compression Card, apparently capable of handling 3 Gbits/second. I can’t find it for sale anywhere, nor a price mentioned, but I think it would be interesting to set up a sort of squid proxy that would sit between clients and the back-end servers, seamlessly compressing outgoing content to save bandwidth.

Job Qualification

A job posting just listed something like, “Experience using TCP/IP” as a requirement.

I’ve been using it since the fifth grade or so, when we networked two PCs together. I now use it on a daily basis. I’ve used it very extensively, including billions of ACKs, millions of SYNs and FINs, and even some RSTs and PSHs.

But I’m not just being a goofball. The job doesn’t seem to entail any low-level knowledge of TCP/IP. I think they’re just looking for someone who knows what it is. (I also have extensive experience using ICs, device drivers, and OS kernels.)

I don’t really want to work at this place.

The Dream Network

Periodically I come across deals for computers that are very tempting. I’m not necessarily in the market right away: I’m going to keep my laptop until I’ve been working long enough that I can afford something stellar. It’s silly to “upgrade” a little bit. But every time I see these deals, I think of the various ways I could set things up… My “ideal (but realistic) computer” would actually be a network:

  • Network infrastructure: Gigabit Ethernet, switched, over Cat6. 10GigE and fiber are cool, but really not worth the cost for a home network.
  • A server machine. It needn’t be anything too powerful, and could (should) be something that doesn’t use a ton of electricity. The machine would run Linux and serve multiple roles:
    • Fileserver. It’d have a handful (4-6?) of 500GB disks, running RAID. While performance is important, it matters more to me that this thing be very ‘safe’ and not lose data. (Actually, in a very ideal setup, there’d be two fileservers for maximum redundancy, but my goal with this setup is to be reasonable. What interests me, though, is that I think it’d be possible to use an uncommon but awesome network file system like Coda or AFS, but also expose some network shares on top of that service that ‘look normal,’ so Windows could just connect to an M: drive or whatnot, merrily oblivious to the fact that the fileserver is actually a network of two machines.) It’s important that the machine have gobs of free space, so that I can rip every CD and DVD I own, save every photo I take, and back up my computers, without ever worrying about being almost out of disk space. It’s also important to be hyper-organized here, and have one “share” for music, one “share” for photos I’ve taken, etc. (There’s a rough RAID sketch after this list.)
    • Internet gateway. It’d act as my router/firewall to the Internet, and also do stuff like DNS caching. It may or may not serve as a caching proxy; I tend to only notice caches when they act up, but then again, it might be quite helpful.
    • Timeserver. For about $100 you can get a good GPS with PPS (pulse-per-second) output and keep time down to a microsecond. Hook it up to the serial port of this machine, and have your local machine sync to that for unnecessarily accurate time. (Actually, it looks like you can do PTP in software with reasonable accuracy?)
    • Asterisk machine, potentially taking in an analog phone line and also VoIP services, and giving me a nice IP-based system to use, blending them all so it’s transparent how they’re coming in. It would also do stuff like voicemail, call routing/forwarding, etc. For added fun, it could be made to do faxes: receive them and save them as a PDF, and act as a “printer” for outgoing faxes. The code’s there to do this already.
    • Printserver. If you have multiple machines, it’s best to hang your printer(s) off of an always-on server. It could speak CUPS or the like to Linux, and simultaneously share the printer for Windows hosts.
    • MythTV backend? But most likely not; I’d prefer to offload that to a more powerful machine, rather than bogging down a server.
  • Primary desktop. Surprisingly, a quad-core system, 4 GB of RAM, and a 24″ LCD can be had for around $1,000 these days. That’s all I need in a system. I have my Logitech G15, which is all the keyboard I need. My concern is with what to run… These days I make use of Windows and Linux pretty heavily. I think virtualization will be mature enough by the time I’m actually going for a setup like this to allow me to get a Linux-based Xen host and run Windows inside of a virtual machine with no performance degradation. (This is actually mostly possible already, but as Andrew will attest, Xen can still have some kinks….) The system should have a big monitor. It’d be interesting to put something like an 8GB solid-state drive in it and use that for a super-fast boot, but the jury’s still out on whether it’s worthwhile. (I guess that some places are pushing SSD under some special name to make Windows boot instantly, but the reviews I’ve heard suggest that it gives a nominal improvement at best.)
  • Secondary desktop. Pay attention for a while to the short bursts of time when you can’t use your computer. The system locks up for a bit, or it’s just unbearably slow while the disks spin up and get a massive file, or you have to reboot, or you’re playing a full-screen game and die and wait 15 seconds to respawn, or….. In this “ideal setup,” I’d have a second machine. It needn’t be anything special; in fact, it could be the cheapest machine possible. It’d basically run Firefox, AIM/IRC, Picasa (off of the network fileserver), iTunes, and the like. For the sake of completeness, it should probably run whatever the other system doesn’t, out of Linux, XP, and Vista.
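
As a rough illustration of the fileserver piece: Linux software RAID is pretty painless these days. This is only a sketch, and the device names, RAID level, filesystem, and mount point are all assumptions to be adjusted for the real hardware:

# four 500GB disks into one RAID 5 array, formatted and mounted
mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
mkfs.ext3 /dev/md0
mkdir -p /srv/files
mount /dev/md0 /srv/files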

Long-distance NTP

It’s widely-believed that geographic proximity is the main factor determining the accuracy of NTP. While I’m not necessarily disputing its importance, I wanted to mention some interesting tests I’ve done.

As I mentioned earlier, my (“our,” really — @ co-owns it, and keeps paying the bill without sending me his PayPal address… g) NTP server was incorrectly, and pretty randomly, labeled as being in Brazil, and thus set to serve time to people in Brazil and, more generally, South America. But the server is in Pennsylvania.

Surely, then, with such a long distance, quality must be terrible? I decided to measure it in reverse: I did a “mock sync” to the South American pool from my server in Texas. The command ntpdate -q servername will “sync” to a server (or set of servers) via NTP, but only show you the offset it would have applied, rather than actually adjusting your clock. I ran this on my stratum 2 server, which has been very stable on time: when it syncs with GPS-stabilized clocks every 1024 seconds, it’s rare for it to shift the time more than 10ms. So by syncing to the South American pool, I was able to, more or less, determine the effect of the long-distance, high-latency synchronization:

oxygen ~ # ntpdate -q south-america.pool.ntp.org
server 200.192.232.8, stratum 2, offset 0.021524, delay 0.23042
server 146.83.183.179, stratum 2, offset -0.002081, delay 0.17311
server 201.84.224.130, stratum 3, offset 0.004254, delay 0.19417
26 Apr 00:33:31 ntpdate[5828]: adjust time server 146.83.183.179 offset -0.002081 sec

(First of all, no, my US server is not in that list.) You can see that it selected three servers from the South American pool, and synced to each of them. I had pings of 230ms (0.23042 seconds), 173ms, and 194ms. I had offsets of 21ms, -2ms, and 4ms. Given the three choices, ntpdate deemed the middle the most accurate, indicating that, even with almost 200ms latency crossing continents (or arbitrary divisions thereof), it could be accurate to 2ms.

NTP “subtracts” latency from the times, so that a high-latency connection won’t affect the accuracy of the time. (What does break things, though, is highly-variable, or highly-asymmetric, latency, which is pretty much impossible to “calculate.”) Still, I was surprised by just how good it was.
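
For the curious, the arithmetic behind that latency-cancelling is simple. NTP keeps four timestamps per exchange: T1 when the client sends the request, T2 when the server receives it, T3 when the server replies, and T4 when the client gets the reply. A little sketch with made-up numbers (only the arithmetic matters; a symmetric 115ms each way cancels right out of the offset):

T1=0.000; T2=0.115; T3=0.116; T4=0.230
echo "(($T2 - $T1) + ($T3 - $T4)) / 2" | bc -l    # offset: 0.0005 s
echo "($T4 - $T1) - ($T3 - $T2)" | bc -l          # round-trip delay: 0.229 s

The ping only hurts you when the two directions take different amounts of time, which is exactly the asymmetric-latency problem mentioned above.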

I then tried a test synchronization to the US pool:

oxygen ~ # ntpdate -q us.pool.ntp.org
server 66.191.139.147, stratum 2, offset 0.001096, delay 0.07080
server 65.111.164.223, stratum 2, offset -0.003190, delay 0.07475
server 66.250.45.2, stratum 3, offset 0.013041, delay 0.07225
server 63.89.76.60, stratum 2, offset 0.005535, delay 0.07397
server 209.67.219.106, stratum 3, offset 0.003419, delay 0.02605
26 Apr 00:34:36 ntpdate[6216]: adjust time server 66.191.139.147 offset 0.001096 sec

This time we got 5 servers (as is normally the case: I think the set of 3 is just given out for groups with only a handful of total servers). Four of them had pings around 70ms, and the last had a ping of 26ms. The middle server had an offset (time difference) of 13ms, but the other four were pretty close. (And when one server is “way” off, NTP disregards it anyway. Plus, it was a stratum 3 mixed in with mostly stratum 2’s, which is another count against it.) It suggested advancing my clock by 1 millisecond. (When the code is good enough to discard a 13ms difference as ‘wildly inaccurate,’ you know it’s good.)

So we have a 1ms “error” in the US, and 2ms to South America. Africa is served by a mere 5 NTP servers. How did that go?

oxygen ~ # ntpdate -q africa.pool.ntp.org
server 41.223.43.5, stratum 2, offset 0.086558, delay 0.62405
server 196.43.1.14, stratum 3, offset 0.005741, delay 0.32759
server 196.25.1.9, stratum 2, offset -0.006428, delay 0.29601
server 196.43.1.9, stratum 2, offset -0.007126, delay 0.32123
server 196.25.1.1, stratum 2, offset -0.004245, delay 0.29314
26 Apr 00:31:07 ntpdate[4913]: adjust time server 196.25.1.1 offset -0.004245 sec

Oddly, I was served all five servers in the zone. If you look at pings, they ranged from 293ms to 624ms. That’s really bad. And the first server gave terrible time: essentially wanting to advance my clock 87ms. But again, with multiple servers present, NTP did a great job of working out the right time. It ended up trying to move my clock back 4ms.

300ms (best case) pings, going from the US to Africa, and it’s accurate to 4ms?

Some conclusions, caveats, and comments to consider:

  • “Accuracy” and “error” are really not appropriate terms. Because the server doesn’t have a time source connected directly (e.g., a GPS receiver with pulse-per-second output, a WWV receiver, or a cesium / rubidium reference recently synced to the atomic clock), the server really doesn’t know the true time. Thus we can only measure how the time differences compare. In this, I essentially just assume that my server has the accurate time, which is a dubious claim.
  • The main conclusion here is that NTP is really good at dealing with long pings. The fact that my data is crossing the ocean doesn’t really matter: it can calculate round-trip error. In light of this fact, and ignoring other information, geography is not important.
  • The server I took these measurements from sits in a data center on a nice fat backbone to lots of other providers. Thus it could be argued that it may have a much better connection than a home user might. It would be interesting to try these measurements from a normal PC on a “normal” home line.
  • Thus, my point isn’t so much that “Geography doesn’t matter” as it is, “Even if geography works against you, you can still get really good time with NTP.”

Accuracy

I haven’t been paying too much attention to my NTP server, but some e-mails on the list about problems with the monitoring server got me to look at my server’s stats. Apparently the monitor experienced a bit of network turbulence and started considering everyone to be way off.

I was looking a bit at my local NTP stats. The last polling interval stepped my clock forward 5.024ms. This is perhaps more than I’d like; sometimes the offset is less than a millisecond.

So I started wondering what sort of accuracy this was. My server was happy with its sources, so it’s at the longest polling interval: once every 1,024 seconds, it’ll check. Thus, over 1,024 seconds, I lost 5.024ms.

Some quick calculations show that this is a 0.000491% error. I guess I’ll take it.
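
The division itself, for anyone checking my math (bc prints about .00049, which rounds to 0.000491%):

echo "scale=10; 0.005024 * 100 / 1024" | bc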

Geekery

Trying a different style for this post…

#

We held our “Rock Band Night” event tonight. The turnout wasn’t that great, but it’s a long weekend and gorgeous, so we were happy with the people we got. I brought my Xbox VGA cable, so we ran the Xbox into a projector at 1280×1024. We also pulled out an awesome sound system and hooked into that. What made things even more awesome, though, was that we realized that the projector not only has a Computer In, but a Computer Out, which just mirrors the input. So we hooked up a big monitor, and ended up with the band in front of the screen, facing the crowd, as if they were a normal band. That is how you play Rock Band. It was essentially like having a live band performing, minus the actual musical talent. The crowd was also just right, happily listening, periodically singing along, cheering good people and (good-naturedly) heckling those who missed strings of notes.

#

While listening, I spoke with a student who works in the admissions office, and she mentioned that she gets asked surprisingly often about video games at Bentley. We talked a bit about what we do, and then she asked if we had a website. We do, but the URL was long. So on a whim, I picked up bsgo.org.

#

The process of registering a new domain was interesting. It’s been a while since I went ahead with it. Initially, I inadvertently went to register.com, and merrily proceeded through the registration until it presented me with the total and asked for my credit card information. $79?! It was for a few years, but I forgot that they inexplicably charged a lot. I went to GoDaddy, which charges a more sane rate, but was constantly having to uncheck offers I didn’t want. I wanted to register the domain for one year, not several. I wanted to register bsgo.org, not bsgo.biz and bsgo.info and bsgo.tv. Every time I progressed to the next step, there were more offers for me to turn down.

On the flip side, a couple minutes later, it was live.

#

In the process of adding DNS records, I discovered that some of my existing ones seem corrupt / absent. www.ttwagner.com doesn’t resolve to an IP. If you notice any other assorted weirdness, let me know.
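
(For what it’s worth, dig makes this kind of check quick; this just asks for the A record and prints nothing if the name doesn’t resolve:)

dig +short www.ttwagner.com A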

#

We’d talked before about making the webpage more than a barebones site with a couple of pages. Of course, then we get into all sorts of problems with preserving look and feel, and all that happens if we want to update navigation, etc. So I figured this was a great chance to try out SilverStripe, a spiffy-looking CMS. It looks very promising, although it uses some newer features in PHP that require me to update it, which has given me roundabout cause to do a lot of side-projects. (Like working on moving over to the VPS…)

#

I run a mailing list for the club on my machine, using Mailman. It works great, but as I graduate, I want to make sure that they’re not reliant on my server. I intend to keep hosting the list, but I’d hate for critical data to be in the hands of the aging server of someone who doesn’t even go to their school anymore. I wanted to back up the list, but Mailman lacks an “Export list…” feature. (Which annoys me almost enough to want to pick up Python just to add one in?) It turns out that it’s easy, but it took me some poking around.

Mailman, at least on Gentoo, keeps its stuff in /usr/local/mailman. There’s a lists/ folder, with a config.pck that seems to list all the members, as well as all the configuration. This might be good for backing up the list itself, but it’s pretty useless if you just want a list of members to pass on. I figured I could write a script to parse the file and extract the addresses, but I started Googling to see if it had already been done.

And then I found this page talking about it. And it turns out that there’s a tool to do it included with Mailman, in the bin/ folder. For me, then, /usr/local/mailman/bin/list_lists will list all of the mailing lists on the server. In addition to list_lists, there’s list_members [listname], which will do exactly what I wanted: provide a plain-text list of each member. I then redirected the output to my e-mail address….

./list_members BSGO | mail matt@example.com
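
And if you wanted to grab every list on the box in one go, something like this ought to work. (Paths match my Gentoo layout above; I believe list_lists has a --bare flag that prints just the list names, but check --help to be sure.)

cd /usr/local/mailman/bin
mkdir -p ~/mailman-backup
# one plain-text membership file per list
for list in $(./list_lists --bare); do
    ./list_members "$list" > ~/mailman-backup/"$list"-members.txt
done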

#

I’m back in Ubuntu for the first time in a while, and I’ve got it upgrading in the background to Hardy Heron, the latest release of Ubuntu. I’m hoping that Xen will work in Hardy for me, allowing me to stay in Ubuntu permanently: I have too much Windows stuff I need to access. It’s hardly credible without data, but my dad has told me that he did some benchmarking and found that Windows running as a virtual guest on Linux actually outperforms native Windows in cases where you have a VT-capable chip. So I’m not concerned with performance so much as whether it’ll work.

I’ve found that, for whatever reason, I’m just more comfortable in Linux than in Windows. As I was upgrading to the latest distribution, the process seemed to be slowing down, so I pulled up a command line and ran iftop, which showed me a list of my active network connections with a visualization of bandwidth on each connection. And there are little things, like being able to pipe the output of a command straight to an e-mail. This isn’t to say that one platform is “better” than the other, just that I feel more ‘at home’ when I’m on a Linux machine these days.