Iowa

Iowa Caucus today. I’m glued to my computer. But it hasn’t even started.

This article mentions some interesting scenarios. One is that Edwards has been campaigning like mad in Iowa for a long time, so some are suggesting that he might walk away in first place. But Obama’s camp is also expecting a huge turnout: if we can get a flood of young voters to go to the Caucus, Obama’s a shoo-in. The article even mentions that it wouldn’t really be so surprising if Hillary, generally considered the front-runner, came out in third place. The polls have started contradicting each other. One December 30 poll shows Obama winning slightly, another shows Clinton winning slightly. It’s all within the margin of error, and, on top of the margin of error, you have to wonder about who’s going to show up at the Caucus.

Giuliani is playing his cards… strangely. It looks like he’s blowing off Iowa again. He’s behind even Thompson in Iowa. I’m still surprised that Huckabee is doing so well in Iowa. He and Romney are duking it out there. (And, while I have major issues with a candidate who proclaims that he’s going to recapture our nation for Christ, I think I’d favor him over Romney.)

New Hampshire’s a bit interesting, too. Averaging polls, and mixing gut feeling in, it looks like Clinton enjoys a slight lead over Obama, and both of them are out in front of Edwards. But I think Iowa’s going to play a big role. If Edwards does really well in Iowa, that may bring him success in New Hampshire. Of course I’m crossing my fingers for Obama.

The Republican front gets interesting here, because the candidates who are polling favorably here aren’t the same ones in Iowa. Giuliani isn’t doing too well here, either–and I recall a recent article suggesting that, the more he campaigned here, the more his numbers dropped–but he’s doing better than in Iowa. Huckabee here falls tremendously, though, to a mere 9%. The two big guys here are Romney and McCain. McCain was actually leading in the most recent poll, although a poll a few days earlier said the same about Romney.

South Carolina’s being called another bellwether state. They have split primaries: the R’s go the 19th and the D’s go a week later. The South Carolinians show no love for their neighbor to the north, John Edwards, who’s polling at 17% pretty consistently. Here, Obama and Clinton are also neck-and-neck, although what’s interesting is that it looks like Obama has been closing in: in previous surveys he wasn’t nearly as close. The Republican field is quite fragmented: Giuliani, McCain, Romney, and Thompson are all pretty close, while Huckabee enjoys a significant lead with 28% of the vote. I’m thinking that Iowa and New Hampshire might shake things up a bit: Thompson and McCain aren’t looking viable in the first two, so perhaps their supporters will get behind another candidate.

Before South Carolina, though, we have Michigan. They haven’t been getting polled that often, though. It looks like Romney and Huckabee are the two big guys there. We have to go back to November to see Democrat results, but it looks like Clinton has a significant lead in Michigan. And then there’s Florida, where Giuliani leads, with Huckabee and Romney essentially tied for second. Hillary seems to enjoy a significant lead in the Democratic race.

Don’t get too caught up in the need for instant gratification watching who wins, though. The next week is going to be exciting, and then there’s Super Tuesday (or Super Duper Tuesday as it’s now being called), with over 20 states holding primaries the first Tuesday in February. But we can’t just wait until February to know: as the map shows, a sizable number of states have later primaries. Montana and South Dakota are off in the Twilight Zone, holding primaries in June. (Think they’ve had a lot of candidate visits? Then again, think they’re getting a lot of calls?) The DNC is at the end of August, and the RNC is the next week, starting off September.

Big Hosting

I tend to think of web hosting in terms of many sites to a server. And that’s how the majority of sites are hosted–there are multiple sites on this one server, and, if it were run by a hosting company and not owned by me, there’d probably be a couple hundred.

But the other end of the spectrum is a single site that takes up many servers. Most any big site is done this way. Google reportedly has tens of thousands. Any busy site has several, if nothing else for load balancing.

Lately I’ve become somewhat interested in the topic, and found some neat stuff about this realm of servers. A lot of things are done that I didn’t think were possible. While configuring my router, for example, I stumbled across stuff on CARP. I always thought of routers as a single point of failure: if your router goes down, everything behind it goes down. CARP addresses exactly that: in mission-critical setups you run two (or more) routers sharing a virtual address, so a backup can take over if the master dies.

One thing I wondered about was serving up something that had voluminous data. For example, suppose you have a terabyte of data on your website. One technique might be to put a terabyte of drives in every server and do load balancing from there. But putting a terabyte of drives in each machine is expensive, and, frankly, if you’re putting massive storage in one machine, it’s probably huge but slow drives. Another option would be some sort of ‘horizontal partitioning,’ where five (arbitrary) servers each house one-fifth of the data. This reduces the absurdity of trying to stuff a terabyte of storage into each of your servers, but it brings problems of its own. For one, you don’t have any redundancy: if the machine serving sites starting with A-G goes down, all of those sites go down. Plus, you have no idea of how ‘balanced’ it will be. Even if you tried some intricate means of honing which material went where, the optimal layout would be constantly changing.
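To make the partitioning idea concrete, here’s a toy sketch (server names and split points are all invented for illustration): five servers, each owning an alphabetical range of site names.

```python
# Toy 'horizontal partitioning': five servers each own a range of
# site names, split by first letter (a-g, h-n, o-r, s-t, u-z).
import bisect

BOUNDARIES = ["h", "o", "s", "u"]   # first letter starting each later range
SERVERS = ["web1", "web2", "web3", "web4", "web5"]

def server_for(site):
    """Pick the server responsible for a given site name."""
    return SERVERS[bisect.bisect_right(BOUNDARIES, site[0].lower())]
```

The downsides from above are visible immediately: if web1 dies, every A-G site dies with it, and nothing guarantees the five ranges see anywhere near equal traffic.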

Your best bet, really, is to have a bunch of web machines, give them minimal storage (e.g., a 36GB SCSI drive–a 15,000 rpm one!), and have a backend fileserver that has the whole terabyte of data. Viewers would be assigned to any of the webservers (either in a round-robin fashion, or dynamically based on which server was the least busy), which would retrieve the requisite file from the fileserver and present it to the viewer. Of course, this places a huge load on the one fileserver. There’s an implicit assumption that you’re doing caching.

But how do you manage the caching? You’d need some complex code to first check your local cache, and then turn to the fileserver if needed. It’s not that hard to write, but it’s also a pain: rather than a straightforward, “Get the file, execute if it has CGI code, and then serve” process, you need the webserver to do some fancy footwork.
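That fancy footwork might look roughly like this (the cache path is made up, and fetch_from_fileserver is a stand-in for however you’d actually talk to the fileserver):

```python
import os

CACHE_DIR = "/var/cache/webfiles"   # hypothetical local cache location

def fetch_from_fileserver(path):
    """Stand-in for a real fileserver read (NFS, HTTP, whatever)."""
    raise NotImplementedError

def get_file(path, fetch=fetch_from_fileserver):
    """Serve from the local cache, falling back to the fileserver."""
    cached = os.path.join(CACHE_DIR, path.lstrip("/"))
    if os.path.exists(cached):
        with open(cached, "rb") as f:   # cache hit: no fileserver traffic
            return f.read()
    data = fetch(path)                  # cache miss: one trip to the fileserver
    os.makedirs(os.path.dirname(cached), exist_ok=True)
    with open(cached, "wb") as f:
        f.write(data)
    return data
```

And real code would also need cache expiry and size limits, which is exactly the bookkeeping that makes rolling your own a pain.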

Enter Coda. No, not the awesome web-design GUI, but the distributed filesystem. In a nutshell, you have a server (or multiple servers!) and each client mounts a partition called /coda, which refers to the network. But it’ll cache files as needed. This is massively oversimplifying things: the actual use is to allow you to, say, bring your laptop into the office, work on files on the fileserver, and then, at the end of the day, seamlessly take them home with you, without having to worry about where the files physically reside. So running it just for the caching is practically a walk in the park: you don’t have complicated revision conflicts or anything of the sort. Another awesome feature of Coda is that, by design, it’s pretty resilient: part of the goal with caching and all was to pretty gracefully handle the fileserver going offline. So really, the more popular files would be cached by each node, with only cache misses hitting the fileserver. I also read an awesome anecdote about people running multiple Coda servers. When a disk fails, they just throw in a blank. You don’t need RAID, because the data’s redundant across other servers. With the new disk, you simply have it rebuild the missing files from the other servers.

There’s also Lustre, which was apparently inspired by Coda. They focus on insane scalability, and it’s apparently used in some of the world’s biggest supercomputer clusters. I don’t yet know enough about it, really, but one thing that strikes me as awesome is the concept of “striping” across multiple nodes with the files you want.

The Linux HA project is interesting, too. There’s a lot of stuff that you don’t think about. One is load balancer redundancy… Of course you’d want it, but if you switched over to your backup router, all existing connections would be dropped. So they keep a UDP data stream going, where the master keeps the spare(s) in the loop on connection states. Suddenly having a new router or load balancer can also confuse the network, so if the master goes down, the spare comes up and starts spoofing its MAC and IP to match the node that went down. There’s a tool called heartbeat, whereby standby servers ping the master to see if it’s up. It apparently has some complex inner workings, and they recommend a serial link between the nodes so you’re not dependent on the network. (Granted, if the network to the routers goes down, it really doesn’t matter, but having them quarreling over who’s master will only complicate attempts to bring things back up!)
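The core of the standby’s loop boils down to something like this sketch (the interval and miss limit are invented, and the real heartbeat tool is far more involved):

```python
import time

MISS_LIMIT = 3    # consecutive missed heartbeats before declaring the master dead
INTERVAL = 1.0    # seconds between checks (made-up value)

def monitor(ping_master, take_over, sleep=time.sleep):
    """Standby loop: ping the master; after MISS_LIMIT misses, fail over."""
    misses = 0
    while True:
        if ping_master():
            misses = 0              # master answered; reset the counter
        else:
            misses += 1
            if misses >= MISS_LIMIT:
                take_over()         # e.g., assume the master's MAC and IP
                return
        sleep(INTERVAL)
```

Requiring several consecutive misses is what keeps one dropped packet from triggering a spurious (and disruptive) failover.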

And there are lots of intricacies I hadn’t considered. It’s sometimes complicated to tell whether a node is down or not. But it turns out that a node in an ambiguous state is often a horrible state of affairs: if it’s down and not pulled out of the pool, lots of people will get errors. And if other nodes are detecting oddities but it’s not down, something is awry with the server. There’s a concept called fencing that I’d never heard of, whereby the ‘quirky’ server is essentially shut out by its peers to prevent it from screwing things up (not only may it run away with shared resources, but the last thing you want is a misbehaving service trying to modify your files). The ultimate example of this is STONITH, which sounds like a fancy technical term (and, by definition, now is a technical term, I suppose), but really stands for “Shoot the Other Node in the Head.” From what I gather from the (odd) description, the basic premise is that if members of a cluster suspect that one of their peers is down, they “make it so” by calling external triggers to pull the node out of the network (often, seemingly, by just rebooting the server).

I don’t think anyone is going to set up high-performance server clusters based on what someone borderline-delirious blogged at 1:40 in the morning because he couldn’t sleep, but I thought someone else might find this venture into what was, for me, new territory, to be interesting.

Geekostat

Disclaimer: I can tell right now that this is one of those late-night posts where I should be sleeping, not posting about a technical topic. But these not-entirely-lucid ones are sometimes the most fun to read.

I consider myself extremely tech-savvy. I can build a computer from parts, make my own Ethernet cables, run some performance tuning on interactive websites, write applications in numerous programming languages (as well as SQL and HTML), and much more.

But I still don’t get our digital thermostat. It’s programmed to go down to 58 at night, and to come up to 67 on weekends, and on weekdays from something like 6 to 9 a.m. and 3 to 9 p.m. In other words, when people are home.

Of course, me being home on vacation isn’t quite compatible with this. There’s a simple override, where you can hit the up or down arrows to set it to a temperature. While I use (and appreciate!) this, it’s also a pain. It’s really no fun waking up and having it be 58. I’d really like to reprogram it to automatically come up to 63 or so around 10:30.

I still don’t get why the whole thing isn’t on the LAN. This would have two obvious benefits right out of the gate–it’d be much easier to configure (even if you let someone with no clue about usability design the GUI, it’ll be better than the myriad knobs, switches, and buttons on our thermostat!), and it’d be more convenient in many cases to pull up a new tab in your web browser than to walk down the hall to the thermostat. (Plus, the thermostat is in my parents’ bedroom. I’d have loved to have turned the heat up a few degrees around 11 tonight, since it’s 9 outside and almost as cold inside. But something tells me they really wouldn’t have appreciated it.)

I’m also not sure that the ‘simple’ thermostat algorithm is that efficient. You figure it works something like:

while (1) {
    $temp    = getTemperature();
    $desired = readDial();
    if ($temp < $desired) { enableFurnace(); }
    if ($temp > $desired) { disableFurnace(); }
}

When we view it at ‘computer speed,’ I think we can see one of the basic problems: in theory, the furnace could start flapping, where on one loop iteration it turns the furnace on, and just a fraction of a second later, it turns it off. I don’t profess to know a lot about the overhead in starting a furnace, but I’d imagine that it’s most efficient to let it run for a few minutes.

I think a much better system would be to have a programmed minimum run time: if the furnace is turned on, we should run it for at least 5 minutes. After 5 minutes, we again evaluate the temperature: if it’s at the target, we turn it off. If not, we drop into quicker polling, maybe once every minute. Incidentally, polling at intervals instead of looping constantly is also much easier on the thermostat’s processor, though if its sole purpose is deciding whether to turn something on or off, no one really cares about minimizing overhead.
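Sketched as code (all the timings are my guesses, nothing more):

```python
import time

MIN_RUN = 5 * 60     # minimum furnace run time, in seconds
FAST_POLL = 60       # poll faster while the furnace is running
SLOW_POLL = 30       # idle polling interval (all values are guesses)

def control_loop(get_temp, get_target, furnace_on, furnace_off, sleep=time.sleep):
    """Thermostat loop with a minimum run time, per the scheme above."""
    running = False
    while True:
        temp, target = get_temp(), get_target()
        if not running and temp < target:
            furnace_on()
            running = True
            sleep(MIN_RUN)            # commit to the full minimum run
            continue
        if running:
            if temp >= target:        # reached the target: shut down
                furnace_off()
                running = False
            sleep(FAST_POLL)
        else:
            sleep(SLOW_POLL)
```

The minimum run is what prevents the flapping problem: once the furnace fires, nothing can turn it off for five minutes.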

So you give it a secondary purpose: handling a TCP/IP stack and a basic webserver! All of a sudden, instead of an infinite loop, you run a tiny bit of code every 30 seconds.

You can also generate some interesting statistics. For example, how long does the furnace need to run to raise the temperature one degree? How does this scale–if you want to raise it three degrees, does it take three times as long? How does the temperature of my house look when graphed across a day? How about telling me how long the furnace ran yesterday? And, given information about my furnace’s oil consumption and our fuel costs, it’d be cool to see how much it’s costing. And it could give us suggestions: “If you drop the temperature from 68 to 67, you’ll save $13.50 a month,” or such. This would require some storage, but a gig of solid-state media (e.g., a camera’s SD or CF card) is around $10-20 now. Plus, with the advent of AJAX, you can push some of the processing off to the client–let the client use a Flash applet or some good Javascript to draw the graphs if the thermostat is underpowered!
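The cost suggestion is really just multiplication once you have the runtime data; every number below is invented purely to show the arithmetic:

```python
def monthly_cost(run_hours_per_day, gallons_per_hour, price_per_gallon, days=30):
    """Estimated monthly heating-oil cost for a given furnace duty cycle."""
    return run_hours_per_day * gallons_per_hour * price_per_gallon * days

# Suppose dropping from 68 to 67 shaves half an hour off a 6-hour daily
# runtime (entirely invented numbers):
before = monthly_cost(6.0, 0.75, 3.00)   # 6 h/day at 0.75 gal/h, $3/gal
after = monthly_cost(5.5, 0.75, 3.00)
savings = before - after                 # what the thermostat would report
```

The hard part, of course, is measuring how the runtime actually changes per degree; the dollar figure falls out for free.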

In conclusion, I’m freezing.

Idea

Why isn’t there a really good “network appliance” as a network gateway? You can get a low-end firewall/router, or you can build your own machine.

Setting up OpenBSD is no walk in the park, though. I want to build an “appliance” based on OpenBSD, and give it a nice spiffy web GUI. You buy the box, plug one side into your switch and one side into your cable modem or whatnot, and spend ten minutes in a web browser fine-tuning it. I was really fond of the appearance of the Cobalt Qube, although it could be made much smaller. And throw a nice LCD on the front with status. You can run a very low-power CPU, something like the one powering these. It really doesn’t need more than 512MB RAM, but give it a small solid-state drive. And a pair of Gigabit cards, not just for the speed, but because GigE cards usually are much higher-quality. In building routers, the quality of your card determines how hard the CPU has to work.

There’s so much that a router can do. You can run a transparent caching proxy, a caching DNS server, priority-based queuing of outgoing traffic (such as prioritizing ACKs so downloads don’t suffer because of uploads, or giving priority to time-sensitive materials such as games), NAT, an internal DHCP server, and, of course, a killer firewall. You can also generate great graphs of things such as bandwidth use, blocked packets, packet loss, latency… You can regulate network access per-IP or per-MAC, and do any sort of filtering you want. It could also easily integrate with a wireless network (maybe throw a wireless card in, too!), serving as an access point and enabling features like permitting only certain MACs to connect, requiring authentication, or letting anyone in but requiring that they sign up in some form (a captive portal). And I really don’t understand why worms and viruses spread so well. It’s trivial to block most of them at the network level if you really monitor incoming traffic.

I’m frankly kind of surprised that nothing of this level exists. I think there’s a definite market for quality routers. A $19 router does the job okay, but once you start to max out your connection, you’ll really notice the difference! A good router starts prioritizing traffic, so your ssh connection doesn’t drop and your game doesn’t lag out, but your webpages might load a little slower. An average router doesn’t do anything in particular and just starts dropping packets all over the place, leaving no one better off. (And a really bad router–our old one–seems to deal with a fully-saturated line not by dropping excess packets or using priority queueing, but by rebooting itself, leaving everyone worse off… I think this may have had to do with the duct tape.)

High Dynamic Range

I’d been seeing a lot about HDR, or High Dynamic Range, photography. In layman’s terms, the dynamic range of a camera is the range from the darkest to the lightest parts a camera can record in one shot. The problem is that the dynamic range of cameras doesn’t match real life that often.

Long ago, photographers found a halfway decent solution: graduated filters. Basically, you stick a filter in front of the lens, with part of it darker than the rest. It’s great if, say, you want a shot at the beach with both foreground detail and the sky properly exposed.

With computers, though, there’s another approach. You take a series of bracketed shots: one or two exposed for the sky, one or two for the foreground, etc. Some people have been known to stitch together close to a dozen. Having a tripod helps tremendously here, since the images need to be pretty much exactly the same besides exposure.

Strictly, an HDR image holds more dynamic range than a monitor can display, so a technique called tone mapping is used to compress it into something viewable. The basic premise is to take the “good” parts of each shot in a bracketed series and stitch them together. Photoshop CS2 and newer have an HDR utility, though I’ve been pretty unimpressed with the results. Today I started playing around with an Open Source tool called Qtpfsgui. It’s even cross-platform! It supports multiple tone-mapping algorithms, too.
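The “good parts of each shot” idea can be sketched per-pixel: weight each exposure by how close it is to mid-gray, then blend. (This is really exposure fusion, a cousin of true HDR-plus-tone-mapping, and the weighting function below is just one simple choice for illustration.)

```python
import math

def well_exposedness(v, mid=0.5, sigma=0.2):
    """Weight pixel values near mid-gray highest (values normalized 0..1)."""
    return math.exp(-((v - mid) ** 2) / (2 * sigma ** 2))

def fuse(pixels):
    """Blend one pixel's values from a bracketed series by exposedness."""
    weights = [well_exposedness(v) for v in pixels]
    total = sum(weights) or 1.0
    return sum(w * v for w, v in zip(weights, pixels)) / total
```

A blown-out highlight or a near-black shadow gets almost no weight, so the blend leans on whichever frame captured that spot decently.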

Overall, I’m still not that happy with the results, but it’s a start. Here’s a ‘normal’ shot of the beach, taken on Cape Cod yesterday:

[Photo: “Beach,” by n1zyy, on Flickr]

You’ll note that the foreground (e.g., the bench) is too dark, yet the sky is too light. It’s a good illustration of insufficient dynamic range.

Luckily, I knew in the back of my head that I wanted to try my hand at HDR photography, so I saw it as an opportunity. I set my camera to meter -2 to +2 EV, to try to cover the full range. The end product:

[Photo: “Fattal Algorithm,” by n1zyy, on Flickr]

It displays a very common pet peeve of mine with HDR photos: it looks entirely unrealistic. Absurd, even. I think part of it is that it’s just overdone, with the contrast jacked way up. I want to play around with it more and see if I can get a more natural product. So far, no luck. But, at least in a technical sense, it’s an improvement over the first image.

I’d like to see HDR come a little further, so that HDR photos don’t have the same, “Whoa!” quality that a scary old lady with way too much makeup has. I don’t think the limitations are entirely technical at this point, either.

Geek

We’ve been having a lot of intermittent network problems at home. Periodically, our Internet cuts out. At first I assumed it was our ISP–it’s no longer Adelphia (run by pharmacists), though–but subsequent research indicated that it wasn’t our ISP’s fault: our router was going down.

My dad set it all up, so I wasn’t too sure how things went. I was pretty confident that we were just using a generic store-bought broadband router, though, so I found it strange that it would be drifting in and out. It turns out that I overlooked something about the router: it’s being held together with duct tape.

I’d already been intrigued by OpenBSD’s pf, so this seemed like a sign! I commissioned an old desktop system, loaded OpenBSD up on it, and went to work configuring it. OpenBSD was just more different from Linux than I expected. It asks you if you want to let OpenBSD use the whole hard drive. I said yes, and thought, “Wow, this is just as easy as Ubuntu!” But it turns out that this was just the first stage. After this, you have to set up the “disklabel,” which divides the disk into something like partitions, but ambiguously different. The syntax is obscure, the purpose is obscure, and so forth. Then I had to configure the network. NICs are named for the drivers they use, so instead of eth0 and eth1 (for Ethernet), I have rl0 (Realtek) and dc0 (who knows).

I was also extremely confused trying to set up routing. Long-term, it was going to be the router, but short-term, it needs to know about our existing router so that it can connect and download the requisite packages.

So I finally got it all set up. I also installed MySQL (unnecessarily, it turns out), Apache, and PFW, a web-based configuration tool for pf. I ended up not using PFW, because my understanding of pf is so bad that I’m basically relegated to copying-and-pasting rules from websites into the configuration file.

Even using pf is confusing. It’s called pf, but typing “pf” at the command line doesn’t do anything. It turns out that you control it with a tool called “pfctl.” You can do pfctl -e to enable pf, and pfctl -d to disable it.

As I tried to tweak the firewall/routing rules, I’d periodically “restart” pf by disabling and then re-enabling it. I wasn’t sure if it read the rules “live” or if a restart was needed. It turns out… neither! The rules are stored in memory, and restarting pf doesn’t flush them. You need to pass pfctl extra arguments to flush the old ruleset and load the rules anew from the configuration file.
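For reference, the incantations I eventually sorted out (as best I understand pfctl):

```shell
pfctl -e                # enable pf
pfctl -d                # disable pf (the two together do NOT reload rules)
pfctl -f /etc/pf.conf   # load the ruleset from the config file
pfctl -F rules          # flush the currently loaded filter rules
pfctl -s rules          # show the rules currently in memory
```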

After a few more hours of work, I thought it was all set up. Both NICs were configured, the external one to get an IP over DHCP, and the internal one with a low fixed IP. I had a complex set of rules, doing NAT, filtering traffic, and using HFSC for prioritized queueing. (HFSC seems completely undocumented, by the way. I took my tips from random websites.) It seemed very impressive: I prioritized ACKs so that downloads wouldn’t suffer if our outbound link was saturated. (Aside: it really doesn’t make sense to do queueing on incoming traffic, since the bottleneck is our Internet link, not our 100 Mbps LAN.)  I also afforded DNS, ssh, and video game traffic high priorities, but allocated them a lower percentage of traffic. I even figured out the default BitTorrent ports and gave them exceptionally low priority: if our line is fully saturated, the last thing I care about is sharing unnecessary data with other people.

And there are other neat features. It “scrubs” incoming connections, reassembling fragmented packets and just eliminating crap that doesn’t make sense. It catches egregious “spoofing” attempts and discards them.
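A minimal pf.conf illustrating those pieces might look something like this (the interface names are from my setup, and the rules are a simplified sketch rather than my actual config):

```
ext_if = "rl0"
int_if = "dc0"

scrub in all                            # reassemble fragments, drop nonsense
nat on $ext_if from $int_if:network to any -> ($ext_if)
antispoof quick for { lo0 $int_if }     # discard obvious spoofing

block in on $ext_if all
pass out on $ext_if all keep state
pass in on $int_if all
```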

I hooked up the second LAN connection to test it out, rebooted, and… waited.

It never came up. Well, it did come up. The computer’s running fine. Both network cards show up with the switch. But doing an nmap probe of our LAN, I see one strange entry. It’s actually pretty mysterious: it has no open ports, and attempting to ssh into it just sits there. It doesn’t send a “connection refused”; it completely ignores the incoming packets, leaving my poor ssh client waiting for a reply with no clue what’s going on.

In a nutshell, it seems that I just built a firewall/router that’s so secure that I can only find one of its two cards on the network, and I can’t even try to log into it. Let’s see you hack that! Of course, this does have some issues. For example, I can’t use it.

I haven’t lost hope yet: I have a keyboard and monitor so I can log in on the console and try to do some tweaking there. (You can’t firewall off the keyboard.) It’s just not very encouraging to think, “Alright, let’s reboot and make sure it works as flawlessly as I think it will” and then have the darned thing not even show up on the network.

…and a Happy New Year!

(Okay, it works in chronological order, but I display newest on top… So just pretend my title complements Kyle’s.)

I wasn’t planning on blogging about my Christmas presents, but Kyle did and I decided to save some stuff for a new post.

I was much relieved when the former bishop of Turkey brought a Rebel XTi to replace my 10D (RIP, buddy; I loved you for the short time I knew thee). Although it’s technically a lower “class” of camera, the XTi is really an upgrade to the 10D in all ways except size and weight, so I’m quite pleased. (I “lost” ISO 3200, but it was so noisy that I don’t miss it.) Not only does it have higher resolution (and a bigger LCD!), but Canon introduced an awesome new feature: an ultrasonic “duster” for the sensor that runs every time you turn the camera on or off. It’s too soon to tell, but it seems pretty effective at making sensor dust a problem of the past.

Along with it was a 50mm f/1.8 lens… I was a bit concerned at first, because it’s an effective 80mm with the FoV crop, but it’s turned out to still be an ideal length. The f/1.8 aperture affords me two great abilities: taking pictures in comparatively dark places without relying on flash, and throwing the background way out of focus, achieving “bokeh,” a fabulous effect.

[Photo: “Holly,” by n1zyy, on Flickr]

I should note that, in the past few days, I re-shuffled things on my computer, re-installing Ubuntu on a clean partition and getting Compiz working. I’m hoping to use Xen to run my Windows installation, but I haven’t gotten Xen and my desktop environment to play nicely yet. I backed up my 500 GB “backup” drive, reformatted and repartitioned it (in a sane manner this time), and then moved everything back onto it in a more organized manner. I also set up an old stereo I had almost forgotten I owned. So it was practically Christmas even before today.

We also got a Wii for the family, along with Guitar Hero 3. Trying to get my parents to use it, I realized just how steep the learning curve is: they’ve probably sunk a couple hours into practice and are just now finishing songs. It was the same way for me, too, just a long time ago. In a way, I kind of wonder why people bother: if you spend half an hour and get nothing but the crowd booing you, it’s really not encouraging to keep going. GH3 on the Wii is interesting–you snap a Wiimote into the back and use that, making it a wireless guitar. (Woot!) As an added bonus, the sounds when you mess up come out of the controller and not the TV, which would be very helpful in multiplayer mode.

[Photo: “Guitar Hero 3 for the Wii,” by n1zyy, on Flickr]

The Wiimotes now ship with this silly-looking “skin” for the controllers. I’m not sure whether it’s to protect the controllers (which practically explode if they get flung into a cinder block wall) or to protect people (who, presumably, do not like being hit in the head with game controllers), but it’s probably a good idea either way… They just look a bit goofy, is all.

[Photo: “New Wiimotes,” by n1zyy, on Flickr]

I also got some great books… I’ve started several, and am having a hard time deciding whether I should keep up status quo (reading a chapter or two from one and then coming back and picking up another book and continuing that), or read them sequentially. Current must-reads on my nightstand* include my (signed!) copy of The First Campaign by Garrett Graff, an expert on blogging and politics; Tim Ferriss’ The Four-Hour Workweek (pre-review: the little bit I’ve read is fascinating, but between the book and his website, I can’t help but pick up on a bit of ego?); “Why Are All the Black Kids Sitting Together in the Cafeteria,” an interesting (or so it looks; I haven’t gotten far in yet) look at race relations in America by Spelman College President Beverly Daniel Tatum; and Naked Economics, a thin paperback “Undressing the Dismal Science” by Charles Wheelan: it looks like the type of book I wish I’d had when I was taking Economics.

I also received a nice vacuum. Ordinarily, I wouldn’t be too excited about a vacuum cleaner. Think of the “Oh, it’s more clothes?!” disappointment you felt as a child; I think that’s how most people would feel upon receiving a vacuum cleaner, especially college-aged guys. But you should see the floors in my dorm room… Our floor at school gets vacuumed about once a month; it needs to be vacuumed about thrice a week. So it’s going to be a huge improvement, and the vacuum was actually among my favorite gifts this year.

  • Full disclosure: I don’t actually have a nightstand, but I didn’t think it was too egregious a lie to gloss over the fact that the books are actually split between my desk and the side of my bed. But in case anyone wants to accuse me, there it is: I don’t have a nightstand.

Advice

I learned two valuable lessons today:

  • Don’t ever create a 500GB FAT partition. No matter how good of an idea it seems, don’t do it. (Not terribly different is the advice, “Don’t ever create one big 500GB partition.”)
  • Mounting a filesystem as “msdos” is not the same as mounting it as “vfat” in Linux. msdos is still constrained by the 8.3 naming system; vfat is not. Unless the disk was literally written with MS-DOS, don’t use msdos. It’ll work okay, but boy are you screwing yourself if you make backups with it mounted as msdos: every long filename gets truncated. (Fortunately, I realized this before wiping the drive.)
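The difference in practice (the device and mount point here are just examples):

```shell
mount -t msdos /dev/sdb1 /mnt/backup   # 8.3 names only: LONGFI~1.TXT
umount /mnt/backup
mount -t vfat /dev/sdb1 /mnt/backup    # long filenames preserved
```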

An Uncontrollable Urge

A few years ago Andy and I ran a hosting company. It never got that far, but it was fun, and also a learning experience.  Today I’m finding that I can’t get the idea of starting it again out of my head. The problem is that, this time, I’d want to start it big.

There are a bunch of technologies that I find downright exciting:

  • Old racks full of blade servers are hitting the used market. And by “old” I mean dual 2-3 GHz Xeons, a gig or two of RAM, and hard drives that still rival what hosts are renting in dedicated servers. I’d probably want to put in new drives, but the machines are cheap and they’re plentiful.
  • Boston has a number of good data centers, and all the big Tier 1 providers are here. That there seem to be no well-known hosting companies out here is frankly kind of surprising. You have no idea how badly I want to pick up a couple racks in a colocation facility, and pull in a couple 100 Mbps lines.
  • cPanel looks like it’s matured a lot since I last used it, and it has some good third-party stuff such as script installers. It looks like it remains the number one choice in virtual hosting.
  • Xen is downright exciting. It permits splitting a physical host into multiple virtual machines. With the advent of chips with hardware virtualization support from both AMD and Intel, it now runs with very little overhead. It used to require extensive modifications to the “guest” OS, so that only modified versions of Linux worked. With newer processors, though, you’re able to run machines without them having to know they’re in a virtual machine, opening up options. You can run Windows now. The virtual dedicated server / virtual private server market is growing. (Xen also supports moving hosts between physical servers, which has a lot of nice applications, too!)
  • OpenBSD’s firewall, pf, continues to intrigue me for its power. I just found PFW, a really spiffy web GUI for managing pf. Not only does it do basic firewall stuff, but it’s got support for prioritization of traffic / QoS, and for load balancing. I’m probably just scratching the surface.
  • I’ve spent years honing my admin skills and improving server performance. Improved performance on a shared server, of course, means more clients per server, or more money.

I’m wholly convinced I should start a Boston hosting company. I just need $100,000 capital or so. (Santa, do you read my blog? Do you fund businesses? I’ll give you partial equity.)