Big Hosting

I tend to think of web hosting in terms of many sites to a server. And that’s how the majority of sites are hosted–there are multiple sites on this one server, and, if it were run by a hosting company and not owned by me, there’d probably be a couple hundred.

But the other end of the spectrum is a single site that takes up many servers. Most any big site is done this way. Google reportedly has tens of thousands. Any busy site has several, if nothing else to do load-balancing.

Lately I’ve become somewhat interested in the topic, and found some neat stuff about this realm of servers. A lot of things are done that I didn’t think were possible. While configuring my router, for example, I stumbled across stuff on CARP. I always thought of routers as a single point of failure: if your router goes down, everything behind it goes down. So you have two (or more) routers in mission-critical setups.

One thing I wondered about was serving up something that had voluminous data. For example, suppose you have a terabyte of data on your website. One technique might be to put a terabyte of drives in every server and do load balancing from there. But putting a terabyte of drives in each machine is expensive, and, frankly, if you’re putting massive storage in one machine, it’s probably huge but slow drives. Another option would be some sort of ‘horizontal partitioning,’ where five (arbitrary) servers each house one-fifth of the data. This reduces the absurdity of trying to stuff a terabyte of storage into each of your servers, but it brings problems of its own. For one, you don’t have any redundancy: if the machine serving sites starting with A-G goes down, all of those sites go down. Plus, you have no idea of how ‘balanced’ it will be. Even if you tried some intricate means of honing which material went where, the optimal layout would be constantly changing.

Your best bet, really, is to have a bunch of web machines, give them minimal storage (e.g., a 36GB SCSI drive–a 15,000 rpm one!), and have a backend fileserver that has the whole terabyte of data. Viewers would be assigned to any of the webservers (either in a round-robin fashion, or dynamically based on which server was the least busy), which would retrieve the requisite file from the fileserver and present it to the viewer. Of course, this places a huge load on the one fileserver. There’s an implicit assumption that you’re doing caching.

But how do you manage the caching? You’d need some complex code to first check your local cache, and then turn to the fileserver if needed. It’s not that hard to write, but it’s also a pain: rather than a straightforward, “Get the file, execute if it has CGI code, and then serve” process, you need the webserver to do some fancy footwork.

Enter Coda. No, not the awesome web-design GUI, but the distributed filesystem. In a nutshell, you have a server (or multiple servers!) and they each mount a partition called /coda, which refers to the network. But, it’ll cache files as needed. This is massively oversimplifying things: the actual use is to allow you to, say, bring your laptop into the office, work on files on the fileserver, and then, at the end of the day, seamlessly take it home with you to work from home, without having to worry about where the files physically reside. So running it just for the caching is practically a walk in the park: you don’t have complicated revision conflicts or anything of the sort. Another awesome feature about Coda is that, by design, it’s pretty resilient: part of the goal with caching and all was to pretty gracefully handle the fileserver going offline. So really, the more popular files would be cached by each node, with only cache misses hitting the fileserver. I also read an awesome anecdote about people running multiple Coda servers. When a disk fails, they just throw in a blank. You don’t need RAID, because the data’s redundant across other servers. With the new disk, you simply have it rebuild the missing files from other servers.

There’s also Lustre, which was apparently inspired by Coda. They focus on insane scalability, and it’s apparently used in some of the world’s biggest supercomputer clusters. I don’t yet know enough about it, really, but one thing that strikes me as awesome is the concept of “striping” across multiple nodes with the files you want.

The Linux HA project is interesting, too. There’s a lot of stuff that you don’t think about. One is load balancer redundancy… Of course you’d want to do it, but if you switched over to your backup router, all existing connections would be dropped. So they keep a UDP data stream going, where the master keeps the spare(s) in the loop on connection states. Suddenly having a new router or load balancer can also be confusing on the network. So if the master goes down, the spare will come up and just start spoofing its MAC and IP to match the node that went down. There’s a tool called heartbeat, whereby standby servers ping the master to see if it’s up. It’s apparently actually got some complex workings, and they recommend a serial link between the nodes so you’re not dependent on the network. (Granted, if the network to the routers goes down, it really doesn’t matter, but having them quarreling over who’s master will only complicate attempts to bring things back up!)

And there are lots of intricacies I hadn’t considered. It’s sometimes complicated to tell whether a node is down or not. But it turns out that a node in ambiguous state is often a horrible state of affairs: if it’s down and not pulled out of the pool, lots of people will get errors. And if other nodes are detecting oddities but it’s not down, something is awry with the server. There’s a concept called fencing I’d never heard, whereby the ‘quirky’ server is essentially shut out by its peers to prevent it from screwing things up (not only may it run away with shared resources, but the last thing you want is a service acting strangely to try to modify your files). The ultimate example of this is STONITH, which sounds like a fancy technical term (and, by definition, now is a technical term, I suppose), but really stands for “Shoot the Other Node in the Head.” From what I gather from the (odd) description, the basic premise is that if members of a cluster suspect that one of their peers is down, they “make it so” by calling external triggers to pull the node out of the network (often, seemingly, to just reboot the server).

I don’t think anyone is going to set up high-performance server clusters based on what someone borderline-delirious blogged at 1:40 in the morning because he couldn’t sleep, but I thought someone else might find this venture into what was, for me, new territory, to be interesting.

Idea

Why isn’t there a really good “network appliance” as a network gateway? You can get a low-end firewall/router, or you can build your own machine.

Setting up OpenBSD is no walk in the park, though. I want to build an “appliance” based on OpenBSD, and give it a nice spiffy web GUI. You buy the box, plug one side into your switch and one side into your cable modem or whatnot, and spend ten minutes in a web browser fine-tuning it. I was really fond of the appearance of the Cobalt Qube, although it could be made much smaller. And throw a nice LCD on the front with status. You can run a very low-power CPU, something like the one powering these. It really doesn’t need more than 512MB RAM, but give it a small solid-state drive. And a pair of Gigabit cards, not just for the speed, but because GigE cards usually are much higher-quality. In building routers, the quality of your card determines how hard the CPU has to work.

There’s so much that a router can do. You can run a transparent caching proxy, a caching DNS server, priority-based queuing of outgoing traffic (such as prioritizing ACKs so downloads don’t suffer because of uploads, or giving priority to time-sensitive materials such as games), NAT, an internal DHCP server, and, of course, a killer firewall. You can also generate great graphs of things such as bandwidth use, blocked packets, packet loss, latency…You can regulate network access per-IP or per-MAC, and do any sort of filtering you wanted. It could also easily integrate with a wireless network (maybe throw a wireless card in, too!), serving as an access point and enabling features like permitting only certain MACs to connect, requiring authentication, or letting anyone in but requiring that they sign up in some form (a captive portal). And I really don’t understand why worms and viruses spread so well. It’s trivial to block most of them at the network level if you really monitor incoming traffic.

I’m frankly kind of surprised that nothing of this level exists. I think there’s a definite market for quality routers. A $19 router does the job okay, but once you start to max out your connection, you’ll really notice the difference! A good router starts prioritizing traffic, so your ssh connection doesn’t drop and your game doesn’t lag out, but your webpages might load a little slower. An average router doesn’t do anything in particular and just starts dropping packets all over the place, leaving no one better off. (And a really bad router–our old one–seems to deal with a fully-saturated line not by dropping excess packets or using priority queueing, but by reboot itself, leaving everyone worse off… I think this may have had to do with the duct tape.)

Custom LogFormat with Apache

Posting this in the hopes that it’ll help someone at some point….

Using Apache (Apache2 in my case, but I’m not sure it matters), you can customize the format for log files like access_log. Apache has a good page describing the variables you can use. But it doesn’t tell you everything you need to know!

The first question is where you put it… You can just specify it in httpd.conf (I put it near the end, but I don’t think its placement matters terribly, as long as it’s not in the middle of a section. It doesn’t go in any directives or anything. You can also insert it inside a VirtualHost directive if you only want it to apply to those. (Don’t put it inside a Directory directive!)

The second thing is something that’s not really specified anywhere: specifying a LogFormat without then specifying a CustomLog directive accomplishes nothing! I wanted to keep Apache logging in the default directory (/var/log/apache2/access_log on Gentoo), so I just set the LogFormat to something I wanted. And nothing happened.

You specify the format in CustomLog as well, so it’s handy to use LogFormat to assign a “nickname”:

LogFormat "%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"" n1zyy
CustomLog /var/log/apache2/access_log n1zyy

The first line sets the “n1zyy” ‘nickname’ to refer to to the format I specify. The next line sets a “custom” log file (in this case, it’s the same as the default, but I digress. It won’t work if I don’t specify it.) Then I tell it to use the format named “n1zyy.”

Once this is set up, you want to reload Apache, since it won’t notice your changes until you do.

High Dynamic Range

I’d been seeing a lot about HDR, or High Dynamic Range, photography. In layman’s terms, the dynamic range of a camera is the range from the darkest to the lightest parts a camera can record in one shot. The problem is that the dynamic range of cameras doesn’t match real life that often.

Long ago, photographers found a halfway decent solution: graduated filters. Basically, you stick a filter in front of the lens, with part of it darker than the rest. It’s great if, say, you want to take a great picture at the beach with both foreground detail and the sky properly exposed.

With computers, though, there’s been another photo. You take a series of bracketed shots: one or two for the sky, one or two for the foreground, etc. Some people have been known to stitch together close to a dozen. Having a tripod helps tremendously here, since the images need to be pretty much exactly the same besides exposure.

Strictly, HDR requires more than a monitor can really display, but a technique called tone mapping is often used. The basic premise is to take the “good” parts of each shot in a bracketed series and stitch them together. Photoshop CS2 and newer has an HDR utility, though I’ve been pretty unimpressed with the results. Today I started playing around with an Open Source tool called Qtpfsgui. It’s even cross-platform! It supports multiple algorithms for doing tone mapping, too.

Overall, I’m still not that happy with the results, but it’s a start. Here’s a ‘normal’ shot of the beach, taken on Cape Cod yesterday:

title=”Beach by n1zyy, on Flickr”>Beach

You’ll note that the foreground (e.g., the bench) is too dark, yet the sky is too light. It’s a good illustration of insufficient dynamic range.

Luckily, I knew in the back of my head that I wanted to try my hand at HDR photography, so I saw it as an opportunity. I set my camera to meter -2 to +2 EV, to try to cover the full range. The end product:

title=”Fattal Algorithm by n1zyy, on Flickr”>Fattal Algorithm

It displays a very common pet peeve of mine with HDR photos: it looks entirely unrealistic. Absurd, even. I think part of it’s that it’s just overdone, and that the contrast is jacked way up. I want to play around with it more and see if I can get a more natural product. So far, no luck. But, at least in a technical sense, it’s an improvement over the first image.

I’d like to see HDR come a little further, so that HDR photos don’t have the same, “Whoa!” quality that a scary old lady with way too much makeup has. I don’t think the limitations are entirely technical at this point, either.

Geek

We’ve been having a lot of intermittent network problems at home. Periodically, our Internet cuts out. At first I assumed it was our ISP–it’s no longer Adelphia (run by pharmacists), though–but subsequent research indicated that it wasn’t our ISP’s fault: our router was going down.

My dad set it all up, so I wasn’t too sure how things went. I was pretty confident that we were just using a generic store-bought broadband router, though, so I found it strange that it would be drifting in and out. It turns out that I overlooked something about the router: it’s being held together with duct tape.

I’d already been intrigued by OpenBSD’s pf, so this seemed like a sign! I commissioned an old desktop system, loaded OpenBSD up on it, and went to work configuring it. OpenBSD was just more different from Linux than I expected. It asks you if you want to let OpenBSD use the whole hard drive. I said yes, and thought, “Wow, this is just as easy as Ubuntu!” But it turns out that this was just the first stage. After this, you have to set “disk labels,” which are sort of like partitions but ambiguously different. The syntax is obscure, the purpose is obscure, and so forth. Then I had to configure the network. NICs are named by the drivers they use, so instead of eth0 and eth1 (for Ethernet), I have rl0 (Realtek) and dc0 (who knows).

I was also extremely confused trying to set up routing. Long-term, it was going to be the router, but short-term, it needs to know about our existing router so that it can connect and download the requisite packages.

So I finally got it all set up. I also installed MySQL (unnecessarily, it turns out), Apache, and PFW, a web-based configuration tool for pf. I ended up not using PFW, because my understanding of pf is so bad that I’m basically relegated to copying-and-pasting rules from websites into the configuration file.

Even using pf is confusing. It’s called pf, but typing “pf” at the command line doesn’t do anything. It turns out that you control it with a tool called “pfctl.” You can do pfctl -e to enable pf, and pfctl -d to disable it.

As I tried to tweak the firewall/routing rules, I’d periodically “restart” pf by disabling and then re-enabling it. I wasn’t sure if it read the rules “live” or if a restart was needed. It turns out… neither! The rules are stored in memory, but restarting pf doesn’t flush the rules. You need to pass pf some more arguments to tell it to flush the cache and read them anew from its configuration file.

After a few more hours of work, I thought it was all set up. Both NICs were configured, the external one to get an IP over DHCP, and the internal one with a low fixed IP. I had a complex set of rules, doing NAT, filtering traffic, and using HFSC for prioritized queueing. (HFSC seems completely undocumented, by the way. I took my tips from random websites.) It seemed very impressive: I prioritized ACKs so that downloads wouldn’t suffer if our outbound link was saturated. (Aside: it really doesn’t make sense to do queueing on incoming traffic, since the bottleneck is our Internet link, not our 100 Mbps LAN.)  I also afforded DNS, ssh, and video game traffic high priorities, but allocated them a lower percentage of traffic. I even figured out the default BitTorrent ports and gave them exceptionally low priority: if our line is fully saturated, the last thing I care about is sharing unnecessary data with other people.

And there are other neat features. It “scrubs” incoming connections, reassembling fragmented packets and just eliminating crap that doesn’t make sense. It catches egregious “spoofing” attempts and discards them.

I hooked up the second LAN connection to test it out, rebooted, and… waited.

It never came up. Well, it did come up. The computer’s running fine. Both network cards show up with the switch. Doing an nmap probe of our LAN, I see one strange entry. It’s actually pretty mysterious: it has no open ports, and attempting to ssh into it just sits there: it doesn’t send a connection refused, but completely ignores the incoming packets, leaving my poor ssh client sitting there waiting for a reply, having no clue what’s going on.

In a nutshell, it seems that I just built a firewall/router that’s so secure that I can only find one of its two cards on the network, and I can’t even try to log into it. Let’s see you hack that! Of course, this does have some issues. For example, I can’t use it.

I haven’t lost hope yet: I have a keyboard and monitor so I can log in on the console and try to do some tweaking there. (You can’t firewall off the keyboard.) It’s just not very encouraging to think, “Alright, let’s reboot and make sure it works as flawlessly as I think it will” and then have the darned thing not even show up on the network.

Advice

I learned two valuable lessons today:

  • Don’t ever create a 500GB FAT partition. No matter how good of an idea it seems, don’t do it. (Not terribly different is the advice, “Don’t ever create one big 500GB partition.”)
  • Mounting a filesystem as “msdos” is not the same as mounting it as “vfat” in Linux. msdos is still constrained by the 8.3 naming system. vfat is not. Unless the disk was literally written with MS DOS, don’t use msdos. It’ll work okay, but boy are you screwing yourself if you make backups with it mounted as msdos. (Fortunately, I realized this before wiping the drive.)

An Uncontrollable Urge

A few years ago Andy and I ran a hosting company. It never got that far, but it was fun, and also a learning experience.  Today I’m finding that I can’t get the idea of starting it again out of my head. The problem is that, this time, I’d want to start it big.

There are a bunch of technologies that I find downright exciting:

  • Old racks full of blade servers are hitting the used market. And by “old” I mean dual 2-3 GHz Xeons, a gig or two of RAM, and hard drives that still rival what hosts are renting in dedicated servers. I’d probably want to put in new drives, but the machines are cheap and they’re plentiful.
  • Boston has a number of good data centers, and all the big Tier 1 providers are here. That there seem to be no well-known hosting companies out here is frankly kind of surprising. You have no idea how badly I want to pick up a couple racks in a colocation facility, and pull in a couple 100 Mbps lines.
  • cPanel looks like it’s matured a lot since I last used it, and it has some good third-party stuff such as script installers. It looks like it remains the number one choice in virtual hosting.
  • Xen is downright exciting. It permits splitting a physical host into multiple virtual machines. With the advent of chips with hardware virtualization support from both AMD and Intel, it now runs with very little overhead. It used to require extensive modifications to the “guest” OS, so that only modified versions of Linux worked. With newer processors, though, you’re able to run machines without them having to know they’re in a virtual machine, opening up options. You can run Windows now. The virtual dedicated server / virtual private server market is growing. (Xen also supports moving hosts between physical servers, which has a lot of nice applications, too!)
  • OpenBSD’s firewall, pf, continues to intrigue me for its power. I just found PFW, a really spiffy web GUI for managing pf. Not only does it do basic firewall stuff, but it’s got support for prioritization of traffic / QoS, and for load balancing. I’m probably just scratching the surface.
  • I’ve spent years honing my admin skills and improving server performance. Improved performance on a shared server, of course, means more clients per server, or more money.

I’m wholly convinced I should start a Boston hosting company. I just need $100,000 capital or so. (Santa, do you read my blog? Do you fund businesses? I’ll give you partial equity.)

Coding Malpractice

I just wrote the following line of code. And it’s no mistake: it functions perfectly and does exactly what I wanted it to do:

$count += 0;

This is surely poor programming practice, essentially implicitly recasting a variable as an integer. But it’s simple and it works flawlessly. (The context: I run an SQL query saying SELECT SUM(votes)..., which makes the tabulation of all the entries MySQL’s problem, not mine. The one ‘flaw’ is that the sum of no votes isn’t 0, but NULL. This becomes a very important distinction when you’re trying to display a number: “0 votes” isn’t the same as ” votes.”)

Since we all know that NULL + 0 = 0 (and, of course, integer + 0 = integer), adding 0 works flawlessly. Could I just convert it to an integer? Probably. But I haven’t done that stuff in a while, and I was far too lazy to pull up the documentation. And incrementing a variable by 0 is way more fun.

Serverama

I’ve been itching to start up a Boston-based dedicated server company. There’s surprisingly few options. As I dig around I’ve found a few that exist, but they’re practically unheard of. I want to start one up and do some good marketing. Name recognition is pretty big when it comes to dedicated servers. However, none of the people I’ve talked to so far (my mom) has been willing to give me the ~$100,000 start up capital I’d need.

I have slightly different plans for furnishing the racks I’d want in a colocation center, but there are plenty of good deals to be had on eBay, in case anyone’s in the server market.

  • 1U Dell, dual 3 GHz Xeons, 1 GB RAM, 73 GB SCSI (10K RPM). $485 + $25 shipping.
  • If you’d prefer disk space over disk speed, there are also 1U SuperMicro machines, a 2.8 GHz Xeon (dual-capable, but not so equipped), 2 GB RAM, and a pair of 250 GB disks. About $400 shipped.
  • Comparatively expensive, but $1500 buys dual 3.2 GHz Xeons, 4 GB RAM, and 3x 146 GB SCSI disks.
  • Not cheap at all, but dual quad-core processor, 16 GB of RAM, and 6x 146 GB disks is my type of machine! $5375 if you’ve got the cash burning a hole in your pocket… (It looks like these processors support VT virtualization, so you can run something like Xen on it and host multiple virtual machines, Windows or Linux, without problems.
  • Dual 3.2 GHz Xeons, 4GB RAM, and a pair of 750 GB disks? $1277+58 seems like a pretty good deal!