Big Hosting

I tend to think of web hosting in terms of many sites to a server. And that’s how the majority of sites are hosted: there are multiple sites on this one server, and, if it were run by a hosting company and not owned by me, there’d probably be a couple hundred.

But the other end of the spectrum is a single site that spans many servers. Almost any big site is run this way. Google reportedly has tens of thousands. Any busy site has several, if only for load balancing.

Lately I’ve become somewhat interested in the topic, and found some neat stuff about this realm of servers. A lot of things are done that I didn’t think were possible. While configuring my router, for example, I stumbled across material on CARP (the Common Address Redundancy Protocol). I always thought of routers as a single point of failure: if your router goes down, everything behind it goes down. CARP gets around that by letting two (or more) routers share the same IP address, so in mission-critical setups a backup takes over when the master dies.
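To make the CARP idea concrete, here’s a toy sketch of how a master gets picked among routers sharing one address. As I understand carp(4) on OpenBSD, each host advertises every `advbase` seconds, skewed by `advskew`/256 of a second, and the host that advertises most often holds the address. The `CarpHost` class and the `elect_master` function are my own hypothetical names. The sketch also ignores preemption and the actual network protocol entirely.

```python
from dataclasses import dataclass


@dataclass
class CarpHost:
    # Hypothetical model of one router sharing a virtual IP via CARP.
    name: str
    advbase: int  # base advertisement interval, in seconds
    advskew: int  # a higher skew means advertising slightly less often

    @property
    def interval(self) -> float:
        # Effective advertisement interval (my reading of carp(4); treat as approximate).
        return self.advbase + self.advskew / 256


def elect_master(alive_hosts):
    """Return the live host with the shortest advertisement interval.

    Simplification: real CARP backups take over only after missing several
    advertisements, and a recovered host does not reclaim the address unless
    preemption is enabled.
    """
    return min(alive_hosts, key=lambda h: h.interval)


fw1 = CarpHost("fw1", advbase=1, advskew=0)
fw2 = CarpHost("fw2", advbase=1, advskew=100)
print(elect_master([fw1, fw2]).name)  # fw1 holds the shared address
print(elect_master([fw2]).name)       # fw1 died, so fw2 takes over
```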

One thing I wondered about was serving up something with voluminous data. For example, suppose you have a terabyte of data on your website. One technique might be to put a terabyte of drives in every server and load-balance across them. But putting a terabyte of drives in each machine is expensive. Frankly, if you’re stuffing massive storage into one machine, you’re probably using huge but slow drives. Another option would be some sort of ‘horizontal partitioning,’ where five servers (an arbitrary number) each house one-fifth of the data. This avoids the absurdity of stuffing a terabyte of storage into each of your servers, but it brings problems of its own. For one, you don’t have any redundancy: if the machine serving sites starting with A-G goes down, all of those sites go down. Plus, you have no idea how ‘balanced’ it will be. Even if you came up with some intricate scheme for deciding which material went where, the optimal layout would be constantly changing.
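Here’s a rough sketch of the two obvious ways to split sites across five servers. One splits them by alphabetical range, like the A-G scheme above. The other hashes the site name. The range boundaries and site names are made up for illustration. Hashing tends to spread sites more evenly by count. It fixes neither of the problems described above, though. One dead server still takes its sites down, and a single hugely popular site still lands on just one box.

```python
import hashlib

# Hypothetical alphabetical split across five servers (server 0 handles A-G, etc.).
RANGES = [("a", "g"), ("h", "l"), ("m", "r"), ("s", "v"), ("w", "z")]


def range_partition(site: str) -> int:
    """Pick a server by the first letter of the site name."""
    first = site[0].lower()
    for server, (start, end) in enumerate(RANGES):
        if start <= first <= end:
            return server
    return 0  # digits and other characters fall back to server 0


def hash_partition(site: str, servers: int = 5) -> int:
    """Pick a server by hashing the name, so placement doesn't depend on the alphabet."""
    digest = hashlib.md5(site.encode("utf-8")).hexdigest()
    return int(digest, 16) % servers


sites = ["apple.example", "acme.example", "bakery.example", "cats.example", "zoo.example"]
for site in sites:
    print(site, range_partition(site), hash_partition(site))
```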

Your best bet, really, is to have a bunch of web machines and give them minimal storage (e.g., a 36GB SCSI drive, but a 15,000 rpm one!). Then have a backend fileserver that holds the whole terabyte of data. Viewers would be assigned to any of the webservers, either in round-robin fashion or dynamically based on which server was least busy. That webserver would retrieve the requisite file from the fileserver and present it to the viewer. Of course, this places a huge load on the one fileserver, so the whole setup implicitly assumes you’re doing caching.
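The two assignment strategies mentioned above are easy to sketch. This is a toy model, not how any particular load balancer is implemented. `RoundRobin` just cycles through the servers. `LeastBusy` tracks open connections, as a stand-in for "least busy," and hands each new visitor to the server with the fewest. The server names are hypothetical.

```python
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotation, ignoring how busy each one is."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastBusy:
    """Send each new request to the server with the fewest open connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # min() returns the first server among ties, in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes so the count reflects current load.
        self.active[server] -= 1


rr = RoundRobin(["web1", "web2", "web3"])
print([rr.pick() for _ in range(4)])  # ['web1', 'web2', 'web3', 'web1']

lb = LeastBusy(["web1", "web2", "web3"])
first_three = [lb.pick() for _ in range(3)]  # one request on each server
lb.release("web2")                           # web2 finishes its request
print(lb.pick())                             # web2, now the least busy
```

Round-robin is simpler and needs no feedback from the servers. Least-busy copes better when some requests (say, a big file) take much longer than others.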

But how do you manage the caching? You’d need some complex code to first check your local cache, and then turn to the fileserver if needed. It’s not that hard to write, but it’s also a pain: rather than a straightforward, “Get the file, execute if it has CGI code, and then serve” process, you need the webserver to do some fancy footwork.
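The "check the local cache, then go to the fileserver" footwork looks roughly like this. It’s a minimal read-through cache with least-recently-used eviction. The `fetch` callable stands in for whatever actually reads from the fileserver, and the class name and paths are hypothetical. It deliberately ignores invalidation, which is the genuinely hard part. If a file changes on the fileserver, this cache keeps serving the old copy.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Serve files from local memory, falling back to the fileserver on a miss."""

    def __init__(self, fetch, capacity=1000):
        self._fetch = fetch            # function: path -> file contents (from the fileserver)
        self._capacity = capacity      # max number of files kept locally
        self._store = OrderedDict()    # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self._store:
            self._store.move_to_end(path)  # mark as recently used
            self.hits += 1
            return self._store[path]
        self.misses += 1
        data = self._fetch(path)           # only cache misses touch the fileserver
        self._store[path] = data
        if len(self._store) > self._capacity:
            self._store.popitem(last=False)  # evict the least recently used file
        return data


FILESERVER = {"/index.html": b"<h1>hi</h1>", "/a.jpg": b"...", "/b.jpg": b"..."}
cache = ReadThroughCache(FILESERVER.__getitem__, capacity=2)
for path in ["/index.html", "/index.html", "/a.jpg", "/b.jpg", "/index.html"]:
    cache.get(path)
print(cache.hits, cache.misses)  # 1 4
```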

Enter Coda. No, not the awesome web-design GUI, but the distributed filesystem. In a nutshell, you have a Coda server (or multiple servers!), and each of your machines mounts a filesystem at /coda, which refers to the network. But it caches files locally as needed. This is massively oversimplifying things. The actual use case is something like bringing your laptop into the office and working on files on the fileserver. Then, at the end of the day, you seamlessly take it home and keep working, without having to worry about where the files physically reside. So running it just for the caching is practically a walk in the park: you don’t have complicated revision conflicts or anything of the sort. Another awesome feature of Coda is that, by design, it’s pretty resilient: part of the point of all that caching was to handle the fileserver going offline gracefully. So really, each node would cache the more popular files, and only cache misses would hit the fileserver. I also read an awesome anecdote about people running multiple Coda servers. When a disk fails, they just throw in a blank one. You don’t need RAID, because the data’s redundant across other servers. With the new disk, you simply have it rebuild the missing files from other servers.
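The blank-disk trick is easy to picture with a toy model. This is not how Coda actually resolves replicas; its real machinery is far more involved. It’s a hypothetical sketch built on two assumptions. First, each server keeps a table of path → (version, contents). Second, the replacement disk simply copies the newest version of every file its peers hold.

```python
def rebuild_replica(peers):
    """Rebuild a blank server's file table from its surviving peers.

    Each peer is a dict mapping path -> (version, data). For every path,
    keep the copy with the highest version number seen on any peer.
    """
    rebuilt = {}
    for peer in peers:
        for path, (version, data) in peer.items():
            if path not in rebuilt or version > rebuilt[path][0]:
                rebuilt[path] = (version, data)
    return rebuilt


server_a = {"/index.html": (3, b"new"), "/logo.png": (1, b"png")}
server_b = {"/index.html": (2, b"old"), "/about.html": (1, b"about")}
fresh_disk = rebuild_replica([server_a, server_b])
print(sorted(fresh_disk))          # ['/about.html', '/index.html', '/logo.png']
print(fresh_disk["/index.html"])   # (3, b'new')
```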

There’s also Lustre, which was apparently inspired by Coda. It focuses on insane scalability, and it’s apparently used in some of the world’s biggest supercomputer clusters. I don’t yet know enough about it, really, but one thing that strikes me as awesome is the concept of “striping” a file across multiple nodes.
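Striping is basically RAID-0 across machines. A file is chopped into fixed-size chunks that are dealt out round-robin to several storage nodes, so a big read hits all of them at once. Here’s a sketch of the arithmetic for finding which node holds a given byte. It assumes a simple round-robin layout with a 1 MiB stripe size and four nodes, both numbers chosen just for illustration. This is not Lustre’s code, only the general idea.

```python
STRIPE_SIZE = 1024 * 1024  # bytes per chunk (illustrative value)
STRIPE_COUNT = 4           # number of storage nodes the file is spread over


def locate(offset: int, stripe_size: int = STRIPE_SIZE, stripe_count: int = STRIPE_COUNT):
    """Map a byte offset in the file to (node index, offset within that node's piece)."""
    chunk = offset // stripe_size          # which chunk of the file this byte is in
    node = chunk % stripe_count            # chunks are dealt out round-robin
    round_number = chunk // stripe_count   # full passes over the nodes before this chunk
    node_offset = round_number * stripe_size + offset % stripe_size
    return node, node_offset


print(locate(0))                     # (0, 0)
print(locate(STRIPE_SIZE))           # (1, 0): the second chunk lives on node 1
print(locate(5 * STRIPE_SIZE + 10))  # (1, 1048586): wrapped around to node 1 again
```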

The Linux HA project is interesting, too. There’s a lot of stuff that you don’t think about. One is load balancer redundancy. Of course you’d want it, but if you simply switched over to your backup router, all existing connections would be dropped. So they keep a UDP data stream going, over which the master keeps the spare(s) up to date on connection states. A new router or load balancer suddenly appearing can also confuse the network. So if the master goes down, the spare comes up and simply spoofs its MAC and IP to match the node that went down. There’s a tool called heartbeat, whereby standby servers ping the master to see if it’s up. It apparently has some complex inner workings, and they recommend a serial link between the nodes so you’re not dependent on the network. (Granted, if the network to the routers goes down, it really doesn’t matter, but having them quarreling over who’s master will only complicate attempts to bring things back up!)
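The core of the heartbeat idea fits in a few lines. The standby notes when it last heard from the master. If it hasn’t heard anything for longer than some dead time, it takes over. This is a hypothetical sketch: the class, the method names, and the 10-second default are mine, not the Linux-HA tool’s. A real takeover would also claim the master’s IP and MAC, as described above. The clock is injectable so the logic can be tested without waiting.

```python
import time


class Standby:
    """Watches for heartbeats from the master and takes over if they stop."""

    def __init__(self, deadtime=10.0, clock=time.monotonic):
        self.deadtime = deadtime  # seconds of silence before declaring the master dead
        self.clock = clock
        self.last_seen = clock()
        self.is_master = False

    def on_heartbeat(self):
        # Called whenever a heartbeat arrives (over the network or the serial link).
        self.last_seen = self.clock()

    def check(self):
        if not self.is_master and self.clock() - self.last_seen > self.deadtime:
            self.take_over()
        return self.is_master

    def take_over(self):
        # Placeholder: a real node would bring up the shared IP/MAC and start services here.
        self.is_master = True


now = [0.0]
standby = Standby(deadtime=10.0, clock=lambda: now[0])
now[0] = 5.0
standby.on_heartbeat()
now[0] = 14.0
print(standby.check())  # False: only 9 seconds of silence
now[0] = 16.0
print(standby.check())  # True: 11 seconds without a heartbeat, so take over
```

Silence only proves that the standby can’t hear the master. It doesn’t prove the master is dead, which is exactly where fencing comes in.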

And there are lots of intricacies I hadn’t considered. It’s sometimes hard to tell whether a node is down or not, and it turns out that a node in an ambiguous state is often a horrible state of affairs. If it’s down and not pulled out of the pool, lots of people will get errors. And if other nodes are detecting oddities but it’s not down, something is awry with the server. There’s a concept called fencing that I’d never heard of, whereby the ‘quirky’ server is essentially shut out by its peers to keep it from screwing things up. Not only might it run away with shared resources, but the last thing you want is a misbehaving service trying to modify your files. The ultimate example of this is STONITH, which sounds like a fancy technical term (and, by definition, now is a technical term, I suppose), but really stands for “Shoot the Other Node in the Head.” Here’s what I gather from the (odd) description. If members of a cluster suspect that one of their peers is down, they “make it so” by calling external triggers to pull the node out of the network (often, seemingly, by just rebooting the server).
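Here’s a rough sketch of the "make it so" step. The quorum rule is my own assumption; plenty of clusters use something like it, but the details vary. Under this rule, a node is only fenced if a strict majority of the whole cluster reports it unreachable. That keeps one confused peer from shooting everyone else. The `fence` callback stands in for whatever external trigger does the deed, such as a network-controlled power switch. All names here are hypothetical.

```python
def maybe_fence(suspect, reports, cluster_size, fence):
    """Fence `suspect` if a strict majority of the cluster says it's unreachable.

    reports: dict mapping peer name -> True if that peer can't reach the suspect.
    cluster_size: total number of nodes, including the suspect.
    fence: callable that actually cuts the node off (e.g. power-cycles it).
    """
    votes = sum(1 for unreachable in reports.values() if unreachable)
    if votes > cluster_size // 2:
        fence(suspect)
        return True
    return False


fenced = []
peers = {"node2": True, "node3": True, "node4": True, "node5": False}
print(maybe_fence("node1", peers, cluster_size=5, fence=fenced.append))  # True
print(fenced)  # ['node1']
```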

I don’t think anyone is going to set up high-performance server clusters based on what someone borderline-delirious blogged at 1:40 in the morning because he couldn’t sleep, but I thought someone else might find this venture into what was, for me, new territory, to be interesting.

Transparent Government

As has happened in the past, it seems like the election has been reduced to 2 or 3 talking points: immigration, health care, and Iraq, to name the big ones.

I was brushing up on Obama’s stance on the issues, and found something that really excited me. Check out his page on ethics. It’s not vague talk about how lobbying is bad. He has an awesome plan:

  • A big database and Web frontend containing information on lobbyist activity, what they spend, and what bills are awarded.
  • Information on all federal contracts, how much they cost, who lobbied for them, and how the completion of the contracts is going.
  • Require that non-emergency bills be posted to the Internet for a few days before signing them, to promote “open government.”
  • Do the same for earmarks, disclosing who added each earmark and why.
  • He also wants “21st Century Fireside Chats,” where Cabinet officials periodically talk about what they’ve been doing, streamed over the Internet for all to see.
  • Publicize meetings that shouldn’t be secret, such as “regulatory agency business.”

It’s ambitious, but boy would it be awesome! It’s funny: it almost seems like it’s somehow wrong that I should be able to see exactly what my elected officials are doing. And yet it’s really exactly what our government is all about: transparency. Wow-a-wee-wow!

Idea

Why isn’t there a really good “network appliance” to serve as a network gateway? You can get a low-end firewall/router, or you can build your own machine.

Setting up OpenBSD is no walk in the park, though. I want to build an “appliance” based on OpenBSD, and give it a nice spiffy web GUI. You buy the box, plug one side into your switch and one side into your cable modem or whatnot, and spend ten minutes in a web browser fine-tuning it. I was really fond of the appearance of the Cobalt Qube, although it could be made much smaller. And throw a nice LCD on the front with status. You can run a very low-power CPU, something like the one powering these. It really doesn’t need more than 512MB RAM, but give it a small solid-state drive. And a pair of Gigabit cards, not just for the speed, but because GigE cards usually are much higher-quality. In building routers, the quality of your card determines how hard the CPU has to work.

There’s so much that a router can do. You can run a transparent caching proxy, a caching DNS server, NAT, an internal DHCP server, and, of course, a killer firewall. It can also do priority-based queuing of outgoing traffic, such as prioritizing ACKs so downloads don’t suffer because of uploads, or giving priority to time-sensitive traffic such as games. You can also generate great graphs of things such as bandwidth use, blocked packets, packet loss, latency… You can regulate network access per-IP or per-MAC, and do any sort of filtering you want. It could also easily integrate with a wireless network (maybe throw a wireless card in, too!), serving as an access point. That would enable features like permitting only certain MACs to connect, requiring authentication, or letting anyone in but requiring that they sign up in some form (a captive portal). And I really don’t understand why worms and viruses spread so well. It’s trivial to block most of them at the network level if you really monitor incoming traffic.

I’m frankly kind of surprised that nothing of this level exists. I think there’s a definite market for quality routers. A $19 router does the job okay, but once you start to max out your connection, you’ll really notice the difference! A good router starts prioritizing traffic, so your ssh connection doesn’t drop and your game doesn’t lag out, but your webpages might load a little slower. An average router doesn’t do anything in particular and just starts dropping packets all over the place, leaving no one better off. (And a really bad router, like our old one, seems to deal with a fully-saturated line not by dropping excess packets or using priority queueing, but by rebooting itself, leaving everyone worse off… I think this may have had to do with the duct tape.)
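Here’s a toy version of what the "good router" does when the line is saturated. It uses a bounded queue that always sends the most important packet first. When the queue is full, it throws away the least important packet instead of whatever happened to arrive last. The traffic classes and their ranking are hypothetical examples. A real OpenBSD box would do this in the kernel’s packet filter, not in Python.

```python
import heapq
import itertools

# Lower number = more important. ACKs and interactive traffic go first.
PRIORITY = {"ack": 0, "ssh": 0, "game": 0, "dns": 1, "web": 2, "bulk": 3}


class PriorityQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []                # entries are (priority, sequence, packet)
        self._seq = itertools.count()  # keeps FIFO order within a priority class

    def enqueue(self, kind, packet):
        heapq.heappush(self._heap, (PRIORITY.get(kind, 3), next(self._seq), packet))
        if len(self._heap) > self.capacity:
            # Queue is full: drop the least important, most recently queued packet.
            worst = max(self._heap)
            self._heap.remove(worst)
            heapq.heapify(self._heap)

    def dequeue(self):
        return heapq.heappop(self._heap)[2]


q = PriorityQueue(capacity=3)
q.enqueue("bulk", "upload chunk 1")
q.enqueue("web", "GET /")
q.enqueue("bulk", "upload chunk 2")
q.enqueue("ack", "ACK for download")    # queue overflows; "upload chunk 2" is dropped
print([q.dequeue() for _ in range(3)])  # ['ACK for download', 'GET /', 'upload chunk 1']
```

Strict priority like this can starve bulk traffic entirely on a busy line. That’s why real schedulers usually guarantee the lower classes some minimum share.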

Ecstatic

In the most recent polls, Obama is leading narrowly in New Hampshire. And it’s practically a banal phrase at this point, but Iowa is a crapshoot: the “big three” (Edwards, Clinton, and Obama) are pretty much tied. Right now it looks like Edwards is leading, which people thought was unlikely. Thus I’m not too worried at the moment about Hillary’s triumphs in other places.

But for the first time in a while, I’m feeling really excited. This could actually happen!

I’m starting to get interested in the Republican primaries as well: they seem pretty fragmented. Romney and Rudy each have big leads over the other in different states, but McCain and Huckabee are notable contenders in some states, too. (Somewhat humorously, at least to me, Romney has a pathetic 7% in Massachusetts, although the poll is ancient. Someone ought to do a new poll of Massachusetts voters.)

Plans are still up in the air but I may well end up volunteering over at the Obama headquarters later today. The nation is watching us, and I don’t want to sit by idly in the process. We can do this!

High Dynamic Range

I’d been seeing a lot about HDR, or High Dynamic Range, photography. In layman’s terms, the dynamic range of a camera is the range from the darkest to the lightest tones it can record in one shot. The problem is that a camera’s dynamic range often falls short of the range in a real-life scene.

Long ago, photographers found a halfway decent solution: graduated filters. Basically, you stick a filter in front of the lens, with part of it darker than the rest. It works well if, say, you want a picture at the beach with both the foreground detail and the sky properly exposed.

With computers, though, there’s been another approach. You take a series of bracketed shots: one or two for the sky, one or two for the foreground, etc. Some people have been known to combine close to a dozen. Having a tripod helps tremendously here, since the images need to be pretty much exactly the same besides exposure.

Strictly, an HDR image holds more range than a monitor can really display, so a technique called tone mapping is often used to compress it. The basic premise of the whole process is to take the “good” parts of each shot in a bracketed series and combine them. Photoshop CS2 and newer has an HDR utility, though I’ve been pretty unimpressed with the results. Today I started playing around with an Open Source tool called Qtpfsgui. It’s even cross-platform! It supports multiple algorithms for doing tone mapping, too.
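Here’s a sketch of the two steps for a single pixel, under some big simplifying assumptions. Pixel values are taken to be already linear (no camera response curve to undo), and the shots differ only by exposure. The merge is a weighted average that trusts mid-tones and ignores clipped highlights and crushed shadows. The tone map is a simple global operator in the style of Reinhard et al. That’s just one of several approaches, and not the Fattal algorithm used for the photo below. Note that +1 EV doubles the exposure, so a bracket from -2 to +2 EV spans four stops, a 16x difference.

```python
def hat_weight(z: float) -> float:
    """Trust mid-tones most; near-black and near-white values carry little information."""
    return z if z <= 0.5 else 1.0 - z


def merge_pixel(values, evs):
    """Estimate relative scene brightness for one pixel from a bracketed series.

    values: linear pixel values in [0, 1], one per shot (assumed already linear).
    evs:    exposure offset of each shot in stops; +1 EV means twice the light.
    """
    num = den = 0.0
    for z, ev in zip(values, evs):
        w = hat_weight(z)
        num += w * z / 2.0 ** ev  # undo the exposure to put every shot on a common scale
        den += w
    return num / den if den else 0.0


def tone_map(radiance: float) -> float:
    """Global Reinhard-style curve: squeezes [0, infinity) into [0, 1) for display."""
    return radiance / (1.0 + radiance)


evs = [-2, -1, 0, 1, 2]
# A bright sky pixel: clipped to 1.0 in the three longest exposures, so those are ignored.
sky = merge_pixel([0.3, 0.6, 1.0, 1.0, 1.0], evs)
print(round(sky, 3), round(tone_map(sky), 3))  # 1.2 0.545
```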

Overall, I’m still not that happy with the results, but it’s a start. Here’s a ‘normal’ shot of the beach, taken on Cape Cod yesterday:

[Photo: Beach, by n1zyy, on Flickr]

You’ll note that the foreground (e.g., the bench) is too dark, yet the sky is too light. It’s a good illustration of insufficient dynamic range.

Luckily, I knew in the back of my head that I wanted to try my hand at HDR photography, so I saw it as an opportunity. I set my camera to meter -2 to +2 EV, to try to cover the full range. The end product:

[Photo: Fattal Algorithm, by n1zyy, on Flickr]

It displays a very common pet peeve of mine with HDR photos: it looks entirely unrealistic. Absurd, even. I think part of it’s that it’s just overdone, and that the contrast is jacked way up. I want to play around with it more and see if I can get a more natural product. So far, no luck. But, at least in a technical sense, it’s an improvement over the first image.

I’d like to see HDR come a little further, so that HDR photos don’t have the same “Whoa!” quality that a scary old lady with way too much makeup has. I don’t think the limitations are entirely technical at this point, either.

Benazir Bhutto

I confess to being ignorant enough to have not even heard of her, but Benazir Bhutto was a really interesting figure.

Now here’s an interesting video. You learn a few things. The first is that she speaks fluent English. The second is that she was well aware of plots to kill her, and she fingers a number of suspects in the video.

But the person who posted the video makes another interesting point. At one point she speaks of Osama’s son. Later on, she fingers a man “who killed Osama bin Laden,” an assertion that doesn’t seem to faze the interviewer.

The rumor’s existed for a while, but it has generally just been people’s gut feelings and such. Now I’m intrigued.

Knots in My Stomach

Thanks to Rusty for finding the Electoral-Vote.com website, something I’d forgotten about from the 2004 election. The data is in a bit of a confusing layout… Disregard the 2004 map and the first little table. Below those is a comprehensive list of polls, state by state.

My eyes are on Clinton:Obama. And I seriously have knots in my stomach here. Clinton is winning by at least 10% in most places. Arizona is 44% to 14%. In his home state of Illinois, Obama’s winning 37% to 33%.

The good news! Iowa, a key state, is slightly favoring Obama. But really, it’s a crapshoot: Obama, Edwards, and Clinton are neck-and-neck. Romney and Huckabee lead the Republican primary. At this point in time, though, my main concern is the Democratic primary.

Here in New Hampshire, Obama’s trailing, 26% to 38%. This is not good. We’re #2 after Ohio.

Oklahoma’s weird. Obama’s got 13%, with Clinton and Edwards tied at 29%. (Don’t get me wrong: Edwards is good, but I don’t think he has a chance right now.)

The Republican one is interesting to take a gander at, too. In some places, Huckabee’s an also-ran. In Arizona, he got 3% of the votes. Once. In Iowa, he inches past Romney to take first place at 28%. Surprisingly (to me, at least), he’s doing the exact same thing in New Hampshire. With a quick skim (admittedly, much less than I’ve afforded the Democratic primary), it looks like Giuliani is king of the Republican race.

But a few thoughts:

  • I think the odds of Edwards winning the primary are slim. But he carries a substantial margin in some places. If he were to drop out and endorse Obama, the impact would be considerable. I worry that most of his fans would support Hillary, though.
  • I think we need to review the statistics after the Iowa caucus (January 3) and the New Hampshire primary (January 8). Everyone’s watching these, and the results will have a big impact. A strong lead by Obama may pull out some undecideds. Or, a strong lead by Clinton may freak out some people who will vote for Obama just to vote against her. (While I’d back her if she were our nominee, she is not my preferred Democrat, if you can tell.)
  • My super-early-money is on Clinton vs. Giuliani. And this concerns me greatly, because people voting on first impressions will probably favor Rudy without really doing a lot of research. (It also concerns me because I don’t particularly like either of them.)
  • The Republicans are getting weird results: Giuliani wins some places, Romney wins some places, McCain’s got a few wins (probably the least), and Huckabee, who I initially thought was the Kucinich of the Republicans, is actually leading in quite a few places. I’m really not sure who’s going to get their nomination.
  • As we saw in 2004, polls can be flaky. (I twice typed “pols” instead of “polls.” Freudian slip?) So this doesn’t necessarily mean anything.

One-sentence conclusion: It’s too soon to really have any idea how things will go, but Clinton has a discomforting majority in many states.

A few parting thoughts:

  • Read up on the Iowa caucus process if you’re not familiar. It’s quite foreign, really.
    • Apparently, only once in history (or once in five, put differently: an important distinction!) has the Straw Poll winner not matched the Iowa caucus winner. And this year’s Straw Poll winner was Romney. Both Giuliani and McCain screwed everything up by blowing the event off, and thus polled very poorly. I don’t know what this means. It might still tick off Iowa voters, tanking Giuliani in the caucus as well. But it also means that the data is probably skewed away from them right now, and if Iowa voters aren’t out for vengeance, they may take votes away from Romney.
  • The Iowa Caucus is less than two weeks away, and the NH primary is less than three. Pay more attention to the statistics then.
  • Vote!

Qqueue!

So I’m a huge fan of Ask Metafilter. The basic premise is simple: you ask a question and lots of people answer. But Ask MeFi rocks because they maintain high standards, so you actually get really good answers. It costs $5 to join, which is meant to pay for the servers but, frankly, seems like a good way of keeping crap out, too. You’re allowed one question a week, so I try to make it good. But oftentimes, I put it off for several weeks for want of something worthy of using up my question.

So I started a list. And I figured I’d allow voting and comments. And before I knew it, I had this monstrosity. It was actually extraordinarily simple to code, too. I hope to add better questions over time: these are the ones that were on my mind at the time. You can vote (the + and – buttons), and leave comments. Feel free to do so. (I’m not taking question ideas: get your own account if that’s what you want!)
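The core of a vote-ranked list like this really is simple. Here’s a hypothetical sketch of the data model, not the actual code behind the list. Each question keeps up and down counts plus its comments, and the list is sorted by net score.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    up: int = 0    # incremented by the + button
    down: int = 0  # incremented by the – button
    comments: list = field(default_factory=list)

    @property
    def score(self) -> int:
        return self.up - self.down


def ranked(questions):
    """Highest net score first; sorted() is stable, so ties keep their original order."""
    return sorted(questions, key=lambda q: q.score, reverse=True)


a = Question("Question A", up=3, down=1)
b = Question("Question B", up=5)
c = Question("Question C", up=2)
c.comments.append("Good one!")
print([q.text for q in ranked([a, b, c])])  # ['Question B', 'Question A', 'Question C']
```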

Saving the Auto Industry

My whole family drives Toyotas. We love America and all, but we want good, solid cars. The U.S. is, understandably, concerned about how much oil we’re using. So we’re trying for a requirement that, by 2020, all cars sold get 35mpg at a minimum. Of course, the car companies are complaining that this is going to be incredibly difficult to do.

Two comments:

  • This is utter BS. My mom gets 50 mpg with her Prius. Honda did it in 1987.
  • Why does the government need to get involved? The way I think it should work is that we say, “$3 a gallon for gas is ridiculous! I want a car that gets better gas mileage!” We stop buying cars that get horrible gas mileage, and, consequently, Detroit stops making cars that get horrible gas mileage because no one is buying them. It costs me $40 every time I fill up. I wince every single time.

I found this video online. I’m not going to lie: it’s dry, and 20 minutes long. I was kind of proud to follow him most of the time as he talks about internal rates of return and demand pull and the like. He makes some extremely obscure references, and even now, I’m not sure what he was talking about with oil at $12 a barrel.

And yet, despite it being presented in a technical, academic manner to an audience that’s definitely not normal people, he makes some points that are really, really, really worth hearing. One of the simplest ones: efficient cars are going to be made, the question is who’s going to make them. And, at least right now, it’s not us. (And it really boggles my mind, frankly. Ford makes one hybrid: the Ford Escape Hybrid. 34mpg on an SUV is impressive (I get 20-22). But what the heck market are they appealing to? They manage to completely dilute the effects of a hybrid engine by putting it in an SUV.)

GM developed a “concept car” 16 years ago that, as I recall, got close to 100 miles a gallon. Where is it?

Besides oil, another huge problem we’re facing is a ridiculously large trade deficit. If we could make cars good enough that we didn’t have to keep importing them, that would certainly help.

He presents some amazing statistics, too. 87% of the energy from fuel used in cars is utterly wasted. Only 6% of the total energy actually moves the car. (And when you factor in that the car weighs significantly more than the passengers and luggage, he says that less than 1% actually moves the passengers.)
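Here’s the back-of-the-envelope version of that last claim. The 6% figure is from the talk. The car and payload weights are my own assumed round numbers. Splitting the energy in proportion to mass is also a simplification; air drag, for one, doesn’t care what’s in the trunk.

```python
CAR_SHARE = 0.06  # fraction of fuel energy that actually moves the car (figure from the talk)


def payload_share(car_lb: float, payload_lb: float) -> float:
    """Fraction of fuel energy moving the people and luggage, if energy splits by mass."""
    return CAR_SHARE * payload_lb / (car_lb + payload_lb)


# Assumed: a 3,500 lb car carrying one 200 lb driver plus 100 lb of stuff.
print(f"{payload_share(3500, 300):.2%}")  # 0.47%
```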

He says the solution is to lighten the car. I cringed for a minute. Lighter cars, especially on today’s roads, are asking for disaster. You can go drive your 500 pound car, and I’m sorry if I kill you when you crash into my SUV.

But it turns out that this is somewhat wrong. He showed a picture of a McLaren SLR (a $400,000+ car) that was made out of carbon fiber. It’s very light. Some idiot T-boned the car. Their car was totaled. The McLaren driver had to buff out a scratch in the paint. He suggested that, if you were to smash the car head-first into a brick wall, about 25 pounds of carbon fiber is all it would take to absorb the impact and let you walk away unharmed.

He goes on to call heavy cars “hostile cars,” and really, he’s got an excellent point. We’re making heavy cars solely for safety with other cars. But we can increase fuel efficiency, maintain (or increase!) driver safety, and decrease risk to other motorists by simply changing materials.

Oh, and one final point he makes that I thought was interesting: we think of OPEC as a cartel that has tons of power. In actuality, our power of demand far outweighs their supplier power, and we have the power in the equation. Except that we can’t stop buying oil. Years ago we saw a lull in demand, and basically gave OPEC the bird. He suggests doing it again.

I didn’t expect to watch the whole video, but before I knew it I was done. And it’s pretty thought-provoking.

Drive Carefully

This hilarious thread on bad drivers reminds me of a video that’s way funnier than it should be. It shows some cars sliding down a hill on fresh snow. It’s pretty common: someone brakes too hard or not soon enough, or steers too sharply, or whatever, and loses control. It’s scary. But watch that video, in particular the first car.

How is that even possible?!