Building an Improvised CDN

From my “Random ideas I wish I had the resources to try out…” file…

The way the “pretty big” sites work is that they have a cluster of servers… A few are database servers, many are webservers, and a few are front-end caches. The theory is that the webservers do the ‘heavy lifting’ to generate a page… But many pages, such as the main page of the news, Wikipedia, or even these blogs, don’t need to be generated every time. The main page only updates every now and then. So you have a caching server, which basically handles all of the connections. If the page is in cache (and still valid), it’s served right then and there. If the page isn’t in cache, it will get the page from the backend servers and serve it up, and then add it to the cache.

The way the “really big” sites work is that they have many data centers across the country and your browser hits the closest one. This improves load times and adds redundancy (data centers do periodically go offline: The Planet did it just last week when a transformer inside blew up and the fire marshals made them shut down all the generators). Depending on whether they’re filthy rich or not, they’ll either use GeoIP-based DNS, or have elaborate routing going on. Many companies offer these services, by the way. It’s called a CDN, or Content Delivery Network. Akamai is the most obvious one, though you’ve probably used LimeLight before, too, along with some other less-prominent ones.

I’ve been toying with SilverStripe a bit, which is very spiffy, but it has one fatal flaw in my mind: its out-of-box performance is atrocious. I was testing it in a VPS I haven’t used before, so I don’t have a good frame of reference, but I got between 4 and 6 pages/second under benchmarking. That was after I turned on MySQL query caching and installed APC. Of course, I was using SilverStripe to build pages that would probably stay unchanged for weeks at a time. The 4-6 pages/second is similar to how WordPress behaved before I worked on optimizing it. For what it’s worth, static content (that is, stuff that doesn’t require talking to databases and running code) can handle 300-1000 pages/second on my server as some benchmarks I did demonstrated.

There were two main ways to enhance SilverStripe’s performance that I thought of. (Well, a third option, too: realize that no one will visit my SilverStripe site and leave it as-is. But that’s no fun.) The first is to ‘fix’ SilverStripe itself. With WordPress, I tweaked MySQL and set up APC (which gave a bigger boost than with SilverStripe, but still not a huge gain). But then I ended up coding the main page from scratch, and it uses memcache to store the generated page in RAM for a period of time. Instantly, benchmarking showed that I could handle hundreds of pages a second on the meager hardware I’m hosted on. (Soon to change…)

The other option, and one that may actually be preferable, is to just run the software normally, but stick it behind a cache. This might not be an instant fix, as I’m guessing the generated pages are tagged to not allow caching, but that can be fixed. (Aside: people seem to love setting huge expiry times for cached data, like having it cached for an hour. The main page here caches data for 30 seconds, which means that, worst case, the backend would be handling two pages a minute. Although if there were a network involved, I might bump it up or add a way to selectively purge pages from the cache.) squid is the most commonly-used one, but I’ve also heard interesting things about varnish, which was tailor-made for this purpose and is supposed to be a lot more efficient. There’s also pound, which seems interesting, but doesn’t cache on its own. varnish doesn’t yet support gzip compression of pages, which I think would be a major boost in throughput. (Although at the cost of server resources, of course… Unless you could get it working with a hardware gzip card!)
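To make the 30-second arithmetic concrete, here is a minimal sketch of the idea in Python. It is not how squid or varnish work internally; `TTLPageCache`, `fetch_from_backend`, and the injectable `clock` are all names I made up for illustration. It just shows a cache that serves a stored page until it expires, goes to the backend on a miss, and lets you purge a single page.

```python
import time
from typing import Callable, Dict, Tuple


class TTLPageCache:
    """Tiny in-memory page cache with a fixed time-to-live and selective purge."""

    def __init__(self,
                 fetch_from_backend: Callable[[str], str],
                 ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_backend   # hypothetical backend call
        self._ttl = ttl_seconds
        self._clock = clock                # injectable so it can be simulated
        self._store: Dict[str, Tuple[float, str]] = {}  # path -> (expiry, body)
        self.backend_hits = 0

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now < entry[0]:
            return entry[1]                # in cache and still valid
        # Miss or expired: ask the backend, then remember the answer.
        self.backend_hits += 1
        body = self._fetch(path)
        self._store[path] = (now + self._ttl, body)
        return body

    def purge(self, path: str) -> None:
        """Drop one page so the next request is regenerated."""
        self._store.pop(path, None)
```

With a simulated clock and one request per second for a minute, the backend is only asked twice (at second 0 and second 30), which matches the worst case described above. Calling `purge("/")` forces the next request for that page back to the backend without touching anything else in the cache.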

But then I started thinking… That caching frontend doesn’t have to be local! Pick up a machine in another data center as a ‘reverse proxy’ for your site. Viewers hit that, and it will keep an updated page in its cache. Pick a server up when someone’s having a sale and set it up.

But then, you can take it one step further, and pick up boxes to act as your caches in multiple data centers. One on the East Coast, one in the South, one on the West Coast, and one in Europe. (Or whatever your needs call for.) Use PowerDNS with GeoIP to direct viewers to the closest cache. (Indeed, this is what Wikipedia does: they have servers in Florida, the Netherlands, and Korea… DNS hands out the closest server based on where your IP is registered.) You can also keep DNS records with a fairly short TTL, so if one of the cache servers goes offline, you can just pull it from the pool and it’ll stop receiving traffic. You can also use the cache nodes themselves as DNS servers, to help make sure DNS is highly redundant.
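Here is a rough sketch of the "hand out the closest cache" decision, assuming you already have a latitude and longitude for the viewer (in real life that would come from a GeoIP database, which I'm not modeling here). The node names and coordinates are invented for illustration, and this is plain Python, not PowerDNS configuration.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical cache locations, chosen only for illustration.
CACHE_NODES: Dict[str, Coord] = {
    "east": (40.71, -74.01),    # roughly New York
    "south": (32.78, -96.80),   # roughly Dallas
    "west": (37.77, -122.42),   # roughly San Francisco
    "europe": (52.37, 4.90),    # roughly Amsterdam
}


def great_circle_km(a: Coord, b: Coord) -> float:
    """Haversine distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_node(viewer: Coord, healthy: Dict[str, Coord]) -> str:
    """Name of the healthy node nearest to the viewer."""
    return min(healthy, key=lambda name: great_circle_km(viewer, healthy[name]))
```

Pulling a dead node from the pool is just a matter of leaving it out of the `healthy` dictionary: a viewer near Boston gets "east" normally, and falls over to the next-nearest node if "east" is removed. Geographic distance is only a stand-in for network distance, so treat the result as a first guess.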

It seems to me that it’d be a fairly promising idea, although I think there are some potential kinks you’d have to work out. (Given that you’ll probably have 20-100ms latency in retrieving cache misses, do you set a longer cache duration? But then, do you have to wait an hour for your urgent change to get pushed out? Can you flush only one item from the cache? What about uncacheable content, such as when users have to log in? How do you monitor many nodes to make sure they’re serving the right data? Will ISPs obey your DNS’s TTL records? Most of these things have obvious solutions, really, but the point is that it’s not an off-the-shelf solution, but something you’d have to mold to fit your exact setup.)

Aside: I’d like to put nginx, lighttpd, and Apache in a face-off. I’m reading good things about nginx.

Flip-Flopper

While looking at job postings on Craigslist, I somehow ended up at Craig Newmark’s personal blog. (He founded Craigslist.) He linked to this YouTube video.

For some reason I’m reminded of Hillary’s tall tale of ducking sniper fire. I respect McCain as a decorated soldier and a Senator who understands bipartisanship, but after eight years of the Commander in Chief misleading* us about Iraq, I’d really prefer to not elect a president who will do more of the same.

* Whether it was intentional lies or him being fed false information I’m not sure, but either way, we were misled.

Broken Windows

Last night we were unloading a shopping cart. When done, the place to put it away was pretty far away. But there were about ten other shopping carts littering the parking lot nearby, so I said, “Meh, what’s one more?”

As we got in the car, I proclaimed, “Broken Windows in action!” I think people were confused and assumed I was referring to a literal window which was broken. Instead, I was referring to the Broken Windows Theory, which is an interesting read. The basic premise is that researchers watched an abandoned warehouse. For weeks, no one vandalized the building. One day, one of the researchers (deliberately) broke one of the windows. In short order, vandals knocked out the rest of the windows. The theory is used a lot in policing, but I think it has applications in many other places. Such as parking lots: if you’re diligent in bringing in carts, I’d argue that you’d avoid people doing what I did. (I also felt the same way at the bowling alley: if we frequently picked up candy wrappers and popcorn from the floor, the place seemed pretty clean. If we slacked, it felt like the place was being trashed by everyone in short order.)

The theory does have its detractors, but it also has strange people who see applications of the theory in parking lots. Enjoy the photo of chives, which have nothing to do with anything, but I just took it and I like it.

Chives

Takin’ Care of Children

From the first two pages of today’s Nashua [NH] Telegraph:

  • A firefighter in Concord was commended after saving the lives of two children in a trailer home. He explained that he drove by and saw smoke pouring out of an attached shed, with the father of the children attempting to extinguish it with a garden hose. The firefighter ran into the burning house and got the kids–who were in their cribs–out.
  • A 20-year-old man was arrested in Derry for allegedly sexually assaulting a 4-year-old. Strangely, he was arrested in the library. He was reading, of all things, a book called Encyclopedia of Rape. Just… wow, dude, you have problems.
  • A man in Brentwood was arrested for beating his 6-month-old son, apparently breaking “more than two dozen bones” on the boy. At his sentencing, his wife spoke saying that he had never hurt the boy.
  • On the front page, a babysitter was charged with 26 counts each of kidnapping and child endangerment, and felony theft. Before you think she had a warehouse of babies she’d kidnapped, she was actually outsourcing her babysitting. She’d agree to babysit the kid, and then find other babysitters on Craigslist to provide the care. Unfortunately for her, “personal contracts” cannot be assigned/transferred, and I’m pretty sure that child care, unlike lawn mowing, counts as a personal contract. However, I’d contend that kidnapping is a reach, and felony theft is utterly wrong: the services were received, just provided by someone else. It might be a tort, but I’m not sure that subcontracting can be considered theft, even if subcontracting wasn’t permissible. Child endangerment, though, is probably a pretty fitting charge. The parents have a brief statement encouraging other parents to do random, unannounced visits of their child to make sure that they’re actually there. I suppose, in this case, it’d have been appropriate, but it seems somewhat preposterous that one would have to do that, especially since most people hire babysitters because they’re unable to be there.

Also, is there any crime that isn’t committed on Craigslist? Browse around enough and you’ll find flagrant prostitution and drug sales. (You’ll also periodically see people arrested for this stuff in the newspaper… I’d love to work in a police department’s “Craigslist Division,” which might just be a full-time job.) Apparently you now have babysitting-outsourcers.

Missing the Point

This comic was pretty funny, and the age/2 + 7 formula got tossed around a lot by my roommates.

Of course, it gives us the minimum age one can date without being creepy. At 22, it’s [(22/2) + 7], or 18. (I, however, maintain that this discrepancy would, in fact, be creepy.)

But what about the upper age limit? The formula itself is silent on this, but we can easily do some substitution to make it work. If the minimum acceptable age (“M”) is your own age (“A”) divided by two, plus 7, we get:

M = A/2 + 7

We typically solve for M, knowing A. However, the oldest person I could date would have my A as their M, e.g.:

22 = A/2 + 7

With this realization, it’s a simple Algebra 1 question. Subtract 7 from both sides and then multiply by two.

Thus, the maximum age one can date is 2(a-7), where a is your age. For me, it’d be 2(22-7), or 30.

What interests me, though, is that this means I’m allowed to go back four years, but forward eight, within the margin of creepiness.

I built a spreadsheet for people aged 1 to 100 showing this and various other statistics. It’s online here as an HTML document. A few interesting trends emerge that aren’t intuitively obvious working with just the formulas:

  • The formula doesn’t make any sense below age 14.
  • Age 14 is a sort of ‘identity,’ when you’re first able to start non-creepily dating people, apparently, without breaking any laws of mathematics. At age 14, you can’t date anyone older, nor younger, than 14.
  • From there on out, every year you age adds 0.5 to the minimum age you can date, while adding 2 to the maximum age. Thus at 22, I can date 18-30. When I turn 23, my new range will be 18.5 to 32. (At age 100, you can date anyone between 57 and 186. Because dating anyone over 186 would definitely be creepy.)
  • As you can see, the two don’t grow at the same speed; the upper age grows four times as fast as the lower age. An interesting side-effect of this is that this means that, as time goes on, your age becomes radically different than the median age. By the time you reach 100, you’re 21.5 years younger than the median age of people you can date.
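For anyone who would rather not open a spreadsheet, the same numbers fall out of a few lines of Python. This is only a sketch of the two formulas above (M = A/2 + 7 for the minimum, and 2(a-7) for the maximum); the function names are my own, and returning `None` below 14 is just my way of encoding "the formula doesn’t make any sense."

```python
from typing import Optional, Tuple


def min_dateable_age(age: float) -> float:
    """Youngest non-creepy partner: half your age plus seven."""
    return age / 2 + 7


def max_dateable_age(age: float) -> float:
    """Oldest non-creepy partner: the age whose minimum is your own age."""
    return 2 * (age - 7)


def dating_range(age: float) -> Optional[Tuple[float, float]]:
    """(minimum, maximum) partner age, or None where the range is empty."""
    low, high = min_dateable_age(age), max_dateable_age(age)
    if low > high:       # happens for every age below 14
        return None
    return (low, high)


def midpoint_gap(age: float) -> float:
    """How far the middle of your range sits above your own age."""
    low, high = min_dateable_age(age), max_dateable_age(age)
    return (low + high) / 2 - age
```

`dating_range(22)` gives 18 to 30, `dating_range(14)` collapses to exactly 14, and `midpoint_gap(100)` comes out to 21.5, matching the bullets above.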

I Can’t Take It!

Rusty and I were just talking about the recent decision by the Democratic party and how we’re going to count delegates from the two states, which has left both sides somewhat unhappy.

But then we kind of realized that no one is talking about the real issues. I don’t particularly care how we seat delegates. The whole system sucks, and I hope after 2008 is over we can overhaul the way the DNC works. And I kind of had an epiphany: I feel like I’m trapped in this country, a faded emblem that used to be a beacon of prosperity and freedom.

Let’s talk about some things that actually matter.

  • I paid $53 to put gas in my car yesterday. It’s increasingly tempting to get a hybrid, but they’re in short supply. Not because they’re in high demand (though they are), but because not many are produced. American auto’s only hybrid seems to be the Ford Escape hybrid. (I refuse to count GMC’s “greenest” SUV that gets 20MPG.) A question on Ask MetaFilter today called my attention to the fact that they’re basically impossible to get, with the dealer he went to telling him flat-out that they wouldn’t order one for him. BTW, Ford just announced a $3 billion plant in Mexico.
  • We are the only civilized country in the world that doesn’t have universal health care. Americans are running into massive debt because they got sick. The typical response, beneath it all, seems to be a survival-of-the-fittest mentality that if you get cancer and go bankrupt paying for your treatment, it sucks to be you. Attempts to reform the system are consistently subverted by cries of “socialized medicine” without ever presenting a legitimate claim, just the catch phrase. (And there’s a good point to be made about how this is costing us huge money in less-obvious areas.)
  • If you come to see homosexuality as something that isn’t ‘wrong’ or ‘bad,’ opposition to gay marriage seems appallingly bigoted. I really don’t think opposing gay marriage is any different than opposing interracial marriage.
  • College is $40,000 a year. Schools throughout our country are failing. To quote, well, everyone, No Child Left Behind has left plenty of people behind.
  • Veterans are returning home and getting next to no support, or staying in ramshackle hospitals. Support our troops! Anyone? Those who oppose sending young Americans—my peers; people I went to school with; maybe me if I was born into a different family—to die in someone else’s civil war are branded as unpatriotic and not supporting our troops by the same people who can’t be bothered to waste money caring for our returning soldiers?
  • The United States economy is tanking. It probably has something to do with the fact that our schools are being surpassed by countries around the globe, that our post-9/11 xenophobia has resulted in immigration policies forcing college students who come here from abroad to leave our country, and that our health care costs are through the roof.

The thing is, I really love this country. But all around me I see signs of our great nation crumbling. At times I almost feel trapped. Can we please stop focusing on the things Republicans and Democrats disagree on, and instead work on getting things done? We all love America, want our troops to be cared for, want our schools to be the best, want to get treated in hospitals, and want our economy to thrive. Working with two parties seems to keep us from ever getting anything done, because all we can ever do is disagree. But why does it have to be that way? We all want the same things deep down. Can’t we take our different viewpoints and use them to our advantage, crafting solutions that appease both of us?

Damnum absque injuria

I was pleasantly surprised by what my little 55-200mm Sigma can do! I’ve noticed that if you’re not exacting in aligning the polarizer, you lose a lot of contrast, BUT it’s very easily fixed in Photoshop. I’ve also noticed that, short of focus problems, most everything is easily fixed in Photoshop. (I’ve stopped thinking of the images out of the camera as the final product, really.)

Out!

Shot at ISO 1600, with less noise than I’d expected, even after ‘lighting up’ the shadows a bit in Photoshop. There’s noise if you look for it at high resolutions, but I’d forgotten that 1600 can be quite usable.

As I mentioned at the top of the post, I’ve started doing a lot of post-processing in Photoshop. It’s something I hadn’t really been tuned into until I started doing a lot of photo enhancement, but a lot of images have a sort of ‘haze’ to them. (Shooting through a window, or shooting through a misaligned polarizer, will do this… But some cameras with crappy metering equipment do this on their own.) That’s easily fixed with Levels. Some images aren’t quite as tack-sharp as they should be, which can also be tweaked in Photoshop. Even the best cameras have imperfect dynamic ranges, leaving some details in darker areas obscured, and brighter portions overexposed (“blown out”). So my workflow (that’s a major buzzword right there) is to align images (rotate as needed, and adjust any that have sloping horizons), perform a Shadows & Highlights enhancement (CS2 and newer, I believe, have this feature, which is invaluable!), adjust Levels, and then apply an unsharp mask (I’ve been tending towards Smart Sharpen, 55% over a 1-pixel range, but it gets tweaked as needed). Periodically I’ll play with Variations to get colors just right, and boost (or tone down, depending) an image’s saturation, but that’s only as-needed.

IMG_2766

That’s straight out of the camera. Not necessarily a bad picture, though a bit underexposed for my liking. (I’d gotten a batch of slightly overexposed shots, so I set it to underexpose slightly, which ended up being a mistake.) But here it is after 60 seconds in Photoshop:

Huh

It’s a striking difference: the apparent ‘haze’ has been lifted, and the image is brighter (properly exposed!) and sharper.

It’s really not a great shot, but I tried the obligatory HDR shot:

MerchantsAuto.com Stadium

It’s an okay shot, but I think it’s a case where HDR really isn’t appropriate. It ends up being a very busy shot, and the very bright (very saturated!) colors in the crowd end up drawing attention away from the batter.

Nashua Fishercats Panorama

I’m also becoming a fan of panoramas. I’m glad Mr. T recommended Windows Live Photo Gallery or whatever it’s called; it’s worked pretty well. This ended up being a GIGANTIC photo (15297×1263 pixels, and that’s AFTER a very heavy crop, since my images didn’t line up that well, leaving huge black areas on the top and bottom). The downside is that there’s really no good way to view it; Flickr’s next size up (if you click through) is 1024×85 — 1024 pixels is a good width, but an image 85 pixels tall is practically useless. After that is the original, which I don’t recommend unless you have a fast connection and a lot of time to scroll around.

Anyway, it was fun… We left at the close of the 6th inning because it was getting late, but we (Manchester Fishercats) were losing 8 to 14. But I got some good pictures.

SLRs

I think the best thing about SLRs isn’t their elimination (well, exponential reduction) of shutter lag, nor the support for high ISOs, or even advanced exposure and metering modes. It’s that even at relatively high f-numbers (f/5.6), you can keep a shallow depth of field. Consider this photograph:

Something-flower

(Does anyone know what type of flower this is, BTW?) The photo wouldn’t be half as good if everything were in focus, as a normal camera would have rendered it. But by throwing the distracting (and ugly!) background out of focus, the shot comes out a lot better. I don’t entirely love the depth of field on this one; I wish you could see a little more of the plant clearly (which would have required that I stop the lens down a bit more), but I also wish the background were even further out of focus (which would have required that I open up the lens a bit more). BTW, a little bit of HDR going on here, as it wasn’t the best lighting.

mIMG_2571

There’s another example. Too shallow, or at least, I should have manually selected the autofocus sensor to use one on the left, so that all the caterpillars were in focus. But the background (green and purple bushes) is pleasantly blurred, keeping your attention on the tree.

m-IMG_2593

Here I totally disregarded the rule of thirds. I like it anyway. The other leaves were pretty nearby, so they’re only slightly out of focus. But again, it draws your attention in closer.

m-IMG_2596

There’s the best example. The trees in the background were across the street, and thus extremely out of focus. The camera focused on the leaves, which are tack sharp.

And now, I’m going to go finish mowing the lawn. There were just too many photo opportunities I noticed… 😉

Although I’m attending a Fishercats game tonight… It’ll be my first time with an SLR there. Let’s see how that goes.

Enough Already!

I used to have a great deal of respect for Bill Clinton. Sure, the Monica Lewinsky scandal wasn’t so great, and I was pretty peeved at how many people he pardoned on his way out. But overall, I thought he did a good job, and his continuing work on charitable causes painted a picture of a man who truly cares about helping the world.

The more he campaigns for his wife, the more I think he’s a loose cannon who may have developed mad cow. Besides all the occasions of him flipping out, this news article (with a ludicrously long URL) says it all. His Sunday speech in South Dakota accused the media of some sort of vast conspiracy against his wife, and suggested that McCain will win if Hillary isn’t nominated. (Which is odd since most polls I’ve seen suggest the opposite: we need Obama if we’re going to beat McCain.)

I also used to think it was premature and tasteless to try to suggest that she had to withdraw from the campaign. I’d have loved to have seen her withdraw and give her approval to Obama, but I thought it was inappropriate for people to call on her to do so. But there are increasing calls for her to do just that, and I think the time has come. Unless she finishes big in June, she’s going to seal her fate, and it’s important that she, as the linked article says, withdraws while she still has some dignity left. Between her husband’s increasingly paranoid-sounding angry speeches, and her comment about Bobby Kennedy being killed*, which still hasn’t blown over, she’s already attracting a lot of negative sentiment, and I don’t think it’s going to get any better. I can’t find the link, but a few vocal people in New York are suggesting that, if/when she loses and goes back to being a Senator, she’s going to have a lot of wounds to heal first.

As Wikipedians would say, it’s time for her to withdraw under WP:SNOW. Wait until the next round of elections, but if she doesn’t finish big, it’s time she gracefully withdraws and urges her followers to cut out the business of promising to vote for McCain if she doesn’t get the nomination. Otherwise, she’s going to fracture the Democratic party, humiliate herself, put John McCain in office, and be hated for 20 years.

* While campaigning in New Hampshire, someone speaking at an event before she arrived said something to the effect of, “Some have compared Obama to JFK… But let’s not forget what happened to him.” Hillary seemed to be genuinely horrified when she was told about the remark, but still… Double references to Kennedys being killed, both times insinuating that the same might just happen to Obama…?