Amazon S3

I really didn’t pay it that much attention, or think about its full potential, at the time it was released. But Amazon’s Simple Storage Service (hence the “S3”) is really pretty neat. In a nutshell, it’s file hosting on Amazon’s proven network infrastructure. (When have you ever seen Amazon offline?) They provide HTTP and BitTorrent access to files.

Their charges do add up — it might cost a few hundred dollars a month to move a terabyte of data and store 80GB of content. But then again, the reliability (and scalability!) is probably much greater than what I can handle, and it’s apparently much cheaper than it would be to host it with a ‘real’ CDN service.
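
To see how a bill in that range adds up, here is a back-of-the-envelope calculator. The default rates are an assumption, roughly S3’s 2006 launch prices of $0.15 per GB-month stored and $0.20 per GB transferred; request fees and volume discounts are ignored, so treat the result as a ballpark only.

```python
# Back-of-the-envelope monthly S3 bill: storage plus data transfer.
# ASSUMPTION: default rates approximate S3's 2006 launch pricing;
# request fees and tiered discounts are ignored.

def monthly_s3_cost(stored_gb: float, transferred_gb: float,
                    storage_rate: float = 0.15,
                    transfer_rate: float = 0.20) -> float:
    """Estimated dollars per month.

    storage_rate  -- dollars per GB stored for one month (assumed)
    transfer_rate -- dollars per GB transferred (assumed)
    """
    return stored_gb * storage_rate + transferred_gb * transfer_rate


if __name__ == "__main__":
    # 80 GB stored, 1 TB (1,024 GB) served in a month.
    print(f"${monthly_s3_cost(80, 1024):.2f}")  # prints $216.80
```

With those assumed rates, nearly all of the roughly $217 is transfer; storage is only $12 of it.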

Sadly, I can’t think of a good use for this service. I suppose the average person really doesn’t need to hire a company to provide mirrors of their files for download. (It would make an awesome mirror for Linux/BSD distributions, but I think the typical mirror is someone with a lot of spare bandwidth and an extra server, not someone paying hundreds a month to mirror files for other people… I wonder if there’s a market for a ‘premium’ mirror service? I doubt it, since the existing ones seem to work fine.)

Desk Cleaning for the OCD

I tend to border on obsessive-compulsive. I wax the wheels on my car. I’m still not happy with the enormous performance gains I’ve gotten by tuning my WordPress setup to move from 4 to 400 (dynamic) pages per second.

So it stands to reason that how I clean my desk is… unique. The problem is that the desk is synthetic wood, so there exist a total of zero cleaning products that work on it. Using something like Windex cleans it but leaves it looking even more dull. And don’t even think about using a wood cleaner. I used Pledge once. It looked great, but it remained slippery for about a week. The Pledge is apparently supposed to sink into the wood. This is somewhat difficult when the desk isn’t made of wood, so you instead get an incredibly slick, oily desk.

I think I finally found the key, though: car wax. You clean with one of the many cleaners just to get junk off of it–I used Simple Green, but Windex would work just as well. And then you pour liquid car wax on and apply it with a cloth as if you were applying it to your car. Wait a while for it to dry, and wipe the white hazy stuff off. The result is a high sheen. It didn’t do as well as I’d like with filling in the scratches, but it looks much better overall.

I admit it: I’m a total dork. I just waxed my desk. But… It works.

Thinking

My mind works in strange ways sometimes. Read and think about each of the following statements:

  • I was cooking a pizza in the oven at 250 degrees, but I was in a big hurry, so I doubled the temperature to 500 degrees.
  • I miss the summer days when it was 80 degrees, and, overnight, the temperature would be halved to 40 degrees.
  • It was ten degrees the other morning, and tripled to thirty by noon.
  • It was 0.1 degrees the morning before that, and had risen three hundredfold to 30 degrees by noon.
  • It was -1 before I woke up that morning, so it was -30 times as warm by noon.

To me, it makes progressively less and less sense. But I’m trying to think of why. The ratio clearly blows up near 0 degrees: if it’s exactly 0 degrees and grows to 0.1 degrees, it’s “infinitely warmer.” Of course, most people wouldn’t notice the tenth of a degree increase, and my concept of “infinitely warmer” is something significantly warmer than 0.1. And it doesn’t make any sense when you go into negatives. I think another part of the problem is that “zero” degrees doesn’t mean “zero warmth,” since it doesn’t make sense to have a negative amount of warmth. (Assuming that “no warmth” isn’t neutral, but is absolute zero.) Of course, Fahrenheit and Celsius don’t even share the same zero point or the same degree size, compounding things further.
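
One way to make the ratios behave is to measure from absolute zero, as the parenthetical suggests. For Fahrenheit that means the Rankine scale (degrees F + 459.67); for Celsius it’s Kelvin (degrees C + 273.15). A small sketch that applies this to the statements above; the function names are just for illustration:

```python
# Compare "how many times warmer" using absolute temperatures, where
# zero really does mean "no warmth" (absolute zero).

def f_to_rankine(f: float) -> float:
    """Fahrenheit to Rankine: same degree size, zero at absolute zero."""
    return f + 459.67


def c_to_kelvin(c: float) -> float:
    """Celsius to Kelvin: same degree size, zero at absolute zero."""
    return c + 273.15


def warmth_ratio_f(start_f: float, end_f: float) -> float:
    """How many times 'warmer' end_f is than start_f, on an absolute scale."""
    return f_to_rankine(end_f) / f_to_rankine(start_f)


if __name__ == "__main__":
    cases = [(250, 500), (80, 40), (10, 30), (0.1, 30), (-1, 30)]
    for start, end in cases:
        print(f"{start}F -> {end}F: x{warmth_ratio_f(start, end):.2f}")
```

On this view, “doubling” the oven from 250 to 500 only makes it about 1.35 times as hot, and going from -1 to 30 is roughly a 7% increase rather than “-30 times as warm.” Because a Rankine degree is exactly 1.8 Kelvin, the ratio comes out the same on either absolute scale.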

YouTube

One of the many things I try to shy away from is making generalizations. They’re often harmful and downright inaccurate.

But one generalization I do feel comfortable making is that the comments on YouTube are among the worst I’ve ever seen. Even the few that are coherent tend to contain egregious grammar problems. I’m not talking about a missing comma. im talkin about like riteing like this i mean its so dummm why do they do this its like their never lurnd 2 right

Those are the good ones. The bad ones are offensive, pointless (“i like this video so much1!111”), or just downright bizarre. (In the video for one of my favorite songs, you barely see The Killers at all, yet someone left a comment that they love videos like this one where you can see the band playing the whole time.)

I want to know why this is the case. Other sites (Digg, Slashdot) have their share of dumb comments, but YouTube is notoriously bad. Hilariously so. Except it’s gone way past hilarious, to the point of being irritating and kind of depressing. Is it a demographic thing? Is it swamped by 13-year-olds? (With apologies to 13-year-olds, who probably far exceed the average commenter on YouTube.) Is it a broken windows type thing, where people leave stupid comments because everyone else does?

YouTube recently implemented a rating system, where you can give a thumbs-up or thumbs-down to comments. Good idea. Except it really doesn’t work! For one, they made my classic mistake, but in reverse: they clearly never tested in Firefox (well, Flock or Firefox 3, but Flock is basically Firefox with some more addons and a fancy theme). But that’s not my point. A comment might be voted up or down a couple points, but that’s all. There’s no suppression of comments, and the comments remain in chronological order, so comment moderation is pretty pointless.
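
The complaint suggests what the votes should be used for: hiding badly rated comments and sorting the rest by score. A minimal sketch, assuming each comment carries a simple net score; the `Comment` type and the `HIDE_BELOW` threshold are hypothetical, not anything YouTube actually exposes:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Comment:
    text: str
    posted: int  # posting time, e.g. a Unix timestamp
    score: int   # thumbs-up minus thumbs-down


# HYPOTHETICAL cutoff: comments scoring below this are hidden.
HIDE_BELOW = -3


def moderate(comments: List[Comment], hide_below: int = HIDE_BELOW) -> List[Comment]:
    """Hide poorly rated comments, then show the best-rated first.

    Ties fall back to chronological order (oldest first).
    """
    visible = [c for c in comments if c.score >= hide_below]
    return sorted(visible, key=lambda c: (-c.score, c.posted))
```

A real system would also need to avoid burying brand-new comments that nobody has voted on yet, but even this much would make the votes do something.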

Geekostat

Disclaimer: I can tell right now that this is one of those late-night posts where I should be sleeping, not posting about a technical topic. But these not-entirely-lucid ones are sometimes the most fun to read.

I consider myself extremely tech-savvy. I can build a computer from parts, make my own Ethernet cables, run some performance tuning on interactive websites, write applications in numerous programming languages (as well as SQL and HTML), and much more.

But I still don’t get our digital thermostat. It’s programmed to go down to 58 at night, come up to 67 on weekends and from something like 6 to 9 a.m. and 3 to 9 p.m. on weekdays. In other words, when people are home.

Of course, me being home on vacation isn’t quite compatible with this. There’s a simple override, where you can hit the up or down arrows to set it to a temperature. While I use (and appreciate!) this, it’s also a pain. It’s really no fun waking up and having it be 58. I’d really like to reprogram it to automatically come up to 63 or so around 10:30.
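
Reprogramming it is really just editing a small table. Here’s a sketch of how such a schedule could be represented: each day type is a list of (start time, setpoint) pairs, and a setpoint stays in effect until the next entry starts. The weekday times mirror the ones above; treating weekends as 67 from 6 a.m. to 9 p.m. is an assumption, and all the names are hypothetical.

```python
from datetime import datetime, time
from typing import List, Tuple

Program = List[Tuple[time, int]]  # (start time, setpoint in deg F), sorted by time

WEEKDAY: Program = [(time(0, 0), 58), (time(6, 0), 67), (time(9, 0), 58),
                    (time(15, 0), 67), (time(21, 0), 58)]
WEEKEND: Program = [(time(0, 0), 58), (time(6, 0), 67), (time(21, 0), 58)]  # assumed hours


def setpoint(now: datetime, weekday: Program = WEEKDAY,
             weekend: Program = WEEKEND) -> int:
    """Return the scheduled setpoint at `now` (Saturday/Sunday use `weekend`)."""
    program = weekend if now.weekday() >= 5 else weekday
    current = program[0][1]
    for start, temp in program:
        if now.time() >= start:
            current = temp  # the latest entry that has already started wins
        else:
            break
    return current


# The vacation tweak: come up to 63 around 10:30 on weekdays.
VACATION_WEEKDAY: Program = sorted(WEEKDAY + [(time(10, 30), 63)])
```

With the extra 10:30 entry, a weekday late morning comes up to 63 instead of sitting at 58 until 3 p.m.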

I still don’t get why the whole thing isn’t on the LAN. This would have two obvious benefits right out of the gate–it’d be much easier to configure (even if you let someone with no clue about usability design the GUI, it’ll be better than the myriad knobs, switches, and buttons on our thermostat!), and it’d be more convenient in many cases to pull up a new tab in your web browser than to walk down the hall to the thermostat. (Plus, the thermostat is in my parents’ bedroom. I’d have loved to turn the heat up a few degrees around 11 tonight, since it’s 9 degrees outside and almost as cold inside. But something tells me they really wouldn’t have appreciated it.)

I’m also not sure that the ‘simple’ thermostat algorithm is that efficient. You figure it works something like:

while (true) {
    $temp = getTemperature();
    $desired = readDial();
    if ($temp < $desired) $furnace->enable();
    if ($temp > $desired) $furnace->disable();
}

When we view it at ‘computer speed,’ I think we can see one of the basic problems: in theory, the furnace could start flapping, where on one loop iteration it turns the furnace on, and just a fraction of a second later, it turns it off. I don’t profess to know a lot about the overhead in starting a furnace, but I’d imagine that it’s most efficient to let it run for a few minutes.

I think a much better system would be to have a programmed minimum run time: if the furnace is turned on, we should run it for at least 5 minutes. After 5 minutes, we again evaluate the temperature: if it’s at the target, we turn it off. If not, we drop into a quicker polling, maybe once every minute. Incidentally, this is much better for the thermostat’s processor, but if its sole purpose is determining whether to turn something on or off, no one really cares about minimizing overhead.
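
Here’s a sketch of that minimum-run-time idea. Instead of spinning in a loop, it’s a function you call whenever a check is due, and it returns how long to wait before the next one. The five-minute minimum and the once-a-minute re-check come from the paragraph above; the one-minute check while the furnace is off is an assumption, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MIN_RUN_S = 5 * 60  # once started, run at least 5 minutes
RECHECK_S = 60      # afterwards, re-check once a minute
IDLE_CHECK_S = 60   # how often to check while off (assumption)


@dataclass
class FurnaceController:
    """Hypothetical controller with a minimum run time to prevent flapping."""
    furnace_on: bool = False
    started_at: Optional[float] = None  # seconds; when the furnace last started

    def step(self, now: float, temp: float, desired: float) -> float:
        """Decide whether the furnace should run; return seconds until next check."""
        if not self.furnace_on:
            if temp < desired:
                self.furnace_on = True
                self.started_at = now
                return MIN_RUN_S
            return IDLE_CHECK_S
        assert self.started_at is not None
        elapsed = now - self.started_at
        if elapsed < MIN_RUN_S:
            # Too early to stop, even if the target has already been reached.
            return MIN_RUN_S - elapsed
        if temp >= desired:
            self.furnace_on = False
            self.started_at = None
        return RECHECK_S
```

A temperature hovering right at the setpoint no longer toggles the furnace on every pass: once it starts, it stays on for the full five minutes. Many thermostats attack the same problem with a deadband (hysteresis), switching on a bit below the setpoint and off a bit above it; the two approaches combine well.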

So you give it a secondary purpose: handling a TCP/IP stack and a basic webserver! All of a sudden, instead of an infinite loop, you run a tiny bit of code every 30 seconds.
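
As a rough illustration of that secondary purpose, here is a sketch using only Python’s standard library: a timer re-runs the control check every 30 seconds, while a tiny web server reports the current state as JSON. Everything here is hypothetical: `STATE` stands in for real sensor and schedule readings, and `control_tick` is a placeholder where the real control logic (such as a minimum run time) would go.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# HYPOTHETICAL shared state; a real thermostat would fill this in from its
# temperature sensor and its schedule.
STATE = {"temp_f": 58.0, "setpoint_f": 63.0, "furnace_on": False}
LOCK = threading.Lock()
CHECK_INTERVAL_S = 30


def control_tick() -> None:
    """Placeholder control check; re-arms itself every CHECK_INTERVAL_S."""
    with LOCK:
        STATE["furnace_on"] = STATE["temp_f"] < STATE["setpoint_f"]
    timer = threading.Timer(CHECK_INTERVAL_S, control_tick)
    timer.daemon = True  # don't keep the process alive just for the timer
    timer.start()


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        with LOCK:
            body = json.dumps(STATE).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the first control check now, then serve status pages forever."""
    control_tick()
    HTTPServer(("", port), StatusHandler).serve_forever()
```

Calling `serve()` starts both; the processor sits idle except for a few lines every 30 seconds and whenever someone loads the page.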

You can also generate some interesting statistics. For example, how long does the furnace need to run to raise the temperature one degree? How does this scale–if you want to raise it three degrees, does it take three times as long? How does the temperature of my house look when graphed across a day? How about telling me how long the furnace ran yesterday? And, given information about my furnace’s oil consumption and our fuel costs, it’d be cool to see how much it’s costing. And it could give us suggestions: “If you drop the temperature from 68 to 67, you’ll save $13.50 a month,” or such. This would require some storage, but a gig of solid-state media (e.g., a camera’s SD or CF card) is around $10-20 now. Plus, with the advent of AJAX, you can push some of the processing off to the client–let the client use a Flash applet or some good JavaScript to draw the graphs if the thermostat is underpowered!
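
Most of those statistics are simple arithmetic once the thermostat keeps a log. A sketch, assuming a hypothetical log where each furnace run is recorded as (minutes the furnace ran, degrees the house gained); the oil burn rate and price are parameters you’d have to supply, not known values:

```python
from typing import Dict, List, Tuple

Run = Tuple[float, float]  # (minutes furnace ran, degrees gained): hypothetical log format


def minutes_per_degree(runs: List[Run]) -> float:
    """Average furnace minutes needed to raise the house one degree."""
    total_minutes = sum(minutes for minutes, _ in runs)
    total_degrees = sum(degrees for _, degrees in runs)
    return total_minutes / total_degrees


def scaling_by_size(runs: List[Run]) -> Dict[float, float]:
    """Minutes per degree, grouped by how many degrees each run raised.

    If heating scaled linearly, every group would show about the same value.
    """
    groups: Dict[float, List[Run]] = {}
    for run in runs:
        groups.setdefault(run[1], []).append(run)
    return {deg: minutes_per_degree(rs) for deg, rs in sorted(groups.items())}


def oil_cost(run_minutes: float, gallons_per_hour: float,
             dollars_per_gallon: float) -> float:
    """Cost of running the burner for run_minutes, given user-supplied rates."""
    return run_minutes / 60 * gallons_per_hour * dollars_per_gallon
```

Comparing the per-size values answers the “three times as long?” question directly. Suggestions like the $13.50 one would additionally need a model of how fast the house loses heat, which this sketch doesn’t attempt.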

In conclusion, I’m freezing.

Idea

Why isn’t there a really good “network appliance” as a network gateway? You can get a low-end firewall/router, or you can build your own machine.

Setting up OpenBSD is no walk in the park, though. I want to build an “appliance” based on OpenBSD, and give it a nice spiffy web GUI. You buy the box, plug one side into your switch and one side into your cable modem or whatnot, and spend ten minutes in a web browser fine-tuning it. I was really fond of the appearance of the Cobalt Qube, although it could be made much smaller. And throw a nice LCD on the front with status. You can run a very low-power CPU, something like the one powering these. It really doesn’t need more than 512MB RAM, but give it a small solid-state drive. And a pair of Gigabit cards, not just for the speed, but because GigE cards usually are much higher-quality. In building routers, the quality of your card determines how hard the CPU has to work.

There’s so much that a router can do. You can run a transparent caching proxy, a caching DNS server, priority-based queuing of outgoing traffic (such as prioritizing ACKs so downloads don’t suffer because of uploads, or giving priority to time-sensitive materials such as games), NAT, an internal DHCP server, and, of course, a killer firewall. You can also generate great graphs of things such as bandwidth use, blocked packets, packet loss, and latency. You can regulate network access per-IP or per-MAC, and do any sort of filtering you want. It could also easily integrate with a wireless network (maybe throw a wireless card in, too!), serving as an access point and enabling features like permitting only certain MACs to connect, requiring authentication, or letting anyone in but requiring that they sign up in some form (a captive portal). And I really don’t understand why worms and viruses spread so well. It’s trivial to block most of them at the network level if you really monitor incoming traffic.

I’m frankly kind of surprised that nothing of this level exists. I think there’s a definite market for quality routers. A $19 router does the job okay, but once you start to max out your connection, you’ll really notice the difference! A good router starts prioritizing traffic, so your ssh connection doesn’t drop and your game doesn’t lag out, but your webpages might load a little slower. An average router doesn’t do anything in particular and just starts dropping packets all over the place, leaving no one better off. (And a really bad router–our old one–seems to deal with a fully-saturated line not by dropping excess packets or using priority queueing, but by rebooting itself, leaving everyone worse off… I think this may have had to do with the duct tape.)
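
To make the “good router” behavior concrete, here’s a toy outbound queue with strict priorities: the most important packet always goes out first, and when the buffer is full, the least important packet is the one dropped. This is a simplified illustration, not how pf’s queueing disciplines work internally; the class name and priority numbers are hypothetical.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple


class PriorityLink:
    """Toy outbound queue. Lower number = more important."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._heap: List[Tuple[int, int, Any]] = []
        self._seq = itertools.count()  # keeps FIFO order within a priority

    def enqueue(self, priority: int, packet: Any) -> Optional[Any]:
        """Queue a packet. Returns whichever packet got dropped, or None."""
        entry = (priority, next(self._seq), packet)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return None
        worst = max(self._heap)  # lowest priority, newest arrival
        if entry < worst:
            # The newcomer outranks something queued: evict that instead.
            self._heap.remove(worst)
            heapq.heapify(self._heap)
            heapq.heappush(self._heap, entry)
            return worst[2]
        return packet  # buffer is full of equal-or-better traffic: drop newcomer

    def dequeue(self) -> Optional[Any]:
        """Send the most important queued packet next."""
        return heapq.heappop(self._heap)[2] if self._heap else None
```

With ACKs at priority 0, ssh and games at 1, web at 2, and BitTorrent at 3, a saturated link slows the web and starves BitTorrent first, which is exactly the tradeoff described above.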

High Dynamic Range

I’d been seeing a lot about HDR, or High Dynamic Range, photography. In layman’s terms, the dynamic range of a camera is the range from the darkest to the lightest tones it can record in one shot. The problem is that real scenes often span a wider range than the camera can capture.

Long ago, photographers found a halfway decent solution: graduated filters. Basically, you stick a filter in front of the lens, with part of it darker than the rest. It works well if, say, you want to take a picture at the beach with both the foreground detail and the sky properly exposed.

With computers, though, there’s another approach. You take a series of bracketed shots: one or two for the sky, one or two for the foreground, etc. Some people have been known to stitch together close to a dozen. Having a tripod helps tremendously here, since the images need to be pretty much exactly the same besides exposure.

Strictly speaking, an HDR image holds a wider range than a monitor can display, so a technique called tone mapping is used to compress it into something viewable. The basic premise is to take the “good” parts of each shot in a bracketed series, combine them, and then squeeze the result back into a range the screen can show. Photoshop CS2 and newer has an HDR utility, though I’ve been pretty unimpressed with the results. Today I started playing around with an Open Source tool called Qtpfsgui. It’s even cross-platform! It supports multiple algorithms for doing tone mapping, too.
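
The two steps can be sketched in a few lines. First, each bracketed shot gives an estimate of scene brightness (pixel value divided by exposure time), and the estimates are averaged, trusting mid-tones more than near-black or blown-out pixels. Second, a simple global tone-mapping operator in the style of Reinhard et al. (not the Fattal operator used for the result later in this post) squeezes the result into the 0 to 1 range a monitor can show. This assumes a linear sensor response and already-aligned grayscale images given as flat lists; real tools recover the camera’s response curve and handle color. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def hat_weight(z: float) -> float:
    """Trust mid-tones; give ~0 weight to near-black or clipped pixels (z in 0..1)."""
    return min(z, 1.0 - z)


def merge_brackets(shots: Sequence[Sequence[float]],
                   exposure_times: Sequence[float]) -> List[float]:
    """Combine aligned shots into relative scene radiance, per pixel.

    Assumes a linear sensor: pixel value = radiance * exposure time, clipped to 0..1.
    """
    merged = []
    for pixel_values in zip(*shots):
        num = den = 0.0
        for z, t in zip(pixel_values, exposure_times):
            w = hat_weight(z)
            num += w * z / t
            den += w
        if den == 0.0:
            # Every shot is pure black or clipped here; fall back to a plain average.
            estimates = [z / t for z, t in zip(pixel_values, exposure_times)]
            merged.append(sum(estimates) / len(estimates))
        else:
            merged.append(num / den)
    return merged


def reinhard_global(radiance: Sequence[float], key: float = 0.18) -> List[float]:
    """Global operator: scale the log-average to a mid-grey key, then L / (1 + L)."""
    eps = 1e-6  # avoids log(0) for black pixels
    log_avg = math.exp(sum(math.log(eps + r) for r in radiance) / len(radiance))
    scaled = [key * r / log_avg for r in radiance]
    return [s / (1.0 + s) for s in scaled]
```

Because `L / (1 + L)` never reaches 1, even a very bright sky keeps some detail instead of clipping, while dark areas are barely compressed. How aggressively an operator boosts local contrast is largely what separates a natural-looking result from an overcooked one.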

Overall, I’m still not that happy with the results, but it’s a start. Here’s a ‘normal’ shot of the beach, taken on Cape Cod yesterday:

[Photo: Beach, by n1zyy, on Flickr]

You’ll note that the foreground (e.g., the bench) is too dark, yet the sky is too light. It’s a good illustration of insufficient dynamic range.

Luckily, I knew in the back of my head that I wanted to try my hand at HDR photography, so I saw it as an opportunity. I set my camera to bracket exposures from -2 to +2 EV, to try to cover the full range. The end product:
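
For reference, each EV step is a factor of two in exposure, so a -2 to +2 EV bracket spans a 16:1 range. A small sketch that lists the shutter speeds for such a bracket, assuming aperture and ISO stay fixed and a hypothetical base (0 EV) shutter speed of 1/250 s:

```python
from fractions import Fraction
from typing import Iterable, List


def bracket_shutter_speeds(base: Fraction, evs: Iterable[int]) -> List[Fraction]:
    """Shutter time in seconds for each exposure compensation step.

    +1 EV doubles the exposure time and -1 EV halves it (aperture and ISO fixed).
    """
    return [base * Fraction(2) ** ev for ev in evs]


if __name__ == "__main__":
    speeds = bracket_shutter_speeds(Fraction(1, 250), range(-2, 3))
    print([str(s) for s in speeds])  # ['1/1000', '1/500', '1/250', '1/125', '2/125']
```

The last value, 2/125 s, is about 1/60 s, the nearest standard speed a camera would actually use.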

[Photo: Fattal Algorithm, by n1zyy, on Flickr]

It displays a very common pet peeve of mine with HDR photos: it looks entirely unrealistic. Absurd, even. I think part of it’s that it’s just overdone, and that the contrast is jacked way up. I want to play around with it more and see if I can get a more natural product. So far, no luck. But, at least in a technical sense, it’s an improvement over the first image.

I’d like to see HDR come a little further, so that HDR photos don’t have the same, “Whoa!” quality that a scary old lady with way too much makeup has. I don’t think the limitations are entirely technical at this point, either.

Geek

We’ve been having a lot of intermittent network problems at home. Periodically, our Internet cuts out. At first I assumed it was our ISP–it’s no longer Adelphia (run by pharmacists), though–but subsequent research indicated that it wasn’t our ISP’s fault: our router was going down.

My dad set it all up, so I wasn’t too sure how things went. I was pretty confident that we were just using a generic store-bought broadband router, though, so I found it strange that it would be drifting in and out. It turns out that I overlooked something about the router: it’s being held together with duct tape.

I’d already been intrigued by OpenBSD’s pf, so this seemed like a sign! I commissioned an old desktop system, loaded OpenBSD on it, and went to work configuring it. OpenBSD turned out to be more different from Linux than I expected. It asks you if you want to let OpenBSD use the whole hard drive. I said yes, and thought, “Wow, this is just as easy as Ubuntu!” But it turns out that this was just the first stage. After this, you have to set “disk labels,” which are sort of like partitions but ambiguously different. The syntax is obscure, the purpose is obscure, and so forth. Then I had to configure the network. NICs are named by the drivers they use, so instead of eth0 and eth1 (for Ethernet), I have rl0 (Realtek) and dc0 (who knows).

I was also extremely confused trying to set up routing. Long-term, it was going to be the router, but short-term, it needed to know about our existing router so that it could connect and download the requisite packages.

So I finally got it all set up. I also installed MySQL (unnecessarily, it turns out), Apache, and PFW, a web-based configuration tool for pf. I ended up not using PFW, because my understanding of pf is so bad that I’m basically relegated to copying-and-pasting rules from websites into the configuration file.

Even using pf is confusing. It’s called pf, but typing “pf” at the command line doesn’t do anything. It turns out that you control it with a tool called “pfctl.” You can do pfctl -e to enable pf, and pfctl -d to disable it.

As I tried to tweak the firewall/routing rules, I’d periodically “restart” pf by disabling and then re-enabling it. I wasn’t sure if it read the rules “live” or if a restart was needed. It turns out… neither! The rules are stored in memory, and disabling and re-enabling pf doesn’t reload them. You need to tell pfctl to read them anew from the configuration file (pfctl -f /etc/pf.conf).

After a few more hours of work, I thought it was all set up. Both NICs were configured, the external one to get an IP over DHCP, and the internal one with a low fixed IP. I had a complex set of rules, doing NAT, filtering traffic, and using HFSC for prioritized queueing. (HFSC seems completely undocumented, by the way. I took my tips from random websites.) It seemed very impressive: I prioritized ACKs so that downloads wouldn’t suffer if our outbound link was saturated. (Aside: it really doesn’t make sense to do queueing on incoming traffic, since the bottleneck is our Internet link, not our 100 Mbps LAN.) I also afforded DNS, ssh, and video game traffic high priorities, but allocated them a lower percentage of traffic. I even figured out the default BitTorrent ports and gave them exceptionally low priority: if our line is fully saturated, the last thing I care about is sharing unnecessary data with other people.
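
The classification behind that prioritization can be sketched like this: bare TCP ACKs go to the top queue, DNS, ssh, and game traffic next, BitTorrent last, and everything else in between. This is a simplified illustration, not pf’s syntax or matching logic. The `Packet` type, the queue names, and the game port are hypothetical; 22 (ssh), 53 (DNS), and 6881-6889 (the classic BitTorrent default range) are the usual well-known ports.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    proto: str            # "tcp" or "udp"
    dst_port: int
    payload_len: int = 0
    ack_flag: bool = False


INTERACTIVE_PORTS = {22, 53}               # ssh, DNS
GAME_PORTS = {27015}                       # example only; depends on the game
BITTORRENT_PORTS = set(range(6881, 6890))  # classic default range 6881-6889


def classify(pkt: Packet) -> str:
    """Pick an outbound queue name (hypothetical names), most urgent first."""
    if pkt.proto == "tcp" and pkt.ack_flag and pkt.payload_len == 0:
        return "ack"          # bare ACKs keep downloads flowing when the uplink is busy
    if pkt.dst_port in INTERACTIVE_PORTS or pkt.dst_port in GAME_PORTS:
        return "interactive"
    if pkt.dst_port in BITTORRENT_PORTS:
        return "bulk"         # lowest priority
    return "default"
```

Matching only on destination port is a simplification: BitTorrent clients can be configured to use other ports, so port-based rules catch only the default setup.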

And there are other neat features. It “scrubs” incoming connections, reassembling fragmented packets and just eliminating crap that doesn’t make sense. It catches egregious “spoofing” attempts and discards them.

I hooked up the second LAN connection to test it out, rebooted, and… waited.

It never came up. Well, it did come up. The computer’s running fine. Both network cards show up with the switch. Doing an nmap probe of our LAN, I see one strange entry. It’s actually pretty mysterious: it has no open ports, and attempting to ssh into it just sits there: it doesn’t send a connection refused, but completely ignores the incoming packets, leaving my poor ssh client sitting there waiting for a reply, having no clue what’s going on.

In a nutshell, it seems that I just built a firewall/router that’s so secure that I can only find one of its two cards on the network, and I can’t even try to log into it. Let’s see you hack that! Of course, this does have some issues. For example, I can’t use it.
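
The difference between “connection refused” and “just sits there” is exactly what pf is doing. A closed port answers with a TCP reset, which the client sees immediately as refused; a port whose packets are silently dropped sends nothing back, so the client waits until it times out (nmap calls such ports “filtered”). pf’s block rules drop silently by default unless block-policy is set to return. A minimal probe using Python’s standard `socket` module; the labels mirror nmap’s terms, but this is not nmap’s logic, and `probe` is a hypothetical helper:

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP port roughly the way a scanner would."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"        # handshake completed
    except ConnectionRefusedError:
        return "closed"          # host replied with a reset
    except socket.timeout:
        return "filtered"        # no reply at all: packets silently dropped
    except OSError as exc:
        return f"error: {exc}"   # e.g. no route to host
```

Against the new router, ssh lands in the “filtered” case.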

I haven’t lost hope yet: I have a keyboard and monitor so I can log in on the console and try to do some tweaking there. (You can’t firewall off the keyboard.) It’s just not very encouraging to think, “Alright, let’s reboot and make sure it works as flawlessly as I think it will” and then have the darned thing not even show up on the network.

An Uncontrollable Urge

A few years ago Andy and I ran a hosting company. It never got that far, but it was fun, and also a learning experience. Today I’m finding that I can’t get the idea of starting it again out of my head. The problem is that, this time, I’d want to start it big.

There are a bunch of technologies that I find downright exciting:

  • Old racks full of blade servers are hitting the used market. And by “old” I mean dual 2-3 GHz Xeons, a gig or two of RAM, and hard drives that still rival what hosts are renting in dedicated servers. I’d probably want to put in new drives, but the machines are cheap and they’re plentiful.
  • Boston has a number of good data centers, and all the big Tier 1 providers are here. That there seem to be no well-known hosting companies out here is frankly kind of surprising. You have no idea how badly I want to pick up a couple racks in a colocation facility, and pull in a couple 100 Mbps lines.
  • cPanel looks like it’s matured a lot since I last used it, and it has some good third-party stuff such as script installers. It looks like it remains the number one choice in virtual hosting.
  • Xen is downright exciting. It permits splitting a physical host into multiple virtual machines. With the advent of chips with hardware virtualization support from both AMD and Intel, it now runs with very little overhead. It used to require extensive modifications to the “guest” OS, so only specially modified operating systems (mostly Linux) worked. With newer processors, though, you’re able to run machines without them having to know they’re in a virtual machine, opening up options. You can run Windows now. The virtual dedicated server / virtual private server market is growing. (Xen also supports moving virtual machines between physical servers, which has a lot of nice applications, too!)
  • OpenBSD’s firewall, pf, continues to intrigue me for its power. I just found PFW, a really spiffy web GUI for managing pf. Not only does it do basic firewall stuff, but it’s got support for prioritization of traffic / QoS, and for load balancing. I’m probably just scratching the surface.
  • I’ve spent years honing my admin skills and improving server performance. Improved performance on a shared server, of course, means more clients per server, or more money.

I’m wholly convinced I should start a Boston hosting company. I just need $100,000 in capital or so. (Santa, do you read my blog? Do you fund businesses? I’ll give you partial equity.)

Knots in My Stomach

Thanks to Rusty for finding the Electoral-Vote.com website, something I’d forgotten about from the 2004 election. The data is in a bit of a confusing layout… Disregard the 2004 map and the first little table. Below those, the site has a comprehensive list of polls state-by-state.

My eyes are on Clinton vs. Obama. And I seriously have knots in my stomach here. Clinton is winning by at least 10 points in most places. Arizona is 44% to 14%. In his home state of Illinois, Obama’s winning just 37% to 33%.

The good news! Iowa, a key state, is slightly favoring Obama. But really, it’s a crapshoot: Obama, Edwards, and Clinton are neck-and-neck. Romney and Huckabee lead the Republican primary. At this point in time, though, my main concern is the Democratic primary.

Here in New Hampshire, Obama’s trailing, 26% to 38%. This is not good. We’re #2 after Iowa.

Oklahoma’s weird. Obama’s got 13%, with Clinton and Edwards tied at 29%. (Don’t get me wrong: Edwards is good, but I don’t think he has a chance right now.)

The Republican side is interesting to take a gander at, too. In some places, Huckabee’s an also-ran. In Arizona, he got 3% of the votes. Once. In Iowa, he inches past Romney to take first place at 28%. Surprisingly (to me, at least), he’s doing the exact same thing in New Hampshire. With a quick skim (admittedly, much less attention than I’ve afforded the Democratic primary), it looks like Giuliani is king of the Republican race.

But a few thoughts:

  • I think the odds of Edwards winning the primary are slim. But he carries a substantial margin in some places. If he were to drop out and endorse Obama, the impact would be considerable. I worry that most of his fans would support Hillary, though.
  • I think we need to review the statistics after the Iowa caucus (January 3) and the New Hampshire primary (January 8). Everyone’s watching these, and the results will have a big impact. A strong lead by Obama may pull out some undecideds. Or, a strong lead by Clinton may freak out some people who will vote for Obama just to vote against her. (While I’d back her if she were our nominee, she is not my preferred Democrat, if you can tell.)
  • My super-early-money is on Clinton vs. Giuliani. And this concerns me greatly, because people voting on first impressions will probably favor Rudy without really doing a lot of research. (It also concerns me because I don’t particularly like either of them.)
  • The Republicans are getting weird results: Giuliani wins some places, Romney wins some places, McCain’s got a few wins (probably the least), and Huckabee, who I initially thought was the Kucinich of the Republicans, is actually leading in quite a few places. I’m really not sure who’s going to get their nomination.
  • As we saw in 2004, polls can be flaky. (I twice typed “pols” instead of “polls.” Freudian slip?) So this doesn’t necessarily mean anything.

One-sentence conclusion: It’s too soon to really have any idea how things will go, but Clinton has a discomforting lead in many states.

A few parting thoughts:

  • Read up on the Iowa caucus process if you’re not familiar. It’s quite foreign, really.
    • Apparently, only once in history (or once in five, put differently: an important distinction!) has the Straw Poll winner not matched the Iowa caucus winner. And this year’s Straw Poll winner was Romney. Both Giuliani and McCain screwed everything up by blowing the event off, and thus polled very poorly. I don’t know what this means: this might still tick off Iowa voters, tanking Giuliani in the caucus as well. But it also means that the data is probably skewed away from them right now, and if Iowa voters aren’t out for vengeance, Giuliani and McCain may take votes away from Romney.
  • The Iowa Caucus is less than two weeks away, and the NH primary is less than three. Pay more attention to the statistics then.
  • Vote!