Vista, 5-Minute Review

When I graduated from college, I lost my license to use Office*, and had various other key things shut off. I was given an opportunity to purchase Vista (Business Edition) Upgrade and Office 2007 (Enterprise) for $20 each, so I figured I should, since I had no media for XP or Office. And then I remembered I had a spare 60GB partition for Windows on my 160 GB drive from when I’d intended to dual-boot, so I just installed it here. A few thoughts:

  • The install was “easy” but frankly not that great. I’ll dock a small number of points because, unlike the Ubuntu installer, it’s not a LiveCD: I can’t use the system while it’s installing. When your installer takes 20-30 minutes, it’s very nice to have a browser or game or something going. It also seemed to take forever, and at the end, went to reboot, but gave me a “Reboot Now” option, which I took. It never ejected my DVD, nor did it tell me to, so I figured I was supposed to leave it in…
  • …So it booted into the installer again. I closed it, and got a message that I couldn’t use Windows if I didn’t install Windows. (Thanks… Though I suppose there are people who actually need that message.) And then it warned me that if I cancelled the installation, my computer may reboot.
  • Of course, I wanted to reboot my computer, so I said OK. My computer did not reboot.
  • Everything feels much more polished!
  • It spent several minutes “evaluating [my] computer’s performance” before going away with no indication of what had just happened. (I knew enough to find it, though: a 3.1. This bothers me slightly, since it’s a fairly meaningless number, but I digress.)
  • All of my text is blurry. Yes, I’m at the native resolution. (Which was detected automatically.) I assume it’s related to ClearType (or a lack thereof?), but I can’t find anything about it.
  • I set up wireless very easily. (Well, after I found the icon in the tray.)
  • Windows is obsessed with popping little bubbles up all over my screen. I guess it’s understandable since it’s the first time I’ve run it, though I’d be a lot happier if it didn’t offer to check GMail and my blogs for phishing attacks. Repeatedly.
  • How do I get a command prompt? (No, I’m serious. Is there a ‘cmd’ in Vista?)
  • The default desktop has one icon, the Recycle Bin. I like this uncluttered look.
  • Now I see what everyone was complaining about. Much like it’s obsessed with bubble notifications, it’s obsessed with asking me if I want to give permission to various things. The problem is that I’ll double-click on the clock in the system tray to set it up to sync to NTP, and get asked if I want to allow access to the clock. Yes, I do; that’s why I just tried to change it. Where do I turn this off?
  • Linux and Windows XP let me use the far-right of my touchpad as a scroll wheel. This feature is missing in Vista?

It’s too soon for a thorough review, but I am a fan of first-impressions things. And my first impressions are so-so. Probably a big improvement over XP, but with quite a few irritations.

Oh! I got the upgrade, which means you have to install it over an existing Windows thing. Except it was on another hard drive, so I’m using the well-known quirk where you can install it without a license key, and then “upgrade” that to the exact same version and put in a license key.

Skype

This is really pretty well-documented, but it’s easy to overlook or forget…

Skype, by default, will engage in some sort of peer-to-peer call relaying. If you ever look and see that you have a bajillion network connections open on some strange port, you can almost certainly blame Skype. (Close it and they go away.)

Building an Improvised CDN

From my “Random ideas I wish I had the resources to try out…” file…

The way the “pretty big” sites work is that they have a cluster of servers… A few are database servers, many are webservers, and a few are front-end caches. The theory is that the webservers do the ‘heavy lifting’ to generate a page… But many pages, such as the main page of the news, Wikipedia, or even these blogs, don’t need to be generated every time. The main page only updates every now and then. So you have a caching server, which basically handles all of the connections. If the page is in cache (and still valid), it’s served right then and there. If the page isn’t in cache, it will get the page from the backend servers and serve it up, and then add it to the cache.
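To make the cache's decision concrete, here's a rough sketch of that hit-or-miss logic. It's not how squid or any real cache is implemented; the PageCache class, its purge method, and the injectable clock are all names I made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Toy front-end cache: serve a fresh copy if we have one, else ask the backend.

    Hypothetical illustration only; real caches also honor HTTP headers,
    bound their memory use, and handle concurrent requests.
    """

    def __init__(self, backend: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backend = backend      # function that generates a page (the "heavy lifting")
        self.ttl = ttl_seconds      # how long a cached page stays valid
        self.clock = clock          # injectable so the cache can be tested without sleeping
        self.store: Dict[str, Tuple[float, str]] = {}  # path -> (expiry time, body)
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> str:
        now = self.clock()
        entry = self.store.get(path)
        if entry is not None and now < entry[0]:
            # In cache and still valid: serve it right then and there.
            self.hits += 1
            return entry[1]
        # Not cached (or expired): fetch from the backend, serve it, remember it.
        self.misses += 1
        body = self.backend(path)
        self.store[path] = (now + self.ttl, body)
        return body

    def purge(self, path: str) -> bool:
        """Drop one page from the cache; returns True if it was cached."""
        return self.store.pop(path, None) is not None
```

Only the misses ever reach the webservers, which is the whole point: a page that changes every now and then gets generated once per expiry window instead of once per visitor.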

The way the “really big” sites work is that they have many data centers across the country and your browser hits the closest one. This improves load times and adds redundancy. (Data centers do periodically go offline: The Planet did just last week, when a transformer inside blew up and the fire marshals made them shut down all the generators.) Depending on whether they’re filthy rich or not, they’ll either use GeoIP-based DNS or have elaborate routing going on. Many companies offer these services, by the way. It’s called a CDN, or Content Delivery Network. Akamai is the most obvious one, though you’ve probably used LimeLight before, too, along with some other less-prominent ones.

I’ve been toying with SilverStripe a bit, which is very spiffy, but it has one fatal flaw in my mind: its out-of-box performance is atrocious. I was testing it in a VPS I haven’t used before, so I don’t have a good frame of reference, but I got between 4 and 6 pages/second under benchmarking. That was after I turned on MySQL query caching and installed APC. Of course, I was using SilverStripe to build pages that would probably stay unchanged for weeks at a time. The 4-6 pages/second is similar to how WordPress behaved before I worked on optimizing it. For what it’s worth, static content (that is, stuff that doesn’t require talking to databases and running code) can handle 300-1000 pages/second on my server, according to some benchmarks I ran.

There were two main ways to enhance SilverStripe’s performance that I thought of. (Well, a third option, too: realize that no one will visit my SilverStripe site and leave it as-is. But that’s no fun.) The first is to ‘fix’ SilverStripe itself. With WordPress, I tweaked MySQL and set up APC (which gave a bigger boost than with SilverStripe, but still not a huge gain). But then I ended up coding the main page from scratch, and it uses memcache to store the generated page in RAM for a period of time. Instantly, benchmarking showed that I could handle hundreds of pages a second on the meager hardware I’m hosted on. (Soon to change…)

The other option, and one that may actually be preferable, is to just run the software normally, but stick it behind a cache. This might not be an instant fix, as I’m guessing the generated pages are tagged to not allow caching, but that can be fixed. (Aside: people seem to love setting huge expiry times for cached data, like having it cached for an hour. The main page here caches data for 30 seconds, which means that, worst case, the backend would be handling two pages a minute. Although if there were a network involved, I might bump it up or add a way to selectively purge pages from the cache.) squid is the most commonly-used one, but I’ve also heard interesting things about varnish, which was tailor-made for this purpose and is supposed to be a lot more efficient. There’s also pound, which seems interesting, but doesn’t cache on its own. varnish doesn’t yet support gzip compression of pages, which I think would be a major boost in throughput. (Although at the cost of server resources, of course… Unless you could get it working with a hardware gzip card!)
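The “two pages a minute” figure is just arithmetic on the expiry time, and it's worth seeing how little a short expiry costs. A back-of-the-envelope sketch follows; both function names are mine, and the hit-ratio formula is a simplifying assumption (steady traffic to a single page), not a measurement.

```python
def worst_case_backend_rate(ttl_seconds: float, distinct_pages: int = 1) -> float:
    """Upper bound on page generations per minute the backend can be asked for.

    Each cached page is regenerated at most once per expiry window, no matter
    how many visitors request it.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return distinct_pages * 60.0 / ttl_seconds


def hit_ratio(requests_per_second: float, ttl_seconds: float) -> float:
    """Rough fraction of requests served from cache for one page.

    Assumes steady traffic: one miss per expiry window, everything else a hit.
    """
    requests_per_window = requests_per_second * ttl_seconds
    if requests_per_window <= 1:
        return 0.0
    return 1.0 - 1.0 / requests_per_window


print(worst_case_backend_rate(30))  # 2.0 pages a minute for one page with a 30-second expiry
```

Under those assumptions, a page getting 300 requests a second with a 30-second expiry misses the cache once in every 9,000 requests, so stretching the expiry to an hour buys almost nothing on a busy page.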

But then I started thinking… That caching frontend doesn’t have to be local! Pick up a machine in another data center as a ‘reverse proxy’ for your site. Viewers hit that, and it will keep an updated page in its cache. Pick a server up when someone’s having a sale and set it up.

But then, you can take it one step further, and pick up boxes to act as your caches in multiple data centers. One on the East Coast, one in the South, one on the West Coast, and one in Europe. (Or whatever your needs call for.) Use PowerDNS with GeoIP to direct viewers to the closest cache. (Indeed, this is what Wikipedia does: they have servers in Florida, the Netherlands, and Korea… DNS hands out the closest server based on where your IP is registered.) You can also keep DNS records with a fairly short TTL, so if one of the cache servers goes offline, you can just pull it from the pool and it’ll stop receiving traffic. You can also use the cache nodes themselves as DNS servers, to help make sure DNS is highly redundant.
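The “hand out the closest cache, and skip the dead ones” decision might look something like this. It's a sketch of the idea only, not how PowerDNS's GeoIP support works internally; the node names and coordinates are made up, and a real setup would look up the visitor's location from a GeoIP database rather than being handed a latitude and longitude.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CacheNode:
    name: str
    lat: float
    lon: float
    healthy: bool = True


# Hypothetical cache locations (rough city coordinates, for illustration only).
NODES = [
    CacheNode("east", 40.7, -74.0),     # roughly New York
    CacheNode("south", 32.8, -96.8),    # roughly Dallas
    CacheNode("west", 37.8, -122.4),    # roughly San Francisco
    CacheNode("europe", 52.4, 4.9),     # roughly Amsterdam
]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on Earth, in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def closest_node(nodes: Iterable[CacheNode], lat: float, lon: float) -> Optional[CacheNode]:
    """Pick the nearest healthy cache; nodes pulled from the pool are never returned."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda n: haversine_km(lat, lon, n.lat, n.lon))
```

A visitor near Boston gets the “east” node and one near Paris gets “europe”. Mark “europe” unhealthy and the Paris visitor falls back to “east”, which is the pull-it-from-the-pool behavior, at least once the short TTL expires.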

It seems to me that it’d be a fairly promising idea, although I think there are some potential kinks you’d have to work out. (Given that you’ll probably have 20-100ms latency in retrieving cache misses, do you set a longer cache duration? But then, do you have to wait an hour for your urgent change to get pushed out? Can you flush only one item from the cache? What about uncacheable content, such as when users have to log in? How do you monitor many nodes to make sure they’re serving the right data? Will ISPs obey your DNS’s TTL records? Most of these things have obvious solutions, really, but the point is that it’s not an off-the-shelf solution, but something you’d have to mold to fit your exact setup.)

Aside: I’d like to put nginx, lighttpd, and Apache in a face-off. I’m reading good things about nginx.

Spam

While I’ve somehow completely eradicated comment spam for the time being here, I’m getting a decent amount on an old page elsewhere on the server that allowed comments. It fell into disuse long ago, so I’ve done some behind-the-scenes tweaks to make it more of a honeypot for spammers. They don’t seem to care that no one visits the page anymore. I purged a lot of the spam before deciding, halfway through, to keep the page as a honeypot. Some of them are replying to threads that don’t even exist anymore. I’d never coded in anything to check whether the parent thread existed, so it’s “accepting” their comments, but they’re not even showing up on the page. And still they come!

Recent words of inspiration from one spammer, right before a set of links to porn:

Your site has very much liked me. I shall necessarily tell about him to the friends.

I beg to differ. My site does not very much like you. While it may smile while accepting your comments, it’s not smiling because it likes you. It’s smiling because it’s assembling a list of spamming IPs, and you’ve just landed on it. Please do, however, tell your friends to spam the page, too.

P.S. – I have deliberately refrained from linking to the page being abused, as I want to minimize its popularity, both to avoid giving the spammers exposure and to minimize the risk of someone leaving actual comments.

IPTV

Very often, I’ve wondered why TV isn’t carried over IP yet. For something broadcast over the public airwaves, it seems strange that no one makes it available over the Internet. I don’t mean being able to play little snippets and stories. I mean that I’d like to be able to do the same thing I can do with some radio stations: stream exactly what they’re broadcasting.

I don’t have a TV in my room. And frankly, I’d buy a 30″ LCD computer monitor before I spent the same amount on a 30″ LCD TV. But I do have several computer monitors. (In theory, my laptop and two 17″ LCDs, plus two 19″ CRTs, though none of them are hooked up right now.)

I think someone sufficiently enterprising could set something up, though. Think of a MythBox, which has a TV capture card. (Yes, they support HDTV.) It’s oriented towards recording, but really, there’s no reason you couldn’t simultaneously stream it over the LAN. It would require a decent amount of horsepower, but quad-core processors are getting cheap. One of those could easily serve a household. You’d just need enough TV cards to allocate one per simultaneous channel being watched (or recorded).

And then you just build a little webserver into the thing, and let me pull up streaming video from any channel I get over cable.
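The one-tuner-per-channel bookkeeping is the only mildly tricky part, and it could be sketched like this. The TunerPool class is hypothetical and has nothing to do with how MythTV actually schedules its capture cards; it only shows that viewers on the same channel can share a card, while a new channel needs a free one.

```python
from typing import Dict, List, Optional


class TunerPool:
    """Hands out capture cards: one per channel being watched or recorded at once."""

    def __init__(self, tuner_count: int) -> None:
        self.free: List[int] = list(range(tuner_count))  # indexes of idle tuner cards
        self.by_channel: Dict[str, int] = {}             # channel -> tuner serving it
        self.viewers: Dict[str, int] = {}                # channel -> active stream count

    def watch(self, channel: str) -> Optional[int]:
        """Return the tuner serving `channel`, or None if every card is busy."""
        if channel in self.by_channel:
            # Someone is already on this channel, so share that tuner's stream.
            self.viewers[channel] += 1
            return self.by_channel[channel]
        if not self.free:
            return None
        tuner = self.free.pop(0)
        self.by_channel[channel] = tuner
        self.viewers[channel] = 1
        return tuner

    def stop(self, channel: str) -> None:
        """One viewer leaves; the tuner is released when the last one is gone."""
        if channel not in self.by_channel:
            return
        self.viewers[channel] -= 1
        if self.viewers[channel] == 0:
            self.free.append(self.by_channel.pop(channel))
            del self.viewers[channel]
```

With two cards, two people watching the news and one watching sports fit fine, but a third channel gets turned away until somebody stops watching.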

Heck, it’d make a nice appliance…

I Take the “Suck” out of “Internet”

Working title: “Using Crafty Google Searches to Turn in Spammers”

Like most Internet users, I get a lot of spam. GMail filters 99.99999999% of it correctly, but I periodically browse through the spam to make sure. One day I read one of them, and realized it linked to something.blogspot.com, a Blogger blog. As I’ve previously posted, you can report spam blogs to Blogger with a simple form.

So I then searched my spam for “blogspot” (in:spam blogspot) and went to the handful of URLs (stripping off any variables passed) to verify they were spam, and reported each.
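If you'd rather not do the stripping by hand, the cleanup step is easy to script. Here's a small sketch using Python's standard urllib.parse; the function names are mine, and the URLs in the example are placeholders rather than real spam links.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit


def strip_variables(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def unique_blogspot_urls(urls: Iterable[str]) -> List[str]:
    """Clean each URL and keep one copy of every blogspot.com address, in order."""
    seen = set()
    result = []
    for url in urls:
        cleaned = strip_variables(url)
        host = urlsplit(cleaned).hostname or ""
        if host.endswith(".blogspot.com") and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


# Placeholder links, as they might appear in a batch of spam.
links = [
    "http://something.blogspot.com/?id=123",
    "http://something.blogspot.com/?id=456",
    "http://example.com/?x=1",
]
print(unique_blogspot_urls(links))  # ['http://something.blogspot.com/']
```

Stripping the variables also collapses duplicates, since the same blog tends to show up with a different ID in every message.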

But it gets better! I reported maybe 4 people that way. But I get a lot of spam with the subject “What a stupid face you have here $name,” where $name is the e-mail address they send it to. The body of the message just contains the word “Watch” with a link, which always takes you to a file called watch.exe on various servers (most likely hacked by a worm to host it there?)… I’m not about to download it to see what it does, but I assume it’s no good.

I was curious about it, so I Googled “what a stupid face you have here,” and realized that a lot of the results were spam. And, in fact, several were on Blogger. So I refined my search to site:blogspot.com "what a stupid face you have", and started clicking through to find them. A few are people posting about the mail, but most are splogs.

Servers

LayeredTech just announced a 50% increase in server pricing for me. Consequently, I’m working today to get my virtual machine up and running, and then I’m going to move everything over there. This is an all-around upgrade, too: we’re moving to a closer data center (PA instead of TX) on a faster machine, and inside a shiny new virtual machine where everything will be set up right and where upgrading to a new version won’t require spending 20 minutes to update Portage and then even more time to compile everything.

I’ll keep you posted. I just got networking set up on the machine; next comes pulling down updates and basic configuration, and then all the packages!