Easy Backups on Linux

It’s good to keep backups, especially of servers in remote data centers using old hard drives.

rsync -vaEz --progress user@remote:/path /local/path

In my case, I’m doing it as root and just copying /, although, in hindsight, I think I should have used the --exclude=… option… It doesn’t make any sense to me to “back up” /proc or /dev, /tmp is iffy, and /mnt is usually not desired.

A few notes: I use --progress because otherwise rsync just sits there silently, which is irritating.

-a is archive, which actually maps to a slew of options. -z enables compression. Note that this may or may not be desirable: on a fast link with a slower machine, it may do more harm than good. There’s also a --bwlimit option that takes a rate in KB/sec. (--bwlimit=100 would be 100 KB/sec, or 800 kbps.)
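
If I were redoing it, the full command would probably look something like this (the exclude list is just my guess at sane defaults, and the paths are made up):

rsync -vaz --progress --bwlimit=100 \
  --exclude=/proc --exclude=/sys --exclude=/dev \
  --exclude=/tmp --exclude=/mnt \
  root@remote:/ /backups/remote-root/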

Using rsync for backups is nothing new, but it’s still not used as widely as it could be. A seemingly common approach is to create a huge backup with tar, compress it, and then download the massive file. rsync saves you the overhead of making ludicrously large backup files, and also lets you download just what’s changed, as opposed to downloading a complete image every time. It’s taking forever the first time, since I’m downloading about 40GB of content. But next time, it’ll be substantially quicker.
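
One related trick: if you want to keep more than one generation around, rsync’s --link-dest option hard-links unchanged files against the previous snapshot, so each dated copy costs almost nothing extra. A rough sketch, with invented dates and paths:

# Yesterday's complete snapshot already lives in /backups/2008-01-19
rsync -az --delete --link-dest=/backups/2008-01-19 \
  root@remote:/ /backups/2008-01-20/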

With backups this easy, everyone should be making backups frequently!

MySQL

Sun bought MySQL.

Also, Sun’s CEO has a blog, though apparently he doesn’t know how to resize images other than by changing the HTML attributes.

Remember back when they were a little below $5 a share and I said I thought they were going somewhere?

Next time I’m putting my money where my mouth is. They closed at $15.92 a share on Friday.

Of course, some are wondering whether this was a good buy. Not necessarily whether MySQL is good (it’s perhaps the most widely used database in the world), but whether it makes sense to pay a billion dollars for it, when it’s (1) primarily an open-source product, and (2) going to take something like 20 years of revenues to break even. While I don’t quite buy the bit about it being a conspiracy with Oracle to kill the project, you should check out the page they link to, Sun’s list of acquisitions. It’s so bad that Sun appears to have a photograph of a dumpster with the Sun logo on it. (Okay, it’s a shipping crate. But it doesn’t make a ton of sense, and you have to grant that it looks a little bit like a dumpster.) It reminds me of when Sun bought Cobalt for $2 billion, and Cobalt went belly-up shortly thereafter. (I still think RaQs could be hot sellers today, by the way, if they were still being made. To take a company doing incredibly well and have it go belly-up in under a year takes some incredible mismanagement.)

DNS Dork

The real geeks in the room already know what the root zone file is, but for those of you with lives… DNS (the Domain Name System) is the service that transforms names (blogs.n1zyy.com) into IPs (72.36.178.234). DNS is hierarchical: as a good analogy, think of there being a folder called “.com,” with entries for things like “amazon” and “n1zyy” (for amazon.com and n1zyy.com, two sites of very comparable importance). Within the “amazon” ‘folder’ is a “www,” and within “n1zyy” is a “blogs,” for example. A domain name is really ‘backwards,’ then: if it were a folder on your hard drive, it would be something like C:\.com\n1zyy\blogs.

Of course, this is all spread out amongst many servers across the world. When you go to connect to blogs.n1zyy.com, you first need to find out how to query the .com nameservers. The root servers are the ones that give you this answer: they contain a mapping of what nameservers are responsible for each top-level domain (TLD), like .com, .org, and .uk.

So you get your answer for what nameservers process .com requests, and go to one of them, asking what nameserver is responsible for n1zyy.com. You get your answer and ask that nameserver who’s responsible for blogs.n1zyy.com, and finally get the IP your computer needs to connect to. And, for good measure, it probably gets cached, so that the next time you visit the site, you don’t have to go through the overhead of doing all those lookups again. (Of course, this all happens in the blink of an eye, behind the scenes.)
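
If you want to watch the whole dance happen, dig will walk the chain for you (assuming dig is installed, which it is on most Linux boxes):

# Start at the root servers and follow each referral down to the final answer
dig +trace blogs.n1zyy.com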

Anyway! The root zone file is the file that the root servers have, which spells out which nameservers handle which top-level domains.

Yours truly found the root zone file (it’s no big secret) and wrote a page displaying its contents, and a flag denoting the country of each of the nameservers. The one thing I don’t do is map each of the top-level domains to its respective country, since, in many cases, I don’t have the foggiest clue.

What’s interesting to note is that a lot of the data is just downright bizarre. Cuba has six nameservers for .cu. One is in Cuba, one in the Netherlands, and four are in the US. Fiji (.fj) has its first two nameservers… at berkeley.edu. American universities hosting foreign countries’ nameservers, however bizarre, isn’t new. .co (Colombia) has its first nameservers in Colombia (at a university there), but also has NYU and Columbia University (I think they did that just for the humor of Columbia hosting Colombia).

In other news, it turns out that there’s a list of country-to-ccTLD (Country-Code Top Level Domain) mappings. I’m going to work on incorporating this data… Maybe I can even pair it up with my IPGeo page with IP allocations per country…

Stomatron

I’ve been working on my resume as I seek to apply for a job that’s a neat blend of multiple interests: managing web projects (even in my preferred LAMP environment), politics, and even some management potential. And as I do it, I’m remembering all the stuff I did at FIRST, and reflecting on how much better it could have been.

I was “fluent” in SQL at the time, but didn’t know some of the neater functions of MySQL. For example, when I wrote the web management interface to the Stomatron, I didn’t know that I could make MySQL calculate times. So I’d retrieve a sign-in and sign-out time and use some PHP code to calculate elapsed time. This wasn’t terrible, really, but it just meant that I did more work than was necessary.
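
Something like this would have let MySQL do the math itself; the table and column names here are invented, since I don’t remember the actual schema:

# Elapsed time per sign-in, computed by MySQL rather than PHP
mysql -u stomatron -p stomatron -e "
  SELECT name, sign_in, sign_out,
         TIMEDIFF(sign_out, sign_in) AS elapsed
  FROM punches
  WHERE DATE(sign_in) = CURDATE();"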

More significantly, I didn’t know about the MySQL query cache. (Actually, I don’t know when it was introduced… This was five years ago.) Some of the queries were quite intense, and yet didn’t change all that often. This is exactly where the query cache is indicated.
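
For what it’s worth, turning it on is nearly trivial; the 16MB figure below is arbitrary, and depending on the version you may also need query_cache_type set in my.cnf:

# Give the query cache 16MB and confirm the settings
mysql -u root -p -e "SET GLOBAL query_cache_size = 16777216; SHOW VARIABLES LIKE 'query_cache%';"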

Worse yet, I really didn’t do much with the idea of caching at all. Being the stats-freak that I am, I had a little info box showing some really neat stats, like the total number of “man hours” worked. As you can imagine, this is a computation that gets pretty intense pretty quickly, especially with 30+ people logging in and out every day, sometimes repeatedly. Query caching would have helped significantly, but some of this stuff could have been sped up in other ways, too, like keeping a persistent cache of this data. (Memcache is now my cache of choice, but APC, or even just an HTML file, would have worked well, too.)
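
Even a dumb cron job that pre-computes the expensive number and dumps it to a static file would have done the trick. A sketch, again with an invented schema and paths:

#!/bin/sh
# Recompute the expensive "total man-hours" figure every ten minutes;
# the PHP page then just includes the resulting file instead of hitting MySQL.
# (crontab: */10 * * * * /usr/local/bin/update_stats.sh)
mysql -N -u stomatron -p'secret' stomatron -e \
  "SELECT ROUND(SUM(TIME_TO_SEC(TIMEDIFF(sign_out, sign_in))) / 3600, 1)
   FROM punches WHERE sign_out IS NOT NULL;" \
  > /var/www/html/cache/man_hours.html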

And, with 20/20 hindsight, I don’t recall ever backing up the Stomatron box. (I may well be wrong.) Especially since it and our backup server both ran Linux, it’d have been trivial to write a script to run at something like 3 a.m. (when none of us would be around to feel the potential slowdown) to do a database dump to our backup server. (MySQL replication would have been cool, but probably needless.) If I were doing it today, I’d also amend that script to employ our beloved dot-matrix logger to print out some stats, such as cumulative hours per person, and maybe who worked that day. (That would make recovery much easier in the event of a catastrophic data loss: we’d just take the previous night’s totals, and then replay (or, in this case, re-enter) the day’s login information.)
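
The whole thing would have been a handful of lines of shell plus a cron entry. A sketch; the hostnames, credentials, schema, and printer device are all hypothetical:

#!/bin/sh
# Nightly Stomatron backup: dump the database straight to the backup box,
# then send a few summary stats to the dot-matrix logger.
# (crontab: 0 3 * * * /usr/local/bin/stomatron_backup.sh)
DATE=$(date +%F)
mysqldump -u stomatron -p'secret' stomatron | gzip \
  | ssh backup "cat > /backups/stomatron/stomatron-$DATE.sql.gz"
mysql -u stomatron -p'secret' stomatron -e \
  "SELECT name, ROUND(SUM(TIME_TO_SEC(TIMEDIFF(sign_out, sign_in))) / 3600, 1) AS hours
   FROM punches GROUP BY name;" > /dev/lp0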

I’m not sure it was even mainstream back then, but our website could have used a lot of optimization, too. We were admittedly running up against a really slow architecture: I think it was a 300 MHz machine with 128MB of RAM. With PostNuke, phpBB, and Gallery powering the site, every single pageload was being generated on the fly, and used a lot of database queries. APC or the like probably would have helped quite a bit, but I have to wonder how things would have changed if we’d used MySQL query caching. Some queries don’t benefit: WordPress, for example, insists on embedding an exact timestamp in its queries, so the query text changes on every request and never hits the cache. I wonder if phpBB is like that. I have a feeling that at least the main page and such would have seen a speedup. We didn’t have a lot of memory to play with, but even 1MB of cache probably would have made a difference. As aged as the machine was, I think we could have squeezed more performance out of it.

I’m still proud of our scoring interface for our Lego League competition, though. I think Mr. I mentioned in passing a day or two before the competition that he wanted to throw something together in VB to show the score, but hadn’t had the time, or something of that sort. So Andy and I whipped up a PHP+MySQL solution after school that day, storing the score in MySQL and using PHP to retrieve results and calculate score, and then set up a laptop with IE to display the score on the projector. And since we hosted it on the main webserver, we could view it internally, but also permitted remote users to watch results. It was coded on such a short timeline that we ended up having to train the judges to use phpMyAdmin to put the scores in. And the “design requirements” we were given didn’t correctly state how the score was calculated, so we recoded the score section mid-competition.

I hope they ask me if I have experience working under deadlines.

Inexcusable

Culled from recent news, here are some things that I can find absolutely no excuse for having happened:

  • Hackers infiltrated computer systems, turning off power to several (foreign) cities. I guess it makes sense that the power grid would now be controlled by computers, but it’s sheer idiocy to have such a system, in any way, connected to the Internet. (And one has to suspect it was, in some manner, an inside job: I can’t imagine there’s a spiffy web GUI with a “Turn off power to Washington, DC” button, but rather some inscrutable interface.)
  • This is actually old news, but it was dug up recently: Mike Huckabee’s son was arrested for trying to bring a gun on an airplane. I’ll buy that it probably wasn’t his intention to hijack the plane, but how you “accidentally” carry a gun into an airport escapes me. Most of us are paranoid about whether our tiny bottle of shampoo is pushing the envelope and whether it’ll result in a cavity search. And yet people keep waltzing in with guns. Furthermore, anyone who doesn’t know where their guns are shouldn’t be allowed to carry them in the first place. (Despite what some have said, this doesn’t change my opinion of Huckabee himself… His statements like, “And that’s what we need to do — to amend the Constitution so it’s in God’s standards…” are what influence my views of him.)
  • Another case of a laptop with private data on more than half a million people going missing.

Torture

Dear Republican hard-liners: waterboarding is really unpopular. But I have an awesome idea. You can torture detainees even more, while fooling the Democrats into thinking that you’ve had a sudden change of heart.

Give free dental care to all detainees, paying special attention to fill cavities.

They used this huge needle to give me Novocaine. If I were giving an injection to a buffalo, I’d think the needle was unnecessarily large. Furthermore, they weren’t content with merely jabbing me with the needle. They stuck it way in, which was only mildly painful, until they must have jammed it into a vein or something, which caused excruciating pain. As I screamed in pain, the dentist apologized and shifted the needle ever so slightly.

They did one filling, and then the main dentist randomly left for about fifteen minutes. Meanwhile, her partner in crime was left to deploy some extremely bizarre torture implement. All I saw was a blue latex thing (a lot like a rubber glove, only a flat sheet of it) fit over my mouth, covering it completely, while something sharp was jammed into my gums until I screamed out again in pain. “Oh, does that hurt?” She removed it, and I never saw it again, so I have absolutely no clue what that was all about.

Sick of seeing ridiculously scary weapons being brandished in my face, I kept my eyes closed most of the time. (Actually, it was more the cloud of tooth-dust rising out of my mouth, and a desire to keep it out of my eyes.) I eventually opened my eyes to find what can only be described as a large metal pipe sticking out of my mouth. Like the gum-piercer with the latex cover obscuring my entire mouth, that thing couldn’t have served any legitimate dental purpose.

They ended up giving me three shots of Novocaine, as she’d keep drilling into teeth that still had feeling. After the second one, they both left the room, probably to find more torture devices.

Meanwhile, as I sat there bewildered, some lady came in, handed me a small FM radio with headphones, and said, “Here, this sometimes helps.” Between being completely bewildered as to what was going on, and being unable to talk anyway, I nodded in appreciation and took the radio. It only got two radio stations: the same one they had playing in the room, and a country station. But I figured it would drown out the noise of the drill, even though I think the implication may have been that the excruciating pain was all in my head, and listening to music would cause me to forget the fact that I had a huge hole in my gum and someone repeatedly taking a drill to a tooth that definitely wasn’t numb.

With the third Novocaine shot, the whole right side of my face was numb. And my eye felt really funny. When they left again, I looked in the mirror and saw that it was halfway shut, while the other one was wide open. This was quite a distressing sight, so I mentioned it to the torture-assistant lady. She made some neutral comment whose tone indicated, “I don’t want to concern you any more than you already are, but I’ve never seen that before and it looks pretty scary.” The real dentist came back in and told me it was nothing to worry about.

On top of all of it, the assistant lady had really sharp fingernails that were digging into my cheek through her gloves the whole time. And the filling they used smelled like rubbing alcohol. The smell of rubbing alcohol isn’t that bad, unless it’s wafting directly into your nose, in which case it’s horrible: partially the smell, partially nauseating fumes.

Finally, my interrogators decided I’d had enough and released me. I left unable to really control my lips, with my jaw in excruciating pain, an unexplained cut in my lip, and with my upper lip having a horrible burning sensation.

Moral of the story: floss and brush your teeth! Twenty-seven times a day.

Torrent Hosting

So I’m contemplating posting my BlueQuartz VMware image on VMware’s “Appliances” page, where it’d probably get a decent amount of downloads. I strongly doubt I’ll run into my bandwidth limit (it’d have to be downloaded about 3,000 times in a month), but I still don’t want to use bandwidth I don’t have to. When you’re distributing a big file to lots of people all of a sudden, BitTorrent is the perfect solution.

Unlike with, say, a bootleg movie, there’s an ‘official source’ for a lot of legitimate torrents. BitTorrent itself has no concept of an official source, but I think it should: the official source wants to ‘host’ the file, but get people to help with the bandwidth over BitTorrent.

There should be an easy way for them to host the file. Run a single command, pass it the file you want to distribute, and it’ll automatically create a .torrent file, register with some trackers (or host your own?), and begin seeding the file. In practice, this would probably take 10-15 minutes of work by hand. That’s pathetic.
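
For the record, the by-hand version looks roughly like this, assuming mktorrent and a command-line client like transmission-cli are installed (the tracker URL is just a placeholder):

# Create the .torrent, pointing at whatever tracker you're using
mktorrent -a http://tracker.example.com/announce bluequartz-vmware.zip
# Seed it (run from the directory containing the file); leave this running
transmission-cli -w . bluequartz-vmware.zip.torrent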

There’s also a catch-22 at first: you want seeders (people who have the whole file and upload it to their peers), since, without them, no one can get the file. But you need a seeder before anyone can be a seeder. The obvious solution is to seed your own file, and this is how it’s done. But, as the ‘official’ distributor of a file, you don’t want to burn through bandwidth, so it makes sense that you’d want to throttle your available bandwidth: if there were lots of other seeders, you’d only use a small amount of bandwidth. By keeping the ‘server’ up as a permanent seeder, you alleviate the really annoying problem of no one having the full file, which, obviously, prevents anyone from ever getting it. This is sort of a “long tail” problem: after the rush is over, you often end up with BitTorrent not being so awesome. (And, if you set your throttled upload bandwidth to be inversely proportional to the number of seeders, then when no one else is seeding it, there’s really no difference between someone downloading your file over BitTorrent and downloading it directly from your server.)

Of course, you’ll still have to distribute over FTP/HTTP, since not everyone can use BitTorrent. But, if you distribute it ‘normally’ over HTTP, you create an incentive for people to just download it from you, bypassing BitTorrent, which ruins the whole plan. So you also need to be able to throttle your bandwidth on those services, to make sure that it’s never faster than BitTorrent.

I really think there should be an all-in-one package to do this, so the host just runs a quick command on the server and the file’s immediately being seeded on BitTorrent and available over HTTP/FTP. And beyond all of “us,” just think of the situation that, say, Linux distributions face in distributing large files.

This could even be a hosted service: a decent number of people providing things like games have been smart enough to embrace BitTorrent. The market’s there. There’s just no one offering this.

Beat the Rush

In case anyone here is interested, I’m hosting a VMware Player image for BlueQuartz, the ‘modern’ GPL version of the old Cobalt RaQ software. A lot of people seem to want a VMware image. I was one of them, until I ended up just creating one on my own.

So grab it while it’s hot! (Read: grab it before I take the time to better throttle download speed.)

Get a (Virtual) Life

Amid wrestling with getting Xen working (its kernel doesn’t play nicely with my video drivers… oh how I hate closed-source drivers), I downloaded VMware Player. It’s free.

I first downloaded a VMware image of Mailserver by Allard Consulting. Quick review: I’ve never used it in a ‘real’ environment to send or receive e-mail (and I screwed up VMware’s networking, making things worse), but it seems extremely impressive. The one thing I have realized is that my much-raved-about spamd is very irritating if you try to telnet to port 25 to ‘test’ the mailserver. If I had a colocated server hosting multiple VPSs *cough* I think I’d buy the ‘real deal’ from them and use this as my mailserver.
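
For the uninitiated, “testing” a mailserver usually just means carrying on the SMTP conversation by hand, something like this (the hostnames and addresses are obviously made up):

telnet mail.example.com 25
# ...then type the SMTP dialogue yourself:
#   HELO test.example.com
#   MAIL FROM:<me@example.com>
#   RCPT TO:<someone@example.com>
#   DATA
#   (message body, ending with a lone "." on its own line)
#   QUIT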

But I think I’m going to get entirely distracted with virtual machines tonight. I’m running the latest and greatest version of Ubuntu, 7.10, codenamed “Gutsy Gibbon.” But 8.04, codenamed “Hardy Heron,” is in early testing, and you can grab an image of it. (You can also run it on your desktop; it’s in no way ‘proprietary.’ But a lot of us aren’t hardcore enough to want to run bleeding-edge alpha code as our main OS.)

I’ve mentioned before that I was somewhat interested in the $300 PCs that Walmart was selling. They came with Linux, apparently something Google partnered with them on, dubbing the desktop environment “gOS.” (The machine also draws insanely low power.) Lo and behold, it’s out there as a VMware image. (I was also able to play around with the One Laptop per Child (OLPC) image in VMware.)

Oh, and Solaris anyone?