This Is My Hobby

I want to start a “meta ISP.”

When you sign up with your ISP, you’re paying for transit. They carry your data from one network to another.

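But now let’s say that I’m a mediocre residential ISP. I buy connectivity from a couple of different upstream providers, and use BGP to pick the best route for your data. (BGP’s idea of “best” is mostly about policy and the shortest path between networks, which isn’t always the fastest one.) This is what most people do. It works.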

Let’s further say that you run an extremely popular site, maybe one of the top 100 sites out there. You have a mediocre IT team. You have enormous bandwidth, coming in from three different carriers. You, too, use BGP to make sure that your outgoing traffic takes the best route it can find.

So everything works. Traffic flows between the two networks. What’s the problem?

Well, it turns out that you, Mr. Big Site, have some of your core routers in a major data center out this way. And I, Mr. Big ISP, also have a few core routers in that building. This is really pretty common–there’s a (very aptly-named) network effect with transit. When several big guys move into a building, all of a sudden, more people want to be there too. So you get sites like One Wilshire, a thirty-story building in LA full of networking equipment. They’re very confidential about their tenants, but “word on the street” is that every network you’ve heard of, and plenty you haven’t, is in there. (When viewing that picture, by the way, it’s worth noting that these wires don’t go to some secretary’s PC. Each is probably carrying between 100 Mbps and 10 Gbps of traffic between various ISPs and major networks… Another interesting note about the photo: they supposedly keep an elaborate database and label each wire, so that this huge rat’s nest is actually quite organized.)

Since we’re both huge companies, we’re each paying six figures a month on Internet. But when one of my customers views your site, the request goes through a few different ISPs, and across multiple states, before it arrives on your network. It’s asinine, but that’s how the networks work.

So we wise up to this. I call you up, and we run a Gigabit Ethernet line between our racks. And all of a sudden, life is peachy. Data travelling over that line–my customers viewing your site–is free. My bandwidth bills drop, and speeds improve, too. This is the world of peering. And, strangely, the mutually-beneficial practice is rarely done.

I think there’s a market for a big middleman here. The last mile (that would be a good book title, if a telecom magnate wanted to write his memoirs) is difficult–running lines to consumers’ homes. Similarly, it’s hardly trivial to become a Tier 1 ISP, a sort of ‘core backbone’ of the Internet. But an intermediary broker? Easy enough to do.

So you’d get space in the major exchanges, and peer with popular sites. Google, Yahoo, MSN, YouTube, Facebook, eBay, MySpace, Amazon, Akamai, etc.

Digital Photo Recovery

I just discovered PhotoRec, a tool for recovering digital camera images.

For the non-geeks, a quick basic background…. When you save a file, the system writes it to various blocks on the disk. Then it makes an entry in the File Allocation Table, pointing to where on the disk the file is. When you delete a file, the entry is removed from the File Allocation Table. That’s really all that happens. The data is still there, but there’s nothing pointing to where on the disk it is. This has two implications. The first is that, with appropriate tools and a little luck, you can still retrieve a file that you’ve deleted. (Whether this is comforting or distressing depends on your perspective…) The second is that, with no entry in the File Allocation Table, the space is seen as “free space,” so new files saved to the disk may well end up getting that block. It’s technically possible to recover stuff even after it’s been overwritten, but at that point it’s much more complex and much more luck is involved.
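To make that bookkeeping concrete, here is a toy sketch in Python. It is not how PhotoRec or any real filesystem is implemented: `ToyDisk`, its 16-byte blocks, and its method names are all made up for illustration. The one real-world detail is the pair of JPEG markers (a JPEG starts with the bytes FF D8 FF and ends with FF D9). As I understand it, scanning the raw blocks for signatures like these, rather than trusting the table, is the general idea behind recovery tools of this kind.

```python
# Toy model of a FAT-style disk: a table of entries plus raw blocks.
# ToyDisk and everything in it is hypothetical, for illustration only.

JPEG_START = b"\xff\xd8\xff"  # JPEG start-of-image signature
JPEG_END = b"\xff\xd9"        # JPEG end-of-image marker


class ToyDisk:
    def __init__(self, num_blocks, block_size=16):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks  # zero-filled blocks
        self.table = {}  # filename -> list of block indexes

    def _free_blocks(self):
        used = {i for indexes in self.table.values() for i in indexes}
        return [i for i in range(len(self.blocks)) if i not in used]

    def save(self, name, data):
        size = self.block_size
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        free = self._free_blocks()
        if len(chunks) > len(free):
            raise OSError("disk full")
        indexes = free[:len(chunks)]
        for index, chunk in zip(indexes, chunks):
            self.blocks[index] = chunk.ljust(size, b"\x00")
        self.table[name] = indexes

    def delete(self, name):
        # Only the table entry goes away; the blocks keep their bytes.
        del self.table[name]

    def carve_jpegs(self):
        # Ignore the table entirely and scan the raw bytes for JPEG markers.
        raw = b"".join(self.blocks)
        found, pos = [], 0
        while True:
            start = raw.find(JPEG_START, pos)
            if start == -1:
                break
            end = raw.find(JPEG_END, start)
            if end == -1:
                break
            found.append(raw[start:end + len(JPEG_END)])
            pos = end + len(JPEG_END)
        return found


disk = ToyDisk(num_blocks=8)
photo = JPEG_START + b"dinner photo pixels" + JPEG_END
disk.save("dinner.jpg", photo)
disk.delete("dinner.jpg")
recovered_before = disk.carve_jpegs()
print(photo in recovered_before)  # the deleted photo is still on the disk

# Saving a new file reuses the "free" blocks and destroys the old data.
disk.save("new.jpg", JPEG_START + b"x" * 40 + JPEG_END)
recovered_after = disk.carve_jpegs()
print(photo in recovered_after)
```

The second `save` is the "keep taking photos" case: the new file lands on the blocks the deleted one was using, and the old bytes are gone.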

Last night we went out to dinner… We took lots of photos, but some were deleted. So I figured PhotoRec might recover them. So I gave it a try.

The filesystem shows 163 photos. After running PhotoRec, I have 246 photos. What’s odd is what photos I have. It’s not the ones from last night. They’re scattered from various events, and several are from almost two months ago.

This does leave us with an important tip, though: if you delete an essential photo, stop. Each subsequent thing you do to the disk increases the odds of something overwriting it. In a camera, just turn it off. Taking more photos seriously jeopardizes your ability to recover anything.

In my case, I didn’t have anything really important… I just wondered how it would work. And I got strange results for recovered files. (Which has me wondering a lot about how the camera writes its files out to disk, actually.) But it’s good knowledge for the future. (By the way, PhotoRec runs under not just Linux, but also, apparently, Windows, and most any other OS you can imagine.)

Thick & Thin

Kyle has an awesome 22″ LCD. His is probably 6″ thick in the back.

I have a 14.1″ laptop. My LCD is probably less than 1/2″ thick in the back.

Why is this? My first guess was the power supply. But I’ve seen a few LCDs where they’ve taken out the power supplies and put them on the floor. And frankly, you could fold up my whole laptop, glue the power brick onto the back, and it wouldn’t be any thicker than stand-alone LCDs.

Imagine if someone made a 22″ LCD that was 1/2″ thick. Ceteris paribus, I’d buy one.

I suppose one “disadvantage” is that I could probably, if I tried, grip my LCD at each corner, flex with all my might, and ruin the LCD. If I tried that on Kyle’s (don’t worry, I won’t!), I doubt I could. It’s inside a huge frame. But I don’t buy that this is a reason to not provide it.

Steve, I’m counting on you.

Dork

Yesterday was, for all intents and purposes, a snow day. They closed the school down at 1. Of course, I had no classes anyway, just some work that could be done anywhere. But this was a snow day. You don’t do work. At least, not the work you’re supposed to.

Kyle, always being curious about the hardware side of things, sent me a link to the RoomWizard downloads page after fishing out the hardware specs elsewhere. There were two things that interested me–one was that you could download a firmware image. The other was that they had a PDF of how to use their API.

Wait… API? That means… it’d be trivial to write an interface to these things!

The problem is that the manual never mentions the actual address of the API, which is just accessed over HTTP and returns XML. They give a few examples–/rwconnector is used most often. But alas, /rwconnector on these throws a 404.

Somewhat discouraged, I started poking around the firmware image. It’s a .tar.gz, and extracts… a (fairly) normal Linux filesystem. Besides some juicy stuff that I hope admins are instructed to change (there are several privileged user accounts), I also found some neat stuff. For one, it’s based on SuSE, but a very trimmed-down version. And it’s basically a fully functioning Linux machine, including an SMTP server, Apache Tomcat, etc.

But then I hit gold. There’s a configuration file for Tomcat, which mentions one URL of /Connector. So I fired it up and tried it on one of the systems. Bingo!

So then I read a bit more of the API manual. It’s actually very simple–you can retrieve, edit, and delete bookings. (Editing and deleting bookings this way doesn’t let you do anything you can’t do via the web interface, by the way, lest anyone think this is a security flaw.) You get an XML document back with results.

So then I had to figure out how to get PHP to parse XML. It turns out that PHP actually has several ways to do it, including SimpleXML and DOM objects. I spent a while learning it and by the end of the day, I had a prototype working that would get reservations for the next 24 hours and parse out the information. (Small tip–don’t try to “escape” colons when dealing with XML. They denote a namespace. When you get rb:name, for example, the tag name is just name, in the rb namespace. Knowing this a little sooner would have saved me about half an hour of, “This code is so simple! Why doesn’t it work?!”)
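The namespace tip is easier to see in code. This is a minimal sketch in Python's standard `xml.etree.ElementTree` rather than PHP, and the sample document is invented: the namespace URI, the element layout, and the booking names are assumptions, not what a RoomWizard actually returns. The point is only that `rb:name` has to be looked up with the `rb` prefix mapped to its namespace, and that a bare `name` finds nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical response; the URI and structure are made up for illustration.
SAMPLE = """<rb:bookings xmlns:rb="http://example.com/rb">
  <rb:booking>
    <rb:name>Study group</rb:name>
  </rb:booking>
  <rb:booking>
    <rb:name>Club meeting</rb:name>
  </rb:booking>
</rb:bookings>"""

# Map the prefix used in our queries to the namespace URI in the document.
NS = {"rb": "http://example.com/rb"}


def booking_names(xml_text):
    root = ET.fromstring(xml_text)
    return [
        booking.findtext("rb:name", namespaces=NS)
        for booking in root.findall("rb:booking", NS)
    ]


root = ET.fromstring(SAMPLE)
first_booking = root[0]

# Without the namespace, the lookup silently finds nothing.
print(first_booking.find("name"))

# With the prefix mapped, the same element is found.
print(booking_names(SAMPLE))
```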

The next step is to insert all of this into an SQL database, and then write a nice viewer for it. And also to experiment with adding bookings, although that should just require changing a line of code.

I haven’t actually written code to do timing, but it feels like it’s 1-2 seconds for me to get the XML data back, which suggests that the bottleneck is in its little database. Short-term, I want to write myself a little interface that will parse all the data, cache it, and give me a faster interface. Long-term, I want to try to see if I can get the library to adopt this, and have it be the booking mechanism. You can store them to a local database, and then have a background process use the API to push reservations out to the respective RoomWizards, so that they continue to function normally. But when people view the page, it’ll just get it all from the local database, meaning that the whole “Get the listings via API” thing is no longer necessary. (Unless you want to rebuild the database in case of a disk failure!)
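Here is a rough sketch of that long-term design, in Python with SQLite standing in for whatever database would really be used. The table layout, the `pushed` flag, and the `send` callback are all assumptions of mine. In real life `send` would be the HTTP call to a RoomWizard's API, which I'm deliberately not guessing at here.

```python
import sqlite3


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS bookings (
               id INTEGER PRIMARY KEY,
               room TEXT NOT NULL,
               starts_at TEXT NOT NULL,
               ends_at TEXT NOT NULL,
               name TEXT NOT NULL,
               pushed INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return db


def add_booking(db, room, starts_at, ends_at, name):
    # The web page only ever talks to the local database.
    with db:
        cur = db.execute(
            "INSERT INTO bookings (room, starts_at, ends_at, name) "
            "VALUES (?, ?, ?, ?)",
            (room, starts_at, ends_at, name),
        )
    return cur.lastrowid


def push_pending(db, send):
    """Background step: push unsent bookings out to the wall units.

    `send(room, starts_at, ends_at, name)` is a hypothetical callback that
    returns True on success. Failed pushes stay pending for the next run.
    """
    rows = db.execute(
        "SELECT id, room, starts_at, ends_at, name FROM bookings "
        "WHERE pushed = 0 ORDER BY id"
    ).fetchall()
    pushed = 0
    for booking_id, room, starts_at, ends_at, name in rows:
        if send(room, starts_at, ends_at, name):
            with db:
                db.execute(
                    "UPDATE bookings SET pushed = 1 WHERE id = ?", (booking_id,)
                )
            pushed += 1
    return pushed


db = open_db()
add_booking(db, "Room 101", "2007-02-16 15:00", "2007-02-16 17:00", "Study group")
add_booking(db, "Room 102", "2007-02-16 15:00", "2007-02-16 16:00", "Club meeting")

unreachable = {"Room 102"}  # pretend one wall unit is down


def fake_send(room, starts_at, ends_at, name):
    return room not in unreachable


first_run = push_pending(db, fake_send)
unreachable.clear()  # the unit comes back
second_run = push_pending(db, fake_send)
print(first_run, second_run)
```

An unreachable unit just leaves its bookings pending instead of breaking the page for everyone.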

RoomWizard

Even though I go to a business school and am a management major, my real passion is working on websites.

We just built a new library here, for millions and millions of dollars. We use a tool called RoomWizard for booking rooms. We get a web-based interface to book library rooms. This is a great idea. Unfortunately, it’s so fraught with bugs that it borders on unusable.

The main “bug” is that it’s basically so slow that it’s unusable. I tried viewing the source, and it’s got a HUGE block of JavaScript that’s a pain to read. Most of the page is being generated on the fly with JavaScript. There are times when this is the best way to do something. This is not one of them.

My current understanding–I may be wrong, since I’m still trying to make sense of this–is that each of the touch-screen units on the wall is a webserver. It’s responsible for storing all of its reservations. So when you view the main page, JavaScript has you going out to each of the 20+ rooms and requesting their status. The problem is that this takes forever, probably at least 15 seconds. By the time the page has finished drawing, it’s about time for the 60-second refresh to kick in.

I did a bit of viewing headers. The main page is running on ASP.net, but each individual room controller (probably like a 300 MHz embedded chip?) is running Apache Tomcat. Someone did a quick port scan and found that the devices have a lot of open ports–ftp, ssh, telnet (!), HTTP, and port 6000, which nmap guessed was X11. So I have a pretty good feeling these things are running embedded Linux.

Another problem is that there’s always one or two of the devices that, for whatever reason, are unreachable. So you get errors on those ones.

Booking conference rooms is like a Web Programming 101 exercise. You get a basic introduction to SQL databases, and write a little interface. You could run this on an old 1 GHz PC with 128 MB of RAM and have pages load in fractions of a second, especially if you really knew how to configure a webserver. (Turn on APC and MySQL query caching, in this case, and you’re golden.) I cannot fathom why they thought it was a good idea to have a page make connections to 25 different little wall-mounted touchscreens. This places a big load on what have got to be underpowered little units, and is just a nightmare any way you look at it. I really see no benefit to what they’re doing.

Furthermore, this breaks off-campus connections, since you can’t connect to these units remotely.

A better design: convert the wall-mounted RoomWizards from embedded webservers into little web browser clients, and have them just pull down the data from the main server.

With a traditional, single database, it would also be easy to write a little search tool–“I need a room on Friday from 3:00 to 5:00.” This is a fairly simple SQL query. This is not a fairly simple question to ask 25 wall-mounted touchscreen things.
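For what it's worth, here is roughly what that query could look like. This is a sketch in Python with an in-memory SQLite database. The `rooms` and `bookings` tables, the column names, and the sample data are all hypothetical. Times are stored as ISO-style text so that plain string comparison sorts them correctly.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE rooms (name TEXT PRIMARY KEY);
        CREATE TABLE bookings (
            room TEXT NOT NULL REFERENCES rooms(name),
            starts_at TEXT NOT NULL,
            ends_at TEXT NOT NULL
        );
        """
    )
    return db


def free_rooms(db, starts_at, ends_at):
    """Rooms with no booking overlapping the requested time window."""
    rows = db.execute(
        """
        SELECT r.name
        FROM rooms r
        WHERE NOT EXISTS (
            SELECT 1 FROM bookings b
            WHERE b.room = r.name
              AND b.starts_at < ?   -- booking starts before we finish
              AND b.ends_at > ?     -- and ends after we start
        )
        ORDER BY r.name
        """,
        (ends_at, starts_at),
    ).fetchall()
    return [name for (name,) in rows]


db = make_db()
with db:
    db.executemany(
        "INSERT INTO rooms (name) VALUES (?)",
        [("Room A",), ("Room B",), ("Room C",)],
    )
    db.executemany(
        "INSERT INTO bookings (room, starts_at, ends_at) VALUES (?, ?, ?)",
        [
            ("Room A", "2007-02-16 14:00", "2007-02-16 16:00"),  # overlaps
            ("Room B", "2007-02-16 17:00", "2007-02-16 18:00"),  # starts as we end
            ("Room C", "2007-02-16 13:00", "2007-02-16 15:00"),  # ends as we start
        ],
    )

# "I need a room on Friday from 3:00 to 5:00."
available = free_rooms(db, "2007-02-16 15:00", "2007-02-16 17:00")
print(available)
```

One query against one database, instead of 25 round trips to the wall.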

I’m tempted to write a little PHP script to go out, retrieve the data, and cache it. Essentially a hacked-together proxy…
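The caching part of such a proxy is small. Below is a minimal sketch in Python rather than PHP. `CachedStatus` is a made-up name, the 60-second lifetime just mirrors the page's refresh interval, and `fetch` is a stand-in for whatever actually asks a wall unit for its bookings. Falling back to stale data when a unit is unreachable is my own design choice, not something the real system does.

```python
import time


class CachedStatus:
    """Cache per-room results for a while so the wall units are asked less often."""

    def __init__(self, fetch, ttl_seconds=60, clock=time.monotonic):
        self.fetch = fetch      # hypothetical: fetch(room) -> status
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the cache can be tested
        self._cache = {}        # room -> (fetched_at, status)

    def get(self, room):
        now = self.clock()
        hit = self._cache.get(room)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]       # fresh enough: no network trip
        try:
            status = self.fetch(room)
        except OSError:
            if hit is not None:
                return hit[1]   # unit unreachable: serve stale data instead
            raise
        self._cache[room] = (now, status)
        return status


# Demo with a fake clock and a fake, countable fetch function.
now = [0.0]
calls = []
unit_is_down = [False]


def fake_fetch(room):
    if unit_is_down[0]:
        raise OSError("unit unreachable")
    calls.append(room)
    return f"{room}: free (fetch #{len(calls)})"


cache = CachedStatus(fake_fetch, ttl_seconds=60, clock=lambda: now[0])
first = cache.get("Room 101")
second = cache.get("Room 101")   # served from the cache
now[0] = 61.0
third = cache.get("Room 101")    # expired, fetched again
now[0] = 200.0
unit_is_down[0] = True
fourth = cache.get("Room 101")   # unit down, stale copy served
print(len(calls))
```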

Radio

I’m a long-term radio geek, and I’ve realized that the technology interests me more than actually using it. Having worked with lots and lots of radios (I realized that I have three sitting on my desk, all of which I have used in the past 30 minutes), I’ve concluded that I’d like to start a radio company. Our motto would be, “Our radios don’t suck.”

One of my radios is a ham radio, which is front-panel programmable (FPP), meaning that you can punch in frequencies on the keypad. This is pretty common with ham radios. By contrast, land-mobile radios (things that, say, a police officer would carry) very rarely have FPP capability; in fact, the FCC frowns on certifying radios with that capability, except for certain federal agencies that need to be able to reprogram their radios in the field. However, it’s often offered as a software add-on. But even on the ham radio, FPP is really hard to use. Part of the problem is that the radio’s probably a decade old, and the print on the keypad has worn off. So I’m guessing at what buttons do.

There are very few radios with a graphic LCD. Dot-matrix LCDs almost seem cutting-edge in the radio world. By contrast, try to find a cell phone that doesn’t have a big color LCD on it. I have an old Garmin GPS III, and still admire that screen. I think it’s four shades of gray, and fairly high resolution. It’s a nice graphic LCD. It’s so much easier to use, and introduces stuff like the ability to “arrow” around a screen, as opposed to trying to use obscure key combinations. I’d actually love to see something like a 2″ by 2″ e-ink display (which, in addition to looking amazing, would reduce power usage), but it’d be a pain since it’s slow to redraw.

Motorola’s MDC1200 technology is practically ubiquitous in the public safety industry, transmitting a 1200 bps data burst containing a four-digit identifier. This could be so easily improved. Put a little $20 GPS chip in it, and have it transmit GPS coordinates on each transmission. (You could also include stuff like battery level, if on a portable, and information on received signal strength. The latter would be useful to run in the background and plot a map of the radio system’s reach.)
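To get a feel for how little data this is, here is a back-of-the-envelope sketch in Python. The packet layout is entirely my invention and has nothing to do with the real MDC1200 format. It assumes the four-digit ID fits in 16 bits, stores latitude and longitude as integers in units of 1/100,000 of a degree, and ignores the sync, framing, and error correction a real signaling scheme would add.

```python
import struct

# Hypothetical layout, big-endian with no padding:
#   H  unit ID (16 bits)
#   i  latitude  * 100000 (signed 32 bits)
#   i  longitude * 100000 (signed 32 bits)
#   B  battery level, percent
#   b  received signal strength, dBm (signed)
STATUS_FORMAT = ">HiiBb"
BITS_PER_SECOND = 1200


def encode_status(unit_id, lat, lon, battery_pct, rssi_dbm):
    return struct.pack(
        STATUS_FORMAT,
        unit_id,
        round(lat * 1e5),
        round(lon * 1e5),
        battery_pct,
        rssi_dbm,
    )


def decode_status(payload):
    unit_id, lat, lon, battery_pct, rssi_dbm = struct.unpack(STATUS_FORMAT, payload)
    return unit_id, lat / 1e5, lon / 1e5, battery_pct, rssi_dbm


def airtime_seconds(payload):
    # Raw payload time only; real bursts carry extra overhead.
    return len(payload) * 8 / BITS_PER_SECOND


payload = encode_status(0x1234, 42.36010, -71.05890, 87, -95)
print(len(payload), "bytes,", airtime_seconds(payload), "seconds of airtime")
print(decode_status(payload))
```

Under those assumptions the whole status report is a dozen bytes, which is a fraction of a second even at 1200 bps.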

Programming is always a pain. Some of Motorola’s radios are programmed in ways that are so obscure that they border on comical. (I think the goal there is security.) I want to write an XML file for my radio. Put a USB port on the side of the radio. Let me hook it up to a computer, or just plug a thumb drive in and reprogram from that. But consider bigger problems. Boston PD switched to an “improved” channel lineup last year. Apparently they worked for weeks to pull radios in at the end of a shift, load up the new set of data, but leave the radios set to the old configuration, until all the radios had the new programming in them. And then, at a quiet time one day, they broadcast a message telling officers how to switch to the new configuration. Over-the-air programming is possible, but it’s generally used in some specific situations. (OTACS, Motorola’s Over The Air Channel Steering, to direct a radio to switch to a particular channel, and OTAR, Over the Air Rekeying, to send new encryption keys to the radios.) Why not let the system send out bursts of programming data when the radio system is idle, loading up new programming data in the background, until they’re ready? Obviously, all of these programming things need some security constraints, but that’s trivial to implement.

I’m pretty confident that software-defined radio is going to become ubiquitous in the next decade, but no one’s really making use of it yet, except for uber-geeks in labs. APCO’s Project 25 digital voice (IMBE) has emerged as a standard in digital voice, but it’s meant to be made obsolete in the future by a “Phase II” implementation. Various other technologies have come and gone, such as Motorola’s VSELP. And there exist myriad trunking protocols for larger networks. I want to embrace SDR and use it in everything, “future-proofing” radios. (Of course companies have an incentive to not future-proof their hardware, forcing people to upgrade… But you can still make your money on selling software upgrades!)

Oh, and put an SD slot on the darn thing. Record the audio it receives, letting people play back transmissions they miss. Or host applications. (Or, permit programming!)

Televisions

LCD and plasma TVs are becoming increasingly popular, costing between $1,000 and $3,000.

If you have that budget in mind, something I’ve wanted to do for a long time suddenly becomes viable: buy a projector and mount it on your ceiling. Of course, only the very high-end projectors will do the 1920×1080 that 1080i and 1080p do, but 1024×768 is very doable for under $1,000, and the difference in resolution shouldn’t be all that noticeable. And then you’ve got something like a 100″ screen. Wow-a-wee-wow!

The caveat, of course, is that few (if any?) projectors include tuners, so you’d have to set up a PC for that, something like a Mythbox. But one can be put together for around $500, and that naively assumes that you don’t already have a spare computer with a tuner card or two.

School

For reasons that even I don’t understand, I find myself thinking a lot about improving schools. And yesterday was one of those joyous experiences where several different thoughts suddenly overlapped, forming something new.

One of my professors is an adjunct professor who teaches at several different schools. And she was talking about how it seemed to her that a decent number of prestigious schools focus too much on theoretical and abstract concepts, but not so much on real-life applications. This nicely sums up one of the areas in which I’d like to see grade schools improved.

  • Gym class was universally an utter waste of time. I suppose it got me moving a little bit. But watching football or basketball on TV, I realize that I still don’t understand the finer points of the game. How come this never came up in gym class? And, perhaps more significantly, I’ve been exercising a bit. I lift weights a few times a week, and am looking forward to nicer weather so I can take up jogging again. (Yes, I should just go to the gym and use a treadmill. But it’s not the same.) Why didn’t I do this in gym class? Why did we spend so much time on badminton? Why is there an “n” in the middle of badminton?
  • I can’t speak for others, but trigonometry was among my least favorite classes ever. Furthermore, I’ve never applied it anywhere. The only time it came up in subsequent classes was when we integrated trigonometric functions, and at that point, no one had any clue what we were doing anyway. But why not replace a math class with zero practical applications with a finance course? Not until I took a finance course here in my sophomore year did I truly learn about things like compound interest and the time value of money. Every person in America needs to know this. You have $1,000 sitting in your bank account. How much will you make if you put it in a one-year CD at 4.25%? And you graduate and go to buy a $250,000 condo, taking out a 30-year mortgage. What will be your monthly payments at 6% interest? What if you get 4.5%? What if you get stuck at 8%? And, when you’re done with that, how much do you pay over the lifetime of the mortgage? (Hint: at 6% annual interest, you pay almost exactly $1,500/month, for $539,000+. That means that your interest is more than 100% of the principal.) You can bring up usury laws, and the fact that national banks, et alia, got themselves exempted from them, and segue into credit cards. Why did no one teach me to balance a checkbook? (Okay, it’s easy. But still…)
  • I want to learn either the guitar or the piano. And I suspect that, if you went to middle or high school, you’d find lots of people who shared my interest. What the heck happened in music class? How did I pass music class without understanding how to read music, and without being able to play anything other than the recorder in 5th grade?
  • What are geography classes teaching people? Why, when I graduated high school, did I still have no clue where Iraq was on the map? Similarly, what the heck happens in civics and such? Why wasn’t I made to read the Declaration of Independence? Why don’t I know the Amendments cold? I think I should be able to yell “14th Amendment!” to any high school graduate and have them talk about its exact meaning, including due process and equal protection. 22nd Amendment? What President was it enacted in response to? When (ballpark) was it ratified? What states refused to ratify it? (Hint: Massachusetts was one of two.) What’s required to add a Constitutional Amendment?
  • Why are we so reliant on calculators? Last semester we were looking to bring a Presidential speaker, and contemplated opening it to the public to make sure we filled the crowd. A friend pulled out a calculator. “If we charge $5 a ticket, and get 100 people…” He plugged the numbers in. “That’s $500 to defray the costs.” Not until I called him on what he’d done did he even realize the absurdity of using a calculator for 5 x 100. But it’s not that he’s stupid. It’s that we’re all so dependent on them. All the time I’ll start to plug some numbers into a calculator and solve it in my head before I finish typing it in. I think higher math classes need to give Math Minutes again. The kids might think you’re nuts for doing it in calculus class, but it’s necessary!
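The finance questions in the trigonometry bullet above take only a few lines to work out. This is a sketch in Python with two assumptions I should state: the CD compounds once a year, and the mortgage is an ordinary fixed-rate loan with monthly payments, using the standard amortization formula (payment = P·r / (1 − (1 + r)^−n), with r the monthly rate and n the number of payments). The function names are mine.

```python
def cd_interest(principal, annual_rate, years=1):
    """Interest earned on a CD, assuming annual compounding."""
    return principal * (1 + annual_rate) ** years - principal


def monthly_payment(principal, annual_rate, years=30):
    """Fixed monthly payment on a standard amortizing mortgage."""
    r = annual_rate / 12   # monthly interest rate
    n = years * 12         # number of payments
    return principal * r / (1 - (1 + r) ** -n)


print(f"CD: ${cd_interest(1_000, 0.0425):,.2f} in interest after one year")

for rate in (0.045, 0.06, 0.08):
    payment = monthly_payment(250_000, rate)
    total = payment * 360
    print(
        f"{rate:.1%}: ${payment:,.2f}/month, "
        f"${total:,.0f} total, ${total - 250_000:,.0f} of it interest"
    )
```

Running the three rates side by side is the real lesson: the condo is the same in each case, but the lifetime cost is wildly different.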

Pollution

I don’t consider myself a ‘hardcore environmentalist,’ but I’m not sure there’s anyone on the planet who wouldn’t agree that this is absurd.

It could be easily fixed, too, if someone (Indonesian government? UN? Environmental groups?) were willing to pay a bit. Hand out nets, and offer a nominal amount of money for each pound of garbage pulled out of the river. 5 cents a pound? Figure that they can get at least 100 pounds of garbage in a big net, in probably twenty minutes of work. You just need to drag it behind you until it’s full.

I’m sure that pulling all the garbage out of the river won’t instantly cure it of its problems. (Currently, even fish can’t live in it.) But I’m also pretty confident that pulling all of the garbage out of the river would be an improvement over leaving it in…