This Is My Hobby

I want to start a “meta ISP.”

When you sign up with your ISP, you’re paying for transit. They carry your data from one network to the other.

But now let’s say that I’m a mediocre residential ISP. I buy connectivity from a couple different upstream providers, and use BGP to make sure your data takes the fastest route. This is what most people do. It works.

Let’s further say that you run an extremely popular site, maybe one of the top 100 sites out there. You have a mediocre IT team. You have enormous bandwidth, coming in from three different carriers. You, too, use BGP to make sure that your outgoing traffic takes the quickest route.

So everything works. Traffic flows between the two networks. What’s the problem?

Well, it turns out that you, Mr. Big Site, have some of your core routers in a major data center out this way. And I, Mr. Big ISP, also have a few core routers in that building. This is really pretty common–there’s a (very aptly-named) network effect with transit. When several big guys move into a building, all of a sudden, more people want to be there too. So you get sites like One Wilshire, a thirty-story building in LA full of networking equipment. They’re very confidential about their tenants, but “word on the street” is that every network you’ve heard of, and plenty you haven’t, is in there. (When viewing that picture, by the way, it’s worth noting that these wires don’t go to some secretary’s PC. Each is probably carrying between 100 Mbps and 10 Gbps of traffic between various ISPs and major networks… Also an interesting note to the photo, they supposedly keep an elaborate database and label each wire, so that this huge rat’s nest is actually quite organized.)

Since we’re both huge companies, we’re each paying six figures a month on Internet. But when one of my customers views your site, they go through a few different ISPs, and across multiple states, before it arrives on your network. It’s asinine, but that’s how the networks work.

So we wise up to this. I call you up, and we run a Gigabit Ethernet line between our racks. And all of a sudden, life is peachy. Data travelling over that line–my customers viewing your site–is free. My bandwidth bills drop, and speeds improve, too. This is the world of peering. And, strangely, the mutually-beneficial practice is rarely done.

I think there’s a market for a big middleman here. The last mile (that would be a good book title, if a telecom magnate wanted to write his memoirs) is difficult–running lines to consumers’ homes. Similarly, it’s hardly trivial to become a Tier 1 ISP, a sort of ‘core backbone’ of the Internet. But an intermediary broker? Easy enough to do.

So you’d get space in the major exchanges, and peer with popular sites. Google, Yahoo, MSN, Youtube, Facebook, eBay, Myspace, Amazon, Akamai, etc.

Digital Photo Recovery

I just discovered PhotoRec, a tool for recovering digital camera images.

For the non-geeks, a quick basic background…. When you save a file, it writes it to various blocks on the disk. Then it makes an entry in the File Allocation Table, pointing to where on the disk the file is. When you delete a file, the entry is removed from the File Allocation Table. That’s really all that happens. The data is still there, but there’s nothing pointing to where on the disk it is. This has two implications. The first is that, with appropriate tools and a little luck, you can still retrieve a file that you’ve deleted. (Whether this is comforting or distressing depends on your perspective…) The second is that, with no entry in the File Allocation Table, it’s seen as “free space,” so new files saved to the disk may well end up getting that block. It’s technically possible to recover stuff even after it’s been overwritten, but at that point it’s much more complex and much more luck is involved.
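To make that concrete, here is a toy model in Python. It is only a sketch of the idea, not how a real filesystem or PhotoRec is implemented: the ToyDisk class, its one-block-per-file layout, and the file names are all made up for illustration. The only real-world detail is that JPEG files begin with the bytes FF D8 FF, which is the sort of signature a recovery tool can scan for.

```python
# Toy model of a FAT-style disk: raw blocks plus a table that points at them.
# Hypothetical and simplified: every file fits in exactly one block.

JPEG_MAGIC = b"\xff\xd8\xff"  # real JPEG files start with these bytes


class ToyDisk:
    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks  # raw storage
        self.table = {}                   # filename -> block index

    def save(self, name, data):
        # Write to the first block that no table entry points to.
        used = set(self.table.values())
        index = next(i for i in range(len(self.blocks)) if i not in used)
        self.blocks[index] = data
        self.table[name] = index

    def delete(self, name):
        # Only the table entry is removed; the block itself is untouched.
        del self.table[name]

    def carve(self):
        # Ignore the table and return every block that looks like a JPEG.
        return [b for b in self.blocks if b.startswith(JPEG_MAGIC)]


disk = ToyDisk(4)
disk.save("dinner.jpg", JPEG_MAGIC + b"dinner")
disk.save("party.jpg", JPEG_MAGIC + b"party")

disk.delete("dinner.jpg")
recovered_before = disk.carve()            # the deleted photo is still found

disk.save("new.jpg", JPEG_MAGIC + b"new")  # lands in the freed block
recovered_after = disk.carve()             # the deleted photo is gone
```

The second carve() shows why you stop using the card right away: the new photo went straight into the block the deleted one used to occupy.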

Last night we went out to dinner… We took lots of photos, but some were deleted. So I figured PhotoRec might recover them. So I gave it a try.

The filesystem shows 163 photos. After running PhotoRec, I have 246 photos. What’s odd is what photos I have. It’s not the ones from last night. They’re scattered from various events, and several are from almost two months ago.

This does leave us with an important tip, though: if you delete an essential photo, stop. Each subsequent thing you do to the disk increases the odds of something overwriting it. In a camera, just turn it off. Taking more photos seriously jeopardizes your ability to recover anything.

In my case, I didn’t have anything really important… I just wondered how it would work. And I got strange results for recovered files. (Which has me wondering a lot about how its files get written out to disk, actually.) But it’s good knowledge for the future. (By the way, PhotoRec runs under not just Linux, but also, apparently, Windows, and most any other OS you can imagine.)

Thick & Thin

Kyle has an awesome 22″ LCD. His is probably 6″ thick in the back.

I have a 14.1″ laptop. My LCD is probably less than 1/2″ thick in the back.

Why is this? My first guess was the power supply. But I’ve seen a few LCDs where they’ve taken out the power supplies and put them on the floor. And frankly, you could fold up my whole laptop, glue the power brick onto the back, and it wouldn’t be any thicker than stand-alone LCDs.

Imagine if someone made a 22″ LCD that was 1/2″ thick. Ceteris paribus, I’d buy one.

I suppose one “disadvantage” is that I could probably, if I tried, grip my LCD at each corner, flex with all my might, and ruin the LCD. If I tried that on Kyle’s (don’t worry, I won’t!), I doubt I could. It’s inside a huge frame. But I don’t buy that this is a reason to not provide it.

Steve, I’m counting on you.

Lottery

I lost the lottery last night. Actually, I never even entered. The drawing was for $270 million (30-year annuity), or $164.3 million cash. The stakes were high, but I still didn’t dare risk life and limb to go buy lottery tickets. Further, I’d have to clear my car off and I was feeling pretty lazy. I hoped it would roll over. Sadly, this was not the case.

But some of us were talking about what we’d do if we won. Not the “I’d buy an awesome house or lease a 10GigE line” stuff, but the financial planning stuff. First, we started the debate over whether you take the cash or the annuity. $270 million over 30 years is $9 million a year. Not to knock $9 million, but if I won $270 million, $9 million a year would seem pretty pathetic. So I pulled out my trusty old financial calculator.

Let’s call the $164.3 million a nice even $150 million. You take $14.3 million off the top to get your indulgences out of the way. At 4% interest (realistic enough for something like a treasury note) over 30 years, you’d make $500,000 interest a month. (Assuming that you took the $500,000 out each month to spend.) That’s $6 million a year, or two-thirds of what you’d be getting if you took the annuity. Except you’ve already paid cash for a house overlooking Hollywood, shared your riches with your friends and family, and bought a couple cars too. Conversely, if you didn’t take your $500,000 a month out, you’d have just shy of $500 million in the bank at the end of 30 years.
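For anyone without a financial calculator handy, the same arithmetic can be sketched in a few lines of Python. The one assumption not stated above is monthly compounding, which is what lands the untouched balance just under $500 million; taxes are ignored entirely.

```python
# Figures from the post; monthly compounding is an assumption.
principal = 150_000_000        # cash option after $14.3 million of indulgences
annual_rate = 0.04             # the treasury-note-ish rate used above
monthly_rate = annual_rate / 12
months = 30 * 12

# Case 1: withdraw the interest every month and spend it.
monthly_interest = principal * monthly_rate
yearly_interest = monthly_interest * 12

# The annuity option, for comparison.
annuity_per_year = 270_000_000 / 30

# Case 2: never touch it and let it compound for 30 years.
balance_if_untouched = principal * (1 + monthly_rate) ** months

print(f"Interest per month:      ${monthly_interest:,.0f}")
print(f"Interest per year:       ${yearly_interest:,.0f}")
print(f"Annuity per year:        ${annuity_per_year:,.0f}")
print(f"Untouched after 30 yrs:  ${balance_if_untouched:,.0f}")
```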

So I’d definitely have taken out the cash.

The other thing I overlooked in the past is the concept of using annuities. I might like to host a scholarship, for example. I’d want $50,000 each year to fund it. Assuming the same 4% interest, you’d sock aside ($50,000 / 0.04), or $1.25 million. And then each year take out the $50,000 accumulated interest.
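The scholarship math is just a payout divided by a rate, but it is handy to have as a function when playing with different numbers. A small sketch, where endowment_needed is a name made up here and the rate is assumed to hold forever:

```python
def endowment_needed(annual_payout, annual_rate):
    """Principal whose yearly interest alone covers the payout."""
    return annual_payout / annual_rate


scholarship_fund = endowment_needed(50_000, 0.04)
print(f"Set aside: ${scholarship_fund:,.0f}")
```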

But, alas, I didn’t win. So no 10GigE to my home (probably not feasible anyway–transit costs are much less than I expected, but the cost of a fiber circuit to your home isn’t accounted for, and I’m not sure you can just call up Verizon and ask for them to light up a 10GigE link to LAIIX for you…)

Dork

Yesterday was, for all intents and purposes, a snow day. They closed the school down at 1. Of course, I had no classes anyway, just some work that could be done anywhere. But this was a snow day. You don’t do work. At least, not the work you’re supposed to.

Kyle, always being curious about the hardware side of things, sent me a link to the RoomWizard downloads page after fishing out the hardware specs elsewhere. There were two things that interested me–one was that you could download a firmware image. The other was that they had a PDF of how to use their API.

Wait… API? That means… it’d be trivial to write an interface to these things!

The problem is that the manual never mentions the actual address of the API, which is just accessed over HTTP and returns XML. They give a few examples–/rwconnector is used most often. But alas, /rwconnector on these throws a 404.

Somewhat discouraged, I started poking around the firmware image. It’s a .tar.gz, and extracts… a (fairly) normal Linux filesystem. Besides some juicy stuff that I hope admins are instructed to change (there are several privileged user accounts), I also found some neat stuff. For one, it’s based on SuSE, but a very trimmed-down version. And it’s basically a full-functioning Linux machine, including an SMTP server, Apache Tomcat, etc.

But then I hit gold. There’s a configuration file for Tomcat, which mentions one URL of /Connector. So I fired it up and tried it on one of the systems. Bingo!

So then I read a bit more of the API manual. It’s actually very simple–you can retrieve, edit, and delete bookings. (Editing and booking through the API don’t let you do anything you can’t do via the web interface, by the way, lest anyone think this is a security flaw.) You get an XML document back with results.

So then I had to figure out how to get PHP to parse XML. It turns out that PHP actually has several ways to do it, including SimpleXML and DOM objects. I spent a while learning it and by the end of the day, I had a prototype working that would get reservations for the next 24 hours and parse out the information. (Small tip–don’t try to “escape” colons when dealing with XML. They denote a namespace. When you get rb:name, for example, the tag name is just name, in the rb namespace. Knowing this a little sooner would have saved me about half an hour of, “This code is so simple! Why doesn’t it work?!”)
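The namespace tip is easier to see in code. This is a sketch in Python rather than the PHP I actually used, and the XML is invented: the namespace URI and the bookings/booking element names are placeholders, not the real RoomWizard response. Only the rb:name tag comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical response; the URI and most element names are placeholders.
SAMPLE = """<rb:bookings xmlns:rb="http://example.com/rb">
  <rb:booking>
    <rb:name>Study group</rb:name>
  </rb:booking>
  <rb:booking>
    <rb:name>Project meeting</rb:name>
  </rb:booking>
</rb:bookings>"""

# Map the prefix to its URI instead of trying to escape the colon.
NS = {"rb": "http://example.com/rb"}

root = ET.fromstring(SAMPLE)
names = [el.text for el in root.findall("rb:booking/rb:name", NS)]
print(names)
```

Internally the parser stores the tag as the namespace URI plus the local name, which is why treating "rb:name" as one literal string never matches anything.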

The next step is to insert all of this into an SQL database, and then write a nice viewer for it. And also to experiment with adding bookings, although that should just require changing a line of code.

I haven’t actually written code to do timing, but it feels like it’s 1-2 seconds for me to get the XML data back, which suggests that the bottleneck is in its little database. Short-term, I want to write myself a little interface that will parse all the data, cache it, and give me a faster interface. Long-term, I want to try to see if I can get the library to adopt this, and have it be the booking mechanism. You can store them to a local database, and then have a background process use the API to push reservations out to the respective RoomWizards, so that they continue to function normally. But when people view the page, it’ll just get it all from the local database, meaning that the whole “Get the listings via API” thing is no longer necessary. (Unless you want to rebuild the database in case of a disk failure!)

Criticizing Web Apps

As long as I’ve posted a lengthy diatribe about how awful the library room-booking web interface is, there are two more that drive me nuts.

We have a way of putting in work orders for maintenance. Last semester I tried to open one of our windows and it just fell out. This semester, we had three different light fixtures burn out in two days’ time. So you go online and put in a work order. This is a great thing to have web-based. Except they picked this insane system that opens multiple browser windows, resizes your browser (?!), uses copious JavaScript requiring you to double-click on links… And it only works in IE. Oh, and there are irritating things that could be fixed with one line of code… You log in with your student ID, which is eight digits that inexplicably have an @ sign in front of them. So they have this big note on how you cannot use the at sign, you must only use your eight-digit number. One line of code could just strip it out if it was included.
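For the record, the one line in question would look something like this. It is a Python sketch, since I have no idea what the real system is written in, and normalize_student_id is a made-up name:

```python
def normalize_student_id(raw):
    # Drop surrounding whitespace, then any leading "@".
    return raw.strip().lstrip("@")


print(normalize_student_id("@12345678"))
```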

Much like booking library rooms, submitting help tickets is a Programming 101 exercise. In fact, it’s easier than the library interface, because you don’t have to do time calculations. You have an employees table, a clients table, and a work table. Tasks get entered into work by the client, and the staff assigns an employee to it. And when it’s done, you set work.status to “Complete,” a simple ENUM field. This is like 45 minutes of coding, although I’d probably spend more time prettying up the interface.
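Here is roughly what I have in mind, sketched with Python's built-in sqlite3 module so it runs anywhere. The three tables and the "Complete" status come from the description above; the column names and the other status values are assumptions. SQLite has no ENUM type, so a CHECK constraint stands in for one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE clients   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE work (
    id          INTEGER PRIMARY KEY,
    client_id   INTEGER NOT NULL REFERENCES clients(id),
    employee_id INTEGER REFERENCES employees(id),  -- NULL until staff assign someone
    description TEXT NOT NULL,
    -- stand-in for an ENUM; 'Open' and 'Assigned' are assumed values
    status      TEXT NOT NULL DEFAULT 'Open'
                CHECK (status IN ('Open', 'Assigned', 'Complete'))
);
""")

# A client enters a task.
conn.execute("INSERT INTO clients (id, name) VALUES (1, 'Student')")
conn.execute("INSERT INTO employees (id, name) VALUES (1, 'Maintenance tech')")
conn.execute(
    "INSERT INTO work (id, client_id, description) VALUES (?, ?, ?)",
    (1, 1, "Window fell out of its frame"),
)

# Staff assign an employee, and later the job is marked done.
conn.execute("UPDATE work SET employee_id = 1, status = 'Assigned' WHERE id = 1")
conn.execute("UPDATE work SET status = 'Complete' WHERE id = 1")

final_status = conn.execute("SELECT status FROM work WHERE id = 1").fetchone()[0]
print(final_status)
```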

Then there’s the computer help desk, another web app. For one thing, all the links to it point to an http:// URL. But if you actually use them, it barfs up an error that you must view it over a secure channel. Being a web dork, I just tack an “s” onto the end of “http” and life is golden. Someone who’s not so good with computers, and who’s already at wits’ end with their computer, is probably going to break down and cry, because even the help desk webpage doesn’t work for them.

This, too, only works in IE. In this case, they didn’t have copious bizarre crap (like requiring double-clicking on links), so I set Firefox to pretend it was IE. The page loads okay, but looks terrible, with nothing lining up right. IE and, well, the rest of the world, have differing views on how lots of things are done, but requiring IE really isn’t the best solution. Oh, and as an added bonus, they control your mouse cursor, preventing it from indicating links in any manner. This means that someone took time to write code that does nothing but decrease usability.

But worst of all, even if you use IE like they demand, if you actually try to click on any tickets to view them, you get taken to a random system with a long canonical hostname, which just throws you “HTTP 400 – Bad Request.”

So last night, I submitted a help desk ticket indicating that the help desk is broken. Because, frankly, it doesn’t work. All of its internal links take you to the wrong server (or, seemingly, the right server but with the wrong hostname), and that’s assuming you’re smart enough to get in, by understanding the error indicating that you need to use HTTPS, not HTTP.

Most of these things are sold as turnkey devices, it seems. Maybe I should start a company making them. Apparently, no technical expertise is required to do so.

RoomWizard

Even though I go to a business school and am a management major, my real passion is working on websites.

We just built a new library here, for millions and millions of dollars. We use a tool called RoomWizard for booking rooms. We get a web-based interface to book library rooms. This is a great idea. Unfortunately, it’s so fraught with bugs that it borders on unusable.

The main “bug” is that it’s basically so slow that it’s unusable. I tried viewing the source, and it’s got a HUGE block of JavaScript that’s a pain to read. Most of the page is being generated on the fly with JavaScript. There are times when this is the best way to do something. This is not one of them.

My current understanding–I may be wrong, since I’m still trying to make sense of this–is that each of the touch-screen units on the wall is a webserver. It’s responsible for storing all of its reservations. So when you view the main page, JavaScript has you going out to each of the 20+ rooms and requesting their status. The problem is that this takes forever, probably at least 15 seconds. By the time the page has finished drawing, it’s about time for the 60-second refresh to kick in.

I did a bit of viewing headers. The main page is running on ASP.net, but each individual room controller (probably like a 300 MHz embedded chip?) is running Apache Tomcat. Someone did a quick port scan and found that the devices have a lot of open ports–ftp, ssh, telnet (!), HTTP, and port 6000, which nmap guessed was X11. So I have a pretty good feeling these things are running embedded Linux.

Another problem is that there’s always one or two of the devices that, for whatever reason, are unreachable. So you get errors on those ones.

Booking conference rooms is like a Web Programming 101 interface. You get a basic introduction to SQL databases, and write a little interface. You could run this on an old 1 GHz PC with 128 MB of RAM and have pages load in fractions of a second, especially if you really knew how to configure a webserver. (Turn on APC and MySQL query caching, in this case, and you’re golden.) I cannot fathom why they thought it was a good idea to have a page make connections to 25 different little wall-mounted touchscreens. This places a big load on what have got to be underpowered little units, and is just a nightmare any way you look at it. I really see no benefit to what they’re doing.

Furthermore, this breaks off-campus connections, since you can’t connect to these units remotely.

The fix seems simple: you convert the wall-mounted RoomWizards from embedded webservers into little web browser clients, and they just pull down the data from the main server.

With a traditional, single database, it would also be easy to write a little search tool–“I need a room on Friday from 3:00 to 5:00.” This is a fairly simple SQL query. This is not a fairly simple question to ask 25 wall-mounted touchscreen things.
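As a sketch of how simple that query is, here it is against an in-memory SQLite database in Python. The schema, room names, and dates are all invented for illustration. The trick is the standard overlap test: a booking conflicts with the request if it starts before the request ends and ends after the request starts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rooms (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE bookings (
    id        INTEGER PRIMARY KEY,
    room_id   INTEGER NOT NULL REFERENCES rooms(id),
    starts_at TEXT NOT NULL,   -- 'YYYY-MM-DD HH:MM', which sorts correctly as text
    ends_at   TEXT NOT NULL
);
INSERT INTO rooms (id, name) VALUES (1, 'Room 101'), (2, 'Room 102'), (3, 'Room 103');
INSERT INTO bookings (room_id, starts_at, ends_at) VALUES
    (1, '2008-02-01 14:00', '2008-02-01 16:00'),
    (2, '2008-02-01 17:00', '2008-02-01 18:00');
""")


def free_rooms(conn, start, end):
    """Rooms with no booking overlapping the half-open interval [start, end)."""
    rows = conn.execute(
        """
        SELECT name FROM rooms
        WHERE id NOT IN (
            SELECT room_id FROM bookings
            WHERE starts_at < ? AND ends_at > ?
        )
        ORDER BY name
        """,
        (end, start),
    )
    return [row[0] for row in rows]


# "I need a room from 3:00 to 5:00."
available = free_rooms(conn, "2008-02-01 15:00", "2008-02-01 17:00")
print(available)
```

Room 101 is excluded because its 2:00 to 4:00 booking overlaps the request; Room 102's booking starts exactly at 5:00, so it does not count as a conflict.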

I’m tempted to write a little PHP script to go out, retrieve the data, and cache it. Essentially a hacked-together proxy…
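The caching half of that idea is small enough to sketch. This version is in Python rather than PHP, and it leaves out the actual HTTP request: fetch is whatever function retrieves one room's data, and CachedFetcher, fake_fetch, and the 60-second lifetime are all names and values made up here.

```python
import time


class CachedFetcher:
    """Wraps a slow fetch function and reuses results for ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=60, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # room -> (time fetched, data)

    def get(self, room):
        now = self._clock()
        hit = self._cache.get(room)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]               # still fresh, skip the slow call
        data = self._fetch(room)        # slow path: ask the device
        self._cache[room] = (now, data)
        return data


# Stand-in for the real request to a wall unit.
calls = []


def fake_fetch(room):
    calls.append(room)
    return f"<bookings room='{room}'/>"


fetcher = CachedFetcher(fake_fetch, ttl_seconds=60)
first = fetcher.get("room-1")
second = fetcher.get("room-1")          # served from the cache
print(len(calls))
```

A unit that is unreachable would still need its own handling, such as serving the last good copy; this sketch does not attempt that.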

Radio

I’m a long-term radio geek, and I’ve realized that the technology interests me more than actually using it. Having worked with lots and lots of radios (I realized that I have three sitting on my desk, all of which I have used in the past 30 minutes), I’ve concluded that I’d like to start a radio company. Our motto would be, “Our radios don’t suck.”

One of my radios is a ham radio, which is front-panel programmable (FPP), meaning that you can punch in frequencies on the keypad. This is pretty common with ham radios. By contrast, land-mobile radios (things that, say, a police officer would carry) very rarely have FPP capability; in fact, the FCC frowns on certifying radios with that capability, except for certain federal agencies that need to be able to reprogram their radios in the field. However, it’s often offered as a software add-on. But even using the ham radio, it’s really hard to use. Part of the problem is that the radio’s probably a decade old, and the print on the keypad has worn off. So I’m guessing at what buttons do.

There are very few radios with a graphic LCD. Dot-matrix LCDs almost seem cutting-edge in the radio world. By contrast, try to find a cell phone that doesn’t have a big color LCD on it. I have an old Garmin GPS III, and still admire that screen. I think it’s four shades of gray, and fairly high resolution. It’s a nice graphic LCD. It’s so much easier to use, and introduces stuff like the ability to “arrow” around a screen, as opposed to trying to use obscure key combinations. I’d actually love to see something like a 2″ by 2″ e-ink display (which, in addition to looking amazing, would reduce power usage), but it’d be a pain since it’s slow to redraw.

Motorola’s MDC1200 technology is practically ubiquitous in the public safety industry, transmitting a 1200 bps data burst containing a four-digit identifier. This could be so easily improved. Put a little $20 GPS chip in it, and have it transmit GPS coordinates on each transmission. (You could also include stuff like battery level, if on a portable, and information on received signal strength. The latter would be useful to run in the background and plot a map of the radio system’s reach.)

Programming is always a pain. Some of Motorola’s radios are programmed in ways that are so obscure that they border on comical. (I think the goal there is security.) I want to write an XML file for my radio. Put a USB port on the side of the radio. Let me hook it up to a computer, or just plug a thumb drive in and reprogram from that. But consider bigger problems. Boston PD switched to an “improved” channel lineup last year. Apparently they worked for weeks to pull radios in at the end of a shift and load up the new set of data, while leaving the radios set to the old configuration until all the radios had the new programming in them. And then, at a quiet time one day, they broadcast a message telling officers how to switch to the new configuration. Over-the-air programming is possible, but it’s generally used in some specific situations. (OTACS, Motorola’s Over The Air Channel Steering, to direct a radio to switch to a particular channel, and OTAR, Over the Air Rekeying, to send new encryption keys to the radios.) Why not let the system send out bursts of programming data when the radio system is idle, loading up new programming data in the background, until they’re ready? Obviously, all of these programming things need some security constraints, but that’s trivial to implement.

I’m pretty confident that software-defined radio is going to become ubiquitous in the next decade, but no one’s really making use of it yet, except for uber-geeks in labs. APCO’s Project 25 digital voice (IMBE) has emerged as a standard in digital voice, but it’s meant to be made obsolete in the future by a “Phase II” implementation. Various other technologies have come and gone, such as Motorola’s VSELP. And there exist myriad trunking protocols for larger networks. I want to embrace SDR and use it in everything, “future-proofing” radios. (Of course companies have an incentive to not future-proof their hardware, forcing people to upgrade… But you can still make your money on selling software upgrades!)

Oh, and put an SD slot on the darn thing. Record the audio it receives, letting people play back transmissions they miss. Or host applications. (Or, permit programming!)

Intuitive

GRE, a (radio) scanner company that makes a lot of the scanners Radio Shack sells, also sells some under their own name.

This new one advertises an “Intuitive ‘Object Oriented’ User Interface Design,” which brings all the fun of OOP to a GUI. The picture of the radio reads “Press NEW to create objects,” and has three softkeys, labeled “NEW,” “EDIT,” and “GLOB.”

I’ll reserve final judgment until I play around with one, but, on the surface, this seems anything but intuitive.

Resizing Images and HTML

This post is meant for webmasters, and it addresses a startlingly common problem: images included on pages and “resized” only in HTML.

The basic tag to include an image, of course, is <img src="something.jpg">. That will include something.jpg on the page.

But say that the image is 1600×1200 pixels (about 1.9 megapixels: big enough to fill your screen and then some, at least for most people). This is way too big to put on your webpage. So what do people do? They add width and height attributes to the tag, something like <img src="something.jpg" width="…" height="…">, to resize it. This is a very, very bad way of doing it.

The problem is that this shows a fundamental misunderstanding of what the height and width attributes do. They’re essentially ‘hints’ for the browser. The web browser, when it sees an image in your HTML, will download the whole image. In this case, it’ll download your 1600×1200 image, which is probably about 500kB in size. (God help us if you have a whole series of these photographs on your page.) When it sees a mismatch between the specified height and width attributes and the image’s actual dimensions, the browser will do a very rudimentary (read: very crappy) resize. So not only are you wasting a ton of bandwidth unnecessarily (which also makes your page load very slowly), but the end product is images that look horrible.

Instead, open the image up in your editor of choice. Photoshop CS3 is wonderful, but those of us who can’t justify spending more than $500 on image editing software may prefer a free tool like Paint.NET. Resize the image to the size you desire, and include that image, newly resized, on your page.
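If you are resizing by hand, the only math involved is picking dimensions that keep the aspect ratio. A small Python sketch of that calculation follows; fit_within and the 400×400 box are made up for illustration, and the resampling itself is still left to your image editor or an imaging library.

```python
def fit_within(width, height, max_width, max_height):
    """Return (w, h) scaled down to fit the box, keeping the aspect ratio."""
    scale = min(max_width / width, max_height / height, 1.0)  # never enlarge
    return max(1, round(width * scale)), max(1, round(height * scale))


thumb_size = fit_within(1600, 1200, 400, 400)
print(thumb_size)
```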

You’ll see multiple improvements: your site will use less bandwidth, your pages will load much faster, and your images will look much better. (Also: I’d encourage you to simply omit the height and width attributes if you’re not sure what you’re doing. Writing perfect HTML, you’d set them to the image’s native dimensions, but so many people screw it up that it’s probably safest to just omit them. Every browser I’ve ever used has handled this seamlessly.)