Wireless Networking

I’ve yet to see an OS get wireless networking right. I’ve now worked pretty extensively with configuring and fixing wireless networking on XP (SP2) and Linux (Ubuntu). And frankly, both are disappointing. A few comments…

  • (Linux) I can see three wireless networks. Why am I connected to none of them, even after clicking on one of them?
  • (Both) I have a good connection, and all of a sudden, I have no network connection. I spent fifteen minutes fiddling and it still won’t come back. Windows includes a “Repair” function, and I’ve seen lots of people use it. I have never had it do anything, nor have I ever seen it work for anyone. I’m fairly certain it’s an inside joke at Microsoft or something.
  • (Both) You reboot and the wireless usually comes back up just fine. What the heck is going on? Why, with all the amazing developers working on both platforms, has no one ever figured out a way to just bring the network down and back up? (Actually, you could argue that both OSs provide this — Linux lets you disable it and re-enable it, and Windows lets you “Repair” it. And yet neither of them works.)
  • (Windows) Why, when you can’t connect, do you give me a fake IP? There’s some bizarre netblock (the 169.254.x.x “automatic private IP addressing” range) that Windows users get put on when they don’t actually have a network connection. What gives?

You’d think that WiFi was some technology that had only been out for a few months… But there have been years to get it right. Why has no one ever made it work right?

Overcoming Errors

I just updated PHP and Apache on this machine. Gentoo seems to have changed the way they do some things… But a few notes along the way, since there isn’t much in the way of helpful links out there…

  • I kept getting this error:
    apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
    No Listening Sockets Available, shutting down.
    Unable to open logs
    The usual suggestion with the “No listening sockets available” is that port 80 is being used by something else. Maybe Apache hasn’t actually shut down and you’re trying to restart it. However, in my case, I had made very certain that nothing was on port 80. The problem is actually caused by the fact that, after an upgrade, the “Listen” directive randomly goes missing. Tucking “Listen 80” into the top of a virtual host fixed everything.
  • Keep backups of all your config files. I screwed up and let them be rewritten as I upgraded Apache.
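For reference, the fix boils down to a couple of lines of Apache configuration. This is a minimal sketch of my setup; the file path and the ServerName value will vary on other systems:

```apache
# /etc/apache2/httpd.conf (or wherever your main config lives)
ServerName localhost    # quiets the "could not reliably determine" warning
Listen 80               # without a Listen directive, Apache has no sockets to bind and shuts down
```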

Mirror Idea

My server is allowed 1 terabyte of transfer a month. I would be shocked if I exceeded 10 GB any month.

Lots of services need mirrors. I’m becoming re-interested in streaming radio stations. Some of the good ones have limited bandwidth and fill up. Most open-source packages have a series of mirrors, too. Most distributions have elaborate mirror networks, in fact.

Here’s what someone should do. Set up a mirror ‘controller.’ I hit a generic name like us.something.com wanting to download something from a US mirror. This goes on all the time, and DNS does round-robin ‘load balancing’ across mirrors.

But you take it a little further. As a site admin with 900+ GB of bandwidth going unused each year, I can sign up and say, “I’ll take up to 25 GB a day,” and “I can spare 6 GB of disk space for mirrors,” and select a list of matching projects. I might end up hosting an Ubuntu mirror. I install a daemon on my server that communicates with the mirror network, and when someone hits the us.whatever.com pool, I’m in the list. But it’ll detect when it’s forwarded me enough traffic for the day and pull me out. Furthermore, the daemon on my machine can also send a “temporarily remove me” notice, either for a duration of time or until further notice. That alleviates my final fear: that even if I don’t exceed disk quotas or bandwidth caps, serving all those files will really tax my system. When to send the signals is entirely up to me.

I’d like to volunteer to help, but I don’t want to blindly commit to something. And because I don’t see any good way to let me commit to help within my means, I have at least 20 GB of disk space and 900 GB of bandwidth that the community can’t use.

It seems like it wouldn’t be that hard, either. It might require a little more CPU power on the part of the mirror network management, but it’s not exceedingly complex. I think a simple PHP script might be easiest… You load, say, us.project.com/project/latest.tar.gz, but latest.tar.gz is actually a script that grabs a list of available mirrors and throws a redirect to the file on one of the mirrors.
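As a sketch of that last bit, here’s roughly what latest.tar.gz-as-a-script could look like as a CGI shell script. The mirror-list path, the example domain, and the URL layout are all made up for illustration:

```shell
#!/bin/sh
# pick_mirror: choose one mirror at random from a list file (one URL per line).
# The default path /etc/mirrors.txt is hypothetical.
pick_mirror() {
    shuf -n 1 "${1:-/etc/mirrors.txt}"
}

# redirect_to_mirror: emit a CGI redirect pointing at latest.tar.gz on that mirror.
redirect_to_mirror() {
    mirror=$(pick_mirror "$1")
    printf 'Status: 302 Found\r\n'
    printf 'Location: %s/project/latest.tar.gz\r\n' "$mirror"
    printf '\r\n'
}
```

A smarter version would weight mirrors by how much of their daily quota remains, rather than picking uniformly at random.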

The irony is that the argument against this idea — that it’d require more servers — is exactly what the problem is trying to solve.

E-mail Gates

This is surely not a revolutionary idea, but I’ve never seen references to it before. I own two domain names with mail services, but I just forward what I need to my GMail account.

Some sites require an e-mail address to sign up, and they send you a confirmation e-mail, so it needs to be real. However, I have no interest in letting them e-mail me long-term. The general solution is a “throw-away address.” You use it once and then delete it.

Here’s an idea that seems a little less wasteful to me. I call it an e-mail gate. I can set up a forwarder pretty easily. As long as I’m logged in, it’s just a few seconds of work. I can also remove a forwarder in a few seconds’ time.

So you might have an address like matt@gate01.ttwagner.com. (This sub-domain doesn’t even exist right now, so don’t bother trying it.) I can “open the gate” (turn the forwarder on), sign up on the site, get my confirmation e-mail, and then “close the gate” by turning the forwarder off. Consequently, any crap they send will bounce back with a “No such address” message. But when I want to get e-mail or a lost password reminder, I can just turn the gate on for a minute. Unless you’re getting inundated with spam (e.g., tens of thousands a day), opening the gate for under a minute won’t be enough time for much crap to get through. Unlike throwaway addresses, you can use the same address at multiple sites, making it easy to remember what address you used.

It’d be neat to write some code to give a web-based interface to this. I think I need to get it working with some mail daemon that supports a MySQL database of users, though, since it currently involves putting the address in a text file, updating the address cache, removing it, and then updating the cache again. It’s quick if you’ve got a shell open, but it’s a bit of a pain to script.
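Here’s a sketch of what the scripted version might look like. It assumes a Postfix-style virtual alias file and postmap for the cache rebuild; the file path, gate address, and destination address are all placeholders:

```shell
#!/bin/sh
# E-mail gate helpers (a sketch, not my actual setup). Assumes Postfix
# virtual aliases; the path, gate address, and destination are hypothetical.
GATE='matt@gate01.ttwagner.com'
DEST='matt@example.com'

open_gate() {
    v=${VIRTUAL:-/etc/postfix/virtual}
    # add the forwarder if it isn't already there
    grep -qF "$GATE" "$v" || printf '%s %s\n' "$GATE" "$DEST" >> "$v"
    postmap "$v"    # rebuild the alias cache
}

close_gate() {
    v=${VIRTUAL:-/etc/postfix/virtual}
    # drop the forwarder; mail to the gate now bounces
    grep -vF "$GATE" "$v" > "$v.tmp" || true
    mv "$v.tmp" "$v"
    postmap "$v"
}
```

A web interface would just be a thin wrapper that calls open_gate and close_gate.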

I Live in a Web Browser

I don’t know why I keep eyeing quad-core systems. With the exception of playing music, copying files from my camera, some word processing, and IM, I live in a web browser. Here are some of the big uses:

  • GMail, my mail client. When I’m at my computer, I almost always have GMail up. I have a client for my Treo that lets me check it there. My school e-mail forwards to GMail. My ttwagner.com and n1zyy.com mail forwards to there.
  • Google and Wikipedia. I rely on Wikipedia way too much. But between Google and Wikipedia, I feel like I can do anything.
  • Google Docs is slowly winning me over. I move between my laptop, ‘public’ Office 2007 computers, and an office computer with Office 2003, so I’m hardly sold on any one particular interface. Google Docs is word processing (and spreadsheets) without the crap, although sometimes I do prefer to have it locally. But honestly, my life depends on the Internet, so ‘safety’ of files (in case I lose Internet access) really isn’t even one of the big issues.
  • Google Calendar has proved way more useful than I expected. It integrates nicely with GMail, sending me reminders and offering to let me schedule things that get e-mailed to me. And Goosync gives me an app on the Treo to sync my Treo calendar with my Google calendar. Bliss!
  • All my good photos end up on Flickr, and I buy and sell stuff on eBay often. I get my news through BBC and Google News.
  • I run a private Wiki. This is more useful than I ever imagined. I’m not quite as committed to it as I’d like, but I’m trying to keep all my class notes up there, which has a lot of benefits. During research, it’s a handy link dump. When drafting a constitution for a club here, I used that to allow collaborative editing.
  • I host a few mailing lists. Trying to keep a text file with 90 names, e-mail them all by hand, remove bounces, and track people down is a pain. Mailman is a savior.
  • I host multiple blogs. These are obvious, but there are some more I’m starting.
    • One, that never caught on, takes a pretty literal definition: a web log. I wanted a way for us to keep track of petty things that were going on, and have everything logged somewhere and searchable.
    • I’m also drafting one for the Democrats. A big part of what we do is outreach/publicity, and a blog is ideal for this.
  • Tonight I realized that none of my ‘task management’ systems worked. So I set up Mantis. It’s not perfect, but it works pretty well. Setting up Bugzilla is pretty intense, but not so with Mantis. The “problem” is that it was intended for software bug tracking, not keeping track of work I have to do, so I have fields like “Reproducibility” and other holdovers from software. I may do a little tweaking. But my plan is that anything I have to do should end up in there. Everything is in one place, and I can slice the data a million different ways: by priority, by category (one for each class, one for each club, one for each major class project, one for “Life”), etc.

Truly, without Firefox and a browser on my Treo, I don’t think I could get by. And I sometimes wonder if it’s worth paying monthly for a dedicated server. But I get so much benefit from the services I host for myself that it definitely is.

Gutsy Applications Menu

Posting this in the hopes that it’ll be useful to someone else, because it certainly took me a long time and caused a lot of frustration.

There’s a bizarre bug that a few people, myself included, have run into when upgrading to Ubuntu’s Gutsy Gibbon release: the applications menu is blank.

Some recommended deleting ~/.config/menus/applications.menu, but, in my case, this didn’t recreate it.

Here’s a tip, though: there’s an /etc/xdg/menus/applications.menu. Copying it to ~/.config/menus/ fixed my problem. And now, I have an applications menu. Hurrah!

Bans for Fun & Profit

The way I use this server gives me a luxury that bigger sites don’t: my visitors come from a select range, and I don’t have to worry much about blocking people erroneously. Therefore I can be quite aggressive in blocking IPs. /etc/hosts.deny is my new favorite file.

When I moderate comments here, I have a few choices… I can approve it, delete it, or mark it as spam. I never got what marking it as spam did… Apparently it doesn’t do much but set a ‘spam’ bit. (I’d hoped it trained Bayesian filters or something, but no such luck.) But what it does do is make it super-simple to construct an SQL query to pull out all the IPs that have posted spam. Add a little more and you get just the IPs this month that had posts flagged as spam. And you drop them in /etc/hosts.deny.

But then I was watching the system log file, and noticed lots of spam coming in. I’m not running much of a mailserver, so most addresses are bouncing. (Especially since they’re spamming addresses that have never existed?)

This is good news, though, for the IP-banhappy out there. Here’s my latest concoction:

grep NOQUEUE /var/log/messages | awk '{print $10}' | \
sed 's/\[/ /g' | sed 's/\]/ /g' | awk '{print "ALL:", $2}' | \
sort | uniq -c | sort -n | tail

In a nutshell, we look for “NOQUEUE” in the log files, pull out the 10th column (the IP), replace the brackets with spaces so it’s just a numeric IP, print it in hosts.deny’s “ALL: address” form, sort it, weed out the dupes with uniq and its -c flag, which has it count the number of times each line occurs, and then sort numerically, so that the list is now sorted by the number of bounces. It’s ascending order, so the top of the output is all the people who’ve only e-mailed one invalid address, and the ‘juicy’ part is the end. So we pipe it to tail, which, by default, shows the last ten lines. The output looks like:

      5 ALL: 219.140.194.117
      5 ALL: 85.130.84.9
      6 ALL: 86.152.15.119
      7 ALL: 83.182.186.224
      8 ALL: 125.181.70.135
      8 ALL: 207.144.11.87
     10 ALL: 125.212.188.156
     15 ALL: 88.238.145.22
     17 ALL: 217.26.169.66
     17 ALL: 62.149.197.247

You could use a little more magic to automatically add the second and third columns to /etc/hosts.deny, but I prefer to do it this way… The reason is that sometimes (not in this example) you’ll see posts from a range of similar IPs. It’s more of a judgment call where you draw the line, so I like to give it the once-over.
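The extra magic might look something like this: a small shell function that turns the reject lines into hosts.deny entries, leaving the final append as a manual step so you still get the once-over. (The log path in the usage comment is my system’s.)

```shell
# noqueue_bans: turn Postfix NOQUEUE reject lines on stdin into
# hosts.deny-style "ALL: address" entries, one per unique IP.
noqueue_bans() {
    grep NOQUEUE \
      | awk '{print $10}' \
      | sed 's/\[/ /g; s/\]/ /g' \
      | awk '{print "ALL:", $2}' \
      | sort -u
}

# usage: review first, then append by hand:
#   noqueue_bans < /var/log/messages | less
#   noqueue_bans < /var/log/messages >> /etc/hosts.deny
```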

Business Idea #49240924

I’m a big fan of things that don’t suck horribly. Sometimes I like to look up song lyrics on the Internet. And there are no sites that I’m a fan of, if you catch my drift…

There are a handful of lyrics sites that always rank highly on Google. But I’d say that 98% of the time, the lyrics contain egregious errors. They completely mishear a line (often in ways that just common sense can show is wrong), or have glaring misspellings, or just typos. And terrible formatting. Always.

I don’t get how a whole industry can be crappy, but that’s beside the point.

There’s a site called WikiLyrics. The founder once commented on a post of mine from years ago, when I called for such a service. But the site is pretty hard to navigate, and looks too much like a wiki.

SongMeanings is the site I like most. The lyrics are usually spot-on. And, best of all, you can post comments on the ‘meaning’ of a song. But they have odd uptime problems, where the site will be down for days at a time. I haven’t been able to get to it for several days now.

If I had a lot of money, I’d buy out the handful of companies that always rank highly on Google for song lyric searches, along with SongMeanings, and develop one site to rule them all. Registered users could edit the lyrics, with some oversight. (My intuition says that music-related stuff is much more prone to vandalism.)

Rather than a bajillion obnoxious ads, we’d have a couple tasteful ads. Ideally, it would be more specific links: buy the song from a vendor who pays me a cut of every sale, and buy band merchandise with a similar arrangement. You could also try to work out something with concert tickets.

Work on setting up 30-second samples of the song. It is my understanding that 30 seconds counts as fair use.

Let people leave comments, but have Digg-style ‘voting.’ (Plus active moderators.) People can leave comments. The stupid ones get moderated down, the really stupid ones get deleted, but the good, insightful ones show up on top. The ones that say, “This song is about…”

And, most importantly, you need a nice clean, easy-to-use UI. Every single lyrics site gets this wrong. I don’t want to go through categories. I don’t want to have to specify whether it’s a song or an artist. I want to type in something and get it. I don’t want the lyrics to be in a 300-pixel wide frame that’s flanked by ads and other useless crap.

You can develop this on your own, but buying some other lyrics sites gives you steady traffic, high link rankings, and an established set of lyrics, however pathetic they may be. And, by buying them out, you ensure that the Internet has one less terrible website.

You are free to steal and use this idea. In fact, you are encouraged to steal and use this idea. 

Getting Familiar with the CLI

As long as you’re doing lots of work in Linux, there are some things you’ll want to get used to. I spend a lot of time at the command line. (It’s kind of hard to avoid when you’re working on a headless server.) These tips assume a basic familiarity, but if you have that, here are a few that might come in handy:

Very often in less, I want to jump to the end of a file and work my way up. I can hit space over and over. One day I thought I was clever when I realized it would tell you how many lines were in the file, and I began telling it to jump to line 123 by typing :123 within less. But it turns out it’s even easier. G takes you to the last line. g takes you to the first line. There are many more handy tips here.

Of course, I spent even more time in vi. Search and replace is handy. But keep in mind that the :s/old/new command will only replace the first occurrence on the current line. You can append a g, ending up with :s/old/new/g, but that still only works on one line. This is usually not desirable. You can specify a line range. Generally, though, you want the whole file. $ denotes the last line, so you can write the range as “1,$”, denoting “from line 1 to the end of the file.” But it’s even easier: % means “the whole file.” So I end up with :%s/old/new/g to replace all “old” with “new”. And if this isn’t what you want, press u to undo. The “G” trick to jump to the end works in vi, too. Turns out you can replace :wq with ZZ, which is essentially the same.

I’ve known about the uniq command for quite some time: its goal is to weed out duplicate lines. This is handy far more often than you might imagine: say you strung a ton of commands together to pull out a list of all e-mail addresses that your mailserver has rejected. There are bound to be many, many duplicates, because apparently bumttwagnerfor@domain is commonly-spammed (?!).

But uniq has a peculiar quirk that I missed. They call it a feature, although I’m not sure I agree: it only filters out consecutive duplicate lines. If the duplicate lines aren’t adjacent, it will merrily pass them on. I suppose there may be scenarios when this is desirable, although I’m at a loss to think of any. In a nutshell, whenever you want uniq, you probably want to run the input through sort first. grep something /var/log/messages | sort | uniq, for example, will pull out all lines with “something” in them, but omit all duplicates.
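A quick way to see the quirk in action:

```shell
# uniq only collapses adjacent duplicates:
printf 'a\nb\na\n' | uniq           # three lines out: a, b, a
# sorting first brings the duplicates together:
printf 'a\nb\na\n' | sort | uniq    # two lines out: a, b
```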

And note that use of grep. For some reason people seem to think that cat filename | grep search_pattern is the way to do it… There’s no reason for cat. Just do grep search_pattern filename.

Fun with Shell Commands

I’m now running a mailserver, and I was trying to set up Mailman to handle a mailing list. Some odd behavior was causing Mailman to barf up a fatal error, so I kept a running view of the log file with tail -f.

In the course of doing that, I noticed several hosts connect attempting to deliver mail (presumably spam) to “bumttwagnerfor@domain…”, a bizarre address that definitely doesn’t exist.

It’s not a big deal, because the mail’s just bouncing. But it got irritating watching them all in the log file.

I wanted to ban them. It turns out that Linux makes this easy: there’s a hosts.deny file, and anyone in it is banned from connecting. I already have a script that watches for repeat failed login attempts on ssh and bans them. (And I have something like 200 IPs banned, although I suspect that it’s not purging them appropriately.)

All the log entries are in a common format, and look like this:

Oct 8 05:41:31 oxygen postfix/smtpd[23212]: NOQUEUE: reject: RCPT from unknown[62.233.163.250]: 550 5.1.1 : Recipient address rejected: User unknown in local recipient table; from= to= proto=ESMTP helo=<250.248/30.163.233.62.in-addr.arpa>

We can see (actually, guess, in my case) that the IP is the 10th ‘column’ (using a ‘space’ as a delimiter). So we can begin a rudimentary script to print out just that:

# grep bumttwagnerfor /var/log/messages  | awk '{ print $10}' | head
unknown[211.49.17.175]:
81.202.185.36.dyn.user.ono.com[81.202.185.36]:
host-89-228-234-224.kalisz.mm.pl[89.228.234.224]:
LSt-Amand-152-32-14-78.w82-127.abo.wanadoo.fr[82.127.29.78]:

But there’s an obvious problem: the hostname is rammed up against the IP. I want to just ban the IP, and strip out the hostname. The correct way is to write a lengthy regular expression to match just whatever’s between the [ and ]. (Note that you can’t just write a regular expression to match IPs: the very first one has an IP in its hostname, for example, which would throw you off.)

The quick and easy solution is to replace the [ with a space and the ] with a space, which gives you “hostname IP “. And then you use awk again to print it:

grep bumttwagnerfor /var/log/messages | awk '{ print $10}' | sed 's/\[/ /g' | sed 's/\]/ /g' | awk '{print $2}'

This is a pretty ugly command. Just the way I like it. 😉

But we’re not quite done! The format for hosts.deny is “Service: Address.” We’re just getting addresses here. I want the output to be something like ALL: 1.2.3.4 for each entry. (If they’re spamming me, I don’t want to allow them access to any other services.)

When it’s all said and done, here’s the command:

grep bumttwagnerfor /var/log/messages | awk '{ print $10}' | sed 's/\[/ /g' | sed 's/\]/ /g' | awk '{print "ALL:", $2}'

You can just append a >> hosts.deny to deny them right away, or pipe it through head or less to review first.

And voilà. 440 IPs banned.

Seriously, though. wtf is going on? 440 different people have tried spamming this address that has definitely never existed.