What are these hostnames?

I’ve been getting slammed with spam lately. It’s all to a handful of spamtraps on a few domains I have, so it’s actually wonderful that it’s happening, because none of it hits my inbox; spammers are just adding themselves to a blacklist.

I’ve been watching logs and connections, and noticed that a lot of clients are sending bizarre HELO strings in all upper-case with random letters. The pattern seems vaguely familiar, and “Windows workgroups” is coming to mind. Do these hostnames look like that? If not, anyone have a clue what is generating these?

  • helo=
  • helo=
  • helo=
  • helo=
  • helo=
  • helo=
  • helo=
  • helo=
  • helo=
  • helo=
  • helo=
  • helo=
  • helo=

Incidentally, this argues towards the use of Postfix’s reject_non_fqdn_helo_hostname restriction, except that in my case, it would just block them from hitting a spamtrap. (Although really, a very small minority of good mailservers are thought to be misconfigured and identify themselves without an FQDN HELO, so this isn’t 100% safe.)
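To make "non-FQDN HELO" concrete, here is a rough Ruby sketch of the kind of test involved. It is only an approximation for illustration, not how Postfix implements the check; the fqdn_like? helper and the sample HELO strings are made up and are not taken from my logs.

```ruby
# Rough approximation of "does this HELO look like a fully-qualified hostname?"
# This is NOT Postfix's implementation; fqdn_like? is a hypothetical helper.
LABEL = /\A[a-z0-9]([a-z0-9-]*[a-z0-9])?\z/i

def fqdn_like?(helo)
  name = helo.to_s.chomp('.')     # tolerate a single trailing root dot
  labels = name.split('.', -1)    # -1 keeps empty labels, so "a..b" fails
  labels.length >= 2 && labels.all? { |label| label.match?(LABEL) }
end

# Hypothetical HELO strings, not the ones from the logs above.
%w[mail.example.com EXAMPLEPC localhost].each do |helo|
  puts "#{helo}: #{fqdn_like?(helo) ? 'looks like an FQDN' : 'bare name'}"
end
```

A bare all-caps name with no dots fails a test like this, which is why the restriction would catch clients like these.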

When I get around to it, I think I want to set my new server up with a little FreeBSD virtual machine and use spamd to torture spammers by talking to them at 1 byte/sec.

PocketRef

I’m yet to see anyone who has purchased Pocket Ref give it anything but a 5-star review.

The concept is repeated in various niches, too (from different authors): AutoRef focuses on cars, Pocket PC Ref on computers, Handyman In-Your-Pocket on… handyman stuff…?, and even Pocket Partner, meant for cops, but with a lot of reviews from people who find it useful for dealing with Hazmat stuff.

I think I’m going to have to pick some of these up. They’re pretty slick.

False Positives

For someone providing e-mail services, allowing spam through is bad. Good mailserver admins get their spam rejection rate as high as they can.

But for someone providing e-mail services, flagging good e-mail as spam, known as a false positive, is really bad. Good mailserver admins have a false positive rate of 0%.

Looking through e-mail bounces from a (legitimate, opt-in) bulk e-mail sender, I’ve discovered a few things that are done wrong. For one, people are just using really bad lists. The five-ten-sg.com blacklist is a notorious example. It took me a long time to get unlisted from them, because someone else in the same datacenter had sent them spam once upon a time. They’re far from the only blacklist doing this, but the point is the same: look into the blacklists you use before you reject mail because of them!

Another thing, though: don’t reject mail because one blacklist says it’s bad. When I get around to setting up a new mailserver, my plan is to score IPs based on how many blacklists list them, weighting more accurate blacklists more heavily. Tools like SpamAssassin do this already. (My plan is to delete from the graylist table when IPs show up in numerous trustworthy blacklists; my area of interest is in the ability to reject mail before they even deliver the message body.)
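A minimal sketch of that weighted scoring idea in Ruby. The blacklist names, weights, and threshold are all invented for illustration, and the step of actually querying each blacklist for an IP is left out; the code only shows how the listings might be combined.

```ruby
# Hypothetical blacklist names and weights; tune them to each list's accuracy.
BLACKLIST_WEIGHTS = {
  'accurate.example'   => 3.0,
  'decent.example'     => 1.5,
  'aggressive.example' => 0.5
}.freeze

REJECT_THRESHOLD = 3.0 # assumed cutoff, not a recommended value

# listings: names of the blacklists that currently list the IP.
# Lists we have not assigned a weight to contribute nothing.
def blacklist_score(listings)
  listings.sum { |name| BLACKLIST_WEIGHTS.fetch(name, 0.0) }
end

def reject?(listings)
  blacklist_score(listings) >= REJECT_THRESHOLD
end

# A single sloppy list is not enough on its own...
puts reject?(['aggressive.example'])
# ...but agreement between more trustworthy lists is.
puts reject?(['accurate.example', 'decent.example'])
```

The point of the weights is that a listing on one overly aggressive blacklist never rejects mail by itself.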

In other news, my table of IPs that have delivered mail to various spamtraps in the past week has been in overdrive. Just over 2,000 hosts; the most recent 100 all came in within the past 8 hours. The month’s graph is pretty surprising:

The list is available here, but heed my warning above: don’t trust it alone.

Caches

Any time I’ve worked with performance tuning, I’ve found that caching gives the highest rewards with the least work. It’s entirely possible to see thousand-fold increases if you employ caching in the right places. My WordPress install used to run complex SQL queries on every page, and I benchmarked it at 4 pages/second under ideal conditions. I now do some trivial caching of the results of those queries and can push over 400 pages/second. Your OS caches files in unused memory since it’s so much faster than disk, and your browser caches static assets on a site so your browser doesn’t have to download them again for every page view.
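The "trivial caching of query results" idea can be sketched in a few lines of Ruby. This is not what my WordPress install actually runs; QueryCache, its time-to-live, and the fake slow query are all hypothetical stand-ins.

```ruby
# Minimal time-based cache; QueryCache and its ttl are hypothetical names.
class QueryCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @store = {} # key => [expires_at, value]
  end

  # Return the cached value for key, or run the block and remember its result.
  def fetch(key)
    expires_at, value = @store[key]
    return value if expires_at && Time.now < expires_at

    value = yield
    @store[key] = [Time.now + @ttl, value]
    value
  end
end

cache = QueryCache.new(60)
runs = 0
# Stand-in for an expensive SQL query.
slow_query = -> { runs += 1; sleep 0.05; %w[post-1 post-2] }

3.times { cache.fetch(:recent_posts) { slow_query.call } }
puts "query ran #{runs} time(s) for 3 page views"
```

Only the first page view pays for the query; the rest are served from memory until the entry expires.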

But here’s what I’m actually posting about: What is it with people and cleaning out their caches all the time? It’s like caches are some sort of gunk that builds up and clogs up the works. It’d be like me being distraught to find that money is clogging up my bank account and trying to find a way to purge all the money so I had more room in the bank. This is one of those things that’s slightly amusing and only slightly irritating, until you talk to the tenth user in a row who talks about clearing their caches to try to make their computer faster. Why?! What is going through their heads?

Mastering your tools

One thing I’ve found with computers is that most people only learn the basics of many of the tools they use daily. Articles like My top 8 time-saving Firefox shortcuts are great ways to quickly pick up on the little things I never knew about (Ctrl+K to jump to the search bar, and Ctrl+L to jump to the URL bar!).

Today I found myself with a badly-formatted CSV file. Many of the last columns had newlines in them, which isn’t something a CSV should have. I fired up vim, and after a couple minutes, found out the regular expression to find any line that doesn’t end with a quotation mark: /[^"]$/ will do that. (Aside: regular expressions are giant pains, but I don’t know of any easier way to do what they do. I could write code to iterate over every line, but then I’ve got 15 lines of code instead of five obscure characters.)

Pressing “n” jumps to the next match of that pattern, and “J” joins the following line onto the current line. “n” again to the next match, and then “.” (easier than Shift+j for an uppercase) repeats the J. I soon realized that there were more instances than I thought, though. The ideal way would be to have a regular expression do the replacement, too, but I couldn’t find an easy way, so I did the next best thing and defined a macro. “qr” started recording a macro named “r”, and pressing “n” and then “J”, the two steps from above, recorded them into it. “q” again stopped recording, and then “@r” ran the macro. Of course, “@r” (@@ does the same thing) wasn’t much easier to type than “n.n.n.n.n.n.n.” over and over.

So I ran “100@r” to run the macro 100 times. I realized that there were way more cases than I thought, and “10000@r” finished it off.

I was left noticing a few things. One is that many of these things were really obscure. To a UNIX power-user, it’s perhaps cake, but to a more average user, it’s black magic. I decided a while ago to make an effort to really learn vi. I’m only a small fraction of the way to mastery, but I’ve found that the time I invested in learning the lesser-known features has been well worth it.

Another thing I wonder about is how I could have done this more effectively. It seems like there ought to be a way to run a vim command for every spot that a regular expression matches. And putting a huge number in front of a macro to run it as many times as needed seems like a hack. At the same time, I know that using a heavyweight text editor was a no-go. This was an enormous file and text editors were dying under the load of trying to deal with the whole document. vim doesn’t mind a gigantic file.
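For comparison, here is roughly what the "iterate over every line" approach might look like as a Ruby sketch. It assumes, as my vim search did, that every complete record ends with a double quote; join_broken_records and the file names in the comment are hypothetical. It works one line at a time, so it should not care how big the file is.

```ruby
# Join physical lines until a line ends with a double quote.
# Assumes every complete record ends with '"'; join_broken_records is a
# hypothetical helper. Each finished record is yielded to the block.
def join_broken_records(lines)
  pending = nil
  lines.each do |raw|
    line = raw.chomp
    # Like vim's J, glue the continuation on with a single space.
    pending = pending ? "#{pending} #{line}" : line
    next unless pending.end_with?('"')

    yield pending
    pending = nil
  end
  yield pending if pending # trailing partial record, emitted as-is
end

# Streaming usage on a real file; the file names are placeholders:
# File.open('fixed.csv', 'w') do |out|
#   join_broken_records(File.foreach('broken.csv')) { |rec| out.puts(rec) }
# end

sample = ["\"1\",\"first\n", "half\n", "second half\"\n", "\"2\",\"ok\"\n"]
join_broken_records(sample) { |rec| puts rec }
```

Whether the stray newline should become a space or simply disappear is a judgment call; the space just mirrors what J did for me in vim.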

I’ve learned some handy shell tricks and more about MySQL in the past year, too, and both have gotten me far. It sometimes doesn’t seem worth the time, but every time I’ve made an effort to really master all the nuances of a tool, I’ve recouped my time investment many times over.

Deals I’m Eyeing

A few good deals I’ve run into today:

  • VMware Fusion 1.0, $19.99 after rebate. Who would want Fusion 1.0 when 2.0 is out? Perhaps anyone who noticed that Fusion 1.0 owners get a free upgrade to 2.0.
  • 22″ Dell LCD, 1920×1080, $139.99 with free shipping. I have two monitors at this resolution (sadly, not a pair: one’s at work, one’s at home), and got steals on both of them — and yet they cost more than this one. If you’re in the market for a new monitor, you’d be insane to get anything with a lower resolution.
  • Kingston V-Series 128 GB SSD, $249.99. It is supposedly one of the more recent drives where people stopped designing SSDs for raw throughput, and started working on eliminating the “stutter” problem where drives would periodically take a few seconds to write out a tiny block of data, bringing things to a screeching halt. It still claims 80-100MB/sec., but without the awful stutter that plagues drives claiming faster throughput. (That said, I haven’t tested this, since I’m yet to buy one.)

Followers

A while back I asked on Twitter if people had found that A+ or CCNA certifications were worthwhile if you already know the stuff. I took the A+ course in high school but never got the certification. I’ve dabbled with Cisco at work, but am surely far short of CCNA. So I wondered if it was worthwhile.

No one answered me, but I did have several people whose profiles were devoted to Cisco, CCNA, and Cisco training follow me almost immediately. I was slightly amused and slightly annoyed.

I just asked a question about SEO. Not even expressing an interest in SEO, but asking whether the people who try binding huge blocks of IPs to their servers to “enhance SEO” were as full of hot air as I think they are. Now I have two new followers, both SEO enthusiasts.

What other terms should I mention to acquire some more hollow traffic on Twitter? 😉

Shades of Racism

I just had one of those experiences where a lot of little things conspire to paint a big, scary picture.

The other day I was driving to work and passed a Pontiac with this bumper sticker: “Toyota: From the fine folks that brought you Pearl Harbor Day.” Whoa. I didn’t really consider it blatantly racist, largely because the driver looked really old and he had a Navy bumper sticker, too. Maybe he was there. Not that I love it, but racism from war veterans seems less offensive to me, like John McCain continuing to use the racial slur “gooks” to refer to his captors.

That same night, I was watching the tall ships as a Chinese junk sailed in. I didn’t get the whole context of what they were saying, but someone nearby said something about how it was probably bringing lots of diseases into our country.

I’ve long thought that the immigration issue is being exploited by racists. There are plenty of non-racists who are really bothered by illegal immigration, surely, but I’ve also heard enough discussions about illegal immigration turn into racial slurs describing Mexicans to know that racism is alive and well.

Obama’s election brought out the Ku Klux Klan, a group that I had assumed died off sometime around the 60s. It seems that they’re alive and well, and that their ranks have surged with a black President. And it’s not just some strange people that none of us will ever meet. I’ve heard people express their disgust that we would elect an N-word to govern ourselves.

I also tend to think the same thing about Islamophobia. One’s religion now informs stereotypes about whether or not the person is a terrorist. There are all the people who didn’t vote for Obama because he was a Muslim. Besides the fact that it’s not even true, it’s disturbing. A while back I brushed up against someone calling for Obama’s assassination because he was a Muslim. (And yes, it was reported to the FBI.)

Now there’s a growing concern about the growing size of the Ku Klux Klan and other openly-racist, often-violent white supremacist groups, and the fact that they’re infiltrating our military. The media loves alarmist stories, but really, “Violent supremacist groups are joining the armed forces in record numbers” is about as good a cause for alarm as it gets.

Tallships

I work in (well, in an office next to) the Charlestown Navy Yard. I’d forgotten all about the Tall Ships event until they started docking.

Picton Castle

I haven’t been pleased with the photos I got, actually, largely because I’ve been carrying a bag with my work laptop and my camera bag every time I’ve tried to take photos thus far, plus the weather hasn’t been that great. But I thought I’d share a few for those who aren’t attending but who like boats.

Tallships by Night

Working until 9pm on Wednesday turned out to be a blessing. It was really nice and yielded some good photos.

Check out the whole gallery here.

Easy nofollow tags in Ruby (and Rails)

For a while I’ve been trying to ensure that all user-generated links on a site I code for had the rel=nofollow attribute to prevent giving spammers our link juice.

It’s a tough problem to solve, though. Or so I thought. I ended up doing a global search-and-replace (gsub) on any user-generated text, replacing " with " but this was broken for a few reasons. One is that, while apparently legal, it's bizarre to throw the rel attribute before the href attribute. Maybe that doesn't matter. The tricky part is that a link of the form http://www.example.com">; is totally valid, so I couldn't just match on or I would miss links from crafty users. The more I thought about it, the more it turned into a regular expression from hell. Plus, what if there was already a rel attribute, something like http://www.example.com" rel="faked_you_out">? I'd then put in a second rel attribute, which is bad. It was just spiraling out of control, and turning into a lot of code to handle a lot of weird possible cases.

A coworker nudged me in the direction of hpricot, an HTML parser. And suddenly, it was comically easy to do this flawlessly:

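require 'hpricot'

# Add rel="nofollow" to every link in a chunk of user-generated HTML.
def nofollow_links(user_content)
  html = Hpricot.parse(user_content)
  # html/'a' searches the parsed document for every <a> element.
  (html/'a').each do |link|
    link['rel'] = 'nofollow'
  end
  html.to_s
end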

For each 'a' element, called 'link,' set its "rel" attribute to "nofollow". If there's already a rel attribute, it's replaced with "nofollow" and if there isn't one, it's added. hpricot handles all of the "special cases" that my code would have required. I don't care where the rel is at all in this. It just works.

How awesome is that? It seems to work flawlessly, and yet it's really basic code once you get past the somewhat unconventional format that hpricot uses.