The Metablog

The blog about the blogs!

Archive for December, 2007

Spam Blockade

I’ve long noticed that comment spammers usually come in and use a POST command to post their spam, without ever issuing a GET for the form. That combination doesn’t make sense for a real visitor: a browser has to fetch the form before it can submit it.

I wondered aloud on the WPMU forum whether it would make sense to keep a MySQL table of people who issue a GET for a page, and, when a POST comes in, see if that user is in the table. Someone recommended something simpler: use mod_rewrite to refuse comments from people who don’t have my site as the ‘referer’ when posting. Unless you’re using a very broken browser, this shouldn’t block any legitimate users, but it will block uncreative spammers. (Note that blog posts are unaffected by this new rule, so people posting to their blogs through software as opposed to the web interface should be unaffected.)
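
To make the referer idea concrete, here is a minimal Python sketch of the decision logic. It is not the actual mod_rewrite rule used on this site; the function name `allow_comment_post`, the `example.com` host, and the choice to accept subdomains are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def allow_comment_post(method, referer, site_host):
    """Return True if the request should be let through.

    Hypothetical helper: only POSTs are checked, and a POST is accepted
    only when its Referer header names site_host or one of its subdomains.
    """
    if method.upper() != "POST":
        return True  # GETs and other methods are not affected by the rule
    if not referer:
        return False  # a POST with no Referer at all is refused
    host = urlparse(referer).hostname
    if host is None:
        return False  # malformed Referer value
    host = host.lower()
    site_host = site_host.lower()
    # Assumption: per-user blogs may live on subdomains of the main site.
    return host == site_host or host.endswith("." + site_host)
```

A POST whose Referer is `http://blogs.example.com/some-post/` passes for `site_host="example.com"`, while one with no Referer, or with a Referer on another domain, is refused.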

If you find that this is causing problems for you, I’m mwaggy and I use gmail.com for my mail service.

  • 0 Comments
  • Filed under: Uncategorized
  • Stats

    I spent some time on the wpmu forums, and was inspired to create this little script that generates some statistics.

    The source is here. N.B. that the first line includes a ‘proprietary’ set of libraries which I’m not distributing, but all it does is handle the MySQL connection and create a query() function for (my) ease of use. Also note that all of this information is extracted from the database (aside from a little arithmetic here and there), so you’d do well to make sure MySQL query caching is turned on if you expect to use this type of thing with any regularity.

  • 3 Comments
  • Filed under: Uncategorized
  • Moderating Spam

    For whatever reason, it’s been DEFCON 1 here for spam. I don’t really know what volume of spam you guys are getting, but for me, it’s been really bad. (Consider getting an Akismet key if that’s the case for you too.)

    If you do have to moderate spam, please do all of us a favor: hit “Spam,” not “Delete.” Moderating the comment as spam keeps it from showing up, but also leaves an entry in the database, allowing me to include its IP in the banlist that I’m updating a couple of times a day. If you just delete it, there’s no record of the comment ever occurring, so the spammer can come back in and leave more spam.

    IP bans are far from 100% effective, but I’m hoping it’ll at least make a dent in the amount of spam we’re getting.

  • 2 Comments
  • Filed under: Uncategorized
  • Updates

    I just did some more cleaning up of the main page:

    • I added a link to the Links blog. I have in the back of my head that I’m going to set more people up with accounts on that than with ‘full’ blogs.
    • I brought back the “Blogroll” list of links. I was never sure why it disappeared. It turns out that the reason is quite simple: when I changed the way the main page was generated to make it easier to cache, I wholly omitted it. It’s configured to pull out random entries, maximum of 12. There are a total of 11 right now.
    • I changed the query that builds the main page. I’ll elaborate in the comments, but the short version is that it’s longer now, but will shrink as comments ‘age off’ the main page.
    • I added a Table of Contents to the side. This is nothing more than a list of what’s currently shown on the main page, but I think it’ll be worth the space.
    • I “fixed” bullet points (<li> tags)… With multiple lines of text, the bullet point was getting centered on the text. (This was actually explicitly designed in the CSS of the design I’m using, which seems odd, but it was easy to change.) They still need some fine-tuning.
  • 1 Comment
  • Filed under: Uncategorized
  • Spam

    It seems that the volume of spam here has drastically increased. I use the Akismet plugin to cut down on it, but it’s per-blog. (Aside: you can set it up yourself. The plugin is installed, but you need an API key to activate it for your blog. Visit here for details, and then bring that API key to the Plugins tab of your WP-Admin interface.)

    Anyway, I recently realized that it does the same thing I do: it keeps the comment in the database but sets the ‘spam’ flag. This is wasteful of space, but great for things like, say, writing a simple SQL query to get a list of all IPs that have left spam in the past 48 hours and displaying them on a webpage.
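
A query along those lines might look like the following sketch. It runs against an in-memory SQLite table so that it is self-contained; the real site uses MySQL. The table and column names (`wp_comments`, `comment_author_IP`, `comment_approved`, `comment_date`) follow the stock WordPress schema and are assumed here, as is the convention that spam is stored with `comment_approved = 'spam'`. The helper names and sample rows are made up.

```python
import sqlite3
from datetime import datetime, timedelta


def recent_spam_ips(conn, now, hours=48):
    """Return the distinct IPs that left spam in the last `hours` hours."""
    cutoff = (now - timedelta(hours=hours)).strftime("%Y-%m-%d %H:%M:%S")
    rows = conn.execute(
        "SELECT DISTINCT comment_author_IP FROM wp_comments "
        "WHERE comment_approved = 'spam' AND comment_date >= ? "
        "ORDER BY comment_author_IP",
        (cutoff,),
    )
    return [row[0] for row in rows]


def demo():
    """Build a tiny sample table and run the query against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wp_comments ("
        "comment_author_IP TEXT, comment_approved TEXT, comment_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO wp_comments VALUES (?, ?, ?)",
        [
            ("203.0.113.5", "spam", "2007-12-20 09:00:00"),   # recent spam
            ("203.0.113.5", "spam", "2007-12-19 22:00:00"),   # same IP again
            ("192.0.2.99", "spam", "2007-12-19 01:00:00"),    # recent spam
            ("198.51.100.7", "spam", "2007-12-10 08:00:00"),  # too old
            ("192.0.2.44", "1", "2007-12-20 10:00:00"),       # approved comment
        ],
    )
    return recent_spam_ips(conn, datetime(2007, 12, 20, 12, 0, 0))
```

Passing `now` in explicitly, rather than calling the database's own clock, keeps the sketch easy to test; a MySQL version could compare against `NOW() - INTERVAL 48 HOUR` instead.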

    I wrote it to make it super-easy for me to ban IPs periodically… So I just added this list to /etc/hosts.deny, keeping them from connecting. (I do wish there were an easier way to handle ban “aging”: I don’t want to ban IPs for more than a couple of days, but that’s not easy to do.)
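
One possible way to get ban “aging” is to regenerate the ban list from timestamps on every run instead of appending to it, so that old entries simply drop out. The sketch below assumes a mapping from IP to the time it last left spam; the function names and the two-day default are hypothetical, and actually writing the result into /etc/hosts.deny is left out.

```python
from datetime import datetime, timedelta


def active_bans(last_seen, now, max_age_days=2):
    """Return the IPs whose most recent spam is newer than the cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(ip for ip, seen in last_seen.items() if seen >= cutoff)


def render_hosts_deny(ips):
    """Render one 'ALL: <ip>' line per banned address."""
    return "".join(f"ALL: {ip}\n" for ip in ips)
```

Because the list is rebuilt each time, an IP that stops spamming falls off after `max_age_days` without any manual cleanup.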

    Anyway, you should hopefully notice a decreased volume of spam.

  • 0 Comments
  • Filed under: Uncategorized
  • Geekier

    A transparent upgrade to the code here will now cache the whole page. The problem is that each page load queries Memcache to see if there’s a cached version to use. This seems to take more overhead than I expected: I went from ~150 pages/second to about 200. (190 with Apache, 210 with lighttpd.)
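
The whole-page caching pattern described here can be sketched as follows. This is an illustration only: `TTLCache` is a small in-process stand-in for Memcache, not a real Memcache client, and `cached_page`, the key name, and the 60-second lifetime are assumptions.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache server, with per-entry expiry."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # entry has aged out
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)


def cached_page(cache, key, render, ttl=60):
    """Return the cached page if present; otherwise render and store it."""
    page = cache.get(key)
    if page is None:
        page = render()  # the expensive part: queries plus templating
        cache.set(key, page, ttl)
    return page
```

Every request still pays for one cache lookup, which matches the overhead described above, but the expensive render only happens when the entry is missing or has expired.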

    I think what I really need to do now is use something like APC to lighten the load further…

  • 3 Comments
  • Filed under: Uncategorized
  • New Interface!

    I’ve finally rolled out my secret project: a new main page. The old one looked awful in IE. And even when it displayed properly, its appearance started to grate on my nerves.

    This new one (it’s hosted at /main2, but all old links should point to it) sports a new look, and a lot of under-the-hood improvements. (In theory, it should be much faster. In practice, I need to do some more tuning.) I also aggregate all the “Blogrolls” (a really corny name for a list of links) into one and randomly pick 12. (There aren’t 12 total, though, so it shows fewer.)
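
The blogroll aggregation can be illustrated with a short sketch. The function name `pick_blogroll` and the de-duplication step are assumptions; the post only says that the lists are merged and that up to 12 entries are picked at random.

```python
import random


def pick_blogroll(blogrolls, limit=12, rng=random):
    """Merge several link lists and pick up to `limit` links at random."""
    # Assumption: a link appearing on several blogs is only listed once.
    merged = sorted({link for links in blogrolls for link in links})
    # random.sample raises ValueError if asked for more items than exist,
    # so cap the sample size at the number of links available.
    return rng.sample(merged, min(limit, len(merged)))
```

Capping the sample size is what lets the page show fewer than 12 links when there aren’t 12 in total.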

    As a heads up, I’m making extensive use of memcache to cache things. It should be pretty seamless, but if things take a couple seconds to appear, that’s why. (If things take a long time to appear, let me know: I can tweak how long they stay in the cache.)

    The only bug I’m currently aware of is that the “Calendar” is utterly useless: it’s the wrong month and the link doesn’t point anywhere. But it may well be rough around the edges: post a comment if so!

  • 5 Comments
  • Filed under: Uncategorized