Easy nofollow tags in Ruby (and Rails)

For a while I’ve been trying to ensure that all user-generated links on a site I code for have the rel="nofollow" attribute, so that we don't give spammers our link juice.

It’s a tough problem to solve, though. Or so I thought. I ended up doing a global search-and-replace (gsub) on any user-generated text, replacing <a href=" with <a rel="nofollow" href=" but this was broken for a few reasons. One is that, while apparently legal, it's bizarre to throw the rel attribute before the href attribute. Maybe that doesn't matter.

The tricky part is that a link of the form <a ... href="http://www.example.com">, with extra whitespace or other attributes between the a and the href, is totally valid, so I couldn't just match on <a href=" or I would miss links from crafty users. The more I thought about it, the more it turned into a regular expression from hell. Plus, what if there was already a rel attribute, something like <a href="http://www.example.com" rel="faked_you_out">? I'd then put in a second rel attribute, which is bad. It was just spiraling out of control, and turning into a lot of code to handle a lot of weird possible cases.
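The failure modes described above are easy to reproduce. What follows is only a rough sketch: the helper name naive_nofollow and the three sample strings are made up for illustration, and the exact pattern used on the real site may well have differed.

```ruby
# Naive approach: insert rel="nofollow" wherever the literal text <a href=" appears.
# naive_nofollow is a hypothetical helper name, not the code from the original site.
def naive_nofollow(text)
  text.gsub('<a href="', '<a rel="nofollow" href="')
end

# Sample inputs (assumed for illustration).
plain   = '<a href="http://www.example.com">hi</a>'
crafty  = '<a title="x" href="http://www.example.com">hi</a>'
has_rel = '<a href="http://www.example.com" rel="faked_you_out">hi</a>'

puts naive_nofollow(plain)    # rel is added, ahead of href
puts naive_nofollow(crafty)   # unchanged: the literal pattern never matches
puts naive_nofollow(has_rel)  # now carries two rel attributes
```

Only the first link comes out the way I wanted. Patching the pattern to cover the other two is where the regular expression starts to grow.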

A coworker nudged me in the direction of hpricot, an HTML parser. And suddenly, it was comically easy to do this flawlessly:

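require 'hpricot'

# Wrapped in a method so the result can be returned; the name add_nofollow is arbitrary.
def add_nofollow(user_content_here)
  html = Hpricot.parse(user_content_here)
  (html/'a').each do |link|
    link['rel'] = 'nofollow' # replaces an existing rel, or adds one
  end
  html.to_s
end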

For each 'a' element, called 'link', set its "rel" attribute to "nofollow". If there's already a rel attribute, it's replaced with "nofollow", and if there isn't one, it's added. hpricot handles all of the "special cases" that my code would have required. I don't care where the rel sits in the tag at all. It just works.

How awesome is that? It seems to work flawlessly, and yet it's really basic code once you get past the somewhat unconventional format that hpricot uses.
