Easy nofollow tags in Ruby (and Rails)


6 Jul 2009

For a while I’ve been trying to ensure that all user-generated links on a site I code for have the rel=nofollow attribute, so that we don’t hand spammers our link juice.

It’s a tough problem to solve, though. Or so I thought. I ended up doing a global search-and-replace (gsub) on any user-generated text, replacing "<a " with "<a rel='nofollow' ", but this was broken for a few reasons.

One is that, while apparently legal, it’s bizarre to put the rel attribute before the href attribute. Maybe that doesn’t matter. The tricky part is that a link of the form <a title="evil site" href="http://www.example.com"> is totally valid, so I couldn’t just match on <a href or I would miss links from crafty users. The more I thought about it, the more it turned into a regular expression from hell.

Plus, what if there was already a rel attribute, something like <a title="link from hell" href="http://www.example.com" rel="faked_you_out">? I’d then be adding a second rel attribute to the same tag, which is invalid markup. It was just spiraling out of control, and turning into a lot of code to handle a lot of weird possible cases.
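To make the failure concrete, here is a minimal sketch of that naive search-and-replace in plain Ruby. The method name naive_nofollow and the sample strings are made up for illustration; this is not the actual code from the site.

```ruby
# Naive approach: inject rel='nofollow' right after every opening "<a ".
# naive_nofollow is a hypothetical name used only for this illustration.
def naive_nofollow(text)
  text.gsub('<a ', "<a rel='nofollow' ")
end

# A link with no rel attribute, and one that already carries its own rel.
plain = '<a title="evil site" href="http://www.example.com">spam</a>'
faked = '<a title="link from hell" href="http://www.example.com" rel="faked_you_out">spam</a>'

puts naive_nofollow(plain)
puts naive_nofollow(faked)

# Count the rel attributes in the second result.
rel_count = naive_nofollow(faked).scan(/\brel=/).length
puts "rel attributes in second link: #{rel_count}"
```

The first link comes out with rel ahead of title and href, which is odd but harmless. The second ends up with two rel attributes on one tag, which is exactly the case the gsub cannot handle without growing more special-case logic.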

A coworker nudged me in the direction of hpricot, an HTML parser. And suddenly, it was comically easy to do this flawlessly:

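require 'hpricot'

# add_nofollow is an illustrative method name; the original snippet lived inside a method.
def add_nofollow(user_content)
  html = Hpricot.parse(user_content)
  # html/'a' finds every <a> element in the parsed document
  (html/'a').each do |link|
    link['rel'] = 'nofollow'
  end
  html.to_s
end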

For each ‘a’ element, called ‘link,’ set its “rel” attribute to “nofollow”. If there’s already a rel attribute, its value is replaced with “nofollow”, and if there isn’t one, it’s added. hpricot handles all of the “special cases” that my code would have required. I don’t care where in the tag the rel attribute sits. It just works.

How awesome is that? It seems to work flawlessly, and yet it’s really basic code once you get past the somewhat unconventional format that hpricot uses.
