{"id":2055,"date":"2009-07-06T20:03:01","date_gmt":"2009-07-07T00:03:01","guid":{"rendered":"http:\/\/blogs.n1zyy.com\/n1zyy\/?p=2055"},"modified":"2009-07-06T20:03:01","modified_gmt":"2009-07-07T00:03:01","slug":"easy-nofollow-tags-in-ruby","status":"publish","type":"post","link":"https:\/\/blogs.n1zyy.com\/n1zyy\/2009\/07\/06\/easy-nofollow-tags-in-ruby\/","title":{"rendered":"Easy nofollow tags in Ruby (and Rails)"},"content":{"rendered":"<p>For a while I&#8217;ve been trying to ensure that all user-generated links on a site I code for have the rel=<a href=\"http:\/\/en.wikipedia.org\/wiki\/Nofollow\">nofollow<\/a> attribute to prevent giving spammers our <a href=\"http:\/\/www.getfoundnow.com\/internetmarketing\/link-juice-explained.html\">link juice<\/a>.<\/p>\n<p>It&#8217;s a tough problem to solve, though. Or so I thought. I ended up doing a global search-and-replace (<a href=\"http:\/\/www.ruby-doc.org\/core\/classes\/String.html#M000817\">gsub<\/a>) on any user-generated text, replacing <tt>\"&lt;a \"<\/tt> with <tt>\"&lt;a rel='nofollow' \"<\/tt>, but this was broken for a few reasons. One is that, while apparently legal, it's bizarre to throw the rel attribute before the href attribute. Maybe that doesn't matter. The tricky part is that a link of the form <tt>&lt;a title=\"evil site\" href=\"http:\/\/www.example.com\"&gt;<\/tt> is totally valid, so I couldn't just match on <tt>&lt;a href<\/tt> or I would miss links from crafty users. The more I thought about it, the more it turned into a regular expression from hell. Plus, what if there was already a rel attribute, something like <tt>&lt;a title=\"link from hell\" href=\"http:\/\/www.example.com\" rel=\"faked_you_out\"&gt;<\/tt>? I'd then put in a second <tt>rel<\/tt> attribute, which is invalid HTML. 
It was just spiraling out of control, and turning into a lot of code to handle a lot of weird possible cases.<\/p>\n<p>A coworker nudged me in the direction of <a href=\"http:\/\/wiki.github.com\/why\/hpricot\">hpricot<\/a>, an HTML parser. And suddenly, it was comically easy to do this flawlessly:<\/p>\n<blockquote><pre>\nrequire 'hpricot'\n# Wrapped in a method so the return value is usable; the name add_nofollow is illustrative.\ndef add_nofollow(user_content)\n  html = Hpricot.parse(user_content)  # parse the user-supplied HTML\n  # Set rel on every anchor element, overwriting any existing value\n  (html\/'a').each do |link|\n    link['rel'] = 'nofollow'\n  end\n  html.to_s\nend\n<\/pre><\/blockquote>\n<p>For each 'a' element, called 'link', set its \"rel\" attribute to \"nofollow\". If there's already a <tt>rel<\/tt> attribute, it's replaced with \"nofollow\", and if there isn't one, it's added. hpricot handles all of the \"special cases\" that my code would have required. I no longer have to care where the <tt>rel<\/tt> sits in the tag. It just works.<\/p>\n<p>How awesome is that? It seems to work flawlessly, and yet it's really basic code once you get past the somewhat unconventional syntax that hpricot uses (such as the <tt>\/<\/tt> search operator).<\/p>","protected":false},"excerpt":{"rendered":"<p>For a while I&#8217;ve been trying to ensure that all user-generated links on a site I code for have the rel=nofollow attribute to prevent giving spammers our link juice. It&#8217;s a tough problem to solve, though. Or so I thought. 
&hellip; <a href=\"https:\/\/blogs.n1zyy.com\/n1zyy\/2009\/07\/06\/easy-nofollow-tags-in-ruby\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2055","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blogs.n1zyy.com\/n1zyy\/wp-json\/wp\/v2\/posts\/2055","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.n1zyy.com\/n1zyy\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.n1zyy.com\/n1zyy\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.n1zyy.com\/n1zyy\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.n1zyy.com\/n1zyy\/wp-json\/wp\/v2\/comments?post=2055"}],"version-history":[{"count":0,"href":"https:\/\/blogs.n1zyy.com\/n1zyy\/wp-json\/wp\/v2\/posts\/2055\/revisions"}],"wp:attachment":[{"href":"https:\/\/blogs.n1zyy.com\/n1zyy\/wp-json\/wp\/v2\/media?parent=2055"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.n1zyy.com\/n1zyy\/wp-json\/wp\/v2\/categories?post=2055"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.n1zyy.com\/n1zyy\/wp-json\/wp\/v2\/tags?post=2055"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}