Yahoo announced yesterday that it is releasing detailed documentation on the Robots Exclusion Protocol (REP) and how Yahoo, Google, and MSN handle it. The REP, commonly used by webmasters in the form of a robots.txt file or in META tags, has been around since the 1990s. By specifying robot rules such as noindex or nofollow, webmasters can effectively “hide” sections of their website that they do not want included in search results.
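As a quick refresher, a hedged sketch of what these directives look like in practice (the paths and values here are illustrative, not from Yahoo's documentation):

```html
<!-- In a page's <head>: ask robots not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">
```

The robots.txt equivalent blocks crawling by path rather than per page, e.g. a `Disallow: /private/` line under a `User-agent: *` record in the file at the site root.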
Yahoo’s announcement that they’ve been working with Google and MSN for the past couple of years to develop a more standard approach to the REP is good news for SEOs. More importantly, the fact that they are disclosing details of how they handle REP directives makes it easier for webmasters to work with them. For the most part, this isn’t earth-shattering news: many SEOs have already run their own tests and figured out what works and what doesn’t.
Besides the common robots.txt file and META tag directives, Yahoo describes three directives it supports that Google and MSN do not at this time:
Crawl-Delay: Lets a site slow a crawler down by specifying a minimum delay between successive requests, reducing how often the crawler checks for new content
NOYDIR META Tag: Similar to the NOODP META tag, but applies to the Yahoo! Directory instead of the Open Directory Project
Robots-nocontent Tag: Lets you mark the non-content parts of a page (navigation, footers, and the like) so that the Yahoo crawler can identify the main content and target the right pages on your site for specific search queries. Yahoo won’t use the tagged sections when indexing the page or when generating the abstract shown in search results.
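A hedged sketch of how these Yahoo-specific directives are applied (the class and META usage follow Yahoo's announcement; the surrounding content is made up for illustration):

```html
<!-- NOYDIR: ask Yahoo not to use the Yahoo! Directory title/description for this page -->
<meta name="robots" content="noydir">

<!-- robots-nocontent: mark boilerplate that Yahoo should skip when indexing
     the page and when building the search-result abstract -->
<div class="robots-nocontent">
  Site-wide navigation and footer links...
</div>
```

Crawl-Delay, by contrast, lives in robots.txt rather than in the page markup: a line such as `Crawl-delay: 10` under a `User-agent: Slurp` record (the value here is an illustrative number of seconds, not a recommendation).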
Google also provides its own REP information in its webmaster help center article, “How do I use a robots.txt file to control access to my site?”