robots.txt to Exclude Content from Searches
November 10, 2008 by jp
Attracting spiders and bots to your website is one of your strategies to increase page rank and bring in more potential customers. But do you have pages of content you’d rather hide from those nosey search engines? Maybe you have multiple versions of a page on your site so you can split-test for effectiveness, or maybe you keep two versions of the same page, one for viewing in the browser and one that is more printer-friendly. Rather than having them treated as duplicate content, you can exclude one version from being ‘crawled’.
Many websites today also hold private or sensitive information, and you may not want it showing up in search results. Keep in mind, though, that robots.txt is itself publicly readable and only asks crawlers to stay away, so it is not a substitute for real access control. Search engines also work mostly from text, so images, other graphics and JavaScript generally add little to your page rank. You may want to hide those from the creepy crawlers too.
That’s why robots.txt is such a great tool for website owners. With this resource, you can direct search engines where you want them to go on your site – but more importantly you can keep them away from pages where you DON’T want them to go. The robots.txt file is a plain text file that tells search engine spiders and bots which parts of your site you want them to bypass or avoid.
You can disallow all crawlers from your entire server, exclude all bots from certain parts of the server, exclude specific search engine bots from specific parts of the server, allow a single robot or exclude a single robot, or exclude all files on your server but one. It all depends on your reason for the exclusions and what you’re trying to accomplish. For example, you may have a strategy that uses one set of pages on your server to optimize for Google and a different set to optimize for Yahoo.
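To make those options a little more concrete, here is a rough sketch of what each rule set might look like, checked with Python’s standard urllib.robotparser module. The bot name BadBot, the paths, and the helper is_allowed are made up for illustration; your own file would use your real directories and the user-agent names published by each search engine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule sets, one per scenario. Paths and "BadBot" are placeholders.
SCENARIOS = {
    # Keep every crawler out of the whole server.
    "block_everything": """
User-agent: *
Disallow: /
""",
    # Keep every crawler out of certain parts of the server.
    "block_some_parts": """
User-agent: *
Disallow: /print/
Disallow: /test-b/
""",
    # Exclude a single robot, leave everyone else alone.
    "exclude_one_robot": """
User-agent: BadBot
Disallow: /
""",
    # Allow a single robot (empty Disallow means "nothing is off limits").
    "allow_one_robot": """
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
""",
    # Exclude everything but one file. Allow is listed first because
    # Python's parser applies the first rule that matches.
    "all_but_one_file": """
User-agent: *
Allow: /index.html
Disallow: /
""",
}


def is_allowed(rules: str, agent: str, path: str) -> bool:
    """Return True if a crawler that obeys `rules` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)
```

Calling is_allowed(SCENARIOS["block_some_parts"], "Googlebot", "/print/page.html") reports whether an obedient Googlebot would skip the printer-friendly copy. Note that Allow is a widely supported extension rather than part of the original exclusion standard, and crawlers may resolve overlapping Allow and Disallow rules differently from Python’s parser, so it is worth testing with each search engine’s own tools.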
Be sure to put the robots.txt file in the top-level (root) directory of your web server, or crawlers and bots won’t find it. Well-behaved crawlers look there first to see if there is anything they should avoid. Of course, not all spiders and bots are obedient, and some may crawl a page anyway, but the ones that follow the rules will stay away when you use a robots.txt file. And be sure to use all lower case letters in the file name: it’s not ‘Robots.txt’, it’s always ‘robots.txt’.
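As a small sketch of why the location matters: a crawler typically works out the robots.txt address from the site root alone and ignores whatever folder the page itself lives in. The helper name robots_txt_url and the example.com addresses below are assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Build the address where a crawler would look for robots.txt.

    Only the scheme and host of the page are kept; the path is always
    the lower-case '/robots.txt' at the top level of the site.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A page deep inside the site still maps to the file at the root.
print(robots_txt_url("http://www.example.com/shop/items/page.html?id=3"))
```

A robots.txt file saved at /shop/robots.txt would never be requested under this scheme, which is why the file needs to sit in the main directory.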
Visit http://www.robotstxt.org/ for more information on robots.txt and decide if this is something you need to help with your SEO strategy.













