How much do you know about the robots.txt file? It is a handy tool for any webmaster to understand. Here are some of the questions I get most often about this special little file.

What is the robots.txt file used for?

In web site development, the robots.txt file is a special file that tells search engine spiders and crawlers which parts of your site they may visit. Here is a little more about these robots from robotstxt.org:

Web Robots (also known as Web Wanderers, Crawlers, or Spiders), are programs that traverse the Web automatically. Search engines such as Google use them to index the web content, spammers use them to scan for email addresses, and they have many other uses.

How do you create a robots.txt file?

All you have to do is create a plain text file named robots.txt in your public_html folder inside your hosting account, so that it sits at the root of your site. You can create it from within your control panel, or create it on your desktop and upload it via an FTP client.

Can a robot ignore my robots.txt file?

Yes, but this does not happen often. Most reputable mainstream robots will not do this. So is there a way to block just the bad robots? Kind of. The robots.txt file only controls robots that choose to obey it. For the ones that don’t, you may be able to block the IP address the robot is coming from (if it comes from a single IP address).
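To make the "kind of" a little more concrete, here is a minimal Python sketch of a rule set that turns away one named robot while letting everyone else in. It uses the standard library's urllib.robotparser to answer the same question a well-behaved robot asks before it fetches a page. The name "BadBot" and the helper may_crawl are made up for illustration, and none of this stops a robot that never reads the file.

```python
from urllib.robotparser import RobotFileParser

# "BadBot" is a hypothetical robot name used only for this example.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""


def may_crawl(agent: str, path: str, rules: str = RULES) -> bool:
    """Return True if a robot calling itself `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("BadBot", "Googlebot"):
        print(agent, may_crawl(agent, "/index.html"))
```

A polite robot that identifies itself as BadBot would stay out; an impolite one will simply ignore the answer, which is where blocking by IP address comes in.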

What should I put in my robots.txt file?

This would depend on what you want to do.

If you would like to make sure all crawlers and spiders make their way in, insert this into your robots.txt file:

User-agent: *
Disallow:

If you want to keep the spiders and bots away from your content in a particular folder, use this:

User-agent: *
Disallow: /privatestuff

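Here "privatestuff" is the folder you want crawlers to stay out of. Because the rule matches by prefix, it also covers any other path that begins with /privatestuff; add a trailing slash (/privatestuff/) to limit it to the folder itself.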
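If you want to try the two rule sets above before uploading anything, a small Python sketch like the following can help. It feeds each rule set to the standard library's urllib.robotparser and asks whether a path may be crawled. The helper is_allowed, the robot name "ExampleBot", and the sample paths are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets from this article.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

BLOCK_FOLDER = """\
User-agent: *
Disallow: /privatestuff
"""


def is_allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under the given rules.

    "ExampleBot" is a hypothetical robot name; any name falls under
    the "User-agent: *" entry in these rule sets.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/privatestuff/notes.html"):
        print(path, is_allowed(ALLOW_ALL, path), is_allowed(BLOCK_FOLDER, path))
```

This only tells you how Python's parser reads the rules; individual search engines may interpret edge cases a little differently.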

How can I check that my robots.txt file has been created successfully?

It is easy to check if you have an account with Google Webmaster Tools. To analyze a site’s robots.txt file:

  • Sign into Google Webmaster Tools with your Google Account.
  • On the Dashboard, click the URL for the site you want.
  • Click Tools, and then click Analyze robots.txt.

How can I figure out where this robot or crawler came from?

You can find a good list of just about all the different robots out there here:

That is a good list to check first when a robot is crawling your site and you don’t know where it came from.

How do you know you are being visited by a robot?

He will mumble something about Klaatu barada nikto. No wait, that was my buddy Gort. Often the sign that you’re being visited by a robot is that your server logs show many documents retrieved in a very short period of time.
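Here is a rough Python sketch of that log check. It assumes your server writes Apache-style common or combined log lines (IP address first, timestamp in square brackets), counts requests per IP address per minute, and reports any address that goes over a threshold. The function name find_bursts and the default of 60 requests per minute are arbitrary choices for illustration, not a recommended setting.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed format: the start of an Apache common/combined log line, e.g.
# 203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /a.html HTTP/1.1" 200 123
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def find_bursts(lines, max_per_minute=60):
    """Return {ip: busiest one-minute request count} for IPs over the limit."""
    per_minute = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip, stamp = match.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        # Bucket each request by IP address and calendar minute.
        per_minute[(ip, when.replace(second=0))] += 1

    peaks = {}
    for (ip, _minute), count in per_minute.items():
        if count > max_per_minute:
            peaks[ip] = max(count, peaks.get(ip, 0))
    return peaks


if __name__ == "__main__":
    # "access.log" is a placeholder path; point it at your own log file.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for ip, count in find_bursts(log).items():
            print(ip, count)
```

A busy human visitor or a shared proxy can trip a low threshold too, so treat the result as a list of addresses worth a closer look rather than proof of a robot.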

I hope that helps you in your robot-taming ways. If you have any other questions, feel free to ask away.

© Lunarpages Web Hosting