How to Avoid Bot Checks
Search engines such as Google and Bing build their listings by using automated programs to scour the Internet for links and content. These programs, known as bots, crawlers or spiders, visit a website when its owner submits the URL to the search engine, or when a site the bots have already visited links to it. A website owner might not want the website, or parts of it, to be listed on a search engine. To let owners keep bots out of parts of their site, search bots look for a file called robots.txt immediately upon visiting a new website.
Instructions
-
1
Open a text editor, such as Microsoft's Notepad.
-
2
Type in the following lines to block all bots from every area of your site:
User-agent: *
Disallow: /
-
3
Alter the "User-agent" value to the name of a search engine's spider to create rules specifically for that bot. Change the "Disallow" value to specific directory names to block bots from accessing only those directories while allowing them to traverse the rest of the website. Add multiple "User-agent" lines, each followed by its own "Disallow" lines, to create different rules for several bots. For example, the following lines block most search bots from all portions of a website, but allow Google's bots unfettered access, except to two directories:
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /private/
Disallow: /secret/
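Before uploading, the rules can be checked locally. The following is a minimal sketch that uses Python's standard urllib.robotparser module to test the example rules above; the helper name is_allowed, the bot name "ExampleBot" and the www.example.com URLs are assumptions made for illustration. Real crawlers may interpret edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this step, with a blank line between the two groups.
RULES = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /private/
Disallow: /secret/
"""


def is_allowed(rules: str, bot: str, url: str) -> bool:
    """Return True if `bot` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(bot, url)


if __name__ == "__main__":
    checks = [
        ("Googlebot", "https://www.example.com/index.html"),
        ("Googlebot", "https://www.example.com/private/report.html"),
        ("ExampleBot", "https://www.example.com/index.html"),  # hypothetical bot
    ]
    for bot, url in checks:
        verdict = "allowed" if is_allowed(RULES, bot, url) else "blocked"
        print(f"{bot}: {url} is {verdict}")
```

Under these rules, Googlebot should be allowed on the home page but blocked from the two listed directories, while any other bot falls under the "*" group and is blocked everywhere.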
-
4
Save the text file under the exact name "robots.txt", in all lowercase letters. Do not add capitalization or make any other changes to the file name.
-
5
Upload the file to the main root directory for your website, where the "Main," "Welcome" or "Index" page is typically located. Check that the web address for the file consists of your domain name followed by a single forward slash and the file name (www.example.com/robots.txt). Avoid putting the file in another directory, as robots only check the root directory for robots.txt instructions.
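To illustrate where bots look for the file, here is a small sketch that derives the expected robots.txt address from any page address on a site and checks whether a given address points at the root-level file. The function names robots_url and is_root_robots are hypothetical, and www.example.com stands in for your own domain.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Build the root-level robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_root_robots(url: str) -> bool:
    """Return True only if url points at a lowercase robots.txt in the root."""
    return urlsplit(url).path == "/robots.txt"


if __name__ == "__main__":
    print(robots_url("https://www.example.com/blog/post.html"))
    print(is_root_robots("https://www.example.com/files/robots.txt"))
```

A file uploaded to a subdirectory, or saved with different capitalization, fails the second check, which mirrors how bots skip it.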
Tips & Warnings
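In a "User-agent" line, the asterisk is a special value that addresses all search bots. The original robots.txt convention did not allow it as a "wild card" anywhere else, although major crawlers such as Googlebot and Bingbot now also accept "*" inside "Disallow" paths, so do not rely on path wild cards working for every bot.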
You can hide specific pages as well as directories by typing out the directory path to the file. For example, the following text hides only the "nospiders.html" page located in the "secret" directory: "Disallow: /secret/nospiders.html".
Altering the name of the file or placing it in any directory other than the root directory will cause bots to ignore the directions in the file.
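The single-page rule mentioned in the tips can be sanity-checked the same way as a directory rule. This sketch again relies on Python's standard urllib.robotparser; the helper page_is_hidden, the bot name "ExampleBot", the page "other.html" and the www.example.com domain are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# A rule that targets one page rather than a whole directory.
PAGE_RULES = """\
User-agent: *
Disallow: /secret/nospiders.html
"""


def page_is_hidden(url: str, bot: str = "ExampleBot") -> bool:
    """Return True if PAGE_RULES tell `bot` (a hypothetical name) to skip url."""
    parser = RobotFileParser()
    parser.parse(PAGE_RULES.splitlines())
    return not parser.can_fetch(bot, url)


if __name__ == "__main__":
    for page in ("nospiders.html", "other.html"):
        url = f"https://www.example.com/secret/{page}"
        print(url, "hidden" if page_is_hidden(url) else "visible")
```

Only the named page should be reported as hidden; other files in the "secret" directory stay open to bots.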