How to Create a robots.txt File
Search engines use spiders (also called robots or bots) to crawl and index your website, collecting the keywords that bring up your site in search results. A robots.txt file is a simple text file you can create to tell these spiders that you don't want them to crawl your site, or certain parts of it.
Instructions
1. Open your favorite text editor. Any plain text editor will do; if you're on a PC, Notepad works fine and can be found under "Accessories."
2. Enter two lines: one naming the spider the rule applies to, and one naming the directory or file you want to exclude from the search. This is the syntax:
User-Agent: [Spider or Bot name]
Disallow: [Directory or File Name]
For example:
User-Agent: Googlebot
Disallow: /mywebsite/private.html
Here "Googlebot" is the robot sent out by Google, and "private.html" is the file in the directory "mywebsite" that you do not want the robot to crawl. The path after Disallow starts with "/" and is relative to the root of your site.
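To check how a rule like this is read, you can feed it to a robots.txt parser. The sketch below uses Python's standard-library urllib.robotparser as a stand-in checker; individual search engines run their own parsers, so treat it as an approximation. The domain www.mywebsite.com is the example address from this article, and "Bingbot" is just an illustrative second bot name.

```python
from urllib.robotparser import RobotFileParser

# The two record lines from the example above, one string per line.
rules = [
    "User-Agent: Googlebot",
    "Disallow: /mywebsite/private.html",
]

parser = RobotFileParser()
parser.parse(rules)

site = "http://www.mywebsite.com"  # example domain; substitute your own

for agent, path in [
    ("Googlebot", "/mywebsite/private.html"),  # named bot, excluded file
    ("Googlebot", "/mywebsite/index.html"),    # named bot, any other file
    ("Bingbot", "/mywebsite/private.html"),    # a bot the rule does not name
]:
    print(agent, path, parser.can_fetch(agent, site + path))
```

Only the first check comes back False: the rule names Googlebot alone, so every other bot is still allowed to crawl private.html.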
3. Exclude a page or section of your site from all spiders. If you don't want any robot to crawl a certain part of your site, put the asterisk ("*") after User-Agent; it matches every robot. Your file would look like this:
User-Agent: *
Disallow: /mywebsite/private.html
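A quick way to confirm that "*" covers every crawler is to ask a parser about several bot names. This is a sketch using Python's urllib.robotparser as a stand-in; the names "Bingbot" and "SomeOtherBot" are illustrative, and the domain is this article's example address.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-Agent: *",
    "Disallow: /mywebsite/private.html",
])

site = "http://www.mywebsite.com"  # example domain; substitute your own
agents = ["Googlebot", "Bingbot", "SomeOtherBot"]

# "*" matches every crawler, so each name below gets the same answer.
for agent in agents:
    blocked = not parser.can_fetch(agent, site + "/mywebsite/private.html")
    allowed = parser.can_fetch(agent, site + "/mywebsite/index.html")
    print(f"{agent}: private.html blocked={blocked}, index.html allowed={allowed}")
```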
4. Exclude your whole site from all robots. If you don't want robots to crawl any of your site (for example, while you are still building it and it is not ready for the public), put "*" after User-Agent and a single "/" after Disallow. For example:
User-Agent: *
Disallow: /
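Disallow values are matched as path prefixes, and every path on a site begins with "/", so this one rule covers everything, including the home page. The sketch below checks that with Python's urllib.robotparser, used here only as a stand-in parser; the domain and paths are example values.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-Agent: *",
    "Disallow: /",
])

site = "http://www.mywebsite.com"  # example domain; substitute your own
paths = ["/", "/index.html", "/mywebsite/private.html"]

# Each path starts with "/", so the single rule matches all of them.
for path in paths:
    print(path, parser.can_fetch("Googlebot", site + path))
```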
5. To allow all robots to access your whole site, use the asterisk as before and leave the Disallow value empty, as follows:
User-Agent: *
Disallow:
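An empty Disallow and "Disallow: /" differ by a single character but mean opposite things. This sketch puts the two side by side, again using Python's urllib.robotparser as a stand-in checker; make_parser is a hypothetical helper name and the URL is an example.

```python
from urllib.robotparser import RobotFileParser

def make_parser(lines):
    """Build a parser from a list of robots.txt lines (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser

# Empty Disallow value: nothing is excluded.
allow_all = make_parser(["User-Agent: *", "Disallow:"])
# For contrast, a single "/" excludes everything.
block_all = make_parser(["User-Agent: *", "Disallow: /"])

url = "http://www.mywebsite.com/mywebsite/private.html"  # example URL
print("Disallow:   ->", allow_all.can_fetch("Googlebot", url))
print("Disallow: / ->", block_all.can_fetch("Googlebot", url))
```

Because the difference is so small, it is worth double-checking which form you saved before uploading the file.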
6. Save the file as robots.txt (with the "s"; crawlers only look for that exact name) and place it in the root directory of your website, so that it can be reached at an address such as http://www.mywebsite.com/robots.txt.
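Crawlers request robots.txt only from the root of a host, whatever page they are about to visit. The sketch below saves the file into a temporary folder standing in for your web server's root directory (an assumption; your real root depends on your host) and shows where a crawler would look for it. robots_url is a hypothetical helper name.

```python
from pathlib import Path
from tempfile import TemporaryDirectory
from urllib.parse import urljoin

RULES = "User-Agent: *\nDisallow: /mywebsite/private.html\n"

def robots_url(page_url: str) -> str:
    """Return the address a crawler checks for this page's site (hypothetical helper)."""
    # A leading "/" resolves against the host root, dropping the page's directory.
    return urljoin(page_url, "/robots.txt")

# Temporary folder standing in for the website's root directory.
with TemporaryDirectory() as site_root:
    path = Path(site_root) / "robots.txt"  # exact name; "robot.txt" would be ignored
    path.write_text(RULES, encoding="ascii")
    saved_name = path.name
    saved_text = path.read_text(encoding="ascii")

print(saved_name)
print(robots_url("http://www.mywebsite.com/mywebsite/private.html"))
```

Even for a page deep inside /mywebsite/, the crawler asks for http://www.mywebsite.com/robots.txt, which is why a copy saved inside a subdirectory has no effect.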
Tips & Warnings
A robots.txt file is not a security measure. It only asks robots not to crawl certain pages; anyone can still open those pages directly. Many bots ignore the file and crawl the restricted sections of your site anyway, and some go looking for exactly those sections, since robots.txt is publicly readable and lists them.
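Obeying robots.txt is a choice made in the crawler's own code, which is why the file cannot protect anything. The sketch below, using Python's urllib.robotparser as a stand-in, contrasts a crawler that checks the rules with one that skips the check; "ExampleBot" and the URL list are illustrative values.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-Agent: *",
    "Disallow: /mywebsite/private.html",
])

queue = [
    "http://www.mywebsite.com/index.html",
    "http://www.mywebsite.com/mywebsite/private.html",
]

# A well-behaved crawler checks the rules before requesting each URL.
polite_plan = [url for url in queue if parser.can_fetch("ExampleBot", url)]

# Nothing enforces that check: a crawler that skips it plans to request everything.
rude_plan = list(queue)

print("polite:", polite_plan)
print("rude:  ", rude_plan)
```

Unless the page is protected some other way, such as with a password, the server answers the second crawler's request for private.html like any other.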
If you restrict your entire site while it is under construction, remember to lift that restriction when your site is ready for viewing so that it can be indexed.