How to Create a robots.txt File for Your Site

By SB Glasby

So you've got a web site and you've submitted it to several search engines, but you are still not showing up in their listings. Perhaps you need a robots.txt file to tell search engine robots what they can and can't look at, or perhaps your hosting provider has placed one there that is blocking all search engines. Every properly designed site should be using a robots.txt file.

Instructions

Difficulty: Easy

Things You’ll Need:

  • A basic text editor.
  • Assumption: you already have a web site on a hosting server.
Step1
Check to see whether your site is already using a robots.txt file. Open a web browser, type in your site's domain name, and add robots.txt as the page you are requesting (http://www.yourdomain.com/robots.txt).
Step2
If you get the standard 404 (Page Not Found) error, your site does not have one. If you see anything else, that is what search engines see when they request the file.
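If you would rather run Steps 1 and 2 from a script than a browser, here is a minimal sketch using only Python 3's standard library. The helper names `robots_url` and `fetch_robots` are hypothetical, and http://www.yourdomain.com is the article's placeholder domain, not a real site to test against.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_url(site: str) -> str:
    """Build the robots.txt address for any page on a site.

    robots.txt always lives at the root, so joining with a leading "/"
    discards whatever path the input URL had.
    """
    return urljoin(site, "/robots.txt")


def fetch_robots(site: str, timeout: float = 10.0) -> Optional[str]:
    """Return the robots.txt text a crawler would see, or None on a 404."""
    try:
        with urlopen(robots_url(site), timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except HTTPError as err:
        if err.code == 404:
            return None  # Step 2: the site has no robots.txt
        raise  # other errors (403, 500, ...) deserve a closer look
```

Calling `fetch_robots("http://www.yourdomain.com/")` returns `None` when the server answers 404; any other text it returns is what search engines see.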
Step3
Open your text editor, press the space bar, then choose File > Save As and name the file robots.txt.
Step4
Type in:

User-agent: *
Disallow: /

to block all search engines from crawling your entire site.
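You can check what these two lines mean to a well-behaved crawler with Python's built-in robots.txt parser. This is a sketch: "Googlebot" is just an example crawler name, and the paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two lines from Step 4, exactly as they would appear in robots.txt.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_ALL.splitlines())

# "*" matches every crawler, and every path starts with "/",
# so no URL on the site may be fetched.
for path in ["/", "/index.html", "/images/logo.png"]:
    allowed = parser.can_fetch("Googlebot", "http://www.yourdomain.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

All three paths print as blocked.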
Step5
Alternatively, to block only certain parts of your site, type in:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /search.asp

These entries will stop (legitimate) search engines from crawling the listed directories and pages.
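To see which URLs the Step 5 rules actually cover, the same standard-library parser can be fed those lines. The sample paths below (form.pl, logo.png, contact.html) are hypothetical; Python's parser treats each Disallow value as a simple prefix of the URL path.

```python
from urllib.robotparser import RobotFileParser

# The rules from Step 5.
PARTIAL_BLOCK = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /search.asp
"""

parser = RobotFileParser()
parser.parse(PARTIAL_BLOCK.splitlines())

SITE = "http://www.yourdomain.com"
for path in ["/index.html", "/cgi-bin/form.pl", "/images/logo.png",
             "/search.asp", "/about/contact.html"]:
    verdict = "allowed" if parser.can_fetch("Googlebot", SITE + path) else "blocked"
    print(f"{path:22} {verdict}")
```

Only the pages outside the three listed locations come back as allowed.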
Step6
Upload the robots.txt file to the root of your site. The root is the directory where your main (home) page is.
Step7
Open a web browser and request http://www.yourdomain.com/robots.txt again. You should now see the text file you just uploaded.
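If you want to confirm the upload rather than eyeball it, a small comparison helps, because uploading can change line endings (for example, an FTP transfer in ASCII mode). This is a sketch; `upload_matches` is a hypothetical helper, and it assumes you already have the served text, for instance copied from the browser.

```python
def normalize(text: str) -> list[str]:
    """Trim surrounding whitespace, then split into lines without
    trailing spaces; splitlines() also absorbs CRLF vs LF differences."""
    return [line.rstrip() for line in text.strip().splitlines()]


def upload_matches(local_text: str, served_text: str) -> bool:
    """True if the served robots.txt carries the same rules as the local file."""
    return normalize(local_text) == normalize(served_text)
```

Pass it the contents of your local robots.txt and the text the server returned; `False` means the live file is not the one you meant to upload.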

Tips & Warnings

  • If your hosting provider does not let you create or modify your own robots.txt file, ask them to place a custom file for your site on their servers.
  • Technically, you are telling search engines what they can see and index by telling them what not to look at.
  • If your site has no robots.txt file, search engines assume that everything on it is OK to index.
  • Check the robots.txt files on other sites to see what they are blocking (including which search engines they block).
  • To tell spiders not to index a whole directory, follow the directory name with a trailing slash, e.g. /directory/. The trailing slash tells the robot this is a directory.
  • Although most robots run from UNIX servers, it's a good idea to make sure every directory or file named in robots.txt uses exactly the same case as the name on the server. Windows servers will serve up file names in mixed case, and UNIX servers will too if they are configured to do so.
  • Best practice is to name all files in lowercase, no matter which server platform you are on.
  • An auto-generated robots.txt file that sends anything other than plain text could make a search engine NOT index your site.
  • If an auto-generated robots.txt file sends back an HTML page on request, search engines may not index your site.
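The trailing-slash and case tips can be checked against Python's standard robots.txt parser, which compares each Disallow value to the start of the URL path, character for character. This sketch only shows how that one parser behaves; `is_allowed` and the /private paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

SITE = "http://www.yourdomain.com"


def is_allowed(rules: str, path: str) -> bool:
    """Check one path against a robots.txt body using Python's parser."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("Googlebot", SITE + path)


WITH_SLASH = "User-agent: *\nDisallow: /private/\n"
WITHOUT_SLASH = "User-agent: *\nDisallow: /private\n"

# Trailing slash: only URLs inside the directory are blocked.
print(is_allowed(WITH_SLASH, "/private/notes.html"))   # False
print(is_allowed(WITH_SLASH, "/private.html"))         # True

# No trailing slash: the rule is a plain prefix, so it also
# catches files and folders that merely start with "private".
print(is_allowed(WITHOUT_SLASH, "/private.html"))      # False
print(is_allowed(WITHOUT_SLASH, "/privateer/"))        # False

# Case matters: a rule for /Images/ does not cover /images/.
CASED = "User-agent: *\nDisallow: /Images/\n"
print(is_allowed(CASED, "/Images/logo.png"))           # False
print(is_allowed(CASED, "/images/logo.png"))           # True
```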
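For the auto-generated case, a quick sanity check is to look at the Content-Type header and the first characters of the response. This is a hedged sketch: `robots_problems` is a hypothetical helper, and it assumes that anything other than `text/plain`, or a body that opens like an HTML page, is worth investigating.

```python
def robots_problems(content_type: str, body: str) -> list[str]:
    """Flag signs that a robots.txt response is not a plain-text rules file.

    content_type: the Content-Type header value the server sent.
    body: the response body as text.
    """
    problems = []
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"served as {media_type or 'no type'}, expected text/plain")
    # An HTML error or placeholder page usually starts with one of these.
    start = body.lstrip().lower()
    if start.startswith(("<!doctype", "<html")):
        problems.append("body looks like an HTML page, not robots.txt rules")
    return problems
```

An empty list means the response at least looks like a plain-text robots.txt file.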

Comments

mar1965 said on 3/16/2008: Excellent tip! Thanks for sharing this!

webmiser said on 1/14/2008: Thank you, wish more people actually understood this concept.

eHow Article: How to Create a robots.txt File for Your Site
Category: Internet