How to Use Robots.txt for Competitive Intelligence

If you run a website, you compete with millions of other websites and blogs for visitors, customers and search engine position. To gain an advantage over competitors, you need to know as much as you can about them, and gathering that information is known as "competitive intelligence." One way to gain competitive intelligence is to explore the parts of a website that a competitor doesn't want search engines to crawl. You can often get a glimpse of these in your competitor's "robots.txt" file, a plain-text file at the root of the site that tells crawlers which paths to stay out of.

Instructions

    • 1

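      Open a Web browser and navigate to your competitor's home page. Add "/robots.txt" without the quotes to the end of the site's root address in your browser navigation bar (for example, "www.example.com/robots.txt") and press "Enter." If the site has a robots.txt file, the browser displays it as plain text.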

    • 2

      Review the restricted subdirectories your competitor has listed in the robots.txt file by searching for entries such as "Disallow: /example/" (without the quotes). These are folders on the site that your competitor has asked search engines not to crawl. Some of these may be administrative folders that would be illegal to hack, but you might also see public files or projects your competitor is developing. For instance, if you see a "/forum/" entry, that might tell you that your competition is building a forum. You can use this information to build your own forum and bring it online first.

    • 3

      Examine the "User-agent" entries in the robots.txt file. Each one names a bot, and the "Disallow" lines beneath it apply to that bot only; a "Disallow: /" line bars the bot from the whole site. Ask yourself why your competitor does not want specific bots to crawl the site. For example, is your competitor blocking a particular search engine, and if so, why? If Baidu's crawler (listed as "Baiduspider") is blocked, it might be that there's no market for the service in China. However, it could also reveal a gap you can use to target the Chinese market and beat your competition to the punch.
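
If you want to review several competitors, the three steps above can be scripted. The following is a minimal Python sketch, not a definitive tool. The function names, the example.com address and the sample file contents are all hypothetical and chosen for illustration; only the standard-library calls (urljoin and RobotFileParser) are real. It works on robots.txt text you have already saved, so it makes no network requests itself.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, made up for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /forum/

User-agent: Baiduspider
Disallow: /
"""


def robots_url(page_address):
    """Step 1: build the robots.txt address from any page on the site."""
    return urljoin(page_address, "/robots.txt")


def disallowed_paths(robots_text):
    """Step 2: collect the Disallow paths listed under each user-agent."""
    groups = {}
    agents = []        # user-agents the current group of rules applies to
    in_rules = False   # True once the current group has started listing rules
    for raw_line in robots_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
            groups.setdefault(value, [])
        elif field in ("allow", "disallow"):
            in_rules = True
            # An empty "Disallow:" blocks nothing, so it is skipped.
            if field == "disallow" and value:
                for agent in agents:
                    groups[agent].append(value)
    return groups


def fully_blocked_agents(robots_text, agent_names):
    """Step 3: report which of the named bots may not fetch the home page."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [name for name in agent_names if not parser.can_fetch(name, "/")]


if __name__ == "__main__":
    print(robots_url("https://www.example.com/products/index.html"))
    print(disallowed_paths(SAMPLE_ROBOTS_TXT))
    print(fully_blocked_agents(SAMPLE_ROBOTS_TXT, ["Googlebot", "Baiduspider"]))
```

To run it against a real site, you would save the text shown at the address returned by robots_url and pass that text in place of SAMPLE_ROBOTS_TXT. The list of bot names to check is your own choice; "Googlebot" and "Baiduspider" are used here only as examples.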
