A Spider Bot Project

A spider bot helps you sift through information.

Even if you don't own a robot that can cook dinner or rake leaves, you can at least create a robot to obey your commands online with a little programming know-how. A spider bot project lets you write a program, called a "bot," that crawls Web pages and extracts information for later use. Search engines and many other companies rely on spider bots to "crawl" the Web and retrieve data. The code samples below use C# and .NET, but you can adapt them to any language that has an HTTP library.
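At its heart, crawling is a loop: take the next address from a queue, fetch the page, record it, and queue any links you have not seen before. The sketch below shows that loop as a breadth-first search. It is an illustration under stated assumptions, not a complete bot: the hypothetical `getLinks` function stands in for "download this page and extract its links," so the control flow can be tried without touching the network, and `maxPages` is a hypothetical limit that keeps the bot from running forever.

```csharp
using System;
using System.Collections.Generic;

// Minimal breadth-first crawl loop (a sketch, not a production crawler).
// getLinks is a hypothetical stand-in for downloading a page and
// extracting its links; maxPages caps how many pages are processed.
public static class CrawlSketch
{
    public static List<string> Crawl(
        string startUrl,
        Func<string, IEnumerable<string>> getLinks,
        int maxPages)
    {
        var visited = new HashSet<string>();   // every address ever queued
        var frontier = new Queue<string>();    // addresses waiting to be crawled
        var order = new List<string>();        // addresses in the order crawled

        frontier.Enqueue(startUrl);
        visited.Add(startUrl);

        while (frontier.Count > 0 && order.Count < maxPages)
        {
            string url = frontier.Dequeue();
            order.Add(url);                    // "fetch" and process the page

            foreach (string link in getLinks(url))
            {
                // HashSet.Add returns false for links already seen,
                // so no page is queued twice.
                if (visited.Add(link))
                {
                    frontier.Enqueue(link);
                }
            }
        }
        return order;
    }
}
```

The visited set is what stops the bot from bouncing between pages that link to each other; the queue makes it finish nearby pages before wandering further away.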

  1. Environment

    • To begin your project, create a new file in your development environment of choice, such as Eclipse or Visual Studio. You can use almost any language together with a class library that can send HTTP requests and process the data they return. For example, you might use the Web programming features of your framework to create a spider that alerts you when a price has dropped or when a Web page has added new material.
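      For the "alert me when a page changes" idea, one simple approach (an assumption, not the only way) is to store a fingerprint of each download and compare it with the previous one. The sketch below uses a SHA-256 hash of the page text; the class and method names are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical change-detection helper: fingerprint each download
// and compare it with the fingerprint saved from the last visit.
public static class ChangeDetector
{
    // Returns the SHA-256 hash of the page text as 64 hex characters.
    public static string Fingerprint(string pageText)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(pageText));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // True when there is no earlier fingerprint or the content differs.
    public static bool HasChanged(string previousFingerprint, string currentPageText)
    {
        return previousFingerprint == null
            || previousFingerprint != Fingerprint(currentPageText);
    }
}
```

      Any byte-level difference counts as a change, so pages that embed timestamps or rotating ads will look "new" on every visit; a price-watching bot would more likely hash only the extracted price.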

  2. URI

    • Most programmers create a URI object before they create the spider bot itself. Instantiate a Uri object, passing the address of the Web page you want your spider bot to crawl as the argument:

      Uri uri = new Uri("http://www.SampleSite.com/");

      The URI object represents the Web page the spider bot will eventually crawl. Here, the URI object points to the "SampleSite" Web site. Once you've created a URI or list of URIs, you're ready to begin building the bot itself.
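      Two Uri tasks come up constantly in a crawler: rejecting bad addresses without crashing, and turning relative links such as "../about.html" into full addresses. The sketch below uses the standard Uri.TryCreate method and the Uri(Uri, string) constructor; the helper names and the rule of accepting only http and https addresses are assumptions for illustration.

```csharp
using System;

// Hypothetical helpers for preparing addresses before crawling.
public static class UriHelpers
{
    // Accepts only absolute http/https addresses. Malformed input,
    // such as the typo "http;//", returns false instead of throwing.
    public static bool TryGetCrawlUri(string address, out Uri uri)
    {
        if (Uri.TryCreate(address, UriKind.Absolute, out Uri parsed)
            && (parsed.Scheme == Uri.UriSchemeHttp || parsed.Scheme == Uri.UriSchemeHttps))
        {
            uri = parsed;
            return true;
        }
        uri = null;
        return false;
    }

    // Resolves a link found on a page (possibly relative) against the page's address.
    public static Uri Resolve(Uri page, string href)
    {
        return new Uri(page, href);
    }
}
```

      For example, resolving "shoes.html" against http://www.SampleSite.com/products/ yields an address whose path is /products/shoes.html.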

  3. HttpWebRequest

    • Create an HttpWebRequest from the URI to form the core of the bot:

      // WebRequest.Create returns an HttpWebRequest for http:// and https:// addresses
      HttpWebRequest sampleWebRequest = (HttpWebRequest)WebRequest.Create(uri);

      The WebRequest lies at the core of the spider bot. It requests Web pages in much the same way as a Web browser such as Internet Explorer or Firefox. However, rather than rendering the page for viewing, the bot receives the page's raw HTML, which your program can then read and process. Almost any Web framework contains an equivalent of .NET's HttpWebRequest object; in .NET 6 and later, HttpWebRequest is marked obsolete in favor of HttpClient.
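      If you target a newer .NET version, the HttpClient equivalent of this step is to build an HttpRequestMessage. The sketch below only builds the request, it does not send it; the User-Agent string "SampleSpiderBot/1.0" is a hypothetical name chosen so site owners can recognize the bot's traffic.

```csharp
using System;
using System.Net.Http;

// Builds (but does not send) a GET request for a page.
// "SampleSpiderBot/1.0" is a hypothetical bot name.
public static class RequestBuilder
{
    public static HttpRequestMessage BuildPageRequest(Uri uri)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, uri);
        // Identify the bot so site owners can recognize its requests.
        request.Headers.UserAgent.ParseAdd("SampleSpiderBot/1.0");
        // Tell the server the bot wants HTML.
        request.Headers.Accept.ParseAdd("text/html");
        return request;
    }
}
```

      To send it, you would pass the request to HttpClient.SendAsync and await the resulting HttpResponseMessage; reusing a single HttpClient instance across requests is the usual recommendation.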

  4. HttpWebResponse

    • Put your bot to work by calling the request's GetResponse method, which requests the Web page and returns the server's reply. Cast the result to an HttpWebResponse:

      HttpWebResponse SampleHttpWebResponse = (HttpWebResponse)sampleWebRequest.GetResponse();

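      The above code sends the request for the www.SampleSite.com page and returns the server's response. Casting the response to an HttpWebResponse gives you access to HTTP-specific details such as StatusCode and ContentType; the page content itself is read from the response stream. Note that GetResponse throws a WebException when the server answers with an error status such as 404 Not Found, so a bot that crawls many pages should catch it.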
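      Before reading a response, a bot usually checks that it actually got an HTML page rather than, say, an image. The sketch below is one possible rule, assumed for illustration: accept only 200 OK responses whose Content-Type media type is text/html. The class name is hypothetical; StatusCode and ContentType are real HttpWebResponse properties you could pass in.

```csharp
using System;
using System.Net;

// Hypothetical filter deciding whether a response is worth parsing.
public static class ResponseFilter
{
    public static bool ShouldParse(HttpStatusCode status, string contentType)
    {
        if (status != HttpStatusCode.OK || contentType == null)
        {
            return false;
        }
        // Content-Type may carry parameters, e.g. "text/html; charset=utf-8",
        // so compare only the media type before the first ';'.
        string mediaType = contentType.Split(';')[0].Trim();
        return mediaType.Equals("text/html", StringComparison.OrdinalIgnoreCase);
    }
}
```

      Usage with the response above: ResponseFilter.ShouldParse(SampleHttpWebResponse.StatusCode, SampleHttpWebResponse.ContentType).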

  5. Stream

    • To make the Web page readable as text, decode the response body into a String variable. Read the HttpWebResponse's stream with a StreamReader, choosing an encoding that matches the page; UTF-8 is a safer default than ASCII, which cannot represent accented or other non-ASCII characters. Then close the reader and the response:

      // Decode the response body as UTF-8 text
      StreamReader sampleStreamReader = new StreamReader(
      SampleHttpWebResponse.GetResponseStream(),
      System.Text.Encoding.UTF8 );
      String result = sampleStreamReader.ReadToEnd();
      // Close the reader (and its stream), then the response, to free the connection
      sampleStreamReader.Close();
      SampleHttpWebResponse.Close();

      From this point, you can save the Web page's data to a database, upload it to a server, display it, or scan it for links to crawl next.
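      Scanning the downloaded text for links is what turns a single download into a crawl. The sketch below pulls href values out of HTML such as the result string and resolves them against the page address. It is a simplification under stated assumptions: it uses a regular expression rather than a real HTML parser, skips in-page "#" links, and keeps only http and https addresses. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Rough link extractor (a regex is a simplification; an HTML parser
// handles unusual markup better). Names are hypothetical.
public static class LinkExtractor
{
    // Captures the href value of <a> tags, stopping at a quote or '#'.
    private static readonly Regex HrefPattern = new Regex(
        "<a\\s[^>]*href\\s*=\\s*[\"']([^\"'#]+)",
        RegexOptions.IgnoreCase);

    public static List<Uri> ExtractLinks(Uri page, string html)
    {
        var links = new List<Uri>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            // Resolve relative links against the page; drop invalid
            // values and non-Web schemes such as mailto:.
            if (Uri.TryCreate(page, m.Groups[1].Value, out Uri link)
                && (link.Scheme == Uri.UriSchemeHttp || link.Scheme == Uri.UriSchemeHttps))
            {
                links.Add(link);
            }
        }
        return links;
    }
}
```

      Feeding these links into a queue with a "visited" set gives you the breadth-first crawl loop described at the start of the project.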
