What Is a PHP Spider?

A spider follows links from one page to another.

A spider is a program that logs links between Web pages. Spiders can be written in any programming language, including PHP. You can get a spider program written in PHP either by downloading one, or writing one. A number of spider function libraries are available for PHP.

    PHP

    • PHP is a programming language designed for creating dynamic Web pages. A dynamic Web page is one whose content changes according to user actions or calling parameters. Web pages are written in the Hypertext Markup Language, or HTML. HTML is a formatting system, however, not a programming language. A page written only in HTML is stored and delivered in its original state; this kind of file is called a static Web page. Web pages created by PHP take the form of an HTML template with blocks of programming code inserted in the document. When the page is requested, the server executes those blocks and replaces each one with the HTML it outputs.
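
As a rough illustration of the template idea, the sketch below mixes literal HTML with a PHP block that is replaced by its output when the code runs. The function name `render_page` and the `name` parameter are invented for this example, and output buffering is used only so the result can be returned as a string.

```php
<?php
// Hypothetical helper: renders a small HTML template for the given visitor name.
function render_page(string $name): string
{
    ob_start(); // capture everything the template prints
    ?>
<html>
  <body>
    <p>Hello, <?php echo htmlspecialchars($name); ?>!</p>
  </body>
</html>
<?php
    return ob_get_clean(); // the PHP block above is now plain HTML text
}

// The calling parameter changes the page content, which is what makes it dynamic.
echo render_page($_GET['name'] ?? 'visitor');
```

Requesting the page with a different `name` value would produce different HTML from the same template, whereas a static page would always be delivered unchanged.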

    Spider

    • A spider is also called a Web crawler or a Web bot. The purpose of this category of program is to document Web pages on the World Wide Web. The program needs a Web page as a starting point, which is called a “seed.” It then follows a link in that page to another page, follows a link in that page to yet another page, and so on. The spider can be written to log information about each page it visits, or just to note its existence. Search engine spiders copy each page encountered into the search engine's database, where other programs perform further analysis. Although many spider implementations perform a range of tasks, the act of passing from one page to another is the task that defines the program as a spider.
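
The seed-and-follow loop described above can be sketched as follows. This is a minimal illustration, not a production crawler: the function name `crawl`, the `$fetchLinks` callback and the `$maxPages` limit are assumptions made for the example, and a breadth-first visiting order is just one possible choice. The callback stands in for the step that downloads a page and returns the links found in it, so the sketch can be tried on an in-memory "web" without any network access.

```php
<?php
/**
 * Visit pages starting from a seed URL and log the links found on each one.
 *
 * $fetchLinks is a hypothetical callback: given a URL, it returns an array
 * of the URLs that page links to.
 */
function crawl(string $seed, callable $fetchLinks, int $maxPages = 100): array
{
    $queue = [$seed];         // pages waiting to be visited
    $seen  = [$seed => true]; // pages already queued, to avoid revisiting
    $log   = [];              // visited URL => links found on that page

    while ($queue !== [] && count($log) < $maxPages) {
        $url   = array_shift($queue);
        $links = $fetchLinks($url);
        $log[$url] = $links;

        foreach ($links as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[]     = $link;
            }
        }
    }

    return $log;
}

// A made-up three-page site used in place of real HTTP requests.
$web = [
    'http://example.test/'  => ['http://example.test/a', 'http://example.test/b'],
    'http://example.test/a' => ['http://example.test/b'],
    'http://example.test/b' => ['http://example.test/'],
];

$log = crawl('http://example.test/', function (string $url) use ($web): array {
    return $web[$url] ?? [];
});

print_r(array_keys($log)); // the pages visited, in visiting order
```

The `$seen` array matters because pages often link back to each other; without it the spider would loop between the same pages indefinitely.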

    Programming Spiders

    • Although a spider can be written in any language, Java, Perl and C# are common choices for these programs, mainly because many programmers specializing in Web programming are familiar with them. Python and PHP are also used, again because these languages have a large pool of skilled developers in the Web community.

    Method

    • PHP is usually used to generate Web pages that are then presented to requesting Web browsers. Web browsers have the active role of seeking a site and requesting a page from the Web server; PHP's role is usually passive, responding to those requests. A PHP program operating as a spider has to emulate a Web browser by requesting Web pages from Web servers. Web pages are transported by the Hypertext Transfer Protocol, or HTTP. The server does not send a separate file containing the page; it carries the page's code in the body of the message that responds to the request. The PHP spider has to read in the contents of that message. It does not need to store the page, but scans the incoming text for Web links. The link target can take any form, because HTML marks every link with a tag like <a href="...">, so the program just has to look for that tag and copy out the text that appears where "..." is in this example. Searching through text and extracting specific sections is called “parsing” in programming parlance.
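
The two steps described above, requesting a page and scanning it for links, might look like the sketch below. The function names `fetch_page` and `extract_links` and the user-agent string are invented for this example. `fetch_page` assumes that PHP's `allow_url_fopen` setting is enabled, and the regular expression only handles quoted `href` values; a real HTML parser such as PHP's DOM extension copes better with unusual markup.

```php
<?php
/**
 * Request a page over HTTP and return the body of the response,
 * or null if the request fails. Assumes allow_url_fopen is enabled.
 */
function fetch_page(string $url): ?string
{
    $context = stream_context_create([
        'http' => [
            'method'     => 'GET',
            'timeout'    => 10,                  // seconds
            'user_agent' => 'ExampleSpider/0.1', // made-up identifier
        ],
    ]);

    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}

/**
 * Scan HTML text for <a href="..."> tags and return the href values.
 * A simple pattern match: it expects the value to be in single or double quotes.
 */
function extract_links(string $html): array
{
    preg_match_all('/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is', $html, $matches);

    return $matches[2]; // second capture group holds the text between the quotes
}

// Parsing works on any string, so it can be tried without a network request.
$html = '<p><a href="/about">About</a> '
      . '<A class="x" HREF=\'http://example.com/\'>Example</A></p>';

print_r(extract_links($html));
```

Links such as `/about` are relative to the page they were found on, so a working spider would also need to resolve them against that page's address before requesting them.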

Resources

  • Photo Credit Hemera Technologies/AbleStock.com/Getty Images
