Google Indexer Function
Google is constantly scanning known and linked Web pages for updated content and soliciting information about new websites from domain owners. To make this information useful for online queries, captured websites must be indexed and organized. The Google Indexer is the component responsible for performing those organizational functions.
-
Definition
-
The Google Indexer is a component of the Google Search Engine system that forms the database for Google search functions. The Indexer stores the full text of every Web document located by Google's Web crawling function. Every word is stored in a database that records which Web document the word comes from, where in the document the word appears and what other words appear in close proximity to it.
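The kind of database described above is commonly called a positional inverted index. The following is only a minimal sketch of that idea, not Google's actual implementation; the function names, the sample pages and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import defaultdict


def build_positional_index(documents):
    """Map each word to {doc_id: [positions]} (hypothetical helper)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        # Assumption: words are separated by whitespace.
        for position, word in enumerate(text.split()):
            index[word][doc_id].append(position)
    return index


def within_distance(index, doc_id, first, second, max_gap):
    """Return True if the two words occur within max_gap positions of each other."""
    first_positions = index.get(first, {}).get(doc_id, [])
    second_positions = index.get(second, {}).get(doc_id, [])
    return any(
        abs(a - b) <= max_gap
        for a in first_positions
        for b in second_positions
    )


# Made-up sample documents standing in for crawled pages.
documents = {
    "page1": "search engines index web pages",
    "page2": "web crawlers fetch pages for the index",
}
index = build_positional_index(documents)
```

Looking up `index["index"]` shows which documents contain the word and at which positions, and `within_distance` uses those positions to answer the "close proximity" question.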
System
-
The Google Indexer receives all of its information from the Googlebot Web crawler. The Googlebot locates Web pages both through an online request form and by following links that appear on other Web pages. Googlebot fetches the content of each Web page and sends the full text of the page to the Indexer for processing. The Indexer then sorts and stores the contents of the pages that it receives in its database. When a user types a query into the Google search engine, the query accesses the Indexer's database and returns search results based on a complex, proprietary algorithm.
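The flow from fetched page to query result can be sketched roughly as follows. This is a toy illustration under stated assumptions: the `fetched` dictionary stands in for pages delivered by a crawler, `index_page` and `search` are hypothetical names, and the results are simply sorted alphabetically because the real ranking algorithm is proprietary.

```python
def index_page(page_index, url, text):
    """Record that each distinct word of the page appears at this URL."""
    for word in set(text.split()):
        page_index.setdefault(word, set()).add(url)


def search(page_index, query):
    """Return URLs containing every query word, in alphabetical order."""
    words = query.split()
    if not words:
        return []
    # Copy the first posting set so the index itself is never modified.
    matches = set(page_index.get(words[0], set()))
    for word in words[1:]:
        matches &= page_index.get(word, set())
    return sorted(matches)


# Made-up pages standing in for what a crawler would deliver.
fetched = {
    "example.com/a": "google indexes web pages",
    "example.com/b": "web crawlers follow links",
    "example.com/c": "links connect web pages",
}

page_index = {}
for url, text in fetched.items():
    index_page(page_index, url, text)

results = search(page_index, "web pages")
```

Here `results` holds only the pages that contain both "web" and "pages"; a production system would order them by relevance rather than by URL.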
-
Stop Words
-
One way that Google optimizes search times and minimizes the size of its massive Indexer database is by identifying terms that provide little information about the content of a page. These terms, which Google calls "stop words," are typically too frequently used and too general to be useful for the query process. Examples of stop words include "and," "if" and "the." Stop words are the only words that the Google Indexer omits from its database.
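As a rough illustration of stop-word filtering, the sketch below drops the three example words named above before anything is indexed. The set is deliberately tiny and the function name is an assumption; Google's actual stop-word list is not given here.

```python
# Only the examples mentioned in the text; a real list would be longer.
STOP_WORDS = {"and", "if", "the"}


def remove_stop_words(words):
    """Keep only the words that are not in the stop-word set."""
    return [word for word in words if word not in STOP_WORDS]


kept = remove_stop_words("the crawler and the indexer".split())
```

After filtering, `kept` contains only "crawler" and "indexer", so the index stores two entries instead of five.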
Performance Functions
-
The Indexer performs certain other functions to optimize database performance. Google determined that few punctuation patterns are useful for analyzing search terms; thus, most punctuation marks are omitted from the Indexer database. The database also omits repeated spaces and converts all letters to lowercase, a process that substantially reduces the size of the Indexer database at little cost to the quality of search results.
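These normalization steps can be approximated in a few lines. The sketch below is a simplification: it strips every ASCII punctuation mark, whereas the text says only most punctuation is omitted, and the `normalize` name is an assumption.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Strip punctuation, lowercase the text, and collapse repeated whitespace."""
    without_punctuation = text.translate(_PUNCTUATION_TABLE)
    return " ".join(without_punctuation.lower().split())


normalized = normalize("Hello,   World!  The Indexer.")
```

Because "Hello," and "hello" normalize to the same token, the index needs one entry rather than several variants of the same word. Note that deleting punctuation outright also joins hyphenated words, so "well-known" becomes "wellknown" in this simplified version.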