Search Engine

Search engines provide access to a fairly large portion of the publicly available pages on the Web. For users stranded in the middle of this global electronic library without a card catalog or any recognizable structure, search engines are the best means yet devised for searching the web. Well-known examples include Google, Yahoo! and AltaVista.

There are two types of search engines:

  1. Individual.  Individual search engines compile their own searchable databases of the web.
  2. Meta.  Meta-searchers do not compile databases. Instead, they search the databases of multiple individual engines simultaneously.
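The meta-search idea can be sketched in a few lines: fan one query out to several individual engines and merge the results. The engine names and the fetch_results() helper below are hypothetical placeholders, not real APIs.

```python
# A minimal sketch of a meta-searcher. It holds no database of its own;
# it forwards the query to each individual engine and merges the answers.

def fetch_results(engine, query):
    # Placeholder: a real meta-searcher would send the query to each
    # engine over HTTP and parse the returned result page.
    fake_engines = {
        "EngineA": {"web mining": ["a.com", "b.com"]},
        "EngineB": {"web mining": ["b.com", "c.com"]},
    }
    return fake_engines.get(engine, {}).get(query, [])

def meta_search(query, engines):
    merged = []
    for engine in engines:
        for url in fetch_results(engine, query):
            if url not in merged:          # de-duplicate across engines
                merged.append(url)
    return merged

print(meta_search("web mining", ["EngineA", "EngineB"]))
# → ['a.com', 'b.com', 'c.com']
```

Note that the meta-searcher's de-duplication step is what distinguishes its merged list from simply concatenating each engine's results.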

Search engines compile their databases by employing "spiders" or "robots" ("bots") to crawl through web space from link to link, identifying and perusing pages. Sites with no links to other pages may be missed by spiders altogether. Once the spiders get to a web site, they typically index most of the words on the publicly available pages at the site. Web page owners may submit their URLs to search engines for "crawling" and eventual inclusion in their databases.
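The crawling process described above can be illustrated with a toy link graph: the spider starts from a known page, indexes its words, and follows its outgoing links. The pages and links below are invented; note how a page with no incoming links is never found.

```python
# A toy spider: breadth-first crawl from link to link, indexing
# every word on each page it reaches.

link_graph = {
    "home.html":     ["about.html", "products.html"],
    "about.html":    ["home.html"],
    "products.html": [],
    "orphan.html":   [],   # no page links here, so the spider never finds it
}
page_text = {
    "home.html":     "welcome to our site",
    "about.html":    "about our company",
    "products.html": "our product catalog",
    "orphan.html":   "hidden page",
}

def crawl(start):
    index, queue, seen = {}, [start], set()
    while queue:
        page = queue.pop(0)
        if page in seen:
            continue
        seen.add(page)
        for word in page_text[page].split():
            index.setdefault(word, set()).add(page)   # index every word
        queue.extend(link_graph[page])                # follow outgoing links
    return index

index = crawl("home.html")
print("orphan.html" in {p for pages in index.values() for p in pages})  # → False
```

Submitting orphan.html's URL directly to the engine would be the only way for it to be crawled, which is exactly why owner submission exists.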

Whenever you search the web using a search engine, you're asking the engine to scan its index of sites and match your keywords and phrases with those in the texts of documents within the engine's database.  
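This matching step can be sketched as a lookup against an inverted index: the engine looks each keyword up and intersects the resulting page sets. The index contents below are invented for illustration.

```python
# A minimal sketch of keyword matching: each word maps to the set of
# pages containing it, and a multi-word query intersects those sets.

index = {
    "search":  {"p1.html", "p2.html", "p3.html"},
    "engine":  {"p1.html", "p3.html"},
    "crawler": {"p2.html"},
}

def match(query):
    words = query.lower().split()
    pages = index.get(words[0], set()).copy()
    for w in words[1:]:
        pages &= index.get(w, set())   # keep pages containing every keyword
    return sorted(pages)

print(match("search engine"))  # → ['p1.html', 'p3.html']
```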

On the downside, the sheer number of words indexed by search engines increases the likelihood that they will return hundreds of thousands of irrelevant responses to simple search requests, including lengthy documents in which your keyword appears only once.

Search engines use ranking software to search their indexes for matching keywords and phrases, presenting their findings to you in some kind of relevance ranking. Although these programs may be similar, no two search engines are exactly the same in terms of size, speed and content; no two search engines use exactly the same ranking schemes, and not every search engine offers you exactly the same search options. Therefore, the same search will return different results on every engine you use. The difference may not be large, but it can be significant. Recent estimates put search engine overlap at approximately 60 percent and unique content at around 40 percent.

In ranking web pages, search engines follow a certain set of rules. These may vary from one engine to another. Their goal, of course, is to return the most relevant pages at the top of their lists. To do this, they look for the location and frequency of keywords and phrases in the web page document and, sometimes, in the HTML META tags. They check out the title field and scan the headers and text near the top of the document. Some of them assess popularity by the number of links pointing to a site; the more links, the greater the popularity, i.e., value of the page. Search engines are best at finding unique keywords, phrases, quotes, and information buried in the full text of web pages. Because they index word by word, search engines are also useful in retrieving large numbers of documents. If you want a wide range of responses to specific queries, use a search engine.
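The ranking heuristics described above can be sketched as a simple scoring function combining keyword frequency, keyword location (earlier occurrences count for more), and link popularity. The weights and documents below are invented; real engines use proprietary and far more elaborate schemes.

```python
# A hedged sketch of relevance ranking: frequency + location + in-links.

docs = {
    "p1": {"text": "search engines rank search results", "inlinks": 10},
    "p2": {"text": "a page that mentions search once at the end", "inlinks": 2},
}

def score(doc, keyword):
    words = doc["text"].split()
    freq = words.count(keyword)                    # keyword frequency
    pos = words.index(keyword) if keyword in words else len(words)
    location_bonus = 1.0 / (1 + pos)               # earlier occurrence scores higher
    popularity = 0.1 * doc["inlinks"]              # links pointing in add value
    return freq + location_bonus + popularity

ranked = sorted(docs, key=lambda d: score(docs[d], "search"), reverse=True)
print(ranked)  # → ['p1', 'p2']
```

Here p1 outranks p2 on all three signals: the keyword appears twice, appears first in the text, and the page has more in-links.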


Created by Lan Man

Last Modified: Nov 11, 2002