Information Retrieval in Hypertext
Navigation, or browsing, is effective for small hypertext systems. For large hypertext databases, information retrieval (IR) through queries becomes crucial, although some well-structured hypertext systems, such as the Victorian Web, can be navigated smoothly even without the help of information retrieval. Information retrieval systems serve to find the data items that are relevant to a user's query. The World Wide Web, the largest hypertext system, has become very popular as a means to access information from other sites with little effort. A collection of hypertext documents of that size is almost impossible to explore by browsing alone, so information retrieval plays a critical role in hypertext systems. Conversely, I argue here that hyperlinks can greatly reinforce the use of information retrieval systems.
Conklin had suggested that search and query mechanisms
can present information at a manageable level of complexity and detail
[Conklin, 1987]. Halasz's view was that
"navigational access itself is not sufficient. Effective access
to information stored in a hypermedia network requires query-based access
to complement navigation ... search and query needs to be elevated
to a primary access mechanism on par with navigation." [Halasz,
Finding information is a three-step process:
Traditional information retrieval research and development
has concentrated on the second and third steps. The distributed nature
of the Internet, as well as the size of large hypertexts on CD-ROM, requires
shifting the focus towards the first step.
The result of a search may be either a pointer to the first match found or a scored list of matches. Information retrieval is inherently uncertain: a very general query (such as a single keyword) may yield too many answers, while a very specific query may yield no answers at all.
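The two forms of search result, and the uncertainty of general versus specific queries, can be illustrated with a toy keyword matcher. This is a minimal sketch, assuming naive whitespace tokenisation and a made-up score (the fraction of query terms a node contains); `NODES`, `first_match`, and `scored_matches` are hypothetical names, not the API of any particular system.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy collection: node id -> text of the hypertext node.
NODES: Dict[str, str] = {
    "n1": "hypertext navigation and browsing",
    "n2": "information retrieval with queries in hypertext",
    "n3": "query languages for graph structured data",
}

def terms(text: str) -> List[str]:
    """Lower-case whitespace tokenisation (a deliberately naive assumption)."""
    return text.lower().split()

def first_match(query: str, nodes: Dict[str, str]) -> Optional[str]:
    """Return a pointer (node id) to the first node containing every query term."""
    q = set(terms(query))
    for node_id, text in nodes.items():
        if q <= set(terms(text)):
            return node_id
    return None

def scored_matches(query: str, nodes: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return every node sharing at least one query term, scored by the
    fraction of query terms it contains, best first."""
    q = set(terms(query))
    results = []
    for node_id, text in nodes.items():
        overlap = len(q & set(terms(text)))
        if overlap:
            results.append((node_id, overlap / len(q)))
    # Highest score first; ties broken by node id for a stable order.
    return sorted(results, key=lambda r: (-r[1], r[0]))
```

A one-word query such as `"hypertext"` matches two nodes equally well, while the more specific `"query languages hypertext"` has no node containing all three terms, so `first_match` returns `None` even though the scored list still ranks the partial matches.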
Structural querying is what distinguishes information retrieval in hypertext from that in ordinary text databases. Beeri and Kornatzky [BK90] have suggested a logical query language that allows a combination of structural and content-based queries. The logic combines propositional calculus (without predicates or variables) with quantifiers such as many, most, at least m, exactly n, etc. Attribute/value pairs are used to denote content properties. Another attempt to develop structural querying facilities is the GraphLog language by Consens and Mendelzon [CM89]. GraphLog is a visual language based on pattern matching in the graph structure of the hyperdocument.
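To make the idea of combining quantified structural conditions with attribute/value content conditions concrete, here is a small Python sketch. It is not the [BK90] language itself, only an illustration under stated assumptions: the data (`ATTRS`, `LINKS`) and every function name are hypothetical, conjunction stands in for the propositional connectives, and "most" is assumed to mean "more than half of the linked nodes".

```python
from typing import Callable, Dict, List

# Hypothetical hyperdocument (illustrative data): each node carries
# attribute/value pairs, and LINKS lists the nodes each node links to.
ATTRS: Dict[str, Dict[str, str]] = {
    "a": {"topic": "ir", "type": "overview"},
    "b": {"topic": "ir", "type": "paper"},
    "c": {"topic": "ir", "type": "paper"},
    "d": {"topic": "hci", "type": "paper"},
}
LINKS: Dict[str, List[str]] = {"a": ["b", "c", "d"], "b": ["c"], "c": [], "d": ["a"]}

Cond = Callable[[str], bool]             # a condition on a single node
Quantifier = Callable[[int, int], bool]  # (satisfying, candidates) -> bool

def at_least(m: int) -> Quantifier:
    return lambda k, n: k >= m

def exactly(m: int) -> Quantifier:
    return lambda k, n: k == m

def most() -> Quantifier:
    # Assumption: "most" is read as "more than half of the linked nodes".
    return lambda k, n: n > 0 and 2 * k > n

def has(attr: str, value: str) -> Cond:
    """Content condition: the node carries the given attribute/value pair."""
    return lambda node: ATTRS[node].get(attr) == value

def linked(q: Quantifier, cond: Cond) -> Cond:
    """Structural condition: q of the nodes linked from a node satisfy cond."""
    def check(node: str) -> bool:
        targets = LINKS.get(node, [])
        return q(sum(1 for t in targets if cond(t)), len(targets))
    return check

def both(p: Cond, r: Cond) -> Cond:
    """Propositional conjunction of two node conditions."""
    return lambda node: p(node) and r(node)

def select(cond: Cond) -> List[str]:
    """Return the ids of all nodes satisfying the query condition."""
    return [node for node in ATTRS if cond(node)]
```

For example, `select(both(has("topic", "ir"), linked(at_least(2), has("type", "paper"))))` asks for IR nodes that link to at least two papers, and returns `["a"]`. Building queries from small composable conditions keeps the structural and content parts independent, which mirrors the separation the text describes.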
Information retrieval in distributed hypertexts is inherently more complicated. Global queries are no longer possible, so all search activities have to be carried out by automated browsing. The so-called fish-search is an example of a search tool using this technique. When links carry information, such as attribute/value pairs, that helps determine whether or not to follow them, this information can be used to reduce the search space significantly, as explained in [FS90].
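Automated browsing with link-based pruning can be sketched as follows. This is loosely in the spirit of fish-search (a relevant page renews the search budget for its children, an irrelevant one uses it up), but it is a simplified breadth-first version, not the published fish-search algorithm. The in-memory `PAGES` and `OUT` tables stand in for documents that would really be fetched over the network; they, the `topic` link attribute, and `automated_browse` are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

# Hypothetical distributed hypertext, simulated in memory. In a real system
# each page would be fetched over the network; links carry attribute/value pairs.
Link = Tuple[str, Dict[str, str]]
PAGES: Dict[str, str] = {
    "home": "welcome page",
    "ir": "information retrieval in hypertext",
    "fish": "fish search retrieval",
    "art": "art gallery",
    "paint": "oil painting techniques",
}
OUT: Dict[str, List[Link]] = {
    "home": [("ir", {"topic": "ir"}), ("art", {"topic": "art"})],
    "ir": [("fish", {"topic": "ir"})],
    "art": [("paint", {"topic": "art"})],
    "fish": [],
    "paint": [],
}

def automated_browse(
    start: str,
    is_relevant: Callable[[str], bool],
    follow_link: Optional[Callable[[Dict[str, str]], bool]] = None,
    depth: int = 1,
    max_visits: int = 100,
) -> Tuple[List[str], List[str]]:
    """Browse from `start`; return (relevant pages, pages visited in order).

    `depth` limits how many consecutive irrelevant pages may still have their
    links followed: a relevant page restores the full budget, an irrelevant
    page spends one unit.
    """
    found: List[str] = []
    visited: List[str] = []
    seen = {start}
    queue: Deque[Tuple[str, int]] = deque([(start, depth)])
    while queue and len(visited) < max_visits:
        page, budget = queue.popleft()
        visited.append(page)
        relevant = is_relevant(PAGES.get(page, ""))
        if relevant:
            found.append(page)
        child_budget = depth if relevant else budget - 1
        if child_budget < 0:
            continue  # budget exhausted: do not expand this page's links
        for target, attrs in OUT.get(page, []):
            if target in seen:
                continue
            # Prune using the information carried by the link itself.
            if follow_link is not None and not follow_link(attrs):
                continue
            seen.add(target)
            queue.append((target, child_budget))
    return found, visited
```

Searching for pages mentioning "retrieval" from `"home"` finds `ir` and `fish` either way, but passing `follow_link=lambda a: a.get("topic") == "ir"` skips the `art` page entirely, so fewer pages are visited: a small-scale version of the search-space reduction that link information makes possible.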