Navigation in Hypertext

Identifying Hierarchies

To view a hyperdocument somewhat like a book, a hierarchical structure (chapters, sections, subsections, and so on) must be found. This problem has two parts: first the root must be identified, and then hierarchical links must be distinguished from cross-reference links.

The fundamental property of a root is that every node in the hyperdocument must be reachable from it. The distance from the root to any other node should also not be too large: if it is, the user has to follow a long and possibly tedious path before reaching the desired information. Finally, the root should have a "reasonable" number of children. To satisfy the first two requirements, the relative out centrality (ROC) is measured. The root is selected from the nodes that have a high ROC and not too many children.

All structural properties are derived from the converted distance matrix. This matrix contains the distance (using forward links) between each pair of nodes in the hyperdocument. The sum of all distances in this matrix is called the Converted Distance (CD) of the hyperdocument. The sum of the distances from a node to all other nodes is the Converted Out Distance (COD) of that node, while the sum of the distances from all other nodes to a given node is the Converted In Distance (CID) of that node. Dividing the CD by the COD of a node gives its relative out centrality (ROC); dividing the CD by its CID gives its relative in centrality (RIC).
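These definitions can be sketched in a few lines of Python. The sketch below is an illustration, not the original implementation: it assumes the hyperdocument is given as a dictionary mapping every node to the list of nodes it links to, and that unreachable pairs are assigned the conversion constant K. The function names and the example graph are hypothetical.

```python
from collections import deque


def converted_distance_matrix(graph, k):
    """Return {source: {target: distance}} using forward links only.

    Assumption of this sketch: a pair with no forward path gets distance k.
    """
    nodes = list(graph)
    matrix = {}
    for source in nodes:
        # Breadth-first search yields shortest forward-link distances.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbour in graph[current]:
                if neighbour not in dist:
                    dist[neighbour] = dist[current] + 1
                    queue.append(neighbour)
        matrix[source] = {target: dist.get(target, k) for target in nodes}
    return matrix


def centralities(graph, k):
    """Compute CD, COD, CID, ROC and RIC (needs at least two nodes)."""
    matrix = converted_distance_matrix(graph, k)
    nodes = list(graph)
    cod = {n: sum(matrix[n].values()) for n in nodes}            # row sums
    cid = {n: sum(matrix[s][n] for s in nodes) for n in nodes}   # column sums
    cd = sum(cod.values())                                       # sum of all entries
    roc = {n: cd / cod[n] for n in nodes}
    ric = {n: cd / cid[n] for n in nodes}
    return cd, cod, cid, roc, ric


# Hypothetical four-node hyperdocument; k is set to the number of nodes.
example = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
cd, cod, cid, roc, ric = centralities(example, k=4)
print(cd, cod["a"], roc["a"], ric["d"])  # 34 4 8.5 8.5
```

In this small example node a reaches everything over short paths and therefore gets the highest ROC, while node d, which every other node can reach, gets the highest RIC.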

The conversion constant K of the converted distance matrix determines how heavily unreachable nodes weigh in these measures. Since every node should be reachable from the root, it is advisable to keep this constant large, so that a node which cannot reach the whole hyperdocument ends up with a low ROC.
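The role of K can be made explicit with a short derivation. It rests on an assumption that the text above does not spell out: that the converted distance matrix C equals the ordinary distance d wherever a forward path exists and equals K wherever it does not.

```latex
C_{ij} =
\begin{cases}
  d_{ij} & \text{if node } j \text{ is reachable from node } i,\\
  K      & \text{otherwise,}
\end{cases}
\qquad
\mathrm{COD}_i = \sum_{j} C_{ij},
\qquad
\mathrm{ROC}_i = \frac{\mathrm{CD}}{\mathrm{COD}_i}.

% If node i cannot reach m nodes, while node r reaches every node:
\mathrm{COD}_i \ge mK,
\qquad
\frac{\mathrm{ROC}_r}{\mathrm{ROC}_i}
  = \frac{\mathrm{COD}_i}{\mathrm{COD}_r}
  \ge \frac{mK}{\mathrm{COD}_r}.
```

Because the COD of a node that reaches everything does not depend on K, the last ratio grows with K. Under this assumption, a larger K widens the gap between nodes that reach the whole hyperdocument and nodes that do not.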

Nodes with a large ROC are good candidates to be root nodes. However, nodes with a very large number of outgoing links also get a large ROC but are not usable as a root, since they give no clue as to where to start reading. They are called index nodes. Like the index of a book, an index node is not a good starting point for the reader. Nodes with a large RIC are called reference nodes. The bibliography of a book is a typical example of a reference node. When studying the structure of documents it is advisable to leave out index and reference nodes, because they disturb the structure of the hyperdocument without posing navigational problems. (Nobody gets confused because a book or hyperdocument contains an index and a bibliography.)
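Leaving out index and reference nodes could look roughly like the sketch below. The text gives no numeric criteria for "very large" or "large", so the two thresholds (`max_out_links` and `ric_cutoff`) are hypothetical parameters that the caller must choose. The function name, the example graph and the RIC values are made up purely for illustration.

```python
def prune_special_nodes(graph, ric, max_out_links, ric_cutoff):
    """Return (index_nodes, reference_nodes, pruned_graph).

    graph maps each node to the list of nodes it links to;
    ric maps each node to its relative in centrality.
    Both thresholds are assumptions of this sketch, not values from the text.
    """
    # Index nodes: more outgoing links than the caller considers reasonable.
    index_nodes = {n for n, targets in graph.items() if len(targets) > max_out_links}
    # Reference nodes: RIC above the caller's cutoff.
    reference_nodes = {n for n in graph if ric[n] > ric_cutoff}
    removed = index_nodes | reference_nodes
    # Drop the removed nodes and every link that points to them.
    pruned = {
        n: [t for t in targets if t not in removed]
        for n, targets in graph.items()
        if n not in removed
    }
    return index_nodes, reference_nodes, pruned


# Illustrative input only: the RIC numbers are not computed from a real document.
doc = {
    "intro": ["ch1", "ch2", "refs"],
    "ch1": ["refs"],
    "ch2": ["refs"],
    "index": ["intro", "ch1", "ch2", "refs"],
    "refs": [],
}
doc_ric = {"intro": 1.0, "ch1": 1.2, "ch2": 1.2, "index": 0.8, "refs": 3.0}

index_nodes, reference_nodes, pruned = prune_special_nodes(
    doc, doc_ric, max_out_links=3, ric_cutoff=2.0
)
print(index_nodes, reference_nodes, pruned)
```

The structural measures would then be computed on the pruned graph only, so that the index and the bibliography no longer distort them.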

Selecting a root from the nodes with a high ROC is left to the author. The (textual) content of the nodes is more important for this choice than the structural differences.

The hierarchical structure can be easily displayed. The figure below shows two graphs representing the same hyperdocument. The graph on the left looks pretty much random, while the one on the right shows the hierarchical structure and the cross-references.

In this graph it is left unclear whether the two nodes f should be united or not. One of the links to f could be considered a hierarchical link and the other one a cross-reference link.
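One simple way to separate the two kinds of links, once a root has been chosen, is a breadth-first traversal: the link through which a node is first reached counts as hierarchical, and every later link to an already visited node counts as a cross-reference. This is only an assumed strategy for illustration; the text does not say which method was used. The graph below is hypothetical, with a node f that has two incoming links.

```python
from collections import deque


def classify_links(graph, root):
    """Split links into hierarchical (BFS tree) and cross-reference links.

    graph maps each node to the list of nodes it links to.
    Nodes that are not reachable from root are ignored.
    """
    hierarchical, cross_reference = [], []
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for target in graph[node]:
            if target not in seen:
                # First link that reaches the target: treat it as hierarchical.
                seen.add(target)
                hierarchical.append((node, target))
                queue.append(target)
            else:
                # Target already placed in the hierarchy: cross-reference.
                cross_reference.append((node, target))
    return hierarchical, cross_reference


links = {"a": ["b", "c"], "b": ["d", "f"], "c": ["e", "f"], "d": [], "e": [], "f": []}
tree_links, cross_links = classify_links(links, "a")
print(tree_links)   # [('a', 'b'), ('a', 'c'), ('b', 'd'), ('b', 'f'), ('c', 'e')]
print(cross_links)  # [('c', 'f')]
```

With this choice, f becomes a child of b and the link from c to f becomes a cross-reference. Which of the two links is treated as hierarchical depends only on the traversal order, which mirrors the ambiguity described above.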

Source: De.

Last modified: 6 Nov 2002 by Kathy Nguyen Dang