Measuring Similarity of Documents for Creating Links between Them

To create static links between semantically related texts, we can calculate the similarity between every pair of documents (or other units of text) and then insert links between the pairs that are most similar. This approach assumes that similarity, as measured by information retrieval techniques, mirrors semantic relatedness, and it has been used to good effect. There are many ways of measuring similarity, and many ways of deciding from the resulting scores whether a link should be created.
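The pairwise procedure described above can be sketched as follows. This is only an illustration under stated assumptions, not the method of any of the cited authors: it assumes TF-IDF term weighting with cosine similarity as the measure, and a fixed score threshold as the linking rule. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `suggest_links`) and the default threshold of 0.2 are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_links(docs, threshold=0.2):
    """Score every pair of documents and keep pairs at or above the threshold.

    Returns (i, j, score) tuples with i < j, most similar pair first.
    The threshold value is illustrative and would need tuning per collection.
    """
    vectors = tfidf_vectors(docs)
    links = []
    for i, j in combinations(range(len(docs)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            links.append((i, j, score))
    return sorted(links, key=lambda link: link[2], reverse=True)
```

Called on the three texts "the cat sat on the mat", "a cat sat on a mat" and "stock markets fell sharply today", `suggest_links` proposes a single link, between the first two, because the third shares no terms with either. Comparing all pairs costs time quadratic in the number of documents, which is one reason to compute such links once, ahead of time, rather than on demand.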

Many authors have described work on approaches of this kind, which were particularly important at a time when limited processing speeds made precomputation attractive. Furuta et al. describe a comparative study of the quality of the links produced [Furuta 1989]. Salton et al. describe building a set of cross-references for an encyclopedia [Salton 1991], and Lelu created links using both similarity and spreading activation [Lelu 1991]. Green introduced the use of lexical chains, which exploit the semantic relatedness of individual words, to determine when links should be created [Green 1998]. Allan demonstrated that there is value in distinguishing between the various sub-types of semantic link, and showed how links of each sub-type can be identified and assigned [Allan 1997].