ALGORITHMS FOR LEARNING HYPERTEXTS

My collaborator Johan Bollen and I have developed a more efficient way to create associative networks, by applying a bootstrapping philosophy to hypertext navigation (Bollen & Heylighen, 1996, 1997). The hypertext paradigm provides a very simple and natural interface for representing any type of knowledge network to a user (Heylighen, 1991a). A document in a hypertext may contain the description in natural language or other media of a concept, while the associations of that concept to other concepts are represented by hypertext links within the document's text or graphics. The selection of a link by the user calls up the linked document, thus allowing the user to "browse" or "navigate" through the network. The problem with present hypertexts is the same one as with semantic networks and other knowledge representations: the system of nodes and links is created in an ad hoc way by the system designer, without being justified by either an underlying theory, or an empirical method for testing the effectiveness of the resulting network. This typically results in labyrinthine, "spaghetti-like" meshes of interconnected data, so that the user quickly gets lost in hyperspace.

The method we developed allows an associative hypertext network to "self-organize" into a simpler, more meaningful, and more easily usable network. The term "self-organization" is appropriate to the degree that there is no external programmer or designer deciding which node to link to which other node: better linking patterns emerge spontaneously. The existing links bootstrap new links into existence, which in turn change the existing link patterns. The information used to create new links is not internal to the network, though: it derives from the collective actions of the different users. In that sense one might say that the network "learns" from the way it is used.

The algorithms for such a learning web are very simple. Every potential link is assigned a certain "strength". For a given node a, only the links with the highest strength are actualized, i.e. are visible to the user. Within the node, these links are ordered by strength, so that the user will encounter the strongest link first. There are three separate learning rules for adapting the strengths.

1) Each time an existing link, say a -> b, is chosen by the user, its strength is increased. Thus, the strength of a link becomes a reflection of the frequency with which it is used by hypertext navigators. This rather obvious rule can only consolidate links that are already available within the node. In that sense, it functions as a selector of strong connections. However, it cannot actualize new links, since these are not accessible to the user. Therefore we need complementary rules that generate novelty or variation.

2) A user might follow an indirect connection between two nodes, say a -> b, b -> c. In that case the potential link a -> c increases its strength. This is a weak form of transitivity. It opens up an unlimited realm of new links. Indeed, one or several increases in strength of a -> c may be sufficient to make the potential link actual. The user can now directly select a -> c, and from there perhaps c -> d. This increases the strength of the potential link a -> d, which may in turn become actual, and so on. Eventually, an indefinitely extended path may thus be replaced by a single link a -> z. Of course, this assumes that a sufficient number of users effectively follow that path. Otherwise it will not be able to overcome the competition from paths chosen by other users, which will also increase their strengths. The underlying principle is that the paths that are most popular, i.e. followed most often, will eventually be replaced by direct links, thus minimizing the average number of links a user must follow in order to reach his or her preferred destination.

3) A similar rule can be used to implement a weak form of symmetry. When a user chooses a link a -> b, implying that there exists some association between the nodes a and b, we may assume that this also implies some association between b and a. Therefore, the reverse link b -> a gets a strength increase. This symmetry rule on its own is much more limited than transitivity, since it can only actualize a single new link for each existing link.

However, the collective effect of symmetry and transitivity is much more powerful than that of any single rule. For example, consider two links a₁ -> b, a₂ -> b. The fact that a₁ and a₂ point to the same node seems to indicate that a₁and a₂ have something in common, i.e. are related in some way. However, none of the rules will directly generate a link between a₁ and a₂. Yet, the repeated selection of the link a₂ -> b may actualize the link b -> a₂ by symmetry. The repeated selection of the already existing link a₁ -> b followed by this new link can then actualize the link a₁ -> a₂ through transitivity. Similar scenarios can be conceived for different orientations or different combinations of the links.

A remaining issue is the relative importance of the three above rules. In other words, how large should the increase in strength be for each of the rules? If we choose unity (1) to be the bonus given by the first rule, there are two remaining parameters or degrees of freedom: t is the bonus for transitivity, s for symmetry. Since the direct selection of a link by a user seems a more reliable indication of its usefulness than an indirect selection, we assume t < 1 , s < 1. The actual values will determine the efficiency of the learning process, but it seems that this matter cannot be settled by pure theoretical reasoning.

<4 5 6 789 10 >

Дата добавления: 2016-03-05; просмотров: 779;