Cognitive Science 29 (2005) 41–78
Copyright © 2005 Cognitive Science Society, Inc. All rights reserved.

The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth

Mark Steyvers (a), Joshua B. Tenenbaum (b)
(a) Department of Cognitive Sciences, University of California, Irvine
(b) Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology

Received 15 July 2003; received in revised form 19 June 2004; accepted 22 July 2004

Requests for reprints should be sent to Mark Steyvers, Department of Cognitive Sciences, 3151 Social Sciences Plaza, University of California–Irvine, Irvine, CA 92697–5100; E-mail: msteyver@uci.edu or Joshua B. Tenenbaum, Department of Brain and Cognitive Sciences, 77 Massachusetts Avenue, Cambridge, MA 02139; E-mail: jbt@mit.edu

Abstract

We present statistical analyses of the large-scale structure of 3 types of semantic networks: word associations, WordNet, and Roget's Thesaurus. We show that they have a small-world structure, characterized by sparse connectivity, short average path lengths between words, and strong local clustering. In addition, the distributions of the number of connections follow power laws that indicate a scale-free pattern of connectivity, with most nodes having relatively few connections joined together through a small number of hubs with many connections. These regularities have also been found in certain other complex natural networks, such as the World Wide Web, but they are not consistent with many conventional models of semantic organization, based on inheritance hierarchies, arbitrarily structured networks, or high-dimensional vector spaces. We propose that these structures reflect the mechanisms by which semantic networks grow. We describe a simple model for semantic growth, in which each new word or concept is connected to an existing network by differentiating the connectivity pattern of an existing node. This model generates appropriate small-world statistics and power-law connectivity distributions, and it also suggests one possible mechanistic basis for the effects of learning history variables (age of acquisition, usage frequency) on behavioral performance in semantic processing tasks.

Keywords: Semantic networks; Small worlds; Semantic representation; Growing network models

1. Introduction

Network structures provide intuitive and useful representations for modeling semantic knowledge and inference. Within the paradigm of semantic network models, we can ask at least three distinct kinds of questions. The first type of question concerns structure and knowledge: To what extent can the organization of human semantic knowledge be explained in terms of general structural principles that characterize the connectivity of semantic networks? The second type of question concerns process and performance: To what extent can human performance in semantic processing tasks be explained in terms of general processes operating on semantic networks? A third type of question concerns the interactions of structure and process: To what extent do the processes of semantic retrieval and search exploit the general structural features of semantic networks, and to what extent do those structural features reflect general processes of semantic acquisition or development?

The earliest work on semantic networks attempted to confront these questions in an integrated fashion. Collins and Quillian (1969) suggested that concepts are represented as nodes in a tree-structured hierarchy, with connections determined by class-inclusion relations (Fig. 1). Additional nodes for characteristic attributes or predicates are linked to the most general level of the hierarchy to which they apply. A tree-structured hierarchy provides a particularly economical system for representing default knowledge about categories, but it places strong constraints on the possible extensions of predicates—essentially, on the kinds of knowledge that are possible (Keil, 1979; Sommers, 1971). Collins and Quillian proposed algorithms for efficiently searching these inheritance hierarchies to retrieve or verify facts such as "Robins have wings," and they showed that reaction times of human subjects often seemed to match the qualitative predictions of this model. However, notwithstanding the elegance of this picture, it has severe limitations as a general model of semantic structure.
Inheritance hierarchies are clearly appropriate only for certain taxonomically organized concepts, such as classes of animals or other natural kinds. Even in those ideal cases, a strict inheritance structure seems not to apply except for the most typical members of the hierarchy (Carey, 1985; Collins & Quillian, 1969; Rips, Shoben, & Smith, 1973; Sloman, 1998).

Subsequent work on semantic networks put aside the search for general structural principles of knowledge organization and instead focused on elucidating the mechanisms of semantic processing in arbitrarily structured networks. The network models of Collins and Loftus (1975), for instance, are not characterized by any kind of large-scale structure such as a treelike hierarchy. In terms of their large-scale patterns of connectivity, these models are essentially unstructured, with each word or concept corresponding to a node and links between any two nodes that are directly associated in some way (Fig. 1B). Quantitative models of generic associative networks, often equipped with some kind of spreading-activation process, have been used to predict performance in a range of experimental memory retrieval tasks and to explain various priming and interference phenomena (Anderson, 2000; Collins & Loftus, 1975; Deese, 1965; Nelson, McKinney, Gee, & Janczura, 1998).

Fig. 1. Proposed large-scale structures for semantic networks: (a) a tree-structured hierarchy (e.g., Collins & Quillian, 1969); (b) an arbitrary, unstructured graph (e.g., Collins & Loftus, 1975); (c) a scale-free, small-world graph.

As a result of research in this tradition, there is now a fair consensus about the general character of at least some of the processes involved in the formation and search of semantic memory (Anderson, 2000). By contrast, there is relatively less agreement about general principles governing the large-scale structure of semantic memory, or how that structure interacts with processes of memory search or knowledge acquisition. Typical textbook pictures of semantic memory still depict essentially arbitrary networks, such as Fig. 1B, with no distinctive large-scale structures. The implications for semantic network theories of meaning are not good. Under the semantic net view, meaning is inseparable from structure: The meaning of a concept is, at least in part, constituted by its connections to other concepts. Thus, without any general structural principles, the semantic net paradigm offers little or no general insight into the nature of semantics.

In this article, we argue that there are in fact compelling general principles governing the structure of network representations for natural language semantics and that these structural principles have potentially significant implications for the processes of semantic growth and memory search. We stress from the outset that these principles are not meant to provide a genuine theory of semantics, nor do we believe that networks of word–word relations necessarily reflect all of the most important or deepest aspects of semantic structure.
We do expect that semantic networks will play some role in any mature account of word meaning. Our goal here is to study some of the general structural properties of semantic networks that may ultimately form part of the groundwork for any semantic theory.

The principles we propose are not based on any fixed structural motif such as the tree-structured hierarchy of Collins and Quillian (1969). Rather, they are based on statistical regularities that we have uncovered via graph-theoretic analyses of previously described semantic networks. We look at the distributions of several statistics calculated over nodes, pairs of nodes, or triples of nodes in a semantic network: the number of connections per word, the length of the shortest path between two words, and the percentage of a node's neighbors that are themselves neighbors. We show that semantic networks, like many other natural networks (Watts & Strogatz, 1998), possess a small-world structure characterized by the combination of highly clustered neighborhoods and a short average path length. Moreover, this small-world structure seems to arise from a scale-free organization, also found in many other systems (Barabási & Albert, 1999; Strogatz, 2001), in which a relatively small number of well-connected nodes serve as hubs, and the distribution of node connectivities follows a power function. These statistical principles of semantic network structure are quite general in scope. They appear to hold for semantic network representations constructed in very different ways, whether from the word associations of naive subjects (Nelson, McEvoy, & Schreiber, 1999) or the considered analyses of linguists (Miller, 1995; Roget, 1911).
At the same time, these regularities do not hold for many popular models of semantic structure, including both hierarchical and arbitrarily connected (unstructured) networks (Figures 1A and 1B), as well as high-dimensional vector space models such as Latent Semantic Analysis (LSA; Landauer & Dumais, 1997). These principles may thus suggest directions for new modeling approaches, or for extending or revising existing models. Ultimately, they may help to determine which classes of models most faithfully capture the structure of natural language semantics.

As in studies of scale-free or small-world structures in other physical, biological, or social networks (Albert, Jeong, & Barabási, 2000; Barabási & Albert, 1999; Watts & Strogatz, 1998), we will emphasize the implications of these distinctive structures for some of the crucial processes that operate on semantic networks. We suggest that these structures may be consequences of the developmental mechanisms by which connections between words or concepts are formed—either in language evolution, language acquisition, or both. In particular, we show how simple models of network growth can produce close quantitative fits to the statistics of semantic networks derived from word association or linguistic databases, based only on plausible abstract principles with no free numerical parameters. In our model, a network acquires new concepts over time and connects each new concept to a subset of the concepts within an existing neighborhood, with the probability of choosing a particular neighborhood proportional to its size. This growth process can be viewed as a kind of semantic differentiation, in which new concepts correspond to more specific variations on existing concepts, and highly complex concepts (those with many connections) are more likely to be differentiated than simpler ones. It naturally yields scale-free small-world networks, such as the one shown in Fig. 1C (see also Fig. 6). Our models also make predictions about the time course of semantic acquisition, because the order in which meanings are acquired is crucial in determining their connectivity. Concepts that enter the network early are expected to show higher connectivity.
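The growth process just described can be sketched in a few lines of code. This is a simplified illustration, not the authors' exact algorithm: the neighborhood-selection rule, the seed network, and the parameters (500 nodes, m = 3 attachments per new node) are assumptions made here for demonstration. The sketch shows the qualitative prediction that early-entering nodes accumulate more connections.

```python
import random

def grow_network(n_nodes=500, m=3, seed=1):
    """Toy semantic-growth sketch: each new node picks an existing node
    to differentiate (chosen with probability proportional to its degree,
    i.e., to its neighborhood size) and attaches to m nodes inside that
    node's neighborhood."""
    rng = random.Random(seed)
    # start from a small fully connected seed network of m + 1 nodes
    adj = {i: {j for j in range(m + 1) if j != i} for i in range(m + 1)}
    for new in range(m + 1, n_nodes):
        nodes = list(adj)
        weights = [len(adj[v]) for v in nodes]  # neighborhood sizes
        host = rng.choices(nodes, weights=weights)[0]
        neighborhood = list(adj[host] | {host})
        targets = rng.sample(neighborhood, min(m, len(neighborhood)))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
    return adj

adj = grow_network()
early = sum(len(adj[i]) for i in range(50)) / 50       # first 50 concepts
late = sum(len(adj[i]) for i in range(450, 500)) / 50  # last 50 concepts
# nodes that entered early end up with higher average degree than late ones
```

Because each attachment also raises the chance that the target's neighborhood is chosen again, degree advantages compound over time, which is what produces the age-of-acquisition effect discussed next.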
We verify this relation experimentally with age-of-acquisition norms (Gilhooly & Logie, 1980; Morrison, Chappell, & Ellis, 1997) and explain how it could account for some puzzling behavioral effects of age of acquisition in lexical-decision and naming tasks, under plausible assumptions about search mechanisms in semantic memory.

Our growing network models are not intended as anything like a complete model of semantic development, or even a precise mechanistic model of some aspects of semantic development. Rather, they capture just one aspect of semantic development at a high level of abstraction: the origin of new meaning representations as differentiations of existing meaning representations. There are clearly many aspects of semantic development that these models do not address, such as the different routes to word learning, the relative roles of verbal and nonverbal experience, and the possibility that semantic relations may disappear as well as appear. Our model shows that one aspect of semantic development—growth of semantic networks by differentiations of existing nodes—is sufficient to explain the characteristic large-scale statistical structures we observed across different semantic networks. There are also many aspects of semantic structure not captured in our simplified semantic network models: the context sensitivity of meanings, the existence of different kinds of semantic relations, or the precise nature of the relations between word meanings and concepts. We leave these topics as questions for future research.

2. Basic concepts from graph theory

We begin by defining some terminology from graph theory and briefly introducing the statistical properties that we will use to describe the structure of semantic networks.1 Underlying every semantic network is a graph, consisting of a set of nodes (also called vertices) and a set of edges or arcs that join pairs of nodes. The number of nodes in the network is denoted by n. An edge is an undirected link between two nodes, and a graph containing only edges is said to be undirected. An arc is a directed link between two nodes, and a graph containing only arcs is said to be directed. Every directed graph corresponds naturally to an undirected graph over the same nodes, obtained by replacing each arc with an edge between the same pair of nodes (see Table 1).

Two nodes that are connected by either an arc or edge are said to be neighbors; a neighborhood is a subset of nodes consisting of some node and all of its neighbors. When the network is directed, the in-degree and out-degree of a node refer to the numbers of arcs incoming to or outgoing from that node, respectively. The variables ki^in and ki^out denote the in-degree and out-degree of node i, respectively. When the network is undirected, the in-degree and out-degree are always equal, and we refer to either quantity as the degree of a node. We write the degree of node i as ki. We will also use ki with directed networks to denote the degree of node i in the corresponding undirected network (i.e., the total number of neighbors of node i).

In an undirected graph, a path is a sequence of edges that connects one node to another. In a directed graph, a (directed) path is a set of arcs that can be followed along the direction of the arcs from one node to another. We can also talk about undirected paths in a directed graph, referring to the paths along edges in the corresponding undirected graph, but by default any reference to paths in a directed network will assume the paths are directed. For a particular path from node x to node y, the path length is the number of edges (in an undirected graph) or arcs (in a directed graph) along that path.
We refer to the distance between x and y as the length of the shortest path connecting them.2 In a connected graph, there exists an undirected path between any pair of nodes. A directed graph is said to be strongly connected if between any pair of nodes there exists a path along the arcs. A (strongly) connected component is a subset of nodes that is (strongly) connected.

We characterize the structure of semantic networks primarily in terms of four statistical features defined using the previous terminology. These quantities are the average distance L, the diameter D, the clustering coefficient C, and the degree distribution P(k). L and D are closely related: L refers to the average of the shortest path lengths between all pairs of nodes in a network, whereas D refers to the maximum of these distances over all pairs of nodes. In other words, at most D steps are needed to move from any node to any other, but on average only L steps are required.3

Table 1
Definitions for terms and variables used throughout the paper

Term/variable          Definition
n                      number of nodes
L                      the average length of the shortest path between pairs of nodes
D                      the diameter of the network
C                      the clustering coefficient (see Equation 1 and accompanying text)
k, k^in, k^out         the degree, the in-degree, and the out-degree (degree = number of connections)
P(k)                   the degree distribution
<k>                    the average degree
γ                      the power-law exponent for the degree distribution (see Equation 2)
random graph           network where each pair of nodes is joined by an edge with probability p
small-world structure  network with short average path lengths L and relatively high clustering coefficient C (by comparison with equally dense random graphs)
scale-free network     network with a degree distribution that is power-law distributed

The clustering coefficient C and the degree distribution P(k) are two different probabilistic measures of the graph connectivity structure. C represents the probability that two neighbors of a randomly chosen node will themselves be neighbors, or alternatively, the extent to which the neighborhoods of neighboring nodes overlap. Following Watts and Strogatz (1998), we calculate C by taking the average over all nodes i of the quantity

    Ci = Ti / [ki(ki - 1)/2] = 2Ti / [ki(ki - 1)]    (1)

where Ti denotes the number of connections between the neighbors of node i, and ki(ki - 1)/2 is the number of connections that would be expected between i's neighbors if they formed a fully connected subgraph. Because Ti can never exceed ki(ki - 1)/2, the clustering coefficient C is normalized to lie between 0 and 1, as required of a probability. When C = 0, no nodes have neighbors that are also each other's neighbors. In a fully connected network (i.e., every node is connected to all other nodes), C = 1. Although the clustering coefficient is sensitive to the number of connections in a network, it is possible for two networks to have the same number of connections but different clustering coefficients (see Fig. 2). Finally, note that because the definitions for Ti and ki are independent of whether the connections are based on edges or arcs, the clustering coefficients for a directed network and the corresponding undirected network are equal.

Fig. 2. An illustration of the graph-theoretic properties that we will apply to semantic networks. (a) Two networks with equal numbers of nodes and edges. For both networks, the variables n (number of nodes) and <k> (average degree, i.e., average number of edges) are shown, as well as the statistical properties L (average shortest path length), D (diameter), and C (clustering coefficient). Note that the two networks have different clustering coefficients even though they have the same n and <k>. (b) The degree distributions corresponding to the two networks above. Both networks show the typical pattern for random graphs: approximately bell-shaped distributions with tails that decay exponentially as k increases.

The degree distribution P(k) represents the probability that a randomly chosen node will have degree k (i.e., will have k neighbors). For directed networks, we will concentrate on the distribution of in-degrees, although one can also look at the out-degree distribution. We estimate these distributions based on the relative frequencies of node degrees found throughout the network. The most straightforward feature of P(k) is the expected value of k under P(k), the average degree <k>. This quantity, estimated by simply averaging the degree of all nodes over the network, ranges between 0 and n (for a fully connected network of n nodes) and represents the mean density of connections in the network. More information can be gleaned by plotting the full distribution P(k) as a function of k, using either a bar histogram (for small networks, as in Fig. 2), a binned scatter plot (for large networks, as in Fig. 5), or a smooth curve (for theoretical models, as later illustrated in Fig. 3). As we explain in the next section, the shapes of these plots provide characteristic signatures for different kinds of network structure and different processes of network growth.

Fig. 2 shows these statistical measures for two different networks with 10 nodes and 15 edges. These examples illustrate how networks equal in size and density of connections may differ significantly in their other structural features. Fig. 2 also illustrates two general properties of random graphs—graphs that are generated by placing an edge between any two nodes with some constant probability p independent of the existence of any other edge (a model introduced by Erdős and Rényi, 1960). First, for fixed n and <k>, high values of C tend to imply high values of L and D. Second, the degree distribution P(k) is approximately bell shaped, with an exponential tail for high values of k.4 Although these two properties hold reliably for random graphs, they do not hold for many important natural networks, including semantic networks in natural language. We next turn to a detailed discussion of the small-world and scale-free structures that do characterize natural semantic networks. Both of these structures can be thought of in terms of how they contrast with random graphs: Small-world structures are essentially defined by the combination of high values of C together with low values of L and D, whereas scale-free structures are characterized by non-bell-shaped degree distributions, with power-law (rather than exponential) tails.

3. Small-world and scale-free network structures

Interest in the small-world phenomenon originated with the classic experiments of Milgram (1967) on social networks. Milgram's results suggested that any two people in the United States were, on average, separated by only a small number of acquaintances or friends (popularly known as "six degrees of separation"). Although the finding of very short distances between random pairs of nodes in a large, sparsely connected network may seem surprising, it does not necessarily indicate any interesting structure. This phenomenon occurs even in the random graphs described previously, where each pair of nodes is joined by an edge with probability p. When p is sufficiently high, the whole network becomes connected, and the average distance L grows logarithmically with n, the size of the network (Erdős & Rényi, 1960).
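The random-graph baseline and the statistics L and C defined above can be made concrete with a short, stdlib-only sketch (the parameters n = 200 and p = 0.05 are arbitrary choices for illustration, not values from the paper):

```python
import random
from collections import deque
from itertools import combinations

def erdos_renyi(n, p, seed=0):
    """G(n, p) random graph: each pair of nodes is joined with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def clustering_coefficient(adj):
    """Average over nodes of Ci = 2*Ti / (ki*(ki - 1)), as in Equation 1."""
    cs = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # Ci is undefined for nodes with fewer than 2 neighbors
        t = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        cs.append(2 * t / (k * (k - 1)))
    return sum(cs) / len(cs)

def avg_path_length(adj):
    """Average shortest-path distance L over reachable pairs (BFS per node)."""
    total, pairs = 0, 0
    for src in adj:
        dist, q = {src: 0}, deque([src])
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
        total += sum(d for u, d in dist.items() if u != src)
        pairs += len(dist) - 1
    return total / pairs

adj = erdos_renyi(n=200, p=0.05)
C = clustering_coefficient(adj)
L = avg_path_length(adj)
# for G(n, p), C is expected to be close to p, while L stays short
```

This shows the property exploited in the comparisons below: a dense-enough random graph already has short average distances, so what distinguishes natural small-world networks is their much higher clustering at the same density.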
Watts and Strogatz (1998) sparked renewed interest in the mathematical basis of the small-world phenomenon with their study of several naturally occurring networks: the power grid of the western United States, the collaboration network of international film actors, and the neural network of the worm Caenorhabditis elegans. They showed that although random graphs with comparable size n and mean connectivity <k> describe very well the short path lengths found in these networks, they also exhibit clustering coefficients C that are orders of magnitude smaller than those observed in the real networks. In other words, natural small-world networks somehow produce much shorter internode distances than would be expected in equally dense random graphs, given how likely it is that the neighbors of a node are also one another's neighbors (see Note 4). For the remainder of this article, we use the term small-world structure to refer to this combination of short average path lengths L and relatively high clustering coefficients C (by comparison with equally dense random graphs).

Small-world structures have since been found in many other networks (reviewed in Strogatz, 2001), including the World Wide Web (WWW; Adamic, 1999; Albert, Jeong, & Barabási, 1999), networks of scientific collaborators (Newman, 2001), and metabolic networks in biology (Jeong, Tombor, Albert, Oltvai, & Barabási, 2000). Watts and Strogatz (1998) proposed a simple abstract model for the formation of small-world structures, in which a small number of the connections in a low-dimensional regular lattice are replaced with connections between random pairs of nodes. The local neighborhood structure of the lattice leads to high clustering, whereas the long-range random connections lead to very short average path lengths.
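The lattice-rewiring idea can be sketched as follows. This is a minimal illustration of the Watts–Strogatz construction; the parameters (n = 20 nodes, k = 4 lattice neighbors, rewiring probability p = 0.1) are assumed values chosen for the example:

```python
import random

def watts_strogatz(n=20, k=4, p=0.1, seed=0):
    """Ring lattice where each node links to its k nearest neighbors;
    each lattice edge is then rewired to a random target with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # build the regular ring lattice
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    # rewire each lattice edge (i, i+j) with probability p
    for i in range(n):
        for j in range(1, k // 2 + 1):
            if rng.random() < p:
                old = (i + j) % n
                if old not in adj[i]:
                    continue  # edge already rewired away earlier
                candidates = [v for v in range(n) if v != i and v not in adj[i]]
                if not candidates:
                    continue
                new = rng.choice(candidates)
                adj[i].discard(old); adj[old].discard(i)
                adj[i].add(new); adj[new].add(i)
    return adj

adj = watts_strogatz()
# rewiring replaces edges one-for-one, so the total number of edges is preserved
```

With p = 0 the graph is a pure lattice (high C, long paths); with p = 1 it approaches a random graph (low C, short paths); small intermediate p yields the small-world combination of both short paths and high clustering.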
Amaral, Scala, Barthélémy, and Stanley (2000) distinguished between different classes of small-world networks by measuring the degree distribution P(k). In one class of networks, such as C. elegans and the U.S. power grid, the degree distribution decays exponentially. This behavior is well described by random graph models or variants of the Watts and Strogatz (1998) model. In other systems, such as the WWW or metabolic networks, the degree distribution follows a power law (Barabási & Albert, 1999),

    P(k) ∝ k^(-γ)    (2)

for values of γ typically between 2 and 4. Fig. 3 illustrates the difference between power-law and exponential degree distributions. Intuitively, a power-law distribution implies that a small but significant number of nodes are connected to a very large number of other nodes, whereas

Fig. 3. Two different kinds of degree distributions that may be observed in complex networks. (a) A power-law distribution (dashed) and an exponential distribution (solid), shown with the same linear scales as the histograms in Fig. 2b. (b) When plotted with log-log scaling, the same distributions are more clearly differentiated, particularly in their tails.
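The log-log signature in Fig. 3b can be checked numerically. In this sketch, the exponent γ = 2.5 and the exponential decay scale κ = 5 are assumed values for illustration only: on log-log axes a power law P(k) ∝ k^(-γ) has constant slope -γ, while an exponential's slope keeps steepening in the tail.

```python
import math

# tail comparison: P(k) ∝ k^(-2.5) versus P(k) ∝ exp(-k/5), k = 1..100
power = [k ** -2.5 for k in range(1, 101)]
expo = [math.exp(-k / 5) for k in range(1, 101)]

def loglog_slope(p, k1, k2):
    """Slope of log P(k) against log k between degrees k1 and k2."""
    return (math.log(p[k2 - 1]) - math.log(p[k1 - 1])) / (math.log(k2) - math.log(k1))

s_early = loglog_slope(power, 2, 10)   # slope of the power law near the head
s_late = loglog_slope(power, 10, 100)  # slope of the power law in the tail
e_early = loglog_slope(expo, 2, 10)    # exponential: slope near the head
e_late = loglog_slope(expo, 10, 100)   # exponential: much steeper in the tail
# the power-law slopes match (-2.5); the exponential's tail slope is far steeper
```

This is exactly why binned log-log plots are used later in the article to diagnose scale-free structure: a straight line over the tail indicates a power law, while downward curvature indicates exponential decay.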