Self-Organization in Phonology

Draft of a chapter to appear in 'A Companion to Phonology'

Andrew Wedel, University of Arizona


Structure can arise in a system in many different ways. Self-organization is one general mechanism for structure formation which has relatively recently been explored as a possible contributor to patterns found in language. The aims of this chapter are:

  • to provide an overview of self-organization as a general mechanism for structure formation;
  • to describe some of the ways that self-organizational processes can interact with other familiar mechanisms for structure formation;
  • to review work done to date arguing that particular phonological phenomena may arise through the contribution of self-organizational mechanisms.

    1. Self-organization as a pathway to structure

    ‘Self-organization’ is not a concept with crisp edges. Instead, it is a big-tent term covering the many ways structure can form in non-linear, dissipative systems. Non-linear systems are those in which the properties of the system as a whole cannot be understood in terms of the properties of individual system elements, in other words, systems in which new properties emerge through interaction. Cooked egg-white is a familiar example of a non-linear system. Egg-white consists primarily of the globular protein albumin, and prior to cooking the small albumin molecules slide past one another easily, producing a translucent semi-liquid. When the temperature is raised beyond a certain point, the albumin protein chains unfold and stick to each other, creating a large, highly interlocked structure that is opaque and stiff. These properties of cooked egg-white cannot be understood in terms of the summed properties of individual unfolded albumin molecules, but only in terms of their interaction as they stick together.

    Dissipative systems are those in which a given state or structure is maintained through a constant flux of energy or matter. As a consequence, an account for a structure within a dissipative system includes time at some level. A ripple in a creek provides a simple example of a higher order structure produced through a flux at a lower level of description. At one level of description, a ripple is an independent element of a creek that can interact with other elements at that level, such as a floating leaf or another ripple. At a lower level of description, it is a vast and constantly changing set of water molecules interacting with each other and the creek bed as they move. If the flow of water stops, the ripple disappears.

    Structure arises in non-linear, dissipative systems when many similar elements or events interact over time to produce persistent changes at some higher level of organization. Typically, structure formation in self-organizing systems is the result of positive and negative feedback loops engendered by the interaction of system elements with each other or with the environment. Positive feedback (also sometimes referred to as autocatalysis) arises when a given event makes a similar event more likely in the future. An example is the population growth that occurs when individuals have offspring at greater than the replacement rate. In this case, the birth of each additional individual makes a subsequent birth more likely. Positive feedback promotes change and can result in runaway processes.

    Negative feedback arises when an event makes a similar event less likely in the future, as when a growing population outstrips its supply of resources. In this case, each additional birth lowers the probability of a subsequent birth through increased competition. Negative feedback promotes stability. Both positive and negative feedback represent types of non-linearity, because the description of patterns resulting from feedback must make reference to interactions between system elements. Self-organization often occurs in systems through positive feedback between system internal elements that is prevented from snowballing beyond a certain point by negative feedback. Examples of this sort include population growth limited by finite resources, thunderstorm structure in which a growing updraft is constrained by a resulting downdraft, and economic bubbles, burst by the collapse of credit.
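    The population-growth example can be made concrete with a minimal sketch (the parameter values here are purely illustrative, not drawn from any empirical population):

```python
# Logistic growth: each birth makes further births more likely (positive
# feedback, the r*n term), while competition for finite resources slows
# growth as the population n approaches the carrying capacity k
# (negative feedback, the (1 - n/k) term).

def logistic_step(n, r=0.1, k=1000.0):
    """Advance the population by one generation."""
    return n + r * n * (1.0 - n / k)

def trajectory(n0, generations, r=0.1, k=1000.0):
    """Population sizes for n0 and each of the following generations."""
    traj = [n0]
    for _ in range(generations):
        traj.append(logistic_step(traj[-1], r, k))
    return traj
```

    Starting from a small population, growth first accelerates (runaway positive feedback) and then levels off just below the carrying capacity as negative feedback takes over.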

    Self-organized systems exhibit emergence. Emergence in this context refers to the generation of a higher order structure that interacts meaningfully with other structures of the system at this level of description. Our earlier example of a ripple in a stream serves as a familiar instance of emergence: the influence of a ripple on other system elements (a leaf, another ripple) is most usefully described in terms of our understanding of the behavior of ripples, rather than our understanding of water molecules.

    The Game of Life (J. H. Conway, reported in Gardner 1970) provides a simple example of a deterministic, self-organized system that exhibits these properties. The Game of Life is a simple cellular automaton that occupies an infinite, two-dimensional orthogonal grid, the cells of which can be either ‘alive’ or ‘dead’. There are three simple rules governing cell birth and death, each of which makes reference to a cell’s eight immediate neighbors:

    1. If a living cell has fewer than two living neighbors, it dies.
    2. If a living cell has more than three living neighbors, it dies.
    3. If a dead cell has exactly three living neighbors, it becomes alive.

    The grid is initialized with some seed pattern of living cells, and then left to evolve according to these three rules. Some seed patterns result in uninteresting outcomes: if the distribution of living cells is too sparse, all cells quickly die; conversely, some seed patterns are stable and do not change even though the rules continue to be applied. Four cells arranged in a square is one example of such a stationary pattern. Other seed patterns produce oscillating structures or structures that move in a consistent direction across the grid. For example, a seed pattern of three living cells in a row produces an orthogonally oscillating row of three. The pattern known as Gosper’s Glider Gun is a particularly beautiful example of the complexity that can arise through the interaction of these simple rules over time.
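    The three rules are easy to state in code. The sketch below (written for this chapter, not Conway's original formulation) uses a small wrap-around grid rather than an infinite one, and represents the board as a set of live-cell coordinates:

```python
from collections import Counter

def life_step(live, width, height):
    """One generation of the Game of Life on a wrap-around grid.
    `live` is a set of (x, y) coordinates of living cells."""
    # Count the living neighbors of every cell adjacent to a living cell.
    counts = Counter()
    for (x, y) in live:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx or dy:
                    counts[((x + dx) % width, (y + dy) % height)] += 1
    new = set()
    for cell, n in counts.items():
        # Rules 1 and 2, restated as survival: a living cell with two or
        # three living neighbors survives. Rule 3: a dead cell with
        # exactly three living neighbors becomes alive.
        if n == 3 or (n == 2 and cell in live):
            new.add(cell)
    return new
```

    A row of three living cells oscillates between horizontal and vertical orientations, and a two-by-two block is stationary, matching the seed-pattern behaviors described above.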

    [Figure 1. Gosper's Glider Gun creating gliders.]

    The Game of Life exhibits many of the typical features of self-organizing systems. Structure-formation depends on the interaction between elements (it is non-linear) and the application of cell birth- and death-rules over time (it is dissipative). It also requires the interaction of context-dependent positive and negative feedback processes: depending on the local context, the birth of a cell can cause the birth of additional cells in the next round or it can cause death. As in many self-organizing systems, structure arises in the Game of Life through positive feedback that is held in check by negative feedback. Finally, this system exhibits emergence, in which distinct groupings of living cells function as units with predictable behavior. Within the illustration of Gosper’s Glider Gun in Figure 1, two large groupings of cells bounce off of stationary square groupings at the edges of the system, and then bounce off of each other. In the process of bouncing off of each other, they create small self-contained ‘gliders’ that embark on an infinite journey down to the right.

    Finally, self-organized systems frequently exhibit phase transitions between semi-stable states defined by attractors. An attractor is a system state (or set of states) that nearby states tend to evolve toward. A simple visual metaphor for a system with multiple attractors is a surface with multiple basins. If a ball is placed somewhere on this surface, it will tend to roll to the bottom of whatever basin it happens to be in. Phase transitions correspond to the transition from one basin of attraction to another and are accompanied by a shift in the behavior of a system. In our visual analogy, if we begin to shake the surface, the ball will begin to bounce around within its basin and may eventually by chance roll up and over into a new basin, where it will again tend to remain until another chance excursion carries it out.
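    The basin metaphor can be made concrete with a toy dynamical system (chosen here purely for illustration): a 'ball' descending the double-well potential V(x) = x^4/4 - x^2/2, which has attractors at x = -1 and x = +1. Added noise plays the role of shaking the surface:

```python
import random

def settle(x, steps=200, dt=0.1, noise=0.0, rng=None):
    """Follow the gradient of V(x) = x**4/4 - x**2/2, i.e. each step moves
    the ball by (x - x**3)*dt, optionally jittered by Gaussian noise
    (the 'shaking' of the surface)."""
    rng = rng or random.Random(0)
    for _ in range(steps):
        x += (x - x**3) * dt
        if noise:
            x += rng.gauss(0, noise)
    return x
```

    Without noise the ball settles into whichever basin it starts in; with enough noise it can hop between basins, which is the analogue of a phase transition.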

    The simulation below demonstrates a simple self-organizing system with two attractors and a phase transition between them. This simulation takes place in an orthogonal field of squares, in which a square can be yellow or blue and can change colors under the influence of two factors:

  • Biased noise: In each round of the simulation each square has a 5% probability of resetting its color. This probability can be biased toward one color or the other.
  • Local similarity bias: if there is a greater than 2:1 majority of any one color within a radius of four squares, a square changes color to match.
    The similarity bias creates positive feedback: once a particular neighborhood crosses the threshold of a greater than 2:1 majority of one color, squares within that neighborhood tend to change to reinforce color similarity. There are two possible colors, and therefore two attractors in this system – a field of all blue, or of all yellow squares. To run the simulation, click on 'setup', and then click 'go'. The field is set up initially with all yellow squares and with the noise bias between the two colors set at zero. As the simulation begins, squares occasionally change to blue under the influence of noise, but then rapidly revert to yellow because most of their neighbors are yellow. While the simulation is running, move the slider marked 'noise-bias-toward-blue' from 0% up to 50%. You will notice that as you boost the bias toward blue, the equilibrium proportion of blue squares increases, as seen in the graph at lower left labeled 'Color change through time'. Although random change is biased toward blue, the system remains within the yellow attractor: the field still remains overwhelmingly yellow. If you increase the bias toward blue beyond 60%, however, larger patches of blue eventually appear and rapidly spread over the entire field. Although yellow squares continually appear through noise, the system is now within the blue basin of attraction.
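    The essentials of this system can be sketched in a few lines of code. The sketch below is a much reduced stand-in for the simulation described here (the field, similarity radius and number of rounds are scaled down for speed, and squares are updated in place), and it is enough to demonstrate the attractor behavior: each uniform color, once established, resists even strongly biased noise.

```python
import random

def run_field(size=20, rounds=30, noise=0.05, bias_toward_blue=0.0,
              radius=2, start_blue=False, seed=0):
    """Field of 0 (yellow) / 1 (blue) squares. Each round every square may
    be reset by biased noise; a greater than 2:1 color majority within
    `radius` then overrides the square's color. Returns the final
    proportion of blue squares."""
    rng = random.Random(seed)
    field = [[1 if start_blue else 0] * size for _ in range(size)]
    for _ in range(rounds):
        for x in range(size):
            for y in range(size):
                # Biased noise: occasionally reset the square's color.
                if rng.random() < noise:
                    field[x][y] = 1 if rng.random() < 0.5 + bias_toward_blue else 0
                # Local similarity bias: a > 2:1 neighborhood majority wins.
                blue = sum(field[(x + dx) % size][(y + dy) % size]
                           for dx in range(-radius, radius + 1)
                           for dy in range(-radius, radius + 1)
                           if dx or dy)
                others = (2 * radius + 1) ** 2 - 1
                if blue > 2 * (others - blue):
                    field[x][y] = 1
                elif (others - blue) > 2 * blue:
                    field[x][y] = 0
    return sum(map(sum, field)) / size ** 2
```

    With the field initialized yellow, even strongly blue-biased noise leaves it overwhelmingly yellow; a field initialized blue likewise stays blue under yellow-biased noise, the hysteresis effect discussed below.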

    Many self-organizing systems driven by positive feedback exhibit hysteresis, which is the tendency for a current state to persist even when the original conditions promoting that state are no longer present. To see that this system exhibits hysteresis, once the field has become blue move the 'noise-bias-toward-blue' slider back to 0% bias, or even to -50% (i.e., a moderate bias toward yellow). Because of positive feedback, once a blue or yellow-majority pattern arises, it is self-sustaining despite the loss of an original bias that may have prompted it. To stop the simulation, click on 'go' again.

    Within the domain of morphology, local similarity bias in the form of analogical extension has been argued to influence the course of morphological change over time (see e.g. Hock 2003, Garrett 2008). Likewise, pockets of formally similar irregulars ('gangs') have been shown to be more likely to recruit new members than formally isolated irregulars (Bybee and Moder 1983, Stemberger and MacWhinney 1988). Under this general model, coherent generalizations over forms act as emergent attractors, just like the yellow or blue patterns in the simulation above. These patterns of similarity-based extension and, plausibly, resistance to extension are consistent with a model in which local similarity effects play a significant role in the formation of larger-scale morphological regularities. For a detailed implementation of this type of model simulating the evolution of past tense forms in Old English, see Hare and Elman 1995. In a similar fashion, I have argued that similarity biases at the level of sound-categories may underlie the development of regular patterns in phonology (Wedel 2007) as well as the outcome of conflicts between phonological and morphological regularities (Wedel 2009).

    2. Self-organization in interaction with other influences on structure

    Self-organization does not operate in a vacuum. It contributes structure in a context supplied by the system and its environment, and the properties of the system that support and direct the emergence of new structure can have any source. Features of the environment can supply negative feedback or serve as templates that give initial direction to self-organized structure. Supporting features of the system itself may arise from self-organization at a lower level of description, or may be innately specified. In the Game of Life, for example, self-organized structure formation is dependent on the properties of the environment (the orthogonal grid), and on the pre-determined character of interactions between cells. The structures that develop are also critically dependent on the initial seed pattern of living and dead cells which serves as an organizing template. Likewise, the behavior of the phase-transition simulation above depends on properties of the system that are determined elsewhere. The properties of the local similarity bias are given, as is the 5% probability of random change for any square in a given round. The bias in the direction of change is set by the user.

    From a design point of view, self-organization is a powerful tool: if the details of a complex structure can be constructed through emergence instead of direct specification by some other means, the structure can be encoded considerably more compactly than otherwise possible (Gell-Mann 1992). For example, the specification of Gosper’s Glider Gun requires only a description of the environment, the rules for cell birth and death, and the initial seed pattern. Further, different complex structures can be created by minimal changes to this description, such as changes to the properties of the environment, the seed pattern, or the rules for cell birth and death.

    Similarly, many biological structures are thought to emerge from self-organizational pathways which are given shape and direction by innately specified contexts. For example, the spots and stripes that are found throughout the animal kingdom have been proposed to emerge through a single basic system with slight variations involving the competition between diffusing activator and inhibitor molecules in an animal’s skin. (This model was originally proposed by Alan Turing in 1952; for discussion see Ball, Chapter 4). Different shapes and patterns of coat markings can be produced simply by changing the relative diffusion speeds of the activator and inhibitor, or by changing the shapes of the underlying pigment-producing cells. This is an informationally much more compact way to produce a complex coat pattern than, for example, specifying the state of each and every individual pigment-producing cell. Furthermore, patterns that arise through self-organizational pathways are often very robust to perturbation because of the role that attractors play in the evolution of the system. For an overview of some of the many biological patterns that are thought to arise through self-organization, including examples of the interaction between self-organizational pathways and other mechanisms, see Camazine et al. 2001.
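    The logic of activator-inhibitor patterning can be illustrated with a discrete local-activation/lateral-inhibition caricature (an illustrative simplification constructed for this chapter, not the Turing reaction-diffusion system itself): pigment cells activate nearby cells and inhibit more distant ones, and the key Turing-style ingredient is that inhibition reaches farther than activation.

```python
def pattern_step(on, n, r_act=2, r_inh=6):
    """One update of a 1-D ring of n pigment cells. Each 'on' cell
    activates cells within distance r_act and inhibits cells out to
    r_inh; a cell is 'on' next round iff its net activation is
    positive."""
    new = set()
    for x in range(n):
        net = 0
        for y in on:
            d = min((x - y) % n, (y - x) % n)  # distance on the ring
            if d <= r_act:
                net += 1       # short-range activation
            elif d <= r_inh:
                net -= 1       # longer-range inhibition
        if net > 0:
            new.add(x)
    return new
```

    A spot of five adjacent cells is a stable pattern under this update, while a uniformly pigmented ring is not: uniform activation is outweighed by the broader inhibition, so spots of a characteristic size are what survive.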

    Many resources, both on the web and in print, are available for learning more about self-organization and related concepts. Excellent published resources include At Home in the Universe: The Search for the Laws of Self-Organization and Complexity (Kauffman 1995), A Self-Made Tapestry (Ball 1999) and Self-Organization in Biological Systems (Camazine et al. 2001).

    3. Self-organization as a pathway for structure formation in language

    Much recent work approaches language as a complex adaptive system in which grammatical patterns are emergent properties resulting from the repeated interaction of the many different elements that make up a larger language system: innate and acquired biases; forms at multiple levels of representation; interacting spheres of use; sociolinguistic networks; and chains of acquisition and transmission over longer time-scales. For representative examples, see Haspelmath 1999; Nettle 1999a and b; Plaut and Kello 1999; de Boer 2000; Croft 2000; Lindblom 2000; Bybee 2001; Bybee and Hopper 2001; Kirby 2001; Oudeyer 2002; Bod et al. 2003; Blevins 2004; Wedel 2007; Boersma and Hamann 2008; Mielke 2008; Blevins and Wedel 2009; Kirby et al. 2009.

    A priori, there are at least two plausible reasons to think that self-organizational pathways may contribute to some of the patterns we find in language. The first is simply that at many levels and time-scales, language provides the necessary conditions to support spontaneous emergence of patterns through self-organizational pathways (see e.g. Lindblom et al. 1984; Ohala 1989; Lindblom 1992; Keller 1994; Labov 1994; Cziko 1995; Elman 1995, Deacon 1997, Cooper 1999; Hurford 1999; Steels 2000, Bybee 2001; Blevins 2004, MacWhinney 2006, Pisoni and Levi 2007, Beckner et al. to appear, and many others). Language involves the repeated interaction of many like elements, in like ways, at many levels of description and time-scales. Variation and bias in language acquisition and use provide many potential feedback pathways. Basic language-external constraints, from articulatory and perceptual factors through general categorization mechanisms to cross-culturally common salience relationships, all provide structures and templates that could give common shape to self-organized patterns. Because structure tends to spontaneously emerge under these conditions, it would be surprising if self-organizational pathways do not contribute to the formation of some of the many observed patterns in language, whether language-particular or crosslinguistically frequent. Put another way, if we found that self-organizational mechanisms played no role in the emergence of any observed language patterns, our burden would be to explain why not.

    The second, perhaps less compelling reason derives from design-principles. As briefly reviewed above, the specification of a complex pattern can be much more compact and the resulting structure more robust to perturbation when created through self-organizational pathways. To the extent that the language faculty has evolved under functional constraints for use and that grammars continue to do so diachronically, self-organization represents a powerful mechanism for structure in the "blind watchmaker’s toolbox" (Dawkins 1986).

    4. Self-organization in phonology

    Explorations of self-organizational accounts for linguistic patterns can be understood in terms of the general scientific goal of explaining more with less. Just as a good Optimality Theory (Prince and Smolensky 1993) account attempts to explain new patterns through rankings of existing constraints, an account for linguistic patterns making use of a self-organizational pathway attempts to explain a complex pattern through the interaction of simpler independent mechanisms. Often, authors of these accounts suggest that a linguistic pattern emerges through the interaction of domain-general factors rather than through innate grammatical mechanisms, but this is not a necessary feature of a self-organizational account. Just as a self-organized biological pattern may arise from innately specified processes, a self-organized linguistic pattern could arise from simpler language-specific structures. Some models under the rubric of Biolinguistics have been framed in these terms; see e.g. Medeiros 2008.

    Many self-organizational accounts make use of computational simulation, as an existence-proof that a given structure can arise through interactions between some defined set of system properties, as a supporting illustration for verbal or analytic arguments, or both. Simulation is particularly useful in this context because self-organization proceeds through chains of circular causation progressively building structure over time. As a consequence, verbal descriptions of proposed self-organizational processes are often hard to assess critically. More importantly, interacting feedback loops are notorious for producing counter-intuitive results, so a computational implementation of a model provides both an important research tool for a theorist and a demonstration for a reader that the model operates as expected (Peck 2004). The following is a brief survey of several phenomena in phonology that have been proposed to arise in part through self-organizational pathways. Where possible, brief summaries of simulation architectures are included.

    Early phonological acquisition

    In early phonological acquisition, initial relatively accurate word imitation is followed by a period of less accurate, but more systematic productions (Ferguson and Farwell 1975, reviewed in Vihman et al. 2009), reminiscent of the U-shaped learning curve of irregular morphological forms. Further, while these production patterns are consistent for a given child, they differ across children, suggesting that the pathway to phonological competence is not prespecified at this level (Vihman and Croft 2007, Vihman et al. 2009). Vihman et al. (2009) propose that this phenomenon can be explained in a model based on the ability of infants to acquire individual word-gestalts, combined with an ability to generalize over those gestalts through feedback from their own production (see also Pierrehumbert 2003). Under this model, an infant’s set of practiced babbles provides the seed patterns for initial generalizations over learned word-gestalts. The substitution of these generalized phonological ‘templates’ in place of gestalts accounts for the period of reduced accuracy but greater systematicity. Accuracy subsequently improves as myriad interactions with caregivers further shape the trajectory of learning. Vihman et al. argue that a self-organizing, feedback-driven model of this kind is particularly well suited to explain both the highly individual initial production templates observed in children and their subsequent convergence on a community standard of pronunciation.

    Conspiracies in historical phonology

    Conspiracies, in which a seemingly disparate set of processes all result in a common pattern, are widespread in diachronic phonology. Robert Blust, for example, identifies many different types of diachronic change in Austronesian languages that conspire to create a disyllabic word, in many cases restoring a historical disyllable that had lost or gained a syllable through other changes (Blust 2007). In the history of Javanese, for example, reduplication, epenthesis, deletion and loss of a morpheme boundary have all occurred preferentially when the product of change is a disyllabic word. Blust suggests that conspiracies arise when a pattern in a language becomes particularly salient, leading it to function as a ‘linguistic attractor’ in language change (Cooper 1999). Within variationist approaches to language change (e.g. Bybee 2001, Blevins 2004), a salient pattern can influence the course of language change by biasing categorization (cf. ‘Change’ in Evolutionary Phonology, Blevins 2004), and/or by biasing the range of variation in production. (For a simulation of pattern feedback in production biasing language change, see Wedel 2007). In this context, Blust notes that disyllables have been reported to make up 94% of the set of content words in proto-Austronesian, and that in many modern Austronesian languages the disyllable remains the dominant word type.

    A variety of results are consistent with the hypothesis that linguistic attractors bias individual behavior and thereby influence the course of language change. A wide range of studies have shown that both grammaticality judgments (e.g., Krott et al. 2002, Albright 2002, Pierrehumbert 2006a) and performance (e.g., Bybee and Moder 1983, Dell et al. 2000, Vitevitch and Sommers 2003, Gonnerman et al. 2007, and many more) are biased by similarity and by pattern type-frequency at a wide range of representational levels (for reviews of several of these topics, see Bybee 2001, Ernestus and Baayen 2003, Bybee and McClelland 2005, Pierrehumbert 2006b, Baayen 2007, Pisoni and Levi 2007). In addition, the behavior of simulated language change in simple model systems is consistent with the general predictions of this model (Wedel 2007, 2009). Further supporting evidence could be sought in patterns of change within iterated artificial language learning and transmission paradigms of the sort pioneered by Kirby et al. (2008). Finally, it is worth noting that within models such as Evolutionary Phonology, synchronic alternation patterns are created through diachronic change rather than through mechanisms localized within a single individual’s language faculty (Blevins 2004, see chapter 3 for a review of earlier theories of this type). Under this model, this account of diachronic conspiracies provides the basis for an account of synchronic conspiracies as well.

    Actuation versus propagation of change

    A long-standing question in historical linguistics is how an initially isolated change can survive and propagate throughout a community, given that language learners tend to converge on a common community standard. To the extent that this is the case, isolated variants should never be able to gain a foothold in a speech community because every learner is exposed to many speakers (see Keller 1994, p. 99 and Nettle 1999 for discussion). In a foundational paper, Nettle (1999) uses a well-articulated simulation to explore factors that are required to allow randomly occurring variants to become established, assuming the existence of a stratified social structure. He finds that given reasonable assumptions (i.e., that ceteris paribus, learners tend to adopt the local majority pattern), random variation in acquisition is not sufficient to induce a population-wide transition from one pattern to another without being so pervasive as to obliterate any coherent pattern at all. He then shows that when significant prestige inequities are introduced in which a small number of individuals serve disproportionately as acquisition models, the novel variants that arise can occasionally spread. When a small number of individuals exert strongly disproportionate influence, the effective population size is small, allowing random events a greater chance of influencing the trajectory of change (see the literature on genetic drift in biological populations; an excellent introduction can be found at http://www.nature.com/scitable/topicpage/Genetic-Drift-and-Effective-Population-Size-772523). However, it is clear that functional articulatory and perceptual factors influence the course of change as well; otherwise, we should observe as many diachronic changes that are phonetically unnatural as natural.
Nettle explores the influence of functional biases in his model, and concludes that for functional biases alone to drive change, they would have to be so strong that anti-functional patterns should never occur. Since this is not the case, Nettle argues that social factors are a critical engine of change, but that the rate of actuation and the efficiency of propagation must also be biased by functional factors that influence ease of production, perception and acquisition.

    Many phonologists have provided evidence suggesting that phonological change can gradually percolate through the lexicon within a community through lexical diffusion (e.g., Wang 1967, 1977, Phillips 1984, Bybee 2003). A mechanism for an initially local variant to spread in this way can be illustrated with a variant of the simulation above. To see this, consider the following thought-experiment concerning the development of a phonetically natural allophonic split. This simulation does not represent a complete model of allophone development, but rather abstracts out and demonstrates a single mechanistic component that could contribute to a more complete model. In the simulation below, each small square in the field represents a lexeme that contains a particular class of phonemes, for example stops. We start with a single coherent form for each of these stops, voicelessness, denoted by the color yellow. The squares on the right and left halves of the field represent lexical items with different local phonetic environments for stops: on the left side, the environment favors the voiceless/yellow allophone; on the right, the environment favors a voiced/blue variant. For example, imagine that lexical items on the left side of the field contain word-initial or word-final stops. Conversely, the right side of the field represents the portion of the lexicon in which lexical items contain stops in intervocalic contexts. A square's neighbors within the field represent lexical items that are phonologically similar, for example containing the same stop in a similar local environment. In production of any word, error can result in voicing variation, biased in one direction or the other by the local environment. As above, however, we assume a local similarity bias leading sounds in similar environments to be produced in similar ways (reviewed in Wedel 2007, section 2).
In this simulation, as above, if a lexical item's neighbors in a radius of 4 squares favor a particular variant by a 2:1 or better margin, that lexical item will tend to be produced to match. In this simplified system, we can imagine each square to represent the most recently produced exemplar of that lexical category.

    Click 'set-up'. The slider marked 'bias-strength' should be set at 0%, representing no context-related bias in production. When you click on 'go', you'll see widely scattered blue squares blink on and off as lexical items are occasionally produced with voiced stop variants through random noise. In the plot on the left, you can follow the proportion of blue squares on the right-hand side of the field. Now increase the bias to 50%, at which point the odds of a yellow square on the right side of the field turning blue are twice the odds on the left side of the field. You'll see that the instantaneous proportion of blue squares on the right rises slightly, but the entire field remains overwhelmingly yellow. In our metaphorical extension, stops in lexical items on the right-hand side of the lexicon are still usually produced as voiceless, despite the functional pressure for voicing. As you increase the bias above 50%, consistently voiced/blue variants will eventually emerge and spread across the portion of the lexicon with a stop-voicing bias. This change is actuated by a chance event (the chance production of many blue variants in one region of the lexicon at the same time), and then spreads as this new local generalization interacts with the existing bias toward voicing in particular environments.
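    A reduced sketch of this variant (again scaled down for speed, with a shrunken similarity radius and in-place updates, and written for this chapter rather than taken from any published implementation) demonstrates the stabilizing half of the mechanism: a uniformly voiceless lexicon resists the voicing bias, but once a split exists, the same similarity bias locks each half of the lexicon in place.

```python
import random

def stop_field(size=20, rounds=30, noise=0.05, bias=0.4, radius=2,
               split_start=False, seed=1):
    """Each square is a lexeme's most recent stop variant: 0 = voiceless
    (yellow), 1 = voiced (blue). Noise is biased toward voicelessness on
    the left half of the field and toward voicing on the right; a > 2:1
    majority among phonological neighbors then overrides the square.
    Returns the proportion of voiced variants in each half."""
    rng = random.Random(seed)
    field = [[1 if split_start and x >= size // 2 else 0
              for _ in range(size)] for x in range(size)]
    for _ in range(rounds):
        for x in range(size):
            for y in range(size):
                # Environment-biased production noise.
                if rng.random() < noise:
                    p_voiced = 0.5 + (bias if x >= size // 2 else -bias)
                    field[x][y] = 1 if rng.random() < p_voiced else 0
                # Local similarity bias among phonological neighbors.
                voiced = sum(field[(x + dx) % size][(y + dy) % size]
                             for dx in range(-radius, radius + 1)
                             for dy in range(-radius, radius + 1)
                             if dx or dy)
                others = (2 * radius + 1) ** 2 - 1
                if voiced > 2 * (others - voiced):
                    field[x][y] = 1
                elif (others - voiced) > 2 * voiced:
                    field[x][y] = 0
    half = size // 2 * size
    left = sum(field[x][y] for x in range(size // 2) for y in range(size))
    right = sum(field[x][y] for x in range(size // 2, size) for y in range(size))
    return left / half, right / half
```

    Started uniform, both halves stay voiceless despite the contextual bias; started split, the allophonic pattern is stable, illustrating how a chance local coherence, once established, is maintained against reversion.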

    A more complete model of this sort would need, at minimum, to include a more sophisticated model of similarity relationships within the lexicon; a gradient model of phonetic variation; an interacting network of such individual models to represent a community of speakers, including social factors; and a vertical/acquisition component to allow a more realistic phase-transition from one coherent pattern to another over the community. More generally however, this model of actuation and propagation proposes that ever-present contextual phonetic biases are usually inhibited from creating a large number of context-specific coherent allophonic patterns by a general bias toward overall coherence within a larger category -- in this example, a phoneme. However, if some chance event does create a sufficiently significant, locally coherent pattern that fits an existing phonetic bias, the bias can both stabilize this local pattern against reversion to the more global pattern and also spread that pattern to other similar forms. And of course, as Nettle argues, social factors are likely to play a critical role in the outcome of such chance events. More discussion and a similar model of the development of phonological regularities can be found in Wedel (2007).

    Emergence of phonemes and inventory structure

    Vowel inventories appear to be constructed to optimize perceptual contrast between neighboring vowels given extant articulatory constraints (Liljencrants and Lindblom 1972). How does this apparent optimization come about? In an early self-organizational approach to this problem using a perception-production feedback loop, de Boer (2000) proposed that structure in vowel inventories emerges through interaction of language users under perceptual and production constraints, assuming a tendency for language users to imitate each other. To test and illustrate this, he constructed a simulation in which a group of agents can produce, perceive and remember vowel pronunciations in the form of prototypes. (Agents are entities within a simulation that can change independently, here representing individual language users.) Within the simulation, agents speak and imitate each other, modifying their vowel categories in response to how successful their imitations are. In each round, a random pair of agents is chosen from the larger set of agents to act as speaker and hearer. The speaker articulates a randomly chosen vowel from memory with some random error; if it has no vowels in memory, it produces a random vowel within the available articulatory space. The hearer compares the formant values of the vowel to prototypes it has in memory and chooses the closest one. If it has no vowels in memory, it creates a similar vowel and calculates its associated articulatory parameters. The listener then repeats the matched prototype vowel for the speaker, who checks to see how close it was to the originally produced vowel. If the vowel is judged to be the same, the speaker agent gives the listener feedback that its imitation was successful. In that case, the listener shifts the parameters of its matching prototype closer to the vowel that it heard from the speaker. 
If the imitation was not successful, the listener checks its memory to find out how often that prototype has given rise to successful imitations. If it has been mostly unsuccessful, it moves that prototype closer to the sound it heard, just as in a successful imitation. If it has been mostly successful before, it may be that the speaker has an additional prototype vowel in that region of vowel space, and so the listener creates a new prototype to approximately match what it heard. Several additional processes come into play in this simulation: (i) if a vowel prototype is infrequently matched to a perceived vowel, it is discarded; (ii) if two vowel prototypes are too close together, they are merged, and (iii) new vowels are introduced by speakers at a low frequency.
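A bare-bones version of one imitation round might look like the following. This is illustrative only: de Boer's actual model uses an articulatory synthesizer and formant-based perception, whereas here vowels are simply points in an abstract unit square, and all parameter values are invented:

```python
import random, math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def imitation_round(agents, noise=0.05, step=0.2):
    """One round of a stripped-down imitation game: each agent is a
    list of prototype 'vowels', points in an (F1, F2)-like space."""
    speaker, hearer = random.sample(agents, 2)
    if not speaker:                       # no vowels yet: invent one
        speaker.append((random.random(), random.random()))
    v = random.choice(speaker)
    signal = (v[0] + random.gauss(0, noise), v[1] + random.gauss(0, noise))
    if not hearer:                        # adopt roughly what was heard
        hearer.append(signal)
    match = min(hearer, key=lambda p: dist(p, signal))
    # the hearer repeats its matched prototype back to the speaker
    echo = (match[0] + random.gauss(0, noise), match[1] + random.gauss(0, noise))
    success = min(speaker, key=lambda p: dist(p, echo)) == v
    if success:
        # on success, nudge the hearer's prototype toward the signal
        i = hearer.index(match)
        hearer[i] = (match[0] + step * (signal[0] - match[0]),
                     match[1] + step * (signal[1] - match[1]))
    return success

random.seed(2)
agents = [[] for _ in range(5)]           # five agents, empty inventories
for _ in range(500):
    imitation_round(agents)
```

The sketch omits de Boer's prototype discarding, merger, and innovation processes, but shows the core loop: production with noise, nearest-prototype perception, and success-driven prototype adjustment.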

    This simulation employs a number of features that may not correspond directly to actual features of language use (e.g., direct feedback on imitative success; the operative mechanisms of category loss and merger), but that is not its primary point. These mechanisms simply allow vowel inventories in individual agents to change over time in response to constraints on the differentiation of vowels that are perceived in individual usage events. de Boer shows that given these constraints, populations of agents starting with empty vowel inventories develop jointly held, phonetically natural vowel inventories. He concludes from this that the typological generalizations over vowel inventories found in natural language may arise through articulatory and perceptual constraints in usage rather than some more direct, innate specification. Coherent structure is primarily driven by positive feedback in this system, which comes in two forms: modification of prototypes toward perceived vowels, and merger of prototypes that get too close. These encourage the development of coherent vowel categories shared across the set of agents. Because vowels that are too perceptually confusable tend to be merged, the set of surviving vowels tends towards a perceptually ‘optimal’ arrangement.

    Pierre-Yves Oudeyer has used an abstractly similar, more physiologically grounded model of a perception/production feedback loop to argue that positive feedback inherent in processing can create categorial distinctions in the absence of any functional pressure (Oudeyer 2002, 2006). Research in response biases of cortical fields of neurons shows that their output is well predicted by the aggregate response of the entire field, rather than by the output of the most highly activated neuron. From the set of activities of all neurons, it has been found that one can predict the perceived stimulus or motor output by computing the population vector over the field, namely, the sum of all preferred outputs of the set of neurons multiplied by their activities (Georgeopoulos, Schwartz, and Ketter 1986; for an account of the perceptual magnet effect (Kuhl 1991) based in this phenomenon, see Guenther and Gjaja 1996). The important feature of the population vector for our purposes is that it is shifted toward the center of the local distribution of outputs relative to the most highly activated neuron. Given a close mapping between perception and production (Oudeyer 2002, Fowler and Balantucci 2005), this property of cortical fields should produce positive feedback promoting the coalescence of perceptual-motor categories into well-defined distributions over many cycles of use.
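The population-vector readout itself is just an activity-weighted average. A schematic version in Python, using scalar 'preferred outputs' for simplicity (the original work concerns three-dimensional movement directions, and the tuning-curve parameters here are invented):

```python
import math

def gaussian_response(preferred, stimulus, width=0.1):
    """Each neuron's activity falls off with the distance between the
    stimulus and the neuron's preferred value."""
    return [math.exp(-((p - stimulus) ** 2) / (2 * width ** 2))
            for p in preferred]

def population_vector(preferred, activities):
    """The activity-weighted average of the neurons' preferred values."""
    return (sum(p * a for p, a in zip(preferred, activities))
            / sum(activities))

# neurons clustered toward the low end of the scale: the readout for a
# peripheral stimulus is pulled back toward the centre of the cluster
preferred = [0.2, 0.3, 0.4, 0.5, 0.6]
acts = gaussian_response(preferred, 0.58)
pv = population_vector(preferred, acts)
```

Here the most highly activated neuron prefers 0.6, but the population vector lands below the 0.58 stimulus, shifted toward the center of the local distribution of tuning values, which is the property the text highlights.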

    In Oudeyer’s model, linked motor and perceptual cortical fields are initially populated with randomly tuned neurons, such that there are no distinct coherent sound categories. Over the course of the simulation, randomly chosen production stimuli are produced by the motor field and processed by the perceptual field. In processing, each neuron in the perceptual field is activated by the production stimulus under the control of a Gaussian tuning factor responsive to the degree of match between the stimulus and a neuron’s preferred vector. The preferred vectors of all neurons that have been activated to some degree by the stimulus are then shifted toward that of the maximally active neuron, producing a reversion to the local mean. This update function acts to incrementally consolidate the vectors exhibited by the neural map, influenced by random peaks in the distribution of stimuli produced early in the simulation. The perceptual and motor fields are linked by an update function that shifts vectors in the motor field in parallel to those in the perceptual field, closing the perception/production feedback loop. The resulting positive feedback between perception and production allows a rapid collapse of the originally random distribution of vectors in the sensory map into a small number of coherent sound-motor categories. Oudeyer interprets this feature of his model to suggest that native features of our neurological production and perception apparatus may be designed to develop categories of a particular granularity, and that this feature may play a role in the development of phoneme inventories.
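A one-dimensional caricature of this update loop can make the collapse concrete. This is not Oudeyer's implementation (the real model uses linked motor and perceptual maps and a separate population-vector readout); the tuning width, learning rate, and map size below are invented:

```python
import random, math

def map_step(vectors, stimulus, width=0.05, rate=0.1):
    """One update in the spirit of Oudeyer's model, reduced to 1-D:
    every neuron activated by the stimulus shifts its preferred vector
    toward that of the maximally active neuron."""
    acts = [math.exp(-((v - stimulus) ** 2) / (2 * width ** 2))
            for v in vectors]
    target = vectors[max(range(len(vectors)), key=acts.__getitem__)]
    return [v + rate * a * (target - v) for v, a in zip(vectors, acts)]

random.seed(0)
vectors = [random.random() for _ in range(50)]    # random initial tuning
for _ in range(2000):
    # stimuli are drawn from the map itself, closing the feedback loop
    vectors = map_step(vectors, random.choice(vectors))
```

After a few thousand updates the fifty initially scattered preferred vectors typically collapse into a small number of tight clusters: coherent categories with no functional pressure driving them, seeded by random peaks in the early distribution.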

    Merger versus contrast maintenance

    A wide variety of experimental evidence indicates that individual percepts can leave detailed, long-lived traces in memory, and that these memory ‘exemplars’ influence future perception and production behavior (for reviews, see Tenpenny 1995, Johnson 1997; see also Pierrehumbert 2006b and volume 23 of The Linguistic Review). The influence of perception on production (Goldinger 2000, Nielsen 2007) creates the possibility of a perception/production feedback loop in which the effect of biases anywhere in the cycle can potentially build up over time within a single generation. Janet Pierrehumbert used an exemplar-based model of this loop to explore the consequences of feedback for merger between perceptually adjacent phonological categories (2001, 2002). In this model, categories consist of an abstract label and a set of stored perceptual exemplars that have been mapped to that category, where each exemplar is associated with an activation level that decays exponentially over time. No proposed mechanism in this particular model requires transmission between distinct agents, so as a simplification the simulation architecture uses a single category system in conversation with itself. Production proceeds by probabilistically choosing an exemplar in relation to activation level, averaging all the exemplar values within a set window around the chosen exemplar in proportion to their activations, and then adding a small amount of Gaussian noise to that average. Averaging within a window around a single exemplar creates a reversion to the mean of the local distribution, just as the use of the population vector does in Oudeyer’s simulation described above. Adding noise to production outputs keeps the distribution from collapsing to a single point through the effect of averaging and allows the system to evolve over time. 
To decide what label the new output should be categorized under, the summed activation level of exemplars within a set window around the output value is calculated for each label. The percept is then stored as a new exemplar under the category label with the highest score.

    Pierrehumbert showed that given this architecture, if two categories drift close enough that they begin to compete effectively for percepts along their mutual boundary, the category with greater overall exemplar activation tends to eventually absorb the less active category. This occurs through positive feedback between current activation and the ability to compete for percepts. All else being equal, an ambiguous percept is more likely to be mapped to a more active category than a less active one, which in turn makes the more active category yet more active relative to its competitor. This feedback results in more and more percepts being mapped to the more active category, until the activation of the other category eventually falls low enough that it effectively no longer exists. The more active category is nonetheless influenced by the other category in this process of merger: as it absorbs the other category, its peak shifts toward the joint category mean within perceptual space.
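A minimal exemplar sketch of this merger dynamic is given below. It is a deliberately stripped-down stand-in: Pierrehumbert's model tracks continuously decaying activation per exemplar, while here 'activation' is reduced to raw exemplar count and decay to forgetting the oldest exemplar; all parameter values are invented:

```python
import random

def produce(cat, window=0.5, noise=0.1):
    """Production: choose a stored exemplar at random, average the
    exemplars within a window around it, and add Gaussian noise."""
    target = random.choice(cat)
    near = [e for e in cat if abs(e - target) <= window]
    return sum(near) / len(near) + random.gauss(0, noise)

def categorize(x, cats, window=0.5):
    """Store x with the label whose exemplars near x have the greatest
    summed 'activation' (simplified here to their count)."""
    scores = [sum(1 for e in c if abs(e - x) <= window) for c in cats]
    return max(range(len(cats)), key=scores.__getitem__)

random.seed(3)
# the category at 0.6 starts with ten times the exemplar mass
# (i.e. much greater overall activation) than the category at 0.0
cats = [[random.gauss(0.0, 0.1) for _ in range(20)],
        [random.gauss(0.6, 0.1) for _ in range(200)]]
for _ in range(3000):
    src = random.randrange(2)
    x = produce(cats[src])
    cats[categorize(x, cats)].append(x)
    old = random.randrange(2)         # memory decay: forget an old exemplar
    if len(cats[old]) > 1:
        cats[old].pop(0)
```

Ambiguous outputs near the boundary are more often captured by the larger category, which therefore tends to absorb the smaller one over time while its own distribution shifts slightly toward the joint mean.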

    In all of the work reviewed above, local, similarity-based positive feedback drives coalescence of system elements into categorial groupings. In none of these accounts, however, is there any repulsive force that would prevent the steady merger of categories over time as they eventually drift into one another. As a consequence, the maintenance of multiple categories over the course of language change in these models would require either regular generation of new distinctions as in the de Boer model above or some mechanism to favor preservation of existing contrasts. Paul Boersma and Silke Hamann (2008) have approached the problem of contrast maintenance between existing sound categories through a constraint-based model that makes use of categorization accuracy on the part of a language learner. As a demonstration of their model, Boersma and Hamann simulate the evolution of category label/contents mappings within a unidimensional space. To concretize the model, they use the spectral frequency range of sibilants in human languages as the perceptual space. (In the following brief discussion of their model, I use an /s/ + /ʃ/ two-sibilant system as a running example, although more or fewer categories are possible.) The architecture of the model is vertical, in which a naive agent learns to associate part of the spectral frequency range with /s/ and another part with /ʃ/ by hearing examples from a teaching agent with feedback. After this learning phase, the agent becomes a teacher and produces examples of /s/ and /ʃ/ to a new learner agent, and so on. For the purposes of the argument, Boersma and Hamann assume that learning agents have acquired sound category labels from word patterns prior to the beginning of the simulation. As a result, learners know at the start how many sibilant categories their language has, but are not yet sure where their distributions lie within the frequency continuum. 
    Learners come to the simulation equipped with the ability to learn a relative strength of connection between a given spectral frequency and /s/ or /ʃ/ by constructing a ranking among Optimality Theoretic constraints banning the mapping of a particular frequency to a particular category label (Boersma 1997). The architecture of the perception grammar incorporates both frequency of presentation and categorization accuracy, with the result that the grammar is maximally ‘certain’ about mappings for sibilant frequencies slightly further apart than those it heard most often. Because subsequent production is based on sampling from the learned perception grammar rather than from the distribution of actually heard examples, an agent’s production favors a distribution of sibilants that is slightly better separated than the one she heard herself. This creates a positive feedback loop that promotes increasing contrast between categories over many teacher/learner cycles. However, agents’ productions are also influenced by ranked articulatory constraints that have the effect of biasing productions toward the center of the frequency continuum. Boersma and Hamann show that under the balancing influence of positive feedback via the perception grammar and negative feedback from articulatory constraints, categories evolve as well-spaced distributions with a joint center of gravity at the midpoint of the continuum.
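The key feedback step, producing from the perception grammar rather than from the heard distribution, can be caricatured as follows. This is a crude stand-in for the OT perception grammar and its error-driven learning, not a re-implementation of Boersma and Hamann's system: the frequency grid, preference scores, category means, and the `centre_bias` articulatory factor are all invented for illustration:

```python
import random

GRID = [i / 20 for i in range(21)]         # discretized spectral continuum

def snap(f):
    return min(GRID, key=lambda g: abs(g - f))

def learn(examples, step=0.1):
    """Error-driven learning of a per-frequency preference score:
    pref[g] > 0 means g is perceived as /s/, pref[g] < 0 as /sh/."""
    pref = {g: 0.0 for g in GRID}
    for f, label in examples:
        g = snap(f)
        if pref[g] > 0:
            guess = 's'
        elif pref[g] < 0:
            guess = 'sh'
        else:
            guess = random.choice(['s', 'sh'])
        if guess != label:                 # misperception: adjust the score
            pref[g] += step if label == 's' else -step
    return pref

def produce(pref, label, centre_bias=0.5):
    """Production samples from the perception grammar: frequencies the
    grammar is more 'certain' about are favoured, while an articulatory
    bias pulls productions toward the centre of the continuum."""
    sign = 1 if label == 's' else -1
    weights = [max(sign * pref[g], 0.0) * (1 - centre_bias * abs(g - 0.5))
               for g in GRID]
    return random.choices(GRID, weights=weights)[0]

random.seed(5)
heard = ([(random.gauss(0.35, 0.05), 's') for _ in range(500)]
         + [(random.gauss(0.65, 0.05), 'sh') for _ in range(500)])
random.shuffle(heard)
pref = learn(heard)
s_out = [produce(pref, 's') for _ in range(200)]
sh_out = [produce(pref, 'sh') for _ in range(200)]
```

Because the learner produces from its certainty scores rather than from the raw heard distribution, its output distributions stay on their own sides of the continuum, while the centre bias keeps them from drifting to the periphery.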

    The Boersma and Hamann model relies on error-feedback from the lexicon within the learner to drive contrast maintenance. Another model of contrast maintenance has been proposed for situations in which such error feedback is absent (Wedel 2004, Blevins and Wedel 2009). Error feedback is absent in this model when an ambiguous pronunciation is not rescued by an external context that determines its intended category mapping. As an example of external disambiguation, words like 'too' and 'two' in English are rarely confused by listeners because they are used in distinct sentential and semantic contexts. In contrast, because of their semantic similarity, words within morphological paradigms are often distinguished in context primarily by their phonetic differences. For example, utterances like ‘I cook chicken well' and ‘I cooked chicken well' could be used in very similar contexts, in which case the tense of the verb 'cook' is conveyed almost entirely by the audible presence or absence of the past-tense [-t].

    A hypothesis introduced in Wedel (2004) and explored more deeply in Blevins and Wedel (2009) is that this effectively greater 'functional load' of word-internal phonetic information within paradigms may account for anti-homophony effects in paradigms. 'Anti-homophony' refers to the failure of otherwise regular sound changes to occur in words when that change would render them homophonous with another word (see Blevins and Wedel 2009 for a review and examples). As in the Pierrehumbert model of category-merger reviewed above, this model rests on evidence that category behavior is updated by experience: if a pronunciation is ambiguous in context, a hearer may map it to a category that the speaker did not intend, resulting in the effective trading of a variant between the two categories at their boundary. It is this ‘variant trading’ between perceptually adjacent categories that drives the behavior of the model by preserving a crisp boundary between adjacent categories (see Blevins and Wedel 2009 for more discussion). The interaction of the component parameters of the model is illustrated in a simple exemplar-based simulation provided below.

    The simulation follows the evolution of two one-dimensional categories in a production/perception loop in which outputs are produced from each category in every round and then re-stored in one of the two categories. The two categories are represented in the model by red and blue square exemplars whose value is represented by their position along the x-axis. Multiple exemplars that have the same value are stacked vertically, such that the image of the two categories on the screen is equivalent to the distribution of exemplar values in each category. Categories can freely overlap in this simulation, represented by exemplars of different colors stacked at the same position. Slow memory decay is modeled through eventual deletion of older exemplars.

    In each round, several exemplars are randomly chosen from each of the categories in turn and an averaged output is produced (cf. averaging in Pierrehumbert’s model of merger (2002) and the use of a population vector in Oudeyer’s model of category formation (2002), discussed above). Gaussian noise is added to outputs in production to ensure that novel variants are constantly added to the system. Each variant output is in turn compared to the average values of the two categories and stored as a new exemplar in the category with the closest mean. This continual reproduction of exemplars in the form of new outputs and deletion of old exemplars allows the categories to evolve over time. The important feature of this simulation is the ability of an output to ‘switch colors’: if an output from the red category happens to be more similar to the blue category, it will be stored as a new exemplar of the blue category instead of the red. This is the variant trading that is at the heart of the model.
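The simulation just described can be sketched as follows. This is a simplified re-implementation for illustration, not the original NetLogo code; the noise level, averaging window, and memory cap are invented values:

```python
import random

def step(cats, trading, noise=0.03, n_avg=5, max_size=200):
    """One production/perception round for two exemplar categories,
    each a list of values on one dimension."""
    for i in range(2):
        # production: average several sampled exemplars, add noise
        sample = [random.choice(cats[i]) for _ in range(n_avg)]
        out = sum(sample) / n_avg + random.gauss(0, noise)
        if trading:
            # store with whichever category's mean is closest
            means = [sum(c) / len(c) for c in cats]
            target = min(range(2), key=lambda j: abs(out - means[j]))
        else:
            target = i           # context always disambiguates: no trading
        cats[target].append(out)
    for c in cats:               # slow memory decay: drop oldest exemplars
        while len(c) > max_size:
            c.pop(0)

def overlap(cats):
    """Fraction of exemplars lying on the 'wrong' side of the midpoint
    between the two category means."""
    means = [sum(c) / len(c) for c in cats]
    mid = sum(means) / 2
    lo, hi = (0, 1) if means[0] <= means[1] else (1, 0)
    wrong = (sum(1 for e in cats[lo] if e > mid)
             + sum(1 for e in cats[hi] if e < mid))
    return wrong / (len(cats[0]) + len(cats[1]))

random.seed(7)
cats = [[-0.02] * 50, [0.02] * 50]   # nearly identical starting categories
for _ in range(5000):
    step(cats, trading=True)
```

With `trading=True` the overlap between the two categories stays low, because an output landing nearer the other category's mean 'switches colors' and reinforces that category's side of the boundary; rerunning with `trading=False` lets the two distributions drift back into one another.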

    To start, click on 'set-up', and set the 'Variant-trading' switch to 'off'. You'll see a stack of red and blue squares at the center of the x-axis; at this starting point the two categories have identical contents, both occupying the central position within the available dimension. When Variant-trading is set to off, outputs are always stored back into their originating category, i.e., an output from the red category will always appear as a new red square. This represents the situation in which external context nearly always disambiguates forms no matter how phonetically ambiguous they are, as for the English examples ‘too’ and ‘two’. Click on 'go once' a few times and notice how the originally narrow distribution shared by both categories relaxes under the influence of noise in production. Also notice that the distribution of red and blue exemplars remains mixed. (The percentage overlap between the red and blue categories is tracked over time in the lower-left graph.) If you'd like to see how the simulation evolves over time under these conditions, click 'go'. To stop the simulation, click 'go' again.

    Now click 'set-up' again to reset the simulation, and set 'Variant-trading' to 'on'. With this setting, outputs are stored in the category with the closest mean value. For example, if an output from the red category happens to be closer to the current average value of the blue category, it will appear as a blue square. Click on 'go once' a few times and notice how the two colors quickly segregate to separate sides of the overall distribution. Now click on 'go' and watch the evolution of the two categories over time. Notice that the two categories always remain separate. To stop the simulation, click on 'go' again. If you restart the simulation with ‘Variant-trading’ set to ‘off’ after the red and blue categories have become well-separated, you'll notice that while the two categories may remain separate at first, they eventually drift back together. It is this approach and effective merger that is inhibited by variant trading between the two categories. You can find the code for this simulation at http://dingo.sbs.arizona.edu/~wedel/simulations/1DVariantTrading.html

    Looking forward

    Our ability to build theories is limited by the knowledge that we already have. Whether or not any of the current self-organization-based accounts in phonology are ‘right’, they are valuable in expanding our understanding of pattern-formation mechanisms. Self-organization is ubiquitous in physical, biological and cultural systems, and given that language provides the conditions for self-organization many times over, linguists should anticipate finding it as a contributing mechanism in this domain as well. Just as with any other type of account, however, showing that a particular structure could arise through self-organization does not mean that it does so. A hypothesis stands or falls on the empirical success of its predictions. Arguments from first principles about, for example, whether phonological patterns are more likely to derive from a language-specific cognitive faculty or a more general set of factors are less valuable than well-constructed tests of model predictions. Fortunately, a wide variety of techniques and approaches are now available for testing hypotheses, from the ever-growing array of psycholinguistic techniques through corpus studies and artificial language learning paradigms.

    Self-organizational models for structure formation make use of previously identified cognitive, articulatory, perceptual or social factors as contributing building blocks. In turn, these models make further predictions about those factors, or may predict some yet-undescribed phenomenon. For example, self-organizing models of phonological change through usage require a production/perception feedback loop that can drive small, but persistent and generalizable changes in post-acquisition phonological categories (e.g., Bybee 2001, Pierrehumbert 2001, 2002, Mielke 2004, Wedel 2007). Although more work needs to be done to establish their generality, results from a variety of psycholinguistic studies are consistent with this prediction (e.g., Goldinger 2000, Nielsen 2007, Kraljic and Samuel 2006). Likewise, each of the other models reviewed above makes new predictions that can be tested empirically. As phonologists bootstrap back and forth between model building and simulation on the one hand and empirical methods on the other, the field should gain a steadily better sense of whether and how self-organizational mechanisms contribute to the wide variety of phenomena that we study.

    References

    Albright, A. (2002). Islands of reliability for regular morphology: Evidence from Italian. Language 78: 684-709.

    Albright, A. and Hayes, B. (2002). Modeling English Past Tense Intuitions with Minimal Generalization. In M. Maxwell (ed.) Proceedings of the Sixth Meeting of the ACL Special Interest Group in Computational Phonology.

    Baayen, R.H. (2007). Storage and computation in the mental lexicon. In G. Jarema and G. Libben (eds.), The Mental Lexicon: Core Perspectives. Elsevier, 81-104.

    Beckner, Clay, Richard Blythe, Joan Bybee, Morten H. Christiansen, William Croft, Nick C. Ellis, John Holland, Jinyun Ke, Diane Larsen-Freeman, and Tom Schoenemann. (2009). Language is a complex adaptive system. Language Learning 59:4S1.

    Blevins, J. (2004). Evolutionary Phonology: The emergence of sound patterns. Cambridge: Cambridge University Press.

    Blevins, J. and Wedel, A. (2009) Inhibited sound change: An evolutionary approach to lexical competition. Diachronica, 26: 143-183.

    Blust, R. (2007). Disyllabic attractors and anti-antigemination in Austronesian sound change. Phonology 24: 1–36.

    Bod, Rens, Jennifer Hay, and Stefanie Jannedy (eds) (2003). Probabilistic linguistics. Cambridge, MA: The MIT Press.

    Boersma, P. and Hamann, S. (2008). The evolution of auditory dispersion in bidirectional constraint grammars. Phonology 25, 217-270.

    Bybee, J. (2001). Phonology and language use. Cambridge: Cambridge University Press.

    Bybee, Joan L. and Carol Lynn Moder. (1983). Morphological classes as natural categories. Language 59. 251-270.

    Bybee, J. and McClelland J.L. (2005). Alternatives to the Combinatorial paradigm of linguistic theory based on domain general principles of human cognition. The Linguistic Review 22: 381–410.

    Cooper, D. L. (1999). Linguistic Attractors: The cognitive dynamics of language acquisition and change. Amsterdam/Philadelphia: John Benjamins Publishing.

    de Boer, B. (2000). Self-organization in vowel systems. Journal of Phonetics 28(4): 441-465.

    Croft, William. (2000). Explaining language change: An evolutionary approach. Harlow, Essex: Longman.

    Cziko, G. (1995). Without Miracles: Universal Selection Theory and the Second Darwinian Revolution. Cambridge, MA: MIT Press.

    Dawkins, Richard (1986). The Blind Watchmaker. New York: W. W. Norton & Company.

    Deacon, T. W. (1997). The Symbolic Species: The Co-evolution of Language and the Brain. W.W. Norton.

    Dell, Gary S., Kristopher D. Reed, David R. Adams, and Antje S. Meyer (2000). Speech errors, phonotactic constraints, and implicit learning: A study of the role of experience in language production. Journal of Experimental Psychology: Learning, Memory, and Cognition 26: 1355–1367.

    Dennett, D. (1995). Darwin’s Dangerous Idea: Evolution and the Meanings of Life. New York: Simon and Schuster.

    Elman, J.L. (1995). Language as a dynamical system. In R. Port and T. van Gelder (Eds.), Mind as Motion: Dynamical Perspectives on Behavior and Cognition. Cambridge, MA: MIT Press.

    Ferguson, C. A. & Farwell, C. B. (1975). Words and sounds in early language acquisition. Language, 51, 419–439. Reprinted with Appendix in W. S.-Y. Wang (Ed.). (1977). The Lexicon in Phonological Change The Hague: Mouton. 7–68.

    Gardner, M. (1970). Mathematical Games. Scientific American 223: 120-123.

    Garrett, A. (2008). Paradigmatic Uniformity and Markedness. In Jeff Good (ed.) Linguistic Universals and Language Change. Oxford University Press. 125-143.

    Gell-Mann, Murray. (1992). Complexity and complex adaptive systems. In J. A. Hawkins and M. Gell-Mann (eds.), The evolution of human languages. New York: Addison-Wesley. 3–18.

    Georgopoulos, A. P., Schwartz, A. B., and Kettner, R. E. (1986). Neuronal population coding of movement direction. Science 233: 1416–1419.

    Goldinger, S. D. (2000) The role of perceptual episodes in lexical processing. In Cutler, A., McQueen, J. M, and Zondervan, R. Proceedings of SWAP (Spoken Word Access Processes) , Nijmegen, Max-Planck-Institute for Psycholinguistics.

    Gonnerman, L.M., Seidenberg, M.S., & Andersen, E.S. (2007). Graded semantic and phonological similarity effects in priming: Evidence for a distributed connectionist approach to morphology. Journal of Experimental Psychology: General 136, 323-345.

    Guenther, Frank H., and Gjaja, Marin. N. (1996). The perceptual magnet effect as an emergent property of neural map formation. Journal of the Acoustical Society of America 100, 1111-1121.

    Hare, M. and Elman, J.L. (1995). Learning and morphological change. Cognition 56, 61-98.

    Haspelmath, Martin. (1999). Optimality and diachronic adaptation. Zeitschrift für Sprachwissenschaft 18:180–206.

    Hock, Hans H. (2003). Analogical Change. In: Brian Josephs and Richard Janda, (eds.) The Handbook of Historical Linguistics. Blackwell 441-480.

    Hurford, J. (1999). The evolution of language and languages. In R. Dunbar, C. Knight, and C. Power (eds.) The Evolution of Culture. Edinburgh: Edinburgh University Press.

    Johnson, K. (1997). Speech perception without speaker normalization. In Johnson, K. and Mullennix J. W. (eds.) Talker Variability in Speech Processing. San Diego: Academic Press.

    Kauffman, S. (1995). At Home in the Universe: The Search for the Laws of Self-Organization and Complexity. Oxford University Press.

    Keller, R. (1994) On Language Change: The Invisible Hand in Language London: Routledge.

    Kirby, S. (1999). Function, Selection and Innateness: The Emergence of Language Universals. Oxford: Oxford University Press.

    Kirby, S., Cornish, H., and Smith, K. (2008). Cumulative Cultural Evolution in the Laboratory: an experimental approach to the origins of structure in human language. Proceedings of the National Academy of Sciences 105(31):10681-10686.

    Kraljic, T. & Samuel, A. (2006). Generalization in perceptual learning for speech. Psychonomic Bulletin & Review, 13, 262-268.

    Krott, A., Schreuder, R. and Baayen, R. H. (2002): Analogical hierarchy: Exemplar-based modeling of linkers in Dutch noun-noun compounds. In R. Skousen, D. Londsdale, & D.B. Parkinson (eds.), Analogical Modeling: An Exemplar-Based approach to Language Amsterdam: John Benjamins. 181-206.

    Kuhl, Patricia K. (1991). Human adults and human infants show a perceptual magnet effect for the prototypes of speech categories, monkeys do not. Perception and Psychophysics 50, 93-107.

    Liljencrants, J. & Lindblom, B. (1972): Numerical simulation of vowel quality systems: The role of perceptual contrast. Language 48: 839-862.

    Labov, W. (1994). Principles of linguistic change, Vol. 1, Internal Factors. Oxford and Cambridge, MA: Blackwell.

    Lindblom, B. (1992). Phonological units as adaptive emergents of lexical development. In: C. A. Ferguson, L. Menn, and C. Stoel-Gammon, (eds.), Phonological Development: Models, Research, Implications. Timonium: York Press. 131-163.

    Lindblom, B., MacNeilage, P. and M. Studdert-Kennedy. (1984). Self-organizing processes and the explanation of language universals. In B. Butterworth, B. Comrie and Ö. Dahl, (eds.), Explanations for language universals. Walter de Gruyter & Co.

    MacWhinney, B. (2006) The emergence of linguistic form in time. Connection Science 17, 191-211.

    Medeiros, D. (2008). Optimal Growth in Phrase Structure. Biolinguistics 2: 156-195.

    Mielke, J. (2008). The emergence of distinctive features. Oxford: Oxford University Press.

    Nettle, D. (1999) Using Social Impact Theory to simulate language change. Lingua, 108(2-3):95--117.

    Nielsen, K. (2007). The Interaction between Spontaneous Imitation and Linguistic Knowledge. UCLA Working Papers in Phonetics 105: 125-137.

    Lindblom, B. (2000). Developmental origins of adult phonology: the interplay between phonetic emergents and the evolutionary adaptations of sound patterns. Phonetica, 57, 297-314.

    Ohala, J. (1981). The Listener as a Source of Sound Change. In Proceedings of the Chicago Linguistics Society 17, Papers from the Parasession on Language and Behavior, 178-203.

    Ohala, J. (1989). Sound change is drawn from a pool of synchronic variation. In Breivik, L. E. and Jahr, E. H. (eds.) Language Change: Contributions to the study of its causes. Berlin: Mouton de Gruyter.

    Oudeyer, Pierre-Yves (2002). Phonemic coding might be a result of sensory-motor coupling dynamics. In Bridget Hallam, Dario Floreano, John Hallam, Gillian Hayes, Jean-Arcady Meyer, (eds.) Proceedings of the 7th International Conference on the Simulation of Adaptive Behavior. Cambridge: MIT Press. 406-416.

    Oudeyer, P-Y. (2006) Self-Organization in the Evolution of Speech. Oxford University Press.

    Peck, S. L. (2004). Simulation as experiment: a philosophical reassessment for biological modeling. Trends in Ecology and Evolution 19: 530-534.

    Phillips, B. S. (1984). Word frequency and the actuation of sound change. Language 60: 320–342.

    Pierrehumbert, J. (2001). Exemplar dynamics: Word frequency, lenition, and contrast. In Bybee, J and P. Hopper (eds.) Frequency effects and the emergence of linguistic structure. John Benjamins, Amsterdam, 137-157.

    Pierrehumbert, J. (2002). Word-specific phonetics. In Gussenhoven, C and Warner, N. (eds.) Laboratory Phonology 7. Berlin; New York : Mouton de Gruyter.

    Pierrehumbert, J. (2003). Phonetic Diversity, Statistical Learning, and Acquisition of Phonology. Language and Speech 46: 115-154.

    Pierrehumbert, J. (2006a) The Statistical Basis of an Unnatural Alternation, in L. Goldstein, D.H. Whalen, and C. Best (eds), Laboratory Phonology VIII, Varieties of Phonological Competence. Mouton de Gruyter, Berlin, 81-107.

    Pierrehumbert, J. (2006b). The new toolkit. Journal of Phonetics 34: 516-530.

    Pisoni, D. B. & Levi, S. V. (2007). Some observations on representations and representational specificity in speech perception and spoken word recognition. In G. Gaskell (ed.), The Oxford Handbook of Psycholinguistics. Oxford University Press. 3-18.

    Plaut, David C. and Christopher T. Kello (1999). The Emergence of Phonology From the Interplay of Speech Comprehension and Production: A Distributed Connectionist Approach. In Brian MacWhinney (ed.), Emergence of Language. Hillsdale, NJ: Lawrence Erlbaum Associates.

    Prince, Alan and Paul Smolensky (1993). Optimality Theory: Constraint interaction in generative grammar. Technical Report CU-CS-696-93, Department of Cognitive Science, University of Colorado at Boulder, and Technical Report TR-2, Rutgers Center for Cognitive Science, Rutgers University, New Brunswick, NJ.

    Steels, Luc (2000). Language as a Complex Adaptive System. In M. Schoenauer (ed.), Proceedings of PPSN VI, Lecture Notes in Computer Science. Berlin: Springer-Verlag. 17-26.

    Stemberger, J., & MacWhinney, B. (1988). Are inflected forms stored in the lexicon? In M. Hammond, & M. Noonan (eds.), Theoretical Morphology. New York: Academic Press.

    Tenpenny, P. L. (1995). Abstractionist versus episodic theories of repetition priming and word identification. Psychonomic Bulletin and Review 2: 339-363.

    Vihman, M. M. & Croft, W. (2007). Phonological development: Toward a ‘radical’ templatic phonology. Linguistics 45: 683-725.

    Vihman, M., DePaolis, R. A., & Keren-Portnoy, T. (2009). A dynamic systems approach to babbling and words. In E. L. Bavin (ed.) The Cambridge Handbook of Child Language. Cambridge: Cambridge University Press. 163-182.

    Vitevitch, M. S. and Sommers, M. (2003). The facilitative influence of phonological similarity and neighborhood frequency in speech production. Memory & Cognition 31:491-504.

    Wang, William S.-Y. (1969). Competing changes as a cause of residue. Language 45: 9-25.

    Wang, William S.-Y., & Cheng, C.-C. (1977). Implementation of phonological change: The Shuangfeng Chinese case. In W. Wang (ed.), The lexicon in phonological change. The Hague: Mouton. 86-100.

    Wedel, A. (2004). Category competition drives contrast maintenance within an exemplar-based production/perception loop. In Goldsmith, J. and Wicentowski, R. (eds.) Proceedings of the Seventh Meeting of the ACL Special Interest Group in Computational Phonology 7: 1-10.

    Wedel, Andrew. (2007). Feedback and regularity in the lexicon. Phonology 24: 147-185.

    Wedel, Andrew (2009). Variation, multi-level selection and conflicts between phonological and morphological regularities. In Blevins, J. P. and Blevins, J. (eds.) Analogy in Grammar: Form and Acquisition. Oxford University Press.