Self-organization and Selection
in the Emergence of Vocabulary
Authors:
Jinyun Ke
Language Engineering Laboratory
Department of Electronic Engineering
City University of Hong Kong, Hong Kong
phone: (852)27887187
fax: (852)27887791
email: jyke@ee.cityu.edu.hk
James Minett
Department of Electronic Engineering
City University of Hong Kong, Hong Kong
Ching-Pong Au
Department of Chinese, Linguistics and Translation
City University of Hong Kong, Hong Kong
William S-Y Wang
Language Engineering Laboratory
Department of Electronic Engineering
City University of Hong Kong, Hong Kong
Number of text pages: 13
Number of figures: 7
Number of tables: 9
This is a preprint of an article accepted for publication in Complexity copyright (2002)
Abstract
Human language may have started from a consistent set of mappings between meanings
and signals. These mappings, referred to as the early vocabulary, are considered to be the
results of conventions established among the agents of a population. In this study, we report
simulation models for investigating how such conventions can be reached. We propose that
convention is essentially the product of self-organization of the population through
interactions among the agents, and that cultural selection is another mechanism that speeds up
the establishment of convention. Whereas earlier studies emphasized either one or the other of
these two mechanisms, our focus is to integrate them into one hybrid model. The combination
of these two complementary mechanisms, i.e. self-organization and cultural selection,
provides a plausible explanation for cultural evolution, which progresses at a high
transmission rate. Furthermore, we observe that as the vocabulary converges it consistently
tends to exhibit a sharp phase transition.
Summary
Language is one of the defining characteristics specific to humans. It is well known that
language evolves continuously at a high rate. However, the questions of why and how human
language emerged and changes are still being pursued. In this study, we speculate that human
language may have started from a consistent set of mappings between meanings and signals.
These mappings are considered to be the results of conventions established among the
individuals of a population. We use simulation models to investigate how such conventions can
be reached.
Keywords
language evolution, emergence, vocabulary, self-organization, selection
1 Introduction
Language is generally considered to be one of the most important characteristics that
differentiate humans from other species. The question of how fully-fledged human language
came into being has been pursued for centuries; various theories have been suggested and
many controversies exist [1]. Recently there has been growing interest in the study of
language emergence and change within the framework of complex adaptive systems [2]. In this
study, we focus on the emergence of vocabulary within a population of early humans using
simulation models, an approach often adopted in the study of complex adaptive systems.
Most would agree that the first step in human language evolution was communication
systems using a number of holistic signals, like those found in primates and other animals
such as bees and birds [3]. These signals are not equivalent to the “names” or “words” as
we use these terms today. Each signal may refer instead to a complex mix of meanings. The
signal for “danger”, for example, may imply a cry of fear, a warning – “run, there is danger
coming”, as well as a general reference for the dangerous predator. Later, such a signal
would have been narrowed down to name a single object or class of objects, and used in a
non-situation-specific fashion [4]. The realization of signals as symbols representing objects or
events is referred to as the “naming insight” [5]. In the language development of a normal child,
the naming insight comes so naturally that most parents do not notice the exact moment of
this event unless they pay special attention. In contrast, chimpanzees need intensive teaching
to learn to name [6].
How the naming insight occurred in the phylogeny of the hominid line is still a mystery.
However, we may assume that there was a stage during which early humans became equipped
with the naming insight. What interests us in this study is the following question: once the
agents had achieved the naming insight, each perhaps naming things in his or her own arbitrary
way, how did the actual naming of objects or events become consistent across the entire
population? Some twenty centuries ago, two great philosophers, continents apart, arrived
at a similar observation that names are formed by convention [7]. Xunzi in China taught
that “names have no intrinsic appropriateness” and “names have no intrinsic reality” —the
appropriateness and reality of names are both given by convention. At about the same time
in Greece, Plato wrote that “any name which you give is the right one, and if you change
that and give another, the new name is as correct as the old.”
In this study, we use simulation models and mathematical analysis to explore the process
by which convention is achieved to form a coherent naming system, which we may refer to as
a vocabulary. In our models, at first, a number of agents in a population each have their own
agent-specific way of naming a set of objects. Upon interacting with each other, some are led
to modify their naming systems. Agents need not be concerned with the general communication
performance at the population level, but instead concentrate only on their own communication
performance with other agents. Without any explicit or implicit
design, however, a consistent common vocabulary develops as an emergent property of the
population. Our proposed model of vocabulary emergence shares the same spirit as the
“invisible hand” mechanism suggested by Keller [8], which has been echoed by Kirby: “the
local, individual actions of many speakers, hearers, and acquirers of language across time
and space conspire to produce non-local, universal patterns of variation” [9]. In fact such
phenomena have been widely studied in many complex systems in various disciplines and are
described as “self-organization” [10]. A system is defined as self-organizing “if it acquires
a functional, spatial, or temporal structure without specific interference from the outside”
[11]. In recent studies of language emergence, there have been several reports explicitly
adopting this framework, which use computer simulation models to study the self-organization
mechanism of the emergence of vocabulary [12] and sound systems [13].
From an evolutionary point of view, self-organization plays a role in the synchronic,
horizontal interaction within a population. However, self-organization does not address the
diachronic aspect of the evolutionary process, i.e. from generation to generation. On the other
hand, natural selection has been generally considered as the major mechanism for evolution
in terms of gene transmission across generations, i.e. vertical transmission. In the study of
language evolution there are two views related to natural selection. In one view, language
is the product of a biological organ, the Language Acquisition Device (LAD), and
it is the LAD that must have evolved through natural selection [14]. Alternatively,
Dawkins introduced the term “meme” to refer to a new type of replicating unit in cultural
evolution as a counterpart of the gene in biological evolution [15]. In this sense, a language is
an elaborate complex of memes, which evolve based on linguistic or cultural selection¹. It is
the languages themselves that adapt for their own “survival” in the transmission from speaker
to speaker. It is impossible to evaluate which language, e.g. English, Chinese or any other,
is better, in terms of its overall fitness by taking into account learnability, expressiveness,
complexity and so on. However, we argue that when considering some specific aspects,
for example the possibility of ambiguity in terms of the number of homophones², different
language subsystems can indeed be compared with respect to their fitness. Subsystems with
higher fitness will have a higher probability of diffusing into the next generation. In this
study we assume that vocabulary can be transmitted from generation to generation as a result
of children’s learning from parents and that the communicative consistency of the vocabulary
determines the probability with which it is transmitted to the next generation. In this way, we
study the effect of cultural selection in this vertical transmission process and examine how
selection couples with self-organization in catalyzing the formation of a consistent vocabulary
in the population.
In the study of modeling vocabulary emergence, there have been a few related studies.
Among them, Steels [12], as mentioned above, focuses on the self-organization mechanism,
while Hurford [16] and Nowak & Krakauer [17] mainly emphasize the selection process. In this
study, we propose that both mechanisms are indispensable for structure emergence in
evolution. Using the “tinkerer” metaphor introduced by Jacob [18]³, we highlight the combined
function of these two “tinkerers” in language evolution.
The rest of the paper is organized as follows. First, we report an imitation model in
Section 2, which simulates how a common vocabulary is formed by agents imitating each other
either merely randomly or by following the majority. We use Markov chain theory to analyze
the model. A detailed proof of the convergence is given in the appendix. In Section 3 we
present an interaction model which uses a probabilistic representation of vocabulary. Different
parameters in the model are investigated by simulation and a few interesting observations
are reported. Section 4 introduces a hybrid model which combines the self-organization and
cultural selection mechanisms. Conclusions and discussion are given in the last section.
2 The imitation model
The strong human capacity for imitation, even from early infancy, has been extensively
documented by many investigators, e.g. Meltzoff [19]. While other
social animals, particularly the primates, also imitate, it appears that the tendency is by far
the strongest and most general in our species. We assume that imitation may serve as the
most explanatory mechanism for the formation of a common vocabulary. Before establishing
a consistent way of naming things, early humans very likely made use of their propensity for
imitation: the younger ones imitating their elders, the followers imitating the leaders or, just
by chance, their neighbors. Such imitation between agents can be seen as self-organization
in the population.

¹ In this study, however, we still use the term “natural selection” in a broad sense to refer to the selection
mechanism in language evolution.
² Two or more words are homophones if they are pronounced the same but differ in meaning, such as the
words ‘too’ and ‘two’.
³ Jacob states that “natural selection ... works like a tinkerer, who does not know exactly what he is going
to produce, but uses whatever he finds around him, whether it be pieces of string, fragments of wood, or old
cardboards, ... to produce some kind of workable object.”
In this study, we set up a model to simulate the process of word imitation by agents in a
population. Assume that in a population of $P_s$ agents there are $M$ classes of objects⁴ that
the agents must name in their daily communication, and $U$ different utterances⁵ that the
agents can make. Each agent’s vocabulary consists of a set of associations, or one-to-one
mappings, between meanings and utterances. Each agent can create and change his or her
own vocabulary by imitation, similar to the model proposed by Steels [12]. We assume that,
at first, each agent already has his or her own specific meaning-utterance mappings. For
example, the M-U mappings of two agents (A and B) might be as shown in Table 1.
Table 1 about here
Let us assume that when they interact with each other, the two agents only communicate
a single meaning. When they are not using the same word to represent that meaning, an
imitation event is likely to occur; for example, A may imitate B when they communicate
meaning $m_1$ by replacing his own $u_2$ with B’s $u_5$, or vice versa. We assume that each agent
is equipped with an imitation strategy. For simplicity, we assume that such imitation events
involve no errors. Many imitation strategies can be conceived. In
this study we report two strategies that may be the most realistic⁶:
Strategy 1: imitating by random direction—either A imitates B, or vice versa, with
equal probability.
Strategy 2: imitating by following the majority—A imitates B if the utterance B uses
is shared by more agents than the utterance A uses.
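
The two strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the random initial vocabularies, the uniform choice of agent pairs and meanings, the symmetric reading of Strategy 2 (whichever agent holds the less widely shared utterance imitates the other), and the random tie-break are all assumptions made for illustration. For brevity the sketch also does not enforce one-to-one mappings within an agent.

```python
import random
from collections import Counter


def make_population(num_agents, num_meanings, num_utterances, rng):
    """Assumed initial state: each agent maps every meaning to a random utterance index."""
    return [[rng.randrange(num_utterances) for _ in range(num_meanings)]
            for _ in range(num_agents)]


def imitate_random(pop, a, b, meaning, rng):
    """Strategy 1: either A imitates B or B imitates A, with equal probability."""
    if rng.random() < 0.5:
        pop[a][meaning] = pop[b][meaning]
    else:
        pop[b][meaning] = pop[a][meaning]


def imitate_majority(pop, a, b, meaning, rng):
    """Strategy 2: the agent whose utterance is shared by fewer agents imitates the other."""
    counts = Counter(agent[meaning] for agent in pop)
    ua, ub = pop[a][meaning], pop[b][meaning]
    if counts[ub] > counts[ua]:
        pop[a][meaning] = ub
    elif counts[ua] > counts[ub]:
        pop[b][meaning] = ua
    else:
        # Tie-break is an assumption; the text does not specify this case.
        imitate_random(pop, a, b, meaning, rng)


def step(pop, strategy, rng):
    """One interaction: two distinct agents communicate a single meaning."""
    a, b = rng.sample(range(len(pop)), 2)
    meaning = rng.randrange(len(pop[0]))
    if pop[a][meaning] != pop[b][meaning]:
        strategy(pop, a, b, meaning, rng)  # error-free imitation event


def is_converged(pop):
    """True when every agent uses the same utterance for every meaning."""
    return all(agent == pop[0] for agent in pop)


def run(num_agents=10, num_meanings=3, num_utterances=5,
        strategy=imitate_majority, max_steps=100_000, seed=0):
    """Iterate interactions until the vocabularies agree or max_steps is reached."""
    rng = random.Random(seed)
    pop = make_population(num_agents, num_meanings, num_utterances, rng)
    for t in range(max_steps):
        if is_converged(pop):
            return pop, t
        step(pop, strategy, rng)
    return pop, max_steps
```

Calling `run(strategy=imitate_random)` or `run(strategy=imitate_majority)` returns the final vocabularies together with the number of interactions performed, which gives a rough way to compare how quickly each strategy reaches a common vocabulary under these assumptions.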
Before studying how imitation affects agents’ vocabulary in a population, we need to
design some criteria to evaluate how well the vocabularies of the members of a population
work for conveying meanings. First, we need to consider the population consistency, denoted
by $C$, i.e. how many consistent meaning-utterance mappings there are among the agents.
Second, we consider the effect of homophones, which are words that have the same form but
different meanings. The existence of homophones is likely to cause confusion. Therefore,
assuming that the fewer the homophones, the better the vocabulary, we select as a second
criterion the distinctiveness of the vocabulary, denoted by $D$. $C$ and $D$ are calculated
as follows. For a population with $P_s$ agents,
1. The overall consistency for all meanings is

$$C = \frac{1}{M} \sum_{i=1}^{M} C_i$$

⁴ A class of objects containing a single object is permissible.
⁵ In this report “utterance”, “signal” and “word” are used interchangeably for the sake of convenience,
particularly when discussing other studies, though we acknowledge that there are important differences among
them.
⁶ We have considered other strategies such as following the authority and homophone avoidance. The
detailed simulation results are reported in an earlier study by Wang & Ke [20].