Language projects

Case grammar experiment

The case grammar experiment investigates how a population of autonomous artificial agents can self-organize a grammatical language with properties similar to those found in case languages such as Latin, German and Turkish. The case grammar experiment is the most advanced experiment in artificial language evolution to date: it is the first multi-agent experiment to feature multifunctional, polysemous grammatical categories, whereas in other experiments in the field the agents always converge on one-to-one mappings.

In the experiment, the agents have to describe dynamic real-world scenes to each other. Starting with a lexicon but without grammar, the agents gradually introduce case markers in order to reduce the cognitive load of interpreting sentences. For example, an utterance such as boy girl pushed is ambiguous as to who did what to whom, so the hearer has to witness the scene to find the correct meaning. Adding grammar can spare the hearer this effort by pointing directly to the correct interpretation. For example, in the utterance boy-bo girl-ka push, the grammatical markers indicate that the girl was being pushed and that the boy was the pusher.
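The disambiguating effect of the markers can be illustrated with a small sketch. It is not the experiment's implementation: the function `interpretations`, the role names `pusher` and `pushed`, and the dictionary that maps markers to roles are all hypothetical and chosen only to mirror the example above.

```python
from itertools import permutations

# Hypothetical role labels for the "push" event in the example above.
ROLES = ("pusher", "pushed")

def interpretations(participants, marker_roles):
    """List every role assignment that is compatible with the utterance.

    participants: noun forms such as "boy" or "boy-bo" (the verb is left out).
    marker_roles: assumed mapping from a case marker to the role it signals.
    """
    nouns = []
    for form in participants:
        stem, _, marker = form.partition("-")
        nouns.append((stem, marker_roles.get(marker)))  # None if unmarked

    results = []
    for roles in permutations(ROLES, len(nouns)):
        # Keep the assignment only if it agrees with every marker present.
        if all(fixed is None or fixed == role
               for (_, fixed), role in zip(nouns, roles)):
            results.append({stem: role for (stem, _), role in zip(nouns, roles)})
    return results

# Without markers the hearer is left with two candidate readings.
print(interpretations(["boy", "girl"], {}))
# With markers only one reading remains.
print(interpretations(["boy-bo", "girl-ka"], {"bo": "pusher", "ka": "pushed"}))
```

In this toy setting the unmarked utterance leaves two readings that only the scene can resolve, while the marked utterance leaves one.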

Through analogical reasoning over event structures, agents can then generalize these markers to become more abstract. In the experiments, we typically see the emergence of agent-like and patient-like case markers, as in the following sentences:

  1. jack-fuitap walk-to jill-ginah
    Lit.: jack-sem-role-6 walk-to jill-sem-role-26
    ‘Jack walks to Jill.’
  2. touch jill-fuitap house-payis
    Lit.: touch jill-sem-role-6 house-sem-role-29
    ‘Jill touches the house.’
  3. house-woechen move-inside boy-fuitap
    Lit.: house-sem-role-10 move-inside boy-sem-role-6
    ‘The boy moves inside the house.’

Team: Luc Steels, Remi van Trijp

Grounded Emergent Semantics

Our experiments require that the agents are fully grounded in the world through a sensori-motor apparatus and that they can conceptualise the world. We are therefore carrying out research into vision and motor control to build up and maintain world models usable by the language systems, and we are developing a framework called IRL for the automated planning of complex conceptualisations. IRL has been released through emergent-languages.org.

Team: Michael Spranger

Collaboration: Joris Bleys (VUB), Martin Loetzsch

Fluid Construction Grammar

In order to conduct experiments in language evolution, it was necessary to develop a solid, flexible formalism for representing linguistic knowledge and for orchestrating the parsing, production and learning processes necessary for language interaction. In collaboration with the VUB AI laboratory we have developed Fluid Construction Grammar (FCG). FCG is within the tradition of feature-structure-based, unification-oriented grammars. It has some unusual features, including bidirectionality, scoring of rules, and special operators (particularly the J-operator) for dealing with hierarchy. FCG is fully operational and can be downloaded from www.fcg-net.org.

Collaboration: VUB AI LAB (Joachim De Beule, Joris Bleys, Pieter Wellens, Martin Loetzsch)

Origins and evolution of shared combinatorial speech sounds

This research is concerned with the elaboration of a unified theory about the following three fundamental questions concerning the origins of the sound systems of human languages:

  1. Origins of combinatoriality and phonemic coding: how do we get from holistic to digital speech, in which syllables are composed of re-usable parts?
  2. Origins of phonotactics: how do we explain the structural regularities of phoneme inventories?
  3. Origins of a culturally shared speech code: how can a society of agents develop a shared system of vocalizations?

The hypothesis that is explored is that the origins of speech can be understood only as the result of complex dynamical interactions between speaking and listening individuals, each of them being a complex system in which the vocal tract, the ear, and the neural system that connects them are coupled.

The dynamics of complex systems are difficult to understand, and one of the best tools for studying them is computer modeling. This is why societies of artificial agents were built. Agents were endowed with artificial vocal tracts, ears and brains, in order to explore the possible mechanisms that could explain the origins of shared speech systems.

Architecture of the coupling between sensorimotor modalities and between agents

Through these computational experiments, we have shown that, from a minimal neural kit for vocal replication, a shared combinatorial speech code with structural regularities and diversity can spontaneously self-organize in a population of agents. This suggests that the evolutionary step from vocal replication systems to modern human speech systems might have been rather small.

The aim of these experiments is primarily exploratory. They are not meant to prove directly which mechanisms were at work in humans, but rather to develop our intuitions and help structure the research debate. We think that building artificial systems allows us to shape the search space of possible answers, in particular by showing what is sufficient and what is not necessary.

Links: More details on computational modeling of the origins of speech sounds.

The production and recognition of emotions in speech

Question: How can a robot express emotions through the modulation of its voice’s intonation? How can a robot recognize emotions or attitudes through the prosody of humans’ voices?

The ability to express and recognize emotions or attitudes through the modulation of the intonation of the voice is fundamental to human communication. In particular, it makes it possible to coordinate social interactions with babies, as in language games (giving feedback, calling for attention). Any robot designed to interact with humans in a natural manner needs these skills.

Emotions have some mechanical effects on physiology, like heart rate modulation or dryness in the mouth, which in turn have effects on the intonation of the voice. This is why it is possible in principle to predict some emotional information from the prosody of a sentence.

We are investigating how to control the pitch (fundamental frequency) and energy of a synthetic speech signal so that a robot can express attitudes or emotions that can be recognized by humans. We have designed an algorithm which generates lively, cartoon-like emotional speech, stylized so that people of many different native languages can reliably identify the emotion. The degree of emotion can be varied continuously. The technology was validated with human subjects.

We have also carried out a large-scale experiment to find out which features and which machine learning algorithms are best for recognizing the emotion in the voice of human speakers. State-of-the-art data-mining techniques were used to find the best feature set among more than 400 features, together with the best learning algorithm among nearest-neighbours, Bayesian classifiers, neural networks and support vector machines. Tests were made with a database of 6 Japanese speakers and five emotions (neutral, anger, sadness, boredom, happiness). We had 6000 sound samples in total. Surprisingly, we found that the best features were not those used in the psycho-acoustic literature, but new ones based on the quartiles of the distribution of the energy values and of the minima of the pitch contour.
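As a rough illustration of the kind of feature mentioned above, the following sketch computes the quartiles of a list of energy values and of the local minima of a pitch contour. It is not the feature extractor used in the experiment: the function names, the definition of a minimum (a value strictly lower than both neighbours) and the quartile interpolation method are assumptions made for the example.

```python
from statistics import quantiles

def local_minima(contour):
    """Values of the contour that are strictly lower than both neighbours."""
    return [cur
            for prev, cur, nxt in zip(contour, contour[1:], contour[2:])
            if cur < prev and cur < nxt]

def quartile_features(values, prefix):
    """First quartile, median and third quartile of a list of values.

    Assumes at least two values; statistics.quantiles may raise
    StatisticsError otherwise.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {f"{prefix}_q1": q1, f"{prefix}_median": median, f"{prefix}_q3": q3}

def prosodic_features(energy, pitch):
    """Quartiles of the energy values and of the pitch-contour minima."""
    features = quartile_features(energy, "energy")
    features.update(quartile_features(local_minima(pitch), "pitch_min"))
    return features

# Toy input: per-frame energy values and a pitch contour in Hz.
print(prosodic_features([1, 2, 3, 4, 5],
                        [200, 150, 180, 120, 160, 140, 190]))
```

A feature dictionary of this kind could then be passed to any of the classifier families listed above.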

Links: More info on the production and recognition of emotions in speech.

AIBO’s First Words

Question: What are the mechanisms needed to learn the meaning of new words in natural social contexts? How are social regulation mechanisms involved in language learning? How can one draw the attention of a robot towards particular aspects of its environment? What are the interactions between acquisition mechanisms and language evolution?

We investigate the mechanisms that enable humans and robots to learn new words and to use them in appropriate situations. We have built a number of robotic and computational experiments studying the mechanisms of concept formation, joint attention, social coordination and language games, and articulating the roles of learning, physical and environmental biases in language acquisition. The unifying theme of all these experiments is development: we explore the hypothesis that language can only be acquired through the progressive structuring of the sensorimotor and social experience. These experiments are described in the papers below.

Interaction between an AIBO and its human trainer in the AIBO’s First Words experiment.

Participants: Luc Steels, Masahiro Fujita [Sony DCL Tokyo], Frédéric Kaplan, Angus McIntyre, and Pierre-Yves Oudeyer

Talking Heads

Question: How do language users develop a shared communication system?

The Talking Heads experiment studied the evolution of a shared lexicon in a population of embodied software agents. The agents developed their vocabulary by observing a scene through digital cameras and communicating about what they had seen together. To add an extra level of complexity to their task, agents were able to move freely between different computer installations located in different parts of the world. Members of the public were able to influence the course of the experiment by logging on to the Talking Heads website to create and teach their own agents.

The guessing game

Links: The Talking Heads Experiment

The Naming Game

The ‘naming game’ experiments used computer simulations of communities of language users to explore the emergence of shared lexicons in a population. In the naming game, software agents interact with each other in a stylised interaction (termed a ‘language game’). Repeated interactions lead to the development of a common repertoire of words for naming objects. By varying experimental parameters, it is possible to explore the effect of environmental factors such as noise and uncertainty, memory limitations, and contact between different language groups.
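The interaction loop described above can be sketched in a few lines. The update rule below is the widely used "minimal naming game" variant (on success both agents keep only the winning word, on failure the hearer adopts it) and is an assumption, since the text does not specify the rule used in these experiments. The single object, the function names and the parameter values are likewise hypothetical.

```python
import random

def invent_word(rng, length=4):
    """Make up a random word form."""
    return "".join(rng.choice("aeioubdgkmpt") for _ in range(length))

def play_naming_game(n_agents=10, n_games=5000, seed=0):
    """Simulate repeated speaker/hearer games about one object.

    Returns each agent's final set of words and the success of every game.
    """
    rng = random.Random(seed)
    vocab = [set() for _ in range(n_agents)]
    successes = []
    for _ in range(n_games):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not vocab[speaker]:
            vocab[speaker].add(invent_word(rng))   # nothing to say yet: invent
        word = rng.choice(sorted(vocab[speaker]))
        if word in vocab[hearer]:
            # Success: both agents drop all competing words.
            vocab[speaker] = {word}
            vocab[hearer] = {word}
            successes.append(True)
        else:
            # Failure: the hearer adopts the speaker's word.
            vocab[hearer].add(word)
            successes.append(False)
    return vocab, successes

vocab, successes = play_naming_game()
print("distinct words left:", len(set().union(*vocab)))
print("success rate, last 100 games:", sum(successes[-100:]) / 100)
```

Noise, memory limits or several language groups could be added as extra parameters of the same loop.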

Word competition in the Naming Game


As the language evolves, different word tokens ‘compete’ to represent particular meanings. This graph shows the competition between three words, two of which ultimately die out of the language, leaving the third as the preferred word for the given meaning.

Population change in the Naming Game


Initially, the population is fixed. The language emerges, becomes coherent and supports effective communication. In the second phase of the experiment, an agent is replaced every hundred games. The change affects the language, but communication is still possible, as the existing agents preserve the established language. In the third phase, the rate of replacement is increased further. Under these conditions, the language breaks down: the population is changing too quickly to maintain the language.
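The effect of population turnover can be explored with a small simulation. This is a sketch under assumptions, not the code behind the graph: it uses a minimal naming game update rule with a single object, models replacement by giving a randomly chosen agent an empty vocabulary, and the function name and parameter values are hypothetical.

```python
import random

def success_rate_with_turnover(replace_every, n_agents=10, n_games=4000,
                               window=1000, seed=1):
    """Communicative success over the last `window` games.

    replace_every: replace one random agent after every this many games;
                   None means the population stays fixed.
    """
    rng = random.Random(seed)
    vocab = [set() for _ in range(n_agents)]
    outcomes = []
    for game in range(1, n_games + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not vocab[speaker]:
            vocab[speaker].add("w%d" % game)       # invent a fresh word
        word = rng.choice(sorted(vocab[speaker]))
        success = word in vocab[hearer]
        if success:
            vocab[speaker], vocab[hearer] = {word}, {word}
        else:
            vocab[hearer].add(word)
        outcomes.append(success)
        if replace_every and game % replace_every == 0:
            vocab[rng.randrange(n_agents)] = set()  # newcomer knows no words
    recent = outcomes[-window:]
    return sum(recent) / len(recent)

for rate in (None, 100, 2):
    print("replace every", rate, "games:", success_rate_with_turnover(rate))
```

In this toy model, slow turnover leaves most games successful because newcomers quickly adopt the established word, while very fast turnover keeps success low.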

Form stochasticity in the Naming Game


The ‘form stochasticity’ parameter governs the accuracy with which word forms are transmitted: higher stochasticity means that word tokens are more likely to be distorted during transmission. This parameter might reflect the effects of ambient noise, or imperfections in the perception or production process.

In this graph, stochasticity is initially high: the language is slow to stabilize, because word forms are not reliably communicated. Reducing the stochasticity allows the formation of a stable language. When the stochasticity is returned to the previous, higher value in the third phase, communication is able to continue because the existence of a shared language allows hearers to compensate for errors in transmission.
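One simple way to model a form stochasticity parameter is to distort each character of a transmitted word with a fixed probability. The sketch below makes that assumption explicit. The per-character noise model, the alphabet and the function names are hypothetical and are not taken from the experiment.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def transmit(word, stochasticity, rng):
    """Return the form the hearer receives.

    Each character is replaced by a different random letter with
    probability `stochasticity` (an assumed noise model).
    """
    received = []
    for ch in word:
        if rng.random() < stochasticity:
            ch = rng.choice([c for c in ALPHABET if c != ch])
        received.append(ch)
    return "".join(received)

def distortion_rate(word, stochasticity, trials=2000, seed=0):
    """Fraction of transmissions in which the word arrives altered."""
    rng = random.Random(seed)
    distorted = sum(transmit(word, stochasticity, rng) != word
                    for _ in range(trials))
    return distorted / trials

for level in (0.0, 0.05, 0.3):
    print(level, distortion_rate("wabaku", level))
```

A hearer that matches a received form to the closest word it already knows can recover from some of these distortions, which is one way to picture the compensation described above.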