The production and recognition of emotions in speech
Question: How can a robot express emotions by modulating the intonation of its voice? How can a robot recognize emotions or attitudes from the prosody of human voices?
The ability to express and recognize emotions or attitudes through the modulation of the intonation of the voice is fundamental to human communication. In particular, it makes it possible to coordinate social interactions with babies, as in language games (giving feedback, calling for attention). Any robot designed to interact with humans in a natural manner needs these skills.
Emotions have some mechanical effects on physiology, like heart rate modulation or dryness in the mouth, which in turn have effects on the intonation of the voice. This is why it is possible in principle to predict some emotional information from the prosody of a sentence.
We are investigating how to control the pitch (fundamental frequency) and energy of a synthetic speech signal so that a robot can express attitudes or emotions that humans can recognize. We have designed an algorithm which generates lively, cartoon-like emotional speech, stylized so that people of many different native languages can reliably identify the emotion. The degree of emotion can be varied continuously. The technology was validated with human subjects.
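To make the idea of a continuously variable degree of emotion concrete, here is a minimal sketch in Python. It is not the algorithm described above: the `EmotionProfile` fields, the numeric values in `PROFILES`, and the function `apply_emotion` are all hypothetical, chosen only to illustrate how a pitch contour and an energy contour could be scaled between a neutral rendering (degree 0) and a fully stylized one (degree 1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionProfile:
    """Hypothetical target multipliers reached at full emotional intensity."""
    pitch_shift: float  # scales the whole pitch contour
    pitch_range: float  # scales pitch excursions around the mean
    energy_gain: float  # scales the energy contour


# Illustrative values only; they are assumptions, not measured settings.
PROFILES = {
    "happiness": EmotionProfile(pitch_shift=1.3, pitch_range=1.5, energy_gain=1.2),
    "sadness": EmotionProfile(pitch_shift=0.85, pitch_range=0.6, energy_gain=0.8),
}


def lerp(start, end, t):
    """Linear interpolation between start (t=0) and end (t=1)."""
    return start + (end - start) * t


def apply_emotion(pitch_hz, energy, profile, degree):
    """Return (pitch, energy) contours stylized by `profile` at `degree` in [0, 1]."""
    if not 0.0 <= degree <= 1.0:
        raise ValueError("degree must lie in [0, 1]")
    if not pitch_hz:
        raise ValueError("pitch contour must not be empty")
    shift = lerp(1.0, profile.pitch_shift, degree)
    spread = lerp(1.0, profile.pitch_range, degree)
    gain = lerp(1.0, profile.energy_gain, degree)
    mean = sum(pitch_hz) / len(pitch_hz)
    # Widen or narrow excursions around the mean, then shift the whole contour.
    new_pitch = [shift * (mean + spread * (f - mean)) for f in pitch_hz]
    new_energy = [gain * e for e in energy]
    return new_pitch, new_energy
```

Calling `apply_emotion(contour, energy, PROFILES["happiness"], 0.5)` would give a rendering halfway between neutral and the full hypothetical "happiness" setting. A real system would also need to resynthesize the speech signal from the modified contours, which this sketch leaves out.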
We have also run a large-scale experiment to find out which features and which machine learning algorithms are best suited to recognizing the emotion in the voice of human speakers. State-of-the-art data-mining techniques were used to search for the best feature set among more than 400 features, together with the best learning algorithm among candidates including nearest-neighbour classifiers, Bayesian classifiers, neural networks and support vector machines. Tests were made with a database of 6 Japanese speakers and five emotions (neutral, anger, sadness, boredom, happiness), comprising 6000 sound samples in total. Surprisingly, we found that the best features were not those used in the psycho-acoustic literature, but new ones based on the quartiles of the distribution of the energy values and of the minima of the pitch contour.
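As an illustration of the kind of features mentioned above, the following Python sketch computes the quartiles of an energy contour and the quartiles of the local minima of a pitch contour. It is a sketch under stated assumptions rather than the feature extractor used in the experiment: the function names, the definition of a local minimum (a sample strictly lower than both neighbours), and the quartile interpolation method are all assumptions, and the contours are taken to be already extracted as plain lists of numbers.

```python
import statistics


def local_minima(contour):
    """Values of interior samples strictly lower than both neighbours (assumed definition)."""
    return [
        contour[i]
        for i in range(1, len(contour) - 1)
        if contour[i] < contour[i - 1] and contour[i] < contour[i + 1]
    ]


def quartiles(values):
    """First, second and third quartiles, or None if fewer than two values."""
    if len(values) < 2:
        return None
    # 'inclusive' interpolates linearly between the observed samples.
    return statistics.quantiles(values, n=4, method="inclusive")


def prosodic_features(energy, pitch_hz):
    """Feature dictionary: quartiles of energy and of pitch-contour minima."""
    return {
        "energy_quartiles": quartiles(energy),
        "pitch_minima_quartiles": quartiles(local_minima(pitch_hz)),
    }
```

A feature vector built this way could then be passed to any of the classifiers listed above. Unvoiced frames, where pitch is undefined, would have to be removed from `pitch_hz` beforehand; the sketch does not handle them.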
Links: More info on the production and recognition of emotions in speech.
Participants: Pierre-Yves Oudeyer
References
The production and recognition of emotions in speech: features and algorithms. International Journal of Human-Computer Studies, 59(1-2):157-183, 2003. Special issue on Affective Computing.