Developing Behaviour for a Social Robot

Developing for R3D3

The Rolling Receptionist Robot with Double Dutch Dialog (R3D3) is a mobile robot that consists of a robot head and a virtual human. Users can conduct a dialog in Dutch with the virtual human, called Leeloo. The robot head, called EyePi, can move and show emotions. Users can interact with both agents simultaneously. The system includes speech recognition, computer vision and text-to-speech: all ingredients for natural human-robot interaction.

For the course Conversational Agents, Daan Wiltenburg and I did research on interaction with the robot. This included an experiment on turn allocation and an 'in the wild' study at NEMO Science Museum in Amsterdam. Our contribution to the project was a turn allocation role for EyePi with interruption management, acknowledgement behaviour and Dynamic Coherent Responses, with which a conversation is mimicked even when the speech is not recognized.

When the study was completed, I continued as a developer of R3D3's behaviour and as its 'caretaker' at various showcase events. I added acknowledgements based on age and gender, responses based on emotions, and a speech recognition system that falls back to a random response when the speech is not recognized.

Visit R3D3's website

Creating a next speaker selection role for EyePi

While EyePi can only show non-verbal behaviour, Leeloo is the agent users can actually converse with. Hence, Leeloo gets all the attention and EyePi gets ignored, challenging the usefulness of the speechless robot head. Moreover, no real behaviour had been developed for EyePi yet. Daan and I investigated whether EyePi could play a useful role in a conversation, and how much attention each agent actually received. We gave EyePi the role of next speaker selector in a conversation with two other participants by letting it gaze at the person who should receive the speaker role. We found that this can be successful when the robot head is involved in the conversation. If EyePi is not involved right before it performs an action, it is likely to be ignored, since Leeloo receives more attention. Nevertheless, attention can be shifted to EyePi at any moment by involving it in the conversation. Based on our work, the paper “You can leave your head on – attention management and turn-taking in multi-party interaction with a virtual human/robot duo” was published.

Read the paper

R3D3 at NEMO: Turn Allocation With Children

The previous research, in a lab setting, had shown that EyePi can be used as a turn allocation tool in a multi-party conversation. In a new study, we extended this research with an 'in the wild' experiment: a multi-party conversation with children at NEMO Science Museum. Where the previous study used a Wizard of Oz method, this time R3D3 functioned autonomously. Furthermore, we added two more behaviours: acknowledgement by gaze, to allocate the role of bystander, and interruption management by gaze, to silence interrupters. These autonomous functions relied on the computer vision's ability to detect faces and to detect who is speaking. When the virtual human finished speaking, EyePi would gaze towards a detected face. When a user started talking, EyePi would gaze at that user. When a user started talking while the virtual human or another user was already talking, EyePi would try to silence this person with an angry gaze. When a new user entered the scene, he or she was acknowledged; if someone was speaking at that moment, the new user was acknowledged with a silencing gaze instead.
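The gaze rules above amount to a mapping from vision events to EyePi behaviours. As a minimal sketch, assuming hypothetical event and behaviour names (this is not R3D3's actual API):

```python
# Hypothetical sketch of the gaze rules used at NEMO.
# Event names, behaviour names and select_gaze are illustrative only.

def select_gaze(event, someone_speaking):
    """Map a computer-vision event to one of EyePi's gaze behaviours."""
    if event == "virtual_human_finished":
        return "gaze_at_detected_face"   # allocate the next turn
    if event == "user_started_speaking":
        if someone_speaking:
            return "angry_gaze"          # interruption management
        return "gaze_at_speaker"
    if event == "new_user_entered":
        if someone_speaking:
            return "silencing_gaze"      # acknowledge new user as bystander
        return "acknowledging_gaze"
    return "idle"                        # no relevant event detected
```

Keeping the rules in one pure function like this makes each behaviour easy to test in isolation, independent of the vision pipeline.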

A pilot experiment was conducted beforehand at a daycare to test R3D3's functionalities. The final results showed that the overall experience with R3D3 is more interesting when EyePi is more active. We also found that turn allocation is very difficult in a multi-party conversation with children: the functionalities became less effective as the group size increased. Based on our work, the paper “R3D3 in the Wild: Using A Robot for Turn Management in Multi-Party Interaction with a Virtual Human” was published.

Read the paper

Mimicking a conversation

During the study at NEMO we had to deal with a lot of restrictions. One of them was that the speech recognition did not work with the high-pitched voices of children. Since the speech recognition was the component that triggered the responses, we had to redesign the dialog management. We decided to let the computer vision trigger the responses instead. The problem, though, was that the computer vision could only tell when someone was talking, not what he or she was saying, so the responses of R3D3 would always be random. That is why we developed what I later named the Dynamic Coherent Responses (DCR) system. The DCR system consisted of a (random) start sentence and one or more coherent responses belonging to that sentence. A coherent response was triggered once the user had finished answering the start sentence. For example, the virtual human would ask a question, wait until the user had answered, and then respond with something that is a good follow-up regardless of the user's answer:

R3D3: “Do you like the museum?”

User: “Yes very much” or maybe: “No I hate it”

R3D3: “Well, we are very happy that you’re here!”

You could add as many coherent follow-up responses as you wanted, and for each follow-up response a behaviour of EyePi could be added.
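The DCR idea above can be captured in a small data structure: each start sentence is paired with follow-ups that are coherent no matter what the user said. A minimal sketch, with made-up example sentences (the second pair is hypothetical, not from the actual system):

```python
import random

# Hypothetical sketch of the Dynamic Coherent Responses (DCR) system.
# Each entry pairs a start sentence with answer-independent follow-ups.
dcr_pairs = [
    {
        "start": "Do you like the museum?",
        "follow_ups": ["Well, we are very happy that you're here!"],
    },
    {
        "start": "Have you seen the exhibitions yet?",  # made-up example
        "follow_ups": ["There is a lot to discover here!"],
    },
]

def start_exchange():
    """Pick a random start sentence (and its follow-ups) to speak."""
    return random.choice(dcr_pairs)

def coherent_response(pair):
    """Triggered when the vision detects the user has stopped talking:
    reply with a follow-up that fits regardless of the user's answer."""
    return random.choice(pair["follow_ups"])
```

In the real system, an EyePi behaviour could additionally be attached to each follow-up entry.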

Building a clever conversational partner

Later on, I started to develop the speech recognition and integrate it into the behaviour program. There was (and still is) the problem that the speech recognition does not always work properly, for example when two users talk at once or when there is a lot of background noise. Therefore, I developed a system that triggers a coherent response when the speech is recognized, and a random response via DCR when it is not. This way, under good circumstances, the robot really gives the impression that it understands the user, and a natural conversation can take place.
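The fallback logic can be sketched as follows, assuming a hypothetical table of recognized utterances and a pool of DCR responses (the sentences and the `respond` function are illustrative, not the actual implementation):

```python
import random

# Hypothetical response tables for illustration.
COHERENT = {
    "do you like the museum": "Glad to hear your opinion!",
}
RANDOM_DCR = [
    "Well, we are very happy that you're here!",
    "Interesting!",
]

def respond(recognized_text):
    """Return a coherent response when the speech was recognized,
    otherwise fall back to a random DCR response."""
    if recognized_text and recognized_text.lower() in COHERENT:
        return COHERENT[recognized_text.lower()]
    return random.choice(RANDOM_DCR)
```

The key design point is that the unrecognized-speech path still produces a plausible reply, so the conversation never stalls.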

To make it even more clever, I built an acknowledgement behaviour that greets the user according to age, gender and distance. For example, a man of 24 is greeted as a young man, and a man of 50 is greeted as sir. The behaviour also takes the user's distance into account: if the user is far away, he or she is asked to come closer.
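A minimal sketch of such a greeting rule, where the age cut-off, distance threshold and phrasings are assumptions for illustration:

```python
def greeting(age, gender, distance_m):
    """Hypothetical acknowledgement behaviour based on the computer
    vision's estimated age, gender and distance of the user."""
    if distance_m > 2.0:                 # threshold is an assumption
        return "Please come a bit closer!"
    if gender == "male":
        return "Hello, young man!" if age < 30 else "Hello, sir!"
    return "Hello, young lady!" if age < 30 else "Hello, madam!"
```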

I also created behaviour that reacts to the emotion of the user. When the user looks very sad, for example, the robot asks what is wrong. Furthermore, I created behaviour for when the robot is ignored: when someone is standing with his or her back to the robot, it kindly asks that person to stop ignoring it.
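These two reactions can likewise be sketched as one small rule, again with hypothetical names and phrasings:

```python
def social_reaction(emotion, facing_robot):
    """Hypothetical sketch: react to a detected emotion, or to being
    ignored when the user's back is turned to the robot."""
    if not facing_robot:
        return "Please don't ignore me!"
    if emotion == "sad":
        return "What is wrong?"
    return None  # no special reaction needed
```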


J. Linssen, M. Berkhoff, M. Bode, E. Rens, M. Theune, and D. Wiltenburg (2017). You can leave your head on – attention management and turn-taking in multi-party interaction with a virtual human/robot duo. In Proceedings of Intelligent Virtual Agents (IVA 2017).

M. Theune, D. Wiltenburg, M. Bode, and J. Linssen (2017). R3D3 in the Wild: Using A Robot for Turn Management in Multi-Party Interaction with a Virtual Human. In Proceedings of the IVA 2017 workshop on Interaction with Agents and Robots: Different Embodiments, Common Challenges.
