EP2834811A1

EP2834811A1 - Robot capable of incorporating natural dialogues with a user into the behaviour of same, and methods of programming and using said robot

Info

Publication number: EP2834811A1
Application number: EP13714282.4A
Authority: EP
Inventors: David Houssin; Gwennael GATE
Original assignee: Aldebaran Robotics SA
Current assignee: Aldebaran SAS
Priority date: 2012-04-04
Filing date: 2013-04-03
Publication date: 2015-02-11
Also published as: FR2989209A1; US20150100157A1; JP2015524934A; FR2989209B1; CN104350541B; US10052769B2; CN104350541A; WO2013150076A1; JP6367179B2

Abstract

The invention concerns a humanoid robot, said robot being capable of holding a dialogue with at least one user, said dialogue using two speech recognition modes, one open and the other closed, the closed mode being defined by a concept characterising a sequence of dialogue. The dialogue can also be influenced by events that are neither speech nor text. The robot of the invention is capable of executing behaviours and generating expressions and emotions. Relative to the robots of the prior art, the invention provides the advantage of considerably reducing the programming time and the execution latency of the sequences of dialogue, which gives the dialogue a fluency and a naturalness close to those of human dialogues.

Description

ROBOT CAPABLE OF INTEGRATING NATURAL DIALOGUES WITH A USER INTO ITS BEHAVIOR, AND METHODS FOR PROGRAMMING AND USING SAID ROBOT

The present invention belongs to the field of robot programming systems. More specifically, it provides a humanoid robot that is already able to execute behaviors with advanced capabilities for dialogue with a human user. A robot can be called humanoid from the moment it has certain attributes of the appearance and functionality of a human being: a head, a trunk, two arms, possibly two hands, two legs, two feet, etc. In the context of the present invention, however, the most important humanoid characteristic is the capacity for oral expression in dialogue with a human, said capacity being coordinated as closely as possible with the gestural and/or symbolic expression of the personality and emotions of the robot. One can imagine the development of applications of the "companion robot" type, that is to say a robot able to take on, particularly on behalf of one or more human beings in a state of dependence, a certain number of assistance functions in everyday life, while providing these humans with a presence that can be considered an emotionally equivalent substitute for the presence of a personal human assistant. For this, it is essential to develop the ability of said humanoid robots to interact with human beings in a manner as close as possible to human behavior. In particular, the robot must be able to interpret the questions or statements of the human being and to reply in conversational mode, with a richness of expression corresponding to that of a human being and with modes of expression that are in synergy with the types of behavior and emotions that are normally those of a human being.

The first steps in this direction have been made thanks to the methods for programming Nao ™ brand humanoid robots marketed by the applicant of the present patent application and disclosed in the international patent application published under No. WO2012/000927 relating to a player robot and in the international patent application published under No. WO2012/010451 relating to a humanoid robot with a natural dialogue interface.

However, the robots disclosed by these documents can only execute a limited and predetermined number of dialogue elements, or at least, if one wished to multiply said dialogue elements towards a diversity corresponding to the normal behavior of a human being, the combinatorics would quickly become intractable. In particular, in order to be able to provide the assistance services indicated above, it is necessary to give the humanoid robots a richer conversational aptitude than that of the robots of the prior art.

To do this, the present invention implements in said robot a conversational agent, voice recognition tools and tools for analyzing the behavior of human beings with which the robot converses.

For this purpose, the present invention discloses a humanoid robot comprising: i) at least one sensor selected from a group comprising first sensors of the sound type and second sensors, of at least a second type, of events generated by at least one user of said robot, ii) at least one event recognition module at the output of said at least one sensor, and iii) at least one module for generating events towards said at least one user, and a module for dialogue with said at least one user, said dialogue module receiving as input the outputs of said at least one recognition module and producing outputs to said event generation module, said outputs being selected from a group comprising words, movements, expressions and emotions, said robot being characterized in that it further comprises an artificial intelligence engine configured to drive the outputs of the event generation module.

Advantageously, the control of the event generation module by the artificial intelligence engine is performed according to the context of the dialogue and variables defining the present and predictive configuration of the robot.

Advantageously, said at least one event recognition module receives inputs from at least two sensors belonging to at least two different types, and said at least one event generation module at the output of said dialogue module is able to output events taking into account said inputs from said at least two sensors.

Advantageously, said at least one recognition module is able to structure the entries in concepts according to a dynamic hierarchical tree.

Advantageously, an entry in said at least one recognition module applies to textual or voice entries and activates a grammar in said dialog module. Advantageously, an entry in said at least one recognition module activates / deactivates the recognition of said entry.

Advantageously, said at least one recognition module comprises a first and a second sub-module, the first sub-module operating on a closed list of words attached to at least one concept and the second sub-module operating on an open list of words.

Advantageously, an output of the first sub-module is provided alone to the dialogue module.

Advantageously, an output of the second sub-module is provided alone to the dialogue module.

Advantageously, an output of the first sub-module and an output of the second sub-module are jointly provided to the dialogue module.

Advantageously, an output of the first sub-module is first provided alone to the dialogue module, said output of the first sub-module being confirmed in the dialogue module by an output of the second sub-module.

Advantageously, when none of the outputs of the first and second sub-modules generates an output of the dialogue module, said robot proposes at least one input to said at least one user.

Advantageously, the dialogue module also receives as input dynamic elements from an application.

Advantageously, at least one output of the dialogue module is provided to a module able to execute a function chosen in a group of functions for generating at least one expression of said robot, decision to generate at least one behavior of said robot and generating at least one emotion of said robot.

Advantageously, said function for generating at least one behavior takes into account the constraints of the system of said robot.

Advantageously, said function for generating at least one emotion is able to generate a series of predefined expressions between a neutral state and a predefined state in response to input events.

Advantageously, the humanoid robot of the invention further comprises a visual recognition module, said module being able to interpret at least one sign of said at least one user as a beginning or an end of a sequence of a dialogue.

Advantageously, said dialog module comprises a lexical analysis sub-module and an interpretation sub-module of the outputs of said lexical analysis sub-module capable of generating concepts to which the words of the current dialogue are attached.

Advantageously, said dialog module is able to process questions and commands from said at least one user relating to the state of its physical and/or logical system.

The invention also discloses a method of dialogue between a humanoid robot and at least one user, comprising: i) at least one step of recognizing inputs from at least one sensor selected from a group comprising first sensors of the sound type and second sensors, of at least a second type, of events generated by said at least one user, ii) a step of generating events towards said at least one user, and iii) a step of dialogue with said at least one user, said dialogue step receiving as input the outputs of said at least one recognition step and producing outputs to said event generation step, said outputs being selected from a group comprising words, movements, expressions and emotions, said method being characterized in that it further comprises a step of controlling the outputs of the event generation module by means of an artificial intelligence engine.

Advantageously, the control of the event generation module by the artificial intelligence engine is performed according to the context of the dialogue and variables defining the present and predictive configuration of the robot.

Advantageously, said robot dialogs with at least two users, parameters characterizing said at least two users being stored in a memory of said robot to be used when said robot recognizes one of the at least two users.

The invention also discloses a computer program embedded on a humanoid robot, comprising program code instructions for executing the method of the invention when the program is executed on a computer, said program being adapted to manage a dialogue between said humanoid robot and at least one user, said computer program comprising: i) at least one event recognition module at the output of at least one sensor selected from a group comprising first sensors of the sound type and second sensors, of at least a second type, of events generated by said at least one user, ii) at least one module for generating events towards said at least one user, and iii) a module for dialogue with said at least one user, said dialogue module receiving as input the outputs of said at least one recognition module and producing outputs to said event generation module, said outputs being selected from a group comprising words, movements, expressions and emotions, said program being characterized in that it further comprises an artificial intelligence engine configured to control the outputs of the event generation module.

Advantageously, the invention also discloses a computer program comprising program code instructions configured to generate a computer program according to the invention and transmit it to at least one humanoid robot, said instructions being generated in a ChatScript-type interface.

The invention allows the use of programming languages already in use in the field of conversational agents, the syntax of said languages being already known to a large community of programmers, who will thus be available to develop new applications implementing the present invention. Implemented in the context of the present invention, the conversational agents of the prior art see their possibilities considerably increased thanks to the integration of advanced speech recognition functionalities, as well as to the taking into account of information from other sensors of the robot, in particular visual recognition, which allow it to detect situations in which dialogues should be activated and to detect its interlocutors. A dialogue according to the method of the invention may be adapted to different categories of conversation elements, with different robot personalities that will depend on the preferences of their user. The robot will be able to express emotions matching said conversation elements and to have behaviors that are also synchronized with said elements, which will allow the creation of fluid scenarios of exchanges between a user and his or her robot or robots. In addition, the robot will be able to provide information on the status of a number of elements of its system (e.g. remaining battery life) and to receive system commands in dialogue mode with a user, which greatly improves the ergonomics of use of said robot.

The invention will be better understood and its various features and advantages will emerge from the following description of several exemplary embodiments and its appended figures including:

FIG. 1 represents a humanoid robot capable of implementing the invention in several of its embodiments;

FIG. 2 represents a general flowchart of the treatments according to several embodiments of the invention;

FIG. 3 represents the processing blocks of a dialogue behavior management module and a voice recognition module according to several embodiments of the invention;

FIG. 4 represents an example of a tree of several dialogue levels according to several embodiments of the invention;

FIG. 4a represents a concept tree according to several embodiments of the invention;

FIG. 5 represents a simplified flowchart of the speech recognition module processes in several embodiments of the invention;

FIG. 6 represents the data flow between several software modules configured to implement the invention in several of its embodiments;

FIG. 6a illustrates the operation of an emotion engine in certain embodiments of the invention;

FIG. 6b illustrates the operation of a decision engine in certain embodiments of the invention;

FIG. 7 represents the different functions at the input and at the output of the management module of a dialogue for implementing the invention in several of its embodiments;

FIG. 8 represents the data model of a dialog analysis and interpretation module for implementing the invention in several of its embodiments;

FIG. 9 represents the architecture of the software modules implanted on a robot configured to implement the invention in several of its embodiments.

FIG. 1 represents a humanoid robot capable of implementing the invention in several of its embodiments.

This humanoid robot is shown in the figure in one embodiment of the invention. Such a robot has been disclosed in particular in the patent application WO2009 / 124951 published on 15/10/2009. This platform served as a basis for the improvements that led to the present invention. In the remainder of the description, this humanoid robot can be indifferently referred to under this generic name or under its trademark NAO ™, without the generality of the reference being modified.

This robot has about two dozen electronic cards for controlling the sensors and the actuators that drive the joints. Each electronic control card includes a commercial microcontroller, for example a DSPIC ™ from the company Microchip. This is a 16-bit MCU coupled to a DSP, with a servo loop cycle of one ms. The robot can also include other types of actuators, including LEDs (light-emitting diodes) whose color and intensity can reflect the emotions of the robot. It may also include other types of position sensors, including an inertial unit, FSRs (ground pressure sensors), etc.

The head 110 contains the intelligence of the robot, in particular the card that performs the high-level functions enabling the robot to carry out the tasks assigned to it, notably, in the context of the present invention, the execution of the dialogues written by a user. The head will advantageously also include specialized cards, in particular for speech processing (synthesis and recognition) or vision.

With regard to speech recognition, in the audio signal processing architecture currently used, said audio signals are captured by four microphones and processed in software in specialized modules, which are described in the commentary on FIG. 9. The direction of origin of the sounds can be determined by analyzing the differences in the arrival times of the sound signals at the four sensors. The words are recognized by speech recognition software with a grammar engine (for example of the type marketed by the company Nuance ™) or with a natural language interpreter.
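As an illustrative sketch only, the estimate of the direction of a sound from arrival time differences can be written for a single pair of microphones under a far-field assumption. The function name `bearing_from_delay`, the speed of sound value and the microphone spacing used below are assumptions introduced for this example; the description does not state how the computation over the four microphones is actually implemented.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed value)


def bearing_from_delay(delay_s: float, mic_spacing_m: float) -> float:
    """Return the far-field angle of arrival, in degrees, for one microphone pair.

    delay_s is the difference between the arrival times at the two microphones.
    0 degrees means the source is equidistant from both microphones;
    +/-90 degrees means the source lies on the axis joining them.
    """
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp measurement noise and rounding
    return math.degrees(math.asin(ratio))


# Hypothetical measurement: microphones 10 cm apart, delay of about 0.146 ms
angle = bearing_from_delay(0.05 / SPEED_OF_SOUND, 0.10)
```

With more than two microphones, several such pairwise estimates could be combined to remove the front/back ambiguity of a single pair.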

The head also includes one or more cards specialized in input/output processing, such as the encoding required to open a port to establish remote communication over a wide area network (WAN). The card processor can be a commercial x86 processor. A low-power processor will preferably be chosen, for example an ATOM ™ from Intel (32-bit, 1600 MHz). The card also includes a set of RAM and flash memories. This card also manages the communication of the robot with the outside (behavior server, other robots, etc.), normally on a WiFi or WiMax transmission layer, possibly on a public mobile data communications network with standard protocols possibly encapsulated in a VPN. The processor is normally controlled by a standard OS, which makes it possible to use the usual high-level languages (C, C++, Python, etc.) or languages specific to artificial intelligence such as URBI (a programming language specialized in robotics) for programming the high-level functions.

The robot will be able to execute behaviors for which it may have been programmed in advance, in particular by code generated according to the invention disclosed in the international patent application published under No. WO2012/010451 already cited, said code having been created by a programmer in a graphical interface. According to this invention and in the remainder of the present description, a behavior is a combination of actions (movements, words) and possibly events. These behaviors may also have been arranged in a scenario created by a user who is not a professional programmer, using the invention disclosed in the patent application WO2011/003628. In the first case, they may be behaviors linked together according to a relatively complex logic in which the sequences of behaviors are conditioned by the events that occur in the environment of the robot. In this case, a user, who needs a minimum of programming skills, can use the Chorégraphe ™ workshop, whose main modes of operation are described in the cited application. In the second case, the flow logic of the scenario is not in principle adaptive.

According to the present invention, a programmer is able to produce a complex scenario comprising sets of behaviors comprising various gestures and movements, sound or visual signal transmissions, and especially natural dialogues between the robot and a human being or another robot, said dialogues being coordinated with the personality and emotions of the robot and the semantic and event context of the conversation.

FIG. 2 represents a general flowchart of the treatments according to several embodiments of the invention.

According to the invention, a dialog writing module 210, implanted on a workstation separate from the robot, for example a PC, is intended for programming dialogue scenarios. The dialogues may have several characters: one or more robots and one or more speakers. Said module is advantageously implanted in the Chorégraphe ™ software workshop, which makes it possible to program robot behaviors, the dialogues being mixed within scenarios with behaviors to be executed by the robot in relation to the elements of the dialogs. A voice recognition module 220, whose features have been indicated in the commentary on FIG. 1, is installed on the robot. It is intended to interpret the elements of the dialogues created in the dialog writing module 210, said dialog elements being transmitted to the robot by a wired or wireless communication interface, according to the modalities described above. The elements of the dialogs transmitted to the module 220 are compiled, for example, into a language using the standardized BNF (Backus Normal Form) syntax. For example, a sequence of words will be interpreted as a logical "AND", a logical "OR" being symbolized in a different way, for example by a "|". The operation of the voice recognition module 220 is detailed later in the description.
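The AND/OR convention mentioned above can be sketched as a small compilation step. The input representation (a list mixing required words and lists of alternatives), the function name `to_bnf` and the exact output layout are assumptions made for this example; the grammar format actually expected by the voice recognition module 220 is not specified in the description.

```python
def to_bnf(rule_name, tokens):
    """Render a dialog input as one BNF-style production.

    tokens is a list whose items are either a string (a required word) or a
    list of strings (alternative words). Consecutive items are implicitly
    joined by a logical AND; alternatives are joined by '|' for a logical OR.
    """
    parts = []
    for token in tokens:
        if isinstance(token, list):
            parts.append("(" + " | ".join(token) + ")")
        else:
            parts.append(token)
    return "<" + rule_name + "> ::= " + " ".join(parts)


# Hypothetical dialog input: "hello" or "hi", followed by "robot"
print(to_bnf("greeting", [["hello", "hi"], "robot"]))
```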

The elements coming from the dialog writing module 210 and the outputs of the voice recognition module 220 are passed to a dialogue engine module 230. Said engine generates the words, emotions, expressions, behaviors and events created in module 210, according to the modalities explained in the commentary on FIGS. 6 and 7. A behavior is a sequence of gestures defining a compound movement (getting up, playing football, etc.). An expression is a behavior of a particular type defined for a given dialogue by a word/action pair. An action can be a combination of movements and/or signs emitted, for example, by the LEDs of the robot. A method for creating scenarios consisting of sequences of expressions has been disclosed by the international application published under No. WO2011/003628. An emotion is a sequence of expressions defined by a terminal expression and a sequence of expressions that tend towards the terminal expression. For example, we can define the following emotions $E_{i,n}$: "happy/sad", "tired", "scared", "excited", "curious", each expression $E_{i,n}$, for $i$ varying from 1 to $n$, being an intermediate expression between a reference state and the expression $E_{n,n}$. If the robot is in a state $E_{j,p}$, with $p$ different from $n$, a list of events defined to provoke the emotion will make the robot go from the state $E_{j,p}$ to a state $E_{j,n}$.
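One possible reading of an emotion as a sequence of expressions tending towards a terminal expression is sketched below. The class name `Emotion`, the one-step-per-event rule and the expression names are assumptions made for illustration; the description does not specify how many events are needed to move between intermediate expressions.

```python
from dataclasses import dataclass


@dataclass
class Emotion:
    """An emotion as an ordered, non-empty run of expressions."""
    name: str
    expressions: list  # closest to the reference state first, terminal expression last
    level: int = 0     # 0 stands for the neutral reference state

    def on_event(self) -> str:
        """Move one step towards the terminal expression and return the expression shown."""
        if self.level < len(self.expressions):
            self.level += 1
        return self.expressions[self.level - 1]

    def reset(self) -> None:
        """Return to the neutral reference state."""
        self.level = 0


# Hypothetical expression names for a "sad" emotion
sad = Emotion("sad", ["frown", "lowered head", "sobbing"])
shown = [sad.on_event() for _ in range(4)]
```

Once the terminal expression has been reached, further events leave the robot in that expression until `reset` is called.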

The dialog description language is derived from the ChatScript language (http://chatscript.sourceforge.net/).

In ChatScript syntax, we write a scenario as a set of rules. For example :

?: MEAT (you like meat) Yes

A complete rule usually includes:

- a type, "?:" in the example, which indicates a question;

- a label, "MEAT" in the example, which can be omitted, but which, when present, allows calls by other dialogues;

- an input characterized by a pattern in parentheses, "(you like meat)" in the example; this pattern matches sentences containing these three words in this order, possibly with other words as well: "Albert, you like meat", "Albert, you like red meat", ...;

- an output, "Yes" in the example.

According to the invention, the language is adapted to mix the dialogue elements with behaviors of the robot. Nonlimiting examples of adaptation are given in the following description.
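The four parts of a rule described above (type, optional label, pattern, output) can be pulled apart with a short parser, shown here as a sketch. The regular expression, the `Rule` tuple and the `matches` helper are assumptions written for this example and are not the ChatScript implementation; `matches` only checks that the pattern words occur in order, with other words allowed in between.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    kind: str              # rule type, e.g. "?" for a question
    label: Optional[str]   # optional label, e.g. "MEAT"
    pattern: list          # lower-cased words of the pattern
    output: str            # text produced when the rule fires


RULE_RE = re.compile(
    r"^\s*(?P<kind>[?a-zA-Z]+)\s*:\s*(?P<label>\w+)?\s*"
    r"\((?P<pattern>[^)]*)\)\s*(?P<output>.*)$"
)


def parse_rule(text: str) -> Rule:
    """Split one rule line into its type, label, pattern and output."""
    m = RULE_RE.match(text)
    if m is None:
        raise ValueError("not a rule: " + repr(text))
    return Rule(m["kind"], m["label"], m["pattern"].lower().split(), m["output"].strip())


def matches(rule: Rule, sentence: str) -> bool:
    """True if the pattern words appear in the sentence in the same order."""
    words = iter(re.findall(r"[\w']+", sentence.lower()))
    return all(word in words for word in rule.pattern)


rule = parse_rule("?: MEAT (you like meat) Yes")
print(rule.label, matches(rule, "Albert, you like red meat"))
```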

For example, a behavior will be defined by a single string of characters (for example: "failure", "football", "taichi", etc.). An emotion is defined in the same way, with a code indicating that it is an emotion (an initial capital can be used, for example: "Happy/Sad", "Tired", "Scared", "Excited", "Curious", ...). The language used makes it simple to write several formulations for a sentence of the user (different formulations of 'hello', for example). An input can be a sentence of the entity interacting with the robot (a "user", who can be a human being or another robot), an event, or both (I say hello while moving my hand towards the robot). For a user's sentence, this language makes it possible to express several possible answers in the form of sentences, emotions, events or behaviors. A dialog behavior may for example be of the type in which the robot follows the user's gaze and analyzes the movements of the user in order to give more natural responses (e.g. not speaking at the same time as the user).

Each element of the dialogue language is retranscribed in its equivalent in the module 220 comprising the voice recognition engine, said engine being able to recognize only a certain number of words. Thanks to this transformation, we are guaranteed that every recognized sentence has an answer. At the writing of the dialogue, not at the execution, we thus generate all the dialogs and all the entries of the dialogue in the format of the speech recognition. It is therefore important that the dialog description language has an equivalent in speech recognition, which is not the case for a keyboard dialogue which is the known context of use of the ChatScript language.

The grammar of the dialogue description language includes the following features:

1) Pattern recognition (or pattern matching):

Some patterns are indicated in the dialog script by a sign:

- 'Or' accepts a list of possible words, for example: [hello hi];

- 'And' looks for an exact list of words, for example: 'I'm happy';

- Optional words, for example: hi {'my robot'};

- Forbidden words, for example: I am not happy; an input containing the word 'not' does not match;

- Unknown words, for example: my name is ^* ; we do not know the name of the user;
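The pattern elements of point 1 can be approximated by translating a pattern into a regular expression, as in the sketch below. This is a simplification under stated assumptions: the whole sentence must match (no extra words), the unknown word is written as a bare `*`, and forbidden words are passed through a separate `forbidden` argument because the sign used for them in the dialog script is not recoverable from the text above. The function names are hypothetical.

```python
import re

TOKEN_RE = re.compile(r"\[[^\]]*\]|\{[^}]*\}|\*|[^\s\[\]{}*]+")


def compile_pattern(pattern):
    """Translate a small subset of the pattern syntax into a compiled regex."""
    parts = []
    for tok in TOKEN_RE.findall(pattern):
        if tok.startswith("["):    # 'Or': any one of the listed words
            alts = "|".join(re.escape(w) for w in tok[1:-1].split())
            parts.append(r"(?:" + alts + r")\s+")
        elif tok.startswith("{"):  # optional words, quotes around them ignored
            words = tok[1:-1].strip("'\" ").split()
            body = r"\s+".join(re.escape(w) for w in words)
            parts.append(r"(?:" + body + r"\s+)?")
        elif tok == "*":           # unknown word, captured for later use
            parts.append(r"(\S+)\s+")
        else:                      # plain word: part of an exact 'And' sequence
            parts.append(re.escape(tok) + r"\s+")
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)


def match(pattern, sentence, forbidden=()):
    """Return a match object, or None if the sentence does not fit the pattern."""
    words = sentence.lower().split()
    if any(word in words for word in forbidden):
        return None
    # A trailing space lets every token consume the whitespace that follows it.
    return compile_pattern(pattern).match(" ".join(words) + " ")


print(match("my name is *", "my name is Albert").group(1))
```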

2) Context in a dialogue; we switch from one dialog to another using trigger sentences, for example:

- Ut: (let's talk about cars) this sentence will provoke the launch of the dialogue on cars;

3) Sub-dialogue; a sub-dialog is activated on certain sentences and can be linked in cascades, for example:

- U: (how are you?) I'm fine and you?

A: (I'm not good) oh, why?

B: (I'm sick) Too bad, do you want some medicine?

A: (I'm fine) super

This sub-dialog feature can for example give rise to a dialog of the type:

Human: How are you?

Robot: I'm fine and you?

Human: I'm not well

Robot: oh really, why?

Human: I'm sick

4) Events:

Taking events into account as inputs of a dialogue, in the same way as the words picked up by the robot, gives the conversational agent of the invention capabilities that do not exist in the prior art. In particular, the visual recognition of the robot allows it to detect a person in its environment and greet that person, as it would when the person speaks to it:

- U: ([e:faceDetected hi]) hello you

If the robot sees a person or someone says 'hi', then the robot answers 'hello you'.

An event can also be triggered at the end of a dialog, possibly by launching an application:

- U: (I'm hungry) $userstate='hungry'

$userstate='hungry' will both assign 'hungry' to userstate and launch a [userstate, hungry] event to which an application can subscribe;
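The double effect of such an assignment (storing the value and launching an event that applications can subscribe to) can be sketched with a small publish/subscribe store. The class name `DialogMemory` and its methods are hypothetical and do not correspond to an API named in the description.

```python
from collections import defaultdict


class DialogMemory:
    """Key/value store that notifies subscribers whenever a variable is set."""

    def __init__(self):
        self._values = {}
        self._subscribers = defaultdict(list)

    def subscribe(self, name, callback):
        """Register callback(name, value) to be called on each assignment to name."""
        self._subscribers[name].append(callback)

    def set(self, name, value):
        """Store the value, then raise the [name, value] event."""
        self._values[name] = value
        for callback in self._subscribers[name]:
            callback(name, value)

    def get(self, name, default=None):
        return self._values.get(name, default)


memory = DialogMemory()
received = []
# A hypothetical application reacting to the user's state
memory.subscribe("userstate", lambda name, value: received.append((name, value)))
memory.set("userstate", "hungry")
```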

5) Selection of implicit or explicit behaviors:

- U: (do you recognize me?) [$FaceRecognized=='' run:faceRecognition I do not recognize you but I'll remember you next time]

6) Proposals; when the robot does not understand, or misunderstands, what the user is saying, it uses a proposal from the current dialogue in order to clarify it, for example:

- Proposal: how old are you?

- U: (I am [5 6 7 8] years old) you are young!

7) Variables; the dialog can store user information, for example:

- U: (I am _[5 6 7 8] years old) $age=$1 you're young!

8) Dynamic elements; variables and lists (mp3, applications, preferences ...) can be integrated into the input and output dialog, for example:

- U: (what do you know?) I know ~applications

- U: (what is your name?) My name is $ name

- U: ({launch read} ^* _~application) ok I launch $1

$application can be for example ('three musketeers', 'the world')

9) Emotions: SAD, HAPPY, CURIOUS, SCARED, TIRED, for example:

- U: (I do not love you!) that makes me sad SAD

10) Erase rules; an entry can be disabled or enabled to avoid a repetition phenomenon in the responses; the same input can be repeated in the dialog or in several dialogs, the erase rules will allow all entries to be interpreted, for example:

- U: delete (how are you) I'm fine

- U: (how are you nao) you remember my name! I'm fine

- U: (how are you) like just now

11) Response rules; the robot can produce several possible outputs, the choice between them being determined according to the inputs it receives from the user or users, in a deterministic manner (always the same output, or the output of a given rank in the list, whatever the input), randomly, sequentially (the input i+1 triggers the output j+1 if the input i triggers the output j) or conditionally. The dialog module has access to the entire memory of the robot and can therefore give an answer according to values in the robot memory; the outputs can be erased after being used, to add variety to the dialog; for example:

- U: (how are you) ['I'm fine' 'I've already told you'] # sequential by default

- U: (how are you) ^random ['I'm fine' 'I'm fine' 'I'm fine']

- U: (what's your name) ^first ['my name is $name' 'I do not have a name'] # Here 'my name is $name' can only be output if $name exists.

- U: (how are you) ^delete I'm fine # erase the rule after displaying the answer
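The sequential, random and conditional ("first usable output") choices of point 11 can be sketched as follows. The class name `Responder`, the mode names and the decision to stay on the last output once a sequential list is exhausted are assumptions made for this example. Variable substitution relies on Python's `string.Template`, which happens to use the same `$name` notation.

```python
import random
from string import Template


class Responder:
    """Chooses one output among several, in one of three illustrative modes."""

    def __init__(self, outputs, mode="sequential", seed=None):
        self.outputs = list(outputs)
        self.mode = mode
        self._calls = 0
        self._rng = random.Random(seed)

    def reply(self, memory=None):
        memory = memory or {}
        if self.mode == "random":
            chosen = self._rng.choice(self.outputs)
        elif self.mode == "first":
            # Return the first output whose $variables all exist in memory.
            for candidate in self.outputs:
                try:
                    return Template(candidate).substitute(memory)
                except KeyError:
                    continue
            return None
        else:
            # Sequential: advance on each call, then stay on the last output (assumption).
            chosen = self.outputs[min(self._calls, len(self.outputs) - 1)]
            self._calls += 1
        return Template(chosen).safe_substitute(memory)


name_rule = Responder(["my name is $name", "I do not have a name"], mode="first")
print(name_rule.reply({}))               # $name is missing, so the fallback is used
print(name_rule.reply({"name": "Nao"}))  # $name exists, so the first output is used
```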

12) Launch a sub-dialog, topic:

- U: (I want to talk about cars) topic: cars

Figure 3 shows the processing blocks of a dialog behavior management module and a voice recognition module according to several embodiments of the invention.

When a dialog is executed by the embedded runtime on the robot, the dialogue engine 230 acts on both the network and the dialogue lists 310, 330 and on the voice recognition 220. The dialogue network 310 is the structured set of dialogues that indicates how to articulate them: first an introduction and then another dialogue for example. The network gives meaning to dialogues. List 330 is the unstructured list of active dialogs that is present in both the chat engine and the speech engine.

A dialog can be enabled or disabled (which simultaneously affects all of its entries 340). Activation / deactivation can be triggered automatically by a trigger (ut :) or manually by a user. Minimizing the number of active dialogs at a given time optimizes speech recognition performance in quality and processing time. You can set the dialogs in the editor so that they remain active even if a new dialog is opened, the default solution being that the opening of a new dialog closes the previous dialog. An input of a dialog can also be enabled / disabled individually, either by connecting to a sub-dialog or by deleting to avoid a repetition of an element of the current dialog. The dialogue engine 230 comprises a pattern recognition module 320 whose operation has been illustrated in comment in FIG. 2 (point 1). It also includes a dynamic concepts tree 350.
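The default activation policy described above (opening a new dialog closes the previous one, unless a dialog has been set to remain active) can be sketched as follows. The class name `DialogList` and the `keep_active` flag are hypothetical names standing in for the editor setting mentioned in the text.

```python
class DialogList:
    """Tracks which dialogs are active, keeping the list as short as possible."""

    def __init__(self):
        self.active = []        # names of the currently active dialogs, in opening order
        self._persistent = set()

    def open(self, name, keep_active=False):
        """Activate a dialog; dialogs not marked keep_active are closed first."""
        self.active = [d for d in self.active if d in self._persistent]
        if name not in self.active:
            self.active.append(name)
        if keep_active:
            self._persistent.add(name)

    def close(self, name):
        """Deactivate a dialog explicitly, whatever its keep_active setting."""
        self._persistent.discard(name)
        self.active = [d for d in self.active if d != name]


dialogs = DialogList()
dialogs.open("general", keep_active=True)
dialogs.open("cars")
dialogs.open("football")   # closes "cars" but leaves "general" active
```

Keeping this list short is what limits the number of grammars the speech recognizer has to consider at any given time.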

A concept is a list of words that are defined as semantically equivalent in a given dialogue. As an example, the phrase "I live" is considered in a given dialogue as semantically equivalent to sentences such as "I lodge" or "me, I live". We will therefore define a concept (live) and a concept (I):

Concept: (live) (live house live lives live)

Concept: (I) (me I have)

The sentence will be written in several places in the dialogues:

U: (~I ~live)

A dynamic concept tree groups several hierarchically organized concepts. It will also be possible to modify the list of sentences attached to a concept at runtime. For example, the concept "food" includes the concepts "fruit" and "meat", and the concept "fruit" includes "banana" and "orange":

Concept: (food) (~fruit ~meat)

Concept: (fruit) (orange banana)

It will be possible to add new fruits during dialogues. The following dialogues can be realized:

U: (do you know _{of the} _~food) yes I know $1 $2

Which gives at execution:

User: do you know banana?

Robot: yes I know banana

U: (tell me a fruit) ~fruit is a fruit

User: tell me a fruit

Robot: banana is a fruit
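A dynamic concept tree of this kind can be sketched as a dictionary of word lists in which a member prefixed with "~" refers to a sub-concept. The class name `ConceptTree`, its methods and the word "beef" are assumptions made for this example.

```python
class ConceptTree:
    """Hierarchy of concepts whose word lists can be modified at runtime."""

    def __init__(self):
        self._concepts = {}

    def define(self, name, members):
        """Create or replace a concept; members starting with '~' are sub-concepts."""
        self._concepts[name] = list(members)

    def add(self, name, word):
        """Attach a new word to a concept during a dialogue."""
        self._concepts.setdefault(name, []).append(word)

    def expand(self, name, _seen=None):
        """Return every plain word reachable from a concept, depth first."""
        seen = _seen if _seen is not None else set()
        if name in seen:   # guard against concepts that refer to each other
            return []
        seen.add(name)
        words = []
        for member in self._concepts.get(name, []):
            if member.startswith("~"):
                words.extend(self.expand(member[1:], seen))
            else:
                words.append(member)
        return words


tree = ConceptTree()
tree.define("food", ["~fruit", "~meat"])
tree.define("fruit", ["orange", "banana"])
tree.define("meat", ["beef"])      # hypothetical content for the "meat" concept
tree.add("fruit", "apple")         # a new fruit learned during a dialogue
print(tree.expand("food"))
```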

The entry in a dialog of the list 330 activates a grammar in the list of grammars 360 of the speech recognition module 220. The list of entries 370 of the speech recognition module is activated / deactivated synchronously with the list of entries 340 of the dialog module. The modification of a concept in the dynamic concepts tree 350 of the dialog module 230 causes an adaptation of the dynamic inputs 380 of the speech recognition module.

FIG. 4 represents an example of a tree of several dialogue levels according to several embodiments of the invention.

Several dialogues are represented in the figure. They can run in parallel (with a stack of priorities), a dialogue that can replace another.

A dialog comprises three logical levels in the dialogue engine module 230 embedded on the robot:

- A level 410 including the dialogs that are active by default: general dialogs 4110 (greeting, presentation, mood) and a so-called "system" dialog 4120 used to query the state of the robot (battery, temperature, configuration ...) or to give basic commands (get up, walk ...); the possibility not only of obtaining information on the state of the robot's vital functions but also of controlling some of them (going into stand-by mode, connecting to a power supply, etc.) helps lower the psychological barrier felt by non-technical users when confronted with robots;

- A level 420 comprising the routines for selecting dialogs according to the user's inputs, said selection being triggered by a trigger Ut:; several selections 4210, 4220, 4230, 4240, for example, can be programmed;

- A level 430 comprising applications 4310, 4320, 4330, for example, which are dialogue sequences or files and can be launched automatically or manually by a user.

By default, a dialog containing generalities and system commands ('speak louder', for example) is loaded. Trigger sentences can then cause other dialogs to be loaded, for example to:

- Change the subject of discussion (talk about cars, his day ...);

- Explain what the robot knows how to do ('I know how to tell a story'); this part contains dynamic elements: mp3 installed, applications installed; any application that can be launched by voice recognition must contain information: its theme (game, information ...) and optionally a dialogue specifying the application (The robot can indicate that Alice in Wonderland is a story with a small girl...) ;

- Launch the dialogue of an application (an interactive story for example)

A choice can be proposed: guess a famous person, select a behavior, choose a product, look for a person in a company ... The choice can be made either by the robot (the human must understand what the robot wants) or by the human (the robot must understand the human's choice). This choice can be made with a dialogue as described above, but such a dialogue often involves repeating the same sentences, which makes it difficult to write:

U: (guess who I am thinking of) it's a man?

A: (yes) it's a woman?

B: (yes) ...

B: (no) ...

A: (no) it's a fictional character?

The notion of concept makes it possible to traverse a tree of possibilities. A concept is a word related to other words, phrases or concepts.

Concept: (man) ['he breathes' 'it's a human']

Concept: (superman) [~man superhero ~fly ~cape]

Concept: (halliday) [~singer ~man]

Concept: (all) [~superman ~halliday]

The hierarchical nature of the tree of possibilities is illustrated in Figure 4a for the example above.

The words represent the leaves of the tree. The concepts represent the nodes of the tree. Nodes and leaves are elements of pattern matching.

With one entry:

U: (['can he' 'is he'] ~superman) yes

This will match:

Can he fly

Is he a man

Is he superman

We can also propose:

U: (help me) -superman

Here we display one of the leaves of the tree.

User: help me

Robot: he's breathing.
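The two uses of the tree described above (matching a question against any node or leaf of a concept, and giving one of its leaves as a hint) can be sketched in Python. The `TREE` table below simplifies the example: "fly" and "cape" are treated as plain words, which is an assumption, and the function names are hypothetical.

```python
# Simplified, assumed version of the example tree.
TREE = {
    "man": ["he breathes", "it's a human"],
    "superman": ["~man", "superhero", "fly", "cape"],
}


def elements(name):
    """Nodes and leaves reachable from a concept, the concept itself included."""
    found = [name]
    for member in TREE.get(name, []):
        if member.startswith("~"):
            found.extend(elements(member[1:]))
        else:
            found.append(member)
    return found


def leaves(name):
    """Only the leaves (plain words or phrases) reachable from a concept."""
    found = []
    for member in TREE.get(name, []):
        if member.startswith("~"):
            found.extend(leaves(member[1:]))
        else:
            found.append(member)
    return found


def is_about(concept, term):
    """True if `term` is a node or a leaf of the tree rooted at `concept`."""
    return term in elements(concept)
```

Here `is_about("superman", "man")` holds because "man" is a node under "superman", and `leaves("superman")` supplies the candidate hints, the first being "he breathes".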

For the human to guess superman, just write:

U: (is it superman?) yes, you found it!

U: (~superman) yes

Proposal: no, it's not him.

For the robot to guess a character, just write:

U: (guess who I am thinking of) ? ~all

FIG. 5 represents a simplified flowchart of the speech recognition module processes in one embodiment of the invention.

Two levels of speech recognition are superimposed:

- A first level 510 includes a limited number of recognizable words; recognized words must appear in a closed list; examples of voice recognition software of this type are provided by Nuance ™ (Vocon ™ brand), Acapella ™ and, for software using natural language, Dragon ™;

- A second voice recognition level 520 is of the open type, that is to say that the diversity of the recognized words is much greater; examples of voice recognition software of this type are provided in particular by the company Nuance ™ under the brand name NMSP ™; this software makes it possible to handle words that are not known in advance, which are designated by a numbered wildcard $x.

A voice recognition architecture of this type, comprising two levels, one closed 510 and the other open 520 (for example of the voice dictation type), makes it possible to optimize the trade-off between recognition speed and quality. The figure shows how the two types of voice recognition are merged:

- Case 530: the robot is in the same context as the user and what the user says is recognized by the limited recognition; voice dictation is then not necessary;

- Case 540: the robot is not in the same context as the user (the user is talking about a car but the robot thinks he is talking about food); the sentences recognized by voice dictation can then be matched against a dialogue;

- Case 550: type 520 recognition completes type 510 recognition;

- Case 560: open recognition confirms a possible choice of closed recognition;

- Case 570: the robot does not understand what the user says; it makes a proposal to validate the domain of the dialogue or to move to another subject; cases 530 to 560 above can then follow from this prompt by the robot.
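The five cases can be pictured as a dispatch over the results of the two recognizers. The Python sketch below is one possible reading, not the described implementation: the confidence score, the threshold value, the substring test used for confirmation and the `fuse` function itself are all assumptions introduced for illustration.

```python
def fuse(closed, confidence, dictation, threshold=0.6):
    """Combine closed-list and open (dictation) recognition results.

    closed     -- phrase from the closed recognizer, or None
    confidence -- assumed score in [0, 1] attached to the closed result
    dictation  -- text from the open recognizer, or None
    Returns (case, candidates), where case is one of 530..570.
    """
    if closed is None:
        if dictation is None:
            return 570, []           # nothing understood: robot makes a proposal
        return 540, [dictation]      # match the dictation against other dialogs
    if confidence >= threshold:
        return 530, [closed]         # closed recognition is sufficient
    if dictation is None:
        return 570, []               # uncertain result and nothing to confirm it
    if closed.lower() in dictation.lower():
        return 560, [closed]         # dictation confirms the closed result
    return 550, [closed, dictation]  # dictation complements the closed result
```

Running the closed recognizer first keeps the common case fast, and dictation is consulted only when the closed result is missing or uncertain.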

FIG. 6 shows the data flow between several software modules configured to implement the invention in several of its embodiments.

The figure shows the data exchanges between the input events 610, the dialog 620, the output events 630 and an artificial intelligence engine 640 embedded on the robot:

- The dialog 620 waits for input events (for example a smile 6130 or the user's speech 6120);

- The dialog engine can dynamically load new dialogs 6240 or dynamic data 6230 (for example an mp3 file or an application installed on the robot);

- It formulates its response in the form of expressive speech 6310, that is, speech accompanied by information on how to interpret the text (a stage direction for the robot), a behavior 6320, an emotion 6330 or an event 6340;

- The outputs of the dialog can be sent to different artificial intelligence modules 640:

  - Speech and expressions are processed by an expression processing engine 6410, Narrator, using movements and speech synthesis, according to the methods described in particular in the international patent application published under No. WO2011/003628;

  - Emotions are processed by an emotion engine 6420, which makes the robot's emotion evolve and then stabilize over time;

  - A decision engine 6430 decides whether or not to initiate a behavior and can report the decision to the dialog engine in the form of an event; the robot can refuse to get up if the conditions for doing so are not met.

This behavior may be the choice to use voice recognition or keyboard input, as explained above in the commentary on FIG. 4; the behavior triggers speech, and the interruption of speech, according to the actions of the user, for example opening the mouth, turning away, turning the head, etc.

The dialog includes an interpreter 6230 and a dialog model 6240. A dialog model includes:

- A network of dialogues as well as active dialogues;

- The set of dialogue entries as well as the active entries;

- The set of dialogue outputs;

- The set of propositions of the dialogues.

References 6310, 6320, 6330, 6340 represent the outputs of the dialog engine as events.

Figure 6a illustrates the operation of an emotion engine in some embodiments of the invention.

As explained above in comment 2, the robot's emotion is a point in a multidimensional space of emotions (for example SAD, HAPPY, FURIOUS, TIRED ...).

The dialogue engine sends a pulse to the emotion engine, which moves the robot's current emotion; the dialogue engine is not the only source, since battery status, faces encountered and the time of day, for example, also drive the evolution of the emotion. This emotion stabilizes toward the neutral emotion (0,0,0,0,0,0) over time.
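A minimal Python sketch of this behaviour, assuming a six-dimensional emotion vector and an exponential return to neutral. The decay law, the decay factor and the `EmotionEngine` class are illustrative assumptions; the text only states that the emotion is moved by pulses and stabilizes toward neutral over time.

```python
class EmotionEngine:
    """Emotion as a point in a multidimensional space, decaying to neutral."""

    def __init__(self, dimensions=6, decay=0.5):
        self.state = [0.0] * dimensions  # neutral emotion (0,0,0,0,0,0)
        self.decay = decay               # assumed per-step decay factor

    def pulse(self, impulse):
        # A pulse from the dialogue engine (or battery, faces, time ...)
        # displaces the current emotion.
        if len(impulse) != len(self.state):
            raise ValueError("impulse must have one value per dimension")
        self.state = [s + i for s, i in zip(self.state, impulse)]

    def step(self):
        # Each time step pulls the emotion back toward the neutral point.
        self.state = [s * self.decay for s in self.state]
```

After a pulse of `[1, 0, 0, 0, 0, 0]`, one call to `step()` leaves the first coordinate at 0.5, and repeated calls bring the whole vector arbitrarily close to neutral.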

Figure 6b illustrates the operation of a decision engine in some embodiments of the invention.

The decision engine takes into account all the requests for the execution of behaviors and all the constraints of the robot in the form of available resources. A request from the dialogue engine to execute a behavior is only one element of the decision. All of the robot's variables / events take part in the decision (battery, temperature, emotions ...).
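The principle that a behavior request is accepted only if the robot's constraints allow it can be sketched as follows. The behavior names, the threshold values and the `decide` function are hypothetical and chosen purely for illustration.

```python
# Assumed resource requirements per behavior (illustrative values only).
REQUIREMENTS = {
    "stand_up": {"min_battery": 20, "max_temperature": 70},
    "speak": {"min_battery": 5, "max_temperature": 90},
}


def decide(behavior, state):
    """Return (accepted, reason) for a behavior request.

    `state` holds robot variables such as battery (percent) and
    temperature (degrees Celsius); both keys are assumptions.
    """
    limits = REQUIREMENTS.get(behavior)
    if limits is None:
        return False, "unknown behavior"
    if state["battery"] < limits["min_battery"]:
        return False, "battery too low"
    if state["temperature"] > limits["max_temperature"]:
        return False, "temperature too high"
    return True, "ok"
```

The returned reason could be sent back to the dialogue engine as an event, which would let the robot explain, for example, why it refuses to get up.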

FIG. 7 represents the various functions at the input and at the output of the management module of a dialogue for implementing the invention in several of its embodiments.

The figure illustrates that a dialogue 710 takes as input the result of voice recognition 730 as well as keyboard inputs 740 or events 720. Dynamic data 750, such as mp3 files or an application, can also be taken into account. Advantageously, by processing the images received by a camera embedded on the robot, the dialog module analyzes the position of the speaker's head to determine whether the robot is being addressed. Similarly, it can evaluate the position of the lips to determine whether or not the user is speaking and therefore whether it should listen or may speak (item 760).

Face recognition, like speech itself, also makes it possible to determine the name of the current speaker.

A speech response from the dialogue engine can be given by the robot's voice or on a screen 7A0 (or both).

As already indicated, the dialog module is able to trigger the execution of behaviors (element 7B0).

Figure 8 shows the data model of a dialog analysis and interpretation module for implementing the invention in several of its embodiments.

The parser 810 looks up words of a lexicon 8110 in the dialogues 8120 supplied to it as input. The input dialogs have the data model 8140. Libraries 8130 ("Libparser.so"), which parse the contents of the dialogs, perform this function. This makes it possible to build in memory, for the interpreter 820, a model of the dialogues and of all the entries of these dialogues. At runtime, the interpreter maintains a stack 8210 of active dialogs as well as all the active entries for each user. The "parsed" dialogs at the input of the interpreter have the form 8220 and the data model 8240. The interpreter contains libraries 8240 ("Libinterpreter.so") to fulfill its interpretation functions.

Indeed, concepts, variables and current dialogues can be made dependent on the user.

Thus, the following rules allow you to change users:

U: (e:faceRecognition) ($name = $faceRecognition)

U: (my name is _*) ($name = $1)

In this case, the variables that depend on the user (preferences, age, size ...) are automatically reset or assigned according to the user's history.
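The effect of changing users on user-dependent variables can be sketched in Python. The `UserContext` class, its methods and the user names are hypothetical; the sketch only illustrates that a new user starts with empty variables while a known user recovers the stored ones.

```python
class UserContext:
    """Per-user variables, switched when the current speaker changes."""

    def __init__(self):
        self.profiles = {}   # user name -> that user's variables
        self.current = None

    def switch(self, name):
        # Triggered, for example, by face recognition or "my name is ...".
        # A new user starts empty; a known user gets the stored history back.
        self.current = name
        self.profiles.setdefault(name, {})

    def set(self, key, value):
        self.profiles[self.current][key] = value

    def get(self, key, default=None):
        return self.profiles[self.current].get(key, default)


context = UserContext()
context.switch("alice")      # assumed user names
context.set("age", 30)
context.switch("bob")        # bob has no stored age yet
age_for_bob = context.get("age")
context.switch("alice")      # alice's history is restored
age_for_alice = context.get("age")
```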

Behaviors 830 have a data model 8310 of state variables.

FIG. 9 represents the architecture of the software modules implanted on a robot configured to implement the invention in several of its embodiments.

A robot such as NAO is advantageously equipped with high-level software for controlling the functions of the robot in one embodiment of the invention. A software architecture of this type, called NAOQI, has been disclosed in particular in patent application WO2009/124955, published on 15/10/2009. It comprises the basic functions for managing communication between a robot and a PC or a remote site and for exchanging software, which provide the software infrastructure necessary for the implementation of the present invention.

NAOQI is a framework optimized for robotic applications; it supports several languages, including C++, Python, Urbi, Java and Matlab. In the context of the present invention, the following NAOQI modules are particularly useful:

- the ALMemory module 910 manages a memory shared between the different NAOQI modules;

- the ALMotion module 920 manages the movements of the robot;

- the voice synthesis module 930 generates the speech of the robot;

- the closed recognition module 940 performs the functions of reference 510 of FIG. 5;

- the open recognition module 950 performs the functions of reference 520 of FIG. 5;

- the ALDialog module 960 performs the functions of the dialogue engine module, reference 230 in FIG. 2;

- the Narrator module 970 performs the functions of reference 6410 of FIG. 6;

- the decision engine module 980 performs the functions of reference 6430 of FIG. 6;

- the emotion engine module 990 performs the functions of reference 6420 of FIG. 6.

These modules are advantageously coded in C++. The figure also shows the data flows between modules.

As indicated in the commentary on FIG. 2, the dialogs are generated in a dialog editing module 9A0 installed on a standard computer. They can also be generated in the Choregraphe workshop. Consistency between the dialogs of the ALDialog module 960 and those of the editing module 9A0 is ensured. The data flow between the parser 810 and the interpreter 820 (which are shown in FIG. 8) of the dialog engine 960 takes place both on the computer at editing time and on the robot at execution time.

The parser can read a dialog description file u: (....)

The interpreter builds, from the result of the parser (a written dialogue without syntax error), the model of dialogue in memory.
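The division of labour between parser and interpreter can be illustrated with a deliberately small Python sketch that handles only flat rules of the form `u: (input) output`. The regular expression, the `parse` and `build_model` functions and the dictionary used as the in-memory model are assumptions for the example; nested rules, concepts and variables are not handled.

```python
import re

# One flat rule per line: "u: (input pattern) output text".
RULE = re.compile(r"^\s*u:\s*\((?P<input>[^)]*)\)\s*(?P<output>.*)$", re.IGNORECASE)


def parse(text):
    """Parse a dialog description into (input, output) pairs.

    Raises SyntaxError on a line that is not a rule, so the interpreter
    only ever receives a dialogue without syntax errors.
    """
    rules = []
    for number, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # blank lines separate rules
        match = RULE.match(line)
        if match is None:
            raise SyntaxError(f"line {number}: not a rule: {line!r}")
        rules.append((match["input"].strip(), match["output"].strip()))
    return rules


def build_model(rules):
    """Interpreter side: build the in-memory model from the parser result."""
    return {user_input: output for user_input, output in rules}
```

For instance, `build_model(parse("u: (hello) hi there"))` yields a model mapping the input "hello" to the output "hi there".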

The examples described above are given by way of illustration of embodiments of the invention. They in no way limit the scope of the invention which is defined by the following claims.

Claims

1. A humanoid robot (110) comprising: i) at least one sensor selected from a group comprising first sensors of the sound type and second sensors, of at least a second type, of events generated by at least one user of said robot, ii) at least one event recognition module (610) at the output of said at least one sensor, iii) at least one event generation module (630) to said at least one user, and a module (620) of dialogue with said at least one user, said dialog module receiving as input outputs of said at least one recognition module and producing outputs to said event generation module selected from a group comprising words, motions, expressions and emotions, said robot being characterized in that it further comprises an artificial intelligence engine (640) configured to drive the outputs of the event generation module.

2. Humanoid robot according to claim 1, characterized in that the control of the event generation module by the artificial intelligence engine is performed according to the context of the dialogue and variables defining the present and predictive configuration of the robot.

3. Humanoid robot according to one of claims 1 to 2, characterized in that said at least one event recognition module receives inputs from at least two sensors belonging to at least two different types, and in that said at least one event generation module at the output of said dialog module is able to output events taking into account said inputs from said at least two sensors.

4. Humanoid robot according to one of claims 1 to 3, characterized in that said at least one recognition module is able to structure the entries in concepts according to a dynamic hierarchical tree.

5. Humanoid robot according to one of claims 1 to 4, characterized in that an entry in said at least one recognition module applies to textual or voice inputs and activates a grammar in said dialogue module.

6. Humanoid robot according to claim 5, characterized in that an entry in said at least one recognition module activates / deactivates the recognition of said input.

7. Humanoid robot according to one of claims 5 to 6, characterized in that said at least one recognition module comprises a first and a second sub-module, the first sub-module operating on a closed list of words attached to at least one concept and the second sub-module operating on an open list of words.

8. Humanoid robot according to claim 7, characterized in that an output of the first sub-module is provided alone to the dialogue module.

9. Humanoid robot according to claim 7, characterized in that an output of the second sub-module is provided alone to the dialogue module.

10. Humanoid robot according to claim 7, characterized in that an output of the first sub-module and an output of the second sub-module are jointly provided to the dialogue module.

11. Humanoid robot according to claim 7, characterized in that an output of the first sub-module is first supplied to the dialogue module, said output of the first sub-module being confirmed in the dialogue module by an output of the second sub-module.

12. Humanoid robot according to one of claims 7 to 11, characterized in that none of the outputs of the first and second sub-modules generates an output of the dialogue module and in that said robot proposes at least one input to said at least one user.

13. Humanoid robot according to one of claims 1 to 12, characterized in that the dialogue module further receives dynamic input elements from an application.

14. Humanoid robot according to one of claims 1 to 13, characterized in that at least one output of the dialogue module is provided to a module adapted to perform a function selected from a group of functions comprising generating at least one expression of said robot, deciding to generate at least one behavior of said robot and generating at least one emotion of said robot.

15. Humanoid robot according to claim 14, characterized in that said function of generating at least one behavior takes into account the constraints of the system of said robot.

16. Humanoid robot according to claim 14, characterized in that said function of generating at least one emotion is able to generate a sequence of predefined expressions between a neutral state and a predefined state in response to input events.

17. Humanoid robot according to one of claims 1 to 16, characterized in that it further comprises a visual recognition module, said module being able to interpret at least one sign of said at least one user as a beginning or an end of a sequence of a dialogue.

18. Humanoid robot according to one of claims 4 to 17, characterized in that said dialog module comprises a lexical analysis sub-module and a sub-module for interpreting the outputs of said lexical analysis sub-module, able to generate concepts to which the words of the current dialogue are attached.

19. Humanoid robot according to one of claims 1 to 18, characterized in that said dialog module is able to handle questions and commands of said at least one user relating to the state of its physical and / or logical system.

20. A method of dialogue between a humanoid robot and at least one user, comprising: i) at least one step of recognizing inputs from at least one sensor selected from a group comprising first sensors of the sound type and second sensors, of at least a second type, of events generated by said at least one user, ii) a step of generating events to said at least one user, and iii) a step of dialogue with said at least one user, said dialogue step receiving as input outputs of said at least one recognition step and producing outputs to said event generation step selected from a group comprising words, motions, expressions and emotions, said method being characterized in that it further comprises a step of controlling the outputs of the event generation module by an artificial intelligence engine.

21. Dialogue method according to claim 20, characterized in that the control of the event generation module by the artificial intelligence engine is performed according to the context of the dialogue and variables defining the present and predictive configuration of the robot.

22. Dialogue method according to one of claims 20 to 21, characterized in that said robot dialogues with at least two users, parameters characterizing said at least two users being stored in a memory of said robot to be used when said robot recognizes one of the at least two users.

23. Computer program embedded on a humanoid robot comprising program code instructions for executing the method according to one of claims 20 to 22 when the program is executed on a computer, said program being adapted to manage a dialogue between said humanoid robot and at least one user, said computer program comprising: i) at least one event recognition module at the output of at least one sensor selected from a group comprising first sensors of the sound type and second sensors, of at least a second type, of events generated by said at least one user, ii) at least one event generation module to said at least one user, and iii) a module of dialogue with said at least one user, said dialog module receiving as input outputs of said at least one recognition module and producing outputs to said event generation module chosen from a group comprising words, movements, expressions and emotions, said program being characterized in that it further comprises an artificial intelligence engine configured to control the outputs of the event generation module.

24. A computer program comprising program code instructions configured to generate a computer program according to claim 23 and to transmit it to at least one humanoid robot, said instructions being generated in a ChatScript type interface.