EP2596493A1 - Humanoid robot equipped with a natural dialogue interface, method for controlling the robot and corresponding program - Google Patents

Humanoid robot equipped with a natural dialogue interface, method for controlling the robot and corresponding program

Info

Publication number
EP2596493A1
Authority
EP
European Patent Office
Prior art keywords
channel
robot
message
interlocutor
messages
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP11730675.3A
Other languages
German (de)
English (en)
French (fr)
Inventor
Bruno Maisonnier
Jérôme MONCEAUX
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Softbank Robotics SAS
Original Assignee
Aldebaran Robotics SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aldebaran Robotics SA filed Critical Aldebaran Robotics SA
Publication of EP2596493A1

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B15/00 - Systems controlled by a computer
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00 - Controls for manipulators
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00 - Manipulators not otherwise provided for
    • B25J11/0005 - Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00 - Controls for manipulators
    • B25J13/003 - Controls for manipulators by means of an audio-responsive input
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/004 - Artificial life, i.e. computing arrangements simulating life
    • G06N3/008 - Artificial life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present invention belongs to the field of humanoid robots. More specifically, it applies to methods of programming and using dialogue interfaces with a robot of this type, to enable said robot to perform actions on the orders of a user, to provide adequate responses and, more generally, to establish "humanoid relations" between said robot and its interlocutors.
  • a robot can be called humanoid from the moment it has certain attributes of the appearance and functionality of a human being: a head, a trunk, two arms, possibly two hands, two legs, two feet, etc. Beyond appearance, the functions that a humanoid robot is able to fulfill depend on its ability to perform movements, to talk and to reason. Humanoid robots are able to walk and to make gestures with the limbs or with the head, and the complexity of the gestures they are able to perform increases constantly.
  • the present invention solves this problem by providing a dialog interface with a humanoid robot that uses a natural mode of confirming responses.
  • the present invention discloses a humanoid robot comprising at least two channels for communicating messages with at least one interlocutor according to different modalities, said at least two channels each being chosen from the group comprising reception channels and transmission channels, and an input/output control module for said channels, said robot being characterized in that said control module is able to improve the understanding of the messages received by said robot by executing at least one function selected from the group comprising: combination of messages received/transmitted on a first channel and on a second channel; sending of a second message generated from a first message received on a channel.
  • said communication channels are chosen from the group of communication channels transmitting and/or receiving sound messages, visual messages, tactile messages, displacements and/or positions of at least part of the robot, and digital messages.
  • a first communication channel is a sound transmission channel and a second communication channel is a reception channel for displacements and/or positions of at least a part of the robot imparted by said at least one interlocutor, said displacements and/or positions being representative of inputs communicated by the interlocutor to the robot, the specifications of said inputs being defined by the robot for the interlocutor by the message transmitted on the first channel.
  • the robot of the invention further comprises a third touch communication channel by which the interlocutor validates the inputs made on the second channel.
  • a first communication channel is a sound message reception channel and a second communication channel is a sound message transmission channel, and said control module is able to evaluate the level of confidence of the understanding by said robot of a first message received on said first channel and to generate at least a second message on said second channel whose content depends on said confidence level.
  • the first channel comprises a voice recognition filter that filters the messages received against a list of expressions, each of which is associated with an expected recognition rate, and the content of said second message is chosen by a heuristic from the group comprising: request for repetition of said first message on the first channel; request for confirmation, by a third message to be sent by the interlocutor on the first channel, of a subset of the expressions of the filter; request for transmission by the interlocutor of at least one other message on at least one third channel.
  • the robot of the invention is able to transmit on the second channel a start-of-listening signal for the first channel, in order to ensure half-duplex sequencing of the messages on the first and second channels.
  • said choice heuristic is a function of the position of the real recognition rates with respect to thresholds determined from the expected recognition rates.
  • said third channel is a touch reception channel or displacements of a part of the robot.
  • the robot of the invention further comprises an interface module with an electronic messaging service, said interface module allowing the holder of an account on said messaging service to use said robot as an agent for receiving/reading electronic messages on the second channel, writing/forwarding them on the first channel, and administering said account by dialogue using said first and second channels.
  • said third channel is a visual reception channel for images of objects corresponding to the list of expressions of the filter of the first channel, said images being compared to an image database of said objects previously recorded with said accessible expressions.
  • a first communication channel is a visual message reception channel and a second communication channel is a sound message transmission channel, and said control module is able to evaluate the level of confidence of the understanding by said robot of a first message received on said first channel and to generate at least a second message on said second channel whose content depends on said confidence level.
  • the first channel comprises an image recognition filter that filters the messages received against a list of expressions, each of which is associated with an expected recognition rate, and the content of said second message is chosen by a heuristic from the group comprising: request for repetition of said first message on the first channel; request for confirmation, by a third message to be sent by the interlocutor on a third channel for receiving sound messages, of a subset of the expressions of the filter; request for transmission by the interlocutor of at least one other message on at least one fourth channel.
  • at least one of the channels is a hybrid channel receiving as inputs the outputs of two channels merged by said input and output control module.
  • the invention also discloses a method for controlling the communications of a humanoid robot with at least one interlocutor, comprising at least two steps of communicating messages over channels using different modalities, said two steps each being chosen from the group comprising reception steps and transmission steps, and a step of controlling the inputs/outputs of said channels, said method being characterized in that said control step is able to improve the understanding of the messages received by said robot by performing at least one function selected from the group comprising: combination of messages received/transmitted on a first channel and on a second channel; sending of a second message generated from a first message received on a channel.
  • the invention also discloses a computer program comprising program code instructions for executing the above method when the program is run on a computer, said program being adapted to allow a humanoid robot comprising at least two channels for communicating messages with at least one interlocutor according to different modalities, said at least two channels each being chosen from the group comprising reception channels and transmission channels, and a subprogram for controlling the inputs/outputs of said channels, said computer program being characterized in that said control subprogram is able to improve the understanding of the messages received by said robot by executing at least one function selected from the group comprising: combination of messages received/transmitted on a first channel and on a second channel; transmission of a second message generated from a first message received on a channel.
  • the invention also discloses a method of developing a communication interface between at least one humanoid robot and at least one interlocutor, said at least one humanoid robot comprising at least two channels for communicating messages with the at least one interlocutor according to different modalities, said at least two channels each being chosen from the group comprising reception channels and transmission channels, and an input/output control module for said channels, said control module being able to improve the understanding of the messages received by said robot by performing at least one function selected from the group comprising combination of received/transmitted messages on a first channel and a second channel, and transmission of a second message generated from a first message received on a channel, said method being characterized in that it comprises a step of programming said chosen function.
  • said step of programming said chosen function comprises at least one substep of defining a first communication channel as a sound transmission channel and a second communication channel as a channel for receiving displacements of at least one robot limb imparted by said at least one interlocutor, a substep of defining a correspondence between said displacements and inputs communicated by the interlocutor to the robot, and a substep of defining the specifications of said inputs by generating at least one message to be transmitted by the robot to the interlocutor on the first channel.
  • the development method of the invention further comprises a substep of defining a third touch communication channel by which the interlocutor validates the inputs made on the second channel.
  • the steps of the development method of the invention are carried out via at least one control box in which a main action frame to be performed by said robot is connected to at least one event selected from the group comprising events antecedent to and events successor to the action to be programmed, and is programmed to take place according to a temporal constraint predefined by a Timeline.
  • said programming step of said selected function comprises at least one sub-step of defining a first channel of communication as a sound message receiving channel and a second communication channel as a sound message transmission channel, a sub-step of defining a confidence level evaluation function of the understanding by said robot of a first message received on said first channel and a substep of defining the generation of at least a second message on said second channel whose content depends on said confidence level.
  • the development method of the invention further comprises a substep of defining a voice recognition filter that filters the messages received on the first channel against a list of expressions, each of which is associated with an expected recognition rate, and a substep of defining the content of said second message by a heuristic chosen from the group comprising: request for repetition of said first message on the first channel; request for confirmation, by a third message to be sent by the interlocutor on the first channel, of a subset of the expressions of the filter; request for transmission by the interlocutor of at least one other message on at least one third channel.
  • the steps of the development method of the invention are carried out via at least one control box in which a main action frame to be performed by said robot is connected to at least one event selected from the group comprising events antecedent to and events successor to the action to be programmed, and is programmed to take place according to a temporal constraint predefined by a Timeline, said command box being a Choice-type box.
  • the invention also discloses a computer program comprising program code instructions for executing the above development method when the program is run on a computer, said program being adapted to allow a user to program a humanoid robot comprising at least two channels for communicating messages with at least one interlocutor according to different modalities, said at least two channels each being selected from the group comprising reception channels and transmission channels, and a subprogram for controlling the inputs/outputs of said channels, said computer program being characterized in that it comprises a module for programming, in the control subprogram, at least one function to be executed by the robot, selected from the group comprising combination of messages received/transmitted on a first channel and a second channel, and sending of a second message generated from a first message received on a channel.
  • the computer program of the invention further comprises a module for programming the passage of at least one parameter to a control box.
  • the computer program of the invention further comprises a module for programming the return of the inputs of a visual communication channel of the robot in the interface of said program.
  • the computer program of the invention further comprises a module for programming behaviors of the robot running in parallel.
  • the interface of the invention also has the advantage of offering multimodal confirmation modes that can easily be adapted to the environment in which the dialogue is taking place, for example if the ambient noise is too high for voice recognition to be effective.
  • the user can be asked to replace / confirm ambiguous answers with a touch, a gesture or the display of a particular numerical symbol, color or shape.
  • the user has at his disposal means enabling him to intuitively replace or emulate the traditional interfaces that he is used to using when he is facing his computer or using a smart phone or a touch pad.
  • the modes of expression of the robot can themselves be multimodal, combining inter alia intonation, gaze and gesture to hold the attention of its interlocutor and communicate emotions or clues about the answers to provide.
  • the interface of the invention contributes to improving the results of the recognition system and to enhancing the quality of the experience of the user immersed in a "real virtuality", that is to say a dialogue with a physically embodied avatar.
  • the invention also provides an environment for developing these interfaces, ergonomic and versatile, which makes it very easy to create, in a very short time, new interaction scenarios specially adapted for uses of the robot not imagined by its designer.
  • FIG. 1 is a diagram of the physical architecture of a humanoid robot in several embodiments of the invention
  • FIG. 2 illustrates the head of a humanoid robot comprising sensors that are useful for implementing the invention in several of its embodiments;
  • FIG. 3 is a schematic diagram of the architecture of high level software for controlling the functions of the robot in several embodiments of the invention.
  • FIG. 4 is a diagram of the functional architecture for editing and programming the behaviors / interactions of a robot in several embodiments of the invention
  • FIG. 5 is a functional flowchart of the processing operations generally applied to improve the interpretation given by a humanoid robot of the responses/stimuli it receives, in several embodiments of the invention;
  • FIG. 6 is a logic diagram for programming the behaviors / interactions of a robot in several embodiments of the invention.
  • FIGS. 7a, 7b and 7c represent timing diagrams illustrating the logical and temporal combination of the interactions of a multimodal interface in several embodiments of the invention.
  • FIGS. 8a, 8b, 8c, 8d and 8e show a series of screens making it possible to program a dialogue with a humanoid robot with a binary choice and an option of changing the interaction language, in one embodiment of the invention;
  • FIGS. 9a, 9b, 9c, 9d and 9e show a series of screens making it possible to program a dialogue with a humanoid robot with choices in a list and an option of changing the interaction language, in one embodiment of the invention;
  • FIGS. 10a, 10b, 10c and 10d show a series of screens making it possible to perform a comparative speech recognition test between several options of a list of choices in one embodiment of the invention
  • FIGS. 11a and 11b show a series of screens making it possible to replace or supplement options of a list of choices and to perform a new comparative speech recognition test between several options, in one embodiment of the invention;
  • FIGS. 12a, 12b, 12c and 12d show a series of screens making it possible to perform a comparative voice recognition test between several options of a list of choices in a language different from that of the question, in one embodiment of the invention;
  • FIGS. 13a, 13b, 13c and 13d show a series of screens making it possible to check / modify the thresholds of the comparative speech recognition tests between several options of a list of choices in one embodiment of the invention.
  • FIG 1 illustrates the physical architecture of a humanoid robot in one embodiment of the invention.
  • a humanoid robot has been disclosed in particular in the patent application WO2009/124951 published on 15/10/2009.
  • This platform served as a basis for the improvements that led to the present invention.
  • this humanoid robot can be indifferently referred to under this generic name or under its trademark NAO TM, without the generality of the reference being modified.
  • This robot comprises about two dozen electronic cards of type 110 which control the sensors and actuators that drive the joints.
  • the card 110 shown in the figure is the one that controls the left foot.
  • one of the virtues of this architecture is that the cards controlling the joints are for the most part interchangeable.
  • a joint normally has at least two degrees of freedom and therefore two motors.
  • the joint also includes several position sensors, including MRE (Magnetic Rotary Encoder).
  • the electronic control card includes a commercial microcontroller. It can be for example a DSPIC TM of the company Microchip. It is a 16-bit MCU coupled to a DSP. This MCU has a servo loop cycle of one ms.
  • the robot can also include other types of actuators, including LEDs (electroluminescent diodes) whose color and intensity can reflect the emotions of the robot. It may also include other types of position sensors, including an inertial unit, FSRs (ground pressure sensors), etc.
  • the head 160 includes the intelligence of the robot, including the card 130 which performs the high-level functions that allow the robot to perform the tasks assigned to it, including, in the context of the present invention, participation in games.
  • the card 130 could however be located elsewhere in the robot, for example in the trunk. Its location in the head, which is removable, makes it possible to replace these high-level functions and thus, in particular, to completely change the intelligence of the robot, and therefore its missions, very quickly; or, conversely, to exchange one body for another (for example a defective body for a non-defective one) while keeping the same artificial intelligence.
  • the head may also include specialized cards, especially in the speech or vision processing or also in the processing of service inputs / outputs, such as the encoding necessary to open a port to establish a communication remotely over Wide Area Network (WAN).
  • the processor of the card 130 may be a commercial x86 processor.
  • a low-power processor such as the Geode TM from AMD (32-bit, 500 MHz) will be favorably selected.
  • the card also includes a set of RAM and flash memories. This card also manages the communication of the robot with the outside world (behavior server, other robots, etc.), normally over a WiFi or WiMax transmission layer, possibly over a public mobile data communications network with standard protocols, possibly encapsulated in a VPN.
  • the processor is normally controlled by a standard OS, which makes it possible to use the usual high-level languages (C, C++, Python, etc.) or specific artificial intelligence languages such as URBI (a programming language specialized in robotics) for programming high-level functions.
  • a card 120 is housed in the trunk of the robot. This is the location of the computer that transmits to the cards 110 the orders calculated by the card 130. This card could be housed elsewhere in the robot, but the location in the trunk is advantageous because it is near the head and at the crossroads of the four limbs, which minimizes the wiring connecting the card 130 to the card 120 and to the cards 110.
  • the processor of this card 120 is also a commercial processor. It may advantageously be a 32-bit processor of the ARM 9 TM type clocked at 100 MHz. The type of processor, its central position close to the on/off button, and its connection to the control of the power supply make it a tool well adapted to managing the power supply of the robot (standby mode, emergency stop, etc.).
  • the card also includes a set of RAM and flash memories.
  • FIG. 2a and 2b respectively show a front view and a side view of the head of a humanoid robot having sensors useful for the implementation of the invention in several of its embodiments.
  • the head 160 of Figure 1 is improved to a head 200a, 200b, so as to provide the robot with sensory capabilities and expressions that are useful in the practice of the present invention.
  • NAO has 4 omnidirectional microphones 211a, 212a, 213a, 214a, for example KEEG1540PBL-A provided by Kingstate Electronics Corp., 211a at the front, 214a at the back, and 212a and 213a on each side of the head (see also Figure 2b); only their access holes to the outside are visible in the figures, because the microphones themselves are distributed inside the head.
  • a voice recognition and analysis system, for example a BabEAR TM system provided by the company Acapela TM, recognizes a corpus of predefined words that a user having the appropriate interfaces, presented further on in the description, can enrich with his own terms. These words trigger behaviors of his choice, including answers to questions interpreted by the robot.
  • the software environment supports multiple languages, as indicated later in the description. NAO is also able to detect the origin of a sound, which allows it to remove ambiguities between several speakers.
  • NAO sees through two 640x480 CMOS cameras, 220a, capable of capturing 30 frames per second, for example cameras of the Omnivision TM brand, reference 0V760 (1/6th-inch CMOS sensor, 3.6 µm pixels).
  • the first camera placed at the forehead is pointed towards its horizon, while the second placed at the level of the mouth, scrutinizes its immediate environment.
  • the software can retrieve photos of what NAO sees and the video stream.
  • NAO embeds a set of face and shape detection and recognition algorithms, which allow it to recognize its interlocutor and to locate a ball as well as more complex objects.
  • NAO is equipped with a capacitive sensor, 230a, for example divided into three sections and developed specifically by the applicant for this application. More than three sections could be provided for particular applications. It is thus possible to give information to NAO by touch, for example by pressing a series of buttons allowing the triggering of actions defined by the application, which may be, in the context of the present invention, different responses associated with each button, scrolling through a list of choices, access to a help menu, etc.
  • the system is accompanied by LEDs that indicate whether there is contact.
  • NAO can express itself by reading aloud any text file residing locally in its storage space, for example programmed according to the modes explained later in the description or retrieved from a website or an RSS feed.
  • with two loudspeakers, 210b, arranged on each side of the head, its voice synthesis system, for example Acapela TM, is configurable, which makes it possible to change in particular the speed and/or tone of the voice.
  • FIG. 3 is a diagram of the architecture of high level software for controlling the functions of the robot in one embodiment of the invention.
  • in FIG. 3 are very schematically represented a first humanoid robot RH1 communicating with a first remote terminal TD1, for example by wireless link for reasons of mobility.
  • "remote terminal" means a terminal remote from the server platform PFS, which provides, via a communication network, access to a web service SW dedicated to this type of humanoid robot RH1.
  • a second humanoid robot RH2 communicates with a second remote terminal TD2, for example also by wireless link so as not to hinder the mobility of the humanoid robot RH2.
  • the TD1 and TD2 remote terminals and the PFS server platform are networked via the RC communication network.
  • the respective modules M51, M52, M21, M22, M41, M42, M11, M12, M31, M32 of the linking modules B5, B2, B4, B1 and B3 are in this example represented as two per linking module, but this number may be different and arbitrary for each linking module.
  • the first humanoid robot RH1 triggers the module M11, which must first use a "walk" function.
  • the module M11 then uses a connection and function-call interface module, or proxy, P1, which makes a request to the linking module B1 to which the module M11 is linked.
  • the linking module B1 sends requests to its own modules and to the linking modules of the network to which it is directly connected (child linking modules), which repeat this operation iteratively until a linking module of the network responds to the request with the location, in one of its modules, of the called function.
  • the response to the request is then transmitted iteratively back (in the opposite direction) by the parent linking modules to the linking module B1 directly linked to the proxy P1 that needs to connect to and call this function.
  • the function requested for the step is located in the module M41 of the second remote terminal TD2.
  • the linking module B4 returns the call parameters of the "walk" function, which, for example, comprise an integer Duration parameter in seconds representing the duration during which the robot is going to walk, and an Exclusive parameter, of Boolean type, indicating whether the walk is exclusive or not, i.e. whether or not the robot is allowed to perform another action while walking.
  • the walk function is then called with the Duration parameter equal to 10 and the Exclusive parameter equal to 1, because in this example we want the robot to speak only after having walked for 10 seconds.
  • the connection and call interface module P1 can thus make the connection and the call of the "walk" function with the desired parameters, remotely, as if it were located locally.
  • the connection and function-call interface modules use intercommunication software capable of calling a function of a module located on a different terminal or server, the function possibly being written as a series of instructions in a computer language different from that of the calling module.
  • proxies use, for example, the SOAP intercommunication software. We therefore have an inter-platform and inter-language communication architecture. Once this relocated "walk" function has been carried out, the module M11 must call a "speak" function.
  • another connection and function-call interface module, or proxy, P2 makes a request to the linking module B1 to which the module M11 is linked.
  • the linking module B1 first makes a request to its own modules M11 and M12, through a function implemented as a sequence of stored instructions, which will, for example, return the presence of this "speak" function in the module M12.
  • the linking module B1 informs the connection and function-call interface module P2, which can then call directly, by a local-type call, the "speak" function of the module M12, with as parameter, for example, the text to say, "hello", this parameter having been transmitted to the proxy P2 by the linking module B1.
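  • By way of illustration only, the call sequence just described, in which a proxy locates the "walk" and "speak" functions through the linking modules and then invokes them as if they were local, could be sketched as follows; the Python classes and names below are assumptions made for the sketch, not the actual NAOQI API:

```python
# Illustrative sketch only: hypothetical classes mimicking the proxy / linking
# module mechanism described above (this is not the NAOQI API).

class LinkingModule:
    """A linking module (e.g. B1): knows its own modules and its child linking modules."""
    def __init__(self, modules, children=()):
        self.modules = dict(modules)      # local modules, e.g. {"M41": M41()}
        self.children = list(children)    # child linking modules (B2, B3, ...)

    def locate(self, function_name):
        # Look first in the local modules...
        for module in self.modules.values():
            if hasattr(module, function_name):
                return module
        # ...then propagate the request iteratively to the child linking modules.
        for child in self.children:
            module = child.locate(function_name)
            if module is not None:
                return module
        return None


class Proxy:
    """Connection and function-call interface module (P1, P2 in the description)."""
    def __init__(self, linking_module):
        self.linking_module = linking_module

    def call(self, function_name, *args, **kwargs):
        module = self.linking_module.locate(function_name)
        if module is None:
            raise RuntimeError("function %r not found on the network" % function_name)
        # The call is made as if the function were local, even if the module is remote.
        return getattr(module, function_name)(*args, **kwargs)


# Scenario of the description: M11 needs "walk" (found in M41 behind B4) then "speak" (in M12).
class M41:
    def walk(self, duration, exclusive):
        print("walking for %d s (exclusive=%s)" % (duration, exclusive))

class M12:
    def speak(self, text):
        print("saying: %s" % text)

b4 = LinkingModule({"M41": M41()})
b1 = LinkingModule({"M12": M12()}, children=[b4])

Proxy(b1).call("walk", duration=10, exclusive=True)   # Duration = 10, Exclusive = 1
Proxy(b1).call("speak", "hello")
```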
  • the system comprises an STM storage and management module ("STM" being short for "Short Term Memory") for parameters representative of the state of the mobile terminal, in this case the humanoid robot RH1, adapted to update the values of said parameters on receipt of an external event and to inform a module, upon prior request, of an update of one of said stored parameters. The module thus notified will then be able to initiate an action according to the parameter modifications of which it was informed.
  • the STM storage and management module can memorize the state of a parameter representative of the appearance of someone detected by a motion detector of the robot RH1.
  • when this parameter passes from a state representing no person in the immediate environment of the robot to a state representing someone present in the immediate environment of the robot, the STM storage and management module notifies the module M11, which had previously made a request to that effect, of this change of value by an event or signal.
  • the module M11 can then, for example, automatically trigger the sequence described above (the "walk" and "speak" functions).
  • the storage and management module STM is part of the remote terminal TD1, but, as a variant, it can be part of the other remote terminal TD2, of the server platform PFS, or a humanoid robot RH1 or RH2.
  • the STM storage and management module is also capable of storing in memory a temporal evolution of certain parameters over respective reference time intervals.
  • a module of the system can, in addition, have access to the evolution of the values of these parameters for a certain duration, and take into account these changes in the actions to be carried out.
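  • As a rough illustration of the notification mechanism just described, a module can register its interest in a parameter of the STM module and be called back when the value changes; the Python names and API below are assumptions for the sketch, not the actual STM implementation:

```python
# Illustrative sketch only: a toy "Short Term Memory" with update notification
# and history, mimicking the STM module described above (names are assumptions).

from collections import defaultdict

class ShortTermMemory:
    def __init__(self):
        self.values = {}
        self.subscribers = defaultdict(list)   # parameter name -> callbacks
        self.history = defaultdict(list)       # parameter name -> [(timestamp, value)]

    def subscribe(self, name, callback):
        """A module asks to be informed of updates of the parameter `name`."""
        self.subscribers[name].append(callback)

    def update(self, name, value, timestamp):
        """Called on receipt of an external event (e.g. output of a motion detector)."""
        old = self.values.get(name)
        self.values[name] = value
        self.history[name].append((timestamp, value))   # temporal evolution kept in memory
        if value != old:
            for callback in self.subscribers[name]:
                callback(name, old, value)


# Example: a module reacts when the "presence" parameter changes.
def on_presence_change(name, old, new):
    if new == "someone_present":
        print("person detected -> trigger the walk/speak sequence")

stm = ShortTermMemory()
stm.subscribe("presence", on_presence_change)
stm.update("presence", "nobody", timestamp=0.0)
stm.update("presence", "someone_present", timestamp=12.5)   # notifies the subscriber
```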
  • the modules of the called functions can be located on the server platform PFS, on a humanoid robot RH1, RH2 or on a remote terminal TD1, TD2 of the communication network RC.
  • the present invention makes it possible to have a program distributed over the network, and an identical operation of the mobile terminal, whether it makes a local or remote call to a function.
  • the present architecture also makes it possible to have a set of stored parameters representative of the state of the mobile terminal, and to be able to take account of changes in this state to trigger certain actions automatically.
  • the storage and management module can also record an evolution of parameter values during a predetermined time interval, which allows a module to have access to a history of the evolution of these parameters.
  • NAOQI is the system for operating and managing the robot's interfaces.
  • FIG. 4 is a diagram of the functional architecture for editing and programming the behaviors of a robot in one embodiment of the invention.
  • such an architecture has been described in patent application PCT/EP2010/057111 filed on 25/05/2010.
  • the software for editing and programming the behaviors of a humanoid robot for implementing said architecture is commercially known as Choregraph TM, and may be denoted by its generic name or by its commercial name, without altering the generality of the references.
  • the robot controlled by this architecture may be a humanoid robot having a head, a trunk and four members, each of the parts being articulated, each articulation being controlled by one or more motors.
  • This architecture allows a user of the system to control such a robot by creating simulated behaviors on a virtual robot and executed on the real robot connected to the system by a wired or wireless link.
  • behaviors are, for example, walking (straight ahead, to the right or to the left, for n steps), a "hello" (movements of one of the arms above the head), speech, etc.
  • movements are, for example, movements of the head or of a part of a limb, through a given angle.
  • Figure 4 is a process flow diagram that illustrates the articulation of the commands triggered by events with their temporal dimension. Commands triggered by events are represented in the semantics of the invention by "Boxes" or "Control Boxes" 410.
  • a Box is a tree-based programming structure that may include one or more of the following elements that are defined next:
  • a Timeline or time axis of Frames 420;
  • Control boxes are normally interconnected by connections that most often transmit event information from one Box to another, as detailed later in the description. Any Box is connected directly or indirectly to a "Root Box” or Root that initializes the behavior / motion scenario of the robot.
  • a time axis of Frames 420 represents the temporal constraint to which are subjected the behaviors and movements of the robot defined in the Box into which said time axis of Frames is inserted.
  • the Timeline thus synchronizes the behaviors and movements of the Box. It is divided into frames which are associated with a run rate defined in number of frames per second or Frames Per Second (FPS).
  • the FPS of each Timeline is customizable by the user. By default, the FPS can be set to a given value, for example 15 FPS.
  • a Timeline can include:
  • Behavior Layers 430, each comprising one or more Behavior Key Frames or "Main Behavior Frames" 450, which may themselves include one or more Diagrams or "Flow Diagrams" 470, which are actually sets of Boxes that can also be directly attached to a higher-level Box, without going through a Behavior Layer or a Timeline;
  • one or more Motion Layers 440, each comprising one or more Motion Key Frames or "Main Motion Frames" 460, which may include one or more Motion Screens 480.
  • a Behavior Layer defines a set of behaviors of the robot, or Main Behavior Frames. Several Behavior Layers can be defined within the same Box. They will then be programmed to run synchronously by the Timeline of the Box.
  • a Behavior Layer may include one or more Main Behavior Frames.
  • a Main Behavior Frame defines a behavior of the robot, such as walking ("Walk"), speech ("Say"), playing music ("Music"), etc.
  • a certain number of behaviors are pre-programmed in the system of the invention to be directly inserted by the user in a simple "drag and drop” from a library as detailed later in the description.
  • each Main Behavior Frame is defined by a triggering event, which is the Frame at which it is inserted into the Timeline.
  • the end of the Main Behavior Frame is defined only to the extent that another Main Behavior Frame is inserted after it, or if an end event is defined.
  • a Motion Layer defines a set of robot motions that are programmed by one or more successive Main Motion Frames grouping movements of the robot's joint motors. The movements to be executed are defined by the angular positions of arrival of said motors, which can be programmed by acting on Motion Screens, said actions being detailed further on in the description. All the Main Motion Frames of the same Box are synchronized by the Timeline of the Box.
  • a Main Motion Frame is defined by an Arrival Frame. The starting frame is the ending frame of the previous main movement frame or the start event of the box.
  • the Main Behavior Frames and the Main Motion Frames are commonly referred to as Main Action Frame.
  • a flow diagram is a set of connected boxes, as detailed below.
  • Each of the Boxes may in turn include other timelines to which new patterns of behavior or movement are attached.
  • a script is a program directly executable by the robot.
  • the scripts are preferentially written in the C++ language.
  • a Box that includes a script does not include any other element.
  • the software can be installed on a PC or other personal computer platform using a Windows TM, Mac TM or Linux TM operating system.
  • the humanoid robot of the present invention will generally be programmed to interact with a human being using the Choregraph TM software.
  • the combination of temporal and behavioral logic made possible by this development architecture is particularly advantageous for the implementation of the present invention.
  • a number of tools, discussed later in the following description, have been particularly developed for the implementation of a humanoid robot with a natural dialogue interface in the context of the present invention.
  • FIG. 5 is a functional flowchart of the processing operations generally applied to improve the interpretation given by a humanoid robot of the responses/stimuli it receives, in several embodiments of the invention.
  • examples of classical GUI components (GUI Elements or Graphical User Interface Elements) are text boxes, OK/Cancel buttons, checkboxes, radio buttons or combo boxes.
  • existing autonomous robots can implement simple human-robot interfaces, such as voice recognition, but in the prior art no multimodal, regionalized (allowing multilingualism) and failure-managing user interface elements are provided to users and developers.
  • the human does not speak naturally to a robot because he does not find his human references, that is to say, the gestures and the behaviors that a human would have in the same situation.
  • the interaction will not be particularly natural if the robot does not look in the direction of the human, as is usual in human-human interaction.
  • the type of voice recognition compatible with the computer resources embedded on a multi-function humanoid robot does not effectively manage interactions with multiple users.
  • speech synthesis is usually programmed with sentences pre-written by humans, whether a story invented for the robot or an email written by a human that the robot will read aloud. There is therefore a lack of elements to bring human-robot interaction closer to human-human interaction.
  • the human-robot interfaces of the prior art do not have enough multi-modality or interaction codes to simulate a natural human-human interaction and contribute to the success of the interaction.
  • if the interface uses knowledge already acquired by the user, and especially knowledge he uses daily, the experience will be much easier and will require little learning from the user.
  • sweeping one's gaze around a room in a virtual world is done far more instinctively with a virtual-reality headset, by moving one's head, than by pressing the arrow keys of a computer keyboard.
  • the solution of the invention proposes user interface elements, combining software and hardware, adapted to an autonomous humanoid robot.
  • the invention transposes the GUI Elements mentioned above to the behaviors of a robot: UIElements can, for example, be defined to simply code interaction actions of this kind.
  • the UIElements of the invention are elements that can be used and parameterized easily by a behavior developer. These are mainly Choregraphe boxes that become basic elements for programming behaviors, analogous to GUI elements. Notably, some of these boxes include Choregraphe plugins, coded in C++ using a Widget library produced with the Qt TM environment for developing GUI components.
  • this module comprises, physically or logically, the transmission/reception preprocessing means of the specialized communication channels with which the robot is equipped.
  • a type 1 receiver channel 521 corresponds to human hearing and enables the robot to acquire sound signals, preferably voice messages with semantic content.
  • the robot can be equipped with microphones 210a shown in Figure 2a.
  • the outputs of this channel are normally preprocessed by a dedicated signal processing processor that executes speech recognition algorithms. These algorithms can be more or less complex, and their effectiveness varies depending on the environment in which they are used (ambient noise, multiple speakers, etc.) and on how complete the specific training carried out has been. In all configurations, however, recognition errors are unavoidable.
  • a type 1 transmitter channel 531 corresponds to human speech and enables the robot to speak, that is to say to pronounce voice messages with semantic content, for example via the loudspeakers 210b represented in Figure 2b. The language, timbre, rhythm and tone of the voice can be varied depending on the context and to express a feeling. These sounds can also be beeps or prerecorded music, it being understood that beeps, in a Morse sequence for example, and music, according to pre-established codes, can also carry semantic content.
  • a type 2 receiver channel 522 corresponds to human vision and allows a robot to locate its environment and acquire images that it can then recognize if they are stored in a memory accessible to it.
  • the robot can be equipped for this purpose with, for example, the CMOS cameras 220a shown in Figure 2a.
  • One of the cameras is preferably dedicated to distant vision, the other to near vision.
  • the image recognition algorithms are adapted to allow detection or recognition of the faces of the interlocutors of the robot. Again, whatever the performance of recognition, uncertainties or errors are inevitable.
  • Image recognition can also be applied to simple shapes such as figures presented to the robot on visuals or marks, the meaning of which can be defined by coding.
  • a transmitter channel 532 of type 2 is an artificial channel without direct human equivalent. This channel allows the emission of light signals produced by LEDs implanted on the body of the robot. Many LEDs can be provided, especially on the eyes, ears, torso, feet. They may have different colors and may have variable frequency flashing capability. This channel provides the robot with simple and powerful means of sending messages. In particular, a particular code can be defined and programmed by a user.
  • a type 3 receiver channel 523 is a channel equivalent to human touch. This channel is however limited in its tactile areas, which may for example be concentrated in a touch sensor such as the sensor 230a shown in Figure 2a.
  • the interlocutor of the robot will activate the touch sensor to communicate a message to the robot, binary type (validation of an action) or more complex.
  • the information received on this channel can indeed correspond to a code defined by the user, either unitary (a tap or a caress, respectively having the meaning of punishment and reward), or sequential, of Morse type.
  • a specific touch sensor is not necessarily necessary to define a communication channel of this type.
  • a channel of the same type, to the extent that it receives a contact action from an interlocutor, can be defined in which the message sensor is a continuous analog sensor constituted by the positions of the arms and/or forearms of the robot, said positions being representative of numerical values communicated by the interlocutor to the robot, as will be explained later in the description.
  • the robot knows at any moment the angular positions of its joints and therefore knows how to interpret as a message the variations thereof caused by a displacement under the action of the interlocutor, provided the meaning of said displacement has been defined in advance.
  • a simple touch of a limb (the forearm for example) can also be detected by the angular position sensors of the joints of the robot. Sudden movements, such as jolts or uplift, can be detected by the robot's inertial unit and its foot-sole sensors (FSR), respectively.
  • FSR foot-sole sensors
  • a type 3 transmitter channel 533 is equivalent to human gesture.
  • the head can be endowed with two degrees of freedom: displacement in azimuth, measured by a yaw angle, and displacement in elevation, measured by a pitch angle.
  • these two movements traditionally define approval (pitch) or denial (yaw) messages. They also allow the robot to direct its gaze towards the interlocutor with whom it is in conversation.
  • the shoulders, elbows, wrists can be respectively given the following degrees of freedom: pitch and roll (roll or twist right / left); yaw; yaw.
  • the hand can have opening and closing capabilities. Combinations of the movements of these joints make it possible to define the content of messages to be communicated to the interlocutors of the robot by this channel.
  • the robot can receive and transmit signals via infrared, Bluetooth or Wifi connection. It is therefore possible for an interlocutor to transmit messages to the robot via this channel, in particular by using a remote control programmed for this purpose, for example an Apple TM iPhone TM or another phone with motion capture and / or positioning.
  • a robot can send messages to another robot via these communication ports.
  • a message communication channel can be defined by merging different type channels into a hybrid type channel.
  • the outputs of a sound channel with speech recognition and a visual channel with image recognition can be combined to create a new channel whose outputs will be enhanced by a data fusion process.
  • the output of this hybrid channel has a priori a higher confidence level than either of the two outputs taken separately.
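  • By way of illustration, a hybrid channel could merge the candidate interpretations coming from the speech recognition channel and from the image recognition channel into a single, higher-confidence output; the weighted fusion below is an assumption made for the sketch, the patent itself not specifying the fusion scheme:

```python
# Illustrative sketch only: merging the outputs of an audio recognition channel
# and a visual recognition channel into a single hybrid output. The weighting
# scheme is an assumption; the patent only states that the outputs are fused.

def fuse_channels(audio_scores, visual_scores, audio_weight=0.6, visual_weight=0.4):
    """audio_scores / visual_scores map a candidate expression to a confidence in [0, 1].
    Returns (best_candidate, fused_confidence)."""
    candidates = set(audio_scores) | set(visual_scores)
    fused = {c: audio_weight * audio_scores.get(c, 0.0)
                + visual_weight * visual_scores.get(c, 0.0)
             for c in candidates}
    best = max(fused, key=fused.get)
    return best, fused[best]

# "yes" is uncertain on the audio channel alone, but the visual channel also
# recognizes a nod, so the fused output is more confident than either channel.
print(fuse_channels({"yes": 0.55, "no": 0.40}, {"yes": 0.80, "no": 0.05}))
```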
  • interlocutors 541 and 542 of the robot are shown in FIG. 5.
  • the interlocutors can be located at a distance from the robot, provided they are connected to the room where it is located by data links for transmitting the audio and/or visual signals necessary for the exchange of messages.
  • in that case, the use of type 3 communication channels, which require physical contact, will not be possible.
  • the relative position of the robot with respect to its interlocutors and to its environment can also be measured by particular sensors (speech recognition associated with localization of the speaker, image recognition, ultrasonic sensor, etc.) and be interpreted, for example cross-referenced with an analysis of volume, tone or expression, to characterize the nature of the human/robot dialogue and possibly modify its progress.
  • the logical control of the inputs/outputs of these different communication channels is performed by the module 510.
  • the input/output control module of the communication channels 510 can also be used more simply to combine message inputs, this combination making it possible to virtually eliminate any possibility of doubt in the "mind" of the robot.
  • the programming of the function combining the inputs received by a receiver channel and the outputs transmitted by a transmitter channel can be achieved in a simple way using UIElements, for example a UIElement constituted by a command box of the Choice type, or Choice Box. This represents a way of making a choice in a closed list. It is especially adapted to the recognition of a limited number of words and sentences; within the framework of a dialogue, the robot can ask a question before listening to the choice of the user.
  • the robot states, on its type 1 transmitter channel 531, the minimum number and the maximum number available to the user, and extends one of its arms towards its interlocutor, the arm being weakly servoed.
  • This arm will constitute the receiver channel 523 of type 3 of Figure 5.
  • the low position of the arm is associated with the minimum number, the high position with the maximum number.
  • the user thus uses the robot arm as a cursor to choose its number.
  • the robot knows the position of its arm thanks to the sensors available on the ShoulderPitch joint. To enhance this interaction, the robot looks at its hand while the user moves the arm. At each change of position, the robot can state the number chosen.
  • the user can validate his choice by touching the middle touch sensor on the head of the robot, using another type 3 receiver channel 523. It is also possible, especially if there are too many numbers in relation to the accuracy of the sensors, to use one arm for a rough adjustment and the second to choose more precisely. Lists of ordered expressions can be represented by numbers; the procedure above then becomes a way of choosing from a drop-down menu announced by the robot.
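  • A minimal sketch of this arm-cursor interaction is given below, assuming hypothetical helper functions (get_shoulder_pitch, sensor_touched, say) that wrap the robot's actual joint sensor, tactile sensor and speech APIs:

```python
# Illustrative sketch only: using the robot's arm as a cursor to choose a number.
# get_shoulder_pitch, sensor_touched and say are hypothetical callables that wrap
# the robot's real joint sensor, tactile sensor and speech APIs.
import time

ARM_LOW_ANGLE = 1.3    # radians, arm down  -> minimum value (assumed calibration)
ARM_HIGH_ANGLE = -1.3  # radians, arm up    -> maximum value (assumed calibration)

def angle_to_value(angle, min_value, max_value):
    """Map the shoulder pitch angle linearly onto [min_value, max_value]."""
    ratio = (ARM_LOW_ANGLE - angle) / (ARM_LOW_ANGLE - ARM_HIGH_ANGLE)
    ratio = max(0.0, min(1.0, ratio))
    return int(round(min_value + ratio * (max_value - min_value)))

def choose_number(min_value, max_value, get_shoulder_pitch, sensor_touched, say):
    say("Choose a number between %d and %d by moving my arm, "
        "then touch the middle sensor on my head." % (min_value, max_value))
    chosen = None
    while not sensor_touched():                    # validation on the tactile channel
        value = angle_to_value(get_shoulder_pitch(), min_value, max_value)
        if value != chosen:
            say(str(value))                        # state the currently selected number
            chosen = value
        time.sleep(0.1)
    return chosen
```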
  • an alternative for selecting a digit is to use only the touch sensor.
  • FIG. 6 is a logic diagram for programming the behaviors / interactions of a robot in several embodiments of the invention.
  • the example illustrated by the figure is a scenario where a robot dialogues with an interlocutor who offers him a choice in a list of words, for example in the case of a guessing game.
  • a type 1 receive channel, a type 3 receive channel, and a type 1 transmit channel are used.
  • the actions represented by the code 610 in the figure are actions of a robot interlocutor: a choice made by the user on a list, for example, previously stated by the robot; timeout (or lack of choice); answer "yes / no" to a request for confirmation of understanding of one or more words on this list.
  • the actions represented by the code 620 in the figure are actions of the robot that will be activated according to the state of the internal variables represented by the code 630.
  • the significance of these internal variables is as follows:
  • - r: probability rate of recognition by the robot of the word spoken by the user among those of the list of choices;
  • - f: cumulative number of recognition failures;
  • - t: number of timeouts (no choice made by the interlocutor after a predefined time);
  • - S1, S2: thresholds 1 and 2 of the recognition probability rate;
  • - tmax: maximum number of possible timeouts;
  • - fmax: maximum number of possible failures.
  • the way the timeout is treated corresponds to applying to the problem posed a simple principle of everyday human life: "he who says nothing consents".
  • NAO listens to the user / interlocutor and the variables f and t are initialized to zero. If the interlocutor passes the predetermined timeout time, the timeout counter is incremented and if the maximum number of timeouts is reached, the interaction loop is interrupted.
  • this application can be initiated either within a behavior, in a deterministic context where a specific action by the user triggers it, such as calling out to the robot, starting a game that needs to know the number of players, or pressing one of the tactile sensors of the head, or in the context of an artificial intelligence which triggers it according to parameters such as the detected presence of a human being, the time of day or, more generally, the history of the day's events stored by the robot.
  • the measured recognition probability rate r is compared with thresholds S1 and S2 (S1 < S2), the expected recognition probability rates, the manner in which they are determined being described later.
  • the robot also indicates "I did not understand" and activates another "activateHelpWhenFailure" function consisting in providing the interlocutor with the list of choices and asking him to use the tactile sensor, telling him how to use it; beyond that (3 < f < fmax), the robot can pronounce sentences telling the interlocutor that the conditions for an efficient conversation are not fulfilled, such as "there is too much noise", which will normally prompt the interlocutor to stop the conversation.
  • the robot can activate the "activateHelpWhenFailure" function consisting of repeating the list of choices;
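  • The interaction loop sketched in FIG. 6 can be summarized as follows; this is a hedged reconstruction using hypothetical helpers (listen_for_choice, say, offer_tactile_help), not the patent's actual implementation:

```python
# Hedged reconstruction of the interaction loop of FIG. 6 (not the patent's actual
# code). listen_for_choice, say and offer_tactile_help are hypothetical callables.

def run_choice(choices, listen_for_choice, say, offer_tactile_help,
               s1=0.3, s2=0.7, fmax=5, tmax=3, timeout=6.0):
    """Return the validated choice, or None if the interaction is abandoned.

    listen_for_choice(choices, timeout) -> (word, r), or (None, 0.0) on timeout,
    where r is the measured recognition probability rate."""
    f = 0   # cumulative number of recognition failures
    t = 0   # number of timeouts
    while f < fmax and t < tmax:
        word, r = listen_for_choice(choices, timeout)
        if word is None:                  # timeout: no choice made by the interlocutor
            t += 1
            continue
        if r >= s2:                       # confident recognition: accept directly
            return word
        if r >= s1:                       # doubtful: ask for a yes/no confirmation
            say("Did you say %s?" % word)
            answer, _ = listen_for_choice(["yes", "no"], timeout)
            if answer == "yes":
                return word
        f += 1                            # recognition failure
        say("I did not understand.")
        if f == 1:
            say("The possible choices are: %s." % ", ".join(choices))
        elif f >= 3:
            offer_tactile_help(choices)   # fall back to the tactile channel
    return None
```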
  • FIGS. 7a, 7b and 7c represent timing diagrams illustrating the logical and temporal combination of the interactions of a multimodal interface in several embodiments of the invention.
  • Choice Boxes are Boxes such as those illustrated under item 410 in FIG. 4, but of a particular type that allows particularly effective programming of behaviors specialized for a natural dialogue.
  • - 710a denotes the actions / words of the robot or its interlocutor
  • 720a denotes the touch sensor
  • 750a designates the LEDs of the face of the robot in rotating animation;
  • 760a designates the LED flash of the robot's face (which may be of different colors depending on the robot's understanding of the message received);
  • R1, R2 and R3 respectively denote a case where the robot understands unambiguously, a case where the robot understands but has doubts, and a case where the robot does not understand at all; in Figure 7c, 710c designates the "Return to the previous menu" function.
  • the LEDs 750a of the face of the robot, and possibly the LED flash, punctuate the exchange of questions and answers: the LEDs are in fixed position 751a to indicate that the robot detects speech and is analyzing it;
  • FIGS. 8a, 8b, 8c, 8d and 8e show a series of screens making it possible to program a dialogue with a humanoid robot with a binary choice and an option to change the language of interaction in one embodiment of the invention
  • FIGS. 9a, 9b, 9c, 9d and 9e show a series of screens making it possible to program a dialogue with a humanoid robot with choices in a list and an option of changing the interaction language, in one embodiment of the invention.
  • FIGS. 10a, 10b, 10c and 10d show a series of screens making it possible to perform a comparative voice recognition test between several options of a list of choices in one embodiment of the invention
  • FIGS. 11a and 11b show a series of screens making it possible to replace or supplement options of a list of choices and to perform a new comparative speech recognition test between several options in one embodiment of the invention.
  • Figures 12a, 12b, 12c and 12d show a sequence of screens for performing a comparative speech recognition test between several options of a choice list in a language different from that of the question, in an embodiment of the invention.
  • FIGS. 13a, 13b, 13c and 13d show a series of screens making it possible to check / modify the thresholds of the comparative speech recognition tests between several options of a list of choices in one embodiment of the invention.
  • a Choice Box allows a user to choose a response from a predefined set of choices. It uses an array-like component that allows a developer to write an intuitive and readable set of possible choices. The list of choices may also be supplied dynamically rather than written in the box, if the developer does not know it in advance. Thus, for example, in the case of an application handling the user's email, the robot can have him choose a contact in his address book stored in a separate file, as in the sketch below.
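  • As an illustration only, the following minimal Python sketch shows how such a dynamically supplied list of choices could be handled; the file name, the JSON format and the `ask` callable are assumptions made for the example, not the patent's implementation or any real robot API.

```python
# Minimal sketch of a choice-box-like helper whose list of choices is supplied
# at run time (here, contacts loaded from a separate file) rather than written
# in the box. The file name, the JSON format and the `ask` callable are
# assumptions made for the example, not the patent's implementation.
import json

def load_contacts(path="contacts.json"):
    """Load the user's address book from a separate file (hypothetical format)."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)              # e.g. ["Alice", "Bob", "Charlie"]
    except FileNotFoundError:
        return []

def run_choice(question, choices, ask):
    """Present `choices` through `ask` and return the validated one, if any."""
    if not choices:
        return None
    answer = ask(question, choices)          # speech, touch, etc. (abstracted away)
    return answer if answer in choices else None

if __name__ == "__main__":
    # Console stand-in for the robot's multimodal question/answer loop.
    def console_ask(question, choices):
        print(question, "/".join(choices))
        return input("> ").strip()

    print("Selected:", run_choice("Which contact?", load_contacts(), console_ask))
```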
  • UI Elements are highly configurable tools. UI Elements using speech recognition and/or speech synthesis are regionalized. For example, the Choice Box can be edited in French and in English. At the level of the programming GUI, the Qt™ widget used to change the editing language of the box can be a ComboBox.
  • the inputs (and outputs) of the Chorégraphe boxes can be of several types:
  • an input (respectively an output) of dynamic type receives (respectively emits) an ALValue.
  • ALValues are a union of common types, described in a NAOqi library, including: integer, float, array, boolean and string, but also "bang", which is an uninitialized ALValue.
  • dynamic-type inputs make it possible to manage the variations of an application in a very flexible way. In particular, the choice of the inter-modal and/or intra-modal combinations, and of the help presented to the robot's interlocutors to activate them, may depend on the number of possible choices; a sketch of such handling follows.
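  • The short Python sketch below illustrates, under assumptions of our own, how a single dynamic-type input could accept an integer, a float, a string, a boolean, a list or a bare "bang" and adapt the help offered to the number of choices; the function name and the rules are hypothetical, not the NAOqi API.

```python
# Minimal sketch of interpreting a dynamic-type input: the same entry point
# accepts an integer, a float, a string, a boolean, a list or nothing at all
# (a "bang"), and the help offered adapts to the number of possible choices.
# The function name and the rules are hypothetical, not the NAOqi API.
def on_dynamic_input(value=None):
    if value is None:                        # "bang": no payload, just a trigger
        return "triggered"
    if isinstance(value, bool):              # checked before int: bool is an int subtype
        return "confirmation" if value else "refusal"
    if isinstance(value, (int, float)):
        return f"numeric answer: {value}"
    if isinstance(value, str):
        return f"single choice: {value}"
    if isinstance(value, (list, tuple)):     # a list of choices passed at run time
        helper = "offer touch-sensor help" if len(value) > 5 else "no extra help"
        return f"{len(value)} choices, {helper}"
    raise TypeError(f"unsupported dynamic value: {type(value).__name__}")

if __name__ == "__main__":
    for v in (None, True, 3, "pangolin", ["yes", "no"], list(range(8))):
        print(on_dynamic_input(v))
```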
  • the Chorégraphe software used to implement the invention includes box parameters of Boolean type (Check Box), character string (Text Box), multiple choice of strings (Combo Box) editable or not by the end user, integer or floating-point number (Slider), or other types.
  • with a Check Box, the programmer who uses the Choice Box in his behavior or application has the option to check or uncheck the Boolean "Repeat validated choice". This affects the behavior of NAO during the interaction, since it defines whether or not NAO always repeats the choice validated by the user.
  • a diagnostic tool can maximize the success of voice interaction.
  • in the Choice Box, when the developer has finished writing his list of words in the table, he can launch this tool, which indicates a recognition percentage for these words, 100% corresponding to a word that will certainly be recognized by the robot, 0% to a word that the robot will not recognize.
  • this diagnosis is made by comparing the word as pronounced by speech synthesis (which is assumed to be close to what the user will say) with the word expected by voice recognition.
  • the solution of the invention also addresses the fact that voice recognition does not handle the presence of multiple users. Humans realize that when several people talk at once, communication is difficult, so they adapt by talking one at a time. This situation is facilitated by the existence of clearly single-user interaction codes, such as the robot addressing its interlocutor with the familiar form of address.
  • a deficient voice recognition requires that the man-robot interface manage failure situations in the best possible way, make the user talk at the right moment (this is done through interaction codes) and make alternative solutions to the dialogue available.
  • an audio diagnostic function makes it possible to solve this type of problem.
  • this function is executed by having the text-to-speech synthesis software pronounce the word to be tested.
  • This word is then analyzed by voice recognition. More precisely, the same word is pronounced, for example three times, each time by changing the speed of the voice and its pitch, so as to have a representative sample of the ways of pronouncing the word.
  • the three recognition rates returned by the speech recognition are then averaged, and it is this value which is the estimated percentage of recognition of the word.
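  • The following minimal sketch illustrates this averaging principle; the `synthesize` and `recognize` functions are toy stand-ins for the robot's real text-to-speech and speech recognition engines, and the prosody values are arbitrary assumptions.

```python
# Toy sketch of the audio diagnosis: the word under test is "pronounced" three
# times with a different speed and pitch, each utterance is "recognized", and
# the three confidence rates are averaged. `synthesize` and `recognize` are
# stand-ins (assumptions) for the robot's real engines; the prosody values are
# arbitrary.
import random

def synthesize(word, speed, pitch):
    """Pretend TTS: returns an 'audio' token tagged with the prosody used."""
    return (word, speed, pitch)

def recognize(audio, expected_word):
    """Pretend recognizer: returns a confidence rate between 0 and 1."""
    word, _speed, _pitch = audio
    base = 0.9 if word == expected_word else 0.1
    return max(0.0, min(1.0, base + random.uniform(-0.1, 0.1)))

def estimated_recognition_rate(word, prosodies=((0.8, 0.9), (1.0, 1.0), (1.2, 1.1))):
    rates = [recognize(synthesize(word, s, p), word) for s, p in prosodies]
    return sum(rates) / len(rates)          # average of the three rates

if __name__ == "__main__":
    print(f"estimated rate for 'pangolin': {estimated_recognition_rate('pangolin'):.0%}")
```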
  • "Together" mode works as follows: all words in the choice box are listened to by speech recognition, and NAO then calculates the estimated recognition rate as described elsewhere.
  • the "One by One” mode works as follows: for a given line, the word to be analyzed is listened to by voice recognition, as well as the other possible choices on the other lines, but not its alternatives located on the same line as him.
  • the advantage of this diagnosis is that if two "synonyms" on the same line sound alike, the estimated recognition rate will not be as low as it would be in "Together" mode (in that mode the rates would be very bad because the two would often be confused by voice recognition); it is not serious if two synonyms are confused by the robot.
  • the synonyms are ranked in descending order of the estimated rate of recognition, and the recognition rate of the best synonym is written at the end of the line.
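  • The sketch below contrasts the two diagnosis modes on a toy similarity measure; the `confusability` function is an assumption standing in for the acoustic confusion a real recognizer would measure, and the scores it produces are purely illustrative.

```python
# Toy sketch of the two diagnosis modes. `confusability` is a stand-in
# (an assumption) for the acoustic confusion that real speech recognition
# would measure between two words; only the structure of the two modes is
# meant to match the description above.
from difflib import SequenceMatcher

def confusability(a, b):
    """Toy similarity score: 1.0 means easily confused, 0.0 means distinct."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def rate(word, competitors):
    """Estimated recognition rate: high similarity to a competitor lowers it."""
    worst = max((confusability(word, c) for c in competitors), default=0.0)
    return 1.0 - worst

def diagnose(lines, mode="one_by_one"):
    """`lines` is a list of lines, each line being a list of synonyms."""
    all_words = [w for line in lines for w in line]
    report = {}
    for i, synonyms in enumerate(lines):
        scored = []
        for word in synonyms:
            if mode == "together":          # every other word competes
                competitors = [c for c in all_words if c != word]
            else:                           # "one_by_one": same-line synonyms do not
                competitors = [c for j, line in enumerate(lines) if j != i for c in line]
            scored.append((word, rate(word, competitors)))
        scored.sort(key=lambda wr: wr[1], reverse=True)   # best synonym first
        report[i] = scored
    return report

if __name__ == "__main__":
    lines = [["hello", "hi"], ["goodbye", "bye"]]
    for i, scored in diagnose(lines).items():
        print(f"line {i}:", ", ".join(f"{w} ({r:.0%})" for w, r in scored))
```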
  • the Choice Box is programmed to ask a user to confirm his answer when the robot is not sure of having correctly recognized or interpreted it.
  • This mechanism is identical to that used by a human who has poor hearing or is immersed in an environment that makes understanding difficult.
  • the robot will have different reactions depending on the level of understanding of the response of the user.
  • several thresholds (for example the thresholds S1 and S2 defined in the commentary to FIG. 5) are then set as a function of the recognition confidence calculated by the recognition software: for example, when the first recognition threshold S1 is not reached, the robot asks the user to repeat his answer; when the first threshold S1 is reached but a second, higher recognition threshold S2 is not, the robot asks a question whose answer removes the doubt.
  • the robot can also provide help so that the user responds correctly: it can give the list of possible choices, indicate the available means of interaction, and repeat the question asked, if there was one.
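  • A minimal sketch of this two-threshold confirmation logic follows; the numerical values of S1 and S2 and the `say`/`listen` callables are illustrative assumptions, not values taken from the patent.

```python
# Minimal sketch of the two-threshold confirmation logic: below S1 the robot
# asks the user to repeat, between S1 and S2 it asks a confirmation question,
# above S2 it accepts the answer directly. The values of s1 and s2 and the
# `say`/`listen` callables are illustrative assumptions.
def handle_recognition(result, confidence, s1=0.35, s2=0.6, say=print, listen=input):
    if confidence < s1:
        say("I did not understand, can you repeat?")
        return None                                   # caller relaunches listening
    if confidence < s2:
        say(f"I understood {result}, is it correct?")
        answer = listen("> ").strip().lower()
        return result if answer in ("yes", "oui") else None
    return result                                     # confident enough: accept directly

if __name__ == "__main__":
    for confidence in (0.2, 0.5, 0.9):
        accepted = handle_recognition("pangolin", confidence, listen=lambda _prompt: "yes")
        print(f"confidence {confidence}: accepted = {accepted}")
```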
  • interaction codes are also very useful to overcome the deficiencies of speech recognition. Indeed, speech recognition does not allow the user to speak to the robot while the robot itself is speaking, and the time between the launch of speech recognition and the moment when it is really active is quite long. A sound code is thus played when voice recognition is launched, indicating to the user that he can speak. Then a rather intuitive visual code, the rotating LEDs of the ears, lets the user know that the robot is listening.
  • UI Elements using voice recognition also offer an alternative to that voice recognition, to allow the user to succeed in communicating even in case of repeated problems of understanding (which may be due to an extremely noisy environment, for example).
  • These alternative means can be tactile, sound, visual, etc.
  • the Choice Box allows the user to choose an answer by using the touch sensor: pressing the front sensor advances in the list of choices (the robot then states each choice), the rear one moves back in this list, and the middle one validates the choice.
  • it is also possible that the robot states the various choices and that the user says "OK" when he hears the choice he wants to validate; or, for a confirmation, instead of saying "yes" or "no", the user can press one of the robot's arms.
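  • The following sketch illustrates the tactile navigation just described; the sensor names ("front", "rear", "middle"), the event stream and the `say` callable are assumptions made for the example.

```python
# Minimal sketch of the tactile fallback: the front head sensor moves forward
# in the list of choices (the robot states each one), the rear sensor moves
# backward, the middle sensor validates. Sensor names, the event stream and
# the `say` callable are assumptions made for the example.
def tactile_choice(choices, events, say=print):
    index = 0
    say(choices[index])                          # state the first choice
    for sensor in events:                        # stream of touch events
        if sensor == "front":
            index = (index + 1) % len(choices)
            say(choices[index])
        elif sensor == "rear":
            index = (index - 1) % len(choices)
            say(choices[index])
        elif sensor == "middle":
            return choices[index]                # validated
    return None                                  # no validation received

if __name__ == "__main__":
    animals = ["pangolin", "spider", "rabbit", "horse"]
    print("chosen:", tactile_choice(animals, ["front", "front", "middle"]))
```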
  • the input/output control module of the communication channels of the various types 1, 2 and 3 defined in the commentary to FIG. 5 makes it possible to generate, in a simple and user-friendly manner, the functions managing these combinations by links between the different inputs/outputs of Choice Boxes.
  • the solution of the invention proposes a humanization of the interface, a simulation of the human-to-human interface. It is known that three main factors come into play in direct communication between two humans: of course speech, that is to say the words, but also the tone of the voice and the visual elements.
  • the tone of the voice and the facial expressions are nevertheless missing on a robot with frozen face and tone.
  • these two elements are offset by other functions, codes that translate them. They require a longer or shorter learning effort from the user. The goal is to make this learning as short as possible, and therefore the codes are made as consistent and as close as possible to what the user already knows.
  • speech recognition and speech synthesis are limiting, in particular because of the absence of natural language and a recognition that is only single-user and allows only a limited number of words to be recognized.
  • the solution of the invention solves the problem of the non-use of natural language by robots in order to propose a sufficiently natural human-robot interaction.
  • the robot's voice synthesis is used to best advantage.
  • most of the robot's UI Elements using speech synthesis and/or voice recognition are regionalized.
  • a French-speaking user (respectively English-speaking) will be able to converse with his robot in French (respectively in English), thus maximizing the success of the interaction.
  • timings and interaction codes are best used to improve the responsiveness of the robot and facilitate the success of human-robot communication.
  • the Choice Box offers several parameters, such as the waiting time for a user response. This ensures that the robot does not wait too long before considering that the user has not responded, but also that it waits long enough, and that voice recognition can be activated at the right time.
  • the interaction codes can be gestural, audible and/or visual. Thus an end-of-voice-recognition beep lets the user know that the robot is no longer listening.
  • the communication is made more natural by the use of several communication channels of different modalities, and particular behavior on the part of the robot.
  • sound localization and face detection allow the robot to turn its head towards its human interlocutor, which is what a human naturally does when addressed by another human.
  • the robot can also implement speaker identification (facial recognition, voice timbre, voice print, etc.) so as to address a particular human by using his name and his own characteristics, such as, for example, the history of conversations and of behaviors played by the robot.
  • the robot can also know what the user thought of a behavior, for example because the user touched its touch sensor (the human liked the behavior), and then offer to play it again during an oral exchange. The robot tries to act in a manner appropriate to the situation.
  • the robot's motors are enslaved using the Chorégraphe icon "enslave all motors on/off", then the robot is put upright thanks to the "init pose" position of the pose library.
  • while moving its arms, the robot asks "What is your favorite animal?", then plays the listening sound code. While it is listening, its eyes and ears turn blue and the touch sensors of its head blink blue.
  • the robot is not sure but believes it understood "pangolin". Its eyes flash green once. It then says, while launching an arm animation, "I understood pangolin, is it correct?".
  • the robot flashes its eyes red once and launches a help message while moving its arms: "Pangolin, spider, rabbit, or horse? You can also choose an answer using my touch sensor. What is your favorite animal?", and it goes back into listening mode.
  • the robot responds "spider" while flashing its eyes blue once.
  • the robot flashes its eyes green once, then repeats "rabbit" and exits the box and the behavior.
  • Other interactions between communication channels of the robot are possible, such as those described below.
  • the Choice Box makes special use of voice recognition in combination with the touch sensor to recognize the user's choice. Another possibility is to use the robot's vision, especially image recognition. This is recognition of an object, not of a concept: if it is shown a can, it will recognize that same can and not a can of another brand.
  • one of the possibilities of the development software, in the version allowing the invention to be implemented, is to display in this software the video feed from the robot's camera. The user can show objects to the robot, see the resulting image in Chorégraphe, and outline by hand the object of interest in the image. The user names it. The robot then analyzes the object and stores it in its image database.
  • the user can then use these images as possible choices for a choice box.
  • the user wants to fill a Choice Box with object names, such as "can”, "cup”, “magazine”. He fills the Choice Box with these words, then takes a can, his favorite mug and the cover of a magazine and shows them to the robot for analysis as explained above.
  • the Choice Box searches the robot's image database: if an object tagged "cup" is present, NAO then watches for it while listening to the user, and so on for the other words. Thus, the user launches this Box on NAO, which listens for his choices. The user says "can" but the robot does not understand. After two failures, the robot explains that the user can show it the "can", "cup" or "magazine" because they are in its database. The user can then show it the can that was used for the recording (or one of the same brand). The robot then acts as if it had recognized the word "can".
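  • A minimal sketch of this vision fallback is given below; the database layout, the `hear` and `see` callables and the failure limit are assumptions made for the example and do not reflect the actual implementation.

```python
# Minimal sketch of the vision fallback: when speech recognition fails
# repeatedly, the Choice Box proposes the objects whose tags are both in the
# word list and in the image database, and accepts a shown object as if the
# word had been heard. The database layout, the `hear`/`see` callables and the
# failure limit are assumptions made for the example.
def choice_with_vision(words, image_db, hear, see, max_failures=2):
    visible_words = [w for w in words if w in image_db]   # tags learned earlier
    failures = 0
    while failures < max_failures:
        heard = hear()
        if heard in words:
            return heard                                   # understood by speech
        failures += 1
    if visible_words:
        print("You can also show me:", ", ".join(visible_words))
        seen_tag = see()                                   # tag of the recognized object
        if seen_tag in visible_words:
            return seen_tag                                # act as if the word was heard
    return None

if __name__ == "__main__":
    db = {"can": "img_001", "cup": "img_002", "magazine": "img_003"}
    answers = iter(["???", "???"])                         # speech fails twice
    result = choice_with_vision(["can", "cup", "magazine"], db,
                                hear=lambda: next(answers), see=lambda: "can")
    print("validated choice:", result)
```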
  • in the context of the present invention, it is also possible to program the robot to act as an agent for receiving/reading, writing/sending and administering an email account of a user of the robot. This application is described below.
  • NAO can read emails, reply to an email or send emails to a contact, but also add the author of a mail received to contacts, delete a message, mark it as unread, re-read it, read the next or previous message.
  • three Choice Boxes are used in this application, making the Choice Box an indispensable element of it. The words were chosen thanks to the audio diagnosis.
  • the robot starts by checking whether the user has received new messages. If so, it reads the first new message and then launches a Choice Box without a question. If not, it launches this same Choice Box but with a question: "What do you want me to do?". Being able to launch a Choice Box with or without a question is therefore used in the mail application.
  • This Choice Box allows the user to choose from the possible actions of NAO. These actions are written in the table of the plugin of the box.
  • the "timeout" output of the Choice Box is useful because, in case of timeout, NAO reads the next message.
  • a parameter "maximum number of repetition when no reply” is then set to 1: the robot leaves this box choice at the first timeout.
  • the parameter "repeat validated choice” is disabled, because after a choice of the user the robot launches a specific animation or action that clearly shows what he understood. Thanks to the boolean parameters “activate head”, “activate arms” and “activate legs”, the robot will be animated with animations dedicated to the speech.
  • "Exit" is one of the default choices of the Choice Box, and here it makes it possible to exit the mail application.
  • the parameter "maximum number of repetition when no reply” is for example 3, its default value for, in case of timeout, not to send an email to anyone, but to be able to cancel the sending of the mail and return to the main menu. Similarly, saying "Exit”, the default choice of the application, allows you to return to the main menu.
  • a help function covers the case where the user does not remember his contacts. In that case, with the touch sensor for example, NAO states the list of contacts.
  • the robot will record the message of the user.
  • the message can be re-recorded if the first one is not suitable;
  • the settings are essentially the same as for the Main Menu Choice Box, with the "Maximum number of repeat when no reply" setting set to 1.
  • the parameters "speech recognition timeout”, which indicate after how much time without response the robot considers that there is timeout, and "speech recognition timeout when confirmation” can for example be set to 4 seconds instead of 6 by default, so that the user can easily say nothing and let the message be sent.
  • the Choice Box can also be statically configured with parameters that remain constant over the entire life of the box. But in the context of an automatic question generation system, the parameters can be set automatically. For example, when using a conversational agent such as that developed by the company As An Angel, the agent can configure the Choice Box based on the questions and answers that it has automatically generated.
  • Chorégraphe boxes are implemented by means of a script in one of the supported programming languages. If a box has parameterizable aspects, such as the number of repetitions, the language used by the robot, or the text that the robot must pronounce, this information is integrated directly into the script of the box. When one wants to modify the parameters of the box, for example after having duplicated it to use it differently, it is necessary to modify the script of the box to change its behavior.
  • each "box parameter” has a name, a description, a type (among boolean, integer, float and string), and depending on the type can have additional attributes, such as a default value.
  • a "box parameter” can be defined as inheriting from the parent box, this which will affect how the value will be determined.
  • the author of the box can then access the "box parameters" using several functions that take the name of the "box parameter" as an argument. He can read the current value of a "box parameter" and change it. He can also create dynamic "box parameters", which do not appear in Chorégraphe but which can be used as temporary storage in the box's scripts.
  • the current value of a parameter depends on whether or not it is marked as inheriting from the parent box. If it is not (the default case), the "box parameter" is box-specific, and when the box script reads its current value, that value is simply returned. If it is marked as inheriting, when the value is read, the hierarchy of box diagrams is rolled up to find a parent box containing a "box parameter" of the same name. If none is found, the current value of the current box is used.
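  • The following minimal sketch illustrates this inheritance lookup; the class and attribute names are assumptions made for the example, and the real box hierarchy is of course richer.

```python
# Minimal sketch of the "box parameter" lookup: a parameter marked as
# inheriting walks up the hierarchy of parent boxes until a box defining a
# parameter of the same name is found; otherwise the box's own value is used.
# Class and attribute names are assumptions made for the example.
class Box:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.params = {}                     # name -> (value, inherits_from_parent)

    def set_param(self, name, value, inherits=False):
        self.params[name] = (value, inherits)

    def get_param(self, name):
        value, inherits = self.params[name]
        if inherits:
            box = self.parent
            while box is not None:           # roll up the box hierarchy
                if name in box.params:
                    return box.params[name][0]
                box = box.parent
        return value                         # not inherited, or no ancestor defines it

if __name__ == "__main__":
    behavior = Box("behavior")
    behavior.set_param("language", "French")
    choice_box = Box("choice_box", parent=behavior)
    choice_box.set_param("language", "English", inherits=True)
    choice_box.set_param("repetitions", 3)
    print(choice_box.get_param("language"))      # -> French (taken from the parent)
    print(choice_box.get_param("repetitions"))   # -> 3 (box-specific)
```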
  • the robot also has a software module allowing it to recognize objects that pass through the field of view of its camera. However, the objects to be recognized must first be learned during a learning phase. This learning is done using a specific interface in Chorégraphe.
  • This interface displays in real time the video sent by the robot's camera.
  • the image is only available when Chorégraphe is connected to a robot with a camera and a properly configured video capture module.
  • when video display is enabled, the user can initiate a learning pass. A countdown then appears on the image, and the user has, for example, 4 seconds to present an object in front of the camera. At the end of the countdown the images are captured and recorded. The user must then crop the object of interest in the image by drawing a polygon on the frozen image. Once the polygon is closed, a dialog opens asking the user to enter keywords defining the object.
  • each learning pass generates an entry in a database that is saved by Chorégraphe on the user's computer. Once the learning is finished, a button makes it possible to send a lightweight version of the database to the robot. The object recognition module then uses this database, and when an object is recognized, an event containing the associated keywords is triggered on the robot.
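  • The sketch below illustrates one possible shape for such a database entry and for the keyword event raised on recognition; the data layout and the callback mechanism are assumptions made for the example.

```python
# Minimal sketch of one possible shape for a learning entry and for the
# keyword event raised when an object is recognized. The data layout and the
# callback mechanism are assumptions made for the example.
from dataclasses import dataclass, field

@dataclass
class LearnedObject:
    keywords: list                      # keywords entered by the user
    polygon: list                       # outline drawn on the frozen image
    images: list = field(default_factory=list)

class ObjectDatabase:
    def __init__(self):
        self.entries = []
        self.subscribers = []           # callbacks fired when an object is recognized

    def learn(self, keywords, polygon, images):
        self.entries.append(LearnedObject(keywords, polygon, images))

    def on_recognized(self, callback):
        self.subscribers.append(callback)

    def notify_recognition(self, entry_index):
        keywords = self.entries[entry_index].keywords
        for callback in self.subscribers:
            callback(keywords)          # event carrying the associated keywords

if __name__ == "__main__":
    db = ObjectDatabase()
    db.learn(["can", "soda"], polygon=[(0, 0), (10, 0), (10, 20), (0, 20)],
             images=["img_001"])
    db.on_recognized(lambda kw: print("recognized object tagged:", kw))
    db.notify_recognition(0)
```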
  • Chorégraphe is also a behavior editor for the robot.
  • a behavior is an object similar to a computer program, which can be executed by the robot.
  • Chorégraphe also provides an interface for managing the behaviors installed on the robot.
  • when Chorégraphe is connected to a robot, an entry in the application's menus is used to display the behavior manager. It is a modal window displaying a list of the behaviors installed on the robot, as well as a set of buttons to manipulate them.
  • buttons displayed next to the behavior list make it possible to add or remove behaviors, and to transfer them to the user's computer.
  • the user can very easily manipulate the behaviors installed on the robot, as if they were files on his computer.
  • a user can download a behavior, modify it, and reinstall it on the robot, without having to save it on his computer.
  • the behaviors installed by the user can then run in parallel, subject to the constraints of temporal coherence within and between behaviors defined by the different Behavior Boxes, Behavior Frames and Timelines.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Robotics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Mechanical Engineering (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)
  • User Interface Of Digital Computer (AREA)
EP11730675.3A 2010-07-23 2011-07-11 Robot humanoide dote d'une interface de dialogue naturel, procede de controle du robot et programme correspondant Withdrawn EP2596493A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR1056047A FR2963132A1 (fr) 2010-07-23 2010-07-23 Robot humanoide dote d'une interface de dialogue naturel, methode d'utilisation et de programmation de ladite interface
PCT/EP2011/061743 WO2012010451A1 (fr) 2010-07-23 2011-07-11 Robot humanoide dote d'une interface de dialogue naturel, procede de controle du robot et programme correspondant

Publications (1)

Publication Number Publication Date
EP2596493A1 true EP2596493A1 (fr) 2013-05-29

Family

ID=43618099

Family Applications (1)

Application Number Title Priority Date Filing Date
EP11730675.3A Withdrawn EP2596493A1 (fr) 2010-07-23 2011-07-11 Robot humanoide dote d'une interface de dialogue naturel, procede de controle du robot et programme correspondant

Country Status (8)

Country Link
US (1) US8942849B2 (zh)
EP (1) EP2596493A1 (zh)
JP (2) JP6129073B2 (zh)
KR (1) KR101880775B1 (zh)
CN (1) CN103119644B (zh)
BR (1) BR112013001711A2 (zh)
FR (1) FR2963132A1 (zh)
WO (2) WO2012010437A1 (zh)

Families Citing this family (113)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9634855B2 (en) 2010-05-13 2017-04-25 Alexander Poltorak Electronic personal interactive device that determines topics of interest using a conversational agent
FR2962048A1 (fr) * 2010-07-02 2012-01-06 Aldebaran Robotics S A Robot humanoide joueur, methode et systeme d'utilisation dudit robot
US9566710B2 (en) 2011-06-02 2017-02-14 Brain Corporation Apparatus and methods for operating robotic devices using selective state space training
US10866783B2 (en) * 2011-08-21 2020-12-15 Transenterix Europe S.A.R.L. Vocally activated surgical control system
KR20130021943A (ko) * 2011-08-24 2013-03-06 한국전자통신연구원 디지털 마인드 서비스 장치 및 방법
JP5982840B2 (ja) * 2012-01-31 2016-08-31 富士通株式会社 対話装置、対話プログラムおよび対話方法
FR2989209B1 (fr) 2012-04-04 2015-01-23 Aldebaran Robotics Robot apte a integrer des dialogues naturels avec un utilisateur dans ses comportements, procedes de programmation et d'utilisation dudit robot
US20130311528A1 (en) * 2012-04-25 2013-11-21 Raanan Liebermann Communications with a proxy for the departed and other devices and services for communicaiton and presentation in virtual reality
US20150314454A1 (en) * 2013-03-15 2015-11-05 JIBO, Inc. Apparatus and methods for providing a persistent companion device
US9764468B2 (en) 2013-03-15 2017-09-19 Brain Corporation Adaptive predictor apparatus and methods
US20170206064A1 (en) * 2013-03-15 2017-07-20 JIBO, Inc. Persistent companion device configuration and deployment platform
US9037396B2 (en) * 2013-05-23 2015-05-19 Irobot Corporation Simultaneous localization and mapping for a mobile robot
US9242372B2 (en) * 2013-05-31 2016-01-26 Brain Corporation Adaptive robotic interface apparatus and methods
US9792546B2 (en) 2013-06-14 2017-10-17 Brain Corporation Hierarchical robotic controller apparatus and methods
US9314924B1 (en) 2013-06-14 2016-04-19 Brain Corporation Predictive robotic controller apparatus and methods
US9384443B2 (en) 2013-06-14 2016-07-05 Brain Corporation Robotic training apparatus and methods
JP5945732B2 (ja) * 2013-07-03 2016-07-05 パナソニックIpマネジメント株式会社 電子部品実装システムにおける伝言伝達装置
US9579789B2 (en) 2013-09-27 2017-02-28 Brain Corporation Apparatus and methods for training of robotic control arbitration
JP5996603B2 (ja) * 2013-10-31 2016-09-21 シャープ株式会社 サーバ、発話制御方法、発話装置、発話システムおよびプログラム
US9597797B2 (en) 2013-11-01 2017-03-21 Brain Corporation Apparatus and methods for haptic training of robots
US9358685B2 (en) 2014-02-03 2016-06-07 Brain Corporation Apparatus and methods for control of robot actions based on corrective user inputs
US9302393B1 (en) * 2014-04-15 2016-04-05 Alan Rosen Intelligent auditory humanoid robot and computerized verbalization system programmed to perform auditory and verbal artificial intelligence processes
EP2933067B1 (en) 2014-04-17 2019-09-18 Softbank Robotics Europe Method of performing multi-modal dialogue between a humanoid robot and user, computer program product and humanoid robot for implementing said method
EP2933070A1 (en) * 2014-04-17 2015-10-21 Aldebaran Robotics Methods and systems of handling a dialog with a robot
CN106573378A (zh) * 2014-06-12 2017-04-19 普雷-艾公司 通过机器人反馈增强编程教育的系统和方法
US10279470B2 (en) 2014-06-12 2019-05-07 Play-i, Inc. System and method for facilitating program sharing
CN106575382B (zh) * 2014-08-07 2021-12-21 学校法人冲绳科学技术大学院大学学园 估计对象行为的计算机方法和系统、预测偏好的系统和介质
CN104267922B (zh) * 2014-09-16 2019-05-31 联想(北京)有限公司 一种信息处理方法及电子设备
US9630318B2 (en) 2014-10-02 2017-04-25 Brain Corporation Feature detection apparatus and methods for training of robotic navigation
CN104493827A (zh) * 2014-11-17 2015-04-08 福建省泉州市第七中学 智能认知机器人及其认知系统
US9717387B1 (en) 2015-02-26 2017-08-01 Brain Corporation Apparatus and methods for programming and training of robotic household appliances
CN104951077A (zh) * 2015-06-24 2015-09-30 百度在线网络技术(北京)有限公司 基于人工智能的人机交互方法、装置和终端设备
WO2016206643A1 (zh) * 2015-06-26 2016-12-29 北京贝虎机器人技术有限公司 机器人交互行为的控制方法、装置及机器人
CN106313113B (zh) * 2015-06-30 2019-06-07 芋头科技(杭州)有限公司 一种对机器人进行训练的系统及方法
CN104985599B (zh) * 2015-07-20 2018-07-10 百度在线网络技术(北京)有限公司 基于人工智能的智能机器人控制方法、系统及智能机器人
US9828094B2 (en) * 2015-07-26 2017-11-28 John B. McMillion Autonomous cleaning system
US20170050320A1 (en) * 2015-08-18 2017-02-23 Behzad Nejat Novel robotic device with a configurable behavior image
CN105206273B (zh) * 2015-09-06 2019-05-10 上海智臻智能网络科技股份有限公司 语音传输控制方法及系统
JP5892531B1 (ja) * 2015-11-16 2016-03-23 プレンプロジェクト・ホールディングス有限会社 リンク列マッピング装置、リンク列マッピング方法、及びプログラム
CN105425648A (zh) * 2016-01-11 2016-03-23 北京光年无限科技有限公司 便携机器人及其数据处理方法和系统
CN105680972A (zh) * 2016-01-20 2016-06-15 山东大学 机器人集群协同任务网络同步控制方法
CN105808501A (zh) * 2016-03-09 2016-07-27 北京众星智联科技有限责任公司 一种人工智能学习的实现
JP6726388B2 (ja) * 2016-03-16 2020-07-22 富士ゼロックス株式会社 ロボット制御システム
EP3450118A4 (en) * 2016-04-28 2019-04-10 Fujitsu Limited ROBOT
DE102016115243A1 (de) * 2016-04-28 2017-11-02 Masoud Amri Programmieren in natürlicher Sprache
US11645444B2 (en) * 2016-05-10 2023-05-09 Trustees Of Tufts College Systems and methods enabling online one-shot learning and generalization by intelligent systems of task-relevant features and transfer to a cohort of intelligent systems
US10241514B2 (en) 2016-05-11 2019-03-26 Brain Corporation Systems and methods for initializing a robot to autonomously travel a trained route
US20170326443A1 (en) * 2016-05-13 2017-11-16 Universal Entertainment Corporation Gaming machine
US9987752B2 (en) 2016-06-10 2018-06-05 Brain Corporation Systems and methods for automatic detection of spills
US10282849B2 (en) 2016-06-17 2019-05-07 Brain Corporation Systems and methods for predictive/reconstructive visual object tracker
US10239205B2 (en) * 2016-06-29 2019-03-26 International Business Machines Corporation System, method, and recording medium for corpus curation for action manifestation for cognitive robots
US10016896B2 (en) 2016-06-30 2018-07-10 Brain Corporation Systems and methods for robotic behavior around moving bodies
CN106056109A (zh) * 2016-07-30 2016-10-26 深圳市寒武纪智能科技有限公司 一种基于计算机视觉的讲故事机器人
CN106327291A (zh) * 2016-08-10 2017-01-11 深圳市豆娱科技有限公司 一种基于虚拟现实商城的导购交互系统及其应用方法
JP6517762B2 (ja) 2016-08-23 2019-05-22 ファナック株式会社 人とロボットが協働して作業を行うロボットの動作を学習するロボットシステム
JP2018067100A (ja) * 2016-10-18 2018-04-26 株式会社日立製作所 ロボット対話システム
US10987804B2 (en) * 2016-10-19 2021-04-27 Fuji Xerox Co., Ltd. Robot device and non-transitory computer readable medium
US10274325B2 (en) 2016-11-01 2019-04-30 Brain Corporation Systems and methods for robotic mapping
US10001780B2 (en) 2016-11-02 2018-06-19 Brain Corporation Systems and methods for dynamic route planning in autonomous navigation
JP6713057B2 (ja) * 2016-11-08 2020-06-24 シャープ株式会社 移動体制御装置および移動体制御プログラム
US10723018B2 (en) 2016-11-28 2020-07-28 Brain Corporation Systems and methods for remote operating and/or monitoring of a robot
US11443161B2 (en) 2016-12-12 2022-09-13 Microsoft Technology Licensing, Llc Robot gesture generation
JP6795387B2 (ja) * 2016-12-14 2020-12-02 パナソニック株式会社 音声対話装置、音声対話方法、音声対話プログラム及びロボット
KR102616403B1 (ko) * 2016-12-27 2023-12-21 삼성전자주식회사 전자 장치 및 그의 메시지 전달 방법
CN106548772A (zh) * 2017-01-16 2017-03-29 上海智臻智能网络科技股份有限公司 语音识别测试系统及方法
US10377040B2 (en) 2017-02-02 2019-08-13 Brain Corporation Systems and methods for assisting a robotic apparatus
US10852730B2 (en) 2017-02-08 2020-12-01 Brain Corporation Systems and methods for robotic mobile platforms
JP6433525B2 (ja) * 2017-03-06 2018-12-05 政信 近藤 個人認証装置
CN110692048B (zh) * 2017-03-20 2023-08-15 电子湾有限公司 会话中任务改变的检测
JP7002143B2 (ja) * 2017-03-21 2022-01-20 国立大学法人東京工業大学 コミュニケーション解析装置およびそれに使用される測定・フィードバック装置、インタラクション装置
US10293485B2 (en) 2017-03-30 2019-05-21 Brain Corporation Systems and methods for robotic path planning
CN106920552A (zh) * 2017-03-30 2017-07-04 天津中科先进技术研究院有限公司 一种具有云端交互功能的智能机器人
JP6610610B2 (ja) * 2017-04-27 2019-11-27 トヨタ自動車株式会社 音声入出力装置、無線接続方法、音声対話システム
CN108235745B (zh) 2017-05-08 2021-01-08 深圳前海达闼云端智能科技有限公司 机器人唤醒方法、装置和机器人
CN107219849B (zh) * 2017-05-23 2020-04-07 北京理工大学 一种多途径的捡球和发球机器人控制系统
US10678338B2 (en) 2017-06-09 2020-06-09 At&T Intellectual Property I, L.P. Determining and evaluating data representing an action to be performed by a robot
US10569420B1 (en) 2017-06-23 2020-02-25 X Development Llc Interfacing with autonomous devices
CN111201566A (zh) 2017-08-10 2020-05-26 费赛特实验室有限责任公司 用于处理数据和输出用户反馈的口语通信设备和计算体系架构以及相关方法
US20200357382A1 (en) * 2017-08-10 2020-11-12 Facet Labs, Llc Oral, facial and gesture communication devices and computing architecture for interacting with digital media content
US10083006B1 (en) * 2017-09-12 2018-09-25 Google Llc Intercom-style communication using multiple computing devices
KR102128812B1 (ko) * 2017-12-11 2020-07-02 한국전자통신연구원 로봇의 사회 지능 평가 방법 및 이를 위한 장치
US11024294B2 (en) 2017-12-29 2021-06-01 DMAI, Inc. System and method for dialogue management
US11222632B2 (en) 2017-12-29 2022-01-11 DMAI, Inc. System and method for intelligent initiation of a man-machine dialogue based on multi-modal sensory inputs
US11504856B2 (en) * 2017-12-29 2022-11-22 DMAI, Inc. System and method for selective animatronic peripheral response for human machine dialogue
US10800039B2 (en) * 2018-01-23 2020-10-13 General Electric Company Controlling and commanding an unmanned robot using natural interfaces
US20190236976A1 (en) * 2018-01-31 2019-08-01 Rnd64 Limited Intelligent personal assistant device
US11331807B2 (en) 2018-02-15 2022-05-17 DMAI, Inc. System and method for dynamic program configuration
US10832118B2 (en) * 2018-02-23 2020-11-10 International Business Machines Corporation System and method for cognitive customer interaction
CN108161955A (zh) * 2018-03-19 2018-06-15 重庆鲁班机器人技术研究院有限公司 机器人控制装置
CN110322875A (zh) * 2018-03-29 2019-10-11 富泰华工业(深圳)有限公司 机器人交互系统及方法
FR3080926B1 (fr) * 2018-05-04 2020-04-24 Spoon Procede de commande d'une pluralite d'effecteurs d'un robot
WO2019222160A1 (en) * 2018-05-14 2019-11-21 Board Of Regents, The University Of Texas System Integrated system design for a mobile manipulation robot with socially expressive abilities
JP7000253B2 (ja) * 2018-05-31 2022-01-19 国立大学法人東海国立大学機構 力覚視覚化装置、ロボットおよび力覚視覚化プログラム
CN109003612B (zh) * 2018-06-08 2021-01-29 英业达科技有限公司 基于人工智能的语音问答验证系统及其方法
CN108942926B (zh) * 2018-06-28 2020-06-19 达闼科技(北京)有限公司 一种人机交互的方法、装置和系统
US11230017B2 (en) * 2018-10-17 2022-01-25 Petoi Llc Robotic animal puzzle
KR102228866B1 (ko) * 2018-10-18 2021-03-17 엘지전자 주식회사 로봇 및 그의 제어 방법
WO2020090332A1 (ja) * 2018-10-30 2020-05-07 ソニー株式会社 情報処理装置、情報処理方法、及びプログラム
CN109262617A (zh) * 2018-11-29 2019-01-25 北京猎户星空科技有限公司 机器人控制方法、装置、设备及存储介质
CN109822581A (zh) * 2018-12-08 2019-05-31 浙江国自机器人技术有限公司 用于机房机器人的导览方法
CN109889723A (zh) * 2019-01-30 2019-06-14 天津大学 一种基于nao机器人的音视频数据采集系统
CN109828568B (zh) * 2019-02-15 2022-04-15 武汉理工大学 对RoboCup比赛的NAO机器人寻球步态优化方法
EP3894972B1 (en) 2019-04-29 2023-11-08 Google LLC Motorized computing device that autonomously adjusts device location and/or orientation of interfaces according to automated assistant requests
WO2020251074A1 (ko) * 2019-06-12 2020-12-17 엘지전자 주식회사 음성 인식 기능을 제공하는 인공 지능 로봇 및 그의 동작 방법
CN111061370B (zh) * 2019-12-16 2021-07-16 深圳市云网万店电子商务有限公司 用于智能设备的人机交互装置及方法
CN111694939B (zh) * 2020-04-28 2023-09-19 平安科技(深圳)有限公司 智能调用机器人的方法、装置、设备及存储介质
US11875362B1 (en) 2020-07-14 2024-01-16 Cisco Technology, Inc. Humanoid system for automated customer support
US11907670B1 (en) 2020-07-14 2024-02-20 Cisco Technology, Inc. Modeling communication data streams for multi-party conversations involving a humanoid
CN113222805B (zh) * 2021-05-08 2023-04-07 西北工业大学 一种快速高准确度nao型足球机器人视觉处理方法
KR102519599B1 (ko) * 2021-10-29 2023-04-11 주식회사 서큘러스 멀티모달 기반의 인터랙션 로봇, 및 그 제어 방법
WO2023090951A1 (en) * 2021-11-19 2023-05-25 Samsung Electronics Co., Ltd. Methods and systems for suggesting an enhanced multimodal interaction
CN114770514A (zh) * 2022-05-11 2022-07-22 北京睿知文峰教育科技有限公司 基于stm32的人工智能机器人控制方法及装置
CN116117834A (zh) * 2023-04-11 2023-05-16 佛山宜视智联科技有限公司 可交互的机器人变色系统

Family Cites Families (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7881936B2 (en) * 1998-12-04 2011-02-01 Tegic Communications, Inc. Multimodal disambiguation of speech recognition
JP2001188555A (ja) * 1999-12-28 2001-07-10 Sony Corp 情報処理装置および方法、並びに記録媒体
JP2002261966A (ja) * 2000-09-08 2002-09-13 Matsushita Electric Works Ltd コミュニケーション支援システムおよび撮影装置
JP4765155B2 (ja) * 2000-09-28 2011-09-07 ソニー株式会社 オーサリング・システム及びオーサリング方法、並びに記憶媒体
WO2002029715A1 (en) * 2000-10-03 2002-04-11 Kent Ridge Digital Labs A system, method and language for programming behaviour in synthetic creatures
JP2004283943A (ja) * 2003-03-20 2004-10-14 Sony Corp コンテンツ選択装置及び方法並びにロボット装置
JP2004295766A (ja) * 2003-03-28 2004-10-21 Sony Corp ロボット装置及びロボットを介したユーザの認証方法
WO2005008432A2 (en) * 2003-07-11 2005-01-27 Sonolink Communications Systems, Llc System and method for advanced rule creation and management within an integrated virtual workspace
WO2005050849A2 (en) * 2003-10-01 2005-06-02 Laird Mark D Wireless virtual campus escort system
US20060031340A1 (en) * 2004-07-12 2006-02-09 Boban Mathew Apparatus and method for advanced attachment filtering within an integrated messaging platform
JP4629560B2 (ja) 2004-12-01 2011-02-09 本田技研工業株式会社 対話型情報システム
US20060122837A1 (en) * 2004-12-08 2006-06-08 Electronics And Telecommunications Research Institute Voice interface system and speech recognition method
JP2006187825A (ja) * 2005-01-05 2006-07-20 Yaskawa Electric Corp ロボット装置およびその制御方法
JP2007069302A (ja) * 2005-09-07 2007-03-22 Hitachi Ltd 動作表出装置
JP2007260864A (ja) * 2006-03-29 2007-10-11 Advanced Telecommunication Research Institute International コミュニケーションロボット
JP2008052178A (ja) * 2006-08-28 2008-03-06 Toyota Motor Corp 音声認識装置と音声認識方法
KR100827088B1 (ko) * 2006-09-07 2008-05-02 삼성전자주식회사 소프트웨어 로봇 장치
US8468244B2 (en) * 2007-01-05 2013-06-18 Digital Doors, Inc. Digital information infrastructure and method for security designated data and with granular data stores
JP2008241933A (ja) * 2007-03-26 2008-10-09 Kenwood Corp データ処理装置及びデータ処理方法
US8706914B2 (en) * 2007-04-23 2014-04-22 David D. Duchesneau Computing infrastructure
JP2009061547A (ja) * 2007-09-06 2009-03-26 Olympus Corp ロボット制御システム、ロボット、プログラム及び情報記憶媒体
FR2930108B1 (fr) 2008-04-09 2010-07-30 Aldebaran Robotics Systeme et procede de communication distribue comprenant au moins un serveur, au moins un terminal distant, et au moins un terminal mobile capable de communiquer avec le terminal distant relie en reseau audit serveur
FR2929873B1 (fr) 2008-04-09 2010-09-03 Aldebaran Robotics Architecture de controle-commande d'un robot mobile utilisant des membres articules
US8275803B2 (en) 2008-05-14 2012-09-25 International Business Machines Corporation System and method for providing answers to questions
JP5334178B2 (ja) * 2009-01-21 2013-11-06 クラリオン株式会社 音声認識装置およびデータ更新方法
FR2946160B1 (fr) 2009-05-26 2014-05-09 Aldebaran Robotics Systeme et procede pour editer et commander des comportements d'un robot mobile.
CN101604204B (zh) * 2009-07-09 2011-01-05 北京科技大学 智能情感机器人分布式认知系统

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
None *
See also references of WO2012010451A1 *

Also Published As

Publication number Publication date
US8942849B2 (en) 2015-01-27
CN103119644A (zh) 2013-05-22
JP2013539569A (ja) 2013-10-24
CN103119644B (zh) 2016-01-20
WO2012010451A1 (fr) 2012-01-26
KR20140000189A (ko) 2014-01-02
US20130218339A1 (en) 2013-08-22
JP2017041260A (ja) 2017-02-23
FR2963132A1 (fr) 2012-01-27
JP6129073B2 (ja) 2017-05-17
KR101880775B1 (ko) 2018-08-17
WO2012010437A1 (fr) 2012-01-26
BR112013001711A2 (pt) 2016-05-31

Similar Documents

Publication Publication Date Title
EP2596493A1 (fr) Robot humanoide dote d'une interface de dialogue naturel, procede de controle du robot et programme correspondant
KR102306624B1 (ko) 지속적 컴패니언 디바이스 구성 및 전개 플랫폼
US11148296B2 (en) Engaging in human-based social interaction for performing tasks using a persistent companion device
EP2834811A1 (fr) Robot apte a integrer des dialogues naturels avec un utilisateur dans ses comportements, procedes de programmation et d'utilisation dudit robot
US20170206064A1 (en) Persistent companion device configuration and deployment platform
JP7260221B2 (ja) ロボット対話方法およびデバイス
WO2018093806A1 (en) Embodied dialog and embodied speech authoring tools for use with an expressive social robot
AU2017228574A1 (en) Apparatus and methods for providing a persistent companion device
CN107430501A (zh) 对语音触发进行响应的竞争设备
WO2016011159A9 (en) Apparatus and methods for providing a persistent companion device
FR2947923A1 (fr) Systeme et procede pour generer des comportements contextuels d'un robot mobile
TW201916005A (zh) 互動方法和設備
FR2991222A1 (fr) Systeme et procede pour generer des comportements contextuels d'un robot mobile executes en temps reel
EP3752958A1 (en) System and method for visual scene construction based on user communication
Li et al. " BIRON, let me show you something": evaluating the interaction with a robot companion
WO2021174102A1 (en) Systems and methods for short- and long-term dialog management between a robot computing device/digital companion and a user
WO2018183812A1 (en) Persistent companion device configuration and deployment platform
Avdic Physical Actuation as an Alternative Approach to the Intelligibility of Smart Speakers
Pettersson et al. Perspectives on Ozlab in the cloud: A literature review of tools supporting Wizard-of-Oz experimentation, including an historical overview of 1971-2013 and notes on methodological issues and supporting generic tools

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20130222

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAX Request for extension of the european patent (deleted)
17Q First examination report despatched

Effective date: 20171114

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20190403