WO2018000268A1

WO2018000268A1 - Method and system for generating robot interaction content, and robot

Info

Publication number: WO2018000268A1
Application number: PCT/CN2016/087753
Authority: WO
Inventors: 邱楠; 杨新宇; 王昊奋
Original assignee: 深圳狗尾草智能科技有限公司
Priority date: 2016-06-29
Filing date: 2016-06-29
Publication date: 2018-01-04
Also published as: CN106462254A

Abstract

A method for generating robot interaction content, comprising: obtaining a multi-modal signal (S101); determining a user intention according to the multi-modal signal (S102); and generating robot interaction content in combination with a current life timeline of a robot according to the multi-modal signal and the user intention (S103). By means of the method, the life timeline on which the robot is located is added to the generation of the robot interaction content, such that the robot is more humanized when interacting with humans and has a human lifestyle within the life timeline, and the humanization of robot interaction content generation, the human-robot interaction experience, and the intelligence can be improved.

Description

Method, system and robot for generating robot interactive content

Technical field

The invention relates to the field of robot interaction technology, and in particular to a method, a system and a robot for generating robot interactive content.

Background art

Humans usually make expressions in the course of interacting with one another. Generally, after the eyes see something or the ears hear a sound, the brain analyzes it and a reasonable expression is given as feedback. A person is also situated in a life scene on a certain time axis, such as eating, sleeping or exercising, and changes in the values of these scenes affect the feedback of human expression. For robots, expression feedback is currently obtained mainly through pre-designed methods and corpora trained with deep learning. Feedback obtained through pre-designed programs and corpus training has the following disadvantage: the output of an expression depends on the human's text input, that is, like a question-and-answer machine, different words from the user trigger different expressions. In this case the robot actually outputs expressions according to an interaction mode pre-designed by humans, so the robot cannot be more anthropomorphic and cannot, like a human, show different expressions in the life scenes of different points in time. In other words, the way robot interactive content is generated is completely passive, so generating expressions requires a great deal of human-computer interaction, and the robot's intelligence is poor.

Therefore, how to make the robot itself have a human lifestyle in the life time axis and improve the anthropomorphicity of the robot interactive content generation is a technical problem that needs to be solved in the technical field.

Summary of the invention

The object of the present invention is to provide a method, a system and a robot for generating robot interactive content, so that the robot itself has a human lifestyle within the active interactive variable parameters, thereby enhancing the anthropomorphism of robot interactive content generation, enhancing the human-computer interaction experience and improving intelligence.

The object of the present invention is achieved by the following technical solutions:

A method for generating robot interactive content, comprising:

Obtaining a multimodal signal;

Determining a user intent based on the multimodal signal;

Based on the multimodal signal and the user intent, the robot interaction content is generated in conjunction with the current robot life timeline.

Preferably, the method for generating parameters of the life time axis of the robot includes:

Extend the robot's self-awareness;

Get the parameters of the life timeline;

The self-cognitive parameters of the robot are fitted to the parameters in the life time axis to generate a robot life time axis.

Preferably, the step of expanding the self-cognition of the robot specifically comprises: combining the life scene with the self-cognition of the robot to form a self-cognitive curve based on the life time axis.

Preferably, the step of fitting the self-cognitive parameters of the robot to the parameters in the life time axis specifically comprises: using a probability algorithm to calculate the probability of change of each parameter of the robot on the life time axis after a time axis scene parameter is changed, so as to form a fitted curve.

Preferably, the life time axis refers to a time axis covering the 24 hours of a day, and the parameters in the life time axis include at least a daily life behavior performed by the user on the life time axis and parameter values representing the behavior.

Preferably, the multi-modal signal includes at least an image signal, and the step of generating the robot interaction content according to the current multi-modality signal and the user intention, in combination with the current robot life time axis, specifically includes:

Based on the image signal and the user intent, the robot interaction content is generated in conjunction with the current robot life timeline.

Preferably, the multi-modal signal includes at least a voice signal, and the step of generating the robot interaction content according to the current multi-modality signal and the user intention, in combination with the current robot life time axis, specifically includes:

Based on the speech signal and the user intent, the robot interaction content is generated in conjunction with the current robot life timeline.

Preferably, the multi-modal signal includes at least a gesture signal, and the step of generating the robot interaction content according to the current multi-modality signal and the user intention, in combination with the current robot life time axis, specifically includes:

A robot interaction content is generated in accordance with the current robot life timeline based on the gesture signal and the user intent.

The invention discloses a system for generating robot interactive content, comprising:

An acquisition module for acquiring a multimodal signal;

An intent identification module, configured to determine a user intent according to the multimodal signal;

A content generating module, configured to generate robot interactive content in combination with the current robot life timeline according to the multimodal signal and the user intention.

Preferably, the system comprises a time axis based and artificial intelligence cloud processing module for:

Extend the robot's self-awareness;

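Get the parameters of the life timeline;

The self-cognitive parameters of the robot are fitted to the parameters in the life time axis to generate a robot life time axis.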

Preferably, the time-based and artificial intelligence cloud processing module is further configured to combine a life scene with a self-awareness of the robot to form a self-cognitive curve based on a life time axis.

Preferably, the time axis-based and artificial intelligence cloud processing module is further configured to: use a probability algorithm to calculate a probability of each parameter change of the robot on the life time axis after the time axis scene parameter is changed, to form a fitting curve.

Preferably, the multi-modal signal includes at least an image signal, and the content generating module is specifically configured to: generate, according to the image signal and the user intention, a robot interaction content according to a current robot life time axis.

Preferably, the multi-modal signal includes at least a voice signal, and the content generating module is specifically configured to: generate, according to the voice signal and the user intention, a robot interaction content according to a current life time axis of the robot.

Preferably, the multi-modal signal includes at least a gesture signal, and the content generating module is specifically configured to: generate, according to the gesture signal and the user intention, a robot interaction content according to a current robot life time axis.

The invention discloses a robot comprising a system for generating interactive content of a robot as described above.

Compared with the prior art, the present invention has the following advantages. Existing robots generally generate interactive content by question-and-answer interaction in fixed scenes, and cannot generate the robot's expression more accurately based on the current scene. The method for generating robot interactive content of the present invention includes: acquiring a multimodal signal; determining a user intention according to the multimodal signal; and generating robot interaction content in combination with the current robot life time axis according to the multimodal signal and the user intention. In this way, multimodal signals such as image signals and speech signals can be combined with the robot's variable parameters to generate robot interaction content more accurately, so that the robot interacts and communicates with people more accurately and anthropomorphically. For people, everyday life has a certain regularity. To make the robot communicate with people in a more anthropomorphic way, the robot is given actions such as sleeping, exercising, eating, dancing, reading and putting on make-up within the 24 hours of a day. The present invention therefore adds the life time axis on which the robot is located to the generation of the robot's interactive content, so that the robot is more humanized when interacting with people and has a human lifestyle within the life time axis. The method can improve the anthropomorphism of robot interaction content generation, enhance the human-computer interaction experience and improve intelligence.

Drawings

FIG. 1 is a flowchart of a method for generating robot interactive content according to Embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of a system for generating robot interactive content according to Embodiment 2 of the present invention.

Detailed description

Although the flowcharts describe the operations as a sequential process, many of the operations can be implemented in parallel, concurrently or simultaneously. The order of the operations can be rearranged. Processing may be terminated when its operations are completed, but may also have additional steps not included in the figures. Processing can correspond to methods, functions, procedures, subroutines, subprograms, and the like.

Computer devices include user devices and network devices. The user device or client includes but is not limited to a computer, a smart phone, a PDA, etc.; the network device includes but is not limited to a single network server, a server group composed of multiple network servers, or a cloud of computers or network servers based on cloud computing. The computer device can operate alone to carry out the invention, and can also access a network and implement the invention through interoperation with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.

The terms "first," "second," and the like may be used herein to describe the various elements, but the elements should not be limited by these terms, and the terms are used only to distinguish one element from another. The term "and/or" used herein includes any and all combinations of one or more of the associated listed items. When a unit is referred to as being "connected" or "coupled" to another unit, it can be directly connected or coupled to the other unit, or an intermediate unit can be present.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. The singular forms "a" and "an" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "includes" and/or "comprises", as used herein, specify the presence of the stated features, integers, steps, operations, units and/or components, and do not exclude the presence or addition of one or more other features, integers, steps, operations, units, components and/or combinations thereof.

The invention will now be further described with reference to the drawings and preferred embodiments.

Embodiment 1

As shown in FIG. 1 , a method for generating interactive content of a robot is disclosed in this embodiment, including:

S101. Acquire a multi-modal signal.

S102. Determine a user intent according to the multimodal signal.

S103. Generate robot interaction content in combination with the current robot life timeline 300 according to the multimodal signal and the user intention.

In terms of the application scenario, existing robots generally generate interactive content by question-and-answer interaction in fixed scenes, and cannot generate the robot's expression more accurately based on the current scene. The method of this embodiment includes: acquiring a multimodal signal; determining a user intention according to the multimodal signal; and generating robot interaction content in combination with the current robot life time axis according to the multimodal signal and the user intention. In this way, multimodal signals such as image signals and speech signals can be combined with the robot's variable parameters to generate robot interaction content more accurately, so that the robot interacts and communicates with people more accurately and anthropomorphically. For people, everyday life has a certain regularity. To make the robot communicate with people in a more anthropomorphic way, the robot is given actions such as sleeping, exercising, eating, dancing, reading and putting on make-up within the 24 hours of a day. The present invention therefore adds the life time axis on which the robot is located to the generation of the robot's interactive content, so that the robot is more humanized when interacting with people and has a human lifestyle within the life time axis. The method can improve the anthropomorphism of robot interaction content generation, enhance the human-computer interaction experience and improve intelligence. The interactive content can be an expression, text or voice. The robot life timeline 300 is completed and set in advance. Specifically, the robot life timeline 300 is a collection of parameters, and these parameters are transmitted to the system to generate the interactive content.
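The statement that the life timeline is a preset collection of parameters can be illustrated with a minimal Python sketch. It assumes one slot per hour of the day, each holding an activity and a dictionary of parameter values. The names `TimelineSlot`, `build_timeline` and `current_slot`, the "idle" default and all numeric values are hypothetical and are not given in this description.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TimelineSlot:
    """One hour of the life timeline: an activity plus its parameter values."""
    hour: int          # 0..23
    activity: str      # e.g. "sleep", "eat", "exercise"
    parameters: dict = field(default_factory=dict)  # hypothetical values


def build_timeline(blocks):
    """Expand (start_hour, end_hour, activity, parameters) blocks into 24 slots.

    end_hour is exclusive; hours not covered by any block default to "idle".
    """
    slots = [TimelineSlot(hour=h, activity="idle") for h in range(24)]
    for start, end, activity, params in blocks:
        for h in range(start, end):
            slots[h] = TimelineSlot(hour=h, activity=activity, parameters=dict(params))
    return slots


def current_slot(timeline, hour):
    """Return the slot for the given hour, wrapping around midnight."""
    return timeline[hour % 24]


# Illustrative schedule only; the description gives no concrete values.
EXAMPLE_TIMELINE = build_timeline([
    (0, 7, "sleep", {"fatigue": -0.5}),
    (7, 8, "eat", {"mood": 0.1}),
    (12, 13, "eat", {"mood": 0.1}),
    (18, 19, "exercise", {"fatigue": 0.3}),
    (22, 24, "sleep", {"fatigue": -0.5}),
])
```

In such a sketch, the slot returned by `current_slot` would be what is handed to the content generation step together with the multimodal signal and the user intention.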

The multimodal information in this embodiment may be one or more of user expression, voice information, gesture information, scene information, image information, video information, face information, pupil and iris information, light sense information, and fingerprint information. In this embodiment, it is preferable to add a voice signal and a gesture signal to the picture signal, so that recognition is accurate and efficient.

In this embodiment, the life time axis is obtained as follows: the robot is fitted to the time axis of human daily life, and the robot's behavior follows this fitted curve, which yields the robot's own behavior over a day. The robot can then act on the basis of the life time axis, for example by generating interactive content and communicating with humans. If the robot is awake all the time, it acts according to the behaviors on this time axis, and the robot's self-cognition also changes according to this time axis. The life time axis and the variable parameters can change attributes of the self-cognition, such as the mood value and the fatigue value, and can also automatically add new self-cognition information. For example, if there was previously no anger value, a scene based on the life time axis and the variable factors will automatically add one to the robot's self-cognition, following scenes that previously simulated human self-cognition.
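How a timeline activity might change existing self-cognition attributes and introduce a new one can be sketched as follows. This is an assumption-laden illustration: the function name `apply_activity`, the additive update, the clamping range of 0.0 to 1.0 and the numeric deltas are all hypothetical choices, not taken from this description.

```python
def apply_activity(self_cognition, effects):
    """Return a new self-cognition dict after one timeline activity.

    Attributes missing from self_cognition (e.g. "anger") are added,
    starting from 0.0; every value is clamped to the range [0.0, 1.0].
    """
    updated = dict(self_cognition)
    for name, delta in effects.items():
        updated[name] = min(1.0, max(0.0, updated.get(name, 0.0) + delta))
    return updated


# Hypothetical starting state with no anger value yet.
state = {"mood": 0.5, "fatigue": 0.25}
state = apply_activity(state, {"fatigue": 0.5})                 # e.g. exercise
state = apply_activity(state, {"mood": -0.25, "anger": 0.25})   # scene that adds anger
```

After the second call the state contains an "anger" entry that did not exist before, mirroring the automatic addition of new self-cognition information described above.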

For example, suppose the multimodal signal is the user saying "so sleepy" to the robot by voice, optionally supplemented with a picture signal. From the multimodal signal, such as the voice signal plus the picture signal, the robot judges comprehensively and recognizes the user's intention as the user being very sleepy. Combined with the robot life timeline, if the current time is 9 a.m., the robot knows that the owner has just got up and should greet the owner, for example by answering "Good morning", possibly accompanied by an expression, a picture and so on. The interactive content in the present invention can be understood as the robot's reply. If the user says "so sleepy" under the same conditions but the current time on the robot life timeline is 9 p.m., the robot knows that the owner needs to sleep and replies with words such as "Good night, master, sleep well", again possibly accompanied by an expression, a picture and so on. This approach produces replies and expressions that are more anthropomorphic and closer to people's lives than relying on scene recognition alone. The multimodal signal is generally a combination of several signals, such as a picture signal plus a voice signal, or a picture signal plus a voice signal plus a gesture signal.

According to one example, the method for generating parameters of the robot life time axis includes:

Extend the robot's self-awareness;

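Get the parameters of the life timeline;

The self-cognitive parameters of the robot are fitted to the parameters in the life time axis to generate a robot life time axis.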

In this way, the life time axis is added to the self-cognition of the robot itself, so that the robot has an anthropomorphic life. For example, add the cognition of lunch to the robot.

According to another example, the step of expanding the self-cognition of the robot specifically includes: combining the life scene with the self-awareness of the robot to form a self-cognitive curve based on the life time axis. In this way, the life time axis can be specifically added to the parameters of the robot itself.

According to another example, the step of fitting the self-cognition parameters of the robot to the parameters in the life time axis specifically includes: using a probability algorithm to calculate the probability of change of each parameter of the robot on the life time axis after a time axis scene parameter is changed, and forming a fitted curve. In this way, the parameters of the robot's self-cognition can be specifically matched with the parameters in the life time axis. The probability algorithm may be a Bayesian probability algorithm.
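The description names a Bayesian probability algorithm without giving its details. One simple possibility, shown here only as an assumed sketch, is a Beta-Bernoulli estimate of the probability that a given self-cognition parameter changes after a given scene occurs. The class name `ChangeProbability`, the uniform Beta(1, 1) prior and the example observations are hypothetical.

```python
from collections import defaultdict


class ChangeProbability:
    """Posterior mean of P(parameter changes | scene) under a Beta prior."""

    def __init__(self, prior_changed=1.0, prior_unchanged=1.0):
        self.prior_changed = prior_changed
        self.prior_unchanged = prior_unchanged
        # (scene, parameter) -> [times changed, times observed]
        self.counts = defaultdict(lambda: [0, 0])

    def observe(self, scene, parameter, changed):
        """Record one occurrence of a scene and whether the parameter changed."""
        entry = self.counts[(scene, parameter)]
        entry[0] += int(changed)
        entry[1] += 1

    def probability(self, scene, parameter):
        """Posterior mean: (prior_changed + k) / (prior_changed + prior_unchanged + n)."""
        changed, total = self.counts.get((scene, parameter), (0, 0))
        return (self.prior_changed + changed) / (
            self.prior_changed + self.prior_unchanged + total
        )


estimator = ChangeProbability()
for changed in (True, True, True, False):   # hypothetical observations
    estimator.observe("exercise", "fatigue", changed)
p_fatigue = estimator.probability("exercise", "fatigue")  # (1 + 3) / (2 + 4)
p_unseen = estimator.probability("read", "mood")          # prior mean only
```

With no observations the estimate falls back to the prior mean, so a newly added scene or parameter starts from a neutral value and is refined as data accumulate.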

For example, within the 24 hours of a day, the robot has actions such as sleeping, exercising, eating, dancing, reading and putting on make-up. Each action affects the robot's self-cognition, and the parameters on the life time axis are combined with the robot's self-cognition for fitting. After fitting, the robot's self-cognition includes mood, fatigue value, intimacy, favorability, number of interactions, the robot's three-dimensional cognition, age, height, weight, game scene value, game object value, location scene value, location object value and so on. The location values allow the robot to identify the scene in which it is located, such as a cafe or a bedroom.

The robot performs different actions along the time axis of a day, such as sleeping at night, eating at noon and exercising during the day, and every scene in the life time axis affects its self-cognition. These numerical changes are modeled by dynamic fitting of a probability model, which fits the probability of each of these actions occurring on the time axis.
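One way to picture a fitted curve of this kind is to propagate the expected change of a single self-cognition attribute through the day, given a probability for each action in each hour. The sketch below is an illustrative assumption rather than the disclosed model: `expected_curve`, the per-hour probabilities and the per-action effects are hypothetical, and clamping the running value is an approximation.

```python
def expected_curve(initial, hourly_activity_probs, effects, lo=0.0, hi=1.0):
    """Expected value of one self-cognition attribute at the end of each hour.

    hourly_activity_probs: list of dicts mapping activity -> probability,
        one dict per hour of the day.
    effects: dict mapping activity -> change applied to the attribute per hour.
    The running value is clamped to [lo, hi], which is only an approximation
    of the true expectation.
    """
    curve = []
    value = initial
    for probs in hourly_activity_probs:
        value += sum(p * effects.get(activity, 0.0) for activity, p in probs.items())
        value = min(hi, max(lo, value))
        curve.append(value)
    return curve


# Hypothetical day: certain sleep for 8 hours, then a 25% chance of exercise per hour.
probs = [{"sleep": 1.0}] * 8 + [{"exercise": 0.25, "idle": 0.75}] * 16
fatigue_effects = {"sleep": -0.1, "exercise": 0.2}
fatigue_curve = expected_curve(0.5, probs, fatigue_effects)
```

Under these assumed numbers the fatigue value falls to its lower bound during the night and rises gradually over the day.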

According to another example, the multi-modal signal includes at least an image signal, and the step of generating the robot interaction content in combination with the current robot life time axis according to the multi-modality signal and the user intention specifically includes:

Based on the image signal and the user intent, the robot interaction content is generated in conjunction with the current robot life timeline. The multimodal signal includes at least an image signal so that the robot can grasp the user's intention. To understand the user's intention better, other signals such as a voice signal or a gesture signal are generally added, so that the robot can judge more accurately whether the user means what is expressed or is joking.
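The benefit of adding further signals can be illustrated with a simple late-fusion sketch in which each modality contributes a score per candidate intent and the scores are combined by a weighted sum. The description does not specify a fusion method, so the function `fuse_intents`, the intent labels "sincere" and "joke" and all scores are hypothetical.

```python
def fuse_intents(modality_scores, weights=None):
    """Pick the intent with the highest weighted sum of per-modality scores.

    modality_scores: dict mapping modality name -> {intent: score}.
    weights: optional dict mapping modality name -> weight (default 1.0).
    Returns None when no scores are supplied.
    """
    weights = weights or {}
    totals = {}
    for modality, scores in modality_scores.items():
        w = weights.get(modality, 1.0)
        for intent, score in scores.items():
            totals[intent] = totals.get(intent, 0.0) + w * score
    if not totals:
        return None
    return max(totals, key=totals.get)


# Voice alone suggests a sincere statement; a smiling face in the image tips it to a joke.
voice_only = fuse_intents({"voice": {"sincere": 0.55, "joke": 0.45}})
voice_and_image = fuse_intents({
    "voice": {"sincere": 0.55, "joke": 0.45},
    "image": {"sincere": 0.2, "joke": 0.8},
})
```

In this toy case the voice signal alone would be read as sincere, while the combined signals are read as a joke.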

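According to another example, the multi-modal signal includes at least a voice signal, and the step of generating the robot interaction content according to the current multi-modality signal and the user intention, in combination with the current robot life time axis, specifically includes:

Based on the voice signal and the user intention, the robot interaction content is generated in combination with the current robot life time axis.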

According to another example, the multi-modal signal includes at least a gesture signal, and the step of generating the robot interaction content according to the current multi-modality signal and the user intention, in combination with the current robot life time axis, specifically includes:

Based on the gesture signal and the user intention, the robot interaction content is generated in combination with the current robot life time axis.

For example, suppose the multimodal signal is the user saying "hungry" to the robot by voice, optionally supplemented with a picture signal. From the multimodal signal, such as the voice signal plus the picture signal, the robot judges comprehensively and recognizes the user's intention as the user being very hungry. Combined with the robot life timeline, if the current time is 9 a.m., the robot replies by telling the user to go and have breakfast, accompanied by a cute expression. If the user says "hungry" under the same conditions but the current time on the robot life timeline is 9 p.m., the robot replies that it is too late to eat, again with a cute expression.
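The "sleepy" and "hungry" examples of this embodiment can be condensed into a small lookup keyed by the recognized intent and the period of the day. This is only a sketch: the period boundaries in `period_of_day`, the intent labels, the fallback reply and the exact reply wording are assumptions made for illustration.

```python
def period_of_day(hour):
    """Map an hour (0..23) to a coarse period; the boundaries are assumed."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "evening"  # late evening and night


# Replies paraphrased from the 9 a.m. and 9 p.m. examples of this embodiment.
REPLIES = {
    ("sleepy", "morning"): "Good morning!",
    ("sleepy", "evening"): "Good night, master, sleep well.",
    ("hungry", "morning"): "Go and have some breakfast.",
    ("hungry", "evening"): "It is too late to eat.",
}


def generate_reply(intent, hour, default="I see."):
    """Return the reply for an intent at the given hour, or a fallback."""
    return REPLIES.get((intent, period_of_day(hour)), default)


morning_reply = generate_reply("sleepy", 9)
evening_reply = generate_reply("sleepy", 21)
```

The same intent thus yields different interactive content depending only on where the current time falls on the life timeline.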

In this embodiment, the voice signal and the picture signal are generally used together to understand the user's meaning accurately and thus reply to the user more accurately. Adding other signals, such as gesture signals or video signals, makes the understanding more accurate still.

Embodiment 2

As shown in FIG. 2, in this embodiment, a system for generating interactive content of a robot includes:

The obtaining module 201 is configured to acquire a multi-modal signal;

The intent identification module 202 is configured to determine a user intent according to the multimodal signal;

The content generation module 203 is configured to generate the robot interaction content according to the multimodal signal and the user intention, in combination with the current robot life time axis sent by the robot life timeline module 301.

In this way, multimodal signals such as image signals and speech signals can be combined with the robot's variable parameters to generate robot interaction content more accurately, so that the robot interacts and communicates with people more accurately and anthropomorphically. For people, everyday life has a certain regularity. To make the robot communicate with people in a more anthropomorphic way, the robot is given actions such as sleeping, exercising, eating, dancing, reading and putting on make-up within the 24 hours of a day. The present invention therefore adds the life time axis on which the robot is located to the generation of the robot's interactive content, so that the robot is more humanized when interacting with people and has a human lifestyle within the life time axis. The system can improve the anthropomorphism of robot interaction content generation, enhance the human-computer interaction experience and improve intelligence. The interactive content can be an expression, text or voice.
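The cooperation of modules 201, 202, 203 and 301 can be sketched as four pluggable callables wired together in one object. The class `InteractionSystem`, its field names and the stub implementations are hypothetical; the real modules of this embodiment are not specified at this level of detail.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InteractionSystem:
    """Wires the four modules together; each field is a pluggable callable."""
    acquire: Callable[[], dict]                          # obtaining module 201
    identify_intent: Callable[[dict], str]               # intent identification module 202
    current_timeline: Callable[[], dict]                 # robot life timeline module 301
    generate_content: Callable[[dict, str, dict], str]   # content generation module 203

    def step(self) -> str:
        """Run one interaction: acquire, identify intent, read timeline, generate."""
        signal = self.acquire()
        intent = self.identify_intent(signal)
        timeline = self.current_timeline()
        return self.generate_content(signal, intent, timeline)


# Stub modules standing in for real speech, vision and timeline components.
system = InteractionSystem(
    acquire=lambda: {"voice": "so sleepy"},
    identify_intent=lambda s: "sleepy" if "sleepy" in s.get("voice", "") else "unknown",
    current_timeline=lambda: {"hour": 21, "activity": "sleep"},
    generate_content=lambda s, i, t: f"intent={i}, activity={t['activity']}",
)
result = system.step()
```

Keeping the timeline behind its own callable reflects the arrangement in which the life timeline module 301 sends the current timeline to the content generation module 203.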

For example, suppose the multimodal signal is the user saying "so sleepy" to the robot by voice, optionally supplemented with a picture signal. From the multimodal signal, such as the voice signal plus the picture signal, the robot judges comprehensively and recognizes the user's intention as the user being very sleepy. Combined with the robot life timeline, if the current time is 9 a.m., the robot knows that the owner has just got up and should greet the owner, for example by answering "Good morning", possibly accompanied by an expression, a picture and so on. The interactive content in the present invention can be understood as the robot's reply. If the user says "so sleepy" under the same conditions but the current time on the robot life timeline is 9 p.m., the robot knows that the owner needs to sleep and replies with words such as "Good night, master, sleep well", again possibly accompanied by an expression, a picture and so on. This approach produces replies and expressions that are more anthropomorphic and closer to people's lives than relying on scene recognition alone. The multimodal signal is generally a combination of several signals, such as a picture signal plus a voice signal, or a picture signal plus a voice signal plus a gesture signal.

According to one example, the system includes a time axis based and artificial intelligence cloud processing module for:

Extend the robot's self-awareness;

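Get the parameters of the life timeline;

The self-cognitive parameters of the robot are fitted to the parameters in the life time axis to generate a robot life time axis.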

According to another example, the time-based and artificial intelligence cloud processing module is further configured to combine a life scene with a self-awareness of the robot to form a self-cognitive curve based on a life time axis. In this way, the life time axis can be specifically added to the parameters of the robot itself.

According to another example, the time axis-based and artificial intelligence cloud processing module is further configured to: use a probability algorithm to calculate a probability of each parameter change of the robot on the life time axis after the time axis scene parameter changes, to form a fit curve. In this way, the parameters of the robot's self-cognition can be specifically matched with the parameters in the life time axis. The probability algorithm may be a Bayesian probability algorithm.

For example, within the 24 hours of a day, the robot has actions such as sleeping, exercising, eating, dancing, reading and putting on make-up. Each action affects the robot's self-cognition, and the parameters on the life time axis are combined with the robot's self-cognition for fitting. After fitting, the robot's self-cognition includes mood, fatigue value, intimacy, favorability, number of interactions, the robot's three-dimensional cognition, age, height, weight, game scene value, game object value, location scene value, location object value and so on. The location values allow the robot to identify the scene in which it is located, such as a cafe or a bedroom.

According to another example, the multi-modality signal includes at least an image signal, and the content generation module is specifically configured to generate the robot interaction content according to the current robot life time axis according to the image signal and the user intention.

The multimodal signal includes at least an image signal so that the robot can grasp the user's intention. To understand the user's intention better, other signals such as a voice signal or a gesture signal are generally added, so that the robot can judge more accurately whether the user means what is expressed or is joking.

According to another example, the multi-modal signal includes at least a voice signal, and the content generating module is specifically configured to: generate, according to the voice signal and the user intention, a robot interaction content according to a current robot life time axis.

According to another example, the multi-modality signal includes at least a gesture signal, and the content generation module is specifically configured to generate the robot interaction content according to the current robot life time axis according to the gesture signal and the user intention.

The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the present invention is not limited to this description. It will be apparent to those skilled in the art that modifications and substitutions may be made without departing from the spirit and scope of the invention.

Claims

A method for generating interactive content of a robot, comprising:

Obtaining a multimodal signal;

Determining a user intent based on the multimodal signal;

Based on the multimodal signal and the user intent, the robot interaction content is generated in conjunction with the current robot life timeline.
The generating method according to claim 1, wherein the generating method of the parameter of the life time axis of the robot comprises:

Extend the robot's self-awareness;

Get the parameters of the life timeline;

The self-cognitive parameters of the robot are fitted to the parameters in the life time axis to generate a robot life time axis.
The generating method according to claim 2, wherein the step of expanding the self-cognition of the robot specifically comprises: combining the life scene with the self-awareness of the robot to form a self-cognitive curve based on the life time axis.
The generating method according to claim 2, wherein the step of fitting the self-cognitive parameters of the robot to the parameters in the life time axis specifically comprises: using a probability algorithm to calculate the probability of change of each parameter of the robot on the life time axis after a time axis scene parameter is changed, so as to form a fitted curve.
The generating method according to claim 2, wherein the life time axis refers to a time axis covering the 24 hours of a day, and the parameters in the life time axis include at least a daily life behavior performed by the user on the life time axis and parameter values representing the behavior.
The generating method according to claim 1, wherein the multimodal signal includes at least an image signal, and the step of generating robot interaction content in combination with the current robot life time axis according to the multimodal signal and the user intention specifically includes:

Based on the image signal and the user intent, the robot interaction content is generated in conjunction with the current robot life timeline.
The generating method according to claim 1, wherein the multimodal signal includes at least a voice signal, and the step of generating robot interaction content in combination with the current robot life time axis according to the multimodal signal and the user intention specifically includes:

Based on the speech signal and the user intent, the robot interaction content is generated in conjunction with the current robot life timeline.
The generating method according to claim 1, wherein the multimodal signal includes at least a gesture signal, and the step of generating robot interaction content in combination with the current robot life time axis according to the multimodal signal and the user intention specifically includes:

A robot interaction content is generated in accordance with the current robot life timeline based on the gesture signal and the user intent.
A system for generating interactive content of a robot, comprising:

An acquisition module for acquiring a multimodal signal;

An intent identification module, configured to determine a user intent according to the multimodal signal;

And a content generating module, configured to generate the robot interaction content according to the current multi-modality signal and the user intention, in combination with the current robot life time axis.
The generating system according to claim 9, wherein the system comprises a time axis based and artificial intelligence cloud processing module for:

Extend the robot's self-awareness;

Get the parameters of the life timeline;

The self-cognitive parameters of the robot are fitted to the parameters in the life time axis to generate a robot life time axis.
The generating system according to claim 10, wherein the time-based and artificial intelligence cloud processing module is further configured to combine a life scene with a self-awareness of the robot to form a self-cognitive curve based on a life time axis.
The generating system according to claim 10, wherein the time-based and artificial intelligence cloud processing module is further configured to: use a probability algorithm to calculate the probability of change of each parameter of the robot on the life time axis after a time axis scene parameter is changed, so as to form a fitted curve.
The generating system according to claim 10, wherein the life time axis refers to a time axis covering the 24 hours of a day, and the parameters in the life time axis include at least a daily life behavior performed by the user on the life time axis and parameter values representing the behavior.
The generating system according to claim 9, wherein the multimodal signal includes at least an image signal, and the content generating module is specifically configured to: generate robot interaction content in combination with the current robot life time axis according to the image signal and the user intention.
The generating system according to claim 9, wherein the multimodal signal includes at least a voice signal, and the content generating module is specifically configured to: generate robot interaction content in combination with the current robot life time axis according to the voice signal and the user intention.
The generating system according to claim 9, wherein the multimodal signal includes at least a gesture signal, and the content generating module is specifically configured to: generate robot interaction content in combination with the current robot life time axis according to the gesture signal and the user intention.
A robot comprising a robot interactive content generating system according to any one of claims 9 to 16.