CN109176535B - Interaction method and system based on intelligent robot


Info

Publication number
CN109176535B
CN109176535B (application CN201810777646.9A)
Authority
CN
China
Prior art keywords
user
current scene
parameters
interaction
intelligent robot
Prior art date
Legal status
Active
Application number
CN201810777646.9A
Other languages
Chinese (zh)
Other versions
CN109176535A (en)
Inventor
谢巧菁
魏晨
Current Assignee
Beijing Guangnian Infinite Technology Co ltd
Original Assignee
Beijing Guangnian Infinite Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Guangnian Infinite Technology Co ltd filed Critical Beijing Guangnian Infinite Technology Co ltd
Priority to CN201810777646.9A priority Critical patent/CN109176535B/en
Publication of CN109176535A publication Critical patent/CN109176535A/en
Application granted granted Critical
Publication of CN109176535B publication Critical patent/CN109176535B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00Manipulators not otherwise provided for
    • B25J11/0005Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)

Abstract

The invention provides an interaction method based on an intelligent robot, which comprises the following steps: acquiring multi-modal input data when the intelligent robot is activated; extracting the current scene parameters of the current scene from the multi-modal input data; fusing the current scene parameters with historical scene parameters to form a dynamic life track map of the user; and generating multi-modal interaction output data based on the dynamic life track map. The invention provides an intelligent robot which has a preset image and preset attributes and can perform multi-modal interaction with a user. In addition, the invention can effectively utilize the life map of times, people and places generated in the user's daily life, and by fully developing and exploiting this life map it can expand interaction topics or recommend meaningful topics to the user, enrich the interaction content and improve interaction accuracy.

Description

Interaction method and system based on intelligent robot
Technical Field
The invention relates to the field of artificial intelligence, in particular to an interaction method and system based on an intelligent robot.
Background
The development of robot multi-modal interaction systems has aimed at imitating human dialogue, attempting to reproduce the way humans interact with one another across different contexts. However, at present the development of multi-modal interaction systems for intelligent robots is far from perfect: no intelligent robot capable of such multi-modal interaction has emerged, and, more importantly, there is no interactive product that develops and utilizes the life map of times, people and places generated by a user in daily life.
Therefore, the invention provides an interaction method and system based on an intelligent robot.
Disclosure of Invention
In order to solve the above problems, the present invention provides an interaction method based on an intelligent robot, the method comprising the following steps:
acquiring multi-modal input data under the condition that the intelligent robot is activated;
extracting current scene parameters in a current scene from the multi-modal input data;
fusing the current scene parameters and the historical scene parameters to form a dynamic life track map of the user;
generating multimodal interaction output data based on the dynamic life trajectory graph.
According to an embodiment of the present invention, the step of extracting the current scene parameters in the current scene from the multi-modal input data comprises:
positioning the geographic position of the current scene, and determining the location parameter of the current scene;
capturing and identifying the people within the range of the current scene, and determining the face parameter of the current scene;
acquiring time information of a current scene, and determining a time parameter of the current scene;
associating the location parameter, the person parameter and the time parameter, and recording them as the current scene parameters of the current scene.
According to an embodiment of the present invention, the step of fusing the current context parameters and the historical context parameters to form the dynamic life trajectory map of the user includes:
judging the parameter attribute of the current scene parameter, and determining the category of the current scene parameter;
and according to the category of the current scene parameters, fusing the current scene parameters and the historical scene parameters to form a dynamic life track map of the user.
According to an embodiment of the present invention, the step of generating multi-modal interaction output data based on the dynamic life trajectory graph comprises:
determining an interaction topic of a user and the intelligent robot;
traversing a dynamic life track map of the user, and searching a user life track related to the interactive topic;
and generating multi-modal interaction output data according to the user life track related to the interaction topic and the multi-modal input data, or updating the user life track.
According to an embodiment of the present invention, the step of determining the interactive topic of the user with the intelligent robot includes:
analyzing the multi-modal input data, acquiring the interaction intention of the user, and deciding and outputting an interaction topic corresponding to the intention;
or, alternatively,
parsing and analyzing the current scene parameters and deciding the interaction topic.
According to an embodiment of the present invention, before the step of fusing the current context parameters and the historical context parameters to form the dynamic life track map of the user, the method further includes:
acquiring identity characteristic information of a current user, judging user attributes of the current user, and determining the category of the current user, wherein the category of the user comprises: a child user.
According to an embodiment of the present invention, when the user interacting with the intelligent robot is a child user, the method further comprises:
determining an interaction topic suited to the child's interaction intention;
or, alternatively,
parsing and analyzing the current scene parameters together with the child face parameter, and deciding the interaction topic.
According to one embodiment of the present invention, when the user interacting with the intelligent robot includes a child user, the step of generating multi-modal interaction output data based on the dynamic life trajectory graph includes:
screening the multi-modal interaction output data and removing the multi-modal interaction output data which is not suitable for the child user.
According to another aspect of the invention, there is also provided a program product containing a series of instructions for carrying out the steps of the method according to any one of the above.
According to another aspect of the present invention, there is also provided an intelligent robot-based interaction system, including:
the intelligent terminal comprises a camera, a positioning device and a timing device, is used for acquiring the current scene parameters of the current scene and multi-modal input data, and has the capability of outputting voice, emotion, expression and action;
the intelligent robot is installed on the intelligent terminal and performs multi-modal interaction by adopting the method;
and the cloud brain is used for performing semantic understanding, visual recognition, cognitive computation and emotion computation on the multi-modal input data, so as to decide the multi-modal interaction output data to be output by the intelligent robot.
The interaction method and system based on an intelligent robot provided by the invention provide an intelligent robot which has a preset image and preset attributes and can perform multi-modal interaction with a user. In addition, the invention can effectively utilize the life map of times, people and places generated in the user's daily life, and by fully developing and exploiting this life map it can expand interaction topics or recommend meaningful topics to the user, enrich the interaction content and improve interaction accuracy.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 shows an interaction diagram of an intelligent robot-based interaction system according to an embodiment of the invention;
FIG. 2 shows a block diagram of an intelligent robot-based interactive system, according to an embodiment of the present invention;
FIG. 3 shows a flow diagram of a smart robot-based interaction method according to one embodiment of the invention;
FIG. 4 shows a flowchart of extracting current scene parameters in the intelligent robot-based interaction method according to an embodiment of the present invention;
FIG. 5 shows a flowchart of generating multi-modal interaction output data in the intelligent robot-based interaction method according to an embodiment of the present invention;
FIG. 6 shows another flowchart of an intelligent robot-based interaction method according to an embodiment of the invention; and
FIG. 7 shows a flowchart of communication among the three parties, namely the user, the intelligent terminal and the cloud brain, according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
For clarity, the following explanations are given before the embodiments:
The intelligent robot can interact with a user in a multi-modal manner.
The intelligent terminal comprises a camera, a positioning device and a timing device, is used for acquiring the current scene parameters of the current scene and multi-modal input data, and has the capability of outputting voice, emotion, expression and action.
The intelligent robot acquires multi-modal input data through the hardware of the intelligent terminal, and performs semantic understanding, visual recognition, cognitive computation and emotion computation on the multi-modal input data with the support of the cloud brain's capabilities, so as to complete the decision and output process.
The cloud brain provides the terminal with the intelligent robot's processing capabilities, namely semantic understanding (language semantic understanding and action semantic understanding), visual recognition, emotion computation and cognitive computation, applied to the user's interaction requirements, so as to decide the multi-modal interaction output data of the intelligent robot and thereby realize interaction with the user.
Various embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Fig. 1 shows an interaction diagram of an intelligent robot-based interaction system according to an embodiment of the present invention. As shown in fig. 1, multi-modal interaction involves a user 101, an intelligent terminal 102, an intelligent robot 103 and a cloud brain 104. The user 101 interacting with the intelligent robot can be a real person or another intelligent robot, and the interaction process between another intelligent robot and this intelligent robot is similar to the interaction process between a single person and the intelligent robot. Therefore, only the multi-modal interaction process between the user (a human) and the intelligent robot is illustrated in FIG. 1.
In addition, the intelligent terminal 102 includes a display area 1021 and a hardware support device 1022 (essentially the core processor). The display area 1021 is used for displaying the image of the intelligent robot 103, and the hardware support device 1022 cooperates with the cloud brain 104 for data processing during the interaction.
The process of interaction between the intelligent robot and the user 101 in fig. 1 is:
the intelligent robot has the AI capabilities of natural language understanding, visual perception, touch perception, language output, emotion expression and action output and the like. According to an embodiment of the invention, in order to enhance the interactive experience, the intelligent robot can be displayed in a preset area after being started.
Fig. 2 shows a block diagram of an intelligent robot-based interactive system according to another embodiment of the present invention. As shown in fig. 2, completing the interaction requires a user 101, a smart terminal 102, and a cloud brain 104. The intelligent terminal 102 includes a human-machine interface 201, a data processing unit 202, an input/output device 203, and an interface unit 204. Cloud brain 104 includes semantic understanding interface 1041, visual recognition interface 1042, cognitive computation interface 1043, and emotion computation interface 1044.
The interactive system based on the intelligent robot provided by the invention comprises an intelligent terminal 102 and a cloud brain 104. The intelligent robot 103 operates in the intelligent terminal 102, and the intelligent robot 103 has a preset image and preset attributes, and can start voice, emotion, vision and perception capabilities when in an interactive state.
In one embodiment, the smart terminal 102 may include: a human-machine interface 201, a data processing unit 202, an input-output device 203, and an interface unit 204. The human-machine interface 201 displays the intelligent robot 103 in the running state in a preset area of the intelligent terminal 102.
The data processing unit 202 is used for processing the data generated during the multi-modal interaction between the user 101 and the intelligent robot 103. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the terminal and connects the various parts of the whole terminal through various interfaces and lines.
The intelligent terminal 102 includes a memory, which mainly comprises a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created according to the use of the intelligent terminal 102 (such as audio data and browsing records), and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The input and output device 203 is used for acquiring multi-modal interaction data and outputting the output data during the interaction. The interface unit 204 is configured to communicate with the cloud brain 104 and to invoke the intelligent robot capabilities in the cloud brain 104 by docking with the interfaces in the cloud brain 104. Examples of input and output devices include microphones for voice operation, scanners, and cameras (which detect motion that does not involve touch, using visible or invisible wavelengths). The intelligent terminal 102 may obtain multi-modal input data through the input devices mentioned above.
Cloud brain 104 includes semantic understanding interface 1041, visual recognition interface 1042, cognitive computation interface 1043, and emotion computation interface 1044. These interfaces communicate with the interface unit 204 in the intelligent terminal 102. In addition, the cloud brain 104 further includes semantic understanding logic corresponding to the semantic understanding interface 1041, visual recognition logic corresponding to the visual recognition interface 1042, cognitive computation logic corresponding to the cognitive computation interface 1043, and emotion computation logic corresponding to the emotion computation interface 1044.
As shown in fig. 2, each capability interface calls corresponding logic processing in the multi-modal data parsing process. The following is a description of the various interfaces:
The semantic understanding interface 1041 receives the specific voice instructions forwarded from the interface unit 204, performs speech recognition on them, and performs natural language processing based on a large corpus.
The visual recognition interface 1042 can perform video content detection, recognition, tracking and the like for human bodies, human faces and scenes according to computer vision algorithms, deep learning algorithms and so on. That is, the image is recognized according to a preset algorithm and a quantitative detection result is given. It incorporates an image preprocessing function, a feature extraction function, a decision function and specific application functions:
the image preprocessing function performs basic processing on the acquired visual data, including color space conversion, edge extraction, image transformation and image thresholding;
the feature extraction function extracts feature information such as skin color, color, texture, motion and coordinates of the target in the image;
the decision function distributes the feature information, according to a certain decision strategy, to the specific multi-modal output devices or multi-modal output applications that need it, thereby realizing functions such as face detection, human limb recognition and motion detection.
The cognitive computation interface 1043 receives the multi-modal data forwarded from the interface unit 204 and processes it to perform data acquisition, recognition and learning, so as to obtain a user portrait, a knowledge graph and the like and thus make reasonable decisions about the multi-modal output data.
The emotion computation interface 1044 receives the multi-modal data forwarded from the interface unit 204 and computes the user's current emotional state using emotion computation logic (which may be an emotion recognition technology). Emotion recognition is an important part of emotion computation; its research content covers facial expression, voice, behavior, text and physiological signal recognition, through which the user's emotional state can be judged. The emotion recognition technology may monitor the user's emotional state through visual emotion recognition alone, or through a combination of visual and voice emotion recognition, and is not limited thereto. In this embodiment, monitoring emotion through a combination of the two is preferred.
During visual emotion recognition, the emotion computation interface 1044 collects images of human facial expressions with an image acquisition device, converts them into analyzable data, and then performs expression and emotion analysis using techniques such as image processing. Understanding a facial expression typically requires detecting subtle changes in the expression, such as changes in the cheek muscles and mouth or raised eyebrows.
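To make the division of labor between the interface unit 204 and the four capability interfaces of the cloud brain more concrete, the following Python sketch shows one possible call pattern. The names CloudBrainAPI and parse_multimodal, and every method signature, are assumptions made for illustration only; the patent does not specify this API.

```python
from typing import Protocol


class CloudBrainAPI(Protocol):
    """Hypothetical shape of the four capability interfaces exposed by the cloud brain 104."""

    def semantic_understanding(self, utterance: str) -> dict: ...
    def visual_recognition(self, frame: bytes) -> dict: ...
    def cognitive_computation(self, multimodal: dict) -> dict: ...
    def emotion_computation(self, multimodal: dict) -> str: ...


def parse_multimodal(api: CloudBrainAPI, utterance: str, frame: bytes) -> dict:
    """Forward multi-modal data, as interface unit 204 would, and merge the per-capability results."""
    return {
        "semantics": api.semantic_understanding(utterance),
        "vision": api.visual_recognition(frame),
        "cognition": api.cognitive_computation({"utterance": utterance}),
        "emotion": api.emotion_computation({"utterance": utterance, "frame": frame}),
    }
```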
In addition, the interaction system based on the intelligent robot provided by the invention can also be matched with a program product which comprises a series of instructions for executing the steps of the interaction method of the intelligent robot. The program product is capable of executing computer instructions comprising computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc.
The program product may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
It should be noted that the program product may include content that is appropriately increased or decreased as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, the program product does not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
Fig. 3 shows a flowchart of an intelligent robot-based interaction method according to an embodiment of the invention.
In step S301, multi-modal input data is acquired when the intelligent robot is activated. There are many ways to activate the intelligent robot: in one embodiment, it may be activated by pressing a button or by a voice wake-up, and other activation methods may also be applied to the present invention, which is not limited thereto.
After the intelligent robot is activated, the surrounding multi-modal input data is acquired. The surrounding multi-modal input data includes user input data as well as surrounding environment data. The multi-modal input data may be voice data, image and video data, and perceptual data, among others. The intelligent terminal 102 is provided with corresponding devices for acquiring the multi-modal input data. In one embodiment, the multi-modal input data may be user expressions, voice data, gesture data, image data, video data, face data, pupil and iris information, light sensation information, fingerprint information and environment information.
Next, in step S302, the current scene parameters of the current scene are extracted from the multi-modal input data. For the present invention, it is necessary to extract the current scene parameters from the multi-modal input data. Fig. 4 shows a flowchart of extracting current scene parameters in the intelligent robot-based interaction method according to an embodiment of the invention.
As shown in fig. 4, in step S401, the geographic position of the current scene is located and the location parameter of the current scene is determined. The intelligent terminal 102 is provided with a positioning device that can locate the geographic position of the current scene, and the located position information is recorded as the location parameter of the current scene.
Meanwhile, in step S402, the people within the current scene are captured and recognized, and the face parameter of the current scene is determined. In this step, the camera on the intelligent terminal 102 captures the people within the range of the current scene, and the captured person information is compared and screened to determine the face parameter of the current scene.
Meanwhile, in step S403, the time information of the current scene is acquired and the time parameter of the current scene is determined. The timing device on the intelligent terminal 102 can acquire the time information of the current scene in real time and record it as the time parameter of the current scene.
Finally, in step S404, the location parameter, the face parameter and the time parameter are associated and recorded together as the current scene parameters of the current scene.
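A minimal sketch of the association performed in step S404 is shown below. The record and function names (SceneParameters, associate_scene_parameters) and the field types are illustrative assumptions rather than terms defined by the patent.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SceneParameters:
    """One associated record of the current scene: location, person and time (step S404)."""
    location: str            # place name or coordinates from the positioning device (S401)
    person: Optional[str]    # identity matched from the captured face, None if nobody recognized (S402)
    timestamp: datetime      # time reported by the timing device (S403)


def associate_scene_parameters(location: str, person: Optional[str],
                               timestamp: datetime) -> SceneParameters:
    """Associate the three parameters and record them as the current scene parameters."""
    return SceneParameters(location=location, person=person, timestamp=timestamp)
```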
According to an embodiment of the present invention, after the current scene parameters of the current scene are extracted, these scene parameters need to be classified. The location parameter, the face parameter and the time parameter are each divided into three groups, namely entertainment parameters, work parameters and family parameters. The location parameter includes entertainment-type, work-type and family-type locations. The face parameter includes entertainment-type, work-type and family-type faces. The time parameter includes entertainment time, work time and family time.
In one embodiment, if the current time is between 9 am and 6 pm on Monday through Friday, the current time parameter may be identified as work-type time. If the current face is that of a colleague, it can be identified as a work-type face. If the current location is the user's workplace building, it can be identified as a work-type location.
The criteria for distinguishing entertainment, work and family parameters can be preset and changed, and can be determined according to the user's daily routine. In addition, when the user is a child user, the location parameter, the face parameter and the time parameter are divided into three groups, namely entertainment parameters, family parameters and learning parameters.
In one embodiment, if the current time is between 8 am and 4 pm on Monday through Friday, the current time parameter may be identified as learning-type time. If the current face is that of a classmate, it can be identified as a learning-type face. If the current location is a school, it can be identified as a learning-type location.
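The time rules in the two examples above can be pictured as a small classifier. This is a sketch under the stated example thresholds only; the function names, the dictionary-based person lookup and the fallback to the entertainment category are assumptions, and real rules would be preset and adjustable per user as the text notes.

```python
from datetime import datetime


def classify_time(ts: datetime, is_child: bool = False) -> str:
    """Map a time parameter to work / learning / entertainment (family handled by fuller rules)."""
    on_weekday = ts.weekday() < 5               # Monday (0) .. Friday (4)
    if is_child and on_weekday and 8 <= ts.hour < 16:
        return "learning"
    if not is_child and on_weekday and 9 <= ts.hour < 18:
        return "work"
    # A fuller rule set would also distinguish family time (e.g. evenings at home).
    return "entertainment"


def classify_person(face_id: str, person_groups: dict) -> str:
    """Map a recognized face to a category, e.g. colleagues -> work, classmates -> learning."""
    return person_groups.get(face_id, "family")


# Example: a Wednesday at 10 am is classified as work-type time for an adult user.
print(classify_time(datetime(2018, 7, 18, 10, 0)))                      # -> "work"
print(classify_person("colleague_zhang", {"colleague_zhang": "work"}))  # -> "work"
```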
Next, in step S303, the current scene parameters and the historical scene parameters are fused to form the dynamic life track map of the user. Prior to this step, according to one embodiment of the present invention, the identity of the user needs to be determined: the identity feature information of the current user is acquired, the user attributes of the current user are judged, and the category of the current user is determined, wherein the categories of users include a child user.
After the identity type of the user is determined, the current scene parameters and the historical scene parameters need to be fused. Before fusion, the category of the current scene parameters needs to be identified. When the user is an ordinary adult user, it is identified whether the current scene parameters belong to entertainment, work or family. When the user is a child user, it is identified whether the current scene parameters belong to entertainment, learning or family. The identified scene parameters are then fused with the historical scene parameters of the same category to generate a new user life track map under the current category.
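A sketch of this fusion step follows, continuing the hypothetical SceneParameters record from the earlier sketch. The class name LifeTrajectoryMap and the per-category, time-ordered list storage are assumptions about one possible data structure, not the structure prescribed by the patent.

```python
from collections import defaultdict


class LifeTrajectoryMap:
    """Dynamic life track map: per-category, time-ordered scene records (step S303)."""

    def __init__(self):
        self._tracks = defaultdict(list)        # category -> list of SceneParameters

    def fuse(self, category: str, scene) -> None:
        """Fuse the current scene parameters with the stored history of the same category."""
        track = self._tracks[category]
        track.append(scene)
        track.sort(key=lambda s: s.timestamp)   # keep each life track in chronological order

    def track(self, category: str) -> list:
        """Return the user life track accumulated under one category."""
        return list(self._tracks[category])
```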
Finally, in step S304, multi-modal interaction output data is generated based on the dynamic life track map. The dynamic life track map of the user contains the user's daily behaviors and can provide a basis for generating the multi-modal interaction output data. FIG. 5 shows a flowchart of generating multi-modal interaction output data in the intelligent robot-based interaction method according to an embodiment of the present invention.
In step S501, the interaction topic between the user and the intelligent robot is determined. In one embodiment, the interaction intention of the user can be obtained by analyzing the multi-modal input data, and the interaction topic corresponding to that intention is decided and output. Alternatively, the current scene parameters can be parsed and analyzed to decide the interaction topic.
According to one embodiment of the invention, when the user interacting with the intelligent robot is a child user, an interaction topic suited to the child's interaction intention is determined. Alternatively, the current scene parameters are parsed and analyzed together with the child face parameter to decide the interaction topic.
Next, in step S502, the dynamic life track map of the user is traversed, and a user life track related to the interactive topic is found.
Finally, in step S503, multimodal interaction output data is generated from the user life trajectory related to the interaction topic and the multimodal input data, or the user life trajectory is updated. According to one embodiment of the invention, when the user interacting with the intelligent robot comprises a child user, the multi-modal interaction output data is screened, and the multi-modal interaction output data which are not suitable for the child user are eliminated.
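A rough sketch of steps S501 to S503 is given below, built on the hypothetical LifeTrajectoryMap and SceneParameters sketches above. Matching topics by keywords over the stored location and person fields, and the reply template, are assumptions made purely for illustration.

```python
def find_related_tracks(life_map, topic_keywords,
                        categories=("entertainment", "work", "family")):
    """Traverse the dynamic life track map and collect scene records related to the topic (S502)."""
    related = []
    for category in categories:
        for scene in life_map.track(category):
            text = f"{scene.location} {scene.person or ''}"
            if any(keyword in text for keyword in topic_keywords):
                related.append(scene)
    return related


def generate_output(related_tracks, fallback="Tell me more about that."):
    """Stand-in for S503: compose a reply from the matched life track, if any."""
    if not related_tracks:
        return fallback
    last = related_tracks[-1]
    return f"Last time you were at {last.location} with {last.person}. Shall we go there again?"
```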
The following describes the interaction of a user with an intelligent robot by way of an example:
the user: i have gone to the Shanxi and Shaanxi tours, leaving a profound impression of the scenery there. And what is eaten is also much, especially for cooked wheaten food.
The intelligent robot: for the people, the tourism resources of Shaanxi and Shanxi are very rich, and the tourist has many scenic spots and historic sites.
The user: in the sense of wheaten food, Shanxi and Shanxi wheaten food are rich in types, and the wheaten food is very good in taste.
The intelligent robot: however, the Shanxi wheaten food is various in types, such as sliced noodles, pulled noodles, lug cakes, steamed dimples and the like. The Shanxi wheaten food comprises minced noodles, biangbiang noodles, oiled noodles, dipped noodles and the like.
The user: for o, I eat the fried noodles with oil, and are particularly delicious.
The intelligent robot: now that it is time to eat dinner, you will not go to the museum that you had with friend A to eat the bar.
The user: preferably, I get about friend A.
In the above dialogue, the user first mentions the places of a previous trip, reviews the travel experience, and mentions the scenery and the wheaten food; the user and the intelligent robot then develop the dialogue around the user's travel destinations, Shanxi and Shaanxi. When the user mentions the noodles eaten there, the intelligent robot recommends, based on the current time and the user's previous life track, that the user go to a restaurant to eat noodles, and the user responds to the suggestion.
In addition, during the interaction between the intelligent robot and the user, the intelligent robot can actively propose an interaction topic according to the current interaction place and the current time, and develop that topic to interact with the user. For example, if the current user is a child and the current time is a summer noon, the intelligent robot may develop topics around ice cream with the child user and interact with the child user.
In one embodiment, when the current interaction object is a child user, an interaction topic suited to the child's interaction intention is determined; alternatively, the current scene parameters are parsed and analyzed together with the child face parameter to decide the interaction topic. When the user interacting with the intelligent robot includes a child user, the multi-modal interaction output data is screened and any output data not suitable for the child user is removed.
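The screening for child users can be pictured as a simple filter over candidate outputs. The blocked-term approach below is only one assumed way of deciding what counts as unsuitable; the patent does not specify the screening criterion.

```python
def filter_for_child(candidates, blocked_terms):
    """Remove multi-modal output candidates that contain content unsuitable for a child user."""
    return [candidate for candidate in candidates
            if not any(term in candidate.lower() for term in blocked_terms)]


# Example: a horror-movie suggestion is dropped before the reply is presented to a child.
print(filter_for_child(["Let's talk about ice cream!", "How about a horror movie?"],
                       {"horror"}))
```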
For child users, the intelligent robot also provides a learning conversation mode, in which it can supervise the user's learning progress and guide the user's study according to the user's previous progress. For example, the intelligent robot can call up the user's previous record of wrongly answered questions and provide guidance for the user's current study.
It should be noted that there are many ways of interacting with the user; all activities related to the user's life can be recorded in the user's life track map and applied to the interaction, and the invention is not limited in this respect.
FIG. 6 shows another flowchart of an intelligent robot-based interaction method according to an embodiment of the invention.
As shown in fig. 6, in step S601, the intelligent terminal 102 sends a request to the cloud brain 104. Then, in step S602, the intelligent terminal 102 waits for the cloud brain 104 to reply. While waiting, the intelligent terminal 102 times how long the data return takes.
In step S603, if the returned response data is not received for a long time, for example if the waiting time exceeds a predetermined length of 5 s, the intelligent terminal 102 can choose to reply locally and generate local general response data. Then, in step S604, an animation matching the local general response is output, and the voice playing device is called to perform voice playback.
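The timeout-and-fallback behaviour of steps S601 to S604 could look like the following sketch. It assumes an HTTP transport via the requests library; the endpoint, the payload and response shape, and the fallback text are placeholders, since the patent does not specify the protocol.

```python
import requests  # assumed transport; any RPC mechanism would do

LOCAL_FALLBACK_REPLY = "Let me think about that for a moment."


def ask_cloud_brain(url: str, payload: dict, timeout_s: float = 5.0) -> str:
    """Send the request to the cloud brain (S601) and wait (S602); if no response arrives
    within the predetermined time, fall back to a local general reply (S603)."""
    try:
        response = requests.post(url, json=payload, timeout=timeout_s)
        response.raise_for_status()
        return response.json()["reply"]
    except requests.RequestException:
        # S604: the caller would also play the animation matched to the local general reply.
        return LOCAL_FALLBACK_REPLY
```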
Fig. 7 shows a flowchart of communication among three parties, namely a user, a smart terminal and a cloud brain according to an embodiment of the invention.
In order to realize multi-modal interaction between the intelligent terminal 102 and the user 101, a communication connection is required to be established between the user 101, the intelligent terminal 102 and the cloud brain 104. The communication connection should be real-time and unobstructed to ensure that the interaction is not affected.
In order to complete the interaction, some conditions or preconditions need to be met. These conditions or preconditions include the intelligent robot being loaded and operated in the intelligent terminal 102, and the intelligent terminal 102 having hardware facilities for sensing and controlling functions. The intelligent robot starts voice, emotion, vision and perception capabilities when in an interactive state.
After the preparation in the early stage is completed, the smart terminal 102 starts to perform interaction with the user 101, and first, when the smart robot is activated, multi-modal input data is acquired. At this time, the two parties of the communication are the smart terminal 102 and the user 101 or the smart terminal 102 and the surrounding environment, and the direction of data transfer is from the user 101 and the surrounding environment to the smart terminal 102.
Then, the current scene parameters in the current scene in the multimodal input data are extracted. The multimodal input data may include various forms of data, for example, text data, speech data, perceptual data, and motion data. The intelligent terminal 102 can analyze the multi-modal input data and extract the current scene parameters in the current scene.
And then, fusing the current scene parameters and the historical scene parameters to form a dynamic life track map of the user. The intelligent terminal 102 sends a request to the cloud brain 104 to request the cloud brain 104 to fuse the current contextual parameters and the historical contextual parameters. At this time, two parties of the communication are the smart terminal 102 and the cloud brain 104.
And finally, generating multi-mode interactive output data based on the dynamic life track map. Cloud brain 104 may generate multi-modal interactive output data based on the dynamic life track of the user. After the intelligent terminal 102 receives the multi-modal interactive output data transmitted by the cloud brain 104, the intelligent terminal 102 outputs the multi-modal interactive output data through the intelligent robot. At this time, two parties of the communication are the smart terminal 102 and the user 101.
The interaction method and system based on an intelligent robot provided by the invention provide an intelligent robot which has a preset image and preset attributes and can perform multi-modal interaction with a user. In addition, the invention can effectively utilize the life map of times, people and places generated in the user's daily life, and by fully developing and exploiting this life map it can expand interaction topics or recommend meaningful topics to the user, enrich the interaction content and improve interaction accuracy.
It is to be understood that the disclosed embodiments of the invention are not limited to the particular structures, process steps, or materials disclosed herein but are extended to equivalents thereof as would be understood by those ordinarily skilled in the relevant arts. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, the appearances of the phrase "one embodiment" or "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. An interaction method based on an intelligent robot is characterized by comprising the following steps:
acquiring multi-modal input data under the condition that the intelligent robot is activated;
extracting current scene parameters in a current scene from the multi-modal input data;
fusing the current scene parameters and the historical scene parameters to form a dynamic life track map of the user;
generating multi-modal interactive output data based on the dynamic life trajectory graph;
in the multi-modal input data, the step of extracting the current scene parameters in the current scene comprises the following steps:
positioning the geographic position of the current scene, and determining the location parameter of the current scene;
capturing and identifying the people within the range of the current scene, and determining the face parameter of the current scene;
acquiring time information of a current scene, and determining a time parameter of the current scene;
associating the location parameter, the face parameter and the time parameter, recording them as the current scene parameters of the current scene, and dividing the location parameter, the face parameter and the time parameter into three groups, namely entertainment parameters, work parameters and family parameters, wherein the location parameter comprises an entertainment-type place, a work-type place and a family-type place, the face parameter comprises an entertainment-type face, a work-type face and a family-type face, and the time parameter comprises entertainment time, work time and family time;
the step of fusing the current scene parameters and the historical scene parameters to form the dynamic life track map of the user comprises the following steps:
judging the parameter attribute of the current scene parameter, and determining the category of the current scene parameter;
and according to the category of the current scene parameters, fusing the current scene parameters and the historical scene parameters to form a dynamic life track map of the user.
2. The method of claim 1, wherein generating multi-modal interaction output data based on the dynamic life trajectory graph comprises:
determining an interaction topic of a user and the intelligent robot;
traversing a dynamic life track map of the user, and searching a user life track related to the interactive topic;
and generating multi-modal interaction output data according to the user life track related to the interaction topic and the multi-modal input data, or updating the user life track.
3. The method of claim 2, wherein the step of determining the interaction topic of the user with the intelligent robot comprises:
analyzing the multi-modal input data, acquiring the interaction intention of the user, and deciding and outputting an interaction topic corresponding to the intention;
or, alternatively,
parsing and analyzing the current scene parameters and deciding the interaction topic.
4. The method according to any one of claims 1-3, wherein before the step of fusing the current context parameters with the historical context parameters to form the dynamic life track map of the user, further comprising:
acquiring identity characteristic information of a current user, judging user attributes of the current user, and determining the category of the current user, wherein the category of the user comprises: a child user.
5. The method of claim 4, wherein when the user interacting with the intelligent robot is a child user, the method further comprises:
determining an interaction topic suited to the child's interaction intention;
or, alternatively,
parsing and analyzing the current scene parameters together with the child face parameter, and deciding the interaction topic.
6. The method of claim 4, wherein, when the user interacting with the smart robot comprises a child user, the step of generating multi-modal interaction output data based on the dynamic life trajectory graph comprises:
screening the multi-modal interaction output data and removing the multi-modal interaction output data which is not suitable for the child user.
7. A program product comprising a series of instructions for carrying out the method steps according to any one of claims 1 to 6.
8. An intelligent robot-based interaction system, the system comprising:
the intelligent terminal comprises a camera, a positioning device and a timing device, is used for acquiring the current scene parameters of the current scene and multi-modal input data, and has the capability of outputting voice, emotion, expression and action;
the intelligent robot is installed on the intelligent terminal and performs multi-modal interaction by adopting the method of any one of claims 1-6;
and the cloud brain is used for performing semantic understanding, visual recognition, cognitive computation and emotion computation on the multi-modal input data so as to decide that the intelligent robot outputs multi-modal interactive output data.
CN201810777646.9A 2018-07-16 2018-07-16 Interaction method and system based on intelligent robot Active CN109176535B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810777646.9A CN109176535B (en) 2018-07-16 2018-07-16 Interaction method and system based on intelligent robot


Publications (2)

Publication Number Publication Date
CN109176535A CN109176535A (en) 2019-01-11
CN109176535B true CN109176535B (en) 2021-10-19

Family

ID=64936643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810777646.9A Active CN109176535B (en) 2018-07-16 2018-07-16 Interaction method and system based on intelligent robot

Country Status (1)

Country Link
CN (1) CN109176535B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008321B (en) * 2019-03-07 2021-06-25 腾讯科技(深圳)有限公司 Information interaction method and device, storage medium and electronic device
CN110000777A (en) * 2019-03-12 2019-07-12 广东小天才科技有限公司 Multihead display robot, multi-display method and device, readable storage medium storing program for executing
CN110109596B (en) * 2019-05-08 2021-11-16 芋头科技(杭州)有限公司 Recommendation method and device of interaction mode, controller and medium
CN110164249B (en) * 2019-05-22 2021-11-05 重庆工业职业技术学院 Computer online learning supervision auxiliary system
CN110347817B (en) * 2019-07-15 2022-03-18 网易(杭州)网络有限公司 Intelligent response method and device, storage medium and electronic equipment
CN110580516B (en) * 2019-08-21 2021-11-09 厦门无常师教育科技有限公司 Interaction method and device based on intelligent robot
CN110640757A (en) * 2019-09-23 2020-01-03 浙江树人学院(浙江树人大学) Multi-mode interaction method applied to intelligent robot and intelligent robot system
CN110941774A (en) * 2019-12-05 2020-03-31 深圳前海达闼云端智能科技有限公司 Service recommendation method
CN111444873B (en) * 2020-04-02 2023-12-12 北京迈格威科技有限公司 Method and device for detecting authenticity of person in video, electronic equipment and storage medium
CN112235179B (en) * 2020-08-29 2022-01-28 上海量明科技发展有限公司 Method and device for processing topics in instant messaging and instant messaging tool
CN114428464B (en) * 2021-12-29 2023-12-15 深圳市优必选科技股份有限公司 Robot cluster control interaction method and device, terminal equipment and storage medium
CN116627261A (en) * 2023-07-25 2023-08-22 安徽淘云科技股份有限公司 Interaction method, device, storage medium and electronic equipment


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6795387B2 (en) * 2016-12-14 2020-12-02 パナソニック株式会社 Voice dialogue device, voice dialogue method, voice dialogue program and robot

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001188783A (en) * 1999-12-28 2001-07-10 Sony Corp Device and method for processing information and recording medium
KR20090090613A (en) * 2008-02-21 2009-08-26 주식회사 케이티 System and method for multimodal conversational mode image management
CN105867633A (en) * 2016-04-26 2016-08-17 北京光年无限科技有限公司 Intelligent robot oriented information processing method and system
CN106489148A (en) * 2016-06-29 2017-03-08 深圳狗尾草智能科技有限公司 A kind of intention scene recognition method that is drawn a portrait based on user and system
CN106537293A (en) * 2016-06-29 2017-03-22 深圳狗尾草智能科技有限公司 Method and system for generating robot interactive content, and robot
KR20180046649A (en) * 2016-10-28 2018-05-09 한국과학기술연구원 User intention detection system for initiation of interaction based on multi-modal perception and a method using the same
CN107885837A (en) * 2017-11-09 2018-04-06 北京光年无限科技有限公司 A kind of interaction output intent and intelligent robot for intelligent robot
CN108000526A (en) * 2017-11-21 2018-05-08 北京光年无限科技有限公司 Dialogue exchange method and system for intelligent robot

Also Published As

Publication number Publication date
CN109176535A (en) 2019-01-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant