CN113580166B - Interaction method, device, equipment and storage medium of anthropomorphic robot - Google Patents

Interaction method, device, equipment and storage medium of anthropomorphic robot

Info

Publication number
CN113580166B
CN113580166B (application CN202110961067.1A)
Authority
CN
China
Prior art keywords
target
information
user
preset
robot
Prior art date
Legal status
Active
Application number
CN202110961067.1A
Other languages
Chinese (zh)
Other versions
CN113580166A (en)
Inventor
刘庆升
吴玉胜
王晓斐
刘剑
陈鑫
Current Assignee
Anhui Toycloud Technology Co Ltd
Original Assignee
Anhui Toycloud Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Anhui Toycloud Technology Co Ltd filed Critical Anhui Toycloud Technology Co Ltd
Priority to CN202110961067.1A
Publication of CN113580166A
Application granted
Publication of CN113580166B

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00: Manipulators not otherwise provided for
    • B25J11/0005: Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
    • B25J9/00: Programme-controlled manipulators
    • B25J9/16: Programme controls
    • B25J9/1602: Programme controls characterised by the control system, structure, architecture

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)

Abstract

The application discloses an interaction method, apparatus, device, and storage medium for an anthropomorphic robot. In the method, a target robot actively collects environmental information within a preset surrounding range, the environmental information including at least one of sound information, image information, and information about contact between the target robot and other people. The robot then uses this environmental information to identify the identity information and emotional states of users within the preset surrounding range, and, based on that identity information and those emotional states, actively feeds back to the users anthropomorphic interaction information that matches their emotional states, such as voice information, expression information, or action information. In this way, the target robot can recognize the emotional states of nearby users such as family members by actively collecting surrounding environmental information, and proactively respond with matching voice or actions, which both increases the degree of anthropomorphism of the robot and improves the user experience.

Description

Interaction method, device, equipment and storage medium of anthropomorphic robot
Technical Field
The present application relates to the field of robotics, and in particular to an interaction method, apparatus, device, and storage medium for an anthropomorphic robot.
Background
Robots have basic capabilities such as perception, decision making, and execution. They can assist or even replace humans in dangerous, heavy, and complex work, improve working efficiency and quality, serve daily life, and enlarge or extend the range of human activity and capability. Moreover, with the continuous development of technology, robotics is no longer limited to industrial manufacturing and is rapidly expanding into fields such as home entertainment and medical services.
Household intelligent robots currently on the market, such as the Alpha Egg, can interact with people in a more intuitive way than computers or mobile terminals, but they only passively receive information and passively answer users' questions, and their response mode is as fixed as a preset player. Active communication, however, is an important part of building emotional connections between people; only through active communication can the emotions of others be learned in time, for example that a user may be in an emotional state that needs attention or comfort. Existing household intelligent robots can only be woken up passively and cannot actively attend to the concerns of users such as family members, so their degree of "personification" is insufficient: they do not really have human-like thinking or active modes of interaction, which makes them appear rigid and makes it difficult to meet the needs of modern households.
Therefore, how to improve the degree of "personification" of household intelligent robots, so as to realize more intelligent interaction between an anthropomorphic robot and people, is a technical problem that currently needs to be solved.
Disclosure of Invention
The main purpose of the embodiments of the application is to provide an interaction method, apparatus, device, and storage medium for an anthropomorphic robot, with which the robot can actively collect surrounding environmental information and, based on that information, realize more active intelligent interaction with users such as family members, thereby improving the degree of anthropomorphism of the robot and improving the user experience.
The embodiment of the application provides an interaction method of a personified robot, which comprises the following steps:
actively collecting environmental information in a preset range around the target robot; the environment information comprises at least one of sound information, image information and contact information of the target robot and others;
identifying identity information and emotion states of users in a preset range around the target robot by using the environment information;
and actively feeding back personified interaction information which accords with the emotion state of the user to the user according to the identity information and the emotion state of the user, wherein the personified interaction information comprises at least one of voice information, expression information and action information.
In a possible implementation manner, the actively collecting environmental information within a preset range around the target robot includes:
actively collecting sound information in a preset range around the target robot by utilizing a preset sound sensor on the target robot;
when it is determined from the sound information that a user exists within the preset range around the target robot, starting a shooting device preset on the target robot to capture image information containing the user;
and actively collecting contact information between the target robot and users in a surrounding preset range by utilizing a preset touch sensor on the target robot.
In a possible implementation, when the environmental information includes sound information, the identifying, by using the environmental information, identity information and emotional states of the user within a preset range around the target robot includes:
extracting voiceprint features of the sound information to serve as target voiceprint features;
comparing the target voiceprint characteristics with preset voiceprint characteristics of preset users stored in the target robot, and determining identity information and emotion states of the target users to which the target voiceprint characteristics belong.
In a possible implementation, when the environmental information includes image information, the identifying, by using the environmental information, identity information and emotional states of the user within a preset range around the target robot includes:
extracting image features of the image information as target image features;
comparing the target image characteristics with preset image characteristics of preset users stored in the target robot, and determining identity information and emotional states of the target users to which the target image characteristics belong.
In a possible implementation, when the environmental information includes sound information and image information, the identifying, by using the environmental information, identity information and emotional states of the user within a preset range around the target robot includes:
extracting voiceprint features of the sound information to serve as target voiceprint features; extracting image features of the image information to serve as target image features;
comparing the target voiceprint characteristics with preset voiceprint characteristics of a preset user stored in the target robot in advance to obtain a first comparison result; comparing the target image characteristics with preset image characteristics of a preset user stored in the target robot in advance to obtain a second comparison result;
And determining identity information and emotional states of the target users to which the target voiceprint features and the target image features belong according to the first comparison result and the second comparison result.
In a possible implementation, the target robot stores in advance M preset users, each corresponding to N types of emotional states; the comparing the target voiceprint feature with preset voiceprint features of preset users stored in the target robot and determining identity information and the emotional state of the target user to which the target voiceprint feature belongs includes:
calculating M similarities between the target voiceprint feature and the M preset voiceprint features of the M preset users, where M and N are positive integers greater than 1;
selecting the maximum similarity from the M similarities, determining that the target user to which the target voiceprint feature belongs is the preset user corresponding to the maximum similarity, and determining the emotional state of the target user from the N types of emotional states.
In a possible implementation, the target robot stores in advance M preset users, each corresponding to N types of emotional states; the comparing the target image features with preset image features of preset users stored in the target robot and determining identity information and the emotional state of the target user to which the target image features belong includes:
calculating M similarities between the target image features and the M preset image features of the M preset users, where M and N are positive integers greater than 1;
selecting the maximum similarity from the M similarities, determining that the target user to which the target image features belong is the preset user corresponding to the maximum similarity, and determining the emotional state of the target user from the N types of emotional states.
In a possible implementation, the target robot stores in advance M preset users, each corresponding to N types of emotional states; the comparing the target voiceprint feature with the preset voiceprint features of the preset users stored in advance in the target robot to obtain a first comparison result, and comparing the target image feature with the preset image features of the preset users stored in advance in the target robot to obtain a second comparison result, includes:
calculating M first similarities between the target voiceprint feature and the M preset voiceprint features of the M preset users, and calculating M second similarities between the target image feature and the M preset image features of the M preset users, where M and N are positive integers greater than 1;
the determining, according to the first comparison result and the second comparison result, the identity information and the emotional state of the target user to which the target voiceprint feature and the target image feature belong includes:
selecting, from the M preset users, the preset user for which the sum of the corresponding first similarity and second similarity meets a preset condition, as the target user to which the target voiceprint feature and the target image feature belong, and determining the emotional state of the target user from the N types of emotional states.
In a possible implementation, when the environmental information includes contact information between the target robot and other people, the identifying, by using the environmental information, identity information and emotional states of the user within a preset range around the target robot includes:
and identifying the identity information and the emotion state of the user in a preset range around the target robot according to the contact information of the target robot and other people.
In a possible implementation manner, the actively feeding back personified interaction information conforming to the emotion state of the user to the user includes:
actively feeding back voice information conforming to the emotion state of the user to the user; the voice information is used for representing the target emotion state of the target robot through the change of tone; the target emotional state is in accordance with an emotional state of the user;
And/or actively feeding back expression information conforming to the emotion state of the user to the user; the expression information is a target emotion state of the target robot reflected by at least one of color change, expression image change and light change on a display screen of the target robot; the target emotional state is in accordance with an emotional state of the user.
In a possible implementation, the target robot includes a preset swing member; the actively feeding back anthropomorphic interaction information conforming to the emotion state of the user to the user comprises the following steps:
the preset swinging part swings out a preset action, and action information conforming to the emotion state of the user is actively fed back to the user; the action information embodies a target emotional state of the target robot; the target emotional state is in accordance with an emotional state of the user.
The embodiment of the application also provides an interaction device of the anthropomorphic robot, which comprises:
the acquisition unit is used for actively acquiring environmental information in a preset range around the target robot; the environment information comprises at least one of sound information, image information and contact information of the target robot and others;
The identification unit is used for identifying the identity information and the emotion state of the user in a preset range around the target robot by utilizing the environment information;
and the feedback unit is used for actively feeding back personified interaction information which accords with the emotion state of the user to the user according to the identity information and the emotion state of the user, wherein the personified interaction information comprises at least one of voice information, expression information and action information.
In a possible implementation, the acquisition unit includes:
the first acquisition subunit is used for actively acquiring sound information in a preset range around the target robot by utilizing a preset sound sensor on the target robot;
the shooting subunit is used for, when it is determined from the sound information that a user exists within the preset range around the target robot, starting a shooting device preset on the target robot to capture image information containing the user;
and the second acquisition subunit is used for actively acquiring contact information between the target robot and users in a surrounding preset range by utilizing a preset touch sensor on the target robot.
In a possible implementation, when the environmental information includes sound information, the identification unit includes:
A first extraction subunit, configured to extract a voiceprint feature of the sound information as a target voiceprint feature;
the first comparison subunit is used for comparing the target voiceprint feature with preset voiceprint features of preset users stored in the target robot in advance and determining identity information and emotion states of the target users to which the target voiceprint feature belongs.
In a possible implementation, when the environmental information includes image information, the identification unit includes:
a second extraction subunit, configured to extract an image feature of the image information as a target image feature;
and the second comparison subunit is used for comparing the target image characteristics with preset image characteristics of preset users stored in the target robot in advance and determining identity information and emotion states of the target users to which the target image characteristics belong.
In a possible implementation, when the environmental information includes sound information and image information, the identification unit includes:
a third extraction subunit, configured to extract a voiceprint feature of the sound information as a target voiceprint feature; extracting image features of the image information to serve as target image features;
The third comparison subunit is used for comparing the target voiceprint characteristics with preset voiceprint characteristics of a preset user stored in the target robot in advance to obtain a first comparison result; comparing the target image characteristics with preset image characteristics of a preset user stored in the target robot in advance to obtain a second comparison result;
and the determining subunit is used for determining the identity information and the emotion state of the target user to which the target voiceprint feature and the target image feature belong according to the first comparison result and the second comparison result.
In a possible implementation, the target robot stores in advance M preset users, each corresponding to N types of emotional states; the first comparison subunit includes:
a first calculating subunit, configured to calculate M similarities between the target voiceprint feature and the M preset voiceprint features of the M preset users, where M and N are positive integers greater than 1;
a first selecting subunit, configured to select the maximum similarity from the M similarities, determine that the target user to which the target voiceprint feature belongs is the preset user corresponding to the maximum similarity, and determine the emotional state of the target user from the N types of emotional states.
In a possible implementation, the target robot stores in advance M preset users, each corresponding to N types of emotional states; the second comparison subunit includes:
a second calculating subunit, configured to calculate M similarities between the target image feature and the M preset image features of the M preset users, where M and N are positive integers greater than 1;
a second selecting subunit, configured to select the maximum similarity from the M similarities, determine that the target user to which the target image feature belongs is the preset user corresponding to the maximum similarity, and determine the emotional state of the target user from the N types of emotional states.
In a possible implementation, the target robot stores in advance M preset users, each corresponding to N types of emotional states; the third comparison subunit is specifically configured to:
calculate M first similarities between the target voiceprint feature and the M preset voiceprint features of the M preset users, and calculate M second similarities between the target image feature and the M preset image features of the M preset users, where M and N are positive integers greater than 1;
the determining subunit is specifically configured to:
select, from the M preset users, the preset user for which the sum of the corresponding first similarity and second similarity meets a preset condition, as the target user to which the target voiceprint feature and the target image feature belong, and determine the emotional state of the target user from the N types of emotional states.
In a possible implementation, when the environmental information includes contact information between the target robot and other people, the identification unit is specifically configured to:
and identifying the identity information and the emotion state of the user in a preset range around the target robot according to the contact information of the target robot and other people.
In a possible implementation manner, the feedback unit is specifically configured to:
actively feeding back voice information conforming to the emotion state of the user to the user; the voice information is used for representing the target emotion state of the target robot through the change of tone; the target emotional state is in accordance with an emotional state of the user;
and/or actively feeding back expression information conforming to the emotion state of the user to the user; the expression information is a target emotion state of the target robot reflected by at least one of color change, expression image change and light change on a display screen of the target robot; the target emotional state is in accordance with an emotional state of the user.
In a possible implementation, the target robot includes a preset swing member; the feedback unit is specifically configured to:
the preset swinging part swings out a preset action, and action information conforming to the emotion state of the user is actively fed back to the user; the action information embodies a target emotional state of the target robot; the target emotional state is in accordance with an emotional state of the user.
The embodiments of the application also provide an interaction device for the anthropomorphic robot, comprising: a processor, a memory, and a system bus;
the processor and the memory are connected through the system bus;
the memory is for storing one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform any one of the implementations of the humanoid robot interaction method described above.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores instructions, and when the instructions run on the terminal equipment, the terminal equipment is caused to execute any implementation mode of the interaction method of the personification robot.
The embodiments of the application also provide a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the interaction method of the anthropomorphic robot described above.
The embodiments of the application provide an interaction method, apparatus, device, and storage medium for an anthropomorphic robot. A target robot actively collects environmental information within a preset surrounding range, the environmental information including at least one of sound information, image information, and information about contact between the target robot and other people; the target robot then uses the environmental information to identify the identity information and emotional states of users within the preset surrounding range; based on that identity information and those emotional states, the robot can actively feed back to the users anthropomorphic interaction information that matches their emotional states, the anthropomorphic interaction information including at least one of voice information, expression information, and action information. In this way, the target robot actively collects surrounding environmental information, identifies the emotional states of nearby users such as family members, and proactively responds with matching voice or actions, attending to the users' needs and concerns. This realizes more active intelligent interaction between the target robot and users such as family members, improves the degree of anthropomorphism of the robot, and improves the user experience.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of an interaction method of a anthropomorphic robot provided by an embodiment of the application;
FIG. 2 is one of the frame diagrams of the anthropomorphic robot provided by the embodiment of the application;
FIG. 3 is a second schematic diagram of a frame of a humanoid robot provided by an embodiment of the present application;
fig. 4 is a schematic diagram of an interaction device of a humanoid robot according to an embodiment of the present application.
Detailed Description
With the progress of society, robots are widely used not only in industry, medicine, agriculture, and the military, but are also beginning to enter social settings and daily life. Socially oriented robots are mostly used in households, especially household intelligent robots such as the Alpha Egg, to make life more convenient for family members. At present, however, these robots can only passively receive information and passively answer users' questions, with a response mode as fixed as a preset player; they cannot actively and promptly learn users' emotions, for example that a user may be in an emotional state that needs attention and comfort. Existing household intelligent robots can therefore only be woken up passively and cannot actively attend to the concerns of users such as family members, so their degree of personification is insufficient: they do not really have human-like thinking or active modes of interaction, which makes them appear rigid and makes it difficult to meet the needs of modern households.
To address these shortcomings, the application provides an interaction method for an anthropomorphic robot. A target robot actively collects environmental information within a preset surrounding range, the environmental information including at least one of sound information, image information, and information about contact between the target robot and other people; the target robot then uses the environmental information to identify the identity information and emotional states of users within the preset surrounding range; based on that identity information and those emotional states, the robot can actively feed back to the users anthropomorphic interaction information that matches their emotional states, the anthropomorphic interaction information including at least one of voice information, expression information, and action information. In this way, the target robot actively collects surrounding environmental information, identifies the emotional states of nearby users such as family members, and proactively responds with matching voice or actions, attending to the users' needs and concerns. This realizes more active intelligent interaction between the target robot and users such as family members, improves the degree of anthropomorphism of the robot, and improves the user experience.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
First embodiment
Referring to fig. 1, a flow chart of an interaction method of a humanoid robot provided in this embodiment includes the following steps:
s101: actively collecting environmental information in a preset range around the target robot; wherein the environment information includes at least one of sound information, image information, contact information of the target robot with others.
In this embodiment, any anthropomorphic robot that needs to realize active intelligent interaction with users is defined as a target robot. It should be noted that this embodiment does not limit the type of the target robot; for example, the target robot may be an Alpha Egg artificial-intelligence learning assistant robot, which uses artificial intelligence technology customized for children to provide learning modes and improve learning interest and efficiency.
To improve the personification of the target robot, the target robot needs to actively collect environmental information within a preset surrounding range; for example, it may actively collect environmental information within 5 meters of itself. The specific range can be set according to actual conditions and is not limited by the embodiments of the application. In addition, to improve the robot's initiative, a timer may be used to periodically start the information acquisition function of the target robot, so that it acquires the surrounding environmental information in time before executing the subsequent step S102, as sketched below.
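For illustration only, a timer-driven trigger of this kind could look like the following minimal sketch; the interval and function names are assumptions, not part of the embodiment:

```python
import threading

COLLECTION_INTERVAL_S = 30  # hypothetical sampling period, set per deployment


def collect_once():
    """Placeholder for one round of active acquisition (step S101)."""
    print("collecting sound / image / contact information ...")


def start_periodic_collection():
    # Re-arm the timer after every run so acquisition repeats without any wake-up word.
    def _tick():
        collect_once()
        threading.Timer(COLLECTION_INTERVAL_S, _tick).start()

    _tick()
```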
The environmental information actively collected by the target robot within the preset surrounding range can include at least one of sound information, image information, and contact information between the target robot and other people. The robot's initiative lies in the fact that it does not need to be woken up by users such as family members: the environmental information acquisition process runs on a regular schedule, so the degree of personification of the target robot is high, and the more active intelligent interaction with users such as family members improves their experience of using the robot.
An alternative implementation manner, the specific implementation process of the step S101 may include the following steps A1-A3:
Step A1: and actively collecting sound information in a preset range around the target robot by utilizing a preset sound sensor on the target robot.
In this implementation, to enable the target robot to actively acquire sound information within the preset surrounding range in a timely manner, a sound sensor is usually installed on the target robot in advance; the type and specification of the sound sensor are not limited and can be chosen according to the actual situation. For example, a microphone array of a particular specification can be mounted on the target robot and used as the sound sensor to periodically and actively collect sound information within the preset range (e.g. within 5 meters) around the robot. By recognizing the collected sound information, it is then judged whether there are footsteps or other sounds indicating that a user is present within the preset surrounding range, so as to execute the subsequent step A2.
Step A2: when the fact that the user exists in the preset range around the target robot is judged through the sound information, a shooting device preset on the target robot is started to shoot image information containing the user.
In this implementation manner, in order to facilitate the target robot to actively and timely acquire image information within a preset range around the target robot, a photographing device for photographing an image or a video is usually installed on the target robot in advance, and the type and the specification of the photographing device are not limited and can be selected according to actual situations. For example, a camera with higher definition can be selected to be mounted on the target robot, so that when it is determined that users exist in a preset range (such as within 5 meters) around the target robot, images containing the users can be timely shot. The image may then be subsequently identified in step S102 to determine the identity information and emotional state of the users, so as to perform the subsequent step S103.
Step A3: and actively collecting contact information between the target robot and users in a surrounding preset range by utilizing a preset touch sensor on the target robot.
In this implementation manner, in order to facilitate the target robot to actively and timely obtain the contact information in the preset range around the target robot, a touch sensor for obtaining the contact information between the user and the target robot is usually mounted on the target robot in advance, and the type and specification of the touch sensor are not limited and can be selected according to actual situations. When the user touches the target robot within a preset range (e.g., within 5 meters) around the target robot, for example, when the user touches an arm of the target robot or the user beats the target robot, the touch information is obtained by a touch sensor mounted on the target robot, and then the touch information is identified in step S102, so as to determine the emotional state of the user, so as to execute the subsequent step S103.
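A compact sketch of how steps A1-A3 might be combined is shown below; the sensor objects (sound_sensor, camera, touch_sensor) and the user_present check are hypothetical stand-ins, since the embodiment only requires that such devices are pre-installed on the target robot:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class EnvironmentInfo:
    sound: Optional[Any] = None          # raw audio recorded in step A1
    image: Optional[Any] = None          # frame captured in step A2
    contact: List[dict] = field(default_factory=list)  # touch events from step A3


def collect_environment_info(sound_sensor, camera, touch_sensor, user_present) -> EnvironmentInfo:
    info = EnvironmentInfo()

    # Step A1: actively record audio within the preset surrounding range (e.g. ~5 m).
    info.sound = sound_sensor.record(seconds=3)

    # Step A2: only start the camera when the audio suggests a user is present
    # (footsteps, speech, or other user-indicating sounds).
    if user_present(info.sound):
        info.image = camera.capture()

    # Step A3: read any accumulated touch events (touches, taps, hits).
    info.contact = touch_sensor.read_events()

    return info
```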
S102: and identifying the identity information and the emotion state of the user in a preset range around the target robot by using the environment information.
In this embodiment, after the target robot actively collects the environmental information in the surrounding preset range through step S101, the environmental information may be further processed to identify the identity information and the emotion state of the user in the surrounding preset range of the target robot according to the processing result, for example, identify which family members (such as father, mom, or child in the family) are in the surrounding preset range of the user, and the current emotion state (such as happiness, anger, etc.) of the user, so as to execute the following step S103.
In particular, in an alternative implementation, when the environmental information includes sound information, the specific implementation of step S102 may include the following steps B1-B2:
step B1: and extracting voiceprint features of the sound information as target voiceprint features.
In this implementation, after the target robot actively collects the sound information within the preset surrounding range through step S101, it can further perform voice recognition on the sound information and extract from it the voiceprint feature characterizing its voiceprint information, defined here as the target voiceprint feature. This feature serves as the basis for recognition, so that the sound information can be effectively recognized in the subsequent step B2 and the identity information and emotional state of the target user to whom the target voiceprint feature belongs can be identified.
Specifically, when extracting voiceprint features from sound information, the target speech is first framed to obtain a corresponding sequence of speech frames, and pre-emphasis is then applied to the framed sequence; the voiceprint feature of each speech frame is then extracted in turn. Here, a voiceprint feature refers to feature data representing the voiceprint information of the corresponding speech frame, for example Mel-Frequency Cepstral Coefficient (MFCC) features or log Mel filter bank (FBank) features.
It should be noted that, the embodiment of the present application is not limited to the extraction method of the voiceprint feature of the sound information, nor is it limited to a specific extraction process, and an appropriate extraction method may be selected according to the actual situation, and the corresponding feature extraction operation may be performed.
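As one possible illustration (not the extraction method mandated by the embodiment), a fixed-length voiceprint vector based on frame-level MFCCs could be computed with the librosa library, assuming the audio is available as a WAV file:

```python
import librosa
import numpy as np


def extract_voiceprint_feature(wav_path: str, n_mfcc: int = 20) -> np.ndarray:
    """Return a fixed-length voiceprint vector for one utterance (a simple baseline)."""
    signal, sr = librosa.load(wav_path, sr=16000)
    # Pre-emphasis boosts high frequencies before spectral analysis.
    emphasized = librosa.effects.preemphasis(signal)
    # Frame-level MFCC features (one column per analysis frame).
    mfcc = librosa.feature.mfcc(y=emphasized, sr=sr, n_mfcc=n_mfcc)
    # Average over time to obtain a single vector usable for cosine comparison.
    return mfcc.mean(axis=1)
```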
Step B2: comparing the target voiceprint characteristics with preset voiceprint characteristics of preset users stored in the target robot, and determining identity information and emotion states of the target users to which the target voiceprint characteristics belong.
To improve the degree of personification of the target robot and realize more active intelligent interaction with users such as family members, the personal information of family members, or of users who are in contact with the family, is generally stored in the target robot in advance by a person, and the target robot itself records information about the interaction objects it has interacted with.
The personal information of family members or of users in contact with the family that is stored in the target robot in advance may include the user's date of birth, weight, height, work schedule, personal preferences, personal images (such as portrait images and full-body images), voiceprint feature information, and so on. The personal images also include preset image features representing the corresponding user's various emotional states (such as happiness, anger, sadness, joy, annoyance, drowsiness, tiredness, hunger, and the like), and the voiceprint feature information likewise includes preset voiceprint features representing the corresponding user's various emotional states.
The information about interaction objects that have interacted with the target robot may also include the image information and sound information corresponding to each interaction object; this information can be acquired through the shooting device or sound sensor on the target robot during the interaction, and it can be updated and refined over the course of continued interaction. If an interaction object is a user who has not been stored before, a new user profile is created for that user during the interaction and the user's personal information is completed in subsequent interactions.
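A hypothetical sketch of how such a preset user profile might be organized is given below; the field names and the list of emotion labels are illustrative assumptions, since the embodiment does not prescribe a concrete schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List

import numpy as np

EMOTIONS = ["happiness", "anger", "sadness", "joy",
            "annoyance", "drowsiness", "tiredness", "hunger"]  # the N emotion types


@dataclass
class PresetUserProfile:
    name: str                       # e.g. "mom", "dad", "child"
    birth_date: str = ""
    height_cm: float = 0.0
    weight_kg: float = 0.0
    work_schedule: str = ""         # times of leaving for / returning from work
    preferences: List[str] = field(default_factory=list)
    # One preset voiceprint / image feature per emotion type, used for comparison.
    voiceprint_by_emotion: Dict[str, np.ndarray] = field(default_factory=dict)
    image_feature_by_emotion: Dict[str, np.ndarray] = field(default_factory=dict)
```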
On this basis, after the target voiceprint feature is extracted in step B1, it can be compared with the preset voiceprint features of the preset users stored in advance in the target robot (that is, the pre-stored family members or users in contact with the family, and the interaction objects the robot has interacted with), so as to determine the identity information and emotional state of the target user to whom the target voiceprint feature belongs. For example, assuming the collected sound information is "The mango I ate at kindergarten today was delicious", comparing the voiceprint features can establish that the target user to whom the target voiceprint feature belongs is the child of the family, and that the corresponding emotional state is "happiness".
Specifically, when M preset users stored in the target robot in advance respectively correspond to N types of emotional states, and M and N are positive integers greater than 1, the specific implementation process of the step B2 includes the following steps B21-B22:
step B21: and calculating M similarity between the target voiceprint features and M preset voiceprint features of M preset users.
In this implementation, when the identity of the target user to whom the target voiceprint feature belongs needs to be confirmed, i.e. it needs to be identified which of the M preset users stored in the target robot the target user is, then after the target voiceprint feature is obtained in step B1, the M similarities between the target voiceprint feature and the M preset voiceprint features of the M preset users are calculated, so as to execute the subsequent step B22. The specific calculation formula is as follows:
cos(v1, v2) = (v1 · v2) / (||v1|| × ||v2||)    (1)

where v1 represents the target voiceprint feature of the target user, v2 represents the preset voiceprint feature of a preset user, and cos(v1, v2) represents the similarity between the target user's voiceprint feature and the preset user's voiceprint feature. The higher the value of cos(v1, v2), the more similar the target user is to the preset user, i.e. the greater the likelihood that the target user and the preset user are the same person; conversely, the smaller the value of cos(v1, v2), the less similar the target user and the preset user are, i.e. the less likely it is that they are the same person.
Step B22: selecting the maximum similarity from the M similarities, and determining the target user to which the target voiceprint features belong as a preset user corresponding to the maximum similarity; and determining the emotional state of the target user from the N types of emotional states.
In this implementation, after the M similarities between the target user's target voiceprint feature and the M preset voiceprint features of the M preset users are calculated in step B21, the maximum similarity can be selected from among them and the target user determined to be the preset user corresponding to that maximum similarity. Then, according to the tone information contained in the target voiceprint feature of the target user, the emotional state of the user is determined to be at least one of the N types of emotional states corresponding to that preset user (such as happiness, anger, sadness, joy, annoyance, drowsiness, tiredness, and hunger).
For example: assume that the acquired sound information is "The mango I ate at kindergarten today was delicious", and that three preset users A, B, and C are pre-stored in the target robot, each with eight emotional states (happiness, anger, sadness, joy, annoyance, drowsiness, tiredness, and hunger). If the similarity between the target user's voiceprint feature and the preset voiceprint feature of preset user A is calculated to be 0.09, that with preset user B is 0.93, and that with preset user C is 0.13, then the highest similarity is 0.93, and from this highest similarity the target user is identified as preset user B. Further, by analyzing the tone information contained in the target user's voiceprint feature, the emotional state of the user can be determined to be "happiness" or "joy" among the eight emotional states corresponding to preset user B.
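The selection in steps B21-B22 amounts to a cosine-similarity nearest-neighbour search over the stored voiceprints. A minimal sketch is given below; it assumes the hypothetical PresetUserProfile objects sketched earlier, and comparing against each stored per-emotion voiceprint and keeping the best score is one possible reading, not something the embodiment fixes:

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Formula (1): cos(v1, v2) = (v1 · v2) / (||v1|| × ||v2||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def identify_by_voiceprint(target_vp: np.ndarray, profiles):
    """Return (best matching preset user, similarity) for the target voiceprint."""
    best_user, best_score = None, float("-inf")
    for user in profiles:
        # Compare against every stored voiceprint of this user (one per emotion type)
        # and keep that user's best score as the similarity to the target.
        score = max(cosine_similarity(target_vp, vp)
                    for vp in user.voiceprint_by_emotion.values())
        if score > best_score:
            best_user, best_score = user, score
    return best_user, best_score
```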
Another alternative implementation is when the environmental information includes image information; the specific implementation process of the step S102 may include the following steps C1-C2:
Step C1: image features of the image information are extracted as target image features.
In this implementation, after the target robot actively collects the image information within the preset surrounding range through step S101, it further performs image recognition on the image information, extracts image features from it, and defines them as the target image features. These serve as the basis for recognition, so that the image information can be effectively recognized in the subsequent step C2 and the identity information and emotional state of the target user to whom the target image features belong can be identified.
It should be noted that, the embodiment of the present application is not limited to the extraction method of the image features of the image information, nor is it limited to a specific extraction process, and an appropriate extraction method may be selected according to the actual situation, and the corresponding feature extraction operation may be performed.
Step C2: comparing the target image characteristics with preset image characteristics of a preset user stored in the target robot, and determining identity information and emotional states of the target user to which the target image characteristics belong.
In this implementation, after the target image feature is extracted in step C1, it can be compared with the preset image features of the preset users stored in advance in the target robot (that is, the pre-stored family members or users in contact with the family, and the interaction objects the robot has interacted with), so as to determine the identity information and emotional state of the target user to whom the target image feature belongs. For example, assuming the collected image information is a close-up image of a child's smiling face, comparing the image features can establish that the target user to whom the target image feature belongs is the child of the family, and that the corresponding emotional state is "happiness".
Specifically, when M preset users stored in the target robot in advance respectively correspond to N types of emotional states, and M and N are positive integers greater than 1, the specific implementation process of the step C2 includes the following steps C21-C22:
step C21: and calculating M similarities between the target image features and M preset image features of M preset users.
In this implementation, when the identity of the target user to whom the target image feature belongs needs to be confirmed, i.e. it needs to be identified which of the M preset users stored in the target robot the target user is, then after the target image feature is obtained in step C1, the M similarities between the target image feature and the M preset image features of the M preset users can be calculated, so as to execute the subsequent step C22. The specific calculation formula is as follows:
cos(p1, p2) = (p1 · p2) / (||p1|| × ||p2||)    (2)

where p1 represents the target image feature of the target user, p2 represents the preset image feature of a preset user, and cos(p1, p2) represents the similarity between the target user's image feature and the preset user's image feature. The higher the value of cos(p1, p2), the more similar the target user is to the preset user, i.e. the greater the likelihood that the target user and the preset user are the same person; conversely, the smaller the value of cos(p1, p2), the less similar the target user and the preset user are, i.e. the less likely it is that they are the same person.
Step C22: selecting the maximum similarity from the M similarities, and determining the target user to which the target image features belong as a preset user corresponding to the maximum similarity; and determining the emotional state of the target user from the N types of emotional states.
In this implementation, after the M similarities between the target user's target image feature and the M preset image features of the M preset users are calculated in step C21, the maximum similarity can be selected from among them and the target user determined to be the preset user corresponding to that maximum similarity. Then, according to the expression information contained in the target image feature of the target user, the emotional state of the user is determined to be at least one of the N types of emotional states corresponding to that preset user (such as happiness, anger, sadness, joy, annoyance, drowsiness, tiredness, and hunger).
For example: assume that the acquired image information is a close-up image of a child's smiling face, and that three preset users J, Q, and K are pre-stored in the target robot, each with eight emotional states (happiness, anger, sadness, joy, annoyance, drowsiness, tiredness, and hunger). If the similarity between the target user's image feature and the preset image feature of preset user J is calculated to be 0.12, that with preset user Q is 0.21, and that with preset user K is 0.85, then the highest similarity is 0.85, and from this highest similarity the target user is identified as preset user K. Further, by analyzing the expression information contained in the target user's image information (i.e. the "smile"), the emotional state of the user can be determined to be "happiness" or "joy" among the eight emotional states corresponding to preset user K.
Yet another alternative implementation is when the environmental information includes sound information and image information; the specific implementation process of the step S102 may include the following steps D1-D2:
step D1: extracting voiceprint features of the sound information as target voiceprint features; and extracting image features of the image information as target image features.
In this implementation, the process of step D1 is identical to steps B1 and C1; for the relevant details, refer to the descriptions of steps B1 and C1 above, which are not repeated here.
Step D2: comparing the target voiceprint characteristics with preset voiceprint characteristics of a preset user stored in the target robot in advance to obtain a first comparison result; and comparing the target image characteristics with preset image characteristics of a preset user stored in the target robot in advance to obtain a second comparison result.
In this implementation manner, after the target voiceprint feature and the target image feature are extracted in the step D1, the implementation processes of the steps B2 and C2 may be further executed to respectively compare the target voiceprint feature with the preset voiceprint feature of the preset user stored in the target robot, and to respectively compare the target image feature with the preset image feature of the preset user stored in the target robot, so as to respectively obtain the first comparison result and the second comparison result.
Step D3: and determining the identity information and the emotional state of the target user to which the target voiceprint feature and the target image feature belong according to the first comparison result and the second comparison result.
In this implementation, after the first and second comparison results are obtained in step D2, they can be analyzed together to determine the identity information and emotional state of the target user to whom the target voiceprint feature belongs. For example, assume the collected sound information is "The mango I ate at kindergarten today was delicious"; comparing voiceprint features gives the first comparison result: the target user to whom the target voiceprint feature belongs is the child of the family, and the corresponding emotional state is "happiness". Assume further that the collected image information is a close-up image of a child's smiling face; comparing image features gives the second comparison result: the target user to whom the target image feature belongs is the child of the family, and the corresponding emotional state is "happiness". By jointly analyzing the sound and image comparison results, it can therefore be determined that the target user is the child of the family and that the corresponding emotional state is "happiness".
Specifically, when M preset users stored in the target robot in advance respectively correspond to N types of emotional states, and M and N are positive integers greater than 1, the specific implementation process of the step D2 is as follows: calculating M first similarities between the target voiceprint features and M preset voiceprint features of M preset users; and calculating M second similarities between the target image features and M preset image features of the M preset users.
In this implementation manner, when the identity information of the target user to which the target voiceprint feature and the target image feature belong needs to be confirmed to identify which of M preset users stored in the target robot is the target user, after the target voiceprint feature possessed by the target user is obtained in step B1 and the target image feature possessed by the target user is obtained in step C1, M similarities (which are defined as first similarities) between the target voiceprint feature and M preset voiceprint features of the M preset users and M similarities (which are defined as second similarities) between the target image feature and M preset image features of the M preset users may be further calculated by using the above formulas (1) and (2), respectively.
On this basis, the specific implementation of step D3 is as follows: from the M preset users, the preset user for which the sum of the corresponding first similarity and second similarity meets a preset condition is selected as the target user to whom the target voiceprint feature and target image feature belong, and the emotional state of the target user is determined from the N types of emotional states.
Specifically, after the M first similarities and M second similarities have been calculated, the preset user for which the sum of the corresponding first and second similarities satisfies a preset condition can be selected. The preset condition can be set according to the actual situation and is not limited by this embodiment: for example, it can be set to select the highest value of the sum of the first and second similarities, or to first keep only first and second similarities above respective preset thresholds and then select the highest sum among the remainder. The preset user satisfying the condition is then taken as the target user, and, according to the tone information contained in the target user's voiceprint feature and the expression information contained in the target user's image feature, the emotional state of the user is determined to be at least one of the N types of emotional states corresponding to that preset user (such as happiness, anger, sadness, joy, annoyance, drowsiness, tiredness, and hunger). A sketch of this fused selection is given below.
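Under the simple preset condition of choosing the highest sum of the two similarities, the fused selection of steps D2-D3 could be sketched as follows (again assuming the hypothetical per-emotion feature dictionaries from the earlier profile sketch):

```python
import numpy as np


def _cos(a, b):
    # Cosine similarity, as in formulas (1) and (2).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def identify_by_voice_and_image(target_vp, target_img, profiles):
    """Pick the preset user whose summed voiceprint + image similarity is highest."""
    best_user, best_sum = None, float("-inf")
    for user in profiles:
        s_voice = max(_cos(target_vp, vp)                   # first similarity
                      for vp in user.voiceprint_by_emotion.values())
        s_image = max(_cos(target_img, img)                 # second similarity
                      for img in user.image_feature_by_emotion.values())
        if s_voice + s_image > best_sum:                    # preset condition: highest sum
            best_user, best_sum = user, s_voice + s_image
    return best_user, best_sum
```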
In yet another alternative implementation, the environmental information includes contact information between the target robot and other people. In this case, the implementation process of step S102 may specifically include: identifying, according to the contact information between the target robot and other people, the identity information and emotional state of the user within the preset range around the target robot.
In this implementation manner, after the target robot actively collects, through step S101, the contact information of a user touching the target robot within the surrounding preset range, the contact information may be further analyzed and identified, the emotional state of the user may be determined according to the analysis result, and the subsequent step S103 may then be executed. The contact information includes, but is not limited to, touches, taps, collisions and other contact between the user and the target robot.
For example, if the collected contact information indicates that the target robot is being struck repeatedly by a user, the emotional state of that user can be determined to be "anger", and the identity information of the user can be further determined from captured image information containing the user.
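Purely as an illustrative sketch of the contact-based recognition just described, the mapping below turns a stream of touch-sensor events into a coarse emotional state; the event names, counts and time window are assumptions, not values given in this application.

```python
def emotion_from_contact(events, window_s=3.0):
    """events: list of (timestamp_in_seconds, kind) tuples, kind in {"stroke", "tap", "hit"}."""
    if not events:
        return "neutral"
    latest = events[-1][0]
    recent = [kind for t, kind in events if t >= latest - window_s]
    if recent.count("hit") >= 3:
        return "anger"       # being struck repeatedly suggests an angry user
    if recent.count("stroke") >= 2:
        return "happiness"   # gentle stroking suggests a pleased user
    return "neutral"
```

In practice the result would be combined with the captured image information to also determine the user's identity, as noted above.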
It should be noted that, when the environment information includes the sound information, the image information and the contact information between the target robot and another person, the three are preferably comprehensively analyzed to more accurately identify the identity information and the emotion state of the user according to the processing result, and the specific analysis process is described above and will not be repeated here.
S103: and actively feeding back personified interaction information conforming to the emotion state of the user to the user according to the identity information and the emotion state of the user, wherein the personified interaction information comprises at least one of voice information, expression information and action information.
In this embodiment, after the identity information and emotional states of the users within the preset range around the target robot are identified in step S102, the personified interaction information that accords with each user's emotional state may be actively fed back to the corresponding user according to that user's identity information and emotional state, where the personified interaction information includes at least one of voice information, expression information, and action information. For example, when the target robot recognizes that a user is in a sad emotional state, it can actively feed back some comforting speech to that user to help relieve the user's worries, thereby realizing more proactive intelligent interaction, improving the degree of "anthropomorphism" of the target robot, and further improving the use experience of users such as family members.
In one possible implementation manner of the embodiment of the present application, the specific implementation process of step S103 may include: actively feeding back voice information conforming to the emotional state of the user to the user, wherein the voice information reflects the target emotional state of the target robot through changes in intonation, and the target emotional state is in accordance with the emotional state of the user; and/or actively feeding back expression information conforming to the emotional state of the user to the user, wherein the expression information reflects the target emotional state of the target robot through at least one of a color change, an expression image change, and a light change on the display screen of the target robot, and the target emotional state is in accordance with the emotional state of the user.
In this implementation, after identifying the identity information and emotional states of surrounding users through S102, the target robot can further respond actively to those users to achieve intelligent interaction, for example by giving anthropomorphic speech feedback and reflecting its own "target emotional state" through changes in the intonation of the "spoken" language, where the target emotional state is in accordance with the emotional state of the user. For example, assuming the sound information collected from a child is "The mango I ate at kindergarten today was delicious", the target robot can reply to the child in a cheerful intonation, for example "That's great, I'm so happy for you; I hope you get to eat even more tasty food at kindergarten", thereby reflecting that its corresponding target emotional state is "happy", which accords with the child's "happy" or "joyful" emotional state.
And/or, the target robot can also reflect its "target emotional state" through changes in the color of its display screen, its expression images, its lights, or the overall color of its surface, where the target emotional state accords with the emotional state of the user. For example, assuming the sound information collected from the child is again "The mango I ate at kindergarten today was delicious", the target robot may display a thumbs-up expression image on its display screen, or flash the screen's light in one of several preset cheerful patterns, to reflect that its corresponding target emotional state is "happy", which accords with the child's "happy" or "joyful" emotional state.
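The following sketch shows one way the voice and expression feedback described above could be organized. The response texts, intonation labels and screen patterns are placeholders, and speak / show_expression stand in for whatever text-to-speech and display interfaces the robot actually provides.

```python
# Feedback plans keyed by the user's recognized emotional state.
FEEDBACK = {
    "happiness": {"intonation": "cheerful",
                  "text": "That's wonderful, I'm happy for you!",
                  "screen": {"color": "warm_yellow", "image": "thumbs_up", "light": "blink_fast"}},
    "sadness":   {"intonation": "soft",
                  "text": "I'm here with you, it will get better.",
                  "screen": {"color": "soft_blue", "image": "gentle_smile", "light": "slow_pulse"}},
    "anger":     {"intonation": "calm",
                  "text": "Take a deep breath, let's talk about it.",
                  "screen": {"color": "light_green", "image": "calm_face", "light": "steady"}},
}

def give_feedback(emotion, speak, show_expression):
    """speak(text, intonation=...) and show_expression(color=..., image=..., light=...)
    are stand-ins for the robot's TTS and display interfaces."""
    plan = FEEDBACK.get(emotion)
    if plan is None:
        return
    speak(plan["text"], intonation=plan["intonation"])  # voice reflects the target emotional state
    show_expression(**plan["screen"])                    # screen color / expression image / light change
```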
In another possible implementation manner of the embodiment of the present application, the specific implementation process of step S103 may include: the preset swing member performs a preset action, actively feeding back to the user action information conforming to the emotional state of the user, wherein the action information reflects the target emotional state of the target robot, and the target emotional state is in accordance with the emotional state of the user.
In this implementation manner, after the identity information and emotional states of surrounding users are identified through S102, the target robot may further perform a preset action with the preset swing member and actively respond to the users, so as to achieve intelligent interaction with them. For example, as shown in fig. 2, assuming that the preset swing member is the "ear" of the target robot, the "target emotional state" of the target robot can be reflected by swinging the "ear" back and forth, where the target emotional state is in accordance with the emotional state of the user. For instance, assuming the sound information collected from the child is "The mango I ate at kindergarten today was delicious", the target robot can swing its "ear" back and forth to reflect that its corresponding target emotional state is "happy", which accords with the child's emotional state.
For another example, as shown in fig. 3, assuming that the preset swing member is the "tail" of the target robot, the "target emotional state" of the target robot can be reflected by swinging the "tail" up and down, where the target emotional state is in accordance with the emotional state of the user. Still assuming the sound information collected from the child is "The mango I ate at kindergarten today was delicious", the target robot can swing its "tail" up and down to reflect that its corresponding target emotional state is "happy", which accords with the child's emotional state.
It should be noted that, after identifying the identity information and emotional states of surrounding users, the target robot can also respond actively by combining voice, expression, and action, so as to achieve intelligent interaction with the users. For example, still assuming the sound information collected from the child is "The mango I ate at kindergarten today was delicious", after the target robot recognizes the child's "happy" or "joyful" emotional state, it can feed back a cheerful reply such as "That's great, I'm so happy for you; I hope you get to eat even more tasty food at kindergarten", while displaying a thumbs-up expression image on its display screen and swinging its "ear" back and forth, so that these together reflect that the target emotional state of the target robot is "happy" and accords with the child's "happy" or "joyful" emotional state.
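Continuing the same assumptions as the previous sketch, a combined response could layer a preset swing action on top of the voice and expression feedback; the swing helper and the part/motion names below are hypothetical.

```python
# Preset swing actions per recognized emotional state (part names follow figs. 2-3).
ACTIONS = {
    "happiness": [("ear", "back_and_forth"), ("tail", "up_and_down")],
    "sadness":   [("ear", "droop")],
}

def respond(emotion, speak, show_expression, swing):
    """Combine voice, on-screen expression and swing actions into one response;
    swing(part, motion) is a hypothetical helper driving the ear/tail servo."""
    give_feedback(emotion, speak, show_expression)   # voice + expression, see the previous sketch
    for part, motion in ACTIONS.get(emotion, []):
        swing(part, motion)                          # preset swing member performs a preset action
```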
In this way, through steps S101-S103, after the target robot actively collects environmental information including sound information, image information and contact information, it first uses that information to accurately determine the identities and emotional states of the users within the surrounding preset range, and then actively communicates with those users. During the communication, the anthropomorphic emotion of the target robot is conveyed through changes in its voice intonation, changes in its surface skin or display screen, and various different actions, so that more anthropomorphic and more proactive communication between the target robot and users such as family members is achieved, helping to relieve the different worries of different family members.
In summary, in the interaction method of the anthropomorphic robot provided in this embodiment, the target robot actively collects environmental information within a surrounding preset range, where the environmental information includes at least one of sound information, image information, and contact information between the target robot and other people; it then uses the environmental information to identify the identity information and emotional states of users within the preset range around it; and it then actively feeds back to each user personified interaction information that accords with that user's emotional state, where the personified interaction information includes at least one of voice information, expression information and action information. In this way, by actively collecting surrounding environmental information, the target robot can recognize the emotional states of users such as nearby family members and actively feed back personified interaction information, such as speech or actions, that accords with those emotional states, thereby relieving the users' worries, realizing more proactive intelligent interaction between the target robot and users such as family members, improving the degree of anthropomorphism of the robot, and improving the users' experience.
Second embodiment
This embodiment describes an interaction device of an anthropomorphic robot; for related content, refer to the method embodiment above.
Referring to fig. 4, a schematic diagram of an interaction device of an anthropomorphic robot according to this embodiment is provided, and the device 400 includes:
the acquisition unit 401 is configured to actively acquire environmental information within a preset range around the target robot; the environment information comprises at least one of sound information, image information and contact information of the target robot and others;
an identifying unit 402, configured to identify identity information and an emotional state of a user within a preset range around the target robot by using the environmental information;
and the feedback unit 403 is configured to actively feedback, to the user, anthropomorphic interaction information according with the emotional state of the user according to the identity information and the emotional state of the user, where the anthropomorphic interaction information includes at least one of voice information, expression information, and action information.
In one implementation of this embodiment, the collecting unit 401 includes:
the first acquisition subunit is used for actively acquiring sound information in a preset range around the target robot by utilizing a preset sound sensor on the target robot;
the shooting subunit is used for, when it is determined through the sound information that a user exists within the preset range around the target robot, starting a shooting device preset on the target robot to capture image information containing the user;
and the second acquisition subunit is used for actively acquiring contact information between the target robot and users in a surrounding preset range by utilizing a preset touch sensor on the target robot.
In one implementation of this embodiment, when the environmental information includes sound information; the identification unit 402 includes:
a first extraction subunit, configured to extract a voiceprint feature of the sound information as a target voiceprint feature;
the first comparison subunit is used for comparing the target voiceprint feature with preset voiceprint features of preset users stored in the target robot in advance and determining identity information and emotion states of the target users to which the target voiceprint feature belongs.
In one implementation of this embodiment, when the environmental information includes image information; the identification unit 402 includes:
a second extraction subunit, configured to extract an image feature of the image information as a target image feature;
And the second comparison subunit is used for comparing the target image characteristics with preset image characteristics of preset users stored in the target robot in advance and determining identity information and emotion states of the target users to which the target image characteristics belong.
In one implementation of the present embodiment, when the environment information includes sound information and image information; the identification unit 402 includes:
a third extraction subunit, configured to extract a voiceprint feature of the sound information as a target voiceprint feature; extracting image features of the image information to serve as target image features;
the third comparison subunit is used for comparing the target voiceprint characteristics with preset voiceprint characteristics of a preset user stored in the target robot in advance to obtain a first comparison result; comparing the target image characteristics with preset image characteristics of a preset user stored in the target robot in advance to obtain a second comparison result;
and the determining subunit is used for determining the identity information and the emotion state of the target user to which the target voiceprint feature and the target image feature belong according to the first comparison result and the second comparison result.
In one implementation manner of this embodiment, the M preset users pre-stored in the target robot each correspond to N types of emotional states; the first comparison subunit includes:
a first calculating subunit, configured to calculate M similarities between the target voiceprint feature and M preset voiceprint features of the M preset users; m is a positive integer greater than 1; the N is a positive integer greater than 1;
a first selecting subunit, configured to select a maximum similarity from the M similarities, and determine that a target user to which the target voiceprint feature belongs is a preset user corresponding to the maximum similarity; and determining the emotional state of the target user from the N types of emotional states.
In one implementation manner of this embodiment, the M preset users pre-stored in the target robot each correspond to N types of emotional states; the second comparison subunit includes:
a second calculating subunit, configured to calculate M similarities between the target image feature and M preset image features of the M preset users; m is a positive integer greater than 1; the N is a positive integer greater than 1;
a second selecting subunit, configured to select a maximum similarity from the M similarities, and determine that a target user to which the target image feature belongs is a preset user corresponding to the maximum similarity; and determining the emotional state of the target user from the N types of emotional states.
In one implementation manner of this embodiment, the M preset users pre-stored in the target robot each correspond to N types of emotional states; the third comparison subunit is specifically configured to:
calculating M first similarities between the target voiceprint feature and M preset voiceprint features of the M preset users; calculating M second similarities between the target image feature and M preset image features of the M preset users; M is a positive integer greater than 1; N is a positive integer greater than 1;
the determining subunit is specifically configured to:
selecting a corresponding preset user from the M first similarities and the M second similarities when the sum of the first similarities and the second similarities meets a preset condition, and taking the corresponding preset user as a target user to which the target voiceprint feature and the target image feature belong; and determining the emotional state of the target user from the N types of emotional states.
In one implementation of this embodiment, when the environmental information includes contact information of the target robot with others; the identification unit 402 is specifically configured to:
and identifying the identity information and the emotion state of the user in a preset range around the target robot according to the contact information of the target robot and other people.
In one implementation manner of this embodiment, the feedback unit 403 is specifically configured to:
actively feeding back voice information conforming to the emotion state of the user to the user; the voice information is used for representing the target emotion state of the target robot through the change of tone; the target emotional state is in accordance with an emotional state of the user;
and/or actively feeding back expression information conforming to the emotion state of the user to the user; the expression information is a target emotion state of the target robot reflected by at least one of color change, expression image change and light change on a display screen of the target robot; the target emotional state is in accordance with an emotional state of the user.
In one implementation of this embodiment, the target robot includes a preset swing member; the feedback unit is specifically configured to:
the preset swing member performs a preset action, actively feeding back to the user action information conforming to the emotional state of the user; the action information embodies a target emotional state of the target robot; the target emotional state is in accordance with the emotional state of the user.
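Purely as a structural sketch, the units of device 400 could be composed as follows; the class and method names mirror the unit names in this embodiment but are otherwise assumptions.

```python
class AnthropomorphicRobotInteractionDevice:
    """Structural sketch of device 400: acquisition, identification and feedback units."""

    def __init__(self, acquisition_unit, identification_unit, feedback_unit):
        self.acquisition_unit = acquisition_unit         # unit 401
        self.identification_unit = identification_unit   # unit 402
        self.feedback_unit = feedback_unit               # unit 403

    def run_once(self):
        env = self.acquisition_unit.collect()                       # sound / image / contact information
        identity, emotion = self.identification_unit.identify(env)  # who is nearby and how they feel
        self.feedback_unit.respond(identity, emotion)               # voice / expression / action feedback
```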
Further, the embodiment of the application also provides an interaction device of the anthropomorphic robot, which comprises: a processor, memory, system bus;
The processor and the memory are connected through the system bus;
the memory is for storing one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform any of the implementations of the humanoid robot interaction method described above.
Further, the embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores instructions, and when the instructions run on the terminal equipment, the terminal equipment is caused to execute any implementation method of the interaction method of the anthropomorphic robot.
Further, the embodiment of the application also provides a computer program product, which when being run on the terminal equipment, causes the terminal equipment to execute any implementation method of the interaction method of the anthropomorphic robot.
From the above description of embodiments, it will be apparent to those skilled in the art that all or part of the steps of the above described example methods may be implemented in software plus necessary general purpose hardware platforms. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
It should be noted that, in this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts between the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief; for relevant details, refer to the description of the method.
It is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An interaction method of an anthropomorphic robot, characterized by comprising the following steps:
actively collecting environmental information in a preset range around the target robot; the environment information comprises at least one of sound information, image information and contact information of the target robot and others;
the active collection of environmental information within a preset range around the target robot comprises the following steps: actively collecting sound information in a preset range around the target robot by utilizing a preset sound sensor on the target robot; when the existence of a user in a preset range around the target robot is judged through the sound information, starting a shooting device preset on the target robot to shoot image information containing the user; actively collecting contact information between the target robot and users in a surrounding preset range by utilizing a preset touch sensor on the target robot;
Identifying identity information and emotion states of users in a preset range around the target robot by using the environment information;
when the environment information includes sound information and image information; the identifying, by using the environmental information, identity information and emotional states of the user within a preset range around the target robot includes:
extracting voiceprint features of the sound information to serve as target voiceprint features; extracting image features of the image information to serve as target image features;
comparing the target voiceprint characteristics with preset voiceprint characteristics of a preset user stored in the target robot in advance to obtain a first comparison result; comparing the target image characteristics with preset image characteristics of a preset user stored in the target robot in advance to obtain a second comparison result;
according to the first comparison result and the second comparison result, determining identity information and emotional states of the target users to which the target voiceprint features and the target image features belong;
the M preset users stored in the target robot in advance each correspond to N types of emotional states; the comparing the target voiceprint features with preset voiceprint features of a preset user stored in the target robot in advance to obtain a first comparison result, and comparing the target image features with preset image features of a preset user stored in the target robot in advance to obtain a second comparison result, comprises:
calculating M first similarities between the target voiceprint features and M preset voiceprint features of the M preset users; calculating M second similarities between the target image features and M preset image features of the M preset users; M is a positive integer greater than 1; N is a positive integer greater than 1;
the determining, according to the first comparison result and the second comparison result, the identity information and emotional state of the target user to which the target voiceprint feature and the target image feature belong comprises:
selecting a corresponding preset user from the M first similarities and the M second similarities when the sum of the first similarities and the second similarities meets a preset condition, and taking the corresponding preset user as a target user to which the target voiceprint feature and the target image feature belong; determining the emotional state of the target user from the N types of emotional states;
actively feeding back personified interaction information which accords with the emotion state of the user to the user according to the identity information and the emotion state of the user, wherein the personified interaction information comprises at least one of voice information, expression information and action information;
the actively feeding back anthropomorphic interaction information conforming to the emotion state of the user to the user comprises the following steps:
Actively feeding back voice information conforming to the emotion state of the user to the user; the voice information is used for representing the target emotion state of the target robot through the change of tone; the target emotional state is in accordance with an emotional state of the user;
and/or actively feeding back expression information conforming to the emotion state of the user to the user; the expression information is a target emotion state of the target robot reflected by at least one of color change, expression image change and light change on a display screen of the target robot; the target emotional state is in accordance with an emotional state of the user.
2. The method of claim 1, wherein when the environmental information comprises sound information; the identifying, by using the environmental information, identity information and emotional states of the user within a preset range around the target robot includes:
extracting voiceprint features of the sound information to serve as target voiceprint features;
comparing the target voiceprint characteristics with preset voiceprint characteristics of preset users stored in the target robot, and determining identity information and emotion states of the target users to which the target voiceprint characteristics belong.
3. The method of claim 1, wherein when the environmental information comprises image information; the identifying, by using the environmental information, identity information and emotional states of the user within a preset range around the target robot includes:
extracting image features of the image information as target image features;
comparing the target image characteristics with preset image characteristics of preset users stored in the target robot, and determining identity information and emotional states of the target users to which the target image characteristics belong.
4. The method according to claim 2, wherein M preset users stored in the target robot each correspond to N types of emotional states; comparing the target voiceprint feature with preset voiceprint features of preset users stored in the target robot, and determining identity information and emotion states of the target users to which the target voiceprint feature belongs, wherein the method comprises the following steps:
calculating M similarity between the target voiceprint features and M preset voiceprint features of the M preset users; m is a positive integer greater than 1; the N is a positive integer greater than 1;
Selecting the maximum similarity from the M similarities, and determining a target user to which the target voiceprint feature belongs as a preset user corresponding to the maximum similarity; and determining the emotional state of the target user from the N types of emotional states.
5. A method according to claim 3, wherein M preset users stored in the target robot each correspond to N types of emotional states; comparing the target image features with preset image features of preset users stored in the target robot, and determining identity information and emotional states of the target users to which the target image features belong, wherein the method comprises the following steps:
calculating M similarity between the target image features and M preset image features of the M preset users; m is a positive integer greater than 1; the N is a positive integer greater than 1;
selecting the maximum similarity from the M similarities, and determining the target user to which the target image features belong as a preset user corresponding to the maximum similarity; and determining the emotional state of the target user from the N types of emotional states.
6. The method of claim 1, wherein when the environmental information includes contact information of the target robot with others; the identifying, by using the environmental information, identity information and emotional states of the user within a preset range around the target robot includes:
And identifying the identity information and the emotion state of the user in a preset range around the target robot according to the contact information of the target robot and other people.
7. The method of claim 1, wherein the target robot comprises a preset swing member; the actively feeding back anthropomorphic interaction information conforming to the emotion state of the user to the user comprises the following steps:
the preset swing member performs a preset action, actively feeding back to the user action information conforming to the emotional state of the user; the action information embodies a target emotional state of the target robot; the target emotional state is in accordance with an emotional state of the user.
8. An interactive apparatus of an anthropomorphic robot, comprising:
the acquisition unit is used for actively acquiring environmental information in a preset range around the target robot; the environment information comprises at least one of sound information, image information and contact information of the target robot and others;
the identification unit is used for identifying the identity information and the emotion state of the user in a preset range around the target robot by utilizing the environment information;
when the environment information includes sound information and image information; the identification unit includes:
A third extraction subunit, configured to extract a voiceprint feature of the sound information as a target voiceprint feature; extracting image features of the image information to serve as target image features;
the third comparison subunit is used for comparing the target voiceprint characteristics with preset voiceprint characteristics of a preset user stored in the target robot in advance to obtain a first comparison result; comparing the target image characteristics with preset image characteristics of a preset user stored in the target robot in advance to obtain a second comparison result;
the determining subunit is used for determining the identity information and the emotion state of the target user to which the target voiceprint feature and the target image feature belong according to the first comparison result and the second comparison result;
the M preset users stored in the target robot in advance each correspond to N types of emotional states; the third comparison subunit is specifically configured to:
calculating M first similarities between the target voiceprint features and M preset voiceprint features of the M preset users; calculating M second similarities between the target image features and M preset image features of the M preset users; M is a positive integer greater than 1; N is a positive integer greater than 1;
The determining subunit is specifically configured to:
selecting a corresponding preset user from the M first similarities and the M second similarities when the sum of the first similarities and the second similarities meets a preset condition, and taking the corresponding preset user as a target user to which the target voiceprint feature and the target image feature belong; determining the emotional state of the target user from the N types of emotional states;
the acquisition unit comprises:
the first acquisition subunit is used for actively acquiring sound information in a preset range around the target robot by utilizing a preset sound sensor on the target robot;
the shooting subunit is used for, when it is determined through the sound information that a user exists within the preset range around the target robot, starting a shooting device preset on the target robot to capture image information containing the user;
the second acquisition subunit is used for actively acquiring contact information between the target robot and users in a surrounding preset range by utilizing a preset touch sensor on the target robot;

the feedback unit is used for actively feeding back personified interaction information which accords with the emotion state of the user to the user according to the identity information and the emotion state of the user, wherein the personified interaction information comprises at least one of voice information, expression information and action information;
The feedback unit is specifically configured to:
actively feeding back voice information conforming to the emotion state of the user to the user; the voice information is used for representing the target emotion state of the target robot through the change of tone; the target emotional state is in accordance with an emotional state of the user;
and/or actively feeding back expression information conforming to the emotion state of the user to the user; the expression information is a target emotion state of the target robot reflected by at least one of color change, expression image change and light change on a display screen of the target robot; the target emotional state is in accordance with an emotional state of the user.
9. An interactive apparatus of an anthropomorphic robot, comprising: a processor, a memory, and a system bus;
the processor and the memory are connected through the system bus;
the memory is for storing one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform the method of any of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein instructions, which when run on a terminal device, cause the terminal device to perform the method of any of claims 1-7.
CN202110961067.1A 2021-08-20 2021-08-20 Interaction method, device, equipment and storage medium of anthropomorphic robot Active CN113580166B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110961067.1A CN113580166B (en) 2021-08-20 2021-08-20 Interaction method, device, equipment and storage medium of anthropomorphic robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110961067.1A CN113580166B (en) 2021-08-20 2021-08-20 Interaction method, device, equipment and storage medium of anthropomorphic robot

Publications (2)

Publication Number Publication Date
CN113580166A CN113580166A (en) 2021-11-02
CN113580166B true CN113580166B (en) 2023-11-28

Family

ID=78238873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110961067.1A Active CN113580166B (en) 2021-08-20 2021-08-20 Interaction method, device, equipment and storage medium of anthropomorphic robot

Country Status (1)

Country Link
CN (1) CN113580166B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130093290A (en) * 2012-02-14 2013-08-22 (주) 퓨처로봇 Emotional sympathy robot service system and method of the same
CN107030691A (en) * 2017-03-24 2017-08-11 华为技术有限公司 A kind of data processing method and device for nursing robot
CN109448735A (en) * 2018-12-21 2019-03-08 深圳创维-Rgb电子有限公司 Video parameter method of adjustment, device and reading storage medium based on Application on Voiceprint Recognition
CN109841230A (en) * 2017-11-29 2019-06-04 威刚科技股份有限公司 Voice mood identification system and method and the intelligent robot for using it
CN109902561A (en) * 2019-01-16 2019-06-18 平安科技(深圳)有限公司 A kind of face identification method and device, robot applied to robot
CN110154048A (en) * 2019-02-21 2019-08-23 北京格元智博科技有限公司 Control method, control device and the robot of robot
CN110900617A (en) * 2018-09-14 2020-03-24 Lg电子株式会社 Robot and operation method thereof
CN111050130A (en) * 2019-12-12 2020-04-21 深圳市大拿科技有限公司 Camera control method and device and storage medium
CN111192574A (en) * 2018-11-14 2020-05-22 奇酷互联网络科技(深圳)有限公司 Intelligent voice interaction method, mobile terminal and computer readable storage medium
CN111368609A (en) * 2018-12-26 2020-07-03 深圳Tcl新技术有限公司 Voice interaction method based on emotion engine technology, intelligent terminal and storage medium
CN111866454A (en) * 2020-07-02 2020-10-30 广州博冠智能科技有限公司 Sound and image linkage detection early warning method and device
CN111975772A (en) * 2020-07-31 2020-11-24 深圳追一科技有限公司 Robot control method, device, electronic device and storage medium

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130093290A (en) * 2012-02-14 2013-08-22 (주) 퓨처로봇 Emotional sympathy robot service system and method of the same
CN107030691A (en) * 2017-03-24 2017-08-11 华为技术有限公司 A kind of data processing method and device for nursing robot
CN109841230A (en) * 2017-11-29 2019-06-04 威刚科技股份有限公司 Voice mood identification system and method and the intelligent robot for using it
CN110900617A (en) * 2018-09-14 2020-03-24 Lg电子株式会社 Robot and operation method thereof
CN111192574A (en) * 2018-11-14 2020-05-22 奇酷互联网络科技(深圳)有限公司 Intelligent voice interaction method, mobile terminal and computer readable storage medium
CN109448735A (en) * 2018-12-21 2019-03-08 深圳创维-Rgb电子有限公司 Video parameter method of adjustment, device and reading storage medium based on Application on Voiceprint Recognition
CN111368609A (en) * 2018-12-26 2020-07-03 深圳Tcl新技术有限公司 Voice interaction method based on emotion engine technology, intelligent terminal and storage medium
CN109902561A (en) * 2019-01-16 2019-06-18 平安科技(深圳)有限公司 A kind of face identification method and device, robot applied to robot
CN110154048A (en) * 2019-02-21 2019-08-23 北京格元智博科技有限公司 Control method, control device and the robot of robot
CN111050130A (en) * 2019-12-12 2020-04-21 深圳市大拿科技有限公司 Camera control method and device and storage medium
CN111866454A (en) * 2020-07-02 2020-10-30 广州博冠智能科技有限公司 Sound and image linkage detection early warning method and device
CN111975772A (en) * 2020-07-31 2020-11-24 深圳追一科技有限公司 Robot control method, device, electronic device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
涂序彦, 杜军平, 王洪泊. 《拟人学与拟人系统》 (Anthropomorphics and Anthropomorphic Systems). 国防工业出版社 (National Defense Industry Press), 2013, p. 70. *
蓝敏, 殷正坤. 《人工智能背景下图像处理技术的应用研究》 (Research on the Application of Image Processing Technology in the Context of Artificial Intelligence). 北京工业大学出版社 (Beijing University of Technology Press), 2018, pp. 121-122. *

Also Published As

Publication number Publication date
CN113580166A (en) 2021-11-02

Similar Documents

Publication Publication Date Title
CN108564007B (en) Emotion recognition method and device based on expression recognition
US11836593B1 (en) Devices, systems, and methods for learning and using artificially intelligent interactive memories
US11281945B1 (en) Multimodal dimensional emotion recognition method
WO2021104110A1 (en) Voice matching method and related device
CN106297789B (en) Personalized interaction method and system for intelligent robot
CN107894833B (en) Multi-modal interaction processing method and system based on virtual human
CN105843381B (en) Data processing method for realizing multi-modal interaction and multi-modal interaction system
CN112686048B (en) Emotion recognition method and device based on fusion of voice, semantics and facial expressions
CN110675859B (en) Multi-emotion recognition method, system, medium, and apparatus combining speech and text
KR20100001928A (en) Service apparatus and method based on emotional recognition
CN106873893B (en) Multi-modal interaction method and device for intelligent robot
CN109101663A (en) A kind of robot conversational system Internet-based
WO2020215590A1 (en) Intelligent shooting device and biometric recognition-based scene generation method thereof
CN111108508B (en) Face emotion recognition method, intelligent device and computer readable storage medium
CN114995657B (en) Multimode fusion natural interaction method, system and medium for intelligent robot
WO2021217973A1 (en) Emotion information recognition method and apparatus, and storage medium and computer device
CN116704085B (en) Avatar generation method, apparatus, electronic device, and storage medium
CN112651334A (en) Robot video interaction method and system
WO2023226239A1 (en) Object emotion analysis method and apparatus and electronic device
CN113611318A (en) Audio data enhancement method and related equipment
CN110825164A (en) Interaction method and system based on wearable intelligent equipment special for children
CN116524924A (en) Digital human interaction control method, device, electronic equipment and storage medium
CN109961152B (en) Personalized interaction method and system of virtual idol, terminal equipment and storage medium
CN110442867A (en) Image processing method, device, terminal and computer storage medium
CN113580166B (en) Interaction method, device, equipment and storage medium of anthropomorphic robot

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant