CN112711331A - Robot interaction method and device, storage equipment and electronic equipment

Info

Publication number
CN112711331A
Authority
CN
China
Prior art keywords
current
head
robot
face
sound source
Prior art date
Legal status
Pending
Application number
CN202011579071.3A
Other languages
Chinese (zh)
Inventor
郭小俊
Current Assignee
Jingdong Shuke Haiyi Information Technology Co Ltd
Original Assignee
Jingdong Shuke Haiyi Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Jingdong Shuke Haiyi Information Technology Co Ltd filed Critical Jingdong Shuke Haiyi Information Technology Co Ltd
Priority to CN202011579071.3A
Publication of CN112711331A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00 - Manipulators not otherwise provided for
    • B25J11/0005 - Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G06V40/166 - Detection; Localisation; Normalisation using acquisition arrangements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 - Facial expression recognition
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Manipulator (AREA)
  • Toys (AREA)

Abstract

The present disclosure relates to the field of robot control technologies, and in particular, to a robot interaction method and apparatus, a storage device, and an electronic device. The robot comprises a main body, a rotatable head and a rotatable chassis, wherein the rotatable head and the rotatable chassis are arranged on the main body, and the robot interaction method comprises the following steps: after receiving the dialogue information, acquiring the current sound source direction corresponding to the dialogue information and the current pose of the robot; calculating first pose adjustment information according to the current sound source direction and the current pose; adjusting the pose of the head and/or chassis based on the first pose adjustment information to match the head of the robot with the current sound source direction. The robot interaction method can improve the efficiency and safety of human-computer interaction.

Description

Robot interaction method and device, storage equipment and electronic equipment
Technical Field
The present disclosure relates to the field of robot control technologies, and in particular, to a robot interaction method and apparatus, a storage device, and an electronic device.
Background
A robot is a comprehensive system integrating multiple functions such as environment perception, dynamic decision-making and planning, and behavior control and execution. With the development of robotics, robots are used in more and more application scenarios; for example, business robots deployed in malls, hotels and banks need to perform human-computer interaction with users.
In order to avoid potential safety hazards when the robot executes actions, most existing business robots interact with people only by playing pictures or audio on a screen, and the pose of the robot cannot change during the interaction to actively follow the interactive object. This makes the robot appear dull and stiff when interacting with people, and the human-computer interaction experience is little better than using a tablet.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a robot interaction method, apparatus, storage medium, and electronic device, and aims to improve efficiency and safety of human-computer interaction.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the embodiments of the present disclosure, there is provided a robot interaction method applied to an interactive robot, the robot including a main body, and a rotatable head and a rotatable chassis provided on the main body, the method including: after receiving the dialogue information, acquiring the current sound source direction corresponding to the dialogue information and the current pose of the robot; calculating first pose adjustment information according to the current sound source direction and the current pose; adjusting the pose of the head and/or chassis based on the first pose adjustment information to match the head of the robot with the current sound source direction.
According to some embodiments of the present disclosure, based on the foregoing scheme, the current sound source direction includes: a current sound source horizontal angle and a current sound source vertical angle of the sound source corresponding to the dialogue information relative to the robot head.
According to some embodiments of the present disclosure, based on the foregoing solution, the current pose includes: a current head pose and a current chassis direction, wherein the current head pose comprises a current head horizontal angle and a current head vertical angle.
According to some embodiments of the present disclosure, based on the foregoing solution, the head is configured with a rotatable range, the first pose adjustment information includes head horizontal direction adjustment information or chassis horizontal direction adjustment information, and head vertical direction adjustment information, and the calculating the first pose adjustment information according to the current sound source direction and the current pose includes: when the current sound source horizontal angle is within the rotatable range, calculating the difference value between the current sound source horizontal angle and the current head horizontal angle to obtain the head horizontal direction adjustment information; when the current sound source horizontal angle is out of the rotatable range, taking a negative value of the current sound source horizontal angle as adjustment information of the chassis horizontal direction; when the vertical angle of the current sound source is within the rotatable range, calculating the difference value between the vertical angle of the current sound source and the vertical angle of the current head to obtain the adjustment information of the vertical direction of the head; and when the vertical angle of the current sound source is out of the rotatable range of the head, calculating the difference value between the limit vertical angle close to the current sound source direction and the current head vertical angle to obtain the adjustment information of the head vertical direction.
According to some embodiments of the present disclosure, based on the foregoing, when the current sound source horizontal angle is outside the rotatable range, the first pose adjustment information further includes head horizontal direction adjustment information, and the method further includes: taking a negative value of the current head horizontal angle as the head horizontal direction adjustment information, so that the head of the robot matches the current sound source direction.
According to some embodiments of the present disclosure, based on the foregoing solution, after adjusting the pose of the head and/or chassis based on the first pose adjustment information, the method further comprises: acquiring image information in the current visual field range through a camera of the robot so as to perform face recognition; when a face is recognized, acquiring the current face direction corresponding to the face and the current head pose of the robot in real time; calculating second pose adjustment information according to the current face direction and the current head pose; and adjusting the pose of the head in real time based on the second pose adjustment information so that the head of the robot corresponds to the current face direction.
According to some embodiments of the present disclosure, based on the foregoing solution, the current face direction includes: a current face horizontal angle and a current face vertical angle of the face relative to the robot head.
According to some embodiments of the present disclosure, based on the foregoing scheme, when a plurality of faces are recognized, the method further includes: acquiring feature matching values of each face and a standard face; and selecting the face with the maximum feature matching value as a target face, and configuring the position of the target face as the current face position.
According to some embodiments of the present disclosure, based on the foregoing scheme, when a plurality of faces are recognized, the method further includes: acquiring depth information of each face through a depth camera of the robot; calculating a direction of each face relative to the robot head based on the depth information; and selecting a target face according to the angle difference between the direction of each face and the current sound source direction, and configuring the position of the target face as the current face position.
According to some embodiments of the present disclosure, based on the foregoing solution, when the current face horizontal angle is outside the rotatable range, the second pose adjustment information includes head horizontal direction adjustment information, and the method further includes: calculating the difference between the limit horizontal angle close to the current face direction and the current head horizontal angle to obtain the head horizontal direction adjustment information.
According to some embodiments of the present disclosure, based on the foregoing scheme, when the robot is in an interactive state, the method further includes: acquiring image information in the current visual field range through a camera of the robot so as to perform face recognition and acquire the waiting time of the current dialogue period; when a face is recognized and the waiting time is greater than a first preset value, playing a first audio and/or a first picture; and when no face is recognized and the waiting time is greater than a second preset value, or after an end word is received, switching to a standby state.
According to some embodiments of the present disclosure, based on the foregoing solution, when the robot is in a standby state, the method further includes: identifying an interactive object through a human face and/or voice; and after the interactive object is identified, playing second audio and/or a second picture.
According to a second aspect of the embodiments of the present disclosure, there is provided a robot interaction device applied to an interactive robot, the robot including a main body, and a rotatable head and a rotatable chassis provided on the main body, the robot interaction device including: the monitoring module is used for acquiring the current sound source direction corresponding to the dialogue information and the current pose of the robot after the dialogue information is received; the calculation module is used for calculating pose adjustment information according to the current sound source direction and the current pose; an adjusting module, configured to adjust a pose of the head and/or the chassis based on the pose adjustment information, so that the head of the robot matches the current sound source direction.
According to a third aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a robot interaction method as in the above embodiments.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the robot interaction method as in the above embodiments.
Exemplary embodiments of the present disclosure may have some or all of the following benefits:
in the technical solutions provided by some embodiments of the present disclosure, during robot interaction the pose of the head and/or the chassis of the interactive robot can be adjusted according to the sound source of the dialogue information, so that the head of the robot matches the current sound source direction. First, matching the head of the robot with the current sound source direction enhances the sense of realism during interaction, improves human-computer interaction efficiency, and improves user experience. Second, because the robot is provided with a rotatable head and a rotatable chassis, the robot can rotate its head and/or chassis in place when adjusting its pose during interaction, which ensures safety in the human-computer interaction process. Third, because the pose is adjusted according to the current sound source of the dialogue information immediately after the dialogue information is received, the accuracy and real-time performance of the pose adjustment are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 schematically illustrates a flow diagram of a method of robot interaction in an exemplary embodiment of the disclosure;
FIG. 2 schematically illustrates a schematic structural diagram of a robot in an exemplary embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow diagram of another method of robot interaction in an exemplary embodiment of the disclosure;
FIG. 4 schematically illustrates a flow diagram of a single person mode robot interaction method in an exemplary embodiment of the disclosure;
fig. 5 schematically illustrates a flowchart of a multi-person mode robot interaction method in an exemplary embodiment of the present disclosure;
FIG. 6 schematically illustrates a component schematic of a robotic interaction device in an exemplary embodiment of the disclosure;
FIG. 7 schematically illustrates a schematic diagram of a computer-readable storage medium in an exemplary embodiment of the disclosure;
fig. 8 schematically shows a structural diagram of a computer system of an electronic device in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Most existing business robots are fixed in place, and their interaction modes of screen interaction or voice interaction feel somewhat dull and rigid to users, much like using a tablet; a small number of robots have arms that can perform some actions, such as the Pepper robot from Japan, but these may present safety hazards, for example hitting a person while the arms are moving and being unable to stop immediately even after hitting someone.
In view of the problems in the related art, the present disclosure provides a robot interaction method, and details of implementation of the technical solutions of the embodiments of the present disclosure are set forth below.
Fig. 1 schematically illustrates a flow diagram of a robot interaction method in an exemplary embodiment of the disclosure. As shown in fig. 1, the robot interaction method includes steps S1 to S3:
step S1, after receiving dialogue information, acquiring the current sound source direction corresponding to the dialogue information and the current pose of the robot;
step S2, calculating first pose adjustment information according to the current sound source direction and the current pose;
step S3, adjusting the pose of the head and/or chassis based on the first pose adjustment information so that the head of the robot matches the current sound source direction.
In the technical solutions provided by some embodiments of the present disclosure, during robot interaction the pose of the head and/or the chassis of the interactive robot can be adjusted according to the sound source of the dialogue information, so that the head of the robot matches the current sound source direction. First, matching the head of the robot with the current sound source direction enhances the sense of realism during interaction, improves human-computer interaction efficiency, and improves user experience. Second, because the robot is provided with a rotatable head and a rotatable chassis, the robot can rotate its head and/or chassis in place when adjusting its pose during interaction, which ensures safety in the human-computer interaction process. Third, because the pose is adjusted according to the current sound source of the dialogue information immediately after the dialogue information is received, the accuracy and real-time performance of the pose adjustment are improved.
Hereinafter, each step of the robot interaction method in the present exemplary embodiment will be described in more detail with reference to the drawings and examples.
In one embodiment of the present disclosure, the robot interaction method described above is applied to an interactive robot. The robot includes a body, and a rotatable head and a rotatable chassis disposed on the body.
Fig. 2 schematically illustrates a schematic structural diagram of a robot in an exemplary embodiment of the present disclosure. As shown in fig. 2, the head of the robot main body is provided with a plug-in device interface 201, a pickup microphone 202, a degree-of-freedom joint 203, a front camera 207 and an expression screen 208; the upper end of the robot main body is also provided with a top indicator lamp 204, a card swiping area 209 and a depth camera 210; a display screen 211 is mounted in the middle of the robot main body; the bottom of the robot main body is provided with a charging electrode 205, a chassis indicator lamp 206, an ultrasonic module 212, a laser radar 213 and a chassis 214; in addition, the robot is also equipped with built-in modules such as a CPU, a sound test board and a face recognition module.
The pickup microphone 202 can receive voice information, judge the sound source direction of the received voice information through a sound test board, and perform content recognition on the voice information through an analysis module, and corresponding interaction behaviors can be set for different voice contents.
The joint 203 can be used to control the movement of the robot head, and the head can be configured with a rotatable range to rotate in a certain angle between the horizontal direction and the vertical direction.
The front camera 207 can collect image information within its current visual field range; after the image information is obtained, the face recognition module can be called to perform face recognition, so as to determine whether a face exists and how many faces there are.
The depth camera 210 may be configured to, after the face recognition module recognizes a face, start detecting depth information of each recognized face, and then determine position information of each face according to the depth information.
The chassis 214 may be used to control chassis movement, such as turning in a horizontal direction or walking back and forth and side to side. The chassis rotates in the horizontal direction without angle limitation, and the obstacle avoidance in the moving process can be realized by combining the laser radar 213 when the chassis moves forwards, backwards, leftwards and rightwards.
In one embodiment of the present disclosure, the robot has two states: one is the interactive state, i.e. the robot interacts with the person; the other is a standby state, i.e. the robot waits to interact with a person.
When the robot receives a wake-up word, the robot is activated and switches from the standby state to the interactive state; requiring a wake-up word prevents a user from entering the interaction by mistake. When the robot is already in the interactive state and receives the wake-up word again, the wake-up word is not processed.
During the interaction, after the robot receives a dialogue instruction, it first adjusts its pose according to the current sound source direction of the dialogue instruction so that the head of the robot faces the interactive object; the received dialogue instruction is then parsed by the parsing module and the corresponding dialogue behavior is executed according to the parsing result; the robot then waits to receive the next dialogue instruction and repeats these steps.
When a preset ending condition is met during the interaction, or the robot receives an end word, the robot switches from the interactive state to the standby state.
It should be noted that, in order to prevent dialogue instructions from accumulating during pose adjustment and affecting human-computer interaction performance, dialogue instructions received while the pose of the head and/or the chassis is being adjusted may be left unprocessed.
Receiving a dialogue instruction and executing the corresponding dialogue action is regarded as one dialogue period, so one interaction process of the robot comprises a plurality of dialogue periods. When each dialogue period ends, the robot can determine the interactive object of the current dialogue period according to the current sound source direction of the dialogue instruction and adjust its pose; within each dialogue period, the robot can adjust its head pose to follow the interactive object of the current dialogue period.
When the robot is in the standby state, the head and the chassis of the robot may both be in their initial positions, or may keep the poses they had in the interactive state, or the chassis may keep its pose from the interactive state while the head is restored to its initial position.
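For illustration only, the following minimal Python sketch models the standby/interactive state handling described above; the class, the state names, and the literal wake-up and end words are assumptions for the example and are not part of the patent.

```python
from enum import Enum, auto

class State(Enum):
    STANDBY = auto()
    INTERACTIVE = auto()

class InteractionStateMachine:
    """Hypothetical sketch of the two-state behavior described above."""

    def __init__(self, wake_word="hello robot", end_word="goodbye"):
        self.state = State.STANDBY
        self.wake_word = wake_word   # assumed wake-up word
        self.end_word = end_word     # assumed end word

    def on_speech(self, text: str) -> None:
        if self.state is State.STANDBY:
            if text == self.wake_word:          # wake-up word activates the robot
                self.state = State.INTERACTIVE
            return                               # other speech is ignored in standby
        if text == self.wake_word:               # wake-up word repeated in interaction
            return                               # is not processed
        if text == self.end_word:                # end word switches back to standby
            self.state = State.STANDBY
            return
        self.handle_dialogue_instruction(text)   # one dialogue period: adjust pose, then act

    def handle_dialogue_instruction(self, text: str) -> None:
        # Placeholder: adjust the head/chassis toward the sound source,
        # parse the instruction, and execute the corresponding behavior.
        pass
```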
Step S1, after receiving the dialogue information, acquiring the current sound source direction corresponding to the dialogue information and the current pose of the robot.
In one embodiment of the present disclosure, the dialog information may be a wakeup word spoken by the user or a dialog instruction spoken by the user during the interaction process. That is, the robot acquires the corresponding current sound source direction and the current pose of the robot after hearing the wake-up word or the dialogue instruction, so as to adjust the pose of the head and/or the chassis.
It should be noted that the dialog information does not include the end word, because the robot directly ends the interaction after receiving the end word, at this time, it is not necessary to match the head of the robot with the current sound source direction.
In one embodiment of the present disclosure, the current sound source direction includes: the dialogue information corresponds to a current sound source horizontal angle and a current sound source vertical angle of a sound source relative to the robot head.
Specifically, when acquiring the current sound source direction, a sound test board may be built into the head of the robot, and the direction of the sound source of the dialogue information relative to the sound test board may be calculated by the sound test board and used as the current sound source direction.
Preferably, a three-dimensional space coordinate system is established by taking a point such as the center of mass or center of the robot head, or the front camera, as the origin o, the horizontal plane where the robot main body is located as the xoy plane, the vertical direction of the robot main body as the z-axis, the direction directly in front of the robot main body as the y-axis, and the direction to the left of the robot main body as the x-axis. The included angle between the current sound source direction and the yoz plane is calculated to obtain the current sound source horizontal angle, and the included angle between the current sound source direction and the xoy plane is calculated to obtain the current sound source vertical angle.
Both the current sound source horizontal angle and the current sound source vertical angle are signed; for example, a current sound source vertical angle of +15° means the current sound source direction is tilted 15° toward the positive z-axis of the coordinate system.
It should be noted that the origin of the spatial three-dimensional coordinate system may also be other parts of the robot, and only the current sound source position and the current pose need to be expressed relative to the coordinate system.
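As an illustration of the coordinate convention above, the following Python sketch converts a sound source direction vector expressed in the head frame (x to the left, y straight ahead, z up) into the signed horizontal and vertical angles; the function name and vector representation are assumptions, not part of the patent.

```python
import math

def source_angles(x: float, y: float, z: float):
    """Return (horizontal_deg, vertical_deg) of a direction vector in the head frame."""
    norm = math.sqrt(x * x + y * y + z * z)
    horizontal = math.degrees(math.asin(x / norm))  # included angle with the yoz plane
    vertical = math.degrees(math.asin(z / norm))    # included angle with the xoy plane
    return horizontal, vertical

# Example: a source slightly to the left of and above the head
print(source_angles(0.3, 1.0, 0.2))  # roughly (16.4, 10.8) degrees
```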
In one embodiment of the present disclosure, the current pose includes: a current head pose and a current chassis direction, wherein the current head pose comprises a current head vertical angle and a current head horizontal angle.
Specifically, since the head of the robot is rotatable in both the horizontal and vertical directions, the head pose includes these two angles, i.e., the current head vertical angle and the current head horizontal angle. The current head vertical angle and horizontal angle are acquired through the degree-of-freedom joint information of the robot.
The chassis of the robot rotates on the xoy plane, so the pose information of the chassis is the current chassis direction, namely the y-axis direction.
Similarly, the head vertical angle, the current head horizontal angle, and the current chassis direction are also divided by plus or minus.
Step S2, calculating the first pose adjustment information according to the current sound source direction and the current pose.
In one embodiment of the present disclosure, the pose adjustment includes an adjustment of the robot in the horizontal direction and an adjustment in the vertical direction to match the head of the robot with the current sound source direction. The robot can be adjusted in the horizontal direction by rotating the head or the chassis at an angle in the horizontal direction, and can be adjusted in the vertical direction only by adjusting the head.
Therefore, the first pose adjustment information includes head pose adjustment information and/or chassis horizontal direction adjustment information, where the head pose adjustment information in turn includes head horizontal direction adjustment information and head vertical direction adjustment information.
It should be noted that the head of the robot is configured with a rotatable range, i.e. the head can only rotate within a certain horizontal angle and a certain vertical angle. For example, the head of the robot may rotate within [-60°, 60°] in the horizontal direction, i.e. 60° to the left and 60° to the right, and within [-30°, 30°] in the vertical direction, i.e. a pitch angle of 30° and an elevation angle of 30°. Setting a rotation range for the head makes the skeletal motion of the robot head closer to the real motion of a human and prevents the head from being excessively twisted.
In an embodiment of the present disclosure, calculating the first pose adjustment information according to the current sound source direction and the current pose specifically includes the following steps:
step S21, when the current sound source horizontal angle is in the rotatable range, calculating the difference between the current sound source horizontal angle and the current head horizontal angle to obtain the adjustment information of the head horizontal direction;
when the current sound source horizontal angle is out of the rotatable range, taking a negative value of the current sound source horizontal angle as adjustment information of the chassis horizontal direction; and
step S22, when the vertical angle of the current sound source is in the rotatable range, calculating the difference value between the vertical angle of the current sound source and the vertical angle of the current head to obtain the adjustment information of the vertical direction of the head;
and when the vertical angle of the current sound source is out of the rotatable range of the head, calculating the difference value between the limit vertical angle close to the current sound source direction and the current head vertical angle to obtain the adjustment information of the head vertical direction.
Next, step S21 and step S22 will be specifically described.
In one embodiment of the present disclosure, step S21 is to calculate the adjustment information of the robot in the horizontal direction. Firstly, judging whether the head or the chassis needs to be adjusted according to the current horizontal angle of the sound source, and then calculating adjustment information.
When the horizontal angle of the current sound source is within the rotatable range, the interactive object corresponding to the current sound source can be placed in the visual field range of the head of the robot through the rotation of the head of the robot, so that the head of the robot is matched with the current sound source in direction. Therefore, the head horizontal direction adjustment information is the difference between the current sound source horizontal angle and the current head horizontal angle.
It should be noted that, since the current sound source horizontal angle and the current head horizontal angle are both positive and negative numbers, the difference is also positive and negative, and corresponds to different rotation directions relative to the coordinate axis, respectively, a positive value is a positive rotation toward the coordinate axis, and a negative value is a negative rotation toward the coordinate axis.
When the horizontal angle of the current sound source is out of the rotatable range, the interactive object corresponding to the current sound source cannot be placed in the visual field range of the head of the robot only through the rotation of the head of the robot, and at the moment, the chassis needs to be rotated, so that the head of the robot is matched with the current sound source direction.
Meanwhile, when the chassis is rotated, the head can be reset to its initial position, so that after the pose adjustment the main body of the robot directly faces the interactive object, which matches real interaction scenarios and enhances the realism and experience of human-computer interaction.
The adjustment information of the robot in the horizontal direction includes chassis horizontal direction adjustment information and head horizontal direction adjustment information. Adjusting information of the horizontal direction of the chassis, namely a negative value of the horizontal angle of the current sound source; the head horizontal direction adjustment information is to reset the head to the initial position, i.e., the negative value of the current head horizontal angle.
Step S22 is to calculate the adjustment information of the robot in the vertical direction. Similarly to step S21, when the current sound source vertical angle is within the head rotatable range, the head may be directly rotated, and the head vertical direction adjustment information is the difference between the current sound source vertical angle and the current head vertical angle.
And when the vertical angle of the current sound source is out of the rotatable range of the head, the robot stops after rotating to the limit position, and the adjustment information of the vertical direction of the head is the difference value between the limit vertical angle close to the current sound source direction and the current vertical angle of the head.
It should be noted that the present disclosure does not specifically limit the execution sequence of step S21 and step S22.
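A minimal Python sketch of the calculation in steps S21 and S22 is given below; the rotatable ranges and the function name are assumptions for illustration, and the chassis adjustment follows the sign convention stated in the text (the negative of the current sound source horizontal angle).

```python
HEAD_H_RANGE = (-60.0, 60.0)  # assumed rotatable range in the horizontal direction (degrees)
HEAD_V_RANGE = (-30.0, 30.0)  # assumed rotatable range in the vertical direction (degrees)

def first_pose_adjustment(src_h, src_v, head_h, head_v):
    """Return (head_h_adj, head_v_adj, chassis_h_adj) in signed degrees."""
    chassis_h_adj = 0.0
    if HEAD_H_RANGE[0] <= src_h <= HEAD_H_RANGE[1]:
        head_h_adj = src_h - head_h              # step S21: rotate the head only
    else:
        chassis_h_adj = -src_h                   # rotate the chassis (negative of source angle)
        head_h_adj = -head_h                     # reset the head to its initial position
    if HEAD_V_RANGE[0] <= src_v <= HEAD_V_RANGE[1]:
        head_v_adj = src_v - head_v              # step S22: rotate the head vertically
    else:
        limit = HEAD_V_RANGE[1] if src_v > HEAD_V_RANGE[1] else HEAD_V_RANGE[0]
        head_v_adj = limit - head_v              # stop at the limit closest to the source
    return head_h_adj, head_v_adj, chassis_h_adj

# Example: source at (+40, +10) with the head currently at (+15, 0)
print(first_pose_adjustment(40.0, 10.0, 15.0, 0.0))  # (25.0, 10.0, 0.0)
```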
Step S3, adjusting the pose of the head and/or chassis based on the first pose adjustment information so that the head of the robot matches the current sound source direction.
In an embodiment of the disclosure, according to the direction and angle in the head horizontal direction adjustment information, the head vertical direction adjustment information, and/or the chassis horizontal direction adjustment information in the pose adjustment information, the motion of the head and/or the chassis of the robot is controlled, and finally the head of the robot is matched with the current sound source direction, that is, an interactive object corresponding to the sound source is in the field of view of a camera in front of the head of the robot.
Adjusting the pose so that the head of the robot matches the current sound source direction enhances the sense of realism during interaction, improves human-computer interaction efficiency, and improves user experience; in addition, by controlling the rotation of the head and the chassis, the pose of the head and/or the chassis can be adjusted in place according to the sound source, which ensures the safety of human-computer interaction; meanwhile, adjusting the pose according to the current sound source immediately after the dialogue information is received improves the accuracy and real-time performance of the pose adjustment and strengthens the robot's sense of its interactive object.
In an embodiment of the disclosure, the parsing module of the robot may also be used to perform content recognition on the voice information; when it is recognized that the interactive action corresponding to the voice information requires the interactive object to operate the display screen, the laser radar of the robot may be used to measure the distance between the interactive object and the robot main body, and the chassis may be moved closer to the interactive object, within a safe distance, to facilitate user operation.
In one embodiment of the present disclosure, during each dialog period in the robot interaction process, the position of the interaction object may change, such as the user walks, squats, and the like. Thus, the head of the robot may adjust the pose of the head to follow the interactive object in real time.
Fig. 3 schematically illustrates a flow chart of another robot interaction method in an exemplary embodiment of the present disclosure. As shown in fig. 3, after adjusting the pose of the head and/or chassis based on the first pose adjustment information, the method further comprises:
step S31, acquiring image information in the current visual field range through a camera of the robot to perform face recognition;
step S32, when a human face is recognized, acquiring the current human face direction corresponding to the human face and the current head pose of the robot in real time;
step S33, calculating second pose adjustment information according to the current face direction and the current head pose;
step S34, adjusting the pose of the head in real time based on the second pose adjustment information so that the head of the robot corresponds to the current face direction.
Next, steps S31 to S34 are explained, respectively:
in one embodiment of the present disclosure, the face recognition in step S31 may be accomplished by a front camera and a face recognition module of the robot.
Specifically, the image information in the current visual field range of the front camera is collected, then a face recognition algorithm of a face recognition module is called to extract facial features of the image information, the extracted features are matched with the facial features of a standard face preset in the algorithm, and then a face recognition result is obtained, wherein the face recognition result comprises the recognized face and the feature matching values of the faces.
For example, features of a standard face, such as an eye-distance range and the proportions of the three facial sections, can be preset according to a machine learning result of face recognition; the feature extraction results are then matched against them to obtain the recognized faces, and each recognized face is scored to obtain a feature matching value, which represents the degree of matching between that face and the standard face.
The face recognition can be realized by adopting the prior art, so the process of the face recognition is not particularly limited by the disclosure.
The face recognition result may be that one face is recognized or that a plurality of faces are recognized. When a single face is recognized, the current face direction is acquired according to the position of that face; when a plurality of faces are recognized, a target face needs to be selected from them to obtain the current face direction.
In one embodiment of the present disclosure, when a plurality of faces are recognized, the method further includes: acquiring feature matching values of each face and a standard face; and selecting the face with the maximum feature matching value as a target face, and configuring the position of the target face as the current face position.
Specifically, the feature matching value corresponding to each face may be obtained from the face recognition result, the feature matching values are sorted, and the face with the largest feature matching value is used as the target face.
In one embodiment of the present disclosure, when a plurality of faces are recognized, the method further includes: acquiring depth information of each face through a depth camera of the robot; calculating a direction of each face relative to the robot head based on the depth information; and selecting a target face according to the angle difference between the direction of each face and the current sound source direction, and configuring the position of the target face as the current face position.
Specifically, the depth camera of the robot is first used to acquire the depth information of each face; then the centroid of each face relative to the robot head, that is, the face position relative to the origin of the three-dimensional coordinate system, is calculated from the depth information; finally, the angle difference between each face direction and the current sound source direction is calculated, and the face direction with the smallest angle difference is taken as the current face direction, i.e., the face closest to the current sound source direction is taken as the target face.
In one embodiment of the present disclosure, when multiple faces are recognized, the above two methods may also be combined.
Specifically, the face with the higher feature matching value may be selected first, and when the feature matching values are the same, the face closest to the current sound source direction may be selected as the target face; alternatively, several faces close to the current sound source direction may be selected first according to the current sound source direction, and the one with the higher feature matching value among them may then be selected as the target face.
Based on the method, the accuracy of interactive object recognition in the current conversation period can be improved when a plurality of faces are recognized.
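As an illustration of the selection strategies above, the following Python sketch ranks recognized faces first by feature matching value and breaks ties by closeness to the current sound source direction; the data layout and function name are assumptions, not part of the patent.

```python
def pick_target_face(faces, src_h, src_v):
    """faces: list of dicts with 'score' (feature matching value) and
    'h'/'v' angles of the face relative to the robot head, in degrees."""
    def angle_gap(face):
        # angular distance between the face direction and the current sound source direction
        return abs(face["h"] - src_h) + abs(face["v"] - src_v)
    # highest feature matching value first; smallest angular gap breaks ties
    return min(faces, key=lambda f: (-f["score"], angle_gap(f)))

faces = [{"score": 0.82, "h": 25.0, "v": 5.0},
         {"score": 0.82, "h": -10.0, "v": 2.0}]
target = pick_target_face(faces, src_h=-8.0, src_v=0.0)
print(target)  # the second face, which is closer to the sound source direction
```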
In step S32, when a face is recognized, a current face direction corresponding to the face and a current head pose of the robot are acquired in real time.
Specifically, the face direction of the target face determined in step S31 is set as the current face direction.
The depth information of the target face is acquired with the depth camera of the robot, and the centroid of the target face relative to the robot head, that is, the face position relative to the origin of the three-dimensional coordinate system, is calculated from the depth information to obtain the face direction. Then, similarly to step S1, the included angle between the face direction and the yoz plane may be calculated based on the established three-dimensional coordinate system to obtain the current face horizontal angle, and the included angle between the current face direction and the xoy plane may be calculated to obtain the current face vertical angle.
The current head pose of the robot can be obtained through the freedom joint information of the robot, and the current head vertical angle and the current head horizontal angle are obtained respectively.
In step S33, second pose adjustment information is calculated from the current face direction and the current head pose.
In one embodiment of the present disclosure, the second pose adjustment information includes head horizontal direction adjustment information and head vertical direction adjustment information. Because the interaction time within one dialogue period is short, in order to avoid wear on the joint parts of the robot, only the head may be adjusted, without adjusting the chassis, when the head of the robot follows the interactive object in real time; after new dialogue information is heard, the head and/or the chassis is adjusted according to the sound source corresponding to that dialogue information.
Therefore, when the current face horizontal angle is outside the rotatable range, the calculation of the head horizontal direction adjustment information is different from step S21. When the current face horizontal angle is out of the rotatable range, the robot stops after rotating to the limit position, and the head horizontal direction adjustment information is the difference value between the limit horizontal angle close to the current face direction and the current head horizontal angle.
When the horizontal angle of the current face is within the rotatable range, the calculation manner of calculating the adjustment information of the horizontal direction of the head and the adjustment information of the vertical direction of the head is similar to that in step S22, except that the current sound source direction is replaced by the current face direction, and the calculation steps are not repeated here.
Similarly, the angles in the head horizontal direction adjustment information and the vertical direction adjustment information are divided into positive and negative values, which indicate the direction and angle of rotation.
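The head-only follow adjustment of steps S33 and S34 can be sketched in Python as follows; the rotatable ranges and the function name are illustrative assumptions.

```python
def second_pose_adjustment(face_h, face_v, head_h, head_v,
                           h_range=(-60.0, 60.0), v_range=(-30.0, 30.0)):
    """Return (head_h_adj, head_v_adj); the chassis is not adjusted here."""
    def clamp(angle, lo, hi):
        return max(lo, min(hi, angle))
    target_h = clamp(face_h, *h_range)  # limit horizontal angle close to the face direction
    target_v = clamp(face_v, *v_range)  # limit vertical angle close to the face direction
    return target_h - head_h, target_v - head_v

# Example: face at (+70, -5) is outside the horizontal range, so the head stops at +60
print(second_pose_adjustment(70.0, -5.0, 20.0, 0.0))  # (40.0, -5.0)
```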
In step S34, the pose of the head is adjusted in real time based on the second pose adjustment information so that the head of the robot corresponds to the current face direction.
In an embodiment of the disclosure, the head of the robot is controlled according to the direction and angle in the head horizontal direction adjustment information and the head vertical direction adjustment information of the second pose adjustment information, so that the head of the robot finally matches the current face direction, that is, the face of the interactive object is in the field of view of the camera at the front of the robot head.
The head of the robot is matched with the current face direction by adjusting the posture of the head of the robot, so that the sense of reality during machine interaction can be enhanced, the man-machine interaction efficiency is improved, and the user experience is improved; meanwhile, only the pose of the robot head is adjusted in the interaction process, so that the abrasion of rotating parts of the robot chassis can be reduced.
In an embodiment of the disclosure, in the interaction process, if the face of the interactive object is in the visual field range of the front camera of the robot, but the time for waiting to receive the dialogue information is long, the interactive object can be reminded to interact. Thus, when the robot is in the interactive state, the method further comprises: acquiring image information in a current visual field range through a camera of the robot so as to perform face recognition and acquire the waiting time of a current conversation period; and when the face is identified and the waiting time is greater than a first preset value, playing a first audio and/or a first picture.
Specifically, when the dialogue information is a wake-up word, the waiting time of the current dialogue period is calculated from the time after the pose of the head and/or the chassis is adjusted; and when the conversation information is a conversation instruction, calculating the waiting time of the current conversation period after the interactive action corresponding to the last conversation instruction is executed.
The first audio and/or the first picture can be used to prompt the user to continue interacting. For example, the first audio may be a TTS prompt and the first picture may be a "question" expression on the robot expression screen; the prompt and the expression may be played together, or only the audio or only the picture may be played.
The first preset value may be set in advance according to the interaction situation, for example to 3 minutes or 5 minutes; if the value is set to 3 minutes, the user is prompted to continue the interaction when the waiting time of the current dialogue period exceeds 3 minutes.
In one embodiment of the present disclosure, when a preset end condition is met during the interaction, or after the robot receives an end word, the robot switches from the interactive state to the standby state. Thus, when the robot is in the interactive state, the method further comprises: acquiring image information in the current visual field range through a camera of the robot so as to perform face recognition and acquire the waiting time of the current dialogue period; and when no face is recognized and the waiting time is greater than a second preset value, or after an end word is received, switching to the standby state.
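A small Python sketch of this waiting-time handling follows; the threshold values and the return labels are assumptions used only to illustrate the roles of the two preset values.

```python
FIRST_PRESET_S = 180.0   # assumed first preset value: 3 minutes
SECOND_PRESET_S = 300.0  # assumed second preset value: 5 minutes

def idle_action(face_recognized: bool, waiting_time_s: float) -> str:
    """Decide what to do while waiting for dialogue information in the interactive state."""
    if face_recognized and waiting_time_s > FIRST_PRESET_S:
        return "play_first_audio_and_picture"  # remind the user to continue interacting
    if not face_recognized and waiting_time_s > SECOND_PRESET_S:
        return "switch_to_standby"             # no one is there; go back to standby
    return "keep_waiting"

print(idle_action(True, 200.0))   # play_first_audio_and_picture
print(idle_action(False, 350.0))  # switch_to_standby
```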
In one embodiment of the present disclosure, when the robot is in a standby state, the method further comprises: identifying an interactive object through a human face and/or voice; and after the interactive object is identified, playing second audio and/or a second picture.
Specifically, in the standby state, the robot may preset a recognition period, and periodically determine whether an interactive object exists in a certain range around the robot by using face recognition, voice recognition, or simultaneous recognition of a face and a voice.
In one embodiment of the present disclosure, the face recognition may be performed by a front camera and a face recognition module of the robot. The method comprises the steps of collecting image information in the current visual field range of a front camera, calling a face recognition algorithm of a face recognition module to extract facial features of the image information, matching the extracted features with facial features of a standard face preset in the algorithm, and further obtaining a face recognition result to judge whether an interactive object exists.
In addition, when the face is identified, the depth information of the identified face is detected by combining a depth camera, then the position information of each face is determined according to the depth information, and the interactive object is judged to be identified when the position is within the preset range.
In one embodiment of the disclosure, the voice recognition may receive voice information through a pickup microphone of the robot, recognize voice content through the parsing module, and determine whether an interactive object exists according to the recognized voice content. For example, if the speech content is a clear and complete utterance or question, it is determined that an interactive object exists.
In one embodiment of the present disclosure, the second audio and/or the second picture may be used to guide the user to start the interaction. For example, the second audio may be a TTS (Text-To-Speech) welcome message, and the second picture may be a "smile" expression on the robot expression screen. After the interactive object is identified, the TTS welcome message and the "smile" expression can be played, or only the audio or only the picture can be played.
Based on the method, whether the potential interaction object exists can be actively monitored in the standby state, and the interaction probability of the robot is improved.
Fig. 4 schematically illustrates a flowchart of a single-person mode robot interaction method in an exemplary embodiment of the present disclosure, and as shown in fig. 4, when the robot recognizes only one face, that is, the robot adopts the single-person mode, the robot interaction method specifically includes the following steps:
step S401, the robot is in the standby state; when a face is recognized, step S402 is executed to play TTS welcome words and display an expression, and when no face is recognized, step S403 is executed to judge whether a wake-up word is heard;
when the wake-up word is heard, the robot switches from the standby state to the interactive state, and step S404 is executed to determine the sound source direction of the wake-up word and the current pose of the robot and to calculate the pose adjustment information; step S405 is then executed to judge whether the angle that needs to be adjusted is within the head rotation range; if it is not, step S406 is executed to rotate the chassis according to the sound source position, and if it is, step S407 is executed to adjust the head directly in the horizontal and vertical directions; step S408 is then executed, and after the adjustment the robot directly faces the interacting person and starts the interaction;
then step S409 is executed to judge whether the user stays within the field of view, that is, whether a face is recognized; if not, step S410 is executed to play a TTS prompt, and if so, step S411 is executed to continue interacting with the user, during which the head of the robot is adjusted so that it matches the face position; finally, step S412 is executed, and the interaction ends if the user says the end word or a certain time is exceeded.
Fig. 5 schematically illustrates a flowchart of a multi-person mode robot interaction method in an exemplary embodiment of the present disclosure, and as shown in fig. 5, when a robot recognizes a plurality of faces, a multi-person mode may be adopted, and the robot interaction method specifically includes the following steps:
step S501, when the wake-up word is heard, the robot switches from the standby state to the interactive state; step S502 is executed to recognize faces, and when faces are recognized, step S503 is executed to judge whether there are multiple faces;
if a plurality of faces are identified, step S504 is executed to select the face with the highest face score, namely the face with the highest feature matching value calculated by the face recognition algorithm, as the interactive object; if only one face is identified, that face is directly used as the interactive object;
after the interactive object is determined, step S505 is executed to judge whether the angle that needs to be adjusted is within the head rotation range; if it is not, step S506 is executed to rotate the chassis according to the sound source position, and if it is, step S507 is executed to adjust the head directly in the horizontal and vertical directions; step S508 is then executed, and the robot directly faces the interactive object and starts the interaction, where, if multiple people were identified, the interactive object faced is the person with the highest face score;
then step S509 is executed to judge whether the user stays within the field of view; if not, step S510 is executed to play TTS prompt words to prompt the user to interact, and if so, step S511 is executed to continue the interaction with the user, during which the head of the robot can be adjusted so that it matches the face position; finally, step S512 is executed, and the interaction ends if the user says the end word or a certain time is exceeded.
With this method, after a conversation period ends, the pose of the head and/or the chassis of the robot can be adjusted according to the sound source direction of the dialogue information, which ensures that the user corresponding to the sound source is within the field of view of the robot's front camera at the start of each conversation period; during a conversation period, the face direction can be determined from the recognized face depth information and the pose of the robot head adjusted accordingly, so that the head of the robot matches the face direction throughout the conversation period. Therefore, compared with existing commercial robots, the robot gives the interactive object a stronger sense of affinity during the interaction, improves interaction efficiency, and enhances the user experience.
In addition, corresponding interaction strategies are configured for the cases where a single face or multiple faces are identified, and when multiple faces are identified the interactive object of the current conversation period can be determined according to the feature matching values and/or the proximity of each face to the sound source direction, so that the robot is more widely applicable and the accuracy of interactive-object identification is improved.
Fig. 6 schematically illustrates a composition diagram of a robot interaction device in an exemplary embodiment of the present disclosure, and as shown in fig. 6, the robot interaction device 600 may include a listening module 601, a calculating module 602, and an adjusting module 603. Wherein:
the monitoring module 601 is configured to, after receiving the dialogue information, obtain a current sound source direction corresponding to the dialogue information and a current pose of the robot;
a calculating module 602, configured to calculate first pose adjustment information according to the current sound source direction and the current pose;
an adjusting module 603, configured to adjust a pose of the head and/or the chassis based on the first pose adjustment information, so that the head of the robot matches the current sound source direction.
According to an exemplary embodiment of the present disclosure, the head is configured with a rotatable range, and the first posture adjustment information includes head horizontal direction adjustment information or chassis horizontal direction adjustment information, together with head vertical direction adjustment information. The calculation module 602 includes a horizontal direction calculation unit and a vertical direction calculation unit (not shown in the figure). The horizontal direction calculation unit is configured to calculate the difference between the current sound source horizontal angle and the current head horizontal angle to obtain the head horizontal direction adjustment information when the current sound source horizontal angle is within the rotatable range, and to take a negative value of the current sound source horizontal angle as the chassis horizontal direction adjustment information when the current sound source horizontal angle is outside the rotatable range. The vertical direction calculation unit is configured to calculate the difference between the current sound source vertical angle and the current head vertical angle to obtain the head vertical direction adjustment information when the current sound source vertical angle is within the rotatable range, and to calculate the difference between the limit vertical angle close to the current sound source direction and the current head vertical angle to obtain the head vertical direction adjustment information when the current sound source vertical angle is outside the rotatable range of the head.
According to an exemplary embodiment of the present disclosure, when the current sound source horizontal angle is outside the rotatable range, the first posture adjustment information further includes head horizontal direction adjustment information, and the horizontal direction calculation unit is further configured to take a negative value of the current head horizontal angle as the head horizontal direction adjustment information, so that the head of the robot matches the current sound source direction.
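A sketch of the calculation performed by the horizontal and vertical direction calculation units, including the additional head re-centering described in the last paragraph, is given below. It is for illustration only; the rotatable-range limits are assumed example values and are not taken from the disclosure:

```python
# Illustrative sketch of the calculation units' logic (angles in degrees).
# The rotatable-range limits below are assumed example values.
HEAD_H_RANGE = (-60.0, 60.0)   # assumed horizontal rotatable range of the head
HEAD_V_RANGE = (-20.0, 30.0)   # assumed vertical rotatable range of the head

def first_posture_adjustment(sound_h: float, sound_v: float,
                             head_h: float, head_v: float) -> dict:
    adj = {}
    # Horizontal: adjust the head if the sound source is reachable, otherwise the chassis.
    if HEAD_H_RANGE[0] <= sound_h <= HEAD_H_RANGE[1]:
        adj["head_horizontal"] = sound_h - head_h
    else:
        adj["chassis_horizontal"] = -sound_h   # negative of the sound source horizontal angle
        adj["head_horizontal"] = -head_h       # re-center the head, as described above
    # Vertical: move to the nearest limit when the source is outside the head's range.
    if HEAD_V_RANGE[0] <= sound_v <= HEAD_V_RANGE[1]:
        adj["head_vertical"] = sound_v - head_v
    else:
        limit = HEAD_V_RANGE[0] if sound_v < HEAD_V_RANGE[0] else HEAD_V_RANGE[1]
        adj["head_vertical"] = limit - head_v
    return adj

# Sound source 40 degrees to one side and 10 degrees up, head currently centered:
print(first_posture_adjustment(40.0, 10.0, 0.0, 0.0))
# {'head_horizontal': 40.0, 'head_vertical': 10.0}
# Sound source at 120 degrees, beyond the head's range: rotate the chassis and re-center the head.
print(first_posture_adjustment(120.0, 0.0, 10.0, 0.0))
# {'chassis_horizontal': -120.0, 'head_horizontal': -10.0, 'head_vertical': 0.0}
```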
According to an exemplary embodiment of the present disclosure, after adjusting the pose of the head and/or the chassis based on the first pose adjustment information, the robot interaction device 600 further includes a head following module (not shown in the figure) for acquiring image information within a current field of view through a camera of the robot for face recognition; when a face is identified, acquiring the current face direction corresponding to the face and the current head pose of the robot in real time; calculating second pose adjustment information according to the current face direction and the current head pose; and adjusting the posture of the head in real time based on the second posture adjustment information so as to enable the head of the robot to correspond to the current face direction.
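One update of the head-following loop can be sketched as a pure function, again for illustration only; clamping at the head's rotation limits, described further below, is omitted here for brevity:

```python
def second_pose_adjustment(face_h: float, face_v: float,
                           head_h: float, head_v: float) -> dict:
    """One head-following update: turn the head toward the currently recognized face.
    Handling of the head's rotation limits is omitted in this sketch."""
    return {"head_horizontal": face_h - head_h,
            "head_vertical": face_v - head_v}

# The face has drifted 15 degrees to the left while the head still points 5 degrees right:
print(second_pose_adjustment(-15.0, 0.0, 5.0, 0.0))
# {'head_horizontal': -20.0, 'head_vertical': 0.0}
```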
According to an exemplary embodiment of the present disclosure, when a plurality of faces are recognized, the head following module further includes a first recognition unit configured to obtain a feature matching value between each face and a standard face; and selecting the face with the maximum feature matching value as a target face, and configuring the position of the target face as the current face position.
According to an exemplary embodiment of the present disclosure, when a plurality of faces are recognized, the head following module further includes a second recognition unit for acquiring depth information of each face by a depth camera of the robot; calculating a direction of each face relative to the robot head based on the depth information; and selecting a target face according to the angle difference between the direction of each face and the current sound source direction, and configuring the position of the target face as the current face position.
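A minimal sketch of this selection step, assuming each face direction and the current sound source direction are available as horizontal angles in degrees:

```python
def select_target_face(face_directions_deg: list[float],
                       sound_source_deg: float) -> int:
    """Pick the face whose direction is closest to the current sound source
    direction; returns the index of that face."""
    return min(range(len(face_directions_deg)),
               key=lambda i: abs(face_directions_deg[i] - sound_source_deg))

# Three faces at -30, 5 and 40 degrees; the wake-up word came from about 10 degrees:
print(select_target_face([-30.0, 5.0, 40.0], 10.0))  # 1 -> the face at 5 degrees
```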
According to an exemplary embodiment of the present disclosure, when the current face horizontal angle is outside the rotatable range, the second posture adjustment information includes head horizontal direction adjustment information, and the head following module is further configured to calculate a difference between a limit horizontal angle close to the current face direction and the current head horizontal angle to obtain head horizontal direction adjustment information.
According to an exemplary embodiment of the present disclosure, the robot interaction apparatus 600 further includes an interaction determining module (not shown in the figure), configured to, when the robot is in the interactive state, acquire image information in the current field of view through a camera of the robot to perform face recognition and acquire the waiting time of the current conversation period; play a first audio and/or a first picture when a face is recognized and the waiting time is greater than a first preset value; and switch to the standby state when no face is recognized and the waiting time is greater than a second preset value, or after the end word is received.
According to an exemplary embodiment of the present disclosure, the robot interaction apparatus 600 further includes a standby determining module (not shown in the figure) for recognizing an interaction object through a human face and/or voice when the robot is in a standby state; and after the interactive object is identified, playing second audio and/or a second picture.
The details of each module in the robot interaction apparatus 600 are already described in detail in the corresponding robot interaction method, and therefore are not described herein again.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
In an exemplary embodiment of the present disclosure, there is also provided a storage medium capable of implementing the above-described method. Fig. 7 schematically illustrates a schematic diagram of a computer-readable storage medium in an exemplary embodiment of the disclosure, and as shown in fig. 7, a program product 700 for implementing the above method according to an embodiment of the disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a mobile phone. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided. Fig. 8 schematically shows a structural diagram of a computer system of an electronic device in an exemplary embodiment of the disclosure.
It should be noted that the computer system 800 of the electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments of the present disclosure.
As shown in fig. 8, a computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for system operation are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An Input/Output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output section 807 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 808 including a hard disk and the like; and a communication section 809 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is mounted on the storage section 808 as necessary.
In particular, the processes described above with reference to the flowcharts may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. When the computer program is executed by the Central Processing Unit (CPU) 801, various functions defined in the system of the present disclosure are executed.
It should be noted that the computer readable medium shown in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor, where the names of the units do not in some cases constitute a limitation on the units themselves.
As another aspect, the present disclosure also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (15)

1. A robot interaction method applied to an interactive robot, the robot comprising a main body, a rotatable head and a rotatable chassis arranged on the main body, the method comprising:
after receiving the dialogue information, acquiring the current sound source direction corresponding to the dialogue information and the current pose of the robot;
calculating first pose adjustment information according to the current sound source direction and the current pose;
adjusting the pose of the head and/or chassis based on the first pose adjustment information to match the head of the robot with the current sound source direction.
2. The robot interaction method of claim 1, wherein the current sound source direction comprises: the dialogue information corresponds to a current sound source horizontal angle and a current sound source vertical angle of a sound source relative to the robot head.
3. The robotic interaction method of claim 1, wherein the current pose comprises: a current head pose and a current chassis direction, wherein the current head pose comprises a current head horizontal angle and a current head vertical angle.
4. The robot interaction method according to claim 1, the head being configured with a rotatable range, the first pose adjustment information including head horizontal direction adjustment information or chassis horizontal direction adjustment information, and head vertical direction adjustment information, wherein the calculating first pose adjustment information from the current sound source direction and the current pose comprises:
when the current sound source horizontal angle is within the rotatable range, calculating the difference value between the current sound source horizontal angle and the current head horizontal angle to obtain the head horizontal direction adjustment information;
when the current sound source horizontal angle is out of the rotatable range, taking a negative value of the current sound source horizontal angle as adjustment information of the chassis horizontal direction; and
when the vertical angle of the current sound source is within the rotatable range, calculating the difference value between the vertical angle of the current sound source and the vertical angle of the current head to obtain the adjustment information of the vertical direction of the head;
and when the vertical angle of the current sound source is out of the rotatable range of the head, calculating the difference value between the limit vertical angle close to the current sound source direction and the current head vertical angle to obtain the adjustment information of the head vertical direction.
5. The robot interaction method according to claim 4, wherein the first pose adjustment information further includes head horizontal direction adjustment information when a current sound source horizontal angle is outside the rotatable range, the method further comprising:
and taking a negative value of the current head horizontal angle as head horizontal direction adjustment information so as to enable the head of the robot to be matched with the current sound source direction.
6. The robotic interaction method of claim 1, wherein after adjusting the pose of the head and/or chassis based on the first pose adjustment information, the method further comprises:
acquiring image information in a current visual field range through a camera of the robot so as to perform face recognition;
when a face is identified, acquiring the current face direction corresponding to the face and the current head pose of the robot in real time;
calculating second pose adjustment information according to the current face direction and the current head pose;
and adjusting the posture of the head in real time based on the second posture adjustment information so as to enable the head of the robot to correspond to the current face direction.
7. The robot interaction method of claim 6, wherein the current face direction comprises: the face is relative to the current face horizontal angle and the current face vertical angle of the robot head.
8. A robot interaction method as claimed in claim 6, wherein, when a plurality of faces are identified, the method further comprises:
acquiring feature matching values of each face and a standard face;
and selecting the face with the maximum feature matching value as a target face, and configuring the position of the target face as the current face position.
9. A robot interaction method as claimed in claim 6, wherein, when a plurality of faces are identified, the method further comprises:
acquiring depth information of each face through a depth camera of the robot;
calculating a direction of each face relative to the robot head based on the depth information;
and selecting a target face according to the angle difference between the direction of each face and the current sound source direction, and configuring the position of the target face as the current face position.
10. The robot interaction method according to claim 6, wherein the second posture adjustment information includes head horizontal direction adjustment information when the current face horizontal angle is out of the rotatable range, the method further comprising:
and calculating the difference value between the limit horizontal angle close to the current face direction and the current head horizontal angle to obtain the head horizontal direction adjustment information.
11. A robot interaction method according to claim 1, characterized in that, while the robot is in an interactive state, the method further comprises:
acquiring image information in a current visual field range through a camera of the robot so as to perform face recognition and acquire the waiting time of a current conversation period;
when the face is identified and the waiting time is greater than a first preset value, playing a first audio and/or a first picture;
when the face is not recognized and the waiting time is greater than a second preset value, or after receiving the end word, switching to a standby state.
12. The robot interaction method of claim 1, wherein while the robot is in a standby state, the method further comprises:
identifying an interactive object through a human face and/or voice;
and after the interactive object is identified, playing second audio and/or a second picture.
13. A robot interaction device applied to an interactive robot, the robot comprising a main body, a head part and a rotatable chassis, the head part and the rotatable chassis being provided on the main body, the robot interaction device comprising:
the monitoring module is used for acquiring the current sound source direction corresponding to the dialogue information and the current pose of the robot after the dialogue information is received;
the calculation module is used for calculating first pose adjustment information according to the current sound source direction and the current pose;
and the adjusting module is used for adjusting the pose of the head and/or the chassis based on the first pose adjusting information so as to enable the head of the robot to be matched with the current sound source direction.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the robot interaction method according to any one of claims 1 to 12.
15. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the robot interaction method of any one of claims 1 to 12.
CN202011579071.3A 2020-12-28 2020-12-28 Robot interaction method and device, storage equipment and electronic equipment Pending CN112711331A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011579071.3A CN112711331A (en) 2020-12-28 2020-12-28 Robot interaction method and device, storage equipment and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011579071.3A CN112711331A (en) 2020-12-28 2020-12-28 Robot interaction method and device, storage equipment and electronic equipment

Publications (1)

Publication Number Publication Date
CN112711331A true CN112711331A (en) 2021-04-27

Family

ID=75545749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011579071.3A Pending CN112711331A (en) 2020-12-28 2020-12-28 Robot interaction method and device, storage equipment and electronic equipment

Country Status (1)

Country Link
CN (1) CN112711331A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030055532A1 (en) * 2001-08-22 2003-03-20 Yoshiaki Sakagami Autonomous action robot
CN104934033A (en) * 2015-04-21 2015-09-23 深圳市锐曼智能装备有限公司 Control method of robot sound source positioning and awakening identification and control system of robot sound source positioning and awakening identification
CN106292732A (en) * 2015-06-10 2017-01-04 上海元趣信息技术有限公司 Intelligent robot rotating method based on sound localization and Face datection
CN105116920A (en) * 2015-07-07 2015-12-02 百度在线网络技术(北京)有限公司 Intelligent robot tracking method and apparatus based on artificial intelligence and intelligent robot
CN106203259A (en) * 2016-06-27 2016-12-07 旗瀚科技股份有限公司 The mutual direction regulating method of robot and device
CN107297745A (en) * 2017-06-28 2017-10-27 上海木爷机器人技术有限公司 voice interactive method, voice interaction device and robot
CN110238844A (en) * 2019-04-30 2019-09-17 北京云迹科技有限公司 Robot turns round processing method and processing device
CN110555507A (en) * 2019-10-22 2019-12-10 深圳追一科技有限公司 Interaction method and device for virtual robot, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI XIANHUA, TAN SHILI, ZHANG JUN: "Coordinated Operation of Modular Dual Arms of Service Robots (服务机器人模块化双臂的协调操作)", 31 March 2016, Beijing: National Defense Industry Press, pages: 25 - 26 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113601504A (en) * 2021-08-04 2021-11-05 之江实验室 Robot limb action control method and device, electronic device and storage medium
CN113997285A (en) * 2021-10-28 2022-02-01 国汽朴津智能科技(合肥)有限公司 Robot head control method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN112379812B (en) Simulation 3D digital human interaction method and device, electronic equipment and storage medium
CN102903362B (en) Integrated this locality and the speech recognition based on cloud
JP7038210B2 (en) Systems and methods for interactive session management
EP3639051B1 (en) Sound source localization confidence estimation using machine learning
JP5456832B2 (en) Apparatus and method for determining relevance of an input utterance
US20190188903A1 (en) Method and apparatus for providing virtual companion to a user
WO2017215297A1 (en) Cloud interactive system, multicognitive intelligent robot of same, and cognitive interaction method therefor
CN108363706A (en) The method and apparatus of human-computer dialogue interaction, the device interacted for human-computer dialogue
CN112667068A (en) Virtual character driving method, device, equipment and storage medium
CN106157956A (en) The method and device of speech recognition
WO2019161241A1 (en) System and method for identifying a point of interest based on intersecting visual trajectories
Johansson et al. Opportunities and obligations to take turns in collaborative multi-party human-robot interaction
CN108498102B (en) Rehabilitation training method and device, storage medium and electronic equipment
CN112711331A (en) Robot interaction method and device, storage equipment and electronic equipment
CN111354434A (en) Electronic device and method for providing information
CN110737335B (en) Interaction method and device of robot, electronic equipment and storage medium
CN110794964A (en) Interaction method and device for virtual robot, electronic equipment and storage medium
KR20190053001A (en) Electronic device capable of moving and method for operating thereof
CN110942779A (en) Noise processing method, device and system
TWI777229B (en) Driving method of an interactive object, apparatus thereof, display device, electronic device and computer readable storage medium
CN109640224A (en) A kind of sound pick-up method and device
KR20200085696A (en) Method of processing video for determining emotion of a person
WO2019155735A1 (en) Information processing device, information processing method, and program
CN112632349A (en) Exhibition area indicating method and device, electronic equipment and storage medium
Karpov et al. An assistive bi-modal user interface integrating multi-channel speech recognition and computer vision

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Information Technology Co.,Ltd.

Address before: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Shuke Haiyi Information Technology Co., Ltd