CN110794964A - Interaction method and device for virtual robot, electronic equipment and storage medium


Info

Publication number: CN110794964A
Application number: CN201911007922.4A
Authority: CN (China)
Prior art keywords: head, virtual robot, audio data, user, target user
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 刘炫鹏
Original Assignee: Shenzhen Chase Technology Co Ltd
Current Assignee: Shenzhen Chase Technology Co Ltd; Shenzhen Zhuiyi Technology Co Ltd
Application filed by Shenzhen Chase Technology Co Ltd

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 — Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 — Arrangements for interaction with the human body, e.g. for user immersion in virtual reality

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses an interaction method and apparatus for a virtual robot, an electronic device, and a storage medium. The method comprises the following steps: acquiring, in real time, audio data of a target user corresponding to a current session process; determining the spatial position of the target user's head according to the audio data; acquiring a pose adjustment parameter sequence for the head of the virtual robot according to the spatial position, the pose adjustment parameter sequence comprising a plurality of pose adjustment parameters; gradually adjusting the head pose of the virtual robot based on the pose adjustment parameter sequence until the head of the virtual robot faces the target user, and generating an image sequence composed of multiple consecutive frames of pose images of the virtual robot; and generating and outputting, from the image sequence, an interactive video containing the change in the virtual robot's head pose. With this method and apparatus, the head pose of the virtual robot can be dynamically adjusted according to the position of the user's head, so that the virtual robot in the interactive video always faces the user, improving the naturalness of human-computer interaction.

Description

Interaction method and device for virtual robot, electronic equipment and storage medium
Technical Field
Embodiments of the present application relate to the technical field of human-computer interaction, and in particular to an interaction method and apparatus for a virtual robot, an electronic device, and a storage medium.
Background
With the development of artificial intelligence technology, the interaction capabilities of robot customer service have grown stronger and stronger and can be applied in more and more scenarios, greatly improving interaction efficiency and saving human resources. Face-to-face communication is the most basic mode of communication between people, and users expect something of it even when interacting with robots. However, most current robots have only a single interactive function: they usually just execute corresponding actions based on simple instructions input by the user's voice or touch, which lacks vividness and results in a poor user experience.
Disclosure of Invention
Embodiments of the present application provide an interaction method and apparatus for a virtual robot, an electronic device, and a storage medium, so that the virtual robot in an interactive video can always face the user, optimizing the human-computer interaction experience.
In a first aspect, an embodiment of the present application provides an interaction method for a virtual robot. The method may include: acquiring, in real time, audio data of a target user corresponding to a current session process, the virtual robot interacting one-to-one with the target user during the session process; determining the spatial position of the target user's head according to the audio data; acquiring a pose adjustment parameter sequence for the head of the virtual robot according to the spatial position, the pose adjustment parameter sequence comprising a plurality of pose adjustment parameters; gradually adjusting the head pose of the virtual robot based on the pose adjustment parameter sequence until the head of the virtual robot faces the target user, generating an image sequence consisting of multiple consecutive frames of pose images of the virtual robot; and generating and outputting, from the image sequence, an interactive video containing the change in the virtual robot's head pose.
Optionally, the audio data is collected by two microphones, and determining the spatial position of the target user's head according to the audio data includes: determining the plane coordinates of the sound source corresponding to the audio data according to the difference in audio features between the same audio data as collected by each of the two microphones; and converting the plane coordinates into three-dimensional coordinates in a spatial coordinate system whose origin is the spatial position of the virtual robot's head, taking the three-dimensional coordinates as the spatial position of the target user's head.
Optionally, the audio data is collected by a microphone array composed of at least 3 microphones, and determining the spatial position of the target user's head according to the audio data includes: determining, according to the difference in audio features between the same audio data as collected by each microphone in the array, the three-dimensional coordinates of the sound source in a spatial coordinate system whose origin is the spatial position of the virtual robot's head, and taking the three-dimensional coordinates as the spatial position of the target user's head.
Optionally, before acquiring, in real time, the audio data of the target user corresponding to the current session process, the interaction method further includes: when audio data of a first user is acquired, extracting a first voiceprint feature corresponding to the audio data of the first user; and establishing a first session process corresponding to the first user and binding the first session process with the first voiceprint feature.

Acquiring, in real time, the audio data of the target user corresponding to the current session process then includes: when the current session process is the first session process, taking the first user corresponding to the first session process as the target user corresponding to the current session process, and acquiring the target user's audio data in real time according to the first voiceprint feature.
Optionally, the interaction method of the virtual robot further includes: when the current session process is the first session process, if audio data of a second user is acquired, extracting a second voiceprint feature corresponding to the audio data of the second user; establishing a second session process corresponding to the second user and binding the second session process with the second voiceprint feature; and when the first session process ends or is in a wait-timeout state, switching the current session process to the second session process.

Acquiring, in real time, the audio data of the target user corresponding to the current session process then includes: taking the second user corresponding to the second session process as the target user corresponding to the current session process, and acquiring the target user's audio data in real time according to the second voiceprint feature.
Optionally, determining that the first session process is in a wait-timeout state includes: when no audio data of the first user is being acquired, timing the waiting duration; and when the timed value reaches the set duration and no audio data of the first user has been collected during the timing period, determining that the first session process is in the wait-timeout state, as sketched below.
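Purely as an illustration of this timeout check (the 30-second threshold and the use of a monotonic clock are assumptions introduced here, not values from this application):

```python
import time

WAIT_TIMEOUT_SECONDS = 30.0  # assumed "set duration"

def is_wait_timeout(last_first_user_audio: float) -> bool:
    """True once no audio data of the first user has been collected for the
    set duration, i.e. the first session process has entered the wait-timeout
    state; `last_first_user_audio` is a time.monotonic() timestamp."""
    return (time.monotonic() - last_first_user_audio) >= WAIT_TIMEOUT_SECONDS
```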
Optionally, acquiring the pose adjustment parameter sequence for the head of the virtual robot according to the spatial position includes: acquiring, according to the spatial position, the target pose parameter of the virtual robot's head when it faces the target user; acquiring the current pose parameter of the virtual robot's head; and determining the pose adjustment parameter sequence based on the current pose parameter and the target pose parameter.
Optionally, after generating and outputting the interactive video containing the change in the virtual robot's head pose, the interaction method further includes: during playback of the interactive video, if a change in the spatial position of the target user's head is detected, acquiring, according to the changed spatial position, a target updated pose parameter of the virtual robot's head when it faces the target user; when the difference between the target updated pose parameter and the target pose parameter is larger than a preset value, determining a new pose adjustment parameter sequence based on the current pose parameter of the virtual robot's head and the target updated pose parameter; gradually adjusting the head pose of the virtual robot based on the new pose adjustment parameter sequence until the head of the virtual robot faces the target user, generating a new image sequence; and generating and outputting, from the new image sequence, an interactive update video containing the change in the virtual robot's head pose, the interactive update video being played in place of the interactive video.
Optionally, gradually adjusting the head pose of the virtual robot based on the pose adjustment parameter sequence until the head of the virtual robot faces the target user, and generating an image sequence, includes: determining a body adjustment parameter sequence of the virtual robot, corresponding to the pose adjustment parameter sequence, according to the correspondence between the head and the body of the virtual robot and the pose adjustment parameter sequence; and gradually adjusting the head pose and body pose of the virtual robot based on the pose adjustment parameter sequence and the body adjustment parameter sequence until the head and body of the virtual robot face the target user, generating the image sequence.
In a second aspect, an embodiment of the present application provides an interaction apparatus for a virtual robot, which may include: an audio acquisition module for acquiring, in real time, audio data of a target user corresponding to a current session process, the virtual robot interacting one-to-one with the target user during the session process; a position determination module for determining the spatial position of the target user's head according to the audio data; a parameter acquisition module for acquiring a pose adjustment parameter sequence for the head of the virtual robot according to the spatial position, the sequence comprising a plurality of pose adjustment parameters; an image generation module for gradually adjusting the head pose of the virtual robot based on the pose adjustment parameter sequence until the head of the virtual robot faces the target user, and generating an image sequence consisting of multiple consecutive frames of pose images of the virtual robot; and a video generation module for generating and outputting, from the image sequence, an interactive video containing the change in the virtual robot's head pose.
Optionally, the audio data is collected by two microphones, and the position determination module may be specifically configured to: determine the plane coordinates of the sound source corresponding to the audio data according to the difference in audio features between the same audio data as collected by each of the two microphones; and convert the plane coordinates into three-dimensional coordinates in a spatial coordinate system whose origin is the spatial position of the virtual robot's head, taking the three-dimensional coordinates as the spatial position of the target user's head.

Optionally, the audio data is collected by a microphone array composed of at least 3 microphones, and the position determination module may be specifically configured to: determine, according to the difference in audio features between the same audio data as collected by each microphone in the array, the three-dimensional coordinates of the sound source in a spatial coordinate system whose origin is the spatial position of the virtual robot's head, and take the three-dimensional coordinates as the spatial position of the target user's head.
Optionally, the interaction apparatus of the virtual robot further includes a first feature extraction module and a first feature binding module. The first feature extraction module is used to extract, when audio data of a first user is acquired, a first voiceprint feature corresponding to that audio data; the first feature binding module is used to establish a first session process corresponding to the first user and bind the first session process with the first voiceprint feature.

The audio acquisition module may be specifically configured to: when the current session process is the first session process, take the first user corresponding to the first session process as the target user corresponding to the current session process, and acquire the target user's audio data in real time according to the first voiceprint feature.
Optionally, the interaction apparatus of the virtual robot further includes a second feature extraction module, a second feature binding module, and a process switching module. The second feature extraction module is used to extract, if audio data of a second user is acquired while the current session process is the first session process, a second voiceprint feature corresponding to that audio data; the second feature binding module is used to establish a second session process corresponding to the second user and bind the second session process with the second voiceprint feature; the process switching module is used to switch the current session process to the second session process when the first session process ends or is in a wait-timeout state.

The audio acquisition module may be specifically configured to: take the second user corresponding to the second session process as the target user corresponding to the current session process, and acquire the target user's audio data in real time according to the second voiceprint feature.

Optionally, the process switching module determines that the first session process is in a wait-timeout state by: when no audio data of the first user is being acquired, timing the waiting duration; and when the timed value reaches the set duration and no audio data of the first user has been collected during the timing period, determining that the first session process is in the wait-timeout state.
Optionally, the parameter acquisition module may be specifically configured to: acquire, according to the spatial position, the target pose parameter of the virtual robot's head when it faces the target user; acquire the current pose parameter of the virtual robot's head; and determine the pose adjustment parameter sequence based on the current pose parameter and the target pose parameter.
Optionally, the interaction apparatus of the virtual robot further includes a position detection module, a parameter updating module, an image updating module, and a video updating module. The position detection module is used to acquire, if a change in the spatial position of the target user's head is detected during playback of the interactive video, a target updated pose parameter of the virtual robot's head when it faces the target user, according to the changed spatial position; the parameter updating module is used to determine, when the difference between the target updated pose parameter and the target pose parameter is larger than a preset value, a new pose adjustment parameter sequence based on the current pose parameter of the virtual robot's head and the target updated pose parameter; the image updating module is used to gradually adjust the head pose of the virtual robot based on the new pose adjustment parameter sequence until the head of the virtual robot faces the target user, generating a new image sequence; and the video updating module is used to generate and output, from the new image sequence, an interactive update video containing the change in the virtual robot's head pose, the interactive update video being played in place of the interactive video.
Optionally, the image generation module may be specifically configured to: determine a body adjustment parameter sequence of the virtual robot, corresponding to the pose adjustment parameter sequence, according to the correspondence between the head and the body of the virtual robot and the pose adjustment parameter sequence; and gradually adjust the head pose and body pose of the virtual robot based on the pose adjustment parameter sequence and the body adjustment parameter sequence until the head and body of the virtual robot face the target user, generating the image sequence.
In a third aspect, an embodiment of the present application provides an electronic device, which may include: a memory; one or more processors coupled with the memory; one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more application programs configured to perform the method of the first aspect as described above.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having program code stored therein, where the program code is called by a processor to execute the method according to the first aspect.
The embodiments of the present application provide an interaction method and apparatus for a virtual robot, an electronic device, and a storage medium. Audio data of the target user corresponding to the current session process is acquired in real time, and the spatial position of the target user's head is determined from that audio data, the virtual robot interacting one-to-one with the target user during the session process. A pose adjustment parameter sequence for the head of the virtual robot is then acquired according to the spatial position of the target user's head; the head pose of the virtual robot is gradually adjusted based on the pose adjustment parameter sequence until the head of the virtual robot faces the target user, generating an image sequence composed of multiple consecutive frames of pose images of the virtual robot; and an interactive video containing the change in the virtual robot's head pose is generated from the image sequence and output. In this way, the head pose of the virtual robot can be dynamically adjusted according to the position of the user's head, so that the virtual robot in the interactive video always faces the user, improving the naturalness of human-computer interaction and optimizing the interaction experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, not all of them; all other embodiments and drawings obtained by a person skilled in the art based on the embodiments of the present application without inventive effort fall within the scope of the present application.
Fig. 1 shows a schematic diagram of an application environment suitable for the embodiment of the present application.
Fig. 2 shows a flowchart of an interaction method of a virtual robot according to an embodiment of the present application.
Fig. 3 shows an interaction diagram of an interaction method of a virtual robot provided by an embodiment of the present application.
Fig. 4 shows a flowchart of an interaction method of a virtual robot according to another embodiment of the present application.
Fig. 5 shows a flowchart of the method of step S350 in fig. 4.
Fig. 6 shows a schematic diagram of a spatial coordinate system provided by an embodiment of the present application.
Fig. 7 shows a flowchart of the method of step S360 in fig. 4.
Fig. 8 shows another flowchart of an interaction method of a virtual robot according to an embodiment of the present application.
Fig. 9 is a flowchart illustrating an interaction method of a virtual robot according to another embodiment of the present application.
Fig. 10 shows a method flowchart of step S530 in fig. 9.
Fig. 11 shows a block diagram of an interaction apparatus of a virtual robot according to an embodiment of the present application.
Fig. 12 shows a block diagram of an electronic device, according to an embodiment of the present application, configured to execute the interaction method of a virtual robot of the embodiments of the present application.
Fig. 13 shows a block diagram of a computer-readable storage medium for executing the interaction method of a virtual robot according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art may better understand the technical solutions, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit it.
With the development of artificial intelligence technology, robots have become more and more capable. At present, a robot can interact with a user through voice, text, vision, and other modes; visual interaction mainly consists of displaying diversified cartoon or realistic human figures and designing various body movements, facial expressions, and appearances for the robot. However, the robot's face orientation and head movements are designed in advance by developers, without taking information from the user's side into account. In a real scene, a user's standing position generally changes during interaction rather than staying still directly in front of the robot, or the user's head and the robot's head are at different heights, so the robot's face keeps pointing forward or in some other direction instead of looking at the user's face, which makes for a poor human-computer interaction experience.
Having studied the difficulties of current robot-user interaction and considered the requirements of actual usage scenarios more comprehensively, the inventor proposes the interaction method and apparatus, electronic device, and storage medium for a virtual robot of the present application, whereby the robot can always face the user, realizing face-to-face communication between the user and the robot and optimizing the human-computer interaction experience.
In order to better understand the interaction method, device, electronic device, and storage medium of the virtual robot provided in the embodiments of the present application, an application environment suitable for the embodiments of the present application is described below.
Referring to fig. 1, fig. 1 shows a schematic diagram of an application environment suitable for the embodiments of the present application. The interaction method of the virtual robot provided by the embodiments of the present application can be applied to the polymorphic interaction system 100 shown in fig. 1. The polymorphic interaction system 100 includes an electronic device 101 and a server 102 communicatively coupled to the electronic device 101. The server 102 may be a conventional server or a cloud server, which is not limited here.
In some embodiments, the electronic device 101 may be any of various electronic devices having a display screen and supporting data input, including but not limited to smartphones, tablets, laptop computers, desktop computers, wearable electronic devices, and the like. Specifically, data may be input through a voice module for speech, a character input module for text, an image input module for images, or a video input module for video provided on the electronic device 101, or through a gesture recognition module provided on the electronic device 101 that lets the user interact by gesture.
The electronic device 101 may have a client application installed on it (e.g., an APP or a WeChat applet), through which the user can communicate with the server 102. Specifically, the server 102 runs a corresponding server-side application; the user can register a user account with the server 102 through the client application and communicate with the server 102 based on that account. For example, the user logs into the account in the client application and, based on the account, can input text, voice, images, video, and other information through it. After receiving the user's input, the client application may send it to the server 102, so that the server 102 can receive, process, and store the information, and the server 102 may also return corresponding output information to the electronic device 101.
In some embodiments, the client application may be used to provide customer service and communicate with the user on that basis, and it may interact with the user through a virtual robot. Specifically, the client application may receive information input by the user and respond to it through the virtual robot. The virtual robot is a software program based on visual graphics which, when executed, presents to the user a robot form that simulates biological behavior or thought. The virtual robot may be one that simulates a real person, for example a lifelike robot built from the appearance of the user or of another person, or one with an animated appearance, such as an animal or cartoon character.
In some embodiments, after acquiring the reply information corresponding to the user's input, the electronic device 101 may display a virtual robot image corresponding to the reply information on its display screen or on another connected image output device. While the virtual robot image is played, the audio corresponding to it may be played through a speaker of the electronic device 101 or another connected audio output device, and the text or graphics corresponding to the reply information may be displayed on the display screen of the electronic device 101, realizing polymorphic interaction with the user across image, voice, text, and other channels.
In some embodiments, the means for processing the user's input may also be provided on the electronic device 101 itself, so that the electronic device 101 can interact with the user without establishing communication with the server 102; in this case the polymorphic interaction system 100 may include only the electronic device 101.
The above application environments are only examples for facilitating understanding, and it is to be understood that the embodiments of the present application are not limited to the above application environments.
The interaction method, the interaction device, the electronic device, and the storage medium for the virtual robot provided by the embodiments of the present application will be described in detail through specific embodiments.
Referring to fig. 2, fig. 2 shows a schematic flowchart of an interaction method of a virtual robot provided by an embodiment of the present application; the method may be applied to the above-mentioned electronic device or server. In a specific embodiment, the interaction method of the virtual robot may be applied to the interaction apparatus 900 of the virtual robot shown in fig. 11 and the electronic device 600 shown in fig. 12. The flow shown in fig. 2 is explained in detail below; the interaction method of the virtual robot may specifically include the following steps:
step S210: and acquiring audio data of a target user corresponding to a current conversation process in real time, wherein the virtual robot and the target user interact one to one in the conversation process.
In this embodiment of the application, the electronic device can acquire the audio data of the target user corresponding to the current session process in real time, so as to interact further with the target user according to the audio data. The virtual robot interacts one-to-one with the target user during the session process.
In some embodiments, the current session process is the session process currently running on the electronic device. A session process may be bound to a user to indicate that, during that session process, the virtual robot interacts one-to-one with the bound user; that is, each round of conversation of the virtual robot can only be conducted with a single user. It can be understood that the target user is the bound user corresponding to the currently running session process: in that session process, the virtual robot interacts one-to-one with the target user.
In some embodiments, the electronic device may detect the target user's voice in real time through an audio acquisition device such as a microphone, thereby acquiring the target user's audio data in real time. Specifically, as one way, when the application corresponding to the virtual robot is running in the system foreground of the electronic device, the microphone hardware module of the electronic device may be called to collect the target user's audio data.
Step S220: determining a spatial position of the target user's head from the audio data.
In this embodiment of the application, after obtaining the target user's audio data, the electronic device may determine the spatial position of the target user's head from it, so as to further determine the head pose of the virtual robot according to the position of the target user's head and keep the virtual robot facing the user.
Step S230: acquiring a pose adjustment parameter sequence for the head of the virtual robot according to the spatial position, the pose adjustment parameter sequence comprising a plurality of pose adjustment parameters.
After obtaining the spatial position of the target user's head, the electronic device can acquire the pose adjustment parameter sequence for the virtual robot's head and adjust the head pose accordingly, so that the virtual robot always faces the user. The pose adjustment parameter sequence may comprise a plurality of pose adjustment parameters that can be used to drive the virtual robot's head toward the user.
In some embodiments, since snapping the virtual robot's head directly to the pose facing the user would look visually obtrusive, the pose adjustment parameter sequence may be a time-ordered, continuous set of pose adjustment parameters used to adjust the head pose gradually until the virtual robot's head faces the target user. In some embodiments, a pose adjustment parameter may be an angle by which the virtual robot's head needs to be adjusted, such as a yaw angle or a pitch angle, or it may be the spatial coordinates of the virtual robot's head after adjustment, which is not limited here.
In some embodiments, the electronic device may obtain the current spatial position of the virtual robot's head and, from the acquired spatial position of the target user's head, determine the relative positional relationship between the two, so as to determine the pose adjustment parameter sequence from that relationship. As one embodiment, the electronic device may first judge from the relative positional relationship whether the virtual robot's head faces the target user's head. When it does not, the electronic device obtains the pose adjustment parameter sequence according to the specific relative positional relationship, so that the virtual robot's head can be adjusted to face the target user's head. When the virtual robot's head already faces the target user's head, the electronic device need not acquire the pose adjustment parameter sequence, i.e., need not adjust the virtual robot's head. A sketch of deriving such a sequence is given below.
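As an illustration only, and not the algorithm prescribed by this application, the following Python sketch computes the target yaw and pitch from the target user's head position in the robot-head-centered coordinate system and interpolates a pose adjustment parameter sequence between the current and target poses; the axis convention (x right, y forward, z up) and the step count are assumptions introduced here.

```python
import math

def target_yaw_pitch(head_pos):
    """Yaw/pitch (degrees) orienting the robot head toward head_pos, the
    target user's head in a coordinate system with the robot head at the
    origin. The axis convention is an assumption, not taken from the text."""
    x, y, z = head_pos
    yaw = math.degrees(math.atan2(x, y))                    # left/right turn
    pitch = math.degrees(math.atan2(z, math.hypot(x, y)))   # up/down tilt
    return yaw, pitch

def pose_adjustment_sequence(current, target, steps=25):
    """Time-ordered (yaw, pitch) parameters linearly interpolated from the
    current pose to the target pose; more steps give a smoother head turn."""
    (y0, p0), (y1, p1) = current, target
    return [(y0 + (y1 - y0) * i / steps, p0 + (p1 - p0) * i / steps)
            for i in range(1, steps + 1)]

# Example: turn from a straight-ahead pose toward a user slightly to the right.
# pose_adjustment_sequence((0.0, 0.0), target_yaw_pitch((0.3, 1.0, 0.1)))
```

Linear interpolation is only the simplest choice; an easing curve that slows the turn at its start and end would give a more lifelike motion.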
Step S240: gradually adjusting the head pose of the virtual robot based on the pose adjustment parameter sequence until the head of the virtual robot faces the target user, and generating an image sequence composed of multiple consecutive frames of pose images of the virtual robot.
In this embodiment, to make the virtual robot's head-pose transition less abrupt, the electronic device may gradually adjust the head pose based on the pose adjustment parameter sequence until the virtual robot's head faces the target user, generating an image sequence composed of multiple consecutive frames of pose images of the virtual robot. The head pose of the virtual robot is thus adjusted dynamically according to the position of the target user's head, so that the virtual robot always faces the user. Note that the more pose adjustment parameters the sequence contains, the finer the adjustment steps, the smoother the head-pose transition, and the better the visual effect presented.
In some embodiments, the virtual robot may be generated from a 3D (three-dimensional) model. The electronic device can drive the head of the virtual robot step by step through the pose adjustment parameters in the sequence, i.e., drive the head model in the virtual robot's 3D model to present successive different poses.
In other embodiments, the virtual robot may instead be generated by deep learning techniques. As one way, when the pose adjustment parameter is the angle by which the virtual robot's head needs to be adjusted, the electronic device may feed the pose adjustment parameter sequence into a trained deep generative model and obtain its output, which may be the pose parameters of the adjusted head; the electronic device can then generate the adjusted head of the virtual robot from that output. It should be understood that the present application does not limit how the head pose of the virtual robot is adjusted; an appropriate method can be chosen according to the specific application scenario and the processing capability of the electronic device.
In this embodiment of the application, while gradually adjusting the head pose of the virtual robot so that the head presents successive different poses, the electronic device may generate the corresponding image sequence from the poses presented one by one, the image sequence being composed of multiple consecutive frames of pose images of the virtual robot.
In some embodiments, the head model of the virtual robot has a plurality of 3D-model keypoints corresponding to different feature positions on the head model. These keypoints may form a set describing all or part of the head model's form, recording the position of each keypoint on the head model in three-dimensional space. For example, for the facial features of the head, a number of spaced points on the contour line of the face part of the head model may be taken as the keypoints describing the face contour. In this way, when the head model is driven by the pose adjustment parameters, a delicate, vivid, and realistic pose image of the virtual robot can be obtained, and an image sequence can be generated from consecutive frames of such pose images, as sketched below.
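The following loop is a minimal sketch of this step. `render_head_pose` is a hypothetical stand-in for whatever 3D engine or deep generative model actually renders the virtual robot; it is not an API named in this application.

```python
def generate_image_sequence(pose_sequence, render_head_pose):
    """Render one pose image per pose adjustment parameter, yielding the
    multi-frame consecutive pose-image sequence behind the interactive video.
    `render_head_pose` is assumed to map (yaw, pitch) to one image frame."""
    frames = []
    for yaw, pitch in pose_sequence:
        frames.append(render_head_pose(yaw=yaw, pitch=pitch))  # one frame per pose
    return frames  # later encoded into the interactive video
```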
Step S250: generating and outputting, from the image sequence, an interactive video containing the change in the virtual robot's head pose.
After obtaining the image sequence, the electronic device can generate and output, from it, an interactive video containing the change in the virtual robot's head pose, in which the virtual robot always faces the target user. In this way the virtual robot, like a person in a real conversation, always speaks toward the user and turns as the user moves, improving the human-computer interaction experience.
In some embodiments, the interactive video may also include audio, which may be the reply audio produced during the virtual robot's conversation with the target user, such as "hello" or "sorry", thereby presenting the user with a simulated appearance, voice, and behavior resembling a real person.
In a specific application scenario, as shown in fig. 3, a user may open an application client (e.g., a WeChat applet or a standalone APP) on the electronic device to enter the interactive interface with the virtual robot. The electronic device can establish a session process bound to the user and, by calling hardware modules such as the microphone, collect the user's audio data in real time, so that the head position of the user can be determined from the audio data and the head pose of the virtual robot adjusted to face the user. The user can thus hold a direct face-to-face conversation with the virtual robot displayed on the interactive interface.
In some embodiments, when the electronic device has established a communication connection with the server, the electronic device may also send the acquired audio data of the target user to the server. The server then determines the target user's head position from the audio data, determines the head pose adjustment parameter sequence of the virtual robot, generates the image sequence, generates from it the interactive video containing the change in the virtual robot's head pose, and outputs the interactive video to the electronic device, which acquires, plays, and displays it.
It can be understood that, in this embodiment, each of the above steps may be performed locally by the electronic device, performed in the server, or divided between the electronic device and the server. Depending on the actual application scenario, tasks can be allocated as required to achieve an optimal immersive robot interaction experience, which is not limited here.
With the interaction method of the virtual robot of this embodiment, the audio data of the target user corresponding to the current session process is acquired in real time, and the spatial position of the target user's head is determined from it, the virtual robot interacting one-to-one with the target user during the session process. A pose adjustment parameter sequence for the virtual robot's head is then acquired according to the spatial position of the target user's head; the head pose of the virtual robot is gradually adjusted based on the pose adjustment parameter sequence until the head faces the target user, generating an image sequence composed of multiple consecutive frames of pose images of the virtual robot; and an interactive video containing the change in the virtual robot's head pose is generated from the image sequence and output. In this way, the user's head position can be determined from the user's audio data and the head pose of the virtual robot dynamically adjusted accordingly, so that the virtual robot in the interactive video always faces the user, improving the naturalness of human-computer interaction and optimizing the interaction experience.
Referring to fig. 4, fig. 4 shows a flowchart of an interaction method of a virtual robot according to another embodiment of the present application. The flow shown in fig. 4 is explained in detail below; the method specifically includes the following steps:
step S310: when the audio data of a first user are acquired, extracting a first voiceprint feature corresponding to the audio data of the first user.
In this embodiment of the application, when the electronic device acquires the audio data of a first user, it can extract the first voiceprint feature corresponding to that audio data, so as to determine from the first voiceprint feature whether interaction with the user is needed. The first user may be any user.
In some embodiments, the electronic device may collect the audio data of the current scene in real time through the microphone and detect whether it contains human speech; when the audio data of a single user is detected, the voiceprint feature of that audio data may be extracted so that the user can subsequently be identified by it. Because each person's pitch, intensity, duration, and timbre all differ, different people's speech presents different voiceprint features on a voiceprint spectrogram, and a user's identity can therefore be determined from the voiceprint features. As one way, the timbre corresponding to the audio data may be extracted as the voiceprint feature.
In this embodiment of the application, when the audio data of the first user is acquired, the electronic device may extract the first voiceprint feature from the audio data in any of several ways. Optionally, the electronic device may obtain the first voiceprint feature using a feature extraction algorithm such as MFCC (Mel-Frequency Cepstral Coefficients) or LPCC (Linear Prediction Cepstral Coefficients). A sketch of the MFCC route follows.
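As a hedged illustration of the MFCC option, one of the algorithms named above rather than a mandated implementation, the following sketch uses the librosa library; the 16 kHz sample rate, the coefficient count, and the time-averaging into a fixed-length vector are simplifying assumptions.

```python
import numpy as np
import librosa

def extract_voiceprint(wav_path: str) -> np.ndarray:
    """Extract a simple MFCC-based voiceprint vector. Production systems
    usually feed MFCCs into a speaker-embedding model; averaging the
    coefficients over time is a simplification assumed here."""
    audio, sr = librosa.load(wav_path, sr=16000)            # assumed 16 kHz
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)  # shape (20, n_frames)
    return mfcc.mean(axis=1)                                # fixed-length vector
```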
In some embodiments, the audio data of the first user may have been acquired before, i.e., the first voiceprint feature of the first user has already been extracted and stored. When the electronic device now acquires the first user's audio data, it can determine, by matching the currently extracted first voiceprint feature against the stored voiceprint features, whether the first user's audio data has been acquired before and hence whether there has been prior interaction with this user. If so, in some embodiments the audio data may be associated with the previous interaction data, so that the virtual robot's historical interaction data with each user can be obtained.
In some embodiments, the audio data of the first user may instead be acquired for the first time. The electronic device may then store the first voiceprint feature extracted this time, so that the first user's audio data can subsequently be recognized by it and the first user's interaction data can be consolidated and analyzed.
Step S320: establishing a first session process corresponding to the first user, and binding the first session process with the first voiceprint feature.
When the electronic device obtains the first voiceprint feature corresponding to the first user's audio data, it can establish a first session process corresponding to the first user and bind the first session process with the first voiceprint feature, thereby binding the first user to the first session process. For example, user A1 says "hello" to the virtual robot; when the electronic device detects user A1's audio data A2 through the microphone, it can extract the voiceprint feature A3 from the audio data A2 and bind the voiceprint feature A3 to the session process A4, thereby binding user A1 to the session process A4.
In some embodiments, the first user may already have a bound session process, i.e., the first user's audio data was acquired before and a voiceprint feature was already bound to a session process. When the electronic device obtains the first voiceprint feature, it can therefore judge whether the first voiceprint feature matches a voiceprint feature in the voiceprint feature library, which stores the voiceprint features of bound session processes. When the first voiceprint feature matches no voiceprint feature in the library, the first user can be regarded as having no bound session process, and the electronic device can establish a first session process corresponding to the first user and bind it with the first voiceprint feature; when the first voiceprint feature does match a voiceprint feature in the library, the first user can be regarded as already bound to a session process, and no new session process needs to be established. Further, in some embodiments, when a session process ends, the voiceprint features bound to it can be deleted from the voiceprint feature library to save storage space. A sketch of this lookup is given below.
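Below is a minimal sketch of this library lookup, assuming the fixed-length voiceprint vectors of the earlier sketch; cosine similarity and the threshold value are illustrative assumptions, not choices stated in this application.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.85  # assumed value, tuned per deployment

def find_bound_session(feature: np.ndarray, library: dict):
    """Return the id of the session process whose stored voiceprint best
    matches `feature`, or None if no entry clears the threshold, in which
    case a new session process is established and bound."""
    best_id, best_sim = None, SIMILARITY_THRESHOLD
    for session_id, stored in library.items():
        sim = float(np.dot(feature, stored) /
                    (np.linalg.norm(feature) * np.linalg.norm(stored) + 1e-9))
        if sim > best_sim:
            best_id, best_sim = session_id, sim
    return best_id
```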
In some embodiments, all the interaction data of each session process may be uploaded to a server for storage, enabling subsequent data analysis and data tracing.
Step S330: when the current session process is the first session process, taking the first user corresponding to the first session process as the target user corresponding to the current session process, and acquiring the target user's audio data in real time according to the first voiceprint feature.
In some embodiments, each round of conversation of the virtual robot can only be conducted with a single user; that is, the electronic device can currently run only one session process (between the virtual robot and a user). Therefore, when the current session process is the first session process, the virtual robot can be regarded as currently interacting with the first user, and the electronic device can take the first user as the target user corresponding to the current session process and acquire the target user's audio data, i.e., the first user's audio data, in real time according to the first voiceprint feature. It can be understood that the virtual robot interacts one-to-one with the target user during the session process.
In some embodiments, multiple users may be present in the scene where the first user interacts with the virtual robot, so the audio data collected by the microphone may be mixed audio data of multiple users. When the virtual robot needs to interact with the first user, the first user's audio data must therefore be extracted from the mixed audio data to ensure accurate interaction. As one way, multiple voiceprint features corresponding one-to-one to the multiple users can be extracted from the mixed audio data and each matched against the first voiceprint feature; the target voiceprint feature that matches the first voiceprint feature is obtained from among them, and the audio data corresponding to the target voiceprint feature, i.e., the first user's audio data, is then extracted from the mixed audio data, realizing accurate interaction between the virtual robot and the user.
In some embodiments, the electronic device may not currently be running any session process, in which case it may interact directly with the first user upon detecting the first user's audio data. Specifically, the electronic device can directly recognize the audio data, extract the voiceprint information in it to bind a session process, and run that session process directly.
Step S340: determining a spatial position of the target user's head from the audio data.
In the embodiments of the present application, the spatial position of the head of the target user may be determined in various ways according to the audio data.
In one embodiment, when the electronic device includes two microphones, the audio data may be collected by the two microphones, and the spatial position of the target user's head may be determined from what they collect. Specifically, determining the spatial position of the target user's head from the audio data may include: determining the plane coordinates of the sound source corresponding to the audio data according to the difference in audio features between the same audio data as collected by each of the two microphones; and converting the plane coordinates into three-dimensional coordinates in a spatial coordinate system whose origin is the spatial position of the virtual robot's head, taking the three-dimensional coordinates as the spatial position of the target user's head. Establishing a spatial coordinate system with the virtual robot's head at the origin makes it easy to determine the relative position of the target user's head.
Specifically, because the two microphones (hereinafter the first microphone and the second microphone) are located at different positions on the electronic device, their positions relative to the sound source, which can be regarded as the target user's head, differ. This difference causes a slight difference between the first audio data collected by the first microphone and the second audio data collected by the second microphone, even though both come from the same sound source. That slight difference can be used to localize the spatial position of the audio source relative to the two microphones and obtain the corresponding spatial position information.
Optionally, the difference in audio features between the same audio data as collected by the two microphones may be a spectral difference, such as a phase difference and/or an amplitude difference at the same time point. The electronic device can determine the plane coordinates of the sound source, i.e., two-dimensional coordinates of the form (x, y), from the audio feature difference across the whole audio data and the relative positions of the two microphones.
Optionally, the electronic device may also determine a first sound-source direction from the first audio data collected by the first microphone, determine a second sound-source direction from the second audio data collected by the second microphone, and determine the plane coordinates of the sound source from the intersection of the two directions. It should be understood that these ways of locating the sound source are only examples, not limitations; the sound source's position merely needs to be determined from the difference in audio features between the same audio data as collected by the two microphones.
After obtaining the plane coordinates of the sound source, the electronic device may convert them into three-dimensional coordinates in the spatial coordinate system and take those as the spatial position of the target user's head. As one embodiment, the electronic device may convert the plane coordinates (x, y) into the three-dimensional coordinates (x, y, z0), where z0 may be any constant set according to the specific application scenario, which is not limited here. For example, z0 may default to 0, so that the target user's head is at the same height as the virtual robot's head. A sketch of this two-microphone localization follows.
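The sketch below illustrates one way to realize this two-microphone localization, assuming a far-field source, a cross-correlation delay estimate standing in for the audio feature difference, and an assumed microphone spacing and source range, none of which this application fixes.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
MIC_SPACING = 0.10      # metres between the two microphones (assumed)

def bearing_from_tdoa(first_mic, second_mic, sample_rate):
    """Estimate the source bearing (radians from the microphone axis) from
    the inter-microphone time delay found by cross-correlation; a far-field
    simplification of the audio feature difference described above."""
    corr = np.correlate(first_mic, second_mic, mode="full")
    lag = int(np.argmax(corr)) - (len(second_mic) - 1)  # delay in samples
    delay = lag / sample_rate
    cos_theta = np.clip(delay * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return float(np.arccos(cos_theta))

def head_position(theta, distance=1.0, z0=0.0):
    """Plane coordinates at an assumed 1 m range, lifted to 3D with the
    constant z0, in the coordinate system whose origin is the robot head."""
    return (distance * np.cos(theta), distance * np.sin(theta), z0)
```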
In some embodiments, when the user is far from the microphones, the audio data received by a microphone may suffer attenuation and noise interference, degrading the audio quality. Therefore, to improve the accuracy of sound source localization, as another embodiment, when the electronic device includes a microphone array composed of at least 3 microphones, the audio data may be collected by the microphone array, so that the spatial position of the head of the target user can be determined accurately from the audio data it collects. Specifically, determining the spatial position of the head of the target user according to the audio data may also include: determining, according to the audio feature difference between the same audio data collected by each microphone in the microphone array, the three-dimensional coordinates of the sound source of the audio data in a space coordinate system, and taking the three-dimensional coordinates as the spatial position of the head of the target user, wherein the space coordinate system takes the spatial position of the head of the virtual robot as its origin. It will be appreciated that the greater the number of microphones, the more accurately the spatial position of the head of the target user can be acquired.
In some embodiments, the electronic device may obtain, from the same audio data collected by the multiple microphones in the microphone array, the sound source direction as observed from each microphone, and may take the position where these sound source directions intersect as the three-dimensional coordinates of the sound source in the space coordinate system, i.e. as the spatial position of the head of the target user.
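As a hedged illustration of one such intersection scheme (not prescribed by this application), the following sketch computes the least-squares intersection point of several bearing lines; it assumes each microphone (or sub-array) position p_i reports a direction d_i toward the source and that the bearings are not all parallel:

```python
import numpy as np

def intersect_bearings(positions, directions):
    """Least-squares intersection of bearing lines: each element of
    `positions` is a microphone (or sub-array) location p_i, each element
    of `directions` a direction d_i toward the source. Returns the point
    minimizing the summed squared distance to all of the lines."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(positions, directions):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)  # projector onto the plane normal to d
        A += P
        b += P @ p
    return np.linalg.solve(A, b)  # three-dimensional sound source coordinates

# Hypothetical check: three bearings that all point at the same source
mics = [np.zeros(3), np.array([0.2, 0.0, 0.0]), np.array([0.0, 0.2, 0.0])]
source = np.array([1.0, 1.4, 0.8])
print(intersect_bearings(mics, [source - p for p in mics]))  # ~ source
```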
In some embodiments, to ensure that the head of the virtual robot can be accurately oriented to the target user, the spatial position of the head of the virtual robot and the spatial position of the head of the target user must be chosen according to the same rule. Optionally, the three-dimensional coordinates of the sound source may be regarded as the spatial position of a certain key point of the mouth of the target user, and then the spatial position of the corresponding key point of the mouth of the virtual robot is correspondingly selected as the spatial position of the head of the virtual robot.
Step S350: and acquiring a posture adjustment parameter sequence of the head of the virtual robot according to the spatial position, wherein the posture adjustment parameter sequence comprises a plurality of posture adjustment parameters.
In some embodiments, referring to fig. 5, the obtaining a sequence of pose adjustment parameters of the head of the virtual robot according to the spatial position may include:
step S351: and acquiring a target posture parameter of the head of the virtual robot when the head of the virtual robot faces the target user according to the spatial position.
In some embodiments, since the spatial position of the head of the virtual robot is the origin of the space coordinate system, the spatial position of the head of the target user may be taken as a direction vector, so that the target posture parameter of the head of the virtual robot when facing the target user can be calculated from that direction vector. The posture parameter may be a 2-dimensional vector whose dimensions are Pitch and Yaw respectively. For example, referring to fig. 6, which shows a schematic diagram of the space coordinate system, let O be the spatial position of the head of the virtual robot and A the spatial position of the head of the target user. The direction vector OA has a projection OA1 on the plane XOZ; the angle ∠A1OA2 between that projection and the X axis (A2 lying on the X axis) is Yaw, and the angle ∠AOA3 between OA and the Y axis (A3 lying on the Y axis) is Pitch.

For example, if the spatial position of the head of the virtual robot is set to O = (0, 0, 0) and the spatial position of the head of the target user is taken to be A = (1, √2, 1), the direction vector OA = (1, √2, 1) may serve as the face orientation of the virtual robot when "looking at" the face of the target user, and the corresponding target posture parameters Pitch and Yaw may be calculated from OA respectively as:

Yaw = arctan(z / x) = arctan(1 / 1) = 45°

Pitch = arccos(y / |OA|) = arccos(√2 / 2) = 45°

namely, the target attitude parameter is (45, 45).
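For illustration, the computation above may be sketched as follows (Python; the function name and argument conventions are assumptions of this sketch):

```python
import math

def target_pose(head_position):
    """(Pitch, Yaw) in degrees for a target head at A = (x, y, z), with the
    head of the virtual robot at the origin O: Yaw is the angle between the
    projection of OA on plane XOZ and the X axis, Pitch the angle between
    OA and the Y axis."""
    x, y, z = head_position
    yaw = math.degrees(math.atan2(z, x))
    pitch = math.degrees(math.acos(y / math.sqrt(x * x + y * y + z * z)))
    return pitch, yaw

print(target_pose((1.0, math.sqrt(2.0), 1.0)))  # ~ (45.0, 45.0)
```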
Step S352: and acquiring the current attitude parameters of the head of the virtual robot.
In some embodiments, when the head pose of the virtual robot changes, the electronic device may record the current pose parameter of the head of the virtual robot in real time, and can therefore acquire it directly. The current attitude parameter corresponds to the target attitude parameter and is likewise a 2-dimensional vector with Pitch and Yaw dimensions.
In other embodiments, the current pose parameter of the head of the virtual robot may be a default pose, which may be the pose of the virtual robot when looking straight ahead. Further, in one approach, when the electronic device is not currently running any session process, the head of the virtual robot may be adjusted back to the default pose.
Step S353: determining a sequence of pose adjustment parameters of the head of the virtual robot based on the current pose parameters and the target pose parameters.
After obtaining the current attitude parameter and the target attitude parameter of the head of the virtual robot, the electronic device may determine the attitude adjustment parameter sequence of the head of the virtual robot based on the two. The sequence of posture adjustment parameters may be understood as the transitional posture adjustment parameters passed through while the posture of the head of the virtual robot is adjusted to face the target user. For example, to change the current pose parameter P_current = (Pitch_current, Yaw_current) into the target pose parameter P_target = (Pitch_target, Yaw_target), the required pose adjustment parameter sequence may be [P_current, P_0, P_1, …, P_k, P_target], where P_0, P_1, …, P_k are the transitional pose adjustment parameters. It can be understood that the more transitional pose adjustment parameters there are, the smoother the motion of the head pose change of the virtual robot (e.g., a transition from a frontal face looking straight ahead to a face turned to the lower left).
In some embodiments, the pose adjustment parameter sequences for common pose adjustment cases may be calculated in advance and stored in a database, so that the electronic device, holding this pre-stored database of pose adjustment parameter sequences, can match an appropriate sequence from the database according to the current pose parameter and the target pose parameter.
In other embodiments, the sequence of pose adjustment parameters may also be generated in real time according to the current pose parameters and the target pose parameters. As an embodiment, the plurality of pose adjustment parameters in the sequence may vary uniformly. For example, if the angle change of the head pose in each dimension is limited to at most 3° per 100 ms, the pose adjustment parameter sequence for changing the head pose of the virtual robot from (0, 90) to (-30, 110) within 1 second may be a sequence of length 11: [(0,90), (-3,92), (-6,94), (-9,96), (-12,98), (-15,100), (-18,102), (-21,104), (-24,106), (-27,108), (-30,110)], i.e. a sequence in which the pose adjustment parameters vary uniformly. Here the change speeds of Pitch and Yaw are 30°/s (i.e., 3°/100 ms) and 20°/s (i.e., 2°/100 ms), respectively. It should be understood that the above determination manner of the pose adjustment parameter sequence is only an example; the specific manner of determining the sequence is not limited in the embodiments of the present application.
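A minimal sketch of such uniform real-time generation, reproducing the example above (the function name and the step-size parameter are illustrative assumptions):

```python
import math

def pose_adjustment_sequence(current, target, max_step=3.0):
    """Uniformly varying sequence of (Pitch, Yaw) pose adjustment
    parameters from `current` to `target`, limiting the change per frame
    in every dimension to max_step degrees (here 3° per 100 ms frame)."""
    deltas = [t - c for c, t in zip(current, target)]
    steps = max(1, math.ceil(max(abs(d) for d in deltas) / max_step))
    return [tuple(c + d * i / steps for c, d in zip(current, deltas))
            for i in range(steps + 1)]

seq = pose_adjustment_sequence((0.0, 90.0), (-30.0, 110.0))
print(len(seq))  # 11 frames, i.e. one second at 100 ms per frame
print(seq[1])    # (-3.0, 92.0)
```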
Step S360: gradually adjusting the head pose of the virtual robot based on the pose adjustment parameter sequence until the head of the virtual robot faces the target user, and generating an image sequence composed of a plurality of frames of successive pose images of the virtual robot.
In the embodiment of the present application, step S360 may refer to the contents of the foregoing embodiments, and is not described herein again.
In some embodiments, the head orientation and body orientation of the virtual robot may be maintained coincident. Specifically, referring to fig. 7, the step of gradually adjusting the head pose of the virtual robot based on the pose adjustment parameter sequence until the head of the virtual robot faces the target user and generating an image sequence may include:
step S361: and determining a body adjustment parameter sequence of the virtual robot according to the corresponding relation between the head and the body of the virtual robot and the posture adjustment parameter sequence, wherein the body adjustment parameter sequence corresponds to the posture adjustment parameter sequence.
In some embodiments, the electronic device may determine a body adjustment parameter sequence of the virtual robot corresponding to the pose adjustment parameter sequence from a head-to-body correspondence of the virtual robot. The body adjustment parameter sequence may include a plurality of body adjustment parameters, and the posture adjustment parameter sequence and the body adjustment parameter sequence need to be aligned in time to ensure one-to-one correspondence.
Step S362: gradually adjusting the head posture and the body posture of the virtual robot based on the posture adjustment parameter sequence and the body adjustment parameter sequence until the head and the body of the virtual robot face the target user, and generating an image sequence.
In some embodiments, to reduce the obtrusiveness of the transition actions of the head pose and the body pose of the virtual robot, the electronic device may gradually adjust the head pose and the body pose of the virtual robot based on the sequence of pose adjustment parameters and the sequence of body adjustment parameters until the head and the body of the virtual robot are directed toward the target user, and generate the sequence of images. Therefore, the head posture and the body posture of the virtual robot are dynamically adjusted according to the head position of the target user, and the face and the body of the virtual robot always face the target user. The specific gradual adjustment manner can refer to the contents in the foregoing embodiments, and is not described herein again.
In some embodiments, the body may also be fixed in a certain orientation rather than turning as the head pose changes. For example, when there are multiple session processes, the body of the virtual robot can be adjusted to face the front while only the head posture is adjusted, so that when switching between different session processes the virtual robot can quickly look at the corresponding bound user.
In some embodiments, at least one of an expression parameter sequence, a limb action parameter sequence, and a mouth shape parameter sequence corresponding to the posture adjustment parameter sequence may also be acquired. The mouth shape parameter sequence is matched to the current reply audio of the virtual robot, and the expression parameter sequence, the limb action parameter sequence, and the mouth shape parameter sequence likewise need to be aligned in time with the posture adjustment parameter sequence to ensure a one-to-one correspondence. For example, when the posture adjustment parameter sequence is [P_current, P_0, P_1, …, P_k, P_target], the expression parameter sequence may be [a_current, a_0, …, a_k, a_target].
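Purely as an illustration of the required time alignment (the data layout here is an assumption of this sketch, not mandated by the application):

```python
def frame_parameters(pose_seq, expression_seq, mouth_seq):
    """Zip several time-aligned parameter sequences into per-frame driving
    parameters; all sequences are assumed to have the same length so that
    the one-to-one correspondence holds."""
    assert len(pose_seq) == len(expression_seq) == len(mouth_seq)
    return [{"pose": p, "expression": e, "mouth": m}
            for p, e, m in zip(pose_seq, expression_seq, mouth_seq)]
```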
Furthermore, the expression parameter sequence may further include an eye adjustment parameter sequence corresponding to the posture adjustment parameter sequence, and the electronic device may gradually adjust the head posture, the eye posture and the body posture of the virtual robot based on the posture adjustment parameter sequence, the eye adjustment parameter sequence and the body adjustment parameter sequence until the head, the eyes and the body of the virtual robot face the target user, so as to ensure that the eyes of the virtual robot always look in the direction of the head of the target user. Since the head of the virtual robot is directed toward the head of the target user, this produces the effect that the virtual robot is "looking at" the target user.
Step S370: and generating and outputting an interactive video containing the head posture change of the virtual robot according to the image sequence.
In the embodiment of the present application, step S370 may refer to the contents of the foregoing embodiments, and is not described herein again.
In some embodiments, if the current interactive video has not finished playing, that is, the head pose parameter of the virtual robot has not yet reached the target pose parameter, and the head position of the target user has meanwhile changed significantly, the electronic device may acquire a new target pose parameter and gradually adjust the head pose of the virtual robot according to it. Therefore, referring to fig. 8, after the interactive video containing the head pose change of the virtual robot is generated and output according to the image sequence, the interaction method of the virtual robot may further include:
step S400: in the playing process of the interactive video, if the change of the spatial position of the head of the target user is detected, acquiring a target update posture parameter of the head of the virtual robot when the head of the virtual robot faces the target user according to the changed spatial position.
In some embodiments, after the electronic device generates and outputs the interactive video, the interactive video may be played to present the user with the visual effect of the virtual robot turning as the head of the target user moves. During the playing of the interactive video, the electronic device can detect the spatial position of the head of the target user in real time; if a change in that spatial position is detected, the target posture parameter of the head of the virtual robot when facing the target user (at the changed head position) can be re-acquired according to the changed spatial position and used as the target update posture parameter.
Specifically, the spatial position of the head of the target user can be determined in real time according to the audio data of the target user acquired in real time, so that whether the spatial position of the head of the target user changes can be detected in real time.
Step S410: when the difference value between the target update attitude parameter and the target attitude parameter is larger than a preset value, determining a new attitude adjustment parameter sequence of the head of the virtual robot based on the current attitude parameter of the head of the virtual robot and the target update attitude parameter.
In some embodiments, when the difference between the re-acquired target update posture parameter and the previously acquired target posture parameter is greater than the preset value, the spatial position of the head of the target user may be considered to have changed substantially. Therefore, to preserve the "face-to-face" effect between the virtual robot and the target user, a new posture adjustment parameter sequence of the head of the virtual robot may be re-determined based on the current posture parameter of the head of the virtual robot and the target update posture parameter.
In some embodiments, the current pose parameter of the head of the virtual robot may be the pose adjustment parameter reached partway through the stepwise adjustment according to the previously acquired sequence, determined from the time point at which the head position of the target user changed, so that the virtual robot can transition smoothly from the previous pose adjustment parameters to the newly acquired ones.
In some embodiments, the electronic device may also recalculate, at a small time interval t, the sequence of pose adjustment parameters from the current pose parameter of the head of the virtual robot to the target pose parameter, so as to ensure that the head of the virtual robot is always driven by the latest sequence of pose adjustment parameters.
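A minimal sketch of this re-planning check, assuming a hypothetical preset value of 5 degrees and reusing the uniform-sequence idea from the earlier sketch:

```python
import math

def uniform_sequence(current, target, max_step=3.0):
    """Uniformly varying (Pitch, Yaw) sequence, as in the earlier sketch."""
    deltas = [t - c for c, t in zip(current, target)]
    steps = max(1, math.ceil(max(abs(d) for d in deltas) / max_step))
    return [tuple(c + d * i / steps for c, d in zip(current, deltas))
            for i in range(steps + 1)]

def maybe_replan(reached_pose, old_target, updated_target, old_sequence,
                 preset_value=5.0):
    """Re-plan only when the target update pose parameter differs from the
    previous target pose parameter by more than a preset value (degrees);
    the new sequence starts from the pose reached so far, so the motion
    stays smooth."""
    diff = max(abs(u - t) for u, t in zip(updated_target, old_target))
    if diff > preset_value:
        return uniform_sequence(reached_pose, updated_target)
    return old_sequence  # keep playing the current sequence
```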
Step S420: gradually adjusting the head pose of the virtual robot based on the new pose adjustment parameter sequence until the head of the virtual robot faces the target user, and generating a new image sequence.
After obtaining the new pose adjustment parameter sequence, the electronic device may gradually adjust the head pose of the virtual robot based on the new pose adjustment parameter sequence until the head of the virtual robot faces the target user, and generate a new image sequence. Specifically, the step-by-step adjustment is described in the foregoing embodiments and will not be described herein.
Step S430: and generating and outputting an interactive update video containing the head posture change of the virtual robot according to the new image sequence, wherein the interactive update video is used for replacing the interactive video to play.
After obtaining the new image sequence, the electronic device may generate and output an interactive update video containing the head pose change of the virtual robot. The interactive update video is used for replacing the interactive video to play. Therefore, after the head position of the target user changes, the head posture of the virtual robot can be dynamically adjusted in real time according to the changed head position of the target user, and the head of the virtual robot always faces the target user.
It can be understood that, in this embodiment, each of the above steps may be performed locally by the electronic device, may also be performed in the server, and may also be performed by the electronic device and the server separately, and according to different actual application scenarios, tasks may be allocated according to requirements to implement an optimized virtual robot customer service experience, which is not limited herein.
According to the interaction method of the virtual robot provided by this embodiment of the application, when the audio data of the first user is obtained, the first voiceprint feature corresponding to that audio data is extracted, and a first session process corresponding to the first user is established and bound to the first voiceprint feature. Then, when the current session process is the first session process, the first user corresponding to it is taken as the target user of the current session process, and the audio data of the target user is obtained in real time according to the first voiceprint feature, so that the spatial position of the head of the target user can be determined from the audio data. A posture adjustment parameter sequence of the head of the virtual robot is then acquired according to the spatial position, the head posture of the virtual robot is gradually adjusted based on that sequence until the head of the virtual robot faces the target user, and an image sequence is generated, from which an interactive video containing the head posture change of the virtual robot is generated and output. In this way, when multiple users are present, the head posture of the virtual robot can be dynamically adjusted according to the head position of the bound user corresponding to the currently running session process, so that the virtual robot in the interactive video always faces that bound user, improving the naturalness of human-computer interaction and optimizing the interaction experience.
Referring to fig. 9, fig. 9 is a flowchart illustrating an interaction method of a virtual robot according to another embodiment of the present application. As will be described in detail with respect to the flow shown in fig. 9, the interaction method of the virtual robot specifically includes the following steps:
step S510: and when the current session process is the first session process, if the audio data of a second user is acquired, extracting a second voiceprint feature corresponding to the audio data of the second user.
In some embodiments, when the current session process is the first session process between the first user and the virtual robot, if the electronic device acquires audio data of a second user before the first session process has ended, a second voiceprint feature corresponding to the audio data of the second user may be extracted so as to establish a new session process. The second user and the first user are different users, and correspondingly the first voiceprint feature and the second voiceprint feature are different.
In other embodiments, while the first session process has not ended, the electronic device may also obtain only the audio data of the first user and ignore the audio data of other users, i.e., neither acquire other users' audio data nor establish new session processes, so that the electronic device maintains only one session process.
Step S520: and establishing a second session process corresponding to the second user, and binding the second session process with the second voiceprint feature.
In some embodiments, after extracting the second voiceprint feature, the electronic device may establish a second session process corresponding to the second user and bind the second session process with the second voiceprint feature. In this way, the electronic device may hold session processes bound to multiple users.
Step S530: and when the first session process is ended or the first session process is in a waiting timeout state, switching the current session process to the second session process.
In some embodiments, when the second user needs to interact with the virtual robot, session process switching needs to be performed. Specifically, when the first session process is ended or the first session process is in a waiting timeout state, the current session process is switched to the second session process, so that interaction with the second user is realized.
In some embodiments, the ending of the first session process may be actively ended by the first user by inputting a specified instruction, which may be a text instruction or a voice instruction, such as 'bye'.
In other embodiments, the ending of the first session process may also be a forced ending by the electronic device. In one approach, if the electronic device does not acquire audio data of the first user, the first session process is not immediately interrupted; instead, a set duration is configured as the maximum waiting duration of the first session process, so that the first session process is interrupted, i.e. forcibly ended, only when its waiting duration exceeds the set duration. Therefore, the user does not need to converse with the robot without pause; the user may pause and rest during the interaction, as long as the pause is shorter than the set duration.
In some embodiments, referring to fig. 10, the first session process in the wait timeout state may include:
step S531: and when the audio data of the first user is not acquired, timing the waiting time.
In some embodiments, when the electronic device does not acquire the audio data of the first user, the waiting time may be counted to determine whether the first session process is ended according to the waiting time. Wherein the timing can be performed in a variety of ways. For example, with a timer, when the electronic device does not acquire the audio data of the first user, the timer is started to count time.
Step S532: and when the timed numerical value does not reach the set duration and the audio data of the first user is not collected in the timing period, determining that the first session process is in a waiting overtime state.
In some embodiments, when the timed value has not reached the set duration and no audio data of the first user has been collected during the timing period, the electronic device may determine that the first session process is in the waiting timeout state and need not force the first session process to end. Thus, by setting the waiting duration, the first session process is maintained in a "not yet confirmed as ended" state while it has not timed out, rather than being interrupted immediately, which avoids ending the session process because of an occasional pause by the user and then repeatedly re-establishing the same session process.
In some embodiments, whether the waiting time of the first session process has expired may also be determined by setting a countdown, the initial value of which is the set duration.
In some embodiments, if the electronic device re-acquires audio data of the first user during the waiting period of the first session process, the countdown of the waiting time may be reset, or the timer restarted, and the timing or countdown begins again the next time the electronic device fails to acquire audio data of the first user.
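A toy model of this timing behaviour (the 30-second set duration, class layout and method names are illustrative assumptions of this sketch):

```python
import time

class SessionProcess:
    """Toy model of the waiting-timeout behaviour: the session is forcibly
    ended only once no audio from its bound user has arrived for longer
    than the set duration; any newly acquired audio restarts the timing."""

    def __init__(self, voiceprint, set_duration=30.0):
        self.voiceprint = voiceprint
        self.set_duration = set_duration         # maximum waiting duration
        self.last_audio_time = time.monotonic()

    def on_audio(self):
        self.last_audio_time = time.monotonic()  # reset the countdown

    def in_waiting_timeout_state(self):
        """True while waiting for audio but not yet past the set duration."""
        waited = time.monotonic() - self.last_audio_time
        return 0.0 < waited <= self.set_duration

    def should_force_end(self):
        return (time.monotonic() - self.last_audio_time) > self.set_duration
```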
Step S540: and taking a second user corresponding to the second session process as a target user corresponding to the current session process, and acquiring the audio data of the target user in real time according to the second voiceprint feature.
In some embodiments, when the electronic device switches the current session process from the first session process to the second session process, the second user corresponding to the second session process may be taken as the target user corresponding to the current session process, and the audio data of the target user, i.e. the audio data of the second user, is acquired in real time according to the second voiceprint feature. The specific manner of acquisition may refer to the contents of the foregoing embodiments and is not repeated here.
Step S550: determining a spatial position of the target user's head from the audio data.
Step S560: and acquiring a posture adjustment parameter sequence of the head of the virtual robot according to the spatial position, wherein the posture adjustment parameter sequence comprises a plurality of posture adjustment parameters.
Step S570: gradually adjusting the head pose of the virtual robot based on the pose adjustment parameter sequence until the head of the virtual robot faces the target user, and generating an image sequence composed of a plurality of frames of successive pose images of the virtual robot.
Step S580: and generating and outputting an interactive video containing the head posture change of the virtual robot according to the image sequence.
In the embodiment of the present application, steps S550 to S580 may refer to the contents of the foregoing embodiments, and are not described herein again.
For example, when the electronic device is currently running session process A, the head and body of the virtual robot can be adjusted to face user A; when the currently running session process is switched to session process B, the head and body of the virtual robot can be adjusted to face user B so as to interact with user B.
It can be understood that, in this embodiment, each of the above steps may be performed locally by the electronic device, may also be performed in the server, and may also be performed by the electronic device and the server separately, and according to different actual application scenarios, tasks may be allocated according to requirements to implement an optimized virtual robot customer service experience, which is not limited herein.
According to the interaction method of the virtual robot provided by this embodiment of the application, when the current session process is the first session process and audio data of a second user is acquired, the second voiceprint feature corresponding to that audio data can be extracted, a second session process corresponding to the second user established, and the second session process bound with the second voiceprint feature. Then, when the first session process ends or is in the waiting timeout state, the current session process is switched to the second session process, so that the virtual robot now interacts with the second user. The second user corresponding to the second session process is taken as the target user of the current session process, the audio data of the target user is acquired in real time according to the second voiceprint feature, the spatial position of the head of the target user is determined from the audio data, the posture adjustment parameter sequence of the head of the virtual robot is acquired according to the spatial position, the head posture of the virtual robot is gradually adjusted based on that sequence until the head of the virtual robot faces the target user, an image sequence is generated, and an interactive video containing the head posture change of the virtual robot is generated and output according to the image sequence. Session processes bound to multiple users can thus be established, and when the session process is switched, the head posture of the virtual robot can be updated in real time from facing the head position of one user to facing that of another, so that when interacting with the bound users, the head posture and the body posture of the virtual robot are dynamically adjusted for each bound user.
It should be understood that, although the steps in the flowcharts of fig. 2, 4, 5, 7, 8, 9, and 10 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2, 4, 5, 7, 8, 9, and 10 may include multiple sub-steps or multiple stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Referring to fig. 11, fig. 11 is a block diagram illustrating an interaction apparatus of a virtual robot according to an embodiment of the present disclosure. As will be explained below with respect to the block diagram of fig. 11, the virtual robot interaction device 900 includes: an audio acquisition module 910, a location determination module 920, a parameter acquisition module 930, an image generation module 940, and a video generation module 950. Wherein:
an audio obtaining module 910, configured to obtain, in real time, audio data of a target user corresponding to a current session process, where the virtual robot interacts with the target user in a one-to-one manner in the session process;
a position determining module 920, configured to determine a spatial position of the head of the target user according to the audio data;
a parameter obtaining module 930, configured to obtain, according to the spatial position, an attitude adjustment parameter sequence of the head of the virtual robot, where the attitude adjustment parameter sequence includes a plurality of attitude adjustment parameters;
an image generation module 940, configured to gradually adjust the head pose of the virtual robot based on the pose adjustment parameter sequence until the head of the virtual robot faces the target user, and generate an image sequence, where the image sequence is formed by multiple consecutive pose images of the virtual robot;
a video generating module 950, configured to generate and output an interactive video including a change in head pose of the virtual robot according to the image sequence.
In some embodiments, the audio data is collected by two microphones, and the position determining module 920 may be specifically configured to: determining a plane coordinate of a sound source corresponding to the audio data according to the audio feature difference of the same audio data acquired by the two microphones; converting the plane coordinates into three-dimensional coordinates in a space coordinate system, and taking the three-dimensional coordinates as the space position of the head of the target user, wherein the space coordinate system takes the space position of the head of the virtual robot as an origin.
In other embodiments, the audio data is collected by a microphone array, the microphone array is composed of at least 3 microphones, and the position determining module 920 may be further specifically configured to: according to the audio feature difference of the same audio data collected by each microphone in the microphone array, determining the three-dimensional coordinates of a sound source of the audio data in a space coordinate system, and taking the three-dimensional coordinates as the space position of the head of the target user, wherein the space coordinate system takes the space position of the head of the virtual robot as an origin.
In some embodiments, the interaction device 900 of the virtual robot may further include: the device comprises a first feature extraction module and a first feature binding module. The first feature extraction module is used for extracting a first voiceprint feature corresponding to audio data of a first user when the audio data of the first user are obtained; the first feature binding module is used for establishing a first session process corresponding to the first user and binding the first session process with the first voiceprint feature. The audio acquisition module 910 may be specifically configured to: and when the current conversation process is the first conversation process, taking a first user corresponding to the first conversation process as a target user corresponding to the current conversation process, and acquiring the audio data of the target user in real time according to the first voiceprint characteristic.
Further, in some embodiments, the interaction device 900 of the virtual robot may further include: the system comprises a second feature extraction module, a second feature binding module and a process switching module. The second feature extraction module is used for extracting a second voiceprint feature corresponding to the audio data of a second user if the audio data of the second user is acquired when the current conversation process is the first conversation process; the second feature binding module is used for establishing a second session process corresponding to the second user and binding the second session process with the second voiceprint feature; and the process switching module is used for switching the current session process to the second session process when the first session process is ended or the first session process is in a waiting timeout state. The audio acquisition module 910 may be specifically configured to: and taking a second user corresponding to the second session process as a target user corresponding to the current session process, and acquiring the audio data of the target user in real time according to the second voiceprint feature.
Further, in some embodiments, determining in the process switching module that the first session process is in the waiting timeout state may include: timing the waiting time when the audio data of the first user is not acquired; and when the timed value has not reached the set duration and no audio data of the first user is collected during the timing period, determining that the first session process is in the waiting timeout state.
In some embodiments, the parameter obtaining module 930 may be specifically configured to: according to the space position, acquiring a target posture parameter of the head of the virtual robot when the head of the virtual robot faces the target user; acquiring current attitude parameters of the head of the virtual robot; determining a sequence of pose adjustment parameters of the head of the virtual robot based on the current pose parameters and the target pose parameters.
Further, in some embodiments, the interaction device 900 of the virtual robot may further include: the device comprises a position detection module, a parameter updating module, an image updating module and a video updating module. The position detection module is used for acquiring a target update posture parameter of the head of the virtual robot when the head of the virtual robot faces the target user according to the changed spatial position if the spatial position of the head of the target user is detected to be changed in the playing process of the interactive video; the parameter updating module is used for determining a new posture adjustment parameter sequence of the head of the virtual robot based on the current posture parameter of the head of the virtual robot and the target updating posture parameter when the difference value between the target updating posture parameter and the target posture parameter is larger than a preset value; the image updating module is used for gradually adjusting the head posture of the virtual robot based on the new posture adjustment parameter sequence until the head of the virtual robot faces the target user and generating a new image sequence; and the video updating module is used for generating and outputting an interactive updating video containing the head posture change of the virtual robot according to the new image sequence, and the interactive updating video is used for replacing the interactive video to play.
In some embodiments, the image generation module 940 may be specifically configured to: determining a body adjustment parameter sequence of the virtual robot according to the corresponding relation between the head and the body of the virtual robot and the posture adjustment parameter sequence, wherein the body adjustment parameter sequence corresponds to the posture adjustment parameter sequence; gradually adjusting the head posture and the body posture of the virtual robot based on the posture adjustment parameter sequence and the body adjustment parameter sequence until the head and the body of the virtual robot face the target user, and generating an image sequence.
The interaction device of the virtual robot provided in the embodiment of the present application is used to implement the corresponding interaction method of the virtual robot in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again.

It can be clearly understood by those skilled in the art that the interaction device of the virtual robot provided in the embodiment of the present application can implement each process in the foregoing method embodiments; for convenience and brevity of description, the specific working processes of the device and the modules described above may refer to the corresponding processes in the foregoing method embodiments and are not described here again.
In the several embodiments provided in the present application, the coupling or direct coupling or communication connection between the modules shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or modules may be in an electrical, mechanical or other form.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
Referring to fig. 12, a block diagram of an electronic device 600 (i.e., the electronic device 101) according to an embodiment of the present disclosure is shown. The electronic device 600 may be an electronic device capable of running an application, such as a smart phone, a tablet computer, an electronic book, or an entity robot. The electronic device 600 in the present application may include one or more of the following components: a processor 610, a memory 620, and one or more applications, wherein the one or more applications may be stored in the memory 620 and configured to be executed by the one or more processors 610, the one or more programs configured to perform the methods as described in the aforementioned method embodiments.
The processor 610 may include one or more processing cores. The processor 610 connects various parts throughout the electronic device 600 using various interfaces and lines, and performs the various functions of the electronic device 600 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 620 and invoking data stored in the memory 620. Alternatively, the processor 610 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 610 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem is used to handle wireless communication. It is understood that the modem may also not be integrated into the processor 610 but instead be implemented by a separate communication chip.
The Memory 620 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 620 may be used to store instructions, programs, code sets, or instruction sets. The memory 620 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The stored data area may also store data created during use by the electronic device 600 (e.g., phone books, audio and video data, chat log data), and so forth.
Those skilled in the art will appreciate that the configuration shown in fig. 12 is a block diagram of only a portion of the configuration associated with the present application, and does not constitute a limitation on the electronic device to which the present application is applied, and a particular electronic device may include more or less components than those shown in fig. 12, or combine certain components, or have a different arrangement of components.
Referring to fig. 13, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable storage medium 1100 has stored therein a program code 1110, the program code 1110 being invokable by the processor for performing the method described in the above-described method embodiments.
The computer-readable storage medium 1100 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 1100 includes a non-transitory computer-readable storage medium. The computer readable storage medium 1100 has storage space for program code 1110 for performing any of the method steps of the method described above. The program code can be read from or written to one or more computer program products. The program code 1110 may be compressed, for example, in a suitable form.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a smart gateway, a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, the present embodiments are not limited to the above embodiments, which are merely illustrative and not restrictive, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention.

Claims (12)

1. An interaction method of a virtual robot, the method comprising:
acquiring audio data of a target user corresponding to a current conversation process in real time, wherein the virtual robot and the target user interact one to one in the conversation process;
determining a spatial position of the head of the target user according to the audio data;
acquiring a posture adjustment parameter sequence of the head of the virtual robot according to the spatial position, wherein the posture adjustment parameter sequence comprises a plurality of posture adjustment parameters;
gradually adjusting the head pose of the virtual robot based on the pose adjustment parameter sequence until the head of the virtual robot faces the target user, and generating an image sequence consisting of a plurality of frames of successive pose images of the virtual robot;
and generating and outputting an interactive video containing the head posture change of the virtual robot according to the image sequence.
2. The method of claim 1, wherein the audio data is captured by two microphones, and wherein determining the spatial position of the target user's head from the audio data comprises:
determining a plane coordinate of a sound source corresponding to the audio data according to the audio feature difference of the same audio data acquired by the two microphones;
converting the plane coordinates into three-dimensional coordinates in a space coordinate system, and taking the three-dimensional coordinates as the space position of the head of the target user, wherein the space coordinate system takes the space position of the head of the virtual robot as an origin.
3. The method of claim 1, wherein the audio data is collected by a microphone array, the microphone array consisting of at least 3 microphones, and wherein determining the spatial location of the target user's head from the audio data comprises:
according to the audio feature difference of the same audio data collected by each microphone in the microphone array, determining the three-dimensional coordinates of a sound source of the audio data in a space coordinate system, and taking the three-dimensional coordinates as the space position of the head of the target user, wherein the space coordinate system takes the space position of the head of the virtual robot as an origin.
4. The method according to claim 1, wherein before the obtaining audio data of the target user corresponding to the current session progress in real time, the method further comprises:
when audio data of a first user are acquired, extracting first voiceprint features corresponding to the audio data of the first user;
establishing a first session process corresponding to the first user, and binding the first session process with the first voiceprint feature;
the acquiring, in real time, audio data of a target user corresponding to a current session process includes:
and when the current conversation process is the first conversation process, taking a first user corresponding to the first conversation process as a target user corresponding to the current conversation process, and acquiring the audio data of the target user in real time according to the first voiceprint characteristic.
5. The method of claim 4, further comprising:
when the current session process is the first session process, if audio data of a second user is acquired, extracting second voiceprint features corresponding to the audio data of the second user;
establishing a second session process corresponding to the second user, and binding the second session process with the second voiceprint feature;
when the first session process is ended or the first session process is in a waiting timeout state, switching the current session process into the second session process;
the acquiring, in real time, audio data of a target user corresponding to a current session process includes:
and taking a second user corresponding to the second session process as a target user corresponding to the current session process, and acquiring the audio data of the target user in real time according to the second voiceprint feature.
6. The method of claim 5, wherein the first session process is in a wait timeout state comprising:
when the audio data of the first user is not acquired, timing the waiting time;
and when the timed numerical value does not reach the set duration and the audio data of the first user is not collected in the timing period, determining that the first session process is in a waiting timeout state.
7. The method according to claim 1, wherein the obtaining a sequence of pose adjustment parameters of the head of the virtual robot according to the spatial position comprises:
acquiring a target posture parameter of the head of the virtual robot when the virtual robot faces the target user according to the spatial position;
acquiring current attitude parameters of the head of the virtual robot;
determining a sequence of pose adjustment parameters of the head of the virtual robot based on the current pose parameters and the target pose parameters.
8. The method of claim 7, wherein after said generating and outputting an interactive video containing head pose changes of the virtual robot from the sequence of images, the method further comprises:
in the playing process of the interactive video, if the change of the spatial position of the head of the target user is detected, acquiring a target update posture parameter of the head of the virtual robot when the head of the virtual robot faces the target user according to the changed spatial position;
when the difference value between the target updating posture parameter and the target posture parameter is larger than a preset value, determining a new posture adjustment parameter sequence of the head of the virtual robot based on the current posture parameter of the head of the virtual robot and the target updating posture parameter;
gradually adjusting the head pose of the virtual robot based on the new pose adjustment parameter sequence until the head of the virtual robot faces the target user, and generating a new image sequence;
and generating and outputting an interactive update video containing the head posture change of the virtual robot according to the new image sequence, wherein the interactive update video is used for replacing the interactive video to play.
9. The method of any one of claims 1-8, wherein said gradually adjusting the pose of the head of the virtual robot based on the sequence of pose adjustment parameters until the head of the virtual robot is oriented toward the target user and generating a sequence of images comprises:
determining a body adjustment parameter sequence of the virtual robot according to the corresponding relation between the head and the body of the virtual robot and the posture adjustment parameter sequence, wherein the body adjustment parameter sequence corresponds to the posture adjustment parameter sequence;
gradually adjusting the head posture and the body posture of the virtual robot based on the posture adjustment parameter sequence and the body adjustment parameter sequence until the head and the body of the virtual robot face the target user, and generating an image sequence.
10. An interaction apparatus of a virtual robot, the apparatus comprising:
the audio acquisition module is used for acquiring audio data of a target user corresponding to a current conversation process in real time, wherein the virtual robot and the target user interact one to one in the conversation process;
a position determination module for determining a spatial position of the head of the target user according to the audio data;
the parameter acquisition module is used for acquiring a posture adjustment parameter sequence of the head of the virtual robot according to the spatial position, wherein the posture adjustment parameter sequence comprises a plurality of posture adjustment parameters;
an image generation module, configured to gradually adjust a head pose of the virtual robot based on the pose adjustment parameter sequence until the head of the virtual robot faces the target user, and generate an image sequence, where the image sequence is formed by multiple frames of consecutive pose images of the virtual robot;
and the video generation module is used for generating and outputting an interactive video containing the head posture change of the virtual robot according to the image sequence.
11. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-9.
12. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 9.
CN201911007922.4A 2019-10-22 2019-10-22 Interaction method and device for virtual robot, electronic equipment and storage medium Pending CN110794964A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911007922.4A CN110794964A (en) 2019-10-22 2019-10-22 Interaction method and device for virtual robot, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911007922.4A CN110794964A (en) 2019-10-22 2019-10-22 Interaction method and device for virtual robot, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110794964A true CN110794964A (en) 2020-02-14

Family

ID=69440931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911007922.4A Pending CN110794964A (en) 2019-10-22 2019-10-22 Interaction method and device for virtual robot, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110794964A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102186051A (en) * 2011-03-10 2011-09-14 弭强 Sound localization-based video monitoring system
CN104967777A (en) * 2015-06-11 2015-10-07 广东欧珀移动通信有限公司 Method and terminal for controlling a camera to take photographs
CN105956009A (en) * 2016-04-21 2016-09-21 深圳前海大数点科技有限公司 Method for matching and pushing real-time scene content
KR20180021507A (en) * 2016-08-22 2018-03-05 이준욱 Face positioning method, face photographing method, Iris recognizing method using virtual image of plane mirror, and face photographing system
CN107765856A (en) * 2017-10-26 2018-03-06 北京光年无限科技有限公司 Virtual human visual processing method and system based on multi-modal interaction
CN108733215A (en) * 2018-05-17 2018-11-02 华南农业大学 Gaze direction control method for an anthropomorphic virtual assistant
CN109318239A (en) * 2018-10-09 2019-02-12 深圳市三宝创新智能有限公司 Hospital guidance robot, guidance method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jiang Gang et al.: "Industrial Robots" (工业机器人), 31 January 2011 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113455814A (en) * 2020-03-31 2021-10-01 阿里巴巴集团控股有限公司 Table (Ref. Table)
CN113672194A (en) * 2020-03-31 2021-11-19 北京市商汤科技开发有限公司 Method, device and equipment for acquiring acoustic feature sample and storage medium
CN112123334A (en) * 2020-08-24 2020-12-25 中国地质大学(武汉) Interactive arm control method and system based on event-driven mechanism
CN112123334B (en) * 2020-08-24 2021-09-10 中国地质大学(武汉) Interactive arm control method and system based on event-driven mechanism
CN112188363A (en) * 2020-09-11 2021-01-05 北京猎户星空科技有限公司 Audio playing control method and device, electronic equipment and readable storage medium
CN114338573A (en) * 2020-09-30 2022-04-12 腾讯科技(深圳)有限公司 Interactive data processing method and device and computer readable storage medium
CN114338573B (en) * 2020-09-30 2023-10-20 腾讯科技(深圳)有限公司 Interactive data processing method and device and computer readable storage medium
CN115242569A (en) * 2021-04-23 2022-10-25 海信集团控股股份有限公司 Man-machine interaction method and server in intelligent home
CN115242569B (en) * 2021-04-23 2023-12-05 海信集团控股股份有限公司 Man-machine interaction method and server in intelligent home

Similar Documents

Publication Publication Date Title
CN110555507B (en) Interaction method and device for virtual robot, electronic equipment and storage medium
US11605193B2 (en) Artificial intelligence-based animation character drive method and related apparatus
CN110688911B (en) Video processing method, device, system, terminal equipment and storage medium
CN110794964A (en) Interaction method and device for virtual robot, electronic equipment and storage medium
CN112379812B (en) Simulation 3D digital human interaction method and device, electronic equipment and storage medium
CN110390704B (en) Image processing method, image processing device, terminal equipment and storage medium
CN110286756A (en) Method for processing video frequency, device, system, terminal device and storage medium
CN110400251A (en) Method for processing video frequency, device, terminal device and storage medium
KR101894573B1 (en) Smart phone interface management system by 3D digital actor
US20110131041A1 (en) Systems And Methods For Synthesis Of Motion For Animation Of Virtual Heads/Characters Via Voice Processing In Portable Devices
CN111432267B (en) Video adjusting method and device, electronic equipment and storage medium
KR102491140B1 (en) Method and apparatus for generating virtual avatar
CN110418095B (en) Virtual scene processing method and device, electronic equipment and storage medium
CN110942501B (en) Virtual image switching method and device, electronic equipment and storage medium
CN110599359B (en) Social contact method, device, system, terminal equipment and storage medium
CN108648251B (en) 3D expression making method and system
CN111583355B (en) Face image generation method and device, electronic equipment and readable storage medium
CN110737335B (en) Interaction method and device of robot, electronic equipment and storage medium
CN110874137A (en) Interaction method and device
CN112652041B (en) Virtual image generation method and device, storage medium and electronic equipment
US20230047858A1 (en) Method, apparatus, electronic device, computer-readable storage medium, and computer program product for video communication
CN112669417A (en) Virtual image generation method and device, storage medium and electronic equipment
WO2021232875A1 (en) Method and apparatus for driving digital person, and electronic device
CN112750186A (en) Virtual image switching method and device, electronic equipment and storage medium
CN112669422A (en) Simulated 3D digital human generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200214