CN111063024A - Three-dimensional virtual human driving method and device, electronic equipment and storage medium


Info

Publication number
CN111063024A
Authority
CN
China
Prior art keywords
driving
dimensional virtual
virtual human
information
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911267521.2A
Other languages
Chinese (zh)
Inventor
黄生辉
林祥凯
欧阳才晟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911267521.2A
Publication of CN111063024A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00: Manipulating 3D models or images for computer graphics
    • G06T19/006: Mixed reality

Abstract

The disclosure provides a three-dimensional virtual human driving method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: acquiring a person image with depth information; performing three-dimensional reconstruction on the person image based on the depth information to obtain a three-dimensional virtual human corresponding to the person image; acquiring driving information that triggers driving; and driving the three-dimensional virtual human to execute a corresponding action based on the driving information. Embodiments of the disclosure can improve the fidelity with which the three-dimensional virtual human executes actions.

Description

Three-dimensional virtual human driving method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of artificial intelligence, and in particular to a three-dimensional virtual human driving method and apparatus, an electronic device, and a storage medium.
Background
With the development of internet technology, three-dimensional virtual humans play an ever greater role in the lives of more and more users. Specifically, a user can interact with a three-dimensional virtual human to meet informational, emotional, and entertainment needs. In the prior art, the process of driving a three-dimensional virtual human is relatively rigid and low in fidelity, so it is difficult for users to invest emotion when interacting with the three-dimensional virtual human, and the user experience is poor.
Disclosure of Invention
An object of the present disclosure is to provide a three-dimensional virtual human driving method and apparatus, an electronic device, and a storage medium that can improve the fidelity with which a three-dimensional virtual human executes actions.
According to an aspect of the disclosed embodiments, a three-dimensional virtual human driving method is disclosed, the method comprising:
acquiring a person image with depth information;
performing three-dimensional reconstruction on the person image based on the depth information to obtain a three-dimensional virtual human corresponding to the person image;
acquiring driving information that triggers driving;
and driving the three-dimensional virtual human to execute a corresponding action based on the driving information.
According to an aspect of the disclosed embodiments, a three-dimensional virtual human driving apparatus is disclosed, the apparatus comprising:
a first acquisition module configured to acquire a person image with depth information;
a second acquisition module configured to perform three-dimensional reconstruction on the person image based on the depth information to obtain a three-dimensional virtual human corresponding to the person image;
a third acquisition module configured to acquire driving information that triggers driving;
and a driving module configured to drive the three-dimensional virtual human to execute a corresponding action based on the driving information.
According to an aspect of the disclosed embodiments, a three-dimensional virtual human driving electronic device is disclosed, comprising: a memory storing computer readable instructions; and a processor reading the computer readable instructions stored in the memory to perform the method described above.
According to an aspect of embodiments of the present disclosure, a computer-readable storage medium is disclosed, having computer-readable instructions stored thereon which, when executed by a processor of a computer, cause the computer to perform the method described above.
In embodiments of the disclosure, the person image is three-dimensionally reconstructed based on its depth information, so that the generated three-dimensional virtual human comprehensively restores the appearance of the user in the person image. Driving information that triggers driving is then acquired, and the three-dimensional virtual human is driven to execute the corresponding action, improving the fidelity with which the three-dimensional virtual human executes actions.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIGS. 1A-1D illustrate architectural diagrams that can be employed with embodiments of the present disclosure.
Fig. 2 shows the relevant interfaces of a client in the process of generating a three-dimensional virtual human according to an embodiment of the disclosure.
Fig. 3 shows a flowchart of a three-dimensional virtual human driving method according to an embodiment of the present disclosure.
FIG. 4 illustrates the implementation logic of three-dimensional virtual human driving according to one embodiment of the present disclosure.
FIG. 5 illustrates the implementation logic of three-dimensional virtual human driving according to one embodiment of the present disclosure.
Fig. 6 shows a block diagram of a three-dimensional virtual human driving apparatus according to one embodiment of the present disclosure.
FIG. 7 shows a hardware diagram of a three-dimensional virtual human driving electronic device according to one embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. In the following description, numerous specific details are provided to give a thorough understanding of example embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, steps, and so forth. In other instances, well-known structures, methods, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason, and make decisions.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is the science of how to make machines "see": using cameras and computers instead of human eyes to identify, track, and measure targets, and further processing images so that they are more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems able to capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
The key technologies of Speech Technology are automatic speech recognition (ASR), text-to-speech synthesis (TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the development direction of future human-computer interaction, and voice is one of the most promising interaction modes.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods for effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics. Research in this field involves natural language, i.e., the language people use every day, so it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, and knowledge graphs.
The solution provided by the embodiments of the disclosure involves artificial intelligence computer vision, speech, and natural language processing technologies, and is explained in the following embodiments.
An architecture to which embodiments of the present disclosure may be applied is described below with reference first to fig. 1A-1D.
FIG. 1A illustrates an architecture to which an embodiment of the present disclosure can be applied. In this embodiment, the generation and driving of the three-dimensional virtual human are controlled by the client 10. Specifically, the client 10 acquires a person image of a user with depth information and performs three-dimensional reconstruction on the person image based on the depth information, obtaining inside the client 10 a three-dimensional virtual human corresponding to the person image, that is, a three-dimensional virtual human corresponding to the user. After the three-dimensional virtual human is generated, the user can interact with it: if the client 10 acquires driving information that triggers driving, it drives the three-dimensional virtual human to execute the corresponding action, so as to interact with the user.
FIG. 1B illustrates an architecture to which an embodiment of the present disclosure can be applied. In this embodiment, the generation and driving of the three-dimensional virtual human are mainly controlled by the cloud server 20. Specifically, after acquiring a person image of a user with depth information, the client 10 uploads it to the cloud server 20 (the process shown by the dotted line in FIG. 1B). After obtaining the person image with depth information, the cloud server 20 performs three-dimensional reconstruction on it based on the depth information, obtaining inside the cloud server 20 a three-dimensional virtual human corresponding to the person image, that is, a three-dimensional virtual human corresponding to the user. After the three-dimensional virtual human is generated, the cloud server 20 sends its model data to the client 10, so that the user can interact with the three-dimensional virtual human in the client 10. Specifically, if the cloud server 20 acquires driving information that triggers driving, it sends a corresponding driving instruction to the client 10, and the client 10 drives the three-dimensional virtual human inside it to execute the corresponding action according to the driving instruction, so as to interact with the user.
FIG. 1C illustrates an architecture to which an embodiment of the present disclosure can be applied. In this embodiment, the generation and driving of the three-dimensional virtual human are mainly controlled by the cloud server 20. Specifically, after acquiring a person image of the anchor with depth information, the anchor terminal 10a uploads it to the cloud server 20 (the process shown by the dotted line in FIG. 1C). After obtaining the person image with depth information, the cloud server 20 performs three-dimensional reconstruction on it based on the depth information, obtaining inside the cloud server 20 a three-dimensional virtual human corresponding to the person image, that is, a three-dimensional virtual human corresponding to the anchor.
After the three-dimensional virtual human is generated, the cloud server 20 sends its model data to the anchor terminal 10a, so that the anchor can interact with the three-dimensional virtual human in the anchor terminal 10a; specifically, the three-dimensional virtual human in the anchor terminal 10a can imitate the anchor's behavior. If the cloud server 20 acquires anchor behavior data, it sends a corresponding driving instruction to the anchor terminal 10a, and the anchor terminal 10a drives the three-dimensional virtual human inside it to execute the corresponding action according to the driving instruction, so as to imitate the anchor's behavior.
After the three-dimensional virtual human is generated, the cloud server 20 also sends its model data to the viewer terminal 10b, so that the viewer terminal 10b can play a video of the three-dimensional virtual human imitating the anchor's behavior. Specifically, the cloud server 20 sends the viewer terminal 10b a driving instruction indicating how the three-dimensional virtual human in the anchor terminal 10a imitates the anchor's behavior, so that the viewer terminal 10b can drive the three-dimensional virtual human inside it according to the driving instruction and play a video of the three-dimensional virtual human imitating the anchor's behavior to the viewer.
FIG. 1D illustrates an architecture to which an embodiment of the present disclosure can be applied. In this embodiment, the generation and driving of the three-dimensional virtual human are mainly controlled by the cloud server 20. Specifically, after acquiring a person image of the first user with depth information, the first user terminal 10a uploads it to the cloud server 20 (the process shown by the dotted line in FIG. 1D). After obtaining the person image with depth information, the cloud server 20 performs three-dimensional reconstruction on it based on the depth information, obtaining inside the cloud server 20 a three-dimensional virtual human corresponding to the person image, that is, a three-dimensional virtual human corresponding to the first user.
After the three-dimensional virtual human is generated, the cloud server 20 sends its model data to the second user terminal 10b, so that the second user can interact with the first user's three-dimensional virtual human in the second user terminal 10b. Specifically, if the cloud server 20 acquires driving information that triggers driving, it sends a corresponding driving instruction to the second user terminal 10b, and the second user terminal 10b drives the first user's three-dimensional virtual human inside it to execute the corresponding action according to the driving instruction, so as to interact with the second user.
In a more specific application scenario, this embodiment can be applied to a chat robot. For example: the first user is Xiao Zhang and the second user is Xiao Wang. Xiao Zhang and Xiao Wang are friends, but Xiao Zhang is too busy with work to contact Xiao Wang often. Xiao Zhang uploads a person image with depth information through his mobile phone terminal to the chat robot system of the cloud server. The chat robot system generates a three-dimensional virtual human of Xiao Zhang from the person image with depth information and deploys it to Xiao Wang's mobile phone terminal. Xiao Wang can then chat with Xiao Zhang's three-dimensional virtual human at any time in the chat interface of his mobile phone terminal.
It should be noted that the description of the applicable architecture of the present disclosure presented in relation to fig. 1A-1D is merely exemplary and should not limit the scope of the present disclosure in its function and use.
It should be noted that the method provided by the present disclosure may be deployed on a personal computer terminal, a mobile phone terminal, or any other computing terminal with sufficient computing capability.
The relevant interfaces of the client in the process of generating the three-dimensional virtual human are described below with reference to FIG. 2, to exemplarily show the approximate interaction process between the user and the client while the three-dimensional virtual human is generated.
In this embodiment, the client may be a mobile phone terminal. After the user starts the camera of the mobile phone terminal, the user aligns the head with the dotted head outline displayed in the image acquisition interface of the mobile phone terminal. After aligning the head, the user clicks the "start scanning" button in the image acquisition interface and has the head images captured according to the prompts of the mobile phone terminal.
Specifically, in this example, the mobile phone terminal displays head-angle prompts on the image acquisition interface. For example: as time passes, the mobile phone terminal displays in sequence the text prompts "please face the camera", "please rotate your head 90 degrees to the left", "please face the camera and then rotate your head 90 degrees to the right", and "please face the camera and then raise your head".
The mobile phone terminal can thus select head images with depth information at preset angles from the video frames obtained by scanning the user's head: a frontal head image, a left-facing head image, a right-facing head image, and an upward-facing head image. The mobile phone terminal then performs three-dimensional reconstruction of the user's head from the obtained head images with depth information, and displays the head reconstruction result, that is, the user's three-dimensional virtual head, in the interactive interface of the mobile phone terminal.
It should be noted that the embodiment shown in fig. 2 is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
Specific implementations of the present disclosure are described in detail below.
Referring to fig. 3, a three-dimensional virtual human driving method includes:
step 310, acquiring a person image with depth information;
step 320, performing three-dimensional reconstruction on the person image based on the depth information to obtain a three-dimensional virtual human corresponding to the person image;
step 330, acquiring driving information that triggers driving;
and step 340, driving the three-dimensional virtual human to execute a corresponding action based on the driving information.
In embodiments of the disclosure, the person image is three-dimensionally reconstructed based on its depth information, so that the generated three-dimensional virtual human comprehensively restores the appearance of the user in the person image. Driving information that triggers driving is then acquired, and the three-dimensional virtual human is driven to execute the corresponding action, improving the fidelity with which the three-dimensional virtual human executes actions.
It can be understood that the acquired person image with depth information may be a whole-body image of the user; correspondingly, during three-dimensional reconstruction, a three-dimensional virtual human substantially consistent with the user's whole body can be obtained. For brevity, the exemplary descriptions of the present disclosure treat the acquired person image with depth information as a head image of the user; correspondingly, during three-dimensional reconstruction, the obtained three-dimensional virtual head substantially consistent with the user's head is regarded as the three-dimensional virtual human corresponding to the user. This manner of illustration is merely exemplary and should not limit the scope of the present disclosure in any way.
In step 310, a person image with depth information is acquired.
In embodiments of the disclosure, to obtain a person image with depth information, the image acquisition terminal responsible for capturing the person image must capture it through a camera capable of acquiring depth information.
In one embodiment, an acquisition slot is preset for each shooting angle, and acquiring a person image with depth information includes: acquiring, from each acquisition slot, the person image with depth information at the corresponding shooting angle.
In this embodiment, each acquisition slot is preset at the client for capturing the person image at its corresponding shooting angle. When the user triggers image acquisition, the client displays each acquisition slot, indicating to the user which shooting angles must be captured. After the user shoots the person image at the corresponding shooting angle through each acquisition slot as indicated, the person image with depth information at that shooting angle is obtained, and three-dimensional reconstruction can proceed on this basis.
For example: 4 acquisition slots are preset in the image acquisition interface of the client, namely slot 1 for shooting the frontal head image, slot 2 for shooting the left-facing head image, slot 3 for shooting the right-facing head image, and slot 4 for shooting the upward-facing head image.
When the user triggers the client to enter the image acquisition interface, the client displays the 4 acquisition slots in the interface. After the user clicks the button of slot 1, the client pops up a camera window and displays the text "please face the camera" on it. The user can thus shoot the frontal head image in the camera window according to the text prompt. After shooting is finished, the client exits the camera window, returns to the image acquisition interface, and displays in slot 1 the captured frontal head image to be uploaded to the cloud server.
Similarly, the user clicks slot 2 to shoot the left-facing head image, and the client displays in slot 2 of the image acquisition interface the captured left-facing head image to be uploaded to the cloud server; the user clicks slot 3 to shoot the right-facing head image, and the client displays it in slot 3; and the user clicks slot 4 to shoot the upward-facing head image, and the client displays it in slot 4.
After the 4 acquisition slots have acquired the head images at their corresponding shooting angles, the client automatically uploads the 4 head images to the cloud server, so that the cloud server can perform three-dimensional reconstruction of the user's head from them and obtain the corresponding three-dimensional virtual human.
Alternatively, after the 4 acquisition slots have acquired the head images at their corresponding shooting angles, the user may click an "upload" button in the image acquisition interface, and the client then uploads the 4 head images to the cloud server for the same three-dimensional reconstruction.
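To make the slot mechanism concrete, the following is a minimal Python sketch of how a client might track the four acquisition slots. The CaptureSlot type, the field names, and the prompts are illustrative assumptions, not part of the patent.

```python
# Hypothetical bookkeeping for the four acquisition slots described above.
# All names here are illustrative; the patent does not prescribe a data model.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CaptureSlot:
    prompt: str                       # text shown when the slot's camera window opens
    angle: str                        # shooting angle this slot is reserved for
    image_path: Optional[str] = None  # set once the user has taken the shot

slots = {
    1: CaptureSlot("please face the camera", "front"),
    2: CaptureSlot("please rotate your head 90 degrees to the left", "left"),
    3: CaptureSlot("please rotate your head 90 degrees to the right", "right"),
    4: CaptureSlot("please raise your head", "up"),
}

def ready_to_upload() -> bool:
    """All four angles captured: the client may upload the images to the cloud server."""
    return all(slot.image_path is not None for slot in slots.values())
```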
This embodiment has the advantage that, because an acquisition slot is preset for each shooting angle, the shooting angle of each person image can be determined quickly from the correspondence between acquisition slots and person images, saving processing time.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
In one embodiment, acquiring a person image with depth information includes:
acquiring video frames containing a person image and the depth information of the video frames;
extracting person key points in the video frames based on a preset key point extraction algorithm;
obtaining the shooting angle of the person image in each video frame based on the person key points;
and if the shooting angle meets a preset condition, acquiring the person image in the video frame and the depth information of the person image.
In this embodiment, after the user triggers image acquisition, the client pops up a camera window and instructs the user to shoot a video according to the corresponding prompts. After the user finishes shooting the video according to the prompts, video frames containing the person image and their depth information are obtained. Person key points are then extracted from each video frame based on a preset key point extraction algorithm; the shooting angle of the person image in the video frame is determined based on the person key points; and video frames whose shooting angle meets a preset condition are selected to obtain the person images and their depth information, so that three-dimensional reconstruction can proceed on this basis.
For example: after the user triggers the client to enter the image acquisition interface, the client pops up a camera window in the interface and, as time passes, issues the following voice prompts in sequence: "please face the camera", "please rotate your head 90 degrees to the left", "please face the camera and then rotate your head 90 degrees to the right", and "please face the camera and then raise your head".
After the user finishes shooting the video according to the prompts, the client uploads the video to the cloud server. The cloud server picks video frames from the video according to a preset selection rule (for example, one frame every 5 frames) and extracts person key points from them based on a preset key point extraction algorithm; determines the shooting angle of the person image in each frame based on the person key points; and, according to the shooting angle, selects a frame with the user's head facing the camera, a frame facing left, a frame facing right, and a frame facing upward, acquiring the person images in these frames and their depth information, so that the cloud server can perform three-dimensional reconstruction of the user's head on this basis and obtain the corresponding three-dimensional virtual human.
Alternatively, after the user finishes shooting the video according to the prompts, the client itself picks video frames from the video according to a preset selection rule (for example, one frame every 5 frames), extracts person key points based on a preset key point extraction algorithm, determines the shooting angle of the person image in each frame from the key points, and selects the frames with the user's head facing the camera, facing left, facing right, and facing upward, acquiring the person images and their depth information. The client uploads these person images and their depth information to the cloud server, so that the cloud server can perform three-dimensional reconstruction of the user's head on this basis and obtain the corresponding three-dimensional virtual human.
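As a hedged sketch of the frame-selection step just described, the following Python fragment samples every fifth frame, estimates the head's yaw from 2D person key points, and keeps frames matching a target pose. The detect_keypoints callback and the yaw heuristic are placeholder assumptions; the patent does not specify a key point extraction algorithm or an angle formula.

```python
# Illustrative frame selection: the yaw heuristic and keypoint format are assumptions.
TARGET_YAWS = {"front": 0.0, "left": 90.0, "right": -90.0}  # an "up" pose would use pitch
TOLERANCE_DEG = 10.0

def estimate_yaw(keypoints: dict) -> float:
    """Crude yaw estimate from the nose's horizontal offset relative to the eyes."""
    (lx, _), (rx, _), (nx, _) = keypoints["left_eye"], keypoints["right_eye"], keypoints["nose"]
    eye_mid_x = (lx + rx) / 2.0
    eye_span = abs(rx - lx) or 1.0  # avoid division by zero in degenerate frames
    offset = max(-1.0, min(1.0, (eye_mid_x - nx) / eye_span))
    return offset * 90.0            # map the normalised offset to roughly +/-90 degrees

def select_frames(frames, detect_keypoints, step: int = 5):
    """Return {pose_name: frame} for frames whose shooting angle meets the preset condition."""
    selected = {}
    for frame in frames[::step]:                  # e.g. pick one frame every 5 frames
        yaw = estimate_yaw(detect_keypoints(frame))
        for name, target in TARGET_YAWS.items():
            if name not in selected and abs(yaw - target) <= TOLERANCE_DEG:
                selected[name] = frame
    return selected
```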
This embodiment has the advantage that selecting frames from a large number of continuous video frames based on the shooting angle makes the selected person images better meet the shooting-angle requirements, minimizing the error introduced by the user's subjective judgment of the shooting angle.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
The process of performing three-dimensional reconstruction on the person image to obtain the corresponding three-dimensional virtual human is described in detail below.
In step 320, three-dimensional reconstruction is performed on the person image based on the depth information to obtain a three-dimensional virtual human corresponding to the person image.
In one embodiment, performing three-dimensional reconstruction on the person image based on the depth information to obtain a three-dimensional virtual human corresponding to the person image includes:
performing three-dimensional reconstruction on the person image based on the depth information to obtain a three-dimensional model corresponding to the person image;
and performing image rendering on the three-dimensional model based on the texture information of the person image to obtain a three-dimensional virtual human corresponding to the person image.
It can be understood that the depth information reflects the depth of each pixel in three-dimensional space. In this embodiment, the positions of the pixels of the person image in three-dimensional space are restored based on the depth information, yielding a three-dimensional model corresponding to the person image. After the three-dimensional model is obtained, the surface texture of the person image is rendered onto the surface of the three-dimensional model based on the texture information of the person image, yielding the three-dimensional virtual human corresponding to the person image.
For example: after a head image of Xiaoming with depth information is obtained, three-dimensional reconstruction is performed on the head image based on the depth information to obtain a three-dimensional model of Xiaoming's head; the surface texture of Xiaoming's head is then rendered onto the surface of that three-dimensional model based on the texture information of the head image, yielding a three-dimensional virtual human consistent with Xiaoming's facial appearance.
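The core of the reconstruction step, restoring each pixel's 3D position from its depth value, can be sketched with a pinhole camera model. This is a minimal numpy illustration under assumed intrinsics (fx, fy, cx, cy); the patent does not specify a camera model.

```python
# Minimal sketch: back-project an HxW depth map into 3D points under assumed
# pinhole intrinsics. The intrinsic values below are illustrative only.
import numpy as np

def backproject(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Turn an HxW depth map into an (H*W, 3) array of 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx  # horizontal position grows with depth
    y = (v - cy) * z / fy  # vertical position grows with depth
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Example with a synthetic 480x640 depth map; real depth would come from the camera.
points = backproject(np.random.uniform(0.3, 1.0, (480, 640)),
                     fx=525.0, fy=525.0, cx=320.0, cy=240.0)
```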
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
In one embodiment, performing three-dimensional reconstruction on the person image based on the depth information to obtain a three-dimensional model corresponding to the person image includes: processing the depth information with a preset RGBD algorithm to obtain the three-dimensional model corresponding to the person image.
An RGBD algorithm is an algorithm that three-dimensionally reconstructs a two-dimensional image on the basis of RGB (R: red; G: green; B: blue) data and depth map (D: depth) data.
In this embodiment, the RGBD algorithm is used to process the depth information. Specifically, after the person image and its depth map are obtained, the RGBD algorithm performs three-dimensional reconstruction of the two-dimensional person image, obtaining the three-dimensional model corresponding to the person image.
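The patent names only "a preset RGBD algorithm" and does not identify a library. As one hedged, concrete possibility, the open-source Open3D library can fuse a color image and a depth map into a point cloud and then a surface mesh; the file names and camera intrinsics below are placeholders.

```python
# One possible RGBD pipeline using Open3D (an assumption; the patent does not
# prescribe this library). File names are placeholders.
import open3d as o3d

color = o3d.io.read_image("head_front_color.png")
depth = o3d.io.read_image("head_front_depth.png")
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(color, depth)

# Assumed intrinsics of a PrimeSense-style depth camera.
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault)
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)

# Poisson reconstruction turns the oriented point cloud into a surface mesh.
pcd.estimate_normals()
mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)
```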
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
The process of driving the three-dimensional virtual human is described in detail below.
In step 330, driving information that triggers driving is acquired.
In step 340, the three-dimensional virtual human is driven to execute the corresponding action based on the driving information.
In one embodiment, acquiring driving information that triggers driving includes: acquiring preset driving information that triggers driving.
Driving the three-dimensional virtual human to execute a corresponding action based on the driving information includes: driving the three-dimensional virtual human to execute the corresponding preset action based on the driving information.
In this embodiment, the driving information that triggers driving is preset inside the system that drives the three-dimensional virtual human. If the driving is mainly controlled by the client, the system resides in the client; if it is mainly controlled by the cloud server, the system resides in the cloud server. Correspondingly, the action that the three-dimensional virtual human is to execute for each piece of driving information is also preset.
As the system's internal business logic progresses and a corresponding event is triggered, the system acquires the internally preset driving information that triggers driving, and thus drives the three-dimensional virtual human to execute the preset action.
For example: the driving of the three-dimensional virtual human is mainly controlled by an RPG (role-playing game) system in the cloud server. With the method provided by the disclosure, when Xiaoming plays the RPG game on his mobile phone terminal, a three-dimensional virtual human corresponding to Xiaoming is generated on the mobile phone terminal, and character A in the game is replaced by this three-dimensional virtual human through the game's system function. The driving information that triggers character A to execute corresponding actions is preset in the game system in the cloud server: the driving information that triggers character A to run is "a click on option A", and the driving information that triggers character A to drink water is "a click on option B". Thus, when a click on option A is detected, character A is controlled to run; when a click on option B is detected, character A is controlled to drink water.
After character A is replaced by the three-dimensional virtual human corresponding to Xiaoming, the control logic of the three-dimensional virtual human is the same as that of character A. Therefore, if the game system in the cloud server detects a click on option A, it controls the three-dimensional virtual human in the mobile phone terminal to run on the game interface; if it detects a click on option B, it controls the three-dimensional virtual human to drink water on the game interface.
It is understood that preset driving information that triggers driving does not necessarily require a specific user action. For example: the game system in the cloud server presets that character A laughs on the game interface of the mobile phone terminal every 5 minutes. After the three-dimensional virtual human corresponding to Xiaoming replaces character A, every 5 minutes the game system in the cloud server acquires the driving information from the time information and controls the three-dimensional virtual human in the mobile phone terminal to laugh on the game interface.
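The following is a minimal sketch of the preset trigger-to-action mapping this embodiment describes. The event names and the avatar.perform call are illustrative assumptions, not the patent's interface.

```python
# Hypothetical preset mapping from trigger events to preset actions.
PRESET_ACTIONS = {
    "click_option_a": "run",
    "click_option_b": "drink_water",
    "timer_5_minutes": "laugh",  # time-based triggers need no explicit user action
}

def on_driving_info(event: str, avatar) -> None:
    """Look up the preset action for a trigger event and drive the avatar."""
    action = PRESET_ACTIONS.get(event)
    if action is not None:
        avatar.perform(action)  # avatar.perform stands in for the client's driver
```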
This embodiment has the advantage that, within a relatively closed system, a virtual object whose original behavior is relatively fixed (such as a game character) can execute the corresponding behavior with the user's appearance (for example, running or laughing with the user's appearance), improving the fidelity with which the three-dimensional virtual human executes actions and thus the intimacy of the interaction between the user and the system.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
In one embodiment, driving the three-dimensional virtual human to execute a corresponding preset action based on the driving information includes:
acquiring preset mouth shape data and preset voice data corresponding to the driving information;
and driving the three-dimensional virtual human, based on the voice data and the mouth shape data, to emit the voice corresponding to the voice data with the mouth shape corresponding to the mouth shape data.
In this embodiment, the data corresponding to the acquired driving information that drives the three-dimensional virtual human's action are preset mouth shape data and preset voice data; correspondingly, on the basis of the mouth shape data and voice data, the three-dimensional virtual human is driven to emit the corresponding voice with the corresponding mouth shape.
For example: for the driving information "a click on option B", the corresponding mouth shape data (data driving the mouth to open and close regularly) and corresponding voice data (data driving a drinking sound to be emitted) are preset. When a click on option B is detected, the three-dimensional virtual human's mouth is driven to open and close regularly according to the mouth shape data, and the drinking sound is emitted according to the voice data. Specifically, the mouth can be driven to open and close while the terminal's speaker plays the drinking sound, achieving the effect that the three-dimensional virtual human makes a drinking sound as its mouth opens and closes.
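The following sketch shows one way to apply mouth shape data and voice data together, as the paragraph above requires. The keyframe format and the speaker/avatar interfaces are assumptions, not the patent's specification.

```python
import time

def drive_mouth_and_voice(avatar, speaker, mouth_keyframes, voice_clip):
    """Play a voice clip while posing the mouth at each keyframe's timestamp.

    mouth_keyframes: list of (timestamp_seconds, viseme) pairs sorted by time.
    avatar.set_mouth and speaker.play_async are illustrative stand-ins.
    """
    speaker.play_async(voice_clip)  # start the audio without blocking
    start = time.monotonic()
    for timestamp, viseme in mouth_keyframes:
        delay = timestamp - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)       # wait until this keyframe's moment
        avatar.set_mouth(viseme)    # then pose the mouth accordingly
```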
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
In one embodiment, driving the three-dimensional virtual human to execute a corresponding preset action based on the driving information includes:
acquiring preset posture data corresponding to the driving information;
and driving the three-dimensional virtual human, based on the posture data, to execute the posture corresponding to the posture data.
In this embodiment, the data corresponding to the acquired driving information that drives the three-dimensional virtual human's action is preset posture data; correspondingly, on the basis of the posture data, the three-dimensional virtual human is driven to execute the corresponding posture.
For example: the corresponding posture data, data of driving the running posture, is set to the driving information "click on option a" in advance. And when the click on the option A is detected, driving the three-dimensional virtual human to make a running gesture according to the gesture data.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
In one embodiment, the driving information that triggers driving comes from outside the system that drives the three-dimensional virtual human, mainly from an external user. Since the specific content of externally triggered driving information is difficult to exhaust, it is difficult to preset a corresponding action for each case internally. Therefore, in this embodiment, when driving information that triggers driving is acquired from outside, the driving information must be processed and understood before the three-dimensional virtual human is driven to execute the corresponding action.
In one embodiment, driving the three-dimensional virtual human to execute a corresponding action based on the driving information includes: driving the three-dimensional virtual human to execute the corresponding imitative action based on the driving information.
In this embodiment, after driving information that triggers driving is acquired from outside, it is processed to drive the three-dimensional virtual human to execute the corresponding imitative action, so that the three-dimensional virtual human can imitate the source user of the driving information.
For example: the driving of the three-dimensional virtual human is mainly controlled by a virtual anchor system in the cloud server. With the method provided by the disclosure, a three-dimensional virtual human consistent with the anchor Xiaohong's facial appearance is generated at her anchor terminal, and the video stream and voice stream are pushed through the virtual anchor system to each viewer's terminal, so that each viewer watches the content presented with the three-dimensional virtual human as the acting carrier. Specifically, the anchor terminal uploads Xiaohong's captured actions and collected voice to the virtual anchor system in the cloud server. After receiving the action information and voice information from Xiaohong, the virtual anchor system drives the three-dimensional virtual human in each viewer terminal to make the action corresponding to the action information and emit the voice corresponding to the voice information on the playback interface.
This embodiment has the advantage that a three-dimensional virtual human substantially consistent with the user's appearance is driven to imitate the user's behavior, improving the fidelity with which the three-dimensional virtual human executes actions, so that users can invest emotion more easily and the interactive experience between user and three-dimensional virtual human improves.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
In one embodiment, driving the three-dimensional virtual human to execute a corresponding action based on the driving information includes: driving the three-dimensional virtual human to execute the corresponding responsive action based on the driving information. In this embodiment, after driving information that triggers driving is acquired from outside, it is processed and understood, and the three-dimensional virtual human is driven to execute the corresponding responsive action, so that the three-dimensional virtual human can respond to the source user of the driving information.
For example: the driving of the three-dimensional virtual human is mainly controlled by a chat robot system in the cloud server. With the method provided by the disclosure, a three-dimensional virtual human consistent with the user Xiaogang's facial appearance is generated in the chat interface of his mobile phone terminal, and this three-dimensional virtual human can converse and chat with Xiaogang as a chat robot. Specifically, Xiaogang asks the three-dimensional virtual human in the chat interface a question by voice. After collecting Xiaogang's voice, the mobile phone terminal uploads it to the chat robot system in the cloud server. After receiving the voice information from Xiaogang, the chat robot system processes and understands it, and drives the three-dimensional virtual human in the chat interface to emit the corresponding response voice with the corresponding mouth shape.
This embodiment has the advantage that a three-dimensional virtual human substantially consistent with the user's appearance is driven to respond to the user, improving the fidelity with which the three-dimensional virtual human executes actions and thus the interactive experience between user and three-dimensional virtual human.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
The process of externally driving the three-dimensional virtual human to respond is described in detail below.
In one embodiment, acquiring driving information that triggers driving includes: acquiring driving information in text form that triggers driving.
Driving the three-dimensional virtual human to execute a corresponding action based on the driving information includes:
processing the driving information based on a preset natural language understanding technique to obtain driving data corresponding to the driving information;
and driving the three-dimensional virtual human to execute the corresponding action based on the driving data.
Natural Language Understanding (NLU) technology is a technology for processing and understanding natural language with a computer.
In this embodiment, the three-dimensional virtual human can be driven by driving information in text form. After the driving information in text form is acquired, it is processed based on a preset natural language understanding technique to obtain the corresponding driving data, and the three-dimensional virtual human is then driven to execute the corresponding action based on that driving data.
Specifically, after the driving information is processed with the preset natural language understanding technique, the corresponding driving data may be data for imitation, in which case the three-dimensional virtual human is driven to execute the corresponding imitative action; or it may be data for response, in which case the three-dimensional virtual human is driven to execute the corresponding responsive action.
For example: Xiaogang inputs the text "the weather is quite cool today" through the keyboard of his mobile phone terminal to chat with the three-dimensional virtual human, consistent with his own facial appearance, that serves as the chat robot in the chat interface. After receiving the text, the chat robot system in the cloud server processes and understands it based on a preset natural language understanding technique and generates driving data that can drive the three-dimensional virtual human to make a corresponding response; based on that driving data, the three-dimensional virtual human in the chat interface is driven to make the corresponding action, responding to the user that a cold snap has arrived recently and more clothes should be worn.
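A toy end-to-end sketch of this text-driven path follows: the understand() rule table stands in for a real NLU model, and avatar.say for the voice-and-mouth driver; both are assumptions, not the patent's interfaces.

```python
def understand(text: str) -> str:
    """Toy intent classifier; a production system would use a trained NLU model."""
    if "weather" in text and ("cool" in text or "cold" in text):
        return "advise_warm_clothes"
    return "small_talk"

RESPONSES = {
    "advise_warm_clothes": "A cold snap has arrived recently, put on some more clothes.",
    "small_talk": "I see.",
}

def drive_from_text(avatar, text: str) -> None:
    """NLU maps the text to an intent; the intent maps to driving data."""
    intent = understand(text)
    avatar.say(RESPONSES[intent])  # avatar.say stands in for voice + mouth driving
```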
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
In one embodiment, acquiring driving information that triggers driving includes: acquiring driving information in voice form that triggers driving.
Driving the three-dimensional virtual human to execute a corresponding action based on the driving information includes:
performing speech recognition on the driving information to obtain the corresponding driving information in text form;
processing the driving information based on a preset natural language understanding technique to obtain driving data corresponding to the driving information;
and driving the three-dimensional virtual human to execute the corresponding action based on the driving data.
The speech recognition may be performed by a preset Automatic Speech Recognition (ASR) technology, which recognizes speech and converts it into corresponding text.
In this embodiment, the three-dimensional virtual human can be driven by driving information in voice form. After the driving information in voice form is acquired, speech recognition is performed on it to obtain the corresponding driving information in text form. That text is processed based on a preset natural language understanding technique to obtain the corresponding driving data, and the three-dimensional virtual human is then driven to execute the corresponding action based on that driving data.
As in the text-driven case, the driving data obtained after natural language understanding may be data for imitation, driving the three-dimensional virtual human to execute the corresponding imitative action, or data for response, driving it to execute the corresponding responsive action.
For example: the small mobile phone terminal inputs voice information through a microphone of the mobile phone terminal, namely 'the weather is cool today' so as to chat with a three-dimensional virtual human which is consistent with the appearance of the face of the small mobile phone terminal in the chat interface and serves as a chat robot. The chat robot system in the cloud server receives the voice information, performs voice recognition on the voice information, and obtains text information corresponding to the voice information; processing and understanding the text information based on a preset natural language understanding technology to generate driving data capable of driving the three-dimensional virtual human to make corresponding response; and driving the three-dimensional virtual human in the chat interface to make corresponding actions based on the driving data, so that the three-dimensional virtual human responds to the user to express that the user has come from the recent cold tide and needs to add some clothes.
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
In one embodiment, processing the driving information based on a preset natural language understanding technique to obtain driving data corresponding to the driving information includes:
processing the driving information based on the preset natural language understanding technique to obtain a driving intention corresponding to the driving information;
and processing the driving intention based on a preset speech synthesis technique to obtain voice data and mouth shape data corresponding to the driving intention.
Driving the three-dimensional virtual human to execute a corresponding action based on the driving data includes: driving the three-dimensional virtual human, based on the voice data and the mouth shape data, to emit the voice corresponding to the voice data with the mouth shape corresponding to the mouth shape data.
The driving intention is the intention expressed by the content of the acquired driving information once it is combined with the specific scene. In this embodiment, the acquired driving intention is information in text form.
For example: in a virtual anchor scene, the obtained driving information is the sentence "the weather is a little cold today" spoken by the anchor; after combination with the scene and processing by the natural language understanding technology, the obtained driving intention is "the weather is a little cold today". That is, the purpose of the driving information in this scene is to drive the three-dimensional virtual human to say the sentence "the weather is a little cold today".
Another example: in the chat robot scene, the obtained driving information is the sentence "the weather is a little cool today" spoken by the user; after combination with the scene and processing by the natural language understanding technology, the obtained driving intention is "a cold wave has arrived recently, so some more clothes are needed". That is, the purpose of the driving information in this scene is to drive the three-dimensional virtual human to say the sentence "a cold wave has arrived recently, so some more clothes are needed".
Speech synthesis technology, i.e., TTS (Text To Speech) technology, is a technology capable of converting text into speech.
In this embodiment, the three-dimensional virtual human is driven to emit the corresponding voice with the corresponding mouth shape. Specifically, after the driving information is acquired, the corresponding driving intention is obtained based on the natural language understanding technology; voice data and mouth shape data corresponding to the driving intention are then obtained based on the speech synthesis technology; the three-dimensional virtual human is driven to make the mouth shape corresponding to the mouth shape data while the loudspeaker of the client is driven to emit the voice corresponding to the voice data, achieving the effect that the three-dimensional virtual human emits the corresponding voice with the corresponding mouth shape.
For example: after the driving intention "a cold wave has arrived recently, so some more clothes are needed" is obtained, voice data and mouth shape data for this sentence are obtained based on the speech synthesis technology. The voice data drives the loudspeaker of the mobile phone terminal to play the sentence, while the mouth shape data drives the three-dimensional virtual human in the chat interface to make the corresponding mouth shape, achieving the effect that the three-dimensional virtual human says "a cold wave has arrived recently, so some more clothes are needed" with the matching mouth shape.
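To make the coupling of voice data and mouth shape data concrete, the sketch below drives both from one timeline. The phoneme-to-viseme table, the timings, and the placeholder audio are invented for this illustration; a real speech synthesis engine would supply all of them.

```python
# Assumed phoneme-to-viseme mapping; real TTS engines ship their own tables.
PHONEME_TO_VISEME = {"k": "wide", "ao": "round", "m": "closed"}

def synthesize(intention_text: str):
    # Pretend TTS output: placeholder audio plus a per-phoneme timeline.
    phonemes = [("k", 0.00), ("ao", 0.12), ("m", 0.25)]  # (phoneme, start time in s)
    audio = b"\x00" * 1024                               # placeholder PCM samples
    mouth_track = [(t, PHONEME_TO_VISEME[p]) for p, t in phonemes]
    return audio, mouth_track

def play_with_mouth_shapes(audio: bytes, mouth_track) -> None:
    # Drive the loudspeaker and the avatar's face from the same timeline so
    # the mouth shapes stay aligned with the synthesized voice.
    for t, viseme in mouth_track:
        print(f"t={t:.2f}s -> mouth shape '{viseme}'")

audio, track = synthesize("A cold wave has arrived recently; put on more clothes.")
play_with_mouth_shapes(audio, track)
```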
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
In an embodiment, processing the driving information based on a preset natural language understanding technology to obtain driving data corresponding to the driving information includes:
processing the driving information based on a preset natural language understanding technology to obtain a driving intention corresponding to the driving information;
and acquiring gesture data corresponding to the driving intention based on a matching result of the driving intention and a preset gesture data set.
Driving the three-dimensional virtual human to execute the corresponding responsive action based on the driving data includes: driving the three-dimensional virtual human to execute the gesture corresponding to the gesture data.
In this embodiment, the three-dimensional virtual human is driven to make the corresponding gesture. Specifically, after the driving information is acquired, the corresponding driving intention is obtained based on the natural language understanding technology; the driving intention is matched against a preset gesture data set, and the gesture data corresponding to the driving intention is obtained from the matching result; the three-dimensional virtual human is then driven to execute the gesture corresponding to that gesture data.
For example: a gesture data set is preset in the chat robot system in the cloud server, and each piece of gesture data stored in the set can drive the three-dimensional virtual human to make a corresponding gesture. Specifically, the gesture data set stores gesture data for driving the three-dimensional virtual human to make a "winning gesture", gesture data for a "running gesture", and gesture data for a "frog-jump gesture".
The user then inputs the voice information "learn a frog jump" through the microphone of the mobile phone terminal. The chat robot system in the cloud server performs natural language understanding on the voice information and obtains the driving intention "frog jump". The driving intention is matched against the gesture data set, and the matching result is the gesture data for driving the three-dimensional virtual human to make a frog-jump gesture; that gesture data then drives the three-dimensional virtual human in the chat interface to make the frog-jump gesture.
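The matching between a driving intention and the preset gesture data set can be pictured with the sketch below; the gesture names, the clip file names, and the substring match are assumptions of this illustration, since the disclosure only requires that some matching result select the gesture data.

```python
# Assumed preset gesture data set: intention keyword -> gesture data.
GESTURE_DATA_SET = {
    "winning":   {"clip": "winning_gesture.anim"},
    "running":   {"clip": "running_gesture.anim"},
    "frog jump": {"clip": "frog_jump_gesture.anim"},
}

def match_gesture(intention: str):
    # Return the gesture data of the first keyword found in the intention.
    for keyword, gesture_data in GESTURE_DATA_SET.items():
        if keyword in intention.lower():
            return gesture_data
    return None  # no match: drive no gesture

gesture = match_gesture("frog jump")
if gesture is not None:
    print("driving avatar with", gesture["clip"])
```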
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
The implementation logic of the three-dimensional virtual human drive according to an embodiment of the present disclosure is described below with reference to fig. 4 to show the general implementation logic of the present disclosure in practical applications.
FIG. 4 shows the implementation logic of the three-dimensional virtual human drive according to an embodiment of the disclosure: three-dimensional reconstruction is performed on the basis of a video containing the user's figure image to obtain the model of the user's three-dimensional virtual human.
After the user's voice is detected by voice activity detection technology (VAD, Voice Activity Detection, a technology for detecting whether a voice signal exists), the voice is sent to the cloud server. The cloud server performs speech recognition on the voice to obtain the corresponding text; natural language understanding is then performed on that text to obtain the text of the response result; speech synthesis is performed on the response text, and the facial expression of the three-dimensional virtual human model is driven accordingly. If necessary, while the facial expression of the model is driven, the loudspeaker is driven to play the voice corresponding to the response result, achieving the effect that the three-dimensional virtual human interacts with the user by voice with the matching mouth shape.
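A minimal sketch of the VAD-gated client loop follows. The energy threshold stands in for a real voice activity detector, and send_to_cloud stands in for the server-side chain of speech recognition, natural language understanding, and speech synthesis; both are assumptions of this sketch.

```python
def has_voice(frame, threshold=500.0) -> bool:
    # Toy VAD: mean absolute amplitude over a frame of PCM samples.
    return sum(abs(s) for s in frame) / len(frame) > threshold

def send_to_cloud(frame) -> str:
    # Stand-in for the cloud side (ASR -> NLU -> TTS), returning the response
    # that will drive the avatar's facial expression and the loudspeaker.
    return "response"

frames = [[0, 3, -2, 1], [900, -1200, 800, -950]]  # silence, then speech
for frame in frames:
    if has_voice(frame):  # only voiced frames are sent to the cloud server
        print("cloud response:", send_to_cloud(frame))
```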
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
The implementation logic of the three-dimensional virtual human drive according to an embodiment of the present disclosure is described in detail below with reference to fig. 5 to show the detailed implementation logic of the present disclosure in practical application.
FIG. 5 shows the implementation logic of the three-dimensional virtual human drive according to an embodiment of the disclosure: in controlling the face of the three-dimensional virtual human so that it interacts with the user, the implementation logic can be divided into the following two parts.
1. The video side: in this embodiment, video frames containing the figure image and the corresponding depth maps are captured by the mobile phone camera; video frames and depth maps are selected according to the shooting angle of the figure image; after frames whose shooting angles meet the preset condition are selected, the server performs three-dimensional reconstruction and model rendering of the figure image based on an RGBD algorithm, thereby obtaining the model of the three-dimensional virtual human.
2. The audio side: in this embodiment, audio is collected through the microphone of the mobile phone; voice activity detection is performed on the collected audio to determine whether the user's voice is present; once the user's voice is acquired, speech recognition is performed on it to obtain the corresponding text; natural language understanding is performed on that text to obtain the response result; and speech synthesis is performed on the response result to obtain the corresponding voice data and mouth shape data.
The face of the three-dimensional virtual human model obtained on the video side is then controlled according to the mouth shape data of the response result obtained on the audio side, while the application is driven to play the voice according to the corresponding voice data, achieving the effect that the three-dimensional virtual human interacts with the user by voice with the matching mouth shape.
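One way to picture this audio/video coupling: on every render tick, look up the mouth shape for the current audio playback time. The linear scan below over a mouth-shape track, and the neutral "closed" default, are assumptions of this sketch rather than a mechanism fixed by the disclosure.

```python
def mouth_shape_at(mouth_track, playback_time: float) -> str:
    # mouth_track: list of (start_time, viseme) pairs sorted by start_time.
    current = "closed"  # assumed neutral mouth shape before speech starts
    for start, viseme in mouth_track:
        if start <= playback_time:
            current = viseme
        else:
            break
    return current

track = [(0.00, "wide"), (0.12, "round"), (0.25, "closed")]
for tick in (0.05, 0.15, 0.30):  # pretend render ticks during playback
    print(f"{tick:.2f}s ->", mouth_shape_at(track, tick))
```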
It should be noted that the embodiment is only an exemplary illustration, and should not limit the function and the scope of the disclosure.
According to an embodiment of the present disclosure, as shown in fig. 6, there is also provided a three-dimensional virtual human driving apparatus, including:
a first obtaining module 410, configured to obtain a figure image with depth information;
a second obtaining module 420, configured to perform three-dimensional reconstruction on the figure image based on the depth information to obtain a three-dimensional virtual human corresponding to the figure image;
a third obtaining module 430 configured to obtain driving information of the trigger driver;
and the driving module 440 is configured to drive the three-dimensional virtual human to execute a corresponding action based on the driving information.
In an exemplary embodiment of the disclosure, acquisition slots corresponding to the respective shooting angles are set in advance, and the first obtaining module 410 is configured to: acquire the figure image with depth information for each shooting angle from the corresponding acquisition slot.
In an exemplary embodiment of the present disclosure, the first obtaining module 410 is configured to:
acquiring a video frame containing a figure image and depth information of the video frame;
extracting figure key points in the video frame based on a preset key point extraction algorithm;
acquiring a shooting angle of a character image in the video frame based on the character key point;
and if the shooting angle meets a preset condition, acquiring the figure image in the video frame and the depth information of the figure image.
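As a sketch of this angle check, the yaw of the face can be estimated from a few keypoints and compared against a preset range; the keypoint layout and the 30-degree tolerance below are assumptions of this illustration, not values fixed by the disclosure.

```python
import math

def estimate_yaw(left_eye, right_eye, nose) -> float:
    # Toy yaw estimate: how far the nose deviates horizontally from the eye
    # midpoint, normalized by the eye distance and mapped to degrees.
    mid_x = (left_eye[0] + right_eye[0]) / 2.0
    eye_dist = abs(right_eye[0] - left_eye[0])
    return math.degrees(math.atan2(nose[0] - mid_x, eye_dist))

def frame_usable(keypoints, max_yaw_deg=30.0) -> bool:
    yaw = estimate_yaw(keypoints["left_eye"], keypoints["right_eye"], keypoints["nose"])
    return abs(yaw) <= max_yaw_deg  # preset condition on the shooting angle

kp = {"left_eye": (100, 120), "right_eye": (160, 120), "nose": (131, 150)}
print(frame_usable(kp))  # True: roughly frontal, so keep the image and its depth
```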
In an exemplary embodiment of the disclosure, the second obtaining module 420 is configured to:
based on the depth information, performing three-dimensional reconstruction on the figure image to obtain a three-dimensional model corresponding to the figure image;
and performing image rendering on the three-dimensional model based on texture information of the figure image to obtain the three-dimensional virtual human corresponding to the figure image.
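The reconstruction step can be pictured with the pinhole back-projection below, which lifts valid depth pixels to 3D points. The camera intrinsics are assumed values; a full RGBD pipeline would further fuse views into a mesh and render the texture of the figure image onto it, as this module describes.

```python
def backproject(depth, fx=525.0, fy=525.0, cx=1.0, cy=1.0):
    # Lift each valid depth pixel (u, v, z) to a 3D point with the pinhole model.
    points = []
    for v, row in enumerate(depth):        # v: pixel row
        for u, z in enumerate(row):        # u: pixel column, z: depth in metres
            if z > 0:                      # skip pixels with missing depth
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points

depth_map = [[0.0, 1.2, 1.2],
             [1.3, 1.3, 0.0]]              # tiny 2x3 depth image (metres)
print(backproject(depth_map))
```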
In an exemplary embodiment of the disclosure, the third obtaining module 430 is configured to: acquiring preset driving information of trigger driving; the drive module 440 is configured to:
acquiring preset mouth shape data corresponding to the driving information and corresponding preset voice data;
and driving the three-dimensional virtual human to send out the voice corresponding to the voice data according to the mouth shape corresponding to the mouth shape data based on the voice data and the mouth shape data.
In an exemplary embodiment of the present disclosure, the driving module 440 is configured to:
acquiring preset posture data corresponding to the driving information;
and driving the three-dimensional virtual human to execute a gesture corresponding to the gesture data based on the gesture data.
In an exemplary embodiment of the disclosure, the third obtaining module 430 is configured to: acquiring driving information of trigger driving in a text form; the drive module 440 is configured to:
processing the driving information based on a preset natural language understanding technology to obtain driving data corresponding to the driving information;
and driving the three-dimensional virtual human to execute corresponding actions based on the driving data.
In an exemplary embodiment of the disclosure, the third obtaining module 430 is configured to: acquiring driving information of trigger driving in a voice form; the drive module 440 is configured to:
carrying out voice recognition on the driving information to obtain the driving information in a corresponding text form;
processing the driving information based on a preset natural language understanding technology to obtain driving data corresponding to the driving information;
and driving the three-dimensional virtual human to execute corresponding actions based on the driving data.
In an exemplary embodiment of the present disclosure, the driving module 440 is configured to:
processing the driving information based on a preset natural language understanding technology to obtain a driving intention corresponding to the driving information;
processing the driving intention based on a preset voice synthesis technology to obtain voice data and mouth shape data corresponding to the driving intention;
and driving the three-dimensional virtual human to send out the voice corresponding to the voice data according to the mouth shape corresponding to the mouth shape data based on the voice data and the mouth shape data.
In an exemplary embodiment of the present disclosure, the driving module 440 is configured to:
processing the driving information based on a preset natural language understanding technology to obtain a driving intention corresponding to the driving information;
acquiring gesture data corresponding to the driving intention based on a matching result of the driving intention and a preset gesture data set;
and driving the three-dimensional virtual human to execute a gesture corresponding to the gesture data based on the gesture data.
The three-dimensional virtual human driving electronic device 50 according to the embodiment of the present disclosure is described below with reference to fig. 7. The three-dimensional virtual human driving electronic device 50 shown in fig. 7 is only an example, and should not bring any limitation to the functions and the range of use of the embodiment of the present disclosure.
As shown in FIG. 7, the three-dimensional virtual human driving electronic device 50 takes the form of a general-purpose computing device. Its components may include, but are not limited to: the at least one processing unit 510, the at least one memory unit 520, and a bus 530 that couples various system components including the memory unit 520 and the processing unit 510.
Wherein the storage unit stores program code that is executable by the processing unit 510 to cause the processing unit 510 to perform steps according to various exemplary embodiments of the present invention as described in the description part of the above exemplary methods of the present specification. For example, the processing unit 510 may perform the various steps as shown in fig. 3.
The memory unit 520 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM)5201 and/or a cache memory unit 5202, and may further include a read only memory unit (ROM) 5203.
Storage unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 530 may be one or more of any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The three-dimensional virtual human driving electronic device 50 may also communicate with one or more external devices 600 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with it, and/or with any device (e.g., a router, a modem, etc.) that enables it to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 550. Also, the three-dimensional virtual human driving electronic device 50 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 560. As shown, the network adapter 560 communicates with the other modules of the device via bus 530. It should be appreciated that, although not shown, other hardware and/or software modules may be used in conjunction with the device, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method described in the above method embodiment section.
According to an embodiment of the present disclosure, there is also provided a program product for implementing the method in the above method embodiment, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read only memory (ROM), an erasable programmable read only memory (EPROM or flash memory), an optical fiber, a portable compact disk read only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java or C++, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (14)

1. A three-dimensional virtual human driving method is characterized by comprising the following steps:
acquiring a figure image with depth information;
based on the depth information, performing three-dimensional reconstruction on the figure image to obtain a three-dimensional virtual human corresponding to the figure image;
acquiring driving information of a trigger driver;
and driving the three-dimensional virtual human to execute corresponding actions based on the driving information.
2. The method of claim 1, wherein acquisition slots corresponding to respective shooting angles are set in advance, and acquiring the figure image with depth information comprises: acquiring the figure image with depth information for each shooting angle from the corresponding acquisition slot.
3. The method of claim 1, wherein obtaining the image of the person with the depth information comprises:
acquiring a video frame containing a figure image and depth information of the video frame;
extracting figure key points in the video frame based on a preset key point extraction algorithm;
acquiring a shooting angle of a character image in the video frame based on the character key point;
and if the shooting angle meets a preset condition, acquiring the figure image in the video frame and the depth information of the figure image.
4. The method of claim 1, wherein performing three-dimensional reconstruction on the figure image based on the depth information to obtain a three-dimensional virtual human corresponding to the figure image comprises:
based on the depth information, performing three-dimensional reconstruction on the figure image to obtain a three-dimensional model corresponding to the figure image;
and performing image rendering on the three-dimensional model based on texture information of the figure image to obtain the three-dimensional virtual human corresponding to the figure image.
5. The method of claim 1, wherein obtaining actuation information for triggering actuation comprises: acquiring preset driving information of trigger driving;
driving the three-dimensional virtual human to execute corresponding actions based on the driving information, wherein the actions comprise: and driving the three-dimensional virtual human to execute a corresponding preset action based on the driving information.
6. The method according to claim 5, wherein driving the three-dimensional virtual human to execute the corresponding preset action based on the driving information comprises:
acquiring preset mouth shape data corresponding to the driving information and corresponding preset voice data;
and driving the three-dimensional virtual human to send out the voice corresponding to the voice data according to the mouth shape corresponding to the mouth shape data based on the voice data and the mouth shape data.
7. The method according to claim 5, wherein driving the three-dimensional virtual human to execute the corresponding preset action based on the driving information comprises:
acquiring preset posture data corresponding to the driving information;
and driving the three-dimensional virtual human to execute a gesture corresponding to the gesture data based on the gesture data.
8. The method of claim 1, wherein obtaining actuation information for triggering actuation comprises: acquiring driving information of trigger driving in a text form;
driving the three-dimensional virtual human to execute corresponding actions based on the driving information, wherein the actions comprise:
processing the driving information based on a preset natural language understanding technology to obtain driving data corresponding to the driving information;
and driving the three-dimensional virtual human to execute corresponding actions based on the driving data.
9. The method of claim 1, wherein obtaining actuation information for triggering actuation comprises: acquiring driving information of trigger driving in a voice form;
driving the three-dimensional virtual human to execute corresponding actions based on the driving information, wherein the actions comprise:
carrying out voice recognition on the driving information to obtain the driving information in a corresponding text form;
processing the driving information based on a preset natural language understanding technology to obtain driving data corresponding to the driving information;
and driving the three-dimensional virtual human to execute corresponding actions based on the driving data.
10. The method according to claim 8 or 9, wherein processing the driving information based on a preset natural language understanding technology to obtain driving data corresponding to the driving information comprises:
processing the driving information based on a preset natural language understanding technology to obtain a driving intention corresponding to the driving information;
processing the driving intention based on a preset voice synthesis technology to obtain voice data and mouth shape data corresponding to the driving intention;
driving the three-dimensional virtual human to execute corresponding actions based on the driving data, wherein the actions comprise: and driving the three-dimensional virtual human to send out the voice corresponding to the voice data according to the mouth shape corresponding to the mouth shape data based on the voice data and the mouth shape data.
11. The method according to claim 8 or 9, wherein processing the driving information based on a preset natural language understanding technology to obtain driving data corresponding to the driving information comprises:
processing the driving information based on a preset natural language understanding technology to obtain a driving intention corresponding to the driving information;
acquiring gesture data corresponding to the driving intention based on a matching result of the driving intention and a preset gesture data set;
driving the three-dimensional virtual human to execute corresponding actions based on the driving data, wherein the actions comprise: and driving the three-dimensional virtual human to execute a gesture corresponding to the gesture data based on the gesture data.
12. A three-dimensional virtual human driving device, characterized in that the device comprises:
a first acquisition module, configured to acquire a figure image with depth information;
a second acquisition module, configured to perform three-dimensional reconstruction on the figure image based on the depth information to obtain a three-dimensional virtual human corresponding to the figure image;
a third acquisition module, configured to acquire driving information of trigger driving;
and a driving module, configured to drive the three-dimensional virtual human to execute a corresponding action based on the driving information.
13. A three-dimensional virtual human driving electronic device, comprising:
a memory storing computer readable instructions;
a processor reading computer readable instructions stored by the memory to perform the method of any of claims 1-11.
14. A computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method of any one of claims 1-11.
CN201911267521.2A 2019-12-11 2019-12-11 Three-dimensional virtual human driving method and device, electronic equipment and storage medium Pending CN111063024A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911267521.2A CN111063024A (en) 2019-12-11 2019-12-11 Three-dimensional virtual human driving method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911267521.2A CN111063024A (en) 2019-12-11 2019-12-11 Three-dimensional virtual human driving method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111063024A true CN111063024A (en) 2020-04-24

Family

ID=70300628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911267521.2A Pending CN111063024A (en) 2019-12-11 2019-12-11 Three-dimensional virtual human driving method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111063024A (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693091A (en) * 2012-05-22 2012-09-26 深圳市环球数码创意科技有限公司 Method for realizing three dimensional virtual characters and system thereof
CN104461525A (en) * 2014-11-27 2015-03-25 韩慧健 Intelligent user-defined consulting platform generating system
US20170064235A1 (en) * 2015-08-27 2017-03-02 Samsung Electronics Co., Ltd. Epipolar plane single-pulse indirect tof imaging for automotives
CN109215131A (en) * 2017-06-30 2019-01-15 Tcl集团股份有限公司 The driving method and device of conjecture face
CN108305283A (en) * 2018-01-22 2018-07-20 清华大学 Human bodys' response method and device based on depth camera and basic form
WO2019190278A1 (en) * 2018-03-30 2019-10-03 Samsung Electronics Co., Ltd. Method and apparatus for compressing point clouds
CN108986190A (en) * 2018-06-21 2018-12-11 珠海金山网络游戏科技有限公司 A kind of method and system of the virtual newscaster based on human-like persona non-in three-dimensional animation
CN109285214A (en) * 2018-08-16 2019-01-29 Oppo广东移动通信有限公司 Processing method, device, electronic equipment and the readable storage medium storing program for executing of threedimensional model
CN109816773A (en) * 2018-12-29 2019-05-28 深圳市瑞立视多媒体科技有限公司 A kind of driving method, plug-in unit and the terminal device of the skeleton model of virtual portrait
CN109949412A (en) * 2019-03-26 2019-06-28 腾讯科技(深圳)有限公司 A kind of three dimensional object method for reconstructing and device
CN110211582A (en) * 2019-05-31 2019-09-06 量子动力(深圳)计算机科技有限公司 A kind of real-time, interactive intelligent digital virtual actor's facial expression driving method and system
CN110415322A (en) * 2019-07-29 2019-11-05 网易(杭州)网络有限公司 The generation method and device of the action command of virtual objects model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHOU, Wen: "Research on Three-Dimensional Human Body Reconstruction Based on an RGB-D Camera", China Master's Theses Full-text Database (Information Science and Technology) *
CHEN, Yiqiang: "Research on Multi-modal Behavior Synthesis for Virtual Humans Based on Data Mining", China Doctoral Dissertations Full-text Database (Information Science and Technology) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560622A (en) * 2020-12-08 2021-03-26 中国联合网络通信集团有限公司 Virtual object motion control method and device and electronic equipment
CN112560622B (en) * 2020-12-08 2023-07-21 中国联合网络通信集团有限公司 Virtual object action control method and device and electronic equipment
CN114979675A (en) * 2022-04-09 2022-08-30 澳克多普有限公司 Virtual mobile phone management method based on virtual live broadcast platform
CN115814421A (en) * 2023-01-10 2023-03-21 北京红棉小冰科技有限公司 Virtual human behavior self-driven scheduling generation method, device, equipment and storage medium
CN115814421B (en) * 2023-01-10 2023-10-27 北京红棉小冰科技有限公司 Virtual human behavior self-driven scheduling generation method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110531860B (en) Animation image driving method and device based on artificial intelligence
TWI778477B (en) Interaction methods, apparatuses thereof, electronic devices and computer readable storage media
US11281709B2 (en) System and method for converting image data into a natural language description
KR102503413B1 (en) Animation interaction method, device, equipment and storage medium
CN112215927B (en) Face video synthesis method, device, equipment and medium
CN112379812B (en) Simulation 3D digital human interaction method and device, electronic equipment and storage medium
US8381108B2 (en) Natural user input for driving interactive stories
US8660847B2 (en) Integrated local and cloud based speech recognition
US11640519B2 (en) Systems and methods for domain adaptation in neural networks using cross-domain batch normalization
CN111063024A (en) Three-dimensional virtual human driving method and device, electronic equipment and storage medium
JP2022505718A (en) Systems and methods for domain adaptation in neural networks using domain classifiers
US20230047858A1 (en) Method, apparatus, electronic device, computer-readable storage medium, and computer program product for video communication
CN110598576A (en) Sign language interaction method and device and computer medium
US20230368461A1 (en) Method and apparatus for processing action of virtual object, and storage medium
CN111538456A (en) Human-computer interaction method, device, terminal and storage medium based on virtual image
CN114567693B (en) Video generation method and device and electronic equipment
CN116229311A (en) Video processing method, device and storage medium
CN115526772A (en) Video processing method, device, equipment and storage medium
JP2023120130A (en) Conversation-type ai platform using extraction question response
KR20220023005A (en) Realistic Interactive Edutainment System Using Tangible Elements
CN112135152B (en) Information processing method and device
CN114972589A (en) Driving method and device for virtual digital image
CN113766295A (en) Playing processing method, device, equipment and storage medium
CN116841436A (en) Video-based interaction method, apparatus, device, storage medium, and program product
CN116863042A (en) Motion generation method of virtual object and training method of motion generation model

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
REG: Reference to a national code (country of ref document: HK; legal event code: DE; document number: 40021696)