CN113395476A - Virtual character video call method and system based on three-dimensional face reconstruction - Google Patents
Virtual character video call method and system based on three-dimensional face reconstruction
- Publication number
- CN113395476A (application CN202110632937.0A)
- Authority
- CN
- China
- Prior art keywords
- dimensional face
- video
- communication terminal
- model parameters
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 48
- 238000004891 communication Methods 0.000 claims abstract description 90
- 238000005516 engineering process Methods 0.000 claims abstract description 19
- 238000013507 mapping Methods 0.000 claims abstract description 15
- 238000009877 rendering Methods 0.000 claims abstract description 7
- 230000002194 synthesizing effect Effects 0.000 claims abstract description 7
- 238000005286 illumination Methods 0.000 claims description 11
- 238000005457 optimization Methods 0.000 claims description 4
- 238000012549 training Methods 0.000 claims description 4
- 230000009466 transformation Effects 0.000 claims description 3
- 230000007547 defect Effects 0.000 abstract description 2
- 230000006870 function Effects 0.000 description 12
- 238000010586 diagram Methods 0.000 description 8
- 230000008569 process Effects 0.000 description 6
- 230000005540 biological transmission Effects 0.000 description 4
- 238000012986 modification Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 238000012545 processing Methods 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/275—Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/08—Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Processing Or Creating Images (AREA)
Abstract
The invention provides a virtual character video call method and a virtual character video call system based on three-dimensional face reconstruction, aiming at overcoming the defects of low video call fluency and low flexibility. The method comprises the following steps: acquiring a video stream and an audio stream of a first communication terminal, or acquiring only the audio stream of the first communication terminal; inputting video stream image frames into a three-dimensional face reconstruction network to obtain predicted three-dimensional face model parameters, or inputting the audio stream into an audio prediction network to obtain predicted three-dimensional face model parameters; merging and updating preset initial three-dimensional face model parameters according to the predicted three-dimensional face model parameters, and then saving the parameters as a parameter file; transmitting the parameter file and the audio stream of the first communication terminal to a second communication terminal, which restores the corresponding three-dimensional face model from the parameter file by using a three-dimensional face reconstruction technology and maps it to a two-dimensional image plane to obtain a restored video image frame sequence; and rendering the video image frame sequence and then synthesizing it with the audio stream into a virtual character video.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a virtual character video call method and a virtual character video call system based on three-dimensional face reconstruction.
Background
With the popularization of smartphones and the rapid development of internet technologies, the ways in which people communicate have changed greatly, and video calling has become a popular mode of communication. It nevertheless has certain limitations in practical application. Firstly, in current video calls, when it is inconvenient for one communication end to turn on its camera, the video call cannot proceed normally with the other end, so video calling lacks a certain flexibility. Secondly, when a video call is made in an area with low network transmission capability, the video stutters, which greatly degrades the user experience of the video call.
At present, video call methods that construct a virtual character have been proposed; they not only reduce the amount of transmitted data to improve the fluency of the video call, but can also make video calls more interesting by replacing the character identity of the opposite communication end. For example, the virtual instant messaging method disclosed in publication No. CN110213521A (published 2019-09-06) proposes using virtual 2D/3D avatar models with the same expressions and postures as the two parties to replace their real appearance during virtual instant messaging. However, that method requires the terminal camera to capture the user's face image at all times and does not acquire head posture information; it therefore still does not remove the excessive dependence on the camera, and the problems of low flexibility and low video call fluency remain.
Disclosure of Invention
In order to overcome the defects of low video call fluency and low flexibility in the prior art, the invention provides a virtual character video call method based on three-dimensional face reconstruction and a virtual character video call system based on three-dimensional face reconstruction.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a virtual character video call method based on three-dimensional face reconstruction comprises the following steps:
s1: selecting virtual character video call modes, including a video-to-video call mode and an audio-to-video call mode:
when the video-to-video call mode is selected: acquiring a video stream and an audio stream of a first communication terminal, decomposing the video stream into image frames, and then performing model parameter prediction on a parameterized three-dimensional face by using a three-dimensional face reconstruction network to obtain predicted three-dimensional face model parameters and storing the predicted three-dimensional face model parameters as a parameter file;
when the audio-to-video call mode is selected: acquiring an audio stream of a first communication terminal, inputting the audio stream into an audio prediction network to perform model parameter prediction of a parameterized three-dimensional face, and obtaining predicted three-dimensional face model parameters; merging and updating preset initial three-dimensional face model parameters according to the predicted three-dimensional face model parameters, and then saving the parameters as a parameter file;
s2: transmitting the parameter file and the audio stream of the first communication terminal to a second communication terminal, restoring a corresponding three-dimensional face model by the second communication terminal according to the parameter file by using a three-dimensional face reconstruction technology, and mapping the three-dimensional face model to a two-dimensional image plane to obtain a restored video image frame sequence;
s3: and rendering the video image frame sequence and then synthesizing the video image frame sequence and the audio stream into a virtual character video.
As a preferred scheme, the three-dimensional face model parameters obtained by the three-dimensional face reconstruction network prediction comprise an identity model parameter S, an expression model parameter E, a texture model parameter T, a posture model parameter P and an illumination model parameter L; the three-dimensional face model parameters obtained through the audio prediction network prediction comprise expression model parameters E and posture model parameters P.
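For concreteness, a minimal Python sketch of the parameter sets described above and of the merge-and-update step used by the audio branch, assuming NumPy arrays and illustrative dimensions (the patent does not fix concrete sizes):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FaceModelParams:
    """Per-frame parameterized 3D face model parameters (sizes are illustrative)."""
    S: np.ndarray  # identity model parameters, e.g. shape (80,)
    E: np.ndarray  # expression model parameters, e.g. shape (64,)
    T: np.ndarray  # texture model parameters, e.g. shape (80,)
    P: np.ndarray  # posture model parameters (rotation + translation), e.g. shape (6,)
    L: np.ndarray  # illumination model parameters (e.g. spherical harmonics), shape (27,)

def merge_update(initial: FaceModelParams, E_pred: np.ndarray,
                 P_pred: np.ndarray) -> FaceModelParams:
    """The audio prediction network only outputs E and P; the remaining parameters
    are taken from the preset initial three-dimensional face model parameters."""
    return FaceModelParams(S=initial.S, E=E_pred, T=initial.T, P=P_pred, L=initial.L)
```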
Preferably, the method further comprises the following steps: carrying out optimization training on the three-dimensional face reconstruction network according to the video stream image frames, the predicted three-dimensional face model parameters and the restored three-dimensional face model, wherein the expression formula is as follows:

$$\theta^{*}=\arg\min_{\theta}\left\|\omega\big(f(I;\theta)\big)-I\right\|^{2}$$

where $\theta$ denotes the network parameters of the three-dimensional face reconstruction network, $I$ denotes an original video stream image frame, $f(\cdot\,;\theta)$ denotes the prediction function learned by the three-dimensional face reconstruction network, and $\omega(\cdot)$ denotes the function that maps the restored three-dimensional face model to the two-dimensional image plane.
Preferably, the three-dimensional face reconstruction network comprises an R-Net network.
Preferably, the method further comprises the following steps: performing optimization training on the audio prediction network according to the audio stream, the predicted three-dimensional face model parameters and the preset initial three-dimensional face model parameters, wherein the expression formula is as follows:

$$(\theta_{1}^{*},\theta_{2}^{*})=\arg\min_{\theta_{1},\theta_{2}}\left\|h_{E}(A;\theta_{1})-\bar{E}\right\|^{2}+\left\|h_{P}(A;\theta_{2})-\bar{P}\right\|^{2}$$

where $\theta_{1}$ and $\theta_{2}$ are the network parameters of the audio prediction network, $\bar{E}$ denotes the preset initial expression model parameters, $\bar{P}$ denotes the preset initial posture model parameters, and $A$ denotes the audio stream; $h_{E}(\cdot)$ denotes the expression feature prediction function learned by the audio prediction network, and $h_{P}(\cdot)$ denotes the posture feature prediction function learned by the audio prediction network.
Preferably, the audio prediction network comprises an LSTM network.
Preferably, in the step S2, the specific step of restoring the corresponding three-dimensional face model by using the three-dimensional face reconstruction technique includes:

initializing a vertex set $\bar{S}$ of the three-dimensional face and an RGB set $\bar{T}$ corresponding to the three-dimensional face vertex set;

changing the positions of the three-dimensional face vertex set according to the expression model parameter E and the posture model parameter P in the parameter file, and changing the color values of the RGB set corresponding to the three-dimensional face vertex set according to the texture model parameter T and the illumination model parameter L in the parameter file, wherein the expression formulas are as follows:

$$S^{*}=X\Big(\bar{S}+\sum_{i=1}^{N_{1}}S_{i}\,\bar{s}_{i}+\sum_{j=1}^{N_{2}}E_{j}\,\bar{e}_{j}\;;\;P\Big),\qquad T^{*}=C\big(\bar{T}+\bar{t}\,T\;;\;L\big)$$

where $\bar{s}_{i}$ denotes an identity base of the three-dimensional face, $\bar{e}_{j}$ denotes an expression base of the three-dimensional face, and $\bar{t}$ denotes a texture base of the three-dimensional face; $X(\lambda;P)$ denotes the function that changes the positions of the three-dimensional face vertex set according to the posture model parameter P, with $\lambda$ the vertex set whose positions are to be changed; $C(\varepsilon;L)$ denotes the function that changes the RGB set corresponding to the three-dimensional face vertex set according to the illumination model parameter L, with $\varepsilon$ the RGB set whose color values are to be changed; $N_{1}$ and $N_{2}$ are respectively the total numbers of identity bases and expression bases, and the subscripts $i$ and $j$ are respectively the ordinals of the identity bases and expression bases;

constructing the recovered three-dimensional face model according to the changed three-dimensional face vertex set $S^{*}$ and RGB set $T^{*}$, mapping each vertex of the three-dimensional face model to the two-dimensional image plane by affine transformation, and correspondingly mapping the RGB color value of each vertex to the two-dimensional image plane as the pixel value of its mapped point, so as to obtain a recovered video image frame.
Preferably, the step S2 further includes the following steps: after the parameter file is compressed, the parameter file is transmitted to a second communication terminal by adopting cloud service; and transmitting the audio stream of the first communication terminal to a second communication terminal by adopting a network protocol.
Preferably, the method further comprises the following steps: presetting identity model parameters S of other characters on the second communication terminal; when the second communication terminal receives the parameter file and the audio stream transmitted by the first communication terminal, replacing the identity model parameters in the parameter file with the preset identity model parameters S, and then recovering the corresponding three-dimensional face model by using a three-dimensional face reconstruction technology.
The invention also provides a virtual character video call system based on three-dimensional face reconstruction, which is applied to the virtual character video call method provided by any technical scheme, and the virtual character video call system comprises a first communication terminal and a second communication terminal, wherein the first communication terminal and the second communication terminal respectively comprise a video acquisition module, an audio acquisition module, a display module, a communication module and a main control module; wherein:
the video acquisition module is used for acquiring video streams and sending the video streams to the main control module;
the audio acquisition module is used for acquiring audio streams and sending the audio streams to the main control module;
the main control module decomposes the video stream into image frames according to the currently selected virtual character video call mode, and then carries out model parameter prediction on a parameterized three-dimensional face by adopting a three-dimensional face reconstruction network to obtain predicted three-dimensional face model parameters and store the predicted three-dimensional face model parameters as a parameter file;
or inputting the audio stream into an audio prediction network to carry out model parameter prediction of a parameterized three-dimensional face to obtain predicted three-dimensional face model parameters, merging and updating preset initial three-dimensional face model parameters according to the predicted three-dimensional face model parameters, and then saving the parameters as a parameter file;
the master control module transmits the generated parameter file to another communication terminal through the communication module;
the main control module is also used for restoring a corresponding three-dimensional face model by using a three-dimensional face reconstruction technology according to the parameter file received by the communication module and then mapping the three-dimensional face model to a two-dimensional image plane to obtain a restored video image frame sequence; and rendering the video image frame sequence, synthesizing the video image frame sequence with the audio stream to form a virtual character video, and transmitting the virtual character video to the display module for displaying.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
according to the invention, a three-dimensional face reconstruction technology is utilized, image frames captured by a camera are subjected to three-dimensional face reconstruction to obtain parameterized three-dimensional face model parameters for transmission, or the three-dimensional face model parameters of a communication terminal user are predicted from a recorded audio stream and then are transmitted, so that the data transmission quantity of a communication terminal can be reduced, and the smoothness of video call is effectively improved;
the invention can predict the three-dimensional face model parameters of the communication terminal user only from the recorded audio stream, and can recover the complete three-dimensional face by combining the preset initial three-dimensional face model parameters, thereby realizing video call under the condition of closing the camera.
Drawings
Fig. 1 is a flowchart of a virtual character video call method based on three-dimensional face reconstruction in embodiment 1.
Fig. 2 is a schematic diagram of a virtual character video call method according to embodiment 1.
Fig. 3 is a schematic diagram of a video-to-video one-way avatar video call in embodiment 1.
Fig. 4 is a schematic diagram of an audio-to-video one-way avatar video call in embodiment 1.
Fig. 5 is a schematic view of a virtual character video call for replacing the virtual character identity according to embodiment 2.
Fig. 6 is a schematic view of a virtual character video call for replacing the virtual character identity according to embodiment 2.
Fig. 7 is a schematic structural diagram of a virtual character video call system based on three-dimensional face reconstruction in embodiment 3.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The first communication terminal proposed in this embodiment refers to a data transmitting end, and the second communication terminal refers to a data receiving end.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
This embodiment provides a virtual character video call method based on three-dimensional face reconstruction, the flow of which is shown in figs. 1 to 2.
The virtual character video call method based on three-dimensional face reconstruction provided by this embodiment comprises the following steps:

Step 1: selecting the video-to-video call mode or the audio-to-video call mode:
when the video-to-video call mode is selected: acquiring a video stream and an audio stream of a first communication terminal, decomposing the video stream into image frames, and then performing model parameter prediction on a parameterized three-dimensional face by using a three-dimensional face reconstruction network to obtain predicted three-dimensional face model parameters and storing the predicted three-dimensional face model parameters as a parameter file;
when the audio-to-video call mode is selected: the method comprises the steps of obtaining an audio stream of a first communication terminal, inputting the audio stream into an audio prediction network to carry out model parameter prediction on a parameterized three-dimensional face to obtain predicted three-dimensional face model parameters, merging and updating preset initial three-dimensional face model parameters according to the predicted three-dimensional face model parameters, and storing the parameters as parameter files.
The three-dimensional face model parameters obtained through the three-dimensional face reconstruction network prediction comprise an identity model parameter S, an expression model parameter E, a texture model parameter T, a posture model parameter P and an illumination model parameter L; the three-dimensional face model parameters obtained through the audio prediction network prediction comprise expression model parameters E and posture model parameters P. In addition, the preset initial three-dimensional face model parameters in this embodiment comprise an identity model parameter S, an expression model parameter E, a texture model parameter T, a posture model parameter P and an illumination model parameter L; these initial parameters are obtained by performing three-dimensional face reconstruction on a frontal face image of the user photographed in advance, before the virtual character video call is carried out.
The three-dimensional face reconstruction network in this step adopts an R-Net network. The three-dimensional face reconstruction network is optimized and trained according to the video stream image frames, the predicted three-dimensional face model parameters and the restored three-dimensional face model, with the following expression:

$$\theta^{*}=\arg\min_{\theta}\left\|\omega\big(f(I;\theta)\big)-I\right\|^{2}$$

where $\theta$ denotes the network parameters of the three-dimensional face reconstruction network, $I$ denotes an original video stream image frame, $f(\cdot\,;\theta)$ denotes the prediction function learned by the three-dimensional face reconstruction network, and $\omega(\cdot)$ denotes the function that maps the restored three-dimensional face model to the two-dimensional image plane.
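As an illustration only, a minimal PyTorch-style sketch of this self-supervised objective, with a stand-in convolutional regressor in place of R-Net and an abstract differentiable renderer `omega` (both hypothetical; the patent does not specify the network internals):

```python
import torch
import torch.nn as nn

class ParamRegressor(nn.Module):
    """Stand-in for f(I; theta): regresses 3DMM parameters (S, E, T, P, L) from a frame."""
    def __init__(self, n_params: int = 257):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_params),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.net(image)

def photometric_loss(model: nn.Module, image: torch.Tensor, omega) -> torch.Tensor:
    """|| omega(f(I; theta)) - I ||^2: re-render the restored face and compare it with
    the input frame; omega must be differentiable for gradients to reach the network."""
    params = model(image)
    return ((omega(params) - image) ** 2).mean()
```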
The audio prediction network in this step adopts an LSTM network. The audio prediction network is optimized and trained according to the audio stream, the predicted three-dimensional face model parameters and the preset initial three-dimensional face model parameters, with the following expression:

$$(\theta_{1}^{*},\theta_{2}^{*})=\arg\min_{\theta_{1},\theta_{2}}\left\|h_{E}(A;\theta_{1})-\bar{E}\right\|^{2}+\left\|h_{P}(A;\theta_{2})-\bar{P}\right\|^{2}$$

where $\theta_{1}$ and $\theta_{2}$ are the network parameters of the audio prediction network, $\bar{E}$ denotes the preset initial expression model parameters, $\bar{P}$ denotes the preset initial posture model parameters, and $A$ denotes the audio stream; $h_{E}(\cdot)$ denotes the expression feature prediction function learned by the audio prediction network, and $h_{P}(\cdot)$ denotes the posture feature prediction function learned by the audio prediction network.
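A minimal sketch of such an audio prediction network in the same vein, assuming per-frame audio features (for example mel-spectrogram frames) as input; the feature and parameter dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

class AudioToFaceParams(nn.Module):
    """LSTM sketch of h_E and h_P: audio features -> expression (E) and posture (P)."""
    def __init__(self, n_audio_feat: int = 80, hidden: int = 256,
                 n_expr: int = 64, n_pose: int = 6):
        super().__init__()
        self.lstm = nn.LSTM(n_audio_feat, hidden, num_layers=2, batch_first=True)
        self.expr_head = nn.Linear(hidden, n_expr)  # h_E(.; theta_1)
        self.pose_head = nn.Linear(hidden, n_pose)  # h_P(.; theta_2)

    def forward(self, audio_feats: torch.Tensor):
        # audio_feats: (batch, frames, n_audio_feat), one feature vector per video frame
        h, _ = self.lstm(audio_feats)
        return self.expr_head(h), self.pose_head(h)

model = AudioToFaceParams()
feats = torch.randn(1, 25, 80)   # one second of features at 25 fps (hypothetical)
E, P = model(feats)              # E: (1, 25, 64), P: (1, 25, 6)
```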
Step 2: transmitting the parameter file and the audio stream of the first communication terminal to a second communication terminal; the second communication terminal restores the corresponding three-dimensional face model from the parameter file by using a three-dimensional face reconstruction technology and maps it to a two-dimensional image plane to obtain a restored video image frame sequence.
In this step, the parameter file is compressed, then the cloud service is adopted to transmit the parameter file to the second communication terminal, and a network protocol is adopted to transmit the audio stream of the first communication terminal to the second communication terminal.
Further, the specific steps of restoring the corresponding three-dimensional face model by using the three-dimensional face reconstruction technology in this step include:

initializing a vertex set $\bar{S}$ of the three-dimensional face and an RGB set $\bar{T}$ corresponding to the three-dimensional face vertex set;

changing the positions of the three-dimensional face vertex set according to the expression model parameter E and the posture model parameter P in the parameter file, and changing the color values of the RGB set corresponding to the three-dimensional face vertex set according to the texture model parameter T and the illumination model parameter L in the parameter file, wherein the expression formulas are as follows:

$$S^{*}=X\Big(\bar{S}+\sum_{i=1}^{N_{1}}S_{i}\,\bar{s}_{i}+\sum_{j=1}^{N_{2}}E_{j}\,\bar{e}_{j}\;;\;P\Big),\qquad T^{*}=C\big(\bar{T}+\bar{t}\,T\;;\;L\big)$$

where $\bar{s}_{i}$ denotes an identity base of the three-dimensional face, $\bar{e}_{j}$ denotes an expression base of the three-dimensional face, and $\bar{t}$ denotes a texture base of the three-dimensional face; $X(\lambda;P)$ denotes the function that changes the positions of the three-dimensional face vertex set according to the posture model parameter P, with $\lambda$ the vertex set whose positions are to be changed; $C(\varepsilon;L)$ denotes the function that changes the RGB set corresponding to the three-dimensional face vertex set according to the illumination model parameter L, with $\varepsilon$ the RGB set whose color values are to be changed; $N_{1}$ and $N_{2}$ are respectively the total numbers of identity bases and expression bases, and the subscripts $i$ and $j$ are respectively the ordinals of the identity bases and expression bases;

constructing the recovered three-dimensional face model according to the changed three-dimensional face vertex set $S^{*}$ and RGB set $T^{*}$, mapping each vertex of the three-dimensional face model to the two-dimensional image plane by affine transformation, and correspondingly mapping the RGB color value of each vertex to the two-dimensional image plane as the pixel value of its mapped point, so as to obtain a recovered video image frame.
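A NumPy sketch of this restore-and-map step under simplifying assumptions: the posture function X(·; P) is modeled as a rotation plus translation, the affine mapping as a weak-perspective projection with nearest-pixel splatting, and the color function C(·; L) is omitted:

```python
import numpy as np

def restore_vertices(S_bar, id_bases, expr_bases, S, E, R, t):
    """S* = X(S_bar + sum_i S_i s_i + sum_j E_j e_j ; P).
    Shapes: S_bar (V, 3); id_bases (N1, V, 3); expr_bases (N2, V, 3); S (N1,); E (N2,);
    R (3, 3) rotation and t (3,) translation stand in for the posture parameters P."""
    verts = S_bar + np.tensordot(S, id_bases, axes=1) + np.tensordot(E, expr_bases, axes=1)
    return verts @ R.T + t

def map_to_image(verts, colors, scale=110.0, size=224):
    """Map each vertex to the 2D image plane and write its RGB value to the nearest
    pixel (true rasterization, z-buffering and illumination are left out)."""
    u = np.clip((scale * verts[:, 0] + size / 2).astype(int), 0, size - 1)
    v = np.clip((size / 2 - scale * verts[:, 1]).astype(int), 0, size - 1)
    frame = np.zeros((size, size, 3), dtype=colors.dtype)
    frame[v, u] = colors
    return frame
```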
Step 3: rendering the video image frame sequence and then synthesizing it with the audio stream into a virtual character video.
In this step, a face renderer based on a generative adversarial network (GAN) is adopted to render the restored video image frame sequence, which improves the realism of the video image frames; the rendered frame sequence is then synthesized with the received audio stream by multimedia processing technology to obtain the virtual character video, which is displayed.
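As one possible realization of this synthesis step (the patent only specifies "multimedia processing technology"), the rendered frames and the received audio stream can be muxed with the ffmpeg command-line tool; the file names here are hypothetical:

```python
import subprocess

def synthesize_video(frame_pattern="frames/frame_%04d.png",
                     audio_path="call_audio.mp3",
                     out_path="avatar_call.mp4", fps=25):
    """Combine the rendered avatar frame sequence with the audio stream into one video."""
    subprocess.run([
        "ffmpeg", "-y",
        "-framerate", str(fps), "-i", frame_pattern,  # rendered image frame sequence
        "-i", audio_path,                             # received audio stream
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-shortest",
        out_path,
    ], check=True)
```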
In a specific implementation process, the method for virtual character video call provided in this embodiment is applied to a video-to-video call mode, and a flow diagram of the method is shown in fig. 3.
When the virtual character video call is carried out, the first communication terminal continuously captures video stream and audio stream, the video stream is decomposed into image frame sequences by adopting a multimedia processing technology, then model parameter prediction of a parameterized three-dimensional face is carried out by adopting a three-dimensional face reconstruction network, and predicted three-dimensional face model parameters are obtained and stored as parameter files in a mat format or a yml format. Compressing the parameter file in the mat format or yml format into a file in the zip format, then transmitting the file by using cloud service, and compressing the audio stream into an mp3 file, then transmitting the file by using a network protocol. And the second communication terminal recovers the corresponding three-dimensional face model by using a three-dimensional face reconstruction technology according to the received parameter file, then maps the three-dimensional face model to a two-dimensional image plane to obtain a recovered video image frame sequence, and then renders the video image frame sequence to synthesize a virtual character video with the audio stream.
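A sketch of the sender-side packaging described above, using SciPy's .mat writer and Python's standard zipfile module; the file names and dictionary layout are assumptions:

```python
import zipfile
from scipy.io import savemat

def package_params(params: dict, mat_path="params.mat", zip_path="params.zip") -> str:
    """Save the predicted 3D face model parameters as a .mat parameter file and
    compress it before handing it to the cloud service for transmission."""
    savemat(mat_path, params)  # e.g. {"S": ..., "E": ..., "T": ..., "P": ..., "L": ...}
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(mat_path)
    return zip_path
```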
This embodiment addresses the problem of poor video call fluency in areas with poor network conditions. Using a three-dimensional face reconstruction technology, the image frames captured by the camera undergo three-dimensional face reconstruction to obtain parameterized three-dimensional face model parameters, which contain almost all of the information of the portrait in the image. The communication terminal therefore only needs to transmit the complete three-dimensional face model parameters and the audio stream to the opposite communication end to complete the data transmission of the video call, which reduces the amount of data the video call needs to transmit and improves its fluency. When the user selects the audio-to-video call mode, the data to be transmitted is likewise only the complete three-dimensional face model parameters and the audio stream.
In another specific implementation process, the method for virtual character video call provided in this embodiment is applied to a call mode from audio to video, and a flow diagram of the method is shown in fig. 4.
Before the virtual character video call is carried out, a frontal face image of the user needs to be photographed in advance for three-dimensional face reconstruction, so that the initial three-dimensional face model parameters are obtained and stored on the corresponding communication terminal.
When the virtual character video call is carried out, the first communication terminal only collects audio stream data, then inputs the audio stream into an audio prediction network adopting an LSTM network to carry out model parameter prediction on a parameterized three-dimensional face to obtain predicted expression model parameters and posture model parameters, then carries out merging and updating with initial three-dimensional face model parameters stored in the current communication terminal, and then stores the parameters as a parameter file in a mat format or a parameter file in a yml format.
And similarly, compressing the parameter file, transmitting the compressed parameter file by using cloud service, and transmitting the audio stream by using a network protocol. And the second communication terminal recovers the corresponding three-dimensional face model by using a three-dimensional face reconstruction technology according to the received parameter file, then maps the three-dimensional face model to a two-dimensional image plane to obtain a recovered video image frame sequence, and then renders the video image frame sequence to synthesize a virtual character video with the audio stream.
This embodiment addresses the limitations of video calls in certain application scenarios. Using an audio prediction method that combines three-dimensional face reconstruction with deep learning, the expression and head posture model parameters of the parameterized three-dimensional face of the communication terminal user are predicted from the recorded audio stream even with the camera off, and are then merged with the three-dimensional face model parameters preset on the communication terminal. The opposite communication end can recover the complete three-dimensional face from the complete three-dimensional face model parameters and map it to the two-dimensional image plane to obtain the corresponding image frames, thereby realizing the audio-to-video call mode.
Example 2
The present embodiment provides a virtual character video call method for replacing the identity of a virtual character on the basis of the virtual character video call method based on three-dimensional face reconstruction provided in embodiment 1. Fig. 5 to 6 are schematic diagrams of the virtual character video call for replacing the virtual character identity according to the embodiment.
In this embodiment, the method further includes the following steps: the second communication terminal is preset with identity model parameters S of other characters; when the second communication terminal receives the parameter file and the audio stream transmitted by the first communication terminal, it replaces the identity model parameters in the parameter file with the preset identity model parameters S and then recovers the corresponding three-dimensional face model by using a three-dimensional face reconstruction technology.
In the specific implementation process, the communication terminal can autonomously select other virtual character identities which are pre-stored, that is, the communication terminal pre-stores the identity model parameters S of the corresponding character identities, and the virtual character identities can be replaced at any time in the video call process.
When the communication terminal selects the identities of other virtual characters, the communication terminal selects corresponding identity model parameters S from the prestored identity model parameters S, replaces the corresponding identity model parameters S with the identity model parameters in the currently received parameter file, keeps expression model parameters E, texture model parameters T, posture model parameters P and illumination model parameters L unchanged, and recovers the corresponding three-dimensional face model by using a three-dimensional face reconstruction technology.
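A minimal sketch of this identity replacement, assuming the received parameter file has been loaded as a dictionary keyed by parameter name (a hypothetical layout):

```python
import numpy as np

def swap_identity(received: dict, preset_identity: np.ndarray) -> dict:
    """Replace only the identity model parameters S, keeping E, T, P and L unchanged,
    so the avatar takes on the preset identity (e.g. a stored cartoon character)
    while reproducing the caller's expression and head posture."""
    swapped = dict(received)
    swapped["S"] = preset_identity
    return swapped
```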
This embodiment exploits the flexibility with which parameterized three-dimensional face model parameters can be replaced or modified: when one end of the video call receives the complete three-dimensional face model parameters, it only needs to replace the identity model parameters S with the preset identity parameters of a cartoon character or celebrity to change the virtual character identity without changing the expression and head posture of the character at the opposite end, which improves the interest and flexibility of the video call.
Example 3
The embodiment provides a virtual character video call system based on three-dimensional face reconstruction, which is applied to the virtual character video call method based on three-dimensional face reconstruction provided in embodiment 1 or embodiment 2. Fig. 7 is a schematic structural diagram of a virtual character video call system based on three-dimensional face reconstruction according to this embodiment.
The virtual character video call system based on three-dimensional face reconstruction provided by the embodiment comprises a first communication terminal and a second communication terminal which have the same structure, wherein the first communication terminal and the second communication terminal respectively comprise a video acquisition module 1, an audio acquisition module 2, a display module 5, a communication module 3 and a main control module 4; wherein:
the video acquisition module 1 is used for acquiring video streams and sending the video streams to the main control module 4;
the audio acquisition module 2 is used for acquiring audio streams and sending the audio streams to the main control module 4;
the main control module 4 processes the collected video stream or audio stream according to the currently selected virtual character video call mode, specifically:
when a video-to-video call mode is selected, decomposing the video stream into image frames, and then performing model parameter prediction on a parameterized three-dimensional face by using a three-dimensional face reconstruction network to obtain predicted three-dimensional face model parameters and storing the predicted three-dimensional face model parameters as a parameter file;
when an audio-video call mode is selected, inputting the audio stream into an audio prediction network to perform model parameter prediction of a parameterized three-dimensional face to obtain predicted three-dimensional face model parameters, merging and updating preset initial three-dimensional face model parameters according to the predicted three-dimensional face model parameters, and storing the parameters as a parameter file;
the main control module 4 transmits the generated parameter file to another communication terminal through the communication module 3;
the main control module 4 is further configured to recover a corresponding three-dimensional face model by using a three-dimensional face reconstruction technique according to the parameter file received by the communication module 3, and then map the three-dimensional face model to a two-dimensional image plane to obtain a recovered video image frame sequence; and rendering the video image frame sequence, synthesizing the video image frame sequence with the audio stream to form a virtual character video, and transmitting the virtual character video to the display module 5 for display.
The same or similar reference numerals correspond to the same or similar parts;
the terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.
Claims (10)
1. A virtual character video call method based on three-dimensional face reconstruction is characterized by comprising the following steps:
s1: selecting virtual character video call modes, including a video-to-video call mode and an audio-to-video call mode:
when the video-to-video call mode is selected: acquiring a video stream and an audio stream of a first communication terminal, decomposing the video stream into image frames, and then performing model parameter prediction on a parameterized three-dimensional face by using a three-dimensional face reconstruction network to obtain predicted three-dimensional face model parameters and storing the predicted three-dimensional face model parameters as a parameter file;
when the audio-to-video call mode is selected: acquiring an audio stream of a first communication terminal, inputting the audio stream into an audio prediction network to perform model parameter prediction of a parameterized three-dimensional face, and obtaining predicted three-dimensional face model parameters; merging and updating preset initial three-dimensional face model parameters according to the predicted three-dimensional face model parameters, and then saving the parameters as a parameter file;
s2: transmitting the parameter file and the audio stream of the first communication terminal to a second communication terminal, restoring a corresponding three-dimensional face model by the second communication terminal according to the parameter file by using a three-dimensional face reconstruction technology, and mapping the three-dimensional face model to a two-dimensional image plane to obtain a restored video image frame sequence;
s3: and rendering the video image frame sequence and then synthesizing the video image frame sequence and the audio stream into a virtual character video.
2. The virtual character video call method according to claim 1, wherein the three-dimensional face model parameters predicted by the three-dimensional face reconstruction network include an identity model parameter S, an expression model parameter E, a texture model parameter T, a pose model parameter P, and an illumination model parameter L; the three-dimensional face model parameters obtained through the audio prediction network prediction comprise expression model parameters E and posture model parameters P.
3. The virtual character video call method as claimed in claim 2, further comprising the steps of: carrying out optimization training on the three-dimensional face reconstruction network according to the video stream image frames, the predicted three-dimensional face model parameters and the restored three-dimensional face model, wherein the expression formula is as follows:

$$\theta^{*}=\arg\min_{\theta}\left\|\omega\big(f(I;\theta)\big)-I\right\|^{2}$$

where $\theta$ denotes the network parameters of the three-dimensional face reconstruction network, $I$ denotes an original video stream image frame, $f(\cdot\,;\theta)$ denotes the prediction function learned by the three-dimensional face reconstruction network, and $\omega(\cdot)$ denotes the function that maps the restored three-dimensional face model to the two-dimensional image plane.
4. The virtual character video call method as claimed in claim 3, wherein the three-dimensional face reconstruction network comprises an R-Net network.
5. The virtual character video call method as claimed in claim 2, further comprising the steps of: performing optimization training on the audio prediction network according to the audio stream, the predicted three-dimensional face model parameters and the preset initial three-dimensional face model parameters, wherein the expression formula is as follows:

$$(\theta_{1}^{*},\theta_{2}^{*})=\arg\min_{\theta_{1},\theta_{2}}\left\|h_{E}(A;\theta_{1})-\bar{E}\right\|^{2}+\left\|h_{P}(A;\theta_{2})-\bar{P}\right\|^{2}$$

where $\theta_{1}$ and $\theta_{2}$ are the network parameters of the audio prediction network, $\bar{E}$ denotes the preset initial expression model parameters, $\bar{P}$ denotes the preset initial posture model parameters, and $A$ denotes the audio stream; $h_{E}(\cdot)$ denotes the expression feature prediction function learned by the audio prediction network, and $h_{P}(\cdot)$ denotes the posture feature prediction function learned by the audio prediction network.
6. The avatar video call method of claim 5, wherein said audio prediction network comprises an LSTM network.
7. The virtual character video call method as claimed in claim 2, wherein in the step S2, the specific step of restoring the corresponding three-dimensional face model by using the three-dimensional face reconstruction technique includes:

initializing a vertex set $\bar{S}$ of the three-dimensional face and an RGB set $\bar{T}$ corresponding to the three-dimensional face vertex set;

changing the positions of the three-dimensional face vertex set according to the expression model parameter E and the posture model parameter P in the parameter file, and changing the color values of the RGB set corresponding to the three-dimensional face vertex set according to the texture model parameter T and the illumination model parameter L in the parameter file, wherein the expression formulas are as follows:

$$S^{*}=X\Big(\bar{S}+\sum_{i=1}^{N_{1}}S_{i}\,\bar{s}_{i}+\sum_{j=1}^{N_{2}}E_{j}\,\bar{e}_{j}\;;\;P\Big),\qquad T^{*}=C\big(\bar{T}+\bar{t}\,T\;;\;L\big)$$

where $\bar{s}_{i}$ denotes an identity base of the three-dimensional face, $\bar{e}_{j}$ denotes an expression base of the three-dimensional face, and $\bar{t}$ denotes a texture base of the three-dimensional face; $X(\lambda;P)$ denotes the function that changes the positions of the three-dimensional face vertex set according to the posture model parameter P, with $\lambda$ the vertex set whose positions are to be changed; $C(\varepsilon;L)$ denotes the function that changes the RGB set corresponding to the three-dimensional face vertex set according to the illumination model parameter L, with $\varepsilon$ the RGB set whose color values are to be changed; $N_{1}$ and $N_{2}$ are respectively the total numbers of identity bases and expression bases, and the subscripts $i$ and $j$ are respectively the ordinals of the identity bases and expression bases;

constructing the recovered three-dimensional face model according to the changed three-dimensional face vertex set $S^{*}$ and RGB set $T^{*}$, mapping each vertex of the three-dimensional face model to the two-dimensional image plane by affine transformation, and correspondingly mapping the RGB color value of each vertex to the two-dimensional image plane as the pixel value of its mapped point, so as to obtain a recovered video image frame.
8. The virtual character video call method as claimed in claim 2, wherein said step of S2 further comprises the steps of: after the parameter file is compressed, the parameter file is transmitted to a second communication terminal by adopting cloud service; and transmitting the audio stream of the first communication terminal to a second communication terminal by adopting a network protocol.
9. The virtual character video call method according to any one of claims 2 to 8, further comprising the steps of: presetting, on the second communication terminal, identity model parameters S of other characters; when the second communication terminal receives the parameter file and the audio stream transmitted by the first communication terminal, replacing the identity model parameters in the parameter file with the preset identity model parameters S, and then recovering the corresponding three-dimensional face model by using a three-dimensional face reconstruction technology.
10. A virtual character video call system based on three-dimensional face reconstruction is applied to the virtual character video call method of any one of claims 1 to 9, and is characterized by comprising a first communication terminal and a second communication terminal, wherein the first communication terminal and the second communication terminal respectively comprise a video acquisition module, an audio acquisition module, a display module, a communication module and a main control module; wherein:
the video acquisition module is used for acquiring video streams and sending the video streams to the main control module;
the audio acquisition module is used for acquiring audio streams and sending the audio streams to the main control module;
the main control module decomposes the video stream into image frames according to the currently selected virtual character video call mode, and then carries out model parameter prediction on a parameterized three-dimensional face by adopting a three-dimensional face reconstruction network to obtain predicted three-dimensional face model parameters and store the predicted three-dimensional face model parameters as a parameter file;
or inputting the audio stream into an audio prediction network to carry out model parameter prediction of a parameterized three-dimensional face to obtain predicted three-dimensional face model parameters, merging and updating preset initial three-dimensional face model parameters according to the predicted three-dimensional face model parameters, and then saving the parameters as a parameter file;
the master control module transmits the generated parameter file to another communication terminal through the communication module;
the main control module is also used for restoring a corresponding three-dimensional face model by using a three-dimensional face reconstruction technology according to the parameter file received by the communication module and then mapping the three-dimensional face model to a two-dimensional image plane to obtain a restored video image frame sequence; and rendering the video image frame sequence, synthesizing the video image frame sequence with the audio stream to form a virtual character video, and transmitting the virtual character video to the display module for displaying.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110632937.0A CN113395476A (en) | 2021-06-07 | 2021-06-07 | Virtual character video call method and system based on three-dimensional face reconstruction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110632937.0A CN113395476A (en) | 2021-06-07 | 2021-06-07 | Virtual character video call method and system based on three-dimensional face reconstruction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113395476A true CN113395476A (en) | 2021-09-14 |
Family
ID=77618475
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110632937.0A Pending CN113395476A (en) | 2021-06-07 | 2021-06-07 | Virtual character video call method and system based on three-dimensional face reconstruction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113395476A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113900565A (en) * | 2021-10-18 | 2022-01-07 | 深圳追一科技有限公司 | Interaction method, device, equipment and storage medium of self-service terminal |
CN114500912A (en) * | 2022-02-23 | 2022-05-13 | 联想(北京)有限公司 | Call processing method, electronic device and storage medium |
CN114821404A (en) * | 2022-04-08 | 2022-07-29 | 马上消费金融股份有限公司 | Information processing method and device, computer equipment and storage medium |
CN117474807A (en) * | 2023-12-27 | 2024-01-30 | 科大讯飞股份有限公司 | Image restoration method, device, equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108765550A (en) * | 2018-05-09 | 2018-11-06 | 华南理工大学 | A kind of three-dimensional facial reconstruction method based on single picture |
CN109255831A (en) * | 2018-09-21 | 2019-01-22 | 南京大学 | The method that single-view face three-dimensional reconstruction and texture based on multi-task learning generate |
CN109584353A (en) * | 2018-10-22 | 2019-04-05 | 北京航空航天大学 | A method of three-dimensional face expression model is rebuild based on monocular video |
CN110536095A (en) * | 2019-08-30 | 2019-12-03 | Oppo广东移动通信有限公司 | Call method, device, terminal and storage medium |
CN111445582A (en) * | 2019-01-16 | 2020-07-24 | 南京大学 | Single-image human face three-dimensional reconstruction method based on illumination prior |
CN111951383A (en) * | 2020-08-12 | 2020-11-17 | 北京鼎翰科技有限公司 | Face reconstruction method |
CN112215927A (en) * | 2020-09-18 | 2021-01-12 | 腾讯科技(深圳)有限公司 | Method, device, equipment and medium for synthesizing face video |
CN112866586A (en) * | 2021-01-04 | 2021-05-28 | 北京中科闻歌科技股份有限公司 | Video synthesis method, device, equipment and storage medium |
- 2021-06-07: CN application CN202110632937.0A filed; published as CN113395476A; status: Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108765550A (en) * | 2018-05-09 | 2018-11-06 | 华南理工大学 | A kind of three-dimensional facial reconstruction method based on single picture |
CN109255831A (en) * | 2018-09-21 | 2019-01-22 | 南京大学 | The method that single-view face three-dimensional reconstruction and texture based on multi-task learning generate |
CN109584353A (en) * | 2018-10-22 | 2019-04-05 | 北京航空航天大学 | A method of three-dimensional face expression model is rebuild based on monocular video |
CN111445582A (en) * | 2019-01-16 | 2020-07-24 | 南京大学 | Single-image human face three-dimensional reconstruction method based on illumination prior |
CN110536095A (en) * | 2019-08-30 | 2019-12-03 | Oppo广东移动通信有限公司 | Call method, device, terminal and storage medium |
CN111951383A (en) * | 2020-08-12 | 2020-11-17 | 北京鼎翰科技有限公司 | Face reconstruction method |
CN112215927A (en) * | 2020-09-18 | 2021-01-12 | 腾讯科技(深圳)有限公司 | Method, device, equipment and medium for synthesizing face video |
CN112866586A (en) * | 2021-01-04 | 2021-05-28 | 北京中科闻歌科技股份有限公司 | Video synthesis method, device, equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
FU YONG et al.: "In-plane rotation face detection based on an improved cascaded convolutional neural network", Computer Engineering and Design *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113900565A (en) * | 2021-10-18 | 2022-01-07 | 深圳追一科技有限公司 | Interaction method, device, equipment and storage medium of self-service terminal |
CN114500912A (en) * | 2022-02-23 | 2022-05-13 | 联想(北京)有限公司 | Call processing method, electronic device and storage medium |
CN114821404A (en) * | 2022-04-08 | 2022-07-29 | 马上消费金融股份有限公司 | Information processing method and device, computer equipment and storage medium |
CN114821404B (en) * | 2022-04-08 | 2023-07-25 | 马上消费金融股份有限公司 | Information processing method, device, computer equipment and storage medium |
CN117474807A (en) * | 2023-12-27 | 2024-01-30 | 科大讯飞股份有限公司 | Image restoration method, device, equipment and storage medium |
CN117474807B (en) * | 2023-12-27 | 2024-05-31 | 科大讯飞股份有限公司 | Image restoration method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113395476A (en) | Virtual character video call method and system based on three-dimensional face reconstruction | |
CN113422903B (en) | Shooting mode switching method, equipment and storage medium | |
US8072479B2 (en) | Method system and apparatus for telepresence communications utilizing video avatars | |
US7728866B2 (en) | Video telephony image processing | |
US9210372B2 (en) | Communication method and device for video simulation image | |
US11741616B2 (en) | Expression transfer across telecommunications networks | |
US20220172424A1 (en) | Method, system, and medium for 3d or 2.5d electronic communication | |
CN112037320A (en) | Image processing method, device, equipment and computer readable storage medium | |
CN115909015B (en) | Method and device for constructing deformable nerve radiation field network | |
CN113206971B (en) | Image processing method and display device | |
CN114255496A (en) | Video generation method and device, electronic equipment and storage medium | |
CN110536095A (en) | Call method, device, terminal and storage medium | |
CN114007099A (en) | Video processing method and device for video processing | |
WO2017079679A1 (en) | Depth camera based image stabilization | |
CN115239857B (en) | Image generation method and electronic device | |
CN114331918B (en) | Training method of image enhancement model, image enhancement method and electronic equipment | |
CN117808854A (en) | Image generation method, model training method, device and electronic equipment | |
Isikdogan et al. | Eye contact correction using deep neural networks | |
CN113515193B (en) | Model data transmission method and device | |
CN115100707A (en) | Model training method, video information generation method, device and storage medium | |
WO2011003315A1 (en) | Mobile terminal based image processing method and mobile terminal | |
CN101521754A (en) | Remote two-person photo sticker | |
CN111476899A (en) | Three-dimensional reconstruction method for dense texture coordinates of human hand based on single-viewpoint RGB camera | |
CN114513647B (en) | Method and device for transmitting data in three-dimensional virtual scene | |
CN116546183B (en) | Dynamic image generation method and system with parallax effect based on single frame image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210914 |