CN113395476A - Virtual character video call method and system based on three-dimensional face reconstruction - Google Patents

Virtual character video call method and system based on three-dimensional face reconstruction

Info

Publication number
CN113395476A
CN113395476A (application CN202110632937.0A)
Authority
CN
China
Prior art keywords
dimensional face
video
communication terminal
model parameters
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110632937.0A
Other languages
Chinese (zh)
Inventor
杨志景
温瑞冕
徐永宗
李为杰
李凯
凌永权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202110632937.0A
Publication of CN113395476A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/141Systems for two-way working between two video terminals, e.g. videophone
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00Animation
    • G06T13/203D [Three Dimensional] animation
    • G06T13/403D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/275Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/08Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention provides a virtual character video call method and system based on three-dimensional face reconstruction, aiming at overcoming the defects of low video call fluency and low flexibility. The method comprises the following steps: acquiring a video stream and an audio stream of a first communication terminal, or acquiring only the audio stream of the first communication terminal; inputting video stream image frames into a three-dimensional face reconstruction network to obtain predicted three-dimensional face model parameters, or inputting the audio stream into an audio prediction network to obtain predicted three-dimensional face model parameters; merging the predicted three-dimensional face model parameters with preset initial three-dimensional face model parameters and saving the updated parameters as a parameter file; transmitting the parameter file and the audio stream of the first communication terminal to a second communication terminal, where the second communication terminal restores the corresponding three-dimensional face model from the parameter file using a three-dimensional face reconstruction technique and maps it to a two-dimensional image plane to obtain a restored sequence of video image frames; and rendering the video image frame sequence and then synthesizing it with the audio stream into a virtual character video.

Description

Virtual character video call method and system based on three-dimensional face reconstruction
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a virtual character video call method and a virtual character video call system based on three-dimensional face reconstruction.
Background
With the popularization of smart phones and the rapid development of internet technologies, the ways people communicate have changed greatly, and video calls have become a popular mode of communication; they nevertheless have certain limitations in practice. First, in current video calls, when it is inconvenient for one communication end to turn on its camera, a normal video call with the other end cannot be carried out, so video calls lack a certain flexibility. Second, when a video call is made in an area with low network transmission capacity, the video stutters, which greatly degrades the user experience.
A video call method that constructs a virtual character has already been proposed; it can reduce the amount of transmitted data to improve call fluency, and can also make video calls more engaging by replacing the character identity of the opposite communication terminal. For example, the virtual instant messaging method disclosed in publication No. CN110213521A (published 2019-09-06) proposes using a virtual 2D/3D avatar model with the same expression and posture as the two parties to replace their real appearance during virtual instant messaging. However, that method requires a terminal camera to capture the user's face at all times, does not acquire head posture information, and thus still depends heavily on the camera; the problems of low flexibility and low video call fluency remain.
Disclosure of Invention
In order to overcome the defects of low video call fluency and low flexibility in the prior art, the invention provides a virtual character video call method based on three-dimensional face reconstruction and a virtual character video call system based on three-dimensional face reconstruction.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a virtual character video call method based on three-dimensional face reconstruction comprises the following steps:
s1: selecting virtual character video call modes, including a video-to-video call mode and an audio-to-video call mode:
when the video-to-video call mode is selected: acquiring a video stream and an audio stream of a first communication terminal, decomposing the video stream into image frames, and then performing model parameter prediction on a parameterized three-dimensional face by using a three-dimensional face reconstruction network to obtain predicted three-dimensional face model parameters and storing the predicted three-dimensional face model parameters as a parameter file;
when the audio-to-video call mode is selected: acquiring an audio stream of a first communication terminal, inputting the audio stream into an audio prediction network to perform model parameter prediction of a parameterized three-dimensional face, and obtaining predicted three-dimensional face model parameters; merging and updating preset initial three-dimensional face model parameters according to the predicted three-dimensional face model parameters, and then saving the parameters as a parameter file;
s2: transmitting the parameter file and the audio stream of the first communication terminal to a second communication terminal, restoring a corresponding three-dimensional face model by the second communication terminal according to the parameter file by using a three-dimensional face reconstruction technology, and mapping the three-dimensional face model to a two-dimensional image plane to obtain a restored video image frame sequence;
s3: and rendering the video image frame sequence and then synthesizing the video image frame sequence and the audio stream into a virtual character video.
As a preferred scheme, the three-dimensional face model parameters obtained by the three-dimensional face reconstruction network prediction comprise an identity model parameter S, an expression model parameter E, a texture model parameter T, a posture model parameter P and an illumination model parameter L; the three-dimensional face model parameters obtained through the audio prediction network prediction comprise expression model parameters E and posture model parameters P.
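As an illustration of the parameter groups listed above, the following sketch shows one possible in-memory layout of a parameter file; the coefficient dimensions (80 identity, 64 expression, 80 texture, 6 posture, 27 illumination) follow common 3DMM conventions and are assumptions, not values stated in this disclosure.

```python
import numpy as np

def make_face_params(rng):
    """Illustrative parameter file contents; dimensions follow common 3DMM
    conventions (assumptions, not values stated in this disclosure)."""
    return {
        "S": rng.standard_normal(80),  # identity model parameters
        "E": rng.standard_normal(64),  # expression model parameters
        "T": rng.standard_normal(80),  # texture model parameters
        "P": rng.standard_normal(6),   # posture: 3 rotation + 3 translation
        "L": rng.standard_normal(27),  # illumination (9 SH coefficients x RGB)
    }

params = make_face_params(np.random.default_rng(0))
# The audio prediction network only produces E and P; the remaining groups
# come from the preset initial parameter file stored on the terminal.
audio_predicted = {k: params[k] for k in ("E", "P")}
```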
Preferably, the method further comprises the following steps: performing optimization training on the three-dimensional face reconstruction network according to the video stream image frames, the predicted three-dimensional face model parameters and the restored three-dimensional face model, the training objective being expressed as:

θ* = argmin_θ ‖ Î − ω(f_θ(Î)) ‖²

where θ denotes the network parameters of the three-dimensional face reconstruction network, Î denotes an original video stream image frame, f_θ(·) denotes the prediction function learned by the three-dimensional face reconstruction network, and ω(·) denotes the function that maps the restored three-dimensional face model to the two-dimensional image plane.
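The training objective above can be pictured with a toy numerical sketch; `f_theta` and `omega` below are linear stand-ins for the reconstruction network and the renderer, not the actual R-Net.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 8                                   # toy image size
I = rng.random((H, W, 3))                   # stand-in for an original frame I_hat

def f_theta(img, theta):
    """Stand-in for f_theta: image -> predicted 3D face model parameters."""
    return theta @ img.reshape(-1)

def omega(params):
    """Stand-in for omega: restore the 3D face and map it back to the image plane."""
    return np.clip(params.reshape(H, W, 3), 0.0, 1.0)

theta = 0.01 * rng.random((H * W * 3, H * W * 3))
# Photometric training objective: || I - omega(f_theta(I)) ||^2
loss = np.sum((I - omega(f_theta(I, theta))) ** 2)
```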
Preferably, the three-dimensional face reconstruction network comprises an R-Net network.
Preferably, the method further comprises the following steps: performing optimization training on the audio prediction network according to the audio stream, the predicted three-dimensional face model parameters and the preset initial three-dimensional face model parameters, the training objectives being expressed as:

θ1* = argmin_{θ1} ‖ Ê − h_E(Â; θ1) ‖²
θ2* = argmin_{θ2} ‖ P̂ − h_P(Â; θ2) ‖²

where θ1 and θ2 are the network parameters of the audio prediction network, Ê denotes the preset initial expression model parameters, P̂ denotes the preset initial posture model parameters, and Â denotes the audio stream; h_E(·) denotes the expression feature prediction function learned by the audio prediction network, and h_P(·) denotes the posture feature prediction function learned by the audio prediction network.
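A minimal sketch of the audio prediction step, assuming MFCC-style audio features and a single hand-rolled LSTM cell with two linear heads standing in for h_E and h_P; all dimensions and the feature choice are illustrative assumptions, not details from this disclosure.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step; gate order: input, forget, cell, output."""
    z = W @ x + U @ h + b
    n = h.size
    i = 1.0 / (1.0 + np.exp(-z[:n]))         # input gate
    f = 1.0 / (1.0 + np.exp(-z[n:2 * n]))    # forget gate
    g = np.tanh(z[2 * n:3 * n])              # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3 * n:]))     # output gate
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

rng = np.random.default_rng(0)
d_in, d_h = 13, 32                           # e.g. 13 MFCC features per audio frame
W = 0.1 * rng.standard_normal((4 * d_h, d_in))
U = 0.1 * rng.standard_normal((4 * d_h, d_h))
b = np.zeros(4 * d_h)
W_E = 0.1 * rng.standard_normal((64, d_h))   # head h_E: hidden state -> expression E
W_P = 0.1 * rng.standard_normal((6, d_h))    # head h_P: hidden state -> posture P

h = np.zeros(d_h)
c = np.zeros(d_h)
for _ in range(25):                          # a short window of audio frames
    x = rng.standard_normal(d_in)            # stand-in for one audio feature frame
    h, c = lstm_step(x, h, c, W, U, b)
E_pred, P_pred = W_E @ h, W_P @ h
```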
Preferably, the audio prediction network comprises an LSTM network.
Preferably, in the step S2, the specific step of restoring the corresponding three-dimensional face model by using the three-dimensional face reconstruction technique includes:
initializing a set of vertices of a three-dimensional face
Figure BDA0003104354100000034
And RGB set corresponding to three-dimensional face vertex set
Figure BDA0003104354100000035
Changing the position of the three-dimensional face vertex set according to the expression model parameter E and the posture model parameter P in the parameter file, and changing the color value of the RGB set corresponding to the three-dimensional face vertex set according to the texture model parameter T and the illumination model parameter L in the parameter file, wherein the expression formula is as follows:
Figure BDA0003104354100000036
Figure BDA0003104354100000037
in the formula (I), the compound is shown in the specification,
Figure BDA0003104354100000038
an identity base representing a three-dimensional face,
Figure BDA0003104354100000039
an expression base representing a three-dimensional face,
Figure BDA00031043541000000310
a texture base representing a three-dimensional face; x (lambda; P) represents a function for changing the position of the three-dimensional face vertex set according to the posture model parameter P, and lambda represents the vertex set of the position to be changed; c (epsilon; L) represents a function for changing the RGB set corresponding to the three-dimensional face vertex set according to the illumination model parameter L, wherein epsilon is the RGB set of the color value to be changed; n is a radical of1、N2The total numbers of the identity bases and the expression bases are respectively, and the subscripts i and j are respectively the ordinal numbers of the identity bases and the expression bases;
according to the vertex set S of the changed three-dimensional face*And RGB set T*And constructing a recovered three-dimensional face model, mapping each vertex in the three-dimensional face model to a two-dimensional image plane by affine transformation, and mapping the RGB color value of each vertex to the two-dimensional image plane correspondingly to serve as a pixel point of the mapping point to obtain a recovered video image frame.
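The restoration and mapping steps above can be sketched numerically; the basis counts, the z-axis rotation standing in for X(·; P), and the weak-perspective projection standing in for the affine mapping are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                                    # toy vertex count
S_bar = rng.random((N, 3))                 # initialized vertex set S_bar
B_id = rng.standard_normal((N, 3, 4))      # identity bases (toy N1 = 4)
B_exp = rng.standard_normal((N, 3, 5))     # expression bases (toy N2 = 5)
S_coef = 0.01 * rng.standard_normal(4)     # identity parameters S
E_coef = 0.01 * rng.standard_normal(5)     # expression parameters E

# S* = X(S_bar + sum_i S_i B_i^id + sum_j E_j B_j^exp ; P)
verts = S_bar + B_id @ S_coef + B_exp @ E_coef

# X(.; P): a rigid z-axis rotation plus translation stands in for posture P
angle, t = 0.1, np.array([0.0, 0.0, 2.0])
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0,            0.0,           1.0]])
verts = verts @ Rz.T + t

# Map each vertex to the 2D image plane (weak-perspective stand-in)
focal = 50.0
uv = focal * verts[:, :2] / verts[:, 2:3]
```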
Preferably, the step S2 further includes the following steps: after the parameter file is compressed, the parameter file is transmitted to a second communication terminal by adopting cloud service; and transmitting the audio stream of the first communication terminal to a second communication terminal by adopting a network protocol.
Preferably, the method further comprises the following steps: the second communication terminal is preset with identity model parameters S of other characters, and when the second communication terminal receives the parameter file and the audio stream transmitted by the first communication terminal, the preset identity model parameters S replace the identity model parameters in the parameter file; the corresponding three-dimensional face model is then restored using a three-dimensional face reconstruction technique.
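The identity replacement described above amounts to overwriting one entry of the received parameter file before restoration; a minimal sketch, with all names and dimensions hypothetical:

```python
import numpy as np

def replace_identity(param_file, preset_identity):
    """Return a copy of the received parameter file with its identity
    parameters S replaced by a preset identity (names are illustrative)."""
    out = dict(param_file)
    out["S"] = np.asarray(preset_identity)
    return out

received = {"S": np.zeros(80), "E": np.ones(64), "P": np.zeros(6)}
preset_S = np.full(80, 0.5)                # hypothetical preset character identity
swapped = replace_identity(received, preset_S)
```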
The invention also provides a virtual character video call system based on three-dimensional face reconstruction, which is applied to the virtual character video call method provided by any technical scheme, and the virtual character video call system comprises a first communication terminal and a second communication terminal, wherein the first communication terminal and the second communication terminal respectively comprise a video acquisition module, an audio acquisition module, a display module, a communication module and a main control module; wherein:
the video acquisition module is used for acquiring video streams and sending the video streams to the main control module;
the audio acquisition module is used for acquiring audio streams and sending the audio streams to the main control module;
the main control module decomposes the video stream into image frames according to the currently selected virtual character video call mode, and then carries out model parameter prediction on a parameterized three-dimensional face by adopting a three-dimensional face reconstruction network to obtain predicted three-dimensional face model parameters and store the predicted three-dimensional face model parameters as a parameter file;
or inputting the audio stream into an audio prediction network to carry out model parameter prediction of a parameterized three-dimensional face to obtain predicted three-dimensional face model parameters, merging and updating preset initial three-dimensional face model parameters according to the predicted three-dimensional face model parameters, and then saving the parameters as a parameter file;
the master control module transmits the generated parameter file to another communication terminal through the communication module;
the main control module is also used for restoring a corresponding three-dimensional face model by using a three-dimensional face reconstruction technology according to the parameter file received by the communication module and then mapping the three-dimensional face model to a two-dimensional image plane to obtain a restored video image frame sequence; and rendering the video image frame sequence, synthesizing the video image frame sequence with the audio stream to form a virtual character video, and transmitting the virtual character video to the display module for displaying.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
according to the invention, a three-dimensional face reconstruction technology is utilized, image frames captured by a camera are subjected to three-dimensional face reconstruction to obtain parameterized three-dimensional face model parameters for transmission, or the three-dimensional face model parameters of a communication terminal user are predicted from a recorded audio stream and then are transmitted, so that the data transmission quantity of a communication terminal can be reduced, and the smoothness of video call is effectively improved;
the invention can predict the three-dimensional face model parameters of the communication terminal user only from the recorded audio stream, and can recover the complete three-dimensional face by combining the preset initial three-dimensional face model parameters, thereby realizing video call under the condition of closing the camera.
Drawings
Fig. 1 is a flowchart of a virtual character video call method based on three-dimensional face reconstruction in embodiment 1.
Fig. 2 is a schematic diagram of a virtual character video call method according to embodiment 1.
Fig. 3 is a schematic diagram of a video-to-video one-way avatar video call in embodiment 1.
Fig. 4 is a schematic diagram of an audio-to-video one-way avatar video call in embodiment 1.
Fig. 5 is a schematic view of a virtual character video call for replacing the virtual character identity according to embodiment 2.
Fig. 6 is a schematic view of a virtual character video call for replacing the virtual character identity according to embodiment 2.
Fig. 7 is a schematic structural diagram of a virtual character video call system based on three-dimensional face reconstruction in embodiment 3.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The first communication terminal proposed in this embodiment refers to a data transmitting end, and the second communication terminal refers to a data receiving end.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
This embodiment provides a virtual character video call method based on three-dimensional face reconstruction; its flow is shown in Fig. 1 and Fig. 2.
The virtual character video call method based on three-dimensional face reconstruction provided by the embodiment comprises the following steps:
step 1: the selected video-to-video communication mode or audio-to-video communication mode is as follows:
when the video-to-video call mode is selected: acquiring a video stream and an audio stream of a first communication terminal, decomposing the video stream into image frames, and then performing model parameter prediction on a parameterized three-dimensional face by using a three-dimensional face reconstruction network to obtain predicted three-dimensional face model parameters and storing the predicted three-dimensional face model parameters as a parameter file;
when the audio-to-video call mode is selected: the method comprises the steps of obtaining an audio stream of a first communication terminal, inputting the audio stream into an audio prediction network to carry out model parameter prediction on a parameterized three-dimensional face to obtain predicted three-dimensional face model parameters, merging and updating preset initial three-dimensional face model parameters according to the predicted three-dimensional face model parameters, and storing the parameters as parameter files.
The three-dimensional face model parameters obtained through the three-dimensional face reconstruction network prediction comprise an identity model parameter S, an expression model parameter E, a texture model parameter T, a posture model parameter P and an illumination model parameter L; the three-dimensional face model parameters obtained through the audio prediction network prediction comprise expression model parameters E and posture model parameters P. In addition, the preset initial three-dimensional face model parameters in this embodiment include an identity model parameter S, an expression model parameter E, a texture model parameter T, a pose model parameter P, and an illumination model parameter L, and the initial three-dimensional face model parameters are obtained by performing three-dimensional face reconstruction through a face image of the front face of a user character shot in advance before a virtual character video call is performed.
The three-dimensional face reconstruction network in this step adopts an R-Net network, and the network is optimally trained according to the video stream image frames, the predicted three-dimensional face model parameters and the restored three-dimensional face model, the training objective being expressed as:

θ* = argmin_θ ‖ Î − ω(f_θ(Î)) ‖²

where θ denotes the network parameters of the three-dimensional face reconstruction network, Î denotes an original video stream image frame, f_θ(·) denotes the prediction function learned by the three-dimensional face reconstruction network, and ω(·) denotes the function that maps the restored three-dimensional face model to the two-dimensional image plane.
The audio prediction network in this step adopts an LSTM network, and the network is optimally trained according to the audio stream, the predicted three-dimensional face model parameters and the preset initial three-dimensional face model parameters, the training objectives being expressed as:

θ1* = argmin_{θ1} ‖ Ê − h_E(Â; θ1) ‖²
θ2* = argmin_{θ2} ‖ P̂ − h_P(Â; θ2) ‖²

where θ1 and θ2 are the network parameters of the audio prediction network, Ê denotes the preset initial expression model parameters, P̂ denotes the preset initial posture model parameters, and Â denotes the audio stream; h_E(·) denotes the expression feature prediction function learned by the audio prediction network, and h_P(·) denotes the posture feature prediction function learned by the audio prediction network.
Step 2: and transmitting the parameter file and the audio stream of the first communication terminal to a second communication terminal, restoring the corresponding three-dimensional face model by the second communication terminal according to the parameter file by using a three-dimensional face reconstruction technology, and mapping the three-dimensional face model to a two-dimensional image plane to obtain a restored video image frame sequence.
In this step, the parameter file is compressed, then the cloud service is adopted to transmit the parameter file to the second communication terminal, and a network protocol is adopted to transmit the audio stream of the first communication terminal to the second communication terminal.
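To illustrate why transmitting a compressed parameter file needs far less bandwidth than raw video, the sketch below serializes an audio-mode parameter set (E and P only) and compresses it with zlib as a stand-in for the zip compression described; formats and sizes are illustrative assumptions.

```python
import io
import zlib

import numpy as np

rng = np.random.default_rng(0)
# Audio-mode parameter file: only expression E and posture P per frame
params = {"E": rng.standard_normal(64).astype(np.float32),
          "P": rng.standard_normal(6).astype(np.float32)}

buf = io.BytesIO()
np.savez(buf, **params)                    # stand-in for a .mat / .yml parameter file
packed = zlib.compress(buf.getvalue())     # stand-in for zip compression

frame_bytes = 640 * 480 * 3                # one uncompressed VGA frame, for comparison
print(f"parameter file: {len(packed)} bytes; raw frame: {frame_bytes} bytes")
```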
Further, the specific steps of restoring the corresponding three-dimensional face model by using the three-dimensional face reconstruction technology in the step include:
initializing a vertex set S̄ of the three-dimensional face and the RGB set T̄ corresponding to the vertex set; changing the positions of the three-dimensional face vertex set according to the expression model parameter E and the posture model parameter P in the parameter file, and changing the color values of the corresponding RGB set according to the texture model parameter T and the illumination model parameter L in the parameter file, expressed as:

S* = X( S̄ + Σ_{i=1}^{N1} S_i·B_i^{id} + Σ_{j=1}^{N2} E_j·B_j^{exp} ; P )
T* = C( T̄ + Σ_k T_k·B_k^{tex} ; L )

where B^{id} denotes the identity bases of the three-dimensional face, B^{exp} denotes the expression bases of the three-dimensional face, and B^{tex} denotes the texture bases of the three-dimensional face; X(Λ; P) denotes the function that changes the positions of the vertex set according to the posture model parameter P, with Λ the vertex set whose positions are to be changed; C(ε; L) denotes the function that changes the RGB set corresponding to the vertex set according to the illumination model parameter L, with ε the RGB set whose color values are to be changed; N1 and N2 are the total numbers of identity bases and expression bases respectively, and the subscripts i and j are the ordinals of the identity bases and expression bases;

constructing the restored three-dimensional face model according to the changed vertex set S* and RGB set T* of the three-dimensional face, mapping each vertex of the three-dimensional face model to the two-dimensional image plane by affine transformation, and mapping the RGB color value of each vertex to the corresponding mapped point as its pixel value, thereby obtaining a restored video image frame.
And step 3: and rendering the video image frame sequence and then synthesizing the video image frame sequence and the audio stream into a virtual character video.
In this step, a face renderer based on a generative adversarial network is used to render the restored video image frame sequence, improving its realism; the rendered frame sequence is then synthesized with the received audio stream using multimedia processing technology to obtain the virtual character video, which is displayed.
In a specific implementation process, the method for virtual character video call provided in this embodiment is applied to a video-to-video call mode, and a flow diagram of the method is shown in fig. 3.
When the virtual character video call is carried out, the first communication terminal continuously captures video stream and audio stream, the video stream is decomposed into image frame sequences by adopting a multimedia processing technology, then model parameter prediction of a parameterized three-dimensional face is carried out by adopting a three-dimensional face reconstruction network, and predicted three-dimensional face model parameters are obtained and stored as parameter files in a mat format or a yml format. Compressing the parameter file in the mat format or yml format into a file in the zip format, then transmitting the file by using cloud service, and compressing the audio stream into an mp3 file, then transmitting the file by using a network protocol. And the second communication terminal recovers the corresponding three-dimensional face model by using a three-dimensional face reconstruction technology according to the received parameter file, then maps the three-dimensional face model to a two-dimensional image plane to obtain a recovered video image frame sequence, and then renders the video image frame sequence to synthesize a virtual character video with the audio stream.
This embodiment addresses the problem of poor video call fluency in areas with poor networks. Using a three-dimensional face reconstruction technique, the image frames captured by the camera are reconstructed into parameterized three-dimensional face model parameters, which contain almost all the information of the portrait in the image. The communication terminal therefore only needs to transmit the complete three-dimensional face model parameters and the audio stream to the opposite terminal to complete the data transmission of the video call, reducing the amount of data to be transmitted and improving the fluency of the video call. When the user selects the audio-to-video call mode, the data to be transmitted is likewise only the complete three-dimensional face model parameters and the audio stream.
In another specific implementation process, the method for virtual character video call provided in this embodiment is applied to a call mode from audio to video, and a flow diagram of the method is shown in fig. 4.
Before the virtual character video call is carried out, a front face image of a single portrait needs to be shot in advance to carry out three-dimensional face reconstruction, so that initial three-dimensional face model parameters are obtained and stored on a corresponding communication terminal.
During the virtual character video call, the first communication terminal collects only audio stream data. The audio stream is fed into an audio prediction network built on an LSTM network to predict the model parameters of the parameterized three-dimensional face, yielding predicted expression model parameters and pose model parameters. These are merged with, and used to update, the initial three-dimensional face model parameters stored on the current communication terminal, and the result is saved as a parameter file in mat or yml format.
As before, the parameter file is compressed and transmitted via a cloud service, and the audio stream is transmitted over a network protocol. From the received parameter file, the second communication terminal restores the corresponding three-dimensional face model using a three-dimensional face reconstruction technique, maps the model to the two-dimensional image plane to obtain the restored sequence of video image frames, and then renders that sequence and synthesizes it with the audio stream into a virtual character video.
This embodiment addresses scenarios in which a conventional video call is restricted. Using an audio prediction method that combines three-dimensional face reconstruction with deep learning, the expression and head-pose model parameters of the terminal user's parameterized three-dimensional face are predicted from the recorded audio stream while the camera remains off, and are then merged with the three-dimensional face model parameters preset on the communication terminal. The communication terminal can restore the complete three-dimensional face from the complete set of model parameters and map it to the two-dimensional image plane to obtain the corresponding image frames, thereby realizing the audio-to-video call mode.
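The merge-and-update step on this audio-to-video path can be sketched as follows. Here `h_E` and `h_P` are hypothetical stand-ins for the LSTM audio prediction network, and all parameter sizes are illustrative assumptions:

```python
# Minimal sketch of the merge-and-update step on the audio-to-video path.
# h_E and h_P are hypothetical stand-ins for the LSTM audio prediction
# network; only expression (E) and pose (P) parameters are predicted from
# audio, while identity (S), texture (T) and illumination (L) come from
# the initial frontal-face reconstruction stored on the terminal.
def h_E(audio_window):
    return [0.1] * 64                       # predicted expression parameters (stub)

def h_P(audio_window):
    return [0.0, 0.0, 0.1, 0.0, 0.0, 0.0]   # predicted pose parameters (stub)

def merge_with_initial(initial_params, audio_window):
    """Overwrite E and P with audio-predicted values, keeping S, T, L."""
    merged = dict(initial_params)           # copy, leaving the stored set intact
    merged["E"] = h_E(audio_window)
    merged["P"] = h_P(audio_window)
    return merged

# Initial parameters from the pre-captured frontal face image (illustrative).
initial = {"S": [1.0] * 80, "E": [0.0] * 64, "T": [0.5] * 80,
           "P": [0.0] * 6, "L": [0.2] * 27}
frame_params = merge_with_initial(initial, audio_window=None)
```

The merged dictionary is what would be serialized to the per-frame parameter file and transmitted; the stored initial parameters themselves are never modified.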
Example 2
The present embodiment provides a virtual character video call method in which the virtual character's identity can be replaced, building on the virtual character video call method based on three-dimensional face reconstruction provided in embodiment 1. Fig. 5 to 6 are schematic diagrams of a virtual character video call with a replaced virtual character identity according to this embodiment.
In this embodiment, the method further includes the following step: the second communication terminal is preset with identity model parameters S of other figures; when it receives the parameter file and audio stream transmitted by the first communication terminal, it substitutes its preset identity model parameters S for the identity model parameters in the parameter file and then restores the corresponding three-dimensional face model using a three-dimensional face reconstruction technique.
In a specific implementation, the communication terminal can freely select among other pre-stored virtual character identities; that is, the terminal stores the identity model parameters S of the corresponding character identities in advance, and the virtual character identity can be changed at any time during the video call.
When the communication terminal selects another virtual character identity, it takes the corresponding identity model parameters S from those stored in advance and substitutes them for the identity model parameters in the currently received parameter file, while the expression model parameters E, texture model parameters T, pose model parameters P and illumination model parameters L remain unchanged; the corresponding three-dimensional face model is then restored using a three-dimensional face reconstruction technique.
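This replacement can be sketched minimally, assuming the parameter file is held as a dictionary keyed by parameter name; the sizes and values below are illustrative assumptions:

```python
# Minimal sketch of the identity-replacement step: only the identity model
# parameters S in the received parameter file are overwritten with a preset
# character's S, while E, T, P and L from the caller are preserved.
# Parameter sizes are illustrative assumptions, not values from the patent.
def swap_identity(received_params, preset_identity_S):
    out = dict(received_params)          # copy the received parameter file
    out["S"] = list(preset_identity_S)   # replace identity parameters only
    return out

received = {"S": [0.0] * 80, "E": [0.3] * 64, "T": [0.5] * 80,
            "P": [0.1] * 6, "L": [0.2] * 27}
cartoon_S = [0.9] * 80                   # preset cartoon-character identity
replaced = swap_identity(received, cartoon_S)
```

Restoring the three-dimensional face model from `replaced` then yields the cartoon character's identity animated with the caller's original expression, pose, texture and illumination.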
This embodiment exploits the flexibility with which the parameterized three-dimensional face model parameters can be modified and replaced. When one end of the video call receives the complete three-dimensional face model parameters, it need only replace the identity model parameters S with the preset identity parameters of a cartoon character or celebrity to take on another virtual character's identity, without changing the expression or head pose of the person at the other end of the call, which makes the video call more engaging and flexible.
Example 3
The embodiment provides a virtual character video call system based on three-dimensional face reconstruction, which is applied to the virtual character video call method based on three-dimensional face reconstruction provided in embodiment 1 or embodiment 2. Fig. 7 is a schematic structural diagram of a virtual character video call system based on three-dimensional face reconstruction according to this embodiment.
The virtual character video call system based on three-dimensional face reconstruction provided by the embodiment comprises a first communication terminal and a second communication terminal which have the same structure, wherein the first communication terminal and the second communication terminal respectively comprise a video acquisition module 1, an audio acquisition module 2, a display module 5, a communication module 3 and a main control module 4; wherein:
the video acquisition module 1 is used for acquiring video streams and sending the video streams to the main control module 4;
the audio acquisition module 2 is used for acquiring audio streams and sending the audio streams to the main control module 4;
the main control module 4 processes the collected video stream or audio stream according to the currently selected virtual character video call mode, specifically:
when a video-to-video call mode is selected, decomposing the video stream into image frames, and then performing model parameter prediction on a parameterized three-dimensional face by using a three-dimensional face reconstruction network to obtain predicted three-dimensional face model parameters and storing the predicted three-dimensional face model parameters as a parameter file;
when an audio-video call mode is selected, inputting the audio stream into an audio prediction network to perform model parameter prediction of a parameterized three-dimensional face to obtain predicted three-dimensional face model parameters, merging and updating preset initial three-dimensional face model parameters according to the predicted three-dimensional face model parameters, and storing the parameters as a parameter file;
the main control module 4 transmits the generated parameter file to another communication terminal through the communication module 3;
the main control module 4 is further configured to recover a corresponding three-dimensional face model by using a three-dimensional face reconstruction technique according to the parameter file received by the communication module 3, and then map the three-dimensional face model to a two-dimensional image plane to obtain a recovered video image frame sequence; and rendering the video image frame sequence, synthesizing the video image frame sequence with the audio stream to form a virtual character video, and transmitting the virtual character video to the display module 5 for display.
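The "map to the two-dimensional image plane" step performed by the main control module 4 can be sketched as a weak-perspective (scaled affine) projection; the yaw, scale and translation inputs below stand in for the pose model parameters P and are illustrative assumptions:

```python
import numpy as np

# Illustrative weak-perspective mapping of reconstructed 3-D face vertices
# to the 2-D image plane (the affine-transformation step): rotate the
# vertex set, drop depth, then scale and translate into pixel coordinates.
def project_to_image_plane(vertices, yaw, scale, tx, ty):
    """vertices: (N, 3) array; returns (N, 2) pixel coordinates."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])          # rotation about the vertical axis
    rotated = vertices @ R.T              # apply head pose
    return scale * rotated[:, :2] + np.array([tx, ty])

pts = project_to_image_plane(np.zeros((4, 3)),
                             yaw=0.3, scale=100.0, tx=128.0, ty=128.0)
```

In the full system, each projected vertex would then receive its RGB color value from the restored texture set as the pixel value of its mapped point.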
The same or similar reference numerals correspond to the same or similar parts;
the terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A virtual character video call method based on three-dimensional face reconstruction is characterized by comprising the following steps:
s1: selecting virtual character video call modes, including a video-to-video call mode and an audio-to-video call mode:
when the video-to-video call mode is selected: acquiring a video stream and an audio stream of a first communication terminal, decomposing the video stream into image frames, and then performing model parameter prediction on a parameterized three-dimensional face by using a three-dimensional face reconstruction network to obtain predicted three-dimensional face model parameters and storing the predicted three-dimensional face model parameters as a parameter file;
when the audio-to-video call mode is selected: acquiring an audio stream of a first communication terminal, inputting the audio stream into an audio prediction network to perform model parameter prediction of a parameterized three-dimensional face, and obtaining predicted three-dimensional face model parameters; merging and updating preset initial three-dimensional face model parameters according to the predicted three-dimensional face model parameters, and then saving the parameters as a parameter file;
s2: transmitting the parameter file and the audio stream of the first communication terminal to a second communication terminal, restoring a corresponding three-dimensional face model by the second communication terminal according to the parameter file by using a three-dimensional face reconstruction technology, and mapping the three-dimensional face model to a two-dimensional image plane to obtain a restored video image frame sequence;
s3: and rendering the video image frame sequence and then synthesizing the video image frame sequence and the audio stream into a virtual character video.
2. The virtual character video call method according to claim 1, wherein the three-dimensional face model parameters predicted by the three-dimensional face reconstruction network include an identity model parameter S, an expression model parameter E, a texture model parameter T, a pose model parameter P, and an illumination model parameter L; the three-dimensional face model parameters obtained through the audio prediction network prediction comprise expression model parameters E and posture model parameters P.
3. The virtual character video call method as claimed in claim 2, further comprising the steps of: carrying out optimization training on the three-dimensional face reconstruction network according to the video stream image frames, the predicted three-dimensional face model parameters and the restored three-dimensional face model, wherein the expression formula is as follows:
$$\theta^{*}=\arg\min_{\theta}\left\|I-\omega\!\left(f_{\theta}(I)\right)\right\|^{2}$$

In the formula, $\theta$ represents the network parameters of the three-dimensional face reconstruction network, $I$ represents the original video stream image frame, $f_{\theta}(\cdot)$ represents the prediction function learned by the three-dimensional face reconstruction network, and $\omega(\cdot)$ represents the function that maps the restored three-dimensional face model to the two-dimensional image plane.
4. The virtual character video call method as claimed in claim 3, wherein the three-dimensional face reconstruction network comprises an R-Net network.
5. The virtual character video call method as claimed in claim 2, further comprising the steps of: performing optimization training on the audio prediction network according to the audio stream, the predicted three-dimensional face model parameters and the preset initial three-dimensional face model parameters, wherein an expression formula is as follows:
$$\theta_{1}^{*}=\arg\min_{\theta_{1}}\left\|\hat{E}-h_{E}(\hat{A};\theta_{1})\right\|^{2}$$

$$\theta_{2}^{*}=\arg\min_{\theta_{2}}\left\|\hat{P}-h_{P}(\hat{A};\theta_{2})\right\|^{2}$$

In the formulas, $\theta_{1}$ and $\theta_{2}$ are the network parameters of the audio prediction network, $\hat{E}$ represents the preset initial expression model parameters, $\hat{P}$ represents the preset initial pose model parameters, and $\hat{A}$ represents the audio stream; $h_{E}(\cdot)$ represents the expression feature prediction function learned by the audio prediction network and $h_{P}(\cdot)$ represents the pose feature prediction function learned by the audio prediction network.
6. The avatar video call method of claim 5, wherein said audio prediction network comprises an LSTM network.
7. The virtual character video call method as claimed in claim 2, wherein in the step S2, the specific step of restoring the corresponding three-dimensional face model by using the three-dimensional face reconstruction technique includes:
initializing a vertex set $\bar{S}$ of the three-dimensional face and the RGB set $\bar{T}$ corresponding to the three-dimensional face vertex set; changing the positions of the three-dimensional face vertex set according to the identity model parameter S, the expression model parameter E and the pose model parameter P in the parameter file, and changing the color values of the RGB set corresponding to the three-dimensional face vertex set according to the texture model parameter T and the illumination model parameter L in the parameter file, with the expression formulas:

$$S^{*}=X\Big(\bar{S}+\sum_{i=1}^{N_{1}}S_{i}B_{i}^{\mathrm{id}}+\sum_{j=1}^{N_{2}}E_{j}B_{j}^{\mathrm{exp}};\,P\Big)$$

$$T^{*}=C\Big(\bar{T}+\sum_{i}T_{i}B_{i}^{\mathrm{tex}};\,L\Big)$$

In the formulas, $B_{i}^{\mathrm{id}}$ represents an identity base of the three-dimensional face, $B_{j}^{\mathrm{exp}}$ represents an expression base of the three-dimensional face, and $B_{i}^{\mathrm{tex}}$ represents a texture base of the three-dimensional face; $X(\Lambda;P)$ represents the function that changes the positions of the three-dimensional face vertex set according to the pose model parameter P, where $\Lambda$ is the vertex set whose positions are to be changed; $C(\varepsilon;L)$ represents the function that changes the RGB set corresponding to the three-dimensional face vertex set according to the illumination model parameter L, where $\varepsilon$ is the RGB set whose color values are to be changed; $N_{1}$ and $N_{2}$ are the total numbers of identity bases and expression bases respectively, and the subscripts i and j are the ordinals of the identity bases and expression bases;
constructing the restored three-dimensional face model from the changed vertex set S* of the three-dimensional face and the RGB set T*, mapping each vertex of the three-dimensional face model to the two-dimensional image plane by affine transformation, and mapping the RGB color value of each vertex to the two-dimensional image plane as the pixel value of its mapped point, so as to obtain a restored video image frame.
8. The virtual character video call method as claimed in claim 2, wherein said step of S2 further comprises the steps of: after the parameter file is compressed, the parameter file is transmitted to a second communication terminal by adopting cloud service; and transmitting the audio stream of the first communication terminal to a second communication terminal by adopting a network protocol.
9. The virtual character video call method according to any one of claims 2 to 8, further comprising the steps of: and the second communication terminal is preset with identity model parameters S of other figures, and when the second communication terminal receives the parameter file and the audio stream transmitted by the first communication terminal, the identity model parameters S and the identity model parameters in the parameter file are replaced, and then the corresponding three-dimensional face model is recovered by using a three-dimensional face reconstruction technology.
10. A virtual character video call system based on three-dimensional face reconstruction is applied to the virtual character video call method of any one of claims 1 to 9, and is characterized by comprising a first communication terminal and a second communication terminal, wherein the first communication terminal and the second communication terminal respectively comprise a video acquisition module, an audio acquisition module, a display module, a communication module and a main control module; wherein:
the video acquisition module is used for acquiring video streams and sending the video streams to the main control module;
the audio acquisition module is used for acquiring audio streams and sending the audio streams to the main control module;
the main control module decomposes the video stream into image frames according to the currently selected virtual character video call mode, and then carries out model parameter prediction on a parameterized three-dimensional face by adopting a three-dimensional face reconstruction network to obtain predicted three-dimensional face model parameters and store the predicted three-dimensional face model parameters as a parameter file;
or inputting the audio stream into an audio prediction network to carry out model parameter prediction of a parameterized three-dimensional face to obtain predicted three-dimensional face model parameters, merging and updating preset initial three-dimensional face model parameters according to the predicted three-dimensional face model parameters, and then saving the parameters as a parameter file;
the master control module transmits the generated parameter file to another communication terminal through the communication module;
the main control module is also used for restoring a corresponding three-dimensional face model by using a three-dimensional face reconstruction technology according to the parameter file received by the communication module and then mapping the three-dimensional face model to a two-dimensional image plane to obtain a restored video image frame sequence; and rendering the video image frame sequence, synthesizing the video image frame sequence with the audio stream to form a virtual character video, and transmitting the virtual character video to the display module for displaying.
CN202110632937.0A 2021-06-07 2021-06-07 Virtual character video call method and system based on three-dimensional face reconstruction Pending CN113395476A (en)


Publications (1)

Publication Number Publication Date
CN113395476A true CN113395476A (en) 2021-09-14

Family

ID=77618475




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication