CN117422802B - Three-dimensional figure digital reconstruction method, device, terminal equipment and storage medium - Google Patents

Three-dimensional figure digital reconstruction method, device, terminal equipment and storage medium

Info

Publication number
CN117422802B
CN117422802B CN202311747116.7A
Authority
CN
China
Prior art keywords
image
input image
feature
point cloud
expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311747116.7A
Other languages
Chinese (zh)
Other versions
CN117422802A (en)
Inventor
楚选耕
李昱
林丽健
刘云飞
余飞
周昌印
幺宝刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Gaishi Technology Co ltd
International Digital Economy Academy IDEA
Original Assignee
Hangzhou Gaishi Technology Co ltd
International Digital Economy Academy IDEA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Gaishi Technology Co ltd, International Digital Economy Academy IDEA filed Critical Hangzhou Gaishi Technology Co ltd
Priority to CN202311747116.7A priority Critical patent/CN117422802B/en
Publication of CN117422802A publication Critical patent/CN117422802A/en
Application granted granted Critical
Publication of CN117422802B publication Critical patent/CN117422802B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00Animation
    • G06T13/203D [Three Dimensional] animation
    • G06T13/403D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/005General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/04Indexing scheme for image data processing or generation, in general involving 3D image data
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Graphics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a three-dimensional figure digital reconstruction method, a device, terminal equipment and a storage medium, wherein the method comprises the following steps: acquiring an input image, and constructing a standard feature space based on the input image, wherein the standard feature space is a three-plane standard feature space; acquiring point cloud data corresponding to the input image, and constructing a dynamic expression scene based on the point cloud data, wherein the dynamic expression scene is used for reflecting dynamic expression characteristics corresponding to the input image; and acquiring camera attitude parameters, and performing volume rendering based on the standard feature space, the dynamic expression characteristics and the camera attitude parameters to obtain a replay image so as to realize digital reconstruction of the three-dimensional portrait. According to the invention, three-dimensional portrait digital reconstruction can be realized by analyzing and reasoning over the input image, without a large amount of user data or a time-consuming separate training process. In addition, the invention uses a dynamic expression scene based on point cloud data, which avoids over-processing of the expression information and facilitates accurate and natural expression driving.

Description

Three-dimensional figure digital reconstruction method, device, terminal equipment and storage medium
Technical Field
The present invention relates to the field of virtual reality and computer vision technologies, and in particular, to a method and apparatus for digitally reconstructing a three-dimensional figure, a terminal device, and a storage medium.
Background
Existing methods for digitally reconstructing and driving three-dimensional figures can be roughly divided into three types: methods based on 2D warp fields, methods based on mesh models, and methods based on neural rendering. However, 2D warp field based methods have difficulty maintaining multi-view consistency due to the lack of necessary 3D constraints. Mesh model based methods have difficulty generating high-precision, photo-level digital avatars. Neural rendering based methods either require a large amount of portrait data for reconstruction or involve time-consuming optimization procedures during inference, and their expression control is often not accurate enough. Therefore, the optimization process of three-dimensional figure digital reconstruction in the prior art is complex and time-consuming, and the reconstructed three-dimensional digital figures also suffer from insufficient accuracy.
Accordingly, there is a need for improvement and advancement in the art.
Disclosure of Invention
The invention aims to solve the technical problems that the optimization process of the three-dimensional figure digital reconstruction is complex and time-consuming, and the reconstructed three-dimensional digital figure has insufficient precision.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
in a first aspect, the present invention provides a method for digitally reconstructing a three-dimensional portrait, where the method includes:
acquiring an input image, and constructing a standard feature space based on the input image, wherein the standard feature space is a three-plane standard feature space and is used for reflecting three-dimensional image features of the input image;
acquiring point cloud data corresponding to the input image, and constructing a dynamic expression scene based on the point cloud data, wherein the dynamic expression scene is used for reflecting dynamic expression characteristics corresponding to the input image;
and acquiring camera attitude parameters, and performing volume rendering based on the standard feature space, the dynamic expression features and the camera attitude parameters to obtain a replay image so as to realize digital reconstruction of the three-dimensional human image.
In one implementation, the constructing a canonical feature space based on the input image includes:
based on a preset encoder, mapping image features of the input image into three feature planes respectively;
and constructing the standard feature space based on the three feature planes.
In one implementation, the constructing a canonical feature space based on the input image further includes:
if the input images are multiple, fusing the standard feature spaces corresponding to each input image based on a preset attention module to obtain fused standard feature spaces.
In one implementation manner, the obtaining the point cloud data corresponding to the input image, and constructing a dynamic expression scene based on the point cloud data, includes:
acquiring shape parameters, expression parameters and posture parameters corresponding to the input image, and generating point cloud data based on the shape parameters, the expression parameters and the posture parameters;
and acquiring position features corresponding to the point cloud data, and constructing the dynamic expression scene based on the point cloud data and the position features.
In one implementation, the constructing the dynamic expression scene based on the point cloud data and the position features includes:
acquiring point cloud vertexes in the point cloud data, and determining a plurality of adjacent points corresponding to the point cloud vertexes;
acquiring a first position characteristic of the adjacent point, and acquiring a sampling point for synthesizing light in the volume rendering process and the position characteristic of the sampling point;
encoding the relative position between the first position feature of the adjacent point and the position feature of the sampling point based on a preset position encoding function to obtain an encoded position feature;
and carrying out linear regression on the second position feature and the coding position feature based on a linear regression layer to obtain the dynamic expression feature, and constructing the dynamic expression scene based on the dynamic expression feature.
In one implementation, the performing volume rendering based on the canonical feature space, the dynamic expression feature, and the camera pose parameter to obtain a replay image to implement three-dimensional portrait digitized reconstruction includes:
sampling a plurality of beams of light based on the camera attitude parameters to obtain RGB data and density data of each sampling point, and synthesizing RGB colors of the light based on the RGB data and the density data;
and performing volume rendering on the standard feature space, the dynamic expression features and the RGB colors based on a preset volume rendering function to obtain the replay image.
In one implementation, the method further comprises:
and processing the replay image by using a lightweight super-resolution module to obtain a final image, wherein the resolution of the final image is higher than that of the replay image.
In one implementation, the method further comprises:
acquiring a target image, wherein the target image and the input image come from the same video data;
obtaining a perceived loss weight, and determining a perceived loss distance between the final image and the target image based on the target image, the replay image, the final image, and the perceived loss weight, the perceived loss distance being used to reflect a difference between the final image and the target image;
and optimizing the final image based on the perceived loss distance.
In a second aspect, an embodiment of the present invention further provides a three-dimensional image digitized reconstruction device, where the device includes:
the characteristic space construction module is used for acquiring an input image and constructing a standard characteristic space based on the input image, wherein the standard characteristic space is a three-plane standard characteristic space and is used for reflecting the three-dimensional image characteristics of the input image;
the expression characteristic analysis module is used for acquiring point cloud data corresponding to the input image, and constructing a dynamic expression scene based on the point cloud data, wherein the dynamic expression scene is used for reflecting dynamic expression characteristics corresponding to the input image;
and the image reconstruction module is used for acquiring camera attitude parameters, and performing volume rendering based on the standard feature space, the dynamic expression features and the camera attitude parameters to obtain a replay image so as to realize digital reconstruction of the three-dimensional portrait.
In a third aspect, an embodiment of the present invention further provides a terminal device, where the terminal device includes a memory, a processor, and a three-dimensional portrait digital reconstruction program stored in the memory and capable of running on the processor, and when the processor executes the three-dimensional portrait digital reconstruction program, the steps of the three-dimensional portrait digital reconstruction method according to any one of the above schemes are implemented.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores a three-dimensional portrait digital reconstruction program, where the three-dimensional portrait digital reconstruction program, when executed by a processor, implements the steps of the three-dimensional portrait digital reconstruction method according to any one of the above schemes.
The beneficial effects are that: compared with the prior art, the invention provides a three-dimensional portrait digital reconstruction method. The method first acquires an input image and constructs a standard feature space based on the input image, wherein the standard feature space is a three-plane standard feature space, which is beneficial to establishing three-dimensional image features. Then, the invention acquires point cloud data corresponding to the input image and constructs a dynamic expression scene based on the point cloud data, wherein the dynamic expression scene is used for reflecting the dynamic expression characteristics corresponding to the input image. The use of a point-cloud-based dynamic expression scene avoids over-processing of the expression information and facilitates accurate and natural expression driving. Finally, the invention obtains the camera attitude parameters and performs volume rendering based on the standard feature space, the dynamic expression features and the camera attitude parameters to obtain a replay image, so as to realize the digital reconstruction of the three-dimensional portrait. Therefore, the invention can realize three-dimensional portrait digital reconstruction by analyzing and reasoning over the input image, without a large amount of user data or a time-consuming separate training process, thereby simplifying the three-dimensional portrait digital reconstruction process, improving the reconstruction efficiency and improving the reconstruction precision.
Drawings
Fig. 1 is a flowchart of a specific implementation of a three-dimensional figure digital reconstruction method according to an embodiment of the present invention.
Fig. 2 is a flowchart of a specific application scenario of the three-dimensional figure digital reconstruction method provided by the embodiment of the present invention.
Fig. 3 is a functional schematic diagram of a three-dimensional figure digital reconstruction device according to an embodiment of the present invention.
Fig. 4 is a schematic block diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and more specific, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
2D warp field based methods in the prior art have difficulty maintaining multi-view consistency when a large change in head pose needs to be driven, due to the lack of necessary 3D constraints. That is, when a large-angle head turn is required, the model may fail or the identity information of the person (facial appearance, hairstyle, etc.) may change. The expressive capability of mesh model based methods in the prior art is limited by the 3D deformable model, and the accuracy and resolution of the mesh are limited, so it is difficult for such methods to generate high-precision, photo-level digital avatar driving results. Moreover, since the 3D deformable model does not include the portion below the neck or the hair, the reconstructed digital avatar often lacks non-facial information such as hair, and the expression is often not natural enough because of the limited expressive power of the 3D deformable model. Neural rendering based methods in the prior art either need a large amount of portrait data for reconstruction, or involve a very time-consuming optimization process at inference time and cannot generalize well to arbitrary new identities. In the methods that reconstruct a digital avatar from a single image, the accuracy of expression control is often insufficient, and it is difficult to drive fine expression changes, resulting in unsatisfactory driving results.
In order to solve the problems in the prior art, this embodiment provides a three-dimensional portrait digital reconstruction method that does not require a large amount of user data or a time-consuming separate training process, thereby simplifying the process of three-dimensional portrait digital reconstruction, improving the reconstruction efficiency and improving the reconstruction precision. In a specific application, this embodiment first acquires an input image and constructs a standard feature space based on the input image, wherein the standard feature space is a three-plane standard feature space used for reflecting the three-dimensional image features of the input image. Then, point cloud data corresponding to the input image are acquired, and a dynamic expression scene is constructed based on the point cloud data, wherein the dynamic expression scene is used for reflecting the dynamic expression characteristics corresponding to the input image. Finally, this embodiment obtains the camera pose parameters and performs volume rendering based on the standard feature space, the dynamic expression features and the camera pose parameters to obtain a replay image, so as to realize digital reconstruction of the three-dimensional portrait. As can be seen from the above, this embodiment uses a dynamic expression scene based on point cloud data, which avoids over-processing of the expression information and facilitates accurate and natural expression driving.
The three-dimensional figure digital reconstruction method can be applied to terminal equipment, wherein the terminal equipment comprises intelligent product terminals such as computers, mobile phones and intelligent televisions. Specifically, as shown in fig. 1, the method for digitally reconstructing the three-dimensional figure comprises the following steps:
step S100, an input image is obtained, and a standard feature space is constructed based on the input image, wherein the standard feature space is a three-plane standard feature space and is used for reflecting three-dimensional image features of the input image.
In this embodiment, the input image is a face image, as shown in fig. 2, and there may be one or more input images. After the input image is acquired, this embodiment constructs a canonical feature space based on the input image. The canonical feature space is a three-plane canonical feature space, which can be used to reflect three-dimensional image features because it encodes the image features of the input image.
In one implementation, the present embodiment includes the following steps in building a canonical feature space:
step S101, mapping image features of the input image into three feature planes respectively based on a preset encoder;
and step S102, constructing the standard feature space based on the three feature planes.
Specifically, this embodiment may map the image features of the input image into three feature planes using a preset encoder, which may be a StyleGAN+UNet structure. For example, an input image of dimensions 3×512×512 may be mapped into three feature planes of dimensions 3×32×256 by means of an encoder with the StyleGAN+UNet structure. This step encodes a static canonical feature space, so this embodiment builds the canonical feature space after mapping the image features to the three feature planes.
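For illustration only, a minimal PyTorch sketch of this mapping step is given below. The plain convolutional backbone stands in for the StyleGAN+UNet encoder mentioned above, and the channel and plane sizes are illustrative assumptions rather than the exact values of this embodiment.

```python
# Hedged sketch of steps S101-S102: map an input image to three feature planes.
# The backbone below is a simple CNN stand-in for the StyleGAN+UNet encoder named in the
# text; channel/plane sizes are illustrative assumptions, not the patent's exact values.
import torch
import torch.nn as nn

class TriPlaneEncoder(nn.Module):
    def __init__(self, plane_channels: int = 32, plane_size: int = 256):
        super().__init__()
        self.plane_channels = plane_channels
        self.plane_size = plane_size
        # One shared backbone producing 3 * plane_channels feature maps,
        # later split into the three planes of the canonical feature space.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * plane_channels, 3, padding=1),
            nn.Upsample(size=(plane_size, plane_size), mode="bilinear", align_corners=False),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) -> planes: (B, 3, C, S, S)
        feats = self.backbone(image)
        b = feats.shape[0]
        return feats.view(b, 3, self.plane_channels, self.plane_size, self.plane_size)

planes = TriPlaneEncoder()(torch.randn(1, 3, 512, 512))  # (1, 3, 32, 256, 256)
```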
In another implementation, if there are multiple input images, this embodiment may fuse the canonical feature spaces corresponding to each input image based on a preset attention module to obtain a fused canonical feature space. Specifically, when a plurality of input images are provided, this embodiment uses an attention module (MTA) to fuse the three-plane canonical feature spaces obtained from the different images into a fused three-plane canonical feature space P, and the expression is as follows:

P = \sum_{i=1}^{n} A_i \odot E(F_i), \qquad A_i = \mathrm{softmax}_i\big( Q(T) \cdot K(E(F_i)) \big)

where F_i are the input images received by the attention module, i indexes the different input images, and n is the number of input images; E is the encoder of the canonical feature space, such as the StyleGAN+UNet structure; T is a learnable three-plane feature; Q and K generate the query features and key features used to compute attention, respectively; and A_i is the attention weight of the attention module. The output P of the attention module is the fused three-plane canonical feature space. Because this embodiment receives the image features of multiple input images and fuses multiple three-plane canonical feature spaces, the fused three-plane canonical feature space makes it easier to generate animatable, high-fidelity results.
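A minimal sketch of one possible realization of this attention-based fusion is shown below. The 1×1 query/key projections, the softmax over images, and the tensor shapes are assumptions about the MTA module, not confirmed details of this embodiment.

```python
# Hedged sketch of the multi-image fusion: attention weights computed from a learnable
# tri-plane query T against per-image tri-plane features E(F_i), then a weighted sum.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriPlaneAttentionFusion(nn.Module):
    def __init__(self, channels: int = 32, size: int = 256):
        super().__init__()
        self.query_tri_plane = nn.Parameter(torch.randn(3, channels, size, size))  # learnable T
        self.to_q = nn.Conv2d(channels, channels, 1)
        self.to_k = nn.Conv2d(channels, channels, 1)

    def forward(self, planes: torch.Tensor) -> torch.Tensor:
        # planes: (N, 3, C, S, S) tri-plane features from N input images.
        n, p, c, s, _ = planes.shape
        q = self.to_q(self.query_tri_plane)                               # (3, C, S, S)
        k = self.to_k(planes.reshape(n * p, c, s, s)).reshape(n, p, c, s, s)
        logits = (q.unsqueeze(0) * k).sum(dim=2)                          # per-pixel similarity (N, 3, S, S)
        attn = F.softmax(logits, dim=0).unsqueeze(2)                      # A_i, normalized over images
        return (attn * planes).sum(dim=0)                                 # fused tri-plane P: (3, C, S, S)

fused = TriPlaneAttentionFusion()(torch.randn(2, 3, 32, 256, 256))
```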
Step S200, acquiring point cloud data corresponding to the input image, and constructing a dynamic expression scene based on the point cloud data, wherein the dynamic expression scene is used for reflecting dynamic expression characteristics corresponding to the input image.
Further, the terminal device generates point cloud data based on the input image and then constructs a dynamic expression scene based on the point cloud data. In this embodiment, the point cloud data reflect the shape parameters, expression parameters and posture parameters corresponding to the input image, where the shape parameters reflect the facial contour corresponding to the input image, the expression parameters reflect the facial expression corresponding to the input image, and the posture parameters reflect the head posture (such as a head-tilting posture) corresponding to the input image, so the dynamic expression scene constructed based on the point cloud data is used to reflect the dynamic expression characteristics corresponding to the input image.
In one implementation, the method includes the following steps when constructing the dynamic expression scene:
step 201, acquiring a shape parameter, an expression parameter and a posture parameter corresponding to the input image, and generating point cloud data based on the shape parameter, the expression parameter and the posture parameter;
step S202, obtaining position features corresponding to the point cloud data, and constructing the dynamic table scene based on the point cloud data and the position features.
In this embodiment, the shape parameters, expression parameters and posture parameters corresponding to the input image are first obtained, and the point cloud data are then generated based on the shape parameters, the expression parameters and the posture parameters of the input image. Next, this embodiment may further determine the point cloud vertices in the point cloud data. After the position features corresponding to the point cloud data are acquired, when constructing the dynamic expression scene, this embodiment samples the 8 points closest to each point cloud vertex, combines the position features of these 8 points to obtain the dynamic expression features, and then obtains the dynamic expression scene based on the dynamic expression features.
Specifically, this embodiment obtains a plurality of neighboring points corresponding to the point cloud vertex; for example, the neighboring points are the 8 points closest to the point cloud vertex. This embodiment then further obtains the first position features of the neighboring points, and obtains the sampling points used to synthesize the rays during volume rendering together with the position features of those sampling points. In a specific application, the first position features of the neighboring points may be the position coordinates of these neighboring points. Then, based on a preset position encoding function, the relative position between the first position feature of a neighboring point and the position feature of a sampling point is encoded to obtain the encoded position feature. Finally, linear regression is performed on the second position feature and the encoded position feature by a linear regression layer to obtain the dynamic expression feature. In a specific application, the second position feature of this embodiment may be a preset position weight; for example, the point cloud vertices comprise 5023 points, each with a 32-dimensional position weight, so the feature dimension of the point cloud vertices is 5023×32. The linear regression layer performs linear regression on the preset position weights and the encoded position features to obtain the dynamic expression features, and the dynamic expression scene is constructed based on the dynamic expression features. The dynamic expression feature of this embodiment is expressed as:
f(x) = \mathrm{Linear}\big( \{\, w_k,\ \gamma(x - p_k) \,\}_{k=1}^{K} \big)

where x is the position feature of a sampling point used to synthesize a ray during volume rendering, K is the number of neighboring points, w_k is the position weight, p_k is the position feature of the k-th neighboring point, Linear is the linear regression layer used, γ is the position encoding function, and {·} denotes the combination of the features over the K neighboring points.
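A minimal sketch of computing this dynamic expression feature is given below. It assumes a Point-NeRF-style reading in which the K = 8 neighbors are the point-cloud vertices nearest to each ray sample, and it assumes a standard sin/cos positional encoding; both are interpretations of the text above rather than confirmed details of this embodiment. The vertex count (5023), weight dimension (32) and K = 8 follow the description.

```python
# Hedged sketch of the dynamic expression feature f(x): gather K nearest point-cloud
# vertices per ray sample, encode relative offsets gamma(x - p_k), concatenate the
# vertices' learnable 32-d position weights w_k, and regress with a linear layer.
import torch
import torch.nn as nn

def positional_encoding(x: torch.Tensor, num_freqs: int = 6) -> torch.Tensor:
    # gamma(x): sin/cos frequency encoding of a 3-d offset (frequency count is an assumption).
    freqs = 2.0 ** torch.arange(num_freqs, device=x.device) * torch.pi
    angles = x.unsqueeze(-1) * freqs                                  # (..., 3, F)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1).flatten(-2)

class DynamicExpressionField(nn.Module):
    def __init__(self, num_vertices: int = 5023, weight_dim: int = 32,
                 k: int = 8, num_freqs: int = 6, out_dim: int = 32):
        super().__init__()
        self.k = k
        self.position_weights = nn.Parameter(torch.randn(num_vertices, weight_dim))  # w (5023 x 32)
        in_dim = k * (weight_dim + 3 * 2 * num_freqs)
        self.linear = nn.Linear(in_dim, out_dim)                      # the linear regression layer

    def forward(self, samples: torch.Tensor, vertices: torch.Tensor) -> torch.Tensor:
        # samples: (N, 3) ray sample positions x; vertices: (V, 3) point cloud.
        # Assumption: "neighboring points" are the k vertices nearest to each sample.
        dists = torch.cdist(samples, vertices)                        # (N, V)
        idx = dists.topk(self.k, largest=False).indices               # indices of the 8 nearest vertices
        rel = samples.unsqueeze(1) - vertices[idx]                    # (N, K, 3) relative positions
        enc = positional_encoding(rel)                                # (N, K, 36)
        w = self.position_weights[idx]                                # (N, K, 32)
        feat = torch.cat([w, enc], dim=-1).flatten(1)                 # (N, K*(32+36))
        return self.linear(feat)                                      # dynamic expression feature f(x)

f = DynamicExpressionField()(torch.randn(1024, 3), torch.randn(5023, 3))
```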
And step S300, acquiring camera attitude parameters, and performing volume rendering based on the standard feature space, the dynamic expression features and the camera attitude parameters to obtain a replay image so as to realize digital reconstruction of the three-dimensional portrait.
After the terminal equipment obtains the standard feature space, the dynamic expression features and the camera attitude parameters, volume rendering can be performed to obtain a replay image, and the replay image is the reconstructed digital avatar, so that the digital reconstruction of the three-dimensional figure is realized.
In one implementation manner, step S300 of the present embodiment specifically includes the following steps:
step 301, sampling a plurality of light beams based on the camera attitude parameters to obtain RGB data and density data of each sampling point, and synthesizing RGB colors of the light beams based on the RGB data and the density data;
step S302, performing volume rendering on the canonical feature space, the dynamic expression feature and the RGB colors based on a preset volume rendering function, so as to obtain the replay image.
Specifically, in this embodiment the camera intrinsic and extrinsic parameters are first acquired; these parameters can be used to determine the field of view and viewing angle of the camera, so the camera pose parameters can be obtained. Then, multiple rays are sampled based on the camera pose parameters to obtain the RGB data and density data of each sampling point. When sampling, this embodiment may perform two rounds of hierarchical sampling along each ray: the first round locates the high-density regions, and the second round samples those high-density regions more finely, so that the RGB data and density data of each sampling point are obtained, and the final RGB color of the ray is synthesized from the obtained RGB data and density data. Then, this embodiment performs volume rendering on the canonical feature space, the dynamic expression features and the RGB colors based on a preset volume rendering function to obtain the replay image, where the expression is as follows:
\hat{I}_{lr} = R\big( E,\ D(\psi, w),\ c \big)

where \hat{I}_{lr} is the replay image, R is the volume rendering function, c is the camera pose parameter, E is the canonical feature space, D is the dynamic expression scene, ψ is the parameter estimator used to generate the point cloud data, and w is the position weight of the point cloud.
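Below is a minimal sketch of the ray sampling and compositing described above. The radiance_field callable, which would combine the canonical tri-plane feature and the dynamic expression feature into per-sample RGB and density, is a placeholder, and the second, finer sampling pass is only indicated in a comment.

```python
# Hedged sketch of steps S301-S302: sample points along camera rays (coarse pass,
# with an optional finer pass around high-density regions), query RGB and density,
# and alpha-composite into the final ray colors.
import torch

def composite_rays(rgb: torch.Tensor, sigma: torch.Tensor, deltas: torch.Tensor) -> torch.Tensor:
    # rgb: (R, S, 3), sigma: (R, S), deltas: (R, S) spacing between samples.
    alpha = 1.0 - torch.exp(-sigma * deltas)                           # opacity per sample
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1), dim=1)[:, :-1]
    weights = alpha * trans                                            # contribution of each sample
    return (weights.unsqueeze(-1) * rgb).sum(dim=1)                    # final ray colors (R, 3)

def render_rays(radiance_field, rays_o, rays_d, near=0.1, far=2.0, n_coarse=64):
    # Coarse stratified sampling along each ray; a second, finer pass guided by the
    # coarse densities would be added in the same way (omitted for brevity).
    t = torch.linspace(near, far, n_coarse, device=rays_o.device).expand(rays_o.shape[0], -1)
    pts = rays_o.unsqueeze(1) + rays_d.unsqueeze(1) * t.unsqueeze(-1)  # (R, S, 3)
    rgb, sigma = radiance_field(pts)                                   # query color and density
    deltas = torch.cat([t[:, 1:] - t[:, :-1], torch.full_like(t[:, :1], 1e10)], dim=1)
    return composite_rays(rgb, sigma, deltas)

rays_o, rays_d = torch.zeros(512, 3), torch.randn(512, 3)
dummy_field = lambda p: (torch.sigmoid(p), torch.relu(p[..., 0]))      # placeholder (rgb, sigma)
colors = render_rays(dummy_field, rays_o, rays_d)                      # (512, 3)
```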
Further, the replay image of this embodiment is a low-resolution image. In order to improve the accuracy of the three-dimensional portrait digital reconstruction, this embodiment may use a lightweight super-resolution module (GFPGAN) to process the low-resolution replay image and obtain a final high-resolution image, whose resolution is higher than that of the replay image. For example, the resolution of the replay image is 128×128, and the resolution of the final image obtained after processing by the lightweight super-resolution module is 512×512.
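A possible usage sketch of the public GFPGAN package for this super-resolution step is shown below. The GFPGANer constructor and enhance signature follow that repository and may differ across versions, and the model file path is a placeholder, so this should be read as an assumption rather than the exact module configuration of this embodiment.

```python
# Hedged sketch: upsample the 128x128 replay image to a 512x512 final image with GFPGAN.
import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(model_path="GFPGANv1.3.pth", upscale=4,            # 128x128 -> 512x512
                    arch="clean", channel_multiplier=2, bg_upsampler=None)

replay_bgr = cv2.imread("replay_128.png")                              # low-resolution replay image
_, _, final_bgr = restorer.enhance(replay_bgr, has_aligned=False,
                                   only_center_face=False, paste_back=True)
cv2.imwrite("final_512.png", final_bgr)                                # high-resolution final image
```

A face-restoration network is used in this sketch because it upsamples while preserving facial detail; in principle any lightweight super-resolution module could be substituted.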
In one embodiment, the obtained final image is also evaluated. A target image may be sampled in advance from the same video data; as shown in fig. 2, the target image may form an image pair with the input image, and the target image and the input image may be sampled from different expressions and poses of the same person. This embodiment may evaluate the final image based on a perceptual loss technique in order to bring the final image closer to the target image. Specifically, this embodiment may acquire a target image, the target image and the input image being from the same video data. Then, a perceptual loss weight is obtained, and a perceptual loss distance between the final image and the target image is determined based on the target image, the replay image, the final image and the perceptual loss weight, the perceptual loss distance being used to reflect the difference between the final image and the target image. Finally, the final image is optimized based on the perceptual loss distance. The specific expression of the perceptual loss distance is as follows:
L_{p} = \phi(I_{hr},\ I_{t}) + \lambda_{p}\,\phi(I_{lr},\ I_{t})

where I_t is the target image, I_lr is the low-resolution replay image, I_hr is the high-resolution final image, φ is the AlexNet-based perceptual loss, and λ_p is the perceptual loss weight.
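A minimal sketch of this perceptual loss using the lpips package with an AlexNet backbone is given below; downsampling the target to compare with the replay image and the specific weight value are assumptions, not details confirmed by the text.

```python
# Hedged sketch of the AlexNet-based perceptual loss on both the low-resolution replay
# image and the high-resolution final image against the target frame.
import torch
import torch.nn.functional as F
import lpips

perceptual = lpips.LPIPS(net="alex")          # AlexNet-based perceptual distance
lambda_p = 0.1                                 # perceptual loss weight (assumed value)

def perceptual_loss(final_hr, replay_lr, target_hr):
    # Images are (B, 3, H, W) in [-1, 1]; the target is downsampled for the replay term.
    target_lr = F.interpolate(target_hr, size=replay_lr.shape[-2:],
                              mode="bilinear", align_corners=False)
    return perceptual(final_hr, target_hr).mean() + lambda_p * perceptual(replay_lr, target_lr).mean()
```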
In addition, the present embodiment may also add density-based norm loss when evaluating the final image, as follows:
L_{d} = \lVert \sigma \rVert_{1}

where σ is the density used in volume rendering. This norm loss encourages the total density of the neural rendering to be as low as possible; it can be used to constrain the volume rendering process in the above steps, so that the final image adheres closely to the actual 3D shape during three-dimensional portrait digital reconstruction and artifacts are avoided. The objective function of this embodiment is:
L = \lambda_{1} L_{p} + \lambda_{2} L_{d}

where λ_1 and λ_2 are the weights that balance the losses. The perceptual loss distance between the final image of this embodiment and the target image is evaluated under the above objective function, and the final image comes closest to the target image when training also keeps the density of the whole feature space low.
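A minimal sketch of the combined objective is shown below. It takes a precomputed perceptual term, treats the norm loss as an L1 penalty on the rendered densities, and the balance weight values are arbitrary assumptions.

```python
# Hedged sketch of the overall training objective: perceptual term plus density norm loss.
import torch

def total_loss(perceptual_term: torch.Tensor, sigma: torch.Tensor,
               lambda_1: float = 1.0, lambda_2: float = 1e-4) -> torch.Tensor:
    density_loss = sigma.abs().mean()                      # ||sigma||_1 norm loss on densities
    return lambda_1 * perceptual_term + lambda_2 * density_loss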
In summary, this embodiment does not need to train a model for each digital avatar; a single inference pass over the input image is sufficient to obtain the reconstructed digital avatar, and it also offers certain advantages in image rendering. Experiments show that this embodiment has an advantage in expression driving accuracy over other contemporaneous methods: on the test set of the public dataset VFHQ, the expression-synchronization metrics LMD (landmark distance) and AED (average expression distance) are significantly lower than those of other methods, and the image quality is more realistic, with better PSNR (peak signal-to-noise ratio), SSIM (structural similarity) and LPIPS (perceptual similarity) scores than the other methods. Lower LMD and AED values indicate that the expression of the driven digital avatar is closer to the desired expression and that the generated video quality is better. In addition, this embodiment can combine multiple input images to further improve the reconstruction quality; results on the VFHQ test set show that more input images improve the expression-synchronization metrics LMD and AED and yield more realistic image quality.
Based on the above embodiments, the present invention further provides a three-dimensional portrait digital reconstruction apparatus, as shown in fig. 3, including: a feature space construction module 10, an expression feature analysis module 20 and an image reconstruction module 30. Specifically, the feature space construction module 10 is configured to acquire an input image and construct a canonical feature space based on the input image, where the canonical feature space is a three-plane canonical feature space used to reflect the three-dimensional image features of the input image. The expression feature analysis module 20 is configured to acquire point cloud data corresponding to the input image and construct a dynamic expression scene based on the point cloud data, where the dynamic expression scene is configured to reflect the dynamic expression features corresponding to the input image. The image reconstruction module 30 is configured to acquire camera pose parameters and perform volume rendering based on the canonical feature space, the dynamic expression features and the camera pose parameters to obtain a replay image, so as to implement digital reconstruction of the three-dimensional portrait.
In one implementation, the feature space construction module 10 includes:
the feature mapping unit is used for mapping the image features of the input image into three feature planes respectively based on a preset encoder;
and the space construction unit is used for constructing the standard feature space based on the three feature planes.
In one implementation, the feature space construction module 10 further includes:
and the space fusion unit is used for fusing the standard feature space corresponding to each input image based on a preset attention module if the input images are multiple, so as to obtain the fused standard feature space.
In one implementation, the expression feature analysis module 20 includes:
the point cloud generating unit is used for acquiring shape parameters, expression parameters and posture parameters corresponding to the input image and generating point cloud data based on the shape parameters, the expression parameters and the posture parameters;
and the expression analysis unit is used for acquiring the position characteristics corresponding to the point cloud data and constructing the dynamic expression scene based on the point cloud data and the position characteristics.
In one implementation, the expression analysis unit includes:
a neighboring point obtaining unit, configured to obtain a point cloud vertex in the point cloud data, and determine a plurality of neighboring points corresponding to the point cloud vertex;
the position acquisition unit is used for acquiring a first position characteristic of the adjacent point and acquiring a sampling point used for synthesizing light in the volume rendering process and the position characteristic of the sampling point;
the position coding unit is used for coding the relative position between the first position feature of the adjacent point and the position feature of the sampling point based on a preset position coding function to obtain a coded position feature;
and the feature combination unit is used for carrying out linear regression on the second position feature and the coding position feature based on a linear regression layer to obtain the dynamic expression feature, and constructing the dynamic expression scene based on the dynamic expression feature.
In one implementation, the image reconstruction module 30 includes:
the light sampling unit is used for sampling a plurality of beams of light based on the camera attitude parameters to obtain RGB data and density data of each sampling point, and synthesizing RGB colors of the light based on the RGB data and the density data;
and the volume rendering unit is used for performing volume rendering on the standard feature space, the dynamic expression features and the RGB colors based on a preset volume rendering function to obtain the replay image.
In one implementation, the apparatus further comprises:
and the image processing module is used for processing the replay image by utilizing the lightweight super-resolution module to obtain a final image, and the resolution of the final image is higher than that of the replay image.
In one implementation, the apparatus further comprises:
a target image acquisition unit configured to acquire a target image, the target image and the input image being from the same video data;
a perceived-loss-distance determination unit configured to acquire a perceived-loss weight, and determine a perceived-loss distance between the final image and the target image based on the target image, the replay image, the final image, and the perceived-loss weight, the perceived-loss distance being configured to reflect a difference between the final image and the target image;
and the final image optimization unit is used for optimizing the final image based on the perceived loss distance.
The working principle of each module in the three-dimensional portrait digital reconstruction apparatus in this embodiment is the same as that of each step in the above method embodiment, and will not be described here again.
Based on the above embodiment, the present invention also provides a terminal device, and a schematic block diagram of the terminal device may be shown in fig. 4. The terminal device may include one or more processors 100 (only one shown in fig. 4), a memory 101, and a computer program 102, e.g., a three-dimensional portrait digitized reconstruction program, stored in the memory 101 and executable on the one or more processors 100. The execution of the computer program 102 by one or more processors 100 may implement the various steps of an embodiment of a method for digitally reconstructing a three-dimensional figure. Alternatively, the one or more processors 100, when executing the computer program 102, may implement the functions of the modules/units in the embodiment of the three-dimensional image reconstruction apparatus, which is not limited herein.
In one embodiment, the processor 100 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
In one embodiment, the memory 101 may be an internal storage unit of the electronic device, such as a hard disk or a memory of the electronic device. The memory 101 may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card (flash card) or the like, which are provided on the electronic device. Further, the memory 101 may also include both an internal storage unit and an external storage device of the electronic device. The memory 101 is used to store computer programs and other programs and data required by the terminal device. The memory 101 may also be used to temporarily store data that has been output or is to be output.
It will be appreciated by persons skilled in the art that the functional block diagram shown in fig. 4 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the terminal device to which the present inventive arrangements are applied, and that a particular terminal device may include more or fewer components than shown, or may combine some of the components, or may have a different arrangement of components.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program, which may be stored on a non-transitory computer readable storage medium and which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory may include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM), among others.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for digitally reconstructing a three-dimensional figure, the method comprising:
acquiring an input image, and constructing a standard feature space based on the input image, wherein the standard feature space is a three-plane standard feature space and is used for reflecting three-dimensional image features of the input image;
acquiring point cloud data corresponding to the input image, and constructing a dynamic expression scene based on the point cloud data, wherein the dynamic expression scene is used for reflecting dynamic expression characteristics corresponding to the input image;
acquiring camera attitude parameters, and performing volume rendering based on the standard feature space, the dynamic expression features and the camera attitude parameters to obtain a replay image so as to realize digital reconstruction of the three-dimensional human image;
the obtaining the point cloud data corresponding to the input image, and constructing a dynamic expression scene based on the point cloud data includes:
acquiring shape parameters, expression parameters and posture parameters corresponding to the input image, and generating point cloud data based on the shape parameters, the expression parameters and the posture parameters;
acquiring position features corresponding to the point cloud data, acquiring point cloud vertexes in the point cloud data, and determining a plurality of adjacent points corresponding to the point cloud vertexes;
acquiring a first position characteristic of the adjacent point, and acquiring a sampling point for synthesizing light in the volume rendering process and the position characteristic of the sampling point;
encoding the relative position between the first position feature of the adjacent point and the position feature of the sampling point based on a preset position encoding function to obtain an encoded position feature;
and carrying out linear regression on the second position feature and the coding position feature based on a linear regression layer to obtain the dynamic expression feature, and constructing the dynamic expression scene based on the dynamic expression feature.
2. The method of claim 1, wherein constructing a canonical feature space based on the input image comprises:
based on a preset encoder, mapping image features of the input image into three feature planes respectively;
and constructing the standard feature space based on the three feature planes.
3. The method for digitally reconstructing a three-dimensional portrait of claim 2, wherein said constructing a canonical feature space based on said input image further includes:
if the input images are multiple, fusing the standard feature spaces corresponding to each input image based on a preset attention module to obtain fused standard feature spaces.
4. The method according to claim 1, wherein the performing volume rendering based on the canonical feature space, the dynamic expression feature, and the camera pose parameter to obtain a replay image to implement three-dimensional image digital reconstruction includes:
sampling a plurality of beams of light based on the camera attitude parameters to obtain RGB data and density data of each sampling point, and synthesizing RGB colors of the light based on the RGB data and the density data;
and performing volume rendering on the standard feature space, the dynamic expression features and the RGB colors based on a preset volume rendering function to obtain the replay image.
5. The method of digitized reconstruction of a three-dimensional figure of claim 4, further comprising:
and processing the replay image by using a lightweight super-resolution module to obtain a final image, wherein the resolution of the final image is higher than that of the replay image.
6. The method of digitized reconstruction of a three-dimensional figure of claim 5, further comprising:
acquiring a target image, wherein the target image and the input image come from the same video data;
obtaining a perceived loss weight, and determining a perceived loss distance between the final image and the target image based on the target image, the replay image, the final image, and the perceived loss weight, the perceived loss distance being used to reflect a difference between the final image and the target image;
and optimizing the final image based on the perceived loss distance.
7. A three-dimensional portrait digitized reconstruction apparatus, the apparatus comprising:
the characteristic space construction module is used for acquiring an input image and constructing a standard characteristic space based on the input image, wherein the standard characteristic space is a three-plane standard characteristic space and is used for reflecting the three-dimensional image characteristics of the input image;
the expression characteristic analysis module is used for acquiring point cloud data corresponding to the input image, and constructing a dynamic expression scene based on the point cloud data, wherein the dynamic expression scene is used for reflecting dynamic expression characteristics corresponding to the input image;
the image reconstruction module is used for acquiring camera attitude parameters, and performing volume rendering based on the standard feature space, the dynamic expression features and the camera attitude parameters to obtain a replay image so as to realize digital reconstruction of the three-dimensional portrait;
the expression characteristic analysis module comprises:
the point cloud generating unit is used for acquiring shape parameters, expression parameters and posture parameters corresponding to the input image and generating point cloud data based on the shape parameters, the expression parameters and the posture parameters;
the expression analysis unit is used for acquiring the position characteristics corresponding to the point cloud data and constructing the dynamic expression scene based on the point cloud data and the position characteristics;
the expression analysis unit includes:
a neighboring point obtaining unit, configured to obtain a point cloud vertex in the point cloud data, and determine a plurality of neighboring points corresponding to the point cloud vertex;
the position acquisition unit is used for acquiring a first position characteristic of the adjacent point and acquiring a sampling point used for synthesizing light in the volume rendering process and the position characteristic of the sampling point;
the position coding unit is used for coding the relative position between the first position feature of the adjacent point and the position feature of the sampling point based on a preset position coding function to obtain a coded position feature;
and the feature combination unit is used for carrying out linear regression on the second position feature and the coding position feature based on a linear regression layer to obtain the dynamic expression feature, and constructing the dynamic expression scene based on the dynamic expression feature.
8. A terminal device, characterized in that it comprises a memory, a processor and a three-dimensional figure digitized reconstruction program stored in the memory and executable on the processor, the processor implementing the steps of the three-dimensional figure digitized reconstruction method according to any one of claims 1-6 when executing the three-dimensional figure digitized reconstruction program.
9. A computer readable storage medium, wherein a three-dimensional figure digitized reconstruction program is stored on the computer readable storage medium, which when executed by a processor, implements the steps of the three-dimensional figure digitized reconstruction method of any one of claims 1-6.
CN202311747116.7A 2023-12-19 2023-12-19 Three-dimensional figure digital reconstruction method, device, terminal equipment and storage medium Active CN117422802B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311747116.7A CN117422802B (en) 2023-12-19 2023-12-19 Three-dimensional figure digital reconstruction method, device, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311747116.7A CN117422802B (en) 2023-12-19 2023-12-19 Three-dimensional figure digital reconstruction method, device, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117422802A CN117422802A (en) 2024-01-19
CN117422802B true CN117422802B (en) 2024-04-12

Family

ID=89532944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311747116.7A Active CN117422802B (en) 2023-12-19 2023-12-19 Three-dimensional figure digital reconstruction method, device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117422802B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117745956A (en) * 2024-02-20 2024-03-22 之江实验室 Pose guidance-based image generation method, device, medium and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023050992A1 (en) * 2021-09-30 2023-04-06 广州视源电子科技股份有限公司 Network training method and apparatus for facial reconstruction, and device and storage medium
CN116228979A (en) * 2023-02-24 2023-06-06 上海大学 Voice-driven editable face replay method, device and storage medium
CN116310076A (en) * 2022-12-29 2023-06-23 深圳万兴软件有限公司 Three-dimensional reconstruction method, device, equipment and storage medium based on nerve radiation field
CN116977546A (en) * 2023-03-31 2023-10-31 广东花至美容科技有限公司 Reconstruction method and device of face three-dimensional model and wearable device
CN116993948A (en) * 2023-09-26 2023-11-03 粤港澳大湾区数字经济研究院(福田) Face three-dimensional reconstruction method, system and intelligent terminal

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023050992A1 (en) * 2021-09-30 2023-04-06 广州视源电子科技股份有限公司 Network training method and apparatus for facial reconstruction, and device and storage medium
CN116310076A (en) * 2022-12-29 2023-06-23 深圳万兴软件有限公司 Three-dimensional reconstruction method, device, equipment and storage medium based on nerve radiation field
CN116228979A (en) * 2023-02-24 2023-06-06 上海大学 Voice-driven editable face replay method, device and storage medium
CN116977546A (en) * 2023-03-31 2023-10-31 广东花至美容科技有限公司 Reconstruction method and device of face three-dimensional model and wearable device
CN116993948A (en) * 2023-09-26 2023-11-03 粤港澳大湾区数字经济研究院(福田) Face three-dimensional reconstruction method, system and intelligent terminal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Point-NeRF: Point-based Neural Radiance Fields; Qiangeng Xu et al.; 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2022-09-27; pp. 5428-5438 *
Towards Real-World Blind Face Restoration with Generative Facial Prior; Xintao Wang et al.; 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021-12-02; pp. 9164-9174 *

Also Published As

Publication number Publication date
CN117422802A (en) 2024-01-19

Similar Documents

Publication Publication Date Title
CN111047548B (en) Attitude transformation data processing method and device, computer equipment and storage medium
CN110675489B (en) Image processing method, device, electronic equipment and storage medium
JP4679033B2 (en) System and method for median fusion of depth maps
JP4947593B2 (en) Apparatus and program for generating free viewpoint image by local region segmentation
CN117422802B (en) Three-dimensional figure digital reconstruction method, device, terminal equipment and storage medium
CN109844818B (en) Method for building deformable 3d model of element and related relation
CN109584327B (en) Face aging simulation method, device and equipment
CN109978984A (en) Face three-dimensional rebuilding method and terminal device
US20200357128A1 (en) Image reconstruction for virtual 3d
CN116109798B (en) Image data processing method, device, equipment and medium
US20030234784A1 (en) Accelerated visualization of surface light fields
CN116310076A (en) Three-dimensional reconstruction method, device, equipment and storage medium based on nerve radiation field
CN113762147B (en) Facial expression migration method and device, electronic equipment and storage medium
WO2021226862A1 (en) Neural opacity point cloud
CN116958362A (en) Image rendering method, device, equipment and storage medium
CN116778063A (en) Rapid virtual viewpoint synthesis method and device based on characteristic texture grid and hash coding
US20220222842A1 (en) Image reconstruction for virtual 3d
CN114092611A (en) Virtual expression driving method and device, electronic equipment and storage medium
CN116912148B (en) Image enhancement method, device, computer equipment and computer readable storage medium
CN117333637B (en) Modeling and rendering method, device and equipment for three-dimensional scene
KR102422822B1 (en) Apparatus and method for synthesizing 3d face image using competitive learning
JP4229398B2 (en) Three-dimensional modeling program, three-dimensional modeling control program, three-dimensional modeling data transmission program, recording medium, and three-dimensional modeling method
CN117036581A (en) Volume rendering method, system, equipment and medium based on two-dimensional nerve rendering
CN112184912A (en) Multi-metric three-dimensional face reconstruction method based on parameterized model and position map
CN115147577A (en) VR scene generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant