CN117422829A - Face image synthesis optimization method based on neural radiance field - Google Patents

Face image synthesis optimization method based on neural radiance field

Info

Publication number
CN117422829A
Authority
CN
China
Prior art keywords
face image
dimensional
radiance field
parameters
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311379379.7A
Other languages
Chinese (zh)
Inventor
魏明强
赵安
朱哲
郭延文
王伟明
谢浩然
王富利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Research Institute Of Nanjing University Of Aeronautics And Astronautics
Nanjing University of Aeronautics and Astronautics
Original Assignee
Shenzhen Research Institute Of Nanjing University Of Aeronautics And Astronautics
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Research Institute Of Nanjing University Of Aeronautics And Astronautics, Nanjing University of Aeronautics and Astronautics filed Critical Shenzhen Research Institute Of Nanjing University Of Aeronautics And Astronautics
Priority to CN202311379379.7A priority Critical patent/CN117422829A/en
Publication of CN117422829A publication Critical patent/CN117422829A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/094Adversarial learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/005General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face image synthesis optimization method based on a neural radiance field. The method comprises: acquiring a monocular RGB face video, performing frame extraction and two-dimensional face image fitting, and performing real-time face tracking and analysis on the two-dimensional face images to obtain camera intrinsic parameters, pose parameters and expression parameters; performing ray sampling on the two-dimensional face images using the camera intrinsic and pose parameters to obtain the corresponding three-dimensional coordinates, and applying positional encoding to the obtained three-dimensional coordinates; feeding the encoded positions, expression parameters and latent codes into an MLP-based neural network to train the neural radiance field, obtaining RGB color values and densities; synthesizing the RGB color values and densities into a two-dimensional face image at a new view angle by volume rendering; and optimizing the two-dimensional face image synthesized in step S4 using a generative adversarial network. The neural network training time is short, the generated face images contain fewer artifacts and richer details, and the method therefore has broad application prospects.

Description

Face image synthesis optimization method based on neural radiance field
Technical Field
The invention belongs to the technical field of three-dimensional reconstruction and novel view synthesis of images, and particularly relates to a face image synthesis optimization method based on a neural radiance field.
Background
A virtual digital human is a human-body simulation technology based on computer technology that can simulate and reproduce human appearance, actions, behaviors and the like. Technically, virtual digital humans are highly comprehensive and interdisciplinary, spanning three-dimensional vision, computer graphics and natural language processing, as well as bionics, behavioral psychology, behavioral logic and other fields. Thanks to the continued development of digital human technology, industrial applications centered on virtual digital humans have also begun to emerge; for example, in games and virtual reality, virtual digital humans can achieve highly realistic character simulation and motion capture, improving immersion and realism.
As an important component of virtual digital human technology, three-dimensional face reconstruction has long been a hot research direction in computer vision, computer graphics and three-dimensional reconstruction. Because the distribution of facial features is largely similar across faces, a face can be mapped to a constructed low-dimensional parameterized face model, enabling an efficient digital representation of the face.
Most existing research is based on face reconstruction with generative adversarial networks: by constructing an effective network structure and constraining the generated results to match the data distribution of a pre-collected dataset, such methods can bypass traditional explicit three-dimensional modeling and directly render photo-level, high-resolution, high-quality face images. However, when the camera pose parameters change, the reconstructed face image has difficulty maintaining view consistency. In recent years, face reconstruction based on implicit neural functions has begun to rise; taking HeadNeRF as an example, such methods implicitly express facial expression, identity and other information by means of a 3DMM low-dimensional face model. However, for images that deviate from the training data, the related fitting tasks can only return results close to the training data, so the fitting cannot be performed accurately; and since the training data rarely includes images with headwear, it is difficult to render headwear-related content in the fitting results.
Disclosure of Invention
The invention aims to solve the technical problem of providing a face image synthesis optimization method based on a neural radiance field, addressing the defects of the prior art.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
a face image synthesis optimization method based on a nerve radiation field comprises the following steps:
s1, acquiring monocular RGB face video, carrying out frame extraction and two-dimensional face image fitting, and carrying out real-time face tracking and analysis on the two-dimensional face image to obtain camera internal parameters, pose parameters and expression parameters;
s2, carrying out layered light sampling on the two-dimensional face image by adopting camera internal parameters and pose parameters to obtain corresponding three-dimensional coordinates, and carrying out position coding on the three-dimensional coordinates;
s3, inputting the coded position, expression parameters and latent codes into a neural network formed by the MLP together for training a nerve radiation field to obtain RGB color values and densities;
s4, synthesizing RGB color values and density into a two-dimensional face image of a two-dimensional new view angle by utilizing volume rendering;
and S5, optimizing the two-dimensional face image synthesized in the step S4 by adopting a generation countermeasure network.
In order to optimize the technical scheme, the specific measures adopted further comprise:
the above S1 uses a flap low dimensional face three dimensional model to perform two dimensional face image fitting to estimate head geometry, appearance and facial expression from a single Zhang Ren face image.
In S2 described above, ray sampling of the real 3D scene uses a hierarchical sampling strategy: dense sampling is performed near points that contribute strongly to the color and sparse sampling near points that contribute little, and before being input into the MLP networks the samples are allocated proportionally for the subsequent optimization of the two MLP networks, a coarse network and a fine network, thereby improving rendering efficiency.
The positional encoding equation adopted in S2 described above is:

γ(p) = (sin(2^0 πp), cos(2^0 πp), …, sin(2^{L−1} πp), cos(2^{L−1} πp))

where γ(·) is applied separately to each component (x, y, z) of the three-dimensional spatial coordinate X and to each of the three components of the unit view-direction vector d; L is the encoding dimension, and p is the argument of the γ(·) function, standing for a component of the three-dimensional coordinate X or of the view-direction vector d.
The MLP-based neural network described in S3 above is the main body of the neural radiance field. Its backbone consists of 8 fully connected layers with 256 neurons each, connected by ReLU activation functions; the backbone propagates the computed positional encodings and finally outputs the density σ. The 24-dimensional view-direction encoding v is then concatenated, and the 3-dimensional RGB color value is output after 4 further fully connected layers.
The loss function of the neural network in S3 is:

L = Σ_{r∈R} [ ||Ĉ_c(r) − C(r)||₂² + ||Ĉ_f(r) − C(r)||₂² ]

where R is the set of rays in each batch of the hierarchical sampling in S2, r is a ray in the set, and C(r), Ĉ_c(r) and Ĉ_f(r) are respectively the RGB colors of ray r in the real scene, from the coarse network, and from the fine network of S3.
S3 described above accelerates the training process using the NerfAcc technique, by skipping empty regions and occluded regions.
The volume rendering formula of S4 above is:

C(r) = ∫_{z_near}^{z_far} T(t) · σ_θ(r(t)) · RGB_θ(r(t), d) dt

where r(t) = o + t·d represents the ray, σ_θ the density parameters, RGB_θ the color parameters, d the view-direction vector, and z_far and z_near the far and near planes respectively; T(t) denotes the transmittance of the ray from t_n to t, i.e. the probability that the ray propagates without hitting any other particle, defined as:

T(t) = exp( −∫_{t_n}^{t} σ_θ(r(s)) ds )
and S5, optimizing the two-dimensional face image synthesized in the S4 by using a HiFaceGAN network, wherein the network adopts a front-end suppression module to suppress heterogeneous degradation, and encodes robust hierarchical semantic information so as to guide a subsequent replenishment module to reconstruct a correspondingly lifelike-detailed renovated face, and after semantic features are acquired from the front-end suppression module, guiding detail replenishment is performed by using encoding features.
The invention has the following beneficial effects:
the invention relates to a combined face tracking, nerve radiation field (NeRF), generation type countermeasure network (GAN) and the like, and only a group of monocular RGB face video sequences are required to be input, so that a new view angle face image which can be edited and has rich details can be reconstructed; the training of the nerve radiation field is accelerated by using a NerfAcc acceleration technology, so that the time required by training is greatly shortened, and the time required by training can be compressed on the premise of ensuring the accuracy of the result by skipping an empty region and stopping the ray in a shielding region in advance; generating a new view angle image of the head of the human face in a volume rendering mode; a user-friendly visual interface can be developed, and the expression and the pose of the generated face image can be explicitly edited by modifying corresponding parameters in the interface, so that the model has wider prospect in the face reconstruction; the generation countermeasure network module HiFaceGAN is used for optimizing the picture generated by volume rendering, so that rendering result optimization is realized, noise reduction, artifact removal and detail enhancement are performed on the reconstructed face picture, and the generated face image artifact is reduced and the detail is richer.
Drawings
Fig. 1 is a schematic diagram of the overall design of the present invention.
FIG. 2 is a schematic diagram of a portion of a neural network according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Although the steps of the present invention are arranged by reference numerals, the order of the steps is not limited, and the relative order of the steps may be adjusted unless the order of the steps is explicitly stated or the execution of a step requires other steps as a basis. It is to be understood that the term "and/or" as used herein relates to and encompasses any and all possible combinations of one or more of the associated listed items.
Referring to FIG. 1, the invention relates to a face image synthesis optimization method based on a neural radiance field, comprising the following steps:
S1, acquiring a monocular RGB face video, performing frame extraction and two-dimensional face image fitting, and performing real-time face tracking and analysis on the two-dimensional face images to obtain camera intrinsic parameters, pose parameters and expression parameters. Specifically, a monocular RGB face video sequence is acquired and frames are extracted to obtain usable frames; the FLAME low-dimensional three-dimensional face model is used for fitting, and face tracking is used to analyze the usable frames and obtain the relevant parameters.
S2, performing hierarchical ray sampling on the two-dimensional face images using the camera intrinsic and pose parameters to obtain the corresponding three-dimensional coordinates, and applying positional encoding to the three-dimensional coordinates. Specifically, the object position parameters obtained during ray sampling are encoded and the coordinates are mapped into a higher-dimensional space, so that the MLP can represent higher-frequency functions and the geometry and texture of the object surface become more lifelike.
S3, feeding the encoded positions, expression parameters and latent codes together into an MLP-based neural network to train the neural radiance field, obtaining RGB color values and densities.
In S2-S3, each extracted usable face frame image is sampled using the pose matrix and the camera intrinsic and extrinsic parameters obtained in S1, and a latent code is added at the same time to reduce, as far as possible, the deviations introduced during preliminary model fitting; the output of the MLP network training is the density and the RGB color values.
S4, synthesizing the RGB color values and densities into a two-dimensional face image at a new view angle by volume rendering; training is accelerated with the NerfAcc technique, and the new view is synthesized by volume rendering.
S5, optimizing the two-dimensional face image synthesized in step S4 using a generative adversarial network: the preliminarily rendered face image is fed into a HiFaceGAN network, and noise reduction, artifact removal and detail enhancement are performed on the reconstructed face image.
In an embodiment, the input to S1 is a set of monocular RGB face video sequences, for which the camera position must be kept fixed. After the raw data are preprocessed, the face is tracked and analyzed in real time with a face tracking technique, yielding the required information such as the camera intrinsic and extrinsic parameters, displacement matrices and expression parameters. This comprises the following steps:
Step S11: for a set of input monocular RGB face videos, frames are first extracted to obtain usable frames; a face tracking technique combined with the low-dimensional face model FLAME is then used to fit the two-dimensional face images, estimating head geometry and appearance from a single face image. The three-dimensional face is regarded as a linear combination of basis vectors for shape, texture, expression and so on, and each set of three-dimensional face data can be represented by a combination within the basis-vector spaces of the database, so solving the model of an arbitrary three-dimensional face is equivalent to solving the coefficients of each basis vector. The relevant parameters of the FLAME model include spherical-harmonic illumination parameters, texture and geometry parameters, facial expression parameters, camera intrinsic and extrinsic parameters and so on. The camera intrinsic and pose parameters are then used for ray sampling.
Step S12: for the first frame, the number of iterations is increased to obtain a more accurate initialization model; the remaining frames can be initialized from the previous estimate.
Step S13: after the data preprocessing is finished, a corresponding folder is generated for each two-dimensional face picture, containing the picture and the 2D position coordinates of 68 facial keypoints, to be used for fitting the low-dimensional deformable face model in the next stage.
By editing the configuration file, 5 pictures with significant differences in expression and pose are manually selected as key frames for model fitting (including texture and shape) in the face tracking process.
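As an illustration of the frame-extraction step in S1, the following is a minimal sketch using OpenCV; the video path, output directory and sampling stride are hypothetical placeholders, and the subsequent face tracking and FLAME fitting are not shown.

```python
import os
import cv2  # OpenCV, used here only for video decoding and image writing

def extract_frames(video_path: str, out_dir: str, stride: int = 5) -> int:
    """Extract every `stride`-th frame of a monocular RGB face video as PNG.

    Returns the number of frames written. Paths and stride are illustrative.
    """
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()  # frame is a BGR HxWx3 uint8 array
        if not ok:
            break
        if idx % stride == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.png"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# Example usage with hypothetical paths:
# n = extract_frames("face_video.mp4", "frames/", stride=5)
```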
In the embodiment, S2 performs ray sampling on the two-dimensional face images using the camera intrinsic and extrinsic parameters and the pose matrix from the face tracking result. The expression parameters and pose parameters are the editable part: the facial expression and pose can be edited explicitly by modifying the corresponding parameters.
(1) A coarse-to-fine hierarchical volume sampling technique is adopted: sampling is dense near points with a large color contribution and sparse near points with a small contribution, so as to reduce the number of samples required while still sampling the whole high-frequency scene representation adequately. The samples are allocated in proportion to their expected effect on the final rendering, for the subsequent optimization of the two MLP networks, a coarse network and a fine network, to improve rendering efficiency.
Hierarchical sampling first divides the interval [near, far] along the camera ray into N_c equal bins, then draws one sample uniformly within each bin, giving N_c sampling points in total:

t_i ~ U[ near + (i−1)(far − near)/N_c , near + i(far − near)/N_c ],  i = 1, …, N_c

where far = 1 and near = 0.
for the light sampling part, the number of randomly sampled light rays is 2048, the chunksize value in the training process is 2048, the chunksize value in the verification process is 65536, and the number of sampling points of each light ray on the coarse network and the fine network is 64.
(2) The three-dimensional coordinates obtained by ray sampling are positionally encoded: a high-frequency function maps them into a higher-dimensional space before they are passed to the network, so that data containing high-frequency variation can be fitted better and the pictures produced by learning are not overly blurred.
the position coding is to change the original representation function into a combination of two functions:
wherein F' Θ The function is an MLP requiring learning over the network, while the gamma function does not require learning, and is here only a mapping function, mapping from space R to high-dimensional spaceThe coding equation used is:
γ(p)=(sin(2 0 πp),cos(2 0 πp),…,sin(2 L-1 πp),cos(2 L-1 πp))
gamma (-) acts on each component (X, y, z) in the three-dimensional space coordinate X and the unit vector of the viewing angle direction respectivelyIs included in the three components of (a). Finally, the encoded result is normalized to the interval [ -1,1 by using the sinh function]Between them. p is a parameter of a gamma (·) function for representing the three-dimensional coordinate X and the view vector.
The choice of the dimension L, which is related to the complexity of the scene and the available hardware compute, also determines the highest frequency the neural network can learn. Here L = 10 for the spatial coordinate X and L = 4 for the view direction d.
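A minimal PyTorch sketch of this positional encoding follows (function names are illustrative; L = 10 for the spatial coordinates and L = 4 for the view direction, as above):

```python
import torch

def positional_encoding(p: torch.Tensor, L: int) -> torch.Tensor:
    """gamma(p): map each coordinate to (sin(2^k pi p), cos(2^k pi p)), k < L.

    p: (..., C) coordinates in [-1, 1]; returns (..., C * 2L).
    """
    freqs = 2.0 ** torch.arange(L) * torch.pi           # 2^0 pi ... 2^(L-1) pi
    angles = p[..., None] * freqs                       # (..., C, L)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)                    # (..., C * 2L)

# x: 3D points -> 60-dim encoding; d: view directions -> 24-dim encoding.
# x_enc = positional_encoding(x, L=10)   # (..., 60)
# d_enc = positional_encoding(d, L=4)    # (..., 24)
```

With L = 4 and three direction components, this yields exactly the 24-dimensional view-direction encoding mentioned below.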
in the embodiment, the S3 MLP network is used as a main part of the nerve radiation field, the backbone part of the S3 MLP network is composed of 8 fully-connected layers, each layer has 256 neurons, the layers are connected through a ReLu activation function, and the S3 MLP network is used for transmitting calculation position codes and finally outputting density sigma. Then 24-dimensional view direction code v is added Outputting 3-dimensional RGB color values after 4-layer full connection layers (each layer contains 128 neurons);
since the coordinate system is converted into the head specification space, the near plane is set to 0.2 and the far plane is set to 0.8. The number of position coding layers of the two networks is 10, and the number of coding layers of the view direction is 4.
The encoded position and view-direction parameters are fed into the neural radiance field; to reduce, as far as possible, the deviations that may arise during face tracking, 32-dimensional latent-code parameters are added to the input of the MLP.
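A minimal PyTorch sketch of an MLP with this structure follows. The layer sizes (8 × 256 for the backbone, 4 × 128 for the color branch), the 24-dimensional view encoding, and the 76-dimensional expression and 32-dimensional latent inputs follow the description above; the intermediate feature layer bridging the backbone and the color branch, the output heads, and all names are assumptions of the sketch rather than the patent's exact architecture:

```python
import torch
import torch.nn as nn

class FaceNeRFMLP(nn.Module):
    """Backbone: 8 FC layers of 256 units (ReLU) -> density sigma;
    view encoding is then concatenated and 4 FC layers of 128 units -> RGB."""

    def __init__(self, pos_dim=60, expr_dim=76, latent_dim=32, view_dim=24):
        super().__init__()
        in_dim = pos_dim + expr_dim + latent_dim        # 168 with these sizes
        layers, width = [], 256
        for i in range(8):
            layers += [nn.Linear(in_dim if i == 0 else width, width), nn.ReLU()]
        self.backbone = nn.Sequential(*layers)
        self.sigma_head = nn.Linear(width, 1)           # density output
        self.feat = nn.Linear(width, width)             # feature fed to color branch (assumed)
        color = [nn.Linear(width + view_dim, 128), nn.ReLU()]
        for _ in range(3):
            color += [nn.Linear(128, 128), nn.ReLU()]
        self.color_branch = nn.Sequential(*color)
        self.rgb_head = nn.Sequential(nn.Linear(128, 3), nn.Sigmoid())

    def forward(self, x_enc, expr, latent, v_enc):
        h = self.backbone(torch.cat([x_enc, expr, latent], dim=-1))
        sigma = torch.relu(self.sigma_head(h))          # non-negative density
        rgb = self.rgb_head(self.color_branch(torch.cat([self.feat(h), v_enc], dim=-1)))
        return rgb, sigma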
the optimizer adopts Adam and the initial learning rate lr origin For 0.0005, the learning rate update formula is as follows:
wherein lr is decay Set to 250 lr decay_factor Set to 0.1.
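In code, this schedule reduces to a one-liner (variable names mirror the formula; `step` is the global training step):

```python
def update_lr(step: int, lr_origin=5e-4, lr_decay=250, lr_decay_factor=0.1) -> float:
    """Exponential learning-rate decay following the formula above."""
    return lr_origin * lr_decay_factor ** (step / (lr_decay * 1000))

# e.g. update_lr(0) == 5e-4, and update_lr(250_000) == 5e-5
```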
The encoded positions are fed, together with the 76-dimensional expression parameters and the 32-dimensional latent codes, into the MLP-based neural network (shown in FIG. 2) to train the neural radiance field. The loss function of the MLP in the neural radiance field is:

L = Σ_{r∈R} [ ||Ĉ_c(r) − C(r)||₂² + ||Ĉ_f(r) − C(r)||₂² ]

where R is the set of rays in each batch of the hierarchical sampling in S2, r is a ray in the set, and C(r), Ĉ_c(r) and Ĉ_f(r) are respectively the RGB colors of ray r in the real scene (ground truth), from the coarse network, and from the fine network of S3.
the neural network is a neural radiation field backbone network, and outputs RGB color values and densities for subsequent image synthesis.
The NerfAcc technique is used to accelerate the training of the neural radiance field by skipping empty regions and occluded regions, as follows:
the method has a faster reading speed by using an occupied grid estimator (Occupancy Grid Estimator) to buffer the density in the scene using a binarized voxel grid, passing the light through the grid in a preset step size when sampling, and skipping the blank area by querying the voxel grid.
For a ray, if it strikes an occluding object, the point occluded by the object may be ignored, i.e., the color expected by the sampled ray is the color of the occluding object.
In NerfAcc, a threshold T is set, the density sigma of each point is calculated in the light projection process, so that the corresponding T value is calculated, if the T value of the point is smaller than the set threshold, the ray is hit on an shielding object at the moment, and the light projection process can be terminated.
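A minimal sketch of such transmittance-thresholded early termination follows; it uses plain PyTorch rather than the actual NerfAcc API, whose exact calls are not given here, and all names are illustrative:

```python
import torch

def march_with_early_stop(sigmas, deltas, t_threshold=1e-4):
    """Accumulate transmittance along one ray and stop once it falls below
    the threshold, mimicking NerfAcc-style early ray termination.

    sigmas, deltas: (n_samples,) densities and step sizes along the ray.
    Returns the index of the last sample actually used.
    """
    T = 1.0
    for i in range(sigmas.shape[0]):
        T *= torch.exp(-sigmas[i] * deltas[i]).item()   # update T(t)
        if T < t_threshold:                             # ray is effectively occluded
            return i
    return sigmas.shape[0] - 1
```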
In the embodiment, S4 synthesizes the densities and RGB colors output by the network into a two-dimensional face image at a new view angle by volume rendering. The principle of volume rendering is to sample a three-dimensional dataset, perform ray tracing at each sampling point, and finally generate a two-dimensional image. In this process the value of each sampling point represents a color and a transparency; since each pixel of the two-dimensional image can be regarded as the accumulated superposition of all points along one ray emitted from the camera, the color of each pixel is obtained by integrating the colors weighted by the density, thereby rendering the two-dimensional image at that view angle.
The sampling results are accumulated. Assume the distance from the camera to the near plane is t_n and to the far plane t_f; for the sampled ray r(t) = o + t·d, the final expected color is computed as:

C(r) = ∫_{z_near}^{z_far} T(t) · σ_θ(r(t)) · RGB_θ(r(t), d) dt

where r(t) = o + t·d represents the ray, σ_θ the density parameters, RGB_θ the color parameters, d the view-direction vector, and z_far and z_near the far and near planes respectively; T(t) denotes the transmittance of the ray from t_n to t, i.e. the probability that the ray propagates without hitting any other particle, defined as:

T(t) = exp( −∫_{t_n}^{t} σ_θ(r(s)) ds )
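A minimal PyTorch sketch of the discrete quadrature of this integral follows (the standard alpha-compositing approximation of volume rendering; tensor names are illustrative):

```python
import torch

def volume_render(rgbs, sigmas, t_vals):
    """Composite per-sample colors into pixel colors.

    rgbs: (N, S, 3) colors, sigmas: (N, S) densities, t_vals: (N, S) depths.
    Returns (N, 3) rendered pixel colors.
    """
    deltas = t_vals[:, 1:] - t_vals[:, :-1]                        # (N, S-1)
    deltas = torch.cat([deltas, torch.full_like(deltas[:, :1], 1e10)], dim=-1)
    alphas = 1.0 - torch.exp(-sigmas * deltas)                     # (N, S)
    # T_i = prod_{j<i} (1 - alpha_j): transmittance up to each sample.
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = alphas * trans                                       # (N, S)
    return (weights[..., None] * rgbs).sum(dim=-2)                 # (N, 3)
```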
in an embodiment, S5 optimizes the face image rendered in step S4 using a HiFaceGAN network, which is collectively referred to as facial repair through collaborative suppression and replenishment. And (3) denoising and removing artifacts and improving detail precision for the image synthesized by the volume rendering by using the network.
(1) The suppression module aims at suppressing heterogeneous degradation and encoding robust hierarchical semantic information to guide a subsequent replenishment module to reconstruct a correspondingly lifelike-detailed renovated face.
(2) After semantic features are acquired from the front-end suppression module, guide detail supplementation is performed by using coding features.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present disclosure describes embodiments, not every embodiment contains only a single independent technical solution. This manner of description is adopted for clarity only; the description should be taken as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that will be apparent to those skilled in the art.

Claims (9)

1. A face image synthesis optimization method based on a neural radiance field, characterized by comprising the following steps:
S1, acquiring a monocular RGB face video, performing frame extraction and two-dimensional face image fitting, and performing real-time face tracking and analysis on the two-dimensional face images to obtain camera intrinsic parameters, pose parameters and expression parameters;
S2, performing hierarchical ray sampling on the two-dimensional face images using the camera intrinsic and pose parameters to obtain the corresponding three-dimensional coordinates, and applying positional encoding to the three-dimensional coordinates;
S3, feeding the encoded positions, expression parameters and latent codes together into an MLP-based neural network to train the neural radiance field, obtaining RGB color values and densities;
S4, synthesizing the RGB color values and densities into a two-dimensional face image at a new view angle by volume rendering;
S5, optimizing the two-dimensional face image synthesized in step S4 using a generative adversarial network.
2. The face image synthesis optimization method based on a neural radiance field according to claim 1, wherein S1 uses a FLAME low-dimensional face model to fit the two-dimensional face images, estimating head geometry, appearance and facial expression from a single face image.
3. The face image synthesis optimization method based on a neural radiance field according to claim 1, wherein S2 performs ray sampling on the real 3D scene using a coarse-to-fine hierarchical sampling technique: dense sampling is performed near points with a large color contribution and sparse sampling near points with a small color contribution, and before input to the MLP networks the samples are allocated in proportion to their expected effect on the final rendering, for the subsequent optimization of the two MLP networks, a coarse network and a fine network, to improve rendering efficiency.
4. The face image synthesis optimization method based on a neural radiance field according to claim 1, wherein the positional encoding equation adopted in S2 is:

γ(p) = (sin(2^0 πp), cos(2^0 πp), …, sin(2^{L−1} πp), cos(2^{L−1} πp))

where γ(·) is applied separately to each component (x, y, z) of the three-dimensional spatial coordinate X and to each of the three components of the unit view-direction vector d; L is the encoding dimension, and p is the argument of γ(·), standing for a component of the three-dimensional coordinate X or of the view-direction vector d.
5. The face image synthesis optimization method based on a neural radiance field according to claim 1, wherein the MLP-based neural network in S3 is the main body of the neural radiance field; its backbone consists of 8 fully connected layers with 256 neurons each, connected by ReLU activation functions, propagates the computed positional encodings, and finally outputs the density σ; the 24-dimensional view-direction encoding v is then concatenated, and the 3-dimensional RGB color value is output after 4 further fully connected layers.
6. The face image synthesis optimization method based on a neural radiance field according to claim 1, wherein the loss function of the neural network in S3 is:

L = Σ_{r∈R} [ ||Ĉ_c(r) − C(r)||₂² + ||Ĉ_f(r) − C(r)||₂² ]

where R is the set of rays in each batch of the hierarchical sampling in S2, r is a ray in the set, and C(r), Ĉ_c(r) and Ĉ_f(r) are respectively the RGB colors of ray r in the real 3D scene, from the coarse network, and from the fine network of S3.
7. The face image synthesis optimization method based on a neural radiance field according to claim 1, wherein S3 accelerates the training process using the NerfAcc technique, by skipping empty regions and occluded regions.
8. The face image synthesis optimization method based on a neural radiance field according to claim 1, wherein the volume rendering formula of S4 is:

C(r) = ∫_{z_near}^{z_far} T(t) · σ_θ(r(t)) · RGB_θ(r(t), d) dt

where r(t) = o + t·d represents the ray, σ_θ the density parameters, RGB_θ the color parameters, d the view-direction vector, and z_far and z_near the far and near planes respectively; T(t) denotes the transmittance of the ray from t_n to t, i.e. the probability that the ray propagates without hitting any other particle, defined as:

T(t) = exp( −∫_{t_n}^{t} σ_θ(r(s)) ds )
9. The face image synthesis optimization method based on a neural radiance field according to claim 1, wherein S5 optimizes the two-dimensional face image synthesized in S4 using a HiFaceGAN network; the network adopts a front-end suppression module to suppress heterogeneous degradation and to encode robust hierarchical semantic information, which guides a subsequent replenishment module to reconstruct a renovated face with correspondingly lifelike detail, and after the semantic features are obtained from the front-end suppression module, the encoded features are used to guide detail replenishment.
CN202311379379.7A 2023-10-24 2023-10-24 Face image synthesis optimization method based on neural radiance field Pending CN117422829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311379379.7A CN117422829A (en) Face image synthesis optimization method based on neural radiance field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311379379.7A CN117422829A (en) Face image synthesis optimization method based on neural radiance field

Publications (1)

Publication Number Publication Date
CN117422829A 2024-01-19

Family

ID=89532032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311379379.7A Pending CN117422829A (en) 2023-10-24 2023-10-24 Face image synthesis optimization method based on nerve radiation field

Country Status (1)

Country Link
CN (1) CN117422829A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117689783A (en) * 2024-02-02 2024-03-12 Hunan Malanshan Video Advanced Technology Research Institute Co., Ltd. Face voice driving method and device based on hyper-parameter neural radiance field

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170559A (en) * 2022-08-12 2022-10-11 Hangzhou Xiangyan Technology Co., Ltd. Personalized human-head neural radiance field basis representation and reconstruction method based on multi-level hash coding
CN115439311A (en) * 2022-08-24 2022-12-06 Nanjing University of Aeronautics and Astronautics Face mask editing method based on generative adversarial network
CN115689869A (en) * 2022-10-21 2023-02-03 Institute of Computing Technology, Chinese Academy of Sciences Video makeup migration method and system
CN116071494A (en) * 2022-12-23 2023-05-05 Hangzhou Xiangyan Technology Co., Ltd. High-fidelity three-dimensional face reconstruction and generation method based on implicit neural function
WO2023093186A1 (en) * 2022-06-15 2023-06-01 Zhejiang Lab Neural radiance field-based method and apparatus for constructing pedestrian re-identification three-dimensional data set
CN116228979A (en) * 2023-02-24 2023-06-06 Shanghai University Voice-driven editable face replay method and device, and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023093186A1 (en) * 2022-06-15 2023-06-01 Zhejiang Lab Neural radiance field-based method and apparatus for constructing pedestrian re-identification three-dimensional data set
CN115170559A (en) * 2022-08-12 2022-10-11 Hangzhou Xiangyan Technology Co., Ltd. Personalized human-head neural radiance field basis representation and reconstruction method based on multi-level hash coding
CN115439311A (en) * 2022-08-24 2022-12-06 Nanjing University of Aeronautics and Astronautics Face mask editing method based on generative adversarial network
CN115689869A (en) * 2022-10-21 2023-02-03 Institute of Computing Technology, Chinese Academy of Sciences Video makeup migration method and system
CN116071494A (en) * 2022-12-23 2023-05-05 Hangzhou Xiangyan Technology Co., Ltd. High-fidelity three-dimensional face reconstruction and generation method based on implicit neural function
CN116228979A (en) * 2023-02-24 2023-06-06 Shanghai University Voice-driven editable face replay method and device, and storage medium

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
JIAYANG BAI et al.: "Self-NeRF: A Self-Training Pipeline for Few-Shot Neural Radiance Fields", arXiv, 10 March 2023 (2023-03-10) *
LINGBO YANG et al.: "HiFaceGAN: Face Renovation via Collaborative Suppression and Replenishment", in 28th ACM International Conference on Multimedia (MM '20), 12 October 2020 (2020-10-12), pages 1-6 *
SHAHRUKH ATHAR et al.: "FLAME-in-NeRF: Neural control of Radiance Fields for Free View Face Animation", 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), 16 February 2023 (2023-02-16) *
YIYU ZHUANG et al.: "MoFaNeRF: Morphable Facial Neural Radiance Field", arXiv, 22 July 2022 (2022-07-22), pages 1-6 *
GU LU: "Accelerating NeRF training: nerfacc" (加速Nerf训练:nerfacc), pages 0-2, retrieved from the Internet <URL:https://blog.csdn.net/fb_941219/article/details/131680149> *
ZHANG YAO et al.: "Research progress on deep-learning-based visual simultaneous localization and mapping", Chinese Journal of Scientific Instrument, vol. 44, no. 07, 31 July 2023 (2023-07-31), pages 214-241 *
HONG YANG: "Representation and reconstruction of high-fidelity virtual digital humans", China Doctoral Dissertations Full-text Database, Information Science and Technology, no. 03, 15 March 2023 (2023-03-15), pages 138-32 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117689783A (en) * 2024-02-02 2024-03-12 Hunan Malanshan Video Advanced Technology Research Institute Co., Ltd. Face voice driving method and device based on hyper-parameter neural radiance field
CN117689783B (en) * 2024-02-02 2024-04-30 Hunan Malanshan Video Advanced Technology Research Institute Co., Ltd. Face voice driving method and device based on hyper-parameter neural radiance field

Similar Documents

Publication Publication Date Title
Rematas et al. Novel views of objects from a single image
Gao et al. Reconstructing personalized semantic facial nerf models from monocular video
CN112887698B (en) High-quality face voice driving method based on neural radiance field
CN113066171B (en) Face image generation method based on three-dimensional face deformation model
CN115951784B (en) Method for capturing and generating motion of clothed human body based on dual neural radiance fields
CN113808047B (en) Denoising method for human motion capture data
CN117422829A (en) Face image synthesis optimization method based on neural radiance field
CN115457169A (en) Voice-driven human face animation generation method and system
CN116416376A (en) Three-dimensional hair reconstruction method, system, electronic equipment and storage medium
CN117315211B (en) Digital human synthesis and model training method, device, equipment and storage medium thereof
CN117496072B (en) Three-dimensional digital person generation and interaction method and system
CN116134491A (en) Multi-view neuro-human prediction using implicit differentiable renderers for facial expression, body posture morphology, and clothing performance capture
Chen et al. TeSTNeRF: text-driven 3D style transfer via cross-modal learning
Rabby et al. Beyondpixels: A comprehensive review of the evolution of neural radiance fields
CN116385667B (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
CN114972619A (en) Single-image face three-dimensional reconstruction method based on self-alignment double regression
CN117036620A (en) Three-dimensional face reconstruction method based on single image
CN112184912A (en) Multi-metric three-dimensional face reconstruction method based on parameterized model and position map
Maxim et al. A survey on the current state of the art on deep learning 3D reconstruction
CN116385606A (en) Speech signal driven personalized three-dimensional face animation generation method and application thereof
Yang et al. Poxture: Human posture imitation using neural texture
Chang et al. 3D hand reconstruction with both shape and appearance from an RGB image
CN112233018A (en) Reference image guided face super-resolution method based on three-dimensional deformation model
Knoll et al. Animating NeRFs from Texture Space: A Framework for Pose-Dependent Rendering of Human Performances
Feng et al. 3D face style transfer with a hybrid solution of NeRF and mesh rasterization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination