CN117422829A - Face image synthesis optimization method based on neural radiance field - Google Patents
- Publication number
- CN117422829A (application CN202311379379.7A, CN202311379379A)
- Authority
- CN
- China
- Prior art keywords
- face image
- dimensional
- radiation field
- parameters
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06N3/0475 — Generative networks
- G06N3/048 — Activation functions
- G06N3/0499 — Feedforward networks
- G06N3/094 — Adversarial learning
- G06T15/005 — General purpose rendering architectures
- G06T7/70 — Determining position or orientation of objects or cameras
- G06T7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
- G06T7/90 — Determination of colour characteristics
- G06V40/174 — Facial expression recognition
- G06T2207/10016 — Video; Image sequence
- G06T2207/10024 — Color image
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30201 — Face
Abstract
The invention discloses a face image synthesis optimization method based on a neural radiance field. The method comprises: obtaining a monocular RGB face video, performing frame extraction and two-dimensional face image fitting, and performing real-time face tracking and analysis on the two-dimensional face images to obtain camera intrinsic parameters, pose parameters and expression parameters; sampling rays of the two-dimensional face image using the camera intrinsic and pose parameters to obtain corresponding three-dimensional coordinates, and position-encoding the obtained three-dimensional coordinates; inputting the encoded positions, the expression parameters and latent codes into an MLP neural network to train the neural radiance field, obtaining RGB color values and densities; synthesizing the RGB color values and densities into a two-dimensional face image at a new view angle by volume rendering; and optimizing the two-dimensional face image synthesized in step S4 with a generative adversarial network. The neural network training time is short, the generated face images contain fewer artifacts and richer details, and the method has broad application prospects.
Description
Technical Field
The invention belongs to the technical field of three-dimensional reconstruction and novel-view image synthesis, and particularly relates to a face image synthesis optimization method based on a neural radiance field.
Background
A virtual digital human is a computer-based human simulation technology that can reproduce human appearance, actions and behaviors. Technically, virtual digital humans are highly comprehensive and interdisciplinary, spanning three-dimensional vision, computer graphics and natural language processing, as well as bionics, behavioral psychology and behavioral logic. Thanks to the continued development of digital human technology, industrial applications have begun to land; for example, in games and virtual reality, virtual digital humans enable highly realistic character simulation and motion capture, improving immersion and realism.
As an important component of virtual digital human technology, three-dimensional face reconstruction has long been a hot research direction in computer vision, computer graphics and three-dimensional reconstruction. Because facial features are distributed in broadly similar positions, a face can be mapped to a constructed low-dimensional parameterized face model, enabling an efficient digital representation of the face.
Most existing research is based on face reconstruction with generative adversarial networks: by constructing an effective network structure and constraining the generated result to be consistent with the data distribution of a pre-collected dataset, such methods bypass traditional explicit three-dimensional modeling and directly render photo-level, high-resolution, high-quality face images. However, when the pose parameters of the camera change, the reconstructed face image struggles to maintain view consistency. In recent years, face reconstruction based on implicit neural functions has begun to rise; taking HeadNeRF as an example, it implicitly expresses information such as facial expression and identity by means of a 3DMM low-dimensional face model. However, for images deviating from the training data, the related fitting task can only return results close to the training data, so the fit cannot be accurate; and since the training data rarely include images with headwear, it is difficult to render headwear-related content in the fitting results.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a face image synthesis optimization method based on a neural radiance field, addressing the defects of the prior art.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
A face image synthesis optimization method based on a neural radiance field comprises the following steps:
S1, acquiring a monocular RGB face video, performing frame extraction and two-dimensional face image fitting, and performing real-time face tracking and analysis on the two-dimensional face images to obtain camera intrinsic parameters, pose parameters and expression parameters;
S2, performing hierarchical ray sampling on the two-dimensional face image using the camera intrinsic parameters and pose parameters to obtain corresponding three-dimensional coordinates, and position-encoding the three-dimensional coordinates;
S3, inputting the encoded positions, expression parameters and latent codes together into an MLP neural network to train the neural radiance field, obtaining RGB color values and densities;
S4, synthesizing the RGB color values and densities into a two-dimensional face image at a new view angle by volume rendering;
S5, optimizing the two-dimensional face image synthesized in step S4 with a generative adversarial network.
In order to optimize the technical scheme, the specific measures adopted further comprise:
the above S1 uses a flap low dimensional face three dimensional model to perform two dimensional face image fitting to estimate head geometry, appearance and facial expression from a single Zhang Ren face image.
In the above S2, rays are sampled from the real 3D scene in a hierarchical manner: dense sampling is performed near points that contribute heavily to the color, and sparse sampling near points that contribute little; before being input to the MLP networks, the samples are proportionally distributed to optimize the two subsequent MLP networks (a coarse network and a fine network), thereby improving rendering efficiency.
The encoding equation adopted in the above S2 is as follows:
γ(p) = (sin(2^0 πp), cos(2^0 πp), …, sin(2^(L-1) πp), cos(2^(L-1) πp))
where γ(·) acts separately on each of the three components (x, y, z) of the three-dimensional spatial coordinate X and on each component of the unit view-direction vector d; p is the argument of the γ(·) function, representing a component of the three-dimensional coordinate X or of the view-direction vector d.
The MLP neural network described in the above S3 is the main body of the neural radiance field. Its backbone consists of 8 fully connected layers, each with 256 neurons, connected through ReLU activation functions; the backbone processes the position encoding and finally outputs the density σ. The 24-dimensional view-direction encoding is then appended, and the 3-dimensional RGB color value is output after 4 further fully connected layers.
The loss function of the neural network in S3 is as follows:
L = Σ_(r∈R) [ ‖Ĉ_c(r) − C(r)‖₂² + ‖Ĉ_f(r) − C(r)‖₂² ]
where R is the set of rays in each batch of the S2 hierarchical sampling, r is a ray in the set, and C(r), Ĉ_c(r) and Ĉ_f(r) are respectively the ground-truth RGB color of the ray in the real scene and the colors output by the coarse and fine networks of step 3;
In S3, the training process is accelerated using the NerfAcc technique, by skipping empty regions and occluded regions.
The volume rendering formula of S4 is as follows:
C(r) = ∫_(z_near)^(z_far) T(t) σ_θ(r(t)) RGB_θ(r(t), d) dt
where r(t) = o + td represents the ray, σ_θ represents the density parameter, RGB_θ represents the color parameter, d represents the view-direction vector, and z_far and z_near represent the far plane and near plane respectively; T(t) represents the transmittance of the ray from t_n over the distance t, i.e. the probability that the ray propagates without hitting any other particle, and is defined as follows:
T(t) = exp( − ∫_(t_n)^t σ_θ(r(s)) ds )
In S5, the two-dimensional face image synthesized in S4 is optimized with a HiFaceGAN network. The network uses a front-end suppression module to suppress heterogeneous degradation and encode robust hierarchical semantic information, which guides a subsequent replenishment module to reconstruct a renovated face with correspondingly lifelike detail; after semantic features are acquired from the front-end suppression module, the encoded features are used to guide detail replenishment.
The invention has the following beneficial effects:
the invention relates to a combined face tracking, nerve radiation field (NeRF), generation type countermeasure network (GAN) and the like, and only a group of monocular RGB face video sequences are required to be input, so that a new view angle face image which can be edited and has rich details can be reconstructed; the training of the nerve radiation field is accelerated by using a NerfAcc acceleration technology, so that the time required by training is greatly shortened, and the time required by training can be compressed on the premise of ensuring the accuracy of the result by skipping an empty region and stopping the ray in a shielding region in advance; generating a new view angle image of the head of the human face in a volume rendering mode; a user-friendly visual interface can be developed, and the expression and the pose of the generated face image can be explicitly edited by modifying corresponding parameters in the interface, so that the model has wider prospect in the face reconstruction; the generation countermeasure network module HiFaceGAN is used for optimizing the picture generated by volume rendering, so that rendering result optimization is realized, noise reduction, artifact removal and detail enhancement are performed on the reconstructed face picture, and the generated face image artifact is reduced and the detail is richer.
Drawings
Fig. 1 is a schematic diagram of the overall design of the present invention.
FIG. 2 is a schematic diagram of a portion of a neural network according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Although the steps of the present invention are arranged by reference numerals, the order of the steps is not limited, and the relative order of the steps may be adjusted unless the order of the steps is explicitly stated or the execution of a step requires other steps as a basis. It is to be understood that the term "and/or" as used herein relates to and encompasses any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1, the invention relates to a face image synthesis optimization method based on a neural radiance field, which comprises the following steps:
s1, acquiring monocular RGB face video, carrying out frame extraction and two-dimensional face image fitting, and carrying out real-time face tracking and analysis on the two-dimensional face image to obtain camera internal parameters, pose parameters and expression parameters; specifically, a monocular RGB face video sequence is obtained, frame extraction is carried out, an available frame is obtained, a FLAME low-dimensional face three-dimensional model is used for fitting, and face tracking is used for analyzing the available frame to obtain relevant parameters.
S2, carrying out layered light sampling on the two-dimensional face image by adopting camera internal parameters and pose parameters to obtain corresponding three-dimensional coordinates, and carrying out position coding on the three-dimensional coordinates; specifically, object position parameters obtained in light sampling are encoded, and coordinates are mapped into a space with higher dimension, so that MLP can represent a function with higher frequency, and the geometry and texture of the object surface are more vivid;
s3, inputting the coded position, expression parameters and latent codes into a neural network formed by the MLP together for training a nerve radiation field to obtain RGB color values and densities;
S2-S3, sampling each extracted human face available frame image by utilizing the pose matrix and the camera internal and external parameters obtained in the S1, and simultaneously adding a latent code for reducing deviation in a preliminary model fitting process as much as possible, wherein the output trained by an MLP network is density and RGB color values;
s4, synthesizing RGB color values and density into a two-dimensional face image of a two-dimensional new view angle by utilizing volume rendering; accelerating training by using a NerfAcc technology, and synthesizing a new view by using volume rendering;
and S5, optimizing the two-dimensional face image synthesized in the step S4 by adopting a generation countermeasure network. And inputting the preliminarily rendered face image into a HiFaceGAN network, and carrying out noise reduction, artifact removal and detail enhancement on the reconstructed face image.
In an embodiment, the S1 input is a set of monocular RGB face video sequences for which the camera position is kept fixed. After preprocessing the raw data, the face is tracked and analyzed in real time by a face tracking technique to obtain the required information, such as camera intrinsic and extrinsic parameters, displacement matrices and expression parameters. This comprises the following steps:
Step S11, for a group of input monocular RGB face videos, frames are first extracted to obtain available frames; then two-dimensional face images are fitted with the low-dimensional face model FLAME using a face tracking technique, and head geometry and appearance are estimated from the single face image. The core idea is to regard a three-dimensional face as a linear combination of basis vectors such as shape, texture and expression, so that each group of three-dimensional face data can be represented by combinations within the basis-vector spaces of the database; solving the model of an arbitrary three-dimensional face is therefore equivalent to solving the coefficient of each basis vector. The relevant parameters of the FLAME model include spherical harmonic illumination parameters, texture and geometry parameters, facial expression parameters, and camera intrinsic and extrinsic parameters. Ray sampling is then performed using the camera intrinsic parameters and pose parameters;
Step S12, for the first frame, the number of iterations is increased to obtain a more accurate initialization model; the remaining frames can be initialized from the previous estimate;
Step S13, after data preprocessing is finished, a corresponding folder is generated for each two-dimensional face picture, containing the picture and the 2D position coordinates of 68 facial key points, used for fitting the low-dimensional deformable face model in the next stage.
By editing the configuration file, 5 pictures with significant differences in expression and pose are manually selected as key frames for model fitting (including texture and shape) during face tracking.
In the embodiment, S2 performs ray sampling on the two-dimensional face image using the camera intrinsic and extrinsic parameters and the pose matrix from the face tracking result. The expression parameters and pose parameters are the editable part: facial expression and pose can be explicitly edited by modifying the corresponding parameters.
(1) A coarse-to-fine hierarchical volume sampling technique is adopted: sampling densely near points with large color contributions and sparsely near points with small contributions, so as to reduce the required number of samples while fully sampling the high-frequency scene representation. The samples are allocated in proportion to their expected impact on the final rendering, for the subsequent optimization of two MLP networks (a coarse network and a fine network), improving rendering efficiency.
Hierarchical sampling first divides the interval between near and far along the camera ray into N_c equal bins, then samples uniformly within each bin to obtain one sampling point, giving N_c sampling points in total:
t_i ~ U[ near + ((i−1)/N_c)(far − near), near + (i/N_c)(far − near) ]
where far = 1 and near = 0;
For the ray sampling part, the number of randomly sampled rays is 2048; the chunk size is 2048 during training and 65536 during validation; and the number of sampling points per ray is 64 on both the coarse and the fine network.
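As a rough illustration of this coarse-stage sampling (a sketch under the stated settings near = 0, far = 1 and 64 samples per ray; the function name is hypothetical and this is not the patent's actual implementation), the per-bin uniform draw can be written as:

```python
import numpy as np

def stratified_samples(near, far, n_samples, rng):
    """Divide [near, far] into n_samples equal bins and draw one
    uniform sample inside each bin (coarse hierarchical sampling)."""
    edges = np.linspace(near, far, n_samples + 1)
    lower, upper = edges[:-1], edges[1:]
    u = rng.random(n_samples)            # one uniform draw per bin
    return lower + u * (upper - lower)

rng = np.random.default_rng(0)
t_vals = stratified_samples(0.0, 1.0, 64, rng)  # 64 points along one ray
```

Because each bin contributes exactly one point, the samples are ordered along the ray while still covering the interval stochastically, which is what lets the coarse network estimate where the color-contributing density lies.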
(2) The three-dimensional coordinates obtained from ray sampling are position-encoded: they are mapped to a higher-dimensional space through high-frequency functions before being passed to the network, so that data containing high-frequency variation can be fitted better and the learned images are not overly blurred;
the position coding is to change the original representation function into a combination of two functions:
wherein F' Θ The function is an MLP requiring learning over the network, while the gamma function does not require learning, and is here only a mapping function, mapping from space R to high-dimensional spaceThe coding equation used is:
γ(p) = (sin(2^0 πp), cos(2^0 πp), …, sin(2^(L-1) πp), cos(2^(L-1) πp))
γ(·) acts separately on each of the three components (x, y, z) of the three-dimensional spatial coordinate X and on each component of the unit view-direction vector. Finally, the encoded result is normalized to the interval [−1, 1] using the sinh function. p is the argument of the γ(·) function, representing a component of the three-dimensional coordinate X or of the view-direction vector.
The choice of the dimension L is related to the complexity of the scene and the available hardware computing power, and determines the magnitude of the highest frequency the neural network can learn. For the spatial coordinate X, L = 10; for the view direction, L = 4;
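A minimal numpy sketch of the γ(·) encoding above (the helper name is hypothetical): each scalar component is expanded into 2L sine/cosine features, giving 3·2·10 = 60 dimensions for the spatial coordinate and 3·2·4 = 24 dimensions for the view direction:

```python
import numpy as np

def positional_encoding(p, L):
    """gamma(p) = (sin(2^0 pi p), cos(2^0 pi p), ...,
    sin(2^(L-1) pi p), cos(2^(L-1) pi p)), applied componentwise."""
    p = np.asarray(p, dtype=np.float64)
    feats = []
    for i in range(L):
        freq = (2.0 ** i) * np.pi
        feats.append(np.sin(freq * p))
        feats.append(np.cos(freq * p))
    return np.stack(feats, axis=-1)   # shape: p.shape + (2L,)

# L = 10 for the spatial coordinate (3 components -> 60 features),
# L = 4 for the view direction (3 components -> 24 features).
xyz = np.array([0.1, -0.3, 0.7])
enc = positional_encoding(xyz, L=10)
```

The lowest-frequency pair (sin(πp), cos(πp)) preserves the coarse location, while the 2^(L-1)π pair lets the MLP resolve fine surface detail.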
In the embodiment, the S3 MLP network is the main part of the neural radiance field. Its backbone consists of 8 fully connected layers, each with 256 neurons, connected through ReLU activation functions; it processes the position encoding and finally outputs the density σ. The 24-dimensional view-direction encoding is then appended, and the 3-dimensional RGB color value is output after 4 fully connected layers (each with 128 neurons);
Since the coordinate system is converted into the canonical head space, the near plane is set to 0.2 and the far plane to 0.8. The number of position encoding layers of the two networks is 10, and the number of view-direction encoding layers is 4.
The encoded position parameters and view-direction parameters are input into the neural radiance field; to reduce possible deviation in the face tracking process as much as possible, 32-dimensional latent code parameters are added to the input of the MLP;
the optimizer adopts Adam and the initial learning rate lr origin For 0.0005, the learning rate update formula is as follows:
wherein lr is decay Set to 250 lr decay_factor Set to 0.1.
The encoded position is input together with the 76-dimensional expression parameters and the 32-dimensional latent codes into the MLP neural network (shown in figure 2) to train the neural radiance field. The loss function of the MLP in the neural radiance field is as follows:
L = Σ_(r∈R) [ ‖Ĉ_c(r) − C(r)‖₂² + ‖Ĉ_f(r) − C(r)‖₂² ]
where R is the set of rays in each batch of the S2 hierarchical sampling, r is a ray in the set, and C(r), Ĉ_c(r) and Ĉ_f(r) are respectively the ground-truth RGB color of the ray in the real scene and the colors output by the coarse and fine networks of step 3;
the neural network is a neural radiation field backbone network, and outputs RGB color values and densities for subsequent image synthesis.
The NerfAcc technique is used to accelerate training of the neural radiance field by skipping empty regions and occluded regions, as follows:
An occupancy grid estimator (Occupancy Grid Estimator) caches the scene densities in a binarized voxel grid, which offers faster reads; during sampling, rays step through the grid at a preset step size, and blank regions are skipped by querying the voxel grid.
For a ray that strikes an occluding object, the points occluded by the object can be ignored; that is, the expected color of the sampled ray is the color of the occluder.
In NerfAcc, a threshold on T is set. During ray casting, the density σ at each point is computed and the corresponding transmittance T is updated; if the T value at a point falls below the set threshold, the ray has struck an occluding object and the ray-casting process can be terminated.
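A toy sketch of this transmittance-threshold early termination (the threshold value 1e-4 and the function name are illustrative assumptions; actual NerfAcc internals differ):

```python
import numpy as np

def march_with_early_stop(sigmas, deltas, t_threshold=1e-4):
    """Accumulate transmittance T along a ray; terminate as soon as T
    falls below the threshold (the ray has hit an occluder)."""
    T = 1.0
    used = 0
    for sigma, delta in zip(sigmas, deltas):
        used += 1
        T *= np.exp(-sigma * delta)
        if T < t_threshold:
            break                     # remaining samples are never evaluated
    return T, used

# Opaque surface at the third/fourth samples; the last two samples are skipped.
sigmas = np.array([0.0, 0.0, 50.0, 50.0, 0.0, 0.0])
deltas = np.full(6, 0.1)
T, used = march_with_early_stop(sigmas, deltas)
```

Every skipped sample is one fewer MLP evaluation, which is where the training-time saving comes from.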
In the embodiment, S4 synthesizes the densities and RGB colors output by the network into a two-dimensional face image at a new view angle using volume rendering. The principle of volume rendering is to sample in a three-dimensional dataset, trace a ray through each sampling point, and finally generate a two-dimensional image. In this process, the value at each sampling point represents color and transparency; since each pixel of the two-dimensional image can be regarded as the cumulative superposition of all points along one ray emitted from the camera, the color of each pixel is obtained by integrating the colors weighted by density, thereby rendering the two-dimensional image at that view angle.
The sampling results are accumulated. Suppose the distance from the camera to the near plane is t_n and to the far plane is t_f. For the sampled ray r(t) = o + td, the final expected color is computed as follows:
C(r) = ∫_(z_near)^(z_far) T(t) σ_θ(r(t)) RGB_θ(r(t), d) dt
where r(t) = o + td represents the ray, σ_θ represents the density parameter, RGB_θ represents the color parameter, d represents the view-direction vector, and z_far and z_near represent the far plane and near plane respectively; T(t) represents the transmittance of the ray from t_n over the distance t, i.e. the probability that the ray propagates without hitting any other particle, and is defined as follows:
T(t) = exp( − ∫_(t_n)^t σ_θ(r(s)) ds )
In an embodiment, S5 optimizes the face image rendered in step S4 using a HiFaceGAN network, whose full name is Face Renovation via Collaborative Suppression and Replenishment. This network is used to denoise the volume-rendered image, remove artifacts, and improve detail fidelity.
(1) The suppression module aims to suppress heterogeneous degradation and encode robust hierarchical semantic information, guiding the subsequent replenishment module to reconstruct a renovated face with correspondingly lifelike detail.
(2) After semantic features are obtained from the front-end suppression module, the encoded features are used to guide detail replenishment.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted for clarity only. The specification should be taken as a whole, and the technical solutions in the various embodiments may be combined as appropriate to form other implementations that will be apparent to those skilled in the art.
Claims (9)
1. A face image synthesis optimization method based on a neural radiance field, characterized by comprising the following steps:
S1, acquiring a monocular RGB face video, performing frame extraction and two-dimensional face image fitting, and performing real-time face tracking and analysis on the two-dimensional face images to obtain camera intrinsic parameters, pose parameters and expression parameters;
S2, performing hierarchical ray sampling on the two-dimensional face image using the camera intrinsic parameters and pose parameters to obtain corresponding three-dimensional coordinates, and applying position encoding to the three-dimensional coordinates;
S3, inputting the encoded positions, expression parameters and latent codes together into an MLP-based neural network to train the neural radiance field, obtaining RGB color values and densities;
S4, synthesizing the RGB color values and densities into a two-dimensional face image at a new view angle using volume rendering;
and S5, optimizing the two-dimensional face image synthesized in step S4 using a generative adversarial network.
2. The method of claim 1, wherein S1 uses the FLAME low-dimensional face model to fit the two-dimensional face images, estimating head geometry, appearance and facial expression from a single face image.
3. The face image synthesis optimization method based on the neural radiance field according to claim 1, wherein S2 performs ray sampling of the real 3D scene using a coarse-to-fine hierarchical sampling technique: sampling densely near points that contribute strongly to the color, and sparsely near points that contribute little. Before being input to the MLP networks, samples are allocated in proportion to their expected effect on the final rendering, for the subsequent optimization of two MLP networks (a coarse network and a fine network), thereby improving rendering efficiency.
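The coarse-to-fine allocation described in claim 3 is commonly implemented by inverse-transform sampling of the coarse network's rendering weights; regions that contributed more color are then sampled more densely by the fine network. A NumPy sketch (the function name and the linear interpolation within each bin are assumptions):

```python
import numpy as np

def sample_fine(bins, weights, n_fine, rng):
    """Draw n_fine new depths along a ray by inverse-transform sampling
    the piecewise-constant PDF defined by the coarse rendering weights.

    bins:    (N+1,) edges of the coarse sampling intervals along the ray
    weights: (N,)   rendering weights from the coarse network"""
    pdf = weights / weights.sum()
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])
    u = rng.random(n_fine)                               # uniform samples
    idx = np.searchsorted(cdf, u, side='right') - 1      # bin containing each u
    idx = np.clip(idx, 0, len(weights) - 1)
    # linear interpolation within the chosen bin
    denom = cdf[idx + 1] - cdf[idx]
    frac = (u - cdf[idx]) / np.where(denom > 0, denom, 1.0)
    return bins[idx] + frac * (bins[idx + 1] - bins[idx])
```

If all the coarse weight sits in one interval, every fine sample lands inside that interval, which is exactly the dense-near-contributing-points behavior the claim describes.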
4. The face image synthesis optimization method based on the neural radiance field according to claim 1, wherein the encoding equation adopted by S2 is:
$$\gamma(p) = \left(\sin(2^{0}\pi p),\ \cos(2^{0}\pi p),\ \dots,\ \sin(2^{L-1}\pi p),\ \cos(2^{L-1}\pi p)\right)$$
wherein γ(·) is applied separately to each of the three components (x, y, z) of the three-dimensional coordinate X and to each of the three components of the unit viewing-direction vector $\vec{d}$; L is the encoding dimension, and p denotes the argument of γ(·), i.e. a component of the three-dimensional coordinate X or of the viewing-direction vector $\vec{d}$.
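The encoding of claim 4 can be written directly in NumPy; the vectorized helper below is illustrative:

```python
import numpy as np

def positional_encoding(p, L):
    """gamma(p) applied elementwise to the coordinate components.

    p: array of coordinate components (e.g. x, y, z)
    L: number of frequency bands
    Returns (sin(2^0 pi p), ..., sin(2^{L-1} pi p)) followed by the
    matching cosines, per component."""
    p = np.asarray(p, dtype=float)
    freqs = 2.0 ** np.arange(L) * np.pi     # 2^l * pi for l = 0 .. L-1
    angles = p[..., None] * freqs           # broadcast over components
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
```

With L = 4 frequency bands, the three components of the viewing direction yield 3 × 2 × 4 = 24 values, matching the 24-dimensional viewing-direction encoding mentioned in claim 5.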
5. The face image synthesis optimization method based on the neural radiance field according to claim 1, wherein the MLP-based neural network in S3 is the main body of the neural radiance field. Its backbone consists of 8 fully connected layers with 256 neurons each, connected by ReLU activation functions; the backbone takes the computed position encoding as input and outputs the density σ. The 24-dimensional viewing-direction encoding $\vec{v}$ is then concatenated, and the 3-dimensional RGB color value is output after 4 further fully connected layers.
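A PyTorch sketch of the claim-5 architecture follows. The hidden width of the 4-layer color head, the 63-dimensional position encoding, and the Sigmoid/ReLU output activations are assumptions; the claim fixes only the layer counts, the 256-unit backbone, and the input/output dimensions:

```python
import torch
import torch.nn as nn

class NerfMLP(nn.Module):
    """Backbone: 8 FC layers of 256 units with ReLU, outputting density sigma.
    Color head: the 24-dim view encoding is concatenated, then 4 FC layers
    (widths assumed) produce the 3-dim RGB value."""
    def __init__(self, pos_dim=63, view_dim=24):
        super().__init__()
        layers, d = [], pos_dim
        for _ in range(8):                        # 8 FC layers, 256 neurons each
            layers += [nn.Linear(d, 256), nn.ReLU()]
            d = 256
        self.backbone = nn.Sequential(*layers)
        self.sigma_head = nn.Linear(256, 1)       # density output
        self.color_head = nn.Sequential(          # 4 FC layers -> 3-dim RGB
            nn.Linear(256 + view_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid())

    def forward(self, pos_enc, view_enc):
        h = self.backbone(pos_enc)
        sigma = torch.relu(self.sigma_head(h))    # keep density non-negative
        rgb = self.color_head(torch.cat([h, view_enc], dim=-1))
        return sigma, rgb
```

Predicting σ from the position encoding alone and injecting the view encoding only into the color head keeps the geometry view-independent while letting the color vary with viewpoint.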
6. The face image synthesis optimization method based on the neural radiance field according to claim 1, wherein the loss function of the neural network in S3 is:

$$\mathcal{L} = \sum_{r \in R}\left[\left\|\hat{C}_{c}(r) - C(r)\right\|_{2}^{2} + \left\|\hat{C}_{f}(r) - C(r)\right\|_{2}^{2}\right]$$

wherein R is the set of rays in each batch of the S2 hierarchical sampling, r is each ray in the set, and $C(r)$, $\hat{C}_{c}(r)$ and $\hat{C}_{f}(r)$ are the RGB colors of the ray in the real 3D scene, output by the coarse network, and output by the fine network in S3, respectively.
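The claim-6 loss is a plain sum of squared color errors for both networks over a batch of rays; a NumPy sketch (function name illustrative):

```python
import numpy as np

def nerf_loss(c_gt, c_coarse, c_fine):
    """Summed squared RGB error of the coarse and fine networks
    against the ground-truth ray colors.

    c_gt, c_coarse, c_fine: (R, 3) RGB colors per ray in the batch."""
    return (np.sum((c_coarse - c_gt) ** 2) +
            np.sum((c_fine - c_gt) ** 2))
```

Supervising the coarse network alongside the fine one keeps its weight estimates useful for the hierarchical sampling of claim 3, even though only the fine network's output is rendered.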
7. The face image synthesis optimization method based on the neural radiance field according to claim 1, wherein S3 accelerates the training process by using the NerfAcc technique to skip empty regions and occluded regions.
8. The face image synthesis optimization method based on the neural radiance field according to claim 1, wherein the volume rendering formula of S4 is:

$$C(r) = \int_{z_{near}}^{z_{far}} T(t)\,\sigma_{\theta}(r(t))\,RGB_{\theta}(r(t),\vec{d})\,dt$$

wherein $r(t) = o + t\vec{d}$ represents the ray, $\sigma_{\theta}$ the density parameters, $RGB_{\theta}$ the color parameters, $\vec{d}$ the viewing-direction unit vector, and $z_{far}$ and $z_{near}$ the far and near planes respectively; $T(t)$ is the transmittance of the ray accumulated from $z_{near}$ up to $t$, i.e. the probability that the ray propagates that far without hitting any other particle, defined as:

$$T(t) = \exp\!\left(-\int_{z_{near}}^{t} \sigma_{\theta}(r(s))\,ds\right)$$
9. The face image synthesis optimization method based on the neural radiance field according to claim 1, wherein S5 optimizes the two-dimensional face image synthesized in S4 using a HiFaceGAN network; the network employs a front-end suppression module to suppress heterogeneous degradation and encode robust hierarchical semantic information that guides a subsequent replenishment module to reconstruct a correspondingly lifelike face, and after semantic features are obtained from the front-end suppression module, the encoded features are used to guide detail replenishment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311379379.7A CN117422829A (en) | 2023-10-24 | 2023-10-24 | Face image synthesis optimization method based on nerve radiation field |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311379379.7A CN117422829A (en) | 2023-10-24 | 2023-10-24 | Face image synthesis optimization method based on nerve radiation field |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117422829A true CN117422829A (en) | 2024-01-19 |
Family
ID=89532032
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311379379.7A Pending CN117422829A (en) | 2023-10-24 | 2023-10-24 | Face image synthesis optimization method based on nerve radiation field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117422829A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117689783A (en) * | 2024-02-02 | 2024-03-12 | 湖南马栏山视频先进技术研究院有限公司 | Face voice driving method and device based on super-parameter nerve radiation field |
CN117953165A (en) * | 2024-03-26 | 2024-04-30 | 合肥工业大学 | New human face view synthesis method and system based on nerve radiation field |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115170559A (en) * | 2022-08-12 | 2022-10-11 | 杭州像衍科技有限公司 | Personalized human head nerve radiation field substrate representation and reconstruction method based on multilevel Hash coding |
CN115439311A (en) * | 2022-08-24 | 2022-12-06 | 南京航空航天大学 | Face mask editing method based on generation of confrontation network |
CN115689869A (en) * | 2022-10-21 | 2023-02-03 | 中国科学院计算技术研究所 | Video makeup migration method and system |
CN116071494A (en) * | 2022-12-23 | 2023-05-05 | 杭州像衍科技有限公司 | High-fidelity three-dimensional face reconstruction and generation method based on implicit nerve function |
WO2023093186A1 (en) * | 2022-06-15 | 2023-06-01 | 之江实验室 | Neural radiation field-based method and apparatus for constructing pedestrian re-identification three-dimensional data set |
CN116228979A (en) * | 2023-02-24 | 2023-06-06 | 上海大学 | Voice-driven editable face replay method, device and storage medium |
-
2023
- 2023-10-24 CN CN202311379379.7A patent/CN117422829A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023093186A1 (en) * | 2022-06-15 | 2023-06-01 | 之江实验室 | Neural radiation field-based method and apparatus for constructing pedestrian re-identification three-dimensional data set |
CN115170559A (en) * | 2022-08-12 | 2022-10-11 | 杭州像衍科技有限公司 | Personalized human head nerve radiation field substrate representation and reconstruction method based on multilevel Hash coding |
CN115439311A (en) * | 2022-08-24 | 2022-12-06 | 南京航空航天大学 | Face mask editing method based on generation of confrontation network |
CN115689869A (en) * | 2022-10-21 | 2023-02-03 | 中国科学院计算技术研究所 | Video makeup migration method and system |
CN116071494A (en) * | 2022-12-23 | 2023-05-05 | 杭州像衍科技有限公司 | High-fidelity three-dimensional face reconstruction and generation method based on implicit nerve function |
CN116228979A (en) * | 2023-02-24 | 2023-06-06 | 上海大学 | Voice-driven editable face replay method, device and storage medium |
Non-Patent Citations (7)
Title |
---|
JIAYANG BAI等: "Self-NeRF: A Self-Training Pipeline for Few-Shot Neural Radiance Fields", 《ARXIV》, 10 March 2023 (2023-03-10), pages 2303 * |
LINGBO YANG等: "HiFaceGAN: Face Renovation via Collaborative Suppression and Replenishment", 《IN 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM ’20)》, 12 October 2020 (2020-10-12), pages 1 - 6 * |
SHAHRUKH ATHAR等: "FLAME-in-NeRF: Neural control of Radiance Fields for Free View Face Animation", 《2023 IEEE 17TH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG)》, 16 February 2023 (2023-02-16) * |
YIYU ZHUANG等: "MoFaNeRF: Morphable Facial Neural Radiance Field", 《ARXIV》, 22 July 2022 (2022-07-22), pages 1 - 6 * |
古路: "Accelerating NeRF Training: nerfacc", pages 0 - 2, Retrieved from the Internet <URL:https://blog.csdn.net/fb_941219/article/details/131680149> *
ZHANG Yao et al.: "Research Progress in Deep-Learning-Based Visual Simultaneous Localization and Mapping", Chinese Journal of Scientific Instrument, vol. 44, no. 07, 31 July 2023 (2023-07-31), pages 214 - 241 *
HONG Yang: "Representation and Reconstruction of High-Fidelity Virtual Digital Humans", China Doctoral Dissertations Full-text Database, Information Science and Technology, no. 03, 15 March 2023 (2023-03-15), pages 138 - 32 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117689783A (en) * | 2024-02-02 | 2024-03-12 | 湖南马栏山视频先进技术研究院有限公司 | Face voice driving method and device based on super-parameter nerve radiation field |
CN117689783B (en) * | 2024-02-02 | 2024-04-30 | 湖南马栏山视频先进技术研究院有限公司 | Face voice driving method and device based on super-parameter nerve radiation field |
CN117953165A (en) * | 2024-03-26 | 2024-04-30 | 合肥工业大学 | New human face view synthesis method and system based on nerve radiation field |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gao et al. | Reconstructing personalized semantic facial nerf models from monocular video | |
Rematas et al. | Novel views of objects from a single image | |
CN112887698B (en) | High-quality face voice driving method based on nerve radiation field | |
CN117422829A (en) | Face image synthesis optimization method based on nerve radiation field | |
CN113066171B (en) | Face image generation method based on three-dimensional face deformation model | |
CN115914505B (en) | Video generation method and system based on voice-driven digital human model | |
CN117496072B (en) | Three-dimensional digital person generation and interaction method and system | |
CN113808047B (en) | Denoising method for human motion capture data | |
CN117315211B (en) | Digital human synthesis and model training method, device, equipment and storage medium thereof | |
CN115457169A (en) | Voice-driven human face animation generation method and system | |
CN115951784B (en) | Method for capturing and generating motion of wearing human body based on double nerve radiation fields | |
CN116385667B (en) | Reconstruction method of three-dimensional model, training method and device of texture reconstruction model | |
CN111402403B (en) | High-precision three-dimensional face reconstruction method | |
CN116416376A (en) | Three-dimensional hair reconstruction method, system, electronic equipment and storage medium | |
CN112184912A (en) | Multi-metric three-dimensional face reconstruction method based on parameterized model and position map | |
CN116385606A (en) | Speech signal driven personalized three-dimensional face animation generation method and application thereof | |
CN116134491A (en) | Multi-view neuro-human prediction using implicit differentiable renderers for facial expression, body posture morphology, and clothing performance capture | |
Chen et al. | TeSTNeRF: Text-Driven 3D Style Transfer via Cross-Modal Learning. | |
Yang et al. | Poxture: Human posture imitation using neural texture | |
CN117333604A (en) | Character face replay method based on semantic perception nerve radiation field | |
CN114972619A (en) | Single-image face three-dimensional reconstruction method based on self-alignment double regression | |
CN117745932A (en) | Neural implicit curved surface reconstruction method based on depth fusion constraint | |
CN116883524A (en) | Image generation model training, image generation method and device and computer equipment | |
Han et al. | Learning residual color for novel view synthesis | |
Wang et al. | Expression-aware neural radiance fields for high-fidelity talking portrait synthesis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |