Disclosure of Invention
The invention aims to provide a human body relighting method based on a dynamic surface reflectance field, which uses a compact spatio-temporal implicit representation to learn high-degree-of-freedom human motion and achieves fine dynamic human geometry reconstruction and material estimation. To model accurate shadow effects, the method estimates direct and indirect illumination simultaneously, and adopts a physically based rendering method to achieve a vivid rendering effect.
To achieve the above purpose, the present invention adopts the following technical solution:
A human body relighting method based on a dynamic surface reflectance field comprises the following steps:
Decomposing the 4D space using multi-plane and hash representations, and encoding the input multi-view dynamic human video with the spatio-temporal multi-plane representation to obtain a compact spatio-temporal position encoding. Specifically: the 4D space is decomposed into a compact multi-plane feature encoder and a time-aware hash encoder. During modeling, rays are cast from the camera center through the imaging plane, the rays are sampled in 4D space with a fixed number of points per ray, and each sampled point is encoded in space and time using the two encoders above.
Inputting the spatio-temporal position encoding into a geometry network to obtain the signed distance function (SDF) value and geometric feature of each ray sample point. Specifically: the spatio-temporal position encodings of the ray sample points are fed into a multi-layer perceptron, and the SDF values and geometric features of the corresponding points are obtained by fitting the rendering loss.
Inputting the geometric features and spatio-temporal position encodings of the ray sample points into a color network to obtain the color values of the sample points. Specifically: the spatio-temporal position encoding of each sample point is concatenated with its geometric feature, fed into a multi-layer perceptron, and the color value of the corresponding point is obtained by fitting the rendering loss.
Integrating the density, normal, color and material of the sample points along each ray using volume rendering to obtain the depth, normal, color and material of the corresponding pixel, thereby obtaining a depth map, normal map, color map and material map of the dynamic human body.
For illumination modeling, the method estimates direct and indirect illumination simultaneously. Direct illumination is modeled with spherical Gaussian functions, whose parameters are compact to optimize and therefore converge easily; indirect illumination relies on the properties of the neural radiance field, with visibility and indirect light modeled by ray tracing.
Determining the positions of surface points from the obtained depth map, and obtaining the final rendered image for each surface point with a physically based rendering method. Specifically: the spatial positions of the surface points are obtained from the ray samples using the depth information, and for each surface point the final rendered image is computed by a physically based rendering method that feeds geometry, material, visibility and illumination into a microfacet model.
Taking the target video as supervision, simultaneously constraining the rendered images obtained by both volume rendering and physically based rendering in the above steps, and learning the model parameters by minimizing these constraints. The main constraint is the rendering loss supervised by the target video; it is complemented by a material smoothness loss and geometric constraints.
During relighting, a new environment map replaces the direct illumination in the lighting model, and the physically based rendering method synthesizes the dynamic human relighting video under the new illumination.
The effects stated in this summary are only those of particular embodiments, not all effects of the invention. The above technical solution has the following advantages or beneficial effects:
The invention provides a human body relighting method based on a dynamic surface reflectance field. It designs an efficient 4D implicit representation to model the human surface reflectance field, overcomes the large fitting error and limited motion freedom inherent in template-based methods, and achieves accurate estimation of the dynamic human surface reflectance field. In the illumination model, visibility and indirect light are introduced through ray tracing, so that secondary-bounce shading effects are accurately simulated, yielding more accurate material estimation and relighting results.
Detailed Description
As shown in fig. 1, a human body relighting method based on a dynamic surface reflectance field comprises the following steps:
S1, decomposing the 4D space using multi-plane and hash representations, and encoding the input multi-view dynamic human video with the spatio-temporal multi-plane representation to obtain a compact spatio-temporal position encoding;
S2, inputting the spatio-temporal position encoding into a geometry network to obtain the signed distance function value and geometric feature of each ray sample point;
S3, inputting the geometric features and spatio-temporal position encodings of the ray sample points into a color network to obtain the color values of the sample points;
S4, applying volume rendering to the ray sample points to obtain the depth, normal, color and material of the corresponding pixel, thereby obtaining a depth map, normal map, color map and material map of the dynamic human body;
S5, modeling direct illumination with spherical Gaussian functions, and modeling light visibility and indirect illumination with ray tracing;
S6, determining the positions of surface points from the obtained depth map, and obtaining the final rendered image for each surface point with a physically based rendering method;
S7, taking the target video as supervision, simultaneously constraining the rendered images obtained by both volume rendering and physically based rendering in the above steps, and learning the model parameters by minimizing these constraints;
S8, during relighting, replacing the direct illumination with a new environment map to obtain the dynamic human relighting video.
In step S1, the 4D space is decomposed into a compact multi-plane feature encoder and a time-aware hash encoder. During modeling, rays are cast from the camera center through the imaging plane, the rays are sampled in 4D space with a fixed number of points per ray, and each sampled point is encoded in space and time using the two encoders above. For each sample point x at time t, the spatio-temporal encoding can be defined as:
e(x, t) = P(x, t) ⊕ H(x, t), with P(x, t) = ⊙_k M_k(x, t),
where P denotes the multi-plane feature encoder, H the time-aware hash encoder, M_k the low-dimensional tensors decomposed from the 4D tensor, ⊕ the concatenation operation, and ⊙ the Hadamard product.
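The decomposition above can be sketched in code. The following is a minimal NumPy illustration, not the patented implementation: the plane resolution, feature dimension, hash-table size, hash primes, and nearest-neighbor (rather than bilinear) lookup are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the patent's values).
RES, FDIM, TABLE = 32, 4, 2 ** 14

# Six 2-D feature planes factorizing the 4-D (x, y, z, t) volume:
# three space planes (xy, xz, yz) and three space-time planes (xt, yt, zt).
PLANES = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]
plane_grids = [rng.standard_normal((RES, RES, FDIM)) for _ in PLANES]
hash_table = rng.standard_normal((TABLE, FDIM))

def multiplane_encode(p):
    """Hadamard product of nearest-neighbor plane features (p in [0,1]^4)."""
    feat = np.ones(FDIM)
    for (a, b), grid in zip(PLANES, plane_grids):
        i = min(int(p[a] * RES), RES - 1)
        j = min(int(p[b] * RES), RES - 1)
        feat = feat * grid[i, j]          # Hadamard product across planes
    return feat

def hash_encode(p):
    """Time-aware spatial hash: quantize (x, y, z, t) and index a table."""
    q = (np.asarray(p) * RES).astype(np.int64)
    primes = np.array([1, 2654435761, 805459861, 3674653429], dtype=np.int64)
    idx = int(np.bitwise_xor.reduce(q * primes) % TABLE)
    return hash_table[idx]

def spacetime_encode(p):
    """Concatenate plane and hash features: e(x, t) = P(x, t) (+) H(x, t)."""
    return np.concatenate([multiplane_encode(p), hash_encode(p)])

code = spacetime_encode([0.2, 0.5, 0.7, 0.1])
```

In practice both encoders would hold trainable features at multiple resolutions; the sketch only shows how a 4D query is reduced to a compact feature vector.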
In step S2, the spatio-temporal position encodings of the ray sample points are input into a small multi-layer perceptron, and the signed distance function values and geometric features of the corresponding points are obtained by fitting the rendering loss. The process can be expressed as (s, z) = F_g(e(x, t)), where F_g is the geometry network, s is the signed distance value, and z is the geometric feature.
In step S3, the spatio-temporal position encoding of each sample point is concatenated with its geometric feature and input into a small multi-layer perceptron, and the color value of the corresponding sample point is obtained by fitting the rendering loss. The process can be expressed as c = F_c(e(x, t) ⊕ z), where F_c is the color network and c is the color value of the sample point.
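The two networks F_g and F_c can be sketched as plain NumPy MLPs. This is only an illustration of the data flow; the layer widths, activations, and random (untrained) weights are assumptions, not the patent's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
ENC, GEO = 8, 16          # encoding and geometric-feature widths (assumed)

def mlp(widths, seed):
    """Random-weight MLP layers as (W, b) pairs (stand-in for trained nets)."""
    r = np.random.default_rng(seed)
    return [(r.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(widths[:-1], widths[1:])]

def forward(layers, x):
    for k, (W, b) in enumerate(layers):
        x = x @ W + b
        if k < len(layers) - 1:
            x = np.maximum(x, 0.0)        # ReLU on hidden layers
    return x

# Geometry network F_g: encoding -> (SDF value s, geometric feature z).
geo_net = mlp([ENC, 32, 1 + GEO], seed=1)
# Color network F_c: encoding concatenated with z -> RGB.
color_net = mlp([ENC + GEO, 32, 3], seed=2)

def geometry(e):
    out = forward(geo_net, e)
    return out[0], out[1:]                # (signed distance s, feature z)

def color(e, z):
    raw = forward(color_net, np.concatenate([e, z]))
    return 1.0 / (1.0 + np.exp(-raw))     # sigmoid keeps RGB in [0, 1]

e = rng.standard_normal(ENC)
s, z = geometry(e)
rgb = color(e, z)
```

In the actual method both MLPs are fitted jointly through the rendering loss rather than initialized randomly.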
In step S4, the density, normal, color and material of the sample points along each ray are integrated using volume rendering, so as to obtain the depth map, normal map, color map and material map of the dynamic human body. Taking the color map as an example, this process can be expressed as:
C(o, d) = ∫ T(t) σ(r(t)) c(r(t)) dt, with r(t) = o + t d,
where o denotes the camera center, d the direction of the ray cast from the camera center (opposite to the incoming light direction), T(t) the accumulated transmittance, σ the volume density, c the sample-point color, and C the pixel color produced by volume rendering.
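The integral above is evaluated with the standard discrete quadrature; the same accumulation applies unchanged to depth, normals, and material channels. A minimal sketch with a toy ray (constant density, made-up per-sample colors — illustrative values only):

```python
import numpy as np

def volume_render(sigmas, values, deltas):
    """Discrete volume rendering along one ray.

    alpha_j = 1 - exp(-sigma_j * delta_j);  T_j = prod_{l<j} (1 - alpha_l);
    output = sum_j T_j * alpha_j * value_j.  Works for color, depth, normal
    or material values alike.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return weights @ values, weights

# Toy ray: 64 samples over t in [0, 4], constant density 2.0.
n = 64
sigmas = np.full(n, 2.0)
deltas = np.full(n, 4.0 / n)
ts = np.linspace(0.0, 4.0, n)
colors = np.stack([ts / 4.0, 1.0 - ts / 4.0, np.full(n, 0.5)], axis=-1)

pixel, w = volume_render(sigmas, colors, deltas)   # rendered pixel color
depth, _ = volume_render(sigmas, ts, deltas)       # rendered pixel depth
```

Rendering the sample distances ts instead of colors is exactly how the depth map used in step S6 is produced.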
In step S5, for illumination modeling, the method estimates direct and indirect illumination simultaneously. Direct illumination is modeled with spherical Gaussian functions, which compress the number of parameters to be optimized and therefore converge easily; indirect illumination relies on the properties of the neural radiance field, and visibility and indirect light are obtained by ray tracing.
Direct illumination L_d can be expressed as:
L_d(ω_i) = Σ_{k=1}^{K} G(ω_i; ξ_k, λ_k, μ_k),
where G denotes the mixed spherical Gaussian function, (ξ_k, λ_k, μ_k) are the optimizable parameters of lobe k, K is the total number of lobes, and ω_i is the incident light direction.
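A spherical Gaussian lobe evaluates to μ·exp(λ(ω·ξ − 1)), so the mixture above is a few lines of code. The lobe count and parameter values below are illustrative assumptions, not fitted illumination:

```python
import numpy as np

def sg_eval(w_i, axis, sharpness, amplitude):
    """One spherical Gaussian lobe: G(w) = mu * exp(lambda * (w . xi - 1))."""
    return amplitude * np.exp(sharpness * (w_i @ axis - 1.0))

def direct_light(w_i, lobes):
    """Mixture of K lobes approximating the environment illumination."""
    return sum(sg_eval(w_i, *lobe) for lobe in lobes)

# Two illustrative lobes: (axis xi, sharpness lambda, RGB amplitude mu).
lobes = [
    (np.array([0.0, 0.0, 1.0]), 10.0, np.array([1.0, 0.9, 0.8])),
    (np.array([0.0, 1.0, 0.0]), 4.0, np.array([0.2, 0.3, 0.5])),
]

up = np.array([0.0, 0.0, 1.0])
L_up = direct_light(up, lobes)   # incident radiance from straight overhead
```

Because each lobe is only a unit axis, a scalar sharpness and an RGB amplitude, the whole environment is described by a handful of parameters, which is why this representation optimizes and converges easily.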
Indirect light relies on the properties of the neural radiance field; the visibility V and indirect illumination L_ind are obtained by ray tracing, specifically:
V(x_s, ω_i) = T_N, L_ind(x_s, ω_i) = Σ_{j=1}^{N} T_j (1 − exp(−σ_j δ_j)) c_j,
where x_j is the position of the j-th sample point, c_j its color obtained by volume rendering, and T_j the transmittance at the j-th sample point. The ray cast from the surface point x_s in direction ω_i can be expressed as r(t) = x_s + t ω_i. In actual sampling, N (= 512) points are taken by discrete sampling, where δ_j is the sampling interval of the j-th sample point.
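The secondary-ray march can be sketched as follows. The density and radiance fields here are toy stand-ins (a single hard-coded occluding ball) for the learned neural radiance field; only the accumulation of transmittance (visibility) and radiance (indirect light) mirrors the method.

```python
import numpy as np

def density(x):
    """Toy occluder: a dense ball of radius 0.3 centered at (0, 0, 1)."""
    return 50.0 if np.linalg.norm(x - np.array([0.0, 0.0, 1.0])) < 0.3 else 0.0

def radiance(x, d):
    """Toy emitted radiance of the occluder (stand-in for the NeRF color)."""
    return np.array([0.8, 0.2, 0.2])

def trace(x_s, w_i, n=512, t_far=2.0):
    """March from surface point x_s along w_i: r(t) = x_s + t * w_i.

    Returns (visibility, indirect radiance), accumulated exactly as in
    volume rendering; the final transmittance T is the visibility V.
    """
    ts = np.linspace(1e-3, t_far, n)
    delta = ts[1] - ts[0]
    T, L_ind = 1.0, np.zeros(3)
    for t in ts:
        sigma = density(x_s + t * w_i)
        alpha = 1.0 - np.exp(-sigma * delta)
        L_ind += T * alpha * radiance(x_s + t * w_i, w_i)
        T *= 1.0 - alpha
    return T, L_ind

vis_blocked, ind = trace(np.zeros(3), np.array([0.0, 0.0, 1.0]))  # hits ball
vis_open, _ = trace(np.zeros(3), np.array([1.0, 0.0, 0.0]))       # clear path
```

The ray toward the occluder yields near-zero visibility but nonzero indirect radiance (light re-emitted by the blocker), while the unobstructed ray keeps visibility at one; this is precisely how secondary-bounce shadowing enters the shading.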
In step S6, the spatial positions of the surface points are obtained from the ray samples using the depth information, and for each surface point the final rendered image is computed by a physically based rendering method that feeds geometry, material, visibility and illumination into a microfacet model. The physically based rendering formula is:
L_o(x, ω_o) = ∫_Ω f_r(x, ω_i, ω_o) L_i(x, ω_i) (ω_i · n) dω_i,
where n is the surface normal, L_i(x, ω_i) is the incident radiance received at x from direction ω_i, ω_o is the outgoing (viewing) direction, and f_r is the surface material (BRDF).
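A minimal shading sketch under stated assumptions: a standard Cook–Torrance GGX microfacet BRDF (a common choice, not necessarily the patent's exact material model), with the hemisphere integral replaced by a sum over a few discrete incident directions, each carrying its own visibility factor.

```python
import numpy as np

def ggx_brdf(n, w_i, w_o, albedo, roughness, f0=0.04):
    """Simplified Cook-Torrance microfacet BRDF (GGX distribution)."""
    h = w_i + w_o
    h = h / np.linalg.norm(h)
    ndoth, ndoti, ndoto = n @ h, max(n @ w_i, 1e-4), max(n @ w_o, 1e-4)
    a2 = roughness ** 4
    D = a2 / (np.pi * ((ndoth ** 2) * (a2 - 1.0) + 1.0) ** 2)
    k = (roughness + 1.0) ** 2 / 8.0
    G = (ndoti / (ndoti * (1 - k) + k)) * (ndoto / (ndoto * (1 - k) + k))
    F = f0 + (1.0 - f0) * (1.0 - max(h @ w_o, 0.0)) ** 5
    specular = D * G * F / (4.0 * ndoti * ndoto)
    return albedo / np.pi + specular     # diffuse + specular lobes

def shade(n, w_o, lights, albedo, roughness):
    """L_o = sum_i f_r(w_i, w_o) * V_i * L_i * (n . w_i) over sampled w_i."""
    L_o = np.zeros(3)
    for w_i, L_i, vis in lights:
        cos = n @ w_i
        if cos > 0.0:
            L_o += ggx_brdf(n, w_i, w_o, albedo, roughness) * vis * L_i * cos
    return L_o

n = np.array([0.0, 0.0, 1.0])
w_o = np.array([0.0, 0.0, 1.0])
# (direction, incident radiance, visibility); the second light is below the
# surface and is rejected by the cosine test.
lights = [(np.array([0.0, 0.0, 1.0]), np.array([2.0, 2.0, 2.0]), 1.0),
          (np.array([0.0, 0.0, -1.0]), np.array([5.0, 5.0, 5.0]), 1.0)]
rgb = shade(n, w_o, lights, albedo=np.array([0.6, 0.4, 0.3]), roughness=0.5)
```

In the full method, L_i combines the spherical Gaussian direct light and the traced indirect light, and the visibility factor comes from the secondary-ray transmittance of step S5.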
In step S7, taking the target video as supervision, the rendered images obtained by both volume rendering and physically based rendering in the above steps are constrained simultaneously. The main constraint is the rendering loss supervised by the target video, complemented by a material smoothness loss and geometric constraints; the model parameters are learned by minimizing these constraints.
The main constraint loss L_c is defined as:
L_c = ||C_v − C_gt||² + ||C_p − C_gt||²,
where C_v is the color produced by volume rendering, C_p is the color produced by physically based rendering, and C_gt is the ground-truth color used for supervision.
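The main loss amounts to two per-pixel squared errors against the same target frame. A minimal sketch (the relative weight `w_pbr` and the mean reduction are illustrative assumptions; the material smoothness and geometric terms are omitted):

```python
import numpy as np

def relight_loss(c_vol, c_pbr, c_gt, w_pbr=1.0):
    """Main rendering loss: both rendering branches supervised by the target.

    L_c = ||C_v - C_gt||^2 + w_pbr * ||C_p - C_gt||^2  (per-pixel mean).
    """
    l_vol = np.mean((c_vol - c_gt) ** 2)
    l_pbr = np.mean((c_pbr - c_gt) ** 2)
    return l_vol + w_pbr * l_pbr

rng = np.random.default_rng(3)
gt = rng.random((4, 4, 3))                      # toy 4x4 RGB target frame
loss_perfect = relight_loss(gt, gt, gt)         # both branches match target
loss_noisy = relight_loss(gt + 0.1, gt, gt)     # volume branch is off by 0.1
```

Supervising both branches with one target couples them: the volume branch anchors geometry while the physically based branch forces the decomposition into material, visibility and illumination to explain the same pixels.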
In step S8, after modeling is completed, relighting only requires replacing the direct illumination with a new environment map to obtain the dynamic human relighting video.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.