WO2021223134A1 - Micro-renderer-based method for acquiring reflection material of human face from single image - Google Patents

Micro-renderer-based method for acquiring reflection material of human face from single image

Info

Publication number
WO2021223134A1
WO2021223134A1 · PCT/CN2020/088883 · CN2020088883W
Authority
WO
WIPO (PCT)
Prior art keywords
image
face
network
reflection material
spherical harmonic
Prior art date
Application number
PCT/CN2020/088883
Other languages
French (fr)
Chinese (zh)
Inventor
翁彦琳
周昆
耿佳豪
王律迪
Original Assignee
浙江大学
杭州相芯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学 and 杭州相芯科技有限公司
Priority to PCT/CN2020/088883 priority Critical patent/WO2021223134A1/en
Publication of WO2021223134A1 publication Critical patent/WO2021223134A1/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 — 3D [Three Dimensional] image rendering
    • G06T 15/04 — Texture mapping

Definitions

  • The present invention relates to the field of facial capture, and in particular to a method for recovering the reflection material of a face from a single image.
  • The 3D morphable model (Volker Blanz and Thomas Vetter. 1999. A Morphable Model for the Synthesis of 3D Faces. In SIGGRAPH. https://doi.org/10.1145/311535.311556) was the first to successfully model variations in face shape and appearance as a linear combination of a set of orthogonal bases.
  • 3D morphable models have influenced the development of many later methods, such as (James Booth, Anastasios Roussos, Stefanos Zafeiriou, Allan Ponniah, and David Dunaway. 2016. A 3D Morphable Model Learnt from 10,000 Faces. In 2016 IEEE CVPR. 5543–5552).
  • Yamaguchi et al. proposed another deep-learning-based algorithm (Shugo Yamaguchi, Shunsuke Saito, Koki Nagano, Yajie Zhao, Weikai Chen, Kyle Olszewski, Shigeo Morishima, and Hao Li. 2018. High-fidelity facial reflectance and geometry inference from an unconstrained image. ACM Transactions on Graphics (TOG) 37, 4 (2018), 162.) that can infer high-quality face materials from a single unconstrained image and use them to render plausible, realistic results; however, their method cannot guarantee that the rendering result is consistent with the characteristics of the target image.
  • The purpose of the present invention is to address the shortcomings of the prior art and provide a method, based on a differentiable renderer, for recovering high-quality face reflection materials from a single image.
  • The present invention first detects the 3D geometric information of the face in the input image and initializes the latent-space variables of the face reflection materials and the spherical harmonic lighting variable. A neural-network decoder then decodes the latent variables into the corresponding face reflection materials, and a neural-network quality enhancer improves the quality of the materials produced by the decoder.
  • A physically based differentiable renderer renders the face from the reflection materials and the spherical harmonic lighting, and the color-space difference between the rendering result and the input face is minimized.
  • The latent-space and spherical harmonic lighting variables are updated iteratively until convergence. Decoding and quality enhancement of the final latent variables yield high-quality face reflection materials that match the characteristics of the input face, and rendering with these materials produces high-fidelity results that closely match the input. The method reaches the state of the art in face material generation and has high practical value.
  • A method for recovering the reflection material of a human face from a single image based on a differentiable renderer, comprising the following steps:
  • Step 1: compute the 3D information of the face in the input image, and from it obtain the texture-space face color map and the static information needed for physically based differentiable rendering. The 3D information includes a 3D model of the face, a rigid transformation matrix, and a projection matrix; the static information includes a shadow map T_sha and an environment normal map T_bn.
  • Step 2: based on the texture-space face color map obtained in step 1, a convolutional-neural-network encoder produces the initial values of the latent-space coefficients of the face reflection materials, and the initial value of the spherical harmonic lighting coefficient is also obtained; the subscript * ∈ {a, n, s} denotes the diffuse, normal, and specular reflection materials respectively.
  • step 1 includes the following sub-steps:
  • Step 1.3, calculation of static information for physically based differentiable rendering: using the 3D model, rigid transformation matrix, and projection matrix from step 1.1, the texture coordinates are rasterized as color information into image space to obtain the texture-coordinate image I_uv; the rigid transformation matrix and 3D model obtained in step 1.1 are used to obtain the rigidly transformed 3D model.
  • A ray-tracing algorithm computes the occlusion of each vertex of this 3D model in every direction and projects it onto the spherical harmonic basis, yielding the occlusion spherical harmonic coefficients of each vertex.
  • In addition, the proportion of the unoccluded area and the center direction of the unoccluded area are recorded to obtain the environment normal vector of each vertex.
  • Triangulation of the texture space and barycentric interpolation of the per-vertex occlusion coefficients and environment normal vectors yield the final shadow map T_sha and the environment normal map T_bn.
  • the Poisson algorithm is used to fill the void areas in the face color image in the texture space.
  • The convolutional-neural-network encoder and decoder are obtained by training them jointly as a U-shaped network; the training includes the following sub-steps:
  • Training data: obtain N target face images I_o and the corresponding diffuse, normal, and specular reflection materials, and map each face image to texture space to obtain the corresponding texture-space face color image I. These items form the training data of the U-shaped networks; each has a resolution of 1024×1024.
  • Diffuse material, normal material, and specular material each have a U-shaped network.
  • For the diffuse-material U-shaped network U_a, the input is the scaled texture-space face color image.
  • The encoder part E_a of U_a contains 9 down-sampling modules.
  • The first 8 down-sampling modules each consist of a convolutional layer with a 3×3 kernel and 2×2 stride, a batch normalization layer, and an LReLU activation function layer.
  • The last down-sampling module consists of a convolutional layer with a 1×1 kernel and 2×2 stride, a batch normalization layer, and an LReLU activation function layer.
  • The final encoding is a 1×1×1024 diffuse-material latent space.
  • The decoder part D_a of U_a contains 9 up-sampling modules.
  • Each up-sampling module consists of a resize-convolution layer with a 3×3 kernel and 2× upscaling, a batch normalization layer, and an LReLU activation function layer; a final convolutional layer with a 1×1 kernel, 1×1 stride, and Sigmoid activation produces an output with a final resolution of 512×512×3.
  • For the normal-material U-shaped network U_n, the input is the texture-space face color image scaled by area interpolation to a resolution of 256×256.
  • The encoder E_n contains 8 down-sampling modules: the first 7 each consist of a convolutional layer with a 3×3 kernel and 2×2 stride, a batch normalization layer, and an LReLU activation function layer; the last module uses a 1×1 kernel with 2×2 stride, a batch normalization layer, and an LReLU activation function layer.
  • The final encoding is a 1×1×512 normal-material latent space.
  • The decoder D_n contains 8 up-sampling modules: the first 7 each consist of a resize-convolution layer with a 3×3 kernel and 2× upscaling, a batch normalization layer, and an LReLU activation function layer; a final convolutional layer with a 1×1 kernel, 1×1 stride, and Sigmoid activation produces an output with a final resolution of 256×256×3.
  • For the specular-material U-shaped network U_s, the encoder E_s has the same structure as E_n. The first 7 up-sampling modules of D_s each consist of a resize-convolution layer with a 3×3 kernel and 2× upscaling, a batch normalization layer, and an LReLU activation function layer; a final convolutional layer with a 1×1 kernel, 1×1 stride, and Sigmoid activation produces an output with a final resolution of 256×256×1.
  • In each U-shaped network, the three highest-resolution modules of E_* and D_* are connected by skip connections, where * is a, n, s.
  • U * represents a U-shaped network, where the subscript * can be a, n, s representing diffuse reflection material, normal material, and specular reflection material, respectively.
  • The output resolution of the diffuse-material network is 512×512, while the output resolution of the normal and specular networks is 256×256.
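  • The following is a minimal PyTorch sketch of the diffuse-material U-shaped network U_a described above (9 down-sampling and 9 up-sampling modules, skip connections on the three highest-resolution levels, 1×1×1024 latent code). The channel widths, the LReLU slope, and the nearest-neighbor resize-convolution are assumptions for illustration rather than the patent's exact configuration.

```python
import torch
import torch.nn as nn

def down(cin, cout, k=3):
    """Down-sampling module: strided conv -> batch norm -> LReLU."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, stride=2, padding=k // 2),
        nn.BatchNorm2d(cout),
        nn.LeakyReLU(0.2, inplace=True),
    )

def up(cin, cout):
    """Up-sampling module: 2x resize-convolution -> batch norm -> LReLU."""
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(cin, cout, 3, stride=1, padding=1),
        nn.BatchNorm2d(cout),
        nn.LeakyReLU(0.2, inplace=True),
    )

class DiffuseUNet(nn.Module):
    """U_a: 512x512x3 color image -> 1x1x1024 latent -> 512x512x3 diffuse map."""
    def __init__(self):
        super().__init__()
        enc_ch = [32, 64, 128, 256, 512, 512, 512, 512]          # assumed widths
        self.enc = nn.ModuleList()
        cin = 3
        for c in enc_ch:
            self.enc.append(down(cin, c))
            cin = c
        self.enc.append(down(512, 1024, k=1))                    # 1x1 kernel, stride 2
        dec_ch = [512, 512, 512, 512, 512, 256, 128, 64, 32]     # assumed widths
        skip_ch = {6: 128, 7: 64, 8: 32}                         # skips on 3 finest levels
        self.dec = nn.ModuleList()
        cin = 1024
        for i, c in enumerate(dec_ch):
            self.dec.append(up(cin + skip_ch.get(i, 0), c))
            cin = c
        self.out = nn.Sequential(nn.Conv2d(32, 3, 1), nn.Sigmoid())

    def forward(self, x):
        feats = []
        for block in self.enc:
            x = block(x)
            feats.append(x)
        z = x                                                    # 1x1x1024 latent code
        skips = {6: feats[2], 7: feats[1], 8: feats[0]}          # 64, 128, 256 px features
        for i, block in enumerate(self.dec):
            if i in skips:
                x = torch.cat([x, skips[i]], dim=1)
            x = block(x)
        return self.out(x), z

model = DiffuseUNet().eval()
with torch.no_grad():
    albedo, z_a = model(torch.randn(1, 3, 512, 512))
print(albedo.shape, z_a.shape)   # torch.Size([1, 3, 512, 512]) torch.Size([1, 1024, 1, 1])
```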
  • The initial value of the spherical harmonic lighting of the input image is obtained by constructing a spherical harmonic lighting coefficient regression network.
  • The spherical harmonic lighting coefficient regression network consists of a convolutional-neural-network encoder and a regression module made of fully connected layers.
  • the training process includes the following steps:
  • A training data pair consists of {I_o, z_e}, where the spherical harmonic coefficient z_e is computed from the HDR environment light image I_e by projecting it onto the spherical harmonic basis (the formula is given as an image in the original document).
  • The resolution and detail quality of the reflection material images are improved by constructing reflection-material quality enhancement networks R_*; this involves the following sub-steps:
  • Training data: the face color images I used for training are fed into the U-shaped networks trained in step 2 to generate the low-resolution material predictions, which are paired with the original ground-truth materials of I to form the training data pairs; * denotes a, n, s.
  • The SRGAN network is used as the reflection-material quality enhancement network R_*, trained with a generative adversarial (GAN) scheme.
  • For the diffuse quality enhancement network R_a, the input resolution is 512×512 and the output image resolution is 1024×1024.
  • For the normal and specular quality enhancement networks, the first layer of the network accepts an image depth of 4: the input consists of the predicted material and the scaled texture-space face color image at an input resolution of 256×256, and the output is a 1024×1024 high-quality material image.
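  • A minimal sketch of an SRGAN-style generator for the 4× enhancement branch (interpreted here as R_s, whose 1-channel specular map plus the 3-channel scaled color image gives the stated input depth of 4). The number of residual blocks, feature widths, and the omission of the adversarial discriminator and training losses are assumptions.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.PReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)

class EnhanceGenerator(nn.Module):
    """SRGAN-style generator: 4-channel 256x256 input -> 1024x1024 material output."""
    def __init__(self, in_ch=4, out_ch=1, ch=64, n_blocks=8):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(in_ch, ch, 9, padding=4), nn.PReLU())
        self.body = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
        # two x2 pixel-shuffle stages give the overall x4 upsampling (256 -> 1024)
        self.up = nn.Sequential(
            nn.Conv2d(ch, ch * 4, 3, padding=1), nn.PixelShuffle(2), nn.PReLU(),
            nn.Conv2d(ch, ch * 4, 3, padding=1), nn.PixelShuffle(2), nn.PReLU())
        self.tail = nn.Conv2d(ch, out_ch, 9, padding=4)
    def forward(self, x):
        h = self.head(x)
        h = h + self.body(h)                  # global residual over the block stack
        return torch.sigmoid(self.tail(self.up(h)))

# hypothetical usage: enhance a predicted specular map guided by the face color image
spec_256 = torch.rand(1, 1, 256, 256)         # specular map predicted by U_s
color_256 = torch.rand(1, 3, 256, 256)        # scaled texture-space face color image
gen = EnhanceGenerator().eval()
with torch.no_grad():
    spec_1024 = gen(torch.cat([spec_256, color_256], dim=1))
print(spec_1024.shape)                        # torch.Size([1, 1, 1024, 1024])
```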
  • Step 4.2, quality enhancement of the material images: the material images generated in step 3 are fed to the quality enhancement networks trained in step 4.1 to obtain the high-quality material images T_*, where * denotes a, n, s.
  • step 5 includes the following sub-steps:
  • Using I_uv obtained in step 1.3, the maps T_a, T_n, and T_s output by the quality enhancement networks, together with the shadow map T_sha and the environment normal map T_bn, are bilinearly sampled to obtain the corresponding image-space material images t_*, where * is a, n, s, sha, bn, denoting the diffuse material, normal material, specular material, shadow map, and environment normal map respectively.
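  • A minimal sketch of this bilinear sampling step, assuming PyTorch: the texture-coordinate image I_uv (per-pixel UV values in [0, 1]) is used with grid_sample to pull each texture-space map into image space. The UV convention and tensor layout are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sample_texture(tex, i_uv):
    """Bilinearly sample a texture-space map into image space.

    tex:  (1, C, H_t, W_t) texture-space map, e.g. T_a, T_n, T_s, T_sha, T_bn
    i_uv: (1, H, W, 2) per-pixel texture coordinates in [0, 1] (the image I_uv)
    """
    grid = i_uv * 2.0 - 1.0              # grid_sample expects coordinates in [-1, 1]
    return F.grid_sample(tex, grid, mode="bilinear", align_corners=False)

# hypothetical usage
T_a = torch.rand(1, 3, 1024, 1024)       # high-quality diffuse map from the enhancer
I_uv = torch.rand(1, 600, 800, 2)        # texture-coordinate image from step 1.3
t_a = sample_texture(T_a, I_uv)          # image-space diffuse material, (1, 3, 600, 800)
```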
  • All pixels of I_uv are traversed, and the diffuse lighting of each pixel is computed with a physically based rendering formula (given as an image in the original document).
  • k denotes the order of the spherical harmonic basis term; z_e and v are re-projected using the spherical harmonic product-projection property to obtain w.
  • v denotes the per-direction visibility of each pixel and is read from t_sha.
  • c denotes the spherical harmonic coefficients of the clamped cosine max(0, cos θ), rotated to the normal direction n of the current pixel; n is read from t_n.
  • DFG denotes the pre-computed rendering transport term under the GGX distribution, and LD is computed from the lighting (its formula is given as an image in the original document).
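  • A minimal sketch of the per-pixel diffuse term implied by the description above, assuming the combined lighting-and-visibility coefficients w and the rotated clamped-cosine coefficients c (both with K = 9 spherical harmonic terms) have already been computed per pixel. The variable names and the single-channel irradiance are illustrative assumptions, not the patent's exact formula.

```python
import torch

def diffuse_shading(albedo, w, c):
    """Per-pixel SH diffuse term: albedo * sum_k w_k * c_k.

    albedo: (H, W, 3)  image-space diffuse material t_a
    w:      (H, W, K)  SH coefficients of lighting combined with visibility
    c:      (H, W, K)  SH coefficients of max(0, cos theta) rotated to the pixel normal
    """
    irradiance = (w * c).sum(dim=-1, keepdim=True)      # per-pixel scalar irradiance
    return albedo * irradiance.clamp(min=0.0)

# hypothetical usage with K = 9 coefficients
H, W, K = 600, 800, 9
shaded = diffuse_shading(torch.rand(H, W, 3), torch.rand(H, W, K), torch.rand(H, W, K))
print(shaded.shape)  # torch.Size([600, 800, 3])
```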
  • L denotes the loss function between the rendering result and the input image.
  • Because the renderer, the quality enhancement networks, and the decoders are all differentiable, the loss value is back-propagated to z_* and z_* is updated iteratively, where * is a, n, s, e, denoting the diffuse material, normal material, specular material, and spherical harmonic lighting respectively, until convergence. Finally, z_a, z_n, and z_s are fed to the diffuse, normal, and specular material decoders respectively, and their outputs are fed to the corresponding material quality enhancement networks.
  • This yields materials T_a, T_n, and T_s that match the characteristics of the person in the input image.
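  • A minimal sketch of this iterative optimization, assuming the decoders, quality enhancement networks, and a differentiable render() function are available as differentiable black boxes. The Adam optimizer, the learning rate, and the plain L1 color loss shown here are assumptions; the patent only specifies that the color-space difference is minimized and that roughly 150 iterations are needed.

```python
import torch

def fit_latents(decoders, enhancers, render, target_image, z_init, z_light, iters=150):
    """Iteratively optimize latent material codes and SH lighting against the input photo.

    decoders/enhancers: dicts of differentiable modules keyed by 'a', 'n', 's'
    render: differentiable renderer taking the enhanced maps and SH lighting
    z_init: dict of initial latent codes; z_light: initial SH lighting coefficients
    """
    z = {k: v.clone().requires_grad_(True) for k, v in z_init.items()}
    z_e = z_light.clone().requires_grad_(True)
    opt = torch.optim.Adam(list(z.values()) + [z_e], lr=1e-2)   # lr is an assumption
    for _ in range(iters):
        maps = {k: enhancers[k](decoders[k](z[k])) for k in z}  # decode, then enhance
        rendered = render(maps['a'], maps['n'], maps['s'], z_e)
        loss = (rendered - target_image).abs().mean()           # color-space L1 difference
        opt.zero_grad()
        loss.backward()
        opt.step()
    return {k: v.detach() for k, v in z.items()}, z_e.detach()
```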
  • The beneficial effect of the present invention is that it combines a neural-network-based non-linear decoder, quality enhancement, and a physically based differentiable renderer to compute face reflection materials from a single face image.
  • The method reaches the state of the art in face reflection material estimation, with a short processing time.
  • the present invention can be used in applications such as the capture of human face materials, the reconstruction of human faces, and the rendering of real human faces.
  • Figure 1 is the result of solving, reconstructing and re-rendering the material of the first face picture by applying the method of the present invention.
  • A is the input image
  • B is the reconstruction result using the solved face reflection materials
  • C is the rendering result under new lighting conditions
  • D is the diffuse material t a
  • E is the normal material t n
  • F is the specular material t s .
  • Figure 2 is the result of solving, reconstructing and re-rendering the material of the second face picture by applying the method of the present invention.
  • A is the input image
  • B is the reconstruction result using the solved face reflection materials
  • C is the rendering result under new lighting conditions
  • D is the diffuse material t a
  • E is the normal material t n
  • F is the specular material t s .
  • Figure 3 is the result of solving, reconstructing and re-rendering the material of the third face picture by applying the method of the present invention.
  • A is the input image
  • B is the reconstruction result using the solved face reflection materials
  • C is the rendering result under new lighting conditions
  • D is the diffuse material t a
  • E is the normal material t n
  • F is the specular material t s .
  • Figure 4 is the result of solving, reconstructing and re-rendering the material of the fourth face picture by applying the method of the present invention.
  • A is the input image
  • B is the reconstruction result using the solved face reflection materials
  • C is the rendering result under new lighting conditions
  • D is the diffuse material t a
  • E is the normal material t n
  • F is the specular material t s .
  • Figure 5 is the result of solving, reconstructing and re-rendering the material of the fifth face picture by applying the method of the present invention.
  • A is the input image
  • B is the reconstruction result using the solved face reflection materials
  • C is the rendering result under new lighting conditions
  • D is the diffuse material t a
  • E is the normal material t n
  • F is the specular material t s .
  • the core technology of the present invention uses a neural network to non-linearly express the complex face reflection material space, and uses a physically-based differentiable renderer to optimize the space to obtain a face reflection material that meets the characteristics of the input image.
  • The method is divided into five main steps: computing the 3D geometric information of the face; initializing the latent space of the face reflection materials and the spherical harmonic lighting; decoding the latent space into reflection material images; improving the quality of the face reflection materials; and iteratively optimizing the latent-space coefficients and the spherical harmonic lighting coefficient of the face reflection materials, then solving for the face reflection materials from the latent-space coefficients.
  • Figures 1-5 are the results of applying the method of the present invention to solving the material of five character pictures, reconstructing the face, and re-rendering under new lighting.
  • the left picture in the first row of each picture is the input image
  • the middle picture is the result of the reconstruction of the face reflection material obtained by the solution
  • the right picture is the rendering result under the new lighting conditions;
  • in the second row, the left picture is the diffuse reflection material t_a
  • the middle picture is the normal material t_n
  • the right picture is the specular reflection material t_s; these are obtained by bilinearly sampling the solved materials with I_uv.
  • Calculation of the 3D geometric information of the face in the image: compute the 3D information of the face in the input image, and obtain the texture-space face color map and the static information for physically based differentiable rendering.
  • The present invention uses the algorithm of (Chen Cao, Qiming Hou, and Kun Zhou. 2014a. Displaced dynamic expression regression for real-time facial tracking and animation. ACM Transactions on Graphics (TOG) 33, 4 (2014), 43.) to detect the two-dimensional feature points of the face in the input image, and uses (Justus Thies, Michael Zollhofer, Marc Stamminger, Christian Theobalt, and Matthias Nießner. 2016. Face2face: Real-time face capture and reenactment of rgb videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2387–2395.) to solve for the identity coefficients, the rigid transformation matrix, and the projection matrix; the deformable shape model is interpolated with the identity coefficients to obtain the 3D model of the input face:
  • Step 1.2: using the rigid transformation matrix and projection matrix obtained in step 1.1, project the 3D model obtained in step 1.1 onto the input image and establish a mapping between each vertex of the 3D model and the image pixels, so that the input image pixels can be mapped to the vertices of the 3D model. The mapping between the 3D model vertices and texture space is then used to map the image pixels into texture space, and the texture-space face color image is obtained by triangulating the texture space and interpolating barycentric coordinates within each triangle. Because of self-occlusion of the input face, the texture-space face color image contains hole regions, which are filled with the Poisson algorithm to obtain the final texture-space face color image.
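  • A minimal NumPy sketch of the vertex-to-pixel mapping used in step 1.2: each 3D vertex is transformed by the rigid transformation and projection matrices and the image color at the projected location is assigned to the vertex. The matrix conventions and the nearest-pixel lookup are illustrative assumptions; the texture-space rasterization and Poisson hole filling that follow are omitted.

```python
import numpy as np

def vertex_colors_from_image(vertices, rigid, proj, image):
    """Project 3D vertices into the image and sample a color per vertex.

    vertices: (N, 3) face mesh vertices
    rigid:    (4, 4) rigid transformation matrix
    proj:     (3, 4) projection matrix mapping camera space to pixel coordinates
    image:    (H, W, 3) input photograph
    """
    homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)  # (N, 4)
    cam = rigid @ homo.T                          # (4, N) camera-space positions
    pix = proj @ cam                              # (3, N) homogeneous pixel coordinates
    xy = (pix[:2] / pix[2]).T                     # (N, 2) pixel positions
    h, w = image.shape[:2]
    x = np.clip(np.round(xy[:, 0]).astype(int), 0, w - 1)
    y = np.clip(np.round(xy[:, 1]).astype(int), 0, h - 1)
    return image[y, x]                            # (N, 3) nearest-pixel vertex colors
```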
  • Step 1.3: using the 3D model, rigid transformation matrix, and projection matrix from step 1.1, the texture coordinates are rasterized as color information into image space to obtain the texture-coordinate image I_uv; the rigid transformation matrix and 3D model from step 1.1 give the rigidly transformed 3D model, and a ray-tracing algorithm computes the occlusion of each vertex of this model in every direction and projects it onto the spherical harmonic basis.
  • The first 9 spherical harmonic terms are used, yielding the occlusion spherical harmonic coefficients of each vertex; the proportion of the unoccluded area and the center direction of the unoccluded area are also recorded to obtain the environment normal vector of each vertex.
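  • A minimal Monte Carlo sketch of the per-vertex occlusion projection described above: visibility is sampled over the sphere (a placeholder trace_ray() stands in for the ray tracer), projected onto the first 9 real spherical harmonic basis functions, and the unoccluded directions are averaged to obtain the environment normal. The basis ordering, sample count, and helper names are assumptions.

```python
import numpy as np

def sh9(d):
    """First 9 real SH basis functions evaluated at a unit direction d = (x, y, z)."""
    x, y, z = d
    return np.array([
        0.282095,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z * z - 1),
        1.092548 * x * z, 0.546274 * (x * x - y * y)])

def vertex_occlusion_sh(position, trace_ray, n_samples=256, seed=0):
    """Monte Carlo projection of per-vertex visibility onto SH, plus environment normal."""
    rng = np.random.default_rng(seed)
    coeffs = np.zeros(9)
    open_dirs = []
    for _ in range(n_samples):
        d = rng.normal(size=3)
        d /= np.linalg.norm(d)                        # uniform direction on the sphere
        visible = 0.0 if trace_ray(position, d) else 1.0   # 1 if the ray escapes the mesh
        coeffs += visible * sh9(d)
        if visible:
            open_dirs.append(d)
    coeffs *= 4.0 * np.pi / n_samples                 # Monte Carlo estimator weight
    ratio = len(open_dirs) / n_samples                 # proportion of unoccluded area
    normal = np.mean(open_dirs, axis=0) if open_dirs else np.zeros(3)
    norm = np.linalg.norm(normal)
    return coeffs, ratio, (normal / norm if norm > 0 else normal)
```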
  • The face model database contains 84 3D digital characters, each of which includes a 3D model and diffuse, normal, and specular reflection materials.
  • The data in this embodiment come from 3D Scan Store. The CFD dataset (Debbie S Ma, Joshua Correll, and Bernd Wittenbrink. 2015. The Chicago face database: A free stimulus set of faces and norming data. Behavior research methods 47, 4 (2015), 1122–1135.) is used to augment the skin tones of the diffuse reflection materials, yielding about 4000 diffuse reflection material images.
  • The ambient light database contains 2957 HDR ambient light images I_e. Using these data, face images are rendered with image-based lighting and screen-space subsurface scattering.
  • U-shaped network structure Diffuse reflection material, normal material, and specular reflection material each have a U-shaped network.
  • Each U-shaped network is composed of encoder E, decoder D, and skip transfer.
  • For U_a, the input is the scaled texture-space face color image; the area-interpolation scaling algorithm is used to scale I to a resolution of 512×512.
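  • A one-line illustration of this scaling step, assuming that OpenCV's INTER_AREA interpolation is an acceptable stand-in for the area-interpolation algorithm mentioned above:

```python
import cv2
import numpy as np

I = (np.random.rand(1024, 1024, 3) * 255).astype(np.uint8)        # stand-in for the texture-space color image
I_512 = cv2.resize(I, (512, 512), interpolation=cv2.INTER_AREA)    # input for U_a
I_256 = cv2.resize(I, (256, 256), interpolation=cv2.INTER_AREA)    # input for U_n and U_s
```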
  • The encoder part E_a of U_a contains 9 down-sampling modules.
  • The first 8 down-sampling modules each consist of a convolutional layer with a 3×3 kernel and 2×2 stride, a batch normalization layer (S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.), and an LReLU activation function layer (Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. 2013. Rectifier nonlinearities improve neural network acoustic models.).
  • The last down-sampling module differs from the first eight only in that its kernel size is 1×1; the final encoding is a 1×1×1024 diffuse-material latent space.
  • The decoder part D_a of U_a contains 9 up-sampling modules, each consisting of a resize-convolution layer with a 3×3 kernel and 2× upscaling (Jon Gauthier. 2014. Conditional generative adversarial nets for convolutional face generation. Class Project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter semester 2014, 5 (2014), 2.), a batch normalization layer, and an LReLU activation function layer; a final convolutional layer with a 1×1 kernel, 1×1 stride, and Sigmoid activation produces an output with a final resolution of 512×512×3.
  • The above network structure can be expressed as (C32K3S2,BN,LReLU,Skip1)->(C64K3S2,BN,LReLU,Skip2)->(C128K3S2,BN,LReLU,Skip3)->(C258K3S2,BN,LReLU)->(C512K3S2,BN,LReLU)->(C512K3S2,BN,LReLU)->(C512K3S2,BN,LReLU)->(C512K3S2,BN,LReLU)->(C1024K1S2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K… (the remainder of the string is truncated in the original document; by analogy with U_n, the decoder ends with the Skip3/Skip2/Skip1 modules and the final Sigmoid output layer).
  • For U_n, the input is the texture-space face color image scaled by area interpolation to a resolution of 256×256.
  • The main difference from U_a is that the encoder E_n and the decoder D_n each have one fewer down-sampling and up-sampling module; the latent space size is 1×1×512 and the output of D_n has size 256×256×3.
  • The network structure is expressed as (C32K3S2,BN,LReLU,Skip1)->(C64K3S2,BN,LReLU,Skip2)->(C128K3S2,BN,LReLU,Skip3)->(C258K3S2,BN,LReLU)->(C512K3S2,BN,LReLU)->(C512K3S2,BN,LReLU)->(C512K1S2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC256K3R2,BN,LReLU)->(Skip3,RC128K3R2,BN,LReLU)->(Skip2,RC64K3R2,BN,LReLU)->(Skip1,RC32K3R2,BN,… (the final output layer is truncated in the original document).
  • The diffuse output resolution is 512×512, while the normal and specular output resolutions are 256×256.
  • The learning rate is 1e-4, and the optimizer used is the Adam optimizer (D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.).
  • The spherical harmonic lighting coefficient z_e of I_e is computed by projecting I_e onto the spherical harmonic basis (the formula is given as an image in the original document).
  • The training data pairs consist of {I_o, z_e}.
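  • A minimal sketch of how z_e can be computed from an equirectangular HDR environment image, consistent with the definitions given in the detailed description (pixel coordinates i, j mapped to spherical coordinates, first 9 spherical harmonic terms, 0 ≤ k < 9). The solid-angle weighting and the sh9 basis implementation are standard choices assumed here, not necessarily the patent's exact formula.

```python
import numpy as np

def sh9(theta, phi):
    """First 9 real SH basis functions at spherical coordinates (theta, phi)."""
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    return np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z * z - 1),
        1.092548 * x * z, 0.546274 * (x * x - y * y)], axis=-1)

def env_sh_coeffs(env):
    """Project an equirectangular HDR image (H, W, 3) onto 9 SH coefficients per channel."""
    H, W, _ = env.shape
    j, i = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    theta = np.pi * (j + 0.5) / H                      # polar angle from the row index
    phi = 2.0 * np.pi * (i + 0.5) / W                  # azimuth from the column index
    basis = sh9(theta, phi)                            # (H, W, 9)
    d_omega = (np.pi / H) * (2.0 * np.pi / W) * np.sin(theta)   # per-pixel solid angle
    weighted = env * d_omega[..., None]                # (H, W, 3)
    return np.einsum("hwk,hwc->kc", basis, weighted)   # (9, 3) coefficients z_e

z_e = env_sh_coeffs(np.random.rand(64, 128, 3).astype(np.float32))
print(z_e.shape)  # (9, 3)
```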
  • Decoding from latent space to reflection material space: a differentiable decoder based on a convolutional neural network decodes the latent-space coefficients of the face reflection materials into the corresponding reflection materials.
  • A differentiable quality enhancement network based on a convolutional neural network is then used to further improve the quality of the reflection materials.
  • Training data: the U-shaped networks trained in step 2.1 are applied to the color images I of the training data from step 2.1 to generate the low-resolution material predictions, which are paired with the ground-truth materials T_* of the training data from step 2.1; * denotes a, n, s.
  • For the normal and specular materials, the super-resolution networks R_n and R_s are likewise trained with a generative adversarial scheme. They differ from R_a in two respects: first, they take 256×256 material images as input and output 1024×1024 high-quality material images; second, in addition to the predicted material, their input also includes the scaled texture-space face color image.
  • Quality enhancement of the material images: the material images generated in step 3 are fed to the quality enhancement networks trained in step 4.1 to obtain the high-quality material images T_*, where * denotes a, n, s.
  • Iterative optimization of the latent space with the physically based differentiable renderer: by minimizing the difference between the rendering result of the physically based differentiable renderer and the input face image, the latent space of the face reflection materials is optimized iteratively, and the output face reflection materials are obtained through the decoding and quality enhancement operations.
  • L(ω) denotes the incident light from direction ω, V denotes visibility, and N denotes the normal direction; the integral is taken over the hemisphere about the normal.
  • The above formula can be further simplified using a spherical harmonic approximation (Peter-Pike Sloan, Jan Kautz, and John Snyder. 2002. Precomputed radiance transfer for real-time rendering in dynamic, low-frequency lighting environments. In ACM Transactions on Graphics (TOG), Vol. 21. ACM, 527–536.).
  • L and V can be expressed in spherical harmonics; v, recorded in t_sha, denotes the spherical harmonic coefficients of visibility. The term max(0, N·ω) can also be expressed in spherical harmonics, where c denotes the spherical harmonic coefficients of the truncated cosine function, obtained by rotating the coefficients of max(0, cos θ) to the normal direction n of the current pixel; n is recorded in t_n.
  • f_r denotes the light transport (BRDF) term following the GGX distribution (Bruce Walter, Stephen R. Marschner, Hongsong Li, and Kenneth E. Torrance. 2007. Microfacet Models for Refraction through Rough Surfaces.), and ω_o denotes the viewing direction.
  • DFG denotes the pre-computed GGX rendering transport term, and LD is computed from the lighting (its formula is given as an image in the original document).
  • L denotes the loss function between the rendering result and the input image.
  • The loss value is back-propagated to z_*, and z_* is updated iteratively until convergence; finally, z_a, z_n, and z_s are fed to the diffuse, normal, and specular material decoders respectively, and their outputs are fed to the corresponding material quality enhancement networks to obtain materials T_a, T_n, and T_s that match the characteristics of the person in the input image.
  • The subscript * can be a, n, s, e, denoting the diffuse material, normal material, specular material, and spherical harmonic lighting respectively.
  • the inventor implemented the embodiment of the present invention on a machine equipped with Intel Xeon E5-4650 CPU and NVidia GeForce RTX 2080Ti graphics processor (11GB). The inventor used all the parameter values listed in the specific embodiments to obtain all the experimental results shown in Figures 1-5.
  • The present invention can effectively output high-quality face reflection materials from an input portrait image. For an image with a face area of about 600×800, computing the 3D geometric information of the face takes about 30 seconds, initializing the latent space takes about 10 milliseconds, each round of forward computation in the iterative optimization (decoding, quality enhancement, rendering) takes about 250 milliseconds, and roughly 150 iterations are needed to converge, so the entire iterative process takes about 40 seconds. In addition, training the U-shaped networks takes 12 hours, training the spherical harmonic lighting coefficient regression network takes 4 hours, and training the material quality enhancement networks takes about 50 hours; these modules only need to be trained once and can then be used to process any input portrait image.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A micro-renderer (differentiable renderer)-based method for acquiring the reflection material of a human face from a single image. The method comprises five steps: calculating the 3D geometric information of the human face in an image; initializing a latent space of the face reflection material and the spherical harmonic lighting; decoding the latent space into face reflection material images; improving the quality of the face reflection material; and iteratively optimizing the latent-space coefficients and the spherical harmonic lighting coefficient of the face reflection material, then obtaining the face reflection material from the latent-space coefficients. By means of the method, a high-quality face material can be acquired iteratively from a frontal facial image with a neutral expression, and both facial reconstruction and re-rendering with the obtained material reach the level of the current state of the art. The method can be applied to a series of applications such as capturing face materials, facial reconstruction, and realistic face rendering.

Description

A method for recovering the reflection material of a human face from a single image based on a differentiable renderer

Technical field

The present invention relates to the field of facial capture, and in particular to a method for recovering the reflection material of a face from a single image.

Background
In the field of facial capture there is a class of professional capture methods based on specialized equipment. These methods require the target person to be in a specific, controlled environment, where professionals use purpose-built devices and algorithms to solve for the person's reflection materials. For example, the high-quality data obtained with Light Stages (Paul Debevec, Tim Hawkins, Chris Tchou, Haarm-Pieter Duiker, Westley Sarokin, and Mark Sagar. 2000. Acquiring the Reflectance Field of a Human Face. In Proceedings of SIGGRAPH 2000.) (Abhijeet Ghosh, Graham Fyffe, Borom Tunwattanapong, Jay Busch, Xueming Yu, and Paul Debevec. 2011. Multiview Face Capture using Polarized Spherical Gradient Illumination. ACM Trans. Graphics (Proc. SIGGRAPH Asia) (2011).) (Wan-Chun Ma, Tim Hawkins, Pieter Peers, Charles-Felix Chabert, Malte Weiss, and Paul Debevec. 2007. Rapid Acquisition of Specular and Diffuse Normal Maps from Polarized Spherical Gradient Illumination.) has driven the creation of many digital characters in the film and television industry. There are also methods such as (Thabo Beeler, Bernd Bickel, Paul Beardsley, Bob Sumner, and Markus Gross. 2010. High-Quality Single-Shot Capture of Facial Geometry. ACM Trans. on Graphics (Proc. SIGGRAPH) 29, 3 (2010), 40:1–40:9.) (Thabo Beeler, Fabian Hahn, Derek Bradley, Bernd Bickel, Paul Beardsley, Craig Gotsman, Robert W. Sumner, and Markus Gross. 2011. High-quality passive facial performance capture using anchor frames. ACM Trans. Graph. 30, 4 (Aug. 2011), 75:1–75:10. https://doi.org/10.1145/2010324.1964970) that use multi-camera rigs and shape-from-shading techniques to reconstruct pore-level facial detail. Graham et al. (P. Graham, Borom Tunwattanapong, Jay Busch, X. Yu, Andrew Jones, and Paul Debevec. 2013. Measurement-based Synthesis of Facial Microgeometry.) measure facial microgeometry with optical and elastic sensors. Such technology can be used to create high-fidelity digital characters, as in (J. von der Pahlen, J. Jimenez, E. Danvoye, Paul Debevec, Graham Fyffe, and Oleg Alexander. 2014. Digital Ira and Beyond: Creating a Real-Time Photoreal Digital Actor. Technical Report.). Although these methods can reconstruct high-fidelity digital faces, they require expensive, specialized equipment operated by professionals and are not accessible to ordinary users.
In addition, there are single-view facial capture methods. Among these, the 3D morphable model (Volker Blanz and Thomas Vetter. 1999. A Morphable Model for the Synthesis of 3D Faces. In SIGGRAPH. https://doi.org/10.1145/311535.311556) was the first to successfully model variations in face shape and appearance as a linear combination of a set of orthogonal bases. Over the years, 3D morphable models have influenced the development of many methods, such as (James Booth, Anastasios Roussos, Stefanos Zafeiriou, Allan Ponniah, and David Dunaway. 2016. A 3D Morphable Model Learnt from 10,000 Faces. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 5543–5552. https://doi.org/10.1109/CVPR.2016.598) (Ira Kemelmacher. 2013. Internet Based Morphable Model. 3256–3263. https://doi.org/10.1109/ICCV.2013.404) (Justus Thies, Michael Zollhofer, Marc Stamminger, Christian Theobalt, and Matthias Nießner. 2016. Face2face: Real-time face capture and reenactment of rgb videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2387–2395.). These parametric linear-model methods generate face shape and reflection materials by minimizing a fitting loss; their main drawback is that their quality is limited by the expressive power of the linear model, which struggles to represent facial characteristics realistically. Methods such as (Ayush Tewari, Michael Zollhöfer, Hyeongwoo Kim, Pablo Garrido, Florian Bernard, Patrick Pérez, and Christian Theobalt. 2017. MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction. In arXiv:1703.10580 [cs]. http://arxiv.org/abs/1703.10580) (Luan Tran, Feng Liu, and Xiaoming Liu. 2019. Towards High-fidelity Nonlinear 3D Face Morphable Model. In Proceedings of IEEE Computer Vision and Pattern Recognition. Long Beach, CA.) (Kyle Genova, Forrester Cole, Aaron Maschinot, Aaron Sarna, Daniel Vlasic, and William T. Freeman. 2018. Unsupervised Training for 3D Morphable Model Regression. In arXiv:1806.06098 [cs]. http://arxiv.org/abs/1806.06098) (Yu Deng, Jiaolong Yang, Sicheng Xu, Dong Chen, Yunde Jia, and Xin Tong. 2019. Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 0–0.) use neural networks to disentangle the geometry and reflection materials of a single face image. However, these methods are not designed to produce materials usable for high-fidelity face reconstruction, so their results still lack expressiveness.
There are also methods aimed at generating highly realistic face reflection materials. Saito et al. proposed an algorithm for inferring a high-resolution diffuse material from a single unconstrained image (Shunsuke Saito, Lingyu Wei, Liwen Hu, Koki Nagano, and Hao Li. 2017. Photorealistic Facial Texture Inference Using Deep Neural Networks. In arXiv:1612.00523 [cs]. http://arxiv.org/abs/1612.00523); their central idea is to use feature correlations in the intermediate layers of a neural network to blend high-resolution materials from a database and thereby generate fine facial detail. Yamaguchi et al. proposed another deep-learning-based algorithm (Shugo Yamaguchi, Shunsuke Saito, Koki Nagano, Yajie Zhao, Weikai Chen, Kyle Olszewski, Shigeo Morishima, and Hao Li. 2018. High-fidelity facial reflectance and geometry inference from an unconstrained image. ACM Transactions on Graphics (TOG) 37, 4 (2018), 162.) that can infer high-quality face materials from a single unconstrained image and render plausible, realistic results; however, their method cannot guarantee that the rendering result is consistent with the characteristics of the target image.
Summary of the invention

The purpose of the present invention is to address the shortcomings of the prior art and provide a method, based on a differentiable renderer, for recovering high-quality face reflection materials from a single image. The present invention first detects the 3D geometric information of the face in the input image and initializes the latent-space variables of the face reflection materials and the spherical harmonic lighting variable. A neural-network decoder then decodes the latent variables into the corresponding face reflection materials, and a neural-network quality enhancer improves the quality of the materials produced by the decoder. Finally, a physically based differentiable renderer renders the face from the reflection materials and the spherical harmonic lighting, the color-space difference between the rendering result and the input face is minimized, and the latent-space and spherical harmonic lighting variables are updated iteratively until convergence. Decoding and quality enhancement of the final latent variables yield high-quality face reflection materials that match the characteristics of the input face, and rendering with these materials produces high-fidelity results that closely match the input. The method reaches the state of the art in face material generation and has high practical value.
The purpose of the present invention is achieved through the following technical solution: a method for recovering the reflection material of a human face from a single image based on a differentiable renderer, comprising the following steps:

(1) Compute the 3D information of the face in the input image, and from it obtain the texture-space face color map and the static information needed for physically based differentiable rendering. The 3D information includes a 3D model of the face, a rigid transformation matrix, and a projection matrix; the static information includes the shadow map T_sha and the environment normal map T_bn.
(2) Based on the texture-space face color map obtained in step 1, a convolutional-neural-network encoder produces the initial values of the latent-space coefficients of the face reflection materials, and the initial value of the spherical harmonic lighting coefficient is obtained, where * ∈ {a, n, s} denotes the diffuse, normal, and specular reflection materials respectively (the symbols are given as images in the original document).
(3) A differentiable decoder implemented with a convolutional neural network decodes the latent-space coefficients of the face reflection materials into the corresponding reflection material images.
(4) The resolution and detail quality of the reflection material images obtained in step 3 are enhanced to obtain the images T_*.
(5) By minimizing the difference between the input face image and the rendering produced by the physically based differentiable renderer from the quality-enhanced reflection material images T_* of step 4, the latent-space coefficients of the face reflection materials and the spherical harmonic lighting coefficient are optimized iteratively; the optimized latent-space coefficients are then passed through the decoding and quality enhancement operations of steps 3-4 to obtain the face reflection materials.
Further, step 1 includes the following sub-steps:

(1.1) Computation of the 3D information of the face: detect the two-dimensional feature points of the face in the input image, and use a deformable model to optimize the identity coefficients, rigid transformation matrix, and projection matrix; the 3D model of the person is obtained by linear interpolation of the deformable model with the identity coefficients.

(1.2) Computation of the texture-space face color image: using the rigid transformation matrix and projection matrix obtained in step 1.1, project the 3D model obtained in step 1.1 onto the input image and establish a mapping between each vertex of the 3D model and the image pixels; map the input image pixels to the vertices of the 3D model, then use the mapping between the model vertices and texture space to map the image pixels into texture space, and obtain the texture-space face color image by triangulating the texture space and interpolating barycentric coordinates within each triangle.

(1.3) Computation of the static information for physically based differentiable rendering: using the 3D model, rigid transformation matrix, and projection matrix from step 1.1, rasterize the texture coordinates as color information into image space to obtain the texture-coordinate image I_uv; use the rigid transformation matrix and 3D model from step 1.1 to obtain the rigidly transformed 3D model, compute the occlusion of each vertex of this model in every direction with a ray-tracing algorithm, and project it onto the spherical harmonic basis to obtain the occlusion spherical harmonic coefficients of each vertex; in addition, record the proportion of the unoccluded area and its center direction to obtain the environment normal vector of each vertex. Finally, triangulate the texture space and interpolate the per-vertex occlusion spherical harmonic coefficients and environment normal vectors with barycentric coordinates to obtain the shadow map T_sha and the environment normal map T_bn.

Further, in step 1.2, the Poisson algorithm is used to fill the hole regions in the texture-space face color image.
Further, the convolutional-neural-network encoders and decoders are obtained by training them jointly as U-shaped networks; the training includes the following sub-steps:

(a) Training data: obtain N target face images I_o and the corresponding diffuse, normal, and specular reflection materials, and map each face image to texture space to obtain the corresponding texture-space face color image I. These items form the training data of the U-shaped networks; each has a resolution of 1024×1024.
(b) The diffuse, normal, and specular materials each have their own U-shaped network. For the diffuse-material network U_a, the input is the scaled texture-space face color image. The encoder part E_a of U_a contains 9 down-sampling modules: the first 8 each consist of a convolutional layer with a 3×3 kernel and 2×2 stride, a batch normalization layer, and an LReLU activation function layer; the last module uses a 1×1 kernel with 2×2 stride, a batch normalization layer, and an LReLU activation function layer, and the final encoding is a 1×1×1024 diffuse-material latent space. The decoder part D_a of U_a contains 9 up-sampling modules, each consisting of a resize-convolution layer with a 3×3 kernel and 2× upscaling, a batch normalization layer, and an LReLU activation function layer; a final convolutional layer with a 1×1 kernel, 1×1 stride, and Sigmoid activation produces an output with a final resolution of 512×512×3. For the normal-material network U_n, the input is the texture-space face color image scaled by area interpolation to a resolution of 256×256. Its encoder E_n contains 8 down-sampling modules: the first 7 each consist of a convolutional layer with a 3×3 kernel and 2×2 stride, a batch normalization layer, and an LReLU activation function layer; the last module uses a 1×1 kernel with 2×2 stride, a batch normalization layer, and an LReLU activation function layer, and the final encoding is a 1×1×512 normal-material latent space. The decoder D_n contains 8 up-sampling modules: the first 7 each consist of a resize-convolution layer with a 3×3 kernel and 2× upscaling, a batch normalization layer, and an LReLU activation function layer; a final convolutional layer with a 1×1 kernel, 1×1 stride, and Sigmoid activation produces an output with a final resolution of 256×256×3. For the specular-material network U_s, the encoder E_s has the same structure as E_n; the first 7 up-sampling modules of D_s each consist of a resize-convolution layer with a 3×3 kernel and 2× upscaling, a batch normalization layer, and an LReLU activation function layer, and a final convolutional layer with a 1×1 kernel, 1×1 stride, and Sigmoid activation produces an output with a final resolution of 256×256×1. In each U-shaped network, the three highest-resolution modules of E_* and D_* are connected by skip connections, where * is a, n, s.
(c) The training loss function is defined as follows (the loss formulas are given as images in the original document). U_* denotes a U-shaped network, where the subscript * can be a, n, s, denoting the diffuse, normal, and specular materials respectively; the scaled texture-space face color image is the network input, and the network output material image is compared with the correspondingly scaled ground-truth material image. The diffuse output (and its scaled ground truth) has a resolution of 512×512, while the normal and specular outputs have a resolution of 256×256.
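The loss formulas appear only as images in the original document. The sketch below shows a plausible per-pixel L1 reconstruction loss for one U-shaped network, assuming the prediction is compared against the ground-truth material scaled to the same resolution; the actual loss used by the patent may contain additional terms.

```python
import torch

def unet_loss(pred, gt_material):
    """Plausible per-pixel L1 reconstruction loss between the U-net output and the
    ground-truth material scaled to the same resolution (an assumption, not the
    patent's exact formula)."""
    return (pred - gt_material).abs().mean()

loss = unet_loss(torch.rand(1, 3, 512, 512), torch.rand(1, 3, 512, 512))
```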
Further, in step 2, the initial value of the spherical harmonic lighting of the input image is obtained by constructing a spherical harmonic lighting coefficient regression network, which consists of a convolutional-neural-network encoder and a regression module made of fully connected layers. The training process includes the following steps:

(A) A training data pair consists of {I_o, z_e}, where the spherical harmonic coefficient z_e is computed from the HDR environment light image I_e by projecting it onto the spherical harmonic basis (the formula is given as an image in the original document). Here i, j denote the Cartesian pixel coordinates along the image width W and height H, Y_k denotes the spherical harmonic basis polynomial, k denotes the spherical harmonic term index with 0 ≤ k < 9, and φ denotes the conversion from image coordinates i, j to spherical coordinates θ, φ (the conversion equations are given as images in the original document).

(B) I_o is scaled to a resolution of 256×256 as the network input, and the network is trained end-to-end in a supervised manner with the L2 norm as the loss function.
Further, in step 4, the resolution and detail quality of the reflection material images are improved by constructing reflection material quality enhancement networks R_*, which specifically includes the following sub-steps:
(4.1) Train the reflection material quality enhancement networks based on convolutional neural networks, as follows:
(4.1.1) Training data: the face color images I used for training are fed into the U-shaped networks trained in step 2 to generate decoded material images, which together with the original materials of the face color images I compose the training data pairs, where * denotes a, n, s.
(4.1.2) Training method: the SRGAN network is adopted as the reflection material quality enhancement network R_* and trained in a generative adversarial (GAN) manner. For the diffuse material quality enhancement network R_a, the input is the 512×512 material image produced by the decoder and the output image resolution is 1024×1024. For the normal material quality enhancement network R_n and the specular material quality enhancement network R_s, the first layer of the network accepts an image depth of 4; the input comprises the decoder output material image and the scaled texture-space face color image, the input resolution is 256×256, and the output is a high-quality material image with a resolution of 1024×1024.
(4.2) Quality enhancement of the material images: based on the material images generated in step 3, quality enhancement is performed with the networks trained in step 4.1 to obtain the high-quality material images T_*, where * denotes a, n, s. Writing T̃_* for the material image generated in step 3 and Ĩ for the texture-space face color image scaled to 256×256, the whole process can be expressed as T_a = R_a(T̃_a), T_n = R_n(T̃_n, Ĩ) and T_s = R_s(T̃_s, Ĩ) (the corresponding expressions are given as formula images PCTCN2020088883-appb-000036 to PCTCN2020088883-appb-000038).
Further, step 5 includes the following sub-steps:
(5.1) Physically based forward rendering using the reflection materials and spherical harmonic lighting:
(5.1.1) Compute the diffuse reflection of the face: according to I_uv obtained in step 1.3, bilinearly sample T_a, T_n and T_s output by the quality enhancement networks, as well as the shadow map T_sha and the environment normal map T_bn, to obtain the corresponding image-space material images t_*, where * is a, n, s, sha, bn, denoting the diffuse material, normal material, specular material, shadow map and environment normal map respectively. Traverse all pixels in I_uv and compute the diffuse lighting of each pixel with the physically based rendering formula given as formula image PCTCN2020088883-appb-000040, in which k denotes the order of the spherical harmonic polynomial; z_e and v are re-projected using the spherical harmonic product-projection property to obtain w, where v denotes the per-pixel visibility in each direction and is recorded in t_sha; c is obtained by rotating the spherical harmonic coefficients of max(0, cosθ) to the current pixel normal direction n, which is recorded in t_n.
(5.1.2) Compute the specular reflection of the face and the rendering result: the specular highlight reflection of the face is computed as
L_s = DFG · LD,
where DFG denotes the pre-computed rendering transfer term obeying the GGX distribution, and LD is computed according to the formula given as formula image PCTCN2020088883-appb-000041. The diffuse and specular reflections are then fused according to the formula given as formula image PCTCN2020088883-appb-000042 to compute the rendering result of each pixel in I_uv, which is the final rendering result.
(5.2) Iteratively optimize the material latent space variables and the spherical harmonic illumination coefficient z_e by minimizing the objective given as formula image PCTCN2020088883-appb-000044, where L denotes the loss function and the remaining operator denotes the differentiable rendering process of step 5.1. Using the differentiable renderer, the differentiable quality enhancement networks and the differentiable decoders, the loss value is back-propagated to z_*, and z_* is updated iteratively until convergence, where * is a, n, s, e, denoting the diffuse material, normal material, specular material and spherical harmonic lighting respectively. Finally, z_a, z_n and z_s are fed into the diffuse, normal and specular material decoders respectively, and their outputs are fed into the corresponding material quality enhancement networks, yielding materials T_a, T_n, T_s that match the characteristics of the person in the input image.
The beneficial effects of the present invention are as follows. The present invention proposes a method that combines a neural-network-based nonlinear decoder and quality enhancer with a physically based differentiable renderer to compute the face reflection material from a single face image. The neural-network-based nonlinear decoder and quality enhancer express the complex space of face reflection materials, while the physically based differentiable renderer optimizes within that space, so that the solved face reflection material conforms to the features of the input face and the rendered result is realistic and resembles the input face. The method reaches the state of the art in face reflection material estimation, with a short processing time. The present invention can be used in applications such as face material capture, face reconstruction and realistic face rendering.
Description of the drawings
Figure 1 shows the results of applying the method of the present invention to solve, reconstruct and re-render the material of the first face picture. In the figure, A is the input image, B is the reconstruction result using the solved face reflection material, and C is the rendering result under new lighting conditions; D is the diffuse material t_a, E is the normal material t_n, and F is the specular material t_s.
Figure 2 shows the results of applying the method of the present invention to solve, reconstruct and re-render the material of the second face picture. In the figure, A is the input image, B is the reconstruction result using the solved face reflection material, and C is the rendering result under new lighting conditions; D is the diffuse material t_a, E is the normal material t_n, and F is the specular material t_s.
Figure 3 shows the results of applying the method of the present invention to solve, reconstruct and re-render the material of the third face picture. In the figure, A is the input image, B is the reconstruction result using the solved face reflection material, and C is the rendering result under new lighting conditions; D is the diffuse material t_a, E is the normal material t_n, and F is the specular material t_s.
Figure 4 shows the results of applying the method of the present invention to solve, reconstruct and re-render the material of the fourth face picture. In the figure, A is the input image, B is the reconstruction result using the solved face reflection material, and C is the rendering result under new lighting conditions; D is the diffuse material t_a, E is the normal material t_n, and F is the specular material t_s.
Figure 5 shows the results of applying the method of the present invention to solve, reconstruct and re-render the material of the fifth face picture. In the figure, A is the input image, B is the reconstruction result using the solved face reflection material, and C is the rendering result under new lighting conditions; D is the diffuse material t_a, E is the normal material t_n, and F is the specular material t_s.
Detailed description of the embodiments
The core technology of the present invention uses neural networks to nonlinearly express the complex space of face reflection materials, and uses a physically based differentiable renderer to optimize this space so as to obtain a face reflection material that matches the features of the input image. The method is divided into the following five main steps: computation of the 3D geometric information of the face; initialization of the face reflection material latent space and of the spherical harmonic lighting; decoding from the latent space to the reflection material images; quality improvement of the face reflection materials; and iterative optimization of the latent space coefficients of the face reflection material and the spherical harmonic illumination coefficients, with the face reflection material solved from the latent space coefficients of the reflection material.
The steps of the present invention are described in detail below. Figures 1-5 show the results of applying the method of the present invention to five portrait pictures: material estimation, face reconstruction and re-rendering under new lighting. In each figure, the left image of the first row is the input image, the middle image is the reconstruction result using the solved face reflection material, and the right image is the rendering result under new lighting conditions; in the second row, the left image is the diffuse material t_a, the middle image is the normal material t_n, and the right image is the specular material t_s, obtained by bilinearly sampling the solved materials with I_uv.
1. Computation of the 3D geometric information of the face in the image: compute the 3D information of the face in the input image, and obtain the texture-space face color map as well as the static information used for physically based differentiable rendering.
1.1 Computation of the 3D face information
The present invention uses the algorithm of (Chen Cao, Qiming Hou, and Kun Zhou. 2014a. Displaced dynamic expression regression for real-time facial tracking and animation. ACM Transactions on Graphics (TOG) 33, 4 (2014), 43.) to detect the two-dimensional facial feature points in the input image, and uses (Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, and Matthias Nießner. 2016. Face2face: Real-time face capture and reenactment of rgb videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2387–2395.) to solve the identity coefficients, the rigid transformation matrix and the projection matrix. By interpolating the deformable shape model with the identity coefficients, the 3D model of the input face is obtained.
1.2 Computation of the texture-space face color image
Using the rigid transformation matrix and projection matrix obtained in step 1.1, the 3D model obtained in step 1.1 is projected onto the input image and a mapping between each vertex of the 3D model and the image pixels is established; the input image pixels can thereby be mapped to the vertices of the 3D model, and, using the mapping between the 3D model vertices and the texture space, the image pixels are mapped into texture space. The texture-space face color image is then obtained by triangulating the texture space and interpolating with barycentric coordinates. Since the input face may be partly occluded, the texture-space face color image contains hole regions; the Poisson algorithm is used to fill these holes, yielding the final texture-space face color image.
1.3 Computation of the static information for physically based differentiable rendering
Using the 3D model, the rigid transformation matrix and the projection matrix from step 1.1, the texture coordinates are rasterized into image space as color information to obtain the texture coordinate image I_uv. Using the rigid transformation matrix and the 3D model obtained in step 1.1, the rigidly transformed 3D model is obtained; a ray tracing algorithm is used to compute the occlusion of each vertex of this 3D model in all directions, and the result is projected onto the spherical harmonic polynomials (order 9 in this embodiment), yielding the occlusion spherical harmonic coefficients of each vertex. In addition, the proportion of unoccluded directions and the center direction of the unoccluded region are recorded to obtain the environment normal vector of each vertex. Finally, through triangulation of the texture space and barycentric interpolation of each vertex's occlusion spherical harmonic coefficients and environment normal vector, the final shadow map T_sha and environment normal map T_bn are obtained.
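As a non-limiting illustration of this pre-computation, the following Python sketch projects the directional visibility of a single vertex onto the first 9 real spherical harmonic basis functions and accumulates a bent (environment) normal; the sampling scheme, the sample count and the is_occluded ray query are assumptions and not part of the disclosed embodiment:

```python
import numpy as np

def sh_basis_9(d):
    """First 9 real SH basis values for a unit direction d = (x, y, z)."""
    x, y, z = d
    return np.array([
        0.282095,                    # Y_0^0
        0.488603 * y,                # Y_1^-1
        0.488603 * z,                # Y_1^0
        0.488603 * x,                # Y_1^1
        1.092548 * x * y,            # Y_2^-2
        1.092548 * y * z,            # Y_2^-1
        0.315392 * (3 * z * z - 1),  # Y_2^0
        1.092548 * x * z,            # Y_2^1
        0.546274 * (x * x - y * y),  # Y_2^2
    ])

def vertex_visibility_sh(is_occluded, n_samples=512, seed=None):
    """Project a vertex's directional visibility onto 9 SH coefficients.

    is_occluded(d) -> bool is assumed to wrap a ray-tracing query from the
    vertex along direction d (a placeholder for the actual tracer).
    """
    rng = np.random.default_rng(seed)
    coeffs = np.zeros(9)
    visible_dirs = []
    for _ in range(n_samples):
        d = rng.normal(size=3)               # uniform direction on the sphere
        d /= np.linalg.norm(d)
        if not is_occluded(d):
            coeffs += sh_basis_9(d)
            visible_dirs.append(d)
    coeffs *= 4.0 * np.pi / n_samples        # Monte Carlo estimate of the projection integral
    unoccluded_ratio = len(visible_dirs) / n_samples
    bent_normal = np.mean(visible_dirs, axis=0) if visible_dirs else np.zeros(3)
    if np.linalg.norm(bent_normal) > 0:
        bent_normal /= np.linalg.norm(bent_normal)
    return coeffs, unoccluded_ratio, bent_normal
```

The per-vertex coefficients and bent normals would then be rasterized into texture space to form T_sha and T_bn, as described above.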
2. Initialization of the face reflection material latent space and of the spherical harmonic lighting: based on the texture-space face color map obtained in step 1, the initial values of the face reflection material latent space coefficients and of the spherical harmonic lighting are obtained through encoders based on convolutional neural networks.
2.1 Training the U-shaped networks based on convolutional neural networks
Training data. The face model database contains 84 3D digital characters; each character includes a 3D model, a diffuse material, a normal material and a specular material. In this embodiment the data come from 3D Scan Store. Face photos from the CFD (Debbie S Ma, Joshua Correll, and Bernd Wittenbrink. 2015. The Chicago face database: A free stimulus set of faces and norming data. Behavior Research Methods 47, 4 (2015), 1122–1135.) are used to augment the skin tones of the diffuse materials, yielding about 4000 diffuse material images. In addition, the ambient light database contains 2957 HDR ambient light images I_e. Using the above data, face images are rendered with image-based lighting and screen-space subsurface scattering techniques; during rendering, the 3D model and the HDR ambient light image I_e are rotated randomly. In total about one hundred thousand target face images I_o are obtained. The face images are mapped into texture space to obtain the corresponding texture-space face color images I. Each texture-space face color image I together with its diffuse, normal and specular materials composes the training data of the U-shaped networks, where each item has a resolution of 1024×1024.
Network structure. U-shaped network structure: the diffuse material, the normal material and the specular material each have their own U-shaped network. Each U-shaped network consists of an encoder E, a decoder D and skip connections. For the diffuse U-shaped network U_a, the input is the scaled texture-space face color image, obtained by scaling I to a resolution of 512×512 with an area-interpolation scaling algorithm. The encoder part E_a of U_a contains 9 down-sampling modules; the first 8 each contain a convolutional layer with a kernel size of 3×3 and a stride of 2×2, a batch normalization layer (S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.) and an LReLU activation layer (Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. 2013. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML, Vol. 30. 3.); the last module differs from the first eight in that the kernel size is 1×1, and the final encoding is the 1×1×1024 diffuse material latent space. The decoder part D_a of U_a contains 9 up-sampling modules; each contains a resize convolution layer with a kernel size of 3×3 and an upscaling factor of two (Jon Gauthier. 2014. Conditional generative adversarial nets for convolutional face generation. Class Project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter semester 2014, 5 (2014), 2.), a batch normalization layer and an LReLU activation layer, and a final convolutional layer with a kernel size of 1×1, a stride of 1×1 and a Sigmoid activation produces the output with a final resolution of 512×512×3. In addition, the three highest-resolution modules of E_a and D_a are connected by skip connections (Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. 2017. Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE conference on computer vision and pattern recognition (2017).). The above network structure can be written as (C32K3S2,BN,LReLU,Skip1)->(C64K3S2,BN,LReLU,Skip2)->(C128K3S2,BN,LReLU,Skip3)->(C258K3S2,BN,LReLU)->(C512K3S2,BN,LReLU)->(C512K3S2,BN,LReLU)->(C512K3S2,BN,LReLU)->(C512K3S2,BN,LReLU)->(C1024K1S2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(R256K3R2,BN,LReLU)->(Skip3,RC128K3R2,BN,LReLU)->(Skip2,RC64K3R2,BN,LReLU)->(Skip1,RC32K3R2,BN,LReLU)->(C3K1S1,Sigmoid), where CxKySz denotes a convolutional layer with stride z, kernel size y and output depth x, BN denotes batch normalization, RCxKyRz denotes a resize convolution layer with scaling factor z, kernel size y and output depth x, and Skip denotes a skip connection, the following number identifying the connection: the same number denotes the same pair of skip-connected modules. For the normal-material U-shaped network U_n, the input is the texture-space face color image scaled by area interpolation to a resolution of 256×256; the main differences from U_a are that the encoder E_n and the decoder D_n each have one fewer down-sampling and up-sampling layer, the latent space size is 1×1×512, and the output size of D_n is 256×256×3. The network structure is written as (C32K3S2,BN,LReLU,Skip1)->(C64K3S2,BN,LReLU,Skip2)->(C128K3S2,BN,LReLU,Skip3)->(C258K3S2,BN,LReLU)->(C512K3S2,BN,LReLU)->(C512K3S2,BN,LReLU)->(C512K3S2,BN,LReLU)->(C512K1S2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(RC512K3R2,BN,LReLU)->(R256K3R2,BN,LReLU)->(Skip3,RC128K3R2,BN,LReLU)->(Skip2,RC64K3R2,BN,LReLU)->(Skip1,RC32K3R2,BN,LReLU)->(C3K1S1,Sigmoid). For the specular U-shaped network U_s, its encoder structure E_s is the same as E_n; the only difference between D_s and D_n is that the output depth of the last convolutional layer is 1, so the output size of D_s is 256×256×1.
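As a non-limiting sketch, the down-sampling and up-sampling modules and the diffuse network U_a described above could be organized as follows in PyTorch; the LReLU slope, the nearest-neighbor resize convolution, and reading the "C258" entry of the architecture string as 256 channels are assumptions:

```python
import torch
import torch.nn as nn

class Down(nn.Module):
    """Down-sampling module: stride-2 conv -> BatchNorm -> LReLU."""
    def __init__(self, in_ch, out_ch, kernel=3):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel, stride=2, padding=kernel // 2),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2, inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class Up(nn.Module):
    """Up-sampling module: x2 resize convolution -> BatchNorm -> LReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2, inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class UNetDiffuse(nn.Module):
    """Rough sketch of U_a: 9 down modules, 9 up modules, 3 skip connections."""
    def __init__(self):
        super().__init__()
        chans = [3, 32, 64, 128, 256, 512, 512, 512, 512]
        self.downs = nn.ModuleList(
            [Down(chans[i], chans[i + 1]) for i in range(8)]
            + [Down(512, 1024, kernel=1)])             # 1x1x1024 latent code
        up_in = [1024, 512, 512, 512, 512, 512, 256 + 128, 128 + 64, 64 + 32]
        up_out = [512, 512, 512, 512, 512, 256, 128, 64, 32]
        self.ups = nn.ModuleList([Up(i, o) for i, o in zip(up_in, up_out)])
        self.head = nn.Sequential(nn.Conv2d(32, 3, 1), nn.Sigmoid())
    def forward(self, x):                               # x: B x 3 x 512 x 512
        skips = []
        for i, down in enumerate(self.downs):
            x = down(x)
            if i < 3:                                   # 3 highest-resolution feature maps
                skips.append(x)
        for i, up in enumerate(self.ups):
            if i >= 6:                                  # skip connections on the last 3 modules
                x = torch.cat([x, skips[8 - i]], dim=1)
            x = up(x)
        return self.head(x)                             # B x 3 x 512 x 512 diffuse material
```

U_n and U_s would follow the same pattern with one fewer down/up module and a 512-dimensional latent code, and U_s with a single-channel output head.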
Loss function. U_* denotes a U-shaped network, where the subscript * can be a, n, s, denoting the diffuse material, normal material and specular material respectively. The loss function is defined by the formulas given as formula images PCTCN2020088883-appb-000053 and PCTCN2020088883-appb-000054, in which the scaled texture-space face color image is the network input, and the material image output by the U-shaped network is compared with the corresponding scaled ground-truth material image. For the diffuse network, the scaled input and material images have a resolution of 512×512, while for the normal and specular networks the resolution is 256×256. During training, the learning rate is 1e-4, and the optimizer used is the Adam optimizer (D.P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.).
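A minimal supervised training step consistent with this description might look like the following; since the exact loss expressions are only given as formula images, an L1 reconstruction loss is assumed here:

```python
import torch

def train_step(u_net, optimizer, scaled_input, scaled_gt):
    """One supervised step for a material U-network (L1 loss assumed)."""
    optimizer.zero_grad()
    pred = u_net(scaled_input)                 # e.g. 512x512x3 diffuse prediction
    loss = torch.nn.functional.l1_loss(pred, scaled_gt)
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.Adam(u_net.parameters(), lr=1e-4)   # Adam and 1e-4 as stated in the text
```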
2.2 Training the spherical harmonic illumination coefficient regression network based on a convolutional neural network
Training data. From the target face images I_o obtained in 2.1 and the HDR ambient light images I_e used to render them, the spherical harmonic illumination coefficients z_e of I_e are computed by projecting I_e onto the spherical harmonic basis (formula image PCTCN2020088883-appb-000062), where i, j denote the Cartesian coordinates along the image width W and height H, Y_k denotes the spherical harmonic polynomial, k denotes the order of the spherical harmonic with 0 ≤ k < 9, and φ denotes the conversion from the image coordinates i, j to the spherical coordinates θ, φ, whose expressions are given as formula images PCTCN2020088883-appb-000064 and PCTCN2020088883-appb-000065. Finally, the training data pairs are composed of {I_o, z_e}.
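For illustration, the following sketch projects an equirectangular HDR environment map onto 9 spherical harmonic coefficients per color channel; the pixel-to-sphere coordinate convention and the solid-angle weighting are assumptions, since the exact formulas are only given as formula images:

```python
import numpy as np

def env_map_sh9(env):
    """Project an H x W x 3 HDR lat-long environment map onto 9 SH coefficients
    per color channel (standard solid-angle-weighted projection assumed)."""
    H, W, _ = env.shape
    j, i = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    theta = (j + 0.5) / H * np.pi            # polar angle, assumed convention
    phi = (i + 0.5) / W * 2.0 * np.pi        # azimuth, assumed convention
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    Y = np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z * z - 1),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ])                                        # 9 x H x W basis images
    dw = np.sin(theta) * (np.pi / H) * (2.0 * np.pi / W)   # per-pixel solid angle
    # z_e[k, c] = sum over pixels of env * Y_k * solid angle
    return np.einsum('khw,hwc,hw->kc', Y, env, dw)
```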
Network training. A network structure similar to VGG (Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).) is used to build the spherical harmonic illumination coefficient regression network E_e. Specifically, I_o is scaled to a resolution of 256×256 and passed through the same 10 convolutional layers as VGG, and finally an average pooling layer and a fully connected layer output the spherical harmonic illumination coefficients z_e. The regression network is trained with the L2 norm between the network output and the ground-truth spherical harmonic illumination coefficients as the loss function. The training learning rate is 1e-4, and the optimizer used is Adam.
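A rough sketch of such a VGG-style regression network is given below; the channel widths, the pooling placement and the 27-dimensional output (9 coefficients per RGB channel) are assumptions:

```python
import torch
import torch.nn as nn

class SHRegressor(nn.Module):
    """Sketch of E_e: a small VGG-style conv stack, average pooling,
    and a fully connected layer regressing the SH lighting coefficients."""
    def __init__(self, n_out=27):
        super().__init__()
        layers, in_ch = [], 3
        for out_ch in [64, 64, 128, 128, 256, 256, 512, 512, 512, 512]:  # 10 conv layers
            layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
            if out_ch != in_ch:                    # halve resolution when the width grows
                layers.append(nn.MaxPool2d(2))
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512, n_out)
    def forward(self, x):                          # x: B x 3 x 256 x 256
        h = self.pool(self.features(x)).flatten(1)
        return self.fc(h)

# loss = torch.nn.functional.mse_loss(pred_ze, gt_ze)   # L2 loss, as stated in the text
```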
2.3 Initializing the material latent space variables
The scaled texture-space face color images are fed into the encoders E_* of the three U-shaped networks trained in 2.1, and the resulting codes are taken as the initial values of the material latent space variables z_*. In addition, the sets of feature maps output by the first three down-sampling modules of each encoder are recorded; * is a, n, s, denoting the diffuse material, the normal material and the specular material respectively. This process can be expressed by the formula given as formula image PCTCN2020088883-appb-000069.
2.4 Initializing the spherical harmonic lighting
The face photograph scaled to 256×256 is fed into the spherical harmonic illumination regression network E_e trained in step 2.2 to obtain the spherical harmonic illumination coefficients, which are taken as the initial value of the spherical harmonic illumination coefficient z_e. This process can be expressed by the formula given as formula image PCTCN2020088883-appb-000072.
3. Decoding from the latent space to the reflection material space: differentiable decoders implemented with convolutional neural networks decode the coefficients of the face reflection material latent space into the corresponding reflection materials.
3.1 Decoding
The latent code z_* and the recorded encoder feature maps are fed into the decoder D_* of the U-shaped network U_* trained in step 2.1; the decoding operation yields the corresponding material image, as expressed by the formula given as formula image PCTCN2020088883-appb-000074.
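The initialization and decoding passes of sections 2.3-3.1 could be wired up as in the following non-limiting sketch; the module interfaces (encoders returning both the latent code and the skip feature maps, decoders consuming both) are assumptions about how the trained networks are exposed:

```python
import torch

@torch.no_grad()
def init_latents(encoders, tex_face_512, tex_face_256, sh_regressor, face_photo_256):
    """Encode the scaled texture-space face images into initial latent codes
    (section 2.3) and regress the initial SH lighting (section 2.4)."""
    latents, skips = {}, {}
    inputs = {'a': tex_face_512, 'n': tex_face_256, 's': tex_face_256}
    for key in ('a', 'n', 's'):
        z, feats = encoders[key](inputs[key])    # z: 1x1 latent code, feats: 3 skip maps
        latents[key], skips[key] = z, feats
    latents['e'] = sh_regressor(face_photo_256)  # initial SH illumination coefficients
    return latents, skips

def decode_materials(decoders, latents, skips):
    """D_* consumes the latent code plus the recorded encoder feature maps (section 3.1)."""
    return {key: decoders[key](latents[key], skips[key]) for key in ('a', 'n', 's')}
```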
4. Quality improvement of the face reflection materials: based on the reflection materials obtained in step 3, differentiable quality enhancement networks implemented with convolutional neural networks further improve the quality of the reflection materials.
4.1 Training the reflection material quality enhancement networks based on convolutional neural networks
Training data. Using the U-shaped networks trained in 2.1, the texture-space face color images I of the training data in step 2.1 are taken as network input to generate decoded material images, which together with the ground-truth materials T_* of the training data in step 2.1 compose the training data pairs, where * denotes a, n, s.
Training method. For the diffuse material quality enhancement network, SRGAN (Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. 2017. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4681–4690.) is followed, and the super-resolution network R_a is trained in a generative adversarial (GAN) manner: the 512×512 diffuse material image output by the decoder is quality-enhanced into the 1024×1024 T_a. For the normal material and the specular material, the super-resolution networks R_n and R_s are likewise trained in a generative adversarial manner; they differ from R_a in two respects. First, they enhance input material images of 256×256 into high-quality material images of 1024×1024. Second, in addition to the decoder output material image, their input also includes the scaled texture-space face color image.
4.2 Quality enhancement of the material images: based on the material images generated in step 3, quality enhancement is performed with the networks trained in step 4.1 to obtain the high-quality material images T_*, where * denotes a, n, s. Writing T̃_* for the material image produced by the decoder and Ĩ for the texture-space face color image scaled to 256×256, the whole process can be expressed as T_a = R_a(T̃_a), T_n = R_n(T̃_n, Ĩ) and T_s = R_s(T̃_s, Ĩ) (the corresponding expressions are given as formula images PCTCN2020088883-appb-000081 and PCTCN2020088883-appb-000082).
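A rough sketch of this enhancement pass is given below; how the decoded material and the 256×256 texture-space face image are packed into the 4-channel first-layer input of R_n and R_s is not reproduced in the text, so a plain channel concatenation is used here as an assumption:

```python
import torch

def enhance_materials(R_a, R_n, R_s, dec_a, dec_n, dec_s, tex_face_256):
    """Quality-enhancement pass (section 4.2), as a non-limiting sketch.
    R_* are assumed SRGAN-style generators producing 1024x1024 outputs."""
    T_a = R_a(dec_a)                                    # 512x512 diffuse -> 1024x1024
    T_n = R_n(torch.cat([dec_n, tex_face_256], dim=1))  # 256x256 inputs -> 1024x1024
    T_s = R_s(torch.cat([dec_s, tex_face_256], dim=1))
    return T_a, T_n, T_s
```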
5. Iterative optimization of the latent space with the physically based differentiable renderer: by minimizing the difference between the rendering result of the physically based differentiable renderer and the input face image, the latent space of the face reflection material is iteratively optimized, and the output face reflection materials are obtained through the decoding and quality improvement operations.
5.1 Physically based forward rendering using the reflection materials and spherical harmonic lighting
Computing the diffuse reflection of the face. First, according to I_uv obtained in step 1.3, the outputs T_* of the quality enhancement networks (* denotes a, n, s) and the shadow map T_sha and environment normal map T_bn obtained in step 1.3 are bilinearly sampled, giving the corresponding image-space material images t_*, where * can be a, n, s, sha, bn, denoting the diffuse material, normal material, specular material, shadow map and environment normal map respectively. All pixels in I_uv are traversed, and the diffuse lighting of each pixel is computed with the rendering formula given as formula image PCTCN2020088883-appb-000084, in which L(ω) denotes the incident light from direction ω, V denotes visibility and N denotes the normal; the whole formula is a spherical integral over the hemisphere around the normal. This formula can be further simplified with the spherical harmonic approximation (Peter-Pike Sloan, Jan Kautz, and John Snyder. 2002. Precomputed radiance transfer for real-time rendering in dynamic, low-frequency lighting environments. In ACM Transactions on Graphics (TOG), Vol. 21. ACM, 527–536.). L and V are expressed with spherical harmonic functions (formula image PCTCN2020088883-appb-000085), where v, recorded in t_sha, denotes the spherical harmonic coefficients of the visibility; max(0, N·ω) can likewise be expressed in spherical harmonics (formula image PCTCN2020088883-appb-000086), where c denotes the spherical harmonic coefficients of the clamped cosine, obtained by rotating the spherical harmonic coefficients of max(0, cosθ) to the current pixel normal direction n, which is recorded in t_n. Using the spherical harmonic product projection (Peter-Pike Sloan. 2008. Stupid spherical harmonics (sh) tricks. In Game developers conference, Vol. 9. Citeseer, 42.), z_e and v are re-projected to obtain w; finally, taking the spherical harmonic dot product of the w and c terms reduces the diffuse term to the expression given as formula image PCTCN2020088883-appb-000087.
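Assuming that the reduced diffuse term is the sampled albedo multiplied by the dot product of the spherical harmonic coefficient vectors w and c (the exact expression is only given as a formula image), a per-pixel sketch of the texture sampling and diffuse shading could look like this, with color handling simplified:

```python
import torch

def sample_texture(texture, uv):
    """Bilinearly sample a texture map at the per-pixel UV coordinates I_uv.
    texture: B x C x Ht x Wt, uv: B x H x W x 2 with values in [0, 1]."""
    grid = uv * 2.0 - 1.0                              # grid_sample expects [-1, 1]
    return torch.nn.functional.grid_sample(texture, grid, mode='bilinear', align_corners=False)

def diffuse_shading(t_a, w, c):
    """Per-pixel diffuse term, assumed to be albedo * dot(w, c).
    t_a: B x 3 x H x W sampled albedo; w, c: B x 9 x H x W SH coefficient maps
    (w from re-projecting the lighting z_e with visibility, c the rotated clamped cosine)."""
    sh_dot = (w * c).sum(dim=1, keepdim=True)          # B x 1 x H x W
    return t_a * sh_dot
```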
Computing the specular reflection of the face. All pixels in I_uv are likewise traversed, and the specular lighting of each pixel is computed with the following rendering formula:
L_s = ∮ f_r(ω, ω_o) L(ω) V(ω) max(0, N·ω) dω,
where f_r denotes the light transport term obeying the GGX distribution (Bruce Walter, Stephen R. Marschner, Hongsong Li, and Kenneth E. Torrance. 2007. Microfacet Models for Refraction through Rough Surfaces.) and ω_o denotes the viewing direction. Following (Sébastien Lagarde and Charles de Rousiers. 2014. Moving frostbite to physically based rendering. In SIGGRAPH 2014 Conference, Vancouver.), the above integral is split into
L_s = DFG · LD,
where DFG denotes the pre-computed GGX rendering transfer term and LD is computed according to the formula given as formula image PCTCN2020088883-appb-000088.
The diffuse and specular reflections are fused according to the formula given as formula image PCTCN2020088883-appb-000089 to compute the rendering result of each pixel in I_uv; the resulting image is the final rendering result.
5.2 Iteratively optimizing the material latent space variables and the spherical harmonic illumination coefficient z_e: the objective given as formula image PCTCN2020088883-appb-000091 is minimized, where L denotes the loss function and the remaining operator denotes the differentiable rendering process of step 5.1. Using the differentiable renderer, the differentiable quality enhancement networks and the differentiable decoders, the loss value is back-propagated to z_*, and z_* is updated iteratively until convergence; * can be a, n, s, e, denoting the diffuse material, normal material, specular material and spherical harmonic lighting respectively. Finally, z_a, z_n and z_s are fed into the diffuse, normal and specular material decoders respectively, and their outputs are then fed into the corresponding material quality enhancement networks, yielding materials T_a, T_n, T_s that match the characteristics of the person in the input image.
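The optimization loop could be sketched as follows; the choice of optimizer and learning rate for this fitting stage, the pixel-wise L1 loss, and the omission of the extra face-image input to R_n and R_s are assumptions (the text only specifies that the loss is back-propagated to z_* through the differentiable renderer, enhancers and decoders, and the implementation example below reports roughly 150 iterations to convergence):

```python
import torch

def fit_latents(latents, decoders, skips, enhancers, render, target_image,
                n_iters=150, lr=1e-2):
    """Sketch of the iterative optimization in step 5.2.
    `render` is assumed to wrap the differentiable forward rendering of step 5.1
    (texture sampling, diffuse and specular shading)."""
    for z in latents.values():
        z.requires_grad_(True)
    opt = torch.optim.Adam(list(latents.values()), lr=lr)   # optimizer choice is an assumption
    for _ in range(n_iters):
        opt.zero_grad()
        mats = {k: decoders[k](latents[k], skips[k]) for k in ('a', 'n', 's')}
        # The extra 256x256 face-image input to R_n / R_s is omitted here for brevity.
        T_a, T_n, T_s = (enhancers['a'](mats['a']),
                         enhancers['n'](mats['n']),
                         enhancers['s'](mats['s']))
        rendered = render(T_a, T_n, T_s, latents['e'])       # differentiable renderer
        loss = torch.nn.functional.l1_loss(rendered, target_image)
        loss.backward()                                       # gradients flow back to z_*
        opt.step()
    return latents
```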
Implementation example
The inventors implemented an embodiment of the present invention on a machine equipped with an Intel Xeon E5-4650 CPU and an NVidia GeForce RTX 2080Ti graphics processor (11 GB). Using all the parameter values listed in the detailed description, the inventors obtained all the experimental results shown in Figures 1-5. The present invention can effectively output a high-quality face reflection material that matches the features of the input portrait image. For an image with a face region of 600×800, the computation of the 3D geometric information of the face takes about 30 seconds, the initialization of the latent space takes about 10 milliseconds, and each round of forward computation in the iterative optimization (decoding, quality enhancement, rendering) takes 250 milliseconds; about 150 iterations are needed for convergence, so the whole iterative process takes about 40 seconds. In addition, training the U-shaped networks takes 12 hours, training the spherical harmonic illumination coefficient regression network takes 4 hours, and training the material quality enhancement networks takes about 50 hours; these modules only need to be trained once and can then be used to process any input portrait image.

Claims (7)

  1. A method for solving the face reflection material from a single image based on a differentiable renderer, characterized in that it comprises the following steps:
    (1) computing the 3D information of the face in the input image, and obtaining, according to the 3D information, the texture-space face color map and the static information used for physically based differentiable rendering; the 3D information comprises the 3D model of the face, the rigid transformation matrix and the projection matrix; the static information comprises the shadow map T_sha and the environment normal map T_bn;
    (2) based on the texture-space face color map obtained in step 1, obtaining, through encoders based on convolutional neural networks, the initial values of the face reflection material latent space coefficients and the initial value of the spherical harmonic illumination coefficients, where * is a, n, s, denoting the diffuse material, the normal material and the specular material respectively;
    (3) decoding the coefficients of the face reflection material latent space into the corresponding reflection material images with differentiable decoders implemented with convolutional neural networks;
    (4) improving the resolution and detail quality of the reflection material images obtained in step (3) to obtain the images T_*;
    (5) iteratively optimizing the latent space coefficients of the face reflection material and the spherical harmonic illumination coefficients by minimizing the difference between the input face image and the rendering result obtained by rendering the quality-improved reflection material images T_* with the physically based differentiable renderer, and solving the face reflection material from the optimized latent space coefficients through the decoding and quality improvement operations of steps (3)-(4).
  2. The method for solving the face reflection material from a single image based on a differentiable renderer according to claim 1, characterized in that step (1) comprises the following sub-steps:
    (1.1) computation of the 3D face information: detecting the two-dimensional feature points of the face in the input image, optimizing the identity coefficients, the rigid transformation matrix and the projection matrix with a deformable model, and obtaining the 3D model of the person through linear interpolation of the deformable model with the identity coefficients;
    (1.2) computation of the texture-space face color image: using the rigid transformation matrix and projection matrix obtained in step (1.1), projecting the 3D model obtained in step (1.1) onto the input image and establishing a mapping between each vertex of the 3D model and the image pixels, mapping the input image pixels to the vertices of the 3D model, then mapping the image pixels into texture space using the mapping between the 3D model vertices and the texture space, and obtaining the texture-space face color image through triangulation of the texture space and barycentric interpolation;
    (1.3) computation of the static information for physically based differentiable rendering: using the 3D model, rigid transformation matrix and projection matrix from step (1.1), rasterizing the texture coordinates as color information into image space to obtain the texture coordinate image I_uv; using the rigid transformation matrix and 3D model obtained in step (1.1) to obtain the rigidly transformed 3D model, computing the occlusion of each vertex of this 3D model in all directions with a ray tracing algorithm, and projecting it onto the spherical harmonic polynomials to obtain the occlusion spherical harmonic coefficients of each vertex; in addition, recording the proportion of unoccluded directions and the center direction of the unoccluded region to obtain the environment normal vector of each vertex; finally, obtaining the final shadow map T_sha and environment normal map T_bn through triangulation of the texture space and barycentric interpolation of each vertex's occlusion spherical harmonic coefficients and environment normal vector.
  3. The method for solving the face reflection material from a single image based on a differentiable renderer according to claim 2, characterized in that, in step (1.2), the Poisson algorithm is used to fill the hole regions present in the texture-space face color image.
  4. The method for solving the face reflection material from a single image based on a differentiable renderer according to claim 2, characterized in that the encoders and decoders based on convolutional neural networks are trained jointly as U-shaped networks, and the training specifically comprises the following sub-steps:
    (a) training data: obtaining N target face images I_o and the corresponding diffuse materials, normal materials and specular materials, and mapping the face images into texture space to obtain the corresponding texture-space face color images I; these compose the training data of the U-shaped networks, where each item has a resolution of 1024×1024;
    (b) the diffuse material, the normal material and the specular material each have a U-shaped network; for the diffuse U-shaped network U_a, the input is the scaled texture-space face color image; the encoder part E_a of U_a contains 9 down-sampling modules, the first 8 each containing a convolutional layer with a kernel size of 3×3 and a stride of 2×2, a batch normalization layer and an LReLU activation layer, and the last containing a convolutional layer with a kernel size of 1×1 and a stride of 2×2, a batch normalization layer and an LReLU activation layer, the final encoding being the 1×1×1024 diffuse material latent space; the decoder part D_a of U_a contains 9 up-sampling modules, each containing a resize convolution layer with a kernel size of 3×3 and an upscaling factor of two, a batch normalization layer and an LReLU activation layer, and a final convolutional layer with a kernel size of 1×1, a stride of 1×1 and a Sigmoid activation produces the output with a final resolution of 512×512×3; for the normal-material U-shaped network U_n, the input is the texture-space face color image scaled by area interpolation to a resolution of 256×256, its encoder E_n comprises 8 down-sampling modules, the first 7 each containing a convolutional layer with a kernel size of 3×3 and a stride of 2×2, a batch normalization layer and an LReLU activation layer, and the last containing a convolutional layer with a kernel size of 1×1 and a stride of 2×2, a batch normalization layer and an LReLU activation layer, the final encoding being the 1×1×512 normal material latent space; the decoder D_n comprises 8 up-sampling modules, each containing a resize convolution layer with a kernel size of 3×3 and an upscaling factor of two, a batch normalization layer and an LReLU activation layer, and a final convolutional layer with a kernel size of 1×1, a stride of 1×1 and a Sigmoid activation produces the output with a final resolution of 256×256×3; for the specular U-shaped network U_s, its encoder structure E_s is the same as E_n, the 8 up-sampling modules of D_s each contain a resize convolution layer with a kernel size of 3×3 and an upscaling factor of two, a batch normalization layer and an LReLU activation layer, and a final convolutional layer with a kernel size of 1×1, a stride of 1×1 and a Sigmoid activation produces the output with a final resolution of 256×256×1; the three highest-resolution modules of E_* and D_* of each U-shaped network are connected by skip connections, * being a, n, s;
    (c) the training loss function is defined by the formulas given as formula images PCTCN2020088883-appb-100013 and PCTCN2020088883-appb-100014, where U_* denotes a U-shaped network, the subscript * can be a, n, s, denoting the diffuse material, normal material and specular material respectively, the scaled texture-space face color image is the network input, and the material image output by the U-shaped network is compared with the corresponding scaled ground-truth material image; for the diffuse network, the scaled input and material images have a resolution of 512×512, while for the normal and specular networks the resolution is 256×256.
  5. 根据权利要求2所述的基于可微渲染器的从单幅图像求解人脸反射材质的方法,其特征在于,所述步骤(2)中,输入图像的球谐光照的初始值
    Figure PCTCN2020088883-appb-100022
    通过构建球谐光照系数回归网络获得,所述球谐光照系数回归网络包括基于卷积神经网络的编码器以及全连接构成的回归模块,训练过程包括如下步骤:
    The method for solving the reflection material of a human face from a single image based on a differentiable renderer according to claim 2, characterized in that, in the step (2), the initial value of the spherical harmonic illumination of the input image
    Figure PCTCN2020088883-appb-100022
    It is obtained by constructing a spherical harmonic illumination coefficient regression network. The spherical harmonic illumination coefficient regression network includes an encoder based on a convolutional neural network and a regression module composed of a full connection. The training process includes the following steps:
    (A) Training data pairs {I o , z e } are assembled, where the spherical harmonic coefficients z e are computed from the HDR ambient light image I e by the following formula:
    [spherical harmonic projection formula, shown as an image in the original]
    where i, j are the Cartesian coordinates along the image width W and height H, Y k is the spherical harmonic polynomial, k is the order of the spherical harmonics with 0 ≤ k < 9, and φ denotes the conversion from the image coordinates i, j to the spherical coordinates θ, φ, whose expressions are as follows:
    [coordinate conversion formulas, shown as images in the original]
    (B) I o is scaled to a resolution of 256×256 and used as the network input; the network is trained end-to-end in a supervised manner with the L2 norm as the loss function.
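    As a hedged illustration of step (A), the sketch below projects an HDR environment map onto the first nine real spherical harmonic basis functions; the equirectangular latitude–longitude mapping and the solid-angle weighting are assumptions, since the original conversion equations are only available as images:

```python
import numpy as np

def sh_basis(theta, phi):
    """First 9 real spherical harmonics Y_k (k = 0..8) on the unit sphere."""
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    return np.stack([
        0.282095 * np.ones_like(z),          # Y_0
        0.488603 * y,                        # Y_1
        0.488603 * z,                        # Y_2
        0.488603 * x,                        # Y_3
        1.092548 * x * y,                    # Y_4
        1.092548 * y * z,                    # Y_5
        0.315392 * (3.0 * z * z - 1.0),      # Y_6
        1.092548 * x * z,                    # Y_7
        0.546274 * (x * x - y * y),          # Y_8
    ], axis=-1)

def sh_coeffs_from_envmap(env):
    """env: H x W x 3 HDR image; returns 9 x 3 coefficients z_e."""
    H, W, _ = env.shape
    j, i = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    theta = (j + 0.5) / H * np.pi                 # polar angle (assumed mapping)
    phi = (i + 0.5) / W * 2.0 * np.pi             # azimuth (assumed mapping)
    Y = sh_basis(theta, phi)                      # H x W x 9
    d_omega = np.sin(theta) * (np.pi / H) * (2.0 * np.pi / W)   # pixel solid angle
    return np.einsum("hwk,hwc,hw->kc", Y, env, d_omega)
```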
  6. The method for solving the reflection material of a human face from a single image based on a differentiable renderer according to claim 1, characterized in that, in step (4), the resolution and detail quality of the reflection material images are improved by constructing reflection material quality enhancement networks R *, which specifically includes the following sub-steps:
    (4.1) Train the reflection material quality enhancement networks based on convolutional neural networks, as follows:
    (4.1.1) Training data: the face color image I used for training is fed into the U-shaped networks trained in step (2); the material images they output are paired with the original material images of the face color image I to form the training data pairs, where * denotes a, n, s;
    (4.1.2) Training method: the SRGAN network is used as the reflection material quality enhancement network R *, trained in a generative adversarial (GAN) manner. For the diffuse reflection material quality enhancement network R a , the input is the 512×512 diffuse material image and the output image resolution is 1024×1024. For the normal material quality enhancement network R n and the specular material quality enhancement network R s , the first layer of the network accepts an image depth of 4: the input consists of the corresponding material image together with the face color image in the scaled texture space, at an input resolution of 256×256, and the output is a high-quality material image at a resolution of 1024×1024;
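    For illustration, a hedged sketch of how the 4-channel input of R s could be assembled and passed to a super-resolution generator; the 4× upscaling from 256 to 1024 and the channel depth follow the text, while the placeholder generator architecture and tensor shapes are assumptions standing in for the actual SRGAN:

```python
import torch
import torch.nn as nn

class SRGenerator(nn.Module):
    """Placeholder for an SRGAN-style generator (not the actual architecture)."""
    def __init__(self, in_ch, out_ch, scale=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.PReLU(),
            nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False),
            nn.Conv2d(64, out_ch, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

R_s = SRGenerator(in_ch=4, out_ch=1)          # specular (1 ch) + texture-space color (3 ch)
coarse_spec = torch.rand(1, 1, 256, 256)      # specular material output by U_s
face_uv_256 = torch.rand(1, 3, 256, 256)      # face color image in scaled texture space
enhanced = R_s(torch.cat([coarse_spec, face_uv_256], dim=1))   # 1 x 1 x 1024 x 1024
```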
    (4.2) Quality enhancement of the material images: based on the material images generated in step 3, quality enhancement is performed with the networks trained in step (4.1) to obtain the high-quality material images T *, where * denotes a, n, s. The whole process can be expressed by the following formulas:
    [quality enhancement formulas for T a , T n and T s , shown as images in the original]
    where the additional input symbol denotes the face color image in texture space scaled to 256×256.
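    Given the inputs listed in step (4.1.2), a hedged reconstruction of the three formulas (the notation, in particular the symbols \tilde{T}_* for the decoded materials and I^{256}_{uv} for the scaled texture-space color image, is an assumption, since the originals are only available as images) would be:

```latex
T_a = R_a\!\left(\tilde{T}_a\right), \qquad
T_n = R_n\!\left(\tilde{T}_n,\ I^{256}_{uv}\right), \qquad
T_s = R_s\!\left(\tilde{T}_s,\ I^{256}_{uv}\right)
```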
  7. The method for solving the reflection material of a human face from a single image based on a differentiable renderer according to claim 2, characterized in that step (5) comprises the following sub-steps:
    (5.1) Physically-based forward rendering using the reflection materials and spherical harmonic lighting:
    (5.1.1) Compute the diffuse reflection of the face: using the I uv obtained in step 1.3, bilinearly sample the T a , T n and T s output by the quality enhancement networks, as well as the shadow map T sha and the environment normal map T bn , to obtain the corresponding image-space material images t *, where * is a, n, s, sha, bn, denoting the diffuse material, normal material, specular material, shadow map and environment normal map, respectively. Traverse all pixels in I uv and compute the diffuse lighting of each pixel with the following physically-based rendering formula:
    [diffuse spherical harmonic shading formula, shown as an image in the original]
    where k is the order of the spherical harmonic polynomial; z e is reprojected with v using the spherical harmonic product-projection property to obtain w, with v the per-pixel visibility in each direction recorded in t sha ; c is the spherical harmonic coefficients of max(0, cos θ) rotated to the normal direction n of the current pixel, with n recorded in t n ;
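    As a hedged illustration of the per-pixel diffuse term, the sketch below evaluates a dot product between visibility-modulated lighting coefficients w and cosine-lobe coefficients c, modulated by the sampled albedo; the exact reprojection and rotation operators of the original are not reproduced, only the general shape of an SH dot-product shading step:

```python
import numpy as np

def diffuse_sh_shading(t_a, w, c):
    """
    t_a : H x W x 3       sampled diffuse albedo
    w   : H x W x 9 x 3   lighting SH coefficients per colour channel, visibility-modulated
    c   : H x W x 9       SH coefficients of max(0, cos) rotated to the pixel normal
    Returns per-pixel diffuse radiance, H x W x 3.
    """
    irradiance = np.einsum("hwkc,hwk->hwc", w, c)   # SH dot product over the 9 bands
    return t_a * irradiance
```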
    (5.1.2) Compute the specular reflection of the face and the rendering result: the specular highlight reflection of the face is computed with the following formula:
    L s = DFG · LD,
    where DFG denotes the pre-computed rendering transfer equation following the GGX distribution, and LD is computed as follows:
    [LD formula, shown as an image in the original]
    The diffuse and specular reflections are then fused with the following formula to compute the rendering result for each pixel in I uv :
    [fusion formula, shown as an image in the original]
    which is the final rendering result;
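    The DFG·LD factorization is the standard split-sum form of GGX specular shading; a hedged sketch follows, where the look-up table, roughness handling and the additive fusion are assumptions, not details taken from the original:

```python
import numpy as np

def specular_split_sum(t_s, n_dot_v, roughness, dfg_lut, prefiltered_ld):
    """
    t_s            : H x W       sampled specular intensity
    n_dot_v        : H x W       clamped dot product between normal and view direction
    roughness      : H x W       assumed per-pixel roughness
    dfg_lut        : callable(n_dot_v, roughness) -> (scale, bias), pre-integrated GGX BRDF
    prefiltered_ld : H x W x 3   environment lighting pre-filtered for the reflection lobe
    Returns per-pixel specular radiance L_s, H x W x 3.
    """
    scale, bias = dfg_lut(n_dot_v, roughness)
    return (t_s[..., None] * scale[..., None] + bias[..., None]) * prefiltered_ld

def fuse(L_d, L_s):
    # Simple additive fusion of diffuse and specular radiance (assumed).
    return L_d + L_s
```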
    (5.2) Iteratively optimize the material latent-space variables and the spherical harmonic illumination coefficients z e by minimizing the following formula:
    [optimization objective, shown as an image in the original]
    where L denotes the loss function and the remaining symbol denotes the differentiable rendering process of step 5.1. Using the differentiable renderer, the differentiable quality enhancement networks and the differentiable decoders, the loss value is back-propagated to z * and z * is iteratively updated until convergence, where * is a, n, s, e, denoting the diffuse reflection material, normal material, specular reflection material and spherical harmonic illumination, respectively. Finally, z a , z n and z s are fed into the diffuse, normal and specular material decoders respectively, and their outputs are fed into the corresponding material quality enhancement networks to obtain materials T a , T n and T s that match the characteristics of the person in the input image.
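    A hedged sketch of such an analysis-by-synthesis loop in PyTorch; the decoders, enhancers and renderer are placeholders standing in for the trained networks and the differentiable renderer of step 5.1, and the optimizer, learning rate and photometric loss are assumptions:

```python
import torch

def fit_latents(I_target, decoders, enhancers, render, z_init, z_e_init, steps=200):
    """
    I_target  : target face image tensor
    decoders, enhancers : dicts keyed by 'a', 'n', 's' of differentiable networks
    render    : differentiable renderer taking (T_a, T_n, T_s, z_e) -> image
    """
    z = {k: v.clone().requires_grad_(True) for k, v in z_init.items()}
    z_e = z_e_init.clone().requires_grad_(True)
    opt = torch.optim.Adam(list(z.values()) + [z_e], lr=1e-2)   # assumed optimizer

    for _ in range(steps):
        opt.zero_grad()
        T = {k: enhancers[k](decoders[k](z[k])) for k in ("a", "n", "s")}
        I_render = render(T["a"], T["n"], T["s"], z_e)
        loss = torch.nn.functional.l1_loss(I_render, I_target)   # assumed photometric loss
        loss.backward()   # gradients flow through renderer, enhancers and decoders
        opt.step()

    materials = {k: enhancers[k](decoders[k](z[k].detach())) for k in ("a", "n", "s")}
    return materials, z_e.detach()
```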
PCT/CN2020/088883 2020-05-07 2020-05-07 Micro-renderer-based method for acquiring reflection material of human face from single image WO2021223134A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/088883 WO2021223134A1 (en) 2020-05-07 2020-05-07 Micro-renderer-based method for acquiring reflection material of human face from single image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/088883 WO2021223134A1 (en) 2020-05-07 2020-05-07 Micro-renderer-based method for acquiring reflection material of human face from single image

Publications (1)

Publication Number Publication Date
WO2021223134A1 true WO2021223134A1 (en) 2021-11-11

Family

ID=78468579

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/088883 WO2021223134A1 (en) 2020-05-07 2020-05-07 Micro-renderer-based method for acquiring reflection material of human face from single image

Country Status (1)

Country Link
WO (1) WO2021223134A1 (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7756325B2 (en) * 2005-06-20 2010-07-13 University Of Basel Estimating 3D shape and texture of a 3D object based on a 2D image of the 3D object
CN102346857A (en) * 2011-09-14 2012-02-08 西安交通大学 High-precision method for simultaneously estimating face image illumination parameter and de-illumination map
CN102426695A (en) * 2011-09-30 2012-04-25 北京航空航天大学 Virtual-real illumination fusion method of single image scene
CN108765550A (en) * 2018-05-09 2018-11-06 华南理工大学 A kind of three-dimensional facial reconstruction method based on single picture

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419233A (en) * 2021-12-31 2022-04-29 网易(杭州)网络有限公司 Model generation method and device, computer equipment and storage medium
CN114677292A (en) * 2022-03-07 2022-06-28 北京航空航天大学 High-resolution material recovery method based on two image inverse rendering neural network
CN114842121A (en) * 2022-06-30 2022-08-02 北京百度网讯科技有限公司 Method, device, equipment and medium for generating mapping model training and mapping
CN114842121B (en) * 2022-06-30 2022-09-09 北京百度网讯科技有限公司 Method, device, equipment and medium for generating mapping model training and mapping
CN116091684A (en) * 2023-04-06 2023-05-09 杭州片段网络科技有限公司 WebGL-based image rendering method, device, equipment and storage medium
CN117132461A (en) * 2023-10-27 2023-11-28 中影年年(北京)文化传媒有限公司 Method and system for whole-body optimization of character based on character deformation target body
CN117132461B (en) * 2023-10-27 2023-12-22 中影年年(北京)文化传媒有限公司 Method and system for whole-body optimization of character based on character deformation target body
CN117173383B (en) * 2023-11-02 2024-02-27 摩尔线程智能科技(北京)有限责任公司 Color generation method, device, equipment and storage medium
CN117173383A (en) * 2023-11-02 2023-12-05 摩尔线程智能科技(北京)有限责任公司 Color generation method, device, equipment and storage medium
CN117173343A (en) * 2023-11-03 2023-12-05 北京渲光科技有限公司 Relighting method and relighting system based on nerve radiation field
CN117173343B (en) * 2023-11-03 2024-02-23 北京渲光科技有限公司 Relighting method and relighting system based on nerve radiation field
CN117372604A (en) * 2023-12-06 2024-01-09 国网电商科技有限公司 3D face model generation method, device, equipment and readable storage medium
CN117372604B (en) * 2023-12-06 2024-03-08 国网电商科技有限公司 3D face model generation method, device, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
WO2021223134A1 (en) Micro-renderer-based method for acquiring reflection material of human face from single image
CN111652960B (en) Method for solving human face reflection material from single image based on micro-renderer
Lombardi et al. Neural volumes: Learning dynamic renderable volumes from images
Nguyen-Phuoc et al. Rendernet: A deep convolutional network for differentiable rendering from 3d shapes
LeGendre et al. Deeplight: Learning illumination for unconstrained mobile mixed reality
Tewari et al. State of the art on neural rendering
Wang et al. Nerf-art: Text-driven neural radiance fields stylization
Georgoulis et al. Reflectance and natural illumination from single-material specular objects using deep learning
US20050017968A1 (en) Differential stream of point samples for real-time 3D video
Kopanas et al. Neural point catacaustics for novel-view synthesis of reflections
Bemana et al. Eikonal fields for refractive novel-view synthesis
US20030117675A1 (en) Curved image conversion method and record medium where this method for converting curved image is recorded
Li et al. Topologically consistent multi-view face inference using volumetric sampling
Huang et al. Refsr-nerf: Towards high fidelity and super resolution view synthesis
Karunratanakul et al. Harp: Personalized hand reconstruction from a monocular rgb video
Schwandt et al. A single camera image based approach for glossy reflections in mixed reality applications
Zhang et al. Video-driven neural physically-based facial asset for production
Ren et al. Facial geometric detail recovery via implicit representation
Feng et al. Learning disentangled avatars with hybrid 3d representations
Lin et al. Single-shot implicit morphable faces with consistent texture parameterization
Wang et al. GaussianHead: Impressive Head Avatars with Learnable Gaussian Diffusion
Sumantri et al. 360 panorama synthesis from a sparse set of images on a low-power device
Lin et al. Multiview textured mesh recovery by differentiable rendering
US20230031750A1 (en) Topologically consistent multi-view face inference using volumetric sampling
Chu et al. GPAvatar: Generalizable and Precise Head Avatar from Image (s)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20934489

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20934489

Country of ref document: EP

Kind code of ref document: A1