WO2022213623A1 - Method, apparatus, electronic device and storage medium for image generation and three-dimensional face model generation - Google Patents
- Publication number: WO2022213623A1 (application PCT/CN2021/133390)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/251—Fusion techniques of input or preprocessed data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Definitions
- the present disclosure relates to the technical field of image processing, and in particular, to a method, an apparatus, an electronic device, and a storage medium for image generation and three-dimensional face model generation.
- 3D face reconstruction refers to the restoration of the 3D model of the face based on the face image. After the three-dimensional model of the human face is generated, a human face image can be generated based on the three-dimensional model of the human face.
- the current face image generation method has the problem of poor accuracy of the generated face image.
- the embodiments of the present disclosure provide at least a method, an apparatus, an electronic device, and a medium for image generation and three-dimensional face model generation.
- an embodiment of the present disclosure provides a method for generating a face image, including: acquiring a normal vector image and texture feature data, wherein the pixel value of each pixel in the normal vector image represents the value of the normal vector of the model vertex corresponding to that pixel in the three-dimensional face model corresponding to the normal vector image; and performing multi-level data fusion processing on the normal vector image and the texture feature data to obtain a reconstructed face image.
- the normal vector image is used as the geometric condition, and the texture feature data is used to model the other factors that influence the face image, so the obtained reconstructed face image has higher accuracy.
- an embodiment of the present disclosure further provides a method for generating a 3D face model, including: performing 3D face reconstruction and texture feature recovery on a target face image including a target face, to obtain an initial normal vector image of the target face and initial texture feature data of the target face; and obtaining a three-dimensional face model of the target face based on the initial normal vector image and the initial texture feature data.
- the generated 3D face model of the target face also has higher accuracy.
- an apparatus for generating a face image includes: a first acquisition module, configured to acquire a normal vector image and texture feature data, wherein the pixel value of each pixel in the normal vector image represents the value of the normal vector of the model vertex corresponding to the pixel in the three-dimensional face model corresponding to the normal vector image; and a first processing module, configured to perform multi-level data fusion processing on the normal vector image and the texture feature data to obtain a reconstructed face image.
- an embodiment of the present disclosure further provides an apparatus for generating a three-dimensional face model, including: a second acquisition module, configured to perform three-dimensional face reconstruction and texture feature recovery on a target face image including a target face, to obtain an initial normal vector image of the target face and initial texture feature data of the target face; and a second processing module, configured to obtain a three-dimensional face model of the target face based on the initial normal vector image and the initial texture feature data.
- an optional implementation of the present disclosure further provides an electronic device including a processor and a memory, where the memory stores machine-readable instructions executable by the processor; when the machine-readable instructions are executed by the processor, the steps in the above first aspect, or any possible implementation of the first aspect, or in the second aspect, or any possible implementation of the second aspect, are performed.
- an optional implementation of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run, it performs the steps in the above first aspect, or any possible implementation of the first aspect, or the steps in the above second aspect, or any possible implementation of the second aspect.
- FIG. 1 shows a flowchart of a method for generating a face image provided by an embodiment of the present disclosure
- FIG. 2 shows a flowchart of a specific method for performing multi-level data fusion processing on normal vector images and texture feature data provided by an embodiment of the present disclosure
- FIG. 3 shows a flowchart of a specific method for training a first neural network provided by an embodiment of the present disclosure
- FIG. 4 shows a schematic structural diagram of a first neural network provided by an embodiment of the present disclosure
- FIG. 5 shows a schematic structural diagram of a rendering block in a first neural network provided by an embodiment of the present disclosure
- FIG. 6 shows a flowchart of a method for generating a 3D face model provided by an embodiment of the present disclosure
- FIG. 7 shows a schematic diagram of an apparatus for generating a face image provided by an embodiment of the present disclosure
- FIG. 8 shows a schematic diagram of an apparatus for generating a three-dimensional face model provided by an embodiment of the present disclosure
- FIG. 9 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
- the 3D face model can be recovered by 3D face reconstruction with monocular face image as input.
- the three-dimensional face model includes a plurality of model vertices and a connection relationship between the model vertices.
- the vertices of the model are connected to each other to form multiple patches, which can represent the outline of the face.
- the purpose of performing 3D geometric reconstruction on the face is to restore the 3D position information of each model vertex among the multiple model vertices constituting the 3D face model.
- the way to generate a 3D face model based on deep learning technology is usually to take a face image as input and learn to regress the corresponding 3D face model parameters, thereby achieving 3D face reconstruction.
- this method of 3D face reconstruction requires a large amount of labeled data, and the acquisition of labeled data is often difficult, and the generated 3D face model has low accuracy.
- the generation of face images based on 3D face models plays an important role in the fields of film and television, games, and virtual social interaction.
- the positions of the model vertices in the 3D face model can be adjusted in a certain way, so that a new face image can be generated based on the adjusted 3D face model.
- the expression of the original face can be transformed, so that the generated face image has both the characteristics of the original face and the characteristics of a certain style.
- the way to generate a face image by using a three-dimensional face model is usually an optimization method: face parameters such as reflectivity, texture, illumination, and viewing angle are obtained, and a face image is then generated according to graphical rules.
- the graphical rules usually use a simplified model to describe the physical process of face image capture, so many details of the imaging process cannot be modeled, resulting in poor accuracy of the face images generated in this way.
- the present disclosure provides a face image generation method, which uses the normal vector image of the face as the geometric condition and uses texture feature data to model the other factors influencing the face image. The two are combined through multi-level data fusion processing, and the obtained reconstructed face image has higher accuracy.
- an embodiment of the present disclosure also provides a method for generating a three-dimensional face model: an initial normal vector image of the target face is predicted based on a target face image including the target face, and a three-dimensional face model of the target face is then obtained based on the initial normal vector image and the initial texture feature data of the target face.
- since face image reconstruction based on the initial normal vector image and the initial texture feature data has higher accuracy, the three-dimensional face model of the target face generated based on the initial normal vector image and the initial texture feature data also has higher accuracy.
- the execution subject of the face image generation method provided by the embodiment of the present disclosure is generally an electronic device with a certain computing capability.
- the electronic device includes, for example, a terminal device or a server or other processing device.
- the terminal device may be a user equipment (User Equipment, UE), a mobile device, a user terminal, a mobile terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, wearable devices, etc.
- the face image generation may be implemented by the processor invoking computer-readable instructions stored in the memory.
- FIG. 1 is a flowchart of a method for generating a face image provided by an embodiment of the present disclosure, the method includes steps S101-S102.
- S101 Acquire a normal vector image and texture feature data.
- the pixel value of each pixel in the normal vector image represents the value of the normal vector of the model vertex corresponding to the pixel in the three-dimensional face model corresponding to the normal vector image.
- S102 Perform multi-level data fusion processing on the normal vector image and the texture feature data to obtain a reconstructed face image.
- the embodiment of the present disclosure obtains the normal vector image and texture feature data of the three-dimensional face model, and performs multi-level data fusion processing on the normal vector image and the texture feature data to obtain a reconstructed face image, thereby taking the normal vector image of the face as the geometric condition and using the texture feature data to model the other factors influencing the face image; the reconstructed face image obtained in this way has higher accuracy.
- the three-dimensional face model may be, for example, an initial three-dimensional face model obtained by performing three-dimensional face reconstruction on an image including a human face; that is, the initial three-dimensional face model is directly used as the three-dimensional face model for face image reconstruction. Alternatively, after the initial three-dimensional face model is obtained, the position of each model vertex of the initial three-dimensional face model in the three-dimensional coordinate system can be adjusted to obtain the desired three-dimensional face model for face image reconstruction.
- the adjustment may be, for example, an adjustment of the fatness or thinness of the face, an adjustment of the facial expression, or an adjustment based on a certain style.
- the normal vector image of the three-dimensional face model can be generated, for example, by using the normal vector of each model vertex in the three-dimensional face model.
- the value of the normal vector of each model vertex may include the coordinate value corresponding to each coordinate axis of the normal vector in the three-dimensional coordinate system.
- the normal vector image may be, for example, a three-channel image, and the pixel value of each pixel in the image represents the coordinate values corresponding to the three coordinate axes of the normal vector of the model vertex corresponding to the pixel.
- to generate the normal vector image of the three-dimensional face model, for example, for each model vertex in the three-dimensional face model, at least one mesh having that model vertex as one of its vertices can be determined; the normal vector corresponding to each such mesh can then be determined; and the normal vector corresponding to the model vertex can be obtained from the normal vectors of these meshes, for example by averaging them.
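The per-vertex normal computation described above can be sketched as follows. The triangular-mesh layout and the plain averaging of adjacent patch normals are assumptions consistent with the description, not code from the patent:

```python
import numpy as np

def vertex_normals(vertices, faces):
    """Average the normals of all patches adjacent to each model vertex.

    vertices: (V, 3) float array of model-vertex positions.
    faces: (F, 3) int array of vertex indices, one triangular patch per row.
    """
    v0 = vertices[faces[:, 0]]
    v1 = vertices[faces[:, 1]]
    v2 = vertices[faces[:, 2]]
    face_n = np.cross(v1 - v0, v2 - v0)   # one (unnormalised) normal per patch
    normals = np.zeros_like(vertices)
    for corner in range(3):               # accumulate each patch normal onto its 3 vertices
        np.add.at(normals, faces[:, corner], face_n)
    # normalise: the averaged direction is what becomes the pixel value
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)
```

The three components of each resulting normal vector map directly onto the three channels of the normal vector image described above.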
- the texture feature data can be obtained by random Gaussian sampling on any image containing a face.
- in different application scenarios, the acquisition method of the texture feature data differs. For example, if the application scenario is to generate a reconstructed face image with a certain style, random Gaussian sampling of texture information may be performed on a face image of that style to obtain the texture feature data. As another example, to generate a reconstructed face image under a certain lighting environment, random Gaussian sampling of texture information may be performed on a face image captured under that lighting environment to obtain the texture feature data.
- the extraction source of texture feature data can be determined according to different needs.
- an embodiment of the present disclosure provides a specific method for performing multi-level data fusion processing on the normal vector image and the texture feature data to obtain a reconstructed face image, including:
- S201 Perform feature transformation on the texture feature data to obtain transformed texture feature data corresponding to multi-level data fusion processing.
- since the texture feature data obtained by random Gaussian sampling may come from another image that includes a human face, and that other face image does not necessarily correspond to the three-dimensional face model, directly fusing the texture feature data with the normal vector image may result in poor authenticity of the generated reconstructed face image. Therefore, in this embodiment of the present disclosure, when performing multi-level data fusion processing on the normal vector image and the texture feature data, feature transformation is first performed on the texture feature data to obtain transformed texture feature data corresponding to the multi-level data fusion processing.
- in this way, the degree of adaptation between the transformed texture feature data and the three-dimensional face model can be improved, and the transformed texture feature data corresponding to different levels of data fusion processing can include different texture features. Then, for each level of data fusion processing, the transformed texture feature data corresponding to that level and the normal vector image are fused; that is, a gradual fusion of the texture feature data and the normal vector image is realized, so that the obtained reconstructed face image is more precise and has a stronger sense of realism.
- An embodiment of the present disclosure provides a specific method for performing feature transformation on texture feature data to obtain transformed texture feature data corresponding to multi-level data fusion processing, including: performing first full connection processing on the texture feature data to obtain first intermediate texture feature data; and performing multiple second full connection processings on the first intermediate texture feature data to obtain transformed texture feature data respectively corresponding to the levels of the multi-level data fusion processing.
- the full connection parameters of different second full connection processing are different.
- a first fully-connected network with multiple fully-connected layers may be used to perform a first fully-connected process on the texture feature data to obtain the first intermediate texture feature data.
- a plurality of second fully connected networks may be used to perform full connection processing on the first intermediate texture feature data, respectively, to obtain transformed texture feature data corresponding to the plurality of second fully connected networks.
- each second fully connected network corresponds to one level of data fusion processing, and different second fully connected networks have different network parameters, so that different second fully connected networks can extract different texture features from the first intermediate texture feature data. By distributing the different texture features across the multiple levels and gradually merging them with the normal vector image, the authenticity of the generated reconstructed face image can be improved.
- for example, the texture feature data is represented as z, and the first intermediate texture feature data obtained after performing the first full connection processing on the texture feature data is represented as z′; then z′ = M1(z), where M1(·) represents the first full connection processing.
- the transformed texture feature data is represented as w; then w = M2(z′), where M2(·) represents the second full connection processing.
- the feature transformation that maps the texture feature data z to the transformed texture feature data w corresponding to each level of data fusion processing can therefore be simplified as the following formula (1):
- w = M2(M1(z))  (1)
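Formula (1) can be sketched as below. The layer dimensions, depth, random weights, and ReLU activation are illustrative assumptions; in practice M1(·) and M2(·) would be learned fully connected networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    """One fully connected layer: x @ w + b."""
    return x @ w + b

def mapping_network(z, num_levels=4, dim=64):
    """w = M2(M1(z)): a shared M1, then one M2 branch per fusion level.

    Each M2 branch has its own (here randomly initialised) parameters,
    so each level receives different transformed texture feature data.
    """
    # M1: first full connection processing -> first intermediate data z'
    w1, b1 = rng.standard_normal((dim, dim)) * 0.1, np.zeros(dim)
    z_prime = np.maximum(linear(z, w1, b1), 0.0)   # ReLU (an assumption)
    # M2: a separate second full connection per level, different parameters each
    ws = []
    for _ in range(num_levels):
        w2, b2 = rng.standard_normal((dim, dim)) * 0.1, np.zeros(dim)
        ws.append(linear(z_prime, w2, b2))
    return ws                                       # one w per fusion level
```

Because every M2 branch has distinct parameters, the list elements differ, matching the requirement that different levels fuse different texture features.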
- the method for performing multi-level data fusion processing on the normal vector image and the texture feature data further includes S202: for each level of data fusion processing in the multi-level data fusion processing, performing first feature fusion on the transformed texture feature data corresponding to this level of data fusion processing and the input feature map corresponding to this level of data fusion processing, to obtain an intermediate feature map corresponding to this level of data fusion processing.
- each level of data fusion processing has a corresponding input feature map.
- the input feature map corresponding to the first-level data fusion process may be determined by using a preset feature map.
- the preset feature maps may be the same.
- the preset feature map may be determined during the training of the image generation neural network.
- the preset feature map can be directly determined as the input feature map corresponding to the first-level data fusion processing, or the preset feature map can be upsampled to obtain the input feature map corresponding to the first-level data fusion processing.
- the input feature map corresponding to this level of data fusion processing is determined by using the result feature map output by the previous level of data fusion processing.
- the result feature map output by the corresponding previous-level data fusion processing can be directly used as the input feature map corresponding to this level of data fusion processing.
- an up-sampling process may also be performed on the result feature map output from the data fusion processing of the previous stage to obtain an input feature map corresponding to the data fusion processing of this stage.
- the result feature maps corresponding to the data fusion processing at each level are up-sampled step by step, so that the size of the result feature map output by the last level of data fusion processing conforms to the size of the reconstructed face image to be generated; this gives the generated reconstructed face image a higher resolution and improves its clarity.
- each feature point (together with the feature value) in the result feature map can be copied and filled to the adjacent positions of the corresponding feature points.
- if the size of the result feature map is m*n, the size of the feature map obtained after copying and filling is 2m*2n.
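The copy-and-fill step described above is ordinary 2x nearest-neighbour upsampling, which can be sketched as:

```python
import numpy as np

def copy_fill_upsample(fmap):
    """2x nearest-neighbour upsampling: each feature value is copied into
    the adjacent positions, turning an (m, n) map into a (2m, 2n) map."""
    return np.repeat(np.repeat(fmap, 2, axis=0), 2, axis=1)
```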
- the first feature fusion of the transformed texture feature data corresponding to this level of data fusion processing and the input feature map corresponding to this level of data fusion processing, which yields the intermediate feature map corresponding to this level, can be performed, for example, in either of the following ways A and B.
- Method A: for each level of data fusion processing in the multi-level data fusion processing, the transformed texture feature data corresponding to this level is used to transform the convolution kernel corresponding to this level of data fusion processing, obtaining a transformed convolution kernel.
- for example, the transformed texture feature data corresponding to this level of data fusion processing can be used to transform the convolution kernel corresponding to this level of data fusion processing as follows:
- k_cij is the convolution kernel parameter at position (i, j) in the c-th channel of the convolution kernel of this level of data fusion processing;
- w_c represents the texture feature element of the transformed texture feature data that corresponds to the c-th convolution kernel channel;
- ε is a hyperparameter used to avoid the divisor being 0;
- k′_cij represents the convolution kernel parameter at position (i, j) in the c-th channel of the transformed convolution kernel.
- the input feature map is subjected to convolution processing by using the transform convolution kernel to obtain an intermediate feature map corresponding to this level of data fusion processing.
- the input feature map can be convolved with a transformed convolution kernel:
- f_{c,x+i,y+j} represents the feature value of the feature point at position (x+i, y+j) in the c-th channel of the input feature map;
- f′_lxy represents the feature value of the feature point at position (x, y) in the l-th channel of the intermediate feature map.
- in this way, the first feature fusion of the transformed texture feature data and the input feature map is achieved. Since the convolution kernel is transformed using the transformed texture feature data, and the data amount of the convolution kernel is usually smaller than that of the input feature map, the amount of data that needs to be processed in the transformation is greatly reduced, which effectively improves data processing efficiency.
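Method A can be sketched as follows. The kernel-scaling and normalisation below are assumptions modelled on StyleGAN2-style weight modulation/demodulation, which the variable definitions above (a per-channel scale w_c and an ε guarding against a zero divisor) closely resemble; the patent's exact formulas are not reproduced in this excerpt:

```python
import numpy as np

def modulated_conv(fmap, kernel, w, eps=1e-8):
    """Transform the kernel with the transformed texture data w, then convolve.

    fmap:   (C, H, W) input feature map
    kernel: (L, C, k, k) convolution kernel (L output channels)
    w:      (C,) one texture feature element per kernel channel
    """
    # scale: k'_cij proportional to w_c * k_cij
    k_mod = kernel * w[None, :, None, None]
    # renormalise per output channel; eps avoids a zero divisor
    norm = np.sqrt((k_mod ** 2).sum(axis=(1, 2, 3), keepdims=True) + eps)
    k_mod = k_mod / norm
    # plain valid convolution with the transformed kernel
    L, C, k, _ = k_mod.shape
    H, W = fmap.shape[1] - k + 1, fmap.shape[2] - k + 1
    out = np.zeros((L, H, W))
    for l in range(L):
        for x in range(H):
            for y in range(W):
                out[l, x, y] = (fmap[:, x:x + k, y:y + k] * k_mod[l]).sum()
    return out
```

Note that only the kernel is transformed, never the (much larger) feature map, which is the efficiency argument made above.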
- Method B: for each level of data fusion processing in the multi-level data fusion processing, the transformed texture feature data corresponding to this level is used to transform the input feature map corresponding to this level of data fusion processing, obtaining a transformed feature map; then the convolution kernel corresponding to this level of data fusion processing is used to perform convolution processing on the transformed feature map, obtaining the intermediate feature map corresponding to this level of data fusion processing.
- the method of transforming the input feature map by using the transformed texture feature data is similar to the method of transforming the convolution kernel by using the transformed texture feature data in the above method A, and will not be repeated here.
- the method for performing multi-level data fusion processing on the normal vector image and the texture feature data further includes S203: performing second feature fusion on the intermediate feature map corresponding to this level of data fusion processing and the normal vector image, to obtain a result feature map corresponding to this level of data fusion processing.
- the product between the intermediate feature map and the normal vector image may be calculated, and the product result matrix is directly used as the result feature map corresponding to the data fusion process at this level.
- the resulting feature map may, for example, satisfy the following formula (4):
- f′_lxy represents the feature value of the feature point at position (x, y) in the l-th channel of the intermediate feature map;
- f″_lxy represents the feature value of the feature point at position (x, y) in the l-th channel of the result feature map;
- n_xy represents the value of the normal vector corresponding to the pixel at position (x, y) in the normal vector image.
- since the size of the normal vector image and that of the intermediate feature map may differ, before the second feature fusion of the normal vector image and the intermediate feature map is performed, the normal vector image can be resized so that its size meets the requirement of the second feature fusion with the intermediate feature map.
- the size of the normal vector image may be adjusted by performing up-sampling or down-sampling processing. The specific adjustment method is related to the size of the intermediate feature maps corresponding to the data fusion processing at all levels, and will not be repeated here.
- since the texture feature data of the face cannot fully express all the details of the face image, in order to make the finally generated reconstructed face image more realistic, after the product result matrix is obtained, the result feature map corresponding to this level of data fusion processing is obtained based on the product result matrix corresponding to this level of data fusion processing and the preset deviation matrix and/or noise matrix of this level of data fusion processing.
- the preset deviation matrix here may be a hyperparameter
- the noise matrix may be, for example, a random Gaussian noise map.
- for example, the product result matrix corresponding to this level of data fusion processing and the preset deviation matrix and/or noise matrix of this level of data fusion processing may be added together to obtain the result feature map corresponding to this level of data fusion processing.
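The second feature fusion of S203, i.e. the per-pixel product of formula (4) optionally followed by adding a preset deviation matrix and/or a Gaussian noise map, can be sketched as below. Reducing the normal vector to one scalar weight per pixel is an assumption made for simplicity; the patent leaves the combination of the three normal components open:

```python
import numpy as np

def second_feature_fusion(intermediate, normal_img, bias=None, noise=None):
    """Formula (4) sketch: f''_lxy = f'_lxy * n_xy, then optionally add
    the preset deviation matrix and/or a random Gaussian noise map.

    intermediate: (L, H, W) intermediate feature map
    normal_img:   (H, W) per-pixel normal value (scalar per pixel, an assumption)
    bias, noise:  optional (L, H, W) matrices added to the product result
    """
    result = intermediate * normal_img[None, :, :]  # elementwise product, formula (4)
    if bias is not None:
        result = result + bias                       # preset deviation matrix
    if noise is not None:
        result = result + noise                      # random Gaussian noise map
    return result
```

A caller would typically pass `noise=rng.standard_normal(result_shape)` to inject the random Gaussian noise map mentioned above.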
- the feature value of each feature point in the result feature map can be directly used as the pixel value of each pixel point in the reconstructed face image, and rendered to generate a reconstructed face image.
- the method for generating a face image provided by the embodiment of the present disclosure may be implemented by using a pre-trained first neural network.
- an example of the present disclosure provides a specific method for training a first neural network, including:
- S301 Acquire a first sample normal vector image of a first sample three-dimensional face model and first sample texture feature data.
- the method of acquiring the first sample normal vector image and the first sample texture feature data of the first sample three-dimensional face model is similar to the method of acquiring the normal vector image and texture feature data of the three-dimensional face model in S101 above, and will not be repeated here.
- S302 Using the first neural network to be trained, perform data fusion processing on the first sample normal vector image and the first sample texture feature data to obtain a reconstructed image of the first sample three-dimensional face model.
- the process of generating a reconstructed image by using the first neural network to be trained is similar to the above-mentioned process of generating a reconstructed face image, and details are not repeated here.
- the first training loss may include at least one of the following: normal vector consistency loss, face key point consistency loss, and identity consistency loss.
- obtaining the first training loss based on the reconstructed image includes: performing normal vector prediction processing on the reconstructed image to obtain the predicted normal vector image of the reconstructed image; the normal vector consistency loss is obtained by using the first sample normal vector image and the predicted normal vector image.
- a predicted three-dimensional face model may be generated based on the reconstructed image, and then a predicted normal vector image of the reconstructed image may be obtained based on the generated predicted three-dimensional face model.
- the reconstructed image I_out output by the first neural network G satisfies the following formula (5):
- I_out = G(n, z, δ)  (5)
- the face normal vector prediction network N is used to predict the normal vector image n_s of the reconstructed image I_out.
- the normal vector image n_s and the first sample normal vector image n of the first sample 3D face model should also be sufficiently consistent. Therefore, the normal vector consistency loss L_n satisfies the following formula (6):
- L_n = ||(N(I_out) − n) ⊙ P(I_out)||_1  (6)
- P(·) is the face detection network, which outputs the face area mask; this makes the normal vector consistency loss valid only in the face area.
- N(·) is the pre-trained face normal vector prediction network, used to predict the normal vector image n_s of the reconstructed image I_out; ⊙ represents element-wise multiplication.
- the face normal vector prediction network N( ⁇ ) can adopt the SfSNet (Shape from Shading Net) network.
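Under the assumption that the loss takes an L1 difference between predicted and sample normals masked to the face region (the exact norm is not fixed by this excerpt), the masked normal vector consistency computation might look like:

```python
import numpy as np

def normal_consistency_loss(pred_normals, sample_normals, face_mask):
    """Masked normal-vector consistency: penalise the difference between
    N(I_out) and the sample normals n only inside the face region P(I_out).
    The L1 norm here is an assumption; the excerpt fixes only the masking."""
    diff = (pred_normals - sample_normals) * face_mask[..., None]
    return float(np.abs(diff).mean())
```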
- obtaining the first training loss based on the reconstructed image includes: using the first neural network, performing key point identification on the first reconstructed image obtained from the first sample normal vector image and the first reference sample texture feature data, to obtain the first key point of the first reconstructed image; performing key point identification on the second reconstructed image obtained from the first sample normal vector image and the first target sample texture feature data, to obtain the second key point of the second reconstructed image; and obtaining the key point consistency loss by using the first key point and the second key point.
- the first sample normal vector images corresponding to the first reconstructed image and the second reconstructed image are the same.
- while the normal vector image is concerned with the general structure of the face surface, for the face it matters more whether the position of each key point in the face is accurate.
- two different sets of first sample texture feature data, including the first reference sample texture feature data and the first target sample texture feature data, together with the same first sample normal vector image, are used to generate the first reconstructed image and the second reconstructed image respectively.
- the key points of the face are regarded as an important constraint condition, and the training accuracy of the first neural network is further improved by utilizing the loss of consistency of the key points between the first reconstructed image and the second reconstructed image.
- the key point consistency loss L_ldmk, for example, satisfies the following formula (7):
- L_ldmk = ||H(G(n, z_1, δ)) − H(G(n, z_2, δ))||_2  (7)
- z 1 represents the texture feature data of the first reference sample
- z 2 represents the texture feature data of the first target sample
- H(·) represents the key point recognition of the image.
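A minimal sketch of the key point consistency computation, assuming H(·) has already produced (K, 2) key point arrays for the two reconstructed images (the mean Euclidean distance is an illustrative choice):

```python
import numpy as np

def keypoint_consistency_loss(kps_ref, kps_tgt):
    """Mean Euclidean distance between corresponding key points of the
    first and second reconstructed images (both arrays shaped (K, 2))."""
    return float(np.linalg.norm(kps_ref - kps_tgt, axis=-1).mean())
```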
- obtaining the first training loss based on the reconstructed image includes: using the first neural network, performing identity recognition on the third reconstructed image obtained by fusion processing of the first reference sample normal vector image and the first sample texture feature data, to obtain a first identity recognition result; performing identity recognition on the fourth reconstructed image obtained by fusion processing of the first target sample normal vector image and the first sample texture feature data, to obtain a second identity recognition result; and obtaining the identity consistency loss based on the first identity recognition result and the second identity recognition result.
- the first reference sample normal vector image and the first target sample normal vector image correspond to different facial expressions and/or facial poses of the same person.
- the reconstructed images obtained by using the first neural network include a third reconstructed image obtained based on the first reference sample normal vector image and a fourth reconstructed image obtained based on the first target sample normal vector image. Identity recognition is performed on the third reconstructed image and the fourth reconstructed image, and the identity consistency loss is obtained using the identification results of both.
- the identity consistency loss L_id, for example, satisfies the following formula (8):
- L_id = ||R(G(n(α, β_1, γ_1), z, δ)) − R(G(n(α, β_2, γ_2), z, δ))||_2  (8)
- ⁇ represents the shape of the face
- ⁇ 1 and ⁇ 2 respectively represent different facial expressions
- ⁇ 1 and ⁇ 2 represent different facial poses respectively
- R(·) represents identity recognition of the image.
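Assuming R(·) yields an identity embedding vector per image, one common way to compare the two identification results is a cosine distance; this is an illustrative choice, not necessarily the one used here:

```python
import numpy as np

def identity_consistency_loss(emb_a, emb_b):
    """Cosine distance between the identity embeddings R(.) of the third
    and fourth reconstructed images; 0 means identical identity features."""
    cos = np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
    return float(1.0 - cos)
```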
- the first training loss includes an adversarial loss
- an adversarial network of the first neural network can be constructed; the reconstructed image predicted by the first neural network is used as the input of the adversarial network, and the adversarial network is used to predict the realism of the reconstructed image.
- the loss generated by the adversarial network is the adversarial loss L_adv.
- the first training loss L_GAR when training the first neural network can be represented by the following formula (9):
- L_GAR = λ_n L_n + λ_ldmk L_ldmk + λ_id L_id + λ_adv L_adv  (9)
- ⁇ represents the weight value corresponding to the corresponding loss.
- the first neural network GAR may include: N rendering blocks 410-1, 410-2, . . . 410-N, where N is an integer greater than or equal to 1.
- the first fully-connected network 420 is used for performing the first fully-connected processing on the texture feature data z to obtain the first intermediate texture feature data z′.
- the second fully-connected networks 430-1, 430-2, . . . 430-N are used to perform second fully-connected processing on the first intermediate texture feature data z' to obtain transformed texture feature data w corresponding to the corresponding rendering block.
- Noise propagation module 440 used to propagate the noise matrix ⁇ to each rendering block 410-1, 410-2, . . . 410-N.
- an up-sampling module (not shown in the figure) for resizing the normal vector image n, so as to adjust the normal vector image to the size required by each rendering block 410-1, 410-2, . . . 410-N.
- the structure shown in FIG. 4 can obtain a reconstructed face image with strong authenticity based on the texture feature data z, the normal vector image n and the noise matrix δ.
- an example of the present disclosure also provides a specific structure of a rendering block 410, which may include:
- the convolution kernel transformation module 411 is configured to transform the convolution kernel k by using the transformed texture feature data w to obtain the transformed convolution kernel k′.
- Upsampling layer 412 used to perform upsampling processing on the result feature map output by the previous rendering block, or the preset feature map, to obtain the input feature map corresponding to the current rendering block.
- the convolution layer 413 is used to perform convolution processing on the input feature map corresponding to the current rendering block by using the transform convolution kernel k′ to obtain the intermediate feature map corresponding to the current rendering block.
- the fusion module 414 is configured to perform second feature fusion on the intermediate feature map and the normal vector image corresponding to the current rendering block to obtain a fusion feature map.
- the fused feature map is added element-wise to the noise matrix δ and/or the deviation matrix b, and the result feature map corresponding to the current rendering block can be obtained.
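Putting the pieces of a rendering block together, a heavily simplified NumPy sketch (1×1 convolution, nearest-neighbour ×2 upsampling; all shapes and names are assumptions for illustration) might read:

```python
import numpy as np

def rendering_block(prev_feat, kernel, w, normal_img, bias=0.0, noise=0.0):
    """One rendering block, heavily simplified.
    prev_feat:  (C_in, H, W) result feature map of the previous block;
    kernel:     (C_out, C_in) base 1x1 convolution kernel k;
    w:          (C_in,) transformed texture feature vector;
    normal_img: (C_out, 2H, 2W) normal-vector image already resized."""
    k = kernel * w                                        # transformed kernel k'
    up = prev_feat.repeat(2, axis=1).repeat(2, axis=2)    # upsampling layer (x2, nearest)
    mid = np.einsum('oc,chw->ohw', k, up)                 # 1x1 convolution layer
    fused = mid * normal_img                              # second feature fusion
    return fused + bias + noise                           # deviation/noise addition
```

A real rendering block would use a spatial convolution kernel rather than 1×1; the sketch only mirrors the order of operations 411 to 414 described above.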
- an embodiment of the present disclosure provides a method for generating a three-dimensional face model, including:
- S601 Perform three-dimensional face reconstruction and texture feature recovery on the target face image including the target face, to obtain the initial normal vector image of the target face and the initial texture feature data of the target face.
- S602 Obtain a three-dimensional face model of the target face based on the initial normal vector image and the initial texture feature data.
- the data obtained based on the initial normal vector image and the initial texture feature data has higher accuracy.
- the generated 3D face model of the target face also has higher accuracy.
- a pre-trained 3D face model prediction network can be used to process the target face image including the target face to obtain a 3D face model of the target face, and then the 3D face model of the target face is used to determine the initial normal vector image of the target face.
- a 3DMM algorithm may be used to perform three-dimensional face reconstruction on the target face image.
- a pre-trained second neural network can be used to recover the texture feature of the target face image including the target face, so as to obtain initial texture feature data of the target face.
- the second neural network may be obtained, for example, by training together with the first neural network provided in the embodiments of the present disclosure.
- the second neural network can be obtained by coupled training with the first neural network.
- the network structure of the second neural network can be designed as the inverse structure of the first neural network, that is, the second neural network includes a plurality of inverse rendering blocks.
- in each inverse rendering block, the upsampling layer of the corresponding rendering block in the first neural network is replaced with a convolutional layer, and the convolutional layer in the rendering block is replaced with an upsampling layer.
- each inverse rendering block can output an output feature map corresponding to it.
- the predicted initial texture feature data can be obtained by using the output feature maps output by the inverse rendering blocks at all levels.
- the output feature map of each network layer has the same size as the input feature map of the network layer corresponding to the first neural network.
- the output feature map of each inverse rendering block in the second neural network can be obtained; then, for each inverse rendering block, the mean and variance of the feature values of the feature points in its output feature map are calculated; the means and variances corresponding to all inverse rendering blocks are combined to obtain the target feature map; finally, the target feature map is fully connected using the fully connected network to obtain the initial texture feature data.
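The statistics-then-fully-connected step described above can be sketched as follows; `mlp` stands in for the fully connected network and is an assumed callable:

```python
import numpy as np

def initial_texture_code(inverse_block_feats, mlp):
    """Collect the per-channel mean and variance of every inverse rendering
    block's output feature map, concatenate them into the target feature map,
    and map it through the fully connected network `mlp` (assumed callable)."""
    stats = []
    for feat in inverse_block_feats:            # feat: (C, H, W)
        stats.append(feat.mean(axis=(1, 2)))    # per-channel mean
        stats.append(feat.var(axis=(1, 2)))     # per-channel variance
    target_feature_map = np.concatenate(stats)
    return mlp(target_feature_map)
```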
- Embodiments of the present disclosure also provide a specific method for training the second neural network, which may include: using the first neural network to process the second sample normal vector image of the second sample three-dimensional face model and the second sample texture feature data, to obtain a sample face image of the second sample three-dimensional face model; using the second neural network to be trained to process the sample face image, to obtain the predicted texture feature data corresponding to the sample face image; determining a third loss based on the predicted texture feature data and the second sample texture feature data; and training the second neural network based on the third loss.
- the acquisition method of the predicted texture feature data is similar to the acquisition method of the initial texture feature data, and details are not repeated here.
- the loss function of the third loss, for example, satisfies the following formula (10):
- L_3 = ||MLP([μ(R_i(I_out)); σ(R_i(I_out))]) − z||_2  (10)
- R represents the second neural network
- R i (I out ) represents the output feature map of the ith inverse rendering block of the second neural network R;
- G_i represents the ith rendering block of the first neural network, whose input includes the transformed texture feature data w
- ⁇ and ⁇ represent the mean values, respectively and standard deviation.
- MLP([μ(R_i(I_out)); σ(R_i(I_out))]) represents the predicted texture feature data obtained by fully connecting the target feature map using the fully connected layer MLP; z represents the second sample texture feature data; G_i(n, z, δ) represents the result feature map output by the ith rendering block of the first neural network; R_i(I_out) represents the output feature map of the ith inverse rendering block of the second neural network (to be trained).
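Assuming the third loss compares the predicted texture code with the second sample code, optionally plus a per-level feature matching term that this excerpt hints at but does not pin down, a sketch is:

```python
import numpy as np

def third_loss(pred_z, sample_z, inv_feats=None, gen_feats=None):
    """Texture-code consistency between the predicted and the second sample
    texture feature data, optionally plus a per-level term matching each
    inverse rendering block output R_i(I_out) with the corresponding
    rendering block output G_i(n, z, delta). The per-level term is an
    assumption of this sketch."""
    loss = float(np.sum((pred_z - sample_z) ** 2))
    if inv_feats is not None and gen_feats is not None:
        for r_i, g_i in zip(inv_feats, gen_feats):
            loss += float(np.mean((r_i - g_i) ** 2))
    return loss
```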
- the target face image can be processed by the second neural network to obtain an initial normal vector image corresponding to the target face image and initial texture feature data.
- the current normal vector image can be used as the normal vector image
- the current texture feature data can be used as texture feature data
- the face image generation method provided by any embodiment of the present disclosure can be used to generate a reconstructed face image, which serves as the current reconstructed face image.
- the current normal vector image is adjusted to obtain a target normal vector image of the target face.
- the following iterative process may be performed: based on the target face image and the current reconstructed face image, a second loss is obtained; the current normal vector image and the current texture feature data are adjusted to obtain a new normal vector image and new texture feature data; the new normal vector image is used as the current normal vector image, and the new texture feature data is used as the current texture feature data; return to the step of generating the current reconstructed face image based on the current normal vector image and the current texture feature data, until the second loss is less than a preset loss threshold.
- the current normal vector image corresponding to the last iteration is used as the target normal vector image.
- the 3D face model of the target face can be generated by using the target normal vector image.
- the second loss includes, for example, pixel consistency loss and/or classification feature consistency loss.
- the second loss includes pixel consistency loss
- the difference between the pixel values of the current reconstructed face image and the pixel values of the target face image is calculated, the L2 norm of the difference is calculated, and the pixel consistency loss is determined based on the result of the L2 norm calculation.
- the second loss includes the loss of classification feature consistency
- a pre-trained image classification network can be used to classify the target face image, and the first feature data output by the target network layer of the image classification network can be obtained; the image classification network is also used to perform classification processing on the current reconstructed face image, to obtain the second feature data output by the target network layer; based on the first feature data and the second feature data, the classification feature consistency loss is obtained.
- the image classification network is obtained by training with the target face image and the current reconstructed face image. The specific position of the target network layer can be determined according to the actual situation.
- the target network layer may be, for example, the first to fifth network layers in the image classification network.
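A sketch of the second loss combining the pixel consistency term (L2 norm of the pixel difference) with the classification feature consistency term over the target network layers; the relative weighting `lam` is an assumption of this example:

```python
import numpy as np

def second_loss(target_img, recon_img, target_feats, recon_feats, lam=1.0):
    """Pixel consistency (L2 norm of the pixel difference) plus the
    classification feature consistency over the target network layers."""
    pixel = float(np.sqrt(np.sum((recon_img - target_img) ** 2)))
    feat = sum(float(np.mean((a - b) ** 2))
               for a, b in zip(target_feats, recon_feats))
    return pixel + lam * feat
```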
- the loss function of the second loss may satisfy the following formula (11):
- L_f = ||G(n(α, β, γ), z, δ) − I_t||_2 + Σ_i ||F_i(G(n(α, β, γ), z, δ)) − F_i(I_t)||_2 + λ_n ||δ||_2  (11)
- G represents the first neural network; n(α, β, γ) is the normal vector image generated by α, β, γ, wherein α, β, γ have the same meanings as in the above formula (8);
- I_t represents the target face image; the first term represents the pixel consistency loss;
- F represents the classification network trained using the target face image and the current reconstructed face image;
- F_i represents the feature data output by the ith network layer of the classification network;
- λ_n represents the weight of the regularization term on the random noise.
- Minimizing the above-mentioned second loss L_f yields the optimized geometric parameters α, β and γ. Based on the optimized geometric parameters α, β and γ, the target normal vector image of the target face can be determined.
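The iterative minimization of the second loss over the geometric parameters can be sketched as a plain gradient descent loop; `render_fn` and `loss_and_grad` are assumed callables standing in for the first neural network and the second loss with its gradient:

```python
import numpy as np

def fit_geometry(target, render_fn, params, loss_and_grad,
                 lr=0.1, thresh=1e-8, max_iter=500):
    """Iteratively adjust the geometric parameters by gradient descent
    until the second loss drops below the preset loss threshold."""
    for _ in range(max_iter):
        recon = render_fn(params)                         # current reconstructed image
        loss, grad = loss_and_grad(target, recon, params)
        if loss < thresh:                                 # stop: loss below threshold
            break
        params = params - lr * grad                       # adjust current parameters
    return params
```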
- the 3D face model has higher accuracy.
- the writing order of each step does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined based on its function and possible internal logic.
- the embodiments of the present disclosure also provide a face image generating apparatus corresponding to the face image generating method. Since the principle by which the apparatus in the embodiments of the present disclosure solves the problem is similar to that of the above-mentioned face image generating method, the implementation of the apparatus may refer to the implementation of the method, and repeated descriptions will not be repeated.
- FIG. 7 is a schematic diagram of an apparatus for generating a face image provided by an embodiment of the present disclosure, the apparatus includes:
- the first acquiring module 71 is configured to acquire normal vector images and texture feature data.
- the pixel value of each pixel in the normal vector image represents the value of the normal vector of the model vertex corresponding to the pixel in the three-dimensional face model corresponding to the normal vector image.
- the first processing module 72 is configured to perform multi-level data fusion processing on the normal vector image and the texture feature data to obtain a reconstructed face image.
- the first processing module 72 when performing multi-level data fusion processing on the normal vector image and the texture feature data to obtain a reconstructed face image, is specifically used to:
- the texture feature data is subjected to feature transformation to obtain transformed texture feature data corresponding to the multi-level data fusion processing.
- for each level of data fusion processing in the multi-level data fusion processing, the first feature fusion is performed on the transformed texture feature data corresponding to this level of data fusion processing and the input feature map corresponding to this level of data fusion processing, to obtain the intermediate feature map corresponding to this level of data fusion processing; the second feature fusion is performed on the intermediate feature map corresponding to this level of data fusion processing and the normal vector image, to obtain the result feature map corresponding to this level of data fusion processing; the reconstructed face image is obtained based on the result feature map corresponding to the last level of data fusion processing.
- when the first processing module 72 performs feature transformation on the texture feature data to obtain transformed texture feature data corresponding to the multi-level data fusion processing, it is specifically configured to: perform first full connection processing on the texture feature data to obtain first intermediate texture feature data; perform multiple second full connection processings on the first intermediate texture feature data to obtain the transformed texture feature data respectively corresponding to the multi-level data fusion processing, wherein the full connection parameters of different second full connection processings are different.
- before the first processing module 72, for each level of data fusion processing in the multi-level data fusion processing, performs the first feature fusion on the transformed texture feature data corresponding to this level of data fusion processing and the input feature map corresponding to this level of data fusion processing to obtain the intermediate feature map corresponding to this level of data fusion processing, it is also used for: for the first-level data fusion processing in the multi-level data fusion processing, up-sampling the preset feature map to obtain the input feature map corresponding to the first-level data fusion processing; for any level of data fusion processing that is not the first-level data fusion processing in the multi-level data fusion processing, up-sampling the result feature map output by the upper-level data fusion processing to obtain the input feature map corresponding to this level of data fusion processing.
- when the first processing module 72, for each level of data fusion processing in the multi-level data fusion processing, performs the first feature fusion on the transformed texture feature data corresponding to this level of data fusion processing and the input feature map corresponding to this level of data fusion processing to obtain the intermediate feature map corresponding to this level of data fusion processing, it is specifically used for: for each level of data fusion processing in the multi-level data fusion processing, using the transformed texture feature data corresponding to this level of data fusion processing to transform the convolution kernel corresponding to this level of data fusion processing, obtaining the transformed convolution kernel; and performing convolution processing on the input feature map using the transformed convolution kernel to obtain the intermediate feature map corresponding to this level of data fusion processing.
- alternatively, when performing the first feature fusion, the first processing module 72 is specifically used for: for each level of data fusion processing in the multi-level data fusion processing, using the transformed texture feature data corresponding to this level of data fusion processing to transform the input feature map corresponding to this level of data fusion processing, obtaining the transformed feature map; then, performing convolution processing on the transformed feature map using the convolution kernel corresponding to this level of data fusion processing, to obtain the intermediate feature map corresponding to this level of data fusion processing.
- when the first processing module 72 performs the second feature fusion on the intermediate feature map corresponding to this level of data fusion processing and the normal vector image to obtain the result feature map corresponding to this level of data fusion processing, it is specifically used to: determine the product result matrix of the intermediate feature map corresponding to this level of data fusion processing and the normal vector image, and determine the result feature map corresponding to this level of data fusion processing based on the product result matrix.
- when determining the result feature map corresponding to this level of data fusion processing based on the product result matrix, the first processing module 72 is used to: obtain the result feature map corresponding to this level of data fusion processing based on the product result matrix corresponding to this level of data fusion processing and the preset deviation matrix and/or noise matrix of this level of data fusion processing.
- when the first processing module 72 obtains the result feature map corresponding to this level of data fusion processing based on the product result matrix corresponding to this level of data fusion processing and the preset deviation matrix and/or noise matrix of this level of data fusion processing, it is specifically used for: performing element-wise addition of the product result matrix corresponding to this level of data fusion processing and the preset deviation matrix and/or noise matrix of this level of data fusion processing, to obtain the result feature map corresponding to this level of data fusion processing.
- the face image generation method is implemented by a pre-trained first neural network.
- the face image generating apparatus further includes: a first training module 73 for obtaining the first neural network by adopting the following method: obtaining the first sample three-dimensional face model of the first The sample normal vector image and the first sample texture feature data; using the first neural network to be trained, perform data fusion processing on the first sample normal vector image and the first sample texture feature data, obtaining a reconstructed image of the first sample three-dimensional face model; obtaining a first training loss based on the reconstructed image, and using the first training loss to train the first neural network.
- the first training loss includes at least one of the following: normal vector consistency loss, face key point consistency loss, identity consistency loss, and adversarial loss.
- the first training loss includes a normal vector consistency loss
- when acquiring the first training loss based on the reconstructed image, the first training module 73 is specifically configured to: perform normal vector prediction processing on the reconstructed image to obtain a predicted normal vector image of the reconstructed image; and obtain the normal vector consistency loss using the first sample normal vector image and the predicted normal vector image.
- the first training loss includes keypoint consistency loss
- the first sample texture feature data includes: first reference sample texture feature data and first target sample texture feature data
- the reconstructed image includes a first reconstructed image obtained based on the first reference sample texture feature data, and a second reconstructed image obtained based on the first target sample texture feature data.
- when acquiring the first training loss based on the reconstructed image, the first training module 73 is specifically configured to: perform key point identification on the first reconstructed image to obtain the first key point of the first reconstructed image; perform key point identification on the second reconstructed image to obtain the second key point of the second reconstructed image; and obtain the key point consistency loss using the first key point and the second key point.
- the first training loss includes identity consistency loss.
- when acquiring the first training loss based on the reconstructed image, the first training module 73 is specifically used for: using the first neural network, performing identity recognition on the third reconstructed image obtained by fusion processing of the first reference sample normal vector image and the first sample texture feature data, to obtain a first identity recognition result; performing identity recognition on the fourth reconstructed image obtained by fusion processing of the first target sample normal vector image and the first sample texture feature data, to obtain a second identity recognition result; and obtaining the identity consistency loss based on the first identity recognition result and the second identity recognition result.
- an embodiment of the present disclosure further provides a device for generating a three-dimensional face model, including:
- the second acquisition module 81 is configured to perform three-dimensional face reconstruction and texture feature recovery on the target face image including the target face, to obtain the initial normal vector image of the target face and the initial texture feature data of the target face.
- the second processing module 82 is configured to obtain a three-dimensional face model of the target face based on the initial normal vector image and the initial texture feature data.
- when obtaining the three-dimensional face model of the target face based on the initial normal vector image and the initial texture feature data, the second processing module 82 is specifically used for: using the initial normal vector image as the current normal vector image and the initial texture feature data as the current texture feature data, generating a current reconstructed face image based on the current normal vector image and the current texture feature data; adjusting the current normal vector image based on the target face image and the current reconstructed face image, to obtain the target normal vector image of the target face; and generating the three-dimensional face model of the target face based on the target normal vector image.
- when generating the reconstructed face image based on the current normal vector image and the current texture feature data, the second processing module 82 is specifically configured to: use the current normal vector image as the normal vector image and the current texture feature data as the texture feature data, and generate the reconstructed face image as the current reconstructed face image by using the face image generation method described in any one of the first aspect.
- when the second processing module 82 adjusts the current normal vector image based on the target face image and the current reconstructed face image to obtain the target normal vector image of the target face, it is specifically used for: obtaining a second loss based on the target face image and the current reconstructed face image; adjusting the current normal vector image and the current texture feature data to obtain a new normal vector image and new texture feature data; using the new normal vector image as the current normal vector image and the new texture feature data as the current texture feature data, and returning to the step of generating the current reconstructed face image based on the current normal vector image and the current texture feature data, until the second loss is less than a preset loss threshold.
- the second loss may include pixel consistency loss and/or classification feature consistency loss.
- the second processing module 82, when obtaining the second loss based on the target face image and the current reconstructed face image, is specifically configured to: classify the target face image using a pre-trained image classification network to obtain first feature data output by a target network layer of the image classification network; classify the current reconstructed face image using the image classification network to obtain second feature data output by the target network layer; and obtain the classification feature consistency loss based on the first feature data and the second feature data.
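The classification feature consistency loss compares intermediate activations of the same frozen classifier on the two images. A minimal sketch, with a random one-layer network standing in for the pre-trained image classification network (the layer size and mean-squared comparison are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
W_hidden = rng.standard_normal((64, 32))   # frozen weights of a toy classifier

def target_layer_features(image_vec):
    """Activations of the designated target network layer of the classifier."""
    return np.maximum(image_vec @ W_hidden, 0.0)   # ReLU hidden layer

def classification_feature_consistency_loss(target_img, recon_img):
    first_feats = target_layer_features(target_img)    # first feature data
    second_feats = target_layer_features(recon_img)    # second feature data
    return float(np.mean((first_feats - second_feats) ** 2))

img = rng.random(64)
same = classification_feature_consistency_loss(img, img)
diff = classification_feature_consistency_loss(img, rng.random(64))
```

The loss is zero when the reconstruction matches the target exactly and grows as their classifier features diverge.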
- the image classification network can be obtained by training with the target face image and the current reconstructed face image.
- the second acquisition module 81, when performing texture feature recovery on the target face image including the target face to obtain the initial texture feature data of the target face, is specifically configured to: perform texture feature recovery on the target face image using the pre-trained second neural network to obtain the initial texture feature data of the target face.
- the second neural network may be obtained through training that uses the first neural network described in any embodiment of the present disclosure.
- a second training module 83 configured to train the second neural network in the following manner: process the second sample normal vector image and the second sample texture feature data of the second sample three-dimensional face model using the first neural network to obtain a sample face image; process the sample face image using the second neural network to be trained to obtain predicted texture feature data corresponding to the sample face image; determine a third loss based on the predicted texture feature data and the second sample texture feature data; and train the second neural network based on the third loss.
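This training procedure can be sketched end to end. In this minimal illustration the first neural network is replaced by a fixed linear map `A` (a toy stand-in, not the disclosed architecture), and the second neural network is a single linear layer trained by gradient descent on the third loss:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy stand-in for the trained first neural network: it "renders" a flattened
# sample face image from second-sample texture feature data via a fixed map A.
A = rng.standard_normal((16, 4))
codes = rng.standard_normal((50, 4))      # second-sample texture feature data
images = codes @ A.T                      # sample face images

# Second neural network to be trained: here a single linear layer W.
W = np.zeros((16, 4))
lr = 0.02
for _ in range(2000):
    preds = images @ W                    # predicted texture feature data
    third_loss = float(np.mean((preds - codes) ** 2))
    grad = 2 * images.T @ (preds - codes) / preds.size
    W -= lr * grad                        # train on the third loss

# The trained network recovers texture codes from an unseen rendered image.
test_code = rng.standard_normal(4)
recovered = (A @ test_code) @ W
```

Because the sample images are generated from known texture codes, the third loss supervises the second network directly, without manually labeled texture data.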
- An embodiment of the present disclosure further provides an electronic device.
- As shown in the schematic structural diagram of the electronic device provided by an embodiment of the present disclosure, the electronic device includes: a processor 91 and a memory 92.
- the memory 92 stores machine-readable instructions executable by the processor 91, and the processor 91 is configured to execute the machine-readable instructions stored in the memory 92; when the machine-readable instructions are executed, the processor 91 performs the steps of the face image generation method or the three-dimensional face model generation method described in the foregoing embodiments.
- the pixel value of each pixel in the normal vector image represents the value of the normal vector of the model vertex corresponding to the pixel in the three-dimensional face model corresponding to the normal vector image.
- the above-mentioned memory 92 includes a memory 921 and an external memory 922 .
- the memory 921 here is also called internal memory, and is used to temporarily store operation data in the processor 91 and data exchanged with the external memory 922 such as a hard disk.
- the processor 91 exchanges data with the external memory 922 through the memory 921 .
- Embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the face image generation method or the three-dimensional face model generation method described in the foregoing method embodiments are executed. The computer-readable storage medium may be a volatile or non-volatile computer-readable storage medium.
- Embodiments of the present disclosure further provide a computer program product carrying program code, where the program code includes instructions that can be used to execute the face image generation method or the three-dimensional face model generation method described in the above method embodiments.
- the above-mentioned computer program product can be specifically implemented by means of hardware, software or a combination thereof.
- In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK).
- the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- the functions, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a processor-executable non-volatile computer-readable storage medium.
- the computer software product is stored in a storage medium and includes several instructions used to cause an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
- the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media that can store program code.
Abstract
Description
Claims (24)
- A face image generation method, characterized by comprising: acquiring a normal vector image and texture feature data, wherein the pixel value of each pixel in the normal vector image represents the value of the normal vector of the model vertex corresponding to that pixel in the three-dimensional face model corresponding to the normal vector image; and performing multi-level data fusion processing on the normal vector image and the texture feature data to obtain a reconstructed face image.
- The face image generation method according to claim 1, characterized in that performing multi-level data fusion processing on the normal vector image and the texture feature data to obtain the reconstructed face image comprises: performing feature transformation on the texture feature data to obtain transformed texture feature data corresponding to each level of the multi-level data fusion processing; for each level of the multi-level data fusion processing, performing first feature fusion on the transformed texture feature data corresponding to that level and the input feature map corresponding to that level to obtain an intermediate feature map corresponding to that level, and performing second feature fusion on the intermediate feature map corresponding to that level and the normal vector image to obtain a result feature map corresponding to that level; and obtaining the reconstructed face image based on the result feature map corresponding to the last level of the multi-level data fusion processing.
- The face image generation method according to claim 2, characterized in that performing feature transformation on the texture feature data to obtain the transformed texture feature data corresponding to each level of the multi-level data fusion processing comprises: performing first fully connected processing on the texture feature data to obtain first intermediate texture feature data; and performing multiple second fully connected processings on the first intermediate texture feature data to obtain the transformed texture feature data corresponding to each level of the multi-level data fusion processing, wherein different second fully connected processings have different fully connected parameters.
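The two-stage fully connected transformation of claim 3 can be sketched as follows. The layer sizes, the `tanh` nonlinearity, and the number of levels are illustrative assumptions; the claim only requires one shared first stage followed by per-level second stages with different parameters.

```python
import numpy as np

rng = np.random.default_rng(3)
texture = rng.standard_normal(8)                 # texture feature data

# First fully connected processing -> first intermediate texture feature data.
W_first = rng.standard_normal((8, 16))
intermediate = np.tanh(texture @ W_first)

# One second fully connected processing per fusion level,
# each with its own (different) fully connected parameters.
num_levels = 4
W_levels = [rng.standard_normal((16, 16)) for _ in range(num_levels)]
level_codes = [intermediate @ W for W in W_levels]
```

Each element of `level_codes` is the transformed texture feature data consumed by one level of the multi-level data fusion processing.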
- The face image generation method according to claim 2 or 3, characterized in that, for the first level of the multi-level data fusion processing, a preset feature map is upsampled to obtain the input feature map corresponding to the first level; and for any level of the multi-level data fusion processing other than the first level, the result feature map output by the previous level of data fusion processing is upsampled to obtain the input feature map corresponding to that level.
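The upsampling chain of claim 4 can be sketched with nearest-neighbour 2x upsampling (the upsampling factor, the 4x4 constant start, and the three levels are assumptions; the fusion at each level is omitted here, so each level's "result" is just its input):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

preset = np.ones((4, 4))            # preset feature map
inputs = []
feat = preset
for level in range(3):              # three fusion levels
    feat = upsample2x(feat)         # input feature map for this level
    inputs.append(feat)
    # ...first and second feature fusion would produce this level's
    # result feature map, which feeds the next level's upsampling.
```

The spatial resolution doubles at every level, so the last level's result feature map is large enough to yield the reconstructed face image.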
- The face image generation method according to any one of claims 2 to 4, characterized in that performing first feature fusion on the transformed texture feature data corresponding to that level and the input feature map corresponding to that level to obtain the intermediate feature map corresponding to that level comprises: transforming the convolution kernel corresponding to that level using the transformed texture feature data corresponding to that level to obtain a transformed convolution kernel; and performing convolution processing on the input feature map using the transformed convolution kernel to obtain the intermediate feature map corresponding to that level.
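The kernel-transform variant of the first feature fusion can be sketched with a single-channel convolution. Scaling the kernel by a scalar code is the simplest possible transform and is an assumption for illustration; the claim leaves the transform unspecified.

```python
import numpy as np

def modulated_conv2d(feat, kernel, style_scale):
    """First feature fusion, kernel-transform variant: transform the
    convolution kernel with the level's transformed texture code, then
    convolve the input feature map with the transformed kernel (valid mode)."""
    k = kernel * style_scale                    # transformed convolution kernel
    kh, kw = k.shape
    H, W = feat.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(feat[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(5)
feat = rng.random((6, 6))                       # input feature map of this level
kernel = rng.standard_normal((3, 3))
out = modulated_conv2d(feat, kernel, style_scale=2.0)
```

For this scalar transform, scaling the kernel or scaling the input feature map gives the same result by linearity, which mirrors the alternative formulation in the next claim.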
- The face image generation method according to any one of claims 2 to 4, characterized in that performing first feature fusion on the transformed texture feature data corresponding to that level and the input feature map corresponding to that level to obtain the intermediate feature map corresponding to that level comprises: transforming the input feature map corresponding to that level using the transformed texture feature data corresponding to that level to obtain a transformed feature map; and performing convolution processing on the transformed feature map using the convolution kernel corresponding to that level to obtain the intermediate feature map corresponding to that level.
- The face image generation method according to any one of claims 2 to 6, characterized in that performing second feature fusion on the intermediate feature map corresponding to that level and the normal vector image to obtain the result feature map corresponding to that level comprises: determining a product result matrix of the intermediate feature map corresponding to that level and the normal vector image; and determining the result feature map corresponding to that level based on the product result matrix.
- The face image generation method according to claim 7, characterized in that determining the result feature map corresponding to that level based on the product result matrix comprises: obtaining the result feature map corresponding to that level based on the product result matrix corresponding to that level and a preset bias matrix and/or noise matrix of that level of data fusion processing.
- The face image generation method according to claim 8, characterized in that obtaining the result feature map corresponding to that level based on the product result matrix corresponding to that level and the preset bias matrix and/or noise matrix of that level of data fusion processing comprises: performing positionwise addition of the product result matrix corresponding to that level and the preset bias matrix and/or noise matrix of that level of data fusion processing to obtain the result feature map corresponding to that level.
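Claims 7 to 9 describe the second feature fusion: an elementwise product with the normal vector image followed by positionwise addition of preset bias and/or noise matrices. A minimal sketch (in practice the normal vector image would be resized to the feature map's resolution; that step is omitted here):

```python
import numpy as np

def second_feature_fusion(intermediate, normal_img, bias=None, noise=None):
    """Elementwise product with the normal vector image gives the product
    result matrix; bias and/or noise matrices are then added positionwise."""
    result = intermediate * normal_img          # product result matrix
    if bias is not None:
        result = result + bias                  # positionwise addition
    if noise is not None:
        result = result + noise
    return result

inter = np.array([[1.0, 2.0], [3.0, 4.0]])      # intermediate feature map
normals = np.array([[0.5, 0.5], [2.0, 2.0]])    # (resized) normal vector image
bias = np.full((2, 2), 0.1)                     # preset bias matrix
out = second_feature_fusion(inter, normals, bias=bias)
```

The product injects the face geometry into every fusion level, while the optional bias and noise terms give each level its own learned offset and stochastic detail.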
- The face image generation method according to any one of claims 1 to 9, characterized in that the face image generation method is implemented by a first neural network, and the first neural network is trained in the following manner: acquiring a first sample normal vector image and first sample texture feature data of a first sample three-dimensional face model; performing data fusion processing on the first sample normal vector image and the first sample texture feature data using the first neural network to be trained to obtain a reconstructed image of the first sample three-dimensional face model; obtaining a first training loss based on the reconstructed image; and training the first neural network using the first training loss.
- The face image generation method according to claim 10, characterized in that the first training loss includes a normal vector consistency loss, and obtaining the first training loss based on the reconstructed image comprises: performing normal vector prediction processing on the reconstructed image to obtain a predicted normal vector image of the reconstructed image; and obtaining the normal vector consistency loss using the first sample normal vector image and the predicted normal vector image.
- The face image generation method according to claim 10 or 11, characterized in that the first training loss includes a keypoint consistency loss, and the first sample texture feature data includes first reference sample texture feature data and first target sample texture feature data; obtaining the first training loss based on the reconstructed image comprises: performing keypoint recognition on a first reconstructed image, obtained by the first neural network based on the first sample normal vector image and the first reference sample texture feature data, to obtain first keypoints of the first reconstructed image; performing keypoint recognition on a second reconstructed image, obtained by the first neural network based on the first sample normal vector image and the first target sample texture feature data, to obtain second keypoints of the second reconstructed image; and obtaining the keypoint consistency loss using the first keypoints and the second keypoints.
- The face image generation method according to any one of claims 10 to 12, characterized in that the first training loss includes an identity consistency loss, and the first sample normal vector image of the first sample three-dimensional face model includes a first reference sample normal vector image and a first target sample normal vector image, where the first reference sample normal vector image and the first target sample normal vector image correspond to different facial expressions and/or face poses; obtaining the first training loss based on the reconstructed image comprises: performing identity recognition on a third reconstructed image, obtained by the first neural network based on the first reference sample normal vector image and the first sample texture feature data, to obtain a first identity recognition result; performing identity recognition on a fourth reconstructed image, obtained by the first neural network based on the first target sample normal vector image and the first sample texture feature data, to obtain a second identity recognition result; and obtaining the identity consistency loss based on the first identity recognition result and the second identity recognition result.
- A three-dimensional face model generation method, characterized by comprising: performing three-dimensional face reconstruction and texture feature recovery on a target face image including a target face to obtain an initial normal vector image of the target face and initial texture feature data of the target face; and obtaining a three-dimensional face model of the target face based on the initial normal vector image and the initial texture feature data.
- The three-dimensional face model generation method according to claim 14, characterized in that obtaining the three-dimensional face model of the target face based on the initial normal vector image and the initial texture feature data comprises: taking the initial normal vector image as a current normal vector image and the initial texture feature data as current texture feature data; generating a current reconstructed face image based on the current normal vector image and the current texture feature data; adjusting the current normal vector image based on the target face image and the current reconstructed face image to obtain a target normal vector image of the target face; and generating the three-dimensional face model of the target face based on the target normal vector image.
- The three-dimensional face model generation method according to claim 15, characterized in that generating the current reconstructed face image based on the current normal vector image and the current texture feature data comprises: taking the current normal vector image as the normal vector image and the current texture feature data as the texture feature data, and generating a reconstructed face image, as the current reconstructed face image, using the face image generation method according to any one of claims 1 to 13.
- The three-dimensional face model generation method according to claim 15 or 16, characterized in that adjusting the current normal vector image based on the target face image and the current reconstructed face image to obtain the target normal vector image of the target face comprises: obtaining a second loss based on the target face image and the current reconstructed face image; adjusting the current normal vector image and the current texture feature data using the second loss to obtain a new normal vector image and new texture feature data; taking the new normal vector image as the current normal vector image and the new texture feature data as the current texture feature data, and returning to the step of generating the current reconstructed face image based on the current normal vector image and the current texture feature data, until the second loss is less than a preset loss threshold; and taking the current normal vector image corresponding to the last iteration as the target normal vector image.
- The face model generation method according to claim 17, characterized in that the second loss includes a classification feature consistency loss, and obtaining the second loss based on the target face image and the current reconstructed face image comprises: classifying the target face image using a pre-trained image classification network to obtain first feature data output by a target network layer of the image classification network; classifying the current reconstructed face image using the image classification network to obtain second feature data output by the target network layer; and obtaining the classification feature consistency loss based on the first feature data and the second feature data, wherein the image classification network is trained using the target face image and the current reconstructed face image.
- The three-dimensional face model generation method according to any one of claims 14 to 18, characterized in that performing texture feature recovery on the target face image including the target face to obtain the initial texture feature data of the target face comprises: performing texture feature recovery on the target face image using a pre-trained second neural network to obtain the initial texture feature data of the target face.
- The face model generation method according to claim 19, characterized in that the second neural network is trained in the following manner: processing a sample face image using the second neural network to be trained to obtain predicted texture feature data corresponding to the sample face image, wherein the sample face image is obtained by processing a second sample normal vector image and second sample texture feature data of a second sample three-dimensional face model according to the face image generation method of any one of claims 1 to 13; determining a third loss based on the predicted texture feature data and the second sample texture feature data; and training the second neural network based on the third loss.
- A face image generation apparatus, characterized by comprising: a first acquisition module configured to acquire a normal vector image and texture feature data, wherein the pixel value of each pixel in the normal vector image represents the value of the normal vector of the model vertex corresponding to that pixel in the three-dimensional face model corresponding to the normal vector image; and a first processing module configured to perform multi-level data fusion processing on the normal vector image and the texture feature data to obtain a reconstructed face image.
- A three-dimensional face model generation apparatus, characterized by comprising: a second acquisition module configured to perform three-dimensional face reconstruction and texture feature recovery on a target face image including a target face to obtain an initial normal vector image of the target face and initial texture feature data of the target face; and a second processing module configured to obtain a three-dimensional face model of the target face based on the initial normal vector image and the initial texture feature data.
- An electronic device, characterized by comprising a processor and a memory, wherein the memory stores machine-readable instructions executable by the processor, and the processor is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed by the processor, the processor performs the face image generation method according to any one of claims 1 to 13, or the three-dimensional face model generation method according to any one of claims 14 to 20.
- A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium; when the computer program is run by an electronic device, the electronic device performs the face image generation method according to any one of claims 1 to 13, or the three-dimensional face model generation method according to any one of claims 14 to 20.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110387786.7 | 2021-04-09 | ||
CN202110387786.7A CN112926543A (zh) | 2021-04-09 | 2021-04-09 | Image generation and three-dimensional model generation method and apparatus, electronic device, and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022213623A1 true WO2022213623A1 (zh) | 2022-10-13 |
Family
ID=76174048
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/133390 WO2022213623A1 (zh) | 2021-04-09 | 2021-11-26 | Image generation and three-dimensional face model generation method and apparatus, electronic device, and storage medium |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN112926543A (zh) |
TW (1) | TW202240531A (zh) |
WO (1) | WO2022213623A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN116523414A (zh) * | 2023-06-29 | 2023-08-01 | 深圳市鑫冠亚科技有限公司 | Production management method and system for a composite nickel-copper heat dissipation base plate |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN112926543A (zh) | 2021-04-09 | 2021-06-08 | 商汤集团有限公司 | Image generation and three-dimensional model generation method and apparatus, electronic device, and medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN104598878A (zh) * | 2015-01-07 | 2015-05-06 | 深圳市唯特视科技有限公司 | Multi-modal face recognition apparatus and method based on multi-layer fusion of grayscale and depth information |
- CN107292234A (zh) * | 2017-05-17 | 2017-10-24 | 南京邮电大学 | Indoor scene layout estimation method based on information edges and multi-modal features |
- CN108229548A (zh) * | 2017-12-27 | 2018-06-29 | 华为技术有限公司 | Object detection method and apparatus |
- CN110428491A (zh) * | 2019-06-24 | 2019-11-08 | 北京大学 | Three-dimensional face reconstruction method, apparatus, device, and medium based on a single-frame image |
- CN112926543A (zh) * | 2021-04-09 | 2021-06-08 | 商汤集团有限公司 | Image generation and three-dimensional model generation method and apparatus, electronic device, and medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN111739146B (zh) * | 2019-03-25 | 2024-07-30 | 华为技术有限公司 | Object three-dimensional model reconstruction method and apparatus |
- CN110020620B (zh) * | 2019-03-29 | 2021-07-30 | 中国科学院深圳先进技术研究院 | Face recognition method, apparatus, and device under large pose |
- CN111784821B (zh) * | 2020-06-30 | 2023-03-14 | 北京市商汤科技开发有限公司 | Three-dimensional model generation method and apparatus, computer device, and storage medium |
- CN111882643A (zh) * | 2020-08-10 | 2020-11-03 | 网易(杭州)网络有限公司 | Three-dimensional face construction method and apparatus, and electronic device |
-
2021
- 2021-04-09 CN CN202110387786.7A patent/CN112926543A/zh active Pending
- 2021-11-26 WO PCT/CN2021/133390 patent/WO2022213623A1/zh active Application Filing
- 2021-12-17 TW TW110147533A patent/TW202240531A/zh unknown
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN116523414A (zh) * | 2023-06-29 | 2023-08-01 | 深圳市鑫冠亚科技有限公司 | Production management method and system for a composite nickel-copper heat dissipation base plate |
- CN116523414B (zh) * | 2023-06-29 | 2023-09-05 | 深圳市鑫冠亚科技有限公司 | Production management method and system for a composite nickel-copper heat dissipation base plate |
Also Published As
Publication number | Publication date |
---|---|
CN112926543A (zh) | 2021-06-08 |
TW202240531A (zh) | 2022-10-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21935840 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21935840 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 26.03.2024) |