WO2022087941A1 - Training method and apparatus for face reconstruction model, face reconstruction method and apparatus, electronic device, and readable storage medium - Google Patents

Training method and apparatus for face reconstruction model, face reconstruction method and apparatus, electronic device, and readable storage medium

Info

Publication number
WO2022087941A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
image
face image
loss
network model
Prior art date
Application number
PCT/CN2020/124657
Other languages
English (en)
French (fr)
Inventor
卢运华
张丽杰
陈冠男
刘瀚文
Original Assignee
京东方科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 filed Critical 京东方科技集团股份有限公司
Priority to CN202080002537.5A priority Critical patent/CN114981835A/zh
Priority to PCT/CN2020/124657 priority patent/WO2022087941A1/zh
Publication of WO2022087941A1 publication Critical patent/WO2022087941A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting

Definitions

  • the present disclosure relates to the technical field of face reconstruction, and in particular, to a training method and apparatus for a face reconstruction model, an electronic device, and a readable storage medium.
  • Face reconstruction refers to a technology of reconstructing a face image of a person included in the video data, and is widely used in person identification and tracking. In related technologies, the accuracy and clarity of face reconstruction are poor.
  • Embodiments of the present disclosure provide a training method and apparatus for a face reconstruction model, and a face reconstruction method and apparatus, so as to solve the problem of poor accuracy and clarity of face reconstruction.
  • an embodiment of the present disclosure provides a method for training a face reconstruction model, including the following steps:
  • the training data includes a target face image and a first face image corresponding to the target face image, and the definition of the first face image is lower than that of the target face image;
  • wherein the second network model is a discriminative network model that takes a face image as input and outputs a discrimination result on the authenticity of the input face image, and the discrimination result includes discrimination results on the overall authenticity of the input face image and on the authenticity of its local features;
  • the trained first network model is used as a face reconstruction model, wherein, when the training is completed, the values of the first loss function and the second loss function both reach corresponding target thresholds.
  • the second loss function includes a first discriminative adversarial loss
  • the second network model includes a global discriminative sub-network
  • the acquiring the second loss function corresponding to the second network model includes:
  • the first discriminative adversarial loss is obtained according to the first discrimination result and the second discrimination result.
  • the second loss function includes a second discriminative adversarial loss and a third discriminative adversarial loss
  • the second network model further includes an eye discrimination sub-network and a mouth discrimination sub-network
  • the acquiring the second loss function corresponding to the second network model includes:
  • the first eye image and the first mouth image are marked as false, and the second eye image and the second mouth image are marked as true; the first eye image and the second eye image are respectively input into the eye discrimination sub-network to output a third discrimination result and a fourth discrimination result respectively; the first mouth image and the second mouth image are respectively input into the mouth discrimination sub-network to output a fifth discrimination result and a sixth discrimination result respectively;
  • a second discriminative adversarial loss is obtained according to the third discrimination result and the fourth discrimination result;
  • a third discriminative adversarial loss is obtained according to the fifth discrimination result and the sixth discrimination result.
  • the first loss function includes a first sub-loss and a second sub-loss
  • the obtaining the first loss function corresponding to the first network model includes:
  • the first face part map and the second face part map correspond to different regions of the same face image, the first face part map and the third face part map correspond to the same region of different face images, and the second face part map and the fourth face part map correspond to the same region of different face images;
  • the second sub-loss is obtained according to the difference between the second face part map and the fourth face part map.
  • the first face part map includes an image of the facial features of a face image
  • the second face part map includes a skin image of the face image
  • the first loss function includes a third sub-loss
  • the acquiring the first loss function corresponding to the first network model further includes:
  • the third sub-loss is obtained according to the difference between the first feature point data and the second feature point data.
  • the first feature point data includes a heat map of the target face image
  • the second feature point data includes a heat map of the second face image
  • the heat map includes one or more of a left-eye heat map, a right-eye heat map, a nose heat map, a mouth heat map and a face contour heat map of the face image
  • the first loss function includes a fourth sub-loss
  • the acquiring the first loss function corresponding to the first network model further includes:
  • the fourth sub-loss is obtained according to the difference between the first eigenvector and the second eigenvector.
  • the first loss function includes a fifth sub-loss
  • the acquiring the first loss function corresponding to the first network model further includes:
  • the fifth sub-loss is obtained according to the difference between the target face image and the second face image.
  • the first loss function includes one or more of a sixth sub-loss and a seventh sub-loss
  • the acquiring the first loss function corresponding to the first network model further includes:
  • a perceptual loss based on the difference between the eye region image of the target face image and the eye region image of the second face image is taken as the sixth sub-loss; and/or
  • a perceptual loss based on the difference between the mouth region image of the target face image and the mouth region image of the second face image is taken as the seventh sub-loss.
  • the first loss function includes an eighth sub-loss
  • the acquiring the first loss function corresponding to the first network model further includes:
  • the eighth sub-loss is obtained according to a generative adversarial loss between the first network model and the second network model, wherein the second network model includes one or more of a global discrimination sub-network, an eye discrimination sub-network and a mouth discrimination sub-network, and the generative adversarial loss is determined by marking the second face image output by the first network model as true, inputting the second face image into one or more of the global discrimination sub-network, the eye discrimination sub-network and the mouth discrimination sub-network to obtain a discrimination result, and determining the loss according to the obtained discrimination result.
  • an embodiment of the present disclosure provides a face reconstruction method, including the following steps:
  • the input image is input into a face reconstruction model, and a face reconstruction image is obtained, wherein the face reconstruction model is obtained by performing model training with the training method of any one of the face reconstruction model in the first aspect.
  • an embodiment of the present disclosure provides a training device for a face reconstruction model, including:
  • a training data acquisition module, configured to acquire training data, wherein the training data includes a target face image and a first face image corresponding to the target face image, and the definition of the first face image is lower than that of the target face image;
  • a first input module, configured to input the first face image into a first network model to obtain a second face image, wherein the first network model is a generative network model that takes a face image as input and outputs a reconstructed image of the input face image;
  • a second input module, configured to input the target face image and the second face image into a second network model to obtain a discrimination result, wherein the second network model is a discriminative network model that takes a face image as input and outputs a discrimination result on the authenticity of the input face image, and the discrimination result includes discrimination results on the overall authenticity of the input face image and on the authenticity of its local features;
  • a first loss function obtaining module configured to obtain a first loss function corresponding to the first network model, and adjust parameters of the first network model according to the first loss function
  • a second loss function obtaining module configured to obtain a second loss function corresponding to the second network model, and adjust parameters of the second network model according to the second loss function
  • a training module configured to perform model training on the first network model and the second network model in turn;
  • a face reconstruction model confirmation module, configured to use the trained first network model as a face reconstruction model, wherein, when the training is completed, the values of the first loss function and the second loss function both reach corresponding target thresholds.
  • an embodiment of the present disclosure provides a face reconstruction device, including:
  • an input image acquisition module, configured to acquire an input image;
  • an input module, configured to input the input image into a face reconstruction model to obtain a face reconstruction image, wherein the face reconstruction model is obtained by performing model training with the training method for a face reconstruction model according to any one of the first aspect.
  • an embodiment of the present disclosure provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein, when the computer program is executed by the processor, the steps of the training method for a face reconstruction model according to any one of the first aspect, or the steps of the face reconstruction method according to the second aspect, are implemented.
  • an embodiment of the present disclosure provides a readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the steps of the training method for a face reconstruction model according to any one of the first aspect, or the steps of the face reconstruction method according to the second aspect, are implemented.
  • the embodiments of the present disclosure perform model training on the first network model and the second network model by establishing a generative adversarial network that includes the first network model and the second network model, wherein the second network model outputs discrimination results on both the overall authenticity and the authenticity of local features of the input face image
  • accurate discrimination results help improve the accuracy with which the output of the first network model is judged, thereby improving the accuracy of the trained reconstruction model for face image reconstruction, and also help increase the iteration speed, thereby improving model training efficiency.
  • FIG. 1 is a flowchart of a training method for a face reconstruction model provided by an embodiment of the present disclosure
  • FIG. 2 is a flowchart of a face reconstruction method provided by an embodiment of the present disclosure
  • FIG. 3 is a structural diagram of a training device for a face reconstruction model provided by an embodiment of the present disclosure
  • FIG. 4 is a structural diagram of a face reconstruction apparatus provided by an embodiment of the present disclosure.
  • the embodiments of the present disclosure provide a training method for a face reconstruction model.
  • the training method of the face reconstruction model includes the following steps:
  • Step 101 Acquire training data.
  • the training data in this embodiment is also called a training set.
  • the training data includes a target face image and a first face image corresponding to the target face image.
  • the definition of the first face image is lower than that of the target face image.
  • the format of the face image may be video data or a photo.
  • the target face image and the first face image in this embodiment may be provided directly by the training data; alternatively, only the target face image, that is, the face image with higher definition, may be provided, and the first face image is then generated by reducing the definition of the target face image.
  • this is also called degrading the target face image: after degradation, the definition of the target face image is reduced, so that a first face image with a definition lower than that of the target face image is obtained.
  • "definition" may refer to how clearly the fine details and their boundaries are rendered in an image; the higher the definition, the better the perceived quality to the human eye.
  • saying that the definition of the output image is higher than that of the input image means, for example, that the input image is processed using the image processing method provided by the embodiments of the present disclosure, such as denoising, super-resolution and/or deblurring, so that the resulting output image is sharper than the input image.
  • the target face image can be degraded by one or more of: adding noise to the image, applying Gaussian blur, adjusting the brightness and contrast of the image, scaling the image, warping the image, and adding a motion blur effect to the image.
  • the quality of the face image in the target face image is relatively high, for example, the target face image may have suitable brightness and contrast, its image scale is appropriate, there is no motion blur, and the image quality is high, etc.
  • the target face image can be degraded by reducing or increasing its brightness and contrast, adjusting its image scale to make the image scale unbalanced, etc., to obtain the first face image, that is, to obtain a face image with lower definition.
  • the training data of the target face image and the first face image can be obtained.
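  • As an illustration of the degradation operations listed above, the following is a minimal sketch of such a preprocessing routine using OpenCV and NumPy; the specific parameter ranges and the order of the operations are illustrative assumptions and are not specified by the disclosure.

```python
# Minimal sketch of the degradation pipeline described above (parameter ranges are assumptions).
import cv2
import numpy as np

def degrade_face(img: np.ndarray) -> np.ndarray:
    """Degrade a high-definition target face image into a low-definition first face image."""
    out = img.astype(np.float32)

    # 1. Adjust brightness and contrast (alpha: contrast, beta: brightness).
    alpha, beta = np.random.uniform(0.7, 1.3), np.random.uniform(-20, 20)
    out = np.clip(alpha * out + beta, 0, 255)

    # 2. Gaussian blur with a random kernel size.
    k = int(np.random.choice([3, 5, 7]))
    out = cv2.GaussianBlur(out, (k, k), 0)

    # 3. Horizontal motion blur with a random length.
    length = int(np.random.randint(3, 9))
    kernel = np.zeros((length, length), np.float32)
    kernel[length // 2, :] = 1.0 / length
    out = cv2.filter2D(out, -1, kernel)

    # 4. Down-scale then up-scale to lose high-frequency detail (scale imbalance).
    h, w = out.shape[:2]
    scale = np.random.uniform(0.25, 0.5)
    small = cv2.resize(out, (int(w * scale), int(h * scale)), interpolation=cv2.INTER_AREA)
    out = cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)

    # 5. Add Gaussian noise.
    out += np.random.normal(0, 5.0, out.shape)
    return np.clip(out, 0, 255).astype(np.uint8)
```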
  • Step 102 Input the first face image into a first network model to obtain a second face image.
  • the first network model in this embodiment is a generative network model that takes a face image as an input and takes a reconstructed image of the input face image as an output.
  • the first network model in this embodiment is used as a generator to process and reconstruct the input first face image.
  • the first network model performs deblurring or resolution enhancement processing on the first face image, so as to obtain a second face image from the first face image; in other words, the second face image is the reconstruction result of the first network model for the first face image.
  • Step 103 Input the target face image and the second face image into a second network model to obtain a discrimination result.
  • the second network model is a discrimination network model that takes a face image as an input, and takes a result of judging the authenticity of the input face image as an output.
  • the second network model in this embodiment is equivalent to a discriminator, and the first network model and the second network model together constitute a generative adversarial network for model training.
  • the discrimination result of the second network model includes discrimination results on both the overall authenticity of the input face image and the authenticity of its local features, wherein the overall authenticity refers to the discrimination result for the input face image from a global perspective of the face image, and the authenticity of local features refers to the determination result for local detail features of the face image.
  • in general, the discrimination result output by the second network model, acting as the discriminator, is a value between 0 and 1: the closer the discrimination result is to 1, the higher the authenticity judged by the second network model; conversely, the closer the discrimination result is to 0, the lower the judged authenticity.
  • Step 104 Obtain a first loss function corresponding to the first network model, and adjust parameters of the first network model according to the first loss function.
  • Step 105 Obtain a second loss function corresponding to the second network model, and adjust parameters of the second network model according to the second loss function.
  • Step 106 Perform the above steps alternately to perform model training on the first network model and the second network model in turn.
  • further, a first loss function corresponding to the first network model and a second loss function corresponding to the second network model are established, and the parameters of the first network model or the second network model are adjusted according to the corresponding loss function, so that the first network model and the second network model are trained.
  • the schedule for alternately training the first network model and the second network model can be adjusted.
  • for example, the first network model can be trained once, then the second network model once, then the first network model once again, and so on; alternatively, the first network model can be trained multiple times, then the second network model once, then the first network model multiple times again, and so on.
  • the training methods for the first network model and the second network model in this embodiment are not limited to this.
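  • One possible realization of this alternating schedule is sketched below in PyTorch. The generator `G`, discriminator `D`, the loss callables and the data loader are placeholders standing in for the first network model, the second network model and the loss functions described here; the 1:1 alternation ratio is only one of the options mentioned above.

```python
# Minimal sketch of alternating GAN training (assumed 1:1 schedule; G/D/loss functions are placeholders).
import torch

def train(G, D, loader, g_loss_fn, d_loss_fn, epochs=10, device="cuda"):
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
    for _ in range(epochs):
        for degraded, target in loader:              # first face image, target face image
            degraded, target = degraded.to(device), target.to(device)

            # --- Train the first network model (generator) with the first loss function L ---
            reconstructed = G(degraded)              # second face image
            loss_g = g_loss_fn(reconstructed, target, D)
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()

            # --- Train the second network model (discriminator) with the second loss function LD ---
            reconstructed = G(degraded).detach()     # do not back-propagate into G here
            loss_d = d_loss_fn(reconstructed, target, D)
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    return G  # the trained generator is used as the face reconstruction model
```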
  • Step 107 Use the trained first network model as a face reconstruction model.
  • the first network model after the training is completed is a face reconstruction model that meets the needs of face reconstruction.
  • the target threshold here can be set according to the actual situation, for example, it can be the minimum or maximum value that the first loss function or the second loss function can achieve.
  • the embodiments of the present disclosure perform model training on the first network model and the second network model by establishing a generative adversarial network that includes the first network model and the second network model, wherein the second network model outputs discrimination results on both the overall authenticity and the authenticity of local features of the input face image
  • accurate discrimination results help improve the accuracy with which the output of the first network model is judged, thereby improving the accuracy of the trained reconstruction model for face image reconstruction, and also help increase the iteration speed, thereby improving model training efficiency.
  • the first loss function includes a first sub-loss and a second sub-loss
  • the above step 104 includes:
  • a first face part map and a second face part map corresponding to the target face image are acquired; the second face image is parsed to obtain a third face part map and a fourth face part map corresponding to the second face image; the first sub-loss is obtained according to the difference between the first face part map and the third face part map; and the second sub-loss is obtained according to the difference between the second face part map and the fourth face part map.
  • the first face part map and the second face part map corresponding to the target face image may be provided directly by the training data, or may be obtained by parsing the target face image.
  • the third face part map and the fourth face part map corresponding to the second face image are obtained by parsing the second face image.
  • the corresponding face part maps can be obtained by using a pre-trained face parsing model.
  • the face parsing model can be an existing or improved model such as RoI Tanh-Warping (Face Parsing with RoI Tanh-Warping); the face parsing model is not further limited here.
  • the first face part map and the second face part map correspond to different regions of the same face image, and the regions corresponding to the first face part map and the third face part map are the same; in other words, if the first face part map corresponds to a certain region of the target face image, such as the eye region, then the third face part map corresponds to the eye region of the second face image, and similarly the second face part map and the fourth face part map correspond to the same region of their respective face images.
  • the first face part map includes an image of the facial features of the face image
  • the second face part map includes a skin image of the face image
  • the first face part map and the third face part map correspond to the facial features in the face image, which are also referred to as organ maps in this embodiment; the organs may be the mouth, nose, eyes, eyebrows, ears and other facial features.
  • the second face part map and the fourth face part map correspond to the skin region other than the facial features.
  • by separately obtaining the first sub-loss for the organ maps and the second sub-loss for the skin maps, the reconstruction results of the first network model for the organ region and the skin region can be evaluated separately, which improves the precision with which the first network model is adjusted and helps improve model training efficiency.
  • each sub-loss in this embodiment may be computed in different ways; for example, the L1 loss between the first face part map and the third face part map may be used as the first sub-loss, or the L2 loss between the first face part map and the third face part map may be used as the first sub-loss; this is not further limited in this embodiment.
  • the L1 loss refers to the least absolute deviation (LAD)
  • the L2 loss refers to the least square error (LSE).
  • the specific calculation method can refer to the related art, which will not be repeated here.
  • the first sub-loss is denoted as L2_feat.
  • the second sub-loss is denoted as L2_skin.
  • it should be understood that the face part maps and skin maps of a face image are themselves images, so they reflect, from a human visual and subjective perspective, how similar the output of the first network model appears to the target face image.
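  • As an illustration of how L2_feat and L2_skin could be computed once a face parsing model has produced region masks, the sketch below applies an organ mask and a skin mask to both images and takes a mean squared error over each region. The masked-MSE form is an assumption; the disclosure only requires an L1 or L2 difference between corresponding face part maps.

```python
# Sketch of the first and second sub-losses (assumed masked-MSE form).
import torch
import torch.nn.functional as F

def part_map_losses(target_img, recon_img, organ_mask, skin_mask):
    """target_img, recon_img: (N, C, H, W); organ_mask, skin_mask: (N, 1, H, W) with values in {0, 1}."""
    # First sub-loss L2_feat: difference between the organ maps (facial features).
    l2_feat = F.mse_loss(recon_img * organ_mask, target_img * organ_mask)
    # Second sub-loss L2_skin: difference between the skin maps.
    l2_skin = F.mse_loss(recon_img * skin_mask, target_img * skin_mask)
    return l2_feat, l2_skin
```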
  • the first loss function includes a third sub-loss
  • the above step 104 further includes:
  • the third sub-loss is obtained according to the difference between the first feature point data and the second feature point data.
  • face alignment analysis is performed according to the feature points in the target face image and the second face image; the process of face alignment analysis can be understood as extracting the first feature point data of the target face image through a face alignment model, then extracting the second feature point data of the second face image, comparing the extracted first feature point data with the second feature point data, and determining the third sub-loss according to the difference between them.
  • the analysis of the feature point data can be understood as comparing the similarity between the output result of the first network model and the real face corresponding to the input image from a numerical point of view.
  • the feature point data of the face image can be extracted through coordinate regression, which is faster and requires less computation.
  • the feature point data includes a heat map of the face image
  • the heat map of the face image includes one or more of a left-eye heat map, a right-eye heat map, a nose heat map, a mouth heat map and a face contour heat map of the face image
  • the left eye heat map refers to the heat map composed of key points located in the area corresponding to the left eye
  • the face contour heat map refers to the heat map composed of the key points corresponding to the regions other than each organ, and so on.
  • by generating a plurality of local heat maps constituting the face image, the accuracy of the feature point data computed for the face image can be further improved.
  • the key points are first determined, and the number of key points can be set as required; for example, a 68-point heat map is selected. Next, n heat maps are output, where n equals the number of key points, which is 68 in this embodiment.
  • further, the point with the highest peak in each heat map is taken as a key point, or the contribution values of the pixels in the heat map are weighted to obtain the key point coordinates.
  • the calculation accuracy can be further improved by obtaining the feature point data of the face image based on the heat map regression.
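  • The two key-point decoding strategies just mentioned, taking the highest peak or weighting the contribution of each pixel, could look as follows; the 68-point layout and the tensor shapes are assumptions made for illustration.

```python
# Sketch: decode key-point coordinates from n heat maps of assumed shape (n, H, W).
import torch

def heatmaps_to_keypoints(heatmaps: torch.Tensor, weighted: bool = False) -> torch.Tensor:
    n, h, w = heatmaps.shape
    if not weighted:
        # Option 1: take the location of the highest peak in each heat map.
        flat_idx = heatmaps.view(n, -1).argmax(dim=1)
        ys = torch.div(flat_idx, w, rounding_mode="floor")
        xs = flat_idx % w
        return torch.stack([xs, ys], dim=1).float()
    # Option 2: weight each pixel's coordinates by its normalized contribution value.
    probs = heatmaps.view(n, -1).softmax(dim=1).view(n, h, w)
    ys_grid = torch.arange(h, dtype=torch.float32).view(1, h, 1)
    xs_grid = torch.arange(w, dtype=torch.float32).view(1, 1, w)
    x = (probs * xs_grid).sum(dim=(1, 2))
    y = (probs * ys_grid).sum(dim=(1, 2))
    return torch.stack([x, y], dim=1)  # (n, 2) key-point coordinates
```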
  • the face alignment model can be an existing model such as AWing ([ICCV 2019] Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression); details can be found in the related art.
  • the face alignment model is used to obtain the first feature point data of the target face image, that is, the heat map of the target face image, and the second feature point data of the second face image, that is, the heat map of the second face image.
  • a third sub-loss is obtained according to the difference between the first feature point data and the second feature point data.
  • the third sub-loss is the L2 loss of the heatmap of the target face image and the heatmap of the second face image, which is denoted as L2_heatmap.
  • the first loss function includes a fourth sub-loss
  • the above step 104 further includes:
  • the fourth sub-loss is obtained according to the difference between the first eigenvector and the second eigenvector.
  • feature analysis is also performed on the target face image and the second face image: first the feature vector of the target face image is calculated, then the feature vector of the second face image is calculated, and finally the fourth sub-loss is determined according to the difference between these two feature vectors.
  • the cosine similarity of the two feature vectors is calculated, and then the cosine similarity is subtracted from 1 as a loss function corresponding to the feature analysis.
  • the fourth sub-loss is denoted as LCosSimilarity.
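  • A minimal sketch of this sub-loss follows: the cosine similarity between the two feature vectors is computed and subtracted from 1. Where the feature vectors come from (for example, identity embeddings of a face recognition network) is an assumption of the sketch.

```python
# Sketch of the fourth sub-loss: LCosSimilarity = 1 - cosine similarity of the two feature vectors.
import torch
import torch.nn.functional as F

def cos_similarity_loss(feat_target: torch.Tensor, feat_recon: torch.Tensor) -> torch.Tensor:
    """feat_target, feat_recon: (N, D) feature vectors of the target and second face images."""
    cos = F.cosine_similarity(feat_target, feat_recon, dim=1)  # (N,)
    return (1.0 - cos).mean()
```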
  • the first loss function includes a fifth sub-loss
  • the above step 104 further includes:
  • the fifth sub-loss is obtained according to the difference between the target face image and the second face image.
  • the L2 loss of the target face image and the face reconstruction image output by the first network model is further introduced as the fifth sub-loss.
  • the difference between the target face image and the second face image can be determined through a pre-trained face recognition model.
  • the face recognition model can be an existing or improved model such as ArcFace ("ArcFace: Additive Angular Margin Loss for Deep Face Recognition").
  • the fifth sub-loss is denoted as L20.
  • the first loss function includes one or more of the sixth sub-loss and the seventh sub-loss, and the above step 104 further includes:
  • a perceptual loss based on the difference between the eye region image of the target face image and the eye region image of the second face image is taken as the sixth sub-loss; and/or a perceptual loss based on the difference between the mouth region image of the target face image and the mouth region image of the second face image is taken as the seventh sub-loss.
  • the eye region image and the mouth region image are further analyzed separately: the perceptual loss between the target face image and the second face image on the eye region image is determined as the sixth sub-loss, denoted L2_eye, and the perceptual loss between the target face image and the second face image on the mouth region image is determined as the seventh sub-loss, denoted L2_mouth.
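  • A common way to realize such a perceptual loss is to compare deep features of the two crops. The sketch below uses VGG16 features from torchvision for that purpose; the disclosure does not name a feature extractor, so VGG16, the chosen layer depth and the crop handling are assumptions.

```python
# Sketch of an eye/mouth perceptual loss (the VGG16 feature extractor is an assumption, not specified in the disclosure).
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

class RegionPerceptualLoss(torch.nn.Module):
    def __init__(self, layers: int = 16):
        super().__init__()
        # Use the first `layers` layers of VGG16 as a fixed feature extractor.
        self.features = vgg16(pretrained=True).features[:layers].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)

    def forward(self, target_crop: torch.Tensor, recon_crop: torch.Tensor) -> torch.Tensor:
        """target_crop, recon_crop: (N, 3, H, W) eye or mouth region crops."""
        return F.mse_loss(self.features(recon_crop), self.features(target_crop))

# Usage sketch: L2_eye and L2_mouth from cropped regions (crop coordinates would come from face alignment).
# perceptual = RegionPerceptualLoss()
# l2_eye = perceptual(target_eye_crop, recon_eye_crop)
# l2_mouth = perceptual(target_mouth_crop, recon_mouth_crop)
```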
  • the first loss function includes an eighth sub-loss
  • the above step 104 further includes:
  • the eighth sub-loss is obtained according to a generative adversarial loss between the first network model and the second network model.
  • in implementation, the second face image output by the first network model is first marked as true (for example, marked as 1) and is input into one or more of the discrimination sub-networks; the obtained discrimination result is a value between 0 and 1, and, according to the difference between the discrimination result and 1, the generative adversarial loss between the first network model and the second network model is obtained and denoted as the eighth sub-loss LG.
  • the second network model includes one or more of a global discrimination sub-network, an eye discrimination sub-network, and a mouth discrimination sub-network.
  • after the second face image marked as true is input into the global discrimination sub-network, the global adversarial loss, denoted LG_all, can be determined from the judgment result of the global discrimination sub-network; after the second face image marked as true is input into the eye discrimination sub-network, the eye adversarial loss, denoted LG_eye, can be determined from the judgment result of the eye discrimination sub-network; after the second face image marked as true is input into the mouth discrimination sub-network, the mouth adversarial loss, denoted LG_mouth, can be determined from the judgment result of the mouth discrimination sub-network.
  • after the first to eighth sub-losses are determined, the first loss function can be obtained.
  • the first loss function is denoted as L, then:
  • L = w1*L2_feat + w2*L2_skin + w3*L2_heatmap + w4*LCosSimilarity + w5*L20 + w6*L2_eye + w7*L2_mouth + LG,
  • where LG = w8*LG_all + w9*LG_eye + w10*LG_mouth.
  • w1 to w10 are the weight coefficients corresponding to the respective loss values and can be set as needed; for example, they can all be set to 1, or, according to the importance of the different loss values, the coefficient corresponding to a more important loss value can be set relatively larger, thereby obtaining the first loss function corresponding to the first network model.
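  • Putting the eight sub-losses together, the first loss function could be assembled as in the sketch below. The sub-loss tensors and the weights w1 to w10 are placeholders; only the weighted-sum structure follows the formula above.

```python
# Sketch of the first loss function L = w1*L2_feat + ... + w7*L2_mouth + LG (weights are placeholders).
import torch

def generator_loss(sub: dict, w: dict) -> torch.Tensor:
    """sub: tensors keyed 'L2_feat', 'L2_skin', 'L2_heatmap', 'LCosSimilarity', 'L20',
    'L2_eye', 'L2_mouth', 'LG_all', 'LG_eye', 'LG_mouth'; w: floats 'w1'..'w10'."""
    LG = w["w8"] * sub["LG_all"] + w["w9"] * sub["LG_eye"] + w["w10"] * sub["LG_mouth"]
    L = (w["w1"] * sub["L2_feat"]
         + w["w2"] * sub["L2_skin"]
         + w["w3"] * sub["L2_heatmap"]
         + w["w4"] * sub["LCosSimilarity"]
         + w["w5"] * sub["L20"]
         + w["w6"] * sub["L2_eye"]
         + w["w7"] * sub["L2_mouth"]
         + LG)
    return L
```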
  • the second loss function includes a first discriminative adversarial loss
  • the above step 106 further includes:
  • the first discriminative adversarial loss is obtained according to the first discrimination result and the second discrimination result.
  • the second network model includes a global discrimination sub-network.
  • in implementation, the second face image output by the first network model is first marked as false, for example marked as 0, and the target face image is marked as true, for example marked as 1.
  • the second face image and the target face image are then input into the global discrimination sub-network respectively to obtain discrimination results, each a value between 0 and 1, wherein the discrimination result corresponding to the second face image is the first discrimination result and the discrimination result corresponding to the target face image is the second discrimination result.
  • the first discriminative adversarial loss, which corresponds to the first network model and the global discrimination sub-network and is determined according to the obtained first discrimination result and second discrimination result, is denoted LD_all.
  • the second loss function includes a second discriminative adversarial loss and a third discriminative adversarial loss
  • the above step 106 further includes:
  • the first eye image and the first mouth image are marked as false, and the second eye image and the second mouth image are marked as true; the first eye image and the second eye image are respectively input into the eye discrimination sub-network to output a third discrimination result and a fourth discrimination result respectively; the first mouth image and the second mouth image are respectively input into the mouth discrimination sub-network to output a fifth discrimination result and a sixth discrimination result respectively;
  • a second discriminative adversarial loss is obtained according to the third discrimination result and the fourth discrimination result;
  • a third discriminative adversarial loss is obtained according to the fifth discrimination result and the sixth discrimination result.
  • when determining the second discriminative adversarial loss and the third discriminative adversarial loss, the eye image and the mouth image of the second face image need to be extracted.
  • the eye image of the second face image is recorded as the first eye image
  • the mouth image of the second face image is recorded as the first mouth image.
  • the eye image of the target face image is recorded as the second eye image
  • the mouth image of the target face image is recorded as the second mouth image.
  • Both the extracted first eye image and the first mouth image are marked as false, for example, both are marked as 0, and both the second eye image and the second mouth image are marked as true, for example, both are marked as 1 .
  • the second discriminant adversarial loss is obtained according to the difference between the third discriminant result and the fourth discriminant result, denoted as LD_eye
  • the third discriminant adversarial loss is obtained according to the difference between the fifth discriminant result and the sixth discriminant result, denoted as LD_mouth.
  • after the first, second and third discriminative adversarial losses are determined, the second loss function can be obtained, denoted LD = w11*LD_all + w12*LD_eye + w13*LD_mouth, where w11 to w13 are the weight coefficients corresponding to the respective loss values.
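  • The second loss function can be assembled in the same spirit. The sketch below uses binary cross-entropy against the true/false labels described above; the BCE form is an assumption, since the disclosure only specifies the labelling and the weighted combination.

```python
# Sketch of the second loss function LD = w11*LD_all + w12*LD_eye + w13*LD_mouth (BCE form is assumed).
import torch
import torch.nn.functional as F

def adversarial_d_loss(d_fake: torch.Tensor, d_real: torch.Tensor) -> torch.Tensor:
    """d_fake: discriminator output for the false-labelled second face image or its crop;
    d_real: output for the true-labelled target face image or its crop. Values in (0, 1)."""
    return (F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
            + F.binary_cross_entropy(d_real, torch.ones_like(d_real)))

def discriminator_loss(outs: dict, w11: float, w12: float, w13: float) -> torch.Tensor:
    ld_all = adversarial_d_loss(outs["global_fake"], outs["global_real"])   # first discriminative adversarial loss
    ld_eye = adversarial_d_loss(outs["eye_fake"], outs["eye_real"])         # second discriminative adversarial loss
    ld_mouth = adversarial_d_loss(outs["mouth_fake"], outs["mouth_real"])   # third discriminative adversarial loss
    return w11 * ld_all + w12 * ld_eye + w13 * ld_mouth
```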
  • the embodiment of the present disclosure also provides a face reconstruction method.
  • the face reconstruction method includes the following steps:
  • Step 201 obtain an input image
  • Step 202 Input the input image into a face reconstruction model to obtain face reconstruction data.
  • the face reconstruction model is obtained by performing model training using any one of the above methods for training a face reconstruction model.
  • the face reconstruction model used in this embodiment is a face reconstruction model obtained by the above training method for a face reconstruction model; when the input image is input into the face reconstruction model, a face reconstruction result with a high degree of consistency with the real face image is output.
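  • At inference time the trained first network model is used on its own. The sketch below shows one possible way to run it; the normalization, the single-image batch and the use of OpenCV for I/O are illustrative assumptions.

```python
# Sketch of face reconstruction inference with the trained generator (preprocessing is assumed).
import cv2
import numpy as np
import torch

def reconstruct_face(generator: torch.nn.Module, image_path: str, device: str = "cuda") -> np.ndarray:
    generator.eval().to(device)
    img = cv2.imread(image_path).astype(np.float32) / 255.0                 # H x W x 3, BGR
    x = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).to(device)      # 1 x 3 x H x W
    with torch.no_grad():
        y = generator(x)                                                    # reconstructed face image
    out = y.squeeze(0).permute(1, 2, 0).clamp(0, 1).cpu().numpy()
    return (out * 255).astype(np.uint8)
```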
  • the present disclosure provides a training device for a face reconstruction model.
  • the training device 300 of the face reconstruction model includes:
  • a training data acquisition module 301, configured to acquire training data, wherein the training data includes a target face image and a first face image corresponding to the target face image, and the definition of the first face image is lower than that of the target face image;
  • a first input module 302, configured to input the first face image into a first network model to obtain a second face image, wherein the first network model is a generative network model that takes a face image as input and outputs a reconstructed image of the input face image;
  • a second input module 302, configured to input the target face image and the second face image into a second network model to obtain a discrimination result, wherein the second network model is a discriminative network model that takes a face image as input and outputs a discrimination result on the authenticity of the input face image, and the discrimination result includes discrimination results on the overall authenticity of the input face image and on the authenticity of its local features;
  • a first loss function obtaining module 304 configured to obtain a first loss function corresponding to the first network model, and adjust parameters of the first network model according to the first loss function;
  • a second loss function obtaining module 305 configured to obtain a second loss function corresponding to the second network model, and adjust parameters of the second network model according to the second loss function;
  • a training module 306, configured to perform model training on the first network model and the second network model in turn;
  • the face reconstruction model confirmation module 307 is used to use the trained first network model as a face reconstruction model, wherein, when the training is completed, the values of the first loss function and the second loss function both reach the corresponding target threshold.
  • the second loss function includes a first discriminative adversarial loss
  • the second network model includes a global discriminative sub-network
  • the second loss function obtaining module 305 includes:
  • the first discrimination result acquisition sub-module is configured to mark the second face image as false, mark the target face image as true, and input the second face image and the target face image into the global discrimination sub-network respectively, to obtain a first discrimination result and a second discrimination result respectively;
  • the first discriminative adversarial loss obtaining sub-module is configured to obtain the first discriminative adversarial loss according to the first discriminant result and the second discriminant result.
  • the second loss function includes a second discriminative adversarial loss and a third discriminative adversarial loss
  • the second network model further includes an eye discrimination sub-network and a mouth discrimination sub-network
  • the second loss function obtaining module 305 further includes:
  • a first image acquisition submodule configured to obtain a corresponding first eye image and a first mouth image according to the second face image
  • the second image acquisition submodule is used to obtain the corresponding second eye image and the second mouth image according to the target face image;
  • a marking sub-module, configured to mark the first eye image and the first mouth image as false, mark the second eye image and the second mouth image as true, input the first eye image and the second eye image into the eye discrimination sub-network respectively to output a third discrimination result and a fourth discrimination result respectively, and input the first mouth image and the second mouth image into the mouth discrimination sub-network respectively to output a fifth discrimination result and a sixth discrimination result respectively;
  • the second discriminative adversarial loss obtaining sub-module is configured to obtain the second discriminative adversarial loss according to the third discriminant result and the fourth discriminant result;
  • the third discriminative adversarial loss obtaining sub-module is configured to obtain the third discriminative adversarial loss according to the fifth discriminant result and the sixth discriminant result.
  • the first loss function includes a first sub-loss and a second sub-loss
  • the first loss function obtaining module 304 includes:
  • a face part map acquisition sub-module, configured to acquire a first face part map and a second face part map corresponding to the target face image
  • a first parsing sub-module, configured to parse the second face image to obtain a third face part map and a fourth face part map corresponding to the second face image, wherein the first face part map and the second face part map correspond to different regions of the same face image, the first face part map and the third face part map correspond to the same region of different face images, and the second face part map and the fourth face part map correspond to the same region of different face images;
  • a first sub-loss obtaining sub-module, configured to obtain the first sub-loss according to the difference between the first face part map and the third face part map
  • a second sub-loss obtaining sub-module, configured to obtain the second sub-loss according to the difference between the second face part map and the fourth face part map.
  • the first face part map includes an image of the facial features of a face image
  • the second face part map includes a skin image of the face image
  • the first loss function includes a third sub-loss
  • the first loss function obtaining module 304 further includes:
  • a feature point data acquisition sub-module used for acquiring the first feature point data corresponding to the target face image
  • a second parsing submodule configured to parse the second face image to obtain second feature point data corresponding to the second face image
  • a third sub-loss obtaining sub-module, configured to obtain the third sub-loss according to the difference between the first feature point data and the second feature point data.
  • the first feature point data includes a heat map of the target face image
  • the second feature point data includes a heat map of the second face image
  • the heat map includes one or more of a left-eye heat map, a right-eye heat map, a nose heat map, a mouth heat map and a face contour heat map of the face image
  • the first loss function includes a fourth sub-loss
  • the first loss function obtaining module 304 further includes:
  • a feature vector acquisition submodule used for acquiring the first feature vector corresponding to the target face image
  • the feature vector obtaining submodule is also used to obtain the second feature vector corresponding to the second face image
  • the fourth sub-loss obtaining sub-module is configured to obtain the fourth sub-loss according to the difference between the first feature vector and the second feature vector.
  • the first loss function includes a fifth sub-loss
  • the first loss function obtaining module 304 further includes:
  • the fifth sub-loss obtaining sub-module is configured to obtain the fifth sub-loss according to the difference between the target face image and the second face image.
  • the first loss function includes one or more of a sixth sub-loss and a seventh sub-loss
  • the first loss function obtaining module 304 further includes:
  • a sixth sub-loss obtaining sub-module configured to use a perceptual loss based on the difference between the eye region image of the target face image and the eye region image of the second face image as the sixth sub-loss; and / or
  • the seventh sub-loss obtaining sub-module is configured to use the perceptual loss based on the difference between the mouth region image of the target face image and the mouth region image of the second face image as the seventh sub-loss.
  • the first loss function includes an eighth sub-loss
  • the first loss function obtaining module 304 further includes:
  • the eighth sub-loss obtaining sub-module is configured to obtain the eighth sub-loss according to the generative adversarial loss between the first network model and the second network model, wherein the second network model includes one or more of a global discrimination sub-network, an eye discrimination sub-network and a mouth discrimination sub-network; the generative adversarial loss is determined by marking the second face image output by the first network model as true, inputting the second face image into one or more of the global discrimination sub-network, the eye discrimination sub-network and the mouth discrimination sub-network to obtain a discrimination result, and determining the loss according to the obtained discrimination result.
  • the training apparatus for a face reconstruction model of the embodiment of the present disclosure can implement the steps of the above-mentioned embodiments of the training method for a face reconstruction model, and can at least achieve the same or similar technical effects, which will not be repeated here.
  • Embodiments of the present disclosure provide a face reconstruction apparatus.
  • the face reconstruction apparatus 400 includes:
  • an input image acquisition module 401 for acquiring an input image
  • the input module 402 is configured to input the input image into a face reconstruction model to obtain a face reconstruction image, wherein the face reconstruction model is obtained by performing model training with any one of the above-mentioned training methods for a face reconstruction model.
  • the face reconstruction apparatus of the embodiment of the present disclosure implements each step of the above-mentioned face reconstruction method, and can at least achieve the same or similar technical effects, which will not be repeated here.
  • An embodiment of the present disclosure provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein, when the computer program is executed by the processor, the steps of any one of the above training methods for a face reconstruction model, or the steps of the above face reconstruction method, are implemented.
  • An embodiment of the present disclosure provides a readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the steps of any one of the above training methods for a face reconstruction model, or the steps of the above face reconstruction method, are implemented.
  • the electronic device and the readable storage medium of this embodiment can implement the steps of the above-mentioned training method of a face reconstruction model and face reconstruction method, and can at least achieve the same or similar technical effects, which will not be repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A training method and apparatus for a face reconstruction model, a face reconstruction method and apparatus, an electronic device, and a readable storage medium. The training method for the face reconstruction model includes: acquiring training data (101); inputting the first face image into a first network model to obtain a second face image (102); inputting the target face image and the second face image into a second network model to obtain a discrimination result (103); acquiring a first loss function corresponding to the first network model and adjusting parameters of the first network model according to the first loss function (104); acquiring a second loss function corresponding to the second network model and adjusting parameters of the second network model according to the second loss function (105); performing the above steps alternately so as to train the first network model and the second network model in turn (106); and using the trained first network model as a face reconstruction model (107). This solution can improve the accuracy and definition of face reconstruction.

Description

Training method and apparatus for a face reconstruction model, face reconstruction method and apparatus, electronic device, and readable storage medium
Technical Field
The present disclosure relates to the technical field of face reconstruction, and in particular to a training method and apparatus for a face reconstruction model, an electronic device, and a readable storage medium.
Background Art
Face reconstruction refers to a technology of reconstructing, from video data, the face images of the persons contained in it, and is widely used in person identification and tracking. In the related art, the accuracy and definition of face reconstruction are poor.
Summary
Embodiments of the present disclosure provide a training method and apparatus for a face reconstruction model, and a face reconstruction method and apparatus, so as to solve the problem of poor accuracy and definition of face reconstruction.
In a first aspect, an embodiment of the present disclosure provides a training method for a face reconstruction model, including the following steps:
acquiring training data, where the training data includes a target face image and a first face image corresponding to the target face image, and the definition of the first face image is lower than the definition of the target face image;
inputting the first face image into a first network model to obtain a second face image, where the first network model is a generative network model that takes a face image as input and outputs a reconstructed image of the input face image;
inputting the target face image and the second face image into a second network model to obtain a discrimination result, where the second network model is a discriminative network model that takes a face image as input and outputs a discrimination result on the authenticity of the input face image, and the discrimination result includes discrimination results on the overall authenticity of the input face image and on the authenticity of its local features;
acquiring a first loss function corresponding to the first network model, and adjusting parameters of the first network model according to the first loss function;
acquiring a second loss function corresponding to the second network model, and adjusting parameters of the second network model according to the second loss function;
performing the above steps alternately, so as to train the first network model and the second network model in turn;
using the trained first network model as a face reconstruction model, where, when training is completed, the values of the first loss function and the second loss function both reach corresponding target thresholds.
In some embodiments, the second loss function includes a first discriminative adversarial loss, and the second network model includes a global discrimination sub-network;
the acquiring the second loss function corresponding to the second network model includes:
marking the second face image as false, marking the target face image as true, and inputting the second face image and the target face image into the global discrimination sub-network respectively, to obtain a first discrimination result and a second discrimination result respectively;
obtaining the first discriminative adversarial loss according to the first discrimination result and the second discrimination result.
In some embodiments, the second loss function includes a second discriminative adversarial loss and a third discriminative adversarial loss, and the second network model further includes an eye discrimination sub-network and a mouth discrimination sub-network;
the acquiring the second loss function corresponding to the second network model includes:
obtaining a corresponding first eye image and first mouth image according to the second face image;
obtaining a corresponding second eye image and second mouth image according to the target face image;
marking the first eye image and the first mouth image as false, marking the second eye image and the second mouth image as true, inputting the first eye image and the second eye image into the eye discrimination sub-network respectively to output a third discrimination result and a fourth discrimination result respectively, and inputting the first mouth image and the second mouth image into the mouth discrimination sub-network respectively to output a fifth discrimination result and a sixth discrimination result respectively;
obtaining the second discriminative adversarial loss according to the third discrimination result and the fourth discrimination result;
obtaining the third discriminative adversarial loss according to the fifth discrimination result and the sixth discrimination result.
In some embodiments, the first loss function includes a first sub-loss and a second sub-loss;
the acquiring the first loss function corresponding to the first network model includes:
acquiring a first face part map and a second face part map corresponding to the target face image;
parsing the second face image to obtain a third face part map and a fourth face part map corresponding to the second face image, where the first face part map and the second face part map correspond to different regions of the same face image, the first face part map and the third face part map correspond to the same region of different face images, and the second face part map and the fourth face part map correspond to the same region of different face images;
obtaining the first sub-loss according to the difference between the first face part map and the third face part map;
obtaining the second sub-loss according to the difference between the second face part map and the fourth face part map.
In some embodiments, the first face part map includes an image of the facial features of a face image, and the second face part map includes a skin image of the face image.
In some embodiments, the first loss function includes a third sub-loss;
the acquiring the first loss function corresponding to the first network model further includes:
acquiring first feature point data corresponding to the target face image;
parsing the second face image to obtain second feature point data corresponding to the second face image;
obtaining the third sub-loss according to the difference between the first feature point data and the second feature point data.
In some embodiments, the first feature point data includes a heat map of the target face image, and the second feature point data includes a heat map of the second face image, where a heat map includes one or more of a left-eye heat map, a right-eye heat map, a nose heat map, a mouth heat map and a face contour heat map of the face image.
In some embodiments, the first loss function includes a fourth sub-loss;
the acquiring the first loss function corresponding to the first network model further includes:
acquiring a first feature vector corresponding to the target face image;
acquiring a second feature vector corresponding to the second face image;
obtaining the fourth sub-loss according to the difference between the first feature vector and the second feature vector.
In some embodiments, the first loss function includes a fifth sub-loss;
the acquiring the first loss function corresponding to the first network model further includes:
obtaining the fifth sub-loss according to the difference between the target face image and the second face image.
In some embodiments, the first loss function includes one or more of a sixth sub-loss and a seventh sub-loss;
the acquiring the first loss function corresponding to the first network model further includes:
taking a perceptual loss based on the difference between the eye region image of the target face image and the eye region image of the second face image as the sixth sub-loss; and/or
taking a perceptual loss based on the difference between the mouth region image of the target face image and the mouth region image of the second face image as the seventh sub-loss.
In some embodiments, the first loss function includes an eighth sub-loss;
the acquiring the first loss function corresponding to the first network model further includes:
obtaining the eighth sub-loss according to a generative adversarial loss between the first network model and the second network model, where the second network model includes one or more of a global discrimination sub-network, an eye discrimination sub-network and a mouth discrimination sub-network, and the generative adversarial loss is determined by marking the second face image output by the first network model as true, inputting the second face image into one or more of the global discrimination sub-network, the eye discrimination sub-network and the mouth discrimination sub-network to obtain a discrimination result, and determining the loss according to the obtained discrimination result.
In a second aspect, an embodiment of the present disclosure provides a face reconstruction method, including the following steps:
acquiring an input image;
inputting the input image into a face reconstruction model to obtain a face reconstruction image, where the face reconstruction model is obtained by performing model training with the training method for a face reconstruction model according to any one of the first aspect.
In a third aspect, an embodiment of the present disclosure provides a training apparatus for a face reconstruction model, including:
a training data acquisition module, configured to acquire training data, where the training data includes a target face image and a first face image corresponding to the target face image, and the definition of the first face image is lower than the definition of the target face image;
a first input module, configured to input the first face image into a first network model to obtain a second face image, where the first network model is a generative network model that takes a face image as input and outputs a reconstructed image of the input face image;
a second input module, configured to input the target face image and the second face image into a second network model to obtain a discrimination result, where the second network model is a discriminative network model that takes a face image as input and outputs a discrimination result on the authenticity of the input face image, and the discrimination result includes discrimination results on the overall authenticity of the input face image and on the authenticity of its local features;
a first loss function acquisition module, configured to acquire a first loss function corresponding to the first network model, and adjust parameters of the first network model according to the first loss function;
a second loss function acquisition module, configured to acquire a second loss function corresponding to the second network model, and adjust parameters of the second network model according to the second loss function;
a training module, configured to train the first network model and the second network model in turn;
a face reconstruction model confirmation module, configured to use the trained first network model as a face reconstruction model, where, when training is completed, the values of the first loss function and the second loss function both reach corresponding target thresholds.
In a fourth aspect, an embodiment of the present disclosure provides a face reconstruction apparatus, including:
an input image acquisition module, configured to acquire an input image;
an input module, configured to input the input image into a face reconstruction model to obtain a face reconstruction image, where the face reconstruction model is obtained by performing model training with the training method for a face reconstruction model according to any one of the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, where, when the computer program is executed by the processor, the steps of the training method for a face reconstruction model according to any one of the first aspect, or the steps of the face reconstruction method according to the second aspect, are implemented.
In a sixth aspect, an embodiment of the present disclosure provides a readable storage medium on which a computer program is stored, where, when the computer program is executed by a processor, the steps of the training method for a face reconstruction model according to any one of the first aspect, or the steps of the face reconstruction method according to the second aspect, are implemented.
By establishing a generative adversarial network including the first network model and the second network model, the embodiments of the present disclosure train the first network model and the second network model, where the second network model outputs discrimination results on both the overall authenticity and the authenticity of local features. Accurate discrimination results help improve the accuracy with which the output of the first network model is judged, thereby improving the accuracy of the trained reconstruction model for face image reconstruction, and also help increase the iteration speed, thereby improving model training efficiency.
Brief Description of the Drawings
In order to describe the technical solutions of the embodiments of the present disclosure more clearly, the accompanying drawings required for the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present disclosure, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a training method for a face reconstruction model provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of a face reconstruction method provided by an embodiment of the present disclosure;
FIG. 3 is a structural diagram of a training apparatus for a face reconstruction model provided by an embodiment of the present disclosure;
FIG. 4 is a structural diagram of a face reconstruction apparatus provided by an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings of the embodiments of the present disclosure. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.
An embodiment of the present disclosure provides a training method for a face reconstruction model.
As shown in FIG. 1, in one embodiment, the training method for the face reconstruction model includes the following steps:
Step 101: Acquire training data.
The training data in this embodiment is also called a training set. The training data includes a target face image and a first face image corresponding to the target face image, and the definition of the first face image is lower than that of the target face image. The format of the face images may be video data or photos.
The target face image and the first face image in this embodiment may be provided directly by the training data; alternatively, only the target face image, i.e. the face image with higher definition, may be provided, and the first face image is then generated by reducing the definition of the target face image. This is also called degrading the target face image: after degradation, the definition of the target face image is reduced, so that a first face image with a definition lower than that of the target face image is obtained.
In the embodiments of the present disclosure, "definition" may refer to how clearly the fine details and their boundaries are rendered in an image; the higher the definition, the better the perceived quality to the human eye. Saying that the definition of the output image is higher than that of the input image means, for example, that the input image is processed using the image processing method provided by the embodiments of the present disclosure, such as denoising, super-resolution and/or deblurring, so that the resulting output image is sharper than the input image.
In one embodiment, the target face image can be degraded by one or more of: adding noise to the image, applying Gaussian blur, adjusting the brightness and contrast of the image, scaling the image, warping the image, and adding a motion blur effect to the image.
It should be understood that the quality of the face in the target face image is relatively high; for example, the target face image may have suitable brightness and contrast, an appropriate image scale, no motion blur, and high image quality. In implementation, the target face image can be degraded by reducing or increasing its brightness and contrast, adjusting its image scale so that the scale becomes unbalanced, and so on, to obtain the first face image, i.e. a face image with lower definition.
In this way, training data consisting of the target face image and the first face image can be obtained.
Step 102: Input the first face image into a first network model to obtain a second face image.
The first network model in this embodiment is a generative network model that takes a face image as input and outputs a reconstructed image of the input face image.
The first network model in this embodiment serves as a generator and is used to process and reconstruct the input first face image. The first network model performs deblurring or resolution enhancement on the first face image, so as to obtain a second face image from the first face image; in other words, the second face image is the reconstruction result of the first network model for the first face image.
Step 103: Input the target face image and the second face image into a second network model to obtain a discrimination result.
In this embodiment, the second network model is a discriminative network model that takes a face image as input and outputs a discrimination result on the authenticity of the input face image.
The second network model in this embodiment is equivalent to a discriminator, and the first network model and the second network model together constitute a generative adversarial network for model training.
The discrimination result of the second network model includes discrimination results on both the overall authenticity of the input face image and the authenticity of its local features, where the overall authenticity refers to the discrimination result for the input face image from a global perspective of the face image, and the authenticity of local features refers to the determination result for local detail features of the face image.
In general, the discrimination result output by the second network model, acting as the discriminator, is a value between 0 and 1: the closer the result is to 1, the higher the authenticity judged by the second network model; conversely, the closer the result is to 0, the lower the judged authenticity.
Step 104: Acquire a first loss function corresponding to the first network model, and adjust parameters of the first network model according to the first loss function.
Step 105: Acquire a second loss function corresponding to the second network model, and adjust parameters of the second network model according to the second loss function.
Step 106: Perform the above steps alternately, so as to train the first network model and the second network model in turn.
Further, a first loss function corresponding to the first network model and a second loss function corresponding to the second network model are established, and the parameters of the corresponding first network model or second network model are adjusted according to the established loss functions, so that the first network model and the second network model are trained.
In this embodiment, the schedule for alternately training the first network model and the second network model can be adjusted. For example, the first network model can be trained once, then the second network model once, then the first network model once again, and so on; alternatively, the first network model can be trained multiple times, then the second network model once, then the first network model multiple times again, and so on. Obviously, the training schedules for the first network model and the second network model in this embodiment are not limited to these.
Step 107: Use the trained first network model as a face reconstruction model.
In this embodiment, when the values of the first loss function and the second loss function both reach the corresponding target thresholds, in other words when both loss functions converge, the first network model is considered to have been trained, and the trained first network model is a face reconstruction model that meets the needs of face reconstruction. It should be noted that the target thresholds here can be set according to the actual situation; for example, they can be the minimum or maximum values that the first loss function or the second loss function can reach.
By establishing a generative adversarial network including the first network model and the second network model, the embodiments of the present disclosure train the first network model and the second network model, where the second network model outputs discrimination results on both the overall authenticity and the authenticity of local features. Accurate discrimination results help improve the accuracy with which the output of the first network model is judged, thereby improving the accuracy of the trained reconstruction model for face image reconstruction, and also help increase the iteration speed, thereby improving model training efficiency.
In some embodiments, the first loss function includes a first sub-loss and a second sub-loss, and the above step 104 includes:
acquiring a first face part map and a second face part map corresponding to the target face image;
parsing the second face image to obtain a third face part map and a fourth face part map corresponding to the second face image;
obtaining the first sub-loss according to the difference between the first face part map and the third face part map;
obtaining the second sub-loss according to the difference between the second face part map and the fourth face part map.
In this embodiment, the first face part map and the second face part map corresponding to the target face image may be provided directly by the training data, or may be obtained by parsing the target face image. The third face part map and the fourth face part map corresponding to the second face image are obtained by parsing the second face image.
Parsing a face image to obtain the corresponding face part maps can be implemented using a pre-trained face parsing model. The face parsing model can be an existing or improved model such as RoI Tanh-Warping (Face Parsing with RoI Tanh-Warping), and is not further limited here.
In this embodiment, the first face part map and the second face part map correspond to different regions of the same face image, and the regions corresponding to the first face part map and the third face part map are the same. In other words, if the first face part map corresponds to a certain region of the target face image, such as the eye region, then the third face part map corresponds to the eye region of the second face image; similarly, the second face part map and the fourth face part map correspond to the same region of their respective face images.
In some embodiments, the first face part map includes an image of the facial features of a face image, and the second face part map includes a skin image of the face image.
That is, the first face part map and the third face part map correspond to the facial features in the face image, which are also referred to as organ maps in this embodiment; the organs may be the mouth, nose, eyes, eyebrows, ears and other facial features. The second face part map and the fourth face part map correspond to the skin region other than the facial features.
By separately obtaining the first sub-loss corresponding to the organ maps and the second sub-loss corresponding to the skin maps, the reconstruction results of the first network model for the organ region and the skin region can be evaluated separately, which improves the precision with which the first network model is adjusted and helps improve model training efficiency.
It should be understood that each sub-loss in this embodiment may be computed in different ways; for example, the L1 loss between the first face part map and the third face part map may be used as the first sub-loss, or the L2 loss between the first face part map and the third face part map may be used as the first sub-loss, which is not further limited in this embodiment. The L1 loss refers to the least absolute deviation (LAD), and the L2 loss refers to the least square error (LSE); the specific calculation methods can be found in the related art and are not repeated here.
In this embodiment, the first sub-loss is illustrated as the L2 loss between the organ map of the target face image generated by the face parsing model and the organ map of the second face image output by the first network model; the first sub-loss is denoted L2_feat.
In this embodiment, the second sub-loss is illustrated as the L2 loss between the skin map of the target face image generated by the face parsing model and the skin map of the second face image output by the first network model; the second sub-loss is denoted L2_skin.
It should be understood that the face part maps and skin maps of a face image are themselves images, so they reflect, from a human visual and subjective perspective, how similar the output of the first network model appears to the target face image.
In some embodiments, the first loss function includes a third sub-loss, and the above step 104 further includes:
acquiring first feature point data corresponding to the target face image;
parsing the second face image to obtain second feature point data corresponding to the second face image;
obtaining the third sub-loss according to the difference between the first feature point data and the second feature point data.
In this embodiment, face alignment analysis is performed according to the feature points in the target face image and the second face image. The process of face alignment analysis can be understood as extracting the first feature point data of the target face image through a face alignment model, then extracting the second feature point data of the second face image, comparing the extracted first feature point data with the second feature point data, and determining the third sub-loss according to their difference.
The analysis of feature point data can be understood as comparing, from a numerical point of view, the similarity between the output of the first network model and the real face corresponding to the input image.
In some embodiments, the feature point data of a face image can be extracted through coordinate regression, which is faster and requires less computation.
In some other embodiments, the feature point data includes a heat map of the face image, and the heat map of the face image includes one or more of a left-eye heat map, a right-eye heat map, a nose heat map, a mouth heat map and a face contour heat map of the face image. For example, the left-eye heat map refers to the heat map formed by the key points located in the region corresponding to the left eye, the face contour heat map refers to the heat map formed by the key points corresponding to the regions other than the organs, and so on, so that a plurality of local heat maps constituting the face image are generated. Generating a plurality of local heat maps constituting the face image helps further improve the accuracy of the feature point data computed for the face image.
In this embodiment, the key points are first determined; the number of key points can be set as required, for example a 68-point heat map is selected. Next, n heat maps are output, where n equals the number of key points, which is 68 in this embodiment. Further, the point with the highest peak in each heat map is taken as a key point, or the contribution values of the pixels in the heat map are weighted to obtain the key point coordinates.
Obtaining the feature point data of the face image based on heat map regression can further improve the calculation accuracy.
In implementation, a pre-trained face alignment model is first provided; the face alignment model can be, for example, AWing ([ICCV 2019] Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression), and details can be found in the related art.
Next, the face alignment model is used to obtain the first feature point data of the target face image, i.e. the heat map of the target face image, and the second feature point data of the second face image, i.e. the heat map of the second face image.
Finally, the third sub-loss is obtained according to the difference between the first feature point data and the second feature point data. In this embodiment, the third sub-loss is the L2 loss between the heat map of the target face image and the heat map of the second face image, and is denoted L2_heatmap.
在一些实施例中,所述第一损失函数包含第四子损失,上述步骤104还包括:
获取所述目标人脸图像对应的第一特征向量;
获取所述第二人脸图像对应的第二特征向量;
根据所述第一特征向量和所述第二特征向量之间的差异,得到所述第四子损失。
本实施例中还对目标人脸图像和第二人脸图像进行特征分析,具体的,首先计算目标人脸图像的特征向量,然后计算第二人脸图像的特征向量,最后根据这两个特征向量之间的差异确定第子损失。
本实施例中通过计算这两个特征向量的余弦相似度,然后以1减去该余弦相似度作为特征分析对应的损失函数,本实施例中将该第四子损失记作LCosSimilarity。
在一些实施例中,所述第一损失函数包含第五子损失,上述步骤104还包括:
根据所述目标人脸图像和所述第二人脸图像的差异,得到所述第五子损失。
本实施例中,还进一步引入了目标人脸图像与第一网络模型输出的人脸重建图像的L2损失作为第五子损失。实施时,可以通过预训练人脸识别模型确定目标人脸图像和第二人脸图像的差异值,人脸识别模型可以是ArcFace《ArcFace:Additive Angular Margin Loss for Deep Face Recognition》等现有的或改进的人脸识别模型。实施例中将第五子损失记作L20。
在一些实施例中,所述第一损失函数包含第六子损失和第七子损失中的一项或多项,上述步骤104还包括:
根据所述目标人脸图像的眼部区域图像和所述第二人脸图像的眼部区域图像之间的差异的感知损失作为所述第六子损失;和/或
根据所述目标人脸图像的嘴部区域图像和所述第二人脸图像的嘴部区域图像之间的差异的感知损失作为所述第七子损失。
本实施例中,进一步分别对眼部区域图像和嘴部区域图像进行分析,分别确定在目标人脸图像和第二人脸图像在眼部区域图像的感知损失,作为第六子损失,记作L2_eye;确定在目标人脸图像和第二人脸图像在嘴部区域图像的感知损失,作为第七子损失,记作L2_mouth。
在一些实施例中,所述第一损失函数包含第八子损失,上述步骤104还包括:
根据所述第一网络模型和所述第二网络模型之间的生成对抗损失获得所述第八子损失。
实施时,首先将第一网络模型输出的第二人脸图像标记为真,具体的,例如将其标记为1,然后将该第二人脸图像输入全局判别子网络、眼部判别子网络和嘴部判别子网络中的一项或多项之后,获得相应的判别结果,所得到的判别结果是一个位于0至1之间的数值,根据该判别结果和1之间的差异,得到第一网络模型和所述第二网络模型之间的生成对抗损失,记作第八子损失LG。
本实施例中,第二网络模型包括全局判别子网络、眼部判别子网络和嘴部判别子网络中的一项或多项。
当将标记为真的第二人脸图像输入全局判别子网络后,根据全局判别子网络的判定结果能够确定全局对抗损失,记作LG_all;将标记为真的第二人脸图像输入眼部判别子网络后,根据眼部判别子网络的判定结果能够确定眼部对抗损失,记作LG_eye;将标记为真的第二人脸图像输入嘴部判别子网络后,根据嘴部判别子网络的判定结果能够确定嘴部对抗损失,记作LG_mouth。
After the above first to eighth sub-losses are determined, the first loss function can be obtained; in this embodiment the first loss function is denoted L, so that:
L = w1*L2_feat + w2*L2_skin + w3*L2_heatmap + w4*LCosSimilarity + w5*L20 + w6*L2_eye + w7*L2_mouth + LG,
where LG = w8*LG_all + w9*LG_eye + w10*LG_mouth.
In the above formulas, w1 to w10 are the weight coefficients corresponding to the respective loss values and can be set as required; for example, they may all be set to 1, or the coefficients of the more important loss values may be set relatively larger according to the importance of the different losses, thereby obtaining the first loss function corresponding to the first network model.
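A minimal sketch of assembling the weighted sum is shown below; the dictionary keys and function name are illustrative only, and all weights defaulting to 1 corresponds to one of the configurations described above.

```python
def total_generator_loss(losses, weights):
    """Weighted sum of the sub-losses into the first loss function L.

    losses  : dict with keys 'L2_feat', 'L2_skin', 'L2_heatmap',
              'LCosSimilarity', 'L20', 'L2_eye', 'L2_mouth', 'LG'
    weights : dict mapping the first seven keys to w1..w7 (the internal
              weights w8..w10 are applied when LG itself is built).
    """
    weighted = sum(weights[k] * losses[k] for k in weights)  # w1..w7 terms
    return weighted + losses['LG']
```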
In some embodiments, the second loss function includes a first discriminative adversarial loss, and the above step 106 further includes:
labeling the second face image as fake and the target face image as real, and inputting the second face image and the target face image into the global discriminative sub-network respectively, to obtain a first discriminant result and a second discriminant result respectively;
obtaining the first discriminative adversarial loss according to the first discriminant result and the second discriminant result.
In this embodiment, the second network model includes the global discriminative sub-network. In implementation, the second face image output by the first network model is first labeled as fake, for example labeled 0, and the target face image is labeled as real, for example labeled 1. The second face image and the target face image are then input into the global discriminative sub-network respectively to obtain discriminant results, each being a value between 0 and 1, where the discriminant result corresponding to the second face image is the first discriminant result and the discriminant result corresponding to the target face image is the second discriminant result.
Next, the first discriminative adversarial loss corresponding to the first network model and the global discriminative sub-network is determined according to the obtained first discriminant result and second discriminant result, and is denoted LD_all.
In some embodiments, the second loss function includes a second discriminative adversarial loss and a third discriminative adversarial loss, and the above step 106 further includes:
obtaining a corresponding first eye image and first mouth image according to the second face image;
obtaining a corresponding second eye image and second mouth image according to the target face image;
labeling the first eye image and the first mouth image as fake and labeling the second eye image and the second mouth image as real; inputting the first eye image and the second eye image into the eye discriminative sub-network respectively, which outputs a third discriminant result and a fourth discriminant result respectively; and inputting the first mouth image and the second mouth image into the mouth discriminative sub-network respectively, which outputs a fifth discriminant result and a sixth discriminant result respectively;
obtaining the second discriminative adversarial loss according to the third discriminant result and the fourth discriminant result;
obtaining the third discriminative adversarial loss according to the fifth discriminant result and the sixth discriminant result.
When determining the second and third discriminative adversarial losses, the eye image and the mouth image of the second face image need to be extracted; in this embodiment, the eye image of the second face image is denoted the first eye image, and its mouth image is denoted the first mouth image.
The eye image and the mouth image of the target face image also need to be extracted; in this embodiment, the eye image of the target face image is denoted the second eye image, and its mouth image is denoted the second mouth image.
The extracted first eye image and first mouth image are both labeled as fake, for example both labeled 0, and the second eye image and second mouth image are both labeled as real, for example both labeled 1.
The first eye image is input into the eye discriminative sub-network to obtain the third discriminant result; the second eye image is input into the eye discriminative sub-network to obtain the fourth discriminant result; the first mouth image is input into the mouth discriminative sub-network to obtain the fifth discriminant result; and the second mouth image is input into the mouth discriminative sub-network to obtain the sixth discriminant result.
Finally, the second discriminative adversarial loss, denoted LD_eye, is obtained according to the difference between the third and fourth discriminant results, and the third discriminative adversarial loss, denoted LD_mouth, is obtained according to the difference between the fifth and sixth discriminant results.
After the first, second and third discriminative adversarial losses are determined, the second loss function can be obtained, denoted LD = w11*LD_all + w12*LD_eye + w13*LD_mouth, where w11 to w13 are the weight coefficients corresponding to the respective loss values.
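The sketch below shows one plausible discriminator-side adversarial term built from a real/fake pair of discriminant results; as before, binary cross-entropy and the D_* names are assumptions for the example, not the disclosed implementation.

```python
import torch
import torch.nn.functional as F

def discriminator_adv_loss(d_real, d_fake):
    """Adversarial loss for one discriminative sub-network.

    d_real : discriminant result for the image labeled real (target image or crop)
    d_fake : discriminant result for the image labeled fake (reconstructed image or crop)
    """
    loss_real = F.binary_cross_entropy(d_real, torch.ones_like(d_real))
    loss_fake = F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    return loss_real + loss_fake

# LD = w11 * discriminator_adv_loss(D_all(real),        D_all(fake))
#    + w12 * discriminator_adv_loss(D_eye(real_eye),     D_eye(fake_eye))
#    + w13 * discriminator_adv_loss(D_mouth(real_mouth), D_mouth(fake_mouth))
```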
An embodiment of the present disclosure further provides a face reconstruction method.
As shown in FIG. 2, the face reconstruction method includes the following steps:
Step 201: obtaining an input image;
Step 202: inputting the input image into a face reconstruction model to obtain face reconstruction data.
In this embodiment, the face reconstruction model is obtained through model training by the training method of a face reconstruction model described in any of the above embodiments.
In this embodiment, the face reconstruction model used is the one trained by the above training method; inputting the input image into this face reconstruction model yields a face reconstruction result with a high degree of consistency with the real face image.
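Purely as an illustration of how the trained generator might be applied at inference time, a minimal wrapper is sketched below; the tensor layout and function name are assumptions and are not part of the disclosure.

```python
import torch

def reconstruct_face(model, input_image):
    """Run a trained face reconstruction model on one input image.

    model       : the trained first network model (generator)
    input_image : (1, C, H, W) tensor holding the lower-clarity face image
    """
    model.eval()
    with torch.no_grad():
        return model(input_image)  # reconstructed face image
```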
The present disclosure further provides a training apparatus for a face reconstruction model.
In one embodiment, as shown in FIG. 3, the training apparatus 300 for a face reconstruction model includes:
a training data acquisition module 301, configured to obtain training data, the training data including a target face image and a first face image corresponding to the target face image, the clarity of the first face image being smaller than the clarity of the target face image;
a first input module 302, configured to input the first face image into a first network model to obtain a second face image, wherein the first network model is a generative network model that takes a face image as input and outputs a reconstructed image of the input face image;
a second input module 303, configured to input the target face image and the second face image into a second network model to obtain a discriminant result, wherein the second network model is a discriminant network model that takes a face image as input and outputs a discriminant result on the authenticity of the input face image, the discriminant result including discriminant results on the overall authenticity and on the authenticity of local features of the input face image;
a first loss function acquisition module 304, configured to obtain a first loss function corresponding to the first network model and adjust the parameters of the first network model according to the first loss function;
a second loss function acquisition module 305, configured to obtain a second loss function corresponding to the second network model and adjust the parameters of the second network model according to the second loss function;
a training module 306, configured to train the first network model and the second network model in turn;
a face reconstruction model confirmation module 307, configured to use the trained first network model as the face reconstruction model, wherein, when the training is completed, the values of the first loss function and the second loss function both reach corresponding target thresholds.
In some embodiments, the second loss function includes a first discriminative adversarial loss, and the second network model includes a global discriminative sub-network;
the second loss function acquisition module 305 includes:
a first discriminant result acquisition sub-module, configured to label the second face image as fake and the target face image as real, and input the second face image and the target face image into the global discriminative sub-network respectively to obtain a first discriminant result and a second discriminant result respectively;
a first discriminative adversarial loss acquisition sub-module, configured to obtain the first discriminative adversarial loss according to the first discriminant result and the second discriminant result.
In some embodiments, the second loss function includes a second discriminative adversarial loss and a third discriminative adversarial loss, and the second network model further includes an eye discriminative sub-network and a mouth discriminative sub-network;
the second loss function acquisition module 305 further includes:
a first image acquisition sub-module, configured to obtain a corresponding first eye image and first mouth image according to the second face image;
a second image acquisition sub-module, configured to obtain a corresponding second eye image and second mouth image according to the target face image;
a labeling sub-module, configured to label the first eye image and the first mouth image as fake and label the second eye image and second mouth image as real, input the first eye image and the second eye image into the eye discriminative sub-network respectively, which outputs a third discriminant result and a fourth discriminant result respectively, and input the first mouth image and the second mouth image into the mouth discriminative sub-network respectively, which outputs a fifth discriminant result and a sixth discriminant result respectively;
a second discriminative adversarial loss acquisition sub-module, configured to obtain the second discriminative adversarial loss according to the third discriminant result and the fourth discriminant result;
a third discriminative adversarial loss acquisition sub-module, configured to obtain the third discriminative adversarial loss according to the fifth discriminant result and the sixth discriminant result.
In some embodiments, the first loss function includes a first sub-loss and a second sub-loss;
the first loss function acquisition module 304 includes:
a face part map acquisition sub-module, configured to obtain a first face part map and a second face part map corresponding to the target face image;
a first parsing sub-module, configured to parse the second face image to obtain a third face part map and a fourth face part map corresponding to the second face image, wherein the first face part map and the second face part map correspond to different regions of the same face image, the first face part map and the third face part map correspond to the same region of different face images, and the second face part map and the fourth face part map correspond to the same region of different face images;
a first sub-loss acquisition sub-module, configured to obtain the first sub-loss according to the difference between the first face part map and the third face part map;
a second sub-loss acquisition sub-module, configured to obtain the second sub-loss according to the difference between the second face part map and the fourth face part map.
In some embodiments, the first face part map includes the facial-feature image of the face image, and the second face part map includes the skin image of the face image.
In some embodiments, the first loss function includes a third sub-loss;
the first loss function acquisition module 304 further includes:
a feature point data acquisition sub-module, configured to obtain first feature point data corresponding to the target face image;
a second parsing sub-module, configured to parse the second face image to obtain second feature point data corresponding to the second face image;
a third sub-loss acquisition sub-module, configured to obtain the third sub-loss according to the difference between the first feature point data and the second feature point data.
In some embodiments, the first feature point data includes heatmaps of the target face image and the second feature point data includes heatmaps of the second face image, wherein the heatmaps include one or more of a left-eye heatmap, a right-eye heatmap, a nose heatmap, a mouth heatmap and a face-contour heatmap of the face image.
In some embodiments, the first loss function includes a fourth sub-loss;
the first loss function acquisition module 304 further includes:
a feature vector acquisition sub-module, configured to obtain a first feature vector corresponding to the target face image;
the feature vector acquisition sub-module being further configured to obtain a second feature vector corresponding to the second face image;
a fourth sub-loss acquisition sub-module, configured to obtain the fourth sub-loss according to the difference between the first feature vector and the second feature vector.
In some embodiments, the first loss function includes a fifth sub-loss;
the first loss function acquisition module 304 further includes:
a fifth sub-loss acquisition sub-module, configured to obtain the fifth sub-loss according to the difference between the target face image and the second face image.
In some embodiments, the first loss function includes one or more of a sixth sub-loss and a seventh sub-loss;
the first loss function acquisition module 304 further includes:
a sixth sub-loss acquisition sub-module, configured to take the perceptual loss of the difference between the eye region image of the target face image and the eye region image of the second face image as the sixth sub-loss; and/or
a seventh sub-loss acquisition sub-module, configured to take the perceptual loss of the difference between the mouth region image of the target face image and the mouth region image of the second face image as the seventh sub-loss.
In some embodiments, the first loss function includes an eighth sub-loss;
the first loss function acquisition module 304 further includes:
an eighth sub-loss acquisition sub-module, configured to obtain the eighth sub-loss according to the generative adversarial loss between the first network model and the second network model, wherein the second network model includes one or more of a global discriminative sub-network, an eye discriminative sub-network and a mouth discriminative sub-network, and the generative adversarial loss is determined by labeling the second face image output by the first network model as real, inputting the second face image into one or more of the global, eye and mouth discriminative sub-networks to obtain discriminant results, and then determining the loss according to the obtained discriminant results.
The training apparatus for a face reconstruction model of the embodiments of the present disclosure can implement the steps of the above embodiments of the training method for a face reconstruction model and can achieve at least the same or similar technical effects, which are not repeated here.
An embodiment of the present disclosure provides a face reconstruction apparatus.
As shown in FIG. 4, in one embodiment, the face reconstruction apparatus 400 includes:
an input image acquisition module 401, configured to obtain an input image;
an input module 402, configured to input the input image into a face reconstruction model to obtain a face reconstruction image, wherein the face reconstruction model is obtained through model training by the training method of a face reconstruction model described in any of the above embodiments.
The face reconstruction apparatus of the embodiments of the present disclosure implements the steps of the above face reconstruction method and can achieve at least the same or similar technical effects, which are not repeated here.
An embodiment of the present disclosure provides an electronic device, including a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the training method of a face reconstruction model described in any of the above, or implements the steps of the face reconstruction method described above.
An embodiment of the present disclosure provides a readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the training method of a face reconstruction model described in any of the above, or implements the steps of the face reconstruction method described above.
The electronic device and the readable storage medium of this embodiment can implement the steps of the above training method of a face reconstruction model and of the face reconstruction method, and can achieve at least the same or similar technical effects, which are not repeated here.
The above are only specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed by the present disclosure, and such changes or substitutions shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (16)

  1. A method for training a face reconstruction model, comprising the following steps:
    obtaining training data, the training data comprising a target face image and a first face image corresponding to the target face image, the clarity of the first face image being smaller than the clarity of the target face image;
    inputting the first face image into a first network model to obtain a second face image, wherein the first network model is a generative network model that takes a face image as input and outputs a reconstructed image of the input face image;
    inputting the target face image and the second face image into a second network model to obtain a discriminant result, wherein the second network model is a discriminant network model that takes a face image as input and outputs a discriminant result on the authenticity of the input face image, the discriminant result comprising discriminant results on the overall authenticity and on the authenticity of local features of the input face image;
    obtaining a first loss function corresponding to the first network model, and adjusting parameters of the first network model according to the first loss function;
    obtaining a second loss function corresponding to the second network model, and adjusting parameters of the second network model according to the second loss function;
    performing the above steps alternately, so as to train the first network model and the second network model in turn;
    using the trained first network model as the face reconstruction model, wherein, when the training is completed, the values of the first loss function and the second loss function both reach corresponding target thresholds.
  2. The method according to claim 1, wherein the second loss function comprises a first discriminative adversarial loss, and the second network model comprises a global discriminative sub-network;
    the obtaining the second loss function corresponding to the second network model comprises:
    labeling the second face image as fake and the target face image as real, and inputting the second face image and the target face image into the global discriminative sub-network respectively, to obtain a first discriminant result and a second discriminant result respectively;
    obtaining the first discriminative adversarial loss according to the first discriminant result and the second discriminant result.
  3. The method according to claim 2, wherein the second loss function comprises a second discriminative adversarial loss and a third discriminative adversarial loss, and the second network model further comprises an eye discriminative sub-network and a mouth discriminative sub-network;
    the obtaining the second loss function corresponding to the second network model comprises:
    obtaining a corresponding first eye image and first mouth image according to the second face image;
    obtaining a corresponding second eye image and second mouth image according to the target face image;
    labeling the first eye image and the first mouth image as fake and labeling the second eye image and the second mouth image as real; inputting the first eye image and the second eye image into the eye discriminative sub-network respectively, which outputs a third discriminant result and a fourth discriminant result respectively; and inputting the first mouth image and the second mouth image into the mouth discriminative sub-network respectively, which outputs a fifth discriminant result and a sixth discriminant result respectively;
    obtaining the second discriminative adversarial loss according to the third discriminant result and the fourth discriminant result;
    obtaining the third discriminative adversarial loss according to the fifth discriminant result and the sixth discriminant result.
  4. The method according to any one of claims 1 to 3, wherein the first loss function comprises a first sub-loss and a second sub-loss;
    the obtaining the first loss function corresponding to the first network model comprises:
    obtaining a first face part map and a second face part map corresponding to the target face image;
    parsing the second face image to obtain a third face part map and a fourth face part map corresponding to the second face image, wherein the first face part map and the second face part map correspond to different regions of the same face image, the first face part map and the third face part map correspond to the same region of different face images, and the second face part map and the fourth face part map correspond to the same region of different face images;
    obtaining the first sub-loss according to the difference between the first face part map and the third face part map;
    obtaining the second sub-loss according to the difference between the second face part map and the fourth face part map.
  5. The method according to claim 4, wherein the first face part map comprises the facial-feature image of the face image, and the second face part map comprises the skin image of the face image.
  6. The method according to any one of claims 1 to 3, wherein the first loss function comprises a third sub-loss;
    the obtaining the first loss function corresponding to the first network model further comprises:
    obtaining first feature point data corresponding to the target face image;
    parsing the second face image to obtain second feature point data corresponding to the second face image;
    obtaining the third sub-loss according to the difference between the first feature point data and the second feature point data.
  7. The method according to claim 6, wherein the first feature point data comprises heatmaps of the target face image and the second feature point data comprises heatmaps of the second face image, wherein the heatmaps comprise one or more of a left-eye heatmap, a right-eye heatmap, a nose heatmap, a mouth heatmap and a face-contour heatmap of the face image.
  8. The method according to any one of claims 1 to 3, wherein the first loss function comprises a fourth sub-loss;
    the obtaining the first loss function corresponding to the first network model further comprises:
    obtaining a first feature vector corresponding to the target face image;
    obtaining a second feature vector corresponding to the second face image;
    obtaining the fourth sub-loss according to the difference between the first feature vector and the second feature vector.
  9. The method according to any one of claims 1 to 3, wherein the first loss function comprises a fifth sub-loss;
    the obtaining the first loss function corresponding to the first network model further comprises:
    obtaining the fifth sub-loss according to the difference between the target face image and the second face image.
  10. The method according to any one of claims 1 to 3, wherein the first loss function comprises one or more of a sixth sub-loss and a seventh sub-loss;
    the obtaining the first loss function corresponding to the first network model further comprises:
    taking the perceptual loss of the difference between the eye region image of the target face image and the eye region image of the second face image as the sixth sub-loss; and/or
    taking the perceptual loss of the difference between the mouth region image of the target face image and the mouth region image of the second face image as the seventh sub-loss.
  11. The method according to any one of claims 1 to 3, wherein the first loss function comprises an eighth sub-loss;
    the obtaining the first loss function corresponding to the first network model further comprises:
    obtaining the eighth sub-loss according to the generative adversarial loss between the first network model and the second network model, wherein the second network model comprises one or more of a global discriminative sub-network, an eye discriminative sub-network and a mouth discriminative sub-network, and the generative adversarial loss is determined by labeling the second face image output by the first network model as real, inputting the second face image into one or more of the global, eye and mouth discriminative sub-networks to obtain discriminant results, and determining the loss according to the obtained discriminant results.
  12. A face reconstruction method, comprising the following steps:
    obtaining an input image;
    inputting the input image into a face reconstruction model to obtain a face reconstruction image, wherein the face reconstruction model is obtained through model training by the training method of a face reconstruction model according to any one of claims 1 to 11.
  13. A training apparatus for a face reconstruction model, comprising:
    a training data acquisition module, configured to obtain training data, the training data comprising a target face image and a first face image corresponding to the target face image, the clarity of the first face image being smaller than the clarity of the target face image;
    a first input module, configured to input the first face image into a first network model to obtain a second face image, wherein the first network model is a generative network model that takes a face image as input and outputs a reconstructed image of the input face image;
    a second input module, configured to input the target face image and the second face image into a second network model to obtain a discriminant result, wherein the second network model is a discriminant network model that takes a face image as input and outputs a discriminant result on the authenticity of the input face image, the discriminant result comprising discriminant results on the overall authenticity and on the authenticity of local features of the input face image;
    a first loss function acquisition module, configured to obtain a first loss function corresponding to the first network model and adjust parameters of the first network model according to the first loss function;
    a second loss function acquisition module, configured to obtain a second loss function corresponding to the second network model and adjust parameters of the second network model according to the second loss function;
    a training module, configured to train the first network model and the second network model in turn;
    a face reconstruction model confirmation module, configured to use the trained first network model as the face reconstruction model, wherein, when the training is completed, the values of the first loss function and the second loss function both reach corresponding target thresholds.
  14. A face reconstruction apparatus, comprising:
    an input image acquisition module, configured to obtain an input image;
    an input module, configured to input the input image into a face reconstruction model to obtain a face reconstruction image, wherein the face reconstruction model is obtained through model training by the training method of a face reconstruction model according to any one of claims 1 to 11.
  15. An electronic device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the training method of a face reconstruction model according to any one of claims 1 to 11, or implements the steps of the face reconstruction method according to claim 12.
  16. A readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the training method of a face reconstruction model according to any one of claims 1 to 11, or implements the steps of the face reconstruction method according to claim 12.
PCT/CN2020/124657 2020-10-29 2020-10-29 人脸重建模型的训练方法及装置、人脸重建方法及装置、电子设备和可读存储介质 WO2022087941A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080002537.5A CN114981835A (zh) 2020-10-29 2020-10-29 人脸重建模型的训练方法及装置、人脸重建方法及装置、电子设备和可读存储介质
PCT/CN2020/124657 WO2022087941A1 (zh) 2020-10-29 2020-10-29 人脸重建模型的训练方法及装置、人脸重建方法及装置、电子设备和可读存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/124657 WO2022087941A1 (zh) 2020-10-29 2020-10-29 人脸重建模型的训练方法及装置、人脸重建方法及装置、电子设备和可读存储介质

Publications (1)

Publication Number Publication Date
WO2022087941A1 true WO2022087941A1 (zh) 2022-05-05

Family

ID=81381725

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124657 WO2022087941A1 (zh) 2020-10-29 2020-10-29 人脸重建模型的训练方法及装置、人脸重建方法及装置、电子设备和可读存储介质

Country Status (2)

Country Link
CN (1) CN114981835A (zh)
WO (1) WO2022087941A1 (zh)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439610B (zh) * 2022-09-14 2024-04-26 中国电信股份有限公司 模型的训练方法、训练装置、电子设备和可读存储介质


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111489290B (zh) * 2019-04-02 2023-05-16 长信智控网络科技有限公司 一种人脸图像超分辨重建方法、装置及终端设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190370608A1 (en) * 2018-05-31 2019-12-05 Seoul National University R&Db Foundation Apparatus and method for training facial locality super resolution deep neural network
CN109615582A (zh) * 2018-11-30 2019-04-12 北京工业大学 一种基于属性描述生成对抗网络的人脸图像超分辨率重建方法
CN109785258A (zh) * 2019-01-10 2019-05-21 华南理工大学 一种基于多判别器生成对抗网络的人脸图像修复方法
CN110543846A (zh) * 2019-08-29 2019-12-06 华南理工大学 一种基于生成对抗网络的多姿态人脸图像正面化方法
CN111080521A (zh) * 2019-12-12 2020-04-28 天津中科智能识别产业技术研究院有限公司 一种基于结构先验的人脸图像超分辨率方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116362972A (zh) * 2023-05-22 2023-06-30 飞狐信息技术(天津)有限公司 图像处理方法、装置、电子设备及存储介质
CN116362972B (zh) * 2023-05-22 2023-08-08 飞狐信息技术(天津)有限公司 图像处理方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN114981835A (zh) 2022-08-30


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20959104

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20959104

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18.08.2023)
