WO2022088089A1 - Image processing method, image processing apparatus, electronic device and readable storage medium

Image processing method, image processing apparatus, electronic device and readable storage medium

Info

Publication number
WO2022088089A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
training
repaired
scale
generator
Prior art date
Application number
PCT/CN2020/125463
Other languages
English (en)
French (fr)
Inventor
王镜茹
陈冠男
胡风硕
刘瀚文
Original Assignee
京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority to US 17/425,715 (published as US20230325973A1)
Priority to CN 202080002585.4 (published as CN114698398A)
Priority to PCT/CN2020/125463
Publication of WO2022088089A1



Classifications

    • G06T 3/4046: Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
    • G06T 3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/60: Image enhancement or restoration using machine learning, e.g. neural networks
    • G06T 2207/20016: Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T 2207/20084: Artificial neural networks [ANN]

Definitions

  • the embodiments of the present disclosure relate to the technical field of image processing, and in particular, to an image processing method, an image processing apparatus, an electronic device, and a readable storage medium.
  • Image quality restoration technology is widely used in old photo restoration, video sharpening and other fields.
  • Most current algorithms use super-resolution reconstruction technology to repair low-definition images, and the results are usually over-smoothed, or the facial features are easily deformed during face restoration. Therefore, how to improve the image restoration effect is an urgent technical problem to be solved.
  • Embodiments of the present disclosure provide an image processing method, an image processing apparatus, an electronic device, and a readable storage medium, which are used to solve the problem that the restoration effect of the current image restoration method is not ideal.
  • an embodiment of the present disclosure provides an image processing method, including:
  • the first generator is obtained by training the generator to be trained by using at least two discriminators.
  • an embodiment of the present disclosure provides an image processing method, including:
  • the first repaired training image and the second repaired training image are fused to obtain a fused image, and the definition of the fused image is higher than that of the input image.
  • an image processing apparatus including:
  • a receiving module for receiving an input image
  • a processing module configured to use the first generator to process the input image to obtain an output image, wherein the definition of the output image is higher than that of the input image;
  • the first generator is obtained by training the generator to be trained by using at least two discriminators.
  • an image processing apparatus including:
  • a receiving module for receiving an input image
  • a face detection module for performing face detection on the input image to obtain a face image
  • a first processing module configured to process the face image by using the method described in the first aspect above to obtain a first repaired training image, wherein the definition of the first repaired training image is higher than the definition of the input image;
  • a second processing module configured to process the input image, or the input image from which the face image has been removed, to obtain a second repaired training image, wherein the definition of the second repaired training image is higher than the definition of the input image;
  • the first repaired training image and the second repaired training image are fused to obtain a fused image, and the definition of the fused image is higher than that of the input image.
  • an embodiment of the present disclosure provides an electronic device, including a processor, a memory, and a program or instruction stored on the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the image processing method described in the first aspect or the second aspect above.
  • an embodiment of the present disclosure provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the image processing described in the first aspect above is implemented The steps of the method, or the steps of implementing the image processing method described in the second aspect above.
  • the restored image details can be enriched, and the restoration effect can be improved.
  • FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of the disclosure
  • FIG. 2 is a schematic diagram of a multi-scale first generator according to an embodiment of the disclosure
  • FIG. 3 is a schematic flowchart of an image processing method according to another embodiment of the disclosure.
  • FIG. 4 is a schematic flowchart of an image processing method according to another embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of a key point extraction method according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a method for generating a key point mask image according to an embodiment of the present disclosure
  • FIG. 7 is a schematic diagram of a multi-scale first generator according to another embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of various types of losses of a generator according to an embodiment of the disclosure.
  • FIG. 9 , FIG. 11 , FIG. 13 , FIG. 17 , FIG. 18 , and FIG. 19 are schematic diagrams of a training method of a generator according to an embodiment of the disclosure.
  • FIG. 10 , FIG. 12 , and FIG. 14 are schematic diagrams of a training method of a discriminator according to an embodiment of the disclosure.
  • FIG. 15 is a schematic diagram of a human face image according to an embodiment of the disclosure.
  • FIG. 16 is a schematic diagram of the input and output of a generator and a discriminator according to an embodiment of the disclosure
  • FIG. 20 is a schematic diagram of a training method of a generator according to another embodiment of the disclosure.
  • FIG. 21 is a schematic diagram of a training method of a discriminator according to another embodiment of the disclosure.
  • FIG. 22 is a schematic diagram of the input and output of a generator and a discriminator according to another embodiment of the disclosure.
  • FIG. 23 is a schematic diagram of a training method for a generator according to another embodiment of the disclosure.
  • FIG. 24 is a schematic diagram of a training method of a discriminator according to another embodiment of the disclosure.
  • FIG. 25 is a schematic flowchart of an image processing method according to another embodiment of the disclosure.
  • FIG. 26 is a schematic structural diagram of an image processing apparatus according to an embodiment of the disclosure.
  • FIG. 27 is a schematic structural diagram of an image processing apparatus according to another embodiment of the disclosure.
  • an embodiment of the present disclosure provides an image processing method, including:
  • Step 11 Receive the input image
  • the input image may be an image to be processed, such as an image with lower definition.
  • the image to be processed may be a video frame extracted from a video, or a picture downloaded through a network or captured by a camera, or an image obtained through other means, which is not limited in this embodiment of the present disclosure.
  • the image processing method provided by the embodiments of the present disclosure can be used to denoise and/or deblur the image to be processed, so as to improve its definition and enhance the image quality.
  • when the input image is a color image, the input image may include a red (R) channel input image, a green (G) channel input image, and a blue (B) channel input image.
  • Step 12 Use a first generator to process the input image to obtain an output image, wherein the definition of the output image is higher than that of the input image, and the first generator is obtained by training the generator to be trained using at least two discriminators.
  • the first generator can be a trained neural network.
  • the generator to be trained can be established based on the structure of the convolutional neural network described above, but the parameters still need to be trained.
  • the generator to be trained is used to train the first generator, and the generator to be trained has more parameters than the first generator.
  • the parameters of the neural network include the weight parameters of each convolutional layer in the neural network. The greater the absolute value of the weight parameter, the greater the contribution of the neuron corresponding to the weight parameter to the output of the neural network, and the more important it is to the neural network.
  • a neural network with more parameters has higher complexity and greater "capacity", which means that the neural network can complete more complex learning tasks.
  • the first generator is simplified, and the first generator has fewer parameters and a simpler network structure, so that the first generator occupies less resources (such as computing resources, storage resources, etc.), so it can be applied to lightweight terminals.
  • the first generator can learn the reasoning ability of the generator to be trained, so that the first generator has a simple structure and strong reasoning ability.
  • "definition" (sharpness) refers to, for example, the clarity of each fine texture and its boundary in an image; the higher the definition, the better the perceptual effect for the human eye.
  • "The definition of the repaired image being higher than that of the input image" means that the input image is processed, for example denoised and/or deblurred, by the image processing method provided by the embodiments of the present disclosure, so that the repaired image obtained after processing is sharper than the input image.
  • the input image may include a face image, that is, the first generator is used to perform face restoration.
  • the input image may also be other types of images.
  • the restored image details can be enriched, and the restoration effect can be improved.
  • the first generator includes N inpainting modules, and the inpainting modules are configured to perform denoising and/or deblurring on an input image of a specified scale to improve its definition.
  • N is an integer greater than or equal to 2.
  • N may be equal to 4.
  • the four repair modules include: a repair module of 64*64 scale, a repair module of 128*128 scale, a repair module of 256*256 scale and a repair module of 512*512 scale.
  • the above-mentioned number of repair modules may also be other values, and the scale corresponding to each repair module is not limited to the four examples described above.
  • the scale refers to the resolution of the image.
  • the network structure adopted by the repair module is SRCNN or U-Net.
  • using the first generator to process the input image to obtain an output image includes:
  • the output image is obtained by using the N inpainting modules and the N scales of images to be inpainted.
  • the latter scale is twice the former scale.
  • the N scales are 64*64 scale, 128*128 scale, 256*256 scale and 512*512 scale respectively.
  • processing the input image into an image to be repaired with N scales includes:
  • the upsampling and downsampling in the above embodiments may be interpolation, such as bicubic interpolation.
  • the input image can be first processed into an image to be repaired with one of the N scales, and then the image to be repaired can be up-sampled and/or down-sampled to obtain images of other N-1 scales to be repaired .
  • the input image can also be sequentially sampled into images to be repaired with N scales.
  • the scale interval to which the scale of the input image belongs is first determined. If the scale of the input image is less than or equal to 96*96, the input image is up-sampled or down-sampled to obtain the 64*64 scale image to be repaired, and the 64*64 scale image to be repaired is then up-sampled to obtain the 128*128, 256*256 and 512*512 scale images to be repaired. If the scale of the input image is greater than 96*96 and less than or equal to 192*192, the input image is up-sampled or down-sampled to obtain the 128*128 scale image to be repaired, which is then down-sampled and up-sampled to obtain the 64*64, 256*256 and 512*512 scale images to be repaired. If the scale of the input image is greater than 192*192 and less than or equal to 384*384, the input image is up-sampled or down-sampled to obtain the 256*256 scale image to be repaired, which is then down-sampled and up-sampled to obtain the 64*64, 128*128 and 512*512 scale images to be repaired. Otherwise (i.e., if the scale of the input image is greater than 384*384), the input image is up-sampled or down-sampled to obtain the 512*512 scale image to be repaired, which is then down-sampled to obtain the 64*64, 128*128 and 256*256 scale images to be repaired.
  • the above-mentioned numerical value for judging the interval to which the input image belongs can be selected as needed.
  • the above threshold values are the intermediate scales of two adjacent scales among the N scales of the image to be repaired; for example, the intermediate scale of the adjacent scales 64*64 and 128*128 is 96*96, the intermediate scale of the adjacent scales 128*128 and 256*256 is 192*192, and so on.
  • the specific scheme is not limited to the above 96*96, 192*192, 384*384.
  • up-sampling or down-sampling may be implemented by interpolation.
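  • As an illustration of the scale-selection logic above, the following sketch (a hypothetical helper, assuming a PyTorch tensor in NCHW layout and the four example scales) resizes the input to the base scale whose interval contains it, then derives the remaining scales from that base image by bicubic interpolation:

```python
import torch
import torch.nn.functional as F

SCALES = [64, 128, 256, 512]   # the N = 4 example scales
THRESHOLDS = [96, 192, 384]    # intermediate scales of adjacent scale pairs

def to_n_scales(img: torch.Tensor) -> dict:
    """img: (1, C, H, W). Returns {scale: (1, C, scale, scale) image to be repaired}."""
    size = max(img.shape[-2], img.shape[-1])
    base = SCALES[sum(size > t for t in THRESHOLDS)]  # interval containing the input size
    base_img = F.interpolate(img, size=(base, base), mode="bicubic", align_corners=False)
    out = {base: base_img}
    for s in SCALES:
        if s != base:  # other scales are sampled from the base image, not the original
            out[s] = F.interpolate(base_img, size=(s, s), mode="bicubic", align_corners=False)
    return out
```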
  • obtaining the output image includes:
  • Step 31 Stitch the image to be repaired at the first scale and the random noise image at the first scale to obtain a first stitched image; input the first stitched image into the first repair module to obtain a repaired image at the first scale; perform up-sampling on the repaired image of the first scale to obtain an up-sampled image of the second scale;
  • the above-mentioned random noise image of the first scale may be generated randomly, or may be generated by up-sampling or down-sampling on a random noise image of the same scale of the input image.
  • For example, for the first repair module, the 64*64 scale image to be repaired and the 64*64 scale random noise image are stitched to obtain the first stitched image; the first stitched image is input into the first repair module to obtain a 64*64 scale repaired image; the 64*64 scale repaired image is then up-sampled to obtain a 128*128 scale up-sampled image;
  • Step 32 Stitch the up-sampled image of the ith scale, the image to be repaired of the ith scale, and the random noise image of the ith scale to obtain the ith stitched image; input the ith stitched image into the ith repair module to obtain the repaired image of the ith scale; perform up-sampling on the repaired image of the ith scale to obtain an up-sampled image of the (i+1)th scale, where i is an integer greater than or equal to 2;
  • the i-th repair module is a repair module located between the first repair module and the last repair module.
  • For the second repair module, the obtained 128*128 scale image to be repaired (i.e., input 2 in FIG. 2), the 128*128 scale random noise image and the 128*128 scale up-sampled image are first stitched to obtain a second stitched image; the second stitched image is input into the second repair module to obtain a 128*128 scale repaired image; the 128*128 scale repaired image is then up-sampled to obtain a 256*256 scale up-sampled image;
  • For the third repair module, the obtained 256*256 scale image to be repaired (i.e., input 3 in FIG. 2), the 256*256 scale random noise image and the 256*256 scale up-sampled image are stitched to obtain a third stitched image; the third stitched image is input into the third repair module to obtain a 256*256 scale repaired image; the 256*256 scale repaired image is then up-sampled to obtain a 512*512 scale up-sampled image;
  • Step 33 Stitch the up-sampled image of the Nth scale, the image to be repaired of the Nth scale, and the random noise image of the Nth scale to obtain the Nth stitched image; input the Nth stitched image into the Nth repair module to obtain the repaired image of the Nth scale as the output image of the first generator.
  • For the last repair module, the obtained 512*512 scale image to be repaired (i.e., input 4 in FIG. 2), the 512*512 scale random noise image and the 512*512 scale up-sampled image are stitched to obtain a fourth stitched image; the fourth stitched image is input into the last repair module to obtain a 512*512 scale repaired image, which serves as the output image of the first generator.
  • random noise is added when the first generator performs image restoration because, if the blurred image alone is input into the first generator, the restored image obtained may look overly smooth due to the lack of high-frequency information. Random noise added to the input of the first generator can be mapped to high-frequency information in the restored image, thereby enriching the details of the restored image.
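  • A minimal sketch of this coarse-to-fine forward pass (hypothetical class and variable names; each repair module is assumed to be some image-to-image network, e.g. U-Net or SRCNN as mentioned above, accepting the concatenated channel count):

```python
import torch
import torch.nn.functional as F
from torch import nn

class MultiScaleGenerator(nn.Module):
    """N repair modules; module i receives [up-sampled previous output, image to repair, noise]."""
    def __init__(self, repair_modules):
        super().__init__()
        self.repair = nn.ModuleList(repair_modules)

    def forward(self, to_repair):
        # to_repair: list of images at increasing scales, e.g. 64, 128, 256, 512
        up = None
        for i, (net, x) in enumerate(zip(self.repair, to_repair)):
            noise = torch.randn_like(x[:, :1])              # random noise map, one channel
            parts = [x, noise] if up is None else [up, x, noise]
            repaired = net(torch.cat(parts, dim=1))         # "stitching" = channel concatenation
            if i < len(self.repair) - 1:                    # up-sample for the next scale
                up = F.interpolate(repaired, scale_factor=2, mode="bicubic", align_corners=False)
        return repaired                                     # repaired image at the largest scale
```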
  • obtaining the output image includes:
  • Step 41 For the image to be repaired at each scale, extract the key points in the image to be repaired, generate multiple keypoint heatmaps, and merge and classify the keypoint heatmaps to obtain S keypoint mask images at each scale, where S is an integer greater than or equal to 2;
  • a 4-stack hourglass model can be used to extract the key points in the image to be repaired; for example, 68 key points are extracted from a face image to generate 68 keypoint heatmaps, where each keypoint heatmap represents, for every pixel in the image, the probability that the pixel is the corresponding landmark.
  • The multiple keypoint heatmaps are then merged (Merge) and classified (softmax) to obtain S keypoint mask images corresponding to different facial parts (components); for example, S may be 5, and the corresponding facial parts may be: left eye, right eye, nose, mouth, and contour, as sketched below.
  • The number of extracted key points is not limited to 68, and the number of keypoint mask images is not limited to 5, that is, the facial parts are not limited to 5.
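  • A plausible reading of the merge-and-softmax step (the grouping of the standard 68 landmark indices into the 5 facial parts below is illustrative, not taken from the patent):

```python
import torch

# illustrative grouping of the 68 landmark indices into 5 facial parts
PARTS = {
    "contour": range(0, 17), "nose": range(27, 36), "left_eye": range(36, 42),
    "right_eye": range(42, 48), "mouth": range(48, 68),
}

def heatmaps_to_masks(heatmaps: torch.Tensor) -> torch.Tensor:
    """heatmaps: (68, H, W) keypoint heatmaps -> (5, H, W) keypoint mask images."""
    merged = torch.stack([heatmaps[list(idx)].max(dim=0).values for idx in PARTS.values()])
    return torch.softmax(merged, dim=0)  # classify each pixel across the 5 parts
```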
  • Step 42 Stitch the image to be repaired at the first scale and the S keypoint mask images at the first scale to obtain a first stitched image; input the first stitched image into the first repair module to obtain a repaired image of the first scale; perform up-sampling on the repaired image of the first scale to obtain an up-sampled image of the second scale;
  • For example, for the first repair module, the 64*64 scale image to be repaired and the 64*64 scale keypoint mask images are stitched to obtain the first stitched image.
  • Step 43 Stitch the up-sampled image of the ith scale, the image to be repaired of the ith scale, and the S keypoint mask images of the ith scale to obtain the ith stitched image; input the ith stitched image into the ith repair module to obtain the repaired image of the ith scale; perform up-sampling on the repaired image of the ith scale to obtain an up-sampled image of the (i+1)th scale, where i is an integer greater than or equal to 2;
  • the i-th repair module is a repair module located between the first repair module and the last repair module.
  • For example, for the third repair module, the obtained 256*256 scale image to be repaired, the 256*256 scale keypoint mask images and the 256*256 scale up-sampled image are first stitched to obtain the third stitched image; the third stitched image is input into the third repair module to obtain a 256*256 scale repaired image; the 256*256 scale repaired image is then up-sampled to obtain a 512*512 scale up-sampled image;
  • Step 44 Stitch the up-sampled image of the Nth scale, the image to be repaired of the Nth scale, and the S keypoint mask images of the Nth scale to obtain the Nth stitched image; input the Nth stitched image into the Nth repair module to obtain the repaired image of the Nth scale as the output image of the first generator.
  • the heat map of the key points of the face is introduced into the image sharpening process, which can reduce the deformation degree of the facial features and improve the final image restoration effect while ensuring the sharpness of the image.
  • the first generator is obtained by training the generator to be trained using at least two discriminators, including: alternately training the generator to be trained and the at least two discriminators according to a training image and a verification image to obtain the first generator, wherein the definition of the verification image is higher than that of the training image, and, when the generator to be trained is trained, the total loss of the generator to be trained includes at least one of the following: a first loss, and the total adversarial loss of the at least two discriminators.
  • the first generator includes N repair modules, where N is an integer greater than or equal to 2.
  • N may be equal to 4, and the four repair modules include: a 64*64 scale repair module, a 128*128 scale repair module, a 256*256 scale repair module and a 512*512 scale repair module.
  • the at least two discriminators include: N first-type discriminators with different network structures corresponding respectively to the N repair modules; for example, if the first generator includes four repair modules, the at least two discriminators include four first-type discriminators, please refer to FIG. 8.
  • the four first type discriminators can be discriminator 1, discriminator 2, discriminator 3 and discriminator 4 in FIG. 8 respectively.
  • Training with first-type discriminators corresponding to multiple scales can make the face image output by the trained first generator closer to a real face image than training with a single discriminator at a single scale; the restoration effect is better, the details are richer, and the deformation is smaller.
  • the following describes the training process of the generator to be trained and the at least two discriminators, respectively.
  • training the generator to be trained includes:
  • Step 91 Process the training images into N-scale training images to be repaired
  • the training image may be first processed into a training image to be repaired of one of the N scales, and then the training image to be repaired is up-sampled and/or down-sampled to obtain other N-1 scale of training images to be repaired.
  • the training images can also be sequentially sampled into N-scale training images to be repaired.
  • the training images can be processed into four training images to be repaired with scales of 64*64, 128*128, 256*256 and 512*512.
  • Step 92 Input the N scales of training images to be repaired into the generator to be trained or the generator after the last training to obtain N scales of repaired training images;
  • If it is the first training, the N scales of training images to be repaired are input into the generator to be trained; if it is not the first training, the N scales of training images to be repaired are input into the generator obtained after the last training.
  • Step 93 For the repaired training image of each scale, set the repaired training image to have a truth-value label, and input the repaired training image with the truth-value label into the initial first-type discriminator or the first-type discriminator after the last training to obtain the first discrimination result;
  • Step 94 Calculate a first adversarial loss based on the first discrimination result; the total adversarial loss includes the first adversarial loss.
  • the first adversarial loss is the sum of adversarial losses corresponding to the repaired training images at each scale.
  • Step 95 Adjust the parameters of the generator to be trained according to the total adversarial loss.
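  • A compact sketch of Steps 91-95 (hypothetical names; a standard non-saturating cross-entropy adversarial loss is assumed, since the patent does not reproduce the exact loss form in this text, and the generator is assumed to return the repaired training image at every scale):

```python
import torch
import torch.nn.functional as F

def generator_step(generator, discriminators, to_repair, g_optimizer):
    """to_repair: list of N training images to be repaired, one per scale."""
    repaired = generator(to_repair)                # N repaired training images
    total_adv = 0.0
    for d, fake in zip(discriminators, repaired):  # one first-type discriminator per scale
        pred = d(fake)                             # repaired image presented with a truth-value label
        total_adv = total_adv + F.binary_cross_entropy_with_logits(pred, torch.ones_like(pred))
    g_optimizer.zero_grad()
    total_adv.backward()                           # first adversarial loss, summed over scales
    g_optimizer.step()
    return float(total_adv)
```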
  • training the at least two discriminators includes:
  • Step 101 Process the training image into a training image of N scales to be repaired, and process the verification image into a verification image of N scales;
  • the training image may be first processed into a training image to be repaired of one of the N scales, and then the training image to be repaired is up-sampled and/or down-sampled to obtain other N-1 scale of training images to be repaired.
  • the training images can also be sequentially sampled into N-scale training images to be repaired.
  • the verification image may be first processed into a verification image of one scale among N scales, and then the processed verification image may be up-sampled and/or down-sampled to obtain other N-1 Validation images at scales.
  • the verification images can also be sequentially sampled into verification images of N scales.
  • the training images can be processed into four training images to be repaired with scales of 64*64, 128*128, 256*256 and 512*512.
  • the verification images are processed into four verification images of 64*64, 128*128, 256*256 and 512*512 scales.
  • Step 102 Input the N scales of training images to be repaired into the generator to be trained or the generator after the last training to obtain N scales of repaired training images;
  • Step 103 For the repaired training image of each scale, set the repaired training image to have a false-value label, and input the repaired training image with the false-value label into the initial first-type discriminator or the first-type discriminator after the last training to obtain the third discrimination result; set the verification image of each scale to have a truth-value label, and input each verification image with the truth-value label into the first-type discriminator to obtain a fourth discrimination result;
  • Step 104 Calculate a third adversarial loss based on the third discrimination result and the fourth discrimination result;
  • Step 105 Adjust the parameters of the first-type discriminator according to the third adversarial loss to obtain an updated first-type discriminator.
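  • The corresponding discriminator step (Steps 101-105) under the same assumptions, with the generator frozen while the first-type discriminators are updated:

```python
import torch
import torch.nn.functional as F

def discriminator_step(generator, discriminators, to_repair, verify, d_optimizer):
    """verify: list of N verification images, sharper than the training images."""
    with torch.no_grad():                          # the generator is not updated in this phase
        repaired = generator(to_repair)
    loss = 0.0
    for d, fake, real in zip(discriminators, repaired, verify):
        p_fake, p_real = d(fake), d(real)          # false-value vs truth-value labels
        loss = loss + F.binary_cross_entropy_with_logits(p_fake, torch.zeros_like(p_fake)) \
                    + F.binary_cross_entropy_with_logits(p_real, torch.ones_like(p_real))
    d_optimizer.zero_grad()
    loss.backward()                                # third adversarial loss
    d_optimizer.step()
    return float(loss)
```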
  • the at least two discriminators further include: N first-type discriminators with different network structures corresponding respectively to the N repair modules, and a second-type discriminator, where the second-type discriminator is configured to improve the first generator's restoration of the definition of the face parts of the training image, so that the definition of the face parts in the image output by the trained first generator is higher;
  • the following describes the training process of the generator to be trained and the at least two discriminators respectively.
  • training the generator to be trained includes:
  • Step 111 Process the training images into N-scale training images to be repaired
  • the training image may be first processed into a training image to be repaired of one of the N scales, and then the training image to be repaired is up-sampled and/or down-sampled to obtain other N-1 A scale of training images to be repaired.
  • the training images can also be sequentially sampled into N-scale training images to be repaired.
  • the training images can be processed into four training images to be repaired with scales of 64*64, 128*128, 256*256 and 512*512.
  • Step 112 Input the N scales of training images to be repaired into the generator to be trained or the generator after the last training to obtain N scales of repaired training images;
  • Step 113 Obtain the first partial face image of the repaired training image of the Nth scale
  • the first partial face image is an eye image.
  • the eye image in the repaired training image of the Nth scale may be captured as the first partial face image in a direct screenshot manner.
  • Step 114 For the repaired training image of each scale, set the repaired training image to have a truth-value label, and input the repaired training image with the truth-value label into the initial first-type discriminator or the first-type discriminator after the last training to obtain the first discrimination result;
  • Step 115 Set the first partial face image to have a truth-value label, and input the first partial face image with the truth-value label into the initial second-type discriminator or the second-type discriminator after the last training to obtain a second discrimination result;
  • For example, the discriminator 5 in FIG. 8 is a second-type discriminator; the first partial face image is set to have a truth-value label, and the first partial face image with the truth-value label is input into the discriminator 5 to obtain the second discrimination result of the discriminator 5.
  • Step 116 Calculate a first adversarial loss based on the first discrimination result; calculate a second adversarial loss based on the second discrimination result, where the total adversarial loss includes the first adversarial loss and the second adversarial loss;
  • the first adversarial loss is the sum of adversarial losses corresponding to the repaired training images at each scale.
  • Step 117 Adjust parameters of the generator to be trained or the generator trained last time according to the total adversarial loss.
  • training the at least two discriminators includes:
  • Step 121 Process the training image into a training image of N scales to be repaired, and process the verification image into a verification image of N scales;
  • the training image may be first processed into a training image to be repaired of one of the N scales, and then the training image to be repaired is up-sampled and/or down-sampled to obtain other N-1 scale of training images to be repaired.
  • the training images can also be sequentially sampled into N-scale training images to be repaired.
  • the verification image may be first processed into a verification image of one scale among N scales, and then the processed verification image may be up-sampled and/or down-sampled to obtain other N-1 Validation images at scales.
  • the verification images can also be sequentially sampled into verification images of N scales.
  • the training images can be processed into four training images to be repaired with scales of 64*64, 128*128, 256*256 and 512*512.
  • the verification images are processed into four verification images of 64*64, 128*128, 256*256 and 512*512 scales.
  • Step 122 Obtain the second partial face image of the verification image of the Nth scale
  • the first partial face image and the second partial face image are eye images.
  • the eye image in the verification image of the Nth scale may be captured as the second partial face image in a direct screenshot manner.
  • Step 123 Input the N scales of training images to be repaired into the generator to be trained or the generator after the last training to obtain N scales of repaired training images;
  • Step 124 Obtain the first partial face image of the repaired training image of the Nth scale
  • the eye image in the repaired training image of the Nth scale may be captured as the first partial face image in a direct screenshot manner.
  • Step 125 For the repaired training image of each scale, set the repaired training image to have a false-value label, and input the repaired training image with the false-value label into the initial first-type discriminator or the first-type discriminator after the last training to obtain the third discrimination result; set the verification image of each scale to have a truth-value label, and input each verification image with the truth-value label into the first-type discriminator to obtain a fourth discrimination result;
  • Step 126 Set the first partial face image to have a false-value label, and input the first partial face image with the false-value label into the initial second-type discriminator or the second-type discriminator after the last training to obtain a fifth discrimination result; set the second partial face image to have a truth-value label, and input the second partial face image with the truth-value label into the initial second-type discriminator or the second-type discriminator after the last training to obtain the sixth discrimination result;
  • Step 127 Calculate a third adversarial loss based on the third discrimination result and the fourth discrimination result; calculate a fourth adversarial loss based on the fifth discrimination result and the sixth discrimination result;
  • Step 128 Adjust the parameters of the first type discriminator according to the third adversarial loss to obtain an updated first type discriminator; adjust the parameters of the second type discriminator according to the fourth adversarial loss to obtain The updated second-class discriminator.
  • the training effect can be improved by increasing the confrontation loss of the eye image.
  • the at least two discriminators further include: X third-type discriminators, where X is a positive integer greater than or equal to 1. The third-type discriminators are configured to improve the first generator's restoration of details at the face parts of the training image; that is, compared with other training methods, in the face image output by the first generator trained with the third-type discriminators, the image of the human eye, for example, is sharper and has more detail.
  • training the generator to be trained further includes:
  • Step 131 Process the training images into N-scale training images to be repaired
  • Step 132 Input the N scales of training images to be repaired into the generator to be trained or the generator after the last training to obtain N scales of repaired training images;
  • Step 133 Perform face parsing on the repaired image of the Nth scale by using a face parsing network to obtain X first face-part images corresponding to the repaired image of the Nth scale, where, if X is equal to 1, the first face-part image contains one face part, and, if X is greater than 1, the X first face-part images contain different face parts;
  • the face parsing network adopts a semantic segmentation network.
  • the face parsing network parses the human face, and the output face parts may include at least one of the following: background, facial skin, left eyebrow, right eyebrow, left eye, right eye, left ear, right ear, nose, teeth, upper lip, lower lip, clothes, hair, hat, glasses, neck, etc.
  • Step 134 Set the X first face-part images to have truth-value labels, and input each first face-part image with a truth-value label into the initial third-type discriminator or the third-type discriminator after the last training to obtain the seventh discrimination result;
  • Step 135 Calculate a fifth adversarial loss based on the seventh discrimination result; the total adversarial loss includes the fifth adversarial loss;
  • Step 136 Adjust the parameters of the generator to be trained or the generator after the last training according to the total adversarial loss.
  • training the at least two discriminators includes:
  • Step 141 Process the training image into a training image of N scales to be repaired, and process the verification image into a verification image of N scales;
  • Step 142 Input the N scales of training images to be repaired into the generator to be trained or the generator after the last training to obtain N scales of repaired training images;
  • Step 143 Perform face parsing on the repaired image of the Nth scale by using a face parsing network to obtain X first face-part images corresponding to the repaired image of the Nth scale, where the X first face-part images contain different face parts; perform face parsing on the verification image of the Nth scale by using the face parsing network to obtain X second face-part images corresponding to the verification image of the Nth scale, where the X second face-part images contain different face parts;
  • the face parsing network adopts a semantic segmentation network.
  • the face parsing network parses the human face, and the output face parts may include at least one of the following: background, facial skin, left eyebrow, right eyebrow, left eye, right eye, left ear, right ear, nose, teeth, upper lip, lower lip, clothes, hair, hat, glasses, neck, etc.
  • For example, when X is equal to 1, the third-type discriminator may be configured to improve the first generator's restoration of the details of the facial skin of the training image; that is, compared with other training methods, the skin image in the face image output by the first generator trained with the third-type discriminator is clearer, and the skin image has more details.
  • Step 144 Set the X first face-part images to have false-value labels, and input each first face-part image with a false-value label into the initial third-type discriminator or the third-type discriminator after the last training to obtain the eighth discrimination result; set the X second face-part images to have truth-value labels, and input each second face-part image with a truth-value label into the initial third-type discriminator or the third-type discriminator after the last training to obtain the ninth discrimination result;
  • Step 145 Calculate a sixth adversarial loss based on the eighth discrimination result and the ninth discrimination result;
  • Step 146 Adjust the parameters of the third type discriminator according to the sixth adversarial loss to obtain an updated third type discriminator.
  • FIG. 16 is a schematic diagram of the input and output of the generator and discriminator to be trained according to an embodiment of the present disclosure.
  • the input of the generator to be trained includes training images of N scales and random noise images of N scales (or keypoint mask images of N scales); the output of the generator to be trained is the repaired training image. The discriminators include the N first-type discriminators corresponding to the repair modules of the N scales described above, and the input of the discriminators includes: the repaired training image of the generator to be trained, the verification images of N scales, the X face-part images corresponding to the verification image of the Nth scale, and the X face-part images corresponding to the repaired training image of the Nth scale.
  • the facial features, skin and/or hair, etc. of the face are segmented and input into the discriminators to be discriminated as true or false, so that, when the generator is trained, the restoration of each part of the face is subject to an adversarial constraint, as sketched below.
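  • A sketch of how one face-part image could be cut out of the parsing result (assuming the face parsing network outputs one integer label per pixel; the label value is illustrative):

```python
import torch

def extract_part(image: torch.Tensor, parsing: torch.Tensor, label: int) -> torch.Tensor:
    """image: (C, H, W); parsing: (H, W) integer labels. Keeps only one face part."""
    mask = (parsing == label).to(image.dtype)  # 1 where the pixel belongs to the part
    return image * mask                        # e.g. label 1 = facial skin, label 4 = left eye
```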
  • the total loss of the generator to be trained further includes: face similarity loss;
  • training the generator to be trained further includes:
  • Step 171 Process the training images into N-scale training images to be repaired
  • Step 172 Input the N scales of training images to be repaired into the generator to be trained or the generator after the last training to obtain N scales of repaired training images;
  • Step 173 Use a keypoint detection network to perform keypoint detection on the repaired image of the Nth scale to obtain a first keypoint heatmap corresponding to the repaired image of the Nth scale;
  • Step 174 Use the keypoint detection network to perform keypoint detection on the Nth scale training image to be repaired to obtain a second keypoint heatmap corresponding to the Nth scale training image to be repaired;
  • Step 175 Calculate the face similarity loss according to the first keypoint heatmap and the second keypoint heatmap.
  • the key point detection module in FIG. 8 is the key point detection network
  • the heatmap_1 is the first keypoint heatmap
  • the heatmap_2 is the second keypoint heatmap.
  • a 4-stack hourglass model can be used to extract the key points in the Nth scale training image to be repaired and in the repaired training image; for example, 68 key points are extracted from the face image to generate 68 keypoint heatmaps, where each keypoint heatmap represents, for every pixel in the image, the probability that the pixel is the corresponding key point (landmark).
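  • The face similarity loss can then be a distance between the two heatmap stacks; a minimal sketch (mean squared error is an assumption, the patent does not name the distance in this text):

```python
import torch
import torch.nn.functional as F

def face_similarity_loss(heatmap_1: torch.Tensor, heatmap_2: torch.Tensor) -> torch.Tensor:
    """heatmap_1, heatmap_2: (68, H, W) keypoint heatmaps of the repaired training image
    and of the training image to be repaired (heatmap_1 and heatmap_2 in FIG. 8)."""
    return F.mse_loss(heatmap_1, heatmap_2)
```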
  • the total loss of the generator to be trained further includes: average gradient loss;
  • training the generator to be trained further includes:
  • Step 181 Process the training images into N-scale training images to be repaired
  • Step 182 Input the N scales of training images to be repaired into the generator to be trained or the generator after the last training to obtain N scales of repaired training images;
  • Step 183 Calculate the average gradient loss of the repaired training image of the Nth scale.
  • the average gradient loss AvgG is calculated as follows:
  • AvgG = \frac{1}{(m-1)(n-1)} \sum_{i=1}^{m-1} \sum_{j=1}^{n-1} \sqrt{ \frac{ (\Delta_x f_{i,j})^2 + (\Delta_y f_{i,j})^2 }{2} }
  • where m and n are respectively the width and height of the repaired training image of the Nth scale, f_{i,j} is the pixel of the repaired training image of the Nth scale at position (i, j), \Delta_x f_{i,j} = f_{i+1,j} - f_{i,j} represents the difference between f_{i,j} and the adjacent pixel in the row direction, and \Delta_y f_{i,j} = f_{i,j+1} - f_{i,j} represents the difference between f_{i,j} and the adjacent pixel in the column direction.
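  • A direct implementation of the formula above (for a single-channel image; whether AvgG enters the total loss with a negative weight, so that richer detail is rewarded, is not specified in this text):

```python
import torch

def average_gradient(img: torch.Tensor) -> torch.Tensor:
    """img: (H, W). Mean of sqrt(((row diff)^2 + (column diff)^2) / 2) over the image."""
    dx = img[1:, :-1] - img[:-1, :-1]  # difference with the adjacent pixel in the row direction
    dy = img[:-1, 1:] - img[:-1, :-1]  # difference with the adjacent pixel in the column direction
    return torch.sqrt((dx ** 2 + dy ** 2) / 2).mean()
```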
  • the first generator includes N repair modules, and the loss used by the generator to be trained includes the first loss; in this embodiment, the first loss may be referred to as the perceptual loss;
  • training the generator to be trained further includes:
  • Step 191 Process the training image into a training image of N scales to be repaired, and process the verification image into a verification image of N scales;
  • Step 192 Input the N scales of training images to be repaired into the generator to be trained or the generator after the last training to obtain N scales of repaired training images;
  • Step 193 Input the repaired training images of the N scales and the verification images of the N scales into the VGG network, and obtain the loss of the repaired training images of each scale on the M target layers of the VGG network, M is an integer greater than or equal to 1; the first loss includes the loss of the N scales of repaired training images on M target layers.
  • the first loss includes: the losses of the repaired training images of each scale on the M target layers multiplied by the corresponding weights and summed, where the weights used at the target layers for the repaired training images of different scales are different.
  • the generator to be trained includes four scales of repair modules, namely 64*64, 128*128, 256*256, and 512*512.
  • the VGG network is a VGG19 network
  • the M target layers are layers 2-2, 3-4, 4-4 and 5-4 of the VGG19 network, respectively
  • the first loss (that is, the perceptual loss) L is calculated as follows:
  • L = L_per_64 + L_per_128 + L_per_256 + L_per_512
  • where L_per_64, L_per_128, L_per_256 and L_per_512 are the perceptual losses of the repaired training images at the scales 64*64, 128*128, 256*256 and 512*512, respectively, and each per-scale perceptual loss is the weighted sum of the perceptual losses of that repaired training image at layers 2-2, 3-4, 4-4 and 5-4, with the weights at each target layer differing across scales.
  • the weights used at the target layers for the repaired training images of different scales may also be the same.
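  • A sketch of the per-scale perceptual loss using torchvision's VGG19 (the feature indices below are the usual positions of the 2-2, 3-4, 4-4 and 5-4 activations in torchvision's layer stack; the layer distance and the weights are assumptions, since the patent's exact values are not reproduced in this text):

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

TARGET_LAYERS = [8, 17, 26, 35]  # relu2_2, relu3_4, relu4_4, relu5_4 in vgg19().features

features = vgg19(weights="IMAGENET1K_V1").features.eval()
for p in features.parameters():
    p.requires_grad_(False)

def perceptual_loss(repaired, verify, weights):
    """repaired, verify: (1, 3, s, s) images at one scale; weights: one weight per target layer."""
    loss, x, y = 0.0, repaired, verify
    for idx, layer in enumerate(features):
        x, y = layer(x), layer(y)
        if idx in TARGET_LAYERS:
            loss = loss + weights[TARGET_LAYERS.index(idx)] * F.l1_loss(x, y)
    return loss

# first loss L: sum of the per-scale perceptual losses with scale-specific weights, e.g.
# L = sum(perceptual_loss(r, v, w) for r, v, w in zip(repaired_imgs, verify_imgs, scale_weights))
```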
  • the first loss further includes at least one of the following: an L1 loss, a second loss, and a third loss;
  • the training of the generator to be trained includes:
  • the first face skin image and the second face skin image are input into the VGG network to obtain the third loss of the first face skin image on the M target layers of the VGG network.
  • the at least two discriminators include: a fourth-type discriminator and a fifth-type discriminator; the fourth-type discriminator is configured to keep the structural features of the training image; specifically, the output image of the first generator can retain more content information of the input image; the fifth-type discriminator is configured to enhance the first generator's restoration of the details of the training image; specifically, compared with other training methods, the output image processed by the first generator trained with the fifth-type discriminator has more detailed features and higher definition.
  • training the generator to be trained includes:
  • Step 201 Process the training images into N-scale training images to be repaired
  • Step 202 Input the N scales of training images to be repaired into the generator to be trained or the generator after the last training to obtain N scales of repaired training images;
  • Step 203 For the repaired training image of each scale, set the repaired training image to have a truth-value label, and input the repaired training image with the truth-value label into the initial fourth-type discriminator or the fourth-type discriminator after the last training to obtain the tenth discrimination result;
  • Step 204 Calculate a seventh adversarial loss based on the tenth discrimination result;
  • Step 205 For the repaired training image of each scale, set the repaired training image to have a truth-value label, and input the repaired training image with the truth-value label into the initial fifth-type discriminator or the fifth-type discriminator after the last training to obtain the eleventh discrimination result;
  • Step 206 Calculate an eighth adversarial loss based on the eleventh discrimination result; the total adversarial loss includes the seventh adversarial loss and the eighth adversarial loss.
  • Step 207 Adjust the parameters of the generator to be trained or the generator trained last time according to the total adversarial loss.
  • training the at least two discriminators includes:
  • Step 211 Process the training image into a training image of N scales to be repaired; process the verification image into a verification image of N scales;
  • Step 212 Input the N scales of training images to be repaired into the generator to be trained or the generator after the last training to obtain N scales of repaired training images;
  • Step 213 For the repaired training image of each scale, set the repaired training image to have a false-value label, and input the repaired training image with the false-value label into the initial fourth-type discriminator or the fourth-type discriminator after the last training to obtain the twelfth discrimination result; for the training image to be repaired of each scale, set the training image to be repaired to have a truth-value label, and input the training image to be repaired with the truth-value label into the initial fourth-type discriminator or the fourth-type discriminator after the last training to obtain the thirteenth discrimination result;
  • Step 214 Calculate a ninth adversarial loss based on the twelfth discrimination result and the thirteenth discrimination result;
  • Step 215 Adjust the parameters of the fourth type of discriminator according to the ninth adversarial loss to obtain an updated fourth type of discriminator.
  • Step 216 For the repaired training image of each scale, perform high-frequency filtering processing on the repaired training image and the verification image of the corresponding scale to obtain the repaired training image and the verification image after high-frequency filtering;
  • Step 217 For the high-frequency filtered repaired training image of each scale, set the high-frequency filtered repaired training image to have a false-value label, and input the high-frequency filtered repaired training image with the false-value label into the initial fifth-type discriminator or the fifth-type discriminator after the last training to obtain the fourteenth discrimination result; for the Gaussian (high-frequency) filtered verification image of each scale, set the filtered verification image to have a truth-value label, and input the filtered verification image with the truth-value label into the initial fifth-type discriminator or the fifth-type discriminator after the last training to obtain the fifteenth discrimination result;
  • Step 218 Calculate the tenth adversarial loss based on the fourteenth discrimination result and the fifteenth discrimination result;
  • Step 219 Adjust the parameters of the fifth type discriminator according to the tenth adversarial loss to obtain an updated fifth type discriminator.
  • FIG. 22 is a schematic diagram of the input and output of the generator and discriminator to be trained according to another embodiment of the present disclosure.
  • the input of the generator to be trained includes training images of N scales, N-scale random noise images (or N-scale keypoint mask images), the output of the generator to be trained is the repaired training image after repair;
  • the fourth-type discriminators include N discriminators corresponding to the repair modules of the N scales described above, and the input of the fourth-type discriminators includes: the repaired training image of the generator to be trained and the training images of N scales. The fifth-type discriminators likewise include N discriminators corresponding to the repair modules of the N scales, and the input of the fifth-type discriminators includes: the high-frequency filtered repaired training image of the generator to be trained and the high-frequency filtered verification images of N scales.
  • the above verification image may be an image with the same content as the training image but different in definition, or may be an image with different content and different definition from the training image.
  • two types of discriminators are designed.
  • the reason for this design is that detailed texture is the high-frequency information in an image, and the high-frequency information in natural images follows a particular distribution.
  • the fifth type of discriminator and the generator are trained against each other, so that the generator learns the distribution obeyed by the detailed texture, so that the smooth low-definition image can be mapped to the real natural image space with rich details.
  • the fourth type of discriminator discriminates the low-definition image and its corresponding restoration result, and can constrain the image to maintain its structural features and not deform after passing through the generator.
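  • A common realization of a Gaussian high-frequency filter HF is to subtract the Gaussian-blurred (low-pass) copy from the image; a sketch with illustrative kernel size and sigma:

```python
import torch
import torchvision.transforms.functional as TF

def high_frequency(img: torch.Tensor, kernel_size: int = 21, sigma: float = 3.0) -> torch.Tensor:
    """img: (C, H, W). High-pass = image minus its Gaussian-blurred (low-pass) version."""
    low = TF.gaussian_blur(img, kernel_size=[kernel_size, kernel_size], sigma=[sigma, sigma])
    return img - low
```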
  • the loss function of the fifth type of discriminator is as follows:
  • the loss function for the fourth type of discriminator is as follows:
  • G represents the generator
  • D1 and D2 represent the fifth-type and fourth-type discriminators, respectively
  • HF represents the Gaussian high-frequency filter
  • x represents the training image input to the generator
  • y represents the real high-definition verification image.
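  • With these definitions, and assuming the standard adversarial (cross-entropy) form that the surrounding description suggests, the two losses can be sketched as:

```latex
% Fifth-type discriminator D1: real samples are high-pass filtered HD verification images,
% fake samples are high-pass filtered generator outputs.
L_{D_1} = \mathbb{E}_{y}\!\left[\log D_1(\mathrm{HF}(y))\right]
        + \mathbb{E}_{x}\!\left[\log\left(1 - D_1(\mathrm{HF}(G(x)))\right)\right]

% Fourth-type discriminator D2: real samples are the low-definition training images x themselves,
% fake samples are their restorations G(x), constraining G to preserve structure.
L_{D_2} = \mathbb{E}_{x}\!\left[\log D_2(x)\right]
        + \mathbb{E}_{x}\!\left[\log\left(1 - D_2(G(x))\right)\right]
```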
  • training the generator to be trained further includes:
  • α, β, γ represent the weights of the respective losses, and AvgG represents the average gradient loss.
  • the average gradient can be used to evaluate the richness of the detailed texture in the image. The richer the details in the image, the faster the gray value changes in a certain direction, and the larger the average gradient value is.
  • the average gradient loss AvgG is calculated as described above: AvgG = \frac{1}{(m-1)(n-1)} \sum_{i=1}^{m-1} \sum_{j=1}^{n-1} \sqrt{ \frac{ (\Delta_x f_{i,j})^2 + (\Delta_y f_{i,j})^2 }{2} }, where m and n are respectively the width and height of the repaired training image of the Nth scale, and f_{i,j} is the pixel of the repaired training image of the Nth scale at position (i, j).
  • the first generator includes N repair modules, and the at least two discriminators include: N first-type discriminators with different network structures corresponding respectively to the N repair modules;
  • training the generator to be trained includes:
  • Step 231 Process the training images into N-scale training images to be repaired
  • Step 232 For each scale of the training image to be repaired, extract the key points in the to-be-repaired training image, generate multiple key point heatmaps, merge and classify the key point heatmaps, and obtain the S key point mask images, where S is an integer greater than or equal to 2;
  • Step 233 Input the N scales of the training images to be repaired and the S key point mask images of each scale into the generator to be trained or the generator after the last training to obtain the repaired training images of N scales;
  • Step 234 For the repaired training image of each scale, set the repaired training image to have a truth-value label, and input the repaired training image with the truth-value label into the initial first-type discriminator or the first-type discriminator after the last training to obtain the first discrimination result;
  • Step 235 Calculate a first adversarial loss based on the first discrimination result; the total adversarial loss includes the first adversarial loss;
  • Step 236 Adjust the parameters of the generator to be trained or the generator trained last time according to the total adversarial loss
  • training the at least two discriminators includes:
  • Step 241 Process the training image into a training image of N scales to be repaired, and process the verification image into a verification image of N scales;
  • Step 242 For each scale of the training image to be repaired, extract the key points in the to-be-repaired training image, generate multiple key point heatmaps, merge and classify the key point heatmaps, and obtain the S key point mask images;
  • Step 243 Input the N scale training images to be repaired and the S key point mask images of each scale into the generator to be trained or the generator after the last training to obtain N scale repair training images;
  • Step 244 For each scale, set the repaired training image of that scale to have a false-value label, and input the repaired training image with the false-value label to the initial first-type discriminator or the first-type discriminator after the last training, to obtain the third discrimination result; set the verification image of each scale to have a true-value label, and input each verification image with a true-value label into the first-type discriminator to obtain the fourth discrimination result;
  • Step 245 Calculate a third adversarial loss based on the third discrimination result and the fourth discrimination result
  • Step 246 Adjust the parameters of the first type of discriminator according to the third adversarial loss to obtain an updated first type of discriminator.
  • training the generator to be trained includes:
  • the first loss includes the loss of the N scales of inpainted training images on M target layers.
  • the first loss includes: the losses of the repaired training images of each scale on the M target layers, multiplied by the corresponding weights and then summed, where repaired training images of different scales use different weights at the target layers.
  • the generator to be trained includes four scales of repair modules, namely 64*64, 128*128, 256*256, and 512*512.
  • the VGG network is a VGG19 network
  • the M target layers are layers 2-2, 3-4, 4-4, and 5-4, respectively
  • the calculation formula of the first loss (that is, the perceptual loss) L is as follows: L = L_per_64 + L_per_128 + L_per_256 + L_per_512
  • L_per_64 is the perceptual loss of the repaired training image at scale 64*64
  • L_per_128 is the perceptual loss of the repaired training image at scale 128*128
  • L_per_256 is the perceptual loss of the repaired training image at scale 256*256
  • L_per_512 is the perceptual loss of the repaired training image at scale 512*512
  • ℓ^{2-2}, ℓ^{3-4}, ℓ^{4-4} and ℓ^{5-4} denote the perceptual losses of the repaired training images of the different scales at layers 2-2, 3-4, 4-4 and 5-4 of the VGG network, respectively.
  • the calculation method of the L2 loss is as follows: the training image is processed into N scales of training images to be repaired, and the verification image is processed into N scales of verification images; the N scales of training images to be repaired are input to the generator to be trained or the generator after the last training to obtain the repaired training images of N scales; the L2 loss is obtained by comparing the repaired training images of N scales with the verification images of N scales.
  • the first generator includes N repair modules, and each of the repair modules adopts the same network structure;
  • the training process for the generator to be trained includes a first training phase and a second training phase; both the first training phase and the second training phase include at least one training process for the generator to be trained;
  • each of the repair modules is independently adjusted for parameters.
  • the shared parameters are decoupled, so that the super-resolution module on each scale can pay more attention to the information on that scale, so as to achieve better detail restoration effect.
  • an embodiment of the present disclosure further provides an image processing method, including:
  • Step 251 receive an input image
  • Step 252 performing face detection on the input image to obtain a face image
  • optionally, performing face detection on the input image to obtain a face image includes: performing face detection on the input image to obtain a detection image, and standardizing and aligning the detection image to obtain the face image.
  • Step 253 Use the method in any of the above embodiments to process the face image to obtain a first repaired training image, wherein the clarity of the first repaired training image is higher than that of the input image;
  • Step 254 Process the input image or the input image from which the face image is removed to obtain a second repaired training image, wherein the clarity of the second repaired training image is higher than that of the input image;
  • Step 255 fuse the first repaired training image and the second repaired training image to obtain a fused image, and the definition of the fused image is higher than that of the input image.
  • processing the input image or the input image from which the face image is removed to obtain the second repaired training image includes: using the method described in any of the foregoing embodiments to The image or the input image from which the face image is removed is processed to obtain a second repaired training image.
  • an embodiment of the present application further provides an image processing apparatus 260, including:
  • a receiving module 261, configured to receive an input image
  • the processing module 262 is configured to process the input image by using a first generator to obtain an output image, wherein the definition of the output image is higher than that of the input image; wherein the first generator is obtained by training a generator to be trained with at least two discriminators.
  • the first generator includes N repair modules, where N is an integer greater than or equal to 2;
  • the processing module is configured to process the input image into N scales of images to be repaired, wherein the scales increase successively from the first-scale image to be repaired to the Nth-scale image to be repaired; and to obtain the output image by using the N repair modules and the N scales of images to be repaired.
  • the latter scale is twice as large as the former scale.
  • the processing module is configured to determine the scale interval to which the input image belongs; process the input image into an image to be repaired at the j-th scale corresponding to the scale interval to which it belongs, where the j-th scale is one of the first scale to the Nth scale; and perform up-sampling and/or down-sampling processing on the j-th-scale image to be repaired to obtain the remaining N-1 scales of images to be repaired.
  • processing module is used for:
  • the processing module is used for:
  • a 4-stack hourglass model is used to extract key points in the image to be repaired.
  • the device further includes:
  • a training module configured to alternately train the generator to be trained and the at least two discriminators according to the training image and the verification image to obtain the first generator, wherein the verification image has a higher definition than the training image;
  • the total loss of the generator to be trained includes at least one of the following: a first loss and a total adversarial loss of the at least two discriminators.
  • the first generator includes N repair modules, where N is an integer greater than or equal to 2, and the at least two discriminators include: N first-type discriminators with different network structures corresponding respectively to the N repair modules, and a second type of discriminator, the second type of discriminator being configured to improve the first generator's definition restoration of partial face regions of the training image.
  • the training module includes a first training submodule
  • the first training submodule is used to train the generator to be trained, including:
  • the first partial face image is set to have a true-value label, and the first partial face image with the true-value label is input to the initial second-type discriminator or the second-type discriminator after the last training, to obtain the second discrimination result;
  • the first training submodule is used to train the at least two discriminators, including:
  • the first partial face image is set to have a false-value label, and the first partial face image with the false-value label is input to the initial second-type discriminator or the second-type discriminator after the last training, to obtain the fifth discrimination result; the second partial face image is set to have a true-value label, and the second partial face image with the true-value label is input to the initial second-type discriminator or the second-type discriminator after the last training, to obtain the sixth discrimination result;
  • the first partial face image and the second partial face image are eye images.
  • the at least two discriminators further include: X third type discriminators; X is a positive integer greater than or equal to 1, and the third type discriminator is configured to enhance the first generator pair Detail restoration of the human face in the training image.
  • the first training submodule is used to train the generator to be trained, including:
  • a face parsing network is used to perform face parsing processing on the repaired image of the Nth scale, and X first face-part images corresponding to the repaired image of the Nth scale are obtained, wherein if X equals 1, the first face-part image includes one face part, and if X is greater than 1, the X first face-part images include different face parts;
  • a fifth adversarial loss is calculated based on the seventh discrimination result; the total adversarial loss includes the fifth adversarial loss;
  • the first training submodule is used to train the at least two discriminators, including:
  • the face parsing network is used to perform face parsing on the repaired image of the Nth scale, and X first face-part images corresponding to the repaired image of the Nth scale are obtained, wherein the X first face-part images include different face parts; a face parsing network is used to perform face parsing processing on the verification image of the Nth scale to obtain X second face-part images corresponding to the verification image of the Nth scale, wherein the X second face-part images include different face parts;
  • the X first face-part images are set to have false-value labels, and the first face-part images with false-value labels are input to the initial third-type discriminator or the third-type discriminator after the last training, to obtain the eighth discrimination result; the X second face-part images are set to have true-value labels, and each second face-part image with a true-value label is input to the initial third-type discriminator or the third-type discriminator after the last training, to obtain the ninth discrimination result;
  • the parameters of the third type discriminator are adjusted according to the sixth adversarial loss to obtain an updated third type discriminator.
  • the face parsing network adopts a semantic segmentation network.
  • X is equal to 1
  • the third type of discriminator is configured to enhance the detail inpainting of the human face skin of the training image by the first generator.
  • the total loss of the generator to be trained further includes: face similarity loss;
  • the first training submodule is used to train the generator to be trained, including:
  • the face similarity loss is calculated according to the first keypoint heatmap and the second keypoint heatmap.
  • the total loss of the generator to be trained further includes: average gradient loss;
  • the first training submodule is used to train the generator to be trained, including:
  • the first generator includes N repair modules, where N is an integer greater than or equal to 2, and each of the repair modules adopts the same network structure;
  • the training process for the generator to be trained includes a first training phase and a second training phase, and both the first training phase and the second training phase include at least one training process for the generator to be trained;
  • each of the repair modules is independently adjusted for parameters.
  • the learning rate used in the first training phase is greater than the learning rate used in the second training phase.
  • the at least two discriminators include: a fourth type discriminator and a fifth type discriminator; the fourth type discriminator is configured to maintain the structure of the training image by the first generator feature; the fifth type of discriminator is configured to enhance the detail inpainting of the training image by the first generator.
  • the training module further includes a second training submodule
  • the second training submodule is used to train the generator to be trained, including:
  • the total adversarial loss includes the seventh adversarial loss and the eighth adversarial loss
  • the second training submodule is used to train the at least two discriminators, including:
  • for each scale, the repaired training image of that scale is set to have a false-value label and input to the initial fourth-type discriminator or the fourth-type discriminator after the last training, to obtain the twelfth discrimination result; for each scale, the training image to be repaired is set to have a true-value label and input to the initial fourth-type discriminator or the fourth-type discriminator after the last training, to obtain the thirteenth discrimination result;
  • the high-frequency-filtered repaired training image of each scale is set to have a false-value label and input to the initial fifth-type discriminator or the fifth-type discriminator after the last training, to obtain the fourteenth discrimination result;
  • the Gaussian-filtered verification image of each scale is set to have a true-value label and input to the initial fifth-type discriminator or the fifth-type discriminator after the last training, to obtain the fifteenth discrimination result;
  • the parameters of the fifth type of discriminator are adjusted according to the tenth adversarial loss to obtain an updated fifth type of discriminator.
  • the total loss of the generator to be trained further includes: average gradient loss;
  • the second training submodule is used to train the generator to be trained, including:
  • the calculation formula of the average gradient loss AvgG is as follows: AvgG = (1/((m−1)(n−1))) · Σ_{i=1}^{m−1} Σ_{j=1}^{n−1} √( ((f_{i+1,j} − f_{i,j})² + (f_{i,j+1} − f_{i,j})²) / 2 )
  • m and n are the width and height of the Nth-scale repaired training image, respectively, and f_{i,j} is the pixel of the Nth-scale repaired training image at position (i, j).
  • the first generator includes N repair modules, and the at least two discriminators include: N first-type discriminators with different network structures corresponding to the N repair modules respectively;
  • the training module also includes a third training submodule
  • the third training submodule is used to train the generator to be trained, including:
  • for each scale, the repaired training image of that scale is set to have a true-value label, and the repaired training image with the true-value label is input to the initial first-type discriminator or the first-type discriminator after the last training, to obtain the first discrimination result;
  • the total adversarial loss includes the first adversarial loss
  • the third training submodule is used to train the at least two discriminators, including:
  • the training image is processed into a training image of N scales to be repaired, and the verification image is processed into a verification image of N scales;
  • the parameters of the first type discriminator are adjusted according to the third adversarial loss to obtain an updated first type discriminator.
  • the first generator includes N repair modules
  • the third training submodule is used to train the generator to be trained, including:
  • the first loss includes the loss of the N scales of inpainted training images on M target layers.
  • the first loss includes: the losses of the repaired training images of each scale on the M target layers, multiplied by the corresponding weights and then summed, where repaired training images of different scales use different weights at the target layers.
  • the first loss further includes: pixel-by-pixel two-normal form loss.
  • the first generator includes four scales of repair modules, namely: a 64*64-scale repair module, a 128*128-scale repair module, a 256*256-scale repair module and a 512*512-scale repair module.
  • S is equal to 5
  • the S key point mask images include: key point mask images of the left eye, the right eye, the nose, the mouth, and the outline.
  • an embodiment of the present disclosure further provides an image processing apparatus, including:
  • a receiving module 271, configured to receive an input image
  • a face detection module 272 configured to perform face detection on the input image to obtain a face image
  • a first processing module configured to process the face image by using the image processing method described in any of the above embodiments to obtain a first repaired training image, wherein the definition of the first repaired training image is higher than the definition of the input image;
  • the second processing module 273 is configured to process the input image or the input image from which the face image is removed to obtain a second repaired training image, wherein the definition of the second repaired training image is higher than the definition of the input image;
  • the fusion module 274 is configured to fuse the first repaired training image and the second repaired training image to obtain a fused image, and the definition of the fusion image is higher than that of the input image.
  • the second processing module 273 is configured to process the input image or the input image from which the face image is removed by using the image processing method described in any of the foregoing embodiments to obtain a second repaired training image.
  • Embodiments of the present disclosure further provide an electronic device, including a processor, a memory, and a program or instruction stored on the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the image processing method in any of the foregoing embodiments.
  • An embodiment of the present disclosure further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the image processing method in any of the foregoing embodiments are implemented.
  • the processor is the processor in the terminal described in the foregoing embodiment.
  • the readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image processing method, an image processing apparatus, an electronic device and a readable storage medium. The image processing method includes: receiving an input image; and processing the input image with a first generator to obtain an output image, where the definition of the output image is higher than that of the input image, and the first generator is obtained by training a generator to be trained with at least two discriminators. In the present disclosure, since the first generator used for image restoration is trained with at least two discriminators, the restored image has richer details and the restoration effect is improved.

Description

Image processing method, image processing apparatus, electronic device and readable storage medium — Technical field
Embodiments of the present disclosure relate to the technical field of image processing, and in particular to an image processing method, an image processing apparatus, an electronic device and a readable storage medium.
Background
Image quality restoration technology is widely used in fields such as old-photo restoration and video sharpening. Most current algorithms use super-resolution reconstruction to repair low-definition images, and the results are usually over-smooth; alternatively, the facial features are easily deformed during face restoration. How to improve the restoration effect of images is therefore an urgent technical problem to be solved.
Summary
Embodiments of the present disclosure provide an image processing method, an image processing apparatus, an electronic device and a readable storage medium, to solve the problem that the restoration effect of current image restoration methods is unsatisfactory.
To solve the above technical problem, the present disclosure is implemented as follows.
In a first aspect, an embodiment of the present disclosure provides an image processing method, including:
receiving an input image;
processing the input image with a first generator to obtain an output image, where the definition of the output image is higher than that of the input image;
where the first generator is obtained by training a generator to be trained with at least two discriminators.
In a second aspect, an embodiment of the present disclosure provides an image processing method, including:
receiving an input image;
performing face detection on the input image to obtain a face image;
processing the face image with the method of the first aspect to obtain a first repaired training image, where the definition of the first repaired training image is higher than that of the input image;
processing the input image, or the input image with the face image removed, to obtain a second repaired training image, where the definition of the second repaired training image is higher than that of the input image;
fusing the first repaired training image and the second repaired training image to obtain a fused image, where the definition of the fused image is higher than that of the input image.
In a third aspect, an embodiment of the present disclosure provides an image processing apparatus, including:
a receiving module, configured to receive an input image;
a processing module, configured to process the input image with a first generator to obtain an output image, where the definition of the output image is higher than that of the input image;
where the first generator is obtained by training a generator to be trained with at least two discriminators.
In a fourth aspect, an embodiment of the present disclosure provides an image processing apparatus, including:
a receiving module, configured to receive an input image;
a face detection module, configured to perform face detection on the input image to obtain a face image;
a first processing module, configured to process the face image with the method of the first aspect to obtain a first repaired training image, where the definition of the first repaired training image is higher than that of the input image;
a second processing module, configured to process the input image, or the input image with the face image removed, to obtain a second repaired training image, where the definition of the second repaired training image is higher than that of the input image;
and a fusion module, configured to fuse the first repaired training image and the second repaired training image to obtain a fused image, where the definition of the fused image is higher than that of the input image.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including a processor, a memory, and a program or instruction stored on the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the image processing method of the first aspect, or implements the steps of the image processing method of the second aspect.
In a sixth aspect, an embodiment of the present disclosure provides a readable storage medium storing a program or instruction which, when executed by a processor, implements the steps of the image processing method of the first aspect or of the second aspect.
In the embodiments of the present disclosure, since the first generator used for image restoration is trained with at least two discriminators, the restored image has richer details and the restoration effect is improved.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be regarded as limiting the present disclosure. Throughout the drawings, the same reference signs denote the same components. In the drawings:
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of a multi-scale first generator according to an embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of an image processing method according to another embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of an image processing method according to yet another embodiment of the present disclosure;
Fig. 5 is a schematic diagram of a keypoint extraction method according to an embodiment of the present disclosure;
Fig. 6 is a schematic diagram of a method for generating keypoint mask images according to an embodiment of the present disclosure;
Fig. 7 is a schematic diagram of a multi-scale first generator according to another embodiment of the present disclosure;
Fig. 8 is a schematic diagram of the various losses of the generator according to an embodiment of the present disclosure;
Figs. 9, 11, 13, 17, 18 and 19 are schematic diagrams of a generator training method according to an embodiment of the present disclosure;
Figs. 10, 12 and 14 are schematic diagrams of a discriminator training method according to an embodiment of the present disclosure;
Fig. 15 is a schematic diagram of face-part images according to an embodiment of the present disclosure;
Fig. 16 is a schematic diagram of the inputs and outputs of the generator and the discriminators according to an embodiment of the present disclosure;
Fig. 20 is a schematic diagram of a generator training method according to another embodiment of the present disclosure;
Fig. 21 is a schematic diagram of a discriminator training method according to another embodiment of the present disclosure;
Fig. 22 is a schematic diagram of the inputs and outputs of the generator and the discriminators according to another embodiment of the present disclosure;
Fig. 23 is a schematic diagram of a generator training method according to yet another embodiment of the present disclosure;
Fig. 24 is a schematic diagram of a discriminator training method according to yet another embodiment of the present disclosure;
Fig. 25 is a schematic flowchart of an image processing method according to yet another embodiment of the present disclosure;
Fig. 26 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure;
Fig. 27 is a schematic structural diagram of an image processing apparatus according to another embodiment of the present disclosure.
Detailed description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only part of the embodiments of the present disclosure, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative work fall within the protection scope of the present disclosure.
Referring to Fig. 1, an embodiment of the present disclosure provides an image processing method, including:
Step 11: receiving an input image.
The input image may be an image to be processed, for example an image of relatively low definition. The image to be processed may be a video frame extracted from a video, a picture downloaded over a network or taken with a camera, or an image obtained in another way; the embodiments of the present disclosure place no limit on this. The input image may, for example, contain considerable noise and be rather blurred, so the image processing method provided by the embodiments of the present disclosure is used to denoise and/or deblur it, thereby raising its definition and enhancing the picture quality. For example, when the input image is a colour image, it may include a red (R) channel input image, a green (G) channel input image and a blue (B) channel input image.
Step 12: processing the input image with a first generator to obtain an output image, where the definition of the output image is higher than that of the input image, and the first generator is obtained by training a generator to be trained with at least two discriminators.
The first generator may be an already-trained neural network. The generator to be trained may be a network built on the convolutional-neural-network structure described above, whose parameters still need to be trained. For example, the first generator is trained with the aid of the generator to be trained, and the generator to be trained has more parameters than the first generator. The parameters of a neural network include, for example, the weight parameters of its convolutional layers: the larger the absolute value of a weight, the more the corresponding neuron contributes to the network's output and the more important it is to the network. In general, the more parameters a neural network has, the higher its complexity and the larger its "capacity", meaning that it can accomplish more complex learning tasks. Compared with the generator to be trained, the first generator is simplified: it has fewer parameters and a simpler network structure, so it occupies fewer resources (e.g. computing and storage resources) at run time and can therefore be applied to lightweight terminals. With this training approach, the first generator can learn the inference capability of the generator to be trained, so that it combines a simple structure with strong inference capability.
It should be noted that in the embodiments of the present disclosure, "definition" refers, for example, to how clearly the fine textures of an image and their boundaries can be distinguished; the higher the definition, the better the perceptual quality for the human eye. The repaired image having a higher definition than the input image means, for example, that the input image is denoised and/or deblurred by the image processing method provided by the embodiments of the present disclosure, so that the resulting repaired image is clearer than the input image.
In the embodiments of the present disclosure, the input image may contain a face image, i.e. the first generator is used for face restoration; of course, the input image may also be another type of image.
In the embodiments of the present disclosure, since the first generator used for image restoration is trained with at least two discriminators, the restored image has richer details and the restoration effect is improved.
In some embodiments of the present disclosure, optionally, the first generator includes N repair modules, each used to denoise and/or deblur an input image of a specified scale and raise its definition, where N is an integer greater than or equal to 2. In some preferred embodiments N may equal 4; further preferably, referring to Fig. 2, the four repair modules are: a 64*64-scale repair module, a 128*128-scale repair module, a 256*256-scale repair module and a 512*512-scale repair module. Of course, the number of repair modules may take other values, and the scale of each repair module is not limited to the four examples above.
In the embodiments of the present disclosure, "scale" refers to resolution.
In the embodiments of the present disclosure, optionally, the network structure adopted by a repair module is SRCNN or U-Net.
In the embodiments of the present disclosure, optionally, processing the input image with the first generator to obtain the output image includes:
processing the input image into to-be-repaired images of N scales, where the scales increase successively from the first-scale to-be-repaired image to the Nth-scale to-be-repaired image;
obtaining the output image using the N repair modules and the N scales of to-be-repaired images. In the embodiments of the present disclosure, optionally, for two adjacent scales among the N scales, the latter is twice the former; for example, the N scales are 64*64, 128*128, 256*256 and 512*512.
In the embodiments of the present disclosure, optionally, processing the input image into to-be-repaired images of N scales includes:
determining the scale interval to which the input image belongs;
processing the input image into a to-be-repaired image of the jth scale corresponding to the scale interval to which it belongs, the jth scale being one of the first to Nth scales;
up-sampling and/or down-sampling the jth-scale to-be-repaired image to obtain the to-be-repaired images of the remaining N−1 scales.
The up-sampling and down-sampling in the above embodiment may be interpolation, for example bicubic interpolation.
That is, the input image may first be processed into a to-be-repaired image at one of the N scales, which is then up-sampled and/or down-sampled to obtain the to-be-repaired images of the other N−1 scales; alternatively, the input image may be sampled into the N scales of to-be-repaired images one by one.
Referring to Fig. 2, in the illustrated embodiment the scale interval of the input image is judged first. If the scale of the input image is smaller than or equal to 96*96, the input image is up-sampled or down-sampled into a 64*64-scale to-be-repaired image, which is then up-sampled into the 128*128-, 256*256- and 512*512-scale to-be-repaired images. If the scale is larger than 96*96 and smaller than or equal to 192*192, the input image is resampled into a 128*128-scale to-be-repaired image, which is then down-sampled and up-sampled into the 64*64, 256*256 and 512*512 scales. If the scale is larger than 192*192 and smaller than or equal to 384*384, the input image is resampled into a 256*256-scale to-be-repaired image, which is then down-sampled and up-sampled into the 64*64, 128*128 and 512*512 scales. If the scale is larger than 384*384, the input image is resampled into a 512*512-scale to-be-repaired image, which is then down-sampled into the 64*64, 128*128 and 256*256 scales.
It should be noted that the thresholds used to judge the interval of the input image can be chosen as required. In the above scheme they are the midpoints of adjacent scales among the N scales: the midpoint of 64*64 and 128*128 is 96*96, that of 128*128 and 256*256 is 192*192, and so on; of course, the specific scheme is not limited to 96*96, 192*192 and 384*384.
In the above embodiment, up-sampling or down-sampling may be implemented by interpolation, as sketched below.
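The scale-interval logic above fits in a few lines. The following Python (PyTorch) sketch is illustrative only: the function names are ours, and bicubic interpolation is just one of the interpolation choices mentioned above.

```python
import torch.nn.functional as F

SCALES = [64, 128, 256, 512]            # the four scales of the Fig. 2 example

def to_scale(img, size):
    # img: (B, C, H, W); bicubic is used here as an illustrative default
    return F.interpolate(img, size=(size, size), mode="bicubic", align_corners=False)

def make_pyramid(img):
    """Resample the input once into the scale whose interval it falls in
    (bounds 96/192/384 are the midpoints of adjacent scales), then derive
    the remaining N-1 scales from that base image."""
    h = img.shape[-2]
    if h <= 96:
        base = 64
    elif h <= 192:
        base = 128
    elif h <= 384:
        base = 256
    else:
        base = 512
    base_img = to_scale(img, base)
    return [base_img if s == base else to_scale(base_img, s) for s in SCALES]
```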
In some embodiments of the present disclosure, referring to Fig. 3, obtaining the output image using the N repair modules and the N scales of to-be-repaired images includes:
Step 31: concatenating the first-scale to-be-repaired image with a first-scale random-noise image to obtain a first concatenated image; inputting the first concatenated image into the first repair module to obtain a first-scale repaired image; up-sampling the first-scale repaired image to obtain a second-scale up-sampled image.
The first-scale random-noise image may be generated randomly, or generated by up-sampling or down-sampling a random-noise image of the same scale as the input image.
Still taking Fig. 2 as an example: after the 64*64-scale to-be-repaired image (input 1 in Fig. 2) and a 64*64-scale random-noise image are obtained, they are concatenated into the first concatenated image, which is input into the first repair module to obtain a 64*64-scale repaired image; this is then up-sampled into a 128*128-scale up-sampled image.
Step 32: concatenating the ith-scale up-sampled image, the ith-scale to-be-repaired image and an ith-scale random-noise image to obtain an ith concatenated image; inputting the ith concatenated image into the ith repair module to obtain an ith-scale repaired image; up-sampling the ith-scale repaired image to obtain an (i+1)th-scale up-sampled image, where i is an integer greater than or equal to 2.
The ith repair module is a repair module between the first repair module and the last repair module.
Still taking Fig. 2 as an example: for the second repair module, the 128*128-scale to-be-repaired image (input 2 in Fig. 2), a 128*128-scale random-noise image and the 128*128-scale up-sampled image are concatenated into the second concatenated image, which is input into the second repair module to obtain a 128*128-scale repaired image; this is up-sampled into a 256*256-scale up-sampled image. For the third repair module, the 256*256-scale to-be-repaired image (input 3 in Fig. 2), a 256*256-scale random-noise image and the 256*256-scale up-sampled image are concatenated into the third concatenated image, which is input into the third repair module to obtain a 256*256-scale repaired image; this is up-sampled into a 512*512-scale up-sampled image.
Step 33: concatenating the Nth-scale up-sampled image, the Nth-scale to-be-repaired image and an Nth-scale random-noise image to obtain an Nth concatenated image; inputting the Nth concatenated image into the Nth repair module to obtain the Nth-scale repaired image as the output image of the first generator.
Still taking Fig. 2 as an example: for the last repair module, the 512*512-scale to-be-repaired image (input 4 in Fig. 2), a 512*512-scale random-noise image and the 512*512-scale up-sampled image are concatenated into the fourth concatenated image, which is input into the last repair module to obtain a 512*512-scale repaired image as the output image of the first generator.
In the embodiments of the present disclosure, random noise is added to the first generator during restoration because feeding the blurred image alone into the generator may yield a repaired image that looks excessively smoothed ("airbrushed") for lack of high-frequency information. With random noise added to the generator's input, the noise can be mapped into high-frequency information on the repaired image, enriching the details of the repaired image.
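For concreteness, here is a minimal sketch of the coarse-to-fine pass of steps 31-33. The repair modules are stubbed as small convolution blocks (the disclosure names SRCNN or U-Net as real candidates); channel widths and names are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleGenerator(nn.Module):
    """Coarse-to-fine pass: each module sees [previous up-sampled result,]
    the to-be-repaired image of its scale, and a random-noise image."""

    def __init__(self, n_scales=4, ch=3):
        super().__init__()
        # "modules_" avoids clashing with nn.Module.modules(); module 0 takes
        # 2*ch input channels (image + noise), later modules take 3*ch.
        self.modules_ = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(ch * (2 if i == 0 else 3), 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, ch, 3, padding=1),
            )
            for i in range(n_scales)
        )

    def forward(self, pyramid):                    # list of (B,C,H,W), coarse->fine
        up, out = None, None
        for i, (module, x) in enumerate(zip(self.modules_, pyramid)):
            noise = torch.randn_like(x)            # noise supplies high-frequency detail
            parts = [x, noise] if up is None else [up, x, noise]
            out = module(torch.cat(parts, dim=1))  # channel-wise concatenation
            if i + 1 < len(pyramid):               # feed the next, finer module
                up = F.interpolate(out, scale_factor=2, mode="bilinear",
                                   align_corners=False)
        return out                                 # Nth-scale repaired image
```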
In some other embodiments of the present disclosure, referring to Fig. 4, obtaining the output image using the N repair modules and the N scales of to-be-repaired images includes:
Step 41: for the to-be-repaired image of each scale, extracting the keypoints in the to-be-repaired image, generating multiple keypoint heatmaps, and merging and classifying the keypoint heatmaps to obtain S keypoint mask images for each scale, where S is an integer greater than or equal to 2.
In the embodiments of the present disclosure, optionally, referring to Fig. 5, a 4-stacked-hourglass model may be used to extract the keypoints in the to-be-repaired image, for example the 68 keypoints of a face image, generating 68 keypoint heatmaps, where each heatmap represents, for every pixel of the image, the probability that the pixel is a certain keypoint (landmark). Then, referring to Fig. 6, the keypoint heatmaps are merged (Merge) and classified (softmax) into S keypoint mask images corresponding to different facial components; for example, S may be 5, with the corresponding facial parts being the left eye, right eye, nose, mouth and contour (a sketch of this merge-and-classify step follows). Of course, in some other embodiments of the present disclosure other keypoint-extraction techniques may be used, the number of extracted keypoints is not limited to 68, and the number of keypoint mask images is not limited to 5, i.e. the facial parts are not limited to 5.
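A minimal sketch of the merge-and-classify step, assuming the common 68-point landmark convention for grouping the heatmaps into 5 components; the text does not spell out its own grouping, so this mapping is an assumption for illustration.

```python
import torch

# Index groups under the usual 68-landmark layout (jaw 0-16, brows 17-26,
# nose 27-35, eyes 36-47, mouth 48-67); brows are folded into "contour" here.
GROUPS = {
    "contour":   list(range(0, 27)),
    "nose":      list(range(27, 36)),
    "left_eye":  list(range(36, 42)),
    "right_eye": list(range(42, 48)),
    "mouth":     list(range(48, 68)),
}

def heatmaps_to_masks(heatmaps):
    """heatmaps: (B, 68, H, W) -> (B, 5, H, W) soft component masks.
    Merge = sum the heatmaps of each component; classify = softmax across
    the 5 merged channels so every pixel is softly assigned to one part."""
    merged = torch.stack(
        [heatmaps[:, idx].sum(dim=1) for idx in GROUPS.values()], dim=1)
    return torch.softmax(merged, dim=1)
```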
Step 42: concatenating the first-scale to-be-repaired image with the S first-scale keypoint mask images to obtain a first concatenated image; inputting the first concatenated image into the first repair module to obtain a first-scale repaired image; up-sampling the first-scale repaired image to obtain a second-scale up-sampled image.
Taking Fig. 7 as an example: after the 64*64-scale to-be-repaired image and the 64*64-scale keypoint mask images are obtained, they are concatenated into the first concatenated image, which is input into the first repair module to obtain a 64*64-scale repaired image; this is then up-sampled into a 128*128-scale up-sampled image.
Step 43: concatenating the ith-scale up-sampled image, the ith-scale to-be-repaired image and the S ith-scale keypoint mask images to obtain an ith concatenated image; inputting the ith concatenated image into the ith repair module to obtain an ith-scale repaired image; up-sampling the ith-scale repaired image to obtain an (i+1)th-scale up-sampled image, where i is an integer greater than or equal to 2.
The ith repair module is a repair module between the first repair module and the last repair module.
Taking Fig. 7 as an example: for the second repair module, the 128*128-scale to-be-repaired image, the 128*128-scale keypoint mask images and the 128*128-scale up-sampled image are concatenated into the second concatenated image, which is input into the second repair module to obtain a 128*128-scale repaired image; this is up-sampled into a 256*256-scale up-sampled image. For the third repair module, the 256*256-scale to-be-repaired image, the 256*256-scale keypoint mask images and the 256*256-scale up-sampled image are concatenated into the third concatenated image, which is input into the third repair module to obtain a 256*256-scale repaired image; this is up-sampled into a 512*512-scale up-sampled image.
Step 44: concatenating the Nth-scale up-sampled image, the Nth-scale to-be-repaired image and the S Nth-scale keypoint mask images to obtain an Nth concatenated image; inputting the Nth concatenated image into the Nth repair module to obtain the Nth-scale repaired image as the output image of the first generator.
Still taking Fig. 7 as an example: for the last repair module, the 512*512-scale to-be-repaired image, the 512*512-scale keypoint mask images and the 512*512-scale up-sampled image are concatenated into the fourth concatenated image, which is input into the last repair module to obtain a 512*512-scale repaired image as the output image of the first generator.
In the embodiments of the present disclosure, introducing face keypoint heatmaps into the sharpening process can reduce the deformation of the facial features while keeping the image sharp, improving the final image restoration effect.
The training method of the first generator in the embodiments of the present disclosure is described below.
In the embodiments of the present disclosure, optionally, obtaining the first generator by training the generator to be trained with at least two discriminators includes: alternately training the generator to be trained and the at least two discriminators according to training images and verification images to obtain the first generator, where the definition of the verification image is higher than that of the training image, and when the generator to be trained is trained, the total loss of the generator to be trained includes at least one of: a first loss, and the total adversarial loss of the at least two discriminators.
In some embodiments of the present disclosure, optionally, the first generator includes N repair modules, where N is an integer greater than or equal to 2; in some preferred embodiments N may equal 4, and further preferably, referring to Fig. 2, the four repair modules are of the 64*64, 128*128, 256*256 and 512*512 scales. Of course, the number and scales of the repair modules are not limited to these examples. The at least two discriminators include N first-type discriminators with different network structures, corresponding respectively to the N repair modules; for example, if the first generator includes 4 repair modules, the at least two discriminators include 4 first-type discriminators, which, referring to Fig. 8, may be discriminator 1, discriminator 2, discriminator 3 and discriminator 4 in Fig. 8. Training with first-type discriminators corresponding to multiple scales makes the face images output by the trained first generator closer to real face images than those output by a first generator trained with a single discriminator at a single scale, with a better restoration effect, richer details and less deformation.
The training processes of the generator to be trained and of the at least two discriminators are described separately below.
Referring to Fig. 9, training the generator to be trained includes:
Step 91: processing the training image into to-be-repaired training images of N scales.
In the embodiments of the present disclosure, the training image may first be processed into a to-be-repaired training image at one of the N scales, which is then up-sampled and/or down-sampled to obtain the to-be-repaired training images of the other N−1 scales; alternatively, the training image may be sampled into the N scales one by one.
Taking Fig. 8 as an example, the training image may be processed into four to-be-repaired training images of the 64*64, 128*128, 256*256 and 512*512 scales.
Step 92: inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales.
In the embodiments of the present disclosure, if this is the first training pass, the N scales of to-be-repaired training images are input into the generator to be trained; otherwise, they are input into the generator after the last training.
For the specific way the generator processes the N scales of to-be-repaired training images, see the embodiments shown in Figs. 3 and 4, which will not be repeated here.
Taking Fig. 8 as an example, the four to-be-repaired training images of the 64*64, 128*128, 256*256 and 512*512 scales are input into the generator to be trained, or the generator after the last training, to obtain four repaired training images of those scales.
Step 93: for each scale, setting the repaired training image of that scale to have a true-value label, and inputting the repaired training image with the true-value label into the initial first-type discriminator, or the first-type discriminator after the last training, to obtain a first discrimination result.
Taking Fig. 8 as an example, the 64*64-scale repaired training image is set to have a true-value label and input into discriminator 1 to obtain its discrimination result; likewise the 128*128-scale repaired training image is input into discriminator 2, the 256*256-scale repaired training image into discriminator 3, and the 512*512-scale repaired training image into discriminator 4, each yielding its discrimination result.
Step 94: calculating a first adversarial loss based on the first discrimination result, the total adversarial loss including the first adversarial loss.
Optionally, the first adversarial loss is the sum of the adversarial losses corresponding to the repaired training images of all scales.
Step 95: adjusting the parameters of the generator to be trained according to the total adversarial loss.
Referring to Fig. 10, training the at least two discriminators includes:
Step 101: processing the training image into to-be-repaired training images of N scales, and processing the verification image into verification images of N scales.
In the embodiments of the present disclosure, the training image may first be processed into a to-be-repaired training image at one of the N scales and then resampled to obtain the other N−1 scales, or sampled into the N scales one by one; likewise, the verification image may first be processed into a verification image at one of the N scales and then resampled to obtain the other N−1 scales, or sampled into the N scales one by one.
Taking Fig. 8 as an example, the training image is processed into four to-be-repaired training images, and the verification image into four verification images, at the 64*64, 128*128, 256*256 and 512*512 scales.
Step 102: inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales. For the specific processing, see the embodiments shown in Figs. 3 and 4, not repeated here; taking Fig. 8 as an example, four repaired training images of the 64*64, 128*128, 256*256 and 512*512 scales are obtained.
Step 103: for each scale, setting the repaired training image of that scale to have a false-value label and inputting it into the initial first-type discriminator, or the first-type discriminator after the last training, to obtain a third discrimination result; setting the verification image of each scale to have a true-value label and inputting each verification image with the true-value label into the first-type discriminator to obtain a fourth discrimination result.
Taking Fig. 8 as an example: the 64*64-scale repaired training image is set to have a false-value label and input into discriminator 1, yielding its third discrimination result, while the 64*64-scale verification image is set to have a true-value label and input into discriminator 1, yielding its fourth discrimination result; the same is done at the 128*128 scale with discriminator 2, at the 256*256 scale with discriminator 3, and at the 512*512 scale with discriminator 4.
Step 104: calculating a third adversarial loss based on the third discrimination result and the fourth discrimination result.
Step 105: adjusting the parameters of the first-type discriminators according to the third adversarial loss to obtain updated first-type discriminators. A minimal sketch of these two alternating loss computations follows.
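The sketch below assumes discriminators that output logits and a standard binary-cross-entropy GAN objective; the disclosure does not fix the functional form of the adversarial loss, so this is one plausible instantiation.

```python
import torch
import torch.nn.functional as F

def g_adversarial_loss(discs, repaired):
    """Steps 93-94: each scale's repaired training image is labelled true, so
    the generator is pushed toward the real distribution; the first
    adversarial loss is the sum over the N scales."""
    loss = 0.0
    for D, fake in zip(discs, repaired):
        p = D(fake)
        loss = loss + F.binary_cross_entropy_with_logits(p, torch.ones_like(p))
    return loss

def d_adversarial_loss(discs, repaired, verif):
    """Steps 103-104: repaired training images are labelled false and
    verification images true, giving the third adversarial loss."""
    loss = 0.0
    for D, fake, real in zip(discs, repaired, verif):
        p_fake, p_real = D(fake.detach()), D(real)
        loss = loss + F.binary_cross_entropy_with_logits(p_fake, torch.zeros_like(p_fake)) \
                    + F.binary_cross_entropy_with_logits(p_real, torch.ones_like(p_real))
    return loss
```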
In some embodiments of the present disclosure, optionally, the at least two discriminators include, besides the N first-type discriminators with different network structures corresponding to the N repair modules, a second-type discriminator configured to improve the first generator's definition restoration of partial face regions of the training image, so that the partial facial features in the images output by the trained first generator are sharper.
The training processes of the generator to be trained and of the at least two discriminators are described separately below.
Referring to Fig. 11, training the generator to be trained includes:
Step 111: processing the training image into to-be-repaired training images of N scales.
As above, the training image may first be processed into one of the N scales and then resampled into the other N−1 scales, or sampled into the N scales one by one; taking Fig. 8 as an example, it may be processed into four to-be-repaired training images of the 64*64, 128*128, 256*256 and 512*512 scales.
Step 112: inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales (see the embodiments shown in Figs. 3 and 4 for the processing details, not repeated here; taking Fig. 8 as an example, four repaired training images of the 64*64, 128*128, 256*256 and 512*512 scales are obtained).
Step 113: obtaining a first partial face image of the Nth-scale repaired training image.
In some embodiments of the present disclosure, optionally, the first partial face image is an eye image. It may be obtained by directly cropping the eye region out of the Nth-scale repaired training image.
Step 114: for each scale, setting the repaired training image of that scale to have a true-value label, and inputting the repaired training image with the true-value label into the initial first-type discriminator, or the first-type discriminator after the last training, to obtain a first discrimination result.
Taking Fig. 8 as an example, the 64*64-, 128*128-, 256*256- and 512*512-scale repaired training images are each set to have a true-value label and input into discriminators 1 to 4 respectively, each yielding its discrimination result.
Step 115: setting the first partial face image to have a true-value label and inputting it into the initial second-type discriminator, or the second-type discriminator after the last training, to obtain a second discrimination result.
Taking Fig. 8 as an example, discriminator 5 in Fig. 8 is the second-type discriminator: the first partial face image is set to have a true-value label and input into discriminator 5, yielding the second discrimination result.
Step 116: calculating a first adversarial loss based on the first discrimination result and a second adversarial loss based on the second discrimination result, the total adversarial loss including the first adversarial loss and the second adversarial loss.
Optionally, the first adversarial loss is the sum of the adversarial losses corresponding to the repaired training images of all scales.
Step 117: adjusting the parameters of the generator to be trained, or of the last-trained generator, according to the total adversarial loss.
Referring to Fig. 12, training the at least two discriminators includes:
Step 121: processing the training image into to-be-repaired training images of N scales and the verification image into verification images of N scales (as above, either by resampling from one scale or by sampling scale by scale; taking Fig. 8 as an example, into the 64*64, 128*128, 256*256 and 512*512 scales).
Step 122: obtaining a second partial face image of the Nth-scale verification image.
In the embodiments of the present disclosure, optionally, the first partial face image and the second partial face image are eye images; the eye region may be directly cropped out of the Nth-scale verification image as the second partial face image.
Step 123: inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales (see the embodiments shown in Figs. 3 and 4 for the processing details; taking Fig. 8 as an example, four repaired training images of the 64*64, 128*128, 256*256 and 512*512 scales are obtained).
Step 124: obtaining a first partial face image of the Nth-scale repaired training image, for example by directly cropping the eye region out of it.
Step 125: for each scale, setting the repaired training image of that scale to have a false-value label and inputting it into the initial first-type discriminator, or the first-type discriminator after the last training, to obtain a third discrimination result; setting the verification image of each scale to have a true-value label and inputting each such verification image into the first-type discriminator to obtain a fourth discrimination result.
Step 126: setting the first partial face image to have a false-value label and inputting it into the initial second-type discriminator, or the second-type discriminator after the last training, to obtain a fifth discrimination result; setting the second partial face image to have a true-value label and inputting it into the initial second-type discriminator, or the second-type discriminator after the last training, to obtain a sixth discrimination result.
Step 127: calculating a third adversarial loss based on the third and fourth discrimination results, and a fourth adversarial loss based on the fifth and sixth discrimination results.
Step 128: adjusting the parameters of the first-type discriminators according to the third adversarial loss to obtain updated first-type discriminators, and adjusting the parameters of the second-type discriminator according to the fourth adversarial loss to obtain an updated second-type discriminator.
In the embodiments of the present disclosure, since the eyes are the most important component of a face, adding an adversarial loss on the eye image improves the training effect.
In some embodiments of the present disclosure, optionally, the at least two discriminators further include X third-type discriminators, X being a positive integer greater than or equal to 1. The third-type discriminators are configured to improve the first generator's detail restoration of the face parts of the training image; that is, compared with other training methods, the eye regions in the face images output by a first generator trained with these third-type discriminators are sharper and contain more detail.
Referring to Fig. 13, training the generator to be trained further includes:
Step 131: processing the training image into to-be-repaired training images of N scales.
For the specific method of processing the training image into N scales of to-be-repaired training images, see the description in the above embodiments, not repeated here.
Step 132: inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales (see the above embodiments for the processing details, not repeated here).
Step 133: performing face parsing on the Nth-scale repaired image with a face parsing network to obtain X first face-part images corresponding to the Nth-scale repaired image, where if X equals 1 the first face-part image contains one face part, and if X is greater than 1 the X first face-part images contain different face parts.
In the embodiments of the present disclosure, the face parsing network adopts a semantic segmentation network.
In the embodiments of the present disclosure, the face parsing network parses the face, and the output face parts may include at least one of: background, facial skin, left eyebrow, right eyebrow, left eye, right eye, left ear, right ear, nose, teeth, upper lip, lower lip, clothes, hair, hat, glasses, neck, etc.
Step 134: setting all X first face-part images to have true-value labels, and inputting each first face-part image with a true-value label into the initial third-type discriminator, or the third-type discriminator after the last training, to obtain a seventh discrimination result.
Step 135: calculating a fifth adversarial loss based on the seventh discrimination result, the total adversarial loss including the fifth adversarial loss.
Step 136: adjusting the parameters of the generator to be trained, or of the last-trained generator, according to the total adversarial loss.
Referring to Fig. 14, training the at least two discriminators includes:
Step 141: processing the training image into to-be-repaired training images of N scales and the verification image into verification images of N scales.
Step 142: inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales.
Step 143: performing face parsing on the Nth-scale repaired image with a face parsing network to obtain X first face-part images corresponding to the Nth-scale repaired image, where the X first face-part images contain different face parts; and performing face parsing on the Nth-scale verification image with a face parsing network to obtain X second face-part images corresponding to the Nth-scale verification image, where the X second face-part images contain different face parts.
In the embodiments of the present disclosure, the face parsing network adopts a semantic segmentation network. The face parsing network parses the face, and the output face parts may include at least one of: background, facial skin, left eyebrow, right eyebrow, left eye, right eye, left ear, right ear, nose, teeth, upper lip, lower lip, clothes, hair, hat, glasses, neck, etc.
Referring to Fig. 15, in the illustrated embodiment X equals 1, and the third-type discriminator is configured to improve the first generator's detail restoration of the facial skin of the training image; that is, compared with other training methods, the skin in the face images output by a first generator trained with this third-type discriminator is sharper and contains more detail.
Step 144: setting all X first face-part images to have false-value labels and inputting them into the initial third-type discriminator, or the third-type discriminator after the last training, to obtain an eighth discrimination result; setting all X second face-part images to have true-value labels and inputting each of them into the initial third-type discriminator, or the third-type discriminator after the last training, to obtain a ninth discrimination result.
Step 145: calculating a sixth adversarial loss based on the eighth and ninth discrimination results.
Step 146: adjusting the parameters of the third-type discriminators according to the sixth adversarial loss to obtain updated third-type discriminators.
Referring to Fig. 16, which shows the inputs and outputs of the generator to be trained and the discriminators in an embodiment of the present disclosure: the inputs of the generator to be trained include the training images of N scales and the random-noise images of N scales (or the keypoint mask images of N scales), and its output is the repaired training image; the discriminators include the N first-type discriminators corresponding to the N scales of repair modules, plus the X third-type discriminators, and their inputs include the generator's repaired training image, the verification images of N scales, the X face-part images corresponding to the Nth-scale verification image, and the X face-part images corresponding to the Nth-scale repaired training image.
In the embodiments of the present disclosure, by segmenting out the facial features, skin and/or hair and feeding them separately into discriminators for real/fake judgement, an adversarial process exists for every part of the face the generator restores, strengthening the generator's ability to generate each part of the face and thereby yielding richer details (a sketch of this part-extraction step follows).
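A sketch of how the X face-part images might be cut out using the output of a semantic-segmentation face parser. The class-id mapping is an assumption: the text only lists candidate parts (skin, brows, eyes, hair, ...), not numeric ids.

```python
import torch

def extract_parts(image, parsing_logits, part_ids):
    """image: (B, 3, H, W); parsing_logits: (B, K, H, W) class scores from a
    face parser; part_ids: one list of class ids per third-type
    discriminator (the id values themselves are placeholders)."""
    labels = parsing_logits.argmax(dim=1, keepdim=True)      # (B, 1, H, W)
    crops = []
    for ids in part_ids:
        mask = torch.zeros_like(labels, dtype=torch.bool)
        for i in ids:
            mask |= labels == i
        crops.append(image * mask)     # keep the part, zero out the rest
    return crops                       # X face-part images
```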
In some embodiments of the present disclosure, optionally, the total loss of the generator to be trained further includes a face similarity loss.
Referring to Fig. 17, training the generator to be trained further includes:
Step 171: processing the training image into to-be-repaired training images of N scales.
Step 172: inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales.
Step 173: performing keypoint detection on the Nth-scale repaired image with a keypoint detection network to obtain a first keypoint heatmap corresponding to the Nth-scale repaired image.
Step 174: performing keypoint detection on the Nth-scale to-be-repaired training image with a keypoint detection network to obtain a second keypoint heatmap corresponding to the Nth-scale to-be-repaired training image.
Step 175: calculating the face similarity loss from the first keypoint heatmap and the second keypoint heatmap.
Referring to Fig. 8, the keypoint detection module in Fig. 8 is the keypoint detection network, heatmap_1 is the first keypoint heatmap, and heatmap_2 is the second keypoint heatmap.
In the embodiments of the present disclosure, optionally, referring to Fig. 5, a 4-stacked-hourglass model may be used to extract the keypoints of the Nth-scale to-be-repaired training image and of the repaired training image, for example the 68 keypoints of a face image, generating 68 keypoint heatmaps, where each heatmap represents, for every pixel, the probability that the pixel is a certain keypoint (landmark). A sketch of the resulting loss follows.
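A minimal sketch of the face similarity loss of steps 173-175. The disclosure does not fix the distance between the two heatmaps, so mean squared error is used here as an assumption; the keypoint network is treated as frozen on the reference branch.

```python
import torch
import torch.nn.functional as F

def face_similarity_loss(kp_net, repaired_n, to_repair_n):
    """Compare the 68-channel heatmap of the Nth-scale repaired image
    (heatmap_1) with that of the Nth-scale to-be-repaired training image
    (heatmap_2)."""
    with torch.no_grad():
        hm_ref = kp_net(to_repair_n)   # heatmap_2, used as the reference
    hm_pred = kp_net(repaired_n)       # heatmap_1; gradients reach the generator
    return F.mse_loss(hm_pred, hm_ref)
```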
In some embodiments of the present disclosure, optionally, the total loss of the generator to be trained further includes an average gradient loss.
Referring to Fig. 18, training the generator to be trained further includes:
Step 181: processing the training image into to-be-repaired training images of N scales.
Step 182: inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales.
Step 183: calculating the average gradient loss of the Nth-scale repaired training image.
In the embodiments of the present disclosure, optionally, the average gradient loss AvgG is calculated as:
AvgG = (1 / ((m−1)(n−1))) · Σ_{i=1}^{m−1} Σ_{j=1}^{n−1} √( ((f_{i+1,j} − f_{i,j})² + (f_{i,j+1} − f_{i,j})²) / 2 )
where m and n are the width and height of the Nth-scale repaired training image respectively, f_{i,j} is the pixel of the Nth-scale repaired training image at position (i, j), f_{i+1,j} − f_{i,j} denotes the difference between f_{i,j} and the adjacent pixel in the row direction, and f_{i,j+1} − f_{i,j} denotes the difference between f_{i,j} and the adjacent pixel in the column direction. A minimal implementation sketch follows.
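This sketch follows the formula above; the sign convention of the second function is an assumption, since a loss that is minimised must be negated for a larger average gradient (richer detail) to be rewarded.

```python
import torch

def average_gradient(img):
    """Average gradient of (B, C, m, n) images: the RMS of the row- and
    column-direction finite differences, averaged over the (m-1)(n-1)
    interior positions."""
    d_row = img[..., 1:, :-1] - img[..., :-1, :-1]
    d_col = img[..., :-1, 1:] - img[..., :-1, :-1]
    return torch.sqrt((d_row ** 2 + d_col ** 2) / 2 + 1e-12).mean()

def average_gradient_loss(repaired_n):
    # Negated so that minimising the loss increases the detail richness
    # of the Nth-scale repaired training image (an assumed convention).
    return -average_gradient(repaired_n)
```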
In some embodiments of the present disclosure, optionally, the first generator includes N repair modules and the loss adopted by the generator to be trained includes a first loss; in this embodiment the first loss may be called a perceptual loss.
Referring to Fig. 19, training the generator to be trained further includes:
Step 191: processing the training image into to-be-repaired training images of N scales and the verification image into verification images of N scales.
Step 192: inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales.
Step 193: inputting the N scales of repaired training images and the N scales of verification images into a VGG network to obtain the loss of each scale's repaired training image on M target layers of the VGG network, M being an integer greater than or equal to 1; the first loss includes the losses of the N scales of repaired training images on the M target layers.
Optionally, the first loss includes: the losses of each scale's repaired training image on the M target layers, multiplied by the corresponding weights and then summed, where repaired training images of different scales use different weights at the target layers.
For example, the generator to be trained includes repair modules of 4 scales: 64*64, 128*128, 256*256 and 512*512. The VGG network is a VGG19 network, the M target layers are layers 2-2, 3-4, 4-4 and 5-4, and the first loss (i.e. the perceptual loss) L is calculated as:
L = L_per_64 + L_per_128 + L_per_256 + L_per_512
L_per_s = w_s^{2-2}·ℓ_s^{2-2} + w_s^{3-4}·ℓ_s^{3-4} + w_s^{4-4}·ℓ_s^{4-4} + w_s^{5-4}·ℓ_s^{5-4},  s ∈ {64, 128, 256, 512}
where L_per_64, L_per_128, L_per_256 and L_per_512 are the perceptual losses of the repaired training images at the 64*64, 128*128, 256*256 and 512*512 scales respectively; ℓ_s^{2-2}, ℓ_s^{3-4}, ℓ_s^{4-4} and ℓ_s^{5-4} are the perceptual losses of the repaired training image of scale s at layers 2-2, 3-4, 4-4 and 5-4 respectively; and w_s^{2-2}, …, w_s^{5-4} are the per-scale layer weights described below.
In the above example, because the sharpening at different scales attends to different things — the smaller the resolution, the more global the attention, corresponding to the shallower VGG layers, and the larger the resolution, the more local the attention, corresponding to the deeper VGG layers — the weights are shifted accordingly across the scales.
Of course, in some embodiments of the present disclosure the repaired training images of different scales may instead use the same weights at the target layers, for example:
L_per_s = ℓ_s^{2-2} + ℓ_s^{3-4} + ℓ_s^{4-4} + ℓ_s^{5-4},  s ∈ {64, 128, 256, 512}.
A minimal sketch of this multi-scale perceptual loss follows.
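The sketch below assumes torchvision's VGG19 and an MSE feature distance; the mapping of "layers 2-2, 3-4, 4-4, 5-4" onto feature indices (the ReLU outputs after conv2_2, conv3_4, conv4_4, conv5_4) and the weight values are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

# Assumed index mapping for torchvision's vgg19().features.
LAYER_IDX = {"2-2": 8, "3-4": 17, "4-4": 26, "5-4": 35}

class VGGFeatures(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = vgg19(weights="IMAGENET1K_V1").features.eval()
        for p in self.features.parameters():
            p.requires_grad_(False)              # the VGG network stays fixed

    def forward(self, x):
        feats, out = {}, x
        for i, layer in enumerate(self.features):
            out = layer(out)
            for name, idx in LAYER_IDX.items():
                if i == idx:
                    feats[name] = out
        return feats

def perceptual_loss(vgg, repaired, verif, weights):
    """repaired / verif: lists of images over the N scales; weights: one
    dict of per-layer weights per scale (their exact values are left open,
    e.g. shifted toward layer 2-2 at 64x64 and toward 5-4 at 512x512)."""
    loss = 0.0
    for rep, ver, w in zip(repaired, verif, weights):
        fr, fv = vgg(rep), vgg(ver)
        for name, wk in w.items():
            loss = loss + wk * torch.mean((fr[name] - fv[name]) ** 2)
    return loss
```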
In the embodiments of the present disclosure, optionally, the first loss further includes at least one of: an L1 loss, a second loss and a third loss.
When the first loss includes the L1 loss, training the generator to be trained includes:
processing the training image into to-be-repaired training images of N scales and the verification image into verification images of N scales;
inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
obtaining the L1 loss by comparing the N scales of repaired training images with the N scales of verification images.
When the first loss includes the second loss, training the generator to be trained includes:
processing the training image into to-be-repaired training images of N scales and the verification image into verification images of N scales;
inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
obtaining a first eye image of the Nth-scale repaired training image and a second eye image of the Nth-scale verification image;
inputting the first eye image and the second eye image into a VGG network to obtain the second loss of the first eye image on M target layers of the VGG network, M being an integer greater than or equal to 1.
When the first loss includes the third loss, training the generator to be trained includes:
processing the training image into to-be-repaired training images of N scales and the verification image into verification images of N scales;
inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
obtaining a first facial-skin image of the Nth-scale repaired training image and a second facial-skin image of the Nth-scale verification image;
inputting the first facial-skin image and the second facial-skin image into a VGG network to obtain the third loss of the first facial-skin image on M target layers of the VGG network.
Through the above second loss and third loss, the details of the eye region and skin region of the output image can be better improved.
In some embodiments of the present disclosure, the at least two discriminators include a fourth-type discriminator and a fifth-type discriminator. The fourth-type discriminator is configured to preserve the structural features of the training image through the first generator: concretely, the first generator's output image can retain more of the input image's content information. The fifth-type discriminator is configured to improve the first generator's detail restoration of the training image: concretely, compared with other training methods, the output images produced by a first generator trained with the fifth-type discriminator have more detail features and higher definition.
Referring to Fig. 20, training the generator to be trained includes:
Step 201: processing the training image into to-be-repaired training images of N scales.
Step 202: inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales.
Step 203: for each scale, setting the repaired training image of that scale to have a true-value label and inputting it into the initial fourth-type discriminator, or the fourth-type discriminator after the last training, to obtain a tenth discrimination result.
Step 204: calculating a seventh adversarial loss based on the tenth discrimination result.
Step 205: for each scale, setting the repaired training image of that scale to have a true-value label and inputting it into the initial fifth-type discriminator, or the fifth-type discriminator after the last training, to obtain an eleventh discrimination result.
Step 206: calculating an eighth adversarial loss based on the eleventh discrimination result, the total adversarial loss including the seventh adversarial loss and the eighth adversarial loss.
Step 207: adjusting the parameters of the generator to be trained, or of the last-trained generator, according to the total adversarial loss.
Referring to Fig. 21, training the at least two discriminators includes:
Step 211: processing the training image into to-be-repaired training images of N scales and the verification image into verification images of N scales.
Step 212: inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales.
Step 213: for each scale, setting the repaired training image of that scale to have a false-value label and inputting it into the initial fourth-type discriminator, or the fourth-type discriminator after the last training, to obtain a twelfth discrimination result; for each scale, setting the to-be-repaired training image to have a true-value label and inputting it into the initial fourth-type discriminator, or the fourth-type discriminator after the last training, to obtain a thirteenth discrimination result.
Step 214: calculating a ninth adversarial loss based on the twelfth discrimination result and the thirteenth discrimination result.
Step 215: adjusting the parameters of the fourth-type discriminator according to the ninth adversarial loss to obtain an updated fourth-type discriminator.
Step 216: for each scale, high-frequency filtering the repaired training image and the verification image of the corresponding scale to obtain high-frequency-filtered repaired training images and verification images.
Step 217: for each scale, setting the high-frequency-filtered repaired training image to have a false-value label and inputting it into the initial fifth-type discriminator, or the fifth-type discriminator after the last training, to obtain a fourteenth discrimination result; for each scale, setting the Gaussian-filtered verification image to have a true-value label and inputting it into the initial fifth-type discriminator, or the fifth-type discriminator after the last training, to obtain a fifteenth discrimination result.
Step 218: calculating a tenth adversarial loss based on the fourteenth discrimination result and the fifteenth discrimination result.
Step 219: adjusting the parameters of the fifth-type discriminator according to the tenth adversarial loss to obtain an updated fifth-type discriminator.
Referring to Fig. 22, which shows the inputs and outputs of the generator to be trained and the discriminators in another embodiment of the present disclosure: the inputs of the generator to be trained include the training images of N scales and the random-noise images of N scales (or the keypoint mask images of N scales), and its output is the repaired training image. The fourth-type discriminator comprises N per-scale discriminators corresponding to the N scales of repair modules, and its inputs include the generator's repaired training images and the training images of N scales; the fifth-type discriminator likewise comprises N per-scale discriminators, and its inputs include the high-frequency-filtered repaired training images of the generator and the high-frequency-filtered verification images of N scales.
In the embodiments of the present disclosure, the verification image may be an image with the same content as the training image but a different definition, or an image that differs from the training image in both content and definition.
In the above embodiment, two types of discriminators (the fourth type and the fifth type) are designed for the following reason: detail textures are the high-frequency information of an image, and the high-frequency information of natural images follows a particular distribution. The fifth-type discriminator and the generator are trained adversarially against each other, so that the generator learns the distribution followed by detail textures and can map smooth low-definition images into the space of real, detail-rich natural images. The fourth-type discriminator discriminates between the low-definition image and its corresponding restoration result, constraining the image to keep its structural features, without deformation, after passing through the generator.
In the embodiments of the present disclosure, optionally, the loss function of the fifth-type discriminator is:
max V(D1, G) = log[D1(HF(y))] + log[1 − D1(HF(G(x)))]
and the loss function of the fourth-type discriminator is:
max V(D2, G) = log[D2(x)] + log[1 − D2(G(x))]
where G denotes the generator, D1 and D2 denote the fifth-type and fourth-type discriminators respectively, HF denotes the Gaussian high-frequency filter, x denotes the training image input to the generator, and y denotes the real high-definition verification image. A sketch of the two discriminator losses follows.
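The sketch below realises HF(.) as "image minus Gaussian blur" and the log terms as binary cross-entropy over logits; kernel size, sigma and the loss form are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(ks=11, sigma=3.0):
    ax = torch.arange(ks, dtype=torch.float32) - (ks - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return k / k.sum()

def high_pass(img, ks=11, sigma=3.0):
    """HF(.): the image minus its Gaussian blur, i.e. a Gaussian
    high-frequency filter applied per channel."""
    c = img.shape[1]
    k = gaussian_kernel(ks, sigma).to(img).view(1, 1, ks, ks).repeat(c, 1, 1, 1)
    blur = F.conv2d(img, k, padding=ks // 2, groups=c)
    return img - blur

def bce(pred, target_is_real):
    t = torch.ones_like(pred) if target_is_real else torch.zeros_like(pred)
    return F.binary_cross_entropy_with_logits(pred, t)

def d1_loss(D1, g_x, y):
    # max V(D1, G): real sample = HF(verification image), fake = HF(G(x)).
    return bce(D1(high_pass(y)), True) + bce(D1(high_pass(g_x.detach())), False)

def d2_loss(D2, g_x, x):
    # max V(D2, G): the low-definition input x itself is the "real" sample,
    # which is what constrains G to preserve the input's structure.
    return bce(D2(x), True) + bce(D2(g_x.detach()), False)
```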
In the embodiments of the present disclosure, the total loss of the generator to be trained further includes an average gradient loss; that is, the total loss of the generator to be trained = the loss from the fourth-type discriminator + the loss from the fifth-type discriminator + the average gradient loss.
In this case, training the generator to be trained further includes:
processing the training image into to-be-repaired training images of N scales;
inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
calculating the average gradient loss of the Nth-scale repaired training image.
That is, the loss function of the generator is:
min V(D, G) = α·log[1 − D1(HF(G(x)))] + β·log[1 − D2(G(x))] + γ·AvgG(G(x))
where α, β and γ denote the weights of the respective loss terms and AvgG denotes the average gradient loss. The average gradient can be used to evaluate the richness of detail texture in an image: the richer the details, the faster the grey values change along a given direction and the larger the average gradient value.
Optionally, the average gradient loss AvgG is calculated by the formula given above, where m and n are the width and height of the Nth-scale repaired training image respectively, and f_{i,j} is the pixel of the Nth-scale repaired training image at position (i, j).
In some other embodiments of the present disclosure, the first generator includes N repair modules, and the at least two discriminators include N first-type discriminators with different network structures, corresponding respectively to the N repair modules.
Referring to Fig. 23, training the generator to be trained includes:
Step 231: processing the training image into to-be-repaired training images of N scales.
Step 232: for each scale's to-be-repaired training image, extracting the keypoints in the to-be-repaired training image, generating multiple keypoint heatmaps, and merging and classifying the keypoint heatmaps to obtain S keypoint mask images for each scale, S being an integer greater than or equal to 2.
Step 233: inputting the N scales of to-be-repaired training images and the S keypoint mask images of each scale into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales.
Step 234: for each scale, setting the repaired training image of that scale to have a true-value label and inputting it into the initial first-type discriminator, or the first-type discriminator after the last training, to obtain a first discrimination result.
Step 235: calculating a first adversarial loss based on the first discrimination result, the total adversarial loss including the first adversarial loss.
Step 236: adjusting the parameters of the generator to be trained, or of the last-trained generator, according to the total adversarial loss.
Referring to Fig. 24, training the at least two discriminators includes:
Step 241: processing the training image into to-be-repaired training images of N scales and the verification image into verification images of N scales.
Step 242: for each scale's to-be-repaired training image, extracting its keypoints, generating multiple keypoint heatmaps, and merging and classifying the heatmaps to obtain the S keypoint mask images for each scale.
Step 243: inputting the N scales of to-be-repaired training images and the S keypoint mask images of each scale into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales.
Step 244: for each scale, setting the repaired training image of that scale to have a false-value label and inputting it into the initial first-type discriminator, or the first-type discriminator after the last training, to obtain a third discrimination result; setting the verification image of each scale to have a true-value label and inputting each such verification image into the first-type discriminator to obtain a fourth discrimination result.
Step 245: calculating a third adversarial loss based on the third discrimination result and the fourth discrimination result.
Step 246: adjusting the parameters of the first-type discriminators according to the third adversarial loss to obtain updated first-type discriminators.
In the embodiments of the present disclosure, optionally, the first generator includes N repair modules, and the total loss of the generator to be trained = the loss from the first-type discriminators + the first loss (the perceptual loss).
In this case, training the generator to be trained includes:
processing the training image into to-be-repaired training images of N scales and the verification image into verification images of N scales;
inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
inputting the N scales of repaired training images and the N scales of verification images into a VGG network to obtain the loss of each scale's repaired training image on M target layers of the VGG network, M being an integer greater than or equal to 1;
the first loss including the losses of the N scales of repaired training images on the M target layers.
Optionally, the first loss includes: the losses of each scale's repaired training image on the M target layers, multiplied by the corresponding weights and then summed, where repaired training images of different scales use different weights at the target layers.
For example, the generator to be trained includes repair modules of 4 scales (64*64, 128*128, 256*256 and 512*512), the VGG network is a VGG19 network, and the M target layers are layers 2-2, 3-4, 4-4 and 5-4; the first loss (i.e. the perceptual loss) L is calculated by the same formulas as given above:
L = L_per_64 + L_per_128 + L_per_256 + L_per_512,  L_per_s = w_s^{2-2}·ℓ_s^{2-2} + w_s^{3-4}·ℓ_s^{3-4} + w_s^{4-4}·ℓ_s^{4-4} + w_s^{5-4}·ℓ_s^{5-4}
with L_per_s the perceptual loss of the repaired training image at scale s and ℓ_s^{k} its perceptual loss at layer k of the VGG19 network.
In the above example, as before, the smaller-resolution scales attend more globally and correspond to the shallower VGG layers, while the larger-resolution scales attend more locally and correspond to the deeper VGG layers.
Optionally, the loss adopted by the generator to be trained further includes a pixel-wise two-norm (L2) loss; that is, the total loss of the generator to be trained = the loss from the first-type discriminators + the first loss (the perceptual loss) + the pixel-wise two-norm loss.
The L2 loss is calculated as follows: the training image is processed into to-be-repaired training images of N scales, and the verification image into verification images of N scales; the N scales of to-be-repaired training images are input into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales; and the L2 loss is obtained by comparing the N scales of repaired training images with the N scales of verification images.
In the embodiments of the present disclosure, optionally, the first generator includes N repair modules, each adopting the same network structure.
The training process of the generator to be trained includes a first training phase and a second training phase, each of which includes at least one training pass of the generator to be trained.
In the first training phase, when the parameters of the repair modules are adjusted, all repair modules share the same parameters.
In the second training phase, each repair module's parameters are adjusted independently.
Optionally, the learning rate used in the first training phase (e.g. lr = 0.0001) is larger than that used in the second training phase (e.g. lr = 0.00005): the larger the learning rate, the faster the training. Since the first phase needs to train the shared parameters quickly, a larger learning rate is used, while the second phase is a finer training, so a smaller learning rate is used to fine-tune each repair module. This is because a repair module attends to the structural information of the face at the lower scales and to its detail information at the higher scales. After the first training phase the shared parameters are decoupled, so that the super-resolution module at each scale can attend more to the information at that scale and achieve a better detail-restoration effect, as sketched below.
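A sketch of the two-phase parameter handling, with a stub repair module (the disclosure names SRCNN / U-Net as real candidates) and the learning rates quoted above; names and optimizer choice are assumptions.

```python
import copy
import torch
import torch.nn as nn

def build_shared_generator(n_scales, make_module):
    """Phase 1: one set of repair-module parameters shared by all scales --
    the same module object is registered N times."""
    shared = make_module()
    return nn.ModuleList(shared for _ in range(n_scales))

def decouple(generator):
    """Phase 2: replace the shared module by independent deep copies, so
    each scale can be fine-tuned separately."""
    return nn.ModuleList(copy.deepcopy(generator[0]) for _ in generator)

gen = build_shared_generator(4, lambda: nn.Conv2d(3, 3, 3, padding=1))
opt = torch.optim.Adam(gen.parameters(), lr=1e-4)    # first phase: larger lr
# ... first training phase ...
gen = decouple(gen)
opt = torch.optim.Adam(gen.parameters(), lr=5e-5)    # second phase: fine-tune
# ... second training phase ...
```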
Referring to Fig. 25, an embodiment of the present disclosure further provides an image processing method, including:
Step 251: receiving an input image.
Step 252: performing face detection on the input image to obtain a face image.
In the embodiments of the present disclosure, optionally, performing face detection on the input image to obtain the face image includes: performing face detection on the input image to obtain a detection image, and standardising and aligning the detection image to obtain the face image.
Step 253: processing the face image with the method of any of the above embodiments to obtain a first repaired training image, where the definition of the first repaired training image is higher than that of the input image.
Step 254: processing the input image, or the input image with the face image removed, to obtain a second repaired training image, where the definition of the second repaired training image is higher than that of the input image.
Step 255: fusing the first repaired training image and the second repaired training image to obtain a fused image, where the definition of the fused image is higher than that of the input image.
In the embodiments of the present disclosure, optionally, processing the input image, or the input image with the face image removed, to obtain the second repaired training image includes: processing it with the method described in any of the above embodiments. The overall flow is sketched below.
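In the sketch below, the face detector, the generator-based face restorer of the earlier embodiments, and the whole-image restorer are abstracted as callables; all three, and the naive paste-back fusion, are assumptions of this illustration.

```python
def restore_photo(img, detect_face, restore_face, restore_background):
    """End-to-end flow of steps 251-255; img is an H x W x C array.
    detect_face returns an aligned face crop plus its bounding box."""
    face, (x0, y0, x1, y1) = detect_face(img)   # step 252: detect + align
    face_restored = restore_face(face)          # step 253: first repaired image
    base = restore_background(img)              # step 254: second repaired image
    out = base.copy()                           # step 255: fusion
    out[y0:y1, x0:x1] = face_restored           # naive paste-back; a feathered
    return out                                  # blend would reduce seams
```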
Referring to Fig. 26, an embodiment of the present application further provides an image processing apparatus 260, including:
a receiving module 261, configured to receive an input image;
a processing module 262, configured to process the input image with a first generator to obtain an output image, where the definition of the output image is higher than that of the input image, and the first generator is obtained by training a generator to be trained with at least two discriminators.
Optionally, the first generator includes N repair modules, N being an integer greater than or equal to 2;
the processing module is configured to process the input image into to-be-repaired images of N scales, the scales increasing successively from the first-scale to-be-repaired image to the Nth-scale to-be-repaired image, and to obtain the output image using the N repair modules and the N scales of to-be-repaired images.
Optionally, for two adjacent scales among the N scales, the latter scale is twice the former scale.
Optionally, the processing module is configured to determine the scale interval to which the input image belongs; to process the input image into a to-be-repaired image of the jth scale corresponding to that interval, the jth scale being one of the first to Nth scales; and to up-sample and/or down-sample the jth-scale to-be-repaired image to obtain the to-be-repaired images of the remaining N−1 scales.
Optionally, the processing module is configured to:
concatenate the first-scale to-be-repaired image with a first-scale random-noise image to obtain a first concatenated image; input the first concatenated image into the first repair module to obtain a first-scale repaired image; up-sample the first-scale repaired image to obtain a second-scale up-sampled image;
concatenate the ith-scale up-sampled image, the ith-scale to-be-repaired image and an ith-scale random-noise image into an ith concatenated image; input the ith concatenated image into the ith repair module to obtain an ith-scale repaired image; up-sample the ith-scale repaired image to obtain an (i+1)th-scale up-sampled image, i being an integer greater than or equal to 2;
concatenate the Nth-scale up-sampled image, the Nth-scale to-be-repaired image and an Nth-scale random-noise image into an Nth concatenated image, and input the Nth concatenated image into the Nth repair module to obtain the Nth-scale repaired image as the output image of the first generator.
Optionally, the processing module is configured to:
for each scale's to-be-repaired image, extract the keypoints in the to-be-repaired image, generate multiple keypoint heatmaps, and merge and classify the heatmaps to obtain S keypoint mask images per scale, S being an integer greater than or equal to 2;
concatenate the first-scale to-be-repaired image with the S first-scale keypoint mask images into a first concatenated image; input it into the first repair module to obtain a first-scale repaired image; up-sample that image to obtain a second-scale up-sampled image;
concatenate the ith-scale up-sampled image, the ith-scale to-be-repaired image and the S ith-scale keypoint mask images into an ith concatenated image; input it into the ith repair module to obtain an ith-scale repaired image; up-sample that image to obtain an (i+1)th-scale up-sampled image, i being an integer greater than or equal to 2;
concatenate the Nth-scale up-sampled image, the Nth-scale to-be-repaired image and the S Nth-scale keypoint mask images into an Nth concatenated image, and input it into the Nth repair module to obtain the Nth-scale repaired image as the output image of the first generator.
Optionally, a 4-stacked-hourglass model is used to extract the keypoints in the to-be-repaired image.
Optionally, the apparatus further includes:
a training module, configured to alternately train the generator to be trained and the at least two discriminators according to training images and verification images to obtain the first generator, where the definition of the verification image is higher than that of the training image, and when the generator to be trained is trained, its total loss includes at least one of: a first loss, and the total adversarial loss of the at least two discriminators.
Optionally, the first generator includes N repair modules, N being an integer greater than or equal to 2, and the at least two discriminators include N first-type discriminators with different network structures, corresponding respectively to the N repair modules, and a second-type discriminator configured to improve the first generator's definition restoration of partial face regions of the training image.
The training module includes a first training submodule.
The first training submodule is configured to train the generator to be trained, including:
processing the training image into to-be-repaired training images of N scales;
inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
obtaining a first partial face image of the Nth-scale repaired training image;
for each scale, setting the repaired training image of that scale to have a true-value label and inputting it into the initial first-type discriminator, or the first-type discriminator after the last training, to obtain a first discrimination result;
setting the first partial face image to have a true-value label and inputting it into the initial second-type discriminator, or the second-type discriminator after the last training, to obtain a second discrimination result;
calculating a first adversarial loss based on the first discrimination result and a second adversarial loss based on the second discrimination result, the total adversarial loss including the first and second adversarial losses;
adjusting the parameters of the generator to be trained, or of the last-trained generator, according to the total adversarial loss.
The first training submodule is configured to train the at least two discriminators, including:
processing the training image into to-be-repaired training images of N scales and the verification image into verification images of N scales;
obtaining a second partial face image of the Nth-scale verification image;
inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
obtaining a first partial face image of the Nth-scale repaired training image;
for each scale, setting the repaired training image of that scale to have a false-value label and inputting it into the initial first-type discriminator, or the first-type discriminator after the last training, to obtain a third discrimination result; setting the verification image of each scale to have a true-value label and inputting each such verification image into the first-type discriminator to obtain a fourth discrimination result;
setting the first partial face image to have a false-value label and inputting it into the initial second-type discriminator, or the second-type discriminator after the last training, to obtain a fifth discrimination result; setting the second partial face image to have a true-value label and inputting it into the initial second-type discriminator, or the second-type discriminator after the last training, to obtain a sixth discrimination result;
calculating a third adversarial loss based on the third and fourth discrimination results and a fourth adversarial loss based on the fifth and sixth discrimination results;
adjusting the parameters of the first-type discriminators according to the third adversarial loss to obtain updated first-type discriminators, and adjusting the parameters of the second-type discriminator according to the fourth adversarial loss to obtain an updated second-type discriminator.
Optionally, the first partial face image and the second partial face image are eye images.
Optionally, the at least two discriminators further include X third-type discriminators, X being a positive integer greater than or equal to 1, configured to improve the first generator's detail restoration of the face parts of the training image.
Optionally, the first training submodule is configured to train the generator to be trained, including:
processing the training image into to-be-repaired training images of N scales;
inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
performing face parsing on the Nth-scale repaired image with a face parsing network to obtain X first face-part images corresponding to the Nth-scale repaired image, where if X equals 1 the first face-part image contains one face part, and if X is greater than 1 the X first face-part images contain different face parts;
setting all X first face-part images to have true-value labels and inputting each first face-part image with a true-value label into the initial third-type discriminator, or the third-type discriminator after the last training, to obtain a seventh discrimination result;
calculating a fifth adversarial loss based on the seventh discrimination result, the total adversarial loss including the fifth adversarial loss.
The first training submodule is configured to train the at least two discriminators, including:
processing the training image into to-be-repaired training images of N scales and the verification image into verification images of N scales;
inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
performing face parsing on the Nth-scale repaired image with a face parsing network to obtain X first face-part images containing different face parts, and performing face parsing on the Nth-scale verification image with a face parsing network to obtain X second face-part images containing different face parts;
setting all X first face-part images to have false-value labels and inputting them into the initial third-type discriminator, or the third-type discriminator after the last training, to obtain an eighth discrimination result; setting all X second face-part images to have true-value labels and inputting each of them into the initial third-type discriminator, or the third-type discriminator after the last training, to obtain a ninth discrimination result;
calculating a sixth adversarial loss based on the eighth and ninth discrimination results;
adjusting the parameters of the third-type discriminators according to the sixth adversarial loss to obtain updated third-type discriminators.
Optionally, the face parsing network adopts a semantic segmentation network.
Optionally, X equals 1, and the third-type discriminator is configured to improve the first generator's detail restoration of the facial skin of the training image.
Optionally, the total loss of the generator to be trained further includes a face similarity loss.
The first training submodule is configured to train the generator to be trained, including:
processing the training image into to-be-repaired training images of N scales;
inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
performing keypoint detection on the Nth-scale repaired image with a keypoint detection network to obtain a first keypoint heatmap corresponding to the Nth-scale repaired image;
performing keypoint detection on the Nth-scale to-be-repaired training image with a keypoint detection network to obtain a second keypoint heatmap corresponding to the Nth-scale to-be-repaired training image;
calculating the face similarity loss from the first keypoint heatmap and the second keypoint heatmap.
Optionally, the total loss of the generator to be trained further includes an average gradient loss.
The first training submodule is configured to train the generator to be trained, including:
processing the training image into to-be-repaired training images of N scales;
inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
calculating the average gradient loss of the Nth-scale repaired training image.
Optionally, the first generator includes N repair modules, N being an integer greater than or equal to 2, each adopting the same network structure;
the training process of the generator to be trained includes a first training phase and a second training phase, each including at least one training pass of the generator to be trained;
in the first training phase, all repair modules share the same parameters when their parameters are adjusted;
in the second training phase, each repair module's parameters are adjusted independently.
Optionally, the learning rate used in the first training phase is larger than that used in the second training phase.
Optionally, the at least two discriminators include a fourth-type discriminator and a fifth-type discriminator; the fourth-type discriminator is configured to preserve the structural features of the training image through the first generator, and the fifth-type discriminator is configured to improve the first generator's detail restoration of the training image.
Optionally, the training module further includes a second training submodule.
The second training submodule is configured to train the generator to be trained, including:
processing the training image into to-be-repaired training images of N scales;
inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
for each scale, setting the repaired training image of that scale to have a true-value label and inputting it into the initial fourth-type discriminator, or the fourth-type discriminator after the last training, to obtain a tenth discrimination result;
calculating a seventh adversarial loss based on the tenth discrimination result;
for each scale, setting the repaired training image of that scale to have a true-value label and inputting it into the initial fifth-type discriminator, or the fifth-type discriminator after the last training, to obtain an eleventh discrimination result;
calculating an eighth adversarial loss based on the eleventh discrimination result;
the total adversarial loss including the seventh adversarial loss and the eighth adversarial loss;
adjusting the parameters of the generator to be trained, or of the last-trained generator, according to the total adversarial loss.
The second training submodule is configured to train the at least two discriminators, including:
processing the training image into to-be-repaired training images of N scales, and processing the verification image into verification images of N scales;
inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
for each scale, setting the repaired training image of that scale to have a false-value label and inputting it into the initial fourth-type discriminator, or the fourth-type discriminator after the last training, to obtain a twelfth discrimination result; for each scale, setting the to-be-repaired training image to have a true-value label and inputting it into the initial fourth-type discriminator, or the fourth-type discriminator after the last training, to obtain a thirteenth discrimination result;
calculating a ninth adversarial loss based on the twelfth and thirteenth discrimination results;
adjusting the parameters of the fourth-type discriminator according to the ninth adversarial loss to obtain an updated fourth-type discriminator; for each scale, high-frequency filtering the repaired training image and the verification image of the corresponding scale to obtain high-frequency-filtered repaired training images and verification images;
for each scale, setting the high-frequency-filtered repaired training image to have a false-value label and inputting it into the initial fifth-type discriminator, or the fifth-type discriminator after the last training, to obtain a fourteenth discrimination result; for each scale, setting the Gaussian-filtered verification image to have a true-value label and inputting it into the initial fifth-type discriminator, or the fifth-type discriminator after the last training, to obtain a fifteenth discrimination result;
calculating a tenth adversarial loss based on the fourteenth and fifteenth discrimination results;
adjusting the parameters of the fifth-type discriminator according to the tenth adversarial loss to obtain an updated fifth-type discriminator.
Optionally, the total loss of the generator to be trained further includes an average gradient loss.
The second training submodule is configured to train the generator to be trained, including:
processing the training image into to-be-repaired training images of N scales;
inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
calculating the average gradient loss of the Nth-scale repaired training image.
Optionally, the average gradient loss AvgG is calculated by the formula given above, where m and n are the width and height of the Nth-scale repaired training image respectively, and f_{i,j} is the pixel of the Nth-scale repaired training image at position (i, j).
Optionally, the first generator includes N repair modules, and the at least two discriminators include N first-type discriminators with different network structures, corresponding respectively to the N repair modules.
The training module further includes a third training submodule.
The third training submodule is configured to train the generator to be trained, including:
processing the training image into to-be-repaired training images of N scales;
for each scale's to-be-repaired training image, extracting its keypoints, generating multiple keypoint heatmaps, and merging and classifying the heatmaps to obtain S keypoint mask images per scale, S being an integer greater than or equal to 2;
inputting the N scales of to-be-repaired training images and the S keypoint mask images of each scale into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
for each scale, setting the repaired training image of that scale to have a true-value label and inputting it into the initial first-type discriminator, or the first-type discriminator after the last training, to obtain a first discrimination result;
calculating a first adversarial loss based on the first discrimination result, the total adversarial loss including the first adversarial loss;
adjusting the parameters of the generator to be trained, or of the last-trained generator, according to the total adversarial loss.
The third training submodule is configured to train the at least two discriminators, including:
processing the training image into to-be-repaired training images of N scales and the verification image into verification images of N scales;
for each scale's to-be-repaired training image, extracting its keypoints, generating multiple keypoint heatmaps, and merging and classifying the heatmaps to obtain the S keypoint mask images per scale;
inputting the N scales of to-be-repaired training images and the S keypoint mask images of each scale into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
for each scale, setting the repaired training image of that scale to have a false-value label and inputting it into the initial first-type discriminator, or the first-type discriminator after the last training, to obtain a third discrimination result; setting the verification image of each scale to have a true-value label and inputting each such verification image into the first-type discriminator to obtain a fourth discrimination result;
calculating a third adversarial loss based on the third and fourth discrimination results;
adjusting the parameters of the first-type discriminators according to the third adversarial loss to obtain updated first-type discriminators.
Optionally, the first generator includes N repair modules.
The third training submodule is configured to train the generator to be trained, including:
processing the training image into to-be-repaired training images of N scales and the verification image into verification images of N scales;
inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
inputting the N scales of repaired training images and the N scales of verification images into a VGG network to obtain the loss of each scale's repaired training image on M target layers of the VGG network, M being an integer greater than or equal to 1;
the first loss including the losses of the N scales of repaired training images on the M target layers.
Optionally, the first loss includes: the losses of each scale's repaired training image on the M target layers, multiplied by the corresponding weights and then summed, where repaired training images of different scales use different weights at the target layers.
Optionally, the first loss further includes a pixel-wise two-norm loss.
Optionally, the first generator includes repair modules of 4 scales, namely: a 64*64-scale repair module, a 128*128-scale repair module, a 256*256-scale repair module and a 512*512-scale repair module.
Optionally, S equals 5, and the S keypoint mask images include the keypoint mask images of the left eye, right eye, nose, mouth and contour.
Referring to Fig. 27, an embodiment of the present disclosure further provides an image processing apparatus, including:
a receiving module 271, configured to receive an input image;
a face detection module 272, configured to perform face detection on the input image to obtain a face image;
a first processing module, configured to process the face image with the image processing method of any of the above embodiments to obtain a first repaired training image, where the definition of the first repaired training image is higher than that of the input image;
a second processing module 273, configured to process the input image, or the input image with the face image removed, to obtain a second repaired training image, where the definition of the second repaired training image is higher than that of the input image;
a fusion module 274, configured to fuse the first repaired training image and the second repaired training image to obtain a fused image, where the definition of the fused image is higher than that of the input image.
Optionally, the second processing module 273 is configured to process the input image, or the input image with the face image removed, with the image processing method of any of the above embodiments to obtain the second repaired training image.
An embodiment of the present disclosure further provides an electronic device, including a processor, a memory, and a program or instruction stored on the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the image processing method of any of the above embodiments.
An embodiment of the present disclosure further provides a readable storage medium storing a program or instruction which, when executed by a processor, implements the steps of the image processing method of any of the above embodiments.
The processor is the processor of the terminal described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk.
It should be noted that, herein, the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or apparatus that includes that element. It should further be pointed out that the scope of the methods and apparatuses of the embodiments of the present application is not limited to performing the functions in the order shown or discussed; it may also include performing the functions in a substantially simultaneous manner or in the reverse order according to the functions involved. For example, the described methods may be performed in an order different from that described, and steps may also be added, omitted or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general hardware platform, and of course also by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product, stored in a storage medium (such as a ROM/RAM, a magnetic disk or an optical disk) and including several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the embodiments of the present application.
The embodiments of the present disclosure have been described above with reference to the accompanying drawings, but the present disclosure is not limited to the specific embodiments described; the specific embodiments are merely illustrative, not restrictive. Under the teaching of the present disclosure, those of ordinary skill in the art can devise many further forms without departing from the spirit of the present disclosure and the scope protected by the claims, all of which fall within the protection of the present disclosure.

Claims (38)

  1. An image processing method, comprising:
    receiving an input image;
    processing the input image with a first generator to obtain an output image, wherein the definition of the output image is higher than the definition of the input image;
    wherein the first generator is obtained by training a generator to be trained with at least two discriminators.
  2. The image processing method according to claim 1, wherein the first generator comprises N repair modules, N being an integer greater than or equal to 2, and processing the input image with the first generator to obtain the output image comprises:
    processing the input image into to-be-repaired images of N scales, wherein the scales increase successively from the first-scale to-be-repaired image to the Nth-scale to-be-repaired image;
    obtaining the output image using the N repair modules and the N scales of to-be-repaired images.
  3. The image processing method according to claim 2, wherein, for two adjacent scales among the N scales, the latter scale is twice the former scale.
  4. The image processing method according to claim 2, wherein processing the input image into to-be-repaired images of N scales comprises:
    determining the scale interval to which the input image belongs;
    processing the input image into a to-be-repaired image of a jth scale corresponding to the scale interval to which it belongs, the jth scale being one of the first to Nth scales;
    up-sampling and/or down-sampling the jth-scale to-be-repaired image to obtain the to-be-repaired images of the remaining N−1 scales.
  5. The image processing method according to claim 2, wherein obtaining the output image using the N repair modules and the N scales of to-be-repaired images comprises: concatenating the first-scale to-be-repaired image with a first-scale random-noise image to obtain a first concatenated image; inputting the first concatenated image into the first repair module to obtain a first-scale repaired image; up-sampling the first-scale repaired image to obtain a second-scale up-sampled image;
    concatenating the ith-scale up-sampled image, the ith-scale to-be-repaired image and an ith-scale random-noise image to obtain an ith concatenated image; inputting the ith concatenated image into the ith repair module to obtain an ith-scale repaired image; up-sampling the ith-scale repaired image to obtain an (i+1)th-scale up-sampled image, wherein i is an integer greater than or equal to 2;
    concatenating the Nth-scale up-sampled image, the Nth-scale to-be-repaired image and an Nth-scale random-noise image to obtain an Nth concatenated image; inputting the Nth concatenated image into the Nth repair module to obtain the Nth-scale repaired image as the output image of the first generator.
  6. The image processing method according to claim 2, wherein obtaining the output image using the N repair modules and the N scales of to-be-repaired images comprises:
    for each scale's to-be-repaired image, extracting the keypoints in the to-be-repaired image, generating multiple keypoint heatmaps, and merging and classifying the keypoint heatmaps to obtain S keypoint mask images for each scale, S being an integer greater than or equal to 2;
    concatenating the first-scale to-be-repaired image with the S first-scale keypoint mask images to obtain a first concatenated image; inputting the first concatenated image into the first repair module to obtain a first-scale repaired image; up-sampling the first-scale repaired image to obtain a second-scale up-sampled image;
    concatenating the ith-scale up-sampled image, the ith-scale to-be-repaired image and the S ith-scale keypoint mask images to obtain an ith concatenated image; inputting the ith concatenated image into the ith repair module to obtain an ith-scale repaired image; up-sampling the ith-scale repaired image to obtain an (i+1)th-scale up-sampled image, wherein i is an integer greater than or equal to 2;
    concatenating the Nth-scale up-sampled image, the Nth-scale to-be-repaired image and the S Nth-scale keypoint mask images to obtain an Nth concatenated image; inputting the Nth concatenated image into the Nth repair module to obtain the Nth-scale repaired image as the output image of the first generator.
  7. The method according to claim 6, wherein a 4-stacked-hourglass model is used to extract the keypoints in the to-be-repaired image.
  8. The method according to claim 1, wherein the first generator being obtained by training a generator to be trained with at least two discriminators comprises:
    alternately training the generator to be trained and the at least two discriminators according to a training image and a verification image to obtain the first generator, wherein the definition of the verification image is higher than the definition of the training image, and when the generator to be trained is trained, the total loss of the generator to be trained comprises at least one of: a first loss, and a total adversarial loss of the at least two discriminators.
  9. The method according to claim 8, wherein the first generator comprises N repair modules, N being an integer greater than or equal to 2, and the at least two discriminators comprise: N first-type discriminators with different network structures corresponding respectively to the N repair modules, and a second-type discriminator; wherein the second-type discriminator is configured to improve the first generator's definition restoration of partial face regions of the training image.
  10. The method according to claim 9, wherein
    training the generator to be trained comprises:
    processing the training image into to-be-repaired training images of N scales;
    inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
    obtaining a first partial face image of the Nth-scale repaired training image;
    for each scale, setting the repaired training image of that scale to have a true-value label, and inputting the repaired training image with the true-value label into the initial first-type discriminator, or the first-type discriminator after the last training, to obtain a first discrimination result;
    setting the first partial face image to have a true-value label, and inputting the first partial face image with the true-value label into the initial second-type discriminator, or the second-type discriminator after the last training, to obtain a second discrimination result;
    calculating a first adversarial loss based on the first discrimination result and a second adversarial loss based on the second discrimination result, the total adversarial loss comprising the first adversarial loss and the second adversarial loss;
    adjusting the parameters of the generator to be trained, or of the last-trained generator, according to the total adversarial loss;
    and training the at least two discriminators comprises:
    processing the training image into to-be-repaired training images of N scales, and processing the verification image into verification images of N scales;
    obtaining a second partial face image of the Nth-scale verification image;
    inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
    obtaining a first partial face image of the Nth-scale repaired training image;
    for each scale, setting the repaired training image of that scale to have a false-value label, and inputting it into the initial first-type discriminator, or the first-type discriminator after the last training, to obtain a third discrimination result; setting the verification image of each scale to have a true-value label, and inputting each verification image with a true-value label into the first-type discriminator to obtain a fourth discrimination result;
    setting the first partial face image to have a false-value label, and inputting it into the initial second-type discriminator, or the second-type discriminator after the last training, to obtain a fifth discrimination result; setting the second partial face image to have a true-value label, and inputting it into the initial second-type discriminator, or the second-type discriminator after the last training, to obtain a sixth discrimination result;
    calculating a third adversarial loss based on the third and fourth discrimination results, and a fourth adversarial loss based on the fifth and sixth discrimination results;
    adjusting the parameters of the first-type discriminators according to the third adversarial loss to obtain updated first-type discriminators; adjusting the parameters of the second-type discriminator according to the fourth adversarial loss to obtain an updated second-type discriminator.
  11. The method according to claim 10, wherein the first partial face image and the second partial face image are eye images.
  12. The method according to claim 9, wherein the at least two discriminators further comprise: X third-type discriminators, X being a positive integer greater than or equal to 1, the third-type discriminators being configured to improve the first generator's detail restoration of the face parts of the training image.
  13. The method according to claim 12, wherein
    training the generator to be trained further comprises:
    processing the training image into to-be-repaired training images of N scales;
    inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
    performing face parsing on the Nth-scale repaired image with a face parsing network to obtain X first face-part images corresponding to the Nth-scale repaired image, wherein if X equals 1 the first face-part image contains one face part, and if X is greater than 1 the X first face-part images contain different face parts;
    setting all X first face-part images to have true-value labels, and inputting each first face-part image with a true-value label into the initial third-type discriminator, or the third-type discriminator after the last training, to obtain a seventh discrimination result;
    calculating a fifth adversarial loss based on the seventh discrimination result, the total adversarial loss comprising the fifth adversarial loss;
    and training the at least two discriminators further comprises:
    processing the training image into to-be-repaired training images of N scales, and processing the verification image into verification images of N scales;
    inputting the N scales of to-be-repaired training images into the generator to be trained, or the generator after the last training, to obtain repaired training images of N scales;
    performing face parsing on the Nth-scale repaired image with a face parsing network to obtain X first face-part images containing different face parts, and performing face parsing on the Nth-scale verification image with a face parsing network to obtain X second face-part images containing different face parts;
    setting all X first face-part images to have false-value labels, and inputting them into the initial third-type discriminator, or the third-type discriminator after the last training, to obtain an eighth discrimination result; setting all X second face-part images to have true-value labels, and inputting each of them into the initial third-type discriminator, or the third-type discriminator after the last training, to obtain a ninth discrimination result;
    calculating a sixth adversarial loss based on the eighth and ninth discrimination results;
    adjusting the parameters of the third-type discriminators according to the sixth adversarial loss to obtain updated third-type discriminators.
  14. The method according to claim 12 or 13, wherein X equals 1, and the third-type discriminator is configured to improve the first generator's detail restoration of the facial skin of the training image.
  15. The method according to claim 13, wherein the face parsing network adopts a semantic segmentation network.
  16. The method according to claim 9, wherein the total loss of the to-be-trained generator further comprises a face similarity loss; and
    training the to-be-trained generator further comprises:
    processing the training image into to-be-repaired training images of N scales;
    inputting the to-be-repaired training images of the N scales into the to-be-trained generator or the generator obtained from the previous training, to obtain repaired training images of N scales;
    performing key point detection on the repaired training image of the N-th scale by using a key point detection network, to obtain a first key point heatmap corresponding to the repaired training image of the N-th scale;
    performing key point detection on the to-be-repaired training image of the N-th scale by using the key point detection network, to obtain a second key point heatmap corresponding to the to-be-repaired training image of the N-th scale; and
    calculating the face similarity loss according to the first key point heatmap and the second key point heatmap.
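An illustrative sketch only of the face similarity loss, assuming PyTorch, a key point detection network kp_net returning stacked heatmaps, and a mean-squared distance between the two heatmaps (the distance is an assumption):

    import torch

    def face_similarity_loss(kp_net, repaired_n, to_repair_n):
        with torch.no_grad():
            target = kp_net(to_repair_n)    # second key point heatmap
        heat = kp_net(repaired_n)           # first key point heatmap
        return torch.mean((heat - target) ** 2)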
  17. The method according to claim 9, wherein the total loss of the to-be-trained generator further comprises an average gradient loss; and
    training the to-be-trained generator further comprises:
    processing the training image into to-be-repaired training images of N scales;
    inputting the to-be-repaired training images of the N scales into the to-be-trained generator or the generator obtained from the previous training, to obtain repaired training images of N scales; and
    calculating the average gradient loss of the repaired training image of the N-th scale.
  18. The method according to claim 8, wherein the first generator comprises N repair modules, N being an integer greater than or equal to 2, and each of the repair modules adopts the same network structure;
    the training process of the to-be-trained generator comprises a first training stage and a second training stage, each of the first training stage and the second training stage comprising at least one training pass of the to-be-trained generator;
    in the first training stage, when the parameters of each repair module are adjusted, all the repair modules share the same parameters; and
    in the second training stage, the parameters of each repair module are adjusted independently.
  19. The method according to claim 18, wherein the learning rate used in the first training stage is greater than the learning rate used in the second training stage.
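An illustrative sketch only of the two training stages of claims 18 and 19, assuming PyTorch; make_module and the two learning rates are hypothetical:

    import copy
    import torch

    def two_stage_setup(make_module, n_scales, lr1=1e-4, lr2=1e-5):
        # Stage 1: one parameter set shared by all N repair modules (lr1 > lr2).
        shared = make_module()
        stage1_modules = [shared] * n_scales            # same object, shared parameters
        opt1 = torch.optim.Adam(shared.parameters(), lr=lr1)

        # Stage 2: each repair module gets its own copy, tuned independently.
        stage2_modules = [copy.deepcopy(shared) for _ in range(n_scales)]
        opt2 = torch.optim.Adam(
            [p for m in stage2_modules for p in m.parameters()], lr=lr2)
        return (stage1_modules, opt1), (stage2_modules, opt2)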
  20. The method according to claim 8, wherein the at least two discriminators comprise a fourth-type discriminator and a fifth-type discriminator, the fourth-type discriminator being configured to make the first generator preserve the structural features of the training image, and the fifth-type discriminator being configured to improve the detail repair of the training image by the first generator.
  21. The method according to claim 20, wherein
    training the to-be-trained generator comprises:
    processing the training image into to-be-repaired training images of N scales;
    inputting the to-be-repaired training images of the N scales into the to-be-trained generator or the generator obtained from the previous training, to obtain repaired training images of N scales;
    for the repaired training image of each scale, setting the repaired training image of that scale to have a true-value label, and inputting the repaired training image having the true-value label into an initial fourth-type discriminator or the fourth-type discriminator obtained from the previous training, to obtain a tenth discrimination result;
    calculating a seventh adversarial loss based on the tenth discrimination result;
    for the repaired training image of each scale, setting the repaired training image of that scale to have a true-value label, and inputting the repaired training image having the true-value label into an initial fifth-type discriminator or the fifth-type discriminator obtained from the previous training, to obtain an eleventh discrimination result;
    calculating an eighth adversarial loss based on the eleventh discrimination result;
    the total adversarial loss comprising the seventh adversarial loss and the eighth adversarial loss; and
    adjusting parameters of the to-be-trained generator or the generator from the previous training according to the total adversarial loss; and
    training the at least two discriminators comprises:
    processing the training image into to-be-repaired training images of N scales, and processing the verification image into verification images of N scales;
    inputting the to-be-repaired training images of the N scales into the to-be-trained generator or the generator obtained from the previous training, to obtain repaired training images of N scales;
    for the repaired training image of each scale, setting the repaired training image of that scale to have a false-value label, and inputting the repaired training image having the false-value label into the initial fourth-type discriminator or the fourth-type discriminator obtained from the previous training, to obtain a twelfth discrimination result; for the to-be-repaired training image of each scale, setting the to-be-repaired training image to have a true-value label, and inputting the to-be-repaired training image having the true-value label into the initial fourth-type discriminator or the fourth-type discriminator obtained from the previous training, to obtain a thirteenth discrimination result;
    calculating a ninth adversarial loss based on the twelfth discrimination result and the thirteenth discrimination result;
    adjusting parameters of the fourth-type discriminator according to the ninth adversarial loss to obtain an updated fourth-type discriminator; for the repaired training image of each scale, performing high-frequency filtering processing on the repaired training image and the verification image of the corresponding scale, to obtain a high-frequency-filtered repaired training image and a high-frequency-filtered verification image;
    for the high-frequency-filtered repaired training image of each scale, setting the high-frequency-filtered repaired training image to have a false-value label, and inputting the high-frequency-filtered repaired training image having the false-value label into an initial fifth-type discriminator or the fifth-type discriminator obtained from the previous training, to obtain a fourteenth discrimination result; for the high-frequency-filtered verification image of each scale, setting the high-frequency-filtered verification image to have a true-value label, and inputting the high-frequency-filtered verification image having the true-value label into the initial fifth-type discriminator or the fifth-type discriminator obtained from the previous training, to obtain a fifteenth discrimination result;
    calculating a tenth adversarial loss based on the fourteenth discrimination result and the fifteenth discrimination result; and
    adjusting parameters of the fifth-type discriminator according to the tenth adversarial loss to obtain an updated fifth-type discriminator.
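An illustrative sketch only of the high-frequency filtering step, assuming it is realized as the residual of a Gaussian blur with torchvision; kernel size and sigma are assumptions:

    import torchvision.transforms.functional as TF

    def high_freq(images, kernel_size=11, sigma=3.0):
        # Keep only high-frequency content: image minus its blurred version.
        low = TF.gaussian_blur(images, kernel_size=[kernel_size, kernel_size],
                               sigma=[sigma, sigma])
        return images - low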
  22. The method according to claim 20, wherein the total loss of the to-be-trained generator further comprises an average gradient loss; and
    training the to-be-trained generator further comprises:
    processing the training image into to-be-repaired training images of N scales;
    inputting the to-be-repaired training images of the N scales into the to-be-trained generator or the generator obtained from the previous training, to obtain repaired training images of N scales; and
    calculating the average gradient loss of the repaired training image of the N-th scale.
  23. The method according to claim 17 or 22, wherein the average gradient loss AvgG is calculated as follows:
    $$\mathrm{AvgG}=\frac{1}{(m-1)(n-1)}\sum_{i=1}^{m-1}\sum_{j=1}^{n-1}\sqrt{\frac{\left(f_{i+1,j}-f_{i,j}\right)^{2}+\left(f_{i,j+1}-f_{i,j}\right)^{2}}{2}}$$
    where m and n are respectively the width and the height of the repaired training image of the N-th scale, and f_{i,j} is the pixel of the repaired training image of the N-th scale at position (i, j).
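A minimal NumPy sketch of the AvgG formula above, with the array holding the N-th-scale repaired training image:

    import numpy as np

    def average_gradient(f):
        f = np.asarray(f, dtype=np.float64)
        gx = np.diff(f, axis=0)[:, :-1]     # f[i+1, j] - f[i, j]
        gy = np.diff(f, axis=1)[:-1, :]     # f[i, j+1] - f[i, j]
        return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0))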
  24. The method according to claim 8, wherein the first generator comprises N repair modules, N being an integer greater than or equal to 2, and the at least two discriminators comprise N first-type discriminators with different network structures corresponding to the N repair modules respectively.
  25. The method according to claim 24, wherein
    training the to-be-trained generator comprises:
    processing the training image into to-be-repaired training images of N scales;
    for the to-be-repaired training image of each scale, extracting key points from the to-be-repaired training image, generating a plurality of key point heatmaps, and merging and classifying the key point heatmaps to obtain S key point mask images for each scale, wherein S is an integer greater than or equal to 2;
    inputting the to-be-repaired training images of the N scales and the S key point mask images of each scale into the to-be-trained generator or the generator obtained from the previous training, to obtain repaired training images of N scales;
    for the repaired training image of each scale, setting the repaired training image of that scale to have a true-value label, and inputting the repaired training image having the true-value label into an initial first-type discriminator or the first-type discriminator obtained from the previous training, to obtain a first discrimination result;
    calculating a first adversarial loss based on the first discrimination result, the total adversarial loss comprising the first adversarial loss; and
    adjusting parameters of the to-be-trained generator or the generator from the previous training according to the total adversarial loss; and
    training the at least two discriminators comprises:
    processing the training image into to-be-repaired training images of N scales, and processing the verification image into verification images of N scales;
    for the to-be-repaired training image of each scale, extracting key points from the to-be-repaired training image, generating a plurality of key point heatmaps, and merging and classifying the key point heatmaps to obtain S key point mask images for each scale;
    inputting the to-be-repaired training images of the N scales and the S key point mask images of each scale into the to-be-trained generator or the generator obtained from the previous training, to obtain repaired training images of N scales;
    for the repaired training image of each scale, setting the repaired training image of that scale to have a false-value label, and inputting the repaired training image having the false-value label into the initial first-type discriminator or the first-type discriminator obtained from the previous training, to obtain a third discrimination result; setting the verification image of each scale to have a true-value label, and inputting each verification image having the true-value label into the first-type discriminator to obtain a fourth discrimination result;
    calculating a third adversarial loss based on the third discrimination result and the fourth discrimination result; and
    adjusting parameters of the first-type discriminators according to the third adversarial loss to obtain updated first-type discriminators.
  26. The method according to claim 8 or 24, wherein
    training the to-be-trained generator comprises:
    processing the training image into to-be-repaired training images of N scales, and processing the verification image into verification images of N scales;
    inputting the to-be-repaired training images of the N scales into the to-be-trained generator or the generator obtained from the previous training, to obtain repaired training images of N scales; and
    inputting the repaired training images of the N scales and the verification images of the N scales into a VGG network, to obtain the loss of the repaired training image of each scale on M target layers of the VGG network, M being an integer greater than or equal to 1;
    the first loss comprising the losses of the repaired training images of the N scales on the M target layers.
  27. The method according to claim 26, wherein the first loss comprises: the losses of the repaired training image of each scale on the M target layers, each multiplied by a corresponding weight and then summed, wherein the repaired training images of different scales use different weights at the target layers.
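An illustrative sketch only of the weighted multi-layer VGG loss of claims 26 and 27, assuming torchvision's VGG19; the target layer indices, the per-layer L1 distance and the per-scale weight table are assumptions:

    import torch.nn.functional as F
    from torchvision.models import vgg19

    FEATURES = vgg19(weights='IMAGENET1K_V1').features.eval()
    TARGET_LAYERS = {3, 8, 17, 26}          # hypothetical M = 4 target layers

    def vgg_feats(x):
        feats = []
        for i, layer in enumerate(FEATURES):
            x = layer(x)
            if i in TARGET_LAYERS:
                feats.append(x)
        return feats

    def first_loss(repaired_scales, verification_scales, weights):
        # weights[k][m]: weight of scale k at target layer m (assumed table).
        total = 0.0
        for k, (rep, ver) in enumerate(zip(repaired_scales, verification_scales)):
            fr, fv = vgg_feats(rep), vgg_feats(ver)
            total = total + sum(w * F.l1_loss(a, b.detach())
                                for w, a, b in zip(weights[k], fr, fv))
        return total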
  28. The method according to claim 24, wherein the first loss comprises a pixel-wise L2-norm loss.
  29. The method according to claim 8, wherein the first loss further comprises at least one of the following: an L1 loss, a second loss and a third loss;
    when the first loss comprises the L1 loss, training the to-be-trained generator comprises:
    processing the training image into to-be-repaired training images of N scales, and processing the verification image into verification images of N scales;
    inputting the to-be-repaired training images of the N scales into the to-be-trained generator or the generator obtained from the previous training, to obtain repaired training images of N scales; and
    obtaining the L1 loss by comparing the repaired training images of the N scales with the verification images of the N scales;
    when the first loss comprises the second loss, training the to-be-trained generator comprises:
    processing the training image into to-be-repaired training images of N scales, and processing the verification image into verification images of N scales;
    inputting the to-be-repaired training images of the N scales into the to-be-trained generator or the generator obtained from the previous training, to obtain repaired training images of N scales;
    acquiring a first eye image of the repaired training image of the N-th scale and a second eye image of the verification image of the N-th scale; and
    inputting the first eye image and the second eye image into a VGG network, to obtain the second loss of the first eye image on M target layers of the VGG network, M being an integer greater than or equal to 1; and
    when the first loss comprises the third loss, training the to-be-trained generator comprises:
    processing the training image into to-be-repaired training images of N scales, and processing the verification image into verification images of N scales;
    inputting the to-be-repaired training images of the N scales into the to-be-trained generator or the generator obtained from the previous training, to obtain repaired training images of N scales;
    acquiring a first facial skin image of the repaired training image of the N-th scale and a second facial skin image of the verification image of the N-th scale; and
    inputting the first facial skin image and the second facial skin image into a VGG network, to obtain the third loss of the first facial skin image on M target layers of the VGG network.
  30. The method according to claim 1, wherein the first generator comprises repair modules of 4 scales, namely a repair module of the 64*64 scale, a repair module of the 128*128 scale, a repair module of the 256*256 scale and a repair module of the 512*512 scale.
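An illustrative sketch only of preparing the four input scales of this claim, assuming PyTorch bilinear resizing from a 512*512 source:

    import torch.nn.functional as F

    def make_pyramid(image_512):
        # image_512: (B, 3, 512, 512); returns to-be-repaired images at 4 scales.
        return [F.interpolate(image_512, size=(s, s), mode='bilinear',
                              align_corners=False) for s in (64, 128, 256, 512)]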
  31. The method according to claim 6 or 25, wherein S is equal to 5, and the S key point mask images comprise key point mask images of the left eye, the right eye, the nose, the mouth and the facial contour.
  32. The method according to claim 2, 5, 6, 9, 18 or 24, wherein the network structure adopted by the repair modules is SRCNN or U-Net.
  33. An image processing method, comprising:
    receiving an input image;
    performing face detection on the input image to obtain a face image;
    processing the face image by using the method according to any one of claims 1 to 32 to obtain a first repaired image, wherein the definition of the first repaired image is higher than the definition of the input image;
    processing the input image, or the input image with the face image removed, to obtain a second repaired image, wherein the definition of the second repaired image is higher than the definition of the input image; and
    fusing the first repaired image and the second repaired image to obtain a fused image, the definition of the fused image being higher than the definition of the input image.
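An illustrative sketch only of the flow of claim 33, assuming NumPy images in (height, width, channel) layout; detect_face, repair_face and repair_background stand in for the face detector and the repair methods of claims 1 to 32, and the naive paste-back shown here replaces whatever blending an implementation would actually use for the fusion:

    def repair_pipeline(input_image, detect_face, repair_face, repair_background):
        x, y, w, h = detect_face(input_image)         # face detection
        face = input_image[y:y + h, x:x + w]
        first = repair_face(face)                     # first repaired image (same crop size)
        second = repair_background(input_image)       # second repaired image
        fused = second.copy()
        fused[y:y + h, x:x + w] = first               # naive fusion by paste-back
        return fused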
  34. The method according to claim 33, wherein processing the input image, or the input image with the face image removed, to obtain the second repaired image comprises:
    processing the input image, or the input image with the face image removed, by using the method according to any one of claims 1 to 32 to obtain the second repaired image.
  35. An image processing apparatus, comprising:
    a receiving module, configured to receive an input image; and
    a processing module, configured to process the input image by using a first generator to obtain an output image, wherein the definition of the output image is higher than the definition of the input image,
    wherein the first generator is obtained by training a to-be-trained generator with at least two discriminators.
  36. An image processing apparatus, comprising:
    a receiving module, configured to receive an input image;
    a face detection module, configured to perform face detection on the input image to obtain a face image;
    a first processing module, configured to process the face image by using the method according to any one of claims 1 to 32 to obtain a first repaired image, wherein the definition of the first repaired image is higher than the definition of the input image;
    a second processing module, configured to process the input image, or the input image with the face image removed, to obtain a second repaired image, wherein the definition of the second repaired image is higher than the definition of the input image; and
    a fusion module, configured to fuse the first repaired image and the second repaired image to obtain a fused image, the definition of the fused image being higher than the definition of the input image.
  37. An electronic device, comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the image processing method according to any one of claims 1 to 32, or the program or instructions, when executed by the processor, implement the steps of the image processing method according to claim 33 or 34.
  38. A readable storage medium, storing a program or instructions which, when executed by a processor, implement the steps of the image processing method according to any one of claims 1 to 32, or implement the steps of the image processing method according to claim 33 or 34.
PCT/CN2020/125463 2020-10-30 2020-10-30 Image processing method, image processing apparatus, electronic device and readable storage medium WO2022088089A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US17/425,715 US20230325973A1 (en) 2020-10-30 2020-10-30 Image processing method, image processing device, electronic device and computer-readable storage medium
CN202080002585.4A CN114698398A (zh) 2020-10-30 2020-10-30 Image processing method, image processing apparatus, electronic device and readable storage medium
PCT/CN2020/125463 WO2022088089A1 (zh) 2020-10-30 2020-10-30 Image processing method, image processing apparatus, electronic device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/125463 WO2022088089A1 (zh) 2020-10-30 2020-10-30 图像处理方法、图像处理装置、电子设备及可读存储介质

Publications (1)

Publication Number Publication Date
WO2022088089A1 true WO2022088089A1 (zh) 2022-05-05

Family

ID=81381798

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/125463 WO2022088089A1 (zh) 2020-10-30 2020-10-30 图像处理方法、图像处理装置、电子设备及可读存储介质

Country Status (3)

Country Link
US (1) US20230325973A1 (zh)
CN (1) CN114698398A (zh)
WO (1) WO2022088089A1 (zh)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122826A (zh) * 2017-05-08 2017-09-01 京东方科技集团股份有限公司 Processing method and system for convolutional neural network, and storage medium
CN107945118A (zh) * 2017-10-30 2018-04-20 南京邮电大学 Face image inpainting method based on generative adversarial network
US20190286950A1 (en) * 2018-03-16 2019-09-19 Ebay Inc. Generating a digital image using a generative adversarial network
CN109345455A (zh) * 2018-09-30 2019-02-15 京东方科技集团股份有限公司 Image discrimination method, discriminator and computer-readable storage medium
CN111507934A (zh) * 2019-01-30 2020-08-07 富士通株式会社 Training apparatus, training method and computer-readable recording medium
CN110033416A (zh) * 2019-04-08 2019-07-19 重庆邮电大学 Internet-of-Vehicles image restoration method combining multiple granularities
CN110222837A (zh) * 2019-04-28 2019-09-10 天津大学 ArcGAN network structure and method for CycleGAN-based picture training

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115660985A (zh) * 2022-10-25 2023-01-31 中山大学中山眼科中心 Repair method for cataract fundus images, and training method and apparatus for repair model

Also Published As

Publication number Publication date
CN114698398A (zh) 2022-07-01
US20230325973A1 (en) 2023-10-12


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20959251

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.08.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20959251

Country of ref document: EP

Kind code of ref document: A1