WO2023245927A1 - Image generator training method and apparatus, and electronic device and readable storage medium - Google Patents

Image generator training method and apparatus, and electronic device and readable storage medium

Info

Publication number
WO2023245927A1
WO2023245927A1 PCT/CN2022/125015 CN2022125015W WO2023245927A1 WO 2023245927 A1 WO2023245927 A1 WO 2023245927A1 CN 2022125015 W CN2022125015 W CN 2022125015W WO 2023245927 A1 WO2023245927 A1 WO 2023245927A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
sample face
face image
generator
discriminator
Application number
PCT/CN2022/125015
Other languages
French (fr)
Chinese (zh)
Inventor
葛国敬
王金桥
朱贵波
Original Assignee
中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Application filed by 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Publication of WO2023245927A1

Classifications

    • G06T5/77
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • the present application relates to the field of image processing technology, and in particular to a training method, device, electronic device and readable storage medium for an image generator.
  • Image repair technology is a technology that repairs the lost information or detailed information in the image to be repaired based on the known information of the image and preset repair rules to achieve visually realistic effects.
  • Blind image repair technology refers to the technology of repairing the image to be repaired without knowing in advance the image loss type or image degradation type of the image to be repaired.
  • In the prior art, convolutional neural network (Convolutional Neural Networks, CNN) technology is used to implement the blind image repair function.
  • This application provides a training method, device, electronic equipment and readable storage medium for an image generator, to solve the problems in the prior art that, when a convolutional neural network is used for blind image repair, ideal training results cannot be obtained in a single stage, the training task needs to be completed in two stages, the training process requires manual intervention, and the training path is cumbersome and complicated. The present application achieves an end-to-end blind image repair function in which the training process does not require manual intervention and the training path is relatively simple.
  • This application provides a training method for an image generator, which includes: acquiring an original sample face image and a degraded sample face image corresponding to the original sample face image; inputting the degraded sample face image into a pre-constructed image generator to obtain a repaired sample face image generated by the image generator, the image generator being constructed based on the Transformer model; optimizing a pre-constructed image discriminator based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator, the image discriminator being used to distinguish the original sample face image from the repaired sample face image; optimizing the image generator based on the original sample face image and the repaired sample face image to obtain an optimized image generator, the image generator and the image discriminator forming a generative adversarial network; and alternately repeating the above steps of optimizing the image discriminator and optimizing the image generator until a preset convergence condition is reached, stopping the optimization, and using the optimized image generator as a target image generator to perform blind image repair processing on a face image to be repaired.
  • The pre-constructed image discriminator is optimized based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator, including: inputting the original sample face image and the repaired sample face image into the image discriminator; obtaining a first image discrimination result corresponding to the original sample face image and a second image discrimination result corresponding to the repaired sample face image; obtaining a first loss function of the image discriminator based on the first image discrimination result and the second image discrimination result; and fixing the device parameters of the image generator and iterating along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator, so as to obtain the optimized image discriminator.
  • The image generator is optimized based on the original sample face image and the repaired sample face image to obtain an optimized image generator, and the method includes: obtaining the second image discrimination result obtained by inputting the repaired sample face image into the image discriminator; obtaining a second loss function of the image generator based on the original sample face image, the repaired sample face image and the second image discrimination result; and fixing the device parameters of the image discriminator and iterating along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator, so as to obtain the optimized image generator.
  • Obtaining the second loss function of the image generator based on the original sample face image, the repaired sample face image and the second image discrimination result includes: obtaining a content loss of the image generator based on the original sample face image and the repaired sample face image, where the content loss is used to measure the content difference between the repaired sample face image and the original sample face image; obtaining an ID loss of the image generator based on the original sample face image and the repaired sample face image, where the ID loss is used to measure the distance difference between the repaired sample face image and the original sample face image; obtaining the maximum probability that the second image discrimination result is true, and obtaining a generation loss of the image generator based on the maximum probability; and obtaining the second loss function of the image generator based on the content loss, the ID loss and the generation loss.
  • the image discriminator is a wavelet discriminator.
  • The wavelet discriminator includes a discrete wavelet transform module and a splicing convolution module, wherein: the discrete wavelet transform module is used to decompose the input image into feature images of multiple frequency scales; and the splicing convolution module is used to splice the feature images of multiple frequency scales and perform convolution processing on the spliced feature images to obtain a reconstructed image.
  • Inputting the degraded sample face image into a pre-constructed image generator to obtain a repaired sample face image generated by the image generator includes: inputting the degraded sample face image into an encoder of the image generator to obtain low-level semantic features and high-level semantic features; inputting the high-level semantic features into a feature conversion module of the image generator to obtain a style vector; and inputting the low-level semantic features, the high-level semantic features and the style vector into a decoder of the image generator to obtain the repaired sample face image.
  • This application also provides a training device for an image generator, including: a sample image acquisition module, used to acquire an original sample face image and a degraded sample face image corresponding to the original sample face image; a degraded image repair module, used to input the degraded sample face image into a pre-built image generator to obtain a repaired sample face image generated by the image generator, the image generator being built based on the Transformer model; a discriminator optimization module, used to optimize the pre-constructed image discriminator based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator, the image discriminator being used to distinguish the original sample face image from the repaired sample face image; a generator optimization module, configured to optimize the image generator based on the original sample face image and the repaired sample face image to obtain an optimized image generator, the image generator and the image discriminator forming a generative adversarial network; and a generator determination module, used to alternately repeat the above-mentioned steps of optimizing the image discriminator and optimizing the image generator until the preset convergence condition is reached, stop the optimization, and use the optimized image generator as the target image generator to perform blind image repair processing on the face image to be repaired.
  • The present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the training method of any of the above image generators.
  • The present application also provides a non-transitory computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the training method of any of the above image generators is implemented.
  • In the training method, device, electronic equipment and readable storage medium of the image generator, the image generator and the image discriminator are pre-constructed to form a generative adversarial network. The image generator, without knowing in advance the image loss type or image degradation type of the degraded sample face image, performs blind image repair processing on the degraded sample face image and tries to generate a repaired sample face image with a high image performance index, a high degree of restoration and a realistic appearance, while the image discriminator tries to identify the difference between the repaired sample face image generated by the image generator and the original sample face image, so that the image generator and the image discriminator are continuously optimized during the adversarial training process until the preset convergence condition is reached and the optimization is stopped. The training process does not require manual intervention and the training path is relatively simple, which overcomes the problems in the prior art that, when a convolutional neural network is used for blind image repair, ideal training results cannot be obtained in a single stage, the training task needs to be completed in two stages, the training process requires manual intervention, and the training path is cumbersome and complicated.
  • Figure 1 is one of the flow diagrams of the training method of the image generator provided by this application.
  • Figure 2 is the second schematic flow chart of the training method of the image generator provided by this application.
  • Figure 3 is the third schematic flow chart of the training method of the image generator provided by this application.
  • Figure 4 is the fourth schematic flowchart of the training method of the image generator provided by this application.
  • Figure 5 is the fifth schematic flow chart of the training method of the image generator provided by this application.
  • Figure 6 is a schematic structural diagram of the optimized training model of the image generator in the second embodiment of the present application.
  • Figure 7 is a schematic structural diagram of the training device of the image generator provided by this application.
  • Figure 8 is a schematic structural diagram of an electronic device provided by this application.
  • 100: Training device for image generator; 10: Sample image acquisition module; 20: Degraded image repair module; 30: Discriminator optimization module; 40: Generator optimization module; 50: Generator determination module; 810: Processor; 820: Communication interface; 830: Memory; 840: Communication bus.
  • this application provides a training method for an image generator, including:
  • Step S1 Obtain the original sample face image and the degraded sample face image corresponding to the original sample face image.
  • the original sample face image represents a sample face image with a relatively high image performance index (or image quality index).
  • Degraded sample face images represent sample face images with relatively low image performance indicators.
  • the original sample face image and the degraded sample face image constitute a sample face image pair, which is used to supervise the training of the image generator and image discriminator.
  • Step S2 Input the degraded sample face image into the pre-built image generator to obtain the repaired sample face image generated by the image generator; the image generator is built based on the Transformer model.
  • the Transformer model is a model built based on the idea of Attention, which is widely used in technical fields such as natural language processing, semantic relationship extraction, summary generation, named entity recognition, and machine translation.
  • Step S3 Based on the original sample face image and the repaired sample face image, optimize the pre-built image discriminator to obtain an optimized image discriminator; the image discriminator is used to distinguish the original sample face image and the repaired sample face image .
  • Step S4 Based on the original sample face image and the repaired sample face image, optimize the image generator to obtain an optimized image generator; the image generator and the image discriminator form a generative adversarial network.
  • Step S5 Alternately repeat the above steps of optimizing the image discriminator and optimizing the image generator until the preset convergence condition is reached, stop the optimization, and use the optimized image generator as the target image generator for the face image to be repaired Perform blind image repair processing.
  • The preset convergence condition may be a preset maximum number of iterations, a preset image performance index threshold, or another convergence condition, which is not specifically limited in this application. For example, when the preset convergence condition is the preset maximum number of iterations, it is determined whether the current iteration number reaches the preset maximum number of iterations; if so, the iteration is stopped; if not, the iteration continues until the current iteration number reaches the preset maximum number of iterations.
  • When the preset convergence condition is the preset image performance index threshold, the iteration is stopped once the image performance index of the repaired sample face image generated by the current image generator reaches the preset image performance index threshold.
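  • The alternating optimization of steps S1 to S5 can be summarized by the following minimal sketch, assuming PyTorch-style modules and a data loader; the names generator, discriminator, degrade, d_loss_fn and g_loss_fn are illustrative placeholders, not identifiers from this application.

```python
import torch

def train(generator, discriminator, loader, d_loss_fn, g_loss_fn,
          degrade, max_iters=100_000, lr=1e-4):
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr)
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr)
    step = 0
    for original in loader:                        # original sample face images (S1)
        degraded = degrade(original)               # degraded sample face images (S1)
        repaired = generator(degraded)             # repaired sample face images (S2)

        # S3: optimize the discriminator; the generator output is detached so
        # that only the discriminator's device parameters are updated
        opt_d.zero_grad()
        d_loss = d_loss_fn(discriminator(original), discriminator(repaired.detach()))
        d_loss.backward()
        opt_d.step()

        # S4: optimize the generator; only the generator's parameters are stepped
        opt_g.zero_grad()
        g_loss = g_loss_fn(original, repaired, discriminator(repaired))
        g_loss.backward()
        opt_g.step()

        # S5: stop when the preset convergence condition is reached
        step += 1
        if step >= max_iters:
            break
    return generator
```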
  • the image generator does not know the image loss type or image degradation type of the degraded sample face image in advance, and is used to perform image blind repair processing on the degraded sample face image to generate a repaired sample face image.
  • the image discriminator is used to determine whether the repaired sample face image generated by the image generator is consistent with the original sample face image.
  • an image generator and an image discriminator are constructed in advance to form a generative adversarial network.
  • The image generator, without knowing in advance the image loss type or image degradation type of the degraded sample face image, performs blind image repair processing on the degraded sample face image and tries to generate a repaired sample face image with a high image performance index, a high degree of restoration and a realistic appearance, while the image discriminator tries to identify the difference between the repaired sample face image generated by the image generator and the original sample face image. As a result, the image generator and the image discriminator are continuously optimized during the adversarial training process until the preset convergence condition is reached and the optimization is stopped, and the optimized image generator is used as the target image generator to perform blind image repair processing on the face image to be repaired, so as to obtain a high-quality target repaired face image and realize the end-to-end blind image repair function.
  • The training process does not require manual intervention and the training path is relatively simple, which overcomes the problems in the prior art that, when a convolutional neural network is used for blind image repair, ideal training results cannot be obtained in a single stage, the training task needs to be completed in two stages, the training process requires manual intervention, and the training path is cumbersome and complicated.
  • The training method of the image generator provided by this application also includes: performing an image degradation operation on the original sample face image to obtain the degraded sample face image corresponding to the original sample face image.
  • image degradation operations include but are not limited to blur operations, downsampling operations, Gaussian white noise addition operations, and JPEG compression operations.
  • the blur operation includes Gaussian blur operation and motion blur operation.
  • Downsampling operations include bicubic interpolation (Bicubic) downsampling operations, bilinear interpolation (Bilinear) downsampling operations, and Lanczos downsampling operations.
  • The Lanczos algorithm is a method that transforms a symmetric matrix into a symmetric tridiagonal matrix through orthogonal similarity transformations.
  • the noise adding operation includes the Gaussian white noise adding operation and the Poisson noise adding operation.
  • This embodiment does not use pre-prepared degraded sample face images, but performs online image degradation operations during the training process, which makes the types of degraded sample face images used during training richer, improves the adaptive image repair ability of the image generator when dealing with face images to be repaired whose image loss types are unknown, and improves the optimization training effect.
  • By setting an online image degradation operation, the original sample face image is subjected to online image degradation processing to obtain the degraded sample face image, which enriches the image loss types of the degraded sample face images, thereby improving the optimization training effect and the generalization performance of the target image generator, so that it can perform blind image repair processing on face images to be repaired with different image loss types.
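  • A minimal sketch of such an online degradation pipeline is shown below, assuming OpenCV/NumPy; the operation order, kernel sizes, noise levels and JPEG quality range are illustrative choices, not values taken from this application.

```python
import cv2
import numpy as np

def degrade(img: np.ndarray) -> np.ndarray:
    """Blur -> downsample -> additive Gaussian white noise -> JPEG compression."""
    h, w = img.shape[:2]
    # Gaussian blur
    img = cv2.GaussianBlur(img, (7, 7), sigmaX=2.0)
    # Bicubic downsampling followed by upsampling back to the original size
    scale = np.random.uniform(0.25, 0.5)
    small = cv2.resize(img, (int(w * scale), int(h * scale)), interpolation=cv2.INTER_CUBIC)
    img = cv2.resize(small, (w, h), interpolation=cv2.INTER_CUBIC)
    # Additive Gaussian white noise
    noise = np.random.normal(0, 10, img.shape)
    img = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    # JPEG compression round trip
    quality = int(np.random.uniform(30, 80))
    _, buf = cv2.imencode('.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)
```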
  • step S3 specifically includes steps S31 to step S34, wherein:
  • Step S31 Input the original sample face image and the repaired sample face image to the image discriminator.
  • Step S32 Obtain the first image discrimination result corresponding to the original sample face image, and obtain the second image discrimination result corresponding to the repaired sample face image.
  • the first image discrimination result represents the image discrimination result output by the image discriminator after the original sample face image is input to the image discriminator.
  • the second image discrimination result represents the image discrimination result output by the image discriminator after the repaired sample face image is input to the image discriminator.
  • Step S33 Based on the first image discrimination result and the second image discrimination result, obtain the first loss function of the image discriminator.
  • the loss function of the image discriminator can use the first loss function provided in the embodiment of this application, or other loss functions, and this application does not impose specific restrictions.
  • Step S34 Fix the device parameters of the image generator, and iterate along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator, and obtain an optimized image discriminator.
  • the device parameters of the image generator need to be fixed, that is, the device parameters of the image generator are kept fixed and only the device parameters of the image discriminator are iteratively updated.
  • In this way, the loss of the image discriminator can be minimized at the fastest iteration speed; that is, the optimization training task of the image discriminator can be completed with high quality and efficiency, which improves the optimization training efficiency of the image discriminator while further improving its optimization training effect.
  • A first distribution probability that the first image discrimination result is true and a second distribution probability that the second image discrimination result is false are obtained, and the first loss function of the image discriminator is obtained based on the first distribution probability and the second distribution probability.
  • the first distribution probability represents the distribution probability that the image discrimination result obtained by inputting the original sample face image to the image discriminator is expected to be true.
  • the second distribution probability represents the distribution probability that the image discrimination result obtained by inputting the repaired sample face image to the image discriminator is false.
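  • As a concrete illustration of steps S31 to S34, the sketch below assumes a standard binary adversarial loss built from the two discrimination results; the function name optimize_discriminator, the optimizer opt_d and the particular loss choice are illustrative assumptions, not taken from this application.

```python
import torch
import torch.nn.functional as F

def optimize_discriminator(discriminator, generator, original, degraded, opt_d):
    # Fix the generator's device parameters: compute the repaired image without
    # building a graph for the generator, so only the discriminator is updated.
    with torch.no_grad():
        repaired = generator(degraded)

    d_real = discriminator(original)   # first image discrimination result
    d_fake = discriminator(repaired)   # second image discrimination result

    # First loss: the original image should be judged true and the repaired
    # image false (a standard binary adversarial formulation).
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

    opt_d.zero_grad()
    loss_d.backward()   # iterate along the direction of gradient descent
    opt_d.step()
    return loss_d.item()
```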
  • step S4 specifically includes steps S41 to step S43, wherein:
  • Step S41 Obtain the repaired sample face image and input it into the image discriminator to obtain the second image discrimination result.
  • Step S42 Obtain the second loss function of the image generator based on the original sample face image, the repaired sample face image, and the second image discrimination result.
  • loss function of the image generator can use the second loss function provided in the embodiment of the present application, or other loss functions, which are not specifically limited by this application.
  • Step S43 Fix the device parameters of the image discriminator, and iterate along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator to obtain an optimized image generator.
  • Through the above steps S41 to S43, the second loss function of the image generator can be accurately calculated.
  • the second loss function is used as the objective function to iteratively optimize the device parameters of the image generator, which can improve the optimization training effect of the image generator.
  • Iterating along the direction of gradient descent of the second loss function can minimize the loss of the image generator at the fastest iteration speed; that is, the optimization training task of the image generator can be completed with high quality and efficiency, which improves the optimization training efficiency of the image generator while further improving its optimization training effect.
  • The image generator training method provided by this application uses fewer loss functions and training techniques than the existing method of using convolutional neural networks for blind image repair processing, so the training process is relatively simple and easy to implement.
  • step S42 specifically includes steps S421 to step S424, wherein:
  • Step S421 Obtain the content loss of the image generator based on the original sample face image and the repaired sample face image.
  • the content loss is used to measure the content difference between the repaired sample face image and the original sample face image.
  • Step S422 Obtain the ID loss of the image generator based on the original sample face image and the repaired sample face image.
  • the ID loss is used to measure the distance difference between the repaired sample face image and the original sample face image.
  • Step S423 Obtain the maximum probability that the second image discrimination result is true, and obtain the generation loss of the image generator based on the maximum probability.
  • the maximum probability represents the maximum probability that the second image discrimination result obtained when the repaired sample face image is input into the image discriminator is true.
  • Step S424 Obtain the second loss function of the image generator based on the content loss, ID loss and generation loss.
  • The above steps S421 to S424 can accurately calculate the content loss, ID loss and generation loss of the image generator in the process of generating the repaired sample face image. By combining the content loss, ID loss and generation loss to calculate the second loss function of the image generator, and then using the second loss function as the objective function to iteratively optimize the device parameters of the image generator, the optimization training effect of the image generator can be further improved.
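  • A minimal sketch of how the second loss function of steps S421 to S424 could be assembled is shown below, assuming an L1 content loss, a cosine-similarity ID loss computed with a face recognition network R, and a non-saturating generation loss; the weights and function names are illustrative assumptions, not values from this application.

```python
import torch
import torch.nn.functional as F

def second_loss(original, repaired, d_fake, R, weights=(1.0, 0.1, 0.1)):
    # Content loss: content difference between the repaired and original images.
    l_content = F.l1_loss(repaired, original)

    # ID loss: 1 minus the similarity of the two face recognition embeddings.
    similarity = F.cosine_similarity(R(original), R(repaired), dim=-1)
    l_id = (1.0 - similarity).mean()

    # Generation loss: push the discriminator output on the repaired image
    # towards "true" (non-saturating formulation).
    l_gan = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))

    w1, w2, w3 = weights
    return w1 * l_content + w2 * l_gan + w3 * l_id
```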
  • the image discriminator is a wavelet discriminator.
  • the wavelet discriminator provided in this embodiment is used to eliminate or weaken the block effect in the repair sample face image generated during the iterative training process of the image generator, so that the finally obtained target image generator has a better blind image repair effect, and further Improved the optimization training effect of image generator.
  • The blocking effect can be intuitively observed from the repaired sample face image generated by the image generator, and the image performance index parameters of the repaired sample face image can be obtained to determine whether a blocking effect exists in the repaired sample face image.
  • The image generator is optimized and trained through the wavelet discriminator provided in this embodiment, so that the repaired sample face image generated by the optimized image generator has no blocking effect or a weaker blocking effect.
  • the image discriminator also includes a spectral normalization (Spectral Normalization) stability constraint, which is used to improve the stability of the optimization training model to solve the problem of unstable training during the optimization training process.
  • The wavelet discriminator includes a discrete wavelet transform module and a concatenated convolution module, wherein: the discrete wavelet transform module is used to decompose the input image into feature images of multiple frequency scales; and the concatenated convolution module is used to splice the feature images of multiple frequency scales and perform convolution processing on the spliced feature images to obtain a reconstructed image.
  • The feature images at multiple frequency scales contain more image detail information than the input image. Since the discrete wavelet transform module has a good time-frequency localization capability, it is better able to retain image detail information. Therefore, the discrete wavelet transform can be used to recover image detail information that is lost in the input image but exists in the original image corresponding to the input image, so that the feature images of multiple frequency scales containing image detail information can be spliced and smoothed by convolution in the splicing convolution module to obtain a reconstructed image containing the image detail information. This increases the receptive field range of the image, thereby eliminating or weakening the blocking effect existing in the input image.
  • The wavelet discriminator provided in this embodiment can therefore use its discrete wavelet transform principle and splicing convolution principle to supervise and train the image generator to generate repaired sample face images with more image detail information, thereby enlarging the receptive field range of the repaired sample face image, eliminating or weakening the blocking effect existing in the repaired sample face image, improving the optimization training effect, and obtaining an image generator with better performance.
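  • A minimal sketch of the two wavelet-discriminator modules is given below, assuming a single-level Haar discrete wavelet transform; the channel counts and convolution layout are illustrative assumptions, not taken from this application.

```python
import torch
import torch.nn as nn

class HaarDWT(nn.Module):
    """Decompose the input into four half-resolution frequency sub-bands (LL, LH, HL, HH)."""
    def forward(self, x):
        a = x[:, :, 0::2, 0::2]
        b = x[:, :, 1::2, 0::2]
        c = x[:, :, 0::2, 1::2]
        d = x[:, :, 1::2, 1::2]
        ll = (a + b + c + d) / 2
        lh = (-a - b + c + d) / 2
        hl = (-a + b - c + d) / 2
        hh = (a - b - c + d) / 2
        return ll, lh, hl, hh

class WaveletBlock(nn.Module):
    """DWT module followed by the splicing (concat) + convolution module."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.dwt = HaarDWT()
        self.conv = nn.Conv2d(4 * in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        bands = self.dwt(x)                # feature images at multiple frequency scales
        spliced = torch.cat(bands, dim=1)  # splice along the channel dimension
        return self.conv(spliced)          # convolution smoothing -> reconstructed features
```

  • For example, a 1024×1024 input decomposed this way yields four 512×512 sub-band feature images, matching the DWT decomposition described later in this text.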
  • The self-attention of the Transformer requires global attention, which has the problem of excessive computation, so local attention is often used instead of global attention to solve this problem. However, replacing global attention with local attention reduces the receptive field range of the generated image, resulting in blocking effects in the generated repaired sample face images.
  • the wavelet discriminator provided in this embodiment can expand the range of the receptive field and achieve a better balance between calculation efficiency and image repair performance to solve the problem of block effects in repair sample face images while ensuring calculation efficiency.
  • the blind image repair effect of the target image generator is improved.
  • step S2 specifically includes steps S21 to step S23, wherein:
  • Step S21 Input the degraded sample face image into the encoder of the image generator to obtain low-level semantic features and high-level semantic features.
  • low-level semantic features include image contour features, edge features, color features, texture features and shape features.
  • High-level semantic features represent visual features visualized in images, such as faces, beaches and other features with rich semantic information.
  • the encoder is used to perform convolution operations, nonlinear operations, etc. on the degraded sample face image to obtain the low-level semantic features and high-level semantic features of the degraded sample face image.
  • Step S22 Input the high-level semantic features into the feature conversion module of the image generator to obtain the style vector.
  • the feature conversion module can also be called a mapping module.
  • Step S23 Input low-level semantic features, high-level semantic features and style vectors into the decoder of the image generator to obtain a repaired sample face image.
  • the encoder includes multiple coding modules.
  • Each coding module corresponds to a feature scale.
  • The coding module is used to extract a feature map corresponding to its own feature scale from the input image; the low-dimensional feature map and the high-dimensional feature map are sent to the decoder, and the high-dimensional feature map is also sent to the mapping module.
  • the low-dimensional feature map is the low-level semantic feature
  • the high-dimensional feature map is the high-level semantic feature.
  • The mapping module (i.e., the above-mentioned feature conversion module) includes multiple fully connected layers, which are used to receive the high-dimensional feature map sent by the encoding module and map the high-dimensional feature map into a style vector. The style vector includes multiple vector elements, and each vector element corresponds to a visual feature.
  • the decoder includes multiple cascaded decoding modules, each decoding module corresponding to a feature scale.
  • Each decoding module is used to obtain the low-dimensional feature map corresponding to its own feature scale, generate an image repair result based on the low-dimensional feature map corresponding to its own feature scale, the high-dimensional feature map, the style vector corresponding to the high-dimensional feature map and the previous-level input parameters, and output the image repair result as the next-level input parameters.
  • the upper-level input parameters represent the image repair results of the upper-level decoding module.
  • the input parameters of the upper level of the first layer decoding module are constants or Fourier features.
  • the last-level decoding module generates repaired sample face images based on the low-dimensional feature map corresponding to its own feature scale, the high-dimensional feature map, the style vector corresponding to the high-dimensional feature map, and the input parameters of the previous level.
  • the image repair results output by the upper-level decoding module are added to their corresponding relative position codes as the input parameters of the next-level decoding module.
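  • The overall encoder → mapping module → cascaded decoder data flow of steps S21 to S23 can be sketched as below; the class name and all module internals are illustrative placeholders (this application builds them on a Transformer backbone), so this is a sketch of the data flow only, not the actual implementation.

```python
import torch
import torch.nn as nn

class TransformerRepairGenerator(nn.Module):
    """Data flow only: encoder -> mapping module -> cascaded decoding modules."""
    def __init__(self, encoder, mapping, decoders, start_feature):
        super().__init__()
        self.encoder = encoder                    # multiple coding modules, one per feature scale
        self.mapping = mapping                    # multiple fully connected layers
        self.decoders = nn.ModuleList(decoders)   # cascaded decoding modules
        # Constant (or Fourier-feature) input of the first decoding module.
        self.start_feature = nn.Parameter(start_feature)

    def forward(self, degraded):
        low_feats, high_feat = self.encoder(degraded)   # low-/high-level semantic features
        style = self.mapping(high_feat)                 # style vector
        x = self.start_feature.expand(degraded.size(0), -1, -1, -1)
        for decoder, low in zip(self.decoders, low_feats):
            # Each decoding module uses its own-scale low-level feature, the
            # high-level feature, the style vector and the previous level's output.
            x = decoder(x, low, high_feat, style)
        return x                                        # repaired sample face image
```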
  • the training method of the image generator provided by this application includes the following steps:
  • Step 1 Obtain the original sample face image and the degraded sample face image corresponding to the original sample face image.
  • the degraded sample face image is input into the encoder of the image generator to obtain low-level semantic features and high-level semantic features.
  • the image generator is built based on the Transformer model.
  • the high-level semantic features are input into the feature transformation module of the image generator to obtain the style vector.
  • the low-level semantic features, high-level semantic features and style vectors are input into the decoder of the image generator to obtain the repaired sample face image.
  • Step 2 Input the original sample face image and the repaired sample face image to the image discriminator.
  • the image discriminator is used to distinguish the original sample face image and the repaired sample face image.
  • a first image discrimination result corresponding to the original sample face image is obtained, and a second image discrimination result corresponding to the repaired sample face image is obtained.
  • a first loss function of the image discriminator is obtained. Fix the device parameters of the image generator, and iterate along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator, and obtain the optimized image discriminator.
  • Step 3 Obtain the repaired sample face image and input it into the image discriminator to obtain the second image discrimination result.
  • the content loss of the image generator is obtained based on the original sample face image and the repaired sample face image.
  • the content loss is used to measure the content difference between the repaired sample face image and the original sample face image.
  • the ID loss of the image generator is obtained based on the original sample face image and the repaired sample face image.
  • the ID loss is used to measure the distance difference between the repaired sample face image and the original sample face image.
  • the maximum probability that the second image discrimination result is true is obtained, and the generation loss of the image generator is obtained based on the maximum probability.
  • Based on the content loss, ID loss and generation loss, obtain the second loss function of the image generator. Fix the device parameters of the image discriminator, and iterate along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator, to obtain an optimized image generator.
  • Step 4 Alternately repeat the above steps of optimizing the image discriminator and optimizing the image generator until the preset convergence conditions are reached, stop the optimization, and use the optimized image generator as the target image generator for the face image to be repaired Perform blind image repair processing.
  • FIG. 6 is a schematic structural diagram of the optimized training model of the image generator in the second specific embodiment of the present application. As shown in Figure 6, the second specific embodiment provided by the present application specifically includes the following steps:
  • Step (1) Obtain the original sample face image, and perform an online image degradation operation on the original sample face image to obtain a degraded sample face image corresponding to the original sample face image.
  • The image degradation operations include but are not limited to blur operations, downsampling operations, Gaussian white noise addition operations, and JPEG compression operations.
  • Step (2) Input the degraded sample face image into the encoder of the image generator to obtain low-level semantic features and high-level semantic features.
  • the image generator is built based on the Transformer model. Input the high-level semantic features into the feature conversion module (i.e., mapping module) of the image generator to obtain the style vector corresponding to the high-level semantic features.
  • the low-level semantic features, high-level semantic features and style vectors are input into the decoder of the image generator to obtain the repaired sample face image.
  • the encoder includes multiple encoding modules.
  • the mapping module includes multiple fully connected layers.
  • the decoder includes multiple decoding modules, and the number of decoding modules is equal to the number of encoding modules.
  • the decoding module can be composed of AdaIN and double attention layer (Double Attn), or it can be composed of AdaIN and multi-layer perceptron layer (MLP).
  • the input and output of the decoding module use residual connections.
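  • A minimal sketch of the AdaIN + multi-layer perceptron variant of the decoding module, with a residual connection between input and output, is given below; the 1×1-convolution MLP and the dimensions are illustrative assumptions, and the double-attention (Double Attn) variant is not shown.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization driven by the style vector."""
    def __init__(self, style_dim, channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.affine = nn.Linear(style_dim, 2 * channels)

    def forward(self, x, style):
        gamma, beta = self.affine(style).chunk(2, dim=1)
        return gamma[:, :, None, None] * self.norm(x) + beta[:, :, None, None]

class DecodingModule(nn.Module):
    """AdaIN followed by an MLP, with a residual connection between input and output."""
    def __init__(self, channels, style_dim):
        super().__init__()
        self.adain = AdaIN(style_dim, channels)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, x, style):
        return x + self.mlp(self.adain(x, style))   # residual connection
```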
  • Step (3) Input the original sample face image and the repaired sample face image to the image discriminator, and the image discriminator is used to distinguish the original sample face image and the repaired sample face image.
  • a first image discrimination result corresponding to the original sample face image is obtained, and a second image discrimination result corresponding to the repaired sample face image is obtained.
  • the first distribution probability represents the distribution probability that the image discrimination result obtained by inputting the original sample face image to the image discriminator is expected to be true.
  • the second distribution probability represents the distribution probability that the image discrimination result obtained by inputting the repaired sample face image to the image discriminator is false.
  • The first loss function is shown in the following formula (1), wherein:
  • L_D represents the first loss function;
  • y represents the original sample face image;
  • P_y represents the distribution probability of the original sample face image;
  • D(y) represents the first image discrimination result corresponding to the original sample face image;
  • x represents the degraded sample face image;
  • P_x represents the distribution probability of the degraded sample face image;
  • G(x) represents the repaired sample face image corresponding to the degraded sample face image;
  • D(G(x)) represents the second image discrimination result corresponding to the repaired sample face image and indicates the second distribution probability corresponding to the repaired sample face image;
  • the remaining symbols represent the weight coefficient and the spectral normalization stability constraint, respectively; and
  • the two negative signs in the formula indicate the direction of gradient descent, controlling the value of the first loss function between (0, 1) for gradient descent.
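  • Formula (1) itself is not reproduced in this text. A plausible reconstruction from the variable definitions above, assuming the standard non-saturating discriminator objective (and leaving out the weight coefficient and the spectral normalization stability constraint, whose exact placement cannot be recovered from the text), is:

$$L_D = -\,\mathbb{E}_{y \sim P_y}\big[\log D(y)\big] - \mathbb{E}_{x \sim P_x}\big[\log\big(1 - D(G(x))\big)\big] \tag{1}$$

  • The two leading negative signs in this form correspond to the two negative signs mentioned above.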
  • the image discriminator consists of a wavelet discriminator and a spectral normalization stability constraint.
  • the wavelet discriminator includes a discrete wavelet transform module and a splicing convolution module.
  • The discrete wavelet transform module is the DWT discrete wavelet transform module, which is used to decompose the input image into feature images at multiple frequency scales.
  • the concatenated convolution module includes concat concatenation unit and conv convolution unit. The concat concatenation unit is used to concatenate feature images at multiple frequency scales.
  • the conv convolution unit performs convolution and smoothing processing on the spliced feature images to obtain the reconstructed image.
  • the DWT discrete wavelet transform module decomposes a 1024*1024 input image into four 512*512 feature images.
  • the concat splicing unit splices four 512*512 feature images.
  • the conv convolution unit will perform convolution and smoothing processing on the spliced feature images to obtain a 1024*1024 reconstructed image.
  • Step (4) Obtain the repaired sample face image and input it into the image discriminator to obtain the second image discrimination result.
  • the content loss of the image generator is obtained based on the original sample face image and the repaired sample face image.
  • the content loss is used to measure the content difference between the repaired sample face image and the original sample face image.
  • The L1 loss is used as the content loss of the image generator, where the calculation method of the content loss is as shown in formula (2), wherein:
  • L_1(x) represents the content loss of the image generator;
  • x represents the degraded sample face image;
  • y represents the original sample face image; and
  • G(x) represents the repaired sample face image.
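  • Formula (2) itself is not reproduced in this text; with the symbols defined above, the L1 content loss presumably takes the standard form:

$$L_1(x) = \big\| y - G(x) \big\|_1 \tag{2}$$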
  • the ID loss of the image generator is obtained based on the original sample face image and the repaired sample face image.
  • the ID loss is used to measure the distance difference between the repaired sample face image and the original sample face image.
  • The ID loss is calculated as shown in the following formula (3), wherein:
  • L_ID(x) represents the ID loss of the image generator;
  • R represents the face recognition network trained based on the preset face recognition algorithm;
  • R(y) represents the first face recognition result output when the original sample face image is input into the face recognition network;
  • R(G(x)) represents the second face recognition result output when the repaired sample face image is input into the face recognition network; and
  • ⟨R(y), R(G(x))⟩ indicates the similarity between the original sample face image and the repaired sample face image.
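  • Formula (3) itself is not reproduced in this text; from the definitions above, a plausible form is:

$$L_{ID}(x) = 1 - \big\langle R(y),\, R(G(x)) \big\rangle \tag{3}$$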
  • The above formula represents "1 minus the similarity between the original sample face image and the repaired sample face image". When generative adversarial training has just started, the similarity between the two is low; as the generative adversarial training continues, the similarity between the two gradually increases. Using "1 minus the similarity between the two" means that, as the generative adversarial training continues and the similarity gradually increases, the ID loss gradually decreases, achieving gradient descent of the ID loss. The maximum probability that the second image discrimination result is true is obtained, an unsaturated loss is obtained based on this maximized probability, and the unsaturated loss is used as the generation loss of the image generator, where the calculation method of the generation loss is as shown in formula (4):
  • L_gan(x) represents the generation loss of the image generator;
  • G(x) represents the repaired sample face image;
  • D(G(x)) represents the second image discrimination result obtained by inputting the repaired sample face image into the image discriminator; and
  • max log[D(G(x))] represents the maximum probability that the second image discrimination result is true.
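  • Formula (4) itself is not reproduced in this text; assuming the standard non-saturating (unsaturated) generator objective, a plausible form is:

$$L_{gan}(x) = -\,\mathbb{E}_{x \sim P_x}\big[\log D(G(x))\big] \tag{4}$$

  • Minimizing this expression is equivalent to maximizing log[D(G(x))], i.e., maximizing the probability that the second image discrimination result is true.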
  • When generative adversarial training has just started, the repaired sample face image generated by the image generator is easily recognized by the image discriminator, that is, D(G(x)) approaches 0. With the unsaturated loss, the gradient of the image generator's log[D(G(x))] does not tend to 0, which can provide a better gradient direction for updating the image generator's device parameters and improves the convergence speed of the iteration.
  • the second loss function of the image generator is obtained, where the calculation method of the second loss function is as shown in formula (5):
  • L_G represents the second loss function;
  • L_1(x) represents the content loss of the image generator;
  • L_gan(x) represents the generation loss of the image generator;
  • L_ID(x) represents the ID loss of the image generator;
  • λ_1 represents the first hyperparameter;
  • λ_2 represents the second hyperparameter; and
  • λ_3 represents the third hyperparameter.
  • Step (5) Alternately repeat the above-mentioned steps of optimizing the image discriminator and optimizing the image generator, and obtain the image performance index of the repaired sample face image generated by the current image generator. When the image performance index reaches the preset image performance index threshold, stop the optimization and use the optimized image generator as the target image generator to perform blind image repair processing on the face image to be repaired.
  • the training device for an image generator provided by this application is described below.
  • the training device for an image generator described below and the training method for an image generator described above may be referred to correspondingly.
  • This application provides an image generator training device 100, including a sample image acquisition module 10, a degraded image repair module 20, a discriminator optimization module 30, a generator optimization module 40 and a generator determination module 50, wherein:
  • the sample image acquisition module 10 is used to acquire the original sample face image and the degraded sample face image corresponding to the original sample face image.
  • the degraded image repair module 20 is used to input the degraded sample face image into a pre-built image generator to obtain a repaired sample face image generated by the image generator; the image generator is built based on the Transformer model.
  • the discriminator optimization module 30 is used to optimize the pre-built image discriminator based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator; the image discriminator is used to distinguish between the original sample face image and the repaired sample face image. Repair sample face image.
  • the generator optimization module 40 is used to optimize the image generator based on the original sample face image and the repaired sample face image to obtain an optimized image generator; the image generator and the image discriminator form a generative adversarial network.
  • the generator determination module 50 is configured to alternately repeat the above-mentioned steps of optimizing the image discriminator and optimizing the image generator until the preset convergence condition is reached, stopping the optimization, and using the optimized image generator as the target image generator to Perform blind image repair processing on the face image to be repaired.
  • the discriminator optimization module 30 includes a sample image input unit, a discrimination result acquisition unit, a first function acquisition unit and a discriminator optimization unit, where:
  • the sample image input unit is used to input the original sample face image and the repaired sample face image to the image discriminator.
  • the discrimination result acquisition unit is used to obtain the first image discrimination result corresponding to the original sample face image, and to obtain the second image discrimination result corresponding to the repaired sample face image.
  • the first function acquisition unit is configured to acquire the first loss function of the image discriminator based on the first image discrimination result and the second image discrimination result.
  • the discriminator optimization unit is used to fix the device parameters of the image generator and iterate along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator to obtain an optimized image discriminator.
  • the generator optimization module 40 includes a discriminant data acquisition unit, a second function acquisition unit and a generator optimization unit, wherein:
  • the discrimination data acquisition unit is used to acquire the second image discrimination result obtained by inputting the repaired sample face image into the image discriminator.
  • the second function acquisition unit is used to acquire the second loss function of the image generator based on the original sample face image, the repaired sample face image, and the second image discrimination result.
  • the generator optimization unit is used to fix the device parameters of the image discriminator and iterate along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator to obtain an optimized image generator.
  • The second function acquisition unit includes a content loss acquisition subunit, an ID loss acquisition subunit, a generation loss acquisition subunit, and a loss function acquisition subunit, wherein:
  • the content loss acquisition subunit is used to obtain the content loss of the image generator based on the original sample face image and the repaired sample face image.
  • the content loss is used to measure the content difference between the repaired sample face image and the original sample face image.
  • the ID loss acquisition subunit is used to obtain the ID loss of the image generator based on the original sample face image and the repaired sample face image.
  • the ID loss is used to measure the distance difference between the repaired sample face image and the original sample face image.
  • the generation loss acquisition subunit is used to obtain the maximum probability that the second image discrimination result is true, and obtain the generation loss of the image generator based on the maximum probability.
  • the loss function acquisition subunit is used to acquire the second loss function of the image generator based on content loss, ID loss and generation loss.
  • the image discriminator is a wavelet discriminator.
  • The wavelet discriminator includes a discrete wavelet transform module and a concatenated convolution module, wherein: the discrete wavelet transform module is used to decompose the input image into feature images of multiple frequency scales; and the concatenated convolution module is used to splice the feature images of multiple frequency scales and perform convolution processing on the spliced feature images to obtain a reconstructed image.
  • the degraded image repair module 20 includes a feature acquisition unit, a feature conversion unit and an image repair unit, where:
  • the feature acquisition unit is used to input the degraded sample face image into the encoder of the image generator to obtain low-level semantic features and high-level semantic features.
  • the feature conversion unit is used to input high-level semantic features into the feature conversion module of the image generator to obtain the style vector.
  • the image repair unit is used to input low-level semantic features, high-level semantic features and style vectors into the decoder of the image generator to obtain a repaired sample face image.
  • Figure 8 illustrates a schematic diagram of the physical structure of an electronic device.
  • the electronic device may include: a processor (processor) 810, a communication interface (Communications Interface) 820, a memory (memory) 830 and a communication bus 840.
  • the processor 810, the communication interface 820, and the memory 830 complete communication with each other through the communication bus 840.
  • the processor 810 can call the logic instructions in the memory 830 to execute the training method of the image generator.
  • The method includes: obtaining the original sample face image and the degraded sample face image corresponding to the original sample face image; inputting the degraded sample face image into the pre-built image generator to obtain the repaired sample face image generated by the image generator, the image generator being built based on the Transformer model; optimizing the pre-built image discriminator based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator, the image discriminator being used to distinguish the original sample face image from the repaired sample face image; optimizing the image generator based on the original sample face image and the repaired sample face image to obtain an optimized image generator, the image generator and the image discriminator forming a generative adversarial network; and alternately repeating the above steps of optimizing the image discriminator and optimizing the image generator until the preset convergence condition is reached, stopping the optimization, and using the optimized image generator as the target image generator to perform blind image repair processing on the face image to be repaired.
  • the above-mentioned logical instructions in the memory 830 can be implemented in the form of software functional units and can be stored in a computer-readable storage medium when sold or used as an independent product.
  • the technical solution of the present application is essentially or the part that contributes to the existing technology or the part of the technical solution can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium, including Several instructions are used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods of various embodiments of the present application.
  • The aforementioned storage media include: USB flash drive, mobile hard disk, Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disk, optical disk and other media that can store program code.
  • the present application also provides a non-transitory computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, it implements the training method of the image generator provided by each of the above methods.
  • the method includes: obtaining an original sample face image and a degraded sample face image corresponding to the original sample face image; inputting the degraded sample face image into a pre-built image generator to obtain a repaired sample face image generated by the image generator.
  • the image generator is built based on the Transformer model; based on the original sample face image and the repaired sample face image, the pre-built image discriminator is optimized to obtain an optimized image discriminator; the image discriminator is used to distinguish the original sample face image Face image and repaired sample face image; based on the original sample face image and repaired sample face image, the image generator is optimized to obtain an optimized image generator; the image generator and the image discriminator form a generative adversarial network; alternately Repeat the above steps of optimizing the image discriminator and optimizing the image generator until the preset convergence conditions are reached, stop the optimization, and use the optimized image generator as the target image generator to perform blind image repair of the face image to be repaired. deal with.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated.
  • The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement the solution without creative effort.
  • each embodiment can be implemented by software plus a necessary general hardware platform, and of course, it can also be implemented by hardware.
  • the computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disc, optical disk, etc., including a number of instructions to cause a computer device (which can be a personal computer, a server, or a network device, etc.) to execute various embodiments or methods of certain parts of the embodiments.

Abstract

An image generator training method and apparatus, and an electronic device and a readable storage medium, which relate to the technical field of image processing. The image generator training method comprises: acquiring an original facial sample image and a degraded facial sample image corresponding to the original facial sample image (S1); inputting the degraded facial sample image into an image generator to obtain a repaired facial sample image, wherein the image generator is constructed on the basis of a transformer model (S2); on the basis of the original facial sample image and the repaired facial sample image, optimizing an image discriminator (S3); on the basis of the original facial sample image and the repaired facial sample image, optimizing the image generator (S4); and repeating the steps of optimizing the image discriminator and the image generator until a preset convergence condition is met, such that blind image repair processing is performed, by means of the optimized image generator, on a facial image to be repaired, thereby achieving an end-to-end blind image repair function (S5).

Description

Training method, device, electronic device and readable storage medium for image generator
Cross-reference to related applications
This application claims priority to the Chinese patent application No. 202210715667.4, filed on June 23, 2022 and entitled "Training method, device, electronic device and readable storage medium for image generator", which is incorporated herein by reference in its entirety.
Technical field
The present application relates to the field of image processing technology, and in particular to a training method, device, electronic device and readable storage medium for an image generator.
Background
Image repair technology repairs the lost information or detail information in an image to be repaired based on the known information of the image and preset repair rules, so as to achieve a visually realistic effect. Blind image repair technology refers to repairing an image to be repaired without knowing in advance the image loss type or image degradation type of the image to be repaired.
In the prior art, Convolutional Neural Network (CNN) technology is used to implement the blind image repair function. However, with this method, ideal training results cannot be obtained in a single stage, so the training task has to be completed in two stages: in the first stage a generator is trained, and in the second stage the trained generator is embedded into the network structure of a deep learning segmentation network (Unet) for tuning, so that the image to be repaired is restored by the tuned generator. It can be seen that the blind image repair method provided in the prior art cannot obtain ideal training results through a single stage of training, but needs two stages to complete the training task; moreover, the training process requires manual intervention, and the training path is cumbersome and complicated.
Therefore, for the technical problems in the prior art that, when a convolutional neural network is used for blind image repair processing, ideal training results cannot be obtained in a single stage, the training task has to be completed in two stages, the training process requires manual intervention, and the training path is cumbersome and complicated, technicians in the relevant fields have not yet found an effective solution.
Summary of the invention
The present application provides a training method, device, electronic device and readable storage medium for an image generator, to overcome the defects in the prior art that, when a convolutional neural network is used for blind image repair processing, ideal training results cannot be obtained in a single stage, the training task has to be completed in two stages, the training process requires manual intervention, and the training path is cumbersome and complicated, thereby achieving an end-to-end blind image repair function with a training process that requires no manual intervention and a relatively simple training path.
The present application provides a training method for an image generator, including: acquiring an original sample face image and a degraded sample face image corresponding to the original sample face image; inputting the degraded sample face image into a pre-constructed image generator to obtain a repaired sample face image generated by the image generator, where the image generator is constructed based on the Transformer model; optimizing a pre-constructed image discriminator based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator, where the image discriminator is used to distinguish the original sample face image from the repaired sample face image; optimizing the image generator based on the original sample face image and the repaired sample face image to obtain an optimized image generator, where the image generator and the image discriminator form a generative adversarial network; and alternately repeating the above steps of optimizing the image discriminator and optimizing the image generator until a preset convergence condition is reached, stopping the optimization, and using the optimized image generator as the target image generator to perform blind image repair processing on a face image to be repaired.
According to a training method for an image generator provided by the present application, optimizing the pre-constructed image discriminator based on the original sample face image and the repaired sample face image to obtain the optimized image discriminator includes: inputting the original sample face image and the repaired sample face image into the image discriminator; obtaining a first image discrimination result corresponding to the original sample face image and a second image discrimination result corresponding to the repaired sample face image; obtaining a first loss function of the image discriminator based on the first image discrimination result and the second image discrimination result; and fixing the device parameters of the image generator and iterating along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator and obtain the optimized image discriminator.
According to a training method for an image generator provided by the present application, optimizing the image generator based on the original sample face image and the repaired sample face image to obtain the optimized image generator includes: obtaining the second image discrimination result obtained by inputting the repaired sample face image into the image discriminator; obtaining a second loss function of the image generator based on the original sample face image, the repaired sample face image and the second image discrimination result; and fixing the device parameters of the image discriminator and iterating along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator and obtain the optimized image generator.
According to a training method for an image generator provided by the present application, obtaining the second loss function of the image generator based on the original sample face image, the repaired sample face image and the second image discrimination result includes: obtaining a content loss of the image generator based on the original sample face image and the repaired sample face image, where the content loss is used to measure the content difference between the repaired sample face image and the original sample face image; obtaining an ID loss of the image generator based on the original sample face image and the repaired sample face image, where the ID loss is used to measure the distance difference between the repaired sample face image and the original sample face image; obtaining the maximized probability that the second image discrimination result is true, and obtaining a generation loss of the image generator based on the maximized probability; and obtaining the second loss function of the image generator based on the content loss, the ID loss and the generation loss.
According to a training method for an image generator provided by the present application, the image discriminator is a wavelet discriminator.
According to a training method for an image generator provided by the present application, the wavelet discriminator includes a discrete wavelet transform module and a concatenation-convolution module, where the discrete wavelet transform module is used to decompose an input image into feature images at multiple frequency scales, and the concatenation-convolution module is used to concatenate the feature images at the multiple frequency scales and perform convolution processing on the concatenated feature images to obtain a reconstructed image.
According to a training method for an image generator provided by the present application, inputting the degraded sample face image into the pre-constructed image generator to obtain the repaired sample face image generated by the image generator includes: inputting the degraded sample face image into an encoder of the image generator to obtain low-level semantic features and high-level semantic features; inputting the high-level semantic features into a feature conversion module of the image generator to obtain a style vector; and inputting the low-level semantic features, the high-level semantic features and the style vector into a decoder of the image generator to obtain the repaired sample face image.
The present application also provides a training device for an image generator, including: a sample image acquisition module, configured to acquire an original sample face image and a degraded sample face image corresponding to the original sample face image; a degraded image repair module, configured to input the degraded sample face image into a pre-constructed image generator to obtain a repaired sample face image generated by the image generator, where the image generator is constructed based on the Transformer model; a discriminator optimization module, configured to optimize a pre-constructed image discriminator based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator, where the image discriminator is used to distinguish the original sample face image from the repaired sample face image; a generator optimization module, configured to optimize the image generator based on the original sample face image and the repaired sample face image to obtain an optimized image generator, where the image generator and the image discriminator form a generative adversarial network; and a generator determination module, configured to alternately repeat the above steps of optimizing the image discriminator and optimizing the image generator until a preset convergence condition is reached, stop the optimization, and use the optimized image generator as the target image generator to perform blind image repair processing on a face image to be repaired.
The present application also provides an electronic device, including a memory, a processor and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the training method for an image generator according to any one of the above.
The present application also provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the training method for an image generator according to any one of the above.
With the training method, device, electronic device and readable storage medium for an image generator provided by the present application, an image generator and an image discriminator are constructed in advance to form a generative adversarial network. During the repeated optimization process, the image generator, without knowing in advance the image loss type or image degradation type of the degraded sample face image, performs blind image repair processing on the degraded sample face image and tries to generate repaired sample face images that are realistic, highly faithful to the original and of high image performance; meanwhile, the image discriminator tries to distinguish the repaired sample face images generated by the image generator from the original sample face images. The image generator and the image discriminator are thus continuously optimized during adversarial training until a preset convergence condition is reached, at which point the optimization stops and the optimized image generator is used as the target image generator to perform blind image repair processing on a face image to be repaired, so that a high-quality repaired face image can be obtained. This achieves an end-to-end blind image repair function, requires no manual intervention during training, and keeps the training path relatively simple, overcoming the defects in the prior art that, when a convolutional neural network is used for blind image repair processing, ideal training results cannot be obtained in a single stage, the training task has to be completed in two stages, the training process requires manual intervention, and the training path is cumbersome and complicated.
Description of the drawings
In order to explain the technical solutions in the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description illustrate some embodiments of the present application, and for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Figure 1 is the first schematic flowchart of the training method for an image generator provided by the present application;
Figure 2 is the second schematic flowchart of the training method for an image generator provided by the present application;
Figure 3 is the third schematic flowchart of the training method for an image generator provided by the present application;
Figure 4 is the fourth schematic flowchart of the training method for an image generator provided by the present application;
Figure 5 is the fifth schematic flowchart of the training method for an image generator provided by the present application;
Figure 6 is a schematic structural diagram of the optimization training model of the image generator in specific embodiment 2 of the present application;
Figure 7 is a schematic structural diagram of the training device for an image generator provided by the present application;
Figure 8 is a schematic structural diagram of the electronic device provided by the present application.
Reference signs:
100: training device for an image generator; 10: sample image acquisition module; 20: degraded image repair module; 30: discriminator optimization module; 40: generator optimization module; 50: generator determination module; 810: processor; 820: communication interface; 830: memory; 840: communication bus.
Detailed description of the embodiments
In order to make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the present application are clearly and completely described below with reference to the drawings of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
The training method for an image generator provided by the present application is described below with reference to Figures 1 to 5. As shown in Figure 1, the present application provides a training method for an image generator, including:
Step S1: obtaining an original sample face image and a degraded sample face image corresponding to the original sample face image.
Here, the original sample face image is a sample face image with a relatively high image performance index (also called an image quality index), and the degraded sample face image is a sample face image with a relatively low image performance index. The original sample face image and the degraded sample face image form a sample face image pair used to supervise the training of the image generator and the image discriminator.
Step S2: inputting the degraded sample face image into a pre-constructed image generator to obtain a repaired sample face image generated by the image generator, where the image generator is constructed based on the Transformer model.
The Transformer model is a model built on the attention mechanism and is widely used in technical fields such as natural language processing, semantic relation extraction, summary generation, named entity recognition and machine translation.
Step S3: optimizing a pre-constructed image discriminator based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator, where the image discriminator is used to distinguish the original sample face image from the repaired sample face image.
Step S4: optimizing the image generator based on the original sample face image and the repaired sample face image to obtain an optimized image generator, where the image generator and the image discriminator form a generative adversarial network.
Step S5: alternately repeating the above steps of optimizing the image discriminator and optimizing the image generator until a preset convergence condition is reached, stopping the optimization, and using the optimized image generator as the target image generator to perform blind image repair processing on a face image to be repaired.
The preset convergence condition may be a preset maximum number of iterations, a preset image performance index threshold, or another convergence condition, which is not specifically limited in the present application. For example, when the preset convergence condition is a preset maximum number of iterations, it is judged whether the current number of iterations has reached the preset maximum number of iterations; if so, the iteration stops, and if not, the iteration continues until the current number of iterations reaches the preset maximum number of iterations. Similarly, when the preset convergence condition is a preset image performance index threshold, it is judged whether the image performance index of the repaired sample face image has reached the preset image performance index threshold, and whether to stop the iteration is determined according to the judgment result.
The image generator does not know in advance the image loss type or image degradation type of the degraded sample face image and is used to perform blind image repair processing on the degraded sample face image to generate the repaired sample face image. The image discriminator is used to judge whether the repaired sample face image generated by the image generator is consistent with the original sample face image.
Through the above steps S1 to S5, an image generator and an image discriminator are constructed in advance to form a generative adversarial network. During the repeated optimization process, the image generator, without knowing in advance the image loss type or image degradation type of the degraded sample face image, performs blind image repair processing on the degraded sample face image and tries to generate repaired sample face images that are realistic, highly faithful to the original and of high image performance, while the image discriminator tries to distinguish the repaired sample face images generated by the image generator from the original sample faces. The image generator and the image discriminator are thus continuously optimized during adversarial training until a preset convergence condition is reached, at which point the optimization stops and the optimized image generator is used as the target image generator to perform blind image repair processing on a face image to be repaired, so that a high-quality repaired face image can be obtained. This achieves an end-to-end blind image repair function, requires no manual intervention during training, and keeps the training path relatively simple, overcoming the defects in the prior art that, when a convolutional neural network is used for blind image repair processing, ideal training results cannot be obtained in a single stage, the training task has to be completed in two stages, the training process requires manual intervention, and the training path is cumbersome and complicated.
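For illustration only, the alternating optimization of steps S1 to S5 can be sketched as the training loop below. This is a minimal sketch assuming a PyTorch-style setup; the names generator, discriminator, degrade, discriminator_loss and generator_loss are placeholders introduced here and are not part of the original disclosure.

```python
import torch

def train(generator, discriminator, dataloader, degrade,
          discriminator_loss, generator_loss, max_iters=100_000, device="cuda"):
    """Alternately optimize the discriminator and the generator (steps S1-S5)."""
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))

    step = 0
    while step < max_iters:                       # preset convergence condition: max iterations
        for original in dataloader:               # original sample face images (S1)
            original = original.to(device)
            degraded = degrade(original)          # online degradation (see the embodiment below)

            repaired = generator(degraded)        # repaired sample face image (S2)

            # --- optimize the discriminator with the generator's parameters fixed (S3) ---
            loss_d = discriminator_loss(discriminator, original, repaired.detach())
            opt_d.zero_grad()
            loss_d.backward()                     # iterate along the gradient-descent direction
            opt_d.step()

            # --- optimize the generator with the discriminator's parameters fixed (S4) ---
            loss_g = generator_loss(discriminator, original, repaired)
            opt_g.zero_grad()
            loss_g.backward()
            opt_g.step()

            step += 1
            if step >= max_iters:                 # stop optimization (S5)
                break
    return generator                              # target image generator
```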
In an embodiment, before step S1, the training method for an image generator provided by the present application further includes: performing an image degradation operation on the original sample face image to obtain the degraded sample face image corresponding to the original sample face image, where the image degradation operation includes, but is not limited to, a blur operation, a downsampling operation, a Gaussian white noise operation and a JPEG compression operation.
Optionally, the blur operation includes a Gaussian blur operation and a motion blur operation. The downsampling operation includes a bicubic interpolation (Bicubic) downsampling operation, a bilinear interpolation (Bilinear) downsampling operation and a Lanczos downsampling operation, where the Lanczos algorithm is an algorithm that transforms a symmetric matrix into a symmetric tridiagonal matrix through orthogonal similarity transformations. The noise adding operation includes a Gaussian white noise adding operation and a Poisson noise adding operation.
It should be noted that this embodiment does not use degraded sample face images prepared in advance; instead, the image degradation operation is performed online during training, which enriches the types of degraded sample face images used in the training process, improves the adaptive image repair capability of the image generator when facing face images to be repaired with unknown image loss types, and improves the optimization training effect.
In this embodiment, by performing the online image degradation operation on the original sample face image to obtain the degraded sample face image, the image loss types of the degraded sample face images are enriched, which improves the effect of the optimization training and the generalization performance of the target image generator, so that it can perform blind image repair processing on face images to be repaired with different image loss types.
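As an illustration of the online degradation described above, the sketch below composes a random degradation from blur, downsampling with a randomly chosen kernel, additive Gaussian white noise and JPEG compression, operating on a single PIL image for simplicity. The kernel choices, noise levels, scale factors and JPEG quality range are assumptions and not values taken from the original disclosure.

```python
import io
import random

import numpy as np
from PIL import Image, ImageFilter

def degrade(original: Image.Image) -> Image.Image:
    """Apply one random online degradation to an original sample face image."""
    img = original.copy()
    w, h = img.size

    # blur operation (Gaussian blur; motion blur omitted for brevity)
    img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.5, 3.0)))

    # downsampling with a randomly chosen kernel, then resize back to the original size
    scale = random.choice([2, 4])
    resample = random.choice([Image.BICUBIC, Image.BILINEAR, Image.LANCZOS])
    img = img.resize((w // scale, h // scale), resample).resize((w, h), Image.BICUBIC)

    # additive Gaussian white noise
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0.0, random.uniform(1.0, 15.0), arr.shape)
    img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

    # JPEG compression at a random quality
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=random.randint(30, 90))
    return Image.open(buf).convert("RGB")
```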
In an embodiment, as shown in Figure 2, the above step S3 specifically includes steps S31 to S34, where:
Step S31: inputting the original sample face image and the repaired sample face image into the image discriminator.
Step S32: obtaining a first image discrimination result corresponding to the original sample face image, and obtaining a second image discrimination result corresponding to the repaired sample face image.
The first image discrimination result is the image discrimination result output by the image discriminator after the original sample face image is input into the image discriminator, and the second image discrimination result is the image discrimination result output by the image discriminator after the repaired sample face image is input into the image discriminator.
Step S33: obtaining a first loss function of the image discriminator based on the first image discrimination result and the second image discrimination result.
It should be noted that the loss function of the image discriminator may be the first loss function provided in the embodiments of the present application, or another loss function, which is not specifically limited in the present application.
Step S34: fixing the device parameters of the image generator and iterating along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator and obtain the optimized image discriminator.
It should be noted that, during the optimization of the image discriminator, the device parameters of the image generator need to be fixed, that is, the device parameters of the image generator remain unchanged and only the device parameters of the image discriminator are iteratively updated.
Through the above steps S31 to S34, the first image discrimination result obtained by inputting the original sample face image into the image discriminator serves as a reference and is combined with the second image discrimination result obtained by inputting the repaired sample face image, so that the first loss function of the image discriminator can be calculated accurately; the device parameters of the image discriminator are then iteratively optimized with the first loss function as the objective function, which improves the optimization training effect of the image discriminator. In addition, iterating along the direction of gradient descent of the first loss function minimizes the loss of the image discriminator at the fastest iteration speed, that is, the optimization training task of the image discriminator can be completed with high quality and high efficiency, further improving the optimization training effect of the image discriminator while improving its optimization training efficiency.
In an embodiment, a first distribution probability that the first image discrimination result is true and a second distribution probability that the second image discrimination result is false are obtained, and the first loss function of the image discriminator is determined based on the first distribution probability and the second distribution probability.
The first distribution probability is the expected distribution probability that the image discrimination result obtained by inputting the original sample face image into the image discriminator is true, and the second distribution probability is the expected distribution probability that the image discrimination result obtained by inputting the repaired sample face image into the image discriminator is false.
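A minimal sketch of steps S31 to S34 is given below, assuming a binary cross-entropy form of the first loss function built from the two discrimination results (a concrete form of the first loss function is given as formula (1) in specific embodiment 2 below); all function and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def optimize_discriminator_step(discriminator, opt_d, original, repaired):
    """One iteration of steps S31-S34: update D while G's parameters stay fixed."""
    # S31/S32: discrimination results for the original and the repaired sample
    d_real = discriminator(original)            # first image discrimination result
    d_fake = discriminator(repaired.detach())   # second result; detach() keeps G fixed

    # S33: first loss function -- D should judge the original as true (1)
    # and the repaired sample as false (0)
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

    # S34: iterate along the gradient-descent direction of the first loss function
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    return loss_d.item()
```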
In an embodiment, as shown in Figure 3, the above step S4 specifically includes steps S41 to S43, where:
Step S41: obtaining the second image discrimination result obtained by inputting the repaired sample face image into the image discriminator.
Step S42: obtaining a second loss function of the image generator based on the original sample face image, the repaired sample face image and the second image discrimination result.
It should be noted that the loss function of the image generator may be the second loss function provided in the embodiments of the present application, or another loss function, which is not specifically limited in the present application.
Step S43: fixing the device parameters of the image discriminator and iterating along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator and obtain the optimized image generator.
Similarly, during the optimization of the image generator, the device parameters of the image discriminator need to be fixed, that is, the device parameters of the image discriminator remain unchanged and only the device parameters of the image generator are iteratively updated.
Through the above steps S41 to S43, by combining the original sample face image, the repaired sample face image and the second image discrimination result obtained by inputting the repaired sample face image into the image discriminator, the second loss function of the image generator can be calculated accurately; the device parameters of the image generator are then iteratively optimized with the second loss function as the objective function, which improves the optimization training effect of the image generator. In addition, iterating along the direction of gradient descent of the second loss function minimizes the loss of the image generator at the fastest iteration speed, that is, the optimization training task of the image generator can be completed with high quality and high efficiency, further improving the optimization training effect of the image generator while improving its optimization training efficiency. Furthermore, compared with the prior art method of using a convolutional neural network for blind image repair processing, the training method for an image generator provided by the present application uses fewer loss functions and training techniques, so the training process is simpler and easier to implement.
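Correspondingly, steps S41 to S43 can be sketched as a single generator update performed while the discriminator stays fixed; the second loss function is supplied as a callable here and is detailed in the sketch after steps S421 to S424. The names are illustrative and not part of the original disclosure.

```python
import torch

def optimize_generator_step(generator, discriminator, opt_g,
                            original, degraded, second_loss):
    """One iteration of steps S41-S43: update G while D's parameters stay fixed."""
    repaired = generator(degraded)              # repaired sample face image
    d_fake = discriminator(repaired)            # S41: second image discrimination result

    # S42: second loss function from the original image, the repaired image
    # and the second discrimination result
    loss_g = second_loss(original, repaired, d_fake)

    # S43: only the generator's parameters are held by opt_g, so the discriminator's
    # parameters stay fixed even though gradients flow through it
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_g.item()
```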
In an embodiment, as shown in Figure 4, the above step S42 specifically includes steps S421 to S424, where:
Step S421: obtaining a content loss of the image generator based on the original sample face image and the repaired sample face image, where the content loss is used to measure the content difference between the repaired sample face image and the original sample face image.
Step S422: obtaining an ID loss of the image generator based on the original sample face image and the repaired sample face image, where the ID loss is used to measure the distance difference between the repaired sample face image and the original sample face image.
Step S423: obtaining the maximized probability that the second image discrimination result is true, and obtaining a generation loss of the image generator based on the maximized probability.
The maximized probability is the expected maximized probability that the second image discrimination result obtained by inputting the repaired sample face image into the image discriminator is true.
Step S424: obtaining the second loss function of the image generator based on the content loss, the ID loss and the generation loss.
Through the above steps S421 to S424, the content loss, ID loss and generation loss of the image generator in the process of generating the repaired sample face image are calculated separately, and by combining them the second loss function of the image generator can be calculated accurately; iteratively optimizing the device parameters of the image generator with this second loss function as the objective function can further improve the optimization training effect of the image generator.
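As an illustration of steps S421 to S424, the sketch below combines an L1 content loss, an ID loss computed as a cosine distance between face-identity embeddings, and a non-saturating adversarial generation loss. The choice of an embedding network, the cosine-distance form of the ID loss and the loss weights are assumptions rather than values from the original disclosure.

```python
import torch
import torch.nn.functional as F

def second_loss(original, repaired, d_fake, id_encoder,
                w_content=1.0, w_id=0.1, w_gen=0.01):
    """Second loss function of the generator (steps S421-S424)."""
    # S421: content loss -- content difference between repaired and original images
    content_loss = F.l1_loss(repaired, original)

    # S422: ID loss -- distance between face-identity embeddings of the two images
    with torch.no_grad():
        id_real = F.normalize(id_encoder(original), dim=1)
    id_fake = F.normalize(id_encoder(repaired), dim=1)
    id_loss = (1.0 - (id_real * id_fake).sum(dim=1)).mean()   # cosine distance

    # S423: generation loss -- push the second discrimination result towards "true"
    gen_loss = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))

    # S424: weighted combination of the three terms
    return w_content * content_loss + w_id * id_loss + w_gen * gen_loss
```

In practice the identity encoder and the weights would be bound first (for example with functools.partial) so that the resulting three-argument callable matches the second_loss used in the previous sketch.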
In an embodiment, the image discriminator is a wavelet discriminator. The wavelet discriminator provided in this embodiment is used to eliminate or weaken the blocking artifacts in the repaired sample face images generated during the iterative training of the image generator, so that the finally obtained target image generator has a better blind image repair effect, further improving the optimization training effect of the image generator.
It should be noted that blocking artifacts can be observed directly in the repaired sample face image generated by the image generator, and the image performance index parameters of the repaired sample face image can be obtained to judge whether blocking artifacts exist in it. Compared with other image discriminators, using the wavelet discriminator provided in this embodiment to optimize and train the image generator results in repaired sample face images, generated by the optimized image generator, that have no or fewer blocking artifacts.
In an embodiment, the image discriminator also includes a spectral normalization stability constraint, which is used to improve the stability of the optimization training model and solve the problem of unstable training during the optimization training process.
In an embodiment, the wavelet discriminator includes a discrete wavelet transform module and a concatenation-convolution module, where the discrete wavelet transform module is used to decompose an input image into feature images at multiple frequency scales, and the concatenation-convolution module is used to concatenate the feature images at the multiple frequency scales and perform convolution processing on the concatenated feature images to obtain a reconstructed image.
It should be noted that feature images at multiple frequency scales contain more image detail information than the input image. Since the discrete wavelet transform module has a good time-frequency localization capability, that is, a better ability to retain image detail information, the discrete wavelet transform can recover image detail information that is lost in the input image but present in the original image corresponding to the input image. The concatenation-convolution module then concatenates the feature images at multiple frequency scales containing this detail information and applies convolution smoothing, yielding a reconstructed image that contains image detail information and enlarging the receptive field of the image, thereby eliminating or weakening the blocking artifacts present in the input image.
The wavelet discriminator provided in this embodiment can use its discrete wavelet transform principle and concatenation-convolution principle to supervise and train the image generator to generate repaired sample face images with more image detail information and a larger receptive field, thereby eliminating or weakening the blocking artifacts in the repaired sample face images, improving the optimization training effect and yielding an image generator with better performance.
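A minimal sketch of the wavelet discriminator's input stage is given below, assuming a single-level Haar discrete wavelet transform (the original disclosure does not fix the wavelet basis): the input image is decomposed into four half-resolution frequency sub-bands, the sub-bands are concatenated along the channel dimension, and a spectrally normalized convolution smooths the concatenated features; for example, a 3x1024x1024 input becomes a 12x512x512 tensor after decomposition and concatenation.

```python
import torch
import torch.nn as nn

def haar_dwt(x: torch.Tensor) -> torch.Tensor:
    """Single-level Haar DWT: (B, C, H, W) -> (B, 4C, H/2, W/2) sub-bands LL, LH, HL, HH."""
    a = x[:, :, 0::2, 0::2]   # even rows, even cols
    b = x[:, :, 0::2, 1::2]   # even rows, odd cols
    c = x[:, :, 1::2, 0::2]   # odd rows, even cols
    d = x[:, :, 1::2, 1::2]   # odd rows, odd cols
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return torch.cat([ll, lh, hl, hh], dim=1)    # concatenate the frequency sub-bands

class WaveletDiscriminatorStem(nn.Module):
    """DWT decomposition + concatenation + spectrally normalized convolution."""
    def __init__(self, in_channels=3, out_channels=64):
        super().__init__()
        self.conv = nn.utils.spectral_norm(       # spectral normalization stability constraint
            nn.Conv2d(4 * in_channels, out_channels, kernel_size=3, padding=1))

    def forward(self, x):
        return self.conv(haar_dwt(x))             # e.g. 3x1024x1024 -> 12x512x512 -> 64x512x512
```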
It should be further explained that, for the image generator constructed based on the Transformer model, the self-attention mechanism would need to perform global attention during image repair and generation, but global attention is computationally too expensive, so local attention is used instead of global attention to solve the problem of excessive computation. However, using local attention instead of global attention reduces the receptive field of the generated image, which leads to blocking artifacts in the generated repaired sample face images. The wavelet discriminator provided in this embodiment can enlarge the receptive field and strike a better balance between computational efficiency and image repair performance, solving the problem of blocking artifacts in the repaired sample face images and improving the blind image repair effect of the target image generator while ensuring computational efficiency.
In an embodiment, as shown in Figure 5, the above step S2 specifically includes steps S21 to S23, where:
Step S21: inputting the degraded sample face image into the encoder of the image generator to obtain low-level semantic features and high-level semantic features.
The low-level semantic features include contour features, edge features, color features, texture features and shape features of the image. The high-level semantic features are visual features that can be visualized in the image, such as faces, beaches and other features rich in semantic information.
Further, the encoder is used to perform convolution operations, nonlinear operations and the like on the degraded sample face image to obtain the low-level semantic features and high-level semantic features of the degraded sample face image.
Step S22: inputting the high-level semantic features into the feature conversion module of the image generator to obtain a style vector. The feature conversion module may also be called a mapping module.
Step S23: inputting the low-level semantic features, the high-level semantic features and the style vector into the decoder of the image generator to obtain the repaired sample face image.
In an embodiment, the encoder includes multiple encoding modules, each corresponding to one feature scale. An encoding module is used to extract the feature map corresponding to its own feature scale from the input image, send the low-dimensional feature maps and the high-dimensional feature map to the decoder, and send the high-dimensional feature map to the mapping module, where the low-dimensional feature maps are the low-level semantic features and the high-dimensional feature map is the high-level semantic features.
In an embodiment, the mapping module (that is, the above feature conversion module) includes multiple fully connected layers, which are used to receive the high-dimensional feature map sent by the encoding module and map it into a style vector; the style vector includes multiple vector elements, each corresponding to one visual feature.
In an embodiment, the decoder includes multiple cascaded decoding modules, each corresponding to one feature scale. Each decoding module is used to obtain the low-dimensional feature map corresponding to its own feature scale, generate an image repair result based on that low-dimensional feature map, the high-dimensional feature map, the style vector corresponding to the high-dimensional feature map and the input parameter from the previous stage, and output the image repair result as the input parameter for the next stage.
It should be noted that the input parameter from the previous stage is the image repair result of the previous decoding module. The input parameter of the first decoding module is a constant or a Fourier feature. The last decoding module generates the repaired sample face image based on the low-dimensional feature map corresponding to its own feature scale, the high-dimensional feature map, the style vector corresponding to the high-dimensional feature map and the input parameter from the previous stage.
Further, the image repair result output by the previous decoding module is added to its corresponding relative position encoding and used as the input parameter of the next decoding module.
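The encoder-mapping-decoder data flow of steps S21 to S23 can be sketched as the forward pass below. The encoder blocks, the mapping MLP and the decoding modules are passed in as placeholders for the Transformer-based blocks described above; relative position encodings are omitted for brevity, and the learned constant used as the first decoding module's input, the dimensions and the module interfaces are assumptions rather than details from the original disclosure.

```python
import torch
import torch.nn as nn

class BlindFaceGenerator(nn.Module):
    """Encoder -> mapping (feature conversion) -> cascaded decoder, as in steps S21-S23."""
    def __init__(self, encoder_blocks, mapping_mlp, decoder_blocks,
                 start_size=4, start_dim=512):
        super().__init__()
        self.encoder = nn.ModuleList(encoder_blocks)   # one block per feature scale
        self.mapping = mapping_mlp                      # fully connected layers -> style vector
        self.decoder = nn.ModuleList(decoder_blocks)    # cascaded decoding modules
        # input parameter of the first decoding module: a learned constant
        self.start = nn.Parameter(torch.randn(1, start_dim, start_size, start_size))

    def forward(self, degraded):
        # S21: multi-scale features; the last one is the high-level semantic feature
        features, x = [], degraded
        for block in self.encoder:
            x = block(x)
            features.append(x)
        high_level = features[-1]

        # S22: map the high-level semantic feature to a style vector
        style = self.mapping(high_level.flatten(1))

        # S23: cascaded decoding; each stage takes the previous stage's result,
        # the skip feature at its scale and the style vector
        out = self.start.expand(degraded.size(0), -1, -1, -1)
        for block, skip in zip(self.decoder, reversed(features)):
            out = block(out, skip, style)               # image repair result of this stage
        return out                                       # repaired sample face image
```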
Two specific embodiments are provided below to further illustrate the training method for an image generator provided by the present application.
In specific embodiment 1, the training method for an image generator provided by the present application includes the following steps:
Step 1: obtaining an original sample face image and a degraded sample face image corresponding to the original sample face image; inputting the degraded sample face image into the encoder of the image generator, which is constructed based on the Transformer model, to obtain low-level semantic features and high-level semantic features; inputting the high-level semantic features into the feature conversion module of the image generator to obtain a style vector; and inputting the low-level semantic features, the high-level semantic features and the style vector into the decoder of the image generator to obtain a repaired sample face image.
Step 2: inputting the original sample face image and the repaired sample face image into the image discriminator, which is used to distinguish the original sample face image from the repaired sample face image; obtaining a first image discrimination result corresponding to the original sample face image and a second image discrimination result corresponding to the repaired sample face image; obtaining a first loss function of the image discriminator based on the first image discrimination result and the second image discrimination result; and fixing the device parameters of the image generator and iterating along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator and obtain the optimized image discriminator.
Step 3: obtaining the second image discrimination result obtained by inputting the repaired sample face image into the image discriminator; obtaining the content loss of the image generator based on the original sample face image and the repaired sample face image, where the content loss is used to measure the content difference between the repaired sample face image and the original sample face image; obtaining the ID loss of the image generator based on the original sample face image and the repaired sample face image, where the ID loss is used to measure the distance difference between the repaired sample face image and the original sample face image; obtaining the maximized probability that the second image discrimination result is true and obtaining the generation loss of the image generator based on the maximized probability; obtaining the second loss function of the image generator based on the content loss, the ID loss and the generation loss; and fixing the device parameters of the image discriminator and iterating along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator and obtain the optimized image generator.
Step 4: alternately repeating the above steps of optimizing the image discriminator and optimizing the image generator until the preset convergence condition is reached, stopping the optimization, and using the optimized image generator as the target image generator to perform blind image repair processing on the face image to be repaired.
Figure 6 is a schematic structural diagram of the optimization training model of the image generator in specific embodiment 2 of the present application. As shown in Figure 6, specific embodiment 2 provided by the present application includes the following steps:
Step (1): obtaining an original sample face image, and performing an online image degradation operation on the original sample face image to obtain the degraded sample face image corresponding to the original sample face image, where the image degradation operation includes, but is not limited to, a blur operation, a downsampling operation, a Gaussian white noise operation and a JPEG compression operation.
Step (2): inputting the degraded sample face image into the encoder of the image generator, which is constructed based on the Transformer model, to obtain low-level semantic features and high-level semantic features; inputting the high-level semantic features into the feature conversion module (that is, the mapping module) of the image generator to obtain the style vector corresponding to the high-level semantic features; and inputting the low-level semantic features, the high-level semantic features and the style vector into the decoder of the image generator to obtain the repaired sample face image. The encoder includes multiple encoding modules. The mapping module includes multiple fully connected layers. The decoder includes multiple decoding modules, and the number of decoding modules is equal to the number of encoding modules. A decoding module may consist of AdaIN and a double attention layer (Double Attn), or of AdaIN and a multi-layer perceptron layer (MLP). The input and output of a decoding module are connected by a residual connection.
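As an illustration of the decoding module mentioned in step (2), the sketch below shows an AdaIN layer that modulates the features with the style vector, followed by an MLP and a residual connection between the module's input and output; the Double Attn variant, the skip-feature input and all dimensions are omitted or assumed here.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: modulate normalized features with the style vector."""
    def __init__(self, channels, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.affine = nn.Linear(style_dim, 2 * channels)   # predicts per-channel scale and bias

    def forward(self, x, style):
        scale, bias = self.affine(style).chunk(2, dim=1)
        scale = scale.unsqueeze(-1).unsqueeze(-1)
        bias = bias.unsqueeze(-1).unsqueeze(-1)
        return (1 + scale) * self.norm(x) + bias

class DecoderBlock(nn.Module):
    """Decoding module sketch: AdaIN followed by an MLP, with a residual connection."""
    def __init__(self, channels, style_dim, expansion=4):
        super().__init__()
        self.adain = AdaIN(channels, style_dim)
        self.mlp = nn.Sequential(                      # the Double Attn variant is omitted here
            nn.Conv2d(channels, expansion * channels, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(expansion * channels, channels, kernel_size=1))

    def forward(self, x, style):
        return x + self.mlp(self.adain(x, style))      # residual connection between input and output
```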
Step (3): inputting the original sample face image and the repaired sample face image into the image discriminator, which is used to distinguish the original sample face image from the repaired sample face image; obtaining the first image discrimination result corresponding to the original sample face image and the second image discrimination result corresponding to the repaired sample face image; obtaining the first distribution probability that the first image discrimination result is true and the second distribution probability that the second image discrimination result is false; and determining the first loss function of the image discriminator based on the first distribution probability and the second distribution probability, where the first distribution probability is the expected distribution probability that the image discrimination result obtained by inputting the original sample face image into the image discriminator is true, and the second distribution probability is the expected distribution probability that the image discrimination result obtained by inputting the repaired sample face image into the image discriminator is false.
Specifically, the first loss function is given by formula (1):

L_D = -E_{y~P_y}[log D(y)] - E_{x~P_x}[log(1 - D(G(x)))] + γ·R_SN   (1)

where L_D denotes the first loss function; y denotes the original sample face image and P_y the distribution of original sample face images; D(y) denotes the first image discrimination result corresponding to the original sample face image, and E_{y~P_y}[log D(y)] denotes the first distribution probability corresponding to the original sample face image; x denotes the degraded sample face image and P_x the distribution of degraded sample face images; G(x) denotes the repaired sample face image corresponding to the degraded sample face image, D(G(x)) denotes the second image discrimination result corresponding to the repaired sample face image, and E_{x~P_x}[log(1 - D(G(x)))] denotes the second distribution probability corresponding to the repaired sample face image; γ denotes a weight coefficient and R_SN denotes the spectral normalization stability constraint. The two negative signs in the formula set the direction of gradient descent, so that the value of the first loss function is kept within (0, 1) during gradient descent.
Fix the device parameters of the image generator and iterate along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator, obtaining the optimized image discriminator. The image discriminator consists of a wavelet discriminator and a spectral normalization stability constraint. The wavelet discriminator includes a discrete wavelet transform module and a concatenation-convolution module, where the discrete wavelet transform (DWT) module decomposes the input image into feature images at multiple frequency scales, and the concatenation-convolution module includes a concat unit and a conv unit: the concat unit concatenates the feature images at the multiple frequency scales, and the conv unit applies convolutional smoothing to the concatenated feature images to obtain a reconstructed image. For example, the DWT module decomposes a 1024×1024 input image into four 512×512 feature images, the concat unit concatenates the four 512×512 feature images, and the conv unit applies convolutional smoothing to the concatenated feature images to obtain a 1024×1024 reconstructed image.
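As an illustration of the wavelet discriminator front end, the sketch below performs a one-level Haar DWT that splits a (B, C, H, W) input into four half-resolution sub-bands, concatenates them along the channel axis, and smooths them with a convolution. The single 3×3 convolution and the channel counts are assumptions for the example, not the exact module of this application.

```python
import torch
import torch.nn as nn

def haar_dwt(x: torch.Tensor) -> torch.Tensor:
    """One-level Haar DWT: (B, C, H, W) with even H, W -> (B, 4*C, H/2, W/2)."""
    a = x[:, :, 0::2, 0::2]
    b = x[:, :, 0::2, 1::2]
    c = x[:, :, 1::2, 0::2]
    d = x[:, :, 1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return torch.cat([ll, lh, hl, hh], dim=1)   # concat of the frequency-scale sub-bands

class WaveletStem(nn.Module):
    """DWT decomposition + concat + convolutional smoothing."""
    def __init__(self, in_ch: int = 3, out_ch: int = 64):
        super().__init__()
        self.smooth = nn.Conv2d(4 * in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # e.g. a 1024x1024 image yields four 512x512 sub-bands per channel
        return self.smooth(haar_dwt(img))
```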
Step (4): Obtain the second image discrimination result produced by inputting the repaired sample face image into the image discriminator. Obtain the content loss of the image generator based on the original sample face image and the repaired sample face image; the content loss measures the content difference between the repaired sample face image and the original sample face image, and the L_1 loss is used as the content loss of the image generator, computed as in formula (2):
L_1(x) = ||y - G(x)||_1   (2)
where L_1(x) denotes the content loss of the image generator, x denotes the degraded sample face image, y denotes the original sample face image, and G(x) denotes the repaired sample face image.
Obtain the ID loss of the image generator based on the original sample face image and the repaired sample face image; the ID loss measures the distance difference between the repaired sample face image and the original sample face image, and is computed as in formula (3):
L_ID(x) = 1 - <R(y), R(G(x))>   (3)
where L_ID(x) denotes the ID loss of the image generator, R denotes a face recognition network trained with a preset face recognition algorithm, R(y) denotes the first face recognition result output by the face recognition network for the original sample face image, R(G(x)) denotes the second face recognition result output by the face recognition network for the repaired sample face image, and <R(y), R(G(x))> denotes the similarity between the original sample face image and the repaired sample face image.
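A minimal sketch of formula (3), assuming R is any pretrained face recognition network that outputs one embedding per image and that the inner product is taken between L2-normalized embeddings (i.e. cosine similarity):

```python
import torch
import torch.nn.functional as F

def id_loss(R, y: torch.Tensor, restored: torch.Tensor) -> torch.Tensor:
    """1 - <R(y), R(G(x))> with L2-normalized recognition embeddings."""
    emb_real = F.normalize(R(y), dim=1)
    emb_fake = F.normalize(R(restored), dim=1)
    cosine = (emb_real * emb_fake).sum(dim=1)   # <R(y), R(G(x))>
    return (1.0 - cosine).mean()
```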
It should be noted that the above formula is "1 minus the similarity between the original sample face image and the repaired sample face image". At the start of the generative adversarial training the similarity between the two images is low; as training proceeds, their similarity gradually increases, so "1 minus the similarity" gradually decreases, which realizes a gradually decreasing ID loss. Obtain the maximized probability that the second image discrimination result is true, obtain the non-saturating loss based on the maximized probability, and use the non-saturating loss as the generation loss of the image generator, computed as in formula (4):
L_gan(x) = max log[D(G(x))]   (4)
where L_gan(x) denotes the generation loss of the image generator, G(x) denotes the repaired sample face image, D(G(x)) denotes the second image discrimination result obtained by inputting the repaired sample face image into the image discriminator, and max log[D(G(x))] denotes the maximized probability that the second image discrimination result is true.
It should be noted that in the initial stage of the optimization training, the repaired sample face images generated by the image generator are easily recognized by the image discriminator, that is, D(G(x)) approaches 0. However, the gradient of log[D(G(x))] in the non-saturating formulation does not tend to 0, which provides a better gradient direction for updating the device parameters of the image generator and improves the convergence speed of the iteration.
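A sketch of the non-saturating generation loss of formula (4): rather than minimizing log(1 - D(G(x))), the generator maximizes log D(G(x)), implemented below as minimizing its negative. Applying a sigmoid to raw discriminator logits is an assumption of this sketch.

```python
import torch

def generator_adv_loss(D, restored: torch.Tensor) -> torch.Tensor:
    """Non-saturating adversarial loss: maximize log D(G(x))."""
    fake_logits = D(restored)   # no detach: gradients flow back to the generator
    return -torch.log(torch.sigmoid(fake_logits) + 1e-8).mean()
```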
Based on the content loss, the ID loss and the generation loss, obtain the second loss function of the image generator, computed as in formula (5):
L_G = λ_1·L_1(x) + λ_2·L_gan(x) + λ_3·L_ID(x)   (5)
where L_G denotes the second loss function, L_1(x) denotes the content loss of the image generator, L_gan(x) denotes the generation loss of the image generator, L_ID(x) denotes the ID loss of the image generator, and λ_1, λ_2 and λ_3 denote the first, second and third hyperparameters respectively.
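Combining formulas (2) through (5), and reusing the id_loss and generator_adv_loss sketches above, the second loss function can be assembled as below; the hyperparameter values are placeholders, not values disclosed by this application.

```python
import torch
import torch.nn.functional as F

def generator_loss(D, R, y, x, G, lambdas=(1.0, 0.1, 0.5)) -> torch.Tensor:
    """L_G = lambda1*L_1(x) + lambda2*L_gan(x) + lambda3*L_ID(x)."""
    restored = G(x)
    l_content = F.l1_loss(restored, y)          # formula (2): content loss
    l_adv = generator_adv_loss(D, restored)     # formula (4): generation loss
    l_id = id_loss(R, y, restored)              # formula (3): ID loss
    lam1, lam2, lam3 = lambdas
    return lam1 * l_content + lam2 * l_adv + lam3 * l_id
```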
Fix the device parameters of the image discriminator and iterate along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator, obtaining the optimized image generator.
Step (5): Alternately repeat the above step of optimizing the image discriminator and the step of optimizing the image generator, and obtain the image performance index of the repaired sample face images generated by the current image generator. When the image performance index reaches a preset image performance index threshold, stop the optimization and use the current image generator as the target image generator to perform blind image repair on face images to be repaired.
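Steps (3) through (5) can be summarized as the alternating loop sketched below, reusing the discriminator_loss and generator_loss sketches above; the optimizer settings and the choice of image performance index (any quality metric evaluated on a validation set) are assumptions of this sketch.

```python
import torch

def train(G, D, R, loader, metric_fn, metric_threshold: float, device: str = "cuda"):
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-4, betas=(0.0, 0.99))
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.0, 0.99))
    while True:
        for y, x in loader:                   # original / degraded pairs, e.g. from degrade()
            y, x = y.to(device), x.to(device)

            opt_d.zero_grad()                 # optimize the discriminator with G fixed
            discriminator_loss(D, G, y, x).backward()
            opt_d.step()

            opt_g.zero_grad()                 # optimize the generator with D fixed
            generator_loss(D, R, y, x, G).backward()
            opt_g.step()

        if metric_fn(G) >= metric_threshold:  # preset image performance index threshold
            return G                          # target image generator
```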
The training apparatus for an image generator provided by this application is described below; the training apparatus described below and the training method for an image generator described above may be referred to in correspondence with each other.
As shown in Figure 7, this application provides a training apparatus 100 for an image generator, including a sample image acquisition module 10, a degraded image repair module 20, a discriminator optimization module 30, a generator optimization module 40 and a generator determination module 50, wherein:
The sample image acquisition module 10 is used to acquire an original sample face image and the degraded sample face image corresponding to the original sample face image.
The degraded image repair module 20 is used to input the degraded sample face image into a pre-built image generator to obtain the repaired sample face image generated by the image generator; the image generator is built on the Transformer model.
The discriminator optimization module 30 is used to optimize a pre-built image discriminator based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator; the image discriminator is used to distinguish the original sample face image from the repaired sample face image.
The generator optimization module 40 is used to optimize the image generator based on the original sample face image and the repaired sample face image to obtain an optimized image generator; the image generator and the image discriminator form a generative adversarial network.
The generator determination module 50 is used to alternately repeat the above step of optimizing the image discriminator and the step of optimizing the image generator until a preset convergence condition is reached, stop the optimization, and use the optimized image generator as the target image generator to perform blind image repair on face images to be repaired.
In one embodiment, the discriminator optimization module 30 includes a sample image input unit, a discrimination result acquisition unit, a first function acquisition unit and a discriminator optimization unit, wherein:
The sample image input unit is used to input the original sample face image and the repaired sample face image into the image discriminator.
The discrimination result acquisition unit is used to obtain the first image discrimination result corresponding to the original sample face image and the second image discrimination result corresponding to the repaired sample face image.
The first function acquisition unit is used to obtain the first loss function of the image discriminator based on the first image discrimination result and the second image discrimination result.
The discriminator optimization unit is used to fix the device parameters of the image generator and iterate along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator, obtaining the optimized image discriminator.
In one embodiment, the generator optimization module 40 includes a discrimination data acquisition unit, a second function acquisition unit and a generator optimization unit, wherein:
The discrimination data acquisition unit is used to obtain the second image discrimination result produced by inputting the repaired sample face image into the image discriminator.
The second function acquisition unit is used to obtain the second loss function of the image generator based on the original sample face image, the repaired sample face image and the second image discrimination result.
The generator optimization unit is used to fix the device parameters of the image discriminator and iterate along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator, obtaining the optimized image generator.
In one embodiment, the second function acquisition unit includes a content loss acquisition subunit, an ID loss acquisition subunit, a generation loss acquisition subunit and a loss function acquisition subunit, wherein:
The content loss acquisition subunit is used to obtain the content loss of the image generator based on the original sample face image and the repaired sample face image; the content loss is used to measure the content difference between the repaired sample face image and the original sample face image.
The ID loss acquisition subunit is used to obtain the ID loss of the image generator based on the original sample face image and the repaired sample face image; the ID loss is used to measure the distance difference between the repaired sample face image and the original sample face image.
The generation loss acquisition subunit is used to obtain the maximized probability that the second image discrimination result is true, and to obtain the generation loss of the image generator based on the maximized probability.
The loss function acquisition subunit is used to obtain the second loss function of the image generator based on the content loss, the ID loss and the generation loss.
In one embodiment, the image discriminator is a wavelet discriminator.
In one embodiment, the wavelet discriminator includes a discrete wavelet transform module and a concatenation-convolution module, wherein the discrete wavelet transform module is used to decompose the input image into feature images at multiple frequency scales, and the concatenation-convolution module is used to concatenate the feature images at the multiple frequency scales and perform convolution processing on the concatenated feature images to obtain a reconstructed image.
In one embodiment, the degraded image repair module 20 includes a feature acquisition unit, a feature conversion unit and an image repair unit, wherein:
The feature acquisition unit is used to input the degraded sample face image into the encoder of the image generator to obtain low-level semantic features and high-level semantic features.
The feature conversion unit is used to input the high-level semantic features into the feature conversion module of the image generator to obtain the style vector.
The image repair unit is used to input the low-level semantic features, the high-level semantic features and the style vector into the decoder of the image generator to obtain the repaired sample face image.
Figure 8 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Figure 8, the electronic device may include a processor 810, a communications interface 820, a memory 830 and a communication bus 840, where the processor 810, the communications interface 820 and the memory 830 communicate with one another through the communication bus 840. The processor 810 can call logic instructions in the memory 830 to execute the training method for an image generator, the method including: obtaining an original sample face image and the degraded sample face image corresponding to the original sample face image; inputting the degraded sample face image into a pre-built image generator to obtain the repaired sample face image generated by the image generator, the image generator being built on the Transformer model; optimizing a pre-built image discriminator based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator, the image discriminator being used to distinguish the original sample face image from the repaired sample face image; optimizing the image generator based on the original sample face image and the repaired sample face image to obtain an optimized image generator, the image generator and the image discriminator forming a generative adversarial network; and alternately repeating the above step of optimizing the image discriminator and the step of optimizing the image generator until a preset convergence condition is reached, stopping the optimization, and using the optimized image generator as the target image generator to perform blind image repair on face images to be repaired.
In addition, the above logic instructions in the memory 830 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of this application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
In another aspect, this application further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the training method for an image generator provided by each of the above methods, the method including: obtaining an original sample face image and the degraded sample face image corresponding to the original sample face image; inputting the degraded sample face image into a pre-built image generator to obtain the repaired sample face image generated by the image generator, the image generator being built on the Transformer model; optimizing a pre-built image discriminator based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator, the image discriminator being used to distinguish the original sample face image from the repaired sample face image; optimizing the image generator based on the original sample face image and the repaired sample face image to obtain an optimized image generator, the image generator and the image discriminator forming a generative adversarial network; and alternately repeating the above step of optimizing the image discriminator and the step of optimizing the image generator until a preset convergence condition is reached, stopping the optimization, and using the optimized image generator as the target image generator to perform blind image repair on face images to be repaired.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
Through the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware. Based on this understanding, the part of the above technical solution that in essence contributes to the prior art may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods of the embodiments or certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of this application, not to limit it. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features therein; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (8)

  1. A training method for an image generator, comprising:
    obtaining an original sample face image and a degraded sample face image corresponding to the original sample face image;
    inputting the degraded sample face image into a pre-built image generator to obtain a repaired sample face image generated by the image generator, wherein the image generator is built on the Transformer model;
    optimizing a pre-built image discriminator based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator, wherein the image discriminator is used to distinguish the original sample face image from the repaired sample face image, and wherein optimizing the pre-built image discriminator based on the original sample face image and the repaired sample face image to obtain the optimized image discriminator comprises: inputting the original sample face image and the repaired sample face image into the image discriminator; obtaining a first image discrimination result corresponding to the original sample face image and a second image discrimination result corresponding to the repaired sample face image; obtaining a first loss function of the image discriminator based on the first image discrimination result and the second image discrimination result; and fixing the device parameters of the image generator and iterating along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator, obtaining the optimized image discriminator;
    optimizing the image generator based on the original sample face image and the repaired sample face image to obtain an optimized image generator, wherein the image generator and the image discriminator form a generative adversarial network, and wherein optimizing the image generator based on the original sample face image and the repaired sample face image to obtain the optimized image generator comprises: obtaining the second image discrimination result produced by inputting the repaired sample face image into the image discriminator; obtaining a second loss function of the image generator based on the original sample face image, the repaired sample face image and the second image discrimination result; and fixing the device parameters of the image discriminator and iterating along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator, obtaining the optimized image generator; and
    alternately repeating the above step of optimizing the image discriminator and the step of optimizing the image generator until a preset convergence condition is reached, stopping the optimization, and using the optimized image generator as a target image generator to perform blind image repair on a face image to be repaired.
  2. The training method for an image generator according to claim 1, wherein obtaining the second loss function of the image generator based on the original sample face image, the repaired sample face image and the second image discrimination result comprises:
    obtaining a content loss of the image generator based on the original sample face image and the repaired sample face image, wherein the content loss is used to measure the content difference between the repaired sample face image and the original sample face image;
    obtaining an ID loss of the image generator based on the original sample face image and the repaired sample face image, wherein the ID loss is used to measure the distance difference between the repaired sample face image and the original sample face image;
    obtaining the maximized probability that the second image discrimination result is true, and obtaining a generation loss of the image generator based on the maximized probability; and
    obtaining the second loss function of the image generator based on the content loss, the ID loss and the generation loss.
  3. The training method for an image generator according to any one of claims 1 to 2, wherein the image discriminator is a wavelet discriminator.
  4. The training method for an image generator according to claim 3, wherein the wavelet discriminator includes a discrete wavelet transform module and a concatenation-convolution module, wherein:
    the discrete wavelet transform module is used to decompose an input image into feature images at multiple frequency scales; and
    the concatenation-convolution module is used to concatenate the feature images at the multiple frequency scales and perform convolution processing on the concatenated feature images to obtain a reconstructed image.
  5. The training method for an image generator according to claim 1, wherein inputting the degraded sample face image into the pre-built image generator to obtain the repaired sample face image generated by the image generator comprises:
    inputting the degraded sample face image into an encoder of the image generator to obtain low-level semantic features and high-level semantic features;
    inputting the high-level semantic features into a feature conversion module of the image generator to obtain a style vector, wherein inputting the high-level semantic features into the feature conversion module of the image generator to obtain the style vector comprises: mapping the high-level semantic features into the style vector through multiple fully connected layers in the feature conversion module, wherein the style vector includes multiple vector elements and each vector element corresponds to one visual feature; and
    inputting the low-level semantic features, the high-level semantic features and the style vector into a decoder of the image generator to obtain the repaired sample face image.
  6. A training apparatus for an image generator, comprising:
    a sample image acquisition module, used to acquire an original sample face image and a degraded sample face image corresponding to the original sample face image;
    a degraded image repair module, used to input the degraded sample face image into a pre-built image generator to obtain a repaired sample face image generated by the image generator, wherein the image generator is built on the Transformer model;
    a discriminator optimization module, used to optimize a pre-built image discriminator based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator, wherein the image discriminator is used to distinguish the original sample face image from the repaired sample face image, and wherein optimizing the pre-built image discriminator based on the original sample face image and the repaired sample face image to obtain the optimized image discriminator comprises: inputting the original sample face image and the repaired sample face image into the image discriminator; obtaining a first image discrimination result corresponding to the original sample face image and a second image discrimination result corresponding to the repaired sample face image; obtaining a first loss function of the image discriminator based on the first image discrimination result and the second image discrimination result; and fixing the device parameters of the image generator and iterating along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator, obtaining the optimized image discriminator;
    a generator optimization module, used to optimize the image generator based on the original sample face image and the repaired sample face image to obtain an optimized image generator, wherein the image generator and the image discriminator form a generative adversarial network, and wherein optimizing the image generator based on the original sample face image and the repaired sample face image to obtain the optimized image generator comprises: obtaining the second image discrimination result produced by inputting the repaired sample face image into the image discriminator; obtaining a second loss function of the image generator based on the original sample face image, the repaired sample face image and the second image discrimination result; and fixing the device parameters of the image discriminator and iterating along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator, obtaining the optimized image generator; and
    a generator determination module, used to alternately repeat the above step of optimizing the image discriminator and the step of optimizing the image generator until a preset convergence condition is reached, stop the optimization, and use the optimized image generator as a target image generator to perform blind image repair on a face image to be repaired.
  7. An electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein when the processor executes the program, the training method for an image generator according to any one of claims 1 to 5 is implemented.
  8. A non-transitory computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the training method for an image generator according to any one of claims 1 to 5 is implemented.
PCT/CN2022/125015 2022-06-23 2022-10-13 Image generator training method and apparatus, and electronic device and readable storage medium WO2023245927A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210715667.4 2022-06-23
CN202210715667.4A CN114782291B (en) 2022-06-23 2022-06-23 Training method and device of image generator, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
WO2023245927A1 true WO2023245927A1 (en) 2023-12-28

Family

ID=82422490

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/125015 WO2023245927A1 (en) 2022-06-23 2022-10-13 Image generator training method and apparatus, and electronic device and readable storage medium

Country Status (2)

Country Link
CN (1) CN114782291B (en)
WO (1) WO2023245927A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782291B (en) * 2022-06-23 2022-09-06 中国科学院自动化研究所 Training method and device of image generator, electronic equipment and readable storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109559287A (en) * 2018-11-20 2019-04-02 北京工业大学 A kind of semantic image restorative procedure generating confrontation network based on DenseNet
CN111783603A (en) * 2020-06-24 2020-10-16 有半岛(北京)信息科技有限公司 Training method for generating confrontation network, image face changing method and video face changing method and device
CN112330574B (en) * 2020-11-30 2022-07-12 深圳市慧鲤科技有限公司 Portrait restoration method and device, electronic equipment and computer storage medium
CN112837234B (en) * 2021-01-25 2022-07-22 重庆师范大学 Human face image restoration method based on multi-column gating convolution network
CN113298736B (en) * 2021-06-24 2022-03-04 河北工业大学 Face image restoration method based on face pattern
CN113743332B (en) * 2021-09-08 2022-03-25 中国科学院自动化研究所 Image quality evaluation method and system based on universal vision pre-training model
CN113962893A (en) * 2021-10-27 2022-01-21 山西大学 Face image restoration method based on multi-scale local self-attention generation countermeasure network
CN114549341A (en) * 2022-01-11 2022-05-27 温州大学 Sample guidance-based face image diversified restoration method

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3557487A1 (en) * 2018-04-20 2019-10-23 ZF Friedrichshafen AG Generation of validation data with generative contradictory networks
US20200372351A1 (en) * 2019-05-23 2020-11-26 Htc Corporation Method for training generative adversarial network (gan), method for generating images by using gan, and computer readable storage medium
CN110363716A (en) * 2019-06-25 2019-10-22 北京工业大学 One kind is generated based on condition and fights network combined degraded image high quality method for reconstructing
CN111127308A (en) * 2019-12-08 2020-05-08 复旦大学 Mirror image feature rearrangement repairing method for single sample face recognition under local shielding
CN113112411A (en) * 2020-01-13 2021-07-13 南京信息工程大学 Human face image semantic restoration method based on multi-scale feature fusion
CN113160079A (en) * 2021-04-13 2021-07-23 Oppo广东移动通信有限公司 Portrait restoration model training method, portrait restoration method and device
CN113763268A (en) * 2021-08-26 2021-12-07 中国科学院自动化研究所 Blind restoration method and system for face image
CN113936318A (en) * 2021-10-20 2022-01-14 成都信息工程大学 Human face image restoration method based on GAN human face prior information prediction and fusion
CN114782291A (en) * 2022-06-23 2022-07-22 中国科学院自动化研究所 Training method and device of image generator, electronic equipment and readable storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117853638A (en) * 2024-03-07 2024-04-09 厦门大学 End-to-end 3D face rapid generation and editing method based on text driving

Also Published As

Publication number Publication date
CN114782291B (en) 2022-09-06
CN114782291A (en) 2022-07-22

Similar Documents

Publication Publication Date Title
WO2023245927A1 (en) Image generator training method and apparatus, and electronic device and readable storage medium
CN111798400B (en) Non-reference low-illumination image enhancement method and system based on generation countermeasure network
CN108520503B (en) Face defect image restoration method based on self-encoder and generation countermeasure network
CN111079532B (en) Video content description method based on text self-encoder
CN108550118B (en) Motion blur image blur processing method, device, equipment and storage medium
CN115345980A (en) Generation method and device of personalized texture map
Garcia-Cardona et al. Subproblem coupling in convolutional dictionary learning
WO2023159746A1 (en) Image matting method and apparatus based on image segmentation, computer device, and medium
CN110969089A (en) Lightweight face recognition system and recognition method under noise environment
US20220414838A1 (en) Image dehazing method and system based on cyclegan
CN114445420A (en) Image segmentation model with coding and decoding structure combined with attention mechanism and training method thereof
CN114863539A (en) Portrait key point detection method and system based on feature fusion
JP2023001926A (en) Method and apparatus of fusing image, method and apparatus of training image fusion model, electronic device, storage medium and computer program
Liu et al. Facial image inpainting using multi-level generative network
CN109523478B (en) Image descreening method and storage medium
KR102393761B1 (en) Method and system of learning artificial neural network model for image processing
WO2022252372A1 (en) Image processing method, apparatus and device, and computer-readable storage medium
Wei et al. Image denoising with deep unfolding and normalizing flows
CN114862699A (en) Face repairing method, device and storage medium based on generation countermeasure network
CN116402916B (en) Face image restoration method and device, computer equipment and storage medium
CN114495236B (en) Image segmentation method, apparatus, device, medium, and program product
RU2817316C2 (en) Method and apparatus for training image generation model, method and apparatus for generating images and their devices
Bera et al. A lightweight convolutional neural network for image denoising with fine details preservation capability
CN116109545A (en) Image downsampling method and device
CN117876676A (en) Pulse-driven FPN-based semantic segmentation system, method and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22947664

Country of ref document: EP

Kind code of ref document: A1