WO2023245927A1

WO2023245927A1 - Image generator training method and apparatus, and electronic device and readable storage medium

Info

Publication number: WO2023245927A1
Application number: PCT/CN2022/125015
Authority: WO
Inventors: 葛国敬; 王金桥; 朱贵波
Original assignee: 中国科学院自动化研究所
Priority date: 2022-06-23
Filing date: 2022-10-13
Publication date: 2023-12-28
Also published as: CN114782291B; CN114782291A

Abstract

An image generator training method and apparatus, and an electronic device and a readable storage medium, which relate to the technical field of image processing. The image generator training method comprises: acquiring an original facial sample image and a degraded facial sample image corresponding to the original facial sample image (S1); inputting the degraded facial sample image into an image generator to obtain a repaired facial sample image, wherein the image generator is constructed on the basis of a transformer model (S2); on the basis of the original facial sample image and the repaired facial sample image, optimizing an image discriminator (S3); on the basis of the original facial sample image and the repaired facial sample image, optimizing the image generator (S4); and repeating the steps of optimizing the image discriminator and the image generator until a preset convergence condition is met, such that blind image repair processing is performed, by means of the optimized image generator, on a facial image to be repaired, thereby achieving an end-to-end blind image repair function (S5).

Description

Training method, device, electronic device and readable storage medium for image generator

Cross-references to related applications

This application claims the priority of the Chinese patent application with application number 202210715667.4 submitted on June 23, 2022, and the invention title is "Training method, device, electronic device and readable storage medium for image generator", which is by reference All are incorporated herein.

Technical field

The present application relates to the field of image processing technology, and in particular to a training method, device, electronic device and readable storage medium for an image generator.

Background technique

Image repair technology is a technology that repairs the lost information or detailed information in the image to be repaired based on the known information of the image and preset repair rules to achieve visually realistic effects. The blind image repair technology refers to the technology of pre-selecting the image to be repaired without knowing the image loss type or image degradation type of the image to be repaired.

In the existing technology, Convolutional Neural Networks (CNN) technology is used to realize the blind image repair function. However, this method cannot obtain ideal training results in a single stage, so the training task needs to be completed in two stages. The first stage requires training a generator, and the second stage requires embedding the trained generator into The network structure of the deep learning segmentation network (Unet) is debugged, so that the image to be repaired is repaired according to the debugged generator. It can be seen that the blind image repair method provided in the existing technology cannot obtain ideal training results through a single stage of training, but needs to complete the training task in two stages, and the training process requires manual intervention, and the training path is relatively cumbersome and complicated.

Therefore, in the existing technology, when using convolutional neural networks for blind image repair processing, ideal training results cannot be obtained in a single stage. The training task needs to be completed in two stages, and the training process requires manual intervention, and the training path is cumbersome. For complex technical problems, technicians in relevant fields have no effective solutions.

Contents of the invention

This application provides a training method, device, electronic equipment and readable storage medium for an image generator to solve the problem that when using a convolutional neural network to perform blind image repair processing in the prior art, ideal training results cannot be obtained in a single stage. The training task needs to be completed in two stages, and the training process requires manual intervention, and the training path is cumbersome and complicated to achieve the end-to-end image blind repair function, and the training process does not require manual intervention, and the training path is relatively simple.

This application provides a training method for an image generator, which includes: acquiring an original sample face image and a degraded sample face image corresponding to the original sample face image; inputting the degraded sample face image into a pre-constructed In the image generator, the repaired sample face image generated by the image generator is obtained; the image generator is constructed based on the Transformer model; based on the original sample face image and the repaired sample face image, the pre-constructed The image discriminator is optimized to obtain an optimized image discriminator; the image discriminator is used to distinguish the original sample face image and the repaired sample face image; based on the original sample face image and the repaired sample face image Sample face image, optimize the image generator to obtain an optimized image generator; the image generator and the image discriminator form a generative adversarial network; alternately repeat the above steps of optimizing the image discriminator and optimizing the image The generator steps until the preset convergence condition is reached, the optimization is stopped, and the optimized image generator is used as the target image generator to perform blind image repair processing on the face image to be repaired.

According to an image generator training method provided by the present application, the pre-constructed image discriminator is optimized based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator. , including: inputting the original sample face image and the repaired sample face image to the image discriminator; obtaining the first image discrimination result corresponding to the original sample face image, and obtaining the repaired sample face image. a second image discrimination result corresponding to the face image; based on the first image discrimination result and the second image discrimination result, obtain the first loss function of the image discriminator; fix the device parameters of the image generator, and Iterate along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator to obtain an optimized image discriminator.

According to an image generator training method provided by this application, the image generator is optimized based on the original sample face image and the repaired sample face image to obtain an optimized image generator, The method includes: obtaining the second image discrimination result obtained by inputting the repaired sample face image into the image discriminator; and based on the original sample face image, the repaired sample face image and the second image discrimination result. , obtain the second loss function of the image generator; fix the device parameters of the image discriminator, and iterate along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator, and obtain the optimization Image generator after.

According to a training method of an image generator provided by the present application, the second image generator of the image generator is obtained based on the original sample face image, the repaired sample face image and the second image discrimination result. The loss function includes: obtaining the content loss of the image generator based on the original sample face image and the repaired sample face image, where the content loss is used to measure the difference between the repaired sample face image and the original sample face image. content difference between; obtain the ID loss of the image generator based on the original sample face image and the repaired sample face image, the ID loss is used to measure the difference between the repaired sample face image and the original sample face image distance difference between; obtain the maximum probability that the second image discrimination result is true, and obtain the generation loss of the image generator based on the maximum probability; based on the content loss, the ID loss and the Generate loss, obtain the second loss function of the image generator.

According to an image generator training method provided by this application, the image discriminator is a wavelet discriminator.

According to an image generator training method provided by this application, the wavelet discriminator includes a discrete wavelet transform module and a splicing convolution module, wherein: the discrete wavelet transform module is used to decompose the input image into multiple frequency scales. Feature image; the splicing convolution module is used to splice feature images of multiple frequency scales, and perform convolution processing on the spliced feature images to obtain a reconstructed image.

According to an image generator training method provided by the present application, the degraded sample face image is input into a pre-constructed image generator to obtain a repaired sample face image generated by the image generator, including : Input the degraded sample face image into the encoder of the image generator to obtain low-level semantic features and high-level semantic features; input the high-level semantic features into the feature conversion module of the image generator, Obtain the style vector; input the low-level semantic features, the high-level semantic features and the style vector into the decoder of the image generator to obtain the repaired sample face image.

This application also provides a training device for an image generator, including: a sample image acquisition module, used to acquire an original sample face image and a degraded sample face image corresponding to the original sample face image; a degraded image repair module , used to input the degraded sample face image into a pre-built image generator to obtain a repaired sample face image generated by the image generator; the image generator is built based on the Transformer model; the discriminator optimization module , used to optimize the pre-constructed image discriminator based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator; the image discriminator is used to distinguish the original sample face image A face image and the repaired sample face image; a generator optimization module configured to optimize the image generator based on the original sample face image and the repaired sample face image to obtain an optimized image generation The image generator and the image discriminator form a generative adversarial network; the generator determination module is used to alternately repeat the above-mentioned steps of optimizing the image discriminator and the steps of optimizing the image generator until the preset convergence condition is reached, and stop Optimize, and use the optimized image generator as the target image generator to perform blind image repair processing on the face image to be repaired.

The present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements any of the above image generators. training methods.

The present application also provides a non-transitory computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the training method of any of the above image generators is implemented.

The training method, device, electronic equipment and readable storage medium of the image generator provided by this application pre-constructs the image generator and the image discriminator to form a generative adversarial network. During the multiple optimization processes, the image generator does not know in advance According to the image loss type or image degradation type of the degraded sample face image, blind image repair processing is performed on the degraded sample face image, and try to generate a repaired sample face image with high image performance index, high degree of restoration and realistic; and The image discriminator tries to identify the difference between the repaired sample face image generated by the image generator and the original sample face, so that the image generator and image discriminator are continuously optimized during the adversarial training process until the preset convergence conditions are reached and the optimization is stopped. , and use the optimized image generator as the target image generator to perform blind image repair processing on the face image to be repaired, so that a high-quality target repaired face image can be obtained, and the end-to-end image blind repair function and training can be achieved The process does not require manual intervention, and the training path is relatively simple. It overcomes the problem that when using convolutional neural networks for blind image repair processing in the existing technology, ideal training results cannot be obtained in a single stage, and the training task needs to be completed in two stages, and The training process requires manual intervention, and the training path is cumbersome and complex.

Description of the drawings

In order to explain the technical solutions in this application or the prior art more clearly, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings in the following description are of the present invention. For some embodiments of the application, those of ordinary skill in the art can also obtain other drawings based on these drawings without exerting creative efforts.

Figure 1 is one of the flow diagrams of the training method of the image generator provided by this application;

Figure 2 is the second schematic flow chart of the training method of the image generator provided by this application;

Figure 3 is the third schematic flow chart of the training method of the image generator provided by this application;

Figure 4 is the fourth schematic flowchart of the training method of the image generator provided by this application;

Figure 5 is the fifth schematic flow chart of the training method of the image generator provided by this application;

Figure 6 is a schematic structural diagram of the optimized training model of the image generator in the second embodiment of the present application;

Figure 7 is a schematic structural diagram of the training device of the image generator provided by this application;

Figure 8 is a schematic structural diagram of an electronic device provided by this application.

Reference signs:

100: Training device for image generator; 10: Sample image acquisition module; 20: Degraded image repair module; 30: Discriminator optimization module; 40: Generator optimization module; 50: Generator determination module; 810: Processor; 820: communication interface; 830: memory; 840: communication bus.

Detailed ways

In order to make the purpose, technical solutions and advantages of this application clearer, the technical solutions in this application will be clearly and completely described below in conjunction with the drawings in this application. Obviously, the described embodiments are part of the embodiments of this application. , not all examples. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of this application.

The training method of the image generator provided by this application is described below with reference to Figures 1-5. As shown in Figure 1, this application provides a training method for an image generator, including:

Step S1: Obtain the original sample face image and the degraded sample face image corresponding to the original sample face image.

Among them, the original sample face image represents a sample face image with a relatively high image performance index (or image quality index). Degraded sample face images represent sample face images with relatively low image performance indicators. The original sample face image and the degraded sample face image constitute a sample face image pair, which is used to supervise the training of the image generator and image discriminator.

Step S2: Input the degraded sample face image into the pre-built image generator to obtain the repaired sample face image generated by the image generator; the image generator is built based on the Transformer model.

The Transformer model is a model built based on the idea of Attention, which is widely used in technical fields such as natural language processing, semantic relationship extraction, summary generation, named entity recognition, and machine translation.

Step S3: Based on the original sample face image and the repaired sample face image, optimize the pre-built image discriminator to obtain an optimized image discriminator; the image discriminator is used to distinguish the original sample face image and the repaired sample face image .

Step S4: Based on the original sample face image and the repaired sample face image, optimize the image generator to obtain an optimized image generator; the image generator and the image discriminator form a generative adversarial network.

Step S5: Alternately repeat the above steps of optimizing the image discriminator and optimizing the image generator until the preset convergence condition is reached, stop the optimization, and use the optimized image generator as the target image generator for the face image to be repaired Perform blind image repair processing.

The preset convergence condition may be a preset maximum number of iterations, a preset image performance index threshold, or other convergence conditions, which are not specifically limited in this application. For example, when the preset convergence condition is the preset maximum number of iterations, determine whether the current iteration number reaches the preset maximum iteration number, and when the current iteration number reaches the preset maximum iteration number, stop the iteration; when the current iteration number If the preset maximum number of iterations is not reached, the iteration continues until the current iteration number reaches the preset maximum number of iterations. In the same way, when the preset convergence condition is the preset image performance index threshold, it is judged whether the image performance index of the repaired sample face image reaches the preset image performance index threshold, and based on the judgment result, it is determined whether to continue to stop the iteration.

The image generator does not know the image loss type or image degradation type of the degraded sample face image in advance, and is used to perform image blind repair processing on the degraded sample face image to generate a repaired sample face image. The image discriminator is used to determine whether the repaired sample face image generated by the image generator is consistent with the original sample face image.

In the above steps S1 to S5, an image generator and an image discriminator are constructed in advance to form a generative adversarial network. During multiple optimization processes, the image generator does not know in advance the image loss type or image degradation type of the degraded sample face image. , carry out image blind repair processing on the degraded sample face image, and try to generate a repaired sample face image with high image performance index, high degree of restoration and realistic face image; while the image discriminator tries to identify the repaired sample face image generated by the image generator. The difference between the face image and the original sample face causes the image generator and image discriminator to be continuously optimized during the adversarial training process until the preset convergence conditions are reached, the optimization is stopped, and the optimized image generator is used as the target image generator , perform blind image repair processing on the face image to be repaired, so as to obtain a high-quality target repaired face image, and realize the end-to-end image blind repair function. The training process does not require manual intervention, and the training path is relatively simple, overcoming the When using convolutional neural networks for blind image repair processing in the existing technology, ideal training results cannot be obtained in a single stage. The training task needs to be completed in two stages, and the training process requires manual intervention, and the training path is cumbersome and complicated.

In one embodiment, before step S1, the training method of the image generator provided by this application also includes: performing an image degradation operation on the original sample face image to obtain a degraded sample face corresponding to the original sample face image. Image, wherein the image degradation operations include but are not limited to blur operations, downsampling operations, Gaussian white noise addition operations, and JPEG compression operations.

Optionally, the blur operation includes Gaussian blur operation and motion blur operation. Downsampling operations include bicubic interpolation (Bicubic) downsampling operations, bilinear interpolation (Bilinear) downsampling operations, and Lanczos downsampling operations. Among them, the Lanczos algorithm is a method that transforms a symmetric matrix into a symmetric three pair through orthogonal similarity transformation. Algorithm for angular matrices. The noise adding operation includes the Gaussian white noise adding operation and the Poisson noise adding operation.

It should be noted that this embodiment does not use pre-prepared degraded sample face images, but performs online image degradation operations during the training process, which can make the types of degraded sample face images used during the training process It is richer to improve the adaptive image repair ability of the image generator when dealing with face images to be repaired with unknown image loss types, and improves the optimization training effect.

In this embodiment, the original sample face image is subjected to online image degradation processing by setting an online image degradation operation to obtain a degraded sample face image, which enriches the image loss types of the degraded sample face image, thereby improving the Optimizing the training effect improves the generalization performance of the target image generator so that it can perform blind image repair processing for face images to be repaired with different image loss types.

In one embodiment, as shown in Figure 2, the above-mentioned step S3 specifically includes steps S31 to step S34, wherein:

Step S31: Input the original sample face image and the repaired sample face image to the image discriminator.

Step S32: Obtain the first image discrimination result corresponding to the original sample face image, and obtain the second image discrimination result corresponding to the repaired sample face image.

The first image discrimination result represents the image discrimination result output by the image discriminator after the original sample face image is input to the image discriminator. The second image discrimination result represents the image discrimination result output by the image discriminator after the repaired sample face image is input to the image discriminator.

Step S33: Based on the first image discrimination result and the second image discrimination result, obtain the first loss function of the image discriminator.

It should be noted that the loss function of the image discriminator can use the first loss function provided in the embodiment of this application, or other loss functions, and this application does not impose specific restrictions.

Step S34: Fix the device parameters of the image generator, and iterate along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator, and obtain an optimized image discriminator.

It should be noted that during the process of optimizing the image discriminator, the device parameters of the image generator need to be fixed, that is, the device parameters of the image generator are kept fixed and only the device parameters of the image discriminator are iteratively updated.

In the above steps S31 to S34, by inputting the original sample face image to the first image discrimination result obtained by the image discriminator for comparison, and combining the repaired sample face image with the second image discrimination result obtained by inputting the image discriminator, it is possible to Accurately calculate the first loss function of the image discriminator, and use the first loss function as the objective function to iteratively optimize the device parameters of the image discriminator, thereby improving the optimization training effect of the image discriminator. In addition, by iterating along the gradient descent direction of the first loss function, the loss of the image discriminator can be minimized at the fastest iteration speed, that is, the optimization training task of the image discriminator can be completed with high quality and efficiency, achieving While improving the optimization training efficiency of the image discriminator, it further improves the optimization training effect of the image discriminator.

In one embodiment, a first distribution probability that the first image discrimination result is true is obtained, and a second distribution probability that the second image discrimination result is false is obtained, and the image discriminator is determined based on the first distribution probability and the second distribution probability. The first loss function.

The first distribution probability represents the distribution probability that the image discrimination result obtained by inputting the original sample face image to the image discriminator is expected to be true. The second distribution probability represents the distribution probability that the image discrimination result obtained by inputting the repaired sample face image to the image discriminator is false.

In one embodiment, as shown in Figure 3, the above-mentioned step S4 specifically includes steps S41 to step S43, wherein:

Step S41: Obtain the repaired sample face image and input it into the image discriminator to obtain the second image discrimination result.

Step S42: Obtain the second loss function of the image generator based on the original sample face image, the repaired sample face image, and the second image discrimination result.

It should be noted that the loss function of the image generator can use the second loss function provided in the embodiment of the present application, or other loss functions, which are not specifically limited by this application.

Step S43: Fix the device parameters of the image discriminator, and iterate along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator to obtain an optimized image generator.

Similarly, in the process of optimizing the image generator, it is necessary to fix the device parameters of the image discriminator, that is, keep the device parameters of the image discriminator fixed and only update the device parameters of the image generator iteratively.

From the above steps S41 to S43, by combining the original sample face image, the repaired sample face image, and the second image discrimination result obtained by inputting the repaired sample face image into the image discriminator, the third image generator of the image generator can be accurately calculated. The second loss function is used as the objective function to iteratively optimize the device parameters of the image generator, which can improve the optimization training effect of the image generator. In addition, iterating along the direction of gradient descent of the second loss function can minimize the loss of the image generator at the fastest iteration speed, that is, the optimization training task of the image generator can be completed with high quality and efficiency, achieving While improving the optimization training efficiency of the image generator, the optimization training effect of the image generator is further improved. In addition, the image generator training method provided by this application uses fewer loss functions and training techniques than the existing method of using convolutional neural networks for blind image repair processing, so the training process is relatively simple and easy to implement. .

In one embodiment, as shown in Figure 4, the above-mentioned step S42 specifically includes steps S421 to step S424, wherein:

Step S421: Obtain the content loss of the image generator based on the original sample face image and the repaired sample face image. The content loss is used to measure the content difference between the repaired sample face image and the original sample face image.

Step S422: Obtain the ID loss of the image generator based on the original sample face image and the repaired sample face image. The ID loss is used to measure the distance difference between the repaired sample face image and the original sample face image.

Step S423: Obtain the maximum probability that the second image discrimination result is true, and obtain the generation loss of the image generator based on the maximum probability.

The maximum probability represents the maximum probability that the second image discrimination result obtained when the repaired sample face image is input into the image discriminator is true.

Step S424: Obtain the second loss function of the image generator based on the content loss, ID loss and generation loss.

The above steps S421 to S424 can accurately calculate the content loss, ID loss and generation loss of the image generator in the process of generating the repaired sample face image, and by combining the content loss, ID loss and generation loss of the image generator. Calculating the second loss function of the image generator, and then using the second loss function as the objective function to iteratively optimize the device parameters of the image generator can further improve the optimization training effect of the image generator.

In one embodiment, the image discriminator is a wavelet discriminator. The wavelet discriminator provided in this embodiment is used to eliminate or weaken the block effect in the repair sample face image generated during the iterative training process of the image generator, so that the finally obtained target image generator has a better blind image repair effect, and further Improved the optimization training effect of image generator.

It should be noted that the blocking effect can be intuitively observed from the repaired sample face image generated by the image generator, and the image performance index parameters of the repaired sample face image can be obtained to determine whether there is a blocking effect in the repaired sample face image. question. Compared with other image discriminators, the image generator is optimized and trained through the wavelet discriminator provided in this embodiment, so that there is no block effect or less block effect in the repaired sample face image generated based on the optimized image generator. block effect.

In one embodiment, the image discriminator also includes a spectral normalization (Spectral Normalization) stability constraint, which is used to improve the stability of the optimization training model to solve the problem of unstable training during the optimization training process.

In one embodiment, the wavelet discriminator includes a discrete wavelet transform module and a concatenated convolution module, wherein: the discrete wavelet transform module is used to decompose the input image into feature images of multiple frequency scales; the concatenated convolution module is used to decompose multiple frequency scales. The frequency scale feature images are spliced, and the spliced feature images are convolved to obtain the reconstructed image.

It should be noted that feature images at multiple frequency scales contain more image detail information than the input image. Since the discrete wavelet transform module has good time-frequency positioning function, it has better ability to retain image detail information. Therefore, the discrete wavelet transform can be used to recover the image detail information that is lost in the input image, but exists in the original image corresponding to the input image, so that the feature images of multiple frequency scales containing image detail information can be spliced according to the splicing convolution module. and convolution smoothing processing to obtain a reconstructed image containing image detail information, which increases the range of the image's receptive field, thereby eliminating or weakening the block effect existing in the input image.

The wavelet discriminator provided in this embodiment can use its discrete wavelet transform principle and splicing convolution principle to supervise and train the image generator to generate a repaired sample face image with more image detail information, thereby improving the receptive field of the repaired sample face image. range, thereby eliminating or weakening the block effect existing in the repaired sample face image, improving the optimization training effect, and obtaining an image generator with better performance.

It should be further explained that during the image repair and generation process of the image generator built based on the transformer model, the self-control requires global attention. However, global attention has the problem of excessive calculation, so local attention is used instead of global attention to solve the problem. The problem of excessive calculation amount. However, using local attention instead of global attention will reduce the range of the receptive field of the generated image, resulting in block effects in the generated repaired sample face images. The wavelet discriminator provided in this embodiment can expand the range of the receptive field and achieve a better balance between calculation efficiency and image repair performance to solve the problem of block effects in repair sample face images while ensuring calculation efficiency. At the same time, the blind image repair effect of the target image generator is improved.

In one embodiment, as shown in Figure 5, the above-mentioned step S2 specifically includes steps S21 to step S23, wherein:

Step S21: Input the degraded sample face image into the encoder of the image generator to obtain low-level semantic features and high-level semantic features.

Among them, low-level semantic features include image contour features, edge features, color features, texture features and shape features. High-level semantic features represent visual features visualized in images, such as faces, beaches and other features with rich semantic information.

Further, the encoder is used to perform convolution operations, nonlinear operations, etc. on the degraded sample face image to obtain the low-level semantic features and high-level semantic features of the degraded sample face image.

Step S22: Input the high-level semantic features into the feature conversion module of the image generator to obtain the style vector. Among them, the feature conversion module can also be called a mapping module.

Step S23: Input low-level semantic features, high-level semantic features and style vectors into the decoder of the image generator to obtain a repaired sample face image.

In one embodiment, the encoder includes multiple coding modules. Each coding module corresponds to a feature scale. The coding module is used to extract a feature map corresponding to its own feature scale from the input image, and combine the low-dimensional feature map with the high-dimensional feature map. The feature map is sent to the decoder, and the high-dimensional feature map is sent to the mapping module. The low-dimensional feature map is the low-level semantic feature, and the high-dimensional feature map is the high-level semantic feature.

In one embodiment, the mapping module (ie, the above-mentioned feature conversion module) includes multiple fully connected layers. The multiple fully connected layers are used to receive high-dimensional feature maps sent by the encoding module and convert the high-dimensional feature map mapping into Style vector, style vector includes multiple vector elements, each vector element corresponds to a visual feature.

In one embodiment, the decoder includes multiple cascaded decoding modules, each decoding module corresponding to a feature scale. Each decoding module is used to obtain the low-dimensional feature map corresponding to its own feature scale, based on the low-dimensional feature map corresponding to its own feature scale, the high-dimensional feature map, the style vector corresponding to the high-dimensional feature map, and the previous level input. Parameters generate image repair results, and output the image repair results as next-level input parameters.

It should be noted that the upper-level input parameters represent the image repair results of the upper-level decoding module. The input parameters of the upper level of the first layer decoding module are constants or Fourier features. The last-level decoding module generates repaired sample face images based on the low-dimensional feature map corresponding to its own feature scale, the high-dimensional feature map, the style vector corresponding to the high-dimensional feature map, and the input parameters of the previous level.

Further, the image repair results output by the upper-level decoding module are added to their corresponding relative position codes as the input parameters of the next-level decoding module.

Two specific examples are provided below to further illustrate the training method of the image generator provided by this application.

In specific embodiment 1, the training method of the image generator provided by this application includes the following steps:

Step 1: Obtain the original sample face image and the degraded sample face image corresponding to the original sample face image. The degraded sample face image is input into the encoder of the image generator to obtain low-level semantic features and high-level semantic features. The image generator is built based on the Transformer model. The high-level semantic features are input into the feature transformation module of the image generator to obtain the style vector. The low-level semantic features, high-level semantic features and style vectors are input into the decoder of the image generator to obtain the repaired sample face image.

Step 2: Input the original sample face image and the repaired sample face image to the image discriminator. The image discriminator is used to distinguish the original sample face image and the repaired sample face image. A first image discrimination result corresponding to the original sample face image is obtained, and a second image discrimination result corresponding to the repaired sample face image is obtained. Based on the first image discrimination result and the second image discrimination result, a first loss function of the image discriminator is obtained. Fix the device parameters of the image generator, and iterate along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator, and obtain the optimized image discriminator.

Step 3: Obtain the repaired sample face image and input it into the image discriminator to obtain the second image discrimination result. The content loss of the image generator is obtained based on the original sample face image and the repaired sample face image. The content loss is used to measure the content difference between the repaired sample face image and the original sample face image. The ID loss of the image generator is obtained based on the original sample face image and the repaired sample face image. The ID loss is used to measure the distance difference between the repaired sample face image and the original sample face image. The maximum probability that the second image discrimination result is true is obtained, and the generation loss of the image generator is obtained based on the maximum probability. Based on the content loss, ID loss, and generation loss, obtain the second loss function of the image generator. Fix the device parameters of the image discriminator, and iterate along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator to obtain an optimized image generator.

Step 4: Alternately repeat the above steps of optimizing the image discriminator and optimizing the image generator until the preset convergence conditions are reached, stop the optimization, and use the optimized image generator as the target image generator for the face image to be repaired Perform blind image repair processing.

Figure 6 is a schematic structural diagram of the optimized training model of the image generator in the second specific embodiment of the present application. As shown in Figure 6, the second specific embodiment provided by the present application specifically includes the following steps:

Step (1): Obtain the original sample face image, and perform an online image degradation operation on the original sample face image to obtain a degraded sample face image corresponding to the original sample face image. The image degradation operation includes: Not limited to blur operations, downsampling operations, Gaussian white noise operations, and JPEG compression operations.

Step (2): Input the degraded sample face image into the encoder of the image generator to obtain low-level semantic features and high-level semantic features. The image generator is built based on the Transformer model. Input the high-level semantic features into the feature conversion module (i.e., mapping module) of the image generator to obtain the style vector corresponding to the high-level semantic features. The low-level semantic features, high-level semantic features and style vectors are input into the decoder of the image generator to obtain the repaired sample face image. The encoder includes multiple encoding modules. The mapping module includes multiple fully connected layers. The decoder includes multiple decoding modules, and the number of decoding modules is equal to the number of encoding modules. The decoding module can be composed of AdaIN and double attention layer (Double Attn), or it can be composed of AdaIN and multi-layer perceptron layer (MLP). The input and output of the decoding module use residual connections.

Step (3): Input the original sample face image and the repaired sample face image to the image discriminator, and the image discriminator is used to distinguish the original sample face image and the repaired sample face image. A first image discrimination result corresponding to the original sample face image is obtained, and a second image discrimination result corresponding to the repaired sample face image is obtained. Obtain a first distribution probability that the first image discrimination result is true, obtain a second distribution probability that the second image discrimination result is false, and determine a first loss function of the image discriminator based on the first distribution probability and the second distribution probability. The first distribution probability represents the distribution probability that the image discrimination result obtained by inputting the original sample face image to the image discriminator is expected to be true. The second distribution probability represents the distribution probability that the image discrimination result obtained by inputting the repaired sample face image to the image discriminator is false.

Specifically, the first loss function is shown in the following formula (1):

Among them, L _D represents the first loss function, y represents the original sample face image, P _y represents the distribution probability of the original sample face image, D(y) represents the first image discrimination result corresponding to the original sample face image,

Indicates the first distribution probability corresponding to the original sample face image. x represents the degraded sample face image, P _x represents the distribution probability of the degraded sample face image, G(x) represents the repaired sample face image corresponding to the degraded sample face image, D(G(x)) represents the repaired face image The second image discrimination result corresponding to the sample face image,

Indicates the second distribution probability corresponding to the repaired sample face image. γ represents the weight coefficient,

represents the spectral normalization stability constraint. The two negative signs in the formula indicate the direction of gradient descent to control the value of the first loss function between (0,1) for gradient descent.

Fix the device parameters of the image generator, and iterate along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator, and obtain the optimized image discriminator. The image discriminator consists of a wavelet discriminator and a spectral normalization stability constraint. The wavelet discriminator includes a discrete wavelet transform module and a splicing convolution module. The discrete wavelet transform module is the DWT discrete wavelet transform module, which is used to decompose the input image into Feature images at multiple frequency scales. The concatenated convolution module includes concat concatenation unit and conv convolution unit. The concat concatenation unit is used to concatenate feature images at multiple frequency scales. The conv convolution unit performs convolution and smoothing processing on the spliced feature images to obtain the reconstructed image. For example, the DWT discrete wavelet transform module decomposes a 1024*1024 input image into four 512*512 feature images. The concat splicing unit splices four 512*512 feature images. The conv convolution unit will perform convolution and smoothing processing on the spliced feature images to obtain a 1024*1024 reconstructed image.

Step (4): Obtain the repaired sample face image and input it into the image discriminator to obtain the second image discrimination result. The content loss of the image generator is obtained based on the original sample face image and the repaired sample face image. The content loss is used to measure the content difference between the repaired sample face image and the original sample face image. L ₁ loss is used as the image generator. The content loss of , where the calculation method of content loss is as shown in formula (2):

L ₁ (x)＝||yG(x)|| ₁ (2)

Among them, L ₁ (x) represents the content loss of the image generator, x represents the degraded sample face image, y represents the original sample face image, and G(x) represents the repaired sample face image.

The ID loss of the image generator is obtained based on the original sample face image and the repaired sample face image. The ID loss is used to measure the distance difference between the repaired sample face image and the original sample face image. The ID loss is calculated as follows Formula (3) shows:

L _ID (x)＝1-<R(y),R(G(x))> (3)

Among them, L _ID (x) represents the ID loss of the image generator, R represents the face recognition network trained based on the preset face recognition algorithm, and R (y) represents the input of the degraded sample face image to the face recognition network output. The first face recognition result, R(G(x)) represents the second face recognition result output from the repaired sample face image input to the face recognition network, <R(y),R(G(x))> Indicates the similarity between the degraded sample face image and the repaired sample face image.

It should be noted that the above formula represents "1 minus the similarity between the degraded sample face image and the repaired sample face image." Since the difference between the degraded sample face image and the repaired sample face image when generative adversarial training was just started, The similarity is low. As the generative adversarial training continues, the similarity between the two gradually increases. Using "1 minus the similarity between the two" means that as the generative adversarial training continues, the similarity gradually increases, and The ID loss is gradually reduced to achieve gradient reduction of the ID loss. Obtain the maximum probability that the second image discrimination result is true, obtain the unsaturated loss based on the maximized probability, and use the unsaturated loss as the generation loss of the image generator, where the calculation method of the generation loss is as shown in formula (4):

L _gan (x)＝maxlog[D(G(x))] (4)

Among them, _Lgan (x) represents the generation loss of the image generator, G(x) represents the repaired sample face image, and D(G(x)) represents the second image obtained by inputting the repaired sample face image into the image discriminator. The discrimination result, maxlog[D(G(x))] represents the maximum probability that the second image discrimination result is true.

It should be noted that in the initial stage of optimization training, the repaired sample face image generated by the image generator is easily recognized by the image discriminator, that is, D(G(x)) approaches 0. However, the non-saturated image The gradient of the generator's log[D(G(x))] does not tend to 0, which can provide a better gradient direction for the device parameter update of the image generator and improve the convergence speed of the iteration.

Based on the content loss, ID loss and generation loss, the second loss function of the image generator is obtained, where the calculation method of the second loss function is as shown in formula (5):

L _G ＝λ ₁ L ₁ (x)+λ ₂ L _gan (x)+λ ₃ L _ID (x) (5)

Among them, L _G represents the second loss function, L ₁ (x) represents the content loss of the image generator, L _gan (x) represents the generation loss of the image generator, L _ID (x) represents the ID loss of the image generator, λ ₁ represents the first hyperparameter, λ ₂ represents the second hyperparameter, and λ ₃ represents the third hyperparameter.

Fix the device parameters of the image discriminator, and iterate along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator to obtain an optimized image generator.

Step (5): Repeat the above-mentioned steps of optimizing the image discriminator and optimizing the image generator alternately to obtain the image performance index of the repaired sample face image generated by the current image generator. When the image performance index reaches the preset image performance index threshold, In the case of , stop optimization and use the current image generator as the target image generator to perform blind image repair processing on the face image to be repaired.

The training device for an image generator provided by this application is described below. The training device for an image generator described below and the training method for an image generator described above may be referred to correspondingly.

As shown in Figure 7, this application provides an image generator training device 100, including a sample image acquisition module 10, a degraded image repair module 20, a discriminator optimization module 30, a generator optimization module 40 and a generator determination module 50 ,in:

The sample image acquisition module 10 is used to acquire the original sample face image and the degraded sample face image corresponding to the original sample face image.

The degraded image repair module 20 is used to input the degraded sample face image into a pre-built image generator to obtain a repaired sample face image generated by the image generator; the image generator is built based on the Transformer model.

The discriminator optimization module 30 is used to optimize the pre-built image discriminator based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator; the image discriminator is used to distinguish between the original sample face image and the repaired sample face image. Repair sample face image.

The generator optimization module 40 is used to optimize the image generator based on the original sample face image and the repaired sample face image to obtain an optimized image generator; the image generator and the image discriminator form a generative adversarial network.

The generator determination module 50 is configured to alternately repeat the above-mentioned steps of optimizing the image discriminator and optimizing the image generator until the preset convergence condition is reached, stopping the optimization, and using the optimized image generator as the target image generator to Perform blind image repair processing on the face image to be repaired.

In one embodiment, the discriminator optimization module 30 includes a sample image input unit, a discrimination result acquisition unit, a first function acquisition unit and a discriminator optimization unit, where:

The sample image input unit is used to input the original sample face image and the repaired sample face image to the image discriminator.

The discrimination result acquisition unit is used to obtain the first image discrimination result corresponding to the original sample face image, and to obtain the second image discrimination result corresponding to the repaired sample face image.

The first function acquisition unit is configured to acquire the first loss function of the image discriminator based on the first image discrimination result and the second image discrimination result.

The discriminator optimization unit is used to fix the device parameters of the image generator and iterate along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator to obtain an optimized image discriminator.

In one embodiment, the generator optimization module 40 includes a discriminant data acquisition unit, a second function acquisition unit and a generator optimization unit, wherein:

The discrimination data acquisition unit is used to acquire the second image discrimination result obtained by inputting the repaired sample face image into the image discriminator.

The second function acquisition unit is used to acquire the second loss function of the image generator based on the original sample face image, the repaired sample face image, and the second image discrimination result.

The generator optimization unit is used to fix the device parameters of the image discriminator and iterate along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator to obtain an optimized image generator.

In one embodiment, the second function acquisition unit includes a content loss acquisition subunit, an ID loss acquisition subunit, a generation loss acquisition subunit, and a loss function acquisition subunit, where.

The content loss acquisition subunit is used to obtain the content loss of the image generator based on the original sample face image and the repaired sample face image. The content loss is used to measure the content difference between the repaired sample face image and the original sample face image.

The ID loss acquisition subunit is used to obtain the ID loss of the image generator based on the original sample face image and the repaired sample face image. The ID loss is used to measure the distance difference between the repaired sample face image and the original sample face image.

The generation loss acquisition subunit is used to obtain the maximum probability that the second image discrimination result is true, and obtain the generation loss of the image generator based on the maximum probability.

The loss function acquisition subunit is used to acquire the second loss function of the image generator based on content loss, ID loss and generation loss.

In one embodiment, the image discriminator is a wavelet discriminator.

In one embodiment, the degraded image repair module 20 includes a feature acquisition unit, a feature conversion unit and an image repair unit, where:

The feature acquisition unit is used to input the degraded sample face image into the encoder of the image generator to obtain low-level semantic features and high-level semantic features.

The feature conversion unit is used to input high-level semantic features into the feature conversion module of the image generator to obtain the style vector.

The image repair unit is used to input low-level semantic features, high-level semantic features and style vectors into the decoder of the image generator to obtain a repaired sample face image.

Figure 8 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Figure 8, the electronic device may include: a processor (processor) 810, a communication interface (Communications Interface) 820, a memory (memory) 830 and a communication bus 840. Among them, the processor 810, the communication interface 820, and the memory 830 complete communication with each other through the communication bus 840. The processor 810 can call the logic instructions in the memory 830 to execute the training method of the image generator. The method includes: obtaining the original sample face image and the degraded sample face image corresponding to the original sample face image; The face image is input into the pre-built image generator, and the repaired sample face image generated by the image generator is obtained; the image generator is built based on the Transformer model; based on the original sample face image and the repaired sample face image, the pre-built sample face image is The image discriminator is optimized to obtain an optimized image discriminator; the image discriminator is used to distinguish the original sample face image and the repaired sample face image; the image generator is optimized based on the original sample face image and the repaired sample face image. , the optimized image generator is obtained; the image generator and the image discriminator form a generative adversarial network; alternately repeat the above steps of optimizing the image discriminator and optimizing the image generator until the preset convergence conditions are reached, stop the optimization, and The optimized image generator is used as the target image generator to perform blind image repair processing on the face image to be repaired.

In addition, the above-mentioned logical instructions in the memory 830 can be implemented in the form of software functional units and can be stored in a computer-readable storage medium when sold or used as an independent product. Based on this understanding, the technical solution of the present application is essentially or the part that contributes to the existing technology or the part of the technical solution can be embodied in the form of a software product. The computer software product is stored in a storage medium, including Several instructions are used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods of various embodiments of the present application. The aforementioned storage media include: U disk, mobile hard disk, Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disk or optical disk and other media that can store program code. .

On the other hand, the present application also provides a non-transitory computer-readable storage medium on which a computer program is stored. The computer program is implemented when executed by the processor to execute the training method of the image generator provided by each of the above methods. The method includes: obtaining an original sample face image and a degraded sample face image corresponding to the original sample face image; inputting the degraded sample face image into a pre-built image generator to obtain a repaired sample face image generated by the image generator. face image; the image generator is built based on the Transformer model; based on the original sample face image and the repaired sample face image, the pre-built image discriminator is optimized to obtain an optimized image discriminator; the image discriminator is used to distinguish the original sample face image Face image and repaired sample face image; based on the original sample face image and repaired sample face image, the image generator is optimized to obtain an optimized image generator; the image generator and the image discriminator form a generative adversarial network; alternately Repeat the above steps of optimizing the image discriminator and optimizing the image generator until the preset convergence conditions are reached, stop the optimization, and use the optimized image generator as the target image generator to perform blind image repair of the face image to be repaired. deal with.

The device embodiments described above are only illustrative. The units described as separate components may or may not be physically separated. The components shown as units may or may not be physical units, that is, they may be located in one place. , or it can be distributed to multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement the method without any creative effort.

Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and of course, it can also be implemented by hardware. Based on this understanding, the part of the above technical solution that essentially contributes to the existing technology can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disc, optical disk, etc., including a number of instructions to cause a computer device (which can be a personal computer, a server, or a network device, etc.) to execute various embodiments or methods of certain parts of the embodiments.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present application, but not to limit it; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that it can still be Modifications are made to the technical solutions described in the foregoing embodiments, or equivalent substitutions are made to some of the technical features; however, these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions in the embodiments of the present application.

Claims

A training method for an image generator, including:

Obtain the original sample face image and the degraded sample face image corresponding to the original sample face image;

Input the degraded sample face image into a pre-built image generator to obtain a repaired sample face image generated by the image generator; the image generator is built based on the Transformer model;

Based on the original sample face image and the repaired sample face image, the pre-constructed image discriminator is optimized to obtain an optimized image discriminator; the image discriminator is used to distinguish the original sample face image from the repaired sample face image. The repaired sample face image; wherein, optimizing the pre-constructed image discriminator based on the original sample face image and the repaired sample face image, and obtaining the optimized image discriminator includes: converting the original The sample face image and the repaired sample face image are input to the image discriminator; the first image discrimination result corresponding to the original sample face image is obtained, and the second image discrimination result corresponding to the repaired sample face image is obtained; based on the first image discrimination As a result and the second image discrimination result, obtain the first loss function of the image discriminator; fix the device parameters of the image generator, and iterate along the direction of gradient descent of the first loss function to optimize the device parameters of the image discriminator, and get Optimized image discriminator;

Based on the original sample face image and the repaired sample face image, the image generator is optimized to obtain an optimized image generator; the image generator and the image discriminator form a generative adversarial network; Wherein, optimizing the image generator based on the original sample face image and the repaired sample face image to obtain the optimized image generator includes: obtaining the repaired sample face image and inputting it to the image discriminator The second image discrimination result obtained in Iterate in the direction of gradient descent of the two loss functions to optimize the device parameters of the image generator and obtain the optimized image generator;

Alternately repeat the above steps of optimizing the image discriminator and optimizing the image generator until the preset convergence conditions are reached, stop the optimization, and use the optimized image generator as the target image generator to perform image blindness on the face image to be repaired. Repair processing.
The training method of an image generator according to claim 1, wherein the method of obtaining the image generator based on the original sample face image, the repaired sample face image and the second image discrimination result is The second loss function includes:

Obtain the content loss of the image generator based on the original sample face image and the repaired sample face image, and the content loss is used to measure the content difference between the repaired sample face image and the original sample face image;

The ID loss of the image generator is obtained based on the original sample face image and the repaired sample face image, and the ID loss is used to measure the distance difference between the repaired sample face image and the original sample face image;

Obtain the maximum probability that the second image discrimination result is true, and obtain the generation loss of the image generator based on the maximum probability;

Based on the content loss, the ID loss and the generation loss, a second loss function of the image generator is obtained.
The training method of an image generator according to any one of claims 1 to 2, wherein the image discriminator is a wavelet discriminator.
The training method of the image generator according to claim 3, wherein the wavelet discriminator includes a discrete wavelet transform module and a splicing convolution module, wherein:

The discrete wavelet transform module is used to decompose the input image into feature images of multiple frequency scales;

The splicing convolution module is used to splice feature images of multiple frequency scales, and perform convolution processing on the spliced feature images to obtain a reconstructed image.
The training method of an image generator according to claim 1, wherein the degraded sample face image is input into a pre-constructed image generator to obtain a repaired sample face image generated by the image generator. ,include:

Input the degraded sample face image into the encoder of the image generator to obtain low-level semantic features and high-level semantic features;

Input the high-level semantic features into the feature conversion module of the image generator to obtain a style vector; wherein, input the high-level semantic features into the feature conversion module of the image generator to obtain a style vector including : Convert high-level semantic feature mapping into style vectors based on multiple fully connected layers in the feature conversion module. The style vector includes multiple vector elements, each vector element corresponding to a visual feature;

The low-level semantic features, the high-level semantic features and the style vector are input into the decoder of the image generator to obtain the repaired sample face image.
An image generator training device, including:

A sample image acquisition module, used to acquire the original sample face image and the degraded sample face image corresponding to the original sample face image;

Degraded image repair module, used to input the degraded sample face image into a pre-built image generator to obtain a repaired sample face image generated by the image generator; the image generator is built based on the Transformer model ;

The discriminator optimization module is used to optimize the pre-constructed image discriminator based on the original sample face image and the repaired sample face image to obtain an optimized image discriminator; the image discriminator is used to distinguish the The original sample face image and the repaired sample face image; wherein, based on the original sample face image and the repaired sample face image, the pre-constructed image discriminator is optimized to obtain the optimized The image discriminator includes: inputting the original sample face image and the repaired sample face image to the image discriminator; obtaining the first image discrimination result corresponding to the original sample face image, and obtaining the second image discrimination result corresponding to the repaired sample face image Result; based on the first image discrimination result and the second image discrimination result, obtain the first loss function of the image discriminator; fix the device parameters of the image generator, and iterate along the direction of gradient descent of the first loss function to optimize the image The device parameters of the discriminator are used to obtain the optimized image discriminator;

A generator optimization module, configured to optimize the image generator based on the original sample face image and the repaired sample face image to obtain an optimized image generator; the image generator and the image The discriminator constitutes a generative adversarial network; wherein, optimizing the image generator based on the original sample face image and the repaired sample face image to obtain the optimized image generator includes: obtaining the repaired sample face image The face image is input into the second image discrimination result obtained by the image discriminator; based on the original sample face image, the repaired sample face image and the second image discrimination result, the second loss function of the image generator is obtained; the fixed image discriminator device parameters, and iterate along the direction of gradient descent of the second loss function to optimize the device parameters of the image generator to obtain an optimized image generator;

The generator determination module is used to alternately repeat the above-mentioned steps of optimizing the image discriminator and optimizing the image generator until the preset convergence condition is reached, stopping the optimization, and treating the optimized image generator as the target image generator. Repair face images for blind image repair processing.
An electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein when the processor executes the program, any one of claims 1 to 5 is implemented. The training method of the image generator described in the item.
A non-transitory computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the training method of the image generator according to any one of claims 1 to 5 is implemented.