CN111028146B - Image super-resolution method based on a dual-discriminator generative adversarial network - Google Patents

Image super-resolution method based on a dual-discriminator generative adversarial network

Info

Publication number
CN111028146B
CN111028146B (application CN201911076333.1A)
Authority
CN
China
Prior art keywords
network
resolution
image
generating
resolution image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911076333.1A
Other languages
Chinese (zh)
Other versions
CN111028146A (en)
Inventor
刘可文
马圆
黄睿挺
熊红霞
房攀攀
陈亚雷
李小军
刘朝阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongyao Tiandi (Beijing) Information Technology Co., Ltd.
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN201911076333.1A
Publication of CN111028146A
Application granted
Publication of CN111028146B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 — Geometric image transformations in the plane of the image
    • G06T3/40 — Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 — Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4046 — Scaling using neural networks
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G06N3/08 — Learning methods
    • G06N3/084 — Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an image super-resolution method based on a dual-discriminator generative adversarial network, comprising the following steps: constructing training samples in the training stage; inputting the training samples into a generative network, which outputs a high-resolution image; inputting the high-resolution image into the adversarial network; performing adversarial learning alternately between the generative network and the two discriminators of the adversarial network, and constraining the training of the generative network until convergence by combining the generative network's L1-norm-based Charbonnier loss with the losses imposed on it by the two discriminators of the adversarial network; and, in the testing stage, inputting a low-resolution image into the trained generative network model and performing super-resolution reconstruction to obtain the final high-resolution image. The adversarial network of the invention further improves super-resolution accuracy by constraining the training of the generative network with two discriminators operating in the pixel domain and the feature-map domain, respectively.

Description

Image super-resolution method based on a dual-discriminator generative adversarial network
Technical Field
The invention relates to the field of digital image processing, and in particular to an image super-resolution method based on a dual-discriminator generative adversarial network.
Background
Images are the primary carrier of information in human society, and the field of image processing is therefore of great research value. High-resolution images are particularly important in digital imaging applications, yet in practice they can be difficult to obtain owing to a variety of intrinsic and extrinsic factors. The most direct improvement is on the imaging hardware side, but high-resolution optical sensors are expensive. Software-based improvement offers good generality and high efficiency: super-resolution algorithms that reconstruct high-quality images from low-quality inputs efficiently and quickly have broad application prospects.
Image super-resolution methods fall into three main categories: interpolation-based, modeling-based, and learning-based; learning-based methods can be further divided into those based on sparse representation and those based on convolutional neural networks. Interpolation-based methods are computationally efficient but tend to lose high-frequency texture detail. Modeling-based methods use prior information to constrain the solution space and improve on interpolation to some extent, but when the input image is small, little prior information can be exploited effectively and the super-resolution effect is poor. Learning-based methods achieve super-resolution by learning the intrinsic relationship between low-resolution and high-resolution images. In recent years, super-resolution methods based on convolutional neural networks have achieved high accuracy. However, a convolution kernel treats every channel and region of the feature map equally, which weakens the expression of the channels and regions that carry rich high-frequency information. In addition, conventional convolutional neural networks suffer from vanishing gradients and network degradation.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention provides an image super-resolution method based on a dual-discriminator generative adversarial network. The generative network introduces a mixed attention mechanism into a residual neural network, which reduces training difficulty, strengthens the feature expression capability of the network, accelerates convergence, and improves performance; the adversarial network further improves super-resolution accuracy by constraining the training of the generative network with two discriminators operating in the pixel domain and the feature-map domain, respectively.
The invention provides an image super-resolution method based on a dual-discriminator generative adversarial network, characterized in that the dual-discriminator generative adversarial network comprises a generative network based on a residual neural network with a mixed attention mechanism and an adversarial network containing two discriminators, and the method comprises the following steps:
a. constructing training samples in the training stage;
b. inputting the training samples into the generative network, which outputs a high-resolution image;
c. inputting the high-resolution image into the adversarial network, which contains two discriminators that judge, in the pixel domain and the feature-map domain respectively, whether the image input to them is a real high-resolution image or a high-resolution image generated by the generative network;
d. performing adversarial learning alternately between the generative network and the two discriminators of the adversarial network, and constraining the training of the generative network until convergence by combining the generative network's L1-norm-based Charbonnier loss with the losses imposed on it by the two discriminators of the adversarial network;
e. in the testing stage, inputting a low-resolution image into the trained generative network model and performing super-resolution reconstruction to obtain the final high-resolution image.
In this technical scheme, the generative network comprises a feature extraction unit, a nonlinear mapping unit, and a sub-pixel convolution upsampling unit. The feature extraction unit extracts a feature representation of the input low-resolution image by convolution and passes it to the subsequent nonlinear mapping unit; the nonlinear mapping unit extracts deeper features, such as edge and region features, through a number of cascaded basic units, applies a nonlinear mapping to them, and passes the result to the subsequent sub-pixel convolution unit; the sub-pixel convolution unit performs a fast pixel-rearrangement operation on the feature map to obtain the final output high-resolution image.
In the above technical solution, the nonlinear mapping unit in the generative network comprises 32 cascaded basic units. Each basic unit consists of a cascaded convolutional layer, activation layer, convolutional layer, and mixed attention block, together with a local skip connection that carries the basic unit's input to its output. The nonlinear mapping unit also contains a global skip connection that carries the input of the top basic unit to the output of the bottom basic unit, so that the network learns the residual between the input and output feature maps; this alleviates vanishing gradients and network degradation and reduces the difficulty of training a deep network.
In the above technical solution, the mixed attention block consists of cascaded convolutional and activation layers. It learns the corresponding descriptor for the input feature map in a single step; the descriptor assigns different weights to different channels and different regions, thereby enhancing the feature expression capability of the network.
In the above technical solution, discrimination in the pixel domain means that the output image of the generative network is input directly to a discriminator to judge whether it is a real high-resolution image or a high-resolution image generated by the generative network;
discrimination in the feature-map domain means that the output image of the generative network is input to a VGG-19 network, the pre-activation feature map before the fifth max-pooling and after the fourth convolution is obtained, and this feature map is used as the input of the other discriminator to judge whether the current input feature map belongs to a real high-resolution image or to a high-resolution image generated by the generative network.
In the above technical solution, each discriminator is fitted by a neural network composed of 8 cascaded basic units followed by a linear regression unit, an activation unit, and a second linear regression unit; each basic unit consists of a cascaded convolutional layer, batch normalization unit, and activation unit.
In the above technical solution, each discriminator measures the similarity between the real data distribution and the generated sample distribution using the Wasserstein distance:

$$W(P_r, P_g) = \inf_{\gamma \sim \Pi(P_r, P_g)} \mathbb{E}_{(x, \tilde{x}) \sim \gamma}\left[\lVert x - \tilde{x} \rVert\right]$$

where $P_r$ is the real distribution, $P_g$ is the generated sample distribution, $\gamma \sim \Pi(P_r, P_g)$ denotes a joint distribution of the real and generated samples, $\mathbb{E}_{(x,\tilde{x})\sim\gamma}[\lVert x - \tilde{x}\rVert]$ is the cost required to transform the real sample $x$ into the generated sample $\tilde{x}$ under the joint distribution $\gamma$, $\inf$ is the infimum (greatest lower bound) operator, and $W(P_r, P_g)$ is the minimum of this "cost". The Wasserstein distance cannot be solved directly; by the Kantorovich–Rubinstein duality, its approximate solution is converted into a search over continuous functions $f(\cdot)$ satisfying the Lipschitz condition, so that:

$$W(P_r, P_g) = \frac{1}{K}\sup_{\lVert f \rVert_{L} \le K}\left(\mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)]\right)$$

where $f(\cdot)$ is a continuous function satisfying the Lipschitz condition and $K$ is the Lipschitz constant. Weight clipping is adopted to ensure that $f(\cdot)$ satisfies the Lipschitz condition.
In this technical scheme, the Charbonnier loss based on the L1 norm (L1-Charbonnier loss) quantifies the similarity between the high-resolution image obtained by super-resolution and the real high-resolution image, and mini-batch learning is adopted during training. The loss function employed by the generative network is:

$$L_{cb} = \frac{1}{nHWC}\sum_{v=1}^{n}\sum_{i=1}^{H}\sum_{j=1}^{W}\sum_{k=1}^{C}\rho\left(I^{HR}_{v,i,j,k} - I^{SR}_{v,i,j,k}\right), \qquad \rho(x) = \sqrt{x^{2} + \epsilon^{2}}$$

where $\epsilon = 10^{-6}$; $I^{HR}$ is the real high-resolution image; $I^{SR}$ is the high-resolution image obtained by super-resolution; $H$, $W$, $C$ are the size and number of channels of the input image; $n$ is the mini-batch size; and $I_{v,i,j,k}$ is the pixel value at position $(i, j)$ of the $k$-th channel of the $v$-th image.

The loss imposed on the generative network by the discriminator operating in the pixel domain, $l^{pixel}_{WGAN}$, and the loss imposed on the generative network by the discriminator operating in the feature-map domain, $l^{feature}_{WGAN}$, are respectively:

$$l^{pixel}_{WGAN} = \mathbb{E}_{x \sim p_r}\left[D_{WGAN}(x)\right] - \mathbb{E}_{x \sim p_g}\left[D_{WGAN}(x)\right]$$

$$l^{feature}_{WGAN} = \mathbb{E}_{x \sim p_r}\left[D_{WGAN}(\mathrm{VGG}(x))\right] - \mathbb{E}_{x \sim p_g}\left[D_{WGAN}(\mathrm{VGG}(x))\right]$$

where $D_{WGAN}$ is the abstract function of the discriminator, $x \sim p_g$ means that sample $x$ obeys the generated sample data distribution, $x \sim p_r$ means that sample $x$ obeys the real data distribution, and $\mathrm{VGG}(\cdot)$ obtains the pre-activation feature map of the VGG-19 network before the fifth max-pooling and after the fourth convolution.

Combining the generative network's L1-norm-based Charbonnier loss with the dual discriminators' losses on the generative network, the network model is trained until convergence. The loss function of the generator is a weighted combination of three parts:

$$L_G = L_{cb} + \lambda_1 l^{pixel}_{WGAN} + \lambda_2 l^{feature}_{WGAN}$$

where $\lambda_1, \lambda_2$ are loss coefficients balancing the L1-norm-based Charbonnier loss and the Wasserstein-distance-based dual discriminators' respective losses on the generative network. The goal of the network in the training phase is to minimize the loss function $L_G$: the smaller $L_G$, the smaller the difference between the super-resolved image and the real high-resolution image, the better the super-resolution effect, and the higher the accuracy.
In this technical scheme, step a comprises cropping the input image, performing a bicubic downsampling operation on the cropped sub-images to obtain the corresponding low-resolution images, and obtaining more training samples through data augmentation such as rotation and mirroring.
Compared with the prior art, the invention has the following advantages:
The generative network employs a mixed attention mechanism; adding this unit to the network structure accelerates convergence, enhances the feature representation capability, and improves the performance of the network.
The generative network is built on a residual neural network: a local skip connection is added inside each basic unit, and a global skip connection directly links the top and bottom layers of the network, so that residuals are learned; this alleviates vanishing gradients and network degradation, reduces the difficulty of training a deep network, and improves performance.
The dual-discriminator adversarial network measures the similarity between the real data distribution and the generated sample distribution with the Wasserstein distance; compared with the original generative adversarial network it converges well and alleviates unstable training, vanishing gradients, and mode collapse. On the basis of the original Wasserstein-distance generative adversarial network, the invention adds a discriminator operating in the feature-map domain, which judges whether the current input feature map belongs to a real high-resolution image or to a high-resolution image generated by the generative network, and constrains the output of the generative network.
Drawings
FIG. 1 is a structural diagram of the mixed attention mechanism unit of the present invention.
Fig. 2 is a basic unit structure diagram of a generative network based on a residual mixed attention mechanism proposed by the present invention.
Fig. 3 is an overall network structure diagram of a generative network based on a residual mixed attention mechanism proposed by the present invention.
Fig. 4 is a network structure diagram of the Wasserstein-distance-based dual-discriminator generative adversarial network proposed by the present invention.
Fig. 5 is an overall network configuration diagram of the present invention.
FIG. 6 shows image results obtained by ×2 super-resolution with the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and specific embodiments, which are intended to aid a clear understanding and do not limit the invention.
The invention provides an image super-resolution method based on a dual-discriminator generative adversarial network, wherein the super-resolution generative adversarial network comprises a generative network based on a residual mixed attention mechanism and a dual-discriminator-based adversarial network, and the method comprises the following steps:
First, the input image is preprocessed and augmented to construct the training samples. Specifically, the input image is cropped into sub-images of size 96 × 96; each cropped sub-image is downsampled bicubically with Matlab's imresize function to obtain the corresponding low-resolution image of size 48 × 48; and more training samples are obtained through data augmentation such as rotation and mirroring.
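For concreteness, the following Python sketch reproduces this preprocessing pipeline; it substitutes Pillow/NumPy for Matlab's imresize, and the eight-fold rotation/mirror augmentation and the function name are illustrative assumptions:

```python
import numpy as np
from PIL import Image

CROP, SCALE = 96, 2  # 96x96 HR patches, x2 super-resolution (48x48 LR)

def make_training_pairs(image_path):
    """Crop an image into 96x96 sub-images, bicubically downsample each to
    48x48, and augment with rotations and mirroring (8 variants per patch)."""
    img = Image.open(image_path).convert("RGB")
    w, h = img.size
    pairs = []
    for top in range(0, h - CROP + 1, CROP):
        for left in range(0, w - CROP + 1, CROP):
            hr = img.crop((left, top, left + CROP, top + CROP))
            for k in range(4):                      # 0/90/180/270 degrees
                for flip in (False, True):          # plus mirror image
                    aug = hr.rotate(90 * k)
                    if flip:
                        aug = aug.transpose(Image.FLIP_LEFT_RIGHT)
                    lr = aug.resize((CROP // SCALE, CROP // SCALE),
                                    Image.BICUBIC)  # bicubic downsampling
                    pairs.append((np.asarray(lr), np.asarray(aug)))
    return pairs
```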
Secondly, the training samples are input into the generative network, which outputs a high-resolution image.
the generating network comprises a feature extraction unit, a nonlinear mapping unit and a sub-pixel convolution upsampling unit, wherein the feature extraction unit extracts feature representation of an input low-resolution image through convolution operation and inputs the feature representation to a subsequent nonlinear mapping unit; the nonlinear mapping unit extracts deeper features such as edge features, regional features and the like through a plurality of cascaded basic units, performs nonlinear mapping on the features, and inputs the features into a subsequent sub-pixel convolution unit; and the sub-pixel convolution unit carries out rapid pixel rearrangement operation on the characteristic graph to obtain a final output high-resolution image.
The overall network structure of the invention is shown in Fig. 5. The generative network is divided into five stages: feature extraction, nonlinear feature mapping, feature dimensionality reduction, sub-pixel convolution upsampling, and a final convolution that produces the output. The output high-resolution image is input into the two discriminators operating in the pixel domain and the feature-map domain to obtain the dual discriminators' respective losses on the generative network.
The nonlinear mapping unit comprises 32 cascaded basic units. Each basic unit consists of a cascaded convolutional layer, activation layer, convolutional layer, and mixed attention block, together with a local skip connection that carries the basic unit's input to its output. The nonlinear mapping unit also contains a global skip connection that carries the input of the top basic unit to the output of the bottom basic unit, so that the residual between the input and output feature maps is learned; this alleviates vanishing gradients and network degradation and reduces the difficulty of training a deep network. The mixed attention block consists of cascaded convolutional and activation layers; it learns the corresponding descriptor for the input feature map in a single step, and the descriptor assigns different weights to different channels and different regions, enhancing the feature expression capability of the network.
As shown in Fig. 1, the mixed attention block contains 2 cascaded convolutional and activation layers. The proposed mixed attention block learns the corresponding descriptor for the input feature map in a single step; compared with learning descriptors for different channels and different regions in separate stages, it has fewer parameters and higher efficiency.
The input and output feature maps have dimensions H × W × C; conv denotes the convolution operation; ReLU and Sigmoid are two different activation functions; and ⊙ is the Hadamard product. Given an input feature map of dimension H × W × C, a descriptor of dimension H × W × C is obtained through two cascaded convolutions and activations:

$$\tau = f\left(W_2\,\delta\left(W_1 x\right)\right)$$

where $W_1$ holds the parameters of the first convolution layer, which reduces the channel dimension of the feature map by a factor of 16 to yield a feature map of dimension H × W × C/16; $\delta(\cdot)$ is the ReLU activation; $W_2$ holds the parameters of the second convolution layer, which raises the channel dimension by a factor of 16; and $f(\cdot)$ is the Sigmoid activation. The two convolutions and activations thus first reduce and then restore the channel dimension, learning C description matrices $\tau_i$ corresponding to the different channels. Sparser description matrices are adaptively assigned to channels containing a large amount of redundant low-frequency information, so that the neural network focuses more on channels containing rich high-frequency information. Each description matrix $\tau_i$ has dimension H × W, corresponding element-wise to the $i$-th channel of the original input. After the two convolutions and activations, regions of the original input rich in high-frequency information are preserved while regions containing largely redundant low-frequency information are suppressed; the descriptor $\tau_i$ is Hadamard-multiplied with the input channel $I_i$, so that the network attends more to the regions within channel $i$ that are rich in high-frequency information. In summary, the feature representation produced by the mixed attention block is the Hadamard product of the descriptor $\tau$ with the original input.
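A minimal PyTorch sketch of such a mixed attention block is given below; the 3 × 3 kernel size of the two attention convolutions is an assumption, since the text fixes only the channel reduction factor of 16 and the ReLU/Sigmoid activations:

```python
import torch
import torch.nn as nn

class MixedAttentionBlock(nn.Module):
    """Learns an H x W x C descriptor in one step and applies it to the input
    via a Hadamard product (kernel size 3 is an assumption; reduction = 16)."""
    def __init__(self, channels=256, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            # first conv: channel reduction by a factor of 16, then ReLU
            nn.Conv2d(channels, channels // reduction, 3, padding=1),
            nn.ReLU(inplace=True),
            # second conv: channel restoration, then Sigmoid -> descriptor tau
            nn.Conv2d(channels // reduction, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        tau = self.body(x)      # descriptor, same H x W x C shape as input
        return x * tau          # Hadamard product with the original input
```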
The structure of the basic unit of the generative network based on the residual mixed attention mechanism is shown in Fig. 2, where ReLU and Sigmoid are two different activation functions, ⊙ is the Hadamard product, and ⊕ is pixel-wise addition. The basic unit consists of cascaded convolution, activation, convolution, and mixed attention blocks, with a local skip connection added inside the unit to alleviate vanishing gradients and network degradation. Specifically, the feature map input to a basic unit first undergoes a convolution to obtain a deeper feature representation; the convolution layer parameters are 3 × 3 × 256 × 256, i.e., 256 convolution kernels of size 3 × 3 with 256 channels each, the convolution stride is 1, and zero padding is applied at the edges to keep the input and output feature maps the same size. After passing through the mixed attention block, the convolved feature map is output to the next cascaded basic unit to extract deeper features.
The overall structure of the generative network based on the residual mixed attention mechanism is shown in Fig. 3; it consists of three parts: feature extraction, nonlinear mapping, and sub-pixel convolution upsampling. The feature extraction unit is a convolutional layer whose kernel parameters are 3 × 3 × 3 × 256, i.e., 256 convolution kernels of size 3 × 3 with 3 channels each. The nonlinear mapping unit consists of 32 cascaded basic units; a local skip connection is added inside each basic unit, and a global skip connection directly links the top and bottom layers of the network so that residuals are learned, alleviating vanishing gradients and network degradation and reducing the difficulty of training a deep network. Finally, a sub-pixel convolution layer and a convolution complete the upsampling to obtain the final output high-resolution image.
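Combining the pieces, the following sketch shows one plausible PyTorch layout of the generative network, reusing the MixedAttentionBlock sketch above: basic units with local skips, a global skip across the 32-unit mapping stage, and PixelShuffle sub-pixel upsampling. The 3 × 3 × 256 × 256 body convolutions and the 3 × 3 × 3 × 256 head follow the text; the remaining layer arrangement is an assumption:

```python
import torch.nn as nn

class BasicUnit(nn.Module):
    """Conv -> ReLU -> Conv -> mixed attention, plus a local skip connection."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)  # 3x3x256x256
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.attention = MixedAttentionBlock(channels)

    def forward(self, x):
        out = self.attention(self.conv2(self.relu(self.conv1(x))))
        return out + x  # local skip connection

class Generator(nn.Module):
    """Feature extraction -> 32 basic units with a global skip -> sub-pixel upsampling."""
    def __init__(self, scale=2, channels=256, n_units=32):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)   # 3x3x3x256 feature extraction
        self.body = nn.Sequential(*[BasicUnit(channels) for _ in range(n_units)])
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),  # feature dim. reduction
            nn.PixelShuffle(scale),                             # sub-pixel convolution
        )
        self.tail = nn.Conv2d(3, 3, 3, padding=1)          # final convolution

    def forward(self, lr):
        feat = self.head(lr)
        feat = self.body(feat) + feat   # global skip connection (top -> bottom)
        return self.tail(self.upsample(feat))
```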
Third, the high-resolution image is input into the generative adversarial network. The generative adversarial network contains two discriminators that judge, in the pixel domain and the feature-map domain respectively, whether the input image is a real high-resolution image or a high-resolution image generated by the generative network. The generative network's loss and the dual discriminators' respective losses on it are combined, gradient information is back-propagated, and the parameters of the generative network and the two discriminators are updated. As shown in Fig. 4, LeakyReLU is the activation function, with negative_slope set to 0.2; the batch normalization layer normalizes each batch of input data to a normal distribution with mean 0 and variance 1; and Linear is a linear regression function. The image data input to a discriminator passes through 8 cascaded convolution, batch normalization, and activation layers that extract deep features of the input image, then through cascaded linear regression, activation, and linear regression layers, yielding an approximation of the Wasserstein distance between the real data distribution and the generated data distribution.
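A sketch of one such discriminator follows, matching the 8 conv–BN–LeakyReLU basic units and the Linear → LeakyReLU → Linear tail described here; the channel schedule, strides, and the adaptive pooling before the linear layers are assumptions, and the final layer deliberately has no Sigmoid because the network approximates a Wasserstein distance rather than a probability:

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """8 cascaded conv/BN/LeakyReLU basic units, then Linear -> LeakyReLU -> Linear."""
    def __init__(self, in_channels=3, base=64):
        super().__init__()
        layers, c_in = [], in_channels
        # assumed channel schedule 64,64,128,128,256,256,512,512; stride 2 on
        # every second unit to reduce spatial size
        for i, c_out in enumerate([base, base, 2 * base, 2 * base,
                                   4 * base, 4 * base, 8 * base, 8 * base]):
            layers += [
                nn.Conv2d(c_in, c_out, 3, stride=2 if i % 2 else 1, padding=1),
                nn.BatchNorm2d(c_out),
                nn.LeakyReLU(0.2, inplace=True),  # negative_slope = 0.2
            ]
            c_in = c_out
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # assumption: makes any input size work
        self.regress = nn.Sequential(
            nn.Linear(8 * base, 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024, 1),  # scalar critic score, no Sigmoid (Wasserstein)
        )

    def forward(self, x):
        f = self.pool(self.features(x)).flatten(1)
        return self.regress(f)
```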
The generative adversarial network is a Wasserstein-distance-based dual-discriminator GAN. Each discriminator measures the similarity between the real data distribution and the generated sample distribution using the Wasserstein distance:

$$W(P_r, P_g) = \inf_{\gamma \sim \Pi(P_r, P_g)} \mathbb{E}_{(x, \tilde{x}) \sim \gamma}\left[\lVert x - \tilde{x} \rVert\right]$$

where $P_r$ is the real distribution, $P_g$ is the generated sample distribution, $\gamma \sim \Pi(P_r, P_g)$ denotes a joint distribution of the real and generated samples, $\mathbb{E}_{(x,\tilde{x})\sim\gamma}[\lVert x - \tilde{x}\rVert]$ is the cost required to transform the real sample $x$ into the generated sample $\tilde{x}$ under the joint distribution $\gamma$, $\inf$ is the infimum (greatest lower bound) operator, and $W(P_r, P_g)$ is the minimum of this "cost". The Wasserstein distance cannot be solved directly; by the Kantorovich–Rubinstein duality, its approximate solution is converted into a search over continuous functions $f(\cdot)$ satisfying the Lipschitz condition, so that:

$$W(P_r, P_g) = \frac{1}{K}\sup_{\lVert f \rVert_{L} \le K}\left(\mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)]\right)$$

where $f(\cdot)$ is a continuous function satisfying the Lipschitz condition and $K$ is the Lipschitz constant. Weight clipping is adopted to ensure that $f(\cdot)$ satisfies the Lipschitz condition. Each discriminator $f(\cdot)$ is fitted by a neural network composed of 8 cascaded basic units followed by linear regression, LeakyReLU activation, and linear regression; each basic unit consists of cascaded convolution, batch normalization, and LeakyReLU activation.
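A minimal sketch of the weight clipping and of one critic update under the usual WGAN recipe; the clip threshold of 0.01 is an assumption, since the text states only that weight clipping enforces the Lipschitz condition:

```python
import torch

def clip_weights(critic, threshold=0.01):
    """Weight clipping: confine every parameter to [-threshold, threshold]
    so the critic satisfies the Lipschitz condition."""
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-threshold, threshold)

def critic_step(critic, optimizer, real, fake):
    """One discriminator update: maximize E[D(real)] - E[D(fake)], i.e.
    minimize the negative of the Wasserstein estimate, then clip weights."""
    optimizer.zero_grad()
    loss = critic(fake.detach()).mean() - critic(real).mean()
    loss.backward()
    optimizer.step()
    clip_weights(critic)
    return loss.item()
```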
The dual discriminator comprises two discriminators $f(\cdot)$ that operate in the pixel domain and the feature-map domain, respectively. In the pixel domain, the output image of the generative network is input directly to a discriminator, which judges whether it is a real high-resolution image or a high-resolution image generated by the generative network. In the feature-map domain, the output image of the generative network is first input to a VGG-19 network; the pre-activation feature map obtained before the fifth max-pooling and after the fourth convolution serves as the input to the feature-map-domain discriminator, which judges whether the current input feature map belongs to a real high-resolution image or to a high-resolution image generated by the generative network.
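The feature-map-domain input can be taken from a pretrained VGG-19, e.g. torchvision's, where the pre-activation output of the fourth convolution of the fifth block (before the fifth max-pooling) corresponds to slicing `features[:35]`; this indexing and the legacy `pretrained=True` flag are assumptions tied to that particular library:

```python
import torch.nn as nn
from torchvision.models import vgg19

class VGGFeatureExtractor(nn.Module):
    """Pre-activation VGG-19 feature map after the 4th conv of the 5th block,
    i.e. before the 5th max-pooling: features[:35] in torchvision indexing."""
    def __init__(self):
        super().__init__()
        vgg = vgg19(pretrained=True)
        self.slice = nn.Sequential(*list(vgg.features.children())[:35])
        for p in self.slice.parameters():
            p.requires_grad = False  # fixed feature extractor, not trained

    def forward(self, img):
        return self.slice(img)
```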
Fourth, the training of the generative network is constrained by combining its L1-norm-based Charbonnier loss with the two discriminators' respective losses on it; the generative network and the two discriminators learn adversarially, and the generative network model is trained until convergence.
The generator quantifies the similarity between the high-resolution image obtained by super-resolution and the real high-resolution image using the L1-norm-based Charbonnier loss together with the WGAN-based losses on the generator, guiding network learning. The L1-norm-based Charbonnier loss is:

$$L_{cb} = \frac{1}{nHWC}\sum_{v=1}^{n}\sum_{i=1}^{H}\sum_{j=1}^{W}\sum_{k=1}^{C}\rho\left(I^{HR}_{v,i,j,k} - I^{SR}_{v,i,j,k}\right), \qquad \rho(x) = \sqrt{x^{2} + \epsilon^{2}}$$

where $\epsilon = 10^{-6}$; $I^{HR}$ is the real high-resolution image; $I^{SR}$ is the high-resolution image obtained by super-resolution; $H$, $W$, $C$ are the size and number of channels of the input image; $n$ is the mini-batch size; and $I_{v,i,j,k}$ is the pixel value at position $(i, j)$ of the $k$-th channel of the $v$-th image.
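In code, this mini-batch Charbonnier loss is a one-liner (ε = 10⁻⁶ as stated; the mean runs over the batch, spatial, and channel dimensions):

```python
import torch

def charbonnier_loss(sr, hr, eps=1e-6):
    """L1 Charbonnier loss: mean of sqrt((HR - SR)^2 + eps^2) over the
    mini-batch, spatial positions, and channels."""
    return torch.sqrt((hr - sr) ** 2 + eps ** 2).mean()
```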
The loss imposed on the generative network by the discriminator operating in the pixel domain, $l^{pixel}_{WGAN}$, and the loss imposed on the generator by the discriminator operating in the feature-map domain, $l^{feature}_{WGAN}$, are respectively:

$$l^{pixel}_{WGAN} = \mathbb{E}_{x \sim p_r}\left[D_{WGAN}(x)\right] - \mathbb{E}_{x \sim p_g}\left[D_{WGAN}(x)\right]$$

$$l^{feature}_{WGAN} = \mathbb{E}_{x \sim p_r}\left[D_{WGAN}(\mathrm{VGG}(x))\right] - \mathbb{E}_{x \sim p_g}\left[D_{WGAN}(\mathrm{VGG}(x))\right]$$

where $D_{WGAN}$ is the abstract function of the discriminator, $x \sim p_g$ means that sample $x$ obeys the generated sample data distribution, $x \sim p_r$ means that sample $x$ obeys the real data distribution, and $\mathrm{VGG}(\cdot)$ is used to obtain the feature map of the VGG-19 network before the fifth max-pooling and after the fourth convolution.
Combining the generative network's L1-norm-based Charbonnier loss with the dual discriminators' losses on the generative network, the network model is trained until convergence. The loss function of the generator is a weighted combination of three parts:

$$L_G = L_{cb} + \lambda_1 l^{pixel}_{WGAN} + \lambda_2 l^{feature}_{WGAN}$$

where $\lambda_1, \lambda_2$ are loss coefficients balancing the L1-norm-based Charbonnier loss and the Wasserstein-distance-based dual discriminators' respective losses on the generative network. The goal of the network in the training phase is to minimize the loss function $L_G$: the smaller $L_G$, the smaller the difference between the super-resolved image and the real high-resolution image, the better the super-resolution effect, and the higher the accuracy.
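Assembling the three parts, a sketch of the generator objective follows; only the −E[D(·)] terms depend on the generator (the E[D(real)] terms are constant with respect to it and are omitted), and the weights λ1, λ2 are hyperparameters the text leaves unspecified, so the values below are placeholders. `charbonnier_loss` is the sketch defined above:

```python
def generator_loss(sr, hr, d_pixel, d_feature, vgg, lam1=1e-3, lam2=1e-3):
    """L_G = Charbonnier + lam1 * pixel-domain WGAN term
                         + lam2 * feature-map-domain WGAN term."""
    l_cb = charbonnier_loss(sr, hr)
    l_pix = -d_pixel(sr).mean()           # pixel-domain critic on generated image
    l_feat = -d_feature(vgg(sr)).mean()   # feature-map-domain critic on VGG features
    return l_cb + lam1 * l_pix + lam2 * l_feat
```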
Fifth, the low-resolution image is input into the trained generative network model and super-resolution reconstruction is performed to obtain the final high-resolution image. The image obtained by super-resolution reconstruction is equivalent to the low-resolution image magnified by the network's upscaling factor.
To demonstrate the effectiveness of the invention, the DIV2K dataset, widely used in image super-resolution research, served as the training set, and the Set5, Set14, BSD100, Urban100, and Manga109 datasets served as the test sets. In the experiments, bicubic interpolation and two representative convolutional-neural-network-based methods were selected for comparison, together with a variant that uses only the generative network of this invention. To ensure fair comparison, all methods were tested in the same hardware environment.
The two representative convolutional-neural-network-based methods are:
Method 1: the method proposed by Dong et al. Reference: Dong C, Loy C C, He K, et al. Image super-resolution using deep convolutional networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38(2): 295-307.
Method 2: the method proposed by Lai et al. Reference: Lai W S, Huang J B, Ahuja N, et al. Fast and accurate image super-resolution with deep Laplacian pyramid networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
The selected evaluation indexes are as follows: the objective indexes widely used to evaluate image super-resolution are Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM), and these are adopted here for objective evaluation. In addition, the time required to super-resolve a single image is used as a further objective reference index.
During evaluation, the obtained high-resolution image and the real high-resolution image are first cropped, removing a border of pixels whose width equals the magnification factor; the images are then converted from the RGB color space to the YCbCr color space, the Y channel is extracted, and the objective indexes are computed on the Y channel only. The PSNR is computed as:

$$\mathrm{PSNR} = 10 \log_{10}\frac{255^{2}}{\frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H}\left(I^{HR}_{i,j} - I^{SR}_{i,j}\right)^{2}}$$

where $W$, $H$ are the image dimensions, $I^{HR}$ is the real high-resolution image, $I^{SR}$ is the high-resolution image obtained by super-resolution, and $I_{i,j}$ is the pixel value at position $(i, j)$. The larger the PSNR value, the better the quality of the compared image.
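A NumPy sketch of this protocol — Y-channel extraction, border cropping by the scale factor, then PSNR with peak value 255 as in the formula above; the conversion constants are the standard ITU-R BT.601 ones, an assumption insofar as the text does not spell them out:

```python
import numpy as np

def rgb_to_y(img):
    """ITU-R BT.601 luma from an 8-bit RGB image (Y in [16, 235])."""
    img = img.astype(np.float64)
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1]
                   + 24.966 * img[..., 2]) / 255.0

def psnr_y(hr, sr, scale=2):
    """PSNR on the Y channel after removing a border of `scale` pixels."""
    y_hr = rgb_to_y(hr)[scale:-scale, scale:-scale]
    y_sr = rgb_to_y(sr)[scale:-scale, scale:-scale]
    mse = np.mean((y_hr - y_sr) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```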
The formula for SSIM is:

$$\mathrm{SSIM}(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^{2} + \mu_y^{2} + C_1} \cdot \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^{2} + \sigma_y^{2} + C_2} \cdot \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}$$

where $\mu$ and $\sigma$ are the pixel mean and standard deviation of the two compared images, $\sigma_{xy}$ is their covariance, and $C_1, C_2, C_3$ are constants that prevent the denominators from being 0. SSIM takes values in $[0, 1]$; the closer the value is to 1, the more similar the two compared images.
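A corresponding single-window SSIM sketch; with the conventional choice C3 = C2/2 (an assumption — the text does not fix the constants) the contrast and structure terms combine into the two-factor form used below, and the sliding-window averaging of standard SSIM implementations is omitted for brevity:

```python
import numpy as np

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window SSIM over two 8-bit images; with C3 = C2/2 the contrast
    and structure terms reduce to the familiar two-factor form."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```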
Objective evaluation results of each method at ×2 super-resolution are compared in the table below.

[The comparison table is present only as an image in the source publication; its numerical PSNR/SSIM values are not reproduced here.]
from the experimental data of the methods in the table above, it can be seen that the peak signal-to-noise ratio (PSNR) and the Structural Similarity (SSIM) of the present invention are significantly improved over the comparative method, and are significantly better than the method of generating a network only using the present invention. As can be seen from the image effect graph obtained by the super-resolution of 2 times shown in fig. 6, the high-resolution image obtained by the method has better image sharpness and still can retain more texture details. In summary, the present invention is an effective image super-resolution method.
Details not described in this specification belong to the prior art well known to those skilled in the art.

Claims (7)

1. An image super-resolution method based on a dual-discriminator generative adversarial network, characterized in that the dual-discriminator generative adversarial network comprises a generative network based on a residual neural network with a mixed attention mechanism and an adversarial network containing two discriminators, the method comprising the following steps:
a. constructing training samples in the training stage;
b. inputting the training samples into the generative network, which outputs a high-resolution image;
c. inputting the high-resolution image into the adversarial network, which contains two discriminators that judge, in the pixel domain and the feature-map domain respectively, whether the image input to them is a real high-resolution image or a high-resolution image generated by the generative network;
d. performing adversarial learning alternately between the generative network and the two discriminators of the adversarial network, and constraining the training of the generative network until convergence by combining the generative network's L1-norm-based Charbonnier loss with the losses imposed on it by the two discriminators of the adversarial network;
e. inputting a low-resolution image into the trained generative network model in the testing stage, and performing super-resolution reconstruction to obtain the final high-resolution image;
the similarity degree of a high-resolution image obtained by super resolution and a real high-resolution image is quantified by using Charbonier Loss L1-Charbonier-Loss based on an L1 norm, and a training process adopts small-batch learning; the expression of the loss function employed by the generative network is:
Figure FDA0003304986890000011
wherein
Figure FDA0003304986890000012
Taking 10 from epsilon-6H, W and C are the size and the channel number of the input image respectively, and n is the number of small-batch learning;
Figure FDA0003304986890000013
a pixel value indicating that the position of the k channel of the v real high-resolution image is (i, j);
Figure FDA0003304986890000014
a pixel value indicating that the position of the k channel of the vth image obtained by performing super-resolution is (i, j); i isSRA high resolution image obtained for performing super resolution;
charbonnier loss and dual discriminator pair generation based on L1 norm for joint generation networksLoss of generative network, training the network model until convergence is reached, loss of generative network by a discriminator operating in the pixel domain
Figure FDA0003304986890000015
With the loss of the generator by discriminators operating in the domain of the feature map
Figure FDA0003304986890000016
Are respectively:
Figure FDA0003304986890000017
Figure FDA0003304986890000018
wherein DWGANAs an abstract function of the discriminator, x to pgMeans that a sample x is subject to generating a sample data distribution, wherein x-prThe sample x obeys real data distribution, and VGG (-) is used for obtaining a feature map after the fifth maximum pooling of the VGG-19 network and before the fourth convolution;
the method comprises the following steps of combining Charbonnier loss of a generative network based on an L1 norm and the loss of a double-discriminator to the generative network, training a network model until convergence is achieved, wherein a loss function of a generator is composed of three parts by weight, and the expression is as follows:
Figure FDA0003304986890000021
wherein λ12To balance the respective loss factors of the Charbonier loss based on the L1 norm and the double-discriminator based on the Wasserstein distance to the generative network, the objective of the network in the training phase is to minimize the loss function LG,LGThe smaller the difference between the high-resolution image obtained by performing super-resolution and the real high-resolution image, the smaller the super-resolutionThe better the effect of the rate, the higher the accuracy.
2. The image super-resolution method based on a dual-discriminator generative adversarial network according to claim 1, wherein the generative network comprises a feature extraction unit, a nonlinear mapping unit, and a sub-pixel convolution upsampling unit; the feature extraction unit extracts a feature representation of the input low-resolution image by convolution and passes it to the subsequent nonlinear mapping unit; the nonlinear mapping unit extracts deeper features, such as edge and region features, through a number of cascaded basic units, applies a nonlinear mapping to them, and passes the result to the subsequent sub-pixel convolution unit; and the sub-pixel convolution unit performs a fast pixel-rearrangement operation on the feature map to obtain the final output high-resolution image.
3. The image super-resolution method based on a dual-discriminator generative adversarial network according to claim 2, wherein the nonlinear mapping unit in the generative network comprises 32 cascaded basic units; each basic unit consists of a cascaded convolutional layer, activation layer, convolutional layer, and mixed attention block, together with a local skip connection that carries the basic unit's input to its output; and the nonlinear mapping unit further comprises a global skip connection that carries the input of the top basic unit to the output of the bottom basic unit, so that the generative network learns the residual between the input and output feature maps.
4. The image super-resolution method based on a dual-discriminator generative adversarial network according to claim 3, wherein the mixed attention block consists of cascaded convolutional and activation layers; it learns the corresponding descriptor for the input feature map in a single step, and the descriptor assigns different weights to different channels and different regions.
5. The image super-resolution method based on a dual-discriminator generative adversarial network according to claim 1, wherein discrimination in the pixel domain means that the output image of the generative network is input directly to a discriminator to judge whether it is a real high-resolution image or a high-resolution image generated by the generative network;
and discrimination in the feature-map domain means that the output image of the generative network is input to a VGG-19 network, the pre-activation feature map before the fifth max-pooling and after the fourth convolution is obtained, and this feature map is used as the input of the other discriminator to judge whether the current input feature map belongs to a real high-resolution image or to a high-resolution image generated by the generative network.
6. The image super-resolution method based on a dual-discriminator generative adversarial network according to claim 5, wherein each discriminator is fitted by a neural network composed of 8 cascaded basic units followed by a linear regression unit, an activation unit, and a second linear regression unit, and each basic unit consists of a cascaded convolutional layer, batch normalization unit, and activation unit.
7. The image super-resolution method based on a dual-discriminator generative adversarial network according to claim 1, wherein step a comprises cropping the input image, performing a bicubic downsampling operation on the cropped sub-images to obtain the corresponding low-resolution images, and acquiring more training samples through rotation and mirroring data augmentation.
CN201911076333.1A 2019-11-06 2019-11-06 Image super-resolution method based on a dual-discriminator generative adversarial network Active CN111028146B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911076333.1A CN111028146B (en) 2019-11-06 2019-11-06 Image super-resolution method based on a dual-discriminator generative adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911076333.1A CN111028146B (en) 2019-11-06 2019-11-06 Image super-resolution method based on a dual-discriminator generative adversarial network

Publications (2)

Publication Number Publication Date
CN111028146A CN111028146A (en) 2020-04-17
CN111028146B true CN111028146B (en) 2022-03-18

Family

ID=70200916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911076333.1A Active CN111028146B (en) 2019-11-06 2019-11-06 Image super-resolution method based on a dual-discriminator generative adversarial network

Country Status (1)

Country Link
CN (1) CN111028146B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111524205A (en) * 2020-04-23 2020-08-11 北京信息科技大学 Image coloring processing method and device based on loop generation countermeasure network
CN111860618B (en) * 2020-07-01 2024-05-14 杭州健培科技有限公司 Bidirectional GAN model for pathological data conversion and construction and application methods thereof
CN111881920B (en) * 2020-07-16 2024-04-09 深圳力维智联技术有限公司 Network adaptation method of large-resolution image and neural network training device
CN111968032B (en) * 2020-07-23 2022-09-09 太原理工大学 Self-adaptive sampling single-pixel imaging method
CN111897809A (en) * 2020-07-24 2020-11-06 中国人民解放军陆军装甲兵学院 Command information system data generation method based on generation countermeasure network
CN111741018B (en) * 2020-07-24 2020-12-01 中国航空油料集团有限公司 Industrial control data attack sample generation method and system, electronic device and storage medium
CN111950619B (en) * 2020-08-05 2022-09-09 东北林业大学 Active learning method based on dual-generation countermeasure network
CN112992304B (en) * 2020-08-24 2023-10-13 湖南数定智能科技有限公司 High-resolution red eye case data generation method, device and storage medium
CN112232395B (en) * 2020-10-08 2023-10-27 西北工业大学 Semi-supervised image classification method for generating countermeasure network based on joint training
CN112396110B (en) * 2020-11-20 2024-02-02 南京大学 Method for generating augmented image of countermeasure cascade network
CN112686119B (en) * 2020-12-25 2022-12-09 陕西师范大学 License plate motion blurred image processing method based on self-attention generation countermeasure network
CN112598578B (en) * 2020-12-28 2022-12-30 北京航空航天大学 Super-resolution reconstruction system and method for nuclear magnetic resonance image
CN112837232B (en) * 2021-01-13 2022-10-04 山东省科学院海洋仪器仪表研究所 Underwater image enhancement and detail recovery method
CN112837221B (en) * 2021-01-26 2022-08-19 合肥工业大学 SAR image super-resolution reconstruction method based on dual discrimination
CN113012045B (en) * 2021-02-23 2022-07-15 西南交通大学 Generation countermeasure network for synthesizing medical image
CN113361566B (en) * 2021-05-17 2022-11-15 长春工业大学 Method for migrating generative confrontation network by using confrontation learning and discriminant learning
CN113724139B (en) * 2021-11-02 2022-03-15 南京理工大学 Unsupervised infrared single-image super-resolution method for generating countermeasure network based on double discriminators
CN115115783B (en) * 2022-07-08 2023-08-15 西南石油大学 Digital rock core construction method and system for simulating shale matrix nano-micro pores

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10971142B2 (en) * 2017-10-27 2021-04-06 Baidu Usa Llc Systems and methods for robust speech recognition using generative adversarial networks
CN108460717A (en) * 2018-03-14 2018-08-28 儒安科技有限公司 A kind of image generating method of the generation confrontation network based on double arbiters
CN109002686B (en) * 2018-04-26 2022-04-08 浙江工业大学 Multi-grade chemical process soft measurement modeling method capable of automatically generating samples
CN109816593B (en) * 2019-01-18 2022-12-20 大连海事大学 Super-resolution image reconstruction method for generating countermeasure network based on attention mechanism

Also Published As

Publication number Publication date
CN111028146A (en) 2020-04-17

Similar Documents

Publication Publication Date Title
CN111028146B (en) Image super-resolution method based on a dual-discriminator generative adversarial network
CN111768342B (en) Human face super-resolution method based on attention mechanism and multi-stage feedback supervision
CN110033410B (en) Image reconstruction model training method, image super-resolution reconstruction method and device
CN110570353B (en) Super-resolution reconstruction method for generating single image of countermeasure network by dense connection
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN109035149B (en) License plate image motion blur removing method based on deep learning
CN112016507B (en) Super-resolution-based vehicle detection method, device, equipment and storage medium
US11107194B2 (en) Neural network for enhancing original image, and computer-implemented method for enhancing original image using neural network
CN111325751A (en) CT image segmentation system based on attention convolution neural network
CN109872305B (en) No-reference stereo image quality evaluation method based on quality map generation network
CN111915490A (en) License plate image super-resolution reconstruction model and method based on multi-scale features
CN110070574B (en) Binocular vision stereo matching method based on improved PSMAT net
CN111951164B (en) Image super-resolution reconstruction network structure and image reconstruction effect analysis method
CN112070670A (en) Face super-resolution method and system of global-local separation attention mechanism
CN110738663A (en) Double-domain adaptive module pyramid network and unsupervised domain adaptive image segmentation method
CN111127316A (en) Single face image super-resolution method and system based on SNGAN network
CN114266957B (en) Hyperspectral image super-resolution restoration method based on multi-degradation mode data augmentation
CN105513033A (en) Super-resolution reconstruction method based on non-local simultaneous sparse representation
CN112149526B (en) Lane line detection method and system based on long-distance information fusion
CN113066065A (en) No-reference image quality detection method, system, terminal and medium
CN112149662A (en) Multi-mode fusion significance detection method based on expansion volume block
CN117575915A (en) Image super-resolution reconstruction method, terminal equipment and storage medium
CN108335265B (en) Rapid image super-resolution reconstruction method and device based on sample learning
CN108846797B (en) Image super-resolution method based on two training sets
CN110728352A (en) Large-scale image classification method based on deep convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20230117

Address after: 2009, Floor 2, Building 3, China Agricultural University International Business Park, No. 10, Tianxiu Road, Haidian District, Beijing, 100091

Patentee after: Zhongyao Tiandi (Beijing) Information Technology Co.,Ltd.

Address before: 430070 Hubei Province, Wuhan city Hongshan District Luoshi Road No. 122

Patentee before: WUHAN University OF TECHNOLOGY