CN111028146B - Image super-resolution method based on a dual-discriminator generative adversarial network - Google Patents

Image super-resolution method based on a dual-discriminator generative adversarial network

Info

Publication number
CN111028146B
CN111028146B (application CN201911076333.1A)
Authority
CN
China
Prior art keywords
network
resolution
image
generating
resolution image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911076333.1A
Other languages
Chinese (zh)
Other versions
CN111028146A (en)
Inventor
刘可文
马圆
黄睿挺
熊红霞
房攀攀
陈亚雷
李小军
刘朝阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongyao Tiandi (Beijing) Information Technology Co., Ltd.
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN201911076333.1A
Publication of CN111028146A
Application granted
Publication of CN111028146B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 — Geometric image transformations in the plane of the image
    • G06T3/40 — Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 — Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4046 — Scaling using neural networks
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G06N3/08 — Learning methods
    • G06N3/084 — Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an image super-resolution method based on a dual-discriminator generative adversarial network, comprising the following steps: constructing training samples in the training stage; inputting the training samples into a generative network, which outputs a high-resolution image; inputting the high-resolution image into the adversarial network; performing adversarial learning alternately between the generative network and the two discriminators of the adversarial network, and constraining the training of the generative network until convergence by combining the generative network's L1-norm-based Charbonnier loss with the losses imposed on it by the two discriminators of the adversarial network; and, in the testing stage, inputting a low-resolution image into the trained generative network model and performing super-resolution reconstruction to obtain the final high-resolution image. The adversarial network of the invention further improves super-resolution accuracy by constraining the training of the generative network with two discriminators operating in the pixel domain and the feature-map domain, respectively.

Description

Image super-resolution method based on a dual-discriminator generative adversarial network
Technical Field
The invention relates to the field of digital image processing, and in particular to an image super-resolution method based on a dual-discriminator generative adversarial network.
Background
Images are the primary carrier of information in human society, and the field of image processing is therefore of great research value. High-resolution images are particularly important in digital imaging applications, yet in practice they can be difficult to obtain owing to a variety of intrinsic and extrinsic factors. The most direct improvement is on the imaging hardware side, but high-resolution optical sensors are expensive. Software-based improvement offers good generality and high efficiency: super-resolution algorithms that reconstruct high-quality images from low-quality inputs efficiently and quickly have broad application prospects.
Image super-resolution methods fall into three main categories: interpolation-based, modeling-based, and learning-based; learning-based methods can be further divided into those based on sparse representation and those based on convolutional neural networks. Interpolation-based methods are computationally efficient but tend to lose high-frequency texture detail. Modeling-based methods use prior information to constrain the solution space and improve on interpolation to some extent, but when the input image is small, little prior information can be exploited effectively and the super-resolution effect is poor. Learning-based methods achieve super-resolution by learning the intrinsic relationship between low-resolution and high-resolution images. In recent years, super-resolution methods based on convolutional neural networks have achieved high accuracy. However, a convolution kernel treats every channel and region of the feature map equally, which weakens the expression of the channels and regions that carry rich high-frequency information. In addition, conventional convolutional neural networks suffer from vanishing gradients and network degradation.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention provides an image super-resolution method based on a dual-discriminator generative adversarial network. The generative network introduces a mixed attention mechanism into a residual neural network, which reduces training difficulty, strengthens the feature expression capability of the network, accelerates convergence, and improves performance; the adversarial network further improves super-resolution accuracy by constraining the training of the generative network with two discriminators operating in the pixel domain and the feature-map domain, respectively.
The invention provides an image super-resolution method based on a dual-discriminator generative adversarial network, characterized in that the dual-discriminator generative adversarial network comprises a generative network based on a residual neural network with a mixed attention mechanism and an adversarial network containing two discriminators, and the method comprises the following steps:
a. constructing training samples in the training stage;
b. inputting the training samples into the generative network, which outputs a high-resolution image;
c. inputting the high-resolution image into the adversarial network, which contains two discriminators that judge, in the pixel domain and the feature-map domain respectively, whether the image input to them is a real high-resolution image or a high-resolution image generated by the generative network;
d. performing adversarial learning alternately between the generative network and the two discriminators of the adversarial network, and constraining the training of the generative network until convergence by combining the generative network's L1-norm-based Charbonnier loss with the losses imposed on it by the two discriminators of the adversarial network;
e. in the testing stage, inputting a low-resolution image into the trained generative network model and performing super-resolution reconstruction to obtain the final high-resolution image.
In this technical scheme, the generative network comprises a feature extraction unit, a nonlinear mapping unit, and a sub-pixel convolution upsampling unit. The feature extraction unit extracts a feature representation of the input low-resolution image by convolution and passes it to the subsequent nonlinear mapping unit; the nonlinear mapping unit extracts deeper features, such as edge and region features, through a number of cascaded basic units, applies a nonlinear mapping to them, and passes the result to the subsequent sub-pixel convolution unit; the sub-pixel convolution unit performs a fast pixel-rearrangement operation on the feature map to obtain the final output high-resolution image.
In the above technical solution, the nonlinear mapping unit in the generative network comprises 32 cascaded basic units. Each basic unit consists of a cascaded convolutional layer, activation layer, convolutional layer, and mixed attention block, together with a local skip connection that carries the basic unit's input to its output. The nonlinear mapping unit also contains a global skip connection that carries the input of the top basic unit to the output of the bottom basic unit, so that the network learns the residual between the input and output feature maps; this alleviates vanishing gradients and network degradation and reduces the difficulty of training a deep network.
In the above technical solution, the mixed attention block consists of cascaded convolutional and activation layers. It learns the corresponding descriptor for the input feature map in a single step; the descriptor assigns different weights to different channels and different regions, thereby enhancing the feature expression capability of the network.
In the above technical solution, discrimination in the pixel domain means that the output image of the generative network is input directly to a discriminator to judge whether it is a real high-resolution image or a high-resolution image generated by the generative network;
discrimination in the feature-map domain means that the output image of the generative network is input to a VGG-19 network, the pre-activation feature map before the fifth max-pooling and after the fourth convolution is obtained, and this feature map is used as the input of the other discriminator to judge whether the current input feature map belongs to a real high-resolution image or to a high-resolution image generated by the generative network.
In the above technical solution, each discriminator is fitted by a neural network composed of 8 cascaded basic units followed by a linear regression unit, an activation unit, and a second linear regression unit; each basic unit consists of a cascaded convolutional layer, batch normalization unit, and activation unit.
In the above technical solution, each discriminator measures the similarity between the real data distribution and the generated sample distribution using the Wasserstein distance:

$$W(P_r, P_g) = \inf_{\gamma \sim \Pi(P_r, P_g)} \mathbb{E}_{(x, \tilde{x}) \sim \gamma}\left[\lVert x - \tilde{x} \rVert\right]$$

where $P_r$ is the real distribution, $P_g$ is the generated sample distribution, $\gamma \sim \Pi(P_r, P_g)$ denotes a joint distribution of the real and generated samples, $\mathbb{E}_{(x,\tilde{x})\sim\gamma}[\lVert x - \tilde{x}\rVert]$ is the cost required to transform the real sample $x$ into the generated sample $\tilde{x}$ under the joint distribution $\gamma$, $\inf$ is the infimum (greatest lower bound) operator, and $W(P_r, P_g)$ is the minimum of this "cost". The Wasserstein distance cannot be solved directly; by the Kantorovich–Rubinstein duality, its approximate solution is converted into a search over continuous functions $f(\cdot)$ satisfying the Lipschitz condition, so that:

$$W(P_r, P_g) = \frac{1}{K}\sup_{\lVert f \rVert_{L} \le K}\left(\mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)]\right)$$

where $f(\cdot)$ is a continuous function satisfying the Lipschitz condition and $K$ is the Lipschitz constant. Weight clipping is adopted to ensure that $f(\cdot)$ satisfies the Lipschitz condition.
In this technical scheme, the Charbonnier loss based on the L1 norm (L1-Charbonnier loss) quantifies the similarity between the high-resolution image obtained by super-resolution and the real high-resolution image, and mini-batch learning is adopted during training. The loss function employed by the generative network is:

$$L_{cb} = \frac{1}{nHWC}\sum_{v=1}^{n}\sum_{i=1}^{H}\sum_{j=1}^{W}\sum_{k=1}^{C}\rho\left(I^{HR}_{v,i,j,k} - I^{SR}_{v,i,j,k}\right), \qquad \rho(x) = \sqrt{x^{2} + \epsilon^{2}}$$

where $\epsilon = 10^{-6}$; $I^{HR}$ is the real high-resolution image; $I^{SR}$ is the high-resolution image obtained by super-resolution; $H$, $W$, $C$ are the size and number of channels of the input image; $n$ is the mini-batch size; and $I_{v,i,j,k}$ is the pixel value at position $(i, j)$ of the $k$-th channel of the $v$-th image.

The loss imposed on the generative network by the discriminator operating in the pixel domain, $l^{pixel}_{WGAN}$, and the loss imposed on the generative network by the discriminator operating in the feature-map domain, $l^{feature}_{WGAN}$, are respectively:

$$l^{pixel}_{WGAN} = \mathbb{E}_{x \sim p_r}\left[D_{WGAN}(x)\right] - \mathbb{E}_{x \sim p_g}\left[D_{WGAN}(x)\right]$$

$$l^{feature}_{WGAN} = \mathbb{E}_{x \sim p_r}\left[D_{WGAN}(\mathrm{VGG}(x))\right] - \mathbb{E}_{x \sim p_g}\left[D_{WGAN}(\mathrm{VGG}(x))\right]$$

where $D_{WGAN}$ is the abstract function of the discriminator, $x \sim p_g$ means that sample $x$ obeys the generated sample data distribution, $x \sim p_r$ means that sample $x$ obeys the real data distribution, and $\mathrm{VGG}(\cdot)$ obtains the pre-activation feature map of the VGG-19 network before the fifth max-pooling and after the fourth convolution.

Combining the generative network's L1-norm-based Charbonnier loss with the dual discriminators' losses on the generative network, the network model is trained until convergence. The loss function of the generator is a weighted combination of three parts:

$$L_G = L_{cb} + \lambda_1 l^{pixel}_{WGAN} + \lambda_2 l^{feature}_{WGAN}$$

where $\lambda_1, \lambda_2$ are loss coefficients balancing the L1-norm-based Charbonnier loss and the Wasserstein-distance-based dual discriminators' respective losses on the generative network. The goal of the network in the training phase is to minimize the loss function $L_G$: the smaller $L_G$, the smaller the difference between the super-resolved image and the real high-resolution image, the better the super-resolution effect, and the higher the accuracy.
In this technical scheme, step a comprises cropping the input image, performing a bicubic downsampling operation on the cropped sub-images to obtain the corresponding low-resolution images, and obtaining more training samples through data augmentation such as rotation and mirroring.
Compared with the prior art, the invention has the following advantages:
The generative network employs a mixed attention mechanism; adding this unit to the network structure accelerates convergence, enhances the feature representation capability, and improves the performance of the network.
The generative network is built on a residual neural network: a local skip connection is added inside each basic unit, and a global skip connection directly links the top and bottom layers of the network, so that residuals are learned; this alleviates vanishing gradients and network degradation, reduces the difficulty of training a deep network, and improves performance.
The dual-discriminator adversarial network measures the similarity between the real data distribution and the generated sample distribution with the Wasserstein distance; compared with the original generative adversarial network it converges well and alleviates unstable training, vanishing gradients, and mode collapse. On the basis of the original Wasserstein-distance generative adversarial network, the invention adds a discriminator operating in the feature-map domain, which judges whether the current input feature map belongs to a real high-resolution image or to a high-resolution image generated by the generative network, and constrains the output of the generative network.
Drawings
FIG. 1 is a structural diagram of the mixed attention mechanism unit of the present invention.
Fig. 2 is a basic unit structure diagram of a generative network based on a residual mixed attention mechanism proposed by the present invention.
Fig. 3 is an overall network structure diagram of a generative network based on a residual mixed attention mechanism proposed by the present invention.
Fig. 4 is a network structure diagram of the Wasserstein-distance-based dual-discriminator generative adversarial network proposed by the present invention.
Fig. 5 is an overall network configuration diagram of the present invention.
FIG. 6 shows image results obtained by ×2 super-resolution with the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and specific embodiments, which are intended to aid a clear understanding and do not limit the invention.
The invention provides an image super-resolution method based on a dual-discriminator generative adversarial network, wherein the super-resolution generative adversarial network comprises a generative network based on a residual mixed attention mechanism and a dual-discriminator-based adversarial network, and the method comprises the following steps:
First, the input image is preprocessed and augmented to construct the training samples. Specifically, the input image is cropped into sub-images of size 96 × 96; each cropped sub-image is downsampled bicubically with Matlab's imresize function to obtain the corresponding low-resolution image of size 48 × 48; and more training samples are obtained through data augmentation such as rotation and mirroring.
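For concreteness, the following Python sketch reproduces this preprocessing pipeline; it substitutes Pillow/NumPy for Matlab's imresize, and the eight-fold rotation/mirror augmentation and the function name are illustrative assumptions:

```python
import numpy as np
from PIL import Image

CROP, SCALE = 96, 2  # 96x96 HR patches, x2 super-resolution (48x48 LR)

def make_training_pairs(image_path):
    """Crop an image into 96x96 sub-images, bicubically downsample each to
    48x48, and augment with rotations and mirroring (8 variants per patch)."""
    img = Image.open(image_path).convert("RGB")
    w, h = img.size
    pairs = []
    for top in range(0, h - CROP + 1, CROP):
        for left in range(0, w - CROP + 1, CROP):
            hr = img.crop((left, top, left + CROP, top + CROP))
            for k in range(4):                      # 0/90/180/270 degrees
                for flip in (False, True):          # plus mirror image
                    aug = hr.rotate(90 * k)
                    if flip:
                        aug = aug.transpose(Image.FLIP_LEFT_RIGHT)
                    lr = aug.resize((CROP // SCALE, CROP // SCALE),
                                    Image.BICUBIC)  # bicubic downsampling
                    pairs.append((np.asarray(lr), np.asarray(aug)))
    return pairs
```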
Secondly, the training samples are input into the generative network, which outputs a high-resolution image.
the generating network comprises a feature extraction unit, a nonlinear mapping unit and a sub-pixel convolution upsampling unit, wherein the feature extraction unit extracts feature representation of an input low-resolution image through convolution operation and inputs the feature representation to a subsequent nonlinear mapping unit; the nonlinear mapping unit extracts deeper features such as edge features, regional features and the like through a plurality of cascaded basic units, performs nonlinear mapping on the features, and inputs the features into a subsequent sub-pixel convolution unit; and the sub-pixel convolution unit carries out rapid pixel rearrangement operation on the characteristic graph to obtain a final output high-resolution image.
The overall network structure of the invention is shown in Fig. 5. The generative network is divided into five stages: feature extraction, nonlinear feature mapping, feature dimensionality reduction, sub-pixel convolution upsampling, and a final convolution that produces the output. The output high-resolution image is input into the two discriminators operating in the pixel domain and the feature-map domain to obtain the dual discriminators' respective losses on the generative network.
The nonlinear mapping unit comprises 32 cascaded basic units. Each basic unit consists of a cascaded convolutional layer, activation layer, convolutional layer, and mixed attention block, together with a local skip connection that carries the basic unit's input to its output. The nonlinear mapping unit also contains a global skip connection that carries the input of the top basic unit to the output of the bottom basic unit, so that the residual between the input and output feature maps is learned; this alleviates vanishing gradients and network degradation and reduces the difficulty of training a deep network. The mixed attention block consists of cascaded convolutional and activation layers; it learns the corresponding descriptor for the input feature map in a single step, and the descriptor assigns different weights to different channels and different regions, enhancing the feature expression capability of the network.
As shown in Fig. 1, the mixed attention block contains 2 cascaded convolutional and activation layers. The proposed mixed attention block learns the corresponding descriptor for the input feature map in a single step; compared with learning descriptors for different channels and different regions in separate stages, it has fewer parameters and higher efficiency.
The input and output feature maps have dimensions H × W × C; conv denotes the convolution operation; ReLU and Sigmoid are two different activation functions; and ⊙ is the Hadamard product. Given an input feature map of dimension H × W × C, a descriptor of dimension H × W × C is obtained through two cascaded convolutions and activations:

$$\tau = f\left(W_2\,\delta\left(W_1 x\right)\right)$$

where $W_1$ holds the parameters of the first convolution layer, which reduces the channel dimension of the feature map by a factor of 16 to yield a feature map of dimension H × W × C/16; $\delta(\cdot)$ is the ReLU activation; $W_2$ holds the parameters of the second convolution layer, which raises the channel dimension by a factor of 16; and $f(\cdot)$ is the Sigmoid activation. The two convolutions and activations thus first reduce and then restore the channel dimension, learning C description matrices $\tau_i$ corresponding to the different channels. Sparser description matrices are adaptively assigned to channels containing a large amount of redundant low-frequency information, so that the neural network focuses more on channels containing rich high-frequency information. Each description matrix $\tau_i$ has dimension H × W, corresponding element-wise to the $i$-th channel of the original input. After the two convolutions and activations, regions of the original input rich in high-frequency information are preserved while regions containing largely redundant low-frequency information are suppressed; the descriptor $\tau_i$ is Hadamard-multiplied with the input channel $I_i$, so that the network attends more to the regions within channel $i$ that are rich in high-frequency information. In summary, the feature representation produced by the mixed attention block is the Hadamard product of the descriptor $\tau$ with the original input.
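A minimal PyTorch sketch of such a mixed attention block is given below; the 3 × 3 kernel size of the two attention convolutions is an assumption, since the text fixes only the channel reduction factor of 16 and the ReLU/Sigmoid activations:

```python
import torch
import torch.nn as nn

class MixedAttentionBlock(nn.Module):
    """Learns an H x W x C descriptor in one step and applies it to the input
    via a Hadamard product (kernel size 3 is an assumption; reduction = 16)."""
    def __init__(self, channels=256, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            # first conv: channel reduction by a factor of 16, then ReLU
            nn.Conv2d(channels, channels // reduction, 3, padding=1),
            nn.ReLU(inplace=True),
            # second conv: channel restoration, then Sigmoid -> descriptor tau
            nn.Conv2d(channels // reduction, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        tau = self.body(x)      # descriptor, same H x W x C shape as input
        return x * tau          # Hadamard product with the original input
```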
The structure of the basic unit of the generative network based on the residual mixed attention mechanism is shown in Fig. 2, where ReLU and Sigmoid are two different activation functions, ⊙ is the Hadamard product, and ⊕ is pixel-wise addition. The basic unit consists of cascaded convolution, activation, convolution, and mixed attention blocks, with a local skip connection added inside the unit to alleviate vanishing gradients and network degradation. Specifically, the feature map input to a basic unit first undergoes a convolution to obtain a deeper feature representation; the convolution layer parameters are 3 × 3 × 256 × 256, i.e., 256 convolution kernels of size 3 × 3 with 256 channels each, the convolution stride is 1, and zero padding is applied at the edges to keep the input and output feature maps the same size. After passing through the mixed attention block, the convolved feature map is output to the next cascaded basic unit to extract deeper features.
The overall structure of the generative network based on the residual mixed attention mechanism is shown in Fig. 3; it consists of three parts: feature extraction, nonlinear mapping, and sub-pixel convolution upsampling. The feature extraction unit is a convolutional layer whose kernel parameters are 3 × 3 × 3 × 256, i.e., 256 convolution kernels of size 3 × 3 with 3 channels each. The nonlinear mapping unit consists of 32 cascaded basic units; a local skip connection is added inside each basic unit, and a global skip connection directly links the top and bottom layers of the network so that residuals are learned, alleviating vanishing gradients and network degradation and reducing the difficulty of training a deep network. Finally, a sub-pixel convolution layer and a convolution complete the upsampling to obtain the final output high-resolution image.
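Combining the pieces, the following sketch shows one plausible PyTorch layout of the generative network, reusing the MixedAttentionBlock sketch above: basic units with local skips, a global skip across the 32-unit mapping stage, and PixelShuffle sub-pixel upsampling. The 3 × 3 × 256 × 256 body convolutions and the 3 × 3 × 3 × 256 head follow the text; the remaining layer arrangement is an assumption:

```python
import torch.nn as nn

class BasicUnit(nn.Module):
    """Conv -> ReLU -> Conv -> mixed attention, plus a local skip connection."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)  # 3x3x256x256
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.attention = MixedAttentionBlock(channels)

    def forward(self, x):
        out = self.attention(self.conv2(self.relu(self.conv1(x))))
        return out + x  # local skip connection

class Generator(nn.Module):
    """Feature extraction -> 32 basic units with a global skip -> sub-pixel upsampling."""
    def __init__(self, scale=2, channels=256, n_units=32):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)   # 3x3x3x256 feature extraction
        self.body = nn.Sequential(*[BasicUnit(channels) for _ in range(n_units)])
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),  # feature dim. reduction
            nn.PixelShuffle(scale),                             # sub-pixel convolution
        )
        self.tail = nn.Conv2d(3, 3, 3, padding=1)          # final convolution

    def forward(self, lr):
        feat = self.head(lr)
        feat = self.body(feat) + feat   # global skip connection (top -> bottom)
        return self.tail(self.upsample(feat))
```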
Third, the high-resolution image is input into the generative adversarial network. The generative adversarial network contains two discriminators that judge, in the pixel domain and the feature-map domain respectively, whether the input image is a real high-resolution image or a high-resolution image generated by the generative network. The generative network's loss and the dual discriminators' respective losses on it are combined, gradient information is back-propagated, and the parameters of the generative network and the two discriminators are updated. As shown in Fig. 4, LeakyReLU is the activation function, with negative_slope set to 0.2; the batch normalization layer normalizes each batch of input data to a normal distribution with mean 0 and variance 1; and Linear is a linear regression function. The image data input to a discriminator passes through 8 cascaded convolution, batch normalization, and activation layers that extract deep features of the input image, then through cascaded linear regression, activation, and linear regression layers, yielding an approximation of the Wasserstein distance between the real data distribution and the generated data distribution.
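A sketch of one such discriminator follows, matching the 8 conv–BN–LeakyReLU basic units and the Linear → LeakyReLU → Linear tail described here; the channel schedule, strides, and the adaptive pooling before the linear layers are assumptions, and the final layer deliberately has no Sigmoid because the network approximates a Wasserstein distance rather than a probability:

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """8 cascaded conv/BN/LeakyReLU basic units, then Linear -> LeakyReLU -> Linear."""
    def __init__(self, in_channels=3, base=64):
        super().__init__()
        layers, c_in = [], in_channels
        # assumed channel schedule 64,64,128,128,256,256,512,512; stride 2 on
        # every second unit to reduce spatial size
        for i, c_out in enumerate([base, base, 2 * base, 2 * base,
                                   4 * base, 4 * base, 8 * base, 8 * base]):
            layers += [
                nn.Conv2d(c_in, c_out, 3, stride=2 if i % 2 else 1, padding=1),
                nn.BatchNorm2d(c_out),
                nn.LeakyReLU(0.2, inplace=True),  # negative_slope = 0.2
            ]
            c_in = c_out
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # assumption: makes any input size work
        self.regress = nn.Sequential(
            nn.Linear(8 * base, 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024, 1),  # scalar critic score, no Sigmoid (Wasserstein)
        )

    def forward(self, x):
        f = self.pool(self.features(x)).flatten(1)
        return self.regress(f)
```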
The generative adversarial network is a Wasserstein-distance-based dual-discriminator GAN. Each discriminator measures the similarity between the real data distribution and the generated sample distribution using the Wasserstein distance:

$$W(P_r, P_g) = \inf_{\gamma \sim \Pi(P_r, P_g)} \mathbb{E}_{(x, \tilde{x}) \sim \gamma}\left[\lVert x - \tilde{x} \rVert\right]$$

where $P_r$ is the real distribution, $P_g$ is the generated sample distribution, $\gamma \sim \Pi(P_r, P_g)$ denotes a joint distribution of the real and generated samples, $\mathbb{E}_{(x,\tilde{x})\sim\gamma}[\lVert x - \tilde{x}\rVert]$ is the cost required to transform the real sample $x$ into the generated sample $\tilde{x}$ under the joint distribution $\gamma$, $\inf$ is the infimum (greatest lower bound) operator, and $W(P_r, P_g)$ is the minimum of this "cost". The Wasserstein distance cannot be solved directly; by the Kantorovich–Rubinstein duality, its approximate solution is converted into a search over continuous functions $f(\cdot)$ satisfying the Lipschitz condition, so that:

$$W(P_r, P_g) = \frac{1}{K}\sup_{\lVert f \rVert_{L} \le K}\left(\mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)]\right)$$

where $f(\cdot)$ is a continuous function satisfying the Lipschitz condition and $K$ is the Lipschitz constant. Weight clipping is adopted to ensure that $f(\cdot)$ satisfies the Lipschitz condition. Each discriminator $f(\cdot)$ is fitted by a neural network composed of 8 cascaded basic units followed by linear regression, LeakyReLU activation, and linear regression; each basic unit consists of cascaded convolution, batch normalization, and LeakyReLU activation.
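A minimal sketch of the weight clipping and of one critic update under the usual WGAN recipe; the clip threshold of 0.01 is an assumption, since the text states only that weight clipping enforces the Lipschitz condition:

```python
import torch

def clip_weights(critic, threshold=0.01):
    """Weight clipping: confine every parameter to [-threshold, threshold]
    so the critic satisfies the Lipschitz condition."""
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-threshold, threshold)

def critic_step(critic, optimizer, real, fake):
    """One discriminator update: maximize E[D(real)] - E[D(fake)], i.e.
    minimize the negative of the Wasserstein estimate, then clip weights."""
    optimizer.zero_grad()
    loss = critic(fake.detach()).mean() - critic(real).mean()
    loss.backward()
    optimizer.step()
    clip_weights(critic)
    return loss.item()
```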
The dual discriminator comprises two discriminators $f(\cdot)$ that operate in the pixel domain and the feature-map domain, respectively. In the pixel domain, the output image of the generative network is input directly to a discriminator, which judges whether it is a real high-resolution image or a high-resolution image generated by the generative network. In the feature-map domain, the output image of the generative network is first input to a VGG-19 network; the pre-activation feature map obtained before the fifth max-pooling and after the fourth convolution serves as the input to the feature-map-domain discriminator, which judges whether the current input feature map belongs to a real high-resolution image or to a high-resolution image generated by the generative network.
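The feature-map-domain input can be taken from a pretrained VGG-19, e.g. torchvision's, where the pre-activation output of the fourth convolution of the fifth block (before the fifth max-pooling) corresponds to slicing `features[:35]`; this indexing and the legacy `pretrained=True` flag are assumptions tied to that particular library:

```python
import torch.nn as nn
from torchvision.models import vgg19

class VGGFeatureExtractor(nn.Module):
    """Pre-activation VGG-19 feature map after the 4th conv of the 5th block,
    i.e. before the 5th max-pooling: features[:35] in torchvision indexing."""
    def __init__(self):
        super().__init__()
        vgg = vgg19(pretrained=True)
        self.slice = nn.Sequential(*list(vgg.features.children())[:35])
        for p in self.slice.parameters():
            p.requires_grad = False  # fixed feature extractor, not trained

    def forward(self, img):
        return self.slice(img)
```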
Fourth, the training of the generative network is constrained by combining its L1-norm-based Charbonnier loss with the two discriminators' respective losses on it; the generative network and the two discriminators learn adversarially, and the generative network model is trained until convergence.
The generator quantifies the similarity between the high-resolution image obtained by super-resolution and the real high-resolution image using the L1-norm-based Charbonnier loss together with the WGAN-based losses on the generator, guiding network learning. The L1-norm-based Charbonnier loss is:

$$L_{cb} = \frac{1}{nHWC}\sum_{v=1}^{n}\sum_{i=1}^{H}\sum_{j=1}^{W}\sum_{k=1}^{C}\rho\left(I^{HR}_{v,i,j,k} - I^{SR}_{v,i,j,k}\right), \qquad \rho(x) = \sqrt{x^{2} + \epsilon^{2}}$$

where $\epsilon = 10^{-6}$; $I^{HR}$ is the real high-resolution image; $I^{SR}$ is the high-resolution image obtained by super-resolution; $H$, $W$, $C$ are the size and number of channels of the input image; $n$ is the mini-batch size; and $I_{v,i,j,k}$ is the pixel value at position $(i, j)$ of the $k$-th channel of the $v$-th image.
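In code, this mini-batch Charbonnier loss is a one-liner (ε = 10⁻⁶ as stated; the mean runs over the batch, spatial, and channel dimensions):

```python
import torch

def charbonnier_loss(sr, hr, eps=1e-6):
    """L1 Charbonnier loss: mean of sqrt((HR - SR)^2 + eps^2) over the
    mini-batch, spatial positions, and channels."""
    return torch.sqrt((hr - sr) ** 2 + eps ** 2).mean()
```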
The loss imposed on the generative network by the discriminator operating in the pixel domain, $l^{pixel}_{WGAN}$, and the loss imposed on the generator by the discriminator operating in the feature-map domain, $l^{feature}_{WGAN}$, are respectively:

$$l^{pixel}_{WGAN} = \mathbb{E}_{x \sim p_r}\left[D_{WGAN}(x)\right] - \mathbb{E}_{x \sim p_g}\left[D_{WGAN}(x)\right]$$

$$l^{feature}_{WGAN} = \mathbb{E}_{x \sim p_r}\left[D_{WGAN}(\mathrm{VGG}(x))\right] - \mathbb{E}_{x \sim p_g}\left[D_{WGAN}(\mathrm{VGG}(x))\right]$$

where $D_{WGAN}$ is the abstract function of the discriminator, $x \sim p_g$ means that sample $x$ obeys the generated sample data distribution, $x \sim p_r$ means that sample $x$ obeys the real data distribution, and $\mathrm{VGG}(\cdot)$ is used to obtain the feature map of the VGG-19 network before the fifth max-pooling and after the fourth convolution.
Combining the generative network's L1-norm-based Charbonnier loss with the dual discriminators' losses on the generative network, the network model is trained until convergence. The loss function of the generator is a weighted combination of three parts:

$$L_G = L_{cb} + \lambda_1 l^{pixel}_{WGAN} + \lambda_2 l^{feature}_{WGAN}$$

where $\lambda_1, \lambda_2$ are loss coefficients balancing the L1-norm-based Charbonnier loss and the Wasserstein-distance-based dual discriminators' respective losses on the generative network. The goal of the network in the training phase is to minimize the loss function $L_G$: the smaller $L_G$, the smaller the difference between the super-resolved image and the real high-resolution image, the better the super-resolution effect, and the higher the accuracy.
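Assembling the three parts, a sketch of the generator objective follows; only the −E[D(·)] terms depend on the generator (the E[D(real)] terms are constant with respect to it and are omitted), and the weights λ1, λ2 are hyperparameters the text leaves unspecified, so the values below are placeholders. `charbonnier_loss` is the sketch defined above:

```python
def generator_loss(sr, hr, d_pixel, d_feature, vgg, lam1=1e-3, lam2=1e-3):
    """L_G = Charbonnier + lam1 * pixel-domain WGAN term
                         + lam2 * feature-map-domain WGAN term."""
    l_cb = charbonnier_loss(sr, hr)
    l_pix = -d_pixel(sr).mean()           # pixel-domain critic on generated image
    l_feat = -d_feature(vgg(sr)).mean()   # feature-map-domain critic on VGG features
    return l_cb + lam1 * l_pix + lam2 * l_feat
```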
Fifth, the low-resolution image is input into the trained generative network model and super-resolution reconstruction is performed to obtain the final high-resolution image. The image obtained by super-resolution reconstruction is equivalent to the low-resolution image magnified by the network's upscaling factor.
To demonstrate the effectiveness of the invention, the DIV2K dataset, widely used in image super-resolution research, served as the training set, and the Set5, Set14, BSD100, Urban100, and Manga109 datasets served as the test sets. In the experiments, bicubic interpolation and two representative convolutional-neural-network-based methods were selected for comparison, together with a variant that uses only the generative network of this invention. To ensure fair comparison, all methods were tested in the same hardware environment.
The two representative convolutional-neural-network-based methods are:
Method 1: the method proposed by Dong et al. Reference: Dong C, Loy C C, He K, et al. Image super-resolution using deep convolutional networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38(2): 295-307.
Method 2: the method proposed by Lai et al. Reference: Lai W S, Huang J B, Ahuja N, et al. Fast and accurate image super-resolution with deep Laplacian pyramid networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
The selected evaluation indexes are as follows: the objective indexes widely used to evaluate image super-resolution are Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM), and these are adopted here for objective evaluation. In addition, the time required to super-resolve a single image is used as a further objective reference index.
During evaluation, the obtained high-resolution image and the real high-resolution image are first cropped, removing a border of pixels whose width equals the magnification factor; the images are then converted from the RGB color space to the YCbCr color space, the Y channel is extracted, and the objective indexes are computed on the Y channel only. The PSNR is computed as:

$$\mathrm{PSNR} = 10 \log_{10}\frac{255^{2}}{\frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H}\left(I^{HR}_{i,j} - I^{SR}_{i,j}\right)^{2}}$$

where $W$, $H$ are the image dimensions, $I^{HR}$ is the real high-resolution image, $I^{SR}$ is the high-resolution image obtained by super-resolution, and $I_{i,j}$ is the pixel value at position $(i, j)$. The larger the PSNR value, the better the quality of the compared image.
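A NumPy sketch of this protocol — Y-channel extraction, border cropping by the scale factor, then PSNR with peak value 255 as in the formula above; the conversion constants are the standard ITU-R BT.601 ones, an assumption insofar as the text does not spell them out:

```python
import numpy as np

def rgb_to_y(img):
    """ITU-R BT.601 luma from an 8-bit RGB image (Y in [16, 235])."""
    img = img.astype(np.float64)
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1]
                   + 24.966 * img[..., 2]) / 255.0

def psnr_y(hr, sr, scale=2):
    """PSNR on the Y channel after removing a border of `scale` pixels."""
    y_hr = rgb_to_y(hr)[scale:-scale, scale:-scale]
    y_sr = rgb_to_y(sr)[scale:-scale, scale:-scale]
    mse = np.mean((y_hr - y_sr) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```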
The formula for SSIM is:

$$\mathrm{SSIM}(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^{2} + \mu_y^{2} + C_1} \cdot \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^{2} + \sigma_y^{2} + C_2} \cdot \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}$$

where $\mu$ and $\sigma$ are the pixel mean and standard deviation of the two compared images, $\sigma_{xy}$ is their covariance, and $C_1, C_2, C_3$ are constants that prevent the denominators from being 0. SSIM takes values in $[0, 1]$; the closer the value is to 1, the more similar the two compared images.
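A corresponding single-window SSIM sketch; with the conventional choice C3 = C2/2 (an assumption — the text does not fix the constants) the contrast and structure terms combine into the two-factor form used below, and the sliding-window averaging of standard SSIM implementations is omitted for brevity:

```python
import numpy as np

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window SSIM over two 8-bit images; with C3 = C2/2 the contrast
    and structure terms reduce to the familiar two-factor form."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```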
Objective evaluation results of each method at ×2 super-resolution are compared in the table below.

[The comparison table is present only as an image in the source publication; its numerical PSNR/SSIM values are not reproduced here.]
from the experimental data of the methods in the table above, it can be seen that the peak signal-to-noise ratio (PSNR) and the Structural Similarity (SSIM) of the present invention are significantly improved over the comparative method, and are significantly better than the method of generating a network only using the present invention. As can be seen from the image effect graph obtained by the super-resolution of 2 times shown in fig. 6, the high-resolution image obtained by the method has better image sharpness and still can retain more texture details. In summary, the present invention is an effective image super-resolution method.
Details not described in this specification belong to the prior art well known to those skilled in the art.

Claims (7)

1. An image super-resolution method based on a dual-discriminator generative adversarial network, characterized in that the dual-discriminator generative adversarial network comprises a generative network based on a residual neural network with a mixed attention mechanism and an adversarial network containing two discriminators, the method comprising the following steps:
a. constructing training samples in the training stage;
b. inputting the training samples into the generative network, which outputs a high-resolution image;
c. inputting the high-resolution image into the adversarial network, which contains two discriminators that judge, in the pixel domain and the feature-map domain respectively, whether the image input to them is a real high-resolution image or a high-resolution image generated by the generative network;
d. performing adversarial learning alternately between the generative network and the two discriminators of the adversarial network, and constraining the training of the generative network until convergence by combining the generative network's L1-norm-based Charbonnier loss with the losses imposed on it by the two discriminators of the adversarial network;
e. inputting a low-resolution image into the trained generative network model in the testing stage, and performing super-resolution reconstruction to obtain the final high-resolution image;
the similarity degree of a high-resolution image obtained by super resolution and a real high-resolution image is quantified by using Charbonier Loss L1-Charbonier-Loss based on an L1 norm, and a training process adopts small-batch learning; the expression of the loss function employed by the generative network is:
Figure FDA0003304986890000011
wherein
Figure FDA0003304986890000012
Taking 10 from epsilon-6H, W and C are the size and the channel number of the input image respectively, and n is the number of small-batch learning;
Figure FDA0003304986890000013
a pixel value indicating that the position of the k channel of the v real high-resolution image is (i, j);
Figure FDA0003304986890000014
a pixel value indicating that the position of the k channel of the vth image obtained by performing super-resolution is (i, j); i isSRA high resolution image obtained for performing super resolution;
charbonnier loss and dual discriminator pair generation based on L1 norm for joint generation networksLoss of generative network, training the network model until convergence is reached, loss of generative network by a discriminator operating in the pixel domain
Figure FDA0003304986890000015
With the loss of the generator by discriminators operating in the domain of the feature map
Figure FDA0003304986890000016
Are respectively:
Figure FDA0003304986890000017
Figure FDA0003304986890000018
wherein DWGANAs an abstract function of the discriminator, x to pgMeans that a sample x is subject to generating a sample data distribution, wherein x-prThe sample x obeys real data distribution, and VGG (-) is used for obtaining a feature map after the fifth maximum pooling of the VGG-19 network and before the fourth convolution;
the method comprises the following steps of combining Charbonnier loss of a generative network based on an L1 norm and the loss of a double-discriminator to the generative network, training a network model until convergence is achieved, wherein a loss function of a generator is composed of three parts by weight, and the expression is as follows:
Figure FDA0003304986890000021
wherein λ12To balance the respective loss factors of the Charbonier loss based on the L1 norm and the double-discriminator based on the Wasserstein distance to the generative network, the objective of the network in the training phase is to minimize the loss function LG,LGThe smaller the difference between the high-resolution image obtained by performing super-resolution and the real high-resolution image, the smaller the super-resolutionThe better the effect of the rate, the higher the accuracy.
2. The image super-resolution method based on a dual-discriminator generative adversarial network according to claim 1, wherein the generative network comprises a feature extraction unit, a nonlinear mapping unit, and a sub-pixel convolution upsampling unit; the feature extraction unit extracts a feature representation of the input low-resolution image by convolution and passes it to the subsequent nonlinear mapping unit; the nonlinear mapping unit extracts deeper features, such as edge and region features, through a number of cascaded basic units, applies a nonlinear mapping to them, and passes the result to the subsequent sub-pixel convolution unit; and the sub-pixel convolution unit performs a fast pixel-rearrangement operation on the feature map to obtain the final output high-resolution image.
3. The image super-resolution method based on a dual-discriminator generative adversarial network according to claim 2, wherein the nonlinear mapping unit in the generative network comprises 32 cascaded basic units; each basic unit consists of a cascaded convolutional layer, activation layer, convolutional layer, and mixed attention block, together with a local skip connection that carries the basic unit's input to its output; and the nonlinear mapping unit further comprises a global skip connection that carries the input of the top basic unit to the output of the bottom basic unit, so that the generative network learns the residual between the input and output feature maps.
4. The image super-resolution method based on a dual-discriminator generative adversarial network according to claim 3, wherein the mixed attention block consists of cascaded convolutional and activation layers; it learns the corresponding descriptor for the input feature map in a single step, and the descriptor assigns different weights to different channels and different regions.
5. The image super-resolution method based on a dual-discriminator generative adversarial network according to claim 1, wherein discrimination in the pixel domain means that the output image of the generative network is input directly to a discriminator to judge whether it is a real high-resolution image or a high-resolution image generated by the generative network;
and discrimination in the feature-map domain means that the output image of the generative network is input to a VGG-19 network, the pre-activation feature map before the fifth max-pooling and after the fourth convolution is obtained, and this feature map is used as the input of the other discriminator to judge whether the current input feature map belongs to a real high-resolution image or to a high-resolution image generated by the generative network.
6. The image super-resolution method based on a dual-discriminator generative adversarial network according to claim 5, wherein each discriminator is fitted by a neural network composed of 8 cascaded basic units followed by a linear regression unit, an activation unit, and a second linear regression unit, and each basic unit consists of a cascaded convolutional layer, batch normalization unit, and activation unit.
7. The image super-resolution method based on a dual-discriminator generative adversarial network according to claim 1, wherein step a comprises cropping the input image, performing a bicubic downsampling operation on the cropped sub-images to obtain the corresponding low-resolution images, and acquiring more training samples through rotation and mirroring data augmentation.
CN201911076333.1A 2019-11-06 2019-11-06 Image super-resolution method based on a dual-discriminator generative adversarial network Active CN111028146B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911076333.1A CN111028146B (en) 2019-11-06 2019-11-06 Image super-resolution method based on a dual-discriminator generative adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911076333.1A CN111028146B (en) 2019-11-06 2019-11-06 Image super-resolution method based on a dual-discriminator generative adversarial network

Publications (2)

Publication Number Publication Date
CN111028146A CN111028146A (en) 2020-04-17
CN111028146B true CN111028146B (en) 2022-03-18

Family

ID=70200916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911076333.1A Active CN111028146B (en) 2019-11-06 2019-11-06 Image super-resolution method based on a dual-discriminator generative adversarial network

Country Status (1)

Country Link
CN (1) CN111028146B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111524205A (en) * 2020-04-23 2020-08-11 北京信息科技大学 Image coloring processing method and device based on loop generation countermeasure network
CN111860618B (en) * 2020-07-01 2024-05-14 杭州健培科技有限公司 Bidirectional GAN model for pathological data conversion and construction and application methods thereof
CN111881920B (en) * 2020-07-16 2024-04-09 深圳力维智联技术有限公司 Network adaptation method of large-resolution image and neural network training device
CN111968032B (en) * 2020-07-23 2022-09-09 太原理工大学 Self-adaptive sampling single-pixel imaging method
CN111897809A (en) * 2020-07-24 2020-11-06 中国人民解放军陆军装甲兵学院 Command information system data generation method based on generation countermeasure network
CN111741018B (en) * 2020-07-24 2020-12-01 中国航空油料集团有限公司 Industrial control data attack sample generation method and system, electronic device and storage medium
CN111950619B (en) * 2020-08-05 2022-09-09 东北林业大学 Active learning method based on dual-generation countermeasure network
CN112992304B (en) * 2020-08-24 2023-10-13 湖南数定智能科技有限公司 High-resolution red eye case data generation method, device and storage medium
CN112232395B (en) * 2020-10-08 2023-10-27 西北工业大学 Semi-supervised image classification method for generating countermeasure network based on joint training
CN112396110B (en) * 2020-11-20 2024-02-02 南京大学 Method for generating augmented image of countermeasure cascade network
CN112686119B (en) * 2020-12-25 2022-12-09 陕西师范大学 License plate motion blurred image processing method based on self-attention generation countermeasure network
CN112598578B (en) * 2020-12-28 2022-12-30 北京航空航天大学 Super-resolution reconstruction system and method for nuclear magnetic resonance image
CN112837232B (en) * 2021-01-13 2022-10-04 山东省科学院海洋仪器仪表研究所 Underwater image enhancement and detail recovery method
CN112837221B (en) * 2021-01-26 2022-08-19 合肥工业大学 SAR image super-resolution reconstruction method based on dual discrimination
CN113012045B (en) * 2021-02-23 2022-07-15 西南交通大学 Generation countermeasure network for synthesizing medical image
CN113361566B (en) * 2021-05-17 2022-11-15 长春工业大学 Method for migrating generative confrontation network by using confrontation learning and discriminant learning
CN113724139B (en) * 2021-11-02 2022-03-15 南京理工大学 Unsupervised infrared single-image super-resolution method for generating countermeasure network based on double discriminators
CN115115783B (en) * 2022-07-08 2023-08-15 西南石油大学 Digital rock core construction method and system for simulating shale matrix nano-micro pores

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10971142B2 (en) * 2017-10-27 2021-04-06 Baidu Usa Llc Systems and methods for robust speech recognition using generative adversarial networks
CN108460717A (en) * 2018-03-14 2018-08-28 儒安科技有限公司 A kind of image generating method of the generation confrontation network based on double arbiters
CN109002686B (en) * 2018-04-26 2022-04-08 浙江工业大学 Multi-grade chemical process soft measurement modeling method capable of automatically generating samples
CN109816593B (en) * 2019-01-18 2022-12-20 大连海事大学 Super-resolution image reconstruction method for generating countermeasure network based on attention mechanism

Also Published As

Publication number Publication date
CN111028146A (en) 2020-04-17

Similar Documents

Publication Publication Date Title
CN111028146B (en) Image super-resolution method based on a dual-discriminator generative adversarial network
CN111768342B (en) Human face super-resolution method based on attention mechanism and multi-stage feedback supervision
CN110033410B (en) Image reconstruction model training method, image super-resolution reconstruction method and device
CN110570353B (en) Super-resolution reconstruction method for generating single image of countermeasure network by dense connection
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN109035149B (en) License plate image motion blur removing method based on deep learning
CN112016507B (en) Super-resolution-based vehicle detection method, device, equipment and storage medium
US11107194B2 (en) Neural network for enhancing original image, and computer-implemented method for enhancing original image using neural network
CN111325751A (en) CT image segmentation system based on attention convolution neural network
CN109872305B (en) No-reference stereo image quality evaluation method based on quality map generation network
CN111915490A (en) License plate image super-resolution reconstruction model and method based on multi-scale features
CN110070574B (en) Binocular vision stereo matching method based on improved PSMAT net
CN111951164B (en) Image super-resolution reconstruction network structure and image reconstruction effect analysis method
CN112070670A (en) Face super-resolution method and system of global-local separation attention mechanism
CN110738663A (en) Double-domain adaptive module pyramid network and unsupervised domain adaptive image segmentation method
CN111127316A (en) Single face image super-resolution method and system based on SNGAN network
CN114266957B (en) Hyperspectral image super-resolution restoration method based on multi-degradation mode data augmentation
CN105513033A (en) Super-resolution reconstruction method based on non-local simultaneous sparse representation
CN112149526B (en) Lane line detection method and system based on long-distance information fusion
CN113066065A (en) No-reference image quality detection method, system, terminal and medium
CN112149662A (en) Multi-mode fusion significance detection method based on expansion volume block
CN117575915A (en) Image super-resolution reconstruction method, terminal equipment and storage medium
CN108335265B (en) Rapid image super-resolution reconstruction method and device based on sample learning
CN108846797B (en) Image super-resolution method based on two training sets
CN110728352A (en) Large-scale image classification method based on deep convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20230117

Address after: 2009, Floor 2, Building 3, China Agricultural University International Business Park, No. 10, Tianxiu Road, Haidian District, Beijing, 100091

Patentee after: Zhongyao Tiandi (Beijing) Information Technology Co.,Ltd.

Address before: 430070 Hubei Province, Wuhan city Hongshan District Luoshi Road No. 122

Patentee before: WUHAN University OF TECHNOLOGY