CN111028146B - Image super-resolution method based on a dual-discriminator generative adversarial network - Google Patents
- Publication number: CN111028146B
- Application number: CN201911076333.1A
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T3/4046 — Scaling of whole images or parts thereof using neural networks
- G06N3/045 — Neural networks; combinations of networks
- G06N3/084 — Learning methods; backpropagation, e.g. using gradient descent
Abstract
The invention provides an image super-resolution method based on a dual-discriminator generative adversarial network, comprising the following steps: in the training stage, construct training samples; input the training samples to the generative network, which outputs high-resolution images; input the high-resolution images to the adversarial network; let the generative network and the two discriminators of the adversarial network perform adversarial learning alternately, constraining the training of the generative network until convergence by combining the L1-norm Charbonnier loss of the generative network with the losses imposed on it by the two discriminators; in the testing stage, input a low-resolution image to the trained generative network model and perform super-resolution reconstruction to obtain the final high-resolution image. The adversarial network further improves super-resolution accuracy by constraining the training of the generative network with two discriminators working in the pixel domain and the feature-map domain, respectively.
Description
Technical Field
The invention relates to the field of digital image processing, and in particular to an image super-resolution method based on a dual-discriminator generative adversarial network.
Background
Images are the principal carrier of information in human society, and the field of image processing is of great research value. High-resolution images are particularly important in digital imaging applications, yet in practice they can be difficult to obtain because of a variety of intrinsic and extrinsic factors. The most direct remedy is better imaging hardware, but high-resolution optical sensors are expensive. Software-based improvement, by contrast, is general and efficient: super-resolution algorithms that reconstruct low-quality images quickly and effectively therefore have broad application prospects.
Image super-resolution methods fall mainly into three categories: interpolation-based, modeling-based, and learning-based; learning-based methods can be further divided into sparse-representation-based and convolutional-neural-network-based methods. Interpolation-based methods are computationally efficient but easily lose high-frequency texture detail. Modeling-based methods use prior information to constrain the solution space, improving on interpolation to some extent; but when the input image is small, there is little prior information to exploit and the super-resolution effect is poor. Learning-based methods achieve super-resolution by learning the internal relationship between low-resolution and high-resolution images, and in recent years convolutional-neural-network-based methods have reached high accuracy. However, the convolution kernels of a convolutional neural network treat every channel and region of the feature map equally, weakening the expressive power of the channels and regions that carry rich high-frequency information. In addition, conventional deep convolutional networks suffer from vanishing gradients and network degradation.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides an image super-resolution method based on a dual-discriminator generative adversarial network. The generative network introduces a hybrid attention mechanism into a residual neural network, which lowers training difficulty, strengthens the feature-expression capability of the network, accelerates convergence, and improves performance; the adversarial network further improves super-resolution accuracy by constraining the training of the generative network with two discriminators working in the pixel domain and the feature-map domain, respectively.
The dual-discriminator generative adversarial network of the invention comprises a generative network based on a residual neural network with a hybrid attention mechanism, and an adversarial network containing two discriminators. The method comprises the following steps:
a. in the training stage, construct training samples;
b. input the training samples to the generative network, which outputs high-resolution images;
c. input the high-resolution images to the adversarial network, which contains two discriminators; the discriminators judge, in the pixel domain and the feature-map domain respectively, whether an input image is a real high-resolution image or one generated by the generative network;
d. let the generative network and the two discriminators perform adversarial learning alternately, constraining the training of the generative network until convergence by combining its L1-norm Charbonnier loss with the losses imposed on it by the two discriminators;
e. in the testing stage, input a low-resolution image to the trained generative network model and perform super-resolution reconstruction to obtain the final high-resolution image.
In this technical scheme, the generative network comprises a feature-extraction unit, a nonlinear-mapping unit, and a sub-pixel convolution upsampling unit. The feature-extraction unit extracts a feature representation of the input low-resolution image by convolution and passes it to the nonlinear-mapping unit; the nonlinear-mapping unit extracts deeper features such as edge and region features through a cascade of basic units, applies a nonlinear mapping, and passes the result to the sub-pixel convolution unit; the sub-pixel convolution unit performs a fast pixel-rearrangement operation on the feature map to obtain the final output high-resolution image.
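The pixel-rearrangement step of the sub-pixel convolution unit can be sketched in NumPy. This is an illustrative stand-in, not the patent's exact layer; the function name and the channel ordering follow the common PixelShuffle convention and are assumptions:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) feature map into a (C, H*r, W*r) image.

    Each group of r^2 channels supplies the r x r sub-pixel grid of one
    output channel -- the fast pixel-rearrangement of sub-pixel convolution.
    """
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0
    c = c_r2 // (r * r)
    # (C, r, r, H, W) -> (C, H, r, W, r) -> (C, H*r, W*r)
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)
```

For a 2x upscale, 4 channels of an H x W feature map are interleaved into one 2H x 2W channel, which is how the network turns the 48 x 48 training inputs back into 96 x 96 outputs.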
In the above technical solution, the nonlinear-mapping unit of the generative network comprises 32 cascaded basic units. Each basic unit consists of a cascaded convolutional layer, activation layer, convolutional layer, and hybrid attention block, plus a local skip connection that carries the unit's input to its output. The nonlinear-mapping unit also contains a global skip connection that carries the input of the top basic unit to the output of the bottom one, so that the network learns the residual between the input and output feature maps, alleviating vanishing gradients and network degradation and lowering the difficulty of training a deep network.
In the above technical solution, the hybrid attention block consists of cascaded convolution and activation layers; it learns the corresponding descriptor for the input feature map in one step, and the descriptor assigns different weights to different channels and different regions, strengthening the feature-expression capability of the network.
In the above technical solution, judging in the pixel domain means inputting the output image of the generative network directly to a discriminator, which judges whether the image is a real high-resolution image or one generated by the generative network.
Judging in the feature-map domain means inputting the output image of the generative network to the VGG-19 network, taking the unactivated feature map before the fifth max-pooling and after the fourth convolution as the input of the other discriminator, which judges whether the current input feature map belongs to a real high-resolution image or one generated by the generative network.
In the above technical solution, each discriminator is fitted by a neural network composed of 8 cascaded basic units followed by a linear-regression unit, an activation unit, and a second linear-regression unit; each basic unit consists of a cascaded convolutional layer, a batch-normalization layer, and an activation layer.
In the above technical solution, each discriminator measures the similarity between the real data distribution and the generated sample distribution with the Wasserstein distance:

W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right]

where P_r is the real distribution, P_g the generated sample distribution, \Pi(P_r, P_g) the set of joint distributions of the real and generated sample distributions, \mathbb{E}_{(x, y) \sim \gamma}[\lVert x - y \rVert] the cost of transforming the real distribution into the generated distribution under a coupling \gamma, inf the infimum (greatest-lower-bound) operator, and W(P_r, P_g) the minimum of this "cost". The Wasserstein distance cannot be solved directly; by Kantorovich-Rubinstein duality, its approximate solution reduces to finding a continuous function f(\cdot) satisfying the Lipschitz condition, giving:

W(P_r, P_g) = \frac{1}{K} \sup_{\lVert f \rVert_L \le K} \left( \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)] \right)

where f(\cdot) is a continuous function satisfying the Lipschitz condition and K is its Lipschitz constant.
Weight clipping is used to ensure that f(\cdot) satisfies the Lipschitz condition.
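The two ingredients just described, the critic's empirical Wasserstein estimate and weight clipping, can be sketched minimally in NumPy. The clipping threshold 0.01 is the usual WGAN default, not a value stated in the patent:

```python
import numpy as np

def wasserstein_estimate(f_real, f_fake):
    """Empirical (scaled) Wasserstein distance from critic outputs:
    mean of f over real samples minus mean of f over generated samples."""
    return float(np.mean(f_real) - np.mean(f_fake))

def clip_weights(params, c=0.01):
    """Clip every critic parameter into [-c, c] after each update so the
    fitted function f stays (approximately) K-Lipschitz."""
    return [np.clip(w, -c, c) for w in params]
```

In training, the critic maximizes `wasserstein_estimate` while `clip_weights` is applied to its parameters after every optimizer step.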
In this technical scheme, the similarity between the super-resolved image and the real high-resolution image is quantified with the L1-norm Charbonnier loss (L1-Charbonnier-Loss), and mini-batch learning is used during training. The loss function adopted by the generative network is:

L_{Char} = \frac{1}{nHWC} \sum_{v=1}^{n} \sum_{i=1}^{H} \sum_{j=1}^{W} \sum_{k=1}^{C} \sqrt{\left(I^{HR}_{v,i,j,k} - I^{SR}_{v,i,j,k}\right)^2 + \epsilon^2}

where \epsilon = 10^{-6}, I^{HR} is the real high-resolution image, I^{SR} the super-resolved high-resolution image, H, W, C the size and number of channels of the input image, n the mini-batch size, and I_{v,i,j,k} the pixel value at position (i, j) of the k-th channel of the v-th image.
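The L1-norm Charbonnier loss translates directly into NumPy. This is a sketch; the (n, H, W, C) array layout is an assumption:

```python
import numpy as np

def charbonnier_loss(sr, hr, eps=1e-6):
    """L1-norm Charbonnier loss: a smooth approximation of |I_SR - I_HR|,
    averaged over the mini-batch, spatial positions, and channels."""
    diff = sr - hr
    return float(np.mean(np.sqrt(diff * diff + eps * eps)))
```

Unlike a plain L1 loss, the epsilon term keeps the gradient well defined where SR and HR pixels coincide.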
The loss imposed on the generative network by the discriminator operating in the pixel domain, L_G^{pix}, and by the discriminator operating in the feature-map domain, L_G^{feat}, are respectively:

L_G^{pix} = -\mathbb{E}_{x \sim p_g}\left[D_{WGAN}(x)\right]
L_G^{feat} = -\mathbb{E}_{x \sim p_g}\left[D_{WGAN}(VGG(x))\right]

where D_{WGAN} is the discriminator function, x \sim p_g means the sample x obeys the generated sample distribution, x \sim p_r means the sample x obeys the real data distribution, and VGG(\cdot) extracts the unactivated feature map of the VGG-19 network before the fifth max-pooling and after the fourth convolution.
Combining the L1-norm Charbonnier loss of the generative network with the losses of the two discriminators on it, the network model is trained until convergence. The generator loss is a weighted sum of three parts:

L_G = L_{Char} + \lambda_1 L_G^{pix} + \lambda_2 L_G^{feat}

where \lambda_1 and \lambda_2 are coefficients balancing the L1-norm Charbonnier loss against the Wasserstein-distance-based losses of the two discriminators. The goal of the network in the training phase is to minimize L_G: the smaller L_G, the smaller the difference between the super-resolved image and the real high-resolution image, and the better the super-resolution effect and accuracy.
In this technical scheme, step a comprises cropping the input image, applying bicubic downsampling to the cropped sub-images to obtain the corresponding low-resolution images, and using data-enhancement operations such as rotation and mirroring to obtain more training samples.
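The rotation-and-mirror enhancement of step a can be sketched as the standard eight-fold augmentation. This is illustrative; the patent does not specify exactly which of the 8 variants are used:

```python
import numpy as np

def augment_eightfold(img):
    """Rotation + mirror data enhancement: the 4 rotations of an H x W x C
    patch plus the 4 rotations of its horizontal mirror (8 variants)."""
    variants = []
    for flip in (img, img[:, ::-1]):   # original and mirrored patch
        for k in range(4):             # 0/90/180/270-degree rotations
            variants.append(np.rot90(flip, k))
    return variants
```

Applied to each cropped 96 x 96 sub-image (and its downsampled counterpart), this multiplies the number of training samples by 8 at no extra acquisition cost.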
Compared with the prior art, the invention has the following advantages:
the generative network uses a hybrid attention mechanism; adding this unit to the network structure accelerates convergence, strengthens feature representation, and improves network performance;
the generative network is based on a residual neural network: a local skip connection is added inside each basic unit, and a global skip connection directly links the top and bottom layers of the network, so that residuals are learned, vanishing gradients and network degradation are alleviated, the difficulty of training a deep network is reduced, and performance is improved;
the dual-discriminator adversarial network measures the similarity between the real data distribution and the generated sample distribution with the Wasserstein distance; compared with the original generative adversarial network, it converges well and alleviates unstable training, vanishing gradients, and mode collapse. On top of the original Wasserstein-distance generative adversarial network, the invention adds a discriminator working in the feature-map domain, which judges whether the current input feature map belongs to a real high-resolution image or one generated by the generative network and so constrains the output of the generative network.
Drawings
FIG. 1 is a block diagram of the hybrid attention mechanism unit of the present invention.
Fig. 2 is a basic unit structure diagram of a generative network based on a residual mixed attention mechanism proposed by the present invention.
Fig. 3 is an overall network structure diagram of a generative network based on a residual mixed attention mechanism proposed by the present invention.
Fig. 4 is a network structure diagram of the double-discriminator generating countermeasure network based on Wasserstein distance according to the present invention.
Fig. 5 is an overall network configuration diagram of the present invention.
FIG. 6 shows the image obtained by 2x super-resolution with the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and specific examples, which are intended to aid understanding and do not limit the invention.
The invention provides an image super-resolution method based on a dual-discriminator generative adversarial network, comprising a generative network based on a residual hybrid attention mechanism and a dual-discriminator generative adversarial network. The method comprises the following steps:
First, preprocess the input image, apply data enhancement, and construct the training samples:
specifically, the input image is cropped into sub-images of size 96 x 96; each sub-image is bicubically downsampled with Matlab's imresize function to obtain the corresponding low-resolution image of size 48 x 48; and data enhancement such as rotation and mirroring is used to obtain more training samples.
Second, input the training samples to the generative network, which outputs high-resolution images.
The generative network comprises a feature-extraction unit, a nonlinear-mapping unit, and a sub-pixel convolution upsampling unit. The feature-extraction unit extracts a feature representation of the input low-resolution image by convolution and passes it to the nonlinear-mapping unit; the nonlinear-mapping unit extracts deeper features such as edge and region features through a cascade of basic units, applies a nonlinear mapping, and passes the result to the sub-pixel convolution unit; the sub-pixel convolution unit performs a fast pixel-rearrangement operation on the feature map to obtain the final output high-resolution image.
The overall network structure of the invention is shown in fig. 5. The generative network is divided into five stages: feature extraction, nonlinear feature mapping, feature dimension reduction, sub-pixel convolution upsampling, and a final convolution producing the output. The output high-resolution image is input to the two discriminators working in the pixel domain and the feature-map domain, yielding the losses of the two discriminators on the generative network, respectively.
The nonlinear-mapping unit comprises 32 cascaded basic units. Each basic unit consists of a cascaded convolutional layer, activation layer, convolutional layer, and hybrid attention block, plus a local skip connection that carries the unit's input to its output. The nonlinear-mapping unit also contains a global skip connection that carries the input of the top basic unit to the output of the bottom one, so that the residual between input and output feature maps is learned, vanishing gradients and network degradation are alleviated, and the difficulty of training a deep network is reduced. The hybrid attention block consists of cascaded convolution and activation layers; it learns the corresponding descriptor for the input feature map in one step, and the descriptor assigns different weights to different channels and regions, strengthening the feature-expression capability of the network.
As shown in fig. 1, the hybrid attention block contains 2 cascaded convolution and activation layers. Because it learns the descriptor for the input feature map in a single step, it has fewer parameters and higher efficiency than learning descriptors for different channels and different regions in separate stages.
The dimensions of the input and output feature maps are H x W x C; conv is the convolution operation; ReLU and Sigmoid are two different activation functions; \odot is the Hadamard product. Given an input feature map X of dimension H x W x C, a descriptor of dimension H x W x C is obtained through two cascaded convolutions and activations:

\tau = f\left(W_2 \ast \delta(W_1 \ast X)\right)

where W_1 are the parameters of the first convolution layer, which reduces the number of channels by a factor of 16 to give an H x W x C/16 feature map; \delta(\cdot) is the ReLU activation; W_2 are the parameters of the second convolution layer, which restores the number of channels by a factor of 16; and f(\cdot) is the Sigmoid activation. The two convolutions and activations reduce and then restore the channel dimension, learning C description matrices \tau_1, ..., \tau_C corresponding to the different channels. A sparser description matrix is adaptively assigned to channels containing largely redundant low-frequency information, so the neural network focuses on channels rich in high-frequency information. Each description matrix \tau_i has dimension H x W, corresponding element-wise to the i-th channel of the original input. After the two convolutions and activations, regions of the original input rich in high-frequency information are preserved while regions dominated by redundant low-frequency information are suppressed; the Hadamard product of \tau_i with input channel I_i makes the network attend to the high-frequency regions within that channel. In summary, the output of the hybrid attention block is the Hadamard product of the descriptor \tau with the original input.
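The one-step descriptor can be sketched in NumPy. For simplicity the two convolutions are reduced here to 1 x 1 convolutions (per-pixel matrix multiplications over the channel axis); this is an assumption for illustration only, since the patent's layers are ordinary convolutions:

```python
import numpy as np

def hybrid_attention(x, w1, w2):
    """One-step hybrid attention descriptor (sketch with 1x1 convolutions).

    x  : (H, W, C) input feature map
    w1 : (C, C//16) channel-reduction weights
    w2 : (C//16, C) channel-restoration weights
    Returns tau * x, where tau = sigmoid(w2-conv(relu(w1-conv(x)))) is an
    (H, W, C) descriptor reweighting every channel and spatial position.
    """
    z = np.maximum(x @ w1, 0.0)               # reduce channels by factor 16, ReLU
    tau = 1.0 / (1.0 + np.exp(-(z @ w2)))     # restore channels, Sigmoid in (0, 1)
    return tau * x                            # Hadamard product with the input
```

Because tau depends on both position and channel, a single pass reweights channels and regions jointly, which is the point of the "one-step" design.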
The structure of the basic unit of the residual hybrid-attention generative network is shown in fig. 2, where ReLU and Sigmoid are two different activation functions, \odot is the Hadamard product, and \oplus is pixel-wise addition. The basic unit consists of cascaded convolution, activation, convolution, and hybrid attention blocks, with a local skip connection added inside the unit to alleviate vanishing gradients and network degradation. Specifically, the feature map input to a basic unit first undergoes a convolution to obtain a deeper feature representation; the convolution layer parameters are 3 x 3 x 256 x 256, i.e. 256 convolution kernels of size 3 x 3 with 256 channels each, stride 1, with zero padding at the edges so that the input and output feature maps keep the same size. After passing through the hybrid attention block, the feature map is output to the next basic unit in the cascade to extract deeper features.
The overall structure of the residual hybrid-attention generative network is shown in fig. 3; it consists of three parts: feature extraction, nonlinear mapping, and sub-pixel convolution upsampling. The feature-extraction unit is a convolution layer whose kernel parameters are 3 x 3 x 3 x 256, i.e. 256 convolution kernels of size 3 x 3 with 3 channels each. The nonlinear-mapping stage consists of 32 cascaded basic units; a local skip connection is added inside each basic unit, and a global skip connection directly links the top and bottom layers of the network so that residuals are learned, alleviating vanishing gradients and network degradation and reducing the difficulty of training a deep network. Finally, upsampling is completed by the sub-pixel convolution layer and a convolution, yielding the final output high-resolution image.
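The local and global skip connections described above can be sketched abstractly. This is illustrative only; `transform` stands in for the conv-activation-conv-attention branch of a basic unit:

```python
import numpy as np

def basic_unit(x, transform):
    """Residual basic unit: the branch output is added back to the input
    through a local skip connection, so the unit only has to learn the
    residual between its input and output feature maps."""
    return x + transform(x)

def deep_stack(x, transforms):
    """Cascade of basic units plus a global skip connection carrying the
    top-level input to the bottom-level output."""
    y = x
    for t in transforms:
        y = basic_unit(y, t)
    return x + y  # global skip connection
```

With zero-initialized branches the stack is the identity (plus the global skip), which is what makes very deep cascades trainable without degradation.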
Third, input the high-resolution images to the generative adversarial network, which contains two discriminators that judge, in the pixel domain and the feature-map domain respectively, whether an input image is a real high-resolution image or one generated by the generative network. The generative-network loss and the losses of the two discriminators on the generative network are combined, gradient information is back-propagated, and the parameters of the generative network and the two discriminators are updated. As shown in fig. 4, LeakyReLU is the activation function, with negative_slope set to 0.2; the batch-normalization layer normalizes each batch of input data to a normal distribution with mean 0 and variance 1; Linear is a linear-regression function. The image data input to a discriminator passes through 8 cascaded convolution, batch-normalization, and activation layers that extract deep features of the input image, and then through cascaded linear-regression, activation, and linear-regression layers, approximately fitting the Wasserstein distance between the real data distribution and the generated sample distribution.
The generative adversarial network is a dual-discriminator generative adversarial network based on the Wasserstein distance; each discriminator measures the similarity between the real data distribution and the generated sample distribution with:

W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right]

where P_r is the real distribution, P_g the generated sample distribution, \Pi(P_r, P_g) the set of joint distributions of the real and generated sample distributions, \mathbb{E}_{(x, y) \sim \gamma}[\lVert x - y \rVert] the cost of transforming the real distribution into the generated distribution under a coupling \gamma, inf the infimum operator, and W(P_r, P_g) the minimum of this "cost". The Wasserstein distance cannot be solved directly; by Kantorovich-Rubinstein duality, its approximate solution reduces to finding a continuous function f(\cdot) satisfying the Lipschitz condition:

W(P_r, P_g) = \frac{1}{K} \sup_{\lVert f \rVert_L \le K} \left( \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)] \right)

where f(\cdot) is a continuous function satisfying the Lipschitz condition and K is its Lipschitz constant.
Weight clipping is used to ensure that f(\cdot) satisfies the Lipschitz condition. Each discriminator f(\cdot) is fitted by a neural network consisting of 8 cascaded basic units followed by linear regression, LeakyReLU activation, and linear regression; each basic unit consists of cascaded convolution, batch normalization, and LeakyReLU activation.
The dual discriminator consists of two discriminators f(·), one operating in the pixel domain and one in the feature-map domain. The pixel-domain discriminator takes the output image of the generative network directly and judges whether it is a real high-resolution image or one generated by the generative network. For the feature-map domain, the output image of the generative network is first fed into a VGG-19 network; the un-activated feature map obtained after the fourth convolution and before the fifth max-pooling layer is used as the input of the feature-map-domain discriminator, which judges whether the current input feature map belongs to a real high-resolution image or to one generated by the generative network.
Fourthly, the L1-norm Charbonnier loss of the generative network is combined with the losses that the two discriminators respectively impose on the generative network to constrain its training; the generative network and the two discriminators learn adversarially, and the generative network model is trained until convergence.
The generator quantifies the similarity between the super-resolved high-resolution image and the real high-resolution image using the L1-norm Charbonnier loss together with the WGAN-style losses that the discriminators impose on the generator, and this guides network learning. The L1-norm Charbonnier loss function is:

L_Char = (1 / (n·H·W·C)) Σ_{v=1}^{n} Σ_{i=1}^{H} Σ_{j=1}^{W} Σ_{k=1}^{C} sqrt( (I^HR_{v,i,j,k} − I^SR_{v,i,j,k})² + ε² )

where ε is taken as 10⁻⁶, I^HR is the real high-resolution image, I^SR is the high-resolution image obtained by super-resolution, H, W and C are the size and number of channels of the input image, n is the mini-batch size, and I_{v,i,j,k} is the pixel value at position (i, j) in the k-th channel of the v-th image.
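A NumPy sketch of the Charbonnier loss; taking a plain mean over all elements implements the normalisation by n·H·W·C, which is our reading of the patent's mini-batch formulation.

```python
import numpy as np

def charbonnier_loss(hr, sr, eps=1e-6):
    """L1-norm Charbonnier loss: mean of sqrt((I_HR - I_SR)^2 + eps^2),
    i.e. the sum over the mini-batch normalised by n*H*W*C."""
    hr = np.asarray(hr, dtype=np.float64)
    sr = np.asarray(sr, dtype=np.float64)
    return float(np.sqrt((hr - sr) ** 2 + eps ** 2).mean())
```

For identical images the loss degenerates to ε, and for large errors it behaves like the L1 norm, which is what makes it a smooth, robust alternative to plain L1.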
The loss that the pixel-domain discriminator imposes on the generator network, l_G^P, and the loss that the feature-map-domain discriminator imposes on the generator, l_G^F, are respectively:

l_G^P = E_{x ~ p_r}[D_WGAN(x)] − E_{x ~ p_g}[D_WGAN(x)]
l_G^F = E_{x ~ p_r}[D_WGAN(VGG(x))] − E_{x ~ p_g}[D_WGAN(VGG(x))]

where D_WGAN is the abstract function of the discriminator, x ~ p_g means that sample x obeys the generated sample data distribution, x ~ p_r means that sample x obeys the real data distribution, and VGG(·) obtains the feature map of the VGG-19 network after the fourth convolution and before the fifth max-pooling.
The L1-norm Charbonnier loss of the generative network is combined with the losses of the dual discriminator on the generative network, and the network model is trained until convergence. The generator's loss function is a weighted sum of three parts:

L_G = L_Char + λ_1 · l_G^P + λ_2 · l_G^F

where l_G^P and l_G^F are the losses that the pixel-domain and feature-map-domain discriminators impose on the generator, and λ_1 and λ_2 are the coefficients balancing the L1-norm Charbonnier loss against the two Wasserstein-distance-based discriminator losses. In the training phase the network aims to minimize the loss function L_G; the smaller L_G, the smaller the difference between the super-resolved high-resolution image and the real high-resolution image, the better the super-resolution effect and the higher the accuracy.
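The weighted three-part combination reduces to a one-line helper. The λ values below are purely illustrative, as the patent does not disclose its settings.

```python
def total_generator_loss(l_char, l_pix, l_feat, lam1=1e-3, lam2=1e-3):
    """L_G = L_Char + lam1 * l_pix + lam2 * l_feat (weights illustrative)."""
    return l_char + lam1 * l_pix + lam2 * l_feat
```

With tensor inputs this returns a tensor suitable for `backward()`; with floats it simply evaluates the weighted sum.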
Fifthly, a low-resolution image is input to the trained generative network model, and super-resolution reconstruction is performed to obtain the final high-resolution image. The reconstructed image is equivalent to the low-resolution image magnified by the training scale factor.
To demonstrate the effectiveness of the present invention, the DIV2K dataset, common in image super-resolution research, was used as the training set, and the Set5, Set14, BSD100, Urban100 and Manga109 datasets were used as the test sets. In the experiments, the bicubic interpolation method and two representative convolutional-neural-network-based methods were selected for comparison, together with a variant of the present invention that uses only the generative network. To ensure a fair comparison, all methods were tested under the same hardware environment.
Two representative methods based on the convolutional neural network are selected as follows:
the method comprises the following steps: the method proposed by Dong et al, references are: dong C, Loy C, He K, et al. image super-resolution using depth dependent networks [ J ]. IEEE transactions on pattern analysis and machine interaction, 2015,38(2): 295-.
Method 2: the method proposed by Lai et al.; reference: Lai W S, Huang J B, Ahuja N, et al. Fast and accurate image super-resolution with deep Laplacian pyramid networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
The selected evaluation indexes are as follows: the objective indexes widely used for evaluating image super-resolution are Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM), and both are selected here as objective evaluation indexes. In addition, the time required to super-resolve a single image is used as a further objective reference index.
During evaluation, the obtained high-resolution image and the real high-resolution image are first cropped, removing a border whose width in pixels equals the magnification factor; then the images are converted from the RGB color space to the YCbCr color space, the Y channel is extracted, and the objective indexes are computed on the Y channel only. The PSNR is calculated as:

PSNR = 10 · log10( 255² / MSE ),  MSE = (1 / (W·H)) Σ_{i=1}^{W} Σ_{j=1}^{H} (I^HR_{i,j} − I^SR_{i,j})²

where W and H are the image dimensions, I^HR is the real high-resolution image, I^SR is the high-resolution image obtained by super-resolution, and I_{i,j} is the pixel value at position (i, j). The larger the PSNR, the better the quality of the compared image.
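A NumPy sketch of the PSNR computation on 8-bit images:

```python
import numpy as np

def psnr(hr, sr, peak=255.0):
    """Peak signal-to-noise ratio: PSNR = 10 * log10(peak^2 / MSE)."""
    hr = np.asarray(hr, dtype=np.float64)
    sr = np.asarray(sr, dtype=np.float64)
    mse = np.mean((hr - sr) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```

For a uniform error of 1 grey level, the MSE is 1 and the PSNR equals 20·log10(255) ≈ 48.13 dB, which is the ceiling one sees quoted for 8-bit images with unit error.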
The formula for SSIM is:

SSIM(x, y) = l(x, y) · c(x, y) · s(x, y)
l(x, y) = (2·μ_x·μ_y + C_1) / (μ_x² + μ_y² + C_1)
c(x, y) = (2·σ_x·σ_y + C_2) / (σ_x² + σ_y² + C_2)
s(x, y) = (σ_xy + C_3) / (σ_x·σ_y + C_3)

where μ and σ are the pixel means and standard deviations of the two compared images, σ_xy is their covariance, and C_1, C_2, C_3 are constants that prevent the denominators from being 0. The value range of SSIM is [0, 1]; the closer the value is to 1, the more similar the two compared images.
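A NumPy sketch of a global (single-window) SSIM; standard implementations slide a Gaussian window over the image, so treat this as illustrative only. Setting C3 = C2/2 folds the structure term into the contrast term, giving the common two-constant form.

```python
import numpy as np

def ssim(x, y, peak=255.0, k1=0.01, k2=0.03):
    """Global SSIM over the whole image, two-constant form (C3 = C2/2)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) /
                 ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

The k1, k2 defaults are the values commonly used in SSIM implementations, not values stated in the patent.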
Comparison of the objective evaluation results of each method at 2x super-resolution:
From the experimental data in the table above, it can be seen that the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) of the present invention are significantly improved over the comparison methods, and are also clearly better than the variant using only the generative network of the present invention. As seen from the 2x super-resolution results shown in fig. 6, the high-resolution images obtained by the method have better sharpness while still retaining more texture details. In summary, the present invention is an effective image super-resolution method.
Details not described in this specification are well known to those skilled in the art.
Claims (7)
1. An image super-resolution method based on a dual-discriminator generative adversarial network, characterized in that the dual-discriminator generative adversarial network comprises a generative network based on a residual neural network and a mixed attention mechanism, and an adversarial network containing a dual discriminator; the method comprises the following steps:
a. in the training stage, a training sample is constructed;
b. inputting the training sample into a generating network, and outputting a high-resolution image by the generating network;
c. inputting the high-resolution image into the adversarial network; the adversarial network comprises two discriminators, which judge in the pixel domain and the feature-map domain respectively whether the image input to them is a real high-resolution image or a high-resolution image generated by the generative network;
d. performing adversarial learning alternately between the generative network and the two discriminators of the adversarial network, and constraining the generative network training until convergence by combining the L1-norm Charbonnier loss of the generative network with the losses that the two discriminators of the adversarial network respectively impose on the generative network;
e. inputting a low-resolution image to a trained generative network model in a testing stage, and performing super-resolution reconstruction to obtain a final high-resolution image;
the similarity between the super-resolved high-resolution image and the real high-resolution image is quantified using the L1-norm Charbonnier loss (L1-Charbonnier-Loss), and the training process adopts mini-batch learning; the loss function adopted by the generative network is:

L_Char = (1 / (n·H·W·C)) Σ_{v=1}^{n} Σ_{i=1}^{H} Σ_{j=1}^{W} Σ_{k=1}^{C} sqrt( (I^HR_{v,i,j,k} − I^SR_{v,i,j,k})² + ε² )

where ε is taken as 10⁻⁶; H, W and C are the size and number of channels of the input image; n is the mini-batch size; I^HR_{v,i,j,k} is the pixel value at position (i, j) in the k-th channel of the v-th real high-resolution image; I^SR_{v,i,j,k} is the pixel value at position (i, j) in the k-th channel of the v-th super-resolved image; and I^SR is the high-resolution image obtained by super-resolution;
charbonnier loss and dual discriminator pair generation based on L1 norm for joint generation networksLoss of generative network, training the network model until convergence is reached, loss of generative network by a discriminator operating in the pixel domainWith the loss of the generator by discriminators operating in the domain of the feature mapAre respectively:
where D_WGAN is the abstract function of the discriminator; x ~ p_g means that sample x obeys the generated sample data distribution; x ~ p_r means that sample x obeys the real data distribution; and VGG(·) obtains the feature map of the VGG-19 network after the fourth convolution and before the fifth max-pooling;
the method comprises the following steps of combining Charbonnier loss of a generative network based on an L1 norm and the loss of a double-discriminator to the generative network, training a network model until convergence is achieved, wherein a loss function of a generator is composed of three parts by weight, and the expression is as follows:
wherein λ1,λ2To balance the respective loss factors of the Charbonier loss based on the L1 norm and the double-discriminator based on the Wasserstein distance to the generative network, the objective of the network in the training phase is to minimize the loss function LG,LGThe smaller the difference between the high-resolution image obtained by performing super-resolution and the real high-resolution image, the smaller the super-resolutionThe better the effect of the rate, the higher the accuracy.
2. The dual-discriminator-based generative-adversarial-network image super-resolution method according to claim 1, wherein the generative network comprises a feature extraction unit, a nonlinear mapping unit and a sub-pixel convolution up-sampling unit; the feature extraction unit extracts a feature representation of the input low-resolution image through convolution operations and feeds it to the subsequent nonlinear mapping unit; the nonlinear mapping unit extracts deeper features, such as edge features and regional features, through a plurality of cascaded basic units, performs nonlinear mapping on the features, and feeds them to the subsequent sub-pixel convolution unit; and the sub-pixel convolution unit performs a fast pixel-rearrangement operation on the feature map to obtain the final output high-resolution image.
3. The dual-discriminator-based generative-adversarial-network image super-resolution method according to claim 2, wherein the nonlinear mapping unit in the generative network comprises 32 cascaded basic units; each basic unit consists of a cascaded convolutional layer, activation layer, convolutional layer and mixed attention block, and also comprises a local skip-connection structure that passes the input of the basic unit to its output; the nonlinear mapping unit further comprises a global skip-connection structure that passes the input of the top basic unit to the output of the bottom basic unit, so that the generative network learns the residual between the input and output feature maps.
4. The dual-discriminator-based generative-adversarial-network image super-resolution method according to claim 3, wherein the mixed attention block consists of a cascade of a convolutional layer and an activation layer; it learns, in a single step, descriptors for the input feature map that are used to assign different weights to different channels and different regions.
5. The dual-discriminator-based generative-adversarial-network image super-resolution method according to claim 1, wherein discriminating in the pixel domain means directly inputting the output image of the generative network to one discriminator, which judges whether it is a real high-resolution image or a high-resolution image generated by the generative network;
and in the feature map domain, the judgment means that the output image of the generator network is input into the VGG-19 network to obtain the feature map which is not activated before the fifth maximal pooling and after the fourth convolution, and the feature map is used as the input of another discriminator to judge whether the current input feature map belongs to the real high-resolution image or the high-resolution image generated by the generator network.
6. The dual-discriminator-based generative-adversarial-network image super-resolution method according to claim 5, wherein each discriminator fits f(·) with a neural network composed of 8 cascaded basic units followed by linear regression, activation and linear regression units, each basic unit consisting of cascaded convolution, batch normalization and activation layers.
7. The dual-discriminator-based generative-adversarial-network image super-resolution method according to claim 1, wherein step a comprises cropping the input images, performing bicubic down-sampling on the cropped sub-images to obtain the corresponding low-resolution images, and acquiring more training samples through rotation and mirroring data enhancement.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911076333.1A CN111028146B (en) | 2019-11-06 | 2019-11-06 | Image super-resolution method for generating countermeasure network based on double discriminators |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111028146A CN111028146A (en) | 2020-04-17 |
CN111028146B true CN111028146B (en) | 2022-03-18 |
Family
ID=70200916
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911076333.1A Active CN111028146B (en) | 2019-11-06 | 2019-11-06 | Image super-resolution method for generating countermeasure network based on double discriminators |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111028146B (en) |
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111524205A (en) * | 2020-04-23 | 2020-08-11 | 北京信息科技大学 | Image coloring processing method and device based on loop generation countermeasure network |
CN111860618B (en) * | 2020-07-01 | 2024-05-14 | 杭州健培科技有限公司 | Bidirectional GAN model for pathological data conversion and construction and application methods thereof |
CN111881920B (en) * | 2020-07-16 | 2024-04-09 | 深圳力维智联技术有限公司 | Network adaptation method of large-resolution image and neural network training device |
CN111968032B (en) * | 2020-07-23 | 2022-09-09 | 太原理工大学 | Self-adaptive sampling single-pixel imaging method |
CN111897809A (en) * | 2020-07-24 | 2020-11-06 | 中国人民解放军陆军装甲兵学院 | Command information system data generation method based on generation countermeasure network |
CN111741018B (en) * | 2020-07-24 | 2020-12-01 | 中国航空油料集团有限公司 | Industrial control data attack sample generation method and system, electronic device and storage medium |
CN111950619B (en) * | 2020-08-05 | 2022-09-09 | 东北林业大学 | Active learning method based on dual-generation countermeasure network |
CN112992304B (en) * | 2020-08-24 | 2023-10-13 | 湖南数定智能科技有限公司 | High-resolution red eye case data generation method, device and storage medium |
CN112232395B (en) * | 2020-10-08 | 2023-10-27 | 西北工业大学 | Semi-supervised image classification method for generating countermeasure network based on joint training |
CN112396110B (en) * | 2020-11-20 | 2024-02-02 | 南京大学 | Method for generating augmented image of countermeasure cascade network |
CN112686119B (en) * | 2020-12-25 | 2022-12-09 | 陕西师范大学 | License plate motion blurred image processing method based on self-attention generation countermeasure network |
CN112598578B (en) * | 2020-12-28 | 2022-12-30 | 北京航空航天大学 | Super-resolution reconstruction system and method for nuclear magnetic resonance image |
CN112837232B (en) * | 2021-01-13 | 2022-10-04 | 山东省科学院海洋仪器仪表研究所 | Underwater image enhancement and detail recovery method |
CN112837221B (en) * | 2021-01-26 | 2022-08-19 | 合肥工业大学 | SAR image super-resolution reconstruction method based on dual discrimination |
CN113012045B (en) * | 2021-02-23 | 2022-07-15 | 西南交通大学 | Generation countermeasure network for synthesizing medical image |
CN113361566B (en) * | 2021-05-17 | 2022-11-15 | 长春工业大学 | Method for migrating generative confrontation network by using confrontation learning and discriminant learning |
CN113724139B (en) * | 2021-11-02 | 2022-03-15 | 南京理工大学 | Unsupervised infrared single-image super-resolution method for generating countermeasure network based on double discriminators |
CN115115783B (en) * | 2022-07-08 | 2023-08-15 | 西南石油大学 | Digital rock core construction method and system for simulating shale matrix nano-micro pores |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10971142B2 (en) * | 2017-10-27 | 2021-04-06 | Baidu Usa Llc | Systems and methods for robust speech recognition using generative adversarial networks |
CN108460717A (en) * | 2018-03-14 | 2018-08-28 | 儒安科技有限公司 | A kind of image generating method of the generation confrontation network based on double arbiters |
CN109002686B (en) * | 2018-04-26 | 2022-04-08 | 浙江工业大学 | Multi-grade chemical process soft measurement modeling method capable of automatically generating samples |
CN109816593B (en) * | 2019-01-18 | 2022-12-20 | 大连海事大学 | Super-resolution image reconstruction method for generating countermeasure network based on attention mechanism |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111028146B (en) | Image super-resolution method for generating countermeasure network based on double discriminators | |
CN111768342B (en) | Human face super-resolution method based on attention mechanism and multi-stage feedback supervision | |
CN110033410B (en) | Image reconstruction model training method, image super-resolution reconstruction method and device | |
CN110570353B (en) | Super-resolution reconstruction method for generating single image of countermeasure network by dense connection | |
CN112507997B (en) | Face super-resolution system based on multi-scale convolution and receptive field feature fusion | |
CN112016507B (en) | Super-resolution-based vehicle detection method, device, equipment and storage medium | |
US11107194B2 (en) | Neural network for enhancing original image, and computer-implemented method for enhancing original image using neural network | |
CN111325751A (en) | CT image segmentation system based on attention convolution neural network | |
CN109872305B (en) | No-reference stereo image quality evaluation method based on quality map generation network | |
CN112070670B (en) | Face super-resolution method and system of global-local separation attention mechanism | |
CN111915490A (en) | License plate image super-resolution reconstruction model and method based on multi-scale features | |
CN110070574B (en) | Binocular vision stereo matching method based on improved PSMAT net | |
CN111951164B (en) | Image super-resolution reconstruction network structure and image reconstruction effect analysis method | |
CN110738663A (en) | Double-domain adaptive module pyramid network and unsupervised domain adaptive image segmentation method | |
CN111127316A (en) | Single face image super-resolution method and system based on SNGAN network | |
CN114266957B (en) | Hyperspectral image super-resolution restoration method based on multi-degradation mode data augmentation | |
CN105513033A (en) | Super-resolution reconstruction method based on non-local simultaneous sparse representation | |
CN112149526B (en) | Lane line detection method and system based on long-distance information fusion | |
CN113066065A (en) | No-reference image quality detection method, system, terminal and medium | |
CN112149662A (en) | Multi-mode fusion significance detection method based on expansion volume block | |
CN116486074A (en) | Medical image segmentation method based on local and global context information coding | |
CN117576402B (en) | Deep learning-based multi-scale aggregation transducer remote sensing image semantic segmentation method | |
CN108335265B (en) | Rapid image super-resolution reconstruction method and device based on sample learning | |
CN108846797B (en) | Image super-resolution method based on two training sets | |
CN110728352A (en) | Large-scale image classification method based on deep convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20230117 Address after: 2009, Floor 2, Building 3, China Agricultural University International Business Park, No. 10, Tianxiu Road, Haidian District, Beijing, 100091 Patentee after: Zhongyao Tiandi (Beijing) Information Technology Co.,Ltd. Address before: 430070 Hubei Province, Wuhan city Hongshan District Luoshi Road No. 122 Patentee before: WUHAN University OF TECHNOLOGY |