CN116152061A - Super-resolution reconstruction method based on blur kernel estimation - Google Patents

Super-resolution reconstruction method based on blur kernel estimation


Publication number
CN116152061A
CN116152061A
Authority
CN
China
Prior art keywords
resolution
network
super
image
convolution layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211640579.9A
Other languages
Chinese (zh)
Inventor
邱超烨
徐焕宇
李富
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi University
Original Assignee
Wuxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi University filed Critical Wuxi University
Priority to CN202211640579.9A priority Critical patent/CN116152061A/en
Publication of CN116152061A publication Critical patent/CN116152061A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4046Scaling the whole image or part thereof using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses a super-resolution reconstruction method based on blur kernel estimation, which constructs an MCRAGAN super-resolution network comprising an MCRAN generating network and a discrimination network connected in sequence. A target area of an original high-resolution natural scene image is cropped to obtain a target area block, and a target area of a low-resolution image conforming to the real scene distribution is cropped to obtain a downsampled area block; a total loss function is constructed from the perceptual loss function and the adversarial loss function, and the MCRAGAN super-resolution model is trained with the total loss function. A low-resolution image conforming to the real scene distribution is then input into the trained MCRAGAN super-resolution network to obtain the corresponding super-resolution natural scene image. By extracting blur kernel information and performing blur kernel estimation on real-scene low-resolution images, the invention improves the performance of the MCRAGAN super-resolution network on real natural scene images.

Description

Super-resolution reconstruction method based on blur kernel estimation
Technical Field
The invention relates to the technical field of super-resolution of natural scene images, and in particular to a super-resolution reconstruction method based on blur kernel estimation.
Background
Super-Resolution (SR) of natural scene images refers to reconstructing one or more Low-Resolution (LR) natural scene images into an accurate High-Resolution (HR) natural scene image with a specific algorithm. In computer vision, super-resolution reconstruction of a single natural scene image is a classical underdetermined problem: the goal is to reconstruct a low-resolution image into the original high-resolution natural scene image so that the generated super-resolution image contains more effective information and has a better visual effect, while keeping the same data structure as the low-resolution image, thereby enhancing human visual perception of the image content without altering it. Natural scene image super-resolution has broad application prospects, and many fields already use the technique to address related problems. Deep-learning-based super-resolution methods achieve good performance indices under an ideal degradation model, but they still fail to fully exploit multi-level features, and their training data are typically synthesized by bicubic interpolation; since the degradation process of real scenes is complex and hard to quantify, this domain gap greatly limits the performance of super-resolution networks in practice.
As research on super-resolution has deepened, many deep-learning-based super-resolution methods have appeared and network performance has continuously improved. Real scenes contain large numbers of low-resolution images, such as low-quality photographs taken by old mobile phones, old historical photos, and natural scene images compressed during internet transmission, whose degradation processes are unknown. Because existing methods for constructing super-resolution data sets ignore this degradation uncertainty, applying the resulting super-resolution networks to real natural scene images still yields unsatisfactory results.
Disclosure of Invention
The invention provides a super-resolution reconstruction method based on blur kernel estimation, which improves the generalization and robustness of a super-resolution network on natural scene images from real scenes and improves the reconstruction quality of natural scene images.
In order to achieve the above effects, the technical scheme of the invention is as follows:
a super-resolution reconstruction method based on blur kernel estimation comprises the following steps:
step 1: constructing an MCRAGAN super-resolution network, wherein the MCRAGAN super-resolution network comprises an MCRAN generating network and a discrimination network which are sequentially connected;
step 2: acquiring an original high-resolution natural scene image, cutting a target area of the original high-resolution natural scene image to obtain a target area block, and labeling the target area block as true;
degrading the original high-resolution natural scene image to obtain a low-resolution image conforming to the real scene distribution; cutting a target area of the low-resolution image conforming to the real scene distribution to obtain a downsampling area block, and labeling the downsampling area block as false;
step 3: performing authenticity judgment on the target area block and the downsampled area block with the discrimination network, and outputting a D-map heat-map matrix to obtain an optimized MCRAN generating network;
step 4: matching the low-resolution image conforming to the real scene distribution with the original high-resolution natural scene image to obtain an LR-HR natural scene image pair, training an MCRAGAN super-resolution network by the LR-HR natural scene image pair, and obtaining a trained MCRAGAN super-resolution network; and inputting the low-resolution images conforming to the real scene distribution into the trained MCRAGAN super-resolution network to obtain corresponding super-resolution natural scene images.
Further, in step 2, both the low-resolution image conforming to the real scene distribution and the original high-resolution natural scene image are cropped into image blocks of size 100×100.
Further, in the step 1, the MCRAN generating network includes a convolution layer conv_1, a plurality of MRAB modules, a convolution layer conv_2, a convolution layer conv_3, a convolution layer conv_4, and a sub-pixel convolution module, which are sequentially connected; the convolutional layers conv_1, conv_2, conv_3 all use the LeakyReLU nonlinear activation function.
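To illustrate the upscaling step performed by the sub-pixel convolution module at the end of the MCRAN generating network, the sketch below implements the depth-to-space rearrangement (pixel shuffle) in NumPy. This is a minimal illustration of the standard sub-pixel operation, not code from the patent; the function name and shapes are illustrative.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) tensor into (C, H*r, W*r).

    This is the depth-to-space operation a sub-pixel convolution module
    uses for the final upscaling by factor r."""
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0
    c = c_r2 // (r * r)
    # split channels into (c, r, r), then interleave into spatial dims
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # -> (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

feat = np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3)  # C*r^2 = 4, r = 2
up = pixel_shuffle(feat, 2)
print(up.shape)  # (1, 6, 6)
```

Each 2×2 block of the output is filled from the four input channels at the same spatial location, so a convolution can compute the upscaled image entirely in low-resolution space before this cheap rearrangement.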
Further, in the step 1, the discrimination network includes a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer, and a fifth convolution layer that are sequentially connected, where the first convolution layer to the fourth convolution layer all use a LeakyReLU nonlinear activation function; the fifth convolution layer uses a Sigmoid function.
Further, in the step 1, the step sizes of the first convolution layer, the second convolution layer and the third convolution layer of the discrimination network are 2, the step sizes of the fourth convolution layer and the fifth convolution layer of the discrimination network are 1, the convolution kernel sizes of the 5 layers of the discrimination network are 4×4, and the numbers of the convolution kernels of the first convolution layer to the fifth convolution layer are 64, 128, 256, 512 and 1 respectively.
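Given the layer hyperparameters above (five layers, 4×4 kernels, strides 2, 2, 2, 1, 1), the spatial size of the discrimination network's D-map output can be traced with a few lines of Python. This is an illustrative calculation assuming unpadded ("valid") convolutions, which the patent does not specify:

```python
def conv_out(n, k, s):
    # output size of one convolution: kernel k, stride s, no padding
    return (n - k) // s + 1

def dmap_size(n):
    """Spatial size of the D-map after the five discriminator layers
    (4x4 kernels; strides 2, 2, 2, 1, 1), assuming no padding."""
    for s in (2, 2, 2, 1, 1):
        n = conv_out(n, 4, s)
    return n

print(dmap_size(100))  # a 100x100 patch yields a 4x4 D-map
```

Under this assumption, each of the 100×100 cropped blocks from step 2 maps to a small grid of per-region real/fake probabilities rather than a single scalar.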
Further, in the step 2, degrading the original high-resolution natural scene image specifically comprises: acquiring an original real-scene low-resolution image; estimating the blur kernel and noise of the original real-scene low-resolution image with a KernelGAN+ network and storing them in a blur kernel pool K and a noise pool N, respectively; and, for each original high-resolution natural scene image, randomly drawing a blur kernel and noise from the pools K and N and degrading the image to obtain a low-resolution image I_LR conforming to the real scene distribution, thereby completing the blur kernel estimation. I_LR is given by the following formula:

I_LR = (H * k)↓_a + n

where H denotes the original high-resolution natural scene image; k and n denote the blur kernel and the noise, respectively; * denotes the convolution operation; ↓_a denotes downsampling; and a denotes the downsampling factor.
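The degradation formula I_LR = (H * k)↓_a + n can be sketched directly in NumPy. The blur kernel and noise level below are illustrative stand-ins for samples drawn from the pools K and N; the direct convolution loop keeps the sketch self-contained.

```python
import numpy as np

def degrade(hr, kernel, noise_sigma, a, rng):
    """I_LR = (H * k) downsampled by a, plus noise n.

    A minimal sketch; in the method, kernel and noise statistics would
    come from the pools K and N estimated by the KernelGAN+ network."""
    kh, kw = kernel.shape
    H, W = hr.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):          # direct 'valid' 2-D blur
        for j in range(out.shape[1]):
            out[i, j] = np.sum(hr[i:i + kh, j:j + kw] * kernel)
    lr = out[::a, ::a]                     # downsample by factor a
    return lr + rng.normal(0.0, noise_sigma, lr.shape)

rng = np.random.default_rng(0)
hr = np.ones((17, 17))
k = np.full((3, 3), 1.0 / 9.0)            # uniform (symmetric) blur kernel
lr = degrade(hr, k, 0.0, 2, rng)
print(lr.shape)  # (8, 8)
```

With a constant input, a normalized kernel, and zero noise the output stays constant, which is a quick sanity check that the blur step preserves mean intensity.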
Further, the KernelGAN+ network is obtained by removing the activation functions from the KernelGAN network.
Further, step 3 specifically comprises: performing authenticity judgment on the target area block and the downsampled area block with the discrimination network and outputting a D-map heat-map matrix, where each map value in the matrix is the probability that the corresponding input block is real. Map values range from 0 to 1; a larger value indicates a higher probability that the currently input downsampled area block originates from the original high-resolution natural scene image. When the discrimination network can hardly distinguish the target area block from the downsampled area block, the weights of the MCRAN generating network approximate the real blur kernel information, and the optimized MCRAN generating network is obtained.
Further, in the step 4, training the MCRAGAN super-resolution network with the LR-HR natural scene image pairs specifically comprises:

constructing a perceptual loss function based on the low-resolution image conforming to the real scene distribution and the original high-resolution natural scene image; constructing an adversarial loss function based on the generating network; constructing a total loss function from the perceptual loss function and the adversarial loss function; and training the MCRAGAN super-resolution model with the total loss function.

The perceptual loss function is represented by the following formula:

L_per = 1/(h_l · w_l · c_l) · ||φ_{i,j}(I_HR) − φ_{i,j}(G(I_LR))||²

where h_l, w_l and c_l denote the width, height and number of channels of the generated image respectively, I_HR denotes the original high-resolution natural scene image, and G(I_LR) denotes the image generated from the low-resolution image conforming to the real scene distribution; the perceptual loss is computed on the activated features of the VGG-19 network, and φ_{i,j} denotes the feature map obtained after the i-th pooling layer and before the j-th convolution of the VGG-19 network.
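Assuming the VGG-19 feature maps φ_{i,j}(·) have already been extracted, the perceptual loss reduces to a size-normalized squared distance between feature tensors. The sketch below shows only that reduction; a real implementation would obtain `feat_hr` and `feat_sr` from VGG-19 activations.

```python
import numpy as np

def perceptual_loss(feat_hr, feat_sr):
    """L_per: squared distance between feature maps, normalized by
    h_l * w_l * c_l. Feature maps are assumed precomputed (VGG-19)."""
    h, w, c = feat_hr.shape
    return np.sum((feat_hr - feat_sr) ** 2) / (h * w * c)

a = np.zeros((4, 4, 2))
b = np.ones((4, 4, 2))
print(perceptual_loss(a, b))  # 1.0
```

Because the distance is taken in feature space rather than pixel space, the loss penalizes perceptual differences (textures, structures) more than exact pixel values.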
Further, the adversarial loss function comprises a generator adversarial loss function and a discriminator adversarial loss function, shown respectively by the following formulas:

L_adv^G = −(1/N) · Σ_{n=1..N} log D(G(I_HR; ω))

L_adv^D = −(1/N) · Σ_{n=1..N} [log D(I_HR) + log(1 − D(G(I_HR; ω)))]

where L_adv^G denotes the generator adversarial loss function and L_adv^D denotes the discriminator adversarial loss function; I_HR denotes the original high-resolution natural scene image; G(I_HR; ω) denotes the generated image; D(G(I_HR; ω)) denotes the probability assigned by the discrimination network that the generated image is real; ω denotes the parameters of the MCRAN generating network; and N is the number of natural scene images in one training batch.
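The two adversarial terms can be sketched numerically, assuming the standard GAN cross-entropy form consistent with the definitions above (`d_real` and `d_fake` stand for the discriminator's probability outputs on real and generated blocks; these names are illustrative).

```python
import numpy as np

def generator_adv_loss(d_fake):
    """-(1/N) * sum log D(fake): minimized when the discriminator
    scores generated patches as real."""
    return -np.mean(np.log(d_fake))

def discriminator_adv_loss(d_real, d_fake):
    """-(1/N) * sum [log D(real) + log(1 - D(fake))]."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

confident_fake = np.array([0.05, 0.10])   # D correctly rejects fakes
fooled = np.array([0.90, 0.95])           # D fooled by the generator
print(generator_adv_loss(fooled) < generator_adv_loss(confident_fake))  # True
```

The opposing signs on the D(fake) terms are what creates the adversarial dynamic: the generator lowers its loss by raising D(fake), which raises the discriminator's loss.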
Further, the total loss function of the MCRAN generating network is as follows:

L_total = α · L_pix + β · L_per + γ · L_adv

where L_pix is the pixel-level loss that recovers the low-frequency content, L_per is the perceptual loss, L_adv is the adversarial loss, and α, β and γ are the weights of the respective loss functions. To balance the contributions of the loss functions, the weighting coefficients α, β and γ are set to 0.5, 1e-2 and 1e-3, respectively.
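With the stated coefficients, the total loss is a plain weighted sum; the one-liner below makes the relative weighting concrete (the term names follow the description above and are illustrative, since the patent text lists only the weights):

```python
def total_loss(l_pix, l_per, l_adv, alpha=0.5, beta=1e-2, gamma=1e-3):
    """Weighted combination of the loss terms with the weights given in
    the text (0.5, 1e-2, 1e-3); l_pix is the pixel-level term assumed
    to recover low-frequency content."""
    return alpha * l_pix + beta * l_per + gamma * l_adv

print(total_loss(1.0, 1.0, 1.0))  # 0.511
```

The ordering of the weights (0.5 ≫ 1e-2 ≫ 1e-3) makes low-frequency fidelity dominate while the perceptual and adversarial terms act as refinements.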
In the above formulas, G denotes the MCRAN generating network and D denotes the discrimination network. The MCRAN generating network G is trained to output images as close as possible to the true original high-resolution natural scene image so as to fool the discrimination network D, while the discrimination network D tries to judge the natural scene images generated by G as false and the true original high-resolution natural scene images as true, thereby forming the adversarial idea.
By combining these loss functions and exploiting the advantages of each, the MCRAN generating network is jointly constrained to generate natural scene images with better visual effect.
Prior-art GAN networks can generate clearer natural scene images, mainly owing to the adversarial idea and the constraints of the loss functions. The invention combines multiple types of losses to jointly constrain the learning of the MCRAGAN super-resolution network: on the basis of a loss function that accurately recovers the low-frequency information, the perceptual loss and the adversarial loss are added to jointly guide the network to generate high-frequency information, so that the generated natural scene images have better visual perception.
In the invention, the discrimination network takes the area block cropped from the original high-resolution natural scene image as a RealPatch (target area block) and the area block cropped from the generated image as a FakePatch (downsampled area block); the discrimination network learns the data distribution of the RealPatch and then distinguishes the RealPatch from the FakePatch produced by the MCRAN generating network.
Each map value, a number in [0, 1], reflects the similarity between a target area block and a downsampled area block; when the discrimination network can hardly distinguish the original real-scene low-resolution image from the original high-resolution natural scene image, the trained MCRAGAN super-resolution network is obtained.
Most current super-resolution algorithms are studied under an ideal degradation model and cannot effectively cope with the complexity of real scenes. The invention uses the KernelGAN+ network as the natural scene image degradation model: it extracts blur kernel information, performs blur kernel estimation on real-scene low-resolution images, and also estimates real noise during the degradation process; the extracted blur kernel and noise act jointly on the original high-resolution natural scene image to simulate real-world degradation, generating low-resolution images close to the real scene and constructing a data set for training the super-resolution network, so that the network can be effectively applied to real scenes.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention constructs an MCRAGAN super-resolution network, utilizes a KernelgaN+ network to degrade an original high-resolution natural scene image, completes fuzzy kernel estimation, generates a low-resolution image close to a real scene, constructs a pair data set for training the MCRAGAN super-resolution network, and enables the MCRAGAN super-resolution network to be effectively applied to the real scene; the performance of the MCRAGAN super-resolution network on the real natural scene image is improved, and the generalization and the robustness of the MCRAGAN super-resolution network are improved.
Drawings
The drawings are for illustrative purposes only and are not to be construed as limiting the invention; for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
FIG. 1 is a schematic diagram of a blur kernel estimation process according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a conventional generation network structure according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a conventional discrimination network according to an embodiment of the present invention;
fig. 4 is a schematic diagram of the KernelGAN+ natural scene image degradation model provided by an embodiment of the present invention;
fig. 5 is a schematic diagram of the overall structure of an MCRAGAN super-resolution network according to an embodiment of the present invention;
fig. 6 is a schematic diagram showing comparison of different network reconstruction effects of natural scene images in a DPED training set according to an embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Related noun paraphrasing:
BN: batch Normalization, normalization of batch samples;
GAN: generative Adversarial Network, generating an antagonism network;
MCRAN: multi-scale Convolution and Residual-dense Attention super-resolution Networks, multi-scale convolution and remaining dense attention super-resolution networks;
MCRAGAN: multi-scale Convolution and Residual-dense Attention super-resolution Generative Adversarial Networks, multi-scale convolution and residual dense attention super-resolution generation countermeasure network
Example 1
For ease of understanding, referring to fig. 1 to fig. 6, an embodiment of the super-resolution reconstruction method based on blur kernel estimation provided by the present invention comprises the following steps:
step 1: constructing an MCRAGAN super-resolution network, wherein the MCRAGAN super-resolution network comprises an MCRAN generating network and a discrimination network which are sequentially connected;
step 2: acquiring an original high-resolution natural scene image, cutting a target area of the original high-resolution natural scene image to obtain a target area block, and labeling the target area block as true;
degrading the original high-resolution natural scene image to obtain a low-resolution image conforming to the real scene distribution; cutting a target area of the low-resolution image conforming to the real scene distribution to obtain a downsampling area block, and labeling the downsampling area block as false;
step 3: performing authenticity judgment on the target area block and the downsampled area block with the discrimination network, and outputting a D-map heat-map matrix to obtain an optimized MCRAN generating network;
step 4: matching the low-resolution image conforming to the real scene distribution with the original high-resolution natural scene image to obtain an LR-HR natural scene image pair, training an MCRAGAN super-resolution network by the LR-HR natural scene image pair, and obtaining a trained MCRAGAN super-resolution network; and inputting the low-resolution images conforming to the real scene distribution into the trained MCRAGAN super-resolution network to obtain corresponding super-resolution natural scene images.
Specifically, the MCRAGAN super-resolution network structure proposed by the present invention is shown in fig. 5; the MCRAN generating network in the step 1 comprises a convolution layer Conv_1, a plurality of MRAB modules, a convolution layer Conv_2, a convolution layer Conv_3, a convolution layer Conv_4 and a sub-pixel convolution module which are sequentially connected; the convolutional layers conv_1, conv_2, conv_3 all use the LeakyReLU nonlinear activation function.
The discrimination network comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a fifth convolution layer which are sequentially connected, wherein the first convolution layer to the fourth convolution layer all use a LeakyReLU nonlinear activation function; the fifth convolution layer uses a Sigmoid function.
The step length of the first convolution layer, the second convolution layer and the third convolution layer of the discrimination network is 2, the step length of the fourth convolution layer and the step length of the fifth convolution layer of the discrimination network are 1, the convolution kernels of the 5 layers of convolution of the discrimination network are 4 multiplied by 4, and the number of the convolution kernels of the first convolution layer to the fifth convolution layer is 64, 128, 256, 512 and 1 respectively.
In the step 2, degrading the original high-resolution natural scene image specifically comprises: acquiring an original real-scene low-resolution image; estimating the blur kernel and noise of the original real-scene low-resolution image with a KernelGAN+ network and storing them in a blur kernel pool K and a noise pool N, respectively; and, for each original high-resolution natural scene image, randomly drawing a blur kernel and noise from the pools K and N and degrading the image to obtain a low-resolution image I_LR conforming to the real scene distribution. I_LR is given by the following formula:

I_LR = (H * k)↓_a + n

where H denotes the original high-resolution natural scene image; k and n denote the blur kernel and the noise, respectively; * denotes the convolution operation; ↓_a denotes downsampling; and a denotes the downsampling factor.
The degradation method based on the KernelGAN+ network is applicable to common data sets and replaces the simple bicubic interpolation method. In the blur kernel estimation process, as shown in fig. 1, the original high-resolution natural scene image is taken as the input image and two patches (image blocks) of different sizes are cropped from it: the large patch is input into the MCRAN generating network G to obtain a generated image, and the small patch, together with the image downsampled by the MCRAN generating network G, is input into the discrimination network D to generate a heat-map matrix.
In step 3, authenticity judgment is performed on the target area block and the downsampled area block with the discrimination network, and a D-map heat-map matrix is output; each map value in the matrix is the probability that the corresponding input block is real. Map values range from 0 to 1; a larger value indicates a higher probability that the currently input downsampled area block originates from the original high-resolution natural scene image. When the discrimination network can hardly distinguish the target area block from the downsampled area block, the weights of the MCRAN generating network approximate the real blur kernel information, and the optimized MCRAN generating network is obtained.
In the prior art, as shown in fig. 2, the original generation network comprises first to sixth convolution layers connected in sequence; the deep linear network formed by these layers has the same representational capacity as a single-layer convolution network. The convolution kernel sizes of the first to third convolution layers are 7×7, 5×5 and 3×3 respectively, and the fourth to sixth convolution layers all use 1×1 convolution kernels. After the original high-resolution natural scene image passes through the first to third convolution layers, each pixel of the resulting feature map has a 13×13 receptive field; the fourth to sixth convolution layers do not change the receptive field size, so the whole original generation network expresses a real blur kernel of size 13×13. The sixth convolution layer is a 1×1 convolution with stride 2, so the output natural scene image is half the size of the input, completing the 2× downsampling and yielding the low-resolution image. The convolution layer parameters in fig. 2 are read as, e.g., k7n64s1: k7 denotes 7×7 convolution kernels, n64 denotes 64 convolution kernels, and s1 denotes a stride of 1.
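The 13×13 receptive field claimed above can be verified with the standard stacked-convolution formula, where each layer adds (k − 1) scaled by the product of the preceding strides. A small sketch (layer list taken from the description; the stride-2 of the last layer only halves the output and does not enlarge the per-pixel receptive field, so strides of 1 are used here):

```python
def receptive_field(kernels, strides):
    """Receptive field of stacked convolutions: each layer adds
    (k - 1) * product of the strides of the layers before it."""
    rf, jump = 1, 1
    for k, s in zip(kernels, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf

# first three layers 7x7, 5x5, 3x3; last three are 1x1 convolutions
print(receptive_field([7, 5, 3, 1, 1, 1], [1, 1, 1, 1, 1, 1]))  # 13
```

The three 1×1 layers contribute nothing to the receptive field, which is why only the 7×7, 5×5 and 3×3 layers determine the 13×13 effective blur kernel size.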
As shown in fig. 3, the original discrimination network in the prior art uses seven convolution layers; the first Block module comprises a Conv layer, a BN layer and a ReLU activation function, and the last four Blocks serve as convolution layers. The first to seventh convolution layers use no pooling or striding; except for the first convolution layer, which uses 7×7 convolution kernels, all other layers use 1×1 convolution kernels. Each value in the finally generated heat-map matrix is the probability that a 7×7 natural scene image block of the original high-resolution natural scene image is real; this probability is the map value, with range [0, 1]. The larger the map value, the more likely the currently input natural scene image originates from the original natural scene image. The probability represents the likelihood that the two input image blocks come from the same image, i.e. their similarity.
The KernelGAN+ network is a linear convolutional network. The invention uses it to extract blur kernel information and perform blur kernel estimation on real-scene low-resolution images; by also estimating real noise during degradation modeling, it generates LR natural scene images close to the real scene and constructs a data set for training the super-resolution network, so that the network can be effectively applied to real scenes.
Example 2
Specifically, this embodiment builds on embodiment 1 and is described with reference to specific implementations to further demonstrate its technical effects, as follows:
Specifically, the MRAB module comprises an input X_{n-1}, convolution layers Conv1_1, Conv1_2 and Conv1_3, a first fusion layer, convolution layers Conv2_1 and Conv2_2, a second fusion layer, a convolution layer Conv3_1, a channel attention module and an output X_n. The input X_{n-1} is connected to the inputs of Conv1_1, Conv1_2 and Conv1_3; the input X_{n-1} and the outputs of Conv1_2 and Conv1_3 are connected to the input of the first fusion layer, whose output is connected to the inputs of Conv2_1 and Conv2_2; the input X_{n-1} and the outputs of Conv2_1, Conv2_2 and Conv1_1 are connected to the input of the second fusion layer, whose output passes through Conv3_1 and the channel attention module in sequence to give the output X_n. The convolution layers Conv1_2, Conv1_3, Conv2_1, Conv2_2 and Conv3_1 all use the LeakyReLU nonlinear activation function.
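The patent does not specify the internals of the channel attention module at the end of the MRAB block; a common form, sketched below as an assumption, is squeeze-and-excitation style attention: global average pooling per channel, a small two-layer gate, and a sigmoid-scaled reweighting of the channels (all names and shapes are illustrative).

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style channel attention (assumed sketch;
    not specified by the patent).

    x: (C, H, W) feature map; w1: (C//r, C); w2: (C, C//r)."""
    squeeze = x.mean(axis=(1, 2))                  # global avg pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)         # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate per channel
    return x * scale[:, None, None]                # reweight channels

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 5, 5))
w1 = rng.standard_normal((2, 8))
w2 = rng.standard_normal((8, 2))
y = channel_attention(x, w1, w2)
print(y.shape)  # (8, 5, 5)
```

Because the gate values lie strictly in (0, 1), the module can only attenuate channels, letting the block emphasize informative feature maps from the multi-scale fusion.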
Training the MCRAGAN super-resolution network with the LR-HR natural scene image pairs in step 4 is specifically as follows: a perceptual loss function is constructed from the low-resolution image conforming to the real scene distribution and the original high-resolution natural scene image; an adversarial loss function is constructed from the generating network; a total loss function is constructed from the perceptual loss function and the adversarial loss function; and the MCRAGAN super-resolution model is trained with the total loss function for a set number of training iterations to obtain the optimal super-resolution image.
the perceptual loss function is represented by the following formula:
Figure BDA0004008775980000091
in the formula, h l 、w l 、c l Respectively representing the width, height and channel number of the generated image, I HR Representing an original high resolution natural scene image, G (I LR ) Representing a low resolution image conforming to a real scene distribution, phi i,j Representing a feature map obtained after pooling of the ith layer under the VGG19 network and before the jth convolution.
The adversarial loss function includes a generator adversarial loss function and a discriminator adversarial loss function, given respectively by:

$$\ell_G = -\frac{1}{N}\sum_{n=1}^{N}\log D\!\left(G(I_n^{HR};\omega)\right)$$

$$\ell_D = -\frac{1}{N}\sum_{n=1}^{N}\left[\log D\!\left(I_n^{HR}\right) + \log\!\left(1 - D\!\left(G(I_n^{HR};\omega)\right)\right)\right]$$

where ℓ_G denotes the generator adversarial loss function, ℓ_D denotes the discriminator adversarial loss function, I^{HR} denotes the original high-resolution natural scene image, G(I^{HR}; ω) denotes the generated image, D(G(I^{HR}; ω)) denotes the probability assigned by the discrimination network to the generated image being real, ω is the parameter set of the MCRAN generating network, and N is the number of natural scene images in one training batch.
The total loss function of the MCRAN generating network is:

$$L_{total} = \alpha L_{percep} + \beta \ell_G + \gamma L_{pix}$$

where α, β and γ are the weights of the respective loss terms, L_{pix} denoting a pixel-level loss.
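A minimal sketch of these loss terms (Python/PyTorch) is given below. A pluggable feature extractor stands in for the fixed VGG19 features, and the default weight values are placeholders, since the patent does not publish α, β, γ:

```python
import torch
import torch.nn.functional as F

def perceptual_loss(feat_extractor, hr, sr):
    """||phi(I_HR) - phi(G(I_LR))||^2 / (h*w*c); phi is a fixed VGG19
    feature map in the patent, but any frozen extractor works here."""
    f_hr, f_sr = feat_extractor(hr), feat_extractor(sr)
    _, c, h, w = f_hr.shape
    return F.mse_loss(f_sr, f_hr, reduction="sum") / (c * h * w)

def generator_adv_loss(d_fake):
    # -log D(G(.)) averaged over the batch: push generated images to "real"
    return -torch.log(d_fake + 1e-8).mean()

def discriminator_adv_loss(d_real, d_fake):
    # standard GAN discriminator loss on real vs generated scores
    return -(torch.log(d_real + 1e-8) + torch.log(1 - d_fake + 1e-8)).mean()

def total_loss(l_percep, l_adv, l_pix, alpha=1.0, beta=5e-3, gamma=1e-2):
    # alpha, beta, gamma weight the terms; the defaults are illustrative only
    return alpha * l_percep + beta * l_adv + gamma * l_pix
```

During training, `discriminator_adv_loss` updates the discrimination network and `total_loss` (with `generator_adv_loss` as the adversarial term) updates the generating network, alternating each iteration.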
According to the invention, the original high-resolution natural scene images are preprocessed using blur kernel estimation based on the KernelGAN+ network to obtain the blur kernel information contained in various low-resolution images. Each training run estimates the blur kernel of only a single natural scene image; the estimated kernel is then placed into the blur kernel pool, and the operation is repeated over a large number of real-scene low-resolution images, completing the construction of a rich blur kernel pool conforming to the real scene.
The blur kernel estimation exploits the adversarial "zero-sum game" between the MCRAN generating network and the discrimination network, in which the two networks strengthen each other until a Nash equilibrium is reached. Both are convolutional neural networks based on deep learning. The MCRAN generating network is a deep multi-layer linear convolutional network; its input is a low-resolution image, and its output is a generated natural scene image after 2x downsampling. Then, using the idea of cross-scale self-similarity, a target region block is cropped from the original high-resolution natural scene image, and a downsampled region block is cropped from the low-resolution image conforming to the real scene distribution. The discrimination network judges the authenticity of the target region block and the downsampled region block and outputs a D-map thermodynamic matrix, whose per-pixel values between 0 and 1 reflect the similarity between the original high-resolution natural scene image and the generated image. When the discrimination network can hardly tell the two apart, the weight parameters of the MCRAN generating network are close to the real blur kernel information. Cross-scale self-similarity refers to the similarity of the same image content at different scaling factors.
In the prior art, the original discrimination network takes a regional natural scene image block cropped from the low-resolution image as the Real Patch and a regional image block cropped from the generated natural scene image as the Fake Patch; its role is to learn the data distribution of the Real Patch and then distinguish it from the Fake Patch produced by the generating network.
the Kernelgan+ network is used as a fuzzy kernel estimation method, and is used for generating a low-resolution image similar to a real scene, and is equivalent to a downsampling module based on learning. The degradation operation of the real scene is a linear transformation process, and the generation of the natural scene image is influenced by adding a nonlinear transformation structure of an activation function; the generator of the KernelGAN + network of the present invention does not contain any activation functions, so unlike the generator of a conventional GAN network,
Specifically, a data set for training the MCRAGAN super-resolution network is constructed from the original high-resolution natural scene images and the low-resolution images; each training sample comprises an original high-resolution natural scene image and a low-resolution image, which together form an LR-HR natural scene image pair.
In this embodiment, the DIV2K data set of 1000 images is used and divided into a training set, a test set and a validation set at a ratio of 8:1:1, together with the DPED training set and the DIV2KRK test set. The 800 training-set natural scene images are processed with the KernelGAN+ network to construct a real paired data set K-DIV2K. The evaluation is divided into a simulated natural scene image test and a real natural scene image test. The simulated test uses the standard public DIV2K data set as the objective evaluation benchmark: each image is generated by combining one of the 100 validation-set natural scene images with a random anisotropic Gaussian blur kernel, where the kernel is rotated by a random angle with randomly set lengths in the horizontal and vertical directions, and Gaussian white noise with a noise level below 25 is added, finally producing a low-resolution image close to the real scene distribution.
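The random anisotropic Gaussian blur kernel used to build these validation images might be generated as follows (NumPy). Only the 13x13 size, random rotation and independent axis lengths come from the text; the sigma ranges are illustrative assumptions:

```python
import numpy as np

def anisotropic_gaussian_kernel(size=13, sigma_x=None, sigma_y=None,
                                theta=None, rng=None):
    """Random anisotropic Gaussian blur kernel: independent lengths along
    two axes plus a random rotation angle, normalised to sum to 1.
    The sigma ranges below are assumptions, not from the patent."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_x = rng.uniform(0.6, 5.0) if sigma_x is None else sigma_x
    sigma_y = rng.uniform(0.6, 5.0) if sigma_y is None else sigma_y
    theta = rng.uniform(0.0, np.pi) if theta is None else theta

    # inverse covariance of the rotated anisotropic Gaussian
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    D = np.diag([sigma_x ** 2, sigma_y ** 2])
    inv_cov = np.linalg.inv(R @ D @ R.T)

    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    coords = np.stack([xs, ys], axis=-1)            # (size, size, 2)
    # evaluate exp(-0.5 * x^T Sigma^{-1} x) on the grid
    expo = np.einsum("...i,ij,...j->...", coords, inv_cov, coords)
    k = np.exp(-0.5 * expo)
    return k / k.sum()
```

Adding white Gaussian noise with a randomly chosen level up to 25 (on a 0-255 scale) to the blurred, downsampled image would complete the simulated degradation described above.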
According to the invention, the KernelGAN+ network is used for blur kernel estimation and noise estimation as the degradation model. Natural scene image blocks of size 100x100 are randomly cropped from natural scene images captured by the iphone3 and blackberry phones of the DPED training set, and the blur kernels and noise of the original high-resolution natural scene images in the real scene are extracted from these blocks, yielding 1000 blur kernels of size 13x13 and 1000 noise patches, which are placed into the blur kernel pool K and the noise pool N respectively, in preparation for generating natural scene image pairs conforming to the real scene distribution.
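The pool-based degradation I^{LR} = (H * k)↓_a + n used here (and restated in claim 5) can be sketched in NumPy as follows; the edge padding and uniform pool sampling are assumptions for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _conv2_same(img, k):
    """'same'-size 2-D convolution with edge padding (NumPy only)."""
    pad = k.shape[0] // 2
    padded = np.pad(img, pad, mode="edge")
    windows = sliding_window_view(padded, k.shape)
    # flip the kernel for true convolution, then slide it over the image
    return np.einsum("ijkl,kl->ij", windows, k[::-1, ::-1])

def degrade(hr, kernel_pool, noise_pool, scale=2, rng=None):
    """I_LR = (H * k) downsampled by `scale`, plus a noise patch n,
    with k and n drawn at random from the pools."""
    rng = np.random.default_rng() if rng is None else rng
    k = kernel_pool[rng.integers(len(kernel_pool))]
    blurred = _conv2_same(hr, k)
    lr = blurred[::scale, ::scale]                 # direct subsampling
    n = noise_pool[rng.integers(len(noise_pool))]
    return lr + n[:lr.shape[0], :lr.shape[1]]
```

Running `degrade` over every original high-resolution image with the pools K and N yields the LR side of the LR-HR training pairs.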
The LR-HR natural scene images form the paired data set. To verify the effectiveness of the KernelGAN+ network, DIV2KRK is adopted as the real-scene test set, and the comparison models are VDSR, DRRN, SRResNet, ZSSR, and the proposed MCRAN generating network and MCRAGAN super-resolution network. On the DIV2KRK test set simulating the real scene, the objective evaluation indices of the super-resolution models trained with the KernelGAN+ network degradation method are clearly superior to those of existing degradation methods. For the MCRAGAN super-resolution network, the mean PSNR of the generated images improves by about 1.5 dB, demonstrating that the KernelGAN+ network introduces effective prior information when constructing paired data sets of real scenes and helps the MCRAGAN super-resolution network to be applied in real scenes. Table 1 shows the objective evaluation indices of each network on the DIV2KRK test set, where K indicates that the KernelGAN+ network was applied to the network's input images.
Table 1. Objective evaluation indices of the different networks at x2 and x4 reconstruction factors
In the real natural scene image test, the DPED training set, historical old photographs and low-quality web images are selected for the x4 reconstruction test. Since real low-resolution images have no corresponding original high-resolution natural scene images for quantitative evaluation, the reconstruction effect of each network is assessed visually. The test results are shown in Fig. 6, which uses image No. 95 of the DPED training set. Comparing the reconstructed detail regions shows that the bicubic interpolation algorithm (Bicubic) and the DRRN network are blurred at the edges of the reconstructed natural scene image and essentially lose image detail, while the ZSSR network introduces varying degrees of noise that degrade the overall image quality; the SRResNet and MCRAN networks, which use a pixel-level loss, reconstruct the original high-resolution natural scene image too smoothly; the MCRAGAN super-resolution network reconstructs visually clear high-resolution natural scene images across the different types of real natural scene images.
An ablation comparison is carried out between the original KernelGAN network and the proposed KernelGAN+ network. Because blur kernel estimation is difficult to compare explicitly, super-resolution networks are trained on data sets generated by the different networks and then tested on the DIV2K data set; the super-resolution network uses the method proposed by the invention. The objective evaluation results at x2 and x4 reconstruction factors are shown in Table 2, where "KernelGAN" denotes training on the data set generated by the original KernelGAN and "KernelGAN+" denotes training on the data set generated by the improved KernelGAN+ network.
Table 2. Super-resolution evaluation indices of different degradation models on DIV2KRK
As the table shows, at the x2 and x4 reconstruction factors the PSNR of the MCRAN generating network trained with the improved KernelGAN+ network improves by 0.17 dB and 0.14 dB respectively over the original KernelGAN network, and the MCRAGAN super-resolution network improves by 0.19 dB and 0.13 dB respectively. This shows that adding noise estimation on top of the blur-kernel-based degradation model fits the degradation process of the real scene well, proving the effectiveness of the improved KernelGAN+ degradation model.
The performance of super-resolution networks trained with the KernelGAN+ degradation scheme is superior to that of the traditional degradation scheme and transfers well to real scenes, proving the effectiveness of the KernelGAN+ network. The method effectively improves the performance of the MCRAGAN super-resolution network in real scenes, can improve the natural scene image quality of real scenes, and benefits other computer vision tasks.
The invention selects the MCRAN generating network; the discrimination network comprises 5 sequentially connected convolution layers and outputs a thermodynamic matrix of size NxN. This NxN thermodynamic matrix is essentially the feature map output by the last convolution layer of the discrimination network: each element of the feature map can be traced back to a position in the input natural scene image, showing the influence of that position on the final output.
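The discrimination network spec given in claims 3 and 4 (five 4x4 convolutions, strides 2,2,2,1,1, filter counts 64/128/256/512/1, LeakyReLU on the first four layers and Sigmoid on the last) translates into PyTorch roughly as follows; the padding values are assumptions not stated in the patent, so the exact size N of the NxN D-map may differ:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Discrimination network per the disclosed spec: five 4x4 convolutions
    with strides 2,2,2,1,1 and 64/128/256/512/1 filters; LeakyReLU on the
    first four layers, Sigmoid on the fifth. The output is the NxN D-map
    whose entries score how 'real' each receptive field looks."""
    def __init__(self, in_ch=3):
        super().__init__()
        act = nn.LeakyReLU(0.2, inplace=True)
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), act,
            nn.Conv2d(64, 128, 4, stride=2, padding=1), act,
            nn.Conv2d(128, 256, 4, stride=2, padding=1), act,
            nn.Conv2d(256, 512, 4, stride=1, padding=1), act,
            nn.Conv2d(512, 1, 4, stride=1, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)  # (B, 1, N, N) D-map, values in [0, 1]
```

Each D-map element is a per-region real/fake probability rather than a single image-level score, which is what lets the map be traced back to positions in the input image.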
It is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are desired to be protected by the following claims.

Claims (10)

1. A super-resolution reconstruction method based on blur kernel estimation, characterized by comprising the following steps:
Step 1: constructing an MCRAGAN super-resolution network, wherein the MCRAGAN super-resolution network comprises an MCRAN generating network and a discrimination network connected in sequence;
Step 2: acquiring an original high-resolution natural scene image, cropping a target region of the original high-resolution natural scene image to obtain a target region block, and labeling the target region block as true;
degrading the original high-resolution natural scene image to obtain a low-resolution image conforming to the real scene distribution; cropping a target region of the low-resolution image conforming to the real scene distribution to obtain a downsampled region block, and labeling the downsampled region block as false;
Step 3: judging the authenticity of the target region block and the downsampled region block with the discrimination network, and outputting a D-map thermodynamic matrix, to obtain an optimized MCRAN generating network;
Step 4: pairing the low-resolution image conforming to the real scene distribution with the original high-resolution natural scene image to obtain an LR-HR natural scene image pair, and training the MCRAGAN super-resolution network with the LR-HR natural scene image pairs to obtain a trained MCRAGAN super-resolution network; and inputting a low-resolution image conforming to the real scene distribution into the trained MCRAGAN super-resolution network to obtain the corresponding super-resolution natural scene image.
2. The super-resolution reconstruction method based on blur kernel estimation according to claim 1, wherein in step 1 the MCRAN generating network comprises a convolution layer Conv_1, a plurality of MRAB modules, a convolution layer Conv_2, a convolution layer Conv_3, a convolution layer Conv_4 and a sub-pixel convolution module connected in sequence; the convolution layers Conv_1, Conv_2 and Conv_3 all use the LeakyReLU nonlinear activation function.
3. The super-resolution reconstruction method based on blur kernel estimation according to claim 2, wherein in step 1 the discrimination network comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a fifth convolution layer connected in sequence; the first to fourth convolution layers use the LeakyReLU nonlinear activation function, and the fifth convolution layer uses a Sigmoid function.
4. The super-resolution reconstruction method based on blur kernel estimation according to claim 3, wherein in step 1 the step sizes of the first, second and third convolution layers of the discrimination network are 2, the step sizes of the fourth and fifth convolution layers are 1, the convolution kernels of all 5 layers of the discrimination network are 4x4, and the numbers of convolution kernels of the first to fifth convolution layers are 64, 128, 256, 512 and 1 respectively.
5. The super-resolution reconstruction method based on blur kernel estimation according to claim 4, wherein in step 2 the degradation of the original high-resolution natural scene image is specifically: acquiring original real-scene low-resolution images, estimating the blur kernels and noise of the original real-scene low-resolution images with the KernelGAN+ network, and storing them in a blur kernel pool K and a noise pool N respectively; for each original high-resolution natural scene image, a blur kernel and a noise patch are randomly drawn from the pools K and N, and degradation is performed to obtain a low-resolution image I^{LR} conforming to the real scene distribution, completing the blur kernel estimation; I^{LR} is given by:

$$I^{LR} = (H * k)\downarrow_a + n$$

wherein H denotes the original high-resolution natural scene image; k and n denote the blur kernel and the noise respectively; * denotes the convolution operation, ↓ denotes downsampling, and a denotes the downsampling factor.
6. The super-resolution reconstruction method according to claim 5, wherein the KernelGAN+ network is obtained by removing the activation functions from the KernelGAN network.
7. The super-resolution reconstruction method based on blur kernel estimation according to claim 6, wherein step 3 specifically comprises: judging the authenticity of the target region block and the downsampled region block with the discrimination network and outputting a D-map thermodynamic matrix, wherein each map value in the D-map thermodynamic matrix is the probability that the corresponding target region block or downsampled region block is true or false; the map values range from 0 to 1, and a larger probability indicates a higher likelihood that the currently input downsampled region block originates from the original high-resolution natural scene image; when the discrimination network can hardly judge the difference between the target region block and the downsampled region block, the weights of the MCRAN generating network are close to the real blur kernel information, giving the optimized MCRAN generating network.
8. The method of claim 7, wherein the training of the MCRAGAN super-resolution network with the LR-HR natural scene image pairs in step 4 is specifically: constructing a perceptual loss function from the low-resolution image conforming to the real scene distribution and the original high-resolution natural scene image; constructing an adversarial loss function from the generating network; constructing a total loss function from the perceptual loss function and the adversarial loss function; and training the MCRAGAN super-resolution model with the total loss function;
the perceptual loss function is given by:

$$L_{percep} = \frac{1}{h_l w_l c_l}\left\|\phi_{i,j}\!\left(I^{HR}\right) - \phi_{i,j}\!\left(G(I^{LR})\right)\right\|_2^2$$

wherein h_l, w_l and c_l denote the height, width and number of channels of the generated image respectively, I^{HR} denotes the original high-resolution natural scene image, G(I^{LR}) denotes the image generated from the low-resolution input conforming to the real scene distribution, the perceptual loss is computed on the activated features of the VGG-19 network, and φ_{i,j} denotes the feature map obtained after the i-th pooling layer and before the j-th convolution of the VGG19 network.
9. The super-resolution reconstruction method according to claim 8, wherein the adversarial loss function includes a generator adversarial loss function and a discriminator adversarial loss function, given respectively by:

$$\ell_G = -\frac{1}{N}\sum_{n=1}^{N}\log D\!\left(G(I_n^{HR};\omega)\right)$$

$$\ell_D = -\frac{1}{N}\sum_{n=1}^{N}\left[\log D\!\left(I_n^{HR}\right) + \log\!\left(1 - D\!\left(G(I_n^{HR};\omega)\right)\right)\right]$$

wherein ℓ_G denotes the generator adversarial loss function, ℓ_D denotes the discriminator adversarial loss function, I^{HR} denotes the original high-resolution natural scene image, G(I^{HR}; ω) denotes the generated image, D(G(I^{HR}; ω)) denotes the probability assigned by the discrimination network to the generated image being real, ω is the parameter set of the MCRAN generating network, and N is the number of natural scene images in one training batch.
10. The super-resolution reconstruction method as claimed in claim 9, wherein the total loss function of the MCRAN generating network is:

$$L_{total} = \alpha L_{percep} + \beta \ell_G + \gamma L_{pix}$$

wherein α, β and γ are the weights of the respective loss terms, L_{pix} denoting a pixel-level loss.
CN202211640579.9A 2022-12-20 2022-12-20 Super-resolution reconstruction method based on fuzzy core estimation Pending CN116152061A (en)


Publications (1)

Publication Number Publication Date
CN116152061A true CN116152061A (en) 2023-05-23

Family

ID=86338177



Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116843553A (en) * 2023-07-11 2023-10-03 太原理工大学 Blind super-resolution reconstruction method based on kernel uncertainty learning and degradation embedding
CN116843553B (en) * 2023-07-11 2024-01-02 太原理工大学 Blind super-resolution reconstruction method based on kernel uncertainty learning and degradation embedding


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination