CN112163998A - Single-image super-resolution analysis method matched with natural degradation conditions - Google Patents


Info

Publication number
CN112163998A
CN112163998A (application CN202011015611.5A)
Authority
CN
China
Prior art keywords
image
resolution
network
super
resolution analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011015611.5A
Other languages
Chinese (zh)
Inventor
陈伦强
龙学军
陈东文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhaoqing Boshixin Electronic Technology Co ltd
Original Assignee
Zhaoqing Boshixin Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhaoqing Boshixin Electronic Technology Co ltd filed Critical Zhaoqing Boshixin Electronic Technology Co ltd
Priority to CN202011015611.5A
Publication of CN112163998A
Legal status: Pending

Classifications

    • G06T3/4053 — Super resolution, i.e. output image resolution higher than sensor resolution (G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T3/00 Geometric image transformation in the plane of the image › G06T3/40 Scaling the whole image or part thereof)
    • G06N3/045 — Combinations of networks (G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/08 — Learning methods (G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks)

Abstract

The invention discloses a single-image super-resolution analysis method matched with natural degradation conditions, comprising a domain learning network and a super-resolution analysis learning network. The domain learning network comprises a generator network G_d and a discriminator network D_x; the super-resolution analysis learning network comprises a super-resolution analysis network G_SR and a discriminator network D_y. The generator network G_d is trained by deep learning to generate low-resolution images that match the naturally degraded low-resolution source-domain images x. Within a generative adversarial network framework, the super-resolution analysis network G_SR is then trained with pixel-by-pixel supervision on the generated low-resolution images and the corresponding high-resolution images in the clean high-resolution target domain y: residual learning is performed on image blocks in the high-resolution domain from the generated low resolution by minimizing an energy loss function, and adversarial training is performed with pixel-level supervision, realizing an image degradation process matched with reality.

Description

Single-image super-resolution analysis method matched with natural degradation conditions
Technical Field
The invention relates to the technical field of image processing, in particular to a single-image super-resolution analysis method matched with natural degradation conditions.
Background
The goal of single-image super-resolution analysis (SISR) is to recover a high-resolution (HR) image from a low-resolution (LR) observed image. The single-image super-resolution problem is a fundamental low-level vision and image-processing problem with numerous practical applications in satellite imaging, medical imaging, astronomy, microscopy, seismology, remote sensing, surveillance, biometric identification, image compression, and so on. Recently, many works have addressed the single-image super-resolution task owing to the powerful feature-representation capability of deep CNNs. Lim et al. in 2017 proposed the enhanced deep super-resolution network (EDSR) using residual learning. Wang et al. in 2018 proposed the enhanced super-resolution generative adversarial network (ESRGAN), which achieved the most advanced perceptual performance at that time; it was trained on HR/LR image pairs produced by bicubic downsampling, which limits its performance in realistic settings. Fritsche et al. in 2019 proposed training the network to learn the degradation mechanism in an unsupervised manner and modified the enhanced super-resolution generative adversarial network into frequency-separated super-resolution (SR-FS) to further improve its performance in real environments [3]. These deep-learning-based single-image super-resolution methods learn a nonlinear mapping between low-resolution input and high-resolution output from a large number of paired low-resolution/high-resolution (LR/HR) training data. The high-resolution images are real captured images, while the paired low-resolution images are generally obtained by a bicubic downsampling degradation of the high-resolution images; a known bicubic downsampling degradation model cannot summarize natural image characteristics well.
Therefore, when such a degradation process is compared with actual image corruption, such as inherent sensor noise, random noise and compression artifacts, the image degradation process does not match these natural degradation conditions, which limits super-resolution performance. We therefore propose a single-image super-resolution analysis method matched with natural degradation conditions to solve the above problems.
Disclosure of Invention
The invention aims to provide a single-image super-resolution analysis method matched with natural degradation conditions, so as to solve the problems in the background technology.
In order to achieve the purpose, the invention provides the following technical scheme: a single-image super-resolution analysis method matched with natural degradation conditions, comprising a domain learning network and a super-resolution analysis learning network. The domain learning network comprises a generator network G_d and a discriminator network D_x; the super-resolution analysis learning network comprises a super-resolution analysis network G_SR and a discriminator network D_y. The generator network G_d, trained by deep learning, serves as a sampling network to generate low-resolution images that match, i.e. share the characteristics of, the naturally degraded low-resolution source-domain images x. Within a generative adversarial network framework, the super-resolution analysis network G_SR is trained with pixel-wise supervision on the generated low-resolution images and the corresponding high-resolution images in the clean high-resolution target domain y, so as to learn the mapping from the bicubic downsampled image z of the high-resolution image y to the source-domain image x while preserving the image content; residual learning is performed on image blocks in the high-resolution domain from the generated low resolution, and adversarial training is performed with pixel-level supervision, realizing an image degradation process matched with reality.
In a preferred embodiment, the model of the degradation process is as follows:

$$\hat{y} = \mathbf{H}\,y + \eta \tag{1}$$

where ŷ is the low-resolution image obtained by the degradation, H is a downsampling operator that resizes the high-resolution image y by a scaling factor s, and η is additive white Gaussian noise with standard deviation σ; N × N is the total number of pixels in the image.
In a preferred embodiment, the degradation model above describes training; conversely, in application, Y is an observed degraded low-resolution image, and the high-resolution image X is restored from the observed low-resolution image Y by a variational method combining the observation and prior information, with the corresponding loss function:

$$\hat{X} = \arg\min_{X} \frac{1}{2}\lVert Y - \mathbf{H}X \rVert^2 + \lambda R_W(X) \tag{2}$$

where the first term is the data fidelity term that measures the closeness of the solution to the observation, R_W(X) is a regularization term associated with the image prior information, and λ is a trade-off parameter that controls the balance between data fidelity and regularization.
In a preferred embodiment, the regularizer used by the regularization term has the general form:

$$R_W(X) = \sum_{k=1}^{K} \rho_k(L_k X) \tag{3}$$

where L_k corresponds to a first- or higher-order differential linear operator, and ρ(·) represents a potential function, such as an L_p vector or matrix norm, applied to the filtered output; the parameters W of the regularizer R_W(X) are expressed by a deep convolutional neural network, so that a strong image prior can be obtained.
In a preferred embodiment, a suitable optimization strategy is used to find W and to minimize the energy-based loss function, comprising the following steps:
1) The high-resolution image X is restored by the minimization:

$$\hat{X} = \arg\min_{X} E(X; Y) \tag{4}$$

which, by substituting equations (2) and (3), is written as:

$$\hat{X} = \arg\min_{X} \frac{1}{2}\lVert Y - \mathbf{H}X \rVert^2 + \sum_{k=1}^{K} \rho_k(L_k X) \tag{5}$$

2) The image intensities must be constrained, since the values in a natural image are non-negative, so (5) is rewritten in constrained-optimization form:

$$\hat{X} = \arg\min_{X \in C} \frac{1}{2}\lVert Y - \mathbf{H}X \rVert^2 + \sum_{k=1}^{K} \rho_k(L_k X) \tag{6}$$

Equation (6) is solved by a proximal gradient method to handle the optimization of a function that is not everywhere differentiable: the objective is split into a smooth part and a non-smooth part, and (6) is rewritten as:

$$\hat{X} = \arg\min_{X} F(X) + \iota_C(X), \qquad F(X) = \frac{1}{2}\lVert Y - \mathbf{H}X \rVert^2 + \sum_{k=1}^{K} \rho_k(L_k X) \tag{7}$$

where ι_C is the indicator function of the convex set C = {X : x_i ≥ 0 for every pixel i}. The trainable projection layer computes a proximal map for the indicator function:

$$\mathrm{prox}_{\iota_C}(Z) = \arg\min_{X \in C} \frac{1}{2}\lVert X - Z \rVert^2 \tag{8}$$

whose parameterized threshold depends on a trainable parameter α, the noise level σ, and the total number of pixels C·H·W in the image;
3) The problem in (7) is solved iteratively with the update rule:

$$X^{(t+1)} = \mathrm{prox}_{\gamma_t \iota_C}\!\left(X^{(t)} - \gamma_t \nabla F\!\left(X^{(t)}\right)\right) \tag{9}$$

where γ_t is the step size and prox_{ι_C} is the proximal operator associated with the indicator function ι_C, defined as:

$$\mathrm{prox}_{\iota_C}(Z) = \arg\min_{X} \iota_C(X) + \frac{1}{2}\lVert X - Z \rVert^2$$

The gradient of F(X) is computed as:

$$\nabla F(X) = \mathbf{H}^{T}\!\left(\mathbf{H}X - Y\right) + \sum_{k=1}^{K} L_k^{T}\,\phi_k(L_k X) \tag{10}$$

where φ_k(·) is the gradient of the potential function ρ_k(·);
4) Combining equations (8), (9) and (10), the final form is:

$$X^{(t+1)} = P_C\!\left(X^{(t)} - \gamma_t\!\left(\mathbf{H}^{T}(\mathbf{H}X^{(t)} - Y) + \sum_{k=1}^{K} L_k^{T}\,\phi_k(L_k X^{(t)})\right)\right) \tag{11}$$

Formula (11) performs one proximal gradient descent inference step from the starting points Y and X^(0) = 0, which gives:

$$\hat{X} = P_C\!\left(\mathbf{H}^{T}Y - \sum_{k=1}^{K} L_k^{T}\,\phi_k(L_k \mathbf{H}^{T}Y)\right) \tag{12}$$

where α = λγ corresponds to the trainable parameters of the projection layer, L_k^T is the adjoint filter of L_k, and H^T denotes the up-scaling operation.
In a preferred embodiment, in step 4), formula (12) is also used to design the generator network G_d, where φ_k(·) corresponds to a parameterized rectified linear unit (PReLU) applied to the convolutional feature maps.
In a preferred embodiment, the generator network G_d is trained with a loss function so as to learn the image degradation at the domain-distribution level from the source domain x; under the adversarial network framework, the loss function for training G_d is:

$$\mathcal{L}_{G_d} = \mathcal{L}_{color} + \mathcal{L}_{tex} + \mathcal{L}_{per} \tag{13}$$

where L_color represents the color loss, L_tex the texture loss, and L_per the perceptual loss.
In an embodiment, the super-resolution analysis network G_SR has the following loss function:

$$\mathcal{L}_{G_{SR}} = \mathcal{L}_{per} + \mathcal{L}_{GAN} + \mathcal{L}_{tv} + \mathcal{L}_{1} \tag{14}$$

where the perceptual loss L_per focuses on the perceptual quality of the output image and is defined as:

$$\mathcal{L}_{per} = \frac{1}{N}\sum_{i=1}^{N} \left\lVert \phi(\hat{y}_i) - \phi(y_i) \right\rVert_1 \tag{15}$$

where φ denotes features extracted from a pre-trained network;
the texture loss L_GAN focuses on the high-frequency part of the output image and is defined as:

$$\mathcal{L}_{GAN} = -\mathbb{E}_{y}\!\left[\log\!\left(1 - D_{Ra}(y, \hat{y})\right)\right] - \mathbb{E}_{\hat{y}}\!\left[\log D_{Ra}(\hat{y}, y)\right] \tag{16}$$

where E_y and E_ŷ denote averaging over all real patches y and generated patches ŷ, respectively; the relativistic adversarial network score between real high-resolution and reconstructed high-resolution image patches is then defined as:

$$D_{Ra}(y, \hat{y}) = \sigma\!\left(C(y) - \mathbb{E}_{\hat{y}}\!\left[C(\hat{y})\right]\right) \tag{17}$$

where C is the raw discriminator output and σ is the sigmoid function;
the content loss L_1 is defined as:

$$\mathcal{L}_{1} = \frac{1}{N}\sum_{i=1}^{N} \left\lVert \hat{y}_i - y_i \right\rVert_1 \tag{18}$$

where N is the size of the small image block;
the total-variation loss L_tv focuses on minimizing gradient differences and producing sharpness in the output image; it is defined as:

$$\mathcal{L}_{tv} = \frac{1}{N}\sum_{i=1}^{N}\left( \left\lVert \nabla_h \hat{y}_i - \nabla_h y_i \right\rVert_1 + \left\lVert \nabla_v \hat{y}_i - \nabla_v y_i \right\rVert_1 \right) \tag{19}$$

where ∇_h and ∇_v denote the horizontal and vertical gradients of the image.
In a preferred embodiment, the discriminator network D_x operates at the image-block level: its convolutional layers (Conv) have 64-256 feature-map kernels of size 5×5, the last convolutional layer maps 256 feature maps to 1, and a batch-normalization module (BN) and a Leaky ReLU activation function (LReLU) are applied after each of the remaining convolutional layers.
In a preferred embodiment, the discriminator network D_y is trained to distinguish real HR images from generated pseudo super-resolved images; the original discriminator network contains 10 convolutional layers, with 3×3 and 4×4 feature-map kernels numbering up to 512, each followed by a batch-normalization module (BN) and a Leaky ReLU activation function.
Compared with the prior art, the invention has the following beneficial effects: a deep super-resolution residual convolutional generative adversarial network performs residual learning on image blocks in the high-resolution domain from the generated low resolution by minimizing an energy loss function, and performs adversarial training with pixel-level supervision, realizing an image degradation process matched with reality. The method can be used to improve image quality in real environments and strictly follows the image observation (physical) model, i.e., it is a super-resolution analysis learning method matched to naturally degraded images, designed to overcome real-world super-resolution challenges. Using image regularization and large-scale optimization techniques for solving the inverse problem, the method is evaluated on multiple datasets of synthetically and naturally degraded images; the results show that the proposed method outperforms other methods under real-world natural degradation conditions.
Drawings
FIG. 1 is a schematic structural view of the present invention;
FIG. 2 is a super-resolution analysis network G in the present inventionSRA schematic structural diagram;
FIG. 3 is a diagram of an authenticator network D of the present inventionyThe structure is schematic.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-3, the present invention provides a technical solution: a single-image super-resolution analysis method matched with natural degradation conditions, comprising a domain learning network and a super-resolution analysis learning network. The domain learning network comprises a generator network G_d and a discriminator network D_x; the super-resolution analysis learning network comprises a super-resolution analysis network G_SR and a discriminator network D_y. Since no paired (low-resolution/high-resolution) data exist, the generator network G_d, trained by deep learning, is first used as a sampling network to generate low-resolution images that match, i.e. share the characteristics of, the naturally degraded low-resolution source-domain images x. Within a generative adversarial network framework, the super-resolution analysis network G_SR is then trained with pixel-wise supervision on the generated low-resolution images and the corresponding high-resolution images in the clean high-resolution target domain y, so as to learn the mapping from the bicubic downsampled image z of the high-resolution image y to the source-domain image x while preserving the image content; residual learning is performed on image blocks in the high-resolution domain from the generated low resolution, and adversarial training is performed with pixel-level supervision, realizing an image degradation process matched with reality.
A nonlinear mapping between low-resolution input and high-resolution output is learned from a large number of paired low-resolution/high-resolution (LR/HR) training data; the high-resolution images are real captured images, and the paired low-resolution images are usually obtained from the high-resolution images by a degradation operation, whose model is as follows:

$$\hat{y} = \mathbf{H}\,y + \eta \tag{1}$$

where ŷ is the low-resolution image obtained by the degradation, H is a downsampling operator that resizes the high-resolution image y by a scaling factor s, and η is additive white Gaussian noise with standard deviation σ; N × N is the total number of pixels in the image.
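The degradation model above can be sketched in a few lines. This is an illustrative assumption, not the patent's exact operator: a simple s×s block average stands in for the downsampling operator H (bicubic is typical in practice), followed by additive white Gaussian noise of standard deviation σ:

```python
import numpy as np

def degrade(hr: np.ndarray, s: int = 4, sigma: float = 8.0, seed: int = 0) -> np.ndarray:
    """Apply y_hat = H*y + eta: downsample the HR image by factor s
    (an s x s block average stands in for the operator H), then add
    white Gaussian noise with standard deviation sigma."""
    h, w = hr.shape
    lr = hr[: h - h % s, : w - w % s].reshape(h // s, s, w // s, s).mean(axis=(1, 3))
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sigma, size=lr.shape)  # additive white Gaussian noise
    return lr + eta

hr = np.full((64, 64), 128.0)  # synthetic flat HR image
lr = degrade(hr, s=4, sigma=8.0)
print(lr.shape)  # (16, 16)
```

The point of the patent's criticism is that a fixed analytic H such as this (or bicubic) does not capture real sensor noise and compression artifacts, which is why G_d learns the degradation instead.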
Further, the degradation model above describes training; conversely, in application, Y is an observed degraded low-resolution image, and the high-resolution image X is restored from the observed low-resolution image Y by a variational method combining the observation and prior information, with the corresponding loss function:

$$\hat{X} = \arg\min_{X} \frac{1}{2}\lVert Y - \mathbf{H}X \rVert^2 + \lambda R_W(X) \tag{2}$$

where the first term is the data fidelity term that measures the closeness of the solution to the observation, R_W(X) is a regularization term associated with the image prior information, and λ is a trade-off parameter that controls the balance between data fidelity and regularization.
Further, the regularizer used by the regularization term has the general form:

$$R_W(X) = \sum_{k=1}^{K} \rho_k(L_k X) \tag{3}$$

where L_k corresponds to a first- or higher-order differential linear operator, and ρ(·) represents a potential function, such as an L_p vector or matrix norm, applied to the filtered output; the parameters W of the regularizer R_W(X) are expressed by a deep convolutional neural network, so that a strong image prior can be obtained.
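A concrete instance of this general form, as a sketch with illustrative (not the patent's learned) choices: take L_1, L_2 to be horizontal and vertical first-order differences and ρ to be the L1 norm, which yields anisotropic total variation:

```python
import numpy as np

def regularizer(X: np.ndarray) -> float:
    """R_W(X) = sum_k rho_k(L_k X) with two illustrative choices:
    L_1, L_2 = horizontal/vertical first differences (the simplest
    first-order differential linear operators) and rho = L1 norm.
    In the patent, the L_k are learned convolution filters and the
    rho_k are learned potential functions."""
    dh = np.abs(np.diff(X, axis=1)).sum()  # rho(L_1 X): horizontal differences
    dv = np.abs(np.diff(X, axis=0)).sum()  # rho(L_2 X): vertical differences
    return float(dh + dv)

flat = np.ones((8, 8))
print(regularizer(flat))  # 0.0 — a constant image has zero prior cost
```

Smooth images score low under such a prior while noisy ones score high, which is exactly the behaviour a learned R_W(X) generalizes.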
Further, in addition to properly formulating the loss and regularization functions, the variational method also requires a minimization strategy for obtaining the desired solution; the derived solution can be described as a penalized maximum-likelihood (PML) or maximum a posteriori (MAP) estimate. A suitable optimization strategy is adopted to find W and minimize the energy-based loss function, comprising the following steps:
1) The high-resolution image X is restored by the minimization:

$$\hat{X} = \arg\min_{X} E(X; Y) \tag{4}$$

which, by substituting equations (2) and (3), is written as:

$$\hat{X} = \arg\min_{X} \frac{1}{2}\lVert Y - \mathbf{H}X \rVert^2 + \sum_{k=1}^{K} \rho_k(L_k X) \tag{5}$$

2) The image intensities must be constrained, since the values in a natural image are non-negative, so (5) is rewritten in constrained-optimization form:

$$\hat{X} = \arg\min_{X \in C} \frac{1}{2}\lVert Y - \mathbf{H}X \rVert^2 + \sum_{k=1}^{K} \rho_k(L_k X) \tag{6}$$

Equation (6) is solved by a proximal gradient method to handle the optimization of a function that is not everywhere differentiable: the objective is split into a smooth part and a non-smooth part, and (6) is rewritten as:

$$\hat{X} = \arg\min_{X} F(X) + \iota_C(X), \qquad F(X) = \frac{1}{2}\lVert Y - \mathbf{H}X \rVert^2 + \sum_{k=1}^{K} \rho_k(L_k X) \tag{7}$$

where ι_C is the indicator function of the convex set C = {X : x_i ≥ 0 for every pixel i}. The trainable projection layer computes a proximal map for the indicator function:

$$\mathrm{prox}_{\iota_C}(Z) = \arg\min_{X \in C} \frac{1}{2}\lVert X - Z \rVert^2 \tag{8}$$

whose parameterized threshold depends on a trainable parameter α, the noise level σ, and the total number of pixels C·H·W in the image;
3) The problem in (7) is solved iteratively with the update rule:

$$X^{(t+1)} = \mathrm{prox}_{\gamma_t \iota_C}\!\left(X^{(t)} - \gamma_t \nabla F\!\left(X^{(t)}\right)\right) \tag{9}$$

where γ_t is the step size and prox_{ι_C} is the proximal operator associated with the indicator function ι_C, defined as:

$$\mathrm{prox}_{\iota_C}(Z) = \arg\min_{X} \iota_C(X) + \frac{1}{2}\lVert X - Z \rVert^2$$

The gradient of F(X) is computed as:

$$\nabla F(X) = \mathbf{H}^{T}\!\left(\mathbf{H}X - Y\right) + \sum_{k=1}^{K} L_k^{T}\,\phi_k(L_k X) \tag{10}$$

where φ_k(·) is the gradient of the potential function ρ_k(·);
4) Combining equations (8), (9) and (10), the final form is:

$$X^{(t+1)} = P_C\!\left(X^{(t)} - \gamma_t\!\left(\mathbf{H}^{T}(\mathbf{H}X^{(t)} - Y) + \sum_{k=1}^{K} L_k^{T}\,\phi_k(L_k X^{(t)})\right)\right) \tag{11}$$

Formula (11) performs one proximal gradient descent inference step from the starting points Y and X^(0) = 0, which gives:

$$\hat{X} = P_C\!\left(\mathbf{H}^{T}Y - \sum_{k=1}^{K} L_k^{T}\,\phi_k(L_k \mathbf{H}^{T}Y)\right) \tag{12}$$

where α = λγ corresponds to the trainable parameters of the projection layer, L_k^T is the adjoint filter of L_k, and H^T denotes the up-scaling operation.
In step 4), formula (12) is also used to design the generator network G_d, where φ_k(·) corresponds to a parameterized rectified linear unit (PReLU) applied to the convolutional feature maps.
Further, most of the parameters in equation (12) come from the prior term of equation (2), so the proposed generator network expresses most of its parameters as image prior information; the weights are set with zero-mean and fixed-scale constraints in order to learn effective weights for the regularization parameters. To learn the image degradation at the domain-distribution level from the source domain x, the generator network G_d is trained with a loss function; under the generative adversarial network framework, the loss for training G_d is:

$$\mathcal{L}_{G_d} = \mathcal{L}_{color} + \mathcal{L}_{tex} + \mathcal{L}_{per} \tag{13}$$

where L_color represents the color loss, L_tex the texture loss, and L_per the perceptual loss.
Further, the generator network G_d consists of 8 residual modules (each two convolutional layers with a PReLU activation between them) sandwiched between two convolutional layers (Conv). All convolutional layers (Conv) have 64 feature-map kernels of size 3×3, and a sigmoid nonlinearity is applied to the output of the G_d network.
The discriminator network D_x is composed of a three-layer convolutional network operating at the image-block level, with 64-256 feature-map kernels of size 5×5; the last convolutional layer maps 256 feature maps to 1, and a batch-normalization module (BN) and a Leaky ReLU activation function (LReLU) are applied after each of the remaining convolutional layers.
The G_d network is trained with 512×512 image blocks downsampled by a factor of 2, and the source-domain images (x) are randomly cropped to 128×128 size; specifically, the network is trained with an Adam optimizer using the parameters β1 = 0.5, β2 = 0.999 and ε = 10⁻⁸.
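The Adam update with these hyper-parameters (β1 = 0.5, β2 = 0.999, ε = 10⁻⁸) can be sketched as a standalone rule; in practice a framework optimizer would be used, and the learning rate here is illustrative (the patent states one only for the G_SR stage):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-4, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update with the hyper-parameters used to train G_d."""
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
w, m, v = adam_step(w, g=np.array([2.0]), m=m, v=v, t=1)
print(w)  # first step moves w by ~lr regardless of gradient magnitude
```

The relatively low β1 = 0.5 (versus the common 0.9) shortens the momentum memory, a choice often made when training GAN generators whose gradients change sign frequently.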
The super-resolution analysis network G_SR has the following loss function:

$$\mathcal{L}_{G_{SR}} = \mathcal{L}_{per} + \mathcal{L}_{GAN} + \mathcal{L}_{tv} + \mathcal{L}_{1} \tag{14}$$

where the perceptual loss L_per focuses on the perceptual quality of the output image and is defined as:

$$\mathcal{L}_{per} = \frac{1}{N}\sum_{i=1}^{N} \left\lVert \phi(\hat{y}_i) - \phi(y_i) \right\rVert_1 \tag{15}$$

where φ denotes features extracted from a pre-trained network;
the texture loss L_GAN focuses on the high-frequency part of the output image and is defined as:

$$\mathcal{L}_{GAN} = -\mathbb{E}_{y}\!\left[\log\!\left(1 - D_{Ra}(y, \hat{y})\right)\right] - \mathbb{E}_{\hat{y}}\!\left[\log D_{Ra}(\hat{y}, y)\right] \tag{16}$$

where E_y and E_ŷ denote averaging over all real patches y and generated patches ŷ, respectively; the relativistic adversarial network score between real high-resolution and reconstructed high-resolution image blocks is then defined as:

$$D_{Ra}(y, \hat{y}) = \sigma\!\left(C(y) - \mathbb{E}_{\hat{y}}\!\left[C(\hat{y})\right]\right) \tag{17}$$

where C is the raw discriminator output and σ is the sigmoid function;
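The relativistic average score defined above (sigmoid of a real patch's raw discriminator output minus the mean raw output on generated patches) can be sketched directly; the array values below are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_ra(c_real, c_fake):
    """Relativistic average score: sigmoid(C(y) - E[C(y_hat)]),
    i.e. how much more realistic a real HR patch looks than the
    generated SR patches on average. Inputs are raw (pre-sigmoid)
    discriminator outputs."""
    return sigmoid(c_real - c_fake.mean())

c_real = np.array([3.0])           # raw logit for a real HR patch
c_fake = np.array([-1.0, 1.0])     # raw logits for generated SR patches
print(d_ra(c_real, c_fake))  # sigmoid(3 - 0) ≈ 0.9526
```

Training against this relative score pushes the generator to make its outputs look more realistic than real data on average, rather than merely "realistic" in absolute terms.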
the content loss L_1 is defined as:

$$\mathcal{L}_{1} = \frac{1}{N}\sum_{i=1}^{N} \left\lVert \hat{y}_i - y_i \right\rVert_1 \tag{18}$$

where N is the size of the small image block;
the total-variation loss L_tv focuses on minimizing gradient differences and producing sharpness in the output image; it is defined as:

$$\mathcal{L}_{tv} = \frac{1}{N}\sum_{i=1}^{N}\left( \left\lVert \nabla_h \hat{y}_i - \nabla_h y_i \right\rVert_1 + \left\lVert \nabla_v \hat{y}_i - \nabla_v y_i \right\rVert_1 \right) \tag{19}$$

where ∇_h and ∇_v denote the horizontal and vertical gradients of the image.
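The total-variation loss described above (L1 distance between the gradients of the super-resolved image and those of the ground-truth HR image) can be sketched as:

```python
import numpy as np

def tv_loss(sr: np.ndarray, hr: np.ndarray) -> float:
    """L_tv: mean L1 distance between horizontal and vertical gradients
    of the super-resolved image and the ground-truth HR image,
    penalizing gradient mismatch to keep edges sharp."""
    gh = np.abs(np.diff(sr, axis=1) - np.diff(hr, axis=1)).mean()
    gv = np.abs(np.diff(sr, axis=0) - np.diff(hr, axis=0)).mean()
    return float(gh + gv)

hr = np.tile(np.arange(8.0), (8, 1))   # horizontal ramp image
print(tv_loss(hr, hr))  # 0.0 — identical gradients incur no loss
```

Unlike a plain smoothness prior, matching gradients against the ground truth penalizes both over-smoothing and hallucinated high-frequency texture.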
Further, in the architecture of the super-resolution analysis network G_SR, the encoder layers (L_k filters) and decoder layers (i.e. L_k^T filters) have 64 feature-map kernels of size 5×5 operating on a C×H×W tensor, where C is the number of channels of the input image. Inside the encoder, the LR image Y is upsampled by a bilinear kernel in an upsampling layer (equivalent to the H^T Y operation), with a random upsampling kernel selected. The residual network (Resnet) consists of 5 residual modules and 2 pre-activation convolutional layers, each with 64 feature-map kernels of size 3×3; the pre-activation (φ_k(·)) is a parameterized rectified linear unit (PReLU) with 64 feature channels. Inside the decoder, a trainable projection layer (Proj, i.e. the proximal operator P_C) computes the proximal map using the estimated noise standard deviation σ and handles the data fidelity and prior terms; during training it is fine-tuned through the projection layer (Proj) parameters. Noise estimation is performed in the residual network (Resnet) sandwiched between the encoder and decoder, and the estimated residual image after the decoder is subtracted from the upsampled LR input image. Finally, a clipping layer incorporates our prior knowledge of the valid range of image intensities and forces the pixel values of the reconstructed image into the [0, 255] interval; reflection padding is used before all convolutional layers to ensure slow variation at the boundaries of the input image.
The discriminator network D_y is trained to distinguish real HR images from generated pseudo super-resolved images; the original discriminator network contains 10 convolutional layers, with 3×3 and 4×4 feature-map kernels numbering up to 512, each followed by a batch-normalization module (BN) and a Leaky ReLU activation function.
Further, during training the input LR image-block size is set to 32×32, and the network is trained for 51,000 iterations with an Adam optimizer using the parameters β1 = 0.9, β2 = 0.999 and ε = 10⁻⁸, with 16 samples selected per training step. The weights of the generator and discriminator are not decayed while minimizing the loss function in equation (14). The learning rate is initially set to 10⁻⁴ and is then multiplied by 0.5 after 5K, 10K, 20K and 30K iterations. The projection-layer parameter σ is estimated from the input LR image, and the projection-layer parameters α are initialized on a logarithmic scale from α_max = 2 to α_min = 1 and then further fine-tuned by back-propagation during training.
Further, the method of the invention is applied in a practical test, as follows:
The trained model is evaluated under the peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and learned perceptual image patch similarity (LPIPS) metrics, using source-domain data (X: 2650 corrupted HR images) corrupted by unknown degradations (possibly sensor noise, compression artifacts, etc.) and target-domain data (Y: 800 clean HR images), with training data augmented by random vertical and horizontal flips and 90° rotations. PSNR and SSIM are distortion-based metrics that correlate poorly with actual perceived similarity, while LPIPS correlates better with human perception than distortion-based metrics. The method of the invention is compared with other state-of-the-art super-resolution analysis methods, including the enhanced deep super-resolution network (EDSR), the enhanced super-resolution generative adversarial network (ESRGAN) and frequency-separated super-resolution analysis (SR-FS); the results are shown in the table below.
comparison results
Figure BDA0002698956900000131
The results give a quantitative comparison of the method of the invention on the selected datasets, tested under synthetic degradation conditions (sensor noise of level 8 and JPEG compression of quality 30) and natural degradation conditions, respectively. Compared with the other methods, the proposed method is superior in both the PSNR and SSIM metrics. On the LPIPS metric, it is not as good as SR-FS under synthetic degradation, but it is superior to the other methods under real-world natural degradation conditions.
In summary, the invention uses a deep super-resolution residual convolutional generative adversarial network: residual learning is performed on image blocks in the high-resolution domain from the generated low resolution by minimizing an energy loss function, and adversarial training is performed with pixel-level supervision, realizing an image degradation process matched with reality. The method can be used to improve image quality in such real environments and strictly follows the image observation (physical) model, i.e., it is a super-resolution analysis learning method matched to naturally degraded images, designed to overcome real-world super-resolution challenges. Using image regularization and large-scale optimization techniques for solving the inverse problem, the method is evaluated on multiple datasets of synthetically and naturally degraded images; the results show that the proposed method outperforms other methods under real-world natural degradation conditions.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A single-image super-resolution analysis method matched with natural degradation conditions, comprising a domain learning network and a super-resolution analysis learning network, characterized in that: the domain learning network comprises a generator network G_d and a discriminator network D_x, and the super-resolution analysis learning network comprises a super-resolution analysis network G_SR and a discriminator network D_y; the generator network G_d, trained by deep learning, serves as a sampling network that generates low-resolution images matching the naturally degraded low-resolution source-domain image x, i.e. having the same characteristics; the super-resolution analysis network G_SR is trained within a generative adversarial network framework with pixel-by-pixel supervision, using the generated low-resolution images \hat{x} and the corresponding high-resolution images in the clean high-resolution target domain y, so that the mapping from the bicubic downsampled image z of the high-resolution image y to the source-domain image x is learned while the image content is retained; residual learning is performed on the generated low-resolution image blocks in the high-resolution domain by minimizing an energy loss function, and adversarial training is performed with pixel-level supervision, thereby realizing an image degradation process matched with reality.
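As an illustrative (non-claimed) sketch, the data flow of the domain learning stage in claim 1 can be outlined as follows; the callables `gd`, `dx` and `bicubic_down` are toy stand-ins (strided subsampling, identity, and a mean-intensity score), not the patented networks, and all shapes and values are assumptions:

```python
import numpy as np

def domain_learning_step(y_hr, x_src, gd, dx, bicubic_down, s=4):
    """One data-flow step of the claim-1 domain learning stage:
    downsample the clean HR image y to z, let the generator G_d map z
    to a synthetic LR image x_hat, then score both the naturally
    degraded source image x and x_hat with the discriminator D_x."""
    z = bicubic_down(y_hr, s)      # z: downsampled version of y
    x_hat = gd(z)                  # generated LR image, should match the domain of x
    score_real = dx(x_src)         # D_x score for the natural LR image
    score_fake = dx(x_hat)         # D_x score for the generated LR image
    return x_hat, score_real, score_fake

# Toy stand-ins: strided subsampling instead of a bicubic kernel,
# identity G_d, and mean intensity as a one-number "discriminator".
down = lambda im, s: im[::s, ::s]
gd = lambda z: z
dx = lambda im: float(im.mean())

y = np.ones((16, 16)) * 200.0      # clean HR target-domain image
x = np.ones((4, 4)) * 190.0        # naturally degraded LR source-domain image
x_hat, sr, sf = domain_learning_step(y, x, gd, dx, down, s=4)
```

In the adversarial setup of claim 1, G_d would be updated until D_x can no longer separate `x_hat` from `x`.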
2. The single-image super-resolution analysis method matched with natural degradation conditions according to claim 1, wherein the model of the degradation process is as follows:

    Y = HX + \eta    (1)

where Y is the low-resolution image obtained by the degradation, H is a downsampling operator that resizes the high-resolution image X by a scaling factor s, and \eta is additive white Gaussian noise with standard deviation \sigma; N \times N is the total number of pixels in the image.
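The degradation model of claim 2, read here as Y = HX + η with H a scale-s downsampling operator and η additive white Gaussian noise of standard deviation σ, can be simulated with a minimal numpy sketch; the block-average choice for H and all parameter values are illustrative assumptions:

```python
import numpy as np

def degrade(x, s=4, sigma=8.0, rng=None):
    """Sketch of the claim-2 observation model Y = H X + eta: H is taken
    as an s x s block-average downsampler (the claim only requires a
    downsampling operator with scale factor s), and eta is additive
    white Gaussian noise with standard deviation sigma."""
    rng = np.random.default_rng(0) if rng is None else rng
    h, w = x.shape
    # H: crop to a multiple of s, then average each s x s block
    hx = x[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).mean(axis=(1, 3))
    eta = rng.normal(0.0, sigma, size=hx.shape)   # AWGN with std sigma
    return hx + eta

hr = np.full((64, 64), 128.0)                     # synthetic HR image
lr = degrade(hr, s=4, sigma=8.0)
```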
3. The single-image super-resolution analysis method matched with natural degradation conditions according to claim 2, wherein the model of the degradation process describes the training process; in the application process, by contrast, Y is the observed degraded low-resolution image and the high-resolution image X is recovered from the observed low-resolution image Y using a variational method that combines the observation with prior information, with the corresponding loss function:

    E(X) = \frac{1}{2\sigma^2}\|Y - HX\|_2^2 + \lambda R_W(X)    (2)

where \frac{1}{2\sigma^2}\|Y - HX\|_2^2 is the data fidelity term that measures the closeness of the solution to the observation, R_W(X) is the regularization term associated with the image prior information, and \lambda is a trade-off parameter that controls the balance between data fidelity and the regularization condition.
4. The single-image super-resolution analysis method matched with natural degradation conditions according to claim 3, wherein the regularizer used by the regularization term generally has the form:

    R(X) = \sum_{k=1}^{K} \rho_k(L_k X)    (3)

where L_k corresponds to a first- or higher-order differential linear operator and \rho_k(\cdot) denotes a potential function, such as an \ell_p vector or matrix norm, applied to the filtered output; the regularizer R_W(X) is parameterized by W in a deep convolutional neural network, so that a strong image prior can be obtained.
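The variational objective of claims 3 and 4 can be sketched numerically; here the learned regularizer R_W is replaced, purely for illustration, by first-order difference operators L_k with ρ(·) = |·| (anisotropic total variation), and H is again an assumed block-average downsampler:

```python
import numpy as np

def sr_objective(x, y, s=4, sigma=1.0, lam=0.01):
    """Sketch of the claim-3/claim-4 objective
    ||Y - HX||^2 / (2 sigma^2) + lambda * R(X), with R(X) instantiated
    as anisotropic total variation (horizontal and vertical first-order
    differences); the patented regularizer instead learns W in a CNN."""
    h, w = x.shape
    hx = x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))   # H: downsampling
    fidelity = ((y - hx) ** 2).sum() / (2.0 * sigma ** 2)    # data fidelity term
    # L_k: first-order differences, rho(.) = absolute value
    reg = np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum()
    return fidelity + lam * reg

x = np.full((8, 8), 50.0)                      # constant image: zero regularizer
y = x.reshape(4, 2, 4, 2).mean(axis=(1, 3))    # exact observation: zero fidelity
```

A constant image that exactly reproduces the observation attains objective value 0, the global minimum of this toy instance.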
5. The single-image super-resolution analysis method matched with natural degradation conditions according to claim 1, wherein the energy-loss minimization finds W by a suitable optimization strategy and minimizes an energy-based loss function, comprising the following steps:

1) the high-resolution image X is restored by the minimization:

    \hat{X} = \arg\min_X E(X)    (4)

which, by reference to equations (2) and (3), is written as:

    \hat{X} = \arg\min_X \frac{1}{2\sigma^2}\|Y - HX\|_2^2 + \lambda R_W(X)    (5)

2) the image intensities must be limited, since the values of a natural image are non-negative, so (5) is rewritten in constrained-optimization form:

    \hat{X} = \arg\min_{X \in C} \frac{1}{2\sigma^2}\|Y - HX\|_2^2 + \lambda R_W(X)    (6)

equation (6) is solved by a proximal gradient method in order to handle the optimization of a function that is not fully differentiable; the function is split into a smooth part F(X) and a non-smooth part, and equation (6) is rewritten as:

    \hat{X} = \arg\min_X F(X) + \iota_C(X)    (7)

where \iota_C is the indicator function of the convex set C; the trainable projection layer computes the proximal map for the indicator function:

    P_C(Z) = \varepsilon \, Z / \max(\|Z\|_2, \varepsilon)    (8)

where \varepsilon = e^{\alpha}\sigma\sqrt{CHW - 1} is a parameterized threshold, \alpha is a trainable parameter, \sigma is the noise level, and C \times H \times W is the total number of pixels in the image;

3) the problem in (7) is solved iteratively with the update rule:

    X^{(t)} = \mathrm{prox}_{\gamma_t \iota_C}\big(X^{(t-1)} - \gamma_t \nabla F(X^{(t-1)})\big)    (9)

where \gamma_t is the step size and \mathrm{prox}_{\gamma_t \iota_C} is the proximal operator associated with the indicator function \iota_C, defined as:

    \mathrm{prox}_{\gamma_t \iota_C}(Z) = \arg\min_X \tfrac{1}{2}\|X - Z\|_2^2 + \iota_C(X)

the gradient of F(X) is calculated as follows:

    \nabla F(X) = \frac{1}{\sigma^2} H^T (HX - Y) + \lambda \sum_k L_k^T \phi_k(L_k X)    (10)

where \phi_k(\cdot) is the gradient of the potential function \rho_k(\cdot);

4) by combining equations (8), (9) and (10), the final form is:

    X^{(t)} = P_C\Big(X^{(t-1)} - \gamma_t \big[\tfrac{1}{\sigma^2} H^T(HX^{(t-1)} - Y) + \lambda \sum_k L_k^T \phi_k(L_k X^{(t-1)})\big]\Big)    (11)

equation (11) performs one proximal gradient descent inference step at the starting points Y and X^{(0)} = 0, which gives:

    \hat{X} = P_C\Big(H^T Y - \alpha \sum_k L_k^T \phi_k(L_k H^T Y)\Big)    (12)

where \alpha = \lambda\gamma corresponds to the trainable parameter of the projection layer, L_k^T is the transposed (adjoint) filter of L_k, and H^T denotes the upscaling operation.
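The proximal-gradient iteration of claim 5 can be illustrated on the data term alone; in this sketch the learned regularizer term (the φ_k) is omitted for brevity, the proximal map of ι_C is taken as projection onto the assumed box C = [0, 255], X^(0) = 0 as in step 4), and H / H^T are assumed block-average downsampling and nearest-neighbor spreading:

```python
import numpy as np

def prox_grad_sr(y, s=2, sigma=1.0, gamma=0.5, steps=100):
    """Toy proximal-gradient solver for the claim-5 scheme: a gradient
    step on the smooth term ||Y - HX||^2 / (2 sigma^2) (regularizer
    omitted), followed by the proximal map of iota_C, here the
    projection onto the box C = [0, 255]."""
    h, w = y.shape
    x = np.zeros((h * s, w * s))                              # X^(0) = 0
    for _ in range(steps):
        hx = x.reshape(h, s, w, s).mean(axis=(1, 3))          # H X
        residual = hx - y
        # H^T: spread the LR residual back to HR resolution
        grad = np.repeat(np.repeat(residual, s, axis=0), s, axis=1)
        grad /= sigma ** 2 * s * s
        x = np.clip(x - gamma * grad, 0.0, 255.0)             # prox of iota_C
    return x

y = np.full((4, 4), 100.0)
x_hat = prox_grad_sr(y, s=2, steps=100)
```

For this toy instance the iterate converges geometrically toward an HR image whose downsampling matches the observation.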
6. The single-image super-resolution analysis method matched with natural degradation conditions according to claim 5, wherein in step 4) equation (12) is also used to design the generator network G_d, in which \phi_k(\cdot) corresponds to a parameterized rectified linear unit (PReLU) applied to the convolutional feature maps.
7. The single-image super-resolution analysis method matched with natural degradation conditions according to claim 6, wherein the generator network G_d is trained with a loss function so as to learn the image degradation of the domain distribution layer from the source domain x; under the adversarial network framework, the loss function for training the generator network G_d is as follows:

    L_{G_d} = L_{color} + L_{tex} + L_{per}    (13)

where L_{color} denotes the color loss, L_{tex} the texture loss, and L_{per} the perceptual loss.
8. The single-image super-resolution analysis method matched with natural degradation conditions according to claim 1, wherein the super-resolution analysis network G_SR has the following loss function:

    L_{G_{SR}} = L_{per} + L_{GAN} + L_{tv} + L_1    (14)

where the perceptual loss L_{per} focuses on the perceptual quality of the output image and is defined as:

    L_{per} = \frac{1}{N}\sum_{i=1}^{N} \|\phi(\hat{y}_i) - \phi(y_i)\|_1    (15)

where \phi denotes features extracted from a pre-trained network;

the texture loss L_{GAN} focuses on the high-frequency part of the output image and is defined as:

    L_{GAN} = -\mathbb{E}_y\big[\log\big(1 - D_{Ra}(y, \hat{y})\big)\big] - \mathbb{E}_{\hat{y}}\big[\log\big(D_{Ra}(\hat{y}, y)\big)\big]    (16)

where \mathbb{E}_y and \mathbb{E}_{\hat{y}} denote the data-averaging operation over all real y and generated \hat{y} small image blocks, respectively; the relativistic adversarial score of a real high-resolution image block against a reconstructed one is then defined as:

    D_{Ra}(y, \hat{y}) = \mathrm{sigmoid}\big(C(y) - \mathbb{E}_{\hat{y}}[C(\hat{y})]\big)    (17)

where C is the raw discriminator output and \mathrm{sigmoid}(\cdot) is the sigmoid function;

the content loss L_1 is defined as:

    L_1 = \frac{1}{N}\sum_{i=1}^{N} \|\hat{y}_i - y_i\|_1    (18)

where N is the size of the small image block;

the total variation loss L_{tv} focuses on minimizing gradient differences and producing sharpness in the output image, and is defined as:

    L_{tv} = \frac{1}{N}\sum_{i=1}^{N} \big(\|\nabla_h \hat{y}_i\|_1 + \|\nabla_v \hat{y}_i\|_1\big)    (19)

where \nabla_h and \nabla_v denote the horizontal and vertical gradients of the image.
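Two of the claim-8 loss terms lend themselves to compact sketches: the total variation term over image gradients, and the relativistic average discriminator score (read here as the RaGAN formulation, an assumption; the mean-based reductions are likewise illustrative):

```python
import numpy as np

def tv_loss(img):
    """Total variation term L_tv of claim 8: mean absolute horizontal
    and vertical image gradients, penalizing gradient differences."""
    return np.abs(np.diff(img, axis=1)).mean() + np.abs(np.diff(img, axis=0)).mean()

def relativistic_scores(c_real, c_fake):
    """Relativistic average score used by the texture loss (assumed
    RaGAN form): sigmoid of a raw discriminator output C minus the
    mean raw output of the opposite class."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    d_real = sigmoid(c_real - c_fake.mean())   # real vs. average fake
    d_fake = sigmoid(c_fake - c_real.mean())   # fake vs. average real
    return d_real, d_fake

flat = np.zeros((4, 4))                        # constant image: zero TV
ramp = np.tile(np.arange(4.0), (4, 1))         # unit horizontal gradient
d_real, d_fake = relativistic_scores(np.array([2.0, 3.0]), np.array([-2.0, -3.0]))
```

When the raw scores separate real from generated blocks, `d_real` approaches 1 and `d_fake` approaches 0, which is exactly what the generator's texture loss pushes against.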
9. The single-image super-resolution analysis method matched with natural degradation conditions according to claim 1, wherein said discriminator network D_x operates at the image-block level; its convolutional layers (Conv) carry 64 to 256 feature maps with 5×5 kernels, the last convolutional layer maps the 256 feature maps to 1, and a batch normalization module (BN) and a leaky ReLU activation function (LReLU) are applied after each of the remaining convolutional layers.
10. The single-image super-resolution analysis method matched with natural degradation conditions according to claim 1, wherein said discriminator network D_y is trained to distinguish true HR images from the generated pseudo super-resolution analysis images; the original discriminator network comprises 10 convolutional layers with 3×3 and 4×4 kernels whose feature maps increase to 512, each followed by a batch normalization module (BN) and a leaky ReLU activation function.
CN202011015611.5A 2020-09-24 2020-09-24 Single-image super-resolution analysis method matched with natural degradation conditions Pending CN112163998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011015611.5A CN112163998A (en) 2020-09-24 2020-09-24 Single-image super-resolution analysis method matched with natural degradation conditions


Publications (1)

Publication Number Publication Date
CN112163998A true CN112163998A (en) 2021-01-01

Family

ID=73863824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011015611.5A Pending CN112163998A (en) 2020-09-24 2020-09-24 Single-image super-resolution analysis method matched with natural degradation conditions

Country Status (1)

Country Link
CN (1) CN112163998A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528595A (en) * 2016-02-01 2016-04-27 成都通甲优博科技有限责任公司 Method for identifying and positioning power transmission line insulators in unmanned aerial vehicle aerial images
US20180075581A1 (en) * 2016-09-15 2018-03-15 Twitter, Inc. Super resolution using a generative adversarial network
WO2018120329A1 (en) * 2016-12-28 2018-07-05 深圳市华星光电技术有限公司 Single-frame super-resolution reconstruction method and device based on sparse domain reconstruction
CN109978762A (en) * 2019-02-27 2019-07-05 南京信息工程大学 A kind of super resolution ratio reconstruction method generating confrontation network based on condition
CN110111251A (en) * 2019-04-22 2019-08-09 电子科技大学 A kind of combination depth supervision encodes certainly and perceives the image super-resolution rebuilding method of iterative backprojection
CN110136063A (en) * 2019-05-13 2019-08-16 南京信息工程大学 A kind of single image super resolution ratio reconstruction method generating confrontation network based on condition
CN110599401A (en) * 2019-08-19 2019-12-20 中国科学院电子学研究所 Remote sensing image super-resolution reconstruction method, processing device and readable storage medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
UMER, RAO MUHAMMAD: "Deep Generative Adversarial Residual Convolutional Networks for Real-World Super-Resolution", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2020), pages 1769 - 1777 *
丁明航; 邓然然; 邵恒: "Image Super-Resolution Reconstruction Method Based on Attention Generative Adversarial Networks", Computer Systems & Applications, no. 02
史振威; 雷森: "A Survey of Image Super-Resolution Reconstruction Algorithms", Journal of Data Acquisition and Processing, no. 01

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113284046A (en) * 2021-05-26 2021-08-20 中国电子科技集团公司第五十四研究所 Remote sensing image enhancement and restoration method and network based on no high-resolution reference image
CN113284046B (en) * 2021-05-26 2023-04-07 中国电子科技集团公司第五十四研究所 Remote sensing image enhancement and restoration method and network based on no high-resolution reference image
CN113435334A (en) * 2021-06-28 2021-09-24 中国科学院上海微系统与信息技术研究所 Small target face recognition method based on deep learning
CN113435334B (en) * 2021-06-28 2024-02-27 中国科学院上海微系统与信息技术研究所 Small target face recognition method based on deep learning
CN113763247A (en) * 2021-09-09 2021-12-07 中国矿业大学 Three-dimensional mine image super-resolution reconstruction method and system based on manifold discrimination
CN113763247B (en) * 2021-09-09 2023-12-05 中国矿业大学 Three-dimensional mine image super-resolution reconstruction method based on manifold discrimination

Similar Documents

Publication Publication Date Title
Molini et al. Deepsum: Deep neural network for super-resolution of unregistered multitemporal images
Zhang et al. Image restoration: From sparse and low-rank priors to deep priors [lecture notes]
CN110119780B (en) Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network
McCann et al. Convolutional neural networks for inverse problems in imaging: A review
Jiang et al. Single image super-resolution via locally regularized anchored neighborhood regression and nonlocal means
CN106952228B (en) Super-resolution reconstruction method of single image based on image non-local self-similarity
CN109741256A (en) Image super-resolution rebuilding method based on rarefaction representation and deep learning
CN111062880A (en) Underwater image real-time enhancement method based on condition generation countermeasure network
CN111080567A (en) Remote sensing image fusion method and system based on multi-scale dynamic convolution neural network
CN112163998A (en) Single-image super-resolution analysis method matched with natural degradation conditions
CN110232653A (en) The quick light-duty intensive residual error network of super-resolution rebuilding
CN113284051B (en) Face super-resolution method based on frequency decomposition multi-attention machine system
CN113222836B (en) Hyperspectral and multispectral remote sensing information fusion method and system
CN112270644A (en) Face super-resolution method based on spatial feature transformation and cross-scale feature integration
Ye et al. CSformer: Bridging convolution and transformer for compressive sensing
CN112837224A (en) Super-resolution image reconstruction method based on convolutional neural network
Guan et al. Srdgan: learning the noise prior for super resolution with dual generative adversarial networks
Rasheed et al. LSR: Lightening super-resolution deep network for low-light image enhancement
CN116029902A (en) Knowledge distillation-based unsupervised real world image super-resolution method
Lin et al. A Deep Neural Network Based on Prior Driven and Structural-Preserving for SAR Image Despeckling
Zhang et al. Enhanced visual perception for underwater images based on multistage generative adversarial network
Zheng et al. Overwater image dehazing via cycle-consistent generative adversarial network
CN113240581A (en) Real world image super-resolution method for unknown fuzzy kernel
CN111899166A (en) Medical hyperspectral microscopic image super-resolution reconstruction method based on deep learning
Wali et al. Recent Progress in Digital Image Restoration Techniques: A Review

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination