CN114049261A - Image super-resolution reconstruction method focusing on foreground information - Google Patents

Image super-resolution reconstruction method focusing on foreground information

Info

Publication number
CN114049261A
Authority
CN
China
Prior art keywords
image
attention
convolution
loss
channel
Prior art date
Legal status
Granted
Application number
CN202210035833.6A
Other languages
Chinese (zh)
Other versions
CN114049261B (en)
Inventor
何凡
彭丽薇
邓靖凛
吴家俊
程艳芬
李辰皓
Current Assignee
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202210035833.6A priority Critical patent/CN114049261B/en
Publication of CN114049261A publication Critical patent/CN114049261A/en
Application granted granted Critical
Publication of CN114049261B publication Critical patent/CN114049261B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image super-resolution reconstruction method that focuses on foreground information. A general-purpose gated attention module, PAM, is proposed to extract foreground information and high-frequency features of the image. The invention further provides PAMNet, in which a plurality of PAM modules are connected in series and skip connections are introduced to make full use of the shallow features of the image; after training the designed network, super-resolution reconstruction is completed. The method concentrates on extracting the foreground information and discriminative features of the image while preserving its color and texture features and improving the utilization of shallow features; it reduces the number of parameters and achieves better objective scores; the invention strikes a good balance between performance and model complexity, and the PAM module is general-purpose and can be embedded into various network structures.

Description

Image super-resolution reconstruction method focusing on foreground information
Technical Field
The invention relates to the technical field of image processing and recognition, and in particular to an image super-resolution reconstruction method focusing on foreground information.
Background
Super-resolution (SR) reconstruction is an image processing technique that restores a high-resolution image from a low-resolution one. SR aims to improve image resolution through signal processing and software methods without changing the physical limits of the imaging device. Besides its academic research value, it has practical applications in many fields, such as medical imaging, video security monitoring, and remote sensing image processing. In addition to improving image quality, SR can also benefit many computer vision tasks, and raising image resolution to obtain high-quality images has become a problem the research community urgently needs to solve.
Single Image Super-Resolution (SISR) is a research hotspot in the field of image super-resolution reconstruction. SISR uses only a single low-resolution image as input to reconstruct a high-resolution (HR) image with rich detail and clear texture, and has high practical value. SISR algorithms fall mainly into three categories: interpolation-based, reconstruction-based, and learning-based methods. Interpolation-based methods such as bicubic interpolation lose a large amount of image detail, and the reconstructed images are severely distorted and blurry. Reconstruction-based methods built on traditional machine learning algorithms, such as neighborhood embedding and sparse representation, produce relatively high-quality reconstructions but scale poorly. With the development of deep learning and generative adversarial networks, learning-based methods have continually produced new network designs that perform well in recovering image detail and texture, generate more realistic HR images, and have made great progress in image super-resolution.
Residual networks use residual learning, with skip connections providing the identity mapping inside each residual unit. The residual-learning-based deep model VDSR improves performance by introducing a residual structure, but suffers from a large number of training parameters and an unclear background in the reconstructed image. EDSR removes the BN layers, and by saving the memory they consume it can stack more layers and improve the quality of the reconstructed image; however, because it is trained with only the L1 loss, the objective metrics of its reconstructed images remain low.
At present, image super-resolution still suffers from under-utilization of shallow image features, an inconspicuous image foreground, and a lack of visual emphasis.
Disclosure of Invention
To address the problems that the super-resolution task in image processing still under-utilizes shallow image features, leaves the image foreground inconspicuous, and lacks visual emphasis, the invention provides an image super-resolution reconstruction method that focuses on foreground information. It reduces the number of training parameters while attending to the image foreground information and detail features, improves the utilization of shallow image features and the model performance, and greatly improves the visual quality of image super-resolution.
In order to achieve the above object, the present invention provides a method for reconstructing super-resolution of an image focusing on foreground information, the method comprising the following steps:
1) acquire an image to be trained and preprocess the image data to obtain a feature map X ∈ R^(C×H×W), where R denotes the set of real numbers, C the number of channels, and H, W the image size;
2) extract features from the feature map X ∈ R^(C×H×W) to obtain a feature output map S_LF ∈ R^(C×H×W);
3) reconstruct the image from the feature output map S_LF ∈ R^(C×H×W);
4) train with a loss function on the reconstructed image to obtain the super-resolution image SR;
the method is characterized in that the specific steps of the step 2) comprise:
2.1) input the feature map X into the feature extraction layer of the PAMNet network; through N serial PAM basic units, the residual shallow features of the image are computed and propagated repeatedly, the shallow features of all the preceding N-2 PAM modules are fed to the end of the (N-1)-th PAM module via skip connections, and concatenation along the channel dimension yields the image residual shallow feature map S ∈ R^(10C×H×W);
2.2) apply a 1×1 convolution to the image residual shallow feature map S ∈ R^(10C×H×W) to reduce its dimension and aggregate the shallow features, obtaining a feature map S_L ∈ R^(C×H×W) of dimensions (C, H, W);
2.3) pass the feature map S_L ∈ R^(C×H×W) through the N-th PAM module to obtain the extracted feature output map S_LF ∈ R^(C×H×W).
Preferably, the specific steps of step 1) include:
1.1) select training sample images to form a training sample image set;
1.2) randomly select n original images from the training sample image set for cropping and mirror flipping, randomly partition each original image into m×m sub-images denoted I_SR(x), x = 1, 2, ..., n, and then randomly select q sub-images as training images;
1.3) input the training images I_SR(x) into the downsampling layer of the PAMNet network and convolve them with two serial L×L convolution kernels to preliminarily extract image color, contour, and texture features and increase the number of feature-map channels, obtaining n images to be trained;
each L×L convolution block consists of three layers: an L×L convolutional layer, followed by an L×L BN layer and an L×L ReLU layer; the 1st convolution yields an m×m feature map D, and the 2nd convolution yields an m×m feature map X;
1.4) the downsampling layer of the PAMNet network finally outputs an m×m image feature map X ∈ R^(C×H×W), where R denotes the set of real numbers, C the number of channels, and H, W the image size, with H = W = m.
Preferably, each L×L convolution block used in step 1.3) consists of three layers: an L×L convolutional layer, an L×L BN layer, and an L×L ReLU layer; the 1st convolution yields an m×m feature map D, and the 2nd convolution yields an m×m feature map X.
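For illustration, a minimal PyTorch sketch of such a two-stage downsampling layer is given below. The choices L = 5 and C = 64 follow the embodiment described later, and the padding that keeps H = W = m is an assumption; this is an illustrative reading of the description, not the authors' released code.

```python
import torch
import torch.nn as nn

class DownsamplingLayer(nn.Module):
    """Two serial LxL conv blocks (Conv -> BN -> ReLU) that lift an RGB image
    to a C-channel feature map, as in steps 1.3)-1.4). L=5 and C=64 follow the
    embodiment; they are assumptions, not fixed by the claims."""
    def __init__(self, in_ch=3, out_ch=64, L=5):
        super().__init__()
        pad = L // 2  # keep the m x m spatial size
        self.block1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, L, padding=pad),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )  # yields feature map D
        self.block2 = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, L, padding=pad),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )  # yields feature map X
    def forward(self, img):
        D = self.block1(img)
        X = self.block2(D)
        return X

# X = DownsamplingLayer()(torch.randn(1, 3, 128, 128))  # -> (1, 64, 128, 128)
```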
Preferably, the specific steps of step 2.1) include:
2.1.1) when the feature map X is input to the t-th PAM module at time t, then for the 1st PAM module, X ∈ R^(C×H×W) is fed into the Residual module for convolution, yielding the output feature map X_R ∈ R^(C×H×W);
in the Residual module, the input feature map X_IN ∈ R^(C×H×W) = X ∈ R^(C×H×W) is convolved with two convolution kernels F ∈ R^(C×K×K) of size K×K using grouped convolution with g groups together with a 1×1 convolution, so the parameter count P_G is:
P_G = (K×K×C×C×1/g + 1×1×C×C)×2
    = 2×C×C×(K×K×1/g + 1)
where C denotes the number of channels and K×K the size of the convolution kernel F; the Residual module yields the output feature map X_R ∈ R^(C×H×W).
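A minimal sketch of such a Residual module follows, assuming each of the two stages is a bias-free grouped K×K convolution followed by a 1×1 convolution (matching the parameter count P_G above); the placement of the activation is also an assumption.

```python
import torch.nn as nn

class GroupedResidual(nn.Module):
    """Residual module with two (grouped KxK conv + 1x1 conv) stages.
    Parameters per stage: K*K*C*C/g + C*C, i.e. P_G = 2*C*C*(K*K/g + 1)."""
    def __init__(self, C=64, K=3, g=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(C, C, K, padding=K // 2, groups=g, bias=False),
            nn.Conv2d(C, C, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(C, C, K, padding=K // 2, groups=g, bias=False),
            nn.Conv2d(C, C, 1, bias=False),
        )
    def forward(self, x):
        # returns X_R; the skip additions happen outside (Y_C = X + CA(X_R), etc.)
        return self.body(x)

# Parameter check for C=64, K=3, g=16:
# sum(p.numel() for p in GroupedResidual().parameters()) == 12800
```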
2.1.2) feed the output feature map X_R ∈ R^(C×H×W) simultaneously into the Channel Attention and Spatial Attention modules, which compute the channel-domain attention Y_C and the spatial-domain attention Y_S in parallel;
Preferably, the channel-domain attention Y_C is computed with a SENet-style structure in which the fully connected layers of SENet are replaced by 1×1 convolutions, so that the spatial features of the image are preserved; the channel-domain attention is computed as:
Y_C = X + CA(X_R)
where X ∈ R^(C×H×W) is the input of the residual block, X_R ∈ R^(C×H×W) is the output after the residual computation, CA(·) denotes the channel-domain attention computation, and Y_C ∈ R^(C×H×W) is the final output of the channel-domain attention;
in parallel, the spatial-domain attention Y_S is computed using a three-layer cascade of dilated convolutions with dilation rates 1, 2, and 3: first, a 1×1 convolution reduces the dimensionality, converting the input feature map of dimensions (C, H, W) into a feature map of dimensions (C/K, H, W), where K is the dimensionality-reduction coefficient;
next, the reduced feature map undergoes three dilated convolutions with different dilation rates, which enlarges the receptive field with a minimal number of parameters in a limited number of steps, keeps the receptive field contiguous, and avoids the information loss caused by pooling;
finally, a 1×1 convolution fuses the information of the different channels of the feature map and a Sigmoid activation yields the feature-map weight Φ of dimensions (1, H, W); broadcasting this weight over the (H, W) dimensions and multiplying it with the input feature map X_R ∈ R^(C×H×W) achieves the goal of attending to the image foreground information; the spatial-domain attention is computed as:
Y_S = X + SA(X_R)
where X ∈ R^(C×H×W) is the input of the residual block, X_R ∈ R^(C×H×W) is the output after the residual computation, SA(·) denotes the spatial-domain attention computation, and Y_S ∈ R^(C×H×W) is the final output of the spatial-domain attention.
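As an illustration, the two attention branches could be sketched as follows in PyTorch. The 3×3 kernel size of the dilated convolutions, the reduction ratio r of the channel branch, and the reading of CA(·)/SA(·) as re-weighted feature maps added to X are assumptions consistent with the formulas above, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SENet-style channel attention with the fully connected layers replaced
    by 1x1 convolutions (full convolution), so spatial information is kept.
    The reduction ratio r is an assumed hyper-parameter."""
    def __init__(self, C=64, r=16):
        super().__init__()
        self.ca = nn.Sequential(
            nn.Conv2d(C, C // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(C // r, C, 1), nn.Sigmoid(),
        )
    def forward(self, x, x_r):
        # Y_C = X + CA(X_R): re-weight X_R channel-wise, then add the block input X
        return x + x_r * self.ca(x_r)

class SpatialAttention(nn.Module):
    """Spatial attention: 1x1 reduction (C -> C/K), cascaded dilated convs with
    dilation rates 1, 2, 3, then 1x1 + Sigmoid giving a (1, H, W) weight map Phi."""
    def __init__(self, C=64, K=4):
        super().__init__()
        Cr = C // K
        self.reduce = nn.Conv2d(C, Cr, 1)
        self.dilated = nn.Sequential(
            nn.Conv2d(Cr, Cr, 3, padding=1, dilation=1), nn.ReLU(inplace=True),
            nn.Conv2d(Cr, Cr, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(Cr, Cr, 3, padding=3, dilation=3), nn.ReLU(inplace=True),
        )
        self.fuse = nn.Sequential(nn.Conv2d(Cr, 1, 1), nn.Sigmoid())
    def forward(self, x, x_r):
        phi = self.fuse(self.dilated(self.reduce(x_r)))   # (N, 1, H, W)
        # Y_S = X + SA(X_R): broadcast Phi over the channel dimension
        return x + x_r * phi
```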
2.1.3) concatenate the channel-domain attention Y_C and the spatial-domain attention Y_S along the channel dimension to obtain the input G_IN ∈ R^(2C×H×W) of the gating network Gate;
2.1.4) fuse the information in the gating-network input G_IN ∈ R^(2C×H×W) with a 1×1 convolution, reducing its dimensions to (C, H, W); then perform feature extraction with two 3×3 convolutions and a Sigmoid activation to obtain the activation output σ ∈ R^(C×H×W) with values in (0, 1);
2.1.5) use the activation output σ as linear combination coefficients for Y_C and Y_S to obtain the output G_OUT ∈ R^(C×H×W);
2.1.6) continuously update the activation output σ during back-propagation, dynamically allocating the weights of the channel-domain and spatial-domain attention through learning and concentrating on the attention domain with the higher weight to extract image foreground information;
Preferably, the activation output σ that is continuously updated during back-propagation is applied as:
G_OUT = (1 - σ)·Y_C + σ·Y_S
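A minimal sketch of the gating network Gate, assuming no extra activation between the two 3×3 convolutions:

```python
import torch
import torch.nn as nn

class Gate(nn.Module):
    """Gating network: concat(Y_C, Y_S) -> 1x1 fusion -> two 3x3 convs -> Sigmoid
    gives sigma in (0, 1); output G_OUT = (1 - sigma) * Y_C + sigma * Y_S."""
    def __init__(self, C=64):
        super().__init__()
        self.fuse = nn.Conv2d(2 * C, C, 1)
        self.gate = nn.Sequential(
            nn.Conv2d(C, C, 3, padding=1),
            nn.Conv2d(C, C, 3, padding=1),
            nn.Sigmoid(),
        )
    def forward(self, y_c, y_s):
        g_in = torch.cat([y_c, y_s], dim=1)      # (N, 2C, H, W)
        sigma = self.gate(self.fuse(g_in))       # (N, C, H, W), values in (0, 1)
        return (1 - sigma) * y_c + sigma * y_s   # G_OUT
```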
2.1.7) add the gating-network output G_OUT ∈ R^(C×H×W) to the initial input X_IN ∈ R^(C×H×W) of the current t-th PAM module to obtain X_OUT ∈ R^(C×H×W);
2.1.8) repeat the operations of steps 2.1.1)-2.1.7) N-2 times, passing through N-1 serial PAM modules in total; the input of the first PAM module is X ∈ R^(C×H×W), and for the other PAM modules of PAMNet the input-output relation is as follows:
the output of the PAM module at the previous time t-1, denoted X_OUT(t-1) ∈ R^(C×H×W), serves as the input X_IN(t) ∈ R^(C×H×W) of the t-th PAM module at the current time, with X_IN(1) ∈ R^(C×H×W) = X ∈ R^(C×H×W); the output X_OUT(t) ∈ R^(C×H×W) of the t-th PAM module at the current time serves as the input X_IN(t+1) ∈ R^(C×H×W) of the PAM module at the next time t+1, where t lies in the interval [1, 10], and the other quantities X_R, Y_C, Y_S, G_IN, G_OUT are indexed in the same way.
2.1.9) feed the shallow features of all the preceding N-2 PAM modules to the end of the (N-1)-th PAM module of the feature extraction layer via skip connections and concatenate them along the channel dimension, obtaining the channel-concatenated image residual shallow feature map S ∈ R^(10C×H×W) of dimensions (10C, H, W), where H = W = m.
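Putting the pieces together, a PAM module and the feature extraction layer can be sketched as follows, reusing the hypothetical GroupedResidual, ChannelAttention, SpatialAttention, and Gate classes above; this is an illustrative assembly under those assumptions, not the patented implementation itself.

```python
import torch
import torch.nn as nn

class PAM(nn.Module):
    """One Parallel Attention Module: grouped residual -> parallel channel /
    spatial attention -> gate -> add the module input (step 2.1.7))."""
    def __init__(self, C=64):
        super().__init__()
        self.res = GroupedResidual(C)
        self.ca = ChannelAttention(C)
        self.sa = SpatialAttention(C)
        self.gate = Gate(C)
    def forward(self, x_in):
        x_r = self.res(x_in)
        y_c = self.ca(x_in, x_r)
        y_s = self.sa(x_in, x_r)
        return self.gate(y_c, y_s) + x_in   # X_OUT

class FeatureExtraction(nn.Module):
    """N serial PAMs; the outputs of the first N-1 PAMs are concatenated via
    skip connections, reduced by a 1x1 conv, and refined by the N-th PAM."""
    def __init__(self, C=64, N=11):
        super().__init__()
        self.pams = nn.ModuleList([PAM(C) for _ in range(N - 1)])
        self.aggregate = nn.Conv2d((N - 1) * C, C, 1)
        self.last = PAM(C)
    def forward(self, x):
        feats, h = [], x
        for pam in self.pams:
            h = pam(h)
            feats.append(h)
        s = torch.cat(feats, dim=1)   # S, (N-1)*C = 10C channels for N = 11
        s_l = self.aggregate(s)       # S_L
        return self.last(s_l)         # S_LF
```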
Preferably, the specific steps of step 3) include:
3.1) upsample the feature map S_LF obtained in step 2) by a factor of a/2 using the sub-pixel convolution (Pixel-Shuffle) method; then apply a convolution with a b×b kernel to the image matrix; then activate the image matrix with a Leaky-ReLU activation function and output the activated image matrix S_N1;
3.2) upsample the corresponding feature map S_LF by a factor of a/2 using bicubic interpolation to obtain an image matrix S_P1 with the same size and number of channels as S_N1; then sum S_N1 and S_P1 to obtain the image matrix S_NP1:
S_NP1 = S_N1 + S_P1
3.3) apply a convolution with a c×c kernel to the image matrix S_NP1 and output a 128-channel image matrix; activate it with a Leaky-ReLU activation function and upsample the activated image matrix by a factor of a/2 using the Pixel-Shuffle method; then apply a convolution with a b×b kernel, activate with a Leaky-ReLU activation function, and output the activated image matrix S_N2;
3.4) enlarge the image matrix S_NP1 by a factor of a/2 using bicubic interpolation and output an image matrix S_P2; sum S_N2 and S_P2 to obtain the image matrix S_NP2:
S_NP2 = S_N2 + S_P2
3.5) apply a convolution with a c×c kernel to the image matrix S_NP2 and output a 128-channel image matrix; then activate with a Leaky-ReLU activation function and apply a convolution with a d×d kernel to the activated image matrix to finally obtain the reconstructed image, where a, b, c, and d are nonzero natural numbers.
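A minimal sketch of this reconstruction path follows, taking a = 4 (two ×2 stages), b = 1, c = 3, and d = 9 from the embodiment below. The channel widths that make the Pixel-Shuffle branch and the bicubic skip branch summable, as well as the Leaky-ReLU negative slope, are assumptions, since the description leaves that bookkeeping implicit.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Reconstruction(nn.Module):
    """Upsampling / reconstruction path of step 3): PixelShuffle branch plus a
    bicubic skip branch, applied twice, then a final d x d convolution."""
    def __init__(self, C=64, out_ch=3):
        super().__init__()
        self.expand1 = nn.Conv2d(C, C * 4, 3, padding=1)          # feeds PixelShuffle(2)
        self.conv_b1 = nn.Sequential(nn.Conv2d(C, C, 1), nn.LeakyReLU(0.2, True))
        self.conv_c1 = nn.Sequential(nn.Conv2d(C, 128, 3, padding=1), nn.LeakyReLU(0.2, True))
        self.conv_b2 = nn.Sequential(nn.Conv2d(128 // 4, C, 1), nn.LeakyReLU(0.2, True))
        self.conv_c2 = nn.Sequential(nn.Conv2d(C, 128, 3, padding=1), nn.LeakyReLU(0.2, True))
        self.conv_d = nn.Conv2d(128, out_ch, 9, padding=4)
        self.shuffle = nn.PixelShuffle(2)
    def forward(self, s_lf):
        s_n1 = self.conv_b1(self.shuffle(self.expand1(s_lf)))                       # 3.1)
        s_p1 = F.interpolate(s_lf, scale_factor=2, mode='bicubic',
                             align_corners=False)                                   # 3.2)
        s_np1 = s_n1 + s_p1
        s_n2 = self.conv_b2(self.shuffle(self.conv_c1(s_np1)))                      # 3.3)
        s_p2 = F.interpolate(s_np1, scale_factor=2, mode='bicubic',
                             align_corners=False)                                   # 3.4)
        s_np2 = s_n2 + s_p2
        return self.conv_d(self.conv_c2(s_np2))                                     # 3.5)
```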
Preferably, in step 4) the reconstructed image and the corresponding original image are fed into a pre-trained VGG-19 network for training: a loss function formed by weighting the non-uniform joint loss L_U, the adversarial loss L_G, and the content loss L_C constrains the network to learn the color and texture features of the image while extracting more discriminative features and detail information and paying more attention to the reconstruction of the image foreground, yielding the super-resolution image SR.
Preferably, the loss function L of the PAMNet network is:
L = γ·L_G + λ·L_U + η·L_C
where γ, λ, and η denote the weights of the adversarial loss, the non-uniform joint loss, and the content loss, respectively, with γ = 0.05, λ = 1, and η = 0.1;
the discriminator loss L_D is:
L_D = -E_xr[log(D(x_r, x_f))] - E_xf[log(1 - D(x_f, x_r))]
and the adversarial loss L_G is:
L_G = -E_xr[log(1 - D(x_r, x_f))] - E_xf[log(D(x_f, x_r))]
where x_r is the real image, x_f is the reconstructed image, D(x_r, x_f) computes the difference between the real image and the reconstructed image and is limited to D(x_r, x_f) ∈ (0, 1) by a Sigmoid, and E[·] denotes the mathematical expectation;
the non-uniform joint loss L_U is based on the L1 loss: the L1 loss L_VGG1 before the first pooling layer and the L1 loss L_VGG2 before the last pooling layer are computed separately, and by adjusting the weights of L_VGG1 and L_VGG2 the generator is constrained to extract low-level features while learning more detail information and discriminative features:
L_U = α·L_VGG1 + β·L_VGG2
where α is the weight of L_VGG1 and β is the weight of L_VGG2, with α = 0.2 and β = 0.1;
L_C = μ·L1(x_r, x_f) + θ·L2(x_r, x_f)
where x_r is the real image, x_f is the reconstructed image, μ and θ denote the weights of the L1 loss and the L2 loss, respectively, L1 and L2 denote the L1 and L2 losses, and μ = 0.75, θ = 0.25.
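A minimal sketch of the loss terms follows, assuming D(a, b) = Sigmoid(C(a) - C(b)) for raw discriminator logits C(·), and using torchvision's VGG-19 feature indices 4 (before the first pooling layer) and 36 (before the last pooling layer); these indices and the reading of D are assumptions consistent with the formulas above.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

def adversarial_losses(c_real, c_fake):
    """c_real, c_fake: raw discriminator logits C(x_r), C(x_f).
    D(a, b) = sigmoid(C(a) - C(b)), the 'difference limited by Sigmoid' reading."""
    d_rf = torch.sigmoid(c_real - c_fake)
    d_fr = torch.sigmoid(c_fake - c_real)
    l_d = -torch.log(d_rf + 1e-8).mean() - torch.log(1 - d_fr + 1e-8).mean()
    l_g = -torch.log(1 - d_rf + 1e-8).mean() - torch.log(d_fr + 1e-8).mean()
    return l_d, l_g

vgg = vgg19(pretrained=True).features.eval()  # expects ImageNet-normalized 3-channel input
for p in vgg.parameters():
    p.requires_grad_(False)

def non_uniform_joint_loss(sr, hr, alpha=0.2, beta=0.1, shallow_end=4, deep_end=36):
    """L_U = alpha * L1(features before first pooling) + beta * L1(before last pooling)."""
    f_s_sr, f_s_hr = vgg[:shallow_end](sr), vgg[:shallow_end](hr)
    f_d_sr, f_d_hr = vgg[:deep_end](sr), vgg[:deep_end](hr)
    return alpha * F.l1_loss(f_s_sr, f_s_hr) + beta * F.l1_loss(f_d_sr, f_d_hr)

def content_loss(sr, hr, mu=0.75, theta=0.25):
    """L_C = mu * L1 + theta * L2."""
    return mu * F.l1_loss(sr, hr) + theta * F.mse_loss(sr, hr)

def total_loss(sr, hr, c_real, c_fake, gamma=0.05, lam=1.0, eta=0.1):
    """L = gamma * L_G + lam * L_U + eta * L_C."""
    _, l_g = adversarial_losses(c_real, c_fake)
    return gamma * l_g + lam * non_uniform_joint_loss(sr, hr) + eta * content_loss(sr, hr)
```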
The invention also proposes a computer-readable storage medium in which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the above method.
The image processing method adopts a generative adversarial network (GAN), a game-theoretic approach whose model consists of a generator and a discriminator. In image super-resolution, the generator is responsible for producing the reconstructed image, while the discriminator judges the difference between the generated image and the real image according to its own criteria; by trying to deceive the discriminator, the generator further restores the image. Thanks to this special mechanism, the method can generate high-resolution images with a good visual effect.
In most deep-learning-based super-resolution methods, the neuron activations at different positions and in different channels of a given feature layer learned by the network carry the same weight; a super-resolution model combined with an attention mechanism can select the activations that matter more for the super-resolution task and assign them greater weight, thereby improving the reconstruction.
The invention provides a general-purpose gated attention module, PAM (Parallel Attention Model), which computes the channel-domain attention Y_C and the spatial-domain attention Y_S in parallel on the residual branch of a Residual Block. The channel-domain attention uses full convolution to preserve the spatial features of the image, while the spatial-domain attention uses cascaded dilated convolutions to enlarge the receptive field and keep it intact. The resulting Y_C and Y_S are fed into a cascaded gating network Gate; the gating weights are adjusted dynamically together with the non-uniform joint loss, the weight σ is continuously updated during back-propagation, the weights of the channel-domain and spatial-domain attention are allocated by learning, and the weight coefficients are multiplied onto Y_C and Y_S to obtain the final output G_OUT. The PAM module thus focuses on the attention domain with the higher weight, which gives it the ability to extract foreground information in depth and improves the sharpness of the foreground in the reconstructed image.
With the PAM module as the core unit, several PAM modules are connected in series and combined with sampling to form the basic structure, and PAMNet is built using skip connections, grouped convolution, feature fusion, and related techniques. PAMNet consists of a downsampling layer, a feature extraction layer, and an upsampling layer. The downsampling layer preliminarily extracts the low-level features of the image through two L×L convolutions and increases the number of feature-map channels. The feature extraction layer uses the PAM module as its basic structural unit; skip connections feed the shallow features into the last PAM module of the feature extraction layer, so that the shallow residual information of the image is fully used and the high-level semantic information and discriminative features of the image are mined. The upsampling layer uses Pixel-Shuffle to enlarge the image. In addition, the PAMNet feature extraction layer uses grouped convolution to reduce the number of parameters and adds a 1×1 convolution for dimensionality reduction and aggregation of shallow features.
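Assembled from the hypothetical sketches above, the overall network could look like this; it is an illustration of the described composition rather than the authors' implementation.

```python
import torch.nn as nn

class PAMNet(nn.Module):
    """Downsampling -> feature extraction (serial PAMs with skips) -> upsampling,
    assembled from the hypothetical DownsamplingLayer, FeatureExtraction, and
    Reconstruction sketches above."""
    def __init__(self, C=64, N=11):
        super().__init__()
        self.down = DownsamplingLayer(out_ch=C)
        self.features = FeatureExtraction(C, N)
        self.up = Reconstruction(C)
    def forward(self, lr):
        return self.up(self.features(self.down(lr)))
```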
The invention further provides a non-uniform joint loss L_U which, while constraining the network to learn the color and texture features of the image, focuses more on extracting discriminative features and detail information, emphasizes the reconstruction of image foreground information, and highlights the visual focal points.
Compared with the prior art, the invention has the following advantages:
1. The invention provides a general-purpose gated attention module, PAM, which computes the channel-domain attention Y_C and the spatial-domain attention Y_S in parallel on the residual branch of a Residual Block and dynamically allocates their weights by learning, so that the PAM module focuses on the attention domain with the higher weight, gains the ability to extract foreground information in depth, and improves the sharpness of the foreground in the reconstructed image.
2. PAMNet reduces the number of parameters while attending to the image foreground information; it fully extracts detail features and high-frequency information, and the details and edges of its reconstructed images are clear, with better objective scores.
3. The invention generates clearer foreground information than the prior art; the detail and texture of the reconstructed image are closer to the real image, the color and texture features of the image are preserved, and the utilization of shallow features is improved.
4. The invention strikes a good balance between performance and model complexity, and the PAM module is general-purpose and can be embedded into various network structures.
Drawings
Fig. 1 is a flowchart of an image super-resolution reconstruction method focusing on foreground information according to the present invention.
Fig. 2 is a schematic structural diagram of a PAMNet proposed in the image super-resolution reconstruction method focusing on foreground information.
Fig. 3 is a schematic diagram of a specific structure of PAM in fig. 2.
Fig. 4 is a schematic diagram of the specific structure of the gating network Gate in Fig. 3.
FIG. 5 is a schematic diagram showing the experimental effect of the method of the present invention compared with other methods.
Fig. 6 is a diagram comparing the average PSNR value and the parameter count of the PAMNet according to the present invention with those of other networks.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
As shown in fig. 1, the image super-resolution reconstruction method for focusing on foreground information according to the embodiment of the present invention includes the following steps:
Step 1) acquire an image to be trained and preprocess the image data to obtain a feature map X ∈ R^(C×H×W), where R denotes the set of real numbers, C the number of channels, and H, W the image size.
1.1) select training sample images to form the training sample image set; 3450 images from DIV2K and Flickr2K are chosen as the training data set;
1.2) randomly select 3450 original images from the training sample image set for cropping and mirror flipping, randomly partition each original image into 128×128 sub-images denoted I_SR(x), x = 1, 2, ..., n, and then randomly select 160000 sub-images as training images;
1.3) input the training images I_SR(x) into the downsampling layer of the PAMNet network and convolve them with two serial 5×5 convolution kernels to preliminarily extract image color, contour, and texture features and increase the number of feature-map channels, obtaining n images to be trained;
each 5×5 convolution block consists of three layers: a 5×5 convolutional layer, a 5×5 BN layer, and a 5×5 ReLU layer; the 1st convolution yields a 128×128 feature map D, and the 2nd convolution yields a 128×128 feature map X. The structure of the PAMNet network is shown in Fig. 2.
1.4) the downsampling layer of the PAMNet network finally outputs a 128×128 image feature map X ∈ R^(C×H×W), where R denotes the set of real numbers, C the number of channels with C = 64, and H, W the image size with H = W = 128.
Step 2) extract features from the feature map X ∈ R^(C×H×W) to obtain the feature output map S_LF ∈ R^(C×H×W).
2.1) input the feature map X into the feature extraction layer of the PAMNet network; through 11 serial PAM basic units, the residual shallow features of the image are computed and propagated repeatedly, the shallow features of all 9 preceding PAM modules are fed to the end of the 10th PAM module via skip connections, and concatenation along the channel dimension yields the image residual shallow feature map S ∈ R^(10C×H×W), where H = W = 128. The structure of the PAM module is shown in Fig. 3.
The specific steps of step 2.1) include:
2.1.1) when the feature map X is input to the t-th PAM module at time t, then for the 1st PAM module, X ∈ R^(64×128×128) is fed into the Residual module for convolution, yielding the output feature map X_R ∈ R^(64×128×128);
in the Residual module, the input feature map X_IN ∈ R^(64×128×128) = X ∈ R^(64×128×128) is convolved with two convolution kernels F ∈ R^(64×3×3) of size 3×3 using grouped convolution with 16 groups together with a 1×1 convolution, so the parameter count P_G is:
P_G = (3×3×64×64×1/16 + 1×1×64×64)×2
    = 2×64×64×(3×3×1/16 + 1) = 12800    (1)
the Residual module yields the output feature map X_R ∈ R^(64×128×128).
2.1.2) feed the output feature map X_R ∈ R^(64×128×128) simultaneously into the Channel Attention and Spatial Attention modules, which compute the channel-domain attention Y_C and the spatial-domain attention Y_S in parallel.
The channel-domain attention Y_C is computed with a SENet-style structure in which the fully connected layers of SENet are replaced by 1×1 convolutions, so that the spatial features of the image are preserved; the channel-domain attention is computed as:
Y_C = X + CA(X_R)    (2)
where X ∈ R^(64×128×128) is the input of the residual block, X_R ∈ R^(64×128×128) is the output after the residual computation, CA(·) denotes the channel-domain attention computation, and Y_C ∈ R^(64×128×128) is the final output of the channel-domain attention;
in parallel, the spatial-domain attention Y_S is computed using a three-layer cascade of dilated convolutions with dilation rates 1, 2, and 3: first, a 1×1 convolution reduces the dimensionality, converting the input feature map of dimensions (64, 128, 128) into a feature map of dimensions (64/K, 128, 128), where K is the dimensionality-reduction coefficient and K = 4;
next, the reduced feature map undergoes three dilated convolutions with different dilation rates, which enlarges the receptive field with a minimal number of parameters in a limited number of steps, keeps the receptive field contiguous, and avoids the information loss caused by pooling;
finally, a 1×1 convolution fuses the information of the different channels of the feature map and a Sigmoid activation yields the feature-map weight Φ of dimensions (1, 128, 128); broadcasting this weight over the (128, 128) dimensions and multiplying it with the input feature map X_R ∈ R^(64×128×128) achieves the goal of attending to the image foreground information; the spatial-domain attention is computed as:
Y_S = X + SA(X_R)    (3)
where X ∈ R^(64×128×128) is the input of the residual block, X_R ∈ R^(64×128×128) is the output after the residual computation, SA(·) denotes the spatial-domain attention computation, and Y_S ∈ R^(64×128×128) is the final output of the spatial-domain attention.
2.1.3) concatenate the channel-domain attention Y_C and the spatial-domain attention Y_S along the channel dimension to obtain the input G_IN ∈ R^(128×128×128) of the gating network Gate. The structure of the gating network Gate is shown in Fig. 4.
2.1.4) fuse the information in the gating-network input G_IN ∈ R^(128×128×128) with a 1×1 convolution, reducing its dimensions to (64, 128, 128); then perform feature extraction with two 3×3 convolutions and a Sigmoid activation to obtain the activation output σ ∈ R^(64×128×128) with values in (0, 1);
2.1.5) use the activation output σ as linear combination coefficients for Y_C and Y_S to obtain the output G_OUT ∈ R^(64×128×128);
2.1.6) continuously update the activation output σ during back-propagation, dynamically allocating the weights of the channel-domain and spatial-domain attention through learning and concentrating on the attention domain with the higher weight to extract image foreground information; the specific calculation is:
G_OUT = (1 - σ)·Y_C + σ·Y_S    (4)
2.1.7) add the gating-network output G_OUT ∈ R^(64×128×128) to the initial input X_IN ∈ R^(64×128×128) of the current t-th PAM module to obtain X_OUT ∈ R^(64×128×128);
2.1.8) repeat the operations of steps 2.1.1)-2.1.7) 9 times, passing through N-1 serial PAM modules in total; the input of the first PAM module is X ∈ R^(64×128×128), and for the other PAM modules of PAMNet the input-output relation is as follows:
the output of the PAM module at the previous time t-1, denoted X_OUT(t-1) ∈ R^(64×128×128), serves as the input X_IN(t) ∈ R^(64×128×128) of the t-th PAM module at the current time, with X_IN(1) ∈ R^(64×128×128) = X ∈ R^(64×128×128); the output X_OUT(t) ∈ R^(64×128×128) of the t-th PAM module at the current time serves as the input X_IN(t+1) ∈ R^(64×128×128) of the PAM module at the next time t+1, where t lies in the interval [1, 10], and the other quantities X_R, Y_C, Y_S, G_IN, G_OUT are indexed in the same way.
2.1.9) feed the shallow features of all 9 preceding PAM modules to the end of the 10th PAM module of the feature extraction layer via skip connections and concatenate them along the channel dimension, obtaining the channel-concatenated image residual shallow feature map S ∈ R^(640×128×128) of dimensions (640, 128, 128).
2.2) apply a 1×1 convolution to the image residual shallow feature map S ∈ R^(640×128×128) obtained in step 2.1) to reduce its dimension and aggregate the shallow features, obtaining a feature map S_L ∈ R^(64×128×128) of dimensions (64, 128, 128).
2.3) pass the feature map S_L ∈ R^(64×128×128) through the last PAM module (the 11th PAM module) to obtain the extracted feature output map S_LF ∈ R^(64×128×128).
Step 3) reconstruct the image from the feature output map S_LF ∈ R^(64×128×128):
3.1) upsample the feature map S_LF obtained in step 2) by a factor of 2 using the sub-pixel convolution (Pixel-Shuffle) method; then apply a convolution with a 1×1 kernel to the image matrix; activate the image matrix with a Leaky-ReLU activation function and output the activated image matrix S_N1;
3.2) upsample the corresponding feature map S_LF by a factor of 2 using bicubic interpolation to obtain an image matrix S_P1 with the same size and number of channels as S_N1; then sum S_N1 and S_P1 to obtain the image matrix S_NP1:
S_NP1 = S_N1 + S_P1    (5)
3.3) apply a convolution with a 3×3 kernel to the image matrix S_NP1 and output a 128-channel image matrix; activate it with a Leaky-ReLU activation function and upsample the activated image matrix by a factor of 2 using the Pixel-Shuffle method; then apply a convolution with a 1×1 kernel, activate with a Leaky-ReLU activation function, and output the activated image matrix S_N2;
3.4) enlarge the image matrix S_NP1 by a factor of 2 using bicubic interpolation and output an image matrix S_P2; sum S_N2 and S_P2 to obtain the image matrix S_NP2:
S_NP2 = S_N2 + S_P2    (6)
3.5) apply a convolution with a 3×3 kernel to the image matrix S_NP2 and output a 128-channel image matrix; then activate with a Leaky-ReLU activation function and apply a convolution with a 9×9 kernel to the activated image matrix, finally obtaining the reconstructed image.
Step 4) train with a loss function on the reconstructed image to obtain the super-resolution image SR.
The reconstructed image and the corresponding original image are fed into a pre-trained VGG-19 network for training: a loss function formed by weighting the non-uniform joint loss L_U, the adversarial loss L_G, and the content loss L_C constrains the network to learn the color and texture features of the image while extracting more discriminative features and detail information and paying more attention to the reconstruction of the image foreground, yielding the super-resolution image SR.
The loss function L for the entire PAMNet network is:
L = γ·L_G + λ·L_U + η·L_C    (7)
where γ, λ, and η denote the weights of the adversarial loss, the non-uniform joint loss, and the content loss, respectively, with γ = 0.05, λ = 1, and η = 0.1.
The method trains the network model based on a generative adversarial structure and optimizes the model parameters through the combination of the discriminator loss and the generator loss.
The discriminator loss L_D is:
L_D = -E_xr[log(D(x_r, x_f))] - E_xf[log(1 - D(x_f, x_r))]    (8)
and the adversarial loss L_G is:
L_G = -E_xr[log(1 - D(x_r, x_f))] - E_xf[log(D(x_f, x_r))]    (9)
in formulas (8) and (9), x_r is the real image, x_f is the reconstructed image, D(x_r, x_f) computes the difference between the real image and the reconstructed image and is limited to D(x_r, x_f) ∈ (0, 1) by a Sigmoid, and E[·] denotes the mathematical expectation.
The non-uniform joint loss L_U is based on the L1 loss: the L1 loss L_VGG1 before the first pooling layer and the L1 loss L_VGG2 before the last pooling layer are computed separately, and by adjusting the weights of L_VGG1 and L_VGG2 the generator is constrained to extract low-level features while learning more detail information and discriminative features:
L_U = α·L_VGG1 + β·L_VGG2    (10)
where α is the weight of L_VGG1 and β is the weight of L_VGG2, with α = 0.2 and β = 1;
L_C = μ·L1(x_r, x_f) + θ·L2(x_r, x_f)    (11)
where x_r is the real image, x_f is the reconstructed image, μ and θ denote the weights of the L1 loss and the L2 loss, respectively, with μ = 0.75 and θ = 0.25, and L1 and L2 denote the L1 and L2 losses.
Comparative experiment:
a total of 3450 images of DIV2K and Flickr2K were selected as the training data Set, and as shown in FIG. 5, Set5, Set14, BSD100 and Urban100 were selected as the test data Set. Compared with the existing image super-resolution method in the aspects of subjectivity and objectivity, the PAM module is respectively embedded into backbone networks of SRGAN and ESRGAN to verify the effectiveness and the universality of the PAM module, and PSNR and SSIM are used as quantization standards for reconstructed image quality on objective indexes. The last picture in fig. 5 shows that PAMNet pays attention to image foreground information while reducing the number of parameters, can fully extract detail features and high-frequency information, and the details and edges of the reconstructed image are clear and have better objective scores.
And replacing the basic residual block of the SRGAN and the RRDB structure in the ESRGAN by the PAM module, keeping other structures and loss functions in the original network unchanged, and verifying the effectiveness and the universality of the PAM module, as shown in FIG. 6. PSNR (Peak Signal to noise ratio) of the PAM-SRGAN on 4 test sets is improved by the accuracy value in an interval [0.54dB, 1.91dB ] compared with the SRGAN; compared with the ESRGAN, the PSNR of the PAM-ESRGAN improves the precision value in the interval of 0.03dB and 0.21dB on 4 test sets. The technical result shows that the PAM module improves the performance of the SRGAN and the ESRGAN network and has good universality.
When the PAM module number in the feature extraction layer is N =11, the PAMNet comprehensively shows that PSNR on 4 data sets exceeds that of the SOTA method RFB-ESRGAN and RFANet. In addition, the PSNR value increases with the increase of the number N of the PAM modules, and when N =11, the PSNR can reach 26.93dB without significant increase, so that the model performance and the complexity are well balanced, and the peak signal-to-noise ratio and the visual quality of a reconstructed image are improved.
The invention can generate clearer foreground information than the prior art, the detail texture characteristics of the reconstructed image are closer to the real image, the color and texture characteristics of the image are reserved, and the integral definition of the image is basically equal to that of the SOTA methods such as RFB-ESRGAN, RFANet and the like.
The average PSNR value of the PAMNet reconstructed image is superior to that of SRCNN, SRGAN, VDSR, EDSR, DBPN, ESRGAN, PRANet, RFB-ESRGAN and other technical methods, and the PSNR average value index is increased by the range of [0.01dB, 0.07dB ]. The average SSIM value is slightly lower in the Set5 and Urban100 data sets than in the RFB-ESRGAN technology, and higher in all other data sets than in other technical approaches.
The jump connection has an important influence on the PAMNet, and the utilization of shallow features by the PAMNet can be improved by using the jump connection, so that the comprehensive performance of a model is improved.
The PAMNet parameters are fewer and have better performance than DBPN, RFANet, SAN, ESRGAN. Compared with RFB-ESRGAN, the PAMNet parameter amount is slightly larger but the overall performance is slightly better than that of RFB-ESRGAN. The results show that PAMNet achieves a good balance between performance and model complexity.
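For reference, a minimal sketch of the PSNR metric used for the objective comparison, assuming images scaled to [0, 1]:

```python
import torch

def psnr(sr, hr, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a reconstructed image sr and
    the reference hr; both tensors are assumed to lie in [0, max_val]."""
    mse = torch.mean((sr - hr) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)
```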
The image super-resolution reconstruction method focusing on foreground information according to the invention provides a general-purpose PAM module to extract foreground information and high-frequency features of the image, uses a gating network to extract the channel-domain and spatial-domain attention weight coefficients, and, together with the non-uniform joint loss, dynamically adjusts the weights of the channel-domain and spatial-domain attention during back-propagation. It further provides PAMNet, in which a plurality of PAM modules are connected in series and skip connections are introduced to make full use of the shallow features of the image; after training the designed network, super-resolution reconstruction is completed. The method concentrates on extracting the foreground information and discriminative features of the image while preserving its color and texture features and improving the utilization of shallow features; it reduces the number of parameters and achieves better objective scores; the invention strikes a good balance between performance and model complexity, and the PAM module is general-purpose and can be embedded into various network structures.
Although the preferred embodiments of the present invention have been described above with reference to the accompanying drawings, the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive; those skilled in the art can make various changes and modifications within the spirit of the present invention without departing from the scope of the appended claims.

Claims (10)

1. A method for super-resolution reconstruction of an image focusing on foreground information, the method comprising the steps of:
1) acquiring an image to be trained and preprocessing the image data to obtain a feature map X ∈ R^(C×H×W), where R denotes the set of real numbers, C the number of channels, and H, W the image size;
2) extracting features from the feature map X ∈ R^(C×H×W) to obtain a feature output map S_LF ∈ R^(C×H×W);
3) reconstructing the image from the feature output map S_LF ∈ R^(C×H×W);
4) training with a loss function on the reconstructed image to obtain a super-resolution image SR;
the method is characterized in that: the specific steps of the step 2) comprise:
2.1) inputting the feature map X into the feature extraction layer of the PAMNet network, computing and propagating the residual shallow features of the image repeatedly through N serial PAM basic units, feeding the shallow features of all the preceding N-2 PAM modules to the end of the (N-1)-th PAM module via skip connections, and concatenating along the channel dimension to obtain the image residual shallow feature map S ∈ R^(10C×H×W);
2.2) applying a 1×1 convolution to the image residual shallow feature map S ∈ R^(10C×H×W) to reduce its dimension and aggregate the shallow features, obtaining a feature map S_L ∈ R^(C×H×W) of dimensions (C, H, W);
2.3) passing the feature map S_L ∈ R^(C×H×W) through the N-th PAM module to obtain the extracted feature output map S_LF ∈ R^(C×H×W).
2. The image super-resolution reconstruction method focusing on foreground information according to claim 1, wherein: the specific steps of step 1) comprise:
1.1) selecting training sample images to form a training sample image set;
1.2) randomly selecting n original images from the training sample image set for cropping and mirror flipping, randomly partitioning each original image into m×m sub-images denoted I_SR(x), x = 1, 2, ..., n, and then randomly selecting q sub-images as training images;
1.3) inputting the training images I_SR(x) into the downsampling layer of the PAMNet network and convolving them with two serial L×L convolution kernels to preliminarily extract image color, contour, and texture features and increase the number of feature-map channels, obtaining n images to be trained;
1.4) the downsampling layer of the PAMNet network finally outputs an m×m image feature map X ∈ R^(C×H×W), where C denotes the number of channels and H, W the image size, with H = W = m.
3. The image super-resolution reconstruction method focusing on foreground information according to claim 2, wherein: each L×L convolution block used in step 1.3) consists of three layers: an L×L convolutional layer, an L×L BN layer, and an L×L ReLU layer; the 1st convolution yields an m×m feature map D, and the 2nd convolution yields an m×m feature map X.
4. The image super-resolution reconstruction method focusing on foreground information according to claim 2, wherein: the specific steps of step 2.1) comprise:
2.1.1) when the feature map X is input to the t-th PAM module at time t, then for the 1st PAM module, X ∈ R^(C×H×W) is fed into the Residual module for convolution, yielding the output feature map X_R ∈ R^(C×H×W);
2.1.2) feeding the output feature map X_R ∈ R^(C×H×W) simultaneously into the Channel Attention and Spatial Attention modules, which compute the channel-domain attention Y_C and the spatial-domain attention Y_S in parallel;
2.1.3) concatenating the channel-domain attention Y_C and the spatial-domain attention Y_S along the channel dimension to obtain the input G_IN ∈ R^(2C×H×W) of the gating network Gate;
2.1.4) fusing the information in the gating-network input G_IN ∈ R^(2C×H×W) with a 1×1 convolution, reducing its dimensions to (C, H, W); then performing feature extraction with two 3×3 convolutions and a Sigmoid activation to obtain the activation output σ ∈ R^(C×H×W) with values in (0, 1);
2.1.5) using the activation output σ as linear combination coefficients for Y_C and Y_S to obtain the output G_OUT ∈ R^(C×H×W);
2.1.6) continuously updating the activation output σ during back-propagation, dynamically allocating the weights of the channel-domain and spatial-domain attention through learning and concentrating on the attention domain with the higher weight to extract image foreground information;
2.1.7) adding the gating-network output G_OUT ∈ R^(C×H×W) to the initial input X_IN ∈ R^(C×H×W) of the current t-th PAM module to obtain X_OUT ∈ R^(C×H×W);
2.1.8) repeating the operations of steps 2.1.1)-2.1.7) N-2 times;
2.1.9) feeding the shallow features of all the preceding N-2 PAM modules to the end of the (N-1)-th PAM module of the feature extraction layer via skip connections and concatenating them along the channel dimension, obtaining the channel-concatenated image residual shallow feature map S ∈ R^(10C×H×W) of dimensions (10C, H, W), where H = W = m.
5. The image super-resolution reconstruction method focusing on foreground information according to claim 1, wherein: the specific steps of step 3) comprise:
3.1) upsampling the feature map S_LF obtained in step 2) by a factor of a/2 using the sub-pixel convolution (Pixel-Shuffle) method; then applying a convolution with a b×b kernel to the image matrix; activating the image matrix with a Leaky-ReLU activation function and outputting the activated image matrix S_N1;
3.2) upsampling the corresponding feature map S_LF by a factor of a/2 using bicubic interpolation to obtain an image matrix S_P1 with the same size and number of channels as S_N1; then summing S_N1 and S_P1 to obtain the image matrix S_NP1;
3.3) applying a convolution with a c×c kernel to the image matrix S_NP1 and outputting a 128-channel image matrix; activating it with a Leaky-ReLU activation function and upsampling the activated image matrix by a factor of a/2 using the Pixel-Shuffle method; then applying a convolution with a b×b kernel, activating the image matrix with a Leaky-ReLU activation function, and outputting the activated image matrix S_N2;
3.4) enlarging the image matrix S_NP1 by a factor of a/2 using bicubic interpolation and outputting an image matrix S_P2; summing S_N2 and S_P2 to obtain the image matrix S_NP2;
3.5) applying a convolution with a c×c kernel to the image matrix S_NP2 and outputting a 128-channel image matrix; then activating with a Leaky-ReLU activation function and applying a convolution with a d×d kernel to the activated image matrix to finally obtain the reconstructed image, where a, b, c, and d are nonzero natural numbers.
6. The image super-resolution reconstruction method focusing on foreground information according to claim 1, wherein: in step 4) the reconstructed image and the corresponding original image are fed into a pre-trained VGG-19 network for training, that is, a loss function formed by weighting the non-uniform joint loss L_U, the adversarial loss L_G, and the content loss L_C constrains the network to learn the color and texture features of the image while extracting more discriminative features and detail information and paying more attention to the reconstruction of the image foreground, obtaining the super-resolution image SR.
7. The image super-resolution reconstruction method focusing on foreground information according to claim 6, wherein: the loss function L of the PAMNet network is:
L = γ·L_G + λ·L_U + η·L_C
where γ, λ, and η denote the weights of the adversarial loss, the non-uniform joint loss, and the content loss, respectively;
the discriminator loss L_D is:
L_D = -E_xr[log(D(x_r, x_f))] - E_xf[log(1 - D(x_f, x_r))]
and the adversarial loss L_G is:
L_G = -E_xr[log(1 - D(x_r, x_f))] - E_xf[log(D(x_f, x_r))]
where x_r is the real image, x_f is the reconstructed image, D(x_r, x_f) computes the difference between the real image and the reconstructed image and is limited to D(x_r, x_f) ∈ (0, 1) by a Sigmoid, and E[·] denotes the mathematical expectation;
the non-uniform joint loss L_U is based on the L1 loss: the L1 loss L_VGG1 before the first pooling layer and the L1 loss L_VGG2 before the last pooling layer are computed separately, and by adjusting the weights of L_VGG1 and L_VGG2 the generator is constrained to extract low-level features while learning more detail information and discriminative features:
L_U = α·L_VGG1 + β·L_VGG2
where α is the weight of L_VGG1 and β is the weight of L_VGG2;
L_C = μ·L1(x_r, x_f) + θ·L2(x_r, x_f)
where x_r is the real image, x_f is the reconstructed image, μ and θ denote the weights of the L1 loss and the L2 loss, respectively, and L1 and L2 denote the L1 and L2 losses.
8. The image super-resolution reconstruction method of the foreground information of interest according to claim 4, wherein: said 2.1.2) calculating the channel Domain attention YCIn the process, a SENet structure is adopted, the full connection layer in the SENet is replaced by 1 × 1 convolution, the space characteristics of the image are reserved, and the specific calculation of the attention of the channel domain is as follows:
YC=X+CA(XR)
in the formula, X ϵ RC×H×WRepresenting the input of a residual block, XRϵRC×H×WRepresenting the output after computation of the residue, CA () representing the compute channel domain attention, YCϵRC×H×WIndicating channel domain attentionFinal output of force;
simultaneous parallel computation of spatial domain attention YSFirstly, three-layer cascade expansion convolution with expansion rates of 1,2 and 3 is used for calculating spatial domain attention, firstly, 1 multiplied by 1 convolution is used for dimensionality reduction, and an input feature graph with dimensionality (C, H and W) is converted into a feature graph with dimensionality (C/K, H and W), wherein K is a dimensionality reduction coefficient;
secondly, performing three times of expansion convolution with different expansion rates on the feature map after dimension reduction, and expanding the receptive field by the minimum parameter number in a limited step number, thereby ensuring the continuity of the receptive field and avoiding information loss caused by pooling;
and finally, fusing information of different channels of the feature diagram by using 1 × 1 convolution and activating by Sigmoid to obtain the feature diagram weight phi of (1, H, W) dimension, and multiplying the weight to the input feature diagram X in the (H, W) dimension distributionRϵRC×H×WThe purpose of paying attention to the foreground information of the image is achieved, and the specific calculation of the attention of the spatial domain is as follows:
YS=X+SA(XR)
in the formula, X ϵ RC×H×WRepresenting the input of a residual block, XRϵRC×H×WRepresenting the output after computation of the residual, SA () representing the computation of spatial domain attention, YSϵRC×H×WRepresenting the final output of spatial domain attention.
9. The image super-resolution reconstruction method focusing on foreground information according to claim 4, wherein in step 2.1.6) the activation output σ is continuously updated during back-propagation, and the specific calculation is:
G_OUT = (1 - σ)·Y_C + σ·Y_S
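For illustration only, a minimal sketch of this gated combination is given below, assuming σ is parameterized as a single learnable scalar kept in (0, 1) by a Sigmoid and updated by back-propagation together with the rest of the network; this scalar parameterization is an assumption, not a limitation of the claim.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Learnable gate combining the two attention outputs:
    G_OUT = (1 - sigma) * Y_C + sigma * Y_S (scalar sigma is an assumed parameterization)."""
    def __init__(self):
        super().__init__()
        self.logit = nn.Parameter(torch.zeros(1))    # updated during back-propagation

    def forward(self, y_c, y_s):
        sigma = torch.sigmoid(self.logit)            # keep sigma in (0, 1)
        return (1.0 - sigma) * y_c + sigma * y_s
```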
10. A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, carrying out the method of any one of claims 1 to 8.
CN202210035833.6A 2022-01-13 2022-01-13 Image super-resolution reconstruction method focusing on foreground information Active CN114049261B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210035833.6A CN114049261B (en) 2022-01-13 2022-01-13 Image super-resolution reconstruction method focusing on foreground information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210035833.6A CN114049261B (en) 2022-01-13 2022-01-13 Image super-resolution reconstruction method focusing on foreground information

Publications (2)

Publication Number Publication Date
CN114049261A true CN114049261A (en) 2022-02-15
CN114049261B CN114049261B (en) 2022-04-01

Family

ID=80196532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210035833.6A Active CN114049261B (en) 2022-01-13 2022-01-13 Image super-resolution reconstruction method focusing on foreground information

Country Status (1)

Country Link
CN (1) CN114049261B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150093015A1 (en) * 2013-09-26 2015-04-02 Hong Kong Applied Science & Technology Research Institute Company Limited Visual-Experience-Optimized Super-Resolution Frame Generator
DE102020122844A1 (en) * 2019-10-29 2021-04-29 Samsung Electronics Co., Ltd. SYSTEM AND PROCEDURE FOR DEEP MACHINE LEARNING FOR COMPUTER VISION APPLICATIONS
CN111192200A (en) * 2020-01-02 2020-05-22 南京邮电大学 Image super-resolution reconstruction method based on fusion attention mechanism residual error network
CN112270697A (en) * 2020-10-13 2021-01-26 清华大学 Satellite sequence image moving target detection method combined with super-resolution reconstruction
CN112270646A (en) * 2020-11-05 2021-01-26 浙江传媒学院 Super-resolution enhancement method based on residual error dense jump network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
NAN MENG et al.: "High-Dimensional Dense Residual Convolutional Neural Network for Light Field Reconstruction", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 43, Issue 3, March 2021 *
谢堂鑫 et al.: "Single-image super-resolution reconstruction based on the Dirac residual module", Journal of Yunnan Minzu University (Natural Sciences Edition) *
陶状 et al.: "Image super-resolution reconstruction algorithm with a dual-path feedback network", Computer Systems & Applications *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612408A (en) * 2022-03-04 2022-06-10 拓微摹心数据科技(南京)有限公司 Heart image processing method based on federal deep learning
CN114881861A (en) * 2022-05-25 2022-08-09 厦门大学 Unbalanced image over-resolution method based on double-sampling texture perception distillation learning
CN114881861B (en) * 2022-05-25 2024-06-04 厦门大学 Unbalanced image super-division method based on double-sampling texture perception distillation learning
CN115861684A (en) * 2022-11-18 2023-03-28 百度在线网络技术(北京)有限公司 Training method of image classification model, and image classification method and device
CN115861684B (en) * 2022-11-18 2024-04-09 百度在线网络技术(北京)有限公司 Training method of image classification model, image classification method and device
CN116485652A (en) * 2023-04-26 2023-07-25 北京卫星信息工程研究所 Super-resolution reconstruction method for remote sensing image vehicle target detection
CN116485652B (en) * 2023-04-26 2024-03-01 北京卫星信息工程研究所 Super-resolution reconstruction method for remote sensing image vehicle target detection
CN116645716A (en) * 2023-05-31 2023-08-25 南京林业大学 Expression Recognition Method Based on Local Features and Global Features
CN116645716B (en) * 2023-05-31 2024-01-19 南京林业大学 Expression recognition method based on local features and global features
CN117078516A (en) * 2023-08-11 2023-11-17 济宁安泰矿山设备制造有限公司 Mine image super-resolution reconstruction method based on residual mixed attention
CN117078516B (en) * 2023-08-11 2024-03-12 济宁安泰矿山设备制造有限公司 Mine image super-resolution reconstruction method based on residual mixed attention

Also Published As

Publication number Publication date
CN114049261B (en) 2022-04-01

Similar Documents

Publication Publication Date Title
CN114049261B (en) Image super-resolution reconstruction method focusing on foreground information
Mo et al. Fake faces identification via convolutional neural network
CN110415170B (en) Image super-resolution method based on multi-scale attention convolution neural network
CN111275618B (en) Depth map super-resolution reconstruction network construction method based on double-branch perception
CN109509152B (en) Image super-resolution reconstruction method for generating countermeasure network based on feature fusion
CN111047515B (en) Attention mechanism-based cavity convolutional neural network image super-resolution reconstruction method
CN111754438B (en) Underwater image restoration model based on multi-branch gating fusion and restoration method thereof
CN110992275A (en) Refined single image rain removing method based on generation countermeasure network
CN109146944B (en) Visual depth estimation method based on depth separable convolutional neural network
CN113362223A (en) Image super-resolution reconstruction method based on attention mechanism and two-channel network
CN110675321A (en) Super-resolution image reconstruction method based on progressive depth residual error network
CN113962893A (en) Face image restoration method based on multi-scale local self-attention generation countermeasure network
CN112489164B (en) Image coloring method based on improved depth separable convolutional neural network
Chen et al. Single image super-resolution using deep CNN with dense skip connections and inception-resnet
Luo et al. Lattice network for lightweight image restoration
CN111986108A (en) Complex sea-air scene image defogging method based on generation countermeasure network
CN111833261A (en) Image super-resolution restoration method for generating countermeasure network based on attention
CN113284100A (en) Image quality evaluation method based on recovery image to mixed domain attention mechanism
CN116309648A (en) Medical image segmentation model construction method based on multi-attention fusion
CN111861884A (en) Satellite cloud image super-resolution reconstruction method based on deep learning
CN115546032A (en) Single-frame image super-resolution method based on feature fusion and attention mechanism
CN116188274A (en) Image super-resolution reconstruction method
CN115526779A (en) Infrared image super-resolution reconstruction method based on dynamic attention mechanism
CN111260585A (en) Image recovery method based on similar convex set projection algorithm
CN113627487B (en) Super-resolution reconstruction method based on deep attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant