CN113781308A - Image super-resolution reconstruction method and device, storage medium and electronic equipment - Google Patents

Image super-resolution reconstruction method and device, storage medium and electronic equipment

Info

Publication number
CN113781308A
Authority
CN
China
Prior art keywords
feature map
image
global
module
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111077959.1A
Other languages
Chinese (zh)
Inventor
马明才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Publication of CN113781308A
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses an image super-resolution reconstruction method and device, a storage medium and electronic equipment. The image super-resolution reconstruction method comprises the following steps: acquiring an original image and a trained reconstruction network; extracting features of the original image with a first feature extraction module; passing the resulting original feature map sequentially through each second feature extraction module, extracting the output feature map of each module, and concatenating these outputs into a global feature map; inputting the global feature map into a global channel attention module and fusing the generated global channel attention map with the global feature map to obtain an enhanced feature map; and performing super-resolution reconstruction on the enhanced feature map with an image reconstruction module to obtain a target image. The method extracts image information efficiently with little redundancy, has strong generalization ability and few parameters, is convenient to train and deploy, and restores lossy images acquired in actual scenes well.

Description

Image super-resolution reconstruction method and device, storage medium and electronic equipment
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to an image super-resolution reconstruction method, an image super-resolution reconstruction device, a storage medium and electronic equipment.
Background
Single-image super-resolution technology processes a given low-resolution picture to obtain a corresponding high-resolution picture, giving the picture finer detail and a better visual effect. Image super-resolution reconstruction is a highly ill-posed problem: the number of known variables in the low-resolution image is far smaller than the number of unknown variables in the high-resolution image, so the spatial mapping from a low-resolution image to a high-resolution image admits multiple solutions.
SRCNN is generally regarded as the first successful deep-learning model for single-image super-resolution reconstruction; since then, deep-learning-based single-image super-resolution has gradually replaced traditional algorithms and become the research mainstream in industry and academia. Existing models improve performance mainly by enlarging the network structure, which bloats the model: the parameter count is large, redundant information abounds, model training consumes substantial computing power and time, and deployment places high demands on hardware.
Disclosure of Invention
In view of the above-mentioned deficiencies in the prior art, the present invention provides an image super-resolution reconstruction method, an apparatus, a storage medium and an electronic device, so that the model can be deployed conveniently in practical application scenarios.
In order to achieve the above purpose, the solution adopted by the invention is as follows: the image super-resolution reconstruction method comprises the following steps:
s1, acquiring an original image needing super-resolution reconstruction, and acquiring a trained image super-resolution reconstruction network, wherein the image super-resolution reconstruction network comprises an image reconstruction module, a global channel attention module, a first feature extraction module and a plurality of second feature extraction modules;
s2, extracting the features of the original image by using the first feature extraction module to obtain an original feature map;
s3, the original feature map sequentially passes through the second feature extraction modules, feature maps output by the second feature extraction modules are respectively extracted and then are spliced, and a global feature map is obtained;
s4, inputting the global feature map into the global channel attention module to generate a global channel attention map, and then fusing the global channel attention map and the global feature map (multiplying the global channel attention map and the global feature map) to obtain an enhanced feature map;
s5, performing super-resolution reconstruction on the enhanced feature map by using the image reconstruction module to obtain a target image, wherein the resolution of the target image is greater than that of the original image.
Further, the second feature extraction module comprises a ReLU activation function, a 3 × 3 convolution layer, a 5 × 5 convolution layer, a 3 × 3 deformable convolution layer, a first dilated convolution layer with a dilation rate of 2, a second dilated convolution layer with a dilation rate of 3, and a local dimension reduction layer;
the feature map input into the second feature extraction module passes through the 3 × 3 convolution layer and the ReLU activation function to obtain a first feature map, through the 5 × 5 convolution layer and the ReLU activation function to obtain a second feature map, and through the 3 × 3 deformable convolution layer and the ReLU activation function to obtain a third feature map;
the first feature map, the second feature map and the third feature map are concatenated and passed through the first dilated convolution and the ReLU activation function to obtain a fourth feature map; the fourth feature map and the third feature map are concatenated and passed through the second dilated convolution and the ReLU activation function to obtain a fifth feature map; and the fifth feature map and the third feature map are concatenated and input into the local dimension reduction layer for dimension reduction, so that the number of feature map channels output by the local dimension reduction layer is the same as the number of feature map channels input into the second feature extraction module.
Furthermore, a residual connection is arranged between the upstream end and the downstream end of the second feature extraction module: the feature map input into the second feature extraction module is fused with the feature map output by the local dimension reduction layer through the residual connection, that is, the two feature maps are combined by matrix addition. This effectively avoids vanishing gradients during model training, improves the convergence speed, and makes deep networks easier to train.
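A hedged PyTorch sketch of this module follows; torchvision's DeformConv2d supplies the deformable convolution, and the per-branch channel width (kept equal to the input width c here) is an assumption made for illustration.

import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class SecondExtractor(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.relu = nn.ReLU()
        self.conv3 = nn.Conv2d(c, c, 3, padding=1)            # 3x3 branch
        self.conv5 = nn.Conv2d(c, c, 5, padding=2)            # 5x5 branch
        self.offset = nn.Conv2d(c, 2 * 3 * 3, 3, padding=1)   # offsets for deformable conv
        self.deform = DeformConv2d(c, c, 3, padding=1)        # 3x3 deformable branch
        self.dilated2 = nn.Conv2d(3 * c, c, 3, padding=2, dilation=2)  # dilation rate 2
        self.dilated3 = nn.Conv2d(2 * c, c, 3, padding=3, dilation=3)  # dilation rate 3
        self.local_reduce = nn.Conv2d(2 * c, c, 1)            # local dimension reduction

    def forward(self, x):
        f1 = self.relu(self.conv3(x))                                  # first feature map
        f2 = self.relu(self.conv5(x))                                  # second feature map
        f3 = self.relu(self.deform(x, self.offset(x)))                 # third feature map
        f4 = self.relu(self.dilated2(torch.cat([f1, f2, f3], dim=1)))  # fourth feature map
        f5 = self.relu(self.dilated3(torch.cat([f4, f3], dim=1)))      # fifth feature map
        out = self.local_reduce(torch.cat([f5, f3], dim=1))
        return out + x                                        # residual connection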
Further, the global channel attention module comprises a global average pooling layer, a global max-average pooling layer, a multilayer perceptron and a sigmoid activation function. The global feature map passes through the global average pooling layer and the multilayer perceptron in sequence to obtain a first attention feature map, and through the global max-average pooling layer and the multilayer perceptron to obtain a second attention feature map; the first attention feature map and the second attention feature map have the same number of channels as the global feature map. The first attention feature map and the second attention feature map are then fused (by matrix addition) and passed through a sigmoid activation function to generate the global channel attention map;
the global maximum average pooling layer may be represented as follows:
Figure BDA0003262871190000031
where MaxAvePc represents the output of the global maximum average pooling layer, Fc represents each layer of the global feature map, Fc is the input of the global maximum average pooling layer, largi (Fc) represents the ith largest value of the global feature map on the c-th layer, and n may be represented as the following formula:
n = round(λ · √(W · H))

where W denotes the width of the global feature map and H its height, so the value of n is conveniently adapted to the image size, and the rounding operation ensures that n is an integer. λ is a hyper-parameter whose value can be adjusted manually according to the amount of interference information in the image, so that the model best matches the image to be reconstructed.
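A minimal PyTorch sketch of this module follows, assuming the form of n given above and, per Example 1 below, a shared perceptron with sigmoid activations and a hidden width of half the channel count.

import math
import torch
import torch.nn as nn

class GlobalChannelAttention(nn.Module):
    def __init__(self, channels, lam=1.0):
        super().__init__()
        self.lam = lam                                   # hyper-parameter lambda
        self.mlp = nn.Sequential(                        # shared multilayer perceptron
            nn.Linear(channels, channels // 2), nn.Sigmoid(),
            nn.Linear(channels // 2, channels), nn.Sigmoid())

    def max_avg_pool(self, x):
        b, c, h, w = x.shape
        n = max(1, round(self.lam * math.sqrt(h * w)))   # assumed form of n
        topn = x.flatten(2).topk(n, dim=2).values        # n largest values per channel
        return topn.mean(dim=2)                          # MaxAveP_c, shape (B, C)

    def forward(self, g):
        first = self.mlp(g.mean(dim=(2, 3)))             # global average pooling path
        second = self.mlp(self.max_avg_pool(g))          # global max-average pooling path
        att = torch.sigmoid(first + second)              # fuse by addition, then sigmoid
        return att.unsqueeze(-1).unsqueeze(-1)           # (B, C, 1, 1)

Multiplying the returned tensor with the global feature map broadcasts the per-channel weights over all spatial positions.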
Furthermore, the image super-resolution reconstruction network also comprises a dimension reduction module: the enhanced feature map is first input into the dimension reduction module to reduce its number of channels, and then input into the image reconstruction module. The dimension reduction module reduces the memory occupied by the algorithm at run time.
Furthermore, the image reconstruction module comprises a first reconstruction convolution layer, a reconstruction sub-pixel convolution layer and a second reconstruction convolution layer, and the enhanced feature map passes through the first reconstruction convolution layer, the reconstruction sub-pixel convolution layer and the second reconstruction convolution layer in sequence to obtain the target image. The image reconstruction module makes full use of the previously extracted image features; by changing the number of channels output by the first reconstruction convolution layer and the length and width of the feature map output by the reconstruction sub-pixel convolution layer, the magnification of the model can be adjusted conveniently.
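A minimal sketch of this module, assuming 64-channel features; PyTorch's PixelShuffle serves as the sub-pixel convolution, and the ReLU after the final convolution described in Example 1 is omitted here for simplicity.

import torch.nn as nn

def make_reconstructor(c=64, M=2):
    return nn.Sequential(
        nn.Conv2d(c, c * M * M, 3, padding=1), nn.ReLU(),  # first reconstruction conv
        nn.PixelShuffle(M), nn.ReLU(),                     # reconstruction sub-pixel conv
        nn.Conv2d(c, 3, 3, padding=1))                     # second reconstruction conv

Changing the expansion factor c·M² together with the PixelShuffle factor M adjusts the magnification, as the paragraph above describes.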
Furthermore, the image super-resolution reconstruction network is also provided with a branch module comprising a branch convolution layer and a branch sub-pixel convolution layer. After the original feature map passes through the branch convolution layer and the branch sub-pixel convolution layer in sequence, the resulting feature map has the same length, width and channel count as the feature map output by the reconstruction sub-pixel convolution layer; the output of the branch sub-pixel convolution layer is then fused with the feature map output by the reconstruction sub-pixel convolution layer (matrix addition).
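A sketch of the branch path under the same assumptions; a single branch convolution layer is shown, as in this paragraph (Example 2 below uses two).

import torch.nn as nn

class Branch(nn.Module):
    def __init__(self, c=64, M=2):
        super().__init__()
        self.conv = nn.Conv2d(c, c * M * M, 3, padding=1)  # branch convolution layer
        self.up = nn.PixelShuffle(M)                       # branch sub-pixel convolution

    def forward(self, original_map, reconstructed_map):
        # both inputs now share length, width and channel count; fuse by addition
        return self.up(self.conv(original_map)) + reconstructed_map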
The invention provides an image super-resolution reconstruction device, which comprises a first acquisition unit and a second acquisition unit, wherein the first acquisition unit is used for acquiring an original image needing super-resolution reconstruction, the second acquisition unit is used for acquiring a trained image super-resolution reconstruction network, and the image super-resolution reconstruction network comprises an image reconstruction module, a global channel attention module, a first feature extraction module and a plurality of second feature extraction modules;
after the original image sequentially passes through the first feature extraction module and the plurality of second feature extraction modules, splicing feature maps output by the second feature extraction modules to obtain a global feature map; inputting the global feature map into the global channel attention module to generate a global channel attention map, and then fusing the global channel attention map and the global feature map to obtain an enhanced feature map;
the image reconstruction module is used for performing super-resolution reconstruction on the enhanced feature map to obtain a target image, and the resolution of the target image is greater than that of the original image.
The present invention also provides a storage medium having stored thereon a computer program which is executed by a processor to implement the above-described image super-resolution reconstruction method.
The invention also provides an electronic device comprising a processor and a memory, wherein the memory stores a computer program, and the processor is used for executing the image super-resolution reconstruction method by loading the computer program.
The invention has the beneficial effects that:
(1) The features of the original image are extracted with the first feature extraction module and the second feature extraction modules, and the feature maps output by the second feature extraction modules are concatenated, so picture features at different scales are obtained. The global channel attention module assigns different weights along the channel direction of the global feature map, selectively extracting the information that is needed and suppressing irrelevant information. After models with different magnification factors are trained, the global channel attention module focuses on different positions of the global feature map, so the model is more targeted and the super-resolution reconstruction effect is better;
(2) In the global channel attention module, the output of the global max-average pooling layer is the average of the n largest values in each feature map. Compared with a conventional global max pooling layer, it extracts more content information from the feature maps, which improves the super-resolution reconstruction effect, reduces the bias the training data set imposes on the model, and improves the generalization ability of the model. Experiments show that when the picture input to the model contains interference information (such as damage, artifacts or noise), the global channel attention module performs targeted selection because the global feature map contains picture features at different scales; through this cooperation between the global channel attention module and the global feature map, the method has good recovery capability for lossy images acquired in actual scenes;
(3) In the second feature extraction module, the ordinary 3 × 3 and 5 × 5 convolution kernels extract picture features under different receptive fields, and two dilated convolutions are then used in sequence to fully extract picture features over a larger receptive field. Meanwhile, a 3 × 3 deformable convolution extracts the key high-frequency information in the picture in a targeted way, and this information is fed back into the feature extraction structure at several positions. The deformable convolution cooperates with the dilated convolutions so that high-frequency information is reused at different scales, low-frequency redundancy is reduced, and the adverse effect of interference information on image reconstruction is lessened;
(4) The original feature map, after passing through the branch module, is fused directly with the output of the reconstruction sub-pixel convolution layer, which compensates for the details lost while the feature map passes through the second feature extraction modules and the global channel attention module. Test results show that upsampling through the reconstruction sub-pixel convolution layer and the branch sub-pixel convolution layer separately and then fusing the results gives better image denoising, compression-artifact removal, non-blind image deblurring and image restoration than other upsampling methods, and in particular greatly reduces the checkerboard effect in the generated images;
(5) The model provided by the invention has high feature extraction efficiency, retains a good image recovery effect despite its relatively shallow depth and small parameter count, and is convenient to train and deploy.
Drawings
FIG. 1 is a schematic diagram of a network structure for super-resolution image reconstruction according to an embodiment;
FIG. 2 is a schematic diagram of a super-resolution image reconstruction network according to another embodiment;
FIG. 3 is a schematic structural diagram of a second feature extraction module in the image super-resolution reconstruction network shown in FIG. 1;
FIG. 4 is a schematic structural diagram of a global channel attention module in the image super-resolution reconstruction network shown in FIG. 1;
FIG. 5 is a comparison of the image reconstruction effects of the image super-resolution reconstruction network shown in FIG. 1 and the EDSR model;
FIG. 6 is a comparison of the image reconstruction effects of the image super-resolution reconstruction network shown in FIG. 1 and the image super-resolution reconstruction network shown in FIG. 2;
in the drawings:
1-original image, 2-target image, 3-first feature extraction module, 4-second feature extraction module, 41-residual connection, 5-global channel attention module, 51-global average pooling layer, 52-global maximum average pooling layer, 53-multilayer perceptron, 54-first attention feature map, 55-second attention feature map, 56-global channel attention map, 6-dimension reduction module, 7-image reconstruction module, 71-first reconstruction convolution layer, 72-reconstruction sub-pixel convolution layer, 73-second reconstruction convolution layer, 8-branch module, 81-branch convolution layer, 82-branch sub-pixel convolution layer.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
example 1:
the model is built according to the image super-resolution reconstruction network structure shown in fig. 1, wherein the first feature extraction module 3 comprises a convolution layer of 3 × 3 and a ReLU layer activation function which are sequentially connected, the structure of the second feature extraction module 4 is shown in fig. 3, the number of the second feature extraction modules 4 is 6, and the convolution kernel sizes of the convolution layer of the first cavity convolution and the convolution kernel size of the convolution kernel of the second cavity convolution are both 3 × 3. As shown in fig. 4, the global channel attention module 5 includes a global average pooling layer 51, a global maximum average pooling layer 52, a multilayer perceptron 53 and a sigmoid activation function, in this embodiment, a hyperparameter λ is 1, the number of layers of the multilayer perceptron 53 is 3, the sigmoid activation functions are all adopted in the multilayer perceptron 53, and the number of neurons in a hidden layer in the middle of the multilayer perceptron 53 is half of the number of channels of the global feature map. In the process that the global feature map passes through the multilayer perceptron 53 twice to generate the first attention feature map 54 and the second attention feature map 55 respectively, the parameters of the multilayer perceptron 53 are kept unchanged, and parameter sharing is achieved. The first attention feature map 54 and the second attention feature map 55 are fused through a matrix addition operation, and then the global channel attention map 56 is generated through a sigmoid activation function, wherein the number of channels of the first attention feature map 54, the second attention feature map 55 and the global channel attention map 56 is the same as that of the channels of the global feature map. The local dimension reduction layer 41 and the dimension reduction module 6 are both realized by convolution layers with convolution kernels of 1 × 1, and a ReLU activation function is arranged immediately behind each of the dimension reduction module 6, the first reconstruction convolution layer 71, the reconstruction sub-pixel convolution layer 72 and the second reconstruction convolution layer 73.
The code is based on Python 3.6 and the PyTorch 0.4.1 framework, using the L1 loss function, the Adam optimizer, a batch size of 16, a fixed learning rate of 0.0001, and 1000 epochs. The CPU used for model training is an Intel E5-2680 v3 with 128 GB of memory, and the graphics card is an NVIDIA 2080 Ti with 11 GB of video memory.
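A sketch of this training configuration; SRNetSketch is the illustrative network sketched earlier, and loader stands for an assumed DataLoader yielding batches of 16 low/high-resolution pairs.

import torch
import torch.nn as nn

model = SRNetSketch(scale=2)                                # illustrative network above
criterion = nn.L1Loss()                                     # L1 loss function
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # fixed learning rate

for epoch in range(1000):                                   # 1000 epochs
    for lr_img, hr_img in loader:                           # assumed DataLoader, batch 16
        optimizer.zero_grad()
        loss = criterion(model(lr_img), hr_img)
        loss.backward()
        optimizer.step()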
Model training uses the 800 pictures of the DIV2K training set, expanded by mirroring and rotation; the test sets are Set5, Set14 and BSDS100, and the low-resolution images corresponding to the original high-resolution pictures are obtained by downsampling in Python. Pictures are in RGB format during training and in YCbCr format during testing, and the test results are based on the luminance channel of the YCbCr format.
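A hedged sketch of this data preparation; the patent does not name the resampling kernel, so bicubic is an assumption here.

from PIL import Image

def make_pair(path, scale):
    hr = Image.open(path)
    lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
    return lr, hr

def augment(img):
    # training-set expansion by mirroring and rotation
    return [img, img.transpose(Image.FLIP_LEFT_RIGHT),
            img.rotate(90, expand=True), img.rotate(180)]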
During model training, patches of size 48 × 48 are randomly cropped from the low-resolution pictures as the original image 1 and input into the image super-resolution reconstruction network. By setting appropriate padding values, the length and width of the image remain unchanged before and after every convolution layer in the first feature extraction module 3 and the second feature extraction modules 4, the dimension reduction module 6, the reconstruction convolution layers and the branch convolution layer 81. The image input to the network has 3 channels; after the first feature extraction module 3, the original feature map has 64 channels; the feature maps input to and output from each second feature extraction module 4 have 64 channels; and the dimension reduction module 6 reduces the enhanced feature map to 64 output channels. The number of channels output by the first reconstruction convolution layer 71 is determined by the magnification factor: if the picture magnification factor is M, the first reconstruction convolution layer 71 outputs 64M² channels. After the reconstruction sub-pixel convolution layer 72, the picture has 64 channels and its length and width become M times the original; after the second reconstruction convolution layer 73, the number of channels becomes 3, and the target image 2 is obtained.
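The sub-pixel bookkeeping can be checked directly; for magnification factor M = 4:

import torch
import torch.nn as nn

t = torch.randn(16, 64 * 4 ** 2, 48, 48)   # output of the first reconstruction conv
print(nn.PixelShuffle(4)(t).shape)          # torch.Size([16, 64, 192, 192])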
The widely used PSNR (peak signal-to-noise ratio) and SSIM (structural similarity) are adopted as metrics for comparison with the EDSR model; the test results are shown in the following table, where in each cell the first number is PSNR and the second is SSIM. The experimental data of the EDSR model are taken from the paper published by its authors, "Enhanced Deep Residual Networks for Single Image Super-Resolution".
Model      Magnification  Set5          Set14         BSDS100
EDSR       2              38.11/0.9601  33.92/0.9195  32.32/0.9013
Example 1  2              38.69/0.9616  34.05/0.9211  32.53/0.9078
EDSR       3              34.65/0.9282  30.52/0.8462  29.25/0.8093
Example 1  3              34.82/0.9335  30.83/0.8476  29.80/0.8135
EDSR       4              32.46/0.8968  28.80/0.7876  27.71/0.7420
Example 1  4              32.96/0.8988  28.92/0.7902  28.13/0.7465
The parameter counts of the model in Example 1 and of the EDSR model are compared in the following table:

Model            EDSR  Example 1
Parameter count  43M   5.9M
As the data show, the parameter count of the image super-resolution reconstruction network adopted in Example 1 is far below that of the EDSR model, while the image reconstruction effect of Example 1 at each magnification factor is slightly better than that of EDSR; the method therefore has the advantages of high feature extraction efficiency, little redundant information, and convenient training and deployment.
A picture in Set5 is selected and reconstructed at 4× magnification with the trained EDSR model and with the image super-resolution reconstruction network of Example 1; the visual effect is shown in FIG. 5, where the left picture is the EDSR reconstruction, the right picture is the Example 1 reconstruction, and the lower right of each picture is a partial enlargement of the boxed region. As the figure shows, the image restored in Example 1 is visually clearer than the picture output by the EDSR model.
Example 2:
On the basis of the image super-resolution reconstruction network of Example 1, a branch module 8 is added; its structure is shown in fig. 2. Two branch convolution layers 81 are provided with 3 × 3 kernels, and a ReLU activation function immediately follows each branch convolution layer 81 and the branch sub-pixel convolution layer 82. The training set, test sets, framework, optimizer, learning rate, CPU, GPU and other hardware and software are the same as in Example 1. As in the image reconstruction module 7, the first branch convolution layer 81 has the same number of input and output channels, and the second branch convolution layer 81 outputs 64M² channels.
The results of the experiments are compared as shown in the following table:
model (model) Magnification factor set5 set14 BSDS100
EDSR 2 38.11/0.9601 33.92/0.9195 32.32/0.9013
Example 1 2 38.69/0.9616 34.05/0.9211 32.53/0.9078
Example 2 2 39.32/0.9688 34.62/0.9371 32.85/0.9104
EDSR 3 34.65/0.9282 30.52/0.8462 29.25/0.8093
Example 1 3 34.82/0.9335 30.83/0.8476 29.80/0.8135
Example 2 3 35.14/0.9397 31.13/0.8502 29.87/0.8161
EDSR 4 32.46/0.8968 28.80/0.7876 27.71/0.7420
Example 1 4 32.96/0.8988 28.92/0.7902 28.13/0.7465
Example 2 4 33.22/0.9013 28.97/0.7917 28.34/0.7478
As the above table shows, the image super-resolution reconstruction effect is further improved after the branch module 8 is added. Likewise, the image super-resolution reconstruction network of Example 2 is used for 4× reconstruction and compared with the network of Example 1; the result is shown in FIG. 6, where the left picture is the Example 1 reconstruction and the right picture is the Example 2 reconstruction. As FIG. 6 shows, adding the branch module 8 reduces the checkerboard effect in the reconstructed picture.
Example 3:
500 pictures randomly drawn from the DIV2K training set serve as the training set and the remaining 300 pictures serve as the test set; zero-mean Gaussian noise is added to the pictures to test the denoising performance of the model. The model structure is shown in fig. 2, and the related training details are the same as above.
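A sketch of this degradation; the standard deviation values appear only in the image-rendered results table below, so sigma is left as a free parameter here.

import torch

def add_gaussian_noise(img, sigma):
    # zero-mean Gaussian noise, image tensor assumed scaled to [0, 1]
    return (img + torch.randn_like(img) * sigma).clamp(0.0, 1.0)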
The experimental results are shown in the following table. (Table of denoising test results; the original publication renders it only as an image.)
As the data in the table show, the network structure shown in fig. 2 has a good denoising effect, so the method has good recovery capability for lossy images obtained in actual scenes.
The above embodiments only express specific implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these all fall within the scope of protection of the present invention.

Claims (10)

1. An image super-resolution reconstruction method, characterized by comprising the following steps:
s1, acquiring an original image needing super-resolution reconstruction, and acquiring a trained image super-resolution reconstruction network, wherein the image super-resolution reconstruction network comprises an image reconstruction module, a global channel attention module, a first feature extraction module and a plurality of second feature extraction modules;
s2, extracting the features of the original image by using the first feature extraction module to obtain an original feature map;
s3, the original feature map sequentially passes through the second feature extraction modules, feature maps output by the second feature extraction modules are respectively extracted and then are spliced, and a global feature map is obtained;
s4, inputting the global feature map into the global channel attention module to generate a global channel attention map, and then fusing the global channel attention map and the global feature map to obtain an enhanced feature map;
s5, performing super-resolution reconstruction on the enhanced feature map by using the image reconstruction module to obtain a target image, wherein the resolution of the target image is greater than that of the original image.
2. The image super-resolution reconstruction method according to claim 1, wherein: the second feature extraction module comprises a ReLU activation function, a 3 × 3 convolution layer, a 5 × 5 convolution layer, a 3 × 3 deformable convolution layer, a first dilated convolution layer with a dilation rate of 2, a second dilated convolution layer with a dilation rate of 3, and a local dimension reduction layer;
the feature map input into the second feature extraction module passes through the 3 × 3 convolution layer and the ReLU activation function to obtain a first feature map, through the 5 × 5 convolution layer and the ReLU activation function to obtain a second feature map, and through the 3 × 3 deformable convolution layer and the ReLU activation function to obtain a third feature map;
the first feature map, the second feature map and the third feature map are concatenated and passed through the first dilated convolution and the ReLU activation function to obtain a fourth feature map; the fourth feature map and the third feature map are concatenated and passed through the second dilated convolution and the ReLU activation function to obtain a fifth feature map; and the fifth feature map and the third feature map are concatenated and input into the local dimension reduction layer for dimension reduction, so that the number of feature map channels output by the local dimension reduction layer is the same as the number of feature map channels input into the second feature extraction module.
3. The image super-resolution reconstruction method according to claim 2, wherein: a residual connection is arranged between the upstream end and the downstream end of the second feature extraction module, and the feature map input into the second feature extraction module is fused with the feature map output by the local dimension reduction layer through the residual connection.
4. The image super-resolution reconstruction method according to claim 1, wherein: the global channel attention module comprises a global average pooling layer, a global max-average pooling layer, a multilayer perceptron and a sigmoid activation function; the global feature map passes through the global average pooling layer and the multilayer perceptron in sequence to obtain a first attention feature map, and through the global max-average pooling layer and the multilayer perceptron to obtain a second attention feature map; the first attention feature map and the second attention feature map are fused and then passed through a sigmoid activation function to generate the global channel attention map;
the global max-average pooling layer can be expressed as:

MaxAveP_c = (1/n) · Σ_{i=1}^{n} Large_i(F_c)

wherein MaxAveP_c denotes the output of the global max-average pooling layer, F_c denotes the c-th channel of the global feature map and is the input of the global max-average pooling layer, Large_i(F_c) denotes the i-th largest value of the global feature map on the c-th channel, and n can be expressed as:

n = round(λ · √(W · H))

wherein W denotes the width of the global feature map, H denotes the height of the global feature map, and λ is a hyper-parameter.
5. The image super-resolution reconstruction method according to claim 1, wherein: the image super-resolution reconstruction network further comprises a dimension reduction module, and the enhanced feature map is first input into the dimension reduction module to reduce its number of channels and then input into the image reconstruction module.
6. The image super-resolution reconstruction method according to claim 1, wherein: the image reconstruction module comprises a first reconstruction convolution layer, a reconstruction sub-pixel convolution layer and a second reconstruction convolution layer, and the enhanced feature map passes through the first reconstruction convolution layer, the reconstruction sub-pixel convolution layer and the second reconstruction convolution layer in sequence to obtain the target image.
7. The image super-resolution reconstruction method according to claim 6, wherein: the image super-resolution reconstruction network is further provided with a branch module comprising a branch convolution layer and a branch sub-pixel convolution layer, and the original feature map, after passing through the branch convolution layer and the branch sub-pixel convolution layer in sequence, is fused with the feature map output by the reconstruction sub-pixel convolution layer.
8. An image super-resolution reconstruction device, characterized by comprising a first acquisition unit and a second acquisition unit, wherein the first acquisition unit is used for acquiring an original image needing super-resolution reconstruction, the second acquisition unit is used for acquiring a trained image super-resolution reconstruction network, and the image super-resolution reconstruction network comprises an image reconstruction module, a global channel attention module, a first feature extraction module and a plurality of second feature extraction modules;
after the original image sequentially passes through the first feature extraction module and the plurality of second feature extraction modules, splicing feature maps output by the second feature extraction modules to obtain a global feature map; inputting the global feature map into the global channel attention module to generate a global channel attention map, and then fusing the global channel attention map and the global feature map to obtain an enhanced feature map;
the image reconstruction module is used for performing super-resolution reconstruction on the enhanced feature map to obtain a target image, and the resolution of the target image is greater than that of the original image.
9. A storage medium having a computer program stored thereon, characterized by: the computer program is executed by a processor to implement the image super-resolution reconstruction method of any one of claims 1 to 7.
10. An electronic device comprising a processor and a memory, the memory storing a computer program, characterized in that: the processor is configured to execute the image super-resolution reconstruction method according to any one of claims 1 to 7 by loading the computer program.
CN202111077959.1A 2021-05-19 2021-09-15 Image super-resolution reconstruction method and device, storage medium and electronic equipment Pending CN113781308A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021105482994 2021-05-19
CN202110548299 2021-05-19

Publications (1)

Publication Number Publication Date
CN113781308A true CN113781308A (en) 2021-12-10

Family

ID=78843825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111077959.1A Pending CN113781308A (en) 2021-05-19 2021-09-15 Image super-resolution reconstruction method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113781308A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023125550A1 (en) * 2021-12-30 2023-07-06 北京字跳网络技术有限公司 Video frame repair method and apparatus, and device, storage medium and program product
CN114742706A (en) * 2022-04-12 2022-07-12 重庆牛智智科技有限公司 Water pollution remote sensing image super-resolution reconstruction method for intelligent environmental protection
WO2023197784A1 (en) * 2022-04-12 2023-10-19 中兴通讯股份有限公司 Image processing method and apparatus, device, storage medium, and program product
CN114742706B (en) * 2022-04-12 2023-11-28 内蒙古至远创新科技有限公司 Water pollution remote sensing image super-resolution reconstruction method for intelligent environmental protection
CN115375001A (en) * 2022-07-11 2022-11-22 重庆旅游云信息科技有限公司 Tourist emotion assessment method and device for scenic spot

Similar Documents

Publication Publication Date Title
CN110033410B (en) Image reconstruction model training method, image super-resolution reconstruction method and device
CN107507134B (en) Super-resolution method based on convolutional neural network
CN113781308A (en) Image super-resolution reconstruction method and device, storage medium and electronic equipment
CN111105352B (en) Super-resolution image reconstruction method, system, computer equipment and storage medium
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
CN113436076B (en) Image super-resolution reconstruction method with characteristics gradually fused and electronic equipment
Sun et al. Lightweight image super-resolution via weighted multi-scale residual network
CN114429422A (en) Image super-resolution reconstruction method and system based on residual channel attention network
Sun et al. Hybrid pixel-unshuffled network for lightweight image super-resolution
CN102142137A (en) High-resolution dictionary based sparse representation image super-resolution reconstruction method
CN112270644A (en) Face super-resolution method based on spatial feature transformation and cross-scale feature integration
CN111932461A (en) Convolutional neural network-based self-learning image super-resolution reconstruction method and system
CN112801904B (en) Hybrid degraded image enhancement method based on convolutional neural network
CN116091313A (en) Image super-resolution network model and reconstruction method
Wang et al. Underwater image super-resolution and enhancement via progressive frequency-interleaved network
CN115953294A (en) Single-image super-resolution reconstruction method based on shallow channel separation and aggregation
Zhang et al. Med-SRNet: GAN-based medical image super-resolution via high-resolution representation learning
Liu et al. Facial image inpainting using multi-level generative network
CN106981046A (en) Single image super resolution ratio reconstruction method based on multi-gradient constrained regression
CN114359039A (en) Knowledge distillation-based image super-resolution method
CN113096015A (en) Image super-resolution reconstruction method based on progressive sensing and ultra-lightweight network
George Robust single image super resolution using neighbor embedding and fusion in wavelet domain
CN116485654A (en) Lightweight single-image super-resolution reconstruction method combining convolutional neural network and transducer
CN116188272A (en) Two-stage depth network image super-resolution reconstruction method suitable for multiple fuzzy cores
CN112070676B (en) Picture super-resolution reconstruction method of double-channel multi-perception convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination