CN113436076A - Image super-resolution reconstruction method with characteristics gradually fused and electronic equipment - Google Patents

Info

Publication number: CN113436076A (application CN202110842925.0A); other version: CN113436076B
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: feature, layer, extraction module, feature extraction, feature map
Legal status: Granted; Active
Inventor: 张世龙 (Zhang Shilong)
Original assignee: Pomelo Peel Chongqing Technology Co., Ltd.
Current assignee: Shenzhen Sailu Medical Technology Co., Ltd.


Classifications

    • G06T3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06N3/045: Combinations of networks
    • G06N3/048: Activation functions
    • G06N3/08: Learning methods
    • G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T2207/20081: Training; Learning
    • G06T2207/20221: Image fusion; Image merging


Abstract

The invention discloses an image super-resolution reconstruction method with progressively fused features, and an electronic device. The method comprises: acquiring an original image and a reconstruction network; extracting features of the original image with a first feature extraction module; passing the resulting original feature map sequentially through a plurality of second feature extraction modules and a plurality of third feature extraction modules; and performing super-resolution reconstruction on the intermediate feature map with an image reconstruction module to obtain a target image of higher resolution. Through multiple rounds of feature fusion, the method repeatedly screens out the features useful for super-resolution reconstruction, so that useful information makes up a larger proportion of what is finally input into the image reconstruction module. Feature extraction is therefore more effective, the loss of useful information and the redundancy of useless information are reduced, and the model occupies less computer memory during training and operation.

Description

Image super-resolution reconstruction method with characteristics gradually fused and electronic equipment
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to an image super-resolution reconstruction method with progressively fused features, and an electronic device.
Background
Single-image super-resolution is a classic task in the field of computer vision: an algorithm increases the resolution of a given image and reconstructs its details, thereby improving image quality.
Owing to their strong data-fitting capability, neural networks far surpass traditional algorithms in super-resolution reconstruction, so deep-learning-based super-resolution has become mainstream. However, after extracting features, existing super-resolution reconstruction networks fuse features from different depths all at once. The fused features contain a large amount of information that is useless for super-resolution reconstruction, the model extracts effective information inefficiently, and the quality of the reconstructed image is limited.
Disclosure of Invention
To address these defects of the prior art, the invention provides an image super-resolution reconstruction method with progressively fused features, and an electronic device, so as to improve the efficiency with which the model extracts useful information and thereby improve the image super-resolution reconstruction effect.
To this end, the solution adopted by the invention is as follows. An image super-resolution reconstruction method with progressively fused features comprises the following steps:
S1, acquiring an original image requiring super-resolution reconstruction and a trained image super-resolution reconstruction network, wherein the image super-resolution reconstruction network comprises a first feature extraction module, second feature extraction modules, third feature extraction modules and an image reconstruction module; the third feature extraction modules are equal in number to the second feature extraction modules, are arranged symmetrically with them, and correspond to them one to one;
S2, extracting features of the original image with the first feature extraction module to obtain an original feature map;
S3, passing the original feature map sequentially through the plurality of second feature extraction modules and the plurality of third feature extraction modules to generate an intermediate feature map, wherein after the feature map passes through each third feature extraction module, the feature map output by that third feature extraction module is fused with the feature map output by the corresponding second feature extraction module, and the fused feature map is input into the next third feature extraction module;
and S4, performing super-resolution reconstruction on the intermediate feature map by using the image reconstruction module to obtain a target image, wherein the resolution of the target image is greater than that of the original image.
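As a structural illustration only, steps S1 to S4 can be sketched in PyTorch as follows. The extraction, fusion and reconstruction blocks are reduced to placeholder convolutions, and all channel counts, module counts and the upscaling factor are assumptions rather than values fixed by the claims:

```python
import torch
import torch.nn as nn


class ProgressiveFusionSR(nn.Module):
    """Skeleton of the progressive-fusion pipeline (placeholder layers only)."""

    def __init__(self, channels=64, n_pairs=3, scale=2):
        super().__init__()
        # S2: first feature extraction module
        self.first = nn.Conv2d(3, channels, 3, padding=1)
        # S3: symmetrically arranged second/third extraction modules
        self.second = nn.ModuleList([nn.Conv2d(channels, channels, 3, padding=1)
                                     for _ in range(n_pairs)])
        self.third = nn.ModuleList([nn.Conv2d(channels, channels, 3, padding=1)
                                    for _ in range(n_pairs)])
        # one fusion step per third module (splice then reduce with 1x1 conv)
        self.fuse = nn.ModuleList([nn.Conv2d(2 * channels, channels, 1)
                                   for _ in range(n_pairs)])
        # S4: image reconstruction via sub-pixel (pixel-shuffle) upsampling
        self.recon = nn.Sequential(
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(3, 3, 3, padding=1),
        )

    def forward(self, x):
        f = self.first(x)
        skips = []
        for m in self.second:            # forward pass through second modules
            f = m(f)
            skips.append(f)
        # third modules fuse with the corresponding (symmetric) second module
        for m, fuse, skip in zip(self.third, self.fuse, reversed(skips)):
            f = fuse(torch.cat([m(f), skip], dim=1))
        return self.recon(f)
```

With a ×2 factor, a 16 × 16 input yields a 32 × 32 output, mirroring the requirement that the target image's resolution exceed the original's.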
Further, the second feature extraction module and the third feature extraction module have the same structure; each comprises a first 3 × 3 convolution layer, a second 3 × 3 convolution layer, a 5 × 5 convolution layer, first to third ReLU activation functions, a local dimensionality reduction layer and a channel attention module.
The feature map input into the second feature extraction module or the third feature extraction module passes through the first 3 × 3 convolution layer and the first ReLU activation function to obtain a first feature map; the first feature map passes through the second 3 × 3 convolution layer and the second ReLU activation function to obtain a second feature map, and through the 5 × 5 convolution layer and the third ReLU activation function to obtain a third feature map; the first, second and third feature maps are spliced to obtain a fourth feature map.
The fourth feature map is input into the channel attention module to generate a channel attention map; the channel attention map is fused with the fourth feature map, and the fused feature map is input into the local dimensionality reduction layer for dimensionality reduction, so that the number of channels output by the local dimensionality reduction layer equals the number of channels of the feature map input into the second or third feature extraction module. Here, 3 × 3 and 5 × 5 convolution layers refer to convolution layers with convolution kernel sizes of 3 × 3 and 5 × 5, respectively.
Furthermore, a first residual connection is arranged between the upstream and downstream ends of the second feature extraction module, through which the feature map input into the second feature extraction module is fused with the feature map output by the local dimensionality reduction layer; likewise, a second residual connection is arranged between the upstream and downstream ends of the third feature extraction module, through which the feature map input into the third feature extraction module is fused with the feature map output by the local dimensionality reduction layer.
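A minimal PyTorch sketch of this block, with the channel attention step omitted for brevity and the channel count assumed rather than specified by the claims:

```python
import torch
import torch.nn as nn


class FeatureExtractionBlock(nn.Module):
    """Sketch of the second/third feature extraction module: three conv
    branches, channel splicing, 1x1 reduction, and a residual connection."""

    def __init__(self, channels=64):
        super().__init__()
        self.conv3_1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv3_2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv5 = nn.Conv2d(channels, channels, 5, padding=2)
        self.relu = nn.ReLU(inplace=True)
        # local dimensionality reduction layer: 1x1 conv back to `channels`
        self.reduce = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, x):
        f1 = self.relu(self.conv3_1(x))        # first feature map
        f2 = self.relu(self.conv3_2(f1))       # second feature map
        f3 = self.relu(self.conv5(f1))         # third feature map
        f4 = torch.cat([f1, f2, f3], dim=1)    # fourth feature map (spliced)
        # (the channel attention map would be applied to f4 here)
        out = self.reduce(f4)
        return x + out                         # residual connection
```

The residual sum requires the 1 × 1 reduction to restore the input channel count, which is exactly the constraint stated for the local dimensionality reduction layer.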
Further, the channel attention module comprises a variance pooling layer, an AM pooling layer, a first fully-connected layer, a second fully-connected layer, a sigmoid activation function and a fourth ReLU activation function. The fourth feature map passes through the variance pooling layer to generate a first attention map and through the AM pooling layer to generate a second attention map; the first and second attention maps are fused and then passed sequentially through the first fully-connected layer, the fourth ReLU activation function, the second fully-connected layer and the sigmoid activation function to generate the channel attention map;
the variance pooling layer may be represented by the following formula:
Var_c = pool_var(F_c)
where Var_c denotes the output of the variance pooling layer; F_c denotes the c-th layer of the fourth feature map in the channel direction and serves as the input to the variance pooling layer; and pool_var(F_c) denotes the variance of the values on the c-th feature map;
the AM pooling layer may be expressed as follows:
Figure BDA0003179711820000031
wherein M iscRepresenting the output of said AM pooling layer, FcDenotes the c-th layer, F, in the channel direction of the fourth feature diagramcAs an input to said AM pooling layer, Largei(Fc) The ith largest value of the fourth feature map on the c-th layer is represented, Mean represents the average value of the c-th layer of the fourth feature map, and n can be represented as the following formula:
Figure BDA0003179711820000041
wherein W represents a width (pixel value) of the fourth feature map, H represents a height (pixel value) of the fourth feature map,<·>indicating rounding of the values therein. When c takes different values in sequence, FcThen sequentially take the place ofTable four profile channel direction layers.
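The two pooling operations can be illustrated in NumPy as follows. The AM pooling implementation is one plausible reading, assuming the layer averages the n largest activations of each channel and subtracts the channel mean (the original filing renders the equation as an image, so the exact form is an assumption):

```python
import numpy as np


def variance_pool(F):
    """Var_c = pool_var(F_c): per-channel variance of a (C, H, W) feature map."""
    return F.reshape(F.shape[0], -1).var(axis=1)


def am_pool(F, n):
    """Assumed AM pooling: for each channel, average the n largest
    activations and subtract the channel mean, emphasizing high-frequency
    statistics relative to the channel's baseline."""
    flat = F.reshape(F.shape[0], -1)
    top_n = np.sort(flat, axis=1)[:, -n:]   # n largest values per channel
    return top_n.mean(axis=1) - flat.mean(axis=1)
```

Both functions map a (C, H, W) feature map to a length-C vector, matching the per-channel attention weights the module produces.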
Further, the image reconstruction module includes a third 3 × 3 convolution layer, a first sub-pixel convolution layer, and a fourth 3 × 3 convolution layer, and the intermediate feature map sequentially passes through the third 3 × 3 convolution layer, the first sub-pixel convolution layer, and the fourth 3 × 3 convolution layer to generate the target image.
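The channel-to-space rearrangement performed by the sub-pixel convolution layer can be illustrated in NumPy (the 3 × 3 convolutions before and after it are omitted; this reproduces the standard pixel-shuffle ordering):

```python
import numpy as np


def pixel_shuffle(x, M):
    """Sub-pixel rearrangement: (C*M^2, H, W) -> (C, H*M, W*M).

    Each group of M*M channels is folded into an M x M spatial block,
    which is how the sub-pixel convolution layer enlarges the feature map."""
    CM2, H, W = x.shape
    C = CM2 // (M * M)
    x = x.reshape(C, M, M, H, W)
    x = x.transpose(0, 3, 1, 4, 2)   # reorder to (C, H, M, W, M)
    return x.reshape(C, H * M, W * M)
```

For example, four 1 × 1 channels holding 0, 1, 2, 3 become a single 2 × 2 map [[0, 1], [2, 3]], which is why the layer before it must output M² times the desired channel count.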
Furthermore, the image super-resolution reconstruction network is also provided with a jump-connection module, wherein the jump-connection module comprises a first jump-connection convolution layer, a second jump-connection convolution layer, a second sub-pixel convolution layer, a third jump-connection convolution layer, a fourth jump-connection convolution layer, a fifth ReLU activation function, a sixth ReLU activation function and a seventh ReLU activation function;
the original feature map passes through the first jump-connection convolution layer and the fifth ReLU activation function to generate a first jump-connection feature map, and the feature map output by the last second feature extraction module passes through the second jump-connection convolution layer and the sixth ReLU activation function to generate a second jump-connection feature map; after the first jump-connection feature map is spliced with the second jump-connection feature map, the result passes sequentially through the third jump-connection convolution layer, the second sub-pixel convolution layer, the fourth jump-connection convolution layer and the seventh ReLU activation function, and is finally fused with the feature map output by the first sub-pixel convolution layer.
Further, the number of channels of the first jump-connection feature map is equal to the number of channels of the second jump-connection feature map.
Furthermore, the image super-resolution reconstruction network is provided with a DE fusion module, and the feature map output by the third feature extraction module and the feature map output by the corresponding second feature extraction module are fused through the DE fusion module;
the DE fusion module has the mathematical model as follows:
F1=[L1,L2]
F2=δ1(f1(L1))
F3=δ2(f2(L2))
FD=δ3(f3(F1+F2+F3))
wherein, L1 represents the feature map output by the second feature extraction module, L2 represents the feature map output by the third feature extraction module, L1 and L2 are input to the DE fusion module, [ · ] represents that the feature map therein is subjected to a splicing operation in the channel direction, δ 1, δ 2 and δ 3 each represent a ReLU activation function, f1 and f2 each represent a convolution operation with a convolution kernel size of 3 × 3, f3 represents a convolution operation with a convolution kernel size of 1 × 1, and FD represents the feature map output by the DE fusion module.
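A PyTorch sketch of the four DE-fusion equations above; the input channel count c is an assumption. f1 and f2 raise each input to the width of the spliced map [L1, L2], and f3 is the 1 × 1 reduction back to c channels:

```python
import torch
import torch.nn as nn


class DEFusion(nn.Module):
    """Sketch of the DE fusion module's mathematical model."""

    def __init__(self, c=64):
        super().__init__()
        self.f1 = nn.Conv2d(c, 2 * c, 3, padding=1)  # dimension-raising for L1
        self.f2 = nn.Conv2d(c, 2 * c, 3, padding=1)  # dimension-raising for L2
        self.f3 = nn.Conv2d(2 * c, c, 1)             # 1x1 dimension reduction
        self.relu = nn.ReLU(inplace=True)

    def forward(self, L1, L2):
        F1 = torch.cat([L1, L2], dim=1)              # F1 = [L1, L2]
        F2 = self.relu(self.f1(L1))                  # F2 = δ1(f1(L1))
        F3 = self.relu(self.f2(L2))                  # F3 = δ2(f2(L2))
        return self.relu(self.f3(F1 + F2 + F3))      # FD = δ3(f3(F1+F2+F3))
```

The element-wise sum requires F2 and F3 to match F1's channel count, which is why the 3 × 3 convolutions double the channels, consistent with Embodiment 1's description of the module.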
The invention also provides an electronic device comprising a processor and a memory, wherein the memory stores a computer program, and the processor is used for executing the image super-resolution reconstruction method of the gradual fusion features by loading the computer program.
The invention has the beneficial effects that:
(1) the method comprises the steps of firstly extracting image characteristics of an input model by using a first characteristic extraction module and a second characteristic extraction module, then gradually fusing a characteristic diagram output by the second characteristic extraction module and a characteristic diagram output by a third characteristic extraction module through multiple splicing and dimension reduction, and gradually screening out characteristics useful for image super-resolution reconstruction for multiple times;
(2) the second feature extraction module and the third feature extraction module are symmetrically arranged in front and back, so that the feature map output by the second feature extraction module in front can be directly transmitted to the position behind the network, the detail information lost in the transmission process of the features in the network can be reduced, and the super-resolution reconstruction effect can be improved;
(3) in the prior art, feature fusion is realized by splicing different feature maps and reducing dimensionality, which limits how well effective features are extracted. Drawing on practical working experience, the inventor proposes a DE fusion module: besides splicing the two feature maps, a dimension-raising operation is applied separately to each of the two feature maps input into the DE fusion module, so that features are more fully extracted and dispersed; the results are then fused with the spliced feature map by element-wise summation, and dimensionality is reduced with a 1 × 1 convolution layer. Experiments show that the DE fusion module preserves the useful information of the pre-fusion feature maps to a greater extent, better reduces the redundancy of invalid information and improves the feature fusion effect;
(4) the second feature extraction module and the third feature extraction module adopt convolution kernels with different sizes to extract features, feature maps at different depths are spliced to acquire feature information as much as possible, then weight parameters with different sizes are given to different feature maps by utilizing a channel attention module, information useful for super-resolution reconstruction is selectively emphasized, useless information is suppressed, and feature extraction efficiency is improved;
(5) the traditional channel attention mechanism adopts average pooling and maximum pooling, and considering that compared with tasks such as image classification and target detection, the task level is lower in image super-resolution reconstruction, so that variance pooling and AM pooling are adopted in a channel attention module, high-frequency statistical information of a channel can be reflected better, and the filtering effect of the channel attention module on the information is more accurate;
(6) the channel attention module provided by the invention is very light in weight, simple in structure, capable of being conveniently inserted and integrated in other network architectures, and wide in application range;
(7) one part of the features in the jump-connection module comes from the original feature map, which contains much detail information but is mixed with much useless interference information; the other part comes from the feature map output by the last second feature extraction module, which contains much abstract useful information but may lack details. Exploiting the complementarity of these two parts, the jump-connection module selectively extracts useful key information and passes it directly to the final image reconstruction module, improving the image super-resolution reconstruction effect. Compared with skip structures such as dense connections, the jump-connection module has lower complexity and reduces hardware-resource requirements.
Drawings
FIG. 1 is a schematic diagram of a network structure for super-resolution image reconstruction according to an embodiment;
FIG. 2 is a schematic structural diagram of a second feature extraction module in the image super-resolution reconstruction network shown in FIG. 1;
FIG. 3 is a schematic structural diagram of a channel attention module in the image super-resolution reconstruction network shown in FIG. 1;
FIG. 4 is a schematic diagram of an internal structure of a DE fusion module in the image super-resolution reconstruction network shown in FIG. 1;
FIG. 5 is a comparison graph of the image super-resolution reconstruction network shown in FIG. 1 with the image reconstruction effects of EDSR and CSNLN;
FIG. 6 is a schematic diagram of a super-resolution image reconstruction network according to another embodiment;
in the drawings:
1-original image, 2-target image, 3-first feature extraction module, 4-second feature extraction module, 41-first 3 × 3 convolution layer, 42-second 3 × 3 convolution layer, 43-5 × 5 convolution layer, 44-local dimensionality reduction layer, 45-channel attention module, 451-variance pooling layer, 452-AM pooling layer, 453-first fully-connected layer, 454-second fully-connected layer, 455-sigmoid activation function, 456-fourth ReLU activation function, 46-first ReLU activation function, 47-second ReLU activation function, 48-third ReLU activation function, 5-third feature extraction module, 6-image reconstruction module, 61-third 3 × 3 convolution layer, 62-first sub-pixel convolution layer, 63-fourth 3 × 3 convolution layer, 7-first residual connection, 8-DE fusion module, 9-jump-connection module, 91-first jump-connection convolution layer, 92-second jump-connection convolution layer, 93-second sub-pixel convolution layer, 94-third jump-connection convolution layer, 95-fourth jump-connection convolution layer, 96-fifth ReLU activation function, 97-sixth ReLU activation function, 98-seventh ReLU activation function.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
example 1:
according to the super-resolution reconstruction network structure building model shown in FIG. 1, the code adopts 3.7 version of python and is assisted by a pytorch framework. In the aspect of hardware, CUP adopted by model training and testing is Inteli9 and memory 128G, and NVIDIA 2080ti and memory 11G are adopted by a video card.
In this embodiment, the first feature extraction module 3 is implemented as a convolution layer with a 3 × 3 kernel, the second feature extraction module 4 has the structure shown in FIG. 2, and the local dimensionality reduction layer 44 is a convolution layer with a 1 × 1 kernel. The second feature extraction modules 4 and the third feature extraction modules 5 correspond one to one, and there are 3 of each. The feature map input into the second feature extraction module 4 or the third feature extraction module 5 passes through the first 3 × 3 convolution layer 41 and the first ReLU activation function 46 to obtain a first feature map; the first feature map passes through the second 3 × 3 convolution layer 42 and the second ReLU activation function 47 to obtain a second feature map, and through the 5 × 5 convolution layer 43 and the third ReLU activation function 48 to obtain a third feature map; the first, second and third feature maps are spliced to obtain a fourth feature map.
The structure of the channel attention module 45 is shown in FIG. 3. The fourth feature map passes through the variance pooling layer 451 to generate a first attention map and through the AM pooling layer 452 to generate a second attention map; the first and second attention maps are fused by element-wise summation and then passed sequentially through the first fully-connected layer 453, the fourth ReLU activation function 456, the second fully-connected layer 454 and the sigmoid activation function 455 to generate the channel attention map. The channel attention map is fused with the fourth feature map by multiplication. In the channel attention module 45, the number of input elements of the first fully-connected layer 453 and the number of elements output by the second fully-connected layer 454 are both equal to the number of channels of the fourth feature map, and the number of elements output by the first fully-connected layer 453 is one twelfth of the number of channels of the fourth feature map.
To speed up convergence of the loss function during training, a first residual connection 7 is arranged between the upstream and downstream ends of the second feature extraction module 4, and the feature map input into the second feature extraction module 4 is fused with the feature map output by the local dimensionality reduction layer 44 by element-wise summation.
In this embodiment, the feature map output by the third feature extraction module 5 and the corresponding feature map output by the second feature extraction module 4 are fused by a DE fusion module 8, whose internal structure is shown in FIG. 4. The two feature maps are first spliced in the channel direction to generate a first fused feature map; in parallel, the feature map output by the third feature extraction module 5 and the feature map output by the second feature extraction module 4 each pass through a 3 × 3 convolution and a ReLU activation function to generate a second fused feature map and a third fused feature map, respectively, whose channel numbers equal that of the first fused feature map. The first, second and third fused feature maps are fused by element-wise summation, and the result is passed through a 1 × 1 convolution and a ReLU activation function before being output; the output feature map has the same length and width as the feature map output by the second feature extraction module 4, and its number of channels is half that of the first fused feature map.
In this embodiment, the training set is derived from all 1000 pictures of the DIV2K dataset; the low-resolution images are obtained by downsampling the original high-resolution images. The model is optimized with the L1 loss function and the Adam optimizer; the batch size is set to 8, the learning rate is initialized to 0.0001, the number of epochs is 1200, and the learning rate is halved after every 200 epochs. The test sets are Set5, BSDS100 and Urban100. Images are in RGB format during training and in YCbCr format during testing, and the test metrics are computed on the Y channel.
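The learning-rate schedule stated here (initial value 0.0001, halved after every 200 epochs) can be written as a simple step function:

```python
def learning_rate(epoch, base_lr=1e-4, drop_every=200):
    """Step-decay schedule from the embodiment: start at base_lr and
    halve the rate after every `drop_every` epochs."""
    return base_lr * 0.5 ** (epoch // drop_every)
```

Over the 1200-epoch run this gives six plateaus, ending at 1e-4 / 32 for the final 200 epochs; in PyTorch the same schedule could be expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)`.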
During model training, 64 × 64 patches are randomly cropped from the downsampled low-resolution pictures and used as the original image 1 input into the image super-resolution reconstruction network. The picture has 3 channels when input into the network; the original feature map obtained after the first feature extraction module 3 has 64 channels, and the feature maps input into and output from the second feature extraction modules 4 and the third feature extraction modules 5 all have 64 channels. The feature maps input into and output from the first 3 × 3 convolution layer 41, the second 3 × 3 convolution layer 42 and the 5 × 5 convolution layer 43 also have 64 channels. With a picture magnification factor of M, the third 3 × 3 convolution layer 61 outputs 64M² channels; after the first sub-pixel convolution layer 62 the number of channels becomes 64 and the length and width become M times the original; and after the fourth 3 × 3 convolution layer 63 the number of channels becomes 3, yielding the target image 2.
After model training is complete, image super-resolution reconstruction is performed on the test sets. PSNR (peak signal-to-noise ratio) and SSIM (structural similarity) are used to measure the quality of the reconstructed images; the table below compares the method with several current high-performance models. In each cell, the first value is PSNR and the second is SSIM.
Model       Magnification   Set5            BSDS100         Urban100
EDSR        ×2              38.11/0.9601    32.32/0.9013    33.10/0.9363
CSNLN       ×2              38.28/0.9616    32.40/0.9024    33.25/0.9386
Example 1   ×2              39.13/0.9644    32.76/0.9074    33.57/0.9395
EDSR        ×3              34.65/0.9282    29.25/0.8093    29.02/0.8685
CSNLN       ×3              34.74/0.9300    29.33/0.8105    29.13/0.8712
Example 1   ×3              34.79/0.9376    29.92/0.8153    30.19/0.8705
EDSR        ×4              32.46/0.8968    27.71/0.7420    26.86/0.8080
CSNLN       ×4              32.68/0.9004    27.80/0.7439    27.22/0.8168
Example 1   ×4              33.16/0.9013    28.05/0.7448    27.53/0.8195
The data show that the image super-resolution reconstruction network of Example 1 reconstructs images better at every magnification factor than the existing high-performance EDSR and CSNLN models, with advantages including efficient extraction of useful information and a good super-resolution reconstruction effect.
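For reference, the PSNR metric used in these tables can be computed on the Y channel as follows; this is the standard definition, not code from the patent:

```python
import numpy as np


def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape,
    here applied to the Y channel of YCbCr images."""
    ref = ref.astype(np.float64)
    test = test.astype(np.float64)
    mse = np.mean((ref - test) ** 2)       # mean squared error
    return 10.0 * np.log10(max_val ** 2 / mse)
```

A uniform error of 16 gray levels on an 8-bit image, for example, gives roughly 24.05 dB; the 38-39 dB ×2 scores in the table correspond to sub-level average errors.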
In addition, using the trained model, the low-resolution image corresponding to a randomly selected picture from Urban100 was reconstructed at 4× magnification; the visual results of the different models are shown in FIG. 5. In FIG. 5, the left image is the output of the EDSR model, the middle image is the output of the CSNLN model, and the right image is the output of the model of Example 1. Below each image is an enlargement of the boxed region indicated by the arrow; the image restored by Example 1 is visually clearer than the outputs of the other two models.
Example 2:
For comparison, this embodiment removes the channel attention module 45 and the AM pooling layer 452 separately from Example 1, with everything else and the experimental conditions identical to Example 1. The experimental results are compared in the following table:
model (model) Magnification factor set5 set14 BSDS100
Example 1 2 39.13/0.9644 32.76/0.9074 33.57/0.9395
Model A 2 38.26/0.9602 32.41/0.9031 33.35/0.9390
Model B 2 38.57/0.9609 32.62/0.9371 33.35/0.9395
Example 1 4 33.16/0.9013 28.05/0.7448 27.53/0.8195
Model A 4 32.73/0.9004 27.90/0.7441 27.22/0.8171
Model B 4 32.92/0.9010 27.90/0.7440 28.25/0.8188
In the above table, Model A is obtained by removing the channel attention module 45 alone from Example 1, and Model B is obtained by removing the AM pooling layer 452 alone while retaining the variance pooling layer 451. As the table shows, both the channel attention module 45 and the AM pooling layer 452 improve the image super-resolution reconstruction effect.
Example 3:
in this embodiment, a skip-connection module 9 is added to the image super-resolution reconstruction network of embodiment 1 for a comparison experiment; its structure is shown in fig. 6, and the rest of the network is exactly the same as in embodiment 1.
In this embodiment, the original feature map passes through the first skip-connection convolution layer 91 and the fifth ReLU activation function 96 to generate a first skip-connection feature map, while the feature map output by the last second feature extraction module 4 passes through the second skip-connection convolution layer 92 and the sixth ReLU activation function 97 to generate a second skip-connection feature map. After the first and second skip-connection feature maps are spliced, the result passes through the third skip-connection convolution layer 94, the second sub-pixel convolution layer 93, the fourth skip-connection convolution layer 95 and the seventh ReLU activation function 98 in sequence, and is finally fused with the feature map output by the first sub-pixel convolution layer 62 by element-wise summation.
The convolution kernel size of the first skip-connection convolution layer 91, the second skip-connection convolution layer 92, the third skip-connection convolution layer 94 and the fourth skip-connection convolution layer 95 is 3 × 3. Assuming the image magnification is M, the feature maps output by the first feature extraction module 3, the second feature extraction module 4 and the third feature extraction module 5 each have 64 channels, and the first and second skip-connection feature maps each have 32M² channels, so that after splicing the number of channels becomes 64M². The feature map size and channel count are unchanged through the third skip-connection convolution layer 94 and the fourth skip-connection convolution layer 95; after the second sub-pixel convolution layer 93, the number of channels becomes 64 and the length and width become M times the original.
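The channel bookkeeping above can be checked with a few lines of arithmetic. This is a sketch relying only on the pixel-shuffle rule that a sub-pixel layer with magnification M divides the channel count by M² and multiplies height and width by M:

```python
# Channel/size bookkeeping for the skip-connection branch described above.
# Pure arithmetic, no framework needed; M is the image magnification.
def skip_branch_shapes(M, H, W):
    c1 = 32 * M * M            # first skip-connection feature map channels
    c2 = 32 * M * M            # second skip-connection feature map channels
    c_cat = c1 + c2            # after channel-wise splicing: 64*M^2
    # the sub-pixel (pixel-shuffle) layer trades M^2 channels for an MxM
    # spatial upscale: channels divide by M^2, H and W multiply by M
    c_out = c_cat // (M * M)
    return c_cat, c_out, H * M, W * M

print(skip_branch_shapes(4, 48, 48))  # (1024, 64, 192, 192)
```

For any M the branch therefore emerges with 64 channels at the upscaled resolution, matching the 64-channel output of the first sub-pixel convolution layer 62 so the element-wise sum is well-defined.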
The network shown in fig. 6 was trained and tested under exactly the same experimental conditions as in example 1, with the following results:
| Model | Magnification factor | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | BSDS100 (PSNR/SSIM) |
|---|---|---|---|---|
| Example 1 | 2 | 39.13/0.9644 | 32.76/0.9074 | 33.57/0.9395 |
| Example 3 | 2 | 39.16/0.9672 | 32.81/0.9088 | 33.59/0.9413 |
| Example 1 | 4 | 33.16/0.9013 | 28.05/0.7448 | 27.53/0.8195 |
| Example 3 | 4 | 33.22/0.9054 | 28.21/0.7491 | 27.58/0.8224 |
As the data in the above table show, adding the skip-connection module 9 increases both the PSNR and SSIM values, with the SSIM gain being especially large, which indicates that the skip-connection module 9 markedly improves the detail in the reconstructed image.
The above-mentioned embodiments only express specific implementations of the present invention, and their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these all fall within the scope of the present invention.

Claims (9)

1. A super-resolution image reconstruction method with gradually fused features, characterized in that the method comprises the following steps:
s1, acquiring an original image needing super-resolution reconstruction, and acquiring a trained image super-resolution reconstruction network, wherein the image super-resolution reconstruction network comprises a first feature extraction module, a second feature extraction module, a third feature extraction module and an image reconstruction module, and the third feature extraction module and the second feature extraction module are equal in number and are symmetrically arranged;
s2, extracting the features of the original image by using the first feature extraction module to obtain an original feature map;
s3, sequentially passing the original feature map through the plurality of second feature extraction modules and the plurality of third feature extraction modules to generate an intermediate feature map, wherein after the feature map passes through each third feature extraction module, the feature map output by that third feature extraction module is fused with the feature map output by the corresponding second feature extraction module, and the fused feature map is input into the next third feature extraction module;
and S4, performing super-resolution reconstruction on the intermediate characteristic map by using the image reconstruction module to obtain a target image, wherein the resolution of the target image is greater than that of the original image.
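The steps s1–s4 can be sketched as a symmetric network. This is an illustrative skeleton only: each extraction module is reduced to a single 3 × 3 convolution, the fusion is assumed to be an element-wise sum, and the channel widths and RGB input are assumptions, not the patent's exact design:

```python
import torch
import torch.nn as nn

class ProgressiveFusionSR(nn.Module):
    """Skeleton of steps s1-s4 with single-conv stand-ins for the modules."""
    def __init__(self, n_pairs=4, channels=64, scale=2):
        super().__init__()
        self.first = nn.Conv2d(3, channels, 3, padding=1)       # s2: first module
        self.second = nn.ModuleList(                            # s3: second modules
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(n_pairs))
        self.third = nn.ModuleList(                             # s3: third modules
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(n_pairs))
        self.reconstruct = nn.Sequential(                       # s4: reconstruction
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(channels, 3, 3, padding=1))

    def forward(self, x):
        f = self.first(x)                      # original feature map
        skips = []
        for m in self.second:
            f = m(f)
            skips.append(f)                    # keep each second module's output
        # symmetric decoding: each third module's output is fused with the
        # matching second module's output (element-wise sum assumed here)
        for m, s in zip(self.third, reversed(skips)):
            f = m(f) + s
        return self.reconstruct(f)             # target image, scale x larger

out = ProgressiveFusionSR(n_pairs=2, channels=8, scale=2)(torch.zeros(1, 3, 16, 16))
print(out.shape)  # torch.Size([1, 3, 32, 32])
```

The symmetric pairing (`reversed(skips)`) mirrors the claim's requirement that the third modules are equal in number to, and arranged symmetrically with, the second modules.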
2. The super-resolution image reconstruction method based on gradual feature fusion according to claim 1, wherein: the second feature extraction module and the third feature extraction module have the same structure, each comprising ReLU activation functions, a first 3 × 3 convolution layer, a second 3 × 3 convolution layer, a 5 × 5 convolution layer, a local dimension reduction layer and a channel attention module;
the feature map input into the second feature extraction module or the third feature extraction module passes through the first 3 × 3 convolution layer and a first ReLU activation function to obtain a first feature map; the first feature map passes through the second 3 × 3 convolution layer and a second ReLU activation function to obtain a second feature map, and through the 5 × 5 convolution layer and a third ReLU activation function to obtain a third feature map; the first feature map, the second feature map and the third feature map are spliced to obtain a fourth feature map;
the fourth feature map is input into the channel attention module to generate a channel attention map; the channel attention map is fused with the fourth feature map, and the result is input into the local dimension reduction layer for dimension reduction, so that the number of feature map channels output by the local dimension reduction layer is the same as the number of feature map channels input into the second feature extraction module or the third feature extraction module.
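A minimal sketch of this extraction module is given below. Two hedged assumptions: element-wise multiplication is used as the fusion of the attention map with the fourth feature map, and the channel attention module is stubbed with a standard squeeze-and-excite block rather than the variance/AM-pooling attention of claim 4:

```python
import torch
import torch.nn as nn

class ExtractionModule(nn.Module):
    """Claim-2 sketch: parallel 3x3/5x5 branches, three-way splice,
    stand-in attention, then 1x1 local dimension reduction back to c."""
    def __init__(self, c=64):
        super().__init__()
        self.conv3_1 = nn.Conv2d(c, c, 3, padding=1)    # first 3x3 conv layer
        self.conv3_2 = nn.Conv2d(c, c, 3, padding=1)    # second 3x3 conv layer
        self.conv5 = nn.Conv2d(c, c, 5, padding=2)      # 5x5 conv layer
        self.relu = nn.ReLU(inplace=True)
        self.attn = nn.Sequential(                      # stand-in attention (SE)
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(3 * c, 3 * c, 1), nn.Sigmoid())
        self.reduce = nn.Conv2d(3 * c, c, 1)            # local dimension reduction

    def forward(self, x):
        f1 = self.relu(self.conv3_1(x))                 # first feature map
        f2 = self.relu(self.conv3_2(f1))                # second feature map
        f3 = self.relu(self.conv5(f1))                  # third feature map
        f4 = torch.cat([f1, f2, f3], dim=1)             # fourth feature map
        f4 = f4 * self.attn(f4)                         # fuse attention map
        return self.reduce(f4)                          # channels back to c
```

The three-way splice triples the channel count, which is why the 1 × 1 reduction maps 3c channels back to c, matching the module's input width as the claim requires.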
3. The super-resolution image reconstruction method based on gradual feature fusion according to claim 2, wherein: a first residual connection is arranged between the upstream end and the downstream end of the second feature extraction module, and the feature map input into the second feature extraction module is fused, through the first residual connection, with the feature map output by its local dimension reduction layer; a second residual connection is arranged between the upstream end and the downstream end of the third feature extraction module, and the feature map input into the third feature extraction module is fused, through the second residual connection, with the feature map output by its local dimension reduction layer.
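The residual connection of this claim can be sketched as a generic wrapper around any extraction module, assuming element-wise summation as the fusion operation:

```python
import torch
import torch.nn as nn

class WithResidual(nn.Module):
    """Claim-3 residual connection: module input fused with module output.

    Element-wise summation is assumed as the fusion operation."""
    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x):
        # input feature map + output of the module's local dim reduction layer
        return x + self.module(x)

res = WithResidual(nn.Identity())       # identity stand-in for the module
print(res(torch.ones(2, 3)))            # every element becomes 2.0
```

Because the local dimension reduction layer restores the module's input channel count (claim 2), the sum is always shape-compatible.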
4. The super-resolution image reconstruction method based on gradual feature fusion according to claim 2, wherein: the channel attention module comprises a variance pooling layer, an AM pooling layer, a first fully connected layer, a second fully connected layer, a sigmoid activation function and a fourth ReLU activation function;
the fourth feature map passes through the variance pooling layer to generate a first attention map and through the AM pooling layer to generate a second attention map; the first attention map and the second attention map are fused, and the result passes through the first fully connected layer, the fourth ReLU activation function, the second fully connected layer and the sigmoid activation function in sequence to generate the channel attention map;
the variance pooling layer can be represented by the following formula:

Var_c = pool_var(F_c)

where Var_c represents the output of the variance pooling layer, F_c represents the c-th layer of the fourth feature map along the channel direction and serves as the input to the variance pooling layer, and pool_var(F_c) represents the variance value computed over the c-th feature map;
the AM pooling layer may be expressed as follows:
Figure FDA0003179711810000031
where M_c represents the output of the AM pooling layer, F_c represents the c-th layer of the fourth feature map along the channel direction and serves as the input to the AM pooling layer, Large_i(F_c) represents the i-th largest value of the fourth feature map on the c-th layer, Mean represents the average value of the c-th layer of the fourth feature map, and n can be expressed by the following formula:
Figure FDA0003179711810000032
where W represents the width of the fourth feature map, H represents the height of the fourth feature map, and <·> represents rounding the value inside.
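The pooling layers and the attention head of this claim can be sketched as follows. Since the patent's two formula images are not reproduced in the text, two details are assumptions made only for illustration: the AM pooling output is taken as the average of the top-n mean and the channel Mean, and n defaults to round(sqrt(W·H)); the fusion of the two descriptors is assumed to be an element-wise sum.

```python
import numpy as np

rng = np.random.default_rng(0)

def variance_pool(f4):
    """Var_c = pool_var(F_c): one variance value per channel of a (C,H,W) map."""
    return f4.reshape(f4.shape[0], -1).var(axis=1)

def am_pool(f4, n=None):
    # Hedged sketch of the AM pooling layer: how Large_i(F_c) combines with
    # Mean, and the exact n, are ASSUMED (the patent's formulas are images).
    c, h, w = f4.shape
    if n is None:
        n = int(round(np.sqrt(h * w)))           # assumed reading of n(W, H)
    flat = f4.reshape(c, -1)
    top_n = np.sort(flat, axis=1)[:, -n:]        # Large_1(F_c) .. Large_n(F_c)
    return (top_n.mean(axis=1) + flat.mean(axis=1)) / 2.0

def channel_attention(f4, w1, w2):
    # fuse the two per-channel descriptors (sum assumed), then
    # FC -> fourth ReLU -> FC -> sigmoid, as the claim describes
    y = variance_pool(f4) + am_pool(f4)
    y = np.maximum(w1 @ y, 0.0)                  # fourth ReLU activation
    return 1.0 / (1.0 + np.exp(-(w2 @ y)))       # sigmoid -> channel attention

f4 = rng.standard_normal((16, 8, 8))             # a 16-channel fourth feature map
w1 = rng.standard_normal((4, 16))                # first fully connected layer
w2 = rng.standard_normal((16, 4))                # second fully connected layer
att = channel_attention(f4, w1, w2)              # shape (16,), values in (0, 1)
```

The 16→4→16 bottleneck in the fully connected layers is also an illustrative choice; the patent does not state a reduction ratio.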
5. The super-resolution image reconstruction method based on gradual feature fusion according to claim 1, wherein: the image reconstruction module comprises a third 3 x 3 convolution layer, a first sub-pixel convolution layer and a fourth 3 x 3 convolution layer, and the intermediate feature map sequentially passes through the third 3 x 3 convolution layer, the first sub-pixel convolution layer and the fourth 3 x 3 convolution layer to generate a target image.
6. The super-resolution image reconstruction method based on gradual feature fusion according to claim 5, wherein: the image super-resolution reconstruction network is further provided with a skip-connection module, and the skip-connection module comprises a first skip-connection convolution layer, a second skip-connection convolution layer, a second sub-pixel convolution layer, a third skip-connection convolution layer, a fourth skip-connection convolution layer, a fifth ReLU activation function, a sixth ReLU activation function and a seventh ReLU activation function;
the original feature map passes through the first skip-connection convolution layer and the fifth ReLU activation function to generate a first skip-connection feature map, and the feature map output by the last second feature extraction module passes through the second skip-connection convolution layer and the sixth ReLU activation function to generate a second skip-connection feature map; after the first skip-connection feature map and the second skip-connection feature map are spliced, the result passes through the third skip-connection convolution layer, the second sub-pixel convolution layer, the fourth skip-connection convolution layer and the seventh ReLU activation function in sequence, and is finally fused with the feature map output by the first sub-pixel convolution layer.
7. The super-resolution image reconstruction method based on gradual feature fusion according to claim 6, wherein: the number of channels of the first skip-connection feature map is equal to the number of channels of the second skip-connection feature map.
8. The super-resolution image reconstruction method based on gradual feature fusion according to claim 1, wherein: the image super-resolution reconstruction network is provided with a DE fusion module, and the feature map output by the third feature extraction module is fused, through the DE fusion module, with the feature map output by the corresponding second feature extraction module;
the mathematical model of the DE fusion module is as follows:
F1 = [L1, L2]
F2 = δ1(f1(L1))
F3 = δ2(f2(L2))
F_D = δ3(f3(F1 + F2 + F3))
where L1 represents the feature map output by the second feature extraction module, L2 represents the feature map output by the corresponding third feature extraction module, L1 and L2 are the inputs to the DE fusion module, [·] represents the splicing operation performed on the feature maps along the channel direction, δ1, δ2 and δ3 each represent a ReLU activation function, f1 and f2 each represent a convolution operation with a kernel size of 3 × 3, f3 represents a convolution operation with a kernel size of 1 × 1, and F_D represents the feature map output by the DE fusion module.
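These equations can be transcribed almost directly. One assumption is needed to make them shape-consistent: since F1 is a 2C-channel splice while f1(L1) and f2(L2) must be summed with it, f1 and f2 are given 2C output channels here; the patent does not state the channel widths.

```python
import torch
import torch.nn as nn

class DEFusion(nn.Module):
    """Sketch of the claim-8 DE fusion equations.

    f1/f2 output 2C channels (assumption) so the sum with the 2C-channel
    splice F1 = [L1, L2] is well-defined; f3 reduces back to C channels."""
    def __init__(self, c=64):
        super().__init__()
        self.f1 = nn.Conv2d(c, 2 * c, 3, padding=1)   # 3x3 conv on L1
        self.f2 = nn.Conv2d(c, 2 * c, 3, padding=1)   # 3x3 conv on L2
        self.f3 = nn.Conv2d(2 * c, c, 1)              # 1x1 conv
        self.relu = nn.ReLU(inplace=True)

    def forward(self, l1, l2):
        F1 = torch.cat([l1, l2], dim=1)               # F1 = [L1, L2]
        F2 = self.relu(self.f1(l1))                   # F2 = d1(f1(L1))
        F3 = self.relu(self.f2(l2))                   # F3 = d2(f2(L2))
        return self.relu(self.f3(F1 + F2 + F3))       # FD = d3(f3(F1+F2+F3))
```

The output F_D has the same shape as L1 and L2, so it can be fed directly into the next third feature extraction module as step s3 requires.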
9. An electronic device, comprising a processor and a memory storing a computer program, characterized in that: the processor is configured to load the computer program and thereby execute the super-resolution image reconstruction method with gradually fused features according to any one of claims 1 to 8.
CN202110842925.0A 2021-07-26 2021-07-26 Image super-resolution reconstruction method with characteristics gradually fused and electronic equipment Active CN113436076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110842925.0A CN113436076B (en) 2021-07-26 2021-07-26 Image super-resolution reconstruction method with characteristics gradually fused and electronic equipment


Publications (2)

Publication Number Publication Date
CN113436076A true CN113436076A (en) 2021-09-24
CN113436076B CN113436076B (en) 2022-01-28

Family

ID=77761734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110842925.0A Active CN113436076B (en) 2021-07-26 2021-07-26 Image super-resolution reconstruction method with characteristics gradually fused and electronic equipment

Country Status (1)

Country Link
CN (1) CN113436076B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822805A (en) * 2021-10-13 2021-12-21 柚皮(重庆)科技有限公司 Image super-resolution reconstruction method and Chinese medicinal plant leaf disease diagnosis method and equipment
CN113888412A (en) * 2021-11-23 2022-01-04 钟家兴 Image super-resolution reconstruction method for diabetic retinopathy classification
CN114037624A (en) * 2021-10-27 2022-02-11 成都大学附属医院 Image enhancement method and device for diabetic kidney lesion classification
CN114283069A (en) * 2022-01-17 2022-04-05 柚皮(重庆)科技有限公司 Brain magnetic resonance image super-resolution reconstruction method
CN115578260A (en) * 2022-10-08 2023-01-06 苏州大学 Attention method and system for direction decoupling for image super-resolution
CN117853738A (en) * 2024-03-06 2024-04-09 贵州健易测科技有限公司 Image processing method and device for grading tea leaves

Citations (8)

Publication number Priority date Publication date Assignee Title
CN108376386A (en) * 2018-03-23 2018-08-07 深圳天琴医疗科技有限公司 A kind of construction method and device of the super-resolution model of image
CN110033410A (en) * 2019-03-28 2019-07-19 华中科技大学 Image reconstruction model training method, image super-resolution rebuilding method and device
CN111192200A (en) * 2020-01-02 2020-05-22 南京邮电大学 Image super-resolution reconstruction method based on fusion attention mechanism residual error network
CN111461983A (en) * 2020-03-31 2020-07-28 华中科技大学鄂州工业技术研究院 Image super-resolution reconstruction model and method based on different frequency information
CN111598778A (en) * 2020-05-13 2020-08-28 云南电网有限责任公司电力科学研究院 Insulator image super-resolution reconstruction method
CN111784623A (en) * 2020-09-07 2020-10-16 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
CN111951165A (en) * 2020-08-18 2020-11-17 Oppo广东移动通信有限公司 Image processing method, image processing device, computer equipment and computer readable storage medium
WO2021080158A1 (en) * 2019-10-25 2021-04-29 Samsung Electronics Co., Ltd. Image processing method, apparatus, electronic device and computer readable storage medium


Non-Patent Citations (2)

Title
DONG Meng et al.: "Video super-resolution reconstruction based on an attention residual convolutional network", Journal of Changchun University of Science and Technology (Natural Science Edition) *
LEI Pengcheng et al.: "Hierarchical feature fusion attention network for image super-resolution reconstruction", Journal of Image and Graphics *


Also Published As

Publication number Publication date
CN113436076B (en) 2022-01-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TA01 Transfer of patent application right

Effective date of registration: 20220114

Address after: 518000 3a-1101, building 3, hengtaiyu building, Tangwei community, Fenghuang street, Guangming District, Shenzhen, Guangdong

Applicant after: Shenzhen Sailu Medical Technology Co.,Ltd.

Address before: 400000 22-6, building 1, No.41, jialingyi village, Jiangbei District, Chongqing

Applicant before: Pomelo peel (Chongqing) Technology Co.,Ltd.