CN113256496B - Lightweight progressive feature fusion image super-resolution system and method - Google Patents


Info

Publication number
CN113256496B
CN113256496B (application CN202110650557.XA)
Authority
CN
China
Prior art keywords
layer
convolution
attention
output
cpam
Prior art date
Legal status
Active
Application number
CN202110650557.XA
Other languages
Chinese (zh)
Other versions
CN113256496A (en)
Inventor
张东阳 (Dongyang Zhang)
李长宇 (Changyu Li)
邵杰 (Jie Shao)
申恒涛 (Hengtao Shen)
Current Assignee
Sichuan Artificial Intelligence Research Institute Yibin
Original Assignee
Sichuan Artificial Intelligence Research Institute Yibin
Priority date
Filing date
Publication date
Application filed by Sichuan Artificial Intelligence Research Institute Yibin filed Critical Sichuan Artificial Intelligence Research Institute Yibin
Priority to CN202110650557.XA priority Critical patent/CN113256496B/en
Publication of CN113256496A publication Critical patent/CN113256496A/en
Application granted granted Critical
Publication of CN113256496B publication Critical patent/CN113256496B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4046 Scaling the whole image or part thereof using neural networks

Abstract

The invention discloses a lightweight progressive feature fusion image super-resolution system and method that reduce system parameters and computational complexity, run efficiently, and cut operation time and computing resource consumption while achieving image super-resolution. The invention constructs a progressive attention module PAB that adopts a parallel structure, converts high-dimensional feature maps into low-dimensional features for processing, and combines multi-scale features with an attention mechanism, greatly reducing model parameters while providing better working performance, thereby balancing parameter count against performance. The invention also constructs a dual attention unit DAM comprising a spatial attention mechanism and a channel attention mechanism, which improves network reconstruction performance while keeping the structure lightweight. In addition, features are enhanced by a CPAM built into the up-sampling module, which greatly improves system performance at a small computational cost.

Description

Lightweight progressive feature fusion image super-resolution system and method
Technical Field
The invention belongs to the field of image processing, and particularly relates to a lightweight progressive feature fusion image super-resolution system and method.
Background
Image super-resolution is a classic task in computer vision that aims to recover high-resolution image details from a low-resolution image and enhance its visual quality. The technology has important applications in medical imaging, meteorological monitoring, remote-sensing satellite imaging, and other fields. However, image super-resolution is a typical ill-posed problem, because a low-resolution image stands in a one-to-many relationship with its corresponding high-resolution images. Due to this inherent property, image super-resolution reconstruction remains a challenging task.
At present, with the rapid development of deep learning, example-learning-based methods have become the mainstream of image super-resolution research; they attempt to establish a mapping between low-resolution and high-resolution image pairs. The SRCNN proposed by Dong et al. in "Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, 2016" is the pioneering work of this class of algorithms: it first magnifies the low-resolution image to the target size using bicubic interpolation, then fits a nonlinear mapping with a three-layer convolutional network, and finally outputs the high-resolution result. Subsequently, Kim et al. proposed VDSR in "Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016, pp. 1646–1654", which deepened the network with global residual learning and markedly improved reconstruction quality. Influenced by these works, subsequent research achieved good results by deepening and widening networks to design novel models, such as the EDSR of "Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. In CVPR Workshops, 2017, pp. 1132–1140", the RCAN of "Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super-resolution using very deep residual channel attention networks. In ECCV, 2018, pp. 294–310", and the EBRN of "Yajun Qiu, Ruxin Wang, Dapeng Tao, and Jun Cheng. Embedded block residual network: a recursive restoration model for single-image super-resolution. In ICCV, 2019".
However, most of the above methods neglect computational cost and suffer from large parameter counts, high memory occupation, and slow inference. For example, EDSR, the winner of the NTIRE 2017 super-resolution challenge, has 43 million parameters. RCAN, one of the better current methods, has more than 800 convolutional layers and over 15 million parameters, and requires a long training time. EBRN is likewise a huge model, with over 8 million parameters, and needs a long run time to restore a picture.
Some recent work has attempted to make models lightweight. Ahn et al. in "Namhyuk Ahn, Byungkon Kang, and Kyung-Ah Sohn. Fast, accurate, and lightweight super-resolution with cascading residual network. In ECCV, 2018, pp. 256–272" use shared convolution kernels and group convolution to reduce network parameters. Dong et al. in "Chao Dong, Chen Change Loy, and Xiaoou Tang. Accelerating the super-resolution convolutional neural network. In ECCV, 2016, pp. 391–407" propose constraining the nonlinear transformation of features to a low-dimensional space, using an upsampling layer only at the end of the network to raise resolution. Wang et al. in "Xuehui Wang, Qing Wang, Yuzhi Zhao, Junchi Yan, Lei Fan, and Long Chen. Lightweight single-image super-resolution network with attentive auxiliary feature learning. In ACCV, 2020, pp. 268–285" propose using auxiliary features from intermediate layers to lighten the model. Han et al. in "Wei Han, Shiyu Chang, Ding Liu, Mo Yu, Michael Witbrock, and Thomas S. Huang. Image super-resolution via dual-state recurrent networks. In CVPR, 2018, pp. 1654–1663" compress the model by treating it as a recurrent structure with shared parameters. Furthermore, Hui et al. in "Zheng Hui, Xinbo Gao, Yunchu Yang, and Xiumei Wang. Lightweight image super-resolution with information multi-distillation network. In ACM MM, 2019, pp. 2024–2032" distill and progressively refine features to keep the network small. Recently, Zhang et al. in "Huanrong Zhang, Zhi Jin, Xiaojun Tan, and Xiying Li. Towards lighter and faster: learning wavelets progressively for image super-resolution. In ACM MM, 2020, pp. 2113–2121" learn wavelet coefficients progressively for lightweight reconstruction. However, these approaches do not make full use of the intermediate features and often incur a significant performance loss. There therefore remains room for an image super-resolution reconstruction system and method that balances reconstruction performance with a lightweight model.
Disclosure of Invention
Aiming at the defects in the prior art, the lightweight progressive feature fusion image super-resolution system and method provided by the invention solve the problems of large calculation amount, long time consumption and high system complexity in the prior art.
In order to achieve the purpose of the invention, the invention adopts the following technical scheme: a lightweight progressive feature fusion image super-resolution system comprises an input layer for inputting a low-resolution image, connected to a first convolution layer that performs initial feature extraction on the low-resolution image; the first convolution layer is sequentially connected with a plurality of progressive attention modules PAB, which perform feature processing on the initial features to obtain secondary features; the output feature of the first convolution layer and the output feature of the last progressive attention module PAB are added, combining the initial features and the secondary features into one feature map, which serves as the input signal of an up-sampling module; the up-sampling module up-samples the combined feature map to obtain a high-resolution image and outputs it through the output layer.
Furthermore, the up-sampling module is used for improving the image resolution, and includes a first Nearest up-sampling layer, a second convolutional layer, a first convolutional pixel attention module CPAM, a third convolutional layer, a second Nearest up-sampling layer, a fourth convolutional layer, a second convolutional pixel attention module CPAM, and a fifth convolutional layer, which are connected in sequence.
Furthermore, each progressive attention module PAB has the same structure and includes 4 branches, namely a first branch, a second branch, a third branch and a fourth branch; the 4 branches respectively adopt a sixth convolution layer, a seventh convolution layer, an eighth convolution layer and a ninth convolution layer to convert the input signal of the progressive attention module PAB to 1/4 of the original number of channels, keeping the spatial size unchanged;
the sixth convolution layer of the first branch is connected to the Concat layer; the output signal of the seventh convolution layer of the second branch is processed by the third convolution pixel attention module CPAM and then output to the Concat layer; the output signal of the eighth convolution layer of the third branch is added with the processed signal of the third convolution pixel attention module CPAM, and the sum is processed by the fourth convolution pixel attention module CPAM and then output to the Concat layer; the output signal of the ninth convolution layer of the fourth branch is added with the output signal of the fourth convolution pixel attention module CPAM, and the sum is processed by the fifth convolution pixel attention module CPAM and then output to the Concat layer;
the Concat layer splices the signals of the 4 branches in the channel dimension to restore the original size, and the output signal of the Concat layer is processed by a dual attention unit DAM and a tenth convolution layer to serve as the output signal of the progressive attention module PAB; the dual attention unit DAM improves the reconstruction performance of the system and performs feature processing with a spatial attention mechanism and a channel attention mechanism.
Further, the first to fifth convolution pixel attention modules CPAM have the same structure; the input signal of the first convolution pixel attention module CPAM is transmitted to an eleventh convolution layer and a twelfth convolution layer respectively; the output feature of the eleventh convolution layer is added with the output feature of a first Sigmoid function layer, and the sum serves as the input signal of a thirteenth convolution layer; the twelfth convolution layer outputs its signal to the first Sigmoid function layer; and the output signal of the thirteenth convolution layer serves as the output signal of the first convolution pixel attention module CPAM.
Further, the dual attention unit DAM comprises a channel attention mechanism part and a spatial attention mechanism part which are connected in sequence;
the channel attention mechanism part takes the input signal of the dual attention unit DAM as input; the input signal is processed along two paths, one consisting of an average pooling layer, a fourteenth convolution layer and a fifteenth convolution layer connected in sequence, and the other of a maximum pooling layer, a sixteenth convolution layer and a seventeenth convolution layer connected in sequence; the signal obtained through the average pooling layer, the fourteenth convolution layer and the fifteenth convolution layer is added with the signal obtained through the maximum pooling layer, the sixteenth convolution layer and the seventeenth convolution layer; the sum serves as the input signal of a second Sigmoid function layer, and the output signal of the second Sigmoid function layer is multiplied with the input signal of the dual attention unit DAM as the output;
the spatial attention mechanism part takes the output signal of the channel attention mechanism part as input; the feature average value and the feature maximum value of the input signal are computed along the channel dimension and input to a feature splicing layer; the feature splicing layer is sequentially connected with an eighteenth convolution layer and a third Sigmoid function layer, and the output signal of the third Sigmoid function layer is multiplied with the output signal of the channel attention mechanism part to serve as the output signal of the dual attention unit DAM.
The invention has the beneficial effects that:
(1) The invention provides a lightweight progressive feature fusion image super-resolution system, which reduces system parameters and computational complexity, runs efficiently, and reduces operation time and computing resource consumption while achieving image super-resolution.
(2) The invention constructs a progressive attention module PAB that adopts a parallel structure, converts high-dimensional feature maps into low-dimensional features for processing, and combines multi-scale features with an attention mechanism, reducing model parameters while providing better working performance and balancing parameter count against performance.
(3) The invention constructs a dual attention unit DAM comprising a spatial attention mechanism and a channel attention mechanism, which improves network reconstruction performance while keeping the structure lightweight. Building a CPAM matched with two convolution layers enhances features and enables each up-sampling module to raise resolution by a factor of 2 or 3.
An image super-resolution method using a lightweight progressive feature fusion image super-resolution system, comprising the steps of:
s1, inputting the low-resolution image to a lightweight progressive feature fusion image super-resolution system, and extracting initial features of the low-resolution image through the first convolution layer;
s2, performing feature processing on the initial features by using a plurality of progressive attention modules PAB to obtain secondary features;
s3, adding the initial features and the secondary features to obtain a feature map;
and S4, up-sampling the feature map through an up-sampling module to obtain a high-resolution image.
The invention has the beneficial effects that: the invention can recover the high-resolution image from the low-resolution image and has the characteristics of high operation efficiency and low resource consumption.
Drawings
Fig. 1 is a schematic diagram of a lightweight progressive feature fusion image super-resolution system proposed by the present invention.
Fig. 2 is a flowchart of a lightweight progressive feature fusion image super-resolution method provided by the invention.
Fig. 3 shows a learning rate reduction method for different training modes.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art; however, it should be understood that the invention is not limited to the scope of the specific embodiments. To those skilled in the art, various changes are possible without departing from the spirit and scope of the invention as defined in the appended claims, and all matter produced using the inventive concept is protected.
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Example 1
As shown in fig. 1, a lightweight progressive feature fusion image super-resolution system includes an input layer for inputting a low-resolution image, connected to a first convolution layer that performs initial feature extraction on the low-resolution image; the first convolution layer is sequentially connected with a plurality of progressive attention modules PAB (Progressive Attention Block), which perform feature processing on the initial features to obtain secondary features; the output feature of the first convolution layer is added to the output feature of the last progressive attention module PAB, combining the initial and secondary features into one feature map, which serves as the input signal of an up-sampling module; the up-sampling module up-samples the combined feature map to obtain a high-resolution image and outputs it through the output layer.
The up-sampling module is used for improving the image resolution and comprises a first Nearest up-sampling layer, a second convolution layer, a first convolution pixel attention module CPAM (Convolutional Pixel Attention Module), a third convolution layer, a second Nearest up-sampling layer, a fourth convolution layer, a second convolution pixel attention module CPAM and a fifth convolution layer which are sequentially connected.
Each progressive attention module PAB has the same structure and includes 4 branches, namely a first branch, a second branch, a third branch and a fourth branch, wherein the 4 branches respectively adopt a sixth convolution layer, a seventh convolution layer, an eighth convolution layer and a ninth convolution layer to convert the number of channels of the input signal of the progressive attention module PAB to 1/4 of the original, keeping the spatial size unchanged;
the sixth convolution layer of the first branch is connected to the Concat layer; the output signal of the seventh convolution layer of the second branch is processed by the third convolution pixel attention module CPAM and then output to the Concat layer; the output signal of the eighth convolution layer of the third branch is added with the processed signal of the third convolution pixel attention module CPAM, and the sum is processed by the fourth convolution pixel attention module CPAM and then output to the Concat layer; the output signal of the ninth convolution layer of the fourth branch is added with the output signal of the fourth convolution pixel attention module CPAM, and the sum is processed by the fifth convolution pixel attention module CPAM and then output to the Concat layer.
In this embodiment, the Concat layer is used to splice the input feature maps along the channel dimension.
The Concat layer is configured to splice the signals of the 4 branches in the channel dimension to restore the original size, and the output signal of the Concat layer is processed by a dual attention unit DAM (Dual Attention Module) and a tenth convolution layer to serve as the output signal of the progressive attention module PAB; the dual attention unit DAM is used for improving the system reconstruction performance and performs feature processing with a spatial attention mechanism and a channel attention mechanism.
The first to fifth convolution pixel attention modules CPAM have the same structure; the input signal of the first convolution pixel attention module CPAM is transmitted to an eleventh convolution layer and a twelfth convolution layer respectively; the output feature of the eleventh convolution layer is added with the output feature of a first Sigmoid function layer, and the sum serves as the input signal of a thirteenth convolution layer; the twelfth convolution layer outputs its signal to the first Sigmoid function layer; and the output signal of the thirteenth convolution layer serves as the output signal of the first convolution pixel attention module CPAM.
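To make the CPAM structure concrete, the following PyTorch sketch mirrors the description above. It is a minimal illustration rather than the patented implementation: the kernel sizes of the eleventh to thirteenth convolution layers are not stated in the text, so 1×1 convolutions are assumed, while the element-wise addition of the Sigmoid attention map to the feature branch follows the description literally.

```python
# Minimal sketch of the convolution pixel attention module (CPAM);
# kernel sizes are assumptions, the wiring follows the description.
import torch
import torch.nn as nn

class CPAM(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv_feat = nn.Conv2d(channels, channels, 1)  # "eleventh" convolution layer
        self.conv_attn = nn.Conv2d(channels, channels, 1)  # "twelfth" convolution layer
        self.conv_out = nn.Conv2d(channels, channels, 1)   # "thirteenth" convolution layer
        self.sigmoid = nn.Sigmoid()                         # first Sigmoid function layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.sigmoid(self.conv_attn(x))          # pixel-wise attention map
        return self.conv_out(self.conv_feat(x) + attn)  # addition, per the text
```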
The dual attention unit DAM comprises a channel attention mechanism part and a spatial attention mechanism part which are connected in sequence;
the channel attention mechanism part takes the input signal of the dual attention unit DAM as input; the input signal is processed along two paths, one consisting of an average pooling layer, a fourteenth convolution layer and a fifteenth convolution layer connected in sequence, and the other of a maximum pooling layer, a sixteenth convolution layer and a seventeenth convolution layer connected in sequence; the signal obtained through the average pooling layer, the fourteenth convolution layer and the fifteenth convolution layer is added with the signal obtained through the maximum pooling layer, the sixteenth convolution layer and the seventeenth convolution layer; the sum serves as the input signal of a second Sigmoid function layer, and the output signal of the second Sigmoid function layer is multiplied with the input signal of the dual attention unit DAM as the output;
the spatial attention mechanism part takes the output signal of the channel attention mechanism part as input; the feature average value and the feature maximum value of the input signal are computed along the channel dimension and input to a feature splicing layer; the feature splicing layer is sequentially connected with an eighteenth convolution layer and a third Sigmoid function layer, and the output signal of the third Sigmoid function layer is multiplied with the output signal of the channel attention mechanism part to serve as the output signal of the dual attention unit DAM.
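The DAM likewise admits a compact sketch, shown below. Following the later remark that only one-dimensional and two-dimensional convolutions are used, the fourteenth to seventeenth layers are modeled here as 1-d convolutions over the pooled channel descriptor; the 1-d kernel size of 3 and the 7×7 kernel of the eighteenth convolution layer are illustrative assumptions not fixed by the text.

```python
# Minimal sketch of the dual attention unit (DAM): channel attention
# followed by spatial attention; kernel sizes are assumptions.
import torch
import torch.nn as nn

class DAM(nn.Module):
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        pad = k // 2
        # average-pooling path: "fourteenth" and "fifteenth" convolution layers
        self.avg_convs = nn.Sequential(nn.Conv1d(1, 1, k, padding=pad),
                                       nn.Conv1d(1, 1, k, padding=pad))
        # max-pooling path: "sixteenth" and "seventeenth" convolution layers
        self.max_convs = nn.Sequential(nn.Conv1d(1, 1, k, padding=pad),
                                       nn.Conv1d(1, 1, k, padding=pad))
        self.spatial_conv = nn.Conv2d(2, 1, 7, padding=3)  # "eighteenth" convolution layer
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # channel attention: global average and max pooling, two 1-d convs each
        avg = x.mean(dim=(2, 3)).view(b, 1, c)
        mx = x.amax(dim=(2, 3)).view(b, 1, c)
        ca = self.sigmoid(self.avg_convs(avg) + self.max_convs(mx))  # second Sigmoid
        x = x * ca.view(b, c, 1, 1)  # element-wise multiplication with the DAM input
        # spatial attention: mean and max along the channel dimension
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)  # feature splicing layer
        return x * self.sigmoid(self.spatial_conv(s))         # third Sigmoid
```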
In the present embodiment, multiplication means multiplication of elements corresponding to spatial positions.
The progressive attention module PAB proposed by the invention performs feature processing with a parallel structure. For a feature input, the feature is first converted to 1/4 of the original number of channels by four 1×1 convolutions, keeping the spatial size unchanged. The result of the first branch is mapped directly to the middle layer without processing. The result of the second branch is processed by the convolution pixel attention module CPAM and then mapped to the middle layer. The third and fourth branches are each added to the result of the previous branch, sent to CPAM for processing, and finally mapped to the middle layer. After processing, the 4 branches are spliced in the channel dimension to restore the original size. The DAM module provides an attention mechanism comprising channel attention and spatial attention applied sequentially. Unlike previous attention modules, which mostly employ a fully-connected multi-layer perceptron to realize the nonlinear transformation of features, the DAM module in the invention processes the attention feature maps in the channel and spatial dimensions using only one-dimensional and two-dimensional convolutions, respectively, saving a great deal of computation and model parameters. In the upsampling module, the resolution of the feature map is first raised by Nearest interpolation, and features are enhanced with two 3×3 convolutions and a CPAM. Each up-sampling module can raise resolution by a factor of 2 or 3, and two up-sampling modules can be used to achieve 4× up-sampling.
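The sketch below assembles the pieces just described, reusing the CPAM and DAM classes from the two sketches above. It is illustrative only: the class names (PAB, UpStage, LightSR), the 1×1 kernel of the tenth convolution layer, the 3×3 head and tail convolutions, and the number of PAB blocks are assumptions not fixed by the text, while the four-branch PAB wiring, the 40-channel width, and the Nearest-interpolation upsampling stages follow the description.

```python
# Sketch of the PAB, one upsampling stage, and the overall network;
# reuses CPAM and DAM from the sketches above. Block count, head/tail
# kernels, and the tenth-conv kernel size are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PAB(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        c4 = channels // 4
        # four parallel 1x1 convs ("sixth"-"ninth") reduce channels to 1/4
        self.reduce = nn.ModuleList(nn.Conv2d(channels, c4, 1) for _ in range(4))
        self.cpam2, self.cpam3, self.cpam4 = CPAM(c4), CPAM(c4), CPAM(c4)
        self.dam = DAM(channels)
        self.conv_out = nn.Conv2d(channels, channels, 1)  # "tenth" convolution layer

    def forward(self, x):
        b1 = self.reduce[0](x)                    # branch 1: unprocessed
        b2 = self.cpam2(self.reduce[1](x))        # branch 2
        b3 = self.cpam3(self.reduce[2](x) + b2)   # branch 3: adds previous result
        b4 = self.cpam4(self.reduce[3](x) + b3)   # branch 4: adds previous result
        out = torch.cat([b1, b2, b3, b4], dim=1)  # Concat layer restores the size
        return self.conv_out(self.dam(out))

class UpStage(nn.Module):
    """One upsampling stage: Nearest interpolation, conv, CPAM, conv."""
    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.cpam = CPAM(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=self.scale, mode="nearest")
        return self.conv2(self.cpam(self.conv1(x)))

class LightSR(nn.Module):
    def __init__(self, channels: int = 40, n_blocks: int = 8):  # n_blocks assumed
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)  # "first" convolution layer
        self.body = nn.Sequential(*(PAB(channels) for _ in range(n_blocks)))
        self.up = nn.Sequential(UpStage(channels, 2), UpStage(channels, 2))  # 4x
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)  # output mapping, assumed

    def forward(self, lr):
        feat = self.head(lr)           # initial features
        feat = feat + self.body(feat)  # add initial and secondary features
        return self.tail(self.up(feat))

# Example: a 48x48 low-resolution patch becomes a 192x192 output.
# y = LightSR()(torch.randn(1, 3, 48, 48))  # -> torch.Size([1, 3, 192, 192])
```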
Table 1 Comparative experiment results

PAB   CPAM   DAM   Parameters   Multiply-adds   PSNR (dB)
×     ×      ×     612,579      47.7G           25.90
√     ×      ×     548,899      44.2G           26.02
√     √      ×     569,059      45.1G           26.17
√     √      √     569,363      45.1G           26.26
As can be seen from Table 1, the model improves as the three modules PAB, CPAM and DAM are successively added to the network. Comparing the first two rows of results, where the model without PAB replaces it with a series of plain convolution blocks, adding PAB reduces the parameter count while improving the effect. CPAM is a pixel-based attention mechanism, and adding it to the model increases the model's effectiveness. The DAM is an extremely lightweight channel and spatial attention block that adds only a very small computational overhead while greatly improving the model.
Example 2
As shown in fig. 2, an image super-resolution method using a lightweight progressive feature fusion image super-resolution system includes the following steps:
s1, inputting the low-resolution image to a lightweight progressive feature fusion image super-resolution system, and extracting initial features of the low-resolution image through the first convolution layer;
s2, performing feature processing on the initial features by using a plurality of progressive attention modules PAB to obtain secondary features;
in this embodiment, the progressive attention module PAB adopts a parallel structure. For a feature input, the feature is first converted to 1/4 of the original number of channels by four 1×1 convolutions, keeping the spatial size unchanged. The result of the first branch is mapped directly to the middle layer without processing. The result of the second branch is processed by the CPAM module and then mapped to the middle layer. The third and fourth branches are each added to the result of the previous branch, sent to CPAM for processing, and finally mapped to the middle layer. After processing, the 4 branches are spliced in the channel dimension to restore the original size.
S3, adding the initial features and the secondary features to obtain a feature map;
and S4, upsampling the feature map through an upsampling module to obtain a high-resolution image.
In this embodiment, the upsampling module uses Nearest interpolation to increase the resolution of the feature map, then performs feature enhancement using two 3×3 convolutions and a CPAM. Each up-sampling module can raise resolution by a factor of 2 or 3, and two up-sampling modules can be used to achieve 4× up-sampling.
The invention optimizes network parameters with a stochastic gradient descent algorithm and adopts the L1 loss as the objective function. The DIV2K dataset, which contains 800 high-definition images, serves as the training set. Each batch feeds 32 pictures into the network. Training stops after 1000 epochs over the 800 images. During training, the learning rate is a very important variable, directly tied to the quality of the trained model. Inspired by restart mechanisms, the network is trained with a cosine-annealing (cos_lr) learning-rate decay. The conventional approach generally adopts step decay (step_lr); the different learning-rate schedules are shown in fig. 3. With step decay, the learning rate is halved every 200 training epochs, whereas cosine annealing reduces the learning rate along a cosine curve: as training progresses, the learning rate first decreases slowly, then faster, then slowly again, and then returns to the original maximum. Because the network in the invention is a lightweight model with relatively small capacity, the objective function may be multimodal, with multiple local optima besides the global optimum, and gradient descent may fall into a local minimum during training. Abruptly raising the learning rate at a restart can then jump out of the local minimum and find a path toward the global minimum. The cosine-annealing learning-rate schedule therefore lets the invention fully exploit the performance of the lightweight network.
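A minimal sketch of the two schedules contrasted above is given below; the base learning rate and the restart period are illustrative assumptions, since the text only states that step decay halves the rate every 200 epochs. PyTorch's built-in torch.optim.lr_scheduler.CosineAnnealingWarmRestarts implements the same restart idea.

```python
# Sketch comparing step decay with cosine annealing plus warm restarts;
# base_lr and the 200-epoch period are illustrative assumptions.
import math

def step_lr(epoch: int, base_lr: float = 2e-4, period: int = 200) -> float:
    """Step decay: halve the learning rate every `period` epochs."""
    return base_lr * 0.5 ** (epoch // period)

def cosine_restart_lr(epoch: int, base_lr: float = 2e-4, period: int = 200) -> float:
    """Cosine decay within each period, then restart at the maximum,
    letting the optimizer jump out of local minima of a multimodal loss."""
    t = epoch % period
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t / period))

if __name__ == "__main__":
    for e in (0, 100, 199, 200, 500):
        print(f"epoch {e}: step={step_lr(e):.2e}  cosine={cosine_restart_lr(e):.2e}")
```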
The advantages of the structure of the invention can be described module by module. First, the PAB module adopts a parallel structure and converts high-dimensional feature maps into low-dimensional features for processing. Second, the up-sampling part abandons high-complexity deconvolution and sub-pixel convolution operations; the Nearest interpolation method introduces no parameters, further shrinking the model. In addition, unlike previous network structures, the invention uses only 40 channels to reduce the network width, which greatly cuts model parameters. As can be seen from Table 2, the invention achieves the best results in running time, model size, model complexity, and reconstruction effect compared with other methods.
Table 2 Model complexity comparison

Model           Parameters   Multiply-adds   PSNR (dB)   SSIM
CARN            1,592K       90.90G          26.07       0.7837
IMDN            715K         40.98G          26.04       0.7838
A²F-SD          320K         18.20G          25.80       0.7767
WSR             975K         123.27G         26.09       0.7840
The invention   569K         45.10G          26.19       0.7883
CARN in Table 2 refers to the method from "Namhyuk Ahn, Byungkon Kang, and Kyung-Ah Sohn. Fast, accurate, and lightweight super-resolution with cascading residual network. In ECCV, 2018, pp. 256–272"; IMDN refers to the method from "Zheng Hui, Xinbo Gao, Yunchu Yang, and Xiumei Wang. Lightweight image super-resolution with information multi-distillation network. In ACM MM, 2019, pp. 2024–2032"; A²F-SD refers to the method from "Xuehui Wang, Qing Wang, Yuzhi Zhao, Junchi Yan, Lei Fan, and Long Chen. Lightweight single-image super-resolution network with attentive auxiliary feature learning. In ACCV, 2020, pp. 268–285"; WSR refers to the method from "Huanrong Zhang, Zhi Jin, Xiaojun Tan, and Xiying Li. Towards lighter and faster: learning wavelets progressively for image super-resolution. In ACM MM, 2020, pp. 2113–2121".
Table 3 Effect and runtime comparison

[Table 3 data not reproduced in this text.]
Table 3 shows the peak signal-to-noise ratio (PSNR) and runtime comparison of the invention with existing methods on public datasets; the invention achieves the best reconstruction effect (a larger PSNR value means a better effect) while maintaining low runtime overhead. In Table 3, DSRN refers to the method from "Wei Han, Shiyu Chang, Ding Liu, Mo Yu, Michael Witbrock, and Thomas S. Huang. Image super-resolution via dual-state recurrent networks. In CVPR, 2018, pp. 1654–1663"; VDSR refers to the method from "Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016, pp. 1646–1654"; the method denoted A²F-M is from "Xuehui Wang, Qing Wang, Yuzhi Zhao, Junchi Yan, Lei Fan, and Long Chen. Lightweight single-image super-resolution network with attentive auxiliary feature learning. In ACCV, 2020, pp. 268–285"; citations for CARN, WSR and IMDN are given with Table 2.
In summary, the invention, based on a lightweight network with progressive feature fusion, not only recovers high-resolution images from low-resolution inputs with the best reconstruction effect among current methods, but also has fewer parameters and lower computational overhead.

Claims (2)

1. A lightweight progressive feature fusion image super-resolution system, characterized by comprising an input layer for inputting a low-resolution image, the input layer being connected to a first convolution layer for initial feature extraction of the low-resolution image; the first convolution layer is sequentially connected with a plurality of progressive attention modules PAB to perform feature processing on the initial features to obtain secondary features; the output feature of the first convolution layer and the output feature of the last progressive attention module PAB are added, combining the initial features and the secondary features into one feature map, and the combined feature map serves as the input signal of an up-sampling module; the up-sampling module is used for up-sampling the combined feature map to obtain a high-resolution image and output it to an output layer; the up-sampling module is used for improving the image resolution and comprises a first Nearest up-sampling layer, a second convolution layer, a first convolution pixel attention module CPAM, a third convolution layer, a second Nearest up-sampling layer, a fourth convolution layer, a second convolution pixel attention module CPAM and a fifth convolution layer which are sequentially connected; each progressive attention module PAB has the same structure and includes 4 branches, namely a first branch, a second branch, a third branch and a fourth branch, wherein the 4 branches respectively adopt a sixth convolution layer, a seventh convolution layer, an eighth convolution layer and a ninth convolution layer to convert the input signal of the progressive attention module PAB to 1/4 of the original number of channels, keeping the spatial size unchanged;
the sixth convolution layer of the first branch is connected to the Concat layer; the output signal of the seventh convolution layer of the second branch is processed by the third convolution pixel attention module CPAM and then output to the Concat layer; the output signal of the eighth convolution layer of the third branch is added with the processed signal of the third convolution pixel attention module CPAM, and the sum is processed by the fourth convolution pixel attention module CPAM and then output to the Concat layer; the output signal of the ninth convolution layer of the fourth branch is added with the output signal of the fourth convolution pixel attention module CPAM, and the sum is processed by the fifth convolution pixel attention module CPAM and then output to the Concat layer;
the Concat layer is used for splicing the signals of the 4 branches in the channel dimension to restore the original size, and the output signal of the Concat layer is processed by a dual attention unit DAM and a tenth convolution layer to serve as the output signal of the progressive attention module PAB; the dual attention unit DAM is used for improving the system reconstruction performance and performs feature processing with a spatial attention mechanism and a channel attention mechanism; the first to fifth convolution pixel attention modules CPAM have the same structure; the input signal of the first convolution pixel attention module CPAM is transmitted to an eleventh convolution layer and a twelfth convolution layer respectively, the output feature of the eleventh convolution layer is added with the output feature of a first Sigmoid function layer, the sum serves as the input signal of a thirteenth convolution layer, the twelfth convolution layer outputs its signal to the first Sigmoid function layer, and the output signal of the thirteenth convolution layer serves as the output signal of the first convolution pixel attention module CPAM; the dual attention unit DAM comprises a channel attention mechanism part and a spatial attention mechanism part which are connected in sequence;
the channel attention mechanism part takes the input signal of the dual attention unit DAM as input; the input signal is processed along two paths, one consisting of an average pooling layer, a fourteenth convolution layer and a fifteenth convolution layer connected in sequence, and the other of a maximum pooling layer, a sixteenth convolution layer and a seventeenth convolution layer connected in sequence; the signal obtained through the average pooling layer, the fourteenth convolution layer and the fifteenth convolution layer is added with the signal obtained through the maximum pooling layer, the sixteenth convolution layer and the seventeenth convolution layer; the sum serves as the input signal of a second Sigmoid function layer, and the output signal of the second Sigmoid function layer is multiplied with the input signal of the dual attention unit DAM as the output;
the spatial attention mechanism part takes the output signal of the channel attention mechanism part as input; the feature average value and the feature maximum value of the input signal are computed along the channel dimension and input to a feature splicing layer; the feature splicing layer is sequentially connected with an eighteenth convolution layer and a third Sigmoid function layer, and the output signal of the third Sigmoid function layer is multiplied with the output signal of the channel attention mechanism part to serve as the output signal of the dual attention unit DAM.
2. A light-weight progressive feature fusion image super-resolution method is characterized by comprising the following steps:
s1, inputting the low-resolution image to a lightweight progressive feature fusion image super-resolution system, and extracting initial features of the low-resolution image through the first convolution layer;
s2, performing feature processing on the initial features by using a plurality of progressive attention modules PAB to obtain secondary features;
s3, adding the initial features and the secondary features to obtain a feature map;
S4, up-sampling the feature map through an up-sampling module to obtain a high-resolution image; wherein:
the lightweight progressive feature fusion image super-resolution system comprises an input layer for inputting the low-resolution image, connected to the first convolution layer for initial feature extraction of the low-resolution image; the first convolution layer is sequentially connected with the plurality of progressive attention modules PAB to perform feature processing on the initial features to obtain the secondary features; the output feature of the first convolution layer and the output feature of the last progressive attention module PAB are added, combining the initial features and the secondary features into one feature map, and the combined feature map serves as the input signal of the up-sampling module; the up-sampling module is used for up-sampling the combined feature map to obtain the high-resolution image and output it to an output layer;
the up-sampling module is used for improving the image resolution and comprises a first Nearest up-sampling layer, a second convolution layer, a first convolution pixel attention module CPAM, a third convolution layer, a second Nearest up-sampling layer, a fourth convolution layer, a second convolution pixel attention module CPAM and a fifth convolution layer which are sequentially connected;
each progressive attention module PAB has the same structure and includes 4 branches, namely a first branch, a second branch, a third branch and a fourth branch, wherein the 4 branches respectively adopt a sixth convolution layer, a seventh convolution layer, an eighth convolution layer and a ninth convolution layer to convert the input signal of the progressive attention module PAB to 1/4 of the original number of channels, keeping the spatial size unchanged;
the sixth convolution layer of the first branch is connected to the Concat layer; the output signal of the seventh convolution layer of the second branch is processed by the third convolution pixel attention module CPAM and then output to the Concat layer; the output signal of the eighth convolution layer of the third branch is added with the processed signal of the third convolution pixel attention module CPAM, and the sum is processed by the fourth convolution pixel attention module CPAM and then output to the Concat layer; the output signal of the ninth convolution layer of the fourth branch is added with the output signal of the fourth convolution pixel attention module CPAM, and the sum is processed by the fifth convolution pixel attention module CPAM and then output to the Concat layer;
the Concat layer is used for splicing the signals of the 4 branches in the channel dimension to restore the original size, and the output signal of the Concat layer is processed by a dual attention unit DAM and a tenth convolution layer to serve as the output signal of the progressive attention module PAB; the dual attention unit DAM is used for improving the system reconstruction performance and performs feature processing with a spatial attention mechanism and a channel attention mechanism; the first to fifth convolution pixel attention modules CPAM have the same structure; the input signal of the first convolution pixel attention module CPAM is transmitted to an eleventh convolution layer and a twelfth convolution layer respectively, the output feature of the eleventh convolution layer is added with the output feature of a first Sigmoid function layer, the sum serves as the input signal of a thirteenth convolution layer, the twelfth convolution layer outputs its signal to the first Sigmoid function layer, and the output signal of the thirteenth convolution layer serves as the output signal of the first convolution pixel attention module CPAM;
the dual attention unit DAM comprises a channel attention mechanism part and a spatial attention mechanism part which are connected in sequence;
the channel attention mechanism part takes the input signal of the dual attention unit DAM as input; the input signal is processed along two paths, one consisting of an average pooling layer, a fourteenth convolution layer and a fifteenth convolution layer connected in sequence, and the other of a maximum pooling layer, a sixteenth convolution layer and a seventeenth convolution layer connected in sequence; the signal obtained through the average pooling layer, the fourteenth convolution layer and the fifteenth convolution layer is added with the signal obtained through the maximum pooling layer, the sixteenth convolution layer and the seventeenth convolution layer; the sum serves as the input signal of a second Sigmoid function layer, and the output signal of the second Sigmoid function layer is multiplied with the input signal of the dual attention unit DAM as the output; the spatial attention mechanism part takes the output signal of the channel attention mechanism part as input; the feature average value and the feature maximum value of the input signal are computed along the channel dimension and input to a feature splicing layer; the feature splicing layer is sequentially connected with an eighteenth convolution layer and a third Sigmoid function layer, and the output signal of the third Sigmoid function layer is multiplied with the output signal of the channel attention mechanism part to serve as the output signal of the dual attention unit DAM.
CN202110650557.XA 2021-06-11 2021-06-11 Lightweight progressive feature fusion image super-resolution system and method Active CN113256496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110650557.XA CN113256496B (en) 2021-06-11 2021-06-11 Lightweight progressive feature fusion image super-resolution system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110650557.XA CN113256496B (en) 2021-06-11 2021-06-11 Lightweight progressive feature fusion image super-resolution system and method

Publications (2)

Publication Number Publication Date
CN113256496A CN113256496A (en) 2021-08-13
CN113256496B true CN113256496B (en) 2021-09-21

Family

ID=77187537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110650557.XA Active CN113256496B (en) 2021-06-11 2021-06-11 Lightweight progressive feature fusion image super-resolution system and method

Country Status (1)

Country Link
CN (1) CN113256496B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113393382B (en) * 2021-08-16 2021-11-09 四川省人工智能研究院(宜宾) Binocular picture super-resolution reconstruction method based on multi-dimensional parallax prior
CN116452696B (en) * 2023-06-16 2023-08-29 山东省计算中心(国家超级计算济南中心) Image compressed sensing reconstruction method and system based on double-domain feature sampling

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111192200A (en) * 2020-01-02 2020-05-22 南京邮电大学 Image super-resolution reconstruction method based on fusion attention mechanism residual error network
CN112750082A (en) * 2021-01-21 2021-05-04 武汉工程大学 Face super-resolution method and system based on fusion attention mechanism

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9111140B2 (en) * 2012-01-10 2015-08-18 Dst Technologies, Inc. Identification and separation of form and feature elements from handwritten and other user supplied elements
CN110675321A (en) * 2019-09-26 2020-01-10 兰州理工大学 Super-resolution image reconstruction method based on progressive depth residual error network
CN111461976A (en) * 2020-03-19 2020-07-28 南京理工大学 Image super-resolution method based on efficient lightweight coordinate neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111192200A (en) * 2020-01-02 2020-05-22 南京邮电大学 Image super-resolution reconstruction method based on fusion attention mechanism residual error network
CN112750082A (en) * 2021-01-21 2021-05-04 武汉工程大学 Face super-resolution method and system based on fusion attention mechanism

Also Published As

Publication number Publication date
CN113256496A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN109903221B (en) Image super-division method and device
CN112308200B (en) Searching method and device for neural network
CN113256496B (en) Lightweight progressive feature fusion image super-resolution system and method
CN111161146B (en) Coarse-to-fine single-image super-resolution reconstruction method
Yang et al. Aim 2022 challenge on super-resolution of compressed image and video: Dataset, methods and results
CN113902658B (en) RGB image-to-hyperspectral image reconstruction method based on dense multiscale network
CN110136067B (en) Real-time image generation method for super-resolution B-mode ultrasound image
CN111986092B (en) Dual-network-based image super-resolution reconstruction method and system
CN111968036A (en) Layered image super-resolution method and system, computer equipment and application
CN112017116B (en) Image super-resolution reconstruction network based on asymmetric convolution and construction method thereof
CN115511708A (en) Depth map super-resolution method and system based on uncertainty perception feature transmission
Yang et al. MRDN: A lightweight multi-stage residual distillation network for image super-resolution
Peng et al. LCRCA: image super-resolution using lightweight concatenated residual channel attention networks
CN115222581A (en) Image generation method, model training method, related device and electronic equipment
Sahito et al. Transpose convolution based model for super-resolution image reconstruction
WO2020248706A1 (en) Image processing method, device, computer storage medium, and terminal
CN116977169A (en) Data processing method, apparatus, device, readable storage medium, and program product
Li et al. An improved method for underwater image super-resolution and enhancement
CN111861877A (en) Method and apparatus for video hyper-resolution
CN115170921A (en) Binocular stereo matching method based on bilateral grid learning and edge loss
CN109087247A (en) The method that a kind of pair of stereo-picture carries out oversubscription
CN111598781B (en) Image super-resolution method based on hybrid high-order attention network
CN112308772B (en) Super-resolution reconstruction method based on deep learning local and non-local information
Yu et al. Dual-branch feature learning network for single image super-resolution
CN111652851A (en) Super-resolution microscopic system based on mobile device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant