CN114581300A - Image super-resolution reconstruction method and device - Google Patents
- Publication number: CN114581300A
- Application number: CN202210147765.2A
- Authority
- CN
- China
- Prior art keywords
- residual
- features
- attention
- cascade
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Abstract
The invention discloses an image super-resolution reconstruction method and device. The method comprises the following steps: extracting shallow features from a low-resolution input image; passing the shallow features through a backbone network, composed of m cascaded residual groups (each built from multi-scale cascaded attention residual modules) and a global skip connection, to extract, fuse and enhance features and obtain deep features; upsampling the deep features with sub-pixel convolution; and reconstructing the image from the upsampled features to obtain a higher-resolution image. The invention adopts a multi-scale cascaded attention residual module that extracts, enhances and fuses features along several axes, including receptive field, width and attention. Skip connections and cascaded residuals let the network bypass low-frequency information and integrate features from different depths, yielding richer detail. The method reconstructs images with richer detail and higher quality, and can be widely applied in the field of image super-resolution reconstruction.
Description
Technical Field
The invention relates to the field of image super-resolution reconstruction, in particular to an image super-resolution reconstruction method and device.
Background
With the rapid development of computer technology and artificial intelligence, high-resolution terms such as "8K" and "100 megapixels" have entered everyday use, and the demand for high-resolution images keeps growing. In fields such as security surveillance, medical imaging, remote sensing and face recognition, images serve as important information carriers: high-quality images provide richer detail and yield more usable information. Improving image resolution therefore matters greatly in practice.
In recent years, requirements on image resolution have risen steadily, and super-resolution reconstruction, a low-level vision task, has become one of the research hot spots in computer vision. Super-resolution addresses the problem of reconstructing a high-resolution output image from a low-resolution input, continually enriching image detail so that the visual quality becomes better and clearer.
With the continued development of deep learning, deep neural networks have been widely applied to image super-resolution reconstruction and have achieved good results. However, current mainstream algorithms often require very deep architectures and long training times; the deeper the network, the harder it is to train and the more training skill is required. Meanwhile, the low-resolution input contains abundant low-frequency information that is treated equally across channels, which hinders the learning of a convolutional neural network to some extent. Moreover, current super-resolution convolutional networks cannot fully exploit features at multiple scales, limiting their learning capability. It is therefore necessary to address these problems and reconstruct high-quality images.
Disclosure of Invention
In order to solve at least one of the technical problems in the prior art to a certain extent, the present invention aims to provide a method and an apparatus for reconstructing image super resolution based on a multi-scale cascade attention residual network.
The technical scheme adopted by the invention is as follows:
an image super-resolution reconstruction method comprises the following steps:
extracting shallow features of the low-resolution input image;
performing feature extraction, fusion and enhancement on the shallow features through a backbone network composed of m cascaded residual groups, each built from multi-scale cascaded attention residual modules, and a global skip connection, to obtain deep features;
upsampling the deep features using sub-pixel convolution;
and reconstructing the image from the features obtained by the upsampling, to obtain a higher-resolution image.
Further, the extracting shallow features from the low-resolution input image includes:
defining a feature extraction component composed of a convolution layer, extracting original features from a low-resolution input image, as shown in the following formula:
F_0 = H_{SFE}(I_{LR})    (1)
where H_{SFE}(·) denotes the convolution operation applied to low-resolution feature extraction, I_{LR} denotes the low-resolution input image, and F_0 denotes the shallow features extracted by the convolution.
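Eq (1) is a single convolution. As a minimal numpy sketch (the channel count of 16, the 8×8 input and the random weights are illustrative assumptions, not values from the patent), H_{SFE} can be written as a zero-padded 3×3 convolution:

```python
import numpy as np

def conv2d(x, w, b, pad=1):
    """Naive 'same' 2D convolution (pad = k // 2): x is (C_in, H, W),
    w is (C_out, C_in, k, k), b is (C_out,); returns (C_out, H, W)."""
    _, h, wd = x.shape
    c_out, _, k, _ = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(h):
            for j in range(wd):
                out[o, i, j] = np.sum(xp[:, i:i + k, j:j + k] * w[o]) + b[o]
    return out

# F_0 = H_SFE(I_LR): one 3x3 convolution applied to the low-resolution input.
rng = np.random.default_rng(0)
i_lr = rng.standard_normal((1, 8, 8))        # toy 1-channel low-resolution image
w_sfe = rng.standard_normal((16, 1, 3, 3))   # 16 shallow feature maps (assumed width)
f0 = conv2d(i_lr, w_sfe, np.zeros(16))
```

With "same" padding the spatial size is preserved, so f0 keeps the 8×8 layout and can feed the backbone directly.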
Further, the cascaded residual group comprises n multi-scale cascaded attention residual modules, n feature concatenation units, n feature compression units, n short skip connections and 1 local skip connection, where each feature compression unit is a 1×1 convolution.
Further, the cascaded residual group is formulated as follows:
F_{m,1} = H_{MCRAB}(F_{m-1})    (2)
F_{m,2} = H_{MCRAB}(w_{1×1} * [F_{m,1}, F_{m-1}] + b)    (3)
…
F_{m,n} = H_{MCRAB}(w_{1×1} * [F_{m,n-1}, F_{m,n-2}] + b)    (4)
F_m = F_{m,n} + F_{m-1}    (5)
where F_{m,n} denotes the output of the nth multi-scale cascaded attention residual module in the mth cascaded residual group, H_{MCRAB}(·) denotes the operation of the multi-scale cascaded attention residual module, F_m denotes the output of the mth cascaded residual group, [·] denotes feature concatenation, w_{1×1} denotes the weight of the 1×1 convolution, and b denotes the bias of the convolution kernel.
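The recurrence of eqs (2)–(5) can be sketched as below. H_{MCRAB} is replaced by a plain ReLU stub (the real module is defined in eqs (6)–(16)), the 1×1 feature-compression convolutions use random weights, and the channel count is a toy value, so this demonstrates only the cascade data flow, not the trained network:

```python
import numpy as np

rng = np.random.default_rng(1)
C, H, W = 4, 6, 6  # toy feature dimensions (assumed, not from the patent)

def conv1x1(x, w, b):
    """1x1 convolution as a per-pixel channel mix: x (C_in,H,W), w (C_out,C_in)."""
    return np.einsum('oc,chw->ohw', w, x) + b[:, None, None]

def h_mcrab(x):
    """Stand-in for the multi-scale cascaded attention residual module H_MCRAB;
    a plain ReLU here, used only to keep the data flow of eqs (2)-(5) visible."""
    return np.maximum(x, 0)

def cascade_residual_group(f_prev, n):
    """Eqs (2)-(5): each module consumes the 1x1-compressed concatenation of the
    two previous features; a local skip adds f_prev to the last module's output."""
    prev2, prev1 = f_prev, h_mcrab(f_prev)            # F_{m-1}, F_{m,1}
    for _ in range(n - 1):
        cat = np.concatenate([prev1, prev2], axis=0)  # [F_{m,i-1}, F_{m,i-2}]
        w = rng.standard_normal((C, 2 * C)) * 0.1     # feature-compression 1x1 conv
        prev2, prev1 = prev1, h_mcrab(conv1x1(cat, w, np.zeros(C)))
    return prev1 + f_prev                             # F_m = F_{m,n} + F_{m-1}

f_prev = rng.standard_normal((C, H, W))
f_m = cascade_residual_group(f_prev, n=3)
```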
Further, the multi-scale cascaded attention residual module comprises attention residual units, an error feedback fusion unit, skip connections and cascade operations;
the multi-scale cascaded attention residual module is formulated as follows:
F_{3×3,in1} = w_{1×1} * F_{m,n-1} + b    (6)
F_{3,1} = H_{RAB,3×3}(F_{3×3,in1}) + F_{3×3,in1}    (7)
F_{3×3,in2} = w_{1×1} * [F_{3,1}, F_{3×3,in1}] + b    (8)
F_{3,2} = H_{RAB,3×3}(F_{3×3,in2}) + F_{3×3,in2}    (9)
F_{3×3,in3} = w_{1×1} * [F_{3,2}, F_{3,1}, F_{3×3,in1}] + b    (10)
F_{5×5,in1} = w_{1×1} * F_{m,n-1} + b    (11)
F_{5,1} = H_{RAB,5×5}(F_{5×5,in1}) + F_{5×5,in1}    (12)
F_{5×5,in2} = w_{1×1} * [F_{5,1}, F_{5×5,in1}] + b    (13)
F_{5,2} = H_{RAB,5×5}(F_{5×5,in2}) + F_{5×5,in2}    (14)
F_{5×5,in3} = w_{1×1} * [F_{5,2}, F_{5,1}, F_{5×5,in1}] + b    (15)
F_{m,n} = H_{Confusion}(F_{3×3,in3}, F_{5×5,in3}) + F_{m,n-1}    (16)
where F_{3×3,in1}, F_{3×3,in2}, F_{3×3,in3} denote the input features at successive stages of the 3×3 scale, F_{3,1}, F_{3,2} denote the intermediate features at the 3×3 scale, F_{5×5,in1}, F_{5×5,in2}, F_{5×5,in3} denote the input features at successive stages of the 5×5 scale, F_{5,1}, F_{5,2} denote the intermediate features at the 5×5 scale, H_{RAB,3×3} and H_{RAB,5×5} denote the attention residual units with 3×3 and 5×5 convolution kernels respectively, H_{Confusion} denotes the error feedback fusion unit, F_{m,n} denotes the output of the nth multi-scale cascaded attention residual module in the mth cascaded residual group, [·] denotes feature concatenation, w_{1×1} denotes the weight of the 1×1 convolution, and b denotes the bias of the convolution kernel.
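A structural sketch of one module follows: each attention residual unit H_{RAB} is reduced to a ReLU stub, the error feedback fusion unit is replaced by a simple average, and kernel sizes, widths and weights are illustrative assumptions, so only the two-branch cascade of eqs (6)–(16) is shown:

```python
import numpy as np

rng = np.random.default_rng(5)
C, H, W = 4, 6, 6  # toy feature dimensions (assumed)

def conv1x1(x, w):
    """1x1 convolution as a per-pixel channel mix."""
    return np.einsum('oc,chw->ohw', w, x)

def rab_stub(x):
    """Stand-in for an attention residual unit H_RAB (eqs (17)-(18))."""
    return np.maximum(x, 0)

def scale_branch(f_in):
    """One scale branch of eqs (6)-(10) / (11)-(15): two attention residual
    units with local skips, and 1x1 compressions of the cascaded features."""
    in1 = conv1x1(f_in, rng.standard_normal((C, C)) * 0.3)
    f1 = rab_stub(in1) + in1
    in2 = conv1x1(np.concatenate([f1, in1]), rng.standard_normal((C, 2 * C)) * 0.3)
    f2 = rab_stub(in2) + in2
    return conv1x1(np.concatenate([f2, f1, in1]), rng.standard_normal((C, 3 * C)) * 0.3)

f_prev = rng.standard_normal((C, H, W))
f33 = scale_branch(f_prev)              # 3x3-scale branch (kernel size abstracted away)
f55 = scale_branch(f_prev)              # 5x5-scale branch
f_out = 0.5 * (f33 + f55) + f_prev      # eq (16), fusion unit replaced by a mean
```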
Furthermore, the attention residual unit adopts wide activation: without changing the parameter count, it obtains wider channel features; a channel attention module enhances the channel features before activation, and finally a spatial attention unit performs spatial enhancement on the residual. Both attention mechanisms combine average pooling and maximum pooling;
the formula expression of the attention residual unit is as follows:
y=τ(HCA(wk×k*x+b)) (17)
Fr=HSA(wk×k*y+b) (18)
wherein x and y represent input and output characteristics, respectively, and τ represents a nonlinear activation function ReLU, wk×kWeight, H, representing kxk convolutionCADenotes the channel attention Unit, HSARepresenting the channel attention units, wherein the formula expression of each attention unit is as follows:
HCA=σ(HFC(τ(HFC(PAvg(x)+PMax(x)))))*x (19)
HSA=σ(w7×7*[(PAvg(x),PMax(x)]+b))*x (20)
where σ denotes the nonlinear activation function Sigmoid, w7×7Weight of convolution of 7x7, PAvgDenotes average pooling operation, PMaxDenotes maximum pooling operation, HFCRepresenting a fully connected layer.
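Eqs (19)–(20) in plain numpy. The fully connected layers' reduction ratio (8 → 2 → 8 channels) and all weights are assumed toy values; the 7×7 convolution is done naively with zero padding:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0)

def channel_attention(x, w1, w2):
    """Eq (19): shared FC layers on (avg pool + max pool) over space, then scale x."""
    pooled = x.mean(axis=(1, 2)) + x.max(axis=(1, 2))   # P_Avg(x) + P_Max(x), shape (C,)
    scale = sigmoid(w2 @ relu(w1 @ pooled))             # sigma(FC(relu(FC(.))))
    return scale[:, None, None] * x

def spatial_attention(x, w7):
    """Eq (20): avg/max pooling over channels, 7x7 conv (zero-padded), sigmoid, scale x."""
    feat = np.stack([x.mean(axis=0), x.max(axis=0)])    # (2, H, W)
    h, w = feat.shape[1:]
    fp = np.pad(feat, ((0, 0), (3, 3), (3, 3)))
    att = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            att[i, j] = np.sum(fp[:, i:i + 7, j:j + 7] * w7)
    return sigmoid(att)[None] * x

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 5, 5))
w1 = rng.standard_normal((2, 8)) * 0.1   # channel-reduction FC (assumed ratio)
w2 = rng.standard_normal((8, 2)) * 0.1   # channel-expansion FC
w7 = rng.standard_normal((2, 7, 7)) * 0.1
y_ca = channel_attention(x, w1, w2)
y_sa = spatial_attention(x, w7)
```

Because both attentions multiply the input by values in (0, 1), neither can amplify a feature; they only rescale it.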
Further, the error feedback fusion unit is formulated as follows:
f_{feedback_3} = τ(w_{3×3} * F_{3×3,in3} + b) − τ(w_{3×3} * F_{5×5,in3} + b)    (21)
F_{3×3} = τ(w_{3×3} * f_{feedback_3} + b) + F_{3×3,in3}    (22)
f_{feedback_5} = τ(w_{3×3} * F_{5×5,in3} + b) − τ(w_{3×3} * F_{3×3,in3} + b)    (23)
F_{5×5} = τ(w_{3×3} * f_{feedback_5} + b) + F_{5×5,in3}    (24)
F_{confusion} = w_{3×3} * [F_{3×3}, F_{5×5}] + b    (25)
where f_{feedback_3} and f_{feedback_5} denote the error feedback between the two scales, F_{3×3} and F_{5×5} denote the fused features after error feedback, τ denotes the ReLU nonlinear activation function, w_{3×3} denotes the weight of the 3×3 convolution, and F_{confusion} denotes the multi-scale residual fusion feature.
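Eqs (21)–(25) as a numpy sketch. For brevity a single weight tensor is shared where the patent would give each 3×3 convolution its own parameters; all sizes and weights are toy assumptions. Note that under shared weights, eqs (21) and (23) make the two feedback signals exact negatives of each other:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def conv3x3(x, w, b):
    """Naive zero-padded 3x3 convolution: x (C_in,H,W), w (C_out,C_in,3,3)."""
    c_out = w.shape[0]
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.empty((c_out, h, wd))
    for o in range(c_out):
        for i in range(h):
            for j in range(wd):
                out[o, i, j] = np.sum(xp[:, i:i + 3, j:j + 3] * w[o]) + b[o]
    return out

rng = np.random.default_rng(3)
C, H, W = 4, 6, 6
f3_in = rng.standard_normal((C, H, W))          # F_{3x3,in3}
f5_in = rng.standard_normal((C, H, W))          # F_{5x5,in3}
wa = rng.standard_normal((C, C, 3, 3)) * 0.1    # shared here for brevity
wb = rng.standard_normal((C, C, 3, 3)) * 0.1
wf = rng.standard_normal((C, 2 * C, 3, 3)) * 0.1
b = np.zeros(C)

fb3 = relu(conv3x3(f3_in, wa, b)) - relu(conv3x3(f5_in, wa, b))  # eq (21)
f33 = relu(conv3x3(fb3, wb, b)) + f3_in                          # eq (22)
fb5 = relu(conv3x3(f5_in, wa, b)) - relu(conv3x3(f3_in, wa, b))  # eq (23)
f55 = relu(conv3x3(fb5, wb, b)) + f5_in                          # eq (24)
f_fused = conv3x3(np.concatenate([f33, f55]), wf, b)             # eq (25)
```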
Further, the extracting, fusing and enhancing of the shallow features through a backbone network composed of m cascaded residual groups, each built from multi-scale cascaded attention residual modules, and a global skip connection, to obtain deep features, includes:
obtaining deep features from the shallow features through a backbone network consisting of m cascaded residual groups, a residual feature extraction trunk with a 3×3 convolution, and a global skip connection; the formulas are as follows:
F_m = H_{crir,m}(F_{m−1}) = H_{crir,m}(H_{crir,m−1}(⋯(H_{crir,1}(F_0))⋯))    (26)
F_{Res} = τ(w_{3×3} * F_m + b)    (27)
F_{DF} = F_0 + F_{Res}    (28)
where H_{crir,m} denotes the operation of the mth cascaded residual group, F_{Res} denotes the final residual features after the 3×3 convolutional layer, F_{DF} denotes the deep features, finally composed of the shallow features and the residual features, and F_0 denotes the shallow features.
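The composition in eqs (26)–(28) can be sketched with each cascaded residual group reduced to a stub and the trunk's 3×3 convolution simplified to a 1×1 channel mix (both assumptions, for brevity); what remains is the serial composition of the m groups and the global skip connection:

```python
import numpy as np

rng = np.random.default_rng(4)
m, C, H, W = 3, 4, 6, 6  # toy sizes (assumed)

def h_crir(x, w):
    """Stand-in for one cascaded residual group (the real one is eqs (2)-(5));
    a 1x1 channel mix + ReLU + skip, enough to show the composition."""
    return np.maximum(np.einsum('oc,chw->ohw', w, x), 0) + x

f0 = rng.standard_normal((C, H, W))
f = f0
for _ in range(m):                                  # eq (26): m groups in series
    f = h_crir(f, rng.standard_normal((C, C)) * 0.1)
w_res = rng.standard_normal((C, C)) * 0.1
f_res = np.maximum(np.einsum('oc,chw->ohw', w_res, f), 0)  # eq (27), conv simplified
f_df = f0 + f_res                                   # eq (28): global skip connection
```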
Further, the upsampling of the deep features using sub-pixel convolution includes:
upsampling the deep features extracted by the backbone network with one sub-pixel convolution layer; the formula is as follows:
F_{up} = H_{Sub_pixel}(F_{DF})    (29)
where H_{Sub_pixel} denotes the sub-pixel convolution upsampling operation and F_{up} denotes the upsampled output features.
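Sub-pixel convolution ends with the standard channel-to-space rearrangement (as in PixelShuffle): a convolution first produces C·r² feature maps (omitted here), which are then reordered into a C-channel image upscaled by the factor r. A numpy sketch of the rearrangement:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r^2, H, W) -> (C, H*r, W*r): the sub-pixel upsampling step.
    Element (c*r*r + i*r + j, h, w) of the input lands at (c, h*r + i, w*r + j)."""
    cr2, h, w = x.shape
    c = cr2 // (r * r)
    return (x.reshape(c, r, r, h, w)
             .transpose(0, 3, 1, 4, 2)
             .reshape(c, h * r, w * r))

# Toy input: C = 2 output channels, r = 2, so 8 input maps of size 3x3.
x = np.arange(2 * 4 * 3 * 3, dtype=float).reshape(2 * 4, 3, 3)
y = pixel_shuffle(x, 2)
```

Each output 2×2 block interleaves one pixel from each of the r² maps of its channel group, which is why the spatial resolution grows by r in both directions.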
Further, reconstructing the image from the features obtained by the upsampling to obtain a higher-resolution image includes:
reconstructing the high-resolution image I_{SR} from the upsampled predicted features; the formula is as follows:
I_{SR} = H_R(F_{up})    (30)
where H_R denotes the convolution operation that reconstructs the high-resolution image I_{SR}.
Another technical scheme adopted by the invention is as follows:
an image super-resolution reconstruction apparatus comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method described above.
The invention has the following beneficial effects: it adopts a multi-scale cascaded attention residual module that extracts, enhances and fuses features along several axes, including receptive field, width and attention; meanwhile, skip connections and cascaded residuals bypass low-frequency information and integrate features from different depths of the network, yielding richer detail. With this method, images with richer detail and higher quality can be reconstructed.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the following description refers to the drawings of the embodiments. It should be understood that the drawings described below cover only some embodiments of the technical solutions of the invention, and that those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a diagram of a multi-scale cascaded attention residual network in an embodiment of the invention;
FIG. 2 is a diagram of an attention residual unit in an embodiment of the present invention;
FIG. 3 is a diagram of a channel attention unit and a spatial attention unit in an embodiment of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more and "a plurality" means two or more; "greater than", "less than" and the like are understood as excluding the stated number, while "above", "below", "within" and the like are understood as including it. Where "first" and "second" are used to distinguish technical features, they are not to be understood as indicating or implying relative importance, quantity, or precedence of the technical features indicated.
In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.
As shown in FIG. 1, the present embodiment provides an image super-resolution reconstruction method based on a multi-scale cascaded attention residual network. The method fuses and strengthens the extracted features through multi-scale cascaded attention residual modules, learns deep residual features with a structure of cascaded residuals embedded in a global residual, and finally completes the reconstruction. It specifically comprises the following steps:
step 1, shallow feature extraction is carried out on an input image, and the method specifically comprises the following steps:
using a feature extraction module consisting of a 3×3 convolutional layer, original features are extracted from the low-resolution input, as shown in the following formula:
F_0 = H_{SFE}(I_{LR})    (1)
where H_{SFE}(·) denotes the convolution operation applied to low-resolution feature extraction, I_{LR} denotes the low-resolution input image, and F_0 denotes the shallow features extracted by the convolution.
Step 2, the attention residual units use convolution kernels of different sizes to extract features at different scales, so the network can learn richer image information. As shown in FIG. 1, the shallow features are fed into a backbone network formed by m cascaded residual groups and a global skip connection, which extracts and enhances them into richer, deeper features. Each cascaded residual group comprises n multi-scale cascaded attention residual modules; as shown in FIG. 1, each module comprises 4 attention residual units, 1 error feedback fusion unit and 6 short skip connections. In this example m = 3 and n = 3, but these values do not limit the technical solution of the invention. The attention residual units use 3×3 and 5×5 convolution kernels respectively, as shown in FIG. 2; the channel attention unit and spatial attention unit involved are shown in FIG. 3. As shown in FIG. 1, the error feedback fusion unit is a fusion module formed by four 3×3 convolutions, each followed by a ReLU, together with a feature concatenation unit and a 3×3 convolution; the feature compression after concatenation is a 1×1 convolution, and residual learning makes the network easier to optimize. The specific process is as follows:
First, the shallow features extracted in step 1 pass through the attention residual units to obtain two different sets of features; the two are then fused by the error feedback fusion unit, and finally added to the shallow features to form a residual block. The formulas are as follows:
F_{3×3,in1} = w_{1×1} * F_{m,n-1} + b    (2)
F_{3,1} = H_{RAB,3×3}(F_{3×3,in1}) + F_{3×3,in1}    (3)
F_{3×3,in2} = w_{1×1} * [F_{3,1}, F_{3×3,in1}] + b    (4)
F_{3,2} = H_{RAB,3×3}(F_{3×3,in2}) + F_{3×3,in2}    (5)
F_{3×3,in3} = w_{1×1} * [F_{3,2}, F_{3,1}, F_{3×3,in1}] + b    (6)
F_{5×5,in1} = w_{1×1} * F_{m,n-1} + b    (7)
F_{5,1} = H_{RAB,5×5}(F_{5×5,in1}) + F_{5×5,in1}    (8)
F_{5×5,in2} = w_{1×1} * [F_{5,1}, F_{5×5,in1}] + b    (9)
F_{5,2} = H_{RAB,5×5}(F_{5×5,in2}) + F_{5×5,in2}    (10)
F_{5×5,in3} = w_{1×1} * [F_{5,2}, F_{5,1}, F_{5×5,in1}] + b    (11)
F_{m,n} = H_{Confusion}(F_{3×3,in3}, F_{5×5,in3}) + F_{m,n-1}    (12)
where F_{3×3,in1}, F_{3×3,in2}, F_{3×3,in3} denote the input features at successive stages of the 3×3 scale, F_{3,1}, F_{3,2} denote the intermediate features at the 3×3 scale, F_{5×5,in1}, F_{5×5,in2}, F_{5×5,in3} denote the input features at successive stages of the 5×5 scale, F_{5,1}, F_{5,2} denote the intermediate features at the 5×5 scale, H_{RAB,3×3} and H_{RAB,5×5} denote the attention residual units with 3×3 and 5×5 convolution kernels respectively, and H_{Confusion} denotes the error feedback fusion unit. In this example, m ∈ [1, 3] and n ∈ [1, 3].
Step 3, the shallow features pass through a feature extraction network consisting of m cascaded residual groups, a residual feature extraction trunk with a 3×3 convolution, and a global skip connection, to obtain the rich features required for upsampling. The formulas are as follows:
F_m = H_{crir,m}(F_{m−1}) = H_{crir,m}(H_{crir,m−1}(⋯(H_{crir,1}(F_0))⋯))    (13)
F_{Res} = τ(w_{3×3} * F_m + b)    (14)
F_{DF} = F_0 + F_{Res}    (15)
where H_{crir,m} denotes the operation of the mth cascaded residual group, F_{Res} denotes the final residual features after the 3×3 convolutional layer, and F_{DF} denotes the deep features, finally composed of the shallow features and the residual features.
Step 4, the deep features extracted by the backbone network are upsampled with one sub-pixel convolution layer, as follows:
F_{up} = H_{Sub_pixel}(F_{DF})    (16)
where H_{Sub_pixel} denotes the sub-pixel convolution upsampling operation and F_{up} denotes the upsampled output features.
Step 5, reconstructing the image, which specifically includes: reconstructing the high-resolution image I_{SR} from the upsampled predicted features, as follows:
I_{SR} = H_R(F_{up})    (17)
where H_R denotes the convolution operation that reconstructs the high-resolution image I_{SR}.
In summary, compared with the prior art, the invention has the following advantages and effects:
(1) The embodiment of the invention extracts features at multiple scales with the multi-scale cascaded attention residual module; the channel features before the activation function are widened while keeping the parameter count unchanged; channel attention and spatial attention enhance the channel and spatial features within the residual respectively; and the error feedback fusion unit strengthens each scale's features by means of the error between scales and fuses the multi-scale enhanced features, making the extracted features richer.
(2) With the structure of cascaded residuals embedded in a global residual, the network can bypass the low-frequency information in the low-resolution input, learn more high-frequency residual information, and acquire rich detail features. A very deep network is not needed, and a high-quality, high-resolution reconstructed image can still be obtained.
The embodiment also provides an image super-resolution reconstruction apparatus, including:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method described above.
The image super-resolution reconstruction device of the embodiment can execute the image super-resolution reconstruction method provided by the method embodiment of the invention, can execute any combination of the implementation steps of the method embodiment, and has corresponding functions and beneficial effects of the method.
The embodiment also provides a storage medium, which stores an instruction or a program capable of executing the image super-resolution reconstruction method provided by the embodiment of the method of the invention, and when the instruction or the program is executed, the method can be executed by any combination of the embodiment of the method, and the method has corresponding functions and beneficial effects.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be understood that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the foregoing description of the specification, reference to the description of "one embodiment/example," "another embodiment/example," or "certain embodiments/examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. An image super-resolution reconstruction method, characterized by comprising the following steps:
extracting shallow features of the low-resolution input image;
performing feature extraction, fusion and enhancement on the shallow features through a backbone network composed of m cascade residual groups, each built from multi-scale cascade attention residual modules, together with a global skip connection, to obtain deep features;
upsampling the deep features using sub-pixel convolution;
reconstructing the image using the features obtained by the upsampling to obtain a higher-resolution image.
2. The image super-resolution reconstruction method according to claim 1, wherein the extracting shallow features from the low-resolution input image comprises:
defining a feature extraction component composed of one convolutional layer, and extracting original features from the low-resolution input image, as shown in the following formula:
F0=HSFE(ILR) (1)
wherein HSFE(·) denotes the convolution operation applied for low-resolution feature extraction, ILR denotes the low-resolution input image, and F0 denotes the shallow features extracted by the convolution.
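By way of illustration only, the shallow feature extraction of eq. (1) can be sketched as a single 'same'-padded 3×3 convolution. This is a minimal NumPy sketch, not the claimed implementation; the channel counts and random weights are illustrative assumptions.

```python
import numpy as np

def conv3x3(x, w, b):
    """'Same'-padded 3x3 convolution. x: (C_in, H, W), w: (C_out, C_in, 3, 3), b: (C_out,)."""
    c_out = w.shape[0]
    _, h, wid = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, wid))
    for i in range(3):
        for j in range(3):
            # accumulate each kernel tap over all input channels
            out += np.tensordot(w[:, :, i, j], xp[:, i:i + h, j:j + wid], axes=([1], [0]))
    return out + b[:, None, None]

rng = np.random.default_rng(0)
I_LR = rng.standard_normal((3, 8, 8))        # RGB low-resolution input (size assumed)
w = rng.standard_normal((64, 3, 3, 3)) * 0.1  # 64 shallow feature channels (assumed)
b = np.zeros(64)
F0 = conv3x3(I_LR, w, b)                      # F0 = H_SFE(I_LR), eq. (1)
print(F0.shape)  # (64, 8, 8)
```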
3. The image super-resolution reconstruction method according to claim 1, wherein the cascade residual group comprises n multi-scale cascade attention residual modules, n feature concatenation units, n feature compression units, n short skip connections and 1 local skip connection, and the feature compression unit comprises a 1×1 convolution.
4. The image super-resolution reconstruction method according to claim 3, wherein the formula expression of the cascade residual group is as follows:
Fm,1=HMCRAB(Fm-1) (2)
Fm,2=HMCRAB(w1×1*[Fm,1,Fm-1]+b) (3)
…
Fm,n=HMCRAB(w1×1*[Fm,n-1,Fm,n-2]+b) (4)
Fm=Fm,n+Fm-1 (5)
wherein Fm,n denotes the output features of the nth multi-scale cascade attention residual module in the mth cascade residual group, HMCRAB denotes the operation of the multi-scale cascade attention residual module, Fm denotes the output features of the mth cascade residual group, [ ] denotes feature concatenation, w1×1 denotes the weights of the 1×1 convolution, and b denotes the bias of the convolution kernel.
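The cascade of eqs. (2)-(5) can be sketched as follows. This is a minimal NumPy sketch: the multi-scale cascade attention residual module H_MCRAB is stubbed with a toy nonlinearity (its real structure is given in claim 5), and all 1×1 weights are random illustrative assumptions.

```python
import numpy as np

def conv1x1(x, w, b):
    """Pointwise (1x1) convolution. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0])) + b[:, None, None]

def mcrab_stub(x):
    """Stand-in for H_MCRAB; the real module is defined in claim 5."""
    return np.tanh(x) + x

def cascade_residual_group(F_prev, n, rng):
    c = F_prev.shape[0]
    feats = [F_prev]
    out = mcrab_stub(F_prev)                           # eq. (2): F_{m,1}
    for _ in range(n - 1):                             # eqs. (3)-(4)
        w = rng.standard_normal((c, 2 * c)) * 0.05
        cat = np.concatenate([out, feats[-1]], axis=0)  # [F_{m,k-1}, F_{m,k-2}]
        feats.append(out)
        out = mcrab_stub(conv1x1(cat, w, np.zeros(c)))
    return out + F_prev                                 # eq. (5): local skip connection

rng = np.random.default_rng(1)
F_in = rng.standard_normal((16, 4, 4))
F_out = cascade_residual_group(F_in, n=3, rng=rng)
print(F_out.shape)  # (16, 4, 4)
```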
5. The image super-resolution reconstruction method according to claim 3, wherein the multi-scale cascade attention residual module comprises attention residual units, an error feedback fusion unit, skip connections and cascade operations;
the formula expression of the multi-scale cascade attention residual module is as follows:
F3×3,in1=w1×1*Fm,n-1+b (6)
F3,1=HRAB,3×3(F3×3,in1)+F3×3,in1 (7)
F3×3,in2=w1×1*[F3,1,F3×3,in1]+b (8)
F3,2=HRAB,3×3(F3×3,in2)+F3×3,in2 (9)
F3×3,in3=w1×1*[F3,2,F3,1,F3×3,in1]+b (10)
F5×5,in1=w1×1*Fm,n-1+b (11)
F5,1=HRAB,5×5(F5×5,in1)+F5×5,in1 (12)
F5×5,in2=w1×1*[F5,1,F5×5,in1]+b (13)
F5,2=HRAB,5×5(F5×5,in2)+F5×5,in2 (14)
F5×5,in3=w1×1*[F5,2,F5,1,F5×5,in1]+b (15)
Fm,n=HConfusion(F3×3,in3,F5×5,in3)+Fm,n-1 (16)
wherein F3×3,in1, F3×3,in2, F3×3,in3 denote the input features at different stages of the 3×3 scale, F3,1, F3,2 denote the intermediate features at the 3×3 scale, F5×5,in1, F5×5,in2, F5×5,in3 denote the input features at different stages of the 5×5 scale, F5,1, F5,2 denote the intermediate features at the 5×5 scale, HRAB,3×3 and HRAB,5×5 denote the attention residual units with 3×3 and 5×5 convolution kernels respectively, HConfusion denotes the error feedback fusion unit, Fm,n denotes the output features of the nth multi-scale cascade attention residual module in the mth cascade residual group, [ ] denotes feature concatenation, w1×1 denotes the weights of the 1×1 convolution, and b denotes the bias of the convolution kernel.
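The two-branch cascade of eqs. (6)-(16) can be sketched as follows. This is a minimal NumPy sketch: the attention residual units H_RAB and the fusion unit are stubbed (their real structures are in claims 6 and 7, here the fusion is reduced to a mean), and all weights are random illustrative assumptions.

```python
import numpy as np

def conv1x1(x, w):
    """Pointwise (1x1) convolution without bias, for brevity."""
    return np.tensordot(w, x, axes=([1], [0]))

def rab_stub(x):
    """Stand-in for H_RAB (the wide-activation attention residual unit of claim 6)."""
    return np.maximum(x, 0.0) * 0.5

def branch(F_in, c, rng):
    """One scale branch, eqs. (6)-(10) (3x3 scale) or (11)-(15) (5x5 scale)."""
    x1 = conv1x1(F_in, rng.standard_normal((c, F_in.shape[0])) * 0.05)             # in1
    f1 = rab_stub(x1) + x1                                                          # unit + skip
    x2 = conv1x1(np.concatenate([f1, x1]), rng.standard_normal((c, 2 * c)) * 0.05)  # in2
    f2 = rab_stub(x2) + x2
    x3 = conv1x1(np.concatenate([f2, f1, x1]), rng.standard_normal((c, 3 * c)) * 0.05)  # in3
    return x3

rng = np.random.default_rng(2)
F_prev = rng.standard_normal((16, 4, 4))
b3 = branch(F_prev, 16, rng)   # F_{3x3,in3}
b5 = branch(F_prev, 16, rng)   # F_{5x5,in3}
F_out = 0.5 * (b3 + b5) + F_prev  # eq. (16): fusion stub plus skip connection
print(F_out.shape)  # (16, 4, 4)
```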
6. The image super-resolution reconstruction method according to claim 5, wherein the attention residual unit adopts wide activation to obtain wider channel features without increasing the parameter count, uses a channel attention module to enhance the channel features before activation, and finally applies a spatial attention unit to spatially enhance the residual features, both attention mechanisms combining average pooling and maximum pooling;
the formula expression of the attention residual unit is as follows:
y=τ(HCA(wk×k*x+b)) (17)
Fr=HSA(wk×k*y+b) (18)
wherein x and y denote the input and output features respectively, τ denotes the nonlinear activation function ReLU, wk×k denotes the weights of the k×k convolution, HCA denotes the channel attention unit, and HSA denotes the spatial attention unit; the formula expression of each attention unit is as follows:
HCA=σ(HFC(τ(HFC(PAvg(x)+PMax(x)))))*x (19)
HSA=σ(w7×7*[PAvg(x),PMax(x)]+b)*x (20)
wherein σ denotes the nonlinear activation function Sigmoid, w7×7 denotes the weights of the 7×7 convolution, PAvg denotes the average pooling operation, PMax denotes the maximum pooling operation, and HFC denotes a fully connected layer.
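The two attention units of eqs. (19)-(20) can be sketched as follows: channel attention squeezes with average plus max pooling over space and rescales channels through two FC layers; spatial attention pools over channels and rescales positions through a 7×7 convolution. This is a minimal NumPy sketch with random illustrative weights and an assumed reduction ratio of 4.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Eq. (19). x: (C, H, W); w1: (C//r, C), w2: (C, C//r) squeeze/excite FC weights."""
    s = x.mean(axis=(1, 2)) + x.max(axis=(1, 2))   # P_Avg(x) + P_Max(x), per channel
    a = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))      # FC -> ReLU -> FC -> Sigmoid
    return a[:, None, None] * x                    # rescale channels

def spatial_attention(x, w7):
    """Eq. (20). w7: (1, 2, 7, 7) conv over the stacked [avg, max] channel maps."""
    m = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    h, wid = m.shape[1:]
    mp = np.pad(m, ((0, 0), (3, 3), (3, 3)))       # 'same' padding for the 7x7 conv
    a = np.zeros((h, wid))
    for i in range(7):
        for j in range(7):
            a += np.tensordot(w7[0, :, i, j], mp[:, i:i + h, j:j + wid], axes=([0], [0]))
    return sigmoid(a)[None] * x                    # rescale spatial positions

rng = np.random.default_rng(3)
x = rng.standard_normal((16, 6, 6))
y_ca = channel_attention(x, rng.standard_normal((4, 16)) * 0.1,
                         rng.standard_normal((16, 4)) * 0.1)
y_sa = spatial_attention(x, rng.standard_normal((1, 2, 7, 7)) * 0.1)
print(y_ca.shape, y_sa.shape)  # (16, 6, 6) (16, 6, 6)
```

Because the sigmoid gate lies in (0, 1), both units can only attenuate features, never amplify them, which is the standard behavior of this attention form.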
7. The image super-resolution reconstruction method according to claim 5, wherein the formula expression of the error feedback fusion unit is as follows:
ffeedback_3=τ(w3×3*F3×3,in3+b)-τ(w3×3*F5×5,in3+b) (21)
F3×3=τ(w3×3*ffeedback_3+b)+F3×3,in3 (22)
ffeedback_5=τ(w3×3*F5×5,in3+b)-τ(w3×3*F3×3,in3+b) (23)
F5×5=τ(w3×3*ffeedback_5+b)+F5×5,in3 (24)
Fconfusion=w3×3*[F3×3,F5×5]+b (25)
wherein ffeedback_3 and ffeedback_5 denote the error feedback between the different scales, F3×3 and F5×5 denote the fused features after error feedback, τ denotes the nonlinear activation function ReLU, w3×3 denotes the weights of the 3×3 convolution, and Fconfusion denotes the multi-scale residual fusion feature.
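The feedback structure of eqs. (21)-(25) can be sketched as follows: each scale is corrected by the rectified difference against the other scale, and the two corrected maps are concatenated and fused. This is a minimal NumPy sketch; the 3×3 convolutions are stubbed as pointwise maps with random weights, so only the feedback structure, not the receptive field, follows the claim.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def conv_stub(x, w):
    """Stand-in for a 3x3 convolution: a pointwise map with weight w (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))

def error_feedback_fusion(f3, f5, rng):
    c = f3.shape[0]
    w = lambda co, ci: rng.standard_normal((co, ci)) * 0.1
    fb3 = relu(conv_stub(f3, w(c, c))) - relu(conv_stub(f5, w(c, c)))  # eq. (21)
    F3 = relu(conv_stub(fb3, w(c, c))) + f3                             # eq. (22)
    fb5 = relu(conv_stub(f5, w(c, c))) - relu(conv_stub(f3, w(c, c)))  # eq. (23)
    F5 = relu(conv_stub(fb5, w(c, c))) + f5                             # eq. (24)
    return conv_stub(np.concatenate([F3, F5]), w(c, 2 * c))             # eq. (25)

rng = np.random.default_rng(4)
f3 = rng.standard_normal((8, 4, 4))   # F_{3x3,in3}, assumed shape
f5 = rng.standard_normal((8, 4, 4))   # F_{5x5,in3}, assumed shape
F_fused = error_feedback_fusion(f3, f5, rng)
print(F_fused.shape)  # (8, 4, 4)
```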
8. The image super-resolution reconstruction method of claim 1, wherein performing feature extraction, fusion and enhancement on the shallow features through a backbone network composed of m cascade residual groups built from multi-scale cascade attention residual modules and a global skip connection to obtain the deep features comprises:
obtaining the deep features from the shallow features through a backbone network consisting of m cascade residual groups, one 3×3 convolution cascaded on the residual feature extraction trunk, and a global skip connection, as given by the following formulas:
Fm=Hcrir,m(Fm-1)=Hcrir,m(Hcrir,m-1(…(Hcrir,1(F0))…)) (26)
FRes=τ(w3×3*Fm+b) (27)
FDF=F0+FRes (28)
wherein Hcrir,m denotes the operation of the mth cascade residual group, FRes denotes the final residual features after one 3×3 convolution layer, FDF denotes the deep features, finally composed of the shallow features and the residual features, and F0 denotes the shallow features.
9. The image super-resolution reconstruction method according to claim 1, wherein the up-sampling of deep features using sub-pixel convolution comprises:
the deep level features extracted through the backbone network are up-sampled by using a layer of sub-pixel convolution, and the specific formula is as follows:
Fup=HSub_pixel(FDF) (29)
wherein HSub_pixel denotes the up-sampling operation using sub-pixel convolution, and Fup denotes the upsampled output features.
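The rearrangement at the heart of sub-pixel convolution, eq. (29), can be sketched as a pixel shuffle that maps (C·r², H, W) features to (C, r·H, r·W). This is a minimal NumPy sketch; the convolution that produces the r²-expanded channels is omitted, and the scale factor r = 2 is an illustrative assumption.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Depth-to-space rearrangement: (C*r*r, H, W) -> (C, H*r, W*r)."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    # group channels as (C, r, r, H, W), then interleave the r-blocks spatially
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

rng = np.random.default_rng(5)
F_DF = rng.standard_normal((3 * 4, 5, 5))  # r=2 -> 3*2*2 feature channels
F_up = pixel_shuffle(F_DF, 2)              # F_up = H_Sub_pixel(F_DF), eq. (29)
print(F_up.shape)  # (3, 10, 10)
```

Each output pixel (c, h·r+i, w·r+j) is taken from input channel c·r²+i·r+j at position (h, w), so no information is discarded by the upsampling itself.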
wherein reconstructing the image using the features obtained by the upsampling to obtain a higher-resolution image comprises:
predicting and reconstructing the high-resolution image ISR from the upsampled features, as given by the following formula:
ISR=HR(Fup) (30)
wherein HR denotes the convolution operation for reconstructing the high-resolution image ISR.
10. An image super-resolution reconstruction apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210147765.2A CN114581300A (en) | 2022-02-17 | 2022-02-17 | Image super-resolution reconstruction method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210147765.2A CN114581300A (en) | 2022-02-17 | 2022-02-17 | Image super-resolution reconstruction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114581300A true CN114581300A (en) | 2022-06-03 |
Family
ID=81773626
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210147765.2A Pending CN114581300A (en) | 2022-02-17 | 2022-02-17 | Image super-resolution reconstruction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114581300A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115358932A (en) * | 2022-10-24 | 2022-11-18 | 山东大学 | Multi-scale feature fusion face super-resolution reconstruction method and system |
CN115546032A (en) * | 2022-12-01 | 2022-12-30 | 泉州市蓝领物联科技有限公司 | Single-frame image super-resolution method based on feature fusion and attention mechanism |
CN116071243A (en) * | 2023-03-27 | 2023-05-05 | 江西师范大学 | Infrared image super-resolution reconstruction method based on edge enhancement |
CN116503260A (en) * | 2023-06-29 | 2023-07-28 | 北京建筑大学 | Image super-resolution reconstruction method, device and equipment |
CN117173025A (en) * | 2023-11-01 | 2023-12-05 | 华侨大学 | Single-frame image super-resolution method and system based on cross-layer mixed attention transducer |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114581300A (en) | Image super-resolution reconstruction method and device | |
CN110136066B (en) | Video-oriented super-resolution method, device, equipment and storage medium | |
CN112862689B (en) | Image super-resolution reconstruction method and system | |
CN107392865B (en) | Restoration method of face image | |
Yan et al. | SRGAT: Single image super-resolution with graph attention network | |
CN111368790A (en) | Construction method, identification method and construction device of fine-grained face identification model | |
Luo et al. | Lattice network for lightweight image restoration | |
CN116309648A (en) | Medical image segmentation model construction method based on multi-attention fusion | |
KR20220000871A (en) | Neural network processing method and apparatus using temporally adaptive, region-selective signaling | |
CN114170167A (en) | Polyp segmentation method and computer device based on attention-guided context correction | |
CN111553861B (en) | Image super-resolution reconstruction method, device, equipment and readable storage medium | |
CN114170099A (en) | Method, system, equipment and storage medium for erasing characters in scenes with arbitrary shapes | |
CN113674156B (en) | Method and system for reconstructing image super-resolution | |
CN114418987A (en) | Retinal vessel segmentation method and system based on multi-stage feature fusion | |
Ahn et al. | Super-resolution convolutional neural networks using modified and bilateral ReLU | |
CN109242919A (en) | A kind of image down sampling method | |
CN103226818B (en) | Based on the single-frame image super-resolution reconstruction method of stream shape canonical sparse support regression | |
CN110458849B (en) | Image segmentation method based on feature correction | |
CN116029905A (en) | Face super-resolution reconstruction method and system based on progressive difference complementation | |
CN114219738A (en) | Single-image multi-scale super-resolution reconstruction network structure and method | |
CN112801912B (en) | Face image restoration method, system, device and storage medium | |
Luo et al. | Deep semantic image compression via cooperative network pruning | |
CN116778539A (en) | Human face image super-resolution network model based on attention mechanism and processing method | |
KR102364628B1 (en) | Video processing method and apparatus | |
CN114299010A (en) | Method and device for segmenting brain tumor image, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||