CN116228542A - Image super-resolution reconstruction method based on cross-scale non-local attention mechanism - Google Patents

Image super-resolution reconstruction method based on cross-scale non-local attention mechanism

Info

Publication number
CN116228542A
Authority
CN
China
Prior art keywords: image, resolution, super, scale, attention mechanism
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310250793.1A
Other languages
Chinese (zh)
Inventor
Li Tianping (李天平)
Li Guanxing (李冠兴)
Wei Yanjun (魏艳军)
Cui Chaotong (崔朝童)
Li Meng (李萌)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University
Priority to CN202310250793.1A
Publication of CN116228542A
Legal status: Pending

Classifications

    • G06T 3/4053 — Super resolution, i.e. output image resolution higher than sensor resolution
    • G06N 3/084 — Backpropagation, e.g. using gradient descent
    • G06T 3/4038 — Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G06T 3/4046 — Scaling the whole image or part thereof using neural networks
    • G06T 5/50 — Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 2207/20221 — Image fusion; image merging
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an image super-resolution reconstruction method and system based on a cross-scale non-local attention mechanism, belonging to the technical field of image processing. A cross-scale non-local attention mechanism is introduced into the image super-resolution reconstruction model to learn and mine the relation between LR features and larger-scale HR patches within the same feature map. A dual regression network is introduced to provide an additional constraint, learning both the mapping from the low-resolution image to the high-resolution image and the dual mapping from the super-resolution image back to the low-resolution image. The model thus adapts more easily to real-world images and can better retrieve high-frequency details from the LR image, yielding more accurate, reliable, and higher-quality reconstruction results, and solving the problems of an excessively large solution space and low reconstruction accuracy in existing image super-resolution methods.

Description

Image super-resolution reconstruction method based on cross-scale non-local attention mechanism
Technical Field
The invention relates to the technical field of image processing, in particular to an image super-resolution reconstruction method based on a cross-scale non-local attention mechanism.
Background
The statements in this section merely relate to the background of the present disclosure and may not necessarily constitute prior art.
Single-image super-resolution reconstruction aims to recover a high-resolution image from an existing low-resolution image. Image super-resolution has great application value in many fields, such as satellite imaging, remote sensing, astronomy, security, and biomedicine. At present, image super-resolution reconstruction commonly adopts convolutional neural networks or generative adversarial networks, such as SRCNN and SRGAN.
Before the rise of deep learning, traditional image super-resolution reconstruction algorithms were dominant, such as methods based on the spatial domain, the frequency domain, non-uniform interpolation, iterative back-projection, projection onto convex sets, statistical learning, and dictionary learning. However, these traditional methods require rich prior knowledge, and the edges and details of the generated super-resolution images are blurred. With the rise of deep learning, Chao Dong et al. first applied deep learning to image super-resolution; their SRCNN network, despite using only three layers, far outperformed traditional super-resolution algorithms. However, because SRCNN depends too heavily on the context of small image regions, converges too slowly during training, and suits only a single upsampling scale, Chao Dong et al. addressed these shortcomings and proposed the faster-training FSRCNN network. Jiwon Kim et al. proposed VDSR, a deeper neural network with residual connections, allowing the network to extract more feature maps and produce richer reconstructed image details. Subsequently, the ESPCN network proposed by Wenzhe Shi et al. enabled real-time processing of 1080P video on a single K2 GPU. The RED-Net model proposed by Mao et al. consists of convolution layers and deconvolution layers arranged symmetrically, i.e., an encoding-decoding structure: the convolution layers extract features and denoise, while the deconvolution layers receive the denoised feature maps and reconstruct them into a high-resolution image, making the restored image clearer.
Image super-resolution reconstruction is an inherently ill-posed problem: infinitely many high-resolution images can be reconstructed from a single low-resolution image. In short, the solution space of the problem is not fixed, and none of the above-mentioned methods places any limitation on the solution space of image super-resolution.
In summary, deep neural networks achieve good image super-resolution (SR) performance by learning a nonlinear mapping function from a low-resolution (LR) image to a high-resolution (HR) image. However, conventional SR reconstruction methods have the following problems:
(1) Learning the mapping from LR images to HR images is a typical ill-posed problem: infinitely many HR images can be downsampled to the same LR image, so the space of possible functions is extremely large, which makes it very difficult to find a good solution.
(2) In practical applications, paired LR and HR images are often unavailable, and the underlying degradation method is often unknown.
(3) Effective high-frequency details cannot be retrieved from the LR image, which affects the accuracy of image super-resolution reconstruction.
Disclosure of Invention
To remedy the above deficiencies of the prior art, the invention provides an image super-resolution reconstruction method, system, electronic device, and computer-readable storage medium based on a cross-scale non-local attention mechanism, and proposes a dual regression network built on the cross-scale non-local attention mechanism. The dual regression network constrains the solution space of image super-resolution so that the reconstructed super-resolution image is closer to the real image; the cross-scale non-local attention mechanism lets the network better mine the cross-scale feature similarity that widely exists in images and integrate it with local priors and in-scale non-local priors, greatly improving the super-resolution reconstruction performance of the model.
In a first aspect, the present invention provides an image super-resolution reconstruction method based on a cross-scale non-local attention mechanism.
An image super-resolution reconstruction method based on a cross-scale non-local attention mechanism comprises the following steps:
acquiring an image to be processed;
inputting an image to be processed into a preset image super-resolution reconstruction model to obtain a super-resolution image;
the image super-resolution reconstruction model comprises a U-Net network, multi-scale self-attention fusion modules, and an up-sampling layer connected in sequence;
the U-Net network is used for extracting shallow features of the image to be processed and obtaining a low-resolution image;
each multi-scale self-attention fusion module introduces a non-local attention mechanism, a cross-scale non-local attention mechanism, and a traditional feature-extraction branch to extract high-frequency features of the low-resolution image, and performs multiple mutual-projection fusion on the high-frequency feature patches output by the three branches to obtain super-resolution feature patches;
and the up-sampling layer concatenates the super-resolution feature patches output by the multi-scale self-attention fusion modules to obtain a super-resolution image.
Further, the image super-resolution reconstruction model further comprises a dual regression network, which cooperates with the U-Net network to calculate the similarity between the super-resolution image and the low-resolution image so as to supervise the super-resolution reconstruction of the low-resolution image.
Further, the non-local attention mechanism is expressed as

$$Z_{i,j}=\sum_{g,h}\frac{\exp\left(\phi\left(X_{i,j},X_{g,h}\right)\right)}{\sum_{u,v}\exp\left(\phi\left(X_{i,j},X_{u,v}\right)\right)}\,\psi\left(X_{g,h}\right)$$

where X is the feature map of the input low-resolution image, ψ(X_{g,h}) is a feature transformation function, and φ(X_{i,j}, X_{g,h}) is a correlation function that measures similarity.
Further, the processing of the low-resolution image by the cross-scale non-local attention mechanism comprises:
downsampling the input low-resolution image by a factor of s to obtain a feature map after resolution conversion;
with each pixel in the feature map representing a patch in the low-resolution map, calculating a softmax matching score between the low-resolution image and the feature map at the pixel level;
and deconvolving the patches in the low-resolution map that match pixels in the feature map according to the softmax matching scores to obtain high-frequency feature patches.
Further, the cross-scale non-local attention mechanism is expressed as

$$Z^{s\times s}_{si,sj}=\sum_{g,h}\frac{\exp\left(\phi\left(X_{i,j},Y_{g,h}\right)\right)}{\sum_{u,v}\exp\left(\phi\left(X_{i,j},Y_{u,v}\right)\right)}\,\bar{X}^{s\times s}_{sg,sh}$$

where Y is the feature map obtained by downsampling the input feature X by a factor of s, Z^{s×s}_{si,sj} is the feature patch of size s×s located at image coordinates (si, sj), and X̄^{s×s}_{sg,sh} is a feature patch extracted from the input features.
Further, the multi-scale self-attention fusion module performing multiple mutual-projection fusion on the high-frequency feature patches output by the non-local attention mechanism, the cross-scale non-local attention mechanism, and the traditional feature-extraction branch comprises:
calculating residual information between the high-frequency feature patch output by the non-local attention mechanism and the high-frequency feature patch output by the cross-scale non-local attention mechanism, applying a single-layer convolution to the residual information, and adding the convolved residual to the high-frequency feature patch output by the non-local attention mechanism to obtain a first high-frequency feature patch;
downsampling the first high-frequency feature patch, calculating the residual between the downsampled first high-frequency feature patch and the high-frequency feature patch output by the traditional feature-extraction branch, and upsampling the residual;
and adding the upsampled residual to the first high-frequency feature patch to obtain a super-resolution feature patch.
Further, there are multiple multi-scale self-attention fusion modules, and together they form a recursive network.
In a second aspect, the present invention provides an image super-resolution reconstruction system based on a cross-scale non-local attention mechanism.
An image super-resolution reconstruction system based on a cross-scale non-local attention mechanism comprises:
an image acquisition module configured to acquire a low-resolution image to be processed;
an image super-resolution reconstruction module configured to input the image to be processed into a preset image super-resolution reconstruction model to obtain a super-resolution image;
wherein the image super-resolution reconstruction model comprises a U-Net network, multiple multi-scale self-attention fusion modules, and an up-sampling layer connected in sequence;
the U-Net network is used for extracting shallow features of the image to be processed and obtaining a low-resolution image;
each multi-scale self-attention fusion module introduces a non-local attention mechanism, a cross-scale non-local attention mechanism, and a traditional feature-extraction branch to extract high-frequency feature patches of the low-resolution image, and performs multiple mutual-projection fusion on the high-frequency features output by the three branches to obtain super-resolution feature patches;
and the up-sampling layer concatenates the super-resolution feature patches output by the multi-scale self-attention fusion modules to obtain the super-resolution image.
In a third aspect, the present invention provides an electronic device;
an electronic device comprising a memory and a processor and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the steps of the above-described method of image super-resolution reconstruction based on a cross-scale non-local attention mechanism.
In a fourth aspect, the present invention provides a computer-readable storage medium;
a computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the above-described method for image super-resolution reconstruction based on a cross-scale non-local attention mechanism.
Compared with the prior art, the invention has the beneficial effects that:
1. In the technical scheme provided by the invention, the space of possible function solutions is reduced by introducing an additional constraint on the LR data: besides the LR-to-HR mapping, a dual regression mapping is additionally learned during training to estimate the downsampling kernel and reconstruct the LR image, forming a closed loop that provides additional supervision. Because the proposed dual regression network does not depend on HR images, it can learn directly from LR images, so the image super-resolution reconstruction model adapts more easily to real-world images.
2. According to the technical scheme provided by the invention, in order to better search high-frequency details from an LR image so as to obtain a more accurate, reliable and high-quality reconstruction result, a trans-scale non-local attention mechanism is introduced into a network, the relation between LR features and large-scale HR plaques in the same feature mapping is learned and mined, and then the relation is integrated with a local prior and a non-local prior in the scale into a self-sample mining module, and the self-sample mining module and a multi-branch mutual projection fusion is carried out. Finally, the module is embedded into a dual regression network for image super-resolution tasks.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a schematic flow chart provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the network architecture of the image super-resolution reconstruction model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the network architecture of the cross-scale non-local attention mechanism according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the network architecture of the multi-scale self-attention fusion module according to an embodiment of the present invention;
FIG. 5 is a schematic diagram comparing the effects of different reconstruction models according to an embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. Furthermore, the terms "comprises" and "comprising" and any variations thereof are intended to cover non-exclusive inclusions; for example, processes, methods, systems, products, or devices that comprise a series of steps or units are not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such processes, methods, products, or devices.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Example 1
In the prior art, the solution space of image super-resolution reconstruction is too large, and the cross-scale feature similarity within images is not well mined. The invention therefore provides an image super-resolution reconstruction method based on a cross-scale non-local attention mechanism, which reduces the space of possible function solutions by introducing an additional constraint on the LR data, and which learns and mines the relation between LR features and larger-scale HR patches within the same feature map through the cross-scale non-local attention mechanism.
Next, the image super-resolution reconstruction method based on the cross-scale non-local attention mechanism disclosed in this embodiment will be described in detail with reference to fig. 1 to 5.
The image super-resolution reconstruction method based on the cross-scale non-local attention mechanism comprises the following steps:
s1, acquiring an image to be processed.
S2, inputting the image to be processed into a preset image super-resolution reconstruction model to obtain a super-resolution image; the image super-resolution reconstruction model comprises a U-Net network, multi-scale self-attention fusion modules, an up-sampling layer, and a dual regression network connected in sequence, where each multi-scale self-attention fusion module comprises a non-local attention branch, a cross-scale non-local attention branch, a traditional feature-extraction branch, and a feature fusion module.
The U-Net network is used for extracting shallow features of the image to be processed and obtaining a low-resolution image; each multi-scale self-attention fusion module introduces a non-local attention mechanism, a cross-scale non-local attention mechanism, and a traditional feature-extraction branch to extract high-frequency features of the low-resolution image, and performs multiple mutual-projection fusion on the high-frequency feature patches output by the three branches to obtain super-resolution feature patches; the up-sampling layer concatenates the super-resolution feature patches output by the multi-scale self-attention fusion modules to obtain the super-resolution image; and the dual regression network cooperates with the U-Net network to calculate the similarity between the super-resolution image and the low-resolution image so as to supervise the super-resolution reconstruction of the low-resolution image.
Specifically, the overall network architecture of the image super-resolution reconstruction model is shown in fig. 2, and the network is built on the design of U-Net. The CNDRN network consists of two parts: an original network and a dual network. The original network adopts the U-Net design of a downsampling module (left half of fig. 2) and an upsampling module (right half of fig. 2), where each sampling module comprises log_2(s) basic blocks, with s the scale factor; since s is 4 here, the up-sampling and down-sampling modules each consist of two basic blocks. To limit the solution space of image super-resolution, we introduce a dual network that learns the image downsampling operation from the generated super-resolution image; downsampling is much easier than the upsampling mapping, so the dual network has a simpler structure, with only two convolution layers and one LeakyReLU activation layer.
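A minimal sketch of such a 2× dual (downsampling) block is given below, assuming PyTorch; the kernel sizes, stride, and channel widths are assumptions, since the text only specifies two convolution layers and one LeakyReLU activation layer:

```python
import torch.nn as nn

class DualBlock(nn.Module):
    """Hypothetical 2x downsampling dual branch: two conv layers and one LeakyReLU."""
    def __init__(self, channels=3):
        super().__init__()
        self.body = nn.Sequential(
            # strided convolution halves the spatial resolution
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, x):
        return self.body(x)
```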
Most existing methods focus only on learning the LR-to-HR mapping, but the space of possible mapping functions can be very large, which greatly increases the training difficulty. To solve this problem, this embodiment introduces a dual regression network (Dual Regression Network, DRN) to provide an additional constraint.
Illustratively, during training the network learns not only the mapping from the low-resolution image to the high-resolution image (i.e., LR→HR) but also, at the same time, the dual mapping from the super-resolution image back to the low-resolution image (i.e., HR→LR).
Denote the set of LR images by X, with each LR image denoted x_i, and denote the corresponding set of HR images by Y, with each HR image denoted y_i. The above SR problem can then be expressed as the following two dual regression tasks:
Task one: the network learns a mapping P from X to Y, such that the super-resolution image P(x_i) is as similar as possible to the corresponding HR image y_i;
Task two: the network learns a mapping D from Y to X, such that the dual-mapped low-resolution image D(y_i) is as similar as possible to the originally input LR image x_i.
By jointly learning the two tasks above, the original learning task and the dual learning task form a closed loop and together provide information to train the mappings P and D. If P(x_i) is a satisfactory HR image, then D(P(x_i)) should be very close to the input LR image x_i. In summary, the loss function during training is shown in equation (1):
$$\mathcal{L}=\sum_{i=1}^{N} L_{P}\big(P(x_i),\,y_i\big)+\lambda\,L_{D}\big(D(P(x_i)),\,x_i\big) \qquad (1)$$

where N represents the number of matched LR-HR pairs; L_P and L_D represent the loss functions of the original network and the dual network respectively, both using the mean absolute error (Mean Absolute Error); and λ controls the weight of the dual loss. After adding the above constraint, the space of possible function mappings is greatly reduced.
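As a minimal sketch of this closed-loop objective of equation (1), assuming PyTorch, with the mean absolute error realized as L1 losses; `primal` and `dual` stand for the mappings P and D, and the default weight `lam` is illustrative, not taken from the patent:

```python
import torch
import torch.nn.functional as F

def dual_regression_loss(primal, dual, lr, hr, lam=0.1):
    """Closed-loop loss of equation (1): L_P(P(x), y) + lambda * L_D(D(P(x)), x)."""
    sr = primal(lr)                     # P(x_i): super-resolved image
    loss_p = F.l1_loss(sr, hr)          # L_P: primal loss against the HR target
    loss_d = F.l1_loss(dual(sr), lr)    # L_D: dual loss, downsampled SR vs. input LR
    return loss_p + lam * loss_d
```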
Before introducing the multi-scale self-attention fusion module (MSAF module for short), the non-local attention mechanism within a single scale and the cross-scale non-local attention mechanism are first introduced respectively.
The non-local attention mechanism can explore self-exemplars by summarizing relevant features from the whole image. Formally, given an image feature map X, non-local attention can be expressed as equation (2):

$$Z_{i,j}=\sum_{g,h}\frac{\exp\left(\phi\left(X_{i,j},X_{g,h}\right)\right)}{\sum_{u,v}\exp\left(\phi\left(X_{i,j},X_{u,v}\right)\right)}\,\psi\left(X_{g,h}\right) \qquad (2)$$

where X is the feature map of the input low-resolution image, ψ(X_{g,h}) is a feature transformation function, and φ(X_{i,j}, X_{g,h}) is a correlation function measuring similarity, which can be expressed as equation (3), where θ(X_{i,j}) and δ(X_{g,h}) are also feature transformation functions, (i, j), (g, h), and (u, v) are coordinate pairs in X, and T denotes transposition:

$$\phi\left(X_{i,j},X_{g,h}\right)=\theta\left(X_{i,j}\right)^{T}\delta\left(X_{g,h}\right) \qquad (3)$$
The non-local attention mechanism above is computed within a single scale and is called In-Scale Non-Local attention (ISNL). To measure the correlation between pixels in LR images and patches at different scales, we introduce a Cross-Scale Non-Local attention mechanism (CSNL). Unlike ISNL, which measures cross-correlations between low-resolution pixels, CSNL measures cross-correlations between low-resolution pixels in the LR image and their corresponding cross-scale patches.
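A minimal sketch of ISNL under equations (2) and (3), assuming PyTorch; using 1×1 convolutions for the transformation functions θ, δ, and ψ is a common choice, not something the patent specifies:

```python
import torch
import torch.nn as nn

class InScaleNonLocalAttention(nn.Module):
    """Sketch of ISNL: every position attends to all positions of one feature map."""
    def __init__(self, channels):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels, 1)  # theta in equation (3)
        self.delta = nn.Conv2d(channels, channels, 1)  # delta in equation (3)
        self.psi = nn.Conv2d(channels, channels, 1)    # psi in equation (2)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.delta(x).flatten(2)                   # (B, C, HW)
        v = self.psi(x).flatten(2).transpose(1, 2)     # (B, HW, C)
        attn = torch.softmax(q @ k, dim=-1)            # softmax-normalized phi
        return (attn @ v).transpose(1, 2).reshape(b, c, h, w)
```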
As shown in FIG. 3, suppose the spatial size of the given input feature X is (W, H). Because of the difference in spatial dimensions, it is difficult to match pixels against patches directly with an ordinary similarity measure, so we first downsample the input feature X by a factor of s to obtain a feature map Y of spatial size (W/s, H/s). Each pixel in Y then represents an s×s patch in X; we compute a softmax matching score at the pixel level between X and Y, and finally deconvolve the s×s patches in X that match the pixels in Y for use in super-resolution reconstruction, so the spatial size of the final output feature Z is sW × sH.
Based on equation (2), the expression for CSNL can be written as equation (4):

$$Z^{s\times s}_{si,sj}=\sum_{g,h}\frac{\exp\left(\phi\left(X_{i,j},Y_{g,h}\right)\right)}{\sum_{u,v}\exp\left(\phi\left(X_{i,j},Y_{u,v}\right)\right)}\,\bar{X}^{s\times s}_{sg,sh} \qquad (4)$$

where Z^{s×s}_{si,sj} denotes the output feature patch of size s×s located at image coordinates (si, sj), and X̄^{s×s}_{sg,sh} denotes the s×s feature patch extracted from the input feature X. The output feature patch is thus obtained as a direct weighted average of the patches extracted from the input features.
Intuitively, richer and more reliable high-frequency details are mined from the image's own intrinsic resources through the cross-scale non-local attention module.
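A minimal sketch of CSNL under equation (4), assuming PyTorch; average pooling as the s× downsampling, identity embeddings, and a scaled dot product for φ are simplifying assumptions (the sketch also assumes H and W are divisible by s):

```python
import torch
import torch.nn.functional as F

def cross_scale_attention(x, s=2):
    """Sketch of CSNL: LR pixels attend to s x s patches of the same feature map."""
    b, c, h, w = x.shape
    y = F.avg_pool2d(x, s)                              # downsampled map Y, (B, C, h/s, w/s)
    q = x.flatten(2).transpose(1, 2)                    # queries: pixels of X, (B, h*w, C)
    k = y.flatten(2)                                    # keys: pixels of Y, (B, C, h*w/s^2)
    attn = torch.softmax(q @ k / c ** 0.5, dim=-1)      # softmax matching scores
    v = F.unfold(x, kernel_size=s, stride=s)            # values: s x s patches of X
    out = (attn @ v.transpose(1, 2)).transpose(1, 2)    # weighted average of patches
    # "deconvolve" the matched patches: fold one s x s patch per X pixel -> (s*h, s*w)
    return F.fold(out, (h * s, w * s), kernel_size=s, stride=s)
```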
To integrate all the obtainable intrinsic priors with rich external image priors, the MSAF module is introduced; its structure is shown in fig. 4. In the MSAF module, a multi-branch structure is used to mine the self-similarity of images and learn new information, comprising the traditional Local branch, the ISNL branch, and the CSNL branch. After the ISNL branch is deconvolved, the three branches undergo multiple mutual-projection fusion, as follows. First, the residual R_IC between the ISNL output feature F_I and the CSNL output feature F_C is computed; R_IC represents details that are present in one branch and missing in the other, and such residual projection lets the network focus only on the information that differs between sources while ignoring the common information, improving the network's discrimination ability. A single-layer convolution is then applied to the residual information R_IC, and the result is added to F_I to obtain F_IC. Next, F_IC is downsampled and a residual is computed against the output feature F_L of the Local branch; this residual is upsampled and added to F_IC to obtain the result of the multiple mutual-projection fusion. Through these operations, residual learning is preserved while different feature sources are fused; compared with simple addition or concatenation, the fusion effect is better and the discrimination performance is stronger.
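A minimal sketch of this multiple mutual-projection fusion, assuming PyTorch; the single 3×3 convolution width and the bilinear resampling are assumptions, and `f_i`, `f_c`, `f_l` stand for the ISNL, CSNL, and Local branch outputs:

```python
import torch.nn as nn
import torch.nn.functional as F

class MutualProjectionFusion(nn.Module):
    """Sketch of the MSAF fusion: residual projections between the three branches."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)  # single-layer conv on R_IC

    def forward(self, f_i, f_c, f_l):
        # f_i, f_c: upsampled ISNL/CSNL outputs (same size); f_l: Local branch output
        r_ic = f_i - f_c                          # residual R_IC between ISNL and CSNL
        f_ic = f_i + self.conv(r_ic)              # first fused feature F_IC
        down = F.interpolate(f_ic, size=f_l.shape[-2:], mode='bilinear', align_corners=False)
        r = down - f_l                            # residual against the Local branch
        up = F.interpolate(r, size=f_ic.shape[-2:], mode='bilinear', align_corners=False)
        return f_ic + up                          # multiple mutual-projection fusion result
```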
The repeated MSAF modules are embedded in a recurrent framework, as shown in FIG. 2. At each iteration, the result of the multiple mutual-projection fusion serves, on the one hand, as the hidden unit H_i of the MSAF and is output directly; the outputs H_i of the several MSAF modules are concatenated and fed into the up-sampling layer. On the other hand, the result is passed through a two-layer CNN to produce L_i, which is delivered to the next MSAF module, so the MSAF modules in fact form a recursive network.
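A brief sketch of this recursion, where `msaf_modules`, `two_layer_cnn`, and `upsample` are hypothetical stand-ins for the components described above:

```python
import torch

def recursive_msaf(shallow_features, msaf_modules, two_layer_cnn, upsample):
    """Sketch of the recurrent MSAF framework: each fusion result is emitted as H_i
    and also recycled through a two-layer CNN as L_i for the next module."""
    hidden_states, l = [], shallow_features
    for msaf in msaf_modules:
        h = msaf(l)                  # hidden unit H_i: mutual-projection fusion result
        hidden_states.append(h)
        l = two_layer_cnn(h)         # L_i, input to the next MSAF module
    return upsample(torch.cat(hidden_states, dim=1))
```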
Illustratively, as shown in fig. 2, the image to be reconstructed is input into the preset image super-resolution reconstruction model. First, shallow feature extraction and downsampling are performed by the first fused inverted-residual block; the output shallow feature map is, on the one hand, residually connected to the corresponding deep feature map and participates in the reconstruction of the super-resolution image, and on the other hand, is normalized and fed into the second fused inverted-residual block for further feature extraction and residual connection with the corresponding deep feature map. After feature extraction and normalization by the second block, the output feature map enters a multi-scale self-attention fusion module to mine the cross-scale self-similarity within the feature map, and a single convolution layer directly reconstructs a 2× reconstructed image. This reconstructed image is used to compute the similarity with the 2× low-resolution image obtained by the downsampling of the regression network. After the output features of the attention module are concatenated and upsampled, the operation above is repeated: a 2× reconstructed image is produced directly, and the output features are fed into the next multi-scale self-attention fusion module; the multi-scale self-attention outputs are then concatenated, upsampled, and reconstructed to obtain the required 4× high-resolution image. In the dual network, the reconstructed 4× high-resolution image is downsampled by 2× and 4× in turn to obtain 2× and 1× low-resolution images, which are compared for similarity with the previously obtained 2× and 1× reconstructed images, respectively.
Example two
The embodiment discloses an image super-resolution reconstruction system based on a trans-scale non-local attention mechanism, comprising:
an image acquisition module configured to: acquiring a low-resolution image to be processed;
an image super-resolution reconstruction module configured to: inputting an image to be processed into a preset image super-resolution reconstruction model to obtain a super-resolution image;
the image super-resolution reconstruction model comprises a U-Net network, a plurality of multi-scale self-attention fusion modules and an up-sampling layer which are connected in sequence;
the U-Net network is used for extracting shallow layer characteristics of an image to be processed and obtaining a low-resolution image;
the multi-scale self-attention fusion module is used for introducing a non-local attention mechanism, a cross-scale non-local attention mechanism and a traditional feature extraction branch to extract a high-frequency feature subgraph of a low-resolution image, and carrying out multiple mutual mapping fusion on the high-frequency features output by the non-local attention mechanism, the cross-scale non-local attention mechanism and the traditional feature extraction branch to obtain a super-resolution feature subgraph;
and the up-sampling layer splices the super-resolution characteristic blocks output by the multi-scale self-attention fusion modules to acquire a super-resolution image.
It should be noted that the image acquisition module and the image super-resolution reconstruction module correspond to the steps in the first embodiment; the examples and application scenarios implemented by the modules are the same as those of the corresponding steps, but are not limited to the disclosure of the first embodiment. The modules described above may be implemented, as part of a system, in a computer system, for example as a set of computer-executable instructions.
Example III
The third embodiment of the invention provides an electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor; when the computer instructions are executed by the processor, the steps of the above image super-resolution reconstruction method based on a cross-scale non-local attention mechanism are completed.
Example IV
The fourth embodiment of the present invention provides a computer readable storage medium, configured to store computer instructions, where the computer instructions, when executed by a processor, complete the steps of the above-mentioned image super-resolution reconstruction method based on a cross-scale non-local attention mechanism.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing embodiments are described in a progressive manner; for details of one embodiment, reference may be made to the related descriptions of the other embodiments.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An image super-resolution reconstruction method based on a cross-scale non-local attention mechanism, characterized by comprising the following steps:
acquiring an image to be processed;
inputting an image to be processed into a preset image super-resolution reconstruction model to obtain a super-resolution image;
wherein the image super-resolution reconstruction model comprises a U-Net network, multi-scale self-attention fusion modules, and an up-sampling layer connected in sequence;
the U-Net network is used for extracting shallow features of the image to be processed and obtaining a low-resolution image;
each multi-scale self-attention fusion module introduces a non-local attention mechanism, a cross-scale non-local attention mechanism, and a traditional feature-extraction branch to extract high-frequency features of the low-resolution image, and performs multiple mutual-projection fusion on the high-frequency feature patches output by the three branches to obtain super-resolution feature patches;
and the up-sampling layer concatenates the super-resolution feature patches output by the multi-scale self-attention fusion modules to obtain a super-resolution image.
2. The image super-resolution reconstruction method based on a cross-scale non-local attention mechanism according to claim 1, wherein the image super-resolution reconstruction model further comprises a dual regression network, the dual regression network cooperating with the U-Net network to calculate the similarity between the super-resolution image and the low-resolution image so as to supervise the super-resolution reconstruction of the low-resolution image.
3. The image super-resolution reconstruction method based on a cross-scale non-local attention mechanism according to claim 1, wherein the non-local attention mechanism is expressed as

$$Z_{i,j}=\sum_{g,h}\frac{\exp\left(\phi\left(X_{i,j},X_{g,h}\right)\right)}{\sum_{u,v}\exp\left(\phi\left(X_{i,j},X_{u,v}\right)\right)}\,\psi\left(X_{g,h}\right)$$

where X is the feature map of the input low-resolution image, ψ(X_{g,h}) is a feature transformation function, and φ(X_{i,j}, X_{g,h}) is a correlation function that measures similarity.
4. The image super-resolution reconstruction method based on a cross-scale non-local attention mechanism according to claim 1, wherein the processing of the low-resolution image by the cross-scale non-local attention mechanism comprises:
downsampling the input low-resolution image by a factor of s to obtain a feature map after resolution conversion;
with each pixel in the feature map representing a patch in the low-resolution map, calculating a softmax matching score between the low-resolution image and the feature map at the pixel level;
and deconvolving the patches in the low-resolution map that match pixels in the feature map according to the softmax matching scores to obtain high-frequency feature patches.
5. The image super-resolution reconstruction method based on a cross-scale non-local attention mechanism according to claim 1, wherein the cross-scale non-local attention mechanism is expressed as

$$Z^{s\times s}_{si,sj}=\sum_{g,h}\frac{\exp\left(\phi\left(X_{i,j},Y_{g,h}\right)\right)}{\sum_{u,v}\exp\left(\phi\left(X_{i,j},Y_{u,v}\right)\right)}\,\bar{X}^{s\times s}_{sg,sh}$$

where Y is the feature map obtained by downsampling the input feature X by a factor of s, Z^{s×s}_{si,sj} is the feature patch of size s×s located at image coordinates (si, sj), and X̄^{s×s}_{sg,sh} is a feature patch extracted from the input features.
6. The image super-resolution reconstruction method based on a cross-scale non-local attention mechanism according to claim 1, wherein the multi-scale self-attention fusion module performing multiple mutual-projection fusion on the high-frequency feature patches output by the non-local attention mechanism, the cross-scale non-local attention mechanism, and the traditional feature-extraction branch comprises:
calculating residual information between the high-frequency feature patch output by the non-local attention mechanism and the high-frequency feature patch output by the cross-scale non-local attention mechanism, applying a single-layer convolution to the residual information, and adding the convolved residual to the high-frequency feature patch output by the non-local attention mechanism to obtain a first high-frequency feature patch;
downsampling the first high-frequency feature patch, calculating the residual between the downsampled first high-frequency feature patch and the high-frequency feature patch output by the traditional feature-extraction branch, and upsampling the residual;
and adding the upsampled residual to the first high-frequency feature patch to obtain a super-resolution feature patch.
7. The image super-resolution reconstruction method based on a cross-scale non-local attention mechanism according to claim 1, wherein there are multiple multi-scale self-attention fusion modules, and the multiple multi-scale self-attention fusion modules form a recursive network.
8. An image super-resolution reconstruction system based on a cross-scale non-local attention mechanism, characterized by comprising:
an image acquisition module configured to acquire a low-resolution image to be processed;
an image super-resolution reconstruction module configured to input the image to be processed into a preset image super-resolution reconstruction model to obtain a super-resolution image;
wherein the image super-resolution reconstruction model comprises a U-Net network, multiple multi-scale self-attention fusion modules, and an up-sampling layer connected in sequence;
the U-Net network is used for extracting shallow features of the image to be processed and obtaining a low-resolution image;
each multi-scale self-attention fusion module introduces a non-local attention mechanism, a cross-scale non-local attention mechanism, and a traditional feature-extraction branch to extract high-frequency feature patches of the low-resolution image, and performs multiple mutual-projection fusion on the high-frequency features output by the three branches to obtain super-resolution feature patches;
and the up-sampling layer concatenates the super-resolution feature patches output by the multi-scale self-attention fusion modules to obtain a super-resolution image.
9. An electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method according to any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method according to any one of claims 1-7.
CN202310250793.1A 2023-03-10 2023-03-10 Image super-resolution reconstruction method based on cross-scale non-local attention mechanism Pending CN116228542A (en)


Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310250793.1A CN116228542A (en) 2023-03-10 2023-03-10 Image super-resolution reconstruction method based on cross-scale non-local attention mechanism

Publications (1)

Publication Number Publication Date
CN116228542A 2023-06-06

Family

ID=86587287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310250793.1A Pending CN116228542A (en) 2023-03-10 2023-03-10 Image super-resolution reconstruction method based on trans-scale non-local attention mechanism

Country Status (1)

Country Link
CN (1) CN116228542A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523759A (en) * 2023-07-04 2023-08-01 江西财经大学 Image super-resolution reconstruction method and system based on frequency decomposition and restarting mechanism
CN116523759B (en) * 2023-07-04 2023-09-05 江西财经大学 Image super-resolution reconstruction method and system based on frequency decomposition and restarting mechanism

Similar Documents

Publication Publication Date Title
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
WO2018120329A1 (en) Single-frame super-resolution reconstruction method and device based on sparse domain reconstruction
US20200380294A1 (en) Method and apparatus for sar image recognition based on multi-scale features and broad learning
CN110956126B (en) Small target detection method combined with super-resolution reconstruction
Hui et al. Progressive perception-oriented network for single image super-resolution
CN110501072B (en) Reconstruction method of snapshot type spectral imaging system based on tensor low-rank constraint
CN112767468A (en) Self-supervision three-dimensional reconstruction method and system based on collaborative segmentation and data enhancement
CN109636721B (en) Video super-resolution method based on countermeasure learning and attention mechanism
Liu et al. Effective image super resolution via hierarchical convolutional neural network
WO2024027095A1 (en) Hyperspectral imaging method and system based on double rgb image fusion, and medium
Zuo et al. Residual dense network for intensity-guided depth map enhancement
CN116228542A (en) Image super-resolution reconstruction method based on cross-scale non-local attention mechanism
Chen et al. RBPNET: An asymptotic Residual Back-Projection Network for super-resolution of very low-resolution face image
CN111325697B (en) Color image restoration method based on tensor eigen transformation
Song et al. Deep memory-augmented proximal unrolling network for compressive sensing
Liu et al. PDR-Net: Progressive depth reconstruction network for color guided depth map super-resolution
Cao et al. Diffusion model with disentangled modulations for sharpening multispectral and hyperspectral images
CN113393385B (en) Multi-scale fusion-based unsupervised rain removing method, system, device and medium
CN112686830B (en) Super-resolution method of single depth map based on image decomposition
CN104463962A (en) Three-dimensional scene reconstruction method based on GPS information video
Wang et al. Joint depth map super-resolution method via deep hybrid-cross guidance filter
Wang et al. Msfnet: multistage fusion network for infrared and visible image fusion
CN111815690B (en) Method, system and computer equipment for real-time splicing of microscopic images
CN116740362B (en) Attention-based lightweight asymmetric scene semantic segmentation method and system
Wu et al. Meta transfer learning-based super-resolution infrared imaging

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination