CN116228542A - Image super-resolution reconstruction method based on cross-scale non-local attention mechanism - Google Patents

Image super-resolution reconstruction method based on cross-scale non-local attention mechanism

Info

Publication number
CN116228542A
Authority
CN
China
Prior art keywords: image, resolution, super, scale, attention mechanism
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310250793.1A
Other languages
Chinese (zh)
Inventor
Li Tianping (李天平)
Li Guanxing (李冠兴)
Wei Yanjun (魏艳军)
Cui Chaotong (崔朝童)
Li Meng (李萌)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University
Priority to CN202310250793.1A
Publication of CN116228542A
Legal status: Pending

Classifications

    • G06T 3/4053 — Super resolution, i.e. output image resolution higher than sensor resolution
    • G06N 3/084 — Backpropagation, e.g. using gradient descent
    • G06T 3/4038 — Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G06T 3/4046 — Scaling the whole image or part thereof using neural networks
    • G06T 5/50 — Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 2207/20221 — Image fusion; image merging
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an image super-resolution reconstruction method and system based on a cross-scale non-local attention mechanism, belonging to the technical field of image processing. A cross-scale non-local attention mechanism is introduced into the image super-resolution reconstruction model to learn and mine the relation between LR features and larger-scale HR patches within the same feature map. A dual regression network is introduced to provide an additional constraint, learning both the mapping from the low-resolution image to the high-resolution image and the dual mapping from the super-resolution image back to the low-resolution image. The model thus adapts more easily to real-world images and can better retrieve high-frequency details from the LR image, yielding more accurate, reliable, and higher-quality reconstruction results, and solving the problems of an excessively large solution space and low reconstruction accuracy in existing image super-resolution methods.

Description

Image super-resolution reconstruction method based on cross-scale non-local attention mechanism
Technical Field
The invention relates to the technical field of image processing, in particular to an image super-resolution reconstruction method based on a cross-scale non-local attention mechanism.
Background
The statements in this section merely relate to the background of the present disclosure and may not necessarily constitute prior art.
Single-image super-resolution reconstruction aims to recover a high-resolution image from an existing low-resolution image. Image super-resolution has great application value in many fields, such as satellite imaging, remote sensing, astronomy, security, and biomedicine. At present, image super-resolution reconstruction commonly adopts convolutional neural networks or generative adversarial networks, such as SRCNN and SRGAN.
Before the rise of deep learning, traditional image super-resolution reconstruction algorithms were dominant, such as methods based on the spatial domain, the frequency domain, non-uniform interpolation, iterative back-projection, projection onto convex sets, statistical learning, and dictionary learning. However, these traditional methods require rich prior knowledge, and the edges and details of the generated super-resolution images are blurred. With the rise of deep learning, Chao Dong et al. first applied deep learning to image super-resolution; their SRCNN network, despite using only three layers, far outperformed traditional super-resolution algorithms. However, because SRCNN depends too heavily on the context of small image regions, converges too slowly during training, and suits only a single upsampling scale, Chao Dong et al. addressed these shortcomings and proposed the faster-training FSRCNN network. Jiwon Kim et al. proposed VDSR, a deeper neural network with residual connections, allowing the network to extract more feature maps and produce richer reconstructed image details. Subsequently, the ESPCN network proposed by Wenzhe Shi et al. enabled real-time processing of 1080P video on a single K2 GPU. The RED-Net model proposed by Mao et al. consists of convolution layers and deconvolution layers arranged symmetrically, i.e., an encoding-decoding structure: the convolution layers extract features and denoise, while the deconvolution layers receive the denoised feature maps and reconstruct them into a high-resolution image, making the restored image clearer.
Image super-resolution reconstruction is an inherently ill-posed problem: infinitely many high-resolution images can be reconstructed from a single low-resolution image. In short, the solution space of the problem is not fixed, and none of the above-mentioned methods places any limitation on the solution space of image super-resolution.
In summary, deep neural networks achieve good image super-resolution (SR) performance by learning a nonlinear mapping function from a low-resolution (LR) image to a high-resolution (HR) image. However, conventional SR reconstruction methods have the following problems:
(1) Learning the mapping from LR images to HR images is a typical ill-posed problem: infinitely many HR images can be downsampled to the same LR image, so the space of possible functions is extremely large, which makes it very difficult to find a good solution.
(2) In practical applications, paired LR and HR images are often unavailable, and the underlying degradation method is often unknown.
(3) Effective high-frequency details cannot be retrieved from the LR image, which affects the accuracy of image super-resolution reconstruction.
Disclosure of Invention
To remedy the above deficiencies of the prior art, the invention provides an image super-resolution reconstruction method, system, electronic device, and computer-readable storage medium based on a cross-scale non-local attention mechanism, and proposes a dual regression network built on the cross-scale non-local attention mechanism. The dual regression network constrains the solution space of image super-resolution so that the reconstructed super-resolution image is closer to the real image; the cross-scale non-local attention mechanism lets the network better mine the cross-scale feature similarity that widely exists in images and integrate it with local priors and in-scale non-local priors, greatly improving the super-resolution reconstruction performance of the model.
In a first aspect, the present invention provides an image super-resolution reconstruction method based on a cross-scale non-local attention mechanism.
An image super-resolution reconstruction method based on a cross-scale non-local attention mechanism comprises the following steps:
acquiring an image to be processed;
inputting an image to be processed into a preset image super-resolution reconstruction model to obtain a super-resolution image;
the image super-resolution reconstruction model comprises a U-Net network, multi-scale self-attention fusion modules, and an up-sampling layer connected in sequence;
the U-Net network is used for extracting shallow features of the image to be processed and obtaining a low-resolution image;
each multi-scale self-attention fusion module introduces a non-local attention mechanism, a cross-scale non-local attention mechanism, and a traditional feature-extraction branch to extract high-frequency features of the low-resolution image, and performs multiple mutual-projection fusion on the high-frequency feature patches output by the three branches to obtain super-resolution feature patches;
and the up-sampling layer concatenates the super-resolution feature patches output by the multi-scale self-attention fusion modules to obtain a super-resolution image.
Further, the image super-resolution reconstruction model further comprises a dual regression network, which cooperates with the U-Net network to calculate the similarity between the super-resolution image and the low-resolution image so as to supervise the super-resolution reconstruction of the low-resolution image.
Further, the non-local attention mechanism is expressed as

$$Z_{i,j}=\sum_{g,h}\frac{\exp\left(\phi\left(X_{i,j},X_{g,h}\right)\right)}{\sum_{u,v}\exp\left(\phi\left(X_{i,j},X_{u,v}\right)\right)}\,\psi\left(X_{g,h}\right)$$

where X is the feature map of the input low-resolution image, ψ(X_{g,h}) is a feature transformation function, and φ(X_{i,j}, X_{g,h}) is a correlation function that measures similarity.
Further, the processing of the low-resolution image by the cross-scale non-local attention mechanism comprises:
downsampling the input low-resolution image by a factor of s to obtain a feature map after resolution conversion;
with each pixel in the feature map representing a patch in the low-resolution map, calculating a softmax matching score between the low-resolution image and the feature map at the pixel level;
and deconvolving the patches in the low-resolution map that match pixels in the feature map according to the softmax matching scores to obtain high-frequency feature patches.
Further, the cross-scale non-local attention mechanism is expressed as

$$Z^{s\times s}_{si,sj}=\sum_{g,h}\frac{\exp\left(\phi\left(X_{i,j},Y_{g,h}\right)\right)}{\sum_{u,v}\exp\left(\phi\left(X_{i,j},Y_{u,v}\right)\right)}\,\bar{X}^{s\times s}_{sg,sh}$$

where Y is the feature map obtained by downsampling the input feature X by a factor of s, Z^{s×s}_{si,sj} is the feature patch of size s×s located at image coordinates (si, sj), and X̄^{s×s}_{sg,sh} is a feature patch extracted from the input features.
Further, the multi-scale self-attention fusion module performing multiple mutual-projection fusion on the high-frequency feature patches output by the non-local attention mechanism, the cross-scale non-local attention mechanism, and the traditional feature-extraction branch comprises:
calculating residual information between the high-frequency feature patch output by the non-local attention mechanism and the high-frequency feature patch output by the cross-scale non-local attention mechanism, applying a single-layer convolution to the residual information, and adding the convolved residual to the high-frequency feature patch output by the non-local attention mechanism to obtain a first high-frequency feature patch;
downsampling the first high-frequency feature patch, calculating the residual between the downsampled first high-frequency feature patch and the high-frequency feature patch output by the traditional feature-extraction branch, and upsampling the residual;
and adding the upsampled residual to the first high-frequency feature patch to obtain a super-resolution feature patch.
Further, there are multiple multi-scale self-attention fusion modules, and together they form a recursive network.
In a second aspect, the present invention provides an image super-resolution reconstruction system based on a cross-scale non-local attention mechanism.
An image super-resolution reconstruction system based on a cross-scale non-local attention mechanism comprises:
an image acquisition module configured to acquire a low-resolution image to be processed;
an image super-resolution reconstruction module configured to input the image to be processed into a preset image super-resolution reconstruction model to obtain a super-resolution image;
wherein the image super-resolution reconstruction model comprises a U-Net network, multiple multi-scale self-attention fusion modules, and an up-sampling layer connected in sequence;
the U-Net network is used for extracting shallow features of the image to be processed and obtaining a low-resolution image;
each multi-scale self-attention fusion module introduces a non-local attention mechanism, a cross-scale non-local attention mechanism, and a traditional feature-extraction branch to extract high-frequency feature patches of the low-resolution image, and performs multiple mutual-projection fusion on the high-frequency features output by the three branches to obtain super-resolution feature patches;
and the up-sampling layer concatenates the super-resolution feature patches output by the multi-scale self-attention fusion modules to obtain the super-resolution image.
In a third aspect, the present invention provides an electronic device;
an electronic device comprising a memory and a processor and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the steps of the above-described method of image super-resolution reconstruction based on a cross-scale non-local attention mechanism.
In a fourth aspect, the present invention provides a computer-readable storage medium;
a computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the above-described method for image super-resolution reconstruction based on a cross-scale non-local attention mechanism.
Compared with the prior art, the invention has the beneficial effects that:
1. In the technical scheme provided by the invention, the space of possible function solutions is reduced by introducing an additional constraint on the LR data: besides the LR-to-HR mapping, a dual regression mapping is additionally learned during training to estimate the downsampling kernel and reconstruct the LR image, forming a closed loop that provides additional supervision. Because the proposed dual regression network does not depend on HR images, it can learn directly from LR images, so the image super-resolution reconstruction model adapts more easily to real-world images.
2. According to the technical scheme provided by the invention, in order to better search high-frequency details from an LR image so as to obtain a more accurate, reliable and high-quality reconstruction result, a trans-scale non-local attention mechanism is introduced into a network, the relation between LR features and large-scale HR plaques in the same feature mapping is learned and mined, and then the relation is integrated with a local prior and a non-local prior in the scale into a self-sample mining module, and the self-sample mining module and a multi-branch mutual projection fusion is carried out. Finally, the module is embedded into a dual regression network for image super-resolution tasks.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a schematic flow chart provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the network architecture of the image super-resolution reconstruction model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the network architecture of the cross-scale non-local attention mechanism according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the network architecture of the multi-scale self-attention fusion module according to an embodiment of the present invention;
FIG. 5 is a schematic diagram comparing the effects of different reconstruction models according to an embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. Furthermore, the terms "comprises" and "comprising" and any variations thereof are intended to cover non-exclusive inclusions; for example, processes, methods, systems, products, or devices that comprise a series of steps or units are not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such processes, methods, products, or devices.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Example 1
In the prior art, the solution space of image super-resolution reconstruction is too large, and the cross-scale feature similarity within images is not well mined. The invention therefore provides an image super-resolution reconstruction method based on a cross-scale non-local attention mechanism, which reduces the space of possible function solutions by introducing an additional constraint on the LR data, and which learns and mines the relation between LR features and larger-scale HR patches within the same feature map through the cross-scale non-local attention mechanism.
Next, the image super-resolution reconstruction method based on the cross-scale non-local attention mechanism disclosed in this embodiment will be described in detail with reference to fig. 1 to 5.
The image super-resolution reconstruction method based on the cross-scale non-local attention mechanism comprises the following steps:
s1, acquiring an image to be processed.
S2, inputting the image to be processed into a preset image super-resolution reconstruction model to obtain a super-resolution image; the image super-resolution reconstruction model comprises a U-Net network, multi-scale self-attention fusion modules, an up-sampling layer, and a dual regression network connected in sequence, where each multi-scale self-attention fusion module comprises a non-local attention branch, a cross-scale non-local attention branch, a traditional feature-extraction branch, and a feature fusion module.
The U-Net network is used for extracting shallow features of the image to be processed and obtaining a low-resolution image; each multi-scale self-attention fusion module introduces a non-local attention mechanism, a cross-scale non-local attention mechanism, and a traditional feature-extraction branch to extract high-frequency features of the low-resolution image, and performs multiple mutual-projection fusion on the high-frequency feature patches output by the three branches to obtain super-resolution feature patches; the up-sampling layer concatenates the super-resolution feature patches output by the multi-scale self-attention fusion modules to obtain the super-resolution image; and the dual regression network cooperates with the U-Net network to calculate the similarity between the super-resolution image and the low-resolution image so as to supervise the super-resolution reconstruction of the low-resolution image.
Specifically, the overall network architecture of the image super-resolution reconstruction model is shown in fig. 2, and the network is built on the design of U-Net. The CNDRN network consists of two parts: an original network and a dual network. The original network adopts the U-Net design of a downsampling module (left half of fig. 2) and an upsampling module (right half of fig. 2), where each sampling module comprises log_2(s) basic blocks, with s the scale factor; since s is 4 here, the up-sampling and down-sampling modules each consist of two basic blocks. To limit the solution space of image super-resolution, we introduce a dual network that learns the image downsampling operation from the generated super-resolution image; downsampling is much easier than the upsampling mapping, so the dual network has a simpler structure, with only two convolution layers and one LeakyReLU activation layer.
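A minimal sketch of such a 2× dual (downsampling) block is given below, assuming PyTorch; the kernel sizes, stride, and channel widths are assumptions, since the text only specifies two convolution layers and one LeakyReLU activation layer:

```python
import torch.nn as nn

class DualBlock(nn.Module):
    """Hypothetical 2x downsampling dual branch: two conv layers and one LeakyReLU."""
    def __init__(self, channels=3):
        super().__init__()
        self.body = nn.Sequential(
            # strided convolution halves the spatial resolution
            nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, x):
        return self.body(x)
```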
Most existing methods focus only on learning the LR-to-HR mapping, but the space of possible mapping functions can be very large, which greatly increases the training difficulty. To solve this problem, this embodiment introduces a dual regression network (Dual Regression Network, DRN) to provide an additional constraint.
Illustratively, during training the network learns not only the mapping from the low-resolution image to the high-resolution image (i.e., LR→HR) but also, at the same time, the dual mapping from the super-resolution image back to the low-resolution image (i.e., HR→LR).
Denote the set of LR images by X, with each LR image denoted x_i, and denote the corresponding set of HR images by Y, with each HR image denoted y_i. The above SR problem can then be expressed as the following two dual regression tasks:
Task one: the network learns a mapping P from X to Y, such that the super-resolution image P(x_i) is as similar as possible to the corresponding HR image y_i;
Task two: the network learns a mapping D from Y to X, such that the dual-mapped low-resolution image D(y_i) is as similar as possible to the originally input LR image x_i.
By jointly learning the two tasks above, the original learning task and the dual learning task form a closed loop and together provide information to train the mappings P and D. If P(x_i) is a satisfactory HR image, then D(P(x_i)) should be very close to the input LR image x_i. In summary, the loss function during training is shown in equation (1):
$$\mathcal{L}=\sum_{i=1}^{N} L_{P}\big(P(x_i),\,y_i\big)+\lambda\,L_{D}\big(D(P(x_i)),\,x_i\big) \qquad (1)$$

where N represents the number of matched LR-HR pairs; L_P and L_D represent the loss functions of the original network and the dual network respectively, both using the mean absolute error (Mean Absolute Error); and λ controls the weight of the dual loss. After adding the above constraint, the space of possible function mappings is greatly reduced.
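As a minimal sketch of this closed-loop objective of equation (1), assuming PyTorch, with the mean absolute error realized as L1 losses; `primal` and `dual` stand for the mappings P and D, and the default weight `lam` is illustrative, not taken from the patent:

```python
import torch
import torch.nn.functional as F

def dual_regression_loss(primal, dual, lr, hr, lam=0.1):
    """Closed-loop loss of equation (1): L_P(P(x), y) + lambda * L_D(D(P(x)), x)."""
    sr = primal(lr)                     # P(x_i): super-resolved image
    loss_p = F.l1_loss(sr, hr)          # L_P: primal loss against the HR target
    loss_d = F.l1_loss(dual(sr), lr)    # L_D: dual loss, downsampled SR vs. input LR
    return loss_p + lam * loss_d
```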
Before introducing the multi-scale self-attention fusion module (MSAF module for short), the non-local attention mechanism within a single scale and the cross-scale non-local attention mechanism are first introduced respectively.
The non-local attention mechanism can explore self-exemplars by summarizing relevant features from the whole image. Formally, given an image feature map X, non-local attention can be expressed as equation (2):

$$Z_{i,j}=\sum_{g,h}\frac{\exp\left(\phi\left(X_{i,j},X_{g,h}\right)\right)}{\sum_{u,v}\exp\left(\phi\left(X_{i,j},X_{u,v}\right)\right)}\,\psi\left(X_{g,h}\right) \qquad (2)$$

where X is the feature map of the input low-resolution image, ψ(X_{g,h}) is a feature transformation function, and φ(X_{i,j}, X_{g,h}) is a correlation function measuring similarity, which can be expressed as equation (3), where θ(X_{i,j}) and δ(X_{g,h}) are also feature transformation functions, (i, j), (g, h), and (u, v) are coordinate pairs in X, and T denotes transposition:

$$\phi\left(X_{i,j},X_{g,h}\right)=\theta\left(X_{i,j}\right)^{T}\delta\left(X_{g,h}\right) \qquad (3)$$
The non-local attention mechanism above is computed within a single scale and is called In-Scale Non-Local attention (ISNL). To measure the correlation between pixels in LR images and patches at different scales, we introduce a Cross-Scale Non-Local attention mechanism (CSNL). Unlike ISNL, which measures cross-correlations between low-resolution pixels, CSNL measures cross-correlations between low-resolution pixels in the LR image and their corresponding cross-scale patches.
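A minimal sketch of ISNL under equations (2) and (3), assuming PyTorch; using 1×1 convolutions for the transformation functions θ, δ, and ψ is a common choice, not something the patent specifies:

```python
import torch
import torch.nn as nn

class InScaleNonLocalAttention(nn.Module):
    """Sketch of ISNL: every position attends to all positions of one feature map."""
    def __init__(self, channels):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels, 1)  # theta in equation (3)
        self.delta = nn.Conv2d(channels, channels, 1)  # delta in equation (3)
        self.psi = nn.Conv2d(channels, channels, 1)    # psi in equation (2)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.delta(x).flatten(2)                   # (B, C, HW)
        v = self.psi(x).flatten(2).transpose(1, 2)     # (B, HW, C)
        attn = torch.softmax(q @ k, dim=-1)            # softmax-normalized phi
        return (attn @ v).transpose(1, 2).reshape(b, c, h, w)
```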
As shown in FIG. 3, suppose the spatial size of the given input feature X is (W, H). Because of the difference in spatial dimensions, it is difficult to match pixels against patches directly with an ordinary similarity measure, so we first downsample the input feature X by a factor of s to obtain a feature map Y of spatial size (W/s, H/s). Each pixel in Y then represents an s×s patch in X; we compute a softmax matching score at the pixel level between X and Y, and finally deconvolve the s×s patches in X that match the pixels in Y for use in super-resolution reconstruction, so the spatial size of the final output feature Z is sW × sH.
Based on equation (2), the expression for CSNL can be written as equation (4):

$$Z^{s\times s}_{si,sj}=\sum_{g,h}\frac{\exp\left(\phi\left(X_{i,j},Y_{g,h}\right)\right)}{\sum_{u,v}\exp\left(\phi\left(X_{i,j},Y_{u,v}\right)\right)}\,\bar{X}^{s\times s}_{sg,sh} \qquad (4)$$

where Z^{s×s}_{si,sj} denotes the output feature patch of size s×s located at image coordinates (si, sj), and X̄^{s×s}_{sg,sh} denotes the s×s feature patch extracted from the input feature X. The output feature patch is thus obtained as a direct weighted average of the patches extracted from the input features.
Intuitively, richer and more reliable high-frequency details are mined from the image's own intrinsic resources through the cross-scale non-local attention module.
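A minimal sketch of CSNL under equation (4), assuming PyTorch; average pooling as the s× downsampling, identity embeddings, and a scaled dot product for φ are simplifying assumptions (the sketch also assumes H and W are divisible by s):

```python
import torch
import torch.nn.functional as F

def cross_scale_attention(x, s=2):
    """Sketch of CSNL: LR pixels attend to s x s patches of the same feature map."""
    b, c, h, w = x.shape
    y = F.avg_pool2d(x, s)                              # downsampled map Y, (B, C, h/s, w/s)
    q = x.flatten(2).transpose(1, 2)                    # queries: pixels of X, (B, h*w, C)
    k = y.flatten(2)                                    # keys: pixels of Y, (B, C, h*w/s^2)
    attn = torch.softmax(q @ k / c ** 0.5, dim=-1)      # softmax matching scores
    v = F.unfold(x, kernel_size=s, stride=s)            # values: s x s patches of X
    out = (attn @ v.transpose(1, 2)).transpose(1, 2)    # weighted average of patches
    # "deconvolve" the matched patches: fold one s x s patch per X pixel -> (s*h, s*w)
    return F.fold(out, (h * s, w * s), kernel_size=s, stride=s)
```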
To integrate all the obtainable intrinsic priors with rich external image priors, the MSAF module is introduced; its structure is shown in fig. 4. In the MSAF module, a multi-branch structure is used to mine the self-similarity of images and learn new information, comprising the traditional Local branch, the ISNL branch, and the CSNL branch. After the ISNL branch is deconvolved, the three branches undergo multiple mutual-projection fusion, as follows. First, the residual R_IC between the ISNL output feature F_I and the CSNL output feature F_C is computed; R_IC represents details that are present in one branch and missing in the other, and such residual projection lets the network focus only on the information that differs between sources while ignoring the common information, improving the network's discrimination ability. A single-layer convolution is then applied to the residual information R_IC, and the result is added to F_I to obtain F_IC. Next, F_IC is downsampled and a residual is computed against the output feature F_L of the Local branch; this residual is upsampled and added to F_IC to obtain the result of the multiple mutual-projection fusion. Through these operations, residual learning is preserved while different feature sources are fused; compared with simple addition or concatenation, the fusion effect is better and the discrimination performance is stronger.
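A minimal sketch of this multiple mutual-projection fusion, assuming PyTorch; the single 3×3 convolution width and the bilinear resampling are assumptions, and `f_i`, `f_c`, `f_l` stand for the ISNL, CSNL, and Local branch outputs:

```python
import torch.nn as nn
import torch.nn.functional as F

class MutualProjectionFusion(nn.Module):
    """Sketch of the MSAF fusion: residual projections between the three branches."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)  # single-layer conv on R_IC

    def forward(self, f_i, f_c, f_l):
        # f_i, f_c: upsampled ISNL/CSNL outputs (same size); f_l: Local branch output
        r_ic = f_i - f_c                          # residual R_IC between ISNL and CSNL
        f_ic = f_i + self.conv(r_ic)              # first fused feature F_IC
        down = F.interpolate(f_ic, size=f_l.shape[-2:], mode='bilinear', align_corners=False)
        r = down - f_l                            # residual against the Local branch
        up = F.interpolate(r, size=f_ic.shape[-2:], mode='bilinear', align_corners=False)
        return f_ic + up                          # multiple mutual-projection fusion result
```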
The repeated MSAF modules are embedded in a recurrent framework, as shown in FIG. 2. At each iteration, the result of the multiple mutual-projection fusion serves, on the one hand, as the hidden unit H_i of the MSAF and is output directly; the outputs H_i of the several MSAF modules are concatenated and fed into the up-sampling layer. On the other hand, the result is passed through a two-layer CNN to produce L_i, which is delivered to the next MSAF module, so the MSAF modules in fact form a recursive network.
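A brief sketch of this recursion, where `msaf_modules`, `two_layer_cnn`, and `upsample` are hypothetical stand-ins for the components described above:

```python
import torch

def recursive_msaf(shallow_features, msaf_modules, two_layer_cnn, upsample):
    """Sketch of the recurrent MSAF framework: each fusion result is emitted as H_i
    and also recycled through a two-layer CNN as L_i for the next module."""
    hidden_states, l = [], shallow_features
    for msaf in msaf_modules:
        h = msaf(l)                  # hidden unit H_i: mutual-projection fusion result
        hidden_states.append(h)
        l = two_layer_cnn(h)         # L_i, input to the next MSAF module
    return upsample(torch.cat(hidden_states, dim=1))
```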
Illustratively, as shown in fig. 2, the image to be reconstructed is input into the preset image super-resolution reconstruction model. First, shallow feature extraction and downsampling are performed by the first fused inverted-residual block; the output shallow feature map is, on the one hand, residually connected to the corresponding deep feature map and participates in the reconstruction of the super-resolution image, and on the other hand, is normalized and fed into the second fused inverted-residual block for further feature extraction and residual connection with the corresponding deep feature map. After feature extraction and normalization by the second block, the output feature map enters a multi-scale self-attention fusion module to mine the cross-scale self-similarity within the feature map, and a single convolution layer directly reconstructs a 2× reconstructed image. This reconstructed image is used to compute the similarity with the 2× low-resolution image obtained by the downsampling of the regression network. After the output features of the attention module are concatenated and upsampled, the operation above is repeated: a 2× reconstructed image is produced directly, and the output features are fed into the next multi-scale self-attention fusion module; the multi-scale self-attention outputs are then concatenated, upsampled, and reconstructed to obtain the required 4× high-resolution image. In the dual network, the reconstructed 4× high-resolution image is downsampled by 2× and 4× in turn to obtain 2× and 1× low-resolution images, which are compared for similarity with the previously obtained 2× and 1× reconstructed images, respectively.
Example two
The embodiment discloses an image super-resolution reconstruction system based on a trans-scale non-local attention mechanism, comprising:
an image acquisition module configured to: acquiring a low-resolution image to be processed;
an image super-resolution reconstruction module configured to: inputting an image to be processed into a preset image super-resolution reconstruction model to obtain a super-resolution image;
the image super-resolution reconstruction model comprises a U-Net network, a plurality of multi-scale self-attention fusion modules and an up-sampling layer which are connected in sequence;
the U-Net network is used for extracting shallow layer characteristics of an image to be processed and obtaining a low-resolution image;
the multi-scale self-attention fusion module is used for introducing a non-local attention mechanism, a cross-scale non-local attention mechanism and a traditional feature extraction branch to extract a high-frequency feature subgraph of a low-resolution image, and carrying out multiple mutual mapping fusion on the high-frequency features output by the non-local attention mechanism, the cross-scale non-local attention mechanism and the traditional feature extraction branch to obtain a super-resolution feature subgraph;
and the up-sampling layer splices the super-resolution characteristic blocks output by the multi-scale self-attention fusion modules to acquire a super-resolution image.
It should be noted that the image acquisition module and the image super-resolution reconstruction module correspond to the steps in the first embodiment; the examples and application scenarios implemented by the modules are the same as those of the corresponding steps, but are not limited to the disclosure of the first embodiment. The modules described above may be implemented, as part of a system, in a computer system, for example as a set of computer-executable instructions.
Example III
The third embodiment of the invention provides an electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor; when the computer instructions are executed by the processor, the steps of the above image super-resolution reconstruction method based on a cross-scale non-local attention mechanism are completed.
Example IV
The fourth embodiment of the present invention provides a computer readable storage medium, configured to store computer instructions, where the computer instructions, when executed by a processor, complete the steps of the above-mentioned image super-resolution reconstruction method based on a cross-scale non-local attention mechanism.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing embodiments are described in a progressive manner; for details of one embodiment, reference may be made to the related descriptions of the other embodiments.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An image super-resolution reconstruction method based on a cross-scale non-local attention mechanism, characterized by comprising the following steps:
acquiring an image to be processed;
inputting an image to be processed into a preset image super-resolution reconstruction model to obtain a super-resolution image;
wherein the image super-resolution reconstruction model comprises a U-Net network, multi-scale self-attention fusion modules, and an up-sampling layer connected in sequence;
the U-Net network is used for extracting shallow features of the image to be processed and obtaining a low-resolution image;
each multi-scale self-attention fusion module introduces a non-local attention mechanism, a cross-scale non-local attention mechanism, and a traditional feature-extraction branch to extract high-frequency features of the low-resolution image, and performs multiple mutual-projection fusion on the high-frequency feature patches output by the three branches to obtain super-resolution feature patches;
and the up-sampling layer concatenates the super-resolution feature patches output by the multi-scale self-attention fusion modules to obtain a super-resolution image.
2. The image super-resolution reconstruction method based on a cross-scale non-local attention mechanism according to claim 1, wherein the image super-resolution reconstruction model further comprises a dual regression network, the dual regression network cooperating with the U-Net network to calculate the similarity between the super-resolution image and the low-resolution image so as to supervise the super-resolution reconstruction of the low-resolution image.
3. The image super-resolution reconstruction method based on a cross-scale non-local attention mechanism according to claim 1, wherein the non-local attention mechanism is expressed as

$$Z_{i,j}=\sum_{g,h}\frac{\exp\left(\phi\left(X_{i,j},X_{g,h}\right)\right)}{\sum_{u,v}\exp\left(\phi\left(X_{i,j},X_{u,v}\right)\right)}\,\psi\left(X_{g,h}\right)$$

where X is the feature map of the input low-resolution image, ψ(X_{g,h}) is a feature transformation function, and φ(X_{i,j}, X_{g,h}) is a correlation function that measures similarity.
4. The image super-resolution reconstruction method based on a cross-scale non-local attention mechanism according to claim 1, wherein the processing of the low-resolution image by the cross-scale non-local attention mechanism comprises:
downsampling the input low-resolution image by a factor of s to obtain a feature map after resolution conversion;
with each pixel in the feature map representing a patch in the low-resolution map, calculating a softmax matching score between the low-resolution image and the feature map at the pixel level;
and deconvolving the patches in the low-resolution map that match pixels in the feature map according to the softmax matching scores to obtain high-frequency feature patches.
5. The image super-resolution reconstruction method based on a cross-scale non-local attention mechanism according to claim 1, wherein the cross-scale non-local attention mechanism is expressed as

$$Z^{s\times s}_{si,sj}=\sum_{g,h}\frac{\exp\left(\phi\left(X_{i,j},Y_{g,h}\right)\right)}{\sum_{u,v}\exp\left(\phi\left(X_{i,j},Y_{u,v}\right)\right)}\,\bar{X}^{s\times s}_{sg,sh}$$

where Y is the feature map obtained by downsampling the input feature X by a factor of s, Z^{s×s}_{si,sj} is the feature patch of size s×s located at image coordinates (si, sj), and X̄^{s×s}_{sg,sh} is a feature patch extracted from the input features.
6. The image super-resolution reconstruction method based on a cross-scale non-local attention mechanism according to claim 1, wherein the multi-scale self-attention fusion module performing multiple mutual-projection fusion on the high-frequency feature patches output by the non-local attention mechanism, the cross-scale non-local attention mechanism, and the traditional feature-extraction branch comprises:
calculating residual information between the high-frequency feature patch output by the non-local attention mechanism and the high-frequency feature patch output by the cross-scale non-local attention mechanism, applying a single-layer convolution to the residual information, and adding the convolved residual to the high-frequency feature patch output by the non-local attention mechanism to obtain a first high-frequency feature patch;
downsampling the first high-frequency feature patch, calculating the residual between the downsampled first high-frequency feature patch and the high-frequency feature patch output by the traditional feature-extraction branch, and upsampling the residual;
and adding the upsampled residual to the first high-frequency feature patch to obtain a super-resolution feature patch.
7. The image super-resolution reconstruction method based on a cross-scale non-local attention mechanism according to claim 1, wherein there are multiple multi-scale self-attention fusion modules, and the multiple multi-scale self-attention fusion modules form a recursive network.
8. An image super-resolution reconstruction system based on a cross-scale non-local attention mechanism, characterized by comprising:
an image acquisition module configured to acquire a low-resolution image to be processed;
an image super-resolution reconstruction module configured to input the image to be processed into a preset image super-resolution reconstruction model to obtain a super-resolution image;
wherein the image super-resolution reconstruction model comprises a U-Net network, multiple multi-scale self-attention fusion modules, and an up-sampling layer connected in sequence;
the U-Net network is used for extracting shallow features of the image to be processed and obtaining a low-resolution image;
each multi-scale self-attention fusion module introduces a non-local attention mechanism, a cross-scale non-local attention mechanism, and a traditional feature-extraction branch to extract high-frequency feature patches of the low-resolution image, and performs multiple mutual-projection fusion on the high-frequency features output by the three branches to obtain super-resolution feature patches;
and the up-sampling layer concatenates the super-resolution feature patches output by the multi-scale self-attention fusion modules to obtain a super-resolution image.
9. An electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method according to any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method according to any one of claims 1-7.
CN202310250793.1A 2023-03-10 2023-03-10 Image super-resolution reconstruction method based on cross-scale non-local attention mechanism Pending CN116228542A (en)


Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310250793.1A CN116228542A (en) 2023-03-10 2023-03-10 Image super-resolution reconstruction method based on cross-scale non-local attention mechanism

Publications (1)

Publication Number Publication Date
CN116228542A 2023-06-06

Family

ID=86587287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310250793.1A Pending CN116228542A (en) 2023-03-10 2023-03-10 Image super-resolution reconstruction method based on trans-scale non-local attention mechanism

Country Status (1)

Country Link
CN (1) CN116228542A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523759A (en) * 2023-07-04 2023-08-01 江西财经大学 Image super-resolution reconstruction method and system based on frequency decomposition and restarting mechanism
CN116523759B (en) * 2023-07-04 2023-09-05 江西财经大学 Image super-resolution reconstruction method and system based on frequency decomposition and restarting mechanism

Similar Documents

Publication Publication Date Title
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
WO2018120329A1 (en) Single-frame super-resolution reconstruction method and device based on sparse domain reconstruction
US20200380294A1 (en) Method and apparatus for sar image recognition based on multi-scale features and broad learning
CN110956126B (en) Small target detection method combined with super-resolution reconstruction
Hui et al. Progressive perception-oriented network for single image super-resolution
CN110501072B (en) Reconstruction method of snapshot type spectral imaging system based on tensor low-rank constraint
CN112767468A (en) Self-supervision three-dimensional reconstruction method and system based on collaborative segmentation and data enhancement
CN109636721B (en) Video super-resolution method based on countermeasure learning and attention mechanism
Liu et al. Effective image super resolution via hierarchical convolutional neural network
WO2024027095A1 (en) Hyperspectral imaging method and system based on double rgb image fusion, and medium
Zuo et al. Residual dense network for intensity-guided depth map enhancement
CN116228542A (en) Image super-resolution reconstruction method based on cross-scale non-local attention mechanism
Chen et al. RBPNET: An asymptotic Residual Back-Projection Network for super-resolution of very low-resolution face image
CN111325697B (en) Color image restoration method based on tensor eigen transformation
Song et al. Deep memory-augmented proximal unrolling network for compressive sensing
Liu et al. PDR-Net: Progressive depth reconstruction network for color guided depth map super-resolution
Cao et al. Diffusion model with disentangled modulations for sharpening multispectral and hyperspectral images
CN113393385B (en) Multi-scale fusion-based unsupervised rain removing method, system, device and medium
CN112686830B (en) Super-resolution method of single depth map based on image decomposition
CN104463962A (en) Three-dimensional scene reconstruction method based on GPS information video
Wang et al. Joint depth map super-resolution method via deep hybrid-cross guidance filter
Wang et al. Msfnet: multistage fusion network for infrared and visible image fusion
CN111815690B (en) Method, system and computer equipment for real-time splicing of microscopic images
CN116740362B (en) Attention-based lightweight asymmetric scene semantic segmentation method and system
Wu et al. Meta transfer learning-based super-resolution infrared imaging

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination