CN117196959A

CN117196959A - Self-attention-based infrared image super-resolution method, device and readable medium

Info

Publication number: CN117196959A
Application number: CN202311475294.9A
Authority: CN
Inventors: 陈菲扬; 黄德天; 宋佳讯; 颜鹏贵; 杨坤; 曾焕强; 朱建清; 陈婧
Original assignee: Huaqiao University
Current assignee: Huaqiao University
Priority date: 2023-11-08
Filing date: 2023-11-08
Publication date: 2023-12-08
Anticipated expiration: 2043-11-08
Also published as: CN117196959B

Abstract

The invention discloses a self-attention-based infrared image super-resolution method, a device and a readable medium, relating to the field of image processing, comprising the following steps: constructing and training a light-weight infrared image super-resolution model based on self-attention to obtain a trained light-weight infrared image super-resolution model; the method comprises the steps of inputting a low-resolution infrared image to be reconstructed into a trained lightweight infrared image super-resolution model, wherein the model comprises a 3X 3 convolution layer, a lightweight Transformer, a CNN backbone, a high-efficiency detail self-attention module and an image reconstruction module, inputting the low-resolution infrared image to be reconstructed into the 3X 3 convolution layer to obtain a first characteristic, sequentially passing through the lightweight Transformer, the CNN backbone and the high-efficiency detail self-attention module, circulating the high-efficiency detail self-attention module n times in a parameter sharing mode to obtain a second characteristic, connecting the first characteristic and the second characteristic in a residual mode, inputting the first characteristic and the second characteristic into the image reconstruction module, outputting the high-resolution infrared image, and solving the problems of parameter redundancy, poor performance and the like.

Description

Self-attention-based infrared image super-resolution method, device and readable medium

Technical Field

The invention relates to the field of image processing, in particular to a self-attention-based infrared image super-resolution method, a self-attention-based infrared image super-resolution device and a self-attention-based infrared image super-resolution readable medium.

Background

Infrared (IR) images are widely used in the fields of remote sensing, industry, medicine, security, traffic, etc. However, due to hardware limitations of infrared imaging devices, the captured infrared images are typically low in resolution, preventing the acquisition of important details. Image Super-Resolution (SR) can reconstruct a High-Resolution (HR) image of more detail from an existing Low-Resolution (LR) image. SR is an economical and efficient technique in terms of improving image quality (including resolution and sharpness) since there is no need to upgrade the imaging device. In recent years, CNN-based SR methods have attracted increasing attention due to the strong characteristic expression capability of convolutional neural networks (Convolutional Neural Network, CNN). The srcn first constructed an end-to-end network containing three convolutional layers to learn the mapping between LR and HR image blocks. The VDSR introduces residual error learning into the SR network, so that the network convergence speed is remarkably increased. Further, EDSR and RCAN construct deep networks with parameters of 43M and 16M, respectively, and achieve significant SR performance. However, compared with natural images, the infrared images have the characteristic of small information quantity, and are particularly characterized by low resolution, low contrast, low signal-to-noise ratio, blurred details and the like. Thus, in deep learning applications, infrared images lack to some extent key features that can be learned. Heavy-scale SR methods, such as EDSR and RCAN, are prone to parameter redundancy problems when training on infrared datasets, and therefore difficult to reconstruct satisfactory infrared images with adequate training.

With the rapid development of CNN, a number of lightweight SR methods emerge, the parameter amounts of which are typically between 300K and 1M. But due to the inherent nature of convolution, only local features can be extracted, ignoring long-range dependencies between features. This inevitably affects the restoration of the global contour structure, resulting in unclear contours and even line distortions of the reconstructed image. More importantly, due to the lack of learnable detail features on the local image blocks of the infrared image compared to the natural image, the super-resolution method suitable for the natural image may degrade when applied to the infrared image. However, most infrared SR methods in recent years use a lightweight network with low computational complexity and fast convergence speed due to relatively limited computational resources in practical applications.

Recently, transformers have demonstrated excellent performance in the field of natural language processing (Natural Language Processing, NLP), and many students have begun to explore the use of transformers in the field of computer vision. Compared with the traditional CNN-based model which can only obtain a large receptive field in a deep network by stacking a large number of convolution layers, the transform-based model can obtain a global receptive field in a shallow network by effectively exploring long-distance dependence, and the global feature representation capability is effectively improved. Therefore, the recent Transformer-based SR method is superior to the CNN-based SR method in both performance and efficiency. However, most Transformer-based SR methods still have two problems. First, sub-optimal self-attention attempts may be generated in deep networks due to imperfect feature representation or the presence of distracted image blocks, introducing false feature dependencies, leading to SR performance degradation. Second, the symmetric window-based self-attention module ignores feature dependencies between small-scale image blocks within the window, which is detrimental to fine-grained feature extraction, impeding SR performance.

Disclosure of Invention

The technical problems mentioned above are solved. An embodiment of the present application is directed to a self-attention-based infrared image super-resolution method, device and readable medium, which solve the technical problems mentioned in the background section above.

In a first aspect, the present application provides a self-attention-based infrared image super-resolution method, comprising the steps of:

acquiring a low-resolution infrared image to be reconstructed;

constructing and training a light-weight infrared image super-resolution model based on self-attention to obtain a trained light-weight infrared image super-resolution model;

inputting a low-resolution infrared image to be reconstructed into a trained lightweight infrared image super-resolution model, wherein the trained lightweight infrared image super-resolution model comprises a 3×3 convolution layer, a lightweight Transformer and CNN backbone, a high-efficiency detail self-attention module and an image reconstruction module, and the lightweight Transformer and CNN backbone comprises 6 hybrid Transformer residual modules and 2 high-efficiency residual self-attention modules which are sequentially connected, and the specific operation is as follows:

；

wherein,and->Respectively representing functions corresponding to the hybrid transducer residual error module and the high-efficiency residual error self-attention module; / >、/>And->The output of the first characteristic and the 6 th high-efficiency residual self-attention module and the output of the lightweight transducer and CNN backbone are respectively represented; the low-resolution infrared image to be reconstructed is input into a 3X 3 convolution layer to obtain a first feature, the first feature is sequentially subjected to a lightweight Transformer, CNN backbone and a high-efficiency detail self-attention module, the high-efficiency detail self-attention module circulates n times in a parameter sharing mode to obtain a second feature, the first feature and the second feature are subjected to residual connection and then input into an image reconstruction module, and the high-resolution infrared image is output.

Preferably, the efficient residual self-attention module comprises 1 layer normalization, 4 1×1 depth separable convolutional layers and 3 3×3 depth separable convolutional layers, and the specific procedure is as follows:

the input features are normalized through layers and respectively pass through 3 groups of depth separable convolution modules to generate query Q, key K and value V, wherein the depth separable convolution modules comprise 1X 1 depth separable convolution layer and 1X 3 depth separable convolution layer, the query Q, key K and value V are input into transformation functions to generate Q, K and V, self-attention force diagrams of non-activated functions of the current layer are obtained according to the Q and K, the self-attention force diagrams of the non-activated functions of the current layer are connected with the self-attention force diagrams of the non-activated functions output by the upper layer high-efficiency residual self-attention module through residual errors, the self-attention force diagrams of the activated functions of the current layer are obtained through adding the self-attention force diagrams of the non-activated functions according to pixel points and then through ELU (extreme light unit) activation functions, and the self-attention force diagrams of the activated functions of the current layer are obtained according to the following formulas:

；

Wherein,、/>and->The self-attention force diagram of the non-activated function of the current layer, the self-attention force diagram of the non-activated function of the upper layer and the self-attention force diagram of the activated function of the current layer are respectively; />、/>Is a learnable parameter; />Representing ELU activation function, based on the value V and self-attention force diagram +.>Generating self-attention seeking weighting features->Self-care force diagram weighting feature->After passing through 1 x 1 depth separable convolution layer, the input feature of the self-attention module is to be combined with the high-efficiency residual error +.>Adding by pixel to generate the output feature +.>。

Preferably, the hybrid transform residual module comprises 2 high-efficiency residual self-attention modules, 2 3×3 depth separable convolutional layers, and 2 1×1 convolutional layers, and the specific operations are as follows:

；

wherein,、/>、/>and->Representing the functions corresponding to the 1 st 1 x 1 convolution layer, the 1 st high-efficiency residual self-attention module, the 1 st 1 x 1 depth separable convolution layer and the 2 nd high-efficiency residual self-attention module respectively,/->、/>And->Respectively representing the input of the hybrid transform residual module, the output of the 1 st 1 x 1 convolution layer, and the output of the hybrid transform residual module,/>Representation->And (3) operating.

Preferably, the high-efficiency detail self-attention module comprises 2 multi-head self-attention modules, 2 feedforward nerve modules and 1X 1 convolution layers, wherein vectors obtained by expanding 2 groups of full-connected layers with asymmetric cores in the 2 multi-head self-attention modules respectively generate a query Q ⁱ Key K ⁱ Sum value V ⁱ Wherein, the method comprises the steps of, wherein,q-switching queries in group 2 and utilizing Q ¹ K ^2T And Q is equal to ² K ^1T Generating two sets of self-attention force diagrams->，/>And then respectively with the value V ² And V is equal to ¹ Performing matrix multiplication to generate two aggregate features +.>Respectively inputting the two aggregation features into 2 feedforward nerve modules to obtain the output of the 2 feedforward nerve modules, and performing +_ on the output of the 2 feedforward nerve modules after feature folding>Operating to obtain splicing characteristics, inputting the splicing characteristics into a 1×1 convolution layer to obtain second characteristics +.>The formula is as follows:

；

wherein,and->Representing the modules corresponding to the multi-head self-attention module and the feedforward nerve module in the classical transducer respectively; />Representing the output of the first multi-headed self-attention module, the output of the second multi-headed self-attention module, the output of the first feedforward neural module, the output of the second feedforward neural module, and the second characteristic, respectively; />Is a parameter that can be learned; />Representing a function corresponding to the 1×1 convolution layer; />Representing a Softmax activation function.

Preferably, the image reconstruction module uses sub-pixel convolution for upsampling.

Preferably, the operation of the lightweight infrared image super-resolution model is as follows:

；

Wherein,representing 3X 3Convolution layer corresponding function, ++>Is an output characteristic of a 3 x 3 convolutional layer,representing the function of the lightweight transducer corresponding to the CNN backbone, < >>Is the output feature of the lightweight transducer and CNN backbone, < >>Representing a function corresponding to the high-efficiency detail self-attention module, wherein n represents n times of cycling of the high-efficiency detail self-attention module in a manner of sharing parameters,/->Is an output feature of the high-efficiency detail self-attention module,/->Representing the function corresponding to the image reconstruction module, +.>And->Representing an input low resolution image and an output high resolution image, respectively.

In a second aspect, the present invention provides a self-attention-based infrared image super-resolution apparatus, comprising:

the data acquisition module is configured to acquire a low-resolution infrared image to be reconstructed;

the model construction module is configured to construct and train a light-weight infrared image super-resolution model based on self-attention, so as to obtain a trained light-weight infrared image super-resolution model;

the execution module is configured to input a low-resolution infrared image to be reconstructed into a trained lightweight infrared image super-resolution model, wherein the trained lightweight infrared image super-resolution model comprises a 3×3 convolution layer, a lightweight Transformer and CNN backbone, a high-efficiency detail self-attention module and an image reconstruction module, and the lightweight Transformer and CNN backbone comprises 6 hybrid Transformer residual modules and 2 high-efficiency residual self-attention modules which are sequentially connected, and the specific operation is as follows:

；

Wherein,and->Respectively representing functions corresponding to the hybrid transducer residual error module and the high-efficiency residual error self-attention module; />、/>And->The output of the first characteristic and the 6 th high-efficiency residual self-attention module and the output of the lightweight transducer and CNN backbone are respectively represented; the low-resolution infrared image to be reconstructed is input into a 3X 3 convolution layer to obtain a first feature, the first feature is sequentially subjected to a lightweight Transformer, CNN backbone and a high-efficiency detail self-attention module, the high-efficiency detail self-attention module circulates n times in a parameter sharing mode to obtain a second feature, the first feature and the second feature are subjected to residual connection and then input into an image reconstruction module, and the high-resolution infrared image is output.

In a third aspect, the present invention provides an electronic device comprising one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.

In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.

Compared with the prior art, the invention has the following beneficial effects:

(1) The self-attention-based infrared image super-resolution method provided by the invention uses fewer parameters, and simultaneously obtains better results on subjective visual effect and objective evaluation index compared with the existing infrared image super-resolution method.

(2) The high-efficiency residual self-attention module in the self-attention-based infrared image super-resolution method guides the feature weighting of the head self-attention module through residual connection, avoids the multi-head self-attention module from generating suboptimal self-attention force diagram, thereby extracting more valuable global features and further assisting in generating high-quality infrared images.

(3) The high-efficiency detail self-attention module in the self-attention-based infrared image super-resolution method extracts global features of pixel-level granularity by utilizing self-attention based on an asymmetric window so as to enhance details of a reconstructed infrared image.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is an exemplary device frame pattern to which an embodiment of the present application may be applied;

FIG. 2 is a flow chart of a self-focusing infrared image super-resolution method according to an embodiment of the application;

FIG. 3 is a flow chart of a dual solid model calculation of a self-focusing infrared image super-resolution method according to an embodiment of the application;

FIG. 4 is a schematic diagram of a high-efficiency residual self-attention module of a self-attention based infrared image super-resolution method according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a hybrid transducer residual module of a self-focusing infrared image super-resolution method according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a lightweight Transformer and CNN backbone of a self-focusing infrared image super-resolution method according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a high-efficiency detailed self-attention module of a self-attention based infrared image super-resolution method according to an embodiment of the present application;

FIG. 8 is a SR image contrast diagram of the self-attention based infrared image super-resolution method and other infrared image super-resolution methods according to the embodiment of the application;

FIG. 9 is a schematic diagram of a self-attention based infrared image super-resolution device according to an embodiment of the present application;

Fig. 10 is a schematic structural view of a computer device suitable for use in implementing an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Fig. 1 illustrates an exemplary device architecture 100 of a self-attention-based infrared image super-resolution method or a self-attention-based infrared image super-resolution device to which embodiments of the present application may be applied.

As shown in fig. 1, the apparatus architecture 100 may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the first terminal device 101, the second terminal device 102, the third terminal device 103, to receive or send messages, etc. Various applications, such as a data processing class application, a file processing class application, and the like, may be installed on the terminal device one 101, the terminal device two 102, and the terminal device three 103.

The first terminal device 101, the second terminal device 102 and the third terminal device 103 may be hardware or software. When the first terminal device 101, the second terminal device 102, and the third terminal device 103 are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like. When the first terminal apparatus 101, the second terminal apparatus 102, and the third terminal apparatus 103 are software, they can be installed in the above-listed electronic apparatuses. Which may be implemented as multiple software or software modules (e.g., software or software modules for providing distributed services) or as a single software or software module. The present invention is not particularly limited herein.

The server 105 may be a server that provides various services, such as a background data processing server that processes files or data uploaded by the terminal device one 101, the terminal device two 102, and the terminal device three 103. The background data processing server can process the acquired file or data to generate a processing result.

It should be noted that, the self-attention-based infrared image super-resolution method provided by the embodiment of the present application may be executed by the server 105, or may be executed by the first terminal device 101, the second terminal device 102, or the third terminal device 103, and accordingly, the self-attention-based infrared image super-resolution device may be set in the server 105, or may be set in the first terminal device 101, the second terminal device 102, or the third terminal device 103.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where the processed data does not need to be acquired from a remote location, the above-described apparatus architecture may not include a network, but only a server or terminal device.

Fig. 2 shows a self-attention based infrared image super-resolution method according to an embodiment of the present application, including the following steps:

s1, acquiring a low-resolution infrared image to be reconstructed.

Specifically, the low-resolution infrared image to be reconstructed can be used as the input of the super-resolution model of the lightweight infrared image.

S2, constructing and training a self-attention-based lightweight infrared image super-resolution model to obtain a trained lightweight infrared image super-resolution model.

In a specific embodiment, the operation of the lightweight infrared image super-resolution model is as follows:

；

wherein,representing the function corresponding to the 3 x 3 convolutional layer, < >>Is an output characteristic of a 3 x 3 convolutional layer,representing the function of the lightweight transducer corresponding to the CNN backbone, < >>Is the output feature of the lightweight transducer and CNN backbone, < >>Representing a function corresponding to the high-efficiency detail self-attention module, wherein n represents n times of cycling of the high-efficiency detail self-attention module in a manner of sharing parameters,/->Is an output feature of the high-efficiency detail self-attention module,/->Representing the function corresponding to the image reconstruction module, +.>And->Representing an input low resolution image and an output high resolution image, respectively.

Specifically, referring to fig. 3, the framework of the lightweight infrared image super-resolution model involves 1 3×3 convolution layer, 1 lightweight transform and CNN backbone, 1 high-efficiency detail self-attention module and 1 image reconstruction module. The contents of each section will be specifically described below.

S3, inputting a low-resolution infrared image to be reconstructed into a trained lightweight infrared image super-resolution model, wherein the trained lightweight infrared image super-resolution model comprises a 3X 3 convolution layer, a lightweight Transformer and CNN backbone, a high-efficiency detail self-attention module and an image reconstruction module, and the lightweight Transformer and CNN backbone comprises 6 mixed Transformer residual modules and 2 high-efficiency residual self-attention modules which are sequentially connected, and the specific operation is as follows:

；

In a specific embodiment, the efficient residual self-attention module includes 1 layer normalization, 4 1 x 1 depth separable convolutional layers, and 3 x 3 depth separable convolutional layers, as follows:

features received by the high-efficiency residual self-attention module are normalized through layers, and the features are respectively processed by 3 groups of depth separable convolution modules to generate a query Q, a key K and a value V, wherein the depth separable convolution modules comprise 1X 1 depth separable convolution layer and 1 3X 3 depth separable convolution layer, the query Q, the key K and the value V are input into transformation functions to generate Q, K and V, self-attention force diagrams of non-activated functions of the current layer are obtained according to the Q, K, and the self-attention force diagrams of the non-activated functions of the current layer are added according to pixel points through residual connection and then are processed by ELU (electro-luminescence) activation functions, and the self-attention force diagrams of the activated functions of the current layer are obtained according to the following formulas:

；

Specifically, referring to FIG. 4, an efficient residual self-attention module is first constructed, the inputs of which are in addition to the featuresThe method also comprises the step of realizing the residual connection from the high-efficiency residual self-attention module of the upper layer by self-attention force diagram of the non-activated function of the high-efficiency residual self-attention module of the upper layer. When the efficient residual self-attention module receives the feature +.>Sequentially make the characteristics->Generating query Q, key K, and value V +.>. Next, Q, K and V are generated using transformation functions >. The transformation function refers to an operation (HW) of expanding a two-dimensional feature map (hxw) into a 1-dimensional vector. Then, the self-attention map of the unactivated function of the current layer is obtained by utilizing Q and K, and the self-attention map of the unactivated function of the previous layer is added by pixel point through residual connection and then passed throughAnd (3) ELU activating the function to finally obtain the attention diagram of the current layer through the activating function.

In a specific embodiment, the hybrid transform residual module includes 2 efficient residual self-attention modules, 2 3×3 depth separable convolutional layers, and 2 1×1 convolutional layers, which operate as follows:

；

Specifically, a hybrid transducer residual module is further constructed, referring to fig. 5, the dashed line represents the self-attention seeking flow achieved through residual connection. In the light-weight convertor and CNN backbones, the output of the last hybrid convertor residual error module is used as the input of the current hybrid convertor residual error module, the input sequentially passes through a first 3X 3 convolution layer and a first high-efficiency residual error self-focusing module, the output of the first high-efficiency residual error self-focusing module is connected with the input residual error and then is input into the first 1X 1 convolution layer, the output of the second high-efficiency residual error self-focusing module sequentially passes through a second 3X 3 convolution layer and a second high-efficiency residual error self-focusing module, and the output of the second high-efficiency residual error self-focusing module is connected with the output residual errors of the input and the first high-efficiency residual error self-focusing module and then is input into the second 1X 1 convolution layer, so that the output of the current hybrid convertor residual error module is obtained.

Specifically, referring to fig. 6, a hybrid transducer residual error module and a high-efficiency residual error self-attention module are adopted to further construct a lightweight transducer and CNN backbone, and specifically, the lightweight transducer comprises 6 hybrid transducer residual error modules and 2 high-efficiency residual error self-attention modules which are sequentially connected.

In a particular embodiment, the high-efficiency detail self-attention module includes 2 multi-headed self-attention modules, 2 feedforward neural modules, and 1×1 convolutional layers, each of the 2 multi-headed self-attention modules generating a query Q by developing vectors from 2 sets of fully-connected layers with asymmetric kernels ⁱ Key K ⁱ Sum value V ⁱ Wherein, the method comprises the steps of, wherein,，/>q-switching queries in group 2 and utilizing Q ¹ K ^2T And Q is equal to ² K ^1T Generating two sets of self-attention force diagrams->，/>And then respectively with the value V ² And V is equal to ¹ Performing matrix multiplication to generate two aggregate features +.>Respectively inputting the two aggregation features into 2 feedforward nerve modules to obtain the output of the 2 feedforward nerve modules, and performing +_ on the output of the 2 feedforward nerve modules after feature folding>Operating to obtain splicing characteristics, inputting the splicing characteristics into a 1×1 convolution layer to obtain second characteristics +.>The formula is as follows:

；

wherein,and->Representing the modules corresponding to the multi-head self-attention module and the feedforward nerve module in the classical transducer respectively; / >Representing the output of the first multi-headed self-attention module, the output of the second multi-headed self-attention module, the output of the first feedforward neural module, the output of the second feedforward neural module, and the second characteristic, respectively; />Is a parameter that can be learned; />Representing a function corresponding to the 1×1 convolution layer; />Representing a Softmax activation function.

Specifically, referring to fig. 7, the high-efficiency detail self-attention module is composed of 2 multi-headed self-attention modules and 2 feedforward nerve modules. Query Q is first generated by developing the resulting vector through 2 sets of asymmetric kernels ⁱ Key K ⁱ Sum value V ⁱ . The queries within the 2 groups are then Q-switched, thereby utilizing Q ¹ K ^2T And Q is equal to ² K ^1T Generating two sets of self-attention force diagrams->，/>And then respectively with the value V ² And V is equal to ¹ Performing matrix multiplication to generate an aggregate feature +.>. These two sets of features would then be input to the two feed forward neural modules, respectively. Thereafter, the two outputs of the two feedforward neural modules will restore the shape of the feature map to the shape of the input feature (H W V) in a folding operation opposite to the unfolding operation described above, and pass +.>The operations are polymerized on the channels, and finally the original channel number is restored by 1X 1 convolution, so as to obtain a second characteristic +.>。

In a particular embodiment, the image reconstruction module uses sub-pixel convolution for upsampling.

Specifically, it willThe low-resolution infrared image is used as the input of the super-resolution model of the trained lightweight infrared image, and the obtained output is a +.>A high resolution infrared image in dimension, wherein scale is the required magnification.

The above steps S1-S3 do not merely represent the order between steps, but rather are step notations.

Ablation experiments a were performed as follows for the examples of the present invention. In the ablation experiment, when testing the performance of one module, the basic module used by different experimental projects in the same group of experiments is always ensured to be identical to the network structure so as to stabilize non-experimental variables. The experimentally selected image reconstruction magnification was 4 times.

Ablation experiment a, ablation experiment of high-efficiency residual self-attention module and high-efficiency detail self-attention module. When neither module is used, the basic residual block is used for substitution. The experimental results are shown in table 1, and it can be obtained that after the high-efficiency residual self-attention module and the high-efficiency detail self-attention module are used, the average peak signal to noise ratio is highest, which indicates that the model learns the mixed characteristics which are more conducive to image reconstruction, and the super-resolution reconstruction effect is remarkably enhanced.

TABLE 1

Comparing this embodiment with other advanced lightweight infrared image SR methods, the comparison results are shown in table 2, where in the performance and efficiency trade-off, this embodiment uses fewer parameters, but obtains the optimal peak signal-to-noise ratio and structural similarity in the SR task for all magnification factors.

TABLE 2

Referring to FIG. 8, a SR image contrast diagram of an embodiment of the application and other infrared image SR methods is shown. Since fusion-A is a public infrared image dataset, embodiments of the present application perform subjective visual contrast on fusion-A. In the reconstructed image of "Fused1", the line distortion and the local blurring occur to different degrees in all reconstructed images. However, the embodiment of the application obviously restores the whole outline and the line trend of the image block more completely, which is superior to other advanced methods. Further, in the reconstructed image of "Fused4", the reconstructed image of the embodiment of the application has more precise contours and less blurring than the reconstructed image of other advanced methods.

Therefore, the infrared image super-resolution method based on self-attention provided by the embodiment of the application can obviously reduce the model parameter, can obtain the optimal peak signal-to-noise ratio and structural similarity, and maintains a higher super-resolution reconstruction effect.

With further reference to fig. 9, as an implementation of the method shown in the foregoing figures, the present application provides an embodiment of a self-focusing infrared image super-resolution apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which is particularly applicable to various electronic devices.

The embodiment of the application provides an infrared image super-resolution device based on self-attention, which comprises:

a data acquisition module 1 configured to acquire a low resolution infrared image to be reconstructed;

the model construction module 2 is configured to construct and train a self-attention-based lightweight infrared image super-resolution model to obtain a trained lightweight infrared image super-resolution model;

the execution module 3 is configured to input a low-resolution infrared image to be reconstructed into a trained lightweight infrared image super-resolution model, wherein the trained lightweight infrared image super-resolution model comprises a 3×3 convolution layer, a lightweight Transformer and CNN backbone, a high-efficiency detail self-attention module and an image reconstruction module, and the lightweight Transformer and CNN backbone comprises 6 hybrid Transformer residual modules and 2 high-efficiency residual self-attention modules which are sequentially connected, and the specific operations are as follows:

；

wherein,and->Respectively representing functions corresponding to the hybrid transducer residual error module and the high-efficiency residual error self-attention module; />、/>And->The output of the first characteristic and the 6 th high-efficiency residual self-attention module and the output of the lightweight transducer and CNN backbone are respectively represented; the low-resolution infrared image to be reconstructed is input into a 3X 3 convolution layer to obtain a first characteristic, and then sequentially passes through a lightweight transducer, a CNN backbone, a high-efficiency detail self-attention module and the high-efficiency detail self-attention module And (3) cycling for n times in a parameter sharing mode to obtain a second feature, connecting the first feature and the second feature in a residual way, inputting the residual connection to an image reconstruction module, and outputting a high-resolution infrared image.

Referring now to fig. 10, there is illustrated a schematic diagram of a computer apparatus 1000 suitable for use in an electronic device (e.g., a server or terminal device as illustrated in fig. 1) for implementing an embodiment of the present application. The electronic device shown in fig. 10 is merely an example, and should not impose any limitation on the functionality and scope of use of embodiments of the present application.

As shown in fig. 10, the computer apparatus 1000 includes a Central Processing Unit (CPU) 1001 and a Graphics Processor (GPU) 1002, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1003 or a program loaded from a storage section 1009 into a Random Access Memory (RAM) 1004. In the RAM 1004, various programs and data required for the operation of the apparatus 1000 are also stored. The CPU 1001, the GPU1002, the ROM 1003, and the RAM 1004 are connected to each other by a bus 1005. An input/output (I/O) interface 1006 is also connected to bus 1005.

The following components are connected to the I/O interface 1006: an input section 1007 including a keyboard, a mouse, and the like; an output portion 1008 including a speaker, such as a Liquid Crystal Display (LCD), or the like; a storage section 1009 including a hard disk or the like; and a communication section 1010 including a network interface card such as a LAN card, a modem, or the like. The communication section 1010 performs communication processing via a network such as the internet. The drive 1011 may also be connected to the I/O interface 1006 as needed. A removable medium 1012 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 1011 as necessary, so that a computer program read out therefrom is installed into the storage section 1009 as necessary.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communications portion 1010, and/or installed from the removable media 1012. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 1001 and a Graphics Processor (GPU) 1002.

It should be noted that the computer readable medium according to the present application may be a computer readable signal medium or a computer readable medium, or any combination of the two. The computer readable medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor apparatus, device, or means, or a combination of any of the foregoing. More specific examples of the computer-readable medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution apparatus, device, or apparatus. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may be any computer readable medium that is not a computer readable medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution apparatus, device, or apparatus. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based devices which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules involved in the embodiments of the present application may be implemented in software or in hardware. The described modules may also be provided in a processor.

As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a low-resolution infrared image to be reconstructed; constructing and training a light-weight infrared image super-resolution model based on self-attention to obtain a trained light-weight infrared image super-resolution model; inputting a low-resolution infrared image to be reconstructed into a trained lightweight infrared image super-resolution model, wherein the trained lightweight infrared image super-resolution model comprises a 3×3 convolution layer, a lightweight Transformer and CNN backbone, a high-efficiency detail self-attention module and an image reconstruction module, and the lightweight Transformer and CNN backbone comprises 6 hybrid Transformer residual modules and 2 high-efficiency residual self-attention modules which are sequentially connected, and the specific operation is as follows:

；

The above description is only illustrative of the preferred embodiments of the present application and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the application referred to in the present application is not limited to the specific combinations of the technical features described above, but also covers other technical features formed by any combination of the technical features described above or their equivalents without departing from the inventive concept described above. Such as the above-mentioned features and the technical features disclosed in the present application (but not limited to) having similar functions are replaced with each other.

Claims

1. The self-attention-based infrared image super-resolution method is characterized by comprising the following steps of:

acquiring a low-resolution infrared image to be reconstructed;

inputting the low-resolution infrared image to be reconstructed into a trained lightweight infrared image super-resolution model, wherein the trained lightweight infrared image super-resolution model comprises a 3×3 convolution layer, lightweight transformers and CNN backbones, a high-efficiency detail self-attention module and an image reconstruction module, and the lightweight transformers and CNN backbones comprise 6 hybrid transformers residual modules and 2 high-efficiency residual self-attention modules which are sequentially connected, and the specific operation is as follows:

；

wherein,and->Respectively representing functions corresponding to the hybrid converter residual error module and the high-efficiency residual error self-attention module; />、/>And->The output of the first characteristic and the 6 th high-efficiency residual self-attention module and the output of the lightweight Transformer and CNN backbone are respectively represented; the low-resolution infrared image to be reconstructed is input into a 3X 3 convolution layer to obtain a first feature, then sequentially passes through the lightweight Transformer, CNN backbone and the high-efficiency detail self-attention module, the high-efficiency detail self-attention module circulates n times in a parameter sharing mode to obtain a second feature, the first feature and the second feature are connected in a residual mode, and then the first feature and the second feature are input into the image reconstruction module to output a high-resolution infrared image.

2. The method of claim 1, wherein the efficient residual self-attention module comprises 1 layer normalization, 4 1 x 1 depth separable convolutional layers, and 3 x 3 depth separable convolutional layers, as follows:

the input features are normalized through layers and respectively pass through 3 groups of depth separable convolution modules to generate query Q, key K and value V, wherein the depth separable convolution modules comprise 1 multiplied by 1 depth separable convolution layer and 1 multiplied by 3 depth separable convolution layer, the query Q, the key K and the value V are input into transformation functions to generate Q, K and V, the self-attention force diagram of the non-activated function of the current layer is obtained according to the Q, K, and the self-attention force diagram of the non-activated function of the current layer is connected with the self-attention force diagram of the non-activated function output by the last layer of high-efficiency residual self-attention module through residual connection according to pixel points and then is subjected to ELU activation functions, and the self-attention force diagram of the activated function of the current layer is obtained according to the following formula:

；

wherein,、/>and->The self-attention force diagram of the non-activated function of the current layer, the self-attention force diagram of the non-activated function of the upper layer and the self-attention force diagram of the activated function of the current layer are respectively; / >、/>Is a learnable parameter; />Representing ELU activation function, based on the value V and self-attention force diagram +.>Generating self-attention seeking weighting features->The self-attention seeking weighting feature +.>After passing through 1 x 1 depth separable convolution layer, the input feature of the self-attention module is to be combined with the high-efficiency residual error +.>Adding by pixel to generate the output feature +.>。

3. The method of claim 1, wherein the hybrid transform residual module comprises 2 high-efficiency residual self-attention modules, 2 3 x 3 depth separable convolutional layers, and 2 1 x 1 convolutional layers, and the method comprises the following steps:

；

wherein,、/>、/>and->Representing the functions corresponding to the 1 st 1 x 1 convolution layer, the 1 st high-efficiency residual self-attention module, the 1 st 1 x 1 depth separable convolution layer and the 2 nd high-efficiency residual self-attention module respectively,/->、And->Respectively representing the input of the hybrid transform residual module, the output of the 1 st 1 x 1 convolution layer and the output of the hybrid transform residual module,>representation->And (3) operating.

4. The method of claim 1, wherein the high-efficiency detail self-attention module comprises 2 multi-headed self-attention modules, 2 feedforward neural modules, and 1 x 1 convolution layer, each of the 2 multi-headed self-attention modules generating a query Q by developing vectors from 2 sets of fully connected layers with asymmetric kernels ⁱ Key K ⁱ Sum value V ⁱ Wherein Q is ⁱ ,K ⁱ ,V ⁱ Q-switching queries in group 2 and utilizing Q ¹ K ^2T And Q is equal to ² K ^1T Generating two sets of self-attention force diagrams A ⁱ /> ，/>And then respectively with the value V ² And V is equal to ¹ Performing matrix multiplication to generate two aggregate features F ⁱ />Respectively inputting the two aggregation features into 2 feedforward nerve modules to obtain the output of the 2 feedforward nerve modules, and performing +_ on the output of the 2 feedforward nerve modules after feature folding>Operating to obtain splicing characteristics, inputting the splicing characteristics into a 1X 1 convolution layer to obtain second characteristics F _output />The formula is as follows:

；

wherein,and->Representing the modules corresponding to the multi-head self-attention module and the feedforward nerve module in the classical transducer respectively; />、/>、/>、/>And->Representing the output of the first multi-headed self-attention module, the output of the second multi-headed self-attention module, the output of the first feedforward neural module, the output of the second feedforward neural module, and the second characteristic, respectively; />Is a parameter that can be learned; />Representing a function corresponding to the 1×1 convolution layer; />Representing a Softmax activation function.

5. The self-attention based infrared image super resolution method of claim 1, wherein said image reconstruction module uses sub-pixel convolution for up-sampling.

6. The self-attention-based infrared image super-resolution method of claim 1, wherein the operation of the lightweight infrared image super-resolution model is as follows:

；

wherein,representing the function corresponding to the 3 x 3 convolutional layer, < >>Is an output characteristic of a 3 x 3 convolutional layer, < >>Representing the function of the lightweight transducer corresponding to the CNN backbone, +.>Is the output characteristic of the lightweight transducer and CNN backbone, +.>Representing a function corresponding to the high-efficiency detail self-attention module, wherein n represents n times of circulation of the high-efficiency detail self-attention module in a manner of sharing parameters,/->Output features of self-attention module which are efficient details，Representing a function corresponding to said image reconstruction module, < >>And->Representing an input low resolution image and an output high resolution image, respectively.

7. An infrared image super-resolution device based on self-attention, comprising:

The execution module is configured to input the low-resolution infrared image to be reconstructed into a trained lightweight infrared image super-resolution model, the trained lightweight infrared image super-resolution model comprises a 3 x 3 convolution layer, a lightweight Transformer and CNN backbone, a high-efficiency detail self-attention module and an image reconstruction module, wherein the lightweight Transformer and CNN backbone comprise 6 hybrid Transformer residual modules and 2 high-efficiency residual self-attention modules which are sequentially connected, and the specific operation is as follows:

；

8. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs,

when executed by the one or more processors, causes the one or more processors to implement the method of any of claims 1-6.

9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-6.