CN116402679A - Lightweight infrared super-resolution self-adaptive reconstruction method

Info

Publication number: CN116402679A
Application number: CN202211692350.XA
Authority: CN
Inventors: 蒋一纯; 刘云清; 詹伟达; 陈宇; 韩登; 于永吉
Original assignee: Changchun University of Science and Technology
Current assignee: Changchun University of Science and Technology
Priority date: 2022-12-28
Filing date: 2022-12-28
Publication date: 2023-07-07
Anticipated expiration: 2042-12-28

Abstract

The invention belongs to the technical field of image processing, and in particular relates to a lightweight infrared super-resolution self-adaptive reconstruction method comprising the following steps: step 1, constructing a network model: the infrared image super-resolution reconstruction model comprises an input initialization layer, an image feature extraction module and an output image reconstruction module; step 2, preparing a data set: preparing an infrared image data set and applying simulated downsampling and data augmentation to it for subsequent network training; step 3, training the network model: training the infrared image super-resolution reconstruction model. The self-adaptive image feature processing unit provided by the invention confines the self-attention mechanism to a sliding window and adaptively computes and updates the feature values in the window from each feature within it, which avoids applying the same convolution kernel throughout the local window, improves expressive capability, and reduces the computation incurred by the self-attention mechanism during training and inference.

Description

Lightweight infrared super-resolution self-adaptive reconstruction method

Technical Field

The invention relates to the technical field of image processing, in particular to a lightweight infrared super-resolution self-adaptive reconstruction method.

Background

Infrared images are formed by sensing the thermal radiation emitted by objects in the environment. Imaging does not depend on reflected ambient light or an artificial light source, and it offers strong anti-interference and all-weather working capability. Owing to its excellent identification capability and passive imaging, infrared imaging is widely used in military, autonomous driving, security and other fields. However, the manufacturing process of infrared imaging sensors is complex, and dense arrays need to be supported by a cooler, so their resolution is generally low and their cost is high. Compared with directly improving the imaging sensor, using image super-resolution to recover part of the high-frequency information in infrared images improves image resolution and quality at low cost, and therefore has important practical significance and broad application prospects. Infrared image super-resolution is a highly underdetermined problem: the lost details must be estimated from a large number of image structural relations, which makes super-resolution reconstruction of infrared images difficult. The current mainstream approach uses a convolutional neural network to map a low-resolution infrared image to a high-resolution infrared image, and it is limited by the principle of convolution kernel parameter sharing in convolutional networks.

Chinese patent publication No. CN112308772B, entitled "Super-resolution reconstruction method based on deep learning local and non-local information", constructs a deep neural network model in which, after an image is input into the network, the same set of feature screening networks is time-division multiplexed; its two modules comprise a local network and a non-local enhancement network, and the details lost in the image are recovered through very deep convolution operations. Because the convolution operation uses a fixed convolution kernel at each layer, a shallow network has poor expressive capability, so the network is usually designed to be deep and wide, with high computational complexity and memory occupancy. Therefore, how to overcome the limitation of the convolution operation and achieve high-quality super-resolution reconstruction with a small number of learnable parameters and multiply-add operations is a problem that needs to be solved by those skilled in the art.

Disclosure of Invention

(I) Technical problem to be solved

Aiming at the defects of the prior art, the invention provides a lightweight infrared super-resolution self-adaptive reconstruction method, which solves the problems in the background art.

(II) Technical solution

To achieve the above purpose, the invention adopts the following technical solution:

A lightweight infrared super-resolution self-adaptive reconstruction method comprises the following steps:

step 1, constructing a network model: the infrared image super-resolution reconstruction model comprises an input initialization layer, an image feature extraction module and an output image reconstruction module;

step 2, preparing a data set: preparing an infrared image data set and applying simulated downsampling and data augmentation to it for subsequent network training;

step 3, training the network model: training the infrared image super-resolution reconstruction model by inputting the data set prepared in step 2 into the network model constructed in step 1;

step 4, minimizing the loss function and selecting an optimal evaluation index: minimizing the loss function between the network output image and the label; when the number of training iterations reaches a set threshold or the value of the loss function falls within a set range, the model is considered to be pre-trained and the model parameters are saved; at the same time, an optimal evaluation index is selected to measure the accuracy of the algorithm and evaluate the performance of the system;

step 5, fine-tuning the model: preparing a plurality of additional infrared image data sets and training and fine-tuning the model on them to obtain better model parameters and further improve the generalization capability of the model, so that the final model maintains good reconstruction quality for infrared imagers of various models;

step 6, saving the model: freezing the finally determined model parameters; when infrared image super-resolution reconstruction is needed, an image is input directly into the network to obtain the final reconstructed image.

In the above lightweight infrared super-resolution self-adaptive reconstruction method, the input initialization layer of the infrared image super-resolution reconstruction model in step 1 is a single convolution layer that maps the input image into a feature space for further refinement and processing of subsequent features; the image feature extraction module consists of four self-adaptive image feature processing units; specifically, each self-adaptive image feature processing unit consists of a first convolution layer, a self-attention layer and a second convolution layer, wherein the self-attention layer consists of linear feature disassembly, a self-attention mechanism, a relative position coding layer, a first fully connected layer, a second fully connected layer and feature recombination; the output image reconstruction module consists of a channel compression layer, a global skip connection and a pixel recombination layer.

In the above lightweight infrared super-resolution self-adaptive reconstruction method, the FLIR ADAS data set is used as the infrared image data set for training in step 2; the infrared images in the data set are downsampled by simulated factors of 2, 3 and 4, respectively, for supervised training of super-resolution reconstruction models at the different super-resolution scales.

In the above lightweight infrared super-resolution self-adaptive reconstruction method, a self-adaptive loss function is used as the loss function during training in step 4. When the deviation value is high, pixel loss is used to optimize the network parameters stably and quickly, which avoids gradient explosion; when the deviation value falls below the threshold, structural loss is used so that the optimization of the network parameters focuses on restoring the texture details of the image. The choice of loss function affects the quality of the model; a suitable loss function truly reflects the difference between the predicted value and the true value and correctly feeds back the quality of the model.

In the above lightweight infrared super-resolution self-adaptive reconstruction method, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are selected as the evaluation indexes during training in step 4; they effectively evaluate the quality of the super-resolution reconstruction result and its degree of distortion relative to the real high-resolution image, and thus measure the performance of the network model.

In the above lightweight infrared super-resolution self-adaptive reconstruction method, the MFNet and TNO data sets are used for fine-tuning the model parameters in step 5.

The invention also provides a lightweight infrared super-resolution electronic device, comprising: a multifunctional video stream input/output interface, a central processing unit, a plurality of graphics processing units, a storage device, and a computer program stored on the storage device and capable of running on the processors; the steps of the above method are implemented when the central processing unit and the plurality of graphics processing units execute the computer program.

The invention also provides a computer readable storage medium having stored thereon computer program instructions which when executed by a processor perform the steps of the above method.

(III) Beneficial effects

Compared with the prior art, the invention provides a lightweight infrared super-resolution self-adaptive reconstruction method, which has the following beneficial effects:

The self-adaptive image feature processing unit provided by the invention confines the self-attention mechanism to a sliding window and adaptively computes and updates the feature values in the window from each feature within it, which avoids applying the same convolution kernel throughout the local window, improves expressive capability, and reduces the computation incurred by the self-attention mechanism during training and inference.

In the self-adaptive image feature processing unit, relative position codes are added within the sliding window, so that overlapping parts are not computed repeatedly when self-attention is calculated; when the overlapping parts are recombined, each pixel is updated with the mathematical expectation of the corresponding region over the windows, so no additional means of information interaction between windows is needed.

The invention does not use layer normalization in the self-attention computation, which preserves the integrity of image structure information and contrast information; in addition, the input feature vector and the new feature vector are concatenated and then fed into a feedforward network for updating, so that the low-frequency structure of the image is better preserved.

The invention provides a self-adaptive loss function which, by monitoring the state of the network model in real time during training, automatically selects whether the network learns overall similarity or image texture details, thereby improving the reconstruction performance of the final network model.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a diagram showing a network model structure according to the present invention;

FIG. 3 is a process flow diagram of an adaptive image processing unit of the present invention;

FIG. 4 is a schematic diagram of the working principle of the feature map in the sliding-window self-attention mechanism according to the present invention;

FIG. 5 is a schematic diagram illustrating the operation of the pixel reorganization according to the present invention;

FIG. 6 is a comparison of the main performance indexes of the lightweight infrared super-resolution method of the present invention and the prior art;

FIG. 7 is a schematic diagram of the internal structure of an electronic device for implementing the lightweight infrared super-resolution method according to the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1

As shown in fig. 1, a flow chart of a lightweight infrared super-resolution adaptive reconstruction method specifically includes the following steps:

step 1, constructing a network model: the infrared image super-resolution reconstruction model comprises an input initialization layer, an image feature extraction module and an output image reconstruction module; the input initialization is a single-layer convolution layer and is used for mapping the input image into a feature space for further refinement and processing of subsequent features; the image feature extraction module consists of four layers of self-adaptive image feature processing units, in particular, the self-adaptive image feature processing unit consists of a first convolution layer, a feature disassembly layer, a relative position coding layer, a self-attention layer, a feature recombination layer and a second convolution layer, wherein the self-attention layer consists of a linear self-attention mechanism, a first full-connection layer and a second full-connection layer; the output image reconstruction module consists of a channel compression layer, a global jump connection layer and a pixel recombination layer;

step 2, preparing a data set: preparing the FLIR ADAS infrared image data set; the infrared images in the data set are augmented and downsampled by simulated factors of 2, 3 and 4, respectively, for supervised training of super-resolution reconstruction models at the different super-resolution scales;

step 4, minimizing the loss function and selecting an optimal evaluation index: a self-adaptive loss function is used as the loss function during training; when the deviation value is high, pixel loss is used to optimize the network parameters stably and quickly, which avoids gradient explosion; when the deviation value falls below the threshold, structural loss is used so that the optimization of the network parameters focuses on restoring the texture details of the image; the loss function between the network output image and the label is minimized, and when the number of training iterations reaches a set threshold or the value of the loss function falls within a set range, the model is considered to be pre-trained and the model parameters are saved; at the same time, an optimal evaluation index is selected to measure the accuracy of the algorithm and evaluate the performance of the system;

step 5, fine-tuning the model: preparing the MFNet and TNO infrared image data sets and training and fine-tuning the model on them to obtain better model parameters and further improve the generalization capability of the model, so that the final model maintains good reconstruction quality for infrared imagers of various models;

step 6, saving the model: freezing the finally determined model parameters; when infrared super-resolution is needed, an image is input directly into the network to obtain the final reconstructed image.

Example 2:

step 1, constructing a network model;

The infrared image super-resolution reconstruction model in step 1 comprises an input initialization layer, an image feature extraction module and an output image reconstruction module. The input initialization layer is a convolution layer with a 3×3 kernel, a stride of 1, a padding of 1 and a bias term; it converts the input $I_{ir} \in \mathbb{R}^{1\times H\times W}$ into the feature space to obtain the initial feature $f_1 \in \mathbb{R}^{C\times H\times W}$. This process can be expressed as:

$$f_1 = W_1 * I_{ir} + B_1$$

where $W_1$ is the convolution kernel of the input initialization layer, $B_1$ is the bias of the convolution operation, and $*$ denotes the convolution operation;
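The input initialization layer can be illustrated with a minimal NumPy sketch. The function name `conv3x3_same` and the array layouts are assumptions made for illustration; as in most deep learning frameworks, the "convolution" is computed as a cross-correlation.

```python
import numpy as np

def conv3x3_same(image, kernels, bias):
    """Map a single-channel image (1, H, W) to C feature maps (C, H, W).

    kernels has shape (C, 1, 3, 3) and bias has shape (C,); stride 1 and
    zero padding of 1 keep the spatial size unchanged.
    """
    _, h, w = image.shape
    padded = np.pad(image, ((0, 0), (1, 1), (1, 1)))  # zero padding of 1
    out = np.empty((kernels.shape[0], h, w), dtype=np.float64)
    for c in range(kernels.shape[0]):
        acc = np.zeros((h, w))
        for dy in range(3):
            for dx in range(3):
                # shifted view of the padded image times one kernel tap
                acc += kernels[c, 0, dy, dx] * padded[0, dy:dy + h, dx:dx + w]
        out[c] = acc + bias[c]
    return out
```

With H×W input and C kernels, the result has shape (C, H, W), matching the dimensions of $f_1$ given above.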

The features are then passed to the image feature extraction module for further processing. The module contains 4 self-adaptive image processing units; each unit processes the feature map output by the previous layer, and its output feature map is concatenated with its input feature map along the channel dimension before being passed on. The feature $f_1 \in \mathbb{R}^{C\times H\times W}$ is fed into the image feature extraction module, and each unit yields an output feature $f_n \in \mathbb{R}^{(n+1)C\times H\times W}$, $n = 1, 2, 3, 4$.
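The channel growth described above can be sketched as follows. This is only an illustration of the concatenation pattern: `extract_features` is a hypothetical name, and each unit is assumed to be a callable that maps its input to a C-channel feature map.

```python
import numpy as np

def extract_features(f1, units):
    """Chain processing units, concatenating each output with its input.

    f1 has shape (C, H, W); every element of `units` is assumed to return
    a (C, H, W) array, so the channel count grows by C per unit.
    """
    feat = f1
    for unit in units:
        feat = np.concatenate([unit(feat), feat], axis=0)  # channel axis
    return feat
```

With four units and C initial channels, the result has 5C channels, which agrees with the size stated for the final feature of the module.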

The working principle of the n-th self-adaptive image processing unit is shown in FIG. 3. In the self-adaptive image processing unit, the feature map $f_i' \in \mathbb{R}^{C\times H\times W}$ first passes through a convolution with a 1×1 kernel and a stride of 1, which changes the number of channels to the feature-vector length required by the self-attention mechanism and yields a new feature map $f_1' \in \mathbb{R}^{C'\times H\times W}$. This process can be expressed as:

$$f_1' = \sigma(W_i' * f_i' + B_i')$$

where $\sigma(x) = \max(x, 0) + p \cdot \min(x, 0)$ is the parametric rectified linear unit with parameter $p$; because of its efficiency and excellent fitting ability, all activation functions in the present invention are parametric rectified linear units. Next, $f_1'$ is processed by the sliding-window self-attention mechanism shown in FIG. 4: within each window of size n×n, taken with a stride of m, the features are split along the channel dimension into n×n vectors of length $C'$, giving a vector set $w_{i,j}$,

where $i = 1, 2, \dots, H/m$ and $j = 1, 2, \dots, W/m$ are the window indices along the height and width directions, respectively. Then the query weight $W_Q$, the key weight $W_K$ and the value weight $W_V$ are each multiplied with every vector, mapping the feature vectors into a query vector Q, a key vector K and a value vector V. This process can be expressed as:

$$Q = W_Q w_{i,j}, \quad K = W_K w_{i,j}, \quad V = W_V w_{i,j}$$
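A minimal NumPy sketch of the window splitting and the three projections is given below. The names `extract_windows` and `project_qkv` are hypothetical, vectors are stored as rows (so the projections are written as right-multiplications), and no border padding is applied, which is an assumption: without padding the number of windows per axis is (H - n) // m + 1 rather than H/m.

```python
import numpy as np

def extract_windows(feat, n, m):
    """Split a (C, H, W) feature map into sliding windows.

    Returns an array of shape (rows, cols, n*n, C): for every window,
    n*n vectors of length C taken along the channel dimension.
    """
    c, h, w = feat.shape
    rows = (h - n) // m + 1
    cols = (w - n) // m + 1
    windows = np.empty((rows, cols, n * n, c), dtype=feat.dtype)
    for i in range(rows):
        for j in range(cols):
            patch = feat[:, i * m:i * m + n, j * m:j * m + n]  # (C, n, n)
            windows[i, j] = patch.reshape(c, n * n).T          # (n*n, C)
    return windows

def project_qkv(w_ij, w_q, w_k, w_v):
    """Project the vectors of one window; w_ij is (n*n, C), weights are (C, d)."""
    return w_ij @ w_q, w_ij @ w_k, w_ij @ w_v
```

When the stride m is smaller than the window size n, neighbouring windows overlap, which is why a recombination rule for overlapping pixels is needed later.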

The query vector Q is matrix-multiplied with the transpose $K^{T}$ of the key vector K, that is, inner products are computed within the vector set to obtain the correlation between the different vectors in the set; the correlation matrix is normalized by softmax and then multiplied by the value vector V to obtain the output of the self-attention mechanism.

In this computation, $B_P$ is the relative position encoding, which reduces the repeated self-attention calculations introduced by the sliding window, and $d_k$ is the length of the feature vector. The resulting feature vector then passes through fully connected layer one, is concatenated with the input vector, and passes through fully connected layer two to obtain the output vector.
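The attention formula itself does not survive in this text. The sketch below assumes the common scaled dot-product form with an additive position bias, softmax(Q K^T / sqrt(d_k) + B_P) V, and assumes no activation between the two fully connected layers; both are assumptions, as are the function names and the row-vector layout.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(w_ij, w_q, w_k, w_v, b_p):
    """Self-attention inside one window.

    w_ij: (n*n, C) vectors; w_q, w_k, w_v: (C, d) weights;
    b_p: (n*n, n*n) relative position bias (assumed additive).
    """
    q, k, v = w_ij @ w_q, w_ij @ w_k, w_ij @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k) + b_p
    return softmax(scores) @ v

def feed_forward(attn_out, w_ij, w1, b1, w2, b2):
    """FC layer one, concatenation with the input vectors, FC layer two."""
    hidden = attn_out @ w1 + b1
    joined = np.concatenate([hidden, w_ij], axis=-1)
    return joined @ w2 + b2
```

Note that no layer normalization appears anywhere in this sketch, in line with the statement that the self-attention computation does not use it.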

In these two layers, $W_1'$ and $W_2'$ are the weight parameters of fully connected layers one and two, and $B_1'$ and $B_2'$ are their respective bias parameters. After the output vectors are obtained, they are recombined into a feature map $f_2' \in \mathbb{R}^{C'\times H\times W}$ in the original order, where each overlapping pixel is replaced by the expected value of that pixel over the windows containing it. Finally, the feature map $f_2'$ passes through a convolution with a kernel size of 1×1 and a stride of 1 and is then added to the input feature map $f_i'$ to form a local residual connection, giving the output feature $f_o'$. This process can be expressed as:

$$f_o' = \sigma(W_o' * f_2' + B_o') + f_i'$$
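The recombination and the local residual connection can be sketched as below. The names are hypothetical, the expectation over windows is taken to be a plain arithmetic mean, and the 1×1 convolution is written as a per-pixel channel mixing; these are assumptions for illustration.

```python
import numpy as np

def merge_windows(windows, n, m, h, w):
    """Recombine (rows, cols, n*n, C) windows into a (C, h, w) feature map,
    averaging every pixel over all windows that contain it."""
    rows, cols, _, c = windows.shape
    total = np.zeros((c, h, w))
    count = np.zeros((1, h, w))
    for i in range(rows):
        for j in range(cols):
            patch = windows[i, j].T.reshape(c, n, n)
            total[:, i * m:i * m + n, j * m:j * m + n] += patch
            count[:, i * m:i * m + n, j * m:j * m + n] += 1
    return total / np.maximum(count, 1)  # uncovered pixels stay 0

def prelu(x, p):
    """Parametric rectified linear unit: max(x, 0) + p * min(x, 0)."""
    return np.maximum(x, 0.0) + p * np.minimum(x, 0.0)

def unit_output(f2, f_in, w_o, b_o, p):
    """1x1 convolution (channel mixing), activation, then residual add.

    f2: (C', H, W); f_in: (C, H, W); w_o: (C, C'); b_o: (C,).
    """
    mixed = np.einsum('oc,chw->ohw', w_o, f2) + b_o[:, None, None]
    return prelu(mixed, p) + f_in
```

Averaging the overlapping pixels is what lets information from neighbouring windows mix without a separate cross-window exchange step.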

After the image feature extraction module produces the output feature $f_4 \in \mathbb{R}^{5C\times H\times W}$, the channel compression layer, a convolution with a kernel size of 1×1 and a stride of 1, compresses the number of channels to match the initial feature $f_1 \in \mathbb{R}^{C\times H\times W}$. After the result is added to the initial feature, a convolution layer with a kernel size of 1×1 and a stride of 1 further reduces the channels to the square of the scale factor, and finally pixel reorganization, as shown in FIG. 5, outputs the final super-resolution reconstructed image $I_{SR} \in \mathbb{R}^{1\times sH\times sW}$ (s is the super-resolution factor). This operation can be expressed as:

$$I_{SR} = G_{pixelshuffle}\big(W_{c2} * (\sigma(W_{c1} * f_4) + f_1)\big)$$

where $W_{c1}$ and $W_{c2}$ are the weight parameters of the channel compression layer and the convolution layer, respectively, and $G_{pixelshuffle}(\cdot)$ denotes the pixel reorganization operation;
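Pixel reorganization rearranges s² channels of size H×W into one channel of size sH×sW. A minimal NumPy sketch follows; the function name is hypothetical and the channel ordering is assumed to follow the usual pixel-shuffle convention (channel index i·s + j goes to sub-pixel offset (i, j)).

```python
import numpy as np

def pixel_shuffle(x, s):
    """Rearrange (c * s * s, H, W) into (c, s * H, s * W)."""
    channels, h, w = x.shape
    out_c = channels // (s * s)
    x = x.reshape(out_c, s, s, h, w)   # split channels into (i, j) offsets
    x = x.transpose(0, 3, 1, 4, 2)     # (out_c, H, i, W, j)
    return x.reshape(out_c, h * s, w * s)
```

For a single-channel output with scale factor s, the input must therefore have exactly s² channels, which is why the preceding convolution reduces the channels to the square of the scale factor.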

step 2, preparing a data set;

The data set in step 2 is the FLIR ADAS data set, comprising 8862 thermal infrared images with a resolution of 512×640. The images are first cut into 256×256 image blocks, giving 37976 image blocks in total; low-resolution images are then obtained by bicubic downsampling and combined with the originals into high-/low-resolution image pairs. To expand the amount of data, the images are transformed by horizontal flipping, vertical flipping, rotation, translation, and scaling with cropping;
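The preparation steps can be sketched in NumPy as follows. The crop stride is not stated in the text, so `stride` is a hypothetical parameter, and block averaging is used here only as a dependency-free stand-in for bicubic downsampling, which would normally come from an image library.

```python
import numpy as np

def crop_patches(image, size=256, stride=256):
    """Cut a 2-D image into size x size blocks on a regular grid."""
    h, w = image.shape
    return [image[y:y + size, x:x + size]
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]

def augment(patch):
    """Original, horizontal flip, vertical flip and 90-degree rotation."""
    return [patch, np.fliplr(patch), np.flipud(patch), np.rot90(patch)]

def downsample_mean(patch, s):
    """Stand-in for bicubic downsampling: average over s x s blocks."""
    h, w = patch.shape
    h2, w2 = h // s * s, w // s * s  # drop rows/columns that do not fit
    return patch[:h2, :w2].reshape(h2 // s, s, w2 // s, s).mean(axis=(1, 3))
```

Each high-resolution block and its downsampled version form one training pair; the same transform must be applied to both members of a pair when augmenting.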

step 3, training a network model;

The training scheme in step 3 is as follows. The number of training epochs is set to 100, and the batch size (the number of images fed to the network at a time) is about 16 to 32; its upper limit is mainly determined by the performance of the computer's graphics processor, and a batch size within the 16 to 32 range generally makes network training more stable and gives better results. The learning rate during training is set to 0.001, which keeps training fast while avoiding gradient explosion. When training reaches 100, 150 and 175 epochs, the learning rate is reduced to 0.1 times its current value, so that the optimal parameter values can be approached more closely. The adaptive moment estimation (Adam) algorithm is selected as the network parameter optimizer; its advantage is that, after bias correction, the learning rate of each iteration lies within a definite range, which keeps the parameters stable. The threshold for the loss function value is set to 0.01; when the loss value falls below this threshold, training of the whole network can be considered essentially complete;
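The step-wise learning rate decay can be written as a small pure-Python function. The name `learning_rate` is hypothetical, and the milestone values are interpreted as epoch counts, which is an assumption.

```python
def learning_rate(epoch, base_lr=0.001, milestones=(100, 150, 175), gamma=0.1):
    """Multiply the learning rate by gamma at every milestone reached."""
    passed = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * gamma ** passed
```

Under these values the rate falls from 0.001 to 0.0001, 0.00001 and finally 0.000001 after the third milestone.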

step 4, minimizing a loss function and selecting an optimal evaluation index;

In step 4, the loss value is computed between the network output and the label, and a better super-resolution reconstruction is achieved by minimizing the loss function. The loss function uses structural similarity and pixel loss, and switches between them according to the current training state of the model. The structural similarity is calculated as:

$$\mathrm{SSIM}(x, y) = [l(x, y)]^{\alpha} \cdot [c(x, y)]^{\beta} \cdot [s(x, y)]^{\gamma}$$

where $l(x, y)$ is the luminance comparison function, $c(x, y)$ is the contrast comparison function and $s(x, y)$ is the structure comparison function, defined as:

$$l(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad c(x, y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$$

In practice, $\alpha$, $\beta$ and $\gamma$ are all set to 1 and $C_3 = 0.5\,C_2$, so the structural similarity can be expressed as:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

Here x and y denote the pixels of an N×N window in the two images; $\mu_x$ and $\mu_y$ are the means of x and y and serve as luminance estimates; $\sigma_x^2$ and $\sigma_y^2$ are the variances of x and y and serve as contrast estimates; $\sigma_{xy}$ is the covariance of x and y and serves as the structural similarity measure; $C_1$ and $C_2$ are small constants that prevent the denominator from being 0, usually taken as 0.01 and 0.03, respectively. The structural similarity of the whole image is calculated as:

$$\mathrm{MSSIM}(X, Y) = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\mathrm{SSIM}(x_{ij}, y_{ij})$$

where X and Y are the two images to be compared, MN is the total number of windows, and $x_{ij}$ and $y_{ij}$ are the corresponding local windows in the two images. The structural similarity is symmetric and its value lies in [0, 1]; the closer the value is to 1, the greater the structural similarity and the smaller the difference between the two images. In general, the difference between the structural similarity and 1 is minimized directly through network optimization, and the structural similarity loss is:

$$\mathrm{SSIM}_{loss} = 1 - \mathrm{MSSIM}(I_{ir}, I_{SR})$$

By optimizing the structural similarity loss, the structural difference between the output image and the input image is gradually reduced, so that the images become closer in brightness and contrast and in visual perception, and the generated image is of higher quality;
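The structural similarity loss can be sketched in NumPy as below. This is a simplified illustration: windows are taken as non-overlapping n×n blocks, the function names are hypothetical, and the defaults for the two constants follow the values quoted in the text.

```python
import numpy as np

def ssim_window(x, y, c1=0.01, c2=0.03):
    """SSIM of two equally sized windows with alpha = beta = gamma = 1."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

def mssim(img_x, img_y, n=8):
    """Mean SSIM over non-overlapping n x n windows of two 2-D images."""
    h, w = img_x.shape
    scores = [ssim_window(img_x[i:i + n, j:j + n], img_y[i:i + n, j:j + n])
              for i in range(0, h - n + 1, n)
              for j in range(0, w - n + 1, n)]
    return float(np.mean(scores))

def ssim_loss(img_x, img_y, n=8):
    """Structural similarity loss: 1 - MSSIM."""
    return 1.0 - mssim(img_x, img_y, n)
```

For two identical images every window scores 1, so the loss is 0; any structural mismatch lowers the window scores and raises the loss.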

the pixel loss function is defined as follows:

When network training starts or fluctuates severely, the pixel loss optimizes the network parameters stably, so that the network keeps training in the correct direction. However, the pixel loss is dominated by differences in the low-frequency part, where the energy is concentrated; once that difference is small, the structural similarity loss, which focuses on differences in image structure, is better suited to fine-tuning the network. On this basis, the total loss function is defined as:
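The formulas for the pixel loss and the total loss are not reproduced in this text. The sketch below only illustrates the switching behaviour described above; it assumes a mean absolute error as the pixel loss, uses the pixel loss itself as the deviation value, and takes a hypothetical threshold of 0.05. None of these choices is confirmed by the source.

```python
import numpy as np

def pixel_loss(pred, target):
    """Assumed pixel loss: mean absolute error."""
    return float(np.mean(np.abs(pred - target)))

def adaptive_loss(pred, target, structural_loss, threshold=0.05):
    """Use pixel loss while the deviation is high, structural loss afterwards.

    structural_loss is any callable (pred, target) -> float, for example
    a structural similarity loss.
    """
    deviation = pixel_loss(pred, target)
    if deviation > threshold:
        return deviation
    return structural_loss(pred, target)
```

The hard switch keeps early training driven by the stable pixel term and hands over to the structural term only once the outputs are already close to the labels.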

In step 4, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are selected as the evaluation indexes. PSNR is based on the error between corresponding pixels, that is, it is an error-sensitive image quality measure; SSIM measures the similarity of two digital images in terms of luminance, contrast and structure. The structural similarity is defined as in the loss function; the peak signal-to-noise ratio is defined as follows:
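The PSNR formula is not reproduced in this text. A minimal sketch using the standard definition, 10·log10(MAX² / MSE), is given below; the function name and the default peak value of 1.0 (images scaled to [0, 1]) are assumptions.

```python
import numpy as np

def psnr(reference, test, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

For 8-bit images, `max_value` would be 255 instead.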

step 5, fine tuning the model;

In step 5, the infrared image data of the MFNet and TNO data sets are used, comprising about 2000 infrared images with a resolution of 640×480. The same image preprocessing as in step 2 is applied to these images to obtain the model fine-tuning data set. The model weight parameters obtained in step 4 are loaded, the learning rate is adjusted to 0.000001, and the image pairs of the fine-tuning data set are input into the model, which is trained for a further 10 training periods;

step 6, saving the model and parameters;

After network training is completed, all parameters in the network are saved in step 6; thereafter, a super-resolution reconstruction result can be obtained by inputting an image of any size;

The implementations of convolution, concatenation, upsampling, downsampling and similar operations are algorithms well known to those skilled in the art, and the specific procedures and methods can be found in the corresponding textbooks or technical literature.

The lightweight infrared super-resolution self-adaptive reconstruction method obtains a higher-quality super-resolution reconstruction result; owing to its lightweight structure, it has fewer parameters than existing complex networks and can be applied to various mobile devices. The feasibility and superiority of the method are further verified by calculating the related indexes of the images obtained by the existing methods; a comparison of the related indexes of the prior art and the proposed method is shown in FIG. 6;

Based on the same inventive concept as the above super-resolution image reconstruction method, the embodiment of the present application further provides an electronic device, which may specifically be a desktop computer, a portable computer, an edge computing device, a tablet computer, a smartphone or another device with signal transmission, floating-point computation and storage capabilities. As shown in FIG. 7, the main components of the electronic device are a processor, a memory and a communication interface;

The processor may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor (DSP), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application; the general-purpose processor may be a microprocessor or any conventional processor; the steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in a processor;

The memory serves as a non-volatile computer-readable storage medium for storing non-volatile software programs, non-volatile computer-executable programs and modules; the memory may include at least one type of storage medium, for example random access memory (RAM), static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, optical disc, and the like; the memory may be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer; the memory in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, and is configured to store program instructions or data;

The communication interface may be used for data transmission between the computing device and other computing devices, terminals or imaging devices, and may employ a general-purpose protocol, such as Universal Serial Bus (USB), universal synchronous/asynchronous receiver/transmitter (USART), Controller Area Network (CAN), etc.; the communication interface may be, but is not limited to, an interface for transferring data between different devices together with its communication protocol; the communication interface in the embodiment of the present application may also be optical communication or any other manner or protocol capable of implementing information transmission;

the invention also provides a lightweight infrared super-resolution self-adaptive reconstruction computer readable storage medium, which can be the computer readable storage medium contained in the device in the embodiment; or may be a computer-readable storage medium, alone, that is not assembled into a device; the computer-readable storage medium stores one or more programs for use by one or more processors to perform the methods described herein;

it should be noted that although the electronic device shown in FIG. 7 shows only a memory, a processor, and a communication interface, those skilled in the art will appreciate that, in a particular implementation, the apparatus also includes other devices necessary for proper operation; likewise, the apparatus may further include components implementing other additional functions according to specific needs; furthermore, the apparatus may include only the devices necessary to implement the embodiments of the present invention, and not necessarily all of the devices shown in FIG. 7.

Finally, it should be noted that the foregoing describes only a preferred embodiment of the present invention and does not limit the invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A lightweight infrared super-resolution self-adaptive reconstruction method is characterized in that: the method comprises the following steps:

2. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, wherein: the input initialization layer of the infrared image super-resolution reconstruction model in step 1 is a single convolution layer that maps the input image into a feature space for further refinement and processing of subsequent features; the image feature extraction module consists of four adaptive image feature processing units, each of which consists of a first convolution layer, a self-attention layer and a second convolution layer, wherein the self-attention layer consists of linear feature disassembly, a self-attention mechanism, a relative position coding layer, a first fully connected layer, a second fully connected layer and feature recombination; the output image reconstruction module consists of a channel compression layer, a global skip connection and a pixel recombination layer.
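As an illustration of the pixel recombination layer named in claim 2, the following is a minimal sketch that assumes the layer behaves like the commonly used pixel-shuffle (sub-pixel) rearrangement, in which r*r feature channels are interleaved into one channel whose height and width are r times larger. The function name `pixel_shuffle`, the nested-list tensor layout (channels, height, width) and the channel ordering are assumptions made for this example and are not taken from the claims.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Assumed channel ordering: input channel c*r*r + i*r + j supplies the
    pixel at row offset i and column offset j inside each r-by-r output cell.
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    # Allocate the upscaled output, filled with zeros.
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Four 1x1 feature maps become one 2x2 map when the scale factor is 2.
features = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]
upscaled = pixel_shuffle(features, 2)
```

In this reading, the channel compression layer would first reduce the feature maps to r*r channels per output channel, and the global skip connection would add an upsampled copy of the input to the rearranged result; both steps are left out of the sketch.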

3. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, wherein: the self-attention mechanism in step 1.

4. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, wherein: in step 1, the Transformer module is composed of an efficient global-local multi-head self-attention (EGLMSA) layer, a multi-layer perceptron (MLP), two normalization layers and two summation operations, wherein the efficient global-local multi-head self-attention layer extracts a global context and a local context respectively; the global context is critical for semantic segmentation of complex urban scenes, while local information is critical for preserving rich spatial detail, so the proposed efficient global-local attention constructs two parallel branches; the local branch is a relatively shallow structure that uses two parallel convolution layers to extract the local context, with two batch normalization operations added before the final summation; the global branch first applies a depthwise convolution to reduce the image resolution, thereby reducing computation and memory use; the result is then fed into layer normalization, and three linear projections produce the three vectors Q, K and V; Q, K and V are obtained by linearly transforming the input vector X, each transformation matrix W is learned, and the transformation improves the fitting capacity of the model; Q can be understood as the information to be queried, K as the vector being queried and V as the value obtained by the query; Q and K are matrix-multiplied, the result passes through a convolution layer, a Softmax activation function and an instance normalization operation, and the resulting attention is matrix-multiplied with V; finally, the global context from the global branch and the local context from the local branch are aggregated to generate a global-local context, and a depthwise convolution, a batch normalization operation and a standard convolution are used to characterize the fine-grained global-local context.
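The Q, K, V computation described in claim 4 can be illustrated with a minimal single-head sketch. It assumes plain scaled dot-product attention: the division by the square root of the feature dimension is a common convention that the claim does not state, and the convolution layer, instance normalization, downsampling and local branch mentioned in the claim are deliberately omitted. The helper names `matmul`, `softmax` and `attention` and the toy weight matrices are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(x, w_q, w_k, w_v):
    """Project X into Q, K, V, then weight V by softmax(Q K^T / sqrt(d))."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d = len(q[0])  # feature dimension (scaling by sqrt(d) is an assumption)
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Two tokens with two features each; identity matrices stand in for the
# learned projection matrices W.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]
context, weights = attention(tokens, identity, identity, identity)
```

Each row of `weights` sums to one, so every output token is a convex combination of the value vectors; with the identity projections above, each token attends most strongly to itself.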

5. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, wherein: the semantic segmentation data set in step 2 is the MFNet data set; the images of the training set and the verification set are cut into a plurality of image blocks, the resolution and dimension of each image block being the initial resolution and the initial dimension; and semantic segmentation labels are assigned to the classes in the cut images.

6. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, wherein: in step 3, the semantic segmentation data set used in the pre-training process is the MFNet data set; visible-light color images and infrared images are obtained by separating the four image channels of the data set, images with complex scenes, rich detail and complete categories are selected as training samples, the remaining images are used as test set samples, and the visible-light images and the infrared images are each fed into the network for training.

7. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, wherein: in step 4, the DiceLoss loss function is selected as the loss function in the training process; the choice of loss function affects the quality of the model, and a suitable loss function truly reflects the difference between the predicted value and the true value and correctly feeds back the quality of the model.
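For reference, the DiceLoss named in claim 7 is usually understood as one minus the Dice coefficient between the predicted and true masks. The sketch below assumes that standard definition on flattened masks; the function name `dice_loss` and the smoothing constant `eps` are assumptions made for the example, and the claim does not specify which variant is used.

```python
def dice_loss(pred, target, eps=1e-6):
    """Return 1 - Dice coefficient for two flattened masks of equal length.

    pred holds predicted probabilities in [0, 1]; target holds 0/1 labels.
    eps (an assumed smoothing term) avoids division by zero on empty masks.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


# A perfect prediction gives a loss near 0; a fully wrong one gives a loss near 1.
perfect = dice_loss([1.0, 0.0, 1.0], [1, 0, 1])
disjoint = dice_loss([1.0, 0.0], [0, 1])
```

Because the loss is driven by the overlap between prediction and label rather than by per-pixel accuracy, it stays informative when one class occupies only a small part of the image.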

8. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, wherein: in step 5, SODA is used in fine tuning the model parameters.