CN116402679A - Lightweight infrared super-resolution self-adaptive reconstruction method - Google Patents


Info

Publication number
CN116402679A
Authority
CN
China
Prior art keywords
image
resolution
model
infrared
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211692350.XA
Other languages
Chinese (zh)
Other versions
CN116402679B (en)
Inventor
蒋一纯
刘云清
詹伟达
陈宇
韩登
于永吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University of Science and Technology
Original Assignee
Changchun University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun University of Science and Technology
Priority to CN202211692350.XA
Publication of CN116402679A
Application granted
Publication of CN116402679B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046 Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of image processing and in particular relates to a lightweight infrared super-resolution self-adaptive reconstruction method comprising the following steps. Step 1, constructing a network model: the infrared image super-resolution reconstruction model comprises an input initialization layer, an image feature extraction module and an output image reconstruction module. Step 2, preparing a data set: an infrared image data set is prepared, and simulated downsampling and data augmentation are applied to it for subsequent network training. Step 3, training the network model: the infrared image super-resolution reconstruction model is trained. The self-adaptive image feature processing unit provided by the invention confines the self-attention mechanism to a sliding window and adaptively computes and updates the feature values in the window from the features within it. This avoids applying the same convolution kernel across the local window, improves expressive capability, and reduces the computation incurred by the self-attention mechanism during training and inference.

Description

Lightweight infrared super-resolution self-adaptive reconstruction method
Technical Field
The invention relates to the technical field of image processing, in particular to a lightweight infrared super-resolution self-adaptive reconstruction method.
Background
Infrared imaging works by sensing the thermal radiation emitted by objects in the environment. It does not depend on reflected ambient light or an artificial light source, and it offers strong interference resistance and all-weather operation. Owing to its excellent recognition capability and passive imaging, it is widely used in military, autonomous driving, security and other fields. However, the manufacturing process of infrared imaging sensors is complex, and dense arrays need to be supported by a cooler, so their resolution is generally low and their cost high. Compared with directly improving the imaging sensor, using an image super-resolution method to recover part of the high-frequency information in infrared images improves image resolution and quality at low cost, and therefore has important practical significance and broad application prospects. Infrared image super-resolution is a highly underdetermined problem: the lost details must be estimated from a large number of image structural relationships, which makes super-resolution reconstruction of infrared images difficult. The current mainstream approach uses a convolutional neural network to map a low-resolution infrared image to a high-resolution one, and is limited by the kernel parameter sharing inherent in convolutional networks.
Chinese patent publication CN112308772B discloses a super-resolution reconstruction method based on deep-learning local and non-local information. It constructs a deep neural network model in which, after an image is input, the same feature screening network is time-division multiplexed; its two modules are a local network and a non-local enhancement network, and details lost in the image are recovered through very deep convolution operations. Because a convolution operation uses a fixed kernel at each layer, a shallow network has poor expressive capability, so networks are often designed to be deep and wide, with high computational complexity and storage occupancy. How to overcome this limitation of convolution and achieve high-quality super-resolution reconstruction with a small number of learnable parameters and multiply-add operations is therefore a problem to be solved by those skilled in the art.
Disclosure of Invention
(I) Technical problem to be solved
In view of the deficiencies of the prior art, the invention provides a lightweight infrared super-resolution self-adaptive reconstruction method that solves the problems described in the Background.
(II) Technical solution
To achieve the above purpose, the invention adopts the following technical solution:
A lightweight infrared super-resolution self-adaptive reconstruction method comprises the following steps:
Step 1, constructing a network model: the infrared image super-resolution reconstruction model comprises an input initialization layer, an image feature extraction module and an output image reconstruction module;
Step 2, preparing a data set: an infrared image data set is prepared, and simulated downsampling and data augmentation are applied to it for subsequent network training;
Step 3, training the network model: the infrared image super-resolution reconstruction model is trained by inputting the data set prepared in step 2 into the network model constructed in step 1;
Step 4, minimizing the loss function and selecting the optimal evaluation index: the loss function between the network output image and the label is minimized; once the number of training iterations reaches a set threshold or the loss value falls within a set range, the model parameters are considered pre-trained and are saved. At the same time, the optimal evaluation index is selected to measure the accuracy of the algorithm and evaluate the performance of the system;
Step 5, fine-tuning the model: several additional infrared image data sets are prepared, and the model is trained and fine-tuned on them to obtain better model parameters and further improve its generalization capability, so that the final model maintains good reconstruction quality across infrared imagers of various models;
Step 6, saving the model: the finally determined model parameters are frozen, and whenever infrared image super-resolution reconstruction is needed, the image is input directly into the network to obtain the final reconstructed image.
In the above lightweight infrared super-resolution self-adaptive reconstruction method, in step 1 the input initialization layer of the infrared image super-resolution reconstruction model is a single convolution layer that maps the input image into a feature space for subsequent refinement and processing of features; the image feature extraction module consists of four self-adaptive image feature processing units, each consisting of a first convolution layer, a self-attention layer and a second convolution layer, wherein the self-attention layer consists of linear feature disassembly, a self-attention mechanism, a relative position coding layer, a first fully connected layer, a second fully connected layer and feature recombination; the output image reconstruction module consists of a channel compression layer, a global skip connection and a pixel recombination layer.
In the above lightweight infrared super-resolution self-adaptive reconstruction method, the FLIR ADAS dataset is used as the infrared image data set during training in step 2; the infrared images in the data set are downsampled by simulated factors of 2, 3 and 4 for supervised training of super-resolution reconstruction models at the corresponding scales.
In the above lightweight infrared super-resolution self-adaptive reconstruction method, in step 4 the self-adaptive loss function is used as the loss function during training: when the deviation value is high, pixel loss is used to optimize the network parameters stably and quickly and to avoid gradient explosion; once the deviation value falls below the threshold, structural loss is used so that parameter optimization focuses on restoring image texture details. The choice of loss function affects the quality of the model; it should truly reflect the difference between the predicted value and the true value and correctly feed back the quality of the model.
In the above lightweight infrared super-resolution self-adaptive reconstruction method, in step 4 the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are selected as the evaluation indexes during training; they effectively evaluate the quality of the algorithm's super-resolution reconstruction results and the degree of distortion relative to the real high-resolution images, and thus measure the performance of the network model.
In the above lightweight infrared super-resolution self-adaptive reconstruction method, in step 5 the MFNet and TNO datasets are used for fine-tuning the model parameters.
The invention also provides a lightweight infrared super-resolution electronic device, comprising: a multifunctional video stream input/output interface, a central processing unit, a plurality of graphics processing units, a storage device, and a computer program stored on the storage device and executable by the processors; the steps of the above method are implemented when the central processing unit and the plurality of graphics processing units execute the computer program.
The invention also provides a computer-readable storage medium storing computer program instructions which, when executed by a processor, perform the steps of the above method.
(III) Beneficial effects
Compared with the prior art, the invention provides a lightweight infrared super-resolution self-adaptive reconstruction method with the following beneficial effects:
The self-adaptive image feature processing unit provided by the invention confines the self-attention mechanism to a sliding window and adaptively computes and updates the feature values in the window from the features within it. This avoids applying the same convolution kernel across the local window, improves expressive capability, and reduces the computation incurred by the self-attention mechanism during training and inference.
In the self-adaptive image feature processing unit, relative position codes are added within the sliding window, which prevents the overlapping parts from being computed repeatedly when self-attention is calculated; when the overlapping parts are recombined, each is updated with the mathematical expectation of the corresponding region over the windows, so no additional means of information exchange between windows is needed.
The invention does not use layer normalization in the self-attention computation, which preserves the integrity of the image structure information and contrast information; in addition, the input feature vector and the new feature vector are concatenated before being fed to the feedforward network for updating, so the low-frequency structure of the image is better preserved.
The invention provides a self-adaptive loss function which, by monitoring the state of the network model in real time during training, automatically selects whether the network learns overall similarity or image texture details, thereby improving the reconstruction performance of the final network model.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram showing a network model structure according to the present invention;
FIG. 3 is a process flow diagram of the self-adaptive image processing unit of the present invention;
FIG. 4 is a schematic diagram of how the feature map is processed by the sliding-window self-attention mechanism of the present invention;
FIG. 5 is a schematic diagram illustrating the pixel recombination operation of the present invention;
FIG. 6 compares the main performance indexes of the lightweight infrared super-resolution method of the present invention with those of the prior art;
FIG. 7 is a schematic diagram of the internal structure of an electronic device implementing the lightweight infrared super-resolution method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
Example 1
As shown in FIG. 1, a lightweight infrared super-resolution self-adaptive reconstruction method specifically comprises the following steps:
Step 1, constructing a network model: the infrared image super-resolution reconstruction model comprises an input initialization layer, an image feature extraction module and an output image reconstruction module. The input initialization layer is a single convolution layer that maps the input image into a feature space for subsequent refinement and processing of features. The image feature extraction module consists of four self-adaptive image feature processing units; each unit consists of a first convolution layer, a feature disassembly layer, a relative position coding layer, a self-attention layer, a feature recombination layer and a second convolution layer, wherein the self-attention layer consists of a linear self-attention mechanism, a first fully connected layer and a second fully connected layer. The output image reconstruction module consists of a channel compression layer, a global skip connection layer and a pixel recombination layer;
Step 2, preparing a data set: the FLIR ADAS infrared image data set is prepared; the infrared images in the data set are augmented and downsampled by simulated factors of 2, 3 and 4, for supervised training of super-resolution reconstruction models at the corresponding scales;
Step 3, training the network model: the infrared image super-resolution reconstruction model is trained by inputting the data set prepared in step 2 into the network model constructed in step 1;
Step 4, minimizing the loss function and selecting the optimal evaluation index: the self-adaptive loss function is used as the loss function during training. When the deviation value is high, pixel loss is used to optimize the network parameters stably and quickly and to avoid gradient explosion; once the deviation value falls below the threshold, structural loss is used so that parameter optimization focuses on restoring image texture details. The loss function between the network output image and the label is minimized; once the number of training iterations reaches a set threshold or the loss value falls within a set range, the model parameters are considered pre-trained and are saved. At the same time, the optimal evaluation index is selected to measure the accuracy of the algorithm and evaluate the performance of the system;
Step 5, fine-tuning the model: the MFNet and TNO infrared image data sets are prepared, and the model is trained and fine-tuned on them to obtain better model parameters and further improve its generalization capability, so that the final model maintains good reconstruction quality across infrared imagers of various models;
Step 6, saving the model: the finally determined model parameters are frozen, and whenever infrared super-resolution is needed, the image is input directly into the network to obtain the final reconstructed image.
Example 2:
As shown in FIG. 1, a lightweight infrared super-resolution self-adaptive reconstruction method specifically comprises the following steps:
Step 1, constructing a network model;
The infrared image super-resolution reconstruction model in step 1 comprises an input initialization layer, an image feature extraction module and an output image reconstruction module. The input initialization layer is a convolution layer with a 3×3 kernel, a stride of 1, a padding of 1 and a bias term. It converts the input $I_{ir} \in \mathbb{R}^{1 \times H \times W}$ into the feature space to obtain the initial feature $f_1 \in \mathbb{R}^{C \times H \times W}$, which can be expressed as:
$$f_1 = W_1 * I_{ir} + B_1$$
where $W_1$ is the convolution kernel of the input initialization layer, $B_1$ is the bias of the convolution operation, and $*$ denotes the convolution operation;
The features are then fed to the image feature extraction module for further processing. The module contains 4 self-adaptive image processing units; each unit processes the feature map output by the previous layer, and its output feature map is concatenated with its input feature map along the channel dimension before being passed on. The feature $f_1 \in \mathbb{R}^{C \times H \times W}$ is input to the image feature extraction module, and the output feature of each unit, $f_n \in \mathbb{R}^{(n+1)C \times H \times W}$, $n = 1, 2, 3, 4$, is obtained as:
Figure BDA0004021750130000071
where
Figure BDA0004021750130000072
denotes the n-th self-adaptive image processing unit, whose working principle is shown in FIG. 2. In the self-adaptive image processing unit, the channel count of the input feature map $f_i' \in \mathbb{R}^{C \times H \times W}$ is first changed to the feature vector length required by the self-attention mechanism by a convolution with a 1×1 kernel and a stride of 1, giving a new feature map $f_1' \in \mathbb{R}^{C' \times H \times W}$. The process can be expressed as:
$$f_1' = \sigma(W_i' * f_i' + B_i')$$
where $\sigma(x) = \max(x, 0) + p \cdot \min(x, 0)$ is the parametric linear rectification function (PReLU) with learnable slope $p$. Because of its efficiency and excellent fitting ability, all activation functions in the invention are parametric linear rectification functions. Next, $f_1'$ is processed by the sliding-window self-attention mechanism shown in FIG. 3: within windows of size n×n moved with stride m, the features are split along the channel dimension into n×n vectors of length $C'$, giving the vector set
Figure BDA0004021750130000081
where $i = 1, 2, \ldots, H/m$ and $j = 1, 2, \ldots, W/m$ are the window indices along the height and width directions, respectively. Then the query weight $W_Q$, key weight $W_K$ and value weight $W_V$ are each multiplied with every vector, separating the feature vectors into a query vector $Q$, a key vector $K$ and a value vector $V$. The process can be expressed as:
$$Q = W_Q w_{i,j}, \quad K = W_K w_{i,j}, \quad V = W_V w_{i,j}$$
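As a rough illustration of the window split described above, the following pure-Python sketch cuts a $C' \times H \times W$ feature map into n×n windows with stride m and collects the n×n channel vectors of each window. The function name `sliding_windows`, the nested-list layout and the zero padding at the border are assumptions made for this example; the text does not state how border windows are handled.

```python
def sliding_windows(feature, n, m):
    """Split a C' x H x W feature map (nested lists) into n x n windows with stride m.

    Returns a dict mapping a window index (i, j) to a list of n*n vectors,
    each of length C'. Positions outside the map are zero-padded, which is
    an assumption of this sketch, not something stated in the source.
    """
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    windows = {}
    for i, top in enumerate(range(0, height, m)):
        for j, left in enumerate(range(0, width, m)):
            vectors = []
            for dy in range(n):
                for dx in range(n):
                    y, x = top + dy, left + dx
                    inside = y < height and x < width
                    vectors.append([feature[c][y][x] if inside else 0.0
                                    for c in range(channels)])
            windows[(i, j)] = vectors
    return windows


# Toy input: 2 channels, 4 x 4 pixels, window size n = 3, stride m = 2.
fmap = [[[float(c * 16 + y * 4 + x) for x in range(4)] for y in range(4)]
        for c in range(2)]
wins = sliding_windows(fmap, n=3, m=2)
print(len(wins), len(wins[(0, 0)]), len(wins[(0, 0)][0]))
```

Because the stride is smaller than the window size, neighbouring windows overlap, which is why the recombination step later has to merge several values per pixel.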
The query vector $Q$ is matrix-multiplied with the transpose $K^T$ of the key vector, that is, inner products are computed within the vector set to obtain the correlation between its different vectors. The correlation matrix is normalized by softmax and then multiplied by the value vector $V$ to obtain the output of the self-attention mechanism
Figure BDA0004021750130000082
Figure BDA0004021750130000083
where $B_P$ is the relative position code, used to reduce the repeated self-attention computation introduced by the sliding window, and $d_k$ is the length of the feature vector. The feature vector then passes through fully connected layer one, is concatenated with the input feature vector, and passes through fully connected layer two to give the output vector
Figure BDA0004021750130000084
which can be expressed as:
Figure BDA0004021750130000085
where $W_1'$ and $W_2'$ are the weight parameters of fully connected layers one and two, and $B_1'$ and $B_2'$ are their respective bias parameters. After the output vectors are obtained, they are recombined into a feature map $f_2' \in \mathbb{R}^{C' \times H \times W}$ in the original order, with each overlapping pixel replaced by the expected value (mean) of that pixel over the windows that contain it. Finally, the feature map $f_2'$ passes through a convolution with a 1×1 kernel and a stride of 1 and is added to the input feature map $f_i'$ to form a local residual connection, giving the output feature $f_o'$. The process can be expressed as:
$$f_o' = \sigma(W_o' * f_2' + B_o') + f_i'$$
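The self-attention step inside one window can be sketched as follows. This assumes the common scaled dot-product form $\mathrm{softmax}(QK^T/\sqrt{d_k} + B_P)V$, which is consistent with the symbols $B_P$ and $d_k$ defined above but is not confirmed by the recovered text. Vectors are stored as rows, so projections are written as `w · W` rather than $W w$; all function names are hypothetical, and the two fully connected layers and the recombination step are left out.

```python
import math


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def window_attention(w, w_q, w_k, w_v, bias):
    """Self-attention over the N vectors of one window.

    w:             N x C' matrix, one feature vector per row
    w_q, w_k, w_v: C' x d_k projection matrices
    bias:          N x N relative position term (assumed to be added
                   to the scaled scores before the softmax)
    """
    q, k, v = matmul(w, w_q), matmul(w, w_k), matmul(w, w_v)
    d_k = len(q[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) + b for s, b in zip(s_row, b_row)]
              for s_row, b_row in zip(scores, bias)]
    attention = [softmax(row) for row in scaled]
    return matmul(attention, v)


# Toy window with N = 2 vectors of length C' = 2 and identity projections.
eye = [[1.0, 0.0], [0.0, 1.0]]
vectors = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [[0.0, 0.0], [0.0, 0.0]]
out = window_attention(vectors, eye, eye, eye, zero_bias)
print(out)
```

Since the softmax is taken only over the n×n vectors of a window, the cost per window is independent of the image size, which is the source of the reduced computation claimed for the sliding-window design.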
After the image feature extraction module produces the output feature $f_4 \in \mathbb{R}^{5C \times H \times W}$, a channel compression layer, a convolution with a 1×1 kernel and a stride of 1, compresses the channel count to match the initial feature $f_1 \in \mathbb{R}^{C \times H \times W}$. After the result is added to the initial feature, a convolution layer with a 1×1 kernel and a stride of 1 further reduces the channels to the square of the scale factor, and pixel recombination (pixel shuffle), shown in FIG. 5, outputs the final super-resolution reconstructed image $I_{SR} \in \mathbb{R}^{1 \times sH \times sW}$, where $s$ is the super-resolution factor. This operation can be expressed as:
$$I_{SR} = G_{pixelshuffle}\left(W_{c2} * \left(\sigma(W_{c1} * f_4) + f_1\right)\right)$$
where $W_{c1}$ and $W_{c2}$ are the weight parameters of the channel compression layer and the convolution layer, and $G_{pixelshuffle}(\cdot)$ denotes the pixel recombination operation;
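The pixel recombination at the end of the network can be illustrated with a small pure-Python sketch. It rearranges an $s^2 \times H \times W$ feature map into one $sH \times sW$ image; the mapping of channel `c` to the sub-pixel offset `divmod(c, s)` is an assumed ordering chosen for this example, and the function name is hypothetical.

```python
def pixel_shuffle(feature, s):
    """Rearrange an (s*s) x H x W feature map into a single sH x sW image.

    Channel c supplies the sub-pixel at offset (c // s, c % s) inside each
    s x s output block (assumed ordering).
    """
    channels = len(feature)
    if channels != s * s:
        raise ValueError("expected exactly s*s input channels")
    height, width = len(feature[0]), len(feature[0][0])
    out = [[0.0] * (width * s) for _ in range(height * s)]
    for c in range(channels):
        dy, dx = divmod(c, s)
        for y in range(height):
            for x in range(width):
                out[y * s + dy][x * s + dx] = feature[c][y][x]
    return out


# Four 1 x 1 channels become one 2 x 2 image for s = 2.
image = pixel_shuffle([[[1.0]], [[2.0]], [[3.0]], [[4.0]]], s=2)
print(image)
```

The operation moves values without arithmetic, so all learnable upsampling capacity sits in the preceding 1×1 convolution that produces the $s^2$ channels.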
Step 2, preparing a data set;
The data set in step 2 is the FLIR ADAS dataset, which comprises 8862 thermal infrared images with a resolution of 512×640. The images are first cropped into 256×256 image blocks, giving 37976 image blocks in total; low-resolution images are then obtained by bicubic downsampling and combined with the originals into high-resolution/low-resolution image pairs. To expand the amount of data, the images are subjected to horizontal flipping, vertical flipping, rotation, translation, and scaling and cropping transformations;
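The pairing and augmentation described above can be sketched as follows. To keep the example free of dependencies, block averaging stands in for the bicubic downsampling named in the text, and only three of the listed transformations are shown; both choices, and all function names, are assumptions of this sketch.

```python
def hflip(img):
    """Mirror an image (nested lists) left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror an image top to bottom."""
    return [row[:] for row in img[::-1]]


def rot90(img):
    """Rotate an image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def box_downsample(img, s):
    """Average s x s blocks. This is a simple stand-in for bicubic
    downsampling, used here only to keep the sketch dependency-free."""
    h, w = len(img) // s, len(img[0]) // s
    return [[sum(img[y * s + dy][x * s + dx]
                 for dy in range(s) for dx in range(s)) / (s * s)
             for x in range(w)]
            for y in range(h)]


def make_pairs(hr, s):
    """Return (low-resolution, high-resolution) pairs for the original
    block and its flipped and rotated variants."""
    variants = [hr, hflip(hr), vflip(hr), rot90(hr)]
    return [(box_downsample(v, s), v) for v in variants]


hr_block = [[float(4 * y + x) for x in range(4)] for y in range(4)]
pairs = make_pairs(hr_block, s=2)
print(len(pairs), pairs[0][0])
```

Each transformation is applied to the high-resolution block before downsampling, so every low-resolution image stays aligned with its label.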
Step 3, training the network model;
The training scheme in step 3 is as follows. The number of training epochs is set to 100. The batch size, that is, the number of images input to the network at a time, is about 16 to 32; its upper limit depends mainly on the performance of the graphics processor, and a value in the 16 to 32 range generally makes training more stable and gives better results. The learning rate is set to 0.001, which keeps training fast while avoiding gradient explosion. When training reaches 100, 150 and 175 epochs, the learning rate is reduced to 0.1 times its current value so that the parameters approach their optimum more closely. The adaptive moment estimation (Adam) algorithm is chosen as the optimizer; its advantage is that, after bias correction, the learning rate of each iteration stays within a definite range, which keeps the parameters stable. The threshold for the loss value is set to 0.01; training of the whole network can be considered essentially complete once the loss falls below this threshold;
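The learning-rate decay and stopping rule above amount to a step schedule, sketched below. The milestone epochs, decay factor, base rate and loss threshold are copied from the text; the function names and the choice to count a milestone as reached when `epoch >= milestone` are assumptions of this example.

```python
def learning_rate(epoch, base_lr=1e-3, milestones=(100, 150, 175), gamma=0.1):
    """Step decay: multiply the rate by gamma at every milestone reached."""
    drops = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * gamma ** drops


def training_done(epoch, loss, max_epochs=100, loss_threshold=0.01):
    """Stop when the epoch budget is used up or the loss is below the threshold."""
    return epoch >= max_epochs or loss < loss_threshold


for epoch in (0, 100, 150, 175):
    print(epoch, learning_rate(epoch))
```

In a framework-based implementation the same behaviour would normally come from a built-in multi-step scheduler attached to the Adam optimizer rather than from a hand-written function.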
Step 4, minimizing the loss function and selecting the optimal evaluation index;
In step 4, the loss value is computed between the network output and the label, and a better super-resolution reconstruction is achieved by minimizing the loss function. The loss function combines structural similarity and pixel loss, and which of them is used is adjusted according to the current training state of the model. The structural similarity is computed as:
$$SSIM(x, y) = [l(x, y)]^{\alpha} \cdot [c(x, y)]^{\beta} \cdot [s(x, y)]^{\gamma}$$
where $l(x, y)$ is the brightness comparison function, $c(x, y)$ is the contrast comparison function and $s(x, y)$ is the structure comparison function. The three functions are defined as follows:
$$l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}$$
In practice, $\alpha$, $\beta$ and $\gamma$ are all set to 1 and $C_3 = 0.5 C_2$, so the structural similarity can be written as:
$$SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
Here $x$ and $y$ are the pixels of an N×N window in each of the two images; $\mu_x$ and $\mu_y$ are the means of $x$ and $y$ and serve as the brightness estimate; $\sigma_x$ and $\sigma_y$ are the standard deviations of $x$ and $y$ and serve as the contrast estimate; $\sigma_{xy}$ is the covariance of $x$ and $y$ and serves as the structural similarity measure; $C_1$ and $C_2$ are small constants that prevent the denominator from being 0, usually taken as 0.01 and 0.03 respectively. By definition, the structural similarity of the whole image is computed as:
$$MSSIM(X, Y) = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} SSIM(x_{ij}, y_{ij})$$
where $X$ and $Y$ are the two images to be compared, $MN$ is the total number of windows, and $x_{ij}$ and $y_{ij}$ are the corresponding local windows in the two images. The structural similarity is symmetric and takes values in [0, 1]; the closer the value is to 1, the greater the structural similarity and the smaller the difference between the two images. In general, network optimization directly reduces the gap between this value and 1, so the structural similarity loss is:
$$SSIM_{loss} = 1 - MSSIM(I_{ir}, I_{SR})$$
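A minimal sketch of the simplified SSIM formula and the resulting loss, operating on windows given as flat lists of intensities in [0, 1]. The default constants follow the widely used convention $C_1 = (0.01L)^2$, $C_2 = (0.03L)^2$ with dynamic range $L = 1$, which is an assumption here (the text above lists 0.01 and 0.03); the function names and the use of non-overlapping, pre-extracted windows are also assumptions.

```python
def ssim_window(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM of two equally sized windows given as flat lists of values in [0, 1]."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def mean_ssim(windows_x, windows_y):
    """Average SSIM over corresponding windows of two images."""
    scores = [ssim_window(x, y) for x, y in zip(windows_x, windows_y)]
    return sum(scores) / len(scores)


def ssim_loss(windows_x, windows_y):
    """Structural similarity loss: 1 - MSSIM."""
    return 1.0 - mean_ssim(windows_x, windows_y)


reference = [0.1, 0.5, 0.9, 0.3]
estimate = [0.2, 0.4, 0.8, 0.35]
print(ssim_window(reference, estimate), ssim_loss([reference], [estimate]))
```

Identical windows give an SSIM of 1 and therefore a loss of 0, so minimizing the loss pulls the reconstruction toward the label in mean, contrast and structure at once.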
Optimizing the structural similarity loss gradually reduces the structural difference between the output image and the input image, so that the images become closer in brightness and contrast, look more alike to the eye, and the generated image is of higher quality;
The pixel loss function is defined as follows:
Figure BDA0004021750130000111
At the start of training, or when severe fluctuation occurs, the pixel loss optimizes the network parameters stably and keeps training moving in the correct direction. However, pixel loss is dominated by differences in the low-frequency part, where energy is concentrated, even when those differences are small; the structural similarity loss, which focuses on differences in image structure, is therefore better suited to fine adjustment of the network. Based on this, the total loss function is defined as:
Figure BDA0004021750130000112
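The switching behaviour of the total loss can be sketched as below. Several details are assumptions, since the recovered text does not give them: the pixel loss is taken to be the mean absolute error, the "deviation value" is taken to be that same pixel loss, the threshold of 0.1 is hypothetical, and the structural loss treats the whole image as a single SSIM window.

```python
def pixel_loss(pred, target):
    """Mean absolute error over flat pixel lists (assumed form of the pixel loss)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def structural_loss(pred, target, c1=1e-4, c2=9e-4):
    """1 - SSIM with the whole image treated as one window (simplification)."""
    n = len(pred)
    mu_p, mu_t = sum(pred) / n, sum(target) / n
    var_p = sum((p - mu_p) ** 2 for p in pred) / n
    var_t = sum((t - mu_t) ** 2 for t in target) / n
    cov = sum((p - mu_p) * (t - mu_t) for p, t in zip(pred, target)) / n
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    return 1.0 - ssim


def adaptive_loss(pred, target, threshold=0.1):
    """Use pixel loss while the deviation is high, structural loss afterwards.

    The deviation measure and the threshold value are hypothetical.
    Returns the name of the active term together with its value.
    """
    deviation = pixel_loss(pred, target)
    if deviation > threshold:
        return "pixel", deviation
    return "structural", structural_loss(pred, target)


label = [0.2, 0.4, 0.6, 0.8]
print(adaptive_loss([0.9, 0.9, 0.9, 0.9], label))     # far from the label
print(adaptive_loss([0.21, 0.41, 0.59, 0.8], label))  # close to the label
```

Returning the name of the active term makes it easy to log when training moves from the coarse phase to the texture-refinement phase.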
In step 4, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are selected as the evaluation indexes. The peak signal-to-noise ratio is based on the error between corresponding pixels, that is, it is an error-sensitive image quality measure. The structural similarity measures the similarity of two digital images in terms of brightness, contrast and structure. The structural similarity is defined as in the loss function above, and the peak signal-to-noise ratio is defined as follows:
Figure BDA0004021750130000113
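The PSNR formula is present only as the figure reference above. The sketch below uses the standard definition, 10·log10(MAX² / MSE), on the assumption that the patent follows it; the function name and the 8-bit default `max_value=255.0` are illustrative choices.

```python
import numpy as np

def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB, using the standard MSE-based definition."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((reference - reconstructed) ** 2)
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Because the measure depends only on the per-pixel error, a uniform error of one gray level on an 8-bit image gives 20·log10(255) dB regardless of image content.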
step 5, fine tuning the model;
in the step 5, infrared image data from the MFNet and TNO data sets are used, comprising about 2000 infrared images with a resolution of 640×480; the same image preprocessing as in step 2 is applied to the images to obtain the model fine-tuning data set; the model weight parameters obtained in step 4 are loaded, the learning rate is adjusted to 0.000001, the image pairs of the fine-tuning data set are input into the model, and training continues for 10 epochs;
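The construction of low-resolution/high-resolution image pairs for fine-tuning can be sketched as below. The learning rate and epoch count come from the text; the scale factor of 4, the box-filter downsampling, the flip augmentations, the 8-bit input range, and all function names are assumptions, since the exact preprocessing of step 2 is not reproduced in this passage.

```python
import numpy as np

FINE_TUNE_LEARNING_RATE = 1e-6  # value stated in the text
FINE_TUNE_EPOCHS = 10           # value stated in the text

def box_downsample(img, scale):
    """Downsample by averaging scale x scale blocks (assumed degradation model)."""
    h, w = img.shape
    return img.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

def make_pairs(hr_images, scale=4):
    """Build (low-resolution, high-resolution) pairs with simple flip augmentation."""
    pairs = []
    for hr in hr_images:
        hr = np.asarray(hr, dtype=np.float32) / 255.0  # assumes 8-bit input
        h, w = hr.shape
        hr = hr[: h - h % scale, : w - w % scale]       # crop to a multiple of scale
        for aug in (hr, np.fliplr(hr), np.flipud(hr)):
            pairs.append((box_downsample(aug, scale), aug))
    return pairs
```

With a 640×480 image and `scale=4`, each pair holds a 160×120 input and the 640×480 target, stored as arrays of shape (120, 160) and (480, 640).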
step 6, saving the model and parameters;
after network training is completed in the step 6, all parameters in the network must be saved; an image of any size can then be input to obtain the super-resolution reconstruction result;
the implementations of convolution, concatenation, upsampling, downsampling and other operations are algorithms well known to those skilled in the art, and the specific procedures and methods can be found in the corresponding textbooks or technical literature.
The lightweight infrared super-resolution self-adaptive reconstruction method achieves a higher-quality super-resolution reconstruction; owing to its lightweight structure, it has fewer parameters than existing complex networks and can be applied to various mobile devices; the feasibility and superiority of the method are further verified by computing the relevant indices for images obtained with existing methods; the comparison of these indices between the prior art and the method proposed by the present invention is shown in fig. 6;
based on the same inventive concept as the above super-resolution image reconstruction method, the embodiment of the present application further provides an electronic device, which may specifically be a desktop computer, a portable computer, an edge computing device, a tablet computer, a smart phone or another device capable of signal transmission, floating-point operation and storage; as shown in fig. 7, the electronic device may mainly consist of a processor, a memory and a communication interface;
the processor may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor (DSP), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application; the general-purpose processor may be a microprocessor or any conventional processor; the steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in a processor;
the memory, as a nonvolatile computer-readable storage medium, is used for storing nonvolatile software programs, nonvolatile computer-executable programs and modules; the memory may include at least one type of storage medium, for example random access memory (RAM), static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, optical disk, and the like; the memory may also be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer; the memory in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, and is configured to store program instructions or data;
the communication interface may be used for data transmission between the computing device and other computing devices, terminals or imaging devices, and may employ a general-purpose protocol such as Universal Serial Bus (USB), universal synchronous/asynchronous receiver/transmitter (USART), Controller Area Network (CAN), etc.; the communication interface may be, but is not limited to, an interface for transferring data between different devices together with its communication protocol; the communication interface in the embodiment of the present application may also use optical communication or any other manner or protocol capable of implementing information transmission;
the invention also provides a computer-readable storage medium for lightweight infrared super-resolution self-adaptive reconstruction, which may be the computer-readable storage medium contained in the device of the above embodiment, or may be a standalone computer-readable storage medium that is not assembled into a device; the computer-readable storage medium stores one or more programs that are used by one or more processors to perform the methods described herein;
it should be noted that while the electronic device shown in fig. 7 shows only a memory, a processor, and a communication interface, in a particular implementation, those skilled in the art will appreciate that the apparatus also includes other devices necessary to achieve proper operation; meanwhile, as will be appreciated by those skilled in the art, the apparatus may further include components for implementing other additional functions according to specific needs; furthermore, it will be appreciated by those skilled in the art that the apparatus may also include only the devices necessary to implement the embodiments of the present invention, and not necessarily all of the devices shown in FIG. 7.
Finally, it should be noted that the foregoing description is only a preferred embodiment of the present invention and does not limit the present invention; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A lightweight infrared super-resolution self-adaptive reconstruction method is characterized in that: the method comprises the following steps:
step 1, constructing a network model: the infrared image super-resolution reconstruction model comprises an input initialization layer, an image feature extraction module and an output image reconstruction module;
step 2, preparing a data set: preparing an infrared image data set, and performing analog downsampling and data augmentation on the infrared image data set so as to perform subsequent network training;
step 3, training a network model: training an infrared image super-resolution reconstruction model, and inputting the data set prepared in the step 2 into the network model constructed in the step 1 for training;
step 4, minimizing the loss function and selecting an optimal evaluation index: the loss function between the network output image and the label is minimized; when the number of training iterations reaches a set threshold or the value of the loss function falls within a set range, pre-training of the model parameters is considered complete and the model parameters are saved; at the same time, an optimal evaluation index is selected to measure the accuracy of the algorithm and evaluate the performance of the system;
step 5, fine tuning the model: preparing a plurality of additional infrared image data sets, training and fine-tuning the model to obtain better model parameters, and further improving the generalization capability of the model; finally, the model maintains good reconstruction quality when coping with infrared imagers of various models;
step 6, saving the model: and solidifying the finally determined model parameters, and directly inputting the image into a network to obtain a final reconstructed image when the infrared image super-resolution reconstruction operation is needed.
2. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, wherein: the input initialization layer of the infrared image super-resolution reconstruction model in the step 1 is a single convolution layer that maps the input image into a feature space for further refinement and processing of subsequent features; the image feature extraction module consists of four self-adaptive image feature processing units; specifically, each self-adaptive image feature processing unit consists of a first convolution layer, a self-attention layer and a second convolution layer, wherein the self-attention layer consists of linear feature disassembly, a self-attention mechanism, a relative position coding layer, a first fully connected layer, a second fully connected layer and feature recombination; the output image reconstruction module consists of a channel compression layer, a global skip connection and a pixel recombination layer.
3. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, wherein: the self-attention mechanism in step 1.
4. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, wherein: in the step 1, the Transformer module is composed of two normalization layers, an efficient global-local multi-head self-attention (EGLMSA) layer, a multi-layer perceptron (MLP) and two summation operations; the efficient global-local multi-head self-attention layer extracts the global context and the local context separately: the global context is critical for semantic segmentation of complex urban scenes, while local information is critical for preserving rich spatial details, so the proposed efficient global-local attention constructs two parallel branches; the local branch is a relatively shallow structure that uses two parallel convolution layers to extract the local context, and two batch normalization operations are then applied before the final summation; the global branch first applies a depthwise convolution to reduce the image resolution, thereby reducing computation and memory; the result is then passed through layer normalization and three linear projections to obtain three vectors Q, K and V, that is, Q, K and V are obtained by linear transformations of the input word vector X, each transformation matrix W is learned, and this transformation improves the fitting capacity of the model; Q can be understood as the information to be queried, K as the vector being queried, and V as the value obtained by the query; a matrix multiplication is performed on the Q and K vectors; after a convolution layer, a Softmax activation function and an instance normalization operation, the resulting attention is matrix-multiplied with the V vector; finally, the global context from the global branch and the local context from the local branch are aggregated to generate the global-local context, and a depthwise convolution, a batch normalization operation and a standard convolution are used to represent the global-local context at fine granularity.
5. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, wherein: the MFNet data set is used as the semantic segmentation data set in the step 2; the pictures of the training set and the validation set are cut into a plurality of block pictures, the resolution and the dimension of each block picture being the initial resolution and the initial dimension; and the classes of the cut pictures are labeled for semantic segmentation.
6. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, wherein: in the step 3, the MFNet data set is used as the semantic segmentation data set in the pre-training process; visible-light color images and infrared images are obtained by separating the four image channels of the data set; images with complex scenes, rich details and complete categories are selected as training samples, and the remaining images are used as test samples; the visible-light images and the infrared images are respectively input to the network for training.
7. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, wherein: in the step 4, the Dice loss function is selected as the loss function in the training process; the choice of loss function affects the quality of the model; the selected loss function truly reflects the difference between the predicted value and the true value and correctly feeds back the quality of the model.
8. The lightweight infrared super-resolution adaptive reconstruction method according to claim 1, wherein: in step 5, SODA is used in fine tuning the model parameters.
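The Q, K, V computation recited in claim 4 can be illustrated with a plain scaled dot-product self-attention in NumPy. This is a simplified stand-in, not the claimed EGLMSA layer: it omits the depthwise convolution, the convolution layer and the instance normalization, and the 1/sqrt(d) scaling and all names (`self_attention`, `w_q`, `w_k`, `w_v`) are assumptions drawn from common practice rather than from the claim.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence of feature vectors.

    x:   (n_tokens, d_in) input vectors X
    w_*: (d_in, d_head) projection matrices, learned in a real model
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # linear transformations of X
    scores = q @ k.T / np.sqrt(k.shape[-1])      # Q times K, with assumed scaling
    weights = softmax(scores, axis=-1)           # attention weights per query
    return weights @ v, weights                  # attention applied to V
```

Each row of `weights` sums to 1, so every output vector is a weighted average of the rows of V.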
CN202211692350.XA 2022-12-28 Lightweight infrared super-resolution self-adaptive reconstruction method Active CN116402679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211692350.XA CN116402679B (en) 2022-12-28 Lightweight infrared super-resolution self-adaptive reconstruction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211692350.XA CN116402679B (en) 2022-12-28 Lightweight infrared super-resolution self-adaptive reconstruction method

Publications (2)

Publication Number Publication Date
CN116402679A true CN116402679A (en) 2023-07-07
CN116402679B CN116402679B (en) 2024-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant