CN114581353A - Infrared image processing method and device, medium and electronic equipment - Google Patents

Infrared image processing method and device, medium and electronic equipment

Info

Publication number
CN114581353A
Authority
CN
China
Prior art keywords
image
infrared image
visible light
neural network
convolutional neural
Prior art date
Legal status
Pending
Application number
CN202210238624.1A
Other languages
Chinese (zh)
Inventor
丁顺意
赵纪民
Current Assignee
Shanghai Thermal Image Science And Technology Co ltd
Original Assignee
Feichuke Intelligent Technology Shanghai Co ltd
Priority date
Filing date
Publication date
Application filed by Feichuke Intelligent Technology Shanghai Co ltd
Priority to CN202210238624.1A
Publication of CN114581353A

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00 Computing arrangements based on biological models
                    • G06N3/02 Neural networks
                        • G06N3/04 Architecture, e.g. interconnection topology
                            • G06N3/045 Combinations of networks
                            • G06N3/048 Activation functions
                        • G06N3/08 Learning methods
                            • G06N3/084 Backpropagation, e.g. using gradient descent
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T5/00 Image enhancement or restoration
                    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
                • G06T7/00 Image analysis
                    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
                • G06T2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T2207/10 Image acquisition modality
                        • G06T2207/10048 Infrared image
                    • G06T2207/20 Special algorithmic details
                        • G06T2207/20081 Training; Learning
                        • G06T2207/20084 Artificial neural networks [ANN]
                        • G06T2207/20212 Image combination
                            • G06T2207/20221 Image fusion; Image merging

Abstract

The embodiment of the application discloses an infrared image processing method, apparatus, medium and electronic device. The method comprises the following steps: acquiring an infrared image and a visible light image to be processed, wherein the infrared image and the visible light image are captured in the same scene; inputting the infrared image and the visible light image to be processed, as a registration pair, into a pre-trained convolutional neural network model; determining fusion parameters of the infrared image and the visible light image to be processed according to the output result of the convolutional neural network model; and fusing the infrared image and the visible light image to be processed according to the fusion parameters. This technical scheme can improve the signal-to-noise ratio and contrast of the infrared image and enhance its visual display effect.

Description

Infrared image processing method and device, medium and electronic equipment
Technical Field
The embodiments of the application relate to the field of Internet technology, and in particular to an infrared image processing method, apparatus, medium and electronic device.
Background
With the rapid development of science and technology, infrared image processing methods and techniques have gradually matured and are increasingly widely applied in military and daily life. Infrared image processing is based on a thermal infrared imager receiving the thermal radiation emitted by objects above absolute zero to form an infrared image, which is then further processed, analyzed and fused.
However, because of heat exchange between a target and its surroundings, and the scattering and absorption of thermal radiation by the air, the target signal is strongly disturbed by the background, resulting in poor contrast between target and background in the infrared image and blurred target edges. How to enhance the display effect of infrared images has therefore become a technical problem urgently requiring a solution from those skilled in the art.
Disclosure of Invention
The embodiments of the application provide an infrared image processing method, apparatus, medium and electronic device; by establishing a convolutional neural network model, the infrared image can be further analyzed and fused, achieving the purpose of enhancing the infrared image display effect.
In a first aspect, an embodiment of the present application provides a method for processing an infrared image, where the method includes:
acquiring an infrared image and a visible light image to be processed; wherein the infrared image and the visible light image are taken in the same scene;
inputting the infrared image and the visible light image to be processed, as a registration pair, into a pre-trained convolutional neural network model, wherein the convolutional neural network model comprises a backbone network, a transmission network and an output network; the input of the backbone network comprises three visible light image channels and one infrared image channel, the first convolutional layer of the backbone network adopts grouped convolution, and the convolutional layers after the first layer perform downsampling at least twice to obtain features at at least two levels; the transmission network is used for performing feature fusion on the obtained features of the at least two levels; the output network is used for obtaining an output result;
determining fusion parameters of the infrared image and the visible light image to be processed according to the output result of the convolutional neural network model;
and performing fusion processing on the infrared image and the visible light image to be processed according to the fusion parameters.
In a second aspect, an embodiment of the present application provides an apparatus for processing an infrared image, where the apparatus includes:
the image acquisition module is used for acquiring an infrared image and a visible light image to be processed; wherein the infrared image and the visible light image are taken in the same scene;
the image preprocessing module is used for inputting the infrared image and the visible light image to be processed, as a registration pair, into a pre-trained convolutional neural network model, wherein the convolutional neural network model comprises a backbone network, a transmission network and an output network; the input of the backbone network comprises three visible light image channels and one infrared image channel, the first convolutional layer of the backbone network adopts grouped convolution, and the convolutional layers after the first layer perform downsampling at least twice to obtain features at at least two levels; the transmission network is used for performing feature fusion on the obtained features of the at least two levels; the output network is used for obtaining an output result;
the image fusion parameter determining module is used for determining fusion parameters of the infrared image and the visible light image to be processed according to the output result of the convolutional neural network model;
and the image fusion processing module is used for carrying out fusion processing on the infrared image and the visible light image to be processed according to the fusion parameters.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements a method for processing an infrared image according to an embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the method for processing an infrared image according to the embodiment of the present application.
According to the technical scheme provided by the embodiment of the invention, an infrared image and a visible light image to be processed are acquired, the two images being captured in the same scene; the infrared image and the visible light image to be processed are input, as a registration pair, into a pre-trained convolutional neural network model comprising a backbone network, a transmission network and an output network, where the input of the backbone network comprises three visible light image channels and one infrared image channel, the first convolutional layer of the backbone network adopts grouped convolution, the convolutional layers after it perform downsampling at least twice to obtain features at at least two levels, the transmission network performs feature fusion on the obtained features, and the output network produces the output result; fusion parameters of the infrared image and the visible light image are determined according to the output result; and the infrared image and the visible light image are fused according to the fusion parameters. By establishing a deep-learning convolutional neural network model and applying grouped convolution to the infrared image and the visible light image, information crossover that would slow the convergence of the convolutional neural network is avoided, the infrared image and the visible light image can be fused, and the purpose of enhancing the infrared image display effect is achieved.
Drawings
Fig. 1 is a flowchart of a method for processing an infrared image according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a convolutional neural network model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a convolutional neural network model according to an embodiment of the present invention;
FIG. 4 is a flowchart of a convolutional neural network model training method according to a second embodiment of the present invention;
FIG. 5 is a flowchart of a method for determining a loss function in convolutional neural network model training according to a second embodiment of the present invention;
fig. 6 is a block diagram of an infrared image processing apparatus according to a third embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in greater detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently, or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example one
Fig. 1 is a flowchart of an infrared image processing method provided in an embodiment of the present application, where the present embodiment is applicable to a case of fusing a visible light image and an infrared image, and the method may be executed by an infrared image processing apparatus provided in an embodiment of the present application, where the apparatus may be implemented by software and/or hardware, and may be integrated in an electronic device. As shown in fig. 1, the method for processing an infrared image includes:
and S110, acquiring an infrared image and a visible light image to be processed.
An infrared image is an image formed after a thermal infrared imager receives electromagnetic waves with wavelengths of 0.78–1000 μm; it is not affected by illumination or weather, but its target edges are blurred and the contrast between target and background is low. A visible light image is an image formed by an ordinary camera receiving electromagnetic waves with wavelengths of 0.38–0.78 μm; it is affected by illumination and weather, but its texture and detail information are richer than those of an infrared image and its spatial resolution is higher. It can be understood that there are many ways to acquire the infrared image and visible light image to be processed. For example, they may be acquired in real time by an image acquisition device such as a thermal imager, video camera, mobile phone or monitoring camera; or they may be retrieved directly from an image database, where the images in the database may themselves have been captured by an image acquisition device, and so on. In addition, the number of infrared images and visible light images to be processed is not limited.
Specifically, the infrared image and the visible light image to be processed are captured in the same scene, although the shooting positions and focal lengths used to obtain them may differ. The advantage of this arrangement is that, owing to the complementary characteristics of infrared and visible light imaging, fusing a visible light image captured in the same scene into the infrared image yields an infrared image with a better display effect.
And S120, inputting the infrared image to be processed and the visible light image serving as a registration pair into a pre-trained convolutional neural network model.
The convolutional neural network model comprises a backbone network, a transmission network and an output network. The input of the backbone network comprises three visible light image channels and one infrared image channel; the first convolutional layer of the backbone network adopts grouped convolution, and the convolutional layers after it perform downsampling at least twice to obtain features at at least two levels. The transmission network is used for performing feature fusion on the obtained features of the at least two levels. The output network is used for obtaining the output result.
To keep the images temporally and spatially consistent, the infrared image and the visible light image to be processed need to form a registration pair. Registration takes the infrared image and the visible light image to be processed as a pair, determines transformation parameters between the images according to a similarity measure, and transforms the infrared and visible light images of the same scene, acquired by different sensors from different viewing angles at different times, into the same coordinate system to obtain the best match at the pixel level. The method used to register the infrared image and the visible light image is not limited; it may be key point detection, feature description, feature matching, image warping, or the like.
It should be noted that the infrared image and visible light image forming the registration pair are the images to be detected and need to be input into the pre-trained convolutional neural network model. Before they are input, the acquired infrared image and visible light image are preprocessed, where the preprocessing includes one or more of the following: normalization, denoising, or detail enhancement.
For example, the image size may be set to a fixed value, e.g. 224 × 224, and all acquired images normalized to the same size. The convolutional neural network model then extracts the image features of the infrared image and the visible light image to be processed and compares them with the image features learned from the pre-training infrared and visible light images.
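As an illustration of this preprocessing step, the following is a minimal PyTorch sketch; the library choice and the min-max normalization scheme are assumptions, and only the fixed 224 × 224 size comes from the example above.

```python
import torch
import torch.nn.functional as F

def preprocess_pair(ir: torch.Tensor, vis: torch.Tensor, size=(224, 224)):
    """Resize a registered IR/visible pair to a fixed size and normalize.

    ir:  (1, H, W) single-channel infrared image
    vis: (3, H, W) RGB visible light image
    """
    ir = F.interpolate(ir.unsqueeze(0), size=size, mode="bilinear",
                       align_corners=False)
    vis = F.interpolate(vis.unsqueeze(0), size=size, mode="bilinear",
                        align_corners=False)
    # Min-max normalization to [0, 1]; the text does not fix the scheme.
    ir = (ir - ir.min()) / (ir.max() - ir.min() + 1e-8)
    vis = (vis - vis.min()) / (vis.max() - vis.min() + 1e-8)
    return ir.squeeze(0), vis.squeeze(0)

ir, vis = preprocess_pair(torch.rand(1, 480, 640), torch.rand(3, 480, 640))
```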
The pre-trained convolutional neural network model is obtained by training an initial convolutional neural network model on infrared and visible light images input in advance for training, and is used to extract image features, where an image feature may be a feature that clearly distinguishes the position or outline of an object in the image, for example a light pole at a certain position. In order to receive the registered pair of infrared and visible light images to be processed, the pre-trained convolutional neural network model may be a two-dimensional convolutional neural network whose input layer accepts two- or three-dimensional arrays.
Fig. 2 is a schematic diagram of a convolutional neural network model according to an embodiment of the present invention. As shown in FIG. 2, the convolutional neural network model includes a backbone network (MobileNetV3-small), a transmission network (Neck), and an output network (Head).
The backbone network input comprises three visible light image channels and one infrared image channel. The three visible light image channels carry the pixel values of the R, G and B channels of the visible light image, and the infrared image channel carries the pixel values of the infrared image. In this scheme, after the visible light image and the infrared image are input through the three visible light channels and the one infrared channel respectively, grouped convolution is applied. The advantage of grouped convolution is that there is no information crossover during transmission and computation is saved. After the grouped convolution, downsampling is performed at least twice to obtain features at at least two levels.
The transmission network is used for carrying out feature fusion on the obtained features of at least two levels.
The output network is used for obtaining an output result.
The backbone network comprises an input layer and a grouped convolution layer GConv. The input layer receives at least one group of infrared and visible light images to be processed, and the grouped convolution layer GConv extracts features from the visible light image and the infrared image separately through multiple convolution stages, avoiding the problem that low-dimensional information fusion slows the convergence of the convolutional neural network. Illustratively, the first convolutional layer may use grouped convolution with 4 groups. It can be understood that the backbone network adopts the lightweight network MobileNetV3-small; a ReLU activation function is used in the grouped convolution layer GConv, and the added nonlinearity improves the expressive capability of the neural network model.
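A minimal PyTorch sketch of this 4-channel grouped input convolution follows; the 4 input channels and 4 groups come from the description above, while the output width and kernel size are illustrative assumptions.

```python
import torch
import torch.nn as nn

# groups=4 with 4 input channels convolves the R, G, B and IR channels in
# separate groups, so visible and infrared information does not cross in the
# first layer; out_channels=16 and kernel_size=3 are placeholder choices.
first_conv = nn.Sequential(
    nn.Conv2d(in_channels=4, out_channels=16, kernel_size=3,
              stride=2, padding=1, groups=4, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),  # ReLU in GConv, as described above
)

vis = torch.randn(1, 3, 224, 224)          # R, G, B pixel values
ir = torch.randn(1, 1, 224, 224)           # infrared pixel values
features = first_conv(torch.cat([vis, ir], dim=1))   # (1, 16, 112, 112)
```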
In addition, the transmission network Neck performs convolution, upsampling or downsampling multiple times on the single-scale feature map FM extracted by the backbone network, performs feature fusion, and decodes the fused features through a Decoder to obtain a decoding result, where the Decoder may be a ReplayingDecoder. The output network Head obtains the output result from the decoding result of the Decoder.
Fig. 3 is a schematic diagram of a convolutional neural network model according to an embodiment of the present invention. As shown in fig. 3, each convolution stage in the grouped convolution layer GConv is formed by stacking N IBneck (inverted bottleneck) blocks with stride 1, where each IBneck sums a first input layer and a first output layer of the block. The first input layer is a point-wise convolution layer PWConv applied to the infrared and visible light image features to be processed. The first output layer PWConv-linear multiplies the depthwise convolution layer DWConv output by a gate obtained by pooling the features, after they pass through the point-wise convolution layer PWConv and the depthwise convolution layer DWConv, and applying ReLU and sigmoid activation functions. The point-wise convolution layer PWConv is a two-dimensional 1 × 1 convolution with batch normalization and a ReLU activation, and serves to raise the dimensionality; the depthwise convolution layer DWConv is a two-dimensional k × k convolution with batch normalization and a Hard-Swish (HS) activation; the first output layer PWConv-linear is a two-dimensional 1 × 1 convolution with batch normalization. It can be understood that the first output layer PWConv-linear uses no nonlinear activation function, because applying a nonlinearity when a high-dimensional convolution projects to a low dimension can cause information loss or corruption.
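The following PyTorch sketch shows one way to realize the IBneck block just described, assuming a MobileNetV3-style layout; the channel widths and the squeeze ratio in the gate are assumptions, since the text fixes only the layer order and activations.

```python
import torch
import torch.nn as nn

class IBneck(nn.Module):
    """Sketch of an IBneck (inverted bottleneck) block as described above."""
    def __init__(self, c_in, c_exp, c_out, k=3, stride=1):
        super().__init__()
        self.use_res = stride == 1 and c_in == c_out
        # PWConv: 1x1 conv + BN + ReLU, raising the dimensionality.
        self.pw = nn.Sequential(
            nn.Conv2d(c_in, c_exp, 1, bias=False),
            nn.BatchNorm2d(c_exp),
            nn.ReLU(inplace=True))
        # DWConv: k x k depthwise conv + BN + Hard-Swish (HS).
        self.dw = nn.Sequential(
            nn.Conv2d(c_exp, c_exp, k, stride, k // 2, groups=c_exp, bias=False),
            nn.BatchNorm2d(c_exp),
            nn.Hardswish(inplace=True))
        # Gate: pooled features through ReLU then sigmoid, multiplied back in.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_exp, c_exp // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(c_exp // 4, c_exp, 1),
            nn.Sigmoid())
        # PWConv-linear: 1x1 conv + BN, deliberately without a nonlinearity.
        self.pw_linear = nn.Sequential(
            nn.Conv2d(c_exp, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out))

    def forward(self, x):
        y = self.dw(self.pw(x))
        y = y * self.gate(y)                  # product of DWConv output and gate
        y = self.pw_linear(y)
        return x + y if self.use_res else y   # sum of block input and output

blk = IBneck(c_in=16, c_exp=64, c_out=16)     # stride 1, so the residual sum applies
out = blk(torch.randn(1, 16, 56, 56))         # -> (1, 16, 56, 56)
```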
In addition, the Decoder is formed by stacking M dilated ("hole") convolution blocks DW-Dilated, where each DW-Dilated block sums a second input layer and a second output layer of the block. The second input layer is a point-wise convolution layer PWConv with a 1 × 1 kernel; the second output layer is obtained by applying a depthwise dilated convolution to the point-wise convolution output, using an HS activation function, then a further 1 × 1 point-wise convolution PWConv, and an HS activation function again.
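A sketch of such a decoder block, under the assumption that the "hole" convolution is a depthwise dilated convolution and that the dilation rate, channel count and block count M are free parameters not fixed by the text:

```python
import torch
import torch.nn as nn

class DWDilatedBlock(nn.Module):
    """Sketch of a decoder DW-Dilated block: 1x1 PWConv, depthwise dilated
    ("hole") convolution, 1x1 PWConv, with HS activations and a residual sum."""
    def __init__(self, channels, k=3, dilation=2):
        super().__init__()
        pad = dilation * (k // 2)
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),   # PWConv, 1x1 kernel
            nn.Hardswish(inplace=True),                     # HS activation
            nn.Conv2d(channels, channels, k, padding=pad, dilation=dilation,
                      groups=channels, bias=False),         # depthwise dilated conv
            nn.Hardswish(inplace=True),                     # HS activation
            nn.Conv2d(channels, channels, 1, bias=False),   # PWConv, 1x1 kernel
        )

    def forward(self, x):
        return x + self.body(x)   # second input and output layers summed

decoder = nn.Sequential(*[DWDilatedBlock(64) for _ in range(4)])  # M = 4, assumed
y = decoder(torch.randn(1, 64, 28, 28))
```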
S130, determining fusion parameters of the infrared image and the visible light image to be processed according to the output result of the convolutional neural network model.
The output result is the image features obtained after the infrared image and visible light image to be processed are input into the convolutional neural network model.
For example, after passing through the input layer of the convolutional neural network model, the infrared image and the visible light image to be processed enter the hidden layers, which may include convolutional layers, pooling layers and fully connected layers.

The convolutional layers extract features from the infrared image and the visible light image to be processed. Each convolutional layer contains multiple convolution kernels, and every element of a kernel has a weight coefficient and a bias. During operation, the kernels sweep regularly over the image features, performing element-wise multiplication and summation over the receptive field and adding the bias; for example, the kernel size may be 3 × 3 and the number of kernels 128.

The pooling layers perform feature selection and information filtering on the feature maps output by the convolutional layers. A pooling layer applies a preset pooling function that replaces the value at a single point of the feature map with a statistic of its neighborhood; the pooling region is selected in the same way the convolution kernel scans the feature map, controlled by pooling size, stride and padding.

The fully connected layer is located at the end of the hidden layers and passes signals only to other fully connected layers. In it, the feature map loses its spatial topology, is expanded into a vector, and passes through an activation function; that is, the fully connected layer nonlinearly combines the features extracted by the pooling layers to produce an output.

After the hidden layers, the images reach the output layer of the convolutional neural network model, which outputs a classification label through a logistic function or a normalized exponential function. This classification label is the output result for the infrared image and the visible light image to be processed.
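Purely to make the layer roles above concrete, here is a generic PyTorch sketch; the 3 × 3 kernel size and 128 kernels follow the example figures, while the input channels, pooling size and class count are placeholders.

```python
import torch.nn as nn

hidden = nn.Sequential(
    nn.Conv2d(4, 128, kernel_size=3, padding=1),  # 128 kernels of size 3x3 sweep the input
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),        # neighborhood statistic replaces single points
    nn.Flatten(),                                 # spatial topology lost; features become a vector
    nn.Linear(128 * 112 * 112, 10),               # fully connected; 10 classes is a placeholder
    nn.Softmax(dim=1),                            # normalized exponential output layer
)
```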
However, the output obtained from the convolutional neural network model alone does not fuse the images well; the output result must be processed further to obtain the fusion parameters.
Optionally, determining deformation adjustment values of feature points of the infrared image and the visible light image to be processed according to an output result of the convolutional neural network model; and determining fusion parameters of the characteristic points of the infrared image and the visible light image to be processed according to the deformation adjusting value.
The deformation adjustment value is obtained by inputting the infrared image and the visible light image to be processed into the convolutional neural network model, performing a two-channel ratio comparison, and comparing with the prediction result output by the model. The fusion parameters are used to correct the parallax present when the visible light image to be processed is fused into the infrared image to be processed.
Optionally, determining the fusion parameters of the feature points of the infrared image and the visible light image to be processed according to the deformation adjustment value includes: performing deformation adjustment on the infrared image to be processed according to the deformation adjustment value, so as to compare it with the pixel positions of the feature points of the visible light image and obtain the fusion parameters.
It can be understood that the infrared image and the visible light image to be processed contain multiple feature points. A feature point is a point where the image gray value changes dramatically, or a point of large curvature on an image edge, i.e. the intersection of two edges. The deformation adjustment deforms the infrared image using the deformation adjustment value to obtain a predicted deformation field. The advantage of this arrangement is that, by deforming the infrared image to be processed according to the deformation adjustment value and comparing it with the pixel positions of the feature points of the visible light image, the obtained fusion parameters make the pixel transitions inside the fused output image smoother and the fused image display effect better.
And S140, performing fusion processing on the infrared image and the visible light image to be processed according to the fusion parameters.
Fusion processing subjects the infrared image to be processed and the visible light image data captured in the same scene, collected through multi-source channels, to image processing and computer techniques, combining them maximally while reducing the uncertainty and redundancy of the output, and finally synthesizing a high-quality image, thereby improving the spatial resolution and spectral resolution of the original images.
However, pixels of the image produced after the convolutional neural network model processes the infrared image and the visible light image may have no corresponding pixel in the original image. To assign gray values to pixels that have no corresponding point in the target image, an interpolation operation may be performed, according to the fusion parameters, on the output result obtained from the convolutional neural network model for the infrared image and the visible light image to be processed, and a registration map output to complete the fusion of the infrared image and the visible light image. The interpolation fills the gap positions between pixels after the infrared and visible light images have been processed, preserving as much image detail as possible while introducing as little artificial noise as possible.
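The interpolation step can be illustrated with PyTorch's grid_sample, which bilinearly fills positions that have no exact source pixel; the weighted blend at the end is one simple fusion rule assumed for demonstration, not the patent's specific method.

```python
import torch
import torch.nn.functional as F

def warp_and_fuse(ir, vis, flow, alpha=0.5):
    """Warp the IR image with a predicted deformation field, then blend.

    ir:   (N, 1, H, W) infrared image (expanded to vis's channels for blending)
    vis:  (N, 3, H, W) visible light image
    flow: (N, 2, H, W) per-pixel (x, y) displacements
    """
    n, _, h, w = ir.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).float().to(ir.device)   # (H, W, 2)
    grid = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)          # displaced positions
    gx = 2 * grid[..., 0] / (w - 1) - 1                          # normalize to [-1, 1]
    gy = 2 * grid[..., 1] / (h - 1) - 1
    ir_warped = F.grid_sample(ir, torch.stack((gx, gy), dim=-1),
                              mode="bilinear", align_corners=True)
    return alpha * ir_warped.expand_as(vis) + (1 - alpha) * vis

fused = warp_and_fuse(torch.rand(1, 1, 224, 224), torch.rand(1, 3, 224, 224),
                      torch.zeros(1, 2, 224, 224))
```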
This embodiment provides an infrared image processing method: acquiring an infrared image and a visible light image to be processed, the two images being captured in the same scene; inputting them as a registration pair into a pre-trained convolutional neural network model comprising a backbone network, a transmission network and an output network, where the input of the backbone network comprises three visible light image channels and one infrared image channel, the first convolutional layer of the backbone network adopts grouped convolution, the convolutional layers after it perform downsampling at least twice to obtain features at at least two levels, the transmission network performs feature fusion on the obtained features, and the output network produces the output result; determining fusion parameters of the infrared image and the visible light image according to the output result; and fusing the infrared image and the visible light image according to the fusion parameters. By establishing a deep-learning convolutional neural network model and applying grouped convolution to the infrared image and the visible light image, information crossover that would slow the convergence of the convolutional neural network is avoided, the infrared image and the visible light image can be fused, the problems of mismatch and inaccurate superposition between the infrared image and the visible light image are solved, and the purpose of enhancing the infrared image display effect is achieved.
Example two
Fig. 4 is a flowchart of a convolutional neural network model training method according to a second embodiment of the present invention, and the second embodiment of the present invention is optimized based on the above embodiments. The embodiment can be applied to the training process of the convolutional neural network model. As shown in fig. 4, the method includes:
S210, obtaining a preset number of training images.
The training images comprise registration pairs of an infrared image and a visible light image. It should be understood that the training images differ from the infrared and visible light images to be processed: the training images are used to train the initial convolutional neural network model, while the images to be processed are processed by the trained convolutional neural network model. Different numbers of training images may be selected for training an established convolutional neural network model; for example, the number of training images may be 50, 200, 500, etc. Of course, the number of training images is not limited, and correspondingly, different convolutional neural network models require different numbers of training images.
In addition, in order to ensure the accuracy of the training result, the infrared image and the visible light image included in the training image are captured in the same scene, and the size of the captured image may be set to a fixed size, for example, 224 × 224, and all the captured training images are normalized to the same size.
The method used to register the infrared image and the visible light image in a training image is not limited; it may be key point detection, feature description, feature matching, or image warping. It can be understood that, taking the infrared image in the training image as the moving image, i.e. the image to be adjusted, a true deformation field of the moving image is obtained using any of the above registration methods, and this true deformation field serves as the label of the registration pair. The infrared image and visible light image of each registration pair, combined with the true deformation field obtained by any registration method, form one data set used as the basis for subsequently training the convolutional neural network model.
For example, 2000 pairs of infrared and visible light images may be selected, each pair obtained in the same scene. Taking the infrared image of each pair as the moving image, its true deformation field is obtained using a feature-description method; 2000 data sets are thus obtained from the infrared images, the visible light images and their true deformation fields.
S220, splitting the training image into a training sample set and a testing sample set.
The training sample set is used for determining parameters and an image processing rule in the initial convolutional neural network model, and the testing sample set is used for testing the precision of the convolutional neural network model established based on the training sample set. It should be noted that the training sample set and the testing sample set are independent from each other, and different images are used. The advantage of this arrangement is that the accuracy of the established convolutional neural network model can be improved.
Illustratively, the 2000 data sets are split at a ratio of 4:1 to obtain a training sample set containing 1600 data sets and a test sample set containing 400 data sets.
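A minimal sketch of this split, assuming the data sets are held as (ir, vis, true_deformation_field) triples in a Python list:

```python
import random
from typing import List, Tuple

def split_datasets(data_sets: List[Tuple], ratio: float = 0.8, seed: int = 0):
    """Split the data sets 4:1 into mutually independent train/test sets."""
    rng = random.Random(seed)
    items = list(data_sets)
    rng.shuffle(items)
    cut = int(len(items) * ratio)       # 2000 data sets -> 1600 train / 400 test
    return items[:cut], items[cut:]
```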
And S230, training the initial convolutional neural network model by adopting the training sample set.
The input of the initial convolutional neural network model is a registration pair from the training sample set, and its output is the deformation parameters of that registration pair.
A registration pair in the training sample set comprises an infrared image and a visible light image, which may be registered manually, through human-computer interaction, or automatically to obtain the pair. The registration pairs in the training sample set are input into the initial convolutional neural network model, which extracts their image features to obtain an output result. As the training samples are continually updated and expanded, the model parameters of the initial convolutional neural network model are continually optimized.
Optionally, the deformation parameters are used to perform deformation adjustment on the infrared image in the registration pair, which then serves as the fusion base image. The deformation parameters may include the number of pixels per unit length along the width and height of the image, the total number of pixels in the image, the pixel value of each pixel, or the gradient between each pixel and its neighborhood pixels. The advantage of this arrangement is that obtaining the fusion base image by deforming the infrared image in the registration pair can improve the fusion precision.
S240, testing the test sample set based on the deformation parameters, and determining that the training of the convolutional neural network model is finished if the test result meets the preset standard.
Based on the convolutional neural network model obtained after training the initial model with the training sample set, the test samples are input into the model to obtain an output result, which is the prediction for the test sample set. The prediction is compared with the test sample set to calculate the precision of the convolutional neural network model. If the precision is not within the preset threshold range, return to step S230; that is, repeatedly input different samples into the convolutional neural network model and keep adjusting its model parameters until the preset standard is met.
In each of the above technical solutions, optionally, testing the test sample set based on the deformation parameters includes: performing deformation processing on the infrared image of a registration pair in the test sample set based on the deformation parameters to obtain a deformation processing result; and performing a sub-region similarity comparison between the deformation processing result and the infrared image of the registration pair, and determining the test result according to the sub-region similarity comparison result.
The deformation processing result is the predicted deformation field output by the convolutional neural network model after convolution, downsampling or upsampling, and the infrared image of the registration pair corresponds to the true deformation field obtained by registering the infrared image of the training image as the moving image. The deformation processing result and the infrared image of the registration pair are each divided into the same number of sub-regions, the similarity of correspondingly numbered sub-regions is compared, and the test result is determined according to the sub-region similarity comparison result.
By comparing the deformation processing result with the similarity of the sub-regions of the registration mid-infrared image, the calculation precision of the convolution neural network model can be improved.
In each of the above technical solutions, optionally, performing the sub-region similarity comparison between the deformation processing result and the infrared image of the registration pair, and determining the test result according to the comparison result, includes: cutting the deformation processing result and the infrared image of the registration pair into a preset number of sub-regions respectively; calculating the similarity between corresponding sub-regions of the deformation processing result and of the infrared image of the registration pair; selecting, from all the similarities, a preset number of sub-regions with high similarity as calculation sub-regions; and determining the similarity score of each calculation sub-region as the sub-region similarity comparison result.
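One way to realize this comparison, sketched in PyTorch with normalized cross-correlation as the per-region similarity; the 4 × 4 grid and top-k count are assumptions standing in for the "preset numbers" above.

```python
import torch

def subregion_similarity(pred_warped, ref_warped, grid=4, top_k=8):
    """Cut both deformation results into grid x grid sub-regions, score each
    pair with normalized cross-correlation, and average the top_k scores."""
    n, c, h, w = pred_warped.shape
    ph, pw = h // grid, w // grid
    scores = []
    for i in range(grid):
        for j in range(grid):
            a = pred_warped[..., i*ph:(i+1)*ph, j*pw:(j+1)*pw].reshape(n, -1)
            b = ref_warped[..., i*ph:(i+1)*ph, j*pw:(j+1)*pw].reshape(n, -1)
            a = a - a.mean(dim=1, keepdim=True)
            b = b - b.mean(dim=1, keepdim=True)
            cc = (a * b).sum(1) / (a.norm(dim=1) * b.norm(dim=1) + 1e-8)
            scores.append(cc.mean())                 # similarity of this sub-region
    scores = torch.stack(scores)
    return scores.topk(top_k).values.mean()          # mean top-k similarity score

score = subregion_similarity(torch.rand(1, 2, 224, 224), torch.rand(1, 2, 224, 224))
```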
In order to better improve the calculation precision of the convolutional neural network model, the loss between the predicted deformation field and the real deformation field can be calculated. Fig. 5 is a flowchart of a method for determining a loss function in convolutional neural network model training according to a second embodiment of the present invention. As shown in fig. 5:
firstly, inputting the infrared image IR to be processed and the visible light image Img to be processed into the convolutional neural network model Net;
secondly, outputting a predicted deformation field DF of the infrared image IR to be processed by the convolutional neural network model Net and a label DF (GT) of the registration pair in a training image;
thirdly, carrying out deformation treatment on the infrared image IR to be treated according to the predicted deformation field DF to obtain a deformed infrared image IR';
fourthly, comparing, over a first number of sub-regions, the deformation result of the infrared image IR to be processed under the action of the label DF(GT) with the deformed infrared image IR';
fifthly, selecting a second number of sub-regions with high similarity from all the similarities as calculation sub-regions, and determining the similarity score of each calculation sub-region as a sub-region similarity comparison result;
and sixthly, obtaining a loss function between the predicted deformation field and the real deformation field by using a ReLU activation function and a sigmoid activation function for the comparison result.
Illustratively, the optimizer is SGD with an initial learning rate of 0.01, a weight decay of 0.0001 and a momentum of 0.9; when the error change becomes very gentle, the learning rate is reduced by a factor of 10; the batch size is set to 16 and the number of training iterations to 100; the loss gradient is back-propagated and the model parameters are updated.
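In PyTorch this setup could look like the sketch below, where `model`, `train_loader` and `loss_fn` are assumed to exist elsewhere (with loss_fn following the formulas below), and the 100 iterations are interpreted as epochs.

```python
import torch

# model, train_loader and loss_fn are assumed to be defined elsewhere.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0001)
# Reduce the learning rate by 10x when the error change becomes very gentle.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)

for epoch in range(100):                     # 100 training iterations (epochs)
    for ir, vis, df_gt in train_loader:      # batches of size 16
        optimizer.zero_grad()
        df_pred = model(ir, vis)             # predicted deformation field
        loss = loss_fn(df_pred, df_gt, ir)
        loss.backward()                      # back-propagate the loss gradient
        optimizer.step()                     # update the model parameters
    scheduler.step(loss)                     # step on the last observed loss
```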
The loss between the predicted deformation field and the true deformation field can be calculated by the following formula:

L(F, M, φ) = L_sim(F, M(φ)) + λ L_smooth(φ);

where F is the true deformation field, M is the predicted deformation field, φ is the registration domain, λ L_smooth(φ) is the spatial regularization of the predicted deformation field, and L_sim(F, M(φ)) is the similarity.
It should be noted that the above loss may also be calculated using mutual information; here, cross-correlation is used as the loss function for image registration.
The similarity L_sim(F, M(φ)) may be calculated using the following equation:

L_sim(F, M(φ)) = -CC(F, M(φ));

where CC denotes the cross-correlation.
The spatial regularization λ L_smooth(φ) of the predicted deformation field may be calculated using the following equation:

L_smooth(φ) = Σ_{p∈Ω} ‖∇φ(p)‖²;

where Ω is the image domain and ∇φ(p) is the spatial gradient of the deformation field at pixel p.
It should be noted that, since convolutional neural networks tend to produce discontinuous deformation fields, a spatial smoothness constraint, i.e. a penalty on the spatial gradient of the deformation field, is imposed when predicting the deformation field; here the L2 norm of the gradient of the deformation field is used as the constraint.
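Under the formulas above, the loss can be sketched as follows; the global (rather than windowed) cross-correlation and the λ value are assumptions.

```python
import torch

def cc(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Global normalized cross-correlation CC(F, M(phi)); a windowed/local
    variant would also fit the text."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / (a.norm() * b.norm() + 1e-8)

def l_smooth(phi: torch.Tensor) -> torch.Tensor:
    """L2-norm penalty on the spatial gradient of the deformation field,
    discouraging discontinuous fields. phi: (N, 2, H, W)."""
    dx = phi[..., :, 1:] - phi[..., :, :-1]
    dy = phi[..., 1:, :] - phi[..., :-1, :]
    return (dx ** 2).mean() + (dy ** 2).mean()

def registration_loss(f, m_phi, phi, lam=0.01):
    """L(F, M, phi) = -CC(F, M(phi)) + lambda * L_smooth(phi);
    lam is an assumed weight, since the text gives no value."""
    return -cc(f, m_phi) + lam * l_smooth(phi)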
Further, the visible light image and infrared image in the test sample set are used as input to the convolutional neural network model of S240; the model outputs a predicted deformation field, the loss between the predicted and true deformation fields is calculated, and the precision is computed from the loss. If the precision meets the requirement, training ends; otherwise, return to S230 and continue training.
The infrared image and visible light image to be processed are input into the convolutional neural network model trained in S240, which outputs a predicted deformation field; an interpolation operation is then performed on the moving image using the predicted deformation field to obtain the fused image.
In the infrared image processing method provided by this embodiment of the invention, during training of the convolutional neural network model, a preset number of training images are obtained, each training image comprising a registration pair of an infrared image and a visible light image; the training images are split into a training sample set and a test sample set; an initial convolutional neural network model is trained with the training sample set, its input being a registration pair from the training sample set and its output the deformation parameters of the pair; and the test sample set is tested based on the deformation parameters, training being deemed complete when the test result meets the preset standard. Based on deep learning, the model is trained with a training sample set and a test sample set, establishing a high-precision, low-error convolutional neural network model for infrared image processing and improving its generalization ability, robustness and fusion precision.
EXAMPLE III
Fig. 6 is a block diagram of an infrared image processing apparatus according to a third embodiment of the present invention. The apparatus can execute the infrared image processing method provided by any embodiment of the present invention and includes the functional modules corresponding to the method and its beneficial effects. As shown in fig. 6, the apparatus may include:
an image obtaining module 310, configured to obtain an infrared image and a visible light image to be processed; wherein the infrared image and the visible light image are taken in the same scene;
the image preprocessing module 320 is configured to input the infrared image and the visible light image to be processed, as a registration pair, into a pre-trained convolutional neural network model, where the convolutional neural network model comprises a backbone network, a transmission network and an output network; the input of the backbone network comprises three visible light image channels and one infrared image channel, the first convolutional layer of the backbone network adopts grouped convolution, and the convolutional layers after the first layer perform downsampling at least twice to obtain features at at least two levels; the transmission network is used for performing feature fusion on the obtained features of the at least two levels; the output network is used for obtaining an output result;
an image fusion parameter determining module 330, configured to determine a fusion parameter of the infrared image and the visible light image to be processed according to an output result of the convolutional neural network model;
and the image fusion processing module 340 is configured to perform fusion processing on the infrared image and the visible light image to be processed according to the fusion parameters.
The infrared image processing apparatus provided by this embodiment of the invention acquires an infrared image and a visible light image to be processed, the two images being captured in the same scene; inputs them as a registration pair into a pre-trained convolutional neural network model comprising a backbone network, a transmission network and an output network, where the input of the backbone network comprises three visible light image channels and one infrared image channel, the first convolutional layer adopts grouped convolution, the convolutional layers after it perform downsampling at least twice to obtain features at at least two levels, the transmission network performs feature fusion on the obtained features, and the output network produces the output result; determines fusion parameters of the infrared image and the visible light image according to the output result; and fuses the infrared image and the visible light image according to the fusion parameters. By establishing a deep-learning convolutional neural network model and applying grouped convolution to the infrared image and the visible light image, information crossover that would slow the convergence of the convolutional neural network is avoided, the infrared image and the visible light image can be fused, the problems of mismatch and inaccurate superposition between the infrared image and the visible light image are solved, and the purpose of enhancing the infrared image display effect is achieved.
Further, the image preprocessing module 320 includes:
the training image acquisition unit is used for acquiring a preset number of training images; wherein the training image comprises a registration pair consisting of an infrared image and a visible light image;
the training image splitting unit is used for splitting the training image into a training sample set and a test sample set;
the model training unit is used for training an initial convolutional neural network model by adopting the training sample set, wherein the input of the initial convolutional neural network model is a registration pair in the training sample set, and the output of the initial convolutional neural network model is a deformation parameter of the registration pair;
and the model testing unit is used for judging whether the deformation parameters output by the initial convolutional neural network model tested by adopting the test sample set meet the preset standard or not, and if so, determining that the training of the convolutional neural network model is finished.
Further, the deformation parameter is used for performing deformation adjustment on the infrared image and/or the visible image in the registration pair to serve as a fusion base image.
Further, the model test unit includes:
the deformation processing result determining subunit is used for performing deformation processing on the infrared image of a registration pair in the test sample set based on the deformation parameters to obtain a deformation processing result;
and the test result determining subunit is used for performing sub-region similarity comparison on the deformation processing result and the infrared image of the registration pair, and determining a test result according to the sub-region similarity comparison result.
Further, the test result determination subunit is specifically configured to:
respectively cutting the deformation processing result and the infrared image of the registration pair into sub-regions with preset number;
correspondingly calculating the similarity between the sub-region of the deformation processing result and the sub-region of the infrared image of the registration pair;
selecting a preset number of sub-regions with high similarity from all the similarities as calculation sub-regions;
and determining the similarity score of each calculation sub-region as the sub-region similarity comparison result.
Further, the image fusion parameter determining module 330 includes:
the deformation adjusting value determining unit is used for determining deformation adjusting values of the characteristic points of the infrared image and the visible light image to be processed according to the output result of the convolutional neural network model;
and the fusion parameter determining unit is used for determining fusion parameters of the characteristic points of the infrared image and the visible light image to be processed according to the deformation adjusting value.
Further, the fusion parameter determining unit is specifically configured to:
and carrying out deformation adjustment on the infrared image to be processed according to the deformation adjustment value so as to compare the infrared image to be processed with the pixel point position of the characteristic point of the visible light image to obtain a fusion parameter.
The product can execute the processing method of the infrared image provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method.
Example four
A fourth embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the processing method for infrared images provided in all the embodiments of the present invention:
acquiring an infrared image and a visible light image to be processed; wherein the infrared image and the visible light image are taken in the same scene;
inputting the infrared image and the visible light image to be processed, as a registration pair, into a pre-trained convolutional neural network model, wherein the convolutional neural network model comprises a backbone network, a transmission network and an output network; the input of the backbone network comprises three visible light image channels and one infrared image channel, the first convolutional layer of the backbone network adopts grouped convolution, and the convolutional layers after the first layer perform downsampling at least twice to obtain features at at least two levels; the transmission network is used for performing feature fusion on the obtained features of the at least two levels; the output network is used for obtaining an output result;
determining fusion parameters of the infrared image and the visible light image to be processed according to the output result of the convolutional neural network model;
and performing fusion processing on the infrared image and the visible light image to be processed according to the fusion parameters.
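As a rough illustration of the topology just described, the PyTorch sketch below uses assumed channel widths, exactly two downsampling stages, and a 6-value output (a 2x3 affine deformation); the first-layer grouped convolution is realized as two parallel convolutions so the three visible-light channels and the one infrared channel cannot mix. None of these concrete choices come from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionNet(nn.Module):
    """Backbone -> transmission network (feature fusion) -> output network."""
    def __init__(self):
        super().__init__()
        # Backbone, first layer: grouped convolution keeping visible (3 ch)
        # and infrared (1 ch) in separate groups.
        self.conv_vis = nn.Conv2d(3, 24, 3, padding=1)
        self.conv_ir = nn.Conv2d(1, 8, 3, padding=1)
        # Backbone, later layers: two downsampling stages -> two feature levels.
        self.down1 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.down2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        # Transmission network: fuse the two feature levels.
        self.fuse = nn.Conv2d(64 + 128, 128, 1)
        # Output network: pool and regress the output result.
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 6))

    def forward(self, visible, infrared):
        x = torch.cat([self.conv_vis(visible), self.conv_ir(infrared)], dim=1)
        f1 = self.down1(x)                            # level-1 features
        f2 = self.down2(f1)                           # level-2 features
        f2_up = F.interpolate(f2, size=f1.shape[2:])  # bring levels to one size
        return self.head(self.fuse(torch.cat([f1, f2_up], dim=1)))
```

A forward pass would then take a visible batch of shape (B, 3, H, W) and an infrared batch of shape (B, 1, H, W), e.g. params = FusionNet()(vis, ir).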
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
Example Five
A fifth embodiment of the present application provides an electronic device. Fig. 7 is a schematic structural diagram of an electronic device according to the fifth embodiment of the present application. As shown in fig. 7, this embodiment provides an electronic device 400, which includes: one or more processors 420; and a storage device 410 configured to store one or more programs. When the one or more programs are executed by the one or more processors 420, the one or more processors 420 implement the infrared image processing method provided by the embodiments of the present application, the method comprising:
acquiring an infrared image and a visible light image to be processed; wherein the infrared image and the visible light image are taken in the same scene;
inputting the infrared image and the visible light image to be processed into a pre-trained convolutional neural network model as a registration pair; the convolutional neural network model comprises a backbone network, a transmission network and an output network; the input of the backbone network comprises three visible light image channels and one infrared image channel, the first convolution layer of the backbone network adopts grouped convolution, and the convolution layers after the first layer perform downsampling at least twice to obtain features at at least two levels; the transmission network is used for performing feature fusion on the obtained features of the at least two levels; and the output network is used for obtaining an output result;
determining fusion parameters of the infrared image and the visible light image to be processed according to the output result of the convolutional neural network model;
and performing fusion processing on the infrared image and the visible light image to be processed according to the fusion parameters.
Of course, those skilled in the art can understand that the processor 420 also implements the technical solution of the method for processing an infrared image provided in any embodiment of the present application.
The electronic device 400 shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 7, the electronic device 400 includes a processor 420, a storage device 410, an input device 430, and an output device 440; the number of processors 420 in the electronic device may be one or more, and one processor 420 is taken as an example in fig. 7; the processor 420, the storage device 410, the input device 430, and the output device 440 in the electronic device may be connected by a bus or other means, and connection by a bus 450 is taken as an example in fig. 7.
The storage device 410 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and module units, such as program instructions corresponding to the infrared image processing method in the embodiment of the present application.
The storage device 410 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the storage 410 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, storage 410 may further include memory located remotely from processor 420, which may be connected via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may be used to receive input numeric, character, or voice information, and to generate key signal inputs related to user settings and function control of the electronic device. The output device 440 may include a display screen, a speaker, or other electronic equipment.
The electronic device provided by the embodiments of the present application applies grouped convolution to the infrared image and the visible light image through a deep-learning-based convolutional neural network model, which prevents cross-channel information from slowing the convergence of the network, enables the infrared image and the visible light image to be fused, and thereby enhances the display effect of the infrared image.
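For a sense of why the grouping matters, a grouped convolution keeps its input channels in separate groups (so visible and infrared information cannot cross in the first layer) and also carries fewer weights than an ordinary convolution over the same channels; a quick PyTorch check with illustrative widths:

```python
import torch.nn as nn

standard = nn.Conv2d(4, 32, 3)           # all 4 input channels mixed together
grouped = nn.Conv2d(4, 32, 3, groups=4)  # each input channel convolved separately

print(sum(p.numel() for p in standard.parameters()))  # 1184 weights
print(sum(p.numel() for p in grouped.parameters()))   # 320 weights
```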
The infrared image processing apparatus, medium, and electronic device provided in the above embodiments can execute the infrared image processing method provided by any embodiment of the present application, and have the functional modules and beneficial effects corresponding to the executed method. For technical details not described in the above embodiments, reference may be made to the infrared image processing method provided by any embodiment of the present application.
It should be noted that the foregoing is merely a description of embodiments of the present invention and the technical principles employed. Those skilled in the art will appreciate that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made without departing from the scope of the invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited thereto and may include other equivalent embodiments without departing from its concept; the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for processing an infrared image, the method comprising:
acquiring an infrared image and a visible light image to be processed; wherein the infrared image and the visible light image are taken in the same scene;
inputting the infrared image and the visible light image to be processed into a pre-trained convolutional neural network model as a registration pair; the convolutional neural network model comprises a backbone network, a transmission network and an output network; the input of the backbone network comprises three visible light image channels and one infrared image channel, the first convolution layer of the backbone network adopts grouped convolution, and the convolution layers after the first layer perform downsampling at least twice to obtain features at at least two levels; the transmission network is used for performing feature fusion on the obtained features of the at least two levels; and the output network is used for obtaining an output result;
determining fusion parameters of the infrared image and the visible light image to be processed according to the output result of the convolutional neural network model;
and performing fusion processing on the infrared image and the visible light image to be processed according to the fusion parameters.
2. The method of claim 1, wherein the training process of the convolutional neural network model comprises:
acquiring a preset number of training images; wherein the training image comprises a registration pair consisting of an infrared image and a visible light image;
splitting the training image into a training sample set and a test sample set;
training an initial convolutional neural network model by adopting the training sample set, wherein the input of the initial convolutional neural network model is a registration pair in the training sample set, and the output of the initial convolutional neural network model is a deformation parameter of the registration pair;
and testing the test sample set based on the deformation parameters, and if the test result meets a preset standard, determining that the training of the convolutional neural network model is completed.
3. The method of claim 2, wherein the deformation parameters are used to perform deformation adjustment on the infrared image in the registration pair, the infrared image serving as the base image for fusion.
4. The method of claim 3, wherein testing the set of test samples based on the deformation parameters comprises:
performing deformation processing on the infrared image of each registration pair in the test sample set based on the deformation parameters to obtain a deformation processing result;
and performing a sub-region similarity comparison between the deformation processing result and the infrared image of the registration pair, and determining a test result according to the sub-region similarity comparison result.
5. The method of claim 4, wherein performing a sub-region similarity comparison of the deformation processing result and the infrared image of the registration pair, and determining a test result according to the sub-region similarity comparison result comprises:
cutting the deformation processing result and the infrared image of the registration pair into a first number of sub-regions, respectively;
calculating the similarity between each sub-region of the deformation processing result and the corresponding sub-region of the infrared image of the registration pair;
selecting, from all the sub-regions, a second number of sub-regions with the highest similarity as calculation sub-regions;
and determining the similarity score of each calculation sub-region as the sub-region similarity comparison result.
6. The method according to claim 1, wherein determining the fusion parameters of the infrared image and the visible light image to be processed according to the output result of the convolutional neural network model comprises:
determining a deformation adjustment value for the feature points of the infrared image and the visible light image to be processed according to the output result of the convolutional neural network model;
and determining fusion parameters of the feature points of the infrared image and the visible light image to be processed according to the deformation adjustment value.
7. The method according to claim 6, wherein determining fusion parameters of feature points of the infrared image and the visible light image to be processed according to the deformation adjustment value comprises:
and performing deformation adjustment on the infrared image to be processed according to the deformation adjustment value, so as to compare the pixel positions of the feature points of the adjusted infrared image with those of the visible light image and obtain the fusion parameters.
8. An infrared image processing apparatus, comprising:
the image acquisition module is used for acquiring an infrared image and a visible light image to be processed; wherein the infrared image and the visible light image are taken in the same scene;
the image preprocessing module is used for inputting the infrared image and the visible light image to be processed into a pre-trained convolutional neural network model as a registration pair; the convolutional neural network model comprises a backbone network, a transmission network and an output network; the input of the backbone network comprises three visible light image channels and one infrared image channel, the first convolution layer of the backbone network adopts grouped convolution, and the convolution layers after the first layer perform downsampling at least twice to obtain features at at least two levels; the transmission network is used for performing feature fusion on the obtained features of the at least two levels; and the output network is used for obtaining an output result;
the image fusion parameter determining module is used for determining fusion parameters of the infrared image and the visible light image to be processed according to the output result of the convolutional neural network model;
and the image fusion processing module is used for carrying out fusion processing on the infrared image and the visible light image to be processed according to the fusion parameters.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method of processing an infrared image as set forth in any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of processing an infrared image according to any one of claims 1 to 7 when executing the computer program.
CN202210238624.1A 2022-03-11 2022-03-11 Infrared image processing method and device, medium and electronic equipment Pending CN114581353A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210238624.1A CN114581353A (en) 2022-03-11 2022-03-11 Infrared image processing method and device, medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210238624.1A CN114581353A (en) 2022-03-11 2022-03-11 Infrared image processing method and device, medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114581353A (en) 2022-06-03

Family

ID=81780749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210238624.1A Pending CN114581353A (en) 2022-03-11 2022-03-11 Infrared image processing method and device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114581353A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115690578A (en) * 2022-10-26 2023-02-03 中国电子科技集团公司信息科学研究院 Image fusion method and target identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231218

Address after: Room 703-704, no.238, JIANGCHANG Third Road, Jing'an District, Shanghai, 200436

Applicant after: SHANGHAI THERMAL IMAGE SCIENCE AND TECHNOLOGY Co.,Ltd.

Address before: 201306 Room 201, building 10, 1211 Hongyin Road, Nicheng Town, Pudong New Area, Shanghai

Applicant before: Feichuke Intelligent Technology (Shanghai) Co.,Ltd.