CN114529593A - Infrared and visible light image registration method, system, equipment and image processing terminal - Google Patents

Infrared and visible light image registration method, system, equipment and image processing terminal

Info

Publication number
CN114529593A
CN114529593A (application number CN202210029468.8A)
Authority
CN
China
Prior art keywords
layer
image
infrared
visible light
convolution
Prior art date
Legal status
Pending
Application number
CN202210029468.8A
Other languages
Chinese (zh)
Inventor
王斌
牛兴振
郭盛林
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202210029468.8A priority Critical patent/CN114529593A/en
Publication of CN114529593A publication Critical patent/CN114529593A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Abstract

The invention belongs to the technical field of image processing, and discloses an infrared and visible light image registration method, system, equipment and image processing terminal, comprising the following steps: (1) collecting an infrared image and a visible light image; (2) constructing a training data set, and constructing and training a conditional generative adversarial network; (3) detecting targets in the visible light image with a YOLOv5 network; (4) converting the visible light image into a pseudo infrared image with the generative adversarial network; (5) constructing feature descriptors with the speeded-up robust features (SURF) algorithm, and building a matching model with brute-force matching; (6) filtering mismatched points through distance and slope consistency; (7) estimating transformation parameters with the random sample consensus (RANSAC) algorithm to complete registration. The invention achieves relatively accurate registration of infrared and visible light images with large target size differences and complex backgrounds, and provides a practical solution for infrared and visible light image registration.

Description

Infrared and visible light image registration method, system, equipment and image processing terminal
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an infrared and visible light image registration method, system, equipment and an image processing terminal. The method can be used for the infrared and visible light image registration tasks with large target size difference and complex background.
Background
At present, multi-modal images from different imaging devices can provide richer and more comprehensive information than single-modal images, and the demand for integrating and exploiting multiple kinds of image information keeps growing. Image registration is the process of determining transformation parameters between images according to some similarity measure, so that two or more images of the same scene acquired from different sensors, different viewing angles or different times are transformed into the same coordinate system and best matched at the pixel level. Image registration provides the operating reference for subsequent image processing tasks such as image stitching and image fusion, and is an active research topic in the field of computer vision.
Infrared and visible light image registration is an important multi-sensor registration problem and plays an important role in computer vision, robotics, power equipment fault detection, remote sensing, military applications and other fields. However, registration between infrared and visible light images is difficult because of the large differences in resolution and appearance: the resolution of the infrared image is mostly no larger than 500 x 960, far lower than the 2160 x 3840 resolution of the visible light image, so the infrared image loses gray-scale detail, appears blurred, and differs greatly from the clear texture of the visible light gray-scale image. In addition, the two modalities have different imaging mechanisms: in the visible light band, image contrast is determined by reflectivity and shadow, whereas in the infrared band it is determined by emissivity and temperature. Because of temperature differences in particular, the contrast can vary over a wide range, which leads to large differences in appearance between the heterogeneous images.
A new registration method is therefore desired for the problems that the target information of infrared and visible light images differs greatly and that using SURF feature descriptors alone gives low accuracy and poor robustness; such a method should be applicable to infrared and visible light images with large target size differences and complex backgrounds.
Through the above analysis, the problems and defects of the prior art are as follows: the target information of infrared and visible light images differs greatly, and accuracy and robustness are poor when only SURF feature descriptors are used.
The difficulty in solving the above problems and defects is: infrared and visible light images with large resolution and color differences and complex backgrounds are hard to register, with low registration accuracy and poor robustness, and no satisfactory solution exists at present.
The significance of solving the above problems and defects is: the invention provides a practical solution that improves the registration accuracy and robustness of infrared and visible light images and effectively addresses the registration of infrared and visible light images whose target information differs greatly and whose backgrounds are complex.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an infrared and visible light image registration method, a system, equipment and an image processing terminal.
The invention is realized as follows: the infrared and visible light image registration method detects the approximate target region in the visible light image with a YOLOv5 network, converts the visible light image into a pseudo infrared image with a conditional generative adversarial network, constructs feature descriptions of the infrared image pair based on the SURF descriptor, builds a matching model with brute-force matching, filters the matching points with distance constraints and slope consistency, and finally estimates the parameters of the homography transformation with the random sample consensus (RANSAC) algorithm and applies them to the real infrared image to complete registration.
Further, the infrared and visible light image registration method comprises the following steps:
collecting an infrared image and a visible light image;
step two, constructing a training data set, and constructing and training a conditional generative adversarial network;
step three, detecting targets in the visible light image with a YOLOv5 network: targets are detected in the visible light image with a YOLOv5 network and pedestrians or bicycles are screened out according to their labels; a pedestrian target frame is expanded above and below by 1/12 of its original height, and its width is set to 0.8 times the height; a bicycle target frame is expanded left and right by 1/30 of its original width, and its height is set to 1.25 times the width; for the other categories, the original image is cropped around the detection frame with a height equal to half of the original image height and a width equal to 0.8 times that height, giving the approximate target region; detecting the approximate target region of the visible light image effectively relieves the problem that the resolution difference between the infrared and visible light images is too large;
step four, generating a pseudo infrared image with the conditional generative adversarial network: the cropped visible light image is scaled to 512 pixels wide and 640 pixels high and fed into the conditional generative adversarial network to obtain the corresponding pseudo infrared image, and the watermark added at shooting time is removed from the original real infrared image, which is likewise scaled to 512 pixels wide and 640 pixels high; generating a pseudo infrared image from the visible light image with the generative adversarial network reduces the color difference between the images and lowers the difficulty of registering the heterogeneous images;
step five, constructing descriptors with the SURF algorithm and building a matching model with brute-force matching: the keypoint positions of the real infrared image and the pseudo infrared image are extracted with the speeded-up robust features (SURF) algorithm and descriptors are constructed, a matching model between the feature points is built with brute-force matching based on Euclidean distance, and the two best matches are kept for each keypoint;
step six, filtering mismatched points according to distance and slope consistency; obviously wrong matches are eliminated by the distance and slope consistency constraints, which improves the accuracy of the subsequent transformation parameter estimation.
Step seven, estimating the transformation parameters with the random sample consensus (RANSAC) algorithm to complete registration: the matching points retained in the real infrared image and the pseudo infrared image are used to estimate the parameters of the homography transformation matrix H with the random sample consensus algorithm, and the real infrared image is transformed to complete registration.
Further, the step one of collecting the infrared image and the visible light image specifically includes: a ResNet50 network is constructed as the feature extraction network of a twin convolutional neural network, whose structure is, in order: a first convolution layer, a first BN layer, an activation function layer, a maximum pooling layer, a second convolution layer, a second BN layer, a third convolution layer, a third BN layer, a fourth convolution layer, a fourth BN layer, a fifth convolution layer and a fifth BN layer; the numbers of convolution kernels of the first to fifth convolution layers are 64, 64, 128, 256 and 512, their kernel sizes are 7, 3, 3, 3 and 3, the strides of the first, second and third convolution layers are 2, the strides of the fourth and fifth convolution layers are 1, and the dilation rates of the convolution kernels in the fourth and fifth convolution layers are 2 and 4; the pooling kernel size of the maximum pooling layer is 3 x 3 with stride 2; the first to fifth BN layers use batch normalization, the activation function layer uses the linear rectification (ReLU) function, and the maximum pooling layer uses regional maximum pooling.
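As a reference only, the layer sequence described above can be sketched in PyTorch as follows. This is a minimal sketch, not the invention's exact implementation: the padding values are assumptions chosen to keep feature maps aligned, and each listed "convolution layer" is shown as a single Conv2d rather than a full ResNet50 stage.

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Sketch of the feature-extraction branch described above (simplified)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),    # first convolution layer
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),         # maximum pooling layer
            nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1),    # second convolution layer
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),   # third convolution layer
            nn.BatchNorm2d(128),
            nn.Conv2d(128, 256, kernel_size=3, stride=1,
                      padding=2, dilation=2),                         # fourth convolution layer, dilation 2
            nn.BatchNorm2d(256),
            nn.Conv2d(256, 512, kernel_size=3, stride=1,
                      padding=4, dilation=4),                         # fifth convolution layer, dilation 4
            nn.BatchNorm2d(512),
        )

    def forward(self, x):
        return self.features(x)
```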
Further, the step two of constructing a training data set and constructing and training a conditional generative adversarial network includes:
(1) constructing a training data set: because the sizes of the visible light image and the infrared image differ greatly, the visible light image is reduced by a factor of 3 to 1280 x 720 pixels; four pairs of matching pixel points are selected manually in each infrared and visible light image pair, the homography transformation matrix H is determined from these matching points, the infrared image is transformed to the same size as the visible light image, and a region containing the target is selected manually and cropped into infrared and visible light image pairs whose sizes are integer multiples of 4;
(2) constructing a generator Gnet based on a Resnet network for the conditional generative adversarial network, whose structure is, in order, a down-sampling convolution module, 9 ResnetBlock modules, an up-sampling deconvolution module and a Tanh component; the down-sampling convolution module is, in order: a first padding layer, a first convolution layer, a BN layer, an activation function layer, a second convolution layer, a BN layer, an activation function layer, a third convolution layer, a BN layer and an activation function layer; the first padding layer parameter is 3, the numbers of convolution kernels of the first to third convolution layers are 64, 128 and 256, their kernel sizes are 7, 3 and 3, their strides are 1, 2 and 2, and their padding is 0, 1 and 1; the ResnetBlock module is, in order: a padding layer, a convolution layer, a BN layer, an activation function layer, a DropOut layer, a padding layer, a convolution layer and a BN layer; the padding layer parameter is 1, the number of convolution kernels of each convolution layer is 256, all kernel sizes are 3, all strides are 1 and all padding is 1; the up-sampling deconvolution module is, in order: a first deconvolution layer, a BN layer, an activation function layer, a second deconvolution layer, a BN layer, an activation function layer, a third padding layer, a third convolution layer and an activation function layer; the numbers of convolution kernels of the first and second deconvolution layers are 128 and 64, their kernel sizes are both 3, their strides are both 2 and their padding is both 1; the third padding layer parameter is 3, the third convolution layer has 3 convolution kernels of size 7, stride 1 and padding 0; all BN layers use batch normalization, and all activation function layers use the linear rectification function except the last one, which uses the Tanh function;
(3) constructing a PatchGAN-style discriminator Dnet as the discriminator network of the conditional generative adversarial network, whose structure is, in order: a first convolution layer, an activation function layer, a second convolution layer, a BN layer, an activation function layer, a third convolution layer, a BN layer, an activation function layer, a fourth convolution layer, a BN layer, an activation function layer and a fifth convolution layer; the numbers of convolution kernels of the first to fifth convolution layers are 64, 128, 256, 512 and 1, all kernel sizes are 4, the strides are 2, 2, 2, 1 and 1, all padding is 1, the BN layers use batch normalization, and the activation function layers use the LeakyReLU function;
(4) training the conditional generative adversarial network: the training data set is fed into the conditional generative adversarial network, and the network weights are updated with the Adam algorithm until the adversarial loss function Loss converges.
Further, the homography transformation matrix H in (1) is defined as follows:
H = s·M·[r1, r2, t]
where s is a scale factor, M is the camera intrinsic matrix, and r1, r2 and t are camera extrinsic parameters.
Further, the linear rectification function in (2) is defined as follows:
ReLU(x) = max(0, x)
the Tanh function is defined as follows:
Tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
the LeakyReLU function is defined as follows:
LeakyReLU(x) = x, if x ≥ 0; x/a, if x < 0
where a is a fixed parameter in (1, +∞) and x is the input;
the Loss function Loss in (4) is defined as follows:
LcGAN(G, D) = E[log D(x, Iinf)] + E[log(1 - D(x, G(x, z)))]
the generation loss of Gnet is defined as follows:
Lg = -log D(G(Ivis)) - μ||Iinf - G(Ivis)||;
the antagonistic loss of Dnet is defined as follows:
Ld = -log(1 - D(Ivis, G(Ivis))) - log(D(Ivis, Iinf));
where LcGAN represents the total loss of the conditional generative adversarial network, G represents the generator Gnet, D represents the discriminator Dnet, x represents the input image, and z represents the noise vector, which is supplied through the Dropout layers; μ is the weight of the second term in Lg and is set to 100; Iinf denotes the real infrared image and Ivis denotes the visible light image.
Further, the step six of filtering mismatched points by distance and slope consistency includes:
(1) for the set of two best matches of each keypoint, if the Euclidean distance of the first best match is smaller than 0.75 times the Euclidean distance of the second best match, the first best match is kept; otherwise both matches are rejected;
(2) when the number of correctly matched feature point pairs is large, the slope of the line connecting a feature point in the real infrared image with its matching feature point in the pseudo infrared image is required to be smaller than 0.1, and mismatched points are screened out accordingly;
it is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the infrared and visible image registration method.
Another object of the present invention is to provide an intelligent image processing terminal, which is used for implementing the infrared and visible light image registration method.
It is another object of the present invention to provide an infrared and visible image registration system implementing the infrared and visible image registration method, the infrared and visible image registration system comprising:
the image acquisition module is used for acquiring an infrared image and a visible light image;
the training data set construction module is used for constructing a training data set and constructing and training a conditional generative adversarial network;
the image target detection module is used for detecting the visible light image target by using a YOLOv5 network;
the pseudo infrared image generation module is used for generating a pseudo infrared image from the visible light image with the generative adversarial network;
the matching model building module is used for constructing feature descriptors with the speeded-up robust features (SURF) algorithm and building a matching model with brute-force matching;
the mismatched point filtering module is used for filtering mismatched points through distance and slope consistency;
and the matching completion module is used for estimating transformation parameters by using a random sample consensus (RANSAC) algorithm to complete registration.
By combining all the above technical schemes, the invention has the following advantages and positive effects: detecting the approximate target region in the visible light image with a YOLOv5 network effectively relieves the problem that the image resolution difference is too large; converting the visible light image into a pseudo infrared image with the conditional generative adversarial network greatly reduces the color difference between the heterogeneous images and lowers the registration difficulty; feature descriptions of the infrared image pair are constructed based on the SURF descriptor and a matching model is built with brute-force matching; the matching points are then filtered with distance constraints and slope consistency, which improves the registration accuracy; finally, the parameters of the homography transformation are estimated with the random sample consensus (RANSAC) algorithm and the real infrared image is registered. The method improves the registration accuracy of infrared and visible light images with large target size differences and complex backgrounds.
The registration system detects the approximate target region of the visible light image with a YOLOv5 network, which solves the problem of the large resolution difference between the visible light image and the infrared image, and removes mismatches with distance and slope consistency constraints, so it can be applied to infrared and visible light image registration scenes with large target size differences and complex backgrounds, with high accuracy and good robustness.
The invention converts the visible light image into a pseudo infrared image with the conditional generative adversarial network, which solves the problem of the large color difference between the heterogeneous images and overcomes the low accuracy and poor anti-interference ability of registration that uses SURF features directly; it therefore generalizes well and can cope with infrared and visible light image registration scenes with heavy interference and complex backgrounds.
Drawings
Fig. 1 is a flowchart of an infrared and visible light image registration method according to an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of an infrared and visible image registration system provided by an embodiment of the present invention;
in fig. 2: 1. an image acquisition module; 2. a training data set construction module; 3. an image target detection module; 4. a pseudo-infrared image generation module; 5. a matching model construction module; 6. a mismatching point filtering module; 7. and a matching completion module.
Fig. 3 is a flowchart of an implementation of the infrared and visible light image registration method according to the embodiment of the present invention.
Fig. 4 is a schematic diagram of the structure of the Gnet network of the generator constructed by the invention.
Fig. 5 is a schematic structural diagram of a ResnetBlock module constructed by the present invention.
FIG. 6 is a schematic diagram of a network structure of a discriminator Dnet constructed by the invention.
Fig. 7 is a graph of the registration results of the invention for the bicycle-category infrared and visible light images.
Fig. 8 is a graph of the registration results of the invention for the pedestrian-category infrared and visible light images.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to solve the problems in the prior art, the present invention provides a method, a system, a device, and an image processing terminal for registering infrared and visible light images, and the present invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the infrared and visible light image registration method provided by the present invention comprises the following steps:
s101: collecting an infrared image and a visible light image;
s102: constructing a training data set, and constructing and training a conditional generative adversarial network;
s103: detecting a visible light image target by using a YOLOv5 network;
s104: generating a pseudo infrared image from the visible light image with the generative adversarial network;
s105: constructing feature descriptors with the speeded-up robust features (SURF) algorithm, and building a matching model with brute-force matching;
s106: filtering mismatching points through distance and slope consistency;
s107: and estimating transformation parameters by using a random sample consensus (RANSAC) algorithm to complete registration.
Those skilled in the art may also implement the infrared and visible light image registration method provided by the invention with other steps, for example replacing the SURF features with SIFT, ORB or other features and replacing the RANSAC algorithm with a least-squares method; the method of fig. 1 is only a specific example.
As shown in fig. 2, the present invention provides an infrared and visible image registration system comprising:
the image acquisition module 1 is used for acquiring an infrared image and a visible light image;
the training data set construction module 2 is used for constructing a training data set and constructing and training a conditional generative adversarial network;
the image target detection module 3 is used for detecting the visible light image target by using a YOLOv5 network;
the pseudo infrared image generation module 4 is used for generating a pseudo infrared image from the visible light image with the generative adversarial network;
the matching model construction module 5 is used for constructing feature descriptors with the speeded-up robust features (SURF) algorithm and building a matching model with brute-force matching;
the mismatching point filtering module 6 is used for filtering mismatching points through distance and slope consistency;
and a matching completion module 7 for estimating transformation parameters by using a random sample consensus (RANSAC) algorithm to complete the registration.
The technical solution of the present invention is further described below with reference to the accompanying drawings.
As shown in fig. 3, the infrared and visible light image registration method of the present invention comprises the following steps:
Step one, acquiring an infrared image and a visible light image: a common mobile phone equipped with a near-infrared camera is used to shoot visible light images and infrared images of pedestrians or bicycles in the same scene at the same time; the resolution of the visible light images is 2160 pixels wide and 3840 pixels high, the resolution of the infrared images is 507 pixels wide and 960 pixels high, and 100 pairs are shot for each scene, 200 pairs in total.
Step two, constructing a training data set and constructing and training a conditional generative adversarial network:
(2a) constructing a training data set: because the sizes of the visible light image and the infrared image differ greatly, the visible light image is reduced by a factor of 3 to 1280 x 720 pixels. Four pairs of matching pixel points are selected manually in each infrared and visible light image pair, the homography transformation matrix H is determined from these matching points, the infrared image is transformed to the same size as the visible light image, and a region containing the target is selected manually and cropped into infrared and visible light image pairs whose sizes are integer multiples of 4.
The homography transformation matrix H is defined as follows:
H = s·M·[r1, r2, t]
where s is a scale factor, M is the camera intrinsic matrix, and r1, r2 and t are camera extrinsic parameters.
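As a reference only, the determination of H from the four manually selected point pairs and the warping of the infrared image described in (2a) can be sketched with OpenCV as follows. The coordinates and file names are placeholders, not values from the invention.

```python
import cv2
import numpy as np

# Four manually selected matching pixel pairs (placeholder coordinates).
pts_ir  = np.float32([[35, 60], [470, 58], [480, 900], [30, 905]])
pts_vis = np.float32([[88, 150], [690, 145], [700, 1250], [80, 1260]])

# Homography H mapping infrared coordinates to visible-light coordinates.
H = cv2.getPerspectiveTransform(pts_ir, pts_vis)

# Warp the infrared image so that its size matches the down-scaled visible image.
ir = cv2.imread("ir_0001.png")
vis = cv2.imread("vis_0001.png")          # already reduced to 1280 x 720
ir_aligned = cv2.warpPerspective(ir, H, (vis.shape[1], vis.shape[0]))
```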
(2b) The generator Gnet of the conditional generative adversarial network constructed by the invention is further described with reference to fig. 4.
Fig. 4 is a schematic structural diagram of the generator Gnet, whose structure is, in order, a down-sampling convolution module, 9 ResnetBlock modules, an up-sampling deconvolution module and a Tanh component; the down-sampling convolution module is, in order: a first padding layer, a first convolution layer, a BN layer, an activation function layer, a second convolution layer, a BN layer, an activation function layer, a third convolution layer, a BN layer and an activation function layer; the first padding layer parameter is 3, the numbers of convolution kernels of the first to third convolution layers are 64, 128 and 256, their kernel sizes are 7, 3 and 3, their strides are 1, 2 and 2, and their padding is 0, 1 and 1; the up-sampling deconvolution module is, in order: a first deconvolution layer, a BN layer, an activation function layer, a second deconvolution layer, a BN layer, an activation function layer, a third padding layer, a third convolution layer and an activation function layer; the numbers of convolution kernels of the first and second deconvolution layers are 128 and 64, their kernel sizes are both 3, their strides are both 2 and their padding is both 1; the third padding layer parameter is 3, the third convolution layer has 3 convolution kernels of size 7, stride 1 and padding 0; all BN layers use batch normalization, and all activation function layers use the linear rectification function except the last one, which uses the Tanh function.
The linear rectification function is defined as follows:
ReLU(x) = max(0, x)
the Tanh function is defined as follows:
Tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
the ResnetBlock module in the Gnet constructed in accordance with the present invention is further described, as shown in FIG. 5.
Fig. 5 is a schematic structural diagram of a ResnetBlock module, and the structure sequentially includes: fill convolutional layer, BN layer, activation function layer, DropOut layer, fill convolutional layer, BN layer; the parameters of the filling convolutional layer are 1, the number of convolutional cores of the convolutional layer is set to be 256, the sizes of the convolutional layers are all 3, the step lengths are all 1, and the padding is all 1.
(2c) The discriminator Dnet of the conditional generative adversarial network constructed by the invention is further described with reference to fig. 6.
Fig. 6 is a schematic structural diagram of the discriminator Dnet, whose structure is, in order: a first convolution layer, an activation function layer, a second convolution layer, a BN layer, an activation function layer, a third convolution layer, a BN layer, an activation function layer, a fourth convolution layer, a BN layer, an activation function layer and a fifth convolution layer; the numbers of convolution kernels of the first to fifth convolution layers are 64, 128, 256, 512 and 1, all kernel sizes are 4, the strides are 2, 2, 2, 1 and 1, all padding is 1, the BN layers use batch normalization, and the activation function layers use the LeakyReLU function.
(2d) Training the conditional generative adversarial network: the training data set is fed into the conditional generative adversarial network, and the network weights are updated with the Adam algorithm until the adversarial loss function Loss converges.
The Loss function Loss is defined as follows:
LcGAN(G, D) = E[log D(x, Iinf)] + E[log(1 - D(x, G(x, z)))]
the generation loss of Gnet is defined as follows:
Lg = -log D(G(Ivis)) - μ||Iinf - G(Ivis)||
the adversarial loss of Dnet is defined as follows:
Ld = -log(1 - D(Ivis, G(Ivis))) - log(D(Ivis, Iinf))
where LcGAN represents the total loss of the conditional generative adversarial network, G represents the generator Gnet, D represents the discriminator Dnet, x represents the input image, and z represents the noise vector, which is supplied through the Dropout layers; μ is the weight of the second term in Lg and is set to 100; Iinf denotes the real infrared image and Ivis denotes the visible light image.
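As a reference only, one training step with the Adam algorithm and the losses defined above can be sketched as follows. This is a minimal sketch: applying a sigmoid to the discriminator's patch scores before the logarithm, conditioning D on the concatenated (Ivis, infrared) pair, and taking the weighted reconstruction term with a positive sign (as in the usual conditional GAN formulation) are assumptions.

```python
import torch
import torch.nn.functional as F

def train_step(gnet, dnet, opt_g, opt_d, vis, ir, mu=100.0, eps=1e-7):
    """One update of Dnet and Gnet; vis and ir are batches of visible and real infrared images."""
    fake_ir = gnet(vis)

    # Discriminator update: Ld = -log(1 - D(vis, G(vis))) - log(D(vis, ir))
    d_fake = torch.sigmoid(dnet(torch.cat([vis, fake_ir.detach()], dim=1)))
    d_real = torch.sigmoid(dnet(torch.cat([vis, ir], dim=1)))
    loss_d = -(torch.log(1 - d_fake + eps) + torch.log(d_real + eps)).mean()
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator update: Lg = -log D(vis, G(vis)) + mu * ||ir - G(vis)||_1
    d_fake = torch.sigmoid(dnet(torch.cat([vis, fake_ir], dim=1)))
    loss_g = -torch.log(d_fake + eps).mean() + mu * F.l1_loss(fake_ir, ir)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()
```

Typical usage would create opt_g and opt_d as torch.optim.Adam optimizers over the generator and discriminator parameters and call train_step for each batch until Loss converges.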
Step three, detecting targets in the visible light image with a YOLOv5 network:
targets are detected in the visible light image with a YOLOv5 network and pedestrians or bicycles are screened out according to their labels; a pedestrian target frame is expanded above and below by 1/12 of its original height, and its width is set to 0.8 times the height; a bicycle target frame is expanded left and right by 1/30 of its original width, and its height is set to 1.25 times the width; for the other categories, the original image is cropped around the detection frame with a height equal to half of the original image height and a width equal to 0.8 times that height, giving the approximate target region.
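As a reference only, the adjustment of a detection frame into the approximate target region can be sketched as follows. Only the ratios are given above; centring the adjusted frame on the original frame is an assumption.

```python
def adjust_box(x1, y1, x2, y2, label, img_h, img_w):
    """Expand a YOLOv5 detection box into the approximate target region (hedged sketch)."""
    w, h = x2 - x1, y2 - y1
    if label == "person":
        y1, y2 = y1 - h / 12, y2 + h / 12          # add 1/12 of the height above and below
        h = y2 - y1
        cx = (x1 + x2) / 2
        x1, x2 = cx - 0.4 * h, cx + 0.4 * h        # width = 0.8 * height, centred
    elif label == "bicycle":
        x1, x2 = x1 - w / 30, x2 + w / 30          # add 1/30 of the width on each side
        w = x2 - x1
        cy = (y1 + y2) / 2
        y1, y2 = cy - 0.625 * w, cy + 0.625 * w    # height = 1.25 * width, centred
    else:
        # other categories: height = half of the image height, width = 0.8 * height
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        h = img_h / 2
        w = 0.8 * h
        x1, x2, y1, y2 = cx - w / 2, cx + w / 2, cy - h / 2, cy + h / 2
    # clamp to the image and return integer pixel coordinates
    x1, y1 = max(0, int(x1)), max(0, int(y1))
    x2, y2 = min(img_w, int(x2)), min(img_h, int(y2))
    return x1, y1, x2, y2
```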
Step four, generating a pseudo infrared image with the conditional generative adversarial network:
the cropped visible light image is scaled to 512 pixels wide and 640 pixels high and fed into the conditional generative adversarial network to obtain the corresponding pseudo infrared image, and the watermark added at shooting time is removed from the original real infrared image, which is likewise scaled to 512 pixels wide and 640 pixels high.
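As a reference only, the scaling and modal conversion of a cropped visible light patch can be sketched as follows. Normalizing the input to [-1, 1] to match the Tanh output of the generator is an assumption.

```python
import cv2
import numpy as np
import torch

def to_pseudo_infrared(gnet, vis_crop_bgr):
    """Scale a cropped visible-light patch to 512 x 640 (w x h) and run the trained generator."""
    img = cv2.resize(vis_crop_bgr, (512, 640))                # width 512, height 640
    x = torch.from_numpy(img[:, :, ::-1].copy()).float()      # BGR -> RGB
    x = (x / 127.5 - 1.0).permute(2, 0, 1).unsqueeze(0)       # assumed [-1, 1] Tanh range
    with torch.no_grad():
        y = gnet(x)[0]
    y = ((y.permute(1, 2, 0).numpy() + 1.0) * 127.5).clip(0, 255).astype(np.uint8)
    return cv2.cvtColor(y, cv2.COLOR_RGB2BGR)
```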
Step five, constructing descriptors with the SURF algorithm and building a matching model with brute-force matching:
the keypoint positions of the real infrared image and the pseudo infrared image are extracted with the speeded-up robust features (SURF) algorithm and descriptors are constructed, a matching model between the feature points is built with brute-force matching based on Euclidean distance, and the two best matches are kept for each keypoint.
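As a reference only, the SURF keypoint extraction and brute-force matching can be sketched with OpenCV as follows. SURF requires the opencv-contrib package, and the Hessian threshold of 400 is an assumption.

```python
import cv2

# real_ir_gray and pseudo_ir_gray are the grayscale real and pseudo infrared images.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp_ir,  des_ir  = surf.detectAndCompute(real_ir_gray, None)
kp_pir, des_pir = surf.detectAndCompute(pseudo_ir_gray, None)

# Brute-force matching on Euclidean (L2) distance, keeping the two best matches per keypoint.
bf = cv2.BFMatcher(cv2.NORM_L2)
knn_matches = bf.knnMatch(des_ir, des_pir, k=2)
```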
Step six, filtering mismatched points according to distance and slope consistency:
(6a) for the set of two best matches of each keypoint, if the Euclidean distance of the first best match is smaller than 0.75 times the Euclidean distance of the second best match, the first best match is kept; otherwise both matches are rejected;
(6b) when the number of correctly matched feature point pairs is large, the slope of the line connecting a feature point in the real infrared image with its matching feature point in the pseudo infrared image is required to be smaller than 0.1, and mismatched points are screened out accordingly.
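As a reference only, the ratio test of (6a) and the slope screening of (6b) can be sketched as follows. The slope is computed as if the two equally sized images were placed side by side, which is an assumed interpretation of the constraint.

```python
# (6a) ratio test: keep the first match only if it is clearly better than the second.
good = []
for pair in knn_matches:
    if len(pair) < 2:
        continue
    m, n = pair
    if m.distance < 0.75 * n.distance:
        good.append(m)

# (6b) slope-consistency screening.
filtered = []
offset = real_ir_gray.shape[1]              # pseudo image placed to the right of the real image
for m in good:
    x1, y1 = kp_ir[m.queryIdx].pt           # keypoint in the real infrared image
    x2, y2 = kp_pir[m.trainIdx].pt          # keypoint in the pseudo infrared image
    slope = abs(y2 - y1) / (abs((x2 + offset) - x1) + 1e-6)
    if slope < 0.1:                          # correct match lines are nearly horizontal
        filtered.append(m)
```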
Step seven, estimating the transformation parameters with the random sample consensus (RANSAC) algorithm to complete registration:
finally, the matching points retained in the real infrared image and the pseudo infrared image are used to estimate the parameters of the homography transformation matrix H with the random sample consensus algorithm, and the real infrared image is transformed to complete registration.
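As a reference only, the RANSAC estimation of H and the transformation of the real infrared image can be sketched with OpenCV as follows. The reprojection threshold of 5.0 pixels is an assumption.

```python
import cv2
import numpy as np

src = np.float32([kp_ir[m.queryIdx].pt  for m in filtered]).reshape(-1, 1, 2)
dst = np.float32([kp_pir[m.trainIdx].pt for m in filtered]).reshape(-1, 1, 2)

# Estimate the homography with RANSAC and warp the real infrared image onto the pseudo one.
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=5.0)
h, w = pseudo_ir_gray.shape[:2]
registered_ir = cv2.warpPerspective(real_ir_gray, H, (w, h))
```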
The technical effects of the present invention will be described in detail with reference to simulation experiments.
1. Simulation conditions are as follows:
the hardware platform of the simulation experiment of the invention is as follows: the processor is Intel (R) core (TM) i7-10700KCPU, the main frequency is 2.9GHz, the memory is 64GB, and the display card is NVIDIAGeForceRTX 3060.
The software platform of the simulation experiment of the invention is as follows: ubuntu20.04 operating system, Pycharm2021 software, python3.7, and Pytorch deep learning framework.
2. Simulation content and result analysis:
when a training set and a test set are generated in a simulation experiment, a pedestrian and bicycle data set shot in a campus by a user is used, and each category comprises 100 images, and the total number of the images is 400. Each 50 pairs of categories were used as training data sets and the other 50 pairs were used as test data sets in the simulation experiments of the present invention.
The prior art adopted in the simulation experiment is:
the speeded-up robust features method, abbreviated SURF, proposed in Herbert Bay, Tinne Tuytelaars, et al., "SURF: Speeded Up Robust Features," Proceedings of the 9th European Conference on Computer Vision, Part I, Springer-Verlag, 2006.
In order to qualitatively evaluate the simulation effect, the invention is used to test the bicycle and pedestrian data sets separately, and the registration results are shown in fig. 7 and fig. 8, respectively. In fig. 7, (a) shows the visible light image and the real infrared image detected and cropped by YOLOv5, (b) shows the result after registration with the SURF algorithm and the result after registration with the invention, and (c) shows the corresponding results after registration and fusion; fig. 8 is organized in the same way.
As can be seen from fig. 7 and 8: the subjective evaluation effect of the invention for the infrared and visible light image registration is better than the registration effect based on the SURF algorithm.
In order to quantitatively evaluate the simulation effect, the mean absolute error (MAE), peak signal-to-noise ratio (PSNR), normalized mutual information (NMI) and structural similarity (SSIM) are adopted as performance evaluation indexes and compared with the prior art; the comparison results are shown in Table 1:
table 1 comparison table of evaluation indexes of the present invention and the prior art in simulation experiment
Method/index mMAE mPSNR mNMI mSSIM
Prior Art 56.7484 9.5436 0.1590 0.4692
The invention 34.0360 13.7719 0.2393 0.6004
As can be seen from Table 1, the invention outperforms the prior art on all evaluation indexes on the bicycle and pedestrian test data sets, which shows that the invention achieves higher accuracy and stronger robustness in infrared and visible light image registration tasks with large target size differences and complex backgrounds, and is a practical registration method and system.
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An infrared and visible light image registration method, characterized in that a YOLOv5 network is used to detect the approximate target region in a visible light image, a conditional generative adversarial network is used to convert the visible light image into a pseudo infrared image, feature descriptions of the infrared image pair are constructed based on the SURF descriptor, a matching model is built with brute-force matching, the matching points are then filtered with distance constraints and slope consistency, and finally the parameters of the homography transformation are estimated with the random sample consensus (RANSAC) algorithm and the real infrared image is registered.
2. The infrared and visible image registration method of claim 1, wherein the infrared and visible image registration method comprises the steps of:
collecting an infrared image and a visible light image;
step two, constructing a training data set, and constructing and training a conditional generative adversarial network;
step three, detecting targets in the visible light image with a YOLOv5 network: targets are detected in the visible light image with a YOLOv5 network and pedestrians or bicycles are screened out according to their labels; a pedestrian target frame is expanded above and below by 1/12 of its original height, and its width is set to 0.8 times the height; a bicycle target frame is expanded left and right by 1/30 of its original width, and its height is set to 1.25 times the width; for the other categories, the original image is cropped around the detection frame with a height equal to half of the original image height and a width equal to 0.8 times that height, giving the approximate target region;
step four, generating a pseudo infrared image with the conditional generative adversarial network: the cropped visible light image is scaled to 512 pixels wide and 640 pixels high and fed into the conditional generative adversarial network to obtain the corresponding pseudo infrared image, and the watermark added at shooting time is removed from the original real infrared image, which is likewise scaled to 512 pixels wide and 640 pixels high;
step five, constructing descriptors with the SURF algorithm and building a matching model with brute-force matching: the keypoint positions of the real infrared image and the pseudo infrared image are extracted with the speeded-up robust features (SURF) algorithm and descriptors are constructed, a matching model between the feature points is built with brute-force matching based on Euclidean distance, and the two best matches are kept for each keypoint;
step six, filtering mismatching points according to the consistency of the distance and the slope;
step seven, estimating transformation parameters by using a random sample consensus (RANSAC) algorithm to complete registration: and finally, estimating parameters of a homography transformation matrix H according to a random sampling consistency algorithm by using matching points which are reserved in the real infrared image and the pseudo infrared image, and transforming the real infrared image to complete registration.
3. The infrared and visible image registration method of claim 2, wherein the step one of collecting an infrared image and a visible light image specifically comprises: a ResNet50 network is constructed as the feature extraction network of a twin convolutional neural network, whose structure is, in order: a first convolution layer, a first BN layer, an activation function layer, a maximum pooling layer, a second convolution layer, a second BN layer, a third convolution layer, a third BN layer, a fourth convolution layer, a fourth BN layer, a fifth convolution layer and a fifth BN layer; the numbers of convolution kernels of the first to fifth convolution layers are 64, 64, 128, 256 and 512, their kernel sizes are 7, 3, 3, 3 and 3, the strides of the first, second and third convolution layers are 2, the strides of the fourth and fifth convolution layers are 1, and the dilation rates of the convolution kernels in the fourth and fifth convolution layers are 2 and 4; the pooling kernel size of the maximum pooling layer is 3 x 3 with stride 2; the first to fifth BN layers use batch normalization, the activation function layer uses the linear rectification (ReLU) function, and the maximum pooling layer uses regional maximum pooling.
4. The infrared and visible image registration method of claim 2, wherein the step two of constructing a training data set and constructing and training a conditional generative adversarial network comprises:
(1) constructing a training data set: because the sizes of the visible light image and the infrared image differ greatly, the visible light image is reduced by a factor of 3 to 1280 x 720 pixels; four pairs of matching pixel points are selected manually in each infrared and visible light image pair, the homography transformation matrix H is determined from these matching points, the infrared image is transformed to the same size as the visible light image, and a region containing the target is selected manually and cropped into infrared and visible light image pairs whose sizes are integer multiples of 4;
(2) constructing a generator Gnet based on a Resnet network for the conditional generative adversarial network, whose structure is, in order, a down-sampling convolution module, 9 ResnetBlock modules, an up-sampling deconvolution module and a Tanh component; the down-sampling convolution module is, in order: a first padding layer, a first convolution layer, a BN layer, an activation function layer, a second convolution layer, a BN layer, an activation function layer, a third convolution layer, a BN layer and an activation function layer; the first padding layer parameter is 3, the numbers of convolution kernels of the first to third convolution layers are 64, 128 and 256, their kernel sizes are 7, 3 and 3, their strides are 1, 2 and 2, and their padding is 0, 1 and 1; the ResnetBlock module is, in order: a padding layer, a convolution layer, a BN layer, an activation function layer, a DropOut layer, a padding layer, a convolution layer and a BN layer; the padding layer parameter is 1, the number of convolution kernels of each convolution layer is 256, all kernel sizes are 3, all strides are 1 and all padding is 1; the up-sampling deconvolution module is, in order: a first deconvolution layer, a BN layer, an activation function layer, a second deconvolution layer, a BN layer, an activation function layer, a third padding layer, a third convolution layer and an activation function layer; the numbers of convolution kernels of the first and second deconvolution layers are 128 and 64, their kernel sizes are both 3, their strides are both 2 and their padding is both 1; the third padding layer parameter is 3, the third convolution layer has 3 convolution kernels of size 7, stride 1 and padding 0; all BN layers use batch normalization, and all activation function layers use the linear rectification function except the last one, which uses the Tanh function;
(3) constructing a PatchGAN-style discriminator Dnet as the discriminator network of the conditional generative adversarial network, whose structure is, in order: a first convolution layer, an activation function layer, a second convolution layer, a BN layer, an activation function layer, a third convolution layer, a BN layer, an activation function layer, a fourth convolution layer, a BN layer, an activation function layer and a fifth convolution layer; the numbers of convolution kernels of the first to fifth convolution layers are 64, 128, 256, 512 and 1, all kernel sizes are 4, the strides are 2, 2, 2, 1 and 1, all padding is 1, the BN layers use batch normalization, and the activation function layers use the LeakyReLU function;
(4) training the conditional generative adversarial network: the training data set is fed into the conditional generative adversarial network, and the network weights are updated with the Adam algorithm until the adversarial loss function Loss converges.
5. The infrared and visible image registration method of claim 4, wherein the homography transformation matrix H in (1) is defined as follows:
H = s·M·[r1, r2, t]
where s is a scale factor, M is the camera intrinsic matrix, and r1, r2 and t are camera extrinsic parameters.
6. The infrared and visible image registration method of claim 4, wherein the linear rectification function in (2) is defined as follows:
ReLU(x) = max(0, x)
the Tanh function is defined as follows:
Tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
the LeakyReLU function is defined as follows:
LeakyReLU(x) = x, if x ≥ 0; x/a, if x < 0
where a is a fixed parameter in (1, +∞) and x is the input;
the Loss function Loss in (4) is defined as follows:
LcGAN(G, D) = E[log D(x, Iinf)] + E[log(1 - D(x, G(x, z)))]
the generation loss of Gnet is defined as follows:
Lg = -log D(G(Ivis)) - μ||Iinf - G(Ivis)||;
the adversarial loss of Dnet is defined as follows:
Ld = -log(1 - D(Ivis, G(Ivis))) - log(D(Ivis, Iinf));
where LcGAN represents the total loss of the conditional generative adversarial network, G represents the generator Gnet, D represents the discriminator Dnet, x represents the input image, and z represents the noise vector, which is supplied through the Dropout layers; μ is the weight of the second term in Lg and is set to 100; Iinf denotes the real infrared image and Ivis denotes the visible light image.
7. The infrared and visible image registration method of claim 2, wherein the step six of filtering mismatched points by distance and slope consistency comprises:
(1) for the set of two best matches of each keypoint, if the Euclidean distance of the first best match is smaller than 0.75 times the Euclidean distance of the second best match, the first best match is kept; otherwise both matches are rejected;
(2) when the number of correctly matched feature point pairs is large, the slope of the line connecting a feature point in the real infrared image with its matching feature point in the pseudo infrared image is required to be smaller than 0.1, and mismatched points are screened out accordingly.
8. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the infrared and visible image registration method according to any one of claims 1 to 7.
9. An intelligent image processing terminal, characterized in that the intelligent image processing terminal is used for realizing the infrared and visible light image registration method as claimed in any one of claims 1 to 7.
10. An infrared and visible image registration system for implementing the infrared and visible image registration method of any one of claims 1 to 7, wherein the infrared and visible image registration system comprises:
the image acquisition module is used for acquiring an infrared image and a visible light image;
the training data set construction module is used for constructing a training data set and constructing and training a conditional generative adversarial network;
the image target detection module is used for detecting the visible light image target by using a YOLOv5 network;
the pseudo infrared image generation module is used for generating a pseudo infrared image from the visible light image with the generative adversarial network;
the matching model building module is used for constructing feature descriptors with the speeded-up robust features (SURF) algorithm and building a matching model with brute-force matching;
the mismatching point filtering module is used for filtering mismatching points through distance and slope consistency;
and the matching completion module is used for estimating transformation parameters by using a random sample consensus (RANSAC) algorithm to complete registration.
CN202210029468.8A 2022-01-12 2022-01-12 Infrared and visible light image registration method, system, equipment and image processing terminal Pending CN114529593A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210029468.8A CN114529593A (en) 2022-01-12 2022-01-12 Infrared and visible light image registration method, system, equipment and image processing terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210029468.8A CN114529593A (en) 2022-01-12 2022-01-12 Infrared and visible light image registration method, system, equipment and image processing terminal

Publications (1)

Publication Number Publication Date
CN114529593A true CN114529593A (en) 2022-05-24

Family

ID=81620590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210029468.8A Pending CN114529593A (en) 2022-01-12 2022-01-12 Infrared and visible light image registration method, system, equipment and image processing terminal

Country Status (1)

Country Link
CN (1) CN114529593A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115306718A (en) * 2022-07-15 2022-11-08 嘉洋智慧安全生产科技发展(北京)有限公司 Method, apparatus, device, medium and program product for detecting screw compressor failure
CN115306718B (en) * 2022-07-15 2023-08-18 嘉洋智慧安全科技(北京)股份有限公司 Screw compressor fault detection method, apparatus, device, medium and program product
CN116363382A (en) * 2023-02-14 2023-06-30 长春理工大学 Dual-band image feature point searching and matching method
CN116363382B (en) * 2023-02-14 2024-02-23 长春理工大学 Dual-band image feature point searching and matching method
CN116168221A (en) * 2023-04-25 2023-05-26 中国人民解放军火箭军工程大学 Transformer-based cross-mode image matching and positioning method and device
CN116433730A (en) * 2023-06-15 2023-07-14 南昌航空大学 Image registration method combining deformable convolution and modal conversion
CN116433730B (en) * 2023-06-15 2023-08-29 南昌航空大学 Image registration method combining deformable convolution and modal conversion

Similar Documents

Publication Publication Date Title
CN114529593A (en) Infrared and visible light image registration method, system, equipment and image processing terminal
CN108960211B (en) Multi-target human body posture detection method and system
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
US20130208997A1 (en) Method and Apparatus for Combining Panoramic Image
CN110992263B (en) Image stitching method and system
CN109118544B (en) Synthetic aperture imaging method based on perspective transformation
CN105809640A (en) Multi-sensor fusion low-illumination video image enhancement method
CN108510451A (en) A method of the reconstruction car plate based on the double-deck convolutional neural networks
CN109313805A (en) Image processing apparatus, image processing system, image processing method and program
CN112634163A (en) Method for removing image motion blur based on improved cycle generation countermeasure network
CN112308128B (en) Image matching method based on attention mechanism neural network
CN109313806A (en) Image processing apparatus, image processing system, image processing method and program
CN113011401A (en) Face image posture estimation and correction method, system, medium and electronic equipment
CN114494347A (en) Single-camera multi-mode sight tracking method and device and electronic equipment
CN115205114A (en) High-resolution image splicing improved algorithm based on ORB (object-oriented bounding box) features
CN112614167A (en) Rock slice image alignment method combining single-polarization and orthogonal-polarization images
CN112365578A (en) Three-dimensional human body model reconstruction system and method based on double cameras
Hsu et al. Object detection using structure-preserving wavelet pyramid reflection removal network
CN113901928A (en) Target detection method based on dynamic super-resolution, and power transmission line component detection method and system
CN117392496A (en) Target detection method and system based on infrared and visible light image fusion
CN116563306A (en) Self-adaptive fire trace spectrum image segmentation method and system
Jung et al. Multispectral fusion of rgb and nir images using weighted least squares and convolution neural networks
Wang et al. A unified framework of source camera identification based on features
CN111127355A (en) Method for finely complementing defective light flow graph and application thereof
CN116310131A (en) Three-dimensional reconstruction method considering multi-view fusion strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination