CN114529593A - Infrared and visible light image registration method, system, equipment and image processing terminal - Google Patents

Infrared and visible light image registration method, system, equipment and image processing terminal

Info

Publication number
CN114529593A
CN114529593A (application number CN202210029468.8A)
Authority
CN
China
Prior art keywords
layer
image
infrared
visible light
convolution
Prior art date
Legal status
Pending
Application number
CN202210029468.8A
Other languages
Chinese (zh)
Inventor
王斌
牛兴振
郭盛林
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202210029468.8A priority Critical patent/CN114529593A/en
Publication of CN114529593A publication Critical patent/CN114529593A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Abstract

The invention belongs to the technical field of image processing, and discloses an infrared and visible light image registration method, system, equipment and image processing terminal, comprising the following steps: (1) collecting an infrared image and a visible light image; (2) constructing a training data set, and constructing and training a conditional generative adversarial network; (3) detecting targets in the visible light image with a YOLOv5 network; (4) converting the visible light image into a pseudo infrared image with the generative adversarial network; (5) constructing feature descriptors with the speeded-up robust features (SURF) algorithm, and building a matching model with brute-force matching; (6) filtering mismatched points through distance and slope consistency; (7) estimating transformation parameters with the random sample consensus (RANSAC) algorithm to complete registration. The invention achieves relatively accurate registration of infrared and visible light images with large target size differences and complex backgrounds, and provides a practical solution for infrared and visible light image registration.

Description

Infrared and visible light image registration method, system, equipment and image processing terminal
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an infrared and visible light image registration method, system, equipment and an image processing terminal. The method can be used for the infrared and visible light image registration tasks with large target size difference and complex background.
Background
At present, multi-modal images from different imaging devices can provide richer and more comprehensive information than single-modal images, and the demand for integrating and exploiting multiple kinds of image information keeps growing. Image registration is the process of determining transformation parameters between images according to some similarity measure, so that two or more images of the same scene acquired from different sensors, different viewing angles or different times are transformed into the same coordinate system and best matched at the pixel level. Image registration provides the operating reference for subsequent image processing tasks such as image stitching and image fusion, and is an active research topic in the field of computer vision.
Infrared and visible light image registration is an important multi-sensor registration problem and plays an important role in computer vision, robotics, power equipment fault detection, remote sensing, military applications and other fields. However, registration between infrared and visible light images is difficult because of the large differences in resolution and appearance: the resolution of the infrared image is mostly no larger than 500 x 960, far lower than the 2160 x 3840 resolution of the visible light image, so the infrared image loses gray-scale detail, appears blurred, and differs greatly from the clear texture of the visible light gray-scale image. In addition, the two modalities have different imaging mechanisms: in the visible light band, image contrast is determined by reflectivity and shadow, whereas in the infrared band it is determined by emissivity and temperature. Because of temperature differences in particular, the contrast can vary over a wide range, which leads to large differences in appearance between the heterogeneous images.
A new registration method is therefore desired for the problems that the target information of infrared and visible light images differs greatly and that using SURF feature descriptors alone gives low accuracy and poor robustness; such a method should be applicable to infrared and visible light images with large target size differences and complex backgrounds.
Through the above analysis, the problems and defects of the prior art are as follows: the target information of infrared and visible light images differs greatly, and accuracy and robustness are poor when only SURF feature descriptors are used.
The difficulty in solving the above problems and defects is: infrared and visible light images with large resolution and color differences and complex backgrounds are hard to register, with low registration accuracy and poor robustness, and no satisfactory solution exists at present.
The significance of solving the above problems and defects is: the invention provides a practical solution that improves the registration accuracy and robustness of infrared and visible light images and effectively addresses the registration of infrared and visible light images whose target information differs greatly and whose backgrounds are complex.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an infrared and visible light image registration method, a system, equipment and an image processing terminal.
The invention is realized as follows: the infrared and visible light image registration method detects the approximate target region in the visible light image with a YOLOv5 network, converts the visible light image into a pseudo infrared image with a conditional generative adversarial network, constructs feature descriptions of the infrared image pair based on the SURF descriptor, builds a matching model with brute-force matching, filters the matching points with distance constraints and slope consistency, and finally estimates the parameters of the homography transformation with the random sample consensus (RANSAC) algorithm and applies them to the real infrared image to complete registration.
Further, the infrared and visible light image registration method comprises the following steps:
collecting an infrared image and a visible light image;
step two, constructing a training data set, and constructing and training a conditional generative adversarial network;
step three, detecting targets in the visible light image with a YOLOv5 network: targets are detected in the visible light image with a YOLOv5 network and pedestrians or bicycles are screened out according to their labels; a pedestrian target frame is expanded above and below by 1/12 of its original height, and its width is set to 0.8 times the height; a bicycle target frame is expanded left and right by 1/30 of its original width, and its height is set to 1.25 times the width; for the other categories, the original image is cropped around the detection frame with a height equal to half of the original image height and a width equal to 0.8 times that height, giving the approximate target region; detecting the approximate target region of the visible light image effectively relieves the problem that the resolution difference between the infrared and visible light images is too large;
step four, generating a pseudo infrared image with the conditional generative adversarial network: the cropped visible light image is scaled to 512 pixels wide and 640 pixels high and fed into the conditional generative adversarial network to obtain the corresponding pseudo infrared image, and the watermark added at shooting time is removed from the original real infrared image, which is likewise scaled to 512 pixels wide and 640 pixels high; generating a pseudo infrared image from the visible light image with the generative adversarial network reduces the color difference between the images and lowers the difficulty of registering the heterogeneous images;
step five, constructing descriptors with the SURF algorithm and building a matching model with brute-force matching: the keypoint positions of the real infrared image and the pseudo infrared image are extracted with the speeded-up robust features (SURF) algorithm and descriptors are constructed, a matching model between the feature points is built with brute-force matching based on Euclidean distance, and the two best matches are kept for each keypoint;
step six, filtering mismatched points according to distance and slope consistency; obviously wrong matches are eliminated by the distance and slope consistency constraints, which improves the accuracy of the subsequent transformation parameter estimation.
Step seven, estimating the transformation parameters with the random sample consensus (RANSAC) algorithm to complete registration: the matching points retained in the real infrared image and the pseudo infrared image are used to estimate the parameters of the homography transformation matrix H with the random sample consensus algorithm, and the real infrared image is transformed to complete registration.
Further, the step one of collecting the infrared image and the visible light image specifically includes: a ResNet50 network is constructed as the feature extraction network of a twin convolutional neural network, whose structure is, in order: a first convolution layer, a first BN layer, an activation function layer, a maximum pooling layer, a second convolution layer, a second BN layer, a third convolution layer, a third BN layer, a fourth convolution layer, a fourth BN layer, a fifth convolution layer and a fifth BN layer; the numbers of convolution kernels of the first to fifth convolution layers are 64, 64, 128, 256 and 512, their kernel sizes are 7, 3, 3, 3 and 3, the strides of the first, second and third convolution layers are 2, the strides of the fourth and fifth convolution layers are 1, and the dilation rates of the convolution kernels in the fourth and fifth convolution layers are 2 and 4; the pooling kernel size of the maximum pooling layer is 3 x 3 with stride 2; the first to fifth BN layers use batch normalization, the activation function layer uses the linear rectification (ReLU) function, and the maximum pooling layer uses regional maximum pooling.
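As a reference only, the layer sequence described above can be sketched in PyTorch as follows. This is a minimal sketch, not the invention's exact implementation: the padding values are assumptions chosen to keep feature maps aligned, and each listed "convolution layer" is shown as a single Conv2d rather than a full ResNet50 stage.

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Sketch of the feature-extraction branch described above (simplified)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),    # first convolution layer
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),         # maximum pooling layer
            nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1),    # second convolution layer
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),   # third convolution layer
            nn.BatchNorm2d(128),
            nn.Conv2d(128, 256, kernel_size=3, stride=1,
                      padding=2, dilation=2),                         # fourth convolution layer, dilation 2
            nn.BatchNorm2d(256),
            nn.Conv2d(256, 512, kernel_size=3, stride=1,
                      padding=4, dilation=4),                         # fifth convolution layer, dilation 4
            nn.BatchNorm2d(512),
        )

    def forward(self, x):
        return self.features(x)
```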
Further, the step two of constructing a training data set and constructing and training a conditional generative adversarial network includes:
(1) constructing a training data set: because the sizes of the visible light image and the infrared image differ greatly, the visible light image is reduced by a factor of 3 to 1280 x 720 pixels; four pairs of matching pixel points are selected manually in each infrared and visible light image pair, the homography transformation matrix H is determined from these matching points, the infrared image is transformed to the same size as the visible light image, and a region containing the target is selected manually and cropped into infrared and visible light image pairs whose sizes are integer multiples of 4;
(2) constructing a generator Gnet based on a Resnet network for the conditional generative adversarial network, whose structure is, in order, a down-sampling convolution module, 9 ResnetBlock modules, an up-sampling deconvolution module and a Tanh component; the down-sampling convolution module is, in order: a first padding layer, a first convolution layer, a BN layer, an activation function layer, a second convolution layer, a BN layer, an activation function layer, a third convolution layer, a BN layer and an activation function layer; the first padding layer parameter is 3, the numbers of convolution kernels of the first to third convolution layers are 64, 128 and 256, their kernel sizes are 7, 3 and 3, their strides are 1, 2 and 2, and their padding is 0, 1 and 1; the ResnetBlock module is, in order: a padding layer, a convolution layer, a BN layer, an activation function layer, a DropOut layer, a padding layer, a convolution layer and a BN layer; the padding layer parameter is 1, the number of convolution kernels of each convolution layer is 256, all kernel sizes are 3, all strides are 1 and all padding is 1; the up-sampling deconvolution module is, in order: a first deconvolution layer, a BN layer, an activation function layer, a second deconvolution layer, a BN layer, an activation function layer, a third padding layer, a third convolution layer and an activation function layer; the numbers of convolution kernels of the first and second deconvolution layers are 128 and 64, their kernel sizes are both 3, their strides are both 2 and their padding is both 1; the third padding layer parameter is 3, the third convolution layer has 3 convolution kernels of size 7, stride 1 and padding 0; all BN layers use batch normalization, and all activation function layers use the linear rectification function except the last one, which uses the Tanh function;
(3) constructing a PatchGAN-style discriminator Dnet as the discriminator network of the conditional generative adversarial network, whose structure is, in order: a first convolution layer, an activation function layer, a second convolution layer, a BN layer, an activation function layer, a third convolution layer, a BN layer, an activation function layer, a fourth convolution layer, a BN layer, an activation function layer and a fifth convolution layer; the numbers of convolution kernels of the first to fifth convolution layers are 64, 128, 256, 512 and 1, all kernel sizes are 4, the strides are 2, 2, 2, 1 and 1, all padding is 1, the BN layers use batch normalization, and the activation function layers use the LeakyReLU function;
(4) training the conditional generative adversarial network: the training data set is fed into the conditional generative adversarial network, and the network weights are updated with the Adam algorithm until the adversarial loss function Loss converges.
Further, the homography transformation matrix H in (1) is defined as follows:
H = s·M·[r1, r2, t]
where s is a scale factor, M is the camera intrinsic matrix, and r1, r2 and t are camera extrinsic parameters.
Further, the linear rectification function in (2) is defined as follows:
ReLU(x) = max(0, x)
the Tanh function is defined as follows:
Tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
the LeakyReLU function is defined as follows:
LeakyReLU(x) = x, if x ≥ 0; x/a, if x < 0
where a is a fixed parameter in (1, +∞) and x is the input;
the Loss function Loss in (4) is defined as follows:
LcGAN(G, D) = E[log D(x, Iinf)] + E[log(1 - D(x, G(x, z)))]
the generation loss of Gnet is defined as follows:
Lg = -log D(G(Ivis)) - μ||Iinf - G(Ivis)||;
the antagonistic loss of Dnet is defined as follows:
Ld = -log(1 - D(Ivis, G(Ivis))) - log(D(Ivis, Iinf));
where LcGAN represents the total loss of the conditional generative adversarial network, G represents the generator Gnet, D represents the discriminator Dnet, x represents the input image, and z represents the noise vector, which is supplied through the Dropout layers; μ is the weight of the second term in Lg and is set to 100; Iinf denotes the real infrared image and Ivis denotes the visible light image.
Further, the step six of filtering mismatched points by distance and slope consistency includes:
(1) for the set of two best matches of each keypoint, if the Euclidean distance of the first best match is smaller than 0.75 times the Euclidean distance of the second best match, the first best match is kept; otherwise both matches are rejected;
(2) when the number of correctly matched feature point pairs is large, the slope of the line connecting a feature point in the real infrared image with its matching feature point in the pseudo infrared image is required to be smaller than 0.1, and mismatched points are screened out accordingly;
it is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the infrared and visible image registration method.
Another object of the present invention is to provide an intelligent image processing terminal, which is used for implementing the infrared and visible light image registration method.
It is another object of the present invention to provide an infrared and visible image registration system implementing the infrared and visible image registration method, the infrared and visible image registration system comprising:
the image acquisition module is used for acquiring an infrared image and a visible light image;
the training data set construction module is used for constructing a training data set and constructing and training a conditional generative adversarial network;
the image target detection module is used for detecting the visible light image target by using a YOLOv5 network;
the pseudo infrared image generation module is used for generating a pseudo infrared image from the visible light image with the generative adversarial network;
the matching model building module is used for constructing feature descriptors with the speeded-up robust features (SURF) algorithm and building a matching model with brute-force matching;
the mismatched point filtering module is used for filtering mismatched points through distance and slope consistency;
and the matching completion module is used for estimating transformation parameters by using a random sample consensus (RANSAC) algorithm to complete registration.
By combining all the above technical schemes, the invention has the following advantages and positive effects: detecting the approximate target region in the visible light image with a YOLOv5 network effectively relieves the problem that the image resolution difference is too large; converting the visible light image into a pseudo infrared image with the conditional generative adversarial network greatly reduces the color difference between the heterogeneous images and lowers the registration difficulty; feature descriptions of the infrared image pair are constructed based on the SURF descriptor and a matching model is built with brute-force matching; the matching points are then filtered with distance constraints and slope consistency, which improves the registration accuracy; finally, the parameters of the homography transformation are estimated with the random sample consensus (RANSAC) algorithm and the real infrared image is registered. The method improves the registration accuracy of infrared and visible light images with large target size differences and complex backgrounds.
The registration system detects the approximate target region of the visible light image with a YOLOv5 network, which solves the problem of the large resolution difference between the visible light image and the infrared image, and removes mismatches with distance and slope consistency constraints, so it can be applied to infrared and visible light image registration scenes with large target size differences and complex backgrounds, with high accuracy and good robustness.
The invention converts the visible light image into a pseudo infrared image with the conditional generative adversarial network, which solves the problem of the large color difference between the heterogeneous images and overcomes the low accuracy and poor anti-interference ability of registration that uses SURF features directly; it therefore generalizes well and can cope with infrared and visible light image registration scenes with heavy interference and complex backgrounds.
Drawings
Fig. 1 is a flowchart of an infrared and visible light image registration method according to an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of an infrared and visible image registration system provided by an embodiment of the present invention;
in fig. 2: 1. an image acquisition module; 2. a training data set construction module; 3. an image target detection module; 4. a pseudo-infrared image generation module; 5. a matching model construction module; 6. a mismatching point filtering module; 7. and a matching completion module.
Fig. 3 is a flowchart of an implementation of the infrared and visible light image registration method according to the embodiment of the present invention.
Fig. 4 is a schematic diagram of the structure of the Gnet network of the generator constructed by the invention.
Fig. 5 is a schematic structural diagram of a ResnetBlock module constructed by the present invention.
FIG. 6 is a schematic diagram of a network structure of a discriminator Dnet constructed by the invention.
Fig. 7 is a graph of the registration results of the invention for the bicycle-category infrared and visible light images.
Fig. 8 is a graph of the registration results of the invention for the pedestrian-category infrared and visible light images.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to solve the problems in the prior art, the present invention provides a method, a system, a device, and an image processing terminal for registering infrared and visible light images, and the present invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the infrared and visible light image registration method provided by the present invention comprises the following steps:
s101: collecting an infrared image and a visible light image;
s102: constructing a training data set, and constructing and training a conditional generative adversarial network;
s103: detecting a visible light image target by using a YOLOv5 network;
s104: generating a pseudo infrared image from the visible light image with the generative adversarial network;
s105: constructing feature descriptors with the speeded-up robust features (SURF) algorithm, and building a matching model with brute-force matching;
s106: filtering mismatching points through distance and slope consistency;
s107: and estimating transformation parameters by using a random sample consensus (RANSAC) algorithm to complete registration.
Those skilled in the art may also implement the infrared and visible light image registration method provided by the invention with other steps, for example replacing the SURF features with SIFT, ORB or other features and replacing the RANSAC algorithm with a least-squares method; the method of fig. 1 is only a specific example.
As shown in fig. 2, the present invention provides an infrared and visible image registration system comprising:
the image acquisition module 1 is used for acquiring an infrared image and a visible light image;
the training data set construction module 2 is used for constructing a training data set and constructing and training a conditional generative adversarial network;
the image target detection module 3 is used for detecting the visible light image target by using a YOLOv5 network;
the pseudo infrared image generation module 4 is used for generating a pseudo infrared image from the visible light image with the generative adversarial network;
the matching model construction module 5 is used for constructing feature descriptors with the speeded-up robust features (SURF) algorithm and building a matching model with brute-force matching;
the mismatching point filtering module 6 is used for filtering mismatching points through distance and slope consistency;
and a matching completion module 7 for estimating transformation parameters by using a random sample consensus (RANSAC) algorithm to complete the registration.
The technical solution of the present invention is further described below with reference to the accompanying drawings.
As shown in fig. 3, the infrared and visible light image registration method of the present invention comprises the following steps:
Step one, acquiring an infrared image and a visible light image: a common mobile phone equipped with a near-infrared camera is used to shoot visible light images and infrared images of pedestrians or bicycles in the same scene at the same time; the resolution of the visible light images is 2160 pixels wide and 3840 pixels high, the resolution of the infrared images is 507 pixels wide and 960 pixels high, and 100 pairs are shot for each scene, 200 pairs in total.
Step two, constructing a training data set and constructing and training a conditional generative adversarial network:
(2a) constructing a training data set: because the sizes of the visible light image and the infrared image differ greatly, the visible light image is reduced by a factor of 3 to 1280 x 720 pixels. Four pairs of matching pixel points are selected manually in each infrared and visible light image pair, the homography transformation matrix H is determined from these matching points, the infrared image is transformed to the same size as the visible light image, and a region containing the target is selected manually and cropped into infrared and visible light image pairs whose sizes are integer multiples of 4.
The homography transformation matrix H is defined as follows:
H = s·M·[r1, r2, t]
where s is a scale factor, M is the camera intrinsic matrix, and r1, r2 and t are camera extrinsic parameters.
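As a reference only, the determination of H from the four manually selected point pairs and the warping of the infrared image described in (2a) can be sketched with OpenCV as follows. The coordinates and file names are placeholders, not values from the invention.

```python
import cv2
import numpy as np

# Four manually selected matching pixel pairs (placeholder coordinates).
pts_ir  = np.float32([[35, 60], [470, 58], [480, 900], [30, 905]])
pts_vis = np.float32([[88, 150], [690, 145], [700, 1250], [80, 1260]])

# Homography H mapping infrared coordinates to visible-light coordinates.
H = cv2.getPerspectiveTransform(pts_ir, pts_vis)

# Warp the infrared image so that its size matches the down-scaled visible image.
ir = cv2.imread("ir_0001.png")
vis = cv2.imread("vis_0001.png")          # already reduced to 1280 x 720
ir_aligned = cv2.warpPerspective(ir, H, (vis.shape[1], vis.shape[0]))
```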
(2b) The generator Gnet of the conditional generative adversarial network constructed by the invention is further described with reference to fig. 4.
Fig. 4 is a schematic structural diagram of the generator Gnet, whose structure is, in order, a down-sampling convolution module, 9 ResnetBlock modules, an up-sampling deconvolution module and a Tanh component; the down-sampling convolution module is, in order: a first padding layer, a first convolution layer, a BN layer, an activation function layer, a second convolution layer, a BN layer, an activation function layer, a third convolution layer, a BN layer and an activation function layer; the first padding layer parameter is 3, the numbers of convolution kernels of the first to third convolution layers are 64, 128 and 256, their kernel sizes are 7, 3 and 3, their strides are 1, 2 and 2, and their padding is 0, 1 and 1; the up-sampling deconvolution module is, in order: a first deconvolution layer, a BN layer, an activation function layer, a second deconvolution layer, a BN layer, an activation function layer, a third padding layer, a third convolution layer and an activation function layer; the numbers of convolution kernels of the first and second deconvolution layers are 128 and 64, their kernel sizes are both 3, their strides are both 2 and their padding is both 1; the third padding layer parameter is 3, the third convolution layer has 3 convolution kernels of size 7, stride 1 and padding 0; all BN layers use batch normalization, and all activation function layers use the linear rectification function except the last one, which uses the Tanh function.
The linear rectification function is defined as follows:
ReLU(x) = max(0, x)
the Tanh function is defined as follows:
Tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
the ResnetBlock module in the Gnet constructed in accordance with the present invention is further described, as shown in FIG. 5.
Fig. 5 is a schematic structural diagram of a ResnetBlock module, and the structure sequentially includes: fill convolutional layer, BN layer, activation function layer, DropOut layer, fill convolutional layer, BN layer; the parameters of the filling convolutional layer are 1, the number of convolutional cores of the convolutional layer is set to be 256, the sizes of the convolutional layers are all 3, the step lengths are all 1, and the padding is all 1.
(2c) The discriminator Dnet of the conditional generative adversarial network constructed by the invention is further described with reference to fig. 6.
Fig. 6 is a schematic structural diagram of the discriminator Dnet, whose structure is, in order: a first convolution layer, an activation function layer, a second convolution layer, a BN layer, an activation function layer, a third convolution layer, a BN layer, an activation function layer, a fourth convolution layer, a BN layer, an activation function layer and a fifth convolution layer; the numbers of convolution kernels of the first to fifth convolution layers are 64, 128, 256, 512 and 1, all kernel sizes are 4, the strides are 2, 2, 2, 1 and 1, all padding is 1, the BN layers use batch normalization, and the activation function layers use the LeakyReLU function.
(2d) Training the conditional generative adversarial network: the training data set is fed into the conditional generative adversarial network, and the network weights are updated with the Adam algorithm until the adversarial loss function Loss converges.
The Loss function Loss is defined as follows:
LcGAN(G, D) = E[log D(x, Iinf)] + E[log(1 - D(x, G(x, z)))]
the generation loss of Gnet is defined as follows:
Lg = -log D(G(Ivis)) - μ||Iinf - G(Ivis)||
the adversarial loss of Dnet is defined as follows:
Ld = -log(1 - D(Ivis, G(Ivis))) - log(D(Ivis, Iinf))
where LcGAN represents the total loss of the conditional generative adversarial network, G represents the generator Gnet, D represents the discriminator Dnet, x represents the input image, and z represents the noise vector, which is supplied through the Dropout layers; μ is the weight of the second term in Lg and is set to 100; Iinf denotes the real infrared image and Ivis denotes the visible light image.
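As a reference only, one training step with the Adam algorithm and the losses defined above can be sketched as follows. This is a minimal sketch: applying a sigmoid to the discriminator's patch scores before the logarithm, conditioning D on the concatenated (Ivis, infrared) pair, and taking the weighted reconstruction term with a positive sign (as in the usual conditional GAN formulation) are assumptions.

```python
import torch
import torch.nn.functional as F

def train_step(gnet, dnet, opt_g, opt_d, vis, ir, mu=100.0, eps=1e-7):
    """One update of Dnet and Gnet; vis and ir are batches of visible and real infrared images."""
    fake_ir = gnet(vis)

    # Discriminator update: Ld = -log(1 - D(vis, G(vis))) - log(D(vis, ir))
    d_fake = torch.sigmoid(dnet(torch.cat([vis, fake_ir.detach()], dim=1)))
    d_real = torch.sigmoid(dnet(torch.cat([vis, ir], dim=1)))
    loss_d = -(torch.log(1 - d_fake + eps) + torch.log(d_real + eps)).mean()
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator update: Lg = -log D(vis, G(vis)) + mu * ||ir - G(vis)||_1
    d_fake = torch.sigmoid(dnet(torch.cat([vis, fake_ir], dim=1)))
    loss_g = -torch.log(d_fake + eps).mean() + mu * F.l1_loss(fake_ir, ir)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()
```

Typical usage would create opt_g and opt_d as torch.optim.Adam optimizers over the generator and discriminator parameters and call train_step for each batch until Loss converges.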
Step three, detecting targets in the visible light image with a YOLOv5 network:
targets are detected in the visible light image with a YOLOv5 network and pedestrians or bicycles are screened out according to their labels; a pedestrian target frame is expanded above and below by 1/12 of its original height, and its width is set to 0.8 times the height; a bicycle target frame is expanded left and right by 1/30 of its original width, and its height is set to 1.25 times the width; for the other categories, the original image is cropped around the detection frame with a height equal to half of the original image height and a width equal to 0.8 times that height, giving the approximate target region.
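As a reference only, the adjustment of a detection frame into the approximate target region can be sketched as follows. Only the ratios are given above; centring the adjusted frame on the original frame is an assumption.

```python
def adjust_box(x1, y1, x2, y2, label, img_h, img_w):
    """Expand a YOLOv5 detection box into the approximate target region (hedged sketch)."""
    w, h = x2 - x1, y2 - y1
    if label == "person":
        y1, y2 = y1 - h / 12, y2 + h / 12          # add 1/12 of the height above and below
        h = y2 - y1
        cx = (x1 + x2) / 2
        x1, x2 = cx - 0.4 * h, cx + 0.4 * h        # width = 0.8 * height, centred
    elif label == "bicycle":
        x1, x2 = x1 - w / 30, x2 + w / 30          # add 1/30 of the width on each side
        w = x2 - x1
        cy = (y1 + y2) / 2
        y1, y2 = cy - 0.625 * w, cy + 0.625 * w    # height = 1.25 * width, centred
    else:
        # other categories: height = half of the image height, width = 0.8 * height
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        h = img_h / 2
        w = 0.8 * h
        x1, x2, y1, y2 = cx - w / 2, cx + w / 2, cy - h / 2, cy + h / 2
    # clamp to the image and return integer pixel coordinates
    x1, y1 = max(0, int(x1)), max(0, int(y1))
    x2, y2 = min(img_w, int(x2)), min(img_h, int(y2))
    return x1, y1, x2, y2
```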
Step four, generating a pseudo infrared image with the conditional generative adversarial network:
the cropped visible light image is scaled to 512 pixels wide and 640 pixels high and fed into the conditional generative adversarial network to obtain the corresponding pseudo infrared image, and the watermark added at shooting time is removed from the original real infrared image, which is likewise scaled to 512 pixels wide and 640 pixels high.
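As a reference only, the scaling and modal conversion of a cropped visible light patch can be sketched as follows. Normalizing the input to [-1, 1] to match the Tanh output of the generator is an assumption.

```python
import cv2
import numpy as np
import torch

def to_pseudo_infrared(gnet, vis_crop_bgr):
    """Scale a cropped visible-light patch to 512 x 640 (w x h) and run the trained generator."""
    img = cv2.resize(vis_crop_bgr, (512, 640))                # width 512, height 640
    x = torch.from_numpy(img[:, :, ::-1].copy()).float()      # BGR -> RGB
    x = (x / 127.5 - 1.0).permute(2, 0, 1).unsqueeze(0)       # assumed [-1, 1] Tanh range
    with torch.no_grad():
        y = gnet(x)[0]
    y = ((y.permute(1, 2, 0).numpy() + 1.0) * 127.5).clip(0, 255).astype(np.uint8)
    return cv2.cvtColor(y, cv2.COLOR_RGB2BGR)
```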
Step five, constructing descriptors with the SURF algorithm and building a matching model with brute-force matching:
the keypoint positions of the real infrared image and the pseudo infrared image are extracted with the speeded-up robust features (SURF) algorithm and descriptors are constructed, a matching model between the feature points is built with brute-force matching based on Euclidean distance, and the two best matches are kept for each keypoint.
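As a reference only, the SURF keypoint extraction and brute-force matching can be sketched with OpenCV as follows. SURF requires the opencv-contrib package, and the Hessian threshold of 400 is an assumption.

```python
import cv2

# real_ir_gray and pseudo_ir_gray are the grayscale real and pseudo infrared images.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp_ir,  des_ir  = surf.detectAndCompute(real_ir_gray, None)
kp_pir, des_pir = surf.detectAndCompute(pseudo_ir_gray, None)

# Brute-force matching on Euclidean (L2) distance, keeping the two best matches per keypoint.
bf = cv2.BFMatcher(cv2.NORM_L2)
knn_matches = bf.knnMatch(des_ir, des_pir, k=2)
```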
Step six, filtering mismatched points according to distance and slope consistency:
(6a) for the set of two best matches of each keypoint, if the Euclidean distance of the first best match is smaller than 0.75 times the Euclidean distance of the second best match, the first best match is kept; otherwise both matches are rejected;
(6b) when the number of correctly matched feature point pairs is large, the slope of the line connecting a feature point in the real infrared image with its matching feature point in the pseudo infrared image is required to be smaller than 0.1, and mismatched points are screened out accordingly.
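As a reference only, the ratio test of (6a) and the slope screening of (6b) can be sketched as follows. The slope is computed as if the two equally sized images were placed side by side, which is an assumed interpretation of the constraint.

```python
# (6a) ratio test: keep the first match only if it is clearly better than the second.
good = []
for pair in knn_matches:
    if len(pair) < 2:
        continue
    m, n = pair
    if m.distance < 0.75 * n.distance:
        good.append(m)

# (6b) slope-consistency screening.
filtered = []
offset = real_ir_gray.shape[1]              # pseudo image placed to the right of the real image
for m in good:
    x1, y1 = kp_ir[m.queryIdx].pt           # keypoint in the real infrared image
    x2, y2 = kp_pir[m.trainIdx].pt          # keypoint in the pseudo infrared image
    slope = abs(y2 - y1) / (abs((x2 + offset) - x1) + 1e-6)
    if slope < 0.1:                          # correct match lines are nearly horizontal
        filtered.append(m)
```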
Step seven, estimating the transformation parameters with the random sample consensus (RANSAC) algorithm to complete registration:
finally, the matching points retained in the real infrared image and the pseudo infrared image are used to estimate the parameters of the homography transformation matrix H with the random sample consensus algorithm, and the real infrared image is transformed to complete registration.
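As a reference only, the RANSAC estimation of H and the transformation of the real infrared image can be sketched with OpenCV as follows. The reprojection threshold of 5.0 pixels is an assumption.

```python
import cv2
import numpy as np

src = np.float32([kp_ir[m.queryIdx].pt  for m in filtered]).reshape(-1, 1, 2)
dst = np.float32([kp_pir[m.trainIdx].pt for m in filtered]).reshape(-1, 1, 2)

# Estimate the homography with RANSAC and warp the real infrared image onto the pseudo one.
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=5.0)
h, w = pseudo_ir_gray.shape[:2]
registered_ir = cv2.warpPerspective(real_ir_gray, H, (w, h))
```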
The technical effects of the present invention will be described in detail with reference to simulation experiments.
1. Simulation conditions are as follows:
the hardware platform of the simulation experiment of the invention is as follows: the processor is Intel (R) core (TM) i7-10700KCPU, the main frequency is 2.9GHz, the memory is 64GB, and the display card is NVIDIAGeForceRTX 3060.
The software platform of the simulation experiment of the invention is as follows: ubuntu20.04 operating system, Pycharm2021 software, python3.7, and Pytorch deep learning framework.
2. Simulation content and result analysis:
when a training set and a test set are generated in a simulation experiment, a pedestrian and bicycle data set shot in a campus by a user is used, and each category comprises 100 images, and the total number of the images is 400. Each 50 pairs of categories were used as training data sets and the other 50 pairs were used as test data sets in the simulation experiments of the present invention.
The prior art adopted in the simulation experiment is:
the speeded-up robust features method, abbreviated SURF, proposed in Herbert Bay, Tinne Tuytelaars, et al., "SURF: Speeded Up Robust Features," Proceedings of the 9th European Conference on Computer Vision, Part I, Springer-Verlag, 2006.
In order to qualitatively evaluate the simulation effect, the invention is used to test the bicycle and pedestrian data sets separately, and the registration results are shown in fig. 7 and fig. 8, respectively. In fig. 7, (a) shows the visible light image and the real infrared image detected and cropped by YOLOv5, (b) shows the result after registration with the SURF algorithm and the result after registration with the invention, and (c) shows the corresponding results after registration and fusion; fig. 8 is organized in the same way.
As can be seen from fig. 7 and 8: the subjective evaluation effect of the invention for the infrared and visible light image registration is better than the registration effect based on the SURF algorithm.
In order to quantitatively evaluate the simulation effect, the mean absolute error (MAE), peak signal-to-noise ratio (PSNR), normalized mutual information (NMI) and structural similarity (SSIM) are adopted as performance evaluation indexes and compared with the prior art; the comparison results are shown in Table 1:
table 1 comparison table of evaluation indexes of the present invention and the prior art in simulation experiment
Method/index mMAE mPSNR mNMI mSSIM
Prior Art 56.7484 9.5436 0.1590 0.4692
The invention 34.0360 13.7719 0.2393 0.6004
As can be seen from Table 1, the invention outperforms the prior art on all evaluation indexes on the bicycle and pedestrian test data sets, which shows that the invention achieves higher accuracy and stronger robustness in infrared and visible light image registration tasks with large target size differences and complex backgrounds, and is a practical registration method and system.
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An infrared and visible light image registration method, characterized in that a YOLOv5 network is used to detect the approximate target region in a visible light image, a conditional generative adversarial network is used to convert the visible light image into a pseudo infrared image, feature descriptions of the infrared image pair are constructed based on the SURF descriptor, a matching model is built with brute-force matching, the matching points are then filtered with distance constraints and slope consistency, and finally the parameters of the homography transformation are estimated with the random sample consensus (RANSAC) algorithm and the real infrared image is registered.
2. The infrared and visible image registration method of claim 1, wherein the infrared and visible image registration method comprises the steps of:
collecting an infrared image and a visible light image;
step two, constructing a training data set, and constructing and training a conditional generative adversarial network;
step three, detecting targets in the visible light image with a YOLOv5 network: targets are detected in the visible light image with a YOLOv5 network and pedestrians or bicycles are screened out according to their labels; a pedestrian target frame is expanded above and below by 1/12 of its original height, and its width is set to 0.8 times the height; a bicycle target frame is expanded left and right by 1/30 of its original width, and its height is set to 1.25 times the width; for the other categories, the original image is cropped around the detection frame with a height equal to half of the original image height and a width equal to 0.8 times that height, giving the approximate target region;
step four, generating a pseudo infrared image with the conditional generative adversarial network: the cropped visible light image is scaled to 512 pixels wide and 640 pixels high and fed into the conditional generative adversarial network to obtain the corresponding pseudo infrared image, and the watermark added at shooting time is removed from the original real infrared image, which is likewise scaled to 512 pixels wide and 640 pixels high;
step five, constructing descriptors with the SURF algorithm and building a matching model with brute-force matching: the keypoint positions of the real infrared image and the pseudo infrared image are extracted with the speeded-up robust features (SURF) algorithm and descriptors are constructed, a matching model between the feature points is built with brute-force matching based on Euclidean distance, and the two best matches are kept for each keypoint;
step six, filtering mismatching points according to the consistency of the distance and the slope;
step seven, estimating transformation parameters by using a random sample consensus (RANSAC) algorithm to complete registration: and finally, estimating parameters of a homography transformation matrix H according to a random sampling consistency algorithm by using matching points which are reserved in the real infrared image and the pseudo infrared image, and transforming the real infrared image to complete registration.
3. The infrared and visible image registration method of claim 2, wherein the step one of collecting an infrared image and a visible light image specifically comprises: a ResNet50 network is constructed as the feature extraction network of a twin convolutional neural network, whose structure is, in order: a first convolution layer, a first BN layer, an activation function layer, a maximum pooling layer, a second convolution layer, a second BN layer, a third convolution layer, a third BN layer, a fourth convolution layer, a fourth BN layer, a fifth convolution layer and a fifth BN layer; the numbers of convolution kernels of the first to fifth convolution layers are 64, 64, 128, 256 and 512, their kernel sizes are 7, 3, 3, 3 and 3, the strides of the first, second and third convolution layers are 2, the strides of the fourth and fifth convolution layers are 1, and the dilation rates of the convolution kernels in the fourth and fifth convolution layers are 2 and 4; the pooling kernel size of the maximum pooling layer is 3 x 3 with stride 2; the first to fifth BN layers use batch normalization, the activation function layer uses the linear rectification (ReLU) function, and the maximum pooling layer uses regional maximum pooling.
4. The infrared and visible image registration method of claim 2, wherein the step two of constructing a training data set and constructing and training a conditional generative adversarial network comprises:
(1) constructing a training data set: because the sizes of the visible light image and the infrared image differ greatly, the visible light image is reduced by a factor of 3 to 1280 x 720 pixels; four pairs of matching pixel points are selected manually in each infrared and visible light image pair, the homography transformation matrix H is determined from these matching points, the infrared image is transformed to the same size as the visible light image, and a region containing the target is selected manually and cropped into infrared and visible light image pairs whose sizes are integer multiples of 4;
(2) constructing a generator Gnet based on a Resnet network for the conditional generative adversarial network, whose structure is, in order, a down-sampling convolution module, 9 ResnetBlock modules, an up-sampling deconvolution module and a Tanh component; the down-sampling convolution module is, in order: a first padding layer, a first convolution layer, a BN layer, an activation function layer, a second convolution layer, a BN layer, an activation function layer, a third convolution layer, a BN layer and an activation function layer; the first padding layer parameter is 3, the numbers of convolution kernels of the first to third convolution layers are 64, 128 and 256, their kernel sizes are 7, 3 and 3, their strides are 1, 2 and 2, and their padding is 0, 1 and 1; the ResnetBlock module is, in order: a padding layer, a convolution layer, a BN layer, an activation function layer, a DropOut layer, a padding layer, a convolution layer and a BN layer; the padding layer parameter is 1, the number of convolution kernels of each convolution layer is 256, all kernel sizes are 3, all strides are 1 and all padding is 1; the up-sampling deconvolution module is, in order: a first deconvolution layer, a BN layer, an activation function layer, a second deconvolution layer, a BN layer, an activation function layer, a third padding layer, a third convolution layer and an activation function layer; the numbers of convolution kernels of the first and second deconvolution layers are 128 and 64, their kernel sizes are both 3, their strides are both 2 and their padding is both 1; the third padding layer parameter is 3, the third convolution layer has 3 convolution kernels of size 7, stride 1 and padding 0; all BN layers use batch normalization, and all activation function layers use the linear rectification function except the last one, which uses the Tanh function;
(3) constructing a PatchGAN-style discriminator Dnet as the discriminator network of the conditional generative adversarial network, whose structure is, in order: a first convolution layer, an activation function layer, a second convolution layer, a BN layer, an activation function layer, a third convolution layer, a BN layer, an activation function layer, a fourth convolution layer, a BN layer, an activation function layer and a fifth convolution layer; the numbers of convolution kernels of the first to fifth convolution layers are 64, 128, 256, 512 and 1, all kernel sizes are 4, the strides are 2, 2, 2, 1 and 1, all padding is 1, the BN layers use batch normalization, and the activation function layers use the LeakyReLU function;
(4) training the conditional generative adversarial network: the training data set is fed into the conditional generative adversarial network, and the network weights are updated with the Adam algorithm until the adversarial loss function Loss converges.
5. The infrared and visible image registration method of claim 4, wherein the homography transformation matrix H in (1) is defined as follows:
H = s·M·[r1, r2, t]
where s is a scale factor, M is the camera intrinsic matrix, and r1, r2 and t are camera extrinsic parameters.
6. The infrared and visible image registration method of claim 4, wherein the linear rectification function in (2) is defined as follows:
ReLU(x) = max(0, x)
the Tanh function is defined as follows:
Tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
the LeakyReLU function is defined as follows:
LeakyReLU(x) = x, if x ≥ 0; x/a, if x < 0
where a is a fixed parameter in (1, +∞) and x is the input;
the Loss function Loss in (4) is defined as follows:
LcGAN(G, D) = E[log D(x, Iinf)] + E[log(1 - D(x, G(x, z)))]
the generation loss of Gnet is defined as follows:
Lg = -log D(G(Ivis)) - μ||Iinf - G(Ivis)||;
the adversarial loss of Dnet is defined as follows:
Ld = -log(1 - D(Ivis, G(Ivis))) - log(D(Ivis, Iinf));
where LcGAN represents the total loss of the conditional generative adversarial network, G represents the generator Gnet, D represents the discriminator Dnet, x represents the input image, and z represents the noise vector, which is supplied through the Dropout layers; μ is the weight of the second term in Lg and is set to 100; Iinf denotes the real infrared image and Ivis denotes the visible light image.
7. The infrared and visible image registration method of claim 2, wherein the step six of filtering mismatched points by distance and slope consistency comprises:
(1) for the set of two best matches of each keypoint, if the Euclidean distance of the first best match is smaller than 0.75 times the Euclidean distance of the second best match, the first best match is kept; otherwise both matches are rejected;
(2) when the number of correctly matched feature point pairs is large, the slope of the line connecting a feature point in the real infrared image with its matching feature point in the pseudo infrared image is required to be smaller than 0.1, and mismatched points are screened out accordingly.
8. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the infrared and visible image registration method according to any one of claims 1 to 7.
9. An intelligent image processing terminal, characterized in that the intelligent image processing terminal is used for realizing the infrared and visible light image registration method as claimed in any one of claims 1 to 7.
10. An infrared and visible image registration system for implementing the infrared and visible image registration method of any one of claims 1 to 7, wherein the infrared and visible image registration system comprises:
the image acquisition module is used for acquiring an infrared image and a visible light image;
the training data set construction module is used for constructing a training data set and constructing and training a conditional generative adversarial network;
the image target detection module is used for detecting the visible light image target by using a YOLOv5 network;
the pseudo infrared image generation module is used for generating a pseudo infrared image from the visible light image with the generative adversarial network;
the matching model building module is used for constructing feature descriptors with the speeded-up robust features (SURF) algorithm and building a matching model with brute-force matching;
the mismatching point filtering module is used for filtering mismatching points through distance and slope consistency;
and the matching completion module is used for estimating transformation parameters by using a random sample consensus (RANSAC) algorithm to complete registration.
CN202210029468.8A 2022-01-12 2022-01-12 Infrared and visible light image registration method, system, equipment and image processing terminal Pending CN114529593A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210029468.8A CN114529593A (en) 2022-01-12 2022-01-12 Infrared and visible light image registration method, system, equipment and image processing terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210029468.8A CN114529593A (en) 2022-01-12 2022-01-12 Infrared and visible light image registration method, system, equipment and image processing terminal

Publications (1)

Publication Number Publication Date
CN114529593A true CN114529593A (en) 2022-05-24

Family

ID=81620590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210029468.8A Pending CN114529593A (en) 2022-01-12 2022-01-12 Infrared and visible light image registration method, system, equipment and image processing terminal

Country Status (1)

Country Link
CN (1) CN114529593A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115306718A (en) * 2022-07-15 2022-11-08 嘉洋智慧安全生产科技发展(北京)有限公司 Method, apparatus, device, medium and program product for detecting screw compressor failure
CN115306718B (en) * 2022-07-15 2023-08-18 嘉洋智慧安全科技(北京)股份有限公司 Screw compressor fault detection method, apparatus, device, medium and program product
CN116363382A (en) * 2023-02-14 2023-06-30 长春理工大学 Dual-band image feature point searching and matching method
CN116363382B (en) * 2023-02-14 2024-02-23 长春理工大学 Dual-band image feature point searching and matching method
CN116168221A (en) * 2023-04-25 2023-05-26 中国人民解放军火箭军工程大学 Transformer-based cross-mode image matching and positioning method and device
CN116433730A (en) * 2023-06-15 2023-07-14 南昌航空大学 Image registration method combining deformable convolution and modal conversion
CN116433730B (en) * 2023-06-15 2023-08-29 南昌航空大学 Image registration method combining deformable convolution and modal conversion

Similar Documents

Publication Publication Date Title
CN114529593A (en) Infrared and visible light image registration method, system, equipment and image processing terminal
CN108960211B (en) Multi-target human body posture detection method and system
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
US20130208997A1 (en) Method and Apparatus for Combining Panoramic Image
CN110992263B (en) Image stitching method and system
CN109118544B (en) Synthetic aperture imaging method based on perspective transformation
CN105809640A (en) Multi-sensor fusion low-illumination video image enhancement method
CN108510451A (en) A method of the reconstruction car plate based on the double-deck convolutional neural networks
CN109313805A (en) Image processing apparatus, image processing system, image processing method and program
CN112634163A (en) Method for removing image motion blur based on improved cycle generation countermeasure network
CN112308128B (en) Image matching method based on attention mechanism neural network
CN109313806A (en) Image processing apparatus, image processing system, image processing method and program
CN113011401A (en) Face image posture estimation and correction method, system, medium and electronic equipment
CN114494347A (en) Single-camera multi-mode sight tracking method and device and electronic equipment
CN115205114A (en) High-resolution image splicing improved algorithm based on ORB (object-oriented bounding box) features
CN112614167A (en) Rock slice image alignment method combining single-polarization and orthogonal-polarization images
CN112365578A (en) Three-dimensional human body model reconstruction system and method based on double cameras
Hsu et al. Object detection using structure-preserving wavelet pyramid reflection removal network
CN113901928A (en) Target detection method based on dynamic super-resolution, and power transmission line component detection method and system
CN117392496A (en) Target detection method and system based on infrared and visible light image fusion
CN116563306A (en) Self-adaptive fire trace spectrum image segmentation method and system
Jung et al. Multispectral fusion of rgb and nir images using weighted least squares and convolution neural networks
Wang et al. A unified framework of source camera identification based on features
CN111127355A (en) Method for finely complementing defective light flow graph and application thereof
CN116310131A (en) Three-dimensional reconstruction method considering multi-view fusion strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination