CN114120013A - Infrared and RGB cross-modal feature point matching method - Google Patents

Infrared and RGB cross-modal feature point matching method

Info

Publication number
CN114120013A
CN114120013A (application CN202111392935.5A)
Authority
CN
China
Prior art keywords
image
rgb
original
images
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111392935.5A
Other languages
Chinese (zh)
Inventor
田炜
陈展
邓振文
黄禹尧
谭大艺
韩帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN202111392935.5A priority Critical patent/CN114120013A/en
Publication of CN114120013A publication Critical patent/CN114120013A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/251 - Fusion techniques of input or preprocessed data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/088 - Non-supervised learning, e.g. competitive learning

Abstract

The invention relates to an infrared and RGB cross-modal feature point matching method, which comprises the following steps: performing off-line training of a deep learning model based on collected original RGB and IR images to obtain a trained matching model; and inputting the data to be tested into the matching model to extract feature descriptors and output the corresponding matching result. Compared with the prior art, the method focuses on the fusion of multispectral images, namely visible-light images (RGB) and thermal-imaging images (IR). Through model training it can accurately extract feature points in multiple modalities and better perform the cross-modal feature matching task, thereby improving the accuracy of camera pose estimation in dark scenes and scenes with severe illumination change. It can provide a reliable perception front end for many applications, lays the front-end groundwork for subsequent research on fusing multispectral sensors under a traditional SLAM framework, and facilitates mapping, localization and matching, or depth estimation and three-dimensional mapping, of the same scene across day (based on RGB images) and night (based on IR images).

Description

Infrared and RGB cross-modal feature point matching method
Technical Field
The invention relates to the technical field of intelligent driving, in particular to an infrared and RGB cross-modal feature point matching method.
Background
Feature extraction is the key to the perception task in autonomous driving. However, under severe changes of ambient light, complete darkness or bad weather, traditional machine vision often suffers from large feature extraction errors or fails entirely. For example, in tasks such as SLAM (simultaneous localization and mapping), SfM (structure from motion), camera calibration and image registration, the extracted features are mainly interest points; when the environment degrades their quality or reduces their number, the subsequent feature point matching inevitably fails and the downstream computation cannot proceed.
Most conventional feature point extraction methods are based on relatively stable local image features, including SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features) and ORB (Oriented FAST and Rotated BRIEF). Although these hand-crafted methods locate points with high precision, they lack robustness in many scenes; for example, in dark, low-texture scenes the pixel gradient information is extremely scarce and the noise is large, so feature points cannot be extracted accurately and quickly for matching.
At present, with the development of deep learning, feature point extraction methods based on deep learning have begun to appear, but these methods target a single modality and do not consider the feature differences across modalities. Because the visible light camera has a perception deficiency in low-light scenes, the accuracy of camera pose estimation cannot be guaranteed in dark scenes or under severe illumination change, which poses great challenges to mapping, localization and matching, or depth estimation and three-dimensional mapping, of the same scene across day and night.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an infrared and RGB cross-modal feature point matching method, so that feature points can be accurately extracted in multiple modalities and the cross-modal feature matching task can be performed better, thereby improving the accuracy of camera pose estimation in dark scenes and scenes with severe illumination change, and facilitating mapping, localization and matching, or depth estimation and three-dimensional mapping, of the same scene across day and night.
The purpose of the invention can be realized by the following technical scheme: an infrared and RGB cross-modal feature point matching method comprises the following steps:
s1, collecting an original RGB image and an original IR image, and performing off-line training on the deep learning model based on the original RGB image and the original IR image to obtain a trained matching model;
and S2, inputting the data to be tested into the trained matching model to extract the feature descriptors and outputting the corresponding matching result.
Further, in step S1 the original RGB image is collected by a visible light camera and the original IR image is collected by a thermal imaging camera.
Further, the deep learning model in step S1 is specifically an UnsuperPoint neural network model.
Further, the specific process of the offline training in step S1 is as follows:
s11, preprocessing the collected original RGB image and the original IR image to obtain a pair of images;
s12, establishing an UnstuperPoint neural network model, inputting paired images into the UnstuperPoint neural network model for off-line training, and obtaining a trained matching model.
Further, the step S11 is specifically to perform a pixel alignment process on the original RGB image and the original IR image to ensure that the original RGB image and the original IR image are completely aligned at a pixel level.
Further, the pair of images consists of the original RGB image and an IR image with an added viewing-angle transformation.
Further, the pair of images consists of the original IR image and an RGB image with an added viewing-angle transformation.
Further, the UnsuperPoint neural network model constructed in step S12 includes a backbone network, which performs the joint tasks of point confidence estimation, point coordinate regression and descriptor extraction. The backbone network is divided into two branches: one branch processes the original image, and the other processes the image after the homography transformation; the extracted points are projected into the same image coordinate system through the ground-truth homography matrix, the distance of each point pair is calculated, point pairs with a distance smaller than 4 pixels are taken as valid point pairs, and the point correspondences are constructed for self-supervised learning;
the UnsuperPoint neural network model adopts a convolutional layer with a kernel size of 3 and a stride of 2 to handle the edge blurring caused by thermal radiation in IR images.
Further, the learning loss function of the UnsuperPoint neural network model is specifically as follows:
L = α_score·L_score + α_rep·L_rep + α_pos·L_pos + α_uni·L_uni + α_des·L_des + α_des_coor·L_des_coor

L_rep = Σ_k ((s_k^A + s_k^B)/2)·(d_k - d̄)

L_des = -Σ_i log( exp(sim(z_i^A, z_i^B)/τ) / Σ_k 1_[k≠i]·exp(sim(z_i^A, z_k)/τ) )

sim(z_i, z_j) = z_i^T·z_j / (||z_i||·||z_j||)

wherein A is the identification of the RGB image, B is the identification of the IR image, and L is the total loss function; L_score is the point confidence loss, represented by the square of the difference between the scores of the same point in A and B, and α_score is the weight of L_score;
L_rep is the repeatability loss based on point-pair distance, where s is the confidence of an extracted point, d is the distance of a point pair, d̄ is the mean distance over all point pairs, and α_rep is the weight of L_rep;
L_pos is the Euclidean distance loss of the point pairs, and α_pos is the weight of L_pos;
L_uni is the coordinate uniformity loss, and α_uni is the weight of L_uni;
L_des is the descriptor loss, represented by the square of the difference between the descriptors of the same point in A and B, and α_des is the weight of L_des;
L_des_coor is used to increase the compactness of the descriptors in space, the loss being represented by the sum of the cross-correlation coefficients of the descriptors of points at different positions, and α_des_coor is the weight of L_des_coor;
z_i and z_j are two descriptor vectors, sim(z_i, z_j) is their similarity, and τ is a temperature hyper-parameter that controls the strength with which negative examples are learned.
Further, the data to be measured comprises paired RGB images to be measured and IR images to be measured.
Compared with the prior art, the method focuses on the fusion of multispectral images, namely the fusion of a visible light camera and a thermal imaging camera, and uses the fact that the thermal imaging camera is unaffected by illumination changes to compensate for the perception deficiency of the visible light camera in low-illumination scenes. The invention learns, with a neural network, a keypoint extraction and descriptor model that adapts to multiple modalities, and lays the front-end groundwork for subsequent research on fusing multispectral sensors under the traditional SLAM framework, so as to improve the accuracy of camera pose estimation in dark scenes and scenes with severe illumination change, which in turn facilitates mapping, localization and matching, or depth estimation and three-dimensional mapping, of the same scene across day and night.
The invention trains one model on a mixed RGB and IR data set, so that the trained matching model can perform feature matching within the RGB modality and within the IR modality; this training scheme significantly improves the matching precision of a model with dual-modality feature matching capability in both modalities. At the same time, a large amount of data is used in the unsupervised learning task, and a model trained with large amounts of data is more robust in extracting features and identifying repeated points.
The invention builds on the basic architecture of the UnsuperPoint network model and performs self-supervision through image pairs, making training fully unsupervised and removing the dependence on synthetic data required for SuperPoint pre-training. Meanwhile, point positions are regressed as a differentiable term, so that the positions can be optimized and shifted. To address feature points falling on grid boundaries, a heuristic distribution approximation is adopted that makes the distribution of feature points more uniform. In addition, the cross-correlation between descriptor dimensions is reduced, which lowers the redundancy between dimensions and improves the expressive power of the descriptors. Innovative adjustments, such as replacing max pooling with convolution and changing the learning loss function, are made to better fit the cross-modal feature matching task.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of an embodiment of a deep learning network;
FIG. 3 is a schematic diagram of the self-supervised learning framework;
fig. 4 is a diagram illustrating exemplary descriptor matching in different modalities.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
Examples
As shown in fig. 1, a cross-modal feature point matching method for infrared and RGB includes the following steps:
s1, collecting an original RGB image and an original IR image, and performing off-line training on the deep learning model based on the original RGB image and the original IR image to obtain a trained matching model;
since the UnsuperPoint model has already demonstrated excellent performance in the RGB modality, this embodiment uses it as the main framework (as shown in fig. 2) and adapts it to the cross-modal feature matching task;
in this embodiment, an UnsuperPoint neural network model is used for off-line training, and the specific process of the off-line training is as follows:
s11, pre-processing the captured original RGB image (which can be captured by a visible light camera) and the original IR image (which can be captured by a thermal imaging camera) to obtain a pair of images, specifically, the pair of images is the original RGB image and the IR image added with the view angle transformation, or the original IR image and the RGB image added with the view angle transformation, and the original RGB image and the original IR image are completely aligned at a pixel level;
it should be noted that the model training process in the invention is the same as the model training thought of UnsuperPoint in the single mode, and according to the thought in the single mode, the transformation relation between the original image and the transformation image is randomly generated and known, so that the data input in the cross-mode must ensure that the original RGB and the original IR image are completely aligned in pixel level, so as to realize the cross-mode matching;
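The pairing step can be illustrated with the following minimal sketch. It assumes the RGB and IR frames are already pixel-aligned; the helper names (random_homography, make_training_pair) and the corner-jitter range are illustrative choices, not values taken from the patent.

import numpy as np
import cv2

def random_homography(h, w, max_shift=0.15):
    """Sample a random viewing-angle (homography) transform by jittering the image corners."""
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = (np.random.rand(4, 2) - 0.5) * 2 * max_shift * np.float32([w, h])
    dst = (src + jitter).astype(np.float32)
    return cv2.getPerspectiveTransform(src, dst)  # 3x3 ground-truth homography

def make_training_pair(rgb, ir, warp_ir=True):
    """Return (image_A, image_B, H): one original modality plus the other modality warped by H."""
    h, w = rgb.shape[:2]
    H = random_homography(h, w)
    if warp_ir:
        return rgb, cv2.warpPerspective(ir, H, (w, h)), H   # original RGB + transformed IR
    return ir, cv2.warpPerspective(rgb, H, (w, h)), H        # original IR + transformed RGB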
s12, constructing an UnserPoint neural network model, inputting paired images into the UnserPoint neural network model for off-line training to obtain a trained matching model, wherein the UnserPoint neural network model comprises a backbone network, UnserPoint shows excellent performance in an RGB mode, points extracted by the network can avoid artificially defining angular points or image gradients, and obtain good performance in repeatability and positioning errors, but infrared information is derived from thermal radiation, and edge characteristic noise of infrared imaging is high, and can be significantly different from edge characteristics of RGB images. Thus, by selecting keypoints with repeatability, rather than by visual corner or edge information, errors due to noise can be minimized.
UnsuperPoint extracts low-level features with a lightweight network used for the joint task of point confidence estimation, point coordinate regression and descriptor extraction. For each 8 x 8 grid cell at the original image scale, the network outputs one point and its descriptor. The self-supervised learning process is illustrated in fig. 3. The network is divided into two branches: one branch processes the original image and the other processes the homography-transformed image. The extracted points are projected into the same image coordinate system through the ground-truth homography matrix, the distance of each point pair is calculated, point pairs with a distance smaller than 4 pixels are taken as valid point pairs, and the point correspondences are constructed for self-supervised learning, as sketched below.
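A possible sketch of the correspondence construction is given below. The nearest-neighbour pairing and the array shapes are assumptions made for illustration; only the stated rule is taken from the description, namely that point pairs closer than 4 pixels under the ground-truth homography count as valid.

import numpy as np

def project_points(points, H):
    """Project (N, 2) pixel coordinates with a 3x3 homography."""
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)  # to homogeneous coordinates
    warped = (H @ pts_h.T).T
    return warped[:, :2] / warped[:, 2:3]

def build_correspondences(points_a, points_b, H, max_dist=4.0):
    """Pair each projected A-point with its nearest B-point if they are closer than max_dist pixels."""
    proj_a = project_points(points_a, H)
    dists = np.linalg.norm(proj_a[:, None, :] - points_b[None, :, :], axis=-1)  # (Na, Nb) distance matrix
    nearest = dists.argmin(axis=1)
    valid = dists[np.arange(len(proj_a)), nearest] < max_dist
    return np.stack([np.nonzero(valid)[0], nearest[valid]], axis=1)  # (K, 2) index pairs (a_idx, b_idx)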
Meanwhile, considering that the original UnsuperPoint network performs downsampling with parameter-free max pooling, and that images in the infrared modality exhibit edge blurring caused by thermal radiation, the edge blurring is handled with a convolutional layer with a kernel size of 3 and a stride of 2, so that the network learns the low-level features better without excessively increasing the computational cost, as in the sketch below.
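A minimal PyTorch sketch of such a downsampling block is shown below; the channel counts and the batch-norm/ReLU additions are placeholders, not details specified by the patent.

import torch.nn as nn

def conv_downsample(in_ch, out_ch):
    """Strided convolution (kernel 3, stride 2) used in place of parameter-free max pooling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),  # halves the spatial size
        nn.BatchNorm2d(out_ch),   # typical additions, not specified by the patent
        nn.ReLU(inplace=True),
    )

# e.g. replacing an nn.MaxPool2d(2) stage between backbone blocks:
# downsample = conv_downsample(64, 64)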
In addition, regarding the learning loss function, the descriptor loss part adopts the negative-example-based contrastive learning loss SimCLR. SimCLR is a self-supervised learning framework in which positive and negative examples are constructed independently for contrastive learning. The loss function adopts a twin (Siamese) structure and constructs positive examples through transformation augmentation, forcing the network to learn properties of the image features that are invariant to the transformation. In the descriptor learning task, the positive examples are the descriptor pairs of points whose distance is within the threshold; the contrastive loss encourages higher similarity between positive examples while pushing them away from all negative examples, and the negative examples supervise the network to retain independent features and prevent model collapse, i.e. the negative examples are distributed as uniformly as possible. For two descriptor vectors z_i and z_j, the similarity is computed as the normalized vector dot product:

sim(z_i, z_j) = z_i^T·z_j / (||z_i||·||z_j||)

The descriptor loss function is then:

L_des = -Σ_i log( exp(sim(z_i^A, z_i^B)/τ) / Σ_k 1_[k≠i]·exp(sim(z_i^A, z_k)/τ) )

where 1_[k≠i] is an indicator function multiplied with the exponential term in the denominator; it equals 1 if k is not equal to i and 0 otherwise. τ is a temperature hyper-parameter that controls the strength with which negative examples are learned. Negative examples are not all alike: some are rather similar to the anchor and some are completely unrelated. Negative examples that are close in descriptor space are harder to learn from, and the harder they are, the more difficult it is to pull their distance apart. The temperature hyper-parameter adjusts how strongly such hard negatives are penalized; the smaller it is, the more the hard negatives are emphasized, which makes the samples more uniform in their spatial distribution. However, a smaller temperature is not always better: when learning descriptors the network decides whether a descriptor pair is a positive or a negative example from the distance prior of the point pair, and if the extracted point positions are inaccurate in the early stage of training, a temperature that is too low may push apart descriptors that should in fact be similar, making it harder to pull their spatial distance back later.
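A compact sketch of this SimCLR-style descriptor loss, written for matched descriptor pairs, is given below. The tensor layout, the default temperature value and the use of every other descriptor in the batch as a negative are assumptions for illustration.

import torch
import torch.nn.functional as F

def descriptor_contrastive_loss(desc_a, desc_b, tau=0.1):
    """desc_a, desc_b: (N, D) descriptors of matched point pairs; tau: temperature hyper-parameter."""
    z = F.normalize(torch.cat([desc_a, desc_b], dim=0), dim=1)       # 2N unit-norm descriptors
    sim = z @ z.t() / tau                                            # cosine similarities scaled by 1/tau
    n = desc_a.shape[0]
    pos = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])    # index of the positive for each anchor
    sim.fill_diagonal_(float("-inf"))                                # the 1_[k != i] indicator: drop self-similarity
    return F.cross_entropy(sim, pos.to(sim.device))                  # -log softmax probability of the positive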
The learning loss function of the UnsuperPoint neural network model in the invention is specifically:

L = α_score·L_score + α_rep·L_rep + α_pos·L_pos + α_uni·L_uni + α_des·L_des + α_des_coor·L_des_coor

In the formula, L_score is the point confidence loss, which mainly ensures that the scores of the same point in image A (the RGB image) and image B (the infrared image) are similar, i.e. the confidence of a point is consistent under different viewing angles; the loss is represented by the square of the difference between the scores of the same point in A and B.
L_rep is the repeatability loss based on point-pair distance, with the loss function:

L_rep = Σ_k ((s_k^A + s_k^B)/2)·(d_k - d̄)

Here the twin image of A after the homography transformation is considered, and s is the confidence of an extracted point. Only point pairs with a pixel distance below 4 are counted; d is the distance of a point pair and d̄ is the mean distance over all point pairs. This loss is expected to decrease the confidence of points when the pair distance is large and increase it when the distance is small.
L_pos is the Euclidean distance loss of the point pairs; its main objective is to ensure that the keypoint positions detected in image A (after the homography) coincide with those detected in image B, i.e. that the point positions are stable under different viewing angles. L_uni is the coordinate uniformity loss; only the points within each 8 × 8 grid cell are considered, and when the loss is computed the points are sorted by coordinate and the variance of the intervals between point coordinates is taken as the loss. L_des is the descriptor loss, represented by the square of the difference between the descriptors of the same point in A and B, which ensures that the descriptors of a point pair are similar. L_des_coor is used to increase the compactness of the descriptors in space; the loss is represented by the sum of the cross-correlation coefficients of the descriptors of points at different positions.
α_score, α_rep, α_pos, α_uni, α_des and α_des_coor are the weights of L_score, L_rep, L_pos, L_uni, L_des and L_des_coor, respectively.
Therefore, the invention improves the UnsuperPoint network structure: convolutional layers are used to handle edge blurring, and the contrastive loss improves the quality of matching and descriptor extraction; a combined sketch of the repeatability term and the weighted total loss is given below.
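The following sketch assembles the repeatability term and the weighted total loss as reconstructed above; the tensor shapes and the idea of passing the six weights as a dictionary are illustrative assumptions.

import torch

def repeatability_loss(score_a, score_b, dist):
    """score_a, score_b, dist: (K,) tensors over the valid point pairs.
    Pushes confidence down when the pair distance is above the mean and up when it is below."""
    return (0.5 * (score_a + score_b) * (dist - dist.mean())).sum()

def total_loss(l_score, l_rep, l_pos, l_uni, l_des, l_des_coor, w):
    """w: dict holding the six weights alpha_score ... alpha_des_coor."""
    return (w["score"] * l_score + w["rep"] * l_rep + w["pos"] * l_pos
            + w["uni"] * l_uni + w["des"] * l_des + w["des_coor"] * l_des_coor)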
And S2, inputting the data to be tested into the trained matching model to extract the feature descriptors and output the corresponding matching result, wherein the data to be tested comprises a paired RGB image and IR image to be tested.
In summary, in this technical scheme the visible light camera and the thermal imaging camera acquire the initial images, and the network model is trained offline on the acquired image data; the data set to be tested is then fed into the trained model, and the feature descriptors are extracted and matched. Fig. 4 shows a descriptor matching example between the RGB modality (left) and the IR modality (right) in this embodiment.
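An illustrative inference flow for step S2 might look as follows. The matching_model call signature (returning scores, points and descriptors per image) is a hypothetical API rather than the patent's actual interface, and the matching shown is plain mutual-nearest-neighbour on L2-normalized descriptors.

import numpy as np

def match_descriptors(desc_rgb, desc_ir):
    """Mutual nearest-neighbour matching between two (N, D) L2-normalized descriptor sets."""
    sim = desc_rgb @ desc_ir.T
    nn_ab, nn_ba = sim.argmax(axis=1), sim.argmax(axis=0)
    mutual = nn_ba[nn_ab] == np.arange(len(desc_rgb))
    return np.stack([np.nonzero(mutual)[0], nn_ab[mutual]], axis=1)

def infer_and_match(matching_model, rgb_image, ir_image):
    """Hypothetical model interface: each call returns (scores, points, descriptors) for one image."""
    _, pts_rgb, desc_rgb = matching_model(rgb_image)
    _, pts_ir, desc_ir = matching_model(ir_image)
    pairs = match_descriptors(desc_rgb, desc_ir)
    return pts_rgb[pairs[:, 0]], pts_ir[pairs[:, 1]]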
The model training process in the invention follows the same training idea as UnsuperPoint in the single modality; the difference is that the input image pair consists of the original RGB image and an IR image with an added viewing-angle transformation (or the original IR image and an RGB image with an added viewing-angle transformation), and the original RGB and IR images must be completely aligned at the pixel level.
The quantitative indicators used in training are as follows: RS is the repetition rate, LE is the localization error, HE is the homography estimation accuracy, where ε is the threshold on the average error of the four corner points after the homography transformation, and MS is the matching score.
The robustness of the extracted points is evaluated with the repetition rate. Let the original image be denoted O and the transformed image C, with the transformation matrix known. The points extracted from O are projected into the viewpoint of C through the homography matrix and denoted P_true_warped, and the points extracted from C are denoted P_warped. The repetition rate can then be written as:

RS = (M_O→C + M_C→O) / (|P_true_warped| + |P_warped|)

where M_O→C is the number of points in P_true_warped that have a point of P_warped within the distance threshold, and M_C→O is defined symmetrically. The distance threshold is 3 pixels, and points closer than the threshold are considered matching point pairs.
Another evaluation indicator is the localization error, which evaluates the accuracy of the extracted point positions. As in the repetition-rate calculation, the points whose distance is below the threshold are recorded as the paired sets G_true_warped and G_warped, and the localization error is computed as:

LE = (1/N)·Σ_i ||g_true_warped,i - g_warped,i||

where N is the number of paired points and g_true_warped,i and g_warped,i are the i-th pair of corresponding points.
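A sketch of how RS and LE could be computed from two point sets in the same (transformed) view is shown below; the array shapes and the symmetric nearest-neighbour counting are illustrative assumptions.

import numpy as np

def rs_and_le(pts_true_warped, pts_warped, thresh=3.0):
    """pts_*: (N, 2) keypoint coordinates expressed in the same (transformed) view."""
    d = np.linalg.norm(pts_true_warped[:, None, :] - pts_warped[None, :, :], axis=-1)
    m_ab = d.min(axis=1)        # nearest-detection distance for each warped ground-truth point
    m_ba = d.min(axis=0)        # and the symmetric direction
    matched = np.concatenate([m_ab[m_ab < thresh], m_ba[m_ba < thresh]])
    rs = ((m_ab < thresh).sum() + (m_ba < thresh).sum()) / (len(pts_true_warped) + len(pts_warped))
    le = matched.mean() if len(matched) else float("nan")   # mean nearest-neighbour distance over matched points
    return rs, le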
the evaluation of the descriptors cannot be carried out independently, and because the evaluation mode is also a method of calculating the descriptors after detecting points, only the performance of the extracted points and the descriptors can be comprehensively evaluated. The comprehensive evaluation is carried out under a homography transformation estimation method, firstly violent matching is adopted for matching of the descriptors, the similarity of the descriptors is measured by an L2 distance, and then a RANSAC (Random Sample Consensus) algorithm is combined according to a matching result to estimate a homography transformation matrix between two images.
The matching score mainly reflects the performance of the descriptor, RANSAC screens interior points (corresponding to matching point pairs within the error range of the homography) and exterior points in the sample, wherein the exterior points are matching point pairs causing too large error in calculating the homography, and the matching score calculation can be written as:
Figure BDA0003369387710000081
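This evaluation step maps naturally onto standard OpenCV calls; the sketch below uses brute-force L2 matching and RANSAC homography estimation, with the RANSAC reprojection threshold as an assumed value.

import cv2
import numpy as np

def matching_score(kpts_a, desc_a, kpts_b, desc_b, ransac_thresh=3.0):
    """kpts_*: (N, 2) point coordinates; desc_*: (N, D) descriptors. Returns (MS, estimated H)."""
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(desc_a.astype(np.float32), desc_b.astype(np.float32))
    if len(matches) < 4:                       # a homography needs at least 4 correspondences
        return 0.0, None
    src = np.float32([kpts_a[m.queryIdx] for m in matches])
    dst = np.float32([kpts_b[m.trainIdx] for m in matches])
    H_est, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransac_thresh)
    if H_est is None:
        return 0.0, None
    return float(inlier_mask.sum()) / len(matches), H_est   # MS = inliers / matched pairs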
and the homography transformation accuracy comprehensively evaluates the position accuracy and the matching performance of the extraction points. After coordinates of four edge points of the image size are given, average error distances of the four points under estimated homography transformation and transformation matrix truth values are calculated, whether estimation is correct or not is judged according to different threshold values, finally, accuracy is evaluated according to the estimated correct proportion, and the index is marked as HE. In the present embodiment, 1, 3, 5, and 10 pixels are used as evaluation thresholds.
Regarding the choice of backbone network: unlike the single-modality case, the pair of features output by the backbone in the cross-modal case comes from the RGB modality and the IR modality respectively. The backbone of UnsuperPoint is very lightweight; unlike more complex architectures such as VGG or ResNet, its transfer performance may be poor, so it cannot by itself serve as a shared backbone across modalities. Therefore, the RGB and IR data sets are mixed to train a single model, i.e. the model can perform feature matching within the RGB modality and within the IR modality; this training scheme significantly improves the matching precision of the dual-modality model in both modalities. At the same time, a large amount of data is used in the unsupervised learning task, and a model trained with large amounts of data is more robust in extracting features and identifying repeated points.
The method can provide a reliable perception front end for automatic driving and lays the front-end groundwork for subsequent research on fusing multispectral sensors under the traditional SLAM framework, so as to improve the accuracy of camera pose estimation in dark scenes with severe illumination change. This work is also beneficial for mapping, localization and matching, or depth estimation and three-dimensional mapping, of the same scene across day and night. For example, it can alleviate the failure of RGB-camera feature point tracking under severe illumination change: cross-modal feature matching enables switching between multi-modal sensors and reduces the influence of illumination on the stability of the SLAM system.

Claims (10)

1. An infrared and RGB cross-modal feature point matching method is characterized by comprising the following steps:
s1, collecting an original RGB image and an original IR image, and performing off-line training on the deep learning model based on the original RGB image and the original IR image to obtain a trained matching model;
and S2, inputting the data to be tested into the trained matching model to extract the feature descriptors and outputting the corresponding matching result.
2. The method as claimed in claim 1, wherein the step S1 is to collect original RGB images by a visible light camera and collect original IR images by a thermal imaging camera.
3. The infrared and RGB cross-modal feature point matching method according to claim 1, wherein the deep learning model in step S1 is specifically an UnsuperPoint neural network model.
4. The method as claimed in claim 3, wherein the off-line training in step S1 comprises:
s11, preprocessing the collected original RGB image and the original IR image to obtain a pair of images;
s12, establishing an UnstuperPoint neural network model, inputting paired images into the UnstuperPoint neural network model for off-line training, and obtaining a trained matching model.
5. The method as claimed in claim 4, wherein the step S11 is specifically to perform a pixel alignment process on the original RGB image and the original IR image to ensure that the original RGB image and the original IR image are completely aligned at a pixel level.
6. The method as claimed in claim 5, wherein the pair of images are original RGB images and IR images with added perspective transformation.
7. The method as claimed in claim 5, wherein the pair of images are an original IR image and an RGB image with a view transformation added.
8. The infrared and RGB cross-modal feature point matching method according to claim 6 or 7, wherein the UnsuperPoint neural network model constructed in step S12 includes a backbone network, the backbone network is used for performing the joint tasks of point confidence estimation, point coordinate regression and descriptor extraction, and the backbone network is divided into two branches: one branch processes the original image, and the other processes the image after the homography transformation; the extracted points are projected into the same image coordinate system through the ground-truth homography matrix, the distance of each point pair is calculated, point pairs with a distance smaller than 4 pixels are taken as valid point pairs, and the point correspondences are constructed for self-supervised learning;
the UnsuperPoint neural network model adopts a convolutional layer with a kernel size of 3 and a stride of 2 to handle the edge blurring caused by thermal radiation in IR images.
9. The infrared and RGB cross-modal feature point matching method according to claim 8, wherein the learning loss function of the UnsuperPoint neural network model specifically is:
L = α_score·L_score + α_rep·L_rep + α_pos·L_pos + α_uni·L_uni + α_des·L_des + α_des_coor·L_des_coor

L_rep = Σ_k ((s_k^A + s_k^B)/2)·(d_k - d̄)

L_des = -Σ_i log( exp(sim(z_i^A, z_i^B)/τ) / Σ_k 1_[k≠i]·exp(sim(z_i^A, z_k)/τ) )

sim(z_i, z_j) = z_i^T·z_j / (||z_i||·||z_j||)

wherein A is the identification of the RGB image, B is the identification of the IR image, and L is the total loss function; L_score is the point confidence loss, represented by the square of the difference between the scores of the same point in A and B, and α_score is the weight of L_score;
L_rep is the repeatability loss based on point-pair distance, where s is the confidence of an extracted point, d is the distance of a point pair, d̄ is the mean distance over all point pairs, and α_rep is the weight of L_rep;
L_pos is the Euclidean distance loss of the point pairs, and α_pos is the weight of L_pos;
L_uni is the coordinate uniformity loss, and α_uni is the weight of L_uni;
L_des is the descriptor loss, represented by the square of the difference between the descriptors of the same point in A and B, and α_des is the weight of L_des;
L_des_coor is used to increase the compactness of the descriptors in space, the loss being represented by the sum of the cross-correlation coefficients of the descriptors of points at different positions, and α_des_coor is the weight of L_des_coor;
z_i and z_j are two descriptor vectors, sim(z_i, z_j) is their similarity, and τ is a temperature hyper-parameter that controls the strength with which negative examples are learned.
10. The method as claimed in claim 4, wherein the data to be measured includes paired RGB image to be measured and IR image to be measured.
CN202111392935.5A 2021-11-23 2021-11-23 Infrared and RGB cross-modal feature point matching method Pending CN114120013A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111392935.5A CN114120013A (en) 2021-11-23 2021-11-23 Infrared and RGB cross-modal feature point matching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111392935.5A CN114120013A (en) 2021-11-23 2021-11-23 Infrared and RGB cross-modal feature point matching method

Publications (1)

Publication Number Publication Date
CN114120013A true CN114120013A (en) 2022-03-01

Family

ID=80439813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111392935.5A Pending CN114120013A (en) 2021-11-23 2021-11-23 Infrared and RGB cross-modal feature point matching method

Country Status (1)

Country Link
CN (1) CN114120013A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117351049A (en) * 2023-12-04 2024-01-05 四川金信石信息技术有限公司 Thermal imaging and visible light fusion measuring point registration guiding method, device and medium
CN117351049B (en) * 2023-12-04 2024-02-13 四川金信石信息技术有限公司 Thermal imaging and visible light fusion measuring point registration guiding method, device and medium
CN117824624A (en) * 2024-03-05 2024-04-05 深圳市瀚晖威视科技有限公司 Indoor tracking and positioning method, system and storage medium based on face recognition

Similar Documents

Publication Publication Date Title
Schneider et al. RegNet: Multimodal sensor registration using deep neural networks
WO2022002150A1 (en) Method and device for constructing visual point cloud map
CN113012212B (en) Depth information fusion-based indoor scene three-dimensional point cloud reconstruction method and system
CN111311666B (en) Monocular vision odometer method integrating edge features and deep learning
CN109341703B (en) Visual SLAM algorithm adopting CNNs characteristic detection in full period
Yang et al. Registration of challenging image pairs: Initialization, estimation, and decision
CN111401384A (en) Transformer equipment defect image matching method
CN113269237A (en) Assembly change detection method, device and medium based on attention mechanism
CN107424161B (en) Coarse-to-fine indoor scene image layout estimation method
CN113361542B (en) Local feature extraction method based on deep learning
CN111709980A (en) Multi-scale image registration method and device based on deep learning
CN114120013A (en) Infrared and RGB cross-modal feature point matching method
CN111126412A (en) Image key point detection method based on characteristic pyramid network
Zhou et al. Cross-weather image alignment via latent generative model with intensity consistency
CN111368733B (en) Three-dimensional hand posture estimation method based on label distribution learning, storage medium and terminal
Potje et al. Extracting deformation-aware local features by learning to deform
CN114663880A (en) Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism
CN113128518B (en) Sift mismatch detection method based on twin convolution network and feature mixing
Zhang et al. MLIFeat: Multi-level information fusion based deep local features
CN112070181A (en) Image stream-based cooperative detection method and device and storage medium
CN117351078A (en) Target size and 6D gesture estimation method based on shape priori
Zhang et al. Data association between event streams and intensity frames under diverse baselines
CN113052311B (en) Feature extraction network with layer jump structure and method for generating features and descriptors
CN115410014A (en) Self-supervision characteristic point matching method of fisheye image and storage medium thereof
Qin et al. Structured-patch optimization for dense correspondence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination