CN117095033A - Multi-mode point cloud registration method based on image and geometric information guidance - Google Patents

Multi-mode point cloud registration method based on image and geometric information guidance

Info

Publication number
CN117095033A
Authority
CN
China
Prior art keywords
point
super
point cloud
characteristic
points
Prior art date
Legal status
Granted
Application number
CN202310921353.4A
Other languages
Chinese (zh)
Other versions
CN117095033B (en)
Inventor
江薪祺
徐宗懿
张睿诚
高鑫雨
高新波
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202310921353.4A
Publication of CN117095033A
Application granted
Publication of CN117095033B
Legal status: Active
Anticipated expiration

Classifications

    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention relates to computer vision technology, and in particular to a multi-modal point cloud registration method guided by image and geometric information. The method obtains the fourth super-point features of the super-points in the source point cloud and the target point cloud, together with the geometric correlation features between the super-points in each point cloud and a set of anchor points, and fuses them based on cross-attention to obtain fifth super-point features. Based on the fifth super-point features, the similarity between the super-points in the source point cloud and the target point cloud is calculated and super-point pairs are screened out, and the first super-point features are up-sampled and decoded to obtain the original point features. A set of original point correspondences is obtained with the Sinkhorn algorithm, a transformation matrix is estimated from each correspondence set, and the optimal transformation matrix is selected among them. The invention not only fully fuses the image texture information with the point cloud structure information, but also reduces the noise introduced by irrelevant image information, and obtains discriminative multi-modal super-point features.

Description

Multi-mode point cloud registration method based on image and geometric information guidance
Technical Field
The invention relates to computer vision technology applied to three-dimensional reconstruction, autonomous driving, simultaneous localization and mapping, robot pose estimation and related technical fields, and in particular to a multi-modal point cloud registration method based on image and geometric information guidance.
Background
Point cloud registration is an upstream task in three-dimensional vision. It aims to solve for a transformation matrix that aligns two point clouds of the same scene, observed from different viewpoints, into the same coordinate system, and it is widely applied in three-dimensional reconstruction, autonomous driving, simultaneous localization and mapping, robot pose estimation and other fields.
Most current point cloud registration methods are correspondence-based and mainly comprise the following four stages: features are first extracted from each of the two input point cloud frames, correspondences are then selected in the feature space, outliers are filtered, and finally a robust pose estimator solves the transformation matrix from the correspondences. Recently, correspondence-based point cloud registration methods have generally adopted a coarse-to-fine strategy, which first finds correspondences at the super-point level and then performs fine registration at the point level.
Classified by modality, point cloud registration methods can be divided into single-modal and multi-modal methods.
(1) Single-modal point cloud registration methods
A single-modal point cloud registration method uses only the point cloud modality, and most current methods belong to this category. Successful registration depends on the extraction of geometric features from the point cloud. Because the disorder of point clouds makes feature extraction difficult, early work applied a shared multi-layer perceptron at every point, and later work adopted hierarchical network structures so that the extracted features could adapt to different point cloud densities. Inspired by image convolutional neural networks, researchers proposed a point cloud feature extraction method based on deformable kernel point convolution, which adapts to objects of different shapes and attaches more importance to the geometric structure of the point cloud. In the registration task, geometric consistency between the two point cloud frames has also received attention: after super-points are extracted by kernel point convolution downsampling, the features are enhanced by injecting distance and angle consistency information between the super-points, which significantly improves the robustness of the algorithm.
Current single-modal point cloud registration methods already perform well, but because they use only the point cloud modality, the extracted features carry only geometric information and lack texture information. In scenes with a low overlap rate they are therefore easily limited by repetitive geometric regions and weak geometric regions, which leads to registration failure. In addition, when embedding geometric consistency, existing methods usually consider the relation between every pair of super-points, which mixes in ambiguous geometric information, reduces the discriminability of the point cloud features, and hinders the correct selection of correspondences.
(2) Multi-modal point cloud registration methods
In recent years, with the rise of deep learning, extracting image features with convolutional neural networks has brought major breakthroughs to tasks such as image classification and object detection, and in the last two years several multi-modal registration algorithms based on images and point clouds have accordingly appeared in the point cloud registration task. One line of work first attempted implicit multi-modal feature fusion: encoder features are extracted from the point cloud and from its corresponding image, an attention mechanism then fuses the point cloud and image features globally to enhance the point cloud features, and a decoder finally produces point cloud features carrying both texture and structure information. This work provides a reference for multi-modal information fusion, but its implicit image feature fusion reduces the discriminability of the point cloud features, so it does not achieve good performance on registration tasks with a low overlap rate. Later work proposed using only one image during the training phase, an image that simultaneously covers the partially overlapping area of the two input point cloud frames; this benefits the search for correspondences and helps ensure that a correct transformation matrix can be solved later. Other work exploits mature image matching techniques: corresponding pixels are extracted from the images and projected onto the point cloud using the intrinsic and extrinsic matrices, the point cloud features in the two-dimensional overlap area are initialised with the pixel features, and point cloud feature extraction then proceeds with a conventional kernel point convolution method.
Current multi-modal point cloud registration methods are still at an exploratory stage. When image information is used to enhance the point cloud features, existing methods either mix in ambiguous noise, which reduces the discriminability of the point cloud features, or do not fuse the image features sufficiently, so current multi-modal methods do not yet show excellent performance in registration tasks.
Disclosure of Invention
In order to solve the problem that weak geometric regions and repetitive geometric regions limit the point cloud registration task, the invention provides a multi-modal point cloud registration method based on image and geometric information guidance, which specifically comprises the following steps:
inputting a source point cloud and a target point cloud into a deformable convolutional neural network serving as the encoder, which extracts point cloud features while performing downsampling to obtain first super-point features; and acquiring the pixel features of the image corresponding to each point cloud through a residual network;
raising the dimension of the first super-point features to obtain second super-point features; taking the pixel features corresponding to the points filtered out around a super-point during downsampling as the pixel features of that super-point; fusing the second super-point features with the corresponding pixel features based on an attention mechanism and a multi-layer perceptron to obtain the texture features of the super-points; and concatenating the texture features of the super-points with the first super-point features to obtain third super-point features;
calculating the distance features between pairs of super-points within a point cloud, and fusing the distance features between the super-points with the third super-point features based on an attention mechanism to obtain fourth super-point features;
selecting an anchor point set from the source point cloud and the target point cloud based on a non-maximum suppression method, acquiring the distance features and angle correlations between the anchor points and the super-points, and fusing the distance features and angle correlations between the anchor points and the super-points to obtain the geometric correlation features between the super-points and the anchor points;
fusing, based on cross-attention, the fourth super-point features of the super-points in the source point cloud, the geometric correlation features between the super-points in the source point cloud and the anchor points, the fourth super-point features of the super-points in the target point cloud, and the geometric correlation features between the super-points in the target point cloud and the anchor points, to obtain fifth super-point features;
calculating the similarity between the fifth super-point features of the super-points in the source point cloud and the fifth super-point features of the super-points in the target point cloud, and screening out the K pairs with the highest similarity as the super-point pair set;
up-sampling and decoding the first super-point features of the super-points in the super-point pair set to obtain the original point features; calculating the similarity between the original points within each super-point neighbourhood, obtaining a set of original point correspondences based on the Sinkhorn algorithm, and estimating a transformation matrix from each set of original point correspondences;
and each super-point correspondence yields a transformation matrix between the source point cloud and the target point cloud, and the optimal transformation matrix is selected from the plurality of transformation matrices estimated for the super-point pair set.
Compared with existing multi-modal point cloud registration methods, the multi-modal feature fusion module based on local texture information designed by the invention not only fully fuses the image texture information with the point cloud structure information, but also reduces the noise introduced by irrelevant image information, and obtains discriminative multi-modal super-point features. In addition, the invention designs a selective correlation fusion module that selects reliable super-points as anchor points and then performs correlation fusion; after iterative updating, the anchor points fall in the overlapping area, so less ambiguous noise information is introduced when the super-point features are enhanced, which helps the super-points between the two point cloud frames to be matched correctly and improves the accuracy of point cloud registration.
Drawings
FIG. 1 is a schematic diagram of a model structure in a multi-modal point cloud registration method based on image and geometric information guidance;
FIG. 2 is a schematic diagram of distance embedding based on a self-attention mechanism according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the extraction of correlation information between a single super point and an anchor point in an embodiment of the present invention;
FIG. 4 is a diagram of cross-attention mechanism based correlation fusion in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a visualization of registration results according to an embodiment of the present invention;
FIG. 6 is a flowchart of the multi-modal point cloud registration method based on image and geometric information guidance according to the present invention.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
The invention provides a multi-modal point cloud registration method based on image and geometric information guidance, as shown in FIG. 6, which specifically comprises the following steps:
inputting a source point cloud and a target point cloud into a deformable convolutional neural network serving as the encoder, which extracts point cloud features while performing downsampling to obtain first super-point features; and acquiring the pixel features of the image corresponding to each point cloud through a residual network;
raising the dimension of the first super-point features to obtain second super-point features; taking the pixel features corresponding to the points filtered out around a super-point during downsampling as the pixel features of that super-point; fusing the second super-point features with the corresponding pixel features based on an attention mechanism and a multi-layer perceptron to obtain the texture features of the super-points; and concatenating the texture features of the super-points with the first super-point features to obtain third super-point features;
calculating the distance features between pairs of super-points within a point cloud, and fusing the distance features between the super-points with the third super-point features based on an attention mechanism to obtain fourth super-point features;
selecting an anchor point set from the source point cloud and the target point cloud based on a non-maximum suppression method, acquiring the distance features and angle correlations between the anchor points and the super-points, and fusing the distance features and angle correlations between the anchor points and the super-points to obtain the geometric correlation features between the super-points and the anchor points;
fusing, based on cross-attention, the fourth super-point features of the super-points in the source point cloud, the geometric correlation features between the super-points in the source point cloud and the anchor points, the fourth super-point features of the super-points in the target point cloud, and the geometric correlation features between the super-points in the target point cloud and the anchor points, to obtain fifth super-point features;
calculating the similarity between the fifth super-point features of the super-points in the source point cloud and the fifth super-point features of the super-points in the target point cloud, and screening out the K pairs with the highest similarity as the super-point pair set;
up-sampling and decoding the first super-point features of the super-points in the super-point pair set to obtain the original point features; calculating the similarity between the original points within each super-point neighbourhood, obtaining a set of original point correspondences based on the Sinkhorn algorithm, and estimating a transformation matrix from each set of original point correspondences;
and each super-point correspondence yields a transformation matrix between the source point cloud and the target point cloud, and the optimal transformation matrix is selected from the plurality of transformation matrices estimated for the super-point pair set.
In the present embodiment, a source point cloud P and a target point cloud Q are given, together with the images I_P and I_Q corresponding to the two point clouds. P and Q have a certain overlap area between them and are two frames of the same scene captured from different viewpoints. The objective is to solve for a rotation matrix R ∈ SO(3), where SO(3) is the three-dimensional rotation group, and a translation vector t ∈ R^3, under whose action the source point cloud and the target point cloud can be spliced together. R and t can be calculated from equation (1):

(R, t) = argmin_{R,t} Σ_{(p_i, q_i)} ||R·p_i + t - q_i||²   (1)

wherein the sum runs over the set of point correspondences between the source point cloud and the target point cloud, and (p_i, q_i) is a corresponding point pair in that set.
In this embodiment, a multi-modal point cloud registration method based on image and geometric information guidance is provided. As shown in FIG. 1, the method mainly includes four steps: feature extraction, multi-modal information fusion, selective correlation fusion, and relationship matching with pose estimation. This embodiment describes the method from these four steps in turn.
(I) Feature extraction
Feature extraction covers the first super-point features of the point clouds and the pixel features of the images corresponding to the point clouds, and specifically includes:
The source point cloud P ⊂ R³ and the target point cloud Q ⊂ R³ (R denotes the set of real numbers) are respectively input into two deformable convolutional neural networks with shared parameters. Downsampling is performed inside the networks while point cloud features are extracted. The super-point sets obtained after downsampling are denoted P̂ and Q̂, and the first super-point features of the two point cloud frames are denoted F^P ∈ R^{|P̂|×C} and F^Q ∈ R^{|Q̂|×C}, wherein |·| denotes the number of elements in a set, p̂_i denotes the i-th super-point of the super-point set P̂ obtained by downsampling the source point cloud, q̂_j denotes the j-th super-point of the super-point set Q̂ obtained by downsampling the target point cloud, and C denotes the feature dimension.
Image features are extracted from the images corresponding to the point clouds with a residual network. The image features corresponding to the source point cloud and the target point cloud are denoted F^{I_P} ∈ R^{W×H×C} and F^{I_Q} ∈ R^{W×H×C} respectively, wherein W denotes the width of the image and H denotes the height of the image.
(II) Multi-modal information fusion
In the point cloud registration task, a single-modal point cloud contains only geometric information (namely the first super-point features), so it cannot cope well with the challenges of weak geometric regions or the interference of repetitive structural regions. The texture information contained in the image corresponding to the point cloud can enhance the point cloud features so that they carry both geometric information and texture information, which increases the discriminability between super-points. When fusing point clouds with image information, the prior art often introduces unnecessary texture noise that blurs the point cloud features. The invention therefore designs a multi-modal feature fusion module based on local texture information; in this embodiment, the source point cloud is taken as an example to explain how the module works.
In order to fuse the multi-modal information accurately, the position of the corresponding pixel in the image is first found for each point in the point cloud. Specifically, the point cloud is generated by three-dimensional reconstruction from a number of consecutive colour maps and depth maps; considering time complexity, this embodiment uses only the first frame of pictures. Next, according to the extrinsic matrix, the points in the point cloud are transformed into the camera coordinate system at the moment the first frame of picture was taken; the transformation can be represented by equation (2):

p_i^c = R_ext·p_i + t_ext   (2)

wherein R_ext and t_ext are the rotation matrix and translation vector of the point cloud relative to the camera pose, respectively.
Because the camera imaging operation is a three-dimensional to two-dimensional transformation, the scale of the image differs to some extent from the scale of the real world. After the point p_i^c in the camera coordinate system is obtained, it must therefore be transformed into the image coordinate system with the intrinsic matrix calibrated for the camera, which can be represented by equation (3):

(x_i, y_i, z_i)^T = M_int·p_i^c   (3)

wherein M_int ∈ R^{3×3} is the intrinsic matrix of the camera.
After the point cloud is expressed in the image coordinate system, a normalisation operation is applied to each point to find the position of its corresponding pixel and obtain the pixel position (w_i, h_i). This operation can be represented by equation (4):

(w_i, h_i) = (x_i / z_i, y_i / z_i)   (4)
The projection operation finds the corresponding pixel position for each point, which creates the basic condition for multi-modal feature fusion based on local texture information.
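The projection in equations (2) to (4) can be written compactly. The following NumPy sketch only illustrates those three equations; the function name and the array layout are assumptions made for the example and are not taken from the patent.

import numpy as np

def project_points_to_pixels(points, R_ext, t_ext, M_int):
    # points: (N, 3) point coordinates; R_ext (3, 3) and t_ext (3,) are the
    # extrinsic pose of the first frame; M_int (3, 3) is the intrinsic matrix.
    cam = points @ R_ext.T + t_ext        # Eq. (2): world frame -> camera frame
    img = cam @ M_int.T                   # Eq. (3): camera frame -> image frame
    w = img[:, 0] / img[:, 2]             # Eq. (4): perspective division
    h = img[:, 1] / img[:, 2]
    return np.stack([w, h], axis=1)       # (N, 2) pixel positions (w_i, h_i)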
Because feature extraction with the kernel point convolution method is implemented through downsampling of the point cloud, each super-point has K' filtered-out points in its neighbourhood, and these K' points in turn correspond to K' pixels; each super-point therefore has K' associated pixels.
This embodiment raises the dimension of the first super-point features (so that they align with the pixel feature dimension in the subsequent attention-based fusion). The up-projected features F^P_2 ∈ R^{N×C} are taken as the second super-point features of the super-points in the point cloud, and F^A ∈ R^{N×K'×C} denotes the pixel features corresponding to the super-points, wherein N denotes the number of super-points and K' denotes the number of pixels beside each super-point.
This embodiment uses an attention mechanism to fuse a single super-point feature with the local texture information in its neighbourhood. Specifically, a learnable matrix W_q ∈ R^{C×C} (C being the dimension used in the attention mechanism) maps the second super-point feature F^P_2 to the query Q_A, and learnable matrices W_k and W_v map the pixel features F^A of the super-points in the source point cloud to the key K_A and the value V_A.
The weight matrix W = softmax(Q_A·K_A^T / √C) is then calculated; it represents the relation weight between each pixel feature in the super-point neighbourhood and the super-point feature. The texture feature F of the super-point can be obtained by equation (5):

F = MLP(W * V_A)   (5)

wherein MLP(·) is a multi-layer perceptron.
Finally, the texture feature F of the super-point is concatenated with the geometric feature of the super-point to obtain a super-point feature carrying multi-modal information, which is taken as the third super-point feature. This operation can be expressed by equation (6):

F^P3_i = cat(F^P_i, F_i)   (6)

wherein cat(·) denotes the concatenation operation, F^P_ij denotes the j-th dimension of the geometric feature of the i-th super-point in the source point cloud, and F_ij denotes the j-th dimension of the texture feature of the i-th super-point in the source point cloud.
Compared with existing multi-modal fusion methods, the invention fuses the geometric features and texture features within a local region; it fully fuses the texture information of the image without introducing excessive noise, which meets the requirement of point cloud registration for point features and fully ensures the feature discriminability between points.
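To make the local fusion of equations (5) and (6) concrete, the following PyTorch sketch fuses each super-point's second feature with the pixel features in its neighbourhood and concatenates the result with its geometric feature. It is a minimal sketch under the assumption of scaled dot-product attention; the module name, tensor shapes and the exact MLP are illustrative, not the patent's implementation.

import torch
import torch.nn as nn

class LocalTextureFusion(nn.Module):
    def __init__(self, c_geo, c):
        super().__init__()
        self.w_q = nn.Linear(c, c, bias=False)   # W_q
        self.w_k = nn.Linear(c, c, bias=False)   # W_k
        self.w_v = nn.Linear(c, c, bias=False)   # W_v
        self.mlp = nn.Sequential(nn.Linear(c, c_geo), nn.ReLU())

    def forward(self, super_feat2, pixel_feat, geo_feat):
        # super_feat2: (N, C) second super-point features
        # pixel_feat:  (N, K', C) pixel features beside each super-point
        # geo_feat:    (N, c_geo) first super-point (geometric) features
        q = self.w_q(super_feat2).unsqueeze(1)               # (N, 1, C)
        k = self.w_k(pixel_feat)                             # (N, K', C)
        v = self.w_v(pixel_feat)                             # (N, K', C)
        w = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        texture = self.mlp((w @ v).squeeze(1))               # Eq. (5): F = MLP(W * V_A)
        return torch.cat([geo_feat, texture], dim=-1)        # Eq. (6): third feature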
(III) Selective correlation fusion
The internal structure of the point cloud contains rich context information, which benefits the descriptive power of the super-point features. Here, this embodiment employs a self-attention mechanism to help perceive the context information and embed the internal structure of the point cloud into the super-point features. In FIG. 2, this embodiment takes the source point cloud as an example to explain the feature enhancement process.
Given two super-points p̂_i and p̂_j in the source point cloud, the distance feature d_{i,j} between them can be obtained from equation (7):

d_{i,j} = f( ||p̂_i - p̂_j||² / σ_d )   (7)

wherein ||p̂_i - p̂_j||² denotes the squared Euclidean distance between the two super-points, f(·) is a sinusoidal position-coding function that maps a low-dimensional feature into a high-dimensional feature, and σ_d is the distance sensitivity coefficient; when the distance between p̂_i and p̂_j exceeds this value, d_{i,j} is set to 0. Those skilled in the art may calculate this distance with other distance measures, and may map the distance feature between two super-points into the high-dimensional space with other linear or nonlinear mappings; this embodiment describes only one of the alternative implementations.
With the distance features inside the source point cloud available, the third super-point features F^P3 are fused with the distance features d. Specifically, learnable matrices W_q, W_k and W_v map F^P3 to the query Q, the key K and the value V, and a learnable matrix W_g maps the distance feature d_{i,j} to G_{i,j}. The attention score Score_{(i,j)} is then calculated using equation (8).
Next, equation (9) gives the super-point features that fuse the distance information and are invariant to rigid transformation (these are the fourth super-point features):

F^P4 = softmax(Score)·V   (9)

wherein Score is the matrix composed of the attention scores among all super-points, i.e. the element in the i-th row and j-th column of the matrix Score is Score_{(i,j)}.
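The exact form of equation (8) is not legible in this text, so the PyTorch sketch below assumes the common formulation in which the projected distance code is added to the keys inside a scaled dot-product; this placement, the sinusoidal encoding details and the even feature dimension are assumptions made for illustration, not statements about the patent.

import torch
import torch.nn as nn

def sinusoidal_embedding(x, dim):
    # f(.): map a scalar (distance or angle) to a dim-dimensional sinusoidal
    # position code; dim is assumed to be even.
    freqs = torch.pow(10000.0, -torch.arange(0, dim, 2) / dim)
    angles = x.unsqueeze(-1) * freqs
    return torch.cat([angles.sin(), angles.cos()], dim=-1)

class DistanceAwareSelfAttention(nn.Module):
    def __init__(self, c, sigma_d):
        super().__init__()
        self.wq = nn.Linear(c, c, bias=False)
        self.wk = nn.Linear(c, c, bias=False)
        self.wv = nn.Linear(c, c, bias=False)
        self.wg = nn.Linear(c, c, bias=False)     # W_g: projects the distance code
        self.sigma_d = sigma_d

    def forward(self, feats, coords):
        # feats: (N, C) third super-point features; coords: (N, 3) super-points
        c = feats.shape[-1]
        dist = torch.cdist(coords, coords)                     # pairwise distances
        g = self.wg(sinusoidal_embedding(dist ** 2 / self.sigma_d, c))
        g = g * (dist < self.sigma_d).unsqueeze(-1)            # zero beyond sigma_d
        q, k, v = self.wq(feats), self.wk(feats), self.wv(feats)
        score = (q.unsqueeze(1) * (k.unsqueeze(0) + g)).sum(-1) / c ** 0.5  # Eq. (8)
        return torch.softmax(score, dim=-1) @ v                # Eq. (9): fourth features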
In the point cloud registration task, it is important to find features that are invariant to the transformation between the two point cloud frames. Some existing methods choose to embed the distance and angle information between super-points when enhancing the super-point features, because this information does not change under a rigid transformation. However, they often embed the relation between the current super-point and every other super-point, which brings in redundant features from non-overlapping areas and reduces the feature discriminability between points. The invention therefore designs a selective correlation fusion module to solve this problem of existing methods.
Next, this embodiment uses non-maximum suppression (NMS) to select the anchor points. The method takes the density of points in the point cloud space into account, which ensures a certain sparsity among the selected anchor points and improves the performance of the later correlation fusion. The algorithm proceeds as follows (a Python sketch of this selection is given after the steps below):
(1) Input the super-point features of the source point cloud and the target point cloud fused with the multi-modal information, F^P3 (i.e. the third super-point features of the super-points in the source point cloud) and F^Q3 (i.e. the third super-point features of the super-points in the target point cloud), the filter radius r_nms, and the number K of anchor points; initialise the anchor set A to the empty set.
(2) From the super-point feature spaces of the source point cloud and the target point cloud, select the super-point pair with the maximum similarity (the similarity between two points is measured by the distance between their features: the Euclidean distance between the feature vectors is used, and the smaller the Euclidean distance between two super-point feature vectors, the greater the similarity) and put it into A; then filter out all super-points of the source point cloud within radius r_nms of the selected source super-point, and filter out all super-points of the target point cloud within radius r_nms of the selected target super-point.
(3) Repeat (2) until |A| = K, and output A.
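A minimal NumPy sketch of steps (1) to (3) is given below. Feature similarity is taken as the Euclidean distance between super-point feature vectors, as described above; the function signature and the array layout are assumptions made for the example.

import numpy as np

def select_anchors(src_feats, tgt_feats, src_pts, tgt_pts, k, r_nms):
    # src_feats/tgt_feats: (Ns, C)/(Nt, C) third super-point features
    # src_pts/tgt_pts: (Ns, 3)/(Nt, 3) super-point coordinates
    src_alive = np.ones(len(src_pts), dtype=bool)
    tgt_alive = np.ones(len(tgt_pts), dtype=bool)
    anchors = []
    while len(anchors) < k and src_alive.any() and tgt_alive.any():
        # step (2): closest pair in feature space among the remaining super-points
        d = np.linalg.norm(
            src_feats[src_alive][:, None] - tgt_feats[tgt_alive][None], axis=-1)
        i_loc, j_loc = np.unravel_index(d.argmin(), d.shape)
        i = np.flatnonzero(src_alive)[i_loc]
        j = np.flatnonzero(tgt_alive)[j_loc]
        anchors.append((i, j))
        # filter out every super-point within r_nms of the selected pair
        src_alive &= np.linalg.norm(src_pts - src_pts[i], axis=-1) > r_nms
        tgt_alive &= np.linalg.norm(tgt_pts - tgt_pts[j], axis=-1) > r_nms
    return anchors    # step (3): repeated until K anchor pairs are collected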
After the anchor point set A is obtained, correlation features between the anchor points and the super-points are fused: the distance and angle relations between the anchor points and the super-points are used to enhance the super-point features so that they become more discriminative.
The anchor points in the source point cloud are denoted a_i. For a super-point p̂ in the source point cloud, the distance ρ_i between it and the i-th anchor point can be obtained with equation (10):

ρ_i = ||p̂ - a_i||   (10)
in this embodiment, the geometric correlation feature of the super point and the anchor point includes two parts, one part is the distance feature between the super point and the anchor point, and the other part is the angle correlation feature between the source point cloud set and the anchor point set, and in fig. 3, taking the case of including three anchor points as an example, the distance between each anchor point and the super point and the included angle formed between one super point and two anchor points are obtained by mapping respectively to obtain the distance feature between the super point and the anchor point and the angle correlation feature between the source point cloud set and the anchor point set.
Then, as in the previous method, the distances are mapped into a high-dimensional space with the sinusoidal position function to obtain the distance features D ∈ R^{K×C} between the super-point and the anchor points, as expressed by equation (11):

D_i = f(ρ_i)·W_d   (11)

wherein K is the number of anchor points and W_d ∈ R^{C×C} is a learnable matrix used to project the distance features.
In the geometric correlation features, besides the distance correlation between points, the angle correlation between points is also considered, so the angle correlation features between the super-points and the anchor points are extracted next.
For a super-point p̂ in the source point cloud, it is first fixed as the vertex, and the angle it forms with a pair of anchor points is then calculated with equation (12):

θ_k = deg( a_l - p̂ , a_s - p̂ )   (12)

wherein θ_k denotes the angle between the k-th pair of anchor points and the super-point, a_l denotes the l-th anchor point, and a_s denotes the s-th anchor point; the formula first constructs the two vectors from the super-point to the two anchor points and then calculates the angle between the two vectors with deg(·).
After the angle between a super-point and a pair of anchor points is obtained, it is mapped into a high-dimensional space with the sinusoidal position function, and the angle correlation features Θ ∈ R^{K(K-1)/2×C} between the source point cloud and the anchor point set are obtained with equation (13):

Θ_k = f(θ_k / σ_θ)·W_a   (13)

wherein σ_θ is the angle sensitivity coefficient, W_a ∈ R^{C×C} is a learnable matrix used to project the angle features, and K(K-1)/2 is the number of anchor point pairs, i.e. the number of 2-combinations of the K anchor points.
After the distance correlation features D and the angle correlation features Θ between the super-point and the anchor points are obtained, the geometric correlation feature between the super-point and the anchor points is obtained with equation (14), which fuses the two parts.
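The raw distance and angle quantities of equations (10) and (12) for one super-point can be computed as below; the sinusoidal mapping and the learnable projections of equations (11), (13) and (14) are deliberately left out, and the function name is an assumption.

import numpy as np
from itertools import combinations

def superpoint_anchor_geometry(p, anchors):
    # p: (3,) one super-point; anchors: (K, 3) the anchor point set
    dists = np.linalg.norm(anchors - p, axis=-1)             # rho_i, Eq. (10)
    angles = []
    for l, s in combinations(range(len(anchors)), 2):
        v1, v2 = anchors[l] - p, anchors[s] - p               # the two vectors from p
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))     # theta_k, Eq. (12)
    return dists, np.asarray(angles)                          # K distances, K(K-1)/2 angles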
The correlation features between the super-points and the anchor points in the target point cloud are obtained in the same way as for the source point cloud.
After the correlation features are calculated, a cross-attention mechanism is used to exchange information between the two point cloud frames, as shown in FIG. 4. Specifically, equation (15) is used to map the super-point features of the source point cloud, the super-point-to-anchor correlation features in the source point cloud, the super-point features of the target point cloud, and the super-point-to-anchor correlation features in the target point cloud.
The attention score between the n-th super-point in the source point cloud and the m-th super-point in the target point cloud is then calculated with equation (16).
Finally, equation (17) gives the super-point features of the source point cloud after the correlation features have been fused and information has been exchanged with the target point cloud (i.e. the fifth super-point features of the super-points in the source point cloud).
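Equations (15) to (17) are not legible in this text. The sketch below assumes one simple way of realising the exchange: the super-point/anchor correlation features are added to the queries and keys before a standard cross-attention from the source super-points to the target super-points. This is an assumed formulation for illustration only, not the patent's equations.

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.wq = nn.Linear(c, c, bias=False)
        self.wk = nn.Linear(c, c, bias=False)
        self.wv = nn.Linear(c, c, bias=False)

    def forward(self, src_feat, src_corr, tgt_feat, tgt_corr):
        # src_feat: (N, C) fourth features of the source super-points
        # src_corr: (N, C) source super-point/anchor correlation features
        # tgt_feat: (M, C) fourth features of the target super-points
        # tgt_corr: (M, C) target super-point/anchor correlation features
        q = self.wq(src_feat + src_corr)                  # Eq. (15): mappings
        k = self.wk(tgt_feat + tgt_corr)
        v = self.wv(tgt_feat)
        score = q @ k.T / src_feat.shape[-1] ** 0.5       # Eq. (16): attention score
        return torch.softmax(score, dim=-1) @ v           # Eq. (17): fifth features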
it should be noted that the above anchor-based relevance fusion module is iterative. After the super point is fused with the correlation characteristic, the anchor point is reselected from the super point, the correlation characteristic between the super point and the anchor point is calculated, and finally the correlation characteristic is fused into the super point by using a cross attention mechanism, so that the purpose of enhancing the super point characteristic is realized, and the iteration times reach the upper limit.
(IV) Relationship matching and pose estimation
After the fifth super-point features of the source point cloud and the target point cloud are obtained, they are normalised, and the super-point similarity matrix is calculated from the normalised features. The problem of finding the accurate super-point correspondences is thereby converted into an optimal transport problem, and the set of corresponding super-point pairs with the highest feature similarity is selected from the source point cloud and the target point cloud according to the super-point similarity matrix, as defined in equation (18), wherein the K pairs with the greatest similarity are selected.
After the super-point level correspondences are obtained, the first super-point features produced by the deformable kernel point convolution downsampling encoder are up-sampled and decoded. To obtain point-level matches, a point similarity matrix S is calculated within each super-point neighbourhood from the decoded point features, and the set of point correspondences within the i-th pair of super-point correspondences is then obtained with the Sinkhorn algorithm, as defined in equation (19).
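The Sinkhorn step named above can be realised with plain alternating row and column normalisation in log space; the sketch below omits the dustbin row and column that some implementations add, and is only an illustration of the algorithm, not the patent's exact equation (19).

import torch

def sinkhorn(log_scores, n_iters=100):
    # log_scores: (m_i, n_i) log point-similarity matrix within one super-point
    # neighbourhood; returns an approximately doubly-normalised assignment.
    log_p = log_scores
    for _ in range(n_iters):
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)   # row step
        log_p = log_p - torch.logsumexp(log_p, dim=0, keepdim=True)   # column step
    return log_p.exp()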
in order to realize accurate and efficient transform matrix estimation, the embodiment adopts a local-global transform matrix estimation method. Firstly, the transformation matrix estimation of the high confidence coefficient region is carried out, namely, in the region of each super-point corresponding relation, the transformation matrix estimation is carried out according to the point corresponding relation setEstimating a transformation matrix, and estimating a rotation matrix R of the i-th pair of super-point corresponding relations i And translation vector t i The result can be calculated by equation (20):
wherein R is i Representing a rotation matrix obtained by estimating an ith pair of super-point areas in the super-point pair set; t is t i Representing a translation vector estimated by the ith pair of super points in the second point pair set;representing an original point pair set within the super point pair region; w (w) j A weight representing the jth original point pair; (p) xj ,q yj ) Representing coordinates of two points in the jth original point pair; r 'and t' each represent a group according to +.>The estimated coarse rotation matrix and translation vector.
The optimal rotation matrix and translation vector are then selected from the estimated rotation matrices and translation vectors, expressed as equation (21):

(R, t) = argmax_{(R_i, t_i)} Σ_{(p_k, q_k) ∈ PC} [ ||R_i·p_k + t_i - q_k|| < τ_a ]   (21)

wherein R denotes the optimal rotation matrix; t denotes the optimal translation vector; PC denotes the set formed by all original point pairs; (p_k, q_k) denotes the coordinates of the two points in the k-th original point pair; τ_a denotes the distance error threshold; and [·] is the Iverson bracket, whose value is 1 if the condition inside the bracket is satisfied and 0 otherwise.
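Equation (20) is a weighted least-squares alignment and equation (21) counts inliers with the Iverson bracket. The NumPy sketch below uses the standard weighted SVD (Kabsch) solution for the former and simple inlier counting for the latter; the function names and the candidate layout are assumptions made for illustration.

import numpy as np

def weighted_kabsch(p, q, w):
    # Eq. (20): weighted rigid alignment p -> q; p, q: (M, 3), w: (M,) weights.
    w = w / (w.sum() + 1e-12)
    pc, qc = (w[:, None] * p).sum(0), (w[:, None] * q).sum(0)   # weighted centroids
    h = (w[:, None] * (p - pc)).T @ (q - qc)                    # cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))                      # avoid reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, qc - r @ pc

def pick_best_transform(candidates, p_all, q_all, tau):
    # Eq. (21): keep the (R, t) with the most correspondences whose residual
    # ||R p_k + t - q_k|| is below the distance error threshold tau.
    def inliers(rt):
        r, t = rt
        return (np.linalg.norm(p_all @ r.T + t - q_all, axis=-1) < tau).sum()
    return max(candidates, key=inliers)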
In this embodiment, the learnable matrices used in the four steps of feature extraction, multi-modal information fusion, selective correlation fusion, and relationship matching with pose estimation need to be trained. The training loss function consists of two parts: the super-point matching loss function and the point matching loss function.
The super-point matching loss supervises the correct matching between super-points. First, if a super-point in the source point cloud and a super-point in the target point cloud overlap by at least 10%, this embodiment takes the pair as a positive sample; otherwise the correspondence is taken as a negative sample. Among all super-points of the source point cloud, the super-points that have at least one positive sample are then selected to form a set A_s. Each super-point belonging to A_s forms a set ε_p with its positive sample pairs in the target point cloud and a set ε_n with its negative sample pairs in the target point cloud. The super-point matching loss in the source point cloud is finally calculated by equation (22), wherein d_ij is the Euclidean distance in feature space between the feature of a super-point in the source point cloud and the feature of a super-point in the target point cloud, λ_ij represents the overlap ratio between the two super-points, the hyper-parameters Δ_p and Δ_n are set to 0.1 and 1.4 respectively, and β_p and β_n are the positive and negative sample proportions. The super-point matching loss of the target point cloud is calculated in the same way as that of the source point cloud, and the final total super-point matching loss is calculated by equation (23) as the average of the two.
the point matching loss is calculated by a negative log likelihood function, which is used to monitor the transformation matrix calculated from the point correspondence in each super-point region. During the training phase, N is selected g For the real super point corresponding relation, and calculating the real point corresponding relation set in the ith super point regionThen dividing the unmatched correct point correspondence into two sets according to the source point cloud and the target point cloud>And->Finally, the point matching loss function in the ith pair of super-point areas can be calculated by the formula (24):
wherein S is i (x, y) represents the similarity score between the x-th and y-th original points in the super-point region, and the similarity matrixThe calculation formulas of (a) are the same, S i (x, y) is calculated from the characteristics of the two original points.
Finally, the point matching loss function over all super-point correspondences is defined by equation (25).
unless otherwise specified, the similarity between two features in the present invention is measured by using the euclidean distance between the two feature vectors, i.e., the smaller the euclidean distance, the greater the similarity. Other metrics known in the art may be used by those skilled in the art to measure the similarity between two feature vectors.
FIG. 5 shows a comparison between the present invention and the prior art. The first column (a) of FIG. 5 shows the inputs; this embodiment takes as inputs point cloud pairs with overlap rates of 42.8%, 43.5%, 29.6%, 12.5%, 22.7%, 14.2% and 27.2%. The second column (b) shows the corresponding ground-truth poses, the third column (c) shows the poses obtained by existing point cloud matching, and the fourth column (d) shows the poses obtained by the present invention. The comparison shows that, at every overlap rate, the pose obtained by the matching of the present invention is closer to the ground-truth pose than the matching result of the prior art.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A multi-modal point cloud registration method based on image and geometric information guidance, characterized by comprising the following steps:
inputting a source point cloud and a target point cloud into a deformable convolutional neural network serving as the encoder, which extracts point cloud features while performing downsampling to obtain first super-point features; and acquiring the pixel features of the image corresponding to each point cloud through a residual network;
raising the dimension of the first super-point features to obtain second super-point features; taking the pixel features corresponding to the points filtered out around a super-point during downsampling as the pixel features of that super-point; fusing the second super-point features with the corresponding pixel features based on an attention mechanism and a multi-layer perceptron to obtain the texture features of the super-points; and concatenating the texture features of the super-points with the first super-point features to obtain third super-point features;
calculating the distance features between pairs of super-points within a point cloud, and fusing the distance features between the super-points with the third super-point features based on an attention mechanism to obtain fourth super-point features;
selecting an anchor point set from the source point cloud and the target point cloud based on a non-maximum suppression method, acquiring the distance features and angle correlations between the anchor points and the super-points, and fusing the distance features and angle correlations between the anchor points and the super-points to obtain the geometric correlation features between the super-points and the anchor points;
fusing, based on cross-attention, the fourth super-point features of the super-points in the source point cloud, the geometric correlation features between the super-points in the source point cloud and the anchor points, the fourth super-point features of the super-points in the target point cloud, and the geometric correlation features between the super-points in the target point cloud and the anchor points, to obtain fifth super-point features;
calculating the similarity between the fifth super-point features of the super-points in the source point cloud and the fifth super-point features of the super-points in the target point cloud, and screening out the K pairs with the highest similarity as the super-point pair set;
up-sampling and decoding the first super-point features of the super-points in the super-point pair set to obtain the original point features; calculating the similarity between the original points within each super-point neighbourhood, obtaining a set of original point correspondences based on the Sinkhorn algorithm, and estimating a transformation matrix from each set of original point correspondences;
and each super-point correspondence yields a transformation matrix between the source point cloud and the target point cloud, and the optimal transformation matrix is selected from the plurality of transformation matrices estimated for the super-point pair set.
2. The method for multi-modal point cloud registration based on image and geometric information guidance according to claim 1, wherein the process of obtaining the pixel characteristics of the super-point comprises:
transforming a point in a point cloud into the camera coordinate system at the moment the first frame of picture is taken, and transforming the point into the image coordinate system based on the intrinsic matrix calibrated for the camera;
carrying out homogeneous operation on points under an image coordinate system to obtain pixel positions of one point in the image;
if K' points in the neighbourhood of a super-point are filtered out during the downsampling process, calculating the pixel positions corresponding to these K' points, indexing the pixel features according to the pixel positions, and taking the K' pixel features as the pixel features of the super-point.
3. The method for multi-modal point cloud registration based on image and geometric information guidance according to claim 1 or 2, wherein the process of fusing the second super-point features with their corresponding pixel features based on an attention mechanism and a multi-layer perceptron comprises:
mapping the second super-point feature of a super-point to a query vector Q_A of the attention mechanism, and mapping the pixel features corresponding to the super-point to a key vector K_A and a value vector V_A respectively;
calculating the attention weight matrix through the attention mechanism, namely:

W = softmax( Q_A·K_A^T / √C )

fusing the attention weight matrix W with the value vector V_A based on the multi-layer perceptron to obtain the texture feature of the super-point, namely:

F = MLP(W * V_A)

wherein W is the attention weight matrix between the second super-point feature and the corresponding pixel features, V_A is the value vector, F denotes the texture feature of the super-point, MLP(·) is the multi-layer perceptron, and C is the dimension of the hidden layer of the attention mechanism.
4. The method of image and geometric information guided multi-modal point cloud registration according to claim 1, wherein the distance features between two superpoints are obtained by calculating the distance between two superpoints and mapping the distance to a high-dimensional space as the distance features.
5. The method for multi-modal point cloud registration based on image and geometric information guidance according to claim 1, wherein the process of fusing the distance feature between two superpoints with the third superpoint feature based on the attention mechanism to obtain the fourth superpoint feature comprises:
mapping the third super-point features through three mapping matrices to obtain the query vector Q, the key vector K and the value vector V, mapping the distance feature between two super-points to the vector G through the matrix W_g, and calculating the attention score:
calculating a fourth superpoint feature according to the attention score:
wherein C represents the dimension of the hidden layer of the attention mechanism; Score_(r,m) is the attention score between the r-th and m-th super-points; and Score is the matrix of attention scores between all super-points.
6. The method for multi-modal point cloud registration based on image and geometric information guidance according to claim 1, wherein the process of selecting anchor points comprises the steps of:
101. setting a filter radius r_nms and the number K of anchor pairs in the anchor point set, taking the third super-point features of the super-points as input, and initialising the anchor point set A as an empty set;
102. calculating the Euclidean distances between the super-point feature vectors of the source point cloud and the target point cloud to obtain the super-point feature similarities, selecting the super-point pair with the maximum similarity and putting it into the set A, filtering out all super-points of the source point cloud within radius r_nms of the selected source super-point, and filtering out all super-points of the target point cloud within radius r_nms of the selected target super-point;
103. repeating step 102 until |A| = K, and outputting the anchor point set;

wherein |·| represents the number of elements in a set.
7. The method for multi-modal point cloud registration based on image and geometric information guidance according to claim 1, wherein the angle correlation between the anchor points and the super-points is obtained by taking a super-point as the vertex, calculating the angle between the two three-dimensional space vectors formed from the super-point to any two anchor points, and mapping the angle into a high-dimensional space as the angle correlation feature.
8. The method for multi-modal point cloud registration based on image and geometric information guidance according to claim 1, wherein the fifth super-point features are expressed as:
wherein F^P5 is the fifth super-point feature of the source point cloud; F^P4 is the fourth super-point feature of the source point cloud; F^Q4 is the fourth super-point feature of the target point cloud; G^P is the geometric correlation feature between the source point cloud and the anchor points; G^Q is the geometric correlation feature between the target point cloud and the anchor points; and C is the dimension of the hidden layer of the attention mechanism.
9. The method for multi-modal point cloud registration based on image and geometric information guidance according to claim 1, wherein the calculation process of the rotation matrix and the translation vector comprises:
estimating a rotation matrix and a translation vector according to the original point correspondences within each super-point region in the super-point pair set, expressed as:

R_i, t_i = argmin_{R', t'} Σ_j w_j ||R'·p_{x_j} + t' - q_{y_j}||²

wherein R_i denotes the rotation matrix estimated from the i-th super-point pair region in the super-point pair set; t_i denotes the translation vector estimated from the i-th super-point pair; the sum runs over the set of original point pairs within the super-point pair region; w_j denotes the weight of the j-th original point pair; (p_{x_j}, q_{y_j}) denotes the coordinates of the two points in the j-th original point pair; and R' and t' denote the coarse rotation matrix and translation vector estimated from that set;
selecting the optimal rotation matrix and translation vector from the estimated plurality of rotation matrices and translation vectors, expressed as:

(R, t) = argmax_{(R_i, t_i)} Σ_{(p_k, q_k) ∈ PC} [ ||R_i·p_k + t_i - q_k|| < τ_a ]

wherein R represents the optimal rotation matrix; t represents the optimal translation vector; PC represents the set formed by all original point pairs; (p_k, q_k) represents the coordinates of the two points in the k-th original point pair; τ_a represents the distance error threshold; and [·] is the Iverson bracket, whose value is 1 if the condition inside the bracket is satisfied and 0 otherwise.
10. The method for multi-modal point cloud registration based on image and geometric information guidance according to claim 9, wherein training the trainable parameters with the loss function in the process of obtaining the optimal transformation matrix comprises:
calculating the super-point matching loss in the source point cloud and the target point cloud respectively, taking the average value of the super-point matching losses of the source point cloud and the target point cloud as the super-point matching loss in the super-point estimation process, and expressing the matching loss of one super-point in the source point cloud as:
$\mathcal{L}_c^{\mathcal{P}} = \frac{1}{|\mathcal{A}|} \sum_{\hat{p}_i \in \mathcal{A}} \log\!\left[ 1 + \sum_{\hat{q}_j \in \varepsilon_p^i} e^{\,\lambda_i^j \beta_p^{i,j} \left( d_i^j - \Delta_p \right)} \cdot \sum_{\hat{q}_k \in \varepsilon_n^i} e^{\,\beta_n^{i,k} \left( \Delta_n - d_i^k \right)} \right]$

wherein $\mathcal{L}_c^{\mathcal{P}}$ represents the matching loss of one super point in the source point cloud; $\mathcal{A}$ represents the set of super points in the source point cloud that have at least one positive-sample super-point pair, a super-point pair being a positive sample if the super point in the source point cloud and the super point in the target point cloud overlap by at least 10%, and the remaining super-point pairs being negative samples; for a super point $\hat{p}_i$ in $\mathcal{A}$, the super points of the target point cloud in all of its positive samples form the set $\varepsilon_p^i$, and the super points of the target point cloud in all of its negative samples form the set $\varepsilon_n^i$; $\hat{q}_j$ represents a super point $j$ in $\varepsilon_p^i$; $\hat{q}_k$ represents a super point $k$ in $\varepsilon_n^i$; $o_i^j$ represents the overlap rate between super point $i$ of the source point cloud and super point $j$ of the target point cloud, and $\lambda_i^j = (o_i^j)^{1/2}$; $\beta_p^{i,j}$ is the positive-sample weight, expressed as $\beta_p^{i,j} = \gamma (d_i^j - \Delta_p)$, where $\gamma$ is the scaling function, $d_i^j$ is the Euclidean distance in feature space between the fifth super-point feature of super point $i$ in the source point cloud and that of super point $j$ in the target point cloud, and $\Delta_p$ is the positive-sample hyperparameter; $\beta_n^{i,k}$ is the negative-sample weight, expressed as $\beta_n^{i,k} = \gamma (\Delta_n - d_i^k)$, where $d_i^k$ is the Euclidean distance in feature space between the super-point features of super point $i$ in the source point cloud and super point $k$ in the target point cloud, and $\Delta_n$ is the negative-sample hyperparameter;
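The super-point matching loss is given as an image in the original; as a hedged reconstruction, the PyTorch sketch below implements an overlap-aware circle loss over the quantities named above (feature distances d, overlap rates, margins delta_p/delta_n, scaling gamma); the function name circle_loss and the default hyperparameter values are assumptions.

```python
import torch
import torch.nn.functional as F

def circle_loss(d, overlap, delta_p=0.1, delta_n=1.4, gamma=24.0, overlap_thresh=0.1):
    """Overlap-aware circle loss sketch.
    d:       (N, M) Euclidean distances between source and target super-point features.
    overlap: (N, M) overlap rates; pairs with overlap >= overlap_thresh are positive samples."""
    pos = overlap >= overlap_thresh
    neg = ~pos
    beta_p = gamma * (d - delta_p)                      # positive-sample weight
    beta_n = gamma * (delta_n - d)                      # negative-sample weight
    lam = overlap.clamp(min=0.0).sqrt()                 # overlap-based weighting
    neg_inf = torch.full_like(d, -1e9)                  # mask out non-members of each set
    pos_term = torch.where(pos, lam * beta_p * (d - delta_p), neg_inf)
    neg_term = torch.where(neg, beta_n * (delta_n - d), neg_inf)
    # log(1 + sum_pos * sum_neg), computed stably as softplus(logsumexp + logsumexp)
    loss = F.softplus(torch.logsumexp(pos_term, dim=1) + torch.logsumexp(neg_term, dim=1))
    has_pos = pos.any(dim=1)                            # only super points with >= 1 positive pair
    return loss[has_pos].mean()
```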
in the process of matching each super point, N is selected g Computing the real point corresponding relation set in the ith super point region for the real super point corresponding relationDividing the unmatched correct point correspondence into two sets according to a source point cloud and a target point cloud>And->The i-th pair of point matching loss functions in the super-point region represents:
the loss function for all superpoints is expressed as:
wherein i represents the corresponding relation of the ith super point, S i (x, y) represents a similarity score between the x-th and y-th original points in the super-point region; m is m i Representing the number of original points in the source point cloud; n is n i Representing the number of original points in the target point cloud.
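Finally, a hedged PyTorch sketch of the point matching loss described above, assuming each S_i is stored as an (m_i+1) x (n_i+1) matching-probability matrix whose last row and column act as slack entries for unmatched points; the function name point_matching_loss and the input layout are assumptions.

```python
import torch

def point_matching_loss(score_mats, gt_corr, unmatched_src, unmatched_tgt):
    """Average negative log-likelihood over N_g super-point regions.
    score_mats:    list of (m_i+1, n_i+1) matching probabilities S_i (last row/column are slack).
    gt_corr:       list of (g_i, 2) index pairs (x, y) of true point correspondences.
    unmatched_src: list of index tensors of source points without a correspondence.
    unmatched_tgt: list of index tensors of target points without a correspondence."""
    losses = []
    for S, corr, ui, uj in zip(score_mats, gt_corr, unmatched_src, unmatched_tgt):
        m, n = S.shape[0] - 1, S.shape[1] - 1
        loss = -torch.log(S[corr[:, 0], corr[:, 1]] + 1e-12).sum()   # matched point pairs
        loss = loss - torch.log(S[ui, n] + 1e-12).sum()              # unmatched source -> slack column
        loss = loss - torch.log(S[m, uj] + 1e-12).sum()              # unmatched target -> slack row
        losses.append(loss)
    return torch.stack(losses).mean()
```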
CN202310921353.4A 2023-07-25 2023-07-25 Multi-mode point cloud registration method based on image and geometric information guidance Active CN117095033B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310921353.4A CN117095033B (en) 2023-07-25 2023-07-25 Multi-mode point cloud registration method based on image and geometric information guidance


Publications (2)

Publication Number Publication Date
CN117095033A true CN117095033A (en) 2023-11-21
CN117095033B CN117095033B (en) 2024-05-24

Family

ID=88768972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310921353.4A Active CN117095033B (en) 2023-07-25 2023-07-25 Multi-mode point cloud registration method based on image and geometric information guidance

Country Status (1)

Country Link
CN (1) CN117095033B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040001620A1 (en) * 2002-06-26 2004-01-01 Moore Ronald W. Apparatus and method for point cloud assembly
US20080123927A1 (en) * 2006-11-16 2008-05-29 Vanderbilt University Apparatus and methods of compensating for organ deformation, registration of internal structures to images, and applications of same
CN109887028A (en) * 2019-01-09 2019-06-14 天津大学 A kind of unmanned vehicle assisted location method based on cloud data registration
US20210334988A1 (en) * 2019-01-30 2021-10-28 Baidu Usa Llc A rgb point clouds based map generation system for autonomous vehicles
US20210356562A1 (en) * 2020-05-15 2021-11-18 Baidu Usa Llc A detector for point cloud fusion
CN114119549A (en) * 2021-11-26 2022-03-01 卡本(深圳)医疗器械有限公司 Multi-modal medical image three-dimensional point cloud registration optimization method
US20220285009A1 (en) * 2019-08-16 2022-09-08 Z Imaging Systems and methods for real-time multiple modality image alignment
CN115661218A (en) * 2022-11-02 2023-01-31 北京数字绿土科技股份有限公司 Laser point cloud registration method and system based on virtual super point
CN115797408A (en) * 2022-11-30 2023-03-14 清华大学 Target tracking method and device fusing multi-view image and three-dimensional point cloud
CN116228825A (en) * 2023-01-29 2023-06-06 重庆邮电大学 Point cloud registration method based on significant anchor point geometric embedding
CN116468764A (en) * 2023-06-20 2023-07-21 南京理工大学 Multi-view industrial point cloud high-precision registration system based on super-point space guidance


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BANGBANG YANG et al.: "Hybrid3D: learning 3D hybrid features with point clouds and multi-view images for point cloud registration", Science China Information Sciences, 29 June 2023 (2023-06-29), pages 1 - 17 *
LI XIAOXI: "Research on 3D Point Cloud Semantic Segmentation Technology Based on Deep Learning", Electronic Journal of China Excellent Master's Theses, 15 January 2022 (2022-01-15), pages 1 - 81 *

Also Published As

Publication number Publication date
CN117095033B (en) 2024-05-24

Similar Documents

Publication Publication Date Title
CN110738697B (en) Monocular depth estimation method based on deep learning
CN111325797B (en) Pose estimation method based on self-supervision learning
CN111612807B (en) Small target image segmentation method based on scale and edge information
CN113240691A (en) Medical image segmentation method based on U-shaped network
CN114565655B (en) Depth estimation method and device based on pyramid segmentation attention
CN108154066B (en) Three-dimensional target identification method based on curvature characteristic recurrent neural network
CN116310098A (en) Multi-view three-dimensional reconstruction method based on attention mechanism and variable convolution depth network
Kim et al. Adversarial confidence estimation networks for robust stereo matching
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
Abdulwahab et al. Monocular depth map estimation based on a multi-scale deep architecture and curvilinear saliency feature boosting
CN114049541A (en) Visual scene recognition method based on structural information characteristic decoupling and knowledge migration
Jiang et al. Contrastive learning of features between images and lidar
CN112463999A (en) Visual position identification method and device, computer equipment and readable storage medium
CN117351078A (en) Target size and 6D gesture estimation method based on shape priori
CN115760807B (en) Retina fundus image registration method and system
CN116958958A (en) Self-adaptive class-level object attitude estimation method based on graph convolution double-flow shape prior
CN116823908A (en) Monocular image depth estimation method based on multi-scale feature correlation enhancement
CN117095033B (en) Multi-mode point cloud registration method based on image and geometric information guidance
Chen et al. Recovering fine details for neural implicit surface reconstruction
CN113298094B (en) RGB-T significance target detection method based on modal association and double-perception decoder
CN115330935A (en) Three-dimensional reconstruction method and system based on deep learning
CN114998630A (en) Ground-to-air image registration method from coarse to fine
CN114764880A (en) Multi-component GAN reconstructed remote sensing image scene classification method
CN114972937A (en) Feature point detection and descriptor generation method based on deep learning
CN115272450A (en) Target positioning method based on panoramic segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant