CN116912296A - Point cloud registration method based on position-enhanced attention mechanism - Google Patents
Point cloud registration method based on position-enhanced attention mechanism

- Publication number: CN116912296A
- Application number: CN202310917905.4A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/33 — Determination of transform parameters for the alignment of images (image registration) using feature-based methods
- G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
- G06T2207/10028 — Range image; depth image; 3D point clouds
- G06N3/042 — Knowledge-based neural networks; logical representations of neural networks
- G06N3/045 — Combinations of networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Learning methods
- G06V10/44 — Local feature extraction by analysis of parts of the pattern
- G06V10/52 — Scale-space analysis, e.g. wavelet analysis
- G06V10/757 — Matching configurations of points or features
- G06V10/761 — Proximity, similarity or dissimilarity measures
- G06V10/806 — Fusion of extracted features at the feature extraction level
- G06V10/82 — Image or video recognition or understanding using neural networks
Abstract
The application discloses a point cloud registration method based on a position-enhanced attention mechanism, comprising the following steps: first, multi-scale features of a source point cloud and a target point cloud are extracted; second, position information of the source and target point clouds is extracted, context feature information is learned from the multi-scale features, and the context feature information and the position information are fused to obtain fused feature information; feature interaction is then performed between the fused feature information of the source and target point clouds to generate hybrid features; outlier parameters are generated according to the alignment state of the source and target point clouds, and the correspondence between the source point cloud and all points in the target point cloud is obtained using the outlier parameters and the hybrid features; finally, singular value decomposition is used to obtain a rotation matrix and a translation matrix between the source and target point clouds, which are solved iteratively to obtain the final rotation and translation matrices and complete the registration process.
Description
Technical Field
The application relates to the field of three-dimensional point cloud registration in deep learning and computer vision, in particular to a three-dimensional point cloud registration method based on a position-enhanced attention mechanism.
Background
Point cloud registration is the process of transforming point clouds acquired by a scanning device from different viewpoints into the same coordinate system through rotation, translation, and similar operations; it is widely applied in pose estimation, three-dimensional reconstruction, mobile robotics, and other fields. In practice, however, point cloud registration remains challenging, mainly because (1) point clouds scanned from different viewpoints suffer from noise and partial invisibility, and (2) point clouds are unordered and sparse. Improving the accuracy and robustness of registration algorithms is therefore essential in practical registration tasks.
According to how the data are represented, existing point cloud registration methods can be divided into voxel-based, multi-view-based, and point-based methods. Since the first two cause information loss, point-based methods are currently the most widely used. The PointNet algorithm addressed the unordered nature and rotation invariance of point clouds, and the point cloud registration problem was subsequently generalized to deep-learning-based methods. Compared with the traditional iterative-closest-point registration approach, deep-learning-based methods avoid falling into local optima. However, when the point cloud contains noise or is partially missing, deep-learning-based methods fail to maintain a good registration result.
Among existing point cloud registration methods, conventional approaches such as the Iterative Closest Point (ICP) algorithm tend to fall into local optima. Deep-learning-based methods, for example RPM-Net (Robust Point Matching using Learned Features), achieve effective registration under moderate noise and when a small part of the point cloud is missing, but when the missing portion reaches 30% the registration result degrades. The reason is that current deep-learning-based registration methods focus only on local geometric features of the point cloud: they lack an understanding of the global state and consider neither feature interaction between the source and target point clouds nor the position information of corresponding points, so the learned features are not discriminative enough, leading to a large number of incorrect point correspondences.
Therefore, improving the registration result when the point cloud contains noise and 30% or more of it is missing is a difficult problem that remains to be solved.
Disclosure of Invention
The aim of the application is to provide a point cloud registration method based on a position-enhanced attention mechanism, so that the network can learn the context information of each point cloud while incorporating its geometric structure, obtaining features with stronger geometric relevance and thereby improving registration performance.
To achieve this, the application adopts the following technical scheme:
a point cloud registration method based on a position-enhanced attention mechanism, comprising:
first, inputting the source point cloud and target point cloud data into an adaptive graph convolution feature extraction module and extracting multi-scale features of the source and target point clouds;
second, inputting the multi-scale features of the source and target point clouds into a position-enhanced attention mechanism module, extracting the position information of the source and target point clouds, learning context feature information from the multi-scale features, and fusing the context feature information with the position information to obtain fused feature information; then performing feature interaction between the fused feature information of the source and target point clouds to generate hybrid features;
then, obtaining the alignment state of the source and target point clouds from their three-dimensional coordinates, inputting the alignment state into an outlier parameter module to generate outlier parameters, and inputting the outlier parameters together with the hybrid features into a similarity matching module to obtain the correspondence between the source point cloud and all points in the target point cloud;
finally, using singular value decomposition to obtain a rotation matrix and a translation matrix between the source and target point clouds, solving iteratively to obtain the final rotation and translation matrices, and completing the registration process.
Further, performing the iterative solution to obtain the final rotation and translation matrices comprises:
computing a loss function between the obtained rotation and translation matrices and the ground-truth rotation and translation matrices provided by the dataset; if the loss has not converged, multiplying the source point cloud by the rotation matrix and adding the translation matrix to obtain a new source point cloud, and continuing with a new round of iterative registration; if the loss has converged, outputting the rotation and translation matrices.
Further, the position-enhanced attention mechanism module operates as follows:
context information is extracted from the multi-scale features of the source point cloud; the position information of the source point cloud is extracted directly and spliced with the context information; the spliced features and the position information are input into the self-attention module within the position-enhanced attention module, where the context features and the position information interact to produce the fused feature information of the source point cloud (the target point cloud is processed in the same way); finally, the fused feature information of the source and target point clouds is input into a cross-attention module for feature interaction between the two clouds, and the hybrid features are output.
Further, the position information is encoded from the distances between points and the normal vector coordinates. For any two points p_j and p_j' in the source point cloud, the spatial distance between them is the Euclidean distance d_jj' = ‖p_j − p_j'‖₂. For the normal vector information, a linear layer directly encodes the spatial normal vectors of the points in the source point cloud; the spatial distance information of each point is then spliced with its encoded normal vector information to obtain the point's position information.
Further, the position-enhanced attention mechanism module computes:

Q = F·W_Q, K = F·W_K, V = F·W_V

where the multi-scale feature matrix of the source point cloud is F ∈ ℝ^(J×d) and the fused feature information of the source point cloud is F' ∈ ℝ^(J×d); J is the number of points in the source point cloud, d is the number of feature dimensions, ℝ denotes the set of real numbers, S_j denotes the attention weight, and Q, K and V are the three projection matrices of the source point cloud's input features; j denotes the index of a point and the superscript T denotes the matrix transpose; the position information of the point also enters the attention computation; W_Q, W_K and W_V are learnable parameters, MLP denotes a multi-layer perceptron, softmax(·) denotes a row-wise softmax, and Cat[·,·] denotes splicing.
Further, inputting the alignment state of the source and target point clouds into the outlier parameter module to generate the outlier parameters comprises:
using a parameter prediction network that takes the unaligned point clouds as input to predict the parameters of the current iteration. First, the source and target point clouds are spliced into a matrix of shape (B, 3, J+K), where B is the batch size, J and K are the numbers of points in the source and target point clouds respectively, and 3 denotes the 3-dimensional coordinates;
to indicate which cloud a point comes from, a 4th feature channel is added, with 0 marking a point from the source point cloud and 1 a point from the target point cloud. The input to the parameter prediction module therefore has shape (B, 4, J+K); a multi-layer perceptron followed by a max-pooling layer finally yields the outlier parameters α and β.
Further, inputting the outlier parameters together with the hybrid features into the similarity matching module to obtain the correspondence between the source point cloud and all points in the target point cloud comprises:
inputting the outlier parameters α and β and the hybrid features into the similarity matching module to construct a matching matrix M, each element m_jk ∈ M of which is initialized as:

m_jk = exp(−β (‖Fx_j − Fy_k‖₂² − α))

where Fx_j and Fy_k are the hybrid features of the source and target point clouds, respectively;
alternating row and column normalization is then applied repeatedly to the matching matrix; applied to any square matrix with all-positive entries, this yields a doubly stochastic matrix, from which the correspondence between the source point cloud and all points in the target point cloud is obtained.
Further, for each point p_j in the source point cloud, the corresponding point ŷ_j in the target point cloud can be computed as:

ŷ_j = (Σ_{k=1}^{K} m_jk · q_k) / (Σ_{k=1}^{K} m_jk)

where k denotes the index of a point in the target point cloud, K the number of points in the target point cloud, j the index of a point in the source point cloud, and q_k the kth point in the target point cloud.
Further, using singular value decomposition to obtain the rotation and translation matrices between the source and target point clouds and solving iteratively to obtain the final matrices comprises the following steps:
given the corresponding points ŷ_j in the target point cloud, the rotation matrix R^(n) and translation matrix t^(n) of the nth iteration are solved by singular value decomposition:

R^(n), t^(n) = argmin_{R,t} Σ_{j=1}^{J} ‖ R·x_j + t − ŷ_j ‖₂²

where n denotes the iteration number, J the number of points in the source point cloud, and R, t the rotation and translation matrices obtained in the current iteration;
after obtaining R^(n) and t^(n), the current source point cloud is transformed to obtain a new source point cloud, and the iteration is repeated until the loss function converges, yielding the final rotation matrix R^(*) and translation matrix t^(*).
Further, the loss function is defined as the L₁ distance between the true transformation {R_gt, t_gt} of the source point cloud and the predicted transformation {R^(*), t^(*)}, computed as:

L = (1/J) Σ_{j=1}^{J} ‖ (R_gt·x_j + t_gt) − (R^(*)·x_j + t^(*)) ‖₁

where J is the number of points in the source point cloud, j the index of a point, and x_j the jth point in the source point cloud.
Compared with the prior art, the application has the following technical features:
When the point cloud contains noise or is partially missing, neither traditional algorithms nor learning-based algorithms register effectively. The application provides an end-to-end point cloud registration scheme: local geometric features are extracted from the original pair of point clouds and spliced into multi-scale features; these features, together with the point distances and normal information, are fed into the position-enhanced attention mechanism to obtain hybrid features; point correspondences are then derived from the hybrid features, and singular value decomposition yields the final rotation and translation matrices. Compared with traditional and deep-learning-based registration methods, the method achieves better registration when the point cloud contains noise or is partially missing; the proposed position-enhanced attention mechanism improves the registration network's understanding of position information and the discriminability of the learned features, improving registration performance.
Drawings
FIG. 1 is a schematic diagram of a network of the method of the present application;
FIG. 2 is a schematic diagram of a location-enhanced attention module;
FIG. 3 is a schematic flow chart of the method of the present application.
Fig. 4 is a graph of the registration effect of the method of the present application in an embodiment.
Detailed Description
The application provides a point cloud registration method based on a position-enhanced attention mechanism, the structure of which is shown in Fig. 1:
first, the source point cloud and target point cloud data are input into an adaptive graph convolution feature extraction module, and multi-scale features of the source and target point clouds are extracted;
second, the multi-scale features of the source and target point clouds are input into a position-enhanced attention mechanism module; the position information of the source and target point clouds is extracted, context feature information is learned from the multi-scale features, and the context feature information and the position information are fused to obtain fused feature information; feature interaction is then performed between the fused feature information of the source and target point clouds to generate hybrid features;
then, the alignment state of the source and target point clouds is obtained from their three-dimensional coordinates and input into an outlier parameter module to generate outlier parameters; the outlier parameters and the hybrid features are input together into a similarity matching module to obtain the correspondence between the source point cloud and all points in the target point cloud;
finally, singular value decomposition is used to obtain a rotation matrix and a translation matrix between the source and target point clouds, which are solved iteratively to obtain the final rotation and translation matrices and complete the registration process.
The iterative solution that yields the final rotation and translation matrices proceeds as follows:
a loss function is computed between the obtained rotation and translation matrices and the ground-truth rotation and translation matrices provided by the dataset. If the loss has not converged, the source point cloud is multiplied by the rotation matrix and the translation matrix is added, giving a new source point cloud for the next round of iterative registration; if the loss has converged, the rotation and translation matrices are output.
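The iterative loop just described can be sketched as follows. Here `solve_step` stands in for the whole per-iteration estimation pipeline (feature extraction, matching, and SVD), and the convergence test on a simple alignment residual replaces the training-time loss; both are illustrative assumptions rather than the patent's implementation.

```python
import numpy as np

def register_iteratively(src, tgt, solve_step, n_iters=5, tol=1e-6):
    """Iterative registration loop: estimate (R, t), apply it to the source
    cloud, compose the estimates, and stop when the residual converges."""
    R_final = np.eye(3)
    t_final = np.zeros(3)
    loss_prev = np.inf
    for _ in range(n_iters):
        R, t = solve_step(src, tgt)        # one-iteration estimate of the motion
        src = src @ R.T + t                # x' = R x + t, applied row-wise
        R_final = R @ R_final              # compose with previous estimates
        t_final = R @ t_final + t
        loss = np.mean(np.abs(src - tgt))  # simple alignment residual (stand-in loss)
        if abs(loss_prev - loss) < tol:    # stop once the loss has converged
            break
        loss_prev = loss
    return R_final, t_final
```

With an exact closed-form `solve_step`, the loop converges in a single iteration; with a learned estimator it refines the pose over several rounds, as in the method above.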
The adaptive graph convolution proposed by Wei M. et al. overcomes the drawback of the fixed kernel in standard graph convolution: it adaptively establishes the relationship between a pair of points according to their feature attributes and generates an adaptive kernel, so the different relationships among points in different parts of the point cloud can be extracted more effectively. It nevertheless lacks a global understanding. The method therefore inputs the point cloud into an adaptive graph convolution feature extraction module composed of four adaptive graph convolution layers (64, 64, 128, 256) for multi-level feature extraction, and then splices the outputs to generate multi-scale features.
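A minimal sketch of the multi-scale stage, under strong simplifying assumptions: a fixed-kernel EdgeConv-style layer with random placeholder weights stands in for the learned adaptive graph convolution (the real AdaptConv generates its kernels from feature attributes), and four layers with widths (64, 64, 128, 256) are spliced as described.

```python
import numpy as np

def knn_graph_feature(points, feats, k=4):
    """EdgeConv-style graph feature: for each point, gather its k nearest
    neighbours and max-aggregate the edge features. Placeholder only."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]      # k nearest neighbours, self excluded
    edge = feats[idx] - feats[:, None, :]         # (J, k, d) edge features
    return np.concatenate([feats, edge.max(axis=1)], axis=-1)

def multiscale_features(points, dims=(64, 64, 128, 256)):
    """Run four graph layers and splice their outputs into multi-scale features."""
    rng = np.random.default_rng(0)                # random stand-in for trained weights
    feats, outs = points, []
    for d in dims:
        h = knn_graph_feature(points, feats)
        W = rng.normal(size=(h.shape[-1], d)) / np.sqrt(h.shape[-1])
        feats = np.maximum(h @ W, 0.0)            # linear layer + ReLU stand-in
        outs.append(feats)
    return np.concatenate(outs, axis=-1)          # (J, 64 + 64 + 128 + 256)
```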
For the multi-scale features of the source and target point clouds, the position-enhanced attention mechanism module proceeds as follows: context information is first extracted from the multi-scale features of the source point cloud; the position information of the source point cloud is extracted directly and spliced with the context information; the spliced features and the position information are input into the self-attention module within the position-enhanced attention module, where the context features and the position information interact to produce the fused feature information of the source point cloud (the target point cloud is processed in the same way). Finally, the fused feature information of the source and target point clouds is input into a cross-attention module for feature interaction between the two clouds, and the hybrid features are output; the structure is shown in Fig. 2. The position-enhanced attention mechanism mainly embeds the position information of the point cloud into the attention computation, helping the model learn the spatial structure among points and attend more to specific regions when processing the point cloud, thereby improving its perception of key points and reducing incorrect point correspondences between the source and target point clouds.
The position information is encoded mainly from the distances between points and the normal vector coordinates. The method computes the Euclidean distance between points: for any two points p_j and p_j' in the source point cloud, the spatial distance between them is d_jj' = ‖p_j − p_j'‖₂. For the normal vector information, a linear layer directly encodes the spatial normal vectors of the points in the source point cloud. The spatial distance information of each point is then spliced with its encoded normal vector information, yielding the encoded position information of the point.
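The position encoding just described can be sketched as follows; the projection shapes and the use of each point's row of the pairwise distance matrix as its distance feature are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def encode_positions(points, normals, W_dist, W_norm):
    """Position information: pairwise Euclidean distances between points, plus a
    linear encoding of each point's normal vector, spliced together."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    dist = np.sqrt(np.maximum(d2, 0.0))   # d_{jj'} = ||p_j - p_j'||_2, shape (J, J)
    dist_enc = dist @ W_dist              # project each point's distance row
    norm_enc = normals @ W_norm           # linear layer on the normal vectors
    return np.concatenate([dist_enc, norm_enc], axis=-1)
```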
The computation of the source point cloud in the self-attention module is given below.
Given the input feature matrix F ∈ ℝ^(J×d) (the multi-scale features of the source point cloud), the output feature matrix F' ∈ ℝ^(J×d) (the fused feature information of the source point cloud; J is the number of points in the source point cloud, d is the number of feature dimensions, and ℝ denotes the set of real numbers) is a weighted sum of projections of all input features F, with S_j denoting the attention weight. First Q, K and V (the projection matrices of the source point cloud's input features) are computed:

Q = F·W_Q, K = F·W_K, V = F·W_V

Here J is the number of points in the source point cloud, j denotes the index of a point, the superscript T denotes the matrix transpose, and the position information of the point enters the attention computation; W_Q, W_K and W_V are learnable parameters obtained through training on the dataset, MLP denotes a multi-layer perceptron, softmax(·) denotes a row-wise softmax, and Cat[·,·] denotes splicing.
A cross-attention module is added after the self-attention module, enabling effective feature information interaction between the source and target point clouds and finally producing the hybrid features.
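The self-attention step can be sketched as follows, under stated assumptions: Q, K and V are the projections defined above, and the position information G is folded into the attention logits as an additive bias before the row-wise softmax, then spliced back into the output. The patent's exact fusion of the MLP, Cat and position terms is not fully recoverable from the text, so this additive form is an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_enhanced_self_attention(F, G, W_Q, W_K, W_V, W_G):
    """Attention over point features F with position information G biasing the
    attention scores; output splices the attended features with G (Cat)."""
    Q, K, V = F @ W_Q, F @ W_K, F @ W_V
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d) + G @ W_G @ G.T  # position-aware attention scores
    S = softmax(logits, axis=-1)                   # row softmax: each row sums to 1
    fused = S @ V                                  # weighted sum of projected features
    return np.concatenate([fused, G], axis=-1)     # splice position info back in
```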
The parameter prediction network proposed by Yew Z. J. et al. selects suitable outlier parameters according to the alignment state of the current point clouds. This network takes the unaligned point clouds as input to predict the parameters of the current iteration. First, the source and target point clouds are spliced into a matrix of shape (B, 3, J+K), where B is the batch size, J and K are the numbers of points in the source and target point clouds, and 3 denotes the 3-dimensional coordinates; to indicate which cloud a point comes from, a 4th feature channel is added, with 0 marking a point from the source point cloud and 1 a point from the target point cloud. The input to the parameter prediction module therefore has shape (B, 4, J+K); a multi-layer perceptron followed by max pooling finally produces data of shape (B, 2), the outlier parameters α and β.
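The parameter prediction step can be sketched as follows, with random placeholder weights standing in for the trained network; the exp at the output, keeping α and β positive, is also an assumption.

```python
import numpy as np

def predict_outlier_params(src, tgt, rng=None):
    """Splice source and target points into one (4, J+K) array whose 4th row flags
    the origin cloud (0 = source, 1 = target), run a shared point-wise MLP layer,
    max-pool over points, and emit two scalars (alpha, beta)."""
    rng = rng or np.random.default_rng(0)
    J, K = len(src), len(tgt)
    flags = np.concatenate([np.zeros(J), np.ones(K)])
    x = np.concatenate([np.vstack([src, tgt]).T, flags[None, :]], axis=0)  # (4, J+K)
    W1 = rng.normal(size=(16, 4)) * 0.1    # placeholder weights (learned in the patent)
    W2 = rng.normal(size=(2, 16)) * 0.1
    h = np.maximum(W1 @ x, 0.0)            # shared per-point MLP layer + ReLU
    pooled = h.max(axis=1)                 # max pooling over all J+K points
    alpha, beta = np.exp(W2 @ pooled)      # exp keeps both parameters positive
    return alpha, beta
```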
Then, the outlier parameters α and β are input, together with the hybrid features, into the similarity matching module to construct a matching matrix M; each element m_jk ∈ M is initialized as:
where Fx_j and Fy_k are the hybrid features of the source point cloud and the target point cloud, respectively.
Alternate row and column normalization is then applied to the matching matrix; applied repeatedly, it turns any square matrix with all-positive entries into a doubly stochastic matrix, which yields the correspondence between the source point cloud and all points in the target point cloud:
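The alternate row/column normalization (Sinkhorn normalization) can be sketched as follows; the fixed iteration count is a hypothetical choice, and in practice one iterates until the row and column sums are sufficiently close to 1.

```python
import numpy as np

def sinkhorn(M, n_iters=20):
    """Alternate row/column normalization: any all-positive square matrix
    converges toward a doubly stochastic matrix (rows and columns sum to 1)."""
    for _ in range(n_iters):
        M = M / M.sum(axis=1, keepdims=True)   # normalize rows
        M = M / M.sum(axis=0, keepdims=True)   # normalize columns
    return M

rng = np.random.default_rng(1)
M = rng.uniform(0.1, 1.0, size=(5, 5))         # all-positive square matrix
D = sinkhorn(M)
```

After the final column step, columns sum to exactly 1 and rows are very close to 1, which is the doubly stochastic property the matching matrix needs.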
For each point p_j in the source point cloud, the corresponding point in the target point cloud can be computed as y_j = (Σ_(k=1..K) m_jk·q_k) / (Σ_(k=1..K) m_jk),
where k denotes the index of a point in the target point cloud, K denotes the number of points in the target point cloud, j denotes the index of a point in the source point cloud, and q_k denotes the k-th point in the target point cloud.
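A minimal sketch of this weighted-average correspondence computation, with a random non-negative matrix standing in for the matching matrix M:

```python
import numpy as np

def soft_correspondences(M, tgt):
    """For each source point p_j, the soft corresponding target point is the
    M-weighted average: y_j = sum_k m_jk * q_k / sum_k m_jk.

    M   : (J, K) non-negative matching matrix
    tgt : (K, 3) target point coordinates q_k
    """
    return (M @ tgt) / M.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
M = rng.uniform(size=(4, 6))       # J = 4 source points, K = 6 target points
tgt = rng.normal(size=(6, 3))
Y = soft_correspondences(M, tgt)   # (4, 3) corresponding points y_j
```

Because each row of weights is non-negative and normalized, every y_j is a convex combination of the target points, so it always lies inside their bounding box.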
Finally, singular value decomposition is used to solve for the rigid transformation:
With the corresponding points y_j in the target point cloud, the rotation matrix R^(n) and translation matrix t^(n) of the n-th iteration are solved by singular value decomposition:
where n denotes the iteration number, J denotes the number of points in the source point cloud, and R and t denote the rotation matrix and translation matrix obtained in the current iteration.
After obtaining the rotation matrix R^(n) and translation matrix t^(n), the current source point cloud is transformed with them to obtain a new source point cloud, and this iteration is repeated until the loss function converges, yielding the final rotation matrix R^(*) and translation matrix t^(*).
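The per-iteration SVD solve can be sketched with the standard closed-form solution for rigid alignment. This is an illustrative simplification: the correspondences are treated as unweighted here, whereas the full method pairs each source point with its matching-matrix average.

```python
import numpy as np

def solve_rigid_svd(src, corr):
    """Find R, t minimizing sum_j ||R p_j + t - y_j||^2 via SVD.

    src  : (J, 3) source points p_j
    corr : (J, 3) corresponding target points y_j
    """
    p_bar = src.mean(axis=0)
    y_bar = corr.mean(axis=0)
    H = (src - p_bar).T @ (corr - y_bar)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) in the solution.
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])
    R = Vt.T @ D @ U.T
    t = y_bar - R @ p_bar
    return R, t

# Sanity check: recover a known rotation about the z-axis plus a translation.
rng = np.random.default_rng(3)
src = rng.normal(size=(50, 3))
theta = 0.4
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -1.0, 2.0])
corr = src @ R_true.T + t_true
R_est, t_est = solve_rigid_svd(src, corr)
```

With noiseless correspondences the solve recovers the true motion to machine precision, which is what makes it a reliable inner step for the iteration.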
The first three rows and first three columns of the ground-truth transformation matrix in the data set used for network training are taken as the true rotation matrix R_gt; correspondingly, the last column of the first three rows of the ground-truth transformation matrix is taken as the true translation matrix t_gt. The loss function is defined as the L1 (Manhattan) distance between the true transformation {R_gt, t_gt} of the source point cloud and the predicted transformation {R^(*), t^(*)} of the source point cloud, and is calculated as:
where J is the number of points in the source point cloud, j is the index of a point in the source point cloud, and x_j denotes the j-th point in the source point cloud.
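A sketch of this L1 loss, comparing the predicted and ground-truth transforms applied to the source points; averaging over the J points is assumed.

```python
import numpy as np

def registration_l1_loss(src, R_pred, t_pred, R_gt, t_gt):
    """Mean per-point L1 (Manhattan) distance between the source cloud
    transformed by the predicted and by the ground-truth rigid motions."""
    pred = src @ R_pred.T + t_pred
    gt = src @ R_gt.T + t_gt
    return np.abs(pred - gt).sum(axis=1).mean()

rng = np.random.default_rng(4)
src = rng.normal(size=(10, 3))
R = np.eye(3)
# Identical transforms -> zero loss; a unit shift in x -> loss of exactly 1.
loss_zero = registration_l1_loss(src, R, np.zeros(3), R, np.zeros(3))
loss_shift = registration_l1_loss(src, R, np.array([1.0, 0.0, 0.0]), R, np.zeros(3))
```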
Point cloud registration results of the application: the visual registration results are shown in Fig. 4 (left: initial point clouds; right: registration result). The proposed registration method achieves accurate registration even when the point clouds contain noise and part of the point cloud is missing.
The experiments were run on an Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90 GHz with an RTX 3090 GPU, using the Python programming language and PyTorch version 1.13.0.
The method of the application is compared with the traditional ICP (Iterative Closest Point) algorithm and with a deep-learning-based method, RPM-Net (Robust Point Matching using Learned Features); the results are shown in Table 1.
All methods were tested on the ModelNet40 data set, with noise added and 30% of the point cloud missing. The anisotropic mean absolute errors of the rotation and translation matrices, MAE(r) and MAE(t), are reported, as well as the isotropic mean errors of the rotation and translation matrices, Error(r) and Error(t).
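For illustration, these error metrics can be sketched as below. The Euler-angle convention for the anisotropic MAE and the exact isotropic definitions are assumptions, since the text does not spell them out.

```python
import numpy as np

def rotation_to_euler_xyz(R):
    """Euler angles (x, y, z order) from a rotation matrix; one common
    convention, assumed here for illustration."""
    sy = np.hypot(R[0, 0], R[1, 0])
    return np.array([np.arctan2(R[2, 1], R[2, 2]),
                     np.arctan2(-R[2, 0], sy),
                     np.arctan2(R[1, 0], R[0, 0])])

def registration_errors(R_pred, t_pred, R_gt, t_gt):
    """Anisotropic MAE over Euler angles / translation components, plus the
    isotropic errors: rotation angle arccos((tr(R_gt^T R_pred) - 1) / 2)
    and Euclidean translation distance."""
    mae_r = np.abs(np.degrees(rotation_to_euler_xyz(R_pred))
                   - np.degrees(rotation_to_euler_xyz(R_gt))).mean()
    mae_t = np.abs(t_pred - t_gt).mean()
    c = np.clip((np.trace(R_gt.T @ R_pred) - 1.0) / 2.0, -1.0, 1.0)
    err_r = np.degrees(np.arccos(c))
    err_t = np.linalg.norm(t_pred - t_gt)
    return mae_r, mae_t, err_r, err_t

# A perfect prediction yields zero for all four metrics.
I = np.eye(3)
mae_r, mae_t, err_r, err_t = registration_errors(I, np.zeros(3), I, np.zeros(3))
```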
Table 1 Registration results of the different methods
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.
Claims (10)
1. A point cloud registration method based on a position-enhanced attention mechanism, comprising:
first, inputting the source point cloud and target point cloud data respectively into an adaptive graph convolution feature extraction module, and extracting multi-scale features of the source point cloud and the target point cloud;
second, inputting the multi-scale features of the source point cloud and the target point cloud into a position-enhanced attention mechanism module, extracting the position information of the source point cloud and the target point cloud respectively, learning the context feature information of the source point cloud and the target point cloud from the multi-scale features, and fusing the context feature information with the position information to obtain fusion feature information; performing feature information interaction between the fusion feature information of the source point cloud and that of the target point cloud to generate hybrid features;
then, obtaining the alignment state of the source point cloud and the target point cloud from their three-dimensional coordinates, inputting the alignment state into an outlier parameter module to generate outlier parameters, and inputting the outlier parameters together with the hybrid features into a similarity matching module, thereby obtaining the correspondence between the source point cloud and all points in the target point cloud;
finally, using singular value decomposition to obtain the rotation matrix and translation matrix between the source point cloud and the target point cloud, and performing iterative solution to obtain the final rotation matrix and translation matrix, completing the registration process.
2. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein performing the iterative solution to obtain the final rotation matrix and translation matrix comprises:
calculating a loss function between the obtained rotation matrix and translation matrix and the true rotation matrix and translation matrix provided by the data set; if the loss function has not converged, multiplying the source point cloud by the rotation matrix and adding the translation matrix to obtain a new source point cloud, and continuing a new round of iterative registration; if the loss function converges, outputting the rotation matrix and the translation matrix.
3. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein the processing procedure of the position-enhanced attention mechanism module is:
extracting context information from the multi-scale features of the source point cloud, directly extracting the position information of the source point cloud and concatenating it with the context information, inputting the concatenated features and the position information into the self-attention module of the position-enhanced attention module, and performing interaction between the context features and the position information to obtain the fusion feature information of the source point cloud, and likewise for the target point cloud; finally, inputting the fusion feature information of the source point cloud and the target point cloud into a cross-attention module, performing feature interaction between the source point cloud and the target point cloud, and finally outputting the hybrid features.
4. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein the position information is obtained by encoding the distances between points and the normal vector coordinates; the spatial distance between any two points in the source point cloud is computed from their coordinates; for the normal vector information, a linear layer is used directly to encode the normal vectors of the points in the source point cloud in space; and the spatial distance information of the points is concatenated with the encoded normal vector information to obtain the position information of the points.
5. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein the calculation process of the position-enhanced attention mechanism module is as follows:
Q = F·W_Q, K = F·W_K, V = F·W_V
wherein the multi-scale features of the source point cloud are F ∈ R^(J×d), the fusion feature information of the source point cloud is F′ ∈ R^(J×d), J is the number of points in the source point cloud, d is the number of feature dimensions, R denotes the set of real numbers, S_j denotes the attention weight, and the three projection matrices of the source point cloud input features are Q, K and V; j denotes the index of a point, the superscript T denotes the matrix transpose, W_Q, W_K and W_V are learnable parameters, MLP denotes the multi-layer perceptron applied to the position information of the points, softmax(·) denotes the row-wise softmax, and Cat[·,·] denotes concatenation.
6. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein inputting the alignment state of the source point cloud and the target point cloud into the outlier parameter module to generate the outlier parameters comprises:
using a parameter prediction network, taking the unaligned point clouds as input to predict the parameters of the current iteration; first, concatenating the source point cloud and the target point cloud into a matrix of shape (B, 3, J+K), wherein B is the batch size, J and K are the numbers of points in the source point cloud and the target point cloud respectively, and 3 denotes the 3-dimensional coordinates;
to characterize which point cloud a point comes from, a 4th feature channel is added, where 0 indicates that the point comes from the source point cloud and 1 indicates that it comes from the target point cloud; the input data of the parameter prediction module therefore has dimension (B, 4, J+K), and the outlier parameters α and β are finally obtained through a multi-layer perceptron and a max-pooling layer.
7. The point cloud registration method based on the position-enhanced attention mechanism according to claim 1, wherein the inputting the outlier parameter and the mixed feature together into the similarity matching module to obtain the correspondence between the source point cloud and all points in the target point cloud includes:
inputting the outlier parameters α and β and the hybrid features into the similarity matching module to construct a matching matrix M, each element m_jk ∈ M being initialized as:
wherein Fx_j and Fy_k are the hybrid features of the source point cloud and the target point cloud, respectively;
then applying alternate row and column normalization to the matching matrix; applied repeatedly, it turns any square matrix with all-positive entries into a doubly stochastic matrix, thereby obtaining the correspondence between the source point cloud and all points in the target point cloud.
8. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein for each point p_j in the source point cloud, the corresponding point in the target point cloud can be computed as y_j = (Σ_(k=1..K) m_jk·q_k) / (Σ_(k=1..K) m_jk),
wherein k denotes the index of a point in the target point cloud, K denotes the number of points in the target point cloud, j denotes the index of a point in the source point cloud, and q_k denotes the k-th point in the target point cloud.
9. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein using singular value decomposition to obtain the rotation matrix and translation matrix between the source point cloud and the target point cloud and performing iterative solution to obtain the final rotation matrix and translation matrix comprises:
with the corresponding points y_j in the target point cloud, solving the rotation matrix R^(n) and translation matrix t^(n) of the n-th iteration by singular value decomposition:
wherein n denotes the iteration number, J denotes the number of points in the source point cloud, and R and t denote the rotation matrix and translation matrix obtained in the current iteration;
after obtaining the rotation matrix R^(n) and translation matrix t^(n), transforming the current source point cloud to obtain a new source point cloud, and repeating the iteration until the loss function converges to obtain the final rotation matrix R^(*) and translation matrix t^(*).
10. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein the loss function is defined as the L1 (Manhattan) distance between the true transformation {R_gt, t_gt} of the source point cloud and the predicted transformation {R^(*), t^(*)} of the source point cloud, and is calculated as:
wherein J is the number of points in the source point cloud, j is the index of a point in the source point cloud, and x_j denotes the j-th point in the source point cloud.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310917905.4A CN116912296A (en) | 2023-07-25 | 2023-07-25 | Point cloud registration method based on position-enhanced attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116912296A true CN116912296A (en) | 2023-10-20 |
Family
ID=88362736
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN117876447A (en) * | 2024-03-13 | 2024-04-12 | 南京邮电大学 | Three-dimensional point cloud registration method based on micro-surface fusion and alignment
CN117876447B (en) * | 2024-03-13 | 2024-05-07 | 南京邮电大学 | Three-dimensional point cloud registration method based on micro-surface fusion and alignment
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||