CN116912296A - Point cloud registration method based on position-enhanced attention mechanism - Google Patents
Point cloud registration method based on position-enhanced attention mechanism

- Publication number: CN116912296A
- Application number: CN202310917905.4A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/33 — Determination of transform parameters for the alignment of images (image registration) using feature-based methods
- G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
- G06T2207/10028 — Range image; depth image; 3D point clouds
- G06N3/042 — Knowledge-based neural networks; logical representations of neural networks
- G06N3/045 — Combinations of networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Learning methods
- G06V10/44 — Local feature extraction by analysis of parts of the pattern
- G06V10/52 — Scale-space analysis, e.g. wavelet analysis
- G06V10/757 — Matching configurations of points or features
- G06V10/761 — Proximity, similarity or dissimilarity measures
- G06V10/806 — Fusion of extracted features at the feature extraction level
- G06V10/82 — Image or video recognition or understanding using neural networks
Abstract
The application discloses a point cloud registration method based on a position-enhanced attention mechanism, comprising the following steps: first, multi-scale features of a source point cloud and a target point cloud are extracted; second, position information of the source and target point clouds is extracted, context feature information is learned from the multi-scale features, and the context feature information and the position information are fused to obtain fused feature information; feature interaction is then performed between the fused feature information of the source and target point clouds to generate hybrid features; outlier parameters are generated according to the alignment state of the source and target point clouds, and the correspondence between the source point cloud and all points in the target point cloud is obtained using the outlier parameters and the hybrid features; finally, singular value decomposition is used to obtain a rotation matrix and a translation matrix between the source and target point clouds, which are solved iteratively to obtain the final rotation and translation matrices and complete the registration process.
Description
Technical Field
The application relates to the field of three-dimensional point cloud registration in deep learning and computer vision, in particular to a three-dimensional point cloud registration method based on a position-enhanced attention mechanism.
Background
Point cloud registration is the process of transforming point clouds acquired by a scanning device from different viewpoints into the same coordinate system through rotation, translation, and similar operations; it is widely applied in pose estimation, three-dimensional reconstruction, mobile robotics, and other fields. In practice, however, point cloud registration remains challenging, mainly because (1) point clouds scanned from different viewpoints suffer from noise and partial invisibility, and (2) point clouds are unordered and sparse. Improving the accuracy and robustness of registration algorithms is therefore essential in practical registration tasks.
According to how the data are represented, existing point cloud registration methods can be divided into voxel-based, multi-view-based, and point-based methods. Since the first two cause information loss, point-based methods are currently the most widely used. The PointNet algorithm addressed the unordered nature and rotation invariance of point clouds, and the point cloud registration problem was subsequently generalized to deep-learning-based methods. Compared with the traditional iterative-closest-point registration approach, deep-learning-based methods avoid falling into local optima. However, when the point cloud contains noise or is partially missing, deep-learning-based methods fail to maintain a good registration result.
Among existing point cloud registration methods, conventional approaches such as the Iterative Closest Point (ICP) algorithm tend to fall into local optima. Deep-learning-based methods, for example RPM-Net (Robust Point Matching using Learned Features), achieve effective registration under moderate noise and when a small part of the point cloud is missing, but when the missing portion reaches 30% the registration result degrades. The reason is that current deep-learning-based registration methods focus only on local geometric features of the point cloud: they lack an understanding of the global state and consider neither feature interaction between the source and target point clouds nor the position information of corresponding points, so the learned features are not discriminative enough, leading to a large number of incorrect point correspondences.
Therefore, improving the registration result when the point cloud contains noise and 30% or more of it is missing is a difficult problem that remains to be solved.
Disclosure of Invention
The aim of the application is to provide a point cloud registration method based on a position-enhanced attention mechanism, so that the network can learn the context information of each point cloud while incorporating its geometric structure, obtaining features with stronger geometric relevance and thereby improving registration performance.
To achieve this, the application adopts the following technical scheme:
a point cloud registration method based on a position-enhanced attention mechanism, comprising:
first, inputting the source point cloud and target point cloud data into an adaptive graph convolution feature extraction module and extracting multi-scale features of the source and target point clouds;
second, inputting the multi-scale features of the source and target point clouds into a position-enhanced attention mechanism module, extracting the position information of the source and target point clouds, learning context feature information from the multi-scale features, and fusing the context feature information with the position information to obtain fused feature information; then performing feature interaction between the fused feature information of the source and target point clouds to generate hybrid features;
then, obtaining the alignment state of the source and target point clouds from their three-dimensional coordinates, inputting the alignment state into an outlier parameter module to generate outlier parameters, and inputting the outlier parameters together with the hybrid features into a similarity matching module to obtain the correspondence between the source point cloud and all points in the target point cloud;
finally, using singular value decomposition to obtain a rotation matrix and a translation matrix between the source and target point clouds, solving iteratively to obtain the final rotation and translation matrices, and completing the registration process.
Further, performing the iterative solution to obtain the final rotation and translation matrices comprises:
computing a loss function between the obtained rotation and translation matrices and the ground-truth rotation and translation matrices provided by the dataset; if the loss has not converged, multiplying the source point cloud by the rotation matrix and adding the translation matrix to obtain a new source point cloud, and continuing with a new round of iterative registration; if the loss has converged, outputting the rotation and translation matrices.
Further, the position-enhanced attention mechanism module operates as follows:
context information is extracted from the multi-scale features of the source point cloud; the position information of the source point cloud is extracted directly and spliced with the context information; the spliced features and the position information are input into the self-attention module within the position-enhanced attention module, where the context features and the position information interact to produce the fused feature information of the source point cloud (the target point cloud is processed in the same way); finally, the fused feature information of the source and target point clouds is input into a cross-attention module for feature interaction between the two clouds, and the hybrid features are output.
Further, the position information is encoded from the distances between points and the normal vector coordinates. For any two points p_j and p_j' in the source point cloud, the spatial distance between them is the Euclidean distance d_jj' = ‖p_j − p_j'‖₂. For the normal vector information, a linear layer directly encodes the spatial normal vectors of the points in the source point cloud; the spatial distance information of each point is then spliced with its encoded normal vector information to obtain the point's position information.
Further, the position-enhanced attention mechanism module computes:

Q = F·W_Q, K = F·W_K, V = F·W_V

where the multi-scale feature matrix of the source point cloud is F ∈ ℝ^(J×d) and the fused feature information of the source point cloud is F' ∈ ℝ^(J×d); J is the number of points in the source point cloud, d is the number of feature dimensions, ℝ denotes the set of real numbers, S_j denotes the attention weight, and Q, K and V are the three projection matrices of the source point cloud's input features; j denotes the index of a point and the superscript T denotes the matrix transpose; the position information of the point also enters the attention computation; W_Q, W_K and W_V are learnable parameters, MLP denotes a multi-layer perceptron, softmax(·) denotes a row-wise softmax, and Cat[·,·] denotes splicing.
Further, inputting the alignment state of the source and target point clouds into the outlier parameter module to generate the outlier parameters comprises:
using a parameter prediction network that takes the unaligned point clouds as input to predict the parameters of the current iteration. First, the source and target point clouds are spliced into a matrix of shape (B, 3, J+K), where B is the batch size, J and K are the numbers of points in the source and target point clouds respectively, and 3 denotes the 3-dimensional coordinates;
to indicate which cloud a point comes from, a 4th feature channel is added, with 0 marking a point from the source point cloud and 1 a point from the target point cloud. The input to the parameter prediction module therefore has shape (B, 4, J+K); a multi-layer perceptron followed by a max-pooling layer finally yields the outlier parameters α and β.
Further, inputting the outlier parameters together with the hybrid features into the similarity matching module to obtain the correspondence between the source point cloud and all points in the target point cloud comprises:
inputting the outlier parameters α and β and the hybrid features into the similarity matching module to construct a matching matrix M, each element m_jk ∈ M of which is initialized as:

m_jk = exp(−β (‖Fx_j − Fy_k‖₂² − α))

where Fx_j and Fy_k are the hybrid features of the source and target point clouds, respectively;
alternating row and column normalization is then applied repeatedly to the matching matrix; applied to any square matrix with all-positive entries, this yields a doubly stochastic matrix, from which the correspondence between the source point cloud and all points in the target point cloud is obtained.
Further, for each point p_j in the source point cloud, the corresponding point ŷ_j in the target point cloud can be computed as:

ŷ_j = (Σ_{k=1}^{K} m_jk · q_k) / (Σ_{k=1}^{K} m_jk)

where k denotes the index of a point in the target point cloud, K the number of points in the target point cloud, j the index of a point in the source point cloud, and q_k the kth point in the target point cloud.
Further, using singular value decomposition to obtain the rotation and translation matrices between the source and target point clouds and solving iteratively to obtain the final matrices comprises the following steps:
given the corresponding points ŷ_j in the target point cloud, the rotation matrix R^(n) and translation matrix t^(n) of the nth iteration are solved by singular value decomposition:

R^(n), t^(n) = argmin_{R,t} Σ_{j=1}^{J} ‖ R·x_j + t − ŷ_j ‖₂²

where n denotes the iteration number, J the number of points in the source point cloud, and R, t the rotation and translation matrices obtained in the current iteration;
after obtaining R^(n) and t^(n), the current source point cloud is transformed to obtain a new source point cloud, and the iteration is repeated until the loss function converges, yielding the final rotation matrix R^(*) and translation matrix t^(*).
Further, the loss function is defined as the L₁ distance between the true transformation {R_gt, t_gt} of the source point cloud and the predicted transformation {R^(*), t^(*)}, computed as:

L = (1/J) Σ_{j=1}^{J} ‖ (R_gt·x_j + t_gt) − (R^(*)·x_j + t^(*)) ‖₁

where J is the number of points in the source point cloud, j the index of a point, and x_j the jth point in the source point cloud.
Compared with the prior art, the application has the following technical features:
When the point cloud contains noise or is partially missing, neither traditional algorithms nor learning-based algorithms register effectively. The application provides an end-to-end point cloud registration scheme: local geometric features are extracted from the original pair of point clouds and spliced into multi-scale features; these features, together with the point distances and normal information, are fed into the position-enhanced attention mechanism to obtain hybrid features; point correspondences are then derived from the hybrid features, and singular value decomposition yields the final rotation and translation matrices. Compared with traditional and deep-learning-based registration methods, the method achieves better registration when the point cloud contains noise or is partially missing; the proposed position-enhanced attention mechanism improves the registration network's understanding of position information and the discriminability of the learned features, improving registration performance.
Drawings
FIG. 1 is a schematic diagram of a network of the method of the present application;
FIG. 2 is a schematic diagram of a location-enhanced attention module;
FIG. 3 is a schematic flow chart of the method of the present application.
Fig. 4 is a graph of the registration effect of the method of the present application in an embodiment.
Detailed Description
The application provides a point cloud registration method based on a position-enhanced attention mechanism, the structure of which is shown in Fig. 1:
first, the source point cloud and target point cloud data are input into an adaptive graph convolution feature extraction module, and multi-scale features of the source and target point clouds are extracted;
second, the multi-scale features of the source and target point clouds are input into a position-enhanced attention mechanism module; the position information of the source and target point clouds is extracted, context feature information is learned from the multi-scale features, and the context feature information and the position information are fused to obtain fused feature information; feature interaction is then performed between the fused feature information of the source and target point clouds to generate hybrid features;
then, the alignment state of the source and target point clouds is obtained from their three-dimensional coordinates and input into an outlier parameter module to generate outlier parameters; the outlier parameters and the hybrid features are input together into a similarity matching module to obtain the correspondence between the source point cloud and all points in the target point cloud;
finally, singular value decomposition is used to obtain a rotation matrix and a translation matrix between the source and target point clouds, which are solved iteratively to obtain the final rotation and translation matrices and complete the registration process.
The iterative solution that yields the final rotation and translation matrices proceeds as follows:
a loss function is computed between the obtained rotation and translation matrices and the ground-truth rotation and translation matrices provided by the dataset. If the loss has not converged, the source point cloud is multiplied by the rotation matrix and the translation matrix is added, giving a new source point cloud for the next round of iterative registration; if the loss has converged, the rotation and translation matrices are output.
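The iterative loop just described can be sketched as follows. Here `solve_step` stands in for the whole per-iteration estimation pipeline (feature extraction, matching, and SVD), and the convergence test on a simple alignment residual replaces the training-time loss; both are illustrative assumptions rather than the patent's implementation.

```python
import numpy as np

def register_iteratively(src, tgt, solve_step, n_iters=5, tol=1e-6):
    """Iterative registration loop: estimate (R, t), apply it to the source
    cloud, compose the estimates, and stop when the residual converges."""
    R_final = np.eye(3)
    t_final = np.zeros(3)
    loss_prev = np.inf
    for _ in range(n_iters):
        R, t = solve_step(src, tgt)        # one-iteration estimate of the motion
        src = src @ R.T + t                # x' = R x + t, applied row-wise
        R_final = R @ R_final              # compose with previous estimates
        t_final = R @ t_final + t
        loss = np.mean(np.abs(src - tgt))  # simple alignment residual (stand-in loss)
        if abs(loss_prev - loss) < tol:    # stop once the loss has converged
            break
        loss_prev = loss
    return R_final, t_final
```

With an exact closed-form `solve_step`, the loop converges in a single iteration; with a learned estimator it refines the pose over several rounds, as in the method above.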
The adaptive graph convolution proposed by Wei M. et al. overcomes the drawback of the fixed kernel in standard graph convolution: it adaptively establishes the relationship between a pair of points according to their feature attributes and generates an adaptive kernel, so the different relationships among points in different parts of the point cloud can be extracted more effectively. It nevertheless lacks a global understanding. The method therefore inputs the point cloud into an adaptive graph convolution feature extraction module composed of four adaptive graph convolution layers (64, 64, 128, 256) for multi-level feature extraction, and then splices the outputs to generate multi-scale features.
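A minimal sketch of the multi-scale stage, under strong simplifying assumptions: a fixed-kernel EdgeConv-style layer with random placeholder weights stands in for the learned adaptive graph convolution (the real AdaptConv generates its kernels from feature attributes), and four layers with widths (64, 64, 128, 256) are spliced as described.

```python
import numpy as np

def knn_graph_feature(points, feats, k=4):
    """EdgeConv-style graph feature: for each point, gather its k nearest
    neighbours and max-aggregate the edge features. Placeholder only."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]      # k nearest neighbours, self excluded
    edge = feats[idx] - feats[:, None, :]         # (J, k, d) edge features
    return np.concatenate([feats, edge.max(axis=1)], axis=-1)

def multiscale_features(points, dims=(64, 64, 128, 256)):
    """Run four graph layers and splice their outputs into multi-scale features."""
    rng = np.random.default_rng(0)                # random stand-in for trained weights
    feats, outs = points, []
    for d in dims:
        h = knn_graph_feature(points, feats)
        W = rng.normal(size=(h.shape[-1], d)) / np.sqrt(h.shape[-1])
        feats = np.maximum(h @ W, 0.0)            # linear layer + ReLU stand-in
        outs.append(feats)
    return np.concatenate(outs, axis=-1)          # (J, 64 + 64 + 128 + 256)
```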
For the multi-scale features of the source and target point clouds, the position-enhanced attention mechanism module proceeds as follows: context information is first extracted from the multi-scale features of the source point cloud; the position information of the source point cloud is extracted directly and spliced with the context information; the spliced features and the position information are input into the self-attention module within the position-enhanced attention module, where the context features and the position information interact to produce the fused feature information of the source point cloud (the target point cloud is processed in the same way). Finally, the fused feature information of the source and target point clouds is input into a cross-attention module for feature interaction between the two clouds, and the hybrid features are output; the structure is shown in Fig. 2. The position-enhanced attention mechanism mainly embeds the position information of the point cloud into the attention computation, helping the model learn the spatial structure among points and attend more to specific regions when processing the point cloud, thereby improving its perception of key points and reducing incorrect point correspondences between the source and target point clouds.
The position information is encoded mainly from the distances between points and the normal vector coordinates. The method computes the Euclidean distance between points: for any two points p_j and p_j' in the source point cloud, the spatial distance between them is d_jj' = ‖p_j − p_j'‖₂. For the normal vector information, a linear layer directly encodes the spatial normal vectors of the points in the source point cloud. The spatial distance information of each point is then spliced with its encoded normal vector information, yielding the encoded position information of the point.
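The position encoding just described can be sketched as follows; the projection shapes and the use of each point's row of the pairwise distance matrix as its distance feature are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def encode_positions(points, normals, W_dist, W_norm):
    """Position information: pairwise Euclidean distances between points, plus a
    linear encoding of each point's normal vector, spliced together."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    dist = np.sqrt(np.maximum(d2, 0.0))   # d_{jj'} = ||p_j - p_j'||_2, shape (J, J)
    dist_enc = dist @ W_dist              # project each point's distance row
    norm_enc = normals @ W_norm           # linear layer on the normal vectors
    return np.concatenate([dist_enc, norm_enc], axis=-1)
```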
The computation of the source point cloud in the self-attention module is given below.
Given the input feature matrix F ∈ ℝ^(J×d) (the multi-scale features of the source point cloud), the output feature matrix F' ∈ ℝ^(J×d) (the fused feature information of the source point cloud; J is the number of points in the source point cloud, d is the number of feature dimensions, and ℝ denotes the set of real numbers) is a weighted sum of projections of all input features F, with S_j denoting the attention weight. First Q, K and V (the projection matrices of the source point cloud's input features) are computed:

Q = F·W_Q, K = F·W_K, V = F·W_V

Here J is the number of points in the source point cloud, j denotes the index of a point, the superscript T denotes the matrix transpose, and the position information of the point enters the attention computation; W_Q, W_K and W_V are learnable parameters obtained through training on the dataset, MLP denotes a multi-layer perceptron, softmax(·) denotes a row-wise softmax, and Cat[·,·] denotes splicing.
A cross-attention module is added after the self-attention module, enabling effective feature information interaction between the source and target point clouds and finally producing the hybrid features.
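The self-attention step can be sketched as follows, under stated assumptions: Q, K and V are the projections defined above, and the position information G is folded into the attention logits as an additive bias before the row-wise softmax, then spliced back into the output. The patent's exact fusion of the MLP, Cat and position terms is not fully recoverable from the text, so this additive form is an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_enhanced_self_attention(F, G, W_Q, W_K, W_V, W_G):
    """Attention over point features F with position information G biasing the
    attention scores; output splices the attended features with G (Cat)."""
    Q, K, V = F @ W_Q, F @ W_K, F @ W_V
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d) + G @ W_G @ G.T  # position-aware attention scores
    S = softmax(logits, axis=-1)                   # row softmax: each row sums to 1
    fused = S @ V                                  # weighted sum of projected features
    return np.concatenate([fused, G], axis=-1)     # splice position info back in
```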
The parameter prediction network proposed by Yew Z. J. et al. selects suitable outlier parameters according to the alignment state of the current point clouds. This network takes the unaligned point clouds as input to predict the parameters of the current iteration. First, the source and target point clouds are spliced into a matrix of shape (B, 3, J+K), where B is the batch size, J and K are the numbers of points in the source and target point clouds, and 3 denotes the 3-dimensional coordinates; to indicate which cloud a point comes from, a 4th feature channel is added, with 0 marking a point from the source point cloud and 1 a point from the target point cloud. The input to the parameter prediction module therefore has shape (B, 4, J+K); a multi-layer perceptron followed by max pooling finally produces data of shape (B, 2), the outlier parameters α and β.
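The parameter prediction step can be sketched as follows, with random placeholder weights standing in for the trained network; the exp at the output, keeping α and β positive, is also an assumption.

```python
import numpy as np

def predict_outlier_params(src, tgt, rng=None):
    """Splice source and target points into one (4, J+K) array whose 4th row flags
    the origin cloud (0 = source, 1 = target), run a shared point-wise MLP layer,
    max-pool over points, and emit two scalars (alpha, beta)."""
    rng = rng or np.random.default_rng(0)
    J, K = len(src), len(tgt)
    flags = np.concatenate([np.zeros(J), np.ones(K)])
    x = np.concatenate([np.vstack([src, tgt]).T, flags[None, :]], axis=0)  # (4, J+K)
    W1 = rng.normal(size=(16, 4)) * 0.1    # placeholder weights (learned in the patent)
    W2 = rng.normal(size=(2, 16)) * 0.1
    h = np.maximum(W1 @ x, 0.0)            # shared per-point MLP layer + ReLU
    pooled = h.max(axis=1)                 # max pooling over all J+K points
    alpha, beta = np.exp(W2 @ pooled)      # exp keeps both parameters positive
    return alpha, beta
```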
Then, the outlier parameters α and β are input, together with the hybrid features, into the similarity matching module to construct a matching matrix M; each element m_jk ∈ M is initialized as:
where Fx_j and Fy_k are the hybrid features of the source point cloud and the target point cloud, respectively.
Alternate row and column normalization is then applied to the matching matrix; applied repeatedly, it turns any square matrix with all-positive entries into a doubly stochastic matrix, which yields the correspondence between the source point cloud and all points in the target point cloud:
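The alternate row/column normalization (Sinkhorn normalization) can be sketched as follows; the fixed iteration count is a hypothetical choice, and in practice one iterates until the row and column sums are sufficiently close to 1.

```python
import numpy as np

def sinkhorn(M, n_iters=20):
    """Alternate row/column normalization: any all-positive square matrix
    converges toward a doubly stochastic matrix (rows and columns sum to 1)."""
    for _ in range(n_iters):
        M = M / M.sum(axis=1, keepdims=True)   # normalize rows
        M = M / M.sum(axis=0, keepdims=True)   # normalize columns
    return M

rng = np.random.default_rng(1)
M = rng.uniform(0.1, 1.0, size=(5, 5))         # all-positive square matrix
D = sinkhorn(M)
```

After the final column step, columns sum to exactly 1 and rows are very close to 1, which is the doubly stochastic property the matching matrix needs.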
For each point p_j in the source point cloud, the corresponding point in the target point cloud can be computed as y_j = (Σ_(k=1..K) m_jk·q_k) / (Σ_(k=1..K) m_jk),
where k denotes the index of a point in the target point cloud, K denotes the number of points in the target point cloud, j denotes the index of a point in the source point cloud, and q_k denotes the k-th point in the target point cloud.
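A minimal sketch of this weighted-average correspondence computation, with a random non-negative matrix standing in for the matching matrix M:

```python
import numpy as np

def soft_correspondences(M, tgt):
    """For each source point p_j, the soft corresponding target point is the
    M-weighted average: y_j = sum_k m_jk * q_k / sum_k m_jk.

    M   : (J, K) non-negative matching matrix
    tgt : (K, 3) target point coordinates q_k
    """
    return (M @ tgt) / M.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
M = rng.uniform(size=(4, 6))       # J = 4 source points, K = 6 target points
tgt = rng.normal(size=(6, 3))
Y = soft_correspondences(M, tgt)   # (4, 3) corresponding points y_j
```

Because each row of weights is non-negative and normalized, every y_j is a convex combination of the target points, so it always lies inside their bounding box.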
Finally, singular value decomposition is used to solve for the rigid transformation:
With the corresponding points y_j in the target point cloud, the rotation matrix R^(n) and translation matrix t^(n) of the n-th iteration are solved by singular value decomposition:
where n denotes the iteration number, J denotes the number of points in the source point cloud, and R and t denote the rotation matrix and translation matrix obtained in the current iteration.
After obtaining the rotation matrix R^(n) and translation matrix t^(n), the current source point cloud is transformed with them to obtain a new source point cloud, and this iteration is repeated until the loss function converges, yielding the final rotation matrix R^(*) and translation matrix t^(*).
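The per-iteration SVD solve can be sketched with the standard closed-form solution for rigid alignment. This is an illustrative simplification: the correspondences are treated as unweighted here, whereas the full method pairs each source point with its matching-matrix average.

```python
import numpy as np

def solve_rigid_svd(src, corr):
    """Find R, t minimizing sum_j ||R p_j + t - y_j||^2 via SVD.

    src  : (J, 3) source points p_j
    corr : (J, 3) corresponding target points y_j
    """
    p_bar = src.mean(axis=0)
    y_bar = corr.mean(axis=0)
    H = (src - p_bar).T @ (corr - y_bar)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) in the solution.
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])
    R = Vt.T @ D @ U.T
    t = y_bar - R @ p_bar
    return R, t

# Sanity check: recover a known rotation about the z-axis plus a translation.
rng = np.random.default_rng(3)
src = rng.normal(size=(50, 3))
theta = 0.4
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -1.0, 2.0])
corr = src @ R_true.T + t_true
R_est, t_est = solve_rigid_svd(src, corr)
```

With noiseless correspondences the solve recovers the true motion to machine precision, which is what makes it a reliable inner step for the iteration.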
The first three rows and first three columns of the ground-truth transformation matrix in the data set used for network training are taken as the true rotation matrix R_gt; correspondingly, the last column of the first three rows of the ground-truth transformation matrix is taken as the true translation matrix t_gt. The loss function is defined as the L1 (Manhattan) distance between the true transformation {R_gt, t_gt} of the source point cloud and the predicted transformation {R^(*), t^(*)} of the source point cloud, and is calculated as:
where J is the number of points in the source point cloud, j is the index of a point in the source point cloud, and x_j denotes the j-th point in the source point cloud.
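A sketch of this L1 loss, comparing the predicted and ground-truth transforms applied to the source points; averaging over the J points is assumed.

```python
import numpy as np

def registration_l1_loss(src, R_pred, t_pred, R_gt, t_gt):
    """Mean per-point L1 (Manhattan) distance between the source cloud
    transformed by the predicted and by the ground-truth rigid motions."""
    pred = src @ R_pred.T + t_pred
    gt = src @ R_gt.T + t_gt
    return np.abs(pred - gt).sum(axis=1).mean()

rng = np.random.default_rng(4)
src = rng.normal(size=(10, 3))
R = np.eye(3)
# Identical transforms -> zero loss; a unit shift in x -> loss of exactly 1.
loss_zero = registration_l1_loss(src, R, np.zeros(3), R, np.zeros(3))
loss_shift = registration_l1_loss(src, R, np.array([1.0, 0.0, 0.0]), R, np.zeros(3))
```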
Point cloud registration results of the application: the visual registration results are shown in Fig. 4 (left: initial point clouds; right: registration result). The proposed registration method achieves accurate registration even when the point clouds contain noise and part of the point cloud is missing.
The experiments were run on an Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90 GHz with an RTX 3090 GPU, using the Python programming language and PyTorch version 1.13.0.
The method of the application is compared with the traditional ICP (Iterative Closest Point) algorithm and with a deep-learning-based method, RPM-Net (Robust Point Matching using Learned Features); the results are shown in Table 1.
All methods were tested on the ModelNet40 data set, with noise added and 30% of the point cloud missing. The anisotropic mean absolute errors of the rotation and translation matrices, MAE(r) and MAE(t), are reported, as well as the isotropic mean errors of the rotation and translation matrices, Error(r) and Error(t).
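For illustration, these error metrics can be sketched as below. The Euler-angle convention for the anisotropic MAE and the exact isotropic definitions are assumptions, since the text does not spell them out.

```python
import numpy as np

def rotation_to_euler_xyz(R):
    """Euler angles (x, y, z order) from a rotation matrix; one common
    convention, assumed here for illustration."""
    sy = np.hypot(R[0, 0], R[1, 0])
    return np.array([np.arctan2(R[2, 1], R[2, 2]),
                     np.arctan2(-R[2, 0], sy),
                     np.arctan2(R[1, 0], R[0, 0])])

def registration_errors(R_pred, t_pred, R_gt, t_gt):
    """Anisotropic MAE over Euler angles / translation components, plus the
    isotropic errors: rotation angle arccos((tr(R_gt^T R_pred) - 1) / 2)
    and Euclidean translation distance."""
    mae_r = np.abs(np.degrees(rotation_to_euler_xyz(R_pred))
                   - np.degrees(rotation_to_euler_xyz(R_gt))).mean()
    mae_t = np.abs(t_pred - t_gt).mean()
    c = np.clip((np.trace(R_gt.T @ R_pred) - 1.0) / 2.0, -1.0, 1.0)
    err_r = np.degrees(np.arccos(c))
    err_t = np.linalg.norm(t_pred - t_gt)
    return mae_r, mae_t, err_r, err_t

# A perfect prediction yields zero for all four metrics.
I = np.eye(3)
mae_r, mae_t, err_r, err_t = registration_errors(I, np.zeros(3), I, np.zeros(3))
```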
Table 1 Registration results of the different methods
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.
Claims (10)
1. A point cloud registration method based on a position-enhanced attention mechanism, comprising:
first, inputting the source point cloud and target point cloud data respectively into an adaptive graph convolution feature extraction module, and extracting multi-scale features of the source point cloud and the target point cloud;
second, inputting the multi-scale features of the source point cloud and the target point cloud into a position-enhanced attention mechanism module, extracting the position information of the source point cloud and the target point cloud respectively, learning the context feature information of the source point cloud and the target point cloud from the multi-scale features, and fusing the context feature information with the position information to obtain fusion feature information; performing feature information interaction between the fusion feature information of the source point cloud and that of the target point cloud to generate hybrid features;
then, obtaining the alignment state of the source point cloud and the target point cloud from their three-dimensional coordinates, inputting the alignment state into an outlier parameter module to generate outlier parameters, and inputting the outlier parameters together with the hybrid features into a similarity matching module, thereby obtaining the correspondence between the source point cloud and all points in the target point cloud;
finally, using singular value decomposition to obtain the rotation matrix and translation matrix between the source point cloud and the target point cloud, and performing iterative solution to obtain the final rotation matrix and translation matrix, completing the registration process.
2. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein performing the iterative solution to obtain the final rotation matrix and translation matrix comprises:
calculating a loss function between the obtained rotation matrix and translation matrix and the true rotation matrix and translation matrix provided by the data set; if the loss function has not converged, multiplying the source point cloud by the rotation matrix and adding the translation matrix to obtain a new source point cloud, and continuing a new round of iterative registration; if the loss function converges, outputting the rotation matrix and the translation matrix.
3. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein the processing procedure of the position-enhanced attention mechanism module is:
extracting context information from the multi-scale features of the source point cloud, directly extracting the position information of the source point cloud and concatenating it with the context information, inputting the concatenated features and the position information into the self-attention module of the position-enhanced attention module, and performing interaction between the context features and the position information to obtain the fusion feature information of the source point cloud, and likewise for the target point cloud; finally, inputting the fusion feature information of the source point cloud and the target point cloud into a cross-attention module, performing feature interaction between the source point cloud and the target point cloud, and finally outputting the hybrid features.
4. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein the position information is obtained by encoding the distances between points and the normal vector coordinates; the spatial distance between any two points in the source point cloud is computed from their coordinates; for the normal vector information, a linear layer is used directly to encode the normal vectors of the points in the source point cloud in space; and the spatial distance information of the points is concatenated with the encoded normal vector information to obtain the position information of the points.
5. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein the calculation process of the position-enhanced attention mechanism module is as follows:
Q = F·W_Q, K = F·W_K, V = F·W_V
wherein the multi-scale features of the source point cloud are F ∈ R^(J×d), the fusion feature information of the source point cloud is F′ ∈ R^(J×d), J is the number of points in the source point cloud, d is the number of feature dimensions, R denotes the set of real numbers, S_j denotes the attention weight, and the three projection matrices of the source point cloud input features are Q, K and V; j denotes the index of a point, the superscript T denotes the matrix transpose, W_Q, W_K and W_V are learnable parameters, MLP denotes the multi-layer perceptron applied to the position information of the points, softmax(·) denotes the row-wise softmax, and Cat[·,·] denotes concatenation.
6. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein inputting the alignment state of the source point cloud and the target point cloud into the outlier parameter module to generate the outlier parameters comprises:
using a parameter prediction network, taking the unaligned point clouds as input to predict the parameters of the current iteration; first, concatenating the source point cloud and the target point cloud into a matrix of shape (B, 3, J+K), wherein B is the batch size, J and K are the numbers of points in the source point cloud and the target point cloud respectively, and 3 denotes the 3-dimensional coordinates;
to characterize which point cloud a point comes from, a 4th feature channel is added, where 0 indicates that the point comes from the source point cloud and 1 indicates that it comes from the target point cloud; the input data of the parameter prediction module therefore has dimension (B, 4, J+K), and the outlier parameters α and β are finally obtained through a multi-layer perceptron and a max-pooling layer.
7. The point cloud registration method based on the position-enhanced attention mechanism according to claim 1, wherein the inputting the outlier parameter and the mixed feature together into the similarity matching module to obtain the correspondence between the source point cloud and all points in the target point cloud includes:
inputting the outlier parameters α and β and the hybrid features into the similarity matching module to construct a matching matrix M, each element m_jk ∈ M being initialized as:
wherein Fx_j and Fy_k are the hybrid features of the source point cloud and the target point cloud, respectively;
then applying alternate row and column normalization to the matching matrix; applied repeatedly, it turns any square matrix with all-positive entries into a doubly stochastic matrix, thereby obtaining the correspondence between the source point cloud and all points in the target point cloud.
8. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein for each point p_j in the source point cloud, the corresponding point in the target point cloud can be computed as y_j = (Σ_(k=1..K) m_jk·q_k) / (Σ_(k=1..K) m_jk),
wherein k denotes the index of a point in the target point cloud, K denotes the number of points in the target point cloud, j denotes the index of a point in the source point cloud, and q_k denotes the k-th point in the target point cloud.
9. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein using singular value decomposition to obtain the rotation matrix and translation matrix between the source point cloud and the target point cloud and performing iterative solution to obtain the final rotation matrix and translation matrix comprises:
with the corresponding points y_j in the target point cloud, solving the rotation matrix R^(n) and translation matrix t^(n) of the n-th iteration by singular value decomposition:
wherein n denotes the iteration number, J denotes the number of points in the source point cloud, and R and t denote the rotation matrix and translation matrix obtained in the current iteration;
after obtaining the rotation matrix R^(n) and translation matrix t^(n), transforming the current source point cloud to obtain a new source point cloud, and repeating the iteration until the loss function converges to obtain the final rotation matrix R^(*) and translation matrix t^(*).
10. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein the loss function is defined as the L1 (Manhattan) distance between the true transformation {R_gt, t_gt} of the source point cloud and the predicted transformation {R^(*), t^(*)} of the source point cloud, and is calculated as:
wherein J is the number of points in the source point cloud, j is the index of a point in the source point cloud, and x_j denotes the j-th point in the source point cloud.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310917905.4A CN116912296A (en) | 2023-07-25 | 2023-07-25 | Point cloud registration method based on position-enhanced attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116912296A true CN116912296A (en) | 2023-10-20 |
Family
ID=88362736
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN117876447A (en) * | 2024-03-13 | 2024-04-12 | 南京邮电大学 | Three-dimensional point cloud registration method based on micro-surface fusion and alignment
CN117876447B (en) * | 2024-03-13 | 2024-05-07 | 南京邮电大学 | Three-dimensional point cloud registration method based on micro-surface fusion and alignment
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||