CN116912296A - Point cloud registration method based on position-enhanced attention mechanism - Google Patents

Point cloud registration method based on position-enhanced attention mechanism

Info

Publication number
CN116912296A
CN116912296A
Authority
CN
China
Prior art keywords
point cloud
source point
matrix
source
target point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310917905.4A
Other languages
Chinese (zh)
Inventor
王丰
靳勇勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202310917905.4A priority Critical patent/CN116912296A/en
Publication of CN116912296A publication Critical patent/CN116912296A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/30 - Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 - Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/042 - Knowledge-based neural networks; Logical representations of neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/52 - Scale-space analysis, e.g. wavelet analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 - Matching configurations of points or features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 - Proximity, similarity or dissimilarity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a point cloud registration method based on a position-enhanced attention mechanism, which comprises the following steps: first, multi-scale features of a source point cloud and a target point cloud are extracted; second, the position information of the source point cloud and the target point cloud is respectively extracted, context feature information of the two clouds is learned from the multi-scale features, and the context feature information is fused with the position information to obtain fused feature information; feature information interaction is then performed between the fused feature information of the source point cloud and the target point cloud to generate hybrid features; outlier parameters are generated according to the alignment state of the source point cloud and the target point cloud, and the correspondence between the source point cloud and all points in the target point cloud is obtained using the outlier parameters and the hybrid features; finally, a singular value decomposition method is used to obtain the rotation matrix and translation matrix between the source point cloud and the target point cloud, and the solution is iterated to obtain the final rotation matrix and translation matrix, completing the registration process.

Description

Point cloud registration method based on position-enhanced attention mechanism
Technical Field
The application relates to the field of three-dimensional point cloud registration in deep learning and computer vision, in particular to a three-dimensional point cloud registration method based on a position-enhanced attention mechanism.
Background Art
Point cloud registration is the process of transforming point clouds acquired by a scanning device from different viewpoints into the same coordinate system through rotation, translation and the like, and it is widely applied in pose estimation, three-dimensional reconstruction, mobile robotics and other fields. However, point cloud registration still faces challenges in practice, mainly because 1) point clouds scanned from different viewpoints suffer from noise and partial invisibility, and 2) point clouds are unordered and sparse. Therefore, in practical point cloud registration tasks, it is essential to improve the accuracy and robustness of the algorithm.
According to how the data are represented, existing point cloud registration methods can be divided into voxel-based, multi-view-based and point-based methods. Since the first two representations cause information loss, point-based methods are currently the most widely used. The PointNet algorithm addresses the unordered nature and rotation invariance of point clouds, and the point cloud registration problem has since been widely handled with deep-learning-based methods. Compared with the traditional iterative closest point registration method, deep-learning-based methods avoid falling into local optima. However, when the point cloud contains noise or part of the point cloud is missing, deep-learning-based methods cannot maintain a good registration effect.
Among existing point cloud registration methods, conventional methods such as the Iterative Closest Point (ICP) algorithm tend to fall into local optima. Deep-learning-based methods, for example RPM-Net (Robust Point Matching using Learned Features), achieve effective registration in the presence of some noise and small amounts of missing points, but when the missing portion of the point cloud reaches 30%, the registration result becomes unsatisfactory. The reason is that current deep-learning-based registration methods focus only on local geometric features of the point cloud, lack an understanding of the global state, and do not consider feature interaction between the source and target point clouds or the position information of corresponding points, so the learned features are not sufficiently discriminative and a large number of incorrect point correspondences result.
Therefore, how to improve the registration effect when the point cloud contains noise and the missing portion of the point cloud reaches 30% or more is a difficult problem that remains to be solved.
Disclosure of Invention
The purpose of the application is to provide a point cloud registration method based on a position-enhanced attention mechanism, so that the network can learn the context information of each point cloud while incorporating the geometric structure of the point cloud, obtaining features with stronger geometric relevance and improving registration performance.
To achieve this goal, the application adopts the following technical scheme:
a point cloud registration method based on a position-enhanced attention mechanism, comprising:
First, the source point cloud data and the target point cloud data are respectively input into an adaptive graph convolution feature extraction module to extract the multi-scale features of the source point cloud and the target point cloud;
Second, the multi-scale features of the source point cloud and the target point cloud are input into the position-enhanced attention mechanism module; the position information of the source point cloud and the target point cloud is respectively extracted, the context feature information of the source point cloud and the target point cloud is learned from the multi-scale features, and the context feature information is fused with the position information to obtain fused feature information; feature information interaction is then performed between the fused feature information of the source point cloud and the target point cloud to generate hybrid features;
Then, the alignment state of the source point cloud and the target point cloud is obtained from their three-dimensional coordinates and input into an outlier parameter module to generate outlier parameters; the outlier parameters and the hybrid features are input together into a similarity matching module, thereby obtaining the correspondence between the source point cloud and all points in the target point cloud;
Finally, a singular value decomposition method is used to obtain the rotation matrix and translation matrix between the source point cloud and the target point cloud, and the solution is iterated to obtain the final rotation matrix and translation matrix, completing the registration process.
Further, performing the iterative solution to obtain the final rotation matrix and translation matrix includes:
calculating a loss function between the obtained rotation matrix and translation matrix and the ground-truth rotation matrix and translation matrix provided by the dataset; if the loss function has not converged, multiplying the source point cloud by the rotation matrix and adding the translation matrix to obtain a new source point cloud, and continuing with a new round of iterative registration; if the loss function has converged, outputting the rotation matrix and the translation matrix.
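The overall flow can be summarized as the sketch below. It is a minimal illustration only: the module dictionary, the function signatures and the fixed iteration count are assumptions made for readability (the method itself stops when the loss function converges rather than after a fixed number of rounds).

```python
import torch

def iterative_registration(src, tgt, modules, n_iters=5):
    # src: (B, J, 3) source cloud, tgt: (B, K, 3) target cloud.
    B = src.shape[0]
    R_final = torch.eye(3, device=src.device).repeat(B, 1, 1)
    t_final = torch.zeros(B, 3, device=src.device)
    cur = src
    for _ in range(n_iters):
        fx = modules["extract"](cur)                          # multi-scale features of the source cloud
        fy = modules["extract"](tgt)                          # multi-scale features of the target cloud
        hx, hy = modules["attend"](fx, fy, cur, tgt)          # hybrid features from position-enhanced attention
        alpha, beta = modules["outlier"](cur, tgt)            # outlier parameters from the alignment state
        M, corr = modules["match"](hx, hy, tgt, alpha, beta)  # matching matrix and soft correspondences
        R, t = modules["svd"](cur, corr, M.sum(dim=2))        # per-iteration rigid transform via SVD
        cur = cur @ R.transpose(1, 2) + t.unsqueeze(1)        # new source cloud for the next round
        R_final = R @ R_final                                 # compose into the final transform
        t_final = (R @ t_final.unsqueeze(-1)).squeeze(-1) + t
    return R_final, t_final
```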
Further, the processing procedure of the position-enhanced attention mechanism module is:
extracting context information from the multi-scale features of the source point cloud, directly extracting the position information of the source point cloud, concatenating the position information with the context information, and inputting the concatenated features and the position information into the self-attention module of the position-enhanced attention module, where the context features interact with the position information to obtain the fused feature information of the source point cloud (and likewise for the target point cloud); finally, inputting the fused feature information of the source point cloud and the target point cloud into a cross-attention module, where feature interaction between the source point cloud and the target point cloud is carried out, and finally outputting the hybrid features.
Further, the position information is encoded from the distances between points and the normal vector coordinates; for any two points p_j and p_k in the source point cloud, their spatial distance is computed as d(p_j, p_k) = ||p_j - p_k||_2; for the normal vector information, a linear layer is directly used to encode the normal vector of each point in the source point cloud; the spatial distance information of a point is concatenated with its encoded normal vector information to obtain the position information of the point.
Further, the calculation process of the position-enhanced attention mechanism module is as follows:
Q = F·W_Q, K = F·W_K, V = F·W_V
where F ∈ R^(J×d) denotes the multi-scale features of the source point cloud and F' ∈ R^(J×d) denotes the fused feature information of the source point cloud; J is the number of points in the source point cloud, d is the number of feature dimensions, and R denotes the set of real numbers; S_j denotes the attention weight, and Q, K and V are the three projection matrices of the source point cloud input features; j denotes the index of a point, the superscript T denotes the matrix transpose, the position information of the point is denoted by its own vector, W_Q, W_K and W_V are learnable parameters, MLP denotes a multi-layer perceptron, softmax(·) denotes a row-wise softmax, and cat[·,·] denotes concatenation.
Further, inputting the alignment state of the source point cloud and the target point cloud into the outlier parameter module to generate the outlier parameters includes:
using a parameter prediction network that takes the unaligned point clouds as input to predict the parameters of the current iteration; first, the source point cloud and the target point cloud are concatenated into a matrix of size (B, 3, J+K), where B is the batch size, J and K are the numbers of points in the source point cloud and the target point cloud respectively, and 3 denotes the 3-dimensional coordinates;
to indicate which point cloud a point comes from, a 4th feature channel is added, where 0 indicates that the point comes from the source point cloud and 1 indicates that it comes from the target point cloud; the input to the parameter prediction module therefore has dimensions (B, 4, J+K), and the outlier parameters α and β are finally obtained through a multi-layer perceptron and a max-pooling layer.
Further, inputting the outlier parameters and the hybrid features together into the similarity matching module to obtain the correspondence between the source point cloud and all points in the target point cloud includes:
inputting the outlier parameters α and β and the hybrid features into the similarity matching module to construct a matching matrix M; each element m_jk ∈ M of the matching matrix is initialized as follows:
where Fx_j and Fy_k are the hybrid features of the source point cloud and the target point cloud, respectively;
the matching matrix is then normalized by alternating row and column normalization; applying this alternating normalization repeatedly turns any square matrix with all positive entries into a doubly stochastic matrix, thereby yielding the correspondence between the source point cloud and all points in the target point cloud.
Further, for each point p_j in the source point cloud, its corresponding point in the target point cloud can be computed as:
where k denotes the index of a point in the target point cloud, K denotes the number of points in the target point cloud, j denotes the index of a point in the source point cloud, and q_k denotes the k-th point in the target point cloud.
Further, using the singular value decomposition method to obtain the rotation matrix and translation matrix between the source point cloud and the target point cloud, and performing the iterative solution to obtain the final rotation matrix and translation matrix, includes:
using the corresponding points in the target point cloud, solving for the rotation matrix R^(n) and the translation matrix t^(n) of the n-th iteration by singular value decomposition:
where n denotes the iteration number, J denotes the number of points in the source point cloud, and R, t denote the rotation matrix and translation matrix obtained in the current iteration;
after obtaining the rotation matrix R^(n) and the translation matrix t^(n), the current source point cloud is transformed to obtain a new source point cloud, and the iteration is repeated until the loss function converges, yielding the final rotation matrix R^(*) and translation matrix t^(*).
Further, the loss function is defined as the L1 distance between the ground-truth transformation matrix {R_gt, t_gt} of the source point cloud and the predicted transformation matrix {R^(*), t^(*)} of the source point cloud; the loss function is calculated as:
where J is the number of points in the source point cloud, j is the index of a point in the source point cloud, and x_j denotes the j-th point in the source point cloud.
Compared with the prior art, the application has the following technical characteristics:
when the point cloud has noise and when part of the point cloud is missing, the traditional algorithm and the learning-based algorithm can not be registered more effectively. The application provides an end-to-end point cloud registration scheme: extracting local geometric features of the point cloud from the original pair of point clouds, splicing the local geometric features into multi-scale features, inputting the features, the distance and normal line information of the points into a position-enhanced attention mechanism, and finally obtaining the mixed features. And then obtaining point corresponding relation based on the mixed characteristic, and finally obtaining a final rotation matrix and a final translation matrix by utilizing singular value decomposition. Compared with the traditional registration method and the registration method based on deep learning, the method has better registration effect under the scene that the point cloud has noise and part of the point cloud is missing; the position enhancing attention mechanism provided by the application can improve the understanding of the registration network to the position information and the distinguishing ability of the learned features, and improve the registration performance.
Drawings
FIG. 1 is a schematic diagram of the network of the method of the present application;
FIG. 2 is a schematic diagram of the position-enhanced attention module;
FIG. 3 is a schematic flow chart of the method of the present application;
FIG. 4 is a diagram of the registration effect of the method of the present application in an embodiment.
Detailed Description
The application provides a point cloud registration method based on a position-enhanced attention mechanism, whose structure is shown in FIG. 1:
First, the source point cloud data and the target point cloud data are respectively input into an adaptive graph convolution feature extraction module to extract the multi-scale features of the source point cloud and the target point cloud;
Second, the multi-scale features of the source point cloud and the target point cloud are input into the position-enhanced attention mechanism module; the position information of the source point cloud and the target point cloud is respectively extracted, the context feature information of the source point cloud and the target point cloud is learned from the multi-scale features, and the context feature information is fused with the position information to obtain fused feature information; feature information interaction is then performed between the fused feature information of the source point cloud and the target point cloud to generate hybrid features;
Then, the alignment state of the source point cloud and the target point cloud is obtained from their three-dimensional coordinates and input into an outlier parameter module to generate outlier parameters; the outlier parameters and the hybrid features are input together into a similarity matching module, thereby obtaining the correspondence between the source point cloud and all points in the target point cloud;
Finally, a singular value decomposition method is used to obtain the rotation matrix and translation matrix between the source point cloud and the target point cloud, and the solution is iterated to obtain the final rotation matrix and translation matrix, completing the registration process.
Performing the iterative solution to obtain the final rotation matrix and translation matrix comprises the following steps:
calculating a loss function between the obtained rotation matrix and translation matrix and the ground-truth rotation matrix and translation matrix provided by the dataset; if the loss function has not converged, multiplying the source point cloud by the rotation matrix and adding the translation matrix to obtain a new source point cloud, and continuing with a new round of iterative registration; if the loss function has converged, outputting the rotation matrix and the translation matrix.
The adaptive graph convolution proposed by Wei M et al. overcomes the drawback of the fixed kernel of standard graph convolution: it adaptively establishes the relationship between a pair of points according to their feature attributes and generates an adaptive kernel, so that the different relationships between different parts of the point cloud can be extracted more effectively, but it lacks a global understanding. Therefore, the method inputs the point cloud into an adaptive graph convolution feature extraction module composed of four adaptive graph convolution layers (64, 64, 128, 256) to perform multi-level feature extraction, and then concatenates the outputs to generate the multi-scale features, as sketched below.
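The sketch below illustrates only the layer widths and the multi-scale concatenation described above. The adaptive kernel of Wei M et al. is not reproduced here; a plain EdgeConv-style k-NN graph convolution on a fixed coordinate graph stands in for it, and that substitution, along with the choice of k, is an assumption for illustration.

```python
import torch
import torch.nn as nn

def knn(xyz, k=20):
    # xyz: (B, N, 3) -> indices of the k nearest neighbours, (B, N, k)
    return torch.cdist(xyz, xyz).topk(k, largest=False).indices

class GraphConv(nn.Module):
    # Simple EdgeConv-style layer used as a stand-in for the adaptive graph convolution.
    def __init__(self, c_in, c_out):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * c_in, c_out), nn.ReLU())

    def forward(self, feat, idx):
        # feat: (B, N, C) point features; idx: (B, N, k) neighbour indices
        B, N, C = feat.shape
        nbr = torch.gather(feat.unsqueeze(1).expand(B, N, N, C), 2,
                           idx.unsqueeze(-1).expand(-1, -1, -1, C))
        edge = torch.cat([feat.unsqueeze(2).expand_as(nbr), nbr - feat.unsqueeze(2)], dim=-1)
        return self.mlp(edge).max(dim=2).values          # (B, N, c_out)

class MultiScaleExtractor(nn.Module):
    # Four graph-convolution layers with widths (64, 64, 128, 256); outputs are concatenated.
    def __init__(self, widths=(64, 64, 128, 256)):
        super().__init__()
        chans = [3] + list(widths)
        self.layers = nn.ModuleList([GraphConv(chans[i], chans[i + 1]) for i in range(len(widths))])

    def forward(self, xyz):
        idx = knn(xyz)
        feats, f = [], xyz
        for layer in self.layers:
            f = layer(f, idx)
            feats.append(f)
        return torch.cat(feats, dim=-1)                  # (B, N, 512) multi-scale feature
```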
For the multi-scale features of the source point cloud and of the target point cloud, the processing of the position-enhanced attention mechanism module is as follows: first, context information is extracted from the multi-scale features of the source point cloud, the position information of the source point cloud is extracted directly, and the position information is concatenated with the context information; the concatenated features and the position information are input into the self-attention module of the position-enhanced attention module, where the context features interact with the position information to obtain the fused feature information of the source point cloud (and likewise for the target point cloud); finally, the fused feature information of the source point cloud and the target point cloud is input into a cross-attention module for feature interaction between the two clouds, and the hybrid features are output; the structure is shown in FIG. 2. The position-enhanced attention mechanism of the application mainly embeds the position information of the point cloud into the computation of the attention module, which helps the model learn the spatial structure between points in the point cloud and lets the model pay more attention to the information of specific regions when processing the point cloud, thereby improving the perception of key points and reducing incorrect point correspondences between the source point cloud and the target point cloud.
The position information is encoded primarily from the distances between points and the normal vector coordinates. The method computes the Euclidean distance between points: for any two points p_j and p_k in the source point cloud, their spatial distance is d(p_j, p_k) = ||p_j - p_k||_2. For the normal vector information, a linear layer is directly used to encode the normal vector of each point in the source point cloud. The spatial distance information of a point is concatenated with its encoded normal vector information to obtain the position information of the point, thereby encoding the point's position.
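A minimal sketch of this position encoding follows. The output width d_pos and the way the per-point distance information is summarized (a linear layer over the row of distances to the other points) are assumptions; the text only states that distance information and the linearly encoded normal are concatenated.

```python
import torch
import torch.nn as nn

class PositionEncoder(nn.Module):
    def __init__(self, n_points, d_pos=64):
        super().__init__()
        self.dist_fc = nn.Linear(n_points, d_pos // 2)   # encodes a point's distances to all other points
        self.normal_fc = nn.Linear(3, d_pos // 2)        # linear layer encoding the normal vector

    def forward(self, xyz, normals):
        # xyz: (B, J, 3) coordinates, normals: (B, J, 3) per-point normal vectors
        dist = torch.cdist(xyz, xyz)                     # d(p_j, p_k) = ||p_j - p_k||_2, shape (B, J, J)
        return torch.cat([self.dist_fc(dist), self.normal_fc(normals)], dim=-1)   # (B, J, d_pos)
```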
The calculation process of the source point cloud in the self-attention module is given below.
Given the input feature matrix (the multi-scale features of the source point cloud) F ∈ R^(J×d), the output feature matrix (the fused feature information of the source point cloud) F' ∈ R^(J×d) (J is the number of points in the source point cloud, d is the number of feature dimensions, and R denotes the set of real numbers) is a weighted sum of projections of all input features F, with S_j denoting the attention weight. First, Q, K and V (the projection matrices of the source point cloud input features) are computed:
Q = F·W_Q, K = F·W_K, V = F·W_V
Here J is the number of points in the source point cloud, j denotes the index of a point, the superscript T denotes the matrix transpose, the position information of the point is denoted by its own vector, W_Q, W_K and W_V are learnable parameters obtained by training on the dataset, MLP denotes a multi-layer perceptron, softmax(·) denotes a row-wise softmax, and cat[·,·] denotes concatenation.
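The sketch below shows one plausible instantiation of this position-enhanced self-attention. The exact rule for fusing the position information into the attention weights appears in the original only as image-rendered formulas, so the concatenation of the MLP-processed position vector with the features before the Q/K projections, and the 1/sqrt(d) scaling, are assumptions for illustration.

```python
import math
import torch
import torch.nn as nn

class PositionEnhancedSelfAttention(nn.Module):
    def __init__(self, d, d_pos):
        super().__init__()
        self.wq = nn.Linear(d + d_pos, d, bias=False)    # W_Q applied to cat[F, position]
        self.wk = nn.Linear(d + d_pos, d, bias=False)    # W_K applied to cat[F, position]
        self.wv = nn.Linear(d, d, bias=False)            # W_V applied to F
        self.pos_mlp = nn.Sequential(nn.Linear(d_pos, d_pos), nn.ReLU(), nn.Linear(d_pos, d_pos))

    def forward(self, F, pos):
        # F: (B, J, d) multi-scale features, pos: (B, J, d_pos) position information
        p = self.pos_mlp(pos)
        q = self.wq(torch.cat([F, p], dim=-1))
        k = self.wk(torch.cat([F, p], dim=-1))
        v = self.wv(F)
        S = torch.softmax(q @ k.transpose(1, 2) / math.sqrt(q.shape[-1]), dim=-1)  # row-wise softmax
        return S @ v                                     # fused feature F' as a weighted sum of projections
```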
A cross-attention module is added after the self-attention module so that effective feature information interaction can take place between the source point cloud and the target point cloud, finally yielding the hybrid features.
The parameter prediction network proposed by Yew Z J et al. can select suitable outlier parameters according to the alignment state of the current point clouds. The parameter prediction network therefore takes the unaligned point clouds as input to predict the parameters of the current iteration. First, the source point cloud and the target point cloud are concatenated into a matrix of size (B, 3, J+K) (B is the batch size, J and K are the numbers of points in the source and target point clouds respectively, and 3 denotes the 3-dimensional coordinates); to indicate which point cloud a point comes from, a 4th feature channel is added, where 0 indicates that the point comes from the source point cloud and 1 indicates that it comes from the target point cloud. The input to the parameter prediction module therefore has dimensions (B, 4, J+K); after a multi-layer perceptron and a max-pooling layer, data of size (B, 2) are obtained as the outlier parameters α and β.
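A minimal sketch of this outlier-parameter network is given below. The layer widths and the softplus used to keep α and β positive are assumptions; only the (B, 4, J+K) input with the origin flag, the shared MLP, the max pooling and the (B, 2) output are taken from the description above.

```python
import torch
import torch.nn as nn

class OutlierParamNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv1d(4, 64, 1), nn.ReLU(),
                                 nn.Conv1d(64, 128, 1), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, src, tgt):
        # src: (B, J, 3), tgt: (B, K, 3)
        flag_src = torch.zeros(src.shape[0], src.shape[1], 1, device=src.device)   # 0 = source
        flag_tgt = torch.ones(tgt.shape[0], tgt.shape[1], 1, device=tgt.device)    # 1 = target
        x = torch.cat([torch.cat([src, flag_src], dim=-1),
                       torch.cat([tgt, flag_tgt], dim=-1)], dim=1)   # (B, J+K, 4)
        feat = self.mlp(x.transpose(1, 2)).max(dim=2).values          # shared MLP + max pooling -> (B, 128)
        alpha, beta = nn.functional.softplus(self.head(feat)).unbind(dim=1)
        return alpha, beta                                             # each of shape (B,)
```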
The outlier parameters α and β and the hybrid features are then input into the similarity matching module to construct the matching matrix M; each element m_jk ∈ M of the matching matrix is initialized as follows:
where Fx_j and Fy_k are the hybrid features of the source point cloud and the target point cloud, respectively.
The matching matrix is then normalized by alternating row and column normalization; applying this alternating normalization repeatedly turns any square matrix with all positive entries into a doubly stochastic matrix, thereby yielding the correspondence between the source point cloud and all points in the target point cloud:
for each point p_j in the source point cloud, its corresponding point in the target point cloud can be computed as:
where k denotes the index of a point in the target point cloud, K denotes the number of points in the target point cloud, j denotes the index of a point in the source point cloud, and q_k denotes the k-th point in the target point cloud.
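The following sketch covers the matching matrix, the alternating row/column normalization and the soft correspondences. The initialization m_jk = exp(-β(||Fx_j - Fy_k||^2 - α)) follows the RPM-Net-style formulation and is an assumption, since the original initialization formula is rendered as an image; the number of normalization rounds is likewise illustrative.

```python
import torch

def soft_correspondences(fx, fy, tgt, alpha, beta, n_rounds=5, eps=1e-8):
    # fx: (B, J, d) hybrid source features, fy: (B, K, d) hybrid target features
    # tgt: (B, K, 3) target cloud, alpha/beta: (B,) outlier parameters
    dist2 = torch.cdist(fx, fy) ** 2                               # (B, J, K) squared feature distances
    M = torch.exp(-beta[:, None, None] * (dist2 - alpha[:, None, None]))
    for _ in range(n_rounds):                                      # alternating row / column normalization
        M = M / (M.sum(dim=2, keepdim=True) + eps)
        M = M / (M.sum(dim=1, keepdim=True) + eps)
    corr = (M @ tgt) / (M.sum(dim=2, keepdim=True) + eps)          # weighted corresponding points q_hat_j
    return M, corr
```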
Finally, singular value decomposition is used to solve for the rigid transformation:
using the corresponding points in the target point cloud, the rotation matrix R^(n) and the translation matrix t^(n) of the n-th iteration are solved by singular value decomposition:
where n denotes the iteration number, J denotes the number of points in the source point cloud, and R, t denote the rotation matrix and translation matrix obtained in the current iteration.
After obtaining the rotation matrix R^(n) and the translation matrix t^(n), the current source point cloud is transformed to obtain a new source point cloud, and the iteration is repeated until the loss function converges, yielding the final rotation matrix R^(*) and translation matrix t^(*).
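A sketch of the singular-value-decomposition step is given below as a weighted Kabsch/Umeyama solve: center both point sets, build the weighted covariance, and recover R and t with a reflection correction. Using the row sums of the matching matrix as per-point weights is an assumption for illustration.

```python
import torch

def solve_rigid_svd(src, corr, weights=None):
    # src: (B, J, 3) source points, corr: (B, J, 3) corresponding points, weights: (B, J) or None
    B, J, _ = src.shape
    w = torch.ones(B, J, 1, device=src.device) if weights is None else weights.unsqueeze(-1)
    w = w / (w.sum(dim=1, keepdim=True) + 1e-8)
    c_src = (w * src).sum(dim=1, keepdim=True)                      # weighted centroid of the source
    c_corr = (w * corr).sum(dim=1, keepdim=True)                    # weighted centroid of the correspondences
    H = ((src - c_src) * w).transpose(1, 2) @ (corr - c_corr)       # (B, 3, 3) weighted covariance
    U, _, Vt = torch.linalg.svd(H)
    d = torch.sign(torch.linalg.det(Vt.transpose(1, 2) @ U.transpose(1, 2)))
    D = torch.diag_embed(torch.stack([torch.ones_like(d), torch.ones_like(d), d], dim=1))
    R = Vt.transpose(1, 2) @ D @ U.transpose(1, 2)                  # rotation with reflection corrected
    t = c_corr.squeeze(1) - (R @ c_src.transpose(1, 2)).squeeze(-1)
    return R, t
```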
The first three rows and first three columns of the ground-truth transformation matrix in the dataset used for network training are taken as the ground-truth rotation matrix R_gt; correspondingly, the last column of the first three rows of the ground-truth transformation matrix in the dataset is taken as the ground-truth translation matrix t_gt. The loss function is defined as the L1 distance, i.e. the Manhattan distance, between the ground-truth transformation {R_gt, t_gt} of the source point cloud and the predicted transformation {R^(*), t^(*)} of the source point cloud. The loss function is calculated as:
where J is the number of points in the source point cloud, j is the index of a point in the source point cloud, and x_j denotes the j-th point in the source point cloud.
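A minimal sketch of this loss follows the textual definition above (the original formula is rendered as an image): the source cloud is transformed by the ground-truth and by the predicted rotation and translation, and the mean absolute (L1) difference between the two transformed clouds is taken.

```python
import torch

def registration_loss(src, R_gt, t_gt, R_pred, t_pred):
    # src: (B, J, 3); R_gt, R_pred: (B, 3, 3); t_gt, t_pred: (B, 3)
    p_gt = src @ R_gt.transpose(1, 2) + t_gt.unsqueeze(1)       # ground-truth-transformed source cloud
    p_pred = src @ R_pred.transpose(1, 2) + t_pred.unsqueeze(1) # predicted-transformed source cloud
    return (p_gt - p_pred).abs().mean()                         # L1 / Manhattan distance
```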
Point cloud registration result of the application: the visualized registration results are shown in FIG. 4 (the left image shows the initial point clouds, the right image shows the registration result). It can be seen that the registration method provided by the application achieves accurate registration in scenes where the point cloud contains noise and part of the point cloud is missing.
The computer processor used in this experiment was an Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90 GHz, the graphics card was an RTX 3090 GPU, the PyTorch version was 1.13.0, and the Python programming language was used.
The method of the application is compared with the conventional ICP (Iterative Closest Point) algorithm and with a deep-learning-based algorithm, RPM-Net (Robust Point Matching using Learned Features); the results are shown in Table 1.
All methods were tested on the ModelNet40 dataset, with noise added and 30% of the point cloud missing. The anisotropic mean absolute errors of the rotation and translation matrices, MAE(r) and MAE(t), are used, as well as the isotropic errors of the rotation and translation matrices, Error(r) and Error(t).
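The sketch below computes these four metrics under the common convention for their names: MAE(r)/MAE(t) as the mean absolute error over Euler angles and translation components, Error(r)/Error(t) as the isotropic rotation angle and translation distance. The exact definitions are not spelled out in the text, so these formulas are assumptions for illustration.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def registration_metrics(R_gt, t_gt, R_pred, t_pred):
    # R_gt, R_pred: (3, 3) numpy arrays; t_gt, t_pred: (3,) numpy arrays
    e_gt = Rotation.from_matrix(R_gt).as_euler("xyz", degrees=True)
    e_pred = Rotation.from_matrix(R_pred).as_euler("xyz", degrees=True)
    mae_r = np.abs(e_gt - e_pred).mean()                   # anisotropic rotation error (degrees)
    mae_t = np.abs(t_gt - t_pred).mean()                   # anisotropic translation error
    dR = R_gt.T @ R_pred                                   # residual rotation
    err_r = np.degrees(np.arccos(np.clip((np.trace(dR) - 1) / 2, -1.0, 1.0)))  # isotropic rotation error
    err_t = np.linalg.norm(t_gt - t_pred)                  # isotropic translation error
    return mae_r, mae_t, err_r, err_t
```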
Table 1 Registration results of the different methods
The above embodiments are only intended to illustrate the technical solution of the present application and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be included in the scope of the present application.

Claims (10)

1. A point cloud registration method based on a position-enhanced attention mechanism, comprising:
first, respectively inputting source point cloud data and target point cloud data into an adaptive graph convolution feature extraction module to extract multi-scale features of the source point cloud and the target point cloud;
second, inputting the multi-scale features of the source point cloud and the target point cloud into a position-enhanced attention mechanism module, respectively extracting the position information of the source point cloud and the target point cloud, learning the context feature information of the source point cloud and the target point cloud from the multi-scale features, and fusing the context feature information with the position information to obtain fused feature information; then performing feature information interaction between the fused feature information of the source point cloud and the target point cloud to generate hybrid features;
then, obtaining the alignment state of the source point cloud and the target point cloud from their three-dimensional coordinates, inputting the alignment state into an outlier parameter module to generate outlier parameters, and inputting the outlier parameters and the hybrid features together into a similarity matching module, thereby obtaining the correspondence between the source point cloud and all points in the target point cloud;
finally, using a singular value decomposition method to obtain the rotation matrix and translation matrix between the source point cloud and the target point cloud, and performing an iterative solution to obtain the final rotation matrix and translation matrix, completing the registration process.
2. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein performing the iterative solution to obtain the final rotation matrix and translation matrix comprises:
calculating a loss function between the obtained rotation matrix and translation matrix and the ground-truth rotation matrix and translation matrix provided by the dataset; if the loss function has not converged, multiplying the source point cloud by the rotation matrix and adding the translation matrix to obtain a new source point cloud, and continuing with a new round of iterative registration; if the loss function has converged, outputting the rotation matrix and the translation matrix.
3. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein the processing procedure of the position-enhanced attention mechanism module is:
extracting context information from the multi-scale features of the source point cloud, directly extracting the position information of the source point cloud, concatenating the position information with the context information, and inputting the concatenated features and the position information into the self-attention module of the position-enhanced attention module, where the context features interact with the position information to obtain the fused feature information of the source point cloud, and likewise for the target point cloud; finally, inputting the fused feature information of the source point cloud and the target point cloud into a cross-attention module for feature interaction between the source point cloud and the target point cloud, and finally outputting the hybrid features.
4. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein the position information is encoded from the distances between points and the normal vector coordinates; for any two points p_j and p_k in the source point cloud, their spatial distance is computed as d(p_j, p_k) = ||p_j - p_k||_2; for the normal vector information, a linear layer is directly used to encode the normal vector of each point in the source point cloud; and the spatial distance information of a point is concatenated with its encoded normal vector information to obtain the position information of the point.
5. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein the calculation process of the position-enhanced attention mechanism module is as follows:
Q = F·W_Q, K = F·W_K, V = F·W_V
where F ∈ R^(J×d) denotes the multi-scale features of the source point cloud and F' ∈ R^(J×d) denotes the fused feature information of the source point cloud; J is the number of points in the source point cloud, d is the number of feature dimensions, and R denotes the set of real numbers; S_j denotes the attention weight, and Q, K and V are the three projection matrices of the source point cloud input features; j denotes the index of a point, the superscript T denotes the matrix transpose, the position information of the point is denoted by its own vector, W_Q, W_K and W_V are learnable parameters, MLP denotes a multi-layer perceptron, softmax(·) denotes a row-wise softmax, and cat[·,·] denotes concatenation.
6. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein inputting the alignment state of the source point cloud and the target point cloud into the outlier parameter module to generate the outlier parameters comprises:
using a parameter prediction network that takes the unaligned point clouds as input to predict the parameters of the current iteration; first, concatenating the source point cloud and the target point cloud into a matrix of size (B, 3, J+K), where B is the batch size, J and K are the numbers of points in the source point cloud and the target point cloud respectively, and 3 denotes the 3-dimensional coordinates;
to indicate which point cloud a point comes from, adding a 4th feature channel, where 0 indicates that the point comes from the source point cloud and 1 indicates that it comes from the target point cloud; the input to the parameter prediction module therefore has dimensions (B, 4, J+K), and the outlier parameters α and β are finally obtained through a multi-layer perceptron and a max-pooling layer.
7. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein inputting the outlier parameters and the hybrid features together into the similarity matching module to obtain the correspondence between the source point cloud and all points in the target point cloud comprises:
inputting the outlier parameters α and β and the hybrid features into the similarity matching module to construct a matching matrix M, each element m_jk ∈ M of the matching matrix being initialized as follows:
where Fx_j and Fy_k are the hybrid features of the source point cloud and the target point cloud, respectively;
then performing alternating row and column normalization on the matching matrix; applying the alternating normalization repeatedly turns any square matrix with all positive entries into a doubly stochastic matrix, thereby obtaining the correspondence between the source point cloud and all points in the target point cloud.
8. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein for each point p_j in the source point cloud, its corresponding point in the target point cloud can be computed as:
where k denotes the index of a point in the target point cloud, K denotes the number of points in the target point cloud, j denotes the index of a point in the source point cloud, and q_k denotes the k-th point in the target point cloud.
9. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein using a singular value decomposition method to obtain the rotation matrix and translation matrix between the source point cloud and the target point cloud, and performing the iterative solution to obtain the final rotation matrix and translation matrix, comprises:
using the corresponding points in the target point cloud, solving for the rotation matrix R^(n) and the translation matrix t^(n) of the n-th iteration by singular value decomposition:
where n denotes the iteration number, J denotes the number of points in the source point cloud, and R, t denote the rotation matrix and translation matrix obtained in the current iteration;
after obtaining the rotation matrix R^(n) and the translation matrix t^(n), transforming the current source point cloud to obtain a new source point cloud, and repeating the iteration until the loss function converges, obtaining the final rotation matrix R^(*) and translation matrix t^(*).
10. The point cloud registration method based on a position-enhanced attention mechanism according to claim 1, wherein the loss function is defined as the L1 distance between the ground-truth transformation matrix {R_gt, t_gt} of the source point cloud and the predicted transformation matrix {R^(*), t^(*)} of the source point cloud, and the loss function is calculated as:
where J is the number of points in the source point cloud, j is the index of a point in the source point cloud, and x_j denotes the j-th point in the source point cloud.
CN202310917905.4A 2023-07-25 2023-07-25 Point cloud registration method based on position-enhanced attention mechanism Pending CN116912296A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310917905.4A CN116912296A (en) 2023-07-25 2023-07-25 Point cloud registration method based on position-enhanced attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310917905.4A CN116912296A (en) 2023-07-25 2023-07-25 Point cloud registration method based on position-enhanced attention mechanism

Publications (1)

Publication Number Publication Date
CN116912296A true CN116912296A (en) 2023-10-20

Family

ID=88362736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310917905.4A Pending CN116912296A (en) 2023-07-25 2023-07-25 Point cloud registration method based on position-enhanced attention mechanism

Country Status (1)

Country Link
CN (1) CN116912296A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117876447A (en) * 2024-03-13 2024-04-12 南京邮电大学 Three-dimensional point cloud registration method based on micro-surface fusion and alignment
CN117876447B (en) * 2024-03-13 2024-05-07 南京邮电大学 Three-dimensional point cloud registration method based on micro-surface fusion and alignment

Similar Documents

Publication Publication Date Title
He et al. Visual semantics allow for textual reasoning better in scene text recognition
CN112801280A (en) One-dimensional convolution position coding method of visual depth self-adaptive neural network
Tu et al. Efficient monocular depth estimation for edge devices in internet of things
CN111161364A (en) Real-time shape completion and attitude estimation method for single-view depth map
JP2023073231A (en) Method and device for image processing
CN112819080B (en) High-precision universal three-dimensional point cloud identification method
WO2022151586A1 (en) Adversarial registration method and apparatus, computer device and storage medium
CN116912296A (en) Point cloud registration method based on position-enhanced attention mechanism
CN113326851A (en) Image feature extraction method and device, electronic equipment and storage medium
CN115457492A (en) Target detection method and device, computer equipment and storage medium
CN116258757A (en) Monocular image depth estimation method based on multi-scale cross attention
Wu et al. Link-RGBD: Cross-guided feature fusion network for RGBD semantic segmentation
CN117372604B (en) 3D face model generation method, device, equipment and readable storage medium
Xian et al. Fast generation of high-fidelity RGB-D images by deep learning with adaptive convolution
CN113159053A (en) Image recognition method and device and computing equipment
Cao et al. CMAN: Leaning global structure correlation for monocular 3D object detection
CN116168046B (en) 3D point cloud semantic segmentation method, system, medium and device under complex environment
CN117078518A (en) Three-dimensional point cloud superdivision method based on multi-mode iterative fusion
Gao et al. HDRNet: High‐Dimensional Regression Network for Point Cloud Registration
CN115222947B (en) Rock joint segmentation method and device based on global self-attention transformation network
Zhu et al. CED-Net: contextual encoder–decoder network for 3D face reconstruction
CN116385667A (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
Yu et al. Redundant same sequence point cloud registration
CN115841596B (en) Multi-label image classification method and training method and device for model thereof
US20240013342A1 (en) Method, electronic device, and computer program product for processing point cloud

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination