CN111882593B - Point cloud registration model and method combining attention mechanism and three-dimensional graph convolution network - Google Patents


Info

Publication number
CN111882593B
CN111882593B (granted patent of application CN202010717508.9A)
Authority
CN
China
Prior art keywords
point
point cloud
model
features
points
Prior art date
Legal status: Active (as listed by Google; an assumption, not a legal conclusion)
Application number
CN202010717508.9A
Other languages
Chinese (zh)
Other versions
CN111882593A (en
Inventor
张振鑫
孙澜
钟若飞
李小娟
宫辉力
邹建军
Current Assignee: Capital Normal University
Original Assignee: Capital Normal University
Priority date
Filing date
Publication date
Application filed by Capital Normal University
Priority to CN202010717508.9A
Publication of CN111882593A (application)
Application granted
Publication of CN111882593B (grant)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/344 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a point cloud registration model and method combining an attention mechanism with a three-dimensional graph convolution network. The model comprises a Detector model and a Descriptor model: the Detector model extracts per-point attention features and constructs the attention mechanism; the Descriptor model generates a representation of three-dimensional depth features to express the depth features of each point and to learn discriminative depth features of the point cloud. The method first performs model training, in which a loss function constructed from a feature-alignment triplet loss trains the model so that attention features and descriptor features are effectively extracted from the point cloud; after training, point cloud registration is carried out. The method automatically extracts keypoints and the three-dimensional depth features of each keypoint. By combining the multi-layer perceptron MLP with the graph convolution network GCN, a new point cloud feature-extraction module is designed within the three-dimensional graph convolution network, which extracts more discriminative point cloud features and improves the accuracy of point cloud registration.

Description

Point cloud registration model and method combining attention mechanism and three-dimensional graph convolution network
Technical Field
The invention relates to the field of computer vision and geospatial information science, in particular to a point cloud registration model and a point cloud registration method combining an attention mechanism and a three-dimensional graph convolution network.
Background
The three-dimensional point cloud provides rich and dense spatial information about objects and plays an important role in fields such as civil traffic engineering, tunnel engineering, digital cities, and simultaneous localization and mapping. In these applications, point cloud registration is a basic and critical problem: because of positioning-sensor errors or coordinate-system inconsistencies, different phases or views of the spatial data are somewhat misaligned, and the complexity and local similarity of spatial objects make automatic, efficient point cloud registration challenging.
Classical 3D point cloud descriptors include 3D Harris, the 3D scale-invariant feature transform (SIFT), the Normal Aligned Radial Feature (NARF), and the local surface patch feature. In addition, there are descriptors based on spatial statistics, such as the Fast Point Feature Histogram (FPFH) and the Point Feature Histogram (PFH). These traditional point cloud descriptors are mainly designed by hand, and although such three-dimensional features have been modularized in the Point Cloud Library (PCL), their distinctiveness and flexibility are limited.
Compared with manually designed features, features can be constructed automatically and efficiently with an end-to-end deep learning model. However, because of the discreteness and irregularity of point clouds, expressing point cloud characteristics in deep learning remains challenging. In 2017, with the appearance of the voxel-based point cloud deep learning model 3DMatch, the registration of 3D point clouds entered the deep-learning era: a voxel-based deep learning feature descriptor was designed for point cloud registration, and a 3D Siamese convolutional neural network maps voxels through the network to obtain a 512-dimensional depth descriptor. PPFNet achieves point cloud registration by designing a context-aware local feature descriptor; 3DFeat-Net uses the PointNet model to learn a rotation matrix and point cloud depth features, improving the rotation robustness of the PointNet model. The backbone networks of both works (PPFNet and 3DFeat-Net) use the PointNet model.
Some problems remain. Deep learning methods based on 3D voxels take a long time to compute, lose some data precision during voxelization, and are not suitable for efficient registration. Because of hardware-device performance, the voxel size is greatly limited, which further affects the quality of registration results in some large scenes. PointNet and its family of works (e.g., PointNet++) have limited rotation robustness and require many data-augmentation operations (e.g., rotation) during training to achieve good model performance. Finally, because of limits on training-sample quality, the model can become unstable; if the training samples are insufficient or scene differences are large, model performance may degrade during registration.
In point cloud processing, there is also research combining Graph Convolutional Networks (GCN), mainly spectral-domain GCNs, non-spectral-domain GCNs, and the like. In addition, the SuperPoint Graphs network for large-scene classification and over-segmentation deep learning networks for point clouds have improved point cloud classification accuracy. In these methods, the GCN effectively improves classification precision and algorithm robustness. However, when the backbone networks of these methods are used directly for point cloud registration, problems such as high time complexity and long training time arise.
Therefore, a technical problem to be solved by those skilled in the art is to provide a registration network with a simple structure and good performance, so that point cloud registration can be completed in a short time with good results.
Disclosure of Invention
In view of this, an object of the present application is to provide a point cloud registration model and a point cloud registration method that combine an attention mechanism and a three-dimensional graph convolution network, so as to extract more discriminative point cloud features and improve the accuracy of point cloud registration.
In order to achieve the above object, the present application provides the following technical solutions.
A point cloud registration model combining an attention mechanism and a three-dimensional graph convolution network is a three-branch Siamese framework comprising a Detector model and a Descriptor model, wherein the Detector model extracts the attention features of points and constructs the attention mechanism; the Descriptor model generates a representation of three-dimensional depth features to express the depth features of points and to learn discriminative depth features of the point cloud.
Preferably, the Detector model extracts the attention features of the point cloud mainly through a spectral-domain graph convolution network module MLP_GCN; 5 fully connected layers (channels: 64, 64 and 128; filter: 1 × 1) are used in the MLP_GCN module of the Detector model to extract the initial point cloud features, further realizing the point cloud feature-extraction function.
Preferably, the Descriptor model first uses the set abstraction (SA) module for point-set feature extraction from PointNet++ to extract the initial features of the point cloud, and then connects two spectral-domain graph convolution network modules MLP_GCN, thereby increasing the depth and performance of the network and obtaining the final three-dimensional depth features.
Preferably, of the two spectral-domain graph convolution network modules MLP_GCN, the first MLP_GCN connects 3 fully connected layers (nodes: 128-256) and its output feature dimension is n × 128; the second MLP_GCN connects 3 fully connected layers (nodes: 256-512) and its output feature dimension is n × 256.
Preferably, the MLP_GCN combines the multi-layer perceptron MLP and the graph convolution network GCN, can effectively extract depth features from the coordinates of the input point cloud, and improves the rotation invariance and distinctiveness of the features.
Preferably, the MLP_GCN is constructed using the sampling and grouping layer of PointNet++, and 3 fully connected layers (nodes: 64-64-128, filter: 1 × 1) are connected to extract the point cloud feature X (dimension n × 128).
Preferably, the inputs of the MLP_GCN are point sets, each containing n points. Within a point set, the K nearest points of each point are found, and each point is connected to its K nearest points to form edges, establishing a graph G; the adjacency matrix A and the degree matrix D are then constructed and the Laplacian matrix L (dimension n × n) is computed. The convolution-kernel parameter W is set to dimension c_n × m, where c_n is the length of the point cloud feature and m is the output feature dimension of the graph convolution, giving the value L·X·W as output; finally, a max pooling layer is connected to obtain the deep-learning features of the point cloud.
The registration method of the point cloud registration model comprises the following steps: first, model training is carried out, in which a loss function constructed from the feature-alignment triplet loss trains the model and attention features and descriptor features are effectively extracted from the point cloud; after model training, point cloud registration is carried out.
Preferably, the model training comprises the steps of:
automatically constructing matching and non-matching point pairs: a point is selected from one data set and its corresponding point taken from the other data set to form a matching pair; the point before registration is called the anchor and the point after registration the positive. A point that is neither the anchor nor the positive is then selected at random as the negative, forming a non-matching pair with the anchor;
respectively searching K Nearest neighbors of an anchor point, a positive point and a negative point by using a K-Nearest Neighbor (KNN) algorithm to form three point sets which are used as input of a deep learning network;
obtaining the depth characteristics and attention characteristics of anchors, positive and negative through the model;
the above features are fed into the feature-alignment triplet loss function to optimize the features and train the model.
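The triplet-construction step above can be illustrated with a minimal numpy sketch. The function names (`build_triplet`, `knn_set`) and the `corr` correspondence array are hypothetical conveniences, not names from the patent; the sketch only shows how matching/non-matching point sets could be assembled automatically from known correspondences, i.e. the weak supervision described in the text:

```python
import numpy as np

rng = np.random.default_rng(42)

def knn_set(cloud, center, k):
    """The k nearest neighbours of `center` in `cloud` form one input point set."""
    d = np.sum((cloud - center) ** 2, axis=1)
    return cloud[np.argsort(d)[:k]]

def build_triplet(src, dst, corr, k):
    """Build one (anchor, positive, negative) triplet of point sets.

    `corr[i]` gives the index in `dst` of the point matching `src[i]`
    (known from the ground-truth transformation), so no keypoints need
    manual labelling.
    """
    i = int(rng.integers(len(src)))
    anchor, positive = src[i], dst[corr[i]]
    j = int(rng.integers(len(dst)))
    while j == corr[i]:                      # negative must differ from the match
        j = int(rng.integers(len(dst)))
    negative = dst[j]
    # K nearest neighbours of each point form the three input point sets.
    return (knn_set(src, anchor, k),
            knn_set(dst, positive, k),
            knn_set(dst, negative, k))
```

The three returned point sets then serve as the deep-learning network's input, as in the KNN step above.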
Preferably, the point cloud registration comprises the steps of:
uniformly sampling P_m points from each of the two point clouds, and for each sampled point finding its n nearest points, forming P_m point sets per point cloud; the point sets are fed into the model, which generates a depth feature and an attention feature for each point in each point set;
using the L2-norm of each point's attention feature to determine whether the point is a local maximum in its neighbourhood, and if so, adding it to the keypoint queue, whereby the keypoints of both point clouds are obtained;
sorting the obtained attention values of the keypoints and selecting the top P_k points as the final keypoints;
for each final keypoint, finding its n nearest neighbours to obtain the final P_k point sets, and inputting these into the model to obtain the depth feature vector of each point in the P_k point sets;
according to the depth feature of each point in a point set, determining the corresponding depth feature with the closest Euclidean distance in the other point cloud, thereby obtaining 2 × n × P_k point cloud matching point pairs;
removing gross errors in the matching point set by using a RANSAC algorithm;
and calculating a rotation matrix by adopting a least square method to obtain a registration result.
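The last step, computing the rotation by least squares from the RANSAC inliers, has a standard closed-form SVD (Kabsch) solution. The sketch below is a minimal numpy illustration of that solution; the function name `rigid_transform` and the choice of the Kabsch formulation are assumptions of this sketch, not the patent's exact implementation:

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rotation R and translation t aligning matched points
    P -> Q (applied after RANSAC has removed gross errors).

    P, Q: (n, 3) arrays of corresponding inlier points.
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)     # centroids
    H = (P - cp).T @ (Q - cq)                   # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # fix a possible reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cq - R @ cp
    return R, t
```

Applying `R` and `t` to the first point cloud then yields the registration result.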
The beneficial technical effects obtained by the invention are as follows:
1) the invention addresses the shortcomings of existing point cloud registration work by designing a weakly supervised three-dimensional graph convolution network for point cloud registration; the network automatically extracts the three-dimensional depth features of point sets composed of each keypoint and its neighbouring points. Within the three-dimensional graph convolution network, the multi-layer perceptron MLP is combined with the graph convolution network GCN in a newly designed point cloud feature-extraction module (named MLP_GCN), which extracts more discriminative point cloud features and improves the accuracy of point cloud registration;
2) the invention combines PointNet + + and graph convolution network GCN into a new graph deep learning model, which makes full use of the advantages of GCN in point set characteristic expression and high-efficiency calculation;
3) based on the attention features, the method automatically generates keypoints and their depth features using non-maximum suppression (NMS); the whole network training process only requires matching point pairs to be established and does not require keypoints to be labelled. This mechanism is a weakly supervised learning method and can save a large amount of time that would be spent manually labelling matching point pairs;
4) the invention is based on deep learning and has a learning process. Compared with traditional non-deep-learning methods it performs better, especially when outdoor scene point clouds are sparse. By adopting point sets rather than fixed-size voxel cubes, it has greater flexibility in shape and size, reduces information loss and extracts more comprehensive features, so the final registration effect is better;
5) compared with approaches that use point sets but no GCN, the use of the GCN makes the method more robust to spatial rotation and yields a better registration effect. Especially in large scenes, the limited memory of current GPUs means the voxels used in training cannot be made very small, so the final matching effect suffers; this problem is greatly alleviated by the method.
The foregoing description is only an overview of the technical solutions of the present application, so that the technical means of the present application can be more clearly understood and the present application can be implemented according to the content of the description, and in order to make the above and other objects, features and advantages of the present application more clearly understood, the following detailed description is made with reference to the preferred embodiments of the present application and the accompanying drawings.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
Fig. 1 is a schematic structural diagram of a point cloud registration model combining an attention mechanism and a three-dimensional graph convolution network in embodiment 1 of the present disclosure;
FIG. 2 is a schematic structural diagram of the graph convolution network GCN with C_n input channels and m output feature maps in embodiment 1 of the present disclosure;
fig. 3 is a schematic flow chart of point cloud registration in embodiment 2 of the present disclosure;
fig. 4 is a data set disclosed in example 3 of the present disclosure ((a) data set I and (b) data set II);
FIG. 5 is a graph of the test results of the method of example 3 of the present disclosure and the prior methods I-IV in data set I;
FIG. 6 is a graph of the results of the testing of the method of example 3 of the present disclosure and prior art methods I-IV in data set II.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments.
Further, the present application may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion.
Example 1
A point cloud registration model combining an attention mechanism and a three-dimensional graph convolution network is shown in fig. 1. The model is a three-branch Siamese framework comprising a Detector model and a Descriptor model; the Detector model extracts the attention features of points to construct the attention mechanism, and the Descriptor model generates a representation of three-dimensional depth features to express the depth features of points and to learn discriminative depth features of the point cloud.
The Detector model extracts the attention features of the point cloud mainly through the spectral-domain graph convolution network module MLP_GCN. In the Detector's MLP_GCN module, 5 fully connected layers (channels: 64, 64 and 128; filter: 1 × 1) extract the initial point cloud features, realizing the point cloud feature-extraction function h(·). The point cloud feature X is extracted with h(·), and the Laplacian matrix L is solved for the convolution operation (the number of nearest neighbours K is set to 32; L has dimension n × n). The graph is then convolved in the spectral domain using the matrix L and the point cloud feature X to obtain L·X·W. Finally, L·X·W is connected to a max pooling layer, where the W parameter has dimension 256 × 1 and the output feature has dimension n × 1.
The Detector model extracts the attention features of the point cloud mainly through the MLP_GCN module and can express the importance of each point feature in the point cloud. Through this model, the feature expression of the point cloud is enhanced, negative influences are weakened, and the distinctiveness of each region is reflected. During registration, keypoints are detected automatically by searching for local maxima, realizing the attention feature expression of the point cloud. Meanwhile, a weakly supervised learning method is designed so that point cloud registration is completed effectively without labelling matching point pairs.
The Descriptor model extracts the depth features of each point in the point cloud with a convolutional network. First, the set abstraction (SA) module for point-set feature extraction from PointNet++ extracts the initial features of the point cloud; then two spectral-domain graph convolution network modules MLP_GCN are connected, increasing the depth and performance of the network and yielding the final three-dimensional depth features.
The SA module is implemented as follows: three fully connected layers (nodes: 64-64-128) are connected first, then a max pooling layer; next, two fully connected layers (nodes: 128-256) are designed, and finally another max pooling layer is added. The SA module generates features of dimension n × 256.
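The layer sequence above can be sketched in numpy. This is a simplified forward pass only: the shared fully connected layers act like 1 × 1 convolutions over the points, and the pooled group feature is broadcast back to every point so the output keeps one 256-dimensional feature per point. The broadcasting step and the function names (`fc`, `sa_module`) are assumptions of this sketch, not the patent's exact SA implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(X, w, b):
    """Shared fully connected layer (a 1x1 convolution over points) + ReLU."""
    return np.maximum(X @ w + b, 0.0)

def sa_module(points, params):
    """Simplified SA forward pass with the widths given in the text:
    FC 64-64-128, max pool, FC 128-256, max pool."""
    X = points                                            # (n, 3)
    for w, b in params[:3]:                               # nodes: 64-64-128
        X = fc(X, w, b)
    X = np.broadcast_to(X.max(axis=0), X.shape).copy()    # max pooling layer
    for w, b in params[3:]:                               # nodes: 128-256
        X = fc(X, w, b)
    X = np.broadcast_to(X.max(axis=0), X.shape).copy()    # final max pooling
    return X                                              # (n, 256)

# Randomly initialised weights for the widths 3-64-64-128-128-256.
widths = [3, 64, 64, 128, 128, 256]
params = [(rng.standard_normal((a, b)) * 0.1, np.zeros(b))
          for a, b in zip(widths[:-1], widths[1:])]
feat = sa_module(rng.standard_normal((32, 3)), params)    # n = 32 points
```

With n = 32 input points the sketch produces a 32 × 256 feature array, matching the n × 256 output dimension stated in the text.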
Of the two spectral-domain graph convolution network modules MLP_GCN: the first MLP_GCN connects 3 fully connected layers (nodes: 128-256); the number of nearest neighbours K is 32, the Laplacian matrix L has dimension n × n, the graph convolution weight parameter W has dimension 256 × 128, and the output feature X has dimension n × 128. The second MLP_GCN connects 3 fully connected layers (nodes: 256-512); K is 32, the Laplacian matrix L has dimension n × n, the graph convolution weight parameter W has dimension 512 × 256, and the output feature dimension is n × 256.
The MLP_GCN combines the multi-layer perceptron MLP and the graph convolution network GCN, can effectively extract depth features from the coordinates of the input point cloud, and improves the rotation invariance and distinctiveness of the features. The MLP_GCN adopts a spectrum-based graph convolution method to extract the depth features of the point cloud, effectively enhancing the importance of the point cloud features.
The graph convolution network GCN uses a point cloud composed of irregular points to construct a graph structure, and then uses the graph structure to gather features of neighboring points, as shown in fig. 2. After the graph is constructed, a deep learning network is designed to process the graph structure of the point cloud, and the feature extraction of the point cloud data is realized.
The method comprises the following specific steps:
Constructing the graph structure of the point cloud: a K-Nearest Neighbor (KNN) algorithm is used to generate point sets from the point cloud, each with n points. An undirected graph G = (V, E) is defined, where V = {v_1, v_2, v_3, ..., v_n} is the set of points in a point set and the edge set E represents the adjacency between points. The K nearest neighbours (K < n) of each vertex in V are found and connected to it, establishing that vertex's edges; this completes the construction of all edges for every vertex in V. The matrix A ∈ R^(n×n) is defined as the adjacency matrix of the graph: when two vertices are connected by an edge, A(i, j) = 1; otherwise A(i, j) = 0, where i is the row index and j the column index of the matrix. The matrix D ∈ R^(n×n) is defined as the degree matrix: when i = j,

D(i, i) = Σ_j A(i, j),

and when i ≠ j, D(i, j) = 0. From the adjacency matrix and the degree matrix, the Laplacian matrix is obtained as

L = D^(-1/2) (D − A) D^(-1/2).
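The graph construction just described can be sketched in a few lines of numpy. The sketch assumes the symmetric normalized Laplacian L = D^(-1/2)(D − A)D^(-1/2), a common choice for spectral GCNs; the patent's exact normalization may differ, and the function name `knn_graph_laplacian` is illustrative:

```python
import numpy as np

def knn_graph_laplacian(points, k):
    """Build a KNN graph over a point set and return its normalized Laplacian.

    points: (n, 3) array of 3D coordinates; k: number of nearest neighbours.
    Returns L = D^(-1/2) (D - A) D^(-1/2), an (n, n) matrix.
    """
    n = points.shape[0]
    # Pairwise squared Euclidean distances.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)            # exclude self from neighbours
    nbrs = np.argsort(dist, axis=1)[:, :k]    # indices of the k nearest points

    A = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    A[rows, nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)                    # symmetrise: edges are undirected

    deg = A.sum(axis=1)                       # D(i, i) = sum_j A(i, j)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    D = np.diag(deg)
    L = np.diag(d_inv_sqrt) @ (D - A) @ np.diag(d_inv_sqrt)
    return L
```

For any point with at least one neighbour, the diagonal of this normalized Laplacian is 1, which gives a quick sanity check on the construction.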
Constructing the graph convolution model, with the graph convolution formula:

Output = ReLU(L X W),  (1)

where the feature extracted from each point set is X ∈ R^(n×c_n), with c_n the length of the generated point cloud feature; ReLU(·) is the rectified linear unit activation function; X is obtained from the initial point set after the MLP (multi-layer perceptron); the graph convolution learns a weight matrix W ∈ R^(c_n×m), where m is the output feature dimension of the graph convolution; and Output ∈ R^(n×m) is the output feature after the graph convolution.
A symmetric function f(x_1, x_2, …, x_n) = G(h(x_1), h(x_2), …, h(x_n)) = L X W is designed in the MLP_GCN, where h(·) is the point cloud feature-extraction function implemented by the multiple fully connected layers of the MLP. The MLP uses multiple fully connected layers, i.e., multiple 2D convolutional layers (with a 1 × 1 convolution kernel). In this way, the point cloud features are extracted first to obtain each point's feature. The G(·) function is then implemented by graph convolution using equation (1). The convolution operation aggregates effective point cloud features and enhances the rotation invariance and distinctiveness of the features.
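Equation (1) and the max pooling tail of the MLP_GCN can be sketched in a few lines of numpy; the function names are illustrative, not the patent's own:

```python
import numpy as np

def graph_conv(L, X, W):
    """One spectral graph-convolution step, equation (1): Output = ReLU(L X W).

    L: (n, n) Laplacian, X: (n, c) point features, W: (c, m) learnable weights.
    """
    return np.maximum(L @ X @ W, 0.0)          # ReLU activation

def mlp_gcn_pool(L, X, W):
    """MLP_GCN tail: graph convolution followed by max pooling over the points,
    giving one m-dimensional feature vector for the whole point set."""
    return graph_conv(L, X, W).max(axis=0)
```

The shapes follow the text directly: an n × c feature matrix times a c × m weight gives an n × m output, and pooling collapses it to a single m-vector per point set.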
The MLP_GCN is constructed using the sampling and grouping layer of PointNet++, and three fully connected layers (nodes: 64-64-128, filter: 1 × 1) are connected to extract the point cloud feature X (dimension n × 128).
The inputs of the MLP_GCN are point sets, each containing n points. To convolve the graph in the spectral domain, the K nearest points of each point are found within a point set and each point is connected to its K nearest points to form edges, establishing a graph G; the adjacency matrix A and the degree matrix D are constructed, and the Laplacian matrix L (dimension n × n) is computed. As can be seen from equation (1), the convolution-kernel parameter W is set to dimension c_n × m, where c_n is the point cloud feature length and m is the output feature dimension of the graph convolution, giving the value L·X·W as output. Finally, a max pooling layer is connected to obtain the deep-learning features of the point cloud.
Example 2
Based on embodiment 1, the registration method of the point cloud registration model comprises the following steps: first, model training is carried out, in which a loss function constructed from the feature-alignment triplet loss trains the model and attention features and descriptor features are effectively extracted from the point cloud; after model training, point cloud registration is carried out.
During model learning, the Detector model generates the attention vector of the anchor point, a_anc = (a_1, a_2, …, a_n). The Descriptor model generates the anchor depth feature f_anc = (f_anc_1, f_anc_2, f_anc_3, …, f_anc_n), the positive depth feature f_pos = (f_pos_1, f_pos_2, f_pos_3, …, f_pos_n) and the negative depth feature f_neg = (f_neg_1, f_neg_2, f_neg_3, …, f_neg_n). These four feature vectors are combined by the feature-alignment triplet loss to construct the objective function.
The construction of the feature alignment triple Loss is based on the triple Loss function, and the triple Loss function formula is as follows:
Figure BDA0002598761840000081
in the formula (I), the compound is shown in the specification,
Figure BDA0002598761840000082
Figure BDA0002598761840000083
in the formula (f)ancIs the depth characteristic of the anchor point, fposIs a depth feature of a positive point, fnegIs a depth feature of a negative point, aancIs the attention vector of achor point, a'iFor the ratio of each feature value to the sum of all features in the attention vector,
a'_i = a_i / (a_1 + a_2 + … + a_n)
the input to the triple Loss function includes the anchor point feature (f)anc) Positive point feature (f)pos) And negative point feature (f)neg). By optimizing the distance between the anchor point and the positive point and the distance between the anchor point and the negative point, the distance (D) between the anchor point and the positive point can be shortenedanc,pos) Increasing the distance (D) between anchor point and negative pointanc,neg). The margin is a positive and negative interval parameter. In training the triple Loss function, 2 pairs of corresponding point sets, namely, pairs of matching point sets, are required: anchor point Panc=(p1,p2,…,pn) And positive point Ppos=(p1,p2,…,pn) And a set of unmatched point set pairs: anchor point (P)anc) And negative point Pneg=(p1,p2,…,pn)。
In formula (2), the anchor, positive, and negative points optimize their respective features through Euclidean distances. When the point clouds P_anc and P_pos are taken as input (P_pos is obtained from P_anc by rotation and translation), the depth features are f_anc = (f_anc_1, f_anc_2, …, f_anc_n) and f_pos = (f_pos_1, f_pos_2, …, f_pos_n). The optimization of the triplet loss is realized by calculating Euclidean distances, emphasizing the distance computation so as to obtain a more effective optimization model.
Based on formula (2), feature alignment is innovatively introduced into the optimization of the model, and the feature alignment triplet loss function is redesigned, in which equations (3) and (4) replace D_anc,pos and D_anc,neg of equation (2), so as to calculate the distances between all descriptor pairs among the three point clouds P_anc, P_pos, and P_neg. The feature alignment triplet loss searches, for each point in P_anc, the point in P_pos whose feature is closest. This excludes irrelevant feature information, and the points in the point sets are semantically aligned. Equation (4) normalizes each attention value against the sum of all attention values.
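The alignment idea above — weight each anchor descriptor by its normalized attention and match it to the closest descriptor in the other point set — can be sketched as follows. This is an interpretation of the prose, not the patent's exact formulas (3) and (4), which are not reproduced in the text; all function names are illustrative.

```python
import math

def normalize_attention(a):
    """Each attention value as a share of the attention sum (the normalisation
    described for equation (4))."""
    s = sum(a)
    return [v / s for v in a]

def aligned_distance(f_anc, f_other, attention):
    """Feature-aligned distance: every anchor descriptor is matched to its
    closest descriptor in the other set, weighted by normalised attention."""
    weights = normalize_attention(attention)
    return sum(
        w * min(math.dist(fi, fj) for fj in f_other)
        for w, fi in zip(weights, f_anc)
    )

def feature_alignment_triplet_loss(f_anc, f_pos, f_neg, attention, margin=1.0):
    """Triplet loss of equation (2) with the aligned distances substituted in."""
    d_pos = aligned_distance(f_anc, f_pos, attention)
    d_neg = aligned_distance(f_anc, f_neg, attention)
    return max(d_pos - d_neg + margin, 0.0)
```

Because each anchor descriptor picks its own nearest counterpart, the loss no longer assumes the i-th descriptor in both sets corresponds — which is the point of the alignment.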
The model training comprises the following steps:
In the first step, matching and non-matching point pairs are constructed automatically: one point is selected from one data set, and the corresponding point is taken from the other data set, obtained by rotating and translating the first, to form a matching point pair; the point before registration is called the anchor, and the point after registration the positive. A point that is neither the anchor nor the positive is then randomly selected as the negative, forming a non-matching point pair with the anchor point.
In the second step, the K-Nearest Neighbor (KNN) algorithm is used to search the K nearest neighbors of the anchor, positive, and negative points respectively, forming three point sets that serve as input to the deep learning network.
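The first two training steps can be sketched together: a brute-force KNN query plus the construction of the anchor/positive/negative point sets. This is a minimal sketch with illustrative names; the positive is assumed to share the anchor's index in the transformed cloud, as the matching-pair construction above implies.

```python
import math

def knn_point_set(cloud, center, k):
    """The k nearest neighbours of `center` within `cloud` (brute-force KNN)."""
    return sorted(cloud, key=lambda p: math.dist(p, center))[:k]

def build_training_sets(cloud_a, cloud_b, anc_idx, neg_idx, k):
    """One triplet of point sets: the positive is the corresponding point in the
    rotated/translated cloud, the negative a different randomly chosen point."""
    anchor = knn_point_set(cloud_a, cloud_a[anc_idx], k)
    positive = knn_point_set(cloud_b, cloud_b[anc_idx], k)  # same index = match
    negative = knn_point_set(cloud_a, cloud_a[neg_idx], k)
    return anchor, positive, negative
```

In practice a spatial index (e.g. a k-d tree) would replace the brute-force sort, but the three resulting point sets feed the network identically.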
In the third step, the depth features and attention features of the anchor, positive, and negative are obtained through the model.
The network model contains a Detector model, which extracts the attention features of the point cloud mainly through an MLP_GCN module. The input of the MLP_GCN module is a point set of n points. The MLP_GCN model is first constructed using the PointNet++ sampling and grouping layers, and three fully connected layers (nodes: 64-64-128, filter: 1×1) are connected to extract the point cloud features X of dimension n×128. To convolve the graph in the spectral domain, the graph G is built by searching for the nearest K points of each point within a point set and connecting each point with its K nearest neighbors to form edges. Next, an adjacency matrix A and a degree matrix D are constructed, and the Laplace matrix L of dimension n×n is calculated. As can be seen from equation (1), the parameter W of the convolution kernel is set to c_n × m, where c_n is the feature length of the point cloud and m is the output feature dimension of the graph convolution, so that the value L·X·W is obtained as output. Finally, a max pooling layer is connected to obtain the deep learning features of the point cloud.
Output=ReLU(LXW) (5)
in the formula, ReLU(x) = max(0, x); L is the Laplace matrix, X the input point features, and W the graph convolution weight.
In the model, a set abstraction (SA) module for point-set feature extraction from PointNet++ is used to extract the initial features of the point cloud, and two spectral-domain graph convolution network modules (MLP_GCN) are then connected, which improves the depth and performance of the network and yields the final three-dimensional depth features. The SA module is implemented as follows: three fully connected layers (nodes: 64-64-128) are connected first, then one max pooling layer; two further fully connected layers (nodes: 128-256) follow, and finally one more max pooling layer is added. The SA module generates features of dimension n×256. The first MLP_GCN module connects three fully connected layers (nodes: 128-256), with K = 32 nearest neighbor points; the Laplace matrix L has dimension n×n, the graph convolution weight parameter W has dimension 256×128, and the output features X have dimension n×128. The parameters of the second MLP_GCN module include three fully connected layers (nodes: 256-512), K = 32 nearest neighbors, a Laplace matrix L of dimension n×n, a graph convolution weight parameter W of dimension 512×256, and an output feature dimension of n×256.
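The layer dimensions above can be sanity-checked with a shape-only trace of the Descriptor branch. This sketch propagates tensor shapes and nothing else; the exact intermediate fully-connected widths inside the two MLP_GCN modules are partly ambiguous in the text, so the path below is one consistent reading, not a definitive reconstruction.

```python
def fc(shape, nodes):
    """A fully connected layer applied per point: (n, c) -> (n, nodes)."""
    n, _ = shape
    return (n, nodes)

def max_pool(shape):
    """The pooling here aggregates within each local group, keeping n points."""
    return shape

def gcn(shape, w_in, w_out):
    """Graph convolution L . X . W with W of shape (w_in, w_out)."""
    n, c = shape
    assert c == w_in, "graph-conv weight must match incoming feature length"
    return (n, w_out)

def descriptor_shapes(n):
    s = (n, 3)                       # raw xyz coordinates
    for nodes in (64, 64, 128):      # SA: three FC layers ...
        s = fc(s, nodes)
    s = max_pool(s)
    for nodes in (128, 256):         # ... then two more FC layers
        s = fc(s, nodes)
    s = max_pool(s)                  # SA output: n x 256
    s = gcn(s, 256, 128)             # first MLP_GCN: W is 256 x 128 -> n x 128
    s = fc(s, 512)                   # second MLP_GCN FC stack ends at 512
    s = gcn(s, 512, 256)             # W is 512 x 256 -> n x 256
    return s
```

Running `descriptor_shapes(n)` confirms the final feature dimension n×256 stated for the second MLP_GCN module.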
In the fourth step, the above features are fed into the feature alignment triplet loss function to optimize the features and train the model.
The feature alignment triplet loss is constructed on the basis of the triplet loss function, whose input includes the anchor feature f_anc, the positive feature f_pos, and the negative feature f_neg. By optimizing the distance between the anchor and the positive point and the distance between the anchor and the negative point, the distance D_anc,pos between anchor and positive can be shortened while the distance D_anc,neg between anchor and negative is increased; the margin is the interval parameter between the anchor and the negative point. Training the triplet loss function requires a pair of matching point sets, namely the anchor points P_anc = (p_1, p_2, …, p_n) and the positive points P_pos = (p_1, p_2, …, p_n), and a pair of unmatched point sets: the anchor points P_anc and the negative points P_neg = (p_1, p_2, …, p_n). The anchor, positive, and negative points optimize their respective features through Euclidean distances. When the point clouds P_anc and P_pos are taken as input (P_pos is obtained from P_anc by rotation and translation), the depth features are f_anc = (f_anc_1, …, f_anc_n) and f_pos = (f_pos_1, …, f_pos_n). The optimization of the triplet loss is realized by calculating Euclidean distances, emphasizing the distance computation so as to obtain a more effective optimization model.
As shown in fig. 3, the point cloud registration includes the following steps:
In the first step, P_m points are uniformly sampled from each of the two point clouds, and the n nearest points of each of the P_m points are searched to form the P_m point sets of each point cloud; the point sets are fed into the model, which generates a depth feature for each point set and an attention feature for each point.
In the second step, the attention feature of each point is evaluated by its L2-norm value to determine whether the point is a maximum within its neighborhood; if so, it is added to a keypoint queue, from which the keypoints of both point clouds are derived.
In the third step, the attention values of the obtained keypoints are sorted, and the top P_k points are selected as keypoints.
In the fourth step, the n nearest neighbors of each final keypoint are searched to obtain the final P_k point sets, which are input into the model to obtain the depth feature vector of each point in the P_k point sets.
In the fifth step, for the depth feature of each point in a point set, the corresponding depth feature with the closest Euclidean distance in the other point cloud is determined, yielding 2×n×P_k point cloud matching point pairs.
In the sixth step, gross errors in the matching point set are removed using the RANSAC algorithm.
In the seventh step, the rotation matrix is calculated by the least squares method to obtain the registration result.
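The keypoint-selection steps (two and three) and the feature-matching step (five) above can be sketched as follows. This is a minimal sketch with illustrative names; the precomputed `neighbor_idx` list stands in for the neighborhood search, which in the actual method comes from the KNN grouping.

```python
import math

def select_keypoints(attention, neighbor_idx, p_k):
    """Keep points whose attention is a local maximum among their neighbours,
    then return the indices of the p_k highest-attention ones."""
    local_max = [
        i for i, a in enumerate(attention)
        if all(a >= attention[j] for j in neighbor_idx[i])
    ]
    local_max.sort(key=lambda i: attention[i], reverse=True)
    return local_max[:p_k]

def match_by_features(feats_a, feats_b):
    """Pair each descriptor in A with its Euclidean-nearest descriptor in B."""
    return [
        (i, min(range(len(feats_b)), key=lambda j: math.dist(fa, feats_b[j])))
        for i, fa in enumerate(feats_a)
    ]
```

The resulting index pairs would then be filtered with RANSAC and fed to the least-squares rotation estimate, as in steps six and seven.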
Example 3
Based on embodiment 2 above, two public data sets are used in a particular environment, but this does not mean that the invention can only be carried out in this environment or with these data sets. This example is intended to compare the present embodiment with four existing point cloud registration methods.
The test environment was as follows: intel Xeon E5-2620 v4 CPU, Nvidia Quadro TITAN Xp GPU, 12-GB RAM. The training process was performed with the Tensorflow framework running on Ubuntu 16.04.
The model was tested using two public data sets ((a) data set I and (b) data set II), as shown in fig. 4, from the ASL dataset repository. A Hokuyo UTM-30LX laser sensor was used for data acquisition, and a total station was used to measure absolute coordinates, guaranteeing the accuracy of the registration data.
Data set I is a public indoor-scene lidar point cloud data set containing 45 point clouds, covering complex indoor environments such as irregularly placed and shaped tables and chairs; each point cloud averages 365,000 points, and the scene size is 17 m × 10 m × 3 m. From 30 selected point cloud scenes of data set I, 40,000 matching point pairs were randomly acquired as training data, and the remaining 15 point cloud scenes were used for verification experiments.
Data set II consists of outdoor laser scanning data collected in a park, including grassland, lanes, and sparse trees; the main structure is a terrace composed of rock walls and a vine-covered wooden ceiling. The scene size of data set II is 72 m × 70 m × 19 m, and each scene averages 153,000 points. Data set II contains 32 point clouds; this example uses 15,000 matching point pairs randomly collected from 15 scenes as training data, with the remaining scenes as test data. Both data sets I and II provide registered ground-truth data for verifying the registration results.
To verify the performance of the example method, it was compared with 4 existing methods. The differences between the methods are shown in Table 1.
Table 1 shows a comparison of the different processes
[Table 1 (image): comparison of the different methods]
The first method is Super4PCS (method I), a point cloud registration approach using the affine invariance of four coplanar points. The second is Fast Global Registration (method II), which uses a global optimization method to compute the rotation matrix. The third is a voxelized deep learning network (method III), which extracts depth features with a 3D CNN, adopting a network structure similar to AlexNet to extract local 3D depth features. The fourth is 3DFeat-Net (method IV), which also adopts a point-set approach and, on top of the SA module of PointNet++, adds a structure similar to the T-Net of PointNet to address the poor rotation robustness of the PointNet network.
Data set I
Methods I-IV and the example methods were tested using data set I and the results are shown in Table 2:
TABLE 2
[Table 2 (image): results of methods I–IV and the example method on data set I]
As can be seen from Table 2, the RMSE value of the proposed method is much lower than those of the comparison methods, and a robust overlap-ratio value is also obtained.
As shown in fig. 5, the method proposed in this embodiment achieves more robust and lower-error results than the other methods. Methods I and II are global registration methods with no learning process, which limits their performance. Method III also employs deep learning, but its feature extraction is voxel-based; the inflexibility of voxel-size settings and the information loss of voxelization easily cause registration deviations. Method IV likewise adopts a point-set approach, but since it does not use a GCN network it must learn a rotation matrix, making it insensitive to rotation transformations and prone to matching errors. The method of this embodiment registers the point clouds better and obtains better results than the other methods; its performance on matching pairs is superior because the GCN network adapts better to rotation transformations of the point cloud.
Data set II
Methods I-IV and the example methods were tested using data set II and the results are shown in Table 3:
TABLE 3
[Table 3 (image): results of methods I–IV and the example method on data set II]
As can be seen from Table 3, the method of this embodiment solves the rotation-robustness problem within the network structure and performs training and registration in a point-set-based manner, so the registration problem in large scenes is better solved and a registration effect superior to that of methods I–IV is obtained.
As shown in fig. 6, the outdoor-scene point clouds are sparse, so methods I and II find too few matching points to complete the registration. Because of voxel limitations, in large scenes method III is constrained by GPU memory and cannot be trained with smaller voxels, resulting in a poor matching effect. The matching effect of method IV is visible and is consistent with the results reported in its paper.
The tests on data sets I and II show that, compared with the traditional Super4PCS, Fast Global Registration, the voxel-based point cloud deep learning registration model, and 3DFeat-Net, the point cloud registration model combining an attention mechanism and a three-dimensional graph convolution network proposed by the invention has the best algorithm structure both indoors and outdoors. Improving the symmetric function and adopting graph convolution raises the rotation robustness of the point cloud features and effectively improves the point cloud registration results. The adaptability and robustness of the proposed algorithm in each scene are thereby verified.
The above description is only a preferred embodiment of the present invention and is not intended to limit its scope; various modifications, changes, substitutions, integrations, and parameter changes may be made by those skilled in the art without departing from the principle and spirit of the invention, and such variations fall within its scope.

Claims (6)

1. A registration method for a point cloud registration model combining an attention mechanism and a three-dimensional graph convolution network, characterized by comprising the following steps: first, model training is carried out, using a feature alignment triplet loss to construct the loss function for training the model, so that attention features and descriptor features are effectively extracted from the point cloud; after model training, point cloud registration is carried out;
the model is a three-branch Siamese (Siamese) framework and comprises a Detector model and a Descriptor model, wherein the Detector model is used for extracting the attention characteristics of points and constructing an attention mechanism; the Descriptor model is used for generating an expression of three-dimensional depth features to express the three-dimensional depth features of the points and learning and distinguishing the depth features of the point cloud;
the Detector model mainly extracts the attention features of the point cloud through a spectral-domain-based graph convolution network module MLP_GCN, and 5 fully connected layers are used in the MLP_GCN module of the Detector model to extract the initial point cloud features, thereby realizing the extraction of point cloud features;
the Descriptor model first uses a set abstraction (SA) module for point-set feature extraction from PointNet++ to extract the initial features of the point cloud, and then connects two spectral-domain graph convolution network modules MLP_GCN, thereby improving the depth and performance of the network and obtaining the final three-dimensional depth features;
the point cloud registration comprises the steps of:
uniformly sampling P_m points from each of the two point clouds, and searching the n nearest points of each of the P_m points to form the P_m point sets of each point cloud; feeding the point sets into the model, generating a depth feature for each point set and an attention feature for each point;
evaluating the attention feature of each point by its L2-norm value to determine whether the point is a maximum within its neighborhood; if so, adding it to a keypoint queue, from which the keypoints of both point clouds can be derived;
sorting the obtained attention values of the keypoints, and selecting the top P_k points as keypoints;
searching the n nearest neighbors of each final keypoint to obtain the final P_k point sets, and inputting the final point sets into the model to obtain the depth feature vector of each point in the P_k point sets;
according to the depth feature of each point in a point set, determining the corresponding depth feature with the closest Euclidean distance in the other point cloud, thereby obtaining 2×n×P_k point cloud matching point pairs;
removing gross errors in the matching point set by using a RANSAC algorithm;
and calculating a rotation matrix by adopting a least square method to obtain a registration result.
2. The method of claim 1, wherein, of the two spectral-domain graph convolution network modules MLP_GCN, the first MLP_GCN connects 3 fully connected layers with an output feature dimension of n×128, and the second MLP_GCN connects 3 fully connected layers with an output feature dimension of n×256.
3. The method as claimed in claim 2, wherein the MLP_GCN combines a multi-layer perceptron MLP and a graph convolution network GCN, so as to effectively extract depth features from the coordinates of the input point cloud and improve the rotation invariance and discriminability of the features.
4. The method of claim 3, wherein the MLP_GCN is constructed using the PointNet++ sampling and grouping layers, with 3 fully connected layers connected to extract the point cloud features X of dimension n×128.
5. The method of claim 4, wherein the input of the MLP_GCN is point sets, the number of points in each point set being n; the nearest K points of each point are searched within a point set, and each point is connected with its K nearest points to form edges, building a graph G; an adjacency matrix A and a degree matrix D are constructed, and the Laplace matrix L of dimension n×n is calculated; the parameter W of the convolution kernel is set to c_n × m, where c_n is the feature length of the point cloud and m is the output feature dimension of the graph convolution, yielding the value L·X·W as output; finally, a max pooling layer is connected to obtain the deep learning features of the point cloud.
6. The method for registering a point cloud registration model combining an attention mechanism and a three-dimensional graph convolution network according to claim 1, wherein the model training comprises the following steps:
automatically constructing matching and non-matching point pairs: selecting one point from one data set and taking the corresponding point from the other data set to form a matching point pair, the point before registration being called the anchor and the point after registration the positive; then randomly selecting a point that is neither the anchor nor the positive as the negative, forming a non-matching point pair with the anchor point;
searching the K nearest neighbors of the anchor, positive, and negative points respectively with the K-nearest neighbor (KNN) algorithm to form three point sets as input to the deep learning network;
obtaining the depth features and attention features of the anchor, positive, and negative through the model;
adding the above features to the feature alignment triplet loss function to optimize the features and train the model.
CN202010717508.9A 2020-07-23 2020-07-23 Point cloud registration model and method combining attention mechanism and three-dimensional graph convolution network Active CN111882593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010717508.9A CN111882593B (en) 2020-07-23 2020-07-23 Point cloud registration model and method combining attention mechanism and three-dimensional graph convolution network

Publications (2)

Publication Number Publication Date
CN111882593A CN111882593A (en) 2020-11-03
CN111882593B true CN111882593B (en) 2022-06-17

Family

ID=73154775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010717508.9A Active CN111882593B (en) 2020-07-23 2020-07-23 Point cloud registration model and method combining attention mechanism and three-dimensional graph convolution network

Country Status (1)

Country Link
CN (1) CN111882593B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581515B (en) * 2020-11-13 2022-12-13 上海交通大学 Outdoor scene point cloud registration method based on graph neural network
CN112597796A (en) * 2020-11-18 2021-04-02 中国石油大学(华东) Robust point cloud representation learning method based on graph convolution
CN112801268B (en) * 2020-12-30 2022-09-13 上海大学 Positioning method based on graph convolution and multilayer perceptron hybrid network
CN112819080B (en) * 2021-02-05 2022-09-02 四川大学 High-precision universal three-dimensional point cloud identification method
CN112926452B (en) * 2021-02-25 2022-06-14 东北林业大学 Hyperspectral classification method and system based on GCN and GRU enhanced U-Net characteristics
CN113011501B (en) * 2021-03-22 2022-05-24 广东海启星海洋科技有限公司 Method and device for predicting typhoon water level based on graph convolution neural network
CN112991407B (en) * 2021-04-02 2022-06-28 浙江大学计算机创新技术研究院 Point cloud registration method based on non-local operation
CN112862730B (en) * 2021-04-26 2021-07-27 深圳大学 Point cloud feature enhancement method and device, computer equipment and storage medium
CN113139996B (en) * 2021-05-06 2024-02-06 南京大学 Point cloud registration method and system based on three-dimensional point cloud geometric feature learning
CN113223062B (en) * 2021-06-04 2024-05-07 武汉工控仪器仪表有限公司 Point cloud registration method based on corner feature point selection and quick description
CN113658236B (en) * 2021-08-11 2023-10-24 浙江大学计算机创新技术研究院 Incomplete point cloud registration method based on graph attention mechanism
CN113807366B (en) * 2021-09-16 2023-08-08 电子科技大学 Point cloud key point extraction method based on deep learning
CN114037743B (en) * 2021-10-26 2024-01-26 西北大学 Three-dimensional point cloud robust registration method for Qin warriors based on dynamic graph attention mechanism
CN113971690B (en) * 2021-10-28 2024-04-16 燕山大学 End-to-end three-dimensional point cloud registration method based on deep learning
CN114092650B (en) * 2021-11-30 2024-05-28 燕山大学 Three-dimensional point cloud generation method based on efficient graph convolution
CN114004871B (en) * 2022-01-04 2022-04-15 山东大学 Point cloud registration method and system based on point cloud completion
CN114973422A (en) * 2022-07-19 2022-08-30 南京应用数学中心 Gait recognition method based on three-dimensional human body modeling point cloud feature coding
CN115375902B (en) * 2022-10-26 2023-03-24 昆明理工大学 Multi-spectral laser radar point cloud data-based over-point segmentation method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876831A (en) * 2018-06-08 2018-11-23 西北工业大学 A kind of building three-dimensional point cloud method for registering based on deep learning
CN109064502A (en) * 2018-07-11 2018-12-21 西北工业大学 The multi-source image method for registering combined based on deep learning and artificial design features
CN110910433A (en) * 2019-10-29 2020-03-24 太原师范学院 Point cloud matching method based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Deep Closest Point: Learning Representations for Point Cloud Registration;Yue Wang等;《2019 IEEE/CVF International Conference on Computer Vision (ICCV)》;20200227;3523-3532 *
Deep Learning for 3D Point Clouds: A Survey;Yulan Guo等;《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》;20200623;1-27 *
基于深度学习的点云匹配;梁振斌等;《计算机工程与设计》;20200615(第06期);197-201 *

Also Published As

Publication number Publication date
CN111882593A (en) 2020-11-03

Similar Documents

Publication Publication Date Title
CN111882593B (en) Point cloud registration model and method combining attention mechanism and three-dimensional graph convolution network
Li et al. Hierarchical line matching based on line–junction–line structure descriptor and local homography estimation
CN109345574B (en) Laser radar three-dimensional mapping method based on semantic point cloud registration
Schindler et al. Detecting and matching repeated patterns for automatic geo-tagging in urban environments
Ding et al. Automatic registration of aerial imagery with untextured 3d lidar models
Karantzalos et al. Large-scale building reconstruction through information fusion and 3-d priors
CN108320323B (en) Building three-dimensional modeling method and device
CN105139379B (en) Based on the progressive extracting method of classified and layered airborne Lidar points cloud building top surface
Ni et al. HyperSfM
Wu et al. Automatic 3D reconstruction of electrical substation scene from LiDAR point cloud
CN104820718A (en) Image classification and searching method based on geographic position characteristics and overall situation vision characteristics
CN102208033B (en) Data clustering-based robust scale invariant feature transform (SIFT) feature matching method
Wei et al. Automatic coarse registration of point clouds using plane contour shape descriptor and topological graph voting
CN110111375A (en) A kind of Image Matching elimination of rough difference method and device under Delaunay triangulation network constraint
Palmer et al. Using focus of attention with the Hough transform for accurate line parameter estimation
Wang et al. A method for detecting windows from mobile LiDAR data
CN111709317A (en) Pedestrian re-identification method based on multi-scale features under saliency model
Guo et al. Line-based 3d building abstraction and polygonal surface reconstruction from images
Milde et al. Building reconstruction using a structural description based on a formal grammar
CN117053779A (en) Tightly coupled laser SLAM method and device based on redundant key frame removal
CN116129118A (en) Urban scene laser LiDAR point cloud semantic segmentation method based on graph convolution
Meixner et al. Interpretation of 2D and 3D building details on facades and roofs
CN115063615A (en) Repeated texture image matching method based on Delaunay triangulation
Harshit et al. Geometric Features Interpretation of Photogrammetric Point Cloud from Unmanned Aerial Vehicle
Liu et al. Segmentation and reconstruction of buildings with aerial oblique photography point clouds

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant