CN111882593B - Point cloud registration model and method combining attention mechanism and three-dimensional graph convolution network - Google Patents


Info

Publication number
CN111882593B
CN111882593B (granted patent of application CN202010717508.9A)
Authority
CN
China
Prior art keywords
point
point cloud
model
features
points
Prior art date
Legal status: Active (as listed by Google; an assumption, not a legal conclusion)
Application number
CN202010717508.9A
Other languages
Chinese (zh)
Other versions
CN111882593A (en
Inventor
张振鑫
孙澜
钟若飞
李小娟
宫辉力
邹建军
Current Assignee: Capital Normal University
Original Assignee: Capital Normal University
Priority date
Filing date
Publication date
Application filed by Capital Normal University
Priority to CN202010717508.9A
Publication of CN111882593A (application)
Application granted
Publication of CN111882593B (grant)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/344 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a point cloud registration model and method combining an attention mechanism with a three-dimensional graph convolution network. The model comprises a Detector model and a Descriptor model: the Detector model extracts per-point attention features and constructs the attention mechanism; the Descriptor model generates a representation of three-dimensional depth features to express the depth features of each point and to learn discriminative depth features of the point cloud. The method first performs model training, in which a loss function constructed from a feature-alignment triplet loss trains the model so that attention features and descriptor features are effectively extracted from the point cloud; after training, point cloud registration is carried out. The method automatically extracts keypoints and the three-dimensional depth features of each keypoint. By combining the multi-layer perceptron MLP with the graph convolution network GCN, a new point cloud feature-extraction module is designed within the three-dimensional graph convolution network, which extracts more discriminative point cloud features and improves the accuracy of point cloud registration.

Description

Point cloud registration model and method combining attention mechanism and three-dimensional graph convolution network
Technical Field
The invention relates to the field of computer vision and geospatial information science, in particular to a point cloud registration model and a point cloud registration method combining an attention mechanism and a three-dimensional graph convolution network.
Background
The three-dimensional point cloud provides rich and dense spatial information about objects and plays an important role in fields such as civil traffic engineering, tunnel engineering, digital cities, and simultaneous localization and mapping. In these applications, point cloud registration is a basic and critical problem: because of positioning-sensor errors or coordinate-system inconsistencies, different phases or views of the spatial data are somewhat misaligned, and the complexity and local similarity of spatial objects make automatic, efficient point cloud registration challenging.
Classical 3D point cloud descriptors include 3D Harris, the 3D scale-invariant feature transform (SIFT), the Normal Aligned Radial Feature (NARF), and the local surface patch feature. In addition, there are descriptors based on spatial statistics, such as the Fast Point Feature Histogram (FPFH) and the Point Feature Histogram (PFH). These traditional point cloud descriptors are mainly designed by hand, and although such three-dimensional features have been modularized in the Point Cloud Library (PCL), their distinctiveness and flexibility are limited.
Compared with manually designed features, features can be constructed automatically and efficiently with an end-to-end deep learning model. However, because of the discreteness and irregularity of point clouds, expressing point cloud characteristics in deep learning remains challenging. In 2017, with the appearance of the voxel-based point cloud deep learning model 3DMatch, the registration of 3D point clouds entered the deep-learning era: a voxel-based deep learning feature descriptor was designed for point cloud registration, and a 3D Siamese convolutional neural network maps voxels through the network to obtain a 512-dimensional depth descriptor. PPFNet achieves point cloud registration by designing a context-aware local feature descriptor; 3DFeat-Net uses the PointNet model to learn a rotation matrix and point cloud depth features, improving the rotation robustness of the PointNet model. The backbone networks of both works (PPFNet and 3DFeat-Net) use the PointNet model.
Some problems remain. Deep learning methods based on 3D voxels take a long time to compute, lose some data precision during voxelization, and are not suitable for efficient registration. Because of hardware-device performance, the voxel size is greatly limited, which further affects the quality of registration results in some large scenes. PointNet and its family of works (e.g., PointNet++) have limited rotation robustness and require many data-augmentation operations (e.g., rotation) during training to achieve good model performance. Finally, because of limits on training-sample quality, the model can become unstable; if the training samples are insufficient or scene differences are large, model performance may degrade during registration.
In point cloud processing, there is also research combining Graph Convolutional Networks (GCN), mainly spectral-domain GCNs, non-spectral-domain GCNs, and the like. In addition, the SuperPoint Graphs network for large-scene classification and over-segmentation deep learning networks for point clouds have improved point cloud classification accuracy. In these methods, the GCN effectively improves classification precision and algorithm robustness. However, when the backbone networks of these methods are used directly for point cloud registration, problems such as high time complexity and long training time arise.
Therefore, a technical problem to be solved by those skilled in the art is to provide a registration network with a simple structure and good performance, so that point cloud registration can be completed in a short time with good results.
Disclosure of Invention
In view of this, an object of the present application is to provide a point cloud registration model and a point cloud registration method that combine an attention mechanism and a three-dimensional graph convolution network, so as to extract more discriminative point cloud features and improve the accuracy of point cloud registration.
In order to achieve the above object, the present application provides the following technical solutions.
A point cloud registration model combining an attention mechanism and a three-dimensional graph convolution network is a three-branch Siamese framework comprising a Detector model and a Descriptor model, wherein the Detector model extracts the attention features of points and constructs the attention mechanism; the Descriptor model generates a representation of three-dimensional depth features to express the depth features of points and to learn discriminative depth features of the point cloud.
Preferably, the Detector model extracts the attention features of the point cloud mainly through a spectral-domain graph convolution network module MLP_GCN; 5 fully connected layers (channels: 64, 64 and 128; filter: 1 × 1) are used in the MLP_GCN module of the Detector model to extract the initial point cloud features, further realizing the point cloud feature-extraction function.
Preferably, the Descriptor model first uses the set abstraction (SA) module for point-set feature extraction from PointNet++ to extract the initial features of the point cloud, and then connects two spectral-domain graph convolution network modules MLP_GCN, thereby increasing the depth and performance of the network and obtaining the final three-dimensional depth features.
Preferably, of the two spectral-domain graph convolution network modules MLP_GCN, the first MLP_GCN connects 3 fully connected layers (nodes: 128-256) and its output feature dimension is n × 128; the second MLP_GCN connects 3 fully connected layers (nodes: 256-512) and its output feature dimension is n × 256.
Preferably, the MLP_GCN combines the multi-layer perceptron MLP and the graph convolution network GCN, can effectively extract depth features from the coordinates of the input point cloud, and improves the rotation invariance and distinctiveness of the features.
Preferably, the MLP_GCN is constructed using the sampling and grouping layer of PointNet++, and 3 fully connected layers (nodes: 64-64-128, filter: 1 × 1) are connected to extract the point cloud feature X (dimension n × 128).
Preferably, the inputs of the MLP_GCN are point sets, each containing n points. Within a point set, the K nearest points of each point are found, and each point is connected to its K nearest points to form edges, establishing a graph G; the adjacency matrix A and the degree matrix D are then constructed and the Laplacian matrix L (dimension n × n) is computed. The convolution-kernel parameter W is set to dimension c_n × m, where c_n is the length of the point cloud feature and m is the output feature dimension of the graph convolution, giving the value L·X·W as output; finally, a max pooling layer is connected to obtain the deep-learning features of the point cloud.
The registration method of the point cloud registration model comprises the following steps: first, model training is carried out, in which a loss function constructed from the feature-alignment triplet loss trains the model and attention features and descriptor features are effectively extracted from the point cloud; after model training, point cloud registration is carried out.
Preferably, the model training comprises the steps of:
automatically constructing matching and non-matching point pairs: a point is selected from one data set and its corresponding point taken from the other data set to form a matching pair; the point before registration is called the anchor and the point after registration the positive. A point that is neither the anchor nor the positive is then selected at random as the negative, forming a non-matching pair with the anchor;
respectively searching K Nearest neighbors of an anchor point, a positive point and a negative point by using a K-Nearest Neighbor (KNN) algorithm to form three point sets which are used as input of a deep learning network;
obtaining the depth characteristics and attention characteristics of anchors, positive and negative through the model;
the above features are fed into the feature-alignment triplet loss function to optimize the features and train the model.
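The triplet-construction step above can be illustrated with a minimal numpy sketch. The function names (`build_triplet`, `knn_set`) and the `corr` correspondence array are hypothetical conveniences, not names from the patent; the sketch only shows how matching/non-matching point sets could be assembled automatically from known correspondences, i.e. the weak supervision described in the text:

```python
import numpy as np

rng = np.random.default_rng(42)

def knn_set(cloud, center, k):
    """The k nearest neighbours of `center` in `cloud` form one input point set."""
    d = np.sum((cloud - center) ** 2, axis=1)
    return cloud[np.argsort(d)[:k]]

def build_triplet(src, dst, corr, k):
    """Build one (anchor, positive, negative) triplet of point sets.

    `corr[i]` gives the index in `dst` of the point matching `src[i]`
    (known from the ground-truth transformation), so no keypoints need
    manual labelling.
    """
    i = int(rng.integers(len(src)))
    anchor, positive = src[i], dst[corr[i]]
    j = int(rng.integers(len(dst)))
    while j == corr[i]:                      # negative must differ from the match
        j = int(rng.integers(len(dst)))
    negative = dst[j]
    # K nearest neighbours of each point form the three input point sets.
    return (knn_set(src, anchor, k),
            knn_set(dst, positive, k),
            knn_set(dst, negative, k))
```

The three returned point sets then serve as the deep-learning network's input, as in the KNN step above.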
Preferably, the point cloud registration comprises the steps of:
uniformly sampling P_m points from each of the two point clouds, and for each sampled point finding its n nearest points, forming P_m point sets per point cloud; the point sets are fed into the model, which generates a depth feature and an attention feature for each point in each point set;
using the L2-norm of each point's attention feature to determine whether the point is a local maximum in its neighbourhood, and if so, adding it to the keypoint queue, whereby the keypoints of both point clouds are obtained;
sorting the obtained attention values of the keypoints and selecting the top P_k points as the final keypoints;
for each final keypoint, finding its n nearest neighbours to obtain the final P_k point sets, and inputting these into the model to obtain the depth feature vector of each point in the P_k point sets;
according to the depth feature of each point in a point set, determining the corresponding depth feature with the closest Euclidean distance in the other point cloud, thereby obtaining 2 × n × P_k point cloud matching point pairs;
removing gross errors in the matching point set by using a RANSAC algorithm;
and calculating a rotation matrix by adopting a least square method to obtain a registration result.
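The last step, computing the rotation by least squares from the RANSAC inliers, has a standard closed-form SVD (Kabsch) solution. The sketch below is a minimal numpy illustration of that solution; the function name `rigid_transform` and the choice of the Kabsch formulation are assumptions of this sketch, not the patent's exact implementation:

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rotation R and translation t aligning matched points
    P -> Q (applied after RANSAC has removed gross errors).

    P, Q: (n, 3) arrays of corresponding inlier points.
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)     # centroids
    H = (P - cp).T @ (Q - cq)                   # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # fix a possible reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cq - R @ cp
    return R, t
```

Applying `R` and `t` to the first point cloud then yields the registration result.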
The beneficial technical effects obtained by the invention are as follows:
1) the invention addresses the shortcomings of existing point cloud registration work by designing a weakly supervised three-dimensional graph convolution network for point cloud registration; the network automatically extracts the three-dimensional depth features of point sets composed of each keypoint and its neighbouring points. Within the three-dimensional graph convolution network, the multi-layer perceptron MLP is combined with the graph convolution network GCN in a newly designed point cloud feature-extraction module (named MLP_GCN), which extracts more discriminative point cloud features and improves the accuracy of point cloud registration;
2) the invention combines PointNet + + and graph convolution network GCN into a new graph deep learning model, which makes full use of the advantages of GCN in point set characteristic expression and high-efficiency calculation;
3) based on the attention features, the method automatically generates keypoints and their depth features using non-maximum suppression (NMS); the whole network training process only requires matching point pairs to be established and does not require keypoints to be labelled. This mechanism is a weakly supervised learning method and can save a large amount of time that would be spent manually labelling matching point pairs;
4) the invention is based on deep learning and has a learning process. Compared with traditional non-deep-learning methods it performs better, especially when outdoor scene point clouds are sparse. By adopting point sets rather than fixed-size voxel cubes, it has greater flexibility in shape and size, reduces information loss and extracts more comprehensive features, so the final registration effect is better;
5) compared with approaches that use point sets but no GCN, the use of the GCN makes the method more robust to spatial rotation and yields a better registration effect. Especially in large scenes, the limited memory of current GPUs means the voxels used in training cannot be made very small, so the final matching effect suffers; this problem is greatly alleviated by the method.
The foregoing description is only an overview of the technical solutions of the present application, so that the technical means of the present application can be more clearly understood and the present application can be implemented according to the content of the description, and in order to make the above and other objects, features and advantages of the present application more clearly understood, the following detailed description is made with reference to the preferred embodiments of the present application and the accompanying drawings.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
Fig. 1 is a schematic structural diagram of a point cloud registration model combining an attention mechanism and a three-dimensional graph convolution network in embodiment 1 of the present disclosure;
FIG. 2 is a schematic structural diagram of the graph convolution network GCN with C_n input channels and m output feature maps in embodiment 1 of the present disclosure;
fig. 3 is a schematic flow chart of point cloud registration in embodiment 2 of the present disclosure;
fig. 4 is a data set disclosed in example 3 of the present disclosure ((a) data set I and (b) data set II);
FIG. 5 is a graph of the test results of the method of example 3 of the present disclosure and the prior methods I-IV in data set I;
FIG. 6 is a graph of the results of the testing of the method of example 3 of the present disclosure and prior art methods I-IV in data set II.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments.
Further, the present application may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion.
Example 1
A point cloud registration model combining an attention mechanism and a three-dimensional graph convolution network is shown in fig. 1. The model is a three-branch Siamese framework comprising a Detector model and a Descriptor model; the Detector model extracts the attention features of points to construct the attention mechanism, and the Descriptor model generates a representation of three-dimensional depth features to express the depth features of points and to learn discriminative depth features of the point cloud.
The Detector model extracts the attention features of the point cloud mainly through the spectral-domain graph convolution network module MLP_GCN. In the Detector's MLP_GCN module, 5 fully connected layers (channels: 64, 64 and 128; filter: 1 × 1) extract the initial point cloud features, realizing the point cloud feature-extraction function h(·). The point cloud feature X is extracted with h(·), and the Laplacian matrix L is solved for the convolution operation (the number of nearest neighbours K is set to 32; L has dimension n × n). The graph is then convolved in the spectral domain using the matrix L and the point cloud feature X to obtain L·X·W. Finally, L·X·W is connected to a max pooling layer, where the W parameter has dimension 256 × 1 and the output feature has dimension n × 1.
The Detector model extracts the attention features of the point cloud mainly through the MLP_GCN module and can express the importance of each point feature in the point cloud. Through this model, the feature expression of the point cloud is enhanced, negative influences are weakened, and the distinctiveness of each region is reflected. During registration, keypoints are detected automatically by searching for local maxima, realizing the attention feature expression of the point cloud. Meanwhile, a weakly supervised learning method is designed so that point cloud registration is completed effectively without labelling matching point pairs.
The Descriptor model extracts the depth features of each point in the point cloud with a convolutional network. First, the set abstraction (SA) module for point-set feature extraction from PointNet++ extracts the initial features of the point cloud; then two spectral-domain graph convolution network modules MLP_GCN are connected, increasing the depth and performance of the network and yielding the final three-dimensional depth features.
The SA module is implemented as follows: three fully connected layers (nodes: 64-64-128) are connected first, then a max pooling layer; next, two fully connected layers (nodes: 128-256) are designed, and finally another max pooling layer is added. The SA module generates features of dimension n × 256.
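The layer sequence above can be sketched in numpy. This is a simplified forward pass only: the shared fully connected layers act like 1 × 1 convolutions over the points, and the pooled group feature is broadcast back to every point so the output keeps one 256-dimensional feature per point. The broadcasting step and the function names (`fc`, `sa_module`) are assumptions of this sketch, not the patent's exact SA implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(X, w, b):
    """Shared fully connected layer (a 1x1 convolution over points) + ReLU."""
    return np.maximum(X @ w + b, 0.0)

def sa_module(points, params):
    """Simplified SA forward pass with the widths given in the text:
    FC 64-64-128, max pool, FC 128-256, max pool."""
    X = points                                            # (n, 3)
    for w, b in params[:3]:                               # nodes: 64-64-128
        X = fc(X, w, b)
    X = np.broadcast_to(X.max(axis=0), X.shape).copy()    # max pooling layer
    for w, b in params[3:]:                               # nodes: 128-256
        X = fc(X, w, b)
    X = np.broadcast_to(X.max(axis=0), X.shape).copy()    # final max pooling
    return X                                              # (n, 256)

# Randomly initialised weights for the widths 3-64-64-128-128-256.
widths = [3, 64, 64, 128, 128, 256]
params = [(rng.standard_normal((a, b)) * 0.1, np.zeros(b))
          for a, b in zip(widths[:-1], widths[1:])]
feat = sa_module(rng.standard_normal((32, 3)), params)    # n = 32 points
```

With n = 32 input points the sketch produces a 32 × 256 feature array, matching the n × 256 output dimension stated in the text.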
Of the two spectral-domain graph convolution network modules MLP_GCN: the first MLP_GCN connects 3 fully connected layers (nodes: 128-256); the number of nearest neighbours K is 32, the Laplacian matrix L has dimension n × n, the graph convolution weight parameter W has dimension 256 × 128, and the output feature X has dimension n × 128. The second MLP_GCN connects 3 fully connected layers (nodes: 256-512); K is 32, the Laplacian matrix L has dimension n × n, the graph convolution weight parameter W has dimension 512 × 256, and the output feature dimension is n × 256.
The MLP_GCN combines the multi-layer perceptron MLP and the graph convolution network GCN, can effectively extract depth features from the coordinates of the input point cloud, and improves the rotation invariance and distinctiveness of the features. The MLP_GCN adopts a spectrum-based graph convolution method to extract the depth features of the point cloud, effectively enhancing the importance of the point cloud features.
The graph convolution network GCN uses a point cloud composed of irregular points to construct a graph structure, and then uses the graph structure to gather features of neighboring points, as shown in fig. 2. After the graph is constructed, a deep learning network is designed to process the graph structure of the point cloud, and the feature extraction of the point cloud data is realized.
The method comprises the following specific steps:
Constructing the graph structure of the point cloud: a K-Nearest Neighbor (KNN) algorithm is used to generate point sets from the point cloud, each with n points. An undirected graph G = (V, E) is defined, where V = {v_1, v_2, v_3, ..., v_n} is the set of points in a point set and the edge set E represents the adjacency between points. The K nearest neighbours (K < n) of each vertex in V are found and connected to it, establishing that vertex's edges; this completes the construction of all edges for every vertex in V. The matrix A ∈ R^(n×n) is defined as the adjacency matrix of the graph: when two vertices are connected by an edge, A(i, j) = 1; otherwise A(i, j) = 0, where i is the row index and j the column index of the matrix. The matrix D ∈ R^(n×n) is defined as the degree matrix: when i = j,

D(i, i) = Σ_j A(i, j),

and when i ≠ j, D(i, j) = 0. From the adjacency matrix and the degree matrix, the Laplacian matrix is obtained as

L = D^(-1/2) (D − A) D^(-1/2).
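The graph construction just described can be sketched in a few lines of numpy. The sketch assumes the symmetric normalized Laplacian L = D^(-1/2)(D − A)D^(-1/2), a common choice for spectral GCNs; the patent's exact normalization may differ, and the function name `knn_graph_laplacian` is illustrative:

```python
import numpy as np

def knn_graph_laplacian(points, k):
    """Build a KNN graph over a point set and return its normalized Laplacian.

    points: (n, 3) array of 3D coordinates; k: number of nearest neighbours.
    Returns L = D^(-1/2) (D - A) D^(-1/2), an (n, n) matrix.
    """
    n = points.shape[0]
    # Pairwise squared Euclidean distances.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)            # exclude self from neighbours
    nbrs = np.argsort(dist, axis=1)[:, :k]    # indices of the k nearest points

    A = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    A[rows, nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)                    # symmetrise: edges are undirected

    deg = A.sum(axis=1)                       # D(i, i) = sum_j A(i, j)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    D = np.diag(deg)
    L = np.diag(d_inv_sqrt) @ (D - A) @ np.diag(d_inv_sqrt)
    return L
```

For any point with at least one neighbour, the diagonal of this normalized Laplacian is 1, which gives a quick sanity check on the construction.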
Constructing the graph convolution model, with the graph convolution formula:

Output = ReLU(L X W),  (1)

where the feature extracted from each point set is X ∈ R^(n×c_n), with c_n the length of the generated point cloud feature; ReLU(·) is the rectified linear unit activation function; X is obtained from the initial point set after the MLP (multi-layer perceptron); the graph convolution learns a weight matrix W ∈ R^(c_n×m), where m is the output feature dimension of the graph convolution; and Output ∈ R^(n×m) is the output feature after the graph convolution.
A symmetric function f(x_1, x_2, …, x_n) = G(h(x_1), h(x_2), …, h(x_n)) = L X W is designed in the MLP_GCN, where h(·) is the point cloud feature-extraction function implemented by the multiple fully connected layers of the MLP. The MLP uses multiple fully connected layers, i.e., multiple 2D convolutional layers (with a 1 × 1 convolution kernel). In this way, the point cloud features are extracted first to obtain each point's feature. The G(·) function is then implemented by graph convolution using equation (1). The convolution operation aggregates effective point cloud features and enhances the rotation invariance and distinctiveness of the features.
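Equation (1) and the max pooling tail of the MLP_GCN can be sketched in a few lines of numpy; the function names are illustrative, not the patent's own:

```python
import numpy as np

def graph_conv(L, X, W):
    """One spectral graph-convolution step, equation (1): Output = ReLU(L X W).

    L: (n, n) Laplacian, X: (n, c) point features, W: (c, m) learnable weights.
    """
    return np.maximum(L @ X @ W, 0.0)          # ReLU activation

def mlp_gcn_pool(L, X, W):
    """MLP_GCN tail: graph convolution followed by max pooling over the points,
    giving one m-dimensional feature vector for the whole point set."""
    return graph_conv(L, X, W).max(axis=0)
```

The shapes follow the text directly: an n × c feature matrix times a c × m weight gives an n × m output, and pooling collapses it to a single m-vector per point set.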
The MLP_GCN is constructed using the sampling and grouping layer of PointNet++, and three fully connected layers (nodes: 64-64-128, filter: 1 × 1) are connected to extract the point cloud feature X (dimension n × 128).
The inputs of the MLP_GCN are point sets, each containing n points. To convolve the graph in the spectral domain, the K nearest points of each point are found within a point set and each point is connected to its K nearest points to form edges, establishing a graph G; the adjacency matrix A and the degree matrix D are constructed, and the Laplacian matrix L (dimension n × n) is computed. As can be seen from equation (1), the convolution-kernel parameter W is set to dimension c_n × m, where c_n is the point cloud feature length and m is the output feature dimension of the graph convolution, giving the value L·X·W as output. Finally, a max pooling layer is connected to obtain the deep-learning features of the point cloud.
Example 2
Based on embodiment 1, the registration method of the point cloud registration model comprises the following steps: first, model training is carried out, in which a loss function constructed from the feature-alignment triplet loss trains the model and attention features and descriptor features are effectively extracted from the point cloud; after model training, point cloud registration is carried out.
During model learning, the Detector model generates the attention vector of the anchor point, a_anc = (a_1, a_2, …, a_n). The Descriptor model generates the anchor depth feature f_anc = (f_anc_1, f_anc_2, f_anc_3, …, f_anc_n), the positive depth feature f_pos = (f_pos_1, f_pos_2, f_pos_3, …, f_pos_n) and the negative depth feature f_neg = (f_neg_1, f_neg_2, f_neg_3, …, f_neg_n). These four feature vectors are combined by the feature-alignment triplet loss to construct the objective function.
The construction of the feature alignment triple Loss is based on the triple Loss function, and the triple Loss function formula is as follows:
Figure BDA0002598761840000081
in the formula (I), the compound is shown in the specification,
Figure BDA0002598761840000082
Figure BDA0002598761840000083
in the formula (f)ancIs the depth characteristic of the anchor point, fposIs a depth feature of a positive point, fnegIs a depth feature of a negative point, aancIs the attention vector of achor point, a'iFor the ratio of each feature value to the sum of all features in the attention vector,
a'_i = a_i / (a_1 + a_2 + … + a_n)
the input to the triple Loss function includes the anchor point feature (f)anc) Positive point feature (f)pos) And negative point feature (f)neg). By optimizing the distance between the anchor point and the positive point and the distance between the anchor point and the negative point, the distance (D) between the anchor point and the positive point can be shortenedanc,pos) Increasing the distance (D) between anchor point and negative pointanc,neg). The margin is a positive and negative interval parameter. In training the triple Loss function, 2 pairs of corresponding point sets, namely, pairs of matching point sets, are required: anchor point Panc=(p1,p2,…,pn) And positive point Ppos=(p1,p2,…,pn) And a set of unmatched point set pairs: anchor point (P)anc) And negative point Pneg=(p1,p2,…,pn)。
In formula (2), the anchor, positive, and negative points optimize their respective features through Euclidean distances. When the point clouds P_anc and P_pos are taken as input (P_pos is obtained from P_anc by rotation and translation), the depth features are f_anc = (f_anc_1, f_anc_2, …, f_anc_n) and f_pos = (f_pos_1, f_pos_2, …, f_pos_n). The optimization of the triplet loss is realized by calculating Euclidean distances, emphasizing the distance computation so as to obtain a more effective optimization model.
Based on formula (2), feature alignment is innovatively introduced into the optimization of the model, and the feature alignment triplet loss function is redesigned, in which equations (3) and (4) replace D_anc,pos and D_anc,neg of equation (2), so as to calculate the distances between all descriptor pairs among the three point clouds P_anc, P_pos, and P_neg. The feature alignment triplet loss searches, for each point in P_anc, the point in P_pos whose feature is closest. This excludes irrelevant feature information, and the points in the point sets are semantically aligned. Equation (4) normalizes each attention value against the sum of all attention values.
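The alignment idea above — weight each anchor descriptor by its normalized attention and match it to the closest descriptor in the other point set — can be sketched as follows. This is an interpretation of the prose, not the patent's exact formulas (3) and (4), which are not reproduced in the text; all function names are illustrative.

```python
import math

def normalize_attention(a):
    """Each attention value as a share of the attention sum (the normalisation
    described for equation (4))."""
    s = sum(a)
    return [v / s for v in a]

def aligned_distance(f_anc, f_other, attention):
    """Feature-aligned distance: every anchor descriptor is matched to its
    closest descriptor in the other set, weighted by normalised attention."""
    weights = normalize_attention(attention)
    return sum(
        w * min(math.dist(fi, fj) for fj in f_other)
        for w, fi in zip(weights, f_anc)
    )

def feature_alignment_triplet_loss(f_anc, f_pos, f_neg, attention, margin=1.0):
    """Triplet loss of equation (2) with the aligned distances substituted in."""
    d_pos = aligned_distance(f_anc, f_pos, attention)
    d_neg = aligned_distance(f_anc, f_neg, attention)
    return max(d_pos - d_neg + margin, 0.0)
```

Because each anchor descriptor picks its own nearest counterpart, the loss no longer assumes the i-th descriptor in both sets corresponds — which is the point of the alignment.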
The model training comprises the following steps:
In the first step, matching and non-matching point pairs are constructed automatically: one point is selected from one data set, and the corresponding point is taken from the other data set, obtained by rotating and translating the first, to form a matching point pair; the point before registration is called the anchor, and the point after registration the positive. A point that is neither the anchor nor the positive is then randomly selected as the negative, forming a non-matching point pair with the anchor point.
In the second step, the K-Nearest Neighbor (KNN) algorithm is used to search the K nearest neighbors of the anchor, positive, and negative points respectively, forming three point sets that serve as input to the deep learning network.
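The first two training steps can be sketched together: a brute-force KNN query plus the construction of the anchor/positive/negative point sets. This is a minimal sketch with illustrative names; the positive is assumed to share the anchor's index in the transformed cloud, as the matching-pair construction above implies.

```python
import math

def knn_point_set(cloud, center, k):
    """The k nearest neighbours of `center` within `cloud` (brute-force KNN)."""
    return sorted(cloud, key=lambda p: math.dist(p, center))[:k]

def build_training_sets(cloud_a, cloud_b, anc_idx, neg_idx, k):
    """One triplet of point sets: the positive is the corresponding point in the
    rotated/translated cloud, the negative a different randomly chosen point."""
    anchor = knn_point_set(cloud_a, cloud_a[anc_idx], k)
    positive = knn_point_set(cloud_b, cloud_b[anc_idx], k)  # same index = match
    negative = knn_point_set(cloud_a, cloud_a[neg_idx], k)
    return anchor, positive, negative
```

In practice a spatial index (e.g. a k-d tree) would replace the brute-force sort, but the three resulting point sets feed the network identically.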
In the third step, the depth features and attention features of the anchor, positive, and negative are obtained through the model.
The network model contains a Detector model, which extracts the attention features of the point cloud mainly through an MLP_GCN module. The input of the MLP_GCN module is a point set of n points. The MLP_GCN model is first constructed using the PointNet++ sampling and grouping layers, and three fully connected layers (nodes: 64-64-128, filter: 1×1) are connected to extract the point cloud features X of dimension n×128. To convolve the graph in the spectral domain, the graph G is built by searching for the nearest K points of each point within a point set and connecting each point with its K nearest neighbors to form edges. Next, an adjacency matrix A and a degree matrix D are constructed, and the Laplace matrix L of dimension n×n is calculated. As can be seen from equation (1), the parameter W of the convolution kernel is set to c_n × m, where c_n is the feature length of the point cloud and m is the output feature dimension of the graph convolution, so that the value L·X·W is obtained as output. Finally, a max pooling layer is connected to obtain the deep learning features of the point cloud.
Output=ReLU(LXW) (5)
in the formula, ReLU(x) = max(0, x); L is the Laplace matrix, X the input point features, and W the graph convolution weight.
In the model, a set abstraction (SA) module for point-set feature extraction from PointNet++ is used to extract the initial features of the point cloud, and two spectral-domain graph convolution network modules (MLP_GCN) are then connected, which improves the depth and performance of the network and yields the final three-dimensional depth features. The SA module is implemented as follows: three fully connected layers (nodes: 64-64-128) are connected first, then one max pooling layer; two further fully connected layers (nodes: 128-256) follow, and finally one more max pooling layer is added. The SA module generates features of dimension n×256. The first MLP_GCN module connects three fully connected layers (nodes: 128-256), with K = 32 nearest neighbor points; the Laplace matrix L has dimension n×n, the graph convolution weight parameter W has dimension 256×128, and the output features X have dimension n×128. The parameters of the second MLP_GCN module include three fully connected layers (nodes: 256-512), K = 32 nearest neighbors, a Laplace matrix L of dimension n×n, a graph convolution weight parameter W of dimension 512×256, and an output feature dimension of n×256.
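The layer dimensions above can be sanity-checked with a shape-only trace of the Descriptor branch. This sketch propagates tensor shapes and nothing else; the exact intermediate fully-connected widths inside the two MLP_GCN modules are partly ambiguous in the text, so the path below is one consistent reading, not a definitive reconstruction.

```python
def fc(shape, nodes):
    """A fully connected layer applied per point: (n, c) -> (n, nodes)."""
    n, _ = shape
    return (n, nodes)

def max_pool(shape):
    """The pooling here aggregates within each local group, keeping n points."""
    return shape

def gcn(shape, w_in, w_out):
    """Graph convolution L . X . W with W of shape (w_in, w_out)."""
    n, c = shape
    assert c == w_in, "graph-conv weight must match incoming feature length"
    return (n, w_out)

def descriptor_shapes(n):
    s = (n, 3)                       # raw xyz coordinates
    for nodes in (64, 64, 128):      # SA: three FC layers ...
        s = fc(s, nodes)
    s = max_pool(s)
    for nodes in (128, 256):         # ... then two more FC layers
        s = fc(s, nodes)
    s = max_pool(s)                  # SA output: n x 256
    s = gcn(s, 256, 128)             # first MLP_GCN: W is 256 x 128 -> n x 128
    s = fc(s, 512)                   # second MLP_GCN FC stack ends at 512
    s = gcn(s, 512, 256)             # W is 512 x 256 -> n x 256
    return s
```

Running `descriptor_shapes(n)` confirms the final feature dimension n×256 stated for the second MLP_GCN module.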
In the fourth step, the above features are fed into the feature alignment triplet loss function to optimize the features and train the model.
The feature alignment triplet loss is constructed on the basis of the triplet loss function, whose input includes the anchor feature f_anc, the positive feature f_pos, and the negative feature f_neg. By optimizing the distance between the anchor and the positive point and the distance between the anchor and the negative point, the distance D_anc,pos between anchor and positive can be shortened while the distance D_anc,neg between anchor and negative is increased; the margin is the interval parameter between the anchor and the negative point. Training the triplet loss function requires a pair of matching point sets, namely the anchor points P_anc = (p_1, p_2, …, p_n) and the positive points P_pos = (p_1, p_2, …, p_n), and a pair of unmatched point sets: the anchor points P_anc and the negative points P_neg = (p_1, p_2, …, p_n). The anchor, positive, and negative points optimize their respective features through Euclidean distances. When the point clouds P_anc and P_pos are taken as input (P_pos is obtained from P_anc by rotation and translation), the depth features are f_anc = (f_anc_1, …, f_anc_n) and f_pos = (f_pos_1, …, f_pos_n). The optimization of the triplet loss is realized by calculating Euclidean distances, emphasizing the distance computation so as to obtain a more effective optimization model.
As shown in fig. 3, the point cloud registration includes the following steps:
In the first step, P_m points are uniformly sampled from each of the two point clouds, and the n nearest points of each of the P_m points are searched to form the P_m point sets of each point cloud; the point sets are fed into the model, which generates a depth feature for each point set and an attention feature for each point.
In the second step, the attention feature of each point is evaluated by its L2-norm value to determine whether the point is a maximum within its neighborhood; if so, it is added to a keypoint queue, from which the keypoints of both point clouds are derived.
In the third step, the attention values of the obtained keypoints are sorted, and the top P_k points are selected as keypoints.
In the fourth step, the n nearest neighbors of each final keypoint are searched to obtain the final P_k point sets, which are input into the model to obtain the depth feature vector of each point in the P_k point sets.
In the fifth step, for the depth feature of each point in a point set, the corresponding depth feature with the closest Euclidean distance in the other point cloud is determined, yielding 2×n×P_k point cloud matching point pairs.
In the sixth step, gross errors in the matching point set are removed using the RANSAC algorithm.
In the seventh step, the rotation matrix is calculated by the least squares method to obtain the registration result.
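The keypoint-selection steps (two and three) and the feature-matching step (five) above can be sketched as follows. This is a minimal sketch with illustrative names; the precomputed `neighbor_idx` list stands in for the neighborhood search, which in the actual method comes from the KNN grouping.

```python
import math

def select_keypoints(attention, neighbor_idx, p_k):
    """Keep points whose attention is a local maximum among their neighbours,
    then return the indices of the p_k highest-attention ones."""
    local_max = [
        i for i, a in enumerate(attention)
        if all(a >= attention[j] for j in neighbor_idx[i])
    ]
    local_max.sort(key=lambda i: attention[i], reverse=True)
    return local_max[:p_k]

def match_by_features(feats_a, feats_b):
    """Pair each descriptor in A with its Euclidean-nearest descriptor in B."""
    return [
        (i, min(range(len(feats_b)), key=lambda j: math.dist(fa, feats_b[j])))
        for i, fa in enumerate(feats_a)
    ]
```

The resulting index pairs would then be filtered with RANSAC and fed to the least-squares rotation estimate, as in steps six and seven.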
Example 3
Based on embodiment 2 above, two public data sets are used in a particular environment, but this does not mean that the invention can only be carried out in this environment or with these data sets. This example is intended to compare the present embodiment with four existing point cloud registration methods.
The test environment was as follows: intel Xeon E5-2620 v4 CPU, Nvidia Quadro TITAN Xp GPU, 12-GB RAM. The training process was performed with the Tensorflow framework running on Ubuntu 16.04.
The model was tested using two public data sets ((a) data set I and (b) data set II), as shown in fig. 4, from the ASL dataset repository. A Hokuyo UTM-30LX laser sensor was used for data acquisition, and a total station was used to measure absolute coordinates, guaranteeing the accuracy of the registration data.
Data set I is a public indoor-scene lidar point cloud data set containing 45 point clouds, covering complex indoor environments such as irregularly placed and shaped tables and chairs; each point cloud averages 365,000 points, and the scene size is 17 m × 10 m × 3 m. From 30 selected point cloud scenes of data set I, 40,000 matching point pairs were randomly acquired as training data, and the remaining 15 point cloud scenes were used for verification experiments.
Data set II consists of outdoor laser scanning data collected in a park, including grassland, lanes, and sparse trees; the main structure is a terrace composed of rock walls and a vine-covered wooden ceiling. The scene size of data set II is 72 m × 70 m × 19 m, and each scene averages 153,000 points. Data set II contains 32 point clouds; this example uses 15,000 matching point pairs randomly collected from 15 scenes as training data, with the remaining scenes as test data. Both data sets I and II provide registered ground-truth data for verifying the registration results.
To verify the performance of the example method, it was compared with 4 existing methods. The differences between the methods are shown in Table 1.
Table 1 shows a comparison of the different processes
[Table 1 (image): comparison of the different methods]
The first method is Super4PCS (method I), a point cloud registration approach using the affine invariance of four coplanar points. The second is Fast Global Registration (method II), which uses a global optimization method to compute the rotation matrix. The third is a voxelized deep learning network (method III), which extracts depth features with a 3D CNN, adopting a network structure similar to AlexNet to extract local 3D depth features. The fourth is 3DFeat-Net (method IV), which also adopts a point-set approach and, on top of the SA module of PointNet++, adds a structure similar to the T-Net of PointNet to address the poor rotation robustness of the PointNet network.
Data set I
Methods I-IV and the example methods were tested using data set I and the results are shown in Table 2:
TABLE 2
[Table 2 (image): results of methods I–IV and the example method on data set I]
As can be seen from Table 2, the RMSE value of the proposed method is much lower than those of the comparison methods, and a robust overlap-ratio value is also obtained.
As shown in fig. 5, the method proposed in this embodiment achieves more robust and lower-error results than the other methods. Methods I and II are global registration methods with no learning process, which limits their performance. Method III also employs deep learning, but its feature extraction is voxel-based; the inflexibility of voxel-size settings and the information loss of voxelization easily cause registration deviations. Method IV likewise adopts a point-set approach, but since it does not use a GCN network it must learn a rotation matrix, making it insensitive to rotation transformations and prone to matching errors. The method of this embodiment registers the point clouds better and obtains better results than the other methods; its performance on matching pairs is superior because the GCN network adapts better to rotation transformations of the point cloud.
Data set II
Methods I-IV and the example methods were tested using data set II and the results are shown in Table 3:
TABLE 3
[Table 3 (image): results of methods I–IV and the example method on data set II]
As can be seen from Table 3, the method of this embodiment solves the rotation-robustness problem within the network structure and performs training and registration in a point-set-based manner, so the registration problem in large scenes is better solved and a registration effect superior to that of methods I–IV is obtained.
As shown in fig. 6, the outdoor-scene point clouds are sparse, so methods I and II find too few matching points to complete the registration. Because of voxel limitations, in large scenes method III is constrained by GPU memory and cannot be trained with smaller voxels, resulting in a poor matching effect. The matching effect of method IV is visible and is consistent with the results reported in its paper.
The tests on data sets I and II show that, compared with the traditional Super4PCS, Fast Global Registration, the voxel-based point cloud deep learning registration model, and 3DFeat-Net, the point cloud registration model combining an attention mechanism and a three-dimensional graph convolution network proposed by the invention has the best algorithm structure both indoors and outdoors. Improving the symmetric function and adopting graph convolution raises the rotation robustness of the point cloud features and effectively improves the point cloud registration results. The adaptability and robustness of the proposed algorithm in each scene are thereby verified.
The above description is only a preferred embodiment of the present invention and is not intended to limit its scope; various modifications, changes, substitutions, integrations, and parameter changes may be made by those skilled in the art without departing from the principle and spirit of the invention, and such variations fall within its scope.

Claims (6)

1. A registration method for a point cloud registration model combining an attention mechanism and a three-dimensional graph convolution network, characterized by comprising the following steps: first, model training is carried out, using a feature alignment triplet loss to construct the loss function for training the model, so that attention features and descriptor features are effectively extracted from the point cloud; after model training, point cloud registration is carried out;
the model is a three-branch Siamese (Siamese) framework and comprises a Detector model and a Descriptor model, wherein the Detector model is used for extracting the attention characteristics of points and constructing an attention mechanism; the Descriptor model is used for generating an expression of three-dimensional depth features to express the three-dimensional depth features of the points and learning and distinguishing the depth features of the point cloud;
the Detector model mainly extracts the attention features of the point cloud through a spectral-domain-based graph convolution network module MLP_GCN, and 5 fully connected layers are used in the MLP_GCN module of the Detector model to extract the initial point cloud features, thereby realizing the extraction of point cloud features;
the Descriptor model first uses a set abstraction (SA) module for point-set feature extraction from PointNet++ to extract the initial features of the point cloud, and then connects two spectral-domain graph convolution network modules MLP_GCN, thereby improving the depth and performance of the network and obtaining the final three-dimensional depth features;
the point cloud registration comprises the steps of:
uniformly sampling P_m points from each of the two point clouds, and searching the n nearest points of each of the P_m points to form the P_m point sets of each point cloud; feeding the point sets into the model, generating a depth feature for each point set and an attention feature for each point;
evaluating the attention feature of each point by its L2-norm value to determine whether the point is a maximum within its neighborhood; if so, adding it to a keypoint queue, from which the keypoints of both point clouds can be derived;
sorting the obtained attention values of the keypoints, and selecting the top P_k points as keypoints;
searching the n nearest neighbors of each final keypoint to obtain the final P_k point sets, and inputting the final point sets into the model to obtain the depth feature vector of each point in the P_k point sets;
according to the depth feature of each point in a point set, determining the corresponding depth feature with the closest Euclidean distance in the other point cloud, thereby obtaining 2×n×P_k point cloud matching point pairs;
removing gross errors in the matching point set by using a RANSAC algorithm;
and calculating a rotation matrix by adopting a least square method to obtain a registration result.
2. The method of claim 1, wherein, of the two spectral-domain graph convolution network modules MLP_GCN, the first MLP_GCN connects 3 fully connected layers with an output feature dimension of n×128, and the second MLP_GCN connects 3 fully connected layers with an output feature dimension of n×256.
3. The method as claimed in claim 2, wherein the MLP_GCN combines a multi-layer perceptron MLP and a graph convolution network GCN, so as to effectively extract depth features from the coordinates of the input point cloud and improve the rotation invariance and discriminability of the features.
4. The method of claim 3, wherein the MLP_GCN is constructed using the PointNet++ sampling and grouping layers, with 3 fully connected layers connected to extract the point cloud features X of dimension n×128.
5. The method of claim 4, wherein the input of the MLP_GCN is point sets, the number of points in each point set being n; the nearest K points of each point are searched within a point set, and each point is connected with its K nearest points to form edges, building a graph G; an adjacency matrix A and a degree matrix D are constructed, and the Laplace matrix L of dimension n×n is calculated; the parameter W of the convolution kernel is set to c_n × m, where c_n is the feature length of the point cloud and m is the output feature dimension of the graph convolution, yielding the value L·X·W as output; finally, a max pooling layer is connected to obtain the deep learning features of the point cloud.
6. The method for registering a point cloud registration model combining an attention mechanism and a three-dimensional graph convolution network according to claim 1, wherein the model training comprises the following steps:
automatically constructing matching and non-matching point pairs: selecting one point from one data set and taking the corresponding point from the other data set to form a matching point pair, the point before registration being called the anchor and the point after registration the positive; then randomly selecting a point that is neither the anchor nor the positive as the negative, forming a non-matching point pair with the anchor point;
searching the K nearest neighbors of the anchor, positive, and negative points respectively with the K-nearest neighbor (KNN) algorithm to form three point sets as input to the deep learning network;
obtaining the depth features and attention features of the anchor, positive, and negative through the model;
adding the above features to the feature alignment triplet loss function to optimize the features and train the model.
CN202010717508.9A 2020-07-23 2020-07-23 Point cloud registration model and method combining attention mechanism and three-dimensional graph convolution network Active CN111882593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010717508.9A CN111882593B (en) 2020-07-23 2020-07-23 Point cloud registration model and method combining attention mechanism and three-dimensional graph convolution network

Publications (2)

Publication Number Publication Date
CN111882593A CN111882593A (en) 2020-11-03
CN111882593B true CN111882593B (en) 2022-06-17

Family

ID=73154775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010717508.9A Active CN111882593B (en) 2020-07-23 2020-07-23 Point cloud registration model and method combining attention mechanism and three-dimensional graph convolution network

Country Status (1)

Country Link
CN (1) CN111882593B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581515B (en) * 2020-11-13 2022-12-13 上海交通大学 Outdoor scene point cloud registration method based on graph neural network
CN112597796A (en) * 2020-11-18 2021-04-02 中国石油大学(华东) Robust point cloud representation learning method based on graph convolution
CN112801268B (en) * 2020-12-30 2022-09-13 上海大学 Positioning method based on graph convolution and multilayer perceptron hybrid network
CN112819080B (en) * 2021-02-05 2022-09-02 四川大学 High-precision universal three-dimensional point cloud identification method
CN112926452B (en) * 2021-02-25 2022-06-14 东北林业大学 Hyperspectral classification method and system based on GCN and GRU enhanced U-Net characteristics
CN113011501B (en) * 2021-03-22 2022-05-24 广东海启星海洋科技有限公司 Method and device for predicting typhoon water level based on graph convolution neural network
CN112991407B (en) * 2021-04-02 2022-06-28 浙江大学计算机创新技术研究院 Point cloud registration method based on non-local operation
CN112862730B (en) * 2021-04-26 2021-07-27 深圳大学 Point cloud feature enhancement method and device, computer equipment and storage medium
CN113139996B (en) * 2021-05-06 2024-02-06 南京大学 Point cloud registration method and system based on three-dimensional point cloud geometric feature learning
CN113223062B (en) * 2021-06-04 2024-05-07 武汉工控仪器仪表有限公司 Point cloud registration method based on corner feature point selection and quick description
CN113658236B (en) * 2021-08-11 2023-10-24 浙江大学计算机创新技术研究院 Incomplete point cloud registration method based on graph attention mechanism
CN113807366B (en) * 2021-09-16 2023-08-08 电子科技大学 Point cloud key point extraction method based on deep learning
CN114037743B (en) * 2021-10-26 2024-01-26 西北大学 Three-dimensional point cloud robust registration method for Qin warriors based on dynamic graph attention mechanism
CN113971690B (en) * 2021-10-28 2024-04-16 燕山大学 End-to-end three-dimensional point cloud registration method based on deep learning
CN114092650B (en) * 2021-11-30 2024-05-28 燕山大学 Three-dimensional point cloud generation method based on efficient graph convolution
CN114004871B (en) * 2022-01-04 2022-04-15 山东大学 Point cloud registration method and system based on point cloud completion
CN114973422A (en) * 2022-07-19 2022-08-30 南京应用数学中心 Gait recognition method based on three-dimensional human body modeling point cloud feature coding
CN115375902B (en) * 2022-10-26 2023-03-24 昆明理工大学 Multi-spectral laser radar point cloud data-based over-point segmentation method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876831A (en) * 2018-06-08 2018-11-23 西北工业大学 A kind of building three-dimensional point cloud method for registering based on deep learning
CN109064502A (en) * 2018-07-11 2018-12-21 西北工业大学 The multi-source image method for registering combined based on deep learning and artificial design features
CN110910433A (en) * 2019-10-29 2020-03-24 太原师范学院 Point cloud matching method based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Deep Closest Point: Learning Representations for Point Cloud Registration;Yue Wang等;《2019 IEEE/CVF International Conference on Computer Vision (ICCV)》;20200227;3523-3532 *
Deep Learning for 3D Point Clouds: A Survey;Yulan Guo等;《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》;20200623;1-27 *
基于深度学习的点云匹配;梁振斌等;《计算机工程与设计》;20200615(第06期);197-201 *

Also Published As

Publication number Publication date
CN111882593A (en) 2020-11-03

Similar Documents

Publication Publication Date Title
CN111882593B (en) Point cloud registration model and method combining attention mechanism and three-dimensional graph convolution network
Li et al. Hierarchical line matching based on line–junction–line structure descriptor and local homography estimation
CN109345574B (en) Laser radar three-dimensional mapping method based on semantic point cloud registration
Schindler et al. Detecting and matching repeated patterns for automatic geo-tagging in urban environments
Ding et al. Automatic registration of aerial imagery with untextured 3d lidar models
Karantzalos et al. Large-scale building reconstruction through information fusion and 3-d priors
CN108320323B (en) Building three-dimensional modeling method and device
CN105139379B (en) Based on the progressive extracting method of classified and layered airborne Lidar points cloud building top surface
Ni et al. HyperSfM
Wu et al. Automatic 3D reconstruction of electrical substation scene from LiDAR point cloud
CN104820718A (en) Image classification and searching method based on geographic position characteristics and overall situation vision characteristics
CN102208033B (en) Data clustering-based robust scale invariant feature transform (SIFT) feature matching method
Wei et al. Automatic coarse registration of point clouds using plane contour shape descriptor and topological graph voting
CN110111375A (en) A kind of Image Matching elimination of rough difference method and device under Delaunay triangulation network constraint
Palmer et al. Using focus of attention with the Hough transform for accurate line parameter estimation
Wang et al. A method for detecting windows from mobile LiDAR data
CN111709317A (en) Pedestrian re-identification method based on multi-scale features under saliency model
Guo et al. Line-based 3d building abstraction and polygonal surface reconstruction from images
Milde et al. Building reconstruction using a structural description based on a formal grammar
CN117053779A (en) Tightly coupled laser SLAM method and device based on redundant key frame removal
CN116129118A (en) Urban scene laser LiDAR point cloud semantic segmentation method based on graph convolution
Meixner et al. Interpretation of 2D and 3D building details on facades and roofs
CN115063615A (en) Repeated texture image matching method based on Delaunay triangulation
Harshit et al. Geometric Features Interpretation of Photogrammetric Point Cloud from Unmanned Aerial Vehicle
Liu et al. Segmentation and reconstruction of buildings with aerial oblique photography point clouds

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant