CN111783838A - Point cloud characteristic space representation method for laser SLAM - Google Patents
- Publication number
- CN111783838A (application CN202010504793.6A)
- Authority
- CN
- China
- Prior art keywords
- point cloud
- network
- global description
- point
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
The invention discloses a point cloud feature space characterization method for laser SLAM, which establishes a bidirectional mapping between the point cloud space and a point cloud feature space based on a deep learning network. The feature space addresses key problems in laser SLAM such as closed-loop detection and relocation, as well as compressed storage and transmission of maps. Global description feature extraction and compressed reconstruction of large-scene point clouds are realized through a neural network with an auto-encoder structure: the global description features extracted by the designed encoder network form a feature space of the point cloud, the distance in this feature space gives a similarity measure of scenes and is used to judge whether two or more scene structures are similar, enabling closed-loop detection and relocation for laser SLAM; the designed decoder network reconstructs the original point cloud from the global description features extracted by the encoder network, enabling compressed storage and low-bandwidth transmission of point cloud maps. The constructed encoder network does not need to be trained in advance on the point cloud map and has strong generalization capability.
Description
Technical Field
The invention relates to the technical field of laser point cloud mapping and autonomous navigation, in particular to a point cloud feature space characterization method for laser SLAM.
Background
Enabling a mobile robot to better understand and perceive its surroundings and to achieve flexible and reliable high-level autonomous navigation is a central problem. Artificial intelligence techniques driven by deep learning have brought rapid progress to environment perception and autonomous navigation of mobile robots; in particular, lidar allows a robot to perceive the three-dimensional environment directly, but it also brings challenges for point cloud data processing. Simultaneous Localization And Mapping (SLAM) is one of the basic and key technologies for autonomous navigation and positioning of mobile robots; its goal is to build a local map and determine the robot's position in that map when the robot enters an unknown environment. SLAM mainly consists of four parts: front-end odometry, back-end pose graph optimization, closed-loop detection and map construction. Closed-loop detection is one of the key components: it answers whether the mobile robot has returned to a previously visited place, and accurately detecting closed-loop scenes is essential for correcting long-term accumulated errors of the SLAM system and for building globally consistent pose estimates and maps.
Making a robot recognize its environment is therefore an important problem; closed-loop detection is essentially a scene recognition problem, in which the environment is recognized and the robot relocated from sensor data such as cameras and lidar. Scene recognition has long been a hot and difficult topic in pattern recognition. A camera must overcome the adverse effects of illumination, viewing angle and other factors to extract useful information for scene recognition. Compared with a camera, lidar is robust to illumination, seasons and weather, but laser point cloud scenes contain a wide variety of objects, including many dynamic elements, and even the same scene can differ considerably at different times under various external influences, so the scene recognition problem remains very complex. The key to scene recognition lies in finding a global description feature that can fully represent the structural and semantic information of a scene, in particular a feature characterization method for large-scale laser point clouds.
Traditional point-cloud-based environment recognition algorithms, such as the bag-of-words model (BoW), Fisher vectors (FV) and VLAD, usually rely on global, offline, high-resolution maps and require a codebook trained in advance on the map to achieve high-precision localization. Among deep learning approaches, SegMatch proposes a scene recognition method based on 3D segment matching: local matching is performed on segmented scene blocks, and geometric consistency checks ensure the reliability of local matching and scene recognition; however, in dynamic scene applications the static-scene assumption cannot be satisfied. PointNetVLAD combines PointNet and NetVLAD, using PointNet for feature learning and NetVLAD for feature aggregation, and can learn global description features. PCAN, also based on point cloud deep learning, builds on PointNetVLAD by considering the contributions of different local features, introducing a self-attention mechanism so that the network learns the contribution weights of different local features during aggregation. However, these methods do not consider local geometric features and semantic context features, point cloud neighborhood relations, or the feature space distribution, and they perform poorly when encoding features for large-scene point cloud scene recognition.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a point cloud feature space characterization method for laser SLAM that can extract, from complex large-scene laser point clouds, global description features representing scene structural and semantic information, and can restore and reconstruct the original point cloud from these features, for closed-loop detection, relocation, and compressed storage and transmission in laser SLAM.
In order to solve the technical problem, the invention provides a point cloud feature space characterization method for a laser SLAM, which comprises the following steps:
s1, filtering and downsampling the point cloud samples in the point cloud data set, and constructing training point cloud sample pairs according to the similarity of the point cloud scenes, wherein the training point cloud sample pairs comprise a positive sample p_pos and a negative sample p_neg;
s2, constructing an encoder network, inputting the training point cloud sample pairs from step s1, and obtaining the global description feature f(P) of the input point cloud P through a point cloud adaptive neighborhood feature extraction module, a dynamic graph network aggregation module and a global description feature generation module;
s3, constructing a decoder network, inputting the global description features generated in step s2, and generating a keypoint contour and then a refined, densified reconstructed point cloud through a hierarchical generation strategy;
s4, taking sample pairs of the training data set in s1, and training the encoder network of s2 to obtain an encoder network model and parameters capable of measuring point cloud scene similarity;
s5, taking sample pairs of the training data set in s1 and corresponding global description features generated by an s2 encoder network, and training the decoder network of s3 to obtain a decoder network model and parameters capable of restoring and reconstructing original point clouds from the global description features;
s6, in the actual SLAM operation, taking the encoder network model and parameters in s4, extracting global description characteristics of the point cloud key frame, carrying out NN matching on the point cloud key frame and the global description characteristics of the historical key frame according to Euclidean distance, and taking the point cloud key frame which meets the matching conditions as a closed-loop candidate frame or a repositioning candidate position;
s7, in the actual SLAM operation, storing or transmitting the global description features of the point cloud key frames extracted in s6, and restoring the global description features to reconstruct the original SLAM point cloud map through the decoder network model and parameters in s5;
s8, in the actual SLAM operation, the global description features extracted in s6 are taken, NN matching is conducted on the global description features and the global description features of the historical key frames of the multiple collaborative mapping robots according to Euclidean distances, and the multi-robot laser SLAM collaborative mapping task is conducted in the point cloud feature space through the point cloud feature space characterization method.
Preferably, in step s1, the similarity of the training sample pairs is determined with the aid of the position coordinates of the samples in the map: samples within 10 m of each other are regarded as structurally similar positive samples, distances between 10 m and 50 m define the negative sample selection range from which negative samples are drawn randomly, and randomly selected samples farther than 50 m are regarded as extremely dissimilar extreme negative samples.
Preferably, in step s2, the point cloud adaptive neighborhood feature extraction module of the encoder network constructs, for each point p_i in the point cloud and each neighborhood size k in the range [k_min, k_max], the kNN neighborhood N_i^k; using the eigenvalues λ1 ≥ λ2 ≥ λ3 ≥ 0 of the covariance matrix Σ of the neighborhood point cloud, the linearity L_λ, planarity P_λ and scattering S_λ of the point cloud neighborhood are computed, the optimal neighborhood size k* is determined adaptively through the Shannon information entropy measure E_k, and the neighborhood features are extracted as the features of the points through a multilayer perceptron feature extraction network;
wherein: L_λ = (λ1 − λ2)/λ1, P_λ = (λ2 − λ3)/λ1, S_λ = λ3/λ1, E_k = −L_λ·ln L_λ − P_λ·ln P_λ − S_λ·ln S_λ, and k* = argmin_k E_k.
Preferably, in step s2, the dynamic graph network aggregation module consists of two aggregation stages:
(a) feature space dynamic graph network aggregation: a kNN with fixed k is performed on all points in the feature space of the extracted point features to construct a dynamic graph, feature aggregation is performed through a graph network, and the features of the points are updated; the feature aggregation of the graph network is f_i' = ρ(φ_Θ(f_i, f_j)) = ρ(φ_Θ(f_i ‖ (f_j − f_i))), where φ_Θ is a multilayer perceptron MLP and ρ is max pooling;
(b) physical space dynamic graph network aggregation: a kNN with fixed k is performed on all points in the physical space of the point cloud to construct a dynamic graph, and the features updated in (a) are aggregated again through the graph network to update the features of the points; the dynamic graph G(V, E) built around a center point p_i consists of nodes V and edges E, where point p_i has feature f_i, the nodes V are the features f_j of the neighborhood points, and the edges E are the relations (f_i, f_j) between the features of the center point and its neighborhood points, wherein (f_i, f_j) = f_i ‖ (f_j − f_i) and ‖ denotes channel-wise feature concatenation.
Preferably, in step s2, the global description feature generation module generates the global description feature through the NetVLAD Layer.
Preferably, in step s3, the hierarchical generation strategy includes two parts:
(a) a keypoint contour generation network: a coarse keypoint contour is generated from the global description feature through a three-layer fully-connected network and a Reshape;
(b) a refined dense point cloud generation network: for each of the keypoints generated in step (a), the η×η points of a 2D planar grid are attached, concatenated channel-wise with the global description feature generated in step s2, and a dense reconstructed point cloud is generated through an MLP;
wherein:
η is the number of points on each edge of the planar grid and ξ is the size of the planar grid.
Preferably, in step s4, the encoder network is trained with a metric learning method, using a quadruplet loss function L_quad for semi-supervised training so that the Euclidean distance d_pos between the global description features of the input sample and the positive sample p_pos becomes smaller, while the Euclidean distance d_neg to the negative sample p_neg becomes larger; the extreme negative sample p_neg* is used to prevent the global description feature distances between negative pairs from being erroneously pulled together, and d_neg* is the Euclidean distance between the extreme negative sample and the negatives; wherein α is a hyperparameter.
Preferably, in step s5, the decoder network is trained in a supervised manner, using a loss function L_dec that constrains the Euclidean distances between the reconstructed point cloud and the ground-truth point cloud, wherein one term is applied to the keypoint contour generated in step s3 (a) for constraining the physical spatial distribution of the point cloud, and the other term is applied to the dense reconstructed point cloud.
Preferably, in steps s6 and s8, the NN matching of global description features of point cloud key frames is determined by sequence matching between the N most recent current point cloud frames and N consecutive historical point cloud frames; if the Euclidean distance between the global description features of the matched sequence key frames is smaller than a set threshold and the gradient change relative to the feature distances of other historical key frames is larger than a set threshold, the frames are regarded as closed-loop detection candidate sequence key frames or relocation candidate positions; if multiple robots provide the current continuous point cloud frames and the historical key frames, collaborative mapping is performed for the multiple robots; and the laser SLAM closed-loop task, relocation task and multi-robot collaborative task are carried out in the point cloud feature space by the point cloud feature space characterization method.
Preferably, in step s7, the encoder network of s4 is used to extract global description features from the key frames in the laser point cloud map as compressed codes for storage or communication transmission, and the decoder network of s5 is then used to restore the stored or received features and reconstruct the original laser point cloud map.
The invention has the following beneficial effects: (1) an encoder network is designed to extract the geometric structural features and semantic context features of large-scene laser point cloud data, fully mining the scene information; the generated global description features form a feature space of the point cloud, the distance in this feature space gives a similarity measure between scenes for judging whether two or more scene structures are similar, and closed-loop detection and relocation for laser SLAM can be realized in the point cloud feature space; (2) the designed decoder network reconstructs the original point cloud from the global description features extracted by the encoder network, realizing compressed storage and low-bandwidth transmission of point cloud maps; (3) the constructed encoder network does not need to be trained in advance on the point cloud map and has strong generalization capability.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of a network model of an encoder according to the present invention.
Fig. 3 is a schematic diagram of a decoder network model structure according to the present invention.
Detailed Description
As shown in fig. 1, a method for characterizing a point cloud feature space of a laser SLAM includes the following steps:
step 1: preprocessing point cloud samples in the point cloud data set such as filtering and downsampling, and constructing training point cloud sample pairs according to similarity of point cloud scenes, wherein the training point cloud sample pairs comprise positive samples pposAnd negative sample pneg。
The similarity of the constructed training sample pairs is judged with the aid of the position coordinates of the samples in the map: samples within 10 m of each other are regarded as structurally similar positive samples, distances between 10 m and 50 m define the negative sample selection range from which negative samples are drawn randomly, and randomly selected samples farther than 50 m are regarded as extremely dissimilar extreme negative samples.
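As an illustration of this pairing rule, the following sketch mines quadruplets from keyframe map coordinates using the 10 m / 50 m thresholds above; the function name, the array layout and the number of positives/negatives drawn per anchor are assumptions of the example, not part of the invention.

```python
import numpy as np

def build_sample_pairs(positions, rng=None, n_pos=2, n_neg=18):
    """Mine positive / negative / extreme-negative indices for each anchor.

    positions : (N, 2) array of keyframe map coordinates (metres).
    Thresholds: <10 m -> positive, 10-50 m -> negative, >50 m -> extreme negative.
    Returns a list of dicts {anchor, positives, negatives, extreme_negative}.
    """
    rng = rng or np.random.default_rng(0)
    pairs = []
    for i, p in enumerate(positions):
        d = np.linalg.norm(positions - p, axis=1)
        pos = np.where((d > 0) & (d < 10.0))[0]
        neg = np.where((d >= 10.0) & (d <= 50.0))[0]
        ext = np.where(d > 50.0)[0]
        if len(pos) == 0 or len(neg) == 0 or len(ext) == 0:
            continue  # skip anchors that cannot form a full quadruplet
        pairs.append({
            "anchor": i,
            "positives": rng.choice(pos, size=min(n_pos, len(pos)), replace=False),
            "negatives": rng.choice(neg, size=min(n_neg, len(neg)), replace=False),
            "extreme_negative": int(rng.choice(ext)),
        })
    return pairs
```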
Specifically: the Oxford RobotCar dataset is chosen, the point cloud range is restricted to [x: ±30 m, y: ±20 m, z: 0–10 m] using a pass-through filter, and the point cloud is downsampled to 4096 points with a random voxel filter and normalized to [-1, 1].
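A minimal preprocessing sketch along these lines is shown below, using plain NumPy instead of a specific point cloud library; the crop bounds follow the text, while the random-sampling and per-axis normalization conventions are assumptions.

```python
import numpy as np

def preprocess_scan(points, n_out=4096, rng=None):
    """Crop, randomly downsample and normalise one LiDAR scan.

    points : (N, 3) array in metres. Returns (n_out, 3) scaled to roughly [-1, 1].
    """
    rng = rng or np.random.default_rng(0)
    # Pass-through crop to x in [-30, 30], y in [-20, 20], z in [0, 10]
    m = ((np.abs(points[:, 0]) <= 30.0) &
         (np.abs(points[:, 1]) <= 20.0) &
         (points[:, 2] >= 0.0) & (points[:, 2] <= 10.0))
    pts = points[m]
    # Random downsampling to a fixed size (with replacement if the scan is sparse)
    idx = rng.choice(len(pts), size=n_out, replace=len(pts) < n_out)
    pts = pts[idx]
    # Per-axis normalisation to [-1, 1] using the crop bounds
    pts = pts / np.array([30.0, 20.0, 10.0])
    pts[:, 2] = pts[:, 2] * 2.0 - 1.0   # z in [0, 10] maps to [-1, 1]
    return pts
```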
Step 2: construct the encoder network with the network structure shown in fig. 2, input the training point cloud sample pairs from step 1, and obtain the global description feature f(P) of the input point cloud P through the point cloud adaptive neighborhood feature extraction module, the dynamic graph network aggregation module and the global description feature generation module.
The point cloud adaptive neighborhood feature extraction module of the encoder network constructs, for each point p_i and each neighborhood size k in the range [k_min, k_max], the kNN neighborhood N_i^k. Using the eigenvalues λ1 ≥ λ2 ≥ λ3 ≥ 0 of the covariance matrix Σ of the neighborhood point cloud, the linearity L_λ, planarity P_λ and scattering S_λ of the neighborhood are computed; the optimal neighborhood size k* is determined adaptively through the Shannon information entropy measure E_k, and the neighborhood features are extracted as the point features through an MLP (multilayer perceptron) feature extraction network.
Wherein:
L_λ = (λ1 − λ2)/λ1, P_λ = (λ2 − λ3)/λ1, S_λ = λ3/λ1,
E_k = −L_λ·ln L_λ − P_λ·ln P_λ − S_λ·ln S_λ, k* = argmin_k E_k.
Specifically: the point cloud adaptive neighborhood range is set to [20, 100] with a step size of 10, and the optimal neighborhood size is computed and determined within this range.
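The adaptive-neighborhood computation can be sketched as follows; the k-d tree query and the standard eigenvalue-based definitions of linearity, planarity and scattering (normalized by λ1) are assumptions used for illustration, since the text names only the features and the entropy criterion.

```python
import numpy as np
from scipy.spatial import cKDTree

def optimal_neighborhood_features(points, k_candidates=range(20, 101, 10)):
    """Per-point dimensionality features at the entropy-optimal neighbourhood size.

    For each point and candidate k, eigenvalues of the neighbourhood covariance
    give linearity L, planarity P, scattering S; the k minimising the Shannon
    entropy E_k = -L ln L - P ln P - S ln S is kept.
    Returns (N, 3) features [L, P, S] and the (N,) chosen k values.
    """
    tree = cKDTree(points)
    n = len(points)
    best_feat = np.zeros((n, 3))
    best_k = np.zeros(n, dtype=int)
    best_e = np.full(n, np.inf)
    for k in k_candidates:
        _, idx = tree.query(points, k=k)          # (N, k) neighbour indices
        nbrs = points[idx]                        # (N, k, 3)
        centred = nbrs - nbrs.mean(axis=1, keepdims=True)
        cov = np.einsum('nki,nkj->nij', centred, centred) / k
        lam = np.linalg.eigvalsh(cov)[:, ::-1]    # lambda1 >= lambda2 >= lambda3
        lam = np.clip(lam, 1e-12, None)
        L = (lam[:, 0] - lam[:, 1]) / lam[:, 0]
        P = (lam[:, 1] - lam[:, 2]) / lam[:, 0]
        S = lam[:, 2] / lam[:, 0]
        feats = np.stack([L, P, S], axis=1)
        p = np.clip(feats, 1e-12, None)
        E = -(p * np.log(p)).sum(axis=1)          # Shannon eigen-entropy E_k
        better = E < best_e
        best_e[better] = E[better]
        best_feat[better] = feats[better]
        best_k[better] = k
    return best_feat, best_k
```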
The dynamic graph network aggregation module consists of two aggregation stages:
(a) feature space dynamic graph network aggregation: a kNN with fixed k is performed on all points in the feature space of the extracted point features to construct a dynamic graph, feature aggregation is performed through a graph network, and the point features are updated;
(b) physical space dynamic graph network aggregation: a kNN with fixed k is performed on all points in the physical space of the point cloud to construct a dynamic graph, and the features updated in (a) are aggregated again through the graph network to update the point features. The dynamic graph G(V, E) built around a center point p_i consists of nodes V and edges E: point p_i has feature f_i, the nodes V are the features f_j of its neighborhood points, and the edges E are the relations (f_i, f_j) between the features of the center point and its neighborhood points.
Wherein: (f_i, f_j) = f_i ‖ (f_j − f_i), and ‖ denotes channel-wise feature concatenation.
The feature aggregation of the graph network is:
f_i' = ρ(φ_Θ(f_i, f_j)) = ρ(φ_Θ(f_i ‖ (f_j − f_i))), where φ_Θ is an MLP (multilayer perceptron) and ρ is max pooling.
Specifically: the dynamic graphs are constructed with kNN using a fixed k of 20, and φ_Θ is a two-layer MLP (64, 64).
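A PyTorch sketch of one such aggregation stage, in the style of an EdgeConv layer, is given below; the module name, tensor layout and batch-norm choice are assumptions, while k = 20 and the two-layer (64, 64) MLP follow the settings quoted above.

```python
import torch
import torch.nn as nn

class DynamicGraphAgg(nn.Module):
    """One dynamic-graph aggregation step: f_i' = max_j MLP(f_i || f_j - f_i).

    The kNN graph is rebuilt on every forward pass, either in feature space or
    in the physical (xyz) space, following the two-stage scheme above.
    """
    def __init__(self, in_dim, hidden=(64, 64), k=20):
        super().__init__()
        self.k = k
        layers, d = [], 2 * in_dim
        for h in hidden:
            layers += [nn.Conv2d(d, h, 1), nn.BatchNorm2d(h), nn.ReLU()]
            d = h
        self.mlp = nn.Sequential(*layers)          # phi_Theta, a shared MLP

    def forward(self, feats, graph_space=None):
        # feats: (B, N, C); graph_space: coordinates used to build the kNN graph
        # (defaults to the features themselves, i.e. a feature-space graph)
        space = feats if graph_space is None else graph_space
        idx = torch.cdist(space, space).topk(self.k, largest=False).indices   # (B, N, k)
        nbrs = torch.gather(
            feats.unsqueeze(1).expand(-1, feats.size(1), -1, -1), 2,
            idx.unsqueeze(-1).expand(-1, -1, -1, feats.size(-1)))             # (B, N, k, C)
        edge = torch.cat([feats.unsqueeze(2).expand_as(nbrs),
                          nbrs - feats.unsqueeze(2)], dim=-1)                 # f_i || f_j - f_i
        out = self.mlp(edge.permute(0, 3, 1, 2))      # (B, H, N, k)
        return out.max(dim=-1).values.permute(0, 2, 1)  # rho = max pooling -> (B, N, H)
```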
The global description feature generation module generates the global description feature through a NetVLAD layer.
Specifically: the NetVLAD layer uses a cluster center feature dimension D of 256 and K = 64 cluster centers.
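For illustration, a minimal NetVLAD-style aggregation layer might look like the following sketch; the initialization and the absence of any dimensionality-reduction layer after flattening are assumptions, not specified by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLADLayer(nn.Module):
    """Minimal NetVLAD aggregation: soft-assign local features to K centres,
    accumulate residuals, then intra- and L2-normalise the flattened vector."""
    def __init__(self, dim=256, num_clusters=64):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim) * 0.01)
        self.assign = nn.Linear(dim, num_clusters)     # soft-assignment logits

    def forward(self, x):                  # x: (B, N, D) local point features
        a = F.softmax(self.assign(x), dim=-1)          # (B, N, K)
        # residual of every feature to every centroid: (B, N, K, D)
        r = x.unsqueeze(2) - self.centroids.unsqueeze(0).unsqueeze(0)
        vlad = (a.unsqueeze(-1) * r).sum(dim=1)        # (B, K, D)
        vlad = F.normalize(vlad, p=2, dim=-1)          # intra-normalisation
        vlad = F.normalize(vlad.flatten(1), p=2, dim=-1)   # (B, K*D) descriptor
        return vlad
```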
Step 3: construct the decoder network with the network structure shown in fig. 3, input the global description features generated in step 2, and through the hierarchical generation strategy generate a keypoint contour and then a refined, densified reconstructed point cloud.
The hierarchical generation strategy comprises two parts:
(a) a keypoint contour generation network: a coarse keypoint contour is generated from the global description feature through a three-layer fully-connected network and a Reshape;
(b) a refined dense point cloud generation network: for each of the keypoints generated in step (a), the η×η points of a 2D planar grid are attached, concatenated channel-wise with the global description feature generated in step 2, and a dense reconstructed point cloud is generated through an MLP (multilayer perceptron), where η is the number of points on each edge of the planar grid and ξ is the size of the planar grid.
Specifically: the keypoint contour generation network generates a coarse contour containing 4096 keypoints, and a 2 × 2 point planar grid with size ξ = 0.05 is attached to each keypoint.
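The hierarchical decoder could be sketched as below, assuming a folding-style refinement in which each coarse keypoint carries an η×η planar patch; the hidden layer widths and the offset-based output are assumptions, while n_key = 4096, η = 2 and ξ = 0.05 follow the settings above.

```python
import torch
import torch.nn as nn

class HierarchicalDecoder(nn.Module):
    """Coarse-to-fine decoder sketch: FC layers reshape the global descriptor
    into a coarse keypoint contour; each keypoint is then expanded with an
    eta x eta 2D grid patch, concatenated with the global feature, and refined
    by a shared MLP into the dense reconstruction."""
    def __init__(self, feat_dim, n_key=4096, eta=2, xi=0.05):
        super().__init__()
        self.n_key, self.eta = n_key, eta
        self.coarse = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, n_key * 3))                  # reshaped to keypoints
        # fixed eta x eta planar grid of size xi attached to every keypoint
        g = torch.linspace(-xi / 2, xi / 2, eta)
        grid = torch.stack(torch.meshgrid(g, g, indexing='ij'), dim=-1).reshape(-1, 2)
        self.register_buffer('grid', grid)               # (eta*eta, 2)
        self.refine = nn.Sequential(
            nn.Linear(3 + 2 + feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 3))                           # per-point refinement offset

    def forward(self, f):                                # f: (B, feat_dim)
        B = f.size(0)
        key = self.coarse(f).view(B, self.n_key, 3)      # coarse keypoint contour
        e2 = self.eta ** 2
        key_rep = key.repeat_interleave(e2, dim=1)       # (B, n_key*eta^2, 3)
        grid_rep = self.grid.repeat(self.n_key, 1).unsqueeze(0).expand(B, -1, -1)
        f_rep = f.unsqueeze(1).expand(-1, self.n_key * e2, -1)
        dense = key_rep + self.refine(torch.cat([key_rep, grid_rep, f_rep], dim=-1))
        return key, dense                                # coarse + dense point clouds
```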
Step 4: take sample pairs of the training data set in step 1 and train the encoder network of step 2 to obtain an encoder network model and parameters capable of measuring point cloud scene similarity.
The encoder network is trained with a metric learning method, using a quadruplet loss function L_quad for semi-supervised training so that the Euclidean distance d_pos between the global description features of the input sample and the positive sample p_pos becomes smaller, while the Euclidean distance d_neg to the negative sample p_neg becomes larger; the extreme negative sample p_neg* is used to prevent the global description feature distances between negative pairs from being erroneously pulled together, and d_neg* is the Euclidean distance between the extreme negative sample and the negatives; α and β are hyperparameters of the loss.
Specifically: the hyperparameters are set to α = 0.5 and β = 0.2, and during training 2 positive samples, 18 negative samples and 1 randomly selected extreme negative sample are used.
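The exact quadruplet loss is not reproduced in the text, so the sketch below follows the lazy-quadruplet formulation commonly used for point cloud place recognition, with α = 0.5 and β = 0.2 from the settings above; treat it as one plausible instantiation rather than the claimed formula.

```python
import torch

def quadruplet_loss(d_pos, d_neg, d_neg_star, alpha=0.5, beta=0.2):
    """Hinge-style quadruplet loss sketch on descriptor distances.

    d_pos      : (B,)   distance anchor <-> positive
    d_neg      : (B, M) distances anchor <-> negatives
    d_neg_star : (B, M) distances extreme negative <-> negatives
    The second term keeps negatives away from the extra (extreme) negative,
    preventing negative pairs from collapsing onto each other.
    """
    first = torch.clamp(alpha + d_pos.unsqueeze(1) - d_neg, min=0).max(dim=1).values
    second = torch.clamp(beta + d_pos.unsqueeze(1) - d_neg_star, min=0).max(dim=1).values
    return (first + second).mean()
```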
Step 5: take the sample pairs of the training data set in step 1 and the corresponding global description features generated by the encoder network in step 2, and train the decoder network of step 3 to obtain a decoder network model and parameters that can restore and reconstruct the original point cloud from the global description features.
In step 5, the decoder network is trained in a supervised manner, using a loss function L_dec that constrains the Euclidean distances between the reconstructed point cloud and the ground-truth point cloud: one term is applied to the generated keypoint contour to constrain the physical spatial distribution of the point cloud, and the other is applied to the dense reconstructed point cloud.
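The text only states that Euclidean distances between the reconstruction and the ground truth are constrained; a Chamfer distance on both the coarse keypoint contour and the dense cloud is one standard way to realize this, sketched below with assumed weighting factors.

```python
import torch

def chamfer_distance(x, y):
    """Symmetric Chamfer distance between point sets x: (B, N, 3) and y: (B, M, 3)."""
    d = torch.cdist(x, y)                       # (B, N, M) pairwise Euclidean distances
    return d.min(dim=2).values.mean(dim=1) + d.min(dim=1).values.mean(dim=1)

def decoder_loss(key, dense, gt, w_key=1.0, w_dense=1.0):
    """Supervised reconstruction loss sketch: one Chamfer term on the coarse
    keypoint contour (constrains the spatial layout) and one on the dense cloud."""
    return (w_key * chamfer_distance(key, gt) +
            w_dense * chamfer_distance(dense, gt)).mean()
```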
Step 6: in the actual SLAM operation, the encoder network model and parameters in step 4 are taken, global description features are extracted from the point cloud key frames, NN matching is carried out on the point cloud key frames and the global description features of the historical key frames according to Euclidean distances, and the point cloud key frames which meet matching conditions are used as closed-loop candidate frames or relocation candidate positions.
In step 6, the NN matching of global description features of point cloud key frames is determined by sequence matching between the N most recent current point cloud frames and N consecutive historical point cloud frames; if the Euclidean distance between the global description features of the matched sequence key frames is smaller than a set threshold and the gradient change relative to the feature distances of other historical key frames is larger than a set threshold, the frames are regarded as closed-loop detection candidate sequence key frames or relocation candidate positions. In this way the laser SLAM closed-loop and relocation tasks are carried out in the point cloud feature space by the point cloud feature space characterization method.
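A sequence-matching sketch of this candidate selection is shown below; the window length and the two thresholds are illustrative placeholders, since their values are left unspecified in the text.

```python
import numpy as np

def detect_loop(curr_feats, hist_feats, n_seq=5, dist_thresh=0.3, grad_thresh=0.1):
    """Sequence-matching sketch for closed-loop / relocalisation candidates.

    curr_feats : (n_seq, D) descriptors of the last n_seq keyframes.
    hist_feats : (H, D) descriptors of all historical keyframes.
    Returns the index of the best-matching historical start frame, or None.
    """
    H = len(hist_feats)
    if H < n_seq:
        return None
    # average descriptor distance of every aligned historical sub-sequence
    scores = np.array([
        np.linalg.norm(curr_feats - hist_feats[s:s + n_seq], axis=1).mean()
        for s in range(H - n_seq + 1)])
    best = int(scores.argmin())
    others = np.delete(scores, best)
    if len(others) == 0:
        return best if scores[best] < dist_thresh else None
    # accept only if the match is both close and clearly better than the rest
    if scores[best] < dist_thresh and (others.min() - scores[best]) > grad_thresh:
        return best
    return None
```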
Step 7: in actual SLAM operation, the global description features of the point cloud key frames extracted in step 6 are stored or transmitted, and the original SLAM point cloud map can be restored and reconstructed from them through the decoder network model and parameters of step 5.
The encoder network of step 4 is used to extract global description features from the key frames in the laser point cloud map as compressed codes for storage or communication transmission, and the decoder network of step 5 then restores the stored or received features to reconstruct the original laser point cloud map.
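The storage/transmission round trip might look like the following sketch, where `encoder` and `decoder` stand for the trained models described above (the decoder is assumed to return coarse and dense clouds as in the earlier sketch) and the file path is a placeholder.

```python
import torch

@torch.no_grad()
def compress_and_restore(encoder, decoder, keyframes, path="map_features.pt"):
    """Map compression sketch: store only the global descriptors of keyframes,
    then rebuild an approximate point cloud map with the trained decoder."""
    feats = torch.stack([encoder(kf.unsqueeze(0)).squeeze(0) for kf in keyframes])
    torch.save(feats, path)                    # compressed map: (K, feat_dim) tensor
    feats = torch.load(path)                   # ...later, or on the receiving robot
    _, clouds = zip(*[decoder(f.unsqueeze(0)) for f in feats])
    return [c.squeeze(0) for c in clouds]      # reconstructed keyframe point clouds
```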
Step 8: in actual SLAM operation, the global description features extracted in step 6 are taken and NN-matched, according to Euclidean distance, against the global description features of the historical key frames of the multiple collaborative mapping robots, and the multi-robot laser SLAM collaborative mapping task is carried out in the point cloud feature space through the point cloud feature space characterization method.
In step 8, if multiple robots provide current continuous point cloud frames and historical key frames, multi-robot collaborative mapping is performed: NN matching over the global description features of all historical point cloud key frames realizes multi-robot collaborative map building, and the laser SLAM multi-robot collaborative task is carried out in the point cloud feature space by the point cloud feature space characterization method.
The invention has the following beneficial effects: (1) an encoder network is designed to extract the geometric structural features and semantic context features of large-scene laser point cloud data; scene recognition is realized by the point cloud feature space characterization method in a feature retrieval manner, and the average recall in the Oxford RobotCar scene recognition task reaches 93.35%; (2) the designed decoder network reconstructs the original point cloud from the global description features extracted by the encoder network, realizing compressed storage and low-bandwidth transmission of point cloud maps; (3) the constructed encoder network does not need to be trained in advance on the point cloud map, has strong generalization capability, and can be applied directly, without retraining, to other datasets such as KITTI.
Claims (10)
1. A point cloud feature space characterization method for laser SLAM is characterized by comprising the following steps:
s1, filtering and downsampling the point cloud samples in the point cloud data set, and constructing training point cloud sample pairs according to the similarity of the point cloud scenes, wherein the training point cloud sample pairs comprise a positive sample p_pos and a negative sample p_neg;
s2, constructing an encoder network, inputting the training point cloud sample pair in the step s1, and obtaining global description characteristics f (P) of the input point cloud P through a point cloud self-adaptive neighborhood characteristic extraction module, a dynamic graph network aggregation module and a global description characteristic generation module;
s3, constructing a decoder network, inputting the global description features generated in step s2, and generating a keypoint contour and then a refined, densified reconstructed point cloud through a hierarchical generation strategy;
s4, taking sample pairs of the training data set in s1, and training the encoder network of s2 to obtain an encoder network model and parameters capable of measuring point cloud scene similarity;
s5, taking sample pairs of the training data set in s1 and corresponding global description features generated by an s2 encoder network, and training the decoder network of s3 to obtain a decoder network model and parameters capable of restoring and reconstructing original point clouds from the global description features;
s6, in the actual SLAM operation, taking the encoder network model and parameters in s4, extracting global description characteristics of the point cloud key frame, carrying out NN matching on the point cloud key frame and the global description characteristics of the historical key frame according to Euclidean distance, and taking the point cloud key frame which meets the matching conditions as a closed-loop candidate frame or a repositioning candidate position;
s7, in the actual SLAM operation, storing or transmitting the global description features of the point cloud key frames extracted in s6, and restoring the global description features to reconstruct the original SLAM point cloud map through the decoder network model and parameters in s5;
s8, in the actual SLAM operation, the global description features extracted in s6 are taken, NN matching is conducted on the global description features and the global description features of the historical key frames of the multiple collaborative mapping robots according to Euclidean distances, and the multi-robot laser SLAM collaborative mapping task is conducted in the point cloud feature space through the point cloud feature space characterization method.
2. The method as claimed in claim 1, wherein in step s1, the similarity of the training sample pairs is determined with the aid of the position coordinates of the samples in the map: samples within 10 m of each other are regarded as structurally similar positive samples, distances between 10 m and 50 m define the negative sample selection range from which negative samples are drawn randomly, and randomly selected samples farther than 50 m are regarded as extremely dissimilar extreme negative samples.
3. The method of claim 1, wherein in step s2, the point cloud adaptive neighborhood feature extraction module of the encoder network constructs, for each point p_i in the point cloud and each neighborhood size k in the range [k_min, k_max], the kNN neighborhood N_i^k; using the eigenvalues λ1 ≥ λ2 ≥ λ3 ≥ 0 of the covariance matrix Σ of the neighborhood point cloud, the linearity L_λ, planarity P_λ and scattering S_λ of the point cloud neighborhood are computed, the optimal neighborhood size k* is determined adaptively through the Shannon information entropy measure E_k, and the neighborhood features are extracted as the features of the points through a multilayer perceptron feature extraction network;
wherein: L_λ = (λ1 − λ2)/λ1, P_λ = (λ2 − λ3)/λ1, S_λ = λ3/λ1, E_k = −L_λ·ln L_λ − P_λ·ln P_λ − S_λ·ln S_λ, and k* = argmin_k E_k.
4. The method of claim 1, wherein in step s2, the dynamic graph network aggregation module consists of two aggregation stages:
(a) feature space dynamic graph network aggregation: a kNN with fixed k is performed on all points in the feature space of the extracted point features to construct a dynamic graph, feature aggregation is performed through a graph network, and the features of the points are updated; the feature aggregation of the graph network is f_i' = ρ(φ_Θ(f_i, f_j)) = ρ(φ_Θ(f_i ‖ (f_j − f_i))), where φ_Θ is a multilayer perceptron MLP and ρ is max pooling;
(b) physical space dynamic graph network aggregation: a kNN with fixed k is performed on all points in the physical space of the point cloud to construct a dynamic graph, and the features updated in (a) are aggregated again through the graph network to update the features of the points; the dynamic graph G(V, E) built around a center point p_i consists of nodes V and edges E, where point p_i has feature f_i, the nodes V are the features f_j of the neighborhood points, and the edges E are the relations (f_i, f_j) between the features of the center point and its neighborhood points, wherein (f_i, f_j) = f_i ‖ (f_j − f_i) and ‖ denotes channel-wise feature concatenation.
5. The method of claim 1, wherein in step s2, the global description feature generation module generates the global description feature through a NetVLAD Layer.
6. The method of claim 1, wherein in step s3, the hierarchical generation strategy comprises two parts:
(a) a keypoint contour generation network: a coarse keypoint contour is generated from the global description feature through a three-layer fully-connected network and a Reshape;
(b) a refined dense point cloud generation network: for each of the keypoints generated in step (a), the η×η points of a 2D planar grid are attached, concatenated channel-wise with the global description feature generated in step s2, and a dense reconstructed point cloud is generated through an MLP;
wherein: η is the number of points on each edge of the planar grid and ξ is the size of the planar grid.
7. The method of claim 1, wherein in step s4, the encoder network is trained with a metric learning method, using a quadruplet loss function L_quad for semi-supervised training so that the Euclidean distance d_pos between the global description features of the input sample and the positive sample p_pos becomes smaller, while the Euclidean distance d_neg to the negative sample p_neg becomes larger; the extreme negative sample p_neg* is used to prevent the global description feature distances between negative pairs from being erroneously pulled together, and d_neg* is the Euclidean distance between the extreme negative sample and the negatives; wherein α is a hyperparameter.
8. The method of claim 1, wherein in step s5, the decoder network is trained in a supervised manner, using a loss function L_dec that constrains the Euclidean distances between the reconstructed point cloud and the ground-truth point cloud, wherein one term is applied to the keypoint contour generated in step s3 (a) for constraining the physical spatial distribution of the point cloud, and the other term is applied to the dense reconstructed point cloud.
9. The method according to claim 1, wherein in steps s6 and s8, the NN matching of global description features of point cloud key frames is determined by sequence matching between the N most recent current point cloud frames and N consecutive historical point cloud frames; if the Euclidean distance between the global description features of the matched sequence key frames is smaller than a set threshold and the gradient change relative to the feature distances of other historical key frames is larger than a set threshold, the frames are regarded as closed-loop detection candidate sequence key frames or relocation candidate positions; if multiple robots provide the current continuous point cloud frames and the historical key frames, collaborative mapping is performed for the multiple robots; and the laser SLAM closed-loop task, relocation task and multi-robot collaborative task are carried out in the point cloud feature space by the point cloud feature space characterization method.
10. The method as claimed in claim 1, wherein in step s7, the encoder network of s4 is used to extract global description features from key frames in the laser point cloud map as compressed codes for storage or communication transmission, and the decoder network of s5 is used to restore the storage features or transmission reception features to reconstruct the original laser point cloud map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010504793.6A CN111783838A (en) | 2020-06-05 | 2020-06-05 | Point cloud characteristic space representation method for laser SLAM |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010504793.6A CN111783838A (en) | 2020-06-05 | 2020-06-05 | Point cloud characteristic space representation method for laser SLAM |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111783838A true CN111783838A (en) | 2020-10-16 |
Family
ID=72754632
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010504793.6A Pending CN111783838A (en) | 2020-06-05 | 2020-06-05 | Point cloud characteristic space representation method for laser SLAM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111783838A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112581515A (en) * | 2020-11-13 | 2021-03-30 | 上海交通大学 | Outdoor scene point cloud registration method based on graph neural network |
CN113139996A (en) * | 2021-05-06 | 2021-07-20 | 南京大学 | Point cloud registration method and system based on three-dimensional point cloud geometric feature learning |
CN113219493A (en) * | 2021-04-26 | 2021-08-06 | 中山大学 | End-to-end point cloud data compression method based on three-dimensional laser radar sensor |
CN113343765A (en) * | 2021-05-11 | 2021-09-03 | 武汉大学 | Scene retrieval method and system based on point cloud rigid registration |
CN113486963A (en) * | 2021-07-12 | 2021-10-08 | 厦门大学 | Density self-adaptive point cloud end-to-end sampling method |
CN113538675A (en) * | 2021-06-30 | 2021-10-22 | 同济人工智能研究院(苏州)有限公司 | Neural network for calculating attention weight for laser point cloud and training method |
CN113674416A (en) * | 2021-08-26 | 2021-11-19 | 中国电子科技集团公司信息科学研究院 | Three-dimensional map construction method and device, electronic equipment and storage medium |
CN113807184A (en) * | 2021-08-17 | 2021-12-17 | 北京百度网讯科技有限公司 | Obstacle detection method and device, electronic equipment and automatic driving vehicle |
CN113810736A (en) * | 2021-08-26 | 2021-12-17 | 北京邮电大学 | AI-driven real-time point cloud video transmission method and system |
CN113808268A (en) * | 2021-09-17 | 2021-12-17 | 浙江大学 | Low-bandwidth globally consistent multi-machine dense graph building method |
CN115578426A (en) * | 2022-10-25 | 2023-01-06 | 哈尔滨工业大学 | Indoor service robot repositioning method based on dense feature matching |
CN116222577A (en) * | 2023-04-27 | 2023-06-06 | 苏州浪潮智能科技有限公司 | Closed loop detection method, training method, system, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102592281A (en) * | 2012-01-16 | 2012-07-18 | 北方工业大学 | Image matching method |
CN110243370A (en) * | 2019-05-16 | 2019-09-17 | 西安理工大学 | A kind of three-dimensional semantic map constructing method of the indoor environment based on deep learning |
- 2020-06-05: application CN202010504793.6A filed in China (CN); published as CN111783838A, status pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102592281A (en) * | 2012-01-16 | 2012-07-18 | 北方工业大学 | Image matching method |
CN110243370A (en) * | 2019-05-16 | 2019-09-17 | 西安理工大学 | A kind of three-dimensional semantic map constructing method of the indoor environment based on deep learning |
Non-Patent Citations (1)
Title |
---|
CHUANZHE SUO et al.: "LPD-AE: Latent Space Representation of Large-Scale 3D Point Cloud", IEEE ACCESS *
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112581515A (en) * | 2020-11-13 | 2021-03-30 | 上海交通大学 | Outdoor scene point cloud registration method based on graph neural network |
CN112581515B (en) * | 2020-11-13 | 2022-12-13 | 上海交通大学 | Outdoor scene point cloud registration method based on graph neural network |
CN113219493A (en) * | 2021-04-26 | 2021-08-06 | 中山大学 | End-to-end point cloud data compression method based on three-dimensional laser radar sensor |
CN113219493B (en) * | 2021-04-26 | 2023-08-25 | 中山大学 | End-to-end cloud data compression method based on three-dimensional laser radar sensor |
CN113139996A (en) * | 2021-05-06 | 2021-07-20 | 南京大学 | Point cloud registration method and system based on three-dimensional point cloud geometric feature learning |
CN113139996B (en) * | 2021-05-06 | 2024-02-06 | 南京大学 | Point cloud registration method and system based on three-dimensional point cloud geometric feature learning |
CN113343765B (en) * | 2021-05-11 | 2022-07-22 | 武汉大学 | Scene retrieval method and system based on point cloud rigid registration |
CN113343765A (en) * | 2021-05-11 | 2021-09-03 | 武汉大学 | Scene retrieval method and system based on point cloud rigid registration |
CN113538675A (en) * | 2021-06-30 | 2021-10-22 | 同济人工智能研究院(苏州)有限公司 | Neural network for calculating attention weight for laser point cloud and training method |
CN113486963B (en) * | 2021-07-12 | 2023-07-07 | 厦门大学 | Point cloud end-to-end sampling method with self-adaptive density |
CN113486963A (en) * | 2021-07-12 | 2021-10-08 | 厦门大学 | Density self-adaptive point cloud end-to-end sampling method |
CN113807184A (en) * | 2021-08-17 | 2021-12-17 | 北京百度网讯科技有限公司 | Obstacle detection method and device, electronic equipment and automatic driving vehicle |
CN113810736A (en) * | 2021-08-26 | 2021-12-17 | 北京邮电大学 | AI-driven real-time point cloud video transmission method and system |
CN113674416A (en) * | 2021-08-26 | 2021-11-19 | 中国电子科技集团公司信息科学研究院 | Three-dimensional map construction method and device, electronic equipment and storage medium |
CN113674416B (en) * | 2021-08-26 | 2024-04-26 | 中国电子科技集团公司信息科学研究院 | Three-dimensional map construction method and device, electronic equipment and storage medium |
CN113808268A (en) * | 2021-09-17 | 2021-12-17 | 浙江大学 | Low-bandwidth globally consistent multi-machine dense graph building method |
CN113808268B (en) * | 2021-09-17 | 2023-09-26 | 浙江大学 | Low-bandwidth globally consistent multi-machine dense mapping method |
CN115578426A (en) * | 2022-10-25 | 2023-01-06 | 哈尔滨工业大学 | Indoor service robot repositioning method based on dense feature matching |
CN115578426B (en) * | 2022-10-25 | 2023-08-18 | 哈尔滨工业大学 | Indoor service robot repositioning method based on dense feature matching |
CN116222577A (en) * | 2023-04-27 | 2023-06-06 | 苏州浪潮智能科技有限公司 | Closed loop detection method, training method, system, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111783838A (en) | Point cloud characteristic space representation method for laser SLAM | |
CN112529015B (en) | Three-dimensional point cloud processing method, device and equipment based on geometric unwrapping | |
CN109377530B (en) | Binocular depth estimation method based on depth neural network | |
CN110688905B (en) | Three-dimensional object detection and tracking method based on key frame | |
CN108537837A (en) | A kind of method and relevant apparatus of depth information determination | |
CN111462210B (en) | Monocular line feature map construction method based on epipolar constraint | |
CN113705631B (en) | 3D point cloud target detection method based on graph convolution | |
CN113191387A (en) | Cultural relic fragment point cloud classification method combining unsupervised learning and data self-enhancement | |
CN111753789A (en) | Robot vision SLAM closed loop detection method based on stack type combined self-encoder | |
CN110348299B (en) | Method for recognizing three-dimensional object | |
CN112560865B (en) | Semantic segmentation method for point cloud under outdoor large scene | |
CN110335299B (en) | Monocular depth estimation system implementation method based on countermeasure network | |
CN111210382A (en) | Image processing method, image processing device, computer equipment and storage medium | |
CN112329662B (en) | Multi-view saliency estimation method based on unsupervised learning | |
CN113313176A (en) | Point cloud analysis method based on dynamic graph convolution neural network | |
Fu et al. | Pt-flownet: Scene flow estimation on point clouds with point transformer | |
CN115330935A (en) | Three-dimensional reconstruction method and system based on deep learning | |
Deng et al. | Neslam: Neural implicit mapping and self-supervised feature tracking with depth completion and denoising | |
CN115830375A (en) | Point cloud classification method and device | |
CN117078753A (en) | Progressive feature distribution sampling 6D pose estimation method and system based on camera | |
CN115578574A (en) | Three-dimensional point cloud completion method based on deep learning and topology perception | |
Feng et al. | Mesh reconstruction from aerial images for outdoor terrain mapping using joint 2d-3d learning | |
CN111860668B (en) | Point cloud identification method for depth convolution network of original 3D point cloud processing | |
CN114155406A (en) | Pose estimation method based on region-level feature fusion | |
KR20240115727A (en) | Apparatus and method for performing visual localization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20201016 |