CN112085066B - Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network - Google Patents

Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network

Info

Publication number
CN112085066B
CN112085066B (granted publication of application CN202010812456.3A)
Authority
CN
China
Prior art keywords: point, voxel, point cloud, dimensional, numbered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010812456.3A
Other languages
Chinese (zh)
Other versions
CN112085066A (en)
Inventor
朱博 (Zhu Bo)
范希明 (Fan Ximing)
高翔 (Gao Xiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202010812456.3A
Publication of CN112085066A
Application granted
Publication of CN112085066B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network, which specifically comprises the following steps. First, scene point cloud data obtained from a visual sensor is voxelized in a manner adapted to rotation and translation transformations. Then, for the point cloud within each voxel, the information of the points near each point is weighted onto that point by a method based on graph neural network spectral convolution, yielding a feature vector for each point. The points within each voxel are numbered one by one according to spatial distance, max pooling is applied to the per-point feature vectors according to the numbering, and the pooled results are concatenated end to end to obtain the feature vector of each voxel. Finally, the voxel feature vectors are input into a fully connected network to obtain the scene class label. The method alleviates, to a certain extent, the high computational complexity of spectral convolution methods, and has a degree of robustness to rotation and translation of the point cloud.

Description

Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network
Technical Field
The invention belongs to the field of three-dimensional indoor scene recognition, and particularly relates to a voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network.
Background
With the rapid development of computer hardware and theory, acquiring and processing three-dimensional data has become increasingly easy. Recognition of three-dimensional point cloud scenes is currently a research hotspot in robotics and computer vision.
For three-dimensional point cloud scene recognition, there are methods based on handcrafted descriptors such as the Signature of Histograms of Orientations (SHOT), but handcrafted descriptors suffer from a narrow range of applicability. There are methods that extract features from a voxelized point cloud with a 3D CNN, but these have high computational complexity. There are also methods that convert the three-dimensional point cloud into two-dimensional images via multi-angle projection and then apply image recognition, but too much three-dimensional geometric information is lost during projection. With the development of neural networks and their excellent performance on visual recognition, using neural networks to recognize three-dimensional scene point clouds has become a research hotspot in three-dimensional point cloud processing.
At present, neural network methods for point clouds fall mainly into two categories. The first is spatial-domain methods, which take the point cloud spatial position information obtained by the sensor directly as input, without transforming the original point cloud data. This line of work offers many design ideas: for example, PointNet uses a T-net to mitigate the interference caused by rotation and translation of the point cloud and uses a small multilayer perceptron as the convolution kernel to extract feature information; PointAtrousNet weights neighborhood information onto the center point 4 times with a multilayer perceptron. Spatial-domain methods extract features with multilayer perceptrons, but multilayer perceptrons are poorly interpretable and their structural design depends on extensive tuning. The second is spectral-domain methods, which transform the point cloud spatial information obtained by the sensor into Fourier space and design the convolution kernel there, such as the local spectral convolution kernel proposed by Michael et al. This form of convolution has a well-defined meaning, but computing the Laplacian matrix over the whole point cloud has high computational complexity.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to provide a voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network that reduces the computational complexity of spectral convolution methods and has a degree of robustness to rotation and translation transformations.
The technical scheme is as follows: the voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network of the invention comprises the following steps:
(1) transforming the three-dimensional spatial coordinates of the point cloud obtained by the visual sensor with the T-net network from PointNet, and voxelizing the T-net-transformed point cloud;
(2) weighting the information of the points adjacent to each point in each voxel onto that point to obtain the feature vector of each point;
(3) numbering the points in each voxel one by one according to their spatial distances; then applying max pooling to the feature vectors of adjacently numbered points, and concatenating the pooled results end to end to obtain the feature vector of each voxel;
(4) inputting the feature vector of each voxel from step (3) into a fully connected network, whose output is the class label of the scene point cloud.
In step (2), the spatial position information of several points adjacent to each point in each voxel is weighted onto that point multiple times by fusing PointAtrousNet with a local spectral convolution kernel.
In step (3), the points within a voxel are numbered one by one in natural number order, specifically comprising the following steps:
(3.1) defining the radius ρ of the node neighborhood;
(3.2) randomly selecting an unnumbered node and assigning it the next number in sequence;
(3.3) selecting the unnumbered node closest to the current highest-numbered node within its neighborhood and assigning it the next number;
(3.4) determining whether all nodes in the neighborhood of the current highest-numbered node are numbered;
(3.5) if not all are numbered, repeating step (3.3); if all are numbered, further checking whether all nodes in the voxel are numbered;
(3.6) if unnumbered nodes remain in step (3.5), repeating step (3.2); if all nodes in the voxel are numbered in step (3.5), the numbering is complete.
In step (3), max pooling of the feature vectors of adjacently numbered points specifically comprises: taking several adjacently numbered points in sequence for max pooling, computing the norm of each point's feature vector within the pooling window, and taking the point with the largest norm as the pooling result.
In step (4), the voxel feature vectors are used as the input of the fully connected network.
Beneficial effects: compared with the prior art, the invention has the following beneficial effects: (1) the method is suited to three-dimensional point cloud data, which, compared with two-dimensional scene images, is more robust to interference from scale, viewing angle, and lighting; (2) the method also has a degree of robustness to rotation and translation of the point cloud; (3) spectral convolution is applied to the point cloud within each voxel, which reduces computational complexity compared with applying spectral convolution to the whole point cloud.
Drawings
FIG. 1 is a flow chart of the modeling of the present invention;
FIG. 2 is a flow chart of the numbering of points within voxels in accordance with the present invention.
Detailed Description
The invention is described in further detail below with reference to specific embodiments and the attached drawings.
As shown in fig. 1, to implement the method for classifying a voxelized three-dimensional point cloud scene based on a graph convolution neural network, a recognition model must first be established and then trained on a large number of three-dimensional scene point clouds; training is complete after a preset number of iterations. To classify a new point cloud scene, the three-dimensional point cloud data of the scene is input into the trained model; the output is a class label vector for the scene, and the index of the maximum value in that vector is the corresponding scene class. This embodiment is implemented with the PCL library and the PyTorch library. The specific steps are as follows:
(1) Transform the three-dimensional spatial coordinates of the point cloud obtained by the visual sensor with the T-net network from PointNet, and voxelize the T-net-transformed point cloud. The T-net is a fully connected network with shared parameters: its input is the three-dimensional spatial coordinates of each point of the whole point cloud, and its output is a three-dimensional vector for each corresponding point that, to a certain extent, preserves the spatial relationships among points as they were before rotation and translation. The T-net processing improves the robustness of the whole algorithm to rotation and translation of the point cloud. In this embodiment, the size of the T-net is changed slightly: a fully connected T-net is built with PyTorch whose hidden layers have a symmetric 1024-256-64-256-1024 neuron structure. Voxelizing the T-net-transformed point cloud specifically comprises: voxelizing the initial point cloud with the PCL point cloud library according to the three-dimensional spatial coordinates of each point, obtaining K voxels, and replacing the information of each point in each voxel, originally its initial three-dimensional spatial coordinates, with its three-dimensional vector after the T-net transformation. The result is the voxel numbers and the three-dimensional vector of each point within the corresponding voxel, as in a "dictionary" data structure:
{ "voxel 1": a point p three-dimensional vector, -; … "voxel k": a point q three-dimensional vector.
(2) Weight the spatial position information of several points adjacent to each point in each voxel onto that point multiple times, by fusing PointAtrousNet with a local spectral convolution kernel, to obtain the feature vector of each point. Specifically: using the T-net-transformed three-dimensional vectors of the points, compute the pairwise distances between points within the voxel, traverse each point to find its 20 nearest points, connect each point with its 20 nearest neighbors to build a graph, and compute the Laplacian matrix. For each point in the voxel, the information of its 20 neighboring points is sampled 4 separate times, at sampling rates 1, 2, 3, and 4, and fed into 4 independent spectral convolution kernels. That is, each convolution kernel takes the information of the target point and 5 neighboring points as input (input vector dimension 3) and produces the weighted information of the target point as output (output vector dimension also 3). Each convolution kernel is built with PyTorch. This embodiment uses 4 independent spectral convolution kernels; each kernel comprises N depths, and each depth contains 6 parameters to be trained, for 24×N parameters in total. The parameters of each depth are trained independently of other depths and are shared across points. The above constitutes one convolution layer, and the same convolution operation is performed 4 times: convolution layers with identical structure are connected end to end, with the output of a kernel in the previous layer serving as the input of the kernel with the same sampling rate in the next layer. After the last convolution layer, the convolution results of all depths of all kernels are concatenated end to end, giving a 12×N-dimensional feature vector for each point. The feature vectors are recorded in PyTorch in the following data structure:
{ "voxel 1": point 1 feature vector,. point p feature vector; … "voxel k": point 1 eigenvector,. point q eigenvector ] }.
(3) Number the points in each voxel one by one in natural number order according to their spatial distances, so that the two points closest in space receive adjacent numbers. Then apply max pooling to the feature vectors of adjacently numbered points: within each voxel, take several adjacently numbered points at a time in numbering order and compute the norm of each point's feature vector within the pooling window,
\| f_i \| = \sqrt{ \sum_{d=1}^{12N} f_{i,d}^{2} },
taking the point with the largest norm as the pooling result. The pooled results are then concatenated end to end as the feature vector of the voxel, whose dimension is 12×N×m, where m is the number of points retained after pooling in each voxel. Because the number of points differs between voxels, the pooling size is adjusted flexibly within each voxel to keep the feature vector size consistent across voxels.
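A short sketch of this numbering-ordered pooling in PyTorch; the function name and the ceil-based per-voxel window size are illustrative assumptions.

    import torch

    def norm_max_pool(features, m):
        """Pool an ordered (n, 12N) feature matrix down to m points by
        keeping, in each window of adjacent numbers, the feature vector
        with the largest 2-norm, then concatenating the survivors.

        features: per-point feature vectors, rows ordered by the
        voxel-internal numbering; m: points to retain per voxel.
        """
        n = features.shape[0]
        pool_size = -(-n // m)   # ceil(n / m); assumes n large enough for m windows
        pooled = []
        for start in range(0, n, pool_size):
            window = features[start:start + pool_size]   # adjacent numbers
            norms = window.norm(dim=1)                    # 2-norm of each point
            pooled.append(window[norms.argmax()])         # keep max-norm point
        return torch.cat(pooled)   # end-to-end concatenation: 12*N*m-dim vector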
(4) Build a fully connected network with PyTorch, selecting ReLU as the activation function, Adam as the optimizer, and PyTorch's MultiLabelSoftMarginLoss as the loss function. Input the K 12×N×m-dimensional voxel feature vectors obtained in step (3) into the fully connected network; the output is the label of the scene class. The class labels of the training and test data are all one-hot encoded, and the last layer of the fully connected network comprises a softmax layer.
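A minimal sketch of this classification head; the single hidden width of 512 is an assumption for illustration, while the ReLU activation, Adam optimizer, MultiLabelSoftMarginLoss, and final softmax follow the text.

    import torch
    import torch.nn as nn

    def build_classifier(k, feat_dim, num_classes, hidden=512):
        """k voxel feature vectors (each feat_dim = 12*N*m) are concatenated
        into one input; the last layer comprises a softmax, and labels are
        one-hot encoded, matching MultiLabelSoftMarginLoss's target format."""
        model = nn.Sequential(
            nn.Linear(k * feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
            nn.Softmax(dim=-1),
        )
        optimizer = torch.optim.Adam(model.parameters())
        loss_fn = nn.MultiLabelSoftMarginLoss()
        return model, optimizer, loss_fn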
In step (2), the local spectral convolution kernel is computed as
f = \big( U \, g_\alpha(\Lambda) \, U^{T} F \big)_{\text{target}}, \qquad g_\alpha(\Lambda) = \mathrm{diag}(\alpha_1, \ldots, \alpha_6), \qquad F = [f_1, \ldots, f_6]^{T},
wherein f is the convolution result (the weighted information of the target point); the f_j are the T-net-transformed information, from step (1), of the 1 target point and its 5 sampled neighboring points; U is the eigenvector matrix from the eigendecomposition of the Laplacian matrix of the point cloud within the voxel; U^T is the transpose of U; Λ is the diagonal matrix of eigenvalues from that eigendecomposition, over which the spectral filter g_α is defined; and the α_j are the convolution kernel parameters to be trained.
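In code, one such kernel could look like the following PyTorch sketch over the 6-node local graph; treating g_α(Λ) as a plain diagonal of the six trainable coefficients is an assumption consistent with the 6 parameters per depth stated above.

    import torch

    def local_spectral_conv(laplacian, f, alpha, target=0):
        """One local spectral convolution kernel over the graph of a target
        point and its 5 sampled neighbors.

        laplacian: (6, 6) Laplacian of the local graph
        f:         (6, 3) T-net-transformed vectors f_1..f_6 (row 0 = target)
        alpha:     (6,)   trainable filter coefficients alpha_1..alpha_6
        """
        lam, U = torch.linalg.eigh(laplacian)   # Lambda (eigenvalues) and U
        g = torch.diag(alpha)                   # spectral filter g_alpha(Lambda)
        out = U @ g @ U.T @ f                   # filter in the graph Fourier basis
        return out[target]                      # weighted info of the target point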
As shown in fig. 2, in step (3), numbering the points in a voxel one by one specifically comprises the following steps (a code sketch follows this list):
(3.1) defining the radius ρ of the node neighborhood;
(3.2) randomly selecting an unnumbered node and assigning it the next number in sequence;
(3.3) selecting the unnumbered node closest to the current highest-numbered node within its neighborhood and assigning it the next number;
(3.4) determining whether all nodes in the neighborhood of the current highest-numbered node are numbered;
(3.5) if not all are numbered, repeating step (3.3); if all are numbered, further checking whether all nodes in the voxel are numbered;
(3.6) if unnumbered nodes remain in step (3.5), repeating step (3.2); if all nodes in the voxel are numbered in step (3.5), the numbering is complete.
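A sketch of steps (3.1) through (3.6) in Python with NumPy; the arbitrary choice returned by set.pop(), standing in for the random selection of step (3.2), and the tie-breaking are illustrative assumptions.

    import numpy as np

    def number_points(coords, rho):
        """Number the points of one voxel so that spatially close points
        receive adjacent numbers; returns point indices in numbering order.

        coords: (n, 3) ndarray of point positions; rho: radius from (3.1).
        """
        order = []
        unnumbered = set(range(len(coords)))
        while unnumbered:                       # (3.6) until all are numbered
            current = unnumbered.pop()          # (3.2) pick an unnumbered node
            order.append(current)
            while unnumbered:
                cand = np.fromiter(unnumbered, dtype=int)
                d = np.linalg.norm(coords[cand] - coords[current], axis=1)
                if not (d <= rho).any():        # (3.4)/(3.5) neighborhood done
                    break
                # (3.3) nearest unnumbered node within rho gets the next number
                current = int(cand[np.where(d <= rho, d, np.inf).argmin()])
                unnumbered.remove(current)
                order.append(current)
        return order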

Claims (5)

1. The method for classifying a voxelized three-dimensional point cloud scene based on a graph convolution neural network, characterized by comprising the following steps:
(1) transforming the three-dimensional spatial coordinates of the point cloud obtained by the visual sensor with the T-net network from PointNet, the size of the T-net network being finely adjusted so that the numbers of hidden-layer neurons form a symmetric structure; voxelizing the T-net-transformed point cloud, wherein the initial point cloud is voxelized according to the three-dimensional spatial coordinates of each point to obtain a plurality of voxels; then replacing the information of each point in each voxel, initially its three-dimensional spatial coordinates, with its three-dimensional vector after the T-net transformation; finally obtaining the voxel numbers and the three-dimensional vector of each point within the corresponding voxel;
(2) weighting the information of the points adjacent to each point in each voxel onto that point by fusing PointAtrousNet with a local spectral convolution kernel, to obtain the feature vector of each point;
(3) numbering the points in each voxel one by one according to their spatial distances; then applying max pooling to the feature vectors of adjacently numbered points, and concatenating the pooled results end to end to obtain the feature vector of each voxel;
(4) inputting the feature vector of each voxel from step (3) into a fully connected network, whose output is the class label of the scene point cloud.
2. The method for classifying a voxelized three-dimensional point cloud scene based on a graph convolution neural network according to claim 1, characterized in that: in step (2), the spatial position information of several points adjacent to each point in each voxel is weighted onto that point multiple times by fusing PointAtrousNet with a local spectral convolution kernel.
3. The method for classifying a voxelized three-dimensional point cloud scene based on a graph convolution neural network according to claim 1, characterized in that in step (3), the points within a voxel are numbered one by one in natural number order, specifically comprising the following steps:
(3.1) defining the radius ρ of the node neighborhood;
(3.2) randomly selecting an unnumbered node and assigning it the next number in sequence;
(3.3) selecting the unnumbered node closest to the current highest-numbered node within its neighborhood and assigning it the next number;
(3.4) determining whether all nodes in the neighborhood of the current highest-numbered node are numbered;
(3.5) if not all are numbered, repeating step (3.3); if all are numbered, further checking whether all nodes in the voxel are numbered;
(3.6) if unnumbered nodes remain in step (3.5), repeating step (3.2); if all nodes in the voxel are numbered in step (3.5), the numbering is complete.
4. The method for classifying a voxelized three-dimensional point cloud scene based on a graph convolution neural network according to claim 1, characterized in that: in step (3), max pooling of the feature vectors of adjacently numbered points specifically comprises taking several adjacently numbered points in sequence for max pooling, computing the norm of each point's feature vector within the pooling window, and taking the point with the largest norm as the pooling result.
5. The method for classifying a voxelized three-dimensional point cloud scene based on a graph convolution neural network according to any one of claims 1 to 4, characterized in that: in step (4), the voxel feature vectors are used as the input of the fully connected network.
CN202010812456.3A 2020-08-13 2020-08-13 Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network Active CN112085066B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010812456.3A CN112085066B (en) 2020-08-13 2020-08-13 Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010812456.3A CN112085066B (en) 2020-08-13 2020-08-13 Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network

Publications (2)

Publication Number Publication Date
CN112085066A (en) 2020-12-15
CN112085066B (en) 2022-08-26

Family

ID=73728203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010812456.3A Active CN112085066B (en) 2020-08-13 2020-08-13 Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network

Country Status (1)

Country Link
CN (1) CN112085066B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB202207459D0 (en) * 2022-05-20 2022-07-06 Cobra Simulation Ltd Content generation from sparse point datasets
CN117409209B (en) * 2023-12-15 2024-04-16 深圳大学 Multi-task perception three-dimensional scene graph element segmentation and relationship reasoning method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109118564A (en) * 2018-08-01 2019-01-01 湖南拓视觉信息技术有限公司 A kind of three-dimensional point cloud labeling method and device based on fusion voxel
CN109410307A (en) * 2018-10-16 2019-03-01 大连理工大学 A kind of scene point cloud semantic segmentation method
CN110135227A (en) * 2018-02-09 2019-08-16 电子科技大学 A kind of laser point cloud outdoor scene automatic division method based on machine learning
CN110633640A (en) * 2019-08-13 2019-12-31 杭州电子科技大学 Method for identifying complex scene by optimizing PointNet

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135227A (en) * 2018-02-09 2019-08-16 电子科技大学 A kind of laser point cloud outdoor scene automatic division method based on machine learning
CN109118564A (en) * 2018-08-01 2019-01-01 湖南拓视觉信息技术有限公司 A kind of three-dimensional point cloud labeling method and device based on fusion voxel
CN109410307A (en) * 2018-10-16 2019-03-01 大连理工大学 A kind of scene point cloud semantic segmentation method
CN110633640A (en) * 2019-08-13 2019-12-31 杭州电子科技大学 Method for identifying complex scene by optimizing PointNet

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LiDAR point cloud ground object classification method based on multi-scale features and PointNet; Zhao Zhongyang et al.; Laser & Optoelectronics Progress; 2018-10-07 (Issue 05); full text *

Also Published As

Publication number Publication date
CN112085066A (en) 2020-12-15

Similar Documents

Publication Publication Date Title
Atapour-Abarghouei et al. Real-time monocular depth estimation using synthetic data with domain adaptation via image style transfer
Li et al. So-net: Self-organizing network for point cloud analysis
Zanfir et al. Deep learning of graph matching
CN110348399B (en) Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network
Olson et al. Automatic target recognition by matching oriented edge pixels
CN109063724B (en) Enhanced generation type countermeasure network and target sample identification method
CN108021947B (en) A kind of layering extreme learning machine target identification method of view-based access control model
CN112488205B (en) Neural network image classification and identification method based on optimized KPCA algorithm
CN107578007A (en) A kind of deep learning face identification method based on multi-feature fusion
CN111625667A (en) Three-dimensional model cross-domain retrieval method and system based on complex background image
CN112085066B (en) Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network
CN106844620B (en) View-based feature matching three-dimensional model retrieval method
CN108595558B (en) Image annotation method based on data equalization strategy and multi-feature fusion
CN111028238B (en) Robot vision-based three-dimensional segmentation method and system for complex special-shaped curved surface
CN112784782B (en) Three-dimensional object identification method based on multi-view double-attention network
CN109840518B (en) Visual tracking method combining classification and domain adaptation
CN111652273A (en) Deep learning-based RGB-D image classification method
Ahmad et al. 3D capsule networks for object classification from 3D model data
CN111368733A (en) Three-dimensional hand posture estimation method based on label distribution learning, storage medium and terminal
CN114170154A (en) Remote sensing VHR image change detection method based on Transformer
Wang et al. Ovpt: Optimal viewset pooling transformer for 3d object recognition
Wickramasinghe et al. Deep self-organizing maps for visual data mining
CN115578574A (en) Three-dimensional point cloud completion method based on deep learning and topology perception
CN112365456B (en) Transformer substation equipment classification method based on three-dimensional point cloud data
CN111507243B (en) Human behavior recognition method based on Grassmann manifold analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhu Bo; Fan Ximing; Gao Xiang
Inventor before: Gao Xiang; Fan Ximing; Zhu Bo

GR01 Patent grant