CN112085066B - Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network - Google Patents

Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network

Info

Publication number
CN112085066B
CN112085066B (granted publication of application CN202010812456.3A)
Authority
CN
China
Prior art keywords: point, voxel, point cloud, dimensional, numbered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010812456.3A
Other languages
Chinese (zh)
Other versions
CN112085066A (en)
Inventor
朱博 (Zhu Bo)
范希明 (Fan Ximing)
高翔 (Gao Xiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202010812456.3A
Publication of CN112085066A
Application granted
Publication of CN112085066B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network, which specifically comprises the following steps. First, scene point cloud data obtained from a visual sensor is voxelized in a manner adapted to rotation and translation transformations. Then, for the point cloud within each voxel, the information of the points near each point is weighted onto that point by a method based on graph neural network spectral convolution, yielding a feature vector for each point. The points within each voxel are numbered one by one according to spatial distance, max pooling is applied to the per-point feature vectors according to the numbering, and the pooled results are concatenated end to end to obtain the feature vector of each voxel. Finally, the voxel feature vectors are input into a fully connected network to obtain the scene class label. The method alleviates, to a certain extent, the high computational complexity of spectral convolution methods, and has a degree of robustness to rotation and translation of the point cloud.

Description

Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network
Technical Field
The invention belongs to the field of three-dimensional indoor scene recognition, and particularly relates to a voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network.
Background
With the rapid development of computer hardware and theory, acquiring and processing three-dimensional data has become increasingly easy. Recognition of three-dimensional point cloud scenes is currently a research hotspot in robotics and computer vision.
For three-dimensional point cloud scene recognition, there are methods based on handcrafted descriptors such as the Signature of Histograms of Orientations (SHOT), but handcrafted descriptors suffer from a narrow range of applicability. There are methods that extract features from a voxelized point cloud with a 3D CNN, but these have high computational complexity. There are also methods that convert the three-dimensional point cloud into two-dimensional images via multi-angle projection and then apply image recognition, but too much three-dimensional geometric information is lost during projection. With the development of neural networks and their excellent performance on visual recognition, using neural networks to recognize three-dimensional scene point clouds has become a research hotspot in three-dimensional point cloud processing.
At present, neural network methods for point clouds fall mainly into two categories. The first is spatial-domain methods, which take the point cloud spatial position information obtained by the sensor directly as input, without transforming the original point cloud data. This line of work offers many design ideas: for example, PointNet uses a T-net to mitigate the interference caused by rotation and translation of the point cloud and uses a small multilayer perceptron as the convolution kernel to extract feature information; PointAtrousNet weights neighborhood information onto the center point 4 times with a multilayer perceptron. Spatial-domain methods extract features with multilayer perceptrons, but multilayer perceptrons are poorly interpretable and their structural design depends on extensive tuning. The second is spectral-domain methods, which transform the point cloud spatial information obtained by the sensor into Fourier space and design the convolution kernel there, such as the local spectral convolution kernel proposed by Michael et al. This form of convolution has a well-defined meaning, but computing the Laplacian matrix over the whole point cloud has high computational complexity.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to provide a voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network that reduces the computational complexity of spectral convolution methods and has a degree of robustness to rotation and translation transformations.
The technical scheme is as follows: the voxelized three-dimensional point cloud scene classification method based on a graph convolution neural network of the invention comprises the following steps:
(1) transforming the three-dimensional spatial coordinates of the point cloud obtained by the visual sensor with the T-net network from PointNet, and voxelizing the T-net-transformed point cloud;
(2) weighting the information of the points adjacent to each point in each voxel onto that point to obtain the feature vector of each point;
(3) numbering the points in each voxel one by one according to their spatial distances; then applying max pooling to the feature vectors of adjacently numbered points, and concatenating the pooled results end to end to obtain the feature vector of each voxel;
(4) inputting the feature vector of each voxel from step (3) into a fully connected network, whose output is the class label of the scene point cloud.
In step (2), the spatial position information of several points adjacent to each point in each voxel is weighted onto that point multiple times by fusing PointAtrousNet with a local spectral convolution kernel.
In step (3), the points within a voxel are numbered one by one in natural number order, specifically comprising the following steps:
(3.1) defining the radius ρ of the node neighborhood;
(3.2) randomly selecting an unnumbered node and assigning it the next number in sequence;
(3.3) selecting the unnumbered node closest to the current highest-numbered node within its neighborhood and assigning it the next number;
(3.4) determining whether all nodes in the neighborhood of the current highest-numbered node are numbered;
(3.5) if not all are numbered, repeating step (3.3); if all are numbered, further checking whether all nodes in the voxel are numbered;
(3.6) if unnumbered nodes remain in step (3.5), repeating step (3.2); if all nodes in the voxel are numbered in step (3.5), the numbering is complete.
In step (3), max pooling of the feature vectors of adjacently numbered points specifically comprises: taking several adjacently numbered points in sequence for max pooling, computing the norm of each point's feature vector within the pooling window, and taking the point with the largest norm as the pooling result.
In step (4), the voxel feature vectors are used as the input of the fully connected network.
Beneficial effects: compared with the prior art, the invention has the following beneficial effects: (1) the method is suited to three-dimensional point cloud data, which, compared with two-dimensional scene images, is more robust to interference from scale, viewing angle, and lighting; (2) the method also has a degree of robustness to rotation and translation of the point cloud; (3) spectral convolution is applied to the point cloud within each voxel, which reduces computational complexity compared with applying spectral convolution to the whole point cloud.
Drawings
FIG. 1 is a flow chart of the modeling of the present invention;
FIG. 2 is a flow chart of the numbering of points within voxels in accordance with the present invention.
Detailed Description
The invention is described in further detail below with reference to specific embodiments and the attached drawings.
As shown in fig. 1, to implement the method for classifying a voxelized three-dimensional point cloud scene based on a graph convolution neural network, a recognition model must first be established and then trained on a large number of three-dimensional scene point clouds; training is complete after a preset number of iterations. To classify a new point cloud scene, the three-dimensional point cloud data of the scene is input into the trained model; the output is a class label vector for the scene, and the index of the maximum value in that vector is the corresponding scene class. This embodiment is implemented with the PCL library and the PyTorch library. The specific steps are as follows:
(1) Transform the three-dimensional spatial coordinates of the point cloud obtained by the visual sensor with the T-net network from PointNet, and voxelize the T-net-transformed point cloud. The T-net is a fully connected network with shared parameters: its input is the three-dimensional spatial coordinates of each point of the whole point cloud, and its output is a three-dimensional vector for each corresponding point that, to a certain extent, preserves the spatial relationships among points as they were before rotation and translation. The T-net processing improves the robustness of the whole algorithm to rotation and translation of the point cloud. In this embodiment, the size of the T-net is changed slightly: a fully connected T-net is built with PyTorch whose hidden layers have a symmetric 1024-256-64-256-1024 neuron structure. Voxelizing the T-net-transformed point cloud specifically comprises: voxelizing the initial point cloud with the PCL point cloud library according to the three-dimensional spatial coordinates of each point, obtaining K voxels, and replacing the information of each point in each voxel, originally its initial three-dimensional spatial coordinates, with its three-dimensional vector after the T-net transformation. The result is the voxel numbers and the three-dimensional vector of each point within the corresponding voxel, as in a "dictionary" data structure:
{ "voxel 1": a point p three-dimensional vector, -; … "voxel k": a point q three-dimensional vector.
(2) Weight the spatial position information of several points adjacent to each point in each voxel onto that point multiple times, by fusing PointAtrousNet with a local spectral convolution kernel, to obtain the feature vector of each point. Specifically: using the T-net-transformed three-dimensional vectors of the points, compute the pairwise distances between points within the voxel, traverse each point to find its 20 nearest points, connect each point with its 20 nearest neighbors to build a graph, and compute the Laplacian matrix. For each point in the voxel, the information of its 20 neighboring points is sampled 4 separate times, at sampling rates 1, 2, 3, and 4, and fed into 4 independent spectral convolution kernels. That is, each convolution kernel takes the information of the target point and 5 neighboring points as input (input vector dimension 3) and produces the weighted information of the target point as output (output vector dimension also 3). Each convolution kernel is built with PyTorch. This embodiment uses 4 independent spectral convolution kernels; each kernel comprises N depths, and each depth contains 6 parameters to be trained, for 24×N parameters in total. The parameters of each depth are trained independently of other depths and are shared across points. The above constitutes one convolution layer, and the same convolution operation is performed 4 times: convolution layers with identical structure are connected end to end, with the output of a kernel in the previous layer serving as the input of the kernel with the same sampling rate in the next layer. After the last convolution layer, the convolution results of all depths of all kernels are concatenated end to end, giving a 12×N-dimensional feature vector for each point. The feature vectors are recorded in PyTorch in the following data structure:
{ "voxel 1": point 1 feature vector,. point p feature vector; … "voxel k": point 1 eigenvector,. point q eigenvector ] }.
(3) Number the points in each voxel one by one in natural number order according to their spatial distances, so that the two points closest in space receive adjacent numbers. Then apply max pooling to the feature vectors of adjacently numbered points: within each voxel, take several adjacently numbered points at a time in numbering order and compute the norm of each point's feature vector within the pooling window,
\| f_i \| = \sqrt{ \sum_{d=1}^{12N} f_{i,d}^{2} },
taking the point with the largest norm as the pooling result. The pooled results are then concatenated end to end as the feature vector of the voxel, whose dimension is 12×N×m, where m is the number of points retained after pooling in each voxel. Because the number of points differs between voxels, the pooling size is adjusted flexibly within each voxel to keep the feature vector size consistent across voxels.
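A short sketch of this numbering-ordered pooling in PyTorch; the function name and the ceil-based per-voxel window size are illustrative assumptions.

    import torch

    def norm_max_pool(features, m):
        """Pool an ordered (n, 12N) feature matrix down to m points by
        keeping, in each window of adjacent numbers, the feature vector
        with the largest 2-norm, then concatenating the survivors.

        features: per-point feature vectors, rows ordered by the
        voxel-internal numbering; m: points to retain per voxel.
        """
        n = features.shape[0]
        pool_size = -(-n // m)   # ceil(n / m); assumes n large enough for m windows
        pooled = []
        for start in range(0, n, pool_size):
            window = features[start:start + pool_size]   # adjacent numbers
            norms = window.norm(dim=1)                    # 2-norm of each point
            pooled.append(window[norms.argmax()])         # keep max-norm point
        return torch.cat(pooled)   # end-to-end concatenation: 12*N*m-dim vector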
(4) Build a fully connected network with PyTorch, selecting ReLU as the activation function, Adam as the optimizer, and PyTorch's MultiLabelSoftMarginLoss as the loss function. Input the K 12×N×m-dimensional voxel feature vectors obtained in step (3) into the fully connected network; the output is the label of the scene class. The class labels of the training and test data are all one-hot encoded, and the last layer of the fully connected network comprises a softmax layer.
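A minimal sketch of this classification head; the single hidden width of 512 is an assumption for illustration, while the ReLU activation, Adam optimizer, MultiLabelSoftMarginLoss, and final softmax follow the text.

    import torch
    import torch.nn as nn

    def build_classifier(k, feat_dim, num_classes, hidden=512):
        """k voxel feature vectors (each feat_dim = 12*N*m) are concatenated
        into one input; the last layer comprises a softmax, and labels are
        one-hot encoded, matching MultiLabelSoftMarginLoss's target format."""
        model = nn.Sequential(
            nn.Linear(k * feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
            nn.Softmax(dim=-1),
        )
        optimizer = torch.optim.Adam(model.parameters())
        loss_fn = nn.MultiLabelSoftMarginLoss()
        return model, optimizer, loss_fn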
In step (2), the local spectral convolution kernel is computed as
f = \big( U \, g_\alpha(\Lambda) \, U^{T} F \big)_{\text{target}}, \qquad g_\alpha(\Lambda) = \mathrm{diag}(\alpha_1, \ldots, \alpha_6), \qquad F = [f_1, \ldots, f_6]^{T},
wherein f is the convolution result (the weighted information of the target point); the f_j are the T-net-transformed information, from step (1), of the 1 target point and its 5 sampled neighboring points; U is the eigenvector matrix from the eigendecomposition of the Laplacian matrix of the point cloud within the voxel; U^T is the transpose of U; Λ is the diagonal matrix of eigenvalues from that eigendecomposition, over which the spectral filter g_α is defined; and the α_j are the convolution kernel parameters to be trained.
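In code, one such kernel could look like the following PyTorch sketch over the 6-node local graph; treating g_α(Λ) as a plain diagonal of the six trainable coefficients is an assumption consistent with the 6 parameters per depth stated above.

    import torch

    def local_spectral_conv(laplacian, f, alpha, target=0):
        """One local spectral convolution kernel over the graph of a target
        point and its 5 sampled neighbors.

        laplacian: (6, 6) Laplacian of the local graph
        f:         (6, 3) T-net-transformed vectors f_1..f_6 (row 0 = target)
        alpha:     (6,)   trainable filter coefficients alpha_1..alpha_6
        """
        lam, U = torch.linalg.eigh(laplacian)   # Lambda (eigenvalues) and U
        g = torch.diag(alpha)                   # spectral filter g_alpha(Lambda)
        out = U @ g @ U.T @ f                   # filter in the graph Fourier basis
        return out[target]                      # weighted info of the target point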
As shown in fig. 2, in step (3), numbering the points in a voxel one by one specifically comprises the following steps (a code sketch follows this list):
(3.1) defining the radius ρ of the node neighborhood;
(3.2) randomly selecting an unnumbered node and assigning it the next number in sequence;
(3.3) selecting the unnumbered node closest to the current highest-numbered node within its neighborhood and assigning it the next number;
(3.4) determining whether all nodes in the neighborhood of the current highest-numbered node are numbered;
(3.5) if not all are numbered, repeating step (3.3); if all are numbered, further checking whether all nodes in the voxel are numbered;
(3.6) if unnumbered nodes remain in step (3.5), repeating step (3.2); if all nodes in the voxel are numbered in step (3.5), the numbering is complete.
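A sketch of steps (3.1) through (3.6) in Python with NumPy; the arbitrary choice returned by set.pop(), standing in for the random selection of step (3.2), and the tie-breaking are illustrative assumptions.

    import numpy as np

    def number_points(coords, rho):
        """Number the points of one voxel so that spatially close points
        receive adjacent numbers; returns point indices in numbering order.

        coords: (n, 3) ndarray of point positions; rho: radius from (3.1).
        """
        order = []
        unnumbered = set(range(len(coords)))
        while unnumbered:                       # (3.6) until all are numbered
            current = unnumbered.pop()          # (3.2) pick an unnumbered node
            order.append(current)
            while unnumbered:
                cand = np.fromiter(unnumbered, dtype=int)
                d = np.linalg.norm(coords[cand] - coords[current], axis=1)
                if not (d <= rho).any():        # (3.4)/(3.5) neighborhood done
                    break
                # (3.3) nearest unnumbered node within rho gets the next number
                current = int(cand[np.where(d <= rho, d, np.inf).argmin()])
                unnumbered.remove(current)
                order.append(current)
        return order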

Claims (5)

1. The method for classifying a voxelized three-dimensional point cloud scene based on a graph convolution neural network, characterized by comprising the following steps:
(1) transforming the three-dimensional spatial coordinates of the point cloud obtained by the visual sensor with the T-net network from PointNet, the size of the T-net network being finely adjusted so that the numbers of hidden-layer neurons form a symmetric structure; voxelizing the T-net-transformed point cloud, wherein the initial point cloud is voxelized according to the three-dimensional spatial coordinates of each point to obtain a plurality of voxels; then replacing the information of each point in each voxel, initially its three-dimensional spatial coordinates, with its three-dimensional vector after the T-net transformation; finally obtaining the voxel numbers and the three-dimensional vector of each point within the corresponding voxel;
(2) weighting the information of the points adjacent to each point in each voxel onto that point by fusing PointAtrousNet with a local spectral convolution kernel, to obtain the feature vector of each point;
(3) numbering the points in each voxel one by one according to their spatial distances; then applying max pooling to the feature vectors of adjacently numbered points, and concatenating the pooled results end to end to obtain the feature vector of each voxel;
(4) inputting the feature vector of each voxel from step (3) into a fully connected network, whose output is the class label of the scene point cloud.
2. The method for classifying a voxelized three-dimensional point cloud scene based on a graph convolution neural network according to claim 1, characterized in that: in step (2), the spatial position information of several points adjacent to each point in each voxel is weighted onto that point multiple times by fusing PointAtrousNet with a local spectral convolution kernel.
3. The method for classifying a voxelized three-dimensional point cloud scene based on a graph convolution neural network according to claim 1, characterized in that in step (3), the points within a voxel are numbered one by one in natural number order, specifically comprising the following steps:
(3.1) defining the radius ρ of the node neighborhood;
(3.2) randomly selecting an unnumbered node and assigning it the next number in sequence;
(3.3) selecting the unnumbered node closest to the current highest-numbered node within its neighborhood and assigning it the next number;
(3.4) determining whether all nodes in the neighborhood of the current highest-numbered node are numbered;
(3.5) if not all are numbered, repeating step (3.3); if all are numbered, further checking whether all nodes in the voxel are numbered;
(3.6) if unnumbered nodes remain in step (3.5), repeating step (3.2); if all nodes in the voxel are numbered in step (3.5), the numbering is complete.
4. The method for classifying a voxelized three-dimensional point cloud scene based on a graph convolution neural network according to claim 1, characterized in that: in step (3), max pooling of the feature vectors of adjacently numbered points specifically comprises taking several adjacently numbered points in sequence for max pooling, computing the norm of each point's feature vector within the pooling window, and taking the point with the largest norm as the pooling result.
5. The method for classifying a voxelized three-dimensional point cloud scene based on a graph convolution neural network according to any one of claims 1 to 4, characterized in that: in step (4), the voxel feature vectors are used as the input of the fully connected network.
CN202010812456.3A 2020-08-13 2020-08-13 Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network Active CN112085066B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010812456.3A CN112085066B (en) 2020-08-13 2020-08-13 Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010812456.3A CN112085066B (en) 2020-08-13 2020-08-13 Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network

Publications (2)

Publication Number Publication Date
CN112085066A (en) 2020-12-15
CN112085066B (en) 2022-08-26

Family

ID=73728203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010812456.3A Active CN112085066B (en) 2020-08-13 2020-08-13 Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network

Country Status (1)

Country Link
CN (1) CN112085066B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB202207459D0 (en) * 2022-05-20 2022-07-06 Cobra Simulation Ltd Content generation from sparse point datasets
CN117409209B (en) * 2023-12-15 2024-04-16 深圳大学 Multi-task perception three-dimensional scene graph element segmentation and relationship reasoning method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109118564A (en) * 2018-08-01 2019-01-01 湖南拓视觉信息技术有限公司 A kind of three-dimensional point cloud labeling method and device based on fusion voxel
CN109410307A (en) * 2018-10-16 2019-03-01 大连理工大学 A kind of scene point cloud semantic segmentation method
CN110135227A (en) * 2018-02-09 2019-08-16 电子科技大学 A kind of laser point cloud outdoor scene automatic division method based on machine learning
CN110633640A (en) * 2019-08-13 2019-12-31 杭州电子科技大学 Method for identifying complex scene by optimizing PointNet

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135227A (en) * 2018-02-09 2019-08-16 电子科技大学 A kind of laser point cloud outdoor scene automatic division method based on machine learning
CN109118564A (en) * 2018-08-01 2019-01-01 湖南拓视觉信息技术有限公司 A kind of three-dimensional point cloud labeling method and device based on fusion voxel
CN109410307A (en) * 2018-10-16 2019-03-01 大连理工大学 A kind of scene point cloud semantic segmentation method
CN110633640A (en) * 2019-08-13 2019-12-31 杭州电子科技大学 Method for identifying complex scene by optimizing PointNet

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LiDAR point cloud ground object classification method based on multi-scale features and PointNet; Zhao Zhongyang et al.; Laser & Optoelectronics Progress; 2018-10-07 (Issue 05); full text *

Also Published As

Publication number Publication date
CN112085066A (en) 2020-12-15

Similar Documents

Publication Publication Date Title
Atapour-Abarghouei et al. Real-time monocular depth estimation using synthetic data with domain adaptation via image style transfer
Li et al. So-net: Self-organizing network for point cloud analysis
Zanfir et al. Deep learning of graph matching
CN110348399B (en) Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network
Olson et al. Automatic target recognition by matching oriented edge pixels
CN109063724B (en) Enhanced generation type countermeasure network and target sample identification method
CN108021947B (en) A kind of layering extreme learning machine target identification method of view-based access control model
CN112488205B (en) Neural network image classification and identification method based on optimized KPCA algorithm
CN107578007A (en) A kind of deep learning face identification method based on multi-feature fusion
CN111625667A (en) Three-dimensional model cross-domain retrieval method and system based on complex background image
CN112085066B (en) Voxelized three-dimensional point cloud scene classification method based on graph convolution neural network
CN106844620B (en) View-based feature matching three-dimensional model retrieval method
CN108595558B (en) Image annotation method based on data equalization strategy and multi-feature fusion
CN111028238B (en) Robot vision-based three-dimensional segmentation method and system for complex special-shaped curved surface
CN112784782B (en) Three-dimensional object identification method based on multi-view double-attention network
CN109840518B (en) Visual tracking method combining classification and domain adaptation
CN111652273A (en) Deep learning-based RGB-D image classification method
Ahmad et al. 3D capsule networks for object classification from 3D model data
CN111368733A (en) Three-dimensional hand posture estimation method based on label distribution learning, storage medium and terminal
CN114170154A (en) Remote sensing VHR image change detection method based on Transformer
Wang et al. Ovpt: Optimal viewset pooling transformer for 3d object recognition
Wickramasinghe et al. Deep self-organizing maps for visual data mining
CN115578574A (en) Three-dimensional point cloud completion method based on deep learning and topology perception
CN112365456B (en) Transformer substation equipment classification method based on three-dimensional point cloud data
CN111507243B (en) Human behavior recognition method based on Grassmann manifold analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhu Bo; Fan Ximing; Gao Xiang
Inventor before: Gao Xiang; Fan Ximing; Zhu Bo

GR01 Patent grant