CN109063753B - Three-dimensional point cloud model classification method based on convolutional neural network - Google Patents

Three-dimensional point cloud model classification method based on convolutional neural network

Info

Publication number
CN109063753B
Authority
CN
China
Prior art keywords
point cloud
dimensional
classification
image
cnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810790133.1A
Other languages
Chinese (zh)
Other versions
CN109063753A (en)
Inventor
白静
司庆龙
刘振刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North Minzu University
Original Assignee
North Minzu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North Minzu University filed Critical North Minzu University
Priority to CN201810790133.1A priority Critical patent/CN109063753B/en
Publication of CN109063753A publication Critical patent/CN109063753A/en
Application granted granted Critical
Publication of CN109063753B publication Critical patent/CN109063753B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks


Abstract

The invention discloses a three-dimensional point cloud model classification method based on a convolutional neural network, which comprises the following steps: S1, selecting Princeton ModelNet and, for ModelNet10 and ModelNet40 respectively, selecting the required numbers of models from the official website as training data and test data to generate a training set and a test set; S2, performing feature analysis on the point cloud model and constructing a classification framework; S3, ordering the point clouds; S4, two-dimensional imaging of the ordered point cloud data; and S5, constructing a CNN network oriented to the two-dimensional point cloud image. The invention applies CNNs from the image field directly to three-dimensional point cloud model classification for the first time, achieving classification accuracies of 93.97% on ModelNet10 and 89.75% on ModelNet40, comparable to the current best methods. The experimental results fully show that applying image-field CNNs to three-dimensional point cloud model classification is feasible, and that the proposed PCI2CNN can effectively capture the three-dimensional feature information of the point cloud model and is suitable for three-dimensional point cloud model classification.

Description

Three-dimensional point cloud model classification method based on convolutional neural network
Technical Field
The invention relates to the technical field of computer graphics, computer vision and intelligent identification, in particular to a three-dimensional point cloud model classification method based on a convolutional neural network.
Background
With the rapid development of modern computer vision research, fields such as unmanned vehicles, autonomous robots, real-time SLAM, and virtual three-dimensional models have made breakthrough progress, which has promoted both the availability of three-dimensional point cloud data and various applied research on it. The classification of point cloud data is the basis and key of this applied research.
Currently, deep learning techniques have made breakthrough progress in image and speech recognition, which also suggests a promising research direction for three-dimensional model classification. However, deep learning models can only process regular, ordered input, while point cloud data is irregular and unordered; this makes deep-learning-based point cloud classification difficult, and research on it remains limited.
Currently, deep learning work on point cloud data focuses mainly on how to construct networks suited to irregular, unordered three-dimensional point cloud data. PointNet (Charles R Q, Su H, Kaichun M, et al. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation [C]//Proceedings of Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society Press, 2017:77-85), the first network for point clouds, uses T-Net to achieve data alignment and feature alignment, 1x1 convolutions to perform multi-feature transformation of the point cloud data, and a Max Pooling symmetric function to extract the model's global statistical information, completing point cloud model classification on this basis. The network never performs any neighboring-point-set operation; it obtains only pointwise transformations and statistics over all points, thereby ensuring its adaptability to point cloud data. Because PointNet extracts only the global features of the point cloud model and has difficulty capturing its local features, Charles' team further proposed PointNet++ (Qi C R, Yi L, Su H, et al. PointNet++: Deep hierarchical feature learning on point sets in a metric space [C]//Advances in Neural Information Processing Systems. Cambridge: MIT Press, 2017:5105-). Li Yangyan et al. at Shandong University proposed PointCNN (Li Y, Bu R, Sun M, et al. PointCNN [OL] [2018-06-30]. https://arxiv.org/abs/1801.07791). This work attempts to simulate the convolution operation of the image field: it solves the disorder of point cloud data by learning an X-transformation within the network, and completes feature extraction and classification of the point cloud model through layer-by-layer aggregation over local regions, imitating image-field convolution.
This analysis shows that the above work focuses on overcoming the irregularity and disorder of point cloud data by introducing T-Net, X-transformation, and symmetric functions, and indeed achieves certain results. However, all of these approaches require a special network to be designed for the point cloud model; the Convolutional Neural Network (CNN), which has been very successful in image recognition, cannot be applied directly to point cloud model classification.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a three-dimensional point cloud model classification method based on a convolutional neural network. It establishes a general CNN-based three-dimensional point cloud classification framework according to the characteristics of the point cloud model, and studies how to convert point cloud data into the regular, ordered two-dimensional point cloud image data that a CNN can accept and how to construct a CNN network suited to two-dimensional point cloud images, thereby providing a beneficial attempt at point cloud model classification and some support for the direct application of CNNs to three-dimensional point cloud model classification.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: a three-dimensional point cloud model classification method based on a convolutional neural network comprises the following steps:
s1, selecting Princeton ModelNet and, for ModelNet10 and ModelNet40 respectively, selecting the required numbers of models from the official website as training data and test data to generate a training set and a test set;
s2, performing feature analysis on the point cloud model and constructing a classification framework;
s3, ordering the point clouds;
s4, two-dimensional imaging of the ordered point cloud data;
s5, constructing a CNN network oriented to the two-dimensional point cloud image, including: point cloud model classification based on a medium-sized CNN, point cloud model classification based on a small CNN, and CNN construction and classification for the two-dimensional point cloud image.
In step S1, Princeton ModelNet is selected using official website data; for ModelNet10 and ModelNet40 respectively, 3991 and 9842 models are selected as training data, and 908 and 2468 models are selected as test data.
In step S2, a general classification framework of the three-dimensional point cloud model is designed for the characteristics of point cloud disorder, irregularity, limitation, and sparsity, and includes the following three modules:
the ordering module of the point cloud data is used for realizing the ordering of the disordered point cloud data;
the two-dimensional imaging module of the ordered point cloud data is used for realizing the regularization of the point cloud data;
the CNN module facing the two-dimensional point cloud image comprises two parts: the deconvolution submodule captures more associated information among the point cloud data through deconvolution operation, and makes up the problem of the sparsity of the point cloud data to a certain extent; and a medium and small convolution classification submodule is used for adapting to the finite characteristic of point cloud data and preventing overfitting of a network.
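The three modules above form a pipeline. The following is a minimal structural sketch; all function names are hypothetical, and each stage is trivially simplified (a single-dimensional sort, line scanning, and a placeholder classifier) purely to show how the modules connect:

```python
# Structural sketch of the CNN-based point cloud classification framework.
# Function names are hypothetical; each stage stands in for a module above.

def order_points(points):
    """Ordering module: sort unordered (x, y, z) points (here: a single-dimensional sort on z)."""
    return sorted(points, key=lambda p: p[2])

def to_image(ordered, width):
    """Two-dimensional imaging module: place the ordered sequence row by row (line scanning)."""
    return [ordered[r * width:(r + 1) * width] for r in range(len(ordered) // width)]

def classify(image):
    """CNN module placeholder: deconvolution sub-module + small/medium CNN would go here."""
    return "class_label"

points = [(0.5, 0.1, 0.9), (0.2, 0.3, 0.1), (0.7, 0.6, 0.5), (0.4, 0.8, 0.3)]
image = to_image(order_points(points), width=2)  # a tiny 2 x 2 "point cloud image"
```

In the actual method, each placeholder is replaced by one of the concrete designs described in steps S3 to S5 below.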
In step S3, the three-dimensional point cloud data M = {(x_i, y_i, z_i), i = 1, …, n} is ordered so that the order of the points is determined, and an ordered sequence S = ((x_i, y_i, z_i), i = 1, …, n) is output, where x, y, and z are the coordinates of the three-dimensional point cloud model in each dimension. The basic principle of ordering the point cloud data is that points close to each other in three-dimensional space should remain relatively close after ordering; this preserves the characteristics of the original point cloud to the greatest extent and matches the positional relationship between adjacent points in the image field. Based on this basic principle, the following three ordering methods are designed:
centroid sorting method: the points are ordered by their distance to the object centroid, from near to far; the ordering result is independent of the point cloud input order and of model translation, scaling, and rotation, but on the other hand there is the following problem: points that are symmetric about the centroid have no adjacency relation in space, yet may become adjacent after ordering;
single-dimensional sorting method: the model is pre-aligned and scanned, and the points are ordered along one coordinate axis by coordinate value; this ensures that the ordering result is independent of the point cloud input order and of model translation, scaling, and rotation, avoids spatially symmetric points becoming adjacent after ordering, and the scanned object can satisfy the precondition of forward placement in space; however, the ordered point cloud data reflects only the spatial information of one coordinate axis and cannot reflect the spatial information of the other dimensions;
two-dimensional sorting method: building on the single-dimensional sorting method, the pre-aligned model is scanned to obtain a point cloud data model, which is divided into m equal-interval slices along one coordinate axis; when m takes a proper value, the values on that axis within the same slice can be considered equivalent, i.e., the points lie in the same plane, and each slice is then sorted again along one axis of the plane coordinates, completing the ordering of the point cloud data.
In step S4, the ordered sequence S = ((x_i, y_i, z_i), i = 1, …, n) of the previously unordered point cloud data is input. This step aims to place the ordered sequence reasonably in a two-dimensional image A = (a_jk) of size p × q, where p × q = n; A is a two-dimensional matrix corresponding to the generated image; j and k are the pixel row and column; a_jk is the pixel value at row j, column k of the matrix; p and q are the numbers of rows and columns; and n is the number of points in the point cloud model. The placement must ensure that point cloud data corresponding to adjacent pixels in the image are close to each other in spatial position. For this requirement, the following three two-dimensional imaging methods are designed:
line scanning method: simulating the movement of the electron beam of a phosphor screen, the ordered point cloud data is taken out in sequence from front to back and filled into the two-dimensional image row by row, from left to right and top to bottom; this ensures that horizontally adjacent pixels are close to each other in the original point cloud data, but not that vertically adjacent pixels are, i.e., the mapping is not isotropic;
a chessboard method: since CNNs extract image features using local receptive fields, associating local parts of the point cloud with local parts of the image allows local features of the point cloud data to be better extracted; the chessboard imaging method therefore takes the ordered point cloud data out in sequence from front to back, fills each grid from left to right and top to bottom, and fills the pixels within each grid from left to right and top to bottom; with a grid size of 8 × 8, each grid corresponds to a local point cloud region of 64 points; this method is also not isotropic;
a spiral method: the ordered point cloud data is taken out in sequence from front to back and filled along a spiral track starting from the central pixel of the image; this preserves isotropy well and, near the center point, preserves the distance relationships of the original spatial points well; its drawback is that the closer to the edge, the more dispersed the pixels become, so some points that are close in space may end up farther apart after filling.
In step S5, a convolutional neural network suited to two-dimensional point cloud images is constructed; since point cloud data is limited and sparse, and a large CNN with too many layers may cause overfitting, a medium-sized CNN and a small CNN are first selected for preliminary experiments;
classifying the point cloud model based on the medium-sized CNN: point cloud data is limited in information and sparse, while medium-sized networks are designed for large inputs; the input size of AlexNet is 224 × 224, whereas the two-dimensional image corresponding to a 1024-point cloud model is only 32 × 32; therefore, before the point cloud image is input, a deconvolution operation is first applied to the image data, which satisfies the medium-sized CNN's input-size requirement and avoids overfitting, while achieving high-resolution reconstruction of the point cloud image and obtaining more spatial correlation information;
classifying the point cloud model based on the small CNN: LeNet is a small CNN comprising two convolution layers, two pooling layers, and three fully connected layers, used mainly for handwriting recognition; its input is a 32 × 32 image, which exactly matches the size of the two-dimensional point cloud image of 1024 points; therefore, following the experimental design of the medium-CNN classification but removing the deconvolution sub-module, the 32 × 32 two-dimensional point cloud image is input into LeNet to complete feature extraction and model classification;
CNN construction and classification for the two-dimensional point cloud image: analyzing the characteristics of the point cloud model and combining the results of the two experiments above, a convolutional neural network PCI2CNN for two-dimensional point cloud image classification is designed; the design ideas of the network are as follows:
the network begins with a group of 2 deconvolution layers, each with 64 channels and kernel size 2 × 2, with strides 2 × 2 and 1 × 1 respectively, so that high-resolution reconstruction of the two-dimensional point cloud image is achieved through the deconvolution operation and more correlation information among the points is obtained;
it comprises 3 convolution layers with 64, 128, and 256 channels respectively; compared with AlexNet it contains fewer network layers and parameters, avoiding network complexity and improving the stability of network training, while containing more parameters than LeNet to improve the network's ability to fit the training data;
pooling is added after the first and third convolution layers; the number of channels of each pooling layer matches that of the preceding layer, with kernel size 3 × 3 and stride 2 × 2, so that richer information is obtained through overlapping sampling.
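The sizes implied by the design ideas above can be checked with simple bookkeeping. This is only a sketch: the kernel sizes of the three convolution layers are not specified in the text, so only the deconvolution group and the pooling formula are computed, assuming no padding:

```python
# Size bookkeeping for the PCI2CNN design ideas above (assumes no padding).

def deconv_out(size, kernel, stride):
    # transposed convolution: out = (in - 1) * stride + kernel
    return (size - 1) * stride + kernel

def pool_out(size, kernel=3, stride=2):
    # overlapping pooling with kernel 3 x 3 and stride 2 x 2
    return (size - kernel) // stride + 1

s = 32                   # a 1024-point model gives a 32 x 32 point cloud image
s = deconv_out(s, 2, 2)  # first deconv of the group: kernel 2 x 2, stride 2 x 2
s = deconv_out(s, 2, 1)  # second deconv of the group: kernel 2 x 2, stride 1 x 1
```

So the deconvolution group roughly doubles the image resolution (32 × 32 becomes 65 × 65) before the convolution layers, while the 3 × 3 / stride-2 pooling approximately halves each spatial dimension with overlap (pool_out(65) gives 32).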
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. the invention provides a CNN-based universal three-dimensional point cloud model classification framework, which overcomes the disorder, sparsity and limitation of three-dimensional point cloud data, supports effective classification of three-dimensional point cloud models based on various two-dimensional CNNs, and has higher classification accuracy.
2. The invention provides three ordering methods aiming at the existing characteristics of a point cloud model, realizes the ordering of disordered point cloud data, ensures the relative distance relationship between the point cloud data, keeps the integrity of the characteristics of the original point cloud to the maximum extent, accords with the characteristics of mutual spatial correlation of adjacent pixels of an image, and meets the basic requirements of various deep learning algorithms on the data ordering.
3. The invention provides three feasible methods aiming at the two-dimensional imaging of the ordered point cloud data, realizes the two-dimensional imaging of the ordered point cloud data, ensures the basic characteristic that the point cloud data corresponding to adjacent pixels in the image are adjacent to each other in the spatial position, and further supports the extraction and classification of the three-dimensional point cloud data based on the two-dimensional CNN.
4. The PCI2CNN provided by the invention can effectively capture the three-dimensional characteristic information of the point cloud model, obtains higher classification accuracy and is suitable for three-dimensional point cloud model classification.
Drawings
Fig. 1 is a classification frame diagram of a CNN-based three-dimensional point cloud model.
FIG. 2 is a schematic diagram of the two-dimensional sorting method.
Fig. 3 is a two-dimensional imaging diagram using a line scanning method.
Fig. 4 is a two-dimensional imaging diagram using the chessboard method.
Fig. 5 is a two-dimensional imaging diagram using the spiral method.
Fig. 6 is a network architecture diagram of AlexNet.
FIG. 7 is a diagram of a deconvolution sub-module.
Fig. 8 is a point cloud model classification performance diagram (ModelNet10) based on AlexNet.
Fig. 9 is a diagram of the LeNet network architecture.
Fig. 10 is a point cloud model classification performance diagram (ModelNet10) based on LeNet.
FIG. 11 is a diagram of a PCI2CNN network architecture.
Fig. 12 is a graph comparing the classification performance of three networks (ModelNet 10).
FIG. 13 is a comparison graph of classification performance for different ordering methods (ModelNet 10).
FIG. 14 is a graph of the comparison of the sorting performance for different slice numbers (ModelNet 10).
Fig. 15 is a classification performance comparison chart (ModelNet10) of different two-dimensional imaging methods.
FIG. 16 is a comparison graph (ModelNet10) of classification performance of three point cloud networks.
Detailed Description
The present invention will be further described with reference to the following specific examples.
The three-dimensional point cloud model classification method based on the convolutional neural network provided by this embodiment mainly designs three ordering methods for three-dimensional point cloud data, three two-dimensional imaging methods for the ordered point cloud data, and a convolutional neural network PCI2CNN suited to two-dimensional point cloud image classification. It specifically comprises the following steps:
s1, selecting Princeton ModelNet, and selecting a certain number of models from the official website as training data and test data aiming at ModelNet10 and ModelNet40 respectively to generate a training set and a test set; specifically, a Princeton ModelNet is selected, official website data is adopted, 3991 and 9842 models are respectively selected as training data and 908 and 2468 models are respectively selected as test data aiming at ModelNet10 and ModelNet 40.
S2, performing feature analysis on the point cloud model and constructing a classification framework
Aiming at the characteristics of point cloud disorder, irregularity, limitation, sparsity and the like, the invention designs a general classification framework of a three-dimensional point cloud model, which comprises the following three modules as shown in figure 1.
The ordering module of the point cloud data realizes the ordering of the disordered point cloud data;
the two-dimensional imaging module of the ordered point cloud data realizes the regularization of the point cloud data;
the CNN module oriented to the two-dimensional point cloud image mainly comprises two parts: the deconvolution submodule captures more associated information among the point cloud data through deconvolution operation, and the problem of point cloud data sparsity is made up to a certain extent; and a medium and small convolution classification submodule is used for adapting to the finite characteristic of point cloud data and preventing overfitting of a network.
S3, ordering the point cloud
The three-dimensional point cloud data M = {(x_i, y_i, z_i), i = 1, …, n}, after being ordered, has a determined point order, and the ordered sequence S = ((x_i, y_i, z_i), i = 1, …, n) is output, where x, y, and z are the coordinates of the three-dimensional point cloud model in each dimension. Here, the basic principle of ordering point cloud data is: points that are close in three-dimensional space remain relatively close after ordering. This preserves the characteristics of the original point cloud from damage to the maximum extent and matches the positional relationship between adjacent points in the image field. Based on this basic principle, the invention designs the following three different ordering methods:
centroid sorting method: ordering of point cloud data is realized according to the distance from the point to the object centroid and the sequence from near to far. The method has the advantage that the ordering result is independent of the point cloud input sequence and the translation, scaling and rotation of the model. But on the other hand there is also the problem that points symmetrical with respect to the centroid may not be spatially adjacent to each other, but they may be adjacent to each other after ordering.
Single-dimensional sorting method: the model is pre-aligned and then scanned, and the point cloud data is ordered along one coordinate axis by the magnitude of the coordinate values. This ensures that the ordering result is independent of the point cloud input order and of model translation, scaling, and rotation, and avoids the problem of spatially symmetric points becoming adjacent after ordering. Furthermore, the scanned object can satisfy the precondition of being placed in the forward direction in space. However, the ordered point cloud data reflects only the spatial information of one coordinate axis and cannot reflect the spatial information of the other dimensions.
Two-dimensional sorting method: as shown in fig. 2, the point cloud data is obtained by scanning the pre-aligned model, as in the single-dimensional sorting method. The point cloud data model is divided into m equal-interval slices along one coordinate axis, such as the Z axis. When m is chosen properly, the Z values within the same slice can be considered equivalent, i.e., the points lie within the same plane. Each slice is then sorted again along one axis of the plane coordinates, completing the ordering of the point cloud data. This method retains the advantages of the single-dimensional sorting method while better reflecting spatial information in different dimensions.
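The two-dimensional sorting method can be sketched as follows. The slicing axis (z) and the in-plane sorting axis (x) are the illustrative choices from the text, but the exact handling of slice boundaries is an assumption, since the text does not fix it:

```python
# Two-dimensional sorting sketch: split points into m equal-interval slices along z,
# then sort within each slice by x.

def two_dim_sort(points, m):
    zs = [p[2] for p in points]
    z_min, z_max = min(zs), max(zs)
    span = (z_max - z_min) / m or 1.0          # avoid division by zero for flat clouds
    def slice_index(p):
        # clamp the topmost point into the last slice
        return min(int((p[2] - z_min) / span), m - 1)
    return sorted(points, key=lambda p: (slice_index(p), p[0]))

pts = [(0.9, 0.0, 0.0), (0.1, 0.0, 0.05), (0.5, 0.0, 1.0), (0.2, 0.0, 0.95)]
ordered = two_dim_sort(pts, m=2)
```

With m = 2, the first two points fall in the lower slice and the last two in the upper slice; within each slice the points are then ordered by x, so the result reflects spatial information along two axes rather than one.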
S4, two-dimensional imaging of ordered point cloud data
The ordered sequence S = ((x_i, y_i, z_i), i = 1, …, n) of the previously unordered point cloud data is input. This section aims to place the ordered sequence reasonably in a two-dimensional image A = (a_jk) of size p × q, where p × q = n (A is a two-dimensional matrix corresponding to the generated image; j and k are the pixel row and column; a_jk is the pixel value at row j, column k of the matrix; p and q are the numbers of rows and columns; n is the number of points in the point cloud model), so that point cloud data corresponding to adjacent pixels in the image are close to each other in spatial position. To meet this requirement, the following three different two-dimensional imaging methods are designed.
Line scanning method: as shown in fig. 3, the ordered point cloud data is sequentially extracted from front to back, and filled into the two-dimensional image line by line from left to right and from top to bottom, simulating the movement of the electron beam of the phosphor screen. This way, it can be ensured that horizontally adjacent pixels are close to each other in the original point cloud data, but it cannot be ensured that vertically adjacent pixels are close to each other in the original point cloud data, i.e. not isotropic.
A chessboard method: since CNNs extract image features using the concept of local receptive fields, making local parts of the point cloud correspond to local parts of the image allows the local features of the point cloud data to be better extracted. The invention therefore proposes the chessboard imaging method: the ordered point cloud data is taken out in sequence from front to back, each grid is filled from left to right and top to bottom as shown in FIG. 4, and the pixels within each grid are filled from left to right and top to bottom. With a grid size of 8 × 8, each grid corresponds to a local point cloud region of 64 points. This method is also not isotropic.
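The chessboard placement can be expressed as a mapping from point index to pixel position. This sketch assumes the values from the text (a 32 × 32 image with 8 × 8 grids, so 64 points per grid):

```python
# Chessboard imaging sketch: map a point's index in the ordered sequence to its
# (row, col) pixel position. Grids are filled left-to-right, top-to-bottom,
# and the pixels inside each grid likewise.

def chessboard_pos(i, image_size=32, grid=8):
    per_grid = grid * grid                  # points per grid (64 for 8 x 8)
    grids_per_row = image_size // grid      # grids across the image (4)
    g, r = divmod(i, per_grid)              # which grid, and rank inside that grid
    grid_row, grid_col = divmod(g, grids_per_row)
    in_row, in_col = divmod(r, grid)
    return grid_row * grid + in_row, grid_col * grid + in_col
```

For example, points 0 through 63 fill the top-left grid, and point 64 starts the next grid at pixel (0, 8), so each consecutive run of 64 points stays inside one local 8 × 8 image region.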
A spiral method: the ordered point cloud data is taken out in sequence from front to back and, as shown in FIG. 5, filled along a spiral track starting from the central pixel of the image. This preserves isotropy well and, near the center point, preserves the distance relationships of the original spatial points well; its drawback is that the closer to the edge, the more dispersed the pixels become, so some points that are close in space may end up farther apart after filling.
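The spiral filling can be sketched as follows. The direction order (right, down, left, up) and the handling of even image sizes are illustrative assumptions; the text fixes only the central start and the spiral track:

```python
# Spiral imaging sketch: fill an n x n image from the center outward along a spiral.

def spiral_fill(points, n):
    img = [[None] * n for _ in range(n)]
    it = iter(points)
    r = c = n // 2                              # start at the central pixel
    img[r][c] = next(it)
    placed = 1
    moves = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up
    step, m = 1, 0
    while placed < len(points):
        for _ in range(2):                      # each run length is walked twice
            dr, dc = moves[m % 4]
            m += 1
            for _ in range(step):
                r, c = r + dr, c + dc
                # cells outside the image are skipped (relevant for even n)
                if 0 <= r < n and 0 <= c < n and placed < len(points):
                    img[r][c] = next(it)
                    placed += 1
        step += 1
    return img
```

With n = 3 and points 0 to 8, point 0 lands at the center (1, 1) and the sequence winds outward; the same function applies unchanged to the 32 × 32 case in the text.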
S5, constructing a convolutional neural network suitable for a two-dimensional point cloud image, and selecting a medium CNN and a small CNN to perform preliminary experiments because point cloud data has finiteness and sparsity and the large CNN with excessive layers may cause overfitting.
Classifying the point cloud model based on the medium-sized CNN: as shown in fig. 6, AlexNet is a convolutional neural network with an 8-layer structure, which, in terms of model depth, belongs to the medium-sized networks in the CNN model family. Moreover, the network won the 2012 image recognition competition, so its recognition capability is well established. Therefore, AlexNet is selected as the experimental object to extract and classify the point cloud model features.
Point cloud data is limited in information and sparse, while the input size for a medium-sized network is usually large; for example, the input size of AlexNet is 224 × 224, whereas the two-dimensional image corresponding to a 1024-point cloud model is only 32 × 32. Therefore, before the point cloud image is input, the method first applies a deconvolution operation to the image data, which satisfies the medium-sized CNN's input-size requirement and avoids overfitting, while achieving high-resolution reconstruction of the point cloud image and obtaining more spatial correlation information.
As shown in fig. 7, taking three-dimensional point cloud data of 1024 points as the standard, i.e., a 32 × 32 two-dimensional point cloud image as input, we construct a deconvolution sub-module comprising 3 groups of deconvolution operations. Deconv(a, b, c) next to each deconvolution indicates a channels, a b × b kernel, and a c × c stride. The channel counts of the 3 groups are 16, 32, and 64; the kernel sizes are 2 × 2, 4 × 4, and 6 × 6, respectively. Each group internally comprises two deconvolutions with the same channel count and kernel size but different strides: the first uses a stride of 2 × 2 and the second a stride of 1 × 1. Every kernel size is exactly divisible by its stride, which avoids checkerboard artifacts during deconvolution and improves the quality of the generated high-resolution image.
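The output side length of each (non-padded) transposed convolution follows the standard formula out = (in − 1) × stride + kernel. A quick check, under the assumption of zero padding, confirms that the three groups enlarge a 32 × 32 image to 279 × 279:

```python
def deconv_out(size, kernel, stride):
    """Output side length of a zero-padding transposed convolution."""
    return (size - 1) * stride + kernel

# three groups with kernels (2, 4, 6); within each group the first
# deconvolution uses stride 2 and the second uses stride 1
size = 32
for kernel in (2, 4, 6):
    size = deconv_out(size, kernel, 2)   # first deconv of the group
    size = deconv_out(size, kernel, 1)   # second deconv of the group
print(size)  # 279, the high-resolution side length stated in the text
```

The intermediate sizes are 32 → 64 → 65 → 132 → 135 → 274 → 279, consistent with the 279 × 279 reconstruction described below.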
Based on this deconvolution sub-module, the 32 × 32 two-dimensional point cloud image is reconstructed into a 279 × 279 high-resolution point cloud image, which is then scaled by bilinear interpolation to the 227 × 227 size required by the AlexNet network; this image serves as the input for feature extraction and classification of the point cloud model.
To verify the effectiveness of the method, we selected ModelNet10, a subset of the rigid three-dimensional model dataset Princeton ModelNet, as the test object for a preliminary test of the network. Since the three-dimensional models in ModelNet10 are all placed upright, the experiment constructs three-dimensional point cloud data of 1024 points using the three-dimensional point cloud sampling procedure provided with PointNet, builds the corresponding two-dimensional point cloud image by Z-axis ordering and line scanning, obtains a 227 × 227 high-resolution two-dimensional point cloud image via the deconvolution and scaling operations shown in fig. 7, and feeds it to AlexNet for feature extraction and classification. The results are shown in FIG. 8: the algorithm converges in about 250 iteration steps, with classification accuracy above 90%. This preliminarily verifies the validity of the algorithm of the present invention.
However, AlexNet is a medium CNN: its 8 layers carry roughly 138M parameters of their own, and with the 6 deconvolution layers added, the overall architecture is heavy, parameter-rich, slow to train, and inefficient. Meanwhile, point cloud data contain only the coordinates of the points and no topological connectivity; the information they carry is limited and extremely sparse, so an overly complex network cannot extract more useful information and may even overfit.
Classifying point cloud models based on the small CNN: as shown in FIG. 9, LeNet is a small CNN network (Lecun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition [J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324). The input to the network is a 32 × 32 image, which exactly matches the size of a two-dimensional point cloud image of 1024 points. We therefore directly follow the experimental design of the medium-CNN point cloud model classification, remove the deconvolution sub-module, and feed the 32 × 32 two-dimensional point cloud image into LeNet to complete feature extraction and model classification.
The classification results based on the LeNet network are shown in FIG. 10. It can be seen that: (1) the algorithm converges in about 250 iteration steps with a classification accuracy of about 88%, verifying the effectiveness of the algorithm once more; (2) compared with the AlexNet-based classification, accuracy drops by 2%, probably for two reasons: first, without the deconvolution sub-module, less correlation information among the point cloud data can be acquired; second, LeNet was designed for handwritten-digit recognition and is suited to capturing linear features, so it struggles to capture the features of more complex images such as two-dimensional point cloud images; (3) compared with the AlexNet-based curve, the classification performance curve of this network is relatively stable. The point cloud model can provide only a limited amount of information, while AlexNet has more layers, more parameters, and a comparatively complex parameter space, so its parameters may change violently during tuning; the depth and parameter-space complexity of LeNet better match the information the point cloud can provide, hence the more stable accuracy curve.
CNN construction and classification for two-dimensional point cloud images: analyzing the characteristics of the point cloud model and combining the above two sets of experimental results, we design PCI2CNN, a convolutional neural network for two-dimensional point cloud image classification. As shown in fig. 11, the design ideas of the network are as follows:
the network comprises one group of 2 deconvolutions, with 64 channels, a 2 × 2 kernel, and strides of 2 × 2 and 1 × 1 respectively, so that deconvolution achieves high-resolution reconstruction of the two-dimensional point cloud image and captures more correlation information among the points.
It comprises 3 convolutional layers with 64, 128, and 256 channels respectively. Compared with AlexNet, the network has fewer layers and parameters, which limits its complexity and improves training stability; compared with LeNet, it has more parameters, which improves its ability to fit the training data.
Pooling is added after the first and third convolutional layers; each pooling layer keeps the channel count of the preceding layer, with a 3 × 3 kernel and a 2 × 2 stride, so that overlapped sampling captures richer information.
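The design points above can be walked through with the standard convolution and transposed-convolution size formulas. Note that the text does not state the convolution kernel sizes or padding, so the 3 × 3 "same" convolutions below are assumptions made only to illustrate the shape flow:

```python
def conv_out(size, kernel, stride, pad=0):
    """Output side length of a convolution / pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel, stride):
    """Output side length of a zero-padding transposed convolution."""
    return (size - 1) * stride + kernel

s = 32                              # 32 x 32 two-dimensional point cloud image
s = deconv_out(s, 2, 2)             # first deconv: kernel 2, stride 2 -> 64
s = deconv_out(s, 2, 1)             # second deconv: kernel 2, stride 1 -> 65
for i, channels in enumerate((64, 128, 256)):
    s = conv_out(s, 3, 1, pad=1)    # assumed 3x3 'same' convolution
    if i in (0, 2):                 # pooling after conv layers 1 and 3
        s = conv_out(s, 3, 2)       # 3x3 overlapped pooling, stride 2
print(s)
```

Under these assumptions the feature map shrinks 65 → 32 → 15 across the conv/pool stages, ending at 15 × 15 × 256 before any classifier layers.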
Following the experimental design of the medium-CNN point cloud model classification, the three-dimensional models in ModelNet10 are converted into 32 × 32 two-dimensional point cloud images and fed into PCI2CNN to complete feature extraction and model classification; FIG. 12 shows the test results compared against AlexNet and LeNet. As the figure shows, the proposed PCI2CNN achieves about 92% classification accuracy on ModelNet10, a clear improvement over LeNet's 88%. Compared with AlexNet, accuracy improves by about 2% and stability improves markedly; meanwhile, a rough estimate of the parameter counts of the two networks shows that PCI2CNN has about 95.5% fewer parameters than AlexNet. The experimental results show that the proposed PCI2CNN better matches the characteristics of two-dimensional point cloud images, with good classification performance, high stability, and few parameters. Together, the three groups of experiments also demonstrate the effectiveness of the classification framework designed here for three-dimensional point cloud models.
Experimental environment: PCI2CNN is implemented on the Ubuntu 14.04 operating system with the open-source deep learning framework TensorFlow; the hardware platform is an Intel i7-2600K CPU, a Colorful GTX 1060 6G GPU, and 8 GB of RAM.
The experiments aim to test the classification capability of the above convolutional-neural-network-based three-dimensional point cloud model classification method.
The dataset used in the experiments, Princeton ModelNet, is a rigid three-dimensional model dataset whose models are all placed upright along the Z axis. For convenience and for comparison with other work, two subsets of the dataset, ModelNet10 and ModelNet40, are used here as benchmark datasets to test PCI2CNN. ModelNet10 contains 4899 rigid models in 10 classes; following the official split, 3991 are training samples and 908 are test samples. ModelNet40 contains 12311 rigid models in 40 classes; following the official split, 9842 are training samples and 2468 are test samples. On this basis, the Point Cloud Library (PCL) is used to uniformly sample 1024 points on each triangular mesh surface and normalize them to the unit sphere, yielding the point cloud model of the given triangular mesh.
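The unit-sphere normalization step can be sketched as follows. The uniform surface sampling itself is done with PCL in the original work; this pure-Python sketch only illustrates the normalization, assuming points are given as (x, y, z) tuples:

```python
def normalize_to_unit_sphere(points):
    """Center a sampled point cloud at the origin and scale it so the
    farthest point lies on the unit sphere."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    centered = [(x - cx, y - cy, z - cz) for x, y, z in points]
    # maximum distance from the centroid; guard against a degenerate cloud
    r = max((x * x + y * y + z * z) ** 0.5 for x, y, z in centered) or 1.0
    return [(x / r, y / r, z / r) for x, y, z in centered]
```

After this step every model, regardless of its original scale, occupies the same coordinate range, which is what makes the later ordering and two-dimensional imaging comparable across models.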
During training, each epoch randomly shuffles the order of the training samples before feeding them to PCI2CNN. Meanwhile, to enlarge the training data, reduce network overfitting, and improve prediction robustness, each batch is given a random angle θ ∈ [0, 2π], the point cloud model is rotated by θ around the Z axis, and Gaussian noise with mean 0 and standard deviation 0.02 is added to randomly jitter the point cloud data.
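The augmentation described above (random rotation about the Z axis plus Gaussian jitter with standard deviation 0.02) can be sketched as:

```python
import math
import random

def augment(points, sigma=0.02):
    """Rotate a point cloud by a random angle about the Z axis and add
    small Gaussian jitter to every coordinate."""
    theta = random.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for x, y, z in points:
        xr, yr = c * x - s * y, s * x + c * y   # Z-axis rotation leaves z fixed
        out.append((xr + random.gauss(0, sigma),
                    yr + random.gauss(0, sigma),
                    z + random.gauss(0, sigma)))
    return out
```

Because the rotation is about Z, the z coordinate is unchanged up to jitter and distances from the Z axis are preserved up to jitter, which is why this augmentation is compatible with the Z-axis-upright assumption of ModelNet.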
First, the ordering methods for point cloud data are tested and analyzed. With ModelNet10 as the benchmark data, four different point cloud ordering methods are constructed from the ordering schemes provided by the invention: a centroid ordering method and three single-dimension ordering methods (coordinate ordering along the X, Y, and Z axes). A two-dimensional point cloud image of each point cloud model is then built by line scanning and fed into PCI2CNN to complete classification; the experimental results are shown in FIG. 13.
As can be seen from FIG. 13: (1) the classification accuracy of every ordering scheme exceeds 89%; (2) ordering by Z-axis coordinate is clearly superior to the other methods, with about 92% accuracy, roughly 2% higher than the rest; (3) the X-axis, Y-axis, and centroid orderings achieve comparable accuracy of about 90%. This indicates that the algorithm framework is effective and the classification methods built on it have good overall performance. The models in ModelNet10 are placed upright along the Z axis, so data ordered in that direction best match the spatial structure of the objects and human visual cognition, giving the highest accuracy. Objects may be rotated arbitrarily about the Z axis, i.e., their poses along the X and Y axes are random, so ordering along those two axes shows no obvious regularity and yields similar accuracy, lower than Z-axis ordering. In the centroid-distance ordering, symmetry means that points far apart in space may become adjacent after ordering, so its results are merely average.
The above experiments show that ordering by the Z axis yields the best classification performance. Therefore, when testing and analyzing the two-dimensional ordering method, the points are first ordered by Z, then sliced at equal intervals, and each slice is reordered by Y coordinate to form the final ordering. The experiment uses five different slice counts: 16, 32, 64, 128, and 1024 slices (the last being direct Z-axis ordering without slicing); the experimental results are shown in FIG. 14.
As can be seen from FIG. 14, 64 slices give the best classification, with a top test accuracy of 93.97%, about 2% higher than ordering by Z coordinate alone; 16 slices give the worst result, even below plain Z-coordinate ordering; 32 and 128 slices perform similarly. The reason is that with 64 slices of 1024 points, each slice averages 16 points whose Z coordinates are similar and which carry some planar information; reordering them by Y then captures spatial information in more dimensions than Z ordering alone, so performance is best. As the slice count grows further, the points per slice shrink and can no longer effectively express the geometry of a plane, so the benefit of Y-axis reordering keeps decreasing. Conversely, as the slice count shrinks, each slice contains too many points with large differences in Z, and reordering by Y destroys the original advantage of Z ordering, so performance again deteriorates.
In subsequent experiments, therefore, the point cloud is divided into 64 equal slices by Z coordinate in ascending order, and each slice is reordered by Y coordinate to complete the ordering of the point cloud data.
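The adopted ordering (global Z sort, 64 equal-size slices, Y reorder inside each slice) can be sketched as:

```python
def order_points(points, n_slices=64):
    """Order a point cloud: sort all points by Z, cut the sorted list
    into n_slices equal-size slices, then re-sort each slice by Y."""
    pts = sorted(points, key=lambda p: p[2])               # global Z sort
    per = len(pts) // n_slices                             # points per slice
    ordered = []
    for i in range(n_slices):
        chunk = pts[i * per:(i + 1) * per]
        ordered.extend(sorted(chunk, key=lambda p: p[1]))  # Y within slice
    ordered.extend(pts[n_slices * per:])                   # any remainder
    return ordered
```

For 1024 points and 64 slices, each slice holds 16 points with similar Z values, so the resulting sequence carries both the global Z ordering across slices and a Y ordering within each slice, as described above.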
Second, the two-dimensional imaging methods for ordered point clouds are tested and analyzed. With ModelNet10 as the benchmark data, after ordering, two-dimensional images of the ordered point cloud data are constructed in three different ways (line scanning, chessboard, and spiral) and fed into PCI2CNN to complete classification. The experimental results are shown in FIG. 15: line scanning achieves the highest classification accuracy, about 1% above the other two methods, while the chessboard and spiral methods perform comparably. Closer analysis shows that although line-scanning imaging is not isotropic, the relationships between pixels and between point cloud points correspond consistently along a single direction. In the spiral and chessboard methods, the relationships between pixels in adjacent rows or columns lack this consistency: in the spiral method, point clouds that were originally close may be far apart at the image edges, and in the chessboard method, pixels at the boundary of adjacent cells are separated in the ordered point cloud by almost a whole cell. Both effects degrade the classification to some extent.
Finally, the invention compares and analyzes the classification results. Several typical three-dimensional model classification methods were chosen for comparison with PCI2CNN, including PointCNN, recently proposed by Yangyan Li et al. of Shandong University. The experimental data are shown in Table 1; the analysis below covers ModelNet10 and ModelNet40 in turn.
TABLE 1. Classification results on the ModelNet datasets; "-" indicates that the corresponding information is not provided in the relevant paper
1) Experimental results and analysis on ModelNet10
On ModelNet10, the algorithm herein achieves 93.97% classification accuracy, ranking second among all methods. Because point cloud data carry little information and are unordered and irregular, a network taking point clouds as input can acquire less information than other kinds of networks; this level of accuracy therefore fully demonstrates the effectiveness of the method. Detailed comparisons and analyses follow:
compared with the three-dimensional voxel-based methods, the present method outperforms all of them except VRN Ensemble. Yet in terms of parameter count, the network constructed herein has only 1/45 the parameters of VRN Ensemble. Among the voxel methods with accuracy comparable to ours, VRN has 9 times our parameter count, and ORION requires both a voxel-model input and direction information supplied in advance.
Compared with the multi-view-based methods, the present method achieves the highest classification accuracy. Because multiple views contain a large amount of information, networks taking them as input are often complex and time-consuming to train. Specifically, compared with DeepPano and Pairwise, our classification accuracy improves by 5.31% and 1.17% respectively, and our parameter count is about 2 orders of magnitude smaller than Pairwise.
Compared with other networks based on point cloud data, our classification accuracy improves by 0.89% and 2.01% over PointNet and PointNet (vanilla) respectively. FIG. 16 shows the test accuracy curves of the three networks during training, with the abscissa the number of training epochs and the ordinate the test accuracy. From the initial iterations to final convergence, our network's classification performance is better than PointNet and PointNet (vanilla), and its classification is stable; by about epoch 25 the test accuracy of our method already exceeds 90%, which also fully demonstrates the network's good generalization.
Further analysis shows that the three-dimensional points of a point cloud exist in isolation in three-dimensional space: the points have no direct connectivity, only positional relationships to one another. In the PointNet network, all convolutions operate on a single three-dimensional point, so the extracted features are abstractions of individual point coordinates, and global information over all three-dimensional points is finally obtained by a max pooling layer, lacking feature extraction over related points. After ordering, our point cloud is converted into a two-dimensional matrix that preserves, to some extent, the relative relationships among three-dimensional spatial points; it can therefore capture more feature information than PointNet, hence the higher classification accuracy.
2) Experimental results and analysis on ModelNet40
On ModelNet40, the classification accuracy of the algorithm herein is 89.75%, ranking in the middle overall. The comparisons with the voxel and multi-view methods are similar to those on ModelNet10 and are not repeated. Here we mainly analyze the various point-cloud-based deep learning methods.
Compared with the point-cloud-classification networks PointNet and PointNet (vanilla), our classification accuracy improves by 0.55% and 2.55% respectively, with 39% fewer parameters than PointNet. The reasons are the same as for ModelNet10.
Compared with PointNet++ and PointCNN, our classification accuracy is 0.95% and 1.95% lower respectively, a notable gap. The reason is that both PointNet++ and PointCNN can completely and effectively capture feature information of the three-dimensional point cloud at different scales. Although we use a CNN to capture local feature information, our local feature capture may be incomplete or mixed because the relationships among the ordering, the two-dimensional imaging, and the deconvolutions, convolution kernels, and strides in the CNN are not considered jointly; this merits further research and thought.
In summary, the experiments on ModelNet10 and ModelNet40 fully demonstrate the effectiveness of the method, which is worthy of wider adoption.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; therefore, any change made according to the shape and principle of the present invention shall fall within the protection scope of the present invention.

Claims (3)

1. A three-dimensional point cloud model classification method based on a convolutional neural network is characterized by comprising the following steps:
s1, selecting Princeton ModelNet, and selecting a required number of models from the official website as training data and test data aiming at ModelNet10 and ModelNet40 respectively to generate a training set and a test set;
s2, performing feature analysis on the point cloud model and constructing a classification framework;
s3, ordering the point clouds;
s4, performing two-dimensional imaging on the ordered point cloud data to obtain a two-dimensional point cloud image;
inputting the ordered sequence S = ((x_i, y_i, z_i) | i = 1, …, n) of the disordered point cloud data, and placing the ordered sequence into a two-dimensional image A = (a_jk)_{p×q}, where p × q = n; A is the two-dimensional matrix corresponding to the generated image; j and k are the row and column of a pixel, respectively; a_jk is the pixel value at row j, column k of the two-dimensional matrix; p and q are the numbers of rows and columns of the matrix, respectively; n is the number of points contained in the point cloud model; the placement must satisfy the requirement that point cloud data corresponding to adjacent pixels in the image are close in spatial position, and for this requirement the following three different two-dimensional imaging methods are designed:
line scanning method: simulating the motion of a fluorescent screen electron beam, sequentially taking out ordered point cloud data from front to back, and filling the ordered point cloud data into a two-dimensional image line by line from left to right and from top to bottom;
a chessboard method: the ordered point cloud data are taken out sequentially from front to back; the cells are filled one by one from left to right and from top to bottom, and within each cell the pixels are likewise filled from left to right and from top to bottom; when the cell size is 8 × 8, each cell corresponds to a local point cloud region of 64 points;
a spiral method: the ordered point cloud data are taken out sequentially from front to back and filled along a spiral trajectory starting from the central pixel of the image;
s5, constructing a CNN network for two-dimensional point cloud images, comprising: point cloud model classification based on a medium CNN, point cloud model classification based on a small CNN, and CNN construction and classification for two-dimensional point cloud images, as follows:
classifying the point cloud model based on the medium CNN: before the point cloud image is input, deconvolution operations are first applied to the image data to achieve high-resolution reconstruction of the point cloud image and acquire more spatial correlation information;
classifying point cloud models based on the small CNN: LeNet is a small CNN network comprising two convolutional layers, two pooling layers, and three fully connected layers; the input of the network is a 32 × 32 image, which exactly matches the size of a two-dimensional point cloud image of 1024 points; the experimental design of the medium-CNN point cloud model classification is directly followed, the deconvolution sub-module is removed, and the 32 × 32 two-dimensional point cloud image is input into LeNet to complete feature extraction and model classification;
CNN construction and classification for two-dimensional point cloud images: the characteristics of the point cloud model are analyzed and, combining the above two sets of experimental results, a convolutional neural network PCI2CNN for two-dimensional point cloud image classification is designed; the design ideas of the network are as follows:
the network comprises one group of 2 deconvolutions, in which the number of channels is 64, the kernel size is 2 × 2, and the strides are 2 × 2 and 1 × 1 respectively;
comprises 3 convolution layers, and the channel numbers are respectively 64,128 and 256;
pooling is added after the first and third convolutional layers; the number of channels of each pooling layer is the same as that of the preceding layer, with a kernel size of 3 × 3 and a stride of 2 × 2.
2. The convolutional neural network-based three-dimensional point cloud model classification method according to claim 1, wherein: in step S1, Princeton model net is selected, official website data is used, and 3991 and 9842 models are selected as training data and 908 and 2468 models are selected as test data for model net10 and model net40, respectively.
3. The convolutional neural network-based three-dimensional point cloud model classification method according to claim 1, wherein: in step S2, a general classification framework of the three-dimensional point cloud model is designed, which includes the following three modules:
the ordering module of the point cloud data is used for realizing the ordering of the disordered point cloud data;
the two-dimensional imaging module of the ordered point cloud data is used for realizing the regularization of the point cloud data;
the CNN module for two-dimensional point cloud images comprises two parts: a deconvolution sub-module and a small- or medium-CNN classification sub-module.
CN201810790133.1A 2018-07-18 2018-07-18 Three-dimensional point cloud model classification method based on convolutional neural network Active CN109063753B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810790133.1A CN109063753B (en) 2018-07-18 2018-07-18 Three-dimensional point cloud model classification method based on convolutional neural network


Publications (2)

Publication Number Publication Date
CN109063753A CN109063753A (en) 2018-12-21
CN109063753B true CN109063753B (en) 2021-09-14

Family

ID=64817263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810790133.1A Active CN109063753B (en) 2018-07-18 2018-07-18 Three-dimensional point cloud model classification method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN109063753B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840938B (en) * 2018-12-30 2022-12-23 芜湖哈特机器人产业技术研究院有限公司 Point cloud model reconstruction method for complex automobile
CN109919046B (en) * 2019-02-19 2020-10-13 清华大学 Three-dimensional point cloud feature learning method and device based on relational features
CN110097047B (en) * 2019-03-19 2021-10-08 同济大学 Vehicle detection method based on deep learning and adopting single line laser radar
CN110069993B (en) * 2019-03-19 2021-10-08 同济大学 Target vehicle detection method based on deep learning
CN111832582B (en) * 2019-04-15 2023-07-21 中国矿业大学(北京) Method for classifying and segmenting sparse point cloud by utilizing point cloud density and rotation information
CN110109995B (en) * 2019-05-14 2021-12-17 中国矿业大学 Fully mechanized mining face multi-source heterogeneous data fusion method based on deep learning
CN112115744B (en) * 2019-06-20 2024-05-21 北京京东叁佰陆拾度电子商务有限公司 Point cloud data processing method and device, computer storage medium and electronic equipment
CN110263724A (en) * 2019-06-21 2019-09-20 腾讯科技(深圳)有限公司 Image identification method, identification model training method, device and storage medium
CN110782531A (en) * 2019-09-16 2020-02-11 华为技术有限公司 Method and computing device for processing three-dimensional point cloud data
CN110738194B (en) * 2019-11-05 2023-04-07 电子科技大学中山学院 Three-dimensional object identification method based on point cloud ordered coding
CN111028151B (en) * 2019-12-03 2023-05-26 西安科技大学 Point cloud data splicing method based on graph residual neural network fusion
CN110942110A (en) * 2019-12-31 2020-03-31 新奥数能科技有限公司 Feature extraction method and device of three-dimensional model
CN111274909B (en) * 2020-01-16 2022-05-20 重庆邮电大学 Human body point cloud framework extraction method based on deep learning
CN111310811B (en) * 2020-02-06 2021-01-15 东华理工大学 Large-scene three-dimensional point cloud classification method based on multi-dimensional feature optimal combination
CN111460193B (en) * 2020-02-28 2022-06-14 天津大学 Three-dimensional model classification method based on multi-mode information fusion
CN111444927B (en) * 2020-03-26 2023-05-26 广州市炜城智能科技有限公司 Method for identifying animal and plant pests by using multi-dimensional morphology of laser technology
WO2021248390A1 (en) * 2020-06-10 2021-12-16 深圳市大疆创新科技有限公司 Point cloud sorting method, and apparatus
CN112070105B (en) * 2020-07-14 2023-07-11 辽宁师范大学 Non-rigid three-dimensional model classification method based on double-channel convolutional neural network learning
CN112435329A (en) * 2020-12-01 2021-03-02 山东鲁能软件技术有限公司 Power transmission equipment programmed modeling method and device based on laser point cloud data
CN112967296B (en) * 2021-03-10 2022-11-15 重庆理工大学 Point cloud dynamic region graph convolution method, classification method and segmentation method
CN112906829B (en) * 2021-04-13 2022-11-08 成都四方伟业软件股份有限公司 Method and device for constructing digital recognition model based on Mnist data set
CN113076958B (en) * 2021-04-25 2023-07-18 华南理工大学 Three-dimensional point cloud model classification method and system with rotation invariance
CN113361473B (en) * 2021-06-30 2023-12-08 北京百度网讯科技有限公司 Image processing method, model training method, image processing device, model training apparatus, storage medium, and program
CN115290650B (en) * 2022-09-29 2023-01-17 南京航空航天大学 Method and system for detecting hole characteristics of composite material wallboard based on point cloud

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102592538A (en) * 2010-09-30 2012-07-18 Casio Computer Co., Ltd. Display drive device, display device, driving control method and electronic device
CN105630905A (en) * 2015-12-14 2016-06-01 Xi'an University of Science and Technology Hierarchical compression method and apparatus based on scattered point cloud data
CN107092859A (en) * 2017-03-14 2017-08-25 Foshan University Depth feature extraction method for three-dimensional models
CN107578463A (en) * 2017-09-28 2018-01-12 Chery Automobile Co., Ltd. Method and apparatus for rasterizing radar point cloud data
CN108230329A (en) * 2017-12-18 2018-06-29 Sun Ying Semantic segmentation method based on multi-scale convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ImageNet Classification with Deep Convolutional Neural Networks; Alex Krizhevsky et al.; Communications of the ACM; June 2017; Vol. 60, No. 6; pp. 84-90 *
An ordered simplification method for scattered layered point clouds; Xie Zexiao et al.; Journal of Graphics; June 2016; Vol. 37, No. 3; pp. 359-366 *

Also Published As

Publication number Publication date
CN109063753A (en) 2018-12-21

Similar Documents

Publication Publication Date Title
CN109063753B (en) Three-dimensional point cloud model classification method based on convolutional neural network
Ye et al. 3d recurrent neural networks with context fusion for point cloud semantic segmentation
Amirkolaee et al. Height estimation from single aerial images using a deep convolutional encoder-decoder network
CN108596248B (en) Remote sensing image classification method based on improved deep convolutional neural network
CN110322453B (en) 3D point cloud semantic segmentation method based on position attention and auxiliary network
Qi et al. Volumetric and multi-view cnns for object classification on 3d data
CN111028327B (en) Processing method, device and equipment for three-dimensional point cloud
CN107239751B (en) High-resolution SAR image classification method based on non-subsampled contourlet full convolution network
CN111079685B (en) 3D target detection method
WO2020052678A1 (en) Method and system for generating synthetic point cloud data using a generative model
CN113065546B (en) Target pose estimation method and system based on attention mechanism and Hough voting
CN102968766B (en) Dictionary database-based adaptive image super-resolution reconstruction method
CN112529015A (en) Three-dimensional point cloud processing method, device and equipment based on geometric unwrapping
CN108460749B (en) Rapid fusion method of hyperspectral and multispectral images
CN111951368B (en) Deep learning method for point cloud, voxel and multi-view fusion
CN116543117B (en) High-precision large-scene three-dimensional modeling method for unmanned aerial vehicle images
CN113601306B (en) Charging facility box body weld joint polishing method based on one-dimensional segmentation network
EP4174792A1 (en) Method for scene understanding and semantic analysis of objects
CN112766392A (en) Image classification method of deep learning network based on parallel asymmetric hole convolution
CN110647977B (en) Method for optimizing Tiny-YOLO network for detecting ship target on satellite
CN111833432B (en) Three-dimensional reconstruction method based on core two-dimensional gray scale image
CN115830592A (en) Overlapping cervical cell segmentation method and system
Li et al. Primitive fitting using deep geometric segmentation
Jayaraman et al. Quadtree convolutional neural networks
CN114611667B (en) Reconstruction method for calculating feature map boundary based on small-scale parameter matrix

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant