CN108710906B - Real-time point cloud model classification method based on lightweight network LightPointNet - Google Patents

Real-time point cloud model classification method based on lightweight network LightPointNet

Info

Publication number
CN108710906B
Authority
CN
China
Prior art keywords
layer
convolution
network
point cloud
lightpointnet
Prior art date
Legal status
Active
Application number
CN201810446480.2A
Other languages
Chinese (zh)
Other versions
CN108710906A (en)
Inventor
白静
司庆龙
刘振刚
Current Assignee
North Minzu University
Original Assignee
North Minzu University
Priority date
Application filed by North Minzu University filed Critical North Minzu University
Priority to CN201810446480.2A
Publication of CN108710906A
Application granted
Publication of CN108710906B

Classifications

    • G06F18/24 Classification techniques (G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING › G06F18/00 Pattern recognition › G06F18/20 Analysing)
    • G06N3/045 Combinations of networks (G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/08 Learning methods (G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks)

Abstract

The invention discloses a real-time point cloud model classification method based on the lightweight network LightPointNet, which comprises the following steps: S1, analyzing the structural characteristics of deep convolutional neural networks and designing the lightweight real-time point cloud network LightPointNet according to application requirements, wherein LightPointNet comprises at least an input layer, convolutional layers, a fully-connected layer and an output layer; a pooling layer follows the last convolutional layer, and max pooling is applied to each feature channel of the last convolutional layer to generate a feature value; S2, inputting data and convolving it through three convolutional layers to obtain the final feature values; S3, inputting the obtained feature values into the fully-connected layer for classification; and S4, obtaining the final classification accuracy from the classification operation of S3. The invention has the advantages of few network layers, few parameters, high processing speed and good classification performance.

Description

Real-time point cloud model classification method based on lightweight network LightPointNet
Technical Field
The invention relates to the technical field of computer graphics, computer vision and intelligent identification, in particular to a real-time point cloud model classification method based on a lightweight network LightPointNet.
Background
With the wide application of 3D sensors such as laser scanners and RGBD cameras in fields such as robotics, autonomous driving and three-dimensional scene roaming, point cloud data is growing rapidly, and high-level semantic understanding of point cloud models is drawing increasing attention. However, in sharp contrast to the development of technologies in the image field, research based on point cloud models has progressed slowly owing to their disorder, sparsity and limited information. Aiming at the high-level semantic understanding requirements of point cloud models, the invention studies a classification method oriented to point cloud models and provides a deep learning network that meets the application requirements.
A deep learning network for point cloud classification needs to be real-time and compact. In application scenarios of the point cloud model, such as autonomous driving, preliminary segmentation and recognition must always be completed from point cloud data acquired in real time so as to form timely decisions. Meanwhile, in many application scenarios, tasks such as segmentation and recognition must run on small embedded devices, which requires a smaller network structure and fewer occupied resources. In view of this, the invention provides the lightweight real-time point cloud network LightPointNet, which has few parameters and high speed while maintaining good classification performance.
With the wide adoption of novel technologies such as autonomous driving and robotics, the real-time requirements of point cloud processing keep rising. Currently, point cloud models are processed mainly with the following three networks: a. the PointNet network; b. the PointNet++ network; c. the Li-PointCNN network.
PointNet network: facing the difficulty of point cloud data processing, Charles R Q, Su H et al. of Stanford University (Charles R Q, Su H, Mo K, et al. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation [J]. 2016) first proposed PointNet at CVPR 2017, giving a solution. PointNet is the first deep neural network to directly process disordered point cloud data. In general, a deep neural network requires input with a normalized format, such as a two-dimensional image or time-sequential speech. Raw three-dimensional point cloud data is usually a disordered point set in space: suppose a point cloud contains N three-dimensional points, each represented by (x, y, z) coordinates; even ignoring changes of occlusion, viewing angle and the like, these points can be presented in any of N! orders. It is therefore necessary to design a function whose value is independent of the order of the input data; in algebraic combinatorics such functions are called symmetric functions. PointNet uses the max pooling layer as its main symmetric function: max pooling outputs the same result regardless of input order, handling the disorder of the point cloud model. A symmetric function behaves like natural-number addition: changing the input order leaves the output unchanged. They performed experiments on 13 classes of the Stanford dataset, evaluated with mIoU (mean IoU), with an accuracy of 78.62%.
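The order-invariance of max pooling can be illustrated in a few lines; the snippet below is our own minimal demonstration in NumPy, not code from PointNet or from this patent.

    import numpy as np

    rng = np.random.default_rng(0)
    features = rng.standard_normal((8, 64))       # 8 points, 64 feature channels each

    pooled = features.max(axis=0)                 # max over the point dimension
    shuffled = features[rng.permutation(8)]       # present the same points in another order
    assert np.array_equal(pooled, shuffled.max(axis=0))   # identical global feature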
PointNet++ network: the network was improved by Su H et al. (Qi C R, Yi L, Su H, et al. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space [J]. 2017), achieving an accuracy of 86.13%.
Li-PointCNN network: the network was proposed by Li Y et al. of Shandong University (Li Y, Bu R, Sun M, et al. PointCNN [J]. 2018) and extracts features from the relations among neighboring points. Li-PointCNN provides a method called the X-transform, which addresses the difficulty of realizing effective convolution on point clouds. The X-transform is a set of weights X learned from the input points that can re-weight and re-order the features associated with each point. The X-transform adapts to permutations of the input: when the order of the input points changes, X changes accordingly, leaving the weighted and permuted features approximately unchanged. After X-transform processing, the input features become normalized features that are independent of input order while also encoding the shape information of the input points. Convolving the X-transformed features greatly improves the utilization of the convolution kernel, and thereby the ability of the convolution operation to extract features from point-set data. Better results than PointNet++ were obtained on ModelNet40.
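The following is a rough conceptual sketch (our own paraphrase in NumPy, not PointCNN's actual implementation) of the X-transform idea: a K × K matrix X predicted from K neighboring points re-weights and re-orders their features before convolution. The single-layer tanh map below is a random stand-in for the learned MLP.

    import numpy as np

    K, C = 4, 16                                  # neighborhood size, feature width
    rng = np.random.default_rng(1)
    points = rng.standard_normal((K, 3))          # K neighboring points
    feats = rng.standard_normal((K, C))           # their C-dimensional features

    # Stand-in for a learned MLP: map the 3K coordinates to a K x K matrix X.
    W = rng.standard_normal((3 * K, K * K))
    X = np.tanh(points.reshape(-1) @ W).reshape(K, K)

    feats_weighted = X @ feats                    # re-weighted, re-ordered features
    # A convolution then operates on feats_weighted; once X is learned, this product
    # is approximately invariant to re-orderings of the K input points.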
In summary, the main problems of existing point cloud processing networks are as follows: a. the PointNet and PointNet++ networks require a large number of parameters, take long to train, and their classification accuracy is not high; b. PointCNN's transformation-based ordering of point cloud data complicates the network structure to some extent and is not well suited to real-time point cloud recognition.
Disclosure of Invention
The invention aims to solve the problems of complex network structure, long training time and large required data volume in existing deep learning networks for point cloud model recognition, and provides a real-time point cloud model classification method based on the lightweight network LightPointNet.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: the real-time point cloud model classification method based on the lightweight network LightPointNet comprises the following steps:
S1, analyzing the structural characteristics of deep convolutional neural networks and designing the lightweight real-time point cloud network LightPointNet according to application requirements, wherein LightPointNet comprises at least an input layer, convolutional layers, a fully-connected layer and an output layer; a pooling layer follows the last convolutional layer, and max pooling is applied to each feature channel of the last convolutional layer to generate a feature value;
S2, inputting data and convolving it through three convolutional layers to obtain the final feature values;
S3, inputting the obtained feature values into the fully-connected layer for classification;
and S4, obtaining the final classification accuracy from the classification operation of S3.
In steps S1 and S2, a point cloud model containing n points is input; each point consists of three coordinates (x, y, z), so the input as a whole is an n × 3 tensor. Convolutional layers with parameters (64,3,1) and (128,1,1) produce 64 and 128 feature maps respectively; the resulting 128 feature maps are further fused by a convolutional layer with parameters (128,1,1); max pooling then yields a 1 × 128 tensor; the 128-dimensional feature vector is sent to the fully-connected layer to obtain a 1 × 256 tensor; and finally classification is performed.
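A minimal TensorFlow/Keras sketch of this data flow is given below for illustration. It follows the layer parameters stated above; the ReLU activations, the softmax output, the builder name and n = 1024 points are our own assumptions, since the patent specifies only generic activation function layers and a Loss layer.

    import tensorflow as tf

    def build_lightpointnet(n_points, num_classes):
        # The n x 3 point cloud enters as a single-channel image-like tensor.
        inputs = tf.keras.Input(shape=(n_points, 3, 1))
        x = tf.keras.layers.Conv2D(64, (1, 3), strides=(1, 1),
                                   activation="relu")(inputs)        # -> (n, 1, 64)
        x = tf.keras.layers.Conv2D(128, (1, 1), strides=(1, 1),
                                   activation="relu")(x)             # -> (n, 1, 128)
        x = tf.keras.layers.Conv2D(128, (1, 1), strides=(1, 1),
                                   activation="relu")(x)             # -> (n, 1, 128)
        x = tf.keras.layers.MaxPooling2D((n_points, 1))(x)           # -> (1, 1, 128)
        x = tf.keras.layers.Flatten()(x)                             # 128-d global feature
        x = tf.keras.layers.Dense(256, activation="relu")(x)         # fully-connected layer
        x = tf.keras.layers.Dropout(0.5)(x)                          # ratio as in the embodiment
        outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
        return tf.keras.Model(inputs, outputs)

    model = build_lightpointnet(n_points=1024, num_classes=40)   # n = 1024 is an assumption
    model.summary()   # 68392 trainable parameters; the patent's 68520 also counts 128 for pooling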
In step S1, the designed lightweight real-time point cloud network LightPointNet has 6 layers: the first layer is the input layer, the second to fifth layers are hidden layers, and the sixth layer is the output layer. Each layer is structured as follows:
First layer, input layer: the input data is a point cloud containing n points; each point consists of three coordinates (x, y, z), and the whole is an n × 3 tensor;
Second layer, hidden layer: a convolutional layer and an activation function layer;
Third layer, hidden layer: a convolutional layer and an activation function layer;
Fourth layer, hidden layer: a convolutional layer, an activation function layer and a pooling layer;
Fifth layer, hidden layer: a fully-connected layer, an activation function layer and a Dropout layer;
Sixth layer, output layer: a Loss layer.
The specific data for each layer are as follows:
First convolutional layer: convolution kernel 1 × 3, stride 1 × 1, output dimension 64 × n × 1, 256 parameters;
Second convolutional layer: convolution kernel 1 × 1, stride 1 × 1, output dimension 128 × n × 1, 8320 parameters;
Third convolutional layer: convolution kernel 1 × 1, stride 1 × 1, output dimension 128 × n × 1, 16512 parameters;
Pooling layer: kernel n × 1, stride 1 × 1, output dimension 128 × 1, 128 parameters;
Fully-connected layer: output dimension 256, 33024 parameters;
Output layer: output dimension k, 257k parameters;
The total number of parameters required by the lightweight real-time point cloud network LightPointNet is therefore 58240 + 257k, where n is the number of points in the point cloud and k is the number of classes. The required parameters are only on the order of 10⁻² M.
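The totals above can be checked with a few lines of arithmetic; this is our own verification, following the counting convention of the layer list above (which attributes 128 parameters to the pooling layer).

    conv1 = (1 * 3 * 1 + 1) * 64      # 256
    conv2 = (64 + 1) * 128            # 8320
    conv3 = (128 + 1) * 128           # 16512
    pool = 128                        # as counted above
    fc = (128 + 1) * 256              # 33024
    fixed = conv1 + conv2 + conv3 + pool + fc
    assert fixed == 58240

    def total_params(k):              # k = number of classes
        return fixed + (256 + 1) * k  # the output layer contributes 257k

    print(total_params(40))           # 68520 for ModelNet40, i.e. about 0.07M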
In step S2, the convolution operates as follows:
S2.1, a convolutional layer with 64 channels, a 1 × 3 convolution kernel and a 1 × 1 stride fuses the (x, y, z) coordinates of each point and expands them into 64 feature values;
S2.2, a convolutional layer with 128 channels, a 1 × 1 convolution kernel and a 1 × 1 stride expands the 64 feature values of each point into 128 feature values;
S2.3, a convolutional layer with 128 channels, a 1 × 1 convolution kernel and a 1 × 1 stride further fuses the 128 feature values from the previous convolutional layer;
and S2.4, the resulting 128-dimensional features are input into the fully-connected layer to obtain a 1 × 256 tensor, and classification is finally performed.
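A small illustration (our own, in NumPy rather than the patent's implementation) of why 1 × 1 convolutions suit point clouds: a 1 × 1 convolution applies one shared dense transform to every point independently, so no information is mixed across points and the per-point computation does not depend on point order.

    import numpy as np

    n, c_in, c_out = 5, 64, 128
    rng = np.random.default_rng(2)
    feats = rng.standard_normal((n, c_in))    # one feature row per point
    W = rng.standard_normal((c_in, c_out))    # stand-in for learned 1x1 kernel weights
    b = rng.standard_normal(c_out)

    conv_1x1 = feats @ W + b                  # "1 x 1 convolution" over all points at once
    per_point = np.stack([feats[i] @ W + b for i in range(n)])
    assert np.allclose(conv_1x1, per_point)   # identical: no mixing across points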
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The network structure is simple: it contains only the most basic convolution, pooling and full connection, and no local transformation networks for handling the disorder of point cloud data. Meanwhile, to ensure the convolutional layers are unaffected by the order of the input points, the convolution kernels of the network can only be 1 × 3 or 1 × 1.
2. The network design is symmetric, so that the many disordered point cloud representations of the same three-dimensional object yield similar network outputs. The design draws on the pooling layer of PointNet: max pooling extracts the global features of the point cloud at the end of the successive convolutional layers, making the network insensitive to the ordering of the points and ensuring that similar global features are extracted from different orderings of the same point cloud.
3. The network hierarchy is shallow. Too deep a hierarchy causes two problems: first, excessive computation cost and overly long training and recognition times; second, a tendency to overfit. The network of the invention contains only 2-4 convolutional layers.
4. The number of channels in each convolutional layer must not be too large, for two main reasons: first, too many channels incur high computation cost and long training and recognition times; second, the data structure of a point cloud is a set of point coordinates in three-dimensional space, essentially a low-resolution resampling of the geometry of the three-dimensional world, so the amount of information a point cloud can provide is very limited. Too many channels do not benefit the expression of point cloud features and instead increase the complexity of the network. The number of channels of the convolutional layers in the network of the invention is set between 32 and 256.
5. The fully-connected part of the network must be small, and the number of nodes of each fully-connected layer should not be too large. The fully-connected layers are where the parameters of the network are most concentrated, and their design is the main factor affecting network complexity. The invention uses only 1-2 fully-connected layers after the pooling layer, with 128-1024 nodes per layer, to simplify the network architecture.
In summary, the invention simplifies the network structure to only 6 layers with only 1 fully-connected layer, greatly reducing the number of required parameters. As described above, under the same conditions the method greatly shortens network training and recognition time compared with other networks. The classification accuracy of the invention is the best on ModelNet10; on ModelNet40 it is slightly lower than Su-MVCNN, but with far fewer parameters (only 1/46 of those of even the lightweight PointNet, as detailed below).
Drawings
Fig. 1 is a diagram of a lightweight real-time point cloud network LightPointNet architecture prototype.
Fig. 2 is a light-weight real-time point cloud network LightPointNet architecture diagram.
Fig. 3 is a graph of classification accuracy when the fully-connected layer takes different node counts.
FIG. 4 is a graph comparing the classification performance of one-layer and two-layer fully-connected architectures.
FIG. 5 is a graph comparing the classification performance of the third layer convolution with different channel numbers.
Fig. 6 is a graph comparing the classification performance of LightPointNet with different convolution layer numbers.
FIG. 7 is a graph comparing the classification performance of the first layer convolution with different channel numbers.
Figure 8 is a comparison graph of LightPointNet network prototype and final architecture classification performance.
FIG. 9 is a graph comparing Classification Performance of LightPointNet, PointNet, and PointNet (vanilla).
Detailed Description
The present invention will be further described with reference to the following specific examples.
The real-time point cloud model classification method based on the lightweight network LightPointNet provided by this embodiment selects a deep convolutional neural network and simplifies the network structure, while preserving classification performance, by choosing the number of convolutional-layer channels and the number of neurons in the fully-connected layer. It comprises the following steps:
S1, analyzing the structural characteristics of deep convolutional neural networks and designing the lightweight real-time point cloud network LightPointNet according to application requirements, wherein LightPointNet comprises at least an input layer, convolutional layers, a fully-connected layer and an output layer; a pooling layer follows the last convolutional layer, and max pooling is applied to each feature channel of the last convolutional layer to generate a feature value;
S2, inputting data and convolving it through three convolutional layers to obtain the final feature values;
S3, inputting the obtained feature values into the fully-connected layer for classification;
and S4, obtaining the final classification accuracy from the classification operation of S3.
In steps S1 and S2, a point cloud model containing n points is input; each point consists of three coordinates (x, y, z), so the input as a whole is an n × 3 tensor. Convolutional layers with parameters (64,3,1) and (128,1,1) produce 64 and 128 feature maps respectively; the resulting 128 feature maps are further fused by a convolutional layer with parameters (128,1,1); max pooling then yields a 1 × 128 tensor; the 128-dimensional feature vector is sent to the fully-connected layer to obtain a 1 × 256 tensor; and finally classification is performed.
In step S1, the designed lightweight real-time point cloud network LightPointNet has 6 layers: the first layer is the input layer, the second to fourth layers are convolutional layers, the fifth layer is the fully-connected layer, and the sixth layer is the output layer. Each layer is structured as follows:
First layer, input layer: the input data is a point cloud containing n points; each point consists of three coordinates (x, y, z), and the whole is an n × 3 tensor;
Second layer, hidden layer: a convolutional layer and an activation function layer;
Third layer, hidden layer: a convolutional layer and an activation function layer;
Fourth layer, hidden layer: a convolutional layer, an activation function layer and a pooling layer;
Fifth layer, hidden layer: a fully-connected layer, an activation function layer and a Dropout layer;
Sixth layer, output layer: a Loss layer.
The specific data for each layer are as follows:
First convolutional layer: convolution kernel 1 × 3, stride 1 × 1, output dimension 64 × n × 1, 256 parameters;
Second convolutional layer: convolution kernel 1 × 1, stride 1 × 1, output dimension 128 × n × 1, 8320 parameters;
Third convolutional layer: convolution kernel 1 × 1, stride 1 × 1, output dimension 128 × n × 1, 16512 parameters;
Pooling layer: kernel n × 1, stride 1 × 1, output dimension 128 × 1, 128 parameters;
Fully-connected layer: output dimension 256, 33024 parameters;
Output layer: output dimension k, 257k parameters;
The total number of parameters required by the lightweight real-time point cloud network LightPointNet is therefore 58240 + 257k, where n is the number of points in the point cloud and k is the number of classes. The required parameters are only on the order of 10⁻² M.
In step S2, the convolution operates as follows:
S2.1, a convolutional layer with 64 channels, a 1 × 3 convolution kernel and a 1 × 1 stride fuses the (x, y, z) coordinates of each point and expands them into 64 feature values;
S2.2, a convolutional layer with 128 channels, a 1 × 1 convolution kernel and a 1 × 1 stride expands the 64 feature values of each point into 128 feature values;
S2.3, a convolutional layer with 128 channels, a 1 × 1 convolution kernel and a 1 × 1 stride further fuses the 128 feature values from the previous convolutional layer;
and S2.4, the resulting 128-dimensional features are input into the fully-connected layer to obtain a 1 × 256 tensor, and classification is finally performed.
Further, to avoid the excessive computation and the tendency to overfit of overly deep networks, only 3 convolutional layers are designed in the lightweight real-time point cloud network LightPointNet; experiments prove this effective for feature extraction.
Further, the LightPointNet network addresses the limited information content of point clouds and avoids an overly complex network structure by reducing the number of network channels, thereby simplifying the network.
Furthermore, only one fully-connected layer is arranged after the LightPointNet convolutional layers, and repeated experiments show that its number of neurons can be reduced without affecting classification performance, achieving the goal of simplifying the network structure.
Further, the number of parameters required by the LightPointNet network is independent of the point cloud scale; the specific data required by each layer in step S2 are shown in Table 1:
TABLE 1 LightPointNet network parameter calculation
Layer Kernel Stride Output dimension Parameters
First convolutional layer 1 × 3 1 × 1 64 × n × 1 256
Second convolutional layer 1 × 1 1 × 1 128 × n × 1 8320
Third convolutional layer 1 × 1 1 × 1 128 × n × 1 16512
Pooling layer n × 1 1 × 1 128 × 1 128
Fully-connected layer - - 256 33024
Output layer - - k 257k
Total - - - 58240 + 257k
The invention uses the ModelNet10 and ModelNet40 CAD model libraries with the official data: for ModelNet10 and ModelNet40 respectively, 3991 and 9843 models are selected as training data and 908 and 2468 models as test data.
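For illustration, one way to prepare a single model as the n × 3 input of step S1 is sketched below; the sampling count n = 1024 and the centering/scaling normalization are our own assumptions, as the patent does not fix them.

    import numpy as np

    def sample_points(vertices, n=1024):
        """Sample n points from a model's vertex array (m x 3) into an n x 3 tensor."""
        idx = np.random.choice(len(vertices), size=n, replace=len(vertices) < n)
        cloud = vertices[idx].astype(np.float32)
        cloud -= cloud.mean(axis=0)               # center at the origin
        cloud /= np.abs(cloud).max()              # scale into the unit cube
        return cloud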
Here the parameter counts of several networks for three-dimensional model classification are summarized, including voxel-based and view-based networks. The LightPointNet network of the invention has only about 0.07M parameters; Su-MVCNN requires about 800 times as many, and even the lightweight real-time network PointNet requires 46 times as many. On the hardware platform built by the invention, using the ModelNet10 and ModelNet40 datasets, training LightPointNet for 250 epochs (one complete training pass over the dataset is called an epoch) takes about 25 minutes and 1 hour respectively, while training PointNet for 250 epochs takes about 136 minutes and 6 hours; LightPointNet thus needs about 1/6 of PointNet's time and has better real-time performance. The comparative results are shown in table 2:
TABLE 2 parameters of several three-dimensional model classification networks
Method Number of parameters
PointNet 3.5M
Subvolume 16.6M
Su-MVCNN 60.0M
Ours About 0.07M
The LightPointNet network is implemented with the Google open-source deep learning framework TensorFlow under the Ubuntu 14.04 operating system; the hardware platform is an Intel i7 2600K CPU, a Colorful GTX 1060 6G GPU and 8G RAM. The inventors compared experimental results with other networks, including traditional hand-crafted-feature methods, voxel-based methods, view-based methods and point-cloud-based methods. The LightPointNet network has only about 0.07M parameters; Su-MVCNN, proposed by Su H et al. (Su H, Maji S, Kalogerakis E, et al. Multi-view convolutional neural networks for 3D shape recognition [C]// Proceedings of the IEEE International Conference on Computer Vision. Washington DC: IEEE Computer Society Press, 2015: 945-953), requires about 800 times the parameters of the invention, and even the lightweight real-time network PointNet proposed by Garcia-Garcia A et al. (Garcia-Garcia A, Gomez-Donoso F, Garcia-Rodriguez J, et al. PointNet: A 3D convolutional neural network for real-time object class recognition [C]// Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN). Washington DC: IEEE Computer Society Press, 2016: 1578-1584) requires 46 times as many. The comparative results are shown in table 3:
TABLE 3 comparison of the results of several classification methods on ModelNet and our method
Method Input Views ModelNet10 ModelNet40
SPH Mesh - - 68.2
LFD Image 10 79.8 75.5
Su-MVCNN Image 80 - 90.1
3DShapeNets volume 1 83.5 77.3
VoxNet Volume 12 92.0 85.9
Subvolume Volume 20 - 85.9
PointNet Point 1 92.1 89.2
Ours-LightPointNet Point 1 93.1 89.3
On the hardware platform used, with the ModelNet10 and ModelNet40 datasets, training the inventive LightPointNet for 250 epochs (one complete training pass over the dataset is called an epoch) took approximately 25 minutes and 1 hour respectively, while training PointNet for 250 epochs took approximately 136 minutes and 6 hours; the training time of LightPointNet is thus about 1/6 that of PointNet, as shown in table 4, so our network has better real-time performance than PointNet.
TABLE 4 comparison of time Performance between several Classification methods and LightPointNet Classification method
Method ModelNet10 ModelNet40
PointNet about 136 min about 6 h
Ours-LightPointNet about 25 min about 1 h
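The per-run timing behind Table 4 can be measured as sketched below (our own illustration; the Adam optimizer, batch size and the names train_x/train_y are assumptions, and model is the builder sketched earlier).

    import time

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    start = time.time()
    model.fit(train_x, train_y, batch_size=32, epochs=250, verbose=0)
    print(f"250 epochs took {(time.time() - start) / 60:.1f} minutes")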
In this embodiment, the datasets used by the invention are the ModelNet10 and ModelNet40 CAD model libraries, using the official data.
As shown in figs. 1 and 2, the designed lightweight real-time point cloud network LightPointNet comprises 1 input layer, 3 convolutional layers, 1 fully-connected layer and 1 output layer; the specific structure of each layer is as follows.
First layer: input layer: the input data of the network is a point cloud model containing n points; each point consists of three-dimensional coordinates (x, y, z), and the whole is an n × 3 tensor.
Second layer: convolutional layer: the three parameters in (64,3,1) denote, in order, 64 channels, a 1 × 3 convolution kernel and a 1 × 1 stride; 64 feature values are obtained after the first convolution.
Third layer: convolutional layer: the three parameters in (128,1,1) denote, in order, 128 channels, a 1 × 1 convolution kernel and a 1 × 1 stride; 128 feature values are obtained after the second convolution.
Fourth layer: convolutional layer: the three parameters in (256,1,1) denote, in order, 256 channels, a 1 × 1 convolution kernel and a 1 × 1 stride; 256 feature values are obtained after the third convolution.
Pooling layer: max pooling is applied to each channel, in the direction of the MAX Pooling arrow shown in fig. 1, yielding a 1 × 256 tensor.
Fifth layer: fully-connected layer: with the 1 × 256 tensor as input, a 1 × 128 tensor is obtained through the fully-connected layer.
Dropout layer: to prevent overfitting, a Dropout layer is added after the fully-connected layer, with the dropout ratio set to 0.5.
Sixth layer: output layer: this layer finally outputs the k-dimensional features.
Next, for the lightweight real-time point cloud network LightPointNet prototype constructed above, the parameters of each layer are modified through repeated experimental verification to construct a better network structure. The specific modifications are as follows:
the present invention initially assumes that the layers in the network are independent of each other. In the optimization process, when the influence of the parameter setting of a certain layer on the network performance is researched, the parameter setting of other layers can be fixed to be consistent with the network prototype provided in fig. 1, and multiple experiments are carried out on different parameter values of the layer to determine the optimal parameter setting.
Based on the above assumption, the invention completes testing and optimization in sequence, using ModelNet40 as the reference dataset, according to the primary and secondary influence of each parameter on network performance, in the following 5 steps:
1. Determine the number of fully-connected nodes according to their influence on network performance. Since most of the parameters of the entire network are concentrated in the fully-connected layer, the inventors first explored the setting of its node count. Based on the network prototype of fig. 1, all other parameter settings were fixed, and multiple experiments with different node counts determined the optimal number for this layer. Fig. 3 shows the classification accuracy of the network on the test set as a function of training epochs for different fully-connected node counts. In fig. 3 the abscissa is the number of training epochs and the ordinate is the test classification accuracy (the axes in the following figures have the same meaning and are not repeated); the number at the lower right denotes the number of fully-connected nodes for the curve of that color.
As can be seen from fig. 3, classification performance is weak with 128 or 1024 fully-connected nodes; with 256 or 512 nodes performance is comparable, and higher than with 128 or 1024. This means the number of fully-connected nodes should be neither too large nor too small. Accordingly, the invention changes the number of fully-connected nodes in the prototype system from 128 to 256.
2. Determine the number of fully-connected layers according to their influence on network performance. After fixing the node count, a fully-connected layer with 128 nodes was appended to explore a two-layer fully-connected architecture; fig. 4 compares the classification performance of the one-layer and two-layer fully-connected architectures. As fig. 4 shows, the convergence speed and classification performance of the two-layer architecture are far inferior to those of the one-layer architecture, which may be related to the limited information in point cloud data, so the invention keeps the one-layer architecture unchanged. The network structure after the pooling operation is thus determined: 1 fully-connected layer with 256 nodes.
3. Determine the channel count of the last convolutional layer according to the influence of convolutional-layer channels on network performance. After fixing the fully-connected layer, the invention determines the number of channels of the third convolutional layer (corresponding to the feature dimension of a point) by the same method. This layer feeds the pooling layer and thereby determines its output dimension, and the pooling layer feeds the fully-connected layer, so the third layer's channel count strongly affects the network's parameter count. We performed several experiments varying the number of feature channels to determine the optimum. Fig. 5 compares classification performance for different channel counts of the third convolutional layer; the number at the lower right denotes the third-layer channel count for the curve of that color.
As can be seen from fig. 5, the classification performance of the network is very weak with 64 channels in the third convolutional layer, and beyond 128 channels it essentially stops improving. Relatively speaking, 512 channels perform best, 64 worst and 1024 second worst; performance is essentially even at 128, 256 and 512 channels, though convergence is slower at 128. This means the channel count of a convolutional layer should be moderate, neither too small nor too large. On ModelNet40, using 128 channels in the third convolutional layer reduces the parameters by 41.8% relative to 256 channels with essentially no loss of classification performance, so the invention changes the third-layer channel count in the prototype system from 256 to 128. Because the second convolutional layer sits between two convolutional layers and is not directly connected to other modules such as the input layer or the pooling layer, the invention sets its parameters directly from the third layer's channel count.
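The stated reduction can be reproduced arithmetically; this is our own check, assuming k = 40 classes (ModelNet40) and the counting convention of Table 1.

    def total(conv3_channels, k=40):
        conv1 = (1 * 3 * 1 + 1) * 64
        conv2 = (64 + 1) * 128
        conv3 = (128 + 1) * conv3_channels
        pool = conv3_channels
        fc = (conv3_channels + 1) * 256
        out = (256 + 1) * k
        return conv1 + conv2 + conv3 + pool + fc + out

    reduction = 1 - total(128) / total(256)
    print(f"{reduction:.1%}")   # about 41.9%, in line with the figure stated above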
4. Determine the number of convolutional layers based on their impact on network performance. The inventors next explored the effect of convolution depth on classification performance through two comparative experiments: one appends a copy of the third convolutional layer after that layer; the other removes the second convolutional layer. The results are shown in fig. 6, where the number at the lower right denotes the number of convolutional layers for the curve of that color.
As can be seen from fig. 6, the classification performance with a 4-layer convolution is actually inferior to that with 3 layers; 2-layer and 4-layer convolutions perform comparably, both slightly weaker than the 3-layer convolution.
5. Determine the channel count of the first convolutional layer according to the influence of convolutional-layer channels on network performance; finally, the inventors also studied the first layer's channel count. Fig. 7 compares the network's classification performance for different channel counts of the first convolutional layer; the number at the lower right denotes the first-layer channel count for the curve of that color. The pattern is basically consistent with the third convolutional layer: with 32 channels the classification performance is very weak, and beyond 64 channels it essentially stops improving. The invention therefore keeps the 64 channels of the first convolutional layer unchanged.
Based on the network prototype shown in fig. 1, the final network of fig. 2 is obtained through testing and optimization. Using ModelNet40 as the test dataset, the classification performance of the prototype and the finally formed LightPointNet are compared in fig. 8: the optimized network is significantly better than the prototype network. Calculation shows that the improved architecture uses 22% fewer parameters than the network prototype. Taken together, the improved LightPointNet architecture is more compact, has fewer parameters, and achieves higher classification performance.
Aiming at the real-time recognition requirements of point cloud models, the invention provides basic design principles for a lightweight point-cloud-oriented deep learning network, gives a network prototype on this basis, completes the optimization of each parameter setting by the control-variable method, and establishes the structure and per-layer parameters of the network, yielding the lightweight real-time point cloud network model LightPointNet. The network comprises 3 convolutional layers, 1 max pooling layer and 1 fully-connected layer; it is compact and has few parameters. It achieves classification accuracy comparable to the best current methods on the ModelNet datasets, fully demonstrating LightPointNet's excellent ability to learn disordered point cloud features. Compared with other networks, training and recognition times are greatly shortened, further demonstrating LightPointNet's fast point cloud processing capability.
The above-described embodiments are merely preferred embodiments of the invention, and the scope of the invention is not limited thereto; changes made according to the shape and principle of the invention shall be covered within the protection scope of the invention.

Claims (4)

1. The real-time point cloud model classification method based on the lightweight network LightPointNet is characterized by comprising the following steps of:
S1, analyzing the structural characteristics of deep convolutional neural networks and designing the lightweight real-time point cloud network LightPointNet according to application requirements, wherein LightPointNet comprises at least an input layer, convolutional layers, a fully-connected layer and an output layer; a pooling layer follows the last convolutional layer, and max pooling is applied to each feature channel of the last convolutional layer to generate a feature value;
the network model of the lightweight real-time point cloud network LightPointNet has 6 layers, wherein the first layer is the input layer, the second to fourth layers are convolutional layers, the fifth layer is the fully-connected layer, and the sixth layer is the output layer; each layer is structured as follows:
First layer, input layer: the input data is a point cloud containing n points; each point consists of three coordinates (x, y, z), and the whole is an n × 3 tensor;
Second layer, hidden layer: a convolutional layer and an activation function layer;
Third layer, hidden layer: a convolutional layer and an activation function layer;
Fourth layer, hidden layer: a convolutional layer, an activation function layer and a pooling layer;
Fifth layer, hidden layer: a fully-connected layer, an activation function layer and a Dropout layer;
Sixth layer, output layer: a Loss layer;
the specific data for each layer are as follows:
First convolutional layer: convolution kernel 1 × 3, stride 1 × 1, output dimension 64 × n × 1, 256 parameters;
Second convolutional layer: convolution kernel 1 × 1, stride 1 × 1, output dimension 128 × n × 1, 8320 parameters;
Third convolutional layer: convolution kernel 1 × 1, stride 1 × 1, output dimension 128 × n × 1, 16512 parameters;
Pooling layer: kernel n × 1, stride 1 × 1, output dimension 128 × 1, 128 parameters;
Fully-connected layer: output dimension 256, 33024 parameters;
Output layer: output dimension k, 257k parameters;
the total number of parameters required by the lightweight real-time point cloud network LightPointNet is 58240 + 257k;
where n is the number of points in the point cloud and k is the number of classes;
S2, inputting data and convolving it through three convolutional layers to obtain the final feature values;
S3, inputting the obtained feature values into the fully-connected layer for classification;
and S4, obtaining the final classification accuracy from the classification operation of S3.
2. The lightweight network LightPointNet-based real-time point cloud model classification method according to claim 1, wherein: in steps S1 and S2, a point cloud model containing n points is input; each point consists of three coordinates (x, y, z), so the input as a whole is an n × 3 tensor; convolutional layers with parameters (64,3,1) and (128,1,1) produce 64 and 128 feature maps respectively; the resulting 128 feature maps are further fused by a convolutional layer with parameters (128,1,1); maximum pooling yields a 1 × 128 tensor; the 128-dimensional feature values are sent to the fully-connected layer to obtain a 1 × 256 tensor; and finally classification is performed.
3. The lightweight network LightPointNet-based real-time point cloud model classification method according to claim 1, wherein: the required parameters are only on the order of 10⁻² M.
4. The lightweight network LightPointNet-based real-time point cloud model classification method according to claim 1, wherein in step S2 the convolution operates as follows:
S2.1, a convolutional layer with 64 channels, a 1 × 3 convolution kernel and a 1 × 1 stride fuses the (x, y, z) coordinates of each point and expands them into 64 feature values;
S2.2, a convolutional layer with 128 channels, a 1 × 1 convolution kernel and a 1 × 1 stride expands the 64 feature values of each point into 128 feature values;
S2.3, a convolutional layer with 128 channels, a 1 × 1 convolution kernel and a 1 × 1 stride further fuses the 128 feature values from the previous convolutional layer;
and S2.4, the resulting 128-dimensional features are input into the fully-connected layer to obtain a 1 × 256 tensor, and classification is finally performed.
CN201810446480.2A 2018-05-11 2018-05-11 Real-time point cloud model classification method based on lightweight network LightPointNet Active CN108710906B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810446480.2A CN108710906B (en) 2018-05-11 2018-05-11 Real-time point cloud model classification method based on lightweight network LightPointNet


Publications (2)

Publication Number Publication Date
CN108710906A CN108710906A (en) 2018-10-26
CN108710906B (en) 2022-02-11

Family

ID=63868859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810446480.2A Active CN108710906B (en) 2018-05-11 2018-05-11 Real-time point cloud model classification method based on lightweight network LightPointNet

Country Status (1)

Country Link
CN (1) CN108710906B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109741329A (en) * 2018-11-27 2019-05-10 广东工业大学 A kind of point cloud segmentation method towards electric power corridor scene
CN109858495B (en) * 2019-01-16 2023-09-22 五邑大学 Feature extraction method and device based on improved convolution block and storage medium thereof
CN109993748B (en) * 2019-03-30 2023-06-20 华南理工大学 Three-dimensional grid object segmentation method based on point cloud processing network
CN110084790B (en) * 2019-04-17 2020-11-17 电子科技大学成都学院 Bionic mode identification improvement method for judging pneumonitis in imaging
CN110197223B (en) * 2019-05-29 2021-02-09 北方民族大学 Point cloud data classification method based on deep learning
CN111325757B (en) * 2020-02-18 2022-12-23 西北工业大学 Point cloud identification and segmentation method based on Bayesian neural network
CN111414941B (en) * 2020-03-05 2023-04-07 清华大学深圳国际研究生院 Point cloud convolution neural network based on feature multiplexing
CN111368922B (en) * 2020-03-05 2023-04-18 清华大学深圳国际研究生院 Point cloud processing network architecture for object classification
CN115546544B (en) * 2022-09-30 2023-11-17 深圳市规划和自然资源数据管理中心 LiDAR point cloud and OSM labeling information flow coupling classification method based on graph neural network

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107392305A (en) * 2016-05-13 2017-11-24 Method and computer-readable medium for implementing and executing a neural network
CN106682730A (en) * 2017-01-10 2017-05-17 Network performance assessment method based on VGG16 image deconvolution
CN107393010A (en) * 2017-07-05 2017-11-24 Point cloud triangulation and lightweight triangulated model precision test method
CN107316079A (en) * 2017-08-08 2017-11-03 Processing method, device, storage medium and processor for terminal convolutional neural networks
CN107527038A (en) * 2017-08-31 2017-12-29 Automatic three-dimensional ground-object extraction and scene reconstruction method
CN107808129A (en) * 2017-10-17 2018-03-16 Facial multi-feature-point localization method based on a single convolutional neural network
CN107832835A (en) * 2017-11-14 2018-03-23 Lightweighting method and device for convolutional neural networks
CN107844828A (en) * 2017-12-18 2018-03-27 Convolutional calculation method in a neural network and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Towards real-time 3D object recognition: a lightweight volumetric CNN framework using multitask learning";Zhi S F, Liu Y X, Li X, et al.;《Computers & Graphics》;20180430;第199-207页 *

Also Published As

Publication number Publication date
CN108710906A (en) 2018-10-26

Similar Documents

Publication Publication Date Title
CN108710906B (en) Real-time point cloud model classification method based on lightweight network LightPointNet
Li et al. So-net: Self-organizing network for point cloud analysis
CN108154194B (en) Method for extracting high-dimensional features by using tensor-based convolutional network
Cheng et al. Jointly network: a network based on CNN and RBM for gesture recognition
CN108520535B (en) Object classification method based on depth recovery information
CN108921926B (en) End-to-end three-dimensional face reconstruction method based on single image
CN108596329B (en) Three-dimensional model classification method based on end-to-end deep ensemble learning network
CN109685819B (en) Three-dimensional medical image segmentation method based on feature enhancement
CN110222717A (en) Image processing method and device
CN113344188A (en) Lightweight neural network model based on channel attention module
CN111695494A (en) Three-dimensional point cloud data classification method based on multi-view convolution pooling
CN111125403B (en) Aided design drawing method and system based on artificial intelligence
CN112084934A (en) Behavior identification method based on two-channel depth separable convolution of skeletal data
Song et al. 1000fps human segmentation with deep convolutional neural networks
CN104036242A (en) Object recognition method based on convolutional restricted Boltzmann machine combining Centering Trick
Gao et al. Natural scene recognition based on convolutional neural networks and deep Boltzmann machines
Chen et al. CNN-based broad learning with efficient incremental reconstruction model for facial emotion recognition
Ma et al. Irregular convolutional neural networks
Yang et al. Research on classification algorithms for attention mechanism
Zhang et al. Fchp: Exploring the discriminative feature and feature correlation of feature maps for hierarchical dnn pruning and compression
CN117079098A (en) Space small target detection method based on position coding
Liu et al. Capsule embedded resnet for image classification
CN115019053A (en) Dynamic graph semantic feature extraction method for point cloud classification and segmentation
Wang et al. Separable Self-Attention Mechanism for Point Cloud Local and Global Feature Modeling
Xu et al. Residual blocks PointNet: A novel faster PointNet framework for segmentation and estimated pose

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant