CN111223120B

CN111223120B - Point cloud semantic segmentation method

Info

Publication number: CN111223120B
Application number: CN201911262240.8A
Authority: CN
Inventors: 潘琳琳; 孔慧
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2019-12-10
Filing date: 2019-12-10
Publication date: 2023-08-04
Anticipated expiration: 2039-12-10
Also published as: CN111223120A

Abstract

The invention discloses a point cloud semantic segmentation method comprising: preprocessing point cloud data through blocking, sampling, translation and normalization operations; designing a feature selection module and establishing a neural network model based on feature selection; and training the neural network for subsequent point cloud semantic segmentation. The trained network outputs a semantic label for each point. Because the neural network structure includes the feature selection module, feature channels carrying weak semantic information are suppressed, feature channels that are key to the segmentation task are enhanced, and segmentation precision is improved.

Description

Point cloud semantic segmentation method

Technical Field

The invention relates to a point cloud segmentation technology, in particular to a point cloud semantic segmentation method.

Background

Point cloud semantic segmentation divides a point cloud into semantically meaningful parts and is an important research direction in the field of computer vision. At present, point cloud semantic segmentation methods stop at extracting features from the point cloud data and ignore how important each feature channel is to the semantic segmentation task. A network structure based on feature selection is therefore needed to suppress the feature channels carrying weak semantic information, enhance the feature channels that are key to the segmentation task, and improve segmentation accuracy.

Disclosure of Invention

The invention aims to provide a point cloud semantic segmentation method.

The technical solution for realizing the purpose of the invention is as follows: a point cloud semantic segmentation method comprises the following steps:

step 1, preprocessing point cloud data, including blocking, sampling, translation and normalization operations;

step 2, designing a feature selection module, and establishing a neural network model based on feature selection;

and step 3, training a neural network for subsequent point cloud semantic segmentation.

Further, in step 1, the specific method for preprocessing the point cloud data is as follows:

First, the point cloud data is divided into a plurality of cubes (blocks), and points are randomly sampled in each block: when the number of points in a block is larger than a set threshold, the excess points are discarded; when it is smaller than the set threshold, points are randomly picked from the block and copied until the number of points reaches the set threshold, which completes data sampling. Each sampled point is a 6-dimensional vector comprising XYZ coordinate values and RGB color values. The point with the minimum XYZ coordinate values is taken as the origin of coordinates, and the coordinate values of the other points are recalculated with formula (1), which completes data translation and yields X', Y' and Z'. X', Y' and Z' are normalized with formula (2), adding three new coordinate values X, Y and Z; RGB is normalized with formula (3) to obtain normalized color values R', G', B'; finally, the processed 9-dimensional point cloud data (X', Y', Z', X, Y, Z, R', G', B') is output:

$$X' = X - X_{\min}, \quad Y' = Y - Y_{\min}, \quad Z' = Z - Z_{\min} \tag{1}$$

where $X_{\min}$, $Y_{\min}$ and $Z_{\min}$ are the minimum values of the XYZ coordinates, and $X_{\max}$, $Y_{\max}$ and $Z_{\max}$ are the maximum values of the XYZ coordinates.
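The translation and normalization steps can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `preprocess_block` is hypothetical, formula (2) is assumed to be min-max scaling by the coordinate extent, formula (3) is assumed to divide 8-bit color values by 255, and the minimum and maximum are taken over whatever array is passed in.

```python
import numpy as np

def preprocess_block(points):
    """Turn an (N, 6) array of X, Y, Z, R, G, B into an (N, 9) array.

    Output columns: X', Y', Z', normalized X, Y, Z, then R', G', B'.
    """
    points = np.asarray(points, dtype=np.float64)
    xyz, rgb = points[:, :3], points[:, 3:6]
    xyz_min = xyz.min(axis=0)
    xyz_max = xyz.max(axis=0)

    # Formula (1): translate so the minimum coordinates become the origin.
    translated = xyz - xyz_min

    # ASSUMED form of formula (2): min-max scaling to the range 0 to 1.
    # A zero extent is replaced by 1 to avoid dividing by zero.
    extent = np.where(xyz_max > xyz_min, xyz_max - xyz_min, 1.0)
    normalized = translated / extent

    # ASSUMED form of formula (3): 8-bit color values divided by 255.
    colors = rgb / 255.0

    return np.concatenate([translated, normalized, colors], axis=1)
```

Keeping both the translated and the normalized coordinates gives the network metric offsets within the block as well as a scale-free position.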

Further, in step 2, the specific method for establishing the neural network model based on feature selection is as follows:

(a) Neural network module design

Inputting the point cloud data into a neural network, obtaining a feature matrix after 5 layers of MLPs, and obtaining local features of the point cloud through a feature selection module; then obtaining global characteristics of the point cloud after maximum pooling and two layers of MLP; finally, splicing the local features and the global features, and obtaining the semantic category of each point through three-layer MLP operation;

(b) Feature selection module design

The feature selection module includes a max pooling layer, a two-layer MLP, an adder and a multiplier. Let the input of the feature selection module be a feature matrix M₁. The module applies max pooling and the two-layer MLP to M₁ to obtain a weighting vector W; W is then multiplied with each row vector of M₁ to obtain a feature matrix M₂; finally, M₁ and M₂ are added element-wise to obtain a feature matrix M₃, i.e. the local features of the point cloud.
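In matrix notation, the feature selection module can be summarized as below. The symbols n (number of points), C (number of feature channels), f (the two-layer MLP) and ⊙ (element-wise product) are introduced here for illustration, and the row-by-row multiplication is assumed to be element-wise.

```latex
\begin{aligned}
W &= f\bigl(\max_{1 \le i \le n} M_1[i, :]\bigr) \in \mathbb{R}^{1 \times C}, \\
M_2[i, :] &= M_1[i, :] \odot W, \qquad i = 1, \dots, n, \\
M_3 &= M_1 + M_2 .
\end{aligned}
```

Under that assumption, M₃[i, c] = M₁[i, c] · (1 + W[c]): every channel c is rescaled by the same factor for all points, while the additive path keeps the original features available.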

Further, in step 3, the specific method for training the neural network is as follows:

The cross-entropy loss function of the network is optimized with the Adam algorithm with a momentum parameter of 0.9, and the network is trained with a batch size of 32. During training, a variable learning rate is used: the initial learning rate is 0.001, the learning rate is reduced to 0.5 times its previous value every 20 training epochs, and the decay rate is gradually increased from 0.5 to 0.99.
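The step-wise learning rate can be illustrated with a short Python sketch. It assumes epochs are counted from 0 and that the rate is halved in a staircase fashion at epochs 20, 40, and so on; the function name `learning_rate` is hypothetical, and the decay rate that rises from 0.5 to 0.99 is not modeled because the text does not give its schedule.

```python
def learning_rate(epoch, base_lr=0.001, drop=0.5, step=20):
    """Learning rate for a given epoch: multiplied by `drop` every `step` epochs."""
    return base_lr * drop ** (epoch // step)

# Print the rate at the start of each 20-epoch stage.
for epoch in (0, 20, 40, 60):
    print(epoch, learning_rate(epoch))
```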

Compared with the prior art, the invention has the following notable advantage: the designed neural network structure includes a feature selection module, so that feature channels carrying weak semantic information are suppressed, feature channels that are key to the segmentation task are enhanced, and segmentation accuracy is improved.

Drawings

FIG. 1 is a workflow diagram of a point cloud semantic segmentation system of the present invention.

FIG. 2 is a flow chart of the operation of the data processing module of the present invention.

FIG. 3 is a schematic structural diagram of the neural network module of the present invention.

FIG. 4 is a schematic structural diagram of the feature selection module of the present invention.

Detailed Description

The present invention will be further described with reference to the drawings and the specific embodiments.

The invention designs a neural network structure that, once trained, outputs a semantic label for each point and improves the precision of point cloud semantic segmentation. The system comprises a data processing module and a neural network module, and works through the following steps:

Step 1. The data processing module preprocesses the point cloud data in four steps: blocking, sampling, translation and normalization. As shown in FIG. 2, the specific flow is as follows:

First, the point cloud data is divided into a plurality of cubes (blocks), and points are then randomly sampled in each block: when the number of points in a block is larger than a set threshold, the excess points are discarded; when it is smaller than the set threshold, points are randomly picked from the block and copied until the number of points reaches the set threshold, which completes data sampling. Each acquired point is a 6-dimensional vector comprising XYZ coordinate values and RGB color values. For training convenience, the point with the minimum XYZ coordinate values is taken as the origin of coordinates, and the coordinate values of the other points are recalculated with formula (1), which completes data translation and yields X', Y' and Z'. To improve segmentation accuracy, X', Y' and Z' are normalized with formula (2), adding three new coordinate values X, Y and Z in the range 0 to 1. In addition, RGB is normalized with formula (3) to obtain normalized color values R', G', B' in the range 0 to 1. Finally, the processed 9-dimensional point cloud data (X', Y', Z', X, Y, Z, R', G', B') is output:

$$X' = X - X_{\min}, \quad Y' = Y - Y_{\min}, \quad Z' = Z - Z_{\min} \tag{1}$$

Step 2. A feature selection module is designed and a neural network model based on feature selection is established. As shown in FIG. 3 and FIG. 4, the specific flow is as follows:

(a) Neural network module design

As shown in FIG. 3, the point cloud data is input into the neural network, and a feature matrix M₁ is obtained after 5 MLP layers; the local features of the point cloud are then obtained through the feature selection module; the global features of the point cloud are obtained after max pooling and a two-layer MLP; finally, the local features and the global features are concatenated, and the semantic category of each point is obtained through a three-layer MLP.

(b) Feature selection module design

The feature selection module selects useful feature channels by weighting the feature vector of each point. As shown in FIG. 4, the module includes a max pooling layer, a two-layer multi-layer perceptron (MLP), an adder and a multiplier. Assume the input of the feature selection module is a feature matrix M₁. The module applies max pooling and the two-layer MLP to M₁ to obtain a weighting vector W; W is then multiplied with each row vector of M₁ to obtain a feature matrix M₂; finally, M₁ and M₂ are added element-wise to obtain a feature matrix M₃, i.e. the local features of the point cloud.

Step 3. The neural network is trained for subsequent semantic segmentation of point cloud data.

The cross-entropy loss function of the network is optimized with the Adam algorithm with a momentum parameter of 0.9, and the network is trained with a batch size of 32. During training, a variable learning rate is used: the initial learning rate is 0.001, the learning rate is reduced to 0.5 times its previous value every 20 training epochs, and the decay rate is gradually increased from 0.5 to 0.99.

The invention designs a neural network structure that includes a feature selection module; the semantic label of each point is obtained from the trained network, and the precision of point cloud semantic segmentation is improved.

Examples

To verify the effectiveness of the scheme of the invention, the indoor scene point cloud dataset S3DIS (Stanford 3D Indoor Semantic Dataset) is used as experimental data, and the following simulation experiment is carried out to predict the semantic label of each point. The dataset comprises scan data of 271 rooms in 6 areas, and each point is annotated with a semantic label. The system works through the following steps:

Step 1. The point cloud data preprocessing module performs the four operations of blocking, sampling, translation and normalization. For each room, the point cloud data is divided into cubes with a side length of 1 meter, and 4096 points are randomly sampled in each block: when a block contains more than 4096 points, the excess points are discarded; when it contains fewer than 4096 points, points are randomly picked from the block and copied until the number of points reaches 4096, which completes sampling. Translation and normalization of the data are then completed according to formulas (1) to (3).
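The blocking and sampling operations can be sketched with NumPy as follows. This is an illustrative sketch only: the function names `split_into_blocks` and `sample_block` are hypothetical, the cube grid is assumed to be anchored at the room's minimum corner, and sampling without replacement is assumed when a block holds 4096 points or more.

```python
import numpy as np

def split_into_blocks(points, block_size=1.0):
    """Group the rows of an (N, D) array by the cube containing their XYZ."""
    points = np.asarray(points)
    xyz = points[:, :3]
    # Integer cube index of every point, relative to the minimum corner.
    cell = np.floor((xyz - xyz.min(axis=0)) / block_size).astype(np.int64)
    keys, inverse = np.unique(cell, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep a flat index on every NumPy version
    return [points[inverse == k] for k in range(len(keys))]

def sample_block(block, num_points=4096, rng=None):
    """Return exactly num_points rows: drop extras or duplicate random points."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(block)
    if n >= num_points:
        # Too many points: keep a random subset and discard the rest.
        idx = rng.choice(n, size=num_points, replace=False)
    else:
        # Too few points: keep all of them and copy randomly picked ones.
        extra = rng.choice(n, size=num_points - n, replace=True)
        idx = np.concatenate([np.arange(n), extra])
    return block[idx]
```

Fixing the number of points per block lets blocks be stacked into batches of identical shape, here 4096 rows each.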

Step 2, constructing a neural network based on feature selection, which comprises the following steps:

a. Neural network module design

First, the 4096×9 point cloud data is input into the neural network, and an n×1024 feature matrix M₁ (here n = 4096) is obtained through 5 MLP layers with sizes of 64, 128 and 1024 dimensions in sequence. Feature selection is applied to M₁ to obtain a 4096×1024 feature matrix M₃, which serves as the local feature of the point cloud. Max pooling is then applied to M₃ to obtain a 1×1024 feature vector, which is passed through one 256-dimensional MLP layer and one 128-dimensional MLP layer to obtain a 1×128 feature vector, the global feature of the point cloud. Finally, the local features and the global features are concatenated into a 4096×1152 feature matrix, which is passed through one 512-dimensional, one 256-dimensional and one 13-dimensional MLP layer (13 being the number of semantic segmentation classes) to obtain a 4096×13 matrix that gives the semantic category of each point, completing the semantic segmentation.
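The dimensions quoted above can be checked with a small shape-bookkeeping sketch in Python. It computes no features and only tracks matrix shapes. The function name `trace_shapes` is hypothetical, and the five point-wise MLP widths (64, 64, 64, 128, 1024) are an assumption, since the text names only the sizes 64, 128 and 1024 for 5 layers.

```python
def trace_shapes(n=4096, in_dim=9,
                 point_mlp=(64, 64, 64, 128, 1024),  # ASSUMED layer widths
                 global_mlp=(256, 128),
                 head_mlp=(512, 256, 13)):
    """Return (stage, shape) pairs for one block passing through the network."""
    stages = [("input", (n, in_dim))]
    width = in_dim
    for i, width in enumerate(point_mlp, start=1):
        stages.append((f"point MLP {i}", (n, width)))
    local_dim = width  # width of M1, and of M3 after feature selection
    stages.append(("feature selection (M3, local)", (n, local_dim)))
    stages.append(("max pooling over points", (1, local_dim)))
    for i, width in enumerate(global_mlp, start=1):
        stages.append((f"global MLP {i}", (1, width)))
    global_dim = width
    # The global vector is repeated for every point before concatenation.
    stages.append(("concat local + tiled global", (n, local_dim + global_dim)))
    for i, width in enumerate(head_mlp, start=1):
        stages.append((f"head MLP {i}", (n, width)))
    return stages

for name, shape in trace_shapes():
    print(f"{name:32s} {shape}")
```

The concatenated width is 1024 + 128 = 1152, which matches the 4096×1152 matrix in the text, and the last layer yields one score per class for each of the 4096 points.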

b. Feature selection module design

First, max pooling is applied to M₁ to obtain a 1×1024 feature vector, which is passed through one 128-dimensional MLP layer and one 1024-dimensional MLP layer to obtain a 1×1024 weighting vector W. W is then multiplied with each row vector of M₁ to select useful feature channels, yielding a 4096×1024 feature matrix M₂. Finally, M₁ and M₂ are added element-wise to obtain the 4096×1024 feature matrix M₃, i.e. the local features of the point cloud.
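A minimal NumPy sketch of this module is given below, with random weights standing in for trained parameters. The activation functions are not stated in the text: a ReLU in the hidden layer and a sigmoid on the output, in the style of squeeze-and-excitation gating, are assumptions made here, and the function and parameter names are hypothetical.

```python
import numpy as np

def feature_selection(m1, w_a, b_a, w_b, b_b):
    """Reweight the feature channels of m1 (n x C); returns (M3, W)."""
    pooled = m1.max(axis=0, keepdims=True)           # (1, C) max over points
    hidden = np.maximum(pooled @ w_a + b_a, 0.0)     # (1, 128), ASSUMED ReLU
    logits = hidden @ w_b + b_b                      # (1, C)
    w = 1.0 / (1.0 + np.exp(-logits))                # ASSUMED sigmoid gate
    m2 = m1 * w                                      # W times each row of M1
    return m1 + m2, w                                # M3 = M1 + M2

rng = np.random.default_rng(0)
n, channels, hidden_dim = 4096, 1024, 128
m1 = rng.standard_normal((n, channels))
params = (0.01 * rng.standard_normal((channels, hidden_dim)), np.zeros(hidden_dim),
          0.01 * rng.standard_normal((hidden_dim, channels)), np.zeros(channels))
m3, w = feature_selection(m1, *params)
print(m3.shape, w.shape)
```

Because W is shared by all points, the module rescales whole channels rather than individual entries, and the added M₁ term keeps the original features even where a channel weight is small.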

Step 3. The neural network is trained to obtain the semantic category of each point.

The cross-entropy loss function of the network is optimized with the Adam algorithm with a momentum parameter of 0.9 so that the network loss is minimized, and the network is trained with a batch size of 32. A variable learning rate is used: the initial learning rate is 0.001, the learning rate is reduced to 0.5 times its previous value every 20 training epochs, and the decay rate is gradually increased from 0.5 to 0.99. The neural network is implemented in TensorFlow. This embodiment reports the experimental performance of the feature-selection network, FSNet, on the S3DIS dataset in terms of overall accuracy (OA) and mean intersection over union (mIoU). Because the S3DIS dataset includes 6 areas, 6-fold cross-validation is used for the experimental comparison: each time, one area is selected as the test set and the remaining 5 areas are used as the training set. The arithmetic mean of the 6 experiments is taken as the overall semantic segmentation result on the S3DIS dataset, as shown in Table 1:

Table 1. Overall semantic segmentation results on the S3DIS dataset

As can be seen from Table 1, PointNet achieves an OA of 79.1% and an mIoU of 46.7%, whereas the proposed method achieves an OA 1.1% higher and an mIoU 1.0% higher than PointNet. This indicates that mining the dependencies between feature channels is meaningful for semantic segmentation tasks.
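For reference, OA and mIoU are commonly computed from a confusion matrix, as in the NumPy sketch below. The definitions used are the usual ones and are an assumption here, since the text does not spell them out; the function names are hypothetical, and classes that appear in neither labels nor predictions are skipped when averaging.

```python
import numpy as np

def confusion_matrix(labels, preds, num_classes=13):
    """Rows index the true class, columns index the predicted class."""
    labels = np.asarray(labels, dtype=np.int64)
    preds = np.asarray(preds, dtype=np.int64)
    flat = labels * num_classes + preds
    counts = np.bincount(flat, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def overall_accuracy(cm):
    """Fraction of points whose predicted class equals the true class."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN)."""
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    return (tp[valid] / union[valid]).mean()
```

Under 6-fold cross-validation, these two values would be computed once per held-out area and then averaged arithmetically over the 6 folds.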

Claims

1. A point cloud semantic segmentation method, characterized by comprising the following steps:

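step 1, preprocessing point cloud data, including blocking, sampling, translation and normalization operations;

step 2, designing a feature selection module, and establishing a neural network model based on feature selection;

step 3, training a neural network for subsequent point cloud semantic segmentation;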

in the step 1, the specific method for preprocessing the point cloud data comprises the following steps:

firstly dividing the point cloud data into a plurality of cubes (blocks) and randomly sampling in each block, discarding the excess points when the number of points in the block is larger than a set threshold, and randomly picking points from the block to copy until the number of points reaches the set threshold when the number of points is smaller than the set threshold, thereby completing data sampling;

each sampled point is a 6-dimensional vector comprising XYZ coordinate values and RGB color values; the point with the minimum XYZ coordinate values is taken as the origin of coordinates, and the coordinate values of the other points are recalculated with formula (1), completing data translation to obtain X', Y' and Z'; X', Y' and Z' are normalized with formula (2), adding three new coordinate values X, Y and Z; RGB is normalized with formula (3) to obtain normalized color values R', G', B'; finally, the processed 9-dimensional point cloud data (X', Y', Z', X, Y, Z, R', G', B') is output:

$$X' = X - X_{\min}, \quad Y' = Y - Y_{\min}, \quad Z' = Z - Z_{\min} \tag{1}$$

where $X_{\min}$, $Y_{\min}$ and $Z_{\min}$ are the minimum values of the XYZ coordinates, and $X_{\max}$, $Y_{\max}$ and $Z_{\max}$ are the maximum values of the XYZ coordinates;

in step 2, the specific method for establishing the neural network model based on feature selection comprises the following steps:

(a) Neural network module design

Inputting the point cloud data into a neural network, obtaining a feature matrix after 5 MLP layers, and obtaining local features of the point cloud through a feature selection module; then obtaining global features of the point cloud after max pooling and a two-layer MLP; finally, concatenating the local features and the global features, and obtaining the semantic category of each point through a three-layer MLP; specifically, inputting 4096×9 point cloud data into the neural network, and obtaining an n×1024 feature matrix M₁ through 5 MLP layers with sizes of 64, 128 and 1024 in sequence; performing feature selection on M₁ to obtain a 4096×1024 feature matrix M₃ as the local feature of the point cloud; then applying max pooling to M₃ to obtain a 1×1024 feature vector, and passing it through one 256-dimensional MLP layer and one 128-dimensional MLP layer to obtain a 1×128 feature vector as the global feature of the point cloud; finally, concatenating the local features and the global features to obtain a 4096×1152 feature matrix, passing it through one 512-dimensional MLP layer, one 256-dimensional MLP layer and one 13-dimensional MLP layer to obtain a 4096×13 matrix giving the semantic category of each point, and completing the semantic segmentation of the object;

(b) Feature selection module design

The feature selection module includes a max pooling layer, a two-layer MLP, an adder and a multiplier; the input of the feature selection module is set as a feature matrix M₁; the module applies max pooling and the two-layer MLP to M₁ to obtain a weighting vector W, and W is then multiplied with each row vector of M₁ to obtain a feature matrix M₂; finally, M₁ and M₂ are added element-wise to obtain a feature matrix M₃, i.e. the local features of the point cloud; specifically, max pooling is applied to M₁ to obtain a 1×1024 feature vector, which is passed through one 128-dimensional MLP layer and one 1024-dimensional MLP layer to obtain a 1×1024 weighting vector W; W is then multiplied with each row vector of M₁ to select useful feature channels, obtaining a 4096×1024 feature matrix M₂; M₁ and M₂ are then added element-wise to obtain the 4096×1024 feature matrix M₃, i.e. the local features of the point cloud;

in step 3, the specific method for training the neural network is as follows: