CN110322453A - 3D point cloud semantic segmentation method based on position attention and auxiliary network - Google Patents

3D point cloud semantic segmentation method based on position attention and auxiliary network

Info

Publication number
CN110322453A
Authority
CN
China
Prior art keywords
network
feature
point cloud
segmentation
indicate
Prior art date
Legal status
Granted
Application number
CN201910604264.0A
Other languages
Chinese (zh)
Other versions
CN110322453B (en)
Inventor
焦李成
冯志玺
张格格
杨淑媛
程曦娜
马清华
张杰
郭雨薇
丁静怡
唐旭
Current Assignee
Xian University of Electronic Science and Technology
Original Assignee
Xian University of Electronic Science and Technology
Priority date
Filing date
Publication date
Application filed by Xian University of Electronic Science and Technology
Priority to CN201910604264.0A
Publication of CN110322453A
Application granted
Publication of CN110322453B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds

Abstract

The invention proposes a 3D point cloud semantic segmentation method based on position attention and an auxiliary network, mainly addressing the low segmentation precision of the prior art. The implementation is: obtain a training set T and a test set V; construct a 3D point cloud semantic segmentation network and set its loss function, where the network comprises a successively cascaded feature down-sampling network, position attention module, feature up-sampling network, and auxiliary network; perform P rounds of supervised training on the segmentation network using the training set T, adjusting the network parameters according to the loss function during each round, and after the P rounds are completed take the network model with the highest segmentation precision as the trained model; input the test set V into the trained network model for semantic segmentation to obtain the segmentation result of each point. The invention improves 3D point cloud semantic segmentation precision and can be used in autonomous driving, robotics, 3D scene reconstruction, quality inspection, 3D mapping, and smart city construction.

Description

3D point cloud semantic segmentation method based on position attention and auxiliary network
Technical field
The invention belongs to the technical field of data processing, and in particular relates to a 3D point cloud semantic segmentation method that can be used in autonomous driving, robotics, 3D scene reconstruction, quality inspection, 3D mapping, and smart city construction.
Background technique
In recent years, with the wide application of 3D sensors such as lidar and RGB-D cameras in robotics and autonomous driving, deep learning on 3D point cloud data has become a research hotspot. 3D point cloud data refers to a set of vectors in a three-dimensional coordinate system, usually expressed as x, y, z coordinates and generally used to represent the external surface shape of an object. Besides the geometric information (x, y, z), a point may also carry RGB color, intensity, gray value, depth, return count, and other information. Point cloud data is usually acquired by 3D scanning devices such as lidar and RGB-D cameras: these sensors automatically measure information for a large number of points on an object's surface and export the point cloud in some data file format. Point cloud data is unordered and unstructured, and its density may vary across 3D space, which poses a great challenge to applying deep learning to 3D point cloud data.
3D point cloud semantic segmentation assigns a class to each point of the input point cloud. In early research, 3D point cloud data was usually converted into hand-crafted voxel grid features or multi-view image features before being fed into a deep learning network for feature extraction. Such feature-conversion methods are both data-heavy and computationally complex, and if the resolution is lowered the segmentation precision drops. It is therefore particularly important to process point cloud data directly with deep learning methods.
In 2017, the paper "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation", published at CVPR by Charles R. Qi et al., disclosed a deep learning framework that directly processes 3D point cloud data. It handles the unordered nature of point clouds with the symmetric max-pooling function and thereby extracts a global feature for each point, but it considers only global features and ignores the local feature of each point. Soon after PointNet, the team of Charles R. Qi published "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space" at NIPS. PointNet++ is a hierarchical version of PointNet in which every layer has three stages: sampling, grouping, and feature extraction. It first selects a number of relatively important points as the center points of local regions, then picks k neighbor points around each center point according to Euclidean distance, treats the k neighbors as a local point cloud from which features are extracted with a PointNet network, and finally regresses on the deep features to obtain the 3D point cloud semantic segmentation result. This method improves precision over PointNet.
Compared with traditional methods, both of the above methods process 3D point cloud data directly, are computationally simple, effectively handle the unordered nature of point clouds, and improve segmentation precision. However, PointNet++ does not account for the relationships between the features of the center points, i.e., contextual information, so its feature representation is relatively weak. In addition, PointNet++ follows the general encoder-decoder framework and does not consider more of the low-level information. Its segmentation precision is therefore not high and there is still room for improvement.
Summary of the invention
The object of the invention is to overcome the above shortcomings of the prior art by proposing a 3D point cloud data semantic segmentation method based on position attention and an auxiliary network, which improves segmentation precision with a position attention module that associates contextual features and an auxiliary network that reconstructs low-level information.
To achieve the above object, the technical solution of the present invention includes the following steps:
(1) Download the training file and test file of the 3D point cloud data from the official ScanNet website, perform class statistics and dicing on them, and obtain a training set T and a test set V;
(2) Construct a 3D point cloud semantic segmentation network comprising a successively cascaded feature down-sampling network, position attention module, feature up-sampling network, and auxiliary network;
(3) Use the multi-class cross entropy loss function as the loss function of the 3D point cloud semantic segmentation network;
(4) Use the training set T to perform P rounds of supervised training on the 3D point cloud semantic segmentation network, P ≥ 500;
(4a) During each round of training, adjust the network parameters according to the loss function of the semantic segmentation network to obtain a network model;
(4b) Every P_1 rounds, assess the segmentation precision of the current network model with the samples of the test set, and save the current model if its segmentation precision is higher than that of the previously saved network model, P_1 ≥ 2;
(4c) After the P rounds of training are completed, take the network model with the highest segmentation precision as the trained network model;
(5) Input the test set V into the trained network model for semantic segmentation to obtain the segmentation result of each point.
Compared with the prior art, the present invention has the following advantages:
First, in the constructed 3D point cloud semantic segmentation network, the position attention module computes the correlations between the features represented by the centroids of its input data, adding contextual information to the local centroid features of the network. Second, the auxiliary network passes back the low-level features of the segmentation network, reconstructing its low-level information. Together these effectively improve the precision of 3D point cloud semantic segmentation.
Detailed description of the invention
Fig. 1 is the implementation flowchart of the invention;
Fig. 2 is the overall structure of the 3D point cloud semantic segmentation network constructed in the invention;
Fig. 3 is the structure of the position attention module in the invention.
Specific embodiment
The invention is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, the implementation steps of this example are as follows.
Step 1, obtain the training set T and the test set V.
1.1) Download the training file and test file of the 3D point cloud data from the official ScanNet website, where the training file contains f_0 point cloud scenes and the test file contains f_1 point cloud scenes; in this embodiment f_0 = 1201 and f_1 = 312;
1.2) Use a histogram to count the number of points of each class over all f_0 scenes in the training file, and from these counts compute the weight w_k of each class, where G_k denotes the number of points of class k, M denotes the total number of points, and L denotes the number of segmentation classes, L ≥ 2; in this embodiment L = 21;
1.3) For each scene in the training file, randomly select a center point with coordinates (x, y, z), and take the points within the ranges (x-0.75, x+0.75), (y-0.75, y+0.75), (z-0.75, z+0.75) around it to form a data block;
1.4) Set a sampling number N_0 and compare the number of points in the data block obtained in 1.3) with N_0 to judge whether the block is reasonable: if the number of points in the data block is greater than N_0, the block is judged reasonable and N_0 points are randomly sampled from it to form one sample; otherwise the data block is discarded; the training set T is obtained in this way; in this embodiment N_0 = 8192;
1.5) For each of the f_1 scenes in the test file, perform sliding-window dicing with a cube window of size 1.5 × 1.5 × 3, randomly sample N_0 points from each data block to form one sample, and obtain the test set V.
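The dicing and sampling of steps 1.3) and 1.4) can be sketched as follows (a minimal NumPy illustration of the training-set case; the helper name make_training_sample is an assumption, not the implementation of the embodiment):

```python
import numpy as np

N0 = 8192  # sampling number per data block (this embodiment)

def make_training_sample(scene_xyz, rng=np.random):
    """Crop a 1.5 x 1.5 x 1.5 block around a random center point and
    sample N0 points from it; returns None if the block is rejected."""
    center = scene_xyz[rng.randint(len(scene_xyz))]           # step 1.3): random center point
    lo, hi = center - 0.75, center + 0.75                     # (x±0.75, y±0.75, z±0.75) ranges
    mask = np.all((scene_xyz >= lo) & (scene_xyz <= hi), axis=1)
    block = scene_xyz[mask]
    if len(block) <= N0:                                      # step 1.4): discard sparse blocks
        return None
    idx = rng.choice(len(block), N0, replace=False)           # random sampling of N0 points
    return block[idx]
```

The test-set case of 1.5) differs only in that the blocks come from a sliding 1.5 × 1.5 × 3 window rather than random centers.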
Step 2, construct the 3D point cloud semantic segmentation network.
Referring to Fig. 2, the 3D point cloud semantic segmentation network constructed in this step comprises a successively cascaded feature down-sampling network, position attention module, feature up-sampling network, and auxiliary network.
2.1) Set the feature down-sampling network:
The feature down-sampling network comprises n cascaded PointSA modules, each consisting of a successively cascaded point cloud centroid sampling and grouping layer and a point cloud feature extraction layer, where n ≥ 2; in this embodiment n = 4;
For the centroid sampling and grouping layer of the m-th PointSA module, m = 1, 2, ..., n: first, a set number of points is sampled from the input point set as centroids using iterative farthest point sampling; then, centered on each sampled centroid, a set number of neighboring points is gathered by spherical search within a radius r_m to form a group. In this embodiment, r_1 = 0.1 for the 1st PointSA module, r_2 = 0.2 for the 2nd, r_3 = 0.4 for the 3rd, and r_4 = 0.8 for the 4th; the numbers of sampled centroids and of points per group are likewise set per module;
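The sampling and grouping layer can be sketched as follows (a NumPy illustration of iterative farthest point sampling and spherical search; padding short groups by repeating indices is an assumption, since the patent does not say how groups with fewer points are filled):

```python
import numpy as np

def farthest_point_sampling(xyz, n_centroids):
    """Iterative farthest point sampling: greedily pick the point farthest
    from all previously selected centroids (sampling part of step 2.1)."""
    n = len(xyz)
    chosen = np.zeros(n_centroids, dtype=np.int64)
    dist = np.full(n, np.inf)
    chosen[0] = 0                                    # start from an arbitrary point
    for i in range(1, n_centroids):
        d = np.sum((xyz - xyz[chosen[i - 1]]) ** 2, axis=1)
        dist = np.minimum(dist, d)                   # distance to the nearest chosen centroid
        chosen[i] = np.argmax(dist)                  # next centroid = farthest remaining point
    return chosen

def ball_query(xyz, centroids, radius, k):
    """Spherical search: gather up to k neighbor indices within `radius`
    of each centroid (grouping part of step 2.1); pad by repetition."""
    groups = []
    for c in xyz[centroids]:
        idx = np.where(np.sum((xyz - c) ** 2, axis=1) <= radius ** 2)[0]
        idx = idx[:k] if len(idx) >= k else np.resize(idx, k)
        groups.append(idx)
    return np.stack(groups)
```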
The point cloud feature extraction layer of the m-th PointSA module comprises 3 successively cascaded 2D convolutional layers that extract features from the output of the centroid sampling and grouping layer, followed by pooling of the extracted regional features with a max-pooling strategy. In this embodiment, the 3 2D convolutional layers of the point cloud feature extraction layer of the 1st PointSA module all have 1 × 1 kernels and stride 1, with 32, 32, and 64 output channels respectively; those of the 2nd PointSA module have 1 × 1 kernels and stride 1, with 64, 64, and 128 output channels; those of the 3rd PointSA module have 1 × 1 kernels and stride 1, with 128, 128, and 256 output channels; and those of the 4th PointSA module have 1 × 1 kernels and stride 1, with 256, 256, and 512 output channels;
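Because a 1 × 1 2D convolution over grouped points is equivalent to a dense layer applied to each point, the feature extraction layer can be sketched as a shared per-point MLP followed by max pooling (a NumPy sketch; the ReLU activation is an assumption, as the patent does not name the activation):

```python
import numpy as np

def point_feature_extraction(group_feats, weights, biases):
    """Point cloud feature extraction layer of a PointSA module: the three
    1x1 2D convolutions act as a shared per-point MLP, then max pooling
    aggregates over the points of each group."""
    x = group_feats                       # (n_groups, k, c_in)
    for W, b in zip(weights, biases):     # three 1x1 convs = three dense layers
        x = np.maximum(x @ W + b, 0.0)    # shared MLP with ReLU (assumed)
    return x.max(axis=1)                  # max pooling over the k points of each group
```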
2.2) Set the position attention module, which computes the correlations between the features represented by the centroids of its input data F to obtain the attention-strengthened feature E:
Referring to Fig. 3, the module works as follows:
2.2.1) The input data F is passed through the first 1D convolutional layer Q to obtain the feature Q_i of the i-th centroid, i = 1, 2, ..., N, where N denotes the number of centroids of F; through the second 1D convolutional layer U to obtain the feature U_j of the j-th centroid, j = 1, 2, ..., N; and through the third 1D convolutional layer V to obtain the feature V_j of the j-th centroid. The three 1D convolutional layers Q, U, V all have kernel size 1 and stride 1; the numbers of output feature channels of the first 1D convolutional layer Q and of the second 1D convolutional layer U are determined from the feature channel number of the input data F, and the number of output feature channels of the third 1D convolutional layer V is identical to the feature channel number of F;
2.2.2) Compute the attention influence value t_ij between the features represented by each pair of centroids,

t_ij = exp(Q_i U_j^T) / Σ_{j=1..N} exp(Q_i U_j^T),

and form the matrix A from the t_ij;
2.2.3) Compute the position attention feature:

J_i = Σ_{j=1..N} t_ij · V_j;
2.2.4) Output the attention-strengthened feature E:

E = [E_1; E_2; ...; E_i; ...; E_N],

where E_i = α·J_i + F_i denotes the feature of the i-th centroid in E, α denotes the weight of the position attention feature J_i, and F_i denotes the feature of the i-th input centroid.
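Taken together, steps 2.2.1)-2.2.4) can be sketched as follows (a minimal NumPy illustration; the row-wise softmax normalization of t_ij follows the reconstruction above, and treating α as a learned scalar is an assumption):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(F, Wq, Wu, Wv, alpha):
    """Position attention module of step 2.2 (NumPy sketch).
    F: (N, C) centroid features. Wq, Wu: (C, C'), Wv: (C, C) are the kernels
    of the 1D convolutions Q, U, V; with kernel size 1 each convolution is a
    per-centroid linear map. alpha is the weight of the attention feature."""
    Q, U, V = F @ Wq, F @ Wu, F @ Wv   # 2.2.1): per-centroid features Q_i, U_j, V_j
    A = softmax(Q @ U.T, axis=1)       # 2.2.2): attention influence values t_ij
    J = A @ V                          # 2.2.3): position attention features J_i
    return alpha * J + F               # 2.2.4): strengthened features E_i = α·J_i + F_i
```

With F of shape (N, C), the call returns the strengthened feature E of the same shape.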
2.3) Set the feature up-sampling network:
The feature up-sampling network comprises a successively cascaded PointFP modules, a 1D convolutional layer, a Dropout layer, and a 1D convolutional layer for classification; each PointFP module comprises a successively cascaded feature interpolation layer and feature extraction layer, where a ≥ 2; in this embodiment a = 4.
The feature interpolation layers and feature extraction layers of the a PointFP modules differ in structure, as follows:
For the 1st PointFP module, the feature interpolation layer interpolates the output feature of the position attention module and concatenates the interpolated feature with the output feature of the 3rd PointSA module to obtain the output feature of the interpolation layer; its feature extraction layer comprises 2 successively cascaded 2D convolutional layers that further extract this output feature, each with a 1 × 1 kernel and stride 1, with 256 and 256 output channels respectively;
For the 2nd PointFP module, the feature interpolation layer interpolates the output feature of the 1st PointFP module and concatenates the interpolated feature with the output feature of the 2nd PointSA module to obtain the output feature of the interpolation layer; its feature extraction layer comprises 2 successively cascaded 2D convolutional layers that further extract this output feature, each with a 1 × 1 kernel and stride 1, with 256 and 256 output channels respectively;
For the 3rd PointFP module, the feature interpolation layer interpolates the output feature of the 2nd PointFP module and concatenates the interpolated feature with the output feature of the 1st PointSA module to obtain the output feature of the interpolation layer; its feature extraction layer comprises 2 successively cascaded 2D convolutional layers that further extract this output feature, each with a 1 × 1 kernel and stride 1, with 256 and 128 output channels respectively;
For the 4th PointFP module, the feature interpolation layer interpolates the output feature of the 3rd PointFP module, and the interpolated feature serves as the output feature of its interpolation layer; its feature extraction layer comprises 3 successively cascaded 2D convolutional layers that further extract this output feature, each with a 1 × 1 kernel and stride 1, with 128, 128, and 128 output channels respectively.
The 1D convolutional layer has kernel size 1, stride 1, and 128 output feature channels;
the Dropout layer has a keep probability of 0.5;
the 1D convolutional layer for classification has kernel size 1, stride 1, and a number of output feature channels equal to the number of segmentation classes L.
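The feature interpolation layers are not detailed further in the text; a plausible sketch, assuming the inverse-distance-weighted 3-nearest-neighbor interpolation used by PointNet++ (an assumption, as the patent only says the layer "carries out interpolation"), is:

```python
import numpy as np

def interpolate_features(xyz_dst, xyz_src, feats_src, k=3, eps=1e-8):
    """Feature interpolation layer of a PointFP module: propagate features
    from the sparse level (xyz_src) back to the denser level (xyz_dst) by
    inverse-distance-weighted k-nearest-neighbor interpolation."""
    out = np.empty((len(xyz_dst), feats_src.shape[1]))
    for i, p in enumerate(xyz_dst):
        d = np.sum((xyz_src - p) ** 2, axis=1)
        nn = np.argsort(d)[:k]                      # k nearest source points
        w = 1.0 / (d[nn] + eps)                     # inverse-distance weights
        out[i] = (w[:, None] * feats_src[nn]).sum(0) / w.sum()
    return out
```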
2.4) Set the auxiliary network:
The auxiliary network comprises b successively cascaded PointAux modules and a 1D convolutional layer for classification; each PointAux module comprises a 1D convolutional layer and a feature interpolation layer, where b ≥ 1; in this embodiment b = 2.
For the 1st PointAux module, the 1D convolutional layer extracts features from the output data of the 2nd PointFP module, with kernel size 1, stride 1, and a number of output feature channels equal to the number of segmentation classes L; its feature interpolation layer interpolates the features extracted by the 1D convolutional layer.
For the 2nd PointAux module, the 1D convolutional layer extracts features from the output data of the 1st PointAux module, with kernel size 1, stride 1, and a number of output feature channels equal to the number of segmentation classes L; its feature interpolation layer interpolates the features extracted by the 1D convolutional layer.
The 1D convolutional layer for classification classifies the output feature of the 2nd PointAux module, with kernel size 1, stride 1, and a number of output feature channels equal to the number of segmentation classes L.
Step 3, set the loss function of the 3D point cloud semantic segmentation network.
This example uses the multi-class cross entropy loss function as the loss function of the 3D point cloud semantic segmentation network, expressed as follows:

Loss = -(1/C) Σ_{i=1..C} Σ_{k=1..L} w_k · p_{i,k} · (log ŷ^1_{i,k} + w_a · log ŷ^2_{i,k}),

where C is the number of training sample points, L is the total number of classes, w_k is the weight of class k, and w_a is the weight of the auxiliary network loss, w_a ∈ [0, 1]; in this embodiment w_a = 0.5;
p_{i,k} is the true probability that the i-th sample point belongs to class k: if the i-th sample point belongs to class k, the probability is 1, otherwise it is 0;
ŷ^1_{i,k} and ŷ^2_{i,k} denote the probabilities, predicted by the feature up-sampling network and the auxiliary network respectively, that the i-th sample point belongs to class k, computed by the softmax:

ŷ^1_{i,k} = exp(z^1_{i,k}) / Σ_{l=1..L} exp(z^1_{i,l}),   ŷ^2_{i,k} = exp(z^2_{i,k}) / Σ_{l=1..L} exp(z^2_{i,l}),

where z^1_{i,k} and z^2_{i,k} denote the k-th channel feature value of the i-th sample point output by the feature up-sampling network and the auxiliary network respectively:

z^1_i = f_1(x_i; θ_1),   z^2_i = f_2(x_i; θ_2),

where x_i denotes the input feature of the i-th sample point, f_1 denotes the feature up-sampling network with parameters θ_1, and f_2 denotes the auxiliary network with parameters θ_2.
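Under the reconstructed formulas above, the loss can be sketched as follows (a NumPy illustration; since p_{i,k} is one-hot, only the true-class terms survive the sum over k, and the function name and shapes are illustrative):

```python
import numpy as np

def segmentation_loss(z_main, z_aux, labels, w_class, w_a=0.5):
    """Weighted multi-class cross entropy of Step 3 with the auxiliary term.
    z_main, z_aux: (C, L) channel outputs of the feature up-sampling network
    f1 and the auxiliary network f2; labels: (C,) true class indices;
    w_class: (L,) per-class weights w_k; w_a: auxiliary loss weight."""
    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)
    y1, y2 = softmax(z_main), softmax(z_aux)       # predicted probabilities
    rows = np.arange(len(labels))
    # one-hot p_{i,k}: keep only the true-class log-probabilities
    ce = w_class[labels] * (np.log(y1[rows, labels]) + w_a * np.log(y2[rows, labels]))
    return -ce.mean()                              # -(1/C) * sum over sample points
```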
Step 4, use the training set T to perform P rounds of supervised training on the 3D point cloud semantic segmentation network, P ≥ 500.
In this embodiment P = 1000; the training steps are as follows:
4.1) During the q-th round of training, let l_q be the learning rate of the q-th round and θ_q the parameters of the network model of the q-th round; according to the loss function set in Step 3, adjust θ_q by the formula θ_{q+1} = θ_q - l_q · ∂Loss/∂θ_q to obtain the network model parameters θ_{q+1} for the (q+1)-th round of training, and thus the network model after the q-th round;
4.2) Every P_1 rounds, input the test set into the current network model to obtain the predicted classes of all point cloud data in the test set, P_1 ≥ 2; in this embodiment P_1 = 5;
4.3) Count the number of points in the test set whose predicted class is identical to their true class, and compute the segmentation precision acc = R/H, where R denotes the number of points in the test set whose predicted class is identical to their true class and H denotes the total number of points in the test set;
4.4) Compare the segmentation precision acc of the current network model with that of the previously saved network model; if the acc of the current network model is higher, the current model is better and is saved; otherwise it is not saved.
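Steps 4.2)-4.4) amount to the following sketch (the helper name evaluate_and_keep_best and the (acc, model) tuple used to store the best model are illustrative assumptions):

```python
import numpy as np

def evaluate_and_keep_best(pred_labels, true_labels, model, best):
    """Compute the segmentation precision acc = R / H on the test set and
    keep the model only if it beats the stored best (steps 4.2)-4.4));
    `model` can be any serializable parameter set."""
    R = np.sum(pred_labels == true_labels)   # correctly predicted points
    H = len(true_labels)                     # all points in the test set
    acc = R / H
    if best is None or acc > best[0]:        # save only when precision improves
        return (acc, model)
    return best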
4.5) After the P rounds of training are completed, take the network model with the highest segmentation precision as the trained network model.
Step 5, input the test set V into the trained network model obtained in step 4.5) for semantic segmentation, obtaining the segmentation result of each point.
The technical effects of the invention are explained below in conjunction with simulation experiments:
1. Simulation conditions
The simulation experiments of the invention were carried out in the following environment.
Hardware platform: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20 GHz, 64 GB RAM, GeForce GTX TITAN X, Ubuntu 16.04 operating system;
Software platform: Tensorflow deep learning framework, Python 3.5; the dataset used in the experiments is the point cloud dataset ScanNet.
ScanNet is an indoor scene point cloud dataset scanned and reconstructed with RGB-D cameras. It contains 1513 scenes in total, of which 1201 are used as the training set and 312 as the test set, covering 21 classes.
2. Simulation experiments:
Training and test sets were obtained according to the invention, the 3D point cloud semantic segmentation network was constructed and given supervised training on the training set, the points in the test set were then predicted with the trained network model, and the segmentation precision of the 3D point cloud segmentation network on the test set V was computed by the method of step 4.3).
The precision of semantic segmentation of point cloud data by the invention is compared with that of the existing PointNet++ method, with segmentation precision as the evaluation index of the invention and of the prior art. The results are shown in Table 1:
Table 1. Segmentation precision comparison on the ScanNet dataset

Evaluation index         Prior art (PointNet++)    Present invention
Segmentation precision   0.836                     0.852
As can be seen from Table 1, the segmentation precision of the invention on the ScanNet dataset exceeds that of the prior art PointNet++ by 1.6 percentage points (0.852 vs. 0.836), showing that the semantic segmentation of 3D point clouds by the invention is better than that of PointNet++.

Claims (9)

1. A 3D point cloud semantic segmentation method based on position attention and an auxiliary network, characterized by comprising the following:
(1) downloading the training file and the test file of the 3D point cloud data from the official ScanNet website, performing class statistics and dicing on them, and obtaining a training set T and a test set V;
(2) constructing a 3D point cloud semantic segmentation network comprising a successively cascaded feature down-sampling network, position attention module, feature up-sampling network, and auxiliary network;
(3) using the multi-class cross entropy loss function as the loss function of the 3D point cloud semantic segmentation network;
(4) using the training set T to perform P rounds of supervised training on the 3D point cloud semantic segmentation network, P ≥ 500:
(4a) during each round of training, adjusting the network parameters according to the loss function of the semantic segmentation network to obtain a network model;
(4b) every P_1 rounds, assessing the segmentation precision of the current network model with the samples of the test set, and saving the current model if its segmentation precision is higher than that of the previously saved network model, P_1 ≥ 2;
(4c) after the P rounds of training are completed, taking the network model with the highest segmentation precision as the trained network model;
(5) inputting the test set V into the trained network model for semantic segmentation to obtain the segmentation result of each point.
2. The method according to claim 1, wherein the class statistics and dicing of the point cloud data in (1) are implemented as follows:
(1a) using a histogram to count the number of points of each class over all f_0 scenes in the training file, and from these counts computing the weight w_k of each class, where G_k denotes the number of points of class k, M denotes the total number of points, and L denotes the number of segmentation classes, f_0 ≥ 1000, L ≥ 2;
(1b) for each scene in the training file, randomly selecting a center point with coordinates (x, y, z), taking the points within the ranges (x-0.75, x+0.75), (y-0.75, y+0.75), (z-0.75, z+0.75) around it to form a data block, and comparing the number of points in the data block with the sampling number N_0 to judge whether it is reasonable: if the number of points in the data block is greater than N_0, the block is judged reasonable and N_0 points are randomly sampled from it to form one sample; otherwise the data block is discarded; the training set T is obtained in this way, where N_0 ≥ 4096;
(1c) for each of the f_1 scenes in the test file, performing sliding-window dicing with a cube window of size 1.5 × 1.5 × 3, randomly sampling N_0 points from each data block to form one sample, and obtaining the test set V, f_1 ≥ 300.
3. The method according to claim 1, wherein the feature down-sampling network in (2) comprises n cascaded PointSA modules, each PointSA module comprising a successively cascaded point cloud centroid sampling and grouping layer and a point cloud feature extraction layer, where n ≥ 2.
4. The method according to claim 1, wherein the position attention module in (2) comprises 3 independent 1D convolutional layers Q, U, V, which extract features of the module input F and from which the attention influence value t_ij between the features represented by each pair of centroids and the attention-strengthened feature E are computed:

t_ij = exp(Q_i U_j^T) / Σ_{j=1..N} exp(Q_i U_j^T),
E = [E_1; E_2; ...; E_i; ...; E_N],

where Q_i denotes the feature of the i-th centroid extracted from the input data F of the position attention module by the 1D convolutional layer Q, U_j^T denotes the transpose of the feature of the j-th centroid extracted from F by the 1D convolutional layer U, and N denotes the number of centroids of F; E_i denotes the feature of the i-th centroid in E, computed as:

E_i = α·J_i + F_i, with J_i = Σ_{j=1..N} t_ij · V_j,

where V_j denotes the feature of the j-th centroid extracted from F by the 1D convolutional layer V, J_i denotes the feature of the i-th centroid after position attention, α denotes the weight of the position attention feature, and F_i denotes the feature of the i-th input centroid.
5. The method according to claim 1, wherein the feature up-sampling network in (2) comprises a successively cascaded PointFP modules, a 1D convolutional layer, a Dropout layer, and a 1D convolutional layer for classification, each PointFP module comprising a successively cascaded feature interpolation layer and feature extraction layer, where a ≥ 2.
6. The method according to claim 1, wherein the auxiliary network in (2) comprises b successively cascaded PointAux modules and a 1D convolutional layer for classification, each PointAux module comprising a 1D convolutional layer and a feature interpolation layer, where b ≥ 1.
7. The method according to claim 1, wherein the loss function of the 3D point cloud semantic segmentation network in step (3) is computed as:

Loss = -(1/C) Σ_{i=1..C} Σ_{k=1..L} w_k · p_{i,k} · (log ŷ^1_{i,k} + w_a · log ŷ^2_{i,k}),

where C is the number of training sample points, L is the total number of classes, w_k is the weight of class k, and w_a is the weight of the auxiliary network loss, w_a ∈ [0, 1]; p_{i,k} is the true probability that the i-th sample point belongs to class k: if the i-th sample point belongs to class k, the probability is 1, otherwise it is 0; ŷ^1_{i,k} and ŷ^2_{i,k} denote the probabilities, predicted by the feature up-sampling network and the auxiliary network respectively, that the i-th sample point belongs to class k:

ŷ^1_{i,k} = exp(z^1_{i,k}) / Σ_{l=1..L} exp(z^1_{i,l}),   ŷ^2_{i,k} = exp(z^2_{i,k}) / Σ_{l=1..L} exp(z^2_{i,l}),

where z^1_{i,k} and z^2_{i,k} denote the k-th channel feature value of the i-th sample point output by the feature up-sampling network and the auxiliary network respectively:

z^1_i = f_1(x_i; θ_1),   z^2_i = f_2(x_i; θ_2),

where x_i denotes the input feature of the i-th sample point, f_1 denotes the feature up-sampling network with parameters θ_1, and f_2 denotes the auxiliary network with parameters θ_2.
8. The method according to claim 1, wherein the network parameters are adjusted in (4a) according to the loss function of the semantic segmentation network by the following formula:

θ_{q+1} = θ_q - l_q · ∂Loss/∂θ_q,

where l_q denotes the learning rate of the q-th round of training, θ_q denotes the parameters of the 3D point cloud semantic segmentation network in the q-th round of training, and θ_{q+1} denotes the parameters after adjusting θ_q, used for the (q+1)-th round of training.
9. The method according to claim 1, wherein the assessment of the segmentation precision of the current network model every P_1 rounds in (4b) is implemented as follows:
(4b1) every P_1 rounds, inputting the test set into the current network model to obtain the predicted classes of all point cloud data in the test set;
(4b2) counting the number of points in the test set whose predicted class is identical to their true class, and computing the segmentation precision acc = R/H, where R denotes the number of points in the test set whose predicted class is identical to their true class and H denotes the total number of points in the test set;
(4b3) comparing the segmentation precision acc of the current network model with that of the previously saved network model; if the acc of the current network model is higher than that of the previously saved network model, the current model is better and is saved; otherwise it is not saved.
CN201910604264.0A 2019-07-05 2019-07-05 3D point cloud semantic segmentation method based on position attention and auxiliary network Active CN110322453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910604264.0A CN110322453B (en) 2019-07-05 2019-07-05 3D point cloud semantic segmentation method based on position attention and auxiliary network


Publications (2)

Publication Number Publication Date
CN110322453A (en) 2019-10-11
CN110322453B CN110322453B (en) 2023-04-18

Family

ID=68122807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910604264.0A Active CN110322453B (en) 2019-07-05 2019-07-05 3D point cloud semantic segmentation method based on position attention and auxiliary network

Country Status (1)

Country Link
CN (1) CN110322453B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034267A (en) * 2010-11-30 2011-04-27 中国科学院自动化研究所 Three-dimensional reconstruction method of target based on attention
CN102036073A (en) * 2010-12-21 2011-04-27 西安交通大学 Method for encoding and decoding JPEG2000 image based on vision potential attention target area
US20190026956A1 (en) * 2012-02-24 2019-01-24 Matterport, Inc. Employing three-dimensional (3d) data predicted from two-dimensional (2d) images using neural networks for 3d modeling applications and other applications
CN103871050A (en) * 2014-02-19 2014-06-18 小米科技有限责任公司 Image partition method, device and terminal
US20190108639A1 (en) * 2017-10-09 2019-04-11 The Board Of Trustees Of The Leland Stanford Junior University Systems and Methods for Semantic Segmentation of 3D Point Clouds
US20190147245A1 (en) * 2017-11-14 2019-05-16 Nuro, Inc. Three-dimensional object detection for autonomous robotic systems using image proposals
CN109871532A (en) * 2019-01-04 2019-06-11 平安科技(深圳)有限公司 Text subject extracting method, device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JUN FU ET AL.: "Dual Attention Network for Scene Segmentation", arXiv.org *
LEI WANG ET AL.: "Graph Attention Convolution for Point Cloud Semantic Segmentation", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
CHEN XIAOFAN ET AL.: "Face super-resolution reconstruction combined with attention mechanism", Journal of Xidian University *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827398A (en) * 2019-11-04 2020-02-21 北京建筑大学 Indoor three-dimensional point cloud automatic semantic segmentation algorithm based on deep neural network
CN110827398B (en) * 2019-11-04 2023-12-26 北京建筑大学 Automatic semantic segmentation method for indoor three-dimensional point cloud based on deep neural network
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN111223120A (en) * 2019-12-10 2020-06-02 南京理工大学 Point cloud semantic segmentation method
CN111192270A (en) * 2020-01-03 2020-05-22 中山大学 Point cloud semantic segmentation method based on point global context reasoning
CN111428619B (en) * 2020-03-20 2022-08-05 电子科技大学 Three-dimensional point cloud head attitude estimation system and method based on ordered regression and soft labels
CN111428619A (en) * 2020-03-20 2020-07-17 电子科技大学 Three-dimensional point cloud head attitude estimation system and method based on ordered regression and soft labels
CN111583263B (en) * 2020-04-30 2022-09-23 北京工业大学 Point cloud segmentation method based on joint dynamic graph convolution
CN111583263A (en) * 2020-04-30 2020-08-25 北京工业大学 Point cloud segmentation method based on joint dynamic graph convolution
CN112633330A (en) * 2020-12-06 2021-04-09 西安电子科技大学 Point cloud segmentation method, system, medium, computer device, terminal and application
CN112633330B (en) * 2020-12-06 2024-02-02 西安电子科技大学 Point cloud segmentation method, system, medium, computer equipment, terminal and application
CN112560865A (en) * 2020-12-23 2021-03-26 清华大学 Semantic segmentation method for point cloud under outdoor large scene
CN112560865B (en) * 2020-12-23 2022-08-12 清华大学 Semantic segmentation method for point cloud under outdoor large scene
CN112927248A (en) * 2021-03-23 2021-06-08 重庆邮电大学 Point cloud segmentation method based on local feature enhancement and conditional random field
CN113205509A (en) * 2021-05-24 2021-08-03 山东省人工智能研究院 Blood vessel plaque CT image segmentation method based on position convolution attention network
CN113205509B (en) * 2021-05-24 2021-11-09 山东省人工智能研究院 Blood vessel plaque CT image segmentation method based on position convolution attention network
CN113470048A (en) * 2021-07-06 2021-10-01 北京深睿博联科技有限责任公司 Scene segmentation method, device, equipment and computer readable storage medium
WO2023072175A1 (en) * 2021-10-30 2023-05-04 华为技术有限公司 Point cloud data processing method, neural network training method, and related device
CN115619963A (en) * 2022-11-14 2023-01-17 吉奥时空信息技术股份有限公司 City building entity modeling method based on content perception

Also Published As

Publication number Publication date
CN110322453B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN110322453A (en) 3D point cloud semantic segmentation method based on position attention and auxiliary network
CN110245709B (en) 3D point cloud data semantic segmentation method based on deep learning and self-attention
CN111832655B (en) Multi-scale three-dimensional target detection method based on characteristic pyramid network
CN109344821A (en) Small target detecting method based on Fusion Features and deep learning
CN108805070A (en) A kind of deep learning pedestrian detection method based on built-in terminal
CN107609525A (en) Remote Sensing Target detection method based on Pruning strategy structure convolutional neural networks
CN110175613A (en) Street view image semantic segmentation method based on Analysis On Multi-scale Features and codec models
CN111160108A (en) Anchor-free face detection method and system
CN105956560A (en) Vehicle model identification method based on pooling multi-scale depth convolution characteristics
CN112487862B (en) Garage pedestrian detection method based on improved EfficientDet model
CN105243139A (en) Deep learning based three-dimensional model retrieval method and retrieval device thereof
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN111860587B (en) Detection method for small targets of pictures
CN104809469A (en) Indoor scene image classification method facing service robot
CN110334584B (en) Gesture recognition method based on regional full convolution network
CN108446616A (en) Method for extracting roads based on full convolutional neural networks integrated study
CN112750125B (en) Glass insulator piece positioning method based on end-to-end key point detection
CN115909052A (en) Hyperspectral remote sensing image classification method based on hybrid convolutional neural network
CN112950780B (en) Intelligent network map generation method and system based on remote sensing image
CN109800756A (en) A kind of text detection recognition methods for the intensive text of Chinese historical document
CN111339924A (en) Polarized SAR image classification method based on superpixel and full convolution network
CN107832753B (en) Face feature extraction method based on four-value weight and multiple classification
CN109859222A (en) Edge extracting method and system based on cascade neural network
CN109671055A (en) Pulmonary nodule detection method and device
CN113032613B (en) Three-dimensional model retrieval method based on interactive attention convolution neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant