CN110322453B - 3D point cloud semantic segmentation method based on position attention and auxiliary network - Google Patents
- Publication number
- CN110322453B CN110322453B CN201910604264.0A CN201910604264A CN110322453B CN 110322453 B CN110322453 B CN 110322453B CN 201910604264 A CN201910604264 A CN 201910604264A CN 110322453 B CN110322453 B CN 110322453B
- Authority
- CN
- China
- Prior art keywords
- network
- point cloud
- training
- representing
- segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
Abstract
The invention provides a 3D point cloud semantic segmentation method based on position attention and an auxiliary network, which mainly solves the problem of low segmentation precision in the prior art. The implementation scheme is as follows: acquire a training set T and a test set V; construct a 3D point cloud semantic segmentation network and set its loss function, the network comprising a feature down-sampling network, a position attention module, a feature up-sampling network and an auxiliary network cascaded in sequence; perform P rounds of supervised training on the segmentation network with the training set T, adjusting the network parameters according to the loss function in each round, and after the P rounds are completed take the network model with the highest segmentation precision as the trained model; input the test set V into the trained model for semantic segmentation to obtain the segmentation result of each point. The method improves the semantic segmentation precision of 3D point clouds and can be used for automatic driving, robots, 3D scene reconstruction, quality detection, 3D mapping and smart city construction.
Description
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a 3D point cloud semantic segmentation method which can be used for automatic driving, robots, 3D scene reconstruction, quality detection, 3D mapping and smart city construction.
Background
In recent years, with the wide application of lidar, RGB-D cameras and other 3D sensors in robotics and autonomous driving, applying deep learning to 3D point cloud data has become a research hotspot. 3D point cloud data is a set of vectors in a three-dimensional coordinate system, usually expressed as (x, y, z) coordinates and typically used to represent the shape of the outer surface of an object. Besides the geometric information (x, y, z), each point may also carry RGB color, intensity, gray-scale value, depth, or number of returns. Point cloud data is typically acquired with 3D scanning devices such as lidar or RGB-D cameras, which automatically measure a large number of points on the object surface and output the point cloud in a data file. Point cloud data is unordered and unstructured, and its density can vary across 3D space, which makes applying deep learning to 3D point cloud data a major challenge.
3D point cloud semantic segmentation refers to assigning a category to each point in the input point cloud data. In early research, 3D point cloud data was generally converted into hand-crafted voxel grid features or multi-view image features before being fed into a deep learning network for feature extraction. Such feature-conversion methods not only produce large data volumes and heavy computation, but also lose segmentation accuracy when the resolution is reduced. It is therefore especially important to process point cloud data directly with deep learning methods.
In 2017, Charles R. Qi et al. published a paper at CVPR presenting PointNet, a deep learning framework that processes 3D point cloud data directly and handles the unordered nature of point clouds with a symmetric max-pooling function to extract a global feature of the point cloud. However, this method only considers global features and ignores the local features around each point. Shortly after PointNet, Charles R. Qi's team published a paper at NIPS named "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space". PointNet++ is a hierarchical version of PointNet in which each layer has three stages: sampling, grouping and feature extraction. First, a set of representative points is selected as the center points of local regions; then k neighboring points are selected around each center point according to Euclidean distance. These k neighbors are treated as a local point cloud from which features are extracted with a PointNet network, and the deep features are then propagated back to obtain the 3D point cloud semantic segmentation result.
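To make the role of the symmetric function concrete, the following minimal NumPy sketch (illustrative only, not the patented method or the exact PointNet architecture) shows why max-pooling over per-point features yields a global feature that does not depend on the order of the input points:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(1024, 3))                        # one point cloud: 1024 points (x, y, z)
weights = rng.normal(size=(3, 64))                         # stand-in for a shared per-point MLP

features = np.maximum(points @ weights, 0.0)               # shared "MLP" + ReLU, shape (1024, 64)
global_feature = features.max(axis=0)                      # symmetric max-pooling over all points

shuffled = rng.permutation(points)                         # same points, different order
shuffled_global = np.maximum(shuffled @ weights, 0.0).max(axis=0)
assert np.allclose(global_feature, shuffled_global)        # the global feature is order-invariant
```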
Compared with traditional methods, these two approaches process the 3D point cloud data directly, are computationally simple, effectively handle the unordered nature of point clouds, and improve segmentation accuracy. However, PointNet++ does not consider the relations among the features of the center points, i.e. context information, so its feature representation is relatively weak; in addition, PointNet++ follows a general encoder-decoder framework and does not exploit more of the low-level information, so its segmentation accuracy is not high and there is still room for improvement.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a 3D point cloud semantic segmentation method based on position attention and an auxiliary network, which combines position attention over context features with an auxiliary network that reconstructs low-level information, thereby improving segmentation precision.
In order to achieve the purpose, the technical scheme of the invention comprises the following steps:
(1) Downloading a training file and a test file of 3D point cloud data from the ScanNet official website, and carrying out category statistics and block cutting processing on the training file and the test file to obtain a training set T and a test set V;
(2) Constructing a 3D point cloud semantic segmentation network, which comprises a feature down-sampling network, a position attention module, a feature up-sampling network and an auxiliary network which are sequentially cascaded;
(3) Using a multi-classification cross entropy loss function as a loss function of the 3D point cloud semantic segmentation network;
(4) Performing P rounds of supervised training on the 3D point cloud semantic segmentation network by using the training set T, wherein P ≥ 500:
(4a) In each round of training process, according to the loss function of the semantic segmentation network, adjusting network parameters to obtain a network model;
(4b) Every P_1 rounds, evaluating the segmentation accuracy of the current network model using samples from the test set, and if the segmentation accuracy of the current network model is higher than that of the previously stored network model, storing the current network model, P_1 ≥ 2;
(4c) After P rounds of training are finished, the network model with the highest segmentation precision is used as a trained network model;
(5) And inputting the test set V into the trained network model for semantic segmentation to obtain a segmentation result of each point.
Compared with the prior art, the invention has the following advantages:
Because the invention constructs a 3D point cloud semantic segmentation network in which the position attention module calculates the relevance among the features represented by each centroid of its input data, context information is added to the local centroid features; meanwhile, the auxiliary network propagates the low-level features of the network back so that the low-level information is reconstructed, which effectively improves the precision of 3D point cloud semantic segmentation.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a whole structure diagram of a 3D point cloud semantic segmentation network constructed in the invention;
FIG. 3 is a block diagram of a location attention module according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments.
Referring to fig. 1, implementation steps of this example include the following.
Step 1, a training set T and a test set V are obtained.
1.1) Download the training file and test file of 3D point cloud data from the ScanNet official website, where the training file contains f_0 point cloud scenes and the test file contains f_1 point cloud scenes; in this example f_0 = 1201, f_1 = 312;
1.2) Using histogram statistics, count the number of points of each category in the point cloud data of all f_0 scenes in the training file, and calculate the weight w_k of each category:
where G_k represents the number of points of the k-th category, M represents the total number of point cloud data, L represents the number of segmentation categories, L ≥ 2; in this embodiment L = 21;
1.3) For each scene in the training file, randomly select a point as the center point with coordinates (x, y, z), and take the points within the ranges (x-0.75, x+0.75), (y-0.75, y+0.75), (z-0.75, z+0.75) around it to form a data block;
1.4) Set the number of sampling points N_0, and compare the number of points in the data block obtained in 1.3) with N_0 to judge whether the block is valid: if the number of points in the data block is larger than N_0, the data block is judged valid and N_0 points are randomly sampled from it to form one sample; otherwise the data block is discarded. The training set T is obtained in this way; in this embodiment N_0 = 8192;
1.5) For each scene among all f_1 scenes in the test file, cut the scene with a sliding cubic window of size 1.5 × 3, and randomly sample N_0 points from each data block to form one sample, obtaining the test set V.
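As an illustration of the block cutting and sampling of step 1, the following hedged Python sketch builds one training sample under the stated assumptions; the names cut_block, n0 and the toy scene are illustrative and not from the patent text:

```python
import numpy as np

def cut_block(scene, n0=8192, half=0.75, rng=None):
    """scene: (M, C) array whose first three columns are (x, y, z); returns one (n0, C) sample or None."""
    if rng is None:
        rng = np.random.default_rng()
    center = scene[rng.integers(len(scene)), :3]                   # randomly chosen centre point (x, y, z)
    mask = np.all(np.abs(scene[:, :3] - center) <= half, axis=1)   # 1.5 m cube around the centre
    block = scene[mask]
    if len(block) <= n0:                                           # too few points: discard the block
        return None
    idx = rng.choice(len(block), size=n0, replace=False)           # randomly keep N_0 points
    return block[idx]

# toy usage on a random "scene": (x, y, z) plus three feature columns
scene = np.random.default_rng(0).random((50000, 6)) * 2.0
sample = cut_block(scene)
```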
And 2, constructing a 3D point cloud semantic segmentation network.
Referring to fig. 2, the 3D point cloud semantic segmentation network constructed in this step includes a feature down-sampling network, a location attention module, a feature up-sampling network, and an auxiliary network, which are sequentially cascaded.
2.1 Set up a feature downsampling network:
The feature down-sampling network comprises n cascaded PointSA modules; each PointSA module comprises a point cloud centroid sampling and grouping layer and a point cloud feature extraction layer cascaded in sequence, n ≥ 2; in this embodiment n = 4;
For the centroid sampling and grouping layer of the m-th PointSA module, m = 1, 2, ..., n: first, a number of points are sampled from the input as centroid points; then, taking each sampled centroid as the center, a spherical search algorithm is used to find the points within a given radius r_m, and these points constitute one group. In this embodiment, the 1st PointSA module is set with r_1 = 0.1, the 2nd with r_2 = 0.2, the 3rd with r_3 = 0.4, and the 4th with r_4 = 0.8;
The point cloud feature extraction layer of the m-th PointSA module comprises 3 sequentially cascaded 2D convolutional layers; it extracts features from the data output by the centroid sampling and grouping layer and pools the extracted region features with a max-pooling strategy. In this embodiment, the convolution kernels of the 3 2D convolutional layers of the point cloud feature extraction layer of the 1st PointSA module are all 1 × 1 with stride 1 and the numbers of output channels are 32, 32 and 64, respectively; for the 2nd PointSA module they are all 1 × 1 with stride 1 and the numbers of output channels are 64, 64 and 128; for the 3rd PointSA module they are all 1 × 1 with stride 1 and the numbers of output channels are 128, 128 and 256; for the 4th PointSA module they are all 1 × 1 with stride 1 and the numbers of output channels are 256, 256 and 512;
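The following NumPy sketch illustrates one PointSA module as described above. The exact centroid-sampling rule and group sizes are not stated in this text, so farthest point sampling and a k-nearest-neighbour ball query are assumed here, as in PointNet++-style set abstraction; the function names and toy sizes are illustrative only:

```python
import numpy as np

def farthest_point_sampling(xyz, n_centroids, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    idx = [int(rng.integers(len(xyz)))]                    # start from a random point
    dist = np.full(len(xyz), np.inf)
    for _ in range(n_centroids - 1):
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[idx[-1]], axis=1))
        idx.append(int(dist.argmax()))                     # pick the point farthest from all chosen centroids
    return np.array(idx)

def point_sa(xyz, feats, n_centroids, radius, k, mlp_weights, rng=None):
    """xyz: (N, 3) coordinates, feats: (N, C_in) features; returns centroid xyz and (n_centroids, C_out) features."""
    c_idx = farthest_point_sampling(xyz, n_centroids, rng)
    centroids = xyz[c_idx]
    pooled = []
    for c in centroids:
        d = np.linalg.norm(xyz - c, axis=1)
        group = np.argsort(d)[:k]                          # k nearest neighbours ...
        group = group[d[group] <= radius]                  # ... restricted to the ball of radius r_m
        g = feats[group]
        for w in mlp_weights:                              # shared 1x1 convolutions == per-point matmuls + ReLU
            g = np.maximum(g @ w, 0.0)
        pooled.append(g.max(axis=0))                       # max-pool over the group
    return centroids, np.stack(pooled)

# toy usage: 1024 points with 3-channel features, 256 centroids, radius 0.1, groups of up to 32 points
rng = np.random.default_rng(0)
xyz = rng.random((1024, 3))
feats = xyz.copy()
weights = [rng.normal(size=(3, 32)), rng.normal(size=(32, 32)), rng.normal(size=(32, 64))]
new_xyz, new_feats = point_sa(xyz, feats, 256, 0.1, 32, weights, rng)
```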
2.2) Set up the position attention module, which calculates the correlation among the features represented by each centroid of the input data F and outputs the position-attention-enhanced feature E:
referring to fig. 3, the module works as follows:
2.2.1) The input data F is passed through the first 1D convolutional layer Q to obtain the feature Q_i of the i-th centroid, i = 1, 2, ..., N, where N denotes the number of centroids of F; through the second 1D convolutional layer U to obtain the feature U_j of the j-th centroid, j = 1, 2, ..., N; and through the third 1D convolutional layer V to obtain the feature V_j of the j-th centroid. The convolution kernel sizes of the three 1D convolutional layers Q, U, V are all 1 with stride 1; the number of output feature channels of the first 1D convolutional layer Q and the second 1D convolutional layer U is determined by the number of feature channels of the input data F, and the number of output feature channels of the third 1D convolutional layer V is the same as that of the input data F;
2.2.2) Compute the attention-influence values t_ij between the features represented by the respective centroids, and use the t_ij to form a matrix A:
2.2.4) Output the attention-enhanced feature E:

E = [E_1; E_2; ...; E_i; ...; E_N],

where E_i = αJ_i + F_i denotes the feature of the i-th centroid in E, α denotes the weight of the position attention feature J_i, and F_i denotes the feature of the i-th centroid of the input;
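A hedged NumPy sketch of the position attention computation follows. The normalisation of the attention-influence values is not given in this text, so a softmax over U_i Q_j^T, as in standard position-attention blocks, is assumed; the function name and toy shapes are illustrative:

```python
import numpy as np

def position_attention(F, Wq, Wu, Wv, alpha):
    """F: (N, C) centroid features; Wq, Wu: (C, C') projections; Wv: (C, C); alpha: scalar weight."""
    Q = F @ Wq                                     # first 1D conv (kernel size 1)  -> Q_i
    U = F @ Wu                                     # second 1D conv                 -> U_j
    V = F @ Wv                                     # third 1D conv                  -> V_j
    scores = U @ Q.T                               # attention-influence values t_ij before normalisation
    scores -= scores.max(axis=1, keepdims=True)    # numerically stable softmax (assumed normalisation)
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)              # matrix A of attention weights
    J = A @ V                                      # position attention feature J_i
    return alpha * J + F                           # E_i = alpha * J_i + F_i

# toy usage with 16 centroids of 512 channels
rng = np.random.default_rng(0)
F = rng.normal(size=(16, 512))
E = position_attention(F, rng.normal(size=(512, 64)), rng.normal(size=(512, 64)),
                       rng.normal(size=(512, 512)), alpha=0.1)
```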
2.3 Set up a feature upsampling network:
The feature up-sampling network comprises a PointFP modules, a 1D convolutional layer, a Dropout layer and a 1D convolutional layer for classification, cascaded in sequence; each PointFP module comprises a feature interpolation layer and a feature extraction layer cascaded in sequence, a ≥ 2; in this embodiment a = 4;
The a PointFP modules differ in the structure of their feature interpolation layers and feature extraction layers, as follows:
For the 1st PointFP module, the feature interpolation layer interpolates the output features of the position attention module, and the interpolated features are concatenated with the output features of the 3rd PointSA module to give the output of the feature interpolation layer; the feature extraction layer comprises 2 sequentially cascaded 2D convolutional layers for further feature extraction, with 1 × 1 kernels, stride 1, and 256 and 256 output channels, respectively;

For the 2nd PointFP module, the feature interpolation layer interpolates the output features of the 1st PointFP module, and the interpolated features are concatenated with the output features of the 2nd PointSA module to give the output of the feature interpolation layer; the feature extraction layer comprises 2 sequentially cascaded 2D convolutional layers for further feature extraction, with 1 × 1 kernels, stride 1, and 256 and 256 output channels, respectively;

For the 3rd PointFP module, the feature interpolation layer interpolates the output features of the 2nd PointFP module, and the interpolated features are concatenated with the output features of the 1st PointSA module to give the output of the feature interpolation layer; the feature extraction layer comprises 2 sequentially cascaded 2D convolutional layers for further feature extraction, with 1 × 1 kernels, stride 1, and 256 and 128 output channels, respectively;

For the 4th PointFP module, the feature interpolation layer interpolates the output features of the 3rd PointFP module, and the interpolated features serve directly as the output of the feature interpolation layer; the feature extraction layer comprises 3 sequentially cascaded 2D convolutional layers for further feature extraction, with 1 × 1 kernels, stride 1, and 128, 128 and 128 output channels, respectively.
The 1D convolutional layer has kernel size 1, stride 1, and 128 output feature channels;

The Dropout layer has a keep probability of 0.5;

The 1D convolutional layer for classification has kernel size 1 and stride 1, and its number of output feature channels is set to the number of segmentation categories L.
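The interpolation rule of the feature interpolation layers is not spelled out in this text; the sketch below assumes the inverse-distance-weighted 3-nearest-neighbour interpolation of PointNet++ as an illustration, with hypothetical function and parameter names:

```python
import numpy as np

def interpolate_features(dense_xyz, sparse_xyz, sparse_feats, k=3, eps=1e-8):
    """Propagate features from sparse points (M, 3)/(M, C) back to dense points (N, 3) -> (N, C)."""
    out = np.empty((len(dense_xyz), sparse_feats.shape[1]))
    for i, p in enumerate(dense_xyz):
        d = np.linalg.norm(sparse_xyz - p, axis=1)
        nn = np.argsort(d)[:k]                             # three nearest sparse points
        w = 1.0 / (d[nn] + eps)                            # inverse-distance weights
        out[i] = (w[:, None] * sparse_feats[nn]).sum(axis=0) / w.sum()
    return out
# The interpolated features are then concatenated with the skip features of the matching
# PointSA module and refined by the feature extraction convolutions of the PointFP module.
```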
2.4 Set up the auxiliary network:
The auxiliary network comprises b PointAux modules and a 1D convolutional layer for classification, cascaded in sequence, b ≥ 1; in this embodiment b = 2;
For the 1st PointAux module, its 1D convolutional layer extracts features from the output data of the 2nd PointFP module, with kernel size 1, stride 1, and the number of segmentation categories L as output feature channels; its feature interpolation layer interpolates the features extracted by the 1D convolutional layer;

For the 2nd PointAux module, its 1D convolutional layer extracts features from the output data of the 1st PointAux module, with kernel size 1, stride 1, and the number of segmentation categories L as output feature channels; its feature interpolation layer interpolates the features extracted by the 1D convolutional layer;

The 1D convolutional layer for classification classifies the output features of the 2nd PointAux module, with kernel size 1, stride 1, and the number of segmentation categories L as output feature channels.
And 3, setting a loss function of the 3D point cloud semantic segmentation network.
This example takes a multi-class cross entropy loss function as the loss function of the 3D point cloud semantic segmentation network, expressed as follows:
where C represents the number of training sample points, L represents the total number of categories, w_k is the weight of class k, and w_a is the weight of the auxiliary network loss, w_a ∈ [0, 1]; in this embodiment w_a = 0.5;
p_{i,k} represents the true probability that the i-th sample point belongs to the k-th class: the value is 1 if the i-th sample point belongs to the k-th class, and 0 otherwise;
the probabilities that the i-th sample point belongs to the k-th class, as predicted by the feature up-sampling network and by the auxiliary network respectively, are calculated as follows:
where the k-th channel feature values of the i-th sample point output by the feature up-sampling network and by the auxiliary network are calculated as follows:
where x_i denotes the input features of the i-th sample point, f_1 denotes the feature up-sampling network, θ_1 the parameters of the feature up-sampling network, f_2 the auxiliary network, and θ_2 the parameters of the auxiliary network.
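A possible closed form of the loss and the predicted probabilities, written from the symbol definitions above, is the following hedged reconstruction; the exact normalisation in the original may differ:

```latex
% Hedged reconstruction of the weighted multi-class cross entropy with auxiliary loss.
\mathrm{Loss} = -\frac{1}{C}\sum_{i=1}^{C}\sum_{k=1}^{L} w_k
  \Big( p_{i,k}\,\log \hat{p}^{(1)}_{i,k} \;+\; w_a\, p_{i,k}\,\log \hat{p}^{(2)}_{i,k} \Big),
\qquad
\hat{p}^{(s)}_{i,k} = \frac{\exp\!\big(y^{(s)}_{i,k}\big)}{\sum_{c=1}^{L}\exp\!\big(y^{(s)}_{i,c}\big)},
\qquad
y^{(1)}_{i} = f_1(x_i;\theta_1),\quad y^{(2)}_{i} = f_2(x_i;\theta_2),
```

where s = 1 indexes the feature up-sampling network and s = 2 the auxiliary network.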
And 4, performing P rounds of supervised training on the 3D point cloud semantic segmentation network by using the training set T, wherein P is more than or equal to 500.
In this embodiment, P =1000 is taken, and the training steps are as follows:
4.1) In the q-th round of training, let l_q be the learning rate of the q-th round and θ_q the parameters of the network model in the q-th round; according to the loss function set in step 3, adjust θ_q with learning rate l_q to obtain the network model parameters θ_{q+1} used in the (q+1)-th round, thereby obtaining the network model after the q-th round of training;
4.2) Every P_1 rounds, input the test set into the current network model to obtain the predicted categories of all point cloud data in the test set, P_1 ≥ 2; in this example P_1 = 5;
4.3) Count the number of points in the test set whose predicted category is the same as their true category, and calculate the segmentation precision acc = R / H, where R represents the number of points in the test set whose predicted category equals their true category, and H represents the total number of points in the test set;
4.4) Compare the segmentation precision acc of the current network model with that of the previously stored network model; if the current model's precision is higher, the current network model is better and is stored, otherwise it is not stored.
4.5 After P rounds of training are finished, the network model with the highest segmentation precision is used as a trained network model;
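The training schedule of step 4 can be summarised by the following toy-scale Python sketch; the tiny linear model and random data are stand-ins for the segmentation network (not the patented architecture), and a plain gradient-descent update with learning rate l_q, consistent with claim 6, is assumed:

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(256, 8)), rng.integers(0, 3, size=256)     # toy points and labels
theta = np.zeros((8, 3))                                           # toy "network" parameters
P, P1, lr = 200, 5, 0.1
best_acc, best_theta = 0.0, theta.copy()

for q in range(1, P + 1):
    logits = X @ theta
    probs = np.exp(logits - logits.max(1, keepdims=True))
    probs /= probs.sum(1, keepdims=True)
    grad = X.T @ (probs - np.eye(3)[y]) / len(X)                   # cross-entropy gradient
    theta -= lr * grad                                             # theta_{q+1} = theta_q - l_q * grad
    if q % P1 == 0:                                                # every P_1 rounds: evaluate
        acc = float(((X @ theta).argmax(1) == y).mean())           # acc = R / H
        if acc > best_acc:                                         # keep only the better model
            best_acc, best_theta = acc, theta.copy()
# After P rounds, best_theta plays the role of the trained model used in step 5.
```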
Step 5) Input the test set V into the trained network model obtained in step 4.5) for semantic segmentation to obtain the segmentation result of each point.
The technical effects of the invention are further explained below with simulation experiments.
1. simulation conditions
The simulation experiment of the present invention was performed in the following environment.
Hardware platform: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20 GHz, 64 GB RAM, Ubuntu 16.04 operating system, GeForce GTX TITAN X GPU;
Software platform: TensorFlow deep learning framework, Python 3.5. The dataset used for the experiments is the ScanNet point cloud dataset.
ScanNet is a point cloud dataset of indoor scenes scanned and reconstructed with an RGB-D camera. It contains 1513 scenes in total, of which 1201 are used as the training set and 312 as the test set, with 21 categories.
2. Simulation experiment:
according to the method, a training set and a test set are obtained, a 3D point cloud semantic segmentation network is constructed, supervised training is carried out on the 3D point cloud semantic segmentation network by using the training set, then points in the test set are predicted by using a trained network model, and the segmentation precision of the 3D point cloud segmentation network on the test set V is calculated according to the method in the step 4.3.
The precision of semantic segmentation on the point cloud data is compared between the invention and the existing PointNet++ method, using segmentation precision as the evaluation index; the results are shown in Table 1:
TABLE 1 Segmentation accuracy comparison on the ScanNet dataset

Evaluation index | Prior art (PointNet++) | The invention
---|---|---
Segmentation accuracy | 0.836 | 0.852
As can be seen from Table 1, the segmentation precision of the invention on the ScanNet dataset exceeds that of the prior-art PointNet++ by 1.6 percentage points, which indicates that the semantic segmentation effect of the invention on 3D point clouds is stronger than that of PointNet++.
Claims (7)
1. A3D point cloud semantic segmentation method based on position attention and an auxiliary network is characterized by comprising the following steps:
(1) Downloading a training file and a test file of 3D point cloud data from the ScanNet official website, and carrying out category statistics and block cutting processing on the training file and the test file to obtain a training set T and a test set V;
(2) Constructing a 3D point cloud semantic segmentation network, which comprises a feature down-sampling network, a position attention module, a feature up-sampling network and an auxiliary network which are sequentially cascaded;
the position attention module comprises 3 independent 1D convolutional layers Q, U, V, which extract features from the input data F of the module and calculate the attention-influence values t_ij between the features represented by the respective centroids and the attention-enhanced feature E:

E = [E_1; E_2; ...; E_i; ...; E_N]

where U_i represents the feature of the i-th centroid of the input data F of the position attention module extracted through the 1D convolutional layer U, Q_j^T represents the transpose of the feature of the j-th centroid of F extracted through the 1D convolutional layer Q, N represents the number of centroids of F, and E_i represents the feature of the i-th centroid in E, calculated as follows:

where V_j represents the feature of the j-th centroid of F extracted through the 1D convolutional layer V, J_i represents the feature of the i-th centroid after position attention, α represents the weight of the position attention feature, and F_i represents the feature of the i-th centroid of the input;
the auxiliary network comprises b PointAux modules and a 1D convolutional layer for classification, cascaded in sequence; each PointAux module comprises a 1D convolutional layer and a feature interpolation layer, b ≥ 1;
(3) Using a multi-classification cross entropy loss function as a loss function of the 3D point cloud semantic segmentation network;
(4) Performing P rounds of supervised training on the 3D point cloud data semantic segmentation network by using a training set T, wherein P is more than or equal to 500:
(4a) In each round of training process, according to loss functions of the semantic segmentation network, network parameters are adjusted to obtain a network model;
(4b) Every P_1 rounds, evaluating the segmentation accuracy of the current network model using samples from the test set, and if the segmentation accuracy of the current network model is higher than that of the previously stored network model, storing the current network model, P_1 ≥ 2;
(4c) After P rounds of training are finished, the network model with the highest segmentation precision is used as a trained network model;
(5) And inputting the test set V into the trained network model for semantic segmentation to obtain a segmentation result of each point.
2. The method according to claim 1, wherein the category statistics and block cutting processing in (1) are performed on the point cloud data as follows:
(1a) Using histogram statistics, counting the number of points of each category in the point cloud data of all f_0 scenes in the training file, and calculating the weight w_k of each category:

where G_k represents the number of points of the k-th category, M represents the total number of point cloud data, L represents the number of segmentation categories, f_0 ≥ 1000, L ≥ 2;
(1b) For each scene in the training file, randomly selecting a point as the center point with coordinates (x, y, z), taking the points within the ranges (x-0.75, x+0.75), (y-0.75, y+0.75), (z-0.75, z+0.75) around it to form a data block, and comparing the number of points in the data block with the number of sampling points N_0 to judge whether the block is valid:

if the number of points in the data block is larger than N_0, the data block is judged valid and N_0 points are randomly sampled from it to form one sample; otherwise the data block is discarded, thereby obtaining the training set T, where N_0 ≥ 4096;
(1c) For each scene among all f_1 scenes in the test file, cutting the scene with a sliding cubic window of size 1.5 × 3, and randomly sampling N_0 points from each data block to form one sample, thereby obtaining the test set V, f_1 ≥ 300.
3. The method of claim 1, wherein the feature downsampling network in (2) comprises n cascaded PointSA modules, each PointSA module comprises a point cloud centroid sampling layer, a grouping layer and a point cloud feature extraction layer, which are cascaded in sequence, wherein n is greater than or equal to 2.
4. The method according to claim 1, wherein the feature upsampling network in (2) comprises a pointFP modules, a 1D convolutional layer, a Dropout layer and a 1D convolutional layer for classification, which are sequentially cascaded, and each pointFP module comprises a feature interpolation layer and a feature extraction layer, which are sequentially cascaded, wherein a is greater than or equal to 2.
5. The method according to claim 1, wherein the loss function of the 3D point cloud semantic segmentation network in step (3) is calculated as follows:
where C represents the number of training sample points, L represents the total number of categories, w_k is the weight of class k, w_a is the weight of the auxiliary network loss, w_a ∈ [0, 1]; p_{i,k} represents the true probability that the i-th sample point belongs to the k-th class, which is 1 if the i-th sample point belongs to the k-th class and 0 otherwise; the probabilities that the i-th sample point belongs to the k-th class, as predicted by the feature up-sampling network and by the auxiliary network respectively, are calculated as follows:

where the k-th channel feature values of the i-th sample point output by the feature up-sampling network and by the auxiliary network are calculated as follows:

where x_i denotes the input features of the i-th sample point, f_1 denotes the feature up-sampling network, θ_1 the parameters of the feature up-sampling network, f_2 the auxiliary network, and θ_2 the parameters of the auxiliary network.
6. The method of claim 5, wherein the network parameters are adjusted according to the loss function of the semantic segmentation network in (4a) by the following formula:

where l_q represents the learning rate of the q-th round of training, θ_q represents the parameters of the 3D point cloud semantic segmentation network in the q-th round of training, and θ_{q+1} represents the parameters obtained by adjusting θ_q, used for the (q+1)-th round of training.
7. The method of claim 1, wherein in (4b) the segmentation precision of the current network model is evaluated every P_1 rounds as follows:

(4b1) Every P_1 rounds, inputting the test set into the current network model to obtain the predicted categories of all point cloud data in the test set;

(4b2) Counting the number of points in the test set whose predicted category is the same as their true category, and calculating the segmentation precision acc = R / H, where R represents the number of points in the test set whose predicted category equals their true category, and H represents the total number of points in the test set;

(4b3) Comparing the segmentation precision of the current network model with that of the previously stored network model; if the segmentation precision of the current network model is higher, the current network model is better and is stored, otherwise it is not stored.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910604264.0A CN110322453B (en) | 2019-07-05 | 2019-07-05 | 3D point cloud semantic segmentation method based on position attention and auxiliary network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910604264.0A CN110322453B (en) | 2019-07-05 | 2019-07-05 | 3D point cloud semantic segmentation method based on position attention and auxiliary network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110322453A CN110322453A (en) | 2019-10-11 |
CN110322453B true CN110322453B (en) | 2023-04-18 |
Family
ID=68122807
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910604264.0A Active CN110322453B (en) | 2019-07-05 | 2019-07-05 | 3D point cloud semantic segmentation method based on position attention and auxiliary network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110322453B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110827398B (en) * | 2019-11-04 | 2023-12-26 | 北京建筑大学 | Automatic semantic segmentation method for indoor three-dimensional point cloud based on deep neural network |
CN111127493A (en) * | 2019-11-12 | 2020-05-08 | 中国矿业大学 | Remote sensing image semantic segmentation method based on attention multi-scale feature fusion |
CN111223120B (en) * | 2019-12-10 | 2023-08-04 | 南京理工大学 | Point cloud semantic segmentation method |
CN111192270A (en) * | 2020-01-03 | 2020-05-22 | 中山大学 | Point cloud semantic segmentation method based on point global context reasoning |
CN111428619B (en) * | 2020-03-20 | 2022-08-05 | 电子科技大学 | Three-dimensional point cloud head attitude estimation system and method based on ordered regression and soft labels |
CN111583263B (en) * | 2020-04-30 | 2022-09-23 | 北京工业大学 | Point cloud segmentation method based on joint dynamic graph convolution |
CN112633330B (en) * | 2020-12-06 | 2024-02-02 | 西安电子科技大学 | Point cloud segmentation method, system, medium, computer equipment, terminal and application |
CN112560865B (en) * | 2020-12-23 | 2022-08-12 | 清华大学 | Semantic segmentation method for point cloud under outdoor large scene |
CN112927248B (en) * | 2021-03-23 | 2022-05-10 | 重庆邮电大学 | Point cloud segmentation method based on local feature enhancement and conditional random field |
CN113205509B (en) * | 2021-05-24 | 2021-11-09 | 山东省人工智能研究院 | Blood vessel plaque CT image segmentation method based on position convolution attention network |
CN113554653A (en) * | 2021-06-07 | 2021-10-26 | 之江实验室 | Semantic segmentation method for long-tail distribution of point cloud data based on mutual information calibration |
CN113470048B (en) * | 2021-07-06 | 2023-04-25 | 北京深睿博联科技有限责任公司 | Scene segmentation method, device, equipment and computer readable storage medium |
CN114140841A (en) * | 2021-10-30 | 2022-03-04 | 华为技术有限公司 | Point cloud data processing method, neural network training method and related equipment |
CN115619963B (en) * | 2022-11-14 | 2023-06-02 | 吉奥时空信息技术股份有限公司 | Urban building entity modeling method based on content perception |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102034267A (en) * | 2010-11-30 | 2011-04-27 | 中国科学院自动化研究所 | Three-dimensional reconstruction method of target based on attention |
CN102036073B (en) * | 2010-12-21 | 2012-11-28 | 西安交通大学 | Method for encoding and decoding JPEG2000 image based on vision potential attention target area |
US11094137B2 (en) * | 2012-02-24 | 2021-08-17 | Matterport, Inc. | Employing three-dimensional (3D) data predicted from two-dimensional (2D) images using neural networks for 3D modeling applications and other applications |
CN103871050B (en) * | 2014-02-19 | 2017-12-29 | 小米科技有限责任公司 | icon dividing method, device and terminal |
US11004202B2 (en) * | 2017-10-09 | 2021-05-11 | The Board Of Trustees Of The Leland Stanford Junior University | Systems and methods for semantic segmentation of 3D point clouds |
US10824862B2 (en) * | 2017-11-14 | 2020-11-03 | Nuro, Inc. | Three-dimensional object detection for autonomous robotic systems using image proposals |
CN109871532B (en) * | 2019-01-04 | 2022-07-08 | 平安科技(深圳)有限公司 | Text theme extraction method and device and storage medium |
- 2019-07-05 CN CN201910604264.0A patent/CN110322453B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN110322453A (en) | 2019-10-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110322453B (en) | 3D point cloud semantic segmentation method based on position attention and auxiliary network | |
WO2022088676A1 (en) | Three-dimensional point cloud semantic segmentation method and apparatus, and device and medium | |
CN108961235B (en) | Defective insulator identification method based on YOLOv3 network and particle filter algorithm | |
CN111079685B (en) | 3D target detection method | |
CN110245709B (en) | 3D point cloud data semantic segmentation method based on deep learning and self-attention | |
CN111199214B (en) | Residual network multispectral image ground object classification method | |
CN113160062B (en) | Infrared image target detection method, device, equipment and storage medium | |
CN107633226B (en) | Human body motion tracking feature processing method | |
CN109214403B (en) | Image recognition method, device and equipment and readable medium | |
CN110059728B (en) | RGB-D image visual saliency detection method based on attention model | |
CN109029363A (en) | A kind of target ranging method based on deep learning | |
CN108305260B (en) | Method, device and equipment for detecting angular points in image | |
CN111028327A (en) | Three-dimensional point cloud processing method, device and equipment | |
CN111310821B (en) | Multi-view feature fusion method, system, computer equipment and storage medium | |
CN111860587B (en) | Detection method for small targets of pictures | |
CN111738114B (en) | Vehicle target detection method based on anchor-free accurate sampling remote sensing image | |
CN110532959B (en) | Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network | |
CN111339924B (en) | Polarized SAR image classification method based on superpixel and full convolution network | |
CN114998756B (en) | Yolov-based remote sensing image detection method, yolov-based remote sensing image detection device and storage medium | |
CN113610905B (en) | Deep learning remote sensing image registration method based on sub-image matching and application | |
CN112329662B (en) | Multi-view saliency estimation method based on unsupervised learning | |
CN114299405A (en) | Unmanned aerial vehicle image real-time target detection method | |
CN113450269A (en) | Point cloud key point extraction method based on 3D vision | |
CN110956601B (en) | Infrared image fusion method and device based on multi-sensor mode coefficients and computer readable storage medium | |
CN115761888A (en) | Tower crane operator abnormal behavior detection method based on NL-C3D model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |