CN110322453B - 3D point cloud semantic segmentation method based on position attention and auxiliary network - Google Patents

3D point cloud semantic segmentation method based on position attention and auxiliary network

Info

Publication number
CN110322453B
CN110322453B
Authority
CN
China
Prior art keywords
network
point cloud
training
representing
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910604264.0A
Other languages
Chinese (zh)
Other versions
CN110322453A (en)
Inventor
焦李成
冯志玺
张格格
杨淑媛
程曦娜
马清华
张杰
郭雨薇
丁静怡
唐旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201910604264.0A priority Critical patent/CN110322453B/en
Publication of CN110322453A publication Critical patent/CN110322453A/en
Application granted granted Critical
Publication of CN110322453B publication Critical patent/CN110322453B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a 3D point cloud semantic segmentation method based on position attention and an auxiliary network, which mainly solves the problem of low segmentation accuracy in the prior art. The implementation scheme is as follows: acquire a training set T and a test set V; construct a 3D point cloud semantic segmentation network and set its loss function, where the network comprises a feature down-sampling network, a position attention module, a feature up-sampling network and an auxiliary network which are sequentially cascaded; perform P rounds of supervised training on the segmentation network with the training set T, adjusting the network parameters according to the loss function during each round, and after the P rounds take the network model with the highest segmentation accuracy as the trained network model; input the test set V into the trained network model for semantic segmentation to obtain the segmentation result of each point. The method improves the semantic segmentation accuracy of 3D point clouds and can be used for autonomous driving, robotics, 3D scene reconstruction, quality inspection, 3D mapping and smart city construction.

Description

3D point cloud semantic segmentation method based on position attention and auxiliary network
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a 3D point cloud semantic segmentation method which can be used for autonomous driving, robotics, 3D scene reconstruction, quality inspection, 3D mapping and smart city construction.
Background
In recent years, with the wide application of lidar, RGB-D cameras and other 3D sensors in the fields of robotics and autonomous driving, applying deep learning to 3D point cloud data has become one of the research hotspots. 3D point cloud data refers to a set of vectors in a three-dimensional coordinate system, usually expressed as (x, y, z) coordinates and typically used to represent the shape of an object's outer surface. In addition to the geometric information represented by (x, y, z), a point may also carry RGB color, intensity, gray value, depth, or number of returns. Point cloud data is typically acquired by 3D scanning devices such as lidar or RGB-D cameras. These sensors automatically measure information at a large number of points on the object surface and then output point cloud data in some data file format. Point cloud data is unordered and unstructured, and its density may vary across 3D space, which makes applying deep learning to 3D point cloud data a considerable challenge.
3D point cloud semantic segmentation refers to assigning a category to each point in the input point cloud data. In early research, 3D point cloud data was generally converted into hand-crafted voxel grid features or multi-view image features before being fed into a deep learning network for feature extraction. Such feature-conversion methods not only produce large data volumes but are also computationally complex, and reducing the resolution reduces the segmentation accuracy. It is therefore particularly important to process point cloud data directly with deep learning methods.
In 2017, Charles R. Qi et al. published the PointNet paper at CVPR, which discloses a deep learning framework that directly processes 3D point cloud data and solves the problem of point cloud disorder with a symmetric max-pooling function, thereby extracting a global feature for each point. However, this method only considers global features and ignores the local features around each point. Therefore, shortly after PointNet, Charles R. Qi's team published a paper at NIPS titled "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space". PointNet++ is a hierarchical version of PointNet in which each layer has three stages: sampling, grouping and feature extraction. First, some more important points are selected as the center points of local regions, and then k neighboring points are selected around each center point according to Euclidean distance. These k neighbors are treated as a local point cloud from which features are extracted with a PointNet network, and the deep features are then propagated back to obtain the 3D point cloud semantic segmentation result.
Compared with traditional methods, these two methods process the 3D point cloud data directly, are computationally simple, effectively handle the unordered nature of point clouds, and improve segmentation accuracy. However, PointNet++ does not consider the relations among the features of the center points, i.e., the context information, so its feature representation is relatively weak; moreover, PointNet++ follows a general encoder-decoder framework and does not exploit more of the low-level information, so its segmentation accuracy is not high and there is still room for improvement.
Disclosure of Invention
The invention aims to provide a 3D point cloud semantic segmentation method based on position attention and an auxiliary network to overcome the above defects of the prior art, combining position attention over the context features with an auxiliary network that reconstructs the low-level information, thereby improving the segmentation accuracy.
In order to achieve the purpose, the technical scheme of the invention comprises the following steps:
(1) Downloading a training file and a test file of 3D point cloud data from a ScanNet official network, and carrying out category statistics and block cutting processing on the training file and the test file to obtain a training set T and a test set V;
(2) Constructing a 3D point cloud semantic segmentation network, which comprises a feature down-sampling network, a position attention module, a feature up-sampling network and an auxiliary network which are sequentially cascaded;
(3) Using a multi-classification cross entropy loss function as a loss function of the 3D point cloud semantic segmentation network;
(4) Performing P rounds of supervised training on the 3D point cloud data semantic segmentation network by using a training set T, wherein P is more than or equal to 500;
(4a) In each round of training process, according to the loss function of the semantic segmentation network, adjusting network parameters to obtain a network model;
(4b) Every P_1 rounds, evaluating the segmentation accuracy of the current network model with the test set samples, and saving the current network model if its segmentation accuracy is higher than that of the previously saved network model, P_1 ≥ 2;
(4c) After P rounds of training are finished, the network model with the highest segmentation precision is used as a trained network model;
(5) And inputting the test set V into the trained network model for semantic segmentation to obtain a segmentation result of each point.
Compared with the prior art, the invention has the following advantages:
Because the invention constructs a 3D point cloud semantic segmentation network and computes, through the position attention module, the correlations among the features represented by each centroid of the network's input data, context information is added to the local centroid features of the network; meanwhile, the auxiliary network propagates the low-level features of the network back, so that the low-level information of the network is reconstructed, which effectively improves the accuracy of 3D point cloud semantic segmentation.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is the overall structure diagram of the 3D point cloud semantic segmentation network constructed in the invention;
FIG. 3 is a block diagram of a location attention module according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments.
Referring to fig. 1, implementation steps of this example include the following.
Step 1, a training set T and a test set V are obtained.
1.1) Download the training file and test file of 3D point cloud data from the ScanNet official website, where the training file contains f_0 point cloud scenes and the test file contains f_1 point cloud scenes; in this example f_0 = 1201 and f_1 = 312;
1.2) Use histogram statistics to count, over all f_0 scenes in the training file, the number of points of each category, and compute the weight w_k of each category:

[equation image: class weight w_k]

where G_k denotes the number of points of the k-th category, M denotes the number of all point cloud data, and L denotes the number of segmentation categories, L ≥ 2; L = 21 in this embodiment;
1.3) For each scene in the training file, randomly select a point as a center point with coordinates (x, y, z), and take the points within the ranges (x-0.75, x+0.75), (y-0.75, y+0.75), (z-0.75, z+0.75) around it to form a data block;
1.4) Set the number of sampling points N_0 and compare the number of points in the data block obtained in 1.3) with N_0 to judge whether the block is reasonable: if the number of points in the data block is larger than N_0, the data block is judged reasonable and N_0 points are randomly sampled from it to form one training sample; otherwise the data block is discarded. This yields the training set T; in this embodiment N_0 = 8192;
1.5) For each of the f_1 scenes in the test file, cut data blocks with a sliding cuboid window of size 1.5 × 1.5 × 3, and randomly sample N_0 points from each data block to form one test sample, obtaining the test set V.
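The block-cutting and sampling procedure of steps 1.3)-1.5) can be pictured with a short NumPy sketch; the function name and the synthetic scene below are purely illustrative and not part of the patent.

```python
import numpy as np

def cut_training_block(scene_xyz, n_sample=8192, half_size=0.75):
    """Randomly pick a center point and keep the points inside a 1.5 m cube around it."""
    center = scene_xyz[np.random.randint(len(scene_xyz))]
    mask = np.all(np.abs(scene_xyz - center) <= half_size, axis=1)
    block = scene_xyz[mask]
    if len(block) <= n_sample:               # block rejected: not enough points
        return None
    idx = np.random.choice(len(block), n_sample, replace=False)
    return block[idx]                        # one training sample of N_0 points

# Example on a synthetic "scene" of 500,000 random points in a 6 x 6 x 3 m volume
scene = np.random.rand(500000, 3) * np.array([6.0, 6.0, 3.0])
sample = cut_training_block(scene)
print(None if sample is None else sample.shape)   # (8192, 3) when the block is kept
```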
Step 2, constructing a 3D point cloud semantic segmentation network.
Referring to fig. 2, the 3D point cloud semantic segmentation network constructed in this step includes a feature down-sampling network, a location attention module, a feature up-sampling network, and an auxiliary network, which are sequentially cascaded.
2.1 Set up a feature downsampling network:
the feature downsampling network comprises n cascaded PointSA modules, wherein each PointSA module comprises a point cloud centroid sampling and grouping layer and a point cloud feature extraction layer which are cascaded in sequence, n is larger than or equal to 2, and the parameter is set to be n =4 in the embodiment;
For the centroid sampling and grouping layer of the m-th PointSA module, m = 1, 2, ..., n: first, a preset number of points (the per-module counts appear only as images in the original) are sampled as centroid points; then, with each sampled centroid as the center, a spherical search algorithm searches within a given radius r_m for a fixed number of points (likewise given only as images), which together form one group. In this embodiment the radii are set to r_1 = 0.1 for the 1st PointSA module, r_2 = 0.2 for the 2nd, r_3 = 0.4 for the 3rd, and r_4 = 0.8 for the 4th;
The point cloud feature extraction layer of the m-th PointSA module comprises 3 sequentially cascaded 2D convolutional layers, which extract features from the data output by the centroid sampling and grouping layer and pool the extracted region features with a max-pooling strategy. In this embodiment, the convolution kernels of the 3 2D convolutional layers of the point cloud feature extraction layer of the 1st PointSA module are all 1 × 1, the step lengths are all 1, and the numbers of output channels are 32, 32 and 64, respectively; for the 2nd PointSA module the kernels are all 1 × 1, the step lengths are all 1, and the output channel numbers are 64, 64 and 128; for the 3rd PointSA module the kernels are all 1 × 1, the step lengths are all 1, and the output channel numbers are 128, 128 and 256; for the 4th PointSA module the kernels are all 1 × 1, the step lengths are all 1, and the output channel numbers are 256, 256 and 512;
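As a rough illustration of what one PointSA module's sampling-and-grouping layer computes, the following NumPy sketch uses farthest point sampling for the centroids and a fixed-radius ball query for the groups. The choice of farthest point sampling, the centroid count of 1024 and the group size of 32 are assumptions made only for illustration, since those values appear in the original text only as images.

```python
import numpy as np

def farthest_point_sampling(xyz, n_centroid):
    """Greedy FPS: pick n_centroid points that are mutually far apart."""
    idx = np.zeros(n_centroid, dtype=int)
    dist = np.full(len(xyz), np.inf)
    idx[0] = np.random.randint(len(xyz))
    for i in range(1, n_centroid):
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[idx[i - 1]], axis=1))
        idx[i] = int(np.argmax(dist))
    return xyz[idx]

def ball_query(xyz, centroids, radius, n_neighbor):
    """For each centroid, gather up to n_neighbor points within `radius` (pad by repetition)."""
    groups = []
    for c in centroids:
        near = np.where(np.linalg.norm(xyz - c, axis=1) <= radius)[0]
        pick = np.resize(near, n_neighbor)      # repeat indices if the ball holds too few points
        groups.append(xyz[pick])
    return np.stack(groups)                      # (n_centroid, n_neighbor, 3)

points = np.random.rand(8192, 3)
centroids = farthest_point_sampling(points, 1024)     # example centroid count
grouped = ball_query(points, centroids, radius=0.1, n_neighbor=32)
print(centroids.shape, grouped.shape)                  # (1024, 3) (1024, 32, 3)
```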
2.2) Set up the position attention module, which computes the correlations among the features represented by each centroid of its input data F to obtain the position-attention-enhanced feature E:
referring to fig. 3, the module works as follows:
2.2.1) The input data F is passed through the first 1D convolutional layer Q to obtain the feature Q_i of the i-th centroid, i = 1, 2, ..., N, where N denotes the number of centroids of F; through the second 1D convolutional layer U to obtain the feature U_j of the j-th centroid, j = 1, 2, ..., N; and through the third 1D convolutional layer V to obtain the feature V_j of the j-th centroid. The convolution kernels of the three 1D convolutional layers Q, U, V all have size 1 and step length 1; the number of output feature channels of the first 1D convolutional layer Q and the second 1D convolutional layer U is a reduced fraction of the number of feature channels of the input data F (the exact value appears only as an image in the original), while the number of output feature channels of the third 1D convolutional layer V is the same as that of the input data F;
2.2.2) Compute the attention influence value t_ij between the features represented by each pair of centroids:

t_ij = exp(U_i · Q_j^T) / Σ_{n=1}^{N} exp(U_i · Q_n^T)

and use the values t_ij to form an N × N matrix A = [t_ij];
2.2.3) Compute the position attention feature of each centroid:

J_i = Σ_{j=1}^{N} t_ij · V_j, i = 1, 2, ..., N;
2.2.4) Output the attention-enhanced feature E:

E = [E_1; E_2; ...; E_i; ...; E_N],

where E_i = α·J_i + F_i denotes the feature of the i-th centroid in E, α denotes the weight of the position attention feature J_i, and F_i denotes the feature of the i-th centroid of the input;
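The computation in steps 2.2.1)-2.2.4) can be sketched in NumPy as follows, assuming the attention value t_ij is a row-wise softmax of the dot products between the U and Q projections (the exact formula appears only as an image in the original); the 1/8 channel reduction for Q and U, the α value and the example sizes are likewise assumptions.

```python
import numpy as np

def position_attention(F, Wq, Wu, Wv, alpha):
    """F: (N, C) centroid features; Wq, Wu: (C, C_r) 1x1-conv weights; Wv: (C, C)."""
    Q = F @ Wq                          # (N, C_r) first 1D conv
    U = F @ Wu                          # (N, C_r) second 1D conv
    V = F @ Wv                          # (N, C)   third 1D conv
    logits = U @ Q.T                    # (N, N) pairwise influence scores
    A = np.exp(logits - logits.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)   # t_ij, each row sums to 1
    J = A @ V                           # (N, C) position attention features J_i
    return alpha * J + F                # E_i = alpha * J_i + F_i

N, C = 16, 512                          # 16 centroids with 512-dim features (example sizes)
F = np.random.randn(N, C).astype(np.float32)
E = position_attention(F, np.random.randn(C, C // 8), np.random.randn(C, C // 8),
                       np.random.randn(C, C), alpha=0.5)
print(E.shape)                          # (16, 512)
```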
2.3 Set up a feature upsampling network:
the characteristic up-sampling network comprises a plurality of PointFP modules, a 1D convolution layer, a Dropout layer and a 1D convolution layer for classification which are sequentially cascaded, wherein each PointFP module comprises a characteristic interpolation layer and a characteristic extraction layer which are sequentially cascaded, a is more than or equal to 2, and the parameter is set to be a =4 in the embodiment;
the a PointFP modules have different structures of a characteristic interpolation layer and a characteristic extraction layer, wherein:
for the 1 st PointFP module, the characteristic interpolation layer interpolates the output characteristics of the position attention module, and the characteristics after interpolation are cascaded with the output characteristics of the 3 rd PointSA module to obtain the output characteristics of the characteristic interpolation layer; the characteristic extraction layer comprises 2 sequentially cascaded 2D convolutional layers and is used for further extracting the output characteristic, the sizes of the convolutional kernels of the 2D convolutional layers are all 1 multiplied by 1, the step length is all 1, and the number of output channels is 256 and 256 respectively;
for the 2 nd PointFP module, the characteristic interpolation layer interpolates the output characteristics of the 1 st PointFP module, and the interpolated characteristics and the output characteristics of the 2 nd PointSA module are cascaded to obtain the output characteristics of the characteristic interpolation layer; the characteristic extraction layer comprises 2 sequentially cascaded 2D convolutional layers and is used for further extracting the output characteristic, the sizes of the convolutional kernels of the 2D convolutional layers are all 1 multiplied by 1, the step length is all 1, and the number of output channels is 256 and 256 respectively;
for the 3 rd PointFP module, the characteristic interpolation layer interpolates the output characteristics of the 2 nd PointFP module, and the interpolated characteristics and the output characteristics of the 1 st PointSA module are cascaded to obtain the output characteristics of the characteristic interpolation layer; the characteristic extraction layer comprises 2 sequentially cascaded 2D convolutional layers and is used for further extracting the output characteristic, the sizes of convolution kernels of the 2D convolutional layers are all 1 multiplied by 1, the step length is all 1, and the number of output channels is 256 and 128 respectively;
for the 4 th PointFP module, the characteristic interpolation layer interpolates the output characteristics of the 3 rd PointFP module to obtain interpolated characteristics, and the interpolated characteristics are used as the output characteristics of the characteristic interpolation layer; the feature extraction layer comprises 3 sequentially cascaded 2D convolutional layers and is used for further extracting the output feature, the sizes of convolution kernels of the 3 2D convolutional layers are all 1 multiplied by 1, the step length is all 1, and the number of output channels is 128, 128 and 128 respectively.
The convolution kernel of the 1D convolution layer is 1, the step length is 1, and the number of output characteristic channels is set to be 128;
the Dropout layer, the retention probability of which is set to 0.5;
the 1D convolutional layer for classification has the convolutional kernel size of 1 and the step length of 1, and the number of output characteristic channels is set as the number L of the classification of the segmentation.
2.4 Set up the auxiliary network:
the auxiliary network comprises b PointAux modules and 1D convolutional layers for classification, wherein the b is more than or equal to 1, and the b =2 in the embodiment;
for the 1 st PointAux module, the 1D convolution layer of the 1 st PointAux module is used for extracting the characteristics of the output data of the 2 nd PointFP module, the size of a convolution kernel is 1, the step length is 1, and an output characteristic channel is the number L of the divided categories; the characteristic interpolation layer is used for interpolating the characteristics extracted by the 1D convolution layer;
for the 2 nd PointAux module, the 1D convolution layer is used for extracting the characteristics of the output data of the 1 st PointAux module, the size of the convolution kernel is 1, the step length is 1, and the output characteristic channel is the segmented class number L; the characteristic interpolation layer is used for interpolating the characteristics extracted by the 1D convolution layer;
and the 1D convolutional layer is used for classifying the output characteristics of the 2 nd PointAux module, the size of a convolutional kernel is 1, the step length is 1, and the number of output characteristic channels is set as the number L of the divided categories.
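The auxiliary branch can be sketched in the same style: each PointAux module applies a kernel-size-1 convolution that maps intermediate features to L class channels and then interpolates them to a denser point set. In the sketch a plain matrix multiply stands in for the 1D convolution and a nearest-centroid copy stands in for the interpolation layer; both simplifications are assumptions made only for illustration.

```python
import numpy as np

def point_aux(sparse_xyz, sparse_feat, dense_xyz, W_cls):
    """sparse_feat: (M, C); W_cls: (C, L) -> per-point class logits at the dense points (N, L)."""
    logits = sparse_feat @ W_cls                          # 1D conv with kernel size 1
    d = np.linalg.norm(dense_xyz[:, None, :] - sparse_xyz[None, :, :], axis=2)
    nearest = d.argmin(axis=1)                            # nearest-centroid interpolation (simplified)
    return logits[nearest]                                # (N, L)

L_cls = 21
aux = point_aux(np.random.rand(256, 3), np.random.randn(256, 128),
                np.random.rand(1024, 3), np.random.randn(128, L_cls))
print(aux.shape)                                          # (1024, 21)
```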
Step 3, setting the loss function of the 3D point cloud semantic segmentation network.
This example uses a multi-class cross-entropy loss function as the loss function of the 3D point cloud semantic segmentation network, expressed as:

Loss = -(1/C) · Σ_{i=1}^{C} Σ_{k=1}^{L} w_k · ( p_{i,k} · log p^1_{i,k} + w_a · p_{i,k} · log p^2_{i,k} )

where C denotes the number of training sample points, L denotes the total number of categories, w_k is the weight of class k, w_a is the weight of the auxiliary network loss, w_a ∈ [0, 1]; in this embodiment w_a = 0.5;
p_{i,k} denotes the true probability that the i-th sample point belongs to the k-th class: the value is 1 if the i-th sample point belongs to the k-th class and 0 otherwise;
p^1_{i,k} and p^2_{i,k} respectively denote the probabilities, predicted by the feature up-sampling network and the auxiliary network, that the i-th sample point belongs to the k-th class, computed as:

p^1_{i,k} = exp(y^1_{i,k}) / Σ_{c=1}^{L} exp(y^1_{i,c})

p^2_{i,k} = exp(y^2_{i,k}) / Σ_{c=1}^{L} exp(y^2_{i,c})
where y^1_{i,k} and y^2_{i,k} respectively denote the k-th channel feature value of the i-th sample point output by the feature up-sampling network and the auxiliary network, computed as:

y^1_i = f_1(x_i; θ_1)

y^2_i = f_2(x_i; θ_2)

where x_i denotes the input features of the i-th sample point, f_1 denotes the feature up-sampling network, θ_1 the parameters of the feature up-sampling network, f_2 the auxiliary network, and θ_2 the parameters of the auxiliary network.
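A hedged NumPy sketch of this loss is given below; it assumes the loss is averaged over the C sample points and that the class weight w_k multiplies both the main and the auxiliary cross-entropy terms, which matches the reconstruction above but may not match the original image formula exactly.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def segmentation_loss(main_logits, aux_logits, labels, class_weights, w_a=0.5):
    """main_logits, aux_logits: (C_pts, L); labels: (C_pts,) int; class_weights: (L,)."""
    p1 = softmax(main_logits)                       # predicted probabilities of the main branch
    p2 = softmax(aux_logits)                        # predicted probabilities of the auxiliary branch
    idx = np.arange(len(labels))
    wk = class_weights[labels]                      # w_k for the true class of each point
    main_term = -np.log(p1[idx, labels] + 1e-12)
    aux_term = -np.log(p2[idx, labels] + 1e-12)
    return float(np.mean(wk * (main_term + w_a * aux_term)))

L_classes = 21
logits_main = np.random.randn(8192, L_classes)
logits_aux = np.random.randn(8192, L_classes)
y = np.random.randint(0, L_classes, size=8192)
w = np.ones(L_classes)
print(segmentation_loss(logits_main, logits_aux, y, w))
```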
Step 4, performing P rounds of supervised training on the 3D point cloud semantic segmentation network with the training set T, where P ≥ 500.
In this embodiment, P =1000 is taken, and the training steps are as follows:
4.1) In the q-th round of training, let l_q be the learning rate of the q-th round and θ_q the parameters of the network model in the q-th round; according to the loss function set in step 3, adjust θ_q with the formula

θ_{q+1} = θ_q - l_q · ∂Loss/∂θ_q

to obtain the network model parameters θ_{q+1} for the (q+1)-th round, thus obtaining the network model after the q-th round of training;
4.2) Every P_1 rounds, input the test set into the current network model to obtain the predicted categories of all point cloud data in the test set, P_1 ≥ 2; in this example P_1 = 5;
4.3) Count the number of points in the test set whose predicted category is the same as their true category, and calculate the segmentation accuracy:

acc = R / H

where R denotes the number of points in the test set whose predicted category matches their true category, and H denotes the total number of points in the test set;
4.4) Compare the segmentation accuracy acc of the current network model with that of the previously saved network model; if the current model's accuracy is higher, the current model is better and is saved, otherwise it is not saved.
4.5) After the P rounds of training are finished, the network model with the highest segmentation accuracy is used as the trained network model;
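The bookkeeping of steps 4.2)-4.5), evaluating every P_1 rounds as acc = R/H and keeping only the best model, can be illustrated with the small runnable sketch below; the random predictions merely stand in for the current model's output.

```python
import numpy as np

def segmentation_accuracy(pred_labels, true_labels):
    """acc = R / H: fraction of test points whose predicted class equals the true class."""
    R = int((pred_labels == true_labels).sum())
    H = true_labels.size
    return R / H

rng = np.random.default_rng(0)
true = rng.integers(0, 21, size=8192)           # 21 ScanNet classes
best_acc = 0.0
for q in range(1, 1001):                        # P = 1000 rounds
    if q % 5:                                   # evaluate only every P_1 = 5 rounds
        continue
    pred = rng.integers(0, 21, size=8192)       # stands in for the current model's predictions
    acc = segmentation_accuracy(pred, true)
    if acc > best_acc:
        best_acc = acc                          # in practice, also save the model parameters here
print(round(best_acc, 4))
```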
and 5) inputting the test set V into the trained network model obtained in the step 4.5) for semantic segmentation to obtain a segmentation result of each point.
The technical effects of the invention are further explained below in combination with simulation experiments.
1. simulation conditions
The simulation experiment of the present invention was performed in the following environment.
Hardware platform: intel (R) Xeon (R) CPU E5-2650v4@2.20GHz,64GB runs memory, ubuntu16.04 operating system, geForce GTX TITAN X;
a software platform: tensorflow deep learning framework, python3.5, the dataset used for the experiment was a point cloud dataset ScanNet.
ScanNet is a point cloud dataset of an indoor scene scanned and reconstructed by an RGB-D camera. The total number of 1513 scenes is included, 1201 scenes are used as a training set, 312 scenes are used as a test set, and the number of included categories is 21.
2. Simulation experiment:
according to the method, a training set and a test set are obtained, a 3D point cloud semantic segmentation network is constructed, supervised training is carried out on the 3D point cloud semantic segmentation network by using the training set, then points in the test set are predicted by using a trained network model, and the segmentation precision of the 3D point cloud segmentation network on the test set V is calculated according to the method in the step 4.3.
The semantic segmentation accuracy on the point cloud data is compared between the invention and the existing PointNet++ method, using segmentation accuracy as the evaluation index; the results are shown in Table 1:
TABLE 1. Segmentation accuracy comparison on the ScanNet dataset

Evaluation index        Prior art (PointNet++)    The invention
Segmentation accuracy   0.836                     0.852
As can be seen from Table 1, the segmentation accuracy of the invention on the ScanNet dataset exceeds that of the prior-art PointNet++ by 1.6 percentage points, indicating that the invention's semantic segmentation of 3D point clouds is stronger than that of PointNet++.

Claims (7)

1. A3D point cloud semantic segmentation method based on position attention and an auxiliary network is characterized by comprising the following steps:
(1) Downloading a training file and a test file of 3D point cloud data from a ScanNet official network, and carrying out category statistics and block cutting processing on the training file and the test file to obtain a training set T and a test set V;
(2) Constructing a 3D point cloud semantic segmentation network, which comprises a feature down-sampling network, a position attention module, a feature up-sampling network and an auxiliary network which are sequentially cascaded;
the location attention module comprises 3 independent 1D convolutional layers Q, U, V,for extracting features of the input data F of the module and calculating attention impact values t between features represented by respective centroids ij And feature E after attention boost:
Figure FDA0004064341810000011
E=[E 1 ;E 2 ;...;E i ;...;E N ]
wherein, U i Features representing the ith centroid of input data F of the location attention Module through the 1D convolutional layer U, Q j T The input data F representing the position attention module is transposed by the characteristics of the jth centroid extracted by the 1D convolutional layer Q, N represents the number of centroids of F, and E represents the number of centroids of F i And (3) representing the characteristic of the ith centroid in the E, wherein the calculation formula is as follows:
Figure FDA0004064341810000012
wherein, V j Features representing the jth centroid of F extracted through 1D convolutional layer V,
Figure FDA0004064341810000013
features representing the ith centroid after positional attention has passed, a represents the weight of the positional attention feature, F i A feature representing an ith centroid of the input;
the auxiliary network comprises b pointAux modules and 1D convolutional layers for classification, wherein the b modules are sequentially cascaded, each pointAux module comprises a 1D convolutional layer and a characteristic interpolation layer, and b is more than or equal to 1;
(3) Using a multi-classification cross entropy loss function as a loss function of the 3D point cloud semantic segmentation network;
(4) Performing P rounds of supervised training on the 3D point cloud data semantic segmentation network by using a training set T, wherein P is more than or equal to 500:
(4a) In each round of training process, according to loss functions of the semantic segmentation network, network parameters are adjusted to obtain a network model;
(4b) Every P_1 rounds, evaluating the segmentation accuracy of the current network model with the test set samples, and saving the current network model if its segmentation accuracy is higher than that of the previously saved network model, P_1 ≥ 2;
(4c) After P rounds of training are finished, the network model with the highest segmentation precision is used as a trained network model;
(5) And inputting the test set V into the trained network model for semantic segmentation to obtain a segmentation result of each point.
2. The method according to claim 1, wherein (1) the class statistics and the block cutting processing are performed on the point cloud data as follows:
(1a) Using histogram statistics, counting over all f_0 scenes in the training file the number of points of each category, and calculating the weight w_k of each category:

[equation image: class weight w_k]

where G_k denotes the number of points of the k-th category, M denotes the number of all point cloud data, L denotes the number of segmentation classes, f_0 ≥ 1000, L ≥ 2;
(1b) For each scene in the training file, randomly selecting a point as a center point with coordinates (x, y, z), taking the points within the ranges (x-0.75, x+0.75), (y-0.75, y+0.75), (z-0.75, z+0.75) around it to form a data block, and comparing the number of points in the data block with the number of sampling points N_0 to judge whether the block is reasonable: if the number of points in the data block is larger than N_0, the data block is judged reasonable and N_0 points are randomly sampled from it to form one sample; otherwise the data block is discarded, thereby obtaining the training set T, where N_0 ≥ 4096;
(1c) For each of the f_1 scenes in the test file, cutting data blocks with a sliding cuboid window of size 1.5 × 1.5 × 3, and randomly sampling N_0 points from each data block to form one sample, obtaining the test set V, f_1 ≥ 300.
3. The method of claim 1, wherein the feature downsampling network in (2) comprises n cascaded PointSA modules, each PointSA module comprises a point cloud centroid sampling layer, a grouping layer and a point cloud feature extraction layer, which are cascaded in sequence, wherein n is greater than or equal to 2.
4. The method according to claim 1, wherein the feature upsampling network in (2) comprises a pointFP modules, a 1D convolutional layer, a Dropout layer and a 1D convolutional layer for classification, which are sequentially cascaded, and each pointFP module comprises a feature interpolation layer and a feature extraction layer, which are sequentially cascaded, wherein a is greater than or equal to 2.
5. The method according to claim 1, wherein the loss function of the 3D point cloud semantic segmentation network in step (3) is calculated as follows:
Loss = -(1/C) · Σ_{i=1}^{C} Σ_{k=1}^{L} w_k · ( p_{i,k} · log p^1_{i,k} + w_a · p_{i,k} · log p^2_{i,k} )

where C denotes the number of training sample points, L denotes the total number of categories, w_k is the weight of class k, w_a is the weight of the auxiliary network loss, w_a ∈ [0, 1]; p_{i,k} denotes the true probability that the i-th sample point belongs to the k-th class: the value is 1 if the i-th sample point belongs to the k-th class and 0 otherwise;
p^1_{i,k} and p^2_{i,k} respectively denote the probabilities, predicted by the feature up-sampling network and the auxiliary network, that the i-th sample point belongs to the k-th class, computed as:

p^1_{i,k} = exp(y^1_{i,k}) / Σ_{c=1}^{L} exp(y^1_{i,c})

p^2_{i,k} = exp(y^2_{i,k}) / Σ_{c=1}^{L} exp(y^2_{i,c})
where y^1_{i,k} and y^2_{i,k} respectively denote the k-th channel feature value of the i-th sample point output by the feature up-sampling network and the auxiliary network, computed as:

y^1_i = f_1(x_i; θ_1)

y^2_i = f_2(x_i; θ_2)
where x_i denotes the input features of the i-th sample point, f_1 denotes the feature up-sampling network, θ_1 the parameters of the feature up-sampling network, f_2 the auxiliary network, and θ_2 the parameters of the auxiliary network.
6. The method of claim 5, wherein in (4a) the network parameters are adjusted according to the loss function of the semantic segmentation network by the following formula:

θ_{q+1} = θ_q - l_q · ∂Loss/∂θ_q

where l_q denotes the learning rate of the q-th round of training, θ_q denotes the parameters of the 3D point cloud semantic segmentation network in the q-th round of training, and θ_{q+1} denotes the parameters obtained by adjusting θ_q, used for the (q+1)-th round of training.
7. The method of claim 1, wherein in (4b) the segmentation accuracy of the current network model is evaluated every P_1 rounds, implemented as follows:
(4b1) Every P_1 rounds, inputting the test set into the current network model to obtain the predicted categories of all point cloud data in the test set;
(4b2) Counting the number of points in the test set whose predicted category is the same as their true category, and calculating the segmentation accuracy:

acc = R / H

where R denotes the number of points in the test set whose predicted category matches their true category, and H denotes the total number of points in the test set;
(4b3) Comparing the segmentation accuracy of the current network model with that of the previously saved network model; if the current model's accuracy is higher, the current model is better and is saved, otherwise it is not saved.
CN201910604264.0A 2019-07-05 2019-07-05 3D point cloud semantic segmentation method based on position attention and auxiliary network Active CN110322453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910604264.0A CN110322453B (en) 2019-07-05 2019-07-05 3D point cloud semantic segmentation method based on position attention and auxiliary network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910604264.0A CN110322453B (en) 2019-07-05 2019-07-05 3D point cloud semantic segmentation method based on position attention and auxiliary network

Publications (2)

Publication Number Publication Date
CN110322453A CN110322453A (en) 2019-10-11
CN110322453B true CN110322453B (en) 2023-04-18

Family

ID=68122807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910604264.0A Active CN110322453B (en) 2019-07-05 2019-07-05 3D point cloud semantic segmentation method based on position attention and auxiliary network

Country Status (1)

Country Link
CN (1) CN110322453B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827398B (en) * 2019-11-04 2023-12-26 北京建筑大学 Automatic semantic segmentation method for indoor three-dimensional point cloud based on deep neural network
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN111223120B (en) * 2019-12-10 2023-08-04 南京理工大学 Point cloud semantic segmentation method
CN111192270A (en) * 2020-01-03 2020-05-22 中山大学 Point cloud semantic segmentation method based on point global context reasoning
CN111428619B (en) * 2020-03-20 2022-08-05 电子科技大学 Three-dimensional point cloud head attitude estimation system and method based on ordered regression and soft labels
CN111583263B (en) * 2020-04-30 2022-09-23 北京工业大学 Point cloud segmentation method based on joint dynamic graph convolution
CN112633330B (en) * 2020-12-06 2024-02-02 西安电子科技大学 Point cloud segmentation method, system, medium, computer equipment, terminal and application
CN112560865B (en) * 2020-12-23 2022-08-12 清华大学 Semantic segmentation method for point cloud under outdoor large scene
CN112927248B (en) * 2021-03-23 2022-05-10 重庆邮电大学 Point cloud segmentation method based on local feature enhancement and conditional random field
CN113205509B (en) * 2021-05-24 2021-11-09 山东省人工智能研究院 Blood vessel plaque CT image segmentation method based on position convolution attention network
CN113554653A (en) * 2021-06-07 2021-10-26 之江实验室 Semantic segmentation method for long-tail distribution of point cloud data based on mutual information calibration
CN113470048B (en) * 2021-07-06 2023-04-25 北京深睿博联科技有限责任公司 Scene segmentation method, device, equipment and computer readable storage medium
CN114140841A (en) * 2021-10-30 2022-03-04 华为技术有限公司 Point cloud data processing method, neural network training method and related equipment
CN115619963B (en) * 2022-11-14 2023-06-02 吉奥时空信息技术股份有限公司 Urban building entity modeling method based on content perception

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034267A (en) * 2010-11-30 2011-04-27 中国科学院自动化研究所 Three-dimensional reconstruction method of target based on attention
CN102036073B (en) * 2010-12-21 2012-11-28 西安交通大学 Method for encoding and decoding JPEG2000 image based on vision potential attention target area
US11094137B2 (en) * 2012-02-24 2021-08-17 Matterport, Inc. Employing three-dimensional (3D) data predicted from two-dimensional (2D) images using neural networks for 3D modeling applications and other applications
CN103871050B (en) * 2014-02-19 2017-12-29 小米科技有限责任公司 icon dividing method, device and terminal
US11004202B2 (en) * 2017-10-09 2021-05-11 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for semantic segmentation of 3D point clouds
US10824862B2 (en) * 2017-11-14 2020-11-03 Nuro, Inc. Three-dimensional object detection for autonomous robotic systems using image proposals
CN109871532B (en) * 2019-01-04 2022-07-08 平安科技(深圳)有限公司 Text theme extraction method and device and storage medium

Also Published As

Publication number Publication date
CN110322453A (en) 2019-10-11

Similar Documents

Publication Publication Date Title
CN110322453B (en) 3D point cloud semantic segmentation method based on position attention and auxiliary network
WO2022088676A1 (en) Three-dimensional point cloud semantic segmentation method and apparatus, and device and medium
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN111079685B (en) 3D target detection method
CN110245709B (en) 3D point cloud data semantic segmentation method based on deep learning and self-attention
CN111199214B (en) Residual network multispectral image ground object classification method
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN107633226B (en) Human body motion tracking feature processing method
CN109214403B (en) Image recognition method, device and equipment and readable medium
CN110059728B (en) RGB-D image visual saliency detection method based on attention model
CN109029363A (en) A kind of target ranging method based on deep learning
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN111028327A (en) Three-dimensional point cloud processing method, device and equipment
CN111310821B (en) Multi-view feature fusion method, system, computer equipment and storage medium
CN111860587B (en) Detection method for small targets of pictures
CN111738114B (en) Vehicle target detection method based on anchor-free accurate sampling remote sensing image
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN111339924B (en) Polarized SAR image classification method based on superpixel and full convolution network
CN114998756B (en) Yolov-based remote sensing image detection method, yolov-based remote sensing image detection device and storage medium
CN113610905B (en) Deep learning remote sensing image registration method based on sub-image matching and application
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
CN114299405A (en) Unmanned aerial vehicle image real-time target detection method
CN113450269A (en) Point cloud key point extraction method based on 3D vision
CN110956601B (en) Infrared image fusion method and device based on multi-sensor mode coefficients and computer readable storage medium
CN115761888A (en) Tower crane operator abnormal behavior detection method based on NL-C3D model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant