CN110322453B - 3D point cloud semantic segmentation method based on position attention and auxiliary network
- Publication number
- CN110322453B (application CN201910604264.0A)
- Authority
- CN
- China
- Prior art keywords
- network
- feature
- point cloud
- training
- segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
Abstract
Description
Technical Field
The present invention belongs to the field of data processing technology and in particular relates to a 3D point cloud semantic segmentation method that can be used in autonomous driving, robotics, 3D scene reconstruction, quality inspection, 3D mapping and smart city construction.
Background Art
In recent years, with the widespread application of 3D sensors such as LiDAR and RGB-D cameras in robotics and autonomous driving, applying deep learning to 3D point cloud data has become a research hotspot. 3D point cloud data is a set of vectors in a three-dimensional coordinate system, usually expressed as x, y, z coordinates and generally used to represent the outer surface shape of an object. Besides the geometric information carried by (x, y, z), a point may also carry RGB colour, intensity, grayscale, depth or number-of-returns information. Point cloud data is usually acquired by 3D scanning devices such as LiDAR or RGB-D cameras: these sensors automatically measure a large number of points on an object's surface and then export the point cloud in some data file format. Point cloud data is unordered and unstructured, and its density may vary across 3D space, which makes applying deep learning to 3D point clouds a considerable challenge.
3D point cloud semantic segmentation assigns a category to every point of the input point cloud. In early work, 3D point cloud data was usually converted into hand-crafted voxel grid features or multi-view image features before being fed into a deep network for feature extraction. Such feature conversion not only produces a large amount of data but is also computationally complex, and if the resolution is reduced the segmentation accuracy drops. It is therefore particularly important to process point cloud data directly with deep learning methods.
In 2017, Charles R. Qi et al. published the CVPR paper "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation", which disclosed a deep learning framework that processes 3D point cloud data directly. The method uses the symmetric max-pooling function to handle the unordered nature of point clouds and thereby extracts a global feature for each point, but it only considers global features and ignores the local features of each point. Shortly after PointNet, Charles R. Qi's team published the NIPS paper "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space". PointNet++ is a hierarchical version of PointNet in which every layer has three stages: sampling, grouping and feature extraction. It first selects a number of important points as the centre points of local regions, then picks k nearest neighbours around each centre point according to Euclidean distance, treats these k neighbours as a local point cloud whose features are extracted with a PointNet network, and finally propagates the deep features back to obtain the semantic segmentation of the 3D point cloud. Its accuracy is improved over PointNet.
Compared with traditional methods, the two methods above process 3D point cloud data directly, are computationally simple, effectively handle the unordered nature of point clouds and improve segmentation accuracy. However, PointNet++ does not consider the relationships between the features of the individual centre points, i.e. the context information, so its feature representation is relatively weak; moreover, PointNet++ follows the generic encoder-decoder framework and does not exploit more of the low-level information. Its segmentation accuracy is therefore not high and there is still room for improvement.
Summary of the Invention
The purpose of the present invention is to address the deficiencies of the prior art described above and to propose a 3D point cloud semantic segmentation method based on position attention and an auxiliary network, which improves segmentation accuracy through a position attention module that relates contextual features and an auxiliary network that reconstructs low-level information.
To achieve the above object, the technical solution of the present invention includes the following steps:
(1) Download the training and test files of 3D point cloud data from the ScanNet website, perform per-category statistics and block cropping on them, and obtain the training set T and the test set V;
(2) Construct a 3D point cloud semantic segmentation network consisting of a feature downsampling network, a position attention module, a feature upsampling network and an auxiliary network cascaded in sequence;
(3) Use the multi-class cross-entropy loss as the loss function of the 3D point cloud semantic segmentation network;
(4) Using the training set T, perform P rounds of supervised training of the 3D point cloud semantic segmentation network, P ≥ 500;
(4a) In each round of training, adjust the network parameters according to the loss function of the semantic segmentation network to obtain a network model;
(4b) Every P1 rounds, evaluate the segmentation accuracy of the current network model on the test samples; if the segmentation accuracy of the current model is higher than that of the previously saved model, save it, P1 ≥ 2;
(4c) After the P rounds of training are completed, take the network model with the highest segmentation accuracy as the trained network model;
(5) Input the test set V into the trained network model for semantic segmentation and obtain the segmentation result of every point.
Compared with the prior art, the present invention has the following advantages:
Because the present invention constructs a 3D point cloud semantic segmentation network whose position attention module computes the correlations between the features represented by the individual centroids of its input data, contextual information is added to the local centroid features of the network; at the same time, because the auxiliary network propagates the low-level features of the network back and reconstructs the low-level information, the segmentation accuracy of 3D point cloud semantic segmentation is effectively improved.
Brief Description of the Drawings
Fig. 1 is a flow chart of the implementation of the present invention;
Fig. 2 is an overall structural diagram of the 3D point cloud semantic segmentation network constructed in the present invention;
Fig. 3 is a structural diagram of the position attention module in the present invention.
Detailed Description
The present invention is described in further detail below in conjunction with the accompanying drawings and specific embodiments.
Referring to Fig. 1, the implementation of this example includes the following steps.
Step 1: Obtain the training set T and the test set V.
1.1) Download the training file and the test file of 3D point cloud data from the ScanNet website; the training file contains f0 point cloud scenes and the test file contains f1 point cloud scenes. In this embodiment f0 = 1201 and f1 = 312;
1.2) Use a histogram to count the number of points of each category across all f0 training scenes and compute the weight wk of each category from these counts, where Gk denotes the number of points of class k, M denotes the total number of points, and L denotes the number of segmentation classes, L ≥ 2; in this embodiment L = 21;
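As an illustration of step 1.2, the sketch below counts the per-class totals Gk and M with NumPy. The patent's exact weighting formula is not reproduced in the extracted text, so the inverse-frequency weighting in the last line is only a placeholder assumption, and the function name and arguments are likewise illustrative:

```python
import numpy as np

def class_weights(all_labels, n_classes=21):
    # G_k: number of points of class k over all f0 training scenes; M: total point count.
    g = np.bincount(all_labels, minlength=n_classes).astype(np.float64)
    m = g.sum()
    # Placeholder weighting (plain inverse frequency); the patent's exact formula
    # for w_k is not recoverable from the extracted text.
    return m / (n_classes * np.maximum(g, 1.0))
```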
1.3) For each scene in the training file, randomly select a point with coordinates (x, y, z) as the centre point and take the points in the range (x-0.75, x+0.75), (y-0.75, y+0.75), (z-0.75, z+0.75) around it to form a data block;
1.4) Set the number of sampling points N0 and compare the number of points in the data block obtained in (1.3) with N0 to judge whether the block is valid:
If the number of points in the data block is greater than N0, the block is judged valid and N0 points are randomly sampled from it to form one sample; otherwise the block is discarded. The training set T is obtained in this way. In this embodiment N0 = 8192;
1.5) For each of the f1 scenes in the test file, cut the scene into blocks with a sliding cubic window of size 1.5×1.5×3; from every block, randomly sample N0 points to form one sample, giving the test set V.
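A minimal sketch of the block cropping and fixed-size sampling of steps 1.3 and 1.4 is given below, assuming scene_xyz is an (M, 3) NumPy array of point coordinates and scene_labels the matching label vector; the helper name is illustrative only:

```python
import numpy as np

def sample_training_block(scene_xyz, scene_labels, n0=8192, half=0.75):
    # Pick a random centre point and keep all points within +/-0.75 of it in x, y and z.
    centre = scene_xyz[np.random.randint(len(scene_xyz))]
    mask = np.all(np.abs(scene_xyz - centre) <= half, axis=1)
    block_xyz, block_labels = scene_xyz[mask], scene_labels[mask]
    # A block is valid only if it holds more than n0 points; otherwise it is discarded.
    if len(block_xyz) <= n0:
        return None
    idx = np.random.choice(len(block_xyz), n0, replace=False)
    return block_xyz[idx], block_labels[idx]
```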
Step 2: Construct the 3D point cloud semantic segmentation network.
Referring to Fig. 2, the 3D point cloud semantic segmentation network constructed in this step consists of a feature downsampling network, a position attention module, a feature upsampling network and an auxiliary network cascaded in sequence.
2.1) Set up the feature downsampling network:
The feature downsampling network consists of n cascaded PointSA modules; each PointSA module consists of a point cloud centroid sampling and grouping layer followed by a point cloud feature extraction layer, where n ≥ 2. In this embodiment n = 4;
In the centroid sampling and grouping layer of the m-th PointSA module, m = 1, 2, ..., n, iterative farthest point sampling is first used to sample a preset number of points from the input point set as centroids; then, taking each sampled centroid as the centre, a spherical (ball query) search within a module-specific radius rm gathers a fixed number of neighbouring points to form a group. In this embodiment the radius is set to r1 = 0.1 for the first PointSA module, r2 = 0.2 for the second, r3 = 0.4 for the third and r4 = 0.8 for the fourth;
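The sketch below illustrates this centroid sampling and grouping layer: iterative farthest point sampling followed by a ball query of radius rm. The number of centroids and the group size k are left as parameters because their values are not given in the extracted text:

```python
import numpy as np

def farthest_point_sampling(xyz, n_centroids):
    # Iteratively pick the point farthest from all previously selected centroids.
    dist = np.full(len(xyz), np.inf)
    selected = [np.random.randint(len(xyz))]
    for _ in range(n_centroids - 1):
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[selected[-1]], axis=1))
        selected.append(int(np.argmax(dist)))
    return np.array(selected)

def ball_query(xyz, centroid_xyz, radius, k):
    # For every centroid, collect k point indices lying within the given radius.
    groups = []
    for c in centroid_xyz:
        d = np.linalg.norm(xyz - c, axis=1)
        idx = np.where(d <= radius)[0]
        if len(idx) == 0:                  # fall back to the nearest point
            idx = np.array([int(np.argmin(d))])
        groups.append(np.resize(idx, k))   # pad by repetition if fewer than k found
    return np.stack(groups)                # shape (n_centroids, k)
```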
The point cloud feature extraction layer of the m-th PointSA module consists of three cascaded 2D convolutional layers that extract features from the output of the centroid sampling and grouping layer, and a max-pooling step that pools the extracted regional features. In this embodiment, the three 2D convolutional layers of every PointSA module use 1×1 kernels with stride 1; their output channel numbers are 32, 32, 64 for the first module, 64, 64, 128 for the second module, 128, 128, 256 for the third module and 256, 256, 512 for the fourth module;
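Because a 2D convolution with a 1×1 kernel is simply the same linear map applied to every point of every group, the feature extraction layer can be sketched as a shared MLP followed by max pooling over each group. The ReLU nonlinearity below is an assumption, since the activation is not named in the extracted text:

```python
import numpy as np

def pointsa_feature_extraction(grouped_feats, weights, biases):
    # grouped_feats: (n_centroids, k, c_in); each (w, b) pair acts as a 1x1 convolution,
    # i.e. one linear map shared by all points, followed here by a ReLU.
    x = grouped_feats
    for w, b in zip(weights, biases):
        x = np.maximum(x @ w + b, 0.0)
    # Max pooling over the k points of every group yields one feature per centroid.
    return x.max(axis=1)                   # (n_centroids, c_out)
```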
2.2) Set up the position attention module, which computes the correlations between the features represented by the individual centroids of its input data F and produces the position-attention-enhanced feature E:
Referring to Fig. 3, the module works as follows:
2.2.1) The input data F passes through the first 1D convolutional layer Q to obtain the feature Qi of the i-th centroid, i = 1, 2, ..., N, where N is the number of centroids of F; through the second 1D convolutional layer U to obtain the feature Uj of the j-th centroid, j = 1, 2, ..., N; and through the third 1D convolutional layer V to obtain the feature Vj of the j-th centroid. The three 1D convolutional layers Q, U and V all use kernel size 1 and stride 1; the numbers of output feature channels of the first layer Q and the second layer U are both set in proportion to the number of feature channels of the input data F, while the number of output feature channels of the third layer V equals the number of feature channels of the input data F;
2.2.2) Compute the attention influence value tij between the features represented by each pair of centroids from Qi and Uj, and use the values tij to construct the N×N attention matrix A;
2.2.3) Compute the position attention feature Ji of each centroid from the attention matrix A and the features Vj;
2.2.4) Output the attention-enhanced feature E = [E1; E2; ...; Ei; ...; EN], where Ei = αJi + Fi is the feature of the i-th centroid of E, α is the weight of the position attention feature Ji, and Fi is the input feature of the i-th centroid;
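The computation of steps 2.2.1-2.2.4 can be sketched as below, with the three kernel-size-1 convolutions written as per-centroid linear maps. The softmax form of tij and the reduced channel width C // 8 for Q and U are assumptions made here for illustration, since the corresponding formulas are not reproduced in the extracted text:

```python
import numpy as np

def position_attention(f, wq, wu, wv, alpha):
    # f: (N, C) centroid features; wq, wu: (C, C // 8); wv: (C, C); alpha: scalar weight.
    q, u, v = f @ wq, f @ wu, f @ wv
    logits = q @ u.T                                   # pairwise influence t_ij (unnormalised)
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    a = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)   # attention matrix A
    j = a @ v                                          # position attention features J_i
    return alpha * j + f                               # E_i = alpha * J_i + F_i
```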
2.3) Set up the feature upsampling network:
The feature upsampling network consists of a cascaded PointFP modules, a 1D convolutional layer, a Dropout layer and a 1D convolutional layer used for classification; each PointFP module consists of a feature interpolation layer followed by a feature extraction layer, where a ≥ 2. In this embodiment a = 4;
The feature interpolation layers and feature extraction layers of the a PointFP modules differ as follows:
In the first PointFP module, the feature interpolation layer interpolates the output features of the position attention module and concatenates the interpolated features with the output features of the third PointSA module to give the output of the interpolation layer; the feature extraction layer consists of two cascaded 2D convolutional layers that further process this output, both with 1×1 kernels, stride 1 and output channel numbers 256 and 256;
In the second PointFP module, the feature interpolation layer interpolates the output features of the first PointFP module and concatenates the interpolated features with the output features of the second PointSA module; the feature extraction layer consists of two cascaded 2D convolutional layers, both with 1×1 kernels, stride 1 and output channel numbers 256 and 256;
In the third PointFP module, the feature interpolation layer interpolates the output features of the second PointFP module and concatenates the interpolated features with the output features of the first PointSA module; the feature extraction layer consists of two cascaded 2D convolutional layers, both with 1×1 kernels, stride 1 and output channel numbers 256 and 128;
In the fourth PointFP module, the feature interpolation layer interpolates the output features of the third PointFP module and uses the interpolated features directly as its output; the feature extraction layer consists of three cascaded 2D convolutional layers, all with 1×1 kernels, stride 1 and output channel numbers 128, 128 and 128.
The 1D convolutional layer has kernel size 1, stride 1 and 128 output feature channels;
the keep probability of the Dropout layer is set to 0.5;
the 1D convolutional layer used for classification has kernel size 1, stride 1, and its number of output feature channels is set to the number of segmentation classes L.
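The feature interpolation layer of each PointFP module propagates features from a sparse centroid set back to a denser point set. The sketch below assumes the PointNet++ convention of inverse-distance-weighted interpolation over the three nearest centroids, since the text above only states that the features are interpolated; the interpolated features are then concatenated with the skip features of the corresponding PointSA module before the 2D convolutions listed above.

```python
import numpy as np

def interpolate_features(sparse_xyz, sparse_feats, dense_xyz, k=3, eps=1e-8):
    # For every dense point, blend the features of its k nearest sparse centroids
    # with inverse-distance weights (PointNet++-style feature propagation).
    out = np.zeros((len(dense_xyz), sparse_feats.shape[1]))
    for i, p in enumerate(dense_xyz):
        d = np.linalg.norm(sparse_xyz - p, axis=1)
        nn = np.argsort(d)[:k]
        w = 1.0 / (d[nn] + eps)
        out[i] = (w[:, None] * sparse_feats[nn]).sum(axis=0) / w.sum()
    return out
```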
2.4) Set up the auxiliary network:
The auxiliary network consists of b cascaded PointAux modules followed by a 1D convolutional layer used for classification; each PointAux module consists of a 1D convolutional layer and a feature interpolation layer, where b ≥ 1. In this embodiment b = 2;
In the first PointAux module, the 1D convolutional layer extracts features from the output of the second PointFP module, with kernel size 1, stride 1 and the number of segmentation classes L as output feature channels; its feature interpolation layer interpolates the features extracted by the 1D convolutional layer;
In the second PointAux module, the 1D convolutional layer extracts features from the output of the first PointAux module, with kernel size 1, stride 1 and L output feature channels; its feature interpolation layer interpolates the features extracted by the 1D convolutional layer;
The 1D convolutional layer used for classification classifies the output features of the second PointAux module, with kernel size 1, stride 1 and the number of output feature channels set to the number of segmentation classes L.
Step 3: Set the loss function of the 3D point cloud semantic segmentation network.
In this example the multi-class cross-entropy loss is used as the loss function of the 3D point cloud semantic segmentation network,
where C is the number of training sample points, L is the total number of classes, wk is the weight of class k, and wa ∈ [0,1] is the weight of the auxiliary network loss; in this embodiment wa = 0.5;
pi,k denotes the true probability that the i-th sample point belongs to class k: it equals 1 if the i-th sample point belongs to class k and 0 otherwise;
the probabilities that the i-th sample point belongs to class k, as predicted by the feature upsampling network and by the auxiliary network respectively, are computed from the k-th channel output values of the two branches for that sample point,
namely from the k-th components of f1(xi; θ1) and f2(xi; θ2), where xi is the input feature of the i-th sample point, f1 denotes the feature upsampling network with parameters θ1, and f2 denotes the auxiliary network with parameters θ2.
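As an illustrative sketch of this loss, the code below implements what the description states: a per-class-weighted multi-class cross entropy applied to the outputs of both branches, with the auxiliary term scaled by wa. The softmax normalisation of the channel outputs and the function name are assumptions made here, since the formula images are not reproduced in the extracted text:

```python
import numpy as np

def segmentation_loss(main_out, aux_out, labels, class_w, w_aux=0.5):
    # main_out, aux_out: (C, L) channel outputs of the upsampling and auxiliary branches;
    # labels: (C,) true class indices; class_w: (L,) per-class weights w_k.
    def weighted_ce(out):
        z = out - out.max(axis=1, keepdims=True)
        log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))      # log softmax
        picked = log_p[np.arange(len(labels)), labels]
        return -(class_w[labels] * picked).mean()
    return weighted_ce(main_out) + w_aux * weighted_ce(aux_out)
```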
Step 4: Use the training set T to perform P rounds of supervised training of the 3D point cloud semantic segmentation network, P ≥ 500.
In this embodiment P = 1000 and the training proceeds as follows:
4.1) In the q-th round of training, let lq be the learning rate and θq the parameters of the network model in round q. According to the loss function set in Step 3, the parameters are updated by gradient descent, θq+1 = θq − lq · ∂Loss/∂θq, giving the network model parameters θq+1 for round q+1 and thereby the network model after the q-th round;
4.2) Every P1 rounds, input the test set into the current network model to obtain the predicted category of every point in the test set, P1 ≥ 2; in this embodiment P1 = 5;
4.3) Count the number of points in the test set whose predicted category equals their true category and compute the segmentation accuracy acc = R / H, where R is the number of test points whose predicted category equals their true category and H is the total number of points in the test set;
4.4) Compare the segmentation accuracy acc of the current network model with that of the previously saved network model; if the accuracy of the current model is higher, the current model is the better one and is saved, otherwise it is not saved.
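Steps 4.2-4.4 amount to a periodic keep-the-best evaluation loop; a minimal sketch follows, in which model_predict and save_model are placeholder names for whatever inference and checkpointing routines the implementation provides:

```python
def segmentation_accuracy(pred_labels, true_labels):
    # acc = R / H: fraction of test points whose predicted class equals the true class.
    r = sum(int(p == t) for p, t in zip(pred_labels, true_labels))
    return r / len(true_labels)

# Inside the training loop, every P1 rounds (placeholder calls):
# acc = segmentation_accuracy(model_predict(test_set), test_labels)
# if acc > best_acc:
#     best_acc = acc
#     save_model(current_model)          # keep only the most accurate checkpoint
```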
4.5) After the P rounds of training are completed, take the network model with the highest segmentation accuracy as the trained network model.
Step 5: Input the test set V into the trained network model obtained in step 4.5) for semantic segmentation and obtain the segmentation result of every point.
The technical effect of the present invention is illustrated below with a simulation experiment.
1. Simulation conditions
The simulation experiment of the present invention was carried out in the following environment.
Hardware platform: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20 GHz, 64 GB RAM, Ubuntu 16.04 operating system, GeForce GTX TITAN X;
Software platform: TensorFlow deep learning framework, Python 3.5. The dataset used in the experiment is the ScanNet point cloud dataset.
ScanNet is a point cloud dataset of indoor scenes scanned and reconstructed with RGB-D cameras. It contains 1513 scenes in total, of which 1201 scenes are used as the training set and 312 scenes as the test set, with 21 categories.
2. Simulation experiment
The training set and test set are obtained according to the present invention, the 3D point cloud semantic segmentation network is constructed and trained in a supervised manner with the training set, the trained network model is then used to predict the points of the test set, and the segmentation accuracy of the network on the test set V is computed with the method of step 4.3.
The semantic segmentation accuracy of the present invention on point cloud data is compared with that of the existing PointNet++ method, using segmentation accuracy as the evaluation index; the results are shown in Table 1:
Table 1. Comparison of segmentation accuracy on the ScanNet dataset
As can be seen from Table 1, the segmentation accuracy of the present invention on the ScanNet dataset exceeds that of the prior-art PointNet++ by 1.6%, indicating that the semantic segmentation of 3D point clouds by the present invention is stronger than that of PointNet++.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910604264.0A CN110322453B (en) | 2019-07-05 | 2019-07-05 | 3D point cloud semantic segmentation method based on position attention and auxiliary network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910604264.0A CN110322453B (en) | 2019-07-05 | 2019-07-05 | 3D point cloud semantic segmentation method based on position attention and auxiliary network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110322453A CN110322453A (en) | 2019-10-11 |
CN110322453B true CN110322453B (en) | 2023-04-18 |
Family
ID=68122807
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910604264.0A Active CN110322453B (en) | 2019-07-05 | 2019-07-05 | 3D point cloud semantic segmentation method based on position attention and auxiliary network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110322453B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110827398B (en) * | 2019-11-04 | 2023-12-26 | 北京建筑大学 | Automatic semantic segmentation method for indoor three-dimensional point cloud based on deep neural network |
CN111127493A (en) * | 2019-11-12 | 2020-05-08 | 中国矿业大学 | Remote sensing image semantic segmentation method based on attention multi-scale feature fusion |
CN111223120B (en) * | 2019-12-10 | 2023-08-04 | 南京理工大学 | Point cloud semantic segmentation method |
CN111192270A (en) * | 2020-01-03 | 2020-05-22 | 中山大学 | Point cloud semantic segmentation method based on point global context reasoning |
CN111428619B (en) * | 2020-03-20 | 2022-08-05 | 电子科技大学 | Three-dimensional point cloud head attitude estimation system and method based on ordered regression and soft labels |
CN111583263B (en) * | 2020-04-30 | 2022-09-23 | 北京工业大学 | A point cloud segmentation method based on joint dynamic graph convolution |
CN112633330B (en) * | 2020-12-06 | 2024-02-02 | 西安电子科技大学 | Point cloud segmentation method, system, medium, computer equipment, terminal and application |
CN112560865B (en) * | 2020-12-23 | 2022-08-12 | 清华大学 | A Semantic Segmentation Method for Point Clouds in Large Outdoor Scenes |
CN112927248B (en) * | 2021-03-23 | 2022-05-10 | 重庆邮电大学 | Point cloud segmentation method based on local feature enhancement and conditional random field |
CN113205509B (en) * | 2021-05-24 | 2021-11-09 | 山东省人工智能研究院 | Blood vessel plaque CT image segmentation method based on position convolution attention network |
CN113554653B (en) * | 2021-06-07 | 2024-10-29 | 之江实验室 | Semantic segmentation method based on mutual information calibration point cloud data long tail distribution |
CN113470048B (en) * | 2021-07-06 | 2023-04-25 | 北京深睿博联科技有限责任公司 | Scene segmentation method, device, equipment and computer readable storage medium |
CN114140841A (en) * | 2021-10-30 | 2022-03-04 | 华为技术有限公司 | Processing method of point cloud data, training method of neural network and related equipment |
CN115619963B (en) * | 2022-11-14 | 2023-06-02 | 吉奥时空信息技术股份有限公司 | Urban building entity modeling method based on content perception |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102034267A (en) * | 2010-11-30 | 2011-04-27 | 中国科学院自动化研究所 | Three-dimensional reconstruction method of target based on attention |
CN102036073B (en) * | 2010-12-21 | 2012-11-28 | 西安交通大学 | Method for encoding and decoding JPEG2000 image based on vision potential attention target area |
US11094137B2 (en) * | 2012-02-24 | 2021-08-17 | Matterport, Inc. | Employing three-dimensional (3D) data predicted from two-dimensional (2D) images using neural networks for 3D modeling applications and other applications |
CN103871050B (en) * | 2014-02-19 | 2017-12-29 | 小米科技有限责任公司 | icon dividing method, device and terminal |
US11004202B2 (en) * | 2017-10-09 | 2021-05-11 | The Board Of Trustees Of The Leland Stanford Junior University | Systems and methods for semantic segmentation of 3D point clouds |
US10824862B2 (en) * | 2017-11-14 | 2020-11-03 | Nuro, Inc. | Three-dimensional object detection for autonomous robotic systems using image proposals |
CN109871532B (en) * | 2019-01-04 | 2022-07-08 | 平安科技(深圳)有限公司 | Text theme extraction method and device and storage medium |
- 2019-07-05 CN CN201910604264.0A patent/CN110322453B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN110322453A (en) | 2019-10-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110322453B (en) | 3D point cloud semantic segmentation method based on position attention and auxiliary network | |
CN109410321B (en) | Three-dimensional reconstruction method based on convolutional neural network | |
CN111299815B (en) | A method for visual inspection and laser cutting trajectory planning for low-gray rubber pads | |
CN110245709B (en) | 3D point cloud data semantic segmentation method based on deep learning and self-attention | |
CN109029363A (en) | A kind of target ranging method based on deep learning | |
CN111860587B (en) | Detection method for small targets of pictures | |
CN114973002A (en) | Improved YOLOv 5-based ear detection method | |
CN109377555B (en) | Target feature extraction and recognition method for 3D reconstruction of autonomous underwater robot's foreground field of view | |
CN108629288A (en) | A kind of gesture identification model training method, gesture identification method and system | |
CN110070574B (en) | Binocular vision stereo matching method based on improved PSMAT net | |
CN103310481A (en) | Point cloud reduction method based on fuzzy entropy iteration | |
CN107977660A (en) | Region of interest area detecting method based on background priori and foreground node | |
CN113610905B (en) | Deep learning remote sensing image registration method based on sub-image matching and application | |
CN110222767A (en) | Three-dimensional point cloud classification method based on nested neural and grating map | |
CN116703932A (en) | CBAM-HRNet model wheat spike grain segmentation and counting method based on convolution attention mechanism | |
CN116645595A (en) | Method, device, equipment and medium for recognizing building roof contours from remote sensing images | |
CN117496384A (en) | A method for object detection in drone images | |
CN111339924A (en) | Polarized SAR image classification method based on superpixel and full convolution network | |
CN112329662B (en) | Multi-view saliency estimation method based on unsupervised learning | |
CN113989296A (en) | Unmanned aerial vehicle wheat field remote sensing image segmentation method based on improved U-net network | |
CN112819832A (en) | Urban scene semantic segmentation fine-grained boundary extraction method based on laser point cloud | |
CN114399728B (en) | Foggy scene crowd counting method | |
CN115272278A (en) | Method for constructing change detection model for remote sensing image change detection | |
CN112967296B (en) | Point cloud dynamic region graph convolution method, classification method and segmentation method | |
CN112990336B (en) | Deep three-dimensional point cloud classification network construction method based on competitive attention fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||