WO2020119619A1 - Network optimization structure employing 3d target classification and scene semantic segmentation - Google Patents

Network optimization structure employing 3d target classification and scene semantic segmentation

Info

Publication number
WO2020119619A1
WO2020119619A1 (PCT/CN2019/123947)
Authority
WO
WIPO (PCT)
Prior art keywords
point
layer
features
points
semantic segmentation
Prior art date
Application number
PCT/CN2019/123947
Other languages
French (fr)
Chinese (zh)
Inventor
程俊
张锲石
王胜文
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Publication of WO2020119619A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion

Definitions

  • the invention relates to the fields of robotics and reinforcement learning, and in particular to a network optimization structure based on 3D target classification and scene semantic segmentation.
  • PointNet++ is a recently proposed network structure for 3D target classification and scene semantic segmentation. Although it achieves relatively satisfactory results, two problems remain:
  • PointNet++ uses the farthest point sampling (FPS) algorithm when selecting centroid points. Although FPS covers the entire data set better than random point selection, it ignores the fact that the features of different points contribute differently to the classification and segmentation tasks. Therefore, FPS cannot guarantee that the selected set of centroid points correctly represents the main features of the object;
  • FPS: farthest point sampling
  • MSG: multi-scale grouping
  • MRG: multi-resolution grouping
  • MSG fuses multi-scale features of the same point within the same layer;
  • MRG fuses global features across different layers. Neither fusion method exploits the features that the same point carries at different levels.
  • the present invention proposes a network optimization structure based on 3D target classification and scene semantic segmentation, which not only improves the object classification performance of PointNet++ but also improves its scene segmentation performance.
  • the technical solution of the present invention for solving the above problems is a network optimization structure based on 3D target classification and scene semantic segmentation, characterized in that it includes the following steps:
  • during center point sampling, each collected point set is a subset of the previous layer's point set; consequently, the same point carries different features at each layer, so when extracting features for the next layer, the different features of the previous layers located at the same point can be fused.
  • this fusion performs fine-grained feature fusion for the specified point.
  • during training, the output of the module is Y;
  • M is the number of categories to be predicted;
  • the PS module uses two CNN layers, each with a 1x1 convolution kernel.
  • the present invention is a network optimization structure based on 3D target classification and scene semantic segmentation. It proposes a new method for selecting centroid points, scoring each point's contribution before feature extraction, so that the selected point set reflects the main characteristics of the target;
  • the multi-level point feature (MLPF) structure is proposed.
  • the MLPF method extracts features at different levels for each center point of interest and fuses them. Although MLPF also uses features from different levels, it operates on points rather than regions, and this feature extraction method is more general and can be used in other networks;
  • FIG. 1 is a schematic structural diagram of the PS module provided by an embodiment of the present invention (different point numbers represent different importance);
  • FIG. 2 is a schematic diagram of center point selection between levels and multi-level fusion of features at the same point provided by an embodiment of the present invention (where l_i denotes the features of the i-th layer).
  • a network optimization structure based on 3D target classification and scene semantic segmentation includes the following steps:
  • the PS module selects feature points with a new point-selection method.
  • the new point-selection method is based on an attention mechanism and selects the points whose features contribute more to the task, so that the selected point set better represents the entire sampled space; the schematic diagram of the PS module structure is shown in Figure 1 (different point numbers represent different importance);
  • during center point sampling, each collected point set is a subset of the previous layer's point set; consequently, the same point carries different features at each layer, so when extracting features for the next layer, the different features of the previous layers located at the same point can be fused.
  • this fusion performs fine-grained feature fusion for the specified point. The process is shown in Figure 2:
  • Figure 2 shows the selection of center points between levels and the multi-level fusion of features at the same point, where l_i denotes the features of the i-th layer.
  • each layer's feature point set is a subset of the previous layer's, and the same point contains different feature information in different layers, so these features can be fused to obtain more powerful features.
  • layer l_{i+1} contains three points: point 1, point 2 and point 3. These points are obtained through the feature selection of the first two layers.
  • in the original PointNet++, the features of the points in a layer depend only on the previous layer; earlier features are not considered.
  • in the figure, this corresponds to having only the dotted lines 2 from l_{i-1} to l_i and from l_i to l_{i+1}, without the dotted line 1 from l_{i-1} to l_{i+1}.
  • the specific process is as follows:
  • C_i represents the set of centroid points output by the i-th layer;
  • c_{n_j}^i represents the n_j-th centroid point in C_i;
  • F_i represents the feature set of the points corresponding to C_i;
  • with C_{i+1} as an index, the features of the points of C_{i+1} in the first i layers are selected and concatenated into F_fuse;
  • the output of the module during training is Y;
  • M is the number of categories to be predicted;
  • the PS module uses two CNN layers, each with a 1x1 convolution kernel.

Abstract

A network structure optimization method employing 3D target classification and scene semantic segmentation, relating to the fields of robotics and reinforcement learning. The method comprises: after acquiring the features of points, scoring each point, where a higher score indicates a larger contribution of the point to the task; and sorting the scores and selecting the top N points. In center point sampling, every acquired point set is a subset of the point set of the previous layer, and thus the same point has different features in each layer. Therefore, when feature extraction is performed on the next layer, the different features of the same point in previous layers can be combined; this combination fuses fine-grained features of a specified point. The method improves the object classification performance of PointNet++ and also improves its scene segmentation performance.

Description

A network optimization structure based on 3D target classification and scene semantic segmentation

Technical Field
The invention relates to the fields of robotics and reinforcement learning, and in particular to a network optimization structure based on 3D target classification and scene semantic segmentation.
Background Art
PointNet++ is a recently proposed network structure for 3D target classification and scene semantic segmentation. Although it achieves relatively satisfactory results, the following two problems remain:
1) PointNet++ uses the farthest point sampling (FPS) algorithm when selecting centroid points. Although FPS covers the entire data set better than random point selection, it ignores the fact that the features of different points contribute differently to the classification and segmentation tasks. Therefore, FPS cannot guarantee that the selected set of centroid points correctly represents the main features of the object (a reference sketch of FPS follows this list);
2) PointNet++ uses multi-scale grouping (MSG) and multi-resolution grouping (MRG) to handle the uneven density of point clouds, but MSG fuses multi-scale features of the same point within the same layer, and MRG fuses global features across different layers. Neither fusion method exploits the features that the same point carries at different levels.
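For reference, the following is a minimal NumPy sketch of the FPS baseline discussed in problem 1) (prior art, not part of the invention); the (N, 3) array layout and the choice of starting point are assumptions of the sketch. It makes the limitation concrete: the selection criterion is purely geometric coverage, with no notion of a point's contribution to the task.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Return indices of n_samples centroids: each new centroid is the point
    farthest from those already selected (coverage only, no task signal)."""
    n = points.shape[0]                              # points: (N, 3) coordinates
    selected = np.zeros(n_samples, dtype=np.int64)   # selected[0] = 0: arbitrary start
    dist = np.full(n, np.inf)                        # distance to nearest selected centroid
    for i in range(1, n_samples):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dist = np.minimum(dist, d)                   # refresh nearest-centroid distances
        selected[i] = int(np.argmax(dist))           # pick the farthest remaining point
    return selected
```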
Summary of the Invention
To solve the above problems in the background art, the present invention proposes a network optimization structure based on 3D target classification and scene semantic segmentation, which not only improves the object classification performance of PointNet++ but also improves its scene segmentation performance.
The technical solution of the present invention for solving the above problems is a network optimization structure based on 3D target classification and scene semantic segmentation, characterized in that it includes the following steps:
1) Build the PS module
1.1) Acquire the features of the points;
1.2) Score each point; the score represents the point's contribution to the task;
1.3) Sort the scores and take the top N points, where N is the number of points to be sampled;
2) MLPF feature extraction and fusion
During center point sampling, each collected point set is a subset of the previous layer's point set. Consequently, the same point carries different features at each layer, so when extracting features for the next layer, we can fuse the different features of the previous layers located at the same point; this fusion performs fine-grained feature fusion for the specified point.
Further, in step 1.2), each point is scored with a scoring function α(f_n; θ), where f_n ∈ R^d, n = 1, 2, …, N, denotes a d-dimensional feature and θ denotes the learned parameters;
When training the PS module, the output of the module is Y:
Y = …   (1)   [the full expression of equation (1) appears only as an image in the original publication]
where W represents the weight of the last output layer and M is the number of categories to be predicted;
During training, the cross-entropy loss function is used for convergence; the loss function is:
L = -[y* ln p + (1 - y*) ln(1 - p)]   (2),
where y* denotes the label and p denotes the predicted probability [the definition of p appears only as an image in the original publication].
The PS module uses two CNN layers, and the convolution kernel size of each layer is 1x1.
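As an illustration of the PS module just described, the following is a minimal PyTorch-style sketch: two CNN layers with 1x1 kernels produce the per-point score α(f_n; θ), and the top N scored points are kept. The hidden width, the ReLU between the two layers, the (B, d, N) tensor layout, and the per-point binary training head are assumptions of this sketch, not details fixed by the text (equation (1) appears only as an image).

```python
import torch
import torch.nn as nn

class PSModule(nn.Module):
    """Minimal sketch of a point-selection (PS) module: score, then take top N."""

    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        # "2-layer CNN with 1x1 kernels": pointwise convolutions over the points.
        self.score_net = nn.Sequential(
            nn.Conv1d(in_dim, hidden_dim, kernel_size=1),
            nn.ReLU(),
            nn.Conv1d(hidden_dim, 1, kernel_size=1),
        )

    def forward(self, feats: torch.Tensor, n_keep: int):
        # feats: (B, d, N) per-point features f_n in R^d
        scores = self.score_net(feats).squeeze(1)            # (B, N), alpha(f_n; theta)
        top = torch.topk(scores, k=n_keep, dim=1).indices    # indices of the top-N points
        idx = top.unsqueeze(1).expand(-1, feats.size(1), -1)
        selected = torch.gather(feats, 2, idx)               # (B, d, n_keep) kept features
        return selected, scores

# During training, the per-point scores could be supervised with a cross-entropy
# objective in the spirit of equation (2), e.g. torch.nn.BCEWithLogitsLoss on
# keep/discard labels; the exact head used in the patent is not reproduced here.
```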
Advantages of the present invention:
1) The present invention, a network optimization structure based on 3D target classification and scene semantic segmentation, proposes a new method for selecting centroid points that scores each point's contribution before feature extraction, so that the selected point set reflects the main characteristics of the target;
2) The multi-level point feature (MLPF) structure is proposed. The MLPF method extracts features at different levels for each center point of interest and fuses them. Although MLPF also uses features from different levels, it operates on points rather than regions. Moreover, this feature extraction method is more general and can be used in other networks;
3) In addition, a new feature fusion method is proposed, so that finer-grained features can be extracted. Furthermore, these two structures are not only applicable to PointNet++ but can also be applied to other network structures, improving the overall performance of the network and effectively preventing overfitting. Our structures therefore have important practical and reference value for scene target classification and scene semantic segmentation.
Brief Description of the Drawings
Figure 1 is a schematic structural diagram of the PS module provided by an embodiment of the present invention (different point numbers represent different importance);
Figure 2 is a schematic diagram of center point selection between levels and multi-level fusion of features at the same point provided by an embodiment of the present invention (where l_i denotes the features of the i-th layer).
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. The following detailed description of the embodiments provided in the accompanying drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
A network optimization structure based on 3D target classification and scene semantic segmentation includes the following steps:
1) Build the PS module. The PS module selects feature points with a new point-selection method, which is based on an attention mechanism and selects the points whose features contribute more to the task, so that the selected point set better represents the entire sampled space. The PS module structure is shown in Figure 1 (different point numbers represent different importance);
1.1) Acquire the features of the points;
1.2) Score each point; the score represents the point's contribution to the task;
1.3) Sort the scores and take the top N points, where N is the number of points to be sampled. For the same number of sampled points, the point set obtained by this method is more representative, and its features are more distinctive, than the point set selected by the FPS algorithm.
2) MLPF feature extraction and fusion
During center point sampling, each collected point set is a subset of the previous layer's point set. Consequently, the same point carries different features at each layer, so when extracting features for the next layer, we can fuse the different features of the previous layers located at the same point; this fusion performs fine-grained feature fusion for the specified point. The process is shown in Figure 2:
Figure 2 shows the selection of center points between levels and the multi-level fusion of features at the same point, where l_i denotes the features of the i-th layer.
As can be seen from Figure 2, each layer's feature point set is a subset of the previous layer's, and the same point contains different feature information in different layers, so these features can be fused to obtain more powerful features. For example, layer l_{i+1} contains three points: point 1, point 2 and point 3, obtained through the feature selection of the first two layers. In the original PointNet++, the features of the points in a layer depend only on the previous layer; earlier features are not considered. In the figure, this corresponds to having only the dotted lines 2 from l_{i-1} to l_i and from l_i to l_{i+1}, without the dotted line 1 from l_{i-1} to l_{i+1}. Through such multi-level fusion of the same point's features we achieve fine-grained feature fusion, and the resulting features carry richer information. The specific process is as follows:
C_i = {c_1^i, c_2^i, …, c_{N_i}^i},  F_i = {f_1^i, f_2^i, …, f_{N_i}^i}
where C_i denotes the set of centroid points output by the i-th layer, c_{n_j}^i denotes the n_j-th centroid point in C_i, F_i denotes the feature set of the corresponding points in C_i, and f_{n_j}^i is the feature of point c_{n_j}^i.
When extracting features at layer i+1, the (i+1)-th centroid point set C_{i+1} (C_{i+1} ⊆ C_k, where k = 1, 2, …, i) is selected first. After obtaining C_{i+1}, we use C_{i+1} as an index to select the features that the points of C_{i+1} carry in the first i layers and concatenate them into F_fuse:
F_fuse = [F_{C_{i+1}}^1, F_{C_{i+1}}^2, …, F_{C_{i+1}}^i]
where F_{C_{i+1}}^i denotes the features in the i-th layer of the points in C_{i+1}. The final input to layer i+1 is therefore {C_{i+1}, F_fuse}, whereas the input in the original network is only {C_{i+1}, F_{C_{i+1}}^i}.
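The following is a minimal sketch of this fusion step, assuming each layer k stores its feature matrix F_k together with the global ids of its points; the helper name mlpf_fuse and this data layout are illustrative assumptions, not the patented implementation. For every point of C_{i+1}, which by construction belongs to each of the first i layers, the sketch gathers that point's feature from layers 1 through i and concatenates them into F_fuse, so that layer i+1 can consume {C_{i+1}, F_fuse} in place of the original {C_{i+1}, F^i_{C_{i+1}}}.

```python
import torch

def mlpf_fuse(layer_feats, layer_ids, centroid_ids):
    """
    layer_feats:  list of tensors F_k with shape (N_k, d_k), k = 1..i
    layer_ids:    list of 1-D LongTensors holding the global point id of each row of F_k
    centroid_ids: 1-D LongTensor of global ids of the points in C_{i+1}
    returns:      F_fuse with shape (len(centroid_ids), d_1 + ... + d_i)
    """
    fused = []
    for feats, ids in zip(layer_feats, layer_ids):
        # Row position of each centroid id inside this layer's id list.
        pos = torch.stack([(ids == c).nonzero(as_tuple=True)[0][0]
                           for c in centroid_ids])
        fused.append(feats[pos])        # features of the C_{i+1} points at layer k
    return torch.cat(fused, dim=1)      # channel-wise concatenation = F_fuse
```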
Further, in step 1.2), each point is scored with a scoring function α(f_n; θ), where f_n ∈ R^d, n = 1, 2, …, N, denotes a d-dimensional feature and θ denotes the learned parameters;
When training the PS module, the output of the module is Y:
Y = …   (1)   [the full expression of equation (1) appears only as an image in the original publication]
where W represents the weight of the last output layer and M is the number of categories to be predicted;
During training, the cross-entropy loss function is used for convergence; the loss function is:
L = -[y* ln p + (1 - y*) ln(1 - p)]   (2)
where y* denotes the label and p denotes the predicted probability [the definition of p appears only as an image in the original publication].
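As a quick numeric illustration of equation (2), assuming p is the predicted keep-probability of a point and y* its 0/1 label:

```python
import math

def ce_loss(y_star: float, p: float) -> float:
    # Equation (2): binary cross-entropy between label y* and probability p.
    return -(y_star * math.log(p) + (1.0 - y_star) * math.log(1.0 - p))

print(ce_loss(1.0, 0.9))  # ~0.105: confident and correct, small loss
print(ce_loss(1.0, 0.1))  # ~2.303: confident and wrong, large loss
```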
The PS module uses two CNN layers, and the convolution kernel size of each layer is 1x1.
We conducted experiments on the ModelNet40 and ScanNet data sets and compared our results with other state-of-the-art methods. The results are shown in Table 1 and Table 2 and verify that the present invention outperforms the other methods.
Table 1: Object classification results on the ModelNet40 dataset

Method              Mean loss   Accuracy (%)   Avg. Acc (%)
Subvolume           -           89.2           86.0
MVCNN               -           90.1           -
PointNet            0.491       89.2           86.2
PointNet++ (SSG)    0.445       90.2           87.9
Ours (PS)           0.386       90.6           88.1
Ours (MLPF)         0.342       91.1           87.8
Table 2: Scene semantic segmentation results on the ScanNet dataset

Method              Accuracy (%)
3DCNN               73.0
PointNet            73.9
PointNet++ (SSG)    83.3
Ours (MLPF)         85.1
The above are only embodiments of the present invention and are not intended to limit its protection scope. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, or any direct or indirect application in other related system fields, is likewise included within the protection scope of the present invention.

Claims (2)

  1. A network optimization structure based on 3D target classification and scene semantic segmentation, characterized in that it includes the following steps:
    1) Build the PS module
    1.1) Acquire the features of the points;
    1.2) Score each point; the score represents the point's contribution to the task;
    1.3) Sort the scores and take the top N points, where N is the number of points to be sampled;
    2) MLPF feature extraction and fusion
    During center point sampling, each collected point set is a subset of the previous layer's point set. Consequently, the same point carries different features at each layer, so when extracting features for the next layer, the different features of the previous layers located at the same point can be fused; this fusion performs fine-grained feature fusion for the specified point.
  2. The network optimization structure based on 3D target classification and scene semantic segmentation according to claim 1, characterized in that:
    in step 1.2), each point is scored with a scoring function α(f_n; θ), where f_n ∈ R^d, n = 1, 2, …, N, denotes a d-dimensional feature and θ denotes the learned parameters;
    when training the PS module, the output of the module is Y:
    Y = …   (1)   [the full expression of equation (1) appears only as an image in the original publication]
    where W represents the weight of the last output layer and M is the number of categories to be predicted;
    during training, the cross-entropy loss function is used for convergence; the loss function is:
    L = -[y* ln p + (1 - y*) ln(1 - p)]   (2),
    where y* denotes the label and p denotes the predicted probability [the definition of p appears only as an image in the original publication];
    the PS module uses two CNN layers, and the convolution kernel size of each layer is 1x1.
PCT/CN2019/123947 2018-12-14 2019-12-09 Network optimization structure employing 3d target classification and scene semantic segmentation WO2020119619A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811535562.0A CN109753995B (en) 2018-12-14 2018-12-14 Optimization method of 3D point cloud target classification and semantic segmentation network based on PointNet++
CN201811535562.0 2018-12-14

Publications (1)

Publication Number Publication Date
WO2020119619A1 true WO2020119619A1 (en) 2020-06-18

Family

ID=66403851

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/123947 WO2020119619A1 (en) 2018-12-14 2019-12-09 Network optimization structure employing 3d target classification and scene semantic segmentation

Country Status (2)

Country Link
CN (1) CN109753995B (en)
WO (1) WO2020119619A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257597A (en) * 2020-10-22 2021-01-22 中国人民解放军战略支援部队信息工程大学 Semantic segmentation method of point cloud data
CN114241110A (en) * 2022-02-23 2022-03-25 北京邮电大学 Point cloud semantic uncertainty sensing method based on neighborhood aggregation Monte Carlo inactivation

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753995B (en) 2018-12-14 2021-01-01 中国科学院深圳先进技术研究院 Optimization method of 3D point cloud target classification and semantic segmentation network based on PointNet++
CN110210431B (en) * 2019-06-06 2021-05-11 上海黑塞智能科技有限公司 Point cloud semantic labeling and optimization-based point cloud classification method
CN110245709B (en) * 2019-06-18 2021-09-03 西安电子科技大学 3D point cloud data semantic segmentation method based on deep learning and self-attention
CN110837811B (en) * 2019-11-12 2021-01-05 腾讯科技(深圳)有限公司 Method, device and equipment for generating semantic segmentation network structure and storage medium
CN112085123B (en) * 2020-09-25 2022-04-12 北方民族大学 Point cloud data classification and segmentation method based on salient point sampling
CN112818999B (en) * 2021-02-10 2022-10-28 桂林电子科技大学 Complex scene 3D point cloud semantic segmentation method based on convolutional neural network
US11295170B1 (en) 2021-08-17 2022-04-05 FPT USA Corp. Group-equivariant convolutional neural networks for 3D point clouds

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345887A (en) * 2018-01-29 2018-07-31 清华大学深圳研究生院 The training method and image, semantic dividing method of image, semantic parted pattern
CN108509949A (en) * 2018-02-05 2018-09-07 杭州电子科技大学 Object detection method based on attention map
CN108564097A (en) * 2017-12-05 2018-09-21 华南理工大学 Multi-scale target detection method based on deep convolutional neural networks
CN109753995A (en) * 2018-12-14 2019-05-14 中国科学院深圳先进技术研究院 Network optimization structure based on 3D target classification and scene semantic segmentation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372111B (en) * 2016-08-22 2021-10-15 中国科学院计算技术研究所 Local feature point screening method and system
CN106815604B (en) * 2017-01-16 2019-09-27 大连理工大学 Gaze point detection method based on fusion of multi-layer information
CN108596924A (en) * 2018-05-17 2018-09-28 南方医科大学 MR prostate image segmentation method based on distance field fusion and ellipsoid prior

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564097A (en) * 2017-12-05 2018-09-21 华南理工大学 Multi-scale target detection method based on deep convolutional neural networks
CN108345887A (en) * 2018-01-29 2018-07-31 清华大学深圳研究生院 The training method and image, semantic dividing method of image, semantic parted pattern
CN108509949A (en) * 2018-02-05 2018-09-07 杭州电子科技大学 Object detection method based on attention map
CN109753995A (en) * 2018-12-14 2019-05-14 中国科学院深圳先进技术研究院 Network optimization structure based on 3D target classification and scene semantic segmentation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QI, CHARLES R. ET AL.: "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space", 31st Conference on Neural Information Processing Systems, 31 December 2017 (2017-12-31), XP055713540 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257597A (en) * 2020-10-22 2021-01-22 中国人民解放军战略支援部队信息工程大学 Semantic segmentation method of point cloud data
CN112257597B (en) * 2020-10-22 2024-03-15 中国人民解放军战略支援部队信息工程大学 Semantic segmentation method for point cloud data
CN114241110A (en) * 2022-02-23 2022-03-25 北京邮电大学 Point cloud semantic uncertainty sensing method based on neighborhood aggregation Monte Carlo inactivation
CN114241110B (en) * 2022-02-23 2022-06-03 北京邮电大学 Point cloud semantic uncertainty sensing method based on neighborhood aggregation Monte Carlo inactivation

Also Published As

Publication number Publication date
CN109753995A (en) 2019-05-14
CN109753995B (en) 2021-01-01

Similar Documents

Publication Publication Date Title
WO2020119619A1 (en) Network optimization structure employing 3d target classification and scene semantic segmentation
CN108596258B (en) Image classification method based on convolutional neural network random pooling
JP6440303B2 (en) Object recognition device, object recognition method, and program
CN109583340B (en) Video target detection method based on deep learning
CN111354017A (en) Target tracking method based on twin neural network and parallel attention module
WO2022042123A1 (en) Image recognition model generation method and apparatus, computer device and storage medium
KR101443187B1 (en) medical image retrieval method based on image clustering
CN110210538B (en) Household image multi-target identification method and device
CN109492776B (en) Microblog popularity prediction method based on active learning
CN111062278B (en) Abnormal behavior identification method based on improved residual error network
WO2013053320A1 (en) Image retrieval method and device
CN108664526B (en) Retrieval method and device
CN111860587B (en) Detection method for small targets of pictures
CN111984817B (en) Fine-grained image retrieval method based on self-attention mechanism weighting
Jboor et al. Towards an inpainting framework for visual cultural heritage
CN110751027B (en) Pedestrian re-identification method based on deep multi-instance learning
WO2021012793A1 (en) Lawyer recommendation method based on big data analysis, and related device
CN113408605A (en) Hyperspectral image semi-supervised classification method based on small sample learning
Wei et al. Region ranking SVM for image classification
US20230089335A1 (en) Training method for robust neural network based on feature matching
CN107977670A (en) Accident classification stage division, the apparatus and system of decision tree and bayesian algorithm
CN113705596A (en) Image recognition method and device, computer equipment and storage medium
CN112800982A (en) Target detection method based on remote sensing scene classification
CN111723852A (en) Robust training method for target detection network
Gao et al. SHREC’15 Track: 3D object retrieval with multimodal views

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19896158

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05/11/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19896158

Country of ref document: EP

Kind code of ref document: A1