CN113378854A - Point cloud target detection method integrating original point cloud and voxel division - Google Patents

Point cloud target detection method integrating original point cloud and voxel division

Info

Publication number
CN113378854A
CN113378854A (application CN202110651776.XA)
Authority
CN
China
Prior art keywords
point cloud
point
feature
layer
voxel
Prior art date
Legal status
Pending
Application number
CN202110651776.XA
Other languages
Chinese (zh)
Inventor
姚剑
蒋天园
李寅暄
龚烨
李礼
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110651776.XA priority Critical patent/CN113378854A/en
Publication of CN113378854A publication Critical patent/CN113378854A/en
Pending legal-status Critical Current

Classifications

    • G06F 18/241: Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/253: Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Fusion techniques of extracted features
    • G06N 3/045: Physics; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a point cloud target detection method that fuses the original point cloud with voxel division. First, the lossless feature extraction network Pointnet++ is used to extract local detail features and semantic features of the point cloud, and a loss function is constructed to further improve the network's perception of local neighborhood information. Trilinear interpolation is then used, in both the voxel feature initialization stage and the sparse convolution perception stage, to embed the local detail features and semantic features obtained without information loss into a point cloud target detection network based on voxel division. Finally, the pre-set detection anchor boxes are classified and regressed by a two-dimensional RPN to obtain the final detection targets. By embedding the lossless multi-scale encoding of the point cloud into the voxel method, the detection network gains multi-scale, multi-level information-fusion perception; by combining the two families of point cloud target detection methods, based on the original point cloud and based on voxel division, the method possesses both efficient point cloud perception and lossless feature encoding.

Description

A point cloud target detection method fusing the original point cloud and voxel division

Technical Field

The invention belongs to the technical field of 3D point cloud target detection, and in particular relates to a point cloud target detection method that fuses the original point cloud with voxel division.

Background

With the continuous upgrading of vehicle-mounted lidar, the point cloud data of the current scene can be acquired quickly and conveniently, and the geometric structure of the scene point cloud can be used to extract the targets in the scene; this technology has spread to smart city construction, autonomous driving, unmanned delivery and many other industries. Because lidar point clouds are scattered and unordered and vary greatly in density and sparsity, uniform hand-crafted feature extraction over massive point cloud data by traditional target detection algorithms cannot adapt to the shape variation of targets in complex autonomous-driving road scenes. Point cloud target detection algorithms based on deep learning have therefore developed rapidly and been widely applied in autonomous driving scenarios.

At present, the more common deep-learning-based point cloud target detection methods fall into two categories: target detection based on the original point cloud and point cloud target detection based on voxel division.

3D target detection algorithms based on the original point cloud perform no preprocessing on the scene point cloud. The raw point coordinates and their reflectance values are fed directly into a neural network built from multilayer perceptrons (MLP); farthest point sampling (FPS) samples the point cloud scene layer by layer from shallow to deep, a local point-set feature extraction module (Set Abstraction) extracts local detail features and semantic features, and trilinear interpolation finally assigns the detail and semantic features to all points of the original scene through feature propagation layers. This approach loses no information, but the perception ability of multilayer perceptrons on unordered point clouds is lower than that of the convolutional structures used by voxel-based methods.

Point cloud target detection based on voxel division partitions the scene point cloud into a uniform voxel grid according to the point density produced by lidars with different numbers of beams, extracts a feature for each voxel with a voxel feature extractor adapted to the voxel size, then applies 3D convolution or 3D sparse convolution to extract features from the initialized voxel scene while gradually compressing the height dimension to one dimension, and finally uses a two-dimensional region proposal network (RPN) to classify and regress the anchor boxes pre-set at every grid cell of the bird's-eye view. This approach quickly and efficiently detects rigid, densely sampled objects in autonomous-driving point cloud scenes, but voxel division geometrically deforms the original point cloud structure; especially for small objects such as pedestrians and cyclists, this deformation discards local detail information, so the classification and regression results deviate from the real targets.

Summary of the Invention

To address the deficiencies of the prior art, the present invention provides a point cloud target detection method that fuses the original point cloud with voxel division. First, the lossless feature extraction network Pointnet++ extracts local detail features and semantic features of the point cloud, and a loss function is constructed to further improve the network's perception of local neighborhood information. Trilinear interpolation is then used, in the voxel feature initialization stage and in the sparse convolution perception stage respectively, to embed the local detail and semantic features obtained without information loss into a point cloud target detection network based on voxel division. Finally, each pre-set detection anchor box is classified and regressed by a two-dimensional RPN to obtain the final detection targets.

To achieve the above purpose, the technical solution provided by the present invention is a point cloud target detection method fusing the original point cloud and voxel division, comprising the following steps:

Step 1: use the lossless feature extraction network Pointnet++ to extract local detail features and semantic features of the point cloud;

Step 1.1: build a multi-layer encoder;

Step 1.2: extract the local detail features and semantic features of each layer's point cloud without information loss through the SA module;

Step 1.3: use trilinear interpolation to assign the detail and semantic features extracted in step 1.2 to all points of the original scene through feature propagation layers;

Step 2: construct a loss function to supervise the feature extraction of step 1 and strengthen the feature perception of the lossless feature extraction network Pointnet++;

Step 3: embed the local detail features and semantic features obtained without information loss into the point cloud target detection network based on voxel division;

Step 3.1: initialize the voxel features with the local detail features extracted in step 1;

Step 3.2: use 3D sparse convolution to extract features from the semantic information of the voxel scene initialized in step 3.1;

Step 3.3: use trilinear interpolation to convert the semantic features obtained in step 1 into voxel features;

Step 3.4: use an attention-style mechanism to fuse the semantic features perceived by the sparse convolution in step 3.2 with the voxel features converted in step 3.3, obtaining semantic information that fuses the two perception modes;

Step 4: project the semantic features fused in step 3 onto the two-dimensional bird's-eye view, build a region proposal network (RPN) with two-dimensional convolutions, and classify and regress the detection anchor boxes pre-set for every pixel of the bird's-eye view to obtain the final detection targets;

Step 4.1: set up the RPN structure and the pre-defined detection anchor boxes;

Step 4.2: design the loss function for point cloud target detection.

Furthermore, building the multi-layer encoder in step 1.1 first uses the farthest point sampling (FPS) strategy to collect N points from the original point cloud as the input point cloud, and then uses FPS to sample a fixed, progressively smaller number of points layer by layer from the input point cloud, forming a 4-layer encoder in which the point cloud input to each layer is the point set output by the previous layer.
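The hierarchical FPS sampling that drives this encoder can be sketched as follows; this is a minimal NumPy sketch, and the scene size and per-layer point counts are illustrative placeholders rather than values fixed by the invention.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Greedy FPS: iteratively pick the point farthest from the already-chosen set.

    points: (N, 3) array of xyz coordinates.
    Returns the indices of the n_samples selected points.
    """
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    min_dist = np.full(n, np.inf)
    selected[0] = np.random.randint(n)          # arbitrary seed point
    for i in range(1, n_samples):
        diff = points - points[selected[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        selected[i] = int(np.argmax(min_dist))  # farthest from the current set
    return selected

# Hierarchical sampling for a 4-layer encoder (layer sizes are illustrative only).
scene = np.random.rand(16384, 3).astype(np.float32)
layer_sizes = [4096, 1024, 256, 64]
current = scene
layers = []
for size in layer_sizes:
    idx = farthest_point_sampling(current, size)
    current = current[idx]                      # the next layer samples from this output
    layers.append(current)
```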

Furthermore, the input of each SA module in step 1.2 is the fixed-size point set obtained by FPS sampling in the previous layer. Let $p_i$ be the $i$-th point obtained by FPS sampling in the current layer and $\mathcal{K}(p_i)$ the set of points of the previous layer lying inside the spherical neighborhood of radius $r$ centered at $p_i$. The output feature of $p_i$ is computed in the following steps:

Step 1.2.1: randomly sample $k$ points from $\mathcal{K}(p_i)$ to form the subset $\tilde{\mathcal{K}}(p_i)$.

Step 1.2.2: fuse the features of the points sampled in step 1.2.1 with a multilayer perceptron:

$$f(p_i)=\max_{p_j\in\tilde{\mathcal{K}}(p_i)}\mathrm{MLP}\big(f(p_j)\big)$$

where MLP denotes the high-dimensional mapping applied to the point features by the multilayer perceptron, max() takes the maximum over the feature dimension of the point set, and $f(p_i)$ is the output feature of point $p_i$.

Step 1.2.3: for the point cloud input to each layer, repeat FPS sampling down to the corresponding number of points and aggregate neighborhood features for the sampled points via step 1.2.2, thereby completing feature extraction without information loss. The first layer extracts local detail features and the last three layers extract semantic features.

Furthermore, the feature propagation in step 1.3 is the inverse of feature extraction: starting from the deepest layer of the encoder, features are propagated to the next shallower layer in turn, from the fourth sampling layer to the third, from the third to the second, from the second to the first, and finally from the first layer to the N input points. Taking the propagation from a deeper layer to the next shallower layer as an example, let $p_i$ be a point of the shallower layer that needs to receive a feature and $\varphi(p_i)$ the set of the $k$ points of the deeper layer closest to $p_i$ in Euclidean space, with $p_j$ a point of $\varphi(p_i)$. The trilinear-interpolation feature propagation is computed as

$$f(p_i)=\frac{\sum_{p_j\in\varphi(p_i)} w_{ij}\,f(p_j)}{\sum_{p_j\in\varphi(p_i)} w_{ij}},\qquad w_{ij}=\frac{1}{\lVert p_i-p_j\rVert^{2}}$$

where $f(p_i)$ is the feature to be propagated, $f(p_j)$ is the feature of the $j$-th point $p_j$ in the neighborhood of $p_i$, and $w_{ij}$ is the feature weighting of $p_j$.

The feature of each receiving point is thus obtained as a Euclidean-distance-weighted sum of the features of the $k$ nearest points in the next deeper layer; propagating forward layer by layer assigns a lossless feature to every point in the scene.
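A minimal sketch of this propagation step is shown below; it assumes the normalized inverse-squared-distance weighting commonly used in PointNet++-style feature propagation, which is an interpretation of the formula above rather than the invention's exact implementation.

```python
import numpy as np

def propagate_features(dst_xyz, src_xyz, src_feat, k=3, eps=1e-8):
    """Transfer features from a sparse (deeper) layer to a denser (shallower) one.

    dst_xyz:  (Nd, 3) points that need features.
    src_xyz:  (Ns, 3) points that already carry features.
    src_feat: (Ns, C) features of the source points.
    Returns (Nd, C) interpolated features.
    """
    # Pairwise squared distances between destination and source points.
    d2 = ((dst_xyz[:, None, :] - src_xyz[None, :, :]) ** 2).sum(-1)   # (Nd, Ns)
    knn = np.argsort(d2, axis=1)[:, :k]                               # k nearest source points
    knn_d2 = np.take_along_axis(d2, knn, axis=1)                      # (Nd, k)
    w = 1.0 / (knn_d2 + eps)                                          # inverse-distance weights
    w = w / w.sum(axis=1, keepdims=True)                              # normalize per destination
    return (w[..., None] * src_feat[knn]).sum(axis=1)                 # weighted sum of features
```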

Furthermore, step 2 uses the point coordinates of the original scene as the supervision signal and the Smooth-L1 loss as the loss function:

$$L_{point}=\sum_{p\in\varphi(p)}\mathrm{SmoothL1}\big(r'_p-r_p\big),\qquad \mathrm{SmoothL1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\ |x|-0.5, & \text{otherwise}\end{cases}$$

where $r'$ and $r$ denote the point coordinates predicted by the lossless feature extraction network and the coordinates of the original points, respectively, and $\varphi(p)$ is the set of all points of the original scene. Under the supervision of this loss, the perception of local neighborhood information by the lossless feature extraction network Pointnet++ is further improved.
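A short sketch of this coordinate supervision, assuming the standard Smooth-L1 with threshold 1 and a mean reduction over all points of the scene, might look like this:

```python
import torch
import torch.nn.functional as F

def point_coordinate_loss(pred_xyz: torch.Tensor, gt_xyz: torch.Tensor) -> torch.Tensor:
    """Smooth-L1 supervision between coordinates reconstructed by the
    feature-extraction branch and the original scene coordinates.

    pred_xyz, gt_xyz: (N, 3) tensors over all points of the scene.
    """
    # beta=1.0 reproduces the 0.5*x^2 / |x|-0.5 switch point given above.
    return F.smooth_l1_loss(pred_xyz, gt_xyz, beta=1.0, reduction="mean")
```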

Furthermore, the initialization in step 3.1 first divides the point cloud space uniformly into a voxel grid: voxels that contain points are kept and voxels that contain no points are discarded. The retained voxels are then initialized with the local detail features obtained in step 1. Let the output of the first encoder layer of step 1 be the set $\{(P_i, F_i^P)\}$, where $P_i$ is a point of the original point cloud space whose feature is to be transferred and $F_i^P$ is the feature of $P_i$. Let $\{(V_j, F_j^V)\}_{j=1}^{M}$ denote the voxel centers, where $V_j$ is a voxel center, $F_j^V$ is the feature to be assigned to $V_j$, and $M$ is the number of voxel centers to be assigned. The voxel-center features are assigned by the trilinear interpolation function: let $\psi(V_j)$ be the set of the $k$ points closest to $V_j$ in Euclidean space and $P_t$ a point of $\psi(V_j)$; then $F_j^V$ is computed as

$$F_j^V=\frac{\sum_{P_t\in\psi(V_j)} w_{tj}\,F_t^P}{\sum_{P_t\in\psi(V_j)} w_{tj}},\qquad w_{tj}=\frac{1}{\lVert V_j-P_t\rVert^{2}}$$

where $F_t^P$ is the feature of the $t$-th point $P_t$ in the neighborhood of voxel center $V_j$ and $w_{tj}$ is its feature weighting.
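A compact sketch of the voxelization described at the start of this step (uniform grid, keep only non-empty voxels, compute their centers) is given below; the voxel size and the coordinate-range origin are illustrative placeholders, and the resulting centers would then be assigned features by the interpolation above.

```python
import numpy as np

def voxelize(points: np.ndarray, voxel_size=(0.05, 0.05, 0.1), origin=(0.0, -40.0, -3.0)):
    """Uniformly partition the point cloud into voxels and keep the non-empty ones.

    points: (N, 3) xyz coordinates.
    Returns the integer grid indices of the occupied voxels and their geometric centers.
    """
    vs = np.asarray(voxel_size, dtype=np.float32)
    org = np.asarray(origin, dtype=np.float32)
    grid = np.floor((points - org) / vs).astype(np.int64)    # voxel index of every point
    occupied = np.unique(grid, axis=0)                        # empty voxels are simply absent
    centers = org + (occupied.astype(np.float32) + 0.5) * vs  # center of each kept voxel
    return occupied, centers
```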

Furthermore, step 3.2 uses the Spconv library to stack four sparse convolution modules, each consisting of two submanifold convolution layers and one point cloud sparse convolution layer with downsampling stride 2. If the input voxel feature tensor is denoted $L\times W\times H\times C$, where $L$, $W$, $H$ and $C$ are the length, width and height of the voxel scene and the feature dimension of each voxel, then the output of the four sparse convolution stages is a sparse tensor whose spatial dimensions are reduced by the accumulated stride-2 downsampling and whose feature dimension is $C'$, the feature dimension after extraction.
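A skeletal version of such a backbone, written against the spconv 2.x PyTorch API, is shown below; the channel widths and normalization layers are assumptions for illustration, not the configuration of the invention.

```python
import torch.nn as nn
import spconv.pytorch as spconv

def make_stage(in_ch, out_ch, indice_key):
    """One stage: two submanifold convolutions + one stride-2 sparse convolution."""
    return spconv.SparseSequential(
        spconv.SubMConv3d(in_ch, in_ch, 3, padding=1, indice_key=indice_key),
        nn.BatchNorm1d(in_ch), nn.ReLU(),
        spconv.SubMConv3d(in_ch, in_ch, 3, padding=1, indice_key=indice_key),
        nn.BatchNorm1d(in_ch), nn.ReLU(),
        spconv.SparseConv3d(in_ch, out_ch, 3, stride=2, padding=1),  # downsample by 2
        nn.BatchNorm1d(out_ch), nn.ReLU(),
    )

class SparseBackbone(nn.Module):
    """Four stacked sparse-convolution stages over the initialized voxel features."""
    def __init__(self, in_channels=16):
        super().__init__()
        chans = [in_channels, 32, 64, 64, 128]  # illustrative channel widths
        self.stages = nn.ModuleList(
            make_stage(chans[i], chans[i + 1], f"subm{i}") for i in range(4)
        )

    def forward(self, x: spconv.SparseConvTensor) -> spconv.SparseConvTensor:
        for stage in self.stages:
            x = stage(x)
        return x
```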

Furthermore, in step 3.3 the three levels of semantic features extracted in step 1 are denoted $F^P_{4\times}$, $F^P_{8\times}$ and $F^P_{16\times}$, where $4\times$ indicates four-fold downsampling. The voxel centers after sparse convolution are denoted $\tilde{V}_j$, and $\tilde{F}_j^V$ is the feature to be assigned to $\tilde{V}_j$. Trilinear interpolation converts the point semantic features into the voxel-center representation: let $\psi(\tilde{V}_j)$ be the set of the $k$ points closest to $\tilde{V}_j$ in Euclidean space, with $P_{t,4\times}$, $P_{t,8\times}$ and $P_{t,16\times}$ points of $\psi(\tilde{V}_j)$ at the $4\times$, $8\times$ and $16\times$ scales; then, for each scale $s\in\{4\times,8\times,16\times\}$,

$$\tilde{F}_{j,s}^V=\frac{\sum_{t} w_{tj,s}\,F^P_{t,s}}{\sum_{t} w_{tj,s}},\qquad w_{tj,s}=\frac{1}{\lVert \tilde{V}_j-P_{t,s}\rVert^{2}}$$

and the features obtained at the three scales together form the voxel semantic feature $\tilde{F}_j^V$. Here $\tilde{V}_j$ is a voxel center after 3D sparse convolution, $P_{t,4\times}$, $P_{t,8\times}$ and $P_{t,16\times}$ are the spatial points used for feature weighting, $F^P_{t,4\times}$ is the feature, at the four-fold downsampled layer, of the $t$-th point in the neighborhood of $\tilde{V}_j$, and $w_{tj,4\times}$ is the corresponding feature weighting at that layer.

Furthermore, step 3.4 first concatenates the two kinds of semantic information along the feature dimension: if the voxel feature obtained by the conversion in step 3.3 has dimension M1 and the voxel feature obtained by sparse convolution perception has dimension M2, the concatenated feature has dimension M1+M2; a single multilayer perceptron layer then maps the combined (M1+M2)-dimensional feature back to M1 dimensions.
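A minimal sketch of this concatenate-and-project fusion, with a single linear layer standing in for the one-layer multilayer perceptron and arbitrary dimensions M1 and M2, is:

```python
import torch
import torch.nn as nn

class SemanticFusion(nn.Module):
    """Fuse interpolated point semantics (M1 dims) with sparse-conv semantics (M2 dims) back to M1 dims."""
    def __init__(self, m1: int, m2: int):
        super().__init__()
        self.project = nn.Sequential(nn.Linear(m1 + m2, m1), nn.ReLU())

    def forward(self, interp_feat: torch.Tensor, conv_feat: torch.Tensor) -> torch.Tensor:
        # interp_feat: (M, m1) features transferred from the point branch,
        # conv_feat:   (M, m2) features produced by the 3D sparse convolutions.
        return self.project(torch.cat([interp_feat, conv_feat], dim=-1))
```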

Furthermore, the RPN in step 4.1 is built from a four-layer two-dimensional convolutional neural network with a U-Net structure that produces an output at every layer; each layer uses 3×3 convolutions to reduce the number of learnable parameters. The encoding-decoding structure further abstracts the fused information, and a corresponding detection anchor box is pre-set for every pixel of the final feature map; the objects detected by the RPN are obtained by classifying and regressing these pre-set anchor boxes. A 3D detection anchor box is represented as {x, y, z, l, w, h, r}, where (x, y, z) is the center of the box, l, w and h are its length, width and height, and r is its rotation angle in the x-y plane. The voxel features obtained after 3D sparse convolution and semantic-information fusion form a three-dimensional tensor; compressing the height dimension into the feature dimension yields a two-dimensional bird's-eye-view representation, so one pre-defined detection anchor box is generated for every pixel of the resulting feature map.
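To make the anchor bookkeeping concrete, the sketch below generates anchors for a bird's-eye-view feature map; the anchor size, height and the two yaw angles per pixel are common defaults assumed for illustration, not values taken from the invention.

```python
import numpy as np

def make_bev_anchors(feat_h, feat_w, stride, size=(3.9, 1.6, 1.56), z_center=-1.0,
                     yaws=(0.0, np.pi / 2)):
    """One anchor {x, y, z, l, w, h, r} per feature-map pixel and per yaw angle.

    feat_h, feat_w: spatial size of the BEV feature map.
    stride:         metres covered by one feature-map pixel.
    """
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w), indexing="ij")
    centers = np.stack([(xs + 0.5) * stride, (ys + 0.5) * stride], axis=-1)  # (H, W, 2)
    anchors = []
    for r in yaws:
        a = np.zeros((feat_h, feat_w, 7), dtype=np.float32)
        a[..., 0:2] = centers            # x, y
        a[..., 2] = z_center             # z
        a[..., 3:6] = size               # l, w, h
        a[..., 6] = r                    # yaw
        anchors.append(a)
    return np.stack(anchors, axis=2).reshape(-1, 7)   # (H * W * len(yaws), 7)
```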

Furthermore, the classification loss function $L_{cls}$ in step 4.2 uses the cross-entropy loss:

$$L_{cls}=-\frac{1}{n}\sum_{i=1}^{n} Q(a_i)\log P(a_i)$$

where $n$ is the number of pre-set detection anchor boxes, $P(a_i)$ is the predicted score of the $i$-th detection anchor box, and $Q(a_i)$ is the true label of that anchor box.

The regression loss function $L_{reg}$ uses the Smooth-L1 loss:

$$L_{reg}=\frac{1}{n}\sum_{i=1}^{n}\mathrm{SmoothL1}\big(v'_i-v_i\big),\qquad \mathrm{SmoothL1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\ |x|-0.5, & \text{otherwise}\end{cases}$$

where $n$ is the number of pre-set detection anchor boxes, $v$ is the ground-truth value of the detection anchor box, and $v'$ is the anchor-box value predicted by the RPN.

Under the joint supervision of the classification loss and the regression loss, the network ultimately learns to detect targets in point clouds.
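The joint objective can be sketched as follows; the equal weighting of the two terms is an assumption.

```python
import torch
import torch.nn.functional as F

def rpn_loss(cls_logits, cls_targets, box_preds, box_targets, reg_weight=1.0):
    """Joint anchor classification + box regression loss.

    cls_logits:  (n, num_classes) raw scores for the n pre-set anchors.
    cls_targets: (n,) integer class labels Q(a_i).
    box_preds:   (n, 7) predicted anchor residuals v'.
    box_targets: (n, 7) ground-truth anchor residuals v.
    """
    l_cls = F.cross_entropy(cls_logits, cls_targets)            # classification term
    l_reg = F.smooth_l1_loss(box_preds, box_targets, beta=1.0)  # Smooth-L1 regression term
    return l_cls + reg_weight * l_reg
```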

Compared with the prior art, the present invention has the following advantages: (1) it combines the advantages of point cloud target detection based on voxel division and based on the original point cloud, possessing both efficient point cloud perception and lossless feature encoding; (2) by embedding the lossless multi-scale encoding of the point cloud into the voxel method, the detection network gains multi-scale, multi-level information-fusion perception.

Brief Description of the Drawings

Fig. 1 is a flowchart of an embodiment of the present invention.

Fig. 2 shows a detection example of an embodiment of the present invention, where Fig. 2(a) is the input point cloud and Fig. 2(b) shows the detected anchor boxes in the point cloud.

Detailed Description of the Embodiments

The present invention provides a point cloud target detection method that fuses the original point cloud with voxel division. First, the lossless feature extraction network Pointnet++ extracts local detail features and semantic features of the point cloud, and a loss function is constructed to further improve the network's perception of local neighborhood information. Trilinear interpolation is then used, in the voxel feature initialization stage and in the sparse convolution perception stage respectively, to embed the local detail and semantic features obtained without information loss into a point cloud target detection network based on voxel division. Finally, each pre-set detection anchor box is classified and regressed by a two-dimensional RPN to obtain the final detection targets.

The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.

Step 1: use the lossless feature extraction network Pointnet++ to extract local detail features and semantic features of the point cloud.

A fixed number N of points is first collected from the input point cloud; the scene is then sampled layer by layer and a local point-set feature extractor (Set Abstraction, SA) is built to extract features from the local scene; finally, trilinear interpolation assigns the local detail features and semantic features to all points of the original scene through feature propagation layers. Step 1 comprises the following sub-steps:

Step 1.1: build a multi-layer encoder.

The farthest point sampling (FPS) strategy is first used to collect N points from the original point cloud as the input point cloud, and FPS is then used to sample a fixed, progressively smaller number of points layer by layer from the input point cloud, forming a 4-layer encoder in which the point cloud input to each layer is the point set output by the previous layer.

Step 1.2: extract the local detail features and semantic features of each layer's point cloud without information loss through the SA module.

The input of each SA module is the fixed-size point set obtained by FPS sampling in the previous layer. Let $p_i$ be the $i$-th point obtained by FPS sampling in the current layer and $\mathcal{K}(p_i)$ the set of points of the previous layer lying inside the spherical neighborhood of radius $r$ centered at $p_i$. The output feature of $p_i$ is computed in the following steps:

Step 1.2.1: randomly sample $k$ points from $\mathcal{K}(p_i)$ to form the subset $\tilde{\mathcal{K}}(p_i)$.

Step 1.2.2: fuse the features of the points sampled in step 1.2.1 with a multilayer perceptron and compute the output feature of point $p_i$.

A multilayer perceptron first extracts local detail features from the randomly sampled point set $\tilde{\mathcal{K}}(p_i)$ of step 1.2.1, producing a high-dimensional mapping for each of its points; max pooling over the feature dimension then yields the maximal information representation, which is the output feature of point $p_i$:

$$f(p_i)=\max_{p_j\in\tilde{\mathcal{K}}(p_i)}\mathrm{MLP}\big(f(p_j)\big)$$

where MLP denotes the high-dimensional mapping applied to the point features by the multilayer perceptron, max() takes the maximum over the feature dimension of the point set, and $f(p_i)$ is the output feature of point $p_i$.

Step 1.2.3: for the point cloud input to each layer, repeat FPS sampling down to the corresponding number of points and aggregate neighborhood features for the sampled points via step 1.2.2, thereby completing feature extraction without information loss. The first layer extracts local detail features and the last three layers extract semantic features.
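The ball query, random subsampling and max-pooled MLP aggregation of steps 1.2.1 to 1.2.3 can be sketched as follows; the radius, the value of k and the MLP widths are illustrative assumptions rather than the values used by the invention.

```python
import torch
import torch.nn as nn

class SetAbstraction(nn.Module):
    """Aggregate a ball neighborhood around each sampled center with an MLP + max pool."""
    def __init__(self, in_dim=3, hidden=64, out_dim=128, radius=0.8, k=16):
        super().__init__()
        self.radius, self.k = radius, k
        self.mlp = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim), nn.ReLU())

    def forward(self, centers, points, feats):
        # centers: (M, 3) FPS-sampled points of the current layer,
        # points:  (N, 3) points of the previous layer, feats: (N, in_dim) their features.
        out = []
        for c in centers:
            d = torch.linalg.norm(points - c, dim=1)
            idx = torch.nonzero(d < self.radius, as_tuple=False).squeeze(1)
            if idx.numel() == 0:                      # empty ball: fall back to the nearest point
                idx = d.argmin().unsqueeze(0)
            if idx.numel() > self.k:                  # random subset of k neighbors
                idx = idx[torch.randperm(idx.numel())[: self.k]]
            out.append(self.mlp(feats[idx]).max(dim=0).values)  # per-channel max over the ball
        return torch.stack(out)                       # (M, out_dim)
```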

Step 1.3: use trilinear interpolation to assign the detail and semantic features extracted in step 1.2 to all points of the original scene through the feature propagation layers.

Feature propagation is the inverse of feature extraction: starting from the deepest layer of the encoder, features are propagated to the next shallower layer in turn, from the fourth sampling layer to the third, from the third to the second, from the second to the first, and finally from the first layer to the N input points. Taking the propagation from a deeper layer to the next shallower layer as an example, let $p_i$ be a point of the shallower layer that needs to receive a feature and $\varphi(p_i)$ the set of the $k$ points of the deeper layer closest to $p_i$ in Euclidean space, with $p_j$ a point of $\varphi(p_i)$. The trilinear-interpolation feature propagation is computed as

$$f(p_i)=\frac{\sum_{p_j\in\varphi(p_i)} w_{ij}\,f(p_j)}{\sum_{p_j\in\varphi(p_i)} w_{ij}},\qquad w_{ij}=\frac{1}{\lVert p_i-p_j\rVert^{2}}$$

where $f(p_i)$ is the feature to be propagated, $f(p_j)$ is the feature of the $j$-th point $p_j$ in the neighborhood of $p_i$, and $w_{ij}$ is the feature weighting of $p_j$.

The feature of each receiving point is thus obtained as a Euclidean-distance-weighted sum of the features of the $k$ nearest points in the next deeper layer; propagating forward layer by layer assigns a lossless feature to every point in the scene.

Step 2: construct a loss function to supervise the feature extraction of step 1 and strengthen the feature perception of the lossless feature extraction network Pointnet++.

The point coordinates of the original scene are used as the supervision signal and the Smooth-L1 loss as the loss function:

$$L_{point}=\sum_{p\in\varphi(p)}\mathrm{SmoothL1}\big(r'_p-r_p\big),\qquad \mathrm{SmoothL1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\ |x|-0.5, & \text{otherwise}\end{cases}$$

where $r'$ and $r$ denote the point coordinates predicted by the lossless feature extraction network and the coordinates of the original points, respectively, and $\varphi(p)$ is the set of all points of the original scene. Under the supervision of this loss, the perception of local neighborhood information by the lossless feature extraction network Pointnet++ is further improved.

Step 3: embed the local detail features and semantic features obtained without information loss into the point cloud target detection network based on voxel division.

The original point cloud is first divided into voxels and the voxel features are initialized with the local detail features extracted in step 1; sparse 3D convolution then perceives the spatial structure of the point cloud; finally the semantic features extracted in step 1 are fused at the semantic level. Step 3 comprises the following sub-steps:

Step 3.1: initialize the voxel features with the local detail features extracted in step 1.

The point cloud space is first divided uniformly into a voxel grid: voxels that contain points are kept and voxels that contain no points are discarded. The retained voxels are then initialized with the local detail features obtained in step 1. Let the output of the first encoder layer of step 1 be the set $\{(P_i, F_i^P)\}$, where $P_i$ is a point of the original point cloud space whose feature is to be transferred and $F_i^P$ is the feature of $P_i$. Let $\{(V_j, F_j^V)\}_{j=1}^{M}$ denote the voxel centers, where $V_j$ is a voxel center, $F_j^V$ is the feature to be assigned to $V_j$, and $M$ is the number of voxel centers to be assigned. The voxel-center features are assigned by the trilinear interpolation function: let $\psi(V_j)$ be the set of the $k$ points closest to $V_j$ in Euclidean space and $P_t$ a point of $\psi(V_j)$; then $F_j^V$ is computed as

$$F_j^V=\frac{\sum_{P_t\in\psi(V_j)} w_{tj}\,F_t^P}{\sum_{P_t\in\psi(V_j)} w_{tj}},\qquad w_{tj}=\frac{1}{\lVert V_j-P_t\rVert^{2}}$$

where $F_t^P$ is the feature of the $t$-th point $P_t$ in the neighborhood of voxel center $V_j$ and $w_{tj}$ is its feature weighting.

Step 3.2: use 3D sparse convolution to extract features from the semantic information of the voxel scene initialized in step 3.1.

The Spconv library is used to stack four sparse convolution modules, each consisting of two submanifold convolution layers and one point cloud sparse convolution layer with downsampling stride 2. If the input voxel feature tensor is denoted $L\times W\times H\times C$, where $L$, $W$, $H$ and $C$ are the length, width and height of the voxel scene and the feature dimension of each voxel, then the output of the four sparse convolution stages is a sparse tensor whose spatial dimensions are reduced by the accumulated stride-2 downsampling and whose feature dimension is $C'$, the feature dimension after extraction.

Step 3.3: use trilinear interpolation to convert the semantic features obtained in step 1 into voxel features.

The three levels of semantic features obtained in step 1 are denoted $F^P_{4\times}$, $F^P_{8\times}$ and $F^P_{16\times}$, where $4\times$ indicates four-fold downsampling. The voxel centers after sparse convolution are denoted $\tilde{V}_j$, and $\tilde{F}_j^V$ is the feature to be assigned to $\tilde{V}_j$. Trilinear interpolation converts the point semantic features into the voxel-center representation: let $\psi(\tilde{V}_j)$ be the set of the $k$ points closest to $\tilde{V}_j$ in Euclidean space, with $P_{t,4\times}$, $P_{t,8\times}$ and $P_{t,16\times}$ points of $\psi(\tilde{V}_j)$ at the $4\times$, $8\times$ and $16\times$ scales; then, for each scale $s\in\{4\times,8\times,16\times\}$,

$$\tilde{F}_{j,s}^V=\frac{\sum_{t} w_{tj,s}\,F^P_{t,s}}{\sum_{t} w_{tj,s}},\qquad w_{tj,s}=\frac{1}{\lVert \tilde{V}_j-P_{t,s}\rVert^{2}}$$

and the features obtained at the three scales together form the voxel semantic feature $\tilde{F}_j^V$. Here $\tilde{V}_j$ is a voxel center after 3D sparse convolution, $P_{t,4\times}$, $P_{t,8\times}$ and $P_{t,16\times}$ are the spatial points used for feature weighting, $F^P_{t,4\times}$ is the feature, at the four-fold downsampled layer, of the $t$-th point in the neighborhood of $\tilde{V}_j$, and $w_{tj,4\times}$ is the corresponding feature weighting at that layer.

Step 3.4: use an attention-style mechanism to fuse the semantic features perceived by the sparse convolution in step 3.2 with the voxel features converted in step 3.3, obtaining semantic information that fuses the two perception modes.

The two kinds of semantic information are first concatenated along the feature dimension: if the voxel feature obtained by the conversion in step 3.3 has dimension M1 and the voxel feature obtained by sparse convolution perception has dimension M2, the concatenated feature has dimension M1+M2; a single multilayer perceptron layer then maps the combined (M1+M2)-dimensional feature back to M1 dimensions.

Step 4: project the semantic features fused in step 3 onto the two-dimensional bird's-eye view, build a region proposal network (RPN) with two-dimensional convolutions, and classify and regress the detection anchor boxes pre-set for every pixel of the bird's-eye view to obtain the final detection targets. Step 4 comprises the following sub-steps:

Step 4.1: set up the RPN structure and the pre-defined detection anchor boxes.

The RPN is built from a four-layer two-dimensional convolutional neural network with a U-Net structure that produces an output at every layer; each layer uses 3×3 convolutions to reduce the number of learnable parameters. The encoding-decoding structure further abstracts the fused information, and a corresponding detection anchor box is pre-set for every pixel of the final feature map; the objects detected by the RPN are obtained by classifying and regressing these pre-set anchor boxes. A 3D detection anchor box is represented as {x, y, z, l, w, h, r}, where (x, y, z) is the center of the box, l, w and h are its length, width and height, and r is its rotation angle in the x-y plane. The voxel features obtained after 3D sparse convolution and semantic-information fusion form a three-dimensional tensor; compressing the height dimension into the feature dimension yields a two-dimensional bird's-eye-view representation, so one pre-defined detection anchor box is generated for every pixel of the resulting feature map.

Step 4.2: design the loss function for point cloud target detection.

The pre-set detection anchor boxes are classified and regressed with a classification loss function and a regression loss function, from which the objects detected by the RPN are obtained.

The classification loss function $L_{cls}$ uses the cross-entropy loss:

$$L_{cls}=-\frac{1}{n}\sum_{i=1}^{n} Q(a_i)\log P(a_i)$$

where $n$ is the number of pre-set detection anchor boxes, $P(a_i)$ is the predicted score of the $i$-th detection anchor box, and $Q(a_i)$ is the true label of that anchor box.

The regression loss function $L_{reg}$ uses the Smooth-L1 loss:

$$L_{reg}=\frac{1}{n}\sum_{i=1}^{n}\mathrm{SmoothL1}\big(v'_i-v_i\big),\qquad \mathrm{SmoothL1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\ |x|-0.5, & \text{otherwise}\end{cases}$$

where $n$ is the number of pre-set detection anchor boxes, $v$ is the ground-truth value of the detection anchor box, and $v'$ is the anchor-box value predicted by the RPN.

Under the joint supervision of the classification loss and the regression loss, the network ultimately learns to detect targets in point clouds.

The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitute them in similar ways without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.

Claims (10)

1. A point cloud target detection method fusing the original point cloud and voxel division, characterized by comprising the following steps:
Step 1: using the lossless feature extraction network Pointnet++ to extract local detail features and semantic features of the point cloud;
Step 2: constructing a loss function to supervise the feature extraction of step 1 and strengthen the feature perception of the lossless feature extraction network Pointnet++;
Step 3: embedding the local detail features and semantic features obtained without information loss into the point cloud target detection network based on voxel division;
Step 3.1: initializing the voxel features with the local detail features extracted in step 1;
Step 3.2: using 3D sparse convolution to extract features from the semantic information of the voxel scene initialized in step 3.1;
Step 3.3: using trilinear interpolation to convert the semantic features obtained in step 1 into voxel features;
Step 3.4: using an attention-style mechanism to fuse the semantic features perceived by the sparse convolution in step 3.2 with the voxel features converted in step 3.3, obtaining semantic information that fuses the two perception modes;
Step 4: projecting the semantic features fused in step 3 onto the two-dimensional bird's-eye view, building a region proposal network RPN with two-dimensional convolutions, and classifying and regressing the detection anchor boxes pre-set for every pixel of the bird's-eye view to obtain the final detection targets.

2. The point cloud target detection method fusing the original point cloud and voxel division according to claim 1, wherein step 1 comprises the following sub-steps:
Step 1.1: building a multi-layer encoder; the farthest point sampling strategy is used to collect N points from the original point cloud as the input point cloud, and the farthest point sampling strategy is then used to sample a fixed, progressively smaller number of points layer by layer from the input point cloud, forming a 4-layer encoder in which the point cloud input to each layer is the point set output by the previous layer;
Step 1.2: extracting the local detail features and semantic features of each layer's point cloud without information loss through the local point-set feature extraction module;
Step 1.3: using trilinear interpolation to assign the detail and semantic features extracted in step 1.2 to all points of the original scene through the feature propagation layers.
3. The point cloud target detection method fusing the original point cloud and voxel division according to claim 2, wherein the input of each local point-set feature extraction module in step 1.2 is the fixed-size point set sampled by the farthest point sampling strategy in the previous layer; letting $p_i$ be the $i$-th point sampled by the farthest point sampling strategy in the current layer and $\mathcal{K}(p_i)$ the set of points of the previous layer lying inside the spherical neighborhood of radius $r$ centered at $p_i$, the output feature of $p_i$ is computed in the following steps:
Step 1.2.1: randomly sampling $k$ points from $\mathcal{K}(p_i)$ to form the subset $\tilde{\mathcal{K}}(p_i)$;
Step 1.2.2: fusing the features of the points sampled in step 1.2.1 with a multilayer perceptron:
$$f(p_i)=\max_{p_j\in\tilde{\mathcal{K}}(p_i)}\mathrm{MLP}\big(f(p_j)\big)$$
where MLP denotes the high-dimensional mapping applied to the point features by the multilayer perceptron, max() takes the maximum over the feature dimension of the point set, and $f(p_i)$ is the output feature of point $p_i$;
Step 1.2.3: for the point cloud input to each layer, repeating the farthest point sampling strategy down to the corresponding number of points and aggregating neighborhood features for the sampled points via step 1.2.2, thereby completing feature extraction without information loss, wherein the first layer extracts local detail features and the last three layers extract semantic features.
4. The point cloud target detection method fusing the original point cloud and voxel division according to claim 2, wherein the feature propagation in step 1.3 is the inverse of feature extraction: starting from the deepest layer of the encoder, features are propagated to the next shallower layer in turn, from the fourth sampling layer to the third, from the third to the second, from the second to the first, and finally from the first layer to the N input points; taking the propagation from a deeper layer to the next shallower layer as an example, letting $p_i$ be a point of the shallower layer that needs to receive a feature, $\varphi(p_i)$ the set of the $k$ points of the deeper layer closest to $p_i$ in Euclidean space, and $p_j$ a point of $\varphi(p_i)$, the trilinear-interpolation feature propagation is computed as
$$f(p_i)=\frac{\sum_{p_j\in\varphi(p_i)} w_{ij}\,f(p_j)}{\sum_{p_j\in\varphi(p_i)} w_{ij}},\qquad w_{ij}=\frac{1}{\lVert p_i-p_j\rVert^{2}}$$
where $f(p_i)$ is the feature to be propagated, $f(p_j)$ is the feature of the $j$-th point $p_j$ in the neighborhood of $p_i$, and $w_{ij}$ is the feature weighting of $p_j$.
5. The point cloud target detection method fusing the original point cloud and voxel division according to claim 1, characterized in that: in step 2, the point cloud coordinates of the original scene are used as point cloud supervision information and the Smooth-L1 loss is used as the loss function, calculated as [Figure FDA0003111771470000031] and [Figure FDA0003111771470000032], where r′ and r denote the point cloud spatial coordinates predicted by the lossless feature extraction network and the spatial coordinates of the original point cloud, respectively, and φ(p) denotes the set of points in the entire original scene; under the supervision of this loss function, the perception of local neighborhood information by the lossless feature extraction network PointNet++ is further improved.
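As a worked illustration of the supervision in claim 5, the sketch below computes a Smooth-L1 loss between predicted and original point coordinates. The threshold beta = 1.0 and the mean reduction over the scene are assumptions; the claim only names the Smooth-L1 form.

import numpy as np

def coordinate_smooth_l1(pred_xyz, gt_xyz, beta=1.0):
    # pred_xyz: (n, 3) coordinates r' predicted by the lossless feature extraction network
    # gt_xyz:   (n, 3) original point cloud coordinates r
    diff = np.abs(pred_xyz - gt_xyz)
    per_elem = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return per_elem.sum(axis=1).mean()   # averaged over the scene point set phi(p)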
6. The point cloud target detection method fusing the original point cloud and voxel division according to claim 1, characterized in that: the initialization in step 3.1 first divides the point cloud space uniformly into a voxel grid, retaining voxels that contain points and discarding voxels that contain no points; the retained voxels are then initialized with the local detail features obtained in step 1. Let [Figure FDA0003111771470000033] denote the set formed by the k points closest to the voxel center V_j in Euclidean space, and let P_t denote a point in [Figure FDA0003111771470000034]; the voxel center feature [Figure FDA0003111771470000035] is calculated as [Figure FDA0003111771470000036] and [Figure FDA0003111771470000037], where F_t^P denotes the feature of the t-th feature point P_t in the neighborhood of the voxel center V_j, and w_tj denotes the feature weighting coefficient of the t-th point P_t in the neighborhood of the voxel center V_j.
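The voxel initialization of claim 6 can be sketched as follows: voxelize uniformly, keep only non-empty voxels, and form each retained voxel center feature as a weighted sum over the k nearest raw points. Inverse-distance weights and the default k = 3 are assumptions; the claim leaves the exact weighting to the referenced figures.

import numpy as np

def init_voxel_features(points_xyz, point_feats, voxel_size, k=3, eps=1e-8):
    # Step 3.1 sketch: uniform voxel grid, empty voxels discarded,
    # retained voxel centers V_j initialized from the local detail features of step 1.
    vox_idx = np.floor(points_xyz / voxel_size).astype(np.int64)
    occupied = np.unique(vox_idx, axis=0)              # voxels that contain at least one point
    centers = (occupied + 0.5) * voxel_size            # voxel centers V_j
    feats = np.zeros((len(centers), point_feats.shape[1]))
    for j, c in enumerate(centers):
        d = np.linalg.norm(points_xyz - c, axis=1)
        nn = np.argsort(d)[:k]                         # k points nearest to V_j
        w = 1.0 / (d[nn] + eps)
        w = w / w.sum()                                # weights w_tj (assumed inverse-distance)
        feats[j] = (w[:, None] * point_feats[nn]).sum(axis=0)
    return occupied, centers, feats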
7. The point cloud target detection method fusing the original point cloud and voxel division according to claim 1, characterized in that: step 3.2 uses the Spconv library to stack 4 sparse convolution blocks, each of which contains two submanifold convolution layers and one point cloud sparse convolution layer with a downsampling factor of 2. Assuming the input voxel feature tensor is denoted L×W×H×C, where L, W, H and C denote the length, width and height of the voxel scene and the feature dimension of each voxel, respectively, the output after the 4 sparse convolution blocks can be denoted [Figure FDA0003111771470000041], where C′ denotes the feature dimension after feature extraction.
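A block of the kind described in step 3.2 can be written with the Spconv library roughly as below. This is a sketch against the spconv 1.x-style API (SparseSequential, SubMConv3d, SparseConv3d); the channel widths, kernel sizes, batch-norm/ReLU placement and indice_key naming are assumptions, not parameters stated in the claim.

import spconv                      # spconv 1.x; spconv 2.x exposes the same classes under spconv.pytorch
from torch import nn

def sparse_block(c_in, c_out, key):
    # One of the 4 stacked blocks: two submanifold convolutions followed by
    # one sparse convolution with stride 2 (the downsampling-by-2 layer).
    return spconv.SparseSequential(
        spconv.SubMConv3d(c_in, c_out, 3, padding=1, indice_key=key),
        nn.BatchNorm1d(c_out), nn.ReLU(),
        spconv.SubMConv3d(c_out, c_out, 3, padding=1, indice_key=key),
        nn.BatchNorm1d(c_out), nn.ReLU(),
        spconv.SparseConv3d(c_out, c_out, 3, stride=2, padding=1),
        nn.BatchNorm1d(c_out), nn.ReLU(),
    )

Since each of the 4 blocks contains one stride-2 sparse convolution, stacking them as described reduces every spatial dimension of the L×W×H grid by 2^4 = 16, while the feature dimension grows from C to C′.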
8. The point cloud target detection method fusing the original point cloud and voxel division according to claim 1, characterized in that: in step 3.3, suppose the three levels of semantic features extracted in step 1 are denoted [Figure FDA0003111771470000042], where 4× denotes downsampling by a factor of four. Let [Figure FDA0003111771470000043] denote the set formed by the k points closest, in Euclidean space, to the sparse-convolved voxel center [Figure FDA0003111771470000044], where P_t,4×, P_t,8× and P_t,16× are all points in [Figure FDA0003111771470000045]; the voxel center feature after sparse convolution [Figure FDA0003111771470000046] is calculated as [Figure FDA0003111771470000047], [Figure FDA0003111771470000048], [Figure FDA0003111771470000049] and [Figure FDA00031117714700000410], where [Figure FDA00031117714700000411] denotes the voxel center after 3D sparse convolution, P_t,4×, P_t,8× and P_t,16× denote the spatial points used for feature weighting, [Figure FDA00031117714700000412] denotes the feature of the t-th point in the neighborhood of the voxel center [Figure FDA00031117714700000413] at the 4× downsampling level, and w_tj,4× denotes the feature weighting coefficient of the t-th point in the neighborhood of the voxel center [Figure FDA00031117714700000414] at the 4× downsampling level.
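Claim 8 interpolates each of the 4×, 8× and 16× semantic levels onto the sparse-convolved voxel centers with its own neighborhood weights and then combines the three results. The sketch below assumes inverse-distance weights and a simple concatenation of the per-level results; the claim's actual weighting and combination are given only in the referenced figures.

import numpy as np

def fuse_semantic_levels(voxel_centers, levels, k=3, eps=1e-8):
    # voxel_centers: (m, 3) centers after 3D sparse convolution
    # levels: list of (xyz, feats) pairs for the 4x, 8x and 16x semantic layers from step 1
    fused = []
    for xyz, feats in levels:
        out = np.zeros((len(voxel_centers), feats.shape[1]))
        for j, c in enumerate(voxel_centers):
            d = np.linalg.norm(xyz - c, axis=1)
            nn = np.argsort(d)[:k]                    # the k points P_t closest to the center
            w = 1.0 / (d[nn] + eps)
            w = w / w.sum()                           # weights w_tj,4x / w_tj,8x / w_tj,16x (assumed form)
            out[j] = (w[:, None] * feats[nn]).sum(axis=0)
        fused.append(out)
    return np.concatenate(fused, axis=1)              # combination by concatenation is an assumption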
9. The point cloud target detection method fusing the original point cloud and voxel division according to claim 1, characterized in that: step 4 comprises the following two sub-steps:
Step 4.1, RPN network structure and predefined detection anchor box settings.
The RPN is built from four layers of 2D convolutional neural networks and outputs layer by layer in a U-Net structure, denoted [Figure FDA00031117714700000415]; each layer uses 3×3 convolutions to reduce the number of learned parameters, and an encoding-decoding network structure further abstracts the fused features. On the final feature map, a corresponding detection anchor box is pre-set for every pixel, and the objects detected by the RPN are obtained by classifying and regressing these pre-set anchor boxes. A 3D detection anchor box can be expressed as {x, y, z, l, w, h, r}, where (x, y, z) is the center of the anchor box, l, w and h are its length, width and height, and r is its rotation angle in the x-y plane. The voxel features after 3D sparse convolution and semantic information fusion can be represented as [Figure FDA0003111771470000051]; compressing the height dimension into the feature dimension yields a 2D image representation [Figure FDA0003111771470000052]; therefore, for a feature map of size [Figure FDA0003111771470000053], there are [Figure FDA0003111771470000054] predefined detection anchor boxes in total.
Step 4.2, design of the point cloud target detection loss function.
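Before the loss design of step 4.2, the BEV compression and anchor layout of step 4.1 can be illustrated as follows. The reshape folds the height dimension into the channel dimension, and the anchor generator places one {x, y, z, l, w, h, r} box per feature-map pixel. The feature-map stride, anchor dimensions, fixed z and single orientation used here are illustrative assumptions only.

import numpy as np

def to_bev(voxel_feats):
    # Fold height into channels: (L, W, H, C) -> (L, W, H * C)
    L, W, H, C = voxel_feats.shape
    return voxel_feats.reshape(L, W, H * C)

def make_anchors(feat_l, feat_w, stride=0.4, lwh=(3.9, 1.6, 1.56), z=-1.0, rot=0.0):
    # One predefined anchor per pixel of the final feature map,
    # giving feat_l * feat_w anchors in total.
    l, w, h = lwh
    anchors = []
    for i in range(feat_l):
        for j in range(feat_w):
            x = (i + 0.5) * stride      # pixel center mapped back to metric coordinates
            y = (j + 0.5) * stride
            anchors.append((x, y, z, l, w, h, rot))
    return np.asarray(anchors)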
10. The point cloud target detection method fusing the original point cloud and voxel division according to claim 9, characterized in that: in step 4.2, the point cloud target detection loss function comprises a classification loss function and a regression loss function. The classification loss function L_cls uses the cross-entropy loss, namely [Figure FDA0003111771470000055], where n denotes the number of pre-set detection anchor boxes, P(a_i) denotes the predicted score of the i-th detection anchor box, and Q(a_i) denotes the true label of that anchor box. The regression loss function L_reg uses the Smooth-L1 loss, namely [Figure FDA0003111771470000056] and [Figure FDA0003111771470000057], where n denotes the number of pre-set detection anchor boxes, v denotes the ground-truth value of the detection anchor box, and v′ denotes the anchor box value predicted by the RPN.
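The two loss terms of claim 10 can be sketched as below. Binary anchor labels, scores already in (0, 1), an unweighted sum of the two terms and a Smooth-L1 threshold of 1 are all assumptions; the claim fixes only the cross-entropy and Smooth-L1 forms.

import numpy as np

def detection_loss(scores, labels, box_pred, box_gt, beta=1.0, eps=1e-7):
    # scores:   (n,) predicted scores P(a_i) for the n pre-set anchors
    # labels:   (n,) true labels Q(a_i), assumed to be 0 or 1
    # box_pred: (n, 7) anchor values v' predicted by the RPN
    # box_gt:   (n, 7) ground-truth anchor values v
    p = np.clip(scores, eps, 1.0 - eps)
    l_cls = -(labels * np.log(p) + (1 - labels) * np.log(1 - p)).mean()   # cross-entropy L_cls
    diff = np.abs(box_pred - box_gt)
    smooth = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    l_reg = smooth.sum(axis=1).mean()                                     # Smooth-L1 L_reg
    return l_cls + l_reg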
CN202110651776.XA 2021-06-11 2021-06-11 Point cloud target detection method integrating original point cloud and voxel division Pending CN113378854A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110651776.XA CN113378854A (en) 2021-06-11 2021-06-11 Point cloud target detection method integrating original point cloud and voxel division

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110651776.XA CN113378854A (en) 2021-06-11 2021-06-11 Point cloud target detection method integrating original point cloud and voxel division

Publications (1)

Publication Number Publication Date
CN113378854A true CN113378854A (en) 2021-09-10

Family

ID=77573977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110651776.XA Pending CN113378854A (en) 2021-06-11 2021-06-11 Point cloud target detection method integrating original point cloud and voxel division

Country Status (1)

Country Link
CN (1) CN113378854A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109118564A (en) * 2018-08-01 2019-01-01 湖南拓视觉信息技术有限公司 A kind of three-dimensional point cloud labeling method and device based on fusion voxel
CN111160214A (en) * 2019-12-25 2020-05-15 电子科技大学 3D target detection method based on data fusion
CN112052860A (en) * 2020-09-11 2020-12-08 中国人民解放军国防科技大学 Three-dimensional target detection method and system
CN112418084A (en) * 2020-11-23 2021-02-26 同济大学 Three-dimensional target detection method based on point cloud time sequence information fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHARLES R. QI ET AL.: "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space", 《ADVANCES 31ST CONFERENCE ON NEURAL INFORMATION PROCESSING SYSTEMS (NIPS 2017)》 *
TIANYUAN JIANG,ET AL.: "VIC-Net: Voxelization Information Compensation Network for Point Cloud 3D Object Detection", 《2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021)》 *

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113900119A (en) * 2021-09-29 2022-01-07 苏州浪潮智能科技有限公司 A method, system, storage medium and device for lidar vehicle detection
CN113900119B (en) * 2021-09-29 2024-01-30 苏州浪潮智能科技有限公司 Method, system, storage medium and equipment for laser radar vehicle detection
CN113989797A (en) * 2021-10-26 2022-01-28 清华大学苏州汽车研究院(相城) Three-dimensional dynamic target detection method and device based on voxel point cloud fusion
CN114155524A (en) * 2021-10-29 2022-03-08 中国科学院信息工程研究所 Single-stage 3D point cloud target detection method and device, computer equipment, and medium
CN114120270A (en) * 2021-11-08 2022-03-01 同济大学 Point cloud target detection method based on attention and sampling learning
CN114120115A (en) * 2021-11-19 2022-03-01 东南大学 A point cloud target detection method that fuses point features and grid features
CN114120115B (en) * 2021-11-19 2024-08-23 东南大学 Point cloud target detection method integrating point features and grid features
CN114219009B (en) * 2021-11-24 2025-04-25 重庆理工大学 A point cloud downsampling method based on partial attention mechanism and mutual supervision loss
CN114219009A (en) * 2021-11-24 2022-03-22 重庆理工大学 A Point Cloud Downsampling Method Based on Partial Attention Mechanism and Mutual Supervision Loss
CN114241225A (en) * 2021-12-03 2022-03-25 南京信息工程大学 Bottom-up three-dimensional target detection method for generating suggestion frame end to end
CN114494807A (en) * 2021-12-22 2022-05-13 深圳元戎启行科技有限公司 Feature extraction method, network training method, electronic device and storage medium
CN114332494A (en) * 2021-12-22 2022-04-12 北京邮电大学 Three-dimensional target detection and identification method based on multi-source fusion under vehicle-road cooperation scene
CN114463736A (en) * 2021-12-28 2022-05-10 天津大学 Multi-target detection method and device based on multi-mode information fusion
CN114494183A (en) * 2022-01-25 2022-05-13 哈尔滨医科大学附属第一医院 A method and system for automatic measurement of acetabular radius based on artificial intelligence
CN114494183B (en) * 2022-01-25 2024-04-02 哈尔滨医科大学附属第一医院 Automatic acetabular radius measurement method and system based on artificial intelligence
CN114638953B (en) * 2022-02-22 2023-12-22 深圳元戎启行科技有限公司 Point cloud data segmentation method and device and computer readable storage medium
CN114638953A (en) * 2022-02-22 2022-06-17 深圳元戎启行科技有限公司 Point cloud data segmentation method and device and computer readable storage medium
CN114821033A (en) * 2022-03-23 2022-07-29 西安电子科技大学 A detection and recognition method and device for enhanced three-dimensional information based on laser point cloud
CN114882495B (en) * 2022-04-02 2024-04-12 华南理工大学 3D target detection method based on context-aware feature aggregation
CN114882495A (en) * 2022-04-02 2022-08-09 华南理工大学 3D target detection method based on context-aware feature aggregation
CN115082885A (en) * 2022-06-27 2022-09-20 深圳见得空间科技有限公司 Point cloud target detection method, device, equipment and storage medium
CN115184364A (en) * 2022-07-06 2022-10-14 青岛科技大学 A tire bubble defect detection method and system based on 3D point cloud
CN115222988A (en) * 2022-07-17 2022-10-21 桂林理工大学 PointEFF fine classification method for urban ground objects based on lidar point cloud data
CN115147569A (en) * 2022-07-22 2022-10-04 广州小鹏自动驾驶科技有限公司 Obstacle detection method, apparatus, device and storage medium
CN115375731A (en) * 2022-07-29 2022-11-22 大连宗益科技发展有限公司 3D point cloud single-target tracking method of associated points and voxels and related device
CN115471513A (en) * 2022-11-01 2022-12-13 小米汽车科技有限公司 Point cloud segmentation method and device
CN115760983A (en) * 2022-11-21 2023-03-07 清华大学深圳国际研究生院 Point cloud 3D detection method and model based on self-adaption and multistage feature dimension reduction
CN115760983B (en) * 2022-11-21 2025-07-04 清华大学深圳国际研究生院 Point cloud 3D detection method and model based on adaptive and multi-level feature dimensionality reduction
CN116664874B (en) * 2023-08-02 2023-10-20 安徽大学 Single-stage fine-granularity light-weight point cloud 3D target detection system and method
CN116664874A (en) * 2023-08-02 2023-08-29 安徽大学 A single-stage fine-grained lightweight point cloud 3D object detection system and method
CN117058402B (en) * 2023-08-15 2024-03-12 北京学图灵教育科技有限公司 Real-time point cloud segmentation method and device based on 3D sparse convolution
CN117058402A (en) * 2023-08-15 2023-11-14 北京学图灵教育科技有限公司 Real-time point cloud segmentation method and device based on 3D sparse convolution
CN117475410B (en) * 2023-12-27 2024-03-15 山东海润数聚科技有限公司 Three-dimensional target detection method, system, equipment and medium based on foreground point screening
CN117475410A (en) * 2023-12-27 2024-01-30 山东海润数聚科技有限公司 Three-dimensional target detection method, system, equipment and medium based on foreground point screening

Similar Documents

Publication Publication Date Title
CN113378854A (en) Point cloud target detection method integrating original point cloud and voxel division
CN111242041B (en) Laser radar three-dimensional target rapid detection method based on pseudo-image technology
Sindagi et al. Mvx-net: Multimodal voxelnet for 3d object detection
CN113850270B (en) Semantic scene completion method and system based on point cloud-voxel aggregation network model
CN112529015B (en) Three-dimensional point cloud processing method, device and equipment based on geometric unwrapping
CN110135267B (en) A detection method for small objects in large scene SAR images
CN113177555B (en) Object processing method and device based on cross-level, cross-scale and cross-attention mechanism
KR102333682B1 (en) Semantic segmentation system in 3D point cloud and semantic segmentation method in 3D point cloud using the same
CN108509978A (en) The multi-class targets detection method and model of multi-stage characteristics fusion based on CNN
CN111695494A (en) Three-dimensional point cloud data classification method based on multi-view convolution pooling
CN114332620A (en) A method for vehicle target recognition in airborne images based on feature fusion and attention mechanism
Mosella-Montoro et al. 2D–3D geometric fusion network using multi-neighbourhood graph convolution for RGB-D indoor scene classification
CN112560865A (en) Semantic segmentation method for point cloud under outdoor large scene
Megalingam et al. Concurrent detection and identification of multiple objects using YOLO algorithm
CN117037142A (en) 3D target detection method based on deep learning
CN114550161A (en) An End-to-End 3D Object Sparse Detection Method
CN116665202A (en) A 3D object detection method in spherical coordinates based on special-shaped 3D convolution
Wang et al. A survey of 3D point cloud and deep learning-based approaches for scene understanding in autonomous driving
CN116563488A (en) A 3D Object Detection Method Based on Point Cloud Columnarization
Drobnitzky et al. Survey and systematization of 3D object detection models and methods
CN117710255A (en) Point cloud completion method based on teacher-student network and course learning
CN114821033A (en) A detection and recognition method and device for enhanced three-dimensional information based on laser point cloud
CN114648698A (en) Improved 3D target detection system based on PointPillars
CN118625342A (en) A multi-sensor fusion intelligent vehicle environment perception method and model based on occupancy network
Raparthi et al. Machine Learning Based Deep Cloud Model to Enhance Robustness and Noise Interference

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210910

RJ01 Rejection of invention patent application after publication