CN113378854A - Point cloud target detection method integrating original point cloud and voxel division - Google Patents
Point cloud target detection method integrating original point cloud and voxel division
- Publication number
- CN113378854A (application number CN202110651776.XA)
- Authority
- CN
- China
- Prior art keywords
- point cloud
- point
- feature
- layer
- voxel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention relates to a point cloud target detection method that integrates the original point cloud and voxel division. First, the lossless feature extraction network PointNet++ is used to extract local detail features and semantic features of the point cloud, and a loss function is constructed to further strengthen PointNet++'s perception of local neighborhood information. Trilinear interpolation is then used, in both the voxel feature initialization stage and the sparse convolution perception stage, to embed the loss-free local detail features and semantic features into a voxel-based point cloud target detection network. Finally, a two-dimensional RPN classifies and regresses the preset detection anchor boxes to obtain the final detection targets. By embedding the lossless point cloud encoding into the voxel method at multiple scales, the invention gives the detection network multi-scale, multi-level information fusion and perception capability; by fusing the original-point-cloud-based and voxel-based families of point cloud target detection methods, it combines efficient point cloud perception with lossless feature encoding.
Description
Technical Field

The invention belongs to the technical field of 3D point cloud target detection, and in particular relates to a point cloud target detection method that integrates the original point cloud and voxel division.

Background Art

With the continuous upgrading of vehicle-mounted LiDAR, on-board laser scanners can quickly and conveniently acquire point cloud data of the current scene, and the geometric structure of the scene point cloud can be used to extract the targets it contains. This technology has spread into smart city construction, autonomous driving, unmanned delivery and many other industries. Because laser point clouds are scattered and unordered, with large variations in density and sparsity, applying a traditional detection algorithm with uniform hand-crafted feature extraction to massive point cloud data cannot adapt to the shape variation of targets in the complex road scenes of autonomous driving. Deep-learning-based point cloud target detection has therefore developed rapidly and found wide application in autonomous driving scenarios.

At present, the most common deep-learning-based point cloud target detection methods fall into two families: detection on the original point cloud and detection based on voxel division.

Original-point-cloud 3D detection algorithms apply no preprocessing to the scene point cloud: the raw point coordinates and the corresponding reflectance values are fed directly into a neural network built from multilayer perceptrons (MLPs). Farthest point sampling (FPS) samples the scene layer by layer from shallow to deep, a local point-set feature extraction module (Set Abstraction) extracts local detail features and semantic features, and trilinear interpolation finally propagates the detail and semantic features back to all points of the original scene through Feature Propagation layers. This approach loses no information, but the ability of multilayer perceptrons to perceive unordered point clouds is weaker than that of the convolutional neural network structures used by voxel-based methods.

Voxel-based point cloud target detection divides the scene point cloud into a regular voxel grid whose resolution is chosen according to the point density produced by LiDARs with different numbers of beams, encodes each voxel with a feature extractor adapted to the voxel size, and then applies 3D convolution or 3D sparse convolution to extract semantic features from the initialized voxel scene while progressively compressing the height dimension to one. A region proposal network (RPN) built from 2D convolutions then classifies and regresses the anchor boxes preset at every grid cell of the scene's bird's-eye view. This approach can quickly and efficiently classify rigid, densely sampled objects in autonomous driving point cloud scenes, but voxel division geometrically deforms the original point cloud structure; for smaller objects such as pedestrians and cyclists in particular, the deformation caused by voxelization discards local detail, so the classification and regression results drift away from the true targets.
Summary of the Invention

To address the deficiencies of the prior art, the present invention provides a point cloud target detection method that integrates the original point cloud and voxel division. First, the lossless feature extraction network PointNet++ extracts local detail features and semantic features of the point cloud, and a loss function is constructed to further strengthen PointNet++'s perception of local neighborhood information. Trilinear interpolation is then used, in the voxel feature initialization stage and in the sparse convolution perception stage respectively, to embed the loss-free local detail features and semantic features into a voxel-based point cloud target detection network. Finally, a two-dimensional RPN classifies and regresses every preset detection anchor box to obtain the final detection targets.

To achieve the above purpose, the technical solution provided by the present invention is a point cloud target detection method that integrates the original point cloud and voxel division, comprising the following steps:

Step 1: use the lossless feature extraction network PointNet++ to extract local detail features and semantic features of the point cloud;

Step 1.1: build a multi-layer encoder;

Step 1.2: extract the local detail features and semantic features of each layer of the point cloud through the SA (Set Abstraction) module without information loss;

Step 1.3: use trilinear interpolation to assign the detail and semantic features extracted in step 1.2 to all points of the original scene through feature propagation layers;

Step 2: construct a loss function to supervise the feature extraction of step 1 and push the lossless feature extraction network PointNet++ to perceive feature information;

Step 3: embed the loss-free local detail features and semantic features into the voxel-based point cloud target detection network;

Step 3.1: initialize the voxel features with the local detail features extracted in step 1;

Step 3.2: apply 3D sparse convolution to extract features from the semantic information of the voxel scene initialized in step 3.1;

Step 3.3: use trilinear interpolation to convert the semantic features obtained in step 1 into voxel features;

Step 3.4: use an attention mechanism to fuse the semantic features perceived by the sparse convolution in step 3.2 with the voxel features converted in step 3.3, obtaining semantic information that fuses both perception modes;

Step 4: project the semantic features fused in step 3 onto the two-dimensional bird's-eye view, build a region proposal network (RPN) from 2D convolutions, and classify and regress the detection anchor boxes preset at every pixel of the bird's-eye view to obtain the final detection targets;

Step 4.1: set up the RPN structure and the predefined detection anchor boxes;

Step 4.2: design the loss function for point cloud target detection.
Moreover, to build the multi-layer encoder in step 1.1, the farthest point sampling strategy (FPS) first collects N points from the original point cloud as the input point cloud, and FPS is then applied to the input point cloud layer by layer to sample a fixed number of points per layer, forming a 4-layer encoder in which the input point cloud of each layer is the point set output by the previous layer.

Moreover, the input of each SA module in step 1.2 is the fixed-size point set produced by FPS sampling in the previous layer. Let p_i be the i-th point sampled by FPS at the current layer, and let ψ(p_i) denote the set of points of the previous layer that lie inside the spherical neighborhood of radius r centered at p_i. The output feature of p_i is computed in the following steps:

Step 1.2.1: randomly sample k points from ψ(p_i) to form the subset ψ_k(p_i);

Step 1.2.2: apply a multilayer perceptron to the points sampled in step 1.2.1 for feature fusion and extraction; the calculation is:

f(p_i) = max{ MLP(f(p_j)) : p_j ∈ ψ_k(p_i) },

where MLP denotes the high-dimensional mapping of point features by the multilayer perceptron, max() takes the maximum over the feature dimension across the point set, and f(p_i) is the output feature of point p_i;

Step 1.2.3: repeat the FPS sampling of each layer's input point cloud down to the corresponding number of points, and aggregate the neighborhood features of the sampled points as in step 1.2.2, thereby completing feature extraction without information loss. The first layer extracts local detail features; the last three layers extract semantic features.
Moreover, the feature propagation in step 1.3 is the inverse of feature extraction: starting from the deepest extraction layer, features are passed up one layer at a time until they reach the layer containing all N points. Taking the transfer from one layer to the layer above it as an example, let p_i be a point of the upper layer that needs to receive features, let φ(p_i) denote the set of the k points of the lower layer closest to p_i in Euclidean space, and let p_j be a point of φ(p_i). The trilinear-interpolation feature transfer is computed as:

f(p_i) = Σ_{j=1}^{k} w_ij · f(p_j),

where f(p_i) is the feature to be transferred, f(p_j) is the feature of the j-th point p_j in the neighborhood of p_i, and w_ij is the feature weight of p_j, determined by its Euclidean distance to p_i.

The feature of every receiving point is thus obtained as a Euclidean-distance-weighted sum of the features of the k points in the layer below it; propagating forward layer by layer passes the features to every point of the scene, so that every point carries loss-free feature information.

Moreover, step 2 uses the point coordinates of the original scene as the point cloud supervision signal and the Smooth-L1 loss as the loss function, computed as:

L_point = Σ_{p ∈ φ(P)} SmoothL1(r′ - r),

where r′ and r are, respectively, the point coordinates predicted by the lossless feature extraction network and the coordinates of the original point cloud, and φ(P) denotes the set of points of the entire original scene. Under the supervision of this loss, the perception of local neighborhood information by the lossless feature extraction network PointNet++ is further improved.
Moreover, the initialization in step 3.1 first divides the point cloud space uniformly into a voxel grid; voxels that contain points are retained and empty voxels are discarded, and the retained voxels are then initialized with the local detail features obtained in step 1. Suppose the output of the first layer of the encoder network in step 1 is the set of pairs (P_i, F_i^P), where P_i is a point of the original point cloud space whose feature needs to be transferred and F_i^P is the local detail feature of P_i. Let V_j denote a voxel center and F_j^V the feature to be assigned to it, with M voxel centers to be assigned in total. The voxel-center features are assigned by the trilinear interpolation function: let φ(V_j) denote the set of the k points closest to V_j in Euclidean space and let P_t be a point of φ(V_j); then F_j^V is computed as:

F_j^V = Σ_{t=1}^{k} w_tj · F_t^P,

where F_t^P is the feature of the t-th point P_t in the neighborhood of the voxel center V_j, and w_tj is the feature weight of P_t in that neighborhood.

Moreover, step 3.2 stacks four sparse convolution stages built with the Spconv library, each consisting of two submanifold convolution layers and one point cloud sparse convolution layer with a downsampling factor of 2. If the input voxel feature tensor is written as L×W×H×C, where L, W, H and C are the length, width and height of the voxel scene and the feature dimension of each voxel, then the output of the four sparse convolution stages (each downsampling by 2) can be written as (L/16)×(W/16)×(H/16)×C′, where C′ is the feature dimension after feature extraction.

Moreover, in step 3.3 suppose the three layers of semantic features extracted in step 1 are denoted F_{4×}, F_{8×} and F_{16×}, where 4× means downsampled four times, and let V′_j denote a voxel center after sparse convolution and F_j^{V′} the feature to be assigned to it. Trilinear interpolation converts the point-wise semantic features into voxel-center representations: let φ(V′_j) denote the set of the k points closest to V′_j in Euclidean space at each scale, with P_{t,4×}, P_{t,8×} and P_{t,16×} points of the corresponding sets; then F_j^{V′} is computed as:

F_j^{V′} = Σ_{t=1}^{k} w_{tj,4×} · F_{t,4×} (and analogously for the 8× and 16× scales, whose results together form the voxel-center feature),

where V′_j is a voxel center after 3D sparse convolution, P_{t,4×}, P_{t,8×} and P_{t,16×} are the spatial points used for feature weighting, F_{t,4×} is the feature of the t-th point in the neighborhood of V′_j at the 4× downsampled layer, and w_{tj,4×} is the feature weight of that point at the 4× downsampled layer.

Moreover, step 3.4 first concatenates the two kinds of semantic information along the feature dimension: if the voxel features converted in step 3.3 have dimension M1 and the voxel features perceived by sparse convolution have dimension M2, the concatenated voxel features have dimension M1+M2; a single multilayer-perceptron layer then maps the combined (M1+M2)-dimensional features to M1 dimensions.
Moreover, in step 4.1 the RPN is built from a four-layer two-dimensional convolutional neural network with a U-Net structure that outputs feature maps layer by layer; every layer uses 3×3 convolutions to reduce the number of learnable parameters. The encoder-decoder structure further abstracts the fused information, and a corresponding detection anchor box is preset for every pixel of the final feature map; the objects detected by the RPN are obtained by classifying and regressing these preset anchor boxes. A three-dimensional detection anchor box is written as {x, y, z, l, w, h, r}, where (x, y, z) is the center of the box, l, w and h are its length, width and height, and r is its rotation angle in the x-y plane. The voxel features after 3D sparse convolution and semantic fusion form a three-dimensional volume; compressing the height dimension into the feature dimension yields a two-dimensional bird's-eye-view image, so for a feature map of given size the number of predefined detection anchor boxes equals the number of its pixels.

Moreover, in step 4.2 the classification loss L_cls uses the cross-entropy loss:

L_cls = -(1/n) Σ_{i=1}^{n} Q(a_i) · log P(a_i),

where n is the number of preset detection anchor boxes, P(a_i) is the predicted score of the i-th anchor box, and Q(a_i) is the ground-truth label of that anchor box.

The regression loss L_reg uses the Smooth-L1 loss:

L_reg = (1/n) Σ_{i=1}^{n} SmoothL1(v′_i - v_i),

where n is the number of preset detection anchor boxes, v is the ground-truth value of an anchor box and v′ is the value predicted by the RPN.

Through the joint supervision of the classification loss and the regression loss, the network finally learns to detect targets in point clouds.
Compared with the prior art, the invention has the following advantages: (1) it combines the strengths of current voxel-based and original-point-cloud-based point cloud target detection methods, possessing both efficient point cloud perception and lossless feature encoding; (2) by embedding the lossless point cloud encoding into the voxel method at multiple scales, it gives the detection network multi-scale, multi-level information fusion and perception capability.
Brief Description of the Drawings

FIG. 1 is a flowchart of an embodiment of the present invention.

FIG. 2 shows a detection example of an embodiment of the present invention, where FIG. 2(a) is the input point cloud and FIG. 2(b) shows the detection anchor boxes on the point cloud.

Detailed Description of the Embodiments
The present invention provides a point cloud target detection method that integrates the original point cloud and voxel division. First, the lossless feature extraction network PointNet++ extracts local detail features and semantic features of the point cloud, and a loss function is constructed to further strengthen PointNet++'s perception of local neighborhood information. Trilinear interpolation is then used, in the voxel feature initialization stage and in the sparse convolution perception stage respectively, to embed the loss-free local detail features and semantic features into a voxel-based point cloud target detection network. Finally, a two-dimensional RPN classifies and regresses every preset detection anchor box to obtain the final detection targets.

The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.

Step 1: use the lossless feature extraction network PointNet++ to extract local detail features and semantic features of the point cloud.

A fixed number N of points is first collected from the input point cloud; the scene is then sampled layer by layer and a local point-set feature extractor (Set Abstraction, SA) is built to extract features from local regions, after which trilinear interpolation assigns the local detail features and semantic features to all points of the original scene through feature propagation layers. This step comprises the following sub-steps:

Step 1.1: build a multi-layer encoder.

First, the farthest point sampling strategy (FPS) collects N points from the original point cloud as the input point cloud; FPS is then applied to the input point cloud layer by layer to sample a fixed number of points per layer, forming a 4-layer encoder in which the input point cloud of each layer is the point set output by the previous layer.
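As an illustration of this sampling step, the following is a minimal sketch of farthest point sampling in Python/NumPy; the function name, the array shapes and the example sample count are assumptions for illustration, not the patent's reference implementation.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """points: (N, 3) xyz coordinates; returns indices of n_samples sampled points."""
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    dist = np.full(n, np.inf)      # distance of every point to the selected set
    selected[0] = 0                # start from an arbitrary (here: the first) point
    for i in range(1, n_samples):
        # update distances using the most recently selected point
        delta = points - points[selected[i - 1]]
        dist = np.minimum(dist, np.einsum("ij,ij->i", delta, delta))
        # choose the point farthest from the current sample set
        selected[i] = int(np.argmax(dist))
    return selected

# usage sketch: cascade FPS to build the layer-by-layer encoder inputs
# idx = farthest_point_sampling(raw_xyz, 4096)   # the sample count is an assumption
# layer1_xyz = raw_xyz[idx]
```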
Step 1.2: extract the local detail features and semantic features of each layer of the point cloud through the SA module without information loss.

The input of each SA module is the fixed-size point set produced by FPS sampling in the previous layer. Let p_i be the i-th point sampled by FPS at the current layer, and let ψ(p_i) denote the set of points of the previous layer that lie inside the spherical neighborhood of radius r centered at p_i. The output feature of p_i is computed in the following steps:

Step 1.2.1: randomly sample k points from ψ(p_i) to form the subset ψ_k(p_i).

Step 1.2.2: apply a multilayer perceptron to the points sampled in step 1.2.1 for feature fusion and extraction, obtaining the output feature of point p_i.

The multilayer perceptron first maps the randomly sampled subset ψ_k(p_i) to high-dimensional local detail features; max pooling over the feature dimension then yields the maximal information representation of the point set, and this representation is the output feature of point p_i. The calculation is:

f(p_i) = max{ MLP(f(p_j)) : p_j ∈ ψ_k(p_i) },

where MLP denotes the high-dimensional mapping of point features by the multilayer perceptron, max() takes the maximum over the feature dimension across the point set, and f(p_i) is the output feature of point p_i.
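The aggregation f(p_i) = max MLP(f(p_j)) can be sketched in PyTorch as below; the MLP widths, the concatenation of relative coordinates and the tensor layout are assumptions, and the block is only a minimal illustration of the SA aggregation described above.

```python
import torch
import torch.nn as nn

class SetAbstraction(nn.Module):
    """Aggregates k sampled neighbours of each centre point: max over shared-MLP outputs."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        # shared point-wise MLP applied to every neighbour independently
        self.mlp = nn.Sequential(
            nn.Linear(in_dim + 3, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim), nn.ReLU(),
        )

    def forward(self, center_xyz, neigh_xyz, neigh_feat):
        # center_xyz: (M, 3), neigh_xyz: (M, k, 3), neigh_feat: (M, k, C)
        rel = neigh_xyz - center_xyz.unsqueeze(1)        # neighbour coords relative to the centre
        x = torch.cat([rel, neigh_feat], dim=-1)         # (M, k, C + 3)
        x = self.mlp(x)                                  # (M, k, out_dim)
        return x.max(dim=1).values                       # max pooling over the point set -> (M, out_dim)
```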
Step 1.2.3: repeat the FPS sampling of each layer's input point cloud down to the corresponding number of points, and aggregate the neighborhood features of the sampled points as in step 1.2.2, thereby completing feature extraction without information loss. The first layer extracts local detail features; the last three layers extract semantic features.

Step 1.3: use trilinear interpolation to assign the detail and semantic features extracted in step 1.2 to all points of the original scene through feature propagation layers.

Feature propagation is the inverse of feature extraction: starting from the deepest extraction layer, features are passed up one layer at a time until they reach the layer containing all N points. Taking the transfer from one layer to the layer above it as an example, let p_i be a point of the upper layer that needs to receive features, let φ(p_i) denote the set of the k points of the lower layer closest to p_i in Euclidean space, and let p_j be a point of φ(p_i). The trilinear-interpolation feature transfer is computed as:

f(p_i) = Σ_{j=1}^{k} w_ij · f(p_j),

where f(p_i) is the feature to be transferred, f(p_j) is the feature of the j-th point p_j in the neighborhood of p_i, and w_ij is the feature weight of p_j, determined by its Euclidean distance to p_i.

The feature of every receiving point is thus obtained as a Euclidean-distance-weighted sum of the features of the k points in the layer below it; propagating forward layer by layer passes the features to every point of the scene, so that every point carries loss-free feature information.
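A minimal sketch of this inverse-distance feature propagation (what the patent calls trilinear interpolation), assuming k = 3 nearest points and a small epsilon to avoid division by zero:

```python
import torch

def propagate_features(query_xyz, source_xyz, source_feat, k: int = 3, eps: float = 1e-8):
    """query_xyz: (N, 3) points receiving features; source_xyz: (M, 3); source_feat: (M, C)."""
    d2 = torch.cdist(query_xyz, source_xyz).pow(2)        # (N, M) squared Euclidean distances
    d2, idx = d2.topk(k, dim=1, largest=False)            # k nearest source points per query
    w = 1.0 / (d2 + eps)                                  # closer points receive larger weights
    w = w / w.sum(dim=1, keepdim=True)                    # normalise the weights w_ij
    return (source_feat[idx] * w.unsqueeze(-1)).sum(dim=1)   # (N, C) interpolated features
```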
Step 2: construct a loss function to supervise the feature extraction of step 1 and push the lossless feature extraction network PointNet++ to perceive feature information.

The point coordinates of the original scene serve as the point cloud supervision signal and the Smooth-L1 loss as the loss function, computed as:

L_point = Σ_{p ∈ φ(P)} SmoothL1(r′ - r),

where r′ and r are, respectively, the point coordinates predicted by the lossless feature extraction network and the coordinates of the original point cloud, and φ(P) denotes the set of points of the entire original scene. Under the supervision of this loss, the perception of local neighborhood information by the lossless feature extraction network PointNet++ is further improved.
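A minimal sketch of this auxiliary coordinate supervision, assuming a small linear head that regresses coordinates from the propagated per-point features (the head and its input width are assumptions; the patent only specifies the Smooth-L1 form):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

coord_head = nn.Linear(128, 3)   # assumed per-point feature width of 128

def coordinate_supervision_loss(point_features, original_xyz):
    """Smooth-L1 between coordinates regressed from features (r') and the originals (r)."""
    pred_xyz = coord_head(point_features)        # r' in the patent's notation
    return F.smooth_l1_loss(pred_xyz, original_xyz)
```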
Step 3: embed the loss-free local detail features and semantic features into the voxel-based point cloud target detection network.

The original point cloud is first divided into voxels and the voxel features are initialized with the local detail features extracted in step 1; sparse 3D convolution then perceives the spatial structure of the point cloud, and the semantic features extracted in step 1 are fused at the semantic level. This step comprises the following sub-steps:

Step 3.1: initialize the voxel features with the local detail features extracted in step 1.

The point cloud space is first divided uniformly into a voxel grid; voxels that contain points are retained and empty voxels are discarded, and the retained voxels are then initialized with the local detail features obtained in step 1. Suppose the output of the first layer of the encoder network in step 1 is the set of pairs (P_i, F_i^P), where P_i is a point of the original point cloud space whose feature needs to be transferred and F_i^P is the local detail feature of P_i. Let V_j denote a voxel center and F_j^V the feature to be assigned to it, with M voxel centers to be assigned in total. The voxel-center features are assigned by the trilinear interpolation function: let φ(V_j) denote the set of the k points closest to V_j in Euclidean space and let P_t be a point of φ(V_j); then F_j^V is computed as:

F_j^V = Σ_{t=1}^{k} w_tj · F_t^P,

where F_t^P is the feature of the t-th point P_t in the neighborhood of the voxel center V_j, and w_tj is the feature weight of P_t in that neighborhood.
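A minimal sketch of this voxel initialization: keep only occupied voxels, compute their centers, and assign each center a feature with the same inverse-distance interpolation (propagate_features refers to the helper sketched under step 1.3); the voxel size and scene origin used here are assumptions.

```python
import torch

def init_voxel_features(points_xyz, point_feat, voxel_size=0.1, origin=(0.0, -40.0, -3.0)):
    """Keep occupied voxels and assign each centre an interpolated detail feature F^V."""
    origin = torch.tensor(origin, device=points_xyz.device)
    coords = torch.floor((points_xyz - origin) / voxel_size).long()   # integer voxel indices
    kept_coords = torch.unique(coords, dim=0)                         # empty voxels never appear
    centers = (kept_coords.float() + 0.5) * voxel_size + origin       # voxel centre coordinates
    feats = propagate_features(centers, points_xyz, point_feat)       # inverse-distance assignment
    return kept_coords, centers, feats
```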
Step 3.2: apply 3D sparse convolution to extract features from the semantic information of the voxel scene initialized in step 3.1.

Four sparse convolution stages are stacked with the Spconv library, each consisting of two submanifold convolution layers and one point cloud sparse convolution layer with a downsampling factor of 2. If the input voxel feature tensor is written as L×W×H×C, where L, W, H and C are the length, width and height of the voxel scene and the feature dimension of each voxel, then the output of the four sparse convolution stages (each downsampling by 2) can be written as (L/16)×(W/16)×(H/16)×C′, where C′ is the feature dimension after feature extraction.
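A minimal sketch of the four-stage sparse backbone, assuming the spconv 2.x API (SparseSequential, SubMConv3d, SparseConv3d); the channel widths are assumptions, while the two-submanifold-plus-one-stride-2 layout follows the description above.

```python
import spconv.pytorch as spconv
import torch.nn as nn

def make_stage(cin: int, cout: int) -> spconv.SparseSequential:
    # two submanifold convolutions followed by one stride-2 sparse convolution
    return spconv.SparseSequential(
        spconv.SubMConv3d(cin, cin, 3, padding=1), nn.ReLU(),
        spconv.SubMConv3d(cin, cin, 3, padding=1), nn.ReLU(),
        spconv.SparseConv3d(cin, cout, 3, stride=2, padding=1), nn.ReLU(),
    )

backbone = spconv.SparseSequential(
    make_stage(16, 32),
    make_stage(32, 64),
    make_stage(64, 64),
    make_stage(64, 128),
)
# input: spconv.SparseConvTensor(features, coords, spatial_shape, batch_size)
```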
Step 3.3: use trilinear interpolation to convert the semantic features obtained in step 1 into voxel features.

Suppose the three layers of semantic features obtained in step 1 are denoted F_{4×}, F_{8×} and F_{16×}, where 4× means downsampled four times, and let V′_j denote a voxel center after sparse convolution and F_j^{V′} the feature to be assigned to it. Trilinear interpolation converts the point-wise semantic features into voxel-center representations: let φ(V′_j) denote the set of the k points closest to V′_j in Euclidean space at each scale, with P_{t,4×}, P_{t,8×} and P_{t,16×} points of the corresponding sets; then F_j^{V′} is computed as:

F_j^{V′} = Σ_{t=1}^{k} w_{tj,4×} · F_{t,4×} (and analogously for the 8× and 16× scales, whose results together form the voxel-center feature),

where V′_j is a voxel center after 3D sparse convolution, P_{t,4×}, P_{t,8×} and P_{t,16×} are the spatial points used for feature weighting, F_{t,4×} is the feature of the t-th point in the neighborhood of V′_j at the 4× downsampled layer, and w_{tj,4×} is the feature weight of that point at the 4× downsampled layer.

Step 3.4: use an attention mechanism to fuse the semantic features perceived by the sparse convolution in step 3.2 with the voxel features converted in step 3.3, obtaining semantic information that fuses both perception modes.

The two kinds of semantic information are first concatenated along the feature dimension: if the voxel features converted in step 3.3 have dimension M1 and the voxel features perceived by sparse convolution have dimension M2, the concatenated voxel features have dimension M1+M2; a single multilayer-perceptron layer then maps the combined (M1+M2)-dimensional features to M1 dimensions.
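A minimal sketch of this concatenate-and-project fusion; the patent frames it as an attention-style fusion, while the plain concat plus single MLP layer shown here, and the channel sizes, are simplifying assumptions.

```python
import torch
import torch.nn as nn

class VoxelFeatureFusion(nn.Module):
    """Fuse the interpolated PointNet++ feature (M1 dims) with the sparse-conv feature (M2 dims)."""
    def __init__(self, m1: int, m2: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(m1 + m2, m1), nn.ReLU())

    def forward(self, point_branch_feat, sparse_branch_feat):
        # both inputs: (num_voxels, channels)
        fused = torch.cat([point_branch_feat, sparse_branch_feat], dim=-1)   # (num_voxels, M1 + M2)
        return self.proj(fused)                                              # (num_voxels, M1)
```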
Step 4: project the semantic features fused in step 3 onto the two-dimensional bird's-eye view, build a region proposal network (RPN) from 2D convolutions, and classify and regress the detection anchor boxes preset at every pixel of the bird's-eye view to obtain the final detection targets. This step comprises the following sub-steps:

Step 4.1: set up the RPN structure and the predefined anchor boxes.

The RPN is built from a four-layer two-dimensional convolutional neural network with a U-Net structure that outputs feature maps layer by layer; every layer uses 3×3 convolutions to reduce the number of learnable parameters. The encoder-decoder structure further abstracts the fused information, and a corresponding detection anchor box is preset for every pixel of the final feature map; the objects detected by the RPN are obtained by classifying and regressing these preset anchor boxes. A three-dimensional detection anchor box is written as {x, y, z, l, w, h, r}, where (x, y, z) is the center of the box, l, w and h are its length, width and height, and r is its rotation angle in the x-y plane. The voxel features after 3D sparse convolution and semantic fusion form a three-dimensional volume; compressing the height dimension into the feature dimension yields a two-dimensional bird's-eye-view image, so for a feature map of given size the number of predefined detection anchor boxes equals the number of its pixels.
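A minimal sketch of the height-to-channel compression and the anchor classification/regression heads; the channel widths and the two-anchors-per-cell setting are assumptions, and the full U-Net encoder-decoder of the RPN is omitted for brevity.

```python
import torch
import torch.nn as nn

class BEVRPNHead(nn.Module):
    """Fold height into channels, then predict per-anchor class scores and 7 box residuals."""
    def __init__(self, c_in: int, anchors_per_cell: int = 2):
        # c_in is the channel count after height compression, i.e. C * D of the voxel volume
        super().__init__()
        self.cls = nn.Conv2d(c_in, anchors_per_cell, 3, padding=1)        # anchor classification
        self.reg = nn.Conv2d(c_in, anchors_per_cell * 7, 3, padding=1)    # (x, y, z, l, w, h, r)

    def forward(self, voxel_volume):
        # voxel_volume: (B, C, D, H, W) densified output of the sparse backbone
        b, c, d, h, w = voxel_volume.shape
        bev = voxel_volume.reshape(b, c * d, h, w)   # compress the height dimension into channels
        return self.cls(bev), self.reg(bev)
```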
Step 4.2: design the loss function for point cloud target detection.

The preset detection anchor boxes are classified and regressed with a classification loss function and a regression loss function, which yields the objects detected by the RPN.

The classification loss L_cls uses the cross-entropy loss:

L_cls = -(1/n) Σ_{i=1}^{n} Q(a_i) · log P(a_i),

where n is the number of preset detection anchor boxes, P(a_i) is the predicted score of the i-th anchor box, and Q(a_i) is the ground-truth label of that anchor box.

The regression loss L_reg uses the Smooth-L1 loss:

L_reg = (1/n) Σ_{i=1}^{n} SmoothL1(v′_i - v_i),

where n is the number of preset detection anchor boxes, v is the ground-truth value of an anchor box and v′ is the value predicted by the RPN.

Through the joint supervision of the classification loss and the regression loss, the network finally learns to detect targets in point clouds.
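A minimal sketch combining the two losses, assuming integer class targets for the cross-entropy term, a boolean mask of positive (matched) anchors for the regression term, and an illustrative weighting between them:

```python
import torch
import torch.nn.functional as F

def detection_loss(cls_logits, cls_targets, box_preds, box_targets, pos_mask, reg_weight=2.0):
    """cls_logits: (n, num_classes); cls_targets: (n,) int labels; pos_mask: (n,) bool."""
    l_cls = F.cross_entropy(cls_logits, cls_targets)                         # L_cls over all anchors
    l_reg = F.smooth_l1_loss(box_preds[pos_mask], box_targets[pos_mask])     # L_reg on matched anchors
    return l_cls + reg_weight * l_reg
```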
The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitute them in similar ways without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110651776.XA CN113378854A (en) | 2021-06-11 | 2021-06-11 | Point cloud target detection method integrating original point cloud and voxel division |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110651776.XA CN113378854A (en) | 2021-06-11 | 2021-06-11 | Point cloud target detection method integrating original point cloud and voxel division |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113378854A true CN113378854A (en) | 2021-09-10 |
Family
ID=77573977
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110651776.XA Pending CN113378854A (en) | 2021-06-11 | 2021-06-11 | Point cloud target detection method integrating original point cloud and voxel division |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113378854A (en) |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113900119A (en) * | 2021-09-29 | 2022-01-07 | 苏州浪潮智能科技有限公司 | A method, system, storage medium and device for lidar vehicle detection |
CN113989797A (en) * | 2021-10-26 | 2022-01-28 | 清华大学苏州汽车研究院(相城) | Three-dimensional dynamic target detection method and device based on voxel point cloud fusion |
CN114120115A (en) * | 2021-11-19 | 2022-03-01 | 东南大学 | A point cloud target detection method that fuses point features and grid features |
CN114120270A (en) * | 2021-11-08 | 2022-03-01 | 同济大学 | Point cloud target detection method based on attention and sampling learning |
CN114155524A (en) * | 2021-10-29 | 2022-03-08 | 中国科学院信息工程研究所 | Single-stage 3D point cloud target detection method and device, computer equipment, and medium |
CN114219009A (en) * | 2021-11-24 | 2022-03-22 | 重庆理工大学 | A Point Cloud Downsampling Method Based on Partial Attention Mechanism and Mutual Supervision Loss |
CN114241225A (en) * | 2021-12-03 | 2022-03-25 | 南京信息工程大学 | Bottom-up three-dimensional target detection method for generating suggestion frame end to end |
CN114332494A (en) * | 2021-12-22 | 2022-04-12 | 北京邮电大学 | Three-dimensional target detection and identification method based on multi-source fusion under vehicle-road cooperation scene |
CN114463736A (en) * | 2021-12-28 | 2022-05-10 | 天津大学 | Multi-target detection method and device based on multi-mode information fusion |
CN114494183A (en) * | 2022-01-25 | 2022-05-13 | 哈尔滨医科大学附属第一医院 | A method and system for automatic measurement of acetabular radius based on artificial intelligence |
CN114494807A (en) * | 2021-12-22 | 2022-05-13 | 深圳元戎启行科技有限公司 | Feature extraction method, network training method, electronic device and storage medium |
CN114638953A (en) * | 2022-02-22 | 2022-06-17 | 深圳元戎启行科技有限公司 | Point cloud data segmentation method and device and computer readable storage medium |
CN114821033A (en) * | 2022-03-23 | 2022-07-29 | 西安电子科技大学 | A detection and recognition method and device for enhanced three-dimensional information based on laser point cloud |
CN114882495A (en) * | 2022-04-02 | 2022-08-09 | 华南理工大学 | 3D target detection method based on context-aware feature aggregation |
CN115082885A (en) * | 2022-06-27 | 2022-09-20 | 深圳见得空间科技有限公司 | Point cloud target detection method, device, equipment and storage medium |
CN115147569A (en) * | 2022-07-22 | 2022-10-04 | 广州小鹏自动驾驶科技有限公司 | Obstacle detection method, apparatus, device and storage medium |
CN115184364A (en) * | 2022-07-06 | 2022-10-14 | 青岛科技大学 | A tire bubble defect detection method and system based on 3D point cloud |
CN115222988A (en) * | 2022-07-17 | 2022-10-21 | 桂林理工大学 | PointEFF fine classification method for urban ground objects based on lidar point cloud data |
CN115375731A (en) * | 2022-07-29 | 2022-11-22 | 大连宗益科技发展有限公司 | 3D point cloud single-target tracking method of associated points and voxels and related device |
CN115471513A (en) * | 2022-11-01 | 2022-12-13 | 小米汽车科技有限公司 | Point cloud segmentation method and device |
CN115760983A (en) * | 2022-11-21 | 2023-03-07 | 清华大学深圳国际研究生院 | Point cloud 3D detection method and model based on self-adaption and multistage feature dimension reduction |
CN116664874A (en) * | 2023-08-02 | 2023-08-29 | 安徽大学 | A single-stage fine-grained lightweight point cloud 3D object detection system and method |
CN117058402A (en) * | 2023-08-15 | 2023-11-14 | 北京学图灵教育科技有限公司 | Real-time point cloud segmentation method and device based on 3D sparse convolution |
CN117475410A (en) * | 2023-12-27 | 2024-01-30 | 山东海润数聚科技有限公司 | Three-dimensional target detection method, system, equipment and medium based on foreground point screening |
CN115760983B (en) * | 2022-11-21 | 2025-07-04 | 清华大学深圳国际研究生院 | Point cloud 3D detection method and model based on adaptive and multi-level feature dimensionality reduction |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109118564A (en) * | 2018-08-01 | 2019-01-01 | 湖南拓视觉信息技术有限公司 | A kind of three-dimensional point cloud labeling method and device based on fusion voxel |
CN111160214A (en) * | 2019-12-25 | 2020-05-15 | 电子科技大学 | 3D target detection method based on data fusion |
CN112052860A (en) * | 2020-09-11 | 2020-12-08 | 中国人民解放军国防科技大学 | Three-dimensional target detection method and system |
CN112418084A (en) * | 2020-11-23 | 2021-02-26 | 同济大学 | Three-dimensional target detection method based on point cloud time sequence information fusion |
-
2021
- 2021-06-11 CN CN202110651776.XA patent/CN113378854A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109118564A (en) * | 2018-08-01 | 2019-01-01 | 湖南拓视觉信息技术有限公司 | A kind of three-dimensional point cloud labeling method and device based on fusion voxel |
CN111160214A (en) * | 2019-12-25 | 2020-05-15 | 电子科技大学 | 3D target detection method based on data fusion |
CN112052860A (en) * | 2020-09-11 | 2020-12-08 | 中国人民解放军国防科技大学 | Three-dimensional target detection method and system |
CN112418084A (en) * | 2020-11-23 | 2021-02-26 | 同济大学 | Three-dimensional target detection method based on point cloud time sequence information fusion |
Non-Patent Citations (2)
Title |
---|
CHARLES R. QI ET AL.: "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space", 《ADVANCES 31ST CONFERENCE ON NEURAL INFORMATION PROCESSING SYSTEMS (NIPS 2017)》 * |
TIANYUAN JIANG,ET AL.: "VIC-Net: Voxelization Information Compensation Network for Point Cloud 3D Object Detection", 《2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021)》 * |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113900119A (en) * | 2021-09-29 | 2022-01-07 | 苏州浪潮智能科技有限公司 | A method, system, storage medium and device for lidar vehicle detection |
CN113900119B (en) * | 2021-09-29 | 2024-01-30 | 苏州浪潮智能科技有限公司 | Method, system, storage medium and equipment for laser radar vehicle detection |
CN113989797A (en) * | 2021-10-26 | 2022-01-28 | 清华大学苏州汽车研究院(相城) | Three-dimensional dynamic target detection method and device based on voxel point cloud fusion |
CN114155524A (en) * | 2021-10-29 | 2022-03-08 | 中国科学院信息工程研究所 | Single-stage 3D point cloud target detection method and device, computer equipment, and medium |
CN114120270A (en) * | 2021-11-08 | 2022-03-01 | 同济大学 | Point cloud target detection method based on attention and sampling learning |
CN114120115A (en) * | 2021-11-19 | 2022-03-01 | 东南大学 | A point cloud target detection method that fuses point features and grid features |
CN114120115B (en) * | 2021-11-19 | 2024-08-23 | 东南大学 | Point cloud target detection method integrating point features and grid features |
CN114219009B (en) * | 2021-11-24 | 2025-04-25 | 重庆理工大学 | A point cloud downsampling method based on partial attention mechanism and mutual supervision loss |
CN114219009A (en) * | 2021-11-24 | 2022-03-22 | 重庆理工大学 | A Point Cloud Downsampling Method Based on Partial Attention Mechanism and Mutual Supervision Loss |
CN114241225A (en) * | 2021-12-03 | 2022-03-25 | 南京信息工程大学 | Bottom-up three-dimensional target detection method for generating suggestion frame end to end |
CN114494807A (en) * | 2021-12-22 | 2022-05-13 | 深圳元戎启行科技有限公司 | Feature extraction method, network training method, electronic device and storage medium |
CN114332494A (en) * | 2021-12-22 | 2022-04-12 | 北京邮电大学 | Three-dimensional target detection and identification method based on multi-source fusion under vehicle-road cooperation scene |
CN114463736A (en) * | 2021-12-28 | 2022-05-10 | 天津大学 | Multi-target detection method and device based on multi-mode information fusion |
CN114494183A (en) * | 2022-01-25 | 2022-05-13 | 哈尔滨医科大学附属第一医院 | A method and system for automatic measurement of acetabular radius based on artificial intelligence |
CN114494183B (en) * | 2022-01-25 | 2024-04-02 | 哈尔滨医科大学附属第一医院 | Automatic acetabular radius measurement method and system based on artificial intelligence |
CN114638953B (en) * | 2022-02-22 | 2023-12-22 | 深圳元戎启行科技有限公司 | Point cloud data segmentation method and device and computer readable storage medium |
CN114638953A (en) * | 2022-02-22 | 2022-06-17 | 深圳元戎启行科技有限公司 | Point cloud data segmentation method and device and computer readable storage medium |
CN114821033A (en) * | 2022-03-23 | 2022-07-29 | 西安电子科技大学 | A detection and recognition method and device for enhanced three-dimensional information based on laser point cloud |
CN114882495B (en) * | 2022-04-02 | 2024-04-12 | 华南理工大学 | 3D target detection method based on context-aware feature aggregation |
CN114882495A (en) * | 2022-04-02 | 2022-08-09 | 华南理工大学 | 3D target detection method based on context-aware feature aggregation |
CN115082885A (en) * | 2022-06-27 | 2022-09-20 | 深圳见得空间科技有限公司 | Point cloud target detection method, device, equipment and storage medium |
CN115184364A (en) * | 2022-07-06 | 2022-10-14 | 青岛科技大学 | A tire bubble defect detection method and system based on 3D point cloud |
CN115222988A (en) * | 2022-07-17 | 2022-10-21 | 桂林理工大学 | PointEFF fine classification method for urban ground objects based on lidar point cloud data |
CN115147569A (en) * | 2022-07-22 | 2022-10-04 | 广州小鹏自动驾驶科技有限公司 | Obstacle detection method, apparatus, device and storage medium |
CN115375731A (en) * | 2022-07-29 | 2022-11-22 | 大连宗益科技发展有限公司 | 3D point cloud single-target tracking method of associated points and voxels and related device |
CN115471513A (en) * | 2022-11-01 | 2022-12-13 | 小米汽车科技有限公司 | Point cloud segmentation method and device |
CN115760983A (en) * | 2022-11-21 | 2023-03-07 | 清华大学深圳国际研究生院 | Point cloud 3D detection method and model based on self-adaption and multistage feature dimension reduction |
CN115760983B (en) * | 2022-11-21 | 2025-07-04 | 清华大学深圳国际研究生院 | Point cloud 3D detection method and model based on adaptive and multi-level feature dimensionality reduction |
CN116664874B (en) * | 2023-08-02 | 2023-10-20 | 安徽大学 | Single-stage fine-granularity light-weight point cloud 3D target detection system and method |
CN116664874A (en) * | 2023-08-02 | 2023-08-29 | 安徽大学 | A single-stage fine-grained lightweight point cloud 3D object detection system and method |
CN117058402B (en) * | 2023-08-15 | 2024-03-12 | 北京学图灵教育科技有限公司 | Real-time point cloud segmentation method and device based on 3D sparse convolution |
CN117058402A (en) * | 2023-08-15 | 2023-11-14 | 北京学图灵教育科技有限公司 | Real-time point cloud segmentation method and device based on 3D sparse convolution |
CN117475410B (en) * | 2023-12-27 | 2024-03-15 | 山东海润数聚科技有限公司 | Three-dimensional target detection method, system, equipment and medium based on foreground point screening |
CN117475410A (en) * | 2023-12-27 | 2024-01-30 | 山东海润数聚科技有限公司 | Three-dimensional target detection method, system, equipment and medium based on foreground point screening |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113378854A (en) | Point cloud target detection method integrating original point cloud and voxel division | |
CN111242041B (en) | Laser radar three-dimensional target rapid detection method based on pseudo-image technology | |
Sindagi et al. | Mvx-net: Multimodal voxelnet for 3d object detection | |
CN113850270B (en) | Semantic scene completion method and system based on point cloud-voxel aggregation network model | |
CN112529015B (en) | Three-dimensional point cloud processing method, device and equipment based on geometric unwrapping | |
CN110135267B (en) | A detection method for small objects in large scene SAR images | |
CN113177555B (en) | Object processing method and device based on cross-level, cross-scale and cross-attention mechanism | |
KR102333682B1 (en) | Semantic segmentation system in 3D point cloud and semantic segmentation method in 3D point cloud using the same | |
CN108509978A (en) | The multi-class targets detection method and model of multi-stage characteristics fusion based on CNN | |
CN111695494A (en) | Three-dimensional point cloud data classification method based on multi-view convolution pooling | |
CN114332620A (en) | A method for vehicle target recognition in airborne images based on feature fusion and attention mechanism | |
Mosella-Montoro et al. | 2D–3D geometric fusion network using multi-neighbourhood graph convolution for RGB-D indoor scene classification | |
CN112560865A (en) | Semantic segmentation method for point cloud under outdoor large scene | |
Megalingam et al. | Concurrent detection and identification of multiple objects using YOLO algorithm | |
CN117037142A (en) | 3D target detection method based on deep learning | |
CN114550161A (en) | An End-to-End 3D Object Sparse Detection Method | |
CN116665202A (en) | A 3D object detection method in spherical coordinates based on special-shaped 3D convolution | |
Wang et al. | A survey of 3D point cloud and deep learning-based approaches for scene understanding in autonomous driving | |
CN116563488A (en) | A 3D Object Detection Method Based on Point Cloud Columnarization | |
Drobnitzky et al. | Survey and systematization of 3D object detection models and methods | |
CN117710255A (en) | Point cloud completion method based on teacher-student network and course learning | |
CN114821033A (en) | A detection and recognition method and device for enhanced three-dimensional information based on laser point cloud | |
CN114648698A (en) | Improved 3D target detection system based on PointPillars | |
CN118625342A (en) | A multi-sensor fusion intelligent vehicle environment perception method and model based on occupancy network | |
Raparthi et al. | Machine Learning Based Deep Cloud Model to Enhance Robustness and Noise Interference |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210910 |
RJ01 | Rejection of invention patent application after publication |