CN111681212B - Three-dimensional target detection method based on laser radar point cloud data - Google Patents

Three-dimensional target detection method based on laser radar point cloud data

Info

Publication number
CN111681212B
CN111681212B (application CN202010433849.3A)
Authority
CN
China
Prior art keywords
feature
map
grid
view
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010433849.3A
Other languages
Chinese (zh)
Other versions
CN111681212A (en
Inventor
郭裕兰
张永聪
陈铭林
敖晟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202010433849.3A priority Critical patent/CN111681212B/en
Publication of CN111681212A publication Critical patent/CN111681212A/en
Application granted granted Critical
Publication of CN111681212B publication Critical patent/CN111681212B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/0002: Inspection of images, e.g. flaw detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10028: Range image; Depth image; 3D point clouds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10032: Satellite or aerial image; Remote sensing
    • G06T 2207/10044: Radar image
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Optical Radar Systems And Details Thereof (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a three-dimensional target detection method based on laser radar point cloud data. According to the data characteristics of the laser radar point cloud, a dense data representation is adopted to obtain dense features and to convert three-dimensional features into two-dimensional features, thereby effectively improving both computational efficiency and accuracy.

Description

A three-dimensional target detection method based on lidar point cloud data

Technical Field

The invention relates to the technical field of three-dimensional target detection in autonomous driving, and in particular to a three-dimensional target detection method based on lidar point cloud data.

Background

Lidar can acquire information about objects in three-dimensional space: it computes the position of an object in space from the reflection time of laser pulses off the object surface.

Detecting three-dimensional objects around the vehicle while it is driving is a fundamental component of autonomous driving. Current autonomous vehicles generally perform object detection by fusing RGB image information with lidar point cloud information. This patent uses only lidar point cloud data as input to detect objects of interest. Although two-dimensional object detection in 2D images has made great progress and reached very high accuracy, detection on 3D lidar point clouds in scenarios such as autonomous driving still performs poorly, mainly because of the sparsity of lidar point clouds.

Further, Apple proposed VoxelNet in 2018 for object detection on lidar point cloud input. It voxelizes the lidar point cloud, dividing space into independent voxels, and extracts features from the points inside each voxel with a PointNet-like VFE layer (feature learning network). Finally, 3D convolutions are applied and the features are concatenated along the bird's-eye-view direction before object detection.

However, VoxelNet partitions three-dimensional space into voxels and extracts features from the point cloud in each voxel, producing a four-dimensional feature map (three spatial dimensions plus one feature dimension) that must be processed with 3D convolutions. These are an order of magnitude slower than 2D convolutions. Moreover, because of the sparsity of the point cloud, the vast majority of voxels are empty, so a large fraction of the 3D convolution operations are useless yet still consume computing resources.

Furthermore, PointPillars is also a network based on spatial voxelization. Unlike VoxelNet, it divides space into vertical cuboid pillars in the bird's-eye view and extracts features from the points inside each pillar. The resulting feature map is three-dimensional (two spatial dimensions plus one feature dimension) and can be processed with only 2D convolutions, just like the feature maps used in ordinary RGB image object detection, so the downstream framework of 2D RGB image detection can be applied directly. Its speed and accuracy are much better than those of VoxelNet.

During in-pillar feature extraction, PointPillars directly fuses the points along the same vertical direction into a single feature. This fusion is relatively coarse: the feature distribution along the vertical direction is no longer distinct, and the bird's-eye-view feature map is also very sparse, so a large amount of computation is wasted in the 2D convolutions.

Summary of the Invention

In view of the deficiencies of the prior art, the present invention proposes a three-dimensional target detection method based on lidar point cloud data, aiming to solve the feature sparsity problem of VoxelNet and PointPillars by constructing a dense feature representation. Compared with VoxelNet, it has the computational efficiency of 2D convolution; compared with PointPillars, the point cloud features along the vertical direction are not forcibly compressed together, so more vertical features of objects are retained and targets whose vertical features are important are better represented.

Furthermore, according to the data characteristics of the lidar point cloud, the present invention adopts a dense data representation to obtain dense features and convert three-dimensional features into two-dimensional features, effectively improving computational efficiency and accuracy.

In order to achieve the above object, the present invention adopts the following technical solution:

A three-dimensional target detection method based on lidar point cloud data, comprising the following steps:

The point cloud is represented as a dense surface map whose number of rows is K, where K is the number of lidar channels. Given a lidar point p = (x, y, z, r, l), where (x, y, z), r and l ∈ {0, ..., K-1} denote the position, reflectivity and laser-layer index of the point, respectively, the point p is placed in grid cell (h, w) of the surface map S_{H×W}, where h = l and the column index w is computed from the point's horizontal scanning position (formula image BDA0002501493530000031).

The surface map projects the three-dimensional points onto a two-dimensional grid according to the surface of the scene. For each grid cell (h, w) of the surface map, the centroid point p̄ is obtained by averaging all points within the cell (image BDA0002501493530000032), and the depth d of cell (h, w) is computed from the centroid (formula image BDA0002501493530000033).

The surface depth map D_map = {d} ∈ R^(H×W) is then obtained; it stores the depth information of every grid cell.

A grid feature encoder is built on a voxel feature encoding (VFE) layer. The VFE layer processes each grid cell of the surface map to generate a feature for that cell, producing a regular 2D surface feature map (image BDA0002501493530000034). If a cell contains no points, zero padding is used; in addition, the grid feature encoder does not perform the random sampling used in the VFE layer.

Surface maps at N different resolutions (image BDA0002501493530000041) are processed independently by the grid feature encoder to generate N surface feature maps (image BDA0002501493530000042). A multi-scale surface feature F ∈ R^(3C×H×W) is then obtained by feature concatenation (image BDA0002501493530000043).

This multi-scale surface feature is used as the initial input to the subsequent modules.

A surface feature convolution module is provided, and a full-resolution output is obtained by adding a deconvolution layer to the low-resolution output of the network; the front-view features generated by the surface feature convolution module (image BDA0002501493530000044) have the same resolution as the input surface feature F, but a different feature dimension.

A view conversion module converts the front-view features, based on the depth surface map, from the front view to a bird's-eye view; the depths of different objects differ, but absolute depth cannot be obtained directly from the 2D front-view pseudo-image, so the depth of an object is obtained from the bird's-eye-view features and the height is regressed after the view transformation.

Points derived from the heatmap H_O represent the positions of the centers of detected objects in the bird's-eye view, i.e. (x, z), while the parameter map P_O contains the parameters of the objects; the detection network consists of a common feature extractor and two branches, namely a heatmap branch and a parameter branch.

It should be noted that the view conversion module has two steps: expansion and compression.

In the expansion step, the feature f at each position (h, w) of the front-view (FV) feature map is mapped to the corresponding position (d, h, w) of an expanded feature map E according to the depth information D_map, where the depth index d is computed from D_map(h, w) (formula image BDA0002501493530000045).

Here R is the maximum depth range; if D_map(h, w) > R, then d = D is used.

In the compression step, the expanded feature map is squeezed along its H axis by random selection, yielding a 2D feature map of size D×W with feature dimension c'; finally, M consecutive 2D convolution layers process this output to obtain the final bird's-eye-view feature map.

The beneficial effect of the present invention is that a dense feature representation is constructed, which not only has the computational efficiency of 2D convolution but also avoids forcibly compressing the vertical point cloud features together; more vertical features of objects are retained, so targets whose vertical features are important are better represented.

Description of the Drawings

Fig. 1 is a schematic diagram of the overall three-dimensional target detection framework of the present invention;

Fig. 2 shows the cuboid voxels and the surface map of the present invention;

Fig. 3 is a comparison chart of the evaluation results of the present invention.

Detailed Description

The present invention is further described below with reference to the accompanying drawings. It should be noted that the following embodiment is based on the above technical solution and gives a detailed implementation and specific operating process, but the protection scope of the present invention is not limited to this embodiment.

The present invention provides a three-dimensional target detection method based on lidar point cloud data, comprising the following steps:

The point cloud is represented as a dense surface map whose number of rows is K, where K is the number of lidar channels. Given a lidar point p = (x, y, z, r, l), where (x, y, z), r and l ∈ {0, ..., K-1} denote the position, reflectivity and laser-layer index of the point, respectively, the point p is placed in grid cell (h, w) of the surface map S_{H×W}, where h = l and the column index w is computed from the point's horizontal scanning position (formula image BDA0002501493530000061).

The surface map projects the three-dimensional points onto a two-dimensional grid according to the surface of the scene. For each grid cell (h, w) of the surface map, the centroid point p̄ is obtained by averaging all points within the cell (image BDA0002501493530000062), and the depth d of cell (h, w) is computed from the centroid (formula image BDA0002501493530000063).

The surface depth map D_map = {d} ∈ R^(H×W) is then obtained; it stores the depth information of every grid cell.

A grid feature encoder is built on a voxel feature encoding (VFE) layer. The VFE layer processes each grid cell of the surface map to generate a feature for that cell, producing a regular 2D surface feature map (image BDA0002501493530000064). If a cell contains no points, zero padding is used; in addition, the grid feature encoder does not perform the random sampling used in the VFE layer.

Surface maps at N different resolutions (image BDA0002501493530000065) are processed independently by the grid feature encoder to generate N surface feature maps (image BDA0002501493530000066). A multi-scale surface feature F ∈ R^(3C×H×W) is then obtained by feature concatenation (image BDA0002501493530000067).

This multi-scale surface feature is used as the initial input to the subsequent modules.

A surface feature convolution module is provided, and a full-resolution output is obtained by adding a deconvolution layer to the low-resolution output of the network; the front-view features generated by the surface feature convolution module (image BDA0002501493530000068) have the same resolution as the input surface feature F, but a different feature dimension.

A view conversion module converts the front-view features, based on the depth surface map, from the front view to a bird's-eye view; the depths of different objects differ, but absolute depth cannot be obtained directly from the 2D front-view pseudo-image, so the depth of an object is obtained from the bird's-eye-view features and the height is regressed after the view transformation.

Points derived from the heatmap H_O represent the positions of the centers of detected objects in the bird's-eye view, i.e. (x, z), while the parameter map P_O contains the parameters of the objects; the detection network consists of a common feature extractor and two branches, namely a heatmap branch and a parameter branch.

It should be noted that the view conversion module has two steps: expansion and compression.

In the expansion step, the feature f at each position (h, w) of the front-view (FV) feature map is mapped to the corresponding position (d, h, w) of an expanded feature map E according to the depth information D_map, where the depth index d is computed from D_map(h, w) (formula image BDA0002501493530000071).

Here R is the maximum depth range; if D_map(h, w) > R, then d = D is used.

In the compression step, the expanded feature map is squeezed along its H axis by random selection, yielding a 2D feature map of size D×W with feature dimension c'; finally, M consecutive 2D convolution layers process this output to obtain the final bird's-eye-view feature map.

Example

Surface Map

Lidar is a commonly used sensor in autonomous driving. For example, the Velodyne HDL-64E lidar records 64 rows of points in the order of its laser beams, and the point distribution between adjacent scan lines is uniform. Based on this observation of the lidar scanning mechanism, the present invention represents the point cloud in a dense form called a surface map. The surface map is a two-dimensional pseudo-image with K rows, where K is the number of lidar channels. Points along the scanning direction (usually horizontal) are placed in one row of the surface map, while the points in one column of the surface map correspond to points acquired by different channels (i.e. laser beams with the same horizontal but different vertical angles) during a single scan.

Given a lidar point p = (x, y, z, r, l), where (x, y, z), r and l ∈ {0, ..., K-1} denote the position, reflectivity and laser-layer index of the point, respectively, the point p is placed in grid cell (h, w) of the surface map S_{H×W}, where h = l and the column index w is computed from the point's horizontal scanning position (formula image BDA0002501493530000081). The surface map projects the three-dimensional points onto a two-dimensional grid according to the surface of the scene.

For each grid cell (h, w) of the surface map, the centroid point p̄ is obtained by averaging all points within the cell (image BDA0002501493530000082), and the depth d of cell (h, w) is computed from the centroid (formula image BDA0002501493530000083).

The surface depth map D_map = {d} ∈ R^(H×W) is then obtained. It stores the depth information of every grid cell and is used later by the view conversion module.
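
For illustration, a minimal NumPy sketch of this construction is given below. It is only a sketch under stated assumptions, not the patent's exact procedure: the azimuth-to-column mapping and the use of the centroid range as the cell depth are assumptions, since the patent defines both only in formula images.

    import numpy as np

    def build_surface_map(points, K=64, W=512, fov=(-np.pi, np.pi)):
        # points: (N, 5) array of (x, y, z, reflectivity, layer index l)
        x, y, z, r, l = points.T
        h = l.astype(int)                                  # row = laser channel
        azimuth = np.arctan2(y, x)                         # horizontal scan angle (assumed)
        w = ((azimuth - fov[0]) / (fov[1] - fov[0]) * W).astype(int)
        w = np.clip(w, 0, W - 1)

        # per-cell centroid by averaging all points that fall in the cell
        sums = np.zeros((K, W, 3))
        counts = np.zeros((K, W))
        np.add.at(sums, (h, w), points[:, :3])
        np.add.at(counts, (h, w), 1)
        centroid = sums / np.maximum(counts[..., None], 1)

        depth_map = np.linalg.norm(centroid, axis=-1)      # assumed: range of the centroid
        depth_map[counts == 0] = 0.0                       # empty cells carry no depth
        return centroid, counts, depth_map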

SurfaceNet

SurfaceNet is a method that uses the surface map representation to predict accurate detection boxes for objects. It consists of four modules (as shown in Fig. 1): 1) a grid feature encoder, which can process an arbitrary number of points within each grid cell; 2) a surface feature convolution module, which uses a 2D backbone network to extract high-level features; 3) a view conversion module, which converts features from the front view (FV) to the bird's-eye view (BEV); and 4) prediction of the 3D bounding box parameters and an anchor-free 3D center heatmap.

Grid Feature Encoder

Because of the irregularity of point clouds, the number of points in a grid cell is arbitrary. The grid feature encoder is designed to encode an arbitrary number of points into a dense feature with a fixed dimension C, as shown in Fig. 1(a).

The encoder of the present invention is based on the voxel feature encoding (VFE) layer. The VFE layer processes each cell of the surface map to generate a feature for that cell, producing a regular 2D surface feature map (image BDA0002501493530000091). If a cell contains no points, zero padding is used. Furthermore, the grid feature encoder does not perform the random sampling used in the VFE layer, for two reasons: 1) the number of points in each cell is small; 2) the distribution of points across cells is approximately uniform, so there is no need to reduce point imbalance.
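
A hedged PyTorch sketch of such a VFE-style cell encoder is shown below. The single linear-BatchNorm-ReLU stage followed by max pooling, and the layer sizes, are assumptions; only the behaviors stated above (fixed output dimension C, zero features for empty cells, no random sampling) come from the patent.

    import torch
    import torch.nn as nn

    class GridFeatureEncoder(nn.Module):
        # Encode a variable number of points per surface-map cell into a fixed
        # C-dimensional feature (VFE-style), without random sampling.
        def __init__(self, in_dim=5, out_dim=64):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(in_dim, out_dim),
                nn.BatchNorm1d(out_dim),
                nn.ReLU(inplace=True),
            )

        def forward(self, cell_points, mask):
            # cell_points: (num_cells, max_pts, in_dim), zero-padded
            # mask: (num_cells, max_pts) boolean, True for real points
            n, m, d = cell_points.shape
            feat = self.mlp(cell_points.reshape(n * m, d)).reshape(n, m, -1)
            feat = feat.masked_fill(~mask.unsqueeze(-1), float('-inf'))
            pooled = feat.max(dim=1).values          # per-cell max pooling
            pooled[~mask.any(dim=1)] = 0.0           # empty cells get a zero feature
            return pooled                            # (num_cells, out_dim)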

To facilitate multi-scale three-dimensional target detection, the present invention uses surface maps at N different resolutions (image BDA0002501493530000092). They are processed independently by the grid feature encoder to generate three surface feature maps (image BDA0002501493530000093). A multi-scale surface feature F ∈ R^(3C×H×W) is then obtained by feature concatenation (image BDA0002501493530000094).

This multi-scale surface feature is used as the initial input to the subsequent modules. For clarity, "multi-scale" is omitted below, and "surface feature" denotes the multi-scale surface feature.
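
The concatenation into a single 3C×H×W feature can be sketched as follows; upsampling the coarser maps to the full H×W resolution before concatenation is an assumption, since the patent only states that the per-scale features are concatenated.

    import torch
    import torch.nn.functional as Fnn

    def concat_multiscale(feature_maps, target_hw):
        # feature_maps: list of per-scale surface features, each (C, Hi, Wi)
        # returns an (N*C, H, W) multi-scale surface feature
        H, W = target_hw
        resized = [Fnn.interpolate(f.unsqueeze(0), size=(H, W), mode='bilinear',
                                   align_corners=False).squeeze(0)
                   for f in feature_maps]
        return torch.cat(resized, dim=0)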

Surface Feature Convolutional Module (SFCM)

Since the receptive field of a surface feature is very limited (i.e. it covers only its own grid cell), the present invention uses a 2D convolutional neural network (see Fig. 1(b)) to enlarge the receptive field efficiently and progressively.

In general, the feature map generated by a 2D convolutional neural network has a lower resolution than its input image, for reasons of computational cost. To avoid the performance degradation caused by low-resolution features (especially for small-object detection), the present invention designs a surface feature convolution module (SFCM) and adds a deconvolution layer to the low-resolution output of the network to obtain a full-resolution output. The front-view feature generated by this module (image BDA0002501493530000101) therefore has the same resolution as its input surface feature F, but a different feature dimension.
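
A minimal sketch of such a module is shown below, assuming one strided 2D convolution stage and one transposed convolution that restores the input resolution; the channel widths and the depth of the backbone are assumptions.

    import torch.nn as nn

    class SurfaceFeatureConvModule(nn.Module):
        # 2D backbone that enlarges the receptive field of the surface feature
        # and restores full resolution with a deconvolution (transposed conv) layer.
        def __init__(self, in_ch=192, mid_ch=128, out_ch=128):
            super().__init__()
            self.down = nn.Sequential(
                nn.Conv2d(in_ch, mid_ch, 3, stride=2, padding=1),
                nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
                nn.Conv2d(mid_ch, mid_ch, 3, stride=1, padding=1),
                nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            )
            self.up = nn.ConvTranspose2d(mid_ch, out_ch, kernel_size=2, stride=2)

        def forward(self, surface_feature):               # (B, 3C, H, W)
            return self.up(self.down(surface_feature))    # (B, out_ch, H, W)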

View Conversion Module

The front-view feature embeds the local surface information of each grid cell and its neighborhood into the front view. However, it is difficult to predict absolute depth information directly from front-view features, whereas the height and width information can be inferred from the positions of the front-view features. Therefore, the present invention proposes a view conversion module that converts the front-view features, based on the depth surface map, from the front view to the bird's-eye view, as shown in Fig. 1(c).

The reasons for using the view conversion module are: 1) the depths of different objects differ, and absolute depth cannot be obtained directly from the 2D front-view pseudo-image; 2) the heights of different objects are similar, because they always stand on the ground. Therefore, the present invention can easily obtain the depth of an object from the bird's-eye-view (BEV) features and regress the height after the view transformation.

Specifically, the view conversion module has two steps: expansion and compression. In the expansion step, the feature f at each position (h, w) of the FV feature map is mapped to the corresponding position (d, h, w) of an expanded feature map E according to the depth information D_map, where the depth index d is computed from D_map(h, w) (formula image BDA0002501493530000102).

Here R is the maximum depth range. If D_map(h, w) > R, then d = D is used.

In the compression step, the present invention squeezes the expanded feature map along its H axis by random selection, obtaining a 2D feature map of size D×W with feature dimension c'.

Finally, M consecutive 2D convolution layers are applied to this output to obtain the final bird's-eye-view feature map.
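
A hedged sketch of the expansion and compression steps is given below. The discretization d = floor(D_map / R · D) and the way one occupied H position is chosen per (d, w) column are assumptions; the patent specifies the mapping only in a formula image and states that the squeeze is performed by random selection. The M consecutive 2D convolution layers that follow are not shown.

    import torch

    def view_transform(fv_feat, depth_map, D=128, R=70.0):
        # fv_feat: (C, H, W) front-view feature; depth_map: (H, W) surface depth map
        # returns a (C, D, W) bird's-eye-view feature map
        C, H, W = fv_feat.shape
        d_idx = torch.clamp((depth_map / R * D).long(), max=D - 1)   # assumed discretization

        # expansion: place each (h, w) feature at (d, h, w)
        expanded = torch.zeros(C, D, H, W)
        h_idx = torch.arange(H).unsqueeze(1).expand(H, W)
        w_idx = torch.arange(W).unsqueeze(0).expand(H, W)
        expanded[:, d_idx, h_idx, w_idx] = fv_feat

        # compression: squeeze the H axis by randomly picking an occupied slot
        occupied = (expanded.abs().sum(dim=0) > 0).float()           # (D, H, W)
        choice = (torch.rand(D, H, W) * occupied).argmax(dim=1)      # (D, W)
        index = choice.unsqueeze(0).unsqueeze(2).expand(C, D, 1, W)
        bev = expanded.gather(2, index).squeeze(2)                   # (C, D, W)
        return bev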

3D Object Detection Without Anchor Boxes

As shown in Fig. 2, the present invention treats 3D objects as points with attributes. A point derived from the heatmap H_O represents the position of a detected object's center in the bird's-eye view (i.e. x, z), while the parameter map P_O contains the object's parameters, such as the height y, the size (h, w, l) and the rotation angle θ. The detection network of the present invention consists of a common feature extractor and two branches (a heatmap branch and a parameter branch). The common feature extraction module is similar to the RPN in VoxelNet; the difference is that 2D convolution layers, rather than 3D convolution layers, directly process the 2D features output by the view conversion module. The heatmap and parameter-map branches have the same topology and consist of M consecutive 2D convolution layers.
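
The following sketch illustrates the anchor-free head with a heatmap branch and a parameter branch of identical topology; the number of layers M, the channel widths and the single-class heatmap are assumptions.

    import torch.nn as nn

    class DetectionHead(nn.Module):
        # Shared BEV features -> center heatmap H_O and parameter map P_O
        def __init__(self, in_ch=128, mid_ch=128, num_classes=1, M=3):
            super().__init__()
            def branch(out_ch):
                layers, ch = [], in_ch
                for _ in range(M - 1):
                    layers += [nn.Conv2d(ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True)]
                    ch = mid_ch
                layers.append(nn.Conv2d(ch, out_ch, 1))
                return nn.Sequential(*layers)
            self.heatmap_branch = branch(num_classes)   # H_O: object-center heatmap
            self.param_branch = branch(5)               # P_O: height y, size (w, h, l), angle θ

        def forward(self, bev_feat):                    # (B, in_ch, D, W)
            return self.heatmap_branch(bev_feat), self.param_branch(bev_feat)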

As shown in Fig. 3, the present invention evaluates SurfaceNet on the KITTI 3D object detection dataset, which contains 7481 training and 7518 test point clouds. The evaluation uses three difficulty levels: easy, moderate and hard. Because access to the KITTI test server is limited, the present invention evaluates the method by splitting the official training set into 3712 point clouds for training and 3769 point clouds for validation. The 3D bounding box intersection-over-union (IoU) threshold is set to 0.25 for pedestrian detection. In addition, the offline evaluation code of PointRCNN is used to compute the metrics of the method. As shown in Fig. 3, SurfaceNet reaches 66.17%, outperforming state-of-the-art methods such as AVOD-FPN and PointPillars by more than 7%. Moreover, the method of the present invention uses only the lidar point cloud, whereas AVOD-FPN uses both the point cloud and RGB images.

Loss Function

The SurfaceNet of the present invention predicts a center heatmap H_O ∈ R^(D×W) and a parameter map P_O ∈ R^(5×D×W) for the 3D bounding boxes. H_O is used to determine the center of an object in the (x, z) plane, while P_O is used to regress the height (y), the size (w, h, l) and the rotation angle θ.

For the regression of the center heatmap, the present invention uses a mean squared error loss:

L_hm = MSE(H_gt - H_O)

H_gt is a Gaussian heatmap generated from the position (x, z) of the ground-truth object center.

For the parameters, the present invention uses the sum of the smooth L1 losses of the individual parameters as the localization loss:

L_loc = SmoothL1(Δy) + SmoothL1(Δw) + SmoothL1(Δh) + SmoothL1(Δl) + SmoothL1(Δθ)

where Δy, Δw, Δh, Δl and Δθ are the residual terms of the corresponding attributes.

The height residual Δy is defined as the error between the ground-truth and the predicted value:

Δy = y_gt - y_o

The residuals of the predicted box size {Δw, Δh, Δl} use a logarithmic form (formula image BDA0002501493530000122).

The rotation residual is defined as:

Δθ = sin(θ_gt - θ_o)

During training, y_o, w_o, h_o, l_o and θ_o are the parameters read from the parameter prediction map P_o at the center position of the ground-truth 3D bounding box, while y_gt, w_gt, h_gt, l_gt and θ_gt are the corresponding ground-truth parameters of the object.

Finally, the total loss function is defined as:

L = L_hm + βL_loc

where β is a balance parameter that adjusts the weight between the two loss terms. Of course, the total loss function described here is only one possibility; variants derived from it shall also fall within the protection scope of the present invention.
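
A sketch of the total loss L = L_hm + βL_loc is shown below. The mean squared error on the Gaussian center heatmap and the smooth L1 sum over the residuals follow the description above; the exact logarithmic form of the size residuals and the way parameters are read out at the ground-truth centers are assumptions.

    import torch
    import torch.nn.functional as Fnn

    def surfacenet_loss(heatmap_pred, params_pred, heatmap_gt, params_gt, centers, beta=1.0):
        # heatmap_pred, heatmap_gt: (B, 1, D, W); params_pred: (B, 5, D, W) as (y, w, h, l, θ)
        # params_gt: (N, 5) ground-truth parameters; centers: (N, 3) long tensor of
        # (batch index, d index, w index) for each ground-truth BEV center
        l_hm = Fnn.mse_loss(heatmap_pred, heatmap_gt)            # Gaussian heatmap MSE

        b, d, w = centers[:, 0], centers[:, 1], centers[:, 2]
        pred = params_pred[b, :, d, w]                           # (N, 5) read at GT centers

        d_y = params_gt[:, 0] - pred[:, 0]                       # height residual
        d_size = torch.log(params_gt[:, 1:4] / pred[:, 1:4].clamp(min=1e-3))  # assumed log form
        d_theta = torch.sin(params_gt[:, 4] - pred[:, 4])        # rotation residual

        residuals = torch.cat([d_y.unsqueeze(1), d_size, d_theta.unsqueeze(1)], dim=1)
        l_loc = Fnn.smooth_l1_loss(residuals, torch.zeros_like(residuals))
        return l_hm + beta * l_loc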

For those skilled in the art, various corresponding changes and modifications can be made on the basis of the above technical solution and concept, and all such changes and modifications shall fall within the protection scope of the claims of the present invention.

Claims (2)

1. A three-dimensional target detection method based on laser radar point cloud data, characterized by comprising the following steps:

representing the point cloud as a dense surface map, wherein the number of rows in the map is K, and K is the number of channels of the laser radar; given a lidar point p = (x, y, z, r, l), where (x, y, z), r and l ∈ {0, ..., K-1} denote the position, reflectivity and laser-layer index of the point, respectively, the point p is located in grid cell (h, w) of the surface map S_{H×W}, where h = l and w is given by the formula image FDA0003461950970000011;

projecting, by the surface map, the three-dimensional points into a two-dimensional grid according to the surface of the scene; for each grid cell (h, w) of the surface map, obtaining a centroid point by averaging all points within the cell (image FDA0003461950970000012), and computing the depth within cell (h, w) from the centroid (formula image FDA0003461950970000013);

obtaining a surface depth map D_map = {d} ∈ R^(H×W), the surface depth map storing the depth information of each grid cell;

using a grid feature encoder based on a voxel feature encoding layer, wherein the voxel feature encoding layer processes each grid cell of the surface map to generate a feature for that cell, thereby producing a regular 2D surface feature map (image FDA0003461950970000014); if a cell has no points, zero padding is used; and the grid feature encoder does not perform the random sampling of the voxel feature encoding layer;

processing surface maps with N different resolutions (image FDA0003461950970000015) independently by the grid feature encoder to generate three surface feature maps (image FDA0003461950970000016), and then obtaining a multi-scale surface feature F ∈ R^(3C×H×W) by feature concatenation (image FDA0003461950970000017);

using this multi-scale surface feature as the initial input of subsequent modules;

providing a surface feature convolution module, and obtaining a full-resolution output by adding a deconvolution layer to the low-resolution output of the network; the front-view features generated by the surface feature convolution module (image FDA0003461950970000021) have the same resolution as the input surface feature F, but a different feature dimension;

providing a view conversion module that converts the front-view features, based on the depth surface map, from the front view to a bird's-eye view, wherein the depths of different objects differ but absolute depth cannot be obtained directly from the 2D front-view pseudo-image; obtaining the depth of an object from the bird's-eye-view features, and regressing the height after the view transformation;

wherein points derived from the heatmap H_O represent the positions of the centers of detected objects in the bird's-eye view, i.e. (x, z), while the parameter map P_O contains the parameters of the objects; and the detection network consists of one common feature extractor and two branches, namely a heatmap branch and a parameter branch.
2. The three-dimensional target detection method based on laser radar point cloud data of claim 1, wherein the view conversion module comprises two steps: expansion and compression;

in the expansion step, the feature f at each position (h, w) of the FV feature is mapped to the corresponding position (d, h, w) of the expanded feature map E according to the depth information D, where d is given by the formula image FDA0003461950970000022;

where R is the maximum depth range; if D_map(h, w) > R, d is set to D;

in the compression step, a 2D feature map of size D×W and dimension c' is obtained by randomly selecting and squeezing the expanded feature map along its H axis; finally, the output is processed using M consecutive 2D convolution layers, resulting in the final bird's-eye-view feature map.
CN202010433849.3A 2020-05-21 2020-05-21 Three-dimensional target detection method based on laser radar point cloud data Active CN111681212B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010433849.3A CN111681212B (en) 2020-05-21 2020-05-21 Three-dimensional target detection method based on laser radar point cloud data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010433849.3A CN111681212B (en) 2020-05-21 2020-05-21 Three-dimensional target detection method based on laser radar point cloud data

Publications (2)

Publication Number Publication Date
CN111681212A CN111681212A (en) 2020-09-18
CN111681212B true CN111681212B (en) 2022-05-03

Family

ID=72452140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010433849.3A Active CN111681212B (en) 2020-05-21 2020-05-21 Three-dimensional target detection method based on laser radar point cloud data

Country Status (1)

Country Link
CN (1) CN111681212B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699806B (en) * 2020-12-31 2024-09-24 罗普特科技集团股份有限公司 Three-dimensional point cloud target detection method and device based on three-dimensional heat map
CN113095172B (en) * 2021-03-29 2022-08-05 天津大学 Point cloud three-dimensional object detection method based on deep learning
CN113219493B (en) * 2021-04-26 2023-08-25 中山大学 An end-to-end point cloud data compression method based on 3D lidar sensor
CN113111974B (en) 2021-05-10 2021-12-14 清华大学 Vision-laser radar fusion method and system based on depth canonical correlation analysis
CN113284163B (en) * 2021-05-12 2023-04-07 西安交通大学 Three-dimensional target self-adaptive detection method and system based on vehicle-mounted laser radar point cloud
CN113267761B (en) * 2021-05-28 2023-06-23 中国航天科工集团第二研究院 Laser radar target detection and identification method, system and computer readable storage medium
CN114155507A (en) * 2021-12-07 2022-03-08 奥特酷智能科技(南京)有限公司 Laser radar point cloud target detection method based on deep learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110264416A (en) * 2019-05-28 2019-09-20 深圳大学 Sparse point cloud segmentation method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110622215B (en) * 2017-12-14 2021-07-23 佳能株式会社 Three-dimensional model generation device, generation method and program

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110264416A (en) * 2019-05-28 2019-09-20 深圳大学 Sparse point cloud segmentation method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Deep Learning Based 3D Object Detection for Automotive Radar and Camera;Michael Meyer et al;《2019 16th European Radar Conference》;20191121;1-10 *
VoxelNet:End-to-end Learning for Point Cloud Based 3D Object Detection;Y ZHOU et al;《in proeedings of the IEEE conference on computer vision and pattern recognition》;20181231;4490-4499 *
A new multi-view coarse registration algorithm for laser imaging data; Guo Yulan et al.; Computer Engineering & Science; 2013-12-31; Vol. 35, No. 12; 146-152 *

Also Published As

Publication number Publication date
CN111681212A (en) 2020-09-18

Similar Documents

Publication Publication Date Title
CN111681212B (en) Three-dimensional target detection method based on laser radar point cloud data
Wang et al. Fusing bird’s eye view lidar point cloud and front view camera image for 3d object detection
US11488308B2 (en) Three-dimensional object detection method and system based on weighted channel features of a point cloud
CN111145174B (en) 3D target detection method for point cloud screening based on image semantic features
CN113052109B (en) A 3D object detection system and a 3D object detection method thereof
CN103810744B (en) It is backfilled a little in cloud
CN110879994A (en) 3D visual detection method, system and device based on shape attention mechanism
Lafarge et al. A hybrid multiview stereo algorithm for modeling urban scenes
CN108109139B (en) Airborne LIDAR three-dimensional building detection method based on gray voxel model
US20150049937A1 (en) Method and apparatus for processing images
CN107424166B (en) Point cloud segmentation method and device
CN115063539B (en) Image dimension-increasing method and three-dimensional target detection method
CN113421217B (en) Drivable area detection method and device
CN111161267A (en) A segmentation method of 3D point cloud model
Wang et al. Fusing bird view lidar point cloud and front view camera image for deep object detection
CN112099046A (en) Airborne LIDAR 3D Plane Detection Method Based on Multivalued Voxel Model
Hou et al. Planarity constrained multi-view depth map reconstruction for urban scenes
CN114519681A (en) Automatic calibration method and device, computer readable storage medium and terminal
CN112509126B (en) Method, device, equipment and storage medium for detecting three-dimensional object
CN112712062A (en) Monocular three-dimensional object detection method and device based on decoupling truncated object
CN116188931A (en) Processing method and device for detecting point cloud target based on fusion characteristics
Alsfasser et al. Exploiting polar grid structure and object shadows for fast object detection in point clouds
CN116051489A (en) Bird's-eye view view angle feature map processing method, device, electronic device, and storage medium
CN116704307A (en) Target detection method and system based on fusion of image virtual point cloud and laser point cloud
CN115932883A (en) LiDAR-based wire galloping boundary recognition method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Guo Yulan

Inventor after: Zhang Yongcong

Inventor after: Chen Minglin

Inventor after: Ao Cheng

Inventor before: Guo Yulan

Inventor before: Zhang Yongcong

Inventor before: Chen Minglin

Inventor before: Ao Sheng

CB03 Change of inventor or designer information
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240328

Address after: 510000 No. 135 West Xingang Road, Guangdong, Guangzhou

Patentee after: SUN YAT-SEN University

Country or region after: China

Patentee after: National University of Defense Technology

Address before: 510275 No. 135 West Xingang Road, Guangzhou, Guangdong, Haizhuqu District

Patentee before: SUN YAT-SEN University

Country or region before: China

TR01 Transfer of patent right