CN111681212B - Three-dimensional target detection method based on laser radar point cloud data - Google Patents

Three-dimensional target detection method based on laser radar point cloud data

Info

Publication number
CN111681212B
CN111681212B (application CN202010433849.3A)
Authority
CN
China
Prior art keywords
map
feature
grid
view
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010433849.3A
Other languages
Chinese (zh)
Other versions
CN111681212A (en)
Inventor
郭裕兰
张永聪
陈铭林
敖晟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202010433849.3A priority Critical patent/CN111681212B/en
Publication of CN111681212A publication Critical patent/CN111681212A/en
Application granted granted Critical
Publication of CN111681212B publication Critical patent/CN111681212B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • G06T2207/10044Radar image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Optical Radar Systems And Details Thereof (AREA)

Abstract

The invention discloses a three-dimensional target detection method based on laser radar point cloud data. In accordance with the data characteristics of the laser radar point cloud, a dense data expression form is adopted to obtain dense features and to convert three-dimensional features into two-dimensional features, thereby effectively improving both the operation efficiency and the operation precision.

Description

Three-dimensional target detection method based on laser radar point cloud data
Technical Field
The invention relates to the technical field of three-dimensional target detection in automatic driving, in particular to a three-dimensional target detection method based on laser radar point cloud data.
Background
A laser radar (lidar) acquires object information in three-dimensional space, and the position of an object in space is calculated from the reflection time of laser pulses on the object surface.
During driving, the detection of three-dimensional targets around the vehicle is an essential component of automatic driving. Current automatic driving vehicles generally perform target detection by fusing image RGB information with laser radar point cloud information, whereas the present method uses only laser radar point cloud data as input to detect objects of interest. Although two-dimensional target detection in images has advanced significantly and achieves extremely high detection accuracy, the detection performance on three-dimensional lidar point clouds in scenes such as unmanned driving is still poor, mainly because of the sparsity of the lidar point cloud.
Apple Inc. proposed VoxelNet in 2018 to perform target detection on laser radar point cloud input data. The method voxelizes the laser radar point cloud, dividing the space into independent voxels, and extracts features from the point cloud within each voxel using a PointNet-like voxel feature encoding layer (VFE layer). Finally, three-dimensional convolution is applied, the features are concatenated in the top-view direction, and object detection is performed.
However, VoxelNet partitions the three-dimensional space along all three dimensions and extracts features from the point cloud in each voxel to form a four-dimensional feature map (three spatial dimensions plus one feature dimension), which therefore requires three-dimensional convolution. This operation is an order of magnitude slower than two-dimensional convolution. Meanwhile, owing to the sparsity of the point cloud, most voxels are empty, so a large part of the three-dimensional convolution is useless computation that nevertheless wastes computing resources.
PointPillars is also a network based on spatial voxel segmentation. It differs from VoxelNet in that the space is cut into rectangular columns (pillars) in the top-view direction and features are extracted from the point cloud within each column. The resulting feature map is three-dimensional (two spatial dimensions plus one feature dimension) and can be processed with two-dimensional convolution alone, in the same way as the feature maps used for object detection in ordinary RGB images, so it can be fed directly into existing two-dimensional detection frameworks. Its speed and accuracy are much higher than those of VoxelNet.
However, when extracting features within a column, PointPillars directly fuses all points along the same vertical direction into a single feature. This fusion is coarse, the feature distribution in the vertical direction is lost, and the features in the bird's-eye-view feature map remain very sparse, so that a great amount of computation is wasted in the two-dimensional convolution.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a three-dimensional target detection method based on laser radar point cloud data, which aims to solve the sparse-feature problem of VoxelNet and PointPillars by constructing a dense feature expression form. Compared with VoxelNet, the method retains the operation efficiency of two-dimensional convolution; compared with PointPillars, the point cloud features in the vertical direction are not forcibly compressed together, so that more vertical features of objects are retained and targets whose vertical features are important are expressed better.
Furthermore, according to the data characteristics of the laser radar point cloud, a dense data expression form is adopted to obtain dense features and to convert the three-dimensional features into two-dimensional features, so that both the operation efficiency and the operation precision are effectively improved.
In order to achieve the purpose, the invention adopts the following technical scheme:
a three-dimensional target detection method based on laser radar point cloud data comprises the following steps:
representing the point cloud as a dense surface map, wherein the number of rows in the map is K, and K is the number of channels of the laser radar; given a lidar point p = (x, y, z, r, l), where (x, y, z), r and l ∈ {0, ..., K−1} respectively denote the position, the reflectivity and the layer on which the point lies, the point p is placed in the grid (h, w) of the surface map S_{H×W}, where h = l and w is obtained by discretizing the horizontal scanning position of the point (the explicit formula is given as an equation image in the original patent);
the surface map projects the three-dimensional points into a two-dimensional grid following the surface of the scene; for each grid (h, w) of the surface map, a centroid point is obtained by averaging all points within that grid;
the depth d within the grid (h, w) is then calculated from the centroid point (the depth formula is given as an equation image in the original patent); a surface depth map D_map = {d} ∈ R^{H×W} is thereby obtained; the surface depth map stores the depth information of each grid;
a grid feature encoder based on a voxel feature encoding (VFE) layer processes each grid of the surface map to generate the features of that grid, thereby producing a regular 2D surface feature map of dimension C×H×W;
if a grid contains no points, zero padding is used; the grid feature encoder does not perform random sampling in the voxel feature encoding layer;
surface maps with N different resolutions are constructed, the first being S_{H×W} (the remaining resolutions are given as equation images in the original patent); the grid feature encoder processes these surface maps independently to generate N surface feature maps;
then, a multi-scale surface feature F ∈ R^{3C×H×W} is obtained by feature concatenation;
This multi-scale surface feature is used as an initial input for subsequent modules;
the system comprises a surface feature convolution module and a network deconvolution layer with low resolution output, wherein the network deconvolution layer is added to obtain full resolution output; generated by a surface feature convolution module
Figure BDA0002501493530000044
The front view features in (a) have the same resolution as their input surface features F, but the dimensions of the features are different;
a view conversion module converts the front view features to a bird's eye view based on the surface depth map; the depths of different objects differ and absolute depth cannot be read directly from the 2D front-view pseudo-image, so the depth of an object is obtained from the top-view features and the height is regressed after the view conversion.
Points derived from the heatmap H_O represent the positions, i.e. (x, z), of the centers of the detected objects in the top view, while the parameter map P_O contains the parameters of the objects; the detection network consists of one common feature extractor and two branches, namely a heatmap branch and a parameter branch.
It should be noted that the view conversion module has two steps: expansion and compression;
in the expansion step, the feature f at each position (h, w) of the FV features is mapped, according to the depth information, to the corresponding position (d, h, w) of the expanded feature map E, where the depth bin index d is computed from D_map(h, w) (the explicit formula is given as an equation image in the original patent) and R is the maximum depth range; if D_map(h, w) > R, d is set to D;
in the compression step, the expanded feature map is squeezed along the H axis to obtain a 2D feature map of size D×W with feature dimension c'; finally, the output is processed using M consecutive 2D convolutional layers, resulting in the final top-view feature map.
The beneficial effect of the method is that a dense feature expression form is constructed: the method retains the operation efficiency of two-dimensional convolution while the point cloud features in the vertical direction are not forcibly compressed together, so that more vertical features of objects are preserved and targets whose vertical features are important are expressed better.
Drawings
FIG. 1 is a schematic diagram of a three-dimensional object detection overall framework of the present invention;
FIG. 2 is a diagram of a cuboid voxel and a surface of the present invention;
fig. 3 is a comparative illustration of the evaluation results of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings. It should be noted that the following examples illustrate detailed embodiments and specific operations based on the technical solutions of the present invention, but the scope of the present invention is not limited to these examples.
The invention relates to a three-dimensional target detection method based on laser radar point cloud data, which comprises the following steps:
representing the point cloud as a dense surface map, wherein the number of rows in the map is K, and K is the number of channels of the laser radar; given a lidar point p = (x, y, z, r, l), where (x, y, z), r and l ∈ {0, ..., K−1} respectively denote the position, the reflectivity and the layer on which the point lies, the point p is placed in the grid (h, w) of the surface map S_{H×W}, where h = l and w is obtained by discretizing the horizontal scanning position of the point (the explicit formula is given as an equation image in the original patent);
the surface map projects the three-dimensional points into a two-dimensional grid following the surface of the scene; for each grid (h, w) of the surface map, a centroid point is obtained by averaging all points within that grid;
the depth d within the grid (h, w) is then calculated from the centroid point (the depth formula is given as an equation image in the original patent); a surface depth map D_map = {d} ∈ R^{H×W} is thereby obtained; the surface depth map stores the depth information of each grid;
a grid feature encoder based on a voxel feature encoding (VFE) layer processes each grid of the surface map to generate the features of that grid, thereby producing a regular 2D surface feature map of dimension C×H×W;
if a grid contains no points, zero padding is used; the grid feature encoder does not perform random sampling in the voxel feature encoding layer;
surface maps with N different resolutions are constructed, the first being S_{H×W} (the remaining resolutions are given as equation images in the original patent); the grid feature encoder processes these surface maps independently to generate N surface feature maps;
then, a multi-scale surface feature F ∈ R^{3C×H×W} is obtained by feature concatenation;
This multi-scale surface feature is used as an initial input for subsequent modules;
the system comprises a surface feature convolution module and a network deconvolution layer with low resolution output, wherein the network deconvolution layer is added to obtain full resolution output; generated by a surface feature convolution module
Figure BDA0002501493530000068
The front view features in (a) have the same resolution as their input surface features F, but the dimensions of the features are different;
a view conversion module converts the front view features to a bird's eye view based on the surface depth map; the depths of different objects differ and absolute depth cannot be read directly from the 2D front-view pseudo-image, so the depth of an object is obtained from the top-view features and the height is regressed after the view conversion.
Points derived from the heatmap H_O represent the positions, i.e. (x, z), of the centers of the detected objects in the top view, while the parameter map P_O contains the parameters of the objects; the detection network consists of one common feature extractor and two branches, namely a heatmap branch and a parameter branch.
It should be noted that the view conversion module has two steps: expansion and compression;
in the expansion step, the feature f at each position (h, w) of the FV features is mapped, according to the depth information, to the corresponding position (d, h, w) of the expanded feature map E, where the depth bin index d is computed from D_map(h, w) (the explicit formula is given as an equation image in the original patent) and R is the maximum depth range; if D_map(h, w) > R, d is set to D;
in the compression step, the expanded feature map is squeezed along the H axis to obtain a 2D feature map of size D×W with feature dimension c'; finally, the output is processed using M consecutive 2D convolutional layers, resulting in the final top-view feature map.
Examples
Surface map
Lidar is a commonly used sensor in autonomous driving. For example, the Velodyne HDL-64E lidar records 64 rows of points in the order of its laser beams, and the point distribution between adjacent scan lines is uniform. Based on this observation of the lidar scanning mechanism, the invention represents the point cloud in a dense form, namely a surface map. The surface map is a two-dimensional pseudo-image with K rows, where K is the number of channels of the lidar. The points along the scanning direction (usually horizontal) are placed in one row of the surface map, while the points along one column of the surface map correspond to points obtained at the same scanning position by different channels, i.e. laser beams with the same horizontal angle but different vertical angles.
Given a lidar point p = (x, y, z, r, l), where (x, y, z), r and l ∈ {0, ..., K−1} denote the position, the reflectivity and the layer on which the point lies, respectively, the point p is placed in the grid (h, w) of the surface map S_{H×W}, where h = l and w is obtained by discretizing the horizontal scanning position of the point (the explicit formula is given as an equation image in the original patent).
The surface map thus projects the three-dimensional points into a two-dimensional grid following the surface of the scene.
For each grid (h, w) of the surface map, a centroid point is obtained by averaging all points within the grid, i.e. the centroid is the mean of the point coordinates in that grid (the original patent gives this as an equation image). The depth d within the grid (h, w) is then computed from the centroid (the depth formula likewise appears as an equation image in the original patent), and a surface depth map D_map = {d} ∈ R^{H×W} is obtained. The surface depth map stores the depth information of each grid and will be used later in the view conversion module.
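For illustration only, the following is a minimal NumPy sketch of the surface map and surface depth map construction described above. It assumes the point cloud is an N×5 array of (x, y, z, reflectivity, ring index), assumes the column index w comes from discretizing the azimuth angle, and assumes the grid depth is the horizontal range of the centroid; these details are not fixed by the text above, and the exact formulas appear only as equation images in the original patent.

import numpy as np

def build_surface_map(points, num_rings=64, width=512, max_depth=70.0):
    """points: (N, 5) array of (x, y, z, reflectivity, ring index)."""
    x, y, z, r, ring = points.T
    ring = ring.astype(np.int64)

    # Assumed column index: azimuth angle discretised into `width` bins.
    azimuth = np.arctan2(y, x)                                  # [-pi, pi)
    col = ((azimuth + np.pi) / (2 * np.pi) * width).astype(np.int64)
    col = np.clip(col, 0, width - 1)

    # Accumulate per-grid centroids (mean of all points falling in a grid).
    feat_sum = np.zeros((num_rings, width, 4))                  # x, y, z, reflectivity
    count = np.zeros((num_rings, width))
    np.add.at(feat_sum, (ring, col), np.stack([x, y, z, r], axis=1))
    np.add.at(count, (ring, col), 1)
    centroid = feat_sum / np.maximum(count[..., None], 1)       # empty grids stay zero

    # Surface depth map: assumed here to be the horizontal range of the centroid,
    # clipped to the maximum depth range.
    depth = np.sqrt(centroid[..., 0] ** 2 + centroid[..., 1] ** 2)
    depth[count == 0] = 0.0
    depth = np.minimum(depth, max_depth)
    return centroid, count, depth

With a 64-channel lidar such as the HDL-64E mentioned above, num_rings = 64 would match the K rows of the surface map.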
Surface network (SurfaceNet)
SurfaceNet is the proposed method for predicting accurate detection boxes for objects using the surface map representation. It consists of four modules (as shown in fig. 1): 1) a grid feature encoder that can process an arbitrary number of points within each grid; 2) a surface feature convolution module that extracts high-level features with a two-dimensional backbone network; 3) a view conversion module that converts the features from the front view (FV) to the bird's eye view (BEV); and 4) an anchor-free head that predicts a 3D center heatmap and the 3D detection box parameters.
Grid feature encoder
Owing to the irregularity of the point cloud, the number of points within a grid is arbitrary. The grid feature encoder is designed to encode an arbitrary number of points into dense features with a fixed dimension C, as shown in fig. 1(a).
The encoder of the present invention is based on a voxel feature encoding (VFE) layer. The VFE layer processes each grid of the surface map to generate the features of that grid, thereby producing a regular 2D surface feature map of dimension C×H×W.
If a grid does not contain any points, zero padding is used. Furthermore, the grid feature encoder of the present invention does not perform random sampling in the VFE layer, for two reasons: 1) the number of points in each grid is small; 2) the distribution of points across grids is approximately uniform, so there is no need to reduce point imbalance.
To facilitate multi-scale three-dimensional target detection, the invention adopts N surface maps with different resolutions, the first being S_{H×W} (the remaining resolutions are given as equation images in the original patent). The grid feature encoder processes the three surface maps independently to generate three surface feature maps. A multi-scale surface feature F ∈ R^{3C×H×W} is then obtained by feature concatenation.
This multi-scale surface feature is used as an initial input for subsequent modules. For a clearer representation, the present invention omits "multi-scale" and uses surface features to represent multi-scale surface features.
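The following PyTorch sketch illustrates, under assumptions, how such a grid feature encoder and the multi-scale concatenation could be implemented: a simplified VFE-style layer maps a zero-padded (grid × points × features) tensor to a dense C×H×W feature map without random sampling, and lower-resolution maps are upsampled before concatenation. The tensor layout, the point feature dimension and the upsampling choice are assumptions, not the patent's exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GridFeatureEncoder(nn.Module):
    def __init__(self, in_dim=7, out_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(inplace=True))

    def forward(self, grid_points, mask):
        # grid_points: (B, H, W, T, in_dim) zero-padded points per grid
        # mask:        (B, H, W, T) 1 for real points, 0 for padding
        f = self.mlp(grid_points)                                   # (B, H, W, T, C)
        f = f.masked_fill(mask.unsqueeze(-1) == 0, float('-inf'))
        f = f.max(dim=3).values                                     # pool over the points of a grid
        f = torch.where(torch.isinf(f), torch.zeros_like(f), f)     # empty grids -> zero padding
        return f.permute(0, 3, 1, 2)                                # (B, C, H, W)

def multiscale_surface_feature(maps, full_size):
    # maps: list of N surface feature maps at different resolutions,
    # e.g. (B, C, H, W), (B, C, H/2, W/2), (B, C, H/4, W/4); upsample and concatenate.
    up = [F.interpolate(m, size=full_size, mode='nearest') for m in maps]
    return torch.cat(up, dim=1)                                     # (B, N*C, H, W)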
Surface Feature Convolutional Module (SFCM)
Since the receptive field of a surface feature is very limited (i.e., only its underlying grid), the present invention uses a 2D convolutional neural network (see fig. 1(b)) to enlarge the receptive field step by step more effectively.
In general, for computational reasons, the feature map generated by a 2D convolutional neural network has a lower resolution than its input. To avoid the performance degradation caused by low-resolution features (particularly in small target detection), the invention designs the surface feature convolution module (SFCM), which obtains a full-resolution output by adding a deconvolution layer after the low-resolution output of the network. The front view features generated by this module therefore have the same resolution as the input surface features F, but a different feature dimension.
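A minimal PyTorch sketch of such an SFCM is given below: a small 2D backbone downsamples the surface feature map and a transposed convolution restores the full resolution. The channel sizes and the number of layers are assumptions, not the patent's exact architecture.

import torch.nn as nn

class SFCM(nn.Module):
    def __init__(self, in_ch, mid_ch=128, out_ch=128):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, stride=2, padding=1),    # 1/2 resolution
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=1, padding=1),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
        )
        # Deconvolution layer added to recover the full input resolution
        # (assumes even H and W so that up(down(x)) matches the input size).
        self.up = nn.ConvTranspose2d(mid_ch, out_ch, 2, stride=2)

    def forward(self, f):                  # f: (B, 3C, H, W) surface features
        return self.up(self.down(f))       # front-view features (B, out_ch, H, W)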
View conversion module
The front view features embed the local surface information of each grid and its neighborhood into a front view. It is difficult to predict absolute depth information directly from the front view features; however, the height and width information can be discriminated from the locations of the front view features. Therefore, the invention proposes a view conversion module that converts the front view features from the front view to the bird's eye view based on the surface depth map, as shown in fig. 1(c).
The reasons for using the view conversion module are: 1) the depths of different objects differ, and absolute depth cannot be obtained from the 2D front-view pseudo-image; 2) the heights of different objects are similar because they always stand on the ground. Therefore, the present invention can easily derive the depth of an object from the top-view (BEV) features and regress its height after the view conversion.
Specifically, the view conversion module has two steps: expansion and compression. In the expansion step, the feature f at each position (h, w) of the FV features is mapped, according to the depth information, to the corresponding position (d, h, w) of the expanded feature map E, where the depth bin index d is computed from D_map(h, w) (the explicit formula is given as an equation image in the original patent) and R is the maximum depth range; if D_map(h, w) > R, d is set to D.
In the compression step, the expanded feature map is squeezed along the H axis to obtain a 2D feature map of size D×W with feature dimension c'.
Finally, the output is processed using M consecutive 2D convolutional layers to obtain the final top view feature map.
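The expansion and compression steps can be sketched as follows in PyTorch. Each front-view feature is scattered into a depth bin derived from the surface depth map, and the expanded tensor is then collapsed along the H axis; the collapse is sketched here with a max, whereas the text above describes a selection along the H axis, and the bin count D, the depth range R and the grid-to-bin mapping are assumptions.

import torch

def view_conversion(fv_feat, depth_map, num_bins=128, max_range=70.0):
    # fv_feat:   (B, C, H, W) front-view features
    # depth_map: (B, H, W) surface depth map D_map
    B, C, H, W = fv_feat.shape
    d = (depth_map / max_range * num_bins).long()
    d = d.clamp(0, num_bins - 1)                      # depths beyond R go to the last bin

    # Expansion: E has shape (B, C, D, H, W), zero everywhere except at (d, h, w).
    expanded = fv_feat.new_zeros(B, C, num_bins, H, W)
    expanded.scatter_(2,
                      d.unsqueeze(1).unsqueeze(2).expand(B, C, 1, H, W),
                      fv_feat.unsqueeze(2))

    # Compression: collapse the H axis to get a (B, C, D, W) top-view feature map.
    return expanded.max(dim=3).values

The returned (B, C, D, W) tensor corresponds to the 2D feature map of size D×W that is subsequently processed by the M consecutive 2D convolutional layers.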
Anchor-free three-dimensional target detection
As shown in fig. 2, the present invention treats a 3D object as a point with attributes. The points derived from the heatmap H_O represent the positions (x, z) of the centers of the detected objects in the top view, while the parameter map P_O contains the parameters of each object, such as the height y, the size (h, w, l) and the rotation angle θ. The detection network consists of one common feature extractor and two branches, namely a heatmap branch and a parameter branch. The common feature extraction module is similar to the RPN in VoxelNet; in contrast, however, it uses two-dimensional convolutional layers to directly process the two-dimensional features output by the view conversion module rather than three-dimensional convolutional layers. The heatmap and parameter map branches have the same topology, consisting of M consecutive 2D convolutional layers.
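For illustration, the following sketch shows one way the two branch outputs could be decoded at inference time: local maxima of the center heatmap H_O give the (x, z) centers in the top view, and the five channels of the parameter map P_O are read at those positions. The channel order (y, w, h, l, θ), the peak threshold and the grid-to-metre conversion are assumptions.

import torch
import torch.nn.functional as F

def decode_detections(heatmap, param_map, score_thr=0.3, cell_size=0.16):
    # heatmap:   (D, W) centre heatmap H_O
    # param_map: (5, D, W) parameter map P_O, assumed channel order (y, w, h, l, theta)
    hm = heatmap.unsqueeze(0).unsqueeze(0)
    peaks = (F.max_pool2d(hm, 3, stride=1, padding=1) == hm).squeeze() & (heatmap > score_thr)
    ds, ws = torch.nonzero(peaks, as_tuple=True)

    boxes = []
    for d, w in zip(ds.tolist(), ws.tolist()):
        y, bw, bh, bl, theta = param_map[:, d, w].tolist()
        x, z = w * cell_size, d * cell_size             # assumed grid-to-metre mapping
        boxes.append((x, y, z, bw, bh, bl, theta, heatmap[d, w].item()))
    return boxes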
As shown in fig. 3, SurfaceNet is evaluated on the KITTI 3D target detection dataset, which contains 7481 training point clouds and 7518 test point clouds. Three difficulty levels are used for evaluation: easy, moderate and hard. Owing to the limited number of submissions allowed to the KITTI test server, the method is evaluated by splitting the official training set into 3712 point clouds for training and 3769 point clouds for validation. The 3D bounding box intersection over union (IoU) threshold is set to 0.25 for pedestrian detection. Furthermore, the offline evaluation code of PointRCNN is used to obtain the metrics of the method. As can be seen from fig. 3, SurfaceNet achieves 66.17%, which is better than state-of-the-art methods (such as AVOD-FPN and PointPillars) by more than 7%. Moreover, the method uses only the laser radar point cloud, while AVOD-FPN uses both the point cloud and RGB images.
Loss function
SurfaceNet predicts a center heatmap H_O ∈ R^{D×W} and a parameter map P_O ∈ R^{5×D×W} for the 3D boxes. H_O is used to determine the center of an object in the (x, z) plane, and P_O is used to regress the height (y), the size (w, h, l) and the rotation angle θ.
For the center heatmap, the mean square error loss is used:
L_hm = MSE(H_gt − H_o)
H_gt is a Gaussian heatmap generated from the position (x, z) of the object's ground-truth center.
For the box parameters, the sum of the smooth L1 losses of the individual residuals is used as the localization loss:
L_loc = SmoothL1(Δy) + SmoothL1(Δw) + SmoothL1(Δh) + SmoothL1(Δl) + SmoothL1(Δθ)
where Δy, Δw, Δh, Δl and Δθ are the residuals of the corresponding attributes.
The height residual Δy is defined as the error between the true and predicted values:
Δy = y_gt − y_o
The residuals {Δw, Δh, Δl} for the predicted box size take a logarithmic form:
Δw = log(w_gt / w_o), Δh = log(h_gt / h_o), Δl = log(l_gt / l_o)
The rotation residual is defined as:
Δθ = sin(θ_gt − θ_o)
during the training, yo,wo,ho,loAnd thetaoPrediction of the map P from the parametersoThe parameters are derived from the true 3D bounding box center position, and ygt,wgt,hgt,lgtAnd thetagtIt is the parameter that corresponds to the true value of the object.
Finally, the overall loss function is defined as follows:
L = L_hm + β · L_loc
where β is a parameter used to balance the two loss terms. Of course, the overall loss function described here is only one possible choice, and functional variations based on the above are within the scope of the present invention.
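A minimal PyTorch sketch of this loss is given below: mean square error on the center heatmap plus smooth L1 on the residuals Δy, Δw, Δh, Δl and Δθ defined above, balanced by β. It assumes the regression terms are evaluated only at the ground-truth object centers, with the predictions and ground truths gathered into N×5 arrays.

import torch
import torch.nn.functional as F

def surface_net_loss(hm_pred, hm_gt, params_pred, params_gt, beta=1.0):
    # hm_pred, hm_gt: (D, W); params_*: (N, 5) rows of (y, w, h, l, theta)
    # gathered at the N ground-truth object centres.
    l_hm = F.mse_loss(hm_pred, hm_gt)

    y_o, w_o, h_o, l_o, th_o = params_pred.unbind(dim=1)
    y_g, w_g, h_g, l_g, th_g = params_gt.unbind(dim=1)

    zeros = torch.zeros_like(y_o)
    l_loc = (F.smooth_l1_loss(y_g - y_o, zeros)
             + F.smooth_l1_loss(torch.log(w_g / w_o), zeros)
             + F.smooth_l1_loss(torch.log(h_g / h_o), zeros)
             + F.smooth_l1_loss(torch.log(l_g / l_o), zeros)
             + F.smooth_l1_loss(torch.sin(th_g - th_o), zeros))

    return l_hm + beta * l_loc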
Various corresponding changes and modifications can be made by those skilled in the art based on the above technical solutions and concepts, and all such changes and modifications should be included in the protection scope of the present invention.

Claims (2)

1. A three-dimensional target detection method based on laser radar point cloud data is characterized by comprising the following steps:
representing the point cloud as a dense surface map, wherein the number of rows in the map is K, and K is the number of channels of the laser radar; given a lidar point p = (x, y, z, r, l), where (x, y, z), r and l ∈ {0, ..., K−1} respectively denote the position, the reflectivity and the layer on which the point lies, the point p is placed in the grid (h, w) of the surface map S_{H×W}, where h = l and w is obtained by discretizing the horizontal scanning position of the point (the explicit formula is given as an equation image in the original patent);
the surface map projects the three-dimensional points into a two-dimensional grid following the surface of the scene; for each grid (h, w) of the surface map, a centroid point is obtained by averaging all points within that grid;
the depth d within the grid (h, w) is then calculated from the centroid point (the depth formula is given as an equation image in the original patent), wherein a surface depth map D_map = {d} ∈ R^{H×W} is thereby obtained; the surface depth map stores the depth information of each grid;
a grid feature encoder based on a voxel feature encoding (VFE) layer processes each grid of the surface map to generate the features of that grid, thereby producing a regular 2D surface feature map of dimension C×H×W;
if a grid contains no points, zero padding is used; the grid feature encoder does not perform random sampling in the voxel feature encoding layer;
surface maps with N different resolutions are constructed, the first being S_{H×W} (the remaining resolutions are given as equation images in the original patent); the grid feature encoder processes these surface maps independently to generate the corresponding surface feature maps;
then, a multi-scale surface feature F ∈ R^{3C×H×W} is obtained by feature concatenation;
This multi-scale surface feature is used as an initial input for subsequent modules;
having surface feature convolution modules and byAdding a network deconvolution layer of low resolution output to obtain full resolution output; generated by a surface feature convolution module
Figure FDA0003461950970000021
The front view features in (a) have the same resolution as their input surface features F, but the dimensions of the features are different;
a view conversion module converts the front view features to a bird's eye view based on the surface depth map; the depths of different objects differ and absolute depth cannot be read directly from the 2D front-view pseudo-image, so the depth of the object is obtained from the top-view features and the height is regressed after the view conversion;
points derived from the heatmap H_O represent the positions, i.e. (x, z), of the centers of the detected objects in the top view, while the parameter map P_O contains the parameters of the objects; the detection network consists of one common feature extractor and two branches, namely a heatmap branch and a parameter branch.
2. The lidar point cloud data-based three-dimensional target detection method of claim 1, wherein the view conversion module comprises two steps: expansion and compression;
in the expansion step, the feature f at each position (h, w) of the FV features is mapped, according to the depth information, to the corresponding position (d, h, w) of the expanded feature map E, where the depth bin index d is computed from D_map(h, w) (the explicit formula is given as an equation image in the original patent) and R is the maximum depth range; if D_map(h, w) > R, d is set to D;
in the compression step, the expanded feature map is squeezed along the H axis to obtain a 2D feature map of size D×W with feature dimension c'; finally, the output is processed using M consecutive 2D convolutional layers, resulting in the final top-view feature map.
CN202010433849.3A 2020-05-21 2020-05-21 Three-dimensional target detection method based on laser radar point cloud data Active CN111681212B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010433849.3A CN111681212B (en) 2020-05-21 2020-05-21 Three-dimensional target detection method based on laser radar point cloud data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010433849.3A CN111681212B (en) 2020-05-21 2020-05-21 Three-dimensional target detection method based on laser radar point cloud data

Publications (2)

Publication Number Publication Date
CN111681212A CN111681212A (en) 2020-09-18
CN111681212B true CN111681212B (en) 2022-05-03

Family

ID=72452140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010433849.3A Active CN111681212B (en) 2020-05-21 2020-05-21 Three-dimensional target detection method based on laser radar point cloud data

Country Status (1)

Country Link
CN (1) CN111681212B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699806B (en) * 2020-12-31 2024-09-24 罗普特科技集团股份有限公司 Three-dimensional point cloud target detection method and device based on three-dimensional heat map
CN113095172B (en) * 2021-03-29 2022-08-05 天津大学 Point cloud three-dimensional object detection method based on deep learning
CN113219493B (en) * 2021-04-26 2023-08-25 中山大学 End-to-end cloud data compression method based on three-dimensional laser radar sensor
CN113111974B (en) 2021-05-10 2021-12-14 清华大学 Vision-laser radar fusion method and system based on depth canonical correlation analysis
CN113284163B (en) * 2021-05-12 2023-04-07 西安交通大学 Three-dimensional target self-adaptive detection method and system based on vehicle-mounted laser radar point cloud
CN113267761B (en) * 2021-05-28 2023-06-23 中国航天科工集团第二研究院 Laser radar target detection and identification method, system and computer readable storage medium
CN114155507A (en) * 2021-12-07 2022-03-08 奥特酷智能科技(南京)有限公司 Laser radar point cloud target detection method based on deep learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110264416A (en) * 2019-05-28 2019-09-20 深圳大学 Sparse point cloud segmentation method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113593012A (en) * 2017-12-14 2021-11-02 佳能株式会社 Three-dimensional model generation device, generation method, and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110264416A (en) * 2019-05-28 2019-09-20 深圳大学 Sparse point cloud segmentation method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Deep Learning Based 3D Object Detection for Automotive Radar and Camera; Michael Meyer et al.; 2019 16th European Radar Conference; 20191121; 1-10 *
VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection; Y. Zhou et al.; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 20181231; 4490-4499 *
A New Multi-view Coarse Registration Algorithm for Laser Imaging Data; Guo Yulan et al.; Computer Engineering & Science; 20131231; Vol. 35, No. 12; 146-152 *

Also Published As

Publication number Publication date
CN111681212A (en) 2020-09-18

Similar Documents

Publication Publication Date Title
CN111681212B (en) Three-dimensional target detection method based on laser radar point cloud data
CN111145174B (en) 3D target detection method for point cloud screening based on image semantic features
CN110264416B (en) Sparse point cloud segmentation method and device
Wang et al. Fusing bird’s eye view lidar point cloud and front view camera image for 3d object detection
KR102096673B1 (en) Backfilling points in a point cloud
CN110879994A (en) Three-dimensional visual inspection detection method, system and device based on shape attention mechanism
CN112288667B (en) Three-dimensional target detection method based on fusion of laser radar and camera
CN113267761B (en) Laser radar target detection and identification method, system and computer readable storage medium
CN115512132A (en) 3D target detection method based on point cloud data and multi-view image data fusion
CN115063539B (en) Image dimension-increasing method and three-dimensional target detection method
CN117274749B (en) Fused 3D target detection method based on 4D millimeter wave radar and image
CN114332134B (en) Building facade extraction method and device based on dense point cloud
CN113362385A (en) Cargo volume measuring method and device based on depth image
CN114298151A (en) 3D target detection method based on point cloud data and image data fusion
Hou et al. Planarity constrained multi-view depth map reconstruction for urban scenes
CN116486396A (en) 3D target detection method based on 4D millimeter wave radar point cloud
Krauss et al. Deterministic guided lidar depth map completion
CN113421217A (en) Method and device for detecting travelable area
CN117372680B (en) Target detection method based on fusion of binocular camera and laser radar
CN116704307A (en) Target detection method and system based on fusion of image virtual point cloud and laser point cloud
CN116778266A (en) Multi-scale neighborhood diffusion remote sensing point cloud projection image processing method
CN115932883A (en) Wire galloping boundary identification method based on laser radar
CN116129422A (en) Monocular 3D target detection method, monocular 3D target detection device, electronic equipment and storage medium
Xinming et al. China DSM generation and accuracy assessment using ZY3 images
CN114118125A (en) Multi-modal input and space division three-dimensional target detection method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Guo Yulan

Inventor after: Zhang Yongcong

Inventor after: Chen Minglin

Inventor after: Ao Cheng

Inventor before: Guo Yulan

Inventor before: Zhang Yongcong

Inventor before: Chen Minglin

Inventor before: Ao Sheng

GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20240328

Address after: 510000 No. 135 West Xingang Road, Guangdong, Guangzhou

Patentee after: SUN YAT-SEN University

Country or region after: China

Patentee after: National University of Defense Technology

Address before: 510275 No. 135 West Xingang Road, Guangzhou, Guangdong, Haizhuqu District

Patentee before: SUN YAT-SEN University

Country or region before: China