CN116189147A

CN116189147A - YOLO-based three-dimensional point cloud low-power-consumption rapid target detection method

Info

Publication number: CN116189147A
Application number: CN202310155654.0A
Authority: CN
Inventors: 柳冠华; 吴振宇; 赵亮; 崔俊涛; 黄涛
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2023-02-23
Filing date: 2023-02-23
Publication date: 2023-05-30

Abstract

The invention provides a YOLO-based three-dimensional point cloud low-power-consumption rapid target detection method, which belongs to the field of target detection, and comprises a point cloud metadata processing step, a BEV mapping step, an RGB filling step and a network feature extraction and regression step.

Description

YOLO-based three-dimensional point cloud low-power-consumption rapid target detection method

Technical Field

The invention belongs to the field of deep learning, and particularly relates to a three-dimensional point cloud low-power-consumption rapid target detection method based on YOLO.

Background

The three-dimensional point cloud target detection autopilot, AR, VR and robot fields are widely used. Compared with other mode data, the three-dimensional point cloud information has richer geometric information, and along with the growth of the market of acquisition equipment such as a laser radar, the acquisition threshold of the three-dimensional point cloud is gradually lowered. Target detection methods of three-dimensional point clouds are generally classified into three categories: a multi-view method of projecting a three-dimensional point cloud into a two-dimensional point cloud, a voxel convolution method based on representing a scene in voxel form, and a method of directly processing three-dimensional point cloud data.

The three-dimensional point cloud data at present has the following problems: the density of the point clouds is inconsistent, the near point clouds are dense, and the far point clouds are sparse in the acquisition process of the laser radar; the point clouds have disorder, and the point clouds on the same object can be completely represented by two completely different three-dimensional point cloud coordinate matrixes; the point cloud has low resolution, and the three-dimensional point cloud is used for sampling the three-dimensional geometric shape with low resolution, so that only one-sided geometric information can be obtained; the early acquisition sensor has various noises and the like.

In general, the target detection of the three-dimensional point cloud has a large calculation load, so that the application scene of the three-dimensional point cloud technology is limited by the calculation force of the system, and the three-dimensional point cloud technology cannot be applied to low-calculation-force and low-power consumption platforms such as many embedded devices. Therefore, how to reduce the calculation load, improve the calculation efficiency, and reduce the prediction time consumption is an important research topic in the field.

Disclosure of Invention

Aiming at the defects of the existing three-dimensional point cloud target detection, the three-dimensional point cloud target detection method is obtained based on the YOLO network improvement, the model is simple, the calculation efficiency of the three-dimensional point cloud target detection algorithm is improved, the calculation load is reduced, the prediction time consumption is reduced, the hardware requirement is lower, the three-dimensional point cloud target detection function can be rapidly completed on a low-power-consumption platform, and the method is suitable for low-power-consumption and low-calculation-force platforms.

In order to achieve the aim of the invention, the invention adopts the following technical scheme:

a YOLO-based three-dimensional point cloud low-power consumption rapid target detection method comprises the following steps:

step one, three-dimensional point cloud metadata processing, which is used for eliminating non-valued sampling points in original three-dimensional point cloud data, and specifically comprises the following sub-steps:

(1.1) point cloud clipping, namely setting a clipping box according to the target environment and the performance of acquisition equipment, and removing point cloud data with a relatively far distance and a relatively low value;

(1.2) point cloud downsampling, namely setting voxel grids with proper sizes, and finishing downsampling by a voxel method;

and (1.3) removing outliers, and removing sampling points exceeding alpha times standard deviation in the search radius by utilizing Gaussian distribution statistical characteristics of the point cloud.

Mapping the three-dimensional point cloud data to the BEV, and compressing the three-dimensional data to a two-dimensional space through mapping, wherein the method specifically comprises the following substeps:

(2.1) rasterizing three-dimensional point cloud information;

(2.2) distributing the point cloud into a grid under the bird's eye view.

Step three, the information obtained in the step two is normalized and filled into RGB three channels, and the characteristics under the BEV visual angle are extracted to be matched with the RGB channels, and the method specifically comprises the following substeps:

(3.1) respectively obtaining three kinds of information of maximum height, maximum intensity and point cloud density in each grid;

(3.2) respectively carrying out normalization treatment on the three kinds of information;

(3.3) filling three kinds of information into the RGB channel to match the RGB channel with the network.

And fourthly, finishing feature extraction and loss regression by using a YOLO network, adding a complex angle regression layer into the YOLO network for expansion, and finishing feature extraction and loss regression, wherein the method specifically comprises the following substeps:

(4.1) feature extraction, using a simplified YOLO-v4 network, by adding a complex angle regression layer for expansion;

and (4.2) loss regression, introducing a complex angle into a loss function, and completing loss function calculation.

Compared with the prior art, the invention has the following beneficial effects:

the invention modifies the network structure based on the traditional target detection deep learning neural network, and provides an efficient target detection method. The three-dimensional information is compressed to the two-dimensional space, so that the operation load can be effectively reduced, the detection efficiency is improved, and the calculation force requirement on an operation platform is reduced. Compared with other network structures, the method has the advantages that the method is simpler, the computational effort occupation of an operation platform is reduced, the prediction time consumption is reduced, and the application range and the scene of three-dimensional point cloud target detection are expanded.

Drawings

FIG. 1 is a basic flow chart of the method of the present invention.

Detailed Description

The following describes the embodiments of the present invention further with reference to the drawings and technical schemes.

As shown in fig. 1, the YOLO-based three-dimensional point cloud low-power consumption rapid target detection method of the embodiment includes the following steps:

step 100: and acquiring three-dimensional point cloud metadata of the laser radar.

The laser radar imaging system mainly measures the distance of an object by laser, and simultaneously adjusts the laser emission angle and the laser emission position by using a control system and a scanning system to image. The laser radar distance measurement mainly calculates the distance between the laser radar and the target through laser flight time, a timing circuit starts after the laser emits the laser, and the timing is finished after the laser receives the echo signal, and the target distance is calculated through calculating the time difference between the emission and the receiving. The coordinates with the lidar as the origin are thus obtained from the distance, the horizontal angle and the vertical angle. Each sample point consists of a set of three-dimensional coordinates and a reflection intensity.

Step 101: and preprocessing the three-dimensional point cloud metadata.

Firstly, according to the influence of factors such as algorithm characteristics, sampling environment, sampling equipment performance and the like, a cutting box is set to be 80 meters long, 40 meters wide and 3 meters high, and point cloud data with a far distance and low value are removed, so that unnecessary operation is reduced; dividing small voxels with length, width and height of 1 cm in a three-dimensional space, obtaining a point set falling in each voxel, and taking a sampling point in each voxel to replace the original point set, thereby completing point cloud downsampling; according to the characteristic of Gaussian distribution of the point cloud distribution sign, the number of the closest points analyzed by the sampling points in the point cloud is set as K, the distances from all the points to the sampling points in the point cloud are calculated, and if the distances from a certain point to the sampling points exceed the average distance by more than alpha times of standard deviation, the point is regarded as an outlier and needs to be removed.

Step 102: the three-dimensional point cloud data is mapped to BEV perspectives.

The three-dimensional point cloud data are rasterized, the grid resolution is set to be 8 cm, all point clouds are mapped to a two-dimensional plane in a overlooking view angle, and therefore a two-dimensional point cloud image under the BEV view angle is obtained.

Step 103: filling into network RGB channels.

The maximum height, the maximum intensity and the point cloud density of the point cloud in each grid are obtained respectively, the formula is as follows, normalization processing is carried out on three kinds of information respectively, and the obtained three-channel data are filled into RGB channels.

Definition of the definition

For projection onto a specific grid of point clouds at BEV view angles, < >>

Describing the mapping function mapped to a particular grid, then:

wherein z is _g Indicating maximum height, z _b Representing maximum intensity, the I function represents single point cloud intensity, z _r Representing the normalized point cloud density within the grid, N is the number of points mapped to a particular grid at the BEV perspective.

Step 104: feature extraction and loss regression were done using YOLO networks.

Wherein the overall network characteristics are similar to the YOLO-v4 network, and the characteristics are the same as the YOLO-v4 network in the feature extraction stage, namely the CSP-dark net53 network. After the network is extracted by the features, a complex angle regression layer is added to the output layer, and the features output by the network are decoded into three-dimensional space coordinates, size, category probability and orientation angle of the target.

Wherein the size of the complex angle regression layer is determined according to the size and shape of the input point cloud network (in this embodiment, the regression network is set to be 32×16×75, i.e. 32×16 grids are divided, each grid provides 5 predictions), and each prediction includes t _x 、t _y 、c _x 、c _y 、t _w 、t _l Etc. prediction parameters.

b _x ＝σ(t _x )+c _x

b _y ＝σ(t _y )+c _y

b _φ ＝arctan ₂ (t _Im ,t _Re )

Wherein the predicted center point t _x ，t _y Normalizing into the relative position of each grid by a sigmoid function; sigma function means that the actual offset is obtained by the relative position; c _x ，x _y Indexing positions for grids on the output feature map; t is t _w ，t _l Obtaining the offset relative to the anchor frame through logarithmic function characterization; p is p _w ，p _l The length and width of the anchor frame are the length and width of the anchor frame; t is t _Im ，t _Re For the real and imaginary parts of the predicted complex angle, the orientation angle b is determined by arctangent _φ 。

Wherein b _x Is the x coordinate of the central point of the three-dimensional space of the target, b _y Is the y coordinate of the center point of the three-dimensional space of the target, b _w For the width of the target three-dimensional space b _l B is the length of the target three-dimensional space _φ Is the orientation angle of the target three-dimensional space.

Wherein, complex angle returns, the target orientation angle b _φ Can pass through corresponding regression parameters t _Im And t _Re Calculated to obtain t _Im And t _Re The real part and the imaginary part of the complex number are respectively corresponding, and the singularity can be effectively avoided by adopting a complex number mode.

Wherein the loss function:

L＝L _Yolo +L _Euler

wherein L is _Yolo As a self-loss function of YOLO, L _Euler Is a loss function of the complex angle regression layer.

In general, it is desirable to have a higher learning rate at the early stage of training so that the network converges rapidly, and a lower learning rate at the later stage of training so that the network converges better to the optimal solution.

In the embodiment, verification is completed on the KITTI data set, and compared with the VoxelNet and other networks, the average detection accuracy is basically the same, but the model reasoning time and the network volume are greatly reduced. The method can complete real-time reasoning speed of 4.7fps on the NVIDIA TX2 low-power consumption embedded platform.

In summary, the invention provides a low-power-consumption rapid target detection method for three-dimensional point clouds based on YOLO, which is characterized in that after three-dimensional point cloud metadata is obtained from a laser radar, invalid sampling points are reduced by cutting, removing outliers and the like, and then the three-dimensional point clouds are mapped to a two-dimensional plane, so that the network calculation amount is greatly reduced, a complex convolution layer is added on the basis of a mature YOLO network, a high-efficiency point cloud target detection network is finally formed, and the problem of singular points caused by single-angle estimation is avoided. The method is simple and clear in principle, small in calculation burden and short in prediction time consumption, can effectively expand the application scene of three-dimensional point cloud target detection, and has wide application value and market prospect.

Claims

1. A YOLO-based three-dimensional point cloud low-power consumption rapid target detection method is characterized by comprising the following steps:

(1.1) point cloud clipping, namely setting a clipping box according to the target environment and the performance of acquisition equipment, and eliminating the point cloud data with low distance value;

(1.2) point cloud downsampling, namely setting a voxel grid, and finishing downsampling by a voxel method;

(1.3) removing outliers, and removing sampling points exceeding alpha times standard deviation in a searching radius by utilizing Gaussian distribution statistical characteristics of point clouds;

(2.1) rasterizing three-dimensional point cloud information;

(2.2) distributing the point cloud into a grid under the aerial view;

(3.3) filling three kinds of information into the RGB channel to match with the network;

(4.1) feature extraction, using YOLO-v4 network, expanding by adding complex angle regression layer;

2. The YOLO-based three-dimensional point cloud low-power consumption rapid target detection method according to claim 1, wherein in the third step,

definition of the definition

For projection onto a specific grid of point clouds at BEV view angles, < >>

Describing the mapping function mapped to a particular grid, then:

3. The YOLO-based three-dimensional point cloud low-power consumption rapid target detection method according to claim 1 or 2, wherein in the fourth step,

the overall network characteristics are similar to the YOLO-v4 network, and are the same as the YOLO-v4 network in the characteristic extraction stage, namely, a CSP-DarkNet53 network is used; after the network is extracted by the features, a complex angle regression layer is added to the output layer, and the features output by the network are decoded into three-dimensional space coordinates, size, category probability and orientation angle of the target;

wherein the size of the complex angle regression layer is determined according to the size and shape of the input point cloud network, and each prediction comprises t _x 、t _y 、c _x 、c _y 、t _w 、t _l Is used for predicting parameters of the (a);

b _x ＝σ(t _x )+c _x

b _y ＝σ(t _y )+c _y

b _φ ＝arctan ₂ (t _Im ,t _Re )

wherein the predicted center point t _x ，t _y Normalizing into the relative position of each grid by a sigmoid function; sigma function means that the actual offset is obtained by the relative position; c _x ，c _y Indexing positions for grids on the output feature map; t is t _w ，t _l Obtaining the offset relative to the anchor frame through logarithmic function characterization; p is p _w ，p _l The length and width of the anchor frame are the length and width of the anchor frame; t is t _Im ，t _Re For the real and imaginary parts of the predicted complex angle, the orientation angle b is determined by arctangent _φ ；

Wherein b _x X coordinate of central point of three-dimensional space for target，b _y Is the y coordinate of the center point of the three-dimensional space of the target, b _w For the width of the target three-dimensional space b _l B is the length of the target three-dimensional space _φ The orientation angle of the three-dimensional space of the target;

wherein, complex angle returns, the target orientation angle b _φ Can pass through corresponding regression parameters t _Im And t _Re Calculated to obtain t _Im And t _Re Corresponding to the real and imaginary parts of the complex numbers, respectively;

wherein the loss function:

L＝L _Yolo +L _Euler