CN116189147A - YOLO-based three-dimensional point cloud low-power-consumption rapid target detection method - Google Patents
- Publication number: CN116189147A (application CN202310155654.0A)
- Authority: CN (China)
- Legal status: Pending
Classifications
- G06V20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/82: Image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention provides a YOLO-based low-power-consumption rapid target detection method for three-dimensional point clouds, belonging to the field of target detection. The method comprises a point cloud metadata processing step, a BEV mapping step, an RGB filling step, and a network feature extraction and regression step.
Description
Technical Field
The invention belongs to the field of deep learning, and particularly relates to a three-dimensional point cloud low-power-consumption rapid target detection method based on YOLO.
Background
Three-dimensional point cloud target detection is widely used in the fields of autonomous driving, AR, VR, and robotics. Compared with data of other modalities, three-dimensional point clouds carry richer geometric information, and with the growth of the market for acquisition equipment such as lidar, the threshold for acquiring three-dimensional point clouds is gradually falling. Target detection methods for three-dimensional point clouds generally fall into three categories: multi-view methods that project the three-dimensional point cloud into two dimensions, voxel convolution methods that represent the scene in voxel form, and methods that process the three-dimensional point cloud data directly.
Current three-dimensional point cloud data suffer from the following problems: point cloud density is inconsistent, since nearby point clouds are dense and distant ones are sparse during lidar acquisition; point clouds are unordered, so the points on the same object can be represented by two completely different three-dimensional coordinate matrices; point clouds have low resolution, since a three-dimensional point cloud samples the underlying geometry coarsely and only partial geometric information can be obtained; and early acquisition sensors introduce various kinds of noise.
In general, target detection on three-dimensional point clouds carries a heavy computational load, so application scenarios of three-dimensional point cloud technology are limited by the computing power of the system, and the technology cannot be deployed on low-compute, low-power platforms such as many embedded devices. How to reduce the computational load, improve computational efficiency, and shorten prediction time is therefore an important research topic in the field.
Disclosure of Invention
Aiming at the shortcomings of existing three-dimensional point cloud target detection, the present method is obtained by improving the YOLO network. The model is simple: it improves the computational efficiency of three-dimensional point cloud target detection, reduces the computational load and prediction time, and has lower hardware requirements, so the target detection function can be completed rapidly on low-power platforms. The method is suitable for low-power, low-compute platforms.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
a YOLO-based three-dimensional point cloud low-power consumption rapid target detection method comprises the following steps:
step one, three-dimensional point cloud metadata processing, used to eliminate low-value sampling points in the original three-dimensional point cloud data, specifically comprising the following sub-steps:
(1.1) point cloud clipping, namely setting a clipping box according to the target environment and the performance of acquisition equipment, and removing point cloud data with a relatively far distance and a relatively low value;
(1.2) point cloud downsampling, namely setting voxel grids with proper sizes, and finishing downsampling by a voxel method;
(1.3) outlier removal: using the Gaussian statistical characteristics of the point cloud, remove sampling points whose distance exceeds α times the standard deviation within the search radius.
step two, mapping the three-dimensional point cloud data to the BEV (bird's-eye view), compressing the three-dimensional data into a two-dimensional space through the mapping, specifically comprising the following sub-steps:
(2.1) rasterizing three-dimensional point cloud information;
(2.2) distributing the point cloud into a grid under the bird's eye view.
Step three, the information obtained in the step two is normalized and filled into RGB three channels, and the characteristics under the BEV visual angle are extracted to be matched with the RGB channels, and the method specifically comprises the following substeps:
(3.1) respectively obtaining three kinds of information of maximum height, maximum intensity and point cloud density in each grid;
(3.2) respectively carrying out normalization treatment on the three kinds of information;
(3.3) filling three kinds of information into the RGB channel to match the RGB channel with the network.
step four, completing feature extraction and loss regression with a YOLO network that is extended by adding a complex-angle regression layer, specifically comprising the following sub-steps:
(4.1) feature extraction, using a simplified YOLO-v4 network extended by adding a complex-angle regression layer;
(4.2) loss regression, introducing the complex angle into the loss function and completing the loss-function calculation.
Compared with the prior art, the invention has the following beneficial effects:
Based on a conventional deep-learning target detection network, the invention modifies the network structure and provides an efficient target detection method. Compressing three-dimensional information into a two-dimensional space effectively reduces the computational load, improves detection efficiency, and lowers the computing power required of the platform. Compared with other network structures, the model is simpler, occupies less of the platform's computing power, and shortens prediction time, expanding the range of applications and scenarios for three-dimensional point cloud target detection.
Drawings
FIG. 1 is a basic flow chart of the method of the present invention.
Detailed Description
The following describes the embodiments of the present invention further with reference to the drawings and technical schemes.
As shown in fig. 1, the YOLO-based three-dimensional point cloud low-power consumption rapid target detection method of the embodiment includes the following steps:
step 100: and acquiring three-dimensional point cloud metadata of the laser radar.
A lidar imaging system measures the distance to an object with a laser, while a control system and a scanning system adjust the laser emission angle and position to form an image. Lidar ranging computes the distance to the target from the laser time of flight: a timing circuit starts when the laser pulse is emitted and stops when the echo signal is received, and the target distance is computed from the time difference between emission and reception. Coordinates with the lidar as origin are then obtained from the distance, the horizontal angle, and the vertical angle. Each sample point consists of a set of three-dimensional coordinates and a reflection intensity.
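As a hedged illustration of this ranging geometry, the conversion from time of flight and scan angles to a Cartesian sample point might look like the sketch below; the function name and argument conventions are assumptions, not part of the patent, and real lidar drivers typically output these coordinates directly.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def tof_to_point(t_flight, azimuth, elevation, intensity):
    """Convert a single lidar return to a Cartesian sample point
    (x, y, z, intensity) with the lidar at the origin.

    t_flight  -- round-trip laser time of flight in seconds
    azimuth   -- horizontal emission angle in radians
    elevation -- vertical emission angle in radians
    """
    r = C * t_flight / 2.0  # the round trip covers the distance twice
    x = r * np.cos(elevation) * np.cos(azimuth)
    y = r * np.cos(elevation) * np.sin(azimuth)
    z = r * np.sin(elevation)
    return np.array([x, y, z, intensity])
```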
Step 101: preprocess the three-dimensional point cloud metadata.
First, taking into account the algorithm's characteristics, the sampling environment, and the performance of the sampling equipment, a cropping box 80 meters long, 40 meters wide, and 3 meters high is set, and distant, low-value point cloud data are removed to avoid unnecessary computation. Next, small voxels 1 cm on a side are divided in the three-dimensional space, the set of points falling in each voxel is obtained, and one sampling point per voxel replaces the original point set, completing point cloud downsampling. Finally, using the Gaussian distribution characteristic of the point cloud, the number of nearest neighbors analyzed for each sampling point is set to K, the distances from those points to the sampling point are computed, and any point whose distance to its neighbors exceeds the mean distance by more than α times the standard deviation is regarded as an outlier and removed.
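The preprocessing pipeline above can be sketched in NumPy as follows. The box dimensions, 1 cm voxels, K neighbours, and the α threshold follow the text; the placement of the crop box relative to the sensor and the choice of representative point per voxel are assumptions.

```python
import numpy as np

def preprocess(points, box=(80.0, 40.0, 3.0), voxel=0.01, k=20, alpha=2.0):
    """Metadata preprocessing sketch for an (N, 4) array of
    x, y, z, intensity points: crop, voxel-downsample, remove outliers."""
    # (1.1) Crop: keep points inside the 80 m x 40 m x 3 m box
    # (assumed forward of the sensor, centred laterally and vertically).
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    keep = (x >= 0) & (x < box[0]) & (np.abs(y) < box[1] / 2) & (np.abs(z) < box[2] / 2)
    points = points[keep]

    # (1.2) Voxel downsampling: keep one representative point per voxel.
    idx = np.floor(points[:, :3] / voxel).astype(np.int64)
    _, first = np.unique(idx, axis=0, return_index=True)
    points = points[np.sort(first)]

    # (1.3) Statistical outlier removal: a point whose mean distance to its
    # K nearest neighbours exceeds mean + alpha * std is discarded.
    # Brute-force distances for clarity; a KD-tree would be used in practice.
    diff = points[:, None, :3] - points[None, :, :3]
    dist = np.sqrt((diff ** 2).sum(-1))
    dist.sort(axis=1)                      # column 0 is the self-distance 0
    mean_knn = dist[:, 1:k + 1].mean(axis=1)
    thresh = mean_knn.mean() + alpha * mean_knn.std()
    return points[mean_knn <= thresh]
```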
Step 102: the three-dimensional point cloud data is mapped to BEV perspectives.
The three-dimensional point cloud data are rasterized, the grid resolution is set to be 8 cm, all point clouds are mapped to a two-dimensional plane in a overlooking view angle, and therefore a two-dimensional point cloud image under the BEV view angle is obtained.
Step 103: filling into network RGB channels.
The maximum height, the maximum intensity and the point cloud density of the point cloud in each grid are obtained respectively, the formula is as follows, normalization processing is carried out on three kinds of information respectively, and the obtained three-channel data are filled into RGB channels.
Define P_{Ω→ij} as the set of point cloud points projected onto a particular grid cell (i, j) under the BEV view, with the mapping function S_{ij} describing the mapping to that cell. Then:

z_g = max(P_{Ω→ij} · [0, 0, 1]^T)

z_b = max(I(P_{Ω→ij}))

z_r = min(1.0, ln(N + 1) / ln(64))

where z_g denotes the maximum height, z_b the maximum intensity (the function I gives the intensity of a single point), z_r the normalized point cloud density within the grid cell, and N is the number of points mapped to the particular cell under the BEV view.
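Steps 102 and 103 together can be sketched as follows. The 8 cm cell size and the three channel definitions follow the text; the grid extent, the per-channel normalization, and the min(1, log(N+1)/log(64)) density formula are assumptions, the last borrowed from Complex-YOLO-style encodings.

```python
import numpy as np

def bev_rgb_map(points, x_range=(0.0, 80.0), y_range=(-40.0, 40.0), cell=0.08):
    """Rasterize an (N, 4) point cloud (x, y, z, intensity) into a BEV
    grid and encode max height, max intensity, and normalized density
    as the R, G, B channels respectively."""
    h = int(round((x_range[1] - x_range[0]) / cell))
    w = int(round((y_range[1] - y_range[0]) / cell))
    rgb = np.zeros((h, w, 3), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.int64)

    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    ok = (ix >= 0) & (ix < h) & (iy >= 0) & (iy < w)
    for i, j, p in zip(ix[ok], iy[ok], points[ok]):
        rgb[i, j, 0] = max(rgb[i, j, 0], p[2])   # R: max height in cell
        rgb[i, j, 1] = max(rgb[i, j, 1], p[3])   # G: max intensity in cell
        counts[i, j] += 1

    # B: normalized point density (Complex-YOLO-style assumption).
    rgb[..., 2] = np.minimum(1.0, np.log(counts + 1) / np.log(64))
    # Normalize the height and intensity channels to [0, 1].
    for c in (0, 1):
        m = rgb[..., c].max()
        if m > 0:
            rgb[..., c] /= m
    return rgb
```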
Step 104: feature extraction and loss regression were done using YOLO networks.
Wherein the overall network characteristics are similar to the YOLO-v4 network, and the characteristics are the same as the YOLO-v4 network in the feature extraction stage, namely the CSP-dark net53 network. After the network is extracted by the features, a complex angle regression layer is added to the output layer, and the features output by the network are decoded into three-dimensional space coordinates, size, category probability and orientation angle of the target.
The size of the complex-angle regression layer is determined by the size and shape of the input point cloud map (in this embodiment the regression layer is set to 32×16×75, i.e., the map is divided into 32×16 grid cells and each cell provides 5 predictions), and each prediction includes prediction parameters such as t_x, t_y, t_w, t_l, t_Im, and t_Re.
b_x = σ(t_x) + c_x

b_y = σ(t_y) + c_y

b_w = p_w · e^{t_w}

b_l = p_l · e^{t_l}

b_φ = arctan2(t_Im, t_Re)
Here the predicted center offsets t_x, t_y are normalized by the sigmoid function into positions relative to each grid cell (the σ function maps the prediction to the actual offset within the cell); c_x, c_y are the index positions of the grid cell on the output feature map; t_w, t_l characterize the offsets relative to the anchor box on a logarithmic scale; p_w, p_l are the length and width of the anchor box; and t_Im, t_Re are the imaginary and real parts of the predicted complex angle, from which the orientation angle b_φ is obtained by the arctangent.
Here b_x is the x coordinate of the target's center point in three-dimensional space, b_y the y coordinate, b_w the width of the target box, b_l its length, and b_φ the orientation angle of the target in three-dimensional space.
In the complex-angle regression, the target orientation angle b_φ is computed from the corresponding regression parameters t_Im and t_Re, which correspond to the imaginary and real parts of a complex number, respectively; adopting the complex form effectively avoids the singularity of direct single-angle regression.
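A minimal sketch of this decoding step follows; the ordering of the raw parameters in `t` is an assumption.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def decode_box(t, cx, cy, pw, pl):
    """Decode one raw prediction into a BEV box using the regression
    equations above; t holds (t_x, t_y, t_w, t_l, t_Im, t_Re) and
    cx, cy are the grid-cell indices, pw, pl the anchor width/length."""
    tx, ty, tw, tl, t_im, t_re = t
    bx = sigmoid(tx) + cx          # b_x: center x in grid units
    by = sigmoid(ty) + cy          # b_y: center y in grid units
    bw = pw * np.exp(tw)           # b_w: width relative to the anchor
    bl = pl * np.exp(tl)           # b_l: length relative to the anchor
    bphi = np.arctan2(t_im, t_re)  # b_phi: orientation angle
    return bx, by, bw, bl, bphi
```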
The loss function is:

L = L_Yolo + L_Euler

where L_Yolo is YOLO's own loss function and L_Euler is the loss function of the complex-angle regression layer.
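A hedged sketch of this combined loss is given below. The patent does not spell out L_Euler or the masking of responsible anchors, so the unit-circle squared-error form here is an illustrative Complex-YOLO-style formulation, not the patent's exact definition.

```python
import numpy as np

def euler_loss(t_im, t_re, phi_gt):
    """Squared distance between the predicted complex orientation
    (t_Im, t_Re) and the ground-truth angle phi_gt embedded on the
    unit circle as (sin(phi), cos(phi))."""
    return (t_im - np.sin(phi_gt)) ** 2 + (t_re - np.cos(phi_gt)) ** 2

def total_loss(l_yolo, t_im, t_re, phi_gt):
    """L = L_Yolo + L_Euler, as given in the text."""
    return l_yolo + euler_loss(t_im, t_re, phi_gt)
```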
In general, it is desirable to have a higher learning rate at the early stage of training so that the network converges rapidly, and a lower learning rate at the later stage of training so that the network converges better to the optimal solution.
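One schedule matching this goal is linear warmup followed by cosine decay; the patent does not specify the exact schedule, so the shape and all hyperparameter values below are assumptions.

```python
import math

def lr_schedule(step, total_steps, lr_max=1e-3, lr_min=1e-5, warmup=500):
    """Large learning rate early for fast convergence, small learning
    rate late so the network settles near a good optimum."""
    if step < warmup:
        return lr_max * step / warmup          # ramp up from 0 to lr_max
    t = (step - warmup) / max(1, total_steps - warmup)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```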
In this embodiment, verification is completed on the KITTI dataset. Compared with networks such as VoxelNet, the average detection accuracy is essentially the same, but the model inference time and network size are greatly reduced. The method achieves a real-time inference speed of 4.7 fps on the NVIDIA TX2 low-power embedded platform.
In summary, the invention provides a YOLO-based low-power, fast target detection method for three-dimensional point clouds. After three-dimensional point cloud metadata are obtained from the lidar, invalid sampling points are reduced by cropping, outlier removal, and similar steps, and the three-dimensional point cloud is then mapped onto a two-dimensional plane, greatly reducing the network's computation. A complex-angle regression layer is added on top of the mature YOLO network, finally forming an efficient point cloud target detection network that avoids the singularity problem of single-angle estimation. The method is simple and clear in principle, light in computation, and fast in prediction; it can effectively expand the application scenarios of three-dimensional point cloud target detection and has broad application value and market prospects.
Claims (3)
1. A YOLO-based three-dimensional point cloud low-power consumption rapid target detection method is characterized by comprising the following steps:
step one, three-dimensional point cloud metadata processing, used to eliminate low-value sampling points in the original three-dimensional point cloud data, specifically comprising the following sub-steps:
(1.1) point cloud cropping, namely setting a cropping box according to the target environment and the performance of the acquisition equipment, and removing distant, low-value point cloud data;
(1.2) point cloud downsampling, namely setting a voxel grid, and finishing downsampling by a voxel method;
(1.3) removing outliers, and removing sampling points exceeding alpha times standard deviation in a searching radius by utilizing Gaussian distribution statistical characteristics of point clouds;
step two, mapping the three-dimensional point cloud data to the BEV and compressing the three-dimensional data into a two-dimensional space through the mapping, specifically comprising the following sub-steps:
(2.1) rasterizing three-dimensional point cloud information;
(2.2) distributing the point cloud into a grid under the aerial view;
step three, normalizing the information obtained in step two and filling it into the three RGB channels, so that the features extracted under the BEV view match the network's RGB channels, specifically comprising the following sub-steps:
(3.1) respectively obtaining three kinds of information of maximum height, maximum intensity and point cloud density in each grid;
(3.2) respectively carrying out normalization treatment on the three kinds of information;
(3.3) filling three kinds of information into the RGB channel to match with the network;
step four, completing feature extraction and loss regression with a YOLO network that is extended by adding a complex-angle regression layer, specifically comprising the following sub-steps:
(4.1) feature extraction, using YOLO-v4 network, expanding by adding complex angle regression layer;
and (4.2) loss regression, introducing a complex angle into a loss function, and completing loss function calculation.
2. The YOLO-based three-dimensional point cloud low-power consumption rapid target detection method according to claim 1, wherein in the third step,
define P_{Ω→ij} as the set of point cloud points projected onto a particular grid cell (i, j) under the BEV view, with the mapping function S_{ij} describing the mapping to that cell; then:

z_g = max(P_{Ω→ij} · [0, 0, 1]^T)

z_b = max(I(P_{Ω→ij}))

z_r = min(1.0, ln(N + 1) / ln(64))

wherein z_g denotes the maximum height, z_b the maximum intensity (the function I gives the intensity of a single point), z_r the normalized point cloud density within the grid cell, and N is the number of points mapped to the particular cell under the BEV view.
3. The YOLO-based three-dimensional point cloud low-power consumption rapid target detection method according to claim 1 or 2, wherein in the fourth step,
the overall network is similar to the YOLO-v4 network and identical to it in the feature extraction stage, i.e., the CSP-DarkNet53 network is used; after feature extraction, a complex-angle regression layer is added at the output layer, and the features output by the network are decoded into the target's three-dimensional space coordinates, size, class probability, and orientation angle;
wherein the size of the complex-angle regression layer is determined according to the size and shape of the input point cloud map, and each prediction includes prediction parameters such as t_x, t_y, t_w, t_l, t_Im, and t_Re;
b_x = σ(t_x) + c_x

b_y = σ(t_y) + c_y

b_w = p_w · e^{t_w}

b_l = p_l · e^{t_l}

b_φ = arctan2(t_Im, t_Re)
wherein the predicted center offsets t_x, t_y are normalized by the sigmoid function into positions relative to each grid cell (the σ function maps the prediction to the actual offset within the cell); c_x, c_y are the index positions of the grid cell on the output feature map; t_w, t_l characterize the offsets relative to the anchor box on a logarithmic scale; p_w, p_l are the length and width of the anchor box; and t_Im, t_Re are the imaginary and real parts of the predicted complex angle, from which the orientation angle b_φ is obtained by the arctangent;
wherein b_x is the x coordinate of the target's center point in three-dimensional space, b_y the y coordinate, b_w the width of the target box, b_l its length, and b_φ the orientation angle of the target in three-dimensional space;
wherein, in the complex-angle regression, the target orientation angle b_φ is computed from the corresponding regression parameters t_Im and t_Re, which correspond to the real and imaginary parts of a complex number, respectively;
wherein the loss function is:

L = L_Yolo + L_Euler

where L_Yolo is YOLO's own loss function and L_Euler is the loss function of the complex-angle regression layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310155654.0A CN116189147A (en) | 2023-02-23 | 2023-02-23 | YOLO-based three-dimensional point cloud low-power-consumption rapid target detection method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116189147A true CN116189147A (en) | 2023-05-30 |
Family
ID=86445939
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310155654.0A Pending CN116189147A (en) | 2023-02-23 | 2023-02-23 | YOLO-based three-dimensional point cloud low-power-consumption rapid target detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116189147A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116704163A (en) * | 2023-08-03 | 2023-09-05 | 金锐同创(北京)科技股份有限公司 | Method, device, equipment and medium for displaying virtual reality scene at terminal |
CN116704163B (en) * | 2023-08-03 | 2023-10-31 | 金锐同创(北京)科技股份有限公司 | Method, device, equipment and medium for displaying virtual reality scene at terminal |
CN117292140A (en) * | 2023-10-17 | 2023-12-26 | 小米汽车科技有限公司 | Point cloud data processing method and device, vehicle and storage medium |
CN117292140B (en) * | 2023-10-17 | 2024-04-02 | 小米汽车科技有限公司 | Point cloud data processing method and device, vehicle and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||