CN116189147A - YOLO-based three-dimensional point cloud low-power-consumption rapid target detection method - Google Patents

YOLO-based three-dimensional point cloud low-power-consumption rapid target detection method

Info

Publication number
CN116189147A
Authority
CN
China
Prior art keywords
point cloud
yolo
network
dimensional
target
Prior art date
Legal status
Pending
Application number
CN202310155654.0A
Other languages
Chinese (zh)
Inventor
Liu Guanhua (柳冠华)
Wu Zhenyu (吴振宇)
Zhao Liang (赵亮)
Cui Juntao (崔俊涛)
Huang Tao (黄涛)
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202310155654.0A priority Critical patent/CN116189147A/en
Publication of CN116189147A publication Critical patent/CN116189147A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Measurement Of Optical Distance (AREA)

Abstract

The invention provides a YOLO-based low-power-consumption rapid target detection method for three-dimensional point clouds, belonging to the field of target detection. The method comprises a point cloud metadata processing step, a BEV mapping step, an RGB filling step, and a network feature extraction and regression step.

Description

YOLO-based three-dimensional point cloud low-power-consumption rapid target detection method
Technical Field
The invention belongs to the field of deep learning, and particularly relates to a three-dimensional point cloud low-power-consumption rapid target detection method based on YOLO.
Background
Three-dimensional point cloud target detection is widely used in the fields of autonomous driving, AR, VR, and robotics. Compared with data of other modalities, three-dimensional point clouds carry richer geometric information, and as the market for acquisition equipment such as lidar grows, the barrier to acquiring three-dimensional point clouds keeps falling. Target detection methods for three-dimensional point clouds generally fall into three categories: multi-view methods that project the three-dimensional point cloud into two dimensions, voxel convolution methods that represent the scene in voxel form, and methods that process the three-dimensional point cloud data directly.
Current three-dimensional point cloud data suffer from the following problems: point cloud density is inconsistent, with near points dense and far points sparse during lidar acquisition; point clouds are unordered, so the points on the same object can be represented by two completely different three-dimensional coordinate matrices; resolution is low, since a three-dimensional point cloud samples the underlying geometry coarsely and captures only one-sided geometric information; and early acquisition sensors introduce various kinds of noise.
In general, target detection on three-dimensional point clouds carries a heavy computational load, so the application scenarios of three-dimensional point cloud technology are limited by the computing power of the system, and the technology cannot run on low-compute, low-power platforms such as many embedded devices. How to reduce the computational load, improve computational efficiency, and shorten prediction time is therefore an important research topic in the field.
Disclosure of Invention
Aiming at the defects of existing three-dimensional point cloud target detection, the invention improves on the YOLO network to obtain a three-dimensional point cloud target detection method with a simple model. The method improves the computational efficiency of three-dimensional point cloud target detection, reduces the computational load and prediction time, and lowers hardware requirements, so that three-dimensional point cloud target detection can be completed rapidly on low-power, low-compute platforms.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
a YOLO-based three-dimensional point cloud low-power consumption rapid target detection method comprises the following steps:
step one, three-dimensional point cloud metadata processing, which is used for eliminating non-valued sampling points in original three-dimensional point cloud data, and specifically comprises the following sub-steps:
(1.1) point cloud clipping, namely setting a clipping box according to the target environment and the performance of acquisition equipment, and removing point cloud data with a relatively far distance and a relatively low value;
(1.2) point cloud downsampling, namely setting voxel grids with proper sizes, and finishing downsampling by a voxel method;
and (1.3) removing outliers, and removing sampling points exceeding alpha times standard deviation in the search radius by utilizing Gaussian distribution statistical characteristics of the point cloud.
Step two, mapping the three-dimensional point cloud data to the BEV (bird's eye view), compressing the three-dimensional data into a two-dimensional space through the mapping, which specifically comprises the following substeps:
(2.1) rasterizing three-dimensional point cloud information;
(2.2) distributing the point cloud into a grid under the bird's eye view.
Step three, normalizing the information obtained in step two and filling it into the three RGB channels, so that the features extracted under the BEV view match the RGB channels, which specifically comprises the following substeps:
(3.1) respectively obtaining three kinds of information of maximum height, maximum intensity and point cloud density in each grid;
(3.2) respectively carrying out normalization treatment on the three kinds of information;
(3.3) filling the three kinds of information into the RGB channels so that they match the network input.
Step four, completing feature extraction and loss regression with a YOLO network expanded by adding a complex angle regression layer, which specifically comprises the following substeps:
(4.1) feature extraction, using a simplified YOLO-v4 network expanded with a complex angle regression layer;
and (4.2) loss regression, introducing a complex angle into a loss function, and completing loss function calculation.
Compared with the prior art, the invention has the following beneficial effects:
the invention modifies the network structure based on the traditional target detection deep learning neural network, and provides an efficient target detection method. The three-dimensional information is compressed to the two-dimensional space, so that the operation load can be effectively reduced, the detection efficiency is improved, and the calculation force requirement on an operation platform is reduced. Compared with other network structures, the method has the advantages that the method is simpler, the computational effort occupation of an operation platform is reduced, the prediction time consumption is reduced, and the application range and the scene of three-dimensional point cloud target detection are expanded.
Drawings
FIG. 1 is a basic flow chart of the method of the present invention.
Detailed Description
The following describes the embodiments of the present invention further with reference to the drawings and technical schemes.
As shown in fig. 1, the YOLO-based three-dimensional point cloud low-power consumption rapid target detection method of the embodiment includes the following steps:
step 100: and acquiring three-dimensional point cloud metadata of the laser radar.
A laser radar imaging system measures the distance to an object with laser light, while a control system and a scanning system adjust the laser emission angle and position to form an image. Laser radar ranging computes the distance to the target from the laser time of flight: a timing circuit starts when the laser pulse is emitted and stops when the echo signal is received, and the target distance is calculated from the time difference between emission and reception. Coordinates with the lidar as the origin are then obtained from the distance, the horizontal angle, and the vertical angle. Each sample point consists of a set of three-dimensional coordinates and a reflection intensity.
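As an illustrative aside (not part of the patent text), this ranging geometry can be sketched in a few lines; the frame convention (x forward, angles in radians) and the function name are assumptions:

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def sample_to_xyz(t_flight, azimuth, elevation):
    """Convert one lidar return (time of flight plus the two scan angles)
    into Cartesian coordinates with the lidar at the origin."""
    r = SPEED_OF_LIGHT * t_flight / 2.0  # halve the out-and-back path
    x = r * math.cos(elevation) * math.cos(azimuth)
    y = r * math.cos(elevation) * math.sin(azimuth)
    z = r * math.sin(elevation)
    return x, y, z
```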
Step 101: preprocessing the three-dimensional point cloud metadata.
First, taking into account factors such as the algorithm characteristics, the sampling environment, and the performance of the sampling equipment, the clipping box is set to 80 meters long, 40 meters wide, and 3 meters high, and distant, low-value point cloud data are removed to avoid unnecessary computation. Next, small voxels 1 centimeter on a side are divided in three-dimensional space, the set of points falling in each voxel is obtained, and one sampling point per voxel replaces the original point set, completing the point cloud downsampling. Finally, exploiting the approximately Gaussian statistics of the point cloud distribution, the number of nearest neighbors analyzed for each sampling point is set to K, the distances from all points to each sampling point are computed, and any point whose distance exceeds the average distance by more than alpha standard deviations is treated as an outlier and removed.
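A minimal sketch of this preprocessing pipeline follows. The clipping box and voxel sizes come from the embodiment, while the box placement relative to the sensor and the values of k and alpha are assumptions, since the patent leaves K and alpha open:

```python
import numpy as np
from scipy.spatial import cKDTree

def preprocess(points, box=(80.0, 40.0, 3.0), voxel=0.01, k=20, alpha=2.0):
    """Step 101 sketch. points is an (N, 4) array of x, y, z, intensity."""
    # (1) Clip: keep points inside the 80 m x 40 m x 3 m box (placement assumed).
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    inside = (x >= 0) & (x < box[0]) & (np.abs(y) < box[1] / 2) & (np.abs(z) < box[2] / 2)
    pts = points[inside]

    # (2) Voxel downsampling: keep one representative point per occupied voxel.
    voxel_ids = np.floor(pts[:, :3] / voxel).astype(np.int64)
    _, first = np.unique(voxel_ids, axis=0, return_index=True)
    pts = pts[first]

    # (3) Statistical outlier removal: drop points whose mean distance to their
    #     k nearest neighbours exceeds the global mean by alpha std deviations.
    dists, _ = cKDTree(pts[:, :3]).query(pts[:, :3], k=k + 1)  # column 0 is the point itself
    mean_d = dists[:, 1:].mean(axis=1)
    return pts[mean_d < mean_d.mean() + alpha * mean_d.std()]
```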
Step 102: mapping the three-dimensional point cloud data to the BEV perspective.
The three-dimensional point cloud data are rasterized with a grid resolution of 8 centimeters, and all points are mapped onto a two-dimensional plane from a top-down view, yielding a two-dimensional point cloud image under the BEV perspective.
Step 103: filling the network RGB channels.
The maximum height, maximum intensity, and point cloud density of the points in each grid cell are obtained according to the formulas below; the three quantities are normalized separately, and the resulting three-channel data are filled into the RGB channels.
Define $P_{i \to j}$ as the set of point cloud points projected onto a particular grid cell $S_j$ under the BEV view, with $S_j$ describing the mapping function onto that cell. Then:

$$z_g = \max\left(P_{i \to j} \cdot [0, 0, 1]^{T}\right)$$

$$z_b = \max\left(I(P_{i \to j})\right)$$

$$z_r = \min\left(1.0,\ \frac{\log(N + 1)}{\log 64}\right)$$

where $z_g$ denotes the maximum height, $z_b$ the maximum intensity (with $I$ returning the intensity of a single point), and $z_r$ the normalized point cloud density within the cell; $N$ is the number of points mapped to the particular grid cell under the BEV view.
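A sketch of steps 102 and 103 together follows, with assumed forward and lateral ranges matching the 80 m by 40 m clipping box; the log-based density normalization and the per-channel scaling constants are assumptions consistent with the formulas above:

```python
import numpy as np

def bev_rgb(points, x_range=(0.0, 80.0), y_range=(-20.0, 20.0), cell=0.08):
    """Rasterize an (N, 4) x, y, z, intensity cloud into an (H, W, 3) BEV
    image whose channels hold max height, max intensity, and density."""
    w = int(round((x_range[1] - x_range[0]) / cell))
    h = int(round((y_range[1] - y_range[0]) / cell))
    ix = np.clip(((points[:, 0] - x_range[0]) / cell).astype(int), 0, w - 1)
    iy = np.clip(((points[:, 1] - y_range[0]) / cell).astype(int), 0, h - 1)

    z_g = np.full((h, w), -np.inf, dtype=np.float32)   # max height per cell
    z_b = np.zeros((h, w), dtype=np.float32)           # max intensity per cell
    n = np.zeros((h, w), dtype=np.int64)               # point count per cell
    np.maximum.at(z_g, (iy, ix), points[:, 2].astype(np.float32))
    np.maximum.at(z_b, (iy, ix), points[:, 3].astype(np.float32))
    np.add.at(n, (iy, ix), 1)
    z_g[n == 0] = 0.0                                  # empty cells stay zero

    z_r = np.minimum(1.0, np.log1p(n) / np.log(64.0))  # normalized density
    z_g = np.clip(z_g / 3.0, 0.0, 1.0)                 # scale by 3 m box height (assumed)
    z_b = np.clip(z_b, 0.0, 1.0)                       # assume intensity already in [0, 1]
    return np.stack([z_g, z_b, z_r], axis=-1).astype(np.float32)
```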
Step 104: completing feature extraction and loss regression using the YOLO network.
The overall network is similar to the YOLO-v4 network and identical to it in the feature extraction stage, namely the CSP-DarkNet53 backbone. After feature extraction, a complex angle regression layer is added at the output layer, and the features output by the network are decoded into the target's three-dimensional spatial coordinates, size, class probability, and orientation angle.
The size of the complex angle regression layer is determined by the size and shape of the input point cloud grid (in this embodiment the regression output is set to 32×16×75, i.e., 32×16 grid cells with 5 predictions per cell), and each prediction includes parameters such as $t_x$, $t_y$, $c_x$, $c_y$, $t_w$, $t_l$.
$$b_x = \sigma(t_x) + c_x$$

$$b_y = \sigma(t_y) + c_y$$

$$b_w = p_w e^{t_w}$$

$$b_l = p_l e^{t_l}$$

$$b_\phi = \operatorname{arctan2}(t_{Im}, t_{Re})$$
Here the predicted center offsets $t_x, t_y$ are normalized by a sigmoid function into positions relative to each grid cell; the $\sigma$ function indicates that the actual offset is obtained from this relative position; $c_x, c_y$ are the index positions of the grid cell on the output feature map; $t_w, t_l$ are mapped through an exponential to obtain the offsets relative to the anchor box; $p_w, p_l$ are the width and length of the anchor box; and $t_{Im}, t_{Re}$ are the imaginary and real parts of the predicted complex angle, from which the orientation angle $b_\phi$ is determined by the arctangent.
Here $b_x$ is the x coordinate of the target's center point in three-dimensional space, $b_y$ the y coordinate, $b_w$ the width of the target, $b_l$ the length, and $b_\phi$ the orientation angle.
For the complex angle regression, the target orientation angle $b_\phi$ is computed from the corresponding regression parameters $t_{Im}$ and $t_{Re}$, which correspond to the imaginary and real parts of a complex number respectively; representing the angle as a complex number effectively avoids singularities.
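The decoding equations above translate directly into code; a sketch for a single prediction follows, where the ordering of the raw outputs in t is an assumption (the patent lists the parameters but not their layout):

```python
import numpy as np

def decode_prediction(t, cx, cy, pw, pl, cell=0.08):
    """Decode one cell's raw outputs t = (t_x, t_y, t_w, t_l, t_im, t_re)
    into box centre, size, and orientation angle."""
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    b_x = (sigmoid(t[0]) + cx) * cell    # centre x: cell-relative offset to metres
    b_y = (sigmoid(t[1]) + cy) * cell    # centre y
    b_w = pw * np.exp(t[2])              # width: exponential offset from anchor
    b_l = pl * np.exp(t[3])              # length
    b_phi = np.arctan2(t[4], t[5])       # yaw from (imaginary, real) parts
    return b_x, b_y, b_w, b_l, b_phi
```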
The loss function is:

$$L = L_{Yolo} + L_{Euler}$$

where $L_{Yolo}$ is the native loss function of YOLO and $L_{Euler}$ is the loss function of the complex angle regression layer.
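The patent names $L_{Euler}$ but does not write it out; one plausible form, stated here purely as an assumption, penalizes the squared distance on the unit circle between the predicted complex angle and the ground-truth angle, which stays smooth across the $\pm\pi$ wraparound:

```python
import numpy as np

def euler_loss(t_im, t_re, phi_gt):
    """Assumed form of L_Euler: unit-circle distance between the
    normalized prediction (t_im, t_re) and e^{i * phi_gt}."""
    norm = np.hypot(t_im, t_re) + 1e-9   # project the prediction onto the unit circle
    d_im = t_im / norm - np.sin(phi_gt)
    d_re = t_re / norm - np.cos(phi_gt)
    return d_im ** 2 + d_re ** 2
```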
In general, it is desirable to have a higher learning rate at the early stage of training so that the network converges rapidly, and a lower learning rate at the later stage of training so that the network converges better to the optimal solution.
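One common way to realize that schedule is a cosine decay; the form and the rates below are assumptions, since the patent does not fix the exact schedule:

```python
import math

def lr_at(step, total_steps, base_lr=1e-3, min_lr=1e-5):
    """Cosine decay from base_lr at step 0 down to min_lr at the last step."""
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cos
```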
In this embodiment, verification was completed on the KITTI dataset. Compared with networks such as VoxelNet, the average detection accuracy is essentially the same, while the model inference time and network size are greatly reduced. The method achieves a real-time inference speed of 4.7 fps on the NVIDIA TX2 low-power embedded platform.
In summary, the invention provides a YOLO-based low-power, rapid target detection method for three-dimensional point clouds. After three-dimensional point cloud metadata are obtained from the lidar, invalid sampling points are reduced by clipping and outlier removal, and the three-dimensional point cloud is then mapped onto a two-dimensional plane, greatly reducing the network's computational load. A complex angle regression layer is added on top of the mature YOLO network, finally forming an efficient point cloud target detection network and avoiding the singularity problem of single-angle estimation. The method is simple and clear in principle, light in computational burden, and short in prediction time; it can effectively expand the application scenarios of three-dimensional point cloud target detection and has broad application value and market prospects.

Claims (3)

1. A YOLO-based three-dimensional point cloud low-power consumption rapid target detection method is characterized by comprising the following steps:
step one, three-dimensional point cloud metadata processing, which eliminates low-value sampling points from the original three-dimensional point cloud data and specifically comprises the following sub-steps:
(1.1) point cloud clipping, namely setting a clipping box according to the target environment and the performance of the acquisition equipment, and removing distant, low-value point cloud data;
(1.2) point cloud downsampling, namely setting a voxel grid, and finishing downsampling by a voxel method;
(1.3) removing outliers, and removing sampling points exceeding alpha times standard deviation in a searching radius by utilizing Gaussian distribution statistical characteristics of point clouds;
step two, mapping the three-dimensional point cloud data to the BEV, compressing the three-dimensional data into a two-dimensional space through the mapping, which specifically comprises the following substeps:
(2.1) rasterizing three-dimensional point cloud information;
(2.2) distributing the point cloud into a grid under the aerial view;
step three, normalizing the information obtained in step two and filling it into the three RGB channels, so that the features extracted under the BEV view match the RGB channels, which specifically comprises the following substeps:
(3.1) respectively obtaining three kinds of information of maximum height, maximum intensity and point cloud density in each grid;
(3.2) respectively carrying out normalization treatment on the three kinds of information;
(3.3) filling the three kinds of information into the RGB channels so that they match the network input;
step four, completing feature extraction and loss regression with a YOLO network expanded by adding a complex angle regression layer, which specifically comprises the following substeps:
(4.1) feature extraction, using a YOLO-v4 network expanded with a complex angle regression layer;
and (4.2) loss regression, introducing a complex angle into a loss function, and completing loss function calculation.
2. The YOLO-based three-dimensional point cloud low-power consumption rapid target detection method according to claim 1, wherein in the third step,
define $P_{i \to j}$ as the set of point cloud points projected onto a particular grid cell $S_j$ under the BEV view, with $S_j$ describing the mapping function onto that cell; then:

$$z_g = \max\left(P_{i \to j} \cdot [0, 0, 1]^{T}\right)$$

$$z_b = \max\left(I(P_{i \to j})\right)$$

$$z_r = \min\left(1.0,\ \frac{\log(N + 1)}{\log 64}\right)$$

where $z_g$ denotes the maximum height, $z_b$ the maximum intensity (with $I$ returning the intensity of a single point), and $z_r$ the normalized point cloud density within the cell; $N$ is the number of points mapped to the particular grid cell under the BEV view.
3. The YOLO-based three-dimensional point cloud low-power consumption rapid target detection method according to claim 1 or 2, wherein in the fourth step,
the overall network is similar to the YOLO-v4 network and identical to it in the feature extraction stage, namely a CSP-DarkNet53 network is used; after feature extraction, a complex angle regression layer is added at the output layer, and the features output by the network are decoded into the target's three-dimensional spatial coordinates, size, class probability, and orientation angle;
wherein the size of the complex angle regression layer is determined according to the size and shape of the input point cloud grid, and each prediction includes prediction parameters such as $t_x$, $t_y$, $c_x$, $c_y$, $t_w$, $t_l$;
$$b_x = \sigma(t_x) + c_x$$

$$b_y = \sigma(t_y) + c_y$$

$$b_w = p_w e^{t_w}$$

$$b_l = p_l e^{t_l}$$

$$b_\phi = \operatorname{arctan2}(t_{Im}, t_{Re})$$
wherein the predicted center offsets $t_x, t_y$ are normalized by a sigmoid function into positions relative to each grid cell; the $\sigma$ function indicates that the actual offset is obtained from this relative position; $c_x, c_y$ are the index positions of the grid cell on the output feature map; $t_w, t_l$ are mapped through an exponential to obtain the offsets relative to the anchor box; $p_w, p_l$ are the width and length of the anchor box; and $t_{Im}, t_{Re}$ are the imaginary and real parts of the predicted complex angle, from which the orientation angle $b_\phi$ is determined by the arctangent;
Wherein b x X coordinate of central point of three-dimensional space for target,b y Is the y coordinate of the center point of the three-dimensional space of the target, b w For the width of the target three-dimensional space b l B is the length of the target three-dimensional space φ The orientation angle of the three-dimensional space of the target;
wherein, in the complex angle regression, the target orientation angle $b_\phi$ is computed from the corresponding regression parameters $t_{Im}$ and $t_{Re}$, which correspond to the imaginary and real parts of a complex number respectively;
wherein the loss function is:

$$L = L_{Yolo} + L_{Euler}$$

where $L_{Yolo}$ is the native loss function of YOLO and $L_{Euler}$ is the loss function of the complex angle regression layer.
CN202310155654.0A 2023-02-23 2023-02-23 YOLO-based three-dimensional point cloud low-power-consumption rapid target detection method Pending CN116189147A (en)

Priority Applications (1)

Application Number: CN202310155654.0A
Publication: CN116189147A (en)
Priority Date: 2023-02-23
Filing Date: 2023-02-23
Title: YOLO-based three-dimensional point cloud low-power-consumption rapid target detection method

Applications Claiming Priority (1)

Application Number: CN202310155654.0A
Publication: CN116189147A (en)
Priority Date: 2023-02-23
Filing Date: 2023-02-23
Title: YOLO-based three-dimensional point cloud low-power-consumption rapid target detection method

Publications (1)

Publication Number: CN116189147A
Publication Date: 2023-05-30

Family

ID=86445939

Family Applications (1)

Application Number: CN202310155654.0A
Publication: CN116189147A (en), Pending
Priority Date: 2023-02-23
Filing Date: 2023-02-23
Title: YOLO-based three-dimensional point cloud low-power-consumption rapid target detection method

Country Status (1)

Country Link
CN (1) CN116189147A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116704163A (en) * 2023-08-03 2023-09-05 金锐同创(北京)科技股份有限公司 Method, device, equipment and medium for displaying virtual reality scene at terminal
CN116704163B (en) * 2023-08-03 2023-10-31 金锐同创(北京)科技股份有限公司 Method, device, equipment and medium for displaying virtual reality scene at terminal
CN117292140A (en) * 2023-10-17 2023-12-26 小米汽车科技有限公司 Point cloud data processing method and device, vehicle and storage medium
CN117292140B (en) * 2023-10-17 2024-04-02 小米汽车科技有限公司 Point cloud data processing method and device, vehicle and storage medium

Similar Documents

Publication Publication Date Title
WO2021000902A1 (en) Sar image data enhancement method and apparatus, and storage medium
CN111832655B (en) Multi-scale three-dimensional target detection method based on characteristic pyramid network
CN116189147A (en) YOLO-based three-dimensional point cloud low-power-consumption rapid target detection method
CN112418245B (en) Electromagnetic emission point positioning method based on urban environment physical model
CN109410307A (en) A kind of scene point cloud semantic segmentation method
CN112395987B (en) SAR image target detection method based on unsupervised domain adaptive CNN
CN110866531A (en) Building feature extraction method and system based on three-dimensional modeling and storage medium
CN109241978B (en) Method for rapidly extracting plane piece in foundation three-dimensional laser point cloud
CN110827302A (en) Point cloud target extraction method and device based on depth map convolutional network
CN103414861A (en) Method for self-adaptation geometric correction of projector picture
CN110807781A (en) Point cloud simplification method capable of retaining details and boundary features
CN112630160A (en) Unmanned aerial vehicle track planning soil humidity monitoring method and system based on image acquisition and readable storage medium
CN109323697A (en) A method of particle fast convergence when starting for Indoor Robot arbitrary point
CN108986218A (en) A kind of building point off density cloud fast reconstructing method based on PMVS
CN111458691B (en) Building information extraction method and device and computer equipment
CN116258817A (en) Automatic driving digital twin scene construction method and system based on multi-view three-dimensional reconstruction
CN113240038A (en) Point cloud target detection method based on height-channel feature enhancement
CN111765883B (en) Robot Monte Carlo positioning method, equipment and storage medium
CN113989631A (en) Infrared image target detection network compression method based on convolutional neural network
CN108345007B (en) Obstacle identification method and device
CN111915724A (en) Point cloud model slice shape calculation method
CN114140495A (en) Single target tracking method based on multi-scale Transformer
CN111951299B (en) Infrared aerial target detection method
CN112762824B (en) Unmanned vehicle positioning method and system
CN113139965A (en) Indoor real-time three-dimensional semantic segmentation method based on depth map

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination