CN113240038A - Point cloud target detection method based on height-channel feature enhancement - Google Patents

Point cloud target detection method based on height-channel feature enhancement

Info

Publication number
CN113240038A
CN113240038A
Authority
CN
China
Prior art keywords
point cloud
feature vector
multiplied
backbone network
height
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110605139.9A
Other languages
Chinese (zh)
Other versions
CN113240038B (en)
Inventor
张静
王佳军
许达
李云松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202110605139.9A priority Critical patent/CN113240038B/en
Publication of CN113240038A publication Critical patent/CN113240038A/en
Application granted granted Critical
Publication of CN113240038B publication Critical patent/CN113240038B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The invention discloses a point cloud target detection method based on height-channel feature enhancement, which addresses two problems of existing detection methods: the information loss caused by compressing the point cloud space, and detection performance limited by the near-dense, far-sparse distribution of point clouds. The method comprises the following steps: (1) converting point cloud data blocks received in real time from a lidar into aggregated feature vectors; (2) extracting the attention weight of the height dimension of the aggregated feature vector; (3) extracting the attention weight of the channel dimension of the aggregated feature vector; (4) weighting the aggregated feature vector; (5) constructing a backbone network; (6) training the backbone network; (7) detecting point cloud targets. By dividing the point cloud data equally into four parts, the invention reduces the information loss of the point cloud data; by extracting attention weights and weighting the feature vectors, it enhances the key features of the point cloud and improves the average precision of point cloud target detection.

Description

Point cloud target detection method based on height-channel feature enhancement
Technical Field
The invention belongs to the technical field of radar, and more particularly relates to a point cloud target detection method based on height-channel feature enhancement in the field of radar target detection. By enhancing point cloud features, the invention can detect distant targets in a point cloud scene.
Background
As the basic data format output by lidar, a point cloud preserves the original geometric information of three-dimensional space and provides rich shape and scale information. It is therefore a preferred representation for scene perception and understanding in autonomous driving and robotics. Owing to the scanning characteristics of lidar, every 360° of point cloud data forms a three-dimensional point cloud scene. However, because point clouds are unevenly distributed, dense near the sensor and sparse far from it, detecting distant sparse targets in a point cloud scene remains a major challenge, and how to detect them accurately is an urgent problem in this technical field.
A point cloud target detection method based on a front-view grid is disclosed in the patent document "Laser radar point cloud target detection method, system and device" (application No. CN2020110603176, publication No. CN112183393A) filed by Shenlan Artificial Intelligence (Shenzhen) Co., Ltd. The specific steps are as follows: 1. An input feature construction step: a front-view grid is constructed for the acquired point cloud, each point within the field of view is projected into the grid, and the statistics of the point closest to the lidar center in each cell are recorded as that cell's feature value, forming the grid input features; the coordinates and reflection intensity of the point closest to the lidar center in each cell are extracted to obtain the point cloud input features. 2. A feature extraction step: a convolutional neural network extracts the front-view grid output features from the grid input features; a point cloud feature extraction network adjusts and outputs per-point features; using the correspondence between points and grid cells, the point cloud output features are projected into a feature map of the same size as the grid output feature map while keeping the original dimensions, and the front-view grid output features are combined with the point cloud output features. 3. A detector step: three-dimensional objects are detected as obstacles based on the front view. By combining point cloud features with front-view grid features, this method alleviates the loss of inter-point information that grid features alone would cause. However, it remains limited by the near-dense, far-sparse distribution of point clouds: distant targets have few points, which can cause missed detections of distant targets.
Lang et al., in their paper "PointPillars: Fast Encoders for Object Detection from Point Clouds" (Computer Vision and Pattern Recognition, CVPR 2019 IEEE Conference on), disclose a point cloud target detection method based on pillar division. The specific steps are as follows: 1. dividing each frame of point cloud data into pillars; 2. extracting the features within each pillar with a neural network to generate an initial point cloud feature map; 3. using a 2D convolutional neural network as the backbone to extract features from the initial feature map; 4. predicting the targets in the point cloud from the extracted features. By dividing the point cloud space into pillars, this method achieves high detection efficiency. However, pillar division compresses the entire point cloud space along the vertical (height) direction, and this compression can lose point cloud information, leading to false or missed detections of point cloud targets.
Disclosure of Invention
The invention aims to provide a point cloud target detection method based on height-channel feature enhancement that addresses the shortcomings of the prior art: the information loss caused by compressing the point cloud space, and detection performance limited by the near-dense, far-sparse distribution of point clouds.
The idea behind the invention is as follows: the point cloud data to be detected is split into four parts along the vertical (height) direction to obtain four point cloud feature vectors, and the point cloud features are enhanced by attention weighting. The four-way split preserves point cloud information at a finer granularity, reducing information loss while maintaining detection speed; attention weighting of the feature vectors along the height and channel dimensions enhances the key features of the point cloud to counteract its near-dense, far-sparse distribution.
The specific steps of the invention are as follows:
(1) converting point cloud data blocks received in real time from the lidar into aggregated feature vectors;
(2) extracting the attention weight of the height dimension of the aggregated feature vector:
(2a) compressing the channel dimension of the aggregated feature vector to 1 with a max-pooling operation to obtain a height feature vector of size 496 × 432 × 1 × 4;
(2b) inputting the height feature vector into a convolutional layer and outputting the height-dimension attention weight of size 496 × 432 × 4 for the aggregated feature vector;
(3) extracting the attention weight of the channel dimension of the aggregated feature vector:
(3a) compressing the height dimension of the aggregated feature vector to 1 with a max-pooling operation to obtain a channel feature vector of size 496 × 432 × 32 × 1;
(3b) inputting the channel feature vector into a convolutional layer and outputting the channel-dimension attention weight of size 496 × 432 × 32 for the aggregated feature vector;
(4) weighting the aggregated feature vector:
(4a) cross-multiplying the height-dimension attention weight with the channel-dimension attention weight to obtain an aggregate attention weight of size 496 × 432 × 32 × 4;
(4b) multiplying the aggregate attention weight element-wise with the aggregated feature vector to obtain a weighted feature vector of size 496 × 432 × 32 × 4;
(4c) compressing the height dimension of the weighted feature vector to 1 with a max-pooling operation to obtain an enhanced feature vector of size 496 × 432 × 32;
(5) constructing a backbone network:
building a PointPillars backbone network and setting its number of input channels to 32;
(6) training the backbone network:
inputting the enhanced feature vector into the backbone network and iteratively updating the network parameters with the Adam optimization algorithm until the loss function of the backbone network converges, yielding the trained backbone network;
(7) detecting the point cloud target:
(7a) converting the point cloud data to be detected into an aggregated feature vector using the same method as step (1);
(7b) weighting the aggregated feature vector using the same method as steps (2), (3), and (4) to obtain an enhanced feature vector;
(7c) inputting the enhanced feature vector into the trained backbone network to complete point cloud target detection.
Compared with the prior art, the invention has the following advantages:
First, the invention divides the point cloud data block equally into four parts along the vertical direction to obtain four sets of point cloud features, overcoming the information loss that arises in the prior art when point cloud data is compressed into a single set of features; this reduces false detections of point cloud targets and improves the average precision of point cloud target detection.
Second, the invention extracts the attention weight of the height dimension and of the channel dimension of the aggregated feature vector and weights the aggregated feature vector accordingly, enhancing the key features of the point cloud; this overcomes the prior art's limitation by the near-dense, far-sparse distribution of point clouds, reduces missed detections of distant targets, and improves the average precision of point cloud target detection.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The implementation steps of the invention are described further with reference to Fig. 1.
Step 1: converting point cloud data blocks received in real time from the lidar into aggregated feature vectors.
Using pass-through filtering, a point cloud data block of length 79.36 m, width 69.12 m, and height 4 m is obtained from the point cloud received in real time from the lidar.
The point cloud data block is divided equally into 4 slices along the vertical direction.
Each slice is divided uniformly into pillars of equal size, each 0.16 m long, 0.16 m wide, and 1 m high.
Each pillar is input into a trained PointNet network, which outputs the 32-dimensional point cloud feature of that pillar.
The 32-dimensional point cloud features of the pillars in each slice are arranged, according to their division positions, into a feature vector of size 496 × 432 × 32 for that slice.
The four feature vectors are spliced into an aggregated feature vector of size 496 × 432 × 32 × 4.
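To make Step 1 concrete, the sketch below shows how the four per-slice pillar feature maps could be assembled into the 496 × 432 × 32 × 4 aggregated feature vector. This is a minimal PyTorch sketch; the per-pillar PointNet encoder and the pillar-to-grid coordinates (`slice_features`, `slice_coords`) are assumed inputs, not specified by the patent.

```python
import torch

def build_aggregated_features(slice_features, slice_coords, H=496, W=432, C=32):
    """Assemble the aggregated feature vector of Step 1.

    slice_features: list of 4 tensors, each (num_pillars_i, C), holding the
        32-dim PointNet features of the non-empty pillars in one height slice.
    slice_coords: list of 4 long tensors, each (num_pillars_i, 2), holding
        the (row, col) grid index of each pillar on the 496 x 432 grid.
    """
    slices = []
    for feats, coords in zip(slice_features, slice_coords):
        canvas = torch.zeros(H, W, C)                 # empty 496 x 432 x 32 grid
        canvas[coords[:, 0], coords[:, 1]] = feats    # scatter pillar features to grid
        slices.append(canvas)
    return torch.stack(slices, dim=-1)                # (496, 432, 32, 4)
```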
Step 2: extracting the attention weight of the height dimension of the aggregated feature vector.
Using a max-pooling operation, the channel dimension of the aggregated feature vector is compressed to 1, yielding a height feature vector of size 496 × 432 × 1 × 4.
The height feature vector is input into the first convolutional layer, which outputs the height-dimension attention weight of size 496 × 432 × 4 for the aggregated feature vector.
The first convolutional layer is a 2D convolutional layer with a 1 × 1 kernel, 4 input channels, and 4 output channels.
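A minimal PyTorch sketch of Step 2, assuming a batched tensor of the stated shape: max-pooling removes the channel dimension, and a 1 × 1 convolution produces the height attention weights. The module name and batch handling are illustrative, not from the patent.

```python
import torch
import torch.nn as nn

class HeightAttention(nn.Module):
    """Height-dimension attention of Step 2 (sizes as stated in the text)."""
    def __init__(self, num_slices=4):
        super().__init__()
        # 1 x 1 2D convolution, 4 input channels, 4 output channels
        self.conv = nn.Conv2d(num_slices, num_slices, kernel_size=1)

    def forward(self, agg):                   # agg: (N, 496, 432, 32, 4)
        h = agg.max(dim=3).values             # channels pooled to 1 -> (N, 496, 432, 4)
        h = h.permute(0, 3, 1, 2)             # (N, 4, 496, 432) for Conv2d
        w = self.conv(h)                      # height attention weights
        return w.permute(0, 2, 3, 1)          # back to (N, 496, 432, 4)
```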
Step 3: extracting the attention weight of the channel dimension of the aggregated feature vector.
Using a max-pooling operation, the height dimension of the aggregated feature vector is compressed to 1, yielding a channel feature vector of size 496 × 432 × 32 × 1.
The channel feature vector is input into the second convolutional layer, which outputs the channel-dimension attention weight of size 496 × 432 × 32 for the aggregated feature vector.
The second convolutional layer is a 2D convolutional layer with a 1 × 1 kernel, 32 input channels, and 32 output channels.
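Step 3 mirrors Step 2 with the roles of the height and channel dimensions swapped; a matching sketch under the same assumptions:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel-dimension attention of Step 3 (sizes as stated in the text)."""
    def __init__(self, num_channels=32):
        super().__init__()
        # 1 x 1 2D convolution, 32 input channels, 32 output channels
        self.conv = nn.Conv2d(num_channels, num_channels, kernel_size=1)

    def forward(self, agg):                   # agg: (N, 496, 432, 32, 4)
        c = agg.max(dim=4).values             # heights pooled to 1 -> (N, 496, 432, 32)
        c = c.permute(0, 3, 1, 2)             # (N, 32, 496, 432) for Conv2d
        w = self.conv(c)                      # channel attention weights
        return w.permute(0, 2, 3, 1)          # back to (N, 496, 432, 32)
```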
Step 4: weighting the aggregated feature vector.
The height-dimension attention weight is cross-multiplied with the channel-dimension attention weight to obtain an aggregate attention weight of size 496 × 432 × 32 × 4.
The aggregate attention weight is multiplied element-wise with the aggregated feature vector to obtain a weighted feature vector of size 496 × 432 × 32 × 4.
The height dimension of the weighted feature vector is compressed to 1 using a max-pooling operation, yielding an enhanced feature vector of size 496 × 432 × 32.
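The weighting of Step 4 can be read as an outer product of the two weight maps, followed by element-wise multiplication and a final max-pool over the height slices; a sketch under the same shape assumptions as above:

```python
import torch

def enhance(agg, height_w, channel_w):
    """Step 4: fuse the two attention maps and weight the aggregated features.

    agg:       (N, 496, 432, 32, 4) aggregated feature vector
    height_w:  (N, 496, 432, 4)     height-dimension attention weights
    channel_w: (N, 496, 432, 32)    channel-dimension attention weights
    """
    # Cross-multiply: (..., 32, 1) x (..., 1, 4) -> (..., 32, 4) aggregate weights
    agg_w = channel_w.unsqueeze(-1) * height_w.unsqueeze(-2)
    weighted = agg * agg_w                    # element-wise weighting
    return weighted.max(dim=-1).values        # (N, 496, 432, 32) enhanced features
```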
Step 5: constructing the backbone network.
A PointPillars backbone network is built, and its number of input channels is set to 32.
Step 6: training the backbone network.
The enhanced feature vector is input into the backbone network, and the network parameters are iteratively updated with the Adam optimization algorithm until the loss function of the backbone network converges, yielding the trained backbone network.
The Adam optimization algorithm is configured as follows: the exponential decay rate of the first moment estimate is set to 0.95, the exponential decay rate of the second moment estimate to 0.85, and the learning rate to 0.003.
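Assuming a PyTorch training loop, the stated hyper-parameters map directly onto Adam's `betas` argument (the decay rates of the first and second moment estimates); a sketch, where `backbone` is the network from Step 5:

```python
import torch

# `backbone` is the PointPillars backbone from Step 5 (assumed constructed).
optimizer = torch.optim.Adam(
    backbone.parameters(),
    lr=0.003,            # learning rate from the text
    betas=(0.95, 0.85),  # decay rates of the first and second moment estimates
)
```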
The loss function of the backbone network is as follows:

$L = \frac{1}{N}\left(\beta_{loc} L_{loc} + \beta_{cls} L_{cls} + \beta_{dir} L_{dir}\right)$

where L denotes the loss function of the backbone network; N denotes the number of positive samples predicted by the backbone network from the enhanced feature vector; β_loc denotes the weight of the localization loss, with value range (0, 10]; L_loc denotes the localization loss; β_cls denotes the weight of the classification loss, with value range (0, 5]; L_cls denotes the classification loss; β_dir denotes the weight of the orientation loss, with value range (0, 1]; and L_dir denotes the orientation loss.

The localization loss is as follows:

$L_{loc} = \sum_{b \in (x,\,y,\,z,\,w,\,l,\,h,\,\theta)} \mathrm{SmoothL1}(\Delta b)$

where Σ(·) denotes summation; b denotes a parameter of the point cloud target box predicted by the backbone network from the enhanced feature vector; x, y, and z denote the x-, y-, and z-axis coordinates of the box center; w, l, and h denote the width, length, and height of the box; θ denotes the orientation angle of the box; SmoothL1(·) denotes the SmoothL1 loss function; and Δb denotes the encoded residual between the predicted point cloud target box and the ground-truth box.

The classification loss is as follows:

$L_{cls} = -0.25\,(1 - p^{a})^{2} \log p^{a}$

where p^a denotes the classification confidence predicted by the backbone network from the enhanced feature vector, and log(·) denotes the logarithm with base e.

The orientation loss is as follows:

$L_{dir} = -\log \frac{e^{f_i}}{\sum_j e^{f_j}}$

where e^(·) denotes the exponential with base e; f_j denotes the score of each possible target orientation class predicted by the backbone network from the enhanced feature vector; and f_i denotes the score of the target's orientation class.
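A hedged PyTorch sketch of the composite loss as reconstructed above. The β weights shown are illustrative picks from the stated ranges (0, 10], (0, 5], (0, 1], and the prediction/target tensors are assumed to be matched per positive sample; none of these specifics are fixed by the patent.

```python
import torch
import torch.nn.functional as F

def detection_loss(box_preds, box_targets, cls_probs, dir_logits, dir_targets,
                   num_pos, beta_loc=2.0, beta_cls=1.0, beta_dir=0.2):
    """Composite loss L = (beta_loc*L_loc + beta_cls*L_cls + beta_dir*L_dir) / N.

    box_preds/box_targets: (P, 7) encoded residuals over (x, y, z, w, l, h, theta)
    cls_probs:   (P,) predicted classification confidences p^a
    dir_logits:  (P, K) orientation class scores f_j
    dir_targets: (P,) ground-truth orientation class indices
    num_pos:     number of positive samples N
    """
    l_loc = F.smooth_l1_loss(box_preds, box_targets, reduction='sum')    # sum of SmoothL1(db)
    l_cls = (-0.25 * (1 - cls_probs) ** 2 * torch.log(cls_probs)).sum()  # focal-style term
    l_dir = F.cross_entropy(dir_logits, dir_targets, reduction='sum')    # -log softmax(f_i)
    return (beta_loc * l_loc + beta_cls * l_cls + beta_dir * l_dir) / num_pos
```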
Step 7: detecting the point cloud target.
The point cloud data to be detected is converted into an aggregated feature vector using the same method as Step 1.
The aggregated feature vector is weighted using the same method as Steps 2, 3, and 4 to obtain an enhanced feature vector.
The enhanced feature vector is input into the trained backbone network to complete point cloud target detection.
The effect of the present invention will be further described with reference to simulation experiments.
1. Simulation experiment conditions:
The hardware platform of the simulation experiment is: an Intel(R) Core(TM) i9-9900K CPU @ 3.6 GHz with 16 GB of memory, and an NVIDIA GeForce RTX 2080Ti GPU with 11 GB of video memory.
The software platform of the simulation experiment is: the Ubuntu 18.04 operating system and Python 3.6.
The point cloud dataset used in the simulation experiment is the KITTI dataset, in which the point cloud data was acquired with a 64-beam lidar. The dataset is described in "Vision meets Robotics: The KITTI Dataset" by Geiger, Andreas, et al., The International Journal of Robotics Research 32.11 (2013): 1231-1237, and divides data samples into three types: easy samples, moderate samples, and hard samples.
2. Simulation content and analysis of results:
The simulation experiment applies the invention and three prior-art detection methods (VoxelNet, SECOND, PointPillars) to detect vehicles in the input point cloud dataset, obtaining predicted point cloud targets.
The prior-art method VoxelNet refers to the voxel-based 3D convolutional network detection method, VoxelNet for short, proposed by Zhou, Y., et al. in "VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4490-4499.
The prior-art method SECOND refers to the point cloud target detection method using 3D sparse convolution, SECOND for short, proposed by Yan, Yan, et al. in "SECOND: Sparsely Embedded Convolutional Detection", Sensors 18.10 (2018): 3337.
The prior-art method PointPillars refers to the pillar-division point cloud target detection method proposed by Lang, A. H., et al. in "PointPillars: Fast Encoders for Object Detection from Point Clouds", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 12697-12705.
To verify the effect of the invention, the average detection precision is computed for the point cloud targets predicted by each of the four detection methods. All results are listed in Table 1, where "Ours" denotes the simulation results of the invention.
TABLE 1. Average precision (%) of the four detection methods

Method         Easy sample   Moderate sample   Hard sample
VoxelNet       81.97         65.46             62.85
SECOND         85.50         75.04             68.78
PointPillars   87.50         77.01             74.77
Ours           88.45         78.01             76.72
As Table 1 shows, the invention outperforms the prior art on the easy, moderate, and hard samples of the KITTI dataset, demonstrating that it achieves higher average detection precision.

Claims (6)

1. A point cloud target detection method based on height-channel feature enhancement, characterized by extracting the attention weight of the height dimension of an aggregated feature vector, extracting the attention weight of the channel dimension of the aggregated feature vector, and weighting the aggregated feature vector to obtain an enhanced feature vector; the method comprises the following steps:
(1) converting point cloud data blocks received in real time from a lidar into aggregated feature vectors;
(2) extracting the attention weight of the height dimension of the aggregated feature vector:
(2a) compressing the channel dimension of the aggregated feature vector to 1 with a max-pooling operation to obtain a height feature vector of size 496 × 432 × 1 × 4;
(2b) inputting the height feature vector into a first convolutional layer and outputting the height-dimension attention weight of size 496 × 432 × 4 for the aggregated feature vector;
(3) extracting the attention weight of the channel dimension of the aggregated feature vector:
(3a) compressing the height dimension of the aggregated feature vector to 1 with a max-pooling operation to obtain a channel feature vector of size 496 × 432 × 32 × 1;
(3b) inputting the channel feature vector into a second convolutional layer and outputting the channel-dimension attention weight of size 496 × 432 × 32 for the aggregated feature vector;
(4) weighting the aggregated feature vector:
(4a) cross-multiplying the height-dimension attention weight with the channel-dimension attention weight to obtain an aggregate attention weight of size 496 × 432 × 32 × 4;
(4b) multiplying the aggregate attention weight element-wise with the aggregated feature vector to obtain a weighted feature vector of size 496 × 432 × 32 × 4;
(4c) compressing the height dimension of the weighted feature vector to 1 with a max-pooling operation to obtain an enhanced feature vector of size 496 × 432 × 32;
(5) constructing a backbone network:
building a PointPillars backbone network and setting its number of input channels to 32;
(6) training the backbone network:
inputting the enhanced feature vector into the backbone network and iteratively updating the network parameters with the Adam optimization algorithm until the loss function of the backbone network converges, yielding the trained backbone network;
(7) detecting the point cloud target:
(7a) converting the point cloud data to be detected into an aggregated feature vector using the same method as step (1);
(7b) weighting the aggregated feature vector using the same method as steps (2), (3), and (4) to obtain an enhanced feature vector;
(7c) inputting the enhanced feature vector into the trained backbone network to complete point cloud target detection.
2. The point cloud target detection method based on height-channel feature enhancement according to claim 1, wherein converting the point cloud data block received in real time from the lidar into an aggregated feature vector in step (1) comprises the following steps:
first, using pass-through filtering, a point cloud data block of length 79.36 m, width 69.12 m, and height 4 m is obtained from the point cloud received in real time from the lidar;
second, the point cloud data block is divided equally into 4 slices along the vertical direction;
third, each slice is divided uniformly into pillars of equal size, each 0.16 m long, 0.16 m wide, and 1 m high;
fourth, each pillar is input into a trained PointNet network, which outputs the 32-dimensional point cloud feature of that pillar;
fifth, the 32-dimensional point cloud features of the pillars in each slice are arranged, according to their division positions, into a feature vector of size 496 × 432 × 32 for that slice;
sixth, the four feature vectors are concatenated into an aggregated feature vector of size 496 × 432 × 32 × 4.
3. The point cloud target detection method based on height-channel feature enhancement according to claim 1, wherein the first convolutional layer in step (2b) is a 2D convolutional layer with a 1 × 1 kernel, 4 input channels, and 4 output channels.
4. The point cloud target detection method based on height-channel feature enhancement according to claim 1, wherein the second convolutional layer in step (3b) is a 2D convolutional layer with a 1 × 1 kernel, 32 input channels, and 32 output channels.
5. The point cloud target detection method based on height-channel feature enhancement according to claim 1, wherein the parameters of the Adam optimization algorithm in step (6) are: the exponential decay rate of the first moment estimate is set to 0.95, the exponential decay rate of the second moment estimate to 0.85, and the learning rate to 0.003.
6. The point cloud target detection method based on height-channel feature enhancement according to claim 1, wherein the loss function of the backbone network in step (6) is as follows:

$L = \frac{1}{N}\left(\beta_{loc} L_{loc} + \beta_{cls} L_{cls} + \beta_{dir} L_{dir}\right)$

where L denotes the loss function of the backbone network; N denotes the number of positive samples predicted by the backbone network from the enhanced feature vector; β_loc denotes the weight of the localization loss, with value range (0, 10]; L_loc denotes the localization loss; β_cls denotes the weight of the classification loss, with value range (0, 5]; L_cls denotes the classification loss; β_dir denotes the weight of the orientation loss, with value range (0, 1]; and L_dir denotes the orientation loss;

the localization loss is as follows:

$L_{loc} = \sum_{b \in (x,\,y,\,z,\,w,\,l,\,h,\,\theta)} \mathrm{SmoothL1}(\Delta b)$

where Σ(·) denotes summation; b denotes a parameter of the point cloud target box predicted by the backbone network from the enhanced feature vector; x, y, and z denote the x-, y-, and z-axis coordinates of the box center; w, l, and h denote the width, length, and height of the box; θ denotes the orientation angle of the box; SmoothL1(·) denotes the SmoothL1 loss function; and Δb denotes the encoded residual between the predicted point cloud target box and the ground-truth box;

the classification loss is as follows:

$L_{cls} = -0.25\,(1 - p^{a})^{2} \log p^{a}$

where p^a denotes the classification confidence predicted by the backbone network from the enhanced feature vector, and log(·) denotes the logarithm with base e;

the orientation loss is as follows:

$L_{dir} = -\log \frac{e^{f_i}}{\sum_j e^{f_j}}$

where e^(·) denotes the exponential with base e; f_j denotes the score of each possible target orientation class predicted by the backbone network from the enhanced feature vector; and f_i denotes the score of the target's orientation class.
CN202110605139.9A 2021-05-31 2021-05-31 Point cloud target detection method based on height-channel characteristic enhancement Active CN113240038B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110605139.9A CN113240038B (en) 2021-05-31 2021-05-31 Point cloud target detection method based on height-channel characteristic enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110605139.9A CN113240038B (en) 2021-05-31 2021-05-31 Point cloud target detection method based on height-channel characteristic enhancement

Publications (2)

Publication Number Publication Date
CN113240038A (en) 2021-08-10
CN113240038B CN113240038B (en) 2024-02-09

Family

ID=77136065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110605139.9A Active CN113240038B (en) 2021-05-31 2021-05-31 Point cloud target detection method based on height-channel characteristic enhancement

Country Status (1)

Country Link
CN (1) CN113240038B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114862957A (en) * 2022-07-08 2022-08-05 西南交通大学 Subway car bottom positioning method based on 3D laser radar
CN115526936A (en) * 2022-11-29 2022-12-27 长沙智能驾驶研究院有限公司 Training method of positioning model and point cloud data positioning method and device
CN115965928A (en) * 2023-03-16 2023-04-14 安徽蔚来智驾科技有限公司 Point cloud feature enhancement method, target detection method, device, medium and vehicle

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020151109A1 (en) * 2019-01-22 2020-07-30 中国科学院自动化研究所 Three-dimensional target detection method and system based on point cloud weighted channel feature
CN112347987A (en) * 2020-11-30 2021-02-09 江南大学 Multimode data fusion three-dimensional target detection method
CN112668469A (en) * 2020-12-28 2021-04-16 西安电子科技大学 Multi-target detection and identification method based on deep learning
US20210142106A1 (en) * 2019-11-13 2021-05-13 Niamul QUADER Methods and systems for training convolutional neural network using built-in attention

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020151109A1 (en) * 2019-01-22 2020-07-30 中国科学院自动化研究所 Three-dimensional target detection method and system based on point cloud weighted channel feature
US20210142106A1 (en) * 2019-11-13 2021-05-13 Niamul QUADER Methods and systems for training convolutional neural network using built-in attention
CN112347987A (en) * 2020-11-30 2021-02-09 江南大学 Multimode data fusion three-dimensional target detection method
CN112668469A (en) * 2020-12-28 2021-04-16 西安电子科技大学 Multi-target detection and identification method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
严娟; 方志军; 高永彬: "3D object detection combining mixed-domain attention and dilated convolution", Journal of Image and Graphics, no. 06 *
王康如; 谭锦钢; 杜量; 陈利利; 李嘉茂; 张晓林: "Three-dimensional object detection based on iterative self-learning", Acta Optica Sinica, no. 09 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114862957A (en) * 2022-07-08 2022-08-05 西南交通大学 Subway car bottom positioning method based on 3D laser radar
CN114862957B (en) * 2022-07-08 2022-09-27 西南交通大学 Subway car bottom positioning method based on 3D laser radar
CN115526936A (en) * 2022-11-29 2022-12-27 长沙智能驾驶研究院有限公司 Training method of positioning model and point cloud data positioning method and device
CN115965928A (en) * 2023-03-16 2023-04-14 安徽蔚来智驾科技有限公司 Point cloud feature enhancement method, target detection method, device, medium and vehicle

Also Published As

Publication number Publication date
CN113240038B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN111832655B (en) Multi-scale three-dimensional target detection method based on characteristic pyramid network
CN113240038A (en) Point cloud target detection method based on height-channel feature enhancement
Chen et al. Distribution line pole detection and counting based on YOLO using UAV inspection line video
CN111626128B (en) Pedestrian detection method based on improved YOLOv3 in orchard environment
CN111145174B (en) 3D target detection method for point cloud screening based on image semantic features
CN113902897A (en) Training of target detection model, target detection method, device, equipment and medium
CN110349260B (en) Automatic pavement marking extraction method and device
WO2023193401A1 (en) Point cloud detection model training method and apparatus, electronic device, and storage medium
CN106952274A (en) Pedestrian detection and distance-finding method based on stereoscopic vision
CN110991444A (en) Complex scene-oriented license plate recognition method and device
CN114463736A (en) Multi-target detection method and device based on multi-mode information fusion
JP5870011B2 (en) Point cloud analysis device, point cloud analysis method, and point cloud analysis program
CN112668469A (en) Multi-target detection and identification method based on deep learning
CN113420819A (en) Lightweight underwater target detection method based on CenterNet
CN116189147A (en) YOLO-based three-dimensional point cloud low-power-consumption rapid target detection method
CN112613450A (en) 3D target detection method for enhancing performance on difficult sample
Hu et al. Traffic density recognition based on image global texture feature
CN116778341A (en) Multi-view feature extraction and identification method for radar image
CN116222577A (en) Closed loop detection method, training method, system, electronic equipment and storage medium
CN109934151B (en) Face detection method based on movidius computing chip and Yolo face
CN113269147B (en) Three-dimensional detection method and system based on space and shape, and storage and processing device
CN112597875A (en) Multi-branch network anti-missing detection aerial photography target detection method
CN116678418A (en) Improved laser SLAM quick loop-back detection method
Wang et al. Research on vehicle detection based on faster R-CNN for UAV images
CN114240940B (en) Cloud and cloud shadow detection method and device based on remote sensing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant