CN116704463A - Automatic driving target detection method based on point cloud columnar rapid coding algorithm - Google Patents

Automatic driving target detection method based on point cloud columnar rapid coding algorithm

Info

Publication number
CN116704463A
CN116704463A (application CN202310687054.9A)
Authority
CN
China
Prior art keywords
point cloud
target detection
automatic driving
point
detection method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310687054.9A
Other languages
Chinese (zh)
Inventor
杨钊灿
沈永峰
孙叶凡
陈飞洋
岳松儒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Dianji University
Original Assignee
Shanghai Dianji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Dianji University filed Critical Shanghai Dianji University
Priority to CN202310687054.9A
Publication of CN116704463A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an automatic driving target detection method based on a point cloud columnar rapid coding algorithm, comprising the following steps: 1. converting the point cloud into a sparse pseudo image with a feature encoder network: 101. point cloud input; 102. point cloud pillar stacking; 103. feature learning; 104. pseudo image generation; 2. adding an improved ECA module into the 2D convolutional backbone network of the PointPillars algorithm and processing the pseudo image into a high-level representation; 3. performing Bbox regression with an SSD detection head to realize 3D target detection for automatic driving. The application improves the detection performance of the target detection network under occlusion and effectively improves detection accuracy.

Description

Automatic driving target detection method based on point cloud columnar rapid coding algorithm
Technical Field
The application belongs to the technical field of automatic driving target detection, and particularly relates to an automatic driving target detection method based on a point cloud columnar rapid coding algorithm.
Background
In recent years, artificial intelligence technology based on deep learning has advanced rapidly, and autonomous vehicles built on it have been moving toward commercialization for years. Many leading automobile companies at home and abroad continuously release new vehicles equipped with Advanced Driver Assistance Systems (ADAS). An ADAS mainly covers three aspects: environment perception, decision and planning, and vehicle control. Environment perception refers to acquiring information from the road environment of the driving vehicle and extracting relevant knowledge from the collected information to support subsequent decision-making and planning; it is the foundational and decisive link in realizing automatic driving. In the field of automatic driving, environment perception is the vehicle's ability to understand the road scene, covering tasks such as drivable-region segmentation, lane line detection, obstacle type and position detection, and target classification and tracking.
For the problem of target detection for automatic driving vehicles in road scenes, experts and scholars have proposed performing feature clustering on point cloud data through the PCL point cloud processing library, obtaining road-vehicle features from the clustering results, and applying deep learning methods to three-dimensional point cloud processing on this basis. This alleviates the long runtime and heavy computation of manual obstacle-feature extraction, but in the prior art the accuracy of target detection is still not high enough.
Disclosure of Invention
The technical problem to be solved by the application is, in view of the above defects in the prior art, to provide an automatic driving target detection method based on a point cloud columnar rapid coding algorithm that can improve the detection performance of the target detection network under occlusion and effectively improve detection accuracy.
In order to solve the technical problems, the application adopts the following technical scheme: an automatic driving target detection method based on a point cloud columnar rapid coding algorithm comprises the following steps:
step one, converting the point cloud into a sparse pseudo image with a feature encoder network; the specific process is as follows:
step 101, point cloud input: dividing the input point cloud into a number of point cloud pillar units; the point cloud is partitioned on the XY plane with step length L, so that each pillar corresponds to a three-dimensional cell;
step 102, stacking point cloud pillars: when the number of points in a pillar exceeds N, N points are selected by random sampling; when the number of points in a pillar is less than N, the pillar is padded to N points with zeros; one frame of point cloud data is thereby encoded into a dense tensor of dimensions (D, P, N);
step 103, feature learning: using a PointNet network, each point carrying D-dimensional features is processed with a linear layer, BatchNorm, and a ReLU activation function to generate a tensor of dimensions (C, P, N); a max-pooling operation over each pillar then yields a tensor of dimensions (C, P);
step 104, generating a pseudo image: a pseudo image is generated through a scatter operator; the (C, P) tensor generated in the previous step is mapped back to the original pillar coordinates via each pillar's index value to create a pseudo image of size (C, H, W), where H and W represent the height and width of the pseudo image respectively;
step two, adding the improved ECA module into the 2D convolutional backbone network of the PointPillars algorithm, and processing the pseudo image into a high-level representation;
and step three, performing Bbox regression with an SSD detection head to realize 3D target detection for automatic driving.
In the above automatic driving target detection method based on the point cloud columnar rapid coding algorithm, when the point cloud is divided in step 101, the range of the point cloud coordinates on the XY plane and the size of each point cloud pillar are set.
In the above automatic driving target detection method based on the point cloud columnar rapid coding algorithm, in step 101 each point in a pillar is encoded as a 9-dimensional vector D: (x, y, z, r, xc, yc, zc, xp, yp), where x, y, z, and r represent the three coordinates of the point in three-dimensional space and its reflection intensity respectively; xc, yc, and zc represent the offsets from the arithmetic mean of all points in the pillar; and xp and yp represent the offsets of the point from the x, y center of the pillar.
In the above automatic driving target detection method based on the point cloud columnar rapid coding algorithm, the 2D convolutional backbone network of the PointPillars algorithm adopts an RPN backbone network divided into two sub-networks: a top-down sub-network extracts features on feature maps of progressively smaller spatial resolution, and the other sub-network upsamples the features extracted at the different resolutions to the same size through deconvolution operations and then concatenates them.
According to the automatic driving target detection method based on the point cloud columnar rapid coding algorithm, in the second step, the improved ECA module changes the convolution kernel size k of the ECA module from a variable value to a fixed value.
According to the automatic driving target detection method based on the point cloud columnar rapid coding algorithm, the fixed value is 3.
According to the above automatic driving target detection method based on the point cloud columnar rapid coding algorithm, when the improved ECA module is added into the 2D convolutional backbone network of the PointPillars algorithm, a plurality of improved ECA modules are added.
Compared with the prior art, the application has the following advantages:
the application adds the ECA module into the 2DCNN backbone network, and improves the ECA module before adding the ECA module, and changes the convolution kernel size of the ECA module from a variable value to a fixed value, namely, no matter what kind of the channel dimension is subjected to the chemical conversion, the convolution kernel size of the ECA module is unchanged all the time. The ECA module takes a fixed value because the small convolution kernel can reduce the parameter quantity and the detection precision can be improved; adding a plurality of improved ECA modules into the 2DCNN backbone network, wherein the improved ECA modules bring proper cross-channel interaction for ECA-Point Pillars, and the ECA modules can learn the relation among channels, so that the model can automatically learn the importance of different channel characteristics, and the method is beneficial to distributing weights to different parts of data input into the ECA modules; therefore, the 2DCNN backbone network of the ECA-Point Pillars can extract key information and inhibit unimportant information; by this improvement, a great improvement in performance can be achieved by adding only a small number of additional parameters.
The technical scheme of the application is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a flow chart of the method of the present application;
FIG. 2 is a structural diagram of the improved ECA module in the 2D convolutional backbone network;
FIG. 3 is a block diagram of the 2D convolutional backbone network of the PointPillars algorithm.
Detailed Description
As shown in FIG. 1, the automatic driving target detection method based on the PointPillars algorithm of the application comprises the following steps:
step one, converting the point cloud into a sparse pseudo image with a feature encoder network; the specific process is as follows:
step 101, point cloud input: dividing the input point cloud into a number of point cloud pillar units; the point cloud is partitioned on the XY plane with step length L, so that each pillar corresponds to a three-dimensional cell;
step 102, stacking point cloud pillars: when the number of points in a pillar exceeds N, N points are selected by random sampling; when the number of points in a pillar is less than N, the pillar is padded to N points with zeros; one frame of point cloud data is thereby encoded into a dense tensor of dimensions (D, P, N);
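The following is a minimal NumPy sketch of this stacking step (random sampling down to N points, zero padding up to N, output of shape (D, P, N)); the function and variable names are illustrative assumptions, not part of the patent.

```python
# Sketch of step 102: sample or zero-pad each pillar to N points and stack
# the pillars into a dense (D, P, N) tensor. Names are illustrative only.
import numpy as np

def stack_pillars(pillars, n=100):
    """pillars: list of (num_points, D) arrays; returns a (D, P, N) tensor."""
    d = pillars[0].shape[1]
    dense = np.zeros((d, len(pillars), n), dtype=np.float32)
    for i, pts in enumerate(pillars):
        if len(pts) > n:   # too many points: random sampling down to N
            pts = pts[np.random.choice(len(pts), n, replace=False)]
        dense[:, i, :len(pts)] = pts.T   # zeros pad the remainder
    return dense
```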
step 103, feature learning: using a PointNet network, each point carrying D-dimensional features is processed with a linear layer, BatchNorm, and a ReLU activation function to generate a tensor of dimensions (C, P, N); a max-pooling operation over each pillar then yields a tensor of dimensions (C, P);
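As a hedged PyTorch sketch of this feature-learning step, the block below applies a shared linear layer, BatchNorm, and ReLU to every point and then max-pools over each pillar; the output width C=64 is an assumption rather than a value stated here.

```python
# Sketch of step 103: per-point Linear + BatchNorm + ReLU, then max pooling
# over the N points of each pillar: (D, P, N) -> (C, P).
import torch
import torch.nn as nn

class PillarFeatureNet(nn.Module):
    def __init__(self, in_dim=9, out_dim=64):      # C=64 is an assumed value
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.bn = nn.BatchNorm1d(out_dim)

    def forward(self, x):                           # x: (D, P, N)
        d, p, n = x.shape
        x = x.permute(1, 2, 0).reshape(p * n, d)    # one row per point
        x = torch.relu(self.bn(self.linear(x)))     # (P*N, C)
        x = x.reshape(p, n, -1).permute(2, 0, 1)    # (C, P, N)
        return x.max(dim=2).values                  # max over points -> (C, P)
```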
step 104, generating a pseudo image: a pseudo image is generated through a scatter operator; the (C, P) tensor generated in the previous step is mapped back to the original pillar coordinates via each pillar's index value to create a pseudo image of size (C, H, W), where H and W represent the height and width of the pseudo image respectively;
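A minimal sketch of this scatter step is given below: each pillar's C-dimensional feature vector is written back to its grid cell to form the (C, H, W) pseudo image. The layout of the pillar index array is an assumption.

```python
# Sketch of step 104: scatter (C, P) pillar features back to their original
# grid positions to build a (C, H, W) pseudo image.
import torch

def scatter_to_pseudo_image(features, coords, height, width):
    """features: (C, P) tensor; coords: (P, 2) integer (row, col) indices."""
    c = features.shape[0]
    canvas = torch.zeros(c, height * width, dtype=features.dtype)
    flat_idx = coords[:, 0] * width + coords[:, 1]   # pillar -> flattened cell
    canvas[:, flat_idx] = features                   # empty cells stay zero (sparse)
    return canvas.reshape(c, height, width)
```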
step two, adding the improved ECA module into the 2D convolutional backbone network of the PointPillars algorithm, and processing the pseudo image into a high-level representation;
and step three, performing Bbox regression with an SSD (Single Shot Detector) detection head to realize 3D target detection for automatic driving.
In the implementation, the SSD serves as a two-dimensional detector: the preset prior boxes are matched with the ground-truth boxes through the two-dimensional intersection over union (IoU), and the box height and the height of the center point serve as additional regression targets; detection and classification are carried out only on the top-view (bird's-eye-view) grid plane, and the z-axis height of the finally output three-dimensional bounding box is obtained by a separate regression.
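As a rough illustration of the matching criterion, the sketch below computes the axis-aligned 2D IoU between a prior box and a ground-truth box in the bird's-eye-view plane; the corner-format box representation is an assumption.

```python
# Sketch of the 2D IoU used to match prior boxes to ground-truth boxes in the
# top-view grid. Boxes are assumed to be [x1, y1, x2, y2] corner format.
def iou_2d(box_a, box_b):
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)           # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)         # intersection / union
```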
In this embodiment, when the point cloud is divided in step 101, the range of the point cloud coordinates on the XY plane and the size of each point cloud pillar are set.
In this embodiment, in step 101 each point in a pillar is encoded as a 9-dimensional vector D: (x, y, z, r, xc, yc, zc, xp, yp), where x, y, z, and r represent the three coordinates of the point in three-dimensional space and its reflection intensity respectively; xc, yc, and zc represent the offsets from the arithmetic mean of all points in the pillar; and xp and yp represent the offsets of the point from the x, y center of the pillar.
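The decoration described above can be sketched as follows; the function name and array layout are assumptions for illustration.

```python
# Sketch of the 9-dimensional point decoration (x, y, z, r, xc, yc, zc, xp, yp).
import numpy as np

def decorate_points(points, pillar_center_xy):
    """points: (n, 4) array of (x, y, z, r); returns an (n, 9) decorated array."""
    mean = points[:, :3].mean(axis=0)               # arithmetic mean of the pillar
    offsets_c = points[:, :3] - mean                # xc, yc, zc
    offsets_p = points[:, :2] - pillar_center_xy    # xp, yp
    return np.hstack([points, offsets_c, offsets_p])
```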
In this embodiment, the 2D convolutional backbone network of the PointPillars algorithm adopts an RPN (Region Proposal Network) backbone identical to that of VoxelNet, divided into two sub-networks: a top-down sub-network extracts features on feature maps of progressively smaller spatial resolution, and the other sub-network upsamples the features extracted at the different resolutions to the same size through deconvolution operations and then concatenates them.
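A compressed PyTorch sketch of this two-part backbone follows; the channel counts, the number of stages, and the single convolution per stage are simplifying assumptions (the actual network stacks several convolutions per stage).

```python
# Sketch of the RPN-style backbone: a top-down branch that halves resolution
# per stage, plus deconvolution heads that upsample every stage's output to a
# common size before concatenation.
import torch
import torch.nn as nn

def conv_block(cin, cout, stride):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, 1),
                         nn.BatchNorm2d(cout), nn.ReLU())

class RPNBackbone(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.down1 = conv_block(c, c, 1)                    # full resolution
        self.down2 = conv_block(c, 2 * c, 2)                # 1/2 resolution
        self.down3 = conv_block(2 * c, 4 * c, 2)            # 1/4 resolution
        self.up1 = nn.ConvTranspose2d(c, 2 * c, 1, 1)       # keep size
        self.up2 = nn.ConvTranspose2d(2 * c, 2 * c, 2, 2)   # x2 upsample
        self.up3 = nn.ConvTranspose2d(4 * c, 2 * c, 4, 4)   # x4 upsample

    def forward(self, x):
        x1 = self.down1(x); x2 = self.down2(x1); x3 = self.down3(x2)
        return torch.cat([self.up1(x1), self.up2(x2), self.up3(x3)], dim=1)
```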
In this embodiment, the modified ECA module in the second step changes the convolution kernel size k of the ECA module from a variable value to a fixed value.
In this embodiment, the fixed value is 3.
That is, k=3: no matter how the channel dimension C' changes, the convolution kernel size k of the ECA module always stays at the fixed value 3. k=3 is chosen because a small convolution kernel reduces the parameter count, and the experiments below also show that fixing the convolution kernel size of the ECA module at k=3 improves detection accuracy.
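A hedged PyTorch sketch of the ECA module with the kernel size pinned at k=3 is shown below; it follows the published ECA structure (global average pooling, a 1D convolution across channels, and a sigmoid gate), with only the kernel-size rule changed as the patent proposes.

```python
# Sketch of the improved ECA module: the standard ECA derives k from the
# channel dimension; here k is fixed at 3.
import torch
import torch.nn as nn

class FixedKernelECA(nn.Module):
    def __init__(self, k=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                         # (B, C, 1, 1)
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)  # 1D conv over channels

    def forward(self, x):                              # x: (B, C, H, W)
        w = self.pool(x).squeeze(-1).transpose(1, 2)   # (B, 1, C)
        w = torch.sigmoid(self.conv(w))                # cross-channel interaction
        w = w.transpose(1, 2).unsqueeze(-1)            # (B, C, 1, 1)
        return x * w                                   # reweight the channels
```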
In implementation, a structural diagram of the improved ECA module in the 2D convolutional backbone network is shown in FIG. 2.
In this embodiment, when the improved ECA module is added to the 2D convolutional backbone network of the PointPillars algorithm in step two, a plurality of improved ECA modules are added. That is, an attention mechanism is added to the PointPillars algorithm: multiple ECA modules learn the relations among channels, so the model can automatically learn the importance of different channel features, which helps assign weights to different parts of the data fed into the ECA modules. The 2D CNN backbone network of ECA-PointPillars can therefore extract key information and suppress unimportant information. With this improvement, a considerable performance gain is achieved while adding only a small number of extra parameters.
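One possible way to attach the modules is sketched below, appending a FixedKernelECA (from the sketch above) after a convolution stage of the backbone; the exact insertion points are not specified in this description, so the placement here is an assumption.

```python
# Sketch only: a backbone convolution stage followed by the improved ECA
# module. Where exactly the ECA modules sit in the real network is assumed.
import torch.nn as nn

def conv_stage_with_eca(cin, cout, stride):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride, 1),
        nn.BatchNorm2d(cout),
        nn.ReLU(),
        FixedKernelECA(k=3),   # channel attention after the stage (assumed placement)
    )
```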
Specifically, a block diagram of the 2D convolutional backbone network of the PointPillars algorithm is shown in FIG. 3.
To verify the technical effect of the method, the KITTI 3D object detection benchmark dataset is used in the comparative experiments; the dataset consists of lidar point clouds and images, and the experiments are trained with the lidar point clouds only. The samples of the KITTI 3D benchmark are divided into 7481 training samples and 7518 test samples, containing 80256 labeled objects in total. Distant objects are filtered out according to the height of their bounding box on the image plane.
the experimental environment included ubuntu18.04, pytorch 1.1 and python 3.6. The video card uses NVIDIA QuadroP4000 and the CPU uses Intel Xeon (R) Si layer 4210. All settings of the Point Pillars, ECA-Point Pillars (1 ECA module) and ECA-Point Pillars are identical.
ECA-PointPillars optimizes the loss function with an Adam optimizer; the initial learning rate is 2×10⁻⁴ and is multiplied by 0.8 every 15 epochs, for 160 epochs in total, with a batch size of 4 during training. The xy resolution is set to 0.16 m, the maximum number of pillars (P) is 12000, and the maximum number of points per pillar (N) is 100. The anchors of each object category have two orientations: 0 degrees and 90 degrees. Inference uses axis-aligned non-maximum suppression (NMS) with an overlap threshold of 0.5 IoU.
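The training schedule described above corresponds to the following PyTorch sketch; `model` and `train_loader` are assumed to exist, and the loss interface is illustrative.

```python
# Sketch of the reported schedule: Adam, initial lr 2e-4, lr x0.8 every
# 15 epochs, 160 epochs in total, batch size 4.
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.8)

for epoch in range(160):
    for batch in train_loader:     # batches of size 4
        loss = model(batch)        # assumed to return the combined detection loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()               # multiply lr by 0.8 every 15 epochs
```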
the IoU (cross over ratio) threshold for select cars is 0.7 and the IoU threshold for pedestrians/bicycles is 0.5. Average detection Accuracy (AP) and average mean square similarity (AOS) for automobiles, pedestrians, and bicycles are presented as simple, medium, and difficult levels, respectively.
The BEV (bird's-eye-view) detection performance of PointPillars and of the fixed-k variant is shown in Table 1:
Table 1. BEV (bird's-eye-view) detection performance of PointPillars and the fixed-k variant
As can be seen from Table 1, for BEV detection, the AP values of ECA-PointPillars (fixed k value) on the KITTI validation dataset are higher than the corresponding PointPillars values for cars, pedestrians, and bicycles under each of the simple, medium, and difficult conditions.
The 3D (three-dimensional) detection performance of PointPillars and of the fixed-k variant is shown in Table 2:
Table 2. 3D (three-dimensional) detection performance of PointPillars and the fixed-k variant
As can be seen from Table 2, for 3D detection, and similarly to BEV detection, the AP values of ECA-PointPillars (fixed k value) on the KITTI validation dataset are higher than the corresponding PointPillars values for cars, pedestrians, and bicycles under each of the simple, medium, and difficult conditions. Furthermore, compared with PointPillars, the improvement of ECA-PointPillars (fixed k value) on pedestrians under the simple, medium, and difficult conditions is particularly pronounced.
The AOS (average orientation similarity) detection performance of PointPillars and of the fixed-k variant is shown in Table 3:
Table 3. AOS (average orientation similarity) detection performance of PointPillars and the fixed-k variant
As can be seen from Table 3, for AOS detection, the pedestrian AOS values of ECA-PointPillars (k=3) are lower than those of PointPillars under the simple, medium, and difficult conditions, while for cars and bicycles the AOS values of ECA-PointPillars (k=3) are slightly higher than those of PointPillars under all three conditions.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing descriptions of specific exemplary embodiments of the present application are presented for purposes of illustration and description. It is not intended to limit the application to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain the specific principles of the application and its practical application to thereby enable one skilled in the art to make and utilize the application in various exemplary embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the application be defined by the claims and their equivalents.

Claims (7)

1. An automatic driving target detection method based on a point cloud columnar rapid coding algorithm, characterized by comprising the following steps:
step one, converting the point cloud into a sparse pseudo image with a feature encoder network; the specific process is as follows:
step 101, point cloud input: dividing the input point cloud into a number of point cloud pillar units; the point cloud is partitioned on the XY plane with step length L, so that each pillar corresponds to a three-dimensional cell;
step 102, stacking point cloud pillars: when the number of points in a pillar exceeds N, N points are selected by random sampling; when the number of points in a pillar is less than N, the pillar is padded to N points with zeros; one frame of point cloud data is thereby encoded into a dense tensor of dimensions (D, P, N);
step 103, feature learning: using a PointNet network, each point carrying D-dimensional features is processed with a linear layer, BatchNorm, and a ReLU activation function to generate a tensor of dimensions (C, P, N); a max-pooling operation over each pillar then yields a tensor of dimensions (C, P);
step 104, generating a pseudo image: a pseudo image is generated through a scatter operator; the (C, P) tensor generated in the previous step is mapped back to the original pillar coordinates via each pillar's index value to create a pseudo image of size (C, H, W), where H and W represent the height and width of the pseudo image respectively;
step two, adding the improved ECA module into the 2D convolutional backbone network of the PointPillars algorithm, and processing the pseudo image into a high-level representation;
and step three, performing Bbox regression with an SSD detection head to realize 3D target detection for automatic driving.
2. The automatic driving target detection method based on the point cloud columnar rapid coding algorithm according to claim 1, characterized in that: in step 101, when the point cloud is divided, the range of the point cloud coordinates on the XY plane and the size of each point cloud pillar are set.
3. The automatic driving target detection method based on the point cloud columnar rapid coding algorithm according to claim 1, characterized in that: in step 101, each point in a pillar is encoded as a 9-dimensional vector D: (x, y, z, r, xc, yc, zc, xp, yp), where x, y, z, and r represent the three coordinates of the point in three-dimensional space and its reflection intensity respectively; xc, yc, and zc represent the offsets from the arithmetic mean of all points in the pillar; and xp and yp represent the offsets of the point from the x, y center of the pillar.
4. The automatic driving target detection method based on the point cloud columnar rapid coding algorithm according to claim 1, characterized in that: the 2D convolutional backbone network of the PointPillars algorithm adopts an RPN backbone network divided into two sub-networks: a top-down sub-network extracts features on feature maps of progressively smaller spatial resolution, and the other sub-network upsamples the features extracted at the different resolutions to the same size through deconvolution operations and then concatenates them.
5. The automatic driving target detection method based on the point cloud columnar rapid coding algorithm according to claim 1 or 4, characterized in that: the improved ECA module in step two changes the convolution kernel size k of the ECA module from a variable value to a fixed value.
6. The automatic driving target detection method based on the point cloud columnar rapid coding algorithm according to claim 5, characterized in that: the fixed value is 3.
7. The automatic driving target detection method based on the point cloud columnar rapid coding algorithm according to claim 5, characterized in that: in step two, when the improved ECA module is added into the 2D convolutional backbone network of the PointPillars algorithm, a plurality of improved ECA modules are added.
CN202310687054.9A 2023-06-09 2023-06-09 Automatic driving target detection method based on point cloud columnar rapid coding algorithm Pending CN116704463A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310687054.9A CN116704463A (en) 2023-06-09 2023-06-09 Automatic driving target detection method based on point cloud columnar rapid coding algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310687054.9A CN116704463A (en) 2023-06-09 2023-06-09 Automatic driving target detection method based on point cloud columnar rapid coding algorithm

Publications (1)

Publication Number Publication Date
CN116704463A true CN116704463A (en) 2023-09-05

Family

ID=87836953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310687054.9A Pending CN116704463A (en) 2023-06-09 2023-06-09 Automatic driving target detection method based on point cloud columnar rapid coding algorithm

Country Status (1)

Country Link
CN (1) CN116704463A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117496477A (en) * 2024-01-02 2024-02-02 广汽埃安新能源汽车股份有限公司 Point cloud target detection method and device
CN117496477B (en) * 2024-01-02 2024-05-03 广汽埃安新能源汽车股份有限公司 Point cloud target detection method and device

Similar Documents

Publication Publication Date Title
Caltagirone et al. Fast LIDAR-based road detection using fully convolutional neural networks
CN112613378B (en) 3D target detection method, system, medium and terminal
CN105354568A (en) Convolutional neural network based vehicle logo identification method
CN112347987A (en) Multimode data fusion three-dimensional target detection method
Turay et al. Toward performing image classification and object detection with convolutional neural networks in autonomous driving systems: A survey
EP4174792A1 (en) Method for scene understanding and semantic analysis of objects
CN116704463A (en) Automatic driving target detection method based on point cloud columnar rapid coding algorithm
CN113076804B (en) Target detection method, device and system based on YOLOv4 improved algorithm
CN114463736A (en) Multi-target detection method and device based on multi-mode information fusion
CN114004978A (en) Point cloud target detection method based on attention mechanism and deformable convolution
CN116563488A (en) Three-dimensional target detection method based on point cloud body column
CN114120115A (en) Point cloud target detection method for fusing point features and grid features
Bieder et al. Exploiting multi-layer grid maps for surround-view semantic segmentation of sparse lidar data
CN113536920A (en) Semi-supervised three-dimensional point cloud target detection method
Stanisz et al. Optimisation of the pointpillars network for 3d object detection in point clouds
CN114612883A (en) Forward vehicle distance detection method based on cascade SSD and monocular depth estimation
CN116665003B (en) Point cloud three-dimensional target detection method and device based on feature interaction and fusion
CN112950786A (en) Vehicle three-dimensional reconstruction method based on neural network
CN116523970A (en) Dynamic three-dimensional target tracking method and device based on secondary implicit matching
CN117011819A (en) Lane line detection method, device and equipment based on feature guidance attention
CN114913519B (en) 3D target detection method and device, electronic equipment and storage medium
CN116310368A (en) Laser radar 3D target detection method
CN116229448A (en) Three-dimensional target detection method, device, equipment and readable storage medium
WO2022175057A1 (en) Apparatus, system and method for translating sensor label data between sensor domains
CN115145253A (en) End-to-end automatic driving method and system and training method of automatic driving model

Legal Events

Date Code Title Description
PB01 Publication