CN114004978A - Point cloud target detection method based on attention mechanism and deformable convolution - Google Patents

Point cloud target detection method based on attention mechanism and deformable convolution

Info

Publication number
CN114004978A
CN114004978A
Authority
CN
China
Prior art keywords
feature
point cloud
convolution
deformable convolution
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111297310.0A
Other languages
Chinese (zh)
Inventor
曾凯
朱明亮
朱艳
沈韬
谢江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202111297310.0A priority Critical patent/CN114004978A/en
Publication of CN114004978A publication Critical patent/CN114004978A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods


Abstract

The invention relates to a point cloud target detection method based on an attention mechanism and deformable convolution, and belongs to the technical field of target detection. The invention first performs feature encoding on the raw point cloud data, encoding the point cloud space into a pseudo-image, and then combines a channel attention module with a deformable convolution module, thereby improving the network's ability to extract important features and its adaptability to feature deformation.

Description

Point cloud target detection method based on attention mechanism and deformable convolution
Technical Field
The invention relates to a point cloud target detection method based on an attention mechanism and deformable convolution, and belongs to the technical field of target detection.
Background
With the continuous progress of artificial intelligence technology, research in the field of autonomous driving has also advanced greatly. Completing perception tasks in autonomous driving using lidar point cloud data has attracted great attention from both industry and academia. In autonomous driving scenes, environment perception is the core task: surrounding target objects serve as the information source, and processing this information enables judgment of drivable areas, target recognition, and rule understanding in traffic scenes. Unlike traditional RGB images, point cloud data contain the spatial geometric information of objects and can accurately record the position and structure information of object reflection points.
Voxel-based methods divide the point cloud space into voxels, which are analogous to pixels in an image: the point cloud space is divided into grids, the grids are feature-encoded to form a regular feature representation space, and this representation is fed into three-dimensional convolutions for feature extraction to complete the final detection task. Pillar-based methods, building on voxelization, ignore the point cloud's information along the Z axis and use only the X and Y coordinates, encoding the cloud into a pseudo-image that is fed into two-dimensional convolutions for feature extraction before the detection task is completed.
Pillar-based point cloud target detection therefore has greater practical value in industry. However, owing to the sparsity of point cloud data, the original network cannot focus well on an object's important feature information, and once the point cloud has been encoded into a pseudo-image, the feature extraction module cannot adapt effectively to geometric deformation of objects, which limits perception tasks in autonomous driving scenes.
Disclosure of Invention
The invention aims to provide a point cloud target detection method based on an attention mechanism and deformable convolution, in order to solve the problems of low detection precision and a high miss rate caused by the small number of pedestrian and rider samples in point cloud target detection.
The technical scheme of the invention is as follows: a point cloud target detection method based on an attention mechanism and deformable convolution first performs feature encoding on the raw point cloud data, encoding the point cloud space into a pseudo-image; it then combines a channel attention module with a deformable convolution module, improving the network's ability to extract important features and its adaptability to feature deformation; finally, detection is performed on the point cloud target to obtain the final output feature map.
The method comprises the following specific steps:
step 1: the method comprises the steps of firstly dividing an original point cloud data space into single cylinders, and then coding the single cylinders into a pseudo image in a characteristic coding mode, wherein the pseudo image characteristic information comprises channel number, height and width information, so that the pseudo image characteristic information can be suitable for characteristic extraction in a two-dimensional convolution mode.
Step 2: feature extraction is carried out on the pseudo image through two-dimensional convolution, and in the process, a deformable convolution module is introduced to enhance the adaptability of the network to the geometric deformation of the target.
Step 3: after a deformable convolution module is introduced into the network, a channel attention module is introduced, the extraction capability of the network on the important feature information of the target is enhanced, and finally feature extraction is completed to obtain a feature map.
Step 4: and (5) sending the characteristic diagram into a detector to obtain a final output characteristic diagram, and finishing a detection task.
Step 1 is specifically as follows: the original point cloud data take the form (x, y, z, r); only the position coordinate information is used. Feature encoding is performed by the PFN layer of the PointPillars algorithm, which converts the point cloud tensor [x, y, z] into a pseudo-image; the converted point cloud feature tensor is [B, C, H, W], representing the batch size and the channel number, height, and width of the pseudo-image, respectively.
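As a rough illustration of the pseudo-image encoding, the scatter step that places each pillar's encoded feature vector back onto a dense [C, H, W] canvas can be sketched in NumPy as follows (the function name and toy sizes are illustrative, not from the patent; the per-pillar encoding itself would be done by the PFN layer):

```python
import numpy as np

def scatter_pillars_to_pseudo_image(pillar_features, pillar_coords, num_channels, height, width):
    """Scatter per-pillar feature vectors onto a dense [C, H, W] canvas.

    pillar_features: (P, C) array, one encoded feature vector per non-empty pillar.
    pillar_coords:   (P, 2) integer array of (row, col) grid indices per pillar.
    """
    canvas = np.zeros((num_channels, height, width), dtype=pillar_features.dtype)
    rows, cols = pillar_coords[:, 0], pillar_coords[:, 1]
    canvas[:, rows, cols] = pillar_features.T  # place each pillar's C-vector at its cell
    return canvas

# toy example: 3 non-empty pillars, 4 channels, an 8x8 grid
feats = np.arange(12, dtype=np.float32).reshape(3, 4)
coords = np.array([[0, 0], [2, 5], [7, 7]])
img = scatter_pillars_to_pseudo_image(feats, coords, 4, 8, 8)
```

The resulting canvas is the pseudo-image that two-dimensional convolutions can then process like an ordinary image; empty pillars simply stay zero.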
In Step2, the deformable convolution module specifically includes:
y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)

wherein y(p_0) denotes the output feature map of the deformable convolution module at location p_0, p_0 denotes the center point of an ordinary convolution kernel, p_n enumerates the sampling points of the ordinary convolution kernel over the regular grid R, w(p_n) denotes the kernel weight at p_n, x(·) denotes the input feature map, and Δp_n denotes the learned offset in the deformable convolution.
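A minimal NumPy sketch of this sampling rule, assuming a 3×3 kernel evaluated at a single output location (in a real network the offsets Δp_n come from a separate convolution branch, and bilinear interpolation handles the fractional positions):

```python
import numpy as np

def bilinear_sample(x, py, px):
    """Bilinearly sample the 2-D feature map x at the fractional location (py, px)."""
    H, W = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    y1, x1 = y0 + 1, x0 + 1

    def at(yy, xx):  # border padding for out-of-range corners
        return x[np.clip(yy, 0, H - 1), np.clip(xx, 0, W - 1)]

    wy1, wx1 = py - y0, px - x0
    wy0, wx0 = 1.0 - wy1, 1.0 - wx1
    return (wy0 * wx0 * at(y0, x0) + wy0 * wx1 * at(y0, x1)
            + wy1 * wx0 * at(y1, x0) + wy1 * wx1 * at(y1, x1))

def deformable_conv_at(x, w, p0, offsets):
    """y(p0) = sum_n w(p_n) * x(p0 + p_n + delta_p_n) for one 3x3 kernel location."""
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # the regular grid R of p_n
    y = 0.0
    for (dy, dx), wn, (oy, ox) in zip(grid, w.ravel(), offsets):
        y += wn * bilinear_sample(x, p0[0] + dy + oy, p0[1] + dx + ox)
    return y

x = np.arange(25, dtype=np.float64).reshape(5, 5)
w = np.full((3, 3), 1.0 / 9.0)
y_center = deformable_conv_at(x, w, (2, 2), [(0.0, 0.0)] * 9)  # all offsets zero
```

With all offsets zero the result equals an ordinary convolution; non-zero fractional offsets let the kernel deform to follow the object's geometry.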
In Step3, the channel attention module specifically includes:
\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c

wherein \tilde{x}_c is the final feature matrix obtained by weighting the feature matrix u_c with s_c through a dot-product (channel-wise scaling) operation, u_c is the feature map of channel c after feature extraction, and s_c is the channel weight obtained after the excitation operation.
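The squeeze-excitation-scale pipeline behind this channel attention can be sketched as follows (a plain NumPy sketch; the reduction ratio r and the FC weights w1, w2 are illustrative assumptions, not values from the patent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(u, w1, w2):
    """SE-style channel attention: squeeze, excitation, then channel-wise scaling.

    u:  (C, H, W) feature map u_c after feature extraction.
    w1: (C//r, C) weights of the reduction FC layer (r is the reduction ratio).
    w2: (C, C//r) weights of the expansion FC layer.
    """
    z = u.mean(axis=(1, 2))                    # squeeze: global average pooling -> (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # excitation: FC -> ReLU -> FC -> sigmoid
    return u * s[:, None, None]                # scale: x_tilde_c = s_c * u_c

rng = np.random.default_rng(0)
u = rng.standard_normal((4, 3, 3))             # C=4 channels, reduction ratio r=2
out = channel_attention(u, rng.standard_normal((2, 4)), rng.standard_normal((4, 2)))
```

Each channel is multiplied by a single learned scalar s_c, so informative channels are amplified and uninformative ones are suppressed.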
The Step4 is specifically as follows:
and sending the feature graph into a detector, performing convolution operation on the input feature graph for three times to extract features on different dimensions, converting the three feature graphs with different sizes into feature graphs with the same size through deconvolution operation, and realizing fusion of the feature graphs through splicing operation to obtain a final output feature graph.
The final output feature map is sent into a two-dimensional-image-based detection model, wherein the detection model includes at least the YOLO (You Only Look Once) algorithm model, the SSD (Single Shot MultiBox Detector) algorithm model, and the RPN (Region Proposal Network) algorithm model.
The invention has the following beneficial effects: through deformable convolution, the network adapts to the feature changes caused by the geometric deformation of target objects such as pedestrians and riders; through the attention mechanism, the important feature information of target objects in the point cloud space is extracted effectively. The resulting feature map accurately represents the feature information of pedestrians and riders, which reduces the network's miss rate for pedestrians and riders and improves its detection precision for vehicles, pedestrians, and riders.
Drawings
FIG. 1 is a flow chart of the steps of the present invention;
FIG. 2 is a diagram of a detection network architecture in an embodiment of the present invention;
FIG. 3 is a diagram showing the results of detection in the example of the present invention.
Detailed Description
The invention is further described with reference to the following drawings and detailed description.
As shown in fig. 1, a point cloud target detection method based on attention mechanism and deformable convolution specifically includes the following steps:
step 1: the method comprises the steps of firstly dividing an original point cloud data space into single cylinders, and then coding the single cylinders into a pseudo image in a characteristic coding mode, wherein the pseudo image characteristic information comprises information such as channel number, height and width, and the like, so that the pseudo image characteristic information can be suitable for feature extraction in a two-dimensional convolution mode.
This embodiment performs feature encoding on the original point cloud data, encoding it into pillars through the PFN (Pillar Feature Net) layer of the PointPillars model. The size of each pillar is [0.16, 0.16, 3], the extent of the whole point cloud space is [0, -19.84, -2.5, 47.36, 19.84, 0.5], and the size of the encoded pseudo-image is [2, 64, 296, 248], where the batch size is 2, the number of channels is 64, the height is 296, and the width is 248.
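The pseudo-image height and width follow directly from the point cloud range and the pillar footprint. Assuming the range is ordered [x_min, y_min, z_min, x_max, y_max, z_max] and that height corresponds to the x axis (both are assumptions made for this check), a quick computation reproduces the 296 × 248 grid:

```python
# point cloud range and pillar footprint from the embodiment
x_min, y_min, z_min, x_max, y_max, z_max = 0.0, -19.84, -2.5, 47.36, 19.84, 0.5
pillar_dx = pillar_dy = 0.16  # pillar size [0.16, 0.16, 3]: the 3 m z extent is not gridded

height = round((x_max - x_min) / pillar_dx)  # number of pillar cells along x
width = round((y_max - y_min) / pillar_dy)   # number of pillar cells along y
```

This matches the encoded pseudo-image size of 296 × 248 stated above.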
Step 2: feature extraction is carried out on the pseudo image through two-dimensional convolution, and in the process, a deformable convolution module is introduced to enhance the adaptability of the network to the geometric deformation of the target.
As shown in fig. 2, SE-DCN denotes the introduced module combining deformable convolution and the attention mechanism. The input of each deformable convolution layer is the number of channels of the previous layer's feature map; the channel inputs of the three modules in the figure are 64, 128, and 256, respectively, and the offset input of the deformable convolution module has 18 channels (two offset components for each of the 3 × 3 kernel positions).
Step 3: after a deformable convolution module is introduced into the network, a channel attention module is introduced, and the extraction capability of the network on the important feature information of the target is enhanced.
To enhance the network's extraction of the target object's important features through the channel attention mechanism, the number of input channels of the attention module at each layer is 64, 128, and 256, respectively.
Step 4: after feature extraction is finished, the feature map is sent to the detector to complete the detection task.
In this example, three convolution operations are performed on the input feature maps to extract features at different scales. Each feature extraction block comprises a convolution operation, a pooling operation, and a ReLU activation; the input tensor of each layer is [C, H, W], comprising the channel number, height, and width, and the input of each layer is the output of the previous layer's feature map. The output size of the first-layer feature map is [64, 296, 248], that of the second layer is [128, 148, 124], and that of the third layer is [256, 74, 62].
In this example, the three feature maps of different sizes are converted into feature maps of the same size through deconvolution; the output size of each feature map after deconvolution is [128, 148, 124]. The three same-size feature maps are fused through a concatenation operation, and the final output feature map has size [2, 384, 148, 124], where 2 is the batch size, 384 the final channel number, 148 the height, and 124 the width of the feature map.
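The shape bookkeeping of this fusion can be sketched in NumPy (nearest-neighbour upsampling and a random 1×1 projection stand in for the learned deconvolutions; only the tensor shapes, not the learned values, match the embodiment):

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling standing in for the learned deconvolution."""
    return x.repeat(factor, axis=-2).repeat(factor, axis=-1)

def project_channels(x, c_out, rng):
    """Random 1x1 convolution standing in for the channel change done by the deconvolution."""
    w = rng.standard_normal((c_out, x.shape[1]))
    return np.einsum('oc,bchw->bohw', w, x)

rng = np.random.default_rng(0)

# the three backbone outputs from the embodiment (batch size 2)
f1 = np.zeros((2, 64, 296, 248), dtype=np.float32)
f2 = np.zeros((2, 128, 148, 124), dtype=np.float32)
f3 = np.zeros((2, 256, 74, 62), dtype=np.float32)

# bring all three to the middle resolution [148, 124] and to 128 channels each
u1 = project_channels(f1[:, :, ::2, ::2], 128, rng)  # stride-2 subsample stands in for a stride-2 conv
u2 = project_channels(f2, 128, rng)
u3 = project_channels(upsample_nearest(f3, 2), 128, rng)

fused = np.concatenate([u1, u2, u3], axis=1)  # 3 x 128 = 384 channels
```

Concatenating the three 128-channel maps along the channel axis yields the [2, 384, 148, 124] output stated above.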
The final output feature map is sent into a two-dimensional-image-based detection model; in this example it is sent into an SSD algorithm model for classification and regression to obtain the final detection result.
This example performs permute and contiguous operations on the output feature map, converting its size to [2, 148, 124, 384] to meet the input requirements of the classification and regression models.
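In NumPy terms, the permute-and-contiguous step is a transpose from [B, C, H, W] to [B, H, W, C] followed by a contiguity copy:

```python
import numpy as np

x = np.zeros((2, 384, 148, 124), dtype=np.float32)     # detector output in [B, C, H, W]
x_nhwc = np.ascontiguousarray(x.transpose(0, 2, 3, 1)) # permute + contiguous -> [B, H, W, C]
```

The explicit contiguity copy matters because a bare transpose only changes strides; downstream layers that assume row-major memory need the copied layout.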
Fig. 3 shows the experimental results of this example. Compared with the PointPillars and VoxelNet models, the detection accuracy for the target objects is improved, and the missed detections observed with the PointPillars model are reduced.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.

Claims (5)

1. A point cloud target detection method based on an attention mechanism and deformable convolution is characterized in that:
step 1: first dividing the original point cloud data space into individual pillars, and encoding the pillars into a pseudo-image by feature encoding, the pseudo-image feature information comprising the channel number, height, and width, so that it is suitable for feature extraction by two-dimensional convolution;
step 2: extracting features on the pseudo image through two-dimensional convolution, and introducing a deformable convolution module in the process;
step 3: introducing a channel attention module after introducing a deformable convolution module into a network, and finally completing feature extraction to obtain a feature map;
step 4: and (5) sending the characteristic diagram into a detector to obtain a final output characteristic diagram, and finishing a detection task.
2. The point cloud target detection method based on an attention mechanism and deformable convolution of claim 1, wherein Step 1 is specifically: the original point cloud data take the form (x, y, z, r); only the position coordinate information is used; feature encoding is performed by the PFN layer of the PointPillars algorithm, converting the point cloud tensor [x, y, z] into a pseudo-image; and the converted point cloud feature tensor is [B, C, H, W], representing the batch size and the channel number, height, and width of the pseudo-image, respectively.
3. The method of point cloud target detection based on attention mechanism and deformable convolution of claim 1, wherein in Step2, the deformable convolution module is specifically:
y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)

wherein y(p_0) denotes the output feature map of the deformable convolution module, p_0 denotes the center point of an ordinary convolution kernel, p_n enumerates the sampling points of the ordinary convolution kernel over the regular grid R, w(p_n) denotes the kernel weight at p_n, x(·) denotes the input feature map, and Δp_n denotes the offset in the deformable convolution.
4. The method of claim 1, wherein in Step3, the channel attention module is specifically:
\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c

wherein \tilde{x}_c is the final feature matrix obtained by weighting the feature matrix u_c with s_c through a dot-product (channel-wise scaling) operation, u_c is the feature map of channel c after feature extraction, and s_c is the channel weight obtained after the excitation operation.
5. The method of point cloud target detection based on attention mechanism and deformable convolution of claim 1, wherein Step4 is specifically:
sending the feature map into a detector, performing three convolution operations on the input feature map to extract features at different scales, converting the three feature maps of different sizes into feature maps of the same size through deconvolution, and fusing the feature maps through a concatenation operation to obtain the final output feature map;
and sending the final output feature map into a two-dimensional-image-based detection model, wherein the detection model includes at least a YOLO algorithm model, an SSD algorithm model, and an RPN algorithm model.
CN202111297310.0A 2021-11-04 2021-11-04 Point cloud target detection method based on attention mechanism and deformable convolution Pending CN114004978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111297310.0A CN114004978A (en) 2021-11-04 2021-11-04 Point cloud target detection method based on attention mechanism and deformable convolution


Publications (1)

Publication Number Publication Date
CN114004978A true CN114004978A (en) 2022-02-01

Family

ID=79927072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111297310.0A Pending CN114004978A (en) 2021-11-04 2021-11-04 Point cloud target detection method based on attention mechanism and deformable convolution

Country Status (1)

Country Link
CN (1) CN114004978A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116343192A * 2023-02-10 2023-06-27 泉州装备制造研究所 Outdoor 3D target detection method and system
CN116091496A * 2023-04-07 2023-05-09 菲特(天津)检测技术有限公司 Defect detection method and device based on improved Faster-RCNN
CN116091496B * 2023-04-07 2023-11-24 菲特(天津)检测技术有限公司 Defect detection method and device based on improved Faster-RCNN

Similar Documents

Publication Publication Date Title
Alonso et al. 3d-mininet: Learning a 2d representation from point clouds for fast and efficient 3d lidar semantic segmentation
Wen et al. Fast and accurate 3D object detection for lidar-camera-based autonomous vehicles using one shared voxel-based backbone
CN109190752B (en) Image semantic segmentation method based on global features and local features of deep learning
CN111862101A (en) 3D point cloud semantic segmentation method under aerial view coding visual angle
CN114004978A (en) Point cloud target detection method based on attention mechanism and deformable convolution
CN112613378B (en) 3D target detection method, system, medium and terminal
CN112347987A (en) Multimode data fusion three-dimensional target detection method
EP4174792A1 (en) Method for scene understanding and semantic analysis of objects
CN112819080B (en) High-precision universal three-dimensional point cloud identification method
CN112907573B (en) Depth completion method based on 3D convolution
CN116563488A (en) Three-dimensional target detection method based on point cloud body column
CN116486368A (en) Multi-mode fusion three-dimensional target robust detection method based on automatic driving scene
Chidanand et al. Multi-scale voxel class balanced ASPP for LIDAR pointcloud semantic segmentation
CN114118247A (en) Anchor-frame-free 3D target detection method based on multi-sensor fusion
CN117173399A (en) Traffic target detection method and system of cross-modal cross-attention mechanism
CN116704463A (en) Automatic driving target detection method based on point cloud columnar rapid coding algorithm
CN114550160A (en) Automobile identification method based on three-dimensional point cloud data and traffic scene
Huo et al. Semantic segmentation and scene reconstruction for traffic simulation using CNN
Wang et al. CenterPoint-SE: A single-stage anchor-free 3-D object detection algorithm with spatial awareness enhancement
CN117078982B (en) Deep learning-based large-dip-angle stereoscopic image alignment dense feature matching method
Tang et al. MPT-Net: Mask Point Transformer Network for Large Scale Point Cloud Semantic Segmentation
CN113011360B (en) Road traffic sign line detection method and system based on attention capsule network model
WANG et al. Point Cloud Processing Methods for 3D Point Cloud Detection Tasks
Hao et al. Semantic Segmentation for Traffic Scene Understanding Based on Mobile Networks
Zhou et al. Surround-View Road Scene Layout Estimation via IPM-Transformer for Autonomous Driving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination