CN114004978A - Point cloud target detection method based on attention mechanism and deformable convolution - Google Patents

Point cloud target detection method based on attention mechanism and deformable convolution

Info

Publication number
CN114004978A
CN114004978A
Authority
CN
China
Prior art keywords
feature
point cloud
convolution
deformable convolution
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111297310.0A
Other languages
Chinese (zh)
Inventor
曾凯
朱明亮
朱艳
沈韬
谢江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202111297310.0A priority Critical patent/CN114004978A/en
Publication of CN114004978A publication Critical patent/CN114004978A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods


Abstract

The invention relates to a point cloud target detection method based on an attention mechanism and deformable convolution, and belongs to the technical field of target detection. The invention first performs feature encoding on the raw point cloud data, encoding the point cloud space into a pseudo-image, and then combines a channel attention module with a deformable convolution module, thereby improving the network's ability to extract important features and its adaptability to feature deformation.

Description

Point cloud target detection method based on attention mechanism and deformable convolution
Technical Field
The invention relates to a point cloud target detection method based on an attention mechanism and deformable convolution, and belongs to the technical field of target detection.
Background
With the continuous progress of artificial intelligence technology, research in the field of autonomous driving has also advanced greatly. Completing perception tasks in autonomous driving using lidar point cloud data has attracted great attention from both industry and academia. In autonomous driving scenes, environment perception is the core task: surrounding target objects serve as the information source, and processing this information enables judgment of drivable areas, target recognition, and rule understanding in traffic scenes. Unlike traditional RGB images, point cloud data contain the spatial geometric information of objects and can accurately record the position and structure information of object reflection points.
Voxel-based methods divide the point cloud space into voxels, which are analogous to pixels in an image: the point cloud space is divided into grids, the grids are feature-encoded to form a regular feature representation space, and this representation is fed into three-dimensional convolutions for feature extraction to complete the final detection task. Pillar-based methods, building on voxelization, ignore the point cloud's information along the Z axis and use only the X and Y coordinates, encoding the cloud into a pseudo-image that is fed into two-dimensional convolutions for feature extraction before the detection task is completed.
Pillar-based point cloud target detection therefore has greater practical value in industry. However, owing to the sparsity of point cloud data, the original network cannot focus well on an object's important feature information, and once the point cloud has been encoded into a pseudo-image, the feature extraction module cannot adapt effectively to geometric deformation of objects, which limits perception tasks in autonomous driving scenes.
Disclosure of Invention
The invention aims to provide a point cloud target detection method based on an attention mechanism and deformable convolution, in order to solve the problems of low detection precision and a high miss rate caused by the small number of pedestrian and rider samples in point cloud target detection.
The technical scheme of the invention is as follows: a point cloud target detection method based on an attention mechanism and deformable convolution first performs feature encoding on the raw point cloud data, encoding the point cloud space into a pseudo-image; it then combines a channel attention module with a deformable convolution module, improving the network's ability to extract important features and its adaptability to feature deformation; finally, detection is performed on the point cloud target to obtain the final output feature map.
The method comprises the following specific steps:
step 1: the method comprises the steps of firstly dividing an original point cloud data space into single cylinders, and then coding the single cylinders into a pseudo image in a characteristic coding mode, wherein the pseudo image characteristic information comprises channel number, height and width information, so that the pseudo image characteristic information can be suitable for characteristic extraction in a two-dimensional convolution mode.
Step 2: feature extraction is carried out on the pseudo image through two-dimensional convolution, and in the process, a deformable convolution module is introduced to enhance the adaptability of the network to the geometric deformation of the target.
Step 3: after a deformable convolution module is introduced into the network, a channel attention module is introduced, the extraction capability of the network on the important feature information of the target is enhanced, and finally feature extraction is completed to obtain a feature map.
Step 4: and (5) sending the characteristic diagram into a detector to obtain a final output characteristic diagram, and finishing a detection task.
Step 1 is specifically as follows: the original point cloud data take the form (x, y, z, r); only the position coordinate information is used. Feature encoding is performed by the PFN layer of the PointPillars algorithm, which converts the point cloud tensor [x, y, z] into a pseudo-image; the converted point cloud feature tensor is [B, C, H, W], representing the batch size and the channel number, height, and width of the pseudo-image, respectively.
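As a rough illustration of the pseudo-image encoding, the scatter step that places each pillar's encoded feature vector back onto a dense [C, H, W] canvas can be sketched in NumPy as follows (the function name and toy sizes are illustrative, not from the patent; the per-pillar encoding itself would be done by the PFN layer):

```python
import numpy as np

def scatter_pillars_to_pseudo_image(pillar_features, pillar_coords, num_channels, height, width):
    """Scatter per-pillar feature vectors onto a dense [C, H, W] canvas.

    pillar_features: (P, C) array, one encoded feature vector per non-empty pillar.
    pillar_coords:   (P, 2) integer array of (row, col) grid indices per pillar.
    """
    canvas = np.zeros((num_channels, height, width), dtype=pillar_features.dtype)
    rows, cols = pillar_coords[:, 0], pillar_coords[:, 1]
    canvas[:, rows, cols] = pillar_features.T  # place each pillar's C-vector at its cell
    return canvas

# toy example: 3 non-empty pillars, 4 channels, an 8x8 grid
feats = np.arange(12, dtype=np.float32).reshape(3, 4)
coords = np.array([[0, 0], [2, 5], [7, 7]])
img = scatter_pillars_to_pseudo_image(feats, coords, 4, 8, 8)
```

The resulting canvas is the pseudo-image that two-dimensional convolutions can then process like an ordinary image; empty pillars simply stay zero.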
In Step2, the deformable convolution module specifically includes:
y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)

wherein y(p_0) denotes the output feature map of the deformable convolution module at location p_0, p_0 denotes the center point of an ordinary convolution kernel, p_n enumerates the sampling points of the ordinary convolution kernel over the regular grid R, w(p_n) denotes the kernel weight at p_n, x(·) denotes the input feature map, and Δp_n denotes the learned offset in the deformable convolution.
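A minimal NumPy sketch of this sampling rule, assuming a 3×3 kernel evaluated at a single output location (in a real network the offsets Δp_n come from a separate convolution branch, and bilinear interpolation handles the fractional positions):

```python
import numpy as np

def bilinear_sample(x, py, px):
    """Bilinearly sample the 2-D feature map x at the fractional location (py, px)."""
    H, W = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    y1, x1 = y0 + 1, x0 + 1

    def at(yy, xx):  # border padding for out-of-range corners
        return x[np.clip(yy, 0, H - 1), np.clip(xx, 0, W - 1)]

    wy1, wx1 = py - y0, px - x0
    wy0, wx0 = 1.0 - wy1, 1.0 - wx1
    return (wy0 * wx0 * at(y0, x0) + wy0 * wx1 * at(y0, x1)
            + wy1 * wx0 * at(y1, x0) + wy1 * wx1 * at(y1, x1))

def deformable_conv_at(x, w, p0, offsets):
    """y(p0) = sum_n w(p_n) * x(p0 + p_n + delta_p_n) for one 3x3 kernel location."""
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # the regular grid R of p_n
    y = 0.0
    for (dy, dx), wn, (oy, ox) in zip(grid, w.ravel(), offsets):
        y += wn * bilinear_sample(x, p0[0] + dy + oy, p0[1] + dx + ox)
    return y

x = np.arange(25, dtype=np.float64).reshape(5, 5)
w = np.full((3, 3), 1.0 / 9.0)
y_center = deformable_conv_at(x, w, (2, 2), [(0.0, 0.0)] * 9)  # all offsets zero
```

With all offsets zero the result equals an ordinary convolution; non-zero fractional offsets let the kernel deform to follow the object's geometry.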
In Step3, the channel attention module specifically includes:
\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c

wherein \tilde{x}_c is the final feature matrix obtained by weighting the feature matrix u_c with s_c through a dot-product (channel-wise scaling) operation, u_c is the feature map of channel c after feature extraction, and s_c is the channel weight obtained after the excitation operation.
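The squeeze-excitation-scale pipeline behind this channel attention can be sketched as follows (a plain NumPy sketch; the reduction ratio r and the FC weights w1, w2 are illustrative assumptions, not values from the patent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(u, w1, w2):
    """SE-style channel attention: squeeze, excitation, then channel-wise scaling.

    u:  (C, H, W) feature map u_c after feature extraction.
    w1: (C//r, C) weights of the reduction FC layer (r is the reduction ratio).
    w2: (C, C//r) weights of the expansion FC layer.
    """
    z = u.mean(axis=(1, 2))                    # squeeze: global average pooling -> (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # excitation: FC -> ReLU -> FC -> sigmoid
    return u * s[:, None, None]                # scale: x_tilde_c = s_c * u_c

rng = np.random.default_rng(0)
u = rng.standard_normal((4, 3, 3))             # C=4 channels, reduction ratio r=2
out = channel_attention(u, rng.standard_normal((2, 4)), rng.standard_normal((4, 2)))
```

Each channel is multiplied by a single learned scalar s_c, so informative channels are amplified and uninformative ones are suppressed.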
The Step4 is specifically as follows:
and sending the feature graph into a detector, performing convolution operation on the input feature graph for three times to extract features on different dimensions, converting the three feature graphs with different sizes into feature graphs with the same size through deconvolution operation, and realizing fusion of the feature graphs through splicing operation to obtain a final output feature graph.
The final output feature map is sent into a two-dimensional-image-based detection model, wherein the detection model includes at least the YOLO (You Only Look Once) algorithm model, the SSD (Single Shot MultiBox Detector) algorithm model, and the RPN (Region Proposal Network) algorithm model.
The invention has the following beneficial effects: through deformable convolution, the network adapts to the feature changes caused by the geometric deformation of target objects such as pedestrians and riders; through the attention mechanism, the important feature information of target objects in the point cloud space is extracted effectively. The resulting feature map accurately represents the feature information of pedestrians and riders, which reduces the network's miss rate for pedestrians and riders and improves its detection precision for vehicles, pedestrians, and riders.
Drawings
FIG. 1 is a flow chart of the steps of the present invention;
FIG. 2 is a diagram of a detection network architecture in an embodiment of the present invention;
FIG. 3 is a diagram showing the results of detection in the example of the present invention.
Detailed Description
The invention is further described with reference to the following drawings and detailed description.
As shown in fig. 1, a point cloud target detection method based on attention mechanism and deformable convolution specifically includes the following steps:
step 1: the method comprises the steps of firstly dividing an original point cloud data space into single cylinders, and then coding the single cylinders into a pseudo image in a characteristic coding mode, wherein the pseudo image characteristic information comprises information such as channel number, height and width, and the like, so that the pseudo image characteristic information can be suitable for feature extraction in a two-dimensional convolution mode.
This embodiment performs feature encoding on the original point cloud data, encoding it into pillars through the PFN (Pillar Feature Net) layer of the PointPillars model. The size of each pillar is [0.16, 0.16, 3], the extent of the whole point cloud space is [0, -19.84, -2.5, 47.36, 19.84, 0.5], and the size of the encoded pseudo-image is [2, 64, 296, 248], where the batch size is 2, the number of channels is 64, the height is 296, and the width is 248.
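The pseudo-image height and width follow directly from the point cloud range and the pillar footprint. Assuming the range is ordered [x_min, y_min, z_min, x_max, y_max, z_max] and that height corresponds to the x axis (both are assumptions made for this check), a quick computation reproduces the 296 × 248 grid:

```python
# point cloud range and pillar footprint from the embodiment
x_min, y_min, z_min, x_max, y_max, z_max = 0.0, -19.84, -2.5, 47.36, 19.84, 0.5
pillar_dx = pillar_dy = 0.16  # pillar size [0.16, 0.16, 3]: the 3 m z extent is not gridded

height = round((x_max - x_min) / pillar_dx)  # number of pillar cells along x
width = round((y_max - y_min) / pillar_dy)   # number of pillar cells along y
```

This matches the encoded pseudo-image size of 296 × 248 stated above.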
Step 2: feature extraction is carried out on the pseudo image through two-dimensional convolution, and in the process, a deformable convolution module is introduced to enhance the adaptability of the network to the geometric deformation of the target.
As shown in fig. 2, SE-DCN denotes the introduced module combining deformable convolution and the attention mechanism. The input of each deformable convolution layer is the number of channels of the previous layer's feature map; the channel inputs of the three modules in the figure are 64, 128, and 256, respectively, and the offset input of the deformable convolution module has 18 channels (two offset components for each of the 3 × 3 kernel positions).
Step 3: after a deformable convolution module is introduced into the network, a channel attention module is introduced, and the extraction capability of the network on the important feature information of the target is enhanced.
To enhance the network's extraction of the target object's important features through the channel attention mechanism, the number of input channels of the attention module at each layer is 64, 128, and 256, respectively.
Step 4: after feature extraction is finished, the feature map is sent to the detector to complete the detection task.
In this example, three convolution operations are performed on the input feature maps to extract features at different scales. Each feature extraction block comprises a convolution operation, a pooling operation, and a ReLU activation; the input tensor of each layer is [C, H, W], comprising the channel number, height, and width, and the input of each layer is the output of the previous layer's feature map. The output size of the first-layer feature map is [64, 296, 248], that of the second layer is [128, 148, 124], and that of the third layer is [256, 74, 62].
In this example, the three feature maps of different sizes are converted into feature maps of the same size through deconvolution; the output size of each feature map after deconvolution is [128, 148, 124]. The three same-size feature maps are fused through a concatenation operation, and the final output feature map has size [2, 384, 148, 124], where 2 is the batch size, 384 the final channel number, 148 the height, and 124 the width of the feature map.
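The shape bookkeeping of this fusion can be sketched in NumPy (nearest-neighbour upsampling and a random 1×1 projection stand in for the learned deconvolutions; only the tensor shapes, not the learned values, match the embodiment):

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling standing in for the learned deconvolution."""
    return x.repeat(factor, axis=-2).repeat(factor, axis=-1)

def project_channels(x, c_out, rng):
    """Random 1x1 convolution standing in for the channel change done by the deconvolution."""
    w = rng.standard_normal((c_out, x.shape[1]))
    return np.einsum('oc,bchw->bohw', w, x)

rng = np.random.default_rng(0)

# the three backbone outputs from the embodiment (batch size 2)
f1 = np.zeros((2, 64, 296, 248), dtype=np.float32)
f2 = np.zeros((2, 128, 148, 124), dtype=np.float32)
f3 = np.zeros((2, 256, 74, 62), dtype=np.float32)

# bring all three to the middle resolution [148, 124] and to 128 channels each
u1 = project_channels(f1[:, :, ::2, ::2], 128, rng)  # stride-2 subsample stands in for a stride-2 conv
u2 = project_channels(f2, 128, rng)
u3 = project_channels(upsample_nearest(f3, 2), 128, rng)

fused = np.concatenate([u1, u2, u3], axis=1)  # 3 x 128 = 384 channels
```

Concatenating the three 128-channel maps along the channel axis yields the [2, 384, 148, 124] output stated above.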
The final output feature map is sent into a two-dimensional-image-based detection model; in this example it is sent into an SSD algorithm model for classification and regression to obtain the final detection result.
This example performs permute and contiguous operations on the output feature map, converting its size to [2, 148, 124, 384] to meet the input requirements of the classification and regression models.
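In NumPy terms, the permute-and-contiguous step is a transpose from [B, C, H, W] to [B, H, W, C] followed by a contiguity copy:

```python
import numpy as np

x = np.zeros((2, 384, 148, 124), dtype=np.float32)     # detector output in [B, C, H, W]
x_nhwc = np.ascontiguousarray(x.transpose(0, 2, 3, 1)) # permute + contiguous -> [B, H, W, C]
```

The explicit contiguity copy matters because a bare transpose only changes strides; downstream layers that assume row-major memory need the copied layout.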
Fig. 3 shows the experimental results of this example. Compared with the PointPillars and VoxelNet models, the detection accuracy for the target objects is improved, and the missed detections observed with the PointPillars model are reduced.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.

Claims (5)

1. A point cloud target detection method based on an attention mechanism and deformable convolution is characterized in that:
step 1: first dividing the original point cloud data space into individual pillars, and encoding the pillars into a pseudo-image by feature encoding, the pseudo-image feature information comprising the channel number, height, and width, so that it is suitable for feature extraction by two-dimensional convolution;
step 2: extracting features on the pseudo image through two-dimensional convolution, and introducing a deformable convolution module in the process;
step 3: introducing a channel attention module after introducing a deformable convolution module into a network, and finally completing feature extraction to obtain a feature map;
step 4: and (5) sending the characteristic diagram into a detector to obtain a final output characteristic diagram, and finishing a detection task.
2. The point cloud target detection method based on an attention mechanism and deformable convolution of claim 1, wherein Step 1 is specifically: the original point cloud data take the form (x, y, z, r); only the position coordinate information is used; feature encoding is performed by the PFN layer of the PointPillars algorithm, converting the point cloud tensor [x, y, z] into a pseudo-image; and the converted point cloud feature tensor is [B, C, H, W], representing the batch size and the channel number, height, and width of the pseudo-image, respectively.
3. The method of point cloud target detection based on attention mechanism and deformable convolution of claim 1, wherein in Step2, the deformable convolution module is specifically:
y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)

wherein y(p_0) denotes the output feature map of the deformable convolution module, p_0 denotes the center point of an ordinary convolution kernel, p_n enumerates the sampling points of the ordinary convolution kernel over the regular grid R, w(p_n) denotes the kernel weight at p_n, x(·) denotes the input feature map, and Δp_n denotes the offset in the deformable convolution.
4. The method of claim 1, wherein in Step3, the channel attention module is specifically:
\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c

wherein \tilde{x}_c is the final feature matrix obtained by weighting the feature matrix u_c with s_c through a dot-product (channel-wise scaling) operation, u_c is the feature map of channel c after feature extraction, and s_c is the channel weight obtained after the excitation operation.
5. The method of point cloud target detection based on attention mechanism and deformable convolution of claim 1, wherein Step4 is specifically:
sending the feature map into a detector, performing three convolution operations on the input feature map to extract features at different scales, converting the three feature maps of different sizes into feature maps of the same size through deconvolution, and fusing the feature maps through a concatenation operation to obtain the final output feature map;
and sending the final output feature map into a two-dimensional-image-based detection model, wherein the detection model includes at least a YOLO algorithm model, an SSD algorithm model, and an RPN algorithm model.
CN202111297310.0A 2021-11-04 2021-11-04 Point cloud target detection method based on attention mechanism and deformable convolution Pending CN114004978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111297310.0A CN114004978A (en) 2021-11-04 2021-11-04 Point cloud target detection method based on attention mechanism and deformable convolution


Publications (1)

Publication Number Publication Date
CN114004978A true CN114004978A (en) 2022-02-01

Family

ID=79927072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111297310.0A Pending CN114004978A (en) 2021-11-04 2021-11-04 Point cloud target detection method based on attention mechanism and deformable convolution

Country Status (1)

Country Link
CN (1) CN114004978A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116343192A * 2023-02-10 2023-06-27 泉州装备制造研究所 Outdoor 3D target detection method and system
CN116091496A * 2023-04-07 2023-05-09 菲特(天津)检测技术有限公司 Defect detection method and device based on improved Faster-RCNN
CN116091496B * 2023-04-07 2023-11-24 菲特(天津)检测技术有限公司 Defect detection method and device based on improved Faster-RCNN

Similar Documents

Publication Publication Date Title
Alonso et al. 3d-mininet: Learning a 2d representation from point clouds for fast and efficient 3d lidar semantic segmentation
Wen et al. Fast and accurate 3D object detection for lidar-camera-based autonomous vehicles using one shared voxel-based backbone
CN109190752B (en) Image semantic segmentation method based on global features and local features of deep learning
CN111862101A (en) 3D point cloud semantic segmentation method under aerial view coding visual angle
CN114004978A (en) Point cloud target detection method based on attention mechanism and deformable convolution
CN112613378B (en) 3D target detection method, system, medium and terminal
CN112347987A (en) Multimode data fusion three-dimensional target detection method
EP4174792A1 (en) Method for scene understanding and semantic analysis of objects
CN112819080B (en) High-precision universal three-dimensional point cloud identification method
CN112907573B (en) Depth completion method based on 3D convolution
CN116563488A (en) Three-dimensional target detection method based on point cloud body column
CN116486368A (en) Multi-mode fusion three-dimensional target robust detection method based on automatic driving scene
Chidanand et al. Multi-scale voxel class balanced ASPP for LIDAR pointcloud semantic segmentation
CN114118247A (en) Anchor-frame-free 3D target detection method based on multi-sensor fusion
CN117173399A (en) Traffic target detection method and system of cross-modal cross-attention mechanism
CN116704463A (en) Automatic driving target detection method based on point cloud columnar rapid coding algorithm
CN114550160A (en) Automobile identification method based on three-dimensional point cloud data and traffic scene
Huo et al. Semantic segmentation and scene reconstruction for traffic simulation using CNN
Wang et al. CenterPoint-SE: A single-stage anchor-free 3-D object detection algorithm with spatial awareness enhancement
CN117078982B (en) Deep learning-based large-dip-angle stereoscopic image alignment dense feature matching method
Tang et al. MPT-Net: Mask Point Transformer Network for Large Scale Point Cloud Semantic Segmentation
CN113011360B (en) Road traffic sign line detection method and system based on attention capsule network model
WANG et al. Point Cloud Processing Methods for 3D Point Cloud Detection Tasks
Hao et al. Semantic Segmentation for Traffic Scene Understanding Based on Mobile Networks
Zhou et al. Surround-View Road Scene Layout Estimation via IPM-Transformer for Autonomous Driving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination