CN116071603A - Multi-class target detection method based on camera and laser radar - Google Patents

Multi-class target detection method based on camera and laser radar

Info

Publication number
CN116071603A
Authority
CN
China
Prior art keywords
point cloud
image
pseudo
point
pseudo image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310198312.7A
Other languages
Chinese (zh)
Inventor
张静
许达
李云松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhu Research Institute of Xidian University
Original Assignee
Wuhu Research Institute of Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhu Research Institute of Xidian University
Priority to CN202310198312.7A
Publication of CN116071603A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/10 Image acquisition
    • G06V 10/16 Image acquisition using multiple overlapping images; Image stitching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a multi-class target detection method based on a camera and a laser radar, which mainly solves the problem of low detection accuracy in existing target detection methods. The implementation scheme is as follows: acquire a road-surface image and point cloud data; perform voxelization preprocessing on the point cloud data; enhance the spatial information of the preprocessed point cloud; fuse color information into the spatially enhanced point cloud; convolve the fused point cloud to obtain a dual-enhanced pseudo image; obtain the feature map to be detected from the dual-enhanced pseudo image; and send the feature map to be detected into an SSD detector to generate the detection result for targets in front of the vehicle while it is driving. The invention enhances the spatial information of the point cloud by building an intra-modal mapping matrix and sampling, and fuses the color information of the RGB image with the point cloud through size adjustment and generation of a pseudo-view transformation matrix, thereby improving the accuracy of target detection. The method can be used for automatic driving of unmanned vehicles.

Description

Multi-class target detection method based on camera and laser radar
Technical Field
The invention belongs to the technical field of physics, and further relates to a multi-class target detection method which can be used for automatic driving of an unmanned automobile.
Background
The camera is the most commonly used sensor in driverless driving systems. It acquires images containing the color information of the external environment and plays an extremely important role in object detection, but it lacks depth information and is affected by natural conditions. Lidar, by contrast, depends little on lighting conditions, is hardly affected by severe weather, and offers high detection precision. The data it generates are point clouds; by analyzing and processing the point cloud data, more detailed target shape and position information can be obtained, the vehicle can understand the environment better, and a more stable and accurate perception result can be provided. Using the data generated by both cameras and lidar to detect objects in front of the vehicle while it is driving, such as vehicles, pedestrians and cyclists, has therefore become a trend for unmanned vehicles.
Currently, fusion algorithms for camera and lidar data fall into three categories: front-end fusion, deep fusion and back-end fusion. Front-end fusion operates at the data layer: data from different modalities are fused into a single feature vector before being fed into the subsequent pipeline; the data may be raw sensor output or pre-processed data. Deep fusion operates at the feature layer: the raw data of each modality are first converted into high-dimensional feature representations by feature extraction, and interactive fusion operations are then performed across different feature layers. Back-end fusion operates at the decision layer: the raw data of each modality are processed by their own networks, classification scores are output, and the scores are fused.
The patent application CN202211082826 of Zhongcheng Hualong Computer Technology Co. discloses an "Automatic driving decision method and SoC chip based on multi-sensor data fusion", which feeds point cloud data into a trained point cloud target detection neural network model to detect obstacle targets. Although a camera is used in that method, obstacle detection relies only on the point cloud data acquired by the lidar, so the sensor data are not fully exploited; the acquired image is used only for lane detection, and the importance of color information for obstacles is ignored, so obstacles cannot be detected efficiently.
The patent application CN202211314591 of Studian Brown (Beijing) Technology Co. discloses a "Scene-sensing-based V2X multi-sensor fusion method and device": a time-synchronization preprocessing operation is applied to the acquired sensor data, the two kinds of preprocessed data are compared to obtain the correlation coefficients between the sensors and the confidence of each sensor, from which the scene is judged; finally, the fusion weight of each sensor in a preset neural network is determined from the judged scene and the input sensor types to obtain the fusion result. Because the fusion of the different sensor data occurs at the decision level, this method ignores the relations among the different sensor data, so the features of the different modalities cannot complement each other, which ultimately reduces the accuracy of target detection.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art by providing a multi-class target detection method based on a camera and a laser radar, which remedies the lack of color information in single-lidar target detection algorithms and the inability of decision-level fusion to fully exploit the feature relations between different modalities, thereby improving target detection precision.
The technical idea of the invention is as follows: an image auxiliary processing branch is set up for the RGB image, and a main point cloud processing branch and an auxiliary point cloud processing branch are set up for the point cloud, so as to solve the lack of color information in single-lidar target detection; the spatial information of the auxiliary point cloud branch is fused into the main point cloud branch through single-mode self-fusion, so as to solve the problem that decision-level fusion cannot complement the features of different modalities.
According to this idea, the implementation of the invention includes the following steps:
(1) Acquire data from the sensors: the camera provides an RGB image F_1 and the lidar provides a point cloud R_1(x, y, z, r);
(2) Perform voxelization preprocessing on the point cloud R_1(x, y, z, r) to obtain an original pseudo image P;
(3) Enhance the spatial information of the pseudo image P with a single-mode self-fusion method to generate a spatially enhanced pseudo image Ps_i:
(3a) Register the point cloud feature R_i with the pseudo image feature P to generate an intra-modal mapping matrix M_RP:
M_RP = R_i / P_i
(3b) Generate a feature representation V_RP from the sampling position p' and the intra-modal mapping matrix M_RP:
V_RP = K(M_RP(N(p')))
where K denotes the bilinear interpolation function and M_RP(N(p')) denotes the features of the pixels adjacent to the sampling position p' in the intra-modal mapping matrix;
(3c) Extract features from the point cloud feature R_i with a set-abstraction (SetAbstraction) sampling operation to generate the point cloud feature to be fused R_i';
(3d) Fuse the feature representation V_RP with the point cloud feature to be fused R_i' point by point to generate the fused point cloud feature R_i'':
R_i'' = σ(W·tanh(U·V_RP + V·R_i'))
where W, U and V are three learnable weight matrices with different values, σ denotes the sigmoid activation function and tanh denotes the hyperbolic tangent function;
(3e) Pass the fused point cloud feature R_i'' through two fully connected layers FC to obtain the spatially enhanced pseudo image Ps_i:
Ps_i = P_i × FC(FC(R_i''))
(4) Fuse the color information of the RGB image F_i into the spatially enhanced pseudo image Ps_i to generate a color-enhanced pseudo image Pc_i:
(4a) Resize the RGB image F_i to the same size as the spatially enhanced pseudo image Ps_i;
(4b) Apply a BatchNorm operation and a ReLU operation to the resized RGB image F_i and the spatially enhanced pseudo image Ps_i respectively, and concatenate them in the channel dimension to obtain the map to be transformed PF;
(4c) Generate a spatial factor matrix M_s and a channel factor matrix M_c from the map to be transformed PF with a spatial attention formula and a channel attention formula respectively, and obtain the pseudo-view transformation matrix M_cs:
M_cs = 0.6*M_c + 0.4*M_s
(4d) Multiply the pseudo-view transformation matrix M_cs with the RGB image F_i to obtain a view-converted pseudo image Pv_i, and splice the pseudo image Pv_i with the spatially enhanced pseudo image Ps_i in the channel dimension to generate a dual-enhanced pseudo image Pc_i;
(5) Obtain the dual-enhanced pseudo image Pc_3 from the dual-enhanced pseudo image Pc_i:
(5a) Convolve the dual-enhanced pseudo image Pc_i to obtain a downsampled dual-enhanced pseudo image Pc_(i+1), and return to step (3);
(5b) Repeat step (5a) to finally obtain the dual-enhanced pseudo image Pc_3;
(6) Apply two transposed convolutions to the dual-enhanced pseudo image Pc_3 to obtain two transposed feature maps Pt_1 and Pt_2, and splice the dual-enhanced pseudo image Pc_3 with the two transposed feature maps Pt_1 and Pt_2 to obtain the feature map to be detected F_U;
(7) Send the feature map to be detected F_U into an SSD detector to generate the detection result for targets in front of the vehicle while it is driving.
Compared with the prior art, the invention has the following advantages:
Firstly, the invention uses a single-mode self-fusion method that strengthens the spatial information of the pseudo image by establishing an intra-modal mapping matrix and by set-abstraction sampling. This enhances the pseudo image's ability to express the spatial structure of the target, overcomes the loss of useful spatial information during voxelization in the prior art, and improves the accuracy of target detection.
Secondly, the invention uses a bimodal cross-fusion method that fuses the color information of the RGB image with the spatial information of the point cloud through size adjustment and generation of the pseudo-view transformation matrix, which further improves the accuracy of target detection.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic illustration of a single mode self-fusion in accordance with the present invention;
FIG. 3 is a schematic diagram of a bimodal cross-fusion in accordance with the present invention.
Detailed Description
Embodiments and effects of the present invention are described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the implementation steps for this example are as follows.
Step 1. Obtain road surface images and point cloud data.
In a driving scenario, detecting various targets requires the vehicle to acquire images and point cloud data of the road through its sensors: a camera is used to obtain the RGB image F_1, and a lidar is used to obtain the point cloud R_1(x, y, z, r). In this embodiment, the real scenario is simulated with the internationally published KITTI data set as the input data acquired by the sensors. The KITTI data set is one of the most authoritative computer vision benchmark data sets internationally; it contains real images and point cloud data collected in urban, rural and highway scenes.
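Since the embodiment takes the KITTI data set as the sensor input, a minimal loading sketch may help make step 1 concrete. KITTI lidar sweeps are stored as binary float32 (x, y, z, reflectance) records and the camera frames as PNG images; the file paths below are only illustrative and the helper function is a hypothetical name, not part of the method itself.

```python
import numpy as np
from PIL import Image

def load_kitti_frame(image_path: str, lidar_path: str):
    """Load one camera image and the matching lidar sweep from a KITTI-style layout."""
    # RGB image F_1 as an H x W x 3 uint8 array
    rgb = np.array(Image.open(image_path).convert("RGB"))
    # Point cloud R_1: each record is (x, y, z, reflectance) stored as float32
    points = np.fromfile(lidar_path, dtype=np.float32).reshape(-1, 4)
    return rgb, points

# Hypothetical paths; the KITTI object-detection split pairs image_2/ with velodyne/.
rgb, points = load_kitti_frame("training/image_2/000000.png",
                               "training/velodyne/000000.bin")
```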
Step 2. Perform voxelization preprocessing on the point cloud data.
The point cloud R_1 contains about 20,000 points, including a large number of noise points and redundant points. To reduce the computational load of the system, the point cloud R_1(x, y, z, r) first undergoes voxelization preprocessing to obtain an original pseudo image P that expresses the important information of the point cloud. This is implemented as follows:
2.1) In the z direction, from z = 0 m to z = 4 m, divide the point cloud space into four equal-height slices: S_1 (z ∈ [0,1) m), S_2 (z ∈ [1,2) m), S_3 (z ∈ [2,3) m) and S_4 (z ∈ [3,4] m);
2.2) In the x-y plane, divide the point cloud in each height slice into regular pillar-shaped sub-point-clouds with a base of 0.16 m × 0.16 m; in this embodiment, each of the height slices S_1, S_2, S_3 and S_4 is divided into 496 × 432 regular pillar-shaped sub-point-clouds;
2.3) Set a sampling threshold D; randomly sample the pillar sub-point-clouds whose number of points exceeds the threshold, and zero-pad those whose number of points is below it. In this embodiment D = 32: if one of the 496 × 432 pillars of a height slice contains more than 32 points, 32 points are randomly sampled from it; if it contains fewer than 32, the missing points are filled with 0;
2.4) Compute the arithmetic mean (x_c, y_c, z_c) of all points within each pillar sub-point-cloud and the offset (x_p, y_p) to obtain the enhanced point cloud (x, y, z, r, x_c, y_c, z_c, x_p, y_p);
2.5) Encode the enhanced point cloud (x, y, z, r, x_c, y_c, z_c, x_p, y_p) with a simplified PointNet network and max pooling to generate the sub-pseudo-image p_i corresponding to each height slice. In this embodiment, the sub-pseudo-images corresponding to the height slices S_1, S_2, S_3 and S_4 are p_1, p_2, p_3 and p_4, and each sub-pseudo-image has a size of 496 × 432.
2.6) Use four fully connected layers to obtain a height-dimension weight and a channel-dimension weight:
2.6.1) Compress the sub-pseudo-image p_i with the first fully connected layer W_1, then extract its height-dimension weight S_i with the second fully connected layer W_2:
S_i = W_2 δ(W_1 p_i)
where δ() denotes an activation function;
2.6.2) Compress the sub-pseudo-image p_i with the third fully connected layer W_3, then extract its channel-dimension weight T_i with the fourth fully connected layer W_4:
T_i = W_4 δ(W_3 p_i)
where δ() denotes an activation function;
2.7) Multiply each of the four sub-pseudo-images p_i by its height-dimension weight S_i and channel-dimension weight T_i, then splice them in the channel dimension and apply a max pooling operation to obtain the final original pseudo image P.
In this embodiment, the shape of the four sub-pseudo-images after splicing in the channel dimension is [4, 496, 432]; after max pooling over the channel dimension, the shape of the final original pseudo image P is [496, 432].
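To make steps 2.6 and 2.7 concrete, a minimal PyTorch sketch of the sub-pseudo-image weighting is given below. The hidden layer width and the squeeze-and-excitation-style global pooling are assumptions of this sketch, since the text only fixes the formulas S_i = W_2 δ(W_1 p_i) and T_i = W_4 δ(W_3 p_i) and the stated shapes [4, 496, 432] and [496, 432].

```python
import torch
import torch.nn as nn

class SubPseudoImageWeighting(nn.Module):
    """Weights the four height-slice sub-pseudo-images (step 2.6) and fuses them (step 2.7).

    An SE-style reading of 'compress, then extract the weight' is assumed here;
    the patent text does not fix the layer widths.
    """
    def __init__(self, num_slices: int = 4, hidden: int = 16):
        super().__init__()
        # W1/W2 produce the height-dimension weights S_i, W3/W4 the channel-dimension weights T_i.
        self.w1 = nn.Linear(num_slices, hidden)
        self.w2 = nn.Linear(hidden, num_slices)
        self.w3 = nn.Linear(num_slices, hidden)
        self.w4 = nn.Linear(hidden, num_slices)

    def forward(self, sub_images: torch.Tensor) -> torch.Tensor:
        # sub_images: [4, 496, 432], one sub-pseudo-image p_i per height slice.
        squeezed = sub_images.mean(dim=(1, 2))            # global pooling -> [4]
        s = self.w2(torch.relu(self.w1(squeezed)))        # S_i, height-dimension weights
        t = self.w4(torch.relu(self.w3(squeezed)))        # T_i, channel-dimension weights
        weighted = sub_images * (s * t).view(-1, 1, 1)    # p_i weighted by S_i and T_i
        # 'Splice in the channel dimension and max pool': collapse the 4 slices.
        pseudo_image, _ = weighted.max(dim=0)             # original pseudo image P: [496, 432]
        return pseudo_image

pillars = torch.rand(4, 496, 432)                # encoded sub-pseudo-images (illustrative values)
P = SubPseudoImageWeighting()(pillars)           # -> [496, 432]
```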
Step 3. Enhance the spatial information of the preprocessed point cloud.
Because part of the spatial information of the point cloud is lost during voxelization, which reduces the accuracy of target detection, the spatial information of the pseudo image P needs to be enhanced. In this embodiment, the spatial information of the pseudo image P is enhanced through registration, sampling and fusion, finally generating the spatially enhanced pseudo image Ps_i.
Referring to fig. 2, the specific implementation of this step is as follows:
3.1) Register the point cloud feature R_i with the pseudo image feature P to generate an intra-modal mapping matrix M_RP:
M_RP = R_i / P_i
3.2) Generate a feature representation V_RP from the intra-modal mapping matrix M_RP and the sampling position p':
V_RP = K(M_RP(N(p')))
where K denotes the bilinear interpolation function and M_RP(N(p')) denotes the features of the pixels adjacent to the sampling position p' in the intra-modal mapping matrix;
3.3) Extract features from the point cloud feature R_i with a set-abstraction sampling operation to generate the point cloud feature to be fused R_i'. This is implemented as follows:
3.3.1) Obtain key points from the point cloud R_1 with the farthest point sampling method;
3.3.2) Using the nearest-k-neighborhood method, group the points around each key point, with the key point as the center, to form a local point region;
3.3.3) Encode the local point regions with a simplified PointNet network to obtain the features of the point cloud R_1.
In this embodiment, when i = 1, R_1 is downsampled from 20,000 points to 4,096 points; when i = 2, R_2 is downsampled from 4,096 points to 1,024 points; and when i = 3, R_3 is downsampled from 1,024 points to 256 points;
3.4) Fuse the feature representation V_RP with the point cloud feature to be fused R_i' point by point to generate the fused point cloud feature R_i'':
R_i'' = σ(W·tanh(U·V_RP + V·R_i'))
where W, U and V are three learnable weight matrices with different values, σ denotes the sigmoid activation function and tanh denotes the hyperbolic tangent function;
3.5) Compress the batch-dimension and channel-dimension features of the fused point cloud feature R_i'' in turn with two fully connected layers FC() to obtain the spatially enhanced pseudo image Ps_i:
Ps_i = P_i × FC(FC(R_i''))
In this embodiment, the size of the spatially enhanced pseudo image Ps_i is the same as that of the original pseudo image feature P, namely [496, 432].
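A minimal PyTorch sketch of the gated fusion in steps 3.4 and 3.5 follows. The channel width, the per-point feature layout, and the way FC(FC(R_i'')) is reduced to a per-channel scale are assumptions of this sketch; only the gating formula and the final multiplication with P_i come from the text.

```python
import torch
import torch.nn as nn

class SingleModeSelfFusion(nn.Module):
    """Fuses sampled point features back into the pseudo image (steps 3.4-3.5)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # U, V, W: the three learnable weight matrices of the gating formula.
        self.u = nn.Linear(channels, channels, bias=False)
        self.v = nn.Linear(channels, channels, bias=False)
        self.w = nn.Linear(channels, channels, bias=False)
        # Two fully connected layers FC(FC(.)) applied to the fused point features.
        self.fc1 = nn.Linear(channels, channels)
        self.fc2 = nn.Linear(channels, channels)

    def forward(self, v_rp, r_i, pseudo):
        # v_rp, r_i: [N, C] point-wise features; pseudo: [C, H, W] pseudo image P_i.
        r_fused = torch.sigmoid(self.w(torch.tanh(self.u(v_rp) + self.v(r_i))))  # R_i''
        # The two FC layers squeeze R_i'' to a per-channel scale; mean pooling over
        # the points is assumed here as the 'batch dimension' compression.
        scale = self.fc2(self.fc1(r_fused)).mean(dim=0)
        return pseudo * scale.view(-1, 1, 1)             # Ps_i = P_i x FC(FC(R_i''))

fusion = SingleModeSelfFusion(channels=64)
ps = fusion(torch.rand(4096, 64), torch.rand(4096, 64), torch.rand(64, 496, 432))
```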
Step 4. Fuse color information into the point cloud with enhanced spatial information.
One of the biggest drawbacks of a point cloud alone is the lack of color information, which makes it difficult to improve the accuracy of point cloud target detection. In this step, the RGB image F_i is transformed into a pseudo image so that its color information can be fused into the spatially enhanced pseudo image Ps_i, generating the color-enhanced pseudo image Pc_i.
Referring to fig. 3, the specific implementation of this step is as follows:
4.1) Resize the RGB image F_i to the same size as the spatially enhanced pseudo image Ps_i; in this embodiment, the original size of the RGB image F_i, [1300, 400], is adjusted to [496, 432];
4.2) Apply a BatchNorm operation and a ReLU operation in turn to the resized RGB image F_i and the spatially enhanced pseudo image Ps_i, and splice the two images in the channel dimension to obtain the map to be transformed PF. In this embodiment, the shape of the resized RGB image F_i is [3, 496, 432] and the shape of the spatially enhanced pseudo image Ps_i is [64, 496, 432], so the shape of the spliced map to be transformed PF is [67, 496, 432];
4.3) Generate a spatial factor matrix M_s and a channel factor matrix M_c from the map to be transformed PF with a spatial attention formula and a channel attention formula respectively:
M_s = σ(Conv(AvgPool(PF), MaxPool(PF)))
M_c = σ(W_5 ReLU(W_6(PF)))
where σ() denotes an activation function, Conv() a convolution operation, AvgPool() an average pooling operation and MaxPool() a max pooling operation, W_5 and W_6 denote two different fully connected layers, and ReLU() denotes the ReLU operation;
4.4) Compute the pseudo-view transformation matrix M_cs from the spatial factor matrix M_s and the channel factor matrix M_c:
M_cs = 0.6*M_c + 0.4*M_s
where * denotes multiplication;
4.5) Multiply the pseudo-view transformation matrix M_cs with the RGB image F_i to obtain the view-converted pseudo image Pv_i, splice the pseudo image Pv_i with the spatially enhanced pseudo image Ps_i in the channel dimension, and apply a convolution operation to generate the dual-enhanced pseudo image Pc_i. In this embodiment, the shape of the dual-enhanced pseudo image Pc_i is [64, 496, 432].
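The bimodal cross fusion of steps 4.3 to 4.5 can be sketched as follows. The kernel size of the spatial-attention convolution, the reduction ratio of the channel branch, and the collapse of M_c to a single scalar (so that M_cs can multiply the 3-channel image) are assumptions of this sketch; the formulas for M_s, M_c and M_cs = 0.6*M_c + 0.4*M_s follow the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BimodalCrossFusion(nn.Module):
    """Builds the pseudo-view transform M_cs and the dual-enhanced pseudo image Pc_i."""
    def __init__(self, pf_channels: int = 67, rgb_channels: int = 3,
                 ps_channels: int = 64, reduction: int = 8):
        super().__init__()
        # Spatial attention: convolution over the concatenated avg-/max-pooled maps.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        # Channel attention: W6 then W5 over the globally pooled PF (reduction assumed).
        self.w6 = nn.Linear(pf_channels, pf_channels // reduction)
        self.w5 = nn.Linear(pf_channels // reduction, 1)
        # Convolution fusing the view-converted image with Ps_i after splicing.
        self.fuse = nn.Conv2d(rgb_channels + ps_channels, ps_channels, kernel_size=3, padding=1)

    def forward(self, rgb, ps):
        # rgb: [3, H, W] resized RGB image F_i; ps: [64, H, W] spatially enhanced Ps_i.
        pf = torch.cat([rgb, ps], dim=0).unsqueeze(0)                       # map to be transformed PF
        pooled = torch.cat([pf.mean(dim=1, keepdim=True),
                            pf.amax(dim=1, keepdim=True)], dim=1)
        m_s = torch.sigmoid(self.spatial_conv(pooled))                      # spatial factor matrix M_s
        m_c = torch.sigmoid(self.w5(F.relu(self.w6(pf.mean(dim=(2, 3))))))  # channel factor matrix M_c
        m_cs = 0.6 * m_c.view(1, 1, 1, 1) + 0.4 * m_s                       # pseudo-view transform M_cs
        pv = m_cs * rgb.unsqueeze(0)                                        # view-converted pseudo image Pv_i
        pc = self.fuse(torch.cat([pv, ps.unsqueeze(0)], dim=1))             # dual-enhanced Pc_i
        return pc.squeeze(0)

pc = BimodalCrossFusion()(torch.rand(3, 496, 432), torch.rand(64, 496, 432))
```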
Step 5. Obtain the dual-enhanced pseudo image Pc_3 by convolution.
5.1) Convolve the dual-enhanced pseudo image Pc_i to obtain a downsampled dual-enhanced pseudo image Pc_(i+1), and return to step 3;
5.2) Repeat step 5.1) to finally obtain the dual-enhanced pseudo image Pc_3.
In this embodiment, the numbers of convolution kernels of the successive convolution operations are [64, 128, 256], and the strides are [1, 2, 4].
Step 6. Obtain the feature map to be detected F_U from the dual-enhanced pseudo image Pc_3.
6.1) Apply two transposed convolutions to the dual-enhanced pseudo image Pc_3 to obtain two transposed feature maps Pt_1 and Pt_2;
6.2) Splice the dual-enhanced pseudo image Pc_3 with the two transposed feature maps Pt_1 and Pt_2 in turn to obtain the feature map to be detected F_U. In this embodiment, the number of kernels of each of the two transposed convolutions is 128.
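Steps 5 and 6 together form a small multi-scale backbone; a sketch is given below. The channel widths, strides and the two 128-kernel transposed convolutions follow the embodiment, while the kernel sizes, paddings and the stride-1 transposed convolutions (chosen so that Pc_3, Pt_1 and Pt_2 share a resolution and can be spliced directly) are assumptions, and the per-scale return to step 3 is omitted for brevity.

```python
import torch
import torch.nn as nn

class MultiScaleNeck(nn.Module):
    """Downsamples the dual-enhanced pseudo image (step 5) and builds the feature
    map to be detected F_U from Pc_3 and its two transposed maps (step 6)."""
    def __init__(self):
        super().__init__()
        # Three convolution stages with 64/128/256 kernels and strides 1/2/4.
        self.stage1 = nn.Conv2d(64, 64, 3, stride=1, padding=1)
        self.stage2 = nn.Conv2d(64, 128, 3, stride=2, padding=1)
        self.stage3 = nn.Conv2d(128, 256, 3, stride=4, padding=1)
        # Two transposed convolutions with 128 kernels each; stride 1 is assumed
        # so that Pc_3, Pt_1 and Pt_2 keep the same spatial size.
        self.up1 = nn.ConvTranspose2d(256, 128, 3, stride=1, padding=1)
        self.up2 = nn.ConvTranspose2d(256, 128, 3, stride=1, padding=1)

    def forward(self, pc1: torch.Tensor) -> torch.Tensor:
        pc1 = pc1.unsqueeze(0)                       # [1, 64, H, W], dual-enhanced Pc_1
        pc2 = self.stage2(self.stage1(pc1))          # Pc_2, downsampled once
        pc3 = self.stage3(pc2)                       # Pc_3, downsampled again
        pt1, pt2 = self.up1(pc3), self.up2(pc3)      # transposed feature maps Pt_1, Pt_2
        return torch.cat([pc3, pt1, pt2], dim=1)     # feature map to be detected F_U

f_u = MultiScaleNeck()(torch.rand(64, 496, 432))     # -> [1, 512, 62, 54]
```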
Step 7. Send the feature map to be detected F_U into the SSD detector to generate the detection result for targets in front of the vehicle while it is driving.
The SSD detector is one of the best-performing existing object detectors and can directly generate an object detection result from the input feature map. This is implemented as follows:
7.1) The SSD detector convolves the input feature map to be detected F_U to generate classification prediction scores and regression prediction scores;
7.2) Default boxes are generated from the classification prediction scores and regression prediction scores;
7.3) The classification loss and regression loss are obtained from the default boxes;
7.4) The default boxes are decoded and non-maximum suppression is applied according to the classification loss and regression loss to obtain the final detection result.
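A minimal sketch of an SSD-style head over F_U and of the decode-and-suppress post-processing of steps 7.1 to 7.4 is shown below. The anchor count, class count, score threshold and IoU threshold are illustrative assumptions, and torchvision's nms is used for the non-maximum suppression step.

```python
import torch
import torch.nn as nn
from torchvision.ops import nms

class SSDHead(nn.Module):
    """Convolves F_U into classification and regression predictions (step 7.1)."""
    def __init__(self, in_channels: int = 512, num_anchors: int = 6, num_classes: int = 3):
        super().__init__()
        self.cls_head = nn.Conv2d(in_channels, num_anchors * num_classes, 3, padding=1)
        self.reg_head = nn.Conv2d(in_channels, num_anchors * 4, 3, padding=1)

    def forward(self, f_u: torch.Tensor):
        return self.cls_head(f_u), self.reg_head(f_u)

def decode_and_suppress(scores, boxes, score_thr=0.3, iou_thr=0.5):
    """Steps 7.2-7.4 in miniature: keep confident boxes, then apply NMS."""
    keep = scores > score_thr
    boxes, scores = boxes[keep], scores[keep]
    return boxes[nms(boxes, scores, iou_thr)]

head = SSDHead()
cls_pred, reg_pred = head(torch.rand(1, 512, 62, 54))
# Toy decoded boxes in (x1, y1, x2, y2) form with per-box scores, for illustration only.
boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.], [50., 50., 60., 60.]])
scores = torch.tensor([0.9, 0.8, 0.7])
final = decode_and_suppress(scores, boxes)
```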
The effect of the present invention can be further illustrated by the following simulation experiment.
First, simulation experiment data set
The experiments use the KITTI data set, one of the most authoritative computer vision benchmark data sets internationally, which contains real images and point cloud data collected in urban, rural and highway scenes.
Second, simulation content
The proposed method, five existing lidar-only methods (MV3D, VoxelNet, SECOND, PointPillars, PointRCNN) and five camera-lidar methods (H²3D R-CNN, Point-GNN, IPOD, F-ConvNet, PointPainting) are each used to perform target detection on the KITTI data set, and their detection accuracies are compared. The results are shown in Table 1.
TABLE 1 comparison of the inventive method with other prior art methods
The MV3D method is from "Multi-View 3D Object Detection Network for Autonomous Driving".
The VoxelNet method is from "VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection".
The SECOND method is from "SECOND: Sparsely Embedded Convolutional Detection" (Sensors).
The PointPillars method is from "PointPillars: Fast Encoders for Object Detection from Point Clouds".
The PointRCNN method is from "PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud".
The H²3D R-CNN method is from "From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection".
The Point-GNN method is from "Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud".
The IPOD method is from "IPOD: Intensive Point-based Object Detector for Point Cloud".
The F-ConvNet method is from "Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection".
The PointPainting method is from "PointPainting: Sequential Fusion for 3D Object Detection".
As can be seen from Table 1, the proposed method achieves excellent accuracy, especially for the bicycle and cyclist target categories. For example, compared with the classical target detection algorithm PointPillars, the accuracy improves by 2.66% on average; for small-object detection it improves by 2.97% on average, with the largest improvement (5.27%) for the easy cyclist category. Compared with the classical target detection method PointRCNN, the accuracy improves by 11.50% on average, and the detection accuracy for small objects improves by 11.3% on average; in particular, the detection accuracy for the easy cyclist category improves by 13.7%.

Claims (5)

1. A multi-category target detection method based on a camera and a laser radar is characterized by comprising the following steps:
(1) Acquire data from the sensors: the camera provides an RGB image F_1 and the lidar provides a point cloud R_1(x, y, z, r);
(2) Perform voxelization preprocessing on the point cloud R_1(x, y, z, r) to obtain an original pseudo image P;
(3) Enhance the spatial information of the original pseudo image P to generate a spatially enhanced pseudo image Ps_i:
(3a) Register the point cloud feature R_i with the pseudo image feature P to generate an intra-modal mapping matrix M_RP:
M_RP = R_i / P_i
(3b) Generate a feature representation V_RP from the sampling position p' and the intra-modal mapping matrix M_RP:
V_RP = K(M_RP(N(p')))
where K denotes the bilinear interpolation function and M_RP(N(p')) denotes the features of the pixels adjacent to the sampling position p' in the intra-modal mapping matrix;
(3c) Extract features from the point cloud feature R_i with a set-abstraction (SetAbstraction) sampling operation to generate the point cloud feature to be fused R_i';
(3d) Fuse the feature representation V_RP with the point cloud feature to be fused R_i' point by point to generate the fused point cloud feature R_i'':
R_i'' = σ(W·tanh(U·V_RP + V·R_i'))
where W, U and V are three learnable weight matrices with different values, σ denotes the sigmoid activation function and tanh denotes the hyperbolic tangent function;
(3e) Pass the fused point cloud feature R_i'' through two fully connected layers FC to obtain the spatially enhanced pseudo image Ps_i:
Ps_i = P_i × FC(FC(R_i''))
(4) Fuse the color information of the RGB image F_i into the spatially enhanced pseudo image Ps_i to generate a color-enhanced pseudo image Pc_i:
(4a) Resize the RGB image F_i to the same size as the spatially enhanced pseudo image Ps_i;
(4b) Apply a BatchNorm operation and a ReLU operation to the resized RGB image F_i and the spatially enhanced pseudo image Ps_i respectively, and concatenate them in the channel dimension to obtain the map to be transformed PF;
(4c) Generate a spatial factor matrix M_s and a channel factor matrix M_c from the map to be transformed PF with a spatial attention formula and a channel attention formula respectively;
(4d) Compute the pseudo-view transformation matrix M_cs from the spatial factor matrix M_s and the channel factor matrix M_c:
M_cs = 0.6*M_c + 0.4*M_s
(4e) Multiply the pseudo-view transformation matrix M_cs with the RGB image F_i to obtain a view-converted pseudo image Pv_i, and splice the pseudo image Pv_i with the spatially enhanced pseudo image Ps_i in the channel dimension to generate a dual-enhanced pseudo image Pc_i;
(5) Obtain the dual-enhanced pseudo image Pc_3 from the dual-enhanced pseudo image Pc_i:
(5a) Convolve the dual-enhanced pseudo image Pc_i to obtain a downsampled dual-enhanced pseudo image Pc_(i+1), and return to step (3);
(5b) Repeat step (5a) to finally obtain the dual-enhanced pseudo image Pc_3;
(6) Apply two transposed convolutions to the dual-enhanced pseudo image Pc_3 to obtain two transposed feature maps Pt_1 and Pt_2, and splice the dual-enhanced pseudo image Pc_3 with the two transposed feature maps Pt_1 and Pt_2 in turn to obtain the feature map to be detected F_U;
(7) Send the feature map to be detected F_U into an SSD detector to generate the detection result for targets in front of the vehicle while it is driving.
2. The method of claim 1, wherein the voxelization preprocessing of the point cloud R_1(x, y, z, r) in step (2) is implemented as follows:
(2a) In the z direction, divide the point cloud space into four equal-height slices;
(2b) In the x-y plane, divide the point cloud in each height slice into regular pillar-shaped sub-point-clouds with a base of 0.16 m × 0.16 m;
(2c) Set a sampling threshold D; randomly sample the pillar sub-point-clouds whose number of points exceeds the threshold, and zero-pad those whose number of points is below it;
(2d) Compute the arithmetic mean (x_c, y_c, z_c) of all points within each pillar sub-point-cloud and the offset (x_p, y_p);
(2e) Encode the computed point cloud (x, y, z, r, x_c, y_c, z_c, x_p, y_p) with a simplified PointNet network and max pooling to generate the sub-pseudo-image p_i corresponding to each height slice, and obtain the height-dimension weight S_i and channel-dimension weight T_i corresponding to each sub-pseudo-image p_i:
S_i = W_2 δ(W_1 p_i)
T_i = W_4 δ(W_3 p_i)
where W_1, W_2, W_3 and W_4 denote four different fully connected layers and δ() denotes an activation function;
(2f) Multiply each of the four sub-pseudo-images p_i by its height-dimension weight S_i and channel-dimension weight T_i, then splice them in the channel dimension and apply a max pooling operation to obtain the final original pseudo image P.
3. The method of claim 1, wherein the feature extraction from the point cloud feature R_i with the set-abstraction sampling operation in step (3c) is implemented as follows:
(3c1) Obtain key points from the point cloud R_1 with the farthest point sampling method;
(3c2) Using the nearest-k-neighborhood method, group the points around each key point, with the key point as the center, to form a local point region;
(3c3) Encode the local point regions with a simplified PointNet network to obtain the features of the point cloud R_1.
4. The method of claim 1, wherein in step (4c) the spatial factor matrix M_s and the channel factor matrix M_c are generated from the map to be transformed PF with a spatial attention formula and a channel attention formula respectively, as follows:
M_s = σ(Conv(AvgPool(PF), MaxPool(PF)))
M_c = σ(W_5 ReLU(W_6(PF)))
where σ() denotes an activation function, Conv() a convolution operation, AvgPool() an average pooling operation and MaxPool() a max pooling operation, W_5 and W_6 denote two different fully connected layers, and ReLU() denotes the ReLU operation.
5. The method according to claim 1, wherein in step (7) the feature map to be detected F_U is sent into an SSD detector to generate the detection result for targets in front of the vehicle while it is driving, implemented as follows:
(7a) The SSD detector convolves the feature map to be detected F_U to generate classification prediction scores and regression prediction scores;
(7b) Default boxes are generated from the classification prediction scores and regression prediction scores;
(7c) The classification loss and regression loss are obtained from the default boxes;
(7d) The default boxes are decoded and non-maximum suppression is applied according to the classification loss and regression loss to obtain the final prediction result.
CN202310198312.7A 2023-03-03 2023-03-03 Multi-class target detection method based on camera and laser radar Pending CN116071603A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310198312.7A CN116071603A (en) 2023-03-03 2023-03-03 Multi-class target detection method based on camera and laser radar

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310198312.7A CN116071603A (en) 2023-03-03 2023-03-03 Multi-class target detection method based on camera and laser radar

Publications (1)

Publication Number Publication Date
CN116071603A true CN116071603A (en) 2023-05-05

Family

ID=86180243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310198312.7A Pending CN116071603A (en) 2023-03-03 2023-03-03 Multi-class target detection method based on camera and laser radar

Country Status (1)

Country Link
CN (1) CN116071603A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination