CN111291714A - Vehicle detection method based on monocular vision and laser radar fusion - Google Patents
Vehicle detection method based on monocular vision and laser radar fusion
- Publication number
- CN111291714A (application CN202010124991.XA)
- Authority
- CN
- China
- Prior art keywords
- point cloud
- image
- fusion
- bounding box
- candidate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V20/584—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/08—Detecting or categorising vehicles
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention relates to a vehicle detection method based on monocular vision and laser radar fusion, which comprises the following steps: S1: acquiring an image feature map; S2: acquiring a point cloud feature map; S3: extracting a point cloud feature vector f_lidar and an image feature vector f_RGB from the point cloud feature map and the image feature map respectively; S4: performing feature fusion on the point cloud feature vector f_lidar and the image feature vector f_RGB to obtain a fused feature f_L; S5: obtaining the 3D bounding box of the vehicle and the corresponding category parameters according to the fused feature f_L; S6: removing the overlapped 3D bounding boxes to obtain the final 3D bounding boxes and corresponding parameters, completing vehicle detection. Compared with the prior art, the method helps to overcome both the difficulty of effectively estimating the vehicle position with monocular vision alone and the missed detections of the laser radar caused by sparse long-range point clouds, thereby further improving the three-dimensional vehicle detection performance.
Description
Technical Field
The invention relates to the field of automatic driving environment perception, in particular to a vehicle detection method based on monocular vision and laser radar fusion.
Background
Vehicle detection is an indispensable component of an automatic driving environment perception system, and object detection is also a fundamental problem in computer vision. Despite the tremendous advances made by researchers in this area in recent years, developing a high-accuracy, high-efficiency and robust object detection system usable for autonomous driving remains a significant challenge. Vehicle detection is realized through three-dimensional object detection, which outputs a three-dimensional bounding box containing the position and pose of the target vehicle in the three-dimensional environment; only with this information can the autonomous driving decision system make further driving decisions, so three-dimensional detection is of particular importance for autonomous driving.
Because each kind of sensor has its own advantages and disadvantages, information fusion across multi-modal sensors becomes a necessary choice. Such fusion can compensate for the limited environmental information obtained by a single sensor; by combining the strengths of different sensors, it provides a perception system with stronger fault tolerance and higher safety, and can greatly improve the reliability, accuracy and adaptability of the vehicle environment perception system, especially under complex road traffic conditions.
At present, for the three-dimensional vehicle detection task, the camera has the advantage of capturing more detail, and the image contains richer semantic information; the point cloud obtained by a lidar sensor is sparse compared with the image, but carries high-precision three-dimensional position information. Fusing the visible-light image with the lidar point cloud can, in theory, yield a more accurate three-dimensional perception result.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, and provides a vehicle detection method based on monocular vision and laser radar fusion, which is helpful for solving the problems that the monocular vision is difficult to effectively estimate the vehicle position and the laser radar is likely to miss detection due to long-distance point cloud sparsity, and further improving the three-dimensional target detection effect of the vehicle.
The purpose of the invention can be realized by the following technical scheme:
A vehicle detection method based on monocular vision and laser radar fusion comprises the following steps:
S1: acquiring an image feature map;
S2: acquiring a point cloud feature map;
S3: extracting a point cloud feature vector f_lidar and an image feature vector f_RGB from the point cloud feature map and the image feature map respectively;
S4: performing feature fusion on the point cloud feature vector f_lidar and the image feature vector f_RGB to obtain a fused feature f_L;
S5: obtaining the 3D bounding box of the vehicle and the corresponding parameters according to the fused feature f_L;
S6: removing the overlapped 3D bounding boxes to obtain the final 3D bounding boxes and corresponding parameters, completing vehicle detection.
The step S3 specifically includes:
S301: extracting 3D candidate regions from the point cloud feature map;
S302: projecting the 3D candidate regions into the image feature map and the point cloud feature map respectively to obtain regions of interest (RoIs);
S303: using the regions of interest to crop image region features and point cloud region features from the image feature map and the point cloud feature map respectively;
S304: scaling the image region features and the point cloud region features to the same set size to obtain the point cloud feature vector f_lidar and the image feature vector f_RGB of equal length.
The projecting of the 3D candidate region into the image feature map specifically comprises: projecting the 3D candidate region into the image feature map using the projection formula from point cloud coordinates to image coordinates, in which a point (x, y, z) in the lidar coordinate system is projected onto the image plane to obtain the image coordinates (u, v); the projection formula from point cloud coordinates to image coordinates is:

z_c · [u, v, 1]^T = P_rect^(2) · R_rect^(0) · T_velo→cam · [x, y, z, 1]^T

wherein z_c is the depth of the point in the camera coordinate system, R_rect^(0) is the rectifying rotation matrix from the image plane of reference camera 0 (the left grayscale camera) to camera 2 (the left color camera), of size 4 × 4, P_rect^(2) is the rectified projection matrix of camera 2, and T_velo→cam is the rotation-translation matrix from the lidar coordinate system to the camera coordinate system.
The projecting of the 3D candidate region into the point cloud feature map specifically comprises: first projecting the 3D candidate region onto the bird's-eye view, and then obtaining the corresponding coordinates on the point cloud feature map proportionally from the bird's-eye-view coordinates.
The feature fusion adopts pre-fusion, specifically: the point cloud feature vector f_lidar and the image feature vector f_RGB are fused at the input stage.
The formula of the feature fusion is:

f_L = H_L(H_{L-1}(… H_1(f_lidar ⊕ f_RGB)))

wherein f_L is the fused output, {H_l, l = 1, …, L} are the feature transformation functions, and ⊕ is the fusion operation; the fusion operation includes concatenation, summation or element-wise averaging.
The step S5 specifically includes:
S501: inputting the fused feature f_L into a detection network;
S502: obtaining the 3D bounding box of the vehicle, and regressing the category, the coordinates and size, and the direction vector of the 3D bounding box with separate branches.
The point cloud feature map is obtained through a VoxelNet network, the 3D candidate regions are extracted through the region candidate network of the VoxelNet network, and the detection network consists of three fully-connected networks.
When training the model, end-to-end training is performed by minimizing the loss function L_FINAL, whose expression is (formula 1):

L_FINAL = L_VoxelNet_RPN + L_DET

In formula 1, L_VoxelNet_RPN is the loss function of the VoxelNet region candidate network and L_DET is the multi-task loss function of the detection network. The loss function of the region candidate network is (formula 2):

L_VoxelNet_RPN = α (1/N_pos) Σ_i L_cls(p_i^pos, 1) + β (1/N_neg) Σ_j L_cls(p_j^neg, 0) + (1/N_pos) Σ_i L_reg(u_i, u_i*)

In formula 2, p_i^pos and p_j^neg are the confidence-map outputs corresponding to the positive and negative sample anchor boxes respectively, and N_pos and N_neg are the numbers of positive and negative sample anchor boxes respectively. For a vehicle target, an anchor box is considered a positive sample when its intersection-over-union with any ground-truth bounding box is greater than 0.6, or when it has the largest intersection-over-union with some ground-truth bounding box among all anchor boxes; an anchor box is considered a negative sample when its intersection-over-union with all ground-truth bounding boxes is less than 0.45; anchor boxes whose intersection-over-union with all ground-truth bounding boxes lies between 0.45 and 0.6 are ignored. The classification loss function L_cls is the cross-entropy loss. u_i = (u_ix, u_iy, u_iz, u_il, u_iw, u_ih, u_iθ) is the characterization vector of the normalized difference between the predicted bounding box and the corresponding positive sample anchor box, while u_i* is the characterization vector of the difference between the corresponding ground-truth bounding box and the positive sample anchor box. The regression loss function L_reg is the Smooth L1 function, and the classification loss terms of the positive and negative samples are balanced by the hyper-parameters α and β, set to α = 1.5 and β = 1. The multi-task loss function of the detection network is (formula 3):

L_DET = (1/N_cls) Σ_k L_cls(q_k, q_k*) + λ (1/N_pos) Σ_{k ∈ pos} [L_reg(u_k, u_k*) + L_ang(v_k, v_k*)]

In formula 3, k is the index within the mini-batch of a candidate region input to the detection network, and q_k is the predicted probability that bounding box k, output by the bounding-box classification-regression branch, is a vehicle. q_k* is the ground-truth label: when the intersection-over-union of candidate region k with any ground-truth bounding box is greater than 0.65, the candidate region is considered a positive sample, otherwise it is considered a negative sample; the classification loss function L_cls is again the cross-entropy loss. u_k and u_k* are defined as in formula 2 and are respectively the characterization vector of the difference between the predicted bounding box and the corresponding positive-sample candidate bounding box and the characterization vector of the difference between the corresponding ground-truth bounding box and the positive-sample candidate bounding box. L_reg and L_ang both use the Smooth L1 function, N_pos is the number of positive-sample anchor boxes, v_k is the direction-vector difference between the predicted bounding box and the corresponding positive-sample candidate bounding box, v_k* is the direction-vector difference between the corresponding ground-truth bounding box and the positive-sample candidate bounding box, λ is a hyper-parameter balancing the classification and regression loss functions, and N_cls, the sum of the numbers of positive and negative sample anchor boxes in the mini-batch, normalizes the classification loss function.
Removing the overlapped 3D bounding boxes specifically comprises: with 0.01 as the intersection-over-union threshold, removing the overlapping 3D bounding boxes using 2D non-maximum suppression in the top view.
When the 3D candidate regions are extracted, the point cloud feature map is reduced by convolution to the same number of channels as the image feature map; during model training, 1024 candidate regions are retained from the 3D candidate regions through non-maximum suppression, while during detection only the first 300 candidate regions are retained through non-maximum suppression.
Compared with the prior art, the invention has the following advantages:
1) by fusing the image obtained by monocular vision with the point cloud obtained by the laser radar, the invention alleviates both the difficulty of effectively estimating the vehicle position with monocular vision alone and the missed detections of the laser radar caused by sparse long-range point clouds, thereby further improving the three-dimensional vehicle detection performance;
2) the invention provides a technical route of extracting image features with a feature pyramid, extracting point cloud features with a VoxelNet network, extracting region-of-interest features with a region candidate network, fusing features with a pre-fusion strategy and refining the detection results with non-maximum suppression, providing a new approach to three-dimensional vehicle detection;
3) the invention adopts a two-stage detection structure similar to Faster R-CNN, and the screening of easy samples by the region candidate network better suppresses the sample-imbalance problem;
4) by retaining different numbers of 3D candidate regions during training and detection and reducing the dimensionality of the point cloud feature map, the accuracy and reliability of the detection model are ensured while the runtime, memory footprint and computation of detection are reduced, improving vehicle detection efficiency;
5) the invention removes overlapping redundant 3D bounding boxes by non-maximum suppression and eliminates overlapping detection results, making the detection results more accurate and more practical.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is an overall flow chart of the present invention;
FIG. 3 is a schematic representation of feature fusion.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Examples
A vehicle detection method based on monocular vision and laser radar fusion comprises the following steps: extracting an image feature map with a feature pyramid network; obtaining a point cloud feature map with a VoxelNet network and extracting 3D candidate regions; extracting region-of-interest point cloud features and image features based on the candidate regions; fusing the image features and the point cloud features with a pre-fusion strategy to obtain fused features; estimating the target class and 3D bounding box from the fused features; and removing overlapped redundant bounding boxes by non-maximum suppression. The overall flow of the method is shown in FIG. 2, and the specific steps are as follows:
Step 1: normalize the input image: first compute the mean values of the training-set images on the R/G/B channels, then during both training and detection subtract these means from the image pixel values on the corresponding R/G/B channels.
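The per-channel mean subtraction of step 1 can be sketched as follows (a minimal illustration, assuming images are stored as H × W × 3 floating-point arrays; the function names are hypothetical):

```python
import numpy as np

def channel_means(train_images):
    """Mean of the training-set images on the R/G/B channels."""
    # train_images: list of H x W x 3 float arrays
    return np.mean([img.reshape(-1, 3).mean(axis=0) for img in train_images], axis=0)

def normalize(image, rgb_mean):
    """Subtract the training-set R/G/B means from each channel."""
    return image.astype(np.float32) - rgb_mean.reshape(1, 1, 3)
```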
Step 2: for the image information, a feature pyramid is used as the feature extraction network; the image is input into the network to obtain an image convolution feature map of size 360 × 1200 × 32.
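A feature-pyramid-style extractor that returns a full-resolution 32-channel map, as used in step 2, might look like the sketch below (an illustrative PyTorch module; the patent does not specify the exact backbone, so the layer sizes are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Two-level feature pyramid; output has the same spatial size as the input image."""
    def __init__(self, out_channels=32):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.lateral1 = nn.Conv2d(32, out_channels, 1)
        self.lateral2 = nn.Conv2d(64, out_channels, 1)
        self.smooth = nn.Conv2d(out_channels, out_channels, 3, padding=1)

    def forward(self, x):                     # x: B x 3 x 360 x 1200
        c1 = self.stage1(x)                   # 1/2 resolution
        c2 = self.stage2(c1)                  # 1/4 resolution
        p2 = self.lateral2(c2)
        p1 = self.lateral1(c1) + F.interpolate(p2, size=c1.shape[-2:], mode="nearest")
        # upsample back to the input resolution -> B x 32 x 360 x 1200
        return self.smooth(F.interpolate(p1, size=x.shape[-2:], mode="bilinear",
                                         align_corners=False))
```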
Step 3: for the laser radar point cloud, a VoxelNet network is first used to obtain its point cloud feature map, whose size is 200 × 176 × 768.
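For illustration, the grouping of raw lidar points into the voxel grid consumed by VoxelNet can be sketched as follows; the detection range and voxel size are assumptions in the spirit of the original VoxelNet setup (chosen so that the resulting bird's-eye-view grid matches the 200 × 176 feature map after the middle convolution layers) and are not specified by the patent:

```python
import numpy as np

# Assumed detection range and voxel size (VoxelNet-style defaults, not given in the patent)
X_RANGE, Y_RANGE, Z_RANGE = (0.0, 70.4), (-40.0, 40.0), (-3.0, 1.0)
VOXEL_SIZE = np.array([0.2, 0.2, 0.4])      # (x, y, z) in metres

def voxelize(points):
    """Assign each lidar point (x, y, z, reflectance) to a voxel index."""
    mask = ((points[:, 0] >= X_RANGE[0]) & (points[:, 0] < X_RANGE[1]) &
            (points[:, 1] >= Y_RANGE[0]) & (points[:, 1] < Y_RANGE[1]) &
            (points[:, 2] >= Z_RANGE[0]) & (points[:, 2] < Z_RANGE[1]))
    pts = points[mask]
    origin = np.array([X_RANGE[0], Y_RANGE[0], Z_RANGE[0]])
    idx = np.floor((pts[:, :3] - origin) / VOXEL_SIZE).astype(np.int32)
    # group points that share a voxel index; VoxelNet's VFE layers then encode each group
    voxels = {}
    for p, i in zip(pts, map(tuple, idx)):
        voxels.setdefault(i, []).append(p)
    return voxels
```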
Step 4: extract the 3D candidate regions using the region candidate network of the VoxelNet network; the orientations of the output candidate regions are all 0° or 90°. The loss function used by the VoxelNet region candidate network is as follows:
L_VoxelNet_RPN = α (1/N_pos) Σ_i L_cls(p_i^pos, 1) + β (1/N_neg) Σ_j L_cls(p_j^neg, 0) + (1/N_pos) Σ_i L_reg(u_i, u_i*)

wherein p_i^pos and p_j^neg are the confidence-map outputs corresponding to the positive and negative sample anchor boxes respectively, and N_pos and N_neg are the numbers of positive and negative sample anchor boxes respectively. For a vehicle target, an anchor box is considered a positive sample when its intersection-over-union with any ground-truth bounding box is greater than 0.6, or when it has the largest intersection-over-union with some ground-truth bounding box among all anchor boxes; an anchor box is considered a negative sample when its intersection-over-union with all ground-truth bounding boxes is less than 0.45; anchor boxes whose intersection-over-union with all ground-truth bounding boxes lies between 0.45 and 0.6 are ignored. The classification loss function L_cls is the cross-entropy loss. u_i = (u_ix, u_iy, u_iz, u_il, u_iw, u_ih, u_iθ) is the characterization vector of the normalized difference between the predicted bounding box and the corresponding positive sample anchor box, while u_i* is the characterization vector of the difference between the corresponding ground-truth bounding box and the positive sample anchor box. The regression loss function L_reg is the Smooth L1 function, and the classification loss terms of the positive and negative samples are balanced by the hyper-parameters α and β, set to α = 1.5 and β = 1.
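The positive/negative anchor assignment rule described above (IoU greater than 0.6 or best-matching anchor gives a positive sample, IoU below 0.45 gives a negative sample, anything in between is ignored) can be sketched as follows, assuming a precomputed bird's-eye-view IoU matrix between anchors and ground-truth boxes (the helper that computes it is not shown):

```python
import numpy as np

def assign_anchors(iou, pos_thr=0.6, neg_thr=0.45):
    """iou: (num_anchors, num_gt) BEV IoU matrix. Returns +1 / -1 / 0 per anchor."""
    labels = np.zeros(iou.shape[0], dtype=np.int8)           # 0 = ignored
    max_iou = iou.max(axis=1) if iou.shape[1] > 0 else np.zeros(iou.shape[0])
    labels[max_iou < neg_thr] = -1                            # negative samples
    labels[max_iou > pos_thr] = 1                             # positive samples
    if iou.shape[1] > 0:
        # the anchor with the largest IoU for each ground-truth box is also positive
        labels[iou.argmax(axis=0)] = 1
    return labels
```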
In the training process, the candidate regions output by the VoxelNet are subjected to non-maximum suppression (the 2D IoU threshold value is 0.8), 1024 candidate regions are reserved and input into a subsequent detection network. In the detection process, in order to reduce the operation time, the non-maximum suppression reserves the first 300 candidate regions. Because the number of channels of the image feature map is different from that of the point cloud feature map, in order to facilitate subsequent fusion and reduce memory occupation and calculation amount during inference, the dimension of the point cloud feature map is reduced to 200 × 176 × 32 by using 1 × 1 convolution, so that the number of channels is equal to that of the image feature map.
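The channel reduction described above (768 to 32 channels so that the point cloud feature map matches the image feature map before fusion) amounts to a single 1 × 1 convolution, for example (a minimal PyTorch sketch):

```python
import torch
import torch.nn as nn

reduce_channels = nn.Conv2d(in_channels=768, out_channels=32, kernel_size=1)

bev_features = torch.randn(1, 768, 200, 176)    # point cloud feature map from VoxelNet
bev_features_32 = reduce_channels(bev_features)  # -> 1 x 32 x 200 x 176
```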
Step 5: project the 3D candidate regions obtained by the VoxelNet network onto the image feature map and the point cloud feature map respectively to obtain the regions of interest (RoIs).
The method for projecting onto the image feature map is as follows, using the projection formula from point cloud coordinates to image coordinates:

z_c · [u, v, 1]^T = P_rect^(2) · R_rect^(0) · T_velo→cam · [x, y, z, 1]^T

wherein z_c is the depth of the point in the camera coordinate system, R_rect^(0) is the rectifying rotation matrix from the image plane of reference camera 0 (the left grayscale camera) to camera 2 (the left color camera), of size 4 × 4, P_rect^(2) is the rectified projection matrix of camera 2, and T_velo→cam is the rotation-translation matrix from the lidar coordinate system to the camera coordinate system.
Because the image feature map obtained by the feature pyramid network has the same size as the original image, the projected image region can be mapped directly onto the feature map to obtain the image region of interest.
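Assuming the calibration matrices P_rect^(2), R_rect^(0) and T_velo→cam have been read from the sensor calibration (for example a KITTI-style calibration file; the loading helper is not shown), the projection of lidar points onto the image plane can be sketched as:

```python
import numpy as np

def project_lidar_to_image(points_xyz, P_rect_2, R_rect_0, T_velo_cam):
    """points_xyz: (N, 3) lidar points. Returns (N, 2) pixel coordinates (u, v)."""
    pts_h = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])  # homogeneous coords
    cam = P_rect_2 @ R_rect_0 @ T_velo_cam @ pts_h.T                    # 3 x N
    uv = cam[:2] / cam[2]                                               # perspective division
    return uv.T
```

Since the image feature map has the same spatial size as the original image, the resulting (u, v) coordinates can be used directly as feature-map coordinates of the region of interest.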
the method for projecting the point cloud characteristic diagram comprises the following steps: firstly, projecting the 3D candidate area on a bird's-eye view, wherein the convolution intermediate layer of the VoxelNet network mainly aggregates the features in the height direction, the space structure in the top view direction is still reserved in the subsequent convolution operation, and the corresponding coordinates on the point cloud feature map can be obtained in proportion through the bird's-eye view coordinates.
Region features can be cropped from the image feature map and the point cloud feature map respectively using the regions of interest (RoIs). Because the two region features may differ in size and are therefore difficult to fuse directly, the cropped region features are each rescaled to 7 × 7 by bilinear interpolation, finally yielding the equal-length point cloud feature vector f_lidar and image feature vector f_RGB.
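As an illustration of cropping and rescaling the region features to 7 × 7, one could use a bilinear RoI pooling operator such as torchvision's roi_align, which is close in spirit to the crop-then-bilinear-resize described above; box coordinates are assumed to already be expressed in feature-map pixels:

```python
import torch
from torchvision.ops import roi_align

def crop_roi_features(feature_map, boxes_xyxy, out_size=7):
    """feature_map: 1 x C x H x W; boxes_xyxy: (K, 4) RoIs in feature-map coordinates.
    Returns K x C x out_size x out_size region features."""
    # prepend the batch index (0) expected by roi_align
    rois = torch.cat([torch.zeros(boxes_xyxy.shape[0], 1), boxes_xyxy], dim=1)
    return roi_align(feature_map, rois, output_size=out_size, spatial_scale=1.0)
```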
Step 6: point-to-point cloud feature vector flidarAnd image feature vector fRGBFeature using pre-fusion strategyFusion, as shown in fig. 3.
The pre-fusion strategy is specifically as follows: assuming the fusion network has L layers, pre-fusion fuses f_lidar and f_RGB at the input stage:

f_L = H_L(H_{L-1}(… H_1(f_lidar ⊕ f_RGB)))

wherein f_L is the fused feature output after fusing the point cloud feature vector f_lidar and the image feature vector f_RGB, {H_l, l = 1, …, L} are the feature transformation functions, fully-connected layers in this example, and ⊕ denotes the fusion operation (including concatenation, summation, etc.); element-wise averaging is used in this embodiment.
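A minimal sketch of this pre-fusion step (element-wise averaging of the two flattened 7 × 7 × 32 region features followed by fully-connected transformation layers; the hidden width and number of layers are assumptions):

```python
import torch
import torch.nn as nn

class PreFusion(nn.Module):
    def __init__(self, roi_dim=7 * 7 * 32, hidden=512, num_layers=2):
        super().__init__()
        layers, in_dim = [], roi_dim
        for _ in range(num_layers):                       # the H_l transformation functions
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        self.mlp = nn.Sequential(*layers)

    def forward(self, f_lidar, f_rgb):                    # each: K x (7*7*32)
        fused = 0.5 * (f_lidar + f_rgb)                   # element-wise averaging
        return self.mlp(fused)                            # fused feature f_L
```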
Step 7: input the fused feature f_L into three fully-connected networks, which respectively regress the bounding-box coordinates and size, the category, and the direction vector. The multi-task loss function of the detection network is as follows:
L_DET = (1/N_cls) Σ_k L_cls(q_k, q_k*) + λ (1/N_pos) Σ_{k ∈ pos} [L_reg(u_k, u_k*) + L_ang(v_k, v_k*)]

wherein k is the index within the mini-batch of a candidate region input to the detection network, and q_k is the predicted probability that bounding box k, output by the bounding-box classification-regression branch, is a vehicle. q_k* is the ground-truth label: when the intersection-over-union of candidate region k with any ground-truth bounding box is greater than 0.65, the candidate region is considered a positive sample, otherwise it is considered a negative sample. The classification loss function L_cls is again the cross-entropy loss. u_k and u_k* are respectively the characterization vector of the difference between the predicted bounding box and the corresponding positive-sample candidate bounding box and the characterization vector of the difference between the corresponding ground-truth bounding box and the positive-sample candidate bounding box. L_reg and L_ang both use the Smooth L1 function, N_pos is the number of positive-sample anchor boxes, v_k is the direction-vector difference between the predicted bounding box and the corresponding positive-sample candidate bounding box, v_k* is the direction-vector difference between the corresponding ground-truth bounding box and the positive-sample candidate bounding box, λ is a hyper-parameter balancing the classification and regression loss functions, and N_cls, the sum of the numbers of positive and negative sample anchor boxes in the mini-batch, normalizes the classification loss function.
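The three fully-connected branches of the detection network in step 7 could be laid out as in the following sketch (one branch for the vehicle/background score, one for the box coordinate-and-size residuals, one for the direction vector; the input width and output sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    def __init__(self, in_dim=512):
        super().__init__()
        self.cls_branch = nn.Linear(in_dim, 2)    # vehicle / background score q_k
        self.box_branch = nn.Linear(in_dim, 7)    # (x, y, z, l, w, h, theta) residuals u_k
        self.dir_branch = nn.Linear(in_dim, 2)    # direction vector v_k

    def forward(self, f_L):                       # f_L: K x in_dim fused features
        return self.cls_branch(f_L), self.box_branch(f_L), self.dir_branch(f_L)
```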
Step 8: the final overall loss function L_FINAL is the sum of the loss functions of the VoxelNet region candidate network and the detection network; by optimizing this objective function and minimizing its value, end-to-end learning is carried out and model training is completed:

L_FINAL = L_VoxelNet_RPN + L_DET
This training process essentially performs maximum likelihood estimation via gradient back-propagation.
Step 9: multiple candidate regions may regress to the same or heavily overlapping bounding-box regions in the top view, which is generally impossible in a real road scene. To avoid this, excessively overlapping 3D bounding boxes are removed by 2D non-maximum suppression in the top view with 0.01 as the intersection-over-union threshold, eliminating overlapping detections and yielding the final vehicle detection result.
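The top-view non-maximum suppression of step 9 can be sketched with axis-aligned 2D boxes as follows (assuming the 3D boxes have already been converted to bird's-eye-view rectangles; with the very low IoU threshold of 0.01, almost any overlap removes the lower-scoring box):

```python
import numpy as np

def bev_nms(boxes, scores, iou_thr=0.01):
    """boxes: (N, 4) top-view rectangles (x1, y1, x2, y2); returns indices to keep."""
    order, keep = scores.argsort()[::-1], []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_rest - inter)
        order = order[1:][iou <= iou_thr]   # drop boxes overlapping the kept box
    return keep
```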
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and those skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A vehicle detection method based on monocular vision and laser radar fusion is characterized by comprising the following steps:
S1: acquiring an image feature map;
S2: acquiring a point cloud feature map;
S3: extracting a point cloud feature vector f_lidar and an image feature vector f_RGB from the point cloud feature map and the image feature map respectively;
S4: performing feature fusion on the point cloud feature vector f_lidar and the image feature vector f_RGB to obtain a fused feature f_L;
S5: obtaining the 3D bounding box of the vehicle and the corresponding parameters according to the fused feature f_L;
S6: removing the overlapped 3D bounding boxes to obtain the final 3D bounding boxes and corresponding parameters, completing vehicle detection.
2. The method for detecting a vehicle based on the fusion of monocular vision and lidar according to claim 1, wherein the step S3 specifically comprises:
S301: extracting 3D candidate regions from the point cloud feature map;
S302: projecting the 3D candidate regions into the image feature map and the point cloud feature map respectively to obtain regions of interest (RoIs);
S303: using the regions of interest to crop image region features and point cloud region features from the image feature map and the point cloud feature map respectively;
S304: scaling the image region features and the point cloud region features to the same set size to obtain the point cloud feature vector f_lidar and the image feature vector f_RGB of equal length.
3. The method according to claim 2, wherein the projecting of the 3D candidate region into the image feature map specifically comprises: projecting the 3D candidate region into the image feature map using the projection formula from point cloud coordinates to image coordinates, in which a point (x, y, z) in the lidar coordinate system is projected onto the image plane to obtain the image coordinates (u, v), the projection formula from point cloud coordinates to image coordinates being:

z_c · [u, v, 1]^T = P_rect^(2) · R_rect^(0) · T_velo→cam · [x, y, z, 1]^T

wherein z_c is the depth of the point in the camera coordinate system, R_rect^(0) is the rectifying rotation matrix from the left grayscale camera to the left color camera image plane, P_rect^(2) is the rectified projection matrix of the left color camera, and T_velo→cam is the rotation-translation matrix from the lidar coordinate system to the camera coordinate system;
the projecting the 3D candidate region into the point cloud feature map specifically includes: firstly, projecting the 3D candidate area on a bird's-eye view, and then obtaining corresponding coordinates on the point cloud characteristic map in proportion through the bird's-eye view coordinates.
4. The method for vehicle detection based on monocular vision and lidar fusion as claimed in claim 2, wherein the feature fusion employs pre-fusion, specifically comprising: fusing the point cloud feature vector f_lidar and the image feature vector f_RGB at the input stage.
5. The method for vehicle detection based on the fusion of monocular vision and lidar according to claim 4, wherein the formula of the feature fusion is:

f_L = H_L(H_{L-1}(… H_1(f_lidar ⊕ f_RGB)))

wherein f_L is the fused output, {H_l, l = 1, …, L} are the feature transformation functions, and ⊕ is the fusion operation, the fusion operation comprising concatenation, summation or element-wise averaging.
6. The method for detecting a vehicle based on the fusion of monocular vision and lidar according to claim 2, wherein the step S5 specifically comprises:
S501: inputting the fused feature f_L into a detection network;
S502: obtaining the 3D bounding box of the vehicle, and regressing the category, the coordinates and size, and the direction vector of the 3D bounding box respectively.
7. The vehicle detection method based on the fusion of the monocular vision and the laser radar as claimed in claim 6, wherein the point cloud feature map is obtained through a VoxelNet network, the 3D candidate area is extracted through an area candidate network of the VoxelNet network, and the detection network is three fully-connected networks.
8. The method of claim 7, wherein during model training, end-to-end training is performed by minimizing a loss function L_FINAL, the expression of the loss function L_FINAL being:

L_FINAL = L_VoxelNet_RPN + L_DET

wherein L_VoxelNet_RPN is the loss function of the VoxelNet region candidate network, L_DET is the multi-task loss function of the detection network, p_i^pos and p_j^neg are the confidence-map outputs corresponding to the positive and negative sample anchor boxes respectively, N_pos and N_neg are the numbers of positive and negative sample anchor boxes respectively, L_cls is the classification loss function, u_i = (u_ix, u_iy, u_iz, u_il, u_iw, u_ih, u_iθ) is the characterization vector of the normalized difference between the predicted bounding box and the corresponding positive sample anchor box, u_i* is the characterization vector of the difference between the corresponding ground-truth bounding box and the positive sample anchor box, L_reg is the regression loss function, α and β are hyper-parameters, k is the index of a candidate region input to the detection network, q_k is the predicted probability that bounding box k output by the 3D bounding-box classification-regression branch is a vehicle, q_k* is the ground-truth label, L_ang and L_reg adopt the Smooth L1 function, N_pos is the number of positive-sample anchor boxes, v_k is the direction-vector difference between the predicted bounding box and the corresponding positive-sample candidate bounding box, v_k* is the direction-vector difference between the corresponding ground-truth bounding box and the positive-sample candidate bounding box, λ is a hyper-parameter balancing the classification loss function and the regression loss function, and N_cls is the sum of the numbers of positive and negative sample anchor boxes.
9. The method for vehicle detection based on monocular vision and lidar fusion of claim 1, wherein the removing of the overlapped 3D bounding boxes specifically comprises: with 0.01 as the intersection-over-union threshold, removing the overlapping 3D bounding boxes using 2D non-maximum suppression in the top view.
10. The method for vehicle detection based on monocular vision and lidar fusion of claim 1, wherein when the 3D candidate regions are extracted, the point cloud feature map is reduced by convolution to the same number of channels as the image feature map; during model training, 1024 candidate regions are retained from the 3D candidate regions through non-maximum suppression, and during detection only the first 300 candidate regions are retained through non-maximum suppression.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010124991.XA CN111291714A (en) | 2020-02-27 | 2020-02-27 | Vehicle detection method based on monocular vision and laser radar fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010124991.XA CN111291714A (en) | 2020-02-27 | 2020-02-27 | Vehicle detection method based on monocular vision and laser radar fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111291714A true CN111291714A (en) | 2020-06-16 |
Family
ID=71029510
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010124991.XA Pending CN111291714A (en) | 2020-02-27 | 2020-02-27 | Vehicle detection method based on monocular vision and laser radar fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111291714A (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111722245A (en) * | 2020-06-22 | 2020-09-29 | 北京百度网讯科技有限公司 | Positioning method, positioning device and electronic equipment |
CN112183578A (en) * | 2020-09-01 | 2021-01-05 | 国网宁夏电力有限公司检修公司 | Target detection method, medium and system |
CN112200851A (en) * | 2020-12-09 | 2021-01-08 | 北京云测信息技术有限公司 | Point cloud-based target detection method and device and electronic equipment thereof |
CN112712129A (en) * | 2021-01-11 | 2021-04-27 | 深圳力维智联技术有限公司 | Multi-sensor fusion method, device, equipment and storage medium |
CN112990229A (en) * | 2021-03-11 | 2021-06-18 | 上海交通大学 | Multi-modal 3D target detection method, system, terminal and medium |
CN112990050A (en) * | 2021-03-26 | 2021-06-18 | 清华大学 | Monocular 3D target detection method based on lightweight characteristic pyramid structure |
CN113066124A (en) * | 2021-02-26 | 2021-07-02 | 华为技术有限公司 | Neural network training method and related equipment |
CN113111974A (en) * | 2021-05-10 | 2021-07-13 | 清华大学 | Vision-laser radar fusion method and system based on depth canonical correlation analysis |
CN113468950A (en) * | 2021-05-12 | 2021-10-01 | 东风汽车股份有限公司 | Multi-target tracking method based on deep learning in unmanned driving scene |
CN113674421A (en) * | 2021-08-25 | 2021-11-19 | 北京百度网讯科技有限公司 | 3D target detection method, model training method, related device and electronic equipment |
CN113724335A (en) * | 2021-08-01 | 2021-11-30 | 国网江苏省电力有限公司徐州供电分公司 | Monocular camera-based three-dimensional target positioning method and system |
CN113762001A (en) * | 2020-10-10 | 2021-12-07 | 北京京东乾石科技有限公司 | Target detection method and device, electronic equipment and storage medium |
CN113822910A (en) * | 2021-09-30 | 2021-12-21 | 上海商汤临港智能科技有限公司 | Multi-target tracking method and device, electronic equipment and storage medium |
CN114118125A (en) * | 2021-10-08 | 2022-03-01 | 南京信息工程大学 | Multi-modal input and space division three-dimensional target detection method |
CN114359891A (en) * | 2021-12-08 | 2022-04-15 | 华南理工大学 | Three-dimensional vehicle detection method, system, device and medium |
CN114638996A (en) * | 2020-12-01 | 2022-06-17 | 广州视源电子科技股份有限公司 | Model training method, device, equipment and storage medium based on counterstudy |
WO2024139375A1 (en) * | 2022-12-30 | 2024-07-04 | 华为技术有限公司 | Data processing method and computer device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109886215A (en) * | 2019-02-26 | 2019-06-14 | 常熟理工学院 | The cruise of low speed garden unmanned vehicle and emergency braking system based on machine vision |
CN109948661A (en) * | 2019-02-27 | 2019-06-28 | 江苏大学 | A kind of 3D vehicle checking method based on Multi-sensor Fusion |
CN110738121A (en) * | 2019-09-17 | 2020-01-31 | 北京科技大学 | front vehicle detection method and detection system |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109886215A (en) * | 2019-02-26 | 2019-06-14 | 常熟理工学院 | The cruise of low speed garden unmanned vehicle and emergency braking system based on machine vision |
CN109948661A (en) * | 2019-02-27 | 2019-06-28 | 江苏大学 | A kind of 3D vehicle checking method based on Multi-sensor Fusion |
CN110738121A (en) * | 2019-09-17 | 2020-01-31 | 北京科技大学 | front vehicle detection method and detection system |
Non-Patent Citations (2)
Title |
---|
YIN ZHOU ET AL: "VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection", 《ARXIV:1711.06396V1 [CS.CV]》 * |
WANG Ye: "Research on Vehicle Recognition and State Estimation Based on Deep Learning and Virtual Data", 《China Doctoral and Master's Dissertations Full-text Database (Doctoral), Engineering Science and Technology II》 *
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111722245B (en) * | 2020-06-22 | 2023-03-10 | 阿波罗智能技术(北京)有限公司 | Positioning method, positioning device and electronic equipment |
US11713970B2 (en) | 2020-06-22 | 2023-08-01 | Beijing Baidu Netcom Science Technology Co., Ltd. | Positioning method, electronic device and computer readable storage medium |
CN111722245A (en) * | 2020-06-22 | 2020-09-29 | 北京百度网讯科技有限公司 | Positioning method, positioning device and electronic equipment |
EP3842749A2 (en) * | 2020-06-22 | 2021-06-30 | Beijing Baidu Netcom Science Technology Co., Ltd. | Positioning method, positioning device and electronic device |
CN112183578A (en) * | 2020-09-01 | 2021-01-05 | 国网宁夏电力有限公司检修公司 | Target detection method, medium and system |
CN112183578B (en) * | 2020-09-01 | 2023-05-23 | 国网宁夏电力有限公司检修公司 | Target detection method, medium and system |
CN113762001A (en) * | 2020-10-10 | 2021-12-07 | 北京京东乾石科技有限公司 | Target detection method and device, electronic equipment and storage medium |
CN113762001B (en) * | 2020-10-10 | 2024-04-19 | 北京京东乾石科技有限公司 | Target detection method and device, electronic equipment and storage medium |
CN114638996A (en) * | 2020-12-01 | 2022-06-17 | 广州视源电子科技股份有限公司 | Model training method, device, equipment and storage medium based on counterstudy |
CN112200851A (en) * | 2020-12-09 | 2021-01-08 | 北京云测信息技术有限公司 | Point cloud-based target detection method and device and electronic equipment thereof |
CN112200851B (en) * | 2020-12-09 | 2021-02-26 | 北京云测信息技术有限公司 | Point cloud-based target detection method and device and electronic equipment thereof |
CN112712129B (en) * | 2021-01-11 | 2024-04-19 | 深圳力维智联技术有限公司 | Multi-sensor fusion method, device, equipment and storage medium |
CN112712129A (en) * | 2021-01-11 | 2021-04-27 | 深圳力维智联技术有限公司 | Multi-sensor fusion method, device, equipment and storage medium |
CN113066124A (en) * | 2021-02-26 | 2021-07-02 | 华为技术有限公司 | Neural network training method and related equipment |
CN112990229A (en) * | 2021-03-11 | 2021-06-18 | 上海交通大学 | Multi-modal 3D target detection method, system, terminal and medium |
CN112990050B (en) * | 2021-03-26 | 2021-10-08 | 清华大学 | Monocular 3D target detection method based on lightweight characteristic pyramid structure |
CN112990050A (en) * | 2021-03-26 | 2021-06-18 | 清华大学 | Monocular 3D target detection method based on lightweight characteristic pyramid structure |
US11532151B2 (en) | 2021-05-10 | 2022-12-20 | Tsinghua University | Vision-LiDAR fusion method and system based on deep canonical correlation analysis |
CN113111974A (en) * | 2021-05-10 | 2021-07-13 | 清华大学 | Vision-laser radar fusion method and system based on depth canonical correlation analysis |
CN113111974B (en) * | 2021-05-10 | 2021-12-14 | 清华大学 | Vision-laser radar fusion method and system based on depth canonical correlation analysis |
CN113468950A (en) * | 2021-05-12 | 2021-10-01 | 东风汽车股份有限公司 | Multi-target tracking method based on deep learning in unmanned driving scene |
CN113724335B (en) * | 2021-08-01 | 2023-12-19 | 国网江苏省电力有限公司徐州供电分公司 | Three-dimensional target positioning method and system based on monocular camera |
CN113724335A (en) * | 2021-08-01 | 2021-11-30 | 国网江苏省电力有限公司徐州供电分公司 | Monocular camera-based three-dimensional target positioning method and system |
CN113674421B (en) * | 2021-08-25 | 2023-10-13 | 北京百度网讯科技有限公司 | 3D target detection method, model training method, related device and electronic equipment |
CN113674421A (en) * | 2021-08-25 | 2021-11-19 | 北京百度网讯科技有限公司 | 3D target detection method, model training method, related device and electronic equipment |
CN113822910A (en) * | 2021-09-30 | 2021-12-21 | 上海商汤临港智能科技有限公司 | Multi-target tracking method and device, electronic equipment and storage medium |
CN114118125A (en) * | 2021-10-08 | 2022-03-01 | 南京信息工程大学 | Multi-modal input and space division three-dimensional target detection method |
CN114359891A (en) * | 2021-12-08 | 2022-04-15 | 华南理工大学 | Three-dimensional vehicle detection method, system, device and medium |
CN114359891B (en) * | 2021-12-08 | 2024-05-28 | 华南理工大学 | Three-dimensional vehicle detection method, system, device and medium |
WO2024139375A1 (en) * | 2022-12-30 | 2024-07-04 | 华为技术有限公司 | Data processing method and computer device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111291714A (en) | Vehicle detection method based on monocular vision and laser radar fusion | |
CN111862126B (en) | Non-cooperative target relative pose estimation method combining deep learning and geometric algorithm | |
CN110119148B (en) | Six-degree-of-freedom attitude estimation method and device and computer readable storage medium | |
WO2020062433A1 (en) | Neural network model training method and method for detecting universal grounding wire | |
CN113378686B (en) | Two-stage remote sensing target detection method based on target center point estimation | |
CN113076871B (en) | Fish shoal automatic detection method based on target shielding compensation | |
CN112884064A (en) | Target detection and identification method based on neural network | |
WO2023019875A1 (en) | Vehicle loss detection method and apparatus, and electronic device and storage medium | |
CN112084869B (en) | Compact quadrilateral representation-based building target detection method | |
CN112950645B (en) | Image semantic segmentation method based on multitask deep learning | |
CN111797688A (en) | Visual SLAM method based on optical flow and semantic segmentation | |
CN111208818B (en) | Intelligent vehicle prediction control method based on visual space-time characteristics | |
CN112308921B (en) | Combined optimization dynamic SLAM method based on semantics and geometry | |
CN114937083B (en) | Laser SLAM system and method applied to dynamic environment | |
CN112767478B (en) | Appearance guidance-based six-degree-of-freedom pose estimation method | |
Fang et al. | Sewer defect instance segmentation, localization, and 3D reconstruction for sewer floating capsule robots | |
WO2021175434A1 (en) | System and method for predicting a map from an image | |
CN112949635B (en) | Target detection method based on feature enhancement and IoU perception | |
CN115482518A (en) | Extensible multitask visual perception method for traffic scene | |
CN117593548A (en) | Visual SLAM method for removing dynamic feature points based on weighted attention mechanism | |
CN113160117A (en) | Three-dimensional point cloud target detection method under automatic driving scene | |
Zhang et al. | Front vehicle detection based on multi-sensor fusion for autonomous vehicle | |
CN117671647B (en) | Multitasking road scene perception method | |
CN114663880A (en) | Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism | |
CN113012191B (en) | Laser mileage calculation method based on point cloud multi-view projection graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200616