CN111291714A - Vehicle detection method based on monocular vision and laser radar fusion

Vehicle detection method based on monocular vision and laser radar fusion

Info

Publication number
CN111291714A
Authority
CN
China
Prior art keywords
point cloud
image
fusion
bounding box
candidate
Prior art date
Legal status
Pending
Application number
CN202010124991.XA
Other languages
Chinese (zh)
Inventor
张立军
孟德建
黄露莹
张状
Current Assignee
Tongji University
Original Assignee
Tongji University
Priority date
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN202010124991.XA priority Critical patent/CN111291714A/en
Publication of CN111291714A publication Critical patent/CN111291714A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 20/584 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/08 Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to a vehicle detection method based on monocular vision and laser radar fusion, which comprises the following steps: S1: acquiring an image feature map; S2: acquiring a point cloud feature map; S3: extracting a point cloud feature vector f_lidar and an image feature vector f_RGB from the point cloud feature map and the image feature map, respectively; S4: performing feature fusion on the point cloud feature vector f_lidar and the image feature vector f_RGB to obtain a fusion feature f_L; S5: obtaining a 3D bounding box of the vehicle and the corresponding category parameters according to the fusion feature f_L; S6: removing overlapping 3D bounding boxes to obtain the final 3D bounding box and corresponding parameters, completing vehicle detection. Compared with the prior art, the method helps to solve both the difficulty of effectively estimating the vehicle position from monocular vision and the missed detections of the laser radar caused by sparse long-range point clouds, thereby further improving the three-dimensional vehicle detection result.

Description

Vehicle detection method based on monocular vision and laser radar fusion
Technical Field
The invention relates to the field of automatic driving environment perception, in particular to a vehicle detection method based on monocular vision and laser radar fusion.
Background
Vehicle detection is an indispensable component of an automatic driving environment perception system, and target detection is also a fundamental problem in computer vision. Despite the tremendous advances researchers have made in this area in recent years, developing a high-accuracy, high-efficiency, robust target detection system that can be used for autonomous driving remains a significant challenge. Vehicle detection is realized through three-dimensional target detection, which outputs a three-dimensional bounding box containing the position and pose of the target vehicle in the three-dimensional environment; only with this information can the automatic driving decision system make further driving decisions, so three-dimensional detection is of particular importance for automatic driving.
Because the various sensors each have their own advantages and disadvantages, information fusion across multi-modal sensors has become a necessary choice. Fusion compensates for the limited environmental information obtained by any single sensor and, by combining the strengths of different sensors, provides a perception system with stronger fault tolerance and higher safety; it can greatly improve the reliability, accuracy and adaptability of the vehicle environment perception system, especially in complex road traffic conditions.
For the three-dimensional target detection task of a vehicle, a camera has the advantage of capturing more detail, and an image contains richer semantic information; the point cloud obtained by a laser radar sensor is sparse compared with the image but carries high-precision three-dimensional position information. In theory, fusing the visible-light image with the laser radar point cloud can therefore yield a more accurate three-dimensional perception result.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art by providing a vehicle detection method based on monocular vision and laser radar fusion, which helps to solve both the difficulty of effectively estimating the vehicle position from monocular vision and the missed detections of the laser radar caused by sparse long-range point clouds, thereby further improving the three-dimensional vehicle detection result.
The purpose of the invention can be realized by the following technical scheme:
a vehicle detection method based on monocular vision and laser radar fusion comprises the following steps:
S1: acquiring an image feature map;
S2: acquiring a point cloud feature map;
S3: extracting a point cloud feature vector f_lidar and an image feature vector f_RGB from the point cloud feature map and the image feature map, respectively;
S4: performing feature fusion on the point cloud feature vector f_lidar and the image feature vector f_RGB to obtain a fusion feature f_L;
S5: obtaining a 3D bounding box of the vehicle and the corresponding parameters according to the fusion feature f_L;
S6: removing overlapping 3D bounding boxes to obtain the final 3D bounding box and corresponding parameters, completing vehicle detection.
The step S3 specifically includes:
S301: extracting 3D candidate regions from the point cloud feature map;
S302: projecting each 3D candidate region into the image feature map and the point cloud feature map respectively to obtain regions of interest (RoI);
S303: using the regions of interest to crop image region features and point cloud region features from the image feature map and the point cloud feature map, respectively;
S304: scaling the image region features and the point cloud region features to the same set size to obtain the equal-length point cloud feature vector f_lidar and image feature vector f_RGB.
The projecting of the 3D candidate region into the image feature map specifically includes: projecting the 3D candidate region into the image feature map using the projection formula from point cloud coordinates to image coordinates. A point (x, y, z) in the laser radar coordinate system is projected onto the image plane to obtain the image coordinate (u, v); the projection formula from point cloud coordinates to image coordinates is:

[u, v, 1]^T ∝ P_rect^(2) · R_rect^(0) · T_velo^cam · [x, y, z, 1]^T

where R_rect^(0) is the correction rotation matrix from the reference camera 0 (left grayscale camera) to the image plane of camera 2 (left color camera), of size 4 × 4; P_rect^(2) is the corrective projection matrix of camera 2; and T_velo^cam is the rotation and translation matrix from the laser radar coordinate system to the camera coordinate system.
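For illustration only, the following Python sketch shows how such a projection could be computed, assuming KITTI-style calibration matrices; the names P2, R0_rect and Tr_velo_to_cam, and the helper function itself, are illustrative rather than taken from the patent.

```python
import numpy as np

def project_lidar_to_image(points_xyz, P2, R0_rect, Tr_velo_to_cam):
    """Project Nx3 lidar points (x, y, z) to pixel coordinates (u, v).

    P2             : 3x4 corrective projection matrix of the left color camera.
    R0_rect        : 4x4 correction rotation matrix of the reference camera.
    Tr_velo_to_cam : 4x4 rotation-translation matrix, lidar frame -> camera frame.
    """
    n = points_xyz.shape[0]
    pts_h = np.hstack([points_xyz, np.ones((n, 1))])   # Nx4 homogeneous points
    cam = R0_rect @ Tr_velo_to_cam @ pts_h.T           # 4xN, rectified camera frame
    img = P2 @ cam                                     # 3xN, homogeneous image coords
    uv = img[:2] / img[2]                              # perspective divide
    return uv.T                                        # Nx2 pixel coordinates (u, v)
```

Projecting the eight corners of a 3D candidate box this way and taking the minimum and maximum of (u, v) gives the corresponding image-plane region of interest.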
The projecting the 3D candidate region into the point cloud feature map specifically includes: firstly, projecting the 3D candidate area on a bird's-eye view, and then obtaining corresponding coordinates on the point cloud characteristic map in proportion through the bird's-eye view coordinates.
The feature fusion adopts pre-fusion, which specifically comprises: fusing the point cloud feature vector f_lidar and the image feature vector f_RGB at the input stage.
The formula of the feature fusion is:

f_L = H_L(H_(L-1)(… H_1(f_lidar ⊕ f_RGB) …))

where f_L is the fused output, {H_l, l = 1, …, L} are feature transformation functions, and ⊕ is the fusion operation; the fusion operation includes concatenation, summation or element-wise averaging.
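As a rough sketch of this pre-fusion (not part of the patent text), the PyTorch snippet below fuses the two equal-length feature vectors by element-wise averaging and passes the result through a stack of fully connected transformations H_l; the layer sizes and depth are assumptions.

```python
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Pre-fusion: combine f_lidar and f_RGB at the input, then apply H_1 ... H_L."""
    def __init__(self, feat_dim=7 * 7 * 32, hidden=512, num_layers=3):
        super().__init__()
        layers, in_dim = [], feat_dim              # element-wise mean keeps the length
        for _ in range(num_layers):                # H_1 ... H_L as fully connected layers
            layers += [nn.Linear(in_dim, hidden), nn.ReLU(inplace=True)]
            in_dim = hidden
        self.mlp = nn.Sequential(*layers)

    def forward(self, f_lidar, f_rgb):
        fused_in = 0.5 * (f_lidar + f_rgb)         # element-wise averaging as the fusion op
        return self.mlp(fused_in)                  # f_L
```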
The step S5 specifically includes:
S501: inputting the fusion feature f_L into a detection network;
S502: obtaining the 3D bounding box of the vehicle, and performing regression separately on the class, the coordinates and size, and the direction vector of the 3D bounding box.
The point cloud feature map is obtained through a VoxelNet network, the 3D candidate regions are extracted by the region proposal network of the VoxelNet network, and the detection network consists of three fully connected networks.
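Purely as an illustration of such a detection stage, the sketch below uses one linear layer per branch; the output dimensions (a two-class score, a 7-dimensional box residual, a 2-dimensional direction vector) are assumptions rather than values specified in the patent.

```python
import torch.nn as nn

class DetectionHead(nn.Module):
    """Three parallel fully connected branches operating on the fusion feature f_L."""
    def __init__(self, in_dim=512):
        super().__init__()
        self.cls_branch = nn.Linear(in_dim, 2)   # vehicle / background score
        self.box_branch = nn.Linear(in_dim, 7)   # (x, y, z, l, w, h, theta) residuals
        self.dir_branch = nn.Linear(in_dim, 2)   # direction vector

    def forward(self, f_L):
        return self.cls_branch(f_L), self.box_branch(f_L), self.dir_branch(f_L)
```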
When training the model, end-to-end training is performed by minimizing the loss function L_FINAL, whose expression is:

L_FINAL = L_VoxelNet_RPN + L_DET    (1)

L_VoxelNet_RPN = α·(1/N_pos)·Σ_i L_cls(p_i^pos, 1) + β·(1/N_neg)·Σ_j L_cls(p_j^neg, 0) + (1/N_pos)·Σ_i L_reg(u_i, u_i*)    (2)

L_DET = (1/N_cls)·Σ_k L_cls(q_k, q_k*) + λ·(1/N_pos)·Σ_j [L_reg(u_j, u_j*) + L_ang(v_j, v_j*)]    (3)

In equation (1), L_VoxelNet_RPN is the loss function of the VoxelNet region proposal network and L_DET is the multi-task loss function of the detection network.

In equation (2), p_i^pos and p_j^neg are the confidence-map outputs corresponding to the positive and negative sample anchor boxes, and N_pos and N_neg are the numbers of positive and negative sample anchor boxes. For a vehicle target, an anchor box is considered a positive sample when its intersection-over-union with any ground-truth bounding box is greater than 0.6, or when, among all anchor boxes, it has the largest intersection-over-union with some ground-truth bounding box; an anchor box is considered a negative sample when its intersection-over-union with every ground-truth bounding box is less than 0.45; anchor boxes whose intersection-over-union with all ground-truth bounding boxes lies between 0.45 and 0.6 are ignored. The classification loss L_cls is the cross-entropy loss. u_i = (u_ix, u_iy, u_iz, u_il, u_iw, u_ih, u_iθ) is the vector of normalized differences between a predicted bounding box and the corresponding positive-sample anchor box, while u_i* is the vector of differences between the corresponding ground-truth bounding box and that positive-sample anchor box. The regression loss L_reg is the Smooth L1 function, and the hyper-parameters α and β balance the classification losses of the positive and negative samples; here α = 1.5 and β = 1.

In equation (3), k is the index, within the mini-batch, of a candidate region input to the detection network, and q_k is the predicted probability, output by the bounding-box classification-regression branch, that bounding box k is a vehicle. q_k* is the ground-truth label: a candidate region k is considered a positive sample with q_k* = 1 when its intersection-over-union with any ground-truth bounding box is greater than 0.65, and a negative sample with q_k* = 0 otherwise. The classification loss L_cls is again the cross-entropy loss. u_j and u_j* are defined as in equation (2): they are, respectively, the vector of differences between a predicted bounding box and the corresponding positive-sample candidate bounding box, and the vector of differences between the corresponding ground-truth bounding box and that positive-sample candidate bounding box. L_reg and L_ang both use the Smooth L1 function, N_pos is the number of positive-sample anchor boxes, v_j is the direction-vector difference between a predicted bounding box and the corresponding positive-sample candidate bounding box, v_j* is the direction-vector difference between the corresponding ground-truth bounding box and that positive-sample candidate bounding box, λ is a hyper-parameter balancing the classification loss and the regression loss, and N_cls, the sum of the numbers of positive and negative sample anchor boxes in the mini-batch, normalizes the classification loss.
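For illustration only, the sketch below implements the positive/negative anchor assignment rule used in equation (2), with the 0.6 and 0.45 thresholds given above; it assumes an anchor-by-ground-truth intersection-over-union matrix has already been computed, and the function name is illustrative.

```python
import numpy as np

def assign_anchors(iou_matrix, pos_thr=0.6, neg_thr=0.45):
    """iou_matrix[i, j]: IoU between anchor i and ground-truth box j. Returns 1/0/-1 labels."""
    labels = np.full(iou_matrix.shape[0], -1, dtype=np.int8)   # -1 = ignored (0.45..0.6 band)
    best_iou = iou_matrix.max(axis=1)
    labels[best_iou < neg_thr] = 0                             # negative samples
    labels[best_iou > pos_thr] = 1                             # positive samples
    labels[iou_matrix.argmax(axis=0)] = 1                      # best anchor per ground truth is positive
    return labels
```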
The removing of overlapping 3D bounding boxes specifically comprises: using 0.01 as the intersection-over-union threshold, removing overlapping 3D bounding boxes by 2D non-maximum suppression in the top view.
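A minimal, illustrative sketch of this top-view non-maximum suppression is given below; it assumes axis-aligned top-view rectangles in (x1, y1, x2, y2) form and uses the 0.01 threshold stated above.

```python
import numpy as np

def nms_bev(boxes_xyxy, scores, iou_thr=0.01):
    """boxes_xyxy: Nx4 top-view rectangles; scores: N confidences. Returns kept indices."""
    order, keep = scores.argsort()[::-1], []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # intersection of box i with every remaining box
        x1 = np.maximum(boxes_xyxy[i, 0], boxes_xyxy[rest, 0])
        y1 = np.maximum(boxes_xyxy[i, 1], boxes_xyxy[rest, 1])
        x2 = np.minimum(boxes_xyxy[i, 2], boxes_xyxy[rest, 2])
        y2 = np.minimum(boxes_xyxy[i, 3], boxes_xyxy[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes_xyxy[i, 2] - boxes_xyxy[i, 0]) * (boxes_xyxy[i, 3] - boxes_xyxy[i, 1])
        area_r = (boxes_xyxy[rest, 2] - boxes_xyxy[rest, 0]) * (boxes_xyxy[rest, 3] - boxes_xyxy[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]                 # drop boxes overlapping the kept one
    return keep
```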
When the 3D candidate regions are extracted, the point cloud feature map is reduced by convolution to the same number of channels as the image feature map; during model training, 1024 candidate regions are retained after non-maximum suppression, while during detection only the first 300 candidate regions are retained after non-maximum suppression.
Compared with the prior art, the invention has the following advantages:
1) by fusing the image obtained by monocular vision with the point cloud obtained by the laser radar, the invention addresses both the difficulty of effectively estimating the vehicle position from monocular vision and the missed detections of the laser radar caused by sparse long-range point clouds, further improving the three-dimensional vehicle detection result;
2) the invention proposes a technical route of extracting image features with a feature pyramid, extracting point cloud features with a VoxelNet network, extracting region-of-interest features based on a region proposal network, fusing features with a pre-fusion strategy and refining the detection result with non-maximum suppression, providing a new idea for three-dimensional vehicle target detection;
3) the invention adopts a two-stage target detection structure similar to Faster R-CNN, in which the region proposal network's screening of easy samples better suppresses the sample imbalance problem;
4) retaining different numbers of 3D candidate regions during training and detection, and reducing the dimensionality of the point cloud feature map, preserve the accuracy and reliability of the detection model while reducing the runtime, memory footprint and computation of detection, improving vehicle detection efficiency;
5) removing redundant overlapping 3D bounding boxes by non-maximum suppression eliminates overlapping detection results, so the detection results are more accurate, closer to the real scene and more practical.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is an overall flow chart of the present invention;
FIG. 3 is a schematic representation of feature fusion.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Examples
A vehicle detection method based on monocular vision and laser radar fusion comprises: extracting an image feature map with a feature pyramid network; obtaining a point cloud feature map with a VoxelNet network and extracting 3D candidate regions; extracting the point cloud features and image features of the regions of interest based on the candidate regions; fusing the image features and point cloud features with a pre-fusion strategy to obtain fusion features; estimating the target class and 3D bounding box from the fusion features; and removing redundant overlapping bounding boxes with non-maximum suppression. The overall flow of the method is shown in fig. 2 and specifically comprises the following steps:
Step 1: normalizing the input image: the mean values of the training-set images on the three R/G/B channels are first calculated, and these means are then subtracted from the pixel values of the image on the three R/G/B channels during both training and detection;
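For illustration, a minimal sketch of this per-channel mean subtraction, assuming the training set is available as a list of H × W × 3 RGB arrays (the helper names are illustrative):

```python
import numpy as np

def channel_means(train_images):
    """Average the per-image R/G/B means over the whole training set."""
    return np.mean([img.reshape(-1, 3).mean(axis=0) for img in train_images], axis=0)

def normalize_image(img_rgb, mean_rgb):
    """Subtract the training-set channel means from an H x W x 3 image."""
    return img_rgb.astype(np.float32) - mean_rgb.astype(np.float32)
```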
Step 2: for the image information, a feature pyramid is used as the feature extraction network; the image is input into the network to obtain an image convolution feature map of size 360 × 1200 × 32;
Step 3: for the laser radar point cloud, a VoxelNet network is first used to obtain a point cloud feature map of size 200 × 176 × 768;
Step 4: extracting 3D candidate regions with the region proposal network of the VoxelNet network, where the orientations of the output candidate regions are all 0° or 90°; the loss function used by the VoxelNet region proposal network is:

L_VoxelNet_RPN = α·(1/N_pos)·Σ_i L_cls(p_i^pos, 1) + β·(1/N_neg)·Σ_j L_cls(p_j^neg, 0) + (1/N_pos)·Σ_i L_reg(u_i, u_i*)

where p_i^pos and p_j^neg are the confidence-map outputs corresponding to the positive and negative sample anchor boxes, and N_pos and N_neg are the numbers of positive and negative sample anchor boxes. For a vehicle target, an anchor box is considered a positive sample when its intersection-over-union with any ground-truth bounding box is greater than 0.6, or when, among all anchor boxes, it has the largest intersection-over-union with some ground-truth bounding box; an anchor box is considered a negative sample when its intersection-over-union with every ground-truth bounding box is less than 0.45; anchor boxes whose intersection-over-union with all ground-truth bounding boxes lies between 0.45 and 0.6 are ignored. The classification loss L_cls is the cross-entropy loss; u_i = (u_ix, u_iy, u_iz, u_il, u_iw, u_ih, u_iθ) is the vector of normalized differences between a predicted bounding box and the corresponding positive-sample anchor box, while u_i* is the vector of differences between the corresponding ground-truth bounding box and that positive-sample anchor box. The regression loss L_reg is the Smooth L1 function, and the hyper-parameters α and β balance the classification losses of the positive and negative samples, set here to α = 1.5 and β = 1.
In the training process, the candidate regions output by the VoxelNet are subjected to non-maximum suppression (the 2D IoU threshold value is 0.8), 1024 candidate regions are reserved and input into a subsequent detection network. In the detection process, in order to reduce the operation time, the non-maximum suppression reserves the first 300 candidate regions. Because the number of channels of the image feature map is different from that of the point cloud feature map, in order to facilitate subsequent fusion and reduce memory occupation and calculation amount during inference, the dimension of the point cloud feature map is reduced to 200 × 176 × 32 by using 1 × 1 convolution, so that the number of channels is equal to that of the image feature map.
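As an illustration of this channel reduction (variable names are illustrative), a 1 × 1 convolution in PyTorch maps the 768-channel point cloud feature map to 32 channels so it matches the image feature map; PyTorch uses NCHW layout:

```python
import torch
import torch.nn as nn

reduce_dim = nn.Conv2d(in_channels=768, out_channels=32, kernel_size=1)  # 1x1 convolution

point_cloud_fm = torch.randn(1, 768, 200, 176)   # dummy 200x176x768 point cloud feature map
reduced_fm = reduce_dim(point_cloud_fm)          # shape: (1, 32, 200, 176)
```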
Step 5: projecting the 3D candidate regions obtained by the VoxelNet network onto the image feature map and the point cloud feature map respectively to obtain the regions of interest (RoI).
The method for projecting onto the image feature map uses the projection formula from point cloud coordinates to image coordinates:

[u, v, 1]^T ∝ P_rect^(2) · R_rect^(0) · T_velo^cam · [x, y, z, 1]^T

where R_rect^(0) is the correction rotation matrix from the reference camera 0 (left grayscale camera) to the image plane of camera 2 (left color camera), of size 4 × 4; P_rect^(2) is the corrective projection matrix of camera 2; and T_velo^cam is the rotation and translation matrix from the laser radar coordinate system to the camera coordinate system.
Because the image feature map obtained by the feature pyramid network has the same size as the original image, the image region can be mapped directly onto the feature map to obtain the image region of interest.
the method for projecting the point cloud characteristic diagram comprises the following steps: firstly, projecting the 3D candidate area on a bird's-eye view, wherein the convolution intermediate layer of the VoxelNet network mainly aggregates the features in the height direction, the space structure in the top view direction is still reserved in the subsequent convolution operation, and the corresponding coordinates on the point cloud feature map can be obtained in proportion through the bird's-eye view coordinates.
Using the regions of interest (RoI), region features can be cropped from the image feature map and the point cloud feature map respectively. Because the two region features may differ in size and are difficult to fuse directly, the cropped region features are each resized to 7 × 7 with bilinear interpolation, finally yielding the equal-length point cloud feature vector f_lidar and image feature vector f_RGB.
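The crop-and-resize step could, for example, be realized with torchvision's roi_align, which performs bilinear interpolation to a fixed 7 × 7 grid; the sketch below is illustrative, with the feature map and RoI values chosen only to match the sizes given in this embodiment.

```python
import torch
from torchvision.ops import roi_align

feature_map = torch.randn(1, 32, 360, 1200)               # image feature map, NCHW layout
rois = torch.tensor([[0, 100.0, 150.0, 220.0, 260.0]])    # (batch_index, x1, y1, x2, y2)
f_rgb = roi_align(feature_map, rois, output_size=(7, 7))  # -> (1, 32, 7, 7), bilinear sampling
f_rgb = f_rgb.flatten(1)                                  # fixed-length feature vector
```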
Step 6: fusing the point cloud feature vector f_lidar and the image feature vector f_RGB using the pre-fusion strategy, as shown in fig. 3.
The pre-fusion strategy is as follows: assuming the fusion network has L layers, pre-fusion combines f_lidar and f_RGB at the input stage:

f_L = H_L(H_(L-1)(… H_1(f_lidar ⊕ f_RGB) …))

where f_L is the fused feature output after fusing the point cloud feature vector f_lidar and the image feature vector f_RGB, {H_l, l = 1, …, L} are the feature transformation functions, in this embodiment fully connected layers; ⊕ denotes the fusion operation (such as concatenation or summation), for which element-wise averaging is used in this embodiment.
Step 7: inputting the fusion feature f_L into three fully connected networks, which regress the bounding box coordinates and size, the category and the direction vector respectively. The multi-task loss function of the detection network is:

L_DET = (1/N_cls)·Σ_k L_cls(q_k, q_k*) + λ·(1/N_pos)·Σ_j [L_reg(u_j, u_j*) + L_ang(v_j, v_j*)]

where k is the index, within the mini-batch, of a candidate region input to the detection network, and q_k is the predicted probability, output by the bounding-box classification-regression branch, that bounding box k is a vehicle. q_k* is the ground-truth label: a candidate region k is considered a positive sample with q_k* = 1 when its intersection-over-union with any ground-truth bounding box is greater than 0.65, and a negative sample with q_k* = 0 otherwise. The classification loss L_cls is again the cross-entropy loss. u_j and u_j* denote, respectively, the difference between a predicted bounding box and the corresponding positive-sample candidate bounding box and the difference between the corresponding ground-truth bounding box and that positive-sample candidate bounding box. L_reg and L_ang both use the Smooth L1 function, N_pos is the number of positive-sample anchor boxes, v_j is the direction-vector difference between a predicted bounding box and the corresponding positive-sample candidate bounding box, v_j* is the direction-vector difference between the corresponding ground-truth bounding box and that positive-sample candidate bounding box, λ is a hyper-parameter balancing the classification loss and the regression loss, and N_cls, the sum of the numbers of positive and negative sample anchor boxes in the mini-batch, normalizes the classification loss.
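Purely as an illustration of this multi-task loss (the function signature and reduction scheme are assumptions, not the patent's implementation), a PyTorch sketch combining cross-entropy for the class scores with Smooth L1 terms for the box residuals and direction vectors of the positive candidates:

```python
import torch.nn.functional as F

def detection_loss(cls_logits, cls_targets,     # scores and 0/1 labels for all candidates
                   box_pred, box_target,        # residuals for positive candidates only
                   dir_pred, dir_target, lam=1.0):
    n_cls = cls_logits.shape[0]                 # positives + negatives (normalises L_cls)
    n_pos = max(box_pred.shape[0], 1)           # number of positive candidates
    l_cls = F.cross_entropy(cls_logits, cls_targets, reduction='sum') / n_cls
    l_reg = F.smooth_l1_loss(box_pred, box_target, reduction='sum') / n_pos
    l_ang = F.smooth_l1_loss(dir_pred, dir_target, reduction='sum') / n_pos
    return l_cls + lam * (l_reg + l_ang)
```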
Step 8: the final overall loss function L_FINAL is the sum of the loss functions of the VoxelNet region proposal network and the detection network; this objective function is optimized to minimize its value, carrying out end-to-end learning and completing model training:

L_FINAL = L_VoxelNet_RPN + L_DET
this process is the inverse algorithm of maximum likelihood estimation.
Step 9: multiple candidate regions may regress to the same or heavily overlapping bounding-box regions in the top view, which is generally impossible in a real road scene. To avoid this, excessively overlapping 3D bounding boxes are removed by 2D non-maximum suppression in the top view with 0.01 as the intersection-over-union threshold, eliminating duplicate detections and yielding the final vehicle detection result.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and those skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A vehicle detection method based on monocular vision and laser radar fusion is characterized by comprising the following steps:
S1: acquiring an image feature map;
S2: acquiring a point cloud feature map;
S3: extracting a point cloud feature vector f_lidar and an image feature vector f_RGB from the point cloud feature map and the image feature map, respectively;
S4: performing feature fusion on the point cloud feature vector f_lidar and the image feature vector f_RGB to obtain a fusion feature f_L;
S5: obtaining a 3D bounding box of the vehicle and the corresponding parameters according to the fusion feature f_L;
S6: removing overlapping 3D bounding boxes to obtain the final 3D bounding box and corresponding parameters, completing vehicle detection.
2. The method for detecting a vehicle based on the fusion of monocular vision and lidar according to claim 1, wherein the step S3 specifically comprises:
S301: extracting 3D candidate regions from the point cloud feature map;
S302: projecting each 3D candidate region into the image feature map and the point cloud feature map respectively to obtain regions of interest (RoI);
S303: using the regions of interest to crop image region features and point cloud region features from the image feature map and the point cloud feature map, respectively;
S304: scaling the image region features and the point cloud region features to the same set size to obtain the equal-length point cloud feature vector f_lidar and image feature vector f_RGB.
3. The method according to claim 2, wherein the projecting of the 3D candidate region into the image feature map specifically comprises: projecting the 3D candidate region into the image feature map using the projection formula from point cloud coordinates to image coordinates; a point (x, y, z) in the laser radar coordinate system is projected onto the image plane to obtain the image coordinate (u, v), and the projection formula from point cloud coordinates to image coordinates is:

[u, v, 1]^T ∝ P_rect^(2) · R_rect^(0) · T_velo^cam · [x, y, z, 1]^T

where R_rect^(0) is the correction rotation matrix from the left grayscale camera to the left color camera image plane, P_rect^(2) is the corrective projection matrix of the left color camera, and T_velo^cam is the rotation and translation matrix from the laser radar coordinate system to the camera coordinate system;
the projecting of the 3D candidate region into the point cloud feature map specifically comprises: first projecting the 3D candidate region onto a bird's-eye view, and then obtaining the corresponding coordinates on the point cloud feature map from the bird's-eye-view coordinates by proportional scaling.
4. The method for vehicle detection based on monocular vision and lidar fusion as claimed in claim 2, wherein the feature fusion employs pre-fusion, specifically comprising: fusing the point cloud feature vector f_lidar and the image feature vector f_RGB at the input stage.
5. The method for vehicle detection based on the fusion of monocular vision and lidar according to claim 4, wherein the formula of the feature fusion is:

f_L = H_L(H_(L-1)(… H_1(f_lidar ⊕ f_RGB) …))

where f_L is the fused output, {H_l, l = 1, …, L} are feature transformation functions, and ⊕ is the fusion operation, the fusion operation comprising concatenation, summation or element-wise averaging.
6. The method for detecting a vehicle based on the fusion of monocular vision and lidar according to claim 2, wherein the step S5 specifically comprises:
S501: inputting the fusion feature f_L into a detection network;
S502: obtaining the 3D bounding box of the vehicle, and performing regression separately on the class, the coordinates and size, and the direction vector of the 3D bounding box.
7. The vehicle detection method based on the fusion of monocular vision and laser radar as claimed in claim 6, wherein the point cloud feature map is obtained through a VoxelNet network, the 3D candidate regions are extracted by the region proposal network of the VoxelNet network, and the detection network consists of three fully connected networks.
8. The method of claim 7, wherein, when training the model, end-to-end training is performed by minimizing a loss function L_FINAL, the expression of the loss function L_FINAL being:

L_FINAL = L_VoxelNet_RPN + L_DET

L_VoxelNet_RPN = α·(1/N_pos)·Σ_i L_cls(p_i^pos, 1) + β·(1/N_neg)·Σ_j L_cls(p_j^neg, 0) + (1/N_pos)·Σ_i L_reg(u_i, u_i*)

L_DET = (1/N_cls)·Σ_k L_cls(q_k, q_k*) + λ·(1/N_pos)·Σ_j [L_reg(u_j, u_j*) + L_ang(v_j, v_j*)]

wherein L_VoxelNet_RPN is the loss function of the VoxelNet region proposal network, L_DET is the multi-task loss function of the detection network, p_i^pos and p_j^neg are the confidence-map outputs corresponding to the positive and negative sample anchor boxes respectively, N_pos and N_neg are the numbers of positive and negative sample anchor boxes, L_cls is the classification loss function, u_i = (u_ix, u_iy, u_iz, u_il, u_iw, u_ih, u_iθ) is the vector of normalized differences between a predicted bounding box and the corresponding positive-sample anchor box, u_i* is the vector of differences between the corresponding ground-truth bounding box and the positive-sample anchor box, L_reg is the regression loss function, α and β are hyper-parameters, k is the index of a candidate region input to the detection network, q_k is the predicted probability, output by the 3D bounding-box classification-regression branch, that bounding box k is a vehicle, q_k* is the ground-truth label, L_ang and L_reg are Smooth L1 functions, N_pos is the number of positive-sample anchor boxes, v_j is the direction-vector difference between a predicted bounding box and the corresponding positive-sample candidate bounding box, v_j* is the direction-vector difference between the corresponding ground-truth bounding box and the positive-sample candidate bounding box, λ is a hyper-parameter balancing the classification loss function and the regression loss function, and N_cls is the sum of the numbers of positive and negative sample anchor boxes.
9. The method for vehicle detection based on monocular vision and laser radar fusion of claim 1, wherein the removing of the overlapping 3D bounding boxes specifically comprises: using 0.01 as the intersection-over-union threshold, removing overlapping 3D bounding boxes by 2D non-maximum suppression in the top view.
10. The method for vehicle detection based on monocular vision and laser radar fusion of claim 1, wherein the point cloud feature map is reduced in dimension by convolution to the same number of channels as the image feature map when extracting the 3D candidate regions; during model training, 1024 candidate regions are retained through non-maximum suppression, and during detection the first 300 candidate regions are retained through non-maximum suppression.
CN202010124991.XA 2020-02-27 2020-02-27 Vehicle detection method based on monocular vision and laser radar fusion Pending CN111291714A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010124991.XA CN111291714A (en) 2020-02-27 2020-02-27 Vehicle detection method based on monocular vision and laser radar fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010124991.XA CN111291714A (en) 2020-02-27 2020-02-27 Vehicle detection method based on monocular vision and laser radar fusion

Publications (1)

Publication Number Publication Date
CN111291714A true CN111291714A (en) 2020-06-16

Family

ID=71029510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010124991.XA Pending CN111291714A (en) 2020-02-27 2020-02-27 Vehicle detection method based on monocular vision and laser radar fusion

Country Status (1)

Country Link
CN (1) CN111291714A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886215A (en) * 2019-02-26 2019-06-14 常熟理工学院 The cruise of low speed garden unmanned vehicle and emergency braking system based on machine vision
CN109948661A (en) * 2019-02-27 2019-06-28 江苏大学 A kind of 3D vehicle checking method based on Multi-sensor Fusion
CN110738121A (en) * 2019-09-17 2020-01-31 北京科技大学 front vehicle detection method and detection system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YIN ZHOU ET AL: "VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection", arXiv:1711.06396v1 [cs.CV] *
王也 (Wang Ye): "Research on Vehicle Recognition and State Estimation Based on Deep Learning and Virtual Data", China Excellent Doctoral and Master's Dissertations Full-text Database (Doctoral), Engineering Science and Technology II *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111722245B (en) * 2020-06-22 2023-03-10 阿波罗智能技术(北京)有限公司 Positioning method, positioning device and electronic equipment
US11713970B2 (en) 2020-06-22 2023-08-01 Beijing Baidu Netcom Science Technology Co., Ltd. Positioning method, electronic device and computer readable storage medium
CN111722245A (en) * 2020-06-22 2020-09-29 北京百度网讯科技有限公司 Positioning method, positioning device and electronic equipment
EP3842749A2 (en) * 2020-06-22 2021-06-30 Beijing Baidu Netcom Science Technology Co., Ltd. Positioning method, positioning device and electronic device
CN112183578A (en) * 2020-09-01 2021-01-05 国网宁夏电力有限公司检修公司 Target detection method, medium and system
CN112183578B (en) * 2020-09-01 2023-05-23 国网宁夏电力有限公司检修公司 Target detection method, medium and system
CN113762001A (en) * 2020-10-10 2021-12-07 北京京东乾石科技有限公司 Target detection method and device, electronic equipment and storage medium
CN113762001B (en) * 2020-10-10 2024-04-19 北京京东乾石科技有限公司 Target detection method and device, electronic equipment and storage medium
CN114638996A (en) * 2020-12-01 2022-06-17 广州视源电子科技股份有限公司 Model training method, device, equipment and storage medium based on counterstudy
CN112200851A (en) * 2020-12-09 2021-01-08 北京云测信息技术有限公司 Point cloud-based target detection method and device and electronic equipment thereof
CN112200851B (en) * 2020-12-09 2021-02-26 北京云测信息技术有限公司 Point cloud-based target detection method and device and electronic equipment thereof
CN112712129B (en) * 2021-01-11 2024-04-19 深圳力维智联技术有限公司 Multi-sensor fusion method, device, equipment and storage medium
CN112712129A (en) * 2021-01-11 2021-04-27 深圳力维智联技术有限公司 Multi-sensor fusion method, device, equipment and storage medium
CN113066124A (en) * 2021-02-26 2021-07-02 华为技术有限公司 Neural network training method and related equipment
CN112990229A (en) * 2021-03-11 2021-06-18 上海交通大学 Multi-modal 3D target detection method, system, terminal and medium
CN112990050B (en) * 2021-03-26 2021-10-08 清华大学 Monocular 3D target detection method based on lightweight characteristic pyramid structure
CN112990050A (en) * 2021-03-26 2021-06-18 清华大学 Monocular 3D target detection method based on lightweight characteristic pyramid structure
US11532151B2 (en) 2021-05-10 2022-12-20 Tsinghua University Vision-LiDAR fusion method and system based on deep canonical correlation analysis
CN113111974A (en) * 2021-05-10 2021-07-13 清华大学 Vision-laser radar fusion method and system based on depth canonical correlation analysis
CN113111974B (en) * 2021-05-10 2021-12-14 清华大学 Vision-laser radar fusion method and system based on depth canonical correlation analysis
CN113468950A (en) * 2021-05-12 2021-10-01 东风汽车股份有限公司 Multi-target tracking method based on deep learning in unmanned driving scene
CN113724335B (en) * 2021-08-01 2023-12-19 国网江苏省电力有限公司徐州供电分公司 Three-dimensional target positioning method and system based on monocular camera
CN113724335A (en) * 2021-08-01 2021-11-30 国网江苏省电力有限公司徐州供电分公司 Monocular camera-based three-dimensional target positioning method and system
CN113674421B (en) * 2021-08-25 2023-10-13 北京百度网讯科技有限公司 3D target detection method, model training method, related device and electronic equipment
CN113674421A (en) * 2021-08-25 2021-11-19 北京百度网讯科技有限公司 3D target detection method, model training method, related device and electronic equipment
CN113822910A (en) * 2021-09-30 2021-12-21 上海商汤临港智能科技有限公司 Multi-target tracking method and device, electronic equipment and storage medium
CN114118125A (en) * 2021-10-08 2022-03-01 南京信息工程大学 Multi-modal input and space division three-dimensional target detection method
CN114359891A (en) * 2021-12-08 2022-04-15 华南理工大学 Three-dimensional vehicle detection method, system, device and medium
CN114359891B (en) * 2021-12-08 2024-05-28 华南理工大学 Three-dimensional vehicle detection method, system, device and medium
WO2024139375A1 (en) * 2022-12-30 2024-07-04 华为技术有限公司 Data processing method and computer device

Similar Documents

Publication Publication Date Title
CN111291714A (en) Vehicle detection method based on monocular vision and laser radar fusion
CN111862126B (en) Non-cooperative target relative pose estimation method combining deep learning and geometric algorithm
CN110119148B (en) Six-degree-of-freedom attitude estimation method and device and computer readable storage medium
WO2020062433A1 (en) Neural network model training method and method for detecting universal grounding wire
CN113378686B (en) Two-stage remote sensing target detection method based on target center point estimation
CN113076871B (en) Fish shoal automatic detection method based on target shielding compensation
CN112884064A (en) Target detection and identification method based on neural network
WO2023019875A1 (en) Vehicle loss detection method and apparatus, and electronic device and storage medium
CN112084869B (en) Compact quadrilateral representation-based building target detection method
CN112950645B (en) Image semantic segmentation method based on multitask deep learning
CN111797688A (en) Visual SLAM method based on optical flow and semantic segmentation
CN111208818B (en) Intelligent vehicle prediction control method based on visual space-time characteristics
CN112308921B (en) Combined optimization dynamic SLAM method based on semantics and geometry
CN114937083B (en) Laser SLAM system and method applied to dynamic environment
CN112767478B (en) Appearance guidance-based six-degree-of-freedom pose estimation method
Fang et al. Sewer defect instance segmentation, localization, and 3D reconstruction for sewer floating capsule robots
WO2021175434A1 (en) System and method for predicting a map from an image
CN112949635B (en) Target detection method based on feature enhancement and IoU perception
CN115482518A (en) Extensible multitask visual perception method for traffic scene
CN117593548A (en) Visual SLAM method for removing dynamic feature points based on weighted attention mechanism
CN113160117A (en) Three-dimensional point cloud target detection method under automatic driving scene
Zhang et al. Front vehicle detection based on multi-sensor fusion for autonomous vehicle
CN117671647B (en) Multitasking road scene perception method
CN114663880A (en) Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism
CN113012191B (en) Laser mileage calculation method based on point cloud multi-view projection graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200616