CN115032651B - Target detection method based on laser radar and machine vision fusion - Google Patents

Target detection method based on laser radar and machine vision fusion

Info

Publication number
CN115032651B
Authority
CN
China
Prior art keywords
frame
target
information
camera
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210630026.9A
Other languages
Chinese (zh)
Other versions
CN115032651A (en)
Inventor
张炳力
王怿昕
姜俊昭
徐雨强
王欣雨
王焱辉
杨程磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202210630026.9A priority Critical patent/CN115032651B/en
Publication of CN115032651A publication Critical patent/CN115032651A/en
Application granted granted Critical
Publication of CN115032651B publication Critical patent/CN115032651B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88Lidar systems specially adapted for specific applications
    • G01S17/93Lidar systems specially adapted for specific applications for anti-collision purposes
    • G01S17/931Lidar systems specially adapted for specific applications for anti-collision purposes of land vehicles
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/66Tracking systems using electromagnetic waves other than radio waves
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/86Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/48Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S17/00
    • G01S7/4802Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S17/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • G06T2207/10044Radar image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Electromagnetism (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Optical Radar Systems And Details Thereof (AREA)

Abstract

The invention discloses a target detection method based on the fusion of laser radar and machine vision, which comprises the following steps: 1. mounting a laser radar and a camera at the corresponding positions on the vehicle; 2. processing the point cloud acquired by the laser radar to output radar detection frames; 3. processing the images acquired by the camera to output visual detection frames; 4. performing space-time synchronization on the information processed by the laser radar and the camera; 5. performing data association on the synchronized information to obtain association pairs; 6. fusing the obtained association pairs, tracking the fused targets, and outputting the final fusion result by integrating target information over consecutive frames. The invention avoids the large numbers of false detections and missed detections that arise during data association and fusion in target detection based on multi-sensor fusion, thereby ensuring accurate assessment of the perceived environment and accurate execution of planning and control.

Description

Target detection method based on laser radar and machine vision fusion
Technical Field
The invention relates to the technical field of environment sensing based on multi-sensor fusion, in particular to a target detection method based on laser radar and machine vision fusion.
Background
Perception is the most fundamental and also the most important part of autonomous driving technology: the accuracy and timeliness with which the targets around the vehicle are understood directly determine the overall capability of the autonomous driving system. Because each sensor is limited by its own working principle when executing perception tasks, a single sensor cannot acquire accurate and comprehensive obstacle information, which makes research on multi-sensor fusion technology necessary.
Commonly used data fusion methods can be divided into pre-fusion and post-fusion. Pre-fusion comprises data-level fusion and feature-level fusion, while post-fusion is mainly decision-level fusion.
If pre-fusion is selected, both data-level fusion and feature-level fusion depend on a deep learning framework, which makes the network architecture more complex and raises the requirements on the GPU. For post-fusion, a decision-level fusion method needs a comprehensive fusion strategy to handle target recognition in various scenes; most methods form the region of interest from the vision result, so unusual obstacles are easily missed, and the fused targets are not further processed to resolve the problems of missed and false detections.
Specifically, Park et al. use dense stereo disparities and point clouds in a two-stage convolutional neural network to generate a high-resolution dense disparity map: fused disparities are produced from the lidar and stereo disparities, combined with the image in feature space, and used to predict the final high-resolution disparity and reconstruct the 3D scene; the approach is limited by the need for large-scale labeled stereo-lidar data sets. Liang et al. realize point-wise fusion through a continuous convolution fusion layer that connects image features of different scales with point cloud features at multiple stages of the network: K nearest neighbors are first extracted for each pixel, the points are projected onto the image to look up the relevant image features, and the fused feature vectors are weighted by their geometric offset from the target pixel before being fed into the neural network. However, when the radar resolution is low or the distance is long, such point-wise fusion cannot make full use of the high-resolution image.
Disclosure of Invention
To address the problems of existing methods, the invention provides a target detection method based on the fusion of laser radar and machine vision, which fuses multi-sensor information during target detection, thereby ensuring accurate assessment of the perceived environment and accurate execution of planning and control.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
the invention discloses a target detection method based on fusion of a laser radar and machine vision, which is characterized by comprising the following steps of:
A. A solid-state laser radar is arranged at the front bumper of the vehicle and a camera at the front windshield of the vehicle; the advancing direction of the vehicle is taken as the Z axis, the direction pointing to the driver's left as the X axis, and the vertically upward direction as the Y axis; the laser emission center of the laser radar is taken as the origin O_l to establish the laser radar coordinate system O_l-X_lY_lZ_l, and the optical center of the camera is taken as the origin O_c to establish the camera coordinate system O_c-X_cY_cZ_c; the O-XZ planes of the two coordinate systems are kept parallel to the ground;
B. Processing each frame of point cloud information acquired by the laser radar comprises: first, performing ground point cloud segmentation on the point cloud by a multi-plane fitting method, extracting road edge points from the segmentation result, and sequentially performing curve fitting, filtering and down-sampling on the extracted road edge points to obtain the region of interest of each frame; then clustering the point cloud within the region of interest to obtain the targets of each frame, and identifying each clustered target with a three-dimensional detection frame. The q-th clustered target of the p-th frame is marked with the q-th three-dimensional detection frame B_p^q = (x_p^q, y_p^q, z_p^q, w_p^q, l_p^q, h_p^q), where x_p^q, y_p^q and z_p^q denote the x-, y- and z-axis coordinates of the center point of the q-th three-dimensional detection frame in the p-th frame, and w_p^q, l_p^q and h_p^q denote its width, length and height; the face of the three-dimensional detection frame closest to the laser radar is selected as a two-dimensional detection frame to characterize the q-th clustered target of the p-th frame, thereby obtaining a point cloud data set with detection frames;
C. constructing a yolov5 model by adopting a convolution attention module, training the yolov5 model by utilizing a road vehicle image data set to obtain a trained yolov5 model, processing each frame of image information acquired by the camera by utilizing the trained yolov5 model, and outputting a detection frame of each target in each frame of image information and coordinate, size, category and confidence information thereof, thereby obtaining an image information set with the detection frame;
D. Performing space-time synchronization on the point cloud information set and the image information set, including: the laser radar signal is taken as the reference of the registration frequency, and the timestamps of the laser radar and the camera are aligned by interpolation, so that the point cloud information set of the laser radar and the image information set of the camera at the same moment are obtained; the camera is then calibrated to obtain its intrinsic parameters, and the camera and the laser radar are jointly calibrated to obtain the extrinsic parameters, so that the two-dimensional detection frame in the laser radar coordinate system is projected into the pixel coordinate system, obtaining the projected two-dimensional detection frame b^q = (x_b^q, y_b^q, w_b^q, h_b^q), where x_b^q and y_b^q denote the x- and y-axis coordinates of the center point of the q-th projected two-dimensional detection frame, and w_b^q and h_b^q denote its width and height;
E. carrying out data association on the information after time-space synchronization to obtain an association pair:
E1. Set the association threshold to r_th; define a variable i to denote the frame number of the laser radar after time synchronization with the camera, a variable j to denote the index of the current target contained in the laser radar point cloud observation of the i-th frame, and a variable k to denote the index of the current target contained in the camera image observation of the i-th frame; initialize i = 1;
E2. Initialize j = 1; take the coordinate and size information of the j-th projected two-dimensional detection frame in the point cloud data set of the i-th laser radar frame as the j-th radar target observation L_i^j of the i-th frame, and take the three-dimensional detection frame corresponding to L_i^j as the j-th clustered three-dimensional detection frame B_i^j of the i-th frame;
E3. Initialize k = 1; take the coordinates, size, category and confidence information of the k-th detection frame in the image information set of the i-th camera frame as the k-th camera target observation C_i^k of the i-th frame, where x_i^k and y_i^k denote the x- and y-axis coordinates of the center point of the k-th detection frame, w_i^k and h_i^k its width and height, cls_i^k the category of the detected target, and conf_i^k its confidence;
E4. Calculate the Euclidean distance d_jk between the j-th laser radar target observation L_i^j and the k-th camera target observation C_i^k of the i-th frame;
E5. Judge whether d_jk ≤ r_th; if so, the detection target of the laser radar is successfully matched with the detection target of the camera, and the j-th radar target observation L_i^j and the k-th camera target observation C_i^k of the i-th frame form an association pair; otherwise the matching fails;
E6. Assign k+1 to k and return to step E3 until all camera target observations of the i-th frame have been traversed; then assign j+1 to j and return to step E2 until all targets of the i-th frame have been traversed;
E7. Calculate the intersection-over-union IOU_jk of the association pair formed by the j-th radar target observation L_i^j and the k-th camera target observation C_i^k of the i-th frame, and compare it with the set threshold IOU_th; if IOU_jk ≥ IOU_th, the corresponding association pair of the i-th frame is correct and is output, otherwise the association pair is discarded; return to E7 to calculate the next association pair of the i-th frame until all correct association pairs of the i-th frame have been output;
F. Data fusion is performed on all correct association pairs of the i-th frame to obtain the fused target detection information of the i-th frame, including: if the m-th radar target observation L_i^m and the n-th camera target observation C_i^n of the i-th frame form an association pair, the x-axis coordinate x_i^m, y-axis coordinate y_i^m, z-axis coordinate z_i^m, length l_i^m and width w_i^m of the three-dimensional detection frame corresponding to the m-th radar target observation of the i-th frame, together with the category cls_i^n and confidence conf_i^n of the n-th camera target observation, are taken directly as the fused partial target detection information of the association pair; the n-th camera target observation frame is then converted into the radar coordinate system using the camera intrinsic and extrinsic parameters obtained in step D, so that the projection h_i^n of the n-th camera target observation height in the radar coordinate system is obtained and used as the fused target detection height compensation information of the association pair; the fused partial target detection information and the target detection height compensation information together constitute the fused target detection information;
G. tracking each target in the target detection information fused in the ith frame and outputting a target detection result.
2. The target detection method based on laser radar and machine vision fusion according to claim 1, wherein in E5, if the Euclidean distance d_jk between the j-th radar target observation L_i^j of the i-th frame and every camera target observation C_i^k of the i-th frame is greater than r_th, the j-th radar target observation L_i^j of the i-th frame is output and tracked;
if the corresponding radar target observation L_{i+1}^j is detected in the (i+1)-th frame and the Euclidean distance between L_{i+1}^j and the k-th camera target observation C_{i+1}^k of the (i+1)-th frame satisfies d_jk ≤ r_th, the target of the j-th radar target observation L_i^j is considered to have been successfully detected.
Compared with the prior art, the invention has the beneficial effects that:
1. Aiming at the large numbers of false detections and missed detections that occur during data association and fusion in target detection based on multi-sensor fusion, the method takes an accurate fusion of laser radar and image information as its goal. Multi-target point cloud data are first collected with the laser radar, and radar detection frames are generated after ground point cloud segmentation, region-of-interest extraction and clustering; machine vision detection frames are then generated with a yolov5 algorithm improved by a convolutional attention module, and the laser radar and machine vision detections are associated through a reasonably chosen threshold to obtain association pairs. Compared with the NN (nearest-neighbor) algorithm, whose weak anti-interference capability makes association errors likely, the method computes the intersection-over-union (IOU) of each association pair and checks it against a threshold: if the threshold is met the pair is output, otherwise a sub-optimal association is selected and the IOU is recomputed until the threshold is met, yielding accurate association pairs. Using the IOU effectively avoids the erroneous associations produced by the NN algorithm during data association, which improves the accuracy of target detection based on multi-sensor fusion and ensures accurate execution of planning and control.
2. The invention provides a decision method for the case in which the laser radar and machine vision detections cannot be matched during data fusion, which further screens targets that fail to match during data association and thereby reduces the probability of missed targets in the fusion process.
3. The invention provides a target fusion method based on laser radar and machine vision. Information that only a single sensor can output is added directly to the fused target; the position and width of the object are obtained directly from the laser radar, and the height is dynamically compensated by converting the pixel frame into the radar coordinate system, with the depth provided by the laser radar serving as the basis for projecting the detection frame from the pixel coordinate system into the camera coordinate system. Compared with the method of M. Liang et al., the method compensates the laser radar's height information with image information, solving the problem that point-wise fusion cannot make full use of the high-resolution image.
Drawings
FIG. 1 is an overall flow chart of a target detection method based on laser radar and machine vision fusion in accordance with the present invention;
FIG. 2a is a view of a laser radar detection scene of the present invention;
FIG. 2b is a diagram showing the detection effect of the laser radar according to the present invention;
FIG. 3 is a diagram showing the machine vision inspection effect of the present invention;
FIG. 4 is a time synchronization schematic diagram of the present invention;
FIG. 5 is a graph showing the combined calibration effect of the laser radar and the camera of the invention;
FIG. 6 is a diagram of the possible association situations when targets remain unmatched in the present invention;
FIG. 7 is a diagram of the decision method in a scene where target matching is unsuccessful in the present invention;
FIG. 8 is a diagram of a data fusion method according to the present invention.
Detailed Description
In this embodiment, a target detection method based on fusion of laser radar and machine vision, as shown in fig. 1, includes the following steps:
A. A solid-state laser radar is arranged at the front bumper of the vehicle and a camera at the front windshield of the vehicle; the advancing direction of the vehicle is taken as the Z axis, the direction pointing to the driver's left as the X axis, and the vertically upward direction as the Y axis; the laser emission center of the laser radar is taken as the origin O_l to establish the laser radar coordinate system O_l-X_lY_lZ_l, and the optical center of the camera is taken as the origin O_c to establish the camera coordinate system O_c-X_cY_cZ_c; the O-XZ planes of the two coordinate systems are kept parallel to the ground;
B. processing the point cloud information acquired by the laser radar, including:
B1. Ground point cloud segmentation is performed on the point cloud by a multi-plane fitting method: each frame of laser point cloud is divided into several regions along the driving direction of the vehicle, and the average value RPA (region point average) of the lowest points in each region is calculated to suppress the influence of noise points. A height threshold h_th is set, and the points whose height is within h_th of the RPA are taken as the seed point set. A plane is then fitted to the seed points using a simple linear plane model, as shown in formula (1):
Ax+By+Cz+D=0 (1)
in the formula (1), (A, B, C) is a normal vector of the plane, and D is a distance required for translating the plane to the origin of coordinates;
This yields an initial plane model. A distance threshold D_th = 0.2 m is set, and the distance d between each point in the region and the plane is calculated from the point-to-plane distance formula of solid geometry, as shown in formula (2):
d = |Ax + By + Cz + D| / √(A² + B² + C²) (2)
In formula (2), x, y and z are the three-dimensional coordinates of the point. If d < D_th, the point is added to the ground point set; otherwise it is regarded as a non-ground point. The obtained ground points are used as the initial set for the next iteration, and the segmentation of the ground point cloud is completed after 3 optimization iterations;
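For illustration, a minimal Python sketch of the iterative plane fitting in B1 is given below; the least-squares plane fit, the per-frame (rather than per-region) seeding, and every parameter value except D_th = 0.2 m and the 3 iterations are assumptions not fixed by the text.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane Ax + By + Cz + D = 0 through an (N, 3) point set."""
    centroid = points.mean(axis=0)
    _, _, vh = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vh[-1]                              # (A, B, C): direction of least variance
    d = -normal.dot(centroid)                    # D
    return normal, d

def segment_ground(points, h_th=0.3, d_th=0.2, n_iter=3, n_lowest=50):
    """Return a boolean mask marking ground points in one lidar frame (N, 3)."""
    height = points[:, 1]                        # Y is vertical: the O-XZ plane is parallel to the ground
    rpa = np.sort(height)[:n_lowest].mean()      # average of the lowest points (RPA)
    seeds = points[height < rpa + h_th]          # seed set within h_th of the RPA

    ground_mask = np.zeros(len(points), dtype=bool)
    for _ in range(n_iter):                      # 3 optimization iterations, as in B1
        normal, d = fit_plane(seeds)
        # point-to-plane distance, formula (2)
        dist = np.abs(points @ normal + d) / np.linalg.norm(normal)
        ground_mask = dist < d_th                # D_th = 0.2 m
        seeds = points[ground_mask]              # ground points seed the next iteration
    return ground_mask
```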
B2. Road edge points are extracted from the segmentation result and sequentially subjected to curve fitting, filtering and down-sampling to obtain the region of interest of each frame. The region of interest is extracted because, among all invalid target information, the invalid point cloud targets that occupy the largest share and most affect target detection are pedestrians on the sidewalk and trees and buildings on both sides of the road in the y-axis direction; on a structured urban road, the road edge separates the drivable region from the non-drivable region, and the dense point cloud of the laser radar is well suited to identifying the road edge so as to obtain the region of interest (ROI). Road edge candidate points are then extracted by exploiting the abrupt change between two adjacent points on the same scan line at the road edge, and are classified into left and right road edges according to the sign of their y coordinates: a point with a positive value is added to the left road edge points, and a point with a negative value to the right road edge points. Finally, the left and right road edges are fitted with the linear model of RANSAC using the extracted road edge points, completing the extraction of the region of interest;
B3. The point cloud within the region of interest is clustered to obtain the targets of each frame, and each clustered target is identified with a three-dimensional detection frame. The q-th clustered target of the p-th frame is marked with the q-th three-dimensional detection frame B_p^q = (x_p^q, y_p^q, z_p^q, w_p^q, l_p^q, h_p^q), where x_p^q, y_p^q and z_p^q are the coordinates of the center point of the q-th three-dimensional detection frame in the p-th frame and w_p^q, l_p^q and h_p^q are its width, length and height; the face of the three-dimensional detection frame closest to the laser radar is selected as the two-dimensional detection frame characterizing the q-th clustered target of the p-th frame, yielding the point cloud data set with detection frames. The clustering in step B3 is completed with the DBSCAN algorithm. With a single fixed threshold, distant targets may fail to cluster, while a large threshold merges two closely spaced objects into one cluster; therefore different epsilon thresholds are set to improve the clustering effect. Considering that the horizontal angular resolution of the laser radar is generally higher than its vertical angular resolution, a distance-adaptive threshold ε_th is set from the vertical angular resolution and determined by formula (3):
ε_th = k·h (3)
In formula (3), k = 1.1 is an amplification factor and h is the vertical spacing between two scan lines at the given distance from the laser radar. The clustered targets are obtained and each is framed by the detection frame nearest to the radar to represent the target information; FIG. 2a shows a detection scene and FIG. 2b the corresponding detection effect of the processing output;
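A clustering sketch in the spirit of formula (3) follows; the band-by-band use of scikit-learn's DBSCAN and the scan-line-spacing estimate h ≈ r·tan(Δθ_v) are assumptions, and V_RES is a placeholder vertical angular resolution.

```python
import numpy as np
from sklearn.cluster import DBSCAN

V_RES = np.deg2rad(1.0)    # assumed vertical angular resolution of the laser radar
K_AMP = 1.1                # amplification factor k from formula (3)

def adaptive_cluster(points, band_width=10.0, min_samples=5):
    """Cluster ROI points band by band with a range-adaptive epsilon (formula (3))."""
    ranges = np.linalg.norm(points[:, [0, 2]], axis=1)   # range in the horizontal O-XZ plane
    labels = np.full(len(points), -1)
    next_label = 0
    for r0 in np.arange(0.0, ranges.max() + band_width, band_width):
        band = (ranges >= r0) & (ranges < r0 + band_width)
        if band.sum() < min_samples:
            continue
        h = (r0 + band_width) * np.tan(V_RES)            # scan-line spacing at this range
        eps = K_AMP * h                                  # epsilon_th = k * h
        band_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points[band])
        band_labels[band_labels >= 0] += next_label      # keep cluster ids unique across bands
        labels[band] = band_labels
        next_label = labels.max() + 1
    return labels
```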
C. A yolov5 model is constructed with a convolutional attention module and trained on a road vehicle image data set; the trained yolov5 model processes each frame of image acquired by the camera and outputs the detection frame of each target together with its coordinates, size, category and confidence, yielding the image information set with detection frames. The convolutional attention module consists of a channel attention module and a spatial attention module: the channel attention module computes an attention map over the channel dimension, which is multiplied with the feature map before being fed into the spatial attention module; the spatial attention module then computes an attention map over the height and width dimensions, which is multiplied with its input before being output, guiding the network to focus correctly on learning the important features. For the data set, a portion of pictures close to the target scenes is selected from public data sets, the categories are modified and unneeded targets are removed, and the remaining part of the data set is collected and labeled by the authors; the data set contains 6000 images in total, with a training-to-validation ratio of 5:1. FIG. 3 shows the detection effect of the improved yolov5 output.
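The patent does not specify an inference interface; assuming the CBAM-modified network can still be loaded through the standard ultralytics/yolov5 hub entry point (the weight file name below is a placeholder), per-frame detection could look roughly like this:

```python
import torch

# 'cbam_yolov5.pt' is a placeholder for the weights trained on the 6000-image data set.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='cbam_yolov5.pt')
model.conf = 0.25                                  # confidence threshold (assumed value)

def detect_frame(image_bgr):
    """Return a list of (xc, yc, w, h, cls, conf) tuples for one camera frame."""
    results = model(image_bgr[..., ::-1])          # hub models expect RGB input
    detections = []
    for xc, yc, w, h, conf, cls in results.xywh[0].tolist():
        detections.append((xc, yc, w, h, int(cls), conf))
    return detections
```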
D. Performing space-time synchronization on information processed by the laser radar and the camera, including:
D1. The laser radar signal is used as the reference of the registration frequency, and the timestamps of the laser radar and the camera are aligned by interpolation, obtaining the point cloud of the laser radar and the image of the camera at the same moment. For example, to obtain the camera target information at the 100 ms instant, the corresponding data are interpolated from the information acquired by the camera at 67 ms and 133 ms through formula (4), as shown in FIG. 4:
x_j = x_i + (x_{i+1} − x_i)·(t_j − t_i)/(t_{i+1} − t_i) (4)
In formula (4), t_i is the time before interpolation, t_{i+1} is the time after interpolation, t_j is the interpolation time, x_i is the x-axis coordinate at the time before interpolation, x_{i+1} is the x-axis coordinate at the time after interpolation, and x_j is the x-axis coordinate at the interpolation time. When interpolation is used, the interval between the chosen interpolation time and the data frames before and after it must not exceed the camera sampling period of 67 ms; interpolation instants exceeding the camera sampling period are considered invalid and removed;
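A minimal sketch of formula (4) with the 67 ms validity check from D1 (function and parameter names are illustrative):

```python
def interpolate_to_lidar_time(t_i, x_i, t_i1, x_i1, t_j, max_gap=0.067):
    """Linearly interpolate one camera coordinate to the lidar timestamp t_j (formula (4))."""
    # The interpolation instant must lie within one camera sampling period (~67 ms)
    # of both neighbouring camera frames; otherwise it is treated as invalid.
    if (t_j - t_i) > max_gap or (t_i1 - t_j) > max_gap:
        return None
    return x_i + (x_i1 - x_i) * (t_j - t_i) / (t_i1 - t_i)
```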
D2. With the laser radar signal as the reference of the registration frequency and the timestamps aligned by interpolation as above, the point cloud information set of the laser radar and the image information set of the camera at the same moment are obtained. The camera intrinsic parameters are then calibrated with an automatic calibration method based on Zhang Zhengyou's method, and the extrinsic matrix between the radar and the camera is acquired with the Calibration Toolkit derived from that method; FIG. 5 shows the joint calibration effect of the laser radar and the camera. The two-dimensional detection frame in the laser radar coordinate system is thereby projected into the pixel coordinate system, yielding the projected two-dimensional detection frame b^q = (x_b^q, y_b^q, w_b^q, h_b^q), where x_b^q and y_b^q are the x- and y-axis coordinates of the center point of the q-th projected two-dimensional detection frame and w_b^q and h_b^q are its width and height.
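A sketch of projecting laser radar points into the pixel frame with the calibrated intrinsics K and extrinsics (R, t); the row-vector convention and the corner-based box projection are assumptions, not the patent's exact procedure:

```python
import numpy as np

def lidar_to_pixels(points_lidar, K, R, t):
    """Project (N, 3) lidar-frame points into pixel coordinates; returns (uv, depth)."""
    pts_cam = points_lidar @ R.T + t              # lidar frame -> camera frame
    uvw = pts_cam @ K.T                           # camera frame -> homogeneous pixels
    uv = uvw[:, :2] / uvw[:, 2:3]                 # perspective division
    return uv, pts_cam[:, 2]

def project_box(near_face_corners_lidar, K, R, t):
    """Project the 4 corners of a cluster's near face and return (xc, yc, w, h) in pixels."""
    uv, _ = lidar_to_pixels(near_face_corners_lidar, K, R, t)
    u_min, v_min = uv.min(axis=0)
    u_max, v_max = uv.max(axis=0)
    return ((u_min + u_max) / 2, (v_min + v_max) / 2, u_max - u_min, v_max - v_min)
```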
E. Carrying out data association on the information after time-space synchronization to obtain an association pair:
E1. Set the association threshold r_th. Considering that too large a threshold leads to complicated matching situations and reduces algorithm accuracy, while too small a threshold leads to matching failure, a circular gating threshold of r_th = 0.5 m is set. Define a variable i to denote the frame number of the laser radar after time synchronization with the camera, a variable j to denote the index of the current target contained in the laser radar point cloud observation of the i-th frame, and a variable k to denote the index of the current target contained in the camera image observation of the i-th frame; initialize i = 1;
E2. Initialize j = 1; take the coordinate and size information of the j-th projected two-dimensional detection frame in the point cloud data set of the i-th laser radar frame as the j-th radar target observation L_i^j of the i-th frame, and take the three-dimensional detection frame corresponding to L_i^j as the j-th clustered three-dimensional detection frame B_i^j of the i-th frame;
E3. Initialize k = 1; take the coordinates, size, category and confidence information of the k-th detection frame in the image information set of the i-th camera frame as the k-th camera target observation C_i^k of the i-th frame, where x_i^k and y_i^k denote the x- and y-axis coordinates of the center point of the k-th detection frame, w_i^k and h_i^k its width and height, cls_i^k the category of the detected target, and conf_i^k its confidence;
E4. Calculate the Euclidean distance d_jk between the j-th laser radar target observation L_i^j and the k-th camera target observation C_i^k of the i-th frame. E5. Judge whether d_jk ≤ r_th; if so, the detection target of the laser radar is successfully matched with the detection target of the camera, and L_i^j and C_i^k form an association pair; otherwise the matching fails. FIG. 6 shows the association situations that may occur when targets remain unmatched, and the decision method in that case is as follows. For targets detected by the radar but not by vision: because the region of interest has already been extracted from the laser radar data, differences in viewing angle can be ignored; the likely causes are object categories the vision model was not trained on (such as animals or traffic cones on the road) or poor lighting conditions such as dusk. Since such objects may affect the safe driving of the vehicle, the radar result is retained: if the Euclidean distance d_jk between the j-th radar target observation L_i^j of the i-th frame and every camera target observation C_i^k of the i-th frame is greater than r_th, the j-th radar target observation L_i^j is output and tracked; if the corresponding radar target observation L_{i+1}^j is detected in the (i+1)-th frame and its Euclidean distance to the k-th camera target observation C_{i+1}^k of the (i+1)-th frame satisfies d_jk ≤ r_th, the target of the j-th radar target observation L_i^j is considered to have been successfully detected. For targets detected by vision but not by the radar: the likely cause is that the target is too far away for the clustering accuracy of the laser radar, and in these cases the visually recognized target is discarded; in addition, the camera field of view is wider than the radar region of interest and may contain targets such as pedestrians on the road edge that do not affect the safe driving of the vehicle and are ignored. For targets detected by both vision and radar in which the radar algorithm cannot separate targets that are too close together (typically a pedestrian standing close to a vehicle), the radar detection result is retained. As shown in FIG. 7, L denotes the laser radar detection results and C the camera detection results: L1 and C1 are successfully paired, L2 is retained, and C2 is ignored.
E6. Assign k+1 to k and return to step E4 until all camera target observations of the i-th frame have been traversed; then assign j+1 to j and return to step E3 until all targets of the i-th frame have been traversed;
E7. Calculate the intersection-over-union IOU_jk of the association pair formed by the j-th radar target observation L_i^j and the k-th camera target observation C_i^k of the i-th frame, and compare it with the set threshold IOU_th; IOU_th = 0.7 is selected through testing on examples. If IOU_jk ≥ IOU_th, the corresponding association pair of the i-th frame is correct and is output, otherwise the association pair is discarded; return to E7 to calculate the next association pair of the i-th frame until all correct association pairs of the i-th frame have been output;
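A simplified sketch of the association in E1–E7 follows: greedy nearest-center gating followed by the IOU check. The coordinate frame used for the distance test and the handling of unmatched radar targets are assumptions, and the sub-optimal re-association described in the beneficial effects is omitted for brevity.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (xc, yc, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def associate(radar_boxes, camera_boxes, r_th=0.5, iou_th=0.7):
    """Pair projected radar boxes with camera boxes by center distance, then verify with IoU."""
    pairs, unmatched_radar = [], []
    for j, rb in enumerate(radar_boxes):
        best_k, best_d = None, float("inf")
        for k, cb in enumerate(camera_boxes):
            d = ((rb[0] - cb[0]) ** 2 + (rb[1] - cb[1]) ** 2) ** 0.5
            if d < r_th and d < best_d:
                best_k, best_d = k, d
        if best_k is not None and iou(rb, camera_boxes[best_k]) > iou_th:
            pairs.append((j, best_k))
        else:
            unmatched_radar.append(j)       # kept and tracked per the E5 decision rule
    return pairs, unmatched_radar
```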
F. according to the characteristics of the output data of different sensors, data fusion is performed on all correct association pairs in the ith frame to obtain target detection information after the fusion of the ith frame, as shown in fig. 8, including:
F1. Since the laser radar can output the depth information of the target and the camera can output the category and confidence of the object, if the m-th radar target observation L_i^m and the n-th camera target observation C_i^n of the i-th frame form an association pair, the x-axis coordinate x_i^m, y-axis coordinate y_i^m, z-axis coordinate z_i^m, length l_i^m and width w_i^m of the three-dimensional detection frame corresponding to the m-th radar target observation of the i-th frame, together with the category cls_i^n and confidence conf_i^n of the n-th camera target observation, are taken directly as the fused partial target detection information of the association pair;
F2. When the laser radar detects a target, the laser scan lines across the target's height become sparse as the target gets farther away, so height information is lost. The n-th camera target observation frame is therefore converted into the radar coordinate system using the camera intrinsic and extrinsic parameters calibrated in step D, so that the projection h_i^n of the n-th camera target observation height in the radar coordinate system is obtained and output as the fused target detection height compensation information of the association pair, completing the fused target detection data;
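A sketch of assembling one fused target per step F; the dictionary layout is illustrative, and the pinhole relation height = h_pixels·depth/f_y is an assumed concrete form of the height compensation described in F2.

```python
import numpy as np

def fuse_pair(radar_obs, camera_obs, K, R, t):
    """Build one fused target from an associated (radar, camera) pair.

    radar_obs:  dict with x, y, z, length, width from the clustered 3-D detection frame
    camera_obs: dict with the pixel box (xc, yc, w, h) plus cls and conf
    """
    fused = {
        "x": radar_obs["x"], "y": radar_obs["y"], "z": radar_obs["z"],
        "length": radar_obs["length"], "width": radar_obs["width"],
        "cls": camera_obs["cls"], "conf": camera_obs["conf"],
    }
    # Height compensation: back-project the camera box height into the radar frame,
    # using the lidar-provided depth of the target as the projection scale.
    center_cam = np.array([radar_obs["x"], radar_obs["y"], radar_obs["z"]]) @ R.T + t
    depth = center_cam[2]                       # camera-frame depth of the target
    fy = K[1, 1]                                # vertical focal length in pixels
    fused["height"] = camera_obs["h"] * depth / fy
    return fused
```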
G. The method of the invention selects an extended Kalman filter (EKF) to track the fused targets.
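The patent names an EKF but does not give its motion or measurement models; a minimal constant-velocity Kalman sketch (to which an EKF reduces when the models are linear) is shown below, with all noise settings assumed.

```python
import numpy as np

class TargetTracker:
    """Constant-velocity tracker for one fused target; state is (x, z, vx, vz)."""

    def __init__(self, x0, z0, dt=0.1):
        self.s = np.array([x0, z0, 0.0, 0.0])            # initial state
        self.P = np.eye(4)                               # state covariance
        self.F = np.eye(4)                               # constant-velocity motion model
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                            # we measure position only
        self.Q = np.eye(4) * 0.01                        # process noise (assumed)
        self.R = np.eye(2) * 0.1                         # measurement noise (assumed)

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]

    def update(self, measured_xz):
        y = np.asarray(measured_xz) - self.H @ self.s    # innovation
        S = self.H @ self.P @ self.H.T + self.R
        gain = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
        self.s = self.s + gain @ y
        self.P = (np.eye(4) - gain @ self.H) @ self.P
        return self.s
```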

Claims (2)

1. The target detection method based on the fusion of the laser radar and the machine vision is characterized by comprising the following steps of:
A. A solid-state laser radar is arranged at the front bumper of the vehicle and a camera at the front windshield of the vehicle; the advancing direction of the vehicle is taken as the Z axis, the direction pointing to the driver's left as the X axis, and the vertically upward direction as the Y axis; the laser emission center of the laser radar is taken as the origin O_l to establish the laser radar coordinate system O_l-X_lY_lZ_l, and the optical center of the camera is taken as the origin O_c to establish the camera coordinate system O_c-X_cY_cZ_c; the O-XZ planes of the two coordinate systems are kept parallel to the ground;
B. Processing each frame of point cloud information acquired by the laser radar comprises: first, performing ground point cloud segmentation on the point cloud by a multi-plane fitting method, extracting road edge points from the segmentation result, and sequentially performing curve fitting, filtering and down-sampling on the extracted road edge points to obtain the region of interest of each frame; then clustering the point cloud within the region of interest to obtain the targets of each frame, and identifying each clustered target with a three-dimensional detection frame. The q-th clustered target of the p-th frame is marked with the q-th three-dimensional detection frame B_p^q = (x_p^q, y_p^q, z_p^q, w_p^q, l_p^q, h_p^q), where x_p^q, y_p^q and z_p^q denote the x-, y- and z-axis coordinates of the center point of the q-th three-dimensional detection frame in the p-th frame, and w_p^q, l_p^q and h_p^q denote its width, length and height; the face of the three-dimensional detection frame closest to the laser radar is selected as a two-dimensional detection frame to characterize the q-th clustered target of the p-th frame, thereby obtaining a point cloud data set with detection frames;
C. constructing a yolov5 model by adopting a convolution attention module, training the yolov5 model by utilizing a road vehicle image data set to obtain a trained yolov5 model, processing each frame of image information acquired by the camera by utilizing the trained yolov5 model, and outputting a detection frame of each target in each frame of image information and coordinate, size, category and confidence information thereof, thereby obtaining an image information set with the detection frame;
D. Performing space-time synchronization on the point cloud data set and the image information set, including: the laser radar signal is taken as the reference of the registration frequency, and the timestamps of the laser radar and the camera are aligned by interpolation, so that the point cloud information set of the laser radar and the image information set of the camera at the same moment are obtained; the camera is then calibrated to obtain its intrinsic parameters, and the camera and the laser radar are jointly calibrated to obtain the extrinsic parameters, so that the two-dimensional detection frame in the laser radar coordinate system is projected into the pixel coordinate system, obtaining the projected two-dimensional detection frame b^q = (x_b^q, y_b^q, w_b^q, h_b^q), where x_b^q and y_b^q denote the x- and y-axis coordinates of the center point of the q-th projected two-dimensional detection frame, and w_b^q and h_b^q denote its width and height;
E. carrying out data association on the information after time-space synchronization to obtain an association pair:
E1. Set the association threshold to r_th; define a variable i to denote the frame number of the laser radar after time synchronization with the camera, a variable j to denote the index of the current target contained in the laser radar point cloud observation of the i-th frame, and a variable k to denote the index of the current target contained in the camera image observation of the i-th frame; initialize i = 1;
E2. Initialize j = 1; take the coordinate and size information of the j-th projected two-dimensional detection frame in the point cloud data set of the i-th laser radar frame as the j-th radar target observation L_i^j of the i-th frame, and take the three-dimensional detection frame corresponding to L_i^j as the j-th clustered three-dimensional detection frame B_i^j of the i-th frame;
E3. Initialize k = 1; take the coordinates, size, category and confidence information of the k-th detection frame in the image information set of the i-th camera frame as the k-th camera target observation C_i^k of the i-th frame, where x_i^k and y_i^k denote the x- and y-axis coordinates of the center point of the k-th detection frame, w_i^k and h_i^k its width and height, cls_i^k the category of the detected target, and conf_i^k its confidence;
E4. Calculate the Euclidean distance d_jk between the j-th laser radar target observation L_i^j and the k-th camera target observation C_i^k of the i-th frame;
E5. Judge whether d_jk ≤ r_th; if so, the detection target of the laser radar is successfully matched with the detection target of the camera, and the j-th radar target observation L_i^j and the k-th camera target observation C_i^k of the i-th frame form an association pair; otherwise the matching fails;
E6. Assign k+1 to k and return to step E3 until all camera target observations of the i-th frame have been traversed; then assign j+1 to j and return to step E2 until all targets of the i-th frame have been traversed;
E7. Calculate the intersection-over-union IOU_jk of the association pair formed by the j-th radar target observation L_i^j and the k-th camera target observation C_i^k of the i-th frame, and compare it with the set threshold IOU_th; if IOU_jk ≥ IOU_th, the corresponding association pair of the i-th frame is correct and is output, otherwise the association pair is discarded; return to E7 to calculate the next association pair of the i-th frame until all correct association pairs of the i-th frame have been output;
F. Data fusion is performed on all correct association pairs of the i-th frame to obtain the fused target detection information of the i-th frame, including: if the m-th radar target observation L_i^m and the n-th camera target observation C_i^n of the i-th frame form an association pair, the x-axis coordinate x_i^m, y-axis coordinate y_i^m, z-axis coordinate z_i^m, length l_i^m and width w_i^m of the three-dimensional detection frame corresponding to the m-th radar target observation of the i-th frame, together with the category cls_i^n and confidence conf_i^n of the n-th camera target observation, are taken directly as the fused partial target detection information of the association pair; the n-th camera target observation frame is then converted into the radar coordinate system using the camera intrinsic and extrinsic parameters obtained in step D, so that the projection h_i^n of the n-th camera target observation height in the radar coordinate system is obtained and used as the fused target detection height compensation information of the association pair; the fused partial target detection information and the target detection height compensation information together constitute the fused target detection information;
G. tracking each target in the target detection information fused in the ith frame and outputting a target detection result.
2. The target detection method based on laser radar and machine vision fusion according to claim 1, characterized in that in E5, if the Euclidean distance d_jk between the j-th radar target observation L_i^j of the i-th frame and every camera target observation C_i^k of the i-th frame is greater than r_th, the j-th radar target observation L_i^j of the i-th frame is output and tracked;
if the corresponding radar target observation L_{i+1}^j is detected in the (i+1)-th frame and the Euclidean distance between L_{i+1}^j and the k-th camera target observation C_{i+1}^k of the (i+1)-th frame satisfies d_jk ≤ r_th, the target of the j-th radar target observation L_i^j is considered to have been successfully detected.
CN202210630026.9A 2022-06-06 2022-06-06 Target detection method based on laser radar and machine vision fusion Active CN115032651B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210630026.9A CN115032651B (en) 2022-06-06 2022-06-06 Target detection method based on laser radar and machine vision fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210630026.9A CN115032651B (en) 2022-06-06 2022-06-06 Target detection method based on laser radar and machine vision fusion

Publications (2)

Publication Number Publication Date
CN115032651A CN115032651A (en) 2022-09-09
CN115032651B true CN115032651B (en) 2024-04-09

Family

ID=83123484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210630026.9A Active CN115032651B (en) 2022-06-06 2022-06-06 Target detection method based on laser radar and machine vision fusion

Country Status (1)

Country Link
CN (1) CN115032651B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114137562B (en) * 2021-11-30 2024-04-12 合肥工业大学智能制造技术研究院 Multi-target tracking method based on improved global nearest neighbor
CN115184917B (en) * 2022-09-13 2023-03-10 湖南华诺星空电子技术有限公司 Regional target tracking method integrating millimeter wave radar and camera
CN115236656B (en) * 2022-09-22 2022-12-06 中国电子科技集团公司第十研究所 Multi-source sensor target association method, equipment and medium for airplane obstacle avoidance
CN115571290B (en) * 2022-11-09 2023-06-13 传仁信息科技(南京)有限公司 Automatic ship draft detection system and method
CN115598656B (en) * 2022-12-14 2023-06-09 成都运达科技股份有限公司 Obstacle detection method, device and system based on suspension track
CN116363623B (en) * 2023-01-28 2023-10-20 苏州飞搜科技有限公司 Vehicle detection method based on millimeter wave radar and vision fusion
CN116030200B (en) * 2023-03-27 2023-06-13 武汉零点视觉数字科技有限公司 Scene reconstruction method and device based on visual fusion
CN116304992A (en) * 2023-05-22 2023-06-23 智道网联科技(北京)有限公司 Sensor time difference determining method, device, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111352112A (en) * 2020-05-08 2020-06-30 泉州装备制造研究所 Target detection method based on vision, laser radar and millimeter wave radar
CN114137562A (en) * 2021-11-30 2022-03-04 合肥工业大学智能制造技术研究院 Multi-target tracking method based on improved global nearest neighbor

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11393097B2 (en) * 2019-01-08 2022-07-19 Qualcomm Incorporated Using light detection and ranging (LIDAR) to train camera and imaging radar deep learning networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111352112A (en) * 2020-05-08 2020-06-30 泉州装备制造研究所 Target detection method based on vision, laser radar and millimeter wave radar
CN114137562A (en) * 2021-11-30 2022-03-04 合肥工业大学智能制造技术研究院 Multi-target tracking method based on improved global nearest neighbor

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Research on Unmanned Surface Vehicles Environment Perception Based on the Fusion of Vision and Lidar; Wei Zhang et al.; IEEE Access; 2021-05-03; Vol. 9; 63107-63121 *
Real-time Target Recognition of Urban Autonomous Vehicles Based on Information Fusion; 薛培林; 吴愿; 殷国栋; 刘帅鹏; 林乙蘅; 黄文涵; 张云; 机械工程学报 (Journal of Mechanical Engineering); 2020-12-31 (12); 183-191 *
Research on Vehicle Detection Technology Based on the Fusion of Millimeter-Wave Radar and Machine Vision; 宋伟杰; China Master's Theses Full-text Database, Engineering Science and Technology II (monthly); 2021-02-15; 26-66 *

Also Published As

Publication number Publication date
CN115032651A (en) 2022-09-09

Similar Documents

Publication Publication Date Title
CN115032651B (en) Target detection method based on laser radar and machine vision fusion
CN110942449B (en) Vehicle detection method based on laser and vision fusion
CN110032949B (en) Target detection and positioning method based on lightweight convolutional neural network
CN111882612B (en) Vehicle multi-scale positioning method based on three-dimensional laser detection lane line
CN110859044B (en) Integrated sensor calibration in natural scenes
CN110531376B (en) Obstacle detection and tracking method for port unmanned vehicle
CN111060924B (en) SLAM and target tracking method
CN110569704A (en) Multi-strategy self-adaptive lane line detection method based on stereoscopic vision
CN102194239B (en) For the treatment of the method and system of view data
CN110197173B (en) Road edge detection method based on binocular vision
CN113516664A (en) Visual SLAM method based on semantic segmentation dynamic points
CN110992424B (en) Positioning method and system based on binocular vision
Huang et al. Tightly-coupled LIDAR and computer vision integration for vehicle detection
CN115468567A (en) Cross-country environment-oriented laser vision fusion SLAM method
CN111723778B (en) Vehicle distance measuring system and method based on MobileNet-SSD
CN113920183A (en) Monocular vision-based vehicle front obstacle distance measurement method
CN112150448A (en) Image processing method, device and equipment and storage medium
CN114200442A (en) Road target detection and correlation method based on millimeter wave radar and vision
Ortigosa et al. Obstacle-free pathway detection by means of depth maps
CN110864670B (en) Method and system for acquiring position of target obstacle
CN111539278A (en) Detection method and system for target vehicle
WO2020113425A1 (en) Systems and methods for constructing high-definition map
CN113706599B (en) Binocular depth estimation method based on pseudo label fusion
Pfeiffer et al. Ground truth evaluation of the Stixel representation using laser scanners
Huang et al. A coarse-to-fine LiDar-based SLAM with dynamic object removal in dense urban areas

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant