CN114463303A - Road target detection method based on fusion of binocular camera and laser radar - Google Patents

Road target detection method based on fusion of binocular camera and laser radar

Info

Publication number
CN114463303A
CN114463303A
Authority
CN
China
Prior art keywords
target
bbox
camera
radar
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210110972.0A
Other languages
Chinese (zh)
Inventor
张炳力
潘泽昊
姜俊昭
刘文涛
张成标
程进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202210110972.0A priority Critical patent/CN114463303A/en
Publication of CN114463303A publication Critical patent/CN114463303A/en
Pending legal-status Critical Current

Classifications

    • G06T 7/0002 Image analysis; Inspection of images, e.g. flaw detection
    • G06F 18/214 Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 Pattern recognition; Matching criteria, e.g. proximity measures
    • G06F 18/23 Pattern recognition; Clustering techniques
    • G06F 18/25 Pattern recognition; Fusion techniques
    • G06N 3/04 Neural networks; Architecture, e.g. interconnection topology
    • G06N 3/08 Neural networks; Learning methods
    • G06T 5/80
    • G06T 7/11 Image analysis; Region-based segmentation
    • G06T 7/344 Image registration using feature-based methods involving models
    • G06T 7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G06T 7/85 Stereo camera calibration
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/10044 Radar image
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30244 Camera pose
    • G06T 2207/30256 Lane; Road marking

Abstract

The invention provides a road target detection method based on the fusion of a binocular camera and a laser radar. The method collects front road target information with a left camera, a right camera and a laser radar; obtains binocular parallax through a binocular stereo matching algorithm; obtains image target categories and two-dimensional position information with a monocular-vision neural network; combines the binocular parallax with the monocular visual detection information to obtain a visual three-dimensional detection result for front targets; obtains a radar three-dimensional detection result for front targets through point cloud segmentation and clustering; solves the matching cost between the two sets of three-dimensional bounding boxes with the Hungarian algorithm, classifies targets according to the matching results, applies different fusion strategies, and finally outputs the supplemented and corrected road target information. The detection framework realizes the complementary advantages of the sensors and uses a target-level matching fusion strategy to output more accurate and reliable road target information.

Description

Road target detection method based on fusion of binocular camera and laser radar
Technical Field
The invention belongs to the field of intelligent vehicle environment perception, and particularly relates to a road target detection method based on fusion of a binocular camera and a laser radar.
Background
At present, intelligent driving has become a mainstream trend, and accurate, efficient environment perception is the primary task for an intelligent vehicle to realize advanced driver assistance or even automatic driving. The environment perception of intelligent vehicles currently relies mainly on on-board sensors such as cameras and radars. Road target category information can be obtained from visual images, but a monocular vision system cannot accurately obtain object distance information; a laser radar can provide three-dimensional information of road targets, but cannot reliably judge target categories.
To realize the complementary advantages of the individual sensors, target detection methods based on sensor fusion have appeared in existing solutions. However, their concrete designs merely concatenate the detection information of multiple sensors, and the accuracy and reliability of such fusion schemes still need to be improved. No better method or technique for solving or improving the above problems has yet been reported.
Disclosure of Invention
In view of the defects of and improvement requirements on existing methods, the invention aims to provide a road target detection method based on the fusion of a binocular camera and a laser radar. The method combines visual sensing technology with laser radar technology, incorporates a YOLOv4 detection model, and finally fuses the binocular vision and radar detection results, thereby providing an intelligent, more accurate and more reliable technical means for road target detection.
In order to solve the technical problem, the invention provides a road target detection method based on fusion of a binocular camera and a laser radar, which comprises the following steps:
step 1, collecting front road target information by using a left camera, a right camera and a laser radar;
step 2, obtaining binocular parallax information through a binocular stereo matching algorithm;
step 3, utilizing a visual target detection neural network based on YOLOv4 to acquire the category and two-dimensional position information of a target in the left camera image as left camera visual detection information;
step 4, acquiring a three-dimensional visual detection result of the front target by combining binocular parallax information and left camera visual detection information;
step 5, carrying out point cloud segmentation and clustering on the original point cloud obtained by the laser radar to obtain a front target radar detection result;
step 6, performing temporal and spatial registration on the vision and radar detection results, performing matching fusion on the front target detection results obtained by each, and outputting the fused front target information.
In the above technical solution, the step 1 specifically includes: a binocular camera system is formed by a left camera and a right camera arranged in parallel, which respectively collect left and right image information of the road targets ahead; point cloud information of the front road area is collected by the laser radar. The relative positions of the cameras and the radar are kept unchanged, and before the cameras and the laser radar are used to collect road information, binocular calibration is performed on the left and right cameras, and joint calibration is performed on the left camera and the laser radar. The calibration steps are specifically as follows: from the images acquired by the left and right cameras, the intrinsic parameters M_L1, M_R1 and distortion coefficients D_L, D_R of the left and right cameras are obtained with the Zhang Zhengyou calibration method and the MATLAB calibration tool. Four pairs of 3D points in the laser radar point cloud and their corresponding image pixel points are selected; using the 3D spatial point positions P_Lidar_i and the 2D image projection positions p_img_i of these points, the extrinsic parameter M_LR = [R_LR | T_LR] between the left camera and the laser radar is obtained with a PnP algorithm, completing the joint calibration.
Further, the step 2 specifically includes: stereo rectification is performed on the images acquired by the left and right cameras using the intrinsic and extrinsic parameters from binocular calibration; pixel points corresponding to the same target object in the left and right images are searched with the SGBM algorithm, and the parallax is generated on the pixel points of the left camera image; using the binocular parallax, the reprojection matrix Q_reprojection that maps an image pixel two-dimensional point [u_i, v_i]^T to a three-dimensional point [x_i, y_i, z_i]^T in the camera coordinate system is obtained based on the Bouguet algorithm.
Further, the step 3 specifically includes: a target image data set is collected in advance, the road target categories and positions in the images are labeled, and the labeled data set is input as a training set into the YOLOv4 visual target detection neural network to train the network prediction weights. The image acquired by the left camera is input into the YOLOv4 visual target detection neural network to obtain the category information class_i of the i-th target in the image and its minimum two-dimensional bounding box 2d_bbox_img_i(u_i, v_i, w_i, h_i), where u_i is the abscissa of the upper-left corner point of 2d_bbox_img_i, v_i is the ordinate of the upper-left corner point, w_i is the width of the minimum bounding box, and h_i is the height of the minimum bounding box.
Further, the step 4 specifically includes: the target visual two-dimensional information 2d_bbox_img_i obtained by the neural network in step 3 is reprojected into the camera coordinate system using the reprojection matrix Q_reprojection from step 2, obtaining the three-dimensional minimum bounding box bbox_cam_i of the i-th visual detection target in the camera coordinate system. The geometric center P_i of bbox_cam_i is taken as the target three-dimensional position and is output, together with bbox_cam_i and class_i, as the front target visual detection result.
Further, the step 5 specifically includes: the ground point cloud and noise point cloud are filtered out with the RANSAC algorithm, and the point clouds of all road targets are segmented out. All road target point clouds are grouped by a clustering algorithm into several point cloud clusters, each of which contains all the points of one road target. The minimum three-dimensional bounding box bbox_lidar_k of each target point cloud cluster is computed. The geometric center Q_k of bbox_lidar_k is taken as the target three-dimensional position and is output, together with bbox_lidar_k, as the front target radar detection result.
Further, the step 6 specifically includes: temporal registration of the vision and radar detection results is performed first; a visual bbox_cam_i and a radar bbox_lidar_k whose timestamps differ by less than 0.1 second are regarded as information at the same moment and are fused subsequently. Using the extrinsic parameter M_LR = [R_LR | T_LR] obtained from the joint calibration of the camera and the laser radar, the visual detection result is transformed into the radar coordinate system, realizing the spatial registration of the visual and radar detection results. After temporal and spatial registration, the 3D IoU matching cost between the visual bbox_cam_i and the radar bbox_lidar_k is calculated. The calculation formula is:
3DIoU_ik = |bbox_cam_i ∩ bbox_lidar_k| / |bbox_cam_i ∪ bbox_lidar_k| - d_ik^2 / c_ik^2
where bbox_cam_i ∩ bbox_lidar_k is the volume of the intersection of the three-dimensional boxes bbox_cam_i and bbox_lidar_k; bbox_cam_i ∪ bbox_lidar_k is the volume of their union; c_ik is the diagonal length of the minimum cuboid enclosing both boxes; and d_ik is the Euclidean distance between the geometric centers P_i and Q_k. The matching cost is optimally solved with the Hungarian algorithm. Based on the Hungarian matching result, the visual and laser radar detection targets are divided into three categories: detected only by vision, detected only by radar, and detected by both radar and vision. A target detected only by vision or only by radar is regarded as unsuccessfully matched, while a target detected by both radar and vision is regarded as successfully matched; for unsuccessfully matched targets, the visual or radar detection information is retained. The information of successfully matched targets is further supplemented and corrected. The supplementary correction process includes: the visual category information of the successfully matched target is taken as the category information of the fused target; vision and radar correction coefficients α and β with α + β = 1 are set, the minimum three-dimensional bounding box and the three-dimensional position information of the target are corrected by weighting, and the fused detection result is output.
The invention also provides a processor configured to run a program, wherein the program, when running, executes the above steps.
The invention also provides corresponding computer equipment, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the steps of the above embodiments are realized when the processor executes the program.
Compared with the prior art, the invention has the following advantages:
the invention provides a road target detection framework based on fusion of a binocular camera and a laser radar, provides a target level fusion method of the binocular camera and the laser radar, overcomes the defect of poor traditional detection effect, and realizes accurate and reliable advantage complementation among multiple sensors.
Monocular visual target detection based on deep learning places low demands on computing platform performance, and the cost of building and training its data set is low; depth estimation based on stereo matching algorithms is mature and easy to port. By combining binocular depth estimation with monocular neural network target detection, the method overcomes the shortcomings that monocular visual detection cannot accurately acquire target depth information and that binocular visual detection cannot efficiently perform multi-target detection.
By adopting the target-level matching fusion method, the target detection results output separately by the binocular camera and the laser radar are used for mutual correction, so that more accurate and reliable road target information is output, and the safety risk to intelligent-vehicle road environment perception caused by missed and false detections of a single sensor is reduced.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the examples serve to explain the principles of the invention and not limit the invention.
Fig. 1 is a schematic flow chart of a road target detection method according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of visual output provided by an embodiment of the present invention;
fig. 3 is a schematic diagram of a fused matching cost calculation according to an embodiment of the present invention.
Detailed Description
To make the purpose, technical solution and advantages of the present application clearer, the present invention is further described below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it. The embodiments and the features of the embodiments in the present application may be combined with each other arbitrarily provided there is no conflict.
The invention provides a road target detection method for fusing a binocular camera and a laser radar, which comprises the following specific steps as shown in figure 1:
1. and collecting front road target information by using a left camera, a right camera and a laser radar.
1.1 In this step, the optical axes of the left and right camera lenses are kept parallel, their focal lengths are identical, and the baseline between them is not too large, so that the overlapping portion of the left and right images accounts for more than 80% of a single image. The positions of the binocular camera and the laser radar are fixed and their relative positions remain unchanged, guaranteeing that the three sensors share a common sensing range.
2. Perform binocular calibration of the left and right cameras and joint calibration of the left camera and the laser radar.
2.1 In this step, binocular calibration of the left and right cameras is performed with the Zhang Zhengyou calibration method and the MATLAB calibration tool to obtain the intrinsic parameters M_L1, M_R1 and distortion coefficients D_L, D_R of the left and right cameras, as well as the extrinsic parameter M_LR = [R_LR | T_LR] describing the rotation and translation between the left and right cameras, where R_LR is the rotation matrix and T_LR is the translation matrix between the left and right cameras.
2.2 In this step, the joint calibration of the left camera and the laser radar proceeds as follows: more than 4 pairs of 3D points P_Lidar_i in the laser radar point cloud and their corresponding image pixel points p_img_i (i = 1, 2, 3, ...) are selected. Using the 3D spatial point positions P_Lidar_i = [X_i, Y_i, Z_i]^T and the 2D image projection positions p_img_i = [u_i, v_i]^T, the rotation-translation transformation matrix of the left camera relative to the laser radar, i.e. the extrinsic parameter M_Lidar_img, is solved with a PnP algorithm. This yields the coordinate transformation relations among the image pixel two-dimensional point [u_i, v_i]^T, the camera coordinate system three-dimensional point [x_i, y_i, z_i]^T, and the laser radar three-dimensional point [X_i, Y_i, Z_i]^T:
[u_i, v_i]^T = M_L1 · [x_i, y_i, z_i]^T    (1)
[x_i, y_i, z_i]^T = M_Lidar_img · [X_i, Y_i, Z_i]^T    (2)
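For illustration only, the PnP solution of step 2.2 could be prototyped in Python with OpenCV as sketched below; the point correspondences, intrinsic matrix and distortion coefficients are placeholder values, and cv2.solvePnP stands in for the unspecified PnP algorithm of the patent.

```python
import cv2
import numpy as np

# Placeholder 3D-2D correspondences: lidar points P_Lidar_i (metres) and their
# projections p_img_i in the left image (pixels); at least 4 pairs are needed.
lidar_points_3d = np.array([[5.2, 1.1, 0.3], [6.8, -0.9, 0.4], [4.5, 0.2, 1.7],
                            [7.3, 1.8, 0.9], [9.1, -1.5, 0.6], [3.9, 0.8, 0.2]])
image_points_2d = np.array([[612.0, 304.0], [388.0, 295.0], [540.0, 180.0],
                            [660.0, 250.0], [352.0, 270.0], [700.0, 330.0]])

# Left-camera intrinsics M_L1 and distortion D_L from the binocular calibration.
M_L1 = np.array([[1000.0, 0.0, 640.0],
                 [0.0, 1000.0, 360.0],
                 [0.0, 0.0, 1.0]])
D_L = np.zeros(5)

# Solve for the lidar-to-camera rotation and translation, cf. equation (2).
ok, rvec, tvec = cv2.solvePnP(lidar_points_3d, image_points_2d, M_L1, D_L)
R, _ = cv2.Rodrigues(rvec)                 # 3x3 rotation matrix
M_Lidar_img = np.hstack([R, tvec])         # extrinsic [R | T]
print(M_Lidar_img)
```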
3. and acquiring depth information, namely a binocular disparity map, by using binocular calibration parameters through a binocular stereo matching algorithm.
3.1 In this step, the binocular camera images are first corrected using the distortion coefficients:
[u_rect_i, v_rect_i]^T = D · [u_i, v_i]^T    (3)
where D stands for the left and right camera distortion coefficients D_L, D_R; [u_i, v_i]^T are the pixel two-dimensional point coordinates of the left and right camera images; and [u_rect_i, v_rect_i]^T are the pixel two-dimensional point coordinates of the corrected left and right camera images.
3.2 Using the binocular calibration parameters, pixel points corresponding to the same target object in the left and right images are searched with the SGBM algorithm, and the disparity d_i = u_l - u_r is generated on the pixel points of the left camera image, where u_l and u_r are the abscissas of the pixel points of the same target object in the left and right images respectively.
3.3 Using the binocular disparity, the reprojection matrix Q_reprojection that maps an image pixel two-dimensional point [u_i, v_i]^T to a three-dimensional point [x_i, y_i, z_i]^T in the camera coordinate system is obtained based on the Bouguet algorithm, giving the reprojection relation:
[x_i, y_i, z_i]^T = Q_reprojection · [u_i, v_i, d_i]^T    (4)
where d_i is the disparity corresponding to the image pixel two-dimensional point [u_i, v_i]^T.
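As a sketch of steps 3.1-3.3, assuming already-rectified images and OpenCV's SGBM implementation; numDisparities, blockSize and the other parameters are illustrative, and in practice Q_reprojection is the Q matrix returned by cv2.stereoRectify rather than the identity placeholder used here.

```python
import cv2
import numpy as np

# Rectified grayscale image pair from step 3.1 (placeholder file names).
left = cv2.imread("left_rect.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rect.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching (SGBM) as in step 3.2.
sgbm = cv2.StereoSGBM_create(minDisparity=0,
                             numDisparities=128,   # must be divisible by 16
                             blockSize=5,
                             P1=8 * 5 * 5,
                             P2=32 * 5 * 5,
                             uniquenessRatio=10,
                             speckleWindowSize=100,
                             speckleRange=2)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0  # fixed point to pixels

# Reprojection of equation (4); Q would come from cv2.stereoRectify.
Q_reprojection = np.eye(4)
points_3d = cv2.reprojectImageTo3D(disparity, Q_reprojection)    # per-pixel (x, y, z)
```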
4. Acquire the category and position information of targets in the left camera image using a visual target detection neural network based on YOLOv4.
4.1 In this step, a target image data set is collected in advance, the road target categories and positions in the images are labeled, and the labeled data set is input as a training set into the YOLOv4 visual target detection neural network to train the network prediction weights. The trained network is then used to acquire the category and position information of targets in the left camera image.
4.2 The image acquired by the left camera is input into the YOLOv4 visual target detection neural network to obtain the category information class_i of the i-th target in the image and its minimum two-dimensional bounding box 2d_bbox_img_i(u_i, v_i, w_i, h_i), where u_i is the abscissa of the upper-left corner point of 2d_bbox_img_i, v_i is the ordinate of the upper-left corner point, w_i is the width of the minimum bounding box, and h_i is the height of the minimum bounding box.
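A sketch of the inference side of step 4, assuming a Darknet-format YOLOv4 network trained on the labeled road data set of step 4.1 and an OpenCV build recent enough to parse it; the file names are placeholders.

```python
import cv2

# Placeholder paths to the trained YOLOv4 configuration and weights.
net = cv2.dnn.readNetFromDarknet("yolov4-road.cfg", "yolov4-road.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(608, 608), scale=1.0 / 255, swapRB=True)

left_image = cv2.imread("left.png")
class_ids, confidences, boxes = model.detect(left_image,
                                             confThreshold=0.5,
                                             nmsThreshold=0.4)

# Each box is (u_i, v_i, w_i, h_i): upper-left corner, width and height,
# i.e. the 2d_bbox_img_i convention of step 4.2.
for class_i, conf, (u_i, v_i, w_i, h_i) in zip(class_ids, confidences, boxes):
    print(int(class_i), float(conf), u_i, v_i, w_i, h_i)
```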
5. Combine the binocular depth map with the left camera visual detection information to obtain the front target visual detection result.
5.1 In this step, as shown in fig. 2, (a) is a schematic output of the visual target detection neural network; (b) is the disparity map obtained in step 3, with different colored areas representing different disparity values; (c) is the output visual detection result. The specific procedure is as follows: the upper-left corner point p_LT_i(u_i, v_i) and the lower-right corner point p_RD_i(u_i + w_i/2, v_i + h_i/2) of the 2d_bbox_img_i(u_i, v_i, w_i, h_i) from step 4 are extracted and reprojected into the camera coordinate system by formula (4), obtaining the upper-left corner point P_LT_i(x_lt_i, y_lt_i, z_lt_i) and the lower-right corner point P_RD_i(x_rd_i, y_rd_i, z_rd_i) of the target three-dimensional bounding box.
5.2 The upper-left corner point P_LT_i and the lower-right corner point P_RD_i form the target three-dimensional minimum bounding box bbox_cam_i of the i-th visual detection target in the camera coordinate system. The geometric center P_i of bbox_cam_i is taken as the target three-dimensional position; both, together with the target class_i, are output as the front target visual detection result.
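A sketch of the corner reprojection of steps 5.1-5.2, reusing the disparity map and Q_reprojection from the previous sketch; the corner convention (u_i + w_i/2, v_i + h_i/2) follows the patent text as written.

```python
import numpy as np

def bbox_2d_to_3d(u_i, v_i, w_i, h_i, disparity, Q_reprojection):
    """Reproject the two corner points of a 2D detection box into the camera
    coordinate system via equation (4) and return bbox_cam_i and its centre P_i."""
    corners_2d = [(u_i, v_i),                           # upper-left p_LT_i
                  (u_i + w_i / 2.0, v_i + h_i / 2.0)]   # lower-right p_RD_i
    corners_3d = []
    for (u, v) in corners_2d:
        d = float(disparity[int(v), int(u)])            # disparity at that pixel
        vec = Q_reprojection @ np.array([u, v, d, 1.0]) # homogeneous reprojection
        corners_3d.append(vec[:3] / vec[3])
    P_LT_i, P_RD_i = corners_3d
    P_i = (P_LT_i + P_RD_i) / 2.0                       # geometric centre
    return (P_LT_i, P_RD_i), P_i
```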
6. Perform point cloud segmentation and clustering on the raw point cloud obtained by the laser radar to obtain the front target radar detection result.
6.1 In this step, the ground point cloud and noise points are filtered out of the raw laser radar point cloud with the RANSAC algorithm, and the point clouds of all road targets are segmented out. All road target point clouds are grouped by a clustering algorithm into several point cloud clusters, each of which contains all the points of one road target: Pt_k{(X_k1, Y_k1, Z_k1), (X_k2, Y_k2, Z_k2), ..., (X_kj, Y_kj, Z_kj)}, where k denotes the k-th target and (X_kj, Y_kj, Z_kj) are the three-dimensional coordinates of the j-th point of the k-th target.
6.2 The corner point coordinates of the minimum bounding box of each target point cloud cluster are calculated:
X_LT_k = max(X_k1, X_k2, ..., X_kj)    (5)
X_RD_k = min(X_k1, X_k2, ..., X_kj)    (6)
and in the same way the upper-left corner point Q_LT_k(X_LT_k, Y_LT_k, Z_LT_k) and the lower-right corner point Q_RD_k(X_RD_k, Y_RD_k, Z_RD_k) of the minimum bounding box of the k-th target point cloud cluster are obtained.
6.3 The upper-left corner point Q_LT_k and the lower-right corner point Q_RD_k form the target three-dimensional minimum bounding box bbox_lidar_k of the k-th radar detection target in the radar coordinate system. The geometric center Q_k of bbox_lidar_k is taken as the target three-dimensional position, and both are output as the front target radar detection result.
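Steps 6.1-6.3 might be prototyped with Open3D as follows; the patent only names RANSAC and "a clustering algorithm", so the DBSCAN call and all numeric thresholds here are assumptions of this illustration.

```python
import numpy as np
import open3d as o3d

# Raw lidar scan as an N x 3 array of (X, Y, Z) points (placeholder file).
points = np.load("scan.npy")
pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))

# Step 6.1: RANSAC plane fit removes the ground; the remainder are road targets.
_, ground_idx = pcd.segment_plane(distance_threshold=0.2, ransac_n=3,
                                  num_iterations=200)
targets = pcd.select_by_index(ground_idx, invert=True)

# Cluster the remaining points into point cloud clusters Pt_k (DBSCAN here).
labels = np.array(targets.cluster_dbscan(eps=0.6, min_points=10))
target_pts = np.asarray(targets.points)

# Steps 6.2-6.3: axis-aligned minimum bounding box and centre Q_k per cluster.
for k in range(labels.max() + 1):
    cluster = target_pts[labels == k]
    corner_min, corner_max = cluster.min(axis=0), cluster.max(axis=0)
    bbox_lidar_k = (corner_min, corner_max)
    Q_k = (corner_min + corner_max) / 2.0
```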
7. Perform temporal and spatial registration of the vision and radar detection results, match and fuse the front target detection results obtained by each, and output the fused front target information.
7.1 In this step, temporal registration of the vision and radar detection results is performed first; a visual bbox_cam_i and a radar bbox_lidar_k whose timestamps differ by less than 0.1 second are regarded as information at the same moment and are fused subsequently.
7.2 Using equation (2), bbox_cam_i located in the camera coordinate system is transformed into the radar coordinate system, realizing the spatial registration of the vision and radar detection results.
7.3 After temporal and spatial registration, the following matching cost is calculated between the visual bbox_cam_i and the radar bbox_lidar_k:
3DIoU_ik = |bbox_cam_i ∩ bbox_lidar_k| / |bbox_cam_i ∪ bbox_lidar_k| - d_ik^2 / c_ik^2    (7)
as shown in fig. 3, wherein, 3DIOUikMatching costs for the three-dimensional target frame of the ith visual detection target result and the kth radar detection target result; bboxcam_i I bboxlidar_kIs bboxcam_iAnd bboxlidar_kThe three-dimensional box volume of the intersection; bboxcam_i I bboxlidar_kIs bboxcam_iAnd bboxlidar_kA three-dimensional frame volume of the union; c. CikIs bboxcam_iAnd bboxlidar_kThe minimum length of the diagonal line surrounding the cuboid; d is a radical ofikIs a geometric center PiAnd QkThe euclidean distance of (c). Using Hungarian algorithm to match cost 3DIOUikOptimizing and solving to obtain a matching result with minimum cost
Figure RE-GDA0003566105140000062
Wherein assignnIs the radar detection target result which is globally and optimally matched with the nth visual detection target.
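A sketch of the cost matrix and Hungarian assignment of step 7.3, assuming axis-aligned boxes given as (min_corner, max_corner) pairs of NumPy arrays and using SciPy's linear_sum_assignment as the Hungarian solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def diou_3d(box_a, box_b):
    """Score of equation (7) for two axis-aligned 3D boxes given as (min, max)."""
    a_min, a_max = box_a
    b_min, b_max = box_b
    overlap = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None)
    inter_vol = overlap.prod()
    union_vol = (a_max - a_min).prod() + (b_max - b_min).prod() - inter_vol
    # c_ik: diagonal of the smallest cuboid enclosing both boxes.
    c_ik = np.linalg.norm(np.maximum(a_max, b_max) - np.minimum(a_min, b_min))
    # d_ik: distance between the geometric centres P_i and Q_k.
    d_ik = np.linalg.norm((a_min + a_max) / 2 - (b_min + b_max) / 2)
    return inter_vol / union_vol - d_ik ** 2 / c_ik ** 2

def match(cam_boxes, lidar_boxes):
    """Globally optimal assignment over the 3DIoU_ik matrix via the Hungarian algorithm."""
    score = np.array([[diou_3d(c, l) for l in lidar_boxes] for c in cam_boxes])
    rows, cols = linear_sum_assignment(-score)   # maximize the total score
    return list(zip(rows, cols)), score
```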
7.4 After the matching result is obtained, the visual and laser radar detection targets are divided into three categories: detected only by vision, detected only by radar, and detected by both radar and vision; the classification judgment is based on Table 1 below. A successfully matched target is one detected by both radar and vision, and its vision and radar detection information is combined, supplemented, and corrected: the successfully matched target information is corrected with formulas (8) and (9), with correction coefficients α = 0.35 and β = 0.65; the matched visual class_i is assigned to the target, together with the corrected three-dimensional position F_i and the corrected three-dimensional minimum bounding box bbox_fusion_i. For a target that is not successfully matched, the visual or radar detection information is retained; the fusion strategy is shown in Table 1.
F_i = α·P_i + β·Q_k    (8)
bbox_fusion_i = α·bbox_cam_i + β·bbox_lidar_k    (9)
TABLE 1
Matching result | Sensor(s) detecting the target | Fusion strategy
Unsuccessful | Vision only | Retain the visual detection result (class_i, P_i, bbox_cam_i)
Unsuccessful | Radar only | Retain the radar detection result (Q_k, bbox_lidar_k)
Successful | Radar and vision | Take the visual class_i; correct the position and bounding box by formulas (8) and (9)
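Continuing the same sketch, the strategy of Table 1 together with equations (8) and (9) might be applied as below; the α = 0.35, β = 0.65 weights follow the embodiment, while the detection dictionaries and the score gate for rejecting weak Hungarian pairings are assumptions of this illustration.

```python
ALPHA, BETA = 0.35, 0.65   # vision / radar correction coefficients, ALPHA + BETA = 1

def fuse(matches, score, cam_dets, lidar_dets, min_score=0.0):
    """Table 1: keep unmatched single-sensor detections, and for matched pairs
    take the visual class and blend the geometry per equations (8) and (9)."""
    fused, used_cam, used_lidar = [], set(), set()
    for i, k in matches:
        if score[i, k] < min_score:               # assumed gate on weak pairings
            continue
        used_cam.add(i)
        used_lidar.add(k)
        cam, lid = cam_dets[i], lidar_dets[k]
        fused.append({
            "class": cam["class"],                                   # class_i from vision
            "center": ALPHA * cam["center"] + BETA * lid["center"],  # equation (8)
            "bbox": ALPHA * cam["bbox"] + BETA * lid["bbox"],        # equation (9)
        })
    # Rows 1-2 of Table 1: unmatched targets keep their single-sensor information.
    fused += [cam_dets[i] for i in range(len(cam_dets)) if i not in used_cam]
    fused += [lidar_dets[k] for k in range(len(lidar_dets)) if k not in used_lidar]
    return fused
```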
The invention also provides a programmable processor of any type (FPGA, ASIC or other integrated circuit) for running a program, wherein the program performs the steps of the above embodiments when running.
The invention also provides corresponding computer equipment, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the steps in the above embodiment are realized when the processor executes the program.
Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the scope of the present invention should be determined by the following claims.

Claims (10)

1. A road target detection method based on binocular camera and laser radar fusion is characterized by comprising the following steps:
step 1, collecting front road target information by using a left camera, a right camera and a laser radar;
step 2, obtaining binocular parallax information through a binocular stereo matching algorithm;
step 3, utilizing a visual target detection neural network based on YOLOv4 to acquire the category and two-dimensional position information of a target in the left camera image as left camera visual detection information;
step 4, combining binocular parallax information and left camera vision detection information to obtain a three-dimensional vision detection result of the front target;
step 5, carrying out point cloud segmentation and clustering on the original point cloud obtained by the laser radar to obtain a front target radar detection result;
step 6, performing temporal and spatial registration on the visual detection result and the radar detection result, performing matching fusion on the front target detection results obtained by each, and outputting the fused front target information.
2. The method of claim 1, wherein step 1 comprises:
a binocular camera system is formed by a left camera and a right camera which are arranged in parallel, and left image information and right image information of a front road target are respectively collected; collecting point cloud information of a front road area through a laser radar;
the relative positions of the camera and the radar are kept unchanged, and before the camera and the laser radar are used for collecting road information, the left camera and the right camera are calibrated in a binocular mode, and the left camera and the laser radar are calibrated in a combined mode.
3. The method of claim 2, wherein the calibrating step specifically comprises:
according to the images collected by the left and right cameras, obtaining the intrinsic parameters M_L1, M_R1 and distortion coefficients D_L, D_R of the left and right cameras with the Zhang Zhengyou calibration method and the MATLAB calibration tool;
selecting 4 pairs of 3D points in the laser radar point cloud and their corresponding image pixel points, and using the 3D spatial point positions P_Lidar_i and the 2D image projection positions p_img_i of these points, obtaining the extrinsic parameter M_LR = [R_LR | T_LR] between the left camera and the laser radar with a PnP algorithm, completing the joint calibration.
4. The method of claim 1, wherein the step 2 comprises: performing stereo rectification on the images acquired by the left and right cameras using the intrinsic and extrinsic parameters from binocular calibration; searching pixel points corresponding to the same target object in the left and right images with an SGBM algorithm, and generating the parallax on the pixel points of the left camera image; and using the binocular parallax, obtaining, based on the Bouguet algorithm, the reprojection matrix Q_reprojection from an image pixel two-dimensional point [u_i, v_i]^T to a three-dimensional point [x_i, y_i, z_i]^T in the camera coordinate system.
5. The method of claim 1, wherein step 3 comprises:
acquiring a target image data set in advance, labeling the road target type and position in the image, inputting the labeled data set serving as a training set into a YOLOv4 visual target detection neural network, and training out a network prediction weight;
inputting the image acquired by the left camera into the YOLOv4 visual target detection neural network to obtain the category information class_i of the i-th target in the image and its minimum two-dimensional bounding box 2d_bbox_img_i(u_i, v_i, w_i, h_i), wherein u_i is the abscissa of the upper-left corner point of 2d_bbox_img_i, v_i is the ordinate of the upper-left corner point, w_i is the width of the minimum bounding box, and h_i is the height of the minimum bounding box.
6. The method of claim 4, wherein step 4 comprises: reprojecting the target visual two-dimensional information 2d_bbox_img_i obtained by the neural network in step 3 into the camera coordinate system using the reprojection matrix Q_reprojection of step 2, to obtain the three-dimensional minimum bounding box bbox_cam_i of the i-th visual detection target in the camera coordinate system; and taking the geometric center P_i of bbox_cam_i as the target three-dimensional position and outputting it, together with bbox_cam_i and class_i, as the front target visual detection result.
7. The method of claim 1, wherein the step 5 comprises: filtering out the ground point cloud and noise point cloud with the RANSAC algorithm, and segmenting out the point clouds of all road targets; clustering all road target point clouds into several point cloud clusters with a clustering algorithm, each point cloud cluster containing all the points of one road target; calculating the minimum three-dimensional bounding box bbox_lidar_k of each target point cloud cluster; and taking the geometric center Q_k of bbox_lidar_k as the target three-dimensional position and outputting it, together with bbox_lidar_k, as the front target radar detection result.
8. The method of claim 1, wherein the step 6 comprises:
performing temporal registration of the vision and radar detection results first, wherein a visual bbox_cam_i and a radar bbox_lidar_k whose timestamp difference is less than a preset time are regarded as information at the same moment and are fused subsequently;
transforming the visual detection result into the radar coordinate system using the extrinsic parameter M_LR = [R_LR | T_LR] obtained by the joint calibration of the camera and the laser radar, realizing the spatial registration of the visual and radar detection results;
and performing 3D IoU matching cost calculation between the temporally and spatially registered visual bbox_cam_i and radar bbox_lidar_k.
9. The method of claim 8, wherein the matching cost calculation formula is:
3DIoU_ik = |bbox_cam_i ∩ bbox_lidar_k| / |bbox_cam_i ∪ bbox_lidar_k| - d_ik^2 / c_ik^2
wherein bbox_cam_i ∩ bbox_lidar_k is the volume of the intersection of the three-dimensional boxes bbox_cam_i and bbox_lidar_k; bbox_cam_i ∪ bbox_lidar_k is the volume of their union; c_ik is the diagonal length of the minimum cuboid enclosing both boxes; and d_ik is the Euclidean distance between the geometric centers P_i and Q_k.
10. The method of claim 8, wherein the matching results are used to classify the visual and laser radar detection targets into three categories: detected only by vision, detected only by radar, and detected by both radar and vision; a target detected only by vision or only by radar is regarded as unsuccessfully matched, a target detected by both radar and vision is regarded as successfully matched, the visual or radar detection information is retained for an unsuccessfully matched target, and the information of a successfully matched target is further supplemented and corrected.
CN202210110972.0A 2022-01-29 2022-01-29 Road target detection method based on fusion of binocular camera and laser radar Pending CN114463303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210110972.0A CN114463303A (en) 2022-01-29 2022-01-29 Road target detection method based on fusion of binocular camera and laser radar

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210110972.0A CN114463303A (en) 2022-01-29 2022-01-29 Road target detection method based on fusion of binocular camera and laser radar

Publications (1)

Publication Number Publication Date
CN114463303A true CN114463303A (en) 2022-05-10

Family

ID=81410690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210110972.0A Pending CN114463303A (en) 2022-01-29 2022-01-29 Road target detection method based on fusion of binocular camera and laser radar

Country Status (1)

Country Link
CN (1) CN114463303A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830079A (en) * 2023-02-15 2023-03-21 天翼交通科技有限公司 Method, device and medium for tracking trajectory of traffic participant
CN117372987A (en) * 2023-12-08 2024-01-09 山东高速工程检测有限公司 Road three-dimensional data processing method and device, storage medium and electronic equipment
CN117372987B (en) * 2023-12-08 2024-01-30 山东高速工程检测有限公司 Road three-dimensional data processing method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN110942449B (en) Vehicle detection method based on laser and vision fusion
CN111951305B (en) Target detection and motion state estimation method based on vision and laser radar
CN112396650B (en) Target ranging system and method based on fusion of image and laser radar
CN110555407B (en) Pavement vehicle space identification method and electronic equipment
CN109410264B (en) Front vehicle distance measuring method based on laser point cloud and image fusion
CN110288659B (en) Depth imaging and information acquisition method based on binocular vision
CN109263637B (en) Collision prediction method and device
CN114463303A (en) Road target detection method based on fusion of binocular camera and laser radar
CN114359181B (en) Intelligent traffic target fusion detection method and system based on image and point cloud
CN108645375B (en) Rapid vehicle distance measurement optimization method for vehicle-mounted binocular system
CN107796373B (en) Distance measurement method based on monocular vision of front vehicle driven by lane plane geometric model
CN115032651A (en) Target detection method based on fusion of laser radar and machine vision
CN113205604A (en) Feasible region detection method based on camera and laser radar
WO2022183685A1 (en) Target detection method, electronic medium and computer storage medium
CN113743391A (en) Three-dimensional obstacle detection system and method applied to low-speed autonomous driving robot
CN111723778B (en) Vehicle distance measuring system and method based on MobileNet-SSD
CN113920183A (en) Monocular vision-based vehicle front obstacle distance measurement method
CN111323767B (en) System and method for detecting obstacle of unmanned vehicle at night
CN115546741A (en) Binocular vision and laser radar unmanned ship marine environment obstacle identification method
CN115187941A (en) Target detection positioning method, system, equipment and storage medium
CN113327296B (en) Laser radar and camera online combined calibration method based on depth weighting
CN113988197A (en) Multi-camera and multi-laser radar based combined calibration and target fusion detection method
CN110197104B (en) Distance measurement method and device based on vehicle
CN114298151A (en) 3D target detection method based on point cloud data and image data fusion
CN111951339A (en) Image processing method for performing parallax calculation by using heterogeneous binocular cameras

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination