CN114332158B - 3D real-time multi-target tracking method based on fusion of camera and laser radar - Google Patents

3D real-time multi-target tracking method based on fusion of camera and laser radar

Info

Publication number
CN114332158B
CN114332158B (application CN202111553630.8A)
Authority
CN
China
Prior art keywords
dimensional
track
camera
target object
laser radar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111553630.8A
Other languages
Chinese (zh)
Other versions
CN114332158A (en)
Inventor
Xiyang Wang
Chunyun Fu
Ying Lai
Zhankun Li
Jiawei He
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202111553630.8A priority Critical patent/CN114332158B/en
Publication of CN114332158A publication Critical patent/CN114332158A/en
Application granted granted Critical
Publication of CN114332158B publication Critical patent/CN114332158B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Optical Radar Systems And Details Thereof (AREA)

Abstract

The invention relates to a 3D real-time multi-target tracking method based on the fusion of a camera and a laser radar, belonging to the field of environmental perception, and comprising the following steps: S1: obtaining two-dimensional and three-dimensional information of the target objects at each moment; S2: fusing the detections to obtain the target objects detected by both the laser radar and the camera, those detected only by the laser radar, and those detected only by the camera; S3: matching the target objects detected by both sensors with the three-dimensional tracks; S4: matching the target objects detected only by the laser radar with the remaining unmatched three-dimensional tracks; S5: matching the target objects detected only by the camera with the two-dimensional tracks; S6: matching the three-dimensional tracks with the two-dimensional tracks; S7: managing the tracks.

Description

3D real-time multi-target tracking method based on fusion of camera and laser radar
Technical Field
The invention belongs to the field of environment perception, and relates to a 3D real-time multi-target tracking method based on fusion of a camera and a laser radar.
Background
The currently mainstream multi-target tracking methods follow the tracking-by-detection paradigm, whose process is divided into two steps: 1) target detection; 2) data association. As the accuracy of target detection keeps improving, tracking accuracy improves correspondingly, but many difficulties remain in the data association stage, which is the most important stage of multi-target tracking; overcoming the missed tracking and false tracking caused by inaccurate detection and occlusion is still a challenge. Existing multi-target tracking methods are mainly divided into camera-based target tracking and laser-radar-based target tracking.
Camera-based methods use the information of a target object on an RGB image, typically appearance information and motion information, to accomplish the object-similarity association task. Camera-based multi-target tracking is typically 2D, i.e. tracking on the image plane, although 3D tracking that extracts depth with a binocular camera also exists. The mainstream pipeline extracts target information with a target detection algorithm, predicts with Kalman filtering, computes a cost matrix between the objects of consecutive frames, and finally matches and associates them with the Hungarian algorithm or a greedy algorithm. Later, researchers proposed a single-stage tracking framework (Jointly Learns the Detector and Embedding model), considering that detection and tracking can be performed simultaneously; it learns target detection and appearance embedding within a shared structure, which helps prevent the false tracking caused by missed detections, alleviates missed tracking, and obtains better results.
Lidar-based methods naturally provide depth information and thus facilitate 3D tracking. Since deep learning has made major breakthroughs in processing laser radar point cloud data, laser-radar-based 3D tracking has become increasingly popular. However, tracking with laser radar point cloud data lacks the pixel information available to vision; despite recent progress in point cloud feature extraction, point-cloud-based appearance features are, on the whole, less accurate than vision-based appearance features.
Disclosure of Invention
Accordingly, the present invention is directed to a 3D real-time multi-target tracking method based on the fusion of a camera and a laser radar.
In order to achieve the above purpose, the present invention provides the following technical solutions:
The invention discloses a 3D real-time multi-target tracking method based on the fusion of a camera and a laser radar, comprising the following steps:
S1: obtaining two-dimensional information of the target object at each moment by using a camera-based 2D detector, and obtaining three-dimensional information of the target object at each moment by using a laser radar-based 3D detector;
S2: three-dimensional information of a target object obtained by detection based on the laser radar is projected onto an image plane through coordinate system conversion and is fused with the target object obtained by detection based on the camera, so that the target object detected by both sensors, the target object detected by only the laser radar and the target object detected by only the camera are obtained;
S3: matching the target object detected by the two sensors at the time t with the three-dimensional track at the time t-1 through Kalman filtering prediction to obtain a three-dimensional track at the time t, and obtaining a successfully matched track and an unsuccessfully matched track;
S4: matching the target objects detected only by the laser radar with the remaining unmatched three-dimensional tracks;
S5: matching a target object detected by a camera at the moment t with a two-dimensional track at the moment t-1 through Kalman filtering prediction to obtain a two-dimensional track at the moment t;
S6: projecting the three-dimensional tracks at time t onto the image plane through a coordinate system transformation and matching them with the corresponding two-dimensional tracks;
S7: initializing a new confirmed track for each unmatched target detected by both sensors; initializing a to-be-confirmed three-dimensional track for each unmatched target detected only by the laser radar, which is converted into a confirmed track if it is matched in three consecutive frames; initializing a to-be-confirmed two-dimensional track for each unmatched target detected only by the camera, which is converted into a confirmed track if it is matched in three consecutive frames; unmatched two-dimensional and three-dimensional tracks are retained for 6 frames.
Further, the information of the target object obtained by the 3D detector in step S1 is represented by a three-dimensional bounding box = (x, y, z, w, h, l, θ), where (x, y, z) is the coordinate of the center point of the three-dimensional detection frame in the laser radar coordinate system, (w, h, l) are the width, height and length of the three-dimensional detection frame, and θ is the orientation angle (the heading angle); the target object detected by the 2D detector is represented by a two-dimensional bounding box = (x_c, y_c, w, h), where (x_c, y_c) is the coordinate of the center point of the target detection frame in the pixel coordinate system, and (w, h) are the width and height of the detection frame, respectively.
Further, the step S2 specifically includes the following steps:
S21: the three-dimensional bounding box is projected onto the image plane through coordinate system conversion to obtain a two-dimensional bounding box, and the intersection-over-union with the bounding box of the target object detected by the camera is then calculated as:
d_2diou = area(b_3d_2d ∩ b_2d) / area(b_3d_2d ∪ b_2d)
where b_3d_2d denotes the two-dimensional bounding box projected from the three-dimensional one, and b_2d denotes the two-dimensional bounding box of the target object detected by the camera;
S22: the calculated intersection-over-union d_2diou is compared with a threshold σ_1; if d_2diou ≥ σ_1, the target is considered to be detected by both the laser radar and the camera and is denoted D_fusion; if d_2diou < σ_1, the detection is considered to come from a single sensor only, an object detected only by the camera being denoted D_only2d and an object detected only by the laser radar being denoted D_only3d.
Further, the step S3 specifically includes:
S31: let the target objects detected by the detectors at time t be D_fusion = {D_1, D_2, …, D_n} and the tracks predicted from time t−1 to time t by Kalman filtering be T_3D = {T_1, T_2, …, T_m}; the intersection-over-union and the normalized Euclidean distance between D_fusion and T_3D are calculated to form the cost function, where the intersection-over-union is the ratio of the intersection of the two bounding boxes to their union, and the normalized Euclidean distance is obtained by normalizing the Euclidean distance between the two bounding boxes;
S32: if the cost between a measurement and a track is greater than the threshold σ_2, the track is considered to be successfully matched with the measurement and the matched track T_matched is updated with the corresponding measurement information; if the cost is smaller than σ_2, the unmatched measurements D_unmatched and the unmatched tracks T_unmatched enter the next matching stage.
Further, the step S4 specifically includes:
S41: the unmatched tracks T_unmatched from step S32 are matched with the objects D_only3d detected only by the laser radar using the Hungarian algorithm; the cost function is formed in the same way as in step S31, from the intersection-over-union and the normalized Euclidean distance between the detections and the tracks;
S42: the cost function is compared with a threshold; if the cost function is larger than the threshold, the measurement and the track are matched successfully and the track is updated with the corresponding measurement; the remaining unmatched measurements initialize to-be-confirmed tracks, and the unmatched tracks enter the next matching stage.
Further, the Hungarian algorithm in step S41 is as follows:
S411: subtracting the minimum value of each row from each row of the cost matrix formed by the cost function;
S412: subtracting the minimum value of each column from that column of the new matrix;
S413: covering all zero elements in the new matrix with the minimum number of row and column lines; if the lines cannot cover all zero elements, entering S414, otherwise entering S415;
S414: finding the minimum value among the elements not covered by any row or column line, subtracting it from all uncovered elements, adding it to the elements at the intersections of row and column lines, and returning to S413;
S415: starting the matching from the row or column with the fewest zero elements until every row is matched, which gives the optimal assignment.
Further, the step S5 specifically includes:
S51: matching the two-dimensional tracks in the image plane with the target objects detected only by the camera using the Hungarian algorithm, the cost function being calculated in the same way as in step S41;
S52: the cost function is compared with a threshold σ_3; if the cost function is larger than σ_3, the measurement and the track are matched successfully and the two-dimensional track is updated with the corresponding measurement; if the cost function is smaller than σ_3, the unmatched measurements initialize to-be-confirmed tracks.
Further, the step S6 specifically includes:
S61: the three-dimensional tracks are projected onto the image plane through a coordinate system transformation to form two-dimensional tracks; the three-dimensional tracks are expressed in the laser radar coordinate system, and the three-dimensional bounding box_3D of each three-dimensional track is converted to the image plane to form a two-dimensional bounding box_3D_to_2D in the image coordinate system as follows:
bounding box_3D_to_2D = P_rect · R_0 · Tr_velo_to_cam · bounding box_3D
where P_rect is the intrinsic projection matrix of the camera, R_0 is the rectification rotation matrix, and Tr_velo_to_cam is the transformation matrix that converts points from the laser radar to the camera coordinate system;
S62: the converted two-dimensional bounding box_3D_to_2D is matched with the two-dimensional track bounding box_2D formed in step S52, with the cost function calculated in the same way as in step S5;
S63: the cost value of each successfully matched pair is compared with a threshold σ_4; if the cost value is larger than σ_4, the two-dimensional track and the three-dimensional track are fused, i.e. the two-dimensional track information is updated with the three-dimensional track information.
The invention has the beneficial effects that: the tracking framework of the invention can integrate arbitrary mainstream 2D and 3D detectors, fully fuses the characteristics of the camera and the laser radar, and realizes the fusion of 2D and 3D tracks: 2D tracking is carried out when the target is far away and detected only by the camera, and 3D tracking is carried out once the target enters the detection range of the laser radar sensor.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in detail below with reference to the preferred embodiments and the accompanying drawings, in which:
FIG. 1 is a flow chart of an algorithm of the present invention;
fig. 2 is a schematic diagram showing the fusion of a three-dimensional track and a two-dimensional track.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may also be practiced or applied through other different embodiments, and the details of the present description may be modified or varied in various respects without departing from the spirit and scope of the present invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the present invention schematically, and the following embodiments and the features in the embodiments may be combined with each other without conflict.
Wherein the drawings are for illustrative purposes only and are shown in schematic, non-physical, and not intended to limit the invention; for the purpose of better illustrating embodiments of the invention, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the size of the actual product; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The same or similar reference numbers in the drawings of embodiments of the invention correspond to the same or similar components; in the description of the present invention, it should be understood that, if there are terms such as "upper", "lower", "left", "right", "front", "rear", etc., that indicate an azimuth or a positional relationship based on the azimuth or the positional relationship shown in the drawings, it is only for convenience of describing the present invention and simplifying the description, but not for indicating or suggesting that the referred device or element must have a specific azimuth, be constructed and operated in a specific azimuth, so that the terms describing the positional relationship in the drawings are merely for exemplary illustration and should not be construed as limiting the present invention, and that the specific meaning of the above terms may be understood by those of ordinary skill in the art according to the specific circumstances.
Referring to fig. 1-2, the invention provides a 3D real-time multi-target tracking method based on fusion of a camera and a laser radar, comprising the following steps:
Step 1: Obtaining two-dimensional information of the target object at each moment by using a camera-based 2D detector, and obtaining three-dimensional information of the target object at each moment by using a laser-radar-based 3D detector. The information of the target object obtained by the 3D detector is represented by a three-dimensional bounding box = (x, y, z, w, h, l, θ), where (x, y, z) is the coordinate of the center point of the three-dimensional detection frame in the laser radar coordinate system, (w, h, l) are the width, height and length of the three-dimensional detection frame, and θ is the orientation angle (the heading angle); the target object detected by the 2D detector is represented by a two-dimensional bounding box = (x_c, y_c, w, h), where (x_c, y_c) is the coordinate of the center point of the target detection frame in the pixel coordinate system, and (w, h) are the width and height of the detection frame.
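As an illustration only (not part of the patent text), the two detection representations described above might be held in code as follows; this is a minimal Python sketch, and the class and field names are assumptions introduced here for clarity.

```python
from dataclasses import dataclass

@dataclass
class Detection3D:
    """LiDAR-based 3D detector output: a box in the laser radar coordinate system."""
    x: float      # center of the 3D detection frame
    y: float
    z: float
    w: float      # width
    h: float      # height
    l: float      # length
    theta: float  # orientation (heading) angle

@dataclass
class Detection2D:
    """Camera-based 2D detector output: a box in the pixel coordinate system."""
    xc: float     # center of the 2D detection frame
    yc: float
    w: float      # width
    h: float      # height
```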
Step 2: three-dimensional information of the target object obtained by detection based on the laser radar is converted and projected onto an image plane through a coordinate system, and is fused with the target object obtained by detection based on the camera, so that the target object detected by both sensors, the target object detected only by the laser radar and the target object detected only by the camera can be obtained; the method specifically comprises the following steps:
2-1: The three-dimensional bounding box is projected onto the image plane through coordinate system conversion to obtain a two-dimensional bounding box, which is then used to calculate the intersection-over-union with the bounding box of the target object detected by the camera: d_2diou = area(b_3d_2d ∩ b_2d) / area(b_3d_2d ∪ b_2d), where b_3d_2d denotes the two-dimensional bounding box projected from the three-dimensional one, and b_2d denotes the two-dimensional bounding box of the target object detected by the camera.
2-2: The calculated intersection-over-union d_2diou is compared with a threshold σ_1; if d_2diou ≥ σ_1, the target is considered to be detected by both the laser radar and the camera and is denoted D_fusion; if d_2diou < σ_1, the detection is considered to come from a single sensor only, an object detected only by the camera being denoted D_only2d and an object detected only by the laser radar being denoted D_only3d.
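A minimal sketch of this fusion step is shown below. It assumes the three-dimensional boxes have already been projected to the image plane and that all boxes are given in (x1, y1, x2, y2) corner form; the greedy best-IoU pairing and the default value of σ_1 are illustrative assumptions, since the text above only specifies the IoU threshold test.

```python
def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def fuse_detections(boxes_3d_proj, boxes_2d, sigma1=0.5):
    """Split detections into fused / LiDAR-only / camera-only sets (step 2).

    boxes_3d_proj: LiDAR 3D boxes already projected to the image plane.
    boxes_2d:      camera detections.
    A greedy best-IoU pairing is used here purely for simplicity."""
    fused, lidar_only, used_2d = [], [], set()
    for i, b3 in enumerate(boxes_3d_proj):
        best_j, best_iou = -1, 0.0
        for j, b2 in enumerate(boxes_2d):
            if j in used_2d:
                continue
            v = iou_2d(b3, b2)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_iou >= sigma1:          # d_2diou >= sigma_1: seen by both sensors
            fused.append((i, best_j))
            used_2d.add(best_j)
        else:                           # d_2diou < sigma_1: LiDAR-only detection
            lidar_only.append(i)
    camera_only = [j for j in range(len(boxes_2d)) if j not in used_2d]
    return fused, lidar_only, camera_only
```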
Step 3: matching the target object detected by the two sensors at the time t with the three-dimensional track at the time t-1 through Kalman filtering prediction to obtain a three-dimensional track at the time t, and obtaining a track successfully matched and a track unsuccessfully matched; the method specifically comprises the following steps:
3-1: Let the target objects detected by the detectors at time t be D_fusion = {D_1, D_2, …, D_n} and the tracks predicted from time t−1 to time t by Kalman filtering be T_3D = {T_1, T_2, …, T_m}; the intersection-over-union and the normalized Euclidean distance between D_fusion and T_3D are calculated to form the cost function, where the intersection-over-union is the ratio of the intersection of the two bounding boxes to their union, and the normalized Euclidean distance is obtained by normalizing the Euclidean distance between the two bounding boxes.
3-2: If the cost between a measurement and a track is greater than the threshold σ_2, the track is considered to be successfully matched with the measurement and is updated with the corresponding measurement information (T_matched); if the cost is smaller than σ_2, the unmatched measurement (D_unmatched) is initialized as a new confirmed track, and the unmatched track (T_unmatched) enters the next matching stage.
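The following sketch illustrates the Kalman filtering prediction and the detection-track affinity used in this step. The state vector layout, the noise settings and, in particular, the way the IoU and the normalized Euclidean distance are mixed are assumptions made here for illustration; the exact cost formula of the patent is not reproduced in the text above.

```python
import numpy as np

class Track3D:
    """Constant-velocity Kalman track over the 3D box (sketch, not the patent's exact filter).

    State: [x, y, z, w, h, l, theta, vx, vy, vz]; the patent does not fix the
    state vector, so this layout is an assumption."""

    def __init__(self, det, dt=0.1):
        self.x = np.zeros(10)
        self.x[:7] = det                       # (x, y, z, w, h, l, theta)
        self.P = np.eye(10) * 10.0             # state covariance
        self.F = np.eye(10)                    # constant-velocity transition
        self.F[0, 7] = self.F[1, 8] = self.F[2, 9] = dt
        self.Q = np.eye(10) * 0.1              # process noise (placeholder values)

    def predict(self):
        """Predict the track from time t-1 to time t (the T_3D of step 3)."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:7]

def affinity(iou_value, center_det, center_trk, norm=10.0):
    """Mix an IoU value and a normalized Euclidean distance into one score.

    The exact mixing rule of the patent's cost function is not given in the
    text above, so an equal-weight combination is used purely as a placeholder."""
    d = np.linalg.norm(np.asarray(center_det, float) - np.asarray(center_trk, float)) / norm
    return 0.5 * iou_value + 0.5 * (1.0 - min(d, 1.0))
```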
Step 4: Matching the target objects detected only by the laser radar with the remaining unmatched three-dimensional tracks; the method specifically comprises the following steps:
4-1: The unmatched tracks from step 3 are matched with the objects detected only by the laser radar using the Hungarian algorithm; the cost function is calculated in the same way as in step 3, from the intersection-over-union and the normalized Euclidean distance between the detections and the tracks.
The Hungarian algorithm comprises the following steps (a sketch using an off-the-shelf assignment solver is given after step 4-2):
1) Subtracting the minimum value of each row from each row of the cost matrix formed by the cost function
2) Subtracting the minimum value of the column from each column of the new matrix
3) Covering all zero elements in the new matrix with the minimum number of row and column lines; if the lines cannot cover all zero elements, entering step 4), otherwise entering step 5)
4) Finding the minimum value among the elements not covered by any row or column line, subtracting it from all uncovered elements, adding it to the elements at the intersections of row and column lines, and returning to step 3)
5) Starting the matching from the row or column with the fewest zero elements until every row is matched, which gives the optimal assignment
4-2: The cost function is compared with a threshold; if the cost function is larger than the threshold, the measurement and the track are matched successfully and the track is updated with the corresponding measurement; the remaining unmatched measurements initialize to-be-confirmed tracks, and the unmatched tracks enter the next matching stage.
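Steps 1)-5) above are the classical row/column-reduction form of the Hungarian (Munkres) algorithm. In practice an off-the-shelf optimal-assignment solver yields the same assignment; the sketch below is an illustration rather than the patent's implementation, using scipy.optimize.linear_sum_assignment and applying the threshold gating of step 4-2 afterwards.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(affinity_matrix, threshold):
    """Optimal track/detection assignment from an affinity matrix
    (rows: tracks, cols: detections). linear_sum_assignment minimizes
    cost, so the affinity is negated; pairs whose affinity does not
    exceed `threshold` are discarded afterwards, mirroring step 4-2."""
    affinity_matrix = np.atleast_2d(np.asarray(affinity_matrix, dtype=float))
    n_trk, n_det = affinity_matrix.shape
    matches = []
    unmatched_trk, unmatched_det = set(range(n_trk)), set(range(n_det))
    if n_trk and n_det:
        rows, cols = linear_sum_assignment(-affinity_matrix)
        for r, c in zip(rows, cols):
            if affinity_matrix[r, c] > threshold:
                matches.append((r, c))
                unmatched_trk.discard(r)
                unmatched_det.discard(c)
    return matches, sorted(unmatched_trk), sorted(unmatched_det)
```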
Step 5: matching a target object detected by a camera at the moment t with a two-dimensional track at the moment t-1 through Kalman filtering prediction to obtain a two-dimensional track at the moment t; the method specifically comprises the following steps:
5-1: Matching the two-dimensional tracks in the image plane with the target objects detected only by the camera using the Hungarian algorithm;
5-2: The cost function is compared with a threshold σ_3; if the cost function is larger than σ_3, the measurement and the track are matched successfully and the two-dimensional track is updated with the corresponding measurement; if the cost function is smaller than σ_3, the unmatched measurements initialize to-be-confirmed tracks.
Step 6: the three-dimensional track at the time t is projected to an image plane through coordinate system transformation to be matched with the corresponding two-dimensional track; the method specifically comprises the following steps:
6-1: The three-dimensional tracks are projected onto the image plane through a coordinate system transformation to form two-dimensional tracks; the three-dimensional tracks are expressed in the laser radar coordinate system, and the three-dimensional bounding box_3D of each three-dimensional track is converted to the image plane to form a two-dimensional bounding box_3D_to_2D in the image coordinate system as follows:
bounding box_3D_to_2D = P_rect · R_0 · Tr_velo_to_cam · bounding box_3D
where P_rect is the intrinsic projection matrix of the camera, R_0 is the rectification rotation matrix, and Tr_velo_to_cam is the transformation matrix that converts points from the laser radar to the camera coordinate system;
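A sketch of this projection is given below. It assumes KITTI-style calibration with R_0 and Tr_velo_to_cam already expanded to 4x4 homogeneous matrices and P_rect given as a 3x4 projection matrix, and it assumes a particular corner/axis convention for the 3D box; the enclosing axis-aligned rectangle of the eight projected corners is taken as bounding box_3D_to_2D.

```python
import numpy as np

def box3d_corners(x, y, z, w, h, l, theta):
    """Eight corners of an oriented 3D box in the laser radar frame.

    The axis convention (yaw about z, l along x, w along y, h along z) is an
    assumption; adapt it to the dataset's definition."""
    dx, dy, dz = l / 2.0, w / 2.0, h / 2.0
    corners = np.array([[ dx,  dy, -dz], [ dx, -dy, -dz], [-dx, -dy, -dz], [-dx,  dy, -dz],
                        [ dx,  dy,  dz], [ dx, -dy,  dz], [-dx, -dy,  dz], [-dx,  dy,  dz]])
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return corners @ rot.T + np.array([x, y, z])

def project_box_to_image(corners_lidar, P_rect, R_0, Tr_velo_to_cam):
    """bounding box_3D_to_2D = P_rect . R_0 . Tr_velo_to_cam . (homogeneous corners)."""
    pts = np.hstack([corners_lidar, np.ones((corners_lidar.shape[0], 1))])  # N x 4
    cam = R_0 @ Tr_velo_to_cam @ pts.T          # 4 x N points in the camera frame
    img = P_rect @ cam                          # 3 x N homogeneous pixel coordinates
    img = img[:2] / img[2]                      # perspective division
    # enclosing axis-aligned rectangle of the projected corners
    return np.array([img[0].min(), img[1].min(), img[0].max(), img[1].max()])
```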
6-2: The converted two-dimensional bounding box_3D_to_2D is matched with the two-dimensional track bounding box_2D formed in step 5 using the Hungarian algorithm, with the same cost function as in step 5;
6-3: Fig. 2 shows a schematic diagram of track fusion. For the object with track number 2268, the camera detects the object from frame 7, while the laser radar does not detect it until frame 33; the method starts 2D tracking of the object at frame 9 and switches to 3D tracking once the object enters the range of the laser radar sensor, i.e. from frame 33.
Step 7: A new confirmed track is initialized for each unmatched target detected by both sensors; an unmatched target detected only by the laser radar initializes a to-be-confirmed three-dimensional track, which is converted into a confirmed track if it is matched in three consecutive frames; an unmatched target detected only by the camera initializes a to-be-confirmed two-dimensional track, which is converted into a confirmed track if it is matched in three consecutive frames; unmatched two-dimensional and three-dimensional tracks are retained for 6 frames.
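The track management of step 7 can be sketched as a small state machine; the data layout and function names below are illustrative assumptions, with the confirmation count and retention age taken from the description (3 consecutive matched frames, 6 frames of retention).

```python
class TrackManager:
    """Track lifecycle of step 7 (sketch; field names are illustrative).

    Targets seen by both sensors start as confirmed tracks; single-sensor
    targets start as to-be-confirmed (tentative) tracks and are confirmed
    after `n_confirm` consecutive matched frames; tracks unmatched for more
    than `max_age` frames are dropped."""

    def __init__(self, n_confirm=3, max_age=6):
        self.n_confirm = n_confirm
        self.max_age = max_age
        self.tracks = []   # each track: {"id", "state", "hits", "misses"}
        self._next_id = 0

    def step(self, matched_ids, n_new_fused, n_new_single):
        for t in self.tracks:
            if t["id"] in matched_ids:
                t["hits"] += 1
                t["misses"] = 0
                if t["state"] == "tentative" and t["hits"] >= self.n_confirm:
                    t["state"] = "confirmed"
            else:
                t["misses"] += 1
                if t["state"] == "tentative":
                    t["hits"] = 0            # confirmation requires consecutive matches
        # drop tracks that stayed unmatched for longer than max_age frames
        self.tracks = [t for t in self.tracks if t["misses"] <= self.max_age]
        # unmatched detections seen by both sensors initialize confirmed tracks
        for _ in range(n_new_fused):
            self.tracks.append({"id": self._new_id(), "state": "confirmed", "hits": 1, "misses": 0})
        # unmatched single-sensor detections initialize to-be-confirmed tracks
        for _ in range(n_new_single):
            self.tracks.append({"id": self._new_id(), "state": "tentative", "hits": 1, "misses": 0})

    def _new_id(self):
        self._next_id += 1
        return self._next_id
```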
finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims (8)

1. A 3D real-time multi-target tracking method based on camera and laser radar fusion, characterized in that the method comprises the following steps:
S1: obtaining two-dimensional information of the target object at each moment by using a camera-based 2D detector, and obtaining three-dimensional information of the target object at each moment by using a laser radar-based 3D detector;
S2: three-dimensional information of a target object obtained by detection based on the laser radar is projected onto an image plane through coordinate system conversion and is fused with the target object obtained by detection based on the camera, so that the target object detected by both sensors, the target object detected by only the laser radar and the target object detected by only the camera are obtained;
S3: matching the target object detected by the two sensors at the time t with the three-dimensional track at the time t-1 through Kalman filtering prediction to obtain a three-dimensional track at the time t, and obtaining a successfully matched track and an unsuccessfully matched track;
S4: matching the target objects detected only by the laser radar with the remaining unmatched three-dimensional tracks;
S5: matching a target object detected by a camera at the moment t with a two-dimensional track at the moment t-1 through Kalman filtering prediction to obtain a two-dimensional track at the moment t;
S6: projecting the three-dimensional tracks at time t onto the image plane through a coordinate system transformation and matching them with the corresponding two-dimensional tracks;
S7: initializing a new confirmed track for each unmatched target detected by both sensors; initializing a to-be-confirmed three-dimensional track for each unmatched target detected only by the laser radar, which is converted into a confirmed track if it is matched in three consecutive frames; initializing a to-be-confirmed two-dimensional track for each unmatched target detected only by the camera, which is converted into a confirmed track if it is matched in three consecutive frames; unmatched two-dimensional and three-dimensional tracks are retained for 6 frames.
2. The camera and lidar fusion-based 3D real-time multi-target tracking method of claim 1, wherein: the information of the target object obtained by the 3D detector in step S1 is represented by a three-dimensional bounding box = (x, y, z, w, h, l, θ), where (x, y, z) is the coordinate of the center point of the three-dimensional detection frame in the laser radar coordinate system, (w, h, l) are the width, height and length of the three-dimensional detection frame, and θ is the orientation angle (the heading angle); the target object detected by the 2D detector is represented by a two-dimensional bounding box = (x_c, y_c, w, h), where (x_c, y_c) is the coordinate of the center point of the target detection frame in the pixel coordinate system, and (w, h) are the width and height of the detection frame.
3. The camera and lidar fusion-based 3D real-time multi-target tracking method of claim 1, wherein: the step S2 specifically comprises the following steps:
S21: the three-dimensional bounding box is projected onto the image plane through coordinate system conversion to obtain a two-dimensional bounding box, and the intersection-over-union with the bounding box of the target object detected by the camera is then calculated as:
d_2diou = area(b_3d_2d ∩ b_2d) / area(b_3d_2d ∪ b_2d)
where b_3d_2d denotes the two-dimensional bounding box projected from the three-dimensional one, and b_2d denotes the two-dimensional bounding box of the target object detected by the camera;
S22: the calculated intersection-over-union d_2diou is compared with a threshold σ_1; if d_2diou ≥ σ_1, the target is considered to be detected by both the laser radar and the camera and is denoted D_fusion; if d_2diou < σ_1, the detection is considered to come from a single sensor only, an object detected only by the camera being denoted D_only2d and an object detected only by the laser radar being denoted D_only3d.
4. The camera and lidar fusion-based 3D real-time multi-target tracking method of claim 1, wherein: the step S3 specifically comprises the following steps:
S31: let the target objects detected by the detectors at time t be D_fusion = {D_1, D_2, …, D_n} and the tracks predicted from time t−1 to time t by Kalman filtering be T_3D = {T_1, T_2, …, T_m}; the intersection-over-union and the normalized Euclidean distance between D_fusion and T_3D are calculated to form the cost function, where the intersection-over-union is the ratio of the intersection of the two bounding boxes to their union, and the normalized Euclidean distance is obtained by normalizing the Euclidean distance between the two bounding boxes;
S32: if the cost between a measurement and a track is greater than the threshold σ_2, the track is considered to be successfully matched with the measurement and the matched track T_matched is updated with the corresponding measurement information; if the cost is smaller than σ_2, the unmatched measurements D_unmatched and the unmatched tracks T_unmatched enter the next matching stage.
5. The camera and lidar fusion-based 3D real-time multi-target tracking method of claim 4, wherein: the step S4 specifically comprises the following steps:
S41: the unmatched tracks T_unmatched from step S32 are matched with the objects D_only3d detected only by the laser radar using the Hungarian algorithm; the cost function is formed in the same way as in step S31, from the intersection-over-union and the normalized Euclidean distance between the detections and the tracks;
S42: the cost function is compared with a threshold; if the cost function is larger than the threshold, the measurement and the track are matched successfully and the track is updated with the corresponding measurement; the remaining unmatched measurements initialize to-be-confirmed tracks, and the unmatched tracks enter the next matching stage.
6. The camera and lidar fusion-based 3D real-time multi-target tracking method of claim 5, wherein: the Hungarian algorithm in step S41 is as follows:
S411: subtracting the minimum value of each row from each row of the cost matrix formed by the cost function;
S412: subtracting the minimum value of each column from that column of the new matrix;
S413: covering all zero elements in the new matrix with the minimum number of row and column lines; if the lines cannot cover all zero elements, entering S414, otherwise entering S415;
S414: finding the minimum value among the elements not covered by any row or column line, subtracting it from all uncovered elements, adding it to the elements at the intersections of row and column lines, and returning to S413;
S415: starting the matching from the row or column with the fewest zero elements until every row is matched, which gives the optimal assignment.
7. The camera and lidar fusion-based 3D real-time multi-target tracking method of claim 6, wherein: the step S5 specifically comprises the following steps:
S51: matching the two-dimensional tracks in the image plane with the target objects detected only by the camera using the Hungarian algorithm, the cost function being calculated in the same way as in step S41;
S52: the cost function is compared with a threshold σ_3; if the cost function is larger than σ_3, the measurement and the track are matched successfully and the two-dimensional track is updated with the corresponding measurement; if the cost function is smaller than σ_3, the unmatched measurements initialize to-be-confirmed tracks.
8. The camera and lidar fusion-based 3D real-time multi-target tracking method of claim 7, wherein: the step S6 specifically comprises the following steps:
S61: the three-dimensional tracks are projected onto the image plane through a coordinate system transformation to form two-dimensional tracks; the three-dimensional tracks are expressed in the laser radar coordinate system, and the three-dimensional bounding box_3D of each three-dimensional track is converted to the image plane to form a two-dimensional bounding box_3D_to_2D in the image coordinate system as follows:
bounding box_3D_to_2D = P_rect · R_0 · Tr_velo_to_cam · bounding box_3D
where P_rect is the intrinsic projection matrix of the camera, R_0 is the rectification rotation matrix, and Tr_velo_to_cam is the transformation matrix that converts points from the laser radar to the camera coordinate system;
S62: the converted two-dimensional bounding box_3D_to_2D is matched with the two-dimensional track bounding box_2D formed in step S52, with the cost function calculated in the same way as in step S5;
S63: the cost value of each successfully matched pair is compared with a threshold σ_4; if the cost value is larger than σ_4, the two-dimensional track and the three-dimensional track are fused, i.e. the two-dimensional track information is updated with the three-dimensional track information.
CN202111553630.8A 2021-12-17 2021-12-17 3D real-time multi-target tracking method based on fusion of camera and laser radar Active CN114332158B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111553630.8A CN114332158B (en) 2021-12-17 2021-12-17 3D real-time multi-target tracking method based on fusion of camera and laser radar

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111553630.8A CN114332158B (en) 2021-12-17 2021-12-17 3D real-time multi-target tracking method based on fusion of camera and laser radar

Publications (2)

Publication Number Publication Date
CN114332158A CN114332158A (en) 2022-04-12
CN114332158B true CN114332158B (en) 2024-05-07

Family

ID=81052718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111553630.8A Active CN114332158B (en) 2021-12-17 2021-12-17 3D real-time multi-target tracking method based on fusion of camera and laser radar

Country Status (1)

Country Link
CN (1) CN114332158B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118625341A (en) * 2023-03-09 2024-09-10 北京罗克维尔斯科技有限公司 Target association matching method, device, vehicle machine and storage medium
CN117237401B (en) * 2023-11-08 2024-02-13 北京理工大学前沿技术研究院 Multi-target tracking method, system, medium and equipment for fusion of image and point cloud
CN117522924A (en) * 2023-11-22 2024-02-06 重庆大学 Depth-associated multi-target tracking method based on detection positioning confidence level guidance
CN117576166B (en) * 2024-01-15 2024-04-30 浙江华是科技股份有限公司 Target tracking method and system based on camera and low-frame-rate laser radar
CN117934549B (en) * 2024-01-16 2024-07-09 重庆大学 3D multi-target tracking method based on probability distribution guiding data association

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180364741A1 (en) * 2016-03-11 2018-12-20 Raytheon Bbn Technologies Corp. Human indication of target drone for interception

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110246159A (en) * 2019-06-14 2019-09-17 湖南大学 The 3D target motion analysis method of view-based access control model and radar information fusion
CN110675431A (en) * 2019-10-08 2020-01-10 中国人民解放军军事科学院国防科技创新研究院 Three-dimensional multi-target tracking method fusing image and laser point cloud
CN112346073A (en) * 2020-09-25 2021-02-09 中山大学 Dynamic vision sensor and laser radar data fusion method
CN112487919A (en) * 2020-11-25 2021-03-12 吉林大学 3D target detection and tracking method based on camera and laser radar
CN112561966A (en) * 2020-12-22 2021-03-26 清华大学 Sparse point cloud multi-target tracking method fusing spatio-temporal information
CN113034504A (en) * 2021-04-25 2021-06-25 重庆大学 Plane feature fusion method in SLAM mapping process

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DeepFusionMOT: A 3D Multi-Object Tracking Framework Based on Camera-LiDAR Fusion With Deep Association; Xiyang Wang et al.; IEEE Robotics and Automation Letters; 2022-06-29; entire document *
Vehicle detection in traffic environment based on fusion of laser point cloud and image information; Zheng Shaowu; Li Weihua; Hu Jianyao; Chinese Journal of Scientific Instrument; 2019-12-15 (12); entire document *

Also Published As

Publication number Publication date
CN114332158A (en) 2022-04-12

Similar Documents

Publication Publication Date Title
CN114332158B (en) 3D real-time multi-target tracking method based on fusion of camera and laser radar
WO2021233029A1 (en) Simultaneous localization and mapping method, device, system and storage medium
CN109931939B (en) Vehicle positioning method, device, equipment and computer readable storage medium
CN108280442B (en) Multi-source target fusion method based on track matching
CN109074490B (en) Path detection method, related device and computer readable storage medium
CN111611853B (en) Sensing information fusion method, device and storage medium
US20170242117A1 (en) Vision algorithm performance using low level sensor fusion
WO2020224305A1 (en) Method and apparatus for device positioning, and device
CN109410264B (en) Front vehicle distance measuring method based on laser point cloud and image fusion
CN109509230A (en) A kind of SLAM method applied to more camera lens combined type panorama cameras
CN111882612A (en) Vehicle multi-scale positioning method based on three-dimensional laser detection lane line
Wang et al. Bionic vision inspired on-road obstacle detection and tracking using radar and visual information
CN103886107B (en) Robot localization and map structuring system based on ceiling image information
WO2024114119A1 (en) Sensor fusion method based on binocular camera guidance
CN108549376A (en) A kind of navigation locating method and system based on beacon
Li et al. A feature pyramid fusion detection algorithm based on radar and camera sensor
CN110992424B (en) Positioning method and system based on binocular vision
CN114509070B (en) Unmanned aerial vehicle navigation positioning method
CN114325634A (en) Method for extracting passable area in high-robustness field environment based on laser radar
CN113327296B (en) Laser radar and camera online combined calibration method based on depth weighting
CN112731371A (en) Laser radar and vision fused integrated target tracking system and method
CN117593650B (en) Moving point filtering vision SLAM method based on 4D millimeter wave radar and SAM image segmentation
JP2018077162A (en) Vehicle position detection device, vehicle position detection method and computer program for vehicle position detection
JP2017181476A (en) Vehicle location detection device, vehicle location detection method and vehicle location detection-purpose computer program
CN112232275A (en) Obstacle detection method, system, equipment and storage medium based on binocular recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant