CN111161325B

CN111161325B - Three-dimensional multi-target tracking method based on Kalman filtering and LSTM

Info

Publication number: CN111161325B
Application number: CN201911416915.XA
Authority: CN
Inventors: 彭永坚; 汪壮雄; 周智恒; 黄宇; 彭明; 朱湘军
Original assignee: GUANGZHOU VIDEO-STAR ELECTRONICS CO LTD; South China University of Technology SCUT
Current assignee: GUANGZHOU VIDEO-STAR ELECTRONICS CO LTD; South China University of Technology SCUT
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2023-05-23
Anticipated expiration: 2039-12-31
Also published as: CN111161325A

Abstract

The invention discloses a three-dimensional target tracking method based on Kalman filtering and LSTM, comprising the following steps: initializing tracks from the input three-dimensional bounding boxes; updating and denoising the three-dimensional bounding box tracks with a constant-velocity Kalman filtering algorithm and predicting to obtain a predicted track set; associating the predicted tracks with the three-dimensional bounding boxes of the current frame using the Hungarian algorithm and updating the Kalman filter; using the denoised three-dimensional bounding box sequences to train a long short-term memory (LSTM) network; and tracking and predicting three-dimensional targets using the constant-velocity Kalman filtering algorithm, the Hungarian algorithm and the trained LSTM. Traditional target tracking methods based on Kalman filtering suffer from insufficient nonlinear fitting capability. The main difference between this method and traditional methods is that it exploits the strong feature extraction capability of the deep learning model LSTM, so that a more complex motion model can be fitted, the tracking result is smoother, and the speed of the tracking system is improved.

Description

Three-dimensional multi-target tracking method based on Kalman filtering and LSTM

Technical Field

The invention relates to the field of computer vision, in particular to a three-dimensional multi-target tracking method based on Kalman filtering and LSTM.

Background

Three-dimensional multi-target tracking is an important component of video processing and computer vision and is widely applied in autonomous driving. Benefiting from improvements in the accuracy of detection algorithms, current approaches are mainly detection-based. In a detection-based tracking algorithm, a target detector processes each image frame to obtain target bounding boxes, and the motion information and bounding box information of the targets are then used to associate and track the boxes, yielding the track of each target.

When a traditional detection-based tracking algorithm is applied to three-dimensional multi-target tracking, its real-time performance depends heavily on the detection speed of the three-dimensional target detector. Current mainstream three-dimensional target detectors are slow, so existing detection-based tracking algorithms cannot be applied directly to three-dimensional multi-target tracking. In addition, because the three-dimensional bounding boxes output by the detector are noisy, the three-dimensional boxes of tracked targets jitter noticeably, so the tracking result is not sufficiently smooth or stable.

Disclosure of Invention

In order to solve the above technical problems, an embodiment of the present invention provides a three-dimensional multi-target tracking method based on Kalman filtering and LSTM, including:

S1, initializing tracks from the input three-dimensional bounding boxes, wherein whether a new track is created is determined by whether a three-dimensional bounding box of the (t+1)-th frame matches a three-dimensional bounding box of the t-th frame; because the three-dimensional target detection results may contain false positives, a new track is initialized only when the same target appears in two consecutive frames;

S2, updating and denoising the tracks of the three-dimensional bounding boxes of the t-th frame using a constant-velocity Kalman filtering algorithm to obtain a real track set, and then predicting to obtain a predicted track set $T_{t+1}^p$, which represents the predicted track set of the (t+1)-th frame;

S3, associating the predicted tracks with the three-dimensional bounding boxes of the current frame using the Hungarian algorithm and updating the Kalman filter;

S4, using the denoised three-dimensional bounding box sequences to train a long short-term memory network LSTM;

S5, tracking and predicting three-dimensional targets using the constant-velocity Kalman filtering algorithm, the Hungarian algorithm and the trained LSTM; because mainstream three-dimensional target detectors are generally slow, three-dimensional bounding box detection is not run on every frame; instead, a three-dimensional detection result is obtained once every F frames and the F intermediate frames are predicted by the LSTM model, which makes the tracking result smoother and increases the speed.

Further, the track initialization process in the step S1 is as follows:

let $D_t^i$ denote the i-th three-dimensional bounding box of the t-th frame, with $D_t^i = (x, y, z, l, w, h, \theta)$,

wherein x, y and z are the x-axis, y-axis and z-axis coordinates of the three-dimensional bounding box in the camera coordinate system, l, w and h are the length, width and height of the three-dimensional bounding box, and θ is the observation angle of the target; the set $D_t$ denotes all three-dimensional bounding boxes of the t-th frame;

if $\mathrm{IoU}(D_t^i, D_{t+1}^j) \ge \mathrm{threshold}$, that is, if the intersection over union (IoU) of the i-th three-dimensional bounding box of the t-th frame and the j-th three-dimensional bounding box of the (t+1)-th frame is greater than or equal to the threshold, a new track $T_{t+1}^k$ is created,

wherein k denotes the k-th track; the track set at time t+1 is denoted $T_{t+1}$, and the remaining three-dimensional bounding boxes are discarded.
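The initialization rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the IoU here ignores the angle θ and treats each box as axis-aligned, the sizes l, w and h are paired with the x, y and z axes respectively, and candidates are paired greedily. None of these choices is specified in the text, and the names `axis_aligned_iou_3d` and `init_tracks` are hypothetical.

```python
from typing import List, Tuple

# (x, y, z, l, w, h, theta), as in the text
Box = Tuple[float, float, float, float, float, float, float]


def axis_aligned_iou_3d(a: Box, b: Box) -> float:
    """IoU of two boxes, ignoring theta (a simplifying assumption)."""
    inter = 1.0
    # Pair each centre coordinate with a size: (x, l), (y, w), (z, h).
    for axis in range(3):
        lo = max(a[axis] - a[axis + 3] / 2, b[axis] - b[axis + 3] / 2)
        hi = min(a[axis] + a[axis + 3] / 2, b[axis] + b[axis + 3] / 2)
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
    vol_a = a[3] * a[4] * a[5]
    vol_b = b[3] * b[4] * b[5]
    return inter / (vol_a + vol_b - inter)


def init_tracks(boxes_t: List[Box], boxes_t1: List[Box],
                threshold: float = 0.7) -> List[List[Box]]:
    """Create a track for each box of frame t that matches a box of frame t+1.

    Boxes without a partner reaching the threshold are discarded.
    Greedy best-IoU pairing is an assumption made for this sketch.
    """
    tracks: List[List[Box]] = []
    used = set()
    for box_t in boxes_t:
        best_j, best_iou = None, threshold
        for j, box_t1 in enumerate(boxes_t1):
            if j in used:
                continue
            iou = axis_aligned_iou_3d(box_t, box_t1)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            used.add(best_j)
            tracks.append([box_t, boxes_t1[best_j]])
    return tracks
```

A rotated-box IoU would be needed to honour θ; the axis-aligned version is used only to keep the sketch short.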

Further, the data association in the step S3 is specifically as follows:

the three-dimensional bounding box set $D_t$ of the current t-th frame and the predicted track set $T_t^p$ obtained by the Kalman filtering algorithm are input to the Hungarian algorithm to obtain a data association result;

in the result, the three-dimensional bounding box set $D_t$ is divided into two sets, a set of matched three-dimensional bounding boxes and a set of unmatched three-dimensional bounding boxes; the matched set is used to update the Kalman filter, and for the unmatched set, step S1 is executed to initialize three-dimensional target tracks;

in the data association result, unmatched tracks are discarded and matched tracks are retained.
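The association step can be sketched as below, using a compact pure-Python implementation of the Hungarian (Kuhn-Munkres) algorithm. The cost definition (1 − IoU), the padding of the cost matrix to a square, and the IoU gate applied after assignment are assumptions made for this sketch; the text only states that the detections and predicted tracks are passed to the Hungarian algorithm. The names `hungarian` and `associate` are hypothetical.

```python
from typing import List, Tuple


def hungarian(cost: List[List[float]]) -> List[int]:
    """Minimum-cost assignment for a square matrix; returns the column of each row."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (n + 1)   # column potentials
    p = [0] * (n + 1)     # p[j]: row matched to column j (1-based, 0 = free)
    way = [0] * (n + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    row_to_col = [-1] * n
    for j in range(1, n + 1):
        row_to_col[p[j] - 1] = j - 1
    return row_to_col


def associate(iou: List[List[float]], num_detections: int,
              threshold: float = 0.7) -> Tuple[list, list, list]:
    """iou[r][c] is the IoU of predicted track r and detection c.

    Returns (matches, unmatched_detections, unmatched_tracks).
    """
    num_tracks = len(iou)
    size = max(num_tracks, num_detections)
    # Pad to a square matrix; padded cells cost 1.0, the same as zero overlap.
    cost = [[1.0] * size for _ in range(size)]
    for r in range(num_tracks):
        for c in range(num_detections):
            cost[r][c] = 1.0 - iou[r][c]
    assignment = hungarian(cost) if size else []
    matches = [(r, c) for r, c in enumerate(assignment)
               if r < num_tracks and c < num_detections and iou[r][c] >= threshold]
    matched_tracks = {r for r, _ in matches}
    matched_dets = {c for _, c in matches}
    unmatched_dets = [c for c in range(num_detections) if c not in matched_dets]
    unmatched_tracks = [r for r in range(num_tracks) if r not in matched_tracks]
    return matches, unmatched_dets, unmatched_tracks
```

In the scheme described above, the matched pairs would feed the Kalman update, the unmatched detections would go back to step S1, and the unmatched tracks would be dropped.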

Further, the training of the long short-term memory network LSTM in step S4 is specifically as follows:

a time step L of the LSTM is set; the tracks in the track set are cut into segments of L+1 frames, and tracks shorter than L+1 frames are discarded; the three-dimensional bounding box sequence of the first L frames of each segment is used as the LSTM input and the last frame is used as the LSTM label, and the LSTM is trained to obtain a three-dimensional target track prediction model.
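The construction of training samples described above can be sketched as follows. The sliding window with a stride of one frame is an assumption, since the text does not say whether the L+1-frame segments overlap; `cut_track` and `build_dataset` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def cut_track(track: Sequence, L: int) -> List[Tuple[list, object]]:
    """Cut one track into (input, label) pairs of L frames plus the next frame.

    A track shorter than L + 1 frames yields no samples, i.e. it is discarded.
    """
    samples = []
    for start in range(len(track) - L):
        window = list(track[start:start + L + 1])
        samples.append((window[:L], window[L]))
    return samples


def build_dataset(tracks: Sequence[Sequence], L: int) -> List[Tuple[list, object]]:
    """Collect the samples of every track in the track set."""
    dataset = []
    for track in tracks:
        dataset.extend(cut_track(track, L))
    return dataset
```

Each sample pairs L consecutive denoised boxes with the box of the following frame, which is the form a next-frame regression model expects.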

Further, the tracking and predicting of the three-dimensional object in the step S5 is specifically as follows:

an interval frame number F is set, and three-dimensional bounding boxes are acquired from the detector once every F frames; if the current track frame number is not equal to N(F+1)+1, where N is a natural number, and is greater than L, the LSTM network is started to predict the three-dimensional bounding box of the next frame; the sequence of three-dimensional bounding boxes acquired at intervals of F frames is tracked using the constant-velocity Kalman filtering algorithm and the Hungarian algorithm, that is, steps S1 to S3 are executed to obtain denoised three-dimensional bounding boxes; the union of the prediction results of the two models is the final tracking result.
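The frame schedule above can be illustrated with a short sketch. It assumes frame numbers start at 1 and that natural numbers include 0, so that frame 1 is a detection frame; both are assumptions, and the function names are hypothetical.

```python
def is_detection_frame(frame_no: int, F: int) -> bool:
    """True when frame_no has the form N * (F + 1) + 1 for a natural number N."""
    return frame_no >= 1 and (frame_no - 1) % (F + 1) == 0


def use_lstm(frame_no: int, F: int, L: int) -> bool:
    """True when the LSTM is started: not a detection frame, and more than L frames seen."""
    return not is_detection_frame(frame_no, F) and frame_no > L


# With F = 5, detections arrive on frames 1, 7, 13, ... and the five frames
# in between are left to the LSTM once the track is longer than L frames.
detection_frames = [n for n in range(1, 40) if is_detection_frame(n, 5)]
```

With F = 5 and L = 30, for example, frame 31 is handled by the Kalman filter and Hungarian algorithm, while frames 32 to 36 fall to the LSTM.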

Compared with the prior art, the invention has the following advantages and effects:

High efficiency: the invention uses the LSTM to predict the three-dimensional bounding boxes of the intermediate frames, which reduces how often the tracking algorithm must acquire three-dimensional bounding boxes from the detector and thereby greatly improves the speed of the three-dimensional multi-target tracking method;

Stability: owing to the strong nonlinear fitting capability of the deep network LSTM, the tracks output by the tracking algorithm exhibit less jitter and are more stable.

Drawings

Fig. 1 is a schematic diagram of how the Kalman filter and the LSTM are combined to process different frames according to the first embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

First embodiment

This embodiment discloses a three-dimensional multi-target tracking method based on Kalman filtering and LSTM, which specifically comprises the following steps:

Step S1, initializing tracks from the input three-dimensional bounding boxes. Whether a new track is created is determined by whether a three-dimensional bounding box of the (t+1)-th frame matches a three-dimensional bounding box of the t-th frame; because the three-dimensional target detection results may contain false positives, a new track is initialized only when the same target appears in two consecutive frames. The specific process is as follows:

let $D_t^i$ denote the i-th three-dimensional bounding box of the t-th frame, with $D_t^i = (x, y, z, l, w, h, \theta)$,

wherein x, y and z are the x-axis, y-axis and z-axis coordinates of the three-dimensional bounding box in the camera coordinate system, l, w and h are the length, width and height of the three-dimensional bounding box, and θ is the observation angle of the target; the set $D_t$ denotes all three-dimensional bounding boxes of the t-th frame; if $\mathrm{IoU}(D_t^i, D_{t+1}^j) \ge \mathrm{threshold}$, with threshold = 0.7, that is, if the intersection over union (IoU) of the i-th three-dimensional bounding box of the t-th frame and the j-th three-dimensional bounding box of the (t+1)-th frame is greater than or equal to the threshold, a new track $T_{t+1}^k$ is created.

Step S2, updating and denoising the tracks of the three-dimensional bounding boxes of the t-th frame using a constant-velocity Kalman filtering algorithm to obtain a real track set, and then predicting to obtain a predicted track set $T_{t+1}^p$, which represents the predicted track set of the (t+1)-th frame;
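The constant-velocity Kalman filtering used for denoising and prediction can be sketched for a single scalar coordinate as follows. The text does not give the state vector or the noise settings, so the two-element state (position, velocity), the noise values `q` and `r`, the initial covariance and the idea of running one such filter per box coordinate are all assumptions of this sketch; `ConstantVelocityKF1D` is a hypothetical name.

```python
class ConstantVelocityKF1D:
    """Constant-velocity Kalman filter for one scalar coordinate.

    State: (position p, velocity v); only the position is measured.
    The noise levels q and r are illustrative assumptions.
    """

    def __init__(self, position: float, q: float = 1e-2, r: float = 1e-1):
        self.p = position
        self.v = 0.0                         # velocity unknown at start
        self.P = [[1.0, 0.0], [0.0, 10.0]]   # large initial velocity uncertainty
        self.q = q                           # process noise added to the diagonal
        self.r = r                           # measurement noise variance

    def predict(self, dt: float = 1.0) -> float:
        """Propagate one step: p <- p + dt * v, v unchanged."""
        (p00, p01), (p10, p11) = self.P
        self.p += dt * self.v
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.p

    def update(self, z: float) -> float:
        """Correct the state with a measured position z; returns the denoised position."""
        (p00, p01), (p10, p11) = self.P
        residual = z - self.p
        s = p00 + self.r                     # innovation variance
        k0, k1 = p00 / s, p10 / s            # Kalman gain
        self.p += k0 * residual
        self.v += k1 * residual
        # P <- (I - K H) P with H = [1, 0]
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]
        return self.p


# Track a noise-free target that moves 2 units per frame, then predict frame 21.
kf = ConstantVelocityKF1D(position=0.0)
for t in range(1, 21):
    kf.predict()
    kf.update(2.0 * t)
next_position = kf.predict()
```

The `update` output plays the role of the denoised (real) track, and the `predict` output plays the role of the predicted track for the next frame.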

Step S3, associating the predicted tracks with the three-dimensional bounding boxes of the current frame using the Hungarian algorithm and updating the Kalman filter. The specific process is as follows:

the three-dimensional bounding box set $D_t$ of the current t-th frame and the predicted track set $T_t^p$ obtained by the Kalman filtering algorithm are input to the Hungarian algorithm to obtain a data association result; in the result, the three-dimensional bounding box set $D_t$ is divided into two sets, a set of matched three-dimensional bounding boxes and a set of unmatched three-dimensional bounding boxes; the matched set is used to update the Kalman filter, and for the unmatched set, step S1 is executed to initialize three-dimensional target tracks; in the data association result, unmatched tracks are discarded and matched tracks are retained.

Step S4, using the denoised three-dimensional bounding box sequences to train the long short-term memory network LSTM. The specific process is as follows:

a time step L of the LSTM is set, taking L = 30 for a video frame rate of 30 frames per second; the tracks in the track set are cut into segments of L+1 frames, and tracks shorter than L+1 frames are discarded; the three-dimensional bounding box sequence of the first L frames of each segment is used as the LSTM input and the last frame is used as the LSTM label, and the LSTM is trained to obtain a three-dimensional target track prediction model.

Step S5, tracking and predicting three-dimensional targets using the constant-velocity Kalman filtering algorithm, the Hungarian algorithm and the trained LSTM. Because mainstream three-dimensional target detectors are generally slow, three-dimensional bounding box detection is not run on every frame; instead, a three-dimensional detection result is obtained once every F frames, taking F = 5 here, and the F intermediate frames are predicted by the LSTM model, which makes the tracking result smoother and increases the speed. The specific process is as follows:

The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them. Any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included in the protection scope of the present invention.

Claims

1. A three-dimensional multi-target tracking method based on Kalman filtering and LSTM, characterized by comprising the following steps:

S1, initializing tracks from the input three-dimensional bounding boxes, and determining whether to create a track according to whether a three-dimensional bounding box of the (t+1)-th frame matches a three-dimensional bounding box of the t-th frame;

S2, updating and denoising the tracks of the three-dimensional bounding boxes of the t-th frame using a constant-velocity Kalman filtering algorithm to obtain a real track set, and predicting to obtain a predicted track set, wherein the predicted track set represents the predicted track set of the (t+1)-th frame;

S5, tracking and predicting three-dimensional targets using the constant-velocity Kalman filtering algorithm, the Hungarian algorithm and the trained LSTM;

the tracking and predicting of the three-dimensional targets in step S5 is specifically as follows:

2. The three-dimensional multi-target tracking method based on Kalman filtering and LSTM according to claim 1, wherein the track initialization process of step S1 is as follows:

let $D_t^i$ denote the i-th three-dimensional bounding box of the t-th frame, with $D_t^i = (x, y, z, l, w, h, \theta)$;

if $\mathrm{IoU}(D_t^i, D_{t+1}^j) \ge \mathrm{threshold}$, that is, when the intersection over union of the i-th three-dimensional bounding box of the t-th frame and the j-th three-dimensional bounding box of the (t+1)-th frame is greater than or equal to the threshold, a new track $T_{t+1}^k$ is created.

3. The three-dimensional multi-target tracking method based on Kalman filtering and LSTM according to claim 1, wherein the data association in step S3 is specifically as follows:

the three-dimensional bounding box set $D_t$ of the current t-th frame and the predicted track set $T_t^p$ obtained by the Kalman filtering algorithm are input to the Hungarian algorithm to obtain a data association result;

in the result, the three-dimensional bounding box set $D_t$ is divided into two sets, a set of matched three-dimensional bounding boxes and a set of unmatched three-dimensional bounding boxes; the matched set is used to update the Kalman filter, and for the unmatched set, step S1 is executed to initialize three-dimensional target tracks;

4. The three-dimensional multi-target tracking method based on Kalman filtering and LSTM according to claim 1, wherein the training of the long short-term memory network LSTM in step S4 is specifically as follows: