CN110288627B - Online multi-target tracking method based on deep learning and data association - Google Patents

Online multi-target tracking method based on deep learning and data association

Info

Publication number: CN110288627B (granted); application number: CN201910429444.XA; application publication: CN110288627A
Authority: CN (China)
Prior art keywords: target, state, detection, detection response, response
Legal status: Active (granted)
Inventors: 陈小波 (Chen Xiaobo), 冀建宇 (Ji Jianyu), 王彦钧 (Wang Yanjun), 蔡英凤 (Cai Yingfeng), 王海 (Wang Hai), 陈龙 (Chen Long)
Original and current assignee: Jiangsu University
Application filed by Jiangsu University; priority to CN201910429444.XA

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 — Image analysis
    • G06T7/20 — Analysis of motion
    • G06T2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T2207/10 — Image acquisition modality
    • G06T2207/10016 — Video; image sequence
    • G06T2207/20 — Special algorithmic details
    • G06T2207/20081 — Training; learning

Abstract

The invention discloses an online multi-target tracking method based on deep learning and data association, which comprises the following steps: 1. inputting the image of the current frame of a video; 2. obtaining all detection responses in the image by using a target detector; 3. extracting appearance features of the detection responses by using a deep cosine metric learning model; 4. initializing the target states; 5. predicting the position and scale of each target in the next frame by using a Kalman filtering algorithm; 6. matching and associating the targets and the detection responses based on two-stage data association to obtain an optimal association result; 7. updating the states and features of the targets according to the optimal association result of step 6; 8. inputting the image of the next video frame and repeating steps 2, 3, 4, 5, 6 and 7 until the video is finished. Compared with the prior art, the method can correctly associate targets under complex conditions such as target interaction, occlusion, and similar appearance among targets, achieving robust and continuous multi-target tracking.

Description

Online multi-target tracking method based on deep learning and data association
Technical Field
The invention relates to a target tracking method, in particular to an online multi-target tracking method based on deep learning and data association, and belongs to the field of computer vision.
Background
Multi-target tracking is a particularly important branch of computer vision and is widely applied in various video analysis scenarios, such as autonomous driving, robot navigation, intelligent traffic video surveillance, and motion analysis.
The task of online multi-target tracking is to reliably estimate the position of each target frame by frame and to follow the same target across frames so as to estimate the trajectories of multiple targets. In recent years, thanks to the development of deep learning, the performance of target detection algorithms has improved continuously and detection responses have become more reliable; as a result, the tracking-by-detection framework has attracted wide attention, achieved remarkable results, and become the mainstream of current multi-target tracking. Under this framework, a target detector trained offline first detects the targets in each frame independently, yielding the number and positions of the targets; the targets detected in adjacent frames are then associated according to cues such as appearance and motion, realizing target matching and tracking. Detection-based tracking algorithms can be divided into two categories: offline tracking and online tracking.
At present, detection-based tracking algorithms still face many challenges. The tracking effect depends heavily on the performance of the detector, and in complex scenes, when targets are severely occluded by obstacles or by one another, multi-target tracking algorithms easily lose track of targets or confuse target identities. In addition, detection noise from the target detector and drastic changes of target scale can cause tracking drift.
Disclosure of Invention
The purpose of the invention is as follows: the invention provides an online multi-target tracking method based on deep learning and data association, and aims to solve problems such as frequent identity (ID) switching and tracking drift that arise in existing multi-target tracking techniques when targets with similar appearance occlude one another in complex scenes.
The invention provides a novel multi-target tracking method that addresses the multi-target tracking problem from multiple angles. 1) The appearance model of the target is built with a deep cosine metric learning model: features are extracted from the target image by a multi-layer convolutional network, and the cosine similarity between feature vectors is taken as the similarity between target appearances, realizing effective discrimination of different target appearances. 2) Considering the continuity of the dynamic change of target appearance, a target appearance similarity measure fusing multi-frame historical appearance features is constructed, which effectively mitigates the influence of detector defects or mutual occlusion between targets on target matching accuracy. 3) A two-stage data association method based on the target state is proposed: corresponding association strategies are designed for targets of different reliability, and the Hungarian algorithm is used for data association. In congested traffic scenes with frequent occlusion, the algorithm achieves accurate and stable multi-target tracking.
The technical scheme is as follows: an online multi-target tracking method based on deep learning and data association is characterized by comprising the following steps:
step 1: inputting an image of a current frame of a video;
step 2: obtaining the set $D_t=\{D_1,D_2,\ldots,D_M\}$ of all detection responses in the image by using a target detector, where $t$ is the current frame number and $D_j$ is the $j$-th detection response, denoted as $D_j=(x_j,y_j,w_j,h_j)$, where $(x_j,y_j)$ are the center-point coordinates of detection response $D_j$, $(w_j,h_j)$ are the width and height of detection response $D_j$, and $M$ is the total number of detection responses;
step 3: extracting an appearance feature vector from each detection response in the set $D_t$ by using the deep cosine metric learning model, denoted as $\{Z_1,Z_2,\ldots,Z_M\}$, where $Z_j\in\mathbb{R}^p$ is the appearance feature of detection response $D_j$;
step 4: initializing the target states, which are divided into 4 types: the initial state, the tracking state, the lost state, and the deleted state; if $t=1$, i.e. the first frame of the input video, generating the target set $T_t=\{T_1,T_2,\ldots,T_N\}$ with $N=M$, where target $T_j$ corresponds to detection response $D_j$; setting the state of each target $T_j$ to the initial state, and turning to step 1; otherwise, turning to step 5;
step 5: predicting, by applying a Kalman filtering algorithm, the position and scale of each target $T_i$ in the target set $T_{t-1}$ in the current frame, expressed as $\tilde D_i=(\tilde x_i,\tilde y_i,\tilde w_i,\tilde h_i)$, where $(\tilde x_i,\tilde y_i)$ are the predicted center-point coordinates and $(\tilde w_i,\tilde h_i)$ are the predicted width and height;
step 6: matching and associating the target and the detection response based on the two-stage data association to obtain an optimal association result;
step 7: updating the states and features of the targets according to the optimal association result of step 6;
step 8: inputting the image of the next video frame, and repeating steps 2, 3, 4, 5, 6 and 7 until the video is finished.
Preferably, the step 6 of matching and associating the targets and the detection responses based on two-stage data association comprises:
(a) based on the states of all targets in the previous frame, dividing the target set $T_{t-1}=\{T_1,T_2,\ldots,T_N\}$ into two classes $\Omega_1$ and $\Omega_2$, with $\Omega_1\cup\Omega_2=T_{t-1}$, where $\Omega_1$ is composed of the targets in the initial state and the tracking state, $\Omega_2$ is composed of the targets in the lost state, and $N$ is the total number of targets;
(b) calculating the matching similarity between each target in $\Omega_1$ and each detection response in $D_t$ to obtain a similarity matrix $A_1$; taking $-A_1$ as the association cost matrix, associating the targets in $\Omega_1$ with the detection responses in $D_t$, and applying the Hungarian algorithm to solve the optimal association; according to the association result, partitioning $\Omega_1$ and $D_t$ as $\Omega_1=\Omega_1^A\cup\Omega_1^B$ and $D_t=D_A\cup D_B$, where the targets in $\Omega_1^A$ are successfully associated with the detection responses in $D_A$, $\Omega_1^B$ is the set of unassociated targets, and $D_B$ is the set of detection responses not associated in the first stage;
(c) calculating the matching similarity between each target in $\Omega_2$ and each detection response in $D_B$ to obtain a similarity matrix $A_2$; taking $-A_2$ as the association cost matrix, associating the targets in $\Omega_2$ with the detection responses in $D_B$, and applying the Hungarian algorithm to solve the optimal association; according to the association result, partitioning $\Omega_2$ and $D_B$ as $\Omega_2=\Omega_2^A\cup\Omega_2^B$ and $D_B=D_B^A\cup D_B^B$, where the targets in $\Omega_2^A$ are successfully associated with the detection responses in $D_B^A$, $\Omega_2^B$ is the set of unassociated targets, and $D_B^B$ is the set of detection responses not associated in the second stage.
Preferably, the calculating of the matching similarity between each target in $\Omega_1$ and each detection response in $D_t$ comprises:
(a) calculating the appearance similarity $\Lambda^{app}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ by fusing the target's historical appearance features over the last $K$ frames:
$$\Lambda^{app}(i,j)=\sum_{k=1}^{K}\omega_k\,\langle X_i(t-k),\,Z_j\rangle,$$
where $\langle\cdot,\cdot\rangle$ is the inner product of vectors, $X_i(t-k)$ denotes the appearance feature vector of target $T_i$ at frame $t-k$, $Z_j$ denotes the appearance feature vector of detection response $D_j$, and $\omega_k$ is the weight of $X_i(t-k)$, computed from $C_i(t-k)$, the matching cost of target $T_i$ with its associated detection response at frame $t-k$;
(b) calculating the shape similarity $\Lambda^{shp}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ from the agreement between the predicted width and height $(\tilde w_i,\tilde h_i)$ of $T_i$ and the width and height $(w_j,h_j)$ of $D_j$;
(c) calculating the motion similarity $\Lambda^{mot}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ as the intersection-over-union (IOU) of the predicted region $\tilde B_i$ of $T_i$ and the region $B_j$ of detection response $D_j$:
$$\Lambda^{mot}(i,j)=\frac{\mathrm{area}(\tilde B_i\cap B_j)}{\mathrm{area}(\tilde B_i\cup B_j)},$$
where $\mathrm{area}(\cdot)$ denotes the area of a region;
(d) calculating the matching similarity $A_1(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ by fusing the appearance, shape and motion similarities, e.g. as their product $A_1(i,j)=\Lambda^{app}(i,j)\cdot\Lambda^{shp}(i,j)\cdot\Lambda^{mot}(i,j)$.
Preferably, the calculating of the matching similarity between each target in $\Omega_2$ and each detection response in $D_B$ comprises:
(a) calculating, with the appearance and shape similarity formulas given above, the appearance similarity $\Lambda^{app}(i,j)$ and the shape similarity $\Lambda^{shp}(i,j)$ between target $T_i$ in $\Omega_2$ and detection response $D_j$ in $D_B$;
(b) calculating the search radius $r_i$ of target $T_i$ from $\Delta t_i$, the difference between the current frame number and the maximum frame number at which $T_i$ was in the tracking state, scaled by a constant $\alpha$; the search region $R_i$ of target $T_i$ is defined as the circle centered at the predicted position $(\tilde x_i,\tilde y_i)$ of $T_i$ in the current frame with radius $r_i$;
(c) calculating the matching similarity $A_2(i,j)$ between target $T_i$ in $\Omega_2$ and detection response $D_j$ in the detection response set $D_B$ by gating the fused appearance and shape similarities with the search region, e.g. $A_2(i,j)=I(R_i\cap D_j>0)\cdot\Lambda^{app}(i,j)\cdot\Lambda^{shp}(i,j)$, where $I(R_i\cap D_j>0)$ is the indicator function: when the search region $R_i$ and the detection response $D_j$ overlap, $I(R_i\cap D_j>0)=1$; otherwise $I(R_i\cap D_j>0)=0$.
Preferably, the step 7 of updating the states and features of the targets according to the optimal association result of step 6 comprises:
(a) each detection response in $D_B^B$ that remains unassociated after both stages indicates that a new target may appear in the video; initialize a new target and set its state to the initial state. When a target in the initial state appears continuously for $f_{init}$ frames, assign it an ID, set its state parameters, and then convert it to the tracking state;
(b) for each target in $\Omega_1^A$, since an associated detection response exists, keep the target state unchanged, update the state of the target by applying the Kalman filtering algorithm, and store the appearance feature vector of the target in the current frame;
(c) for each target in $\Omega_1^B$, since there is no associated detection response, convert the target state from the tracking state to the lost state, and store the appearance feature vector of the target in the current frame;
(d) for each target in $\Omega_2^A$, since an associated detection response exists, convert the target state from the lost state to the tracking state, update the state of the target by applying the Kalman filtering algorithm, and store the appearance feature vector of the target in the current frame;
(e) for each target in $\Omega_2^B$, since there is no associated detection response, keep the target state unchanged;
(f) when a target stays in the lost state for $f_{del}$ consecutive frames, convert it to the deleted state and destroy the target.
Has the beneficial effects that: 1. the method learns the appearance model of the target with a deep cosine metric learning model: features are extracted from the target image by a multi-layer convolutional network and the cosine similarity between feature vectors is taken as the similarity between target appearances, realizing effective discrimination of different target appearances and effectively alleviating the ID-switching problem caused by the interaction of targets with similar appearance in complex scenes; 2. considering the continuity of the dynamic change of target appearance, the method constructs a target appearance similarity measure fusing multi-frame historical appearance features, effectively mitigating the influence of detector defects or mutual occlusion between targets on matching accuracy; 3. by adopting a two-stage data association method based on the target state, designing corresponding association strategies for the different states of the targets, and applying the Hungarian algorithm for data association, the method effectively alleviates the track fragmentation (Fragment) problem caused by data association failure.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a framework of a deep cosine metric learning model of the present invention;
FIG. 3 is a target state transition diagram of the present invention.
Detailed Description
The technical solution of the present invention will be further explained in detail with reference to the drawings and the specific embodiments, taking the on-line multi-target tracking of pedestrians as an example, but the scope of protection of the present invention is not limited to the following embodiments.
An off-line training stage:
off-line training of a deep cosine metric learning model:
given a set of training samples { (x) i ,y i ) I =1,2,3, \ 8230;, L }, where x is i ∈R 128×64 For normalized pedestrian images, y i E {1,2,3, \8230;, K } is the corresponding pedestrian category label, and L is the number of training samples. The deep cosine metric learning model learns a feature extraction function f (x) from a training sample, an input pedestrian image x is mapped into an embedded feature space, and then a cosine softmax classifier is applied to the embedded feature space to maximize the posterior probability of classification. The cosine softmax classifier is defined as follows:
Figure GDA0003892770730000045
wherein
Figure GDA0003892770730000046
As a normalized weight vector, ω k Is a weight vector of class k, τ is a scale parameter, f (x) is a feature vector extracted from the image, f (x) has a unit length. Due to->
Figure GDA0003892770730000051
And f (x) each have a unit length of
Figure GDA0003892770730000052
Expressed as the cosine of the angle between two vectors, the angle between each class of targets and its corresponding weight vector can be reduced by maximizing the posterior probability P (y = k | f (x)).
The cross entropy loss function used to train the deep cosine metric learning model is:
Figure GDA0003892770730000053
wherein I (y) i K) is an indicator function when y i K, I (y) i = k) =1, otherwise I (y) i =k)=0。
In this embodiment, a convolutional neural network (CNN) implements the feature extraction function $f(x)$; the structure of the CNN is shown in FIG. 2: the size of the input image is 128 × 64, the length of the output feature vector is 128, and the activation function of each layer is the Exponential Linear Unit (ELU). The network is trained with the pedestrian images of the Market-1501 database, and the network parameters are updated with the Adam optimization method.
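To make the training stage concrete, the following is a minimal PyTorch sketch of a cosine softmax classifier over unit-length embeddings. It is an illustration only: the three-layer backbone is a stand-in for the architecture of FIG. 2 (which is not fully specified here), and learning τ as a parameter is one common choice, not necessarily the patent's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineMetricNet(nn.Module):
    """Backbone + cosine softmax head: maps 128x64 crops to 128-d
    unit-length embeddings and scaled-cosine class logits."""
    def __init__(self, num_classes: int, embed_dim: int = 128):
        super().__init__()
        # Stand-in backbone; the real network follows FIG. 2.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ELU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ELU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, embed_dim),
        )
        self.class_weights = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.log_tau = nn.Parameter(torch.zeros(()))  # scale parameter tau (learned here)

    def embed(self, x):
        return F.normalize(self.backbone(x), dim=1)   # f(x) with unit length

    def forward(self, x):
        w = F.normalize(self.class_weights, dim=1)    # normalized weight vectors
        return self.log_tau.exp() * self.embed(x) @ w.t()  # tau * cosine logits

model = CosineMetricNet(num_classes=751)              # Market-1501 has 751 training identities
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(8, 3, 128, 64)                   # dummy batch of pedestrian crops
labels = torch.randint(0, 751, (8,))
loss = F.cross_entropy(model(images), labels)         # cross-entropy on the cosine softmax
optimizer.zero_grad(); loss.backward(); optimizer.step()

# At tracking time only the unit-length embeddings are used:
# Z = model.embed(crops)  ->  cosine similarity reduces to a plain inner product.
```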
An online pedestrian multi-target tracking stage:
specifically, as shown in fig. 1, the invention provides an online multi-target tracking method based on deep learning and data association, and the method has the key technical steps as follows:
step 1: inputting an image of a current frame of a video;
step 2: a set $D_t=\{D_1,D_2,\ldots,D_M\}$ of all detection responses in the image is obtained using a detector, where $t$ is the current frame number and $D_j$ is the $j$-th detection response, denoted as $D_j=(x_j,y_j,w_j,h_j)$, where $(x_j,y_j)$ are the center-point coordinates of detection response $D_j$, $(w_j,h_j)$ are its width and height, and $M$ is the total number of detection responses;
in the present embodiment, the pedestrian detector used is a DPM (Deformable Parts Model).
And 3, step 3: detecting a response set D by using the offline trained deep cosine metric learning model t All detection responses in (2) extract an appearance feature vector, denoted as { Z } 1 ,Z 2 ,…,Z M In which Z is j ∈R p To detect a response D j Extracted appearance features;
step 4: the target states are initialized. Target states are classified into 4 classes: the initial state, the tracking state, the lost state, and the deleted state. If $t=1$, i.e. the first frame of the input video, the target set $T_t=\{T_1,T_2,\ldots,T_N\}$ with $N=M$ is generated, where target $T_j$ corresponds to detection response $D_j$; the state of each target $T_j$ is set to the initial state, and step 1 is carried out. Otherwise, go to step 5.
And 5: predicting a target set T by applying a Kalman filtering algorithm t-1 Each target T in (1) i The position and scale in the current frame are expressed as
Figure GDA0003892770730000057
Wherein +>
Figure GDA0003892770730000058
For the predicted center point coordinate, is>
Figure GDA0003892770730000059
Is the predicted width and height;
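A minimal sketch of this prediction step is given below. The constant-velocity state vector [cx, cy, w, h, vcx, vcy, vw, vh] and the noise settings are assumptions, since the patent does not spell out the filter's state-space model.

```python
import numpy as np

class KalmanBoxPredictor:
    """Constant-velocity Kalman filter over the box (cx, cy, w, h).
    The state [cx, cy, w, h, vcx, vcy, vw, vh] and the noise levels
    are assumed, not taken from the patent."""
    def __init__(self, cx, cy, w, h):
        self.x = np.array([cx, cy, w, h, 0, 0, 0, 0], dtype=float)
        self.P = np.eye(8) * 10.0          # state covariance
        self.F = np.eye(8)                 # transition: position += velocity
        self.F[:4, 4:] = np.eye(4)
        self.H = np.eye(4, 8)              # we observe (cx, cy, w, h) only
        self.Q = np.eye(8) * 0.01          # process noise (assumed)
        self.R = np.eye(4)                 # measurement noise (assumed)

    def predict(self):
        """Step 5: predicted position and scale in the current frame."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]                  # (cx~, cy~, w~, h~)

    def update(self, z):
        """Step 7: correct the state with an associated detection (cx, cy, w, h)."""
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(8) - K @ self.H) @ self.P
```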
step 6: matching and associating the target and the detection response based on the two-stage data association to obtain an optimal association result;
6.1: based on the states of all targets in the previous frame, divide the target set $T_{t-1}=\{T_1,T_2,\ldots,T_N\}$ into two classes $\Omega_1$ and $\Omega_2$, with $\Omega_1\cup\Omega_2=T_{t-1}$, where $\Omega_1$ is composed of the targets in the initial state and the tracking state, $\Omega_2$ is composed of the targets in the lost state, and $N$ is the total number of targets;
6.2: calculate the matching similarity between each target in $\Omega_1$ and each detection response in $D_t$ to obtain the similarity matrix $A_1$; taking $-A_1$ as the association cost matrix, associate the targets in $\Omega_1$ with the detection responses in $D_t$ and apply the Hungarian algorithm to solve the optimal association; according to the association result, partition $\Omega_1$ and $D_t$ as $\Omega_1=\Omega_1^A\cup\Omega_1^B$ and $D_t=D_A\cup D_B$, where the targets in $\Omega_1^A$ are successfully associated with the detection responses in $D_A$, $\Omega_1^B$ is the set of unassociated targets, and $D_B$ is the set of detection responses not associated in the first stage. The specific steps for calculating the similarity matrix $A_1$ are as follows (a pseudo-code sketch is given after step (d) below):
(a) calculate the appearance similarity $\Lambda^{app}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ by fusing the target's historical appearance features over the last $K$ frames:
$$\Lambda^{app}(i,j)=\sum_{k=1}^{K}\omega_k\,\langle X_i(t-k),\,Z_j\rangle,$$
where $\langle\cdot,\cdot\rangle$ is the inner product of vectors, $X_i(t-k)$ denotes the appearance feature vector of target $T_i$ at frame $t-k$, $Z_j$ denotes the appearance feature vector of detection response $D_j$, and $\omega_k$ is the weight of $X_i(t-k)$, computed from $C_i(t-k)$, the matching cost of target $T_i$ with its associated detection response at frame $t-k$.
In this embodiment, the historical appearance features of the target over the last 6 frames are saved, i.e., $K=6$.
(b) calculate the shape similarity $\Lambda^{shp}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ from the agreement between the predicted width and height $(\tilde w_i,\tilde h_i)$ of $T_i$ and the width and height $(w_j,h_j)$ of $D_j$.
(c) calculate the motion similarity $\Lambda^{mot}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ as the intersection-over-union (IOU) of the predicted region $\tilde B_i$ of $T_i$ and the region $B_j$ of detection response $D_j$:
$$\Lambda^{mot}(i,j)=\frac{\mathrm{area}(\tilde B_i\cap B_j)}{\mathrm{area}(\tilde B_i\cup B_j)},$$
where $\mathrm{area}(\cdot)$ denotes the area of a region.
(d) calculate the matching similarity $A_1(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ by fusing the appearance, shape and motion similarities, e.g. as their product $A_1(i,j)=\Lambda^{app}(i,j)\cdot\Lambda^{shp}(i,j)\cdot\Lambda^{mot}(i,j)$.
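The stage-one similarity computation of steps (a)-(d) can be sketched as follows. The exponential weighting of past matching costs, the exact shape-similarity formula, the product fusion, and the track/det attribute names are assumptions filling in for the original formula images:

```python
import numpy as np

K = 6  # number of historical frames fused, per this embodiment

def fuse_appearance(hist_feats, hist_costs, z):
    """Weighted cosine similarity against up to K historical features.
    Deriving weights from past matching costs via exp(-cost) is an
    assumption: features matched at lower cost get higher weight."""
    feats, costs = hist_feats[-K:], np.asarray(hist_costs[-K:], dtype=float)
    w = np.exp(-costs)
    w /= w.sum()
    return float(sum(wk * np.dot(xk, z) for wk, xk in zip(w, feats)))

def shape_similarity(pred_wh, det_wh):
    """Agreement of predicted vs. detected width/height; exact form assumed."""
    (pw, ph), (dw, dh) = pred_wh, det_wh
    return float(np.exp(-(abs(pw - dw) / (pw + dw) + abs(ph - dh) / (ph + dh))))

def iou(a, b):
    """Motion similarity: intersection-over-union of (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2]/2, a[1] - a[3]/2, a[0] + a[2]/2, a[1] + a[3]/2
    bx1, by1, bx2, by2 = b[0] - b[2]/2, b[1] - b[3]/2, b[0] + b[2]/2, b[1] + b[3]/2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def stage1_similarity(track, det):
    """A_1(i, j); fusing the three cues as a product is an assumption."""
    app = fuse_appearance(track.hist_feats, track.hist_costs, det.feature)
    shp = shape_similarity(track.pred_box[2:], (det.w, det.h))
    mot = iou(track.pred_box, (det.x, det.y, det.w, det.h))
    return app * shp * mot
```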
6.3: calculate the matching similarity between each target in $\Omega_2$ and each detection response in $D_B$ to obtain the similarity matrix $A_2$; taking $-A_2$ as the association cost matrix, associate the targets in $\Omega_2$ with the detection responses in $D_B$ and apply the Hungarian algorithm to solve the optimal association; according to the association result, partition $\Omega_2$ and $D_B$ as $\Omega_2=\Omega_2^A\cup\Omega_2^B$ and $D_B=D_B^A\cup D_B^B$, where the targets in $\Omega_2^A$ are successfully associated with the detection responses in $D_B^A$, $\Omega_2^B$ is the set of unassociated targets, and $D_B^B$ is the set of detection responses not associated in the second stage. The specific steps for calculating the similarity matrix $A_2$ are as follows (a sketch follows step (d)):
(a) calculate, with the appearance and shape similarity formulas given above, the appearance similarity $\Lambda^{app}(i,j)$ and the shape similarity $\Lambda^{shp}(i,j)$ between target $T_i$ in $\Omega_2$ and detection response $D_j$ in $D_B$;
(b) calculate the search radius $r_i$ of target $T_i$ from $\Delta t_i$, the difference between the current frame number and the maximum frame number at which $T_i$ was in the tracking state, scaled by the constant $\alpha$; in the present embodiment, $\alpha$ is taken as 0.15;
(c) define the search region $R_i$ of target $T_i$ as the circle centered at the predicted position $(\tilde x_i,\tilde y_i)$ of $T_i$ in the current frame with radius $r_i$;
(d) calculate the matching similarity $A_2(i,j)$ between target $T_i$ in $\Omega_2$ and detection response $D_j$ in $D_B$ by gating the fused appearance and shape similarities with the search region, e.g. $A_2(i,j)=I(R_i\cap D_j>0)\cdot\Lambda^{app}(i,j)\cdot\Lambda^{shp}(i,j)$, where $I(R_i\cap D_j>0)$ is the indicator function: when the detection response $D_j$ and the search region $R_i$ overlap, $I(R_i\cap D_j>0)=1$; otherwise $I(R_i\cap D_j>0)=0$.
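Steps (a)-(d) of the second stage, together with the Hungarian solution of 6.3, can be sketched as follows, reusing fuse_appearance and shape_similarity from the previous sketch. The search-radius formula (radius growing with the frames spent lost, scaled by α = 0.15 and the predicted height) and the circle/box overlap test are assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

ALPHA = 0.15  # search-radius constant alpha of this embodiment

def stage2_similarity(track, det, frame):
    """A_2(i, j) = I(R_i overlaps D_j) * appearance * shape."""
    dt = frame - track.last_tracked_frame       # frames since last tracked
    r = ALPHA * dt * track.pred_box[3]          # assumed radius: grows with dt,
                                                # scaled by the predicted height
    dist = np.hypot(det.x - track.pred_box[0], det.y - track.pred_box[1])
    if dist > r + max(det.w, det.h) / 2:        # crude circle/box overlap test
        return 0.0                              # detection outside search region R_i
    return (fuse_appearance(track.hist_feats, track.hist_costs, det.feature)
            * shape_similarity(track.pred_box[2:], (det.w, det.h)))

def hungarian_match(tracks, dets, sim_fn, min_sim=1e-6):
    """One association stage: Hungarian algorithm on cost = -similarity."""
    if not tracks or not dets:
        return [], list(range(len(tracks))), list(range(len(dets)))
    A = np.array([[sim_fn(trk, d) for d in dets] for trk in tracks])
    rows, cols = linear_sum_assignment(-A)      # maximize total similarity
    matches = [(r, c) for r, c in zip(rows, cols) if A[r, c] > min_sim]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_t = [i for i in range(len(tracks)) if i not in matched_t]
    unmatched_d = [j for j in range(len(dets)) if j not in matched_d]
    return matches, unmatched_t, unmatched_d

# Stage-2 usage: gate lost targets against the leftover detections D_B, e.g.
# m2, lost_left, d_bb = hungarian_match(lost, D_B, lambda k, d: stage2_similarity(k, d, t))
```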
And 7: as shown in fig. 3, the state and characteristics of the target are updated according to the optimal association result in step 6, and the specific steps are as follows:
(a) each detection response in $D_B^B$ that remains unassociated after both stages indicates that a new target may appear in the video; initialize a new target and set its state to the initial state. When a target in the initial state appears continuously for $f_{init}$ frames, assign it an ID, set its state parameters, and then convert it to the tracking state.
(b) for each target in $\Omega_1^A$, since an associated detection response exists, keep the target state unchanged, update the state of the target by applying the Kalman filtering algorithm, and store the appearance feature vector of the target in the current frame.
(c) for each target in $\Omega_1^B$, since there is no associated detection response, convert the target state from the tracking state to the lost state, and store the appearance feature vector of the target in the current frame.
(d) for each target in $\Omega_2^A$, since an associated detection response exists, convert the target state from the lost state to the tracking state, update the state of the target by applying the Kalman filtering algorithm, and store the appearance feature vector of the target in the current frame.
(e) for each target in $\Omega_2^B$, since there is no associated detection response, keep the target state unchanged.
(f) when a target stays in the lost state for $f_{del}$ consecutive frames, convert it to the deleted state and destroy the target.
In this example, $f_{init}=3$ and $f_{del}=20$.
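Combining the transition rules (a)-(f) of FIG. 3 with this embodiment's thresholds, a life-cycle sketch follows. The Track class, its counters, and the handling of an initial-state target that misses a frame are assumptions:

```python
from enum import Enum, auto

class State(Enum):
    INITIAL = auto()
    TRACKING = auto()
    LOST = auto()
    DELETED = auto()

F_INIT, F_DEL = 3, 20  # thresholds of this embodiment

class Track:
    """Target life cycle of FIG. 3; attribute names are assumed helpers."""
    _next_id = 1

    def __init__(self, det):
        self.state = State.INITIAL
        self.id = None
        self.hits = 1                  # consecutive frames observed while INITIAL
        self.lost_frames = 0
        self.hist_feats = [det.feature]
        self.hist_costs = [0.0]

    def on_associated(self, det, cost=0.0):
        """Rules (a), (b), (d): a matched detection confirms or revives the target."""
        if self.state == State.INITIAL:
            self.hits += 1
            if self.hits >= F_INIT:    # rule (a): assign an ID, start tracking
                self.id = Track._next_id
                Track._next_id += 1
                self.state = State.TRACKING
        else:
            self.state = State.TRACKING   # rule (d): LOST -> TRACKING
        self.lost_frames = 0
        self.hist_feats.append(det.feature)   # store current appearance feature
        self.hist_costs.append(cost)
        # The Kalman update with det would be applied here (rules (b)/(d)).

    def on_missed(self):
        """Rules (c), (e), (f): unmatched targets decay and are finally destroyed."""
        if self.state == State.TRACKING:
            self.state = State.LOST           # rule (c)
        elif self.state == State.LOST:
            self.lost_frames += 1             # rule (e): stay LOST ...
            if self.lost_frames >= F_DEL:
                self.state = State.DELETED    # rule (f): ... until f_del frames
        elif self.state == State.INITIAL:
            self.state = State.DELETED        # assumed: unconfirmed target dropped
```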
And 8: and inputting the image of the next video frame, and repeating the steps 2,3, 4, 5, 6 and 7 until the video is finished.
The implementation effect is as follows:
based on the above steps, we performed experiments on MOT16 datasets of multi-target tracking Challenge MOT Challenge. All experiments were carried out on a PC with the main parameters: central processing unit Intel Core i 7.3GHz, 16G internal memory. The algorithm is implemented in Python language.
The result shows that the technical scheme can effectively track the detected pedestrians in the video, can also realize continuous tracking when the pedestrians are shielded or detection noise exists, and outputs the correct track of the target. Moreover, the program running efficiency is high, and 10 input images can be processed in about 1 second. The experiment shows that the multi-target tracking algorithm of the embodiment can accurately and quickly realize on-line pedestrian tracking.
In summary, the invention provides an online multi-target tracking method based on deep learning and data association. The method is widely applicable to target tracking in various video scenes: for example, pedestrian tracking in video surveillance scenes, providing technical support for intelligent security systems, and vehicle tracking in complex traffic scenes, providing technical support for autonomous driving. The method follows the tracking-by-detection framework and converts the online multi-target tracking problem into a data association problem: first, all detection responses in the image are extracted with a trained target detector; then an appearance feature vector is extracted from each detection response with the deep cosine metric learning model; the association costs between targets and detection responses are calculated by combining cues such as target appearance, motion and shape; the Hungarian algorithm is applied in the two-stage data association to achieve the optimal matching between targets and detections; and finally the target states are updated according to the association result.
The above-mentioned embodiments further explain the background, technical solutions and benefits of the present invention in detail. It will be understood by those skilled in the art that the foregoing is only one embodiment of the present invention, and is not intended to limit the scope of the invention. It should be noted that any modification, equivalent replacement, improvement, etc. made by those skilled in the art within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. An online multi-target tracking method based on deep learning and data association is characterized by comprising the following steps:
step 1: inputting an image of a current frame of a video;
step 2: obtaining the set $D_t=\{D_1,D_2,\ldots,D_M\}$ of all detection responses in the image by using a target detector, where $t$ is the current frame number and $D_j$ is the $j$-th detection response, denoted as $D_j=(x_j,y_j,w_j,h_j)$, where $(x_j,y_j)$ are the center-point coordinates of detection response $D_j$, $(w_j,h_j)$ are the width and height of detection response $D_j$, and $M$ is the total number of detection responses;
step 3: extracting an appearance feature vector from each detection response in the set $D_t$ by using a deep cosine metric learning model, denoted as $\{Z_1,Z_2,\ldots,Z_M\}$, where $Z_j\in\mathbb{R}^p$ is the appearance feature of detection response $D_j$;
step 4: initializing the target states, which are divided into 4 types: the initial state, the tracking state, the lost state, and the deleted state; if $t=1$, i.e. the first frame of the input video, generating the target set $T_t=\{T_1,T_2,\ldots,T_N\}$ with $N=M$, where target $T_j$ corresponds to detection response $D_j$; setting the state of each target $T_j$ to the initial state, and turning to step 1; otherwise, turning to step 5;
step 5: predicting, by applying a Kalman filtering algorithm, the position and scale of each target $T_i$ in the target set $T_{t-1}$ in the current frame, expressed as $\tilde D_i=(\tilde x_i,\tilde y_i,\tilde w_i,\tilde h_i)$, where $(\tilde x_i,\tilde y_i)$ are the predicted center-point coordinates and $(\tilde w_i,\tilde h_i)$ are the predicted width and height;
step 6: matching and associating the target with the detection response based on the two-stage data association to obtain an optimal association result;
step 7: updating the states and features of the targets according to the optimal association result of step 6;
step 8: inputting the image of the next video frame, and repeating steps 2, 3, 4, 5, 6 and 7 until the video is finished.
2. The online multi-target tracking method based on deep learning and data association as claimed in claim 1, wherein the step 6 of matching and associating the targets and the detection responses based on two-stage data association comprises:
(a) based on the states of all targets in the previous frame, dividing the target set $T_{t-1}=\{T_1,T_2,\ldots,T_N\}$ into two classes $\Omega_1$ and $\Omega_2$, with $\Omega_1\cup\Omega_2=T_{t-1}$, wherein $\Omega_1$ is composed of the targets in the initial state and the tracking state, $\Omega_2$ is composed of the targets in the lost state, and $N$ is the total number of targets;
(b) calculating the matching similarity between each target in $\Omega_1$ and each detection response in $D_t$ to obtain a similarity matrix $A_1$; taking $-A_1$ as the association cost matrix, associating the targets in $\Omega_1$ with the detection responses in $D_t$, and applying the Hungarian algorithm to solve the optimal association; according to the association result, partitioning $\Omega_1$ and $D_t$ as $\Omega_1=\Omega_1^A\cup\Omega_1^B$ and $D_t=D_A\cup D_B$, wherein the targets in $\Omega_1^A$ are successfully associated with the detection responses in $D_A$, $\Omega_1^B$ is the set of unassociated targets, and $D_B$ is the set of detection responses not associated in the first stage;
(c) calculating the matching similarity between each target in $\Omega_2$ and each detection response in $D_B$ to obtain a similarity matrix $A_2$; taking $-A_2$ as the association cost matrix, associating the targets in $\Omega_2$ with the detection responses in $D_B$, and applying the Hungarian algorithm to solve the optimal association; according to the association result, partitioning $\Omega_2$ and $D_B$ as $\Omega_2=\Omega_2^A\cup\Omega_2^B$ and $D_B=D_B^A\cup D_B^B$, wherein the targets in $\Omega_2^A$ are successfully associated with the detection responses in $D_B^A$, $\Omega_2^B$ is the set of unassociated targets, and $D_B^B$ is the set of detection responses not associated in the second stage.
3. The online multi-target tracking method based on deep learning and data association as claimed in claim 2, wherein the calculating of the matching similarity between each target in $\Omega_1$ and each detection response in $D_t$ comprises:
(a) calculating the appearance similarity $\Lambda^{app}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ by fusing the target's historical appearance features over the last $K$ frames:
$$\Lambda^{app}(i,j)=\sum_{k=1}^{K}\omega_k\,\langle X_i(t-k),\,Z_j\rangle,$$
wherein $\langle\cdot,\cdot\rangle$ is the inner product of vectors, $X_i(t-k)$ denotes the appearance feature vector of target $T_i$ at frame $t-k$, $Z_j$ denotes the appearance feature vector of detection response $D_j$, and $\omega_k$ is the weight of $X_i(t-k)$, computed from $C_i(t-k)$, the matching cost of target $T_i$ with its associated detection response at frame $t-k$;
(b) calculating the shape similarity $\Lambda^{shp}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ from the agreement between the predicted width and height $(\tilde w_i,\tilde h_i)$ of $T_i$ and the width and height $(w_j,h_j)$ of $D_j$;
(c) calculating the motion similarity $\Lambda^{mot}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ as the intersection-over-union (IOU) of the predicted region $\tilde B_i$ of $T_i$ and the region $B_j$ of detection response $D_j$:
$$\Lambda^{mot}(i,j)=\frac{\mathrm{area}(\tilde B_i\cap B_j)}{\mathrm{area}(\tilde B_i\cup B_j)},$$
wherein $\mathrm{area}(\cdot)$ denotes the area of a region;
(d) calculating the matching similarity $A_1(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ by fusing the appearance, shape and motion similarities, e.g. as their product $A_1(i,j)=\Lambda^{app}(i,j)\cdot\Lambda^{shp}(i,j)\cdot\Lambda^{mot}(i,j)$.
4. The online multi-target tracking method based on deep learning and data association as claimed in claim 2, wherein the calculating of the matching similarity between each target in $\Omega_2$ and each detection response in $D_B$ comprises:
(a) calculating, with the appearance and shape similarity formulas given above, the appearance similarity $\Lambda^{app}(i,j)$ and the shape similarity $\Lambda^{shp}(i,j)$ between target $T_i$ in $\Omega_2$ and detection response $D_j$ in $D_B$;
(b) calculating the search radius $r_i$ of target $T_i$ from $\Delta t_i$, the difference between the current frame number and the maximum frame number at which $T_i$ was in the tracking state, scaled by a constant $\alpha$; defining the search region $R_i$ of target $T_i$ as the circle centered at the predicted position $(\tilde x_i,\tilde y_i)$ of $T_i$ in the current frame with radius $r_i$;
(c) calculating the matching similarity $A_2(i,j)$ between target $T_i$ in $\Omega_2$ and detection response $D_j$ in the detection response set $D_B$ by gating the fused appearance and shape similarities with the search region, e.g. $A_2(i,j)=I(R_i\cap D_j>0)\cdot\Lambda^{app}(i,j)\cdot\Lambda^{shp}(i,j)$, wherein $I(R_i\cap D_j>0)$ is the indicator function: when the search region $R_i$ and the detection response $D_j$ overlap, $I(R_i\cap D_j>0)=1$; otherwise $I(R_i\cap D_j>0)=0$.
5. The online multi-target tracking method based on deep learning and data association as claimed in claim 1, wherein the step 7 of updating the states and features of the targets according to the optimal association result of step 6 comprises:
(a) each detection response in $D_B^B$ that remains unassociated after both stages indicates that a new target appears in the video; initializing a new target and setting its state to the initial state; when a target in the initial state appears continuously for $f_{init}$ frames, assigning it an ID, setting its state parameters, and then converting it to the tracking state;
(b) for each target in $\Omega_1^A$, since an associated detection response exists, keeping the target state unchanged, updating the state of the target by applying a Kalman filtering algorithm, and storing the appearance feature vector of the target in the current frame;
(c) for each target in $\Omega_1^B$, since there is no associated detection response, converting the target state from the tracking state to the lost state, and storing the appearance feature vector of the target in the current frame;
(d) for each target in $\Omega_2^A$, since an associated detection response exists, converting the target state from the lost state to the tracking state, updating the state of the target by applying a Kalman filtering algorithm, and storing the appearance feature vector of the target in the current frame;
(e) for each target in $\Omega_2^B$, since there is no associated detection response, keeping the target state unchanged;
(f) when a target stays in the lost state for $f_{del}$ consecutive frames, converting it to the deleted state and destroying the target.
CN201910429444.XA — Online multi-target tracking method based on deep learning and data association — filed 2019-05-22, priority date 2019-05-22, granted as CN110288627B (Active)

Priority Applications (1)

CN201910429444.XA, priority date 2019-05-22, filing date 2019-05-22: Online multi-target tracking method based on deep learning and data association

Publications (2)

CN110288627A (application publication), published 2019-09-27
CN110288627B (granted patent), published 2023-03-31

Family

Family ID: 68002271
Family application: CN201910429444.XA (granted as CN110288627B), filed 2019-05-22, priority date 2019-05-22
Country: CN

Families Citing this family (9)

* Cited by examiner, † Cited by third party

CN112581496A (priority 2019-09-29, published 2021-03-30, Sichuan University): Multi-target pedestrian trajectory tracking method based on reinforcement learning
CN110796687B (priority 2019-10-30, published 2022-04-01, University of Electronic Science and Technology of China): Sky background infrared imaging multi-target tracking method
CN113077495B (priority 2020-01-06, published 2023-01-31, Guangzhou Automobile Group Co., Ltd.): Online multi-target tracking method, system, computer equipment and readable storage medium
CN111476826A (priority 2020-04-10, published 2020-07-31, University of Electronic Science and Technology of China): Multi-target vehicle tracking method based on SSD target detection
CN111932588B (priority 2020-08-07, published 2024-01-30, Zhejiang University): Tracking method of airborne unmanned aerial vehicle multi-target tracking system based on deep learning
CN112163473A (priority 2020-09-15, published 2021-01-01, Zhengzhou Jinhui Computer System Engineering Co., Ltd.): Multi-target tracking method and device, electronic equipment and computer storage medium
CN112149762A (priority 2020-11-24, published 2020-12-29, Beijing Wodong Tianjun Information Technology Co., Ltd.): Target tracking method, target tracking apparatus, and computer-readable storage medium
CN117292327A (priority 2023-11-23, published 2023-12-26, Anhui Qixin Mingzhi Technology Co., Ltd.): Method, device, equipment and medium for associating targets
CN117495917B (priority 2024-01-03, published 2024-03-26, Shandong University of Science and Technology): Multi-target tracking method based on JDE multi-task network model

Family Cites Families (2)

* Cited by examiner, † Cited by third party

CN104835178B (priority 2015-02-02, published 2017-08-18, Zhengzhou University of Light Industry): Method for tracking and recognizing small moving targets with low signal-to-noise ratio
CN109360226B (priority 2018-10-17, published 2021-09-24, Wuhan University): Multi-target tracking method based on time series multi-feature fusion



Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant