CN110288627B - Online multi-target tracking method based on deep learning and data association - Google Patents

Online multi-target tracking method based on deep learning and data association

Info

Publication number: CN110288627B (granted); application number: CN201910429444.XA; application publication: CN110288627A
Authority: CN (China)
Prior art keywords: target, state, detection, detection response, response
Legal status: Active (granted)
Inventors: 陈小波 (Chen Xiaobo), 冀建宇 (Ji Jianyu), 王彦钧 (Wang Yanjun), 蔡英凤 (Cai Yingfeng), 王海 (Wang Hai), 陈龙 (Chen Long)
Original and current assignee: Jiangsu University
Application filed by Jiangsu University; priority to CN201910429444.XA

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 — Image analysis
    • G06T7/20 — Analysis of motion
    • G06T2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T2207/10 — Image acquisition modality
    • G06T2207/10016 — Video; image sequence
    • G06T2207/20 — Special algorithmic details
    • G06T2207/20081 — Training; learning

Abstract

The invention discloses an online multi-target tracking method based on deep learning and data association, which comprises the following steps: 1. inputting the image of the current frame of a video; 2. obtaining all detection responses in the image by using a target detector; 3. extracting appearance features of the detection responses by using a deep cosine metric learning model; 4. initializing the target states; 5. predicting the position and scale of each target in the next frame by using a Kalman filtering algorithm; 6. matching and associating the targets and the detection responses based on two-stage data association to obtain an optimal association result; 7. updating the states and features of the targets according to the optimal association result of step 6; 8. inputting the image of the next video frame and repeating steps 2, 3, 4, 5, 6 and 7 until the video is finished. Compared with the prior art, the method can correctly associate targets under complex conditions such as target interaction, occlusion, and similar appearance among targets, achieving robust and continuous multi-target tracking.

Description

Online multi-target tracking method based on deep learning and data association
Technical Field
The invention relates to a target tracking method, in particular to an online multi-target tracking method based on deep learning and data association, and belongs to the field of computer vision.
Background
Multi-target tracking is a particularly important branch of computer vision and is widely applied in various video analysis scenarios, such as autonomous driving, robot navigation, intelligent traffic video surveillance, and motion analysis.
The task of online multi-target tracking is to reliably estimate the position of each target frame by frame and to follow the same target across frames so as to estimate the trajectories of multiple targets. In recent years, thanks to the development of deep learning, the performance of target detection algorithms has improved continuously and detection responses have become more reliable; as a result, the tracking-by-detection framework has attracted wide attention, achieved remarkable results, and become the mainstream of current multi-target tracking. Under this framework, a target detector trained offline first detects the targets in each frame independently, yielding the number and positions of the targets; the targets detected in adjacent frames are then associated according to cues such as appearance and motion, realizing target matching and tracking. Detection-based tracking algorithms can be divided into two categories: offline tracking and online tracking.
At present, detection-based tracking algorithms still face many challenges. The tracking effect depends heavily on the performance of the detector, and in complex scenes, when targets are severely occluded by obstacles or by one another, multi-target tracking algorithms easily lose track of targets or confuse target identities. In addition, detection noise from the target detector and drastic changes of target scale can cause tracking drift.
Disclosure of Invention
The purpose of the invention is as follows: the invention provides an online multi-target tracking method based on deep learning and data association, and aims to solve problems such as frequent identity (ID) switching and tracking drift that arise in existing multi-target tracking techniques when targets with similar appearance occlude one another in complex scenes.
The invention provides a novel multi-target tracking method that addresses the multi-target tracking problem from multiple angles. 1) The appearance model of the target is built with a deep cosine metric learning model: features are extracted from the target image by a multi-layer convolutional network, and the cosine similarity between feature vectors is taken as the similarity between target appearances, realizing effective discrimination of different target appearances. 2) Considering the continuity of the dynamic change of target appearance, a target appearance similarity measure fusing multi-frame historical appearance features is constructed, which effectively mitigates the influence of detector defects or mutual occlusion between targets on target matching accuracy. 3) A two-stage data association method based on the target state is proposed: corresponding association strategies are designed for targets of different reliability, and the Hungarian algorithm is used for data association. In congested traffic scenes with frequent occlusion, the algorithm achieves accurate and stable multi-target tracking.
The technical scheme is as follows: an online multi-target tracking method based on deep learning and data association is characterized by comprising the following steps:
step 1: inputting an image of a current frame of a video;
step 2: obtaining the set $D_t=\{D_1,D_2,\ldots,D_M\}$ of all detection responses in the image by using a target detector, where $t$ is the current frame number and $D_j$ is the $j$-th detection response, denoted as $D_j=(x_j,y_j,w_j,h_j)$, where $(x_j,y_j)$ are the center-point coordinates of detection response $D_j$, $(w_j,h_j)$ are the width and height of detection response $D_j$, and $M$ is the total number of detection responses;
step 3: extracting an appearance feature vector from each detection response in the set $D_t$ by using the deep cosine metric learning model, denoted as $\{Z_1,Z_2,\ldots,Z_M\}$, where $Z_j\in\mathbb{R}^p$ is the appearance feature of detection response $D_j$;
step 4: initializing the target states, which are divided into 4 types: the initial state, the tracking state, the lost state, and the deleted state; if $t=1$, i.e. the first frame of the input video, generating the target set $T_t=\{T_1,T_2,\ldots,T_N\}$ with $N=M$, where target $T_j$ corresponds to detection response $D_j$; setting the state of each target $T_j$ to the initial state, and turning to step 1; otherwise, turning to step 5;
step 5: predicting, by applying a Kalman filtering algorithm, the position and scale of each target $T_i$ in the target set $T_{t-1}$ in the current frame, expressed as $\tilde D_i=(\tilde x_i,\tilde y_i,\tilde w_i,\tilde h_i)$, where $(\tilde x_i,\tilde y_i)$ are the predicted center-point coordinates and $(\tilde w_i,\tilde h_i)$ are the predicted width and height;
step 6: matching and associating the target and the detection response based on the two-stage data association to obtain an optimal association result;
step 7: updating the states and features of the targets according to the optimal association result of step 6;
step 8: inputting the image of the next video frame, and repeating steps 2, 3, 4, 5, 6 and 7 until the video is finished.
Preferably, the step 6 of matching and associating the targets and the detection responses based on two-stage data association comprises:
(a) based on the states of all targets in the previous frame, dividing the target set $T_{t-1}=\{T_1,T_2,\ldots,T_N\}$ into two classes $\Omega_1$ and $\Omega_2$, with $\Omega_1\cup\Omega_2=T_{t-1}$, where $\Omega_1$ is composed of the targets in the initial state and the tracking state, $\Omega_2$ is composed of the targets in the lost state, and $N$ is the total number of targets;
(b) calculating the matching similarity between each target in $\Omega_1$ and each detection response in $D_t$ to obtain a similarity matrix $A_1$; taking $-A_1$ as the association cost matrix, associating the targets in $\Omega_1$ with the detection responses in $D_t$, and applying the Hungarian algorithm to solve the optimal association; according to the association result, partitioning $\Omega_1$ and $D_t$ as $\Omega_1=\Omega_1^A\cup\Omega_1^B$ and $D_t=D_A\cup D_B$, where the targets in $\Omega_1^A$ are successfully associated with the detection responses in $D_A$, $\Omega_1^B$ is the set of unassociated targets, and $D_B$ is the set of detection responses not associated in the first stage;
(c) calculating the matching similarity between each target in $\Omega_2$ and each detection response in $D_B$ to obtain a similarity matrix $A_2$; taking $-A_2$ as the association cost matrix, associating the targets in $\Omega_2$ with the detection responses in $D_B$, and applying the Hungarian algorithm to solve the optimal association; according to the association result, partitioning $\Omega_2$ and $D_B$ as $\Omega_2=\Omega_2^A\cup\Omega_2^B$ and $D_B=D_B^A\cup D_B^B$, where the targets in $\Omega_2^A$ are successfully associated with the detection responses in $D_B^A$, $\Omega_2^B$ is the set of unassociated targets, and $D_B^B$ is the set of detection responses not associated in the second stage.
Preferably, the calculating of the matching similarity between each target in $\Omega_1$ and each detection response in $D_t$ comprises:
(a) calculating the appearance similarity $\Lambda^{app}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ by fusing the target's historical appearance features over the last $K$ frames:
$$\Lambda^{app}(i,j)=\sum_{k=1}^{K}\omega_k\,\langle X_i(t-k),\,Z_j\rangle,$$
where $\langle\cdot,\cdot\rangle$ is the inner product of vectors, $X_i(t-k)$ denotes the appearance feature vector of target $T_i$ at frame $t-k$, $Z_j$ denotes the appearance feature vector of detection response $D_j$, and $\omega_k$ is the weight of $X_i(t-k)$, computed from $C_i(t-k)$, the matching cost of target $T_i$ with its associated detection response at frame $t-k$;
(b) calculating the shape similarity $\Lambda^{shp}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ from the agreement between the predicted width and height $(\tilde w_i,\tilde h_i)$ of $T_i$ and the width and height $(w_j,h_j)$ of $D_j$;
(c) calculating the motion similarity $\Lambda^{mot}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ as the intersection-over-union (IOU) of the predicted region $\tilde B_i$ of $T_i$ and the region $B_j$ of detection response $D_j$:
$$\Lambda^{mot}(i,j)=\frac{\mathrm{area}(\tilde B_i\cap B_j)}{\mathrm{area}(\tilde B_i\cup B_j)},$$
where $\mathrm{area}(\cdot)$ denotes the area of a region;
(d) calculating the matching similarity $A_1(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ by fusing the appearance, shape and motion similarities, e.g. as their product $A_1(i,j)=\Lambda^{app}(i,j)\cdot\Lambda^{shp}(i,j)\cdot\Lambda^{mot}(i,j)$.
Preferably, the calculating of the matching similarity between each target in $\Omega_2$ and each detection response in $D_B$ comprises:
(a) calculating, with the appearance and shape similarity formulas given above, the appearance similarity $\Lambda^{app}(i,j)$ and the shape similarity $\Lambda^{shp}(i,j)$ between target $T_i$ in $\Omega_2$ and detection response $D_j$ in $D_B$;
(b) calculating the search radius $r_i$ of target $T_i$ from $\Delta t_i$, the difference between the current frame number and the maximum frame number at which $T_i$ was in the tracking state, scaled by a constant $\alpha$; the search region $R_i$ of target $T_i$ is defined as the circle centered at the predicted position $(\tilde x_i,\tilde y_i)$ of $T_i$ in the current frame with radius $r_i$;
(c) calculating the matching similarity $A_2(i,j)$ between target $T_i$ in $\Omega_2$ and detection response $D_j$ in the detection response set $D_B$ by gating the fused appearance and shape similarities with the search region, e.g. $A_2(i,j)=I(R_i\cap D_j>0)\cdot\Lambda^{app}(i,j)\cdot\Lambda^{shp}(i,j)$, where $I(R_i\cap D_j>0)$ is the indicator function: when the search region $R_i$ and the detection response $D_j$ overlap, $I(R_i\cap D_j>0)=1$; otherwise $I(R_i\cap D_j>0)=0$.
Preferably, the step 7 of updating the states and features of the targets according to the optimal association result of step 6 comprises:
(a) each detection response in $D_B^B$ that remains unassociated after both stages indicates that a new target may appear in the video; initialize a new target and set its state to the initial state. When a target in the initial state appears continuously for $f_{init}$ frames, assign it an ID, set its state parameters, and then convert it to the tracking state;
(b) for each target in $\Omega_1^A$, since an associated detection response exists, keep the target state unchanged, update the state of the target by applying the Kalman filtering algorithm, and store the appearance feature vector of the target in the current frame;
(c) for each target in $\Omega_1^B$, since there is no associated detection response, convert the target state from the tracking state to the lost state, and store the appearance feature vector of the target in the current frame;
(d) for each target in $\Omega_2^A$, since an associated detection response exists, convert the target state from the lost state to the tracking state, update the state of the target by applying the Kalman filtering algorithm, and store the appearance feature vector of the target in the current frame;
(e) for each target in $\Omega_2^B$, since there is no associated detection response, keep the target state unchanged;
(f) when a target stays in the lost state for $f_{del}$ consecutive frames, convert it to the deleted state and destroy the target.
Has the beneficial effects that: 1. the method learns the appearance model of the target with a deep cosine metric learning model: features are extracted from the target image by a multi-layer convolutional network and the cosine similarity between feature vectors is taken as the similarity between target appearances, realizing effective discrimination of different target appearances and effectively alleviating the ID-switching problem caused by the interaction of targets with similar appearance in complex scenes; 2. considering the continuity of the dynamic change of target appearance, the method constructs a target appearance similarity measure fusing multi-frame historical appearance features, effectively mitigating the influence of detector defects or mutual occlusion between targets on matching accuracy; 3. by adopting a two-stage data association method based on the target state, designing corresponding association strategies for the different states of the targets, and applying the Hungarian algorithm for data association, the method effectively alleviates the track fragmentation (Fragment) problem caused by data association failure.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a framework of a deep cosine metric learning model of the present invention;
FIG. 3 is a target state transition diagram of the present invention.
Detailed Description
The technical solution of the present invention will be further explained in detail with reference to the drawings and the specific embodiments, taking the on-line multi-target tracking of pedestrians as an example, but the scope of protection of the present invention is not limited to the following embodiments.
An off-line training stage:
off-line training of a deep cosine metric learning model:
given a set of training samples { (x) i ,y i ) I =1,2,3, \ 8230;, L }, where x is i ∈R 128×64 For normalized pedestrian images, y i E {1,2,3, \8230;, K } is the corresponding pedestrian category label, and L is the number of training samples. The deep cosine metric learning model learns a feature extraction function f (x) from a training sample, an input pedestrian image x is mapped into an embedded feature space, and then a cosine softmax classifier is applied to the embedded feature space to maximize the posterior probability of classification. The cosine softmax classifier is defined as follows:
Figure GDA0003892770730000045
wherein
Figure GDA0003892770730000046
As a normalized weight vector, ω k Is a weight vector of class k, τ is a scale parameter, f (x) is a feature vector extracted from the image, f (x) has a unit length. Due to->
Figure GDA0003892770730000051
And f (x) each have a unit length of
Figure GDA0003892770730000052
Expressed as the cosine of the angle between two vectors, the angle between each class of targets and its corresponding weight vector can be reduced by maximizing the posterior probability P (y = k | f (x)).
The cross entropy loss function used to train the deep cosine metric learning model is:
Figure GDA0003892770730000053
wherein I (y) i K) is an indicator function when y i K, I (y) i = k) =1, otherwise I (y) i =k)=0。
In this embodiment, a convolutional neural network (CNN) implements the feature extraction function $f(x)$; the structure of the CNN is shown in FIG. 2: the size of the input image is 128 × 64, the length of the output feature vector is 128, and the activation function of each layer is the Exponential Linear Unit (ELU). The network is trained with the pedestrian images of the Market-1501 database, and the network parameters are updated with the Adam optimization method.
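To make the training stage concrete, the following is a minimal PyTorch sketch of a cosine softmax classifier over unit-length embeddings. It is an illustration only: the three-layer backbone is a stand-in for the architecture of FIG. 2 (which is not fully specified here), and learning τ as a parameter is one common choice, not necessarily the patent's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineMetricNet(nn.Module):
    """Backbone + cosine softmax head: maps 128x64 crops to 128-d
    unit-length embeddings and scaled-cosine class logits."""
    def __init__(self, num_classes: int, embed_dim: int = 128):
        super().__init__()
        # Stand-in backbone; the real network follows FIG. 2.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ELU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ELU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, embed_dim),
        )
        self.class_weights = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.log_tau = nn.Parameter(torch.zeros(()))  # scale parameter tau (learned here)

    def embed(self, x):
        return F.normalize(self.backbone(x), dim=1)   # f(x) with unit length

    def forward(self, x):
        w = F.normalize(self.class_weights, dim=1)    # normalized weight vectors
        return self.log_tau.exp() * self.embed(x) @ w.t()  # tau * cosine logits

model = CosineMetricNet(num_classes=751)              # Market-1501 has 751 training identities
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(8, 3, 128, 64)                   # dummy batch of pedestrian crops
labels = torch.randint(0, 751, (8,))
loss = F.cross_entropy(model(images), labels)         # cross-entropy on the cosine softmax
optimizer.zero_grad(); loss.backward(); optimizer.step()

# At tracking time only the unit-length embeddings are used:
# Z = model.embed(crops)  ->  cosine similarity reduces to a plain inner product.
```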
An online pedestrian multi-target tracking stage:
specifically, as shown in fig. 1, the invention provides an online multi-target tracking method based on deep learning and data association, and the method has the key technical steps as follows:
step 1: inputting an image of a current frame of a video;
step 2: a set $D_t=\{D_1,D_2,\ldots,D_M\}$ of all detection responses in the image is obtained using a detector, where $t$ is the current frame number and $D_j$ is the $j$-th detection response, denoted as $D_j=(x_j,y_j,w_j,h_j)$, where $(x_j,y_j)$ are the center-point coordinates of detection response $D_j$, $(w_j,h_j)$ are its width and height, and $M$ is the total number of detection responses;
in the present embodiment, the pedestrian detector used is a DPM (Deformable Parts Model).
And 3, step 3: detecting a response set D by using the offline trained deep cosine metric learning model t All detection responses in (2) extract an appearance feature vector, denoted as { Z } 1 ,Z 2 ,…,Z M In which Z is j ∈R p To detect a response D j Extracted appearance features;
step 4: the target states are initialized. Target states are classified into 4 classes: the initial state, the tracking state, the lost state, and the deleted state. If $t=1$, i.e. the first frame of the input video, the target set $T_t=\{T_1,T_2,\ldots,T_N\}$ with $N=M$ is generated, where target $T_j$ corresponds to detection response $D_j$; the state of each target $T_j$ is set to the initial state, and step 1 is carried out. Otherwise, go to step 5.
And 5: predicting a target set T by applying a Kalman filtering algorithm t-1 Each target T in (1) i The position and scale in the current frame are expressed as
Figure GDA0003892770730000057
Wherein +>
Figure GDA0003892770730000058
For the predicted center point coordinate, is>
Figure GDA0003892770730000059
Is the predicted width and height;
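A minimal sketch of this prediction step is given below. The constant-velocity state vector [cx, cy, w, h, vcx, vcy, vw, vh] and the noise settings are assumptions, since the patent does not spell out the filter's state-space model.

```python
import numpy as np

class KalmanBoxPredictor:
    """Constant-velocity Kalman filter over the box (cx, cy, w, h).
    The state [cx, cy, w, h, vcx, vcy, vw, vh] and the noise levels
    are assumed, not taken from the patent."""
    def __init__(self, cx, cy, w, h):
        self.x = np.array([cx, cy, w, h, 0, 0, 0, 0], dtype=float)
        self.P = np.eye(8) * 10.0          # state covariance
        self.F = np.eye(8)                 # transition: position += velocity
        self.F[:4, 4:] = np.eye(4)
        self.H = np.eye(4, 8)              # we observe (cx, cy, w, h) only
        self.Q = np.eye(8) * 0.01          # process noise (assumed)
        self.R = np.eye(4)                 # measurement noise (assumed)

    def predict(self):
        """Step 5: predicted position and scale in the current frame."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]                  # (cx~, cy~, w~, h~)

    def update(self, z):
        """Step 7: correct the state with an associated detection (cx, cy, w, h)."""
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(8) - K @ self.H) @ self.P
```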
step 6: matching and associating the target and the detection response based on the two-stage data association to obtain an optimal association result;
6.1: based on the states of all targets in the previous frame, divide the target set $T_{t-1}=\{T_1,T_2,\ldots,T_N\}$ into two classes $\Omega_1$ and $\Omega_2$, with $\Omega_1\cup\Omega_2=T_{t-1}$, where $\Omega_1$ is composed of the targets in the initial state and the tracking state, $\Omega_2$ is composed of the targets in the lost state, and $N$ is the total number of targets;
6.2: calculate the matching similarity between each target in $\Omega_1$ and each detection response in $D_t$ to obtain the similarity matrix $A_1$; taking $-A_1$ as the association cost matrix, associate the targets in $\Omega_1$ with the detection responses in $D_t$ and apply the Hungarian algorithm to solve the optimal association; according to the association result, partition $\Omega_1$ and $D_t$ as $\Omega_1=\Omega_1^A\cup\Omega_1^B$ and $D_t=D_A\cup D_B$, where the targets in $\Omega_1^A$ are successfully associated with the detection responses in $D_A$, $\Omega_1^B$ is the set of unassociated targets, and $D_B$ is the set of detection responses not associated in the first stage. The specific steps for calculating the similarity matrix $A_1$ are as follows (a pseudo-code sketch is given after step (d) below):
(a) calculate the appearance similarity $\Lambda^{app}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ by fusing the target's historical appearance features over the last $K$ frames:
$$\Lambda^{app}(i,j)=\sum_{k=1}^{K}\omega_k\,\langle X_i(t-k),\,Z_j\rangle,$$
where $\langle\cdot,\cdot\rangle$ is the inner product of vectors, $X_i(t-k)$ denotes the appearance feature vector of target $T_i$ at frame $t-k$, $Z_j$ denotes the appearance feature vector of detection response $D_j$, and $\omega_k$ is the weight of $X_i(t-k)$, computed from $C_i(t-k)$, the matching cost of target $T_i$ with its associated detection response at frame $t-k$.
In this embodiment, the historical appearance features of the target over the last 6 frames are saved, i.e., $K=6$.
(b) calculate the shape similarity $\Lambda^{shp}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ from the agreement between the predicted width and height $(\tilde w_i,\tilde h_i)$ of $T_i$ and the width and height $(w_j,h_j)$ of $D_j$.
(c) calculate the motion similarity $\Lambda^{mot}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ as the intersection-over-union (IOU) of the predicted region $\tilde B_i$ of $T_i$ and the region $B_j$ of detection response $D_j$:
$$\Lambda^{mot}(i,j)=\frac{\mathrm{area}(\tilde B_i\cap B_j)}{\mathrm{area}(\tilde B_i\cup B_j)},$$
where $\mathrm{area}(\cdot)$ denotes the area of a region.
(d) calculate the matching similarity $A_1(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ by fusing the appearance, shape and motion similarities, e.g. as their product $A_1(i,j)=\Lambda^{app}(i,j)\cdot\Lambda^{shp}(i,j)\cdot\Lambda^{mot}(i,j)$.
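The stage-one similarity computation of steps (a)-(d) can be sketched as follows. The exponential weighting of past matching costs, the exact shape-similarity formula, the product fusion, and the track/det attribute names are assumptions filling in for the original formula images:

```python
import numpy as np

K = 6  # number of historical frames fused, per this embodiment

def fuse_appearance(hist_feats, hist_costs, z):
    """Weighted cosine similarity against up to K historical features.
    Deriving weights from past matching costs via exp(-cost) is an
    assumption: features matched at lower cost get higher weight."""
    feats, costs = hist_feats[-K:], np.asarray(hist_costs[-K:], dtype=float)
    w = np.exp(-costs)
    w /= w.sum()
    return float(sum(wk * np.dot(xk, z) for wk, xk in zip(w, feats)))

def shape_similarity(pred_wh, det_wh):
    """Agreement of predicted vs. detected width/height; exact form assumed."""
    (pw, ph), (dw, dh) = pred_wh, det_wh
    return float(np.exp(-(abs(pw - dw) / (pw + dw) + abs(ph - dh) / (ph + dh))))

def iou(a, b):
    """Motion similarity: intersection-over-union of (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2]/2, a[1] - a[3]/2, a[0] + a[2]/2, a[1] + a[3]/2
    bx1, by1, bx2, by2 = b[0] - b[2]/2, b[1] - b[3]/2, b[0] + b[2]/2, b[1] + b[3]/2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def stage1_similarity(track, det):
    """A_1(i, j); fusing the three cues as a product is an assumption."""
    app = fuse_appearance(track.hist_feats, track.hist_costs, det.feature)
    shp = shape_similarity(track.pred_box[2:], (det.w, det.h))
    mot = iou(track.pred_box, (det.x, det.y, det.w, det.h))
    return app * shp * mot
```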
6.3: calculate the matching similarity between each target in $\Omega_2$ and each detection response in $D_B$ to obtain the similarity matrix $A_2$; taking $-A_2$ as the association cost matrix, associate the targets in $\Omega_2$ with the detection responses in $D_B$ and apply the Hungarian algorithm to solve the optimal association; according to the association result, partition $\Omega_2$ and $D_B$ as $\Omega_2=\Omega_2^A\cup\Omega_2^B$ and $D_B=D_B^A\cup D_B^B$, where the targets in $\Omega_2^A$ are successfully associated with the detection responses in $D_B^A$, $\Omega_2^B$ is the set of unassociated targets, and $D_B^B$ is the set of detection responses not associated in the second stage. The specific steps for calculating the similarity matrix $A_2$ are as follows (a sketch follows step (d)):
(a) calculate, with the appearance and shape similarity formulas given above, the appearance similarity $\Lambda^{app}(i,j)$ and the shape similarity $\Lambda^{shp}(i,j)$ between target $T_i$ in $\Omega_2$ and detection response $D_j$ in $D_B$;
(b) calculate the search radius $r_i$ of target $T_i$ from $\Delta t_i$, the difference between the current frame number and the maximum frame number at which $T_i$ was in the tracking state, scaled by the constant $\alpha$; in the present embodiment, $\alpha$ is taken as 0.15;
(c) define the search region $R_i$ of target $T_i$ as the circle centered at the predicted position $(\tilde x_i,\tilde y_i)$ of $T_i$ in the current frame with radius $r_i$;
(d) calculate the matching similarity $A_2(i,j)$ between target $T_i$ in $\Omega_2$ and detection response $D_j$ in $D_B$ by gating the fused appearance and shape similarities with the search region, e.g. $A_2(i,j)=I(R_i\cap D_j>0)\cdot\Lambda^{app}(i,j)\cdot\Lambda^{shp}(i,j)$, where $I(R_i\cap D_j>0)$ is the indicator function: when the detection response $D_j$ and the search region $R_i$ overlap, $I(R_i\cap D_j>0)=1$; otherwise $I(R_i\cap D_j>0)=0$.
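Steps (a)-(d) of the second stage, together with the Hungarian solution of 6.3, can be sketched as follows, reusing fuse_appearance and shape_similarity from the previous sketch. The search-radius formula (radius growing with the frames spent lost, scaled by α = 0.15 and the predicted height) and the circle/box overlap test are assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

ALPHA = 0.15  # search-radius constant alpha of this embodiment

def stage2_similarity(track, det, frame):
    """A_2(i, j) = I(R_i overlaps D_j) * appearance * shape."""
    dt = frame - track.last_tracked_frame       # frames since last tracked
    r = ALPHA * dt * track.pred_box[3]          # assumed radius: grows with dt,
                                                # scaled by the predicted height
    dist = np.hypot(det.x - track.pred_box[0], det.y - track.pred_box[1])
    if dist > r + max(det.w, det.h) / 2:        # crude circle/box overlap test
        return 0.0                              # detection outside search region R_i
    return (fuse_appearance(track.hist_feats, track.hist_costs, det.feature)
            * shape_similarity(track.pred_box[2:], (det.w, det.h)))

def hungarian_match(tracks, dets, sim_fn, min_sim=1e-6):
    """One association stage: Hungarian algorithm on cost = -similarity."""
    if not tracks or not dets:
        return [], list(range(len(tracks))), list(range(len(dets)))
    A = np.array([[sim_fn(trk, d) for d in dets] for trk in tracks])
    rows, cols = linear_sum_assignment(-A)      # maximize total similarity
    matches = [(r, c) for r, c in zip(rows, cols) if A[r, c] > min_sim]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_t = [i for i in range(len(tracks)) if i not in matched_t]
    unmatched_d = [j for j in range(len(dets)) if j not in matched_d]
    return matches, unmatched_t, unmatched_d

# Stage-2 usage: gate lost targets against the leftover detections D_B, e.g.
# m2, lost_left, d_bb = hungarian_match(lost, D_B, lambda k, d: stage2_similarity(k, d, t))
```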
And 7: as shown in fig. 3, the state and characteristics of the target are updated according to the optimal association result in step 6, and the specific steps are as follows:
(a) each detection response in $D_B^B$ that remains unassociated after both stages indicates that a new target may appear in the video; initialize a new target and set its state to the initial state. When a target in the initial state appears continuously for $f_{init}$ frames, assign it an ID, set its state parameters, and then convert it to the tracking state.
(b) for each target in $\Omega_1^A$, since an associated detection response exists, keep the target state unchanged, update the state of the target by applying the Kalman filtering algorithm, and store the appearance feature vector of the target in the current frame.
(c) for each target in $\Omega_1^B$, since there is no associated detection response, convert the target state from the tracking state to the lost state, and store the appearance feature vector of the target in the current frame.
(d) for each target in $\Omega_2^A$, since an associated detection response exists, convert the target state from the lost state to the tracking state, update the state of the target by applying the Kalman filtering algorithm, and store the appearance feature vector of the target in the current frame.
(e) for each target in $\Omega_2^B$, since there is no associated detection response, keep the target state unchanged.
(f) when a target stays in the lost state for $f_{del}$ consecutive frames, convert it to the deleted state and destroy the target.
In this example, $f_{init}=3$ and $f_{del}=20$.
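Combining the transition rules (a)-(f) of FIG. 3 with this embodiment's thresholds, a life-cycle sketch follows. The Track class, its counters, and the handling of an initial-state target that misses a frame are assumptions:

```python
from enum import Enum, auto

class State(Enum):
    INITIAL = auto()
    TRACKING = auto()
    LOST = auto()
    DELETED = auto()

F_INIT, F_DEL = 3, 20  # thresholds of this embodiment

class Track:
    """Target life cycle of FIG. 3; attribute names are assumed helpers."""
    _next_id = 1

    def __init__(self, det):
        self.state = State.INITIAL
        self.id = None
        self.hits = 1                  # consecutive frames observed while INITIAL
        self.lost_frames = 0
        self.hist_feats = [det.feature]
        self.hist_costs = [0.0]

    def on_associated(self, det, cost=0.0):
        """Rules (a), (b), (d): a matched detection confirms or revives the target."""
        if self.state == State.INITIAL:
            self.hits += 1
            if self.hits >= F_INIT:    # rule (a): assign an ID, start tracking
                self.id = Track._next_id
                Track._next_id += 1
                self.state = State.TRACKING
        else:
            self.state = State.TRACKING   # rule (d): LOST -> TRACKING
        self.lost_frames = 0
        self.hist_feats.append(det.feature)   # store current appearance feature
        self.hist_costs.append(cost)
        # The Kalman update with det would be applied here (rules (b)/(d)).

    def on_missed(self):
        """Rules (c), (e), (f): unmatched targets decay and are finally destroyed."""
        if self.state == State.TRACKING:
            self.state = State.LOST           # rule (c)
        elif self.state == State.LOST:
            self.lost_frames += 1             # rule (e): stay LOST ...
            if self.lost_frames >= F_DEL:
                self.state = State.DELETED    # rule (f): ... until f_del frames
        elif self.state == State.INITIAL:
            self.state = State.DELETED        # assumed: unconfirmed target dropped
```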
And 8: and inputting the image of the next video frame, and repeating the steps 2,3, 4, 5, 6 and 7 until the video is finished.
The implementation effect is as follows:
based on the above steps, we performed experiments on MOT16 datasets of multi-target tracking Challenge MOT Challenge. All experiments were carried out on a PC with the main parameters: central processing unit Intel Core i 7.3GHz, 16G internal memory. The algorithm is implemented in Python language.
The result shows that the technical scheme can effectively track the detected pedestrians in the video, can also realize continuous tracking when the pedestrians are shielded or detection noise exists, and outputs the correct track of the target. Moreover, the program running efficiency is high, and 10 input images can be processed in about 1 second. The experiment shows that the multi-target tracking algorithm of the embodiment can accurately and quickly realize on-line pedestrian tracking.
In summary, the invention provides an online multi-target tracking method based on deep learning and data association. The method is widely applicable to target tracking in various video scenes: for example, pedestrian tracking in video surveillance scenes, providing technical support for intelligent security systems, and vehicle tracking in complex traffic scenes, providing technical support for autonomous driving. The method follows the tracking-by-detection framework and converts the online multi-target tracking problem into a data association problem: first, all detection responses in the image are extracted with a trained target detector; then an appearance feature vector is extracted from each detection response with the deep cosine metric learning model; the association costs between targets and detection responses are calculated by combining cues such as target appearance, motion and shape; the Hungarian algorithm is applied in the two-stage data association to achieve the optimal matching between targets and detections; and finally the target states are updated according to the association result.
The above-mentioned embodiments further explain the background, technical solutions and benefits of the present invention in detail. It will be understood by those skilled in the art that the foregoing is only one embodiment of the present invention, and is not intended to limit the scope of the invention. It should be noted that any modification, equivalent replacement, improvement, etc. made by those skilled in the art within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. An online multi-target tracking method based on deep learning and data association is characterized by comprising the following steps:
step 1: inputting an image of a current frame of a video;
step 2: obtaining the set $D_t=\{D_1,D_2,\ldots,D_M\}$ of all detection responses in the image by using a target detector, where $t$ is the current frame number and $D_j$ is the $j$-th detection response, denoted as $D_j=(x_j,y_j,w_j,h_j)$, where $(x_j,y_j)$ are the center-point coordinates of detection response $D_j$, $(w_j,h_j)$ are the width and height of detection response $D_j$, and $M$ is the total number of detection responses;
step 3: extracting an appearance feature vector from each detection response in the set $D_t$ by using a deep cosine metric learning model, denoted as $\{Z_1,Z_2,\ldots,Z_M\}$, where $Z_j\in\mathbb{R}^p$ is the appearance feature of detection response $D_j$;
step 4: initializing the target states, which are divided into 4 types: the initial state, the tracking state, the lost state, and the deleted state; if $t=1$, i.e. the first frame of the input video, generating the target set $T_t=\{T_1,T_2,\ldots,T_N\}$ with $N=M$, where target $T_j$ corresponds to detection response $D_j$; setting the state of each target $T_j$ to the initial state, and turning to step 1; otherwise, turning to step 5;
step 5: predicting, by applying a Kalman filtering algorithm, the position and scale of each target $T_i$ in the target set $T_{t-1}$ in the current frame, expressed as $\tilde D_i=(\tilde x_i,\tilde y_i,\tilde w_i,\tilde h_i)$, where $(\tilde x_i,\tilde y_i)$ are the predicted center-point coordinates and $(\tilde w_i,\tilde h_i)$ are the predicted width and height;
step 6: matching and associating the target with the detection response based on the two-stage data association to obtain an optimal association result;
step 7: updating the states and features of the targets according to the optimal association result of step 6;
step 8: inputting the image of the next video frame, and repeating steps 2, 3, 4, 5, 6 and 7 until the video is finished.
2. The online multi-target tracking method based on deep learning and data association as claimed in claim 1, wherein the step 6 of matching and associating the targets and the detection responses based on two-stage data association comprises:
(a) based on the states of all targets in the previous frame, dividing the target set $T_{t-1}=\{T_1,T_2,\ldots,T_N\}$ into two classes $\Omega_1$ and $\Omega_2$, with $\Omega_1\cup\Omega_2=T_{t-1}$, wherein $\Omega_1$ is composed of the targets in the initial state and the tracking state, $\Omega_2$ is composed of the targets in the lost state, and $N$ is the total number of targets;
(b) calculating the matching similarity between each target in $\Omega_1$ and each detection response in $D_t$ to obtain a similarity matrix $A_1$; taking $-A_1$ as the association cost matrix, associating the targets in $\Omega_1$ with the detection responses in $D_t$, and applying the Hungarian algorithm to solve the optimal association; according to the association result, partitioning $\Omega_1$ and $D_t$ as $\Omega_1=\Omega_1^A\cup\Omega_1^B$ and $D_t=D_A\cup D_B$, wherein the targets in $\Omega_1^A$ are successfully associated with the detection responses in $D_A$, $\Omega_1^B$ is the set of unassociated targets, and $D_B$ is the set of detection responses not associated in the first stage;
(c) calculating the matching similarity between each target in $\Omega_2$ and each detection response in $D_B$ to obtain a similarity matrix $A_2$; taking $-A_2$ as the association cost matrix, associating the targets in $\Omega_2$ with the detection responses in $D_B$, and applying the Hungarian algorithm to solve the optimal association; according to the association result, partitioning $\Omega_2$ and $D_B$ as $\Omega_2=\Omega_2^A\cup\Omega_2^B$ and $D_B=D_B^A\cup D_B^B$, wherein the targets in $\Omega_2^A$ are successfully associated with the detection responses in $D_B^A$, $\Omega_2^B$ is the set of unassociated targets, and $D_B^B$ is the set of detection responses not associated in the second stage.
3. The online multi-target tracking method based on deep learning and data association as claimed in claim 2, wherein the calculating of the matching similarity between each target in $\Omega_1$ and each detection response in $D_t$ comprises:
(a) calculating the appearance similarity $\Lambda^{app}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ by fusing the target's historical appearance features over the last $K$ frames:
$$\Lambda^{app}(i,j)=\sum_{k=1}^{K}\omega_k\,\langle X_i(t-k),\,Z_j\rangle,$$
wherein $\langle\cdot,\cdot\rangle$ is the inner product of vectors, $X_i(t-k)$ denotes the appearance feature vector of target $T_i$ at frame $t-k$, $Z_j$ denotes the appearance feature vector of detection response $D_j$, and $\omega_k$ is the weight of $X_i(t-k)$, computed from $C_i(t-k)$, the matching cost of target $T_i$ with its associated detection response at frame $t-k$;
(b) calculating the shape similarity $\Lambda^{shp}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ from the agreement between the predicted width and height $(\tilde w_i,\tilde h_i)$ of $T_i$ and the width and height $(w_j,h_j)$ of $D_j$;
(c) calculating the motion similarity $\Lambda^{mot}(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ as the intersection-over-union (IOU) of the predicted region $\tilde B_i$ of $T_i$ and the region $B_j$ of detection response $D_j$:
$$\Lambda^{mot}(i,j)=\frac{\mathrm{area}(\tilde B_i\cap B_j)}{\mathrm{area}(\tilde B_i\cup B_j)},$$
wherein $\mathrm{area}(\cdot)$ denotes the area of a region;
(d) calculating the matching similarity $A_1(i,j)$ between target $T_i$ in $\Omega_1$ and detection response $D_j$ in $D_t$ by fusing the appearance, shape and motion similarities, e.g. as their product $A_1(i,j)=\Lambda^{app}(i,j)\cdot\Lambda^{shp}(i,j)\cdot\Lambda^{mot}(i,j)$.
4. The online multi-target tracking method based on deep learning and data association as claimed in claim 2, wherein the calculating of the matching similarity between each target in $\Omega_2$ and each detection response in $D_B$ comprises:
(a) calculating, with the appearance and shape similarity formulas given above, the appearance similarity $\Lambda^{app}(i,j)$ and the shape similarity $\Lambda^{shp}(i,j)$ between target $T_i$ in $\Omega_2$ and detection response $D_j$ in $D_B$;
(b) calculating the search radius $r_i$ of target $T_i$ from $\Delta t_i$, the difference between the current frame number and the maximum frame number at which $T_i$ was in the tracking state, scaled by a constant $\alpha$; defining the search region $R_i$ of target $T_i$ as the circle centered at the predicted position $(\tilde x_i,\tilde y_i)$ of $T_i$ in the current frame with radius $r_i$;
(c) calculating the matching similarity $A_2(i,j)$ between target $T_i$ in $\Omega_2$ and detection response $D_j$ in the detection response set $D_B$ by gating the fused appearance and shape similarities with the search region, e.g. $A_2(i,j)=I(R_i\cap D_j>0)\cdot\Lambda^{app}(i,j)\cdot\Lambda^{shp}(i,j)$, wherein $I(R_i\cap D_j>0)$ is the indicator function: when the search region $R_i$ and the detection response $D_j$ overlap, $I(R_i\cap D_j>0)=1$; otherwise $I(R_i\cap D_j>0)=0$.
5. The online multi-target tracking method based on deep learning and data association as claimed in claim 1, wherein the step 7 of updating the states and features of the targets according to the optimal association result of step 6 comprises:
(a) each detection response in $D_B^B$ that remains unassociated after both stages indicates that a new target appears in the video; initializing a new target and setting its state to the initial state; when a target in the initial state appears continuously for $f_{init}$ frames, assigning it an ID, setting its state parameters, and then converting it to the tracking state;
(b) for each target in $\Omega_1^A$, since an associated detection response exists, keeping the target state unchanged, updating the state of the target by applying a Kalman filtering algorithm, and storing the appearance feature vector of the target in the current frame;
(c) for each target in $\Omega_1^B$, since there is no associated detection response, converting the target state from the tracking state to the lost state, and storing the appearance feature vector of the target in the current frame;
(d) for each target in $\Omega_2^A$, since an associated detection response exists, converting the target state from the lost state to the tracking state, updating the state of the target by applying a Kalman filtering algorithm, and storing the appearance feature vector of the target in the current frame;
(e) for each target in $\Omega_2^B$, since there is no associated detection response, keeping the target state unchanged;
(f) when a target stays in the lost state for $f_{del}$ consecutive frames, converting it to the deleted state and destroying the target.
CN201910429444.XA — Online multi-target tracking method based on deep learning and data association — filed 2019-05-22, priority date 2019-05-22, granted as CN110288627B (Active)

Priority Applications (1)

CN201910429444.XA, priority date 2019-05-22, filing date 2019-05-22: Online multi-target tracking method based on deep learning and data association

Publications (2)

CN110288627A (application publication), published 2019-09-27
CN110288627B (granted patent), published 2023-03-31

Family

Family ID: 68002271
Family application: CN201910429444.XA (granted as CN110288627B), filed 2019-05-22, priority date 2019-05-22
Country: CN

Families Citing this family (9)

* Cited by examiner, † Cited by third party

CN112581496A (priority 2019-09-29, published 2021-03-30, Sichuan University): Multi-target pedestrian trajectory tracking method based on reinforcement learning
CN110796687B (priority 2019-10-30, published 2022-04-01, University of Electronic Science and Technology of China): Sky background infrared imaging multi-target tracking method
CN113077495B (priority 2020-01-06, published 2023-01-31, Guangzhou Automobile Group Co., Ltd.): Online multi-target tracking method, system, computer equipment and readable storage medium
CN111476826A (priority 2020-04-10, published 2020-07-31, University of Electronic Science and Technology of China): Multi-target vehicle tracking method based on SSD target detection
CN111932588B (priority 2020-08-07, published 2024-01-30, Zhejiang University): Tracking method of airborne unmanned aerial vehicle multi-target tracking system based on deep learning
CN112163473A (priority 2020-09-15, published 2021-01-01, Zhengzhou Jinhui Computer System Engineering Co., Ltd.): Multi-target tracking method and device, electronic equipment and computer storage medium
CN112149762A (priority 2020-11-24, published 2020-12-29, Beijing Wodong Tianjun Information Technology Co., Ltd.): Target tracking method, target tracking apparatus, and computer-readable storage medium
CN117292327A (priority 2023-11-23, published 2023-12-26, Anhui Qixin Mingzhi Technology Co., Ltd.): Method, device, equipment and medium for associating targets
CN117495917B (priority 2024-01-03, published 2024-03-26, Shandong University of Science and Technology): Multi-target tracking method based on JDE multi-task network model

Family Cites Families (2)

* Cited by examiner, † Cited by third party

CN104835178B (priority 2015-02-02, published 2017-08-18, Zhengzhou University of Light Industry): Method for tracking and recognizing small moving targets with low signal-to-noise ratio
CN109360226B (priority 2018-10-17, published 2021-09-24, Wuhan University): Multi-target tracking method based on time series multi-feature fusion



Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant