CN112116629A - End-to-end multi-target tracking method using global response graph - Google Patents


Publication number
CN112116629A
Authority
CN
China
Prior art keywords
target
response
global
tracked
targets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010802373.6A
Other languages
Chinese (zh)
Inventor
王进军
万星宇
曹佳恺
周三平
邓烨
辛晓萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN202010802373.6A
Publication of CN112116629A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention discloses an end-to-end multi-target tracking method using a global response graph, comprising the following steps: 1) expressing the motion characteristics of the tracked targets: selecting the motion attributes of all tracked targets from the data set and normalizing them; 2) generating attribute labels for the training samples corresponding to step 1): generating attribute labels for the actual existence state of the training samples using a logical inference method based on historical state information; 3) target positioning: using the training data to train, with an improved salient target detection sub-network, the global response graph used for target localization; 4) predicting the target position change; 5) distance measurement: calculating the IOU distance between the observation space and the state space; 6) track connection: constructing a global cost matrix and performing optimal assignment with the Hungarian algorithm by minimizing the cost matrix, thereby obtaining the final target tracks. The invention can efficiently realize end-to-end multi-target tracking in the true sense.

Description

End-to-end multi-target tracking method using global response graph
Technical Field
The invention belongs to the field of target tracking in computer vision, and particularly relates to an end-to-end multi-target tracking method using a global response graph.
Background
Target tracking is an important research area in computer vision, pattern recognition and digital image processing. Target tracking means using image measurements and a predictive dynamic model to continuously estimate the state of one or more targets over a sequence of consecutive video frames. The main challenge of multi-target tracking is to continuously and effectively model multiple objects with high uncertainty in a complex scene, where the uncertainty includes occlusion between targets, occlusion of a target by the background, illumination changes, motion blur, and false alarms. A multi-target tracking framework must solve three key problems: 1) modeling the dynamic motion patterns of multiple objects; 2) handling targets that enter or leave the scene; 3) keeping the tracking result robust when a target is occluded or when its appearance or the background changes. Single-target tracking algorithms focus primarily on problems 1) and 3); because of problem 2), simply applying multiple single-target trackers to the multi-target tracking problem generally does not give satisfactory results.
The prior art mainly performs multi-target tracking with a 'detection before tracking' strategy. In this framework, target detection results are expressed as the four corner coordinates of a rectangular box, extracted from the video sequence, and used as prior information in the tracking phase. The multi-target tracking problem therefore becomes a data association problem, whose aim is to find a suitable measure for connecting the detection results into motion tracks frame by frame. The accuracy of a 'detection before tracking' algorithm mainly depends on two key factors: 1) the quality of the target detection results: once a detection is missing or wrong in some frame, or the target cannot be detected while it is occluded, the identity information of the target is lost; 2) a robust data association model that associates the correct motion track with a highly uncertain moving object frame by frame. Existing deep learning techniques cannot fully resolve these two pain points. In addition, such methods have high time complexity, are sensitive to the quality of the appearance features used by the association model, and are not suitable for scenes that require real-time processing. Although many methods attempt to use deep neural networks for both feature expression and data association, multi-target tracking methods based on the 'detection before tracking' framework have so far been unable to achieve a truly end-to-end pipeline.
Disclosure of Invention
To address the defects of the prior art and the need for improvement, the invention provides an end-to-end multi-target tracking method using a global response graph. It aims to improve the 'detection before tracking' algorithm framework and to better integrate a deep-learning-based target detector into the visual tracking task, so as to efficiently realize end-to-end multi-target tracking in the true sense.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
an end-to-end multi-target tracking method using a global response graph comprises the following steps:
1) expressing the motion characteristics of the tracked target: selecting the motion attributes of all tracked targets from the data set, carrying out normalization processing on the motion attributes, and expressing the attribute characteristics of all targets to a global response graph in the form of different channels;
2) generating an attribute label of the training sample corresponding to the step 1): generating attribute labels of actual existing states of the training samples by using a logic inference method based on historical state information;
3) target positioning: training the global response map defined in step 1) for target localization by means of an improved salient target detection subnetwork using the training data generated in step 2);
4) predicting a target position change: predicting the target position change by using a motion offset regression sub-network based on the interframe optical flow field according to the global response map obtained in the step 3);
5) distance measurement: calculating the IOU distance between the observation space and the state space according to the global response graph obtained in the step 3) and the position change graph obtained in the step 4) aiming at each corresponding point:
$$\mathrm{IOU}(a, b) = \frac{\mathrm{Area}(a) \cap \mathrm{Area}(b)}{\mathrm{Area}(a) \cup \mathrm{Area}(b)}$$
where Area(a) and Area(b) refer to the rectangular box areas of the target at the observation position and the prediction position, respectively;
6) track connection: constructing a global cost matrix from the distance measurement obtained in step 5), and then performing optimal assignment with the Hungarian algorithm by minimizing the cost matrix, thereby obtaining the final target tracks.
The further improvement of the invention is that the specific implementation method of the step 1) is as follows:
101) expressing the existence information of the target as a response value with Gaussian distribution, wherein the value range of each response point is [0, 1];
102) expressing the position information of the targets into a form with Gaussian distribution instead of a form of a detection frame, wherein the central point of the Gaussian distribution of each target is the central point position of the target rectangular frame;
103) the presence or absence of all targets and the location information attributes are modeled simultaneously using a global response graph, each channel of the response graph representing an attribute of a target.
The further improvement of the present invention is that step 2) is to generate a corresponding training sample label for the response graph constructed in step 1), and the specific implementation method is as follows:
201) expressing the actual existence state of each tracked target in the training sample at each moment as a 0/1 response value;
202) inferring the response value at the current moment from the target state values observed at historical moments within a time window of length l = 10.
A further improvement of the invention is that the target positioning sub-network used in step 3) is an auto-encoder whose input is the sequence of consecutive image frames within a time window and whose output is the global response map defined in step 1).
A further improvement of the invention is that, between step 3) and step 4), non-maximum suppression is applied to the global response graph output by step 3), so as to filter out outliers whose response values are too low or that overlap.
A further improvement of the present invention is that step 4) uses region-of-interest pooling and multiple fully connected operations to regress, simultaneously and from the pixel-level global motion offset field, the position change information of all tracked targets at the next moment, including the offsets Δcx, Δcy of the target center point and the changes Δw, Δh of the target size.
The further improvement of the present invention is that step 5) is a post-processing step according to the global response map obtained in step 3) and the location prediction map obtained in step 4), and the specific implementation method is as follows:
501) taking the target position positioned in the target response image as a value of an observation space, counting all response points with response values exceeding a lowest threshold value, and setting the total number as M;
502) the target position prediction obtained by the regression network is regarded as a value of a state space, for each tracked target, a predicted value of a corresponding position of the tracked target is obtained from a position prediction image output by the regression network, and the total number of the tracked targets is counted to be N;
503) for each tracked target, calculating IOU distance between the rectangular box of the predicted position and the rectangular boxes of all positive response positions in the next frame of global response map.
The further improvement of the invention is that step 6) is a process of obtaining the IOU distance between the prediction space and the observation space according to step 5) to perform optimal allocation, and the specific implementation method is as follows:
601) constructing an N x M global cost matrix according to the total number N of the tracked targets and the total number M of the positive response points, wherein the corresponding position of the matrix is the IOU distance calculated by each target according to the step 5);
602) performing optimal allocation by minimizing a global cost matrix by using a Hungarian algorithm, wherein the allocation result is the corresponding relation between a tracked target and the position of a response point observed by the current frame;
603) connecting the target tracks according to each assignment result, initializing new tracks from the unassigned response points, and placing unassigned targets under observation; if a target under observation is still not matched to any response point after 10 frames, its track is terminated.
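Steps 601) and 602) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the patented implementation: the function names `hungarian` and `associate` and the gating parameter `max_cost` are hypothetical, and the cost entries are assumed to be 1 - IoU so that minimizing the cost favors overlapping boxes.

```python
def hungarian(cost):
    """Minimum-cost assignment for a non-empty rectangular cost matrix.

    cost is a list of N rows with M entries each. Returns sorted
    (row, col) pairs; min(N, M) pairs are produced.
    """
    n, m = len(cost), len(cost[0])
    if n > m:  # the routine below needs rows <= cols, so transpose
        transposed = [[cost[i][j] for i in range(n)] for j in range(m)]
        return sorted((r, c) for c, r in hungarian(transposed))
    inf = float("inf")
    u = [0.0] * (n + 1)    # row potentials (1-based)
    v = [0.0] * (m + 1)    # column potentials (1-based)
    match = [0] * (m + 1)  # match[j] = row assigned to column j, 0 if free
    way = [0] * (m + 1)    # predecessor column on the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:  # flip the augmenting path back to the root
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    return sorted((match[j] - 1, j - 1) for j in range(1, m + 1) if match[j])


def associate(cost, max_cost=1.0):
    """Return (matches, unmatched_tracks, unmatched_responses).

    max_cost is a hypothetical gate: pairs whose cost is not below it
    (for 1 - IoU costs, pairs with zero overlap) are treated as unmatched.
    """
    matches = [(r, c) for r, c in hungarian(cost) if cost[r][c] < max_cost]
    matched_rows = {r for r, _ in matches}
    matched_cols = {c for _, c in matches}
    lost = [r for r in range(len(cost)) if r not in matched_rows]
    new = [c for c in range(len(cost[0])) if c not in matched_cols]
    return matches, lost, new
```

With N = 2 tracked targets and M = 3 response points, `associate([[0.1, 0.9, 0.8], [0.9, 0.2, 0.7]])` pairs each target with its best-overlapping response and reports the third response point as unassigned, which is the case step 603) uses to initialize a new track.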
The invention has at least the following beneficial technical effects:
(1) the multi-target tracking technology provided by the invention can simultaneously represent the dynamic characteristics of a plurality of targets, the optimal scheme uses a more efficient target positioning network to implicitly perform a target positioning process, and global normalization is performed on the motion characteristics of the tracked foreground target in a mode of simultaneously representing all interested target areas in a Gaussian distribution mode to achieve a response graph. On one hand, the characteristics of the foreground target and the background area can be effectively distinguished, and the neural network can learn the consistency of the tracked foreground target to the maximum extent; on the other hand, the characteristic expression strategy of the global response diagram can greatly improve the running speed of the tracking process.
(2) According to the multi-target tracking technology provided by the invention, the optimal scheme judges the state of the target at the current moment based on the historical prior information of the sliding window, reasonably deduces whether the target exists at the current moment according to the existing state of the target at the past moment, can effectively solve the problem of shielding of the target, and the global response graph output by the network can still maintain a correct positive response value during the period that the target is shielded.
(3) According to the multi-target tracking technology provided by the invention, the optimal scheme realizes that the position change information of all tracked targets at the next moment is simultaneously regressed from a pixel-level global motion offset field by using region-of-interest pooling and a plurality of full-connection operations, and a network simultaneously outputs the regressed positions of all targets in a one-time forward propagation process, so that the problem of uncertain number during data association among multiple targets can be solved.
(4) According to the multi-target tracking technology provided by the invention, the optimal scheme uses a global matching strategy that directly matches the response points within each prediction region, so that the association of all targets is completed simultaneously in one forward propagation, which effectively reduces the computational complexity of the algorithm.
Compared with the traditional multi-target tracking method, the multi-target tracking technology provided by the invention is not limited by a specific target detection technology and a complex data association strategy any more, and an additional deep neural network model is not needed to explicitly learn the appearance characteristics of the target, so that the end-to-end online tracking process in a real sense is realized, and the tracking speed and the tracking precision in a complex video monitoring scene can be improved at the same time.
Drawings
Fig. 1 is a system network structure diagram of the end-to-end multi-target tracking method according to an embodiment of the present invention;
Fig. 2 is a three-dimensional schematic diagram of a local response map characterizing a target state in an embodiment of the present invention;
Fig. 3 is a network structure diagram for position prediction using an inter-frame optical flow field in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described with reference to the accompanying drawings and examples.
The invention provides an end-to-end multi-target tracking method using a global response graph, which comprises the following steps:
1) Expressing the motion characteristics of the tracked targets: the motion attributes of all tracked targets are selected from the data set and normalized, and the attribute characteristics of all targets are expressed on a global response graph in the form of different channels. Step 1) is implemented as follows:
101) expressing the existence information of the target as a response value with Gaussian distribution, wherein the value range of each response point is [0, 1];
102) expressing the position information of the targets into a form with Gaussian distribution instead of a form of a detection frame, wherein the central point of the Gaussian distribution of each target is the central point position of the target rectangular frame;
103) the presence or absence of all targets and the location information attributes are modeled simultaneously using a global response graph, each channel of the response graph representing an attribute of a target.
In conventional methods, motion characteristics are expressed as feature vectors whose number depends on the number of targets. In this step, by contrast, the motion characteristics of all targets are expressed simultaneously in a global response map that is independent of the number of targets. The global response map contains multiple channels, and the response values of each channel represent a different attribute of the target motion trajectory in the state space, such as 'presence or absence of the target', 'x/y/w/h', and 'Δx/Δy/Δw/Δh'.
2) Generating an attribute label of the training sample corresponding to the step 1): an attribute label of the actual presence state of the training sample is generated using a logical inference method based on historical state information. In the end-to-end multi-target tracking method using the global response graph, step 2) is to generate a corresponding training sample label for the response graph constructed in step 1), and the specific implementation method is as follows:
201) expressing the actual existence state of each tracked target in the training sample at each moment as a 0/1 response value;
202) inferring the response value at the current moment from the target state values observed at historical moments within a time window of length l = 10.
The traditional method for judging the target state is only based on the current frame input image, but in the step, the target existing state is extracted from the continuous frame input images in a sliding window mode and logically deduced, and the existing state of the target at the current moment is determined by the aid of historical prior information of the target.
3) Target positioning: training the global response map defined in step 1) for target localization with an improved salient target detection subnetwork using the training data generated in step 2). In the end-to-end multi-target tracking method using the global response graph, the target positioning sub-network used in the step 3) belongs to an Auto-Encoder (Auto-Encoder), the input of the Auto-Encoder is a continuous image frame sequence of a time window, and the output is the global response graph defined in the step 1).
Conventional multi-target tracking methods locate targets with a detector based on a deep convolutional neural network, whereas this step uses an improved salient target detection network. This network has fewer model parameters than a general target detection network, so inference runs faster and the targets of interest are located more accurately.
4) Predicting a target position change: the target position change is predicted by a motion offset regression sub-network based on the inter-frame optical flow field, according to the global response map obtained in step 3). Between step 3) and step 4), the method further applies non-maximum suppression to the global response map output by step 3), so as to filter out outliers whose response values are too low or that overlap.
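The non-maximum suppression described here can be illustrated on a small response map. The sketch below is an assumption-laden example rather than the patented procedure: the function name `extract_peaks`, the square neighbourhood `radius` and the strict-maximum rule are hypothetical choices, while the threshold 0.05 and the top-k selection follow the values given later in the embodiment.

```python
def extract_peaks(response, score_threshold=0.05, top_k=None, radius=1):
    """Return (score, row, col) for local maxima of a 2D response map.

    A cell is kept when its value reaches score_threshold and is strictly
    greater than every neighbour within `radius` cells, so plateaus of
    equal values are discarded (an assumption of this sketch).
    """
    rows, cols = len(response), len(response[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = response[r][c]
            if value < score_threshold:
                continue  # response too low to be a target
            neighbours = (
                response[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) != (r, c)
            )
            if all(value > n for n in neighbours):
                peaks.append((value, r, c))
    peaks.sort(reverse=True)  # strongest responses first
    return peaks if top_k is None else peaks[:top_k]


response_map = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.0],
]
peaks = extract_peaks(response_map)
```

Here the weaker responses surrounding the 0.9 peak are suppressed because a stronger neighbour exists, which is the overlap filtering the step describes.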
Preferably, in the end-to-end multi-target tracking method using the global response map, step 4) of the method uses region-of-interest pooling and multiple full-connection operations to realize simultaneous regression of position change information of all tracked targets at the next moment from a global motion offset field at a pixel level. These position change information include the amounts of deviation Δ cx, Δ cy of the target center point and the amounts of change Δ w, Δ h of the target size.
The step improves the technical strategy of establishing a motion model by the traditional method, and the target position prediction task is embedded into a tracking network as an interframe optical flow field regressor in an end-to-end mode for training and testing. By introducing the region-of-interest pooling operation, the problem that the number of targets is uncertain in the traditional method can be effectively solved.
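To make the regressed quantities concrete, the following sketch applies one offset (Δcx, Δcy, Δw, Δh) per target to obtain the predicted boxes. It assumes the offsets are additive and in pixels, which the source does not state; the function name `apply_offsets` is hypothetical.

```python
def apply_offsets(boxes, offsets):
    """Shift each (cx, cy, w, h) box by its (dcx, dcy, dw, dh) offset.

    Works for any number of targets, since boxes and offsets are
    processed pairwise. Sizes are clamped at zero.
    """
    predicted = []
    for (cx, cy, w, h), (dcx, dcy, dw, dh) in zip(boxes, offsets):
        predicted.append(
            (cx + dcx, cy + dcy, max(w + dw, 0.0), max(h + dh, 0.0))
        )
    return predicted


predicted_boxes = apply_offsets(
    [(10.0, 20.0, 4.0, 8.0)],
    [(1.5, -2.0, 0.5, -1.0)],
)
```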
5) Distance measurement: for each corresponding point, the IOU distance between the observation space and the state space is calculated according to the global response map obtained in step 3) and the position change map obtained in step 4):
$$\mathrm{IOU}(a, b) = \frac{\mathrm{Area}(a) \cap \mathrm{Area}(b)}{\mathrm{Area}(a) \cup \mathrm{Area}(b)}$$
where Area(a) and Area(b) refer to the rectangular box areas of the target at the observation position and the prediction position, respectively. Step 5) is a post-processing step applied to the global response map obtained in step 3) and the position prediction map obtained in step 4), and is implemented as follows:
501) taking the target position positioned in the target response image as a value of an observation space, counting all response points with response values exceeding a lowest threshold value, and setting the total number as M;
502) the target position prediction obtained by the regression network is regarded as a value of a state space, for each tracked target, a predicted value of a corresponding position of the tracked target is obtained from a position prediction image output by the regression network, and the total number of the tracked targets is counted to be N;
503) for each tracked target, calculating IOU distance between the rectangular box of the predicted position and the rectangular boxes of all positive response positions in the next frame of global response map.
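The box overlap used in step 503) can be computed as in the sketch below. Two points are assumptions rather than statements from the source: boxes are taken in (cx, cy, w, h) form, and the "IOU distance" used as a cost is taken to be 1 - IoU so that a smaller value means a better match. The function names are hypothetical.

```python
def to_corners(box):
    """Convert a (cx, cy, w, h) box to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def iou(box_a, box_b):
    """Intersection over union of two (cx, cy, w, h) boxes, in [0, 1]."""
    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def iou_distance(box_a, box_b):
    """Assumed cost convention: 1 - IoU, so identical boxes cost 0."""
    return 1.0 - iou(box_a, box_b)


def pairwise_distances(predicted, observed):
    """N x M matrix of IoU distances between N predictions and M responses."""
    return [[iou_distance(p, o) for o in observed] for p in predicted]
```

For example, two 2 × 2 boxes whose centers are one unit apart horizontally share an area of 2 out of a union of 6, so their IoU is 1/3.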
6) Track connection: a global cost matrix is constructed from the distance measurement obtained in step 5), and optimal assignment is then performed with the Hungarian algorithm by minimizing the cost matrix, thereby obtaining the final target tracks. Step 6) performs optimal assignment based on the IOU distance between the prediction space and the observation space obtained in step 5), and is implemented as follows:
601) constructing an N x M global cost matrix according to the total number N of the tracked targets and the total number M of the positive response points, wherein the corresponding position of the matrix is the IOU distance calculated by each target according to the step 5);
602) performing optimal allocation by minimizing a global cost matrix by using a Hungarian algorithm, wherein the allocation result is the corresponding relation between a tracked target and the position of a response point observed by the current frame;
603) connecting the target tracks according to each assignment result, initializing new tracks from the unassigned response points, and placing unassigned targets under observation; if a target under observation is still not matched to any response point after 10 frames, its track is terminated.
The multi-target tracking technology provided by the invention realizes end-to-end multi-target tracking by using a global response graph inferred from historical logic and a position prediction map combined with the inter-frame optical flow field, and improves the accuracy with which all targets of interest are quickly located and tracked in complex surveillance video scenes.
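The track bookkeeping in step 603) can be sketched as a small per-frame update. This is an illustrative sketch only: the `Track` class, the `update_tracks` function and the exact boundary (a track ends once it has been unmatched for 10 consecutive frames) are assumptions, and `matches` is taken to be the list of (track index, response index) pairs produced by the assignment in step 602).

```python
from dataclasses import dataclass, field

MAX_MISSES = 10  # observation period in frames, from step 603)


@dataclass
class Track:
    track_id: int
    boxes: list = field(default_factory=list)  # one box per matched frame
    misses: int = 0                            # consecutive unmatched frames


def update_tracks(tracks, observations, matches, next_id):
    """Apply one frame of assignment results.

    Returns (active_tracks, finished_tracks, next_id), where
    finished_tracks holds the tracks terminated in this frame.
    """
    matched_tracks = {t for t, _ in matches}
    matched_obs = {o for _, o in matches}
    for t, o in matches:  # extend matched tracks and reset their counters
        tracks[t].boxes.append(observations[o])
        tracks[t].misses = 0
    active, finished = [], []
    for idx, track in enumerate(tracks):
        if idx not in matched_tracks:
            track.misses += 1  # target stays under observation
        (finished if track.misses >= MAX_MISSES else active).append(track)
    for o, box in enumerate(observations):
        if o not in matched_obs:  # unassigned response starts a new track
            active.append(Track(next_id, [box]))
            next_id += 1
    return active, finished, next_id
```

Keeping unmatched tracks alive for the observation period is what lets a temporarily occluded target keep its identity when it reappears.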
When performing data association, conventional methods generally extract the features of each target region iteratively and construct distance measurements in pairs. This step improves on that iterative approach: by directly matching the response points within each prediction region, the association of all targets is completed simultaneously in one forward propagation, which effectively reduces the computational complexity of the algorithm.
Examples
The network structure of the multi-target tracking method provided by the embodiment of the invention is illustrated in Fig. 1. The method mainly comprises three modules: a target positioning module, a position prediction module, and a data association module. It specifically comprises the following steps:
(1) First, in the target positioning module, a simpler and more efficient method is used to express the motion characteristics and position information of the targets. All tracked objects at each moment are uniformly regarded as foreground targets and are expressed on a global response map as two-dimensional Gaussian distributions centered on the target positions, whose response values range from 0 to 1.
A three-dimensional schematic diagram of a local response map characterizing the target state is shown in Fig. 2. Each Gaussian distribution represents a tracked foreground target; the x and y axes correspond to the spatial position of the target, and the z axis corresponds to the actual state of the target at the current moment (0 means the target is absent, 1 means it is present). The radius r and σ of the Gaussian kernel are defined as:
Figure BDA0002627871460000091
Figure BDA0002627871460000092
Figure BDA0002627871460000093
where h and w are the height and width of the target detection box, α is a constant set to 0.7, and a_i and b_i are the parameters calculated in the three different cases.
The global response map can be used to describe the spatial position and the actual existence state of the target at the same time through the step (1). To learn such a characterization, the next step requires preparation of sufficient training data and corresponding sample labels for model training.
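A response map of this kind can be rendered as in the following sketch. Because the radius and σ formulas are not recoverable from the text, the sketch substitutes a hypothetical rule (σ proportional to the box size via `sigma_scale`) and merges overlapping targets with an element-wise maximum; both are assumptions, as is the function name `render_response_map`.

```python
import math


def render_response_map(height, width, boxes, sigma_scale=0.25):
    """Render (cx, cy, w, h) boxes as 2D Gaussians with peak value 1.

    sigma_scale is a hypothetical stand-in for the radius/sigma rule of
    the source. Overlapping targets are merged with an element-wise max
    so that all values stay within [0, 1].
    """
    response = [[0.0] * width for _ in range(height)]
    for cx, cy, w, h in boxes:
        sx = max(w * sigma_scale, 1e-6)  # horizontal spread
        sy = max(h * sigma_scale, 1e-6)  # vertical spread
        for y in range(height):
            for x in range(width):
                exponent = (x - cx) ** 2 / (2 * sx ** 2) + (y - cy) ** 2 / (2 * sy ** 2)
                response[y][x] = max(response[y][x], math.exp(-exponent))
    return response


# One target centered at column 2, row 3 with a 4 x 4 box.
response_map = render_response_map(8, 8, [(2, 3, 4, 4)])
```

The map has the same shape whatever the number of targets, which is the property that makes the representation independent of the target count.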
(2) The actual existence state labels of the targets are obtained using a logical inference method based on historical priors. The method is defined as follows: for the target trajectories {T_j, j = 1, 2, …, m} from the training sample ground truth, the response value of the actual existence state of a target at frame t
Figure BDA0002627871460000094
By means of historical states within a time window of length l
Figure BDA0002627871460000095
Figure BDA0002627871460000101
The specific estimation method is as follows:
Figure BDA0002627871460000102
where β is a constant set to 0.6, a threshold on the proportion of positive response values within the time window. If the target was present for most of the past period (more than 60% of the time), it is considered present at the current moment even if there is no corresponding detection result. Conversely, if the target was absent for most of the past period (more than 60% of the time), it is considered absent at the current moment even if there is a corresponding detection result. In addition, if the target has a response value of 1 at the previous moment (frame t-1), it is considered present at the current moment.
Using the target feature representation defined in step (1) and the training samples and labels obtained in step (2), an improved salient target detection network is trained as an auto-encoder to perform target positioning.
(3) The target positioning sub-network uses an HED-based saliency detection network with VGG16 as the backbone, and adopts short connections for the side output of each feature layer. The side outputs of layers 1, 2, 3 and 6 are averaged and then passed through a sigmoid activation function to obtain the output of the target positioning network. Given a training set
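The inference rule described above can be sketched as follows. Since the exact formula is not recoverable from the text, the order in which the three rules are applied is an assumption of this sketch: a clear majority in the window decides first, then the state at frame t-1, and only then the current detection. The function name `infer_presence` and its argument names are hypothetical.

```python
def infer_presence(history, detected_now, window=10, beta=0.6):
    """Infer the 0/1 existence label of one target at the current frame.

    history: past 0/1 state values, oldest first.
    detected_now: whether a detection exists at the current frame.
    """
    recent = history[-window:]
    if not recent:
        return int(detected_now)  # no prior information yet
    positives = sum(recent)
    negatives = len(recent) - positives
    if positives / len(recent) > beta:
        return 1  # mostly present: keep the target alive through occlusion
    if negatives / len(recent) > beta:
        return 0  # mostly absent: treat a lone detection as a false alarm
    if recent[-1] == 1:
        return 1  # present at frame t-1 (assumed tie-breaker)
    return int(detected_now)
```

With the default window of 10 frames, a target seen in 8 of the last 10 frames keeps the label 1 even when `detected_now` is false, which is how the label stays positive during a short occlusion.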
Figure BDA0002627871460000103
Figure BDA0002627871460000104
And corresponding response map label
Figure BDA0002627871460000105
The loss function of the target positioning network adopts standard cross entropy loss:
Figure BDA0002627871460000106
where P(y_j = 1 | X_l) denotes the probability that position j belongs to an activated (positive) response, and the label Y is derived from the training samples by the logical inference method of step (2).
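For reference, a per-position binary cross-entropy of the kind named above can be written as in the sketch below. Averaging over positions and clamping the probabilities with `eps` are assumptions made for numerical convenience; the source only states that a standard cross-entropy loss is used. The function name is hypothetical.

```python
import math


def cross_entropy_loss(pred, label, eps=1e-7):
    """Mean binary cross-entropy between a predicted response map and labels.

    pred:  2D list of probabilities P(y_j = 1 | X) after the sigmoid.
    label: 2D list of target response values in [0, 1].
    """
    total, count = 0.0, 0
    for pred_row, label_row in zip(pred, label):
        for p, y in zip(pred_row, label_row):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count


good = cross_entropy_loss([[0.9, 0.1]], [[1, 0]])
bad = cross_entropy_loss([[0.1, 0.9]], [[1, 0]])
```

A prediction that agrees with the labels (`good`) yields a smaller loss than one that contradicts them (`bad`).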
While learning the global response maps of all tracked targets through step (3), the predicted positions of all the targets of interest also need to be learned. The invention uses a regression network fused with an interframe optical flow field to predict and learn the target position.
(4) In the position prediction module, the FlowNet2 network is first used to extract the optical flow information between frames. Given two adjacent frames I_{t-1} and I_t, the optical flow field from frame t-1 to frame t can be expressed as
Figure BDA0002627871460000107
where (u_i, v_i) denote the optical flow of pixel i in the x and y directions, respectively. After the per-pixel position offsets are obtained from the optical flow field, a regression network learns the global position offset of each Gaussian-distributed response point defined in step (1) as the predicted value of the target position.
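To make the relation between the flow field and a response point concrete, the following Python sketch gathers the per-pixel offsets (u, v) inside a square window around a center point. It is only an illustration under assumptions: the flow field is represented as a nested list indexed as `flow[y][x]`, the window is square with half-size `r`, and the naive mean in `mean_offset` is a stand-in baseline, not the learned regression network of the description.

```python
def roi_flow_offsets(flow, cx, cy, r):
    """Collect the (u, v) offsets inside a square window centered at (cx, cy).

    flow -- H x W nested list of (u, v) tuples, indexed as flow[y][x]
    r    -- half-size of the window in pixels; the window is clipped to the image
    """
    h, w = len(flow), len(flow[0])
    ys = range(max(0, cy - r), min(h, cy + r + 1))
    xs = range(max(0, cx - r), min(w, cx + r + 1))
    return [flow[y][x] for y in ys for x in xs]


def mean_offset(offsets):
    """Naive baseline: average displacement of the pixels in the window."""
    n = len(offsets)
    du = sum(u for u, _ in offsets) / n
    dv = sum(v for _, v in offsets) / n
    return du, dv
```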
Fig. 3 shows the network structure used for position prediction with the inter-frame optical flow field in an embodiment of the invention. For the t-th frame, local non-maximum suppression is first applied to the obtained response map Z_t, the top k positive response distributions are selected by setting a response threshold Score = 0.05, and the coordinates of their center points are calculated. For each center point (cx, cy), a Gaussian distribution of fixed size r_z is taken as a region of interest (ROI), and the position offsets of all ROIs are extracted from the optical flow field to form a feature vector
Figure BDA0002627871460000111
on which regression is performed. The position offset regression network consists of one ROI pooling layer and several fully connected layers. Its output is the position offset vector from frame t-1 to frame t, D_t = {d_j = (Δcx, Δcy, Δw, Δh), j = 1, 2, …, k}. Given the ground truth G_t and the network output D_t, the smooth L1 loss is used as the loss function of the regression network:
Figure BDA0002627871460000112
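The peak selection and the regression loss described in step (4) can be sketched as follows. This is a minimal sketch under assumptions: local non-maximum suppression is approximated by a strict 3x3 local-maximum test, the response map is a nested list indexed as `response[y][x]`, the function names are hypothetical, and the smooth L1 loss uses the common form with a transition at 1 and is summed over components.

```python
def select_peaks(response, score=0.05, k=None):
    """Return up to k local maxima above the threshold as (cx, cy, value), strongest first."""
    h, w = len(response), len(response[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = response[y][x]
            if v <= score:
                continue  # below the response threshold (Score = 0.05)
            neighbours = [
                response[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
                if (i, j) != (x, y)
            ]
            # Strict comparison: flat plateaus produce no peak in this sketch.
            if all(v > n for n in neighbours):
                peaks.append((x, y, v))
    peaks.sort(key=lambda p: -p[2])
    return peaks if k is None else peaks[:k]


def smooth_l1(pred, target):
    """Smooth L1 loss summed over the components of an offset vector."""
    total = 0.0
    for p, g in zip(pred, target):
        d = abs(p - g)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total
```

In use, each selected peak would define one ROI, and `smooth_l1` would be evaluated between a predicted offset (Δcx, Δcy, Δw, Δh) and its ground-truth counterpart.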
(5) After the global response map Z_t and the position prediction D_t have been obtained through step (3) and step (4), the data association module performs global matching between the predicted values (D_t) and the observed values (Z_t). First, the IOU distance is calculated between each predicted position of frame t
Figure BDA0002627871460000113
and its nearest-neighbor response value
Figure BDA0002627871460000114
where the nearest-neighbor response value is the one with the shortest distance between the center points of the two distributions. The response points in the observation space whose IOU distance is greater than the threshold IOU_min = 0.7 are then selected as candidates, and the IOU distance between the response values of frame t-1 and the candidates of frame t is calculated to obtain a cost matrix. Finally, the Hungarian algorithm is used to solve the minimum-cost assignment on this matrix.
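The association step can be illustrated with the following self-contained Python sketch. Several points are assumptions made for illustration: boxes are given as (x1, y1, x2, y2), the cost of a pair is taken as 1 minus its IOU so that minimizing cost favors high overlap, the IOU_min gate is applied after the assignment rather than before it, and an exhaustive search over permutations stands in for the Hungarian algorithm so that the example needs no external library. The search is only practical for a handful of targets.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign(predicted, observed, iou_min=0.7):
    """Minimum-cost one-to-one matching of predicted boxes to observed boxes.

    Returns sorted (prediction_index, observation_index) pairs whose IOU
    exceeds iou_min. Brute force stands in for the Hungarian algorithm.
    """
    n, m = len(predicted), len(observed)
    if n == 0 or m == 0:
        return []
    overlap = [[iou(p, o) for o in observed] for p in predicted]
    if n <= m:
        candidates = (list(enumerate(cols)) for cols in permutations(range(m), n))
    else:
        candidates = ([(i, j) for j, i in enumerate(rows)]
                      for rows in permutations(range(n), m))
    best_pairs, best_cost = [], float("inf")
    for pairs in candidates:
        cost = sum(1.0 - overlap[i][j] for i, j in pairs)  # assumed cost: 1 - IOU
        if cost < best_cost:
            best_pairs, best_cost = pairs, cost
    return sorted((i, j) for i, j in best_pairs if overlap[i][j] > iou_min)
```

For realistic numbers of targets, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` applied to the same cost matrix would be the usual choice.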
(6) For the t-th frame, after the global optimal assignment has been performed, a cascade matching strategy is used to re-match all response points and target tracks that remain unassigned. First, a constant A_max is set, representing the maximum number of frames for backtracking. For each response point in the current frame that has not been matched to a track
Figure BDA0002627871460000115
the IOU distance between each terminated target track and the response point
Figure BDA0002627871460000116
is calculated frame by frame within the time window t-1, t-2, …, t-A_max. When the IOU distance at some frame exceeds the set threshold IOU_min = 0.7, the response point is considered to match that past track, and the position of the corresponding target track is updated with the response value. After the global cascade matching strategy has been applied, all unmatched response points are re-initialized as new targets, and all unmatched tracks are terminated.
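The cascade re-matching can be sketched in Python as follows. This is a sketch under stated assumptions: tracks are stored as a dictionary mapping a hypothetical track id to a per-frame dictionary of boxes, boxes are (x1, y1, x2, y2), each track is re-matched to at most one response point, and the default `a_max=10` is an illustrative value because the description does not state A_max.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cascade_rematch(unmatched_points, terminated_tracks, t, a_max=10, iou_min=0.7):
    """Re-match unassigned response points of frame t to terminated tracks.

    unmatched_points  -- list of boxes for response points without a track
    terminated_tracks -- dict: track id -> dict: frame index -> box
    Returns (matches, new_targets): matches maps point index -> track id,
    new_targets lists the point indices to initialize as new targets.
    """
    matches, used = {}, set()
    for p_idx, box in enumerate(unmatched_points):
        # Walk back through frames t-1, t-2, ..., t-a_max.
        for back in range(1, a_max + 1):
            frame = t - back
            hit = None
            for track_id, history in terminated_tracks.items():
                if track_id in used or frame not in history:
                    continue
                if iou(history[frame], box) > iou_min:
                    hit = track_id
                    break
            if hit is not None:
                matches[p_idx] = hit
                used.add(hit)
                break  # stop backtracking once a past track matches
    new_targets = [i for i in range(len(unmatched_points)) if i not in matches]
    return matches, new_targets
```

Tracks that are still absent from `matches` after this pass would be the ones to terminate.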
The end-to-end multi-target tracking method using the global response graph improves on the traditional "detect first, then track" framework and on data-association-based methods, and proposes a distinctive feature representation and network structure for end-to-end multi-target tracking. The target locator of the invention, which operates on image sequences/video frames, can effectively handle short-term occlusion of targets. The target position-offset regressor, which incorporates the inter-frame optical flow field to estimate target motion, solves the matching problem for an uncertain number of targets in a single forward pass. The method departs from the traditional "detect first, then track" framework and is fully end-to-end, requiring no prior detection results or appearance features. The proposed multi-target tracking method reaches the state-of-the-art level in the industry in both speed and accuracy.

Claims (8)

1. An end-to-end multi-target tracking method using a global response graph, characterized by comprising the following steps:
1) expressing the motion characteristics of the tracked targets: selecting the motion attributes of all tracked targets from the data set, normalizing them, and expressing the attribute features of all targets on a global response graph in the form of different channels;
2) generating attribute labels for the training samples corresponding to step 1): generating attribute labels of the actual existence states of the training samples by a logical inference method based on historical state information;
3) target positioning: training an improved salient-object detection sub-network, using the training data generated in step 2), to output the global response map defined in step 1) for target localization;
4) predicting the target position change: predicting the target position change by using a motion offset regression sub-network based on the inter-frame optical flow field, according to the global response map obtained in step 3);
5) distance measurement: for each corresponding point, calculating the IOU distance between the observation space and the state space according to the global response graph obtained in step 3) and the position change graph obtained in step 4):
Figure FDA0002627871450000011
where area(a) and area(b) refer to the rectangular box areas of the target at the observed position and the predicted position, respectively;
6) track connection: constructing a global cost matrix from the distance measure obtained in step 5), and then performing optimal assignment with the Hungarian algorithm by minimizing the cost matrix, thereby obtaining the final target tracks.
2. The end-to-end multi-target tracking method using the global response graph as claimed in claim 1, wherein the specific implementation method of step 1) is as follows:
101) expressing the existence information of the target as a response value with Gaussian distribution, wherein the value range of each response point is [0, 1];
102) expressing the position information of the targets into a form with Gaussian distribution instead of a form of a detection frame, wherein the central point of the Gaussian distribution of each target is the central point position of the target rectangular frame;
103) the presence or absence of all targets and the location information attributes are modeled simultaneously using a global response graph, each channel of the response graph representing an attribute of a target.
3. The end-to-end multi-target tracking method using the global response graph as claimed in claim 1, wherein step 2) is to generate corresponding training sample labels for the response graph constructed in step 1), and the specific implementation method is as follows:
201) expressing the actual existence state of each tracked target in the training sample at each moment as a 0/1 response value;
202) inferring the response value at the current moment by observing the target state values at the historical moments within a time window of length l = 10.
4. The end-to-end multi-target tracking method using the global response graph according to claim 1, wherein the target localization sub-network used in step 3) is an auto-encoder whose input is a sequence of consecutive image frames within a time window and whose output is the global response map defined in step 1).
5. The end-to-end multi-target tracking method using the global response graph according to claim 1, wherein, between step 3) and step 4), non-maximum suppression is further performed on the global response graph output in step 3), so as to filter out outliers whose response values are too low or that overlap.
6. The end-to-end multi-target tracking method using the global response graph according to claim 1, wherein step 4) uses region-of-interest pooling and several fully connected layers to regress, simultaneously and from the pixel-level global motion offset field, the position change information of all tracked targets at the next moment, the position change information comprising the offsets Δcx, Δcy of the target center point and the changes Δw, Δh of the target size.
7. The end-to-end multi-target tracking method using the global response graph according to claim 1, wherein the step 5) is a post-processing step of the global response graph obtained in the step 3) and the position prediction graph obtained in the step 4), and the implementation method is as follows:
501) taking the target position positioned in the target response image as a value of an observation space, counting all response points with response values exceeding a lowest threshold value, and setting the total number as M;
502) the target position prediction obtained by the regression network is regarded as a value of a state space, for each tracked target, a predicted value of a corresponding position of the tracked target is obtained from a position prediction image output by the regression network, and the total number of the tracked targets is counted to be N;
503) for each tracked target, calculating IOU distance between the rectangular box of the predicted position and the rectangular boxes of all positive response positions in the next frame of global response map.
8. The end-to-end multi-target tracking method using the global response graph according to claim 7, wherein step 6) performs optimal assignment using the IOU distance between the prediction space and the observation space obtained in step 5), and is implemented as follows:
601) constructing an N x M global cost matrix according to the total number N of tracked targets and the total number M of positive response points, wherein each entry of the matrix is the IOU distance calculated for the corresponding target in step 5);
602) performing optimal assignment with the Hungarian algorithm by minimizing the global cost matrix, wherein the assignment result is the correspondence between each tracked target and the position of a response point observed in the current frame;
603) linking target tracks according to each assignment result, initializing new tracks from the unassigned response points, placing the unassigned targets under observation, and terminating a target track if the target under observation is still not matched to a response point after 10 frames.
CN202010802373.6A 2020-08-11 2020-08-11 End-to-end multi-target tracking method using global response graph Pending CN112116629A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010802373.6A CN112116629A (en) 2020-08-11 2020-08-11 End-to-end multi-target tracking method using global response graph

Publications (1)

Publication Number Publication Date
CN112116629A true CN112116629A (en) 2020-12-22

Family

ID=73804026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010802373.6A Pending CN112116629A (en) 2020-08-11 2020-08-11 End-to-end multi-target tracking method using global response graph

Country Status (1)

Country Link
CN (1) CN112116629A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201222