CN111639551B - Online multi-target tracking method and system based on twin network and long-short term clues - Google Patents


Info

Publication number
CN111639551B
Authority
CN
China
Prior art keywords
frame
target
tracking
pedestrian
track
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN202010404941.7A
Other languages
Chinese (zh)
Other versions
CN111639551A (en)
Inventor
韩守东
于恩
刘东海生
黄飘
王宏伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202010404941.7A priority Critical patent/CN111639551B/en
Publication of CN111639551A publication Critical patent/CN111639551A/en
Application granted granted Critical
Publication of CN111639551B publication Critical patent/CN111639551B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06V 40/20 — Recognition of biometric, human-related or animal-related patterns in image or video data; movements or behaviour, e.g. gesture recognition
    • G06F 18/24 — Pattern recognition; classification techniques
    • G06F 18/253 — Pattern recognition; fusion techniques of extracted features
    • G06N 3/045 — Neural networks; combinations of networks
    • G06V 20/40 — Scenes; scene-specific elements in video content


Abstract

The invention discloses an online multi-target tracking method and system based on a twin (Siamese) network and long-short term clues, belonging to the field of multi-target tracking. The system comprises: a twin network module, which cross-correlates each tracking-target template with its search area to obtain a response map and a preliminarily predicted tracking track for each target; a correction module, which combines the preliminary tracks with the observation frames and corrects the pedestrian frames through a pedestrian regression network; a data association module, which calculates the similarity between tracking tracks and observed pedestrians by extracting long-term and short-term clues from each and fusing them, and assigns a corresponding observed pedestrian frame to each tracking track; and a track post-processing module, which updates, supplements and deletes tracking tracks to complete tracking of the current frame. The invention improves apparent-feature fusion, handles pedestrian interaction occlusion and large scale changes in the multi-target tracking task, improves accuracy, and alleviates the feature-misalignment problem.

Description

Online multi-target tracking method and system based on twin network and long-short term clues
Technical Field
The invention belongs to the technical field of multi-target tracking, and particularly relates to an online multi-target tracking method and system based on a twin network and long-short term clues.
Background
In the face of increasingly complex video scenes, massive video data must be processed effectively: every meaningful target in a video needs to be detected, located, tracked and analyzed. Multi-Object Tracking (also called Multi-Target Tracking), as a middle-layer visual task, plays a very critical role here. Real-time monitoring of urban communities, highway traffic and residential isolation through security cameras, with accurate tracking of pedestrians, strangers and vehicles entering and leaving crowds, is of great practical significance, for example for epidemic monitoring. Multi-target tracking is oriented to complex, large-area scenes with many pedestrians, where the number of pedestrians in each frame is not fixed, which makes it very suitable for video surveillance scenarios.
In recent years, with the wide application of deep learning in computer vision, the field of target tracking (especially single-target tracking) has developed rapidly, and multi-target tracking has converged on a main framework based on detection and tracking. Most current prediction models adopt a motion model based on motion information, but such models usually assume that the tracked object moves at uniform velocity; they cannot handle sudden motion states well (steering, sudden acceleration, abrupt stops, and the like), easily lose tracks under pedestrian interaction occlusion, and once a track is lost it is difficult to reconnect.
Because multi-target tracking scenes contain large numbers of high-density crowds, the number of pedestrians is not fixed, and pedestrians occlude one another through interaction, existing detection-based multi-target tracking algorithms still have significant shortcomings.
Disclosure of Invention
Aiming at the interactive occlusion between tracked targets and the large-scale morphological changes that existing multi-target tracking methods handle poorly, the invention provides an online multi-target tracking method and system based on a twin network and long-short term clues. The aim is to improve apparent-feature fusion and pedestrian interactive occlusion handling in the multi-target tracking task to the maximum extent, solve the problem of large scale changes, greatly improve the precision and accuracy of data association, and alleviate the feature-misalignment problem.
To achieve the above object, according to a first aspect of the present invention, there is provided an online multi-target tracking method based on a twin network and long-short term cues, the method comprising the steps of:
s0., cutting the target detection result of the first frame of the surveillance video as observation frames to obtain the observation frame of each target of the 1st frame, taking the observation frames as the first input of the twin network to initialize the target templates, taking the observation frame of each target of the 1st frame as the initial state of that target's tracking track, and setting T = 2;
s1, carrying out target detection on a T-th frame, and cutting a target detection result of the T-th frame as an observation frame to obtain the observation frame of each target of the T-th frame; cutting the T-th frame by taking N times of areas of the positions of the target templates of the T-1 th frame as search areas to obtain a search area picture of the T-th frame, wherein N is more than or equal to 1;
s2, taking the search area picture of the T-th frame as the second input of the twin network to obtain a most possible tracking frame as the tracking frame of each target T-th frame;
s3, respectively extracting features of the observation frame and the tracking frame of each target T frame by using the trained Re-ID model, and calculating the similarity of the extracted features to obtain a long-term feature clue of each target T frame; calculating IOU between the tracking frame and the observation frame of each target Tth frame as a short-term characteristic clue of each target Tth frame;
s4, fusing the extracted long-term clues and short-term clues to obtain fused characteristic clues of the T-th frame of each target;
s5, taking the fusion characteristic clue of each T-th frame as a cost matrix of data association, and matching the tracking track with the observation frame;
s6, updating, supplementing and deleting the tracking track according to the data association result to complete the tracking of the T-th frame;
and S7, judging whether the video is finished; if so, ending; otherwise, inputting the current pedestrian frame of each current tracking track into the twin network as an updated target template, setting T = T + 1, and returning to step S1.
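Taken together, steps S0 to S7 form a per-frame loop. A minimal control-flow sketch of that loop follows; all four callables (detect, siamese_predict, associate, postprocess) are hypothetical stand-ins, not components defined by the patent:

```python
def track_video(frames, detect, siamese_predict, associate, postprocess):
    """Per-frame tracking loop corresponding to steps S0-S7.
    Every callable here is a hypothetical stand-in."""
    # S0: initialize one track per detection in the first frame
    tracks = [{"id": i, "box": box, "template": box}
              for i, box in enumerate(detect(frames[0]))]
    T = 2
    while T <= len(frames):
        observations = detect(frames[T - 1])                 # S1: observe frame T
        for trk in tracks:                                   # S2: per-target SOT
            trk["box"] = siamese_predict(trk["template"], frames[T - 1])
        matches = associate(tracks, observations)            # S3-S5: data association
        tracks = postprocess(tracks, observations, matches)  # S6: track bookkeeping
        for trk in tracks:                                   # S7: template update
            trk["template"] = trk["box"]
        T += 1
    return tracks
```

The loop only fixes the order of the stages; each stage is detailed in the steps above.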
Preferably, the twin network comprises the following processes:
(1) extracting a template feature map of each target, and extracting a feature map of a picture of a search area corresponding to the target;
(2) performing cross correlation on the template characteristic diagram and the search area characteristic diagram to obtain a multi-channel response diagram;
(3) classifying the tracking target in the multi-channel response image, and predicting a pedestrian regression frame according to response information in the multi-channel response image;
(4) scoring the pedestrian regression frame by quality assessment;
(5) and taking the product of the quality evaluation score and the classification confidence score as a final score, and taking a regression box with the highest final score as a tracking box.
Preferably, the quality assessment score is calculated as follows:
score = sqrt( (min(l*, r*) / max(l*, r*)) × (min(t*, b*) / max(t*, b*)) )
wherein l*, r*, t*, b* respectively represent the distances from the center point of the target to the four edges of the target frame.
Preferably, the Re-ID model comprises: and the global branch and the local branch respectively extract global features and local features based on a multi-attention joint mechanism.
Preferably, IBN-Net is introduced in the underlying CNN of the Re-ID model by any of the following means:
1) dividing the output channels of the first convolution after the picture input into two halves, wherein one half is subjected to IN (instance normalization) and the other half to BN (batch normalization), and performing the same operation after the first inception unit;
2) adding an IN operation after the outputs of the spatial attention and channel attention in the soft attention of HACNN, and adding an IN operation after the first convolution layer following the picture input.
Preferably, step S4 includes the steps of:
S41, obtaining the long-term clue as reid distance through the Re-ID model, obtaining the short-term clue as sot distance through IOU calculation, and calculating a scaling factor rate from pause_i, the number of frames for which the track of target i has been lost (the formula is given only as an image in the original);
S42, judging whether pause_i exceeds 2; if so, increasing the scaling factor to a new scaling factor (formula likewise given only as an image) and updating the long-term clue to reid distance × (rate / reid thresh); otherwise, updating the long-term clue to reid distance / reid thresh; wherein TL represents the limit time of track loss and reid thresh represents the Re-ID enhancement coefficient;
S43, calculating the cost matrix of target i as cost_i = rate × sot distance + (1 − rate) × reid distance.
Preferably, before data association, the uncorrected observed pedestrian frames are fed into the twin network for prediction, and the possible pedestrian-frame positions are obtained as rough pedestrian frames before screening, so as to determine the observed pedestrian-frame sequence.
Preferably, step S6 includes:
directly updating the relevant parameters of successfully associated tracking tracks;
observation frames that are not successfully associated are taken as initial states and added to the tracking sequence as new tracks;
tracking tracks that are not successfully associated are regarded as being in a lost state;
if the lost state persists beyond the limit time, the active state of the track is cancelled.
Preferably, the limit time of track loss is calculated from the pedestrian density (the formulas for pd and TL are given only as images in the original), where pd represents the pedestrian density, TL0 represents a basic time limit, ⌊·⌋ represents the round-down operation, num_det represents the number of detected pedestrians, and num0 represents a pedestrian-number threshold.
To achieve the above object, according to a second aspect of the present invention, there is provided an online multi-target tracking system based on a twin network and long-short term cues, comprising:
the twin network module is used for performing cross-correlation on the tracking target template and the search area to obtain a response graph and obtain a preliminarily predicted tracking track of each target;
the correction module is used for combining the acquired preliminary track with the observation frame and correcting the pedestrian frame through a pedestrian regression network;
the data association module is used for calculating the similarity between the tracking track and the observed pedestrian, extracting long and short term characteristic clues of the tracking track and the observed pedestrian respectively and fusing the long and short term characteristic clues to further calculate the similarity, and distributing a corresponding observed pedestrian frame for each tracking track;
and the track post-processing module is used for updating, supplementing and deleting the tracking track to complete the tracking of the current frame.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
(1) the invention constructs a prediction model in a multi-target tracking framework based on the twin network, and because the core idea of the twin network is based on a response diagram, namely the probability of the twin network being a tracking target is judged by comparing the feature similarity of a target template and a candidate region, the influence of a sudden change motion state on tracking is reduced. And introducing a quality branch in the classification branch to score the regression frame of the regression branch, comprehensively considering space and amplitude limitations, performing more accurate scoring on the regression frame, and finally regarding the regression frame with the highest score as a pedestrian prediction frame.
(2) According to the invention, long-term characteristic clue extraction is carried out on the tracked target of each frame by a pedestrian re-identification technology, a multi-attention combination mechanism is adopted in the network, and under the multi-attention combination mechanism, the model can pay more attention to the foreground and the part of the target which is not shielded, so that accurate long-term clues can be extracted more conveniently. And a model structure combining example standardization and batch standardization for enhancing the generalization capability of the Re-ID model is constructed, so that characteristic clues are extracted better.
(3) According to the invention, the long-term characteristic information of the pedestrian is extracted through the Re-ID module, and the part of characteristic information has stronger adaptability to shielding, large-scale change and the like; the extracted long-term clues and short-term clues are subjected to weighted fusion by taking the overlapping degree of the pedestrian frames as short-term characteristic clues, so that the effective utilization of the long-term clues and the short-term clues is realized, and the problem of characteristic misalignment after some special shelters or large-scale changes occur is solved.
Drawings
FIG. 1 is a flow chart of an online multi-target tracking method based on a twin network and long and short term clues according to the present invention;
FIG. 2 is a diagram of a multi-target tracking infrastructure based on a twin network according to the present invention;
FIG. 3(a) is a diagram of a Re-ID model structure provided by the present invention;
FIG. 3(b) is a block diagram of the introduced combined standardization and batch standardization provided by the present invention;
FIG. 4 is a flow chart of a trace post-processing provided by the present invention;
fig. 5 is a diagram of a pedestrian regional regression network structure provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The on-line tracking refers to predicting the next frame by using the related information of the historical frame and the current frame.
As shown in FIG. 1, the invention provides an online multi-target tracking method based on twin network and long-short term clues, which comprises the following steps:
and step S0., cutting the target detection result of the first frame of the surveillance video as observation frames to obtain the observation frame of each target of the 1st frame, taking the observation frames as the first input of the twin network to initialize the target templates, taking the observation frame of each target of the 1st frame as the initial state of that target's tracking track, and setting T = 2.
Track initialization is carried out for each target in the initial frame, and information such as the target's pedestrian ID and pedestrian-frame coordinates is recorded.
S1, carrying out target detection on a T-th frame, and cutting a target detection result of the T-th frame as an observation frame to obtain an observation frame of each target of the T-th frame; and cutting the T-th frame by taking the N times area of the position of each target template of the T-1 th frame as a search area to obtain a search area picture of the T-th frame, wherein N is more than or equal to 1.
In this embodiment, N is 2.
And S2, taking the search area picture of the T-th frame as a second input of the twin network to obtain a most possible tracking frame as a tracking frame of each target T-th frame.
As shown in fig. 2, the multi-target tracking task is decomposed into single-object tracking (SOT) branches, one per target, with data association performed in combination with an appearance model. For each target in the pedestrian sequence, a tracker is first initialized and the target is tracked separately. The tracker takes a twin network as its basic structure, and a quality assessment branch is added to the RPN (Region Proposal Network) structure behind the response map to score the regression frames.
Preferably, the twin network comprises the following processes:
(1) and extracting the template characteristic graph of each target, and extracting the characteristic graph of the picture of the search area corresponding to the target.
(2) And performing cross correlation on the template characteristic diagram and the search area characteristic diagram to obtain a multi-channel response diagram.
f(z, x) = φ(z) ⋆ φ(x) + b·𝟙
wherein f(z, x) is the response map, φ is the shared-parameter convolution function used to extract features, ⋆ denotes the cross-correlation operation, and b·𝟙 denotes an offset value added at each point of the response map.
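As a concrete illustration of step (2), the cross-correlation that turns template and search-area features into a response map can be sketched in a few lines of NumPy. This is a single-channel, unbatched sketch; the real network correlates deep multi-channel feature maps:

```python
import numpy as np

def cross_correlate(template_feat, search_feat, bias=0.0):
    """Dense cross-correlation of a template feature map over a larger
    search-area feature map; returns the response map with a scalar
    bias added at every position."""
    th, tw = template_feat.shape
    sh, sw = search_feat.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = search_feat[i:i + th, j:j + tw]
            out[i, j] = np.sum(template_feat * window) + bias
    return out
```

The peak of the response map indicates where the search area most resembles the template, which is what the classification and regression branches then consume.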
The response map includes the location and semantic information of the target.
(3) And classifying the tracking target in the multi-channel response diagram, and predicting a pedestrian regression frame according to response information in the multi-channel response diagram.
To reduce the conflict between regression and classification, the two are usually handled separately, i.e., the same response map is duplicated into two copies used for the regression and classification operations respectively.
(4) And scoring the pedestrian regression box through quality evaluation.
(5) And taking the product of the quality evaluation score and the classification confidence score as a final score, and taking a regression box with the highest final score as a tracking box.
The twin network comprises multiple branches, wherein the classification branch is responsible for classifying the tracking targets in the response diagram, and the regression branch is responsible for predicting the pedestrian regression box according to the response information in the response diagram. And introducing a quality branch in the classification branch, and taking charge of scoring the regression frame of the regression branch, and finally taking the regression frame with the highest score as a pedestrian prediction frame.
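The selection of the final tracking box reduces to an argmax over the product of the two scores. A minimal sketch, with select_tracking_box and its arguments as illustrative names only:

```python
def select_tracking_box(boxes, cls_scores, quality_scores):
    """Final score = classification confidence x quality score;
    the regression box with the highest final score becomes the
    tracking box for this frame."""
    finals = [c * q for c, q in zip(cls_scores, quality_scores)]
    best = max(range(len(boxes)), key=finals.__getitem__)
    return boxes[best], finals[best]
```

Multiplying the two scores means a box must be both confidently classified and well centered to win, which is the point of adding the quality branch.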
The invention scores by combining position-space and amplitude limitations. Preferably, the quality assessment score is calculated as follows:
score = sqrt( (min(l*, r*) / max(l*, r*)) × (min(t*, b*) / max(t*, b*)) )
wherein l*, r*, t*, b* respectively represent the distances from the center point of the target to the four edges of the target frame.
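A sketch of such a score, assuming the standard center-ness form that the description of l*, r*, t*, b* suggests (the patent's exact formula survives only as an image):

```python
import math

def quality_score(l, r, t, b):
    """Center-ness style quality score: 1.0 when the predicted point sits
    at the box centre, decaying as it drifts toward an edge. l, r, t, b
    are the distances from the point to the four box edges. Assumed form,
    reconstructed from the textual description."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
```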
S3, respectively extracting features of the observation frame and the tracking frame of each target T frame by using the trained Re-ID model, and calculating the similarity of the extracted features to obtain a long-term feature clue of each target T frame; and calculating the IOU between the tracking frame and the observation frame of each target Tth frame as a short-term characteristic clue of each target Tth frame.
In order to ensure the diversity of pedestrian sequences with the same identity, samples are screened by comparing Intersection over Union (IoU) and visibility: after the first picture of each pedestrian sequence is initialized, the next pedestrian frame of the same identity whose IoU is smaller than 0.7 or whose visibility differs by more than 0.2 is selected as the next sample, and so on. This finally yields 295 pedestrian IDs and 33573 samples in total. During training, Adam is adopted to optimize the network weights, the initial learning rate is set to 0.003, the batch size is 64, the input resolution is 160 × 64, and 150 epochs are trained in total. The loss function of the multitask convolutional neural network is designed as a cross-entropy loss function:
L = −(1/N) Σ_{i=1..N} y_i log(ŷ_i)
wherein N represents the number of samples in the current training batch, and ŷ_i and y_i respectively represent the network's predicted probability and the ground-truth label of the pedestrian-classification joint probability distribution.
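The training loss is the standard mean cross-entropy. A minimal sketch, with pred_probs and true_classes as illustrative names:

```python
import math

def cross_entropy_loss(pred_probs, true_classes):
    """Mean cross-entropy over a batch. pred_probs[i] is the predicted
    class-probability vector for sample i; true_classes[i] is its
    ground-truth label index."""
    n = len(pred_probs)
    return -sum(math.log(p[t]) for p, t in zip(pred_probs, true_classes)) / n
```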
As shown in fig. 3(a), preferably, the Re-ID model includes: and the global branch and the local branch respectively extract global features and local features based on a multi-attention joint mechanism.
The network adopts a multi-attention joint mechanism, under which the model pays more attention to the foreground and the unoccluded parts of the target, making it easier to extract accurate long-term clues. Hard attention and soft attention are combined to realize feature extraction for targets with severe scale changes and for occluded targets.
The invention introduces IBN-Net into the bottom CNN of the Re-ID model. Two IBN-Net construction methods are shown in fig. 3(b). In the first, the output channels of the first convolution after the picture input are divided into two halves, one half undergoing IN (instance normalization) and the other BN (batch normalization), and the same operation is performed after the first inception unit; this variant is called HACNN_IBN. In the second, IN is added after the outputs of the spatial attention and channel attention in the soft attention of HACNN, and an IN operation is added after the first convolution layer following the picture input; this variant is called HACNN_IBN_B. The Re-ID model is trained with this structure combining instance normalization and batch normalization, which addresses the cross-domain generalization problem.
And extracting the apparent characteristics of each object in the tracking sequence and observation through Re-ID, and finally calculating characteristic cosine distance as a long-term clue. And combining the overlapping scale of each observed object and the tracking track, and calculating the overlapping degree of each observed object as a short-term characteristic clue.
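The two clues just described can be computed as follows; this is a plain-Python sketch with illustrative function names:

```python
import math

def cosine_distance(u, v):
    """Long-term clue: cosine distance between two Re-ID feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def iou(a, b):
    """Short-term clue: overlap (IoU) of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```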
And S4, fusing the extracted long-term clues and short-term clues to obtain fused characteristic clues of the T-th frame of each target.
Specifically, step S4 includes the steps of:
S41, obtaining the long-term clue as reid distance through the Re-ID model, obtaining the short-term clue as sot distance through IOU calculation, and calculating a scaling factor rate from pause_i, the number of frames for which the track of target i has been lost (the formula is given only as an image in the original);
S42, judging whether pause_i exceeds 2; if so, increasing the scaling factor to a new scaling factor (formula likewise given only as an image) and updating the long-term clue to reid distance × (rate / reid thresh); otherwise, updating the long-term clue to reid distance / reid thresh; wherein TL represents the limit time of track loss and reid thresh represents the Re-ID enhancement coefficient. In this embodiment, TL is 3 frames and reid thresh is 0.7.
S43, calculating the cost matrix of target i as cost_i = rate × sot distance + (1 − rate) × reid distance.
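The fusion of step S43 is directly computable. Because the scaling-factor formula itself survives only as an image in the original, rate is taken as an input here rather than derived:

```python
def fused_cost(sot_distance, reid_distance, rate, reid_thresh=0.7):
    """S43: weighted fusion of the short-term clue (sot_distance, from IOU)
    and the long-term clue (reid_distance, from the Re-ID model).
    reid_thresh is the Re-ID enhancement coefficient (0.7 in the
    embodiment); dividing by it enhances the long-term clue's weight."""
    return rate * sot_distance + (1.0 - rate) * (reid_distance / reid_thresh)
```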
And S5, taking the fusion characteristic clue of each T-th frame as a cost matrix associated with data, and matching the tracking track with the observation box.
In a multi-target tracking scene, the number of targets in each frame is dynamically changed, including disappearance of old targets and appearance of new targets, and multiple targets between frames need to be matched, namely data association.
Data association is completed with the Hungarian algorithm; the threshold of the cost matrix is preferably 0.7. This step assigns a corresponding tracking track, i.e., a target identity, to each observed pedestrian frame.
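For intuition, the thresholded assignment can be sketched with a brute-force minimum-cost matcher; the embodiment's Hungarian algorithm yields the same optimum in polynomial time:

```python
from itertools import permutations

def min_cost_match(cost, thresh=0.7):
    """One-to-one minimum-cost matching of tracks (rows) to observations
    (columns), then rejection of pairs whose cost exceeds thresh.
    Brute force over column permutations, so only suitable for small
    matrices; assumes len(rows) <= len(columns)."""
    n_rows, n_cols = len(cost), len(cost[0])
    best, best_total = None, float("inf")
    for perm in permutations(range(n_cols), n_rows):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best, best_total = perm, total
    return [(i, j) for i, j in enumerate(best) if cost[i][j] <= thresh]
```

Rejected pairs fall through to the post-processing rules below as unmatched tracks or observations.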
And S6, updating, supplementing and deleting the tracking track according to the data association result to complete the tracking of the T-th frame.
Specifically, as shown in fig. 4, step S6 includes:
directly updating the relevant parameters of successfully associated tracking tracks;
observation frames that are not successfully associated are taken as initial states and added to the tracking sequence as new tracks;
tracking tracks that are not successfully associated are regarded as being in a lost state;
if the lost state persists beyond the limit time, the active state of the track is cancelled.
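The four bookkeeping rules above can be sketched as follows; the track and observation representations are illustrative, not the patent's data structures:

```python
def postprocess_tracks(tracks, observations, matches, limit_time):
    """S6 bookkeeping. matches maps track index -> observation index
    (the data-association result); each track is a dict holding at
    least 'box' and a 'lost' frame counter."""
    out = []
    for i, trk in enumerate(tracks):
        if i in matches:                         # successfully associated
            trk["box"] = observations[matches[i]]
            trk["lost"] = 0
            out.append(trk)
        else:                                    # lost state
            trk["lost"] += 1
            if trk["lost"] <= limit_time:
                out.append(trk)                  # kept, may reconnect later
            # else: limit time exceeded, active state cancelled (dropped)
    matched_obs = set(matches.values())
    for j, obs in enumerate(observations):       # unmatched observations
        if j not in matched_obs:
            out.append({"box": obs, "lost": 0})  # added as a new track
    return out
```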
Preferably, the limit time of track loss is calculated from the pedestrian density (the formulas for pd and TL are given only as images in the original), where pd represents the pedestrian density, TL0 represents a basic time limit, ⌊·⌋ represents the round-down operation, num_det represents the number of detected pedestrians, and num0 represents a pedestrian-number threshold. num0 is set according to the complexity of different scenes.
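One plausible reading of the image-only formulas, offered purely as an assumption consistent with the variable descriptions (density from a floored ratio of detections to the threshold, limit time growing with density), is:

```python
def loss_limit_time(num_det, num0, TL0):
    """Assumed reconstruction, not the patent's exact equations:
    pedestrian density pd = floor(num_det / num0), and the lost-track
    limit time grows with density as TL = TL0 * (1 + pd)."""
    pd = num_det // num0
    return TL0 * (1 + pd)
```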
Preferably, before data association, the uncorrected observed pedestrian frames are fed into the twin network for prediction, and the possible pedestrian-frame positions are obtained as rough pedestrian frames before screening, so as to determine the observed pedestrian-frame sequence.
And S7, judging whether the video is finished; if so, ending; otherwise, inputting the current pedestrian frame of each current tracking track into the twin network as an updated target template, setting T = T + 1, and returning to step S1.
As shown in fig. 5, in order to obtain a finer view frame, preferably, between step S2 and step S3, the method further comprises:
supplementing an observation frame of the T-th frame by using a tracking frame of each target of the T-th frame;
and correcting the supplemented observation frame by using a regional regression network to obtain the corrected observation frame of each target Tth frame.
The invention integrates the above processes into a unified multi-target tracking framework, and takes the MOT17 test set as an example for demonstration. MOTA denotes the overall multi-object tracking accuracy, IDF1 the identity F1 score of the tracking tracks, MT the proportion of tracks effectively tracked over more than 80% of their length, ML the proportion tracked over less than 20%, FP the number of background regions judged to be tracked objects, FN the number of tracked objects judged to be background, and ID Sw the number of identity switches within the tracks.
The overall tracking effect on the final MOT17 test set is shown in table 1, wherein the specific results of each video are shown in table 2.
TABLE 1
(Table 1 is rendered as an image in the original.)
TABLE 2
(Table 2 is rendered as an image in the original.)
Correspondingly, the invention also provides an online multi-target tracking system based on the twin network and the long and short term clues, which comprises the following steps:
the twin network module, used for cross-correlating the tracking target template with the search area to obtain a response map and the preliminarily predicted tracking track of each target;
the correction module, used for combining the acquired preliminary tracks with the observation frames and correcting the pedestrian frames through a pedestrian regression network;
the data association module, used for calculating the similarity between the tracking tracks and the observed pedestrians by extracting long-term and short-term feature clues from each and fusing them, and for assigning a corresponding observed pedestrian frame to each tracking track;
and the track post-processing module, used for updating, supplementing, and deleting the tracking tracks to complete the tracking of the current frame.
Preferably, the data association module comprises a long-term appearance feature difference calculation module based on the Re-ID appearance model and a short-term feature difference calculation module based on pedestrian frame overlap, which respectively calculate the long-term and short-term appearance feature differences between the tracking track and the observed pedestrian frame; these differences are then weighted and fused based on information related to the tracking track to obtain the final feature difference.
The Re-ID model of the long-term feature difference calculation module introduces a structure combining instance normalization and batch normalization for training, which alleviates the cross-domain generalization problem.
It will be understood by those skilled in the art that the foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included in the scope of the present invention.

Claims (9)

1. An online multi-target tracking method based on twin networks and long-short term clues is characterized by comprising the following steps:
s0., cutting the target detection results of the first frame of the surveillance video into observation frames to obtain the observation frame of each target in the 1 st frame, taking these observation frames as the first input of the twin network to initialize the target templates, taking the observation frame of each target in the 1 st frame as the initial state of its target tracking track, and setting T = 2;
s1, performing target detection on the T-th frame, and cutting the target detection results of the T-th frame into observation frames to obtain the observation frame of each target in the T-th frame; cutting from the T-th frame a region N times the area of each target template's position in the T-1 th frame as the search area to obtain the search area picture of the T-th frame, wherein N is greater than or equal to 1;
s2, taking the search area picture of the T-th frame as the second input of the twin network to obtain the most probable tracking frame as the tracking frame of each target in the T-th frame;
s3, extracting features from the observation frame and the tracking frame of each target in the T-th frame with the trained Re-ID model, and calculating the similarity of the extracted features to obtain the long-term feature clue of each target in the T-th frame; calculating the IOU between the tracking frame and the observation frame of each target in the T-th frame as the short-term feature clue of each target in the T-th frame;
s4, fusing the extracted long-term clue and short-term clue to obtain the fused feature clue of each target in the T-th frame;
s5, taking the fused feature clues of the T-th frame as the cost matrix for data association, and matching the tracking tracks with the observation frames;
s6, updating, supplementing, and deleting the tracking tracks according to the data association results to complete the tracking of the T-th frame;
s7, judging whether the video is finished; if so, ending; otherwise, inputting the current pedestrian frame of each current tracking track into the twin network as the updated target template, setting T = T + 1, and returning to step S1;
step S4 includes the following steps:
s41, obtaining the long-term clue as a reid distance through the Re-ID model, obtaining the short-term clue as a sot distance through IOU calculation, and using the number of frames pause_i for which the track of target i has been lost to calculate the scaling factor rate
Figure FDA0003455600170000022
S42, judging whether pause_i exceeds 2; if so, increasing the proportionality coefficient to the new proportionality coefficient
Figure FDA0003455600170000021
and updating the long-term clue to reid distance × (rate/reid threshold); otherwise, updating the long-term clue to reid distance/reid threshold, wherein TL represents the limit time of track loss and reid threshold represents the Re-ID enhancement coefficient;
s43, calculating the cost matrix of target i as cost_i = rate × sot distance + (1 − rate) × reid distance.
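Steps S41-S43 can be sketched as follows. The patent gives the exact rate and update formulas only as images, so the decay of rate with the lost-frame count pause_i and the Re-ID enhancement applied when pause_i exceeds 2 are illustrative assumptions; only the S43 fusion line is taken directly from the claim.

```python
def fused_cost(sot_distance, reid_distance, pause_i, tl,
               reid_threshold=1.5):
    """Fuse the short-term (IOU) and long-term (Re-ID) cues as in
    step S43: cost_i = rate*sot + (1-rate)*reid.  The longer a track
    has been lost, the less the IOU cue is trusted (assumed decay)."""
    rate = max(0.0, 1.0 - pause_i / tl)     # assumed scaling factor
    if pause_i > 2:                          # S42: strengthen Re-ID cue
        reid_distance = reid_distance / reid_threshold
    return rate * sot_distance + (1.0 - rate) * reid_distance
```

A freshly updated track (pause_i = 0) is matched almost purely by IOU, while a long-lost track falls back to the Re-ID distance.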
2. The method of claim 1, wherein the twin network comprises the following processes:
(1) extracting the template feature map of each target, and extracting the feature map of the search area picture corresponding to that target;
(2) performing cross-correlation between the template feature map and the search area feature map to obtain a multi-channel response map;
(3) classifying the tracking target from the multi-channel response map, and predicting a pedestrian regression frame from the response information in the multi-channel response map;
(4) scoring each pedestrian regression frame by quality assessment;
(5) taking the product of the quality assessment score and the classification confidence score as the final score, and taking the regression frame with the highest final score as the tracking frame.
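The cross-correlation of step (2) slides the template feature map over the search-region feature map and takes channel-wise dot products at each position. A minimal NumPy sketch producing a single-channel response (real Siamese trackers keep the channels separate via depthwise correlation of learned features; the function name is illustrative):

```python
import numpy as np

def cross_correlation(search, template):
    """Naive cross-correlation of a (C, th, tw) template over a
    (C, sh, sw) search feature map, summing over channels.  The
    peak of the response map marks the most likely target location."""
    c, sh, sw = search.shape
    _, th, tw = template.shape
    out = np.zeros((sh - th + 1, sw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(search[:, y:y + th, x:x + tw] * template)
    return out
```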
3. The method of claim 2, wherein the quality assessment score is calculated as follows:
Figure FDA0003455600170000031
wherein l*, r*, t*, b* respectively represent the distances from the center point of the object to its four edges.
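The patent's score formula is given only as an image, but a quality score built from the four edge distances is commonly the FCOS "centerness"; whether claim 3 uses exactly this form is an assumption. As a sketch:

```python
import math

def centerness(l, r, t, b):
    """FCOS-style quality score from the distances to the four box
    edges: 1.0 when the point is at the box centre, decaying toward
    the edges.  Assumed form - the patent shows the formula only
    as an image."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
```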
4. The method of any one of claims 1 to 3, wherein the Re-ID model comprises: a global branch and a local branch, which extract global features and local features, respectively, based on a multi-attention joint mechanism.
5. The method of claim 4, wherein IBN-Net is introduced into the underlying CNN of the Re-ID model in any one of the following ways:
1) dividing the output channels of the first convolution after the picture input into two halves, applying instance normalization to one half and batch normalization to the other, and performing the same operation after the first Inception block;
2) adding instance normalization after the outputs of the spatial attention and channel attention in the soft attention of HACNN, and adding an instance normalization operation after the first convolution layer after the picture input.
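Option 1) above splits a feature map's channels in half and normalizes each half differently. A minimal inference-time NumPy sketch without learned affine parameters (the function name is illustrative):

```python
import numpy as np

def ibn_layer(x, eps=1e-5):
    """Instance-normalize the first half of the channels (statistics
    per sample and channel, over H and W) and batch-normalize the
    second half (statistics per channel, over N, H, and W), as in
    the IBN-Net channel split of option 1)."""
    n, c, h, w = x.shape
    half = c // 2
    a, b = x[:, :half], x[:, half:]
    a = (a - a.mean(axis=(2, 3), keepdims=True)) / np.sqrt(
        a.var(axis=(2, 3), keepdims=True) + eps)
    b = (b - b.mean(axis=(0, 2, 3), keepdims=True)) / np.sqrt(
        b.var(axis=(0, 2, 3), keepdims=True) + eps)
    return np.concatenate([a, b], axis=1)
```

The instance-normalized half discards per-image style statistics (helping cross-domain generalization), while the batch-normalized half preserves discriminative content.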
6. A method according to any one of claims 1 to 3, wherein, before the data association, the uncorrected observed pedestrian frames are fed into the twin network for prediction to obtain the positions where pedestrian frames are likely to be located, yielding the unscreened coarse pedestrian frames and thereby determining the sequence of observed pedestrian frames.
7. The method according to any one of claims 1 to 3, wherein step S6 includes:
directly updating the relevant parameters of each successfully associated tracking track;
for each observation frame that is not successfully associated, taking it as an initial state and adding it to the tracking sequence as a new track;
regarding each tracking track that is not successfully associated as being in a lost state;
and if the lost state persists beyond the limit time, cancelling the active state of the track.
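The four rules of claim 7 amount to a small state machine over the track set. A minimal sketch; the names Track and update_tracks are illustrative, not from the patent.

```python
class Track:
    def __init__(self, box, track_id):
        self.box, self.id = box, track_id
        self.lost_frames = 0          # pause_i in the patent's notation
        self.active = True

def update_tracks(tracks, matches, unmatched_obs, unmatched_tracks,
                  next_id, time_limit):
    """Apply the step-S6 bookkeeping for one frame."""
    for track, obs_box in matches:           # matched: refresh state
        track.box, track.lost_frames = obs_box, 0
    for box in unmatched_obs:                # new observation: new track
        tracks.append(Track(box, next_id))
        next_id += 1
    for track in unmatched_tracks:           # unmatched track: lost
        track.lost_frames += 1
        if track.lost_frames > time_limit:   # lost too long: deactivate
            track.active = False
    return next_id
```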
8. The method of claim 1, wherein the limit time for track loss is calculated as follows:
Figure FDA0003455600170000042
Figure FDA0003455600170000041
wherein pd represents the pedestrian density, TL_0 represents a basic time limit, ⌊·⌋ represents the round-down (floor) operation, num_det represents the number of detected pedestrians, and num_0 represents a pedestrian number threshold.
9. An online multi-target tracking system based on twin networks and long-short term cues, comprising:
a computer-readable storage medium and a processor;
the computer-readable storage medium is used for storing executable instructions;
the processor is used for reading executable instructions stored in the computer-readable storage medium and executing the twin network and long-short term clue-based online multi-target tracking method of any one of claims 1 to 8.
CN202010404941.7A 2020-05-12 2020-05-12 Online multi-target tracking method and system based on twin network and long-short term clues Expired - Fee Related CN111639551B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010404941.7A CN111639551B (en) 2020-05-12 2020-05-12 Online multi-target tracking method and system based on twin network and long-short term clues


Publications (2)

Publication Number Publication Date
CN111639551A CN111639551A (en) 2020-09-08
CN111639551B true CN111639551B (en) 2022-04-01

Family

ID=72330228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010404941.7A Expired - Fee Related CN111639551B (en) 2020-05-12 2020-05-12 Online multi-target tracking method and system based on twin network and long-short term clues

Country Status (1)

Country Link
CN (1) CN111639551B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163473A (en) * 2020-09-15 2021-01-01 郑州金惠计算机系统工程有限公司 Multi-target tracking method and device, electronic equipment and computer storage medium
CN112132152B (en) * 2020-09-21 2022-05-27 厦门大学 Multi-target tracking and segmentation method utilizing short-range association and long-range pruning
CN112288775B (en) * 2020-10-23 2022-04-15 武汉大学 Multi-target shielding tracking method based on long-term and short-term prediction model
CN112487934B (en) * 2020-11-26 2022-02-01 电子科技大学 Strong data association integrated real-time multi-target tracking method based on ReID (ReID) characteristics
CN112633078B (en) * 2020-12-02 2024-02-02 西安电子科技大学 Target tracking self-correction method, system, medium, equipment, terminal and application
CN112560651B (en) * 2020-12-09 2023-02-03 燕山大学 Target tracking method and device based on combination of depth network and target segmentation
CN112560656B (en) * 2020-12-11 2024-04-02 成都东方天呈智能科技有限公司 Pedestrian multi-target tracking method combining attention mechanism end-to-end training
CN112464900B (en) * 2020-12-16 2022-04-29 湖南大学 Multi-template visual target tracking method based on twin network
CN112734800A (en) * 2020-12-18 2021-04-30 上海交通大学 Multi-target tracking system and method based on joint detection and characterization extraction
CN112802067B (en) * 2021-01-26 2024-01-26 深圳市普汇智联科技有限公司 Multi-target tracking method and system based on graph network
CN112991385B (en) * 2021-02-08 2023-04-28 西安理工大学 Twin network target tracking method based on different measurement criteria
CN113239800B (en) * 2021-05-12 2023-07-25 上海善索智能科技有限公司 Target detection method and target detection device
CN113379793B (en) * 2021-05-19 2022-08-12 成都理工大学 On-line multi-target tracking method based on twin network structure and attention mechanism
CN113392721B (en) * 2021-05-24 2023-02-10 中国科学院西安光学精密机械研究所 Remote sensing satellite video target tracking method
CN113344976B (en) * 2021-06-29 2024-01-23 常州工学院 Visual tracking method based on target object characterization point estimation
CN113724291B (en) * 2021-07-29 2024-04-02 西安交通大学 Multi-panda tracking method, system, terminal device and readable storage medium
CN113673166B (en) * 2021-08-26 2023-10-31 东华大学 Digital twin model working condition self-adaption method and system for processing quality prediction
CN113744313B (en) * 2021-09-06 2024-02-02 山东工商学院 Deep learning integrated tracking algorithm based on target movement track prediction
CN114241003B (en) * 2021-12-14 2022-08-19 成都阿普奇科技股份有限公司 All-weather lightweight high-real-time sea surface ship detection and tracking method
CN114677633B (en) * 2022-05-26 2022-12-02 之江实验室 Multi-component feature fusion-based pedestrian detection multi-target tracking system and method
CN116647644B (en) * 2023-06-06 2024-02-20 上海优景智能科技股份有限公司 Campus interactive monitoring method and system based on digital twin technology

Citations (4)

Publication number Priority date Publication date Assignee Title
CN109191491A (en) * 2018-08-03 2019-01-11 华中科技大学 The method for tracking target and system of the twin network of full convolution based on multilayer feature fusion
CN109636829A (en) * 2018-11-24 2019-04-16 华中科技大学 A kind of multi-object tracking method based on semantic information and scene information
CN110443827A (en) * 2019-07-22 2019-11-12 浙江大学 A kind of UAV Video single goal long-term follow method based on the twin network of improvement
CN110473231A (en) * 2019-08-20 2019-11-19 南京航空航天大学 A kind of method for tracking target of the twin full convolutional network with anticipation formula study more new strategy

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US11308350B2 (en) * 2016-11-07 2022-04-19 Qualcomm Incorporated Deep cross-correlation learning for object tracking
US10902615B2 (en) * 2017-11-13 2021-01-26 Qualcomm Incorporated Hybrid and self-aware long-term object tracking
US10957053B2 (en) * 2018-10-18 2021-03-23 Deepnorth Inc. Multi-object tracking using online metric learning with long short-term memory


Non-Patent Citations (4)

Title
Harmonious Attention Network for Person Re-identification; Wei Li et al; 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 20181217; full text *
Multi-Object Tracking Hierarchically in Visual Data Taken From Drones; Siyang Pan et al; 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW); 20200305; full text *
Multi-Object Tracking with Multiple Cues and Switcher-Aware Classification; Weitao Feng et al; arXiv:1901.06129v1; 20190118; full text *
Research on Video Multi-Object Tracking Algorithms Based on Deep Learning; Chu Qi; China Doctoral Dissertations Full-text Database; 20190815; full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220401