CN113409356A - Similarity calculation method and multi-target tracking method - Google Patents

Similarity calculation method and multi-target tracking method

Info

Publication number
CN113409356A
Authority
CN
China
Prior art keywords
target
frame
similarity
targets
tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110695292.5A
Other languages
Chinese (zh)
Inventor
储琪
俞能海
刘斌
刘乾坤
顾建军
寄珊珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202110695292.5A priority Critical patent/CN113409356A/en
Publication of CN113409356A publication Critical patent/CN113409356A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a similarity calculation method and a multi-target tracking method. For each target in each video frame, the neighbors of the target are calculated, a vertex set is constructed using the appearance features of the target and its neighbors, and a directed edge set is calculated using the interrelationship between targets, thereby constructing a directed graph. For adjacent video frames, matching calculation is performed using the directed graphs of the targets in the two video frames to obtain a similarity calculation result. The invention simultaneously utilizes the appearance features of targets and the relative position features between targets, and the topological structure between targets is well encoded into the directed graph. The invention improves the accuracy and precision of multi-target tracking; it can retrieve a lost target when the target is severely occluded by other targets; and it obtains more tracked targets and reduces the number of lost targets.

Description

Similarity calculation method and multi-target tracking method
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a similarity calculation method and a multi-target tracking method.
Background
Multi-target tracking means that, given a video, the algorithm can output the location boxes of the targets of interest in the video. It should be noted that the number of targets in the video is not fixed. During tracking, each target is also assigned unique identity information (i.e., a number, denoted as an ID). Multi-target tracking has wide applications, such as intelligent surveillance and automatic driving. With the rapid development of target detection, detection-based multi-target tracking methods have gradually become mainstream. In such methods, the multi-target tracking problem is mainly solved by data association.
In general, a robust similarity model is the key to solving data association. Most existing methods calculate the similarity between targets based on the features of individual targets; that is, the similarity model does not consider the interrelationship between targets. As the feature representation capability of targets has grown stronger with the development of deep learning, the similarity calculated from individual target features has gained a certain robustness. However, this calculation method still has limitations in complex scenarios. For example, when the tracked targets belong to the same category (e.g., pedestrian tracking, vehicle tracking), the targets have certain similarities in appearance, and frequent occlusion inevitably exists between them. Fig. 1 demonstrates a complex scenario in pedestrian tracking. Part (a) of fig. 1 shows two adjacent pictures; it can be seen that the clothes of different pedestrians are relatively similar in this scene, and there is a certain amount of occlusion between pedestrians. Part (b) of fig. 1 shows the similarity scores calculated from individual target features; the basic flow is to first extract the individual features (Individual Representation) of a single target and then perform similarity calculation (Similarity Measure). It can be seen that the calculated similarity score is relatively low when a pedestrian is occluded, and relatively high when different pedestrians are dressed similarly. Therefore, in such complex scenarios, similarity scores calculated from individual target features are not reliable enough.
Appearance features of targets are widely used in target tracking. Earlier work utilized manually designed appearance features for multi-target tracking. For example, Yamaguchi et al. used raw pixel templates in the article "Who are you with and where are you going? CVPR 2011", and Izadinia et al. used color histograms and histograms of oriented gradients (HOG) in the article "(MP)2T: Multiple people multiple parts tracker. ECCV 2012". With the development of deep learning in recent years, appearance features extracted by convolutional neural networks (CNN) have been widely applied in the field of multi-target tracking.
Motion information of targets is also widely applied in the field of multi-target tracking. Methods using motion information are basically built on the assumption that the motion of a target is smooth and slow. Milan et al. designed a linear motion model in the article "Continuous energy minimization for multi-target tracking. TPAMI 2014", and Yang et al. designed a non-linear motion model in the article "Multi-target tracking by online learning of non-linear motion patterns and robust appearance models. CVPR 2012". However, the motion of a target in a video is determined not only by the target itself but also by the shooting device. During shooting, the device inevitably has a certain jitter, which is generally random and unpredictable. Therefore, it is difficult to solve the camera shake problem simply by relying on a motion model.
The above methods design and utilize features of individual targets and do not utilize the interrelationship between targets. Some current work has attempted to use the interrelationship between targets for multi-target tracking. Sadeghian et al. designed an occupancy map by partitioning a picture into a grid of fixed size in the article "Tracking the untrackable: Learning to track multiple cues with long-term dependencies. ICCV 2017": when a target exists in a certain grid cell, the corresponding value in the occupancy map is set to 1, and otherwise to 0. This processing only roughly records the distribution of targets; a grid cell with value 1 in the occupancy map can distinguish neither different targets nor the number of targets. Xu et al., in the article "Spatial-temporal relation networks for multi-object tracking. ICCV 2019", extract the interrelationships between targets using a relation network. Specifically, the mutual position relationship between targets is encoded into a weight, and the weight is then used to fuse the appearance features of the other targets in the current frame. It should be noted that the weights each target uses when fusing the appearance features of the other targets are different. However, this has limited interpretability and ignores the topology between targets.
Disclosure of Invention
The invention aims to provide a similarity calculation method and a multi-target tracking method, which can improve the robustness of a similarity model in multi-target tracking and ensure the tracking effect of the multi-target tracking in a complex scene.
The purpose of the invention is realized by the following technical scheme: a similarity calculation method for multi-target tracking, comprising:
for each target in each video frame, calculating the neighbor of the target, constructing a vertex set by using the appearance characteristics of the target and the neighbor of the target, and calculating a directed edge set by using the correlation between the targets, thereby constructing a directed graph.
And for adjacent video frames, performing matching calculation by using the directed graphs of all targets in the two video frames to obtain a similarity calculation result.
Further, the target set in the t-th frame is expressed as $\mathcal{O}^t = \{o_i^t\}_{i=1}^{I_t}$, wherein the i-th target is expressed as $o_i^t = (b_i^t, p_i^t)$; $b_i^t$ represents the location box of the i-th target, whose four elements are the horizontal and vertical coordinates of its upper-left corner and the width and height of the box, respectively; $p_i^t$ represents the picture block cropped from the t-th frame according to the location box; $I_t$ denotes the number of targets in the t-th frame.
Further, K neighbors of each target are obtained according to the distances between targets, where K is the total number of neighbors; for the t-th frame, with the i-th target $o_i^t$ as the anchor and the target $o_i^t$ as its own 0-th neighbor, the target $o_i^t$ and its neighbors form the set $N_i^t = \{o_{i_k}^t\}_{k=0}^{K}$, where $o_{i_k}^t$ is the k-th neighbor of the target $o_i^t$.
Further, for the i-th target $o_i^t$ in the t-th frame, the constructed directed graph is expressed as $G_i^t = (V_i^t, E_i^t)$, wherein the vertex set $V_i^t$ is defined as follows:

$$V_i^t = \{v_{i_k}^t \mid v_{i_k}^t = \phi_{ACNN}(p_{i_k}^t)\}_{k=0}^{K}$$

where $v_{i_k}^t$ denotes the appearance feature of $o_{i_k}^t$ and $\phi_{ACNN}(\cdot)$ denotes the forward function of the convolutional neural network used to extract appearance features.

For the directed edge set $E_i^t$, let $r_{i_k}^t$ first denote the relative position vector between the anchor $o_i^t$ and its k-th neighbor $o_{i_k}^t$:

$$r_{i_k}^t = \phi_{RP}(b_i^t, b_{i_k}^t)$$

where the relative position is computed from the upper-left coordinates and the width and height of the two location boxes, normalized by the width $w_t$ and height $h_t$ of the t-th frame, and $\phi_{RP}(\cdot,\cdot)$ is the function that calculates the relative position between targets based on the location boxes.

A relative position encoder transforms the relative position vector $r_{i_k}^t$ into $e_{i_k}^t$, thereby obtaining the directed edge set $E_i^t$:

$$E_i^t = \{e_{i_k}^t \mid e_{i_k}^t = \phi_{RPE}(r_{i_k}^t)\}_{k=0}^{K}$$

where $\phi_{RPE}(\cdot)$ is the relative position encoder.
Further, hard matching is carried out using the directed graphs of the targets in the two video frames, comprising the following steps:

For adjacent video frames, namely the (t-1)-th frame and the t-th frame, given the directed graphs $G_i^{t-1}$ and $G_j^t$ of two targets, a similarity matrix $S^{i,j} \in \mathbb{R}^{(K+1)\times(K+1)}$ is first calculated, whose element in the k-th row and k'-th column is calculated as follows:

$$s_{k,k'}^{i,j} = \phi_{BC}\left(\left[\left(v_{i_k}^{t-1} \ominus v_{j_{k'}}^{t}\right)^{2},\ \left(e_{i_k}^{t-1} \ominus e_{j_{k'}}^{t}\right)^{2}\right]\right)$$

where $\ominus$ denotes element-wise subtraction between feature vectors, $(\cdot)^2$ squares each element of a vector, $[\cdot,\cdot]$ splices two vectors together, and $\phi_{BC}(\cdot)$ denotes the forward function of the binary classifier.

Finally, the following is obtained:

$$s_{i,j} = \frac{1}{K+1}\sum_{k=0}^{K} s_{k,k}^{i,j}$$
further, soft matching is carried out by utilizing directed graphs of objects in two video frames, and the soft matching method comprises the following steps:
on the basis of hard matching, firstly performing proximity alignment, and then calculating the similarity, which is expressed as:
Figure BDA00031280297300000325
in the formula (I), the compound is shown in the specification,
Figure BDA00031280297300000326
is a similarity matrix Si,jRemoving the first row and the first column to obtain a matrix; phi is aLA(. cndot.) is a linear distribution function for completing task distribution and returning the maximum total similarity sum according to the input similarity matrix.
The multi-target tracking method applies the similarity calculation method to the existing multi-target tracking method based on data association to replace a similarity model in the multi-target tracking method.
Further, comprising: retrieving a target lost in the current frame using the information of the previous frame, as follows:

For the i-th target $o_i^{t-1}$ in the (t-1)-th frame, if it is lost in the t-th frame, the relative position $r_{i_k}^{t-1}$ between the i-th target $o_i^{t-1}$ and its k-th neighbor $o_{i_k}^{t-1}$, together with $b_{i_k}^{t}$, is used to estimate the location box of the i-th target in the t-th frame:

$$\hat{b}_{i,k}^{t} = \phi_{RP}^{-1}\left(r_{i_k}^{t-1},\, b_{i_k}^{t}\right)$$

where $\phi_{RP}^{-1}(\cdot,\cdot)$ denotes the inverse function of $\phi_{RP}$, and $\phi_{RP}(\cdot,\cdot)$ is the function that calculates the relative position between targets based on the location boxes; $b_{i_k}^{t}$ denotes the location box in the t-th frame of the target corresponding to $o_{i_k}^{t-1}$.

All K neighbors of the i-th target $o_i^{t-1}$ estimate the location box of the i-th target in the t-th frame in this manner, and the final location box $\hat{b}_i^{t}$ of the i-th target in the t-th frame is calculated by averaging.
Further, after the final location box $\hat{b}_i^{t}$ of the i-th target in the t-th frame is obtained, several candidate boxes are sampled from a Gaussian distribution based on $\hat{b}_i^{t}$.

For any sampled candidate box $\tilde{b}$, $\tilde{o}_i^{t} = (\tilde{b}, \tilde{p})$ represents a candidate target of $o_i^{t-1}$ in the t-th frame, where $\tilde{p}$ denotes the picture block cropped from the t-th frame according to the location box $\tilde{b}$, and the directed graph $\tilde{G}_i^{t}$ is constructed; the similarity between the graphs $G_i^{t-1}$ and $\tilde{G}_i^{t}$ is then obtained.

Among all candidate targets, the one with the highest similarity score is selected; if this similarity score is greater than a set threshold, the highest-scoring candidate target is taken as the tracking result of $o_i^{t-1}$ in the t-th frame.
The invention has the following beneficial effects: the invention simultaneously utilizes the appearance features of targets and the relative position features between targets, and the topological structure between targets is well encoded into the directed graph. The invention improves the accuracy and precision of multi-target tracking; it can retrieve a lost target when the target is severely occluded by other targets; and it obtains more tracked targets and reduces the number of lost targets.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
FIG. 1 is a diagram illustrating the effect of the prior art in a complex application scenario;
fig. 2 is a schematic diagram illustrating an effect of the similarity calculation method according to the embodiment of the present invention;
FIG. 3 is a schematic diagram of a constructed directed graph provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a relative position encoder according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of the binary classifier provided by an embodiment of the present invention;
fig. 6 is a schematic diagram illustrating a change of a directed graph in a missing detection situation according to an embodiment of the present invention;
fig. 7 is a schematic diagram of retrieving a lost target based on a directed graph according to an embodiment of the present invention;
FIG. 8 illustrates tracking performance at different K values provided by embodiments of the present invention;
fig. 9 is a schematic diagram of a visual tracking result provided in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The similarity calculation method for multi-target tracking provided by the embodiment of the invention can be used as a similarity model to improve the robustness of similarity models in multi-target tracking. In particular, the similarity model is a graph similarity model; it is universal and can replace the similarity model in existing multi-target tracking methods.
Fig. 2 illustrates the main principle of the similarity calculation method for multi-target tracking, taking the scenario shown in part (a) of fig. 1 as an example. The method mainly includes:
1. For each target in each video frame, calculate the neighbors of the target, then construct a vertex set using the appearance features of the target and its neighbors, and calculate a directed edge set using the interrelationship between targets, thereby constructing a directed graph, namely "Graph Representation" in fig. 2.
2. For adjacent video frames, perform matching calculation using the directed graphs of the targets in the two video frames to obtain a similarity calculation result, namely "Graph Matching" in fig. 2.
For ease of understanding, the above scheme of the invention is described in detail below in two parts: the first part mainly describes data association in multi-target tracking; the second part mainly introduces the principle of the graph similarity model.
Part one: data association.
The target set in the t-th frame is expressed as $\mathcal{O}^t = \{o_i^t\}_{i=1}^{I_t}$, wherein the i-th target is expressed as $o_i^t = (b_i^t, p_i^t)$. Here $b_i^t = (x_i^t, y_i^t, w_i^t, h_i^t)$ represents the location box of the i-th target, whose four elements are the horizontal and vertical coordinates of the upper-left corner of the box and its width and height, respectively; $p_i^t$ represents the picture block cropped from the t-th frame according to the location box of the i-th target; and $I_t$ denotes the number of targets in the t-th frame.
Data association is performed between two adjacent frames, namely the (t-1)-th frame and the t-th frame. When solving the data association, a cost matrix $M \in \mathbb{R}^{I_{t-1} \times I_t}$ needs to be provided. The element $m_{i,j}$ in the i-th row and j-th column of the cost matrix represents the cost between targets $o_i^{t-1}$ and $o_j^t$ and is calculated as follows:

$$m_{i,j} = \phi_{CI}\left(o_i^{t-1}, o_j^{t}\right)$$

where $\phi_{CI}(\cdot,\cdot)$ denotes a function that calculates the cost based on individual target features.
Most existing methods mainly learn better individual target features or design a better cost function $\phi_{CI}$ to improve the robustness of the similarity model, but this does not take the interrelationship between targets into account. Once the cost matrix is available, the association itself is typically solved as a linear assignment problem, as sketched below.
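As a reference point, the sketch below shows the standard way such a cost matrix is consumed: association between frame t-1 and frame t solved as a linear assignment problem (e.g., with the Hungarian algorithm via SciPy). Here cost_fn stands in for $\phi_{CI}$ (or the graph-based $\phi_{GI}$ introduced below), and the gating threshold max_cost is a hypothetical value.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(prev_targets, curr_targets, cost_fn, max_cost=0.7):
    """Match targets of frame t-1 to frame t by minimizing the total cost.

    cost_fn plays the role of phi_CI (or the graph-based phi_GI);
    max_cost is a hypothetical gating threshold."""
    M = np.array([[cost_fn(o_prev, o_curr) for o_curr in curr_targets]
                  for o_prev in prev_targets])
    rows, cols = linear_sum_assignment(M)  # Hungarian algorithm
    # keep only sufficiently cheap matches
    return [(i, j) for i, j in zip(rows, cols) if M[i, j] <= max_cost]
```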
Part (a) of fig. 3 shows the target detection results in two adjacent frames; the numbers at the upper-left corners of the target location boxes distinguish different detection results. Taking the 1st target (i.e., the leftmost pedestrian) as an example, it can be seen that the target is severely occluded in one frame, so that there is a relatively large appearance difference between $o_1^{t-1}$ and $o_1^t$.
In order to utilize the interrelationship between targets, the invention designs a graph similarity model, and the graph-based cost is calculated as follows:

$$m_{i,j} = \phi_{GI}\left(G_i^{t-1}, G_j^{t}\right)$$

where $G_i^{t-1}$ and $G_j^t$ denote the directed graphs created for targets $o_i^{t-1}$ and $o_j^t$ (implementation details are introduced below), and $\phi_{GI}(\cdot,\cdot)$ represents a function that computes the cost based on the feature representations of the two graphs. The interrelationship between targets is embedded in the feature representation of the graph.
Part two: the graph similarity model.
This part is presented from three aspects: acquiring target neighbors, constructing the directed graph, and graph matching.
1. And acquiring target neighbors.
To create the directed graph $G_i^t$, the K neighbors of $o_i^t$ in the t-th frame must first be acquired (K is the total number of neighbors, and its value can be set as desired); a target and its neighbors must be in the same frame. There are many ways to obtain neighbors, for which conventional techniques may be consulted; the invention is not limited in this respect.
The distance between targets can be calculated by some metric. Here "distance" is a general expression, including but not limited to the Euclidean distance between the center points of the target location boxes. In the embodiment of the invention, neighbors are obtained using the Euclidean distance between the center points of the target location boxes. The ordered set $N_i^t = \{o_{i_k}^t\}_{k=0}^{K}$ represents the target $o_i^t$ together with its neighbors, where $o_{i_k}^t$ is the k-th neighbor of target $o_i^t$. In addition, $o_i^t$ is defined as the anchor. To simplify notation, $o_i^t$ is regarded as its own 0-th neighbor, so that $o_{i_0}^t = o_i^t$. Taking part (a) of fig. 3 as an example, when K = 2 the set $N_1^t$ consists of the 1st target and its two nearest neighbors.
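A minimal sketch of the neighbor acquisition used in the embodiment (Euclidean distance between box centers) follows; the Target class from the earlier sketch is assumed, and the function name is illustrative.

```python
import numpy as np

def k_nearest_neighbors(targets, i, K):
    """Ordered set N_i^t: the anchor as its own 0-th neighbor, followed by
    its K nearest targets in the same frame (Euclidean distance between
    location-box centers, as in the embodiment)."""
    centers = np.array([[t.box[0] + t.box[2] / 2,
                         t.box[1] + t.box[3] / 2] for t in targets])
    dists = np.linalg.norm(centers - centers[i], axis=1)
    order = np.argsort(dists)
    idx = [i] + [j for j in order if j != i][:K]  # anchor first, then K nearest
    return [targets[j] for j in idx]
```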
2. and constructing a directed graph.
For the i-th target $o_i^t$ in the t-th frame, the constructed directed graph is expressed as $G_i^t = (V_i^t, E_i^t)$. The directed graph $G_i^t$ is constructed based on $N_i^t$ and consists of K+1 vertices and K+1 directed edges. Parts (b), (c) and (d) of fig. 3 respectively show the directed graphs of different targets in two adjacent frames; although the nodes in the directed graphs may be the same, their edges are different.
The vertex set $V_i^t$ is defined as follows:

$$V_i^t = \{v_{i_k}^t \mid v_{i_k}^t = \phi_{ACNN}(p_{i_k}^t)\}_{k=0}^{K}$$

where $v_{i_k}^t$ denotes the appearance feature of $o_{i_k}^t$, $p_{i_k}^t$ denotes the picture block of $o_{i_k}^t$ cropped from the t-th frame according to its location box, and $\phi_{ACNN}(\cdot)$ denotes the forward function of the convolutional neural network used to extract appearance features; the convolutional neural network may be implemented in a conventional manner.
In order to exploit the interrelationship between targets, the mutual position between targets is first defined. Let $r_{i_k}^t$ denote the relative position vector between the anchor $o_i^t$ and its k-th neighbor $o_{i_k}^t$:

$$r_{i_k}^t = \phi_{RP}(b_i^t, b_{i_k}^t)$$

where the relative position is computed from the upper-left coordinates and the width and height of the two location boxes, normalized by the width $w_t$ and height $h_t$ of the t-th frame, and $\phi_{RP}(\cdot,\cdot)$ is the function that calculates the relative position between targets based on the location boxes.

A relative position encoder transforms the relative position vector $r_{i_k}^t$ into $e_{i_k}^t$, thereby obtaining the directed edge set $E_i^t$:

$$E_i^t = \{e_{i_k}^t \mid e_{i_k}^t = \phi_{RPE}(r_{i_k}^t)\}_{k=0}^{K}$$

where $e_{i_k}^t$ denotes an element of $E_i^t$ and $\phi_{RPE}(\cdot)$ is the relative position encoder.
Illustratively, the relative position vector $r_{i_k}^t$ can be an 8-dimensional vector, and the method designed by Vaswani et al. in the article "Attention is all you need. NeurIPS 2017" can be used to transform the 8-dimensional relative position vector into a high-dimensional relative position vector. Fig. 4 schematically shows the structure of the relative position encoder, which mainly consists of an FC (fully connected) layer and also utilizes a batch normalization (BN) layer and a ReLU activation function.
3. Graph matching.
For adjacent video frames, namely the (t-1)-th frame and the t-th frame, given the directed graphs $G_i^{t-1}$ and $G_j^t$ of two targets, a similarity matrix $S^{i,j} \in \mathbb{R}^{(K+1)\times(K+1)}$ is first calculated. The element $s_{k,k'}^{i,j}$ in the k-th row and k'-th column of the matrix is calculated as follows:

$$s_{k,k'}^{i,j} = \phi_{BC}\left(\left[\left(v_{i_k}^{t-1} \ominus v_{j_{k'}}^{t}\right)^{2},\ \left(e_{i_k}^{t-1} \ominus e_{j_{k'}}^{t}\right)^{2}\right]\right)$$

where $\ominus$ denotes element-wise subtraction between feature vectors, $(\cdot)^2$ squares each element of a vector, $[\cdot,\cdot]$ splices two vectors together, and $\phi_{BC}(\cdot)$ denotes the forward function of the binary classifier (BC); fig. 5 schematically shows the structure of the binary classifier.

Finally, the similarity $s_{i,j}$ between the directed graphs $G_i^{t-1}$ and $G_j^t$ can be calculated as follows:

$$s_{i,j} = \frac{1}{K+1}\sum_{k=0}^{K} s_{k,k}^{i,j}$$
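The following sketch computes the similarity matrix $S^{i,j}$ and the diagonal (hard) matching score as reconstructed above; phi_bc is assumed to be a callable (e.g., a small MLP with a sigmoid output) returning one match probability per concatenated pair.

```python
import torch

def similarity_matrix(V1, E1, V2, E2, phi_bc):
    """S^{i,j}: pairwise vertex/edge similarities between two graphs.

    V*, E*: (K+1, D) vertex and edge feature tensors; phi_bc is the binary
    classifier returning one match probability per pair."""
    K1, K2 = V1.size(0), V2.size(0)
    dv = (V1.unsqueeze(1) - V2.unsqueeze(0)) ** 2   # (K+1, K+1, Dv)
    de = (E1.unsqueeze(1) - E2.unsqueeze(0)) ** 2   # (K+1, K+1, De)
    x = torch.cat([dv, de], dim=-1).view(K1 * K2, -1)
    return phi_bc(x).view(K1, K2)                   # S^{i,j}

def hard_match(S):
    """Hard matching: average the diagonal (k-th neighbor vs k-th neighbor)."""
    return S.diagonal().mean()
```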
the matching mode is hard matching. In practical application, the detection result is not perfect, and missing detection and false alarm exist, so that the robustness of the similarity score obtained through the calculation of the formula is not high.
As shown in fig. 6, when a target is missed in the t-th frame, the directed graphs constructed for the two frames also change. The similarity score obtained by directly performing hard matching is then unreliable, because the order of the neighbors changes, i.e., the neighbors are no longer aligned.
In order to solve the problem of unaligned neighbors, a soft matching scheme is further provided: on the basis of hard matching, neighbor alignment is performed first, and the similarity $\hat{s}_{i,j}$ is then calculated as:

$$\hat{s}_{i,j} = \frac{1}{K+1}\left(s_{0,0}^{i,j} + \phi_{LA}\left(\bar{S}^{i,j}\right)\right)$$

where $\bar{S}^{i,j}$ is the matrix obtained by removing the first row and first column of the similarity matrix $S^{i,j}$, and $\phi_{LA}(\cdot)$ is a linear assignment function, obtained by modifying the Hungarian algorithm, which completes the task assignment according to the input similarity matrix and returns the maximum total similarity.

Compared with hard matching, soft matching can align the K neighbors of the anchor. It should be noted, however, that the similarity score obtained by soft matching is never smaller than that obtained by hard matching, i.e., $\hat{s}_{i,j} \ge s_{i,j}$ always holds. Soft matching therefore has a positive effect when the anchors of the two graphs are the same target (positive samples) and a negative effect when the anchors are different targets (negative samples). Nevertheless, since the appearance features of the targets and the relative position features between targets are encoded into the directed graph, giving the directed graph a strong feature representation capability, the negative influence of soft matching in the latter case is essentially negligible. Finally, the aforementioned cost formula can be rewritten in terms of the soft-matching similarity, a higher $\hat{s}_{i,j}$ corresponding to a lower cost:

$$m_{i,j} = \phi_{GI}\left(G_i^{t-1}, G_j^{t}\right) = 1 - \hat{s}_{i,j}$$
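A sketch of soft matching with SciPy's linear assignment standing in for $\phi_{LA}$: the anchor-anchor entry $s_{0,0}$ is kept fixed while the K neighbors are aligned to maximize the total similarity, matching the formula above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def soft_match(S: np.ndarray) -> float:
    """Soft matching: keep the anchor-anchor score s_00, align the K
    neighbors with a linear assignment (phi_LA, Hungarian-style) that
    maximizes the total similarity of S with its first row/column removed."""
    K = S.shape[0] - 1
    S_bar = S[1:, 1:]
    rows, cols = linear_sum_assignment(S_bar, maximize=True)
    return (S[0, 0] + S_bar[rows, cols].sum()) / (K + 1)
```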
Preferably, the embodiment of the invention further provides a multi-target tracking method capable of retrieving lost targets.
In the field of multi-target tracking, a target may be occluded by other targets. When the occlusion is severe, it is difficult for the detector to detect the occluded target; as shown in fig. 6, the rightmost target is lost in the t-th frame. Because the graph similarity model designed by the invention utilizes the topological structure between targets, the model can be used to retrieve lost targets, as shown in fig. 7.
For the i-th target $o_i^{t-1}$ in the (t-1)-th frame, if it is lost in the t-th frame, the relative position $r_{i_k}^{t-1}$ between the i-th target $o_i^{t-1}$ and its k-th neighbor $o_{i_k}^{t-1}$, together with $b_{i_k}^{t}$, is used to estimate the location box of the i-th target in the t-th frame:

$$\hat{b}_{i,k}^{t} = \phi_{RP}^{-1}\left(r_{i_k}^{t-1},\, b_{i_k}^{t}\right)$$

where $\phi_{RP}^{-1}(\cdot,\cdot)$ denotes the inverse function of $\phi_{RP}$, and $\phi_{RP}(\cdot,\cdot)$ is the function that calculates the relative position between targets based on the location boxes; $b_{i_k}^{t}$ denotes the location box in the t-th frame of the target corresponding to $o_{i_k}^{t-1}$.

All K neighbors of the i-th target $o_i^{t-1}$ estimate the location box of the i-th target in the t-th frame in this manner, and the final location box $\hat{b}_i^{t}$ of the i-th target in the t-th frame is calculated by averaging.
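A sketch of this retrieval step follows. The patent does not reproduce the explicit component form of $\phi_{RP}$, so the parameterization below (offsets and sizes normalized by the frame size, plus box-relative offsets to fill the stated 8 dimensions) is an assumption; only the first four components are needed to invert the mapping.

```python
import numpy as np

def phi_rp(b_anchor, b_nbr, fw, fh):
    """An assumed form of phi_RP: anchor-to-neighbor offsets and both box
    sizes normalized by frame width fw and height fh, plus offsets
    normalized by the anchor box, giving the 8-d vector stated in the text."""
    xa, ya, wa, ha = b_anchor
    xn, yn, wn, hn = b_nbr
    return np.array([(xn - xa) / fw, (yn - ya) / fh, wa / fw, ha / fh,
                     wn / fw, hn / fh, (xn - xa) / wa, (yn - ya) / ha])

def phi_rp_inv(r, b_nbr, fw, fh):
    """phi_RP^{-1}: recover the anchor box from r and the neighbor's box in
    frame t (only the first four components are needed under this form)."""
    xn, yn, _, _ = b_nbr
    return np.array([xn - r[0] * fw, yn - r[1] * fh, r[2] * fw, r[3] * fh])

def estimate_lost_box(rel_positions, nbr_boxes, fw, fh):
    """Average the K per-neighbor estimates into the final box b_hat_i^t."""
    estimates = [phi_rp_inv(r, b, fw, fh)
                 for r, b in zip(rel_positions, nbr_boxes)]
    return np.mean(estimates, axis=0)
```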
Further, after the final location box $\hat{b}_i^{t}$ of the i-th target in the t-th frame is obtained, several candidate boxes are sampled from a Gaussian distribution based on $\hat{b}_i^{t}$. For any sampled candidate box $\tilde{b}$, $\tilde{o}_i^{t} = (\tilde{b}, \tilde{p})$ represents a candidate target of $o_i^{t-1}$ in the t-th frame, where $\tilde{p}$ denotes the picture block cropped from the t-th frame according to the location box $\tilde{b}$, and the directed graph $\tilde{G}_i^{t}$ is constructed; the similarity between the graphs $G_i^{t-1}$ and $\tilde{G}_i^{t}$ is then obtained. Among all candidate targets, the one with the highest similarity score is selected; if this similarity score is greater than a set threshold, the highest-scoring candidate target is taken as the tracking result of $o_i^{t-1}$ in the t-th frame (a sketch of this sampling and scoring step follows).
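In the sketch below, 64 candidates per lost target follows the experiments reported later, while the Gaussian scale sigma and the acceptance threshold are hypothetical values; build_graph and graph_similarity stand in for the directed-graph construction and soft matching described above.

```python
import numpy as np

def retrieve_lost_target(b_hat, prev_graph, build_graph, graph_similarity,
                         n_candidates=64, sigma=0.05, threshold=0.5):
    """Sample candidate boxes around the estimated box b_hat from a
    Gaussian, score each candidate graph against the target's graph from
    frame t-1, and accept the best candidate above a threshold."""
    x, y, w, h = b_hat
    noise = np.random.randn(n_candidates, 4) * sigma * np.array([w, h, w, h])
    candidates = b_hat + noise                     # jitter position and size
    scores = [graph_similarity(prev_graph, build_graph(b)) for b in candidates]
    best = int(np.argmax(scores))
    return candidates[best] if scores[best] > threshold else None
```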
In the above scheme of the embodiment of the invention, a feature representation of a graph (i.e., the directed graph) is designed, which utilizes not only the features of individual targets but also the interrelationship between targets. This interrelationship is represented by the directed graph and is in fact the topology between the targets. A feature matching mode for graphs is also designed; a more robust similarity score can be obtained through a reasonable matching mode.
On the other hand, in order to explain the effect of the above scheme of the embodiment of the invention, the graph similarity model is applied to the existing multi-target tracking method based on data association, the similarity model therein is replaced, and the validity of the graph similarity model is verified through experiments.
Experiments were performed on MOTChallenge (https://motchallenge.net/) to analyze the merits and positive effects of the graph similarity model. The data sets used include MOT16 and MOT17, with the following evaluation indices:
1) Multi-Object Tracking Accuracy (MOTA): the higher, the better;
2) Multi-Object Tracking Precision (MOTP): the higher, the better;
3) the frequency with which the same target is assigned the same ID (IDF1): the higher, the better;
4) the number of mostly tracked targets (MT): the higher, the better;
5) the number of mostly lost targets (ML): the lower, the better;
6) the number of target ID switches (IDS): the lower, the better;
7) the number of track fragmentations (Frag): the lower, the better;
8) the number of missed targets (FN): the lower, the better;
9) the number of false alarms (FP): the lower, the better.
1. Details of the experiment.
In the experiments, all picture blocks were scaled to a size of 64 × 128. The convolutional neural network used to extract appearance features is implemented based on ResNet-34 (Deep residual learning for image recognition. CVPR 2016): the last FC layer is removed to obtain 2 × 4 × 256 features, the features are then flattened into a 2048-dimensional vector, and the vector is finally input into an FC layer to obtain a 256-dimensional appearance feature vector. The relative position feature output by the relative position encoder (RPE) is also 256-dimensional.
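A sketch of this appearance network $\phi_{ACNN}$ follows. torchvision's ResNet-34 trunk outputs 512 channels, so a 1×1 convolution reducing them to 256 is assumed here in order to reproduce the stated 2 × 4 × 256 (2048-d flattened) feature; the patent does not specify how its backbone arrives at 256 channels.

```python
import torch
import torch.nn as nn
import torchvision

class AppearanceCNN(nn.Module):
    """phi_ACNN sketch: ResNet-34 trunk without its classification head,
    flattened to a 2048-d vector and projected to a 256-d appearance
    feature. The 1x1 channel-reduction conv is an assumption."""
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet34(weights=None)
        self.trunk = nn.Sequential(*list(backbone.children())[:-2])
        self.reduce = nn.Conv2d(512, 256, kernel_size=1)
        self.fc = nn.Linear(2048, 256)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3, 128, 64) patches scaled to 64x128 (width x height)
        f = self.reduce(self.trunk(x))    # (N, 256, 4, 2)
        return self.fc(f.flatten(1))      # (N, 256)
```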
The appearance convolutional neural network, the relative position encoder and the binary classifier are trained end to end for 30 epochs on the training set using a binary cross-entropy loss function. During training, the input to the classifier, $[(v_{i_k}^{t-1} \ominus v_{j_{k'}}^{t})^{2}, (e_{i_k}^{t-1} \ominus e_{j_{k'}}^{t})^{2}]$, is divided into positive and negative samples: when the anchors $o_i^{t-1}$ and $o_j^t$ are the same target and the neighbors $o_{i_k}^{t-1}$ and $o_{j_{k'}}^t$ are also the same target, the input is a positive sample; all other inputs are negative samples. The learning rate was initialized to 0.002 and halved every 10 epochs. In addition, online hard example mining is used to alleviate the imbalance between positive and negative samples.
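A training-loop sketch under the stated settings (binary cross-entropy, learning rate 0.002 halved every 10 epochs, 30 epochs) is given below; the optimizer choice and batch format are assumptions, and online hard example mining is omitted for brevity.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=30):
    """End-to-end training sketch: BCE loss, lr 0.002 halved every 10
    epochs. Optimizer choice (SGD) and batch format are assumptions."""
    criterion = nn.BCELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                step_size=10, gamma=0.5)
    for _ in range(epochs):
        for pairs, labels in loader:  # labels: 1.0 = positive, 0.0 = negative
            optimizer.zero_grad()
            scores = model(pairs)     # classifier output assumed in [0, 1]
            loss = criterion(scores, labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
```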
2. Ablation experiment
In order to verify the effectiveness of the GSM (graph similarity model) of the invention, a basic model (denoted Naive) is also designed. Naive has no RPE (relative position encoder), and its classifier classifies only according to the appearance features of targets (determining whether two appearance features belong to the same target). Two trackers are further designed, whose only difference is the similarity model; both track by associating and matching targets in adjacent frames. The 7 videos in the MOT16 training set were further divided into a training subset and a verification subset. The verification subset includes MOT16-09 and MOT16-10; the remaining 5 videos constitute the training subset. The tracking results of the different models on the verification subset are shown in table 1.
Table 1: tracking performance on the verification subset (the table is reproduced as an image in the original publication).
In table 1, the superscript numbers of the different models represent the number of neighbors K used. The first two rows compare the performance of multi-target tracking without neighbors; the tracking performance of the two models is basically the same. This is understandable: from the calculation formula of $r_{i_k}^t$ above, the relative position of a target with respect to itself is an 8-dimensional all-zero vector, which means that GSM utilizes only appearance features when the number of neighbors K is 0.
The middle four rows compare the effect of hard matching (subscript h) and soft matching (subscript s) on tracking performance; five neighbors were used. For the Naive model, the constructed graph has only nodes and no edges. It can be seen that both hard matching and soft matching have negative effects on the Naive model, for the following reasons: (1) for two graphs with different anchors, if the same target appears among their neighbors, both matching modes increase the similarity of the two graphs; (2) for two graphs with different anchors, the negative effect of soft matching is not negligible, because only appearance features and no relative position features are used in this case. For the GSM model, when 5 neighbors are used, hard matching instead degrades tracking performance (compare GSM^5_h with GSM^0). The reason is that when the anchors of the two graphs are the same target but the neighbors are not aligned, the similarity obtained by hard matching will be low. Comparing GSM^5_s with GSM^0, it can be seen that the IDF1 of GSM^5_s is 5.9% higher and its IDS is lower, indicating that GSM^5_s more often assigns the same ID to the same target during tracking.
The last row shows the influence on tracking performance after lost targets are retrieved. When retrieving targets, 64 candidate targets are sampled for each lost target. It can be seen that the FN of this model is lower, indicating that some lost targets have been retrieved; however, its FP is also higher, indicating that some retrieval attempts failed.
In order to find a proper value of K, many experiments were carried out using soft matching, as shown in fig. 8. Overall, when K ≥ 5, MOTA is slightly improved and IDF1 remains basically unchanged. The reason is that, as K increases, the similarity score obtained when the anchors of the two graphs are the same target becomes higher (more reliable); conversely, when the anchors of the two graphs are different targets, the resulting similarity score also becomes higher (less reliable). These positive and negative effects cancel each other. To trade off tracking performance against time, K is set to 5. On the verification set, creating a directed graph and calculating the similarity score between two graphs take 0.15 ms and 0.03 ms, respectively. After the Naive model was replaced with the GSM model, the tracking speed dropped from 93.7 FPS to 61.5 FPS.
Some tracking results are visualized in fig. 9. The first row shows the tracking results of Naive^0; the second and third rows show the tracking results of the GSM-based models. In frame 27 there are two targets within the dashed box, denoted target 2 on the right and target 8 on the left. At frame 29, target 2 is partially occluded by target 8, so that target 2 is not detected; nevertheless, the position of target 2 is well estimated. At frame 49, target 2 is completely occluded by target 8; Naive^0 erroneously identifies target 8 as target 2, but the GSM-based models correctly identify the target (the rectangular box on the right in the middle).
3. Results on MOTChallenge
The GSM model of the invention is applied to Tracktor (Tracking without bells and whistles. ICCV 2019), the best-performing published tracker at the time, and the resulting tracker is denoted GSM-Tracktor. In addition, comparisons are made with the algorithms MTDF (Multi-level cooperative fusion of GM-PHD filters for online multiple human tracking. IEEE Transactions on Multimedia 2019), STAM (Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism. ICCV 2017), DMMOT (ECCV 2018), AMIR (Tracking the untrackable: Learning to track multiple cues with long-term dependencies. ICCV 2017), STRN (Spatial-temporal relation networks for multi-object tracking. ICCV 2019), DMAN (Online multi-object tracking with dual matching attention networks. ECCV 2018), HAM-SADF (Online multi-object tracking with historical appearance matching and scene adaptive detection filtering. AVSS 2018), MOTDT (Real-time multiple people tracking with deeply learned candidate selection and person re-identification. ICME 2018), and FAMNet (Joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. ICCV 2019). Testing was carried out on the test sets of MOT16 and MOT17; the test results are shown in table 2.
Table 2: tracking performance of different tracking algorithms on MOTChallenge (the table is reproduced as an image in the original publication).
On MOT16, GSM-Tracktor achieved the best tracking performance on all indices except MOTP, FP and IDS, and on IDS it ranked second (475), only slightly higher than the first place (473). Compared with Tracktor, GSM-Tracktor improves MOTA and IDF1 by 3.6% and 5.7%, respectively, and also reduces IDS by 30.4%. On MOT17, GSM-Tracktor also achieves the best overall performance. Compared with Tracktor, GSM-Tracktor obtains better results on almost all indices; in particular, the improvements on MOTA and IDF1 are 2.9% and 5.5%, respectively, and IDS is also reduced by more than 20%. That GSM-Tracktor obtains better performance on IDS and IDF1 shows that the GSM model has good feature representation capability and that the calculated similarity is more robust.
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A similarity calculation method for multi-target tracking, comprising:
for each target in each video frame, calculating the neighbor of the target, constructing a vertex set by using the appearance characteristics of the target and the neighbor of the target, and calculating a directed edge set by using the correlation between the targets, thereby constructing a directed graph.
And for adjacent video frames, performing matching calculation by using the directed graphs of all targets in the two video frames to obtain a similarity calculation result.
2. The similarity calculation method for multi-target tracking according to claim 1, wherein the target set in the t-th frame is expressed as $\mathcal{O}^t = \{o_i^t\}_{i=1}^{I_t}$, wherein the i-th target is expressed as $o_i^t = (b_i^t, p_i^t)$; $b_i^t$ represents the location box of the i-th target, whose four elements are the horizontal and vertical coordinates of its upper-left corner and the width and height of the box, respectively; $p_i^t$ represents the picture block cropped from the t-th frame according to the location box; $I_t$ denotes the number of targets in the t-th frame.
3. The similarity calculation method for multi-target tracking according to claim 2, wherein K neighbors of each target are obtained according to the distances between targets, K being the total number of neighbors; for the t-th frame, with the i-th target $o_i^t$ as the anchor and the target $o_i^t$ as its own 0-th neighbor, the target $o_i^t$ and its neighbors form the set $N_i^t = \{o_{i_k}^t\}_{k=0}^{K}$, where $o_{i_k}^t$ is the k-th neighbor of the target $o_i^t$.
4. The similarity calculation method for multi-target tracking according to claim 3, wherein for the i-th target $o_i^t$ in the t-th frame, the constructed directed graph is expressed as $G_i^t = (V_i^t, E_i^t)$, wherein the vertex set $V_i^t$ is defined as follows:

$$V_i^t = \{v_{i_k}^t \mid v_{i_k}^t = \phi_{ACNN}(p_{i_k}^t)\}_{k=0}^{K}$$

where $v_{i_k}^t$ denotes the appearance feature of $o_{i_k}^t$ and $\phi_{ACNN}(\cdot)$ denotes the forward function of the convolutional neural network used to extract appearance features;

for the directed edge set $E_i^t$, $r_{i_k}^t$ first denotes the relative position vector between the anchor $o_i^t$ and its k-th neighbor $o_{i_k}^t$:

$$r_{i_k}^t = \phi_{RP}(b_i^t, b_{i_k}^t)$$

where $w_t$ and $h_t$ are the width and height of the t-th frame, used to normalize the relative position, and $\phi_{RP}(\cdot,\cdot)$ is the function that calculates the relative position between targets based on the location boxes;

a relative position encoder transforms the relative position vector $r_{i_k}^t$ into $e_{i_k}^t$, thereby obtaining the directed edge set $E_i^t$:

$$E_i^t = \{e_{i_k}^t \mid e_{i_k}^t = \phi_{RPE}(r_{i_k}^t)\}_{k=0}^{K}$$

where $\phi_{RPE}(\cdot)$ is the relative position encoder.
5. The similarity calculation method for multi-target tracking according to claim 4, wherein performing hard matching using the directed graphs of the targets in the two video frames comprises:

for adjacent video frames, namely the (t-1)-th frame and the t-th frame, given the directed graphs $G_i^{t-1}$ and $G_j^t$ of two targets, first calculating a similarity matrix $S^{i,j} \in \mathbb{R}^{(K+1)\times(K+1)}$, the element in the k-th row and k'-th column of which is calculated as follows:

$$s_{k,k'}^{i,j} = \phi_{BC}\left(\left[\left(v_{i_k}^{t-1} \ominus v_{j_{k'}}^{t}\right)^{2},\ \left(e_{i_k}^{t-1} \ominus e_{j_{k'}}^{t}\right)^{2}\right]\right)$$

where $\ominus$ denotes element-wise subtraction between feature vectors, $(\cdot)^2$ squares each element of a vector, $[\cdot,\cdot]$ splices two vectors together, and $\phi_{BC}(\cdot)$ denotes the forward function of the binary classifier;

finally obtaining:

$$s_{i,j} = \frac{1}{K+1}\sum_{k=0}^{K} s_{k,k}^{i,j}$$
6. the similarity calculation method for multi-target tracking according to claim 5, wherein the soft matching is performed by using the directed graphs of the targets in the two video frames, and comprises the following steps:
on the basis of hard matching, firstly performing proximity alignment, and then calculating the similarity, which is expressed as:
Figure FDA0003128029720000025
in the formula (I), the compound is shown in the specification,
Figure FDA0003128029720000026
is a similarity matrix Si,jRemoving the first row and the first column to obtain a matrix; phi is aLA(. cndot.) is a linear distribution function for completing task distribution and returning the maximum total similarity sum according to the input similarity matrix.
7. A multi-target tracking method is characterized in that the method of any one of claims 1 to 6 is applied to an existing multi-target tracking method based on data association to replace a similarity model in the existing multi-target tracking method.
8. The multi-target tracking method according to claim 7, comprising: retrieving a target lost in the current frame using the information of the previous frame, as follows:

for the i-th target $o_i^{t-1}$ in the (t-1)-th frame, if it is lost in the t-th frame, using the relative position $r_{i_k}^{t-1}$ between the i-th target $o_i^{t-1}$ and its k-th neighbor $o_{i_k}^{t-1}$, together with $b_{i_k}^{t}$, to estimate the location box of the i-th target in the t-th frame:

$$\hat{b}_{i,k}^{t} = \phi_{RP}^{-1}\left(r_{i_k}^{t-1},\, b_{i_k}^{t}\right)$$

where $\phi_{RP}^{-1}(\cdot,\cdot)$ denotes the inverse function of $\phi_{RP}$, $\phi_{RP}(\cdot,\cdot)$ being the function that calculates the relative position between targets based on the location boxes; $b_{i_k}^{t}$ denotes the location box in the t-th frame of the target corresponding to $o_{i_k}^{t-1}$;

all K neighbors of the i-th target $o_i^{t-1}$ estimate the location box of the i-th target in the t-th frame in the above manner, and the final location box $\hat{b}_i^{t}$ of the i-th target in the t-th frame is calculated by averaging.
9. The multi-target tracking method according to claim 8, wherein after the final location box $\hat{b}_i^{t}$ of the i-th target in the t-th frame is obtained, several candidate boxes are sampled from a Gaussian distribution based on $\hat{b}_i^{t}$;

for any sampled candidate box $\tilde{b}$, $\tilde{o}_i^{t} = (\tilde{b}, \tilde{p})$ represents a candidate target of $o_i^{t-1}$ in the t-th frame, where $\tilde{p}$ denotes the picture block cropped from the t-th frame according to the location box $\tilde{b}$, and the directed graph $\tilde{G}_i^{t}$ is constructed; the similarity between the graphs $G_i^{t-1}$ and $\tilde{G}_i^{t}$ is then obtained;

among all candidate targets, the one with the highest similarity score is selected, and if this similarity score is greater than a set threshold, the highest-scoring candidate target is taken as the tracking result of $o_i^{t-1}$ in the t-th frame.
CN202110695292.5A 2021-06-23 2021-06-23 Similarity calculation method and multi-target tracking method Pending CN113409356A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110695292.5A CN113409356A (en) 2021-06-23 2021-06-23 Similarity calculation method and multi-target tracking method


Publications (1)

Publication Number Publication Date
CN113409356A 2021-09-17

Family

ID=77682492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110695292.5A Pending CN113409356A (en) 2021-06-23 2021-06-23 Similarity calculation method and multi-target tracking method

Country Status (1)

Country Link
CN (1) CN113409356A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3770854A1 (en) * 2018-09-14 2021-01-27 Tencent Technology (Shenzhen) Company Limited Target tracking method, apparatus, medium, and device
CN111882580A (en) * 2020-07-17 2020-11-03 元神科技(杭州)有限公司 Video multi-target tracking method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QIANKUN LIU ET AL.: "GSM: Graph Similarity Model for Multi-Object Tracking", Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20)

Similar Documents

Publication Publication Date Title
Chen et al. Real-time multiple people tracking with deeply learned candidate selection and person re-identification
CN108470354B (en) Video target tracking method and device and implementation device
Lynen et al. Placeless place-recognition
Lee et al. Place recognition using straight lines for vision-based SLAM
CN108960047B (en) Face duplication removing method in video monitoring based on depth secondary tree
CN106780557A (en) A kind of motion target tracking method based on optical flow method and crucial point feature
CN104200495A (en) Multi-target tracking method in video surveillance
Fazli et al. Particle filter based object tracking with sift and color feature
Li et al. Robust object tracking with discrete graph-based multiple experts
Li et al. Adaptive and compressive target tracking based on feature point matching
Poiesi et al. Tracking multiple high-density homogeneous targets
Wan et al. Tracking beyond detection: learning a global response map for end-to-end multi-object tracking
Xiang et al. Multitarget tracking using hough forest random field
Leyva et al. Video anomaly detection based on wake motion descriptors and perspective grids
CN111091583A (en) Long-term target tracking method
CN113409356A (en) Similarity calculation method and multi-target tracking method
Narayan et al. Learning deep features for online person tracking using non-overlapping cameras: A survey
Han et al. Multi-target tracking based on high-order appearance feature fusion
Liu et al. [Retracted] Mean Shift Fusion Color Histogram Algorithm for Nonrigid Complex Target Tracking in Sports Video
Taalimi et al. Robust multi-object tracking using confident detections and safe tracklets
Yang et al. Keyframe-based camera relocalization method using landmark and keypoint matching
Bai et al. Pedestrian Tracking and Trajectory Analysis for Security Monitoring
Lu et al. A robust tracking architecture using tracking failure detection in Siamese trackers
Tang et al. An online LC-KSVD based dictionary learning for multi-target tracking
Nithin et al. Multi-camera tracklet association and fusion using ensemble of visual and geometric cues

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210917