CN112001252A - Multi-target tracking method based on heteromorphic graph network - Google Patents


Info

Publication number
CN112001252A
CN112001252A (application CN202010712454.7A)
Authority
CN
China
Prior art keywords
target
frame
detection
detection frame
characteristic
Prior art date
Legal status
Granted
Application number
CN202010712454.7A
Other languages
Chinese (zh)
Other versions
CN112001252B (en)
Inventor
Zhang Baopeng (张宝鹏)
Li Rui (李芮)
Teng Zhu (滕竹)
Liu Wei (刘炜)
Li Yidong (李浥东)
Current Assignee
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN202010712454.7A priority Critical patent/CN112001252B/en
Publication of CN112001252A publication Critical patent/CN112001252A/en
Application granted granted Critical
Publication of CN112001252B publication Critical patent/CN112001252B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V 20/48 (Physics; Computing; Image or video recognition or understanding; Scenes; scene-specific elements in video content): Matching video sequences
    • G06F 18/22 (Physics; Computing; Electric digital data processing; Pattern recognition; Analysing): Matching criteria, e.g. proximity measures
    • G06F 18/253 (Pattern recognition; Analysing; Fusion techniques): Fusion techniques of extracted features
    • G06N 3/045 (Computing arrangements based on specific computational models; Neural networks; Architecture): Combinations of networks
    • G06N 3/08 (Neural networks): Learning methods
    • G06V 2201/07 (Indexing scheme relating to image or video recognition or understanding): Target detection

Abstract

The invention provides a multi-target tracking method based on a heterogeneous graph network, applied to multi-target tracking. First, target detection frames are obtained with a target detection algorithm, and data association between video frames is then performed using optical flow computation and a linear regression operation. To address target occlusion, after data association the model uses a heterogeneous graph network to extract features of detection frames and tracked targets for similarity measurement, judging whether a newly appearing detection frame belongs to an existing target. The heterogeneous graph network comprises three parts, appearance feature extraction, spatial relationship extraction and temporal relationship extraction, and learns discriminative features that encode a target's appearance, spatial position and temporal relationships, improving the representation and discrimination capability of the features and thereby the performance of multi-target tracking.

Description

Multi-target tracking method based on a heterogeneous graph network
Technical Field
The invention relates to the technical field of computer vision, and in particular to a multi-target tracking method based on a heterogeneous graph network.
Background
With the development of deep learning, convolutional neural networks are applied in more and more scenarios, and multi-target tracking has received increasing attention in computer vision owing to its wide application in video surveillance, human-computer interaction and virtual reality. Multi-target tracking aims to locate multiple target objects in a given video sequence, assign a distinct identity (ID) to each object, and record the trajectory of each ID through the video. With the continuing development of target detection technology based on convolutional neural networks, tracking-by-detection has become the mainstream direction of multi-target tracking. A detection-based tracking algorithm first performs target detection on each video frame to obtain per-frame detection results, then performs data association on those results to build the trajectory of each object in the video.
In detection-based tracking algorithms, learning discriminative feature representations of targets is crucial: it determines whether the tracker can correctly separate the trajectories of different objects. However, because targets in camera footage often appear blurred, most existing methods, which consider only the appearance features of targets, cannot accurately identify and distinguish different targets. These methods also concentrate on the data association problem and cannot handle the frequent occlusions present in video, which directly limits algorithm performance.
In detection-based multi-target tracking, most methods mainly study the data association problem, i.e. designing a robust model to associate the same target across adjacent video frames and obtain the trajectories of all targets in a video sequence. However, such methods neglect the influence of occlusion on target trajectories and treat an occluded target as one whose trajectory has terminated, which directly affects multi-target tracking performance. Recently, a multi-target tracking method that uses the regressor of a target detection algorithm for data association has achieved good tracking results and further handles target occlusion. As shown in fig. 5, it takes as input a video sequence and the positions of all target detection frames obtained with a common target detection algorithm, uses the regressor of the Faster R-CNN target detection algorithm for data association between video frames, extracts appearance features of targets and detection frames with a ResNet-50 convolutional neural network, performs target re-identification to judge whether a detection frame newly appearing in a video frame belongs to a terminated target trajectory, and finally outputs the resulting target trajectories. This method outperforms most existing multi-target tracking methods, but it considers only the appearance features of targets and ignores information such as the spatial topology and temporal relationships of the multi-target tracking scene.
Disclosure of Invention
The embodiments of the invention provide a multi-target tracking method based on a heterogeneous graph network, to solve the prior-art problem of finding a feature representation that enhances target discriminability while fully exploiting multi-target tracking video data. Under low resolution, blurred targets, varying illumination, varying viewpoints and target occlusion, the method improves the discriminative power of target feature representations beyond appearance features alone and solves the technical problem of occlusion.
In order to achieve the purpose, the invention adopts the following technical scheme.
A multi-target tracking method based on a heterogeneous graph network, characterized by comprising the following steps:
s1, extracting a detection frame of each frame through a common target detection algorithm based on the original video sequence;
s2, obtaining the position of the target in each frame of image through data association processing based on the original video sequence and the detection frame;
s3, based on the position and the detection frame of the target, obtaining target characteristics and detection characteristics through the network processing of the heteromorphic graph;
s4, similarity measurement processing is carried out on the target characteristic and the detection characteristic, whether the detection frame belongs to a certain termination target or not is judged, if yes, the detection frame is added into the termination target, and the termination target is set to be in an active state; otherwise, initializing a new target for the detection frame; judging whether the current frame is the last frame of the video, if so, ending the execution of the method; otherwise, the step S2 is executed.
Preferably, obtaining the position of the target in each frame image through data association processing, based on the original video sequence and the detection frames, comprises:
letting t be the frame index of the original video sequence;
when t = 1, initializing targets with all detection frames of the 1st frame to obtain the targets' initial positions;
when t > 1, performing data association according to the target positions in frame t-1 to obtain the target positions in frame t;
and performing position discrimination on the target positions of frame t with a binary classifier.
Preferably, when t > 1, performing data association according to the target positions in frame t-1 comprises: adjusting the positions from frame t-1 using the optical flow map between the adjacent video frames, then applying a linear regressor to the adjusted positions to obtain the target positions in frame t;
performing position discrimination on the target positions of frame t with the classifier comprises the following steps:
scoring the target positions of frame t with a binary classifier; if a target's position score is below a preset threshold, the target is judged as terminated, and otherwise as active;
and computing the intersection-over-union (IoU) between the detection frames of frame t and the active target positions to find the detection frames that cannot be matched to any target position.
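The score thresholding and IoU matching above can be read concretely as the sketch below. The box convention `(x1, y1, x2, y2)` and the threshold values are illustrative assumptions; the patent does not fix them at this point.

```python
# Sketch of the position scoring and IoU matching steps described above.
# Boxes are (x1, y1, x2, y2); threshold values are illustrative assumptions.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def split_targets(position_scores, score_threshold=0.5):
    """Binary-classifier position scores -> (active ids, terminated ids)."""
    active = {tid for tid, s in position_scores.items() if s >= score_threshold}
    return active, set(position_scores) - active

def unmatched_detections(detections, active_boxes, iou_threshold=0.5):
    """Detection frames whose IoU with every active target stays below threshold."""
    return [d for d in detections
            if all(iou(d, b) < iou_threshold for b in active_boxes)]
```

The boxes returned by `unmatched_detections` are the candidates that proceed to heterogeneous feature extraction in the next step.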
Preferably, obtaining the target features and detection features through heterogeneous graph network processing, based on the target positions and the detection frames, comprises:
extracting, through the heterogeneous graph network, the appearance, spatial relationship and temporal relationship features of the targets judged in frame t as track-terminated;
fusing the appearance, spatial relationship and temporal relationship features of those targets to obtain the target features;
extracting, through the heterogeneous graph network, the appearance and spatial relationship features of the detection frames that cannot be matched to any active target;
and fusing the appearance and spatial relationship features of those detection frames to obtain the detection features.
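The fusion step can be sketched as below. The patent does not specify the fusion operator or embedding sizes here, so concatenation and the dimensions used are illustrative assumptions.

```python
import numpy as np

# Sketch of the feature fusion step. Concatenation and the embedding sizes
# below are illustrative assumptions, not specified by the patent.

def fuse(*branch_features):
    """Fuse per-branch features by flattening and concatenating them."""
    return np.concatenate(
        [np.asarray(f, dtype=np.float32).ravel() for f in branch_features])

appearance = np.zeros(128)   # hypothetical appearance embedding
spatial = np.zeros(32)       # hypothetical spatial-relation embedding
temporal = np.zeros(32)      # hypothetical temporal-relation embedding

# Track-terminated targets fuse all three branches; an unmatched detection
# frame has no track history, so only appearance and spatial are fused.
target_feature = fuse(appearance, spatial, temporal)
detection_feature = fuse(appearance, spatial)
```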
Preferably, performing similarity measurement on the target features and the detection features and judging whether a detection frame belongs to a terminated target specifically comprises:
computing the Euclidean distance between a detection feature and a target feature; if the distance is below a preset threshold, adding the unmatched detection frame to that terminated target's trajectory, setting the target as active, and performing the data association process on it in frame t+1; otherwise, initializing the unmatched detection frame as a new target with a new ID;
and judging whether the original video sequence has ended; if so, outputting the target trajectories; otherwise continuing with step S2 for frame t+1.
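The distance test can be sketched as follows. The patent only states the below-threshold test; picking the closest terminated target when several fall under the threshold is an assumed tie-break, and the threshold value is illustrative.

```python
import numpy as np

def reidentify(detection_feature, terminated_features, threshold=1.0):
    """Return the id of the closest terminated target whose Euclidean distance
    to the detection feature is below threshold, or None if there is none.
    (Choosing the closest among several candidates is an assumed tie-break;
    the patent only states the below-threshold test.)"""
    best_id, best_dist = None, threshold
    for tid, feat in terminated_features.items():
        dist = float(np.linalg.norm(np.asarray(detection_feature) - np.asarray(feat)))
        if dist < best_dist:
            best_id, best_dist = tid, dist
    return best_id
```

A returned id means the detection frame is appended to that target's trajectory and the target is reactivated; `None` means a new target ID is initialized.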
According to the technical solution provided by the embodiments of the invention, the multi-target tracking method based on the heterogeneous graph network is applied to multi-target tracking. First, target detection frames are obtained with a target detection algorithm, and data association between video frames is then performed using optical flow computation and a linear regression operation. To address target occlusion, after data association the model uses a heterogeneous graph network to extract features of detection frames and tracked targets for similarity measurement, judging whether a newly appearing detection frame belongs to an existing target. The heterogeneous graph network comprises three parts, appearance feature extraction, spatial relationship extraction and temporal relationship extraction, and learns discriminative features that encode a target's appearance, spatial position and temporal relationships, improving the representation and discrimination capability of the features and thereby the performance of multi-target tracking.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a processing flow chart of the multi-target tracking method based on a heterogeneous graph network provided by the present invention;
FIG. 2 is an overall framework diagram of the multi-target tracking model based on a heterogeneous graph network in the method provided by the present invention;
FIG. 3 is a flowchart of a specific implementation of the multi-target tracking method based on a heterogeneous graph network provided by the present invention;
FIG. 4 is a logic diagram of the heterogeneous graph network processing in the method provided by the present invention;
FIG. 5 is an overall framework diagram of a related method in the prior art.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
Referring to fig. 1, the multi-target tracking method based on the heterogeneous graph network provided by the invention comprises the following steps:
s1, extracting a detection frame of each frame through a common target detection algorithm based on the original video sequence;
s2, obtaining the position of the target in each frame of image through data association processing based on the original video sequence and the detection frame;
s3, based on the position and the detection frame of the target, obtaining target characteristics and detection characteristics through the network processing of the heteromorphic graph;
s4, similarity measurement processing is carried out on the target characteristic and the detection characteristic, whether the detection frame belongs to a certain termination target or not is judged, if yes, the detection frame is added into the termination target, and the termination target is set to be in an active state; otherwise, initializing a new target for the detection frame; judging whether the current frame is the last frame of the video, if so, ending the execution of the method; otherwise, the above step S2 is executed.
In the embodiments provided by the present invention, a detection frame (also called a detection response, "detection hypothesis" or "detection observation") is the output of the detection process. Targets are closed regions in an image that are clearly distinguished from their surroundings, often called objects; they generally have physical significance, such as pedestrians or cars. Trajectories are the output of multi-target tracking; one trajectory corresponds to the position sequence of one target over a time period. Multi-target tracking aims to locate multiple targets of interest simultaneously in a given video, maintain their IDs, and record their trajectories.
In the embodiment of the invention, a multi-target tracking model based on a heterogeneous graph network is provided; the model consists of three modules: data association, heterogeneous feature extraction and similarity measurement. As shown in fig. 2, the input comprises two parts, the original video sequence and the detection frame positions obtained by a common target detection algorithm; the input data sequentially undergoes data association, extraction of heterogeneous features (appearance, spatial and temporal relationships) and similarity measurement to obtain the final target trajectories. Each module works as follows:
data association module
The original video sequence and the detection frame positions obtained by a common target detection algorithm are input into the model and first pass through the data association module. This module fine-tunes each target's previous-frame position using the optical flow map, then regresses the fine-tuned position with a linear regressor to obtain the target's position in the current frame image; whether a trajectory has terminated is judged from the score produced by a classifier; the intersection-over-union between the current frame's detection frame positions and the target positions is then computed to find the detection frames that cannot be matched to a target. The data association module is connected to the heterogeneous feature extraction module, and the targets obtained by data association are input to it for heterogeneous feature extraction;
heterogeneous feature extraction module
The input of the heterogeneous feature extraction module is the detection frames that cannot be matched to a target after data association, together with the targets whose trajectories have terminated as of the current frame. The module comprises an appearance feature extraction sub-network, a spatial relationship extraction sub-network and a temporal relationship extraction sub-network; it extracts the appearance, spatial and temporal relationships of the targets and fuses them to obtain the target features, and extracts the appearance and spatial relationships of the detection frames and fuses them to obtain the detection features. The fused target features and detection features are passed to the similarity measurement module;
similarity measurement module
This module processes the target features and detection features produced by the heterogeneous feature extraction module: it computes the Euclidean distance between a target feature and a detection feature and compares it with a given threshold to judge whether an unmatched detection frame from the data association module belongs to a terminated trajectory. If the distance is below the threshold, the detection frame is added to that target's terminated trajectory; otherwise it is initialized as a new target.
In the embodiment provided by the present invention, as shown in fig. 3, the overall processing flow is as follows.
In step S1 above, the specific process is: given a video sequence requiring multi-target tracking, extract the detection frames present in each frame with a common target detection algorithm, and input the video sequence and the obtained detection frames into the model.
In the step S2, the specific process includes:
letting t be the frame index of the original video sequence, starting from t = 1 until all video frames of the sequence have been processed;
when t is equal to 1, initializing by using all detection frames in the 1 st frame, and acquiring the initial position of the target;
when t > 1, performing data association according to the target positions in frame t-1: first fine-tuning the target positions of frame t-1 using the optical flow map, then regressing the fine-tuned positions with a linear regressor to obtain the target positions in frame t;
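The optical-flow fine-tuning step can be sketched as below. Averaging the flow over the box region is one simple choice of adjustment; the patent only states that the previous-frame position is adjusted via the flow map, so this is an illustrative sketch, not the patent's implementation.

```python
import numpy as np

def shift_box_by_flow(box, flow):
    """Fine-tune a frame t-1 box (x1, y1, x2, y2) by the mean optical flow
    inside it. Averaging over the box region is an assumed, simple choice;
    the patent only says the position is adjusted using the flow map."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    patch = flow[y1:y2, x1:x2]            # flow has shape (H, W, 2): per-pixel (dx, dy)
    dx = float(patch[..., 0].mean())
    dy = float(patch[..., 1].mean())
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)

# A constant flow of (2, 3) pixels moves the whole box by (2, 3).
flow = np.zeros((20, 20, 2))
flow[..., 0] = 2.0
flow[..., 1] = 3.0
shifted = shift_box_by_flow((2, 2, 10, 10), flow)
```

In the pipeline described here, the shifted box would then be refined by the linear regressor before classifier scoring.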
and carrying out position discrimination processing on the position of the target of the t-th frame by using a binary classifier.
Further, the discrimination process specifically comprises:
scoring the target positions of frame t with a binary classifier; if a target's position score is below a preset judgment threshold, the target is judged as terminated and added to the set of track-terminated targets; otherwise it is judged as active, added to the active target set, and its data association continues in the next frame;
and computing the intersection-over-union between the detection frames of frame t and the active target positions to find the detection frames that cannot be matched to any active target.
In experiments, the applicant found that multi-target tracking videos have complex environments and many targets, with frequent occlusion between targets or by surrounding buildings. Over a full video sequence, an occluded target's trajectory is therefore terminated and disappears, yet the target may reappear after a period of time; if it is then given a new ID, the overall tracking performance suffers. To solve this, the invention adds the heterogeneous feature extraction and similarity measurement modules: after data association it checks whether new detections appear in a video frame, extracts heterogeneous features of the new detections and of the track-terminated targets for similarity measurement, and judges whether a new detection belongs to a target whose trajectory was terminated by earlier occlusion.
As shown in fig. 4, the heterogeneous graph network consists of a convolutional neural network that extracts the appearance features of targets and detections, a spatial relation graph network and a temporal relation graph network; the spatial and temporal relation graphs encode the spatial and temporal relationships, respectively.
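A relation graph network of the kind shown in fig. 4 propagates information between node features over graph edges. The minimal sketch below uses one round of mean-neighbor aggregation as a stand-in; the actual learned update rule of the spatial and temporal relation graphs is not specified in this part of the text, so this is an assumption for illustration only.

```python
import numpy as np

def message_pass(node_features, adjacency):
    """One round of mean-neighbor aggregation over a relation graph.
    A minimal stand-in for the spatial/temporal relation graph networks;
    the real learned update rule is not specified here."""
    feats = np.asarray(node_features, dtype=np.float64)
    out = feats.copy()
    for i in range(len(feats)):
        neighbors = [feats[j] for j in range(len(feats)) if adjacency[i][j]]
        if neighbors:
            # blend each node's own feature with the mean of its neighbors
            out[i] = 0.5 * feats[i] + 0.5 * np.mean(neighbors, axis=0)
    return out

# Two mutually connected nodes move toward each other after one round.
updated = message_pass([[0.0, 0.0], [2.0, 2.0]], [[0, 1], [1, 0]])
```

In this sketch the nodes would be targets and detection frames, with edges built from spatial proximity (spatial graph) or track history (temporal graph).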
In the heterogeneous feature extraction step, the specific process is as follows:
extracting, through the heterogeneous graph network, the appearance, spatial relationship and temporal relationship features of the targets judged in frame t as track-terminated;
fusing the appearance, spatial relationship and temporal relationship features of those targets to obtain the target features;
after the IoU computation, if every detection frame is matched one-to-one with a target position, the processing of the current frame is complete; otherwise, extracting through the heterogeneous graph network the appearance and spatial relationship features of the detection frames that cannot be matched to any active target;
and fusing the appearance and spatial relationship features of those detection frames to obtain the detection features.
Compared with the prior art, the invention adds a heterogeneous feature extraction module and a similarity measurement module: the proposed heterogeneous graph network extracts heterogeneous features of targets and detections, similarity measurement is then performed on these features, and it is judged whether a detection belongs to a target whose trajectory has already terminated. Multi-target tracking videos have complex environments and many targets, with frequent occlusion between targets or by surrounding buildings, so target trajectories may disappear for a period and then reappear within a video sequence. The two modules are added after data association mainly to handle this occlusion problem: the data association provided by the invention achieves good tracking when targets are unoccluded, but once occlusion occurs the occluded trajectory is terminated and the target receives a new ID when it reappears, which degrades the overall multi-target tracking performance.
Further, the similarity measurement process specifically comprises:
computing the Euclidean distance between a detection feature and a target feature; if the distance is below a preset threshold, the unmatched detection frame is judged to belong to a track-terminated target; it is added to that target's trajectory, the target is set as active, and the data association process is performed on it in frame t+1; otherwise, the unmatched detection frame is judged not to belong to any terminated target and is initialized with a new target ID;
and finally, judging whether the original video sequence has ended; if so, outputting the trajectories of all targets; otherwise returning to step S2 for frame t+1.
In summary, the multi-target tracking method based on the heterogeneous graph network provided by the invention uses discriminative feature learning. First, a common target detection algorithm, Faster R-CNN, extracts the possible target positions (detection frames) in all frames of the video sequence. Data association is then performed with the optical flow map and the linear regressor; this association scheme is simple in structure, convenient to apply, and accurate for associating targets across adjacent frames. Because occlusion occurs frequently in multi-target tracking, after data association the invention checks whether new detection frames appear in a video frame, extracts and fuses appearance, spatial and temporal relationship features of the new detection frames and of the track-terminated targets with the heterogeneous graph network, and finally measures the similarity between detection features and target features to judge whether a newly appearing detection belongs to a track-terminated target. In the invention, data association and heterogeneous feature extraction complement each other, jointly improving the representation and discrimination capability of the model and thus the performance of multi-target tracking.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The embodiments in this specification are described in a progressive manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, since the apparatus and system embodiments are substantially similar to the method embodiments, they are described relatively simply; for relevant details, refer to the description of the method embodiments. The described apparatus and system embodiments are merely illustrative: units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement them without inventive effort.
The above description covers only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any changes or substitutions that can be readily conceived by those skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. A multi-target tracking method based on a heterogeneous graph network, characterized by comprising the following steps:
S1, extracting the detection frames of each frame through a general object detection algorithm, based on the original video sequence;
S2, obtaining the position of each target in each frame of the image through data association processing, based on the original video sequence and the detection frames;
S3, obtaining target features and detection features through heterogeneous graph network processing, based on the positions of the targets and the detection frames;
S4, performing similarity measurement processing on the target features and the detection features to judge whether a detection frame belongs to a certain terminated target; if so, adding the detection frame to the terminated target and setting the terminated target to an active state; otherwise, initializing a new target for the detection frame; judging whether the current frame is the last frame of the video; if so, ending execution of the method; otherwise, returning to step S2.
2. The method of claim 1, wherein obtaining the position of each target in each frame of the image through data association processing, based on the original video sequence and the detection frames, comprises:
letting t be the frame index of the original video sequence;
when t = 1, initializing with all detection frames in the 1st frame to obtain the initial positions of the targets;
when t > 1, performing data association according to the positions of the detection frames in the (t-1)-th frame to obtain the positions of the targets in the t-th frame;
and performing position discrimination processing on the positions of the targets in the t-th frame using a binary classifier.
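As an illustration of the association step, a detection frame from the previous frame can be shifted by the mean optical flow inside it before a regressor refines its position. This is a minimal sketch under stated assumptions: it takes a dense flow field of shape H x W x 2 (per-pixel dx, dy), and it omits the linear regressor refinement entirely.

```python
import numpy as np

def shift_box_by_flow(box, flow):
    """Shift an (x1, y1, x2, y2) box by the mean optical flow inside it.

    flow: dense flow field of shape (H, W, 2) holding per-pixel (dx, dy).
    Illustrative only; the patent additionally refines the shifted box
    with a linear regressor.
    """
    x1, y1, x2, y2 = [int(v) for v in box]
    region = flow[y1:y2, x1:x2]                 # flow vectors inside the box
    dx, dy = region.reshape(-1, 2).mean(axis=0) # average displacement
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)
```

For example, a uniform flow of (1, 1) everywhere moves the box one pixel right and one pixel down.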
3. The method according to claim 2, wherein, when t > 1, performing data association according to the positions of the detection frames in the (t-1)-th frame comprises: adjusting the positions of the detection frames of the (t-1)-th frame through the optical flow map of the adjacent video frames, and then performing a regression operation on the adjusted detection frames with a linear regressor to obtain the positions of the targets in the t-th frame;
and wherein performing position discrimination processing on the positions of the targets in the t-th frame with the binary classifier comprises:
scoring the position of each target in the t-th frame with the binary classifier; if the position score of a target is smaller than a preset threshold, judging the target as a terminated target, and otherwise judging it as an active target;
and calculating the intersection-over-union between the detection frames of the t-th frame and the positions of the active targets to obtain the detection frames that cannot be matched with any target position.
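The intersection-over-union used for matching detection frames against active target positions is a standard computation; a self-contained version for boxes given as (x1, y1, x2, y2) corner coordinates might look like this:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)   # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Detection frames whose IoU with every active target falls below a chosen threshold are the ones passed on to the heterogeneous graph network as potentially re-appearing targets.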
4. The method of claim 3, wherein obtaining the target features and the detection features through heterogeneous graph network processing, based on the positions of the targets and the detection frames, comprises:
extracting, through the heterogeneous graph network, the appearance features, spatial relation features and temporal relation features of the targets of the t-th frame judged as trajectory-terminated;
fusing the appearance features, spatial relation features and temporal relation features of the targets of the t-th frame judged as trajectory-terminated to obtain the target features;
extracting, through the heterogeneous graph network, the appearance features and spatial relation features of the detection frames that cannot be matched with the active targets;
and fusing the appearance features and spatial relation features of the detection frames that cannot be matched with the active targets to obtain the detection features.
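A minimal stand-in for the fusion step is plain concatenation of whichever relation features are available. The patent performs this fusion inside the heterogeneous graph network, so the function below is only illustrative: terminated targets contribute all three features, while unmatched detection frames have no temporal relation feature.

```python
import numpy as np

def fuse_features(appearance, spatial, temporal=None):
    """Concatenate the available relation features into one vector.

    Illustrative substitute for the graph-network fusion: targets pass all
    three feature vectors; detections pass only appearance and spatial ones.
    """
    parts = [appearance, spatial]
    if temporal is not None:
        parts.append(temporal)
    return np.concatenate(parts)
```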
5. The method according to claim 4, wherein performing the similarity measurement processing on the target features and the detection features to judge whether a detection frame belongs to a certain terminated target specifically comprises:
calculating the Euclidean distance between a detection feature and a target feature; if the Euclidean distance is smaller than a preset threshold, adding the detection frame that cannot be matched with the active targets into the trajectory of the terminated target whose distance is below the threshold, setting that target as an active target, and performing the data association processing on the target in the (t+1)-th frame; otherwise, initializing the detection frame that cannot be matched with the active targets to obtain a new target ID;
and judging whether the original video sequence has ended; if so, outputting the tracking trajectories of the targets, and otherwise continuing to execute step S2 for the (t+1)-th frame.
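The Euclidean-distance matching against terminated targets can be sketched as a nearest-neighbor search under a threshold; this hypothetical helper returns the index of the best match, or None when the detection should start a new target ID.

```python
import numpy as np

def match_terminated(det_feature, terminated_features, threshold):
    """Return the index of the closest terminated target whose Euclidean
    distance to the detection feature is below the threshold, else None."""
    best_idx, best_dist = None, threshold
    for i, target_feature in enumerate(terminated_features):
        dist = float(np.linalg.norm(det_feature - target_feature))
        if dist < best_dist:                  # strictly closer than current best
            best_idx, best_dist = i, dist
    return best_idx
```

Initializing `best_dist` to the threshold makes the loop accept only matches below the threshold and, among those, keep the closest one.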
CN202010712454.7A 2020-07-22 2020-07-22 Multi-target tracking method based on heterogeneous graph network Active CN112001252B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010712454.7A CN112001252B (en) 2020-07-22 2020-07-22 Multi-target tracking method based on heterogeneous graph network


Publications (2)

Publication Number Publication Date
CN112001252A true CN112001252A (en) 2020-11-27
CN112001252B CN112001252B (en) 2024-04-12

Family

ID=73468031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010712454.7A Active CN112001252B (en) Multi-target tracking method based on heterogeneous graph network

Country Status (1)

Country Link
CN (1) CN112001252B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113744316A (en) * 2021-09-08 2021-12-03 电子科技大学 Multi-target tracking method based on deep neural network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160343146A1 (en) * 2015-05-22 2016-11-24 International Business Machines Corporation Real-time object analysis with occlusion handling
CN107330920A (en) * 2017-06-28 2017-11-07 华中科技大学 A kind of monitor video multi-target tracking method based on deep learning
CN109360226A (en) * 2018-10-17 2019-02-19 武汉大学 A kind of multi-object tracking method based on time series multiple features fusion
CN109800689A (en) * 2019-01-04 2019-05-24 西南交通大学 A kind of method for tracking target based on space-time characteristic fusion study
CN109993770A (en) * 2019-04-09 2019-07-09 西南交通大学 A kind of method for tracking target of adaptive space-time study and state recognition
CN111161311A (en) * 2019-12-09 2020-05-15 中车工业研究院有限公司 Visual multi-target tracking method and device based on deep learning


Also Published As

Publication number Publication date
CN112001252B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
Fradi et al. Crowd behavior analysis using local mid-level visual descriptors
Liu et al. Context-aware three-dimensional mean-shift with occlusion handling for robust object tracking in RGB-D videos
WO2021017291A1 (en) Darkflow-deepsort-based multi-target tracking detection method, device, and storage medium
Huang et al. Robust object tracking by hierarchical association of detection responses
Wang et al. Tracklet association by online target-specific metric learning and coherent dynamics estimation
KR20190023389A (en) Multi-Class Multi-Object Tracking Method using Changing Point Detection
Denman et al. Multi-spectral fusion for surveillance systems
Kim et al. Online tracker optimization for multi-pedestrian tracking using a moving vehicle camera
Al-Shakarji et al. Robust multi-object tracking with semantic color correlation
Iraei et al. Object tracking with occlusion handling using mean shift, Kalman filter and edge histogram
Yang et al. 3D multiview basketball players detection and localization based on probabilistic occupancy
CN111192297A (en) Multi-camera target association tracking method based on metric learning
Liu et al. Accelerating vanishing point-based line sampling scheme for real-time people localization
CN114220061A (en) Multi-target tracking method based on deep learning
Li et al. Smot: Single-shot multi object tracking
Atghaei et al. Abnormal event detection in urban surveillance videos using GAN and transfer learning
Liu et al. Semantic superpixel based vehicle tracking
CN112001252B (en) Multi-target tracking method based on heterogeneous graph network
Liu et al. Multi-view vehicle detection and tracking in crossroads
CN115188081B (en) Complex scene-oriented detection and tracking integrated method
Roopchand et al. Bat detection and tracking toward batsman stroke recognition
Avgerinakis et al. Moving camera human activity localization and recognition with motionplanes and multiple homographies
Luo et al. Crowd counting for static images: a survey of methodology
Truong et al. Single object tracking using particle filter framework and saliency-based weighted color histogram
Han et al. Multi-target tracking based on high-order appearance feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant