CN114596340A - Multi-target tracking method and system for monitoring video - Google Patents
- Publication number
- CN114596340A (application CN202210220010.0A)
- Authority
- CN
- China
- Prior art keywords
- target
- tracking
- monitoring video
- tracked
- matching
- Prior art date
- Legal status
- Pending
Classifications
- G06T 7/277: Image analysis; analysis of motion involving stochastic approaches, e.g. using Kalman filters
- G06N 3/045: Neural networks; architecture; combinations of networks
- G06N 3/08: Neural networks; learning methods
- G06T 2207/10016: Image acquisition modality; video; image sequence
- G06T 2207/20081: Special algorithmic details; training; learning
- G06T 2207/20084: Special algorithmic details; artificial neural networks [ANN]
- G06T 2207/30232: Subject of image; surveillance
Abstract
The invention provides a multi-target tracking method and system for surveillance video, relating to the technical fields of computer vision and civil aviation traffic engineering. The method acquires a real-time surveillance video to be tracked; performs framing processing on the video to obtain a video frame sequence to be tracked; inputs the video frame sequence into a multi-target recognition model to obtain a target information group sequence; and determines, from the target information group sequence, the tracking tracks of multiple targets in the video using a Kalman filtering algorithm and a multi-target tracking model. Because the multi-target recognition model and the multi-target tracking model are obtained by improving and training the YOLOv4 neural network and the DeepSORT neural network, the tracking tracks of multiple targets in the real-time surveillance video to be tracked can be determined automatically, improving the intelligence level of video monitoring.
Description
Technical Field
The invention relates to the technical fields of computer vision and civil aviation traffic engineering, and in particular to a multi-target tracking method and system for surveillance video.
Background
In recent years, with the rapid development of the civil aviation industry, airport areas have grown ever larger, the traffic conditions on airport surfaces such as runways, taxiways and aprons have become increasingly complex, and the probability of surface collisions between aircraft has increased. Large airports such as Beijing, Shanghai and Xi'an operate multiple runways, and the airport surface is often congested. In addition, sight lines from the terminal building are blocked, so monitoring blind spots exist on aprons and some taxiways, creating hidden safety hazards for surface traffic control. Surface surveillance that helps controllers effectively grasp the traffic situation of the airport is therefore very necessary.
Because current automatic monitoring systems offer only simple functions, manual visual observation remains the primary monitoring means at large domestic airports. At present, most high-traffic international airports (Hangzhou Xiaoshan International Airport, Chongqing Jiangbei International Airport, Shenzhen Bao'an International Airport and the like) adopt semi-manual, semi-automatic monitoring systems. As airport passenger flow keeps growing, especially during flight rush hours, the safety monitoring work of the apron becomes harder to supervise, which places higher demands on the working capacity of apron monitoring personnel. With many vehicles and personnel on the surface, tight operating schedules and a relatively harsh environment, apron security staffing is generally insufficient, and traditional manual visual monitoring has a safety bottleneck: human factors easily lead to airport incidents. Conventional manual visual observation therefore struggles to meet the monitoring requirements of the airport surface, and the intelligence level of surface video monitoring systems needs to be improved.
Disclosure of Invention
The invention aims to provide a multi-target tracking method and a multi-target tracking system for a surveillance video, which can track a plurality of targets in the surveillance video and improve the intelligent level of video surveillance.
In order to achieve the purpose, the invention provides the following scheme:
a multi-target tracking method for surveillance videos comprises the following steps:
acquiring a real-time monitoring video to be tracked;
performing framing processing on a real-time monitoring video to be tracked to obtain a video frame sequence to be tracked;
inputting the video frame sequence to be tracked into a multi-target recognition model to obtain a target information group sequence; any target information group in the target information group sequence comprises the coordinates and the types of all targets in the same video frame to be tracked; the multi-target recognition model is obtained by training a YOLOv4-TS neural network by using a historical monitoring video; the YOLOv4-TS neural network is obtained by adding a spatial pyramid pooling module in a YOLOv4 neural network;
determining the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked by utilizing a Kalman filtering algorithm and a multi-target tracking model according to the target information group sequence; the multi-target tracking model is obtained by training a DeepSORT neural network by using historical monitoring videos.
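The four steps above can be sketched as a simple data-flow skeleton. The helper names below (`split_into_frames`, `detect_targets`, `track`) are illustrative stand-ins only, not the patent's trained YOLOv4-TS recognition model or improved DeepSORT tracker; the sketch shows only how a video becomes a frame sequence, then a target information group sequence, then tracks.

```python
# Minimal sketch of the claimed four-step pipeline. The detector and the
# association step are hypothetical stand-ins; only the data flow from
# video -> frames -> target information groups -> tracks is illustrated.

def split_into_frames(video):
    """Framing step: a video is treated as an iterable of frames."""
    return list(video)

def detect_targets(frame):
    """Stand-in detector: returns one target information group, i.e. a
    list of (coordinates, class) tuples for every target in the frame."""
    return [((obj["x"], obj["y"], obj["w"], obj["h"]), obj["cls"]) for obj in frame]

def track(video):
    frames = split_into_frames(video)                    # framing
    info_groups = [detect_targets(f) for f in frames]    # recognition
    tracks = {}                                          # association
    for group in info_groups:
        for (x, y, w, h), cls in group:
            tracks.setdefault(cls, []).append((x, y))    # naive association by class
    return tracks

# Two frames of a toy "video", each frame a list of detected objects.
video = [
    [{"x": 0, "y": 0, "w": 2, "h": 2, "cls": "aircraft"}],
    [{"x": 1, "y": 0, "w": 2, "h": 2, "cls": "aircraft"}],
]
print(track(video))  # {'aircraft': [(0, 0), (1, 0)]}
```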
Optionally, before the obtaining the real-time monitoring video to be tracked, the method further includes:
acquiring a historical monitoring video;
extracting a plurality of historical monitoring video frames from the historical monitoring video according to preset time length to serve as a first historical monitoring video frame sequence;
labeling target information in each historical monitoring video frame in a first historical monitoring video frame sequence to obtain a labeled historical monitoring video frame sequence;
determining a historical target information group sequence of an annotated historical monitoring video frame sequence;
and training the YOLOv4-TS neural network by taking the marked historical monitoring video frame sequence as input and the historical target information group sequence as expected output to obtain the multi-target recognition model.
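Extracting frames "according to preset time length" can be read as periodic sampling of the source video. The sketch below assumes a fixed frame rate and sampling period (values are illustrative; the patent does not specify them) and returns the indices of the frames that would be annotated.

```python
# Sketch of selecting training frames at a preset time interval. The
# parameter names (fps, period_s) are assumptions, not from the patent.

def sample_frame_indices(total_frames, fps, period_s):
    """Return indices of frames sampled every `period_s` seconds."""
    step = max(1, int(round(fps * period_s)))
    return list(range(0, total_frames, step))

# A 10-second clip at 25 fps sampled every 2 seconds -> 5 frames.
print(sample_frame_indices(250, 25, 2.0))  # [0, 50, 100, 150, 200]
```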
Optionally, before the obtaining the real-time monitoring video to be tracked, the method further includes:
enlarging the convolution layer sizes, the residual modules and the network output size of the DeepSORT neural network to obtain an enlarged DeepSORT neural network;
performing dimensionality reduction on the network structure of the enlarged DeepSORT neural network to obtain an improved DeepSORT neural network;
performing frame division processing on the historical monitoring video to obtain a plurality of historical monitoring video frames serving as a second historical monitoring video frame sequence;
carrying out related labeling on the same target information in the second historical monitoring video frame sequence by using a Darklabel tool to obtain historical tracking tracks of a plurality of targets;
and training the improved DeepSORT neural network by taking a plurality of historical target information group sequences as input and the historical tracking tracks of the plurality of targets as expected output, to obtain the multi-target tracking model.
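DarkLabel-style related labeling associates the same target identity across frames. Assuming per-frame annotations of the form (frame, target_id, x, y) (a format assumed for illustration, not specified by the patent), the historical tracking tracks can be recovered by grouping by identity:

```python
# Sketch: turn per-frame (frame, target_id, x, y) annotations into
# per-target historical tracks usable as expected tracker output.
# The annotation tuple layout is an assumption for illustration.

def build_tracks(annotations):
    tracks = {}
    for frame, target_id, x, y in sorted(annotations):
        tracks.setdefault(target_id, []).append((frame, x, y))
    return tracks

ann = [(0, 1, 10, 20), (1, 1, 12, 21), (0, 2, 50, 60)]
print(build_tracks(ann))
# {1: [(0, 10, 20), (1, 12, 21)], 2: [(0, 50, 60)]}
```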
Optionally, before obtaining the real-time monitoring video to be tracked, the method further includes:
acquiring a historical monitoring video which is the same as the monitoring scene of the real-time monitoring video to be tracked as a pre-training video;
and training the multi-target tracking model by using the pre-training video to obtain a plurality of initial tracking tracks in a monitoring scene.
Optionally, the determining, according to the target information group sequence, the tracking trajectories of the multiple targets in the real-time monitoring video to be tracked by using a Kalman filtering algorithm and a multi-target tracking model specifically includes:
making the iteration number m equal to 1;
determining the initial tracking track as the tracking track of the 0 th iteration;
matching a plurality of targets in the mth target information group in the target information group sequence with a plurality of tracking tracks in the m-1 iteration by using a Hungarian algorithm and a cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group;
matching a plurality of targets in the first unmatched target group with a plurality of tracking tracks in the (m-1) th iteration by using an IoU matching algorithm to obtain a second target-tracking track matching group;
combining the first target-tracking track matching group and the second target-tracking track matching group into a total matching group;
updating the corresponding tracking tracks according to the coordinates of the targets in the total matching group to obtain a plurality of tracking tracks in the mth iteration;
performing real-time simulation display on a plurality of tracking tracks in the mth iteration by using a Kalman filtering algorithm;
and increasing the value of m by 1 and returning to the step of matching a plurality of targets in the mth target information group in the target information group sequence with a plurality of tracking tracks in the m-1 iteration by using a Hungarian algorithm and a cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group until the target information group sequence is traversed to obtain the tracking tracks of the plurality of targets in the time period of the real-time monitoring video to be tracked.
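The per-iteration association loop above can be sketched as follows. To keep the example self-contained, cascade matching is reduced to a greedy best-overlap search and only the IoU stage is implemented; this is a simplified stand-in for the Hungarian/cascade and IoU matching algorithms named in the patent, not the patent's own procedure.

```python
# Skeleton of one association iteration: match detections to tracks by
# IoU, update matched tracks with the detection coordinates, and report
# unmatched detections (the "first unmatched target group" analogue).

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(detections, tracks, iou_threshold=0.3):
    """Greedy matching of detections to tracks; matched tracks are
    updated in place, unmatched detection indices are returned."""
    matched, unmatched = {}, []
    free = set(tracks)
    for j, det in enumerate(detections):
        best = max(free, key=lambda i: iou(det, tracks[i]), default=None)
        if best is not None and iou(det, tracks[best]) >= iou_threshold:
            matched[best] = det
            tracks[best] = det        # track update step
            free.discard(best)
        else:
            unmatched.append(j)
    return matched, unmatched

tracks = {0: (0, 0, 10, 10)}
dets = [(1, 1, 11, 11), (100, 100, 110, 110)]
m, u = associate(dets, tracks)
print(m, u)  # {0: (1, 1, 11, 11)} [1]
```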
Optionally, the matching of multiple targets in the mth target information group in the target information group sequence and multiple tracking tracks in the m-1 st iteration by using the hungarian algorithm and the cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group specifically includes:
determining any target in the mth target information group as a current target;
determining any one of the plurality of tracking tracks in the m-1 iteration as a current tracking track;
according to the coordinates of the current target, determining the motion information value of the current target and the current tracking track by using the formula d^(1)(i, j) = (D_j - P_i)^T S_i^(-1) (D_j - P_i); wherein d^(1)(i, j) is the motion information value of the ith tracking track and the jth target, D_j denotes the detection result of the jth target, P_i represents the ith tracking track, and S_i represents the covariance matrix between the average track position obtained by Kalman filtering prediction and the detection position;
determining a first matching metric value of the current target and the current tracking track according to the motion information value;
according to the appearance features of the current target, determining the appearance information value of the current target and the current tracking track by using the formula d^(2)(i, j) = min{1 - r_j^T r_k^(i) | r_k^(i) ∈ R_i}; wherein d^(2)(i, j) is the appearance information value of the ith tracking track and the jth target, r_j^T is the transpose of the appearance feature vector r_j of the jth target, r_k^(i) represents the kth feature stored on the ith tracking track, and R_i represents the feature set on the ith tracking track;
determining a second matching metric value of the current target and the current tracking track according to the appearance information value;
according to the first matching metric value and the second matching metric value, determining the total matching metric value by using the formula b_ij = Π_(m=1..2) b_ij^(m); wherein b_ij is the total matching metric value of the ith tracking track and the jth target, and b_ij^(m) represents the mth matching metric value, m = 1 or 2;
judging whether the total matching metric value is 1 or not to obtain a first judgment result;
if the first judgment result is yes, adding the current target and the current tracking track into a first target-tracking track matching group;
if the first judgment result is negative, adding the current target into a first unmatched target group;
updating the current tracking track and returning to the step of determining the motion information value of the current target and the current tracking track by using the formula d^(1)(i, j) = (D_j - P_i)^T S_i^(-1) (D_j - P_i) until the plurality of tracking tracks in the (m-1)th iteration are traversed; then updating the current target and returning to the step of determining any tracking track in the plurality of tracking tracks in the (m-1)th iteration as the current tracking track until the mth target information group is traversed, so as to obtain the first target-tracking track matching group and the first unmatched target group.
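The two metrics and the combined gate can be sketched in pure Python. For simplicity the covariance matrix S_i is taken as the identity, so the motion metric reduces to a squared Euclidean distance (the real method uses the Kalman-filter covariance), and the appearance vectors are assumed unit-normalised; the thresholds are illustrative, with 9.4877 being the chi-square gate used by the original DeepSORT for a 4-dimensional measurement.

```python
# Simplified sketch of the matching metrics and gate. S_i = identity and
# the threshold values are assumptions for illustration.

def motion_metric(detection, track_mean):
    """d1(i,j) = (D_j - P_i)^T S_i^-1 (D_j - P_i), with S_i = I."""
    return sum((d - p) ** 2 for d, p in zip(detection, track_mean))

def appearance_metric(feature, track_features):
    """d2(i,j) = min over stored track features of 1 - r_j^T r_k,
    for unit-normalised appearance vectors."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return min(1.0 - dot(feature, r) for r in track_features)

def gate(d1, d2, t1=9.4877, t2=0.5):
    """b_ij = b1 * b2: each indicator is 1 iff its metric is within
    its threshold, mirroring the judgment steps in the text."""
    b1 = 1 if d1 <= t1 else 0
    b2 = 1 if d2 <= t2 else 0
    return b1 * b2

d1 = motion_metric((1.0, 1.0), (0.0, 0.0))                    # 2.0
d2 = appearance_metric((1.0, 0.0), [(1.0, 0.0), (0.0, 1.0)])  # 0.0
print(d1, d2, gate(d1, d2))  # 2.0 0.0 1
```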
Optionally, the determining a first matching metric value of the current target and the current tracking track according to the motion information value specifically includes:
judging whether the motion information value is larger than a motion information threshold value or not to obtain a second judgment result;
if the second judgment result is negative, the first matching metric value is set to 1;
if the second judgment result is yes, the first matching metric value is set to 0.
Optionally, the determining a second matching metric value of the current target and the current tracking track according to the appearance information value specifically includes:
judging whether the appearance information value is larger than an appearance information threshold value or not to obtain a third judgment result;
if the third judgment result is negative, the second matching metric value is set to 1;
if the third judgment result is yes, the second matching metric value is set to 0.
A multi-target tracking system for surveillance videos, comprising:
the to-be-tracked real-time monitoring video acquisition module is used for acquiring a to-be-tracked real-time monitoring video;
the framing module is used for framing the real-time monitoring video to be tracked to obtain a video frame sequence to be tracked;
the target information group sequence determining module is used for inputting the video frame sequence to be tracked into the multi-target recognition model to obtain a target information group sequence; any target information group in the target information group sequence comprises the coordinates and the types of all targets in the same video frame to be tracked; the multi-target recognition model is obtained by training a YOLOv4-TS neural network by using a historical monitoring video; the YOLOv4-TS neural network is obtained by adding a spatial pyramid pooling module in a YOLOv4 neural network;
the tracking track determining module is used for determining the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked by utilizing a Kalman filtering algorithm and a multi-target tracking model according to the target information group sequence; the multi-target tracking model is obtained by training a DeepSORT neural network by using historical monitoring videos.
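The four system modules above can be sketched as plain classes wired into a pipeline. The class names and the stub detector and tracker are illustrative assumptions; only each module's responsibility and the data handed between them are shown.

```python
# Sketch of the four claimed system modules; model internals are stubbed.

class VideoAcquisitionModule:
    def acquire(self, source):
        return source                      # real-time video to be tracked

class FramingModule:
    def split(self, video):
        return list(video)                 # video frame sequence

class TargetInfoModule:
    def __init__(self, detector):
        self.detector = detector           # stands in for YOLOv4-TS
    def detect(self, frames):
        return [self.detector(f) for f in frames]

class TrackingModule:
    def __init__(self, tracker):
        self.tracker = tracker             # stands in for improved DeepSORT
    def run(self, info_groups):
        return self.tracker(info_groups)

detector = lambda frame: [(frame, "aircraft")]
tracker = lambda groups: {"aircraft": [g[0][0] for g in groups]}
result = TrackingModule(tracker).run(
    TargetInfoModule(detector).detect(
        FramingModule().split(VideoAcquisitionModule().acquire([0, 1, 2]))))
print(result)  # {'aircraft': [0, 1, 2]}
```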
Optionally, the system further includes:
the historical monitoring video acquisition module is used for acquiring historical monitoring videos;
the first historical monitoring video frame sequence extraction module is used for extracting a plurality of historical monitoring video frames from the historical monitoring video according to preset time length to serve as a first historical monitoring video frame sequence;
the first historical monitoring video frame sequence determining module is used for marking target information in each historical monitoring video frame in the first historical monitoring video frame sequence to obtain a first historical monitoring video frame sequence;
the historical target information group sequence determining module is used for determining a historical target information group sequence for marking a historical monitoring video frame sequence;
and the multi-target identification model determining module is used for training the YOLOv4-TS neural network by taking the marked historical monitoring video frame sequence as input and the historical target information group sequence as expected output to obtain the multi-target identification model.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the method is obtained by improving and training the YOLOv4 neural network and the DeepsORT neural network, and the multi-target recognition model and the multi-target tracking model can automatically determine the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked, so that the intelligent level of video monitoring is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flowchart of a multi-target tracking method for surveillance videos according to an embodiment of the present invention;
FIG. 2 is a flow chart of the construction of an ASMD dataset according to an embodiment of the present invention;
FIG. 3 is a flowchart of YOLOv4-TS target detection in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a YOLOv4 neural network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a YOLOv4-TS neural network structure according to an embodiment of the present invention;
FIG. 6 is a diagram of a structure of a PANET network according to an embodiment of the present invention;
FIG. 7 is a flow chart of the operation of the multi-target tracking model in an embodiment of the invention;
FIG. 8 is a schematic structural diagram of a multi-target tracking system for surveillance videos according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a data acquisition module according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an airport surface target detection module based on YOLOv4-TS according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of an improved DeepSORT-based multi-target tracking module in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention aims to provide a multi-target tracking method and a multi-target tracking system for a surveillance video, which can track a plurality of targets in the surveillance video and improve the intelligent level of video surveillance.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The invention provides a multi-target tracking method of a surveillance video, which comprises the following steps:
acquiring a real-time monitoring video to be tracked;
performing framing processing on a real-time monitoring video to be tracked to obtain a video frame sequence to be tracked;
inputting a video frame sequence to be tracked into a multi-target recognition model to obtain a target information group sequence; any target information group in the target information group sequence comprises the coordinates and the types of all targets in the same video frame to be tracked; the multi-target recognition model is obtained by training a YOLOv4-TS neural network by using a historical monitoring video; the YOLOv4-TS neural network is obtained by adding a spatial pyramid pooling module in the YOLOv4 neural network;
determining the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked by utilizing a Kalman filtering algorithm and a multi-target tracking model according to the target information group sequence; the multi-target tracking model is obtained by training a DeepSORT neural network by using historical monitoring videos.
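The Kalman filtering step that propagates track positions between detections can be illustrated with a one-dimensional constant-velocity filter. This is a minimal sketch, not the tracker's actual multi-dimensional filter; the scalar noise values are illustrative assumptions.

```python
# One-dimensional constant-velocity Kalman predict/update sketch.
# q (process noise), r (measurement noise) and dt are assumed values.

def kalman_step(x, v, p, z, q=1e-3, r=0.1, dt=1.0):
    """One predict + update for state (position x, velocity v) with
    scalar position variance p and position measurement z."""
    # predict
    x_pred = x + v * dt
    p_pred = p + q
    # update with measurement z
    k = p_pred / (p_pred + r)              # Kalman gain
    x_new = x_pred + k * (z - x_pred)
    p_new = (1 - k) * p_pred
    return x_new, v, p_new

x, v, p = 0.0, 1.0, 1.0
for z in [1.1, 2.0, 2.9]:
    x, v, p = kalman_step(x, v, p, z)
print(round(x, 2))  # 3.0
```

The filter blends its predicted position with each noisy measurement, which is how a track keeps a smooth trajectory even when detections jitter.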
In addition, the multi-target tracking method for the surveillance videos, provided by the invention, further comprises the following steps before the real-time surveillance video to be tracked is obtained:
acquiring a historical monitoring video;
extracting a plurality of historical monitoring video frames from a historical monitoring video according to a preset time length to serve as a first historical monitoring video frame sequence;
labeling target information in each historical monitoring video frame in the first historical monitoring video frame sequence to obtain a labeled historical monitoring video frame sequence;
determining a historical target information group sequence of an annotated historical monitoring video frame sequence;
and training the YOLOv4-TS neural network by taking the marked historical monitoring video frame sequence as input and the historical target information group sequence as expected output to obtain the multi-target recognition model.
In addition, the multi-target tracking method for the surveillance videos, provided by the invention, further comprises the following steps before the real-time surveillance video to be tracked is obtained:
enlarging the convolution layer sizes, the residual modules and the network output size of the DeepSORT neural network to obtain an enlarged DeepSORT neural network;
performing dimensionality reduction on the network structure of the enlarged DeepSORT neural network to obtain an improved DeepSORT neural network;
performing frame division processing on the historical monitoring video to obtain a plurality of historical monitoring video frames serving as a second historical monitoring video frame sequence;
carrying out related labeling on the same target information in the second historical monitoring video frame sequence by using a Darklabel tool to obtain historical tracking tracks of a plurality of targets;
and training the improved DeepSORT neural network by taking the historical target information group sequences as input and the historical tracking tracks of the targets as expected output, to obtain the multi-target tracking model.
In addition, the multi-target tracking method for the surveillance videos, provided by the invention, further comprises the following steps before the real-time surveillance video to be tracked is obtained:
acquiring a historical monitoring video which is the same as a monitoring scene of a real-time monitoring video to be tracked as a pre-training video;
and training the multi-target tracking model by utilizing the pre-training video to obtain a plurality of initial tracking tracks in the monitoring scene.
Specifically, determining the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked by using a Kalman filtering algorithm and a multi-target tracking model according to a target information group sequence, specifically comprising:
making the iteration number m equal to 1;
determining the initial tracking track as the tracking track of the 0 th iteration;
matching a plurality of targets in the mth target information group in the target information group sequence with a plurality of tracking tracks in the m-1 iteration by using a Hungarian algorithm and a cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group;
matching a plurality of targets in the first unmatched target group with a plurality of tracking tracks in the (m-1) th iteration by using an IoU matching algorithm to obtain a second target-tracking track matching group;
combining the first target-tracking track matching group and the second target-tracking track matching group into a total matching group;
updating the corresponding tracking tracks according to the coordinates of the targets in the total matching group to obtain a plurality of tracking tracks in the mth iteration;
performing real-time simulation display on a plurality of tracking tracks in the mth iteration by using a Kalman filtering algorithm;
and increasing the value of m by 1 and returning to the step of matching a plurality of targets in the mth target information group in the target information group sequence with a plurality of tracking tracks in the m-1 iteration by using a Hungarian algorithm and a cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group until the target information group sequence is traversed to obtain the tracking tracks of the plurality of targets in the time period of the real-time monitoring video to be tracked.
The method comprises the following steps of matching a plurality of targets in an mth target information group in a target information group sequence with a plurality of tracking tracks in an m-1 iteration by using a Hungarian algorithm and a cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group, and specifically comprises the following steps:
determining any target in the mth target information group as a current target;
determining any one of the plurality of tracking tracks in the m-1 iteration as a current tracking track;
according to the coordinates of the current target, using the formula d^(1)(i,j) = (Dj - Pi)^T Si^(-1) (Dj - Pi), determining motion information values of the current target and the current tracking track; wherein d^(1)(i, j) is the motion information value of the ith tracking track and the jth target, Dj denotes the detection result of the jth target, Pi represents the ith tracking track, and Si represents the covariance matrix between the average track position predicted by Kalman filtering and the detected position;
determining a first matching metric value of a current target and a current tracking track according to the motion information value;
according to the appearance characteristics of the current target, using the formula d^(2)(i,j) = min{1 - rj^T rk^(i) | rk^(i) ∈ Ri}, determining appearance information values of the current target and the current tracking track; wherein d^(2)(i, j) is the appearance information value of the ith tracking track and the jth target, rj^T is the transpose of the appearance feature vector rj of the jth target, rk^(i) represents the kth feature on the ith tracking track, and Ri represents the feature set on the ith tracking track;
determining a second matching metric value of the current target and the current tracking track according to the appearance information value;
according to the first matching metric value and the second matching metric value, using the formula bij = ∏(m=1..2) bij^(m), determining a total matching metric value; wherein bij is the total matching metric value of the ith tracking track and the jth target, bij^(m) represents the mth matching metric value, and m is 1 or 2;
judging whether the total matching metric value is 1 or not to obtain a first judgment result;
if the first judgment result is yes, adding the current target and the current tracking track into a first target-tracking track matching group;
if the first judgment result is negative, adding the current target into the first unmatched target group;
updating the current tracking track and returning to the step of determining the motion information values of the current target and the current tracking track according to the coordinates of the current target, until the multiple tracking tracks at the (m-1)th iteration are traversed; then updating the current target and returning to the step of determining any one of the multiple tracking tracks at the (m-1)th iteration as the current tracking track, until the mth target information group is traversed, thereby obtaining the first target-tracking track matching group and the first unmatched target group.
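The first-stage matching loop described above (motion gate, appearance gate, total metric bij) can be sketched as follows. This is an illustrative simplification: a greedy per-target assignment stands in for the Hungarian algorithm, and both threshold values are placeholders rather than the patent's trained ones.

```python
import numpy as np

T_MOTION = 9.4877   # chi-squared 95% gate for a 4-D state (x, y, w, h)
T_APPEAR = 0.2      # assumed appearance threshold (illustrative placeholder)

def match_targets(d_motion, d_appear):
    """d_motion, d_appear: (num_tracks, num_targets) cost matrices.
    Returns (matches, unmatched_targets) following the first-stage logic:
    a (track, target) pair is admissible only when its total metric
    b_ij = b_ij^(1) * b_ij^(2) equals 1."""
    num_tracks, num_targets = d_motion.shape
    matches, used_tracks = [], set()
    for j in range(num_targets):
        best_i = None
        for i in range(num_tracks):
            if i in used_tracks:
                continue
            # total matching metric: product of the two binary gates
            b_ij = int(d_motion[i, j] <= T_MOTION) * int(d_appear[i, j] <= T_APPEAR)
            if b_ij == 1 and (best_i is None or d_appear[i, j] < d_appear[best_i, j]):
                best_i = i
        if best_i is None:
            continue          # target goes to the first unmatched target group
        used_tracks.add(best_i)
        matches.append((best_i, j))
    matched_targets = {m[1] for m in matches}
    return matches, [j for j in range(num_targets) if j not in matched_targets]
```

A target failing either gate for every track lands in the unmatched group and is handed to the later IoU matching stage.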
Specifically, determining a first matching metric value of the current target and the current tracking track according to the motion information value specifically includes:
judging whether the motion information value is larger than a motion information threshold value or not to obtain a second judgment result;
if the second judgment result is negative, the first matching metric value is set to 1;
if the second judgment result is yes, the first matching metric value is set to 0.
Further, the method is characterized in that a second matching metric value of the current target and the current tracking track is determined according to the appearance information value, and specifically includes:
judging whether the appearance information value is larger than an appearance information threshold value or not to obtain a third judgment result;
if the third judgment result is negative, the second matching metric value is set to 1;
if the third judgment result is yes, the second matching metric value is set to 0.
Referring to fig. 1, a multi-target tracking method for surveillance videos provided by the present invention is specifically described by taking an airport scene surveillance video as an example, and the present invention includes the following steps:
s100, constructing an airport scene multifunctional data set:
Specifically, the ASMD data set is constructed, including a training set and a test set for training and evaluating the YOLOv4-TS target detection algorithm, a test set for evaluating the improved DeepSORT multi-target tracking algorithm, and a training set for training the improved Re-ID network. As shown in fig. 2, step S100 may include:
s101, data acquisition:
specifically, a high-definition camera is used to shoot surveillance video of the airport scene, and the interfaces of some airport scene cameras are accessed remotely to obtain original image information, provided that no classified material is involved.
S102, constructing a target detection data set:
specifically, a frame is captured every 500 ms, each screenshot is labeled using the LabelImg labeling tool, and a training set and a test set for training and evaluating the YOLOv4-TS target detection algorithm are constructed.
S103, constructing a multi-target tracking data set:
specifically, each frame of the video is labeled by using a Darklabel tool, and a test set for evaluating the improved DeepSORT multi-target tracking algorithm is constructed.
S104, establishing a Re-ID data set:
specifically, each label picture is segmented from an original image according to the positions of different types of target detection frames by using the constructed data set, and a training set for training an improved Re-ID network is constructed.
S110, constructing an airport scene target detection model based on YOLOv 4-TS:
specifically, an airport scene target detection model based on YOLOv4-TS is trained on the ASMD data set; an airport scene monitoring image is input, and the detection frame and coordinate information of each target are obtained. As shown in fig. 3, step S110 includes:
s111, feature extraction:
specifically, a CSPDarknet network structure is adopted as the feature extraction network to extract features from the input scene monitoring image. The network has 53 convolutional layers, i.e., CSPDarknet53 is used as the feature extraction network, and it is initialized with weights pre-trained on the MS-COCO data set for transfer learning.
S112, multi-scale feature information acquisition:
specifically, multi-scale feature information is obtained using an SPP network. The SPP (spatial pyramid pooling) module obtains multi-scale feature information by applying three maximum pooling kernels of different sizes. With the stride of each maximum pooling operation set to s and the kernel size set to d, the output feature map size Ysize can be expressed by equation (1):

Ysize = ⌊(Xsize - d + 2p) / s⌋ + 1 (1)
Here ⌊·⌋ is the floor (round-down) function, Xsize is the input feature map size, the stride s of the SPP module in the YOLOv4 network is 1, and p (pad) is the padding applied to the image; both pad and padding indicate that pixels are padded around the periphery of the image.
The padding size depends on the kernel size d and is calculated as shown in equation (2):

p = (d - 1) / 2 (2)
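As a quick check of equations (1) and (2), the sketch below (assuming the standard pooling-size arithmetic and the 5/9/13 kernel sizes commonly used in YOLOv4's SPP module) confirms that stride-1 pooling with padding (d - 1)/2 preserves the feature map size:

```python
import math

def pool_out_size(x_size, d, s=1, p=None):
    """Max-pooling output size per equation (1); padding per equation (2)."""
    if p is None:
        p = (d - 1) // 2                       # equation (2), odd kernel d
    return math.floor((x_size - d + 2 * p) / s) + 1   # equation (1)
```

With s = 1 every branch of the SPP module outputs a map the same size as its input, which is what allows the branches to be concatenated.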
In fig. 4, CBL is a convolution block composed of three network layers, namely Conv (convolution layer), Batch Normalization (batch normalization layer) and LeakyReLU (activation function), and CBLn denotes n CBL modules connected in series; SPP represents the spatial pyramid pooling layer; Up sample and Down sample represent upsampling and downsampling, respectively. As shown in figs. 4-5, calculation shows that the input Xsize and the output Ysize are equal, i.e., after SPP module processing, the size of the output feature map is consistent with that of the original feature map. The lower-level feature maps of a CNN have higher resolution and therefore contain more detailed information, while the higher-level feature maps have larger receptive fields and richer semantic information. The SPP module integrates features at more scales, absorbing the advantages of both low-level and high-level feature maps and obtaining more abstract information, thereby enlarging the range over which feature information is obtained and improving the prediction performance of the model. The invention adds SPP modules, originally located only behind the 19×19 feature map of the neck network, at the corresponding positions of the 38×38 and 76×76 feature maps, strengthening the detection of medium-scale and small-scale targets and better adapting to complex environments such as camera scenes in which multi-scale targets exist. Since the number of SPP modules in the original YOLOv4 network is increased from one to three, the improved network is named "YOLOv4-Triple SPP", abbreviated "YOLOv4-TS".
S113, multi-scale feature fusion:
in fig. 6, Class indicates the class, box indicates the detection box, and mask indicates the image mask, which is generally represented by a two-dimensional matrix array. Specifically, a PANet network is used to fuse features of different scales, as follows. First, referring to the FPN structure, the first part generates feature maps of the same spatial size at the same network stage, with {P2, P3, P4, P5} denoting the generated feature levels used to optimize the propagation path. Second, inspired by the ResNet architecture, the second part uses {N2, N3, N4, N5} to denote the feature maps generated corresponding to {P2, P3, P4, P5}; the expanded path starts from the lowest level N2 and gradually rises to N5. Then, each module combines the unprocessed feature map Pi and the high-resolution feature map Ni through lateral connections to generate a new feature map. The adaptive feature pooling in the third part of the PANet fuses single-layer features into multi-layer features; this feature fusion gives the network adaptive capacity and provides strong support for the bottom-up path augmentation architecture. The fourth part of the PANet classifies and regresses the feature layers fused in the third part. The fully connected fusion is located in the fifth part of the PANet; the fully connected layer mainly undertakes the work of semantic segmentation and predicts and generates the Mask, and the two branches are fused to generate the Mask, finally obtaining the prediction result.
S114, YOLOv4-TS network training:
specifically, training a network by using a data set of a YOLOv4-TS target detection algorithm, adjusting network hyper-parameters, and selecting an optimal network model;
s115, YOLOv4-TS target detection:
specifically, the airport scene monitoring image is sent to a YOLOv4-TS network model for multi-class target detection;
s120, constructing an improved DeepSORT-based multi-target tracking model;
specifically, the improved depth appearance model is trained on the ASMD data set to improve the re-identification effect of the original network; then, the DeepSORT multi-target tracking algorithm takes the detection values obtained in step S110 as input to obtain the tracking tracks of multiple targets in the video. As shown in fig. 7, step S120 includes:
s121, Kalman filtering state prediction:
specifically, a Kalman filtering algorithm is used to perform state prediction for the multiple scene targets. The state prediction equation and covariance matrix equation involved in the state prediction process are shown in equations (3) and (4), and the gain equation, the state update equation and the optimal estimated covariance matrix equation involved in the state update are shown in equations (5), (6) and (7). Here, Xk,k and Xk-1,k-1 respectively represent the state vectors at times k and k-1, and Xk,k-1 represents the state vector predicted from time k-1 to time k; Pk,k and Pk-1,k-1 respectively represent the covariance matrices at times k and k-1, and Pk,k-1 represents the covariance matrix predicted from time k-1 to time k; Zk represents the observation vector, A represents the state transition matrix from time k-1 to time k, B and Uk respectively represent the input gain matrix and the input vector, H represents the observation matrix, and the covariance matrices of the system noise and the observation noise are set to Q and R, where Q and R are not affected by the system state.
Xk,k-1=AXk-1,k-1+BUk (3)
Pk,k-1=APk-1,k-1AT+Q (4)
Kk=Pk,k-1HT(HPk,k-1HT+R)-1 (5)
Xk,k=Xk,k-1+Kk(Zk-HXk,k-1) (6)
Pk,k=Pk,k-1-KkHPk,k-1 (7)
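Equations (3)-(7) can be transcribed directly as a numpy sketch (illustrative only; the patent does not prescribe this code, and the test model below is a minimal one-dimensional assumption):

```python
import numpy as np

def kalman_step(x, P, z, A, B, u, H, Q, R):
    """One Kalman filter cycle following equations (3)-(7)."""
    # Prediction: equations (3) and (4)
    x_pred = A @ x + B @ u
    P_pred = A @ P @ A.T + Q
    # Gain: equation (5)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    # Update: equations (6) and (7)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = P_pred - K @ H @ P_pred
    return x_new, P_new
```

Each video frame triggers one such predict/update cycle per track, with the detection result supplying the observation Zk.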
S122, Hungarian matching:
specifically, the detection values and the predicted trajectories are further matched using the Hungarian algorithm. The method comprises the steps of matching the similarity of motion information based on Mahalanobis distance and the similarity of a depth appearance model based on improvement, and performing data association fusion on the similarity of motion information and the similarity of the depth appearance model, and comprises the following specific steps:
when there are uncertainty factors in the motion state of a target, the Mahalanobis distance can be used to express the motion information between the prediction frame and the detection frame; the calculation expression is shown in equation (8):

d^(1)(i,j) = (Dj - Pi)^T Si^(-1) (Dj - Pi) (8)

where d^(1)(i, j) represents the motion matching information of the ith prediction frame and the jth detection frame, Pi indicates the prediction result of the ith prediction frame, Dj denotes the detection result of the jth detection frame, and Si represents the covariance matrix between the average track position obtained by Kalman filtering prediction and the detection position. Furthermore, to ensure that invalid associations are filtered out, the Mahalanobis distance is thresholded at the 95% confidence interval computed from the chi-squared distribution, as shown in function (9):

bij^(1) = 1[d^(1)(i,j) ≤ t^(1)] (9)

For the four-dimensional prediction space (x, y, w, h) the threshold is t^(1) = 9.4877; when the prediction frame is successfully associated with the detection frame, the result is 1. Here (x, y) represents the coordinates of the center point of the target, w represents the aspect ratio of the target frame, and h represents the height of the target frame.
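Equation (8) and the gate of function (9) amount to a few lines of numpy; `motion_metric` and `motion_gate` are illustrative names, and only the 9.4877 threshold comes from the text:

```python
import numpy as np

T1 = 9.4877   # chi-squared 95% quantile for the 4-D space (x, y, w, h)

def motion_metric(d_j, p_i, s_i):
    """Squared Mahalanobis distance of equation (8)."""
    diff = d_j - p_i
    return float(diff.T @ np.linalg.inv(s_i) @ diff)

def motion_gate(d_j, p_i, s_i):
    """Binary admission indicator of function (9)."""
    return 1 if motion_metric(d_j, p_i, s_i) <= T1 else 0
```

Pairs failing the gate are excluded from association before any assignment is attempted, which is what filters out the invalid associations mentioned above.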
Since the original feature network is trained on a pedestrian data set, the feature extraction objects are mainly pedestrians and the input size is only 128×64; the original feature network is shown in table 1. The ASMD data set constructed by the invention contains large-size targets such as vehicles and airplanes, and the input size of the original algorithm has certain limitations for such targets. Therefore, the invention improves the original network model: the input convolution layer and the residual modules are enlarged, the adjusted network input size is expanded to 128×128, the residual network is deepened, and the dimensionality of the enlarged network is reduced to ensure the convergence speed of the training model. The improved network structure is shown in table 2.
Table 1 feature extraction network architecture before improvement
Table 2 improved feature extraction network architecture
The depth appearance information is added as follows. First, the appearance feature rj corresponding to each detection frame Dj is computed, with the constraint ||rj|| = 1. Second, a feature set Ri is established on tracking track i; each time an appearance feature is successfully associated, it is recorded into the feature set Ri, which retains the most recent one hundred appearance features. Finally, the minimum cosine distance is introduced to express the depth appearance information between prediction frame Pi and detection frame Dj. The appearance information between the prediction frame and the detection frame, i.e., the cosine distance, is calculated through the improved network model as shown in equation (10):

d^(2)(i,j) = min{1 - rj^T rk^(i) | rk^(i) ∈ Ri} (10)

In addition, by analogy with equation (8), d^(2)(i, j) is also thresholded, as shown in function (11), where the threshold t^(2) is obtained through training of the improved network model:

bij^(2) = 1[d^(2)(i,j) ≤ t^(2)] (11)
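A minimal sketch of the cosine-distance metric of equation (10); the gallery here is a plain Python list standing in for the feature set Ri of unit-norm vectors, and the function name is illustrative:

```python
import numpy as np

def appearance_metric(r_j, gallery):
    """Smallest cosine distance between detection feature r_j (unit norm)
    and the features stored on a track, per equation (10)."""
    return min(1.0 - float(r_j @ r_k) for r_k in gallery)
```

Because every feature has unit norm, `r_j @ r_k` is the cosine similarity, so the metric is 0 for an identical appearance and approaches 2 for an opposite one.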
S123, cascade matching:
The fused data association model c(i,j) is shown in equation (12):

c(i,j) = λ d^(1)(i,j) + (1 - λ) d^(2)(i,j) (12)

where λ represents the regulating parameter of the model, with value interval [0, 1]. For example, when a target is occluded for a long time or the motion amplitude of the camera is large, the Kalman filtering prediction is extremely poor and has no reference value; in this case the fused model discards the predicted motion information, i.e., λ is set to 0. In addition, a spatial position matching measure d^(3)(i, j) is added, which acts on the matching metric bij: when a track fails to match, an IoUσ measure is used to make up for this loss. The matching metric bij is expressed as shown in equation (13):

bij = ∏m bij^(m) (13)
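Equations (12) and (13) reduce to a few lines; the function names and the example λ values are illustrative:

```python
def fused_cost(d1, d2, lam=0.5):
    """Lambda-weighted fusion of motion and appearance distances, equation (12).
    lam = 0 discards motion information (e.g. long occlusion, strong camera motion)."""
    return lam * d1 + (1.0 - lam) * d2

def total_gate(gates):
    """Product of the binary per-metric gates, equation (13)."""
    b = 1
    for g in gates:
        b *= g
    return b
```

The product form means a single failed gate vetoes the association, regardless of how well the other metrics agree.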
Specifically, a cascade matching algorithm is used to match the detected values with the predicted trajectory. The cascade matching algorithm takes the predicted track and the detection box as input, outputs a successfully matched set and an unsuccessfully matched set by calculating the matching measurement, and performs iterative operation by predicting the track, so that a target with frequent occurrence times obtains a priority matching right, and the problem of mismatching caused by probability dispersion is effectively solved. In addition, the prediction frames meeting the requirements are obtained through screening and are matched, so that the number of times of identity jump caused by long-time shielding of the target can be reduced, and the robustness of the algorithm is effectively improved.
S124, Hungarian-cascade joint matching algorithm:
The inputs are the predicted track set P = {1, ..., n}, the detection frame set D = {1, ..., m}, and the maximum number of consecutive frames a track may remain unmatched, Amax.
And outputting the successfully matched set M and the unsuccessfully matched set U.
Step 1, assign the data association model obtained by fusing motion information and appearance information, i.e., the values calculated by equation (12), to the set Cm = {c(i,j)};
Step 2, fuse the matching metric, i.e., assign the values calculated by equation (13) to the set Bm = {bij};
Step 3, initializing an algorithm, and making the set M be an empty set;
step 4, initializing an unmatched detection frame set, and giving a detection frame set D to a set U;
Step 5, starting from the predicted tracks most recently matched to a target, traverse up to Amax to obtain the predicted values of unmatched targets and the set {x(i,j)} of successfully matched targets;
Step 6, update the sets, i.e., add the obtained {x(i,j)} values to the set M, remove the successfully matched set {x(i,j)} from the set U, and reassign the resulting difference set to U, completing one round of the algorithm;
Step 7, return the two sets M and U as initial values to step 5 and substitute them into the next iteration.
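Steps 1-7 above can be sketched as a small cascade loop. This is a sketch under stated assumptions: `matching_cascade`, `tracks_by_age` and `match_fn` are hypothetical names, and the single-age matcher (e.g. a Hungarian assignment over the fused costs) is left abstract:

```python
def matching_cascade(tracks_by_age, detections, a_max, match_fn):
    """Visit tracks in order of frames-since-last-match (0 .. a_max), so
    frequently seen targets get matching priority; matched pairs accumulate
    in M while U shrinks, mirroring Steps 5-7."""
    matched, unmatched = [], list(detections)   # Steps 3-4: init M and U
    for age in range(a_max + 1):                # Step 5: traverse up to Amax
        if not unmatched:
            break
        pairs, unmatched = match_fn(tracks_by_age.get(age, []), unmatched)
        matched.extend(pairs)                   # Step 6: update M and U
    return matched, unmatched                   # output sets M and U
```

Usage with a trivial equality matcher standing in for the real assignment step shows age-0 tracks claiming detections before older tracks get a chance.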
S125, IoU matching:
specifically, IoU matching is performed on the tracks in the unconfirmed state, the unmatched tracks and the unmatched detection frames, as shown in equation (12), and then the next iteration is performed using the Hungarian algorithm.
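The supplementary IoU measure can be sketched as below; corner-format boxes (x1, y1, x2, y2) are an assumption for illustration, since the patent's detection frames use a center/aspect-ratio/height parameterization:

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) corner format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

An unmatched detection is assigned to the unconfirmed or unmatched track with which its IoU is highest, provided it exceeds the chosen threshold.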
S126, updating Kalman filtering parameters:
specifically, the parameters are updated and subsequently processed using Kalman filtering.
Aiming at the problem that the existing airport scene monitoring equipment does not completely meet the requirement of accurate perception of scene targets, the multi-target tracking method and the multi-target tracking device based on the airport scene monitoring video solve the problems of high working strength, low efficiency and the like of the existing visual monitoring method used by the airport scene, utilize a deep learning network to quickly perform multi-target tracking on the airport scene, improve the intelligent level of a scene video monitoring system to a certain extent, and reduce the dependence of scene monitoring on manual interpretation.
In addition, the invention also provides a multi-target tracking system of the monitoring video, which comprises the following steps:
the to-be-tracked real-time monitoring video acquisition module is used for acquiring a to-be-tracked real-time monitoring video;
the framing module is used for framing the real-time monitoring video to be tracked to obtain a video frame sequence to be tracked;
the target information group sequence determining module is used for inputting the video frame sequence to be tracked into the multi-target recognition model to obtain a target information group sequence; any target information group in the target information group sequence comprises the coordinates and the types of all targets in the same video frame to be tracked; the multi-target recognition model is obtained by training a YOLOv4-TS neural network by using a historical monitoring video; the YOLOv4-TS neural network is obtained by adding a spatial pyramid pooling module in the YOLOv4 neural network;
the tracking track determining module is used for determining the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked by utilizing a Kalman filtering algorithm and a multi-target tracking model according to the target information group sequence; the multi-target tracking model is obtained by training a DeepSORT neural network by using historical monitoring videos.
The system may further comprise: the historical monitoring video acquisition module, which is used for acquiring historical monitoring videos;
the first historical monitoring video frame sequence extraction module is used for extracting a plurality of historical monitoring video frames from the historical monitoring video according to preset time length to serve as a first historical monitoring video frame sequence;
the marked historical monitoring video frame sequence determining module is used for marking target information in each historical monitoring video frame in the first historical monitoring video frame sequence to obtain a marked historical monitoring video frame sequence;
the historical target information group sequence determining module is used for determining the historical target information group sequence of the marked historical monitoring video frame sequence;
and the multi-target identification model determining module is used for training the YOLOv4-TS neural network by taking the marked historical monitoring video frame sequence as input and the historical target information group sequence as expected output to obtain the multi-target identification model.
As shown in fig. 8, taking an airport scene surveillance video as an example, the multi-target tracking system for surveillance video provided by the invention includes the following modules: the data acquisition module is used for reading monitoring videos or pictures in the ASMD data set of the local disk; an airport scene target detection module based on the YOLOv4-TS trains an airport scene target detection model of the YOLOv4-TS on an ASMD data set, and inputs an airport scene monitoring image to obtain a detection frame and coordinate information of a target; based on an improved DeepSORT multi-target tracking module, an improved depth appearance model is trained on an ASMD data set, and the re-identification effect of an original network is improved; and then, a DeepsORT multi-target tracking algorithm is utilized, and the detection values obtained by the target detection module are used as input, so that the tracking tracks of a plurality of targets in the video are obtained.
Further, as shown in fig. 9, the data acquisition module includes: the data set unit of the target detection algorithm is used for training and evaluating the YOLOv4-TS target detection algorithm; the multi-target tracking data set unit is used for evaluating an improved DeepSORT multi-target tracking algorithm; and a Re-ID data set unit trains the improved Re-ID network.
Further, as shown in fig. 10, the YOLOv4-TS based airport surface target detection module includes: the CSPDarknet53 feature extraction network unit is used for extracting features of an input scene monitoring image; the spatial pyramid pooling unit is used for acquiring multi-scale feature information; and the path aggregation network unit is used for fusing different scale characteristics.
As shown in fig. 11, the improved DeepSORT-based multi-target tracking module includes: the Kalman filtering model building unit, used for predicting motion tracks on the airport scene; the Hungarian matching unit, used for matching the detection values with the predicted tracks based on motion information and appearance information; the cascade matching unit, used for further matching the detection values with the predicted tracks; the IoU matching unit, used for performing supplementary matching on unmatched values; and the Kalman filtering parameter updating unit, used for updating the system state.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, the specific embodiments and the application range may be changed according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A multi-target tracking method for surveillance videos is characterized by comprising the following steps:
acquiring a real-time monitoring video to be tracked;
performing framing processing on a real-time monitoring video to be tracked to obtain a video frame sequence to be tracked;
inputting the video frame sequence to be tracked into a multi-target recognition model to obtain a target information group sequence; any target information group in the target information group sequence comprises the coordinates and the types of all targets in the same video frame to be tracked; the multi-target recognition model is obtained by training a YOLOv4-TS neural network by using a historical monitoring video; the YOLOv4-TS neural network is obtained by adding a spatial pyramid pooling module in a YOLOv4 neural network;
determining the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked by using a Kalman filtering algorithm and a multi-target tracking model according to the target information group sequence; the multi-target tracking model is obtained by training a DeepSORT neural network by using historical monitoring videos.
2. The multi-target tracking method for the surveillance videos as claimed in claim 1, further comprising, before the obtaining of the real-time surveillance video to be tracked:
acquiring a historical monitoring video;
extracting a plurality of historical monitoring video frames from the historical monitoring video according to preset time length to serve as a first historical monitoring video frame sequence;
marking target information in each historical monitoring video frame in a first historical monitoring video frame sequence to obtain a marked historical monitoring video frame sequence;
determining a historical target information group sequence of an annotated historical monitoring video frame sequence;
and training the YOLOv4-TS neural network by taking the marked historical monitoring video frame sequence as input and the historical target information group sequence as expected output to obtain the multi-target recognition model.
3. The multi-target tracking method for the surveillance videos as claimed in claim 2, further comprising, before the obtaining of the real-time surveillance video to be tracked:
amplifying the size of the convolution layer, the residual module and the network output size of the DeepSORT neural network to obtain an amplified DeepSORT neural network;
carrying out dimensionality reduction on the network structure of the amplified DeepSORT neural network to obtain an improved DeepSORT neural network;
performing framing processing on the historical monitoring video to obtain a plurality of historical monitoring video frames serving as a second historical monitoring video frame sequence;
carrying out related labeling on the same target information in the second historical monitoring video frame sequence by using a Darklabel tool to obtain historical tracking tracks of a plurality of targets;
and taking a plurality of historical target information group sequences as input, taking historical tracking tracks of a plurality of targets as expected output, and training the improved DeepsORT neural network to obtain the multi-target tracking model.
4. The multi-target tracking method for the surveillance videos as claimed in claim 1, further comprising, before obtaining the real-time surveillance video to be tracked:
acquiring a historical monitoring video which is the same as the monitoring scene of the real-time monitoring video to be tracked as a pre-training video;
and training the multi-target tracking model by using the pre-training video to obtain a plurality of initial tracking tracks in a monitoring scene.
5. The multi-target tracking method for the surveillance video according to claim 4, wherein the determining, according to the target information group sequence, the tracking trajectories of the multiple targets in the real-time surveillance video to be tracked by using a Kalman filtering algorithm and a multi-target tracking model specifically comprises:
making the iteration number m equal to 1;
determining the initial tracking track as the tracking track of the 0 th iteration;
matching a plurality of targets in the mth target information group in the target information group sequence with a plurality of tracking tracks in the m-1 iteration by using a Hungarian algorithm and a cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group;
matching a plurality of targets in the first unmatched target group with a plurality of tracking tracks in the (m-1) th iteration by using an IoU matching algorithm to obtain a second target-tracking track matching group;
combining the first target-tracking track matching group and the second target-tracking track matching group into a total matching group;
updating the corresponding tracking tracks according to the coordinates of the targets in the total matching group to obtain a plurality of tracking tracks during the mth iteration;
performing real-time simulation display on a plurality of tracking tracks in the mth iteration by using a Kalman filtering algorithm;
and increasing the value of m by 1 and returning to the step of matching a plurality of targets in the mth target information group in the target information group sequence with a plurality of tracking tracks in the m-1 iteration by using a Hungarian algorithm and a cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group until the target information group sequence is traversed to obtain the tracking tracks of the plurality of targets in the time period of the real-time monitoring video to be tracked.
6. The multi-target tracking method for the surveillance video according to claim 5, wherein the matching of the multiple targets in the mth target information group in the target information group sequence with the multiple tracking tracks in the (m-1) th iteration by using the Hungarian algorithm and the cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group specifically comprises:
determining any target in the mth target information group as a current target;
determining any one of the plurality of tracking tracks in the (m-1)th iteration as a current tracking track;
according to the coordinates of the current target, determining motion information values of the current target and the current tracking track by using the formula d^(1)(i, j) = (D_j - P_i)^T S_i^{-1} (D_j - P_i); wherein d^(1)(i, j) is the motion information value of the ith tracking track and the jth target, D_j denotes the detection result of the jth target, P_i represents the ith tracking track predicted by Kalman filtering, and S_i represents the covariance matrix between the average track position predicted by Kalman filtering and the detection position;
determining a first matching metric value of the current target and the current tracking track according to the motion information value;
according to the appearance features of the current target, determining appearance information values of the current target and the current tracking track by using the formula d^(2)(i, j) = min{1 - r_j^T r_k^(i) | r_k^(i) ∈ R_i}; wherein d^(2)(i, j) is the appearance information value of the ith tracking track and the jth target, r_j^T is the transpose of the appearance feature vector r_j of the jth target, r_k^(i) represents the kth appearance feature stored on the ith tracking track, and R_i represents the feature set of the ith tracking track;
determining a second matching metric value of the current target and the current tracking track according to the appearance information value;
according to the first matching metric value and the second matching metric value, determining a total matching metric value by using the formula b_ij = ∏_{m=1}^{2} b_ij^(m); wherein b_ij is the total matching metric value of the ith tracking track and the jth target, b_ij^(m) represents the mth matching metric value, and m is 1 or 2;
judging whether the total matching metric value is 1 or not to obtain a first judgment result;
if the first judgment result is yes, adding the current target and the current tracking track into a first target-tracking track matching group;
if the first judgment result is negative, adding the current target into a first unmatched target group;
updating the current tracking track and returning to the step of determining the motion information values of the current target and the current tracking track by using the formula d^(1)(i, j) = (D_j - P_i)^T S_i^{-1} (D_j - P_i) according to the coordinates of the current target, until the plurality of tracking tracks in the (m-1)th iteration are traversed; then updating the current target and returning to the step of determining any one of the plurality of tracking tracks in the (m-1)th iteration as the current tracking track, until the mth target information group is traversed, to obtain the first target-tracking track matching group and the first unmatched target group.
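The two distances in this claim correspond to DeepSORT's motion (Mahalanobis) and appearance (cosine) metrics. A minimal pure-Python sketch for 2-D positions follows; the function names and the 2x2 covariance layout are illustrative, not taken from the patent.

```python
def mahalanobis_sq(d_j, p_i, s_inv):
    """Squared Mahalanobis distance d^(1)(i, j) = (D_j - P_i)^T S_i^-1 (D_j - P_i),
    here for 2-D positions; s_inv is the 2x2 inverse covariance (row-major)."""
    dx, dy = d_j[0] - p_i[0], d_j[1] - p_i[1]
    return (dx * (s_inv[0][0] * dx + s_inv[0][1] * dy)
            + dy * (s_inv[1][0] * dx + s_inv[1][1] * dy))

def appearance_distance(r_j, gallery):
    """d^(2)(i, j) = min over the track's feature gallery of 1 - r_j^T r_k^(i);
    features are assumed L2-normalised, so the dot product is a cosine similarity."""
    return min(1.0 - sum(a * b for a, b in zip(r_j, r_k)) for r_k in gallery)
```

With an identity covariance the motion metric reduces to squared Euclidean distance, which makes the formula easy to sanity-check.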
7. The multi-target tracking method for the surveillance video according to claim 6, wherein the determining a first matching metric value of the current target and the current tracking trajectory according to the motion information value specifically comprises:
judging whether the motion information value is larger than a motion information threshold value or not to obtain a second judgment result;
if the second judgment result is negative, the first matching metric value is set to 1;
if the second judgment result is yes, the first matching metric value is set to 0.
8. The multi-target tracking method for the surveillance video according to claim 6, wherein the determining a second matching metric value of the current target and the current tracking trajectory according to the appearance information value specifically includes:
judging whether the appearance information value is larger than an appearance information threshold value or not to obtain a third judgment result;
if the third judgment result is negative, the second matching metric value is set to 1;
if the third judgment result is yes, the second matching metric value is set to 0.
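The two gates of claims 7 and 8 multiply into the total matching metric b_ij of claim 6. A sketch follows; the threshold values are borrowed from common DeepSORT defaults (9.4877 is the 0.95 chi-square quantile for 4 degrees of freedom, a typical motion gate), not from the patent, and the function name is hypothetical.

```python
def total_gate(d1, d2, motion_thresh=9.4877, appearance_thresh=0.2):
    """b_ij = b_ij^(1) * b_ij^(2): the (track, target) pair is admissible
    only if both the motion metric d1 and the appearance metric d2 do not
    exceed their thresholds, matching the judgments of claims 7 and 8."""
    b1 = 1 if d1 <= motion_thresh else 0   # first matching metric value
    b2 = 1 if d2 <= appearance_thresh else 0  # second matching metric value
    return b1 * b2
```

The product form means either gate alone can veto a match, which is why claim 6 checks whether the total metric equals 1.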
9. A multi-target tracking system for surveillance video, the system comprising:
the to-be-tracked real-time monitoring video acquisition module is used for acquiring a to-be-tracked real-time monitoring video;
the framing module is used for framing the real-time monitoring video to be tracked to obtain a video frame sequence to be tracked;
the target information group sequence determining module is used for inputting the video frame sequence to be tracked into the multi-target recognition model to obtain a target information group sequence; any target information group in the target information group sequence comprises the coordinates and the types of all targets in the same video frame to be tracked; the multi-target recognition model is obtained by training a YOLOv4-TS neural network by using a historical monitoring video; the YOLOv4-TS neural network is obtained by adding a spatial pyramid pooling module in a YOLOv4 neural network;
the tracking track determining module is used for determining the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked by utilizing a Kalman filtering algorithm and a multi-target tracking model according to the target information group sequence; the multi-target tracking model is obtained by training a DeepSORT neural network by using historical monitoring videos.
10. The multi-target tracking system for surveillance videos of claim 9, further comprising:
the historical monitoring video acquisition module is used for acquiring historical monitoring videos;
the first historical monitoring video frame sequence extraction module is used for extracting a plurality of historical monitoring video frames from the historical monitoring video according to preset time length to serve as a first historical monitoring video frame sequence;
the marked historical monitoring video frame sequence determining module is used for labeling target information in each historical monitoring video frame in the first historical monitoring video frame sequence to obtain a marked historical monitoring video frame sequence;
the historical target information group sequence determining module is used for determining a historical target information group sequence of the marked historical monitoring video frame sequence;
and the multi-target identification model determining module is used for training the YOLOv4-TS neural network by taking the marked historical monitoring video frame sequence as input and the historical target information group sequence as expected output to obtain the multi-target identification model.
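The "preset time length" frame-extraction step of the training modules above reduces to choosing evenly spaced frame indices. A small sketch under stated assumptions: the video's frame count and fps are already known (actual decoding, e.g. via OpenCV's VideoCapture, is omitted), and the function name is illustrative.

```python
def sample_frame_indices(total_frames, fps, interval_seconds):
    """Indices of the frames to extract when sampling a video every
    `interval_seconds`, as in the training-set preparation of claim 10."""
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))
```

Each sampled frame would then be labeled with target coordinates and types to form the historical target information group sequence used as the expected output during YOLOv4-TS training.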
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210220010.0A CN114596340A (en) | 2022-03-08 | 2022-03-08 | Multi-target tracking method and system for monitoring video |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114596340A true CN114596340A (en) | 2022-06-07 |
Family
ID=81808080
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210220010.0A Pending CN114596340A (en) | 2022-03-08 | 2022-03-08 | Multi-target tracking method and system for monitoring video |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114596340A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113158983A (en) * | 2021-05-18 | 2021-07-23 | 南京航空航天大学 | Airport scene activity behavior recognition method based on infrared video sequence image |
CN113269098A (en) * | 2021-05-27 | 2021-08-17 | 中国人民解放军军事科学院国防科技创新研究院 | Multi-target tracking positioning and motion state estimation method based on unmanned aerial vehicle |
CN113420607A (en) * | 2021-05-31 | 2021-09-21 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Multi-scale target detection and identification method for unmanned aerial vehicle |
2022-03-08: application CN202210220010.0A filed in China (publication CN114596340A), status: Pending
Non-Patent Citations (4)
Title |
---|
Zhou Jiaqi et al.: "Multi-target tracking algorithm for UAVs based on fused data association", Ship Electronic Engineering, pages 48-54 *
Jiang Rongqi et al.: "Improved YOLOv4 small-target detection algorithm with embedded scSE module", Journal of Graphics, page 2 *
Zhao Duoduo; Zhang Jianwu; Fu Jianfeng: "Research on a real-time pedestrian-flow statistics method based on deep learning", Chinese Journal of Sensors and Actuators, no. 08 *
Huang Yufu et al.: "Research on a fruit image recognition algorithm based on multi-scale feature fusion", Journal of Changchun University of Science and Technology (Natural Science Edition), page 1 *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115880620A (en) * | 2022-09-13 | 2023-03-31 | 中信重工开诚智能装备有限公司 | Personnel counting method applied to cart early warning system |
CN115880620B (en) * | 2022-09-13 | 2023-11-07 | 中信重工开诚智能装备有限公司 | Personnel counting method applied to cart early warning system |
CN115464659A (en) * | 2022-10-05 | 2022-12-13 | 哈尔滨理工大学 | Mechanical arm grabbing control method based on deep reinforcement learning DDPG algorithm of visual information |
CN115464659B (en) * | 2022-10-05 | 2023-10-24 | 哈尔滨理工大学 | Mechanical arm grabbing control method based on visual information deep reinforcement learning DDPG algorithm |
CN116403162A (en) * | 2023-04-11 | 2023-07-07 | 南京航空航天大学 | Airport scene target behavior recognition method and system and electronic equipment |
CN116403162B (en) * | 2023-04-11 | 2023-10-27 | 南京航空航天大学 | Airport scene target behavior recognition method and system and electronic equipment |
CN118429948A (en) * | 2024-07-05 | 2024-08-02 | 广州国交润万交通信息有限公司 | Traffic signal lamp monitoring method and device based on video |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110059558B (en) | Orchard obstacle real-time detection method based on improved SSD network | |
CN109559320B (en) | Method and system for realizing visual SLAM semantic mapping function based on hole convolution deep neural network | |
CN110660082B (en) | Target tracking method based on graph convolution and trajectory convolution network learning | |
CN114596340A (en) | Multi-target tracking method and system for monitoring video | |
CN109800689B (en) | Target tracking method based on space-time feature fusion learning | |
CN111709416B (en) | License plate positioning method, device, system and storage medium | |
JP6650657B2 (en) | Method and system for tracking moving objects in video using fingerprints | |
CN111626128A (en) | Improved YOLOv 3-based pedestrian detection method in orchard environment | |
CN112052802B (en) | Machine vision-based front vehicle behavior recognition method | |
CN108022258B (en) | Real-time multi-target tracking method based on single multi-frame detector and Kalman filtering | |
CN112101221A (en) | Method for real-time detection and identification of traffic signal lamp | |
Javadi et al. | Vehicle detection in aerial images based on 3D depth maps and deep neural networks | |
CN112991391A (en) | Vehicle detection and tracking method based on radar signal and vision fusion | |
CN110532937B (en) | Method for accurately identifying forward targets of train based on identification model and classification model | |
JP2014071902A5 (en) | ||
CN105809716A (en) | Superpixel and three-dimensional self-organizing background subtraction algorithm-combined foreground extraction method | |
CN110567324A (en) | multi-target group threat degree prediction device and method based on DS evidence theory | |
CN116434088A (en) | Lane line detection and lane auxiliary keeping method based on unmanned aerial vehicle aerial image | |
CN117949942B (en) | Target tracking method and system based on fusion of radar data and video data | |
CN113689459B (en) | Real-time tracking and mapping method based on GMM and YOLO under dynamic environment | |
CN114332444A (en) | Complex starry sky background target identification method based on incremental drift clustering | |
Fan et al. | Covered vehicle detection in autonomous driving based on faster rcnn | |
Jiang et al. | Surveillance from above: A detection-and-prediction based multiple target tracking method on aerial videos | |
CN112069997A (en) | Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net | |
CN106650814B (en) | Outdoor road self-adaptive classifier generation method based on vehicle-mounted monocular vision |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2022-06-07