CN114596340A - Multi-target tracking method and system for monitoring video

Multi-target tracking method and system for monitoring video

Info

Publication number
CN114596340A
Authority
CN
China
Prior art keywords
target
tracking
monitoring video
tracked
matching
Prior art date
Legal status
Pending
Application number
CN202210220010.0A
Other languages
Chinese (zh)
Inventor
丁萌
周嘉麒
曹云峰
魏丽
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202210220010.0A
Publication of CN114596340A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-target tracking method and system for surveillance video, relating to the technical fields of computer vision and civil aviation traffic engineering. The method acquires a real-time surveillance video to be tracked; splits the video into a sequence of video frames to be tracked; inputs the frame sequence into a multi-target recognition model to obtain a sequence of target information groups; and, from this sequence, determines the tracking trajectories of multiple targets in the video using a Kalman filtering algorithm and a multi-target tracking model. Because the multi-target recognition model and the multi-target tracking model are obtained by improving and training the YOLOv4 and DeepSORT neural networks, the tracking trajectories of multiple targets in the real-time surveillance video to be tracked can be determined automatically, which improves the intelligence level of video surveillance.

Description

Multi-target tracking method and system for monitoring video
Technical Field
The invention relates to the technical fields of computer vision and civil aviation traffic engineering, and in particular to a multi-target tracking method and system for surveillance video.
Background
In recent years, with the rapid development of the civil aviation industry, airport areas have grown steadily larger, traffic on airport surface elements such as runways, taxiways and aprons has become increasingly complex, and the probability of surface collisions between aircraft has risen. Large airports such as Beijing, Shanghai and Xi'an operate multiple runways, and the airport surface is often congested. In addition, the terminal building blocks the line of sight, so monitoring blind areas exist on aprons and parts of the taxiways, creating potential safety hazards for surface traffic control. Surveillance of the airport surface is therefore essential to help controllers maintain an accurate picture of the airport traffic situation.
Because existing automatic monitoring systems offer only simple functions, manual visual observation remains the main monitoring means at large domestic airports. At present, most high-traffic international airports (Hangzhou Xiaoshan International Airport, Chongqing Jiangbei International Airport, Shenzhen Bao'an International Airport and others) use semi-manual, semi-automatic monitoring systems. As airport passenger flow keeps growing, especially during flight rush hours, supervising apron safety becomes more difficult and places higher demands on the working capacity of apron monitoring personnel. Because there are many vehicles and people on the surface, working time limits are tight and the environment is relatively harsh, apron security staffing is generally insufficient; traditional manual visual monitoring has a safety bottleneck and can easily allow airport incidents caused by human factors. Conventional manual visual observation therefore struggles to meet the monitoring requirements of the airport surface, and the intelligence level of the surface video monitoring system needs to be improved.
Disclosure of Invention
The invention aims to provide a multi-target tracking method and a multi-target tracking system for a surveillance video, which can track a plurality of targets in the surveillance video and improve the intelligent level of video surveillance.
In order to achieve the purpose, the invention provides the following scheme:
a multi-target tracking method for surveillance videos comprises the following steps:
acquiring a real-time monitoring video to be tracked;
performing framing processing on a real-time monitoring video to be tracked to obtain a video frame sequence to be tracked;
inputting the video frame sequence to be tracked into a multi-target recognition model to obtain a target information group sequence; any target information group in the target information group sequence comprises the coordinates and the types of all targets in the same video frame to be tracked; the multi-target recognition model is obtained by training a YOLOv4-TS neural network by using a historical monitoring video; the YOLOv4-TS neural network is obtained by adding a spatial pyramid pooling module in a YOLOv4 neural network;
determining the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked by utilizing a Kalman filtering algorithm and a multi-target tracking model according to the target information group sequence; the multi-target tracking model is obtained by training a DeepSORT neural network by using historical monitoring videos.
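For orientation only, the following minimal Python sketch shows how the four steps above could be chained; the detector and tracker objects, their method names and the frame-by-frame interface are illustrative assumptions, not part of the disclosed models.

```python
# Minimal pipeline sketch (assumed interfaces): frame the video, detect targets
# per frame with the trained YOLOv4-TS model, then associate detections into
# tracks with the trained DeepSORT-style tracker.
import cv2

def track_surveillance_video(video_path, detector, tracker):
    """detector(frame) -> list of (box, class); tracker.update(...) -> tracks."""
    cap = cv2.VideoCapture(video_path)
    all_tracks = []
    while True:
        ok, frame = cap.read()          # framing: one video frame per iteration
        if not ok:
            break
        detections = detector(frame)    # target information group: boxes + classes
        tracks = tracker.update(frame, detections)  # Kalman prediction + matching
        all_tracks.append(tracks)
    cap.release()
    return all_tracks
```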
Optionally, before the obtaining of the real-time monitoring video to be tracked, the method further includes:
acquiring a historical monitoring video;
extracting a plurality of historical monitoring video frames from the historical monitoring video according to preset time length to serve as a first historical monitoring video frame sequence;
labeling target information in each historical monitoring video frame in a first historical monitoring video frame sequence to obtain a labeled historical monitoring video frame sequence;
determining a historical target information group sequence of an annotated historical monitoring video frame sequence;
and training the YOLOv4-TS neural network by taking the marked historical monitoring video frame sequence as input and the historical target information group sequence as expected output to obtain the multi-target recognition model.
Optionally, before the obtaining of the real-time monitoring video to be tracked, the method further includes:
amplifying the sizes of the convolution layer, the residual module and the network output size of the DeepSORT neural network to obtain an amplified DeepSORT neural network;
carrying out dimensionality reduction on the network structure of the amplified DeepSORT neural network to obtain an improved DeepSORT neural network;
performing frame division processing on the historical monitoring video to obtain a plurality of historical monitoring video frames serving as a second historical monitoring video frame sequence;
carrying out related labeling on the same target information in the second historical monitoring video frame sequence by using a Darklabel tool to obtain historical tracking tracks of a plurality of targets;
and taking a plurality of historical target information group sequences as input, taking historical tracking tracks of a plurality of targets as expected output, and training the improved DeepSORT neural network to obtain the multi-target tracking model.
Optionally, before obtaining the real-time monitoring video to be tracked, the method further includes:
acquiring a historical monitoring video which is the same as the monitoring scene of the real-time monitoring video to be tracked as a pre-training video;
and training the multi-target tracking model by using the pre-training video to obtain a plurality of initial tracking tracks in a monitoring scene.
Optionally, the determining, according to the target information group sequence, the tracking trajectories of the multiple targets in the real-time monitoring video to be tracked by using a Kalman filtering algorithm and a multi-target tracking model specifically includes:
making the iteration number m equal to 1;
determining the initial tracking track as the tracking track of the 0 th iteration;
matching a plurality of targets in the mth target information group in the target information group sequence with a plurality of tracking tracks in the m-1 iteration by using a Hungarian algorithm and a cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group;
matching a plurality of targets in the first unmatched target group with a plurality of tracking tracks in the (m-1) th iteration by using an IoU matching algorithm to obtain a second target-tracking track matching group;
combining the first target-tracking track matching group and the second target-tracking track matching group into a total matching group;
updating the corresponding tracking tracks according to the coordinates of the targets in the total matching group to obtain a plurality of tracking tracks in the mth iteration;
performing real-time simulation display on a plurality of tracking tracks in the mth iteration by using a Kalman filtering algorithm;
and increasing the value of m by 1 and returning to the step of matching a plurality of targets in the mth target information group in the target information group sequence with a plurality of tracking tracks in the m-1 iteration by using a Hungarian algorithm and a cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group until the target information group sequence is traversed to obtain the tracking tracks of the plurality of targets in the time period of the real-time monitoring video to be tracked.
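A compact sketch of the iteration described above is given below; the helper functions cascade_match, iou_match and update_track are hypothetical stand-ins for the Hungarian/cascade matching, the IoU matching and the Kalman update steps.

```python
# Sketch of the per-frame track update loop described above (assumed helpers).
def run_tracking(target_info_groups, initial_tracks):
    tracks = initial_tracks                      # tracking tracks of the 0th iteration
    history = []
    for m, detections in enumerate(target_info_groups, start=1):
        # 1) Hungarian + cascade matching against the tracks of iteration m-1
        matched_1, unmatched = cascade_match(detections, tracks)
        # 2) IoU matching for the detections left unmatched
        matched_2, unmatched = iou_match(unmatched, tracks)
        # 3) merge the two matching groups and update the matched tracks
        for det, trk in matched_1 + matched_2:
            update_track(trk, det)               # Kalman update with the detection box
        history.append(list(tracks))             # tracks of the mth iteration
    return tracks, history
```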
Optionally, the matching of multiple targets in the mth target information group in the target information group sequence and multiple tracking tracks in the m-1 st iteration by using the hungarian algorithm and the cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group specifically includes:
determining any target in the mth target information group as a current target;
determining any one of the plurality of tracking tracks in the m-1 iteration as a current tracking track;
determining the motion information value of the current target and the current tracking track according to the coordinates of the current target by using the formula

d^(1)(i, j) = (D_j - P_i)^T S_i^(-1) (D_j - P_i),

wherein d^(1)(i, j) is the motion information value of the ith tracking track and the jth target, D_j denotes the detection result of the jth target, P_i represents the ith tracking track, and S_i represents the covariance matrix between the average track position obtained by Kalman filtering prediction and the detection position;
determining a first matching metric value of the current target and the current tracking track according to the motion information value;
determining the appearance information value of the current target and the current tracking track according to the appearance characteristics of the current target by using the formula

d^(2)(i, j) = min{ 1 - r_j^T r_k^(i) | r_k^(i) ∈ R_i },

wherein d^(2)(i, j) is the appearance information value of the ith tracking track and the jth target, r_j^T is the transpose of the appearance feature vector r_j of the jth target, r_k^(i) represents the kth feature on the ith tracking track, and R_i represents the feature set of the ith tracking track;
determining a second matching metric value of the current target and the current tracking track according to the appearance information value;
determining a total matching metric value from the first matching metric value and the second matching metric value by using the formula

b_{i,j} = ∏_{m=1}^{2} b_{i,j}^(m),

wherein b_{i,j} is the total matching metric value of the ith tracking track and the jth target, and b_{i,j}^(m) represents the mth matching metric value, m being 1 or 2;
judging whether the total matching metric value is 1 or not to obtain a first judgment result;
if the first judgment result is yes, adding the current target and the current tracking track into a first target-tracking track matching group;
if the first judgment result is negative, adding the current target into a first unmatched target group;
and updating the current tracking track and returning to the step of determining the motion information value of the current target and the current tracking track according to the coordinates of the current target by using the formula d^(1)(i, j) = (D_j - P_i)^T S_i^(-1) (D_j - P_i) until the plurality of tracking tracks in the (m-1)th iteration are traversed; then updating the current target and returning to the step of determining any one of the plurality of tracking tracks in the (m-1)th iteration as the current tracking track until the mth target information group is traversed, so as to obtain the first target-tracking track matching group and the first unmatched target group.
Optionally, the determining a first matching metric value of the current target and the current tracking track according to the motion information value specifically includes:
judging whether the motion information value is larger than a motion information threshold value or not to obtain a second judgment result;
if the second judgment result is negative, the first matching metric value is made to be 1;
if the second determination result is yes, the first matching metric value is set to 0.
Optionally, the determining a second matching metric value of the current target and the current tracking track according to the appearance information value specifically includes:
judging whether the appearance information value is larger than an appearance information threshold value or not to obtain a third judgment result;
if the third judgment result is negative, the second matching metric value is set to 1;
if the third judgment result is yes, the second matching metric value is set to 0.
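The two thresholding rules above and the product formula for the total matching metric amount to the following small sketch (variable names and threshold handling are illustrative):

```python
# Sketch of the gating described above: each distance is turned into a binary
# matching metric by thresholding, and a pair is admissible only if both are 1.
def match_metric(d_motion, d_appearance, t_motion, t_appearance):
    b1 = 1 if d_motion <= t_motion else 0          # first matching metric value
    b2 = 1 if d_appearance <= t_appearance else 0  # second matching metric value
    return b1 * b2                                 # total metric b_ij = b1 * b2
```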
A multi-target tracking system for surveillance videos, comprising:
the to-be-tracked real-time monitoring video acquisition module is used for acquiring a to-be-tracked real-time monitoring video;
the framing module is used for framing the real-time monitoring video to be tracked to obtain a video frame sequence to be tracked;
the target information group sequence determining module is used for inputting the video frame sequence to be tracked into the multi-target recognition model to obtain a target information group sequence; any target information group in the target information group sequence comprises the coordinates and the types of all targets in the same video frame to be tracked; the multi-target recognition model is obtained by training a YOLOv4-TS neural network by using a historical monitoring video; the YOLOv4-TS neural network is obtained by adding a spatial pyramid pooling module in a YOLOv4 neural network;
the tracking track determining module is used for determining the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked by utilizing a Kalman filtering algorithm and a multi-target tracking model according to the target information group sequence; the multi-target tracking model is obtained by training a DeepSORT neural network by using historical monitoring videos.
Optionally, the system further includes:
the historical monitoring video acquisition module is used for acquiring historical monitoring videos;
the first historical monitoring video frame sequence extraction module is used for extracting a plurality of historical monitoring video frames from the historical monitoring video according to preset time length to serve as a first historical monitoring video frame sequence;
the annotated historical monitoring video frame sequence determining module is used for marking target information in each historical monitoring video frame in the first historical monitoring video frame sequence to obtain an annotated historical monitoring video frame sequence;
the historical target information group sequence determining module is used for determining a historical target information group sequence for marking a historical monitoring video frame sequence;
and the multi-target identification model determining module is used for training the YOLOv4-TS neural network by taking the marked historical monitoring video frame sequence as input and the historical target information group sequence as expected output to obtain the multi-target identification model.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the method is obtained by improving and training the YOLOv4 neural network and the DeepsORT neural network, and the multi-target recognition model and the multi-target tracking model can automatically determine the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked, so that the intelligent level of video monitoring is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flowchart of a multi-target tracking method for surveillance videos according to an embodiment of the present invention;
FIG. 2 is a flow chart of the construction of an ASMD dataset according to an embodiment of the present invention;
FIG. 3 is a flowchart of YOLOv4-TS target detection in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a YOLOv4 neural network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a YOLOv4-TS neural network structure according to an embodiment of the present invention;
FIG. 6 is a diagram of a structure of a PANET network according to an embodiment of the present invention;
FIG. 7 is a flow chart of the operation of the multi-target tracking model in an embodiment of the invention;
FIG. 8 is a schematic structural diagram of a multi-target tracking system for surveillance videos according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a data acquisition module according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an airport surface target detection module based on YOLOv4-TS according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of an improved DeepSORT-based multi-target tracking module in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention aims to provide a multi-target tracking method and a multi-target tracking system for a surveillance video, which can track a plurality of targets in the surveillance video and improve the intelligent level of video surveillance.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The invention provides a multi-target tracking method of a surveillance video, which comprises the following steps:
acquiring a real-time monitoring video to be tracked;
performing framing processing on a real-time monitoring video to be tracked to obtain a video frame sequence to be tracked;
inputting a video frame sequence to be tracked into a multi-target recognition model to obtain a target information group sequence; any target information group in the target information group sequence comprises the coordinates and the types of all targets in the same video frame to be tracked; the multi-target recognition model is obtained by training a YOLOv4-TS neural network by using a historical monitoring video; the YOLOv4-TS neural network is obtained by adding a spatial pyramid pooling module in the YOLOv4 neural network;
determining the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked by utilizing a Kalman filtering algorithm and a multi-target tracking model according to the target information group sequence; the multi-target tracking model is obtained by training a DeepSORT neural network by using historical monitoring videos.
In addition, the multi-target tracking method for the surveillance videos, provided by the invention, further comprises the following steps before the real-time surveillance video to be tracked is obtained:
acquiring a historical monitoring video;
extracting a plurality of historical monitoring video frames from a historical monitoring video according to a preset time length to serve as a first historical monitoring video frame sequence;
labeling target information in each historical monitoring video frame in the first historical monitoring video frame sequence to obtain a labeled historical monitoring video frame sequence;
determining a historical target information group sequence of an annotated historical monitoring video frame sequence;
and training the YOLOv4-TS neural network by taking the marked historical monitoring video frame sequence as input and the historical target information group sequence as expected output to obtain the multi-target recognition model.
In addition, the multi-target tracking method for the surveillance videos, provided by the invention, further comprises the following steps before the real-time surveillance video to be tracked is obtained:
amplifying the size of the convolution layer, the residual module and the network output size of the DeepSORT neural network to obtain an amplified DeepSORT neural network;
carrying out dimensionality reduction on the network structure of the amplified DeepSORT neural network to obtain an improved DeepSORT neural network;
performing frame division processing on the historical monitoring video to obtain a plurality of historical monitoring video frames serving as a second historical monitoring video frame sequence;
carrying out related labeling on the same target information in the second historical monitoring video frame sequence by using a Darklabel tool to obtain historical tracking tracks of a plurality of targets;
and taking the historical target information group sequences as input, taking the historical tracking tracks of the targets as expected output, and training the improved DeepSORT neural network to obtain the multi-target tracking model.
In addition, the multi-target tracking method for the surveillance videos, provided by the invention, further comprises the following steps before the real-time surveillance video to be tracked is obtained:
acquiring a historical monitoring video which is the same as a monitoring scene of a real-time monitoring video to be tracked as a pre-training video;
and training the multi-target tracking model by utilizing the pre-training video to obtain a plurality of initial tracking tracks in the monitoring scene.
Specifically, determining the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked by using a Kalman filtering algorithm and a multi-target tracking model according to a target information group sequence, specifically comprising:
making the iteration number m equal to 1;
determining the initial tracking track as the tracking track of the 0 th iteration;
matching a plurality of targets in the mth target information group in the target information group sequence with a plurality of tracking tracks in the m-1 iteration by using a Hungarian algorithm and a cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group;
matching a plurality of targets in the first unmatched target group with a plurality of tracking tracks in the (m-1) th iteration by using an IoU matching algorithm to obtain a second target-tracking track matching group;
combining the first target-tracking track matching group and the second target-tracking track matching group into a total matching group;
updating the corresponding tracking tracks according to the coordinates of the targets in the total matching group to obtain a plurality of tracking tracks in the mth iteration;
performing real-time simulation display on a plurality of tracking tracks in the mth iteration by using a Kalman filtering algorithm;
and increasing the value of m by 1 and returning to the step of matching a plurality of targets in the mth target information group in the target information group sequence with a plurality of tracking tracks in the m-1 iteration by using a Hungarian algorithm and a cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group until the target information group sequence is traversed to obtain the tracking tracks of the plurality of targets in the time period of the real-time monitoring video to be tracked.
The method comprises the following steps of matching a plurality of targets in an mth target information group in a target information group sequence with a plurality of tracking tracks in an m-1 iteration by using a Hungarian algorithm and a cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group, and specifically comprises the following steps:
determining any target in the mth target information group as a current target;
determining any one of the plurality of tracking tracks in the m-1 iteration as a current tracking track;
determining the motion information value of the current target and the current tracking track according to the coordinates of the current target by using the formula

d^(1)(i, j) = (D_j - P_i)^T S_i^(-1) (D_j - P_i),

wherein d^(1)(i, j) is the motion information value of the ith tracking track and the jth target, D_j denotes the detection result of the jth target, P_i represents the ith tracking track, and S_i represents the covariance matrix between the average track position predicted by Kalman filtering and the detection position;
determining a first matching metric value of a current target and a current tracking track according to the motion information value;
determining the appearance information value of the current target and the current tracking track according to the appearance characteristics of the current target by using the formula

d^(2)(i, j) = min{ 1 - r_j^T r_k^(i) | r_k^(i) ∈ R_i },

wherein d^(2)(i, j) is the appearance information value of the ith tracking track and the jth target, r_j^T is the transpose of the appearance feature vector r_j of the jth target, r_k^(i) represents the kth feature on the ith tracking track, and R_i represents the feature set of the ith tracking track;
determining a second matching metric value of the current target and the current tracking track according to the appearance information value;
determining a total matching metric value from the first matching metric value and the second matching metric value by using the formula

b_{i,j} = ∏_{m=1}^{2} b_{i,j}^(m),

wherein b_{i,j} is the total matching metric value of the ith tracking track and the jth target, and b_{i,j}^(m) represents the mth matching metric value, m being 1 or 2;
judging whether the total matching metric value is 1 or not to obtain a first judgment result;
if the first judgment result is yes, adding the current target and the current tracking track into a first target-tracking track matching group;
if the first judgment result is negative, adding the current target into the first unmatched target group;
and updating the current tracking track and returning to the step of determining the motion information value of the current target and the current tracking track according to the coordinates of the current target by using the formula d^(1)(i, j) = (D_j - P_i)^T S_i^(-1) (D_j - P_i) until the plurality of tracking tracks in the (m-1)th iteration are traversed; then updating the current target and returning to the step of determining any one of the plurality of tracking tracks in the (m-1)th iteration as the current tracking track until the mth target information group is traversed, so as to obtain the first target-tracking track matching group and the first unmatched target group.
Specifically, determining a first matching metric value of the current target and the current tracking track according to the motion information value specifically includes:
judging whether the motion information value is larger than a motion information threshold value or not to obtain a second judgment result;
if the second judgment result is negative, the first matching metric value is made to be 1;
if the second determination result is yes, the first matching metric value is set to 0.
Further, determining a second matching metric value of the current target and the current tracking track according to the appearance information value specifically includes:
judging whether the appearance information value is larger than an appearance information threshold value or not to obtain a third judgment result;
if the third judgment result is negative, the second matching metric value is set to 1;
if the third judgment result is yes, the second matching metric value is set to 0.
Referring to fig. 1, a multi-target tracking method for surveillance videos provided by the present invention is specifically described by taking an airport scene surveillance video as an example, and the present invention includes the following steps:
s100, constructing an airport scene multifunctional data set:
The ASMD data set is constructed to include a training set and a test set for training and evaluating the YOLOv4-TS target detection algorithm, a test set for evaluating the improved DeepSORT multi-target tracking algorithm, and a training set for training the improved Re-ID network. As shown in fig. 2, step S100 may include:
s101, data acquisition:
Specifically, a high-definition camera is used to shoot surveillance video of the airport surface, and, where no classified material is involved, some airport surface cameras are accessed remotely through the camera interface to obtain raw image information.
S102, constructing a target detection data set:
Specifically, a picture is captured every 500 ms, each screenshot is labeled using the LabelImg labeling tool, and a training set and a test set for training and evaluating the YOLOv4-TS target detection algorithm are constructed.
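As an illustration of this sampling step, the sketch below grabs roughly one frame every 500 ms with OpenCV; the output path and file naming are assumptions.

```python
# Sketch: capture one still every 500 ms from a surveillance video for the
# detection data set (output directory and file names are illustrative only).
import cv2

def sample_frames(video_path, out_dir, interval_ms=500):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(1, int(round(fps * interval_ms / 1000.0)))  # frames per 500 ms
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            cv2.imwrite(f"{out_dir}/frame_{saved:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```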
S103, constructing a multi-target tracking data set:
specifically, each frame of the video is labeled by using a Darklabel tool, and a test set for evaluating the improved DeepSORT multi-target tracking algorithm is constructed.
S104, establishing a Re-ID data set:
specifically, each label picture is segmented from an original image according to the positions of different types of target detection frames by using the constructed data set, and a training set for training an improved Re-ID network is constructed.
S110, constructing an airport scene target detection model based on YOLOv 4-TS:
Specifically, the YOLOv4-TS airport surface target detection model is trained on the ASMD data set; given an input airport surface monitoring image, it outputs the detection box and coordinate information of each target. As shown in fig. 3, step S110 includes:
s111, feature extraction:
Specifically, a CSPDarknet network structure is adopted as the feature extraction network to extract features from the input surface monitoring image. The network has 53 convolution layers, i.e. CSPDarknet53 is used as the feature extraction network, and it is initialized with weights pre-trained on the MS-COCO data set for transfer learning.
S112, multi-scale feature information acquisition:
Specifically, multi-scale feature information is obtained using an SPP network. The SPP (spatial pyramid pooling) module obtains multi-scale feature information and is centered on three max-pooling layers with kernels of different sizes. With the stride of each max pooling denoted s and the kernel size denoted d, the output feature map size Y_size can be expressed by equation (1):

Y_size = ⌊(X_size - d + 2p) / s⌋ + 1 (1)

where ⌊·⌋ is the rounding-down function, X_size is the input feature map size, the stride s of the SPP module in the YOLOv4 network is 1, and p (pad) is the padding of the image, i.e. a number of pixels filled in around the periphery of the image. The padding size depends on the kernel size d and is calculated as shown in formula (2):

p = (d - 1) / 2 (2)
In fig. 4, CBL is a convolution block composed of three network layers, Conv (convolution layer), Batch Normalization (batch normalization layer) and LeakyReLU (activation function), and CBLn denotes n CBL modules connected in series; SPP represents the spatial pyramid pooling layer; Up sample and Down sample represent upsampling and downsampling, respectively. As shown in figs. 4-5, calculation shows that the input X_size and the output Y_size are equal, i.e. after processing by the SPP module the output feature map has the same size as the original feature map. The lower-level feature maps of a CNN have higher resolution and therefore contain more detailed information, while the higher-level feature maps have larger receptive fields and richer semantic information. The SPP module integrates features at more scales, absorbing the advantages of both low-level and high-level feature maps and obtaining more abstract information, thereby enlarging the range over which feature information is obtained and improving the prediction performance of the model. The invention adds SPP modules at the 38×38 and 76×76 positions of the neck network, in addition to the SPP module originally located behind the 19×19 feature map, which strengthens the detection of medium-scale and small-scale targets and better adapts to the complex environment of multi-scale targets present in camera scenes. Since the number of SPP modules in the original YOLOv4 network is increased from one to three, the improved network is named "YOLOv4-Triple SPP", abbreviated "YOLOv4-TS".
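A generic sketch of one such SPP block is shown below (PyTorch, illustrative only): with stride 1 and padding (d-1)/2 for kernels 5, 9 and 13, each pooled map keeps the input size, so the branches can be concatenated, consistent with equations (1)-(2). It is not the patented YOLOv4-TS network itself.

```python
# Sketch of a spatial pyramid pooling (SPP) block with kernels 5/9/13, stride 1
# and padding (d-1)//2, so every pooled map keeps the input spatial size and the
# four branches can be concatenated along the channel axis.
import torch
import torch.nn as nn

class SPP(nn.Module):
    def __init__(self, kernels=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=d, stride=1, padding=(d - 1) // 2)
             for d in kernels]
        )

    def forward(self, x):
        # output size Y = floor((X - d + 2p)/s) + 1 = X when s = 1, p = (d-1)/2
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)
```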
S113, multi-scale feature fusion:
in fig. 6, Class indicates Class, box indicates detection box, and mask indicates image mask, which is generally represented by a two-dimensional matrix array, as shown in fig. 6. Specifically, a PANet network is used to fuse different scale features. The method comprises the following steps: first, in the first part, with reference to the FPN structure, feature maps with the same spatial size are generated at the same network stage, using { P }2,P3,P4,P5Denotes generating feature levels for optimizing propagation paths. Second, the second part uses { N } for heuristic of ResNet architecture2,N3,N4,N5Denotes { P }2,P3,P4,P5Corresponding to the generated feature mapping, expanding the path from the N of the lowest level2At first, gradually increase to N5. Finally, each module uses the unprocessed profile PiAnd high resolution feature map NiAnd generating a new characteristic diagram in a transverse connection mode. The self-adaptive feature pool in the third part of the PANet plays the role of fusing single-layer features into multi-layer features, and the feature fusion is utilized to enable the network to have self-adaptive capacity, thereby providing powerful support for the architecture of bottom-up path expansion. The (fourth) part of the PANet is to classify and regress the feature layers fused from the (third) part. The fusion of the full connection layer is positioned in the fifth part of the PANet, the full connection layer mainly bears the work of semantic segmentation, plays the role of predicting and generating Mask, and the two branches fuse and generate Mask to finally obtain a prediction result.
S114, YOLOv4-TS network training:
specifically, training a network by using a data set of a YOLOv4-TS target detection algorithm, adjusting network hyper-parameters, and selecting an optimal network model;
s115, YOLOv4-TS target detection:
specifically, the airport scene monitoring image is sent to a YOLOv4-TS network model for multi-class target detection;
s120, constructing an improved DeepSORT-based multi-target tracking model;
Specifically, the improved deep appearance model is trained on the ASMD data set, improving the re-identification effect of the original network; then the DeepSORT multi-target tracking algorithm is used, with the detections obtained in step S110 as input, to obtain the tracking trajectories of multiple targets in the video. As shown in fig. 7, step S120 includes:
s121, Kalman filtering state prediction:
Specifically, a Kalman filtering algorithm is used to predict the state of multiple surface targets. The state prediction equation and covariance matrix equation involved in the prediction step are shown in equations (3) and (4), and the gain equation, updated optimal state equation and optimal estimated covariance matrix equation involved in the update step are shown in equations (5), (6) and (7). Here, X_{k,k} and X_{k-1,k-1} respectively represent the state vectors at times k and k-1, and X_{k,k-1} represents the state vector predicted from time k-1 to time k; P_{k,k} and P_{k-1,k-1} respectively represent the covariance matrices at times k and k-1, and P_{k,k-1} represents the covariance matrix predicted from time k-1 to time k; Z_k represents the observation vector, A represents the state transition matrix from time k-1 to k, B and U_k respectively represent the input gain matrix and the input vector, H represents the observation matrix, and the covariance matrices of the system noise and observation noise are set to Q and R, where Q and R are not affected by the system state.
X_{k,k-1} = A X_{k-1,k-1} + B U_k (3)
P_{k,k-1} = A P_{k-1,k-1} A^T + Q (4)
K_k = P_{k,k-1} H^T (H P_{k,k-1} H^T + R)^{-1} (5)
X_{k,k} = X_{k,k-1} + K_k (Z_k - H X_{k,k-1}) (6)
P_{k,k} = P_{k,k-1} - K_k H P_{k,k-1} (7)
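For reference, equations (3)-(7) translate directly into the following NumPy sketch (matrix shapes and variable names are assumptions):

```python
# Sketch of the Kalman prediction/update equations (3)-(7) above in NumPy.
import numpy as np

def kalman_predict(x, P, A, B, u, Q):
    x_pred = A @ x + B @ u                      # eq. (3) state prediction
    P_pred = A @ P @ A.T + Q                    # eq. (4) covariance prediction
    return x_pred, P_pred

def kalman_update(x_pred, P_pred, z, H, R):
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)         # eq. (5) Kalman gain
    x = x_pred + K @ (z - H @ x_pred)           # eq. (6) optimal state update
    P = P_pred - K @ H @ P_pred                 # eq. (7) covariance update
    return x, P
```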
S122, Hungarian matching:
Specifically, the detections and the predicted trajectories are further matched using the Hungarian algorithm. The matching combines a motion-information similarity based on the Mahalanobis distance with an appearance similarity based on the improved deep appearance model, and fuses the two for data association. The specific steps are as follows:
When there are uncertainty factors in the motion state of a target, the Mahalanobis distance can be used to express the motion information between the prediction box and the detection box; the calculation is shown in formula (8):

d^(1)(i, j) = (D_j - P_i)^T S_i^(-1) (D_j - P_i) (8)

where d^(1)(i, j) represents the motion matching information of the ith prediction box and the jth detection box, P_i denotes the prediction result of the ith prediction box, D_j denotes the detection result of the jth detection box, and S_i represents the covariance matrix between the average track position obtained by Kalman filtering prediction and the detection position. Furthermore, to ensure that invalid associations are filtered out, the Mahalanobis distance is thresholded with the 95% confidence interval computed from the chi-squared distribution, as shown in function (9):

b_{i,j}^(1) = 1[ d^(1)(i, j) ≤ t^(1) ] (9)

For the four-dimensional prediction space (x, y, w, h), the threshold is t^(1) = 9.4877; when the prediction box is successfully associated with the detection box, the result is 1. Here (x, y) represents the coordinates of the center point of the target, w represents the aspect ratio of the target's bounding box, and h represents the height of the bounding box.
Since the original feature network was trained on a pedestrian data set, its feature extraction objects are mainly pedestrians and its input size is only 128 × 64; the original feature network is shown in table 1. The ASMD data set constructed by the invention contains large-size targets such as vehicles and airplanes, and the input size of the original algorithm is a limitation for such targets. Therefore, the invention improves on the original network model: the input convolution layers and residual blocks are enlarged, the network input size is expanded to 128 × 128, the residual network is deepened, and the enlarged network is then dimension-reduced to keep the training model converging quickly. The improved network structure is shown in table 2.
Table 1: feature extraction network architecture before improvement (given as an image in the original document)
Table 2: improved feature extraction network architecture (given as an image in the original document)
The added deep appearance information is realized as follows. First, for each detection box D_j the corresponding appearance feature r_j is computed, subject to the constraint ||r_j|| = 1. Second, a feature set R_i is established for tracking track i; each time an association succeeds, the appearance feature is recorded into R_i, which keeps the most recent one hundred features. Finally, a minimum cosine distance is introduced to express the deep appearance information between prediction box P_i and detection box D_j. The appearance information between the prediction box and the detection box, i.e. the cosine distance, is computed with the improved network model as shown in expression (10):

d^(2)(i, j) = min{ 1 - r_j^T r_k^(i) | r_k^(i) ∈ R_i } (10)

In addition, by analogy with equation (8), d^(2)(i, j) is also thresholded, as shown in function (11); the threshold t^(2) is obtained by training the improved network model:

b_{i,j}^(2) = 1[ d^(2)(i, j) ≤ t^(2) ] (11)
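A minimal sketch of this appearance branch, assuming a per-track gallery capped at one hundred unit-norm features, is:

```python
# Sketch of the appearance branch: keep the last 100 unit-norm appearance
# features per track and score a detection by the minimum cosine distance (10).
import numpy as np
from collections import deque

class AppearanceGallery:
    def __init__(self, budget=100):
        self.features = deque(maxlen=budget)      # feature set R_i of track i

    def add(self, r):
        r = np.asarray(r, dtype=float)
        self.features.append(r / np.linalg.norm(r))   # constraint ||r_j|| = 1

    def cosine_distance(self, r_j):
        if not self.features:
            return 1.0                             # no gallery yet: maximal distance
        r_j = np.asarray(r_j, dtype=float)
        r_j = r_j / np.linalg.norm(r_j)
        # d2(i, j) = min over k of (1 - r_j^T r_k)
        return min(1.0 - float(r_j @ r_k) for r_k in self.features)
```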
S123, cascade matching:
fused data association model ci,jThe expression is shown in equation (12). Wherein, λ represents the regulation and control parameter of the model, and the value interval is [0,1 ]]For example, under the condition that the target is blocked for a long time or the motion amplitude of the camera is large, the Kalman filtering prediction effect is extremely poor and has no reference value, and at the moment, the fusion model gives up the predicted motion information, namely, the lambda is made to be 0. In addition, d with spatial position matching is added(3)(i, j) which functions as the matching metric bijIf the trace does not match the predetermined value, IoU is used to indicate that the trace is not matchedσMeasures to make up for this loss, matching measures bijThe expression is shown in formula (13).
Figure BDA0003536812200000161
Figure BDA0003536812200000162
Specifically, a cascade matching algorithm is used to match the detections with the predicted tracks. The cascade matching algorithm takes the predicted tracks and detection boxes as input, outputs a successfully matched set and an unmatched set by computing the matching metric, and iterates over the predicted tracks so that targets that appear frequently obtain matching priority, which effectively alleviates mismatching caused by probability dispersion. In addition, only prediction boxes that meet the requirements are selected for matching, which reduces the number of identity switches caused by long-term occlusion of a target and effectively improves the robustness of the algorithm.
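Combining the fused cost (12) with the gating metrics (13), one round of association could be sketched as follows; the use of scipy's linear_sum_assignment as the Hungarian step and the large cost assigned to gated-out pairs are implementation assumptions.

```python
# Sketch of fusing motion and appearance costs (12) and solving the assignment
# with the Hungarian algorithm; gate_matrix holds the binary metrics b_ij (13).
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(d_motion, d_appearance, gate_matrix, lam=0.5):
    # c_ij = lambda * d1(i, j) + (1 - lambda) * d2(i, j), lambda in [0, 1]
    cost = lam * d_motion + (1.0 - lam) * d_appearance
    cost = np.where(gate_matrix == 1, cost, 1e5)   # forbid gated-out pairs
    rows, cols = linear_sum_assignment(cost)
    matches = [(i, j) for i, j in zip(rows, cols) if gate_matrix[i, j] == 1]
    matched_js = {j for _, j in matches}
    unmatched_dets = [j for j in range(cost.shape[1]) if j not in matched_js]
    return matches, unmatched_dets
```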
S124, Hungarian-cascade joint matching algorithm:
the method comprises the following steps of inputting a predicted track set P ═ 1.. multidot.n, a detection frame set D ═ 1.. multidot.m, and a maximum continuous pause number A blockedmax
And outputting the successfully matched set M and the unsuccessfully matched set U.
Step 1, a data association model obtained by fusing motion information and appearance information, namely, a value obtained by calculation by using a formula (12) is given to a set Cm={ci,j};
And 2, fusing the matching measurement. That is, the value calculated by the formula (13) is given to the set Bm={bij};
Step 3, initializing an algorithm, and making the set M be an empty set;
step 4, initializing an unmatched detection frame set, and giving a detection frame set D to a set U;
step 5, starting from the predicted track matched to the target, traversing to AmaxThe predicted values of the unmatched targets and the set { x ] of successfully matched targets are obtainedi,j};
Step 6, updating the set, namely the { x to be obtainedi,jThe value of the sum is added to the set M, and the set U is removed from the successfully matched set xi,jAnd assigning the obtained difference set to the set U again, and completing a round of algorithm;
and 7, returning the M, U two sets as initial values to the step 5, and substituting the initial values into the next iteration.
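Steps 1-7 can be summarized by the following sketch, in which cost_fn stands for one Hungarian assignment over the fused costs of the selected tracks and the remaining detections (a hypothetical helper), and time_since_update is an assumed per-track age counter:

```python
# Sketch of the cascade in steps 1-7: tracks matched most recently get matching
# priority; "age" counts frames since a track last matched (names illustrative).
def cascade_match(tracks, detections, cost_fn, a_max):
    matches, unmatched = [], list(range(len(detections)))   # steps 3-4: M, U
    for age in range(a_max + 1):                  # step 5: traverse ages up to A_max
        if not unmatched:
            break
        candidates = [i for i, t in enumerate(tracks) if t.time_since_update == age]
        if not candidates:
            continue
        new_matches, unmatched = cost_fn(candidates, unmatched)  # steps 1-2 + Hungarian
        matches.extend(new_matches)               # step 6: update the sets M and U
    return matches, unmatched                     # step 7: feed the next iteration
```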
S125, IoU matching:
Specifically, IoU matching is performed on tracks in the unconfirmed state, unmatched tracks and unmatched detection boxes, as shown in equation (12), followed by the next iteration of the Hungarian algorithm.
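The IoU used for this supplementary matching is the standard intersection-over-union; a sketch with boxes given as (x1, y1, x2, y2) corner coordinates (an assumed format) is:

```python
# Sketch of the IoU used for the supplementary matching of unconfirmed tracks
# and leftover detections (boxes as (x1, y1, x2, y2)).
def iou(box_a, box_b):
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)
```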
S126, updating Kalman filtering parameters:
specifically, the parameters are updated and subsequently processed using Kalman filtering.
Aiming at the problem that existing airport surface monitoring equipment does not fully meet the requirement of accurate perception of surface targets, the multi-target tracking method and system based on airport surface surveillance video provided by the invention address the high workload and low efficiency of the visual monitoring methods currently used on the airport surface, use a deep learning network to perform fast multi-target tracking of the airport surface, improve the intelligence level of the surface video monitoring system to a certain extent, and reduce the dependence of surface monitoring on manual interpretation.
In addition, the invention also provides a multi-target tracking system of the monitoring video, which comprises the following steps:
the to-be-tracked real-time monitoring video acquisition module is used for acquiring a to-be-tracked real-time monitoring video;
the framing module is used for framing the real-time monitoring video to be tracked to obtain a video frame sequence to be tracked;
the target information group sequence determining module is used for inputting the video frame sequence to be tracked into the multi-target recognition model to obtain a target information group sequence; any target information group in the target information group sequence comprises the coordinates and the types of all targets in the same video frame to be tracked; the multi-target recognition model is obtained by training a YOLOv4-TS neural network by using a historical monitoring video; the YOLOv4-TS neural network is obtained by adding a spatial pyramid pooling module in the YOLOv4 neural network;
the tracking track determining module is used for determining the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked by utilizing a Kalman filtering algorithm and a multi-target tracking model according to the target information group sequence; the multi-target tracking model is obtained by training a DeepSORT neural network by using historical monitoring videos.
The historical monitoring video acquisition module is used for acquiring historical monitoring videos;
the first historical monitoring video frame sequence extraction module is used for extracting a plurality of historical monitoring video frames from the historical monitoring video according to preset time length to serve as a first historical monitoring video frame sequence;
the annotated historical monitoring video frame sequence determining module is used for marking target information in each historical monitoring video frame in the first historical monitoring video frame sequence to obtain an annotated historical monitoring video frame sequence;
the historical target information group sequence determining module is used for determining a historical target information group sequence for marking a historical monitoring video frame sequence;
and the multi-target identification model determining module is used for training the YOLOv4-TS neural network by taking the marked historical monitoring video frame sequence as input and the historical target information group sequence as expected output to obtain the multi-target identification model.
As shown in fig. 8, taking an airport surface surveillance video as an example, the multi-target tracking system for surveillance video provided by the invention includes the following modules: a data acquisition module, which reads surveillance videos or pictures of the ASMD data set from the local disk; a YOLOv4-TS-based airport surface target detection module, which trains the YOLOv4-TS airport surface target detection model on the ASMD data set and, given an input airport surface monitoring image, obtains the detection box and coordinate information of each target; and an improved-DeepSORT-based multi-target tracking module, which trains the improved deep appearance model on the ASMD data set to improve the re-identification effect of the original network, and then uses the DeepSORT multi-target tracking algorithm, with the detections obtained by the target detection module as input, to obtain the tracking trajectories of multiple targets in the video.
Further, as shown in fig. 9, the data acquisition module includes: the data set unit of the target detection algorithm is used for training and evaluating the YOLOv4-TS target detection algorithm; the multi-target tracking data set unit is used for evaluating an improved DeepSORT multi-target tracking algorithm; and a Re-ID data set unit trains the improved Re-ID network.
Further, as shown in fig. 10, the YOLOv4-TS based airport surface target detection module includes: the CSPDarknet53 feature extraction network unit is used for extracting features of an input scene monitoring image; the spatial pyramid pooling unit is used for acquiring multi-scale feature information; and the path aggregation network unit is used for fusing different scale characteristics.
As shown in fig. 11, the improved DeepSORT-based multi-target tracking module includes: a Kalman filtering model building unit, used to predict motion tracks on the airport surface; a Hungarian matching unit, used to match detections with predicted tracks using motion information and appearance information; a cascade matching unit, used to further match detections with predicted tracks; an IoU matching unit, used to perform supplementary matching for unmatched values; and a Kalman filtering parameter updating unit, used to update the system state.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A multi-target tracking method for surveillance videos is characterized by comprising the following steps:
acquiring a real-time monitoring video to be tracked;
performing framing processing on a real-time monitoring video to be tracked to obtain a video frame sequence to be tracked;
inputting the video frame sequence to be tracked into a multi-target recognition model to obtain a target information group sequence; any target information group in the target information group sequence comprises the coordinates and the types of all targets in the same video frame to be tracked; the multi-target recognition model is obtained by training a YOLOv4-TS neural network by using a historical monitoring video; the YOLOv4-TS neural network is obtained by adding a spatial pyramid pooling module in a YOLOv4 neural network;
determining the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked by using a Kalman filtering algorithm and a multi-target tracking model according to the target information group sequence; the multi-target tracking model is obtained by training a DeepSORT neural network by using historical monitoring videos.
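As a non-limiting illustration of the framing step of claim 1 (and of the interval-based frame extraction recited in claim 2 below), the following Python sketch splits a surveillance video into a frame sequence using OpenCV; the file name and the sampling stride are placeholders, not values recited in the claims.

```python
import cv2

def frame_sequence(video_path, stride=1):
    """Split a surveillance video into a frame sequence,
    keeping every `stride`-th frame."""
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames

frames = frame_sequence("surveillance_to_track.mp4", stride=1)
print(f"{len(frames)} frames extracted")
```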
2. The multi-target tracking method for the surveillance videos as claimed in claim 1, further comprising, before the obtaining of the real-time surveillance video to be tracked:
acquiring a historical monitoring video;
extracting a plurality of historical monitoring video frames from the historical monitoring video according to preset time length to serve as a first historical monitoring video frame sequence;
marking target information in each historical monitoring video frame in a first historical monitoring video frame sequence to obtain a marked historical monitoring video frame sequence;
determining a historical target information group sequence of an annotated historical monitoring video frame sequence;
and training the YOLOv4-TS neural network by taking the marked historical monitoring video frame sequence as input and the historical target information group sequence as expected output to obtain the multi-target recognition model.
3. The multi-target tracking method for the surveillance videos as claimed in claim 2, further comprising, before the obtaining of the real-time surveillance video to be tracked:
enlarging the convolution layer sizes, the residual modules and the network output size of the DeepSORT neural network to obtain an enlarged DeepSORT neural network;
carrying out dimensionality reduction on the network structure of the enlarged DeepSORT neural network to obtain an improved DeepSORT neural network;
performing framing processing on the historical monitoring video to obtain a plurality of historical monitoring video frames serving as a second historical monitoring video frame sequence;
carrying out related labeling on the same target information in the second historical monitoring video frame sequence by using a Darklabel tool to obtain historical tracking tracks of a plurality of targets;
and taking a plurality of historical target information group sequences as input, taking the historical tracking tracks of the plurality of targets as expected output, and training the improved DeepSORT neural network to obtain the multi-target tracking model.
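Claim 3 enlarges the convolution layers, residual modules and output size of the DeepSORT appearance (Re-ID) network and then reduces the dimensionality of the network structure, but does not fix concrete layer sizes. The PyTorch sketch below therefore only illustrates the general shape such a network could take; every channel count and the 256-dimensional embedding are assumptions of this illustration, not values from the claim.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels))

    def forward(self, x):
        return torch.relu(x + self.body(x))

class ReIDNet(nn.Module):
    """Sketch of an enlarged appearance-embedding network for DeepSORT."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            ResidualBlock(64),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            ResidualBlock(128),
            nn.AdaptiveAvgPool2d(1))
        self.embed = nn.Linear(128, embed_dim)   # dimensionality reduction to the embedding

    def forward(self, x):
        f = self.backbone(x).flatten(1)
        # Unit-norm appearance feature, ready for cosine comparison in matching.
        return nn.functional.normalize(self.embed(f), dim=1)

crop = torch.randn(4, 3, 128, 64)     # batch of target crops (e.g. 128x64 patches)
print(ReIDNet()(crop).shape)          # torch.Size([4, 256])
```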
4. The multi-target tracking method for the surveillance videos as claimed in claim 1, further comprising, before obtaining the real-time surveillance video to be tracked:
acquiring a historical monitoring video which is the same as the monitoring scene of the real-time monitoring video to be tracked as a pre-training video;
and training the multi-target tracking model by using the pre-training video to obtain a plurality of initial tracking tracks in a monitoring scene.
5. The multi-target tracking method for the surveillance video according to claim 4, wherein the determining, according to the target information group sequence, the tracking trajectories of the multiple targets in the real-time surveillance video to be tracked by using a Kalman filtering algorithm and a multi-target tracking model specifically comprises:
making the iteration number m equal to 1;
determining the plurality of initial tracking tracks as the tracking tracks of the 0th iteration;
matching a plurality of targets in the mth target information group in the target information group sequence with a plurality of tracking tracks in the m-1 iteration by using a Hungarian algorithm and a cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group;
matching a plurality of targets in the first unmatched target group with a plurality of tracking tracks in the (m-1) th iteration by using an IoU matching algorithm to obtain a second target-tracking track matching group;
combining the first target-tracking track matching group and the second target-tracking track matching group into a total matching group;
updating the corresponding tracking tracks according to the coordinates of the targets in the total matching group to obtain a plurality of tracking tracks during the mth iteration;
performing real-time simulation display on a plurality of tracking tracks in the mth iteration by using a Kalman filtering algorithm;
and increasing the value of m by 1 and returning to the step of matching a plurality of targets in the mth target information group in the target information group sequence with a plurality of tracking tracks in the m-1 iteration by using a Hungarian algorithm and a cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group until the target information group sequence is traversed to obtain the tracking tracks of the plurality of targets in the time period of the real-time monitoring video to be tracked.
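The two matching passes of claim 5 are assignment problems. The sketch below illustrates one possible realisation using SciPy's Hungarian solver (`linear_sum_assignment`) for the first pass and a plain IoU cost for the supplementary pass; the cost matrix, box coordinates and thresholds are made-up toy values, and the cascade's ordering of tracks by age is omitted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two axis-aligned boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def hungarian_match(cost, max_cost):
    """Solve the assignment problem; pairs costlier than max_cost stay unmatched."""
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
    unmatched_tracks = sorted(set(range(cost.shape[0])) - {r for r, _ in matches})
    unmatched_dets = sorted(set(range(cost.shape[1])) - {c for _, c in matches})
    return matches, unmatched_tracks, unmatched_dets

# Toy data: 3 predicted tracks and 4 detections (boxes are [x1, y1, x2, y2]).
tracks = np.array([[0, 0, 50, 80], [100, 100, 160, 180], [200, 50, 240, 120]], float)
dets = np.array([[2, 3, 52, 82], [98, 102, 158, 178],
                 [300, 300, 340, 360], [205, 55, 245, 125]], float)

# First pass: combined motion/appearance cost (a random stand-in here).
cost = np.random.rand(len(tracks), len(dets))
first, un_tracks, un_dets = hungarian_match(cost, max_cost=0.7)

# Second pass: IoU matching of whatever remained unmatched.
second = []
if un_tracks and un_dets:
    iou_cost = 1.0 - np.array([[iou(tracks[r], dets[c]) for c in un_dets]
                               for r in un_tracks])
    pairs, _, _ = hungarian_match(iou_cost, max_cost=0.5)
    # Map the local indices back to the original track/detection indices.
    second = [(un_tracks[r], un_dets[c]) for r, c in pairs]

print("first pass:", first, "second pass:", second)
```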
6. The multi-target tracking method for the surveillance video according to claim 5, wherein the matching of the multiple targets in the mth target information group in the target information group sequence with the multiple tracking tracks in the (m-1) th iteration by using the Hungarian algorithm and the cascade algorithm to obtain a first target-tracking track matching group and a first unmatched target group specifically comprises:
determining any target in the mth target information group as a current target;
determining any one of the plurality of tracking tracks in the m-1 iteration as a current tracking track;
according to the coordinates of the current target, determining a motion information value of the current target and the current tracking track by using the formula
d^(1)(i, j) = (D_j - P_i)^T S_i^(-1) (D_j - P_i);
wherein d^(1)(i, j) is the motion information value of the ith tracking track and the jth target, D_j denotes the detection result of the jth target, P_i represents the ith tracking track, and S_i represents the covariance matrix between the average track position predicted by Kalman filtering and the detection position;
determining a first matching metric value of the current target and the current tracking track according to the motion information value;
according to the appearance characteristics of the current target, determining an appearance information value of the current target and the current tracking track by using the formula
d^(2)(i, j) = min{ 1 - r_j^T r_k^(i) : r_k^(i) ∈ R_i };
wherein d^(2)(i, j) is the appearance information value of the ith tracking track and the jth target, r_j^T is the transpose of the appearance feature vector r_j of the jth target, r_k^(i) represents the kth appearance feature stored for the ith tracking track, and R_i represents the feature set of the ith tracking track;
determining a second matching metric value of the current target and the current tracking track according to the appearance information value;
according to the first matching metric value and the second matching metric value, determining a total matching metric value by using the formula
b_ij = ∏_{m=1}^{2} b_ij^(m);
wherein b_ij is the total matching metric value of the ith tracking track and the jth target, and b_ij^(m) represents the mth matching metric value, m being 1 or 2;
judging whether the total matching metric value is 1 or not to obtain a first judgment result;
if the first judgment result is yes, adding the current target and the current tracking track into a first target-tracking track matching group;
if the first judgment result is negative, adding the current target into a first unmatched target group;
updating the current tracking track and returning to the step of determining, according to the coordinates of the current target, the motion information value of the current target and the current tracking track by using the formula d^(1)(i, j) = (D_j - P_i)^T S_i^(-1) (D_j - P_i), until the plurality of tracking tracks in the (m-1)th iteration are traversed; and updating the current target and returning to the step of determining any one of the plurality of tracking tracks in the (m-1)th iteration as the current tracking track, until the mth target information group is traversed, to obtain the first target-tracking track matching group and the first unmatched target group.
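A compact numerical illustration of the quantities in claim 6, together with the threshold tests of claims 7 and 8 below, is sketched here. The covariance, feature vectors and thresholds are toy values chosen for the example; only the form of the motion distance d^(1), the appearance distance d^(2) and the gate b_ij follows the claim.

```python
import numpy as np

def motion_distance(det_xy, track_xy, S):
    """Mahalanobis motion distance d1(i, j) between detection D_j and predicted track P_i."""
    d = np.asarray(det_xy) - np.asarray(track_xy)
    return float(d @ np.linalg.inv(S) @ d)

def appearance_distance(det_feature, track_features):
    """Cosine appearance distance d2(i, j): smallest 1 - r_j^T r_k over the feature
    gallery R_i stored for track i (features assumed unit-normalised)."""
    return float(min(1.0 - det_feature @ r for r in track_features))

def gate(d1, d2, t1=5.9915, t2=0.2):
    """b_ij = b1 * b2: a pair is admissible only if both metrics pass their thresholds.
    t1 = 5.9915 is the 95% chi-square quantile for the 2-D toy measurement used here
    (DeepSORT uses 9.4877 for its 4-D measurement); t2 is an assumed appearance threshold."""
    return int(d1 <= t1) * int(d2 <= t2)

S = np.array([[4.0, 0.0], [0.0, 4.0]])                 # innovation covariance from the Kalman filter
d1 = motion_distance([102.0, 203.0], [100.0, 200.0], S)
feat = np.array([1.0, 0.0])
gallery = [np.array([0.98, 0.2]) / np.linalg.norm([0.98, 0.2])]
d2 = appearance_distance(feat, gallery)
print(d1, d2, gate(d1, d2))   # both distances pass their thresholds, so the gate is 1
```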
7. The multi-target tracking method for the surveillance video according to claim 6, wherein the determining a first matching metric value of the current target and the current tracking trajectory according to the motion information value specifically comprises:
judging whether the motion information value is larger than a motion information threshold value or not to obtain a second judgment result;
if the second judgment result is negative, the first matching metric value is made to be 1;
if the second determination result is yes, the first matching metric value is set to 0.
8. The multi-target tracking method for the surveillance video according to claim 6, wherein the determining a second matching metric value of the current target and the current tracking trajectory according to the appearance information value specifically includes:
judging whether the appearance information value is larger than an appearance information threshold value or not to obtain a third judgment result;
if the third judgment result is negative, the second matching metric value is made to be 1;
if the third determination result is yes, the second matching metric value is set to 0.
9. A multi-target tracking system for surveillance video, the system comprising:
the to-be-tracked real-time monitoring video acquisition module is used for acquiring a to-be-tracked real-time monitoring video;
the framing module is used for framing the real-time monitoring video to be tracked to obtain a video frame sequence to be tracked;
the target information group sequence determining module is used for inputting the video frame sequence to be tracked into the multi-target recognition model to obtain a target information group sequence; any target information group in the target information group sequence comprises the coordinates and the types of all targets in the same video frame to be tracked; the multi-target recognition model is obtained by training a YOLOv4-TS neural network by using a historical monitoring video; the YOLOv4-TS neural network is obtained by adding a spatial pyramid pooling module in a YOLOv4 neural network;
the tracking track determining module is used for determining the tracking tracks of a plurality of targets in the real-time monitoring video to be tracked by utilizing a Kalman filtering algorithm and a multi-target tracking model according to the target information group sequence; the multi-target tracking model is obtained by training a DeepSORT neural network by using historical monitoring videos.
10. The multi-target tracking system for surveillance videos of claim 9, further comprising:
the historical monitoring video acquisition module is used for acquiring historical monitoring videos;
the first historical monitoring video frame sequence extraction module is used for extracting a plurality of historical monitoring video frames from the historical monitoring video according to preset time length to serve as a first historical monitoring video frame sequence;
the marked historical monitoring video frame sequence determining module is used for marking target information in each historical monitoring video frame in the first historical monitoring video frame sequence to obtain a marked historical monitoring video frame sequence;
the historical target information group sequence determining module is used for determining a historical target information group sequence of the marked historical monitoring video frame sequence;
and the multi-target recognition model determining module is used for training the YOLOv4-TS neural network by taking the marked historical monitoring video frame sequence as input and the historical target information group sequence as expected output, to obtain the multi-target recognition model.
CN202210220010.0A 2022-03-08 2022-03-08 Multi-target tracking method and system for monitoring video Pending CN114596340A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210220010.0A CN114596340A (en) 2022-03-08 2022-03-08 Multi-target tracking method and system for monitoring video

Publications (1)

Publication Number Publication Date
CN114596340A true CN114596340A (en) 2022-06-07

Family

ID=81808080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210220010.0A Pending CN114596340A (en) 2022-03-08 2022-03-08 Multi-target tracking method and system for monitoring video

Country Status (1)

Country Link
CN (1) CN114596340A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158983A (en) * 2021-05-18 2021-07-23 南京航空航天大学 Airport scene activity behavior recognition method based on infrared video sequence image
CN113269098A (en) * 2021-05-27 2021-08-17 中国人民解放军军事科学院国防科技创新研究院 Multi-target tracking positioning and motion state estimation method based on unmanned aerial vehicle
CN113420607A (en) * 2021-05-31 2021-09-21 西南电子技术研究所(中国电子科技集团公司第十研究所) Multi-scale target detection and identification method for unmanned aerial vehicle

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ZHOU Jiaqi et al.: "Multi-target tracking algorithm for UAVs based on fused data association", Ship Electronic Engineering, pages 48-54 *
JIANG Rongqi et al.: "Improved YOLOv4 small-target detection algorithm with embedded scSE modules", Journal of Graphics, page 2 *
ZHAO Duoduo; ZHANG Jianwu; FU Jianfeng: "Research on a real-time pedestrian flow counting method based on deep learning", Chinese Journal of Sensors and Actuators, no. 08 *
HUANG Yufu et al.: "Research on a fruit image recognition algorithm based on multi-scale feature fusion", Journal of Changchun University of Science and Technology (Natural Science Edition), page 1 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115880620A (en) * 2022-09-13 2023-03-31 中信重工开诚智能装备有限公司 Personnel counting method applied to cart early warning system
CN115880620B (en) * 2022-09-13 2023-11-07 中信重工开诚智能装备有限公司 Personnel counting method applied to cart early warning system
CN115464659A (en) * 2022-10-05 2022-12-13 哈尔滨理工大学 Mechanical arm grabbing control method based on deep reinforcement learning DDPG algorithm of visual information
CN115464659B (en) * 2022-10-05 2023-10-24 哈尔滨理工大学 Mechanical arm grabbing control method based on visual information deep reinforcement learning DDPG algorithm
CN116403162A (en) * 2023-04-11 2023-07-07 南京航空航天大学 Airport scene target behavior recognition method and system and electronic equipment
CN116403162B (en) * 2023-04-11 2023-10-27 南京航空航天大学 Airport scene target behavior recognition method and system and electronic equipment
CN118429948A (en) * 2024-07-05 2024-08-02 广州国交润万交通信息有限公司 Traffic signal lamp monitoring method and device based on video

Similar Documents

Publication Publication Date Title
CN110059558B (en) Orchard obstacle real-time detection method based on improved SSD network
CN109559320B (en) Method and system for realizing visual SLAM semantic mapping function based on hole convolution deep neural network
CN110660082B (en) Target tracking method based on graph convolution and trajectory convolution network learning
CN114596340A (en) Multi-target tracking method and system for monitoring video
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN111709416B (en) License plate positioning method, device, system and storage medium
JP6650657B2 (en) Method and system for tracking moving objects in video using fingerprints
CN111626128A (en) Improved YOLOv 3-based pedestrian detection method in orchard environment
CN112052802B (en) Machine vision-based front vehicle behavior recognition method
CN108022258B (en) Real-time multi-target tracking method based on single multi-frame detector and Kalman filtering
CN112101221A (en) Method for real-time detection and identification of traffic signal lamp
Javadi et al. Vehicle detection in aerial images based on 3D depth maps and deep neural networks
CN112991391A (en) Vehicle detection and tracking method based on radar signal and vision fusion
CN110532937B (en) Method for accurately identifying forward targets of train based on identification model and classification model
JP2014071902A5 (en)
CN105809716A (en) Superpixel and three-dimensional self-organizing background subtraction algorithm-combined foreground extraction method
CN110567324A (en) multi-target group threat degree prediction device and method based on DS evidence theory
CN116434088A (en) Lane line detection and lane auxiliary keeping method based on unmanned aerial vehicle aerial image
CN117949942B (en) Target tracking method and system based on fusion of radar data and video data
CN113689459B (en) Real-time tracking and mapping method based on GMM and YOLO under dynamic environment
CN114332444A (en) Complex starry sky background target identification method based on incremental drift clustering
Fan et al. Covered vehicle detection in autonomous driving based on faster rcnn
Jiang et al. Surveillance from above: A detection-and-prediction based multiple target tracking method on aerial videos
CN112069997A (en) Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net
CN106650814B (en) Outdoor road self-adaptive classifier generation method based on vehicle-mounted monocular vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220607