CN111767847A - Pedestrian multi-target tracking method integrating target detection and association - Google Patents

Pedestrian multi-target tracking method integrating target detection and association

Info

Publication number
CN111767847A
CN111767847A
Authority
CN
China
Prior art keywords
target
sub
model
detection
tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010605987.5A
Other languages
Chinese (zh)
Other versions
CN111767847B (en)
Inventor
杨航
杨海东
黄坤山
彭文瑜
林玉山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Foshan Guangdong University CNC Equipment Technology Development Co. Ltd
Original Assignee
Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Foshan Guangdong University CNC Equipment Technology Development Co. Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute, Foshan Guangdong University CNC Equipment Technology Development Co. Ltd filed Critical Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Priority to CN202010605987.5A priority Critical patent/CN111767847B/en
Publication of CN111767847A publication Critical patent/CN111767847A/en
Application granted granted Critical
Publication of CN111767847B publication Critical patent/CN111767847B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)

Abstract

The invention discloses a pedestrian multi-target tracking method that integrates target detection and association, comprising the following steps: training a tracking model network on a training data set to obtain a tracking model; passing the first frame of the video stream to be tracked through the detection sub-model and generating a bounding box for each pedestrian target from the heat map and offset vectors; extracting a feature vector for each pedestrian target with the appearance feature extraction sub-model and assigning an ID and a trajectory; passing the remaining frames of the video stream through the tracking model in sequence, determining the trajectory position of each pedestrian in the current frame from the similarity of the feature vectors in adjacent frames, and connecting the trajectory positions with the same ID across all frames to obtain the tracking result. The invention overcomes the slow speed and the frequent identity switches under object occlusion that afflict traditional tracking methods.

Description

Pedestrian multi-target tracking method integrating target detection and association
Technical Field
The invention relates to the field of pedestrian tracking, in particular to a pedestrian multi-target tracking method integrating target detection and association.
Background
The main task of multi-target tracking is, given an image sequence, to find the moving objects in it, match the moving objects across frames one by one, and output the motion trajectory of each object. These objects can be arbitrary, such as pedestrians, vehicles, athletes, or various animals, and pedestrian tracking is the most studied case. First, the pedestrian is a typical non-rigid target and therefore harder to track than a rigid one; second, detecting and tracking pedestrians has the greater commercial value in practical applications. According to incomplete statistics, at least 75% of multi-target tracking studies concern pedestrian tracking.
Thanks to the development and application of convolutional neural networks, many tasks in computer vision have advanced greatly, and many detection methods based on convolutional neural networks have been applied to image recognition. Extending deep learning to pedestrian multi-target tracking, i.e. designing a deep learning algorithm suited to the multi-target tracking problem, is a challenging subject. Common multi-target pedestrian tracking methods in the prior art split target detection and association into two steps trained on different data sets; this makes the tracking model slow to run, so real-time tracking is difficult to achieve.
Disclosure of Invention
In view of the shortcomings of the prior art, the present invention aims to provide a pedestrian multi-target tracking method that integrates target detection and association.
In order to achieve this purpose, the invention adopts the following technical scheme. A pedestrian multi-target tracking method integrating target detection and association comprises the following steps:
S01: training the tracking model network with a training data set to obtain a tracking model; the tracking model comprises a detection sub-model and an appearance feature extraction sub-model connected behind it; specifically:
S011: training the detection sub-model network with a first training data set to obtain the detection sub-model; the detection sub-model network comprises, in sequence, two convolution-pooling sub-networks and two recursive sub-networks connected in series, the output of the last convolution-pooling sub-network being connected to the input of the first recursive sub-network; each convolution-pooling sub-network comprises a convolutional layer and a pooling layer, with the output of the convolutional layer connected to the input of the pooling layer;
S012: training the appearance feature extraction sub-model network with a second training data set to obtain the appearance feature extraction sub-model; the appearance feature extraction sub-model network comprises, in sequence, a convolutional layer, a pooling layer, three residual blocks and a fully connected layer;
S02: inputting the video stream to be tracked into the tracking model to obtain the tracking result; specifically:
S021: passing the first frame of the video stream to be tracked through the detection sub-model, generating a bounding box for each pedestrian target from the heat map and offset vectors; extracting a feature vector for each pedestrian target with the appearance feature extraction sub-model and assigning an ID and a trajectory;
S022: passing the remaining frames of the video stream to be tracked through the tracking model in sequence: each frame passes through the detection sub-model, which generates bounding boxes for the pedestrian targets from the heat map and offset vectors, and then through the appearance feature extraction sub-model, which generates a feature vector for each pedestrian target; determining the trajectory position of each pedestrian in the current frame from the distance between feature vectors in adjacent frames, and connecting the trajectory positions with the same ID across all frames of the video stream to obtain the tracking result.
Further, step S011 specifically comprises:
S0111: passing the first training data set through the two convolution-pooling sub-networks and the two recursive sub-networks in sequence, outputting a heat map;
S0112: setting the hyper-parameters of the detection sub-model network, inputting the first training data set into the detection sub-model network for training, and determining the parameters of the detection sub-model network; the hyper-parameters of the detection sub-model network comprise the learning rate, the number of iterations, the batch size and the heat-map score threshold;
S0113: on the heat map, performing non-maximum suppression on the heat-map scores to extract peak keypoints, keeping the positions of the keypoints whose score exceeds the threshold, and computing the bounding-box coordinates to obtain a heat map containing bounding boxes.
Further, before each downsampling in the recursive sub-network, an upper half-path is split off to retain the original-scale information; after each upsampling, the data of the previous scale is added back; three residual blocks extract features between two successive downsamplings, and one residual block extracts features between two successive additions.
Further, the residual block comprises a convolution path and a skip path; the convolution path comprises three convolutional layers with different kernels connected in series, and the skip path comprises one convolutional layer with a 1×1 kernel.
Further, step S012 specifically comprises:
S0121: inputting the heat map containing bounding boxes into the appearance feature extraction sub-model network and setting a similarity threshold; two target features are judged to belong to the same target if their similarity is greater than or equal to the similarity threshold, and to different targets if their similarity is below it;
S0122: setting the hyper-parameters of the appearance feature extraction sub-model network, inputting the second training data set into it, and determining the parameters of the appearance feature extraction sub-model network; its hyper-parameters comprise the learning rate, the number of iterations, the batch size and the similarity threshold.
Further, in step S022, the distance between the detection targets in frames M-1 and M of the video stream to be tracked is measured from the similarity of the feature vectors and the IoU; when the distance is below a distance threshold, the match is considered successful and the target inherits the ID of the corresponding target in the previous frame; otherwise the match fails and the target is assigned a new ID.
Further, the similarity is calculated using the Mahalanobis distance or the Euclidean distance.
Further, in step S022, Kalman filtering is used to predict the position of each trajectory in the current frame, and if the distance between the predicted trajectory position and a detection to be linked exceeds the trajectory threshold, that distance is set to infinity.
Further, the first training data set comprises data sets from the ETH, CityPersons, CalTech, MOT17, CUHK-SYSU and PRW databases.
Further, the second training data set comprises data sets from the CalTech, MOT17, CUHK-SYSU and PRW databases.
The beneficial effects of the invention are as follows. To strike a good balance between precision and speed, additional skip connections are placed between low-level and high-level features, and intermediate supervision is adopted: the loss is computed on the output of both recursive sub-networks, so repeated bidirectional inference keeps the bottom-layer parameters properly updated and improves detection accuracy. The invention abandons the traditional separation of the detection part from the association part; integrating the two modules greatly reduces the computation and running time. Reducing the dimensionality of the feature vector in the appearance feature extraction sub-model mitigates overfitting, improves robustness, and further reduces computation and running time. The method overcomes the slow speed and frequent identity switches under object occlusion of traditional tracking methods and can be applied to video surveillance of areas with heavy pedestrian flow, such as intersections.
Drawings
FIG. 1 is a schematic view of a tracking model of the present invention;
FIG. 2 is a schematic diagram of a recursive subnetwork structure of the present invention;
FIG. 3 is a schematic diagram of a residual block structure according to the present invention;
FIG. 4 is a flow chart of the pedestrian tracking of the video stream to be tracked according to the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and the detailed description below:
the invention provides a pedestrian multi-target tracking method integrating target detection and association, which comprises the following steps:
S01: training the tracking model network with a training data set to obtain a tracking model; the tracking model comprises a detection sub-model and an appearance feature extraction sub-model connected behind it. The tracking model network is a preliminary model framework in which the model parameters are still undetermined; the purpose of training is to determine these parameters, and substituting the determined parameters into the model network yields the trained tracking model. The training data set adopted in the invention can be, but is not limited to, data collected from existing pedestrian databases. The ETH and CityPersons data sets provide only bounding-box annotations and can serve as a first training data set for training the detection sub-model network; the CalTech, MOT17, CUHK-SYSU and PRW databases provide bounding boxes and identity annotations, so they can serve as a second training data set for training the appearance feature extraction sub-model network and also as part of the first training data set for training the detection sub-model network.
The specific training process comprises the following steps: training the detection sub-model network, then training the appearance feature extraction sub-model network.
S011: training the detection sub-model network with the first training data set to obtain the detection sub-model; the detection sub-model network comprises, in sequence, two convolution-pooling sub-networks and two recursive sub-networks connected in series, the output of the last convolution-pooling sub-network being connected to the input of the first recursive sub-network; each convolution-pooling sub-network comprises a convolutional layer and a pooling layer, with the output of the convolutional layer connected to the input of the pooling layer. The specific training process of the detection sub-model network comprises the following steps:
S0111: the first training data set passes in sequence through the two convolution-pooling sub-networks and the two recursive sub-networks, and a heat map is output. As shown in fig. 1, the input passes through successive convolutional, pooling, convolutional and pooling layers; the convolutional layers use 3×3 kernels and the pooling layers use max pooling, yielding a feature map at 1/4 of the original resolution. This feature map is then input into the two successive recursive sub-networks, which output a heat map. The recursive sub-network structure is shown in fig. 2: from the left to the middle the depth of the feature maps increases while their resolution decreases, and from the middle to the right the depth decreases while the resolution increases. In the invention, before each downsampling in the recursive sub-network an upper half-path is split off to retain the original-scale information, and after each upsampling the data of the previous scale is added back. For example, the c4b layer is the combination of the c7 layer and the c4a layer. The c7 layer is upsampled to double its resolution (e.g. a 4×4 feature map becomes 8×8 after upsampling). The c4a layer has the same size as the c4 layer and can be regarded as a "copy" of it; its feature map is twice the size of c7's, so it matches the upsampled c7 exactly and the two can be added directly, giving the c4b layer.
The above process includes multiple pooling and upsampling steps: the pooling layers use max pooling to reduce redundancy, and the upsampling layers use nearest-neighbour interpolation. Before each downsampling a path is split off to retain the original-scale information, and after each upsampling it is fused with the features of the previous scale. Specifically, three residual blocks extract features between two successive downsamplings, and one residual block extracts features between two successive feature-map fusions. The advantage of this structure is that the features of an object may appear at different network layers; a conventional convolutional network easily loses such features, whereas skip connections combine multi-scale features effectively.
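As an illustration of this skip-and-add structure, the following is a minimal PyTorch sketch of one recursive (hourglass-style) sub-network; the module names, fixed channel width and recursion depth of 4 are assumptions for the sketch rather than values from the patent, and the stand-in res_blocks helper is a placeholder for the residual blocks described below.

```python
# A hedged sketch of one recursive sub-network; not the patent's exact network.
import torch
import torch.nn as nn

def res_blocks(ch, n):
    # Stand-in for n residual blocks (see the residual-block sketch below).
    return nn.Sequential(*[nn.Sequential(
        nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True)) for _ in range(n)])

class RecursiveSubnet(nn.Module):
    """One recursion level: split off an upper half-path before downsampling,
    recurse at half resolution, upsample, and add the previous scale back."""
    def __init__(self, ch, depth=4):
        super().__init__()
        self.skip = nn.Identity()            # upper half-path: a "copy" (e.g. c4 -> c4a)
        self.pool = nn.MaxPool2d(2)          # downsampling by max pooling
        self.down = res_blocks(ch, 3)        # three residual blocks between downsamplings
        self.inner = RecursiveSubnet(ch, depth - 1) if depth > 1 else res_blocks(ch, 1)
        self.up = nn.Upsample(scale_factor=2, mode='nearest')  # nearest-neighbour upsampling
        self.post = res_blocks(ch, 1)        # one residual block between two additions

    def forward(self, x):
        y = self.down(self.pool(x))          # lower path at half resolution
        y = self.up(self.inner(y))           # e.g. c7 upsampled to match c4a
        return self.post(self.skip(x) + y)   # e.g. c4b = c4a + up(c7)
```

In the full model, two such sub-networks would be cascaded, each followed by small output heads (not shown here) producing the single-channel heat map and the offset regression.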
The residual block in the recursive sub-network is shown in fig. 3 and comprises a convolution path and a skip path: the convolution path consists of three convolutional layers with different kernel sizes connected in series, and the skip path consists of one convolutional layer with a 1×1 kernel. M is the number of input channels (the input depth) and N the number of output channels (the output depth). The residual block can increase or decrease the depth of the feature map without changing its resolution: the convolution path extracts higher-level features while the skip path retains the original-level information, so the block changes only the data depth, never the data size. It can be regarded as a size-preserving, enhanced convolutional layer. This structure also keeps features from being lost easily and effectively mitigates the vanishing-gradient problem. Each recursive sub-network outputs a single-channel heat map at the same resolution as the original image and regresses an offset vector for each corresponding target point on the heat map.
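A minimal sketch of this residual block follows, assuming a 1×1, 3×3, 1×1 bottleneck layout for the three series convolutions; the patent states only that the three kernels differ, so the exact sizes and the halved intermediate width are assumptions.

```python
# A hedged sketch of the Fig. 3 residual block; kernel sizes are assumed.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, m, n):          # m: input channels (M), n: output channels (N)
        super().__init__()
        self.conv_path = nn.Sequential(            # extracts higher-level features
            nn.Conv2d(m, n // 2, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(n // 2, n // 2, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(n // 2, n, kernel_size=1),
        )
        # Skip path: a 1x1 convolution that only adapts depth (M -> N);
        # spatial resolution is unchanged, so the block never resizes the map.
        self.skip_path = nn.Conv2d(m, n, kernel_size=1)

    def forward(self, x):
        return self.conv_path(x) + self.skip_path(x)
```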
The invention cascades two recursive sub-networks, and the heat map generated by each recursive sub-network is compared with the ground truth to compute a loss. The value of each point on the heat map lies between 0 and 1: the closer it is to 1, the higher the predicted probability that the point is a target centre. Considering the imbalance between positive and negative samples, focal loss is used as the loss function:
$$
L_{heat} = -\frac{1}{N}\sum_{xy}
\begin{cases}
\left(1-\hat{Y}_{xy}\right)^{\alpha}\log\hat{Y}_{xy}, & Y_{xy}=1\\
\left(1-Y_{xy}\right)^{\beta}\,\hat{Y}_{xy}^{\,\alpha}\log\left(1-\hat{Y}_{xy}\right), & \text{otherwise}
\end{cases}
$$
where $\hat{Y}_{xy}$ is the predicted heat-map value at point $(x,y)$, $Y_{xy}$ the ground-truth heat map, $N$ the number of target centre points, and $\alpha$, $\beta$ the focusing hyper-parameters.
If gradient descent were applied directly over the whole network, the error from the output layer would be attenuated severely as it back-propagates through the many layers, i.e. the gradient would vanish. We therefore cascade two recursive sub-networks and combine them with intermediate supervision, using repeated bidirectional inference to ensure that the bottom-layer parameters are updated properly. Intermediate supervision means that the heat map output at each supervised stage of the recursive sub-networks is also treated as a prediction; this yields far better accuracy than supervising only the heat map output by the last layer, and a training method that supervises the intermediate layers in this way is called intermediate supervision.
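A sketch of the pixel-wise focal loss above in code; the focusing parameters alpha=2 and beta=4 are common defaults, not values taken from the patent. Under intermediate supervision this loss would be evaluated on the heat map of each of the two recursive sub-networks and the results summed.

```python
# A hedged sketch of the heat-map focal loss described above.
import torch

def heatmap_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """pred, gt: (B, 1, H, W) heat maps with values in (0, 1);
    gt is 1 at target centre points and decays around them."""
    pos = gt.eq(1).float()                  # positive sample mask (centre points)
    neg = 1.0 - pos
    pos_loss = pos * (1 - pred).pow(alpha) * torch.log(pred + eps)
    neg_loss = neg * (1 - gt).pow(beta) * pred.pow(alpha) * torch.log(1 - pred + eps)
    num_pos = pos.sum().clamp(min=1.0)      # normalise by the number of positives
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos
```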
S0112: setting the hyper-parameters of the detection sub-model network, inputting the first training data set into the detection sub-model network for training, and determining the parameters of the detection sub-model network; the hyper-parameters of the detection sub-model network comprise the learning rate, the number of iterations, the batch size and the heat-map score threshold;
S0113: on the obtained heat map, non-maximum suppression is performed on the heat-map scores to extract the peak keypoints, the positions of the keypoints whose score exceeds the threshold are kept, and the corresponding bounding-box coordinates are then computed from the estimated offset vectors and bounding-box sizes, obtaining a heat map containing bounding boxes.
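A sketch of this peak-extraction and decoding step: a 3×3 max-pool acts as the non-maximum suppression, and boxes are assembled from the regressed offset and size maps. The tensor layout, the threshold value and the size-map head are assumptions for the sketch.

```python
# A hedged sketch of S0113: heat-map NMS and bounding-box decoding.
import torch
import torch.nn.functional as F

def decode_detections(heat, offset, size, score_thresh=0.4):
    """heat: (1, H, W); offset, size: (2, H, W). The heat map is assumed to be
    at the original image resolution, as stated in the description.
    Returns a (k, 5) tensor of [x1, y1, x2, y2, score] boxes."""
    keep = (F.max_pool2d(heat[None], 3, stride=1, padding=1)[0] == heat)
    heat = heat * keep.float()                           # NMS: keep local peaks only
    ys, xs = torch.nonzero(heat[0] > score_thresh, as_tuple=True)
    scores = heat[0, ys, xs]
    cx = xs.float() + offset[0, ys, xs]                  # centre refined by the offset vector
    cy = ys.float() + offset[1, ys, xs]
    w, h = size[0, ys, xs], size[1, ys, xs]              # regressed box width and height
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, scores], dim=1)
```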
S012: training the appearance feature extraction sub-model network with the second training data set to obtain the appearance feature extraction sub-model; with reference to fig. 1, the appearance feature extraction sub-model network comprises, in sequence, a convolutional layer, a pooling layer, three residual blocks and a fully connected layer. The specific training process is as follows:
S0121: the heat map containing bounding boxes is input into the appearance feature extraction sub-model network, passing in sequence through the convolutional layer, the pooling layer, the three residual blocks and the final fully connected layer, which generates a 128-dimensional feature vector for each target in the heat map. The goal of the appearance information extraction module is to generate feature vectors that distinguish different objects: ideally, the distance between different objects should be greater than the distance between detections of the same object. The Mahalanobis distance can be used as the metric; a Mahalanobis distance threshold is set before training, two targets being judged the same when the Mahalanobis distance between their feature vectors is below the threshold and different when it is greater than or equal to the threshold.
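A sketch of the appearance feature extraction head and the same-target test, here with the Euclidean distance on L2-normalised vectors (the Mahalanobis variant additionally requires a covariance estimate); the channel widths, the crop-based input and the threshold value are assumptions.

```python
# A hedged sketch of the appearance feature extraction sub-model.
import torch
import torch.nn as nn

class _Res(nn.Module):
    # Minimal residual block (cf. the Fig. 3 sketch): conv path plus a
    # 1x1 skip path that adapts the depth from m to n channels.
    def __init__(self, m, n):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(m, n, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(n, n, 3, padding=1))
        self.skip = nn.Conv2d(m, n, 1)
    def forward(self, x):
        return self.conv(x) + self.skip(x)

class AppearanceHead(nn.Module):
    """Conv -> pool -> three residual blocks -> fully connected 128-d embedding."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),  # convolutional layer
            nn.MaxPool2d(2),                                        # pooling layer
            _Res(32, 64), _Res(64, 64), _Res(64, 128),              # three residual blocks
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, emb_dim)    # low-dimensional vector mitigates overfitting

    def forward(self, crops):                # crops: (B, 3, H, W) box regions
        f = self.fc(self.backbone(crops).flatten(1))
        return f / f.norm(dim=1, keepdim=True)   # L2-normalised 128-d feature vectors

def same_target(f1, f2, dist_thresh=0.6):    # threshold value is illustrative
    """Two detections are judged the same target iff their distance is below the threshold."""
    return torch.dist(f1, f2).item() < dist_thresh
```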
S0122: setting the hyper-parameters of the appearance feature extraction sub-model network, inputting the second training data set into it, and determining the parameters of the appearance feature extraction sub-model network; its hyper-parameters comprise the learning rate, the number of iterations, the batch size and the similarity threshold.
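The hyper-parameters of the two sub-model networks (S0112 and S0122) might be gathered as a configuration such as the following; every numeric value below is an illustrative assumption, as the patent does not disclose concrete settings.

```python
# A hedged sketch of the hyper-parameter configuration; all values assumed.
detection_hparams = dict(
    learning_rate=1e-4,        # learning rate
    num_iterations=60_000,     # number of iterations
    batch_size=16,             # batch size
    heatmap_score_thresh=0.4,  # heat-map score threshold for peak keypoints
)
appearance_hparams = dict(
    learning_rate=1e-4,
    num_iterations=30_000,
    batch_size=32,
    similarity_thresh=0.6,     # same-target similarity threshold
)
```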
S02: as shown in fig. 4, a video stream to be tracked is input into a tracking model to obtain a tracking result; the method specifically comprises the following steps:
s021: a first frame of image in a video stream to be tracked passes through a detection sub-model, and a boundary frame of a pedestrian target is generated according to the heat map and the offset vector; extracting a characteristic vector for each pedestrian target through an appearance characteristic extraction sub-model and distributing an ID and a track;
s022: other frames of images in the video stream to be tracked sequentially pass through the tracking model, each frame of image passes through the detection sub-model, and a boundary frame of the pedestrian target is generated according to the heat map and the offset vector; then generating a feature vector of the pedestrian target through an appearance feature extraction submodel;
and determining the track position corresponding to each pedestrian in the current frame image according to the distance between the feature vectors in two adjacent frame images in the video stream to be tracked, and connecting the track positions corresponding to the same ID in all the frame images in the video stream to be tracked, namely the track positions corresponding to the frame images are the tracking results.
For example, for frame M, the distance between the detection targets in frame M-1 and frame M is measured from the similarity of the feature vectors between targets and from their IoU (Intersection over Union), the similarity being calculated with the Mahalanobis or Euclidean distance. When the distance is below the set threshold, the match is considered successful, i.e. the two are the same pedestrian target, and the pedestrian target in frame M inherits the ID of the corresponding pedestrian target in frame M-1; if the match fails, the target is assigned a new ID. The invention can also use Kalman filtering to predict the position of each trajectory in the current frame: if the distance between the predicted trajectory position and a detection to be linked exceeds the trajectory threshold, that distance is set to infinity, which effectively prevents linking a detection to a trajectory that has moved implausibly far.
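A sketch of this association step: a cost mixing appearance distance with (1 - IoU), an infinite gate for detections too far from the Kalman-predicted trajectory position, and threshold matching that inherits or assigns IDs. The weighting, the thresholds and the greedy matching strategy are assumptions beyond what the patent specifies.

```python
# A hedged sketch of frame-to-frame association (S022); not the patent's exact rule.
import numpy as np

def iou(a, b):
    """a, b: [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, dets, w_app=0.7, dist_thresh=0.5, gate=100.0):
    """tracks/dets: lists of dicts with 'feat' (128-d, L2-normalised) and 'box';
    each track also carries a Kalman-predicted centre 'pred_xy'."""
    cost = np.full((len(tracks), len(dets)), np.inf)
    for i, t in enumerate(tracks):
        for j, d in enumerate(dets):
            cx = (d['box'][0] + d['box'][2]) / 2
            cy = (d['box'][1] + d['box'][3]) / 2
            if np.hypot(cx - t['pred_xy'][0], cy - t['pred_xy'][1]) > gate:
                continue                        # Kalman gate: distance stays infinite
            app = np.linalg.norm(t['feat'] - d['feat'])
            cost[i, j] = w_app * app + (1 - w_app) * (1 - iou(t['box'], d['box']))
    matches, new_ids = [], []
    for j in range(len(dets)):                  # greedy: best track per detection
        i = int(cost[:, j].argmin()) if len(tracks) else -1
        if i >= 0 and cost[i, j] < dist_thresh:
            matches.append((i, j))              # detection inherits track i's ID
            cost[i, :] = np.inf                 # a track matches at most once
        else:
            new_ids.append(j)                   # unmatched: assign a new ID
    return matches, new_ids
```

A Hungarian (optimal) assignment could replace the greedy loop without changing the rest of the interface.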
Various other modifications and changes may be made by those skilled in the art based on the above-described technical solutions and concepts, and all such modifications and changes should fall within the scope of the claims of the present invention.

Claims (10)

1. A pedestrian multi-target tracking method integrating target detection and association is characterized by comprising the following steps:
s01: training the tracking model network by adopting a training data set to obtain a tracking model; the tracking model comprises a detection submodel and an appearance characteristic extraction submodel connected behind the detection submodel; the method specifically comprises the following steps:
s011: training the detection submodel network by adopting a first training data set to obtain a detection submodel; the detection sub-model network sequentially comprises two convolution pooling sub-networks and two recursion sub-networks which are connected together, wherein the output end of one convolution pooling sub-network is connected with the input end of one of the recursion sub-networks; the convolution pooling sub-network comprises a convolution layer and a pooling layer, and the output end of the convolution layer is connected with the input end of the pooling layer;
s012: training the appearance characteristic extraction sub-model network by adopting a second training data set to obtain an appearance characteristic extraction sub-model; the appearance characteristic extraction sub-model network sequentially comprises a convolution layer, a pooling layer, three residual blocks and a full connection layer;
s02: inputting a video stream to be tracked into a tracking model to obtain a tracking result; the method specifically comprises the following steps:
s021: a first frame of image in the video stream to be tracked passes through the detection sub-model, and a boundary frame of the pedestrian target is generated according to the heat map and the offset vector; extracting a characteristic vector for each pedestrian target through an appearance characteristic extraction sub-model and distributing an ID and a track;
s022: other frames of images in the video stream to be tracked sequentially pass through the tracking model, each frame of image passes through the detection sub-model, and a boundary frame of the pedestrian target is generated according to the heat map and the offset vector; then generating a feature vector of the pedestrian target through an appearance feature extraction submodel; and determining the track position corresponding to each pedestrian in the current frame image according to the distance between the feature vectors in two adjacent frame images in the video stream to be tracked, and connecting the track positions corresponding to the same ID in all the frame images in the video stream to be tracked, namely the track positions corresponding to the frame images are the tracking results.
2. The pedestrian multi-target tracking method integrating target detection and association according to claim 1, wherein step S011 specifically comprises:
S0111: passing the first training data set through the two convolution-pooling sub-networks and the two recursive sub-networks in sequence, outputting a heat map;
S0112: setting the hyper-parameters of the detection sub-model network, inputting the first training data set into the detection sub-model network for training, and determining the parameters of the detection sub-model network; the hyper-parameters of the detection sub-model network comprise the learning rate, the number of iterations, the batch size and the heat-map score threshold;
S0113: on the heat map, performing non-maximum suppression on the heat-map scores to extract peak keypoints, keeping the positions of the keypoints whose score exceeds the threshold, and computing the bounding-box coordinates to obtain a heat map containing bounding boxes.
3. The pedestrian multi-target tracking method integrating target detection and association according to claim 2, wherein before each downsampling in the recursive sub-network an upper half-path is split off to retain the original-scale information; after each upsampling, the data of the previous scale is added back; three residual blocks extract features between two successive downsamplings, and one residual block extracts features between two successive additions.
4. The pedestrian multi-target tracking method integrating target detection and association according to claim 3, wherein the residual block comprises a convolution path and a skip path, the convolution path comprising three convolutional layers with different kernels connected in series and the skip path comprising one convolutional layer with a 1×1 kernel.
5. The pedestrian multi-target tracking method integrating target detection and association according to claim 2, wherein step S012 specifically comprises:
S0121: inputting the heat map containing bounding boxes into the appearance feature extraction sub-model network and setting a similarity threshold; two target features are judged to belong to the same target if their similarity is greater than or equal to the similarity threshold, and to different targets if their similarity is below it;
S0122: setting the hyper-parameters of the appearance feature extraction sub-model network, inputting the second training data set into it, and determining the parameters of the appearance feature extraction sub-model network; its hyper-parameters comprise the learning rate, the number of iterations, the batch size and the similarity threshold.
6. The pedestrian multi-target tracking method integrating target detection and association according to claim 1, wherein in step S022 the distance between the detection targets in frames M-1 and M of the video stream to be tracked is measured from the similarity of the feature vectors and the IoU; when the distance is below a distance threshold, the match is considered successful and the target inherits the ID of the corresponding target in the previous frame; otherwise the match fails and the target is assigned a new ID.
7. The pedestrian multi-target tracking method integrating target detection and association according to claim 6, wherein the similarity is calculated using the Mahalanobis distance or the Euclidean distance.
8. The pedestrian multi-target tracking method integrating target detection and association according to claim 1, wherein in step S022 Kalman filtering is used to predict the position of each trajectory in the current frame, and if the distance between the predicted trajectory position and a detection to be linked exceeds the trajectory threshold, that distance is set to infinity.
9. The pedestrian multi-target tracking method integrating target detection and association according to claim 1, wherein the first training data set comprises data sets from the ETH, CityPersons, CalTech, MOT17, CUHK-SYSU and PRW databases.
10. The pedestrian multi-target tracking method integrating target detection and association according to claim 1, wherein the second training data set comprises data sets from the CalTech, MOT17, CUHK-SYSU and PRW databases.
CN202010605987.5A 2020-06-29 2020-06-29 Pedestrian multi-target tracking method integrating target detection and association Active CN111767847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010605987.5A CN111767847B (en) 2020-06-29 2020-06-29 Pedestrian multi-target tracking method integrating target detection and association

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010605987.5A CN111767847B (en) 2020-06-29 2020-06-29 Pedestrian multi-target tracking method integrating target detection and association

Publications (2)

Publication Number Publication Date
CN111767847A true CN111767847A (en) 2020-10-13
CN111767847B CN111767847B (en) 2023-06-09

Family

ID=72722916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010605987.5A Active CN111767847B (en) 2020-06-29 2020-06-29 Pedestrian multi-target tracking method integrating target detection and association

Country Status (1)

Country Link
CN (1) CN111767847B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112258559A (en) * 2020-10-26 2021-01-22 上海萱闱医疗科技有限公司 Intelligent running timing scoring system and method based on multi-target tracking
CN112417988A (en) * 2020-10-30 2021-02-26 深圳点猫科技有限公司 Video multi-target tracking method, device and equipment based on deep learning
CN113221787A (en) * 2021-05-18 2021-08-06 西安电子科技大学 Pedestrian multi-target tracking method based on multivariate difference fusion
CN113362372A (en) * 2021-05-25 2021-09-07 同济大学 Single target tracking method and computer readable medium
CN113378704A (en) * 2021-06-09 2021-09-10 武汉理工大学 Multi-target detection method, equipment and storage medium
CN113744220A (en) * 2021-08-25 2021-12-03 中国科学院国家空间科学中心 PYNQ-based preselection-frame-free detection system
CN114998999A (en) * 2022-07-21 2022-09-02 之江实验室 Multi-target tracking method and device based on multi-frame input and track smoothing

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086648A (en) * 2018-05-24 2018-12-25 同济大学 A kind of method for tracking target merging target detection and characteristic matching
US20190005657A1 (en) * 2017-06-30 2019-01-03 Baidu Online Network Technology (Beijing) Co., Ltd. Multiple targets-tracking method and apparatus, device and storage medium
CN109271933A (en) * 2018-09-17 2019-01-25 北京航空航天大学青岛研究院 The method for carrying out 3 D human body Attitude estimation based on video flowing
CN109816690A (en) * 2018-12-25 2019-05-28 北京飞搜科技有限公司 Multi-target tracking method and system based on depth characteristic
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN110188690A (en) * 2019-05-30 2019-08-30 青岛伴星智能科技有限公司 A kind of intelligent vision analysis system based on unmanned plane, intelligent vision analysis system and method
CN110728698A (en) * 2019-09-30 2020-01-24 天津大学 Multi-target tracking model based on composite cyclic neural network system
CN111126152A (en) * 2019-11-25 2020-05-08 国网信通亿力科技有限责任公司 Video-based multi-target pedestrian detection and tracking method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190005657A1 (en) * 2017-06-30 2019-01-03 Baidu Online Network Technology (Beijing) Co., Ltd. Multiple targets-tracking method and apparatus, device and storage medium
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN109086648A (en) * 2018-05-24 2018-12-25 同济大学 A kind of method for tracking target merging target detection and characteristic matching
CN109271933A (en) * 2018-09-17 2019-01-25 北京航空航天大学青岛研究院 The method for carrying out 3 D human body Attitude estimation based on video flowing
CN109816690A (en) * 2018-12-25 2019-05-28 北京飞搜科技有限公司 Multi-target tracking method and system based on depth characteristic
CN110188690A (en) * 2019-05-30 2019-08-30 青岛伴星智能科技有限公司 A kind of intelligent vision analysis system based on unmanned plane, intelligent vision analysis system and method
CN110728698A (en) * 2019-09-30 2020-01-24 天津大学 Multi-target tracking model based on composite cyclic neural network system
CN111126152A (en) * 2019-11-25 2020-05-08 国网信通亿力科技有限责任公司 Video-based multi-target pedestrian detection and tracking method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘嘉威: "Research on Pedestrian Re-identification Algorithms in Video Surveillance", China Master's and Doctoral Dissertations Full-text Database (Doctoral), Information Science and Technology Series *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112258559A (en) * 2020-10-26 2021-01-22 上海萱闱医疗科技有限公司 Intelligent running timing scoring system and method based on multi-target tracking
CN112258559B (en) * 2020-10-26 2024-05-07 上海萱闱医疗科技有限公司 Intelligent running timing scoring system and method based on multi-target tracking
CN112417988A (en) * 2020-10-30 2021-02-26 深圳点猫科技有限公司 Video multi-target tracking method, device and equipment based on deep learning
CN113221787A (en) * 2021-05-18 2021-08-06 西安电子科技大学 Pedestrian multi-target tracking method based on multivariate difference fusion
CN113221787B (en) * 2021-05-18 2023-09-29 西安电子科技大学 Pedestrian multi-target tracking method based on multi-element difference fusion
CN113362372A (en) * 2021-05-25 2021-09-07 同济大学 Single target tracking method and computer readable medium
CN113362372B (en) * 2021-05-25 2023-05-02 同济大学 Single target tracking method and computer readable medium
CN113378704A (en) * 2021-06-09 2021-09-10 武汉理工大学 Multi-target detection method, equipment and storage medium
CN113744220A (en) * 2021-08-25 2021-12-03 中国科学院国家空间科学中心 PYNQ-based preselection-frame-free detection system
CN113744220B (en) * 2021-08-25 2024-03-26 中国科学院国家空间科学中心 PYNQ-based detection system without preselection frame
CN114998999A (en) * 2022-07-21 2022-09-02 之江实验室 Multi-target tracking method and device based on multi-frame input and track smoothing
CN114998999B (en) * 2022-07-21 2022-12-06 之江实验室 Multi-target tracking method and device based on multi-frame input and track smoothing

Also Published As

Publication number Publication date
CN111767847B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN111767847B (en) Pedestrian multi-target tracking method integrating target detection and association
CN110400332B (en) Target detection tracking method and device and computer equipment
CN112184752A (en) Video target tracking method based on pyramid convolution
CN111476817A (en) Multi-target pedestrian detection tracking method based on yolov3
CN112560656A (en) Pedestrian multi-target tracking method combining attention machine system and end-to-end training
CN110991274B (en) Pedestrian tumbling detection method based on Gaussian mixture model and neural network
CN112651995A (en) On-line multi-target tracking method based on multifunctional aggregation and tracking simulation training
CN113221787A (en) Pedestrian multi-target tracking method based on multivariate difference fusion
Munir et al. LDNet: End-to-end lane marking detection approach using a dynamic vision sensor
CN117252904B (en) Target tracking method and system based on long-range space perception and channel enhancement
CN116402850A (en) Multi-target tracking method for intelligent driving
Han et al. A method based on multi-convolution layers joint and generative adversarial networks for vehicle detection
CN112287906A (en) Template matching tracking method and system based on depth feature fusion
CN113129336A (en) End-to-end multi-vehicle tracking method, system and computer readable medium
CN110111358B (en) Target tracking method based on multilayer time sequence filtering
CN115731517B (en) Crowded Crowd detection method based on crown-RetinaNet network
CN111862147A (en) Method for tracking multiple vehicles and multiple human targets in video
CN113012193A (en) Multi-pedestrian tracking method based on deep learning
CN111091583A (en) Long-term target tracking method
CN114972434A (en) End-to-end multi-target tracking system for cascade detection and matching
CN112613472B (en) Pedestrian detection method and system based on deep search matching
Amini et al. New approach to road detection in challenging outdoor environment for autonomous vehicle
Yi et al. Human action recognition based on skeleton features
Ranjbar et al. Scene novelty prediction from unsupervised discriminative feature learning
Li et al. MULS-Net: A Multilevel Supervised Network for Ship Tracking From Low-Resolution Remote-Sensing Image Sequences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant