CN111767847B - Pedestrian multi-target tracking method integrating target detection and association

Info

Publication number
CN111767847B
CN111767847B
Authority
CN
China
Prior art keywords
model
sub
target
tracking
network
Prior art date
Legal status
Active
Application number
CN202010605987.5A
Other languages
Chinese (zh)
Other versions
CN111767847A (en)
Inventor
杨航
杨海东
黄坤山
彭文瑜
林玉山
Current Assignee
Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Foshan Guangdong University CNC Equipment Technology Development Co. Ltd
Original Assignee
Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Foshan Guangdong University CNC Equipment Technology Development Co. Ltd
Priority date
Filing date
Publication date
Application filed by Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute and Foshan Guangdong University CNC Equipment Technology Development Co. Ltd
Priority to CN202010605987.5A
Publication of CN111767847A
Application granted
Publication of CN111767847B

Classifications

    • G06F18/22 Pattern recognition; matching criteria, e.g. proximity measures
    • G06F18/214 Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/045 Neural networks; combinations of networks
    • G06V10/462 Extraction of image or video features; salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V20/41 Scenes; higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V2201/07 Indexing scheme relating to image or video recognition or understanding; target detection
    • Y02T10/40 Climate change mitigation technologies related to transportation; engine management systems

Abstract

The invention discloses a pedestrian multi-target tracking method integrating target detection and association, which comprises the following steps: training a tracking model network with a training data set to obtain a tracking model; passing the first frame of the video stream to be tracked through the detection sub-model, which generates a bounding box for each pedestrian target from the heat map and offset vectors; extracting a feature vector for each pedestrian target with the appearance feature extraction sub-model and assigning IDs and tracks; then passing the remaining frames of the video stream through the tracking model in sequence, determining the track position of each pedestrian in the current frame from the similarity of the feature vectors in adjacent frames, and connecting the track positions that share the same ID across all frames to obtain the tracking result. The invention overcomes the slow running speed and the frequent identity switching under object occlusion that afflict traditional tracking methods.

Description

Pedestrian multi-target tracking method integrating target detection and association
Technical Field
The invention relates to the field of pedestrian tracking, in particular to a pedestrian multi-target tracking method integrating target detection and association.
Background
The main task of multi-target tracking is, given an image sequence, to find the moving objects in it, to put the moving objects in different frames into one-to-one correspondence, and then to output the motion trajectory of each object. These objects can be arbitrary, such as pedestrians, vehicles, athletes or various animals, but pedestrian tracking is the most studied. This is because, firstly, the pedestrian is a typical non-rigid object and therefore harder to track than a rigid one, and secondly, detection and tracking of pedestrians has greater commercial value in practical applications. According to incomplete statistics, at least 75% of multi-target tracking research is devoted to pedestrian tracking.
Thanks to the development and application of convolutional neural networks, many tasks in computer vision have advanced greatly, and many convolutional-neural-network-based target detection methods have been applied to image recognition problems. Extending deep learning to pedestrian multi-target tracking, and designing deep learning algorithms suited to the multi-target tracking problem, remains a challenging task. Common prior-art methods split target detection and association into two separate steps that are trained on different data sets; as a result the tracking model runs slowly and real-time tracking is difficult to achieve.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a pedestrian multi-target tracking method integrating target detection and association.
In order to achieve the above purpose, the invention adopts the following technical scheme: a pedestrian multi-target tracking method integrating target detection and association, comprising the following steps:
s01: training a tracking model network with a training data set to obtain a tracking model; the tracking model comprises a detection sub-model and an appearance feature extraction sub-model connected after the detection sub-model; specifically:
s011: training the detection sub-model network with a first training data set to obtain the detection sub-model; the detection sub-model network comprises, in order, two convolution pooling sub-networks and two recursive sub-networks connected together, the output of one convolution pooling sub-network feeding the input of one recursive sub-network; each convolution pooling sub-network comprises a convolution layer whose output feeds a pooling layer;
s012: training the appearance feature extraction sub-model network with a second training data set to obtain the appearance feature extraction sub-model; the appearance feature extraction sub-model network comprises, in order, a convolution layer, a pooling layer, three residual blocks and a fully connected layer;
s02: inputting the video stream to be tracked into the tracking model to obtain the tracking result; specifically:
s021: the first frame of the video stream to be tracked passes through the detection sub-model first, which generates a bounding box for each pedestrian target from the heat map and offset vectors; a feature vector is then extracted for each pedestrian target by the appearance feature extraction sub-model, and IDs and tracks are assigned;
s022: the remaining frames of the video stream pass through the tracking model in sequence: each frame passes through the detection sub-model, which generates bounding boxes for the pedestrian targets from the heat map and offset vectors, and then through the appearance feature extraction sub-model, which generates the feature vector of each pedestrian target; the track position of each pedestrian in the current frame is determined from the distance between the feature vectors in two adjacent frames, and the track positions that share the same ID across all frames are connected to give the tracking result.
Further, step S011 specifically includes:
s0111: the first training data set passes, in order, through the two convolution pooling sub-networks and the two recursive sub-networks, which output a heat map;
s0112: setting the hyperparameters of the detection sub-model network, inputting the first training data set into the detection sub-model network for training, and determining the parameters of the detection sub-model network; the hyperparameters of the detection sub-model network comprise the learning rate, the number of iterations, the batch size and the heat map score threshold;
s0113: in the heat map, non-maximum suppression is performed according to the heat map scores to extract peak key points, the key points whose scores exceed the threshold are retained, and the bounding box coordinates are calculated to obtain a heat map with bounding boxes.
Further, before each downsampling in the recursive sub-network, a branch is split off to preserve the original-scale information; after each upsampling, the result is added to the features of the previous scale; between two downsampling steps, three residual blocks are used to extract features; between two additions, one residual block is used to extract features.
Further, the residual block comprises a convolution path and a skip path, the convolution path being formed by three convolution layers with different kernels connected in series, and the skip path comprising a convolution layer with a 1×1 kernel.
Further, step S012 specifically includes:
s0121: inputting the heat map containing the bounding boxes into the appearance feature extraction sub-model network and setting a similarity threshold; if the similarity between two target features is greater than or equal to the similarity threshold, they are judged to be the same target, and if it is less than the similarity threshold, they are judged to be different targets;
s0122: setting the hyperparameters of the appearance feature extraction sub-model network, inputting the second training data set into the appearance feature extraction sub-model network, and determining the parameters of the appearance feature extraction sub-model network; the hyperparameters of the appearance feature extraction sub-model network comprise the learning rate, the number of iterations, the batch size and the similarity threshold.
Further, in step S022, for the M-th frame of the video stream to be tracked, the distance between the targets detected in the M-1-th and M-th frames is measured from the similarity of the feature vectors and the IOU; when the distance is smaller than the distance threshold the match is considered successful and the target inherits the ID of the corresponding target in the previous frame; otherwise the match is unsuccessful and a new ID is assigned to the target.
Further, the similarity is calculated using the Mahalanobis distance or the Euclidean distance.
Further, in step S022, the position of each track in the current frame is predicted by Kalman filtering, and if the distance between a track's predicted position and a candidate detection exceeds the track threshold, that distance is set to infinity.
Further, the first training data set includes data sets from the ETH, CityPerson, CalTech, MOT17, CUHK-SYSU and PRW databases.
Further, the second training data set includes data sets from the CalTech, MOT17, CUHK-SYSU and PRW databases.
The invention has the beneficial effects that: to strike a good balance between precision and speed, more skip connections are arranged between low-level and high-level features; at the same time, intermediate supervision is adopted to compute the loss on the outputs of both recursive sub-networks, and repeated bidirectional inference ensures that the bottom-layer parameters are updated normally, improving detection accuracy. The invention abandons the traditional way of separating the detection and association parts; by integrating the two modules, the computation and the running time are greatly reduced. Meanwhile, in the appearance feature extraction sub-model, the dimension of the feature vector is reduced, which mitigates overfitting, improves robustness, and further reduces computation and running time. The method overcomes the slow speed and the frequent identity switching under object occlusion of traditional tracking methods, and can be applied to video surveillance in areas with heavy pedestrian traffic, such as intersections.
Drawings
FIG. 1 is a schematic diagram of a tracking model of the present invention;
FIG. 2 is a schematic diagram of a recursive subnetwork structure according to the present invention;
FIG. 3 is a schematic diagram of the residual block structure of the present invention;
FIG. 4 is a flowchart of pedestrian tracking performed on a video stream to be tracked according to the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and detailed description below:
the invention provides a pedestrian multi-target tracking method integrating target detection and association, which comprises the following steps:
s01: training a tracking model network with a training data set to obtain a tracking model; the tracking model includes a detection sub-model and an appearance feature extraction sub-model connected after the detection sub-model. The tracking model network is a preliminary model framework in which none of the model parameters has yet been determined. The purpose of training is to determine these parameters; substituting the determined parameters into the model network yields the trained tracking model. The training data sets employed in the invention may be, but are not limited to, data sets collected from existing pedestrian databases. The ETH and CityPerson data sets provide only bounding-box annotations, so they can be used as the first training data set for training the detection sub-model network; the CalTech, MOT17, CUHK-SYSU and PRW databases provide both bounding boxes and identity annotations, so they can serve both as the second training data set for training the appearance feature extraction sub-model network and as part of the first training data set for training the detection sub-model network.
The specific training process comprises training the detection sub-model network and training the appearance feature extraction sub-model network, as follows:
s011: training the detection sub-model network with the first training data set to obtain the detection sub-model; the detection sub-model network comprises, in order, two convolution pooling sub-networks and two recursive sub-networks connected together, the output of each convolution pooling sub-network feeding the input of a recursive sub-network; each convolution pooling sub-network comprises a convolution layer whose output feeds a pooling layer. The specific training process for the detection sub-model network comprises the following steps:
s0111: the first training data set passes, in order, through the two convolution pooling sub-networks and the two recursive sub-networks, which output a heat map. As shown in FIG. 1, the input first passes through a convolution layer, a pooling layer, a convolution layer and a pooling layer in succession; the convolution layers use 3×3 kernels and the pooling layers use max pooling, yielding a feature map whose resolution is 1/4 of the original image. This feature map is then fed into two successive recursive sub-networks, which output a heat map. The recursive sub-network structure is shown in FIG. 2: from the input on the left to the middle, the channel dimension increases while the feature-map resolution decreases; from the middle to the output, the channel dimension decreases while the resolution increases. Before each downsampling in the recursive sub-network, a branch is split off to preserve the original-scale information; after each upsampling, the result is added to the features of the previous scale. For example, the c4b layer is fused from the c7 layer and the c4a layer: the c7 layer doubles its resolution by upsampling (for example, a 4×4 feature map becomes 8×8 after upsampling). The c4a layer has the same size as the c4 layer and can be regarded as a copy of it; its feature map is twice the size of the c7 layer's, which exactly matches the upsampled c7 layer, so the two can be added elementwise to obtain the c4b layer.
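As a concrete illustration, the following minimal PyTorch sketch shows the two convolution pooling sub-networks described above; the channel widths are illustrative assumptions, since the description fixes only the 3×3 kernels, the max pooling and the resulting 1/4 resolution.

    import torch
    import torch.nn as nn

    class ConvPoolStem(nn.Module):
        """Two convolution pooling sub-networks; each stage halves the resolution."""
        def __init__(self, in_ch=3, mid_ch=64, out_ch=128):
            super().__init__()
            self.stage1 = nn.Sequential(
                nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1),  # 3x3 convolution
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2, stride=2),               # max pooling: 1/2 resolution
            )
            self.stage2 = nn.Sequential(
                nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2, stride=2),               # 1/4 resolution overall
            )

        def forward(self, x):
            return self.stage2(self.stage1(x))

    # e.g. a 1x3x512x512 input yields a 1x128x128x128 feature map (1/4 resolution)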
The above process involves several pooling and upsampling operations: the pooling layers use max pooling to reduce redundancy, and the upsampling layers use nearest-neighbour interpolation. Before each downsampling, one path is split off to retain the original-scale information, and after each upsampling the two paths are fused with the features of the previous scale. Specifically, between two downsampling operations, three residual blocks are used to extract features, and between two feature-map fusions, one residual block is used to extract features. The advantage of this architecture is that the features of an object may appear at different network layers; conventional convolutional network models tend to lose such features, whereas this kind of skip connection effectively incorporates multi-scale features.
The residual block in the recursive sub-network is shown in FIG. 3. It comprises a convolution path and a skip path: the convolution path is formed by three convolution layers with different kernel scales connected in series, and the skip path comprises a convolution layer with a 1×1 kernel. M is the number of input channels (the input depth) and N is the number of output channels (the output depth). The characteristic of the residual block is that it can increase or decrease the depth of the feature map without changing its resolution: the convolution path extracts higher-level features while the skip path retains the original-level information, so the block changes only the data depth, not the data size. It can be regarded as a size-preserving, higher-level convolution layer. At the same time, because of this structure, features in the feature map are not easily lost and the vanishing-gradient problem is effectively mitigated. Each recursive sub-network outputs a single-channel heat map and regresses an offset vector for each corresponding target point on the heat map.
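A minimal sketch of such a residual block follows, again in PyTorch. The patent states only that the convolution path stacks three layers with different kernels and that the skip path is a 1×1 convolution, so the particular kernel sizes and the bottleneck width below are assumptions.

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Changes depth from M to N channels; the spatial resolution is unchanged."""
        def __init__(self, m_ch, n_ch):
            super().__init__()
            self.conv_path = nn.Sequential(                 # extracts higher-level features
                nn.Conv2d(m_ch, n_ch // 2, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(n_ch // 2, n_ch // 2, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(n_ch // 2, n_ch, kernel_size=1),
            )
            self.skip_path = nn.Conv2d(m_ch, n_ch, kernel_size=1)  # retains original-level info

        def forward(self, x):
            return self.conv_path(x) + self.skip_path(x)    # elementwise sum of the two paths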
The invention adopts a cascade of two recursive sub-networks; the heat map produced by each recursive sub-network is compared with the ground truth to compute a loss function. The value of each point on the heat map lies between 0 and 1, and the closer the value is to 1, the higher the predicted probability that the point is a target center. Considering the imbalance between positive and negative samples, focal loss is used as the loss function, in its standard per-pixel form for heat maps:

$$L_{heat} = -\frac{1}{N}\sum_{xy}\begin{cases}(1-\hat{Y}_{xy})^{\alpha}\,\log\hat{Y}_{xy}, & Y_{xy}=1\\(1-Y_{xy})^{\beta}\,\hat{Y}_{xy}^{\alpha}\,\log(1-\hat{Y}_{xy}), & \text{otherwise}\end{cases}$$

where $\hat{Y}_{xy}$ is the predicted heat-map value at point $(x,y)$, $Y_{xy}$ is the ground truth, $N$ is the number of target center points, and $\alpha$, $\beta$ are the focusing hyperparameters.

If gradient descent were carried out directly on the whole feature map, the error at the output layer would shrink drastically as it back-propagates through many layers, i.e. the gradient would vanish. Therefore the two cascaded recursive sub-networks are combined with intermediate supervision, and repeated bidirectional inference ensures that the bottom-layer parameters are updated normally. Intermediate supervision here means that the heat map output by each network layer in the recursive sub-network is used as a prediction; this is far more accurate than using only the heat map output by the last network layer, and a supervised training method that takes the intermediate network layers into account is called intermediate supervision.
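In code, the per-pixel focal loss and the intermediate supervision over the two cascaded heat maps can be sketched as follows; alpha=2 and beta=4 are the commonly used values and are an assumption here, since the patent does not state its hyperparameter settings.

    import torch

    def heatmap_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-6):
        """Per-pixel focal loss between a predicted and a ground-truth heat map."""
        pred = pred.clamp(eps, 1 - eps)
        pos = gt.eq(1).float()                      # target center points
        neg = 1.0 - pos
        pos_loss = ((1 - pred) ** alpha) * torch.log(pred) * pos
        neg_loss = ((1 - gt) ** beta) * (pred ** alpha) * torch.log(1 - pred) * neg
        num_pos = pos.sum().clamp(min=1.0)          # normalize by the number of targets
        return -(pos_loss.sum() + neg_loss.sum()) / num_pos

    # Intermediate supervision: the loss is computed on the heat map produced by
    # each of the two cascaded recursive sub-networks and the two terms are summed:
    # total_loss = heatmap_focal_loss(heat1, gt) + heatmap_focal_loss(heat2, gt)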
S0112: setting the hyperparameters of the detection sub-model network, inputting the first training data set into the detection sub-model network for training, and determining the parameters of the detection sub-model network; the hyperparameters of the detection sub-model network comprise the learning rate, the number of iterations, the batch size and the heat map score threshold;
s0113: on the obtained heat map, non-maximum suppression is performed according to the heat map scores to extract peak key points; the key points whose scores exceed the threshold are retained, and the corresponding bounding box coordinates are calculated from the estimated offset vectors and the bounding box sizes to obtain a heat map with bounding boxes.
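A minimal sketch of this decoding step is given below: a 3×3 max-pooling pass serves as the non-maximum suppression, points above the score threshold are kept as centers, and boxes are assembled from the regressed offset and size maps. The 1/4 down-ratio follows the description above; the tensor layout and the channel ordering of the offset and size maps are assumptions.

    import torch
    import torch.nn.functional as F

    def decode_heatmap(heat, offset, size, score_thresh=0.4, down_ratio=4):
        """heat: (H, W); offset, size: (2, H, W), regressed at each heat-map point."""
        pooled = F.max_pool2d(heat[None, None], 3, stride=1, padding=1)[0, 0]
        keep = (pooled == heat) & (heat > score_thresh)   # peak key points above threshold
        ys, xs = torch.nonzero(keep, as_tuple=True)
        boxes = []
        for y, x in zip(ys, xs):
            cx = (x + offset[0, y, x]) * down_ratio       # refine center, map to image scale
            cy = (y + offset[1, y, x]) * down_ratio
            w = size[0, y, x] * down_ratio
            h = size[1, y, x] * down_ratio
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, heat[y, x]))
        return boxes                                      # (x1, y1, x2, y2, score) per target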
S012: training the appearance feature extraction sub-model network with the second training data set to obtain the appearance feature extraction sub-model; with continued reference to FIG. 1, the appearance feature extraction sub-model network comprises, in order, a convolution layer, a pooling layer, three residual blocks and a fully connected layer. The specific training process is as follows:
s0121: inputting the heat map containing the bounding boxes into the appearance feature extraction sub-model network; it passes, in order, through a convolution layer, a pooling layer, three residual blocks and a final fully connected layer, generating a 128-dimensional feature vector for each target in the heat map. The goal of the appearance information extraction module is to generate feature vectors that can distinguish different objects: ideally, the distance between the features of different objects should be greater than the distance between features of the same object in different frames. The method can adopt the Mahalanobis distance as the metric; a Mahalanobis distance threshold is set before training, two targets are considered the same when the Mahalanobis distance between their feature vectors is smaller than the threshold, and different when it is greater than or equal to the threshold.
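The same-target test can be sketched as below, here with the Euclidean distance for simplicity (the description above uses the Mahalanobis distance, which additionally requires a covariance estimate); the threshold value is an illustrative assumption.

    import torch

    def same_target(feat_a, feat_b, dist_thresh=0.6):
        """feat_a, feat_b: 128-dim appearance feature vectors of two detections."""
        dist = torch.dist(feat_a, feat_b, p=2)   # Euclidean distance between embeddings
        return bool(dist < dist_thresh)          # below threshold -> judged the same target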
S0122: setting the hyperparameters of the appearance feature extraction sub-model network, inputting the second training data set into the appearance feature extraction sub-model network, and determining the parameters of the appearance feature extraction sub-model network; the hyperparameters of the appearance feature extraction sub-model network comprise the learning rate, the number of iterations, the batch size and the similarity threshold.
S02: as shown in FIG. 4, inputting the video stream to be tracked into the tracking model to obtain the tracking result; specifically:
s021: the first frame of the video stream to be tracked first passes through the detection sub-model, which generates a bounding box for each pedestrian target from the heat map and offset vectors; a feature vector is then extracted for each pedestrian target by the appearance feature extraction sub-model, and IDs and tracks are assigned;
s022: the remaining frames of the video stream pass through the tracking model in sequence: each frame passes through the detection sub-model, which generates bounding boxes for the pedestrian targets from the heat map and offset vectors, and then through the appearance feature extraction sub-model, which generates the feature vector of each pedestrian target;
the track position of each pedestrian in the current frame is determined from the distance between the feature vectors in two adjacent frames, and the track positions that share the same ID across all frames of the video stream are connected to give the tracking result.
For example, for the M-th frame, the distance between the targets detected in the M-1-th frame and the M-th frame is measured from the similarity of the feature vectors between targets and from the IOU (Intersection over Union), where the similarity is calculated with the Mahalanobis distance or the Euclidean distance. When the distance is smaller than the set threshold, the match is considered successful, i.e. the two detections are the same pedestrian target, and the target in the M-th frame inherits the ID of the corresponding target in the M-1-th frame; if the match is unsuccessful, a new ID is assigned to the target. The invention can also predict each track's position in the current frame with Kalman filtering: if the distance between a track's predicted position and a candidate detection exceeds the track threshold, that distance is set to infinity, which effectively prevents detections from being linked to tracks whose objects would have had to move too far.
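The following sketch illustrates one frame of this association step. The cost mixes the embedding distance with (1 - IOU), the Kalman gating sets infeasible pairs to an effectively infinite cost, and unmatched detections receive new IDs; the Hungarian solver, the weighting and all threshold values are illustrative assumptions, as the patent does not specify the assignment algorithm.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    INF = 1e5  # stands in for "infinity" so the assignment problem stays feasible

    def iou(a, b):
        """Intersection over union of two (x1, y1, x2, y2) boxes."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / (union + 1e-9)

    def associate(tracks, detections, dist_thresh=0.7, gate_thresh=50.0, w=0.5):
        """Match frame M-1 tracks to frame M detections.

        Tracks and detections are assumed to carry a 128-dim .feature, a .box
        (x1, y1, x2, y2) and a .center; tracks expose the Kalman-predicted center.
        """
        cost = np.full((len(tracks), len(detections)), INF)
        for i, trk in enumerate(tracks):
            for j, det in enumerate(detections):
                # Kalman gating: a detection too far from the predicted track
                # position keeps the "infinite" cost and can never be linked.
                if np.linalg.norm(trk.predicted_center - det.center) > gate_thresh:
                    continue
                emb_d = np.linalg.norm(trk.feature - det.feature)  # appearance term
                cost[i, j] = w * emb_d + (1 - w) * (1.0 - iou(trk.box, det.box))
        rows, cols = linear_sum_assignment(cost)
        matches, unmatched = [], set(range(len(detections)))
        for r, c in zip(rows, cols):
            if cost[r, c] < dist_thresh:        # successful match: inherit the track ID
                matches.append((r, c))
                unmatched.discard(c)
        return matches, sorted(unmatched)       # unmatched detections get new IDs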
It will be apparent to those skilled in the art from this disclosure that various other changes and modifications can be made which are within the scope of the invention as defined in the appended claims.

Claims (8)

1. A pedestrian multi-target tracking method integrating target detection and association, characterized by comprising the following steps:
s01: training a tracking model network with a training data set to obtain a tracking model; the tracking model comprises a detection sub-model and an appearance feature extraction sub-model connected after the detection sub-model; specifically:
s011: training the detection sub-model network with a first training data set to obtain the detection sub-model; the detection sub-model network comprises, in order, two convolution pooling sub-networks and two recursive sub-networks connected together, the output of one convolution pooling sub-network feeding the input of one recursive sub-network; each convolution pooling sub-network comprises a convolution layer whose output feeds a pooling layer;
s012: training the appearance feature extraction sub-model network with a second training data set to obtain the appearance feature extraction sub-model; the appearance feature extraction sub-model network comprises, in order, a convolution layer, a pooling layer, three residual blocks and a fully connected layer;
s02: inputting the video stream to be tracked into the tracking model to obtain the tracking result; specifically:
s021: the first frame of the video stream to be tracked passes through the detection sub-model first, which generates a bounding box for each pedestrian target from the heat map and offset vectors; a feature vector is then extracted for each pedestrian target by the appearance feature extraction sub-model, and IDs and tracks are assigned;
s022: the remaining frames of the video stream pass through the tracking model in sequence: each frame passes through the detection sub-model, which generates bounding boxes for the pedestrian targets from the heat map and offset vectors, and then through the appearance feature extraction sub-model, which generates the feature vector of each pedestrian target; the track position of each pedestrian in the current frame is determined from the distance between the feature vectors in two adjacent frames, and the track positions that share the same ID across all frames are connected, giving the tracking result;
step S011 specifically includes:
s0111: the first training data set passes, in order, through the two convolution pooling sub-networks and the two recursive sub-networks, which output a heat map;
s0112: setting the hyperparameters of the detection sub-model network, inputting the first training data set into the detection sub-model network for training, and determining the parameters of the detection sub-model network; the hyperparameters of the detection sub-model network comprise the learning rate, the number of iterations, the batch size and the heat map score threshold;
s0113: in the heat map, non-maximum suppression is performed according to the heat map scores to extract peak key points, the key points whose scores exceed the threshold are retained, and the bounding box coordinates are calculated to obtain a heat map with bounding boxes;
step S012 specifically includes:
s0121: inputting the heat map containing the bounding boxes into the appearance feature extraction sub-model network and setting a similarity threshold; if the similarity between two target features is greater than or equal to the similarity threshold, they are judged to be the same target, and if it is less than the similarity threshold, they are judged to be different targets;
s0122: setting the hyperparameters of the appearance feature extraction sub-model network, inputting the second training data set into the appearance feature extraction sub-model network, and determining the parameters of the appearance feature extraction sub-model network; the hyperparameters of the appearance feature extraction sub-model network comprise the learning rate, the number of iterations, the batch size and the similarity threshold.
2. The integrated target detection and association pedestrian multi-target tracking method according to claim 1, wherein before each downsampling in the recursive sub-network, a branch is split off to preserve the original-scale information; after each upsampling, the result is added to the features of the previous scale; between two downsampling steps, three residual blocks are used to extract features; and between two additions, one residual block is used to extract features.
3. The method of claim 2, wherein the residual block comprises a convolution path and a skip path, the convolution path comprising three convolution layers with different kernels connected in series, and the skip path comprising a convolution layer with a 1×1 kernel.
4. The method according to claim 1, wherein in step S022, for the M-th frame of the video stream to be tracked, the distance between the targets detected in the M-1-th and M-th frames is measured from the similarity of the feature vectors and the IOU; when the distance is smaller than the distance threshold the match is considered successful and the target inherits the ID of the corresponding target in the previous frame; otherwise the match is unsuccessful and a new ID is assigned to the target.
5. The method of claim 4, wherein the similarity is calculated using a Mahalanobis distance or a Euclidean distance.
6. The method according to claim 1, wherein step S022 uses Kalman filtering to predict each track's position in the current frame, and if the distance between a track's predicted position and a candidate detection exceeds the track threshold, that distance is set to infinity.
7. The integrated target detection and associated pedestrian multi-target tracking method of claim 1 wherein the first training data set comprises data sets in an ETH database, a CityPerson database, a CalTech database, a MOT17 database, a CUHK-SYSU database, and a PRW database.
8. The integrated target detection and associated pedestrian multi-target tracking method of claim 1 wherein the second training data set comprises data sets in a CalTech database, a MOT17 database, a CUHK-SYSU database, and a PRW database.
CN202010605987.5A 2020-06-29 2020-06-29 Pedestrian multi-target tracking method integrating target detection and association Active CN111767847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010605987.5A CN111767847B (en) 2020-06-29 2020-06-29 Pedestrian multi-target tracking method integrating target detection and association

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010605987.5A CN111767847B (en) 2020-06-29 2020-06-29 Pedestrian multi-target tracking method integrating target detection and association

Publications (2)

Publication Number Publication Date
CN111767847A CN111767847A (en) 2020-10-13
CN111767847B (en) 2023-06-09

Family

ID=72722916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010605987.5A Active CN111767847B (en) 2020-06-29 2020-06-29 Pedestrian multi-target tracking method integrating target detection and association

Country Status (1)

Country Link
CN (1) CN111767847B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112258559A (en) * 2020-10-26 2021-01-22 上海萱闱医疗科技有限公司 Intelligent running timing scoring system and method based on multi-target tracking
CN113221787B (en) * 2021-05-18 2023-09-29 西安电子科技大学 Pedestrian multi-target tracking method based on multi-element difference fusion
CN113362372B (en) * 2021-05-25 2023-05-02 同济大学 Single target tracking method and computer readable medium
CN113378704B (en) * 2021-06-09 2022-11-11 武汉理工大学 Multi-target detection method, equipment and storage medium
CN113744220B (en) * 2021-08-25 2024-03-26 中国科学院国家空间科学中心 PYNQ-based detection system without preselection frame
CN114998999B (en) * 2022-07-21 2022-12-06 之江实验室 Multi-target tracking method and device based on multi-frame input and track smoothing

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214238B (en) * 2017-06-30 2022-06-28 阿波罗智能技术(北京)有限公司 Multi-target tracking method, device, equipment and storage medium
CN109086648A (en) * 2018-05-24 2018-12-25 同济大学 A kind of method for tracking target merging target detection and characteristic matching
CN109271933B (en) * 2018-09-17 2021-11-16 北京航空航天大学青岛研究院 Method for estimating three-dimensional human body posture based on video stream
CN109816690A (en) * 2018-12-25 2019-05-28 北京飞搜科技有限公司 Multi-target tracking method and system based on depth characteristic
CN110188690B (en) * 2019-05-30 2022-02-08 山东巍然智能科技有限公司 Intelligent visual analysis system based on unmanned aerial vehicle, intelligent visual analysis system and method
CN110728698B (en) * 2019-09-30 2023-05-16 天津大学 Multi-target tracking system based on composite cyclic neural network system
CN111126152B (en) * 2019-11-25 2023-04-11 国网信通亿力科技有限责任公司 Multi-target pedestrian detection and tracking method based on video

Also Published As

Publication number Publication date
CN111767847A (en) 2020-10-13


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant