CN114897939A - Multi-target tracking method and system based on deep path aggregation network - Google Patents
Multi-target tracking method and system based on deep path aggregation network
- Publication number: CN114897939A
- Application number: CN202210599934.6A
- Authority
- CN
- China
- Prior art keywords
- target
- network
- frame
- path aggregation
- aggregation network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06F30/18—Network design, e.g. design based on topological or interconnect aspects of utility systems, piping, heating ventilation air conditioning [HVAC] or cabling
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T7/277—Analysis of motion involving stochastic approaches, e.g. using Kalman filters
- G06T2207/10016—Video; Image sequence
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20221—Image fusion; Image merging
Abstract
The invention discloses a multi-target tracking method and system based on a deep path aggregation network. With a deep path aggregation network as the framework, the network is first trained on a large amount of external video data to generate a target center heat map, target center offsets, and predicted bounding-box sizes, while simultaneously extracting re-ID features of each target. Prediction boxes are then generated from the target position information, and each detected object is marked with a rectangular box. Next, the cosine distances between the re-ID feature vectors of all detected objects in a given frame and those in the previous frame, together with the IoU of their prediction boxes, are computed, and the objects in the current frame are linked to existing trajectories. Finally, the positions of all targets in the current frame are further refined with a Kalman filter. The invention adopts a bottom-up feature fusion layer to extract spatial feature information of targets, shortening the information path between low-level and high-level features, so that the tracker achieves both high tracking accuracy and real-time tracking speed.
Description
Technical Field
The invention belongs to the fields of computer vision, deep learning and multi-target tracking, and particularly relates to a multi-target tracking method and system based on a deep path aggregation network.
Background
Multi-target tracking is an active research area in computer vision, with important applications in autonomous driving, intelligent transportation, intelligent surveillance, and related fields. With the rapid development of artificial intelligence, more and more deep learning algorithms are applied in everyday life. In single-target tracking, the appearance of the object is known in advance; in multi-target tracking, the tracker must estimate the trajectories of multiple targets in a video and detect targets leaving or entering the scene. Multiple objects in a video may occlude one another or have similar appearances, and external environmental factors such as lighting, weather, and video quality further complicate tracking. Among multi-target trackers, deep-learning-based methods such as JDE and FairMOT perform relatively well, but they usually fail to strike a good balance between tracking accuracy and tracking speed, and they do not further extract the spatial feature information of targets, so they cannot reliably infer accurate target positions.
Disclosure of Invention
The invention aims to provide a multi-target tracking method and system based on a deep path aggregation network, so as to solve the technical problems that existing deep-learning-based trackers cannot balance tracking accuracy against tracking speed well and cannot accurately infer the positions of targets.
In order to solve the technical problems, the specific technical scheme of the invention is as follows:
a multi-target tracking method based on a deep path aggregation network comprises the following steps:
step 1, data preprocessing: aiming at each frame image in each section of video of the training set, performing data enhancement by using rotation, scaling and color dithering to obtain an input data set of the network;
step 2, constructing a deep path aggregation network, comprising the following substeps:
step 2.1, designing a network structure of the deep path aggregation network;
step 2.2, constructing a training sample: selecting images from the input data set and inputting them into the deep path aggregation network as the input of the network;
step 2.3, designing an error function to perform back propagation, and optimizing parameters of the network until convergence;
step 3, performing multi-target tracking on each detected object in the video: extracting the detection region and re-ID features of each target with the trained deep path aggregation network, calculating the cosine distance between the re-ID features of each target in a given frame and those in the previous frame of the video together with the IoU of the detection boxes, and performing tracking prediction with a Kalman filter to obtain the position of each target in the current frame.
Further, the data preprocessing step in step 1 is specifically as follows:
For each frame image in each video of the training set, randomly select a rotation angle between -10 and 10 degrees and rotate the image, then perform an image scaling operation with a ratio of 4, and finally increase the image color depth and brightness by a factor of 0.5, obtaining the input data set of the network.
Further, designing a network structure of the deep path aggregation network in step 2.1 specifically includes the following steps:
step 201, based on the DLA network, adding three top-down feature map layers at the final stage to perform down-sampling and aggregation operations on the output feature map of the DLA network, thereby obtaining three feature maps of different resolutions;
step 202, for the three output feature maps of different resolutions, performing multi-scale aggregation of the two medium- and small-resolution feature maps with the large-resolution feature map to obtain a high-resolution feature map representation. This high-resolution representation is the output feature map of the deep path aggregation network, from which the target center heat map, center offset and re-ID feature vector of each target are produced.
Further, the step 2.2 of constructing the training sample specifically includes the following steps:
Four images are selected from the input data set and input into the deep path aggregation network.
Further, the error function is designed in step 2.3 to perform back propagation, and the parameters of the network are optimized until convergence, specifically:
Calculate the center heat map of each target in the image with a focal loss function, calculate the center offset and predicted box size of each target with an L1 loss function, and calculate the re-ID embedding loss of each target with an ID loss function; then assign corresponding weights to the three loss values to form the total loss. After training, the center position, predicted box size and re-ID features of each target in the image can be obtained. The total loss is:

L_detection = L_heat + L_box

L_total = 1/2 · (e^(-w1) · L_detection + e^(-w2) · L_identity + w1 + w2)

where w1 is a learnable parameter that weights the detection task in the total loss function; w2 is a learnable parameter that weights the re-ID task, balancing target detection against the re-ID task; e is the natural constant; L_identity is the loss function of the re-ID task; and L_detection is the loss function of the detection task, comprising the heat-map loss and the predicted-box size and offset losses of each target in the image.
Further, the step 3 of performing multi-target tracking in the video specifically comprises the following steps:
step 301, in a video, inputting a given frame image into the trained deep path aggregation network to obtain its convolutional feature map, and then generating the target heat-map feature map, predicted-box-size feature map, predicted-box-offset feature map and re-ID embedding feature map respectively;
step 302, in each subsequent frame, performing online tracking of each target according to all target positions and re-ID features inferred from the previous frame, as follows:
step 3021, calculating detection frames and re-ID characteristics of all targets in the current frame by using the trained deep path aggregation network;
step 3022, predicting the position in the current frame by using a Kalman filter according to the motion track of each target in the previous frame;
step 3023, calculating cosine distances of re-ID features of all targets in the previous frame and all targets in the current frame and IoU of a detection frame, if the cosine distances are greater than 0.4 and the IoU score is greater than 0.5, determining that the tracking is successful, and connecting the successfully tracked targets to the existing motion track;
and step 3024, performing state updating by using a Kalman filter for all the targets successfully tracked to obtain the optimal estimation in the current frame.
The invention also provides a multi-target tracking system based on the deep path aggregation network, which comprises a data preprocessing unit, a deep path aggregation network training unit and a video multi-target tracking unit;
the data preprocessing unit is used for inputting an image sequence, and performing data enhancement on the image sequence by using rotation, scaling and color dithering;
the deep path aggregation network training unit is used for training a designed deep path aggregation network and is configured to execute the following steps:
step A, designing a network structure of a deep path aggregation network;
step B, constructing a training sample as the input of the network;
step C, designing an error function to perform back propagation, and optimizing parameters of the network until convergence;
the video multi-target tracking unit is configured to execute the following actions: extracting a detection area and re-ID characteristics of the targets based on the trained depth path aggregation network, and performing tracking prediction by using a Kalman filter according to the cosine distance of the re-ID characteristics of each target and IoU of the detection frame to obtain the positions of the targets in the current frame.
The multi-target tracking method and system based on the deep path aggregation network have the following advantages:
1. the multi-target tracking method based on the depth path aggregation network can be used for tracking all targets of each frame in any video;
2. the method uses the trained deep path aggregation network and cross-resolution multi-scale aggregation to output features of the target at different levels, making it more robust to changes in target appearance.
3. The invention adopts a bottom-up feature fusion layer to extract spatial feature information of targets, shortening the information path between low-level and high-level features, so that the tracker achieves higher tracking accuracy while keeping tracking real-time.
Drawings
Fig. 1 is a schematic diagram of a multi-target tracking method based on a deep path aggregation network according to the present invention.
Detailed Description
In order to better understand the purpose, structure and function of the present invention, a multi-target tracking method and system based on a deep path aggregation network according to the present invention are described in further detail below with reference to the accompanying drawings.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The invention firstly provides a multi-target tracking method based on a deep path aggregation network, which is shown by referring to fig. 1 and comprises the following steps:
step 1, data preprocessing: performing data enhancement on the input image sequence by using rotation, scaling and color dithering to obtain an input data set of a network;
For each frame image in each video of the training set, randomly select a rotation angle between -10 and 10 degrees and rotate the image, then perform an image scaling operation with a ratio of 4, and finally increase the image color depth and brightness by a factor of 0.5, obtaining the input data set of the network.
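As an illustration of this preprocessing step, the sketch below samples a random rotation angle in [-10, 10] degrees and applies a simple brightness/colour gain to a NumPy image. The `scale` and `jitter` constants are illustrative stand-ins for the ratio-4 scaling and 0.5x adjustment described above, not the patent's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_augmentation_params():
    """Sample augmentation parameters in the spirit of step 1."""
    angle = rng.uniform(-10.0, 10.0)  # random rotation angle in degrees
    scale = 0.25                      # e.g. downscale by a ratio of 4 (illustrative)
    jitter = 0.5                      # brightness/colour gain offset (illustrative)
    return angle, scale, jitter

def color_jitter(image, gain):
    """Scale pixel intensities by (1 + gain) and clip to the valid uint8 range."""
    out = image.astype(np.float32) * (1.0 + gain)
    return np.clip(out, 0, 255).astype(np.uint8)

# Toy 8x8 RGB image with constant intensity 100
img = np.full((8, 8, 3), 100, dtype=np.uint8)
angle, scale, jitter = sample_augmentation_params()
bright = color_jitter(img, jitter)  # intensity 100 -> 150 with gain 0.5
```

In a full pipeline the sampled angle and scale would drive an affine warp (e.g. via an image library); only the colour jitter is applied directly here.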
Step 2, constructing a deep path aggregation network, comprising the following substeps:
step 2.1, designing a network structure of the deep path aggregation network, which specifically comprises the following steps:
step 201, based on the DLA network, adding three top-down feature map layers at the final stage to perform down-sampling and aggregation operations on the output feature map of the DLA network, thereby obtaining three feature maps of different resolutions. Deep Layer Aggregation (DLA) networks are neural networks that extract features and perform multi-scale aggregation; the deep path aggregation network proposed here is an improvement built on the DLA network.
Step 202, for the three output feature maps of different resolutions, performing multi-scale aggregation of the two medium- and small-resolution feature maps with the large-resolution feature map to obtain a high-resolution feature map representation. This high-resolution representation is the output feature map of the deep path aggregation network, from which the target center heat map, center offset and re-ID feature vector of each target are produced.
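The top-down feature layers and cross-resolution aggregation of steps 201 and 202 can be sketched with plain NumPy, using strided slicing in place of strided convolution and nearest-neighbour repetition in place of learned up-sampling. All sizes here are illustrative, not the network's actual dimensions:

```python
import numpy as np

def downsample(fmap, factor=2):
    # Stride-based down-sampling: a stand-in for a strided convolution layer
    return fmap[::factor, ::factor]

def upsample(fmap, factor=2):
    # Nearest-neighbour up-sampling: a stand-in for learned up-sampling
    return np.repeat(np.repeat(fmap, factor, axis=0), factor, axis=1)

def aggregate_to_high_res(large, medium, small):
    # Bring the medium/small maps back to the large map's resolution and sum,
    # yielding the high-resolution feature map representation of step 202
    return large + upsample(medium, 2) + upsample(small, 4)

base = np.ones((32, 32))   # output feature map of the DLA backbone (toy size)
p1 = downsample(base, 2)   # 16x16 medium-resolution map
p2 = downsample(p1, 2)     # 8x8 small-resolution map
high_res = aggregate_to_high_res(base, p1, p2)  # back to 32x32
```

Summation is used here for aggregation; the actual network may concatenate or apply further convolutions before producing the heat map, offset, and re-ID heads.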
Step 2.2, constructing a training sample: selecting 4 images from the input data set and inputting them into the deep path aggregation network as the input of the network;
step 2.3, designing an error function to perform back propagation, and optimizing parameters of the network until convergence, wherein the method specifically comprises the following steps:
Calculate the center heat map, center offset, predicted box size and re-ID embedding loss of each target in the image with the focal loss, L1 loss and ID loss respectively; then assign corresponding weights to the three loss values to form the total loss. After training, the center position, predicted box size and re-ID features of each target in the image can be obtained. Focal loss, L1 loss and ID loss are loss functions common in deep learning; here they compute the target center heat-map loss, the center offset and bounding-box size loss, and the re-ID embedding loss, respectively. re-ID is short for re-identification: the deep path aggregation network finally outputs re-ID feature vectors for all targets in each frame of the video, and in the tracking stage the position of each target from the previous frame is re-identified in the next frame by computing the cosine distances between the re-ID feature vectors of two adjacent frames. The total loss is:

L_detection = L_heat + L_box

L_total = 1/2 · (e^(-w1) · L_detection + e^(-w2) · L_identity + w1 + w2)

where w1 is a learnable parameter that weights the detection task in the total loss function; w2 is a learnable parameter that weights the re-ID task, balancing target detection against the re-ID task; e is the natural constant; L_identity is the loss function of the re-ID task; and L_detection is the loss function of the detection task, comprising the heat-map loss and the predicted-box size and offset losses of each target in the image.
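A minimal sketch of the weighted total loss, assuming the uncertainty-style combination implied by the learnable parameters w1, w2 and the natural constant e described above (the exact combination used in the patent may differ):

```python
import math

def total_loss(l_detection, l_identity, w1, w2):
    """Combine detection and re-ID losses with learnable balance parameters.

    Assumed form: L_total = 1/2 * (e^(-w1) * L_detection
                                   + e^(-w2) * L_identity + w1 + w2)
    """
    return 0.5 * (math.exp(-w1) * l_detection
                  + math.exp(-w2) * l_identity
                  + w1 + w2)

# With w1 = w2 = 0 the two tasks are weighted equally:
loss = total_loss(l_detection=2.0, l_identity=1.0, w1=0.0, w2=0.0)  # 1.5
```

During training w1 and w2 would be optimized jointly with the network weights, so the balance between detection and re-ID is learned rather than hand-tuned.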
Step 3, performing multi-target tracking on each detected object in the video: extract the detection region and re-ID features of each target with the trained deep path aggregation network, and calculate the cosine distance between the re-ID features of each target in a given frame and those in the previous frame, together with the IoU of the detection boxes. IoU measures the overlap between two boxes; in object detection it is commonly used to judge the overlap between a predicted box and the ground-truth box, so that during training the predicted box is continuously adjusted to approximate the position and size of the ground-truth box. Then use a Kalman filter for tracking prediction to obtain the position of each target in the current frame. The specific steps are as follows:
step 301, in a video, inputting a given frame image into the trained deep path aggregation network to obtain its convolutional feature map, and then generating the target heat-map feature map, predicted-box-size feature map, predicted-box-offset feature map and re-ID embedding feature map respectively;
step 302, in each subsequent frame, performing online tracking of each target according to all target positions and re-ID features inferred from the previous frame, as follows:
step 3021, calculating detection frames and re-ID characteristics of all targets in the current frame by using the trained deep path aggregation network;
step 3022, predicting the position in the current frame by using a Kalman filter according to the motion track of each target in the previous frame;
step 3023, calculating cosine distances of re-ID features of all targets in the previous frame and all targets in the current frame and IoU of a detection frame, if the cosine distances are greater than 0.4 and the IoU score is greater than 0.5, determining that the tracking is successful, and connecting the successfully tracked targets to the existing motion track;
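The matching test of step 3023 can be sketched as follows. The feature vectors and box coordinates are illustrative, and boxes are assumed to be in (x1, y1, x2, y2) form:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two re-ID feature vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

def is_match(feat_prev, feat_cur, box_prev, box_cur,
             cos_thresh=0.4, iou_thresh=0.5):
    """Step 3023: a target matches when both the re-ID similarity and the
    box overlap exceed their thresholds; it is then linked to the track."""
    return (cosine_similarity(feat_prev, feat_cur) > cos_thresh
            and iou(box_prev, box_cur) > iou_thresh)

# Identical features and heavily overlapping boxes -> successful match
matched = is_match([1.0, 0.0], [1.0, 0.0], (0, 0, 10, 10), (0, 0, 10, 8))
```

A full tracker would compute these scores pairwise for all targets in two adjacent frames and solve an assignment problem (e.g. greedy or Hungarian matching) before linking tracks.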
and step 3024, performing state updating by using a Kalman filter for all the targets successfully tracked to obtain the optimal estimation in the current frame.
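A minimal constant-velocity Kalman filter for a single coordinate, illustrating the predict step of 3022 and the state update of 3024. The state layout and noise values are illustrative; a real tracker runs such a filter over the full box state:

```python
import numpy as np

class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate of a target centre."""

    def __init__(self, x0, v0=0.0):
        self.x = np.array([x0, v0], dtype=float)     # state: [position, velocity]
        self.P = np.eye(2)                           # state covariance
        self.F = np.array([[1.0, 1.0], [0.0, 1.0]])  # constant-velocity motion model
        self.H = np.array([[1.0, 0.0]])              # we observe position only
        self.Q = np.eye(2) * 0.01                    # process noise (illustrative)
        self.R = np.array([[1.0]])                   # measurement noise (illustrative)

    def predict(self):
        """Step 3022: project the state into the current frame."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[0]

    def update(self, z):
        """Step 3024: fuse the matched detection z into an optimal estimate."""
        y = z - self.H @ self.x                      # innovation
        S = self.H @ self.P @ self.H.T + self.R      # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x[0]

kf = Kalman1D(x0=0.0, v0=1.0)
pred = kf.predict()    # position predicted for the current frame (1.0)
est = kf.update(1.2)   # refined estimate lies between prediction and detection
```

The refined estimate lands between the motion-model prediction and the raw detection, weighted by the filter's confidence in each, which is what "optimal estimation in the current frame" refers to.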
The invention also provides a multi-target tracking system based on the deep path aggregation network, which comprises a data preprocessing unit, a deep path aggregation network training unit and a video multi-target tracking unit;
the data preprocessing unit is used for inputting an image sequence, and performing data enhancement on the image sequence by using rotation, scaling and color dithering;
the deep path aggregation network training unit is used for training a designed deep path aggregation network and is configured to execute the following steps:
step A, designing a network structure of a deep path aggregation network;
step B, constructing a training sample as the input of the network;
step C, designing an error function to perform back propagation, and optimizing parameters of the network until convergence;
the video multi-target tracking unit is configured to execute the following actions: extracting a detection area and re-ID characteristics of the targets based on the trained depth path aggregation network, and performing tracking prediction by using a Kalman filter according to the cosine distance of the re-ID characteristics of each target and IoU of the detection frame to obtain the positions of the targets in the current frame.
It will be understood by those within the art that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the methods specified in the block or blocks of the block diagrams and/or flowchart block or blocks.
It is to be understood that the present invention has been described with reference to certain embodiments, and that various changes in the features and embodiments, or equivalent substitutions may be made therein by those skilled in the art without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.
Claims (7)
1. A multi-target tracking method based on a deep path aggregation network is characterized by comprising the following steps:
step 1, data preprocessing: aiming at each frame image in each section of video of the training set, performing data enhancement by using rotation, scaling and color dithering to obtain an input data set of the network;
step 2, constructing a deep path aggregation network, comprising the following substeps:
step 2.1, designing a network structure of the deep path aggregation network;
2.2, constructing a training sample, selecting images from the input data set, and inputting the images into the depth path aggregation network as the input of the network;
step 2.3, designing an error function to perform back propagation, and optimizing parameters of the network until convergence;
step 3, performing multi-target tracking on each detected object in the video: extracting a detection area and re-ID characteristics of the target based on the trained depth path aggregation network, calculating the cosine distance of the re-ID characteristics of each target in a certain frame and the previous frame of the video and IoU of the detection frame, and performing tracking prediction by using a Kalman filter to obtain the position of the target in the current frame.
2. The multi-target tracking method based on the deep path aggregation network as claimed in claim 1, wherein the data preprocessing step in the step 1 is as follows:
for each frame image in each video of the training set, randomly selecting a rotation angle between -10 and 10 degrees and rotating the image, then performing an image scaling operation with a ratio of 4, and finally increasing the image color depth and brightness by a factor of 0.5, obtaining the input data set of the network.
3. The multi-target tracking method based on the deep path aggregation network according to claim 1, wherein the step 2.1 of designing the network structure of the deep path aggregation network specifically comprises the following steps:
step 201, based on the DLA network, adding three feature map layers from top to bottom at the final stage for performing down-sampling and aggregation operations on the output feature map of the DLA network, thereby obtaining three feature maps with different resolutions;
step 202, for the three output feature maps of different resolutions, performing multi-scale aggregation of the two medium- and small-resolution feature maps with the large-resolution feature map to obtain a high-resolution feature map representation.
4. The multi-target tracking method based on the deep path aggregation network as claimed in claim 3, wherein the constructing of the training samples in the step 2.2 specifically includes the following steps:
four images are selected from the input data set and input into the deep path aggregation network.
5. The multi-target tracking method based on the deep path aggregation network as claimed in claim 4, wherein the designing of the error function in the step 2.3 is performed with back propagation to optimize the parameters of the network until convergence, and specifically includes:
calculating the center heat map of each target in the image with a focal loss function, calculating the center offset and predicted box size of each target with an L1 loss function, and calculating the re-ID embedding loss of each target with an ID loss function; then assigning corresponding weights to the three loss values to form the total loss; after training, obtaining the center position, predicted box size and re-ID features of each target in the image; the total loss being:

L_detection = L_heat + L_box

L_total = 1/2 · (e^(-w1) · L_detection + e^(-w2) · L_identity + w1 + w2)

wherein w1 is a learnable parameter that weights the detection task in the total loss function; w2 is a learnable parameter that weights the re-ID task, balancing target detection against the re-ID task; e is the natural constant; L_identity represents the loss function of the re-ID task; and L_detection represents the loss function of the detection task, comprising the heat-map loss and the predicted-box size and center-offset losses of each target in the image.
6. The multi-target tracking method based on the deep path aggregation network as claimed in claim 1, wherein the step 3 of performing multi-target tracking in the video specifically comprises the following steps:
step 301, inputting a certain frame image into a trained depth path aggregation network in a section of video to obtain a convolution feature map of the certain frame image, and then respectively generating a target heat map feature map, a prediction frame size feature map, a prediction frame offset feature map and a re-ID embedded feature map;
step 302, in each subsequent frame, performing online tracking of each target according to all target positions and re-ID features inferred from the previous frame, as follows:
step 3021, calculating the detection frames and re-ID features of all targets in the current frame using the trained deep path aggregation network;
step 3022, predicting each target's position in the current frame with a Kalman filter from its motion track in the previous frame;
step 3023, calculating the cosine distance between the re-ID features, and the IoU between the detection frames, of every target in the previous frame and every target in the current frame; if the cosine distance is greater than 0.4 and the IoU score is greater than 0.5, the tracking is deemed successful, and the successfully tracked target is connected to its existing motion track;
and step 3024, updating the Kalman filter state for all successfully tracked targets to obtain the optimal estimate in the current frame.
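Steps 3023 can be sketched as a greedy association between previous-frame tracks and current-frame detections. This is an illustrative sketch under the thresholds stated in the claim (cosine similarity > 0.4, IoU > 0.5); the dict layout with `"box"` and `"emb"` keys and the greedy matching order are assumptions, not the patent's exact procedure.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def associate(tracks, detections, cos_thresh=0.4, iou_thresh=0.5):
    """Greedily match previous-frame tracks to current-frame detections.
    tracks/detections: lists of dicts with 'box' and 'emb' entries.
    Returns a list of (track_index, detection_index) pairs."""
    matches, used = [], set()
    for ti, t in enumerate(tracks):
        best, best_score = None, -1.0
        for di, d in enumerate(detections):
            if di in used:
                continue
            cos = cosine_similarity(t["emb"], d["emb"])
            ov = iou(t["box"], d["box"])
            # Both thresholds from the claim must hold for a valid match.
            if cos > cos_thresh and ov > iou_thresh and cos + ov > best_score:
                best, best_score = di, cos + ov
        if best is not None:
            used.add(best)
            matches.append((ti, best))
    return matches
```

Matched pairs extend their existing motion tracks; unmatched detections would start new tracks, and unmatched tracks are candidates for termination.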
7. A multi-target tracking system based on the deep path aggregation network, characterized by comprising a data preprocessing unit, a deep path aggregation network training unit, and a video multi-target tracking unit, wherein:
the data preprocessing unit is used for inputting an image sequence and performing data enhancement on the image sequence using rotation, scaling, and color jittering;
the deep path aggregation network training unit is used for training a designed deep path aggregation network and is configured to execute the following steps:
step A, designing a network structure of a deep path aggregation network;
step B, constructing a training sample as the input of the network;
step C, designing an error function to perform back propagation, and optimizing parameters of the network until convergence;
the video multi-target tracking unit is configured to execute the following actions: extracting the detection areas and re-ID features of the targets based on the trained deep path aggregation network, and performing tracking prediction with a Kalman filter according to the cosine distance of each target's re-ID features and the IoU of its detection frame, to obtain the positions of the targets in the current frame.
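The Kalman-filter prediction and state update used by the tracking unit (steps 3022 and 3024) can be sketched with a minimal constant-velocity filter over a target's box center. This is an illustrative model under assumed noise settings; the patent does not specify the state vector or covariances.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter over a 2-D box center.
    State: [cx, cy, vx, vy]; measurement: [cx, cy]; dt = 1 frame."""

    def __init__(self, cx, cy):
        self.x = np.array([cx, cy, 0.0, 0.0])
        self.P = np.eye(4) * 10.0                      # state covariance
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0              # position += velocity
        self.H = np.zeros((2, 4))
        self.H[0, 0] = self.H[1, 1] = 1.0              # observe position only
        self.Q = np.eye(4) * 1e-2                      # process noise (assumed)
        self.R = np.eye(2) * 1e-1                      # measurement noise (assumed)

    def predict(self):
        """Step 3022: predict the center position in the current frame."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Step 3024: fuse the matched detection to get the optimal estimate."""
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

In practice one filter is maintained per track; `predict()` supplies the position used in association, and `update()` is called only for targets whose association succeeded.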
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210599934.6A CN114897939A (en) | 2022-05-26 | 2022-05-26 | Multi-target tracking method and system based on deep path aggregation network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114897939A true CN114897939A (en) | 2022-08-12 |
Family
ID=82725901
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210599934.6A Pending CN114897939A (en) | 2022-05-26 | 2022-05-26 | Multi-target tracking method and system based on deep path aggregation network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114897939A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116258608A (en) * | 2023-05-15 | 2023-06-13 | 中铁水利信息科技有限公司 | Water conservancy real-time monitoring information management system integrating GIS and BIM three-dimensional technology |
CN116258608B (en) * | 2023-05-15 | 2023-08-11 | 中铁水利信息科技有限公司 | Water conservancy real-time monitoring information management system integrating GIS and BIM three-dimensional technology |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113506317B (en) | Multi-target tracking method based on Mask R-CNN and apparent feature fusion | |
CN109800689B (en) | Target tracking method based on space-time feature fusion learning | |
CN107452015B (en) | Target tracking system with re-detection mechanism | |
CN111814621A (en) | Multi-scale vehicle and pedestrian detection method and device based on attention mechanism | |
CN113674416B (en) | Three-dimensional map construction method and device, electronic equipment and storage medium | |
CN113409361B (en) | Multi-target tracking method and device, computer and storage medium | |
CN104820997B (en) | A kind of method for tracking target based on piecemeal sparse expression Yu HSV Feature Fusion | |
CN112668483B (en) | Single-target person tracking method integrating pedestrian re-identification and face detection | |
CN104517275A (en) | Object detection method and system | |
CN110827320B (en) | Target tracking method and device based on time sequence prediction | |
CN110009060A (en) | A kind of robustness long-term follow method based on correlation filtering and target detection | |
CN106780567B (en) | Immune particle filter extension target tracking method fusing color histogram and gradient histogram | |
CN111161325A (en) | Three-dimensional multi-target tracking method based on Kalman filtering and LSTM | |
CN111027505A (en) | Hierarchical multi-target tracking method based on significance detection | |
CN105809718A (en) | Object tracking method with minimum trajectory entropy | |
CN113763427A (en) | Multi-target tracking method based on coarse-fine shielding processing | |
CN115063447A (en) | Target animal motion tracking method based on video sequence and related equipment | |
CN115690545B (en) | Method and device for training target tracking model and target tracking | |
Foresti | Object detection and tracking in time-varying and badly illuminated outdoor environments | |
CN114897939A (en) | Multi-target tracking method and system based on deep path aggregation network | |
CN113223064A (en) | Method and device for estimating scale of visual inertial odometer | |
CN102800105B (en) | Target detection method based on motion vector | |
Lee et al. | An edge detection–based eGAN model for connectivity in ambient intelligence environments | |
CN116861262B (en) | Perception model training method and device, electronic equipment and storage medium | |
CN113129336A (en) | End-to-end multi-vehicle tracking method, system and computer readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||