CN112836639A - Pedestrian multi-target tracking video identification method based on improved YOLOv3 model - Google Patents
- Publication number
- CN112836639A CN112836639A CN202110151278.9A CN202110151278A CN112836639A CN 112836639 A CN112836639 A CN 112836639A CN 202110151278 A CN202110151278 A CN 202110151278A CN 112836639 A CN112836639 A CN 112836639A
- Authority
- CN
- China
- Legal status: Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/277—Analysis of motion involving stochastic approaches, e.g. using Kalman filters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Abstract
A pedestrian multi-target tracking video identification method based on an improved YOLOv3 model belongs to the field of image processing in computer vision. In the YOLOv3 network, the original standard convolutions in the Darknet-53 feature extraction layer are replaced with depthwise separable convolutions, and an SENet module is introduced into the prediction layer of the YOLOv3 network. The target boxes in the selected data set are clustered with the K-means++ clustering algorithm, the prior-box parameters of the network are optimized according to the clustering result, and the anchor boxes are corrected. The invention adopts a tracking-by-detection framework in which the improved YOLOv3 algorithm performs target detection and the tracking part uses the Deep-SORT algorithm, so that the overall algorithm effectively reduces missed detections and occlusion failures while maintaining a high detection speed and a good tracking effect.
Description
Technical Field
The invention belongs to the field of image processing in computer vision, and specifically relates to a method that improves the YOLOv3 network structure to address the high missed-detection rate and low detection speed of pedestrian targets in multi-target tracking, thereby improving the model's detection precision and speed on pedestrian targets. The detection part detects pedestrian targets with the improved YOLOv3 algorithm, the tracking part predicts target motion trajectories with a Kalman filter, and the data-association part matches targets with the Hungarian algorithm.
Background
With the rapid development of deep learning, convolutional neural networks have gradually shown advantages over traditional hand-crafted features, and deep neural networks have demonstrated excellent performance in machine vision, attracting wide attention from researchers. Pedestrians are a vulnerable group in the road traffic environment and face frequent safety problems, so building a reliable pedestrian detection system has become a research hotspot; applying deep learning to driver-assistance systems is likewise becoming a trend. This work studies deep-learning-based target detection and tracking algorithms with road pedestrians as the research object.
In recent years, detection-based multi-target tracking has gradually become the mainstream approach in the multi-target tracking field, but it places high demands on detection accuracy: a complex background strongly affects target detection and, in turn, the tracking result. Even the advanced YOLOv3 algorithm suffers from limited detection precision and detection speed, and effectively establishing the target model between detector and tracker is equally important. Providing a pedestrian detection and tracking algorithm with higher detection accuracy and higher detection speed is therefore a problem to be solved by those skilled in the art.
Disclosure of Invention
In order to improve the detection precision and speed of pedestrian multi-target tracking, the invention provides a pedestrian multi-target tracking video identification method based on an improved YOLOv3 network model. Building on the YOLOv3 network model and the Deep-SORT algorithm, and aiming at occlusion and missed detection in target detection and tracking, the prior boxes are optimized with the K-means++ clustering method and an SENet module is embedded into the YOLOv3 prediction layer; aiming at the low detection speed of the algorithm, the standard convolutions of the YOLOv3 network are replaced with depthwise separable convolutions for feature extraction. A classical tracking-by-detection framework is selected: the detection part uses the improved YOLOv3 algorithm to detect target information, and the tracking part uses the Deep-SORT algorithm.
The technical scheme adopted by the invention is as follows:
the pedestrian multi-target tracking video identification method based on the improved YOLOv3 model comprises the following steps:
step 1: pedestrian detection part: improve the YOLOv3 target detection network by introducing a depthwise separable convolution module to replace the standard convolution module in the Darknet-53 feature extraction layer, and by introducing an SENet module added to the YOLO prediction layer;
step 2: select a data set containing pedestrian images from public data sets, perform cluster analysis on the data set labels with the K-means++ clustering algorithm instead of the K-means clustering algorithm, and train the pedestrian detection YOLOv3 network model;
step 3: multi-target tracking part: perform target detection with the trained pedestrian detection YOLOv3 network model, and perform multi-target tracking of pedestrians in combination with the Deep-SORT algorithm;
the step1 is further specifically as follows:
step 1.1: a depthwise separable convolution module is introduced into the Darknet-53 feature extraction layer to replace the standard convolution module in the original Darknet-53; the depthwise separable convolution considers channels and spatial regions separately, decomposing the standard convolution into a depthwise convolution and a pointwise convolution: the depthwise convolution first applies a 3 × 3 convolution to each single channel of the feature map to collect per-channel features, and the pointwise convolution then applies a 1 × 1 convolution to the depthwise output to collect the features at each point;
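The depthwise-plus-pointwise decomposition described in step 1.1 can be sketched in a few lines of NumPy (illustrative only; toy shapes, not the Darknet-53 implementation). For a 3-channel input mapped to 16 channels with 3 × 3 kernels, a standard convolution needs 16 × 3 × 3 × 3 = 432 weights, while the pair below needs only 3 × 3 × 3 + 16 × 3 = 75:

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """Sketch of a depthwise separable convolution (stride 1, 'valid' padding).

    x           -- input feature map, shape (C_in, H, W)
    dw_kernels  -- one 3x3 kernel per input channel, shape (C_in, 3, 3)
    pw_kernels  -- pointwise 1x1 kernels, shape (C_out, C_in)
    """
    c_in, h, w = x.shape
    k = dw_kernels.shape[-1]
    oh, ow = h - k + 1, w - k + 1

    # Depthwise step: each channel is convolved with its own 3x3 kernel.
    dw_out = np.zeros((c_in, oh, ow))
    for c in range(c_in):
        for i in range(oh):
            for j in range(ow):
                dw_out[c, i, j] = np.sum(x[c, i:i+k, j:j+k] * dw_kernels[c])

    # Pointwise step: a 1x1 convolution mixes channels at every spatial location;
    # the einsum over the channel axis is exactly a per-pixel matrix multiply.
    return np.einsum('oc,chw->ohw', pw_kernels, dw_out)

x = np.random.rand(3, 8, 8)     # 3-channel toy feature map
dw = np.random.rand(3, 3, 3)    # one 3x3 kernel per channel
pw = np.random.rand(16, 3)      # 16 output channels
y = depthwise_separable_conv(x, dw, pw)
print(y.shape)                  # (16, 6, 6)
```
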
step 1.2: an SENet module is introduced into the YOLO prediction layer, embedded after the output vectors of the 26th, 43rd and 53rd layers of the network.
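The SENet module of step 1.2 performs a squeeze (global average pooling), an excitation (a two-layer bottleneck ending in a sigmoid) and a channel-wise rescaling. A minimal NumPy sketch with assumed toy dimensions, not the actual network weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a feature map x of shape (C, H, W).

    w1 -- weights of the reduction FC layer, shape (C // r, C)
    w2 -- weights of the expansion FC layer, shape (C, C // r)
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    s = x.mean(axis=(1, 2))                       # (C,)
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gives channel weights.
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))     # (C,)
    # Scale: reweight each channel of the original map by its importance.
    return x * e[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))   # reduction ratio r = 4 (assumed)
w2 = rng.standard_normal((8, 2))
y = se_block(x, w1, w2)
print(y.shape)   # (8, 4, 4)
```
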
The step2 specifically comprises the following steps:
step 2.1: extract N pedestrian photos from the public data sets and label the photos with a labeling tool; then divide the pictures into a training set and a test set in proportion;
step 2.2: perform prior-box clustering on the picture training-set samples with the K-means++ clustering algorithm instead of the K-means clustering algorithm to obtain new anchor boxes, and iteratively train the pedestrian detection YOLOv3 network model with the new anchor boxes.
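Anchor clustering as in step 2.2 can be sketched with 1 − IoU between (w, h) pairs as the distance, K-means++ seeding, and Lloyd iterations. The box data, the mean-based center update, and the sorting by area are assumptions for illustration:

```python
import numpy as np

def iou_wh(box, centers):
    """IoU between one (w, h) box and an array of (w, h) centers,
    treating all boxes as if they shared the same top-left corner."""
    inter = np.minimum(box[0], centers[:, 0]) * np.minimum(box[1], centers[:, 1])
    union = box[0] * box[1] + centers[:, 0] * centers[:, 1] - inter
    return inter / union

def kmeans_pp_anchors(boxes, k, iters=30, seed=0):
    """Cluster labelled box sizes with 1 - IoU as the distance measure."""
    rng = np.random.default_rng(seed)
    # K-means++ seeding: first center uniform, each further center drawn with
    # probability proportional to the squared distance to its nearest center.
    centers = [boxes[rng.integers(len(boxes))]]
    while len(centers) < k:
        d2 = np.array([(1.0 - iou_wh(b, np.asarray(centers)).max()) ** 2 for b in boxes])
        centers.append(boxes[rng.choice(len(boxes), p=d2 / d2.sum())])
    centers = np.asarray(centers, dtype=float)
    # Lloyd iterations: assign each box to the center it overlaps most,
    # then move each center to the mean of its cluster.
    for _ in range(iters):
        assign = np.array([np.argmax(iou_wh(b, centers)) for b in boxes])
        for j in range(k):
            if np.any(assign == j):
                centers[j] = boxes[assign == j].mean(axis=0)
    return centers[np.argsort(centers[:, 0] * centers[:, 1])]  # smallest anchors first

rng = np.random.default_rng(42)
boxes = np.column_stack([rng.uniform(10, 60, 200),     # widths
                         rng.uniform(30, 160, 200)])   # heights: tall, pedestrian-like
anchors = kmeans_pp_anchors(boxes, k=9)
print(anchors.shape)   # (9, 2)
```
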
Before multi-target tracking, the trained pedestrian detection YOLOv3 network model must be used to detect the targets, specifically:
Continuous frames of images of any size are input into the trained pedestrian detection YOLOv3 network model. The input images are first adaptively resized; each grid cell predicts B bounding boxes and detects C classes of targets, and the network outputs the bounding boxes of each class together with their confidences. The confidence of a bounding box is defined as the intersection-over-union (IOU) between the bounding box and the object's actual bounding box, multiplied by the probability that an object is present in the bounding box:

Confidence = Pr(Object) × IOU(pred, truth)

where Confidence is the confidence of the bounding box, Pr(Object) is the probability that an object exists within the bounding box, and IOU(pred, truth) is the intersection-over-union between the predicted and actual bounding boxes.

By setting a threshold, bounding boxes whose class confidence falls below the threshold are eliminated, and the remaining boxes are screened with the NMS method, yielding the 5 bounding-box parameters (x, y, w, h, p_c), where (x, y) are the coordinates of the target center relative to the upper-left corner of its grid cell, (w, h) are the width and height of the target relative to the entire image, and p_c is the normalized probability of the target class; the final network output has size S × S × (5 × B + C).
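The screening just described, confidence thresholding followed by non-maximum suppression, can be sketched as follows; the thresholds and box values are assumed examples:

```python
import numpy as np

def iou_xywh(a, b):
    """IoU of two boxes given as (x_center, y_center, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def nms(boxes, scores, conf_thresh=0.5, iou_thresh=0.45):
    """Drop low-confidence boxes, then greedily suppress overlapping ones."""
    keep_mask = scores >= conf_thresh
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    kept = []
    for i in np.argsort(-scores):        # highest confidence first
        if all(iou_xywh(boxes[i], boxes[j]) < iou_thresh for j in kept):
            kept.append(i)
    return boxes[kept], scores[kept]

boxes = np.array([[50.0, 50, 20, 40],    # pedestrian A
                  [52.0, 51, 20, 40],    # near-duplicate of A (gets suppressed)
                  [120.0, 60, 18, 38]])  # pedestrian B
scores = np.array([0.9, 0.8, 0.7])
kept_boxes, kept_scores = nms(boxes, scores)
print(len(kept_boxes))   # 2
```
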
The multi-target tracking in the step3 specifically comprises the following steps:
step 1: input to the multi-target tracking algorithm: the target coordinate information (c_x, c_y, r, h, p) obtained from the improved YOLOv3 network detection is expanded into an 8-dimensional vector X = [c_x, c_y, r, h, v_x, v_y, v_r, v_h] used as the input of the multi-target tracking algorithm, where p is the confidence score, (c_x, c_y) is the center coordinate of the bounding box, r the aspect ratio, h the height, and v_x, v_y, v_r, v_h the velocity change values of c_x, c_y, r, h;
Step 2: and (3) state estimation: firstly, predicting the position of a tracker at the next moment by using Kalman filtering, and then updating the predicted position based on a detection result obtained by the Kalman filtering;
step 3: assignment problem: the Hungarian algorithm is used to associate the detection results with the tracking predictions obtained from the Kalman filtering algorithm, considering both the association of motion information and the association of target appearance information;
Association of motion information: the Mahalanobis distance between the Kalman-predicted state and the new measurement expresses the motion information:

d^(1)(i, j) = (d_j − y_i)^T S_i^(−1) (d_j − y_i)

where d^(1)(i, j) represents the degree of motion matching between the jth detection box and the ith track, d_j is the position of the jth detection box, y_i is the state vector of the ith track, and S_i is the covariance matrix between the detected position and the mean track position. The association of the motion state is considered successful if the Mahalanobis distance of an association is smaller than a specified threshold, which is derived from a separate training set;
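The Mahalanobis gating can be illustrated with assumed track and detection values; the 9.4877 gate below is the 95% chi-square quantile for 4 degrees of freedom commonly used with this metric:

```python
import numpy as np

def mahalanobis_sq(d_j, y_i, S_i):
    """Squared Mahalanobis distance d^(1)(i, j) between detection j and track i."""
    diff = d_j - y_i
    return float(diff.T @ np.linalg.inv(S_i) @ diff)

# Hypothetical track and detection in the (cx, cy, r, h) measurement space.
y_i = np.array([100.0, 200.0, 0.5, 60.0])   # predicted track position
S_i = np.diag([4.0, 4.0, 0.01, 9.0])        # predicted covariance (assumed)
d_j = np.array([103.0, 202.0, 0.5, 62.0])   # new detection

d1 = mahalanobis_sq(d_j, y_i, S_i)
gate = 9.4877   # 95% chi-square quantile, 4 degrees of freedom
print(d1, d1 < gate)   # association passes the motion gate
```
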
An association method for target appearance information is introduced, measuring the distance between appearance features with the cosine distance:

d^(2)(i, j) = min{ 1 − r_j^T r_k^(i) | r_k^(i) ∈ R_i }

subject to ||r_k|| = 1, where R_i stores the feature vectors successfully associated with the most recent n frames and r_j, r_k are the feature vectors being compared; the cosine distance thus measures the tracker's appearance features against the appearance features corresponding to the detection result;
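The appearance metric can be sketched as the smallest cosine distance between a detection's feature vector and a track's stored gallery; the features below are random unit vectors used purely for illustration:

```python
import numpy as np

def cosine_distance(track_gallery, r_j):
    """d^(2)(i, j): smallest cosine distance between detection feature r_j and
    the gallery of unit-norm features stored for track i."""
    return float(min(1.0 - r_k @ r_j for r_k in track_gallery))

rng = np.random.default_rng(1)

def unit(v):
    return v / np.linalg.norm(v)

# Gallery of appearance features from the track's recent frames (all ||r|| = 1).
gallery = [unit(rng.standard_normal(128)) for _ in range(5)]
same = unit(gallery[0] + 0.05 * rng.standard_normal(128))   # looks like the track
other = unit(rng.standard_normal(128))                      # unrelated pedestrian

# A matching appearance gives a much smaller distance than an unrelated one.
print(cosine_distance(gallery, same) < cosine_distance(gallery, other))
```
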
The association metric is obtained by weighting the motion model and the appearance model:

c_{i,j} = λ d^(1)(i, j) + (1 − λ) d^(2)(i, j) (7)

where c_{i,j} represents the comprehensive matching degree and λ is a hyper-parameter, 0 by default. Only when c_{i,j} lies within the intersection of the two metric thresholds is the association considered correct; after the assignment is completed, the unmatched detectors and trackers are classified;
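The weighted metric and the subsequent assignment can be sketched as follows; a brute-force optimal assignment stands in for the Hungarian algorithm and is practical only for the toy-sized cost matrices used here:

```python
import numpy as np
from itertools import permutations

def combined_cost(d1, d2, lam=0.0):
    """c_{i,j} = lambda * d1 + (1 - lambda) * d2; with the default lambda = 0,
    only the appearance term drives the cost."""
    return lam * d1 + (1.0 - lam) * d2

def assign(cost):
    """Optimal assignment by exhaustive search over permutations.
    (Deep-SORT uses the Hungarian algorithm, which scales polynomially.)"""
    n = cost.shape[0]
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i, p[i]] for i in range(n)))
    return list(best)

# Hypothetical 3 tracks x 3 detections.
d1 = np.array([[1.0, 5.0, 9.0], [6.0, 2.0, 8.0], [7.0, 6.0, 1.5]])  # motion costs
d2 = np.array([[0.1, 0.9, 0.8], [0.7, 0.2, 0.9], [0.9, 0.8, 0.1]])  # appearance costs
cost = combined_cost(d1, d2, lam=0.0)
print(assign(cost))   # [0, 1, 2]: track i is matched to detection i
```
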
step 4: cascade matching and IOU matching: when a target is occluded for a long time, the accuracy of the Kalman filtering prediction decreases and the observability of the state space drops accordingly, so cascade matching gives priority to targets that appear more frequently. Trackers in the unconfirmed state, unmatched trackers and unmatched detections then undergo IOU matching and are assigned again with the Hungarian algorithm;
step 5: update the parameters of the matched trackers, delete trackers that remain unmatched, and initialize unmatched detections as new targets. Judge whether the video stream has ended; if so, exit the loop, otherwise proceed to the detection of the next frame.
Generally, the technical solution conceived by the present invention can obtain the following beneficial effects:
A depthwise separable convolution module is introduced into the YOLOv3 network model to replace the standard convolution module in YOLOv3, increasing the running speed of the algorithm.
An SENet module is added to the YOLOv3 prediction layer; exploiting SENet's modeling of the correlation and importance of features across different channels strengthens the feature extraction capability of the network and improves detection precision.
In the target detection network, the K-means++ clustering algorithm replaces the K-means clustering algorithm to modify the anchor boxes so that they better fit pedestrian characteristics, improving feature extraction and the detection precision of the algorithm.
The improved YOLOv3 algorithm performs the detection of target information, and the tracking part uses the Deep-SORT algorithm. Experimental results show that the proposed tracking algorithm effectively reduces missed detections and occlusion failures while maintaining a high detection speed and a good tracking effect.
The above description is only an outline of the technical solution of the present invention, and the embodiments of the present invention will be described below in order to make the technical means of the present invention more clearly understood and to make the content, features, and advantages of the present invention more comprehensible.
Drawings
Fig. 1 is a flow chart of a specific algorithm of the present invention.
Fig. 2 is a diagram of an improved YOLOv3 network framework.
FIG. 3 is a diagram of the SEnet module architecture.
FIG. 4 is a diagram of a standard convolution structure and a depth separable convolution structure. Wherein, (a) represents a standard convolution structure, (b) represents a deep convolution structure, and (c) represents a point-by-point convolution structure.
FIG. 5 is a comparison of the test results of the model of the present invention and the original model, where (a) shows the tracking results of YOLOv3-Deep-SORT at different frame numbers and (b) shows the tracking results of the proposed algorithm at different frame numbers.
Detailed Description
The following further describes the embodiments of the present invention with reference to the drawings.
As shown in fig. 1, the present invention provides a pedestrian multi-target tracking method based on an improved YOLOv3 model, which includes:
step 1: build the improved YOLOv3 target detection sub-network, the basis of detection-based tracking, as shown in fig. 2; this is divided into the following steps:
step 1.1: as shown in fig. 4, a depthwise separable convolution module is introduced into the Darknet-53 feature extraction layer to replace the standard convolution in the original Darknet-53;
step 1.2: as shown in fig. 3, an SENet module is introduced into the YOLO prediction layer, embedded after the output vectors of the 26th, 42nd and 53rd layers of the Darknet-53 feature extraction layer of the YOLOv3 network.
Step 2: select a data set containing pedestrian images from the VOC2007 pictures, perform cluster analysis on the data set labels with the K-means++ clustering algorithm, and train the pedestrian detection YOLOv3 network model. The method comprises the following steps:
step 2.1: 10000 pedestrian photos are extracted from the VOC2007 and MOT 2015 public data sets and labeled with a labeling tool; the pictures are then divided into a training set and a test set at a ratio of 2 : 1, from which the training samples are selected.
Step 2.2: and (3) carrying out prior frame clustering on the samples by using a K-means + + algorithm to obtain new anchors (the number of the anchors is selected to be 9), and carrying out iterative training on a Yolov3 pedestrian detection network model by using the new anchors.
And step 3: the improved YOLOv3 network is used as a detector for target detection, and is combined with a Deep-SORT multi-target tracking algorithm to realize multi-target tracking of pedestrians. The method comprises the following steps:
step 3.1: target detection part: continuous frames of images of any size are input into the improved YOLOv3 network model. The input images are first adaptively resized to 416 × 416; each grid cell predicts B bounding boxes (B = 9) and detects C classes of targets (for pedestrian detection the class is set to person), and the network outputs the bounding boxes of each class together with their confidences. The confidence of a bounding box is defined as the intersection-over-union (IOU) between the bounding box and the object's actual bounding box, multiplied by the probability that an object is present in the bounding box:

Confidence = Pr(Object) × IOU(pred, truth)

where Confidence is the confidence of the bounding box, Pr(Object) is the probability that an object exists within the bounding box, and IOU(pred, truth) is the intersection-over-union between the predicted and actual bounding boxes.

By setting a threshold, bounding boxes whose class confidence falls below the threshold are eliminated, and the remaining boxes are screened with the NMS (non-maximum suppression) method, yielding the 5 bounding-box parameters (x, y, w, h, p_c), where (x, y) are the coordinates of the target center relative to the upper-left corner of its grid cell, (w, h) are the ratios of the target's width and height to the entire image, and p_c is the normalized probability of the target class; the final network output has size S × S × (5 × B + C).
Step 3.2: referring to fig. 1, the improved YOLOv3 network is used as a detector for target detection, and the multi-target tracking part specifically includes the following steps:
step 1: target detection: target detection is performed on the input video stream to obtain box and feature information; the detected target coordinate information (c_x, c_y, r, h, p) is then expanded into an 8-dimensional vector X = [c_x, c_y, r, h, v_x, v_y, v_r, v_h] used as the input of the multi-target tracking algorithm, where p is the confidence score, (c_x, c_y) is the center coordinate of the bounding box, r the aspect ratio, h the height, and v_x, v_y, v_r, v_h the respective velocity change values.
Step 2: and (3) state estimation: the position of the tracker at the next time instant is first predicted using kalman filtering, and then the predicted position is updated based on the detection result.
Step 3: assignment problem: the Hungarian algorithm is utilized to solve the problem of association between the detection result and the tracking prediction result, and meanwhile, the association of the motion information and the association of the target appearance information are considered.
Correlation of motion information: and (3) predicting the Mahalanobis distance between the state and the new measurement by adopting a Kalman filter to express the motion information:
in the formula (d)(1)(i, j) represents the degree of motion matching between the j detection frames and the ith track, djIndicates the position of the jth detection frame, yiState vector representing the ith trace, SiRepresenting the covariance matrix between the detected position and the average position. The association of the set motion state is successful if the mahalanobis distance of a certain association is smaller than a specified threshold (which is derived from a separate training set).
Introducing a correlation method of target appearance information, measuring the distance between the apparent features by using cosine distance, wherein the calculation formula is as follows:
in the formula, the limitation is | | | ri||=1,Used to store the feature vectors that were successfully associated with the last 100 frames. The cosine distance is used to measure the apparent characteristics of the tracker and the apparent characteristics corresponding to the detection result.
And the relevance measurement is obtained by weighting the motion model and the appearance model:
ci,j=λd(1)(i,j)+(1-λ)d(2)(i,j) (4)
in the formula, ci,jAnd the comprehensive matching degree is shown, and lambda is a hyper-parameter and is 0 by default. Only c isi,jWhen the assignment is complete, the unmatched detectors and trackers are classified.
Step 4: cascade matching and IOU matching: when the target is shielded for a long time, the correctness of the Kalman filtering prediction result is reduced, and the observability in the state space is correspondingly reduced, so that the priority is given to the more frequently-appearing target through cascade matching. And for the trackers in unconfirmed states, the unmatched trackers and the unmatched detection, performing IOU matching, and assigning by using the Hungarian algorithm again.
step 5: update the parameters of the matched trackers, delete trackers that remain unmatched, and initialize unmatched detections as new targets. Judge whether the video stream has ended; if so, exit the loop, otherwise proceed to the detection of the next frame.
Step 4: simulation experiments:
Qualitative experiment: sequences from the MOT16 multi-target tracking data set are selected for a multi-target tracking experiment; as shown in fig. 5, the improved network model achieves a certain improvement in accuracy, miss rate and other aspects.
Quantitative experiment: as shown in table 1, the MOT15 multi-target tracking data set is selected for testing and 7 relatively advanced multi-target tracking algorithms are selected for comparison; the improved network model shows clear advantages, with corresponding overall improvements in the performance indexes.
TABLE 1 Comparison of multi-target tracking algorithm evaluation indexes
The present invention is not intended to be limited to the particular embodiments shown above, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (6)
1. The pedestrian multi-target tracking video identification method based on the improved YOLOv3 model is characterized by comprising the following steps:
step 1: pedestrian detection part: improving the YOLOv3 target detection network by introducing a depthwise separable convolution module to replace the standard convolution module in the Darknet-53 feature extraction layer, and introducing an SENet module added to the YOLO prediction layer;
step 2: selecting a data set containing pedestrian images from public data sets, performing cluster analysis on the data set labels with the K-means++ clustering algorithm instead of the K-means clustering algorithm, and training the pedestrian detection YOLOv3 network model;
step 3: multi-target tracking part: performing target detection with the trained pedestrian detection YOLOv3 network model, and performing multi-target tracking of pedestrians in combination with the Deep-SORT algorithm.
2. The pedestrian multi-target tracking video identification method based on the improved YOLOv3 model according to claim 1, wherein the step1 is further specifically as follows:
step 1.1: introducing a depthwise separable convolution module into the Darknet-53 feature extraction layer to replace the standard convolution module in the original Darknet-53; the depthwise separable convolution considers channels and spatial regions separately, decomposing the standard convolution into a depthwise convolution and a pointwise convolution: the depthwise convolution first applies a 3 × 3 convolution to each single channel of the feature map to collect per-channel features, and the pointwise convolution then applies a 1 × 1 convolution to the depthwise output to collect the features at each point;
step 1.2: introducing an SENet module into the YOLO prediction layer, embedding an SENet module after the output vectors of the 26th, 43rd and 53rd layers of the network.
3. The pedestrian multi-target tracking video identification method based on the improved YOLOv3 model according to claim 1 or 2, wherein the step2 is specifically as follows:
step 2.1: extracting N pedestrian photos from the public data sets and labeling the photos with a labeling tool; then dividing the pictures into a training set and a test set in proportion;
step 2.2: performing prior-box clustering on the picture training-set samples with the K-means++ clustering algorithm instead of the K-means clustering algorithm to obtain new anchor boxes, and iteratively training the pedestrian detection YOLOv3 network model with the new anchor boxes.
4. The pedestrian multi-target tracking video recognition method based on the improved YOLOv3 model as claimed in claim 1 or 2, wherein a trained pedestrian detection YOLOv3 network model is required to detect the target before multi-target tracking, specifically:
inputting continuous frames of images of any size into the trained pedestrian detection YOLOv3 network model; the input images are first adaptively resized, B bounding boxes are predicted in each grid cell, C classes of targets are detected, and the bounding box of each class of target and its confidence are output; the confidence of a bounding box is defined as the probability that an object exists in the bounding box multiplied by the intersection-over-union (IOU) between the predicted bounding box and the actual bounding box of the object, calculated as:

Confidence = Pr(Object) × IOU(pred, truth)

where Confidence is the confidence of the bounding box, Pr(Object) is the probability that an object exists within the bounding box, and IOU(pred, truth) is the intersection-over-union between the predicted bounding box and the actual bounding box of the object;

a threshold is set and every bounding box whose class confidence falls below it is eliminated; the remaining boxes are then screened with the NMS method to obtain the 5 parameters (x, y, w, h, p_c) of each bounding box, where (x, y) are the coordinates of the target center relative to the upper-left corner of its grid cell, (w, h) are the width and height of the target normalized by the whole image, and p_c is the normalized probability of the target class; the final network output has size S × S × (5 × B + C).
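The screening described above (confidence thresholding followed by non-maximum suppression) can be sketched in NumPy; this is a generic greedy NMS, hedged as an illustration rather than the patent's exact procedure, with boxes in (x1, y1, x2, y2) form and hypothetical threshold defaults:

```python
import numpy as np

def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def filter_and_nms(boxes, scores, conf_thresh=0.5, iou_thresh=0.45):
    """Drop boxes below the confidence threshold, then greedy NMS."""
    keep_mask = scores >= conf_thresh
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = np.argsort(-scores)            # process highest score first
    kept = []
    while len(order):
        i = order[0]
        kept.append(i)
        # suppress all remaining boxes that overlap the kept box too much
        order = np.array([j for j in order[1:]
                          if iou(boxes[i], boxes[j]) <= iou_thresh])
    return boxes[kept], scores[kept]
```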
5. The pedestrian multi-target tracking video recognition method based on the improved YOLOv3 model as claimed in claim 3, wherein a trained pedestrian detection YOLOv3 network model is required to detect the target before multi-target tracking, specifically:
inputting continuous frames of images of any size into the trained pedestrian detection YOLOv3 network model; the input images are first adaptively resized, B bounding boxes are predicted in each grid cell, C classes of targets are detected, and the bounding box of each class of target and its confidence are output; the confidence of a bounding box is defined as the probability that an object exists in the bounding box multiplied by the intersection-over-union (IOU) between the predicted bounding box and the actual bounding box of the object, calculated as:

Confidence = Pr(Object) × IOU(pred, truth)

where Confidence is the confidence of the bounding box, Pr(Object) is the probability that an object exists within the bounding box, and IOU(pred, truth) is the intersection-over-union between the predicted bounding box and the actual bounding box of the object;

a threshold is set and every bounding box whose class confidence falls below it is eliminated; the remaining boxes are then screened with the NMS method to obtain the 5 parameters (x, y, w, h, p_c) of each bounding box, where (x, y) are the coordinates of the target center relative to the upper-left corner of its grid cell, (w, h) are the width and height of the target normalized by the whole image, and p_c is the normalized probability of the target class; the final network output has size S × S × (5 × B + C).
6. The pedestrian multi-target tracking video identification method based on the improved YOLOv3 model according to claim 1, 2 or 5, wherein the multi-target tracking in step 3 is specifically:
step 1: input to the multi-target tracking algorithm: the target coordinate information (c_x, c_y, r, h, p) obtained from the improved YOLOv3 network detection is extended to an 8-dimensional vector X = [c_x, c_y, r, h, v_x, v_y, v_r, v_h] as the input of the multi-target tracking algorithm, where p is the confidence score, (c_x, c_y) are the center coordinates of the bounding box, r is the aspect ratio, h is the height, and v_x, v_y, v_r, v_h are the rates of change of c_x, c_y, r, h;
step 2: state estimation: Kalman filtering first predicts the position of each tracker at the next moment, and the predicted position is then updated with the detection result;
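The predict/update cycle of step 2 can be sketched for a single state component; this is a textbook constant-velocity Kalman filter in NumPy (the full tracker uses the 8-dimensional state above), with the noise values Q and R chosen arbitrarily for illustration:

```python
import numpy as np

def kalman_predict(x, P, F, Q):
    """Project state and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def kalman_update(x, P, z, H, R):
    """Correct the prediction with a new detection z."""
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ (z - H @ x)                # blend prediction and measurement
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# constant-velocity model for one component (e.g. c_x): pos += vel each frame
F = np.array([[1., 1.], [0., 1.]])
H = np.array([[1., 0.]])                   # only the position is measured
Q = 0.01 * np.eye(2)
R = np.array([[0.1]])
x, P = np.array([0., 1.]), np.eye(2)
x, P = kalman_predict(x, P, F, Q)          # predicted position is 1.0
x, P = kalman_update(x, P, np.array([1.2]), H, R)
```

After the update, the estimate lies between the prediction (1.0) and the detection (1.2), weighted by their uncertainties.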
step 3: assignment problem: the Hungarian algorithm is used to associate the detection results with the tracking predictions obtained by the Kalman filtering algorithm, considering both the association of motion information and the association of target appearance information;
association of motion information: the Mahalanobis distance between the Kalman-predicted state and the new measurement is used to represent the motion information:

d^(1)(i, j) = (d_j − y_i)^T S_i^(−1) (d_j − y_i)

where d^(1)(i, j) represents the degree of motion matching between the jth detection box and the ith track, d_j denotes the position of the jth detection box, y_i is the state vector of the ith track, and S_i is the covariance matrix between the detected position and the mean track position; if the Mahalanobis distance of an association is smaller than a specified threshold, obtained from a separate training set, the motion-state association is considered successful;
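The motion gate above can be sketched as follows. The gating threshold 9.4877 is the 95% chi-square quantile for 4 degrees of freedom, the value conventionally used with a 4-dimensional (c_x, c_y, r, h) measurement; the patent only says the threshold comes from a training set, so treat the constant as an illustrative assumption:

```python
import numpy as np

# 95% chi-square quantile for 4 degrees of freedom (assumed gate value)
GATE = 9.4877

def mahalanobis_sq(d_j, y_i, S_i):
    """Squared Mahalanobis distance between detection d_j and track mean y_i."""
    diff = d_j - y_i
    return float(diff.T @ np.linalg.inv(S_i) @ diff)

def motion_gate_ok(d_j, y_i, S_i, gate=GATE):
    """True if the association passes the motion gate."""
    return mahalanobis_sq(d_j, y_i, S_i) < gate
```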
an association method based on target appearance information is introduced, measuring the distance between appearance features with the cosine distance:

d^(2)(i, j) = min{ 1 − r_j^T r_k^(i) | r_k^(i) ∈ R_i }

subject to ||r_i|| = 1, where R_i stores the feature vectors of the last n successful associations of track i, and r_j, r_k are the two vectors being compared; the cosine distance thus measures the appearance feature of the tracker against the appearance feature corresponding to the detection result;
the association metric is obtained by weighting the motion model and the appearance model:

c_i,j = λ d^(1)(i, j) + (1 − λ) d^(2)(i, j)   (7)

where c_i,j denotes the comprehensive matching degree and λ is a hyper-parameter whose default value is 0; an association is considered correct only when c_i,j lies within the intersection of the two metric thresholds; after assignment is completed, the unmatched detectors and trackers are collected;
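The appearance distance d^(2) and the weighted metric of eq. (7) can be sketched together; a minimal NumPy illustration, assuming unit-normalized feature vectors and hypothetical function names:

```python
import numpy as np

def cosine_distance(r_j, gallery):
    """d2: smallest cosine distance between detection feature r_j and the
    stored feature vectors of a track's last n successful associations."""
    r_j = r_j / np.linalg.norm(r_j)
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return float(np.min(1.0 - gallery @ r_j))

def combined_cost(d1, d2, lam=0.0):
    """c_ij = lambda * d1 + (1 - lambda) * d2, eq. (7). With the default
    lambda = 0 only the appearance distance enters the cost; the Mahalanobis
    distance is still used separately for gating."""
    return lam * d1 + (1.0 - lam) * d2
```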
step 4: cascade matching and IOU matching: when a target reappears after a long occlusion, cascade matching gives priority to the more frequently seen targets; trackers in the unconfirmed state, unmatched trackers and unmatched detections then undergo IOU matching, and the Hungarian algorithm is applied again for assignment;
step 5: the parameters of matched trackers are updated, trackers that remain unmatched are deleted, and unmatched detections are initialized as new targets; whether the video stream has ended is then judged: if so, the loop exits; otherwise, the next frame is detected.
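The bookkeeping of step 5 can be sketched as one function; this is an illustrative dictionary-based lifecycle (all names and the `max_age` value are assumptions), not the patent's tracker implementation:

```python
def step_tracks(tracks, matches, unmatched_tracks, unmatched_dets, max_age=30):
    """One bookkeeping step after assignment: update matched trackers,
    age out unmatched ones, spawn new tracks for unmatched detections.
    `tracks` maps track id -> {'age': missed-frame count, ...}."""
    for tid, det in matches:
        tracks[tid]['age'] = 0                 # matched: reset miss counter
        tracks[tid]['last_det'] = det
    for tid in unmatched_tracks:
        tracks[tid]['age'] += 1                # unmatched: one more missed frame
        if tracks[tid]['age'] > max_age:
            del tracks[tid]                    # delete stale tracker
    next_id = max(tracks, default=-1) + 1
    for det in unmatched_dets:
        tracks[next_id] = {'age': 0, 'last_det': det}   # initialize new target
        next_id += 1
    return tracks
```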
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110151278.9A CN112836639A (en) | 2021-02-03 | 2021-02-03 | Pedestrian multi-target tracking video identification method based on improved YOLOv3 model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112836639A true CN112836639A (en) | 2021-05-25 |
Family
ID=75931941
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113221808A (en) * | 2021-05-26 | 2021-08-06 | 新疆爱华盈通信息技术有限公司 | Dinner plate counting statistical method and device based on image recognition |
CN113313008A (en) * | 2021-05-26 | 2021-08-27 | 南京邮电大学 | Target and identification tracking method based on YOLOv3 network and mean shift |
CN113392754A (en) * | 2021-06-11 | 2021-09-14 | 成都掌中全景信息技术有限公司 | Method for reducing false detection rate of pedestrian based on yolov5 pedestrian detection algorithm |
CN113470076A (en) * | 2021-07-13 | 2021-10-01 | 南京农业大学 | Multi-target tracking method for yellow-feather chickens in flat-breeding henhouse |
CN113688797A (en) * | 2021-09-27 | 2021-11-23 | 江南大学 | Abnormal behavior identification method and system based on skeleton extraction |
CN113723361A (en) * | 2021-09-18 | 2021-11-30 | 西安邮电大学 | Video monitoring method and device based on deep learning |
CN113763427A (en) * | 2021-09-05 | 2021-12-07 | 东南大学 | Multi-target tracking method based on coarse-fine shielding processing |
CN113822153A (en) * | 2021-08-11 | 2021-12-21 | 桂林电子科技大学 | Unmanned aerial vehicle tracking method based on improved DeepSORT algorithm |
CN114241397A (en) * | 2022-02-23 | 2022-03-25 | 武汉烽火凯卓科技有限公司 | Frontier defense video intelligent analysis method and system |
CN114879891A (en) * | 2022-05-19 | 2022-08-09 | 中国人民武装警察部队工程大学 | Multi-mode man-machine interaction method under self-supervision multi-target tracking |
CN116188767A (en) * | 2023-01-13 | 2023-05-30 | 湖北普罗格科技股份有限公司 | Neural network-based stacked wood board counting method and system |
CN116416281A (en) * | 2023-04-28 | 2023-07-11 | 云观智慧科技(无锡)有限公司 | Grain depot AI video supervision and analysis method and system |
CN114879891B (en) * | 2022-05-19 | 2024-04-26 | 中国人民武装警察部队工程大学 | Multi-mode man-machine interaction method under self-supervision multi-target tracking |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112215208A (en) * | 2020-11-10 | 2021-01-12 | 中国人民解放军战略支援部队信息工程大学 | Remote sensing image bridge target detection algorithm based on improved YOLOv4 |
CN112308881A (en) * | 2020-11-02 | 2021-02-02 | 西安电子科技大学 | Ship multi-target tracking method based on remote sensing image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||