CN112836639A - Pedestrian multi-target tracking video identification method based on improved YOLOv3 model

Pedestrian multi-target tracking video identification method based on improved YOLOv3 model

Info

Publication number
CN112836639A
CN112836639A (application CN202110151278.9A)
Authority
CN
China
Prior art keywords
target
detection
algorithm
pedestrian
yolov3
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110151278.9A
Other languages
Chinese (zh)
Inventor
张相胜
沈庆
姚猛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202110151278.9A priority Critical patent/CN112836639A/en
Publication of CN112836639A publication Critical patent/CN112836639A/en
Pending legal-status Critical Current

Classifications

    • G06V 40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F 18/23213 — Non-hierarchical clustering using statistics or function optimisation, with a fixed number of clusters, e.g. K-means clustering
    • G06N 3/04 — Neural networks; architecture, e.g. interconnection topology
    • G06N 3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06T 7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/277 — Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06V 10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/40 — Extraction of image or video features
    • G06T 2207/10016 — Video; image sequence
    • G06T 2207/20081 — Training; learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/30196 — Human being; person

Abstract

A pedestrian multi-target tracking video identification method based on an improved YOLOv3 model, belonging to the image-processing field of computer vision. In the YOLOv3 network, the original standard convolutions in the Darknet-53 feature extraction layer are replaced with depthwise separable convolutions; an SENet module is introduced into the prediction layer of the YOLOv3 network; and the target boxes in the selected data set are clustered with the K-means++ clustering algorithm, the prior-box parameters of the network are optimized according to the clustering result, and the anchor boxes are corrected. The invention adopts a tracking-by-detection framework: the improved YOLOv3 algorithm performs the detection of target information, and the tracking part uses the Deep-SORT algorithm, so that the overall algorithm effectively reduces missed detections and occlusion failures while maintaining a high detection speed and a good tracking effect.

Description

Pedestrian multi-target tracking video identification method based on improved YOLOv3 model
Technical Field
The invention belongs to the image-processing field of computer vision, and specifically relates to a method that improves the YOLOv3 network structure to address the high missed-detection rate and low detection speed for pedestrian targets in multi-target tracking, thereby improving the model's detection accuracy and detection speed for pedestrian targets. The detection part detects pedestrian targets with an improved YOLOv3 algorithm, the tracking part predicts the motion trajectory of each target with a Kalman filtering algorithm, and the data-association part matches and associates targets with the Hungarian algorithm.
Background
With the rapid development of deep learning, convolutional neural networks have gradually shown advantages over traditional hand-designed features, and deep neural networks show excellent performance in the field of machine vision, attracting wide attention from scholars. Pedestrians, as a vulnerable group in the road-traffic environment, face numerous safety problems, so establishing a complete pedestrian detection system has become a research hotspot; applying deep learning to driver-assistance systems is also becoming a trend. This invention therefore studies deep-learning-based target detection and tracking algorithms with road pedestrians as the research object.
In recent years, detection-based multi-target tracking has gradually become the mainstream approach in the multi-target tracking field, but it places high demands on the accuracy of the detection result: a complex background strongly affects target detection and in turn degrades the tracking effect. Even the currently advanced YOLOv3 algorithm suffers from limited detection accuracy and detection speed; in addition, how to effectively establish the target model between the detector and the tracker is also important. Providing a pedestrian detection and tracking algorithm with higher detection accuracy and higher detection speed is therefore a problem to be solved by those skilled in the art.
Disclosure of Invention
In order to improve the detection accuracy and speed of a pedestrian multi-target tracking algorithm, the invention provides a pedestrian multi-target tracking video identification method based on an improved YOLOv3 network model. On the basis of the YOLOv3 network model and the Deep-SORT algorithm, the prior boxes are optimized with the K-means++ clustering method and an SENet module is embedded into the YOLOv3 prediction layer to address occlusion and missed detection, while the standard convolutions of the YOLOv3 network are replaced with depthwise separable convolutions for feature extraction to address the low detection speed of the algorithm. A classical tracking-by-detection framework is adopted: the detection part uses the improved YOLOv3 algorithm to detect target information, and the tracking part uses the Deep-SORT algorithm.
The technical scheme adopted by the invention is as follows:
the pedestrian multi-target tracking video identification method based on the improved YOLOv3 model comprises the following steps:
step 1: the pedestrian detection section: improving a YOLOv3 target detection network, introducing a depth separable convolution module, and replacing a standard convolution module in a Darknet-53 feature extraction layer with the depth separable convolution module; introducing a SENET module, and adding the SENET module into a YOLO prediction layer;
step 2: selecting a data set containing a pedestrian image from the public data set, using a K-means + + clustering algorithm to replace the K-means clustering algorithm to perform clustering analysis on the data set labels, and training a pedestrian detection Yolov3 network model;
and step 3: a multi-target tracking section: carrying out target detection by using a trained pedestrian detection YOLOv3 network model, and carrying out multi-target tracking on pedestrians by combining with a Deep-SORT algorithm;
the step1 is further specifically as follows:
step 1.1: a depthwise separable convolution module is introduced into the Darknet-53 feature extraction layer to replace the standard convolution module in the original Darknet-53; the depthwise separable convolution treats channels and spatial regions separately, decomposing the standard convolution into a depthwise convolution and a pointwise convolution: the depthwise convolution first applies a separate 3 × 3 convolution to each single channel of the feature map to collect per-channel features, and the pointwise convolution then applies a 1 × 1 convolution to the depthwise output to collect the features at each point;
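As an illustration of step 1.1, a minimal PyTorch sketch of a depthwise separable block (the batch-norm placement, LeakyReLU activation, and channel sizes are assumptions in the spirit of Darknet-53, not values specified by the patent):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        # Pointwise: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

# Hypothetical drop-in for a standard 3x3 convolution in Darknet-53:
# conv = DepthwiseSeparableConv(256, 512, stride=2)
```

This factorization cuts the multiply-accumulate cost of a 3 × 3 convolution roughly by a factor of 1/out_ch + 1/9, which is where the claimed speed-up comes from.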
step 1.2: an SENet module is introduced into the YOLO prediction layer; SENet modules are embedded after the output vectors of the 26th, 43rd and 53rd layers of the network, respectively.
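For reference, a minimal sketch of the squeeze-and-excitation block that the SENet module denotes; the reduction ratio of 16 is the common default from the SENet paper, not a value given in this patent:

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels using global context."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # squeeze: global average pool
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                         # per-channel weights in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                              # excitation: rescale channels
```

Because the block learns which channels matter for the current input, embedding it before each prediction head strengthens informative features at little extra cost.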
Step 2 specifically comprises the following steps:
step 2.1: extract N pedestrian photos from the public data sets and label the photos with a labeling tool; then divide the pictures into a training set and a test set in proportion;
step 2.2: perform prior-box clustering on the samples of the picture training set with the K-means++ clustering algorithm in place of the K-means clustering algorithm to obtain new anchor boxes, and iteratively train the pedestrian detection YOLOv3 network model with the new anchor boxes.
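A sketch of the anchor clustering of step 2.2, under the usual YOLO convention of clustering labeled (width, height) pairs with the distance d = 1 − IoU; the distance metric and the empty-cluster handling are assumptions, since the patent does not spell them out:

```python
import numpy as np

def iou_wh(boxes, centers):
    """IoU between (w, h) pairs, as if all boxes shared one corner."""
    inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centers[None, :, 1]))
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (centers[:, 0] * centers[:, 1])[None, :]
    return inter / (area_b + area_c - inter)

def kmeanspp_anchors(boxes, k=9, iters=100, seed=0):
    """K-means++ seeding, then Lloyd iterations with 1 - IoU distance."""
    rng = np.random.default_rng(seed)
    centers = [boxes[rng.integers(len(boxes))]]
    for _ in range(k - 1):
        d = 1.0 - iou_wh(boxes, np.array(centers)).max(axis=1)
        # K-means++: next center drawn with probability proportional to d^2.
        centers.append(boxes[rng.choice(len(boxes), p=d**2 / (d**2).sum())])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        assign = iou_wh(boxes, centers).argmax(axis=1)   # nearest = largest IoU
        centers = np.array([boxes[assign == i].mean(axis=0)
                            if np.any(assign == i) else centers[i]
                            for i in range(k)])
    return centers[np.argsort(centers.prod(axis=1))]     # sort anchors by area

# boxes: (N, 2) float array of labeled (width, height) pairs
# anchors = kmeanspp_anchors(boxes, k=9)
```

Compared with plain K-means, the d² seeding spreads the initial centers across the box-size distribution, making the resulting anchors less sensitive to initialization.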
Before multi-target tracking, the trained pedestrian detection YOLOv3 network model must first be used to detect targets, specifically:
Continuous frames of images of any size are input into the trained pedestrian detection YOLOv3 network model. The input images are first adaptively resized; B bounding boxes are predicted in each grid cell, C classes of targets are detected, and the bounding boxes of each class of target are output together with their confidences. The confidence of a bounding box is defined as the intersection over union (IOU) between the bounding box and the actual bounding box of the object, multiplied by the probability that an object exists within the bounding box, calculated as:

$$\text{Confidence} = P_r(\text{Object}) \times \text{IOU}_{\text{pred}}^{\text{truth}}$$

where Confidence is the confidence of the bounding box, $P_r(\text{Object})$ is the probability that an object exists within the bounding box, and $\text{IOU}_{\text{pred}}^{\text{truth}}$ is the intersection over union between the predicted bounding box and the actual bounding box of the object.
By setting a threshold, bounding boxes whose class confidence is below the threshold are eliminated, and the remaining boxes are screened with the NMS method to obtain the 5 parameters of each bounding box, $(x, y, w, h, p_c)$, where $(x, y)$ are the coordinates of the target center relative to the upper-left corner of the grid cell, $(w, h)$ are the width and height of the target relative to the entire image, and $p_c$ is the normalized probability value of the target class; the final network output has size $S \times S \times (5 \times B + C)$.
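As a worked illustration of the confidence definition, a small sketch computing the IOU of a predicted and an actual box in (x1, y1, x2, y2) corner format (the corner format and the sample numbers are illustrative only):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Confidence = Pr(Object) * IOU(pred, truth); the numbers are made up.
pr_object = 0.9
confidence = pr_object * iou((50, 60, 150, 260), (55, 58, 148, 255))
```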
The multi-target tracking in step 3 is specifically:
Step 1: input of the multi-target tracking algorithm: the target coordinate information $(c_x, c_y, r, h, p)$ obtained after improved YOLOv3 detection is expanded into an 8-dimensional vector $X = [c_x, c_y, r, h, v_x, v_y, v_r, v_h]$ as the input of the multi-target tracking algorithm, where $p$ is the confidence score, $(c_x, c_y)$ is the center coordinate of the bounding box, $r$ is the aspect ratio, $h$ is the height, and $v_x, v_y, v_r, v_h$ are the velocity change values of $c_x, c_y, r, h$;
Step 2: and (3) state estimation: firstly, predicting the position of a tracker at the next moment by using Kalman filtering, and then updating the predicted position based on a detection result obtained by the Kalman filtering;
Step 3: assignment problem: the Hungarian algorithm is used to solve the association between the detection results and the tracking predictions obtained from the Kalman filtering algorithm, considering simultaneously the association of motion information and the association of target appearance information;
Association of motion information: the Mahalanobis distance between the Kalman-predicted state and the new measurement expresses the motion information:

$$d^{(1)}(i, j) = (d_j - y_i)^T S_i^{-1} (d_j - y_i)$$

where $d^{(1)}(i, j)$ denotes the degree of motion matching between the $j$th detection box and the $i$th track, $d_j$ is the position of the $j$th detection box, $y_i$ is the state vector of the $i$th track, and $S_i$ is the covariance matrix between the detected position and the average track position. The association of the motion state is considered successful if the Mahalanobis distance of an association is smaller than a specified threshold, which is derived from a separate training set;
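A sketch of the Mahalanobis gate; the threshold 9.4877 is the 95% chi-square quantile for 4 degrees of freedom used in the Deep-SORT paper and is assumed here, since the patent only says the threshold comes from a separate training set:

```python
import numpy as np

def mahalanobis_gate(y, S, detections, threshold=9.4877):
    """Squared Mahalanobis distances of detections to one track prediction.

    y: (4,) predicted measurement mean; S: (4, 4) innovation covariance;
    detections: (M, 4) detections as (cx, cy, r, h).
    Returns the distances and a boolean mask of admissible pairs.
    """
    d = detections - y                          # innovation per detection
    dist = np.einsum('mi,ij,mj->m', d, np.linalg.inv(S), d)
    return dist, dist < threshold
```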
An association method based on target appearance information is introduced; the cosine distance measures the distance between apparent features:

$$d^{(2)}(i, j) = \min \{ 1 - r_j^T r_k^{(i)} \mid r_k^{(i)} \in R_i \}$$

with the constraint $\|r_j\| = 1$, where $R_i$ stores the feature vectors successfully associated over the most recent $n$ frames and $r_j$, $r_k^{(i)}$ are the appearance vectors being compared; the cosine distance thus measures the apparent features of the tracker against the apparent features corresponding to the detection result;
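A sketch of the appearance metric: the smallest cosine distance between a detection's embedding and the gallery of embeddings stored for one track (the array shapes are assumptions for illustration):

```python
import numpy as np

def appearance_distance(track_gallery, det_feature):
    """min over the gallery of (1 - cosine similarity) to the detection.

    track_gallery: (n, d) feature vectors from the last n associated frames;
    det_feature: (d,) embedding of the current detection.
    """
    det = det_feature / np.linalg.norm(det_feature)          # enforce ||r|| = 1
    gal = track_gallery / np.linalg.norm(track_gallery, axis=1, keepdims=True)
    return float((1.0 - gal @ det).min())
```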
The association metric is obtained by weighting the motion model and the appearance model:

$$c_{i,j} = \lambda d^{(1)}(i, j) + (1 - \lambda) d^{(2)}(i, j) \tag{7}$$

where $c_{i,j}$ denotes the comprehensive matching degree and $\lambda$ is a hyper-parameter, 0 by default. Only when $c_{i,j}$ lies within the intersection of the two metric thresholds is the association considered correct; after the assignment is completed, the unmatched detectors and trackers are sorted out;
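A sketch of the assignment step, solving equation (7) with the Hungarian algorithm via scipy's linear_sum_assignment; the rejection threshold max_cost is illustrative, not a value from the patent:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(motion_cost, appearance_cost, lam=0.0, max_cost=0.7):
    """Hungarian assignment on c = lam * d1 + (1 - lam) * d2.

    motion_cost, appearance_cost: (T, D) matrices of d1 and d2 values
    for T tracks and D detections.
    """
    cost = lam * motion_cost + (1.0 - lam) * appearance_cost
    rows, cols = linear_sum_assignment(cost)
    matches = [(t, d) for t, d in zip(rows, cols) if cost[t, d] <= max_cost]
    unmatched_tracks = sorted(set(range(cost.shape[0])) - {t for t, _ in matches})
    unmatched_dets = sorted(set(range(cost.shape[1])) - {d for _, d in matches})
    return matches, unmatched_tracks, unmatched_dets
```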
Step 4: cascade matching and IOU matching: when a target is occluded for a long time, the correctness of the Kalman filtering prediction decreases and the observability of the state space drops accordingly, so cascade matching gives priority to the targets that appear more frequently. Trackers in the unconfirmed state, unmatched trackers, and unmatched detections then undergo IOU matching and are assigned once more with the Hungarian algorithm;
Step 5: update the parameters of the matched trackers, delete the trackers that fail to match again, and initialize unmatched detections as new targets. Judge whether the video stream has ended; if so, exit the loop; otherwise, proceed to detection of the next frame.
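A sketch of the Step 5 bookkeeping; the Track record, the max_misses limit of 3, and the helper names are illustrative assumptions rather than values from the patent:

```python
from dataclasses import dataclass

@dataclass
class Track:
    """Minimal per-target record for track management."""
    track_id: int
    box: tuple            # latest (cx, cy, r, h)
    misses: int = 0       # consecutive frames without a match
    hits: int = 1         # total successful matches

def manage_tracks(tracks, matches, unmatched_tracks, unmatched_dets,
                  detections, next_id, max_misses=3):
    """Apply one frame's assignment result to the track list."""
    for t_idx, d_idx in matches:               # matched: refresh the track
        tracks[t_idx].box = detections[d_idx]
        tracks[t_idx].hits += 1
        tracks[t_idx].misses = 0
    for t_idx in unmatched_tracks:             # unmatched: count a miss
        tracks[t_idx].misses += 1
    tracks = [t for t in tracks if t.misses <= max_misses]   # delete stale tracks
    for d_idx in unmatched_dets:               # unmatched detections: new targets
        tracks.append(Track(next_id, detections[d_idx]))
        next_id += 1
    return tracks, next_id
```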
Generally, the technical solution conceived by the invention can obtain the following beneficial effects:
A depthwise separable convolution module is introduced into the YOLOv3 network model to replace the standard convolution module in YOLOv3, which increases the running speed of the algorithm.
The SENet module is added to the YOLOv3 prediction layer; by exploiting the ability of the SENet network to reflect the correlation and importance of features across different channels, the feature extraction capability of the network is enhanced and the detection accuracy is improved.
In the target detection network, the K-means++ clustering algorithm replaces the K-means clustering algorithm and the anchor boxes are modified, so that the anchors better fit the characteristics of pedestrians, feature extraction is performed better, and the detection accuracy of the algorithm is improved.
The improved YOLOv3 algorithm performs the detection of target information, and the tracking part uses the Deep-SORT algorithm. Experimental results show that the proposed tracking algorithm effectively reduces missed detections and occlusion failures while maintaining a high detection speed and a good tracking effect.
The above is only an outline of the technical solution of the invention; embodiments of the invention are described below so that the technical means of the invention can be understood more clearly and its content, features, and advantages become more apparent.
Drawings
Fig. 1 is a flow chart of a specific algorithm of the present invention.
Fig. 2 is a diagram of an improved YOLOv3 network framework.
FIG. 3 is a diagram of the SEnet module architecture.
FIG. 4 is a diagram of a standard convolution structure and a depth separable convolution structure. Wherein, (a) represents a standard convolution structure, (b) represents a deep convolution structure, and (c) represents a point-by-point convolution structure.
FIG. 5 is a comparison of the test results of the model of the invention and the original model, where (a) shows the tracking results of YOLOv3-Deep-SORT at different frame numbers and (b) shows the tracking results of the algorithm of the invention at different frame numbers.
Detailed Description
The following further describes the embodiments of the present invention with reference to the drawings.
As shown in fig. 1, the present invention provides a pedestrian multi-target tracking method based on an improved YOLOv3 model, which includes:
step 1: improve the YOLOv3 target detection sub-network, which is the basis of detection-based tracking, as shown in fig. 2; this is specifically divided into the following steps:
step 1.1: as shown in FIG. 4, a depthwise separable convolution module is introduced:
a depthwise separable convolution module is introduced into the Darknet-53 feature extraction layer to replace the standard convolution in the original Darknet-53;
step 1.2: as shown in FIG. 3, the SENet module is introduced into the YOLO prediction layer:
the SENet module is embedded after the output vectors of the 26th, 42nd and 53rd layers of the Darknet-53 feature extraction layer of the YOLOv3 network, respectively.
Step 2: select a data set containing pedestrian images from the VOC2007 pictures, perform cluster analysis on the data-set labels with the K-means++ clustering algorithm, and train the pedestrian detection YOLOv3 network model. The method comprises the following steps:
step 2.1: 10000 pedestrian photos are extracted from the VOC2007 and MOT 2015 public data sets and labeled with a labeling tool; the pictures are then divided into a training set and a test set at a ratio of 2:1 to select the training samples.
step 2.2: prior-box clustering is performed on the samples with the K-means++ algorithm to obtain new anchors (the number of anchors is set to 9), and the YOLOv3 pedestrian detection network model is iteratively trained with the new anchors.
step 3: the improved YOLOv3 network is used as the detector for target detection and is combined with the Deep-SORT multi-target tracking algorithm to realize multi-target tracking of pedestrians. The method comprises the following steps:
step 3.1: target detection section: continuous frames of images of any size are input into the improved YOLOv3 network model. The input images are first adaptively resized to 416 × 416; B bounding boxes are predicted in each grid cell (B = 9), C classes of targets are detected (in pedestrian detection the class is set to person), and the bounding boxes of each class of target are output together with their confidences. The confidence of a bounding box is defined as the intersection over union (IOU) between the bounding box and the actual bounding box of the object, multiplied by the probability that an object exists within the bounding box, calculated as:

$$\text{Confidence} = P_r(\text{Object}) \times \text{IOU}_{\text{pred}}^{\text{truth}}$$

where Confidence is the confidence of the bounding box, $P_r(\text{Object})$ is the probability that an object exists within the bounding box, and $\text{IOU}_{\text{pred}}^{\text{truth}}$ is the intersection over union between the predicted bounding box and the actual bounding box of the object.
By setting a threshold, bounding boxes whose class confidence is below the threshold are eliminated, and the remaining boxes are screened with NMS (non-maximum suppression) to obtain the 5 parameters of each bounding box, $(x, y, w, h, p_c)$, where $(x, y)$ are the coordinates of the target center relative to the upper-left corner of the grid cell, $(w, h)$ are the ratios of the target's width and height to the entire image, and $p_c$ is the normalized probability value of the target class; the final network output has size $S \times S \times (5 \times B + C)$.
Step 3.2: referring to fig. 1, the improved YOLOv3 network is used as a detector for target detection, and the multi-target tracking part specifically includes the following steps:
step 1: target detection: target detection is carried out on the input video stream to obtain frame and characteristic information, and then the target coordinate information (c) obtained after detection is carried outx,cyR, h, p) to obtain an 8-dimensional vector X ═ cx,cy,r,h,vx,vy,vr,vh]As input to the multi-target tracking algorithm. Where p is the confidence score and the center coordinate of the bounding box is (c)x,cy) Aspect ratio r, height h, and respective speedsDegree change value
Step 2: state estimation: the position of the tracker at the next moment is first predicted with Kalman filtering, and the predicted position is then updated based on the detection result.
Step 3: assignment problem: the Hungarian algorithm is used to solve the association between the detection results and the tracking prediction results, considering simultaneously the association of motion information and the association of target appearance information.
Association of motion information: the Mahalanobis distance between the Kalman-predicted state and the new measurement expresses the motion information:

$$d^{(1)}(i, j) = (d_j - y_i)^T S_i^{-1} (d_j - y_i)$$

where $d^{(1)}(i, j)$ denotes the degree of motion matching between the $j$th detection box and the $i$th track, $d_j$ is the position of the $j$th detection box, $y_i$ is the state vector of the $i$th track, and $S_i$ is the covariance matrix between the detected position and the average track position. The association of the motion state is successful if the Mahalanobis distance of an association is smaller than a specified threshold (which is derived from a separate training set).
An association method based on target appearance information is introduced; the cosine distance measures the distance between apparent features:

$$d^{(2)}(i, j) = \min \{ 1 - r_j^T r_k^{(i)} \mid r_k^{(i)} \in R_i \}$$

with the constraint $\|r_j\| = 1$, where $R_i$ stores the feature vectors successfully associated over the most recent 100 frames. The cosine distance measures the apparent features of the tracker against the apparent features corresponding to the detection result.
The association metric is obtained by weighting the motion model and the appearance model:

$$c_{i,j} = \lambda d^{(1)}(i, j) + (1 - \lambda) d^{(2)}(i, j) \tag{4}$$

where $c_{i,j}$ denotes the comprehensive matching degree and $\lambda$ is a hyper-parameter, 0 by default. Only when $c_{i,j}$ lies within the intersection of the two metric thresholds is the association considered correct; when the assignment is complete, the unmatched detectors and trackers are sorted out.
Step 4: cascade matching and IOU matching: when a target is occluded for a long time, the correctness of the Kalman filtering prediction decreases and the observability of the state space drops accordingly, so cascade matching gives priority to the targets that appear more frequently. Trackers in the unconfirmed state, unmatched trackers, and unmatched detections then undergo IOU matching and are assigned once more with the Hungarian algorithm.
Step 5: update the parameters of the matched trackers, delete the trackers that fail to match again, and initialize unmatched detections as new targets. Judge whether the video stream has ended; if so, exit the loop; otherwise, proceed to detection of the next frame.
step 4: simulation experiments
Qualitative experiment: sequences from the MOT16 multi-target tracking data set are selected for the multi-target tracking experiment; as shown in fig. 5, the improved network model achieves a certain improvement in accuracy, missed-detection rate, and other aspects.
Quantitative experiment: as shown in Table 1, the MOT15 multi-target tracking data set is selected for testing and 7 relatively advanced multi-target tracking algorithms are selected for comparison; the improved network model shows clear advantages, and the performance indexes improve correspondingly overall.
TABLE 1 Multi-target tracking algorithm evaluation index contrast
(Table 1 is reproduced as an image in the original publication; its numerical values are not recoverable from this text.)
The present invention is not intended to be limited to the particular embodiments shown above, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A pedestrian multi-target tracking video identification method based on an improved YOLOv3 model, characterized by comprising the following steps:
step 1: pedestrian detection section: improving the YOLOv3 target detection network by introducing a depthwise separable convolution module to replace the standard convolution module in the Darknet-53 feature extraction layer, and by introducing an SENet module into the YOLO prediction layer;
step 2: selecting a data set containing pedestrian images from the public data sets, performing cluster analysis on the data-set labels with the K-means++ clustering algorithm in place of the K-means clustering algorithm, and training the pedestrian detection YOLOv3 network model;
step 3: multi-target tracking section: performing target detection with the trained pedestrian detection YOLOv3 network model, and carrying out multi-target tracking of pedestrians in combination with the Deep-SORT algorithm.
2. The pedestrian multi-target tracking video identification method based on the improved YOLOv3 model according to claim 1, characterized in that step 1 is further specified as follows:
step 1.1: a depthwise separable convolution module is introduced into the Darknet-53 feature extraction layer to replace the standard convolution module in the original Darknet-53; the depthwise separable convolution treats channels and spatial regions separately, decomposing the standard convolution into a depthwise convolution and a pointwise convolution: the depthwise convolution first applies a separate 3 × 3 convolution to each single channel of the feature map to collect per-channel features, and the pointwise convolution then applies a 1 × 1 convolution to the depthwise output to collect the features at each point;
step 1.2: an SENet module is introduced into the YOLO prediction layer; SENet modules are embedded after the output vectors of the 26th, 43rd and 53rd layers of the network, respectively.
3. The pedestrian multi-target tracking video identification method based on the improved YOLOv3 model according to claim 1 or 2, characterized in that step 2 is specifically:
step 2.1: extracting N pedestrian photos from the public data sets and labeling the photos with a labeling tool; then dividing the pictures into a training set and a test set in proportion;
step 2.2: performing prior-box clustering on the samples of the picture training set with the K-means++ clustering algorithm in place of the K-means clustering algorithm to obtain new anchor boxes, and iteratively training the pedestrian detection YOLOv3 network model with the new anchor boxes.
4. The pedestrian multi-target tracking video identification method based on the improved YOLOv3 model according to claim 1 or 2, characterized in that the trained pedestrian detection YOLOv3 network model must be used to detect targets before multi-target tracking, specifically:
continuous frames of images of any size are input into the trained pedestrian detection YOLOv3 network model; the input images are first adaptively resized, B bounding boxes are predicted in each grid cell, C classes of targets are detected, and the bounding boxes of each class of target are output together with their confidences; the confidence of a bounding box is defined as the intersection over union (IOU) between the bounding box and the actual bounding box of the object, multiplied by the probability that an object exists within the bounding box, calculated as:

$$\text{Confidence} = P_r(\text{Object}) \times \text{IOU}_{\text{pred}}^{\text{truth}}$$

where Confidence is the confidence of the bounding box, $P_r(\text{Object})$ is the probability that an object exists within the bounding box, and $\text{IOU}_{\text{pred}}^{\text{truth}}$ is the intersection over union between the predicted bounding box and the actual bounding box of the object;
by setting a threshold, bounding boxes whose class confidence is below the threshold are eliminated, and the remaining boxes are screened with the NMS method to obtain the 5 parameters of each bounding box, $(x, y, w, h, p_c)$, where $(x, y)$ are the coordinates of the target center relative to the upper-left corner of the grid cell, $(w, h)$ are the width and height of the target relative to the entire image, and $p_c$ is the normalized probability value of the target class; the final network output has size $S \times S \times (5 \times B + C)$.
5. The pedestrian multi-target tracking video identification method based on the improved YOLOv3 model according to claim 3, characterized in that the trained pedestrian detection YOLOv3 network model must be used to detect targets before multi-target tracking, specifically:
continuous frames of images of any size are input into the trained pedestrian detection YOLOv3 network model; the input images are first adaptively resized, B bounding boxes are predicted in each grid cell, C classes of targets are detected, and the bounding boxes of each class of target are output together with their confidences; the confidence of a bounding box is defined as the intersection over union (IOU) between the bounding box and the actual bounding box of the object, multiplied by the probability that an object exists within the bounding box, calculated as:

$$\text{Confidence} = P_r(\text{Object}) \times \text{IOU}_{\text{pred}}^{\text{truth}}$$

where Confidence is the confidence of the bounding box, $P_r(\text{Object})$ is the probability that an object exists within the bounding box, and $\text{IOU}_{\text{pred}}^{\text{truth}}$ is the intersection over union between the predicted bounding box and the actual bounding box of the object;
by setting a threshold, bounding boxes whose class confidence is below the threshold are eliminated, and the remaining boxes are screened with the NMS method to obtain the 5 parameters of each bounding box, $(x, y, w, h, p_c)$, where $(x, y)$ are the coordinates of the target center relative to the upper-left corner of the grid cell, $(w, h)$ are the width and height of the target relative to the entire image, and $p_c$ is the normalized probability value of the target class; the final network output has size $S \times S \times (5 \times B + C)$.
6. The pedestrian multi-target tracking video identification method based on the improved YOLOv3 model according to claim 1, 2 or 5, characterized in that the multi-target tracking in step 3 is specifically:
Step 1: input of the multi-target tracking algorithm: the target coordinate information $(c_x, c_y, r, h, p)$ obtained after improved YOLOv3 detection is expanded into an 8-dimensional vector $X = [c_x, c_y, r, h, v_x, v_y, v_r, v_h]$ as the input of the multi-target tracking algorithm, where $p$ is the confidence score, $(c_x, c_y)$ is the center coordinate of the bounding box, $r$ is the aspect ratio, $h$ is the height, and $v_x, v_y, v_r, v_h$ are the velocity change values of $c_x, c_y, r, h$;
Step 2: state estimation: first predicting the position of the tracker at the next moment with Kalman filtering, then updating the predicted position based on the detection result;
Step 3: assignment problem: using the Hungarian algorithm to solve the association between the detection results and the tracking predictions obtained from the Kalman filtering algorithm, while considering both the association of motion information and the association of target appearance information;
association of motion information: the Mahalanobis distance between the Kalman-predicted state and the new measurement expresses the motion information:

$$d^{(1)}(i, j) = (d_j - y_i)^T S_i^{-1} (d_j - y_i)$$

where $d^{(1)}(i, j)$ denotes the degree of motion matching between the $j$th detection box and the $i$th track, $d_j$ is the position of the $j$th detection box, $y_i$ is the state vector of the $i$th track, and $S_i$ is the covariance matrix between the detected position and the average track position; if the Mahalanobis distance of an association is smaller than a specified threshold, the threshold being obtained from a separate training set, the association of the motion state is set as successful;
an association method based on target appearance information is introduced, with the cosine distance measuring the distance between apparent features:

$$d^{(2)}(i, j) = \min \{ 1 - r_j^T r_k^{(i)} \mid r_k^{(i)} \in R_i \}$$

with the constraint $\|r_j\| = 1$, where $R_i$ stores the feature vectors successfully associated over the most recent $n$ frames and $r_j$, $r_k^{(i)}$ are the appearance vectors being compared; the cosine distance measures the apparent features of the tracker against the apparent features corresponding to the detection result;
and the association metric is obtained by weighting the motion model and the appearance model:

$$c_{i,j} = \lambda d^{(1)}(i, j) + (1 - \lambda) d^{(2)}(i, j) \tag{7}$$

where $c_{i,j}$ denotes the comprehensive matching degree and $\lambda$ is a hyper-parameter, 0 by default; only when $c_{i,j}$ lies within the intersection of the two metric thresholds is the association considered correct, and after the assignment is completed the unmatched detectors and trackers are sorted out;
Step 4: cascade matching and IOU matching: after a target has been occluded for a long time, cascade matching gives priority to the targets that appear more frequently; trackers in the unconfirmed state, unmatched trackers, and unmatched detections undergo IOU matching and are assigned once more with the Hungarian algorithm;
Step 5: updating the parameters of the matched trackers, deleting the trackers that fail to match again, and initializing unmatched detections as new targets; judging whether the video stream has ended, and if so, exiting the loop; otherwise, proceeding to detection of the next frame.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110151278.9A | 2021-02-03 | 2021-02-03 | Pedestrian multi-target tracking video identification method based on improved YOLOv3 model

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110151278.9A | 2021-02-03 | 2021-02-03 | Pedestrian multi-target tracking video identification method based on improved YOLOv3 model

Publications (1)

Publication Number | Publication Date
CN112836639A | 2021-05-25

Family

ID=75931941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110151278.9A Pending CN112836639A (en) 2021-02-03 2021-02-03 Pedestrian multi-target tracking video identification method based on improved YOLOv3 model

Country Status (1)

Country Link
CN (1) CN112836639A (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112215208A (en) * 2020-11-10 2021-01-12 中国人民解放军战略支援部队信息工程大学 Remote sensing image bridge target detection algorithm based on improved YOLOv4
CN112308881A (en) * 2020-11-02 2021-02-02 西安电子科技大学 Ship multi-target tracking method based on remote sensing image

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308881A (en) * 2020-11-02 2021-02-02 西安电子科技大学 Ship multi-target tracking method based on remote sensing image
CN112215208A (en) * 2020-11-10 2021-01-12 中国人民解放军战略支援部队信息工程大学 Remote sensing image bridge target detection algorithm based on improved YOLOv4

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313008A (en) * 2021-05-26 2021-08-27 南京邮电大学 Target and identification tracking method based on YOLOv3 network and mean shift
CN113221808A (en) * 2021-05-26 2021-08-06 新疆爱华盈通信息技术有限公司 Dinner plate counting statistical method and device based on image recognition
CN113313008B (en) * 2021-05-26 2022-08-05 南京邮电大学 Target and identification tracking method based on YOLOv3 network and mean shift
CN113392754A (en) * 2021-06-11 2021-09-14 成都掌中全景信息技术有限公司 Method for reducing false detection rate of pedestrian based on yolov5 pedestrian detection algorithm
CN113470076A (en) * 2021-07-13 2021-10-01 南京农业大学 Multi-target tracking method for yellow-feather chickens in flat-breeding henhouse
CN113470076B (en) * 2021-07-13 2024-03-12 南京农业大学 Multi-target tracking method for yellow feather chickens in flat raising chicken house
CN113822153A (en) * 2021-08-11 2021-12-21 桂林电子科技大学 Unmanned aerial vehicle tracking method based on improved DeepSORT algorithm
CN113763427B (en) * 2021-09-05 2024-02-23 东南大学 Multi-target tracking method based on coarse-to-fine shielding processing
CN113763427A (en) * 2021-09-05 2021-12-07 东南大学 Multi-target tracking method based on coarse-fine shielding processing
CN113723361A (en) * 2021-09-18 2021-11-30 西安邮电大学 Video monitoring method and device based on deep learning
CN113688797A (en) * 2021-09-27 2021-11-23 江南大学 Abnormal behavior identification method and system based on skeleton extraction
CN114241397A (en) * 2022-02-23 2022-03-25 武汉烽火凯卓科技有限公司 Frontier defense video intelligent analysis method and system
CN114241397B (en) * 2022-02-23 2022-07-08 武汉烽火凯卓科技有限公司 Frontier defense video intelligent analysis method and system
CN114879891A (en) * 2022-05-19 2022-08-09 中国人民武装警察部队工程大学 Multi-mode man-machine interaction method under self-supervision multi-target tracking
CN114879891B (en) * 2022-05-19 2024-04-26 中国人民武装警察部队工程大学 Multi-mode man-machine interaction method under self-supervision multi-target tracking
CN116188767A (en) * 2023-01-13 2023-05-30 湖北普罗格科技股份有限公司 Neural network-based stacked wood board counting method and system
CN116188767B (en) * 2023-01-13 2023-09-08 湖北普罗格科技股份有限公司 Neural network-based stacked wood board counting method and system
CN116416281A (en) * 2023-04-28 2023-07-11 云观智慧科技(无锡)有限公司 Grain depot AI video supervision and analysis method and system

Similar Documents

Publication Publication Date Title
CN112836639A (en) Pedestrian multi-target tracking video identification method based on improved YOLOv3 model
Jana et al. YOLO based Detection and Classification of Objects in video records
CN110796186A (en) Dry and wet garbage identification and classification method based on improved YOLOv3 network
CN107633226B (en) Human body motion tracking feature processing method
CN111862145B (en) Target tracking method based on multi-scale pedestrian detection
CN108564598B (en) Improved online Boosting target tracking method
CN106952293B (en) Target tracking method based on nonparametric online clustering
CN109035295B (en) Multi-target tracking method, device, computer equipment and storage medium
CN110569782A (en) Target detection method based on deep learning
CN109087337B (en) Long-time target tracking method and system based on hierarchical convolution characteristics
CN112884742A (en) Multi-algorithm fusion-based multi-target real-time detection, identification and tracking method
Zheng et al. Improvement of grayscale image 2D maximum entropy threshold segmentation method
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN113327272B (en) Robustness long-time tracking method based on correlation filtering
Wang et al. Multi-target pedestrian tracking based on yolov5 and deepsort
CN111259808A (en) Detection and identification method of traffic identification based on improved SSD algorithm
CN109697727A (en) Method for tracking target, system and storage medium based on correlation filtering and metric learning
CN111368634B (en) Human head detection method, system and storage medium based on neural network
Mrabti et al. Human motion tracking: A comparative study
CN114923491A (en) Three-dimensional multi-target online tracking method based on feature fusion and distance fusion
CN108257148B (en) Target suggestion window generation method of specific object and application of target suggestion window generation method in target tracking
CN112613565B (en) Anti-occlusion tracking method based on multi-feature fusion and adaptive learning rate updating
Zhang et al. Residual memory inference network for regression tracking with weighted gradient harmonized loss
CN111539987B (en) Occlusion detection system and method based on discrimination model
Chen et al. Improved yolov3 algorithm for ship target detection

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination