CN112884810A - Pedestrian tracking method based on YOLOv3 - Google Patents
- Publication number
- CN112884810A (application CN202110290409.1A)
- Authority
- CN
- China
- Prior art keywords
- target
- tracking
- frame
- video
- color
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments (G06T7/00 Image analysis; G06T7/20 Analysis of motion)
- G06N3/04—Architecture, e.g. interconnection topology (G06N3/02 Neural networks)
- G06N3/08—Learning methods (G06N3/02 Neural networks)
- G06T5/40—Image enhancement or restoration by the use of histogram techniques
- G06T7/90—Determination of colour characteristics (G06T7/00 Image analysis)
Abstract
The invention provides a pedestrian tracking method based on YOLOv3 and relates to the field of target tracking in computer vision. The method comprises three parts: target detection, target matching and target prediction. The target detection part uses YOLOv3 to identify all pedestrians in the field of view; the target matching part matches the detection results against a template based on the color features of the object and locks the target to be tracked; the target prediction part predicts the position of the target in the next frame and narrows the detection range, thereby improving the tracking accuracy. The invention can track a single selected pedestrian in a video with a tracking accuracy of about 99% and a tracking speed of about 22 frames per second, which meets real-time requirements.
Description
Technical Field
The invention relates to the field of target tracking in computer vision, and in particular to a pedestrian tracking method based on YOLOv3.
Background
The YOLO (You Only Look Once) algorithm is a deep-learning-based target detection algorithm proposed by Redmon et al. in 2016; it stands out among deep-learning detection algorithms for its simple network, high detection speed and other advantages. YOLOv3 is the third version in the YOLO family and is popular in industry for its high stability.
As an important application field of computer vision, target tracking has developed through three stages: early classical tracking algorithms, correlation-filtering-based tracking algorithms, and today's deep-learning-based tracking algorithms, the last of which has become the mainstream of modern target tracking technology. In 2012, deep-learning-based methods achieved great success in image recognition and related fields, the most representative being the AlexNet network. From that point on, deep-learning-based detection and tracking algorithms came to prominence throughout computer vision, and deep-learning trackers achieved excellent performance in the VOT (Visual Object Tracking) 2017 challenge. Although deep-learning-based trackers do not process frames as fast as correlation-filtering-based ones, their tracking accuracy is overwhelmingly superior, far exceeding both correlation-filtering-based trackers and the early classical algorithms.
Disclosure of Invention
Aiming at the problems of deep-learning-based target tracking algorithms, such as low single-target tracking accuracy and poor real-time performance, the invention provides a pedestrian tracking method based on YOLOv3 that achieves high accuracy while meeting real-time tracking requirements.
The technical scheme of the invention, a pedestrian tracking method based on YOLOv3, specifically comprises the following steps:
step 1: a user manually selects a tracking target in a first frame of a video to be detected by using machine vision software, and the tracking target is used as a template;
The video is paused at the first frame while the user manually frames the target to be tracked: pressing the left mouse button opens drawing permission; moving the mouse frames the target to be tracked; releasing the left mouse button closes drawing permission; the drawing result is confirmed and used as the template.
Step 2: detecting all pedestrians in the video by using a YOLOv3 algorithm;
A COCO data set model pretrained by the official YOLO release is adopted, with the "person" label set as the detection criterion so that only targets labeled "person" are retained; the YOLOv3 algorithm then detects all objects labeled "person" in the video, that is, all pedestrians.
Step 2.1: adjusting the size of an input picture to be a fixed size;
step 2.2: detecting an object specified by the COCO data set through a Darknet-53 neural network;
step 2.3: setting size classification standards, and outputting in three branches, namely large, medium and small according to different target sizes;
Step 3: matching all pedestrian detection results with a tracking target template by using a color histogram algorithm, and locking a tracking target;
step 3.1: analyzing the image color characteristics by using a color histogram algorithm;
The two pictures to be compared are resized to the same dimensions and their color histograms are computed; for a three-channel color picture, three histograms are counted, one each for R, G and B. Finally the Bhattacharyya coefficient is calculated as

ρ = Σ_{i=1}^{N} √(P(i)·P'(i))

where ρ is the Bhattacharyya coefficient, P and P' are the color histograms of the two pictures, P(i) and P'(i) are the i-th components of those histograms, and N is the total number of components;
step 3.2: taking the image color features as the matching condition, the Bhattacharyya coefficient between every detected pedestrian and the template is calculated; the pedestrian detection result with the largest Bhattacharyya coefficient, that is, the smallest image difference, is taken as the tracking target.
Step 4: predicting the position of the tracking target in the next frame of the video by using a K-neighborhood algorithm, reducing the detection range and improving the tracking accuracy;
The K-neighborhood algorithm is applied to the pedestrian detection and tracking result of the current frame; taking the target rectangle detected in the current frame as a base, an adjacent area is searched in the next frame with that rectangle as reference, and the center point of the search rectangle coincides with the center point of the base rectangle, as shown in the following formulas:

W_search = K · W_object
H_search = K · H_object

where W_search and H_search represent the width and height of the rectangular search area, W_object and H_object represent the width and height of the target rectangle in the previous frame, and K is the expansion ratio of the prediction frame relative to the detection frame.
Step 5: judging whether the video is finished:
if the video has not been fully processed, jump to Step 2; if it has, the program ends and pedestrian tracking is complete.
The beneficial effects of adopting this technical solution are as follows:
The invention provides a pedestrian tracking method based on YOLOv3 which greatly improves single-target tracking accuracy; the target can be freely specified by the user, so the method has strong generality. It overcomes the defects of low tracking accuracy, poor real-time performance and fixed tracking targets in the prior art.
Drawings
FIG. 1 is a flow chart of the pedestrian tracking method based on YOLOv3 in the invention;
FIG. 2 is a schematic diagram of a statistical histogram of colors according to the present invention;
FIG. 3 is a flow chart of a K neighborhood algorithm in an embodiment of the present invention;
FIG. 4 is a diagram illustrating the K neighborhood algorithm results of the present invention;
FIG. 5 is a schematic diagram illustrating the effect of the pedestrian tracking method based on YOLOv3 according to the present invention;
FIG. 6 is a flow chart of the YOLOv3 algorithm in an embodiment of the present invention;
fig. 7 is a flowchart of a color histogram algorithm in an embodiment of the present invention.
Detailed Description
The following describes embodiments of the invention in detail with reference to the accompanying drawings and examples. The examples are intended to illustrate the invention but not to limit its scope.
The invention provides a pedestrian tracking method based on YOLOv3, a flow chart of which is shown in figure 1 and comprises the following steps:
step 1: a user manually selects a tracking target in a first frame of a video to be detected by using machine vision software, and the tracking target is used as a template;
The video is paused at the first frame and the user manually frames the target to be tracked in that frame; pressing the left mouse button opens drawing permission; the mouse is then moved with the left button held down so that the selection box wraps the target as tightly as possible, reducing interference from background information; after framing is finished the left button is released, drawing permission is closed, and the framed part is taken as the template.
In this embodiment, the simulation is performed on an R740 server with an Intel Xeon CPU and an Nvidia Titan X GPU, running the Linux operating system Ubuntu Server 16.04 with OpenCV version 3.4.5;
step 2: all pedestrians in the video were detected using the YOLOv3 algorithm, as shown in fig. 6;
A COCO data set model pretrained by the official YOLO release is adopted, with the "person" label set as the detection criterion so that only targets labeled "person" are retained; the YOLOv3 algorithm then detects all objects labeled "person" in the video, that is, all pedestrians.
Step 2.1: adjusting the size of the input picture to a fixed size, in this embodiment, the size is 416 × 416;
step 2.2: detecting an object specified by the COCO data set through a Darknet-53 neural network;
step 2.3: setting size classification standards, and outputting in three branches, namely large, medium and small according to different target sizes;
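As a concrete illustration of the label filtering in Step 2, the following sketch assumes detections have already been decoded from the three YOLOv3 output branches into (label, confidence, box) tuples; the function name and the 0.5 confidence threshold are illustrative choices, not taken from the patent.

```python
# Hypothetical post-processing for Step 2: keep only COCO "person" detections.
# Detections are assumed to be (label, confidence, (x, y, w, h)) tuples
# already decoded from the three YOLOv3 output branches.

def keep_pedestrians(detections, conf_threshold=0.5):
    """Retain detections whose COCO label is 'person' and whose
    confidence reaches the threshold."""
    return [d for d in detections
            if d[0] == "person" and d[1] >= conf_threshold]

raw = [
    ("person", 0.92, (110, 40, 60, 170)),
    ("car",    0.88, (300, 80, 120, 90)),   # wrong label, discarded
    ("person", 0.31, (500, 60, 55, 160)),   # low confidence, discarded
]
pedestrians = keep_pedestrians(raw)         # one detection survives
```

In a real pipeline the same filter would run once per frame, before the template matching of Step 3.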
Step 3: matching all pedestrian detection results with the tracking target template by using a color histogram algorithm, and locking the tracking target;
step 3.1: analyzing the image color characteristics by using a color histogram algorithm, as shown in fig. 7;
the sizes of two pictures to be compared are adjusted to be consistent, a statistical color histogram is calculated, and for the three-channel color picture, three color histograms of R, G and B are respectively calculated, as shown in FIG. 2, R is a red component of a color image pixel value, G is a green component of the color image pixel value, and B is a blue component of the color image pixel value. Finally, calculating the Babbitt coefficient, wherein the larger the obtained Babbitt coefficient is, the more similar the pictures are; the calculation formula of the Babbitt coefficient is
Wherein ρ is a babbitt coefficient, P and P 'are color histograms of the two pictures, P (i) and P' (i) are ith components of the color histograms of the two pictures, N is a total number of the components, and the value of N in this embodiment is 256. And taking the candidate block diagram with the minimum difference as a tracking target, and displaying the frame of the candidate block diagram on a screen.
Step 3.2: taking the image color features as the matching condition, the Bhattacharyya coefficient between every detected pedestrian and the template is calculated; the pedestrian detection result with the largest Bhattacharyya coefficient, that is, the smallest image difference, is taken as the tracking target.
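A minimal, pure-Python sketch of Step 3 under stated assumptions: images are flat lists of (R, G, B) pixel tuples rather than OpenCV matrices, and every function name below is invented for illustration. Each channel is binned into 256 bins, the concatenated histogram is normalized to sum to 1, and the candidate with the largest Bhattacharyya coefficient is locked as the target.

```python
import math

def color_histogram(pixels, bins=256):
    """Concatenated R|G|B histogram, normalized so all 3*bins components sum to 1."""
    hist = [0.0] * (3 * bins)
    for r, g, b in pixels:
        hist[r] += 1
        hist[bins + g] += 1
        hist[2 * bins + b] += 1
    total = 3.0 * len(pixels)
    return [h / total for h in hist]

def bhattacharyya(p, p2):
    """rho = sum_i sqrt(P(i) * P'(i)); equals 1.0 for identical histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(p, p2))

def lock_target(template_pixels, candidate_pixel_lists):
    """Index of the candidate most similar to the template (largest rho)."""
    t = color_histogram(template_pixels)
    scores = [bhattacharyya(t, color_histogram(c)) for c in candidate_pixel_lists]
    return scores.index(max(scores))

template  = [(200, 30, 30)] * 16                      # mostly red template
cand_blue = [(30, 30, 200)] * 16                      # blue pedestrian
cand_red  = [(200, 30, 30)] * 15 + [(190, 40, 30)]    # near-identical pedestrian
best = lock_target(template, [cand_blue, cand_red])   # picks index 1
```

With OpenCV the histograms would come from `cv2.calcHist`, but the arithmetic is the same.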
Step 4: predicting the position of the tracking target in the next frame of the video by using the K-neighborhood algorithm shown in FIG. 3, reducing the detection range and improving the tracking accuracy; a diagram of the result is shown in FIG. 4;
The K-neighborhood algorithm is applied to the pedestrian detection and tracking result of the current frame; taking the target rectangle detected in the current frame as a base, an adjacent area is searched in the next frame with that rectangle as reference, and the center point of the search rectangle coincides with the center point of the base rectangle, as shown in the following formulas:

W_search = K · W_object
H_search = K · H_object

where W_search and H_search represent the width and height of the rectangular search area, W_object and H_object represent the width and height of the target rectangle in the previous frame, and K is the expansion ratio of the prediction frame relative to the detection frame; in this embodiment K is 2.
The target prediction algorithm narrows the field of view, reducing the amount of computation and improving the tracking accuracy.
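The search-area formulas above reduce to simple arithmetic; a sketch, assuming a center-format (cx, cy, w, h) box representation that the patent itself does not specify:

```python
# K-neighborhood prediction (Step 4): the next frame is searched only inside
# a rectangle K times the size of the current target box, with the same center.

def search_region(box, k=2):
    """W_search = K*W_object, H_search = K*H_object; center unchanged."""
    cx, cy, w, h = box
    return (cx, cy, k * w, k * h)

target = (320, 240, 60, 160)          # target box detected in current frame
region = search_region(target, k=2)   # search box for the next frame
```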
Step 5: judging whether the video is finished:
if the video has not been fully processed, jump to Step 2; if it has, the program ends and pedestrian tracking is complete.
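Steps 2 to 5 together form a per-frame loop; the control-flow sketch below uses trivial stub functions in place of YOLOv3 detection and histogram matching, so every name here is an assumption rather than the patent's implementation.

```python
def track(frames, template, detect, match, k=2):
    """Run detect -> match -> predict on every frame until the video ends."""
    boxes = []
    region = None                            # no prediction before first frame
    for frame in frames:
        candidates = detect(frame, region)   # Step 2, restricted by Step 4
        best = match(template, candidates)   # Step 3: histogram matching
        cx = best[0] + best[2] / 2.0         # center of the matched box
        cy = best[1] + best[3] / 2.0
        region = (cx, cy, k * best[2], k * best[3])  # Step 4: K-neighborhood
        boxes.append(best)
    return boxes                             # Step 5: loop ends with the video

# Toy run with stub detector/matcher: one static pedestrian over three frames.
frames = [0, 1, 2]
detect = lambda frame, region: [(100, 50, 60, 160)]
match = lambda template, cands: cands[0]
boxes = track(frames, None, detect, match)
```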
The final experimental results are shown in FIG. 5: the tracking accuracy is 99.5% and the tracking speed is about 22 frames per second, which meets the requirements of practical application.
Claims (3)
1. A pedestrian tracking method based on YOLOv3 is characterized by comprising the following steps:
step 1: a user manually selects a tracking target in a first frame of a video to be detected by using machine vision software, and the tracking target is used as a template;
pausing the video at the first frame and waiting for the user to manually frame the target to be tracked: pressing the left mouse button opens drawing permission; moving the mouse frames the target to be tracked; releasing the left mouse button closes drawing permission; the drawing result is confirmed and used as the template;
step 2: detecting all pedestrians in the video by using a YOLOv3 algorithm;
adopting a COCO data set model pretrained by the official YOLO release, setting the "person" label as the detection criterion and retaining only targets labeled "person"; detecting all objects labeled "person" in the video by using the YOLOv3 algorithm, namely detecting all pedestrians;
step 2.1: adjusting the size of an input picture to be a fixed size;
step 2.2: detecting an object specified by the COCO data set through a Darknet-53 neural network;
step 2.3: setting size classification standards, and outputting in three branches, namely large, medium and small according to different target sizes;
step 3: matching all pedestrian detection results with a tracking target template by using a color histogram algorithm, and locking a tracking target;
step 4: applying a K-neighborhood algorithm to the pedestrian detection and tracking result of the current frame to predict the position of the tracking target in the next frame of the video, reducing the detection range and improving the tracking accuracy;
step 5: judging whether the video is finished:
if the video is not processed, jumping to the step 2; and if the video is processed, ending the program and completing the pedestrian tracking.
2. The method for pedestrian tracking based on YOLOv3 as claimed in claim 1, wherein step 3 specifically comprises the steps of:
step 3.1: analyzing the image color characteristics by using a color histogram algorithm;
adjusting the two pictures to be compared to the same dimensions and computing their color histograms; for a three-channel color picture, counting three histograms, one each for R, G and B; and finally calculating the Bhattacharyya coefficient as

ρ = Σ_{i=1}^{N} √(P(i)·P'(i))

where ρ is the Bhattacharyya coefficient, P and P' are the color histograms of the two pictures, P(i) and P'(i) are the i-th components of those histograms, and N is the total number of components;
step 3.2: taking the image color features as the matching condition, calculating the Bhattacharyya coefficient between every detected pedestrian and the template; the pedestrian detection result with the largest Bhattacharyya coefficient, that is, the smallest image difference, is taken as the tracking target.
3. The method as claimed in claim 1, wherein the K neighborhood takes the target rectangle detected in the current frame as a base, searches an adjacent area in the next frame with that rectangle as reference, and makes the center point of the search rectangle coincide with the center point of the base rectangle, as shown in the following formulas:

W_search = K · W_object
H_search = K · H_object

where W_search and H_search represent the width and height of the rectangular search area, W_object and H_object represent the width and height of the target rectangle in the previous frame, and K is the expansion ratio of the prediction frame relative to the detection frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110290409.1A CN112884810B (en) | 2021-03-18 | 2021-03-18 | Pedestrian tracking method based on YOLOv3 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112884810A true CN112884810A (en) | 2021-06-01 |
CN112884810B CN112884810B (en) | 2024-02-02 |
Family
ID=76041053
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110290409.1A Active CN112884810B (en) | 2021-03-18 | 2021-03-18 | Pedestrian tracking method based on YOLOv3 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112884810B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107563313A (en) * | 2017-08-18 | 2018-01-09 | 北京航空航天大学 | Multiple target pedestrian detection and tracking based on deep learning |
WO2018133666A1 (en) * | 2017-01-17 | 2018-07-26 | 腾讯科技(深圳)有限公司 | Method and apparatus for tracking video target |
CN108509859A (en) * | 2018-03-09 | 2018-09-07 | 南京邮电大学 | A kind of non-overlapping region pedestrian tracting method based on deep neural network |
CN110516705A (en) * | 2019-07-19 | 2019-11-29 | 平安科技(深圳)有限公司 | Method for tracking target, device and computer readable storage medium based on deep learning |
WO2019237536A1 (en) * | 2018-06-11 | 2019-12-19 | 平安科技(深圳)有限公司 | Target real-time tracking method and apparatus, and computer device and storage medium |
CN111126152A (en) * | 2019-11-25 | 2020-05-08 | 国网信通亿力科技有限责任公司 | Video-based multi-target pedestrian detection and tracking method |
CN111241931A (en) * | 2019-12-30 | 2020-06-05 | 沈阳理工大学 | Aerial unmanned aerial vehicle target identification and tracking method based on YOLOv3 |
CN111582062A (en) * | 2020-04-21 | 2020-08-25 | 电子科技大学 | Re-detection method in target tracking based on YOLOv3 |
Non-Patent Citations (2)
Title |
---|
任珈民; 宫宁生; 韩镇阳: "Multi-object tracking algorithm based on YOLOv3 and Kalman filtering", Computer Applications and Software, no. 05 *
王超; 苏湛: "Research on Camshift target tracking based on Kalman and SURF", Software Guide, no. 01 *
Also Published As
Publication number | Publication date |
---|---|
CN112884810B (en) | 2024-02-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||