CN112884810A - Pedestrian tracking method based on YOLOv3 - Google Patents
- Publication number
- CN112884810A (application CN202110290409.1A)
- Authority
- CN
- China
- Prior art keywords
- target
- tracking
- frame
- video
- color
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments (G06T7/00 Image analysis; G06T7/20 Analysis of motion)
- G06N3/04—Architecture, e.g. interconnection topology (G06N3/02 Neural networks)
- G06N3/08—Learning methods (G06N3/02 Neural networks)
- G06T5/40—Image enhancement or restoration by the use of histogram techniques
- G06T7/90—Determination of colour characteristics (G06T7/00 Image analysis)
Abstract
The invention provides a pedestrian tracking method based on YOLOv3 and relates to the field of target tracking in computer vision. The method comprises three parts: target detection, target matching and target prediction. The target detection part uses YOLOv3 to identify all pedestrians in the field of view; the target matching part matches the detection results against a template based on the color features of the object and locks the target to be tracked; the target prediction part predicts the position of the target in the next frame and narrows the detection range, thereby improving the tracking accuracy. The invention can track a single selected pedestrian in a video with a tracking accuracy of about 99% and a tracking speed of about 22 frames per second, which meets real-time requirements.
Description
Technical Field
The invention relates to the field of target tracking in computer vision, and in particular to a pedestrian tracking method based on YOLOv3.
Background
The YOLO (You Only Look Once) algorithm is a deep-learning-based target detection algorithm proposed by Redmon et al. in 2016; it stands out among deep-learning detection algorithms for its simple network, high detection speed and other advantages. YOLOv3 is the third version in the YOLO family and is popular in industry for its high stability.
As an important application field of computer vision, target tracking has developed through three stages: early classical tracking algorithms, correlation-filtering-based tracking algorithms, and today's deep-learning-based tracking algorithms, the last of which has become the mainstream of modern target tracking technology. In 2012, deep-learning-based methods achieved great success in image recognition and related fields, the most representative being the AlexNet network. From that point on, deep-learning-based detection and tracking algorithms came to prominence throughout computer vision, and deep-learning trackers achieved excellent performance in the VOT (Visual Object Tracking) 2017 challenge. Although deep-learning-based trackers do not process frames as fast as correlation-filtering-based ones, their tracking accuracy is overwhelmingly superior, far exceeding both correlation-filtering-based trackers and the early classical algorithms.
Disclosure of Invention
Aiming at the problems of deep-learning-based target tracking algorithms, such as low single-target tracking accuracy and poor real-time performance, the invention provides a pedestrian tracking method based on YOLOv3 that achieves high accuracy while meeting real-time tracking requirements.
The technical scheme of the invention, a pedestrian tracking method based on YOLOv3, specifically comprises the following steps:
step 1: a user manually selects a tracking target in a first frame of a video to be detected by using machine vision software, and the tracking target is used as a template;
The video is paused at the first frame while the user manually frames the target to be tracked: pressing the left mouse button opens drawing permission; moving the mouse frames the target to be tracked; releasing the left mouse button closes drawing permission; the drawing result is confirmed and used as the template.
Step 2: detecting all pedestrians in the video by using a YOLOv3 algorithm;
A COCO data set model pretrained by the official YOLO release is adopted, with the "person" label set as the detection criterion so that only targets labeled "person" are retained; the YOLOv3 algorithm then detects all objects labeled "person" in the video, that is, all pedestrians.
Step 2.1: adjusting the size of an input picture to be a fixed size;
step 2.2: detecting an object specified by the COCO data set through a Darknet-53 neural network;
step 2.3: setting size classification standards, and outputting in three branches, namely large, medium and small according to different target sizes;
Step 3: matching all pedestrian detection results with a tracking target template by using a color histogram algorithm, and locking a tracking target;
step 3.1: analyzing the image color characteristics by using a color histogram algorithm;
The two pictures to be compared are resized to the same dimensions and their color histograms are computed; for a three-channel color picture, three histograms are counted, one each for R, G and B. Finally the Bhattacharyya coefficient is calculated as

ρ = Σ_{i=1}^{N} √(P(i)·P'(i))

where ρ is the Bhattacharyya coefficient, P and P' are the color histograms of the two pictures, P(i) and P'(i) are the i-th components of those histograms, and N is the total number of components;
step 3.2: taking the image color features as the matching condition, the Bhattacharyya coefficient between every detected pedestrian and the template is calculated; the pedestrian detection result with the largest Bhattacharyya coefficient, that is, the smallest image difference, is taken as the tracking target.
Step 4: predicting the position of the tracking target in the next frame of the video by using a K-neighborhood algorithm, reducing the detection range and improving the tracking accuracy;
The K-neighborhood algorithm is applied to the pedestrian detection and tracking result of the current frame; taking the target rectangle detected in the current frame as a base, an adjacent area is searched in the next frame with that rectangle as reference, and the center point of the search rectangle coincides with the center point of the base rectangle, as shown in the following formulas:

W_search = K · W_object
H_search = K · H_object

where W_search and H_search represent the width and height of the rectangular search area, W_object and H_object represent the width and height of the target rectangle in the previous frame, and K is the expansion ratio of the prediction frame relative to the detection frame.
Step 5: judging whether the video is finished:
if the video has not been fully processed, jump to Step 2; if it has, the program ends and pedestrian tracking is complete.
The beneficial effects of adopting this technical solution are as follows:
The invention provides a pedestrian tracking method based on YOLOv3 which greatly improves single-target tracking accuracy; the target can be freely specified by the user, so the method has strong generality. It overcomes the defects of low tracking accuracy, poor real-time performance and fixed tracking targets in the prior art.
Drawings
FIG. 1 is a flow chart of the pedestrian tracking method based on YOLOv3 in the invention;
FIG. 2 is a schematic diagram of a statistical histogram of colors according to the present invention;
FIG. 3 is a flow chart of a K neighborhood algorithm in an embodiment of the present invention;
FIG. 4 is a diagram illustrating the K neighborhood algorithm results of the present invention;
FIG. 5 is a schematic diagram illustrating the effect of the pedestrian tracking method based on YOLOv3 according to the present invention;
FIG. 6 is a flow chart of the YOLOv3 algorithm in an embodiment of the present invention;
fig. 7 is a flowchart of a color histogram algorithm in an embodiment of the present invention.
Detailed Description
The following describes embodiments of the invention in detail with reference to the accompanying drawings and examples. The examples are intended to illustrate the invention but not to limit its scope.
The invention provides a pedestrian tracking method based on YOLOv3, a flow chart of which is shown in figure 1 and comprises the following steps:
step 1: a user manually selects a tracking target in a first frame of a video to be detected by using machine vision software, and the tracking target is used as a template;
The video is paused at the first frame and the user manually frames the target to be tracked in that frame; pressing the left mouse button opens drawing permission; the mouse is then moved with the left button held down so that the selection box wraps the target as tightly as possible, reducing interference from background information; after framing is finished the left button is released, drawing permission is closed, and the framed part is taken as the template.
In this embodiment, the simulation is performed on an R740 server with an Intel Xeon CPU and an Nvidia Titan X GPU, running the Linux operating system Ubuntu Server 16.04 with OpenCV version 3.4.5;
step 2: all pedestrians in the video were detected using the YOLOv3 algorithm, as shown in fig. 6;
A COCO data set model pretrained by the official YOLO release is adopted, with the "person" label set as the detection criterion so that only targets labeled "person" are retained; the YOLOv3 algorithm then detects all objects labeled "person" in the video, that is, all pedestrians.
Step 2.1: adjusting the size of the input picture to a fixed size, in this embodiment, the size is 416 × 416;
step 2.2: detecting an object specified by the COCO data set through a Darknet-53 neural network;
step 2.3: setting size classification standards, and outputting in three branches, namely large, medium and small according to different target sizes;
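As a concrete illustration of the label filtering in Step 2, the following sketch assumes detections have already been decoded from the three YOLOv3 output branches into (label, confidence, box) tuples; the function name and the 0.5 confidence threshold are illustrative choices, not taken from the patent.

```python
# Hypothetical post-processing for Step 2: keep only COCO "person" detections.
# Detections are assumed to be (label, confidence, (x, y, w, h)) tuples
# already decoded from the three YOLOv3 output branches.

def keep_pedestrians(detections, conf_threshold=0.5):
    """Retain detections whose COCO label is 'person' and whose
    confidence reaches the threshold."""
    return [d for d in detections
            if d[0] == "person" and d[1] >= conf_threshold]

raw = [
    ("person", 0.92, (110, 40, 60, 170)),
    ("car",    0.88, (300, 80, 120, 90)),   # wrong label, discarded
    ("person", 0.31, (500, 60, 55, 160)),   # low confidence, discarded
]
pedestrians = keep_pedestrians(raw)         # one detection survives
```

In a real pipeline the same filter would run once per frame, before the template matching of Step 3.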
Step 3: matching all pedestrian detection results with the tracking target template by using a color histogram algorithm, and locking the tracking target;
step 3.1: analyzing the image color characteristics by using a color histogram algorithm, as shown in fig. 7;
the sizes of two pictures to be compared are adjusted to be consistent, a statistical color histogram is calculated, and for the three-channel color picture, three color histograms of R, G and B are respectively calculated, as shown in FIG. 2, R is a red component of a color image pixel value, G is a green component of the color image pixel value, and B is a blue component of the color image pixel value. Finally, calculating the Babbitt coefficient, wherein the larger the obtained Babbitt coefficient is, the more similar the pictures are; the calculation formula of the Babbitt coefficient is
Wherein ρ is a babbitt coefficient, P and P 'are color histograms of the two pictures, P (i) and P' (i) are ith components of the color histograms of the two pictures, N is a total number of the components, and the value of N in this embodiment is 256. And taking the candidate block diagram with the minimum difference as a tracking target, and displaying the frame of the candidate block diagram on a screen.
Step 3.2: taking the image color features as the matching condition, the Bhattacharyya coefficient between every detected pedestrian and the template is calculated; the pedestrian detection result with the largest Bhattacharyya coefficient, that is, the smallest image difference, is taken as the tracking target.
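A minimal, pure-Python sketch of Step 3 under stated assumptions: images are flat lists of (R, G, B) pixel tuples rather than OpenCV matrices, and every function name below is invented for illustration. Each channel is binned into 256 bins, the concatenated histogram is normalized to sum to 1, and the candidate with the largest Bhattacharyya coefficient is locked as the target.

```python
import math

def color_histogram(pixels, bins=256):
    """Concatenated R|G|B histogram, normalized so all 3*bins components sum to 1."""
    hist = [0.0] * (3 * bins)
    for r, g, b in pixels:
        hist[r] += 1
        hist[bins + g] += 1
        hist[2 * bins + b] += 1
    total = 3.0 * len(pixels)
    return [h / total for h in hist]

def bhattacharyya(p, p2):
    """rho = sum_i sqrt(P(i) * P'(i)); equals 1.0 for identical histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(p, p2))

def lock_target(template_pixels, candidate_pixel_lists):
    """Index of the candidate most similar to the template (largest rho)."""
    t = color_histogram(template_pixels)
    scores = [bhattacharyya(t, color_histogram(c)) for c in candidate_pixel_lists]
    return scores.index(max(scores))

template  = [(200, 30, 30)] * 16                      # mostly red template
cand_blue = [(30, 30, 200)] * 16                      # blue pedestrian
cand_red  = [(200, 30, 30)] * 15 + [(190, 40, 30)]    # near-identical pedestrian
best = lock_target(template, [cand_blue, cand_red])   # picks index 1
```

With OpenCV the histograms would come from `cv2.calcHist`, but the arithmetic is the same.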
Step 4: predicting the position of the tracking target in the next frame of the video by using the K-neighborhood algorithm shown in FIG. 3, reducing the detection range and improving the tracking accuracy; a diagram of the result is shown in FIG. 4;
The K-neighborhood algorithm is applied to the pedestrian detection and tracking result of the current frame; taking the target rectangle detected in the current frame as a base, an adjacent area is searched in the next frame with that rectangle as reference, and the center point of the search rectangle coincides with the center point of the base rectangle, as shown in the following formulas:

W_search = K · W_object
H_search = K · H_object

where W_search and H_search represent the width and height of the rectangular search area, W_object and H_object represent the width and height of the target rectangle in the previous frame, and K is the expansion ratio of the prediction frame relative to the detection frame; in this embodiment K is 2.
The target prediction algorithm narrows the field of view, reducing the amount of computation and improving the tracking accuracy.
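The search-area formulas above reduce to simple arithmetic; a sketch, assuming a center-format (cx, cy, w, h) box representation that the patent itself does not specify:

```python
# K-neighborhood prediction (Step 4): the next frame is searched only inside
# a rectangle K times the size of the current target box, with the same center.

def search_region(box, k=2):
    """W_search = K*W_object, H_search = K*H_object; center unchanged."""
    cx, cy, w, h = box
    return (cx, cy, k * w, k * h)

target = (320, 240, 60, 160)          # target box detected in current frame
region = search_region(target, k=2)   # search box for the next frame
```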
Step 5: judging whether the video is finished:
if the video has not been fully processed, jump to Step 2; if it has, the program ends and pedestrian tracking is complete.
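Steps 2 to 5 together form a per-frame loop; the control-flow sketch below uses trivial stub functions in place of YOLOv3 detection and histogram matching, so every name here is an assumption rather than the patent's implementation.

```python
def track(frames, template, detect, match, k=2):
    """Run detect -> match -> predict on every frame until the video ends."""
    boxes = []
    region = None                            # no prediction before first frame
    for frame in frames:
        candidates = detect(frame, region)   # Step 2, restricted by Step 4
        best = match(template, candidates)   # Step 3: histogram matching
        cx = best[0] + best[2] / 2.0         # center of the matched box
        cy = best[1] + best[3] / 2.0
        region = (cx, cy, k * best[2], k * best[3])  # Step 4: K-neighborhood
        boxes.append(best)
    return boxes                             # Step 5: loop ends with the video

# Toy run with stub detector/matcher: one static pedestrian over three frames.
frames = [0, 1, 2]
detect = lambda frame, region: [(100, 50, 60, 160)]
match = lambda template, cands: cands[0]
boxes = track(frames, None, detect, match)
```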
The final experimental results are shown in FIG. 5: the tracking accuracy is 99.5% and the tracking speed is about 22 frames per second, which meets the requirements of practical application.
Claims (3)
1. A pedestrian tracking method based on YOLOv3 is characterized by comprising the following steps:
step 1: a user manually selects a tracking target in a first frame of a video to be detected by using machine vision software, and the tracking target is used as a template;
pausing the video at the first frame and waiting for the user to manually frame the target to be tracked: pressing the left mouse button opens drawing permission; moving the mouse frames the target to be tracked; releasing the left mouse button closes drawing permission; the drawing result is confirmed and used as the template;
step 2: detecting all pedestrians in the video by using a YOLOv3 algorithm;
adopting a COCO data set model pretrained by the official YOLO release, setting the "person" label as the detection criterion and retaining only targets labeled "person"; detecting all objects labeled "person" in the video by using the YOLOv3 algorithm, namely detecting all pedestrians;
step 2.1: adjusting the size of an input picture to be a fixed size;
step 2.2: detecting an object specified by the COCO data set through a Darknet-53 neural network;
step 2.3: setting size classification standards, and outputting in three branches, namely large, medium and small according to different target sizes;
step 3: matching all pedestrian detection results with a tracking target template by using a color histogram algorithm, and locking a tracking target;
step 4: applying a K-neighborhood algorithm to the pedestrian detection and tracking result of the current frame to predict the position of the tracking target in the next frame of the video, reducing the detection range and improving the tracking accuracy;
step 5: judging whether the video is finished:
if the video is not processed, jumping to the step 2; and if the video is processed, ending the program and completing the pedestrian tracking.
2. The method for pedestrian tracking based on YOLOv3 as claimed in claim 1, wherein step 3 specifically comprises the steps of:
step 3.1: analyzing the image color characteristics by using a color histogram algorithm;
adjusting the two pictures to be compared to the same dimensions and computing their color histograms; for a three-channel color picture, counting three histograms, one each for R, G and B; and finally calculating the Bhattacharyya coefficient as

ρ = Σ_{i=1}^{N} √(P(i)·P'(i))

where ρ is the Bhattacharyya coefficient, P and P' are the color histograms of the two pictures, P(i) and P'(i) are the i-th components of those histograms, and N is the total number of components;
step 3.2: taking the image color features as the matching condition, calculating the Bhattacharyya coefficient between every detected pedestrian and the template; the pedestrian detection result with the largest Bhattacharyya coefficient, that is, the smallest image difference, is taken as the tracking target.
3. The method as claimed in claim 1, wherein the K neighborhood takes the target rectangle detected in the current frame as a base, searches an adjacent area in the next frame with that rectangle as reference, and makes the center point of the search rectangle coincide with the center point of the base rectangle, as shown in the following formulas:

W_search = K · W_object
H_search = K · H_object

where W_search and H_search represent the width and height of the rectangular search area, W_object and H_object represent the width and height of the target rectangle in the previous frame, and K is the expansion ratio of the prediction frame relative to the detection frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110290409.1A CN112884810B (en) | 2021-03-18 | 2021-03-18 | Pedestrian tracking method based on YOLOv3 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112884810A true CN112884810A (en) | 2021-06-01 |
CN112884810B CN112884810B (en) | 2024-02-02 |
Family
ID=76041053
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110290409.1A Active CN112884810B (en) | 2021-03-18 | 2021-03-18 | Pedestrian tracking method based on YOLOv3 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112884810B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107563313A (en) * | 2017-08-18 | 2018-01-09 | 北京航空航天大学 | Multiple target pedestrian detection and tracking based on deep learning |
WO2018133666A1 (en) * | 2017-01-17 | 2018-07-26 | 腾讯科技(深圳)有限公司 | Method and apparatus for tracking video target |
CN108509859A (en) * | 2018-03-09 | 2018-09-07 | 南京邮电大学 | A kind of non-overlapping region pedestrian tracting method based on deep neural network |
CN110516705A (en) * | 2019-07-19 | 2019-11-29 | 平安科技(深圳)有限公司 | Method for tracking target, device and computer readable storage medium based on deep learning |
WO2019237536A1 (en) * | 2018-06-11 | 2019-12-19 | 平安科技(深圳)有限公司 | Target real-time tracking method and apparatus, and computer device and storage medium |
CN111126152A (en) * | 2019-11-25 | 2020-05-08 | 国网信通亿力科技有限责任公司 | Video-based multi-target pedestrian detection and tracking method |
CN111241931A (en) * | 2019-12-30 | 2020-06-05 | 沈阳理工大学 | Aerial unmanned aerial vehicle target identification and tracking method based on YOLOv3 |
CN111582062A (en) * | 2020-04-21 | 2020-08-25 | 电子科技大学 | Re-detection method in target tracking based on YOLOv3 |
Non-Patent Citations (2)
Title |
---|
任珈民; 宫宁生; 韩镇阳: "Multi-object tracking algorithm based on YOLOv3 and Kalman filtering", Computer Applications and Software, no. 05 *
王超; 苏湛: "Research on Camshift target tracking based on Kalman and SURF", Software Guide, no. 01 *
Also Published As
Publication number | Publication date |
---|---|
CN112884810B (en) | 2024-02-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||