CN112101433B - Automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT - Google Patents

Automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT

Info

Publication number
CN112101433B
Authority
CN
China
Prior art keywords
vehicle
yolo
track
data
deepsort
Prior art date
Legal status
Active
Application number
CN202010924261.8A
Other languages
Chinese (zh)
Other versions
CN112101433A (en)
Inventor
王晨
周威
陆振波
夏井新
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN202010924261.8A
Publication of CN112101433A
Application granted
Publication of CN112101433B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G1/00 - Traffic control systems for road vehicles
    • G08G1/065 - Traffic control systems for road vehicles by counting the vehicles in a section of the road or in a parking area, i.e. comparing incoming count with outgoing count
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 - Detecting or categorising vehicles
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems


Abstract

The invention discloses an automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT, which comprises the following steps: collecting a YOLO V4 training data set and a vehicle re-identification data set and performing data enhancement; building and training a YOLO V4 model; building a DeepSORT target tracking model; tracking vehicles and extracting the running track of each vehicle; building a track record file and storing the running track information of each vehicle; clustering the endpoint coordinates of the track data with the DBSCAN clustering algorithm and associating each cluster with lane information; and realizing the lane-dividing vehicle counting function according to the change rule of the track data and the correspondence between tracks and clusters. The method adopts a YOLO V4 + DeepSORT vehicle detection and tracking model, which ensures the real-time performance of vehicle detection and tracking and greatly improves accuracy.

Description

Automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT
Technical Field
The invention relates to the field of traffic big data, and in particular to an automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT.
Background
Traffic flow parameter extraction is a basic and important task in traffic management and control, supporting the decisions of traffic managers. Typical traffic flow parameter extraction methods currently fall into (1) methods based on induction-coil detection; (2) methods based on infrared sensor detection; and (3) methods based on microwave technology. Coil-based detection is a contact detection mode; it is troublesome to install and remove, extracts vehicle track data poorly, and is unsuitable for heavily congested road sections. Detection based on infrared or microwave technology is contactless and therefore convenient to install and remove, but it cannot distinguish vehicles in different lanes and is likewise limited on congested road sections.
In recent years, with the wide coverage of traffic monitoring, applications such as vehicle track extraction and lane-dividing vehicle counting based on traffic monitoring have attracted attention. Compared with traditional traffic flow parameter extraction methods, video-based detection has the following advantages:
(1) It is a contactless detection mode that is convenient to install and remove;
(2) It can distinguish the vehicle type and the lane in which a vehicle travels;
(3) It is less limited by conditions such as traffic congestion, and vehicles in congested traffic can still be detected and tracked well.
Paper "Real-Time Traffic Flow Parameter Estimation From UAV Video Based on Ensemble Classifier and Optical Flow" is based on unmanned aerial vehicle monitoring videos, HAAR CASCADE and a convolutional neural network are adopted to detect vehicles in video monitoring, then the optical flow theory is used for capturing motion information of the vehicles in the time dimension, finally traffic flow parameters (track, speed and traffic flow) of a monitored road section are extracted according to the motion information, and the method is suitable for specific monitoring videos (unmanned aerial vehicle monitoring videos) but is limited by other monitoring types (such as bayonet monitoring, high-altitude cameras and the like). The paper Vision-based vehicle detection and counting system using DEEP LEARNING IN HIGHWAY SCENES is based on expressway monitoring video, adopts a deep learning YOLO V3 target detection model to detect vehicles, then uses an ORB algorithm to acquire vehicle tracks, and realizes the counting function of different vehicles. The paper Vehicle Count System based on TIME INTERVAL IMAGE Capture Method AND DEEP LEARNING MASK R-CNN adopts a MaskRCNN target detection and segmentation model to Capture and count vehicles on a specific road, but has poor recognition capability on small objects such as non-motor vehicles and the like and cannot guarantee detection speed.
In summary, the main disadvantages of current research methods are:
(1) Regarding the vehicle position detection algorithm/model: most current methods detect vehicles through background subtraction or optical flow, whose detection accuracy is low, affecting the accuracy and efficiency of traffic flow parameter extraction; a few studies detect vehicles with convolutional neural network models of good accuracy, but the models are large and cannot meet real-time detection requirements well, affecting the speed and efficiency of traffic flow parameter extraction.
(2) Regarding the extraction of vehicle motion information: most current methods track vehicles by optical flow, image feature matching or Kalman filtering over the detection area, which suits sparsely trafficked road sections but is limited on complex or congested ones.
(3) Most current methods lack track analysis, automatic lane division and automatic lane-dividing vehicle counting; lane judgment relies on manually set rules, and manual judgment is time-consuming and labor-intensive for surveillance views with many lanes.
Disclosure of Invention
To overcome the defects in the background art, the invention provides an automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT. The object detection model YOLO V4 detects vehicle positions in surveillance video with high detection accuracy and high detection speed; on the basis of the positions given by YOLO V4, the target tracking model DeepSORT tracks each vehicle in the time dimension. Because DeepSORT predicts positions with Kalman filtering and extracts features with a re-identification model, its accuracy is greatly improved over traditional target tracking algorithms/models.
Meanwhile, a CSV file for vehicle track data is maintained in the background, recording for each frame the center coordinates of every vehicle detection box and the corresponding vehicle ID. Finally, the DBSCAN clustering algorithm performs cluster analysis on the track data, realizing track endpoint clustering, automatic lane division, and automatic vehicle counting per lane.
The aim of the invention can be achieved by the following technical scheme:
An automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT comprises the following steps:
S1, collecting a YOLO V4 training data set and a vehicle re-identification data set and performing data enhancement;
S2, building and training a YOLO V4 model with the PyTorch deep learning framework;
S3, building a DeepSORT target tracking model, training its vehicle feature extraction network with the vehicle re-identification data, and completing the YOLO V4 + DeepSORT vehicle tracking model by taking the YOLO V4 detection boxes in each frame as input;
S4, tracking vehicles with the YOLO V4 + DeepSORT model, extracting the running track of each vehicle, building a track record file and storing the running track information of each vehicle;
S5, clustering the endpoint coordinates of the track data with the DBSCAN clustering algorithm and associating the resulting clusters with lane information;
S6, realizing the lane-dividing vehicle counting function according to the change rule of the track data and the correspondence between tracks and clusters.
Further, the data sets and data enhancement in S1 specifically include:
S11, collection and enhancement of the YOLO V4 training data set: collecting the annotated pictures and annotation information of cars, trucks, buses and non-motor vehicles in the PASCAL VOC and COCO data sets; manually annotating the vehicle types and positions in 2000 surveillance video frames of different viewing angles; adopting random cropping, random flipping, random adjustment of the picture parameters saturation, hue and brightness, mosaic data enhancement and mixed-crop data enhancement; and uniformly scaling the picture data to 608x608 resolution;
S12, collection and enhancement of the vehicle re-identification data set: collecting the VeRi data set to train the vehicle feature extraction model in DeepSORT, with random cropping as the data enhancement.
Further, the process of building and training YOLO V4 in S2 is as follows:
S21, YOLO V4 consists of: 1. the feature extraction network CSPDarknet53; 2. the multi-scale feature fusion network PAN and spatial pyramid pooling SPP; 3. a head network similar to that of the YOLO V3 model for classification and detection-box regression; the CSPDarknet53 feature extraction network is built with the PyTorch deep learning framework, the three feature maps of different widths and heights output by the feature extraction network are fused through SPP+PAN, and the three fused feature maps each pass once through a 1x1 convolutional neural network to give the output of YOLO V4;
S22, on the basis of the YOLO V4 built in S21, the loss function is set from the network output and the real labels of the data set; the object classification loss inherits the cross-entropy loss of YOLO V3; after the loss function is set, the YOLO V4 network parameters are updated with the back-propagation algorithm;
S23, the YOLO V4 hyper-parameters during training are set as follows: the Adam optimizer is selected, the initial learning rate is set to 1e-5, the number of training epochs is set to 50, and the batch size is set to 16.
Further, the building and training of the DeepSORT target tracking model in S3 specifically includes:
S31, taking the size and position information of the candidate boxes output by YOLO V4 as input, and building the three components of DeepSORT: (1) the Kalman filtering algorithm as position predictor, comprising two stages:
(1.1) prediction stage: as the target moves, its speed and position in the current frame are predicted from its speed and position in the previous frame;
(1.2) update stage: from the predicted value and the observed value captured by the algorithm, the current system state is obtained by linearly weighting the two normal distributions;
(2) a small residual network trained and tested as feature extractor: the small residual network is trained on the ReID data set with cross entropy as the training loss, the number of training epochs set to 50, the Adam optimizer selected and the initial learning rate set to 0.0001; after training, the vehicle picture in a YOLO V4 detection box is scaled to 112x112 pixels and fed in to obtain a 128-dimensional vector for the subsequent similarity calculation;
(3) after the cosine distance computes the similarity of the vectorized detection boxes, the Hungarian algorithm matches the vehicles in the detection boxes of consecutive frames; vehicles with high matching degree are identified as the same vehicle and assigned a unified ID number.
Further, the specific steps of S4 are:
S41, obtaining the position information of each vehicle in each frame, and assigning each vehicle a unique identifier;
S42, representing each vehicle by the center of its detection box, and drawing the track of the vehicle with the same ID over time;
S43, creating a CSV file as the track record file, and writing the ID information and track information of all vehicles into the CSV file in real time.
Further, the specific steps of S5 are:
S51, selecting the track endpoint coordinates for track clustering;
S52, importing the track data in the CSV file and obtaining the endpoint coordinates of all track data;
S53, clustering the endpoint coordinates of the track data with the DBSCAN clustering algorithm;
S54, analyzing the tracks whose endpoints fall into the same cluster and connecting the center of gravity of the track start positions with the center of gravity of the end positions, thereby identifying and dividing the lane corresponding to the cluster; if the start points of the tracks corresponding to a cluster show several clearly distinct groups, connecting the center of gravity of each distinct start group to the center of gravity of the endpoints, thereby associating the cluster with lane information;
S55, after clusters and lane information are associated, assigning each newly generated complete track to the lane corresponding to the cluster to which its endpoint coordinates belong.
Further, the lane-dividing vehicle counting function of S6 specifically includes:
S61, if the track data of a vehicle is not updated within the next 10 frames of the video, storing the track data into the CSV file;
S62, assigning the newly added track data in the CSV file to the nearest cluster according to its endpoint coordinates;
S63, whenever an endpoint coordinate is newly added to a cluster, increasing the traffic flow of the lane corresponding to that cluster by the standard passenger-car-equivalent number of the vehicle that generated the track, realizing the lane-dividing vehicle counting function.
The invention has the following beneficial effects:
1. The method adopts the YOLO V4 + DeepSORT vehicle detection and tracking model, which ensures the real-time performance of vehicle detection and tracking and greatly improves accuracy;
2. The method clusters track data by their end positions: the track end positions stand in for the whole tracks and are clustered with DBSCAN; after clustering, clusters are matched to lanes according to the cluster into which each track falls and the track start positions; finally, automatic lane-dividing vehicle counting is completed according to the matched lanes and track analysis.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a representation of a portion of a vehicle re-identification dataset of the present invention;
FIG. 3 is a diagram of the construction of YOLO V4 of the present invention;
FIG. 4 is a schematic representation of the location and ID of the vehicle in each frame of the present invention;
FIG. 5 is a schematic representation of a vehicle trajectory of the present invention;
FIG. 6 is a graph of the detection effect of the YOLO V4 vehicle of the present invention;
FIG. 7 is a vehicle tracking effect diagram of YOLO V4 + DeepSORT of the present invention;
FIG. 8 is a vehicle track extraction effect diagram of the present invention;
FIG. 9 is a vehicle track endpoint distribution diagram of the present invention;
FIG. 10 is a cluster distribution diagram after DBSCAN clustering in accordance with the present invention;
FIG. 11 is a cluster-to-lane correspondence map of the present invention;
FIG. 12 is a diagram of the lane-dividing vehicle counting function implementation of the present invention.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort fall within the scope of the invention.
In the description of the present invention, it should be understood that the terms "open," "upper," "lower," "thickness," "top," "middle," "length," "inner," "peripheral," and the like indicate orientation or positional relationships, merely for convenience in describing the present invention and to simplify the description, and do not indicate or imply that the components or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the present invention.
An automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT, as shown in FIG. 1, comprises the following steps:
S1, collecting a YOLO V4 training data set and a vehicle re-identification data set and performing data enhancement;
S11, collection and enhancement of the YOLO V4 training data set: the annotated pictures and annotation information of all cars, trucks, buses and non-motor vehicles in the PASCAL VOC and COCO data sets are collected, and the vehicle types and positions in 2000 surveillance video frames of different viewing angles are annotated manually. The data enhancement mainly comprises random cropping, random flipping, random adjustment of picture parameters such as saturation, hue and brightness, mosaic data enhancement and mixed-crop data enhancement; the picture data is uniformly scaled to 608x608 resolution;
S12, collection and enhancement of the vehicle re-identification data set: the VeRi data set (37,778 training pictures and 11,579 test pictures) is collected to train the DeepSORT vehicle feature extraction model, with random cropping as the main data enhancement; part of the data set is shown in FIG. 2 (pictures of the same vehicle photographed from different angles). A minimal sketch of such an augmentation pipeline is given below;
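The augmentations named in S11/S12 can be approximated with standard torchvision transforms. This is only a sketch under stated assumptions: parameter values and the synthetic input image are illustrative, and the mosaic and mixed-crop enhancements, which need custom multi-image code, are omitted.

```python
# Sketch (assumption: torchvision-style pipeline) of the S11/S12 augmentations:
# photometric jitter, random flip, random crop, and uniform 608x608 rescaling.
from torchvision import transforms
from PIL import Image

detector_aug = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, saturation=0.3, hue=0.1),  # saturation/hue/brightness jitter
    transforms.RandomHorizontalFlip(p=0.5),                           # random flip
    transforms.RandomResizedCrop(608, scale=(0.6, 1.0)),              # random crop + scale to 608x608
    transforms.ToTensor(),
])

reid_aug = transforms.Compose([
    transforms.RandomCrop((112, 112), pad_if_needed=True),  # random crop for VeRi re-ID images
    transforms.ToTensor(),
])

img = Image.new("RGB", (1280, 720))   # stand-in for a surveillance frame
x = detector_aug(img)                 # 3x608x608 tensor ready for the detector
```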
S2, building and training the YOLO V4 model with the PyTorch deep learning framework;
S21, YOLO V4 consists of: (1) the feature extraction network CSPDarknet53; (2) the multi-scale feature fusion network PAN and spatial pyramid pooling SPP; (3) a head network similar to that of the YOLO V3 model for classification and detection-box regression; the specific structure is shown in FIG. 3. The CSPDarknet53 feature extraction network is built with the PyTorch deep learning framework, the three feature maps of different widths and heights output by the feature extraction network (19x19, 38x38 and 76x76) are fused through SPP+PAN, and the three fused feature maps each pass once through a 1x1 convolutional neural network to give the output of YOLO V4, as sketched below;
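For illustration, the following sketch shows only the final 1x1 detection heads of S21. The input channel widths (1024/512/256) and 3 anchors per scale are assumptions for the 4-class vehicle set of S11, not values stated in the patent.

```python
# Sketch of the 1x1 detection heads: each head outputs 3*(5+4)=27 channels
# (x, y, w, h, confidence + 4 class scores per anchor) on its grid.
import torch
import torch.nn as nn

num_classes, num_anchors = 4, 3
out_ch = num_anchors * (5 + num_classes)

heads = nn.ModuleList([
    nn.Conv2d(c, out_ch, kernel_size=1) for c in (1024, 512, 256)  # assumed input widths
])

# Stand-ins for the three SPP+PAN fused feature maps of S21.
feats = [torch.randn(1, 1024, 19, 19),
         torch.randn(1, 512, 38, 38),
         torch.randn(1, 256, 76, 76)]

outputs = [head(f) for head, f in zip(heads, feats)]
for o in outputs:
    print(o.shape)  # (1, 27, 19, 19), (1, 27, 38, 38), (1, 27, 76, 76)
```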
S22, on the basis of the YOLO V4 built in S21, the loss function is set from the network output and the real labels of the data set. The YOLO V4 model learns the position and size of the detection box with the CIoU loss function, which converges faster and more accurately than the MSE loss adopted in YOLO V3; a sketch of the CIoU loss is given below. The object classification loss still inherits the cross-entropy loss of YOLO V3. After the loss function is set, the back-propagation algorithm updates the YOLO V4 network parameters.
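The CIoU loss mentioned in S22 follows the published definition L = 1 - IoU + rho^2(b, b_gt)/c^2 + alpha*v. A self-contained sketch for corner-format boxes (x1, y1, x2, y2) is:

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    # Intersection and union areas.
    ix1 = torch.max(pred[:, 0], target[:, 0]); iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2]); iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared center distance over squared enclosing-box diagonal.
    cpx = (pred[:, 0] + pred[:, 2]) / 2; cpy = (pred[:, 1] + pred[:, 3]) / 2
    ctx = (target[:, 0] + target[:, 2]) / 2; cty = (target[:, 1] + target[:, 3]) / 2
    ex1 = torch.min(pred[:, 0], target[:, 0]); ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2]); ey2 = torch.max(pred[:, 3], target[:, 3])
    rho2 = (cpx - ctx) ** 2 + (cpy - cty) ** 2
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    # Aspect-ratio consistency term.
    wp = pred[:, 2] - pred[:, 0]; hp = (pred[:, 3] - pred[:, 1]).clamp(min=eps)
    wt = target[:, 2] - target[:, 0]; ht = (target[:, 3] - target[:, 1]).clamp(min=eps)
    v = (4 / math.pi ** 2) * (torch.atan(wt / ht) - torch.atan(wp / hp)) ** 2
    alpha = v / (1 - iou + v + eps)

    return (1 - iou + rho2 / c2 + alpha * v).mean()

p = torch.tensor([[10., 10., 50., 60.]]); t = torch.tensor([[12., 14., 48., 58.]])
print(ciou_loss(p, t))   # small positive value for well-overlapping boxes
```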
S23, some YOLO V4 hyper-parameters during training are set as follows: the Adam optimizer is selected and the initial learning rate is set to 1e-5; the number of training epochs is set to 50; the batch size is set to 16; a sketch of this setup follows below;
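A minimal sketch of this training setup is shown below; the model and data objects are placeholders standing in for the YOLO V4 network and the S11 data set, not the patent's code.

```python
# Sketch of the S23 setup: Adam, initial lr 1e-5, 50 epochs, batch size 16.
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Conv2d(3, 27, 1)                      # placeholder for YOLO V4
dataset = TensorDataset(torch.randn(32, 3, 608, 608))  # placeholder for the S11 data
loader = DataLoader(dataset, batch_size=16, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

for epoch in range(50):
    for (images,) in loader:
        optimizer.zero_grad()
        loss = model(images).mean()   # placeholder for CIoU + cross-entropy loss
        loss.backward()
        optimizer.step()
```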
S3, building the DeepSORT target tracking model, training the vehicle feature extraction network with the vehicle re-identification data, and completing the YOLO V4 + DeepSORT vehicle tracking model by taking the YOLO V4 detection boxes in each frame as input;
S31, taking the size and position information of the candidate boxes output by YOLO V4 as input, the three components of DeepSORT are built: (1) the Kalman filtering algorithm as position predictor, using the Kalman filter's constant-velocity motion and linear observation model; it divides into two stages:
(1.1) prediction stage: as the target moves, its speed and position in the current frame are predicted from its speed and position in the previous frame;
(1.2) update stage: from the predicted value and the observed value captured by the algorithm, the current system state is obtained by linearly weighting the two normal distributions; a simplified numerical sketch of this predict-update cycle follows below;
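The predict-update cycle of (1.1)/(1.2) can be illustrated with a reduced constant-velocity Kalman filter over the box center. DeepSORT's actual filter tracks a larger state (center, aspect ratio, height and their velocities), so this 4-dimensional version is a simplification, with noise covariances chosen arbitrarily.

```python
import numpy as np

dt = 1.0                                   # one video frame per step
F = np.array([[1, 0, dt, 0],               # state transition: x += vx*dt, y += vy*dt
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],                # we observe only the center position
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-2                       # process noise (assumed)
R = np.eye(2) * 1.0                        # measurement noise (assumed)

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain: weights prediction vs. observation
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.zeros(4), np.eye(4)
x, P = predict(x, P)                                   # (1.1) prediction stage
x, P = update(x, P, np.array([320.0, 240.0]))          # (1.2) update with YOLO V4 box center
print(x[:2])                                           # fused center estimate
```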
(2) a small residual network is trained and tested as the feature extractor. The small residual network is trained on the ReID data set with cross entropy as the training loss function; the number of training epochs is set to 50, the Adam optimizer is selected, and the initial learning rate is set to 0.0001. After training, the vehicle picture in a YOLO V4 detection box is scaled to 112x112 pixels and fed in to obtain a 128-dimensional vector for the subsequent similarity calculation, as sketched below.
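A sketch of such an embedding network is given below; the exact layer counts and widths are assumptions, and only the 112x112 input and the normalized 128-dimensional output follow the text.

```python
# Sketch of the S31(2) appearance-feature extractor: a small residual network
# mapping a 112x112 vehicle crop to a unit-norm 128-dim embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.c2 = nn.Conv2d(ch, ch, 3, padding=1)
    def forward(self, x):
        return F.relu(x + self.c2(F.relu(self.c1(x))))  # identity shortcut

class ReIDNet(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.blocks = nn.Sequential(ResBlock(32), nn.MaxPool2d(2), ResBlock(32))
        self.head = nn.Linear(32 * 28 * 28, dim)        # 112 -> 56 -> 28 spatial size
    def forward(self, x):
        x = self.blocks(self.stem(x)).flatten(1)
        return F.normalize(self.head(x), dim=1)         # L2-normalized 128-dim vector

emb = ReIDNet()(torch.randn(1, 3, 112, 112))            # crop from a YOLO V4 box
print(emb.shape)                                        # torch.Size([1, 128])
```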
(3) the Hungarian algorithm serves as the feature matcher: after the cosine distance computes the similarity of the vectorized detection boxes, the Hungarian algorithm matches the vehicles in the detection boxes of consecutive frames; vehicles with high matching degree are identified as the same vehicle and assigned a unified ID number, as sketched below;
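The association step can be sketched with a cosine-distance cost matrix solved by scipy's Hungarian-algorithm implementation (linear_sum_assignment); the gating and matching-cascade logic of full DeepSORT is omitted here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_cost(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T                      # low cost = similar appearance

prev = np.random.rand(3, 128)                 # embeddings of tracked vehicles (previous frame)
curr = np.random.rand(4, 128)                 # embeddings of current detections
rows, cols = linear_sum_assignment(cosine_cost(prev, curr))
for r, c in zip(rows, cols):
    print(f"track {r} -> detection {c}")      # matched pairs keep the same vehicle ID
```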
S4, tracking vehicles with the YOLO V4 + DeepSORT model, extracting the running track of each vehicle, building a track record file and storing the running track information of each vehicle;
S41, the position information of each vehicle in each frame (the position of its detection box) is obtained, and each vehicle is assigned a unique identifier, as shown in FIG. 4;
S42, each vehicle is represented by the center of its detection box, and the track of the vehicle with the same ID is drawn over time, as shown in FIG. 5;
S43, a CSV file is created as the track record file, and the ID information and track information of all vehicles are written into the CSV file in real time, as sketched below;
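A minimal sketch of the track record file, assuming one CSV row per frame and vehicle holding the frame index, vehicle ID and detection-box center (the file name and values are illustrative):

```python
import csv

def append_track_rows(path, frame_idx, tracks):
    """tracks: iterable of (vehicle_id, (x1, y1, x2, y2)) for one frame."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for vid, (x1, y1, x2, y2) in tracks:
            cx, cy = (x1 + x2) / 2, (y1 + y2) / 2   # box center stands in for the vehicle
            writer.writerow([frame_idx, vid, cx, cy])

append_track_rows("tracks.csv", 17, [(3, (100, 200, 180, 260))])  # illustrative values
```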
S5, clustering the endpoint coordinates of the track data with the DBSCAN clustering algorithm and associating the resulting clusters with lane information;
S51, different tracks differ in length, and clustering the raw track data directly raises problems such as the difficulty of computing distances between tracks; therefore the track endpoint coordinates (in the image pixel coordinate system) are selected to stand in for the track data in track clustering;
S52, the track data in the CSV file is imported and the endpoint coordinates of all track data are obtained;
S53, the endpoint coordinates of the track data are clustered with the DBSCAN clustering algorithm, as sketched below;
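A sketch of the endpoint clustering with scikit-learn's DBSCAN; the eps and min_samples values, like the pixel coordinates, are illustrative and would be tuned to the image resolution.

```python
import numpy as np
from sklearn.cluster import DBSCAN

endpoints = np.array([[105, 410], [110, 405],      # endpoints near lane 1's stop area
                      [300, 412], [295, 418]])     # endpoints near lane 2's stop area
labels = DBSCAN(eps=30, min_samples=2).fit_predict(endpoints)
print(labels)   # same label = same cluster = candidate lane; -1 marks noise
```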
S54, the tracks whose endpoints fall into the same cluster are analyzed, and the center of gravity of the track start positions is connected with the center of gravity of the end positions, thereby identifying and dividing the lane corresponding to the cluster; if the start points of the tracks corresponding to a cluster show several clearly distinct groups, the center of gravity of each distinct start group is connected to the center of gravity of the endpoints, thereby associating the cluster with lane information;
S55, after clusters and lane information are associated, each newly generated complete track is assigned to the lane corresponding to the cluster to which its endpoint coordinates belong;
S6, according to the change rule of the track data (whether updating has stopped for a period of time) and the correspondence between tracks and clusters, the lane-dividing vehicle counting function is realized:
S61, if the track data of a vehicle is not updated within the next 10 frames of the video, the track data is stored into the CSV file;
S62, the newly added track data in the CSV file is assigned to the nearest cluster according to its endpoint coordinates;
S63, whenever an endpoint coordinate is newly added to a cluster, the traffic flow of the lane corresponding to that cluster is increased by the standard passenger-car-equivalent number of the vehicle that generated the track, realizing the lane-dividing vehicle counting function; a sketch of this counting rule follows below.
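The S61-S63 counting rule can be sketched as follows; the cluster centers, lane names and passenger-car-equivalent factors here are illustrative assumptions, not values given in the patent.

```python
import numpy as np

cluster_centers = {0: np.array([107.5, 407.5]), 1: np.array([297.5, 415.0])}
lane_of_cluster = {0: "lane 1", 1: "lane 2"}
pce = {"car": 1.0, "bus": 2.0, "truck": 2.5}     # assumed standard-vehicle equivalents
lane_flow = {"lane 1": 0.0, "lane 2": 0.0}

def count_track(endpoint, vehicle_type):
    """Assign a finished track's endpoint to the nearest cluster and add its PCE."""
    dists = {c: np.linalg.norm(endpoint - p) for c, p in cluster_centers.items()}
    lane = lane_of_cluster[min(dists, key=dists.get)]   # nearest cluster wins (S62)
    lane_flow[lane] += pce[vehicle_type]                # weighted lane count (S63)
    return lane

count_track(np.array([112.0, 401.0]), "bus")
print(lane_flow)   # {'lane 1': 2.0, 'lane 2': 0.0}
```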
FIG. 6 shows YOLO V4 detection effect diagrams under four different road/weather environments. The upper-left picture (heavy fog) shows that the model still detects vehicles well in a foggy environment; the upper-right picture (night) shows the detection effect at night, where the YOLO V4 model still guarantees vehicle detection accuracy under low-light conditions. The two lower pictures represent two common road environments, congested traffic and normal traffic; the model maintains detection accuracy in these road environments, and repeated detections, missed detections and the like rarely occur.
FIG. 7 shows that, on the basis of YOLO V4 vehicle detection, DeepSORT assigns a unique ID to each vehicle detection box, and the ID of the same vehicle does not change in the time dimension as the video plays, realizing vehicle tracking. FIG. 8 shows vehicle track extraction: on the basis of vehicle tracking, the center position of the vehicle detection box is selected to stand in for the detected vehicle, and the detection center positions are connected in the time dimension.
FIG. 9 (right) plots the endpoints of the completed (no longer updated) vehicle tracks; the endpoint distribution clearly exhibits small intra-class distances and large inter-class distances. Clustering the endpoint distribution with the DBSCAN clustering algorithm yields the cluster distribution of FIG. 10. The tracks whose endpoints fall into the same cluster are analyzed, and the center of gravity of the track start positions is connected with the center of gravity of the end positions to identify and divide the lane corresponding to the cluster; if the start points of the tracks corresponding to a cluster show several clearly distinct groups, the center of gravity of each distinct start group is connected to the center of gravity of the endpoints, giving the cluster-lane correspondence shown in FIG. 11. Finally, a track that is no longer updated is stored into the CSV file and assigned to the lane corresponding to the cluster into which its endpoint falls, and the equivalent standard-vehicle flow for the vehicle type that generated the track is added to that lane, realizing the automatic lane-dividing traffic counting function, as shown in FIG. 12.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims.

Claims (6)

1. An automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT, characterized by comprising the following steps:
S1, collecting a YOLO V4 training data set and a vehicle re-identification data set and performing data enhancement;
S2, building and training a YOLO V4 model with the PyTorch deep learning framework;
S3, building a DeepSORT target tracking model, training its vehicle feature extraction network with the vehicle re-identification data, and completing the YOLO V4 + DeepSORT vehicle tracking model by taking the YOLO V4 detection boxes in each frame as input;
S4, tracking vehicles with the YOLO V4 + DeepSORT model, extracting the running track of each vehicle, building a track record file and storing the running track information of each vehicle;
S5, clustering the endpoint coordinates of the track data with the DBSCAN clustering algorithm and associating the resulting clusters with lane information;
S6, realizing the lane-dividing vehicle counting function according to the change rule of the track data and the correspondence between tracks and clusters;
wherein the specific steps of S5 are:
S51, selecting the track endpoint coordinates for track clustering;
S52, importing the track data in the CSV file and obtaining the endpoint coordinates of all track data;
S53, clustering the endpoint coordinates of the track data with the DBSCAN clustering algorithm;
S54, analyzing the tracks whose endpoints fall into the same cluster and connecting the center of gravity of the track start positions with the center of gravity of the end positions, thereby identifying and dividing the lane corresponding to the cluster; if the start points of the tracks corresponding to a cluster show several clearly distinct groups, connecting the center of gravity of each distinct start group to the center of gravity of the endpoints, thereby associating the cluster with lane information;
S55, after clusters and lane information are associated, assigning each newly generated complete track to the lane corresponding to the cluster to which its endpoint coordinates belong.
2. The automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT according to claim 1, wherein the data sets and data enhancement in S1 include:
S11, collection and enhancement of the YOLO V4 training data set: collecting the annotated pictures and annotation information of cars, trucks, buses and non-motor vehicles in the PASCAL VOC and COCO data sets; manually annotating the vehicle types and positions in 2000 surveillance video frames of different viewing angles; adopting random cropping, random flipping, random adjustment of the picture parameters saturation, hue and brightness, mosaic data enhancement and mixed-crop data enhancement; and uniformly scaling the picture data to 608x608 resolution;
S12, collection and enhancement of the vehicle re-identification data set: collecting the VeRi data set to train the vehicle feature extraction model in DeepSORT, with random cropping as the data enhancement.
3. The automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT according to claim 1, wherein the process of building and training YOLO V4 in S2 is as follows:
S21, YOLO V4 consists of: 1. the feature extraction network CSPDarknet53; 2. the multi-scale feature fusion network PAN and spatial pyramid pooling SPP; 3. a head network similar to that of the YOLO V3 model for classification and detection-box regression; the CSPDarknet53 feature extraction network is built with the PyTorch deep learning framework, the three feature maps of different widths and heights output by the feature extraction network are fused through SPP+PAN, and the three fused feature maps each pass once through a 1x1 convolutional neural network to give the output of YOLO V4;
S22, on the basis of the YOLO V4 built in S21, setting the loss function from the network output and the real labels, and after the loss function is set, updating the YOLO V4 network parameters with the back-propagation algorithm;
S23, the YOLO V4 hyper-parameters during training are set as follows: the Adam optimizer is selected, the initial learning rate is set to 1e-5, the number of training epochs is set to 50, and the batch size is set to 16.
4. The automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT according to claim 1, wherein the building and training of the DeepSORT target tracking model in S3 specifically includes:
S31, taking the size and position information of the candidate boxes output by YOLO V4 as input, and building the three components of DeepSORT:
(1) the Kalman filtering algorithm as position predictor, comprising two stages:
(1.1) prediction stage: as the target moves, its speed and position in the current frame are predicted from its speed and position in the previous frame;
(1.2) update stage: from the predicted value and the observed value captured by the algorithm, the current system state is obtained by linearly weighting the two normal distributions;
(2) a small residual network trained and tested as feature extractor: the small residual network is trained on the ReID data set with cross entropy as the training loss, the number of training epochs set to 50, the Adam optimizer selected and the initial learning rate set to 0.0001; after training, the vehicle picture in a YOLO V4 detection box is scaled to 112x112 pixels and fed in to obtain a 128-dimensional vector for the subsequent similarity calculation;
(3) after the cosine distance computes the similarity of the vectorized detection boxes, the Hungarian algorithm matches the vehicles in the detection boxes of consecutive frames; vehicles with high matching degree are identified as the same vehicle and assigned a unified ID number.
5. The automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT according to claim 1, wherein the specific steps of S4 are:
S41, obtaining the position information of each vehicle in each frame, and assigning each vehicle a unique identifier;
S42, representing each vehicle by the center of its detection box, and drawing the track of the vehicle with the same ID over time;
S43, creating a CSV file as the track record file, and writing the ID information and track information of all vehicles into the CSV file in real time.
6. The automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT according to claim 1, wherein the lane-dividing vehicle counting function of S6 specifically includes:
S61, if the track data of a vehicle is not updated within the next 10 frames of the video, storing the track data into the CSV file;
S62, assigning the newly added track data in the CSV file to the nearest cluster according to its endpoint coordinates;
S63, whenever an endpoint coordinate is newly added to a cluster, increasing the traffic flow of the lane corresponding to that cluster by the standard passenger-car-equivalent number of the vehicle that generated the track, realizing the lane-dividing vehicle counting function.
CN202010924261.8A 2020-09-04 2020-09-04 Automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT Active CN112101433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010924261.8A CN112101433B (en) 2020-09-04 2020-09-04 Automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT


Publications (2)

Publication Number Publication Date
CN112101433A (en) 2020-12-18
CN112101433B (en) 2024-04-30

Family

ID=73757391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010924261.8A Active CN112101433B (en) 2020-09-04 2020-09-04 Automatic lane-dividing vehicle counting method based on YOLO V4 and DeepSORT

Country Status (1)

Country Link
CN (1) CN112101433B (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634332A (en) * 2020-12-21 2021-04-09 合肥讯图信息科技有限公司 Tracking method based on YOLOv4 model and DeepsORT model
CN112668432A (en) * 2020-12-22 2021-04-16 上海幻维数码创意科技股份有限公司 Human body detection tracking method in ground interactive projection system based on YoloV5 and Deepsort
CN112528963A (en) * 2021-01-09 2021-03-19 江苏拓邮信息智能技术研究院有限公司 Intelligent arithmetic question reading system based on MixNet-YOLOv3 and convolutional recurrent neural network CRNN
CN112836641A (en) * 2021-02-04 2021-05-25 浙江工业大学 Hand hygiene monitoring method based on machine vision
CN113052011A (en) * 2021-03-05 2021-06-29 浙江科技学院 Road target flow monitoring system based on computer vision
CN113160283B (en) * 2021-03-23 2024-04-16 河海大学 Target tracking method under multi-camera scene based on SIFT
CN113033449A (en) * 2021-04-02 2021-06-25 上海国际汽车城(集团)有限公司 Vehicle detection and marking method and system and electronic equipment
CN113139442A (en) * 2021-04-07 2021-07-20 青岛以萨数据技术有限公司 Image tracking method and device, storage medium and electronic equipment
CN113327416B (en) * 2021-04-14 2022-09-16 北京交通大学 Urban area traffic signal control method based on short-term traffic flow prediction
CN113112489B (en) * 2021-04-22 2022-11-15 池州学院 Insulator string-dropping fault detection method based on cascade detection model
CN113257003A (en) * 2021-05-12 2021-08-13 上海天壤智能科技有限公司 Traffic lane-level traffic flow counting system, method, device and medium thereof
CN113822844A (en) * 2021-05-21 2021-12-21 国电电力宁夏新能源开发有限公司 Unmanned aerial vehicle inspection defect detection method and device for blades of wind turbine generator system and storage medium
CN113256690B (en) * 2021-06-16 2021-09-17 中国人民解放军国防科技大学 Pedestrian multi-target tracking method based on video monitoring
CN113781521B (en) * 2021-07-12 2023-08-08 山东建筑大学 Bionic robot fish detection tracking method based on improved YOLO-deep
CN114187489B (en) * 2021-12-14 2024-04-30 中国平安财产保险股份有限公司 Method and device for detecting abnormal driving risk of vehicle, electronic equipment and storage medium
CN114399714A (en) * 2022-01-12 2022-04-26 福州大学 Vehicle-mounted camera video-based vehicle illegal parking detection method
CN114882393B (en) * 2022-03-29 2023-04-07 华南理工大学 Road reverse running and traffic accident event detection method based on target detection
CN114566052B (en) * 2022-04-27 2022-08-12 华南理工大学 Method for judging rotation of highway traffic flow monitoring equipment based on traffic flow direction
CN115620518B (en) * 2022-10-11 2023-10-13 东南大学 Intersection traffic conflict judging method based on deep learning
CN115620042B (en) * 2022-12-20 2023-03-10 菲特(天津)检测技术有限公司 Gear model determination method and system based on target detection and clustering
CN116758259A (en) * 2023-04-26 2023-09-15 中国公路工程咨询集团有限公司 Highway asset information identification method and system
CN116778224A (en) * 2023-05-09 2023-09-19 广州华南路桥实业有限公司 Vehicle tracking method based on video stream deep learning
CN116504068A (en) * 2023-06-26 2023-07-28 创辉达设计股份有限公司江苏分公司 Statistical method, device, computer equipment and storage medium for lane-level traffic flow
CN116863711B (en) * 2023-07-29 2024-03-29 广东省交通运输规划研究中心 Lane flow detection method, device, equipment and medium based on highway monitoring
CN117455957B (en) * 2023-12-25 2024-04-02 山东未来网络研究院(紫金山实验室工业互联网创新应用基地) Vehicle track positioning and tracking method and system based on deep learning
CN117671972B (en) * 2024-02-01 2024-05-14 北京交通发展研究院 Vehicle speed detection method and device for slow traffic system
CN117994987B (en) * 2024-04-07 2024-06-11 东南大学 Traffic parameter extraction method and related device based on target detection technology


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766902A (en) * 2017-11-09 2019-05-17 杭州海康威视系统技术有限公司 To the method, apparatus and equipment of the vehicle cluster in same region
CN109871763A (en) * 2019-01-16 2019-06-11 清华大学 A kind of specific objective tracking based on YOLO
CN110176139A (en) * 2019-02-21 2019-08-27 淮阴工学院 A kind of congestion in road identification method for visualizing based on DBSCAN+

Also Published As

Publication number Publication date
CN112101433A (en) 2020-12-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant