CN114898327B - Vehicle detection method based on lightweight deep learning network - Google Patents

Vehicle detection method based on lightweight deep learning network

Info

Publication number
CN114898327B
CN114898327B (Application CN202210250838.0A)
Authority
CN
China
Prior art keywords
vehicle
frame
image
frames
algorithm
Prior art date
Legal status
Active
Application number
CN202210250838.0A
Other languages
Chinese (zh)
Other versions
CN114898327A (en)
Inventor
贺宜
鲁曼可
曹博
巴继东
李泽
Current Assignee
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202210250838.0A priority Critical patent/CN114898327B/en
Publication of CN114898327A publication Critical patent/CN114898327A/en
Application granted granted Critical
Publication of CN114898327B publication Critical patent/CN114898327B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a vehicle detection method based on a lightweight deep learning network. The method comprises the steps of collecting original videos of road vehicles to obtain a road vehicle image data set; introducing the PSO particle swarm algorithm, improving the particle fitness function, and optimizing the width and height dimensions of the vehicle labeling frames; adopting the distance intersection over union (DIOU) as the index for measuring label similarity, and optimizing the width and height dimensions of the vehicle prior frames by combining the PSO particle swarm algorithm with the K-means clustering algorithm; adding depthwise separable convolution to the YOLOv3 model by modifying its Res module; and training the lightweight deep learning network to detect road vehicle types. The method accelerates the convergence of the K-means clustering algorithm and the PSO particle swarm optimization algorithm to obtain optimal prior frame sizes for road vehicles, assists the YOLOv3 deep learning network in generating accurate target prediction frames, and reduces the number of operation parameters on a large scale while improving detection accuracy, so that the detection speed of the algorithm is further improved and vehicle types in traffic scenes are detected in real time.

Description

Vehicle detection method based on lightweight deep learning network
Technical Field
The invention belongs to the technical field of image recognition and target detection, and particularly relates to a vehicle detection method based on a lightweight deep learning network.
Background
In recent years, as traffic construction has further increased, urban road networks have become increasingly complex and the demand for detecting traffic objects on roads has gradually grown, which places higher requirements on both accuracy and detection speed. Meanwhile, with the rapid development of computer hardware, large-scale computing power has improved unprecedentedly, so target detection algorithms based on deep learning have gradually become mainstream. However, deep network models often involve a huge amount of parameter computation and therefore cannot meet the speed required for real-time detection; how to further improve the detection speed of the model while guaranteeing accuracy has become the key to optimizing target detection algorithms.
Existing algorithms fall mainly into two categories. The first is the two-stage R-CNN series: the first stage generates candidate regions, and the second stage classifies the target objects that may exist in those regions and performs frame regression on the candidate frames to adjust their positions. These algorithms, which include R-CNN, Fast R-CNN and Faster R-CNN, guarantee accuracy but detect slowly. The other category is one-stage algorithms, which omit the candidate-region generation stage and directly output the class probabilities and position coordinates of the target objects after processing the input image; they include the SSD and YOLO series. These algorithms increase detection speed but lose some accuracy and perform poorly on small-scale information. Existing network models can distinguish and classify traffic targets in real detection environments but cannot meet the growing demand for detection speed, and existing larger models suffer from many computation parameters and large memory occupation, so they cannot well meet the requirements of large-scale application.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a vehicle detection method and system based on a lightweight deep learning network. By reducing parameters and simplifying computation, it solves the problems of large memory occupation, many parameters and complex computation in general deep-learning-based target detection algorithms, so that the detection network achieves a higher detection speed; meanwhile, by improving the quality of the prior frames, the target detection network maintains high detection precision, thereby meeting the requirement of identifying the types and positions of vehicles in traffic scenes in real time.
The invention provides a method that combines the PSO particle swarm optimization algorithm with the K-means clustering algorithm to optimize the calculation of the prior frame sizes for the vehicle dataset images, and introduces the distance intersection over union (DIOU) as the similarity index between the vehicle rectangular labeling frames and the vehicle prior frames. This accelerates the convergence of the K-means clustering algorithm and the PSO particle swarm optimization algorithm, yields optimal prior frame sizes for the vehicle dataset images, and assists the YOLOv3 deep learning network in generating accurate target prediction frames. Meanwhile, the residual module of the YOLOv3 algorithm is modified with a depthwise separable convolution network, so that the number of operation parameters can be reduced on a large scale while improving detection accuracy, the memory occupied by the model is reduced, the detection speed of the algorithm is further improved, and the type and position of vehicles in traffic scenes are detected in real time.
In order to achieve the above object, the technical solution of the present invention is a vehicle detection method based on a lightweight deep learning network, which is characterized by comprising the following steps:
Step 1: collecting an initial video of a vehicle through a road monitoring camera, and transmitting the initial video to a calculation processing host for frame extraction to obtain a plurality of road vehicle images; manually labeling each vehicle labeling frame in each road vehicle image, and further manually labeling the vehicle category in each vehicle labeling frame in each road vehicle image; extracting the width of each vehicle marking frame in each road vehicle image and the height of each vehicle marking frame in each road vehicle image to form a wide-high data set;
Step 2: introducing a PSO particle swarm optimization algorithm, taking the marked frame width and height dimensions as variables to be optimized, improving a particle fitness function, and obtaining optimized K marked frame width and height dimensions by utilizing the global searching capability of the PSO algorithm;
Step 3: taking the optimized width and height dimensions of the K marking frames as initial values of priori frames to be generated by a K-means clustering algorithm, calculating the distance intersection ratio of each marking frame to each generated priori frame, and clustering by the K-means clustering algorithm to generate clustered optimized priori frame width and height dimensions;
Step4: adding depth separable convolution to a Res residual error module of YOLOv < 3 > deep learning network to obtain a lightweight YOLOv < 3 > deep learning network, inputting priori frame width and height data and a vehicle image dataset which are obtained by optimizing a K-means algorithm and a PSO algorithm and are suitable for the dataset into the lightweight YOLOv < 3 > deep learning network for training to obtain a trained lightweight YOLOv < 3 > deep learning network model;
step 5: and transmitting the traffic video acquired in real time to a calculation processing host for frame extraction to obtain a plurality of real-time road vehicle images, and further predicting by using the trained lightweight YOLOv deep learning network model to obtain a vehicle prediction frame in the plurality of real-time road vehicle images and the category of vehicles in the prediction frame.
Preferably, in step 1, each vehicle labeling frame in each road vehicle image is:
Box_{m,n} = (x_{m,n}, y_{m,n}, w_{m,n}, h_{m,n}), m ∈ [1, M], n ∈ [1, N]
The width and height data set in step 1 is:
Φ = (w_{m,n}, h_{m,n}), m ∈ [1, M], n ∈ [1, N]
where Box_{m,n} denotes the nth vehicle labeling frame in the mth road vehicle image, Φ denotes the data set of the width and height dimensions of all vehicle labeling frames, x_{m,n} and y_{m,n} denote the abscissa and ordinate of the center point of the nth vehicle labeling frame in the mth road vehicle image, w_{m,n} denotes its width and h_{m,n} denotes its height; M denotes the number of road vehicle images and N denotes the number of vehicle labeling frames in each road vehicle image;
In step 1, the vehicle category in each circumscribed rectangular frame of each vehicle in each road vehicle image is:
type_{m,n}, type_{m,n} ∈ [1, 3]
where type_{m,n} = 1 indicates that the vehicle type in the nth vehicle labeling frame in the mth road vehicle image is a car, type_{m,n} = 2 indicates that it is a bus, and type_{m,n} = 3 indicates that it is a truck;
Preferably, the step of obtaining the optimized width and height dimensions of the K labeling frames through the PSO algorithm in step 2 comprises the following steps:
Step 2.1: randomly selecting K labeling frames from all the vehicle labeling frames of all the road vehicle images in step 1;
Step 2.2: taking the width and height data c_k = (w_k, h_k), k ∈ [1, K], of the K labeling frames as the initial values of the particle population center positions of the PSO particle swarm algorithm and initializing them;
where c_k = (w_k, h_k) denotes the width and height dimension data of the K randomly selected labeling frames;
Step 2.3: initializing the particle velocities V_k = 0, the individual optimal positions P_best(k) with the corresponding individual extrema f(P_best(k)), and the group optimal position G_best with the corresponding global extremum f(G_best); the particle swarm population size is N, i.e. the swarm consists of N particles p_j, j ∈ (1, 2, …, N);
Step 2.4: calculating the distance intersection over union (DIOU) between each particle p_j and the center point c_k = (w_k, h_k), and using it to construct the improved particle fitness function;
Step 2.5: comparing the fitness calculation results fit, and updating the individual extrema f(P_best(k)) and individual optimal positions P_best(k) of the particle swarm as well as the global extremum f(G_best) and the group optimal position G_best of the particle swarm.
Step 2.6: when the maximum number of iterations is reached, the algorithm ends; the optimal group position G_best = (P_1, P_2, P_3, …, P_K) is obtained through PSO optimization, the particle positions in the optimal group correspond to the optimized K particle coordinates P_k = (w'_k, h'_k), k ∈ [1, K], which are the optimized labeling frame width and height dimensions.
Preferably, in step 3, clustering is performed by the K-means clustering algorithm to generate the cluster-optimized prior frame width and height dimensions; the specific process is as follows:
Step 3.1: first, the width and height dimensions P_k = (w'_k, h'_k), k ∈ [1, K], of the K labeling frames obtained by the PSO particle swarm optimization algorithm are read and used as the initial values of the K prior frames to be generated by the K-means clustering algorithm.
Step 3.2: the distance intersection over union of all vehicle labeling frames of each road vehicle image to the K prior frames is calculated, the distance values of all vehicle labeling frames to the K prior frames are further calculated according to the improved distance formula d, the distance values are compared, and each labeling frame is assigned to the class of the prior frame with which it has the smallest distance value.
The distance intersection over union (DIOU) between all vehicle labeling frames of each road vehicle image and the generated K prior frames is calculated as:
DIOU = IOU − ρ²(b, b_Box)/c²
The improved distance formula is:
d = 1 − DIOU
where b and b_Box denote the center points of the prior frame (anchor frame) and the labeling frame (bounding frame) respectively, ρ denotes the Euclidean distance between the two center points, and c denotes the diagonal length of the smallest enclosing box that can contain the prior frame and the labeling frame at the same time.
Step 3.3: and aiming at the labeling frames of a certain class of prior frames, sequencing the labeling frames according to the width and the height, and taking the intermediate value as a new class of prior frames to update the width and the height of the prior frames.
Step 3.4: and calculating the distance intersection ratio and the distance value of each new prior frame and all the vehicle marking frames of each road vehicle image, and carrying out new classification according to the steps.
Step 3.5: repeating the steps 3.3 and 3.4 until the width and height dimensions of the prior frame are not updated, outputting the K prior frame width and height dimensions (w' k,h″k) optimized by the K-means clustering algorithm, and K E [1, K ].
Preferably, the lightweight YOLOv3 deep learning network described in step 4 is:
The Res residual module in the YOLOv3 network model is modified with a depthwise separable convolution network. The basic module in the original YOLOv3 network structure is the DBL, which consists of a convolution layer, scale normalization and a Leaky_relu activation function, while the Res module draws on the residual structure of ResNet. The invention adopts depthwise separable convolution to replace the basic convolution operation in the Res module of the YOLOv3 network: after the first 1×1 convolution operation, a 3×3 channel-by-channel convolution is performed. The channel-by-channel convolution splits the multi-channel feature map of the previous layer into several single-channel feature maps, performs a single-channel convolution on each of them, and then stacks them together again; after the channel-by-channel convolution, another 1×1 point-by-point convolution is performed, whose function is to carry out a weighting operation in the depth direction to obtain a new feature map. Adding the depthwise separable convolution to the residual module of YOLOv3 greatly reduces the amount of parameter computation in the network, and the memory occupied by the YOLOv3 algorithm model is reduced accordingly.
The loss function model of the lightweight deep learning network described in step 4 is:
The loss function of the YOLOv3 algorithm is designed mainly from three aspects: bounding-box coordinate prediction error, bounding-box confidence error and classification prediction error. The YOLOv3 loss function can be expressed as:
where G is the number of grids into which the image is divided, B is the number of predicted bounding boxes in each grid, i indexes the cells and j indexes the prior frames (anchor frames); an indicator term denotes whether the jth prior frame (anchor frame) of the ith cell is responsible for predicting the object, taking the value 1 or 0; further terms denote the abscissa and ordinate of the center point and the width of the nth vehicle target frame predicted by the ith grid of the mth image of the image training set; type_{m,n,s,i} denotes the category of the nth vehicle target frame in the ith grid of the mth image of the image training set, and the corresponding predicted term denotes the category predicted by that grid; a further indicator term denotes whether there is no target in the jth anchor frame of the ith grid; the predicted confidence term denotes the vehicle class confidence of the nth vehicle target frame predicted by the ith grid of the mth image of the image training set, and p_i(type_{m,n,s,i}) denotes the true vehicle class confidence of the nth vehicle target frame of the ith grid of the mth image of the image training set.
Compared with the prior art, the invention has the following beneficial effects. The PSO particle swarm optimization algorithm is adopted with the labeling frame width and height dimensions as the variables to be optimized, and the optimized labeling frame width and height dimensions are used as the initial values for prior frame generation in the K-means clustering algorithm. In the calculation of the PSO algorithm and the K-means algorithm, the distance intersection over union (DIOU) is introduced to replace the intersection over union (IOU) and the distance formula is improved, so that the distance between the center points of the labeling frame and the prior frame can be directly minimized during the calculation; this solves the problem that the IOU cannot accurately reflect the true overlap of the two frames and accelerates the convergence of the network calculation. Through the optimization of the prior frame width and height dimensions, the prior frame sizes better match the real values, which assists the YOLOv3 deep learning network in generating more accurate prediction frames and improves the detection accuracy of the YOLOv3 algorithm for vehicle types and positions. In the residual module of the YOLOv3 model, depthwise separable convolution replaces the basic convolution operation, which greatly reduces the number of parameters involved in computation, reduces the memory of the model and increases the detection speed. The technical scheme of the invention can rapidly and accurately identify the vehicles in traffic video, realizes real-time position recognition of traffic targets, and has a promising application prospect.
Drawings
Fig. 1: a general flow chart of the method in the embodiment of the invention;
Fig. 2: a flow chart of the combined particle swarm optimization (PSO) and K-means clustering algorithm in the embodiment of the invention;
Fig. 3: a structure diagram of the improved residual module ResNet-DS in the embodiment of the invention;
Fig. 4: a schematic diagram of the modified YOLOv3 network structure in the embodiment of the invention.
Detailed Description
The following describes the technical scheme provided by the invention in detail with reference to the accompanying drawings:
As shown in fig. 1, the invention provides a vehicle detection method and system based on a lightweight deep learning network, comprising:
the image deep learning system is characterized by comprising: the road monitoring camera, the calculation processing host and the display screen;
the road monitoring camera, the calculation processing host and the display screen are connected in sequence;
the road monitoring camera is used for collecting an initial image of a road vehicle and transmitting the initial image to the calculation processing host;
The calculation processing host is used for identifying the type of the road vehicle from the initial image of the road vehicle to obtain a prediction frame of the road vehicle, and the corresponding vehicle type and confidence in the prediction frame of the road vehicle;
the display screen is used for displaying a prediction frame of the road vehicle, and the corresponding vehicle category and the confidence level in the prediction frame of the road vehicle.
The calculation processing host is configured with: an i9-10980XE CPU; an RTX 3080 GPU; an X299 motherboard; 16 GB of DDR4 3000 MHz memory; and a GW-EPS1250DA power supply;
step 1: the method comprises the steps of collecting an initial video of a vehicle through a road monitoring camera, transmitting the initial video to a calculation processing host for frame extraction, obtaining a plurality of vehicle images, and constructing an image data set of the vehicle. Each vehicle in each image in the manual labeling image data set is externally connected with a rectangular frame to form a labeling frame, and the types of the vehicles are further manually labeled to obtain the width and height size information of the labeling frame;
In step 1, each vehicle labeling frame in each road vehicle image is:
Box_{m,n} = (x_{m,n}, y_{m,n}, w_{m,n}, h_{m,n}), m ∈ [1, M], n ∈ [1, N]
The width and height data set in step 1 is:
Φ = (w_{m,n}, h_{m,n}), m ∈ [1, M], n ∈ [1, N]
where Box_{m,n} denotes the nth vehicle labeling frame in the mth road vehicle image, Φ denotes the data set of the width and height dimensions of all vehicle labeling frames, x_{m,n} and y_{m,n} denote the abscissa and ordinate of the center point of the nth vehicle labeling frame in the mth road vehicle image, w_{m,n} denotes its width and h_{m,n} denotes its height; M denotes the number of road vehicle images and N denotes the number of vehicle labeling frames in each road vehicle image;
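The following is a minimal sketch, for illustration only, of how the width-height data set Φ of step 1 can be assembled from the manual annotations. The Pascal VOC-style XML layout and the category names used here are assumptions made for the example; the patent does not fix an annotation file format.

```python
# Sketch of Step 1: collecting the (width, height) pairs of all vehicle
# labeling frames into the data set Phi. The XML layout and class names are
# assumptions for illustration only.
import glob
import xml.etree.ElementTree as ET

CLASS_IDS = {"car": 1, "bus": 2, "truck": 3}  # type_{m,n} = 1/2/3 as in the patent

def build_wh_dataset(annotation_dir):
    """Return a list of (w, h) tuples, one per vehicle labeling frame."""
    wh_dataset = []
    for xml_path in glob.glob(f"{annotation_dir}/*.xml"):
        root = ET.parse(xml_path).getroot()
        for obj in root.iter("object"):
            if obj.find("name").text not in CLASS_IDS:
                continue
            box = obj.find("bndbox")
            xmin = float(box.find("xmin").text)
            ymin = float(box.find("ymin").text)
            xmax = float(box.find("xmax").text)
            ymax = float(box.find("ymax").text)
            wh_dataset.append((xmax - xmin, ymax - ymin))  # (w_{m,n}, h_{m,n})
    return wh_dataset
```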
Step 2: introducing a PSO particle swarm optimization algorithm, taking the marked frame width and height dimensions as variables to be optimized, improving a particle fitness function, and obtaining optimized K marked frame width and height dimensions by utilizing the global searching capability of the PSO algorithm;
In step 2, the optimized width and height dimensions of the K labeling frames are obtained through the PSO algorithm, which comprises the following steps:
Step 2.1: randomly selecting K labeling frames from all the vehicle labeling frames of all the road vehicle images in step 1;
Step 2.2: taking the width and height data c_k = (w_k, h_k), k ∈ [1, K], of the K labeling frames as the initial values of the particle population center positions of the PSO particle swarm algorithm and initializing them;
where c_k = (w_k, h_k) denotes the width and height dimension data of the K randomly selected labeling frames;
Step 2.3: initializing the particle velocities V_k = 0, the individual optimal positions P_best(k) with the corresponding individual extrema f(P_best(k)), and the group optimal position G_best with the corresponding global extremum f(G_best); the particle swarm population size is N, i.e. the swarm consists of N particles p_j, j ∈ (1, 2, …, N);
Step 2.4: calculating the distance intersection over union (DIOU) between each particle p_j and the center point c_k = (w_k, h_k), and using it to construct the improved particle fitness function;
Step 2.5: comparing the fitness calculation results fit, and updating the individual extrema f(P_best(k)) and individual optimal positions P_best(k) of the particle swarm as well as the global extremum f(G_best) and the group optimal position G_best of the particle swarm.
Step 2.6: when the maximum number of iterations is reached, the algorithm ends; the optimal group position G_best = (P_1, P_2, P_3, …, P_K) is obtained through PSO optimization, the particle positions in the optimal group correspond to the optimized K particle coordinates P_k = (w'_k, h'_k), k ∈ [1, K], which are the optimized labeling frame width and height dimensions.
Step 3: the width and height dimensions of K marking frames obtained after optimizing the PSO particle swarm optimization algorithm are used as initial values of priori frames to be generated by the K-means clustering algorithm, and the distance intersection ratio (DIOU) of each marking frame and each generated priori frame is calculated and clustered to obtain the width and height dimensions of the clustered and optimized priori frames;
and 3, clustering through a K-means clustering algorithm to generate a priori frame width height after clustering optimization, wherein the method comprises the following specific processes:
Step 3.1: firstly, reading the width and height dimensions P k=(w′k,h′k of K marked frames obtained by a PSO particle swarm optimization algorithm), wherein K is E [1, K ] which is used as the initial value of K priori frames to be generated by a K-means clustering algorithm.
Step 3.2: calculating the distance intersection ratio of all the vehicle marking frames of each road vehicle image to K prior frames, further calculating the distance values of all the vehicle marking frames of each road vehicle image to the K prior frames according to an improved distance formula d, comparing the distance values, and classifying marking frames with the smallest distance values with the K prior frames into one type.
Step 3.3: aiming at the labeling frames of a certain class of prior frames, sorting the labeling frames according to the width and the height, and taking an intermediate value as a new class of prior frames to update the width and the height of the prior frames;
step 3.4: and calculating the distance intersection ratio and the distance value of each new prior frame and all the vehicle marking frames of each road vehicle image, and carrying out new classification according to the steps.
Step 3.5: repeating the steps 3.3 and 3.4 until the width and height dimensions of the prior frame are not updated, outputting the K prior frame width and height dimensions (w' k,h″k) optimized by the K-means clustering algorithm, and K E [1, K ].
The distance intersection over union (DIOU) formula is:
DIOU = IOU − ρ²(b, b_Box)/c²
The improved distance formula is:
d = 1 − DIOU
where b and b_Box denote the center points of the prior frame (anchor frame) and the labeling frame (bounding frame) respectively, ρ denotes the Euclidean distance between the two center points, and c denotes the diagonal length of the smallest enclosing box that can contain both the prior frame and the labeling frame. Replacing the intersection over union (IOU) in the distance formula of the original K-means clustering algorithm with the distance intersection over union (DIOU) solves the problem that the original distance formula cannot clearly distinguish the cases in which the prior frame and the labeling frame are in a containment relation or a separation relation.
Meanwhile, the distance between the center points of the prior frame and the labeling frame can be directly minimized by the improved distance formula, so that the convergence of the K-means clustering algorithm can be accelerated.
Combining the PSO particle swarm optimization algorithm with the K-means clustering algorithm solves the problem that the K-means clustering result is strongly influenced by its initial values, makes the prior frame sizes better match the real values, and improves the detection accuracy of YOLOv3 by improving the prior frame quality.
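The following is a minimal sketch of the DIOU-based K-means clustering of step 3, seeded with the PSO result. The diou_wh helper is duplicated from the PSO sketch so the block runs on its own; its corner alignment is an illustrative assumption, while the distance d = 1 − DIOU and the median-based centroid update follow the improved distance formula and step 3.3 as described above.

```python
import statistics

def diou_wh(box_wh, anchor_wh):
    # duplicated from the PSO sketch so this block runs on its own
    w1, h1 = box_wh
    w2, h2 = anchor_wh
    inter = min(w1, w2) * min(h1, h2)
    iou = inter / (w1 * h1 + w2 * h2 - inter)
    rho2 = ((w1 - w2) / 2.0) ** 2 + ((h1 - h2) / 2.0) ** 2
    c2 = max(w1, w2) ** 2 + max(h1, h2) ** 2
    return iou - rho2 / c2

def kmeans_diou(boxes, init_anchors, max_iters=300):
    """DIOU-based K-means seeded with the PSO result (Steps 3.1-3.5)."""
    anchors = [tuple(a) for a in init_anchors]                    # Step 3.1: P_k = (w'_k, h'_k)
    for _ in range(max_iters):
        clusters = [[] for _ in anchors]
        for b in boxes:                                           # Step 3.2: assign by d = 1 - DIOU
            k = min(range(len(anchors)), key=lambda j: 1.0 - diou_wh(b, anchors[j]))
            clusters[k].append(b)
        new_anchors = [                                           # Step 3.3: median update
            (statistics.median(w for w, _ in cl), statistics.median(h for _, h in cl))
            if cl else anchors[k]
            for k, cl in enumerate(clusters)
        ]
        if new_anchors == anchors:                                # Step 3.5: stop when unchanged
            break
        anchors = new_anchors                                     # Step 3.4: reclassify next round
    return anchors                                                # (w''_k, h''_k)

# typical usage: anchors = kmeans_diou(wh_dataset, pso_anchors(wh_dataset, K=9))
```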
Step4: adding depth separable convolution to a Res residual error module of YOLOv < 3 > deep learning network to obtain a lightweight YOLOv < 3 > deep learning network, inputting priori frame width and height data and a vehicle image dataset which are obtained by optimizing a K-means algorithm and a PSO algorithm and are suitable for the dataset into the lightweight YOLOv < 3 > deep learning network for training to obtain a trained lightweight YOLOv < 3 > deep learning network model;
The lightweight YOLOv3 deep learning network described in step 4 is as follows:
The Res residual module in the YOLOv3 network model is modified with a depthwise separable convolution network. The basic module in the original YOLOv3 network structure is the DBL, which consists of a convolution layer, scale normalization and a Leaky_relu activation function, while the Res module draws on the residual structure of ResNet. The invention replaces the basic convolution operation in the Res module of the YOLOv3 network with depthwise separable convolution.
As shown in fig. 3, in the new residual module ResNet-DS of the YOLOv3 network, after the first 1×1 convolution operation a 3×3 channel-by-channel convolution is performed. The channel-by-channel convolution splits the multi-channel feature map of the previous layer into several single-channel feature maps, performs a single-channel convolution on each of them, and then stacks them together again; after the channel-by-channel convolution, another 1×1 point-by-point convolution is performed, whose function is to carry out a weighting operation in the depth direction to obtain a new feature map. Adding the depthwise separable convolution to the residual module of YOLOv3 greatly reduces the amount of parameter computation in the network, and the memory occupied by the YOLOv3 algorithm model is reduced accordingly.
The parameter computation involved in the improved YOLOv3 residual module is analyzed as follows:
According to the convolution operation of the original YOLOv3 algorithm, let the input of a convolution layer be a 3-channel image and let the convolution layer have N filters, each filter comprising k = 3 convolution kernels of size 3×3 (one per input channel). Thus the number of parameters N_1 of the original YOLOv3 convolution layer is:
N_1 = N × k × 3 × 3 = 27N
For the depthwise separable convolution, in the channel-by-channel convolution the channels and convolution kernels correspond one to one, with one convolution kernel responsible for one channel. Therefore one 3-channel image convolved channel by channel generates 3 feature maps. Each filter contains only one convolution kernel of size 3×3, so the number of parameters N_DW involved in the channel-by-channel convolution part is:
N_DW = 3 × 3 × 3 = 27
The channel-by-channel convolution performs the convolution independently on each channel of the input layer, so the number of feature maps after the channel-by-channel convolution equals the number of input channels; however, this operation does not combine the feature information across channels, so the channel-by-channel convolution cannot expand the number of feature maps. The feature maps after the channel-by-channel convolution are therefore weighted and combined in the depth direction by the point-by-point convolution to generate new feature maps. The kernel size of the point-by-point convolution is 1×1×M, where M is the number of channels of the previous layer; since the previous layer has 3 channels, M = 3. Since this convolution has N filters and uses 1×1 kernels, the number of parameters N_PW involved in the point-by-point convolution is:
N_PW = 3 × 1 × 1 × N = 3N
The number of convolution layer parameters N_2 of the modified YOLOv3 algorithm is therefore:
N_2 = N_DW + N_PW = 27 + 3N
The parameter count 27 + 3N of the improved depthwise separable convolution is significantly smaller than the 27N of the original operation. Because convolution layers usually involve many filters, i.e. a large N, the parameter reduction reaches 87.9% when N exceeds 100. The parameter scale of the improved YOLOv3 algorithm is therefore greatly reduced, the memory occupied by the improved YOLOv3 model is greatly reduced, and the training and detection speed of the improved algorithm is increased; meanwhile, since the depthwise separable convolution modification is applied only to the residual module, the recognition accuracy of the modified algorithm remains at a high level.
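The following is a minimal PyTorch sketch of the ResNet-DS block described above: a 1×1 convolution, a 3×3 channel-by-channel (depthwise) convolution and a 1×1 point-by-point convolution, each with scale normalization and a Leaky ReLU as in the DBL module, plus the residual skip connection. The channel widths and hyper-parameters are assumptions inferred from this description, not taken from a reference implementation of the patent.

```python
# Sketch of the ResNet-DS residual block: 1x1 conv -> 3x3 depthwise conv ->
# 1x1 pointwise conv, each followed by batch normalization and Leaky ReLU,
# with a residual skip connection. Channel widths are illustrative.
import torch
import torch.nn as nn

class ResNetDS(nn.Module):
    def __init__(self, channels):
        super().__init__()
        hidden = channels // 2
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),      # 1x1 reduce
            nn.BatchNorm2d(hidden),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1,
                      groups=hidden, bias=False),                        # 3x3 channel-by-channel
            nn.BatchNorm2d(hidden),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1, bias=False),      # 1x1 point-by-point
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x):
        return x + self.block(x)          # residual connection as in ResNet

# Quick check mirroring the parameter derivation above (3-channel input, N filters):
N = 128
standard = N * 3 * 3 * 3          # N_1 = 27N
separable = 3 * 3 * 3 + 3 * N     # N_DW + N_PW = 27 + 3N
print(standard, separable, 1 - separable / standard)   # reduction ~88% for large N
```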
The loss function model of the lightweight deep learning network described in step 4 is:
The loss function of the YOLOv3 algorithm is designed mainly from three aspects: bounding-box coordinate prediction error, bounding-box confidence error and classification prediction error. The YOLOv3 loss function can be expressed as:
where G is the number of grids into which the image is divided, B is the number of predicted bounding boxes in each grid, i indexes the cells and j indexes the prior frames (anchor frames); an indicator term denotes whether the jth prior frame (anchor frame) of the ith cell is responsible for predicting the object, taking the value 1 or 0; further terms denote the abscissa and ordinate of the center point and the width of the nth vehicle target frame predicted by the ith grid of the mth image of the image training set; type_{m,n,s,i} denotes the category of the nth vehicle target frame in the ith grid of the mth image of the image training set, and the corresponding predicted term denotes the category predicted by that grid; a further indicator term denotes whether there is no target in the jth anchor frame of the ith grid; the predicted confidence term denotes the vehicle class confidence of the nth vehicle target frame predicted by the ith grid of the mth image of the image training set, and p_i(type_{m,n,s,i}) denotes the true vehicle class confidence of the nth vehicle target frame of the ith grid of the mth image of the image training set.
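The following is a deliberately simplified sketch of a loss with the three components described above (coordinate error, confidence error, classification error). The tensor layout, the weighting factors and the choice of MSE and binary cross-entropy are illustrative assumptions; the patent's exact loss formula is not reproduced in this text.

```python
# Simplified illustration of a YOLOv3-style loss with three components:
# coordinate error, confidence error and classification error. All layout
# and weighting choices here are assumptions for the sketch.
import torch
import torch.nn.functional as F

def yolo_like_loss(pred, target, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """pred/target: (..., 4 box coords + 1 objectness + C class scores);
    obj_mask: float 0/1 tensor with the same leading shape as pred[..., 0]."""
    noobj_mask = 1.0 - obj_mask
    coord = F.mse_loss(pred[..., :4] * obj_mask[..., None],
                       target[..., :4] * obj_mask[..., None], reduction="sum")
    conf_obj = F.binary_cross_entropy_with_logits(
        pred[..., 4], target[..., 4], weight=obj_mask, reduction="sum")
    conf_noobj = F.binary_cross_entropy_with_logits(
        pred[..., 4], target[..., 4], weight=noobj_mask, reduction="sum")
    cls = F.binary_cross_entropy_with_logits(
        pred[..., 5:], target[..., 5:],
        weight=obj_mask[..., None], reduction="sum")
    return lambda_coord * coord + conf_obj + lambda_noobj * conf_noobj + cls
```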
Step 5: and transmitting the traffic video acquired in real time to a calculation processing host for frame extraction to obtain a plurality of road vehicle images, and further predicting by using the trained lightweight YOLOv deep learning network model to obtain a vehicle prediction frame in the plurality of road vehicle images and the category of vehicles in the prediction frame.
The specific process of detection with the lightweight YOLOv3 deep learning network in step 5 is as follows:
The vehicle images extracted from the video are input into the improved YOLOv3 network for feature extraction; down-sampling is performed in the Darknet backbone network through several convolution operations with stride 2, obtaining feature maps at the three scales 13×13, 26×26 and 52×52. The K optimized prior frame sizes obtained in step 3 are distributed over the three feature maps. After the prior frames are predicted on the 13×13-scale feature map, the subsequent candidate frame information is obtained directly after further convolution operations. For the 26×26 scale, the 13×13-scale feature map is first up-sampled and then added to the 26×26-scale feature map, and the subsequent candidate frame information is output after several convolution operations. For the 52×52 scale, the 26×26-scale feature map is first up-sampled and then added to the 52×52-scale feature map, and the subsequent candidate frame information is likewise output after several convolution operations. The generated candidate frames are then screened with the Soft-NMS suppression algorithm, and finally high-precision vehicle prediction bounding boxes and vehicle categories are output.
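The following is a minimal sketch of the Soft-NMS screening mentioned above, using the Gaussian score-decay variant. The box format, sigma and score threshold are illustrative assumptions; the patent does not specify them in this text.

```python
# Sketch of Soft-NMS candidate-frame screening (Gaussian decay variant).
import math

def iou(a, b):
    """IOU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def soft_nms(boxes, sigma=0.5, score_thresh=0.001):
    """boxes: list of [x1, y1, x2, y2, score]; returns the kept detections."""
    boxes = [list(b) for b in boxes]
    kept = []
    while boxes:
        boxes.sort(key=lambda b: b[4], reverse=True)
        best = boxes.pop(0)                                  # keep the highest-scoring box
        kept.append(best)
        for b in boxes:
            b[4] *= math.exp(-iou(best, b) ** 2 / sigma)     # Gaussian score decay
        boxes = [b for b in boxes if b[4] > score_thresh]    # drop boxes whose score collapsed
    return kept
```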
The above examples are merely illustrative of the preferred embodiments of the present invention and are not intended to limit the scope of the present invention, and various modifications and improvements made by those skilled in the art to the technical solution of the present invention should fall within the protection scope of the present invention without departing from the design spirit of the present invention.

Claims (3)

1. The vehicle detection method based on the lightweight deep learning network is characterized by comprising the following steps of:
Step 1: collecting an initial video of a vehicle through a road monitoring camera, and transmitting the initial video to a calculation processing host for frame extraction to obtain a plurality of road vehicle images; manually labeling each vehicle labeling frame in each road vehicle image, and further manually labeling the vehicle category in each vehicle labeling frame in each road vehicle image; extracting the width of each vehicle marking frame in each road vehicle image and the height of each vehicle marking frame in each road vehicle image to form a wide-high data set;
Step 2: introducing a PSO particle swarm optimization algorithm, taking the marked frame width and height dimensions as variables to be optimized, improving a particle fitness function, and obtaining optimized K marked frame width and height dimensions by utilizing the global searching capability of the PSO algorithm;
Step 3: taking the optimized width and height dimensions of the K marking frames as initial values of priori frames to be generated by a K-means clustering algorithm, calculating the distance intersection ratio of each marking frame to each generated priori frame, and clustering by the K-means clustering algorithm to generate clustered optimized priori frame width and height dimensions;
Step4: adding depth separable convolution to a Res residual error module of YOLOv < 3 > deep learning network to obtain a lightweight YOLOv < 3 > deep learning network, inputting priori frame width and height data and a vehicle image dataset which are obtained by optimizing a K-means algorithm and a PSO algorithm and are suitable for the dataset into the lightweight YOLOv < 3 > deep learning network for training to obtain a trained lightweight YOLOv < 3 > deep learning network model;
Step 5: transmitting the traffic video acquired in real time to a calculation processing host for frame extraction to obtain a plurality of real-time road vehicle images, and further predicting by using a trained lightweight YOLOv3 deep learning network model to obtain a vehicle prediction frame in the plurality of real-time road vehicle images and the category of vehicles in the prediction frame;
In step 1, each vehicle labeling frame in each road vehicle image is:
Box_{m,n} = (x_{m,n}, y_{m,n}, w_{m,n}, h_{m,n}), m ∈ [1, M], n ∈ [1, N]
The width and height data set in step 1 is:
Φ = (w_{m,n}, h_{m,n}), m ∈ [1, M], n ∈ [1, N]
where Box_{m,n} denotes the nth vehicle labeling frame in the mth road vehicle image, Φ denotes the data set of the width and height dimensions of all vehicle labeling frames, x_{m,n} and y_{m,n} denote the abscissa and ordinate of the center point of the nth vehicle labeling frame in the mth road vehicle image, w_{m,n} denotes its width and h_{m,n} denotes its height; M denotes the number of road vehicle images, N denotes the number of vehicle labeling frames in each road vehicle image;
In step 1, the vehicle category in each circumscribed rectangular frame of each vehicle in each road vehicle image is:
type_{m,n}, type_{m,n} ∈ [1, 3]
where type_{m,n} = 1 indicates that the vehicle type in the nth vehicle labeling frame in the mth road vehicle image is a car, type_{m,n} = 2 indicates that it is a bus, and type_{m,n} = 3 indicates that it is a truck;
In step 2, the optimized width and height dimensions of the K labeling frames are obtained through the PSO algorithm, which comprises the following steps:
Step 2.1: randomly selecting K labeling frames from all the vehicle labeling frames of all the road vehicle images in step 1;
Step 2.2: taking the width and height data c_k = (w_k, h_k), k ∈ [1, K], of the K labeling frames as the initial values of the particle population center positions of the PSO particle swarm algorithm and initializing them;
where c_k = (w_k, h_k) denotes the width and height dimension data of the K randomly selected labeling frames;
Step 2.3: initializing the particle velocities V_k = 0, the individual optimal positions P_best(k) with the corresponding individual extrema f(P_best(k)), and the group optimal position G_best with the corresponding global extremum f(G_best); the particle swarm population size is N, i.e. the swarm consists of N particles p_j, j ∈ (1, 2, …, N);
Step 2.4: calculating the distance intersection over union DIOU between each particle p_j and the center point c_k = (w_k, h_k), and using it to construct the improved particle fitness function;
Step 2.5: comparing the fitness calculation results fit, and updating the individual extrema f(P_best(k)) and individual optimal positions P_best(k) of the particle swarm as well as the global extremum f(G_best) and the group optimal position G_best of the particle swarm;
Step 2.6: when the maximum number of iterations is reached, the algorithm ends; the optimal group position G_best = (P_1, P_2, P_3, …, P_K) is obtained through PSO optimization, the particle positions in the optimal group correspond to the optimized K particle coordinates P_k = (w'_k, h'_k), k ∈ [1, K], which are the optimized labeling frame width and height dimensions.
2. The method for vehicle detection based on a lightweight deep learning network of claim 1, wherein,
In step 3, clustering is performed by the K-means clustering algorithm to generate the cluster-optimized prior frame width and height dimensions; the specific process is as follows:
Step 3.1: first, the width and height dimensions P_k = (w'_k, h'_k), k ∈ [1, K], of the K labeling frames obtained by the PSO particle swarm optimization algorithm are read and used as the initial values of the K prior frames to be generated by the K-means clustering algorithm;
Step 3.2: the distance intersection over union of all vehicle labeling frames of each road vehicle image to the K prior frames is calculated, the distance values of all vehicle labeling frames to the K prior frames are further calculated according to the improved distance formula d, the distance values are compared, and each labeling frame is assigned to the class of the prior frame with which it has the smallest distance value;
The distance intersection over union of all vehicle labeling frames of each road vehicle image and the generated K prior frames is calculated; the DIOU formula is:
DIOU = IOU − ρ²(b, b_Box)/c²
The improved distance formula is:
d = 1 − DIOU
where b and b_Box denote the center points of the prior frame and the labeling frame respectively, ρ denotes the Euclidean distance between the two center points, and c denotes the diagonal length of the smallest enclosing box that can contain the prior frame and the labeling frame at the same time;
Step 3.3: for the labeling frames assigned to a given class of prior frame, the labeling frames are sorted by width and by height, and the median values are taken as the new prior frame of that class to update the prior frame width and height;
Step 3.4: the distance intersection over union and the distance value between each new prior frame and all vehicle labeling frames of each road vehicle image are calculated, and a new classification is carried out according to the above steps;
Step 3.5: Steps 3.3 and 3.4 are repeated until the prior frame width and height dimensions are no longer updated, and the K prior frame width and height dimensions (w″_k, h″_k), k ∈ [1, K], optimized by the K-means clustering algorithm are output.
3. The method for vehicle detection based on a lightweight deep learning network of claim 1, wherein,
The lightweight YOLOv3 deep learning network described in step 4 is:
The Res residual module in the YOLOv3 network model is modified with a depthwise separable convolution network; the basic module in the original YOLOv3 network structure is the DBL, which consists of a convolution layer, scale normalization and a Leaky_relu activation function, while the Res module draws on the residual structure of ResNet; in the Res module of the YOLOv3 network, depthwise separable convolution replaces the basic convolution operation: after the first 1×1 convolution operation, a 3×3 channel-by-channel convolution is performed, in which the multi-channel feature map of the previous layer is split into several single-channel feature maps, a single-channel convolution is performed on each of them, and they are then stacked together again; after the channel-by-channel convolution, another 1×1 point-by-point convolution is performed, whose function is to carry out a weighting operation in the depth direction to obtain a new feature map; adding the depthwise separable convolution operation to the residual module of YOLOv3 greatly reduces the amount of parameter computation in the network, so that the memory occupied by the YOLOv3 algorithm model is reduced;
the loss function model of the lightweight deep learning network described in step 4 is:
The loss function of the YOLOv3 algorithm is designed from three aspects: bounding-box coordinate prediction error, bounding-box confidence error and classification prediction error; the YOLOv3 loss function can be expressed as:
where G is the number of grids into which the image is divided, B is the number of predicted bounding boxes in each grid, i indexes the cells and j indexes the prior frames; an indicator term denotes whether the jth prior frame of the ith cell is responsible for predicting the object, taking the value 1 or 0; further terms denote the abscissa and ordinate of the center point and the width of the nth vehicle target frame predicted by the ith grid of the mth image of the image training set; type_{m,n,s,i} denotes the category of the nth vehicle target frame in the ith grid of the mth image of the image training set, and the corresponding predicted term denotes the category predicted by that grid; a further indicator term denotes whether there is no target in the jth anchor frame of the ith grid; the predicted confidence term denotes the vehicle class confidence of the nth vehicle target frame predicted by the ith grid of the mth image of the image training set, and p_i(type_{m,n,s,i}) denotes the true vehicle class confidence of the nth vehicle target frame of the ith grid of the mth image of the image training set.
CN202210250838.0A 2022-03-15 2022-03-15 Vehicle detection method based on lightweight deep learning network Active CN114898327B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210250838.0A CN114898327B (en) 2022-03-15 2022-03-15 Vehicle detection method based on lightweight deep learning network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210250838.0A CN114898327B (en) 2022-03-15 2022-03-15 Vehicle detection method based on lightweight deep learning network

Publications (2)

Publication Number Publication Date
CN114898327A CN114898327A (en) 2022-08-12
CN114898327B true CN114898327B (en) 2024-04-26

Family

ID=82715326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210250838.0A Active CN114898327B (en) 2022-03-15 2022-03-15 Vehicle detection method based on lightweight deep learning network

Country Status (1)

Country Link
CN (1) CN114898327B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100492B (en) * 2022-08-26 2023-04-07 摩尔线程智能科技(北京)有限责任公司 Yolov3 network training and PCB surface defect detection method and device
CN115565068B (en) * 2022-09-30 2023-04-18 宁波大学 Full-automatic detection method for breakage of high-rise building glass curtain wall based on light-weight deep convolutional neural network
CN116258721A (en) * 2023-05-16 2023-06-13 成都数之联科技股份有限公司 OLED panel defect judging method, device, equipment and medium
CN116863419A (en) * 2023-09-04 2023-10-10 湖北省长投智慧停车有限公司 Method and device for lightening target detection model, electronic equipment and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
WO2021244079A1 (en) * 2020-06-02 2021-12-09 苏州科技大学 Method for detecting image target in smart home environment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10713542B2 (en) * 2018-10-24 2020-07-14 The Climate Corporation Detection of plant diseases with multi-stage, multi-scale deep learning
US11288507B2 (en) * 2019-09-27 2022-03-29 Sony Corporation Object detection in image based on stochastic optimization

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
WO2021244079A1 (en) * 2020-06-02 2021-12-09 苏州科技大学 Method for detecting image target in smart home environment

Also Published As

Publication number Publication date
CN114898327A (en) 2022-08-12

Similar Documents

Publication Publication Date Title
CN114898327B (en) Vehicle detection method based on lightweight deep learning network
CN110991311B (en) Target detection method based on dense connection deep network
CN110929577A (en) Improved target identification method based on YOLOv3 lightweight framework
CN110348384B (en) Small target vehicle attribute identification method based on feature fusion
CN112016605B (en) Target detection method based on corner alignment and boundary matching of bounding box
CN112232371B (en) American license plate recognition method based on YOLOv3 and text recognition
CN111126359A (en) High-definition image small target detection method based on self-encoder and YOLO algorithm
CN111967313B (en) Unmanned aerial vehicle image annotation method assisted by deep learning target detection algorithm
CN111428625A (en) Traffic scene target detection method and system based on deep learning
CN111178451A (en) License plate detection method based on YOLOv3 network
CN112084890A (en) Multi-scale traffic signal sign identification method based on GMM and CQFL
CN113205026A (en) Improved vehicle type recognition method based on fast RCNN deep learning network
CN110659601A (en) Depth full convolution network remote sensing image dense vehicle detection method based on central point
CN112712102A (en) Recognizer capable of simultaneously recognizing known radar radiation source individuals and unknown radar radiation source individuals
CN116091946A (en) Yolov 5-based unmanned aerial vehicle aerial image target detection method
CN115131747A (en) Knowledge distillation-based power transmission channel engineering vehicle target detection method and system
CN115019039A (en) Example segmentation method and system combining self-supervision and global information enhancement
CN114550134A (en) Deep learning-based traffic sign detection and identification method
CN114648667A (en) Bird image fine-granularity identification method based on lightweight bilinear CNN model
CN111612803B (en) Vehicle image semantic segmentation method based on image definition
CN113496260A (en) Grain depot worker non-standard operation detection method based on improved YOLOv3 algorithm
CN117173697A (en) Cell mass classification and identification method, device, electronic equipment and storage medium
Huo et al. Traffic sign recognition based on improved SSD model
CN111832463A (en) Deep learning-based traffic sign detection method
CN114882490B (en) Unlimited scene license plate detection and classification method based on point-guided positioning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant