CN111461083A - Rapid vehicle detection method based on deep learning - Google Patents

Rapid vehicle detection method based on deep learning

Info

Publication number
CN111461083A
CN111461083A (application CN202010452151.6A)
Authority
CN
China
Prior art keywords
detection
prediction
network
training
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010452151.6A
Other languages
Chinese (zh)
Inventor
王国栋 (Wang Guodong)
王亮亮 (Wang Liangliang)
徐洁 (Xu Jie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao University
Original Assignee
Qingdao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao University filed Critical Qingdao University
Priority to CN202010452151.6A priority Critical patent/CN111461083A/en
Publication of CN111461083A publication Critical patent/CN111461083A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a rapid vehicle detection method based on deep learning. After feature extraction by a basic network, a multi-scale dilated convolution module in the detection network performs feature-layer fusion and feature extraction, followed by detection at three different scales; training on a data set yields a weight model of the detection algorithm, and a BN-layer-based pruning strategy is added on top of this model for retraining, producing a pruned model. The method depends little on the system environment, occupies little memory, and offers a high detection speed with high detection precision; it can be conveniently deployed in an edge-end system, and, by generating a model from training data, can be widely applied to various engineering applications requiring target detection technology.

Description

Rapid vehicle detection method based on deep learning
Technical Field
The invention belongs to the technical field of machine vision and deep learning, relates to a target detection technology, and particularly relates to a rapid vehicle detection method based on deep learning.
Background
The rapid vehicle detection method based on deep learning aims to provide, for vehicles on urban roads, a detection algorithm with simple environment dependence, small memory occupation and high running speed that meets detection-accuracy requirements and is easy to deploy in an edge-end system. A small edge-end system can process the acquired data directly, improving the timeliness and accuracy of information analysis while reducing the load on the central data processing system. The algorithm can be applied to the construction of the Internet of Things, smart cities and the like.
A two-stage detection algorithm in deep learning (one with a region proposal module, which selects regions of interest in advance and then detects them further) achieves high detection accuracy, but its hardware requirements are high, and its detection speed on ordinary equipment cannot meet the requirement of multi-target high-definition video real-time detection. The YOLO (You Only Look Once) algorithm treats object detection as a regression problem and, based on a single end-to-end network, maps the original input image directly to the positions and classes of objects, with no explicit region-of-interest extraction stage, which greatly improves detection speed: in image detection tests, YOLOv1 reaches 45 FPS and its simplified version reaches 155 FPS, meeting the speed required for real-time detection of high-frame-rate high-definition video, though at a considerable cost in accuracy. YOLOv3 greatly improved detection accuracy while reducing speed; it can still meet real-time detection requirements on a high-performance computer configured for deep learning, but it can hardly do so on small edge-end systems with limited hardware. A detection algorithm that runs fast with modest hardware requirements while retaining adequate accuracy is therefore still lacking for edge-end deployment.
CN201910915495.3 relates to a vehicle detection method based on an improved YOLOv3: the convolutional neural network structure between the Darknet layer and the three yolo layers is redesigned; a YOLO-TN network is designed drawing on the weight-sharing idea of TridentNet; model pruning is carried out on the YOLO-TN convolutional neural network; a vehicle detection data set is constructed and the vehicle position information in it is labeled; vehicle detection models based on YOLO-TN and YOLOv3 are trained respectively to complete the vehicle detection task, and their detection results are compared.
CN201711104408.3 discloses a vehicle detection method based on deep learning, which first trains a deep learning network with a vehicle database, then feeds the picture to be detected into the trained network and obtains its class information through a single forward propagation; according to the class information, the largest weight among the parameters is obtained and superposed on the feature map of the last convolution layer, which is then image-fused with the picture to be detected, finally achieving accurate positioning of the vehicle. This effectively solves the problems of environmental interference, illumination, obstacles and low accuracy that arise when traditional image processing algorithms are used for vehicle detection, and the method is applicable to vehicle detection in different scenes.
CN201910065786.8 discloses a vehicle detection method based on deep learning. The method uses the R-FCN algorithm to detect vehicles, avoiding the manual feature design needed in traditional vehicle detection while improving accuracy and robustness. The vehicle detection method comprises the following steps: A. defining the vehicle vision task; B. making a vehicle detection data set; C. determining a shared convolutional network structure; D. optimizing the shared convolutional network with a stochastic depth method; E. training the overall R-FCN model to obtain the final vehicle detection network; F. testing the vehicle detection network with new samples to obtain their detection results.
CN201811322079.4 provides a vehicle detection method based on deep learning. Constructing a training set and a verification set; performing data amplification on the training set; constructing a vehicle detection network; and training and predicting the vehicle detection network. The vehicle detection method based on deep learning fully considers the diversity of application scene weather and the complexity of vehicle types, and uses the faster-rcnn network based on the resnet101, so that the vehicle detection speed is ensured, and the vehicle detection accuracy is improved.
CN201810539356.0 discloses a vehicle detection method based on deep learning, which combines Edge Boxes and an improved Faster R-CNN model to detect vehicles in a complex environment, firstly, the Edge Boxes are used to process images, and more accurate vehicle candidate regions are preliminarily extracted; and secondly, inputting the candidate region into an improved Faster R-CNN model to further finely position the vehicle and obtaining a final detection result through classification and judgment. Meanwhile, in order to enhance the detection capability of the model on small-size vehicles and the discrimination capability of the model, convolution features of different layers are combined to supplement detailed information of some vehicles, and a difficult sample mining strategy is added in a training stage, so that the model focuses on difficult samples, and the background of the vehicles and suspected vehicles can be well distinguished.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a rapid vehicle detection method based on deep learning, which is used for solving the problems that a general target detection algorithm based on deep learning occupies a large amount of memory and has low detection speed; the method can be applied to, but not limited to, a road monitoring system for automatically detecting, identifying and uploading the result of the vehicle.
The invention constructs a detection network based on deep learning, in which a feature layer is detected at three scales after the basic network. A multi-scale dilated convolution module (MDC module) is used in the detection-network part: the module takes the output of the basic network as input and applies three-channel convolution with 3 × 3 dilated convolution kernels at dilation rates of 1, 2 and 5 respectively; the outputs of the three channels are then channel-fused together with the input, and the fused feature layer is passed into the detection network at the three scales to detect and identify the target object. This increases sensitivity to target objects at different scales and raises the recall rate of the algorithm.
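As an illustrative sketch of this module (our own PyTorch reading of the description, not code from the patent — the class name `MDCModule` and the channel counts are invented for illustration), the three-branch dilated convolution with channel fusion can be written as:

```python
import torch
import torch.nn as nn

class MDCModule(nn.Module):
    """Multi-scale dilated convolution (MDC) sketch: three parallel 3x3
    convolutions with dilation rates 1, 2 and 5, whose outputs are
    channel-concatenated together with the module input."""
    def __init__(self, channels: int):
        super().__init__()
        # padding == dilation keeps the spatial size unchanged for a 3x3 kernel
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=r, dilation=r)
            for r in (1, 2, 5)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = [branch(x) for branch in self.branches]
        outs.append(x)                 # fuse the input itself as a fourth channel group
        return torch.cat(outs, dim=1)  # channel depth grows from C to 4C

feat = torch.randn(1, 64, 13, 13)      # a feature map from the basic network
fused = MDCModule(64)(feat)
print(fused.shape)                     # torch.Size([1, 256, 13, 13])
```

Because padding equals the dilation rate for each branch, all three outputs keep the input's spatial size, so they can be concatenated with the input along the channel axis.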
The technical scheme provided by the invention is as follows:
a rapid vehicle detection method based on deep learning is characterized in that a multi-scale cavity convolution module is used for fusion and feature extraction of feature layers in a detection network through feature extraction of a basic network, and then detection of three different scales is carried out, so that feature information of a target object in a feature map is enhanced, a feature edge is strengthened, a detection algorithm is sensitive to the target object, and algorithm detection precision is increased; and then training the data set to obtain a weight model of the detection algorithm, and adding a pruning strategy based on a BN layer on the basis of the weight model for retraining so as to obtain a pruned model. The method specifically comprises the following steps:
1) construction of detection algorithms
Constructing a basic feature extraction network of the detection algorithm. The basic module DBL of the detection algorithm consists of a convolution layer (CONV), batch normalization (BN) and an activation function (Leaky ReLU); the RES module follows the residual module structure and consists of a pixel padding module (padding), a DBL module and a classical residual unit. After the basic network extracts features, three parallel channels process the input feature layer with 3 × 3 dilated convolutions using the classical dilation rates [1, 2, 5], forming the multi-scale dilated convolution module. The output of the basic network passes through three DBL structures and is input into the multi-scale dilated convolution module; the processing result is then divided into three scales, and an FPN structure using channel fusion, upsampling and DBL structures performs channel integration and feature fusion on the output of the multi-scale dilated convolution module. Finally the extracted feature map is processed at the three scales, each scale's feature layer generating a feature map of depth 3 × (5 + n), where 3 is the number of prediction boxes per grid cell, 5 comprises the 4 position values of a prediction box (centre coordinates and width and height) plus 1 confidence value, and n is the number of predicted classes; the prediction results of the three scales are fused and classified, and redundant overlapping prediction boxes are removed to obtain the final detection result of the predicted position and class of the target object;
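A minimal PyTorch sketch of the DBL basic block (convolution + batch normalization + Leaky ReLU) described above — the helper name `dbl` and the kernel/stride defaults are our assumptions, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

def dbl(in_ch: int, out_ch: int, k: int = 3, stride: int = 1) -> nn.Sequential:
    """DBL basic block: convolution + batch normalization + Leaky ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, stride, padding=k // 2, bias=False),  # BN makes the bias redundant
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )

block = dbl(3, 32)
out = block(torch.randn(2, 3, 64, 64))
print(out.shape)  # torch.Size([2, 32, 64, 64])
```

Setting `bias=False` in the convolution is a common idiom when BN follows, since BN's own shift absorbs any bias; the BN scaling factor here is also what the later pruning step operates on.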
2) using cross entropy loss function
A Leaky ReLU function is selected as the activation function, which accelerates convergence and avoids vanishing gradients. The loss function comprises four parts: target object position information (x, y), prediction box width and height (w, h), predicted class information (class), and predicted target object confidence. To address the problem that, when prediction boxes of different sizes have the same relative error, the larger box produces a larger error value, the mean square error of the predicted width and height relative to the real box width and height is taken and multiplied by a coefficient of 0.5, balancing the errors of prediction boxes of different sizes and the proportion of the width-height error in the total error; the remaining three parts compute the relative error directly with a cross entropy loss function. During training, the total loss value is the sum of the four loss values. The loss functions are as follows:
$$xy_{loss}=\sum_{i=0}^{s^2}\sum_{j=0}^{B}1_{ij}^{obj}\Big[-x_i\log\hat{x}_i-(1-x_i)\log(1-\hat{x}_i)-y_i\log\hat{y}_i-(1-y_i)\log(1-\hat{y}_i)\Big]$$

$$wh_{loss}=\lambda\sum_{i=0}^{s^2}\sum_{j=0}^{B}1_{ij}^{obj}\Big[(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\Big]$$
where $xy_{loss}$ is the position-coordinate error of the object and $wh_{loss}$ is the prediction-box width-height error; $s^2$ with s ∈ (26, 52, 104) means that the feature map at each of the 3 scales is divided into s × s grid cells, each responsible for predicting target objects falling into it; B takes the value 3, meaning each grid cell has 3 bounding boxes of different scales,
$1_{ij}^{obj}$ indicates that the object falls into the j-th bounding box of grid cell i; λ takes the value 0.5; (x, y, w, h) are the actual position coordinates and width and height of the detected target object in the annotation data, and $(\hat{x},\hat{y},\hat{w},\hat{h})$ are the position coordinates and width and height of the detected target object predicted in the forward computation;
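The box-regression part of the loss above can be sketched numerically as follows — a NumPy illustration under our reading of the text (cross entropy on the centre coordinates, 0.5 × squared error on width and height); the function names are invented:

```python
import numpy as np

def bce(p, t):
    """Element-wise binary cross entropy, clipped for numerical stability."""
    p = np.clip(p, 1e-7, 1.0 - 1e-7)
    return -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

def box_loss(pred, target, lam=0.5):
    """pred, target: (..., 4) arrays of (x, y, w, h) for responsible boxes.
    xy terms use cross entropy; wh terms use squared error scaled by lam = 0.5."""
    xy_loss = bce(pred[..., 0:2], target[..., 0:2]).sum()
    wh_loss = lam * ((pred[..., 2:4] - target[..., 2:4]) ** 2).sum()
    return xy_loss + wh_loss

pred = np.array([[0.5, 0.5, 1.0, 1.0]])
target = np.array([[0.5, 0.5, 0.0, 0.0]])
print(round(box_loss(pred, target), 4))  # 2.3863  (= 2 ln 2 + 0.5 * 2)
```

In the full loss, the class and confidence terms (also cross entropy per the text) would be added on top, and the sums would run over the s × s × B responsible boxes at each of the three scales.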
3) training algorithm to obtain weight model
Collecting pictures of the target object at different places, in different time periods and under different weather conditions as the original pictures of the data set, labeling the target objects in the pictures to generate a data set, and training the constructed deep learning algorithm on the data set with a certain strategy to generate a weight model;
4) Performing model pruning on the trained weight model
Using a BN-layer pruning method: a scaling factor γ is introduced for each channel of the BN layer and multiplied with that channel's output; the network weights and scaling factors are trained jointly; finally the channels with small scaling factors are removed directly, and the pruned network is fine-tuned to obtain the pruned weight model; the objective function adopts the formula:
$$L=\sum_{(x,y)}l\big(f(x,W),y\big)+\lambda\sum_{\gamma\in\Gamma}g(\gamma)$$
where (x, y) represents the training data and labels, W denotes the trainable parameters of the network, the first term is the ordinary training loss of the CNN, g(γ) is the penalty term on the scaling factors, and λ is the balance factor between the two terms; here g(s) = |s|, i.e. L1 regularization, which is also widely used for sparsification;
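A small sketch of the arithmetic behind this pruning strategy — our illustration, not code from the patent; the helper names and the keep-ratio selection criterion are assumptions (the patent only says channels with small scaling factors are removed, without fixing a ratio):

```python
import numpy as np

def slimming_penalty(gammas, lam=1e-4):
    """The lam * sum(|gamma|) L1 term added to the training loss,
    summed over all BN scaling factors."""
    return lam * float(np.abs(np.asarray(gammas, dtype=float)).sum())

def prune_mask(gammas, keep_ratio=0.7):
    """Boolean mask keeping the channels with the largest |gamma|."""
    g = np.abs(np.asarray(gammas, dtype=float))
    k = max(1, int(round(keep_ratio * g.size)))
    thresh = np.sort(g)[::-1][k - 1]   # k-th largest magnitude
    return g >= thresh

gammas = [0.9, 0.01, 0.5, 0.02]
print(round(slimming_penalty(gammas, lam=0.1), 3))      # 0.143
print(prune_mask(gammas, keep_ratio=0.5).tolist())      # [True, False, True, False]
```

Because the L1 term drives many γ toward zero during joint training, the magnitude ranking cleanly separates channels worth keeping from those safe to remove before fine-tuning.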
5) final compression model testing
The test pictures in the data set are input into the network, and the target objects in the test pictures are marked out in the test run, thereby realizing the rapid vehicle detection method based on deep learning.
The rapid vehicle detection method based on deep learning disclosed by the invention constructs a rapid vehicle detection algorithm by analyzing the advantages and disadvantages of various deep-learning-based target detection algorithms and using a residual module, a multi-scale dilated convolution module, a feature pyramid structure and the like. The residual module effectively prevents network degradation problems such as vanishing and exploding gradients; the multi-scale dilated convolution increases the association between local features and the whole feature map and strengthens the feature information; and the feature pyramid structure enhances the robustness of the detection algorithm to the scale of the target object, so that vehicle detection under complex conditions has higher precision. Because a deep learning algorithm typically generates a large model requiring a large amount of computation, it is difficult to deploy effectively at the edge end; to make the algorithm more suitable for engineering, a model channel pruning strategy is adopted: a scaling factor γ is introduced for each channel and multiplied with the channel's output, the network weights and scaling factors are trained jointly, and finally the channels with small scaling factors are removed directly and the pruned network is fine-tuned. With similar detection precision, the finally generated model detects faster and occupies less memory, achieving a better result.
In the invention, a basic feature extraction network of the detection algorithm is constructed, and a multi-scale cavity convolution module and a feature pyramid structure are used in the detection network, so that the performance of the detection algorithm is improved. And then, a pruning strategy based on a BN layer channel is adopted for the improved algorithm, so that the model volume is reduced and the detection speed is increased on the premise of ensuring the detection precision. The method has the advantages of small dependence on system environment, less memory occupation, high detection speed and high detection precision. The method can be conveniently arranged in an edge end system, and the model can be widely applied to various engineering applications requiring target detection technology through training data generation.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a rapid vehicle detection method based on deep learning. The method can be applied to engineering implementation, and a smaller weight model can be obtained by training with enough data sets and compressing the model. The model is easy to arrange in an edge end system, and the vehicle can be detected quickly and accurately.
Drawings
FIG. 1 is a schematic diagram of an overall structural framework of a target detection algorithm provided by the present invention;
the figure is a schematic diagram of the overall network structure of the detection algorithm and is divided into a basic network and a detection network.
Fig. 2 is a supplementary illustration of the structure of the module in the framework diagram of the detection algorithm of fig. 1.
FIG. 3 is an illustration of the structure of the multi-scale dilated convolution module (MDC module) of FIG. 1.
FIG. 4 is an illustration of the detection structures of Y1, Y2, and Y3 in FIG. 1.
Fig. 5 is a schematic diagram of a fusion process of the detection results of the three scales in fig. 4.
Fig. 6 is a schematic diagram of the detection result of the detection algorithm.
Detailed Description
The invention will be further described by way of examples, without in any way limiting the scope of the invention, with reference to the accompanying drawings.
The invention provides a rapid vehicle detection method based on deep learning, which improves the precision of the detection algorithm by constructing a basic network and using a multi-scale dilated convolution module and a feature pyramid structure in the detection network. The algorithm is trained on a data set with vehicles as target objects to obtain a weight model, and the trained weight model is then pruned, compressing the detection algorithm model and greatly increasing the detection speed of the algorithm.
As shown in Fig. 1, after the basic network, the extracted feature layer is input into the MDC module, where dilated convolutions at 3 scales further strengthen the feature points; the processing results of the three channels are then channel-concatenated with the input, and the resulting feature layer is fed into the subsequent detection network for classification and regression. Fig. 2 shows the modular construction of the algorithm framework. Fig. 3 illustrates the structure of the multi-scale dilated convolution module of the present invention. The specific implementation comprises the following steps:
1) Constructing the detection algorithm.
After the basic network extracts features, three parallel channels apply 3 × 3 dilated convolutions (dilation rates 1, 2 and 5) to the input feature layer, forming the multi-scale dilated convolution module. The output of the basic network passes through three DBL structures and is input into the multi-scale dilated convolution module; the processing result is then divided into three scales, and an FPN structure using channel fusion, upsampling and DBL structures performs channel integration and feature fusion on the output of the multi-scale dilated convolution module. Finally the extracted feature map is processed at the three scales, each scale's feature layer generating a feature map of depth 3 × (5 + n): 3 prediction boxes per grid cell, each with 4 position values (centre coordinates and width and height) and 1 confidence value, plus n class predictions. The predictions of the three scales are fused, and redundant overlapping boxes are removed to obtain the final detected position and class of the target object.
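YOLO-style detectors of the kind described here typically remove redundant overlapping candidate boxes with IoU-based non-maximum suppression; the following plain-Python/NumPy sketch shows the standard greedy procedure (our assumption of the usual technique, not code from the patent):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = list(np.argsort(scores)[::-1])  # highest-confidence box first
    keep = []
    while order:
        i = order.pop(0)
        keep.append(int(i))
        # drop every remaining box that overlaps the kept one too much
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU 0.81, above the 0.5 threshold, so it is suppressed, while the distant third box survives.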
2) A cross entropy loss function is used.
To address the problem that, when prediction boxes of different sizes have the same relative error, the larger box produces a larger error value, the mean square error of the predicted width and height relative to the real box width and height is taken and multiplied by a coefficient of 0.5, balancing the errors of prediction boxes of different sizes and the proportion of the width-height error in the total error; the remaining three parts compute the relative error directly with a cross entropy loss function. During training, the total loss value is the sum of the four loss values. The loss functions are as follows:
$$xy_{loss}=\sum_{i=0}^{s^2}\sum_{j=0}^{B}1_{ij}^{obj}\Big[-x_i\log\hat{x}_i-(1-x_i)\log(1-\hat{x}_i)-y_i\log\hat{y}_i-(1-y_i)\log(1-\hat{y}_i)\Big]$$

$$wh_{loss}=\lambda\sum_{i=0}^{s^2}\sum_{j=0}^{B}1_{ij}^{obj}\Big[(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\Big]$$
where $xy_{loss}$ is the position-coordinate error of the object and $wh_{loss}$ is the prediction-box width-height error. $s^2$ with s ∈ (26, 52, 104) means that the feature map at each of the 3 scales is divided into s × s grid cells, each responsible for predicting target objects falling into it; B takes the value 3, meaning each grid cell has 3 bounding boxes of different scales.
$1_{ij}^{obj}$ indicates that the object falls into the j-th bounding box of grid cell i. λ takes the value 0.5. (x, y, w, h) are the actual position coordinates and width and height of the detected target object in the annotation data, and $(\hat{x},\hat{y},\hat{w},\hat{h})$ are the position coordinates and width and height of the detected target object predicted in the forward computation.
3) The training algorithm obtains a weight model.
Pictures are extracted from road surveillance video, collected at different places, in different time periods and under different weather conditions, as the original pictures of the data set. Car windows in the pictures are labeled, the window standing in for the vehicle as the detected target object, to generate the data set. To increase the amount of data and the generalization ability of the weight model, the data set is augmented by increasing image contrast, increasing image saturation and image decolorization. Images of size 416 × 416 are taken as input. During training, the learning rate is gradually increased to 0.002 over the first 2000 steps, reduced to 0.0002 at step 40000 and to 0.0001 at step 45000, and training runs to step 60000 to generate the weight model (the learning-rate adjustment varies slightly across training runs on different data sets).
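The learning-rate schedule described above can be sketched as a step function — our illustration; the linear shape of the warm-up within the first 2000 steps is an assumption, since the text only says the rate is gradually increased:

```python
def learning_rate(step: int) -> float:
    """Schedule from the text: warm up to 0.002 over the first 2000 steps,
    drop to 0.0002 at step 40000 and to 0.0001 at step 45000;
    training stops at step 60000."""
    if step < 2000:
        return 0.002 * step / 2000   # assumed linear warm-up
    if step < 40000:
        return 0.002
    if step < 45000:
        return 0.0002
    return 0.0001

print(learning_rate(30000), learning_rate(50000))  # 0.002 0.0001
```

Such a warm-up-then-decay schedule is common in YOLO-style training: the low initial rate stabilizes the randomly initialized detection head, and the two late drops let the weights settle before the model is saved.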
4) Performing model pruning on the trained weight model.
A BN-layer pruning method is used: a scaling factor γ is introduced for each channel of the BN layer and multiplied with that channel's output. The network weights and scaling factors are trained jointly; finally the channels with small scaling factors are removed directly, and the pruned network is fine-tuned to obtain the pruned weight model. The objective function adopts the formula:
$$L=\sum_{(x,y)}l\big(f(x,W),y\big)+\lambda\sum_{\gamma\in\Gamma}g(\gamma)$$
where (x, y) represents the training data and labels, W denotes the trainable parameters of the network, the first term is the ordinary training loss of the CNN, g(γ) is the penalty term on the scaling factors, and λ is the balance factor between the two terms; here g(s) = |s|, i.e. L1 regularization, which is also widely used for sparsification.
5) Finally testing the compressed model.
The YOLOv3 algorithm and the present algorithm are trained on the same data set made in step 3) (with the car window as the detected target object) to obtain weight models, which are then tested and compared in a GeForce GTX 1080Ti graphics card environment; the results are shown in Table 1.
Table 1. Test comparison of the YOLOv3 algorithm model and the present algorithm model
[Table 1 is reproduced as an image in the original publication.]
As the results in Table 1 show, the detection model finally generated by the present method is superior to the YOLOv3 detection algorithm: its model size makes it easier to deploy in small edge-end systems, and it completes detection and identification of vehicles on the road with a higher detection speed and a better recall rate.
It is noted that the disclosed embodiments are intended to aid in further understanding of the invention, but those skilled in the art will appreciate that: various substitutions and modifications are possible without departing from the spirit and scope of the invention and appended claims. Therefore, the invention should not be limited to the embodiments disclosed, but the scope of the invention is defined by the appended claims.

Claims (4)

1. A rapid vehicle detection method based on deep learning is characterized by comprising the following steps:
1) construction of detection algorithms
Constructing a basic feature extraction network of the detection algorithm, wherein the basic module DBL in the network structure consists of a convolution layer (CONV), batch normalization (BN) and an activation function (Leaky ReLU), and the RES module follows a residual module structure, consisting of a pixel padding module (padding), a DBL module and a classical residual unit; after the basic network extracts features, three parallel channels process the input feature layer with classical dilated convolution operations, forming a multi-scale dilated convolution module; after passing through three DBL structures, the output of the basic network is input into the multi-scale dilated convolution module, the processing result is then divided into three scales, and an FPN structure using channel fusion, upsampling and DBL structures performs channel and feature integration and fusion on the output of the multi-scale dilated convolution module; finally the extracted feature map is processed at three scales, each scale's feature layer generating a feature map of depth 3 × (5 + n), where 3 is the number of prediction boxes per grid cell, 5 comprises the 4 position values of a prediction box (centre coordinates and width and height) plus 1 confidence value, and n is the number of predicted classes; the prediction results of the three scales are fused and classified, and redundant overlapping prediction boxes are removed to obtain the final detection result of the predicted position and class of the target object;
2) Using a cross-entropy loss function
The Leaky ReLU function is selected as the activation function to accelerate convergence and avoid vanishing gradients. The loss function comprises four parts: the target object position information (x, y), the prediction box width and height (w, h), the predicted category information (class), and the predicted target object confidence. To address the problem that prediction boxes of different sizes would otherwise contribute error in the same proportion, the mean square error of the prediction box width and height relative to the real box width and height is taken and multiplied by a coefficient 0.5, balancing the errors of prediction boxes of different sizes and the proportion of the width-height error in the total error; the remaining three parts directly compute the relative error with a cross-entropy loss function. During training, the total loss value is the sum of the four loss values. The loss functions are as follows:
$$xy_{loss}=\sum_{i=0}^{s^{2}}\sum_{j=0}^{B}1_{ij}^{obj}\left[-x\log\hat{x}-(1-x)\log(1-\hat{x})-y\log\hat{y}-(1-y)\log(1-\hat{y})\right]$$

$$wh_{loss}=\lambda\sum_{i=0}^{s^{2}}\sum_{j=0}^{B}1_{ij}^{obj}\left[(w-\hat{w})^{2}+(h-\hat{h})^{2}\right]$$
wherein xy_loss is the position coordinate error of the target object and wh_loss is the prediction box width-height error; in s², s takes the values 26, 52 and 104 respectively, meaning the feature maps at the 3 scales are divided into s × s grids (grid cells), each grid being responsible for predicting target objects that fall into it; B takes the value 3, meaning each grid has 3 bounding boxes of different scales,
1_ij^obj indicates that the target object falls into the jth bounding box of grid i, λ takes the value 0.5, (x, y, w, h) are the actual position coordinates and width and height of the detected target object in the annotation data, and (x̂, ŷ, ŵ, ĥ) are the position coordinates and width and height of the detected target object predicted in the forward computation;
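The four-part loss above, together with the Leaky ReLU activation, can be sketched numerically as follows (a minimal numpy sketch assuming sigmoid-activated predictions in [0, 1] for the cross-entropy terms; the function names, tensor layout and the alpha value are assumptions):

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    # keeps a small gradient for negative inputs, avoiding dead units
    return np.where(x > 0, x, alpha * x)

def bce(p, t, eps=1e-7):
    # elementwise binary cross-entropy between prediction p and target t
    p = np.clip(p, eps, 1.0 - eps)
    return -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

def detection_loss(pred, target, obj_mask, lam=0.5):
    # pred/target: (..., 5 + n) laid out as [x, y, w, h, conf, class scores...];
    # obj_mask is the 1_ij^obj indicator selecting boxes responsible for an object
    xy_loss = np.sum(obj_mask[..., None] * bce(pred[..., 0:2], target[..., 0:2]))
    # width/height use mean square error scaled by the coefficient lam = 0.5
    wh_loss = lam * np.sum(obj_mask[..., None] * (pred[..., 2:4] - target[..., 2:4]) ** 2)
    # confidence is supervised for every box, classes only where an object exists
    conf_loss = np.sum(bce(pred[..., 4], target[..., 4]))
    cls_loss = np.sum(obj_mask[..., None] * bce(pred[..., 5:], target[..., 5:]))
    return xy_loss + wh_loss + conf_loss + cls_loss
```

The total loss returned is the sum of the four parts, as the claim specifies.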
3) Training the algorithm to obtain a weight model
Collecting vehicle pictures from different places, in different time periods and under different weather conditions as the original pictures of the data set, and annotating the target objects in the pictures to generate the data set; training the constructed deep learning algorithm on the data set with a certain strategy to generate a weight model;
4) Performing model pruning on the trained weight model
Using a BN-layer pruning method: a scaling factor γ is introduced for each channel of the BN layer and multiplied by the output of that channel; the network weights and the scaling factors are trained jointly; finally, the channels with small scaling factors are removed directly, and the pruned network is fine-tuned to obtain the pruned weight model; the objective function adopts the formula:
$$L=\sum_{(x,y)}l\big(f(x,W),\,y\big)+\lambda\sum_{\gamma\in\Gamma}g(\gamma)$$
where (x, y) represents the training data and labels, W denotes the trainable parameters of the network, the first term is the training loss function of the CNN, g(γ) is a sparsity-inducing penalty term on the scaling factors, and λ is the balance factor between the two terms; here g(x) = |x| is taken, i.e., L1 regularization, which is also widely used for sparsification;
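The channel-selection part of this pruning scheme can be sketched as follows (an illustrative numpy sketch: the L1 penalty that is added to the training loss, and a mask keeping only channels whose |γ| exceeds a global percentile threshold; the `lam` and `prune_ratio` values are assumptions):

```python
import numpy as np

def l1_penalty(gammas, lam=1e-4):
    # g(gamma) = |gamma|: the L1 term added to the CNN training loss,
    # pushing BN scaling factors toward zero during joint training
    return lam * np.sum(np.abs(gammas))

def prune_mask(gammas, prune_ratio=0.5):
    # keep only channels whose scaling-factor magnitude exceeds a global
    # percentile threshold; masked-out channels are removed, then the
    # remaining network is fine-tuned
    thresh = np.percentile(np.abs(gammas), prune_ratio * 100.0)
    return np.abs(gammas) > thresh
```

After training with the penalty, channels whose γ has been driven near zero contribute little to the output and can be removed with minor accuracy loss.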
5) Testing the final compressed model
Inputting the test pictures in the data set into a network, and marking the vehicles in the test images after running the test;
Through the above steps, the rapid vehicle detection method based on deep learning is realized.
2. The rapid vehicle detection method based on deep learning as claimed in claim 1, wherein in step 1) the input feature layer is processed in three parallel channels by three dilated convolution operations with dilation rates [1, 2, 5] and a convolution kernel size of 3 × 3.
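A 3 × 3 kernel with dilation rate d covers an effective receptive field of k + (k − 1)(d − 1) input pixels, so rates [1, 2, 5] cover 3, 5 and 11 pixels respectively. Below is a naive single-channel sketch of a 'same'-padded dilated convolution (illustrative only; the function names are assumptions, and a real implementation would use a deep learning framework):

```python
import numpy as np

def effective_kernel(k, d):
    # a k x k kernel with dilation d spans k + (k - 1)*(d - 1) input pixels
    return k + (k - 1) * (d - 1)

def dilated_conv2d_same(x, w, d):
    # naive 'same'-padded single-channel 2D dilated convolution;
    # pad = d*(k-1)//2 keeps the output the same size as the input
    k = w.shape[0]
    pad = d * (k - 1) // 2
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # sample the input with stride d inside the kernel window
            patch = xp[i:i + d * (k - 1) + 1:d, j:j + d * (k - 1) + 1:d]
            out[i, j] = np.sum(patch * w)
    return out
```

Running the same feature layer through rates 1, 2 and 5 in parallel yields three outputs of identical spatial size but increasingly large receptive fields, which is what allows the module to be fused channel-wise.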
3. The rapid vehicle detection method based on deep learning as claimed in claim 1, wherein in step 3) pictures are extracted from road surveillance video, and pictures from different places and time periods under different weather conditions are collected as the original pictures of the data set; the windows in the pictures are annotated, with the window replacing the vehicle as the detection target, to generate the data set; to increase the amount of training data, the data set is expanded by increasing image contrast, increasing image saturation and decolorizing images, improving the generalization ability of the weight model; images of size 416 × 416 are used as input; the learning rate is gradually increased to 0.002 in the first 2000 training steps, reduced to 0.0002 at step 40000 and to 0.0001 at step 45000, and the weight model is generated at step 60000.
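The training schedule in this claim (warm-up to 0.002 over the first 2000 steps, then step decays at 40000 and 45000 steps) can be sketched as follows; the function name and the linear shape of the warm-up are assumptions, since the claim says only "gradually increased":

```python
def learning_rate(step, base=0.002, warmup=2000):
    # linear warm-up to the base rate over the first `warmup` steps,
    # then step decay at 40000 and 45000 steps, matching the claim
    if step < warmup:
        return base * step / warmup
    if step < 40000:
        return base
    if step < 45000:
        return 0.0002
    return 0.0001
```

Training runs to step 60000, at which point the weight model is saved.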
4. The method of any one of claims 1 to 3, applied to, but not limited to, automatic detection and identification of vehicles by a road monitoring system and uploading of the results.
CN202010452151.6A 2020-05-26 2020-05-26 Rapid vehicle detection method based on deep learning Pending CN111461083A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010452151.6A CN111461083A (en) 2020-05-26 2020-05-26 Rapid vehicle detection method based on deep learning

Publications (1)

Publication Number Publication Date
CN111461083A true CN111461083A (en) 2020-07-28

Family

ID=71685417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010452151.6A Pending CN111461083A (en) 2020-05-26 2020-05-26 Rapid vehicle detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN111461083A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898539A (en) * 2020-07-30 2020-11-06 国汽(北京)智能网联汽车研究院有限公司 Multi-target detection method, device, system, equipment and readable storage medium
CN112101175A (en) * 2020-09-09 2020-12-18 沈阳帝信人工智能产业研究院有限公司 Expressway vehicle detection and multi-attribute feature extraction method based on local images
CN112308019A (en) * 2020-11-19 2021-02-02 中国人民解放军国防科技大学 SAR ship target detection method based on network pruning and knowledge distillation
CN112507861A (en) * 2020-12-04 2021-03-16 江苏科技大学 Pedestrian detection method based on multilayer convolution feature fusion
CN112560933A (en) * 2020-12-10 2021-03-26 中邮信息科技(北京)有限公司 Model training method and device, electronic equipment and medium
CN112580665A (en) * 2020-12-18 2021-03-30 深圳赛安特技术服务有限公司 Vehicle money identification method and device, electronic equipment and storage medium
CN112668663A (en) * 2021-01-05 2021-04-16 南京航空航天大学 Aerial photography car detection method based on YOLOv4
CN112801027A (en) * 2021-02-09 2021-05-14 北京工业大学 Vehicle target detection method based on event camera
CN113033284A (en) * 2020-12-22 2021-06-25 迪比(重庆)智能科技研究院有限公司 Vehicle real-time overload detection method based on convolutional neural network
CN113111889A (en) * 2021-03-10 2021-07-13 国网浙江省电力有限公司宁波供电公司 Target detection network processing method for edge computing terminal
CN113554084A (en) * 2021-07-16 2021-10-26 华侨大学 Vehicle re-identification model compression method and system based on pruning and light-weight convolution
CN113657174A (en) * 2021-07-21 2021-11-16 北京中科慧眼科技有限公司 Vehicle pseudo-3D information detection method and device and automatic driving system
CN114120246A (en) * 2021-10-12 2022-03-01 吉林大学 Front vehicle detection algorithm based on complex environment
CN114201289A (en) * 2021-10-27 2022-03-18 山东师范大学 Target detection method and system based on edge computing node and cloud server
CN114332688A (en) * 2021-12-14 2022-04-12 浙江省交通投资集团有限公司智慧交通研究分公司 Vehicle detection method under highway monitoring video scene
CN114359880A (en) * 2022-03-18 2022-04-15 北京理工大学前沿技术研究院 Riding experience enhancement method and device based on intelligent learning model and cloud
CN115082695A (en) * 2022-05-31 2022-09-20 中国科学院沈阳自动化研究所 Transformer substation insulator string modeling and detecting method based on improved Yolov5
CN116630904A (en) * 2023-04-28 2023-08-22 淮阴工学院 Small target vehicle detection method integrating non-adjacent jump connection and multi-scale residual error structure
CN118397403A (en) * 2024-07-01 2024-07-26 合肥市正茂科技有限公司 Training method, device, equipment and medium for low-illumination vehicle image detection model

Citations (2)

Publication number Priority date Publication date Assignee Title
CN109829400A (en) * 2019-01-18 2019-05-31 青岛大学 A kind of fast vehicle detection method
CN110796168A (en) * 2019-09-26 2020-02-14 江苏大学 Improved YOLOv 3-based vehicle detection method

Non-Patent Citations (3)

Title
JOSEPH REDMON et al.: "YOLOv3: An Incremental Improvement", arXiv preprint *
LIANG-CHIEH CHEN et al.: "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs", arXiv preprint *
WANG Liangliang, WANG Guodong, et al.: "A Fast Vehicle Detection Algorithm Based on Car-Window Features", Journal of Qingdao University *


Similar Documents

Publication Publication Date Title
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN111126202B (en) Optical remote sensing image target detection method based on void feature pyramid network
CN110188705B (en) Remote traffic sign detection and identification method suitable for vehicle-mounted system
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN108509978B (en) Multi-class target detection method and model based on CNN (CNN) multi-level feature fusion
CN111008562B (en) Human-vehicle target detection method with feature map depth fusion
CN113392960B (en) Target detection network and method based on mixed hole convolution pyramid
CN114202672A (en) Small target detection method based on attention mechanism
CN113780211A (en) Lightweight aircraft detection method based on improved yolk 4-tiny
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN112733693B (en) Multi-scale residual error road extraction method for global perception high-resolution remote sensing image
CN114187311A (en) Image semantic segmentation method, device, equipment and storage medium
CN113850324B (en) Multispectral target detection method based on Yolov4
CN113436210B (en) Road image segmentation method fusing context progressive sampling
CN113361528B (en) Multi-scale target detection method and system
CN111353544A (en) Improved Mixed Pooling-Yolov 3-based target detection method
CN111723660A (en) Detection method for long ground target detection network
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN116468740A (en) Image semantic segmentation model and segmentation method
CN116342894A (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN110909656B (en) Pedestrian detection method and system integrating radar and camera
CN116597326A (en) Unmanned aerial vehicle aerial photography small target detection method based on improved YOLOv7 algorithm
CN117853955A (en) Unmanned aerial vehicle small target detection method based on improved YOLOv5
CN117197687A (en) Unmanned aerial vehicle aerial photography-oriented detection method for dense small targets
CN113537013A (en) Multi-scale self-attention feature fusion pedestrian detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200728