CN109829400A

CN109829400A - A fast vehicle detection method

Info

Publication number: CN109829400A
Application number: CN201910047520.0A
Authority: CN
Inventors: 王国栋; 王亮亮; 潘振宽; 徐洁; 王岩杰; 李宁孝; 胡诗语
Original assignee: Qingdao University
Current assignee: Qingdao University
Priority date: 2019-01-18
Filing date: 2019-01-18
Publication date: 2019-05-31
Anticipated expiration: 2039-01-18
Also published as: CN109829400B

Abstract

The invention belongs to the field of video detection technology in deep learning, and more particularly relates to a fast vehicle detection method based on vehicle window features. It proposes detecting the vehicle window instead of the vehicle body. Combining the residual module of the ResNet network with the multi-scale feature extraction method of the SSD algorithm, and drawing on the network structure of YOLOv3, it constructs a fully convolutional detection method with only 24 convolutional layers. Under heavy traffic flow, in batch testing, the average detection precision is close to 100%, the average recall rate reaches 90%, and the detection speed reaches 22 ms per frame. The method achieves real-time detection of vehicles in high-definition road surveillance video, effectively improves vehicle recall under heavy traffic flow, and has important application value.

Description

A fast vehicle detection method

Technical field:

The invention belongs to the field of video detection technology in deep learning, and in particular relates to a fast vehicle detection method based on vehicle window features, especially a method that can quickly detect the state of moving vehicles.

Background art:

At present, deep learning algorithms based on convolutional neural networks (Convolutional Neural Networks, CNN) are developing rapidly and, thanks to their powerful feature extraction capability, have achieved very high accuracy in object detection and recognition. However, current deep-learning-based detection algorithms share a major drawback: speed. Algorithms such as Faster R-CNN, which include a region proposal step (selecting regions of interest in advance and then detecting further), achieve very high detection accuracy. For example, Chinese patent CN2018106089322 discloses an arm-raising detection algorithm for large construction vehicles, which collects pictures, detects with the Faster R-CNN algorithm, and calculates the ratio of the vehicle body area to the entire vehicle area to reach better detection accuracy. However, the detection speed of Faster R-CNN cannot meet the requirement of real-time multi-object detection in high-definition video. YOLO (You Only Look Once), known in the deep learning field for its speed, treats object detection as a regression problem and, based on a single end-to-end network, goes from the original image input to the output of object positions and classes without an explicit region proposal step. This greatly improves detection speed: YOLOv1 reaches 45 FPS on smaller pictures and its simplified version reaches 155 FPS, which meets the speed needed for real-time detection in high-definition video, but at the cost of considerable accuracy. With YOLOv3, detection accuracy improved greatly while speed decreased. For low-resolution video with few objects it may still meet the real-time requirement, but for high-definition surveillance video with many target objects the speed of YOLOv3 can hardly meet it.

In the prior art, a detection system needs to be embedded in road monitoring equipment to generate road conditions in real time or to count traffic flow. In this setting there are many vehicles within the monitored field of view, often 20 objects in a single video frame, and the resolution of a single high-definition surveillance frame is large, so the speed of YOLOv3 cannot meet the real-time requirement. For example, Chinese patent CN2016105281180 discloses a fast vehicle queue length detection algorithm based on local feature analysis, which narrows the video detection region from the whole image to local image features: to detect the vehicle queue length, it selects only three columns of pixel values containing the lane, forms a one-dimensional feature array by weighted reconstruction, and analyzes it. It runs quickly with high accuracy, taking 10 ms to process a single frame. A further problem is that when there are many vehicles, front and rear vehicles occlude each other. The overall feature points of a vehicle in the field of view are then incomplete, and the feature points of two vehicles merge. When processed with non-maximum suppression, several vehicles are often framed together and identified as one vehicle, which greatly reduces the accuracy of vehicle detection and recognition. This is also a difficulty in vehicle detection.

Summary of the invention:

The object of the invention is to overcome the shortcomings of the prior art and to design a fast vehicle detection method that uses vehicle window features as the detected object. Combining the residual module of ResNet (residual connection network) with the multi-scale feature extraction technique of the SSD (Single Shot Detector) algorithm, and drawing on the network structure of YOLOv3, it constructs a fully convolutional detection method with only 24 convolutional layers. This better solves two problems: mutually occluding vehicles on the road are difficult to detect accurately in moving vehicle detection, and existing deep-learning-based detection algorithms cannot detect vehicles in surveillance video in real time.

To achieve the above object, the technical process of the fast vehicle detection method provided by the invention comprises the following steps:

(1) Acquire data: capture a large number of pictures of vehicles on the road under different conditions from real-time traffic surveillance video, with a picture size of 1920px × 1080px, and use them as data;

(2) Annotate data: annotate the pictures acquired in step (1) based on vehicle window features. Using the labelimg annotation tool, frame the front window and the rear window of each vehicle in the picture as separate objects with rectangular boxes and attach the class label of the vehicle. The pixel coordinates of each drawn rectangle are saved in XML format in an .xml file, producing .xml files that contain the front and rear window position information and the class information of the vehicles in the picture, for use in training and testing;

(3) Construct the base network: under the annotation scheme of step (2), use the residual connections of the ResNet network to extract features from the input picture stage by stage according to feature-layer scale, forming in sequence four parts with feature-layer scales of 208px × 208px, 104px × 104px, 52px × 52px and 26px × 26px. Fuse the information of the different feature layers at the same scale to construct a small 13-layer fully convolutional base network;

(4) Construct the detection method: drawing on the multi-scale feature extraction method of SSD, apply multi-layer feature extraction on top of the base network constructed in step (3). Extract feature-layer information from the residual connection layers (Residual) at scales 104px × 104px, 52px × 52px and 26px × 26px in the base network and fuse it into the detection network, merging it with the upsampled features of the corresponding feature layers in the detection network. Then use 1 × 1 convolution to generate a tensor corresponding to the number of object classes to be detected, namely:

a (26 × 26 + 52 × 52 + 104 × 104) × (3 × (4 + 1 + C))-dimensional tensor. Screen the predictions with the sigmoid function and a threshold of 0.5, and select the class with the highest confidence as the predicted class. Apply NMS (non-maximum suppression) to the multiple prediction boxes to obtain the final prediction boxes. The class and position information of the predicted target vehicles is then obtained, which constitutes the detection method and achieves fast detection of moving vehicles on the road;

(5) Train: take 80% of the large number of pictures annotated in step (2) as training data and the remaining 20% as test pictures. Train on the training data with the detection method of step (4), and stop training when the loss value falls below a set value, obtaining a weight model with the suffix ".weights";

(6) Application test: using the weight model generated in step (5), the detection method of step (4) predicts on the test pictures, and the test accuracy and test speed are checked. When both meet the requirements, the constructed detection method is satisfactory. The constructed detection method and the trained weight model are then used to detect pictures or road traffic video containing target vehicles, identifying vehicles in real time and completing the detection of moving vehicles.
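The size of the output tensor named in step (4) can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the YOLOv3-style reading of the expression (3 prior boxes per grid cell, 4 box coordinates, 1 objectness score, C class scores) and takes C = 2 for the "bus" and "car" classes. The function and constant names are hypothetical, and Python is used for brevity.

```python
# Sketch of the output-size arithmetic in step (4).
# Assumption: 3 prior boxes per grid cell, 4 box coordinates, 1 objectness
# score and C class scores (YOLOv3 convention); C = 2 for "bus" and "car".

GRID_SIZES = (26, 52, 104)   # detection scales named in step (4)
BOXES_PER_CELL = 3
NUM_CLASSES = 2              # C


def output_shape(grid_sizes=GRID_SIZES, boxes=BOXES_PER_CELL, classes=NUM_CLASSES):
    """Return (total grid cells, values predicted per grid cell)."""
    cells = sum(g * g for g in grid_sizes)
    per_cell = boxes * (4 + 1 + classes)
    return cells, per_cell


cells, per_cell = output_shape()
total_boxes = cells * BOXES_PER_CELL
print(cells, per_cell, total_boxes)
```

Under these assumptions the three scales contribute 14196 grid cells with 21 values each, that is, 42588 candidate boxes per picture before thresholding and NMS.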

In the data acquisition of step (1), so that the detection method suits a variety of road scenes, pictures are captured from traffic surveillance video under different conditions, meaning different road sections, different road conditions, different weather or different times.

In the data annotation of step (2), the class labels of vehicles are divided into two classes, "bus" and "car": the "bus" class covers city buses and coaches, and the "car" class covers small cars and SUVs.
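As an illustration of how the annotation files from step (2) might be consumed, the sketch below parses one annotation and returns the window boxes with their class labels. It assumes the Pascal VOC XML layout that labelImg writes by default; the sample file name, image contents and coordinates are invented for the example, and `parse_windows` is a hypothetical helper, not part of the patent.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout that labelImg writes by
# default; the file name and coordinates are made up for illustration.
SAMPLE_XML = """
<annotation>
  <filename>frame_000001.jpg</filename>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>812</xmin><ymin>430</ymin><xmax>905</xmax><ymax>478</ymax></bndbox>
  </object>
  <object>
    <name>bus</name>
    <bndbox><xmin>1200</xmin><ymin>300</ymin><xmax>1390</xmax><ymax>395</ymax></bndbox>
  </object>
</annotation>
"""

ALLOWED_CLASSES = {"bus", "car"}  # the two class labels named in step (2)


def parse_windows(xml_text):
    """Return a list of (class_name, xmin, ymin, xmax, ymax) window boxes."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name not in ALLOWED_CLASSES:
            raise ValueError(f"unexpected class label: {name!r}")
        bb = obj.find("bndbox")
        coords = tuple(int(bb.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, *coords))
    return boxes


windows = parse_windows(SAMPLE_XML)
print(windows)
```

Rejecting unknown labels early is a design choice made here so that annotation typos surface before training rather than silently creating a third class.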

In the construction of the base network in step (3), the part at feature-layer scale 208px × 208px is formed as follows. First, the picture is resized to 416px × 416px and fed into the network as data, and 16 convolutions of size 3 × 3 with stride 1 deepen the feature layer and extract features. Next, 32 convolutions of size 3 × 3 with stride 2 further deepen the feature layer, reduce the feature map scale to 208px × 208px and extract features, giving the first layer at the 208px × 208px scale. A 1 × 1 convolution with stride 1 then fuses the extracted features, giving the second layer at the 208px × 208px scale. A 3 × 3 convolution with stride 1 increases the feature depth to 32 feature maps of size 208px × 208px, giving the third layer at the 208px × 208px scale. Finally, following the residual connection of the ResNet network, the features of the first layer and the third layer at the 208px × 208px scale are joined by a residual connection, forming the residual connection layer at the 208px × 208px scale.

In the construction of the base network in step (3), the part at feature-layer scale 104px × 104px is formed as follows. The feature map of the residual connection layer at the 208px × 208px scale is convolved with 64 convolutions of size 3 × 3 with stride 2, which reduces the feature map scale to 104px × 104px and extracts features, giving the first layer at the 104px × 104px scale. A 1 × 1 convolution with stride 1 then fuses the extracted features, giving the second layer at the 104px × 104px scale. A 3 × 3 convolution with stride 1 deepens the feature layer again to 64 feature maps of size 104px × 104px, giving the third layer at the 104px × 104px scale. The features of the first layer and the third layer at the 104px × 104px scale are joined by a residual connection, forming the residual connection layer at the 104px × 104px scale.

In the construction of the base network in step (3), the part at feature-layer scale 52px × 52px is formed as follows. The feature map of the residual connection layer at the 104px × 104px scale is convolved with 128 convolutions of size 3 × 3 with stride 2, which reduces the feature map scale to 52px × 52px and extracts features, giving the first layer at the 52px × 52px scale. A 1 × 1 convolution with stride 1 then fuses the extracted features, giving the second layer at the 52px × 52px scale. A 3 × 3 convolution with stride 1 deepens the feature layer again to 128 feature maps of size 52px × 52px, giving the third layer at the 52px × 52px scale. The features of the first layer and the third layer at the 52px × 52px scale are joined by a residual connection, forming the residual connection layer at the 52px × 52px scale.

In the construction of the base network in step (3), the part at feature-layer scale 26px × 26px is formed as follows. The feature map of the residual connection layer at the 52px × 52px scale is convolved with 256 convolutions of size 3 × 3 with stride 2, which reduces the feature map scale to 26px × 26px and extracts features, giving the first layer at the 26px × 26px scale. A 1 × 1 convolution with stride 1 then fuses the features, giving the second layer at the 26px × 26px scale. A 3 × 3 convolution with stride 1 deepens the feature layer again to 256 feature maps of size 26px × 26px, giving the third layer at the 26px × 26px scale. The features of the first layer and the third layer at the 26px × 26px scale are joined by a residual connection, forming the residual connection layer at the 26px × 26px scale.

In step (3), the specific structure of the constructed base network is as follows:

In the invention, under the annotation scheme based on vehicle window features, the window is comparatively simple and its features are distinct, so there is no need to construct many feature extraction layers; only a small number of edge features and region features are required. On this basis a lightweight base network for window detection is designed. It forms in sequence four parts with feature-layer scales of 208px × 208px, 104px × 104px, 52px × 52px and 26px × 26px. The feature maps deepen layer by layer, features are extracted layer by layer, and the receptive field of each feature point grows layer by layer. In each part, the first layer is directly connected to the third layer of that part, fusing the information of the different feature layers at the same scale. This makes it easier to reach an optimal result during training and effectively prevents the vanishing gradient problem. A small 13-layer fully convolutional base network is thus constructed. The residual connection layer at the 104px × 104px scale retains more fine-grained features, which helps detect local feature points; the features of the residual connection layer at the 26px × 26px scale are highly abstract and have a large receptive field, which is convenient for detecting overall features.
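The layer count and scales of the base network can be tallied from the description above. The sketch below lists one stem convolution plus three convolutions for each of the four parts and reports the scale at each residual connection. Two details are assumptions, not statements from the text: the channel count of each 1 × 1 convolution (taken here as half the part width, a YOLOv3-style bottleneck) and padding that makes a stride-2 convolution exactly halve the scale. `build_base_network` is a hypothetical name.

```python
# Sketch of the 13-layer base network as a list of convolution specs.
# Assumptions: 1x1 convolutions use half the part width, and padding is chosen
# so that a stride-2 3x3 convolution exactly halves the feature map scale.

def build_base_network(input_size=416):
    """Return (layers, residual_sizes); a layer is (filters, kernel, stride, out_size)."""
    size = input_size
    layers = [(16, 3, 1, size)]                  # stem: 16 filters, 3x3, stride 1
    residual_sizes = []
    for width in (32, 64, 128, 256):             # the four parts
        size //= 2                               # stride 2 halves the scale
        layers.append((width, 3, 2, size))       # first layer of the part
        layers.append((width // 2, 1, 1, size))  # second layer (assumed width)
        layers.append((width, 3, 1, size))       # third layer of the part
        residual_sizes.append(size)              # first and third layers joined
    return layers, residual_sizes


layers, residual_sizes = build_base_network()
print(len(layers), residual_sizes)
```

Counting this way gives 13 convolutional layers and residual connections at 208, 104, 52 and 26 pixels, which agrees with the figures stated in the text.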

In the invention, the features of the residual connection layers at scales 104px × 104px, 52px × 52px and 26px × 26px are fused into the detection network and merged with the upsampled features of the corresponding feature layers in the detection network. This strengthens the feature point information of target vehicles and improves detection accuracy.
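The fusion of an upsampled coarse feature layer with a finer residual connection layer can be sketched in plain Python. The text does not state the upsampling mode or whether the fusion is a concatenation or an addition; the example assumes nearest-neighbour 2× upsampling followed by channel concatenation, as in YOLOv3. Feature maps are represented as nested lists, and `upsample2x` and `fuse` are hypothetical helpers.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of one 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat the row vertically
    return out


def fuse(coarse_channels, fine_channels):
    """Upsample every coarse channel, then concatenate along the channel axis."""
    upsampled = [upsample2x(ch) for ch in coarse_channels]
    target = len(fine_channels[0])
    for ch in upsampled:
        if len(ch) != target or len(ch[0]) != target:
            raise ValueError("scales must match after upsampling")
    return upsampled + fine_channels


# Toy example: one 2x2 coarse channel fused with one 4x4 fine channel.
coarse = [[[1, 2], [3, 4]]]
fine = [[[0] * 4 for _ in range(4)]]
fused = fuse(coarse, fine)
print(len(fused), fused[0])
```

The same pattern would apply between the 26px and 52px scales and between the 52px and 104px scales, with real tensors in place of nested lists.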

The detection method provided by the invention is written in C++ and has few environment dependencies. It can be conveniently embedded in the system of road video monitoring equipment, or installed on a client as a plug-in, to achieve real-time detection and counting of vehicles in road traffic video.

The technical solution of the invention is realized as follows. Road video monitoring equipment is used to monitor vehicle travel state and driver state; its cameras can capture the front windshield of every vehicle, and the rear windshield can be seen as well. The front and rear windows of a vehicle replace the entire vehicle as the detected object. Because the features of different windows differ little, the features of the detected object are more stable, which benefits detection and recognition. A window has far fewer feature points than an entire vehicle, so a small detection method can be constructed that raises detection speed while maintaining accuracy, allowing it to meet the requirement of real-time detection in high-definition surveillance video. Combining the residual module of the residual connection network (ResNet) with the multi-scale feature extraction method of the SSD algorithm, and drawing on the network structure of YOLOv3, a fully convolutional detection method with 24 convolutional layers is built, achieving fast real-time detection of moving vehicles.

Compared with the prior art, the invention has the following advantages and notable progress. It achieves good results in detecting objects of various sizes with various degrees of occlusion in high-definition road surveillance video. In tests under heavy traffic flow, with 25 or more vehicles per picture, the recall rate reaches up to 95%, the detection accuracy is close to 100%, and the detection speed reaches 22 ms/frame (45 FPS). In batch testing, with detection accuracy kept close to 100%, the average recall rate is close to 90%. The method achieves real-time detection and recognition of vehicles in high-definition road surveillance video, effectively improves vehicle recall under heavy traffic flow, and has important application value.

Brief description of the drawings:

Fig. 1 is a schematic diagram of the overall construction principle of the detection method of the invention.

Fig. 2 is a video screenshot captured by monitoring equipment at a high position over a two-way road, where Fig. 2(a) is the original image, Fig. 2(b) is the recognition result of the Faster R-CNN detection algorithm, and Fig. 2(c) is the recognition result of the detection method of the invention.

Fig. 3 is a surveillance screenshot at closer range, where Fig. 3(a) is the original image, Fig. 3(b) is the recognition result of the Faster R-CNN detection algorithm, and Fig. 3(c) is the recognition result of the detection method of the invention.

Fig. 4 is a conventional surveillance video screenshot of a crossroad, where Fig. 4(a) is the original image, Fig. 4(b) is the recognition result of the Faster R-CNN detection algorithm, and Fig. 4(c) is the recognition result of the detection method of the invention.

Specific embodiments:

The specific embodiments of the invention are further explained below with reference to the accompanying drawings.

Embodiment 1:

This embodiment relates to a fast vehicle detection method and its application test, and comprises the following steps:

(1) Acquire data: complex traffic, weather conditions and different time periods affect the number and distribution of target vehicles, the brightness and the color saturation in surveillance video in different ways. So that the detection method suits a variety of road scenes, training data is collected by capturing a large number of pictures from traffic surveillance video under different road sections, road conditions, weather and times. Traffic flow in the pictures is heavy, with 25 or more vehicles per picture, and the picture size is 1920px × 1080px. These pictures are used as data;

(2) Annotate data: annotate the pictures acquired in step (1). The window-based detection method identifies the front window and the rear window of a vehicle as separate objects. The labelimg annotation tool is used to annotate the front and rear windows of the vehicles in each picture separately, producing .xml files in XML format that contain the front and rear window position information and the vehicle class information;

For road traffic surveillance video, the class information of vehicles is labeled as two classes, "bus" and "car": the "bus" class covers city buses and coaches, and the "car" class covers small cars and SUVs;

(3) Construct the base network: under the annotation scheme of step (2), the feature points of the detected target are greatly reduced, so fewer convolution kernels are needed in each convolutional layer, which helps simplify the network structure. The base network draws on the residual module (residual blocks) of the ResNet network. First, the picture is resized to 416px × 416px and fed into the network as data, and 16 convolutions of size 3 × 3 with stride 1 deepen the feature layer and extract features. Next, 32 convolutions of size 3 × 3 with stride 2 further deepen the feature layer, reduce the feature map scale to 208px × 208px and extract features, giving the first layer at the 208px × 208px scale. A 1 × 1 convolution then fuses the extracted features, giving the second layer at the 208px × 208px scale. A 3 × 3 convolution increases the feature depth to 32 feature maps of size 208px × 208px, giving the third layer at the 208px × 208px scale. Finally, following the residual connection of the ResNet network, the features of the first layer and the third layer at the 208px × 208px scale are joined by a residual connection, forming the residual connection layer at the 208px × 208px scale;

Then 64 convolutions of size 3 × 3 are applied to the feature map of the residual connection layer at the 208px × 208px scale, reducing the feature map scale to 104px × 104px and extracting features, giving the first layer at the 104px × 104px scale. A 1 × 1 convolution fuses the extracted features, giving the second layer at the 104px × 104px scale. A 3 × 3 convolution deepens the feature layer again to 64 feature maps of size 104px × 104px, giving the third layer at the 104px × 104px scale. The features of the first layer and the third layer at the 104px × 104px scale are joined by a residual connection, forming the residual connection layer at the 104px × 104px scale;

Then 128 convolutions of size 3 × 3 are applied to the feature map of the residual connection layer at the 104px × 104px scale, reducing the feature map scale to 52px × 52px and extracting features, giving the first layer at the 52px × 52px scale. A 1 × 1 convolution fuses the extracted features, giving the second layer at the 52px × 52px scale. A 3 × 3 convolution deepens the feature layer again to 128 feature maps of size 52px × 52px, giving the third layer at the 52px × 52px scale. The features of the first layer and the third layer at the 52px × 52px scale are joined by a residual connection, forming the residual connection layer at the 52px × 52px scale;

Then 256 convolutions of size 3 × 3 are applied to the feature map of the residual connection layer at the 52px × 52px scale, reducing the feature map scale to 26px × 26px and extracting features, giving the first layer at the 26px × 26px scale. A 1 × 1 convolution fuses the features, giving the second layer at the 26px × 26px scale. A 3 × 3 convolution deepens the feature layer again to 256 feature maps of size 26px × 26px, giving the third layer at the 26px × 26px scale. The features of the first layer and the third layer at the 26px × 26px scale are joined by a residual connection, forming the residual connection layer at the 26px × 26px scale;

This forms four parts with feature map scales of 208px × 208px, 104px × 104px, 52px × 52px and 26px × 26px. The feature maps deepen layer by layer, features are extracted layer by layer, and the receptive field of each feature point grows layer by layer. In each part, the last layer is directly connected to the first layer of that part, fusing the information of the different feature layers at the same scale. This makes it easier to reach an optimal result during training and effectively prevents the vanishing gradient problem. A small 13-layer fully convolutional base network is thus constructed, with the following specific network structure:

(4) Construct the detection method: drawing on the multi-scale feature extraction method of SSD, apply multi-layer feature extraction on top of the base network constructed in step (3). Extract feature map information from the residual connection layers (Residual) at scales 104px × 104px, 52px × 52px and 26px × 26px in the base network, obtaining feature maps at three different scales. The residual connection layer at the 26px × 26px scale has a large receptive field, which is convenient for detecting larger target vehicles; the residual connection layer at the 104px × 104px scale retains more fine-grained features, which helps detect small target vehicles. The features at these three scales are fused into the detection network and merged with the upsampled features of the corresponding feature layers in the detection network. Then 1 × 1 convolution generates a tensor corresponding to the number of object classes to be detected, namely:

a (26 × 26 + 52 × 52 + 104 × 104) × (3 × (4 + 1 + C))-dimensional tensor. Screen the predictions with the sigmoid function and a threshold of 0.5, and select the class with the highest confidence as the predicted class. Apply NMS (non-maximum suppression) to the multiple prediction boxes to obtain the final prediction boxes. The class and position information of the predicted target vehicles is then obtained, which constitutes the detection method and achieves fast detection of moving vehicles on the road, as shown in Fig. 1;

(5) Train: take 80% of the large number of pictures annotated in step (2) as training data and the remaining 20% as test pictures. Train on the data set with the detection method of step (4), and stop training when the loss value falls below a set value, obtaining a weight model with the suffix ".weights";
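The post-processing described in step (4), a 0.5 confidence threshold followed by non-maximum suppression, can be sketched as a greedy procedure. This is a minimal illustration, not the patent's C++ implementation: the IoU threshold of 0.45 is an assumed value that the text does not give, scores are taken to be post-sigmoid confidences, and the sample boxes are invented.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, score_threshold=0.5, iou_threshold=0.45):
    """Greedy NMS over (box, score) pairs; returns kept pairs, best score first."""
    candidates = sorted((d for d in detections if d[1] >= score_threshold),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap any already kept box too much.
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


# Invented window detections: two overlapping boxes, one separate, one weak.
dets = [((100, 100, 200, 160), 0.92),
        ((105, 102, 205, 162), 0.80),
        ((400, 120, 480, 170), 0.75),
        ((600, 300, 650, 330), 0.30)]
kept = nms(dets)
print(kept)
```

In this toy input the second box overlaps the first heavily and is suppressed, and the fourth falls below the 0.5 threshold, so two detections remain. Because windows of adjacent vehicles overlap far less than whole vehicle bodies, suppression of this kind is less likely to merge two vehicles into one, which is the motivation given in the background section.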

Embodiment 2:

This embodiment takes a video screenshot captured by video monitoring equipment at a high position over a two-way road and compares the detection method of the invention with the Faster R-CNN detection algorithm; see Fig. 2. In Fig. 2(a), three vehicles parked at the roadside on the left of the road are occluded by branches, and there are many vehicles on the road that occlude each other. Fig. 2(b) shows that the recognition result of the Faster R-CNN detection algorithm is unsatisfactory: the recall rate is 55% and the detection accuracy is 100%. Some nearer vehicles are not detected, and in some cases two vehicles are identified as one because they occlude each other, which strongly affects traffic flow counting. Its detection speed is also slower than that of the present detection method. Fig. 2(c) shows that the detection method of the invention performs better: the mutually occluding vehicles at close range are all detected, the recall rate reaches 95%, and the detection accuracy is 100%.

Embodiment 3:

This embodiment takes a surveillance shot at closer range, with heavy traffic flow and severe mutual occlusion between vehicles in the scene, as in Fig. 3(a). The detection method of the invention is compared with the Faster R-CNN detection algorithm. Fig. 3(b) shows that for vehicles with this severe occlusion, Faster R-CNN has a recall rate of 57% and a detection accuracy of 85%; it commonly identifies several vehicles as one, and its IOU (Intersection over Union) error is large. Fig. 3(c) shows that the present method detects better than the prior art, with a recall rate of 79% and a detection accuracy of 100%. Some vehicles at the image boundary are not detected, while the remaining mutually occluding vehicles are all clearly detected and marked.

Embodiment 4:

This embodiment takes a conventional surveillance video screenshot of a crossroad, as in Fig. 4, which contains the "bus" class. Fig. 4(b) shows that Faster R-CNN can still accurately detect vehicles that do not occlude each other, but makes errors where vehicles are dense; for this picture its recall rate is 60% and its detection accuracy is 92%. Fig. 4(c) shows that the detection method of the invention performs better: for this picture the recall rate is 93% and the detection accuracy is 100%, better than the prior art.

On an NVIDIA Tesla K80 graphics card, the detection speed of YOLOv3 in Embodiments 2-4 is 111 ms per frame, while the detection method of the invention reaches 22 ms per frame (45 FPS).
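The per-frame times quoted above convert to frame rates by simple division. The short sketch below only restates the two reported timings as frames per second and as a speed ratio; the function name is hypothetical.

```python
def fps(ms_per_frame):
    """Convert a per-frame processing time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


proposed_fps = fps(22)    # detection method of the invention
yolov3_fps = fps(111)     # YOLOv3 on the same card
speedup = 111 / 22        # ratio of the two per-frame times

print(round(proposed_fps, 1), round(yolov3_fps, 1), round(speedup, 2))
```

So 22 ms per frame corresponds to roughly 45 frames per second, 111 ms per frame to roughly 9, and the reported timings differ by a factor of about 5.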

Claims

1. A fast vehicle detection method, characterized in that the fast vehicle detection method comprises the following steps:

(1) acquire data: the vehicle to road carries out intercepting the big spirogram under different condition in real-time traffic monitor video Piece, picture size is 1920px × 1080px, using this as data；

(2) data mark: doing data mark to the picture acquired in step (1) based on vehicle window feature, marked using labelimg Tool uses rectangle frame to be framed respectively using the front window of vehicle and vehicle rear window as object in picture, and is attached to the class of vehicle Distinguishing label is labeled, and pixel coordinate information of each rectangle frame drawn on picture is stored in xml data format .xml in file, generation includes the .xml file of the front and back window locations information of vehicle in picture and the classification information of vehicle, For being used when training and test；

(3) Constructing the base network: under the data annotation scheme of step (2), using the residual connections of the ResNet network, features are extracted from the input picture stage by stage according to the pixel size of the feature-layer scale, successively forming four parts whose feature-layer scales are 208px × 208px, 104px × 104px, 52px × 52px and 26px × 26px respectively; the information of different feature layers of the same scale is fused, building a 13-layer small fully convolutional base network;

(4) Constructing the detection method: drawing on the multi-scale feature extraction method of SSD, multi-layer feature extraction is applied on top of the base network built in step (3); feature-layer information is extracted from the residual connection layers of the base network at scales 104px × 104px, 52px × 52px and 26px × 26px, merged into the detection network, and fused, after up-sampling, with the features of the corresponding feature layers in the detection network; 1 × 1 convolutions then generate a tensor corresponding to the number of detected object classes, namely a tensor of dimension (26 × 26 + 52 × 52 + 104 × 104) × (3 × (4 + 1 + C)); the sigmoid function and a threshold of 0.5 are used for screening, and the class with the highest confidence is chosen as the predicted class; finally, non-maximum suppression is applied to the multiple prediction boxes to obtain the prediction box; the class and position information of the predicted target vehicle are then obtained, constituting the detection method;

(5) Training on the data;

(6) Application testing.
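The output stage of step (4) can be illustrated with a minimal sketch. It is not the patented implementation. The reading of 3 × (4 + 1 + C) as three boxes per grid cell, each with four coordinates, one objectness score and C class scores, follows the YOLOv3 convention and is an assumption, as are the function names, the exact placement of the 0.5 threshold, and the IoU threshold of 0.45 used for non-maximum suppression.

```python
import math

SCALES = (26, 52, 104)   # grid sizes named in step (4)
BOXES_PER_CELL = 3       # assumed meaning of the factor 3


def output_shape(num_classes):
    """Rows and columns of the (26*26 + 52*52 + 104*104) x (3*(4+1+C)) tensor."""
    cells = sum(s * s for s in SCALES)
    channels = BOXES_PER_CELL * (4 + 1 + num_classes)
    return cells, channels


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def screen(raw_scores, threshold=0.5):
    """Pick the highest-confidence class; drop it if below the threshold."""
    probs = {label: sigmoid(v) for label, v in raw_scores.items()}
    best = max(probs, key=probs.get)
    return (best, probs[best]) if probs[best] >= threshold else None


def iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.45):
    """Greedy non-maximum suppression over (score, box) pairs."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Keep a box only if it does not overlap a higher-scoring kept box too much.
        if all(iou(box, other) <= iou_threshold for _, other in kept):
            kept.append((score, box))
    return kept
```

With the two classes "bus" and "car", `output_shape(2)` gives 14196 grid cells and 21 channels per cell, that is 42588 candidate boxes before screening and suppression.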

2. The fast vehicle detection method according to claim 1, characterized in that the different conditions described in step (1) refer to different road sections, different road conditions, different weather or different times; the class labels of the vehicles described in step (2) are divided into two classes, "bus" and "car", wherein the "bus" class includes buses and coaches, and the "car" class includes small cars and SUV-type automobiles.
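As an illustration of how the annotation files of step (2) might be read back, the sketch below parses one .xml file. It assumes the Pascal VOC layout that labelimg writes by default (`object`, `name`, `bndbox`, `xmin` and so on); the patent itself only states that the boxes are stored as xml. The sample content, file name and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation for one 1920 x 1080 frame with two window boxes.
SAMPLE_XML = """<annotation>
  <filename>frame_0001.jpg</filename>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>812</xmin><ymin>430</ymin><xmax>968</xmax><ymax>502</ymax></bndbox>
  </object>
  <object>
    <name>bus</name>
    <bndbox><xmin>1200</xmin><ymin>300</ymin><xmax>1480</xmax><ymax>420</ymax></bndbox>
  </object>
</annotation>"""


def read_window_boxes(xml_text):
    """Return a list of (class_label, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, coords))
    return boxes


boxes = read_window_boxes(SAMPLE_XML)
```

Each tuple carries the vehicle class together with the pixel rectangle of one window, which is the information the claim says is needed for training and testing.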

3. The fast vehicle detection method according to claim 1 or 2, characterized in that the part with feature-layer scale 208px × 208px described in step (3) is formed as follows: the picture is first resized to 416px × 416px and input into the network as data, and 16 convolutions of size 3 × 3 with stride 1 deepen the feature layer and extract features; then 32 convolutions of size 3 × 3 with stride 2 further deepen the feature layer, reduce the feature-map scale of this layer to 208px × 208px, and extract features, giving the first layer at the 208px × 208px scale; next, a 1 × 1 convolution with stride 1 fuses the extracted features, giving the second layer at the 208px × 208px scale; a 3 × 3 convolution with stride 1 increases the feature-layer depth to 32 feature maps of size 208px × 208px, giving the third layer at the 208px × 208px scale; finally, following the residual connections of the ResNet network, the features of the first layer and the third layer at the 208px × 208px scale are joined by a residual connection, and the resulting feature map is the residual connection layer at the 208px × 208px scale.

4. The fast vehicle detection method according to claim 1 or 2, characterized in that the part with feature-layer scale 104px × 104px described in step (3) is formed as follows: 64 convolutions of size 3 × 3 with stride 2 are applied to the feature map of the residual connection layer at the 208px × 208px scale, reducing the feature-map scale of this layer to 104px × 104px and extracting features, giving the first layer at the 104px × 104px scale; then a 1 × 1 convolution with stride 1 fuses the extracted features, giving the second layer at the 104px × 104px scale; a 3 × 3 convolution with stride 1 then deepens the feature layer again to 64 feature maps of size 104px × 104px, giving the third layer at the 104px × 104px scale; the features of the first layer and the third layer at the 104px × 104px scale are joined by a residual connection, and the resulting feature map is the residual connection layer at the 104px × 104px scale.

5. The fast vehicle detection method according to claim 1 or 2, characterized in that the part with feature-layer scale 52px × 52px described in step (3) is formed as follows: 128 convolutions of size 3 × 3 with stride 2 are applied to the feature map of the residual connection layer at the 104px × 104px scale, reducing the feature-map scale of this layer to 52px × 52px and extracting features, giving the first layer at the 52px × 52px scale; then a 1 × 1 convolution with stride 1 fuses the extracted features, giving the second layer at the 52px × 52px scale; a 3 × 3 convolution with stride 1 then deepens the feature layer again to 128 feature maps of size 52px × 52px, giving the third layer at the 52px × 52px scale; the features of the first layer and the third layer at the 52px × 52px scale are joined by a residual connection, and the resulting feature map is the residual connection layer at the 52px × 52px scale.

6. The fast vehicle detection method according to claim 1 or 2, characterized in that the part with feature-layer scale 26px × 26px described in step (3) is formed as follows: 256 convolutions of size 3 × 3 with stride 2 are applied to the feature map of the residual connection layer at the 52px × 52px scale, reducing the feature-map scale of this layer to 26px × 26px and extracting features, giving the first layer at the 26px × 26px scale; then a 1 × 1 convolution with stride 1 fuses the features, giving the second layer at the 26px × 26px scale; a 3 × 3 convolution with stride 1 then deepens the feature layer again to 256 feature maps of size 26px × 26px, giving the third layer at the 26px × 26px scale; the features of the first layer and the third layer at the 26px × 26px scale are joined by a residual connection, and the resulting feature map is the residual connection layer at the 26px × 26px scale.
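The four stages described in claims 3 to 6 can be traced with a short calculation. This sketch only checks spatial sizes and counts convolutions; it is not the network itself. The padding of `kernel // 2` is an assumption (the claims do not state the padding), and the filter count of the 1 × 1 layers is left as `None` because the claims do not give it.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, filters) for each convolution named in claims 3 to 6.
LAYERS = [
    (3, 1, 16),                                   # initial 3x3 on the 416 input
    (3, 2, 32), (1, 1, None), (3, 1, 32),         # 208 x 208 stage
    (3, 2, 64), (1, 1, None), (3, 1, 64),         # 104 x 104 stage
    (3, 2, 128), (1, 1, None), (3, 1, 128),       # 52 x 52 stage
    (3, 2, 256), (1, 1, None), (3, 1, 256),       # 26 x 26 stage
]


def trace(input_size=416):
    """Return the feature-map side length after every convolution."""
    sizes = []
    size = input_size
    for kernel, stride, _filters in LAYERS:
        size = conv_out(size, kernel, stride, padding=kernel // 2)
        sizes.append(size)
    return sizes


sizes = trace()
```

Under these assumptions the side length goes 416, 208, 104, 52, 26, and the convolutions add up to 13, which is consistent with the 13-layer base network named in step (3). In each stage the first and third layers have the same number of feature maps, so their residual connection is shape-compatible.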

7. The fast vehicle detection method according to claim 1 or 2, characterized in that the specific structure of the 13-layer small fully convolutional base network described in step (3) is:

8. The fast vehicle detection method according to claim 1, characterized in that the data training operation described in step (5) is as follows: 80% of the large number of pictures annotated in step (2) are chosen as training data and the remaining 20% as test pictures; the training data are trained with the detection method of step (4), and training stops when the loss value falls below a set value; a weight model with the suffix ".weights" is obtained.
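The 80%/20% division can be sketched as below. The claim does not say how the 80% are chosen; shuffling with a fixed seed before cutting is an assumption of this example, as is the function name `split_dataset`.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle the annotated pictures and cut them into train and test lists."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical picture identifiers standing in for annotated image files.
train_items, test_items = split_dataset(range(100))
```

Every picture lands in exactly one of the two lists, so the test pictures are never seen during training.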

9. The fast vehicle detection method according to claim 1, characterized in that the application test operation described in step (6) is as follows: the detection method of step (4) uses the weight model generated in step (5) as its reference to make predictions on the test pictures, and the test accuracy and test speed are checked; when the test accuracy and speed meet the requirements, the constructed detection method has reached the requirements for use; the constructed detection method and the trained weight model are then used to detect target vehicle pictures or road traffic video, identifying vehicles in real time and completing the detection of moving vehicles.

10. Use of the fast vehicle detection method according to any one of claims 1-9 in the real-time detection of vehicles in high-definition road surveillance video, characterized in that the average detection precision of the fast vehicle detection method is close to 100%, the average recall reaches 90%, and the detection speed reaches 22 milliseconds/frame.