CN108710875A - Deep-learning-based aerial highway vehicle counting method and device - Google Patents
Deep-learning-based aerial highway vehicle counting method and device
- Publication number
- CN108710875A CN108710875A CN201811054125.7A CN201811054125A CN108710875A CN 108710875 A CN108710875 A CN 108710875A CN 201811054125 A CN201811054125 A CN 201811054125A CN 108710875 A CN108710875 A CN 108710875A
- Authority
- CN
- China
- Prior art keywords
- vehicle
- highway
- image
- aerial photography
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30242—Counting objects in image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/08—Detecting or categorising vehicles
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Multimedia (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The present invention provides a deep-learning-based aerial highway vehicle counting method and device. The method comprises: acquiring aerial highway vehicle sample images under a variety of imaging conditions and preprocessing the aerial highway vehicle sample images; building a deep neural network model and training the built model with the preprocessed aerial highway vehicle sample images; detecting the aerial highway vehicle image under test with the deep neural network model, which outputs the detected highway and vehicle object recognition probabilities, object positions, and object region segmentations; and, according to the recognized highway and vehicle results, counting the vehicles within the highway region and outputting the count results. By sharing the deep image feature extraction network, the method uses computing resources effectively, saves model training and highway/vehicle detection run time, and greatly improves both precision and recall. The present invention applies to traffic data collection, traffic flow monitoring, image data processing, and image analysis.
Description
Technical field
The present invention relates to data acquisition in image data processing, image analysis, image data recognition, and traffic applications, and in particular to a deep-learning-based aerial highway vehicle counting method and device.
Background technology
With socio-economic development, vehicle fleet size sharply increases, and the traffic problems such as crowded, blocking, traffic accident occur often.For
Solution problems, it is necessary to vehicle, especially vehicle flowrate are monitored, to pinpoint the problems in time, relieve traffic congestion.Mesh
Before, common vehicle detection and method of counting are still to install the fixed location based on monitoring camera, earth induction detector etc.
Monitoring.Such methods limited coverage area, there are monitoring blind areas.It is especially vast sparse in surrounding city, suburb, and western part
Road network can not monitor substantially.In recent years, the characteristics of unmanned plane is with its flexible and a wide range of cruise, for traffic inspection field
Problems provide a kind of good solution.
In unmanned plane traffic inspection, vehicle detection and counting are an important core technologies.Its key is to utilize calculating
Vehicle in the technologies automatic identification highway such as machine vision, artificial intelligence and statistical magnitude.Unmanned plane is since its flying height is compared with ground
Face monitoring device is much higher, so the usual visual field of Aerial Images is larger, scene is complicated.Wherein, highway Aerial Images would generally be clapped
Take the photograph to roadside pavement, parking lot, public square etc..These places can usually park a large amount of vehicles.So being set with ground monitoring
It is standby different, based on the vehicle count of unmanned plane image, need the content for being related to two aspects:When identification vehicle, second is that
Identify highway.It should not be included in the scope of vehicular traffic statistics in both sides of highway parked vehicle.
In related research, the patent "Traffic information acquisition method based on aerial images" (application number 201010588880.0, inventors: Liu Fuqiang et al.) describes traffic information acquisition from aerial images. Lane markings detected jointly by color histogram distribution and Hough transform delimit the road area; within that area, stationary vehicles are detected by blob analysis, while moving vehicles are detected with the KLT algorithm. After detection, information such as vehicle heading, speed, and length is extracted. The patent "Moving vehicle detection method based on aerial video images" (application number 201710013944.6, inventors: In Bright et al.) proposes a moving vehicle detection method for aerial video: moving vehicles are first coarsely detected by a three-frame difference method; then, in the fine detection stage, a road area segmented by adaptive color thresholding defines vehicle candidate regions, CHLBP features are extracted from those regions, and an SVM classifier judges the CHLBP features to produce the final moving-vehicle detections. The master's thesis "Highway vehicle detection and tracking technology based on an airborne camera" (Southeast University, author: Li Cong) addresses UAV-based highway vehicle detection, proposing a highway detection method using threshold segmentation and edge line detection and, within the highway region, segmenting the foreground with the ViBe adaptive threshold algorithm as the vehicle detection result. The master's thesis "Aerial image vehicle detection based on saliency detection and classifier training" (Beijing Jiaotong University, author: King Haiti) proposes, for aerial road images, a saliency detection method based on histogram contrast, and trains a model with Haar features combined with an AdaBoost classifier to detect highway vehicles. The patent "Method for detecting and counting road vehicles using remote sensing imagery" (application number 200810227007.1, inventors: Tan's thoroughfare continuous heavy rain et al.) targets remote sensing imagery, proposing to generate a fixed-width road buffer from GIS road centerline vector layer data as the road area, and to build an object-oriented fuzzy classifier that recognizes and counts the vehicle objects within the road. Besides the above research that detects vehicles within a detected road area, there are also several studies that detect vehicles independently.
For UAV-image highway vehicle counting, the key problem is highway vehicle recognition. Current research on highway vehicle recognition basically follows the approach of first recognizing the highway with one method and then recognizing the vehicles with another, as in the documents above. The drawback of this approach is low overall efficiency: the image must be analyzed twice. In fact, many of the underlying image features are identical, and processing the image twice wastes time and resources. To date, no research has been found that recognizes the road and the vehicles simultaneously within the same model.
For road recognition, the prevailing methods are still based on conventional visual detection. For example, "Traffic information acquisition method based on aerial images" (application number 201010588880.0, inventors: Liu Fuqiang et al.) delimits the road area using lane markings detected jointly by color histogram distribution and Hough transform; "Moving vehicle detection method based on aerial video images" (application number 201710013944.6, inventors: In Bright et al.) detects the road area by adaptive color thresholding; "Highway vehicle detection and tracking technology based on an airborne camera" (Southeast University master's thesis, author: Li Cong) likewise determines the highway region with threshold segmentation and edge line detection; and "Aerial image vehicle detection based on saliency detection and classifier training" (Beijing Jiaotong University master's thesis, author: King Haiti) also recognizes the road with a histogram-contrast saliency detection method. These methods belong to shallow learning, and their model expressiveness is limited. The thresholds and other parameters involved are often hard to choose, and a given parameter set is usually applicable only to certain specific scenes.
In an aerial highway image, because the field of view is large, a vehicle may appear on the highway, beside it, or even in a region far from it. To count highway vehicles accurately, both objects, the highway and the vehicle, must be recognized.

Since a vehicle may be driving on the highway or parked at the roadside, the two objects can intersect or contain one another: a vehicle may be partly on the highway, entirely outside it, or completely contained within the highway region. Existing technical methods usually recognize the highway first and then use the highway region to constrain vehicle recognition. The biggest disadvantage of such methods is that the same image must be processed twice, which greatly reduces computational efficiency, wastes computing resources, and lengthens detection time.
Summary of the invention

In view of the deficiencies of the prior art, the object of the present invention is to provide a deep-learning-based aerial highway vehicle counting method.
The technical solution of the counting method in the present invention is:

A deep-learning-based aerial highway vehicle counting method, specifically comprising the following steps:
S1. Acquire aerial highway vehicle sample images under a variety of imaging conditions, form a sample database, and preprocess the aerial highway vehicle sample images in the sample database;

S2. Build a deep neural network model, and train the built deep neural network model with the preprocessed aerial highway vehicle sample images;

S3. Detect the aerial highway vehicle image under test with the deep neural network model, and output the detected highway and vehicle object recognition probabilities, object positions, and object region segmentations;

S4. According to the recognized highway and vehicle results, count the vehicles within the highway region and output the count results.
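The four steps can be sketched end to end as a driver skeleton. This is a minimal sketch with stand-in functions and toy masks (the function names and the dict-based model object are illustrative assumptions, not the patent's implementation); only the S4 overlap rule does real work:

```python
import numpy as np

def preprocess(images):                      # S1: stand-in augmentation
    return [np.fliplr(im) for im in images]

def train_model(samples):                    # S2: stand-in for network training
    return {"trained": True, "n_samples": len(samples)}

def detect(model, image):                    # S3: stand-in detection output,
    h, w = image.shape[:2]                   # masks over the full image
    highway = np.zeros((h, w), bool); highway[:, w // 3: 2 * w // 3] = True
    cars = [np.zeros((h, w), bool) for _ in range(3)]
    cars[0][2:4, w // 2 - 1: w // 2 + 1] = True   # on the highway
    cars[1][2:4, 0:2] = True                      # roadside, not counted
    cars[2][5:7, w // 3 + 1: w // 3 + 3] = True   # on the highway
    return highway, cars

def count_vehicles(highway, cars, thresh=0.3):    # S4: overlap criterion
    return sum((highway & c).sum() / c.sum() > thresh for c in cars)

img = np.zeros((9, 9, 3))
model = train_model(preprocess([img]))
highway_mask, car_masks = detect(model, img)
print(count_vehicles(highway_mask, car_masks))    # prints 2: roadside car excluded
```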
As a further improvement of the above technical scheme, in step S2, the network structure of the built deep neural network model comprises a base network and head task networks.

The base network comprises:

A global image feature extraction sub-network, composed of several convolutional layers and pooling layers, for extracting shallow-to-deep feature maps of the image;

A candidate object position proposal sub-network, composed of several convolutional layers and an output layer, which slides windows of preset size and stride over the deep feature maps, discriminates the feature map under each window, and outputs the candidate positions of objects;

An object scale adaptation sub-network, composed of multiple pooling layers, which further resamples the feature maps corresponding to proposal windows of different scales into feature maps of fixed size, to meet the requirement of the fully connected layers of the head task networks for input data of fixed dimensions.

The head task networks comprise:

A vehicle recognition sub-network, composed of multiple fully connected layers, which judges whether a proposed candidate window feature map is a vehicle object, and outputs the window's position in the original image and its recognition probability, i.e., the vehicle's position in the image and the vehicle recognition probability;

A vehicle segmentation sub-network, composed of multiple convolutional layers, which performs pixel-level segmentation of the proposed candidate window feature map and outputs the probability that each pixel is vehicle, forming a vehicle segmentation mask heat map, i.e., the segmentation mask of the vehicle;

A highway recognition sub-network, composed of multiple fully connected layers, which judges whether a proposed candidate window feature map is a highway object, and outputs the window's position in the original image and its recognition probability, i.e., the highway's position in the image and the highway recognition probability;

A highway segmentation sub-network, composed of multiple convolutional layers, which performs pixel-level segmentation of the proposed candidate window feature map and outputs the probability that each pixel is highway, forming a highway segmentation mask heat map, i.e., the segmentation mask of the highway.
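The essential point of this structure, one shared feature extractor feeding separate recognition and segmentation heads, can be sketched in a few lines of numpy. All weights, shapes, and the "convolutions" here are toy stand-ins assumed for illustration; the point is only that the backbone runs once and every head reuses its output:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy parameters (shapes are illustrative assumptions, not the patent's)
W_shared = rng.normal(size=(1, 8))         # shared feature extractor weights
W_vehicle = rng.normal(size=(8,))          # vehicle recognition head
W_highway = rng.normal(size=(8,))          # highway recognition head

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backbone(image):
    """Stand-in for the shared global feature extraction sub-network:
    run ONCE per image, its output reused by all heads."""
    pix = image.mean(axis=2).reshape(-1, 1)    # (H*W, 1) toy "pixels"
    return pix @ W_shared                      # (H*W, 8) toy feature map

def recognition_head(feat, weights):           # window-level probability
    return sigmoid(feat @ weights)

def mask_head(feat):                           # per-pixel probability map
    return sigmoid(feat)

image = rng.random((4, 4, 3))
feat = backbone(image)                   # extracted once, shared
p_vehicle = recognition_head(feat, W_vehicle)
p_highway = recognition_head(feat, W_highway)
vehicle_mask = mask_head(feat)
assert feat.shape == (16, 8) and p_vehicle.shape == (16,)
```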
As a further improvement of the above technical scheme, in step S2, training the deep neural network model specifically comprises:

S21. Annotate the sample images in the sample database to obtain labeled aerial highway vehicle sample images, and obtain the labeling result of each labeled aerial highway vehicle sample image: the edges of the highway region and of each vehicle in the sample image are outlined as closed polygons, and the object class and name are marked; a sample object mask is generated from each polygon, and the positions of the highway region and the vehicles in the sample image are generated from the masks, expressed as (y, x, h, w), where (y, x) describes the position's starting point, i.e., its top-left coordinate in the image, and (h, w) describes the position's size, i.e., its height and width. The generated positions, the masks, and the annotated classes together serve as ground truth and participate in the error calculation of the subsequent deep neural network model outputs. A validation set is split off as a certain proportion of all annotated samples;

S22. Classify and segment the labeled aerial highway vehicle sample images with the deep neural network model under its default initial parameters, obtaining the output results for the labeled images;

S23. According to the error between the model's output results and the labeling results, back-propagate layer by layer from back to front through the entire deep neural network model, updating all parameters of the deep neural network model;

S24. Read new labeled aerial highway vehicle sample images, and classify and segment them with the deep neural network model under the updated parameters;

S25. Judge whether the convergence condition or the stop condition is met; if so, stop training; if not, return to step S23 and continue training. The convergence condition is mainly whether the error precision on the validation set meets the requirement, and the stop condition is mainly whether the number of training iterations exceeds a preset value.
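The S22-S25 loop (forward pass, error-driven parameter update, validation-based convergence check, iteration cap) can be illustrated on a toy one-parameter-pair model. Gradient descent on a linear fit stands in for back-propagation through the full network; the data and learning rate are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy labeled "samples": inputs x with ground truth y = 2x + 1 (a stand-in
# for the annotated masks/positions of S21); a validation split is held out
x = rng.random(100); y = 2 * x + 1
x_train, y_train, x_val, y_val = x[:80], y[:80], x[80:], y[80:]

w, b = 0.0, 0.0                      # S22: default initial parameters
lr, eps, max_iters = 0.5, 1e-4, 10000

for it in range(max_iters):          # S25 stop condition: iteration cap
    pred = w * x_train + b           # S22/S24: forward pass on the samples
    err = pred - y_train             # error against the labels
    w -= lr * (err * x_train).mean() # S23: propagate error back, update params
    b -= lr * err.mean()
    val_err = np.abs(w * x_val + b - y_val).mean()
    if val_err < eps:                # S25 convergence: validation error small
        break

print(round(w, 2), round(b, 2))      # approaches the true (2, 1)
```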
As a further improvement of the above technical scheme, in step S3:

The highway and vehicle object recognition probabilities comprise the recognition probability of the highway and the recognition probability of the vehicle;

The object positions comprise the position of the highway in the image and the position of the vehicle in the image;

The object region segmentations comprise the segmentation mask of the highway and the segmentation mask of the vehicle.
As a further improvement of the above technical scheme, in step S3, detecting the aerial highway vehicle image under test with the deep neural network model specifically comprises:

S31. Input the aerial highway vehicle image to be detected;

S32. Process the input image through the global image feature extraction sub-network, extracting image features at different levels and outputting feature maps;

S33. Feed the output feature maps into the candidate object position proposal sub-network, determine the candidate positions where objects appear, and output several candidate position windows;

S34. For the output candidate position windows, the object scale adaptation sub-network further resamples the feature maps corresponding to windows of different scales into feature maps of fixed size;

S35. Feed the output fixed-size feature maps into the vehicle recognition sub-network and the highway recognition sub-network for recognition, outputting recognition probabilities and the object positions after regression refinement;

S36. For windows in S35 whose vehicle recognition probability meets the vehicle decision threshold, further resample their corresponding feature maps into fixed-size feature maps through the object scale adaptation sub-network; for windows whose highway recognition probability meets the highway decision threshold, likewise further resample their corresponding feature maps into fixed-size feature maps;

S37. Feed the fixed-size feature maps output by S36 into the vehicle segmentation sub-network and the highway segmentation sub-network respectively, outputting vehicle segmentation masks and highway segmentation masks.
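The scale adaptation step (S34/S36), turning windows of any size into a fixed-size feature map, is the part worth making concrete. Below is a simplified max-pooling stand-in for it (the function name, output size, and toy feature map are assumptions for illustration):

```python
import numpy as np

def roi_pool(feature_map, window, out_size=2):
    """Resample the feature-map patch under one candidate window (y, x, h, w)
    to a fixed out_size x out_size grid by max-pooling, a simplified stand-in
    for the object scale adaptation sub-network of step S34."""
    y, x, h, w = window
    patch = feature_map[y:y + h, x:x + w]
    ys = np.array_split(np.arange(h), out_size)   # split rows/cols into bins
    xs = np.array_split(np.arange(w), out_size)
    return np.array([[patch[np.ix_(r, c)].max() for c in xs] for r in ys])

fmap = np.arange(36, dtype=float).reshape(6, 6)
small = roi_pool(fmap, (0, 0, 2, 2))   # windows of different scales ...
large = roi_pool(fmap, (1, 1, 4, 5))
assert small.shape == large.shape == (2, 2)   # ... all become fixed-size
```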
As a further improvement of the above technical scheme, in step S4, for counting the vehicles located within the highway region, the criterion for judging whether a vehicle lies within the highway region is that the overlap area between the vehicle's segmentation mask and the highway's segmentation mask exceeds a preset threshold; a typical value of the preset threshold is 30%.
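This overlap criterion is directly computable from the two binary masks. A minimal sketch (the masks here are synthetic; the 0.30 threshold is the typical value named above):

```python
import numpy as np

def count_on_highway(highway_mask, vehicle_masks, thresh=0.30):
    """Count vehicles whose segmentation mask overlaps the highway's
    segmentation mask by more than `thresh` of the vehicle's own area."""
    n = 0
    for vm in vehicle_masks:
        overlap = np.logical_and(highway_mask, vm).sum()
        if overlap / vm.sum() > thresh:
            n += 1
    return n

highway = np.zeros((10, 10), bool); highway[:, 4:8] = True
on_road = np.zeros((10, 10), bool); on_road[2:4, 5:7] = True    # fully inside
roadside = np.zeros((10, 10), bool); roadside[2:4, 0:2] = True  # fully outside
straddle = np.zeros((10, 10), bool); straddle[6:8, 3:5] = True  # half inside

print(count_on_highway(highway, [on_road, roadside, straddle]))  # prints 2
```

Note that the half-on-the-road vehicle is counted (overlap 50% > 30%) while the roadside vehicle is excluded, which is exactly the behavior the criterion is meant to enforce.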
As a further improvement of the above technical scheme, in step S1, preprocessing the aerial highway vehicle sample images in the sample database comprises at least one of the following operations: random rescaling, random flipping, random brightness adjustment, random contrast adjustment, and random saturation adjustment.
As a further improvement of the above technical scheme, in step S1:

The imaging regions of the aerial highway vehicle sample images include: urban, suburban, and rural; two-way four-lane and two-way six-lane highways without a central median strip; two-way six-lane and two-way eight-lane highways with a central median strip; crossroads; and T-junctions;

The imaging conditions of the aerial highway vehicle sample images include: sunny, cloudy, and overcast conditions causing brightness changes; dawn and dusk causing chroma changes; slight, moderate, and severe haze causing saturation changes; and light and moderate rain causing partial occlusion.
The present invention also provides a deep-learning-based aerial highway vehicle counting device, adopting the following technical solution:

A deep-learning-based aerial highway vehicle counting device, comprising:

An image acquisition module, for acquiring aerial highway vehicle images;

An image preprocessing module, for preprocessing the sample images that participate in training;

A training module, for training the above deep neural network model;

A detection module, for detecting the image under test acquired by the image acquisition module using the pre-trained deep neural network model, and recognizing the highway and vehicle objects therein;

A counting module, for counting the vehicles within the highway.
The main theoretical basis of the counting method in the present invention is the following: a large body of deep learning research holds that image features comprise, at the shallowest level, the planar structure of pixels; one level deeper, linear structure such as straight lines and curves of various shapes; a level deeper still, shapes such as rectangles, triangles, and circles; and deeper again, aggregations of multiple planar structures forming various texture features. By combining low-level features, more abstract high-level features are formed, and finally the various high-level features constitute the meaningful object categories that the human eye can recognize. Therefore, both the highway and the vehicle are composed of these shallow-to-deep structural levels, and in the basic lower-level features the two should share common characteristics that can largely be reused.
Advantageous effects of the present invention:

By sharing the deep image feature extraction network, the aerial highway vehicle counting method provided by the present invention uses computing resources effectively and saves model training and highway/vehicle detection run time. Here, both road and vehicle recognition use deep-learning-based methods. Compared with conventional visual detection methods based on manually defined features, the deep model has strong expressive power, and precision and recall are both greatly improved. With a sufficient number of samples, the present invention can adapt to haze, rain, dusk, dawn, shadow, uneven illumination, and other varying environments. In short, the present invention features high computational efficiency, good detection performance, and strong robustness.
Description of the drawings

Fig. 1 is a flow diagram of the implementation of the deep-learning-based aerial highway vehicle counting method;

Fig. 2 is a structural diagram of the deep neural network model;

Fig. 3 is a structural diagram of the deep-learning-based aerial highway vehicle counting device.
Specific embodiments

To facilitate implementation of the present invention, it is further described below with reference to specific examples.

A deep-learning-based aerial highway vehicle counting method as shown in Fig. 1 comprises the following specific steps:
S1. Acquire aerial highway vehicle sample images

The weather conditions when acquiring the sample images include 8 kinds: sunny at midday (11:00-13:00), cloudy, overcast, slight haze, moderate haze, sunny at dawn (07:00-09:00), sunny at dusk (17:00-19:00), and light rain during the day (10:00-16:00). The imaging angles include 4 kinds: 90° vertically downward, and 30°, 45°, and 60° oblique. Imaging condition categories total 8 × 4 = 32. The highways photographed include 6 classes: two-way four-lane and two-way six-lane without a central median strip, two-way six-lane and two-way eight-lane with a central median strip, crossroads, and T-junctions; the highway regions include 3 classes: urban, suburban, and rural. Highway-region combinations therefore yield 6 × 3 = 18 kinds of scenes.

For the 4 × 3 = 12 scene kinds generated by the two-way four-lane and two-way six-lane highways without a central median strip and the two-way six-lane and two-way eight-lane highways with one, combined with their regions, about 1000 sample images are selected under each imaging condition; these highways are mainly straight sections without branch roads. For the 2 × 3 = 6 scene kinds generated by crossroads and T-junctions combined with their regions, about 300 sample images are selected under each imaging condition. The resulting data set contains about 32 × (1000 × 12 + 300 × 6) = 441,600 sample images.

Here, the UAV flying height is about 80 meters, and the acquired image resolution is 1920 × 1080.
S11. Preprocess the acquired sample images

For the roughly 440,000 images acquired in step S1, a copy is made and one or more of the following image operations are applied at random: random rescaling (range [0.8, 1.2] times), random flipping (up-down, left-right, diagonal), random brightness adjustment (range [0.8, 1.2] times), random contrast adjustment (range [0.5, 1.5]), and random saturation adjustment (range [0.5, 1.5]). The new samples thus formed yield a data set of about 880,000 images in total.
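A minimal sketch of this augmentation step, applying a random subset of the listed operations with the ranges stated above (saturation and rescaling are omitted here for brevity; the probabilities and the toy 8 × 8 image are assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(img):
    """Randomly flip and adjust brightness/contrast of one image copy.
    Ranges follow the text: brightness [0.8, 1.2], contrast [0.5, 1.5];
    flips are up-down, left-right, or diagonal."""
    out = img.astype(float)
    if rng.random() < 0.5:                                # random flip
        out = {0: np.flipud, 1: np.fliplr,
               2: lambda a: a[::-1, ::-1]}[int(rng.integers(3))](out)
    if rng.random() < 0.5:                                # random brightness
        out = out * rng.uniform(0.8, 1.2)
    if rng.random() < 0.5:                                # random contrast
        mean = out.mean()
        out = (out - mean) * rng.uniform(0.5, 1.5) + mean
    return np.clip(out, 0, 255)                           # keep valid range

sample = rng.integers(0, 256, size=(8, 8, 3)).astype(float)
augmented = augment(sample)
assert augmented.shape == sample.shape
```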
S2. Train the built deep neural network model with the preprocessed vehicle sample images

The deep neural network model designed in the present invention is based on the Faster R-CNN network; its structure is shown in Fig. 2. In this example, the sub-network structures used are specifically as follows:

1) Global image feature extraction sub-network: the structure chosen is the deep residual network ResNet101; ResNet50, ResNet152, and the like may also be used for this sub-network. ResNet101 has 5 groups of convolutions.
The first group comprises 1 convolutional layer and 1 pooling layer. The convolutional layer's kernel size is 7 × 7, with 64 kernels and a sliding stride of 2; the pooling layer's window size is 3 × 3, with a sliding stride of 2.

The second group comprises 3 convolution blocks, each mainly containing 3 convolutional layers: the first with 1 × 1 kernels, 64 kernels; the second with 3 × 3 kernels, 64 kernels; the third with 1 × 1 kernels, 256 kernels. The second group has 3 × 3 = 9 convolutional layers in total.

The third group comprises 4 convolution blocks, each mainly containing 3 convolutional layers: the first with 1 × 1 kernels, 128 kernels; the second with 3 × 3 kernels, 128 kernels; the third with 1 × 1 kernels, 512 kernels. The third group has 4 × 3 = 12 convolutional layers in total.

The fourth group comprises 23 convolution blocks, each mainly containing 3 convolutional layers: the first with 1 × 1 kernels, 256 kernels; the second with 3 × 3 kernels, 256 kernels; the third with 1 × 1 kernels, 1024 kernels. The fourth group has 23 × 3 = 69 convolutional layers in total.

The fifth group comprises 3 convolution blocks, each mainly containing 3 convolutional layers: the first with 1 × 1 kernels, 512 kernels; the second with 3 × 3 kernels, 512 kernels; the third with 1 × 1 kernels, 2048 kernels. The fifth group has 3 × 3 = 9 convolutional layers in total.

All the above convolutional layers have a sliding stride of 2.

The entire ResNet101 network thus has 1 + 9 + 12 + 69 + 9 = 100 convolutional layers. The fully connected layer used for final output in the original ResNet101 network is not used here.
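The layer bookkeeping above can be checked in a couple of lines (each tuple is just (blocks, convolutional layers per block); group 1 is a single convolution):

```python
# conv layers per group of ResNet101 as described in the text
groups = [(1, 1), (3, 3), (4, 3), (23, 3), (3, 3)]
per_group = [blocks * convs for blocks, convs in groups]
assert per_group == [1, 9, 12, 69, 9]
assert sum(per_group) == 100   # 100 convs; the 101st layer is the final fc
```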
2)Sub-network is nominated in candidate target position, and structure choice region recommendation network RPN, the sub-network also can be selected
Other can reach the structural network of identical function.Concrete structure includes following four part:
A. after the characteristic pattern of ResNet101 networks output, increase a convolutional layer, form new characteristic pattern.The convolutional layer
Convolution kernel size is 3 × 3, and convolution kernel number is 512.
b. Sliding-window sizes and stride are preset. The 5 base window sizes are 32 × 32, 64 × 64, 128 × 128, 512 × 512 and 1024 × 1024; keeping the area constant, the aspect ratio of each is set to 0.5, 1 and 2 respectively, yielding 5 × 3 = 15 preset windows in total. The sliding stride of the preset windows is 2.
c. The windows of part b slide over the new feature map generated in part a. The feature map under each window is connected both to a target recognition head and to a position regression head. The target recognition head comprises one convolutional layer and one softmax output layer; the convolutional layer has a 1 × 1 kernel and 6 kernels, and the softmax layer outputs two results, the probability that the window is a foreground target and the probability that it is background. Here the foreground does not distinguish between the two target classes, highway and vehicle. The position regression head comprises one convolutional layer and one output layer; the convolutional layer has a 1 × 1 kernel and 12 kernels, and the output layer directly outputs the target position after convolution. A position comprises the four parameters (y, x, h, w), where (y, x) describes the starting point of the position, i.e. its top-left coordinate in the image, and (h, w) describes its size, i.e. its height and width. Regression on the candidate windows yields more accurate target positions.
d. Non-maximum suppression (NMS) is applied to the candidate position boxes generated in part c. First, the box with the highest foreground probability is selected as the reference box. Then all remaining boxes are traversed; if the overlapping area of a box with the reference box reaches 0.7 of the reference box, that box is considered to denote the same object as the reference box and is removed. After the traversal is complete, the box with the next-highest foreground probability among the remaining boxes is selected, and the traversal and screening are repeated; what finally remains is the set of candidate proposal positions with redundant boxes eliminated.
3) Target scale adaptation sub-network. The structure comprises two single-level spatial pyramid pooling layers (SPP-layer); any other network structure achieving the same function may be used instead. They unify the dimensions of the input feature maps for the two subsequent task types, image recognition and image segmentation. Only the top 512 target positions, ranked by the foreground probability output by the candidate target position proposal sub-network of 2), are processed here.
For each target position, the recognition SPP-layer uniformly scales the corresponding feature map to a width and height of 7 × 7; the depth remains 2048 (the feature-map depth of the last layer output by the ResNet101 network). This feature map serves as the input of the vehicle recognition sub-network and the highway recognition sub-network.
For each target position, the segmentation SPP-layer uniformly scales the corresponding feature map to a width and height of 14 × 14; the depth remains 2048 (the feature-map depth of the last layer output by the ResNet101 network). This feature map serves as the input of the vehicle segmentation sub-network and the highway segmentation sub-network.
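A single-level SPP resize of the kind described above can be sketched with adaptive average pooling; this is a simplified illustration (the patent's own layer may differ in interpolation details), and the function name and shapes are assumptions:

```python
import numpy as np

# Pool an H x W x C region feature map down to a fixed out_size x out_size x C
# grid by averaging over an adaptive cell partition (single-level SPP).
def spp_resize(feat, out_size):
    H, W, C = feat.shape
    out = np.zeros((out_size, out_size, C))
    ys = np.linspace(0, H, out_size + 1).astype(int)
    xs = np.linspace(0, W, out_size + 1).astype(int)
    for i in range(out_size):
        for j in range(out_size):
            # max(...) guards against zero-height/width cells for small inputs
            cell = feat[ys[i]:max(ys[i + 1], ys[i] + 1),
                        xs[j]:max(xs[j + 1], xs[j] + 1), :]
            out[i, j] = cell.mean(axis=(0, 1))
    return out

region = np.ones((20, 30, 4))          # a toy region feature map
print(spp_resize(region, 7).shape)     # (7, 7, 4) for the recognition branch
```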
4) Vehicle recognition sub-network. Its structure consists of a base part and head task parts.
The base part consists of one fully connected layer with 2048 nodes.
The head task parts comprise a vehicle classification head and a vehicle position head. The vehicle classification head consists of one softmax output layer, whose output is the probabilities of vehicle and background. The vehicle position head consists of one fully connected layer with 4 nodes, whose output is the position parameters (y, x, h, w) when the target is a vehicle. This further regression on the vehicle position yields a more accurate target position.
5) Vehicle segmentation sub-network. The structure chosen is a simplified fully convolutional network (FCN), consisting of one convolutional layer and one sigmoid output layer; the convolutional layer has a 3 × 3 kernel and 256 kernels. The sigmoid output layer outputs, for each pixel of the feature map, the probability that the pixel belongs to a vehicle, forming the vehicle segmentation mask heat map.
6) Highway recognition sub-network, identical in structure to the vehicle recognition sub-network: a base part and head task parts.
The base part consists of one fully connected layer with 2048 nodes.
The head task parts comprise a highway classification head and a highway position head. The highway classification head consists of one softmax output layer, whose output is the probabilities of highway and background. The highway position head consists of one fully connected layer with 4 nodes, whose output is the position parameters (y, x, h, w) when the target is a highway. This further regression on the highway position yields a more accurate target position.
7) Highway segmentation sub-network, identical to the vehicle segmentation sub-network: a simplified fully convolutional network (FCN) consisting of one convolutional layer and one sigmoid output layer, where the convolutional layer has a 3 × 3 kernel and 256 kernels. The sigmoid output layer outputs, for each pixel of the feature map, the probability that the pixel belongs to a highway, forming the highway segmentation mask heat map.
Here, for the sake of image segmentation precision, all image cropping and scaling operations involved in the network use floating-point arithmetic.
In this example implementation, the training of the deep neural network model comprises the following steps:
S21. Annotate the images in the sample database to obtain labelled aerial road vehicle sample images.
Here, the aerial road vehicle sample images obtained in step S1 are first annotated manually. The edges of the highway regions and vehicles in an aerial image are traced as closed polygons, and the object class and name are recorded. The sample object mask can be generated from the polygon, and the sample object position in the image from the minimum and maximum coordinates of the mask, expressed in the (y, x, h, w) form above. The generated positions and masks, together with the annotated classes, serve as ground truth in the subsequent computation of the error of the results output by the deep neural network model.
For each new sample image obtained by image preprocessing in step S11, applying the same scale changes, flips and other operations used in that preprocessing to the polygon coordinates annotated on the corresponding original image directly yields the polygons of the new sample image; the annotated object classes and names remain unchanged. In other words, the annotations of the new sample images produced by image preprocessing in step S11 can be generated automatically by computer from the original samples of step S1, avoiding the heavy task of repeated manual annotation.
From all samples formed in steps S1 and S11, 20% are split off as a validation set for model evaluation during training.
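The 20% validation split above can be sketched as follows; the sample identifiers and seed are placeholders, not the patent's data:

```python
import random

# Shuffle the sample list reproducibly and hold out val_ratio of it for
# model evaluation during training.
def split_samples(samples, val_ratio=0.2, seed=0):
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_val = int(len(samples) * val_ratio)
    return samples[n_val:], samples[:n_val]  # (training set, validation set)

train, val = split_samples(range(100))
print(len(train), len(val))  # 80 20
```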
S22. Classify and segment the road vehicle images using the deep neural network model with default initial parameters.
In this example, the preset parameters of the ResNet101 network that extracts global image features are taken from the publicly released model parameters for the COCO dataset image recognition task, which serve as the pre-training model parameters of this example. All other sub-networks use random preset parameters, drawn from a Gaussian distribution with mean 0 and standard deviation 0.1.
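The random initialisation above amounts to sampling each weight tensor from N(0, 0.1²); the layer shape below is illustrative (e.g. a fully connected position head), not prescribed by the patent:

```python
import numpy as np

# Draw a weight matrix from a Gaussian with mean 0 and standard deviation 0.1.
rng = np.random.default_rng(0)
weights = rng.normal(loc=0.0, scale=0.1, size=(2048, 4))
print(weights.shape)  # (2048, 4)
```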
S23. Propagate the error between the model's output results and the annotation results backwards, layer by layer from back to front, through the entire deep neural network model, updating all parameters.
In this example, the errors of the deep neural network model comprise the following 8 terms:
I. the error between the target probability output by the candidate target position proposal sub-network and the ground truth, denoted rpn_class_loss;
II. the sum of the starting-coordinate error and the box width/height error between the target position output by the candidate target position proposal sub-network and the true position, denoted rpn_bbox_loss;
III. the error between the vehicle recognition probability output by the vehicle recognition sub-network and the ground truth, denoted motor_class_loss;
IV. the sum of the starting-coordinate error and the box width/height error between the vehicle position output by the vehicle recognition sub-network and the true position, denoted motor_bbox_loss;
V. the sum over pixels of the error between the probability that each pixel belongs to a vehicle in the segmentation mask heat map output by the vehicle segmentation sub-network and the ground truth, denoted motor_mask_loss;
VI. the error between the highway recognition probability output by the highway recognition sub-network and the ground truth, denoted road_class_loss;
VII. the sum of the starting-coordinate error and the box width/height error between the highway position output by the highway recognition sub-network and the true position, denoted road_bbox_loss;
VIII. the sum over pixels of the error between the probability that each pixel belongs to a highway in the segmentation mask heat map output by the highway segmentation sub-network and the ground truth, denoted road_mask_loss.
During error back-propagation and parameter updating, the vehicle recognition sub-network uses the sum of motor_class_loss and motor_bbox_loss; the vehicle segmentation sub-network uses motor_mask_loss; the highway recognition sub-network uses the sum of road_class_loss and road_bbox_loss; the highway segmentation sub-network uses road_mask_loss; the candidate target position proposal sub-network uses the sum of rpn_class_loss and rpn_bbox_loss; and the global image feature extraction sub-network uses the sum of all 8 errors. The error back-propagation algorithm uses stochastic gradient descent.
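The routing of the 8 loss terms to the sub-networks can be written out directly; the numeric loss values below are illustrative only:

```python
# Route the 8 named loss terms to the sub-networks as described above.
losses = {
    "rpn_class_loss": 0.10, "rpn_bbox_loss": 0.05,
    "motor_class_loss": 0.08, "motor_bbox_loss": 0.04, "motor_mask_loss": 0.06,
    "road_class_loss": 0.07, "road_bbox_loss": 0.03, "road_mask_loss": 0.05,
}
subnet_losses = {
    "rpn": losses["rpn_class_loss"] + losses["rpn_bbox_loss"],
    "vehicle_recognition": losses["motor_class_loss"] + losses["motor_bbox_loss"],
    "vehicle_segmentation": losses["motor_mask_loss"],
    "highway_recognition": losses["road_class_loss"] + losses["road_bbox_loss"],
    "highway_segmentation": losses["road_mask_loss"],
    # the shared feature extractor is updated with the sum of all 8 terms
    "backbone": sum(losses.values()),
}
print(round(subnet_losses["backbone"], 2))  # 0.48
```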
S24. Read new image samples and classify and segment them using the deep neural network model with updated parameters.
S25. Judge whether the convergence condition or the stopping condition is met; if so, stop training; if not, return to step S23 and continue training.
In this example, the convergence condition is that the sum of the above 8 errors on the validation set is < 0.5; the stopping condition is training epoch ≥ 4 (i.e. the number of training iterations over single samples is approximately more than 3,500,000). Other important parameters in the training process include a learning rate learning_rate of 0.001, a learning momentum learning_momentum of 0.9 and a weight decay coefficient weight_decay of 0.0001.
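A single stochastic-gradient-descent update with the hyperparameters above can be sketched as follows; the gradient values and the exact way weight decay is folded into the gradient are illustrative assumptions, not taken from the patent:

```python
import numpy as np

# One SGD-with-momentum step: lr 0.001, momentum 0.9, weight decay 0.0001.
def sgd_step(w, grad, velocity, lr=0.001, momentum=0.9, weight_decay=0.0001):
    grad = grad + weight_decay * w            # L2 weight decay added to gradient
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

w = np.array([1.0, -2.0])
v = np.zeros_like(w)
w, v = sgd_step(w, np.array([0.5, -0.5]), v)
```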
S3. Detect the aerial road vehicle image to be detected using the trained model, identifying highways and vehicles respectively. Following the preceding implementation steps, this mainly comprises:
S31. Input the aerial road vehicle image to be identified.
S32. Extract image features at different levels through the global image feature sub-network ResNet101 and output the feature maps.
S33. Feed the feature maps output by step S32 into the candidate target position proposal sub-network, determine the candidate positions where targets occur, and output a number of candidate position windows together with their probabilities of being foreground targets.
S34. Further process the top 256 target positions ranked by the foreground probability output in step S33. The feature maps corresponding to the proposed candidate target positions are uniformly scaled by the recognition SPP-layer of the target scale adaptation sub-network to the specific size given in part 3) of step S2.
S35. Feed the feature maps output by step S34 into the vehicle recognition sub-network and the highway recognition sub-network for recognition, outputting the vehicle and highway recognition probabilities and the target positions after optimising regression.
S36. Further process the target windows of step S35 whose recognition probability exceeds 0.7. The feature maps corresponding to the regressed target positions are uniformly scaled by the segmentation SPP-layer of the target scale adaptation sub-network to the specific size given in part 3) of step S2.
S37. Feed the feature maps output by step S36 into the vehicle segmentation sub-network and the highway segmentation sub-network for detection, outputting the vehicle segmentation mask heat map and the highway segmentation mask heat map respectively.
S38. Binarise the segmentation mask heat maps output by step S37 with a threshold of 0.5, outputting segmentation masks in 0-1 matrix form.
S4. According to the detection results of step S3, count the vehicles in the highway region.
In this implementation, when counting, the criterion for judging whether a vehicle is located in the highway region is that the overlapping area of the vehicle's segmentation mask with the highway's segmentation mask should exceed 30%. A vehicle appearing on two highway segments is counted only once.
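The binarisation of step S38 and the counting rule above can be sketched together; the function name, array shapes and the reading of "30%" as a fraction of the vehicle mask's area are illustrative assumptions:

```python
import numpy as np

# Binarise the sigmoid heat maps at 0.5 (step S38) and count a vehicle when the
# overlap of its 0-1 mask with the highway 0-1 mask exceeds 30% of the vehicle
# mask's area; each vehicle contributes at most one count even if it spans two
# highway segments.
def count_vehicles(vehicle_heatmaps, highway_heatmap,
                   bin_thresh=0.5, overlap_thresh=0.3):
    road = (highway_heatmap >= bin_thresh).astype(np.uint8)
    count = 0
    for hm in vehicle_heatmaps:
        m = (hm >= bin_thresh).astype(np.uint8)
        area = m.sum()
        if area > 0 and (m & road).sum() / area > overlap_thresh:
            count += 1
    return count
```

For instance, with a highway mask covering the left half of a 10 × 10 grid, a vehicle heat map lying entirely on that half is counted while one on the right half is not.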
The computer used in this example is equipped with an Intel i7-8700K CPU (base frequency 3.7 GHz, 6 cores, 12 threads) and 32 GB of memory; the core computing units are 3 GeForce GTX 1080Ti graphics cards, each with a core frequency of 1.62 GHz, 11 GB of video memory and 3584 CUDA cores. Training on this example dataset, using the 3 GPUs in parallel, took 13-14 days. At detection time, using a single GPU, the detection time for a single image is about 800 ms. On the validation set, the trained model achieves a recognition accuracy > 93% and a recall > 95%.
A deep-learning-based aerial road vehicle counting device as shown in Figure 3 may comprise:
an image acquisition module, for obtaining training sample images and images to be detected;
an image preprocessing module, for applying one or more operations such as random scale change, random flipping, random brightness adjustment, random contrast setting and random saturation adjustment to the acquired sample images, forming new samples so as to improve sample diversity;
a training module, for training the constructed deep neural network model, whose network structure comprises a base network part and a head task network part; the base network part comprises a global image feature extraction sub-network unit, a candidate target position proposal sub-network unit and a target scale adaptation sub-network unit, while the head task network part comprises a vehicle recognition sub-network unit, a vehicle segmentation sub-network unit, a highway recognition sub-network unit and a highway segmentation sub-network unit;
a detection module, for detecting images to be detected using the deep neural network model obtained by training in advance, identifying the highways and vehicle targets therein;
a counting module, for counting the vehicles on the highway.
The principle of each module of the counting device is similar to the counting method described above, and the overlapping parts are not repeated.
The main theory underlying the counting method and device of this embodiment is the following. A large body of deep learning research holds that the features of an image range from pixels at the shallowest level, to linear structures one level deeper, such as straight lines and curves of various shapes, to planar structures at the next level, such as rectangles, triangles and circles, and then to aggregations of multiple planar structures a level deeper still, forming the various texture features. That is, more abstract high-level features are formed by combining low-level features, and the various high-level features finally constitute the various practically meaningful target categories that the human eye can identify. Therefore, both highways and vehicles are composed of such structures at levels from shallow to deep, and in the low-level features at their base the two should share a large amount of reusable common characteristics.
In summary, for the aerial road vehicle counting problem, the present invention makes full use of the commonality of basic image features in deep learning, divides the deep neural network model into two large parts, a base network and a head task network, and proposes a deep-learning-based aerial road vehicle counting method and device. The core of this method and device, i.e. the core content of the invention, is the proposed deep-learning-based multi-task model for road vehicle recognition. The model shares a deep image feature extraction network and can recognise highway and vehicle targets simultaneously; on this basis, the vehicles in the highway region are counted, realising accurate vehicle counting. In particular, the candidate target position proposal sub-network is divided into a base feature extraction part and two head task parts, the target recognition head and the position regression head; the vehicle recognition sub-network is likewise divided into a base part and head task parts, the vehicle classification head and the vehicle position head; and the highway recognition sub-network is likewise divided into a base part and head task parts, the highway classification head and the highway position head. This shared-base-network approach greatly saves computing resources and improves computational efficiency, and attaching two different types of task, vehicle recognition and highway recognition, to the same network further saves resources and improves efficiency. Moreover, the basic technique of the invention is deep learning, which itself has outstanding descriptive power for images; compared with traditional shallow models it can describe target features more fully, greatly improving recognition accuracy and recall and thus making vehicle counting more accurate.
The above contains the description of the preferred embodiments of the present invention; it serves to describe the technical features of the invention in detail and is not intended to limit the content of the invention to the concrete forms described in the embodiments. Other modifications and variations made according to the purport of the present content are also protected by this patent. The purport of the content of the invention is defined by the claims, rather than by the specific description of the embodiments.
Claims (9)
1. A deep-learning-based aerial road vehicle counting method, characterised by comprising the following steps:
S1. obtaining aerial road vehicle sample images under a variety of imaging conditions to form a sample database, and preprocessing the aerial road vehicle sample images in the sample database;
S2. constructing a deep neural network model, and training the constructed deep neural network model with the preprocessed aerial road vehicle sample images;
S3. detecting an aerial road vehicle image to be detected using the deep neural network model, and outputting the object recognition probabilities, object positions and object region segmentations of the detected highways and vehicles;
S4. counting the vehicles located in the highway region according to the identified highway and vehicle results, and outputting the count results.
2. The deep-learning-based aerial road vehicle counting method according to claim 1, characterised in that in step S2, the network structure of the constructed deep neural network model comprises a base network and a head task network;
the base network comprises:
a global image feature extraction sub-network, composed of several convolutional layers and pooling layers, for extracting shallow-to-deep depth feature maps of the image;
a candidate target position proposal sub-network, composed of several convolutional layers and output layers, which slides windows of preset size and stride over the depth feature maps, discriminates the feature maps under the different windows, and outputs the candidate positions of targets;
a target scale adaptation sub-network, composed of multiple pooling layers, which further extracts the feature maps corresponding to the proposed windows of different scales into feature maps of fixed size, to meet the fixed input dimension required by the fully connected layers of the head task network;
the head task network comprises:
a vehicle recognition sub-network, composed of multiple fully connected layers, which identifies whether a proposed candidate target window feature map is a vehicle target, and outputs the position of the window in the original image and the recognition probability, i.e. the position of the vehicle in the image and the vehicle recognition probability;
a vehicle segmentation sub-network, composed of multiple convolutional layers, which performs pixel-level segmentation on the proposed candidate target window feature maps and outputs the probability that each pixel is a vehicle, forming the vehicle segmentation mask heat map, i.e. the segmentation mask of the vehicle;
a highway recognition sub-network, composed of multiple fully connected layers, which identifies whether a proposed candidate target window feature map is a highway target, and outputs the position of the window in the original image and the recognition probability, i.e. the position of the highway in the image and the highway recognition probability;
a highway segmentation sub-network, composed of multiple convolutional layers, which performs pixel-level segmentation on the proposed candidate target window feature maps and outputs the probability that each pixel is a highway, forming the highway segmentation mask heat map, i.e. the segmentation mask of the highway.
3. The deep-learning-based aerial road vehicle counting method according to claim 1, characterised in that in step S2, the training of the deep neural network model specifically comprises:
S21. annotating the sample images in the preprocessed sample database to obtain labelled aerial road vehicle sample images, and obtaining the annotation result of each labelled aerial road vehicle sample image;
S22. classifying and segmenting the labelled aerial road vehicle sample images using the deep neural network model with default initial parameters, obtaining the output results for the labelled aerial road vehicle sample images;
S23. propagating the error between the output results of the deep neural network model and the annotation results backwards, layer by layer from back to front, through the entire deep neural network model, updating all parameters of the deep neural network model;
S24. reading new labelled aerial road vehicle sample images, and classifying and segmenting them using the deep neural network model with updated parameters;
S25. judging whether the convergence condition or the stopping condition is met; if so, stopping training; if not, returning to step S23 and continuing training.
4. The deep-learning-based aerial road vehicle counting method according to claim 1, 2 or 3, characterised in that in step S3:
the highway and vehicle object recognition probabilities comprise the recognition probability of the highway and the recognition probability of the vehicle;
the object positions comprise the position of the highway in the image and the position of the vehicle in the image;
the object region segmentations comprise the segmentation mask of the highway and the segmentation mask of the vehicle.
5. The deep-learning-based aerial road vehicle counting method according to claim 1, 2 or 3, characterised in that in step S3, detecting the aerial road vehicle image to be detected using the deep neural network model specifically comprises:
S31. inputting the aerial road vehicle image to be detected;
S32. processing the input image through the global image feature extraction sub-network, extracting image features at different levels and outputting feature maps;
S33. feeding the output feature maps into the candidate target position proposal sub-network, determining the candidate positions where targets occur and outputting a number of candidate position windows;
S34. for the output candidate position windows, further extracting, by the target scale adaptation sub-network, the feature maps corresponding to the windows of different scales into feature maps of fixed size;
S35. feeding the output fixed-size feature maps into the vehicle recognition sub-network and the highway recognition sub-network for recognition, outputting the recognition probabilities and the target positions after optimising regression;
S36. for the windows of S35 whose vehicle recognition probability meets the vehicle judgment preset value, further extracting their corresponding feature maps into fixed-size feature maps through the target scale adaptation sub-network, and for the windows whose highway recognition probability meets the highway judgment preset value, further extracting their corresponding feature maps into fixed-size feature maps through the target scale adaptation sub-network;
S37. feeding the fixed-size feature maps output by S36 into the vehicle segmentation sub-network and the highway segmentation sub-network respectively, outputting the vehicle segmentation masks and highway segmentation masks.
6. The deep-learning-based aerial road vehicle counting method according to claim 1, 2 or 3, characterised in that in step S4, for the counting of vehicles located in the highway region, the criterion for judging whether a vehicle is located in the highway region is that the overlapping area of the vehicle's segmentation mask with the highway's segmentation mask should exceed a predetermined threshold, a typical value of which is 30%.
7. The deep-learning-based aerial road vehicle counting method according to claim 1, 2 or 3, characterised in that in step S1, preprocessing the aerial road vehicle sample images in the sample database comprises at least one of the operations of random scale change, random flipping, random brightness adjustment, random contrast setting and random saturation adjustment.
8. The deep-learning-based aerial road vehicle counting method according to claim 1, 2 or 3, characterised in that in step S1:
the various imaging regions of the aerial road vehicle sample images include: urban, suburban and rural two-way four-lane and two-way six-lane roads without a central median strip, two-way six-lane and two-way eight-lane roads with a central median strip, crossroads, and T-junctions;
the various imaging conditions of the aerial road vehicle sample images include: sunny, cloudy and overcast weather, which cause brightness changes; slight, moderate and severe haze, which cause saturation changes; dawn and dusk, which cause chromaticity changes; and light and moderate rain, which cause partial occlusion.
9. A deep-learning-based aerial road vehicle counting device, characterised by comprising:
an image acquisition module, for obtaining aerial road vehicle images;
an image preprocessing module, for preprocessing the sample images participating in training;
a training module, for training the deep neural network model of any one of claims 1 to 8;
a detection module, for detecting the images to be detected obtained by the image acquisition module using the deep neural network model obtained by training in advance, identifying the highways and vehicle targets therein;
a counting module, for counting the vehicles on the highway.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811054125.7A CN108710875B (en) | 2018-09-11 | 2018-09-11 | A kind of take photo by plane road vehicle method of counting and device based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108710875A true CN108710875A (en) | 2018-10-26 |
CN108710875B CN108710875B (en) | 2019-01-08 |
Family
ID=63873568
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811054125.7A Active CN108710875B (en) | 2018-09-11 | 2018-09-11 | A kind of take photo by plane road vehicle method of counting and device based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108710875B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111768629B (en) * | 2019-11-04 | 2022-04-12 | 北京京东乾石科技有限公司 | Vehicle scheduling method, device and system |
CN113807270A (en) | 2021-09-22 | 2021-12-17 | 北京百度网讯科技有限公司 | Road congestion detection method and device and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101404119A (en) * | 2008-11-18 | 2009-04-08 | 北京交通大学 | Method for detecting and counting urban road vehicle by utilizing remote sensing image |
CN102855759A (en) * | 2012-07-05 | 2013-01-02 | 中国科学院遥感应用研究所 | Automatic collecting method of high-resolution satellite remote sensing traffic flow information |
CN106097353A (en) * | 2016-06-15 | 2016-11-09 | 北京市商汤科技开发有限公司 | The method for segmenting objects merged based on multi-level regional area and device, calculating equipment |
CN106373397A (en) * | 2016-09-28 | 2017-02-01 | 哈尔滨工业大学 | Fuzzy neural network-based remote sensing image road traffic situation analysis method |
CN107122798A (en) * | 2017-04-17 | 2017-09-01 | 深圳市淘米科技有限公司 | Chin-up count detection method and device based on depth convolutional network |
CN108052966A (en) * | 2017-12-08 | 2018-05-18 | 重庆邮电大学 | Remote sensing images scene based on convolutional neural networks automatically extracts and sorting technique |
WO2018120013A1 (en) * | 2016-12-30 | 2018-07-05 | Nokia Technologies Oy | Artificial neural network |
- 2018-09-11 — CN application CN201811054125.7A, granted as CN108710875B (Active)
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109657596A (en) * | 2018-12-12 | 2019-04-19 | 天津卡达克数据有限公司 | Deep-learning-based vehicle exterior component recognition method |
CN111325061A (en) * | 2018-12-14 | 2020-06-23 | 顺丰科技有限公司 | Vehicle detection algorithm, device and storage medium based on deep learning |
CN111325061B (en) * | 2018-12-14 | 2023-05-23 | 顺丰科技有限公司 | Vehicle detection algorithm, device and storage medium based on deep learning |
CN109766775A (en) * | 2018-12-18 | 2019-05-17 | 四川大学 | Vehicle detection system based on deep convolutional neural networks |
CN109800712B (en) * | 2019-01-21 | 2023-04-21 | 成都快眼科技有限公司 | Vehicle detection counting method and device based on deep convolutional neural network |
CN109800712A (en) * | 2019-01-21 | 2019-05-24 | 成都快眼科技有限公司 | Vehicle detection and counting method and device based on deep convolutional neural networks |
CN111507166A (en) * | 2019-01-31 | 2020-08-07 | 斯特拉德视觉公司 | Method and apparatus for learning CNN by using camera and radar together |
CN111507166B (en) * | 2019-01-31 | 2023-08-29 | 斯特拉德视觉公司 | Method and apparatus for learning CNN by using camera and radar together |
CN112204613A (en) * | 2019-02-01 | 2021-01-08 | 株式会社计数技研 | Counting device, learning device manufacturing device, counting method, and learning device manufacturing method |
CN109934161A (en) * | 2019-03-12 | 2019-06-25 | 天津瑟威兰斯科技有限公司 | Vehicle identification and detection method and system based on convolutional neural network |
CN110490082B (en) * | 2019-07-23 | 2022-04-05 | 浙江科技学院 | Road scene semantic segmentation method capable of effectively fusing neural network features |
CN110490082A (en) * | 2019-07-23 | 2019-11-22 | 浙江科技学院 | Road scene semantic segmentation method effectively fusing neural network features |
CN110472593A (en) * | 2019-08-20 | 2019-11-19 | 重庆紫光华山智安科技有限公司 | Training image acquisition methods, model training method and relevant apparatus |
CN110705544A (en) * | 2019-09-05 | 2020-01-17 | 中国民航大学 | Adaptive fast target detection method based on Faster-RCNN |
CN110705544B (en) * | 2019-09-05 | 2023-04-07 | 中国民航大学 | Adaptive fast target detection method based on Faster-RCNN |
CN110689720A (en) * | 2019-10-10 | 2020-01-14 | 成都携恩科技有限公司 | Real-time dynamic traffic flow detection method based on unmanned aerial vehicle |
CN112802027A (en) * | 2019-11-13 | 2021-05-14 | 成都天府新区光启未来技术研究院 | Target object analysis method, storage medium and electronic device |
CN110838119B (en) * | 2019-11-15 | 2022-03-04 | 珠海全志科技股份有限公司 | Human face image quality evaluation method, computer device and computer readable storage medium |
CN110838119A (en) * | 2019-11-15 | 2020-02-25 | 珠海全志科技股份有限公司 | Human face image quality evaluation method, computer device and computer readable storage medium |
CN110992714A (en) * | 2019-12-18 | 2020-04-10 | 佛山科学技术学院 | Intelligent traffic signal lamp control method and system |
CN111178213A (en) * | 2019-12-23 | 2020-05-19 | 大连理工大学 | Aerial photography vehicle detection method based on deep learning |
CN111178213B (en) * | 2019-12-23 | 2022-11-18 | 大连理工大学 | Aerial photography vehicle detection method based on deep learning |
CN111178279B (en) * | 2019-12-31 | 2023-09-05 | 合肥湛达智能科技有限公司 | Vehicle detection method based on binary network grouping training |
CN111178279A (en) * | 2019-12-31 | 2020-05-19 | 合肥湛达智能科技有限公司 | Vehicle detection method based on binarization network grouping training |
CN111310591A (en) * | 2020-01-20 | 2020-06-19 | 复旦大学 | Multi-type sample data making device and method |
CN111209894A (en) * | 2020-02-10 | 2020-05-29 | 上海翼枭航空科技有限公司 | Roadside illegal building identification method for road aerial image |
CN112215070A (en) * | 2020-09-10 | 2021-01-12 | 佛山聚卓科技有限公司 | Traffic flow statistics method, host and system for UAV aerial video |
CN112330682A (en) * | 2020-11-09 | 2021-02-05 | 重庆邮电大学 | Industrial CT image segmentation method based on deep convolutional neural network |
CN113076811B (en) * | 2021-03-12 | 2023-08-15 | 中山大学 | Aviation image road extraction method and device |
CN113076811A (en) * | 2021-03-12 | 2021-07-06 | 中山大学 | Aviation image road extraction method and equipment |
CN113159044A (en) * | 2021-04-01 | 2021-07-23 | 中国公路工程咨询集团有限公司 | Deep learning-based road material identification method for convolutional neural network |
CN113537089A (en) * | 2021-07-20 | 2021-10-22 | 浙江点创信息科技有限公司 | Method for identifying and locating pine-wood-nematode-infected trees from raw UAV aerial images |
CN113709006A (en) * | 2021-10-29 | 2021-11-26 | 上海闪马智能科技有限公司 | Flow determination method and device, storage medium and electronic device |
CN114781768B (en) * | 2022-06-23 | 2022-09-16 | 深圳云停智能交通技术研究院有限公司 | Parking lot facility planning method, device and equipment based on urban resource data |
CN114781768A (en) * | 2022-06-23 | 2022-07-22 | 深圳云停智能交通技术研究院有限公司 | Parking lot facility planning method, device and equipment based on urban resource data |
CN115482474A (en) * | 2022-08-24 | 2022-12-16 | 湖南科技大学 | Bridge deck vehicle load identification method and system based on high-altitude aerial image |
Also Published As
Publication number | Publication date |
---|---|
CN108710875B (en) | 2019-01-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108710875B (en) | Deep-learning-based aerial road vehicle counting method and device | |
CN112561146B (en) | Large-scale real-time traffic flow prediction method based on fuzzy logic and depth LSTM | |
EP3614308B1 (en) | Joint deep learning for land cover and land use classification | |
CN108920481B (en) | Road network reconstruction method and system based on mobile phone positioning data | |
CN105046235B (en) | Lane line recognition modeling method and device, and recognition method and device |
CN105160309B (en) | Three lanes detection method based on morphological image segmentation and region growing | |
Varma et al. | Real time detection of speed hump/bump and distance estimation with deep learning using GPU and ZED stereo camera | |
CN103500338B (en) | Road zebra crossing extraction method based on Vehicle-borne Laser Scanning point cloud | |
Tan et al. | Vehicle detection in high resolution satellite remote sensing images based on deep learning | |
Tao et al. | Scene context-driven vehicle detection in high-resolution aerial images | |
US11430087B2 (en) | Using maps comprising covariances in multi-resolution voxels | |
CN115605777A (en) | Dynamic target point cloud rapid identification and point cloud segmentation method based on road side sensing unit | |
CN109447160A (en) | Method for automatically matching image and vector road junctions |
US11288861B2 (en) | Maps comprising covariances in multi-resolution voxels | |
Li et al. | Gaofen-3 sea ice detection based on deep learning | |
CN109242019A (en) | Fast detection and tracking method for small optical targets on the water surface |
CN113469097B (en) | Multi-camera real-time detection method for water surface floaters based on SSD network | |
CN104331708B (en) | Automatic zebra-crossing detection and analysis method and system |
CN117636268A (en) | Unmanned aerial vehicle aerial natural driving data set construction method oriented to ice and snow environment | |
Ebren et al. | Determining the occupancy of Vehicle Parking Areas by deep learning | |
Plachetka et al. | 3DHD CityScenes: high-definition maps in high-density point clouds | |
CN111639672B (en) | Deep learning city function classification method based on majority voting | |
Jain et al. | Airborne vehicle detection with wrong-way drivers based on optical flow | |
Vishnyakov et al. | Semantic scene understanding for the autonomous platform | |
Li et al. | Classification of pavement disease 3D point cloud images based on deep learning network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP01 | Change in the name or title of a patent holder | ||
Address after: 410000 1501, building 1, Xincheng Science Park, Lugu street, Yuelu District, Changsha City, Hunan Province
Patentee after: Hunan Kunpeng Zhihui Technology Co.,Ltd.
Address before: 410000 1501, building 1, Xincheng Science Park, Lugu street, Yuelu District, Changsha City, Hunan Province
Patentee before: HUNAN KUNPENG ZHIHUI UNMANNED PLANE TECHNOLOGY CO.,LTD.