CN107730903A - Vision detection system for illegally parked and broken-down vehicles based on deep convolutional neural networks - Google Patents

Vision detection system for illegally parked and broken-down vehicles based on deep convolutional neural networks

Info

Publication number: CN107730903A
Application number: CN201710440988.7A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 汤平, 汤一平, 吴越, 钱小鸿, 柳展
Current Assignee: Enjoyor Co Ltd
Original Assignee: Enjoyor Co Ltd
Application filed by Enjoyor Co Ltd
Legal status: Pending

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G 1/00 Traffic control systems for road vehicles
    • G08G 1/01 Detecting movement of traffic to be counted or controlled
    • G08G 1/017 Detecting movement of traffic to be counted or controlled identifying vehicles
    • G08G 1/0175 Detecting movement of traffic to be counted or controlled identifying vehicles by photographing vehicles, e.g. when violating traffic rules
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 20/584 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/08 Detecting or categorising vehicles


Abstract

A vision detection system for illegally parked and broken-down vehicles based on deep convolutional neural networks, comprising cameras installed on urban roads, a traffic cloud server, and an automatic road traffic incident detection system. The system extracts the vehicles on the road with deep convolutional neural network technology, then applies optical flow computation to judge whether a detected vehicle is stationary; if a stationary vehicle exists and its stationary time exceeds a dwell-time threshold, an illegal parking event is declared. The result is then published in real time through WebGIS, radio broadcast and roadside warning boards, so that traffic police can quickly arrange the removal of the traffic obstacle, and following vehicles are reminded of the incident on the road ahead and can take measures as early as possible, avoiding secondary accidents. The invention provides a vision detection system for illegally parked and broken-down vehicles based on deep convolutional neural networks with better robustness and higher recognition accuracy.

Description

Vision detection system for illegally parked and broken-down vehicles based on deep convolutional neural networks
Technical field
The present invention relates to the application of artificial intelligence, convolutional neural networks and computer vision to the detection of illegally parked and broken-down vehicles, and belongs to the field of intelligent transportation.
Background technology
Traffic problems have become a worldwide "urban disease", and illegal parking is a principal manifestation of the city's "traffic illness". The "causes" of urban illegal parking are many; illegally parked and broken-down vehicles directly affect the smoothness of the road and the safety of pedestrians, and illegal parking has always been a persistent ailment of traffic administration. How to regulate driver behavior, and how to promptly and accurately collect evidence of and investigate behavior that violates traffic rules, is a key issue of road monitoring.
Statistical analysis of road traffic accidents shows that traffic incidents have the following features: 1) 20%~50% of traffic accidents are caused by the influence of traffic incidents that have already occurred; the incident that occurred first is called the primary incident, and the subsequent accident is called the secondary accident; 2) more than 50% of secondary accidents occur within 10 minutes after the primary incident; in many cases these secondary accidents are caused by a small-scale incident, and if the primary incident information were provided in advance to the drivers of approaching vehicles, these secondary accidents could be avoided; 3) on urban roads, traffic congestion caused by traffic accidents and broken-down vehicles accounts for 20% of all congestion, and this kind of congestion lasts longer.
Therefore, by using an automatic traffic incident detection system to detect traffic incidents quickly and accurately, finding incidents in time, taking timely and effective measures to handle and eliminate them, and announcing the incident information to the drivers of following vehicles, the occurrence of traffic congestion and secondary traffic accidents can be reduced or avoided.
Chinese invention patent application No. 201310020965.2 discloses a method for detecting illegal parking, comprising: detecting a region of the video image in which a moving image exists; extracting feature points of that region; and, if the extracted feature points match a set of pre-recorded reference feature points, determining that illegal parking exists. It is said that this method can effectively identify whether the moving image present in the video image is consistent with the feature points of previously recorded images, and thereby judge whether illegal parking exists.
Chinese invention patent application No. 201310020978.X discloses an illegal parking detection method, comprising the steps of: specifying the no-parking region of each frame in the collected video sequence; performing foreground detection and detecting targets in the foreground; tracking the detected targets and judging whether a target enters the no-parking region: if so, computing the color histogram of the no-parking region, otherwise continuing to judge; monitoring the duration after the color histogram changes and judging whether the duration exceeds a preset time: if so, judging that illegal parking exists in the no-parking region, otherwise judging that it does not.
Chinese invention patent application No. 201310251206.7 discloses a traffic violation detection method based on video technology, comprising the following steps: 1) load the video image of the current frame and dynamically update the background; 2) filter the target image and perform background subtraction to obtain the foreground image; 3) binarize the foreground image; 4) judge the binary image to determine whether there is a vehicle in the set detection zone: if not, end the processing of this frame and move to the next frame; if there is a vehicle, compute the centroid position M of the vehicle; 5) judge the violation behavior of the vehicle according to the centroid position.
Chinese invention patent application No. 201410549079.3 discloses an illegal parking detection method based on image texture, which comprises, in order, background acquisition, target segmentation and texture analysis. Background acquisition mainly refers to determining the detection zone, demarcating the detection zone, building the detection zone model and updating the background. Target segmentation mainly refers to dynamic threshold selection, target segmentation and morphological filtering: with a dynamic-threshold segmentation method, moving targets can be segmented more accurately, while the opening operation in morphological filtering can effectively remove the noise produced during segmentation. Texture analysis mainly refers to static target detection, stopped-vehicle identification and illegal parking event confirmation: by analyzing the process by which image gray values reach a steady state, it is judged whether a static target appears, while false alarms arising in this process are removed via the entropy of the image gray-level co-occurrence matrix.
Chinese invention patent application No. 201610387236.4 discloses an illegal parking detection method and device, comprising the steps of: performing vehicle detection on the no-parking region of the input current video frame to obtain marked detection boxes, and writing all detection boxes of the current frame into the current queue; comparing the similarity of all detection boxes in the current queue with all detection boxes in a candidate queue; when the similarity between a detection box in the candidate queue and any detection box in the current queue is greater than or equal to a first threshold, judging that the detection box appears in the current video frame, otherwise judging that it does not; counting the probability that a detection box in the candidate queue appears in consecutive video frames; when the probability is greater than or equal to a set probability P0, judging the vehicle in that detection box to be an illegally parked vehicle; and when the similarity between a detection box in the current queue and all detection boxes in the candidate queue is below the first threshold, adding that detection box to the candidate queue.
Chinese invention patent application No. 201510064429.1 discloses an intelligent-vision-based illegal parking detection system and method. In the detection system, the output of the vision sensor nodes is connected to an image processing terminal device and a video library management device respectively; the image processing terminal device is connected to the video library management device; both are connected to a database, and the database is connected to a geographic information system. The vision sensor nodes, image processing terminal device, video library management device, database and geographic information system form an image processing private network, which is connected to a data private network through a secure access platform; a client management system is arranged in the data private network. The system automatically detects illegal parking behavior, automatically records violation video and processes incidents promptly, avoids triggering "secondary accidents" caused by processing lag, guarantees the security of data, and effectively protects the integrity and confidentiality of data.
The above vision-based illegal parking detection methods mainly suffer from problems in the following aspects:
1) Robustness: complex and changeable traffic environments place adaptability demands on video analysis, such as illumination changes (daytime, night, dusk, cloudy, sunny), slight camera jitter caused by wind, rain, snow, fog, shadows (the vehicle body's shadow and the shadows of stationary roadside objects) and frequent occlusion; all of these affect the precision of vehicle target segmentation. Methods based on background modeling and motion segmentation are not robust enough for accurate detection and tracking of moving vehicles in traffic scenes.
2) Feature extraction: recognition accuracy depends heavily on the modeling of foreground objects; the core problem of modeling is the description and expression of features, and its difficulty is precisely feature selection.
Because the above vision detection techniques predate the deep-learning era, the robustness and precision of detection remain a bottleneck for the development of the field.
In recent years, deep learning has developed rapidly in computer vision. Deep learning can use large numbers of training samples and deep hidden layers to learn the abstract information of images layer by layer, obtaining image features more comprehensively and directly. A digital image is described by a matrix, and convolutional neural networks describe the overall structure of an image well from local information blocks, so deep-learning methods in computer vision mostly solve problems with convolutional neural networks. To keep improving detection precision and detection time, deep convolutional neural network technology has progressed from R-CNN and Fast R-CNN to Faster R-CNN, which is more precise, faster, end-to-end and more practical, and covers almost every field from classification to detection, segmentation and localization. Applying deep-learning technology to the vision detection of illegally parked and broken-down vehicles is therefore a research field of great practical application value.
Because illegally parked and broken-down vehicles are very similar in visual signature, namely a vehicle that stays stationary on the road for a long time with no other stationary vehicles around it, and their harm to traffic is also similar, the illegal-parking vision detection described below also covers the vision detection of broken-down vehicles.
When the human visual system perceives a moving target, the moving target forms a continuously varying image stream on the imaging plane of the visual system, called optical flow. Optical flow expresses how fast image pixels change over time; it is the apparent motion of the image brightness pattern in an image sequence, the instantaneous velocity field of the observed pixels on the surface of an object moving in space. The advantage of the optical flow method is that it provides rich information such as the relative speed, motion attitude and position, and surface texture structure of the moving target, and it can detect moving targets without any knowledge of the scene, even in complex scenes. Therefore, after the vehicles on the road have been detected, the optical flow method can distinguish moving vehicles from stationary ones.
On the other hand, the information consumers of illegal-parking and broken-down-vehicle vision detection can be roughly divided into two groups. One group is traffic managers: on receiving a detection reminder from the system, a traffic manager confirms the scene through WebGIS and generally arranges the rapid removal of the traffic obstacle. The other group is road traffic participants, for whom information is issued mainly to avoid secondary accidents, by promptly announcing the traffic incident that has occurred ahead through broadcast and roadside warning boards.
The content of the invention
To overcome the poor robustness and low recognition accuracy of existing detection methods for illegally parked and broken-down vehicles, the present invention provides a vision detection system for illegally parked and broken-down vehicles based on deep convolutional neural networks with better robustness and higher recognition accuracy.
The technical solution adopted by the present invention to solve the technical problem is:
A vision detection system for illegally parked and broken-down vehicles based on deep convolutional neural networks, comprising cameras installed on urban roads, a traffic cloud server and an automatic road traffic incident detection system;
The cameras are used to acquire video data of every road in the city; they are installed above the roads and transmit the road video image data to the traffic cloud server over the network;
The traffic cloud server receives the road video data from the cameras and passes it to the automatic road traffic incident detection system for detection and recognition; the detection results are finally stored in the cloud server and published via WebGIS to support traffic control and guidance, the quick response of on-site traffic police, and timely avoidance by following vehicles, so that secondary traffic accidents are avoided;
The automatic traffic incident detection system includes a road traffic incident detection module and a road incident release module;
The road traffic incident detection module includes a Fast R-CNN-based vehicle detection unit, an optical flow stationary-vehicle detection unit and an illegal parking judging unit;
The road incident release module publishes the traffic incidents occurring on the road: via WebGIS it publishes the vision detection situation at the incident site, so that traffic police can quickly arrange the removal of the traffic obstacles; via broadcast and roadside warning boards it promptly announces the traffic incident ahead, reminding following vehicles of the incident on the road ahead so that they can take measures as early as possible and secondary accidents are avoided.
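Before the individual units are detailed, the following Python sketch (illustrative only, not part of the patented system; detect_vehicles and publish_event are hypothetical placeholders, and sparse_flow, mean_flow_per_box and update_suspects refer to the helper sketches given later in this description) shows how the modules could fit together for each video frame:

```python
def process_frame(prev_gray, cur_gray, dt):
    """Schematic per-frame pipeline of the detection module: detect vehicles,
    measure their optical flow, accumulate stationary time, publish events."""
    boxes = detect_vehicles(cur_gray)              # Fast R-CNN detection unit
    pts, flows = sparse_flow(prev_gray, cur_gray)  # optical flow unit
    means = mean_flow_per_box(pts, flows, boxes)   # formula (9), per vehicle box
    events = update_suspects(boxes, means, dt)     # illegal parking judging unit
    for event in events:
        publish_event(event)                       # WebGIS / broadcast release
    return events
```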
Further, the Fast R-CNN-based vehicle detection unit is used to detect all vehicles in the video image; specifically, the motor vehicles on the road are rapidly segmented with a deep convolutional neural network and the spatial positions these vehicles occupy on the road are given;
The motor-vehicle segmentation and localization used here is composed of two models: one is a region proposal network (RPN) that generates RoIs; the other is the Fast R-CNN motor-vehicle target detection network. The structure of the detection unit is shown in Fig. 1.
The region proposal network (RPN) takes an image of any size as input and outputs a set of rectangular target proposal boxes, each with 4 position coordinates and a score. To generate region proposals, a small network is slid over the convolutional feature map output by the last shared convolutional layer; this network is fully connected to an n × n spatial window of the input feature map. Each sliding window is mapped to a low-dimensional vector, one per feature-map position, and this vector is fed into two sibling fully connected layers.
At each sliding-window position, k region proposals are predicted simultaneously, so the position-regression layer has 4k outputs, the coordinate encodings of the k bounding boxes. The classification layer outputs 2k scores, the estimated target/non-target probability of each proposal; it is implemented as a two-class softmax layer (the k scores could also be generated with logistic regression). The k proposals are parameterized relative to k reference boxes called anchors. Each anchor is centered at the sliding-window center and is associated with a scale and an aspect ratio; using 3 scales and 3 aspect ratios gives k = 9 anchors at each sliding position. For example, for a convolutional feature map of size w × h there are w × h × k anchors in total. The RPN structure is shown in Fig. 2.
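As a concrete illustration of the anchor enumeration just described, the following Python sketch (illustrative, not from the patent; the feature-map stride and the corner-coordinate convention are assumptions) generates the w × h × k anchor set:

```python
import numpy as np

def generate_anchors(feat_w, feat_h, stride=16,
                     scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Enumerate the k = 3 scales x 3 ratios = 9 anchors at every
    sliding-window position of a feat_w x feat_h feature map."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx = x * stride + stride // 2   # anchor center in image coordinates
            cy = y * stride + stride // 2
            for s in scales:                # box area is s * s
                for r in ratios:            # r = height / width
                    w = s / np.sqrt(r)
                    h = s * np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)                # shape (feat_w * feat_h * 9, 4)

print(generate_anchors(40, 30).shape)       # (10800, 4) = 40 * 30 * 9
```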
To train the RPN, a binary label is assigned to each anchor marking whether it is a target. A positive label is assigned to two kinds of anchors: (I) the anchor(s) with the highest Intersection-over-Union (IoU, intersection over union) overlap with some real target bounding box (Ground Truth, GT); (II) any anchor with IoU overlap above 0.7 with any GT box. Note that one GT box may assign positive labels to several anchors. A negative label is assigned to anchors whose IoU is below 0.3 for every GT box. Anchors that are neither positive nor negative contribute nothing to the training objective and are discarded.
Given these definitions, and following the multi-task loss in Fast R-CNN, the objective function is minimized. The loss function for an image is defined as

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)   (1)
Here, i is the index of an anchor and p_i is the predicted probability that anchor i is a target; the GT label p_i* is 1 if the anchor is positive and 0 if it is negative; t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box, and t_i* is the coordinate vector of the GT bounding box corresponding to a positive anchor; λ is a balancing weight, here λ = 10; N_cls is the normalization of the cls term, the mini-batch size, here N_cls = 256; N_reg is the normalization of the reg term, the number of anchor positions, N_reg = 2,400. The classification loss L_cls is the log loss over two classes, motor-vehicle target versus non-motor-vehicle target:

L_cls(p_i, p_i*) = -log[p_i* p_i + (1 - p_i*)(1 - p_i)]   (2)
For the regression loss L_reg, the following function is defined:

L_reg(t_i, t_i*) = R(t_i - t_i*)   (3)

where L_reg is the regression loss function and R is the robust loss function smooth L1, computed by formula (4):

smooth_L1(x) = 0.5 x^2  if |x| < 1;  |x| - 0.5  otherwise   (4)

where smooth_L1 is the smooth L1 loss function and x is the variable;
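A minimal numpy sketch of losses (1)-(4), assuming p and p_star are per-anchor arrays and t and t_star are (N, 4) coordinate arrays (illustrative, not from the patent):

```python
import numpy as np

def smooth_l1(x):
    """Robust loss of formula (4)."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=10.0, n_cls=256, n_reg=2400):
    """Multi-task loss of formula (1): log loss (2) plus smooth-L1 regression
    (3), the latter activated only for positive anchors (p_star = 1)."""
    eps = 1e-7
    cls = -np.log(np.clip(p_star * p + (1 - p_star) * (1 - p), eps, 1.0)).sum()
    reg = (p_star[:, None] * smooth_l1(t - t_star)).sum()
    return cls / n_cls + lam * reg / n_reg
```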
The Fast R-CNN network structure is shown in Fig. 3: a feature map is obtained after the input image passes through the deep convolutional network, the corresponding RoIs are obtained from the feature map and the RPN, and the RoIs finally pass through the RoI pooling layer. This layer is a single-level spatial "pyramid" pooling. Its input is N feature maps and R RoIs; the N feature maps come from the last convolutional layer, each of size w × h × c. Each RoI is a tuple (n, r, c, h, w), where n is the feature-map index, n ∈ (0, 1, 2, ..., N−1), r, c is the top-left coordinate, and h, w are the height and width respectively. The output is the feature map obtained by max pooling. The layer has two main functions: to map each RoI in the original image to its block of the feature map, and to downsample the feature map to a fixed size before passing it to the fully connected layers.
Further, the region proposal network shares the weights of the detection network. The RPN and Fast R-CNN are trained independently and would modify their convolutional layers in different ways, so a technique is needed that allows the two networks to share convolutional layers, rather than learning two separate networks. The invention uses a practical 4-step training algorithm that learns shared features by alternating optimization. Step 1: train the RPN as described above, initialized with an ImageNet pre-trained model and fine-tuned end-to-end for the region proposal task. Step 2: train a separate detection network with Fast R-CNN using the proposals generated by the step-1 RPN, also initialized with the ImageNet pre-trained model; at this point the two networks do not yet share convolutional layers. Step 3: initialize RPN training from the detection network, fix the shared convolutional layers and fine-tune only the layers exclusive to the RPN; the two networks now share convolutional layers. Step 4: keeping the shared convolutional layers fixed, fine-tune the fc (fully connected) layers of Fast R-CNN. The two networks thus share the same convolutional layers and form one unified network.
To deal with multiple object scales, three simple scales are used for each feature point on the feature map, with bounding-box areas of 128 × 128, 256 × 256 and 512 × 512, and three aspect ratios, 1:1, 1:2 and 2:1. With this design, neither multi-scale features nor multi-scale sliding windows are needed to predict large regions, which saves considerable running time.
Through the processing of the above two networks, the motor vehicles in a video frame are detected and their size and spatial position delimited; that is, the size and spatial position of each vehicle are obtained, where r, c is the top-left coordinate of the vehicle in the image and h, w are the projected height and width of the vehicle in the image plane. It then remains to judge whether these motor vehicles are stationary;
Further, the optical flow stationary-vehicle detection unit judges whether a vehicle on the road is stationary. When vehicles in the road scene move relative to the two-dimensional image plane, their projections on the image plane form motion; this motion, expressed as the flowing of brightness patterns in the image plane, is called optical flow. The optical flow method is an important method for analyzing motion sequence images: the optical flow contains the motion information of the vehicle targets in the image;
The present invention uses a sparse iterative Lucas-Kanade (LK) optical flow method based on a pyramid model. First, the pyramidal representation of an image is introduced. Suppose image I has size n_x × n_y. Define I^0 as the 0th-layer image, the image of highest resolution (the original image), with width n_x^0 = n_x and height n_y^0 = n_y. The pyramidal representation is then described recursively: I^L (L = 1, 2, ...) is computed from I^{L−1}, where I^{L−1} denotes the image at layer L−1 of the pyramid and I^L the image at layer L. Suppose image I^{L−1} has width n_x^{L−1} and height n_y^{L−1}; then image I^L can be expressed as

I^L(x, y) = (1/4) I^{L−1}(2x, 2y)
 + (1/8) [I^{L−1}(2x−1, 2y) + I^{L−1}(2x+1, 2y) + I^{L−1}(2x, 2y−1) + I^{L−1}(2x, 2y+1)]
 + (1/16) [I^{L−1}(2x−1, 2y−1) + I^{L−1}(2x+1, 2y−1) + I^{L−1}(2x−1, 2y+1) + I^{L−1}(2x+1, 2y+1)]   (5)

To simplify the formula, the boundary values of image I^{L−1} are defined by replication, e.g. I^{L−1}(−1, y) ≐ I^{L−1}(0, y) and I^{L−1}(x, −1) ≐ I^{L−1}(x, 0).
The points used in formula (5) must satisfy 0 ≤ 2x ≤ n_x^{L−1} − 1 and 0 ≤ 2y ≤ n_y^{L−1} − 1; therefore the width n_x^L and height n_y^L of image I^L must satisfy formula (6):

n_x^L ≤ (n_x^{L−1} + 1) / 2,   n_y^L ≤ (n_y^{L−1} + 1) / 2   (6)
The pyramid model {I^L}, L = 0, ..., L_m of image I is built with formulas (5) and (6). L_m is the height of the pyramid model, typically 2, 3 or 4; for ordinary images L_m > 4 is meaningless. Taking a 640 × 480 image as an example, layers 1, 2, 3 and 4 of its pyramid model have sizes 320 × 240, 160 × 120, 80 × 60 and 40 × 30 respectively;
The pyramid-based LK optical flow computation first searches for the matching point of a feature point in the top layer of the image pyramid model, then uses the result of layer k as the initial estimate for layer k−1 and searches for the matching point in layer k−1, iterating in this way down to layer 0 of the pyramid model, whereby the optical flow of the feature point is obtained;
The detection target of the optical flow method is: given two consecutive frames I and J, for a pixel u in image I, find its matching point v = u + d in image J, i.e. find the displacement vector d, computed by formula (7):

v = u + d = [u_x + d_x  u_y + d_y]^T   (7)

where u is a pixel in image I, v is the matching pixel in image J, and d is the displacement vector between the two;
First, the pyramid models {I^L}, L = 0, ..., L_m and {J^L}, L = 0, ..., L_m of images I and J are built; then the position u^L of pixel u in each pyramid layer of image I is computed for L = 0, ..., L_m; then, within a search window, the matching point v^{L_m} of u^{L_m} is computed in the top-layer image J^{L_m} of image J's pyramid model, and the displacement vector d^{L_m} is obtained;
The pyramid-based optical flow method then proceeds iteratively: suppose the displacement vector d^{L+1} of layer L+1 of the pyramid model is known; 2d^{L+1} is taken as the initial value for layer L, the matching point v^L of layer L is searched for in its neighborhood, and the displacement vector d^L of layer L is obtained;
After each layer's displacement vector d^L (L = 0, ..., L_m) has been computed by iteration, the final optical flow of the pixel is

d = Σ_{L=0}^{L_m} 2^L d^L   (8)

where d is the optical flow value of the pixel and d^L is its optical flow value at layer L;
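In practice the whole coarse-to-fine iteration of formulas (7)-(8) is available in OpenCV as cv2.calcOpticalFlowPyrLK; the sketch below (window size and corner-detection parameters are assumptions) tracks corner features between two frames:

```python
import cv2
import numpy as np

def sparse_flow(prev_gray, cur_gray, l_m=3):
    """Pyramidal Lucas-Kanade flow of detected corner features; returns the
    feature positions and their flow vectors d."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return np.empty((0, 2)), np.empty((0, 2))
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None,
                                              winSize=(21, 21), maxLevel=l_m)
    ok = status.ravel() == 1
    pts, nxt = pts.reshape(-1, 2)[ok], nxt.reshape(-1, 2)[ok]
    return pts, nxt - pts
```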
After the optical flow vector of each feature pixel in the image has been obtained, the frame of each vehicle in the two-dimensional image plane is taken from the motor vehicles and their occupied spatial positions detected by the vehicle detection unit; each frame is expressed by four data, the top-left position r, c and the height and width h, w. The average of the optical flow vectors of all feature points inside each frame is then computed by formula (9):

d̄ = (1/n) Σ_{i=1}^{n} d_i   (9)

where d̄ is the average optical flow vector inside a vehicle frame, d_i is the optical flow vector of a feature pixel inside the frame, and n is the number of feature pixels inside the frame;
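A minimal sketch of formula (9), assuming feature positions are (x, y) pairs and vehicle boxes are (r, c, h, w) with r, c the top-left corner (illustrative, not from the patent):

```python
import numpy as np

def mean_flow_per_box(points, flows, boxes):
    """Average flow magnitude of the feature points inside each vehicle box."""
    means = []
    for r, c, h, w in boxes:
        inside = ((points[:, 0] >= c) & (points[:, 0] < c + w) &
                  (points[:, 1] >= r) & (points[:, 1] < r + h))
        if inside.any():
            means.append(np.linalg.norm(flows[inside], axis=1).mean())
        else:
            means.append(0.0)   # no trackable texture inside: treat as still
    return np.array(means)
```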
After the average optical flow vector d̄ of a vehicle frame is computed, if the value is below a threshold T the frame is treated as a suspected stationary vehicle frame and timing of that frame begins. The invention expresses a suspected stationary vehicle frame with five data: the top-left position r, c, the height and width h, w, and the stationary time t_d. During the program loop, if a suspected stationary vehicle frame appears at the same position in two successive frames, its stationary time is accumulated: t_d ← t_d + t;
Visually, an illegally parked or broken-down vehicle is a vehicle that stays stationary on the road for a long time, with no other stationary vehicles around it. The road traffic incident detection module detects all stationary vehicles on the road, but to decide whether they are illegally parked or broken down it must further check whether these stationary vehicles have stayed for a long time;
The illegal parking judging unit judges whether there is an illegally parked or broken-down vehicle on the road. It first checks whether any suspected stationary vehicle frame exists; if none exists, the unit ends its processing. Otherwise it checks whether the position of the newly appeared suspected stationary vehicle frame coincides with an existing stationary vehicle frame: if they coincide, the suspected frame is merged into the stationary vehicle frame and the stationary time t_d of that frame is accumulated; otherwise a new stationary vehicle frame is created and timing of its stationary time t_d begins. The stationary time of each stationary vehicle frame is then examined: if t_d ≥ T_s, where T_s is the dwell-time threshold, here set to 5 minutes, i.e. if a suspected stationary vehicle frame has been stationary for more than 5 minutes, an illegally parked or broken-down vehicle is declared. If the vehicle in a suspected stationary frame moves, i.e. the average optical flow vector d̄ of the features inside the frame exceeds the threshold T, the suspected stationary frame is removed; otherwise the stationary frame is retained. The processing flow is shown in Fig. 5, and a sketch of this bookkeeping is given below.
The beneficial effects of the present invention are mainly: better robustness and higher recognition accuracy.
Brief description of the drawings
Fig. 1 is the Faster R-CNN structure diagram;
Fig. 2 is the region proposal network;
Fig. 3 is the Fast R-CNN structure diagram;
Fig. 4 is the road traffic incident vision detection flow chart;
Fig. 5 is the detailed flow chart of road traffic incident vision detection.
Embodiment
The invention will be further described below in conjunction with the accompanying drawings.
Referring to Figs. 1-4, a vision detection system for illegally parked and broken-down vehicles based on deep convolutional neural networks includes cameras installed on urban roads, a traffic cloud server and an automatic road traffic incident detection system.
The cameras are used to acquire video data of every road in the city; they are installed above the roads and transmit the road video image data to the traffic cloud server over the network;
The traffic cloud server receives the road video data from the cameras and passes it to the automatic road traffic incident detection system for detection and recognition; the detection results are finally stored in the cloud server and published via WebGIS to support traffic control and guidance, the quick response of on-site traffic police, and timely avoidance by following vehicles, so that secondary traffic accidents are avoided;
The automatic traffic incident detection system includes a road traffic incident detection module and a road incident release module;
The road traffic incident detection module includes a Fast R-CNN-based vehicle detection unit, an optical flow stationary-vehicle detection unit and an illegal parking judging unit;
The Fast R-CNN-based vehicle detection unit is used to detect all vehicles in the video image; specifically, the motor vehicles on the road are rapidly segmented with a deep convolutional neural network and the spatial positions these vehicles occupy on the road are given;
The motor-vehicle segmentation and localization used here is composed of two models: one is a region proposal network (RPN) that generates RoIs; the other is the Fast R-CNN motor-vehicle target detection network. The structure of the detection unit is shown in Fig. 1.
The RPN takes an image of any size as input and outputs a set of rectangular target proposal boxes, each with 4 position coordinates and a score. To generate region proposals, a small network is slid over the convolutional feature map output by the last shared convolutional layer; this network is fully connected to an n × n spatial window of the input feature map. Each sliding window is mapped to a low-dimensional vector, one per feature-map position, and this vector is fed into two sibling fully connected layers.
At each sliding-window position, k region proposals are predicted simultaneously, so the position-regression layer has 4k outputs, the coordinate encodings of the k bounding boxes. The classification layer outputs 2k scores, the estimated target/non-target probability of each proposal; it is implemented as a two-class softmax layer (the k scores could also be generated with logistic regression). The k proposals are parameterized relative to k reference boxes called anchors. Each anchor is centered at the sliding-window center and is associated with a scale and an aspect ratio; using 3 scales and 3 aspect ratios gives k = 9 anchors at each sliding position. For example, for a convolutional feature map of size w × h there are w × h × k anchors in total. The RPN structure is shown in Fig. 2.
To train the RPN, a binary label is assigned to each anchor marking whether it is a target. A positive label is assigned to two kinds of anchors: (I) the anchor(s) with the highest Intersection-over-Union (IoU) overlap with some real target bounding box (Ground Truth, GT); (II) any anchor with IoU overlap above 0.7 with any GT box. Note that one GT box may assign positive labels to several anchors. A negative label is assigned to anchors whose IoU is below 0.3 for every GT box. Anchors that are neither positive nor negative contribute nothing to the training objective and are discarded.
Given these definitions, and following the multi-task loss in Fast R-CNN [17], the objective function is minimized. The loss function for an image is defined as

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)   (1)

Here, i is the index of an anchor and p_i is the predicted probability that anchor i is a target; the GT label p_i* is 1 if the anchor is positive and 0 if it is negative; t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box, and t_i* is the coordinate vector of the GT bounding box corresponding to a positive anchor; λ is a balancing weight, here λ = 10; N_cls is the normalization of the cls term, the mini-batch size, here N_cls = 256; N_reg is the normalization of the reg term, the number of anchor positions, N_reg = 2,400. The classification loss L_cls is the log loss over two classes, motor-vehicle target versus non-motor-vehicle target:

L_cls(p_i, p_i*) = -log[p_i* p_i + (1 - p_i*)(1 - p_i)]   (2)

For the regression loss L_reg, the following function is defined:

L_reg(t_i, t_i*) = R(t_i - t_i*)   (3)

where L_reg is the regression loss function and R is the robust loss function smooth L1, computed by formula (4):

smooth_L1(x) = 0.5 x^2  if |x| < 1;  |x| - 0.5  otherwise   (4)

where smooth_L1 is the smooth L1 loss function and x is the variable;
The Fast R-CNN network structure is shown in Fig. 3: a feature map is obtained after the input image passes through the deep convolutional network, the corresponding RoIs are obtained from the feature map and the RPN, and the RoIs finally pass through the RoI pooling layer. This layer is a single-level spatial "pyramid" pooling. Its input is N feature maps and R RoIs; the N feature maps come from the last convolutional layer, each of size w × h × c. Each RoI is a tuple (n, r, c, h, w), where n is the feature-map index, n ∈ (0, 1, 2, ..., N−1), r, c is the top-left coordinate, and h, w are the height and width respectively. The output is the feature map obtained by max pooling. The layer has two main functions: to map each RoI in the original image to its block of the feature map, and to downsample the feature map to a fixed size before passing it to the fully connected layers.
The region proposal network shares the weights of the detection network. The RPN and Fast R-CNN are trained independently and would modify their convolutional layers in different ways, so a technique is needed that allows the two networks to share convolutional layers, rather than learning two separate networks. The invention uses a practical 4-step training algorithm that learns shared features by alternating optimization. Step 1: train the RPN as described above, initialized with an ImageNet pre-trained model and fine-tuned end-to-end for the region proposal task. Step 2: train a separate detection network with Fast R-CNN using the proposals generated by the step-1 RPN, also initialized with the ImageNet pre-trained model; at this point the two networks do not yet share convolutional layers. Step 3: initialize RPN training from the detection network, fix the shared convolutional layers and fine-tune only the layers exclusive to the RPN; the two networks now share convolutional layers. Step 4: keeping the shared convolutional layers fixed, fine-tune the fc (fully connected) layers of Fast R-CNN. The two networks thus share the same convolutional layers and form one unified network.
To deal with multiple object scales, three simple scales are used for each feature point on the feature map, with bounding-box areas of 128 × 128, 256 × 256 and 512 × 512, and three aspect ratios, 1:1, 1:2 and 2:1. With this design, neither multi-scale features nor multi-scale sliding windows are needed to predict large regions, which saves considerable running time.
Through the processing of the above two networks, the motor vehicles in a video frame are detected and their size and spatial position delimited; that is, the size and spatial position of each vehicle are obtained, where r, c is the top-left coordinate of the vehicle in the image and h, w are the projected height and width of the vehicle in the image plane. It then remains to judge whether these motor vehicles are stationary;
The optical flow stationary-vehicle detection unit judges whether a vehicle on the road is stationary. When vehicles in the road scene move relative to the two-dimensional image plane, their projections on the image plane form motion; this motion, expressed as the flowing of brightness patterns in the image plane, is called optical flow. The optical flow method is an important method for analyzing motion sequence images: the optical flow contains the motion information of the vehicle targets in the image;
The present invention uses a sparse iterative Lucas-Kanade (LK) optical flow method based on a pyramid model. Suppose image I has size n_x × n_y, and define I^0 as the 0th-layer image, the image of highest resolution (the original image), with width n_x^0 = n_x and height n_y^0 = n_y. The pyramidal representation is described recursively: I^L (L = 1, 2, ...) is computed from I^{L−1}, where I^{L−1} denotes the image at layer L−1 and I^L the image at layer L. Suppose image I^{L−1} has width n_x^{L−1} and height n_y^{L−1}; then image I^L can be expressed as

I^L(x, y) = (1/4) I^{L−1}(2x, 2y)
 + (1/8) [I^{L−1}(2x−1, 2y) + I^{L−1}(2x+1, 2y) + I^{L−1}(2x, 2y−1) + I^{L−1}(2x, 2y+1)]
 + (1/16) [I^{L−1}(2x−1, 2y−1) + I^{L−1}(2x+1, 2y−1) + I^{L−1}(2x−1, 2y+1) + I^{L−1}(2x+1, 2y+1)]   (5)

where, to simplify the formula, the boundary values of image I^{L−1} are defined by replication, e.g. I^{L−1}(−1, y) ≐ I^{L−1}(0, y). The points used in formula (5) must satisfy 0 ≤ 2x ≤ n_x^{L−1} − 1 and 0 ≤ 2y ≤ n_y^{L−1} − 1, so the width n_x^L and height n_y^L of image I^L must satisfy formula (6):

n_x^L ≤ (n_x^{L−1} + 1) / 2,   n_y^L ≤ (n_y^{L−1} + 1) / 2   (6)

The pyramid model {I^L}, L = 0, ..., L_m of image I is built with formulas (5) and (6). L_m is the height of the pyramid model, typically 2, 3 or 4; for ordinary images L_m > 4 is meaningless. Taking a 640 × 480 image as an example, layers 1, 2, 3 and 4 of its pyramid model have sizes 320 × 240, 160 × 120, 80 × 60 and 40 × 30 respectively;
The pyramid-based LK optical flow computation first searches for the matching point of a feature point in the top layer of the image pyramid model, then uses the result of layer k as the initial estimate for layer k−1 and searches for the matching point in layer k−1, iterating in this way down to layer 0 of the pyramid model, whereby the optical flow of the feature point is obtained;
The detection target of the optical flow method is: given two consecutive frames I and J, for a pixel u in image I, find its matching point v = u + d in image J, i.e. find the displacement vector d, computed by formula (7):

v = u + d = [u_x + d_x  u_y + d_y]^T   (7)

where u is a pixel in image I, v is the matching pixel in image J, and d is the displacement vector between the two;
First, the pyramid models {I^L}, L = 0, ..., L_m and {J^L}, L = 0, ..., L_m of images I and J are built; then the position u^L of pixel u in each pyramid layer of image I is computed for L = 0, ..., L_m; then, within a search window, the matching point v^{L_m} of u^{L_m} is computed in the top-layer image J^{L_m} of image J's pyramid model, and the displacement vector d^{L_m} is obtained;
The pyramid-based optical flow method then proceeds iteratively: suppose the displacement vector d^{L+1} of layer L+1 of the pyramid model is known; 2d^{L+1} is taken as the initial value for layer L, the matching point v^L of layer L is searched for in its neighborhood, and the displacement vector d^L of layer L is obtained;
After each layer's displacement vector d^L (L = 0, ..., L_m) has been computed by iteration, the final optical flow of the pixel is

d = Σ_{L=0}^{L_m} 2^L d^L   (8)

where d is the optical flow value of the pixel and d^L is its optical flow value at layer L;
After the optical flow vector of each feature pixel in the image has been obtained, the frame of each vehicle in the two-dimensional image plane is taken from the motor vehicles and their occupied spatial positions detected by the vehicle detection unit; each frame is expressed by four data, the top-left position r, c and the height and width h, w. The average of the optical flow vectors of all feature points inside each frame is computed by formula (9):

d̄ = (1/n) Σ_{i=1}^{n} d_i   (9)

where d̄ is the average optical flow vector inside a vehicle frame, d_i is the optical flow vector of a feature pixel inside the frame, and n is the number of feature pixels inside the frame;
After the average optical flow vector d̄ of a vehicle frame is computed, if the value is below a threshold T the frame is treated as a suspected stationary vehicle frame and timing of that frame begins. The invention expresses a suspected stationary vehicle frame with five data: the top-left position r, c, the height and width h, w, and the stationary time t_d. During the program loop, if a suspected stationary vehicle frame appears at the same position in two successive frames, its stationary time is accumulated: t_d ← t_d + t;
Visually, an illegally parked or broken-down vehicle is a vehicle that stays stationary on the road for a long time, with no other stationary vehicles around it. The road traffic incident detection module detects all stationary vehicles on the road, but to decide whether they are illegally parked or broken down it must further check whether these stationary vehicles have stayed for a long time. Fig. 4 shows several important steps of illegal parking detection;
The illegal parking judging unit judges whether there is an illegally parked or broken-down vehicle on the road. It first checks whether any suspected stationary vehicle frame exists; if none exists, the unit ends its processing. Otherwise it checks whether the position of the newly appeared suspected stationary vehicle frame coincides with an existing stationary vehicle frame: if they coincide, the suspected frame is merged into the stationary vehicle frame and the stationary time t_d of that frame is accumulated; otherwise a new stationary vehicle frame is created and timing of its stationary time t_d begins. The stationary time of each stationary vehicle frame is then examined: if t_d ≥ T_s, where T_s is the dwell-time threshold, here set to 5 minutes, i.e. if a suspected stationary vehicle frame has been stationary for more than 5 minutes, an illegally parked or broken-down vehicle is declared. If the vehicle in a suspected stationary frame moves, i.e. the average optical flow vector d̄ of the features inside the frame exceeds the threshold T, the suspected stationary frame is removed; otherwise the stationary frame is retained. The processing flow is shown in Fig. 5.
The road incident release module is used to publish the traffic incidents occurring on the road. There are two basic release modes: one publishes, via WebGIS, the vision detection situation at the incident site, so that traffic police can quickly arrange the removal of the traffic obstacles; the other promptly announces, via broadcast and roadside warning boards, the traffic incident that has occurred ahead, reminding following vehicles of the incident on the road ahead so that they can take measures as early as possible and secondary accidents are avoided.
The above is only a preferred embodiment of the present invention and is not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall be included in the scope of protection of the invention.

Claims (5)

1. A vision detection system for illegally parked and broken-down vehicles based on deep convolutional neural networks, characterized by comprising cameras installed on urban roads, a traffic cloud server and an automatic road traffic incident detection system;
The cameras are used to acquire video data of every road in the city; they are installed above the roads and transmit the road video image data to the traffic cloud server over the network;
The traffic cloud server is used to receive the road video data obtained from the cameras and pass it to the automatic road traffic incident detection system for detection and recognition; the detection results are finally stored in the cloud server and published via WebGIS to support traffic control and guidance, the quick response of on-site traffic police, and timely avoidance by following vehicles so that secondary traffic accidents are avoided;
The automatic traffic incident detection system includes a road traffic incident detection module and a road incident release module;
The road traffic incident detection module includes a Fast R-CNN-based vehicle detection unit, an optical flow stationary-vehicle detection unit and an illegal parking judging unit;
The road incident release module is used to publish the traffic incidents occurring on the road: via WebGIS it publishes the vision detection situation at the incident site, so that traffic police can quickly arrange the removal of the traffic obstacles; via broadcast and roadside warning boards it promptly announces the traffic incident ahead, reminding following vehicles of the incident on the road ahead so that they can take measures as early as possible and secondary accidents are avoided.
2. The vision detection system for illegally parked and broken-down vehicles based on deep convolutional neural networks as claimed in claim 1, characterized in that: the Fast R-CNN-based vehicle detection unit is used to detect all vehicles in the video image; the motor vehicles on the road are rapidly segmented with a deep convolutional neural network and the spatial positions these vehicles occupy on the road are given;
The motor-vehicle segmentation and localization used is composed of two models: one is a region proposal network that generates RoIs; the other is the Fast R-CNN motor-vehicle target detection network;
The region proposal network (RPN) takes an image of any size as input and outputs a set of rectangular target proposal boxes, each containing 4 position coordinate variables and a score; the targets of the target proposal boxes are motor-vehicle objects;
The estimated target/non-target probability of each proposal box is produced by a classification layer implemented as a two-class softmax layer; the k proposal boxes are parameterized relative to k reference boxes called anchors;
Each anchor is centered at the center of the current sliding window and is associated with a scale and an aspect ratio; using 3 scales and 3 aspect ratios, there are k = 9 anchors at each sliding position;
In order to train the RPN, a binary label is assigned to each anchor to mark whether it is a target; a positive label is assigned to two kinds of anchors: (I) anchors with the highest Intersection-over-Union (IoU) overlap with some real target bounding box (Ground Truth, GT); (II) anchors with IoU overlap above 0.7 with any GT bounding box; note that one GT bounding box may assign positive labels to several anchors; a negative label is assigned to anchors whose IoU ratio is below 0.3 for all GT bounding boxes; anchors that are neither positive nor negative have no effect on the training objective and are discarded;
The multi-task loss in Fast R-CNN is followed and the objective function is minimized; the loss function of an image is defined as:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)   (1)
Here, i is the index of an anchor and p_i is the predicted probability that anchor i is a target; the GT label p_i^* is 1 if the anchor is positive and 0 if the anchor is negative; t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box, and t_i^* is the coordinate vector of the GT bounding box associated with a positive anchor; λ is a balancing weight; the normalizer N_cls of the cls term is the mini-batch size, and the normalizer N_reg of the reg term is the number of anchor positions; the classification loss L_cls is the log loss over the two classes, motor vehicle target versus non-motor-vehicle target:
$$L_{cls}(p_i, p_i^*) = -\log\left[\, p_i^* p_i + (1 - p_i^*)(1 - p_i) \,\right] \qquad (2)$$
where L_cls is the classification loss function, p_i is the predicted probability that anchor i is a target, and p_i^* is the corresponding GT label;
The regression loss function L_reg is defined by the following function:
$$L_{reg}(t_i, t_i^*) = R(t_i - t_i^*) \qquad (3)$$
where L_reg is the regression loss function and R is the robust loss function smooth_{L1}, computed with formula (4):
$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5\,x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases} \qquad (4)$$
where smooth_{L1} is the smooth L1 loss function and x is its argument; a sketch of formulas (1)-(4) follows;
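As a worked sketch of formulas (1)-(4), the following code evaluates the two-term loss; the normalizers n_cls and n_reg and the balance weight lam are illustrative values, not fixed by the claim:

    import torch

    def smooth_l1(x):
        # Formula (4): 0.5 * x^2 where |x| < 1, |x| - 0.5 otherwise.
        absx = x.abs()
        return torch.where(absx < 1, 0.5 * x ** 2, absx - 0.5)

    def rpn_loss(p, p_star, t, t_star, lam=1.0, n_cls=256, n_reg=2400):
        # p: (N,) predicted target probabilities; p_star: (N,) binary GT labels;
        # t, t_star: (N, 4) predicted / GT box parameterizations.
        cls = -(p_star * p + (1 - p_star) * (1 - p)).clamp_min(1e-12).log()  # formula (2)
        reg = smooth_l1(t - t_star).sum(dim=1)                               # formulas (3)-(4)
        return cls.sum() / n_cls + lam * (p_star * reg).sum() / n_reg        # formula (1)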
In the Fast R-CNN network, a feature map is obtained after the input image passes through the deep convolutional neural network; from the feature map and the RPN the corresponding RoIs are obtained, which finally pass through the RoI pooling layer; an RoI, i.e. region of interest, here refers precisely to a motor vehicle region;
The input of the Fast R-CNN network is N feature maps and R RoIs; the N feature maps come from the last convolutional layer, and the size of each feature map is w × h × c;
Each RoI is a tuple (n, r, c, h, w), where n is the index of the feature map, n ∈ {0, 1, 2, ..., N-1}, (r, c) is the top-left coordinate, and h and w are the height and width respectively;
The output is the feature map obtained by max pooling; the RoI in the original image is mapped to a block in the feature map, the block is downsampled to a fixed size, and the result is passed to the fully connected layers, as in the sketch below.
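A minimal sketch of the RoI pooling step, assuming a torchvision-style roi_pool, a backbone stride of 16, and a hypothetical vehicle box:

    import torch
    from torchvision.ops import roi_pool

    features = torch.randn(1, 512, 38, 50)              # one feature map, c x h x w
    rois = torch.tensor([[0., 120., 80., 360., 240.]])  # (batch_idx, x1, y1, x2, y2)
    pooled = roi_pool(features, rois, output_size=(7, 7),
                      spatial_scale=1 / 16)             # maps image coords onto the feature map
    print(pooled.shape)                                 # torch.Size([1, 512, 7, 7]): fixed size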
3. The vision detection system for illegally parked and broken-down vehicles based on deep convolutional neural networks as claimed in claim 2, characterized in that: the region proposal network and Fast R-CNN are trained independently, and a 4-step training algorithm is used to learn shared features by alternating optimization; in the first step, the RPN is trained as described above, the network being initialized with an ImageNet-pretrained model and fine-tuned end-to-end for the region proposal task; in the second step, a separate detection network is trained with Fast R-CNN using the proposal boxes generated by the first-step RPN, this detection network likewise being initialized with an ImageNet-pretrained model, and at this point the two networks do not yet share convolutional layers; in the third step, RPN training is initialized with the detection network, but the shared convolutional layers are fixed and only the layers exclusive to the RPN are fine-tuned, so the two networks now share convolutional layers; in the fourth step, the shared convolutional layers are kept fixed and the fc, i.e. fully connected, layers of Fast R-CNN are fine-tuned; in this way the two networks share the same convolutional layers and form a unified network (outlined schematically below);
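Schematically, the 4-step alternating optimization can be outlined as follows; every function name here is a placeholder standing in for a training routine, not a real API:

    def alternating_training(imagenet_weights, data):
        rpn = train_rpn(init=imagenet_weights, data=data)                 # step 1
        detector = train_fast_rcnn(init=imagenet_weights, data=data,
                                   rois=rpn.generate_proposals(data))     # step 2: separate net
        rpn = train_rpn(init=detector.backbone, data=data,
                        freeze_shared_conv=True)                          # step 3: RPN-only layers
        detector = train_fast_rcnn(init=detector, data=data,
                                   rois=rpn.generate_proposals(data),
                                   freeze_shared_conv=True,
                                   tune_only_fc=True)                     # step 4: fc layers only
        return rpn, detector  # one unified network with shared conv layers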
Through the processing of these two networks, the motor vehicles in a video frame are detected and their sizes and spatial positions are boxed; that is, the size and spatial position of each vehicle are obtained, where r and c are the top-left coordinates of the vehicle in the image and h and w are the projected height and width of the vehicle in the image plane; it then remains to judge whether these motor vehicles are stationary.
4. The vision detection system for illegally parked and broken-down vehicles based on deep convolutional neural networks as claimed in one of claims 1 to 3, characterized in that: the optical-flow stationary-vehicle detection unit is used to judge whether vehicles on the road are stationary; when vehicles in the road scene move relative to the two-dimensional image plane, their projections onto the image plane also move, and this motion, expressed as the flowing of brightness patterns in the image plane, is called optical flow; the optical flow contains the motion information of the vehicle object targets in the image;
Here a sparse iterative Lucas-Kanade optical flow method based on a pyramid model is used: the match point of a feature point is first searched at the top layer k of the image pyramid model, the result computed at layer k is then used as the initial estimate for the search for the match point at layer k-1, and this iteration continues down to layer 0 of the image pyramid model, whereby the optical flow of the feature point is obtained;
Suppose image I has size n_x × n_y; define I^0 as the layer-0 image, the highest-resolution image, i.e. the original image, whose width and height are n_x^0 = n_x and n_y^0 = n_y; the pyramid is then described recursively: I^L is computed from I^{L-1} for L = 1, 2, ..., where I^{L-1} denotes the image at pyramid layer L-1 and I^L the image at pyramid layer L; assuming image I^{L-1} has width n_x^{L-1} and height n_y^{L-1}, image I^L is expressed as
$$\begin{aligned} I^L(x,y) = {} & \tfrac{1}{4}\, I^{L-1}(2x, 2y) \\ & + \tfrac{1}{8}\left(I^{L-1}(2x-1,2y) + I^{L-1}(2x+1,2y) + I^{L-1}(2x,2y-1) + I^{L-1}(2x,2y+1)\right) \\ & + \tfrac{1}{16}\left(I^{L-1}(2x-1,2y-1) + I^{L-1}(2x+1,2y+1) + I^{L-1}(2x-1,2y+1) + I^{L-1}(2x+1,2y-1)\right) \end{aligned} \qquad (5)$$
The values of the image I^{L-1} at boundary points are defined as follows:
$$I^{L-1}(-1, y) \doteq I^{L-1}(0, y)$$
$$I^{L-1}(x, -1) \doteq I^{L-1}(x, 0)$$
$$I^{L-1}(n_x^{L-1}, y) \doteq I^{L-1}(n_x^{L-1} - 1, y)$$
$$I^{L-1}(x, n_y^{L-1}) \doteq I^{L-1}(x, n_y^{L-1} - 1)$$
$$I^{L-1}(n_x^{L-1}, n_y^{L-1}) \doteq I^{L-1}(n_x^{L-1} - 1, n_y^{L-1} - 1)$$
The points defined by formula (5) must satisfy the conditions $0 \le 2x \le n_x^{L-1} - 1$ and $0 \le 2y \le n_y^{L-1} - 1$; therefore the width $n_x^L$ and height $n_y^L$ of image I^L need to satisfy formula (6):
$$n_x^L \le \frac{n_x^{L-1} + 1}{2}, \qquad n_y^L \le \frac{n_y^{L-1} + 1}{2} \qquad (6)$$
The pyramid model {I^L}, L = 0, ..., L_m, of image I is built by formulas (5) and (6), where L_m is the height of the pyramid model; a sketch of the construction follows;
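A minimal sketch of building {I^L} with OpenCV, noting that cv2.pyrDown uses a 5 × 5 Gaussian kernel, a close relative of, though not identical to, the weighting in formula (5):

    import cv2

    def build_pyramid(image, levels):
        # pyramid[L] corresponds to I^L; pyramid[0] is the original image I^0.
        pyramid = [image]
        for _ in range(levels):
            pyramid.append(cv2.pyrDown(pyramid[-1]))  # halves width and height per layer
        return pyramid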
The detection target of the optical flow method is: given two successive frames I and J, for a pixel u in image I, find its match point v = u + d in image J, i.e. find its offset vector d, computed with formula (7);
$$v = u + d = \left[\, u_x + d_x \quad u_y + d_y \,\right]^T \qquad (7)$$
where u is a pixel in image I, v is the matching pixel in image J, and d is the offset vector between the two;
First, the pyramid models {I^L}, L = 0, ..., L_m, and {J^L}, L = 0, ..., L_m, of images I and J are established; then the position u^L of pixel u at each pyramid layer of image I, L = 0, ..., L_m, is computed; then, within a search window, the match point v^{L_m} of u^{L_m} is computed in the top-layer image J^{L_m} of the pyramid model of image J, and the offset vector d^{L_m} is calculated;
Next the pyramid-based optical flow method is described iteratively: suppose the offset vector d^{L+1} at layer L+1 of the pyramid model is known; then 2d^{L+1} is taken as the initial value at layer L, the match point v^L is searched for nearby at layer L, and the offset vector d^L at layer L is thereby obtained;
After iteratively computing the offset vector d^L of each layer (L = 0, ..., L_m), the final optical flow of the pixel is
$$d = \sum_{L=0}^{L_m} 2^L d^L \qquad (8)$$
where d is the optical flow value of a pixel and d^L is the optical flow value of the pixel at layer L; a sketch of the computation follows;
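A minimal sketch of this pyramidal LK computation with OpenCV, assuming two hypothetical frame files and illustrative corner-detection parameters (maxLevel plays the role of L_m):

    import cv2

    prev_gray = cv2.cvtColor(cv2.imread("frame_t0.jpg"), cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(cv2.imread("frame_t1.jpg"), cv2.COLOR_BGR2GRAY)

    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=7)
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, pts, None,
        winSize=(21, 21), maxLevel=3)            # maxLevel = pyramid height L_m
    flow = (new_pts - pts)[status.ravel() == 1]  # offset vectors d per tracked feature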
After the optical flow vector value of each feature pixel in the image is obtained, the motor vehicles detected by the vehicle detection unit and the spatial positions they occupy on the road are used, i.e. the bounding box of each vehicle in the two-dimensional image plane, each box represented by four values: the top-left position r, c and the height and width h, w; the average of the optical flow vectors of all feature points inside each box is then calculated with formula (9):
$$\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i \qquad (9)$$
where $\bar{d}$ is the average of the optical flow vectors within a vehicle box, d_i is the optical flow vector value of a feature pixel within the box, and n is the number of feature pixels within the box;
After the average $\bar{d}$ of the optical flow vectors within a vehicle box is calculated, if the value is less than a threshold T, the box is taken as a suspected stationary-vehicle box; timing is then started for the suspected stationary-vehicle box, which is now represented by five values: the top-left position r, c, the height and width h, w, and the stationary time t_d; during the processing loop, if suspected stationary-vehicle boxes in two successive frames appear at the same position, the stationary time is accumulated, i.e. t_d ← t_d + t, as sketched below.
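A minimal sketch of the per-box averaging (formula (9)) and the t_d accumulation; the threshold T, the frame interval dt, and the coarse same-position key are illustrative assumptions:

    import numpy as np

    def update_stationary_boxes(boxes, pts, flow, tracked, T=1.0, dt=0.04):
        # boxes: list of (r, c, h, w); pts: (n, 2) feature positions (x, y);
        # flow: (n, 2) flow vectors; tracked: {box_key: accumulated t_d in seconds}.
        for (r, c, h, w) in boxes:
            inside = ((pts[:, 0] >= c) & (pts[:, 0] < c + w) &
                      (pts[:, 1] >= r) & (pts[:, 1] < r + h))
            if not inside.any():
                continue
            d_bar = np.linalg.norm(flow[inside].mean(axis=0))  # formula (9)
            key = (r // 8, c // 8, h // 8, w // 8)             # coarse same-position key
            if d_bar < T:
                tracked[key] = tracked.get(key, 0.0) + dt      # t_d <- t_d + t
            else:
                tracked.pop(key, None)                         # the vehicle moved again
        return tracked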
5. The vision detection system for illegally parked and broken-down vehicles based on deep convolutional neural networks as claimed in one of claims 1 to 3, characterized in that: the illegal-parking judging unit is used to judge whether an illegally parked or broken-down vehicle is present; the judgment first checks whether a suspected stationary-vehicle box exists, and if none exists the unit's processing ends; otherwise it is determined whether the position of the newly appeared suspected stationary-vehicle box coincides with the position of an existing stationary-vehicle box; if the positions coincide, the suspected box is merged into the stationary-vehicle box and its stationary time t_d is accumulated; otherwise a new stationary-vehicle box is established and timing of its stationary time t_d begins; it is further necessary to judge how long the vehicle in a stationary-vehicle box has been standing still: if the stationary time of the box satisfies t_d ≥ T_s, where T_s is the dwell-time threshold, set here to 5 minutes, the vehicle is determined to be illegally parked or broken down; if the vehicle in a suspected stationary-vehicle box moves, i.e. the average $\bar{d}$ of the feature optical flow vectors within the box exceeds the threshold T, the suspected stationary-vehicle box is removed; otherwise the stationary-vehicle box is retained. A sketch of this decision rule follows.
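A minimal sketch of the claim's decision rule, assuming tracked maps each box key to its accumulated t_d in seconds and flows maps the same keys to the current mean flow; Ts = 300 s reflects the 5-minute threshold:

    def judge_violations(tracked, flows, T=1.0, Ts=300.0):
        # Returns the keys of boxes judged as illegally parked or broken-down vehicles.
        violations = []
        for key, t_d in list(tracked.items()):
            if flows.get(key, 0.0) > T:   # the vehicle moved again: drop the box
                del tracked[key]
            elif t_d >= Ts:               # stationary for at least 5 minutes
                violations.append(key)
        return violations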
CN201710440988.7A 2017-06-13 2017-06-13 Parking offense and the car vision detection system that casts anchor based on depth convolutional neural networks Pending CN107730903A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710440988.7A CN107730903A (en) 2017-06-13 2017-06-13 Parking offense and the car vision detection system that casts anchor based on depth convolutional neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710440988.7A CN107730903A (en) 2017-06-13 2017-06-13 Parking offense and the car vision detection system that casts anchor based on depth convolutional neural networks

Publications (1)

Publication Number Publication Date
CN107730903A (en) 2018-02-23

Family

ID=61201670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710440988.7A Pending CN107730903A (en) 2017-06-13 2017-06-13 Parking offense and the car vision detection system that casts anchor based on depth convolutional neural networks

Country Status (1)

Country Link
CN (1) CN107730903A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101710448A (en) * 2009-12-29 2010-05-19 浙江工业大学 Road traffic state detecting device based on omnibearing computer vision
CN103116985A (en) * 2013-01-21 2013-05-22 信帧电子技术(北京)有限公司 Detection method and device of parking against rules
CN103824452A (en) * 2013-11-22 2014-05-28 银江股份有限公司 Lightweight peccancy parking detection device based on full view vision
CN105702046A (en) * 2014-11-27 2016-06-22 陕西银河网电科技有限公司 Intelligent illegal-parking automatic monitoring system
CN106250812A (en) * 2016-07-15 2016-12-21 汤平 A kind of model recognizing method based on quick R CNN deep neural network
CN205959414U (en) * 2016-08-22 2017-02-15 赵德均 Wisdom transportation systems based on sharing of building together of big data
CN106599939A (en) * 2016-12-30 2017-04-26 深圳市唯特视科技有限公司 Real-time target detection method based on region convolutional neural network

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108682141B (en) * 2018-04-03 2023-12-15 广东洛梵狄智能科技有限公司 Parking management system and method
CN108682141A (en) * 2018-04-03 2018-10-19 广东洛梵狄智能科技有限公司 A kind of parking management system and method
CN109063612A (en) * 2018-07-19 2018-12-21 中智城信息技术有限公司 City intelligent red line management method and machine readable storage medium
CN109145798A (en) * 2018-08-13 2019-01-04 浙江零跑科技有限公司 A kind of Driving Scene target identification and travelable region segmentation integrated approach
CN109145798B (en) * 2018-08-13 2021-10-22 浙江零跑科技股份有限公司 Driving scene target identification and travelable region segmentation integration method
CN109446926A (en) * 2018-10-09 2019-03-08 深兰科技(上海)有限公司 A kind of traffic monitoring method and device, electronic equipment and storage medium
CN111047908A (en) * 2018-10-12 2020-04-21 富士通株式会社 Detection device and method for cross-line vehicle and video monitoring equipment
CN109118779A (en) * 2018-10-12 2019-01-01 东软集团股份有限公司 Break in traffic rules and regulations information identifying method, equipment and readable storage medium storing program for executing
CN109118779B (en) * 2018-10-12 2021-05-11 东软集团股份有限公司 Traffic violation information identification method, equipment and readable storage medium
CN111047908B (en) * 2018-10-12 2021-11-02 富士通株式会社 Detection device and method for cross-line vehicle and video monitoring equipment
CN109345435A (en) * 2018-12-07 2019-02-15 山东晴天环保科技有限公司 Occupy-street-exploit managing device and method
CN109902676A (en) * 2019-01-12 2019-06-18 浙江工业大学 A kind of separated based on dynamic background stops detection algorithm
CN109902676B (en) * 2019-01-12 2021-04-09 浙江工业大学 Dynamic background-based violation detection algorithm
CN111507158B (en) * 2019-01-31 2023-11-24 斯特拉德视觉公司 Method and device for detecting parking area by using semantic segmentation
EP3690728A1 (en) * 2019-01-31 2020-08-05 StradVision, Inc. Method and device for detecting parking area using semantic segmentation in automatic parking system
CN111507158A (en) * 2019-01-31 2020-08-07 斯特拉德视觉公司 Method and device for detecting parking area by semantic segmentation
US10796434B1 (en) 2019-01-31 2020-10-06 Stradvision, Inc Method and device for detecting parking area using semantic segmentation in automatic parking system
CN109948436A (en) * 2019-02-01 2019-06-28 深兰科技(上海)有限公司 The method and device of vehicle on a kind of monitoring road
CN109948436B (en) * 2019-02-01 2020-12-08 深兰科技(上海)有限公司 Method and device for monitoring vehicles on road
CN109919053A (en) * 2019-02-24 2019-06-21 太原理工大学 A kind of deep learning vehicle parking detection method based on monitor video
CN109885718B (en) * 2019-02-28 2021-05-28 江南大学 Suspected vehicle retrieval method based on deep vehicle sticker detection
CN109885718A (en) * 2019-02-28 2019-06-14 江南大学 A kind of suspected vehicles search method based on the detection of depth traffic allowance
KR102391840B1 (en) * 2019-05-17 2022-04-27 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. Method and device for detecting illegal parking, electronic device, and computer-readable medium
KR20200132714A (en) * 2019-05-17 2020-11-25 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. Method and device for detecting illegal parking, electronic device, and computer-readable medium
US11380104B2 (en) 2019-05-17 2022-07-05 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and device for detecting illegal parking, and electronic device
CN110135356A (en) * 2019-05-17 2019-08-16 北京百度网讯科技有限公司 The detection method and device of parking offense, electronic equipment, computer-readable medium
US11734783B2 (en) 2019-06-11 2023-08-22 Tata Consultancy Services Limited System and method for detecting on-street parking violations
CN111626275B (en) * 2020-07-30 2020-11-10 江苏金智慧安科技有限公司 Abnormal parking detection method based on intelligent video analysis
CN111626275A (en) * 2020-07-30 2020-09-04 江苏金智慧安科技有限公司 Abnormal parking detection method based on intelligent video analysis
CN112183350B (en) * 2020-09-28 2023-07-14 天地伟业技术有限公司 Video-based illegal parking detection method
CN112183350A (en) * 2020-09-28 2021-01-05 天地伟业技术有限公司 Video-based illegal parking detection method
CN112509315A (en) * 2020-11-04 2021-03-16 杭州远眺科技有限公司 Traffic accident detection method based on video analysis
CN114529875A (en) * 2022-04-24 2022-05-24 浙江这里飞科技有限公司 Method and device for detecting illegal parking vehicle, electronic equipment and storage medium
CN115376311A (en) * 2022-07-08 2022-11-22 济南瑞源智能城市开发有限公司 Vehicle situation analysis method and device for tunnel

Similar Documents

Publication Publication Date Title
CN107730903A (en) Parking offense and the car vision detection system that casts anchor based on depth convolutional neural networks
CN107730881A (en) Traffic congestion vision detection system based on depth convolutional neural networks
CN107730904A (en) Multitask vehicle driving in reverse vision detection system based on depth convolutional neural networks
CN107730906A (en) Zebra stripes vehicle does not give precedence to the vision detection system of pedestrian behavior
US20230368660A1 (en) Detecting vehicle aperture and/or door state
CN107729799A (en) Crowd&#39;s abnormal behaviour vision-based detection and analyzing and alarming system based on depth convolutional neural networks
Zhang et al. Modeling pedestrians’ near-accident events at signalized intersections using gated recurrent unit (GRU)
Shah et al. Automated visual surveillance in realistic scenarios
He et al. Obstacle detection of rail transit based on deep learning
Zhang et al. Prediction of pedestrian-vehicle conflicts at signalized intersections based on long short-term memory neural network
Kalatian et al. A context-aware pedestrian trajectory prediction framework for automated vehicles
CN105513354A (en) Video-based urban road traffic jam detecting system
CN106875424A (en) A kind of urban environment driving vehicle Activity recognition method based on machine vision
CN108909624A (en) A kind of real-time detection of obstacles and localization method based on monocular vision
WO2022110611A1 (en) Pedestrian road-crossing behavior prediction method for plane intersection
CN113345237A (en) Lane-changing identification and prediction method, system, equipment and storage medium for extracting vehicle track by using roadside laser radar data
CN111415533A (en) Bend safety early warning monitoring method, device and system
CN109242019A (en) A kind of water surface optics Small object quickly detects and tracking
Abdel-Aty et al. Using closed-circuit television cameras to analyze traffic safety at intersections based on vehicle key points detection
Gong et al. Design of lighting intelligent control system based on OpenCV image processing technology
Guerrieri et al. Real-time social distance measurement and face mask detection in public transportation systems during the COVID-19 pandemic and post-pandemic Era: Theoretical approach and case study in Italy
CN117372969B (en) Monitoring scene-oriented abnormal event detection method
Bourja et al. Real time vehicle detection, tracking, and inter-vehicle distance estimation based on stereovision and deep learning using YOLOv3
CN116842340A (en) Pedestrian track prediction method and equipment based on ATT-GRU model
Desai et al. Vehicle-Pedestrian Detection Methods for Urban Traffic Control System: A Survey

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20180223)