CN106886755A - Intersection vehicle violation detection system based on traffic sign recognition - Google Patents

Intersection vehicle violation detection system based on traffic sign recognition

Info

Publication number
CN106886755A
CN106886755A (application CN201710037410.7A)
Authority
CN
China
Prior art keywords: vehicle, bounding box, candidate, convolutional, coordinate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710037410.7A
Other languages
Chinese (zh)
Inventor
余贵珍
钟晓明
秦洪懋
张钊
李欣旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN201710037410.7A priority Critical patent/CN106886755A/en
Publication of CN106886755A publication Critical patent/CN106886755A/en
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582 - Recognition of traffic signs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584 - Recognition of vehicle lights or traffic lights
    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G1/00 - Traffic control systems for road vehicles
    • G08G1/01 - Detecting movement of traffic to be counted or controlled
    • G08G1/017 - Detecting movement of traffic to be counted or controlled identifying vehicles
    • G08G1/0175 - Identifying vehicles by photographing vehicles, e.g. when violating traffic rules

Abstract

This patent discloses a deep-learning-based method for detecting and identifying vehicle violations at intersections. The method comprises the following steps: step 1, capturing intersection video images; step 2, recognizing traffic signs and vehicles; step 3, tracking vehicles; step 4, detecting and identifying vehicle violations. By automatically recognizing and tracking the traffic signs and vehicles in intersection surveillance video, the invention automatically identifies vehicle violations at intersections, with few false detections and high efficiency.

Description

Intersection vehicle violation detection system based on traffic sign recognition
Technical field
This patent belongs to the field of intelligent transportation technology, and in particular relates to a deep-learning-based system for detecting and identifying vehicle violations at intersections.
Technical background
In the prior art, traffic violations are penalized either on site by officers or remotely on the basis of camera evidence. Cameras installed at the roadside photograph vehicles on the road, violations are identified in the images by intelligent recognition, and the corresponding evidence is captured; this is a common means of enforcement.
However, intersection violations, such as turning where turning is prohibited, making a U-turn where U-turns are prohibited, or failing to drive in the prescribed lane, are easy to spot in on-site enforcement but difficult to detect by intelligent recognition.
Different roads have different traffic rules, and the rules at a given intersection may change over time for a variety of reasons. Intersection cameras recognize simple situations, such as running a red light, accurately; but for complex situations, such as not driving in the prescribed lane, illegal parking, prohibited turns, or prohibited U-turns, recognition is inaccurate or unreliable, which troubles both drivers and traffic authorities.
The content of the invention
This patent addresses this need in the prior art. The technical problem to be solved is to provide a deep-learning-based system for detecting and identifying vehicle violations at intersections, so that such violations can be detected intelligently.
To solve the above problems, the technical scheme provided by this patent comprises:
A deep-learning-based method for detecting and identifying vehicle violations at intersections, comprising the following steps:
Step 1: capturing intersection video images
Video images are captured with the video equipment installed at the intersection;
Step 2: recognition of traffic signs and vehicles
In this step, the recognition of traffic signs and vehicles proceeds as follows:
Step S201: feature extraction from the input image
Features are extracted from the input image automatically by a convolutional neural network, which comprises two parts: feature extraction and feature-map cascading;
1) Feature extraction. The convolutional neural network comprises five convolutional stages. The first stage contains layer 1_1; the second stage contains layers 2_1, 2_2 and 2_3; the third stage contains layers 3_1, 3_2, 3_3 and 3_4; the fourth stage contains layers 4_1, 4_2, 4_3 and 4_4; the fifth stage contains layers 5_1, 5_2, 5_3 and 5_4. The first stage uses the C.ReLU operation with a 7*7-pixel receptive field and 32 output channels; the second stage uses C.ReLU with a 3*3-pixel receptive field and 64 output channels; the third stage uses C.ReLU with a 3*3-pixel receptive field and 128 output channels; the fourth and fifth stages use the Inception operation;
2) Feature-map cascading. The convolutional feature maps of different scales from different stages are cascaded, so that the detailed texture of low-level features is retained alongside the semantics of high-level features. Layer 3_4 is down-sampled by max pooling and layer 5_4 is up-sampled by bilinear interpolation; both are then combined with layer 4_4, and a 1x1 convolution generates a 512-channel multi-scale output feature that serves as the input of the subsequent region proposal network;
S202: localization of traffic signs and vehicles
Candidate regions containing traffic signs and vehicles are located in the feature-extracted image and then passed to the classification module for traffic signs and vehicles. This includes:
1) Extracting candidate regions with a region proposal network. After the input image has passed through the feature-extraction network, a small network is slid over the convolutional feature map output by the last convolutional layer; the sliding network is fully connected to an n*n spatial window of the input feature map and generates candidate region boxes.
2) Loss function of the region proposal network
To train the region proposal network, each candidate box is assigned a binary label in this step (the two cases being: is a target, is not a target). A positive label is assigned to two kinds of candidate boxes: (1) the box with the highest IoU (intersection area over union area of two regions) with some ground-truth bounding box; (2) any box whose IoU with some ground-truth bounding box exceeds 0.7. A negative label is assigned to every candidate box whose IoU with all ground-truth boxes is below 0.3. The loss function of the region proposal network is defined as:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)
where i is the index of a candidate box within a mini-batch iteration; p_i is the predicted probability that candidate box i is a target; the ground-truth label p_i* is 1 if the candidate box is positive and 0 otherwise; t_i is the vector of the 4 parameterized coordinates of the predicted bounding box, and t_i* is the coordinate vector of the corresponding ground-truth bounding box.
The classification loss L_cls is the log loss over the two classes, target and non-target:
L_cls(p_i, p_i*) = -log[p_i* p_i + (1 - p_i*)(1 - p_i)]
The regression loss is computed as L_reg(t_i, t_i*) = S(t_i - t_i*), where S is the smooth L1 function:
S(x) = 0.5 x^2, if |x| < 1; |x| - 0.5, otherwise
For regression, this step uses the 4 parameterized coordinates:
t_x = (x - x_a)/w_a, t_y = (y - y_a)/h_a
t_w = log(w/w_a), t_h = log(h/h_a)
t_x* = (x* - x_a)/w_a, t_y* = (y* - y_a)/h_a
t_w* = log(w*/w_a), t_h* = log(h*/h_a)
where x, y, w, h denote the (x, y) centre coordinates, width and height of a bounding box; x, x_a and x* are the x coordinates of the predicted box, the candidate box and the ground-truth box respectively; likewise y, y_a, y* are the y coordinates, w, w_a, w* the widths, and h, h_a, h* the heights of the predicted box, the candidate box and the ground-truth box. N_cls is taken as 256, N_reg as 2400 and λ as 10. t_x, t_y, t_w, t_h are the 4 parameterized coordinates of the predicted bounding box, and t_x*, t_y*, t_w*, t_h* those of the corresponding ground-truth bounding box.
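The IoU-based label assignment above can be sketched in NumPy. This is an illustration, not the patent's code: boxes are assumed to be (x1, y1, x2, y2) corner coordinates, and labels are encoded as 1 (positive), 0 (negative) and -1 (neither, i.e. ignored during training).

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_anchors(anchors, gt_boxes, hi=0.7, lo=0.3):
    """1 = positive, 0 = negative, -1 = between the thresholds (ignored)."""
    overlaps = np.array([[iou(a, g) for g in gt_boxes] for a in anchors])
    labels = np.full(len(anchors), -1)
    labels[overlaps.max(axis=1) < lo] = 0    # IoU < 0.3 with every ground truth
    labels[overlaps.max(axis=1) >= hi] = 1   # IoU above 0.7 with some ground truth
    labels[overlaps.argmax(axis=0)] = 1      # best-overlapping anchor per ground truth
    return labels

gt = [(0.0, 0.0, 10.0, 10.0)]
anchors = [(0, 0, 10, 10), (20, 20, 30, 30), (0, 0, 10, 14), (0, 0, 10, 20)]
labels = label_anchors(anchors, gt)   # [1, 0, 1, -1]
```

The fourth anchor has IoU 0.5 with the ground truth, so it is neither positive nor negative and is excluded from the loss.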
S203: classification of traffic signs and vehicles
The main task of the classification of traffic signs and vehicles is to extract features from, and classify, the candidate regions of traffic signs and vehicles located in the previous step, so as to obtain the type of each traffic sign and vehicle. This includes:
1) Recognizing traffic signs and vehicles with a region convolutional network. The candidate regions containing traffic signs obtained in S202 are recognized with a region convolutional neural network algorithm that shares convolutional layers with the region proposal network;
2) Singular value decomposition. Singular value decomposition is used to speed up the traffic-sign recognition system. Let W be a weight matrix of size u × v; it can be approximated by a truncated singular value decomposition:
W ≈ U Σ_t V^T
where U is a matrix of size u × t, Σ_t is a diagonal matrix of size t × t, and V^T is a matrix of size t × v. The eigenvalues λ of W^T W are obtained from |W^T W - λI| = 0; with W^T denoting the transpose of W, the eigenvectors v_i of W^T W are obtained from (W^T W) v_i = λ_i v_i; the singular values follow from σ_i = sqrt(λ_i), and U can then be obtained from u_i = (1/σ_i) W v_i. The product of the three matrices U, Σ_t and V^T approximates the matrix W;
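As an illustration of the truncated decomposition (not from the patent; the dimensions u=64, v=48 and the truncation rank t=8 are arbitrary assumptions), one u × v fully connected layer becomes two smaller layers, cutting multiply-adds from u*v to t*(u+v) per input vector:

```python
import numpy as np

rng = np.random.default_rng(0)
u_dim, v_dim, t = 64, 48, 8                 # t = number of singular values kept
W = rng.standard_normal((u_dim, v_dim))     # hypothetical weight matrix

# Full SVD, then keep only the t largest singular values: W ≈ U Σ_t V^T.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_approx = U[:, :t] @ np.diag(s[:t]) @ Vt[:t, :]

# Applying the two small factors in sequence equals applying W_approx once.
x = rng.standard_normal(v_dim)
y_two_layer = (U[:, :t] * s[:t]) @ (Vt[:t, :] @ x)
```

By the Eckart-Young theorem this truncation is the best rank-t approximation of W in the Frobenius norm, which is why accuracy degrades gracefully as t shrinks.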
3) Loss function of the region convolutional neural network. For an input region, the region convolutional neural network outputs the probabilities of K+1 classes of target (including background) together with the regressed bounding-box coordinates. For each training candidate region the loss function is defined as:
L(p, u, t^u, t*) = L_cls(p, u) + [u ≥ 1] L_loc(t^u, t*)
where L_cls(p, u) = -log p_u is the log loss for the true class u of the candidate region; p = (p_0, p_1, ..., p_k) is the probability distribution over the k+1 classes; [u ≥ 1] is the indicator function, with u = 0 when the candidate region is background; and L_loc(t^u, t*) is the bounding-box coordinate regression loss:
L_loc(t^u, t*) = Σ_{i ∈ {x, y, w, h}} S(t_i^u - t_i*)
where S(x) = 0.5 x^2 for |x| < 1 and |x| - 0.5 otherwise; for each of the k target classes there is a prediction t^u = (t_x^u, t_y^u, t_w^u, t_h^u); t* is the parameterized coordinate vector of the ground-truth box corresponding to the candidate region; t_x^u, t_y^u, t_w^u and t_h^u are the x coordinate, y coordinate, width and height of the predicted bounding box for class u.
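A minimal sketch of the smooth term S and the box-regression loss, assuming the standard piecewise smooth-L1 definition (illustrative only, not the patent's implementation):

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: 0.5*x^2 when |x| < 1, |x| - 0.5 otherwise (elementwise)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1, 0.5 * x**2, np.abs(x) - 0.5)

def loc_loss(t_pred, t_true):
    """Box-regression loss: sum of smooth L1 over (tx, ty, tw, th)."""
    return float(np.sum(smooth_l1(np.asarray(t_pred) - np.asarray(t_true))))

# Example: predicted vs. ground-truth parameterized coordinates.
# Terms: 0.5*0.2^2 + 0.5*0.1^2 + (1.5 - 0.5) + 0 = 1.025
l = loc_loss([0.2, -0.1, 1.5, 0.0], [0.0, 0.0, 0.0, 0.0])
```

The quadratic branch keeps gradients small near zero error, while the linear branch prevents large outlier errors from dominating the gradient.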
Step 3: vehicle tracking
After the traffic signs and vehicles in the image have been recognized, a tracking algorithm is applied to each recognized vehicle to draw its driving trajectory. A minimum-sum-of-squared-error correlation filter is used, which makes the filter response on the target maximal:
g = f ⊛ h
where g is the response output, f the input image, h the filter template, and ⊛ the convolution operation. Applying the fast Fourier transform turns the convolution into an element-wise product:
F(g) = F(f) ⊙ F(h)*, abbreviated as G = F ⊙ H*
The task of tracking is therefore to find H*, the complex conjugate of H. During tracking, the model is trained by minimizing the sum of squared errors over the training samples:
min_{H*} Σ_i |F_i ⊙ H* - G_i|^2
Taking the partial derivative with respect to each element H_{wv} (where w and v index the elements of H) and setting it to 0 yields
H* = (Σ_i G_i ⊙ F_i*) / (Σ_i F_i ⊙ F_i*)
which is the model formula of the filter (⊙ denotes the Hadamard product). The numerator and the denominator of the filter model are updated separately, with an empirical learning-rate constant η:
A_t = η G_t ⊙ F_t* + (1 - η) A_{t-1}
B_t = η F_t ⊙ F_t* + (1 - η) B_{t-1}
where A_t and A_{t-1} are the numerators of the current and previous frames, and B_t and B_{t-1} the corresponding denominators; H_t = A_t / B_t is the filter template at time t; G_t is the response output at time t, with conjugate G_t*; and F_t is the input image at time t, with conjugate F_t*. During tracking, the template is correlated with the image of the current frame, and the coordinate of the maximum of the resulting response gives the position of the target in the current frame.
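The filter model and running-average update above match the MOSSE family of correlation trackers; a minimal NumPy sketch follows (an illustration under simplifying assumptions: single-channel frames, no windowing or log preprocessing, Gaussian response width and learning rate η=0.125 chosen arbitrarily):

```python
import numpy as np

def gaussian_response(h, w, cy, cx, sigma=2.0):
    """Desired correlation output: a Gaussian peak at the target centre."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy)**2 + (xs - cx)**2) / (2 * sigma**2))

class CorrelationTracker:
    """Minimal MOSSE-style filter implementing the update rules above."""
    def __init__(self, frame, cy, cx, eta=0.125, eps=1e-5):
        F = np.fft.fft2(frame)
        G = np.fft.fft2(gaussian_response(*frame.shape, cy, cx))
        self.A = G * np.conj(F)          # numerator   A_t
        self.B = F * np.conj(F) + eps    # denominator B_t
        self.eta = eta
        self.eps = eps

    def track(self, frame):
        F = np.fft.fft2(frame)
        H_conj = self.A / self.B                   # H* = A / B
        g = np.real(np.fft.ifft2(F * H_conj))      # response map
        cy, cx = np.unravel_index(np.argmax(g), g.shape)
        # Running-average template update with learning rate eta.
        G = np.fft.fft2(gaussian_response(*frame.shape, cy, cx))
        self.A = self.eta * G * np.conj(F) + (1 - self.eta) * self.A
        self.B = self.eta * F * np.conj(F) + (1 - self.eta) * self.B + self.eps
        return int(cy), int(cx)

# Track a synthetic blob that shifts by (+3, -2) between frames.
frame1 = gaussian_response(32, 32, 10, 12)
tracker = CorrelationTracker(frame1, 10, 12)
frame2 = np.roll(np.roll(frame1, 3, axis=0), -2, axis=1)
pos = tracker.track(frame2)   # peak found near the shifted centre (13, 10)
```

All work happens in the Fourier domain with element-wise products, which is what makes this filter fast enough to track many vehicles per frame.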
Step 4: vehicle violation detection and identification
The rules for legal passage through the intersection are obtained from traffic-sign recognition, and the driving trajectory of each vehicle in the intersection is obtained from vehicle recognition and tracking. By comparing the information represented by the traffic signs with the vehicle trajectories, the vehicles driving in violation can be clearly identified, and the video segment recording each violation can be saved as evidence for traffic authorities, e.g. for deferred penalty enforcement.
The advantages of the invention are:
1. By automatically recognizing and tracking the traffic signs and vehicles in intersection surveillance video, the invention automatically identifies vehicle violations at intersections, with few false detections and high efficiency; no dedicated cameras need to be installed, so it is widely applicable and avoids expending substantial manpower, material and financial resources.
2. The invention makes full use of the advantages of image convolution, reducing the influence on recognition of factors such as illumination changes, colour fading, motion blur, complex backgrounds and partial occlusion; it is robust to interference, with high recognition accuracy and a low error rate.
3. The C.ReLU activation function is applied in the low-level convolutions, which reduces the number of network parameters: the number of channels is halved, and concatenating the pre-activation output with its negation doubles the output again, restoring the original number of outputs. Computation speed is thus roughly doubled without loss of accuracy.
4. Inception is applied in the high-level network, improving small-target recognition mainly by controlling the convolution kernel size; at the same time the number of input channels can be reduced, which acts as dimensionality reduction and greatly lowers computational complexity.
5. The advantage of the feature-map cascade is that the image features are abstracted over multiple layers at a resolution suited to localization and detection, with high computational efficiency.
Brief description of the drawings
Fig. 1 is the overall flowchart of the method in this embodiment;
Fig. 2 is the framework of the traffic-sign and vehicle recognition system;
Fig. 3 is the overall structure of the recognition algorithm;
Fig. 4 is the flowchart of the C.ReLU and standardization operations in this embodiment;
Fig. 5 shows the composition of the Inception module in this embodiment;
Fig. 6 is a schematic diagram of the small network sliding over the convolutional feature map in this embodiment.
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings and embodiments. It should be noted that the specific embodiment is only an example of the preferred technical scheme of this patent and is not to be construed as limiting its scope of protection.
This embodiment provides a deep-learning-based method for detecting and identifying vehicle violations at intersections. The method is broadly divided into: collection of intersection surveillance video, recognition of traffic signs and vehicles, vehicle tracking, and vehicle violation detection and identification. Its flow is shown in Fig. 1.
Step 1: capturing intersection video images
This step can use the camera equipment already installed at intersections on urban roads to obtain surveillance video; road intersections are generally equipped with cameras or similar video capture devices, so using the existing equipment avoids additional cost. Existing video equipment, however, cannot by itself intelligently detect violations at the crossing, so the video collected at the intersection still needs to be processed. The video images captured at the intersection contain vehicle information, road information, traffic signal information and so on.
Step 2: recognition of traffic signs and vehicles
The collected video images do not directly reveal vehicle violations. To detect a violation, the position and type of each vehicle, and the position and type of each traffic sign on the road, must be recognized. With this recognition information, a computer or similar device can readily obtain the relevant traffic information, providing the basis for intelligent violation detection.
The key to the traffic-sign and vehicle recognition algorithm is a more effective method for extracting regions of interest together with a faster, more accurate recognition method, so that the traffic signs and vehicles in high-resolution images can be quickly detected and identified. The framework of the traffic-sign and vehicle recognition system in this embodiment is shown in Fig. 2.
In this step, the recognition of traffic signs and vehicles mainly comprises three parts: feature extraction from the input image, localization of traffic signs and vehicles, and classification of traffic signs and vehicles, realized by the specific steps below. Feature extraction from the input image, localization, and classification are all implemented mainly with convolutional neural networks. In this embodiment the convolutional neural network is built from multiple convolutional and pooling layers, which extract more effective features from the image for the subsequent candidate-region detection and classification; its workflow is shown in Fig. 3. Because the convolution operation is translation invariant, the feature maps generated by convolution contain not only the characteristic information of objects but also their positional information.
Step S201: feature extraction from the input image
In this step, features are extracted from the input image automatically by a convolutional neural network. This avoids the trouble of hand-crafting features; moreover, owing to the translation invariance of convolution, the generated feature maps contain both the characteristic information and the positional information of objects. The feature-extraction network used in this step comprises two parts: feature extraction and feature-map cascading.
1) Feature extraction
For feature extraction, the settings of the first five convolutional stages of the network used in this step are shown in Table 1 below.
Table 1. Network settings of the first five convolutional stages
The first stage uses the C.ReLU operation with a receptive field of 7*7 pixels and 32 output channels; the second stage uses C.ReLU with a 3*3-pixel receptive field and 64 output channels; the third stage uses C.ReLU with a 3*3-pixel receptive field and 128 output channels. The C.ReLU activation function used in these stages is applied to the low-level convolutions. It reduces the number of network parameters: the number of channels is halved, and concatenating the pre-activation output with its negation doubles the output again, restoring the original number of outputs, so that computing speed is improved without loss of accuracy. The mathematical expression of C.ReLU is:
C.ReLU(wx + b) = [ReLU(wx + b), ReLU(-(wx + b))]
where the brackets denote channel concatenation, w is the weight, x is the input of the neuron, and b is the bias of the neuron.
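A sketch of the concatenated-ReLU operation in NumPy (illustrative; the NCHW tensor layout and channel axis are assumptions, not stated in the patent):

```python
import numpy as np

def c_relu(x, axis=1):
    """Concatenated ReLU: stack ReLU(x) and ReLU(-x) along the channel axis.

    The convolution only has to produce half the channels; negation plus
    concatenation doubles them, roughly halving low-layer computation.
    """
    return np.concatenate([np.maximum(x, 0), np.maximum(-x, 0)], axis=axis)

# A batch of 1 with 16 "pre-activation" channels becomes 32 channels.
x = np.random.default_rng(1).standard_normal((1, 16, 8, 8))
y = c_relu(x)
```

Note that ReLU(x) - ReLU(-x) = x, so no information in the pre-activation is lost by the concatenation; only the rectification is made explicit in both signs.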
In this step, preferably, the activation function resides in an activation layer, and batch normalization (BN) is applied before the activation layer to accelerate network convergence and improve model accuracy. In the convolutional neural network of this embodiment, BN acts before the nonlinear mapping, i.e. the output wx + b of the convolutional layer is standardized: for a neuron with input x, (wx + b) is computed, BN is applied, and the activation function then yields the neuron's output. By normalizing, BN keeps the activation function in its linear interval, reduces sensitivity to the input, and enlarges gradients, which helps the model perform gradient descent and prevents vanishing gradients.
The input of BN is x={ x1,x2,...,xmAnd given parameters γ, β, so having:
μ is average, σ2It is variance,To be processed into the result of standardized normal distribution
Wherein, ε is the constant more than 0.The final output of BN is:
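The standardization formulas can be sketched as a training-mode forward pass in NumPy (an illustration under the assumption of per-feature statistics over the batch axis, not the patent's implementation):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization following the formulas above: standardize, then
    scale by gamma and shift by beta."""
    mu = x.mean(axis=0)                      # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # standardized activations
    return gamma * x_hat + beta

# Mini-batch of 3 samples with 2 features each.
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = batch_norm(x, gamma=1.0, beta=0.0)
```

With γ=1 and β=0 each feature column comes out with mean 0 and (near) unit variance; learned γ and β let the network undo the normalization where that helps.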
The flow of the C.ReLU and standardization operations is thus as shown in Fig. 4.
The fourth and fifth convolutional stages in this step use the Inception operation, which improves small-target recognition by controlling the convolution kernel size. The 1x1 convolution kernels in Inception not only increase the nonlinearity of the network but also preserve the receptive field of the previous layer (without losing resolution), so they work well for detecting small objects. Adding a 1x1 convolution reduces the number of input channels, acting as dimensionality reduction and greatly lowering computational complexity. The composition of Inception in this embodiment is shown in Fig. 5.
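The dimensionality-reduction effect of the 1x1 kernels can be illustrated with a rough multiply-add count (the 28x28 map size and 256/64 channel numbers are hypothetical, not taken from the patent):

```python
def conv_cost(h, w, k, c_in, c_out):
    """Multiply-adds for a k x k convolution on an h x w feature map."""
    return h * w * k * k * c_in * c_out

# A 3x3 convolution applied directly on 256 channels...
direct = conv_cost(28, 28, 3, 256, 256)
# ...versus a 1x1 bottleneck down to 64 channels, then the 3x3 convolution.
bottleneck = conv_cost(28, 28, 1, 256, 64) + conv_cost(28, 28, 3, 64, 256)
ratio = direct / bottleneck   # roughly 3.6x fewer operations with the bottleneck
```

The bottleneck leaves the spatial resolution and the 3x3 receptive field untouched; only the channel dimension is compressed before the expensive convolution.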
2) Feature-map cascading
To cascade the convolutional feature maps of different scales from different stages, making use of the semantics of high-level features while retaining the detailed texture of low-level features, in this step layer 3_4 is down-sampled by max pooling and layer 5_4 is up-sampled by bilinear interpolation; both are then combined with layer 4_4, and a 1x1 convolution generates a 512-channel multi-scale output feature that serves as the input of the subsequent region proposal network. This benefits both detection accuracy and localization precision, and the traffic-sign and vehicle recognition system is precisely the combination of target localization and classification. The cascade network configuration is given in Table 2. The advantage of the cascaded features is that they are abstracted over multiple layers at a resolution suited to detection, with high computational efficiency.
Table 2. Feature-map cascade network configuration
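A shape-level sketch of the cascade in NumPy (illustrative: nearest-neighbour upsampling stands in for the patent's bilinear interpolation, the channel counts are assumptions, and the final 1x1 convolution to 512 channels is only noted in a comment):

```python
import numpy as np

def max_pool_2x(x):
    """2x2 max pooling (stride 2) over a (C, H, W) feature map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample_2x(x):
    """2x upsampling; nearest-neighbour stands in for bilinear interpolation."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def cascade(f3, f4, f5):
    """Bring conv3/conv4/conv5 maps to one scale and concatenate channels."""
    return np.concatenate([max_pool_2x(f3), f4, upsample_2x(f5)], axis=0)

rng = np.random.default_rng(2)
f3 = rng.standard_normal((128, 32, 32))   # finer-scale stage (e.g. layer 3_4)
f4 = rng.standard_normal((256, 16, 16))   # reference scale (e.g. layer 4_4)
f5 = rng.standard_normal((384, 8, 8))     # coarser-scale stage (e.g. layer 5_4)
fused = cascade(f3, f4, f5)               # 128+256+384 = 768 channels at 16x16
# A 1x1 convolution would then map the 768 channels down to 512 for the RPN.
```

The three inputs end up spatially aligned at the middle scale, so each spatial position of the fused map sees fine texture and coarse semantics at once.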
S202: localization of traffic signs and vehicles
The localization of traffic signs and vehicles is one of the key steps of the whole recognition system. Its main task is to locate the candidate regions containing traffic signs and vehicles in the input image and then pass them to the classification module for traffic signs and vehicles.
(1) Extracting candidate regions with a region proposal network
To generate candidate region boxes, the input image passes through the feature-extraction network and a small network is slid over the convolutional feature map output by the last convolutional layer; the sliding network is fully connected to an n x n spatial window of the input feature map. Sliding the window over the convolutional feature map, rather than over the original image, reduces the dimensionality. Each sliding window is mapped to a low-dimensional vector, which is output to two sibling fully connected layers: a bounding-box regression layer and a bounding-box classification layer. Thus a small network is built over a window sliding on the feature map; classification distinguishes target from non-target, and regression gives the position of the bounding box. The position of the sliding window provides the location relative to the image, and the bounding-box regression provides the adjusted location relative to the sliding window. The principle is shown in Fig. 6.
At each sliding-window position a multi-scale candidate-box mechanism is used: k candidate boxes are predicted, covering several scales and several aspect ratios. The regression layer therefore has 4k outputs, the encoded coordinates of the k candidate boxes; the classification layer applies a softmax function to each of the k candidate boxes and outputs 2k scores estimating the probability of target vs. non-target. Classification gives the likelihood that each candidate box contains a target; regression gives the coordinate offsets of the k candidate boxes, from which bounding-box regression predicts the coordinates of the target region.
The softmax function maps several scalars to a probability distribution. For k scalars x_1, x_2, ..., x_k, the softmax function is defined as:
softmax(x_i) = exp(x_i) / Σ_j exp(x_j)
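The definition can be written directly in NumPy (a standard, numerically stabilized form; the example logits are arbitrary):

```python
import numpy as np

def softmax(x):
    """Map k scalars to a probability distribution."""
    z = np.exp(x - np.max(x))   # subtracting the max avoids overflow
    return z / z.sum()

scores = np.array([2.0, 1.0, 0.1])   # e.g. per-class logits for one box
probs = softmax(scores)
```

Subtracting the maximum leaves the result unchanged (the shift cancels in the ratio) while keeping every exponent at or below zero.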
Finally, a set of selected candidate regions, each with a target score, is obtained.
(2) Loss function of the region proposal network
To train the region proposal network, each candidate box is assigned a binary label in this step (the two cases being: is a target, is not a target). A positive label is assigned to two kinds of candidate boxes: (1) the box with the highest IoU (intersection area over union area of two regions) with some ground-truth bounding box; (2) any box whose IoU with some ground-truth bounding box exceeds 0.7. A negative label is assigned to every candidate box whose IoU with all ground-truth boxes is below 0.3. The loss function of the region proposal network is defined as:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)
where i is the index of a candidate box within a mini-batch iteration; p_i is the predicted probability that candidate box i is a target; the ground-truth label p_i* is 1 if the candidate box is positive and 0 otherwise; t_i is the vector of the 4 parameterized coordinates of the predicted bounding box, and t_i* is the coordinate vector of the corresponding ground-truth bounding box.
The classification loss L_cls is the log loss over the two classes, target and non-target:
L_cls(p_i, p_i*) = -log[p_i* p_i + (1 - p_i*)(1 - p_i)]
The regression loss is computed as L_reg(t_i, t_i*) = S(t_i - t_i*), where S is the smooth L1 function:
S(x) = 0.5 x^2, if |x| < 1; |x| - 0.5, otherwise
For the regression, this step uses four parametrised coordinates:
tx = (x - xa)/wa,  ty = (y - ya)/ha
tw = log(w/wa),  th = log(h/ha)
tx* = (x* - xa)/wa,  ty* = (y* - ya)/ha
tw* = log(w*/wa),  th* = log(h*/ha)
where x, y, w, h denote the centre coordinates (x, y), width and height of a bounding box; the variables x, xa, x* are the x coordinates of the predicted bounding box, the candidate box and the ground-truth bounding box respectively; y, ya, y* are the corresponding y coordinates; w, wa, w* the corresponding widths; and h, ha, h* the corresponding heights. N_cls is set to 256, N_reg to 2400 and λ to 10. tx, ty, tw, th are the four parametrised coordinates of the predicted bounding box, and tx*, ty*, tw*, th* the coordinate vector of the corresponding ground-truth bounding box.
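This parametrisation is invertible, which is what lets the network's predicted offsets be turned back into boxes. A minimal numpy sketch, using the box format (x, y, w, h) with (x, y) the centre as in the text (the function names are illustrative):

```python
import numpy as np

def encode(box, anchor):
    """Compute (tx, ty, tw, th) of a box relative to a candidate (anchor) box."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa, (y - ya) / ha,
                     np.log(w / wa), np.log(h / ha)])

def decode(t, anchor):
    """Invert the parametrisation: recover a box from offsets and an anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return np.array([xa + tx * wa, ya + ty * ha,
                     wa * np.exp(tw), ha * np.exp(th)])
```

A candidate box relative to itself encodes to the zero vector, and decode(encode(b, a), a) recovers b exactly.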
S203 Classification and recognition of traffic signs and vehicles
The main task of this stage is to perform feature extraction and classification on the candidate regions of traffic signs and vehicles located in the previous step, so as to obtain the type of each traffic sign and vehicle. It is the final step of the recognition system and one of the most important tasks in the algorithm.
(1) Recognition of traffic signs and vehicles with a region convolutional network
For the candidate regions containing traffic signs obtained in S202, the classification stage of the present invention uses a region convolutional neural network algorithm that shares convolutional features with the candidate region network. During training, the annotated data set is first fed through the network to generate convolutional feature maps. Each candidate region is then mapped onto these shared feature maps to obtain its feature information. Bilinear interpolation is applied to this feature information to produce a pooled region feature map of size 7*7, and a fully connected layer then yields a 4096-dimensional feature vector, which is the final feature of each candidate region proposed by the convolutional network. This vector is fed separately into a softmax classifier and a bounding-box regressor, and the target class and position of each candidate region are determined with non-maximum suppression. The loss function is then computed from the difference between the predicted values and the ground-truth annotations, and the network parameters are optimised with back-propagation and stochastic gradient descent to obtain the final network. In the experiments, a test image is fed into the network, which directly outputs the class and location of each traffic sign.
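The non-maximum suppression step mentioned above can be sketched as a greedy numpy routine (a minimal sketch under the common (x1, y1, x2, y2) corner convention; the 0.5 IoU threshold is an illustrative default, not a value fixed by the patent):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of the kept box with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]     # drop heavily overlapping boxes
    return keep
```

Two nearly coincident detections of one sign collapse to the higher-scoring one, while boxes on distinct targets survive.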
(2) singular value decomposition
The fully connected layers in the fast region network consume a large amount of time. In mathematics, eigenvalue decomposition is well suited to extracting the features of a square matrix; since the weight matrices of fully connected layers are not square, singular value decomposition can be used instead to obtain the features of the matrix itself. Singular value decomposition is therefore chosen here to accelerate the recognition speed of the traffic sign recognition system.
Let W be a weight matrix of size u × v; it can then be approximated by singular value decomposition as:
W ≈ U Σ_t V^T
where U is a matrix of size u × t, Σ_t is a diagonal matrix of size t × t, and V^T is a matrix of size t × v.
The eigenvalues λ of W^T W are obtained from |W^T W - λI| = 0.
With W^T denoting the transpose of W, the eigenvectors v_i of W^T W are obtained from:
(W^T W) v_i = λ_i v_i
In addition, the singular values can be obtained as σ_i = sqrt(λ_i), and the columns of U then follow from:
u_i = W v_i / σ_i
The product of the three matrices U, Σ_t, V^T approximates the matrix W, yet the combined size of these three matrices is far smaller than the original W; from a storage standpoint, the smaller the matrices, the smaller the storage required. To compress storage, the original matrix W is replaced by the three matrices U, Σ_t and V^T. Using singular value decomposition in this way reduces the recognition computation time by 30%.
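The compression can be sketched with numpy's SVD routine (a minimal sketch; t is the number of retained singular values, and the helper names are illustrative):

```python
import numpy as np

def compress_fc(W, t):
    """Replace a u-by-v fully connected weight matrix W by a rank-t
    approximation W ~ U Sigma_t V^T, stored as two smaller factors."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = U[:, :t] * s[:t]   # u x t: U with Sigma_t folded in
    R = Vt[:t, :]          # t x v
    return L, R            # y = (x @ L) @ R replaces y = x @ W
```

One u × v multiplication becomes a u × t followed by a t × v multiplication, so both storage and forward-pass cost drop from u·v to t·(u + v) when t is small.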
(3) The loss function of the region convolutional neural network
For each input region, the region convolutional neural network outputs the probabilities of K+1 target classes (including background) together with regressed bounding-box coordinates. The loss function defined for each training candidate region is:
L(p, u, t^u, t*) = L_cls(p, u) + [u ≥ 1] L_loc(t^u, t*)
where L_cls(p, u) = -log p_u is the log loss for the ground-truth class u of the candidate region.
For the k+1 classes the probabilities are denoted p = (p_0, p_1, ..., p_k). [u ≥ 1] is the indicator function; when the candidate region is background, u = 0. L_loc(t^u, t*) is the regression loss on the bounding-box coordinates:
L_loc(t^u, t*) = Σ_{i ∈ {x, y, w, h}} S(t_i^u - t_i*)
where the function S is:
S(x) = 0.5 x^2, if |x| < 1;  |x| - 0.5, otherwise
Here, for each of the k target classes, t^u = (t_x^u, t_y^u, t_w^u, t_h^u); t* is the parametrised coordinate vector of the ground-truth bounding box corresponding to the candidate region; t_x^u, t_y^u, t_w^u and t_h^u are the x, y, w and h coordinates of the predicted bounding box for class u.
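Numerically, this two-part loss can be sketched as follows (illustrative helper names; p is the class probability vector and u the ground-truth class, with 0 meaning background):

```python
import numpy as np

def smooth_l1(x):
    """The function S above: quadratic near zero, linear beyond |x| = 1."""
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x**2, x - 0.5)

def detection_loss(p, u, t_u, t_star):
    """Log loss on the true class u plus, for non-background regions
    (u >= 1), a smooth-L1 box regression loss."""
    l_cls = -np.log(p[u])
    l_loc = smooth_l1(np.asarray(t_u) - np.asarray(t_star)).sum()
    return l_cls + (l_loc if u >= 1 else 0.0)
```

Background regions (u = 0) contribute no box term, matching the [u ≥ 1] indicator in the formula.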
Step 3: tracking of vehicles
After the traffic signs and vehicles in the image have been recognised, a tracking algorithm is applied to the recognised vehicles to draw out each vehicle's driving trajectory.
The present invention uses a Minimum Output Sum of Squared Error filter (MOSSE), which makes the response at the target maximal. The filter satisfies:
g = f ⊗ h
where g denotes the response output, f the input image, h the filter template, and ⊗ the convolution operation. To obtain the response output g, only the filter template h needs to be determined. Computing the above formula directly as a convolution is very expensive on a computer, so a fast Fourier transform (FFT) is applied. After the FFT the convolution becomes an element-wise product, greatly reducing the computation, and the formula takes the form:
F(g) = F(f) ⊙ F(h)*
For convenience this is written as G = F ⊙ H*, where H* is the conjugate of H. The task of tracking is therefore to find H*, i.e.:
H* = G ⊘ F  (element-wise division)
In actual tracking, to take into account factors such as the target's changing appearance, m images of the target are used as references so as to improve the robustness of the filter template. This is the MOSSE model proposed in this patent; its model formula is:
min_{H*} Σ_{i=1}^{m} |F_i ⊙ H* - G_i|^2
where ⊙ denotes the Hadamard product, i.e. element-wise multiplication between matrices or vectors. Since all operations in the formula are element-wise, it suffices to minimise the MOSSE objective for each element of H independently (w and v being the indices of an element of H). The formula can therefore be converted into the form:
min_{H_wv*} Σ_{i=1}^{m} |F_{i,wv} H_wv* - G_{i,wv}|^2
To minimise this expression, it is only necessary to take the partial derivative with respect to H_wv* and set it to 0, i.e.:
∂/∂H_wv* Σ_{i=1}^{m} |F_{i,wv} H_wv* - G_{i,wv}|^2 = 0
which gives, for each element:
H_wv = Σ_i F_{i,wv} G_{i,wv}* / Σ_i F_{i,wv} F_{i,wv}*
so that finally H is obtained in matrix form as:
H = (Σ_{i=1}^{m} F_i ⊙ G_i*) / (Σ_{i=1}^{m} F_i ⊙ F_i*)
This is the model formula of the filter.
To give the filter better robustness against external influences such as deformation and illumination changes, the following template update scheme is adopted:
H_t = A_t / B_t
A_t = η F_t ⊙ G_t* + (1 - η) A_{t-1}
B_t = η F_t ⊙ F_t* + (1 - η) B_{t-1}
where H_t is the filter template at time t; G_t is the response output at time t and G_t* its conjugate; F_t is the input image at time t and F_t* its conjugate.
The model formula of the filter is split into numerator and denominator, which are updated separately; the update parameter η is an empirical constant. A_t and A_{t-1} denote the numerator at the current frame and the previous frame; B_t and B_{t-1} denote the denominator at the current frame and the previous frame.
During tracking, it is only necessary to correlate the template above with the image of the current frame; the coordinate of the maximum in the resulting response is the target's position in the current frame. This amounts to translating our filter template over the two-dimensional plane.
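The whole loop, closed-form filter, FFT-domain correlation, and the running-average template update, can be sketched with numpy FFTs (a minimal sketch following the update formulas above; the array shapes and the learning rate η = 0.125 are illustrative assumptions):

```python
import numpy as np

def update_filter(A, B, frame, target, eta=0.125):
    """One template update: A_t = eta*F.G* + (1-eta)*A_{t-1}, and likewise B_t."""
    F, G = np.fft.fft2(frame), np.fft.fft2(target)
    A = eta * (F * np.conj(G)) + (1 - eta) * A
    B = eta * (F * np.conj(F)) + (1 - eta) * B
    return A, B

def respond(A, B, frame, eps=1e-5):
    """Correlate the current frame with H = A/B; the response peak is the target."""
    H = A / (B + eps)   # eps guards against division by near-zero spectra
    return np.real(np.fft.ifft2(np.fft.fft2(frame) * np.conj(H)))
```

Training on one frame with an impulse response at the origin, then feeding in a circularly shifted copy of that frame, moves the response peak by exactly the shift, which is how the tracker localises the target frame to frame.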
Step 4: vehicle violation detection and identification
The rules for legal driving at the intersection are obtained by recognising the traffic signs, and the driving trajectory of each vehicle at the intersection is obtained after vehicle recognition and tracking. By comparing the information represented by the traffic signs with the vehicles' trajectories, the vehicles driving in violation can be clearly identified, and the video segment recording each violation can be saved as evidence for vehicle-management authorities, e.g. for the enforcement of fines. For example, if a sign prohibiting U-turns (no_turn) is recognised and the trajectory of a vehicle at the intersection shows that the vehicle made a U-turn, the vehicle is found to be driving in violation.
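The comparison logic itself is simple once both inputs are symbolic. A toy sketch (the sign codes and manoeuvre labels are illustrative assumptions, with no_turn taken from the example above):

```python
# Hypothetical mapping from recognised sign codes to forbidden manoeuvres.
FORBIDDEN = {
    "no_turn":       {"u_turn"},
    "no_left_turn":  {"left_turn"},
    "no_right_turn": {"right_turn"},
}

def is_violation(sign, manoeuvre):
    """True if the manoeuvre inferred from a tracked trajectory
    violates the recognised traffic sign."""
    return manoeuvre in FORBIDDEN.get(sign, set())
```

In the full system, the sign code would come from the classifier of S203 and the manoeuvre label from the tracked trajectory of Step 3.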
By automatically recognising and tracking the traffic signs and vehicles in intersection surveillance video, the present invention can automatically identify vehicle violations at the intersection. The system has few false detections and high efficiency, requires no additional dedicated cameras, is widely applicable, and avoids the expenditure of large amounts of manpower, material and financial resources.

Claims (2)

1. An intersection vehicle violation detection and identification method based on deep learning, characterised in that the method comprises the following steps:
Step 1: collecting intersection video detection images
Video images are collected by the video capture devices at the intersection;
Step 2: recognition of traffic signs and vehicles
In this step, the recognition of traffic signs and vehicles comprises:
Step S201: feature extraction on the input image
Automatic feature extraction is performed on the input image using a convolutional neural network; the convolutional neural network comprises two cascaded parts, feature extraction and feature-map cascading;
1) Feature extraction: the convolutional neural network comprises five convolutional stages, wherein the first stage contains convolutional layer 1_1; the second stage contains layers 2_1, 2_2 and 2_3; the third stage contains layers 3_1, 3_2, 3_3 and 3_4; the fourth stage contains layers 4_1, 4_2, 4_3 and 4_4; the fifth stage contains layers 5_1, 5_2, 5_3 and 5_4; the first stage uses the C.ReLU operation with a 7*7-pixel receptive field and 32 output channels; the second stage uses C.ReLU with a 3*3-pixel receptive field and 64 output channels; the third stage uses C.ReLU with a 3*3-pixel receptive field and 128 output channels; the fourth and fifth stages use Inception operations;
2) Feature-map cascading: the convolutional feature layers of different scales from the different convolutional stages are cascaded, exploiting the semantic information of high-level features while also retaining the detailed texture information of low-level features; layer 3_4 is max-pooled, layer 5_4 is up-sampled with a bilinear interpolation algorithm, and both are combined with layer 4_4 through a 1x1 convolution to generate a 512-channel multi-scale output feature, which serves as the input to the following candidate region network;
S202: detection and localisation of traffic signs and vehicles
Candidate regions containing traffic signs and vehicles are localised in the image after feature extraction, and the candidate regions are then output to the classification module for traffic signs and vehicles; this includes:
1) Extracting candidate regions with a candidate region network: after the input image has passed through the feature extraction network, a small network is slid over the convolutional feature map output by the last convolutional layer; this sliding small network is fully connected to an n*n spatial window of the input convolutional feature map, so as to generate candidate region boxes.
2) The loss function of the candidate region network
To train the candidate region network, this step assigns each candidate box a binary label (the two cases being: is a target, is not a target). A positive label is assigned to two kinds of candidate boxes: (1) the box having the highest IoU (the ratio of the intersection area to the union area of two regions) with some ground-truth bounding box; (2) any candidate box whose IoU with a ground-truth bounding box exceeds 0.7. All candidate boxes whose IoU with every ground-truth region is below 0.3 are assigned negative labels. The loss function of the candidate region network is defined as:
C({p_i}, {t_i}) = (1/N_cls) Σ_i C_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* C_reg(t_i, t_i*)
where i is the index of a candidate box sampled in one mini-batch iteration; p_i is the probability that candidate box i is a target; if the candidate box carries a positive label, the corresponding ground-truth label p_i* is 1, otherwise p_i* is 0; t_i denotes the four parametrised coordinates of the predicted bounding box, and t_i* is the coordinate vector of the corresponding ground-truth bounding box;
The classification loss C_cls is the log loss over the two classes, target and non-target:
C_cls(p_i, p_i*) = -log[p_i p_i* + (1 - p_i)(1 - p_i*)]
with i, p_i and p_i* as defined above;
The regression loss is computed as C_reg(t_i, t_i*) = S(t_i - t_i*), where S(x) is:
S(x) = 0.5 x^2, if |x| < 1;  |x| - 0.5, otherwise
For the regression, this step uses four parametrised coordinates:
tx = (x - xa)/wa,  ty = (y - ya)/ha
tw = log(w/wa),  th = log(h/ha)
tx* = (x* - xa)/wa,  ty* = (y* - ya)/ha
tw* = log(w*/wa),  th* = log(h*/ha)
where x, y, w, h denote the centre coordinates (x, y), width and height of a bounding box; the variables x, xa, x* are the x coordinates of the predicted bounding box, the candidate box and the ground-truth bounding box respectively; y, ya, y* are the corresponding y coordinates; w, wa, w* the corresponding widths; and h, ha, h* the corresponding heights; N_cls is set to 256, N_reg to 2400 and λ to 10; tx, ty, tw, th are the four parametrised coordinates of the predicted bounding box, and tx*, ty*, tw*, th* the coordinate vector of the corresponding ground-truth bounding box;
S203: classification and recognition of traffic signs and vehicles
The main task of the classification and recognition of traffic signs and vehicles is to perform feature extraction and classification on the candidate regions of traffic signs and vehicles finally located in the previous step, so as to obtain the type of each traffic sign and vehicle; this includes:
1) Recognising traffic signs and vehicles with a region convolutional network: for the candidate regions containing traffic signs obtained in S202, traffic signs are recognised with a region convolutional neural network algorithm that shares convolutional features with the candidate region network;
2) Singular value decomposition: singular value decomposition is used to accelerate the recognition speed of the traffic sign recognition system; let W be a weight matrix of size u × v; it can then be approximated by singular value decomposition as:
W ≈ U Σ_t V^T
where U is a matrix of size u × t; Σ_t is a diagonal matrix of size t × t; V^T is a matrix of size t × v; the eigenvalues λ of W^T W are obtained from |W^T W - λI| = 0; with W^T the transpose of W, the eigenvectors v_i of W^T W are obtained from (W^T W) v_i = λ_i v_i; the singular values are obtained as σ_i = sqrt(λ_i), and the columns of U then follow from u_i = W v_i / σ_i; the product of the three matrices U, Σ_t, V^T approximates the matrix W;
3) The loss function of the region convolutional neural network: for each input region the region convolutional neural network outputs the probabilities of K+1 target classes (including background) together with regressed bounding-box coordinates. The loss function defined for each training candidate region is:
L(p, u, t^u, t*) = L_cls(p, u) + [u ≥ 1] L_loc(t^u, t*)
where L_cls(p, u) = -log p_u is the log loss for the ground-truth class u of the candidate region; for the k+1 classes the probabilities are denoted p = (p_0, p_1, ..., p_k); [u ≥ 1] is the indicator function, with u = 0 when the candidate region is background; L_loc(t^u, t*) is the regression loss on the bounding-box coordinates:
L_loc(t^u, t*) = Σ_{i ∈ {x, y, w, h}} S(t_i^u - t_i*)
where the function S is S(x) = 0.5 x^2 if |x| < 1, and |x| - 0.5 otherwise; here, for each of the k target classes, t^u = (t_x^u, t_y^u, t_w^u, t_h^u); t* is the parametrised coordinate vector of the ground-truth bounding box corresponding to the candidate region; t_x^u, t_y^u, t_w^u and t_h^u are the x, y, w and h coordinates of the predicted bounding box for class u;
Step 3: tracking of vehicles
After the traffic signs and vehicles in the image have been recognised, a tracking algorithm is applied to the recognised vehicles to draw out each vehicle's driving trajectory; a Minimum Output Sum of Squared Error filter satisfying g = f ⊗ h is used so that its response at the target is maximal, where g denotes the response output, f the input image, h the filter template, and ⊗ the convolution operation; applying a fast Fourier transform turns the convolution into an element-wise product, F(g) = F(f) ⊙ F(h)*, abbreviated G = F ⊙ H*; the task of tracking is therefore to find H*, the conjugate of H, i.e. H* = G ⊘ F (element-wise division); during tracking the model formula used is:
min_{H*} Σ_{i=1}^{m} |F_i ⊙ H* - G_i|^2
where ⊙ denotes the Hadamard product; element-wise this becomes
min_{H_wv*} Σ_{i=1}^{m} |F_{i,wv} H_wv* - G_{i,wv}|^2
where w and v are the indices of an element of H; taking the partial derivative with respect to H_wv* and setting it to 0 gives
H_wv = Σ_i F_{i,wv} G_{i,wv}* / Σ_i F_{i,wv} F_{i,wv}*
and finally H is obtained as H = (Σ_{i=1}^{m} F_i ⊙ G_i*) / (Σ_{i=1}^{m} F_i ⊙ F_i*), which is the model formula of the filter; the template is updated as follows:
H_t = A_t / B_t
A_t = η F_t ⊙ G_t* + (1 - η) A_{t-1}
B_t = η F_t ⊙ F_t* + (1 - η) B_{t-1}
The model formula of the filter is split into numerator and denominator, which are updated separately; the update parameter η is an empirical constant; A_t and A_{t-1} denote the numerator at the current frame and the previous frame; B_t and B_{t-1} denote the denominator at the current frame and the previous frame; during tracking, the template is correlated with the image of the current frame, and the coordinate of the maximum in the resulting response is the target's position in the current frame; here H_t is the filter template at time t, G_t the response output at time t and G_t* its conjugate, F_t the input image at time t and F_t* its conjugate.
Step 4: vehicle violation detection and identification
The rules for legal driving at the intersection are obtained by recognising the traffic signs, and the driving trajectory of each vehicle at the intersection is obtained after vehicle recognition and tracking; by comparing the information represented by the traffic signs with the vehicles' trajectories, the vehicles driving in violation can be clearly identified, and the video segment recording each violation can be saved as evidence for vehicle-management authorities, e.g. for the enforcement of fines.
2. The intersection vehicle violation detection and identification method based on deep learning according to claim 1, characterised in that
each sliding window is mapped onto a low-dimensional vector; this vector is output to two sibling fully connected layers: a bounding-box regression layer and a bounding-box classification layer;
a small network is established by sliding a small window over the feature map layer; classification: target and non-target; regression: the position of the bounding box; the position of the sliding window provides location information relative to the picture, and the bounding-box regression provides location information adjusted relative to the sliding window; at each sliding-window position a multi-scale candidate-box mechanism is used, combining several scales and several aspect ratios, and k candidate boxes are predicted, so the regression layer has 4k outputs, namely the parametrised coordinates of the k candidate boxes;
the classification layer applies a softmax function to the k candidate boxes and outputs 2k scores estimating the probability of target versus non-target; the classification gives the likelihood that each candidate box contains a target; the coordinate offsets output by the regression layer are used in bounding-box regression to predict the coordinates of the target region;
here, the softmax function maps k scalars to a probability distribution; for k scalars x1, x2, ..., xk the softmax function is defined as softmax(x_j) = exp(x_j) / Σ_{i=1}^{k} exp(x_i); finally, a series of box-selected candidate regions, each carrying a target score, is obtained.
CN201710037410.7A 2017-01-19 2017-01-19 A kind of intersection vehicles system for detecting regulation violation based on Traffic Sign Recognition Pending CN106886755A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710037410.7A CN106886755A (en) 2017-01-19 2017-01-19 A kind of intersection vehicles system for detecting regulation violation based on Traffic Sign Recognition


Publications (1)

Publication Number Publication Date
CN106886755A true CN106886755A (en) 2017-06-23

Family

ID=59175741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710037410.7A Pending CN106886755A (en) 2017-01-19 2017-01-19 A kind of intersection vehicles system for detecting regulation violation based on Traffic Sign Recognition

Country Status (1)

Country Link
CN (1) CN106886755A (en)

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609491A (en) * 2017-08-23 2018-01-19 中国科学院声学研究所 A kind of vehicle peccancy parking detection method based on convolutional neural networks
CN107729895A (en) * 2017-10-18 2018-02-23 吉林大学 A kind of intelligent vehicle ADAS aims of systems detection method and device
CN108010025A (en) * 2017-12-14 2018-05-08 浙江大学 Switch and indicator lamp positioning and state identification method of screen cabinet based on RCNN
CN108090918A (en) * 2018-02-12 2018-05-29 天津天地伟业信息系统集成有限公司 A kind of Real-time Human Face Tracking based on the twin network of the full convolution of depth
CN108171752A (en) * 2017-12-28 2018-06-15 成都阿普奇科技股份有限公司 A kind of sea ship video detection and tracking based on deep learning
CN108509978A (en) * 2018-02-28 2018-09-07 中南大学 The multi-class targets detection method and model of multi-stage characteristics fusion based on CNN
CN108536287A (en) * 2018-03-26 2018-09-14 深圳市深晓科技有限公司 A kind of method and device indicating reading according to user
CN108898078A (en) * 2018-06-15 2018-11-27 上海理工大学 A kind of traffic sign real-time detection recognition methods of multiple dimensioned deconvolution neural network
CN108932850A (en) * 2018-06-22 2018-12-04 安徽科力信息产业有限责任公司 A kind of record motor vehicle runs at a low speed the method and device of illegal activities
CN108985145A (en) * 2018-05-29 2018-12-11 同济大学 The Opposite direction connection deep neural network model method of small size road traffic sign detection identification
CN109190687A (en) * 2018-08-16 2019-01-11 新智数字科技有限公司 A kind of nerve network system and its method for identifying vehicle attribute
CN109190442A (en) * 2018-06-26 2019-01-11 杭州雄迈集成电路技术有限公司 A kind of fast face detecting method based on depth cascade convolutional neural networks
CN109326124A (en) * 2018-10-17 2019-02-12 江西洪都航空工业集团有限责任公司 A kind of urban environment based on machine vision parks cars Activity recognition system
CN109344779A (en) * 2018-10-11 2019-02-15 高新兴科技集团股份有限公司 A kind of method for detecting human face under ring road scene based on convolutional neural networks
CN109376572A (en) * 2018-08-09 2019-02-22 同济大学 Real-time vehicle detection and trace tracking method in traffic video based on deep learning
CN109684901A (en) * 2017-10-19 2019-04-26 富士通株式会社 Image processing apparatus and image processing method
CN109685030A (en) * 2018-12-29 2019-04-26 哈尔滨理工大学 A kind of mug rim of a cup defects detection classification method based on convolutional neural networks
CN109740424A (en) * 2018-11-23 2019-05-10 深圳市华尊科技股份有限公司 Traffic violations recognition methods and Related product
CN109993056A (en) * 2019-02-25 2019-07-09 平安科技(深圳)有限公司 A kind of method, server and storage medium identifying vehicle violation behavior
CN110069961A (en) * 2018-01-24 2019-07-30 北京京东尚科信息技术有限公司 A kind of object detecting method and device
CN110096981A (en) * 2019-04-22 2019-08-06 长沙千视通智能科技有限公司 A kind of video big data traffic scene analysis method based on deep learning
CN110197589A (en) * 2019-05-29 2019-09-03 杭州诚道科技股份有限公司 A kind of illegal detection method of making a dash across the red light based on deep learning
CN110287756A (en) * 2018-03-19 2019-09-27 株式会社东芝 Identification device and Vehicular system
CN110490077A (en) * 2019-07-18 2019-11-22 北京工业大数据创新中心有限公司 A kind of intelligence is broken rules and regulations recognition methods and system
CN110633595A (en) * 2018-06-21 2019-12-31 北京京东尚科信息技术有限公司 Target detection method and device by utilizing bilinear interpolation
CN111160206A (en) * 2019-12-24 2020-05-15 国汽(北京)智能网联汽车研究院有限公司 Traffic environment element visual perception method and device
CN111199647A (en) * 2018-11-16 2020-05-26 中电科新型智慧城市研究院有限公司 Monitoring video detection method for continuous lane changing and illegal turning of road vehicles
CN111209923A (en) * 2020-04-23 2020-05-29 北京慧智数据科技有限公司 Deep learning technology-based muck truck cover or uncover identification method
CN111507162A (en) * 2019-01-30 2020-08-07 斯特拉德视觉公司 Blind spot warning method and device based on cooperation of communication between vehicles
CN111695403A (en) * 2020-04-19 2020-09-22 东风汽车股份有限公司 2D and 3D image synchronous detection method based on depth perception convolutional neural network
CN112041908A (en) * 2018-04-27 2020-12-04 上海趋视信息科技有限公司 System and method for monitoring traffic sign violations
WO2021008039A1 (en) * 2019-07-17 2021-01-21 Zhejiang Dahua Technology Co., Ltd. Systems and methods for object monitoring
WO2021142944A1 (en) * 2020-01-13 2021-07-22 南京新一代人工智能研究院有限公司 Vehicle behaviour recognition method and apparatus
WO2021185379A1 (en) * 2020-03-20 2021-09-23 长沙智能驾驶研究院有限公司 Dense target detection method and system
CN113593239A (en) * 2021-08-12 2021-11-02 上海仙塔智能科技有限公司 Method and server for monitoring violation among vehicles and vehicle
CN114677652A (en) * 2022-05-30 2022-06-28 武汉博观智能科技有限公司 Illegal behavior monitoring method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0565864A1 (en) * 1992-04-16 1993-10-20 Inventio Ag Artificially intelligent traffic modelling and prediction system
CN102902955A (en) * 2012-08-30 2013-01-30 中国科学技术大学 Method and system for intelligently analyzing vehicle behaviour
CN106326858A (en) * 2016-08-23 2017-01-11 北京航空航天大学 Road traffic sign automatic identification and management system based on deep learning


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Ling Zhigang: "UAV Scene Matching Aided Navigation Technology", 31 August 2016 *
Xu Changqing et al.: "Mathematical Experiments and Software Computation", 31 January 2014 *
Dong Yanmei: "Target Tracking Technology Based on Correlation Filters", China Master's Theses Full-text Database *
Zhong Xiaoming et al.: "Research on a Traffic Sign Recognition Algorithm Based on Fast Region Convolutional Neural Networks", Proceedings of the 2016 Annual Congress of the Society of Automotive Engineers of China *
Yan Pingfan et al.: "Artificial Neural Networks and Simulated Evolutionary Computation", 30 November 2000 *

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609491A (en) * 2017-08-23 2018-01-19 Institute of Acoustics, Chinese Academy of Sciences Vehicle illegal parking detection method based on convolutional neural networks
CN107609491B (en) * 2017-08-23 2020-05-26 Institute of Acoustics, Chinese Academy of Sciences Vehicle illegal parking detection method based on convolutional neural network
CN107729895A (en) * 2017-10-18 2018-02-23 Jilin University Intelligent vehicle ADAS system target detection method and device
CN109684901B (en) * 2017-10-19 2023-06-06 Fujitsu Ltd Image processing apparatus and image processing method
CN109684901A (en) * 2017-10-19 2019-04-26 Fujitsu Ltd Image processing apparatus and image processing method
CN108010025B (en) * 2017-12-14 2022-05-13 Zhejiang University Switch and indicator lamp positioning and state identification method of screen cabinet based on RCNN
CN108010025A (en) * 2017-12-14 2018-05-08 Zhejiang University Switch and indicator lamp positioning and state identification method of screen cabinet based on RCNN
CN108171752A (en) * 2017-12-28 2018-06-15 Chengdu Apuqi Technology Co., Ltd. Marine ship video detection and tracking method based on deep learning
CN110069961A (en) * 2018-01-24 2019-07-30 Beijing Jingdong Shangke Information Technology Co., Ltd. Object detection method and device
CN108090918A (en) * 2018-02-12 2018-05-29 Tianjin Tiandy Information Systems Integration Co., Ltd. Real-time face tracking method based on a deep fully-convolutional Siamese network
CN108509978A (en) * 2018-02-28 2018-09-07 Central South University Multi-class target detection method and model based on CNN multi-level feature fusion
CN108509978B (en) * 2018-02-28 2022-06-07 Central South University Multi-class target detection method and model based on CNN multi-level feature fusion
CN110287756A (en) * 2018-03-19 2019-09-27 Toshiba Corp Recognition device and vehicle system
CN108536287A (en) * 2018-03-26 2018-09-14 Shenzhen Shenxiao Technology Co., Ltd. Method and device for reading according to user instruction
CN108536287B (en) * 2018-03-26 2021-03-02 Shenzhen Tongwei Communication Technology Co., Ltd. Method and device for reading according to user instruction
CN112041908A (en) * 2018-04-27 2020-12-04 Shanghai Truthvision Information Technology Co., Ltd. System and method for monitoring traffic sign violations
US11594030B2 (en) 2018-04-27 2023-02-28 Shanghai Truthvision Information Technology Co., Ltd. Systems and methods for monitoring traffic sign violation
CN108985145A (en) * 2018-05-29 2018-12-11 Tongji University Reverse-connected deep neural network model method for small-size traffic sign detection and recognition
CN108898078A (en) * 2018-06-15 2018-11-27 University of Shanghai for Science and Technology Real-time traffic sign detection and recognition method based on a multi-scale deconvolution neural network
CN110633595B (en) * 2018-06-21 2022-12-02 Beijing Jingdong Shangke Information Technology Co., Ltd. Target detection method and device using bilinear interpolation
CN110633595A (en) * 2018-06-21 2019-12-31 Beijing Jingdong Shangke Information Technology Co., Ltd. Target detection method and device using bilinear interpolation
CN108932850A (en) * 2018-06-22 2018-12-04 Anhui Keli Information Industry Co., Ltd. Method and device for recording low-speed driving violations of motor vehicles
CN109190442A (en) * 2018-06-26 2019-01-11 Hangzhou Xiongmai Integrated Circuit Technology Co., Ltd. Fast face detection method based on deep cascaded convolutional neural networks
CN109190442B (en) * 2018-06-26 2021-07-06 Hangzhou Xiongmai Integrated Circuit Technology Co., Ltd. Rapid face detection method based on deep cascade convolution neural network
CN109376572A (en) * 2018-08-09 2019-02-22 Tongji University Real-time vehicle detection and trajectory tracking method in traffic video based on deep learning
CN109376572B (en) * 2018-08-09 2022-05-03 Tongji University Real-time vehicle detection and trajectory tracking method in traffic video based on deep learning
CN109190687A (en) * 2018-08-16 2019-01-11 Xinzhi Digital Technology Co., Ltd. Neural network system and method for identifying vehicle attributes
CN109344779A (en) * 2018-10-11 2019-02-15 Gosuncn Technology Group Co., Ltd. Face detection method for ring-road scenes based on convolutional neural networks
CN109326124A (en) * 2018-10-17 2019-02-12 Jiangxi Hongdu Aviation Industry Group Co., Ltd. Machine-vision-based parked vehicle behavior recognition system for urban environments
CN111199647A (en) * 2018-11-16 2020-05-26 CETC New Smart City Research Institute Co., Ltd. Monitoring video detection method for continuous lane changing and illegal turning of road vehicles
CN109740424A (en) * 2018-11-23 2019-05-10 Shenzhen Huazun Technology Co., Ltd. Traffic violation recognition method and related product
CN109685030A (en) * 2018-12-29 2019-04-26 Harbin University of Science and Technology Cup-rim defect detection and classification method based on convolutional neural networks
CN111507162B (en) * 2019-01-30 2023-11-07 StradVision, Inc. Blind spot warning method and device based on inter-vehicle communication cooperation
CN111507162A (en) * 2019-01-30 2020-08-07 StradVision, Inc. Blind spot warning method and device based on inter-vehicle communication cooperation
CN109993056A (en) * 2019-02-25 2019-07-09 Ping An Technology (Shenzhen) Co., Ltd. Method, server and storage medium for identifying vehicle violation behavior
WO2020173022A1 (en) * 2019-02-25 2020-09-03 Ping An Technology (Shenzhen) Co., Ltd. Vehicle violation identifying method, server and storage medium
CN110096981A (en) * 2019-04-22 2019-08-06 Changsha Qianshitong Intelligent Technology Co., Ltd. Video big data traffic scene analysis method based on deep learning
CN110197589A (en) * 2019-05-29 2019-09-03 Hangzhou Chengdao Technology Co., Ltd. Red-light-running violation detection method based on deep learning
WO2021008039A1 (en) * 2019-07-17 2021-01-21 Zhejiang Dahua Technology Co., Ltd. Systems and methods for object monitoring
CN110490077B (en) * 2019-07-18 2022-04-15 Beijing Industrial Big Data Innovation Center Co., Ltd. Intelligent violation identification method and system
CN110490077A (en) * 2019-07-18 2019-11-22 Beijing Industrial Big Data Innovation Center Co., Ltd. Intelligent violation recognition method and system
CN111160206A (en) * 2019-12-24 2020-05-15 Guoqi (Beijing) Intelligent and Connected Vehicle Research Institute Co., Ltd. Traffic environment element visual perception method and device
WO2021142944A1 (en) * 2020-01-13 2021-07-22 Nanjing New Generation Artificial Intelligence Research Institute Co., Ltd. Vehicle behaviour recognition method and apparatus
WO2021185379A1 (en) * 2020-03-20 2021-09-23 Changsha Intelligent Driving Institute Co., Ltd. Dense target detection method and system
CN111695403A (en) * 2020-04-19 2020-09-22 Dongfeng Automobile Co., Ltd. 2D and 3D image synchronous detection method based on depth-aware convolutional neural network
CN111695403B (en) * 2020-04-19 2024-03-22 Dongfeng Automobile Co., Ltd. Depth perception convolutional neural network-based 2D and 3D image synchronous detection method
CN111209923A (en) * 2020-04-23 2020-05-29 Beijing Huizhi Data Technology Co., Ltd. Deep-learning-based method for identifying whether a muck truck is covered
CN113593239A (en) * 2021-08-12 2021-11-02 Shanghai Xianta Intelligent Technology Co., Ltd. Method and server for monitoring violations between vehicles, and vehicle
CN114677652A (en) * 2022-05-30 2022-06-28 Wuhan Boguan Intelligent Technology Co., Ltd. Illegal behavior monitoring method and device

Similar Documents

Publication Publication Date Title
CN106886755A (en) Intersection vehicle violation detection system based on traffic sign recognition
CN110188705B (en) Remote traffic sign detection and identification method suitable for vehicle-mounted system
CN104392463B (en) Image salient region detection method based on joint sparse multi-scale fusion
CN109614985A (en) Object detection method based on densely connected feature pyramid network
CN108171112A (en) Vehicle identification and tracking based on convolutional neural networks
CN107730904A (en) Multi-task vehicle reverse-driving vision detection system based on deep convolutional neural networks
CN107730906A (en) Vision detection system for vehicles failing to yield to pedestrians at zebra crossings
CN105809198B (en) SAR image target recognition method based on deep belief network
CN109508715A (en) License plate location and recognition method based on deep learning
CN106682569A (en) Fast traffic sign recognition method based on convolutional neural network
CN108830188A (en) Vehicle detection method based on deep learning
CN105354568A (en) Convolutional neural network based vehicle logo identification method
CN108805093A (en) Deep-learning-based escalator passenger fall detection algorithm
CN107134144A (en) Vehicle detection method for traffic monitoring
CN109902806A (en) Noisy-image object bounding box determination method based on convolutional neural networks
CN106778854A (en) Behavior recognition method based on trajectory and convolutional neural network feature extraction
CN111460919B (en) Monocular vision road target detection and distance estimation method based on improved YOLOv3
CN106372571A (en) Road traffic sign detection and identification method
CN107133974A (en) Vehicle type classification method combining Gaussian background modeling with recurrent neural networks
CN107016357A (en) Video pedestrian detection method based on temporal convolutional neural networks
CN105069468A (en) Hyperspectral image classification method based on ridgelet transform and deep convolutional network
CN108280397A (en) Human hair detection method in images based on deep convolutional neural networks
He et al. A robust method for wheatear detection using UAV in natural scenes
CN109978882A (en) Medical imaging object detection method based on multi-modal fusion
CN109034024B (en) Logistics vehicle type classification and identification method based on image target detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20170623