CN108205649A - Method and device for recognizing a driver's phone-holding state while driving - Google Patents
- Publication number
- CN108205649A CN108205649A CN201611185468.8A CN201611185468A CN108205649A CN 108205649 A CN108205649 A CN 108205649A CN 201611185468 A CN201611185468 A CN 201611185468A CN 108205649 A CN108205649 A CN 108205649A
- Authority
- CN
- China
- Prior art keywords
- driver
- phone
- head
- region
- shoulder region
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/59—Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
- G06V20/597—Recognising the driver's state or behaviour, e.g. attention or drowsiness
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The application provides a method and device for recognizing whether a driver is holding a phone while driving. The method includes: locating the window of a target vehicle in a monitoring image; obtaining a driver detection candidate region according to the window position and the license plate information of the target vehicle; detecting the candidate region with a histogram of oriented gradients and a support vector machine to obtain the driver's head-and-shoulder region; and feeding the head-and-shoulder region through a first-layer CNN and a second-layer CNN in sequence, where the first-layer CNN performs a preliminary screening of the head-and-shoulder region to obtain head-and-shoulder regions of drivers suspected of holding a phone, and the second-layer CNN further screens the output of the first-layer CNN to obtain the driver's phone-holding state. The application can exclude false pick-up gestures that are not genuine phone use in complex scenes, improving recognition accuracy, scene adaptability, and robustness.
Description
Technical field
This application relates to the field of video surveillance, and in particular to a method and device for recognizing a driver's phone-holding state while driving.
Background technology
A driver holding a phone while driving greatly increases the probability of a traffic accident, so the driver's phone-holding state must be recognized effectively and serve as key evidence of a violation.
With the continuous development of image processing, computer vision, deep learning, and embedded technology, automatic violation discrimination and evidence collection for vehicles (including their occupants) has become a research hotspot in intelligent transportation.
The prior art provides a method for recognizing phone use by a driver (CN105868690A): a video stream of the cab is first acquired, the face region is located with a face component model, the face is then rectified, and two groups of parameters are trained with a nonlinear discriminant relation to decide whether the driver is on the phone, where the training sets include an ear-region training set, a phone-use training set, and a no-phone-use training set. This method locates the face region with the DPM (Deformable Part Model) component detection algorithm, whose detection accuracy suffers considerably under occlusion or facial blur; and it discriminates phone use with a nonlinear classifier trained on ear, phone-use, and no-phone-use regions, whose accuracy is relatively low.
The prior art also provides an automatic monitoring method for phone use while driving (CN103366506A): an image acquisition device first captures the driver's head and its neighborhood, skin-color detection locates the driver's face and hands in the image, and a support vector machine then classifies the result, warning drivers who are on the phone. Because cab imaging is complex and lighting or severe weather strongly affects imaging through the window in complex scenes, this skin-color-based localization of the face and hands leads to frequent missed and false detections.
Invention content
In view of this, the application provides a method and device for recognizing a driver's phone-holding state while driving, to solve the relatively low recognition accuracy of the prior art.
Specifically, the application is achieved by the following technical solution:
According to a first aspect of the application, a method for recognizing a driver's phone-holding state while driving is provided. The method includes:
Locating the window of the target vehicle in a monitoring image;
Obtaining a driver detection candidate region according to the window position and the license plate information of the target vehicle;
Detecting the driver detection candidate region with a histogram of oriented gradients and a support vector machine to obtain the driver's head-and-shoulder region;
Feeding the head-and-shoulder region through a first-layer CNN and a second-layer CNN in sequence, where the first-layer CNN performs a preliminary screening of the head-and-shoulder region to obtain head-and-shoulder regions of drivers suspected of holding a phone, and the second-layer CNN further screens the output of the first-layer CNN to obtain the driver's phone-holding state.
Optionally, before the head-and-shoulder region is input into the first-layer CNN and the second-layer CNN, the method further includes:
Extracting the histogram-of-oriented-gradients feature and the local binary pattern feature of the head-and-shoulder region, and combining them into one multi-dimensional feature vector;
Classifying the feature vector with linear discriminant analysis to filter out non-head-and-shoulder regions.
Optionally, the method further includes:
Dividing the head-and-shoulder region of a driver suspected of holding a phone into a left region, a right region, and a whole region;
Inputting the left region, right region, and whole region into the second-layer CNN respectively to obtain the driver's phone-holding state, which includes: phone on the left, phone on the right, not on the phone, and indeterminable.
Optionally, the method further includes:
If the recognition result of the current monitoring frame is that the driver is holding a phone, continuing to recognize the next monitoring frame; if the recognition result of the next frame is also that the driver is holding a phone, raising an alarm; otherwise, abandoning the alarm.
Optionally, obtaining the window position includes:
Estimating the window's upper-right corner region from the license plate information of the target vehicle;
Locating the window's upper-right corner point with a positioning filter.
According to a second aspect of the application, a device for recognizing a driver's phone-holding state while driving is provided. The device includes:
A locating module, which locates the window of the target vehicle in a monitoring image;
A region acquisition module, which obtains a driver detection candidate region according to the window position and the license plate information of the target vehicle;
A target detection module, which detects the driver detection candidate region with a histogram of oriented gradients and a support vector machine to obtain the driver's head-and-shoulder region;
A recognition module, which feeds the head-and-shoulder region through the first-layer CNN and the second-layer CNN in sequence; the first-layer CNN performs a preliminary screening of the head-and-shoulder region to obtain head-and-shoulder regions of drivers suspected of holding a phone, and the second-layer CNN further screens the output of the first-layer CNN to obtain the driver's phone-holding state.
Optionally, the target detection module further includes:
A feature extraction submodule, which extracts the histogram-of-oriented-gradients feature and the local binary pattern feature of the head-and-shoulder region and combines them into one multi-dimensional feature vector;
A filter submodule, which classifies the feature vector with linear discriminant analysis and filters out non-head-and-shoulder regions.
Optionally, the recognition module further includes:
A division submodule, which divides the head-and-shoulder region of a driver suspected of holding a phone into a left region, a right region, and a whole region;
A fusion recognition submodule, which inputs the left region, right region, and whole region into the second-layer CNN respectively to obtain the driver's phone-holding state, which includes: phone on the left, phone on the right, not on the phone, and indeterminable.
Optionally, the device further includes:
A multi-frame verification module: if the recognition result of the current monitoring frame is that the driver is holding a phone, it continues to recognize the next monitoring frame; if the recognition result of the next frame is also that the driver is holding a phone, it raises an alarm; otherwise, it abandons the alarm.
Optionally, the locating module:
Estimates the window's upper-right corner region from the license plate information of the target vehicle; and
Locates the window's upper-right corner point with a positioning filter.
Advantageous effects of the application: through the purely video-based cascade of window region localization, driver target detection (head-and-shoulder detection), and cascaded CNN networks, the time-consuming and laborious manual inspection of phone use while driving can be eliminated, and false pick-up gestures that are not genuine phone use can be excluded in complex scenes, improving recognition accuracy. Compared with traditional methods, the adjudication accuracy is higher, and the adaptability and robustness to scenes are better.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the application.
Description of the drawings
The accompanying drawings, which are incorporated into and form part of this specification, show embodiments consistent with the application and, together with the specification, serve to explain its principles.
Fig. 1 is a flowchart of a method for recognizing a driver's phone-holding state while driving according to an embodiment of the application;
Fig. 2 is a flowchart of target detection according to an embodiment of the application;
Fig. 3 is a flowchart of target recognition according to an embodiment of the application;
Fig. 4 is a structural block diagram of a device for recognizing a driver's phone-holding state while driving according to an embodiment of the application;
Fig. 5 is a structural block diagram of the target detection module according to an embodiment of the application;
Fig. 6 is a structural block diagram of the recognition module according to an embodiment of the application;
Fig. 7 is a structural block diagram of another device for recognizing a driver's phone-holding state while driving according to an embodiment of the application.
Specific embodiment
Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. Unless otherwise indicated, the same numerals in different drawings denote the same or similar elements when the following description refers to the drawings. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the application; rather, they are merely examples of devices and methods consistent with some aspects of the application as detailed in the appended claims. In addition, the features of the following embodiments and implementations may be combined with each other where no conflict arises.
The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms "a", "said", and "the" used in this application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the application, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
Referring to Fig. 1, a method for recognizing a driver's phone-holding state while driving provided in this embodiment may include:
S101: Locating the window of the target vehicle in a monitoring image.
The monitoring scene may be a road scene with heavy traffic or frequent accidents, such as a common checkpoint road; the monitoring image is captured by a checkpoint camera.
In one embodiment, for a captured monitoring image f_src(x, y), x and y are the abscissa and ordinate of a point on the image, the image width is Width, and the height is Height. Assume the license plate information of the target vehicle in the image has been obtained; it may include the plate color LpColor and the plate geometry Lp(x, y, w, h), where Lp(x, y) are the abscissa and ordinate of the plate (for example of its upper-left corner or of its center) and Lp(w, h) are the plate's width and height.
Window localization is not limited to the following schemes: for example, the window may be located with Hough line detection, with an Adaboost (iterative algorithm) window detector, or by determining the window region through upper-right-corner localization.
In this embodiment, the window region is determined through upper-right-corner localization.
Specifically, obtaining the window position includes:
Estimating the window's upper-right corner region from the license plate information of the target vehicle;
Locating the window's upper-right corner point with a positioning filter.
In this embodiment, the upper-right corner region of the window is first estimated from the license plate information, and the positioning filter is then used to locate the corner point.
For a candidate window upper-right-corner image f_un(x, y), the positioning filter h̄(x, y) is applied to the candidate image by convolution:

g(x, y) = f_un(x, y) ⊗ h̄(x, y)   (1)

In formula (1), x and y are the abscissa and ordinate of a point in the candidate image, ⊗ denotes the convolution operation, and g(x, y) is the convolution result of the candidate image f_un(x, y) with the positioning filter h̄(x, y).
The peak point of g(x, y) computed by formula (1) is taken as the position RgtUp(x, y) of the window's upper-right corner point.
In this embodiment, the upper-right-corner positioning filter h̄(x, y) is trained from a batch of calibrated images:

h̄(x, y) = Σ_i a'_i · h_i(x, y)   (2)

In formula (2), h_i(x, y) is the positioning filter corresponding to the i-th window upper-right-corner picture, and a'_i is the normalized filter weight coefficient.
In this embodiment, the desired response for a calibrated image is a peak centered on the calibrated corner:

g_i(x, y) = exp(−[(x − x_i)² + (y − y_i)²] / δ²)   (3)

In formula (3), x_i and y_i are the calibrated position of the window's upper-right corner point, and δ is an empirical coefficient.
Since f_un(x, y) is a known window upper-right-corner image, h_i(x, y) can be solved backwards from formulas (1) and (3), and the positioning filter h̄ is then computed from the batch of calibrated images.
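The corner localization of formulas (1)-(3) can be sketched as a correlation-filter peak search. This is a minimal illustration under stated assumptions, not the patent's implementation: the frequency-domain solve for h_i, the uniform weights a'_i, and the toy training image are all choices made here for brevity.

```python
import numpy as np

def gaussian_response(shape, cx, cy, delta):
    """Desired response of formula (3): a peak at the calibrated corner (cx, cy)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / delta ** 2)

def train_filter(images, corners, delta=2.0):
    """Solve each per-image filter h_i in the frequency domain (formulas (1)-(2))
    and average them with uniform weights a'_i = 1/N."""
    H = np.zeros_like(np.fft.fft2(images[0]))
    for img, (cx, cy) in zip(images, corners):
        G = np.fft.fft2(gaussian_response(img.shape, cx, cy, delta))
        F = np.fft.fft2(img)
        H += G / (F + 1e-6)          # h_i in the frequency domain, regularized
    return H / len(images)

def locate_corner(h_freq, image):
    """Formula (1): filter the candidate image, take the response peak as RgtUp(x, y)."""
    g = np.real(np.fft.ifft2(np.fft.fft2(image) * h_freq))
    cy, cx = np.unravel_index(np.argmax(g), g.shape)
    return int(cx), int(cy)

# Toy check: train on one synthetic image whose calibrated corner is at (12, 8).
rng = np.random.default_rng(0)
img = rng.random((32, 32))
H = train_filter([img], [(12, 8)])
print(locate_corner(H, img))  # -> (12, 8)
```

On the training image itself the response is (up to regularization) exactly the desired peak, so the calibrated corner is recovered; real use would of course train on many calibrated pictures and evaluate on unseen candidates.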
S102: Obtaining the driver detection candidate region according to the window position and the license plate information of the target vehicle.
The shape of the driver detection candidate region can be set to any regular shape as needed; in this embodiment a rectangle is chosen.
With the window upper-right corner position RgtUp(x, y) of the target vehicle obtained in step S101 and the license plate information Lp(x, y, w, h), the driver detection candidate region Driver(x, y, w, h) is calculated as:

Driver(x) = max(RgtUp(x) − α*Lp(w), 0)   (2)
Driver(y) = max(RgtUp(y) − β*Lp(w), 0)   (3)
Driver(w) = min(γ*Lp(w), Width − Driver(x))   (4)
Driver(h) = min(ε*Lp(w), Height − Driver(y))   (5)

In formulas (2)-(5), Driver(x, y) are the abscissa and ordinate of a vertex of the driver detection candidate region (for example its upper-left, lower-left, upper-right, or lower-right vertex); Driver(w, h) are the width and height of the candidate region; and α, β, ε, and γ are preset empirical coefficients with α ∈ [1,2], β ∈ [0.5,1], ε ∈ [2.5,3.5], γ ∈ [1.5,2.5].
In this embodiment, α = 1.5, β = 0.5, γ = 2.0, ε = 3.0.
The driver detection candidate region Driver(x, y, w, h) can then be calculated from formulas (2)-(5).
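The region arithmetic of formulas (2)-(5) can be sketched directly; the only subtlety is clamping the region to the image borders. A minimal sketch with the embodiment's coefficients as defaults (the sample plate and corner values below are made up for illustration):

```python
def driver_candidate_region(rgt_up, lp, width, height,
                            alpha=1.5, beta=0.5, gamma=2.0, eps=3.0):
    """Driver detection candidate region per formulas (2)-(5).

    rgt_up: (x, y) of the window's upper-right corner point.
    lp:     (x, y, w, h) of the license plate; only the plate width w is used.
    width, height: monitoring-image size, used to clamp the region.
    """
    rx, ry = rgt_up
    lp_w = lp[2]
    x = max(rx - alpha * lp_w, 0)        # clamp left edge to the image origin
    y = max(ry - beta * lp_w, 0)         # clamp top edge to the image origin
    w = min(gamma * lp_w, width - x)     # clamp width to the right border
    h = min(eps * lp_w, height - y)      # clamp height to the bottom border
    return x, y, w, h

# Plate 100 px wide, window corner at (800, 300), 1920x1080 monitoring image.
print(driver_candidate_region((800, 300), (700, 500, 100, 30), 1920, 1080))
# -> (650.0, 250.0, 200.0, 300.0)
```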
S103: Detecting the driver detection candidate region with a histogram of oriented gradients and a support vector machine to obtain the driver's head-and-shoulder region.
Referring to Fig. 2, this step coarsely locates the driver region by combining histogram-of-oriented-gradients (HOG, Histogram of Oriented Gradients) features with support vector machine (SVM, Support Vector Machine) classification to obtain the driver's head-and-shoulder region. After the HOG features of the driver detection candidate region are extracted, they are fed to the SVM for training, so that the driver region can be coarsely located.
In this embodiment, after HOG feature extraction on the driver detection candidate region, HOG features of the corresponding window size are obtained by sliding-window detection and classified by the SVM one window at a time, yielding the driver head-and-shoulder detection. Compared with skin-color face detection, DPM, or Adaboost face detection, HOG-based head-and-shoulder detection achieves a higher driver detection success rate (by 3% to 5%) under face occlusion (a sun visor or the window frame partially covering the face, etc.) or facial blur caused by reflections.
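The coarse localization in S103 amounts to scoring the HOG feature of each sliding window with a linear SVM. The sketch below keeps only that control flow; the `toy_hog` extractor and the one-weight "SVM" are stand-ins, not the patent's trained HOG descriptor or classifier.

```python
import numpy as np

def sliding_windows(img, win, step):
    """Yield (x, y, patch) for every win x win window at the given stride."""
    H, W = img.shape
    for y in range(0, H - win + 1, step):
        for x in range(0, W - win + 1, step):
            yield x, y, img[y:y + win, x:x + win]

def coarse_head_shoulder(img, hog, svm_w, svm_b, win=40, step=8, thresh=0.0):
    """Keep windows whose linear-SVM score w . hog(patch) + b exceeds the threshold."""
    hits = []
    for x, y, patch in sliding_windows(img, win, step):
        score = float(np.dot(svm_w, hog(patch)) + svm_b)
        if score > thresh:
            hits.append((x, y, win, win, score))
    return hits

# Toy stand-ins: a "HOG" that just averages the patch, and a 1-D "SVM".
toy_hog = lambda p: np.array([p.mean()])
img = np.zeros((64, 64))
img[8:48, 8:48] = 1.0                       # one bright 40x40 blob at (8, 8)
hits = coarse_head_shoulder(img, toy_hog, np.array([1.0]), -0.9)
print(len(hits) > 0)  # -> True: the window aligned with the blob passes
```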
The classification results of the head-and-shoulder detection in step S103 fall into head-and-shoulder targets and non-head-and-shoulder targets, but false detections remain and noticeably affect the result; the head-and-shoulder regions extracted by the HOG-plus-SVM stage therefore need a further stage of false-detection suppression.
Referring again to Fig. 2, in this embodiment the method for recognizing the driver's phone-holding state may further include:
Extracting the histogram-of-oriented-gradients feature and the local binary pattern (LBP, Local Binary Pattern) feature of the head-and-shoulder region obtained in step S103, and combining them into one multi-dimensional feature vector;
Classifying the feature vector with linear discriminant analysis (LDA, Linear Discriminant Analysis) to filter out non-head-and-shoulder regions.
LDA is a linear learning method, also known as Fisher discriminant analysis. It projects a given sample set onto a line such that the projections of samples from different classes are separated as far as possible while the projections of same-class samples stay as close together as possible.
In this embodiment, LDA filters the head-and-shoulder windows detected in step S103 into two classes, person and no person, and discards the non-head-and-shoulder regions, reducing the false detection rate and improving detection precision.
In one embodiment, HOG feature extraction on the driver detection candidate region proceeds as follows: the image is normalized to 40x40 and divided into 16 blocks, where each block (i.e., section) consists of 4 cells and each cell is an 8x8 set of pixels; scanning with a step of 8 pixels and 9 bins (i.e., 9 orientation parts) per cell, the HOG feature dimensionality is 16*4*9 = 576.
LBP feature extraction on the driver detection candidate region may proceed as follows: using uniform LBP patterns, the dimensionality drops from 256 to 59 (the 59 uniform patterns); the image is normalized to 48x48 and divided into 3x3 blocks of 16x16 each, with 59 dimensions per block, so the LBP feature dimensionality is 3*3*59 = 531.
After the HOG and LBP features are obtained, they are combined into an n-dimensional (n being a natural number) feature vector X(x1, x2, x3, ..., xn). Optionally, n = 1009.
The training samples (the n-dimensional feature vectors above) are iteratively trained with LDA linear discriminant analysis to obtain an optimal set of training parameters W(w1, w2, w3, ..., wn):

h = w1*x1 + w2*x2 + w3*x3 + ... + wn*xn + b1   (6)

In formula (6), h is the result of the linear discriminant analysis; x1, x2, x3, ..., xn are the feature values in the feature vector X; and w1, w2, w3, ..., wn and b1 are training parameters.
At test time, the extracted feature vector X(x1, x2, x3, ..., xn) is substituted into formula (6) to obtain h. If h is greater than or equal to 0, the head-and-shoulder region output by step S103 is taken as a final head-and-shoulder region; if h is less than 0, the region output by step S103 is considered a non-head-and-shoulder region and can be filtered out directly.
That is, depending on whether h is greater than or equal to 0, the head-and-shoulder regions output by step S103 are further screened to obtain more accurate head-and-shoulder regions, reducing the interference of non-head-and-shoulder regions and improving detection precision.
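Applying formula (6) as a keep/discard filter is a single dot product. A minimal sketch, assuming the 576-dimensional HOG and 531-dimensional LBP vectors are already extracted and using random stand-ins for the trained parameters W and b1:

```python
import numpy as np

def lda_keep(hog_vec, lbp_vec, W, b1):
    """Formula (6): h = W . x + b1 on the concatenated HOG+LBP vector.
    h >= 0 keeps the region as a final head-and-shoulder; h < 0 filters it out."""
    x = np.concatenate([hog_vec, lbp_vec])   # 576-dim HOG + 531-dim LBP
    h = float(np.dot(W, x) + b1)
    return h >= 0, h

# Random stand-ins; a real system would use the LDA-trained W and b1.
rng = np.random.default_rng(1)
hog_vec, lbp_vec = rng.random(576), rng.random(531)
W = rng.random(576 + 531)
b1 = -200.0
keep, h = lda_keep(hog_vec, lbp_vec, W, b1)
print(keep)  # -> True: this sample scores above the decision boundary
```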
S104: Feeding the head-and-shoulder region through a first-layer CNN (Convolutional Neural Network) and a second-layer CNN in sequence. The first-layer CNN performs a preliminary screening of the head-and-shoulder region to obtain head-and-shoulder regions of drivers suspected of holding a phone; the second-layer CNN further screens the output of the first-layer CNN (i.e., the suspected phone-holding head-and-shoulder regions output by the first layer) to obtain the driver's phone-holding state.
In this embodiment, the first-layer CNN has fewer layers than the second-layer CNN, and the first-layer CNN has fewer convolution kernels than the second-layer CNN.
A CNN here consists of an input layer, Nc convolutional layers, Np down-sampling layers, and Nf fully connected layers.
Specifically, each convolutional layer has Nc_Ck convolution kernels of size Ckm*Ckm with stride 1; each down-sampling layer has a kernel size of Pkm*Pkm with stride Pkm; and the number of neurons output by the last fully connected layer equals the required number of classes.
Referring to Fig. 3, in this embodiment the first-layer CNN outputs two classes, the two phone-holding states "holding a phone" and "not holding a phone"; that is, the output size of the first-layer CNN is 2.
The second-layer CNN performs the fine classification and outputs 4 classes, i.e., 4 driver phone-holding states (phone on the left, phone on the right, not on the phone, and indeterminable).
Here, Nc ∈ [2,10], Np ∈ [2,10], Nf ∈ [1,3];
Nc_Ck ∈ [Nc_Ckmin, Nc_Ckmax], with Nc_Ckmin ∈ [6,16];
Ckm ∈ [3,7], Pkm ∈ [2,4].
In step S104, the head-and-shoulder region obtained in step S103 is first fed into the first-layer CNN to obtain the two states "on the phone" and "not on the phone", realizing a fast, coarse recognition.
The first-layer CNN has a relatively simple structure with fewer layers and fewer convolution kernels. Its purpose is fast filtering: to retain as many phone-use monitoring pictures as possible while excluding the non-phone-use ones.
Because the output of the first-layer CNN is two classes, on the phone and not on the phone, a large number of non-phone-use monitoring images are filtered out and need not go through the next layer's fine classification (reducing the number of monitoring images fed to the second-layer CNN). This both reduces the time spent on fine classification and lowers the false-alarm rate for phone use while driving.
Compared with the first-layer CNN, the second-layer CNN has a more complex structure, realizing the fine recognition.
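The control flow of the two-layer cascade in S104 is independent of the network internals, so it can be sketched with the CNNs as black-box callables. The two stub classifiers below are placeholders for the trained 2-class first-layer and 4-class second-layer CNNs described above.

```python
STATES = ("phone on the left", "phone on the right",
          "not on the phone", "indeterminable")

def cascade_classify(region, cnn1, cnn2):
    """Two-layer cascade: cnn1 is a cheap 2-class filter (phone / no phone);
    cnn2 is the finer 4-class classifier, run only on suspected regions."""
    if cnn1(region) != "phone":       # first layer: fast, coarse screening
        return "not on the phone"
    return cnn2(region)               # second layer: fine 4-way decision

# Stub networks standing in for the trained CNNs; the feature names are made up.
cnn1 = lambda r: "phone" if r["brightness_near_ear"] > 0.5 else "no phone"
cnn2 = lambda r: STATES[0] if r["side"] == "left" else STATES[1]

print(cascade_classify({"brightness_near_ear": 0.9, "side": "left"}, cnn1, cnn2))
# -> phone on the left
print(cascade_classify({"brightness_near_ear": 0.1, "side": "left"}, cnn1, cnn2))
# -> not on the phone
```

The design point is visible even in the stub: most regions exit at the cheap first layer, so the expensive fine classifier only sees suspected phone-use regions.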
Referring again to Fig. 3, the method for recognizing the driver's phone-holding state further includes:
Dividing the head-and-shoulder region of a driver suspected of holding a phone (i.e., the suspected head-and-shoulder region output by the first-layer CNN) into a left region, a right region, and a whole region;
Inputting the left region, right region, and whole region into the second-layer CNN respectively to obtain the driver's phone-holding state, which includes: phone on the left, phone on the right, not on the phone, and indeterminable.
Assume the driver's head-and-shoulder region is Call(x, y, w, h), where Call(x, y) are the abscissa and ordinate of the region and Call(w, h) are its width and height.
In this embodiment the head-and-shoulder region is normalized to w*h with w = 150 and h = 100, and the region to be recognized is then divided into three blocks: the left region Call(x, y, α*w, h), the right region Call(x + (1−α)*w, y, α*w, h), and the whole region Call(x, y, w, h), where α is an empirical coefficient. Optionally, α = 2/3.
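The three sub-regions fed to the second-layer CNN follow directly from the head-and-shoulder rectangle. A minimal sketch using the embodiment's normalization (150x100) and α = 2/3:

```python
def split_head_shoulder(x, y, w=150, h=100, alpha=2/3):
    """Split a normalized head-and-shoulder box Call(x, y, w, h) into the
    left, right, and whole regions used for multi-feature fusion."""
    left = (x, y, alpha * w, h)
    right = (x + (1 - alpha) * w, y, alpha * w, h)
    whole = (x, y, w, h)
    return left, right, whole

left, right, whole = split_head_shoulder(0, 0)
print(left)   # -> (0, 0, 100.0, 100)
print(right)  # a float-rounding hair from (50, 0, 100, 100)
```

Note that with α = 2/3 the left and right blocks each cover two thirds of the width, so they overlap in the middle third; the whole region gives the network global context for the fusion decision.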
After the head-and-shoulder region to be recognized is divided into three blocks, the left region, right region, and whole region are fed into the second-layer CNN for multi-feature fusion discrimination. As can be seen, this embodiment uses head-and-shoulder detection for coarse driver localization and multi-feature fusion discriminant analysis for driver false-detection elimination, thereby reducing the false detection rate.
In this embodiment, to recognize the driver's phone-holding state more accurately and improve recognition precision, the method further includes:
If the recognition result of the current monitoring frame is that the driver is holding a phone, continuing to recognize the next monitoring frame; if the recognition result of the next frame is also that the driver is holding a phone, raising an alarm; otherwise, abandoning the alarm.
Of course, the number of monitoring frames required for recognition may also be chosen as needed, for example at least two consecutive monitoring frames.
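The two-frame confirmation rule generalizes naturally to any number of consecutive frames. A small sketch of the alarm logic, with `classify_frame` standing in for the full detection-and-recognition pipeline above:

```python
def confirm_alarm(frames, classify_frame, needed=2):
    """Raise an alarm only if `needed` consecutive frames are classified
    as a phone-holding state; otherwise abandon the alarm."""
    streak = 0
    for frame in frames:
        streak = streak + 1 if classify_frame(frame) == "holding phone" else 0
        if streak >= needed:
            return True
    return False

# Stand-in classifier: each toy frame payload already carries its label.
classify = lambda f: f
print(confirm_alarm(["holding phone", "holding phone"], classify))
# -> True
print(confirm_alarm(["holding phone", "not on phone", "holding phone"], classify))
# -> False
```

Requiring consecutive positives trades a one-frame delay in alarming for robustness against single-frame misclassifications, which matches the embodiment's goal of suppressing spurious alarms.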
Fig. 4 shows the structural block diagram of the device for recognizing a driver's phone-holding state while driving provided by the application. The device corresponds to the recognition method described above, and its content can be understood or explained with reference to the embodiments of that method.
Referring to Fig. 4, the device for recognizing a driver's phone-holding state while driving provided in this embodiment may include a locating module 100, a region acquisition module 200, a target detection module 300, and a recognition module 400.
The positioning module 100 is configured to locate the window of the target vehicle in the monitoring image.
The region acquisition module 200 is configured to obtain a driver detection candidate region according to the position information of the window and the license plate information of the target vehicle.
The target detection module 300 is configured to detect the driver detection candidate region by using a histogram of oriented gradients and a support vector machine, to obtain the head-and-shoulder region of the driver.
The identification module 400 is configured to input the head-and-shoulder region into a first-layer CNN network and a second-layer CNN network in sequence, wherein the first-layer CNN network preliminarily screens the head-and-shoulder region to obtain head-and-shoulder regions of drivers suspected of using a phone, and the second-layer CNN network further screens the output of the first-layer CNN network to obtain the phone-use state of the driver.
Here, the number of layers of the first-layer CNN network is smaller than that of the second-layer CNN network, and the number of convolution kernels of the first-layer CNN network is smaller than that of the second-layer CNN network.
Further, the positioning module 100 may be configured to: obtain the upper-right corner region of the window according to the license plate information of the target vehicle; and locate the position information of the upper-right corner point of the window by using a positioning filter.
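As a rough illustration of how the license plate can anchor the corner search, the sketch below derives a search region from the plate bounding box; every offset factor here is an assumption for illustration only, since the patent states merely that the corner region is obtained from the license plate information and then refined by a positioning filter:

```python
def window_corner_search_region(plate_box, dx_factor=0.5, dy_factor=5.0,
                                scale_w=3.0, scale_h=2.0):
    """Derive a search region for the window's upper-right corner from the
    license plate bounding box; a trained positioning filter would then
    pinpoint the corner inside it.

    plate_box: (left, top, width, height) of the plate in pixels.
    All four factors are illustrative assumptions that use the plate as
    the scale unit; the patent gives no concrete offsets."""
    x, y, w, h = plate_box
    left = int(x + dx_factor * w)          # corner lies right of the plate centre
    top = max(0, int(y - dy_factor * h))   # and well above the plate
    return (left, top, int(scale_w * w), int(scale_h * h))

# A 60x20-pixel plate at (100, 200) yields a search box above and to its right.
region = window_corner_search_region((100, 200, 60, 20))
```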
Further, referring to Fig. 5, the target detection module 300 may further include a feature extraction submodule 301 and a filtering submodule 302.
The feature extraction submodule 301 is configured to extract the histogram of oriented gradients feature and the local binary pattern feature of the head-and-shoulder region, and to combine the two features to form a multidimensional feature vector.
The filtering submodule 302 is configured to classify the feature vector by linear discriminant analysis and to filter out non-head-and-shoulder regions.
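A simplified sketch of the feature fusion performed by submodules 301 and 302: coarse stand-ins for the HOG and LBP descriptors are computed with NumPy and concatenated into one vector. A trained LDA classifier (not shown) would then score this vector to reject non-head-and-shoulder regions; the descriptor parameters here are simplifications, not values from the patent:

```python
import numpy as np

def lbp_histogram(gray):
    """8-neighbour local binary pattern codes pooled into a normalized
    256-bin histogram (a simplified stand-in for the LBP feature)."""
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]                       # centre pixels (border skipped)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(shifts):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << bit   # one bit per neighbour
    hist, _ = np.histogram(code, bins=256, range=(0, 256))
    return hist / max(code.size, 1)

def hog_histogram(gray, bins=9):
    """Magnitude-weighted gradient-orientation histogram (a simplified
    stand-in for the full block-structured HOG descriptor)."""
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-9)

def fused_feature(gray):
    """Concatenate the two descriptors into one multidimensional vector,
    as the feature extraction submodule does."""
    return np.concatenate([hog_histogram(gray), lbp_histogram(gray)])
```

A production implementation would use the full multi-cell HOG and a uniform-pattern LBP; the point here is only the concatenation into a single vector fed to the linear discriminant.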
Further, referring to Fig. 6, the identification module 400 may further include a division submodule 401 and a fusion identification submodule 402.
The division submodule 401 is configured to divide the head-and-shoulder region of a driver suspected of using a phone (i.e., the head-and-shoulder region output by the first-layer CNN network) into a left region, a right region and a whole region.
The fusion identification submodule 402 is configured to input the left region, the right region and the whole region into the second-layer CNN network respectively, to obtain the phone-use state of the driver, which includes: phone on the left side, phone on the right side, not on the phone, and unable to determine a violation.
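One simple way to realize the fusion of the three per-region outputs is score averaging over the four states. Both the class ordering and the averaging rule below are assumptions, as the patent does not specify the fusion operation:

```python
# Hypothetical class order for the second-layer CNN's four-way output;
# the patent lists these states but not their encoding.
STATES = ["phone on the left", "phone on the right",
          "not on the phone", "cannot determine a violation"]

def fuse_region_scores(left_scores, right_scores, whole_scores):
    """Average the per-region class scores (left, right, whole region)
    and return the most likely state with its fused score."""
    fused = [(l + r + w) / 3.0
             for l, r, w in zip(left_scores, right_scores, whole_scores)]
    best = max(range(len(STATES)), key=lambda i: fused[i])
    return STATES[best], fused[best]
```

More elaborate fusions (e.g. learned weights per region, or feature-level concatenation inside the network) fit the same interface.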
Referring to Fig. 7, the apparatus may further include a multi-frame verification module 500, configured to: if the recognition result of the current frame of the monitoring image is that the driver is on the phone, continue to recognize the next frame of the monitoring image; if the recognition result of the next frame is also that the driver is on the phone, issue an alarm; otherwise, abandon the alarm.
In conclusion the driver of the application drives to take the state identification method of phone and device passes through vehicle glazing area
It the step of domain positioning, driver's target detection (head and shoulder region detection) and cascade these pure video detections of CNN networks, can
Remove artificial detection driving from and take taking time and effort for telephone state, and can exclude under complex scene non-genuine takes phone
Accidentally pick up, improve accuracy of identification, be compared to conventional method, penalty accuracy rate higher, adaptability, robustness to scene are more
It is good.
The above are only preferred embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (10)
1. A method for identifying the state of a driver using a phone while driving, characterized in that the method comprises:
locating the window of a target vehicle in a monitoring image;
obtaining a driver detection candidate region according to position information of the window and license plate information of the target vehicle;
detecting the driver detection candidate region by using a histogram of oriented gradients and a support vector machine, to obtain a head-and-shoulder region of the driver;
inputting the head-and-shoulder region into a first-layer CNN network and a second-layer CNN network in sequence, wherein the first-layer CNN network preliminarily screens the head-and-shoulder region to obtain head-and-shoulder regions of drivers suspected of using a phone, and the second-layer CNN network further screens the output of the first-layer CNN network to obtain the phone-use state of the driver.
2. The method according to claim 1, characterized in that, before the head-and-shoulder region is input into the first-layer CNN network and the second-layer CNN network, the method further comprises:
extracting a histogram of oriented gradients feature and a local binary pattern feature of the head-and-shoulder region, and combining the two features to form a multidimensional feature vector;
classifying the feature vector by linear discriminant analysis, and filtering out non-head-and-shoulder regions.
3. The method according to claim 1, characterized in that the method further comprises:
dividing the head-and-shoulder region of a driver suspected of using a phone into a left region, a right region and a whole region;
inputting the left region, the right region and the whole region into the second-layer CNN network respectively, to obtain the phone-use state of the driver, the phone-use state comprising: phone on the left side, phone on the right side, not on the phone, and unable to determine a violation.
4. The method according to claim 1, characterized in that the method further comprises:
if the recognition result of a current frame of the monitoring image is that the driver is on the phone, continuing to recognize the next frame of the monitoring image; if the recognition result of the next frame is also that the driver is on the phone, issuing an alarm; otherwise, abandoning the alarm.
5. The method according to claim 1, characterized in that the process of acquiring the position information of the window comprises:
obtaining an upper-right corner region of the window according to the license plate information of the target vehicle;
locating position information of the upper-right corner point of the window by using a positioning filter.
6. An apparatus for identifying the state of a driver using a phone while driving, characterized in that the apparatus comprises:
a positioning module, configured to locate the window of a target vehicle in a monitoring image;
a region acquisition module, configured to obtain a driver detection candidate region according to position information of the window and license plate information of the target vehicle;
a target detection module, configured to detect the driver detection candidate region by using a histogram of oriented gradients and a support vector machine, to obtain a head-and-shoulder region of the driver;
an identification module, configured to input the head-and-shoulder region into a first-layer CNN network and a second-layer CNN network in sequence, wherein the first-layer CNN network preliminarily screens the head-and-shoulder region to obtain head-and-shoulder regions of drivers suspected of using a phone, and the second-layer CNN network further screens the output of the first-layer CNN network to obtain the phone-use state of the driver.
7. The apparatus according to claim 6, characterized in that the target detection module further comprises:
a feature extraction submodule, configured to extract a histogram of oriented gradients feature and a local binary pattern feature of the head-and-shoulder region, and to combine the two features to form a multidimensional feature vector;
a filtering submodule, configured to classify the feature vector by linear discriminant analysis and to filter out non-head-and-shoulder regions.
8. The apparatus according to claim 6, characterized in that the identification module further comprises:
a division submodule, configured to divide the head-and-shoulder region of a driver suspected of using a phone into a left region, a right region and a whole region;
a fusion identification submodule, configured to input the left region, the right region and the whole region into the second-layer CNN network respectively, to obtain the phone-use state of the driver, the phone-use state comprising: phone on the left side, phone on the right side, not on the phone, and unable to determine a violation.
9. The apparatus according to claim 6, characterized in that the apparatus further comprises:
a multi-frame verification module, configured to: if the recognition result of a current frame of the monitoring image is that the driver is on the phone, continue to recognize the next frame of the monitoring image; if the recognition result of the next frame is also that the driver is on the phone, issue an alarm; otherwise, abandon the alarm.
10. The apparatus according to claim 6, characterized in that the positioning module is configured to:
obtain an upper-right corner region of the window according to the license plate information of the target vehicle;
locate position information of the upper-right corner point of the window by using a positioning filter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611185468.8A CN108205649B (en) | 2016-12-20 | 2016-12-20 | Method and device for recognizing state of driver for calling and answering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108205649A true CN108205649A (en) | 2018-06-26 |
CN108205649B CN108205649B (en) | 2021-08-31 |
Family
ID=62603495
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611185468.8A Active CN108205649B (en) | 2016-12-20 | 2016-12-20 | Method and device for recognizing state of driver for calling and answering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108205649B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120250983A1 (en) * | 2011-03-30 | 2012-10-04 | Sony Corporation | Object detecting apparatus and method |
CN103971135A (en) * | 2014-05-05 | 2014-08-06 | 中国民航大学 | Human body target detection method based on head and shoulder depth information features |
CN104156717A (en) * | 2014-08-31 | 2014-11-19 | 王好贤 | Method for recognizing rule breaking of phoning of driver during driving based on image processing technology |
CN104715238A (en) * | 2015-03-11 | 2015-06-17 | 南京邮电大学 | Pedestrian detection method based on multi-feature fusion |
CN105512683A (en) * | 2015-12-08 | 2016-04-20 | 浙江宇视科技有限公司 | Target positioning method and device based on convolution neural network |
CN106056071A (en) * | 2016-05-30 | 2016-10-26 | 北京智芯原动科技有限公司 | Method and device for detection of driver' behavior of making call |
Non-Patent Citations (2)
Title |
---|
LÁSZLÓ TÓTH: "Phone recognition with deep sparse rectifier neural networks", 2013 IEEE International Conference on Acoustics, Speech and Signal Processing |
WANG DAN: "Detection of driver's phone-use behavior based on machine vision", China Master's Theses Full-text Database, Engineering Science & Technology II |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109214289B (en) * | 2018-08-02 | 2021-10-22 | 厦门瑞为信息技术有限公司 | Method for recognizing two-stage calling behavior from whole to local |
CN109214289A (en) * | 2018-08-02 | 2019-01-15 | 厦门瑞为信息技术有限公司 | A kind of Activity recognition method of making a phone call from entirety to local two stages |
CN109165607A (en) * | 2018-08-29 | 2019-01-08 | 浙江工业大学 | A kind of hand-held phone detection method of the driver based on deep learning |
CN109165607B (en) * | 2018-08-29 | 2021-12-14 | 浙江工业大学 | Driver handheld phone detection method based on deep learning |
CN109376634A (en) * | 2018-10-15 | 2019-02-22 | 北京航天控制仪器研究所 | A kind of Bus driver unlawful practice detection system neural network based |
CN111310751A (en) * | 2018-12-12 | 2020-06-19 | 北京嘀嘀无限科技发展有限公司 | License plate recognition method and device, electronic equipment and storage medium |
CN111310751B (en) * | 2018-12-12 | 2023-08-29 | 北京嘀嘀无限科技发展有限公司 | License plate recognition method, license plate recognition device, electronic equipment and storage medium |
CN111723602A (en) * | 2019-03-19 | 2020-09-29 | 杭州海康威视数字技术股份有限公司 | Driver behavior recognition method, device, equipment and storage medium |
CN111723602B (en) * | 2019-03-19 | 2023-08-08 | 杭州海康威视数字技术股份有限公司 | Method, device, equipment and storage medium for identifying driver behavior |
CN110110631A (en) * | 2019-04-25 | 2019-08-09 | 深兰科技(上海)有限公司 | It is a kind of to identify the method and apparatus made a phone call |
CN110110631B (en) * | 2019-04-25 | 2021-06-29 | 深兰科技(上海)有限公司 | Method and equipment for recognizing and making call |
CN112307821A (en) * | 2019-07-29 | 2021-02-02 | 顺丰科技有限公司 | Video stream processing method, device, equipment and storage medium |
CN112966563B (en) * | 2021-02-04 | 2022-09-20 | 同济大学 | Behavior identification method based on human skeleton detection and tracking algorithm |
CN112966563A (en) * | 2021-02-04 | 2021-06-15 | 同济大学 | Behavior identification method based on human skeleton detection and tracking algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN108205649B (en) | 2021-08-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108205649A (en) | Driver drives to take the state identification method and device of phone | |
CN106682601B (en) | A kind of driver's violation call detection method based on multidimensional information Fusion Features | |
CN105844257B (en) | The early warning system and method for road sign are missed based on machine vision travelling in fog day | |
CN110119676A (en) | A kind of Driver Fatigue Detection neural network based | |
CN104378582B (en) | A kind of intelligent video analysis system and method cruised based on Pan/Tilt/Zoom camera | |
CN105809138B (en) | A kind of road warning markers detection and recognition methods based on piecemeal identification | |
CN105488453B (en) | A kind of driver based on image procossing does not fasten the safety belt detection recognition method | |
CN106469309A (en) | The method and apparatus of vehicle monitoring, processor, image capture device | |
CN109635656A (en) | Vehicle attribute recognition methods, device, equipment and medium neural network based | |
CN102867188B (en) | Method for detecting seat state in meeting place based on cascade structure | |
CN108309311A (en) | A kind of real-time doze of train driver sleeps detection device and detection algorithm | |
CN104217217B (en) | A kind of vehicle mark object detecting method and system based on two layers of classified | |
CN106781282A (en) | A kind of intelligent travelling crane driver fatigue early warning system | |
CN109711264A (en) | A kind of bus zone road occupying detection method and device | |
CN110245663A (en) | One kind knowing method for distinguishing for coil of strip information | |
CN107330373A (en) | A kind of parking offense monitoring system based on video | |
CN106529494A (en) | Human face recognition method based on multi-camera model | |
CN109670515A (en) | Method and system for detecting building change in unmanned aerial vehicle image | |
CN103824081A (en) | Method for detecting rapid robustness traffic signs on outdoor bad illumination condition | |
CN101127076A (en) | Human eye state detection method based on cascade classification and hough circle transform | |
CN107704853A (en) | A kind of recognition methods of the traffic lights based on multi-categorizer | |
Shopa et al. | Traffic sign detection and recognition using OpenCV | |
CN103021179B (en) | Based on the Safe belt detection method in real-time monitor video | |
CN107292933A (en) | A kind of vehicle color identification method based on BP neural network | |
CN106022242B (en) | Method for identifying call receiving and making of driver in intelligent traffic system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||