CN108205649B - Method and device for recognizing state of driver for calling and answering - Google Patents

Method and device for recognizing state of driver for calling and answering

Info

Publication number
CN108205649B
Authority
CN
China
Prior art keywords
driver
area
head
cnn network
call
Prior art date
Legal status
Active
Application number
CN201611185468.8A
Other languages
Chinese (zh)
Other versions
CN108205649A (en)
Inventor
陈鑫嘉
张震
Current Assignee
Zhejiang Uniview Technologies Co Ltd
Original Assignee
Zhejiang Uniview Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Uniview Technologies Co Ltd filed Critical Zhejiang Uniview Technologies Co Ltd
Priority to CN201611185468.8A priority Critical patent/CN108205649B/en
Publication of CN108205649A publication Critical patent/CN108205649A/en
Application granted granted Critical
Publication of CN108205649B publication Critical patent/CN108205649B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a method and a device for recognizing the state of a driver receiving and making calls, wherein the method comprises the following steps: positioning the window of the target vehicle in the monitoring image; obtaining a driver detection candidate area according to the positioning information of the vehicle window and the license plate information of the target vehicle; detecting the driver detection candidate area by utilizing a histogram of oriented gradients and a support vector machine to obtain the head-shoulder area of the driver; and sequentially inputting the head-shoulder area into a first-layer CNN network and a second-layer CNN network, wherein the first-layer CNN network preliminarily screens the head-shoulder area to obtain the head-shoulder area of a driver suspected of receiving and making calls, and the second-layer CNN network further screens the output of the first-layer CNN network to obtain the call receiving and making state of the driver. The method and the device can eliminate false detections of non-genuine call receiving and making in complex scenes, improve recognition accuracy, and provide good scene adaptability and robustness.

Description

Method and device for recognizing state of driver for calling and answering
Technical Field
The application relates to the field of video monitoring, in particular to a method and a device for recognizing the state of a driver calling and answering.
Background
If a driver makes a call while driving, the incidence of traffic accidents increases greatly, so the driver's call making and receiving state needs to be effectively identified as important evidence of whether the driver has violated traffic regulations.
With the continuous development of image processing technology, computer vision technology, deep learning technology and embedded technology, how to automatically judge and obtain evidence of vehicle (including personnel in the vehicle) violation has become a research hotspot in current intelligent transportation.
The prior art provides a method (CN105868690A) for identifying a driver's behavior of making a phone call, which first collects a video stream in the cockpit, then locates the face region through a face part model, performs face correction, and trains two groups of parameters to judge whether a call is being made using a nonlinear decision relation, where the training classification models include an ear region training set, a phone-call training set and a non-phone-call training set. The method adopts a DPM (Deformable Part Model) component detection algorithm to locate the face region, which is time-consuming, and detection accuracy degrades greatly when the face is occluded or blurred; it judges whether a call is being made with a nonlinear classification method trained on ear regions, phone-call regions and non-phone-call regions, and its accuracy is low.
The prior art also provides an automatic monitoring method (CN103366506A) for the behavior of a driver taking a call while driving, which first obtains the driver's head and nearby area with an image acquisition device, obtains the positions of the driver's face and hands in the image using skin color detection, then classifies with a support vector machine and warns the driver who is on a call. Because the method relies on skin color detection to locate the face and hands, and cab imaging is complex, with in-window lighting and special weather in complex scenes strongly affecting the image, the method suffers from many missed detections and false detections.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for recognizing the state of a driver receiving and making calls while driving, so as to solve the problem in the prior art that the accuracy of such recognition is low.
Specifically, the method is realized through the following technical scheme:
according to a first aspect of the present application, there is provided a state recognition method for a driver to take a call while driving a vehicle, the method comprising:
positioning the window of the target vehicle in the monitoring image;
obtaining a driver detection candidate area according to the positioning information of the vehicle window and the license plate information of the target vehicle;
detecting the driver detection candidate area by utilizing a direction gradient histogram and a support vector machine to obtain a head and shoulder area of the driver;
and sequentially inputting the head-shoulder area into a first-layer CNN network and a second-layer CNN network, wherein the first-layer CNN network preliminarily screens the head-shoulder area to obtain the head-shoulder area of a driver suspected to receive and make calls, and the second-layer CNN network further screens the output result of the first-layer CNN network to obtain the state of receiving and making calls of the driver.
Optionally, before inputting the head-shoulder area into the first-layer CNN network and the second-layer CNN network, the method further includes:
extracting the directional gradient histogram feature and the local binary feature of the head and shoulder region, and combining the directional gradient histogram feature and the local binary feature to form a multi-dimensional feature vector;
and classifying the feature vectors by utilizing linear discriminant analysis, and filtering out non-head-shoulder regions.
Optionally, the method further comprises:
dividing the head and shoulder area of the suspected call receiving and making driver into a left area, a right area and an integral area;
respectively inputting the left area, the right area and the whole area into a second-layer CNN network to obtain the state of the driver for receiving and making calls, wherein the state of the driver for receiving and making calls comprises the following steps: left incoming calls, right incoming calls, no calls, and no penalty.
Optionally, the method further comprises:
if the current frame monitoring image recognition result is that the driver is in a call receiving and making state, continuously recognizing the next frame monitoring image of the current frame monitoring image, and if the next frame monitoring image recognition result is that the driver is in the call receiving and making state, giving an alarm; otherwise, the alarm is abandoned.
Optionally, the process of obtaining the positioning information of the vehicle window includes:
acquiring a window upper right corner area according to the license plate information of the target vehicle;
and positioning the position information of the upper right corner point of the car window by using the positioning filter.
According to a second aspect of the present application, there is provided a state recognition device for a driver to take a call while driving a vehicle, the device comprising:
the positioning module is used for positioning the window of the target vehicle in the monitoring image;
the region acquisition module is used for acquiring a driver detection candidate region according to the positioning information of the vehicle window and the license plate information of the target vehicle;
the target detection module is used for detecting the driver detection candidate area by utilizing a direction gradient histogram and a support vector machine to obtain a head and shoulder area of the driver;
and the identification module is used for sequentially inputting the head-shoulder area into a first-layer CNN network and a second-layer CNN network, the first-layer CNN network is used for preliminarily screening the head-shoulder area to obtain the head-shoulder area of a driver suspected of receiving and making calls, and the second-layer CNN network is used for further screening the output result of the first-layer CNN network to obtain the state of receiving and making calls of the driver.
Optionally, the target detection module further includes:
the feature extraction submodule extracts the directional gradient histogram feature and the local binary feature of the head and shoulder region and combines the directional gradient histogram feature and the local binary feature to form a multi-dimensional feature vector;
and the filtering submodule is used for classifying the characteristic vectors by utilizing linear discriminant analysis and filtering out non-head-shoulder regions.
Optionally, the identification module further comprises:
a division submodule for dividing the head and shoulder area of the suspected call receiving and making driver into a left area, a right area and an integral area;
the fusion identification submodule is used for respectively inputting the left area, the right area and the whole area into a second-layer CNN network to obtain the state of the call receiving and making of the driver, and the state of the call receiving and making of the driver comprises the following steps: left incoming calls, right incoming calls, no calls, and no penalty.
Optionally, the apparatus further comprises:
the multi-frame verification module is used for continuously identifying the next monitoring image of the current monitoring image if the identification result of the current monitoring image is that the driver is in the call receiving and making state, and giving an alarm if the identification result of the next monitoring image is that the driver is in the call receiving and making state; otherwise, the alarm is abandoned.
Optionally, the positioning module comprises:
acquiring a window upper right corner area according to the license plate information of the target vehicle;
and positioning the position information of the upper right corner point of the car window by using the positioning filter.
The beneficial effects of this application: through purely video-based steps of vehicle window area positioning, driver target detection (head-shoulder area detection) and a cascaded CNN network, the time and labor of manually checking a driver's call receiving and making state can be avoided, false detections of non-genuine call receiving and making in complex scenes can be eliminated, and recognition accuracy is improved; compared with traditional methods, penalty accuracy is higher and scene adaptability and robustness are better.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart of a method for recognizing the state of a driver receiving and making calls while driving according to an embodiment of the present application;
fig. 2 is a flowchart of target detection provided in an embodiment of the present application;
FIG. 3 is a flow chart of object recognition provided by an embodiment of the present application;
fig. 4 is a block diagram of a state recognition device for a driver receiving and making calls while driving according to an embodiment of the present application;
FIG. 5 is a block diagram of a target detection module according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of an identification module according to an embodiment of the present disclosure;
fig. 7 is a block diagram of another state recognition device for a driver receiving and making calls while driving according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims. In addition, the features in the embodiments and the examples described below may be combined with each other without conflict.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
Referring to fig. 1, the method for recognizing the state of a driver receiving and making calls while driving according to the present embodiment may include:
s101: and positioning the window of the target vehicle in the monitoring image.
The monitoring scene can be a road with heavy traffic flow or one where accidents easily happen, such as a common checkpoint road, and the monitoring image is captured by a checkpoint camera.
In one embodiment, for the acquired monitoring image f_src(x, y), x and y are respectively the abscissa and ordinate of a point on the monitored image, the width of the monitored image is Width and its height is Height. Assuming the license plate information of the target vehicle in the image has been obtained, it can comprise the license plate color LpColor and the license plate position Lp(x, y, w, h), where Lp(x, y) are respectively the abscissa and ordinate of the license plate (for example, the abscissa and ordinate of the upper left corner of the license plate, or of the license plate center), and Lp(w, h) are respectively the width and height of the license plate.
The window positioning is not limited to the following scheme, and for example, the window can be positioned by using Hough straight line detection, or by using Adaboost (an iterative algorithm) window detection, or the window area can be determined by using window upper right corner point positioning.
In this embodiment, the window area is determined using the window upper right corner point location.
Specifically, the process for acquiring the positioning information of the vehicle window comprises the following steps:
acquiring a window upper right corner area according to the license plate information of the target vehicle;
and positioning the position information of the upper right corner point of the car window by using the positioning filter.
In this embodiment, the upper right corner area of the vehicle window is estimated from the license plate information, and a positioning filter h(x, y) is then used to locate the position of the upper right corner point.
For a candidate window upper right corner image f_un(x, y), the convolution with the positioning filter h(x, y) is calculated as:
g(x, y) = f_un(x, y) ⊗ h(x, y)  (1)
In formula (1), x and y are respectively the abscissa and ordinate of a point in the candidate image, ⊗ denotes the convolution operation, and g(x, y) is the convolution result of the candidate image f_un(x, y) with the positioning filter h(x, y).
Formula (1) is evaluated and the peak point of g(x, y) is found; this peak is the position information RgtUp(x, y) of the upper right corner of the car window.
In this embodiment, the positioning filter h(x, y) for the upper right corner point of the car window is obtained by training on a batch of calibrated images:
h(x, y) = Σ_i a'_i * h_i(x, y)  (2)
In formula (2), h_i(x, y) is the positioning filter corresponding to the i-th window upper right corner image, and a'_i is the normalized filter weight coefficient.
In the present embodiment, the desired filter response for the i-th calibrated image is a peak centered on the calibrated corner point:
g_i(x, y) = exp(−((x − x_i)^2 + (y − y_i)^2) / δ^2)  (3)
In formula (3), x_i and y_i are the position information of the calibrated upper right corner point of the car window, and δ is an empirical coefficient.
Since f_un(x, y) is the image of the upper right corner of a known vehicle window, h_i(x, y) can be derived in reverse from formulas (1) and (3); the positioning filter h(x, y) is then obtained by calculation over the batch of labeled images.
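As an illustrative note only, the sketch below shows one way such a correlation-filter corner locator can be realized in Python with NumPy. The Fourier-domain derivation of h_i, the uniform weighting of a'_i, and all function names are assumptions made for illustration, not the patented implementation.

```python
import numpy as np


def gaussian_target(shape, cx, cy, delta):
    """Desired response of formula (3): a peak centered on the calibrated corner (x_i, y_i)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / delta ** 2)


def train_positioning_filter(corner_images, corner_points, delta=2.0):
    """Average per-image filters h_i (formula (2)); each h_i is back-derived in the
    Fourier domain from the image and its desired response (formulas (1) and (3))."""
    acc = None
    for img, (cx, cy) in zip(corner_images, corner_points):
        F = np.fft.fft2(img)
        G = np.fft.fft2(gaussian_target(img.shape, cx, cy, delta))
        Hi = G * np.conj(F) / (np.abs(F) ** 2 + 1e-3)   # stabilized division, MOSSE-style
        acc = Hi if acc is None else acc + Hi
    return acc / len(corner_images)                     # uniform weights a'_i assumed


def locate_window_corner(candidate, H):
    """Apply the trained filter to a candidate image; the response peak gives RgtUp(x, y)."""
    g = np.real(np.fft.ifft2(np.fft.fft2(candidate) * H))
    y, x = np.unravel_index(np.argmax(g), g.shape)
    return x, y
```

All images are assumed to be grayscale arrays of the same size; in practice a cosine window and intensity normalization are commonly applied before the FFT.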
S102: and obtaining a driver detection candidate area according to the positioning information of the vehicle window and the license plate information of the target vehicle.
In this case, the shape of the driver detection candidate region may be set to a regular shape as needed. In the present embodiment, the shape of the driver detection candidate region is selected as a rectangle.
According to the position information RgtUp(x, y) of the upper right corner point of the window of the target vehicle obtained in step S101, combined with the license plate information Lp(x, y, w, h), the Driver detection candidate area Driver(x, y, w, h) is calculated:
Driver(x)=max(RgtUp(x)-α*Lp(w),0) (2)
Driver(y)=max(RgtUp(y)-β*Lp(w),0) (3)
Driver(w)=min(γ*Lp(w),Width-Driver(x)) (4)
Driver(h)=min(ε*Lp(w),Height-Driver(y)) (5)
in equations (2) - (5), Driver (x, y) is the abscissa and ordinate of the vertex of the Driver detection candidate region (for example, may be the top left vertex, the bottom left vertex, the top right vertex, and the bottom right vertex), respectively;
driver (w, h) is the width and height of the Driver detection candidate region, respectively;
α, β, ε and γ are all preset empirical coefficients, where α ∈ [1,2], β ∈ [0.5,1], ε ∈ [2.5,3.5], and γ ∈ [1.5,2.5].
In this example, α = 1.5, β = 0.5, γ = 2.0, and ε = 3.0.
According to the above equations (2) to (5), the Driver detection candidate region Driver (x, y, w, h) can be obtained by calculation.
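For illustration only, a small Python sketch of equations (2)–(5) with the empirical coefficients of this example might look as follows (the function name and argument layout are hypothetical):

```python
def driver_candidate_region(rgt_up, lp, width, height,
                            alpha=1.5, beta=0.5, gamma=2.0, epsilon=3.0):
    """Derive the driver detection candidate rectangle Driver(x, y, w, h) from the
    window corner RgtUp(x, y) and the license plate Lp(x, y, w, h)."""
    _, _, lp_w, _ = lp
    rx, ry = rgt_up
    x = max(rx - alpha * lp_w, 0)        # clamp to the left/top image border
    y = max(ry - beta * lp_w, 0)
    w = min(gamma * lp_w, width - x)     # do not exceed the right/bottom border
    h = min(epsilon * lp_w, height - y)
    return x, y, w, h
```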
S103: and detecting the driver detection candidate region by utilizing a direction gradient histogram and a support vector machine to obtain the head and shoulder region of the driver.
Referring to fig. 2, in this step, a driver area is coarsely located by using Histogram of Oriented Gradients (HOG) in combination with Support Vector Machine (SVM) classification, and a head-shoulder area of the driver is obtained. After the HOG characteristics of the driver detection candidate area are extracted, the HOG characteristics are input into an SVM for training, and therefore the driver area is roughly positioned.
In this embodiment, after the histogram of oriented gradients features are extracted from the driver detection candidate region, features of the corresponding sliding-window size are obtained in a sliding-window manner and input one by one into an SVM for classification, so as to obtain the driver's head-shoulder detections. Compared with skin-color face detection, DPM or Adaboost face detection, the driver detection success rate is higher (by roughly 3% to 5%) when the face is occluded (for example, a sun visor shields the face or the car window partially shields it) or blurred by window reflections.
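A minimal sliding-window sketch of this HOG + SVM head-shoulder detection, assuming scikit-image and scikit-learn and an already trained linear SVM (the window size and stride are illustrative choices):

```python
from skimage.feature import hog
from sklearn.svm import LinearSVC


def detect_head_shoulders(candidate_gray, svm: LinearSVC, win=64, step=8):
    """Slide a window over the driver candidate region, score each window's HOG
    feature with the linear SVM, and keep windows classified as head-shoulder."""
    rows, cols = candidate_gray.shape
    detections = []
    for y in range(0, rows - win + 1, step):
        for x in range(0, cols - win + 1, step):
            patch = candidate_gray[y:y + win, x:x + win]
            feat = hog(patch, orientations=9, pixels_per_cell=(8, 8),
                       cells_per_block=(2, 2))
            if svm.decision_function([feat])[0] > 0:
                detections.append((x, y, win, win))
    return detections
```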
In step S103, the classification results for the driver's head-shoulder region can be divided into head-shoulder targets and non-head-shoulder targets, but many false detections remain and strongly affect the result, so the head-shoulder regions extracted by the histogram of oriented gradients combined with the support vector machine need further false-detection reduction.
Referring to fig. 2 again, in this embodiment, the method for recognizing the driving, calling and answering states of the driver may further include:
extracting gradient direction histogram features and Local Binary features (LBP) of the head-shoulder region (the head-shoulder region obtained through step S103), and combining the gradient direction histogram features and the Local Binary features to form a multi-dimensional feature vector;
the feature vectors are classified using Linear Discriminant Analysis (LDA) to filter out non-head-shoulder regions.
LDA is a linear learning method, also called Fisher discriminant analysis; it projects a given sample set onto a line so that the projections of samples from different classes are separated as much as possible while samples of the same class stay as close together as possible.
In this embodiment, LDA is used to filter the head-shoulder windows detected in step S103, classifying them into two types (with a person and without a person) and filtering out regions that are not head-shoulder windows, so as to reduce the false detection rate and improve detection accuracy.
In an embodiment, the process of performing HOG feature extraction on the driver detection candidate region may include:
the images are normalized to 40 × 40 for 16 blocks, where one block (i.e., interval) is composed of 4 cells (i.e., cell units), one cell is a set of 8 × 8 pixels, the scanning is performed in 8 pixel steps, each cell has 9 bins (i.e., 9 parts), and thus the HOG feature dimension is 16 × 4 × 9 — 576 dimensions.
And the LBP feature extraction process for the driver detection candidate region may include:
with the uniform LBP pattern, from 256 dimensions down to 59 dimensions (59 dimensions are the uniform LBP pattern), the image is normalized to 48 × 48 and divided into 3 × 3 blocks, each block is 16 × 16 in size, each block has 59-dimensional features, and thus the LBP feature dimension is 3 × 59 — 531 dimensions.
After the HOG feature and the LBP feature are extracted, the HOG and LBP features are combined to obtain an n-dimensional feature vector X = (x_1, x_2, x_3, …, x_n), where n is a natural number. Optionally, n = 1009.
Iterative training is performed on the training samples (i.e. the n-dimensional feature vectors) by LDA linear discriminant analysis to obtain an optimal set of training parameters W = (w_1, w_2, w_3, …, w_n).
h = w_1*x_1 + w_2*x_2 + w_3*x_3 + … + w_n*x_n + b_1  (6)
In formula (6), h is the result of the linear discriminant analysis;
x_1, x_2, x_3, …, x_n are the feature values in the feature vector X;
w_1, w_2, w_3, …, w_n and b_1 are training parameters.
At test time, the extracted feature vector X = (x_1, x_2, x_3, …, x_n) is substituted into formula (6) to obtain h. If h is greater than or equal to 0, the head-shoulder area output in step S103 is considered the final head-shoulder area; if h is smaller than 0, the head-shoulder region output in step S103 is considered a non-head-shoulder region and can be directly filtered out.
The head and shoulder regions output in the step S103 are further screened according to whether h is greater than or equal to 0, so that a more accurate head and shoulder region is finally obtained, interference of a non-head and shoulder region is reduced, and thus detection accuracy is improved.
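As a sketch of this HOG + LBP fusion and the linear decision of formula (6), assuming scikit-image's non-rotation-invariant uniform LBP and already trained parameters W and b_1 (all names are illustrative):

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern
from skimage.transform import resize


def head_shoulder_feature(gray):
    """Concatenate the 576-dim HOG feature with the 531-dim uniform-LBP histogram feature."""
    f_hog = hog(resize(gray, (40, 40)), orientations=9,
                pixels_per_cell=(8, 8), cells_per_block=(2, 2))      # 576 dims
    lbp_img = (resize(gray, (48, 48)) * 255).astype("uint8")
    lbp_hists = []
    for by in range(3):                                              # 3 x 3 blocks of 16 x 16
        for bx in range(3):
            block = lbp_img[by * 16:(by + 1) * 16, bx * 16:(bx + 1) * 16]
            codes = local_binary_pattern(block, P=8, R=1, method="nri_uniform")
            hist, _ = np.histogram(codes, bins=59, range=(0, 59))    # 59-dim uniform pattern
            lbp_hists.append(hist)
    return np.concatenate([f_hog, np.concatenate(lbp_hists)])


def is_head_shoulder(gray, w, b1):
    """Formula (6): h = w_1*x_1 + ... + w_n*x_n + b_1; keep the region only when h >= 0."""
    return float(np.dot(w, head_shoulder_feature(gray))) + b1 >= 0
```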
S104: and sequentially inputting the head-shoulder area into a first-layer CNN (Convolutional Neural Network) Network and a second-layer CNN Network, wherein the first-layer CNN Network preliminarily screens the head-shoulder area to obtain a head-shoulder area of a driver suspected to make a call, and the second-layer CNN Network further screens an output result of the first-layer CNN Network (namely the head-shoulder area of the driver suspected to make a call output by the first-layer CNN Network) to obtain a call making state of the driver.
In this embodiment, the number of layers of the first layer of CNN network is less than that of the second layer of CNN network, and the number of convolution kernels of the first layer of CNN network is less than that of convolution kernels of the second layer of CNN network.
The CNN network comprises an input layer, Nc convolutional layers, Np downsampling layers and Nf full-connection layers.
Specifically, each convolution layer includes Nc _ Ck convolution kernels, the convolution kernel size is Ckm × Ckm, the step size is 1, the kernel size of each downsampling layer is Pkm × Pkm, the step size is Pkm, and the number of neurons output by the last fully-connected layer of the fully-connected layers is the number of required classifications.
Referring to fig. 3, in the present embodiment, the first-layer CNN network outputs two classes, i.e. two driver call states (calling and not calling), so the first-layer CNN network output is 2.
The second-layer CNN network performs fine classification with an output of 4, i.e. 4 driver call states (left call, right call, no call, and no penalty).
Wherein Nc ∈ [2,10], Np ∈ [2,10], Nf ∈ [1,3]; Nc_Ck ∈ [Nc_Ckmin, Nc_Ckmax], with Nc_Ckmin ∈ [6,16]; Ckm ∈ [3,7], Pkm ∈ [2,4].
In step S104, the head and shoulder area obtained in step S103 is first input into the first-layer CNN network to obtain the two classes of calling and non-calling, thereby realizing quick, coarse recognition.
The first layer CNN network has a simpler structure and adopts fewer network layer numbers and convolution kernels. The aim is to filter quickly, to retain the monitoring pictures of the calling as much as possible, and to exclude the monitoring pictures of the non-calling.
The output of the first-layer CNN network has two classes, calling and non-calling, so a large number of non-calling monitoring images can be filtered out and do not need the fine classification of the next layer (the second-layer CNN network); this reduces the number of monitoring images input to the second-layer CNN network, reduces the time spent on fine classification, and can also lower the misjudgment rate of call recognition.
Compared with the first layer CNN network, the second layer CNN network has a more complex structure, thereby realizing fine identification.
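For illustration, the sketch below pairs a small coarse network with a deeper fine network in PyTorch. The specific layer counts, kernel sizes, channel widths, and the 64 × 64 input size are assumptions chosen within the ranges given above, not the patented configuration.

```python
import torch.nn as nn


class CoarseCallNet(nn.Module):
    """First-layer CNN: few layers and few kernels, 2 outputs (calling / not calling)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 14 * 14, 2)   # assumes 64 x 64 RGB crops

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


class FineCallNet(nn.Module):
    """Second-layer CNN: deeper and wider, 4 outputs (left call, right call, no call, no penalty)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 6 * 6, 4)     # assumes 64 x 64 RGB crops

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```

In the cascade, a head-shoulder crop is first scored by the coarse network; only crops classified as calling are passed on to the fine network.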
Referring to fig. 3 again, the method for recognizing the driving, calling and answering states of the driver further includes:
dividing the head-shoulder area of the suspected call-receiving driver (namely the head-shoulder area of the suspected call-receiving driver output by the first-layer CNN network) into a left area, a right area and an integral area;
respectively inputting the left area, the right area and the whole area into a second-layer CNN network to obtain the state of the driver for receiving and making calls, wherein the state of the driver for receiving and making calls comprises the following steps: left incoming calls, right incoming calls, no calls, and no penalty.
Assume that the driver's head-shoulder area is Call (x, y, w, h), where Call (x, y) is the abscissa and ordinate of the head-shoulder area, and Call (w, h) is the width and height of the head-shoulder area.
In this embodiment, the head-shoulder area is normalized to w × h, that is, w = 150 and h = 100, and the head-shoulder area to be identified is divided into three areas: the left region is Call(x, y, αw, h), the right region is Call(x + (1 − α)w, y, αw, h), and the whole region is Call(x, y, w, h), where α is an empirical coefficient. Optionally, α = 2/3.
After dividing the head and shoulder area to be identified into three, the divided left area, right area and total area need to be input into the second layer CNN network for multi-feature fusion discrimination. Therefore, in the embodiment, the driver target is roughly positioned by using the head and shoulder detection, and the false detection of the driver is eliminated by combining the multi-feature fusion discriminant analysis, so that the false detection rate is reduced.
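A minimal sketch of the left / right / whole split with the assumed α = 2/3 (how the three second-layer CNN outputs are fused afterwards is application-specific and not spelled out here):

```python
def split_head_shoulder(call_crop, alpha=2.0 / 3.0):
    """Split the normalized head-shoulder crop (e.g. 100 x 150, h x w) into the left region,
    right region, and whole region that are fed separately to the second-layer CNN."""
    h, w = call_crop.shape[:2]
    left = call_crop[:, :int(alpha * w)]           # Call(x, y, αw, h)
    right = call_crop[:, int((1 - alpha) * w):]    # Call(x + (1 − α)w, y, αw, h)
    return left, right, call_crop
```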
In this embodiment, in order to more accurately identify the driving, answering and making a call state of the driver and improve the accuracy of identification, the method for identifying the driving, answering and making a call state of the driver further includes:
if the current frame monitoring image recognition result is that the driver is in a call receiving and making state, continuously recognizing the next frame monitoring image of the current frame monitoring image, and if the next frame monitoring image recognition result is that the driver is in the call receiving and making state, giving an alarm; otherwise, the alarm is abandoned.
Of course, the number of frames of the monitoring image to be recognized, for example, at least two consecutive monitoring images, may be selected as needed.
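For illustration, a two-frame confirmation of this kind can be sketched as below; the per-frame recognizer is assumed to return True when the driver is judged to be receiving or making a call.

```python
def confirm_and_alarm(frames, recognize_call_state, raise_alarm):
    """Alarm only when two consecutive monitoring frames are both recognized as calling."""
    previous_calling = False
    for frame in frames:
        calling = recognize_call_state(frame)
        if calling and previous_calling:
            raise_alarm(frame)
        previous_calling = calling
```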
As shown in fig. 4, the present application provides a block diagram of a state recognition device for a driver receiving and making calls while driving; it corresponds to the above state recognition method, and its contents can be understood or explained with reference to the above method embodiment.
Referring to fig. 4, the present embodiment provides a state recognition apparatus for a driver receiving and making calls while driving, which may include a positioning module 100, an area obtaining module 200, a target detection module 300, and a recognition module 400.
The positioning module 100 is configured to position a window of a target vehicle in a monitored image;
the region acquisition module 200 is used for acquiring a driver detection candidate region according to the window positioning information and the license plate information of the target vehicle;
the target detection module 300 is used for detecting the driver detection candidate region by utilizing a direction gradient histogram and a support vector machine to obtain a head and shoulder region of the driver;
the identification module 400 sequentially inputs the head-shoulder area into a first-layer CNN network and a second-layer CNN network, the first-layer CNN network performs preliminary screening on the head-shoulder area to obtain a head-shoulder area of a driver suspected to make and receive calls, and the second-layer CNN network further screens an output result of the first-layer CNN network to obtain a state of making and receiving calls of the driver.
The number of layers of the first layer of CNN network is less than that of the second layer of CNN network, and the number of convolution kernels of the first layer of CNN network is less than that of the convolution kernels of the second layer of CNN network.
Further, the positioning module 100 may include:
acquiring a window upper right corner area according to the license plate information of the target vehicle;
and positioning the position information of the upper right corner point of the car window by using the positioning filter.
Further, referring to fig. 5, the object detection module 300 may further include a feature extraction sub-module 301 and a filtering sub-module 302.
The feature extraction submodule 301 is configured to extract a directional gradient histogram feature and a local binary feature of the head-shoulder region, and combine the directional gradient histogram feature and the local binary feature to form a multidimensional feature vector;
the filtering submodule 302 classifies the feature vectors by using linear discriminant analysis, and filters out non-head-shoulder regions.
Further, referring to fig. 6, the recognition module 400 may further include a dividing sub-module 401 and a fusion recognition sub-module 402.
The dividing sub-module 401 divides the head-shoulder area of the suspected call-receiving driver (the head-shoulder area of the suspected call-receiving driver output by the first-layer CNN network) into a left area, a right area and an integral area;
the fusion identification submodule 402 is configured to input the left area, the right area, and the whole area into a second-layer CNN network, and obtain a state of the driver receiving and making a call, where the state of the driver receiving and making a call includes: left incoming calls, right incoming calls, no calls, and no penalty.
Referring to fig. 7, the state recognition apparatus for driver's driving to make and receive calls may further include:
the multi-frame verification module 500 is used for continuously identifying the next monitoring image of the current monitoring image if the identification result of the current monitoring image is that the driver is in the call receiving and making state, and giving an alarm if the identification result of the next monitoring image is that the driver is in the call receiving and making state; otherwise, the alarm is abandoned.
In summary, the method and device for recognizing the state of a driver receiving and making calls while driving use purely video-based steps such as vehicle window area positioning, driver target detection (head-shoulder area detection) and a cascaded CNN network to avoid the time and labor of manually checking whether a driver is on a call, and to eliminate false detections of non-genuine calls in complex scenes, thereby improving recognition accuracy.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (10)

1. A method for recognizing a driver's driving, answering and making a call, the method comprising:
positioning the window of the target vehicle in the monitoring image;
obtaining a driver detection candidate area according to the positioning information of the vehicle window and the license plate information of the target vehicle;
detecting the driver detection candidate area by utilizing a direction gradient histogram and a support vector machine to obtain a head and shoulder area of the driver;
inputting the head and shoulder area into a first-layer CNN network, and primarily screening the head and shoulder area by the first-layer CNN network to filter the head and shoulder area of drivers who do not make calls and output the head and shoulder area of drivers who are suspected to make calls;
inputting the output result of the first-layer CNN network into a second-layer CNN network, and further screening the output result of the first-layer CNN network by the second-layer CNN network so as to output the state of the driver making a call based on the head-shoulder area of the driver suspected to make a call; wherein the classification category of the second-layer CNN network is different from the classification category of the first-layer CNN network.
2. The method for recognizing a driver's driving to make and receive calls as set forth in claim 1, further comprising, before inputting the head-shoulder area into the first-layer CNN network:
extracting the directional gradient histogram feature and the local binary feature of the head and shoulder region, and combining the directional gradient histogram feature and the local binary feature to form a multi-dimensional feature vector;
and classifying the feature vectors by utilizing linear discriminant analysis, and filtering out non-head-shoulder regions.
3. The state recognition method for driver's driving to make and receive calls according to claim 1,
the method further comprises the following steps:
dividing the head and shoulder area of the suspected call receiving and making driver into a left area, a right area and an integral area;
respectively inputting the left area, the right area and the whole area into a second-layer CNN network to obtain the state of the driver for receiving and making calls, wherein the state of the driver for receiving and making calls comprises the following steps: left incoming calls, right incoming calls, no calls, and no penalty.
4. The method for recognizing the driver's driving and making a call as set forth in claim 1, further comprising:
if the current frame monitoring image recognition result is that the driver is in a call receiving and making state, continuously recognizing the next frame monitoring image of the current frame monitoring image, and if the next frame monitoring image recognition result is that the driver is in the call receiving and making state, giving an alarm; otherwise, the alarm is abandoned.
5. The state recognition method for a driver's driving to make and receive calls according to claim 1, wherein the window positioning information acquisition process includes:
acquiring a window upper right corner area according to the license plate information of the target vehicle;
and positioning the position information of the upper right corner point of the car window by using the positioning filter.
6. A state recognition apparatus for making and receiving a call while a driver is driving, said apparatus comprising:
the positioning module is used for positioning the window of the target vehicle in the monitoring image;
the region acquisition module is used for acquiring a driver detection candidate region according to the positioning information of the vehicle window and the license plate information of the target vehicle;
the target detection module is used for detecting the driver detection candidate area by utilizing a direction gradient histogram and a support vector machine to obtain a head and shoulder area of the driver;
the identification module is used for inputting the head and shoulder area into a first-layer CNN network, and the first-layer CNN network preliminarily screens the head and shoulder area to filter the head and shoulder area of a driver who is not calling and output the head and shoulder area of the driver suspected of receiving and calling;
inputting the output result of the first-layer CNN network into a second-layer CNN network, and further screening the output result of the first-layer CNN network by the second-layer CNN network so as to output the state of the driver making a call based on the head-shoulder area of the driver suspected to make a call; wherein the classification category of the second-layer CNN network is different from the classification category of the first-layer CNN network.
7. The driver's driving, call-receiving state recognition apparatus according to claim 6, wherein the object detection module further comprises:
the feature extraction submodule extracts the directional gradient histogram feature and the local binary feature of the head and shoulder region and combines the directional gradient histogram feature and the local binary feature to form a multi-dimensional feature vector;
and the filtering submodule is used for classifying the characteristic vectors by utilizing linear discriminant analysis and filtering out non-head-shoulder regions.
8. The driver's driving, call-receiving state recognition device according to claim 6, wherein the recognition module further comprises:
a division submodule for dividing the head and shoulder area of the suspected call receiving and making driver into a left area, a right area and an integral area;
the fusion identification submodule is used for respectively inputting the left area, the right area and the whole area into a second-layer CNN network to obtain the state of the call receiving and making of the driver, and the state of the call receiving and making of the driver comprises the following steps: left incoming calls, right incoming calls, no calls, and no penalty.
9. The driver's driving, call-receiving state recognition apparatus according to claim 6, further comprising:
the multi-frame verification module is used for continuously identifying the next monitoring image of the current monitoring image if the identification result of the current monitoring image is that the driver is in the call receiving and making state, and giving an alarm if the identification result of the next monitoring image is that the driver is in the call receiving and making state; otherwise, the alarm is abandoned.
10. The driver's driving, call-receiving state recognition device according to claim 6, wherein the location module comprises:
acquiring a window upper right corner area according to the license plate information of the target vehicle;
and positioning the position information of the upper right corner point of the car window by using the positioning filter.
CN201611185468.8A 2016-12-20 2016-12-20 Method and device for recognizing state of driver for calling and answering Active CN108205649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611185468.8A CN108205649B (en) 2016-12-20 2016-12-20 Method and device for recognizing state of driver for calling and answering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611185468.8A CN108205649B (en) 2016-12-20 2016-12-20 Method and device for recognizing state of driver for calling and answering

Publications (2)

Publication Number Publication Date
CN108205649A CN108205649A (en) 2018-06-26
CN108205649B true CN108205649B (en) 2021-08-31

Family

ID=62603495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611185468.8A Active CN108205649B (en) 2016-12-20 2016-12-20 Method and device for recognizing state of driver for calling and answering

Country Status (1)

Country Link
CN (1) CN108205649B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214289B (en) * 2018-08-02 2021-10-22 厦门瑞为信息技术有限公司 Method for recognizing two-stage calling behavior from whole to local
CN109165607B (en) * 2018-08-29 2021-12-14 浙江工业大学 Driver handheld phone detection method based on deep learning
CN109376634A (en) * 2018-10-15 2019-02-22 北京航天控制仪器研究所 A kind of Bus driver unlawful practice detection system neural network based
CN111310751B (en) * 2018-12-12 2023-08-29 北京嘀嘀无限科技发展有限公司 License plate recognition method, license plate recognition device, electronic equipment and storage medium
CN111723602B (en) * 2019-03-19 2023-08-08 杭州海康威视数字技术股份有限公司 Method, device, equipment and storage medium for identifying driver behavior
CN110110631B (en) * 2019-04-25 2021-06-29 深兰科技(上海)有限公司 Method and equipment for recognizing and making call
CN112307821A (en) * 2019-07-29 2021-02-02 顺丰科技有限公司 Video stream processing method, device, equipment and storage medium
CN112966563B (en) * 2021-02-04 2022-09-20 同济大学 Behavior identification method based on human skeleton detection and tracking algorithm

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737255A (en) * 2011-03-30 2012-10-17 索尼公司 Target detection device and method
CN103971135A (en) * 2014-05-05 2014-08-06 中国民航大学 Human body target detection method based on head and shoulder depth information features
CN104156717A (en) * 2014-08-31 2014-11-19 王好贤 Method for recognizing rule breaking of phoning of driver during driving based on image processing technology
CN104715238B (en) * 2015-03-11 2018-09-11 南京邮电大学 A kind of pedestrian detection method based on multi-feature fusion
CN105512683B (en) * 2015-12-08 2019-03-08 浙江宇视科技有限公司 Object localization method and device based on convolutional neural networks
CN106056071B (en) * 2016-05-30 2019-05-10 北京智芯原动科技有限公司 A kind of driver makes a phone call the detection method and device of behavior

Also Published As

Publication number Publication date
CN108205649A (en) 2018-06-26

Similar Documents

Publication Publication Date Title
CN108205649B (en) Method and device for recognizing state of driver for calling and answering
CN106682601B (en) A kind of driver's violation call detection method based on multidimensional information Fusion Features
US11003931B2 (en) Vehicle monitoring method and apparatus, processor, and image acquisition device
US9842266B2 (en) Method for detecting driver cell phone usage from side-view images
CN107220624A (en) A kind of method for detecting human face based on Adaboost algorithm
CN108108761A (en) A kind of rapid transit signal lamp detection method based on depth characteristic study
CN105809184B (en) Method for real-time vehicle identification and tracking and parking space occupation judgment suitable for gas station
US20150286884A1 (en) Machine learning approach for detecting mobile phone usage by a driver
CN106570439B (en) Vehicle detection method and device
CN109670515A (en) Method and system for detecting building change in unmanned aerial vehicle image
CN111553214B (en) Method and system for detecting smoking behavior of driver
CN106022242B (en) Method for identifying call receiving and making of driver in intelligent traffic system
JP2019106193A (en) Information processing device, information processing program and information processing method
CN112052782A (en) Around-looking-based parking space identification method, device, equipment and storage medium
CN106407951A (en) Monocular vision-based nighttime front vehicle detection method
CN103021179A (en) Real-time monitoring video based safety belt detection method
Reddy et al. A Deep Learning Model for Traffic Sign Detection and Recognition using Convolution Neural Network
CN112686248B (en) Certificate increase and decrease type detection method and device, readable storage medium and terminal
Nguwi et al. Number plate recognition in noisy image
CN110516547B (en) Fake-licensed vehicle detection method based on weighted non-negative matrix factorization
CN112733851A (en) License plate recognition method for optimizing grain warehouse truck based on convolutional neural network
JP2019106149A (en) Information processing device, information processing program and information processing method
CN111723800A (en) License plate calibration and identification method and system based on convolutional neural network and electronic equipment
Wang et al. The color identification of automobiles for video surveillance
CN110688876A (en) Lane line detection method and device based on vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant