CN108205649B - Method and device for recognizing state of driver for calling and answering - Google Patents

Method and device for recognizing state of driver for calling and answering

Info

Publication number
CN108205649B
Authority
CN
China
Prior art keywords
driver
area
head
cnn network
call
Prior art date
Legal status
Active
Application number
CN201611185468.8A
Other languages
Chinese (zh)
Other versions
CN108205649A (en)
Inventor
陈鑫嘉
张震
Current Assignee
Zhejiang Uniview Technologies Co Ltd
Original Assignee
Zhejiang Uniview Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Uniview Technologies Co Ltd filed Critical Zhejiang Uniview Technologies Co Ltd
Priority to CN201611185468.8A priority Critical patent/CN108205649B/en
Publication of CN108205649A publication Critical patent/CN108205649A/en
Application granted granted Critical
Publication of CN108205649B publication Critical patent/CN108205649B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a method and a device for recognizing the state of a driver receiving and making calls, wherein the method comprises the following steps: positioning the window of the target vehicle in the monitoring image; obtaining a driver detection candidate area according to the positioning information of the vehicle window and the license plate information of the target vehicle; detecting the driver detection candidate area by utilizing a histogram of oriented gradients and a support vector machine to obtain the head-shoulder area of the driver; and sequentially inputting the head-shoulder area into a first-layer CNN network and a second-layer CNN network, wherein the first-layer CNN network preliminarily screens the head-shoulder area to obtain the head-shoulder area of a driver suspected of receiving and making calls, and the second-layer CNN network further screens the output of the first-layer CNN network to obtain the call receiving and making state of the driver. The method and the device can eliminate false detections of non-genuine call receiving and making in complex scenes, improve recognition accuracy, and provide good scene adaptability and robustness.

Description

Method and device for recognizing state of driver for calling and answering
Technical Field
The application relates to the field of video monitoring, in particular to a method and a device for recognizing the state of a driver calling and answering.
Background
If a driver makes a call while driving, the incidence of traffic accidents increases greatly, so the driver's call making and receiving state needs to be effectively identified as important evidence of whether the driver has violated traffic regulations.
With the continuous development of image processing technology, computer vision technology, deep learning technology and embedded technology, how to automatically judge and obtain evidence of vehicle (including personnel in the vehicle) violation has become a research hotspot in current intelligent transportation.
The prior art provides a method (CN105868690A) for identifying a driver's behavior of making a phone call, which first collects a video stream in the cockpit, then locates the face region through a face part model, performs face correction, and trains two groups of parameters to judge whether a call is being made using a nonlinear decision relation, where the training classification models include an ear region training set, a phone-call training set and a non-phone-call training set. The method adopts a DPM (Deformable Part Model) component detection algorithm to locate the face region, which is time-consuming, and detection accuracy degrades greatly when the face is occluded or blurred; it judges whether a call is being made with a nonlinear classification method trained on ear regions, phone-call regions and non-phone-call regions, and its accuracy is low.
The prior art also provides an automatic monitoring method (CN103366506A) for the behavior of a driver taking a call while driving, which first obtains the driver's head and nearby area with an image acquisition device, obtains the positions of the driver's face and hands in the image using skin color detection, then classifies with a support vector machine and warns the driver who is on a call. Because the method relies on skin color detection to locate the face and hands, and cab imaging is complex, with in-window lighting and special weather in complex scenes strongly affecting the image, the method suffers from many missed detections and false detections.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for recognizing the state of a driver receiving and making calls while driving, so as to solve the problem in the prior art that the accuracy of such recognition is low.
Specifically, the method is realized through the following technical scheme:
according to a first aspect of the present application, there is provided a state recognition method for a driver to take a call while driving a vehicle, the method comprising:
positioning the window of the target vehicle in the monitoring image;
obtaining a driver detection candidate area according to the positioning information of the vehicle window and the license plate information of the target vehicle;
detecting the driver detection candidate area by utilizing a direction gradient histogram and a support vector machine to obtain a head and shoulder area of the driver;
and sequentially inputting the head-shoulder area into a first-layer CNN network and a second-layer CNN network, wherein the first-layer CNN network preliminarily screens the head-shoulder area to obtain the head-shoulder area of a driver suspected to receive and make calls, and the second-layer CNN network further screens the output result of the first-layer CNN network to obtain the state of receiving and making calls of the driver.
Optionally, before inputting the head-shoulder area into the first-layer CNN network and the second-layer CNN network, the method further includes:
extracting the directional gradient histogram feature and the local binary feature of the head and shoulder region, and combining the directional gradient histogram feature and the local binary feature to form a multi-dimensional feature vector;
and classifying the feature vectors by utilizing linear discriminant analysis, and filtering out non-head-shoulder regions.
Optionally, the method further comprises:
dividing the head and shoulder area of the suspected call receiving and making driver into a left area, a right area and an integral area;
respectively inputting the left area, the right area and the whole area into a second-layer CNN network to obtain the state of the driver for receiving and making calls, wherein the state of the driver for receiving and making calls comprises the following steps: left incoming calls, right incoming calls, no calls, and no penalty.
Optionally, the method further comprises:
if the current frame monitoring image recognition result is that the driver is in a call receiving and making state, continuously recognizing the next frame monitoring image of the current frame monitoring image, and if the next frame monitoring image recognition result is that the driver is in the call receiving and making state, giving an alarm; otherwise, the alarm is abandoned.
Optionally, the process of obtaining the positioning information of the vehicle window includes:
acquiring a window upper right corner area according to the license plate information of the target vehicle;
and positioning the position information of the upper right corner point of the car window by using the positioning filter.
According to a second aspect of the present application, there is provided a state recognition device for a driver to take a call while driving a vehicle, the device comprising:
the positioning module is used for positioning the window of the target vehicle in the monitoring image;
the region acquisition module is used for acquiring a driver detection candidate region according to the positioning information of the vehicle window and the license plate information of the target vehicle;
the target detection module is used for detecting the driver detection candidate area by utilizing a direction gradient histogram and a support vector machine to obtain a head and shoulder area of the driver;
and the identification module is used for sequentially inputting the head-shoulder area into a first-layer CNN network and a second-layer CNN network, the first-layer CNN network is used for preliminarily screening the head-shoulder area to obtain the head-shoulder area of a driver suspected of receiving and making calls, and the second-layer CNN network is used for further screening the output result of the first-layer CNN network to obtain the state of receiving and making calls of the driver.
Optionally, the target detection module further includes:
the feature extraction submodule extracts the directional gradient histogram feature and the local binary feature of the head and shoulder region and combines the directional gradient histogram feature and the local binary feature to form a multi-dimensional feature vector;
and the filtering submodule is used for classifying the characteristic vectors by utilizing linear discriminant analysis and filtering out non-head-shoulder regions.
Optionally, the identification module further comprises:
a division submodule for dividing the head and shoulder area of the suspected call receiving and making driver into a left area, a right area and an integral area;
the fusion identification submodule is used for respectively inputting the left area, the right area and the whole area into a second-layer CNN network to obtain the state of the call receiving and making of the driver, and the state of the call receiving and making of the driver comprises the following steps: left incoming calls, right incoming calls, no calls, and no penalty.
Optionally, the apparatus further comprises:
the multi-frame verification module is used for continuously identifying the next monitoring image of the current monitoring image if the identification result of the current monitoring image is that the driver is in the call receiving and making state, and giving an alarm if the identification result of the next monitoring image is that the driver is in the call receiving and making state; otherwise, the alarm is abandoned.
Optionally, the positioning module comprises:
acquiring a window upper right corner area according to the license plate information of the target vehicle;
and positioning the position information of the upper right corner point of the car window by using the positioning filter.
The beneficial effects of this application: through purely video-based steps of vehicle window area positioning, driver target detection (head-shoulder area detection) and a cascaded CNN network, the time and labor of manually checking a driver's call receiving and making state can be avoided, false detections of non-genuine call receiving and making in complex scenes can be eliminated, and recognition accuracy is improved; compared with traditional methods, penalty accuracy is higher and scene adaptability and robustness are better.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart of a method for recognizing the state of a driver receiving and making calls while driving according to an embodiment of the present application;
fig. 2 is a flowchart of target detection provided in an embodiment of the present application;
FIG. 3 is a flow chart of object recognition provided by an embodiment of the present application;
fig. 4 is a block diagram of a state recognition device for a driver receiving and making calls while driving according to an embodiment of the present application;
FIG. 5 is a block diagram of a target detection module according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of an identification module according to an embodiment of the present disclosure;
fig. 7 is a block diagram of another state recognition device for a driver receiving and making calls while driving according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims. In addition, the features in the embodiments and the examples described below may be combined with each other without conflict.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
Referring to fig. 1, the method for recognizing the state of a driver receiving and making calls while driving according to the present embodiment may include:
s101: and positioning the window of the target vehicle in the monitoring image.
The monitoring scene can be a road with heavy traffic flow or one where accidents easily happen, such as a common checkpoint road, and the monitoring image is captured by a checkpoint camera.
In one embodiment, for the acquired monitoring image f_src(x, y), x and y are respectively the abscissa and ordinate of a point on the monitored image, the width of the monitored image is Width and its height is Height. Assuming the license plate information of the target vehicle in the image has been obtained, it can comprise the license plate color LpColor and the license plate position Lp(x, y, w, h), where Lp(x, y) are respectively the abscissa and ordinate of the license plate (for example, the abscissa and ordinate of the upper left corner of the license plate, or of the license plate center), and Lp(w, h) are respectively the width and height of the license plate.
The window positioning is not limited to the following scheme, and for example, the window can be positioned by using Hough straight line detection, or by using Adaboost (an iterative algorithm) window detection, or the window area can be determined by using window upper right corner point positioning.
In this embodiment, the window area is determined using the window upper right corner point location.
Specifically, the process for acquiring the positioning information of the vehicle window comprises the following steps:
acquiring a window upper right corner area according to the license plate information of the target vehicle;
and positioning the position information of the upper right corner point of the car window by using the positioning filter.
In this embodiment, the upper right corner area of the vehicle window is estimated from the license plate information, and a positioning filter h(x, y) is then used to locate the position of the upper right corner point.
For a candidate window upper right corner image f_un(x, y), the convolution with the positioning filter h(x, y) is calculated as:
g(x, y) = f_un(x, y) ⊗ h(x, y)  (1)
In formula (1), x and y are respectively the abscissa and ordinate of a point in the candidate image, ⊗ denotes the convolution operation, and g(x, y) is the convolution result of the candidate image f_un(x, y) with the positioning filter h(x, y).
Formula (1) is evaluated and the peak point of g(x, y) is found; this peak is the position information RgtUp(x, y) of the upper right corner of the car window.
In this embodiment, the positioning filter h(x, y) for the upper right corner point of the car window is obtained by training on a batch of calibrated images:
h(x, y) = Σ_i a'_i * h_i(x, y)  (2)
In formula (2), h_i(x, y) is the positioning filter corresponding to the i-th window upper right corner image, and a'_i is the normalized filter weight coefficient.
In the present embodiment, the desired filter response for the i-th calibrated image is a peak centered on the calibrated corner point:
g_i(x, y) = exp(−((x − x_i)^2 + (y − y_i)^2) / δ^2)  (3)
In formula (3), x_i and y_i are the position information of the calibrated upper right corner point of the car window, and δ is an empirical coefficient.
Since f_un(x, y) is the image of the upper right corner of a known vehicle window, h_i(x, y) can be derived in reverse from formulas (1) and (3); the positioning filter h(x, y) is then obtained by calculation over the batch of labeled images.
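As an illustrative note only, the sketch below shows one way such a correlation-filter corner locator can be realized in Python with NumPy. The Fourier-domain derivation of h_i, the uniform weighting of a'_i, and all function names are assumptions made for illustration, not the patented implementation.

```python
import numpy as np


def gaussian_target(shape, cx, cy, delta):
    """Desired response of formula (3): a peak centered on the calibrated corner (x_i, y_i)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / delta ** 2)


def train_positioning_filter(corner_images, corner_points, delta=2.0):
    """Average per-image filters h_i (formula (2)); each h_i is back-derived in the
    Fourier domain from the image and its desired response (formulas (1) and (3))."""
    acc = None
    for img, (cx, cy) in zip(corner_images, corner_points):
        F = np.fft.fft2(img)
        G = np.fft.fft2(gaussian_target(img.shape, cx, cy, delta))
        Hi = G * np.conj(F) / (np.abs(F) ** 2 + 1e-3)   # stabilized division, MOSSE-style
        acc = Hi if acc is None else acc + Hi
    return acc / len(corner_images)                     # uniform weights a'_i assumed


def locate_window_corner(candidate, H):
    """Apply the trained filter to a candidate image; the response peak gives RgtUp(x, y)."""
    g = np.real(np.fft.ifft2(np.fft.fft2(candidate) * H))
    y, x = np.unravel_index(np.argmax(g), g.shape)
    return x, y
```

All images are assumed to be grayscale arrays of the same size; in practice a cosine window and intensity normalization are commonly applied before the FFT.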
S102: and obtaining a driver detection candidate area according to the positioning information of the vehicle window and the license plate information of the target vehicle.
In this case, the shape of the driver detection candidate region may be set to a regular shape as needed. In the present embodiment, the shape of the driver detection candidate region is selected as a rectangle.
According to the position information RgtUp(x, y) of the upper right corner point of the window of the target vehicle obtained in step S101, combined with the license plate information Lp(x, y, w, h), the Driver detection candidate area Driver(x, y, w, h) is calculated:
Driver(x)=max(RgtUp(x)-α*Lp(w),0) (2)
Driver(y)=max(RgtUp(y)-β*Lp(w),0) (3)
Driver(w)=min(γ*Lp(w),Width-Driver(x)) (4)
Driver(h)=min(ε*Lp(w),Height-Driver(y)) (5)
in equations (2) - (5), Driver (x, y) is the abscissa and ordinate of the vertex of the Driver detection candidate region (for example, may be the top left vertex, the bottom left vertex, the top right vertex, and the bottom right vertex), respectively;
driver (w, h) is the width and height of the Driver detection candidate region, respectively;
α, β, ε and γ are all preset empirical coefficients, where α ∈ [1,2], β ∈ [0.5,1], ε ∈ [2.5,3.5], and γ ∈ [1.5,2.5].
In this example, α = 1.5, β = 0.5, γ = 2.0, and ε = 3.0.
According to the above equations (2) to (5), the Driver detection candidate region Driver (x, y, w, h) can be obtained by calculation.
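For illustration only, a small Python sketch of equations (2)–(5) with the empirical coefficients of this example might look as follows (the function name and argument layout are hypothetical):

```python
def driver_candidate_region(rgt_up, lp, width, height,
                            alpha=1.5, beta=0.5, gamma=2.0, epsilon=3.0):
    """Derive the driver detection candidate rectangle Driver(x, y, w, h) from the
    window corner RgtUp(x, y) and the license plate Lp(x, y, w, h)."""
    _, _, lp_w, _ = lp
    rx, ry = rgt_up
    x = max(rx - alpha * lp_w, 0)        # clamp to the left/top image border
    y = max(ry - beta * lp_w, 0)
    w = min(gamma * lp_w, width - x)     # do not exceed the right/bottom border
    h = min(epsilon * lp_w, height - y)
    return x, y, w, h
```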
S103: and detecting the driver detection candidate region by utilizing a direction gradient histogram and a support vector machine to obtain the head and shoulder region of the driver.
Referring to fig. 2, in this step, a driver area is coarsely located by using Histogram of Oriented Gradients (HOG) in combination with Support Vector Machine (SVM) classification, and a head-shoulder area of the driver is obtained. After the HOG characteristics of the driver detection candidate area are extracted, the HOG characteristics are input into an SVM for training, and therefore the driver area is roughly positioned.
In this embodiment, after the histogram of oriented gradients features are extracted from the driver detection candidate region, features of the corresponding sliding-window size are obtained in a sliding-window manner and input one by one into an SVM for classification, so as to obtain the driver's head-shoulder detections. Compared with skin-color face detection, DPM or Adaboost face detection, the driver detection success rate is higher (by roughly 3% to 5%) when the face is occluded (for example, a sun visor shields the face or the car window partially shields it) or blurred by window reflections.
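A minimal sliding-window sketch of this HOG + SVM head-shoulder detection, assuming scikit-image and scikit-learn and an already trained linear SVM (the window size and stride are illustrative choices):

```python
from skimage.feature import hog
from sklearn.svm import LinearSVC


def detect_head_shoulders(candidate_gray, svm: LinearSVC, win=64, step=8):
    """Slide a window over the driver candidate region, score each window's HOG
    feature with the linear SVM, and keep windows classified as head-shoulder."""
    rows, cols = candidate_gray.shape
    detections = []
    for y in range(0, rows - win + 1, step):
        for x in range(0, cols - win + 1, step):
            patch = candidate_gray[y:y + win, x:x + win]
            feat = hog(patch, orientations=9, pixels_per_cell=(8, 8),
                       cells_per_block=(2, 2))
            if svm.decision_function([feat])[0] > 0:
                detections.append((x, y, win, win))
    return detections
```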
In step S103, the classification results for the driver's head-shoulder region can be divided into head-shoulder targets and non-head-shoulder targets, but many false detections remain and strongly affect the result, so the head-shoulder regions extracted by the histogram of oriented gradients combined with the support vector machine need further false-detection reduction.
Referring to fig. 2 again, in this embodiment, the method for recognizing the driving, calling and answering states of the driver may further include:
extracting gradient direction histogram features and Local Binary features (LBP) of the head-shoulder region (the head-shoulder region obtained through step S103), and combining the gradient direction histogram features and the Local Binary features to form a multi-dimensional feature vector;
the feature vectors are classified using Linear Discriminant Analysis (LDA) to filter out non-head-shoulder regions.
LDA is a linear learning method, also called Fisher discriminant analysis; it projects a given sample set onto a line so that the projections of samples from different classes are separated as much as possible while samples of the same class stay as close together as possible.
In this embodiment, LDA is used to filter the head-shoulder windows detected in step S103, classifying them into two types (with a person and without a person) and filtering out regions that are not head-shoulder windows, so as to reduce the false detection rate and improve detection accuracy.
In an embodiment, the process of performing HOG feature extraction on the driver detection candidate region may include:
the images are normalized to 40 × 40 for 16 blocks, where one block (i.e., interval) is composed of 4 cells (i.e., cell units), one cell is a set of 8 × 8 pixels, the scanning is performed in 8 pixel steps, each cell has 9 bins (i.e., 9 parts), and thus the HOG feature dimension is 16 × 4 × 9 — 576 dimensions.
And the LBP feature extraction process for the driver detection candidate region may include:
with the uniform LBP pattern, from 256 dimensions down to 59 dimensions (59 dimensions are the uniform LBP pattern), the image is normalized to 48 × 48 and divided into 3 × 3 blocks, each block is 16 × 16 in size, each block has 59-dimensional features, and thus the LBP feature dimension is 3 × 59 — 531 dimensions.
After the HOG feature and the LBP feature are extracted, the HOG and LBP features are combined to obtain an n-dimensional feature vector X = (x_1, x_2, x_3, …, x_n), where n is a natural number. Optionally, n = 1009.
Iterative training is performed on the training samples (i.e. the n-dimensional feature vectors) by LDA linear discriminant analysis to obtain an optimal set of training parameters W = (w_1, w_2, w_3, …, w_n).
h = w_1*x_1 + w_2*x_2 + w_3*x_3 + … + w_n*x_n + b_1  (6)
In formula (6), h is the result of the linear discriminant analysis;
x_1, x_2, x_3, …, x_n are the feature values in the feature vector X;
w_1, w_2, w_3, …, w_n and b_1 are training parameters.
At test time, the extracted feature vector X = (x_1, x_2, x_3, …, x_n) is substituted into formula (6) to obtain h. If h is greater than or equal to 0, the head-shoulder area output in step S103 is considered the final head-shoulder area; if h is smaller than 0, the head-shoulder region output in step S103 is considered a non-head-shoulder region and can be directly filtered out.
The head and shoulder regions output in the step S103 are further screened according to whether h is greater than or equal to 0, so that a more accurate head and shoulder region is finally obtained, interference of a non-head and shoulder region is reduced, and thus detection accuracy is improved.
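As a sketch of this HOG + LBP fusion and the linear decision of formula (6), assuming scikit-image's non-rotation-invariant uniform LBP and already trained parameters W and b_1 (all names are illustrative):

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern
from skimage.transform import resize


def head_shoulder_feature(gray):
    """Concatenate the 576-dim HOG feature with the 531-dim uniform-LBP histogram feature."""
    f_hog = hog(resize(gray, (40, 40)), orientations=9,
                pixels_per_cell=(8, 8), cells_per_block=(2, 2))      # 576 dims
    lbp_img = (resize(gray, (48, 48)) * 255).astype("uint8")
    lbp_hists = []
    for by in range(3):                                              # 3 x 3 blocks of 16 x 16
        for bx in range(3):
            block = lbp_img[by * 16:(by + 1) * 16, bx * 16:(bx + 1) * 16]
            codes = local_binary_pattern(block, P=8, R=1, method="nri_uniform")
            hist, _ = np.histogram(codes, bins=59, range=(0, 59))    # 59-dim uniform pattern
            lbp_hists.append(hist)
    return np.concatenate([f_hog, np.concatenate(lbp_hists)])


def is_head_shoulder(gray, w, b1):
    """Formula (6): h = w_1*x_1 + ... + w_n*x_n + b_1; keep the region only when h >= 0."""
    return float(np.dot(w, head_shoulder_feature(gray))) + b1 >= 0
```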
S104: and sequentially inputting the head-shoulder area into a first-layer CNN (Convolutional Neural Network) Network and a second-layer CNN Network, wherein the first-layer CNN Network preliminarily screens the head-shoulder area to obtain a head-shoulder area of a driver suspected to make a call, and the second-layer CNN Network further screens an output result of the first-layer CNN Network (namely the head-shoulder area of the driver suspected to make a call output by the first-layer CNN Network) to obtain a call making state of the driver.
In this embodiment, the number of layers of the first layer of CNN network is less than that of the second layer of CNN network, and the number of convolution kernels of the first layer of CNN network is less than that of convolution kernels of the second layer of CNN network.
The CNN network comprises an input layer, Nc convolutional layers, Np downsampling layers and Nf full-connection layers.
Specifically, each convolution layer includes Nc _ Ck convolution kernels, the convolution kernel size is Ckm × Ckm, the step size is 1, the kernel size of each downsampling layer is Pkm × Pkm, the step size is Pkm, and the number of neurons output by the last fully-connected layer of the fully-connected layers is the number of required classifications.
Referring to fig. 3, in the present embodiment, the first-layer CNN network outputs two classes, i.e. two driver call states (calling and not calling), so the first-layer CNN network output is 2.
The second-layer CNN network performs fine classification with an output of 4, i.e. 4 driver call states (left call, right call, no call, and no penalty).
Wherein Nc ∈ [2,10], Np ∈ [2,10], Nf ∈ [1,3]; Nc_Ck ∈ [Nc_Ckmin, Nc_Ckmax], with Nc_Ckmin ∈ [6,16]; Ckm ∈ [3,7], Pkm ∈ [2,4].
In step S104, the head and shoulder area obtained in step S103 is first input into the first-layer CNN network to obtain the two classes of calling and non-calling, thereby realizing quick, coarse recognition.
The first layer CNN network has a simpler structure and adopts fewer network layer numbers and convolution kernels. The aim is to filter quickly, to retain the monitoring pictures of the calling as much as possible, and to exclude the monitoring pictures of the non-calling.
The output of the first-layer CNN network has two classes, calling and non-calling, so a large number of non-calling monitoring images can be filtered out and do not need the fine classification of the next layer (the second-layer CNN network); this reduces the number of monitoring images input to the second-layer CNN network, reduces the time spent on fine classification, and can also lower the misjudgment rate of call recognition.
Compared with the first layer CNN network, the second layer CNN network has a more complex structure, thereby realizing fine identification.
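For illustration, the sketch below pairs a small coarse network with a deeper fine network in PyTorch. The specific layer counts, kernel sizes, channel widths, and the 64 × 64 input size are assumptions chosen within the ranges given above, not the patented configuration.

```python
import torch.nn as nn


class CoarseCallNet(nn.Module):
    """First-layer CNN: few layers and few kernels, 2 outputs (calling / not calling)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 14 * 14, 2)   # assumes 64 x 64 RGB crops

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


class FineCallNet(nn.Module):
    """Second-layer CNN: deeper and wider, 4 outputs (left call, right call, no call, no penalty)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 6 * 6, 4)     # assumes 64 x 64 RGB crops

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```

In the cascade, a head-shoulder crop is first scored by the coarse network; only crops classified as calling are passed on to the fine network.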
Referring to fig. 3 again, the method for recognizing the driving, calling and answering states of the driver further includes:
dividing the head-shoulder area of the suspected call-receiving driver (namely the head-shoulder area of the suspected call-receiving driver output by the first-layer CNN network) into a left area, a right area and an integral area;
respectively inputting the left area, the right area and the whole area into a second-layer CNN network to obtain the state of the driver for receiving and making calls, wherein the state of the driver for receiving and making calls comprises the following steps: left incoming calls, right incoming calls, no calls, and no penalty.
Assume that the driver's head-shoulder area is Call (x, y, w, h), where Call (x, y) is the abscissa and ordinate of the head-shoulder area, and Call (w, h) is the width and height of the head-shoulder area.
In this embodiment, the head-shoulder area is normalized to w × h, that is, w = 150 and h = 100, and the head-shoulder area to be identified is divided into three areas: the left region is Call(x, y, αw, h), the right region is Call(x + (1 − α)w, y, αw, h), and the whole region is Call(x, y, w, h), where α is an empirical coefficient. Optionally, α = 2/3.
After dividing the head and shoulder area to be identified into three, the divided left area, right area and total area need to be input into the second layer CNN network for multi-feature fusion discrimination. Therefore, in the embodiment, the driver target is roughly positioned by using the head and shoulder detection, and the false detection of the driver is eliminated by combining the multi-feature fusion discriminant analysis, so that the false detection rate is reduced.
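A minimal sketch of the left / right / whole split with the assumed α = 2/3 (how the three second-layer CNN outputs are fused afterwards is application-specific and not spelled out here):

```python
def split_head_shoulder(call_crop, alpha=2.0 / 3.0):
    """Split the normalized head-shoulder crop (e.g. 100 x 150, h x w) into the left region,
    right region, and whole region that are fed separately to the second-layer CNN."""
    h, w = call_crop.shape[:2]
    left = call_crop[:, :int(alpha * w)]           # Call(x, y, αw, h)
    right = call_crop[:, int((1 - alpha) * w):]    # Call(x + (1 − α)w, y, αw, h)
    return left, right, call_crop
```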
In this embodiment, in order to more accurately identify the driving, answering and making a call state of the driver and improve the accuracy of identification, the method for identifying the driving, answering and making a call state of the driver further includes:
if the current frame monitoring image recognition result is that the driver is in a call receiving and making state, continuously recognizing the next frame monitoring image of the current frame monitoring image, and if the next frame monitoring image recognition result is that the driver is in the call receiving and making state, giving an alarm; otherwise, the alarm is abandoned.
Of course, the number of frames of the monitoring image to be recognized, for example, at least two consecutive monitoring images, may be selected as needed.
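For illustration, a two-frame confirmation of this kind can be sketched as below; the per-frame recognizer is assumed to return True when the driver is judged to be receiving or making a call.

```python
def confirm_and_alarm(frames, recognize_call_state, raise_alarm):
    """Alarm only when two consecutive monitoring frames are both recognized as calling."""
    previous_calling = False
    for frame in frames:
        calling = recognize_call_state(frame)
        if calling and previous_calling:
            raise_alarm(frame)
        previous_calling = calling
```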
As shown in fig. 4, the present application provides a block diagram of a state recognition device for a driver receiving and making calls while driving; it corresponds to the above state recognition method, and its contents can be understood or explained with reference to the above method embodiment.
Referring to fig. 4, the present embodiment provides a state recognition apparatus for a driver receiving and making calls while driving, which may include a positioning module 100, an area obtaining module 200, a target detection module 300, and a recognition module 400.
The positioning module 100 is configured to position a window of a target vehicle in a monitored image;
the region acquisition module 200 is used for acquiring a driver detection candidate region according to the window positioning information and the license plate information of the target vehicle;
the target detection module 300 is used for detecting the driver detection candidate region by utilizing a direction gradient histogram and a support vector machine to obtain a head and shoulder region of the driver;
the identification module 400 sequentially inputs the head-shoulder area into a first-layer CNN network and a second-layer CNN network, the first-layer CNN network performs preliminary screening on the head-shoulder area to obtain a head-shoulder area of a driver suspected to make and receive calls, and the second-layer CNN network further screens an output result of the first-layer CNN network to obtain a state of making and receiving calls of the driver.
The number of layers of the first layer of CNN network is less than that of the second layer of CNN network, and the number of convolution kernels of the first layer of CNN network is less than that of the convolution kernels of the second layer of CNN network.
Further, the positioning module 100 may include:
acquiring a window upper right corner area according to the license plate information of the target vehicle;
and positioning the position information of the upper right corner point of the car window by using the positioning filter.
Further, referring to fig. 5, the object detection module 300 may further include a feature extraction sub-module 301 and a filtering sub-module 302.
The feature extraction submodule 301 is configured to extract a directional gradient histogram feature and a local binary feature of the head-shoulder region, and combine the directional gradient histogram feature and the local binary feature to form a multidimensional feature vector;
the filtering submodule 302 classifies the feature vectors by using linear discriminant analysis, and filters out non-head-shoulder regions.
Further, referring to fig. 6, the recognition module 400 may further include a dividing sub-module 401 and a fusion recognition sub-module 402.
The dividing sub-module 401 divides the head-shoulder area of the suspected call-receiving driver (the head-shoulder area of the suspected call-receiving driver output by the first-layer CNN network) into a left area, a right area and an integral area;
the fusion identification submodule 402 is configured to input the left area, the right area, and the whole area into a second-layer CNN network, and obtain a state of the driver receiving and making a call, where the state of the driver receiving and making a call includes: left incoming calls, right incoming calls, no calls, and no penalty.
Referring to fig. 7, the state recognition apparatus for driver's driving to make and receive calls may further include:
the multi-frame verification module 500 is used for continuously identifying the next monitoring image of the current monitoring image if the identification result of the current monitoring image is that the driver is in the call receiving and making state, and giving an alarm if the identification result of the next monitoring image is that the driver is in the call receiving and making state; otherwise, the alarm is abandoned.
In summary, the method and device for recognizing the state of a driver receiving and making calls while driving use purely video-based steps such as vehicle window area positioning, driver target detection (head-shoulder area detection) and a cascaded CNN network to avoid the time and labor of manually checking whether a driver is on a call, and to eliminate false detections of non-genuine calls in complex scenes, thereby improving recognition accuracy.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (10)

1. A method for recognizing a driver's driving, answering and making a call, the method comprising:
positioning the window of the target vehicle in the monitoring image;
obtaining a driver detection candidate area according to the positioning information of the vehicle window and the license plate information of the target vehicle;
detecting the driver detection candidate area by utilizing a direction gradient histogram and a support vector machine to obtain a head and shoulder area of the driver;
inputting the head and shoulder area into a first-layer CNN network, and primarily screening the head and shoulder area by the first-layer CNN network to filter the head and shoulder area of drivers who do not make calls and output the head and shoulder area of drivers who are suspected to make calls;
inputting the output result of the first-layer CNN network into a second-layer CNN network, and further screening the output result of the first-layer CNN network by the second-layer CNN network so as to output the state of the driver making a call based on the head-shoulder area of the driver suspected to make a call; wherein the classification category of the second-layer CNN network is different from the classification category of the first-layer CNN network.
2. The method for recognizing a driver's driving to make and receive calls as set forth in claim 1, further comprising, before inputting the head-shoulder area into the first-layer CNN network:
extracting the directional gradient histogram feature and the local binary feature of the head and shoulder region, and combining the directional gradient histogram feature and the local binary feature to form a multi-dimensional feature vector;
and classifying the feature vectors by utilizing linear discriminant analysis, and filtering out non-head-shoulder regions.
3. The state recognition method for driver's driving to make and receive calls according to claim 1,
the method further comprises the following steps:
dividing the head and shoulder area of the suspected call receiving and making driver into a left area, a right area and an integral area;
respectively inputting the left area, the right area and the whole area into a second-layer CNN network to obtain the state of the driver for receiving and making calls, wherein the state of the driver for receiving and making calls comprises the following steps: left incoming calls, right incoming calls, no calls, and no penalty.
4. The method for recognizing the driver's driving and making a call as set forth in claim 1, further comprising:
if the current frame monitoring image recognition result is that the driver is in a call receiving and making state, continuously recognizing the next frame monitoring image of the current frame monitoring image, and if the next frame monitoring image recognition result is that the driver is in the call receiving and making state, giving an alarm; otherwise, the alarm is abandoned.
5. The state recognition method for a driver's driving to make and receive calls according to claim 1, wherein the window positioning information acquisition process includes:
acquiring a window upper right corner area according to the license plate information of the target vehicle;
and positioning the position information of the upper right corner point of the car window by using the positioning filter.
6. A state recognition apparatus for making and receiving a call while a driver is driving, said apparatus comprising:
the positioning module is used for positioning the window of the target vehicle in the monitoring image;
the region acquisition module is used for acquiring a driver detection candidate region according to the positioning information of the vehicle window and the license plate information of the target vehicle;
the target detection module is used for detecting the driver detection candidate area by utilizing a direction gradient histogram and a support vector machine to obtain a head and shoulder area of the driver;
the identification module is used for inputting the head and shoulder area into a first-layer CNN network, and the first-layer CNN network preliminarily screens the head and shoulder area to filter the head and shoulder area of a driver who is not calling and output the head and shoulder area of the driver suspected of receiving and calling;
inputting the output result of the first-layer CNN network into a second-layer CNN network, and further screening the output result of the first-layer CNN network by the second-layer CNN network so as to output the state of the driver making a call based on the head-shoulder area of the driver suspected to make a call; wherein the classification category of the second-layer CNN network is different from the classification category of the first-layer CNN network.
7. The driver's driving, call-receiving state recognition apparatus according to claim 6, wherein the object detection module further comprises:
the feature extraction submodule extracts the directional gradient histogram feature and the local binary feature of the head and shoulder region and combines the directional gradient histogram feature and the local binary feature to form a multi-dimensional feature vector;
and the filtering submodule is used for classifying the characteristic vectors by utilizing linear discriminant analysis and filtering out non-head-shoulder regions.
8. The driver's driving, call-receiving state recognition device according to claim 6, wherein the recognition module further comprises:
a division submodule for dividing the head and shoulder area of the suspected call receiving and making driver into a left area, a right area and an integral area;
the fusion identification submodule is used for respectively inputting the left area, the right area and the whole area into a second-layer CNN network to obtain the state of the call receiving and making of the driver, and the state of the call receiving and making of the driver comprises the following steps: left incoming calls, right incoming calls, no calls, and no penalty.
9. The driver's driving, call-receiving state recognition apparatus according to claim 6, further comprising:
the multi-frame verification module is used for continuously identifying the next monitoring image of the current monitoring image if the identification result of the current monitoring image is that the driver is in the call receiving and making state, and giving an alarm if the identification result of the next monitoring image is that the driver is in the call receiving and making state; otherwise, the alarm is abandoned.
10. The driver's driving, call-receiving state recognition device according to claim 6, wherein the location module comprises:
acquiring a window upper right corner area according to the license plate information of the target vehicle;
and positioning the position information of the upper right corner point of the car window by using the positioning filter.
CN201611185468.8A 2016-12-20 2016-12-20 Method and device for recognizing state of driver for calling and answering Active CN108205649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611185468.8A CN108205649B (en) 2016-12-20 2016-12-20 Method and device for recognizing state of driver for calling and answering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611185468.8A CN108205649B (en) 2016-12-20 2016-12-20 Method and device for recognizing state of driver for calling and answering

Publications (2)

Publication Number Publication Date
CN108205649A CN108205649A (en) 2018-06-26
CN108205649B true CN108205649B (en) 2021-08-31

Family

ID=62603495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611185468.8A Active CN108205649B (en) 2016-12-20 2016-12-20 Method and device for recognizing state of driver for calling and answering

Country Status (1)

Country Link
CN (1) CN108205649B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214289B (en) * 2018-08-02 2021-10-22 厦门瑞为信息技术有限公司 Method for recognizing two-stage calling behavior from whole to local
CN109165607B (en) * 2018-08-29 2021-12-14 浙江工业大学 Driver handheld phone detection method based on deep learning
CN109376634A (en) * 2018-10-15 2019-02-22 北京航天控制仪器研究所 A kind of Bus driver unlawful practice detection system neural network based
CN111310751B (en) * 2018-12-12 2023-08-29 北京嘀嘀无限科技发展有限公司 License plate recognition method, license plate recognition device, electronic equipment and storage medium
CN111723602B (en) * 2019-03-19 2023-08-08 杭州海康威视数字技术股份有限公司 Method, device, equipment and storage medium for identifying driver behavior
CN110110631B (en) * 2019-04-25 2021-06-29 深兰科技(上海)有限公司 Method and equipment for recognizing and making call
CN112307821A (en) * 2019-07-29 2021-02-02 顺丰科技有限公司 Video stream processing method, device, equipment and storage medium
CN112966563B (en) * 2021-02-04 2022-09-20 同济大学 Behavior identification method based on human skeleton detection and tracking algorithm

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737255A (en) * 2011-03-30 2012-10-17 索尼公司 Target detection device and method
CN103971135A (en) * 2014-05-05 2014-08-06 中国民航大学 Human body target detection method based on head and shoulder depth information features
CN104156717A (en) * 2014-08-31 2014-11-19 王好贤 Method for recognizing rule breaking of phoning of driver during driving based on image processing technology
CN104715238B (en) * 2015-03-11 2018-09-11 南京邮电大学 A kind of pedestrian detection method based on multi-feature fusion
CN105512683B (en) * 2015-12-08 2019-03-08 浙江宇视科技有限公司 Object localization method and device based on convolutional neural networks
CN106056071B (en) * 2016-05-30 2019-05-10 北京智芯原动科技有限公司 A kind of driver makes a phone call the detection method and device of behavior

Also Published As

Publication number Publication date
CN108205649A (en) 2018-06-26

Similar Documents

Publication Publication Date Title
CN108205649B (en) Method and device for recognizing state of driver for calling and answering
CN106682601B (en) A kind of driver's violation call detection method based on multidimensional information Fusion Features
US11003931B2 (en) Vehicle monitoring method and apparatus, processor, and image acquisition device
US9842266B2 (en) Method for detecting driver cell phone usage from side-view images
CN107220624A (en) A kind of method for detecting human face based on Adaboost algorithm
CN108108761A (en) A kind of rapid transit signal lamp detection method based on depth characteristic study
CN105809184B (en) Method for real-time vehicle identification and tracking and parking space occupation judgment suitable for gas station
US20150286884A1 (en) Machine learning approach for detecting mobile phone usage by a driver
CN106570439B (en) Vehicle detection method and device
CN109670515A (en) Method and system for detecting building change in unmanned aerial vehicle image
CN111553214B (en) Method and system for detecting smoking behavior of driver
CN106022242B (en) Method for identifying call receiving and making of driver in intelligent traffic system
JP2019106193A (en) Information processing device, information processing program and information processing method
CN112052782A (en) Around-looking-based parking space identification method, device, equipment and storage medium
CN106407951A (en) Monocular vision-based nighttime front vehicle detection method
CN103021179A (en) Real-time monitoring video based safety belt detection method
Reddy et al. A Deep Learning Model for Traffic Sign Detection and Recognition using Convolution Neural Network
CN112686248B (en) Certificate increase and decrease type detection method and device, readable storage medium and terminal
Nguwi et al. Number plate recognition in noisy image
CN110516547B (en) Fake-licensed vehicle detection method based on weighted non-negative matrix factorization
CN112733851A (en) License plate recognition method for optimizing grain warehouse truck based on convolutional neural network
JP2019106149A (en) Information processing device, information processing program and information processing method
CN111723800A (en) License plate calibration and identification method and system based on convolutional neural network and electronic equipment
Wang et al. The color identification of automobiles for video surveillance
CN110688876A (en) Lane line detection method and device based on vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant