CN114387587A - Fatigue driving monitoring method - Google Patents

Fatigue driving monitoring method

Info

Publication number
CN114387587A
Authority
CN
China
Prior art keywords
driver
image
multiplied
fatigue driving
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210040471.XA
Other languages
Chinese (zh)
Inventor
李春麟
程维
魏志国
孙霜铭
潘永康
彭程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN202210040471.XA priority Critical patent/CN114387587A/en
Publication of CN114387587A publication Critical patent/CN114387587A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a fatigue driving monitoring method, belonging to the technical field of fatigue driving monitoring. The method comprises: constructing a prediction model for detecting whether the eyes are open or closed and calculating the driver's blinking frequency; estimating the driver's head posture by calculating its Euler angles; and constructing a prediction model that extracts features from the face image and the two eye images to estimate the driver's gaze area. By monitoring the driver's blinking frequency, head posture angles and gaze point in real time, the fatigue state of the driver is monitored and the accuracy of fatigue driving detection is improved. The method is contactless: fatigue driving can be monitored automatically with only the head and face images of the driver collected by a single in-vehicle camera. It is low in cost, does not interfere with the driver's normal driving, and therefore has high market application value.

Description

Fatigue driving monitoring method
Technical Field
The invention belongs to the technical field of fatigue driving monitoring, and particularly relates to a fatigue driving monitoring method.
Background
In today's fast-paced society, traffic accidents occur frequently, and fatigue driving is a major cause of them, so monitoring whether a driver is driving while fatigued is of great significance. At present, three approaches to fatigue driving monitoring are commonly used at home and abroad. The first monitors the vehicle's driving characteristics (such as the motion pattern of the steering wheel and the use of the accelerator pedal) to infer whether the driver is fatigued; this approach works only in specific environments and fails in complex road environments such as mountain roads and muddy roads. The second focuses on physiological electrical signals such as the electrooculogram (EOG), electrocardiogram (ECG) and electroencephalogram (EEG); it requires the driver to wear many sensors, which can interfere with normal driving, increase driving risk, and therefore has serious limitations. With the rapid development of computer vision, the third approach, fatigue driving monitoring based on computer vision, has become more and more popular.
Chinese patent CN203562073U, "Device for reminding driver of safe driving", provides a ring-shaped steering wheel sleeve that vibrates at fixed intervals to remind the driver to concentrate on driving. In essence it reminds the driver at fixed times rather than monitoring the driver's state and issuing a real-time reminder when the driver is actually fatigued. Chinese patent CN108010272A, "Fatigue driving reminding device", designs a device that reminds the driver to rest after a fixed driving time. Chinese patent CN110299014A, "Safety driving prompting device", designs a device that monitors fatigue driving by analysing state data such as steering wheel movements; estimating the driver's fatigue state from steering wheel data in this way cannot be applied to road sections with complex driving environments such as mountain roads and muddy roads.
In production environments, many automobile companies cooperate with third-party companies to design and develop products that monitor driver state. For example, an in-vehicle alarm system tracks the driver's steering pattern and issues a warning signal when an abnormal deviation is detected, and a rest-assist system judges the driver's fatigue level from the motion pattern of the steering wheel and the use of the pedals, warning the driver through steering wheel vibration, audible signals and the like once fatigue driving is detected. Among the fatigue driving monitoring products on the market, a popular one judges the driver's fatigue state by monitoring the pupils and eyelids with wireless glasses and is currently applied to commercial vehicles, but it is unfriendly to near-sighted drivers, expensive, and unsuitable for large-scale adoption.
Disclosure of Invention
In view of the above problems, it is an object of the present invention to provide a fatigue driving monitoring method, which can monitor the driving state of a driver in real time by capturing a facial image of the driver, and prevent the driver from fatigue driving without interfering with the normal driving behavior of the driver.
In order to achieve the above object, the present invention provides a fatigue driving monitoring method, comprising:
constructing a detection model for detecting opening/closing of eyes, and calculating the blinking frequency of a driver;
estimating the head attitude of the driver by calculating the Euler angle of the head attitude of the driver;
constructing a detection model of a driver watching area, and detecting the watching area of the driver through an eye image and a face image of the driver;
and judging whether the driver is fatigue driving according to the blinking frequency, the head posture or the watching area of the driver.
The constructing of a detection model for detecting the opening/closing of eyes and the calculation of the blinking frequency of the driver include:
step 1.1: collecting facial image data of a driver to construct an open/closed eye detection data set;
step 1.2: constructing a LeNet neural network as a prediction model for detecting open/closed eyes, and training by using an open/closed eye detection data set;
step 1.3: for the face image of the driver to be monitored, the trained LeNet neural network is used to predict whether the eyes are open or closed, and the total duration T taken for 20 closed-eye frames to occur is counted from the prediction results.
The estimation of the head posture of the driver by calculating the Euler angle of the head posture of the driver comprises the following steps:
step 2.1: positioning 68 key points of the face in each image by using a cascade regression tree algorithm to obtain two-dimensional coordinates of the 68 key points;
step 2.2: calculating a rotation matrix rot _ vector of the head through an N-point perspective pose solving algorithm according to the universal three-dimensional coordinates of the key points of the head and the two-dimensional coordinates of the key points of the 68 faces obtained in the step 2.1;
step 2.3: and converting the rotation matrix into a pitch angle pitch, a yaw angle yaw and a roll angle roll in a space coordinate system to represent the head posture of the driver.
The construction of a prediction model for extracting the characteristics of the facial image and the binocular image and the estimation of the gaze area of the driver comprises the following steps:
step 3.1: taking the public data set DDGC-DB1 as a training set of a training model;
step 3.2: adjusting each sample in the training set to a uniform size of 224×224;
step 3.3: cutting out the face image from the scaled image, denoting its width as L1 and its height as H1;
step 3.4: taking the geometric center of each eye region as the center, cutting out the two eye images, denoting the distance between the two eye corners as L2 and the height as H2;
Step 3.5: respectively extracting features of the intercepted face image and the intercepted binocular image by using a convolutional layer of a VGG16 neural network to obtain 3 feature vectors of 1 multiplied by 4096, and respectively recording the feature vectors as xi, psi and gamma;
step 3.6: calculating a weighted sum vector gamma of the vector calculated in the step 3.5;
calculating Euclidean distance o between vectors xi and psi1
Figure BDA0003470001830000031
Calculating Euclidean distance o between vectors xi and gamma2
Figure BDA0003470001830000032
Γ=ξ+o1ψ+o2γ
In the formula, xikValue of the Kth element, ψ, representing vector ξkValue of the Kth element, gamma, representing the vector psikThe kth element value, K ═ 1,2,3, … …,4095, representing the vector γ;
ψ=[ψ0,ψ1,……,ψ4095],γ=[γ0,γ1,……,γ4095],ξ=[ξ0,ξ1,……,ξ4095];
step 3.7: passing the weighted sum vector Γ through two fully connected layers and normalizing the result with softmax to obtain an output vector R, where the element values of R are the predicted probabilities of each category and the position of the maximum element is the predicted gaze region.
Whether the driver is fatigue driving is judged according to the blinking frequency, the head posture or the watching area of the driver, and the method is specifically expressed as follows:
aiming at the facial image of the driver to be monitored, carrying out eye opening or eye closing prediction by utilizing a trained LeNet neural network, counting the total time length T occupied by adjacent 20 frames of images with closed eyes as a prediction result, and if the T is less than or equal to one minute or the T is more than or equal to two minutes, determining that the driver is fatigue driving;
when the maximum values of the changes of pitch, yaw and roll are all less than or equal to a set threshold value within a period of time, judging that the driver is fatigue driving;
and aiming at the facial image of the driver to be monitored, detecting the watching area according to the intercepted facial image and the binocular image, and judging that the driver is fatigue driving when the predicted results of the driver are the same watching area within a certain period of time.
The step 1.1 comprises the following steps:
step 1.1.1: respectively acquiring facial image data of different angles of the head of N drivers;
step 1.1.2: positioning 68 key points of the human face in each image;
step 1.1.3: after the key points of the face are obtained, the direction and the size of the face are corrected;
step 1.1.4: intercepting the corrected eye image as a sample in a training set;
step 1.1.5: each sample was labeled as open or closed.
Said step 1.1.3 comprises:
step S1-1: taking the upper-left corner of the image as the origin, the horizontal direction as the horizontal axis and the vertical direction as the vertical axis, the rotation center point a(ax, ay) of the image is calculated from the coordinates of the left eye corner key point P37 of the left eye and the right eye corner key point P46 of the right eye:
ax = (P37.x + P46.x) / 2
ay = (P37.y + P46.y) / 2
where P37.x and P37.y denote the abscissa and ordinate of face key point 37, and P46.x and P46.y denote the abscissa and ordinate of face key point 46;
step S1-2: when the image deflects in the horizontal direction, correcting the image in the horizontal direction according to the rotation angle alpha;
step S1-3: and the corrected images are zoomed to ensure that the eye sizes of all the images are consistent.
The step S1-2 includes:
step SS 1-1: taking the upper-left corner of the image as the origin, the horizontal direction as the X axis and the vertical direction as the Y axis, the height difference h between point P37 and point P46 in the Y-axis direction is calculated; when the head is not deflected in the horizontal direction, the height difference between P37 and P46 in the Y-axis direction is 0;
h = P37.y - P46.y
when h < 0, the head is deflected to the left in the horizontal direction, and vice versa;
step SS 1-2: calculating the distance r between point P37 and point P46:
r = sqrt( (P37.x - P46.x)² + (P37.y - P46.y)² )
step SS 1-3: calculating the rotation angle α:
α = arcsin(h / r)
step SS 1-4: and rotating the picture by alpha degrees in the horizontal direction according to the rotation angle to realize the correction of the picture in the horizontal direction.
The step S1-3 includes:
step SS 2-1: calculating the distance d between the right eye corner key point P40 of the left eye and the left eye corner key point P43 of the right eye in the X-axis direction, i.e. d = P43.x - P40.x;
step SS 2-2: calculating the scaling factor scale = D/d, where D is an arbitrary constant;
step SS 2-3: and zooming the picture according to the zooming scale.
The step 1.2 comprises the following steps:
step 1.2.1: processing each sample image to a size of 3×32×32 as the input of the LeNet neural network;
step 1.2.2: performing a convolution operation on the input image with 6 convolution kernels of size 5×5 and stride 1, without edge padding, to obtain a feature map of size 6×28×28;
step 1.2.3: performing a max-pooling operation on the obtained feature map with a 2×2 kernel and stride 2, without edge padding, converting the feature map to 6×14×14;
step 1.2.4: performing a convolution operation on the feature map obtained in step 1.2.3 with 16 convolution kernels of size 5×5 and stride 1, without edge padding, converting the feature map to 16×10×10;
step 1.2.5: performing a max-pooling operation on the feature map obtained in step 1.2.4 with a 2×2 kernel and stride 2, without edge padding, converting the feature map to 16×5×5;
step 1.2.6: performing a convolution operation on the feature map obtained in step 1.2.5 with 120 convolution kernels of size 5×5 and stride 1, without edge padding, converting the feature map to 120×1×1;
step 1.2.7: inputting the feature vector obtained in step 1.2.6 to fully connected layer F1 for a fully connected calculation, F1 having 120 neurons, to obtain a feature vector of size 1×120;
step 1.2.8: inputting the feature vector obtained in step 1.2.7 to fully connected layer F2 for a fully connected calculation, F2 having 2 neurons, to obtain a feature vector of size 1×2;
step 1.2.9: normalizing the vector obtained in step 1.2.8 with softmax to obtain the probability that the input eye picture is open or closed.
The invention has the beneficial effects that:
the invention provides a fatigue driving monitoring method, which monitors the fatigue state of a driver by monitoring the blinking frequency, the head posture angle and the fixation point of the driver in real time and improves the accuracy of detecting the fatigue driving of the driver. The method is a non-contact method, can automatically monitor fatigue driving by only collecting the head and face images of the driver through one camera arranged in the vehicle, has low manufacturing cost, does not influence the normal driving of the driver, and greatly improves the market application value.
Drawings
FIG. 1 is a flow chart of a method for monitoring fatigue driving according to the present invention;
FIG. 2 is a diagram of location information of 68 key points of a face according to the present invention;
FIG. 3 is a schematic view of a gaze area prediction network according to the present invention;
fig. 4 is a schematic view of the division of the gaze region in the present invention.
Detailed Description
The invention is further described below with reference to the figures and a specific embodiment. After the camera acquires a face image, the 68 facial key points are located with the dlib landmark model; the face image is then geometrically corrected (size and direction adjusted) so that the face is always kept horizontal and the distance between the right corner of the left eye and the left corner of the right eye is always 90 pixels; the eye images are then extracted, open/closed-eye detection is performed, and the total duration T taken for 20 closed-eye frames to occur is counted from the detection results. Next, the 3D head angles are calculated from the 68 facial key points using the solvePnP algorithm in OpenCV. Finally, the gaze region is detected from the face image, so that dangerous driving behaviour can be monitored in addition to fatigue driving. The advantage of this monitoring method is that it combines three monitoring modes, blink frequency monitoring, head posture change monitoring and gaze region monitoring, so that the driver's fatigue driving behaviour can be captured accurately and sensitively; in addition, gaze region monitoring can capture distracted driving behaviour (operating the radio, a mobile phone, etc.), further safeguarding driving safety.
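The per-frame pipeline described above can be sketched as follows. This is a minimal sketch assuming the dlib 68-point landmark model file is available locally; the camera index and the three per-indicator helper functions are illustrative placeholders for the models built in the steps below, not part of the original disclosure.

import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # dlib landmark model

# Placeholder hooks for the three indicators described below (hypothetical names).
def detect_eye_state(frame, pts): return "open"        # step 1: LeNet open/closed-eye model
def estimate_head_pose(pts): return 0.0, 0.0, 0.0      # step 2: solvePnP + Euler angles
def predict_gaze_region(frame, pts): return 6          # step 3: VGG16 gaze-region model

cap = cv2.VideoCapture(0)                              # in-vehicle camera (index assumed)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        continue
    shape = predictor(gray, faces[0])
    pts = [(shape.part(i).x, shape.part(i).y) for i in range(68)]  # 68 facial key points

    eye_state = detect_eye_state(frame, pts)
    pitch, yaw, roll = estimate_head_pose(pts)
    region = predict_gaze_region(frame, pts)
    # the three indicators are accumulated over time and combined to judge fatigue/distraction
cap.release()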
As shown in fig. 1, the method for monitoring fatigue driving provided by the present invention comprehensively determines whether a driver is in fatigue driving by estimating a blinking frequency, a head pose, and a gaze region of the driver, and includes:
constructing a detection model for detecting opening/closing of eyes, and calculating the blinking frequency of a driver; the method comprises the following steps:
step 1.1: acquiring facial image data of drivers and preparing a training sample set as the open/closed-eye detection data set; the camera that collects the driver's face images can be mounted at the center of the instrument panel or at another position that does not affect the driver's normal driving, such as the interior rear-view mirror; the method comprises the following steps:
step 1.1.1: respectively collecting facial image data of different angles of the face of N drivers;
step 1.1.2: positioning 68 key points of the human face in each image; in a specific implementation, a cascade regression tree algorithm (landmark model of dlib) is used, and other methods such as deep learning are also available, and the position information of 68 key points is shown in fig. 2.
Step 1.1.3: after the key points of the face are obtained, the direction and the size of the face are corrected; the method comprises the following steps:
step S1-1: taking the upper-left corner of the image as the origin, the horizontal direction as the horizontal axis (X axis) and the vertical direction as the vertical axis (Y axis), the rotation center point a(ax, ay) of the image is calculated from the coordinate values of the left eye corner key point P37 of the left eye (point 37 in fig. 2) and the right eye corner key point P46 of the right eye (point 46 in fig. 2):
ax = (P37.x + P46.x) / 2
ay = (P37.y + P46.y) / 2
where P37.x and P37.y denote the abscissa and ordinate of face key point 37, and P46.x and P46.y denote the abscissa and ordinate of face key point 46;
step S1-2: when the image deflects in the horizontal direction, correcting the image in the horizontal direction according to the rotation angle alpha; the method comprises the following steps:
step SS 1-1: taking the upper-left corner of the image as the origin, the horizontal direction as the X axis and the vertical direction as the Y axis, the height difference h between point P37 and point P46 in the Y-axis direction is calculated; this difference is 0 when the head is not deflected in the horizontal direction;
h = P37.y - P46.y
when h < 0, the head is deflected to the left in the horizontal direction, and vice versa;
step SS 1-2: calculating the distance r between point P37 and point P46:
r = sqrt( (P37.x - P46.x)² + (P37.y - P46.y)² )
step SS 1-3: calculating the rotation angle α:
α = arcsin(h / r)
step SS 1-4: rotating the picture by α degrees in the horizontal direction according to the rotation angle, thereby correcting the picture in the horizontal direction; this can be implemented with the image rotation functions of OpenCV, for example building a rotation matrix with getRotationMatrix2D and applying it with warpAffine (a sketch of the full alignment procedure is given after step 1.1.5 below);
step S1-3: the corrected images are zoomed to ensure that the eye sizes of all the images are consistent; the method comprises the following steps:
step SS 2-1: calculating the distance d between the right eye corner key point P40 of the left eye (point 40 in fig. 2) and the left eye corner key point P43 of the right eye (point 43 in fig. 2) in the X-axis direction, i.e. d = P43.x - P40.x;
step SS 2-2: calculating the scaling factor scale = D/d, where D is an arbitrary constant;
step SS 2-3: scaling the picture according to the scaling factor;
step 1.1.4: intercepting the corrected eye image as a sample in a training set;
step 1.1.5: labeling each sample with open eyes or closed eyes;
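A minimal sketch of the alignment in steps S1-1 to SS2-3, assuming 0-based landmark indices (so patent point P37 is index 36) and OpenCV's getRotationMatrix2D/warpAffine for the rotation; the sign convention of the angle and the choice of D = 90 pixels (the value mentioned in the overview above) are assumptions.

import math
import cv2
import numpy as np

def align_face(img, pts, target_eye_dist=90.0):
    # pts: list of 68 (x, y) landmarks, 0-based, so patent point P37 is pts[36].
    p37, p46 = np.array(pts[36], float), np.array(pts[45], float)  # outer eye corners
    p40, p43 = np.array(pts[39], float), np.array(pts[42], float)  # inner eye corners

    # Rotation center a = midpoint of the two outer eye corners (step S1-1).
    ax, ay = (p37[0] + p46[0]) / 2.0, (p37[1] + p46[1]) / 2.0

    # Height difference and eye-corner distance (steps SS1-1, SS1-2).
    h = p37[1] - p46[1]
    r = math.hypot(p37[0] - p46[0], p37[1] - p46[1])

    # Rotation angle alpha (step SS1-3), in degrees; the sign may need flipping
    # depending on the image coordinate convention.
    alpha = math.degrees(math.asin(h / r))

    # Rotate about (ax, ay) so the eye line becomes horizontal (step SS1-4).
    rot = cv2.getRotationMatrix2D((ax, ay), alpha, 1.0)
    rotated = cv2.warpAffine(img, rot, (img.shape[1], img.shape[0]))

    # Scale so the inner-corner distance d maps to the constant D (steps SS2-1..SS2-3).
    d = p43[0] - p40[0]
    scale = target_eye_dist / d
    return cv2.resize(rotated, None, fx=scale, fy=scale)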
step 1.2: constructing a LeNet neural network as a prediction model for detecting open/closed eyes, and training by using a training sample set; the method comprises the following steps:
step 1.2.1: processing each sample image to a size of 3×32×32 as the input of the LeNet neural network;
step 1.2.2: performing a convolution operation on the input image with 6 convolution kernels of size 5×5 and stride 1, without edge padding, to obtain a feature map of size 6×28×28;
step 1.2.3: performing a max-pooling operation on the obtained feature map with a 2×2 kernel and stride 2, without edge padding, converting the feature map to 6×14×14;
step 1.2.4: performing a convolution operation on the feature map obtained in step 1.2.3 with 16 convolution kernels of size 5×5 and stride 1, without edge padding, converting the feature map to 16×10×10;
step 1.2.5: performing a max-pooling operation on the feature map obtained in step 1.2.4 with a 2×2 kernel and stride 2, without edge padding, converting the feature map to 16×5×5;
step 1.2.6: performing a convolution operation on the feature map obtained in step 1.2.5 with 120 convolution kernels of size 5×5 and stride 1, without edge padding, converting the feature map to 120×1×1;
step 1.2.7: inputting the feature vector obtained in step 1.2.6 to fully connected layer F1 for a fully connected calculation, F1 having 120 neurons, to obtain a feature vector of size 1×120;
step 1.2.8: inputting the feature vector obtained in step 1.2.7 to fully connected layer F2 for a fully connected calculation, F2 having 2 neurons, to obtain a feature vector of size 1×2;
step 1.2.9: normalizing the vector obtained in step 1.2.8 with softmax to obtain the probability that the input eye picture is open or closed;
Model training: blink detection is performed on the eye images with the trained LeNet neural network. The LeNet network architecture is as follows (a PyTorch sketch of this architecture is given after item (8) below): the input image has size 3 × 32 × 32, and the following operations are performed on it:
(1) a convolution operation is performed on the input image with 6 convolution kernels of size 5×5 and stride 1, without padding, giving a 6×28×28 feature map;
(2) a max-pooling operation is performed on the obtained feature map with a 2×2 kernel and stride 2, without padding, converting the feature map to 6×14×14;
(3) a convolution operation is performed on the feature map obtained in (2) with 16 convolution kernels of size 5×5 and stride 1, without padding, giving a 16×10×10 feature map;
(4) a max-pooling operation is performed on the feature map obtained in (3) with a 2×2 kernel and stride 2, without padding, converting the feature map to 16×5×5;
(5) a convolution operation is performed on the feature map obtained in (4) with 120 convolution kernels of size 5×5 and stride 1, without padding, giving a 120×1×1 feature map;
(6) the feature vector obtained in (5) is input to fully connected layer F1, which has 120 neurons, for a fully connected operation;
(7) the feature vector obtained in (6) is input to fully connected layer F2, which has 2 neurons, for a fully connected calculation;
(8) the vector obtained in (7) is normalized with softmax to obtain the final open/closed-eye prediction result.
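A sketch of the LeNet architecture in items (1)-(8), written in PyTorch. The patent does not state which activation functions (if any) are used between layers, so none are inserted here; that omission and the use of PyTorch are assumptions.

import torch
import torch.nn as nn

class LeNetEye(nn.Module):
    # Open/closed-eye classifier following items (1)-(8).
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5, stride=1),    # (1) 3x32x32 -> 6x28x28
            nn.MaxPool2d(2, stride=2),                    # (2) -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5, stride=1),    # (3) -> 16x10x10
            nn.MaxPool2d(2, stride=2),                    # (4) -> 16x5x5
            nn.Conv2d(16, 120, kernel_size=5, stride=1),  # (5) -> 120x1x1
        )
        self.fc1 = nn.Linear(120, 120)                    # (6) F1, 120 neurons
        self.fc2 = nn.Linear(120, 2)                      # (7) F2, 2 neurons

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = self.fc2(x)
        return torch.softmax(x, dim=1)                    # (8) open/closed probabilities

# quick shape check
probs = LeNetEye()(torch.randn(1, 3, 32, 32))
print(probs.shape)  # torch.Size([1, 2])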
Blink detection: under normal conditions a person blinks 15-20 times per minute. In the early stage of fatigue driving the driver blinks frequently to relieve fatigue, while in a state of excessive fatigue the blink frequency drops below 15 times per minute; therefore the time taken to blink 20 times is counted, and if it is less than one minute or more than two minutes the driver is considered to be driving fatigued. A blink is the process from eyes open to eyes closed and back to open; tests show that, under normal conditions and with a camera running at 30 frames per second, only one frame per blink shows the eyes closed, so counting blinks can be reduced to counting closed-eye frames, and the total time spanned by every 20 closed-eye frames is used to judge whether the driver is fatigued.
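The 20-blink timing rule can be sketched as follows; treating each closed-eye frame as one blink follows the reasoning above, and the wall-clock timing method is an assumption.

import time

class BlinkTimer:
    # Tracks the time spanned by 20 closed-eye frames (treated as 20 blinks);
    # the 60 s / 120 s thresholds are the values given in the text above.
    def __init__(self, blinks_per_window=20):
        self.blinks_per_window = blinks_per_window
        self.closed_frame_times = []

    def update(self, eye_closed):
        if eye_closed:
            self.closed_frame_times.append(time.time())
        if len(self.closed_frame_times) < self.blinks_per_window:
            return "normal"
        T = self.closed_frame_times[-1] - self.closed_frame_times[0]
        self.closed_frame_times = []          # start a new 20-blink window
        if T <= 60.0 or T >= 120.0:           # <= 1 minute or >= 2 minutes
            return "fatigued"
        return "normal"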
Step 1.3: aiming at the facial image of a driver to be monitored, carrying out eye opening or eye closing prediction by using a trained LeNet neural network, and counting the total time length T occupied by adjacent 20 frames of images with closed eyes as a detection result; and judging whether the driver is fatigue driving or not according to the total time length T, and if the T is less than or equal to one minute or the T is more than or equal to two minutes, determining that the driver is fatigue driving.
Head pose estimation: during normal driving the driver needs to assess the driving environment, so the head posture keeps changing; if no large-angle head rotation (more than 20 degrees) in the horizontal direction is detected for a long time (more than 10 minutes), the driver is considered to be fatigued.
Estimating the head attitude of the driver by calculating the Euler angle of the head attitude of the driver; the method comprises the following steps:
step 2.1: using a cascade regression tree algorithm to locate 68 key points of the face in each image (using the landmark model of dlib), and obtaining two-dimensional coordinates of the 68 key points, as shown in fig. 2;
step 2.2: calculating a rotation matrix rot _ vector of the head through an N-point perspective pose solving algorithm according to the universal three-dimensional coordinates of the key points of the head and the two-dimensional coordinates of the key points of the 68 faces obtained in the step 2.1;
the 68 3D generic keypoint coordinates are as follows:
key points 1: -73.393523, -29.801432, -47.667532
Key point 2: -72.775014, -10.949766, -45.909403
Key points 3: 70.533638,7.929818, -44.84258
Key points 4: -66.850058,26.07428, -43.141114
Key points 5: -59.790187,42.56439, -38.635298
Key points 6: -48.368973,56.48108, -30.750622
Key points 7: -34.121101,67.246992, -18.456453
Key points 8 of-17.875411, 75.056892 and-3.609035
Key points 9:0.098749,77.061286,0.881698
The key points are 10:17.477031,74.758448, -5.181201
Key points 11:32.648966,66.929021, -19.176563
The key points are 12:46.372358,56.311389, -30.77057
Key points 13:57.34348,42.419126, -37.628629
Key points 14:64.388482,25.45588, -40.886309
Key points 15:68.212038,6.990805, -42.281449
Key points 16:70.486405, -11.666193, -44.142567
Key points 17:71.375822, -30.365191, -47.140426
Key points 18 of-61.119406, -49.361602, -14.254422
Key points 19-51.287588, -58.769795, -7.268147
Key points 20: 37.8048, -61.996155, -0.442051
Key points 21: -24.022754, -61.033399,6.606501
Key point 22, -11.635713, -56.686759,11.967398
The key points 23:12.056636, -57.391033,12.051204
Key points 24:25.106256, -61.902186,7.315098
Key points 25:38.338588, -62.777713,1.022953
Key points 26:51.191007, -59.302347, -5.349435
Key points 27:60.053851, -50.190255, -11.615746
Key points 28:0.65394, -42.19379,13.380835
Key points 29:0.804809, -30.993721,21.150853
The key points are 30:0.992204, -19.944596,29.284036
Key points 31:1.226783, -8.414541,36.94806
Key points 32: -14.772472,2.598255,20.132003
Key points 33-7.180239, 4.751589,23.536684
Key points 34:0.55592,6.5629,25.944448
Key points 35:8.272499,4.661005,23.695741
Key points 36:15.214351,2.643046,20.858157
Key points 37-46.04729, -37.471411, -7.037989
Key points 38-37.674688, -42.73051, -3.021217
Key points 39-27.883856, -42.711517, -1.353629
Key points 40: -19.648268, -36.754742,0.111088
Key points 41-28.272965, -35.134493,0.147273
Key points 42 of-38.082418, -34.919043, -1.476612
Key points 43:19.265868, -37.032306,0.665746
Key points 44:27.894191, -43.342445, -0.24766
Key points 45:37.437529, -43.110822, -1.696435
Key points 46:45.170805, -38.086515, -4.894163
Key points 47:38.196454, -35.532024, -0.282961
The key points are 48:28.764989, -35.484289,1.172675
Key points 49-28.916267, 28.612716,2.24031
Key points 50-17.533194, 22.172187,15.934335
Key points 51-6.68459, 19.029051,22.611355
Key points 52:0.381001,20.721118,23.748437
Key points 53:8.375443,19.03546,22.721995
Key points 54:18.876618,22.394109,15.610679
Key points 55:28.794412,28.079924,3.217393
Key points 56:19.057574,36.298248,14.987997
Key points 57:8.956375,39.634575,22.554245
Key points 58:0.381549,40.395647,23.591626
Key points 59-7.428895, 39.836405,22.406106
Key points 60 of-18.160634, 36.677899 and 15.121907
Key points 61: -24.37749,28.677771,4.785684
Key points 62: -6.897633,25.475976,20.893742
Key points 63:0.340663,26.014269,22.220479
Key points 64:8.444722,25.326198,21.02552
Key points 65:24.474473,28.323008,5.712776
Key points 66:8.449166,30.596216,20.671489
Key points 67:0.205322,31.408738,21.90367
Key points 68-7.198266, 30.844876,20.328022
Step 2.3: converting the rotation matrix into a pitch angle (pitch, rotating around an X axis), a yaw angle (yaw, rotating around a Y axis) and a roll angle (roll, rotating around a Z axis) in a space coordinate system (right-handed Cartesian coordinate system) so as to represent the head pose of a driver; the concrete expression is as follows:
the rotation matrix rot_vector is converted into a quaternion whose components are denoted w, p, q and k:
w = sqrt(1 + rot_vector[0][0] + rot_vector[1][1] + rot_vector[2][2]) / 2
p = (rot_vector[2][1] - rot_vector[1][2]) / (4w)
q = (rot_vector[0][2] - rot_vector[2][0]) / (4w)
k = (rot_vector[1][0] - rot_vector[0][1]) / (4w)
where rot_vector[i][j] denotes the element in row i+1, column j+1 of the rotation matrix (for example, rot_vector[0][0] is the element in row 1, column 1 and rot_vector[2][0] is the element in row 3, column 1);
the Euler angles in the 3 directions of the right-handed Cartesian space coordinate system are then calculated:
pitch = arctan( 2(wp + qk) / (1 - 2(p² + q²)) )
yaw = arcsin( 2(wq - kp) )
roll = arctan( 2(wk + pq) / (1 - 2(q² + k²)) )
(a sketch of the solvePnP and angle-conversion computation follows)
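A sketch of steps 2.2-2.3 using OpenCV's solvePnP and Rodrigues; the camera intrinsic matrix (focal length approximated by the image width, principal point at the image center) and zero lens distortion are assumptions, since the patent does not specify the camera calibration.

import math
import cv2
import numpy as np

def head_pose_euler(model_points, image_points, frame_size):
    # model_points: the 68 generic 3D key points listed above, shape (68, 3)
    # image_points: the 68 detected 2D key points, shape (68, 2)
    w, h = frame_size
    camera_matrix = np.array([[w, 0, w / 2],
                              [0, w, h / 2],
                              [0, 0, 1]], dtype=np.float64)   # approximate intrinsics
    dist_coeffs = np.zeros((4, 1))                            # assume no lens distortion

    ok, rvec, tvec = cv2.solvePnP(np.asarray(model_points, np.float64),
                                  np.asarray(image_points, np.float64),
                                  camera_matrix, dist_coeffs)
    rot, _ = cv2.Rodrigues(rvec)                              # 3x3 rotation matrix (step 2.2)

    # Rotation matrix -> quaternion (w, p, q, k), as in step 2.3.
    qw = math.sqrt(max(0.0, 1.0 + rot[0][0] + rot[1][1] + rot[2][2])) / 2.0
    p = (rot[2][1] - rot[1][2]) / (4.0 * qw)
    q = (rot[0][2] - rot[2][0]) / (4.0 * qw)
    k = (rot[1][0] - rot[0][1]) / (4.0 * qw)

    # Quaternion -> Euler angles in degrees: pitch about X, yaw about Y, roll about Z.
    pitch = math.degrees(math.atan2(2 * (qw * p + q * k), 1 - 2 * (p * p + q * q)))
    yaw = math.degrees(math.asin(2 * (qw * q - k * p)))
    roll = math.degrees(math.atan2(2 * (qw * k + p * q), 1 - 2 * (q * q + k * k)))
    return pitch, yaw, roll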
When the maximum change of pitch, yaw and roll is monitored to be less than or equal to the set threshold over a period of time, the driver is considered to be driving fatigued.
Gaze region estimation: during normal driving the driver needs to observe the road environment, so the gaze region keeps changing. Whether the driver is fatigued is judged from changes of the gaze point; if the gaze point does not change within 5 minutes, the driver can be considered fatigued. In addition, distracted driving behaviour such as operating the air conditioner, the radio or a mobile phone also affects driving safety; besides judging fatigue driving through gaze point detection, a prompt can also be given when the driver is not driving attentively. Gaze region estimation comprises:
1) use of the data set: the public data set DDGC-DB1 is adopted;
2) model training: gaze region estimation adopts a multi-branch strategy, as shown in fig. 3. First the face is corrected with the same method as in blink detection, then the face image and the eye images are cut out and resized to 224×224, which can be implemented with the resize method of the OpenCV library; each eye image is centered on the geometric center of the eye, and the width of the cropped image is the distance between the two eye corners. The same image size and cropping method are used when the gaze region is estimated, so that the data used for prediction match the data set used to train the model.
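A sketch of the cropping in steps 3.2-3.4; taking the face box as the bounding box of the 68 landmarks, using a square eye patch, and resizing the eye patches to 224×224 for the VGG16 branches are assumptions.

import cv2
import numpy as np

def crop_inputs(img, pts):
    # pts: 68 (x, y) landmarks, 0-based indices (patent point 37 -> pts[36]).
    pts = np.asarray(pts, dtype=int)

    # Face image: bounding box of all landmarks, resized to 224x224 (L1 x H1).
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    face = cv2.resize(img[max(y0, 0):y1, max(x0, 0):x1], (224, 224))

    def eye_patch(corner_a, corner_b):
        # Patch centered on the geometric eye center, width = eye-corner distance (L2).
        ca, cb = pts[corner_a], pts[corner_b]
        cx, cy = (ca + cb) // 2
        half = max(abs(cb[0] - ca[0]) // 2, 1)
        patch = img[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
        return cv2.resize(patch, (224, 224))

    left_eye = eye_patch(36, 39)    # patent points 37 and 40
    right_eye = eye_patch(42, 45)   # patent points 43 and 46
    return face, left_eye, right_eye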
Features are extracted from the cropped face image and the two cropped eye images with the convolutional layers of a VGG16 neural network, giving three 1×4096 feature vectors denoted ξ, ψ and γ respectively. The convolutional layer parameters of the VGG16 network are listed in Table 1, where Conv denotes a convolutional layer, Relu the activation function and Pool a pooling layer (a feature-extraction sketch follows Table 1).
TABLE 1 model parameters Table for VGG16 neural networks
Block 1: Conv3-64, Relu, Conv3-64, Relu, Pool
Block 2: Conv3-128, Relu, Conv3-128, Relu, Pool
Block 3: Conv3-256, Relu, Conv3-256, Relu, Conv3-256, Relu, Pool
Block 4: Conv3-512, Relu, Conv3-512, Relu, Conv3-512, Relu, Pool
Block 5: Conv3-512, Relu, Conv3-512, Relu, Conv3-512, Relu, Pool
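A sketch of the feature extraction in step 3.5 using torchvision's VGG16; obtaining the 1×4096 vector from the first fully connected layer after the convolutional stack is an assumption about how that dimensionality arises.

import torch
import torchvision

vgg = torchvision.models.vgg16().eval()   # randomly initialized here; pretrained weights optional

def vgg_feature(x):
    # x: tensor of shape (1, 3, 224, 224); returns a (1, 4096) feature vector.
    with torch.no_grad():
        f = vgg.features(x)               # convolutional layers (Table 1)
        f = vgg.avgpool(f)
        f = torch.flatten(f, 1)           # (1, 25088)
        return vgg.classifier[0](f)       # first fully connected layer -> (1, 4096)

xi = vgg_feature(torch.randn(1, 3, 224, 224))   # face branch
print(xi.shape)  # torch.Size([1, 4096])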
Constructing a detection model of a driver gazing area, wherein the feature extraction model adopts a VGG16 neural network, and the gazing area of the driver is detected through an eye image and a face image of the driver; the method comprises the following steps:
step 3.1: taking the public data set DDGC-DB1 as a training set of the model;
step 3.2: adjusting each sample in the training set to a uniform size of 224×224;
step 3.3: cutting out the face image from the scaled image, denoting its width as L1 and its height as H1;
step 3.4: taking the geometric center of each eye region as the center, cutting out the two eye images, denoting the distance between the two eye corners as L2 and the height as H2;
step 3.5: extracting features from the cropped face image and the two cropped eye images with the convolutional layers of the VGG16 neural network, obtaining three 1×4096 feature vectors denoted ξ, ψ and γ respectively;
step 3.6: calculating a weighted sum vector Γ from the vectors obtained in step 3.5;
the Euclidean distance o1 between vectors ξ and ψ is calculated:
o1 = sqrt( Σ_{k=0}^{4095} (ξ_k - ψ_k)² )
the Euclidean distance o2 between vectors ξ and γ is calculated:
o2 = sqrt( Σ_{k=0}^{4095} (ξ_k - γ_k)² )
Γ = ξ + o1·ψ + o2·γ
where ξ_k, ψ_k and γ_k denote the k-th elements of ξ, ψ and γ respectively, k = 0, 1, 2, ..., 4095;
ψ = [ψ_0, ψ_1, ..., ψ_4095], γ = [γ_0, γ_1, ..., γ_4095], ξ = [ξ_0, ξ_1, ..., ξ_4095];
step 3.7: the weighted sum vector Γ is passed through two fully connected layers and the result is normalized with softmax, giving a vector R whose element values are the predicted probabilities of each category; the position of the maximum element is the predicted gaze region. The acquired face image of the driver to be monitored is input to the trained gaze region detection model, which outputs the predicted gaze region. If, over a certain period, the driver's gaze regions are the instrument panel region, the device control region (navigation, audio, etc.) or the front passenger glove box region, the driver is considered to be driving distracted. The division of the gaze regions is shown in fig. 4: region (1) is the left side rear-view mirror of the vehicle; region (2) is the instrument panel; region (3) is the device control region; region (4) is the front passenger glove box; region (5) is the right side rear-view mirror; the front windshield is divided into 9 regions numbered (6)-(14); the left front door window except the mirror region is numbered (15); the right front door window except the mirror region is numbered (16); and the steering wheel region is numbered (17). A minimal sketch of the feature fusion and classification computation follows.
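A sketch of the fusion and classification in steps 3.6-3.7; the hidden width of the first fully connected layer and the use of the 17 region labels of fig. 4 as output classes are assumptions.

import torch
import torch.nn as nn

class GazeHead(nn.Module):
    # Weighted fusion of the face (xi) and eye (psi, gamma) features followed by
    # two fully connected layers and softmax (steps 3.6-3.7).
    def __init__(self, num_regions=17, hidden=1024):
        super().__init__()
        self.fc1 = nn.Linear(4096, hidden)
        self.fc2 = nn.Linear(hidden, num_regions)

    def forward(self, xi, psi, gamma):
        o1 = torch.norm(xi - psi, dim=1, keepdim=True)    # Euclidean distance o1
        o2 = torch.norm(xi - gamma, dim=1, keepdim=True)  # Euclidean distance o2
        fused = xi + o1 * psi + o2 * gamma                # weighted sum vector
        r = torch.softmax(self.fc2(self.fc1(fused)), dim=1)
        return r.argmax(dim=1), r                         # predicted region index, probabilities

region, probs = GazeHead()(torch.randn(1, 4096), torch.randn(1, 4096), torch.randn(1, 4096))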
Judging whether the driver is fatigue driving according to the blinking frequency, the head posture or the watching area of the driver; the concrete expression is as follows:
aiming at the head image of the driver to be monitored, carrying out eye opening or eye closing prediction by utilizing a trained LeNet neural network, counting the total time length T occupied by adjacent 20 frames of images with closed eyes as a prediction result, and if the T is less than or equal to one minute or the T is more than or equal to two minutes, determining that the driver is fatigue driving;
when the maximum values of the changes of pitch, yaw and roll are all less than or equal to a set threshold value within a period of time, judging that the driver is fatigue driving;
and aiming at the head image of the driver to be monitored, detecting the watching region by using the intercepted face image and binocular image by using the trained watching region detection model, and judging that the driver is fatigue driving when the prediction results of the driver in a certain period of time are the same watching region.
In this embodiment, the fatigue driving determination conditions are set as follows:
when the driver is in driving, one of the following conditions occurs, and the driver is considered to be in fatigue driving:
the time taken for the driver to blink 20 times is more than 2 minutes or less than 1 minute;
the driver did not detect a large angular rotation of the head (over 20 degrees) for a long time (10 minutes);
the fixation area of the driver is not changed within 5 minutes;
If any one of the three conditions above is satisfied, the driver is considered to be driving fatigued. In addition, if within 3 seconds (150 frames) 75 or more frames of the driver's gaze fall in region (3), region (4) or region (17), the driver is considered to be driving distracted. A sketch combining these checks follows.
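A sketch combining the three fatigue conditions and the distraction rule of this embodiment; the data structures (per-frame gaze region list, sampled Euler angles) and the 30 fps frame rate are assumptions.

def judge_state(blink_T, head_angles, gaze_regions, fps=30):
    # blink_T: seconds taken by the last 20 blinks (or None if not yet available);
    # head_angles: (pitch, yaw, roll) samples from the last 10 minutes;
    # gaze_regions: region numbers of the most recent frames.
    fatigued = False
    if blink_T is not None and (blink_T <= 60 or blink_T >= 120):
        fatigued = True                                   # blink-time condition
    if head_angles and all(max(a) - min(a) <= 20 for a in zip(*head_angles)):
        fatigued = True                                   # no head rotation above 20 degrees
    five_min = gaze_regions[-5 * 60 * fps:]
    if len(five_min) == 5 * 60 * fps and len(set(five_min)) == 1:
        fatigued = True                                   # gaze region unchanged for 5 minutes

    recent = gaze_regions[-150:]                          # window of 150 frames, as in the text
    distracted = sum(r in (3, 4, 17) for r in recent) >= 75
    return fatigued, distracted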

Claims (10)

1. A method of monitoring fatigue driving, comprising:
constructing a detection model for detecting opening/closing of eyes, and calculating the blinking frequency of a driver;
estimating the head attitude of the driver by calculating the Euler angle of the head attitude of the driver;
constructing a detection model of a driver watching area, and detecting the driver watching area through an eye image and a face image of the driver;
and judging whether the driver is fatigue driving according to the blinking frequency, the head posture or the watching area of the driver.
2. The fatigue driving monitoring method according to claim 1, wherein the constructing a detection model for detecting the opening/closing of eyes, and calculating the blinking frequency of the driver comprises:
step 1.1: collecting facial image data of a driver to construct an open/closed eye detection data set;
step 1.2: constructing a LeNet neural network as a model for detecting open/closed eyes, and training by using an open/closed eye detection data set;
step 1.3: aiming at the facial image of the driver to be monitored, the trained neural network LeNet is used for predicting the eyes to be opened or closed, and the total duration T occupied by the adjacent 20 frames of images with the eyes closed is counted as the detection result.
3. The fatigue driving monitoring method according to claim 2, wherein estimating the head posture of the driver by calculating the euler angle of the head posture of the driver comprises:
step 2.1: positioning 68 key points of the face in each image by using a cascade regression tree algorithm to obtain two-dimensional coordinates of the 68 key points;
step 2.2: calculating a rotation matrix rot _ vector of the head through an N-point perspective pose solving algorithm according to the universal three-dimensional coordinates of the key points of the head and the two-dimensional coordinates of the key points of the 68 faces obtained in the step 2.1;
step 2.3: and converting the rotation matrix into a pitch angle pitch, a yaw angle yaw and a roll angle roll in a space coordinate system to represent the head posture of the driver.
4. The fatigue driving monitoring method according to claim 3, wherein the constructing a detection model of the driver's gaze area, and the detecting the driver's gaze area through the eye image and the face image of the driver comprises:
step 3.1: taking the public data set DDGC-DB1 as a training set of the model;
step 3.2: adjusting each sample in the training set to a uniform size of 224×224;
step 3.3: cutting out the face image from the scaled image, denoting its width as L1 and its height as H1;
step 3.4: taking the geometric center of each eye region as the center, cutting out the two eye images, denoting the distance between the two eye corners as L2 and the height as H2;
step 3.5: extracting features from the cropped face image and the two cropped eye images with the convolutional layers of a VGG16 neural network, obtaining three 1×4096 feature vectors denoted ξ, ψ and γ respectively;
step 3.6: calculating a weighted sum vector Γ from the vectors obtained in step 3.5;
calculating the Euclidean distance o1 between vectors ξ and ψ:
o1 = sqrt( Σ_{k=0}^{4095} (ξ_k - ψ_k)² )
calculating the Euclidean distance o2 between vectors ξ and γ:
o2 = sqrt( Σ_{k=0}^{4095} (ξ_k - γ_k)² )
Γ = ξ + o1·ψ + o2·γ
where ξ_k, ψ_k and γ_k denote the k-th elements of ξ, ψ and γ respectively, k = 0, 1, 2, ..., 4095;
ψ = [ψ_0, ψ_1, ..., ψ_4095], γ = [γ_0, γ_1, ..., γ_4095], ξ = [ξ_0, ξ_1, ..., ξ_4095];
step 3.7: passing the weighted sum vector Γ through two fully connected layers, then normalizing the result with softmax to obtain a vector R, where the element values of R are the predicted probabilities of each category and the position of the maximum element is the predicted gaze region.
5. The method for monitoring fatigue driving of claim 4, wherein the determining whether the driver is fatigue driving according to the blinking frequency, the head posture or the gaze area of the driver is specifically expressed as:
aiming at the facial image of the driver to be monitored, carrying out eye opening or eye closing prediction by utilizing a trained LeNet neural network, counting the total time length T occupied by adjacent 20 frames of images with closed eyes as a prediction result, and if the T is less than or equal to one minute or the T is more than or equal to two minutes, determining that the driver is fatigue driving;
when the maximum values of the changes of pitch, yaw and roll are all less than or equal to a set threshold value within a period of time, judging that the driver is fatigue driving;
and aiming at the facial image of the driver to be monitored, detecting the watching area according to the intercepted facial image and the binocular image, and judging that the driver is fatigue driving when the predicted results of the driver are the same watching area within a certain period of time.
6. A method as claimed in claim 2, wherein step 1.1 comprises:
step 1.1.1: respectively acquiring facial image data of different angles of the head of N drivers;
step 1.1.2: positioning 68 key points of the human face in each image;
step 1.1.3: after the key points of the face are obtained, the direction and the size of the face are corrected;
step 1.1.4: intercepting the corrected eye image as a sample in a training set;
step 1.1.5: each sample was labeled as open or closed.
7. A method for monitoring fatigue driving according to claim 6, wherein said step 1.1.3 comprises:
step S1-1: taking the upper-left corner of the image as the origin, the horizontal direction as the horizontal axis and the vertical direction as the vertical axis, the rotation center point a(ax, ay) of the image is calculated from the coordinates of the left eye corner key point P37 of the left eye and the right eye corner key point P46 of the right eye:
ax = (P37.x + P46.x) / 2
ay = (P37.y + P46.y) / 2
where P37.x and P37.y denote the abscissa and ordinate of face key point 37, and P46.x and P46.y denote the abscissa and ordinate of face key point 46;
step S1-2: when the image deflects in the horizontal direction, correcting the image in the horizontal direction according to the rotation angle alpha;
step S1-3: and the corrected images are zoomed to ensure that the eye sizes of all the images are consistent.
8. The fatigue driving monitoring method according to claim 7, wherein the step S1-2 includes:
step SS 1-1: taking the upper-left corner of the image as the origin, the horizontal direction as the X axis and the vertical direction as the Y axis, the height difference h between point P37 and point P46 in the Y-axis direction is calculated; this difference is 0 when the head is not deflected in the horizontal direction;
h = P37.y - P46.y
when h < 0, the head is deflected to the left in the horizontal direction, and vice versa;
step SS 1-2: calculating the distance r between point P37 and point P46:
r = sqrt( (P37.x - P46.x)² + (P37.y - P46.y)² )
step SS 1-3: calculating the rotation angle α:
α = arcsin(h / r)
step SS 1-4: and rotating the picture by alpha degrees in the horizontal direction according to the rotation angle to realize the correction of the picture in the horizontal direction.
9. The fatigue driving monitoring method according to claim 7, wherein the step S1-3 includes:
step SS 2-1: calculating the distance d between the right eye corner key point P40 of the left eye and the left eye corner key point P43 of the right eye in the X-axis direction, i.e. d = P43.x - P40.x;
step SS 2-2: calculating the scaling factor scale = D/d, where D is an arbitrary constant;
step SS 2-3: and zooming the picture according to the zooming scale.
10. A method as claimed in claim 2, wherein said step 1.2 comprises:
step 1.2.1: processing each sample image to a size of 3×32×32 as the input of the LeNet neural network;
step 1.2.2: performing a convolution operation on the input image with 6 convolution kernels of size 5×5 and stride 1, without edge padding, to obtain a feature map of size 6×28×28;
step 1.2.3: performing a max-pooling operation on the obtained feature map with a 2×2 kernel and stride 2, without edge padding, converting the feature map to 6×14×14;
step 1.2.4: performing a convolution operation on the feature map obtained in step 1.2.3 with 16 convolution kernels of size 5×5 and stride 1, without edge padding, converting the feature map to 16×10×10;
step 1.2.5: performing a max-pooling operation on the feature map obtained in step 1.2.4 with a 2×2 kernel and stride 2, without edge padding, converting the feature map to 16×5×5;
step 1.2.6: performing a convolution operation on the feature map obtained in step 1.2.5 with 120 convolution kernels of size 5×5 and stride 1, without edge padding, converting the feature map to 120×1×1;
step 1.2.7: inputting the feature vector obtained in step 1.2.6 to fully connected layer F1 for a fully connected calculation, F1 having 120 neurons, to obtain a feature vector of size 1×120;
step 1.2.8: inputting the feature vector obtained in step 1.2.7 to fully connected layer F2 for a fully connected calculation, F2 having 2 neurons, to obtain a feature vector of size 1×2;
step 1.2.9: normalizing the vector obtained in step 1.2.8 with softmax to obtain the probability that the input eye picture is open or closed.
CN202210040471.XA 2022-01-14 2022-01-14 Fatigue driving monitoring method Pending CN114387587A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210040471.XA CN114387587A (en) 2022-01-14 2022-01-14 Fatigue driving monitoring method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210040471.XA CN114387587A (en) 2022-01-14 2022-01-14 Fatigue driving monitoring method

Publications (1)

Publication Number Publication Date
CN114387587A true CN114387587A (en) 2022-04-22

Family

ID=81201276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210040471.XA Pending CN114387587A (en) 2022-01-14 2022-01-14 Fatigue driving monitoring method

Country Status (1)

Country Link
CN (1) CN114387587A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898339A (en) * 2022-05-20 2022-08-12 一汽解放汽车有限公司 Training method, device, equipment and storage medium of driving behavior prediction model
CN114898339B (en) * 2022-05-20 2024-06-07 一汽解放汽车有限公司 Training method, device, equipment and storage medium of driving behavior prediction model
CN115861984A (en) * 2023-02-27 2023-03-28 联友智连科技有限公司 Driver fatigue detection method and system
CN115861984B (en) * 2023-02-27 2023-06-02 联友智连科技有限公司 Driver fatigue detection method and system
CN117227740A (en) * 2023-09-14 2023-12-15 南京项尚车联网技术有限公司 Multi-mode sensing system and method for intelligent driving vehicle
CN117227740B (en) * 2023-09-14 2024-03-19 南京项尚车联网技术有限公司 Multi-mode sensing system and method for intelligent driving vehicle
CN118220166A (en) * 2024-05-07 2024-06-21 钧捷科技(北京)有限公司 Driver distraction early warning system for overcoming problems of near vision mirror and eye difference

Similar Documents

Publication Publication Date Title
CN114387587A (en) Fatigue driving monitoring method
Wang et al. Driver fatigue detection: a survey
JP6307629B2 (en) Method and apparatus for detecting safe driving state of driver
CN109389806B (en) Fatigue driving detection early warning method, system and medium based on multi-information fusion
EP3588372B1 (en) Controlling an autonomous vehicle based on passenger behavior
CN111753674A (en) Fatigue driving detection and identification method based on deep learning
CN109543651B (en) Method for detecting dangerous driving behavior of driver
CN112489425A (en) Vehicle anti-collision early warning method and device, vehicle-mounted terminal equipment and storage medium
Pech et al. Head tracking based glance area estimation for driver behaviour modelling during lane change execution
CN110547807A (en) driving behavior analysis method, device, equipment and computer readable storage medium
US11592677B2 (en) System and method for capturing a spatial orientation of a wearable device
Ma et al. Real time drowsiness detection based on lateral distance using wavelet transform and neural network
Jha et al. Probabilistic estimation of the driver's gaze from head orientation and position
CN113128295A (en) Method and device for identifying dangerous driving state of vehicle driver
US10268903B2 (en) Method and system for automatic calibration of an operator monitor
US20210146934A1 (en) Vehicle operation assistance device, vehicle operation assistance method, and program
CN108256487B (en) Driving state detection device and method based on reverse dual-purpose
Wang et al. Driver fatigue detection technology in active safety systems
Jha et al. Driver visual attention estimation using head pose and eye appearance information
Baccour et al. Comparative analysis of vehicle-based and driver-based features for driver drowsiness monitoring by support vector machines
Solomon et al. Driver Attention and Behavior Detection with Kinect
CN115861982A (en) Real-time driving fatigue detection method and system based on monitoring camera
CN116597611A (en) Method, system and device for monitoring and early warning driver state
CN114926896A (en) Control method for automatic driving vehicle
CN113780125A (en) Fatigue state detection method and device for multi-feature fusion of driver

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination