CN114973214A - Unsafe driving behavior identification method based on face characteristic points - Google Patents

Unsafe driving behavior identification method based on face characteristic points

Info

Publication number
CN114973214A
Authority
CN
China
Prior art keywords
key point
euler angle
head
image
brightness
Prior art date
Legal status
Pending
Application number
CN202210651680.8A
Other languages
Chinese (zh)
Inventor
张扬
陈昊楠
郭宗豪
张斌
杨正一
产思贤
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202210651680.8A priority Critical patent/CN114973214A/en
Publication of CN114973214A publication Critical patent/CN114973214A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Abstract

The invention discloses an unsafe driving behavior identification method based on human face characteristic points. First, an image collected by an in-vehicle camera is acquired and subjected to adaptive brightness enhancement to obtain a processed, brightness-enhanced image. The brightness-enhanced image is then input into a ResNet50 model integrated with a spatial attention mechanism to judge the driver's unsafe driving behaviors, and into a joint face key point and Euler angle detection network to obtain face key point information and head Euler angle information, where the head Euler angle detection network is an auxiliary sub-network of the face key point detection network PFLD. Finally, whether the driver is driving in fatigue is judged from the face key point information and the head Euler angle information. The invention greatly improves detection speed and broadens the application scenarios for real-time detection of driver behavior.

Description

Unsafe driving behavior identification method based on face characteristic points
Technical Field
The application belongs to the technical field of behavior recognition, and particularly relates to an unsafe driving behavior recognition method based on human face characteristic points.
Background
With rapid economic development, people's living standards keep improving and the number of automobiles on the road grows year by year. While automobiles bring convenience to daily life, they also raise pressing traffic safety concerns.
Traffic accidents arise from many factors, such as unsafe driver behavior, road conditions and vehicle damage, but most of them result from unsafe driver behaviors, including distracted driving, fatigue driving and drunk driving. According to statistics from the traffic administration, 90% of traffic accidents are closely related to driver behavior, and distracted driving and fatigue driving have become the largest 'killers' causing traffic accidents.
According to the definition of the International Organization for Standardization (ISO), distracted driving refers to behavior that diverts the driver's attention from normal driving, such as making a phone call, drinking water, or taking things from the back seat. Research shows that making a phone call while driving increases the probability of a traffic accident by a factor of 4, drinking water while driving increases it by a factor of 1.5, and not watching the road ahead for more than 2 seconds increases it by a factor of 3. Distracted driving is therefore a safety problem that cannot be ignored: if the driver's unsafe behaviors could be detected and flagged in real time, the probability of traffic accidents could be reduced at the source. At present, however, there is no widely applied product that conveniently identifies the driver's distracted behavior.
Current fatigue driving detection methods are mainly subjective or objective. Subjective detection relies on driver self-report records, subjective questionnaires, the Pearson fatigue scale and the Stanford sleepiness scale; it depends heavily on the driver and cannot detect fatigue driving in real time. Objective detection includes detection of the driver's behavioral characteristics, detection based on vehicle parameters, and detection based on measured physiological parameters of the driver. Monitoring based on vehicle parameters relies on vehicle behavior statistics, such as the pressure the driver's hands exert on the steering wheel and the vehicle's acceleration, and has the drawback of being easily affected by the natural environment and by the driver's skill, psychological quality and mood. Detection based on the driver's physiological characteristics requires the driver to wear physiological measurement devices, which reduces comfort during driving and inconveniences the driver.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an unsafe driving behavior identification method based on human face characteristic points. A ResNet50 model fused with a spatial attention mechanism effectively improves the accuracy of distracted driving detection; eye-closing, yawning and head-lowering behaviors are detected from the face characteristic point information and head pose estimation, which effectively improves both the accuracy and the speed of fatigue driving detection.
In order to solve the technical problems, the invention provides the following technical scheme:
a method for identifying unsafe driving behaviors based on human face characteristic points comprises the following steps:
acquiring an image acquired by a camera in the vehicle, and performing image brightness adaptive enhancement processing to obtain a processed brightness enhanced image;
inputting the brightness-enhanced image into a ResNet50 model integrated with a spatial attention mechanism, and judging the driver's unsafe driving behaviors, wherein the model integrates the spatial attention mechanism into each bottleneck block of the residual network ResNet50; each bottleneck block comprises a residual layer, a first feature output by the residual layer passes sequentially through a global pooling layer, a fully connected layer, a linear rectification function and another fully connected layer, the second feature thus output is matrix-multiplied with the first feature output by the residual layer, and the third feature output after the matrix multiplication is combined with the input feature of the residual layer through the skip connection to serve as the final output feature of the bottleneck block;
inputting the image with enhanced brightness into a face key point and Euler angle joint detection network, and detecting to obtain face key point information and head Euler angle information, wherein the face key point and Euler angle joint detection network comprises a face key point detection network PFLD for detecting the face key point information, the output of a backbone network of the face key point detection network PFLD is also connected with a head Euler angle detection network for detecting the head Euler angle information, and the head Euler angle detection network is an auxiliary sub-network of the face key point detection network PFLD;
and judging whether the driver has fatigue driving or not according to the face key point information and the head Euler angle information.
Further, the acquiring of the image collected by the camera in the vehicle and the image brightness adaptive enhancement processing include:
after an image collected by an in-vehicle camera is obtained, calculating the brightness of the image;
if the image brightness is smaller than the first threshold value, enhancing the image brightness, otherwise, not enhancing the image brightness;
when image brightness enhancement is performed, the image histogram is computed, and the abnormal-value quantiles at the two ends of the histogram are recorded;
removing values outside the quantile interval and stretching the quantile interval to (0, 255);
a processed brightness enhanced image is obtained.
Further, the loss function of the face key point and euler angle joint detection network is as follows:
Loss(x,y)=wing(x)+L2LossFunction(y)
wherein, x represents the absolute value difference between the key point of the predicted face and the true value, and y represents the absolute value difference between the predicted Euler angle of the head and the true value;
wing(x) = ω ln(1 + |x|/ε), if |x| < ω
wing(x) = |x| − C, otherwise
wherein ω is a non-negative number that limits the nonlinear part to the interval (−ω, ω), ε is a constant that constrains the curvature of the curve and improves training stability, and C is a constant, C = ω − ω ln(1 + ω/ε), that smoothly joins the piecewise-defined linear and nonlinear parts.
Further, the determining whether the driver has fatigue driving according to the face key point information and the head euler angle information includes:
determining 6 key point information of the eyes through the face key point information, wherein the key point information comprises a left eye corner key point, an upper left key point, an upper right key point, a right eye corner key point, a lower right key point and a lower left key point, which are respectively marked as P1~P6;
Calculating the eye aspect ratio EAR, wherein the specific calculation formula is as follows:
EAR = (‖P2 − P6‖ + ‖P3 − P5‖) / (2‖P1 − P4‖)
wherein ‖·‖ denotes the Euclidean distance between two key points;
and if the eye aspect ratio EAR falls below a set second threshold value, judging that an eye-closing state occurs and performing a fatigue alarm; otherwise, not performing a fatigue alarm.
Further, the determining whether the driver has fatigue driving according to the face key point information and the head euler angle information includes:
determining 6 key point information of the mouth through the face key point information, wherein the key point information comprises a left mouth corner key point, an upper left key point, an upper right key point, a right mouth corner key point, a lower right key point and a lower left key point, which are respectively marked as Q1~Q6;
Calculating the mouth aspect ratio MAR, wherein the specific calculation formula is as follows:
MAR = (‖Q2 − Q6‖ + ‖Q3 − Q5‖) / (2‖Q1 − Q4‖)
wherein ‖·‖ denotes the Euclidean distance between two key points;
and if the mouth aspect ratio MAR exceeds a set third threshold value, judging that a yawning state occurs and performing a fatigue alarm; otherwise, not performing a fatigue alarm.
Further, the determining whether the driver has fatigue driving according to the face key point information and the head euler angle information includes:
and judging whether the Euler angle of the head exceeds a set fourth threshold, if so, judging that the head-lowering behavior of the driver occurs, and giving a fatigue alarm, otherwise, not giving the fatigue alarm.
Compared with the prior art, the unsafe driving behavior identification method based on the human face characteristic points has the following advantages:
1. The ResNet50 network model is adopted to identify and classify the driver's unsafe driving behaviors, with a spatial attention mechanism integrated; the identification accuracy exceeds 97% for the different behaviors, and the applicable scenarios are broad.
2. Compared with computing the head-pose Euler angles from the face key points by matrix operations, the method predicts the Euler angles with a branch of the face key point model, so the face key points and the Euler angles are obtained simultaneously from one model. This avoids predicting the face key points and the Euler angles with two separate models, greatly improves the detection speed, and broadens the application scenarios for real-time detection of driver behavior.
3. Considering the normalization of the epidemic situation, a random-erasing data enhancement method improves the accuracy of face key point calibration when a large area of the face is occluded, for example when a mask is worn.
4. The adaptive image brightness enhancement algorithm effectively improves the accuracy of identifying the driver's unsafe behaviors in dark scenes.
5. The feasibility of applying the ResNet50 model to the detection and classification of the driver's unsafe behaviors is verified through transfer learning; with this approach, model training can be completed in a short time, the method is easily extended to other unsafe driving behaviors, and the recognition accuracy of the model is high.
The method combines spatial-attention-based ResNet50 with face key point detection, so that the detection of the driver's unsafe behaviors is covered by the dual model; the detection speed is improved by adopting an inertia evaluation mode, and the real-time requirement is met.
Drawings
FIG. 1 is a flowchart illustrating an unsafe driving behavior recognition method based on facial feature points according to the present application;
FIG. 2 is a schematic diagram of a bottleneck block structure incorporating a spatial attention mechanism according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a structure of a face key point and euler angle joint detection network in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, a method for identifying unsafe driving behavior based on human face feature points is provided, including:
and step S1, obtaining the image collected by the camera in the vehicle, and carrying out image brightness self-adaptive enhancement processing to obtain the processed image with enhanced brightness.
In this embodiment, a camera is installed in the vehicle to collect images of the cab. The images generally contain the driver's face and can be used to identify behaviors such as smoking, making a phone call, playing with a mobile phone, turning back to take articles, looking in the mirror to comb hair, drinking water, eating and safe driving, as well as the driver's face key point features and head Euler angle information.
In the embodiment, the acquired image is preprocessed by an image brightness adaptive algorithm. The image brightness self-adaptive enhancement preprocessing flow is as follows:
after obtaining the image collected by the camera in the vehicle, calculating the brightness of the image;
if the image brightness is smaller than the first threshold value, enhancing the image brightness, otherwise, not enhancing the image brightness;
when image brightness enhancement is performed, the image histogram is computed, and the abnormal-value quantiles at the two ends of the histogram are recorded;
removing the values outside the quantile interval and stretching the quantile interval to (0, 255);
And finally obtaining the processed image with enhanced brightness.
Specifically, when image brightness enhancement is performed, the image histogram is computed first, and the abnormal-value quantiles at the two ends of the histogram are recorded according to the histogram distribution, namely a cutoff for overly dark values and a cutoff for overly bright values. Then the values outside the quantile interval, i.e., the parts below the dark cutoff or above the bright cutoff, are removed, and the remaining interval is mapped linearly onto (0, 255), completing the brightness enhancement.
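The preprocessing above can be summarized in a few lines. The following is a minimal sketch assuming OpenCV and NumPy; the mean-gray trigger value of 60 and the 1st/99th percentile cutoffs are illustrative assumptions, since the embodiment does not fix these numbers.

    import cv2
    import numpy as np

    # Assumed values: the text does not specify the brightness threshold or the
    # exact quantiles, so a mean gray level of 60 and the 1st/99th percentiles
    # are used here for illustration.
    DARK_THRESHOLD = 60.0
    LOW_Q, HIGH_Q = 1.0, 99.0

    def enhance_brightness(bgr: np.ndarray) -> np.ndarray:
        """Adaptive brightness enhancement by percentile stretching."""
        gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
        if gray.mean() >= DARK_THRESHOLD:        # bright enough: leave unchanged
            return bgr
        # Record the abnormal-value quantiles at both ends of the histogram.
        lo, hi = np.percentile(gray, [LOW_Q, HIGH_Q])
        # Remove values outside the quantile interval and stretch it to (0, 255).
        stretched = (bgr.astype(np.float32) - lo) * (255.0 / max(hi - lo, 1.0))
        return np.clip(stretched, 0, 255).astype(np.uint8)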
Step S2, inputting the brightness-enhanced image into a ResNet50 model integrated with a spatial attention mechanism, and judging the driver's unsafe driving behaviors. The model integrates the spatial attention mechanism into each bottleneck block of the residual network ResNet50.
This embodiment adopts the ResNet50 model with the integrated spatial attention mechanism to judge the driver's unsafe driving behaviors; the spatial attention mechanism is incorporated into each bottleneck block of the residual network ResNet50.
The bottleneck block integrated with the spatial attention mechanism is shown in Fig. 2. It comprises a residual layer; the first feature output by the residual layer passes sequentially through a global pooling layer, a fully connected layer, a linear rectification function and another fully connected layer; the second feature thus output is matrix-multiplied with the first feature output by the residual layer; and the third feature output after the matrix multiplication is combined with the input feature of the residual layer through the skip connection to serve as the final output feature of the bottleneck block.
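For concreteness, here is a PyTorch sketch of such a bottleneck block. It is a sketch under stated assumptions: the reduction ratio r=16 and the sigmoid gate after the second fully connected layer are common choices borrowed from the attention literature, not values given in this description.

    import torch
    import torch.nn as nn

    class AttentionBottleneck(nn.Module):
        # Bottleneck block whose output feature is re-weighted by an attention
        # branch: global pooling -> FC -> ReLU -> FC, multiplied back onto the
        # residual-layer output, then joined with the block input.
        def __init__(self, in_ch: int, mid_ch: int, r: int = 16):
            super().__init__()
            out_ch = mid_ch * 4
            self.residual = nn.Sequential(  # "residual layer" (1x1-3x3-1x1 stack)
                nn.Conv2d(in_ch, mid_ch, 1, bias=False), nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
                nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False), nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
                nn.Conv2d(mid_ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
            )
            self.pool = nn.AdaptiveAvgPool2d(1)   # global pooling layer
            self.fc = nn.Sequential(              # FC -> ReLU -> FC (sigmoid gate is an assumption)
                nn.Linear(out_ch, out_ch // r), nn.ReLU(inplace=True),
                nn.Linear(out_ch // r, out_ch), nn.Sigmoid(),
            )
            self.shortcut = (nn.Conv2d(in_ch, out_ch, 1, bias=False)
                             if in_ch != out_ch else nn.Identity())

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            first = self.residual(x)                        # first feature
            w = self.fc(self.pool(first).flatten(1))        # second feature
            third = first * w.view(w.size(0), -1, 1, 1)     # multiply with first feature
            return torch.relu(third + self.shortcut(x))     # join with block input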
In training the model of ResNet50, which incorporates the spatial attention mechanism, the process is as follows:
fixing the camera at a position in the vehicle and collecting an unsafe-driving-behavior data set covering different vehicle types, age groups and genders, including eight behaviors: smoking, making a phone call, playing with a mobile phone, turning back to take articles, looking in the mirror to comb hair, drinking water, eating, and safe driving;
in order to ensure that the data have the same size and consistent format, the picture needs to be preprocessed, wherein the preprocessing comprises preprocessing methods such as image size definition, image sample normalization, image histogram processing and the like;
enhancing the preprocessed data set through data enhancement, including random erasing, random rotation, horizontal flipping, horizontal shifting, and random cropping and scaling, which improves the robustness of the model;
taking the processed images as the model data set and dividing it into a training set, a validation set and a test set in a 7:1:2 ratio;
configuring and adjusting various parameters of the model;
training and verifying the network model by taking a training sample and a verification sample as input;
calculating the cross-entropy loss between the true values and the predicted values with the categorical cross-entropy function, where n is the number of samples, true is the label of the real sample, and pred is the predicted label, with the specific formula:
Loss = −(1/n) Σᵢ trueᵢ · log(predᵢ)
defining the optimization function using the AdamW optimizer, which improves the speed and precision of model training, where λ is the per-step weight decay rate, d is the learning rate, gₜ is the gradient for batch t, and m̂ₜ and v̂ₜ are the bias-corrected first- and second-moment estimates of the gradient:
gₜ = ∇fₜ(θₜ₋₁)
θₜ = θₜ₋₁ − d · (m̂ₜ / (√v̂ₜ + ε) + λθₜ₋₁)
setting model training parameters, setting the image resolution to be 224 multiplied by 224, setting the initial learning rate to be 0.0001, setting the dynamic change learning rate, and setting the number of training rounds to be 100;
training the network model; when the loss function no longer decreases and the accuracy no longer increases, the trained model is obtained and used on the test samples, finally completing the training of the model.
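For orientation, a minimal PyTorch training-configuration sketch follows. A plain torchvision ResNet50 stands in for the attention-augmented model, and the cosine schedule is just one way to realize the dynamically changing learning rate mentioned above; neither substitution comes from the patent text.

    import torch.nn as nn
    import torch.optim as optim
    from torchvision.models import resnet50

    # Stand-in model: in the described method, the bottleneck blocks would
    # additionally carry the attention branch sketched earlier.
    model = resnet50(num_classes=8)                        # eight behavior classes

    criterion = nn.CrossEntropyLoss()                      # categorical cross-entropy
    optimizer = optim.AdamW(model.parameters(), lr=1e-4)   # initial learning rate 0.0001
    # One possible "dynamically changing" schedule over the 100 training rounds.
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)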
Step S3, inputting the image with enhanced brightness into a face key point and Euler angle joint detection network, detecting to obtain face key point information and head Euler angle information, wherein the face key point and Euler angle joint detection network comprises a face key point detection network PFLD for detecting the face key point information, the output of the backbone network of the face key point detection network PFLD is also connected with a head Euler angle detection network for detecting the head Euler angle information, and the head Euler angle detection network is an auxiliary sub-network of the face key point detection network PFLD.
In the embodiment, the face key point information is detected by adopting a face key point detection network PFLD, and the head Euler angle information is detected by extracting a PFLD auxiliary sub-network.
The face key point and euler angle joint detection network is shown in fig. 3, the upper half part is a face key point detection network PFLD, and the lower half part is a head euler angle detection network.
In the face key point detection network PFLD of this embodiment, the input image is the brightness-enhanced image described above and contains the driver's face. The input image first passes through a mobile neural network (MobileNetV2, the backbone network of PFLD), is then fed into a multi-scale feature fusion module and a fully connected layer, and finally the face key point information is output.
The head Euler angle information is detected with the PFLD auxiliary sub-network, which comprises four convolutional layers and one fully connected layer. The input of the auxiliary sub-network is the output of the mobile backbone network, and it finally outputs the head Euler angle information.
It should be noted that training the face key point detection network PFLD requires the PFLD auxiliary sub-network for auxiliary training; this embodiment skillfully reuses that auxiliary sub-network as the head Euler angle detection network to obtain the head Euler angle information, i.e., the head-pose Euler angles. In the original PFLD, the auxiliary sub-network only helps the loss converge: it is used during training but does not participate in prediction. This method, however, needs the head Euler angles. To avoid computing them from the key points through complex matrix operations, or introducing a new model to predict them, the network structure is modified so that the auxiliary sub-network is used during both training and testing, forming a two-stream network: one stream performs key point detection and the other predicts the head Euler angles. Because the auxiliary sub-network is needed anyway when training PFLD, this network structure also effectively reduces the computation required for training.
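A structural sketch of this two-stream arrangement is given below, under stated assumptions: the backbone is passed in as a module, and the feature width, the key point count (98 follows the WFLW convention) and the auxiliary-stream layer widths are illustrative, since the text fixes only "four convolutional layers and one fully connected layer".

    import torch
    import torch.nn as nn

    class JointPFLD(nn.Module):
        # Two-stream network: one stream regresses face key points, the other
        # (the former auxiliary sub-network) predicts the head Euler angles,
        # and both streams are kept at inference time.
        def __init__(self, backbone: nn.Module, feat_ch: int = 64, n_points: int = 98):
            super().__init__()
            self.backbone = backbone                 # MobileNetV2-style trunk
            self.point_head = nn.Sequential(         # key point stream
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(feat_ch, n_points * 2),
            )
            self.euler_head = nn.Sequential(         # four conv layers + one FC layer
                nn.Conv2d(feat_ch, 128, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(128, 32, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 128, 3, padding=1), nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(128, 3),                   # yaw, pitch, roll
            )

        def forward(self, x: torch.Tensor):
            feat = self.backbone(x)
            return self.point_head(feat), self.euler_head(feat)

    # Example with a trivial stand-in backbone (a real system would use MobileNetV2).
    backbone = nn.Sequential(nn.Conv2d(3, 64, 3, stride=4, padding=1), nn.ReLU(inplace=True))
    net = JointPFLD(backbone, feat_ch=64)
    points, euler = net(torch.randn(1, 3, 112, 112))   # shapes: (1, 196) and (1, 3)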
The process of training the face key point and Euler angle joint detection network is as follows:
integrating human face attributes into WFLW, 300W _ LP and LAPA, and labeling Euler angle labels on the human face attributes through PRNet;
carrying out image preprocessing on the acquired data set;
carrying out data enhancement operation on the preprocessed image;
taking the processed images as the model data set and dividing it into a training set, a validation set and a test set in a 7:1:2 ratio;
configuring and adjusting various parameters of the model;
training and verifying the network model by taking a training sample and a verification sample as input;
the method comprises the following steps of defining a Loss function, wherein the Loss function consists of two Loss functions of face key point prediction and head Euler angle, adopting Wing Loss as a Loss function of the face key point prediction to improve the network convergence speed, adopting an L2 Loss function as a Loss function of the head Euler angle prediction, and adopting the following specific calculation formula:
Loss(x,y)=wing(x)+L2LossFunction(y)
wherein x represents the absolute difference between the predicted face key points and the Ground Truth, and y represents the absolute difference between the predicted head Euler angles and the Ground Truth.
The loss function wing (x) is specifically formulated as follows:
wing(x) = ω ln(1 + |x|/ε), if |x| < ω
wing(x) = |x| − C, otherwise
wherein x denotes the distance between a predicted key point and the Ground Truth, ω is a non-negative number that limits the nonlinear part to the interval (−ω, ω), ε is a constant that constrains the curvature of the curve and improves training stability, and C is a constant, C = ω − ω ln(1 + ω/ε), that smoothly joins the piecewise-defined linear and nonlinear parts.
The loss function L2LossFunction(y) is specifically formulated as follows:
L2LossFunction(y) = (y_true − y_predicted)²
wherein y_true is the true head Euler angle and y_predicted is the predicted head Euler angle.
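A PyTorch sketch of this joint loss follows. The ω=10 and ε=2 defaults come from the Wing Loss paper and are assumptions here, since this text gives no numeric values.

    import torch

    def wing_loss(pred_pts: torch.Tensor, gt_pts: torch.Tensor,
                  omega: float = 10.0, epsilon: float = 2.0) -> torch.Tensor:
        # Wing Loss on the key point coordinates.
        x = (pred_pts - gt_pts).abs()
        C = omega - omega * torch.log(torch.tensor(1.0 + omega / epsilon))
        return torch.where(x < omega,
                           omega * torch.log(1.0 + x / epsilon),
                           x - C).mean()

    def joint_loss(pred_pts, gt_pts, pred_euler, gt_euler) -> torch.Tensor:
        # Loss(x, y) = wing(x) + L2LossFunction(y)
        l2 = ((pred_euler - gt_euler) ** 2).sum(dim=-1).mean()
        return wing_loss(pred_pts, gt_pts) + l2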
Setting model training parameters, setting the image resolution to be 224 multiplied by 224, setting the initial learning rate to be 0.0001, setting the dynamic change learning rate, and setting the number of training rounds to be 250;
and training the network model, obtaining a trained model when the loss rate function is not reduced and the accuracy rate function is not increased, and using the obtained model for testing a test sample to finish the training of the network.
And step S4, judging whether fatigue driving occurs to the driver according to the face key point information and the head Euler angle information.
Specifically, whether fatigue driving occurs to a driver is judged according to face key point information and head euler angle information, and the method comprises the following steps:
example 1: and calculating the length-width ratio of the mouth part to the eyes by the key points of the face of the driver, and comparing the length-width ratio with a threshold value to judge whether the eyes of the driver are closed or yawned.
Wherein, the process of judging whether to close the eyes is as follows:
determining 6 key point information of the eyes through the face key point information, wherein the key point information comprises a left eye corner key point, an upper left key point, an upper right key point, a right eye corner key point, a lower right key point and a lower left key point, which are respectively marked as P1~P6;
Calculating the eye aspect ratio EAR, wherein the specific calculation formula is as follows:
EAR = (‖P2 − P6‖ + ‖P3 − P5‖) / (2‖P1 − P4‖)
wherein ‖·‖ denotes the Euclidean distance between two key points.
The eye aspect ratio thresholds under the eye-closed and eye-open conditions are determined by experimental statistics, and whether the driver keeps the eyes closed for a long time is judged by comparison with the threshold: if the eye aspect ratio EAR is smaller than the set second threshold, the driver is judged to be in the eye-closing state and a fatigue alarm is raised; otherwise, the driver is judged not to be in the eye-closing state and no fatigue alarm is raised.
The process of judging whether yawning is performed is as follows:
determining 6 key point information of the mouth through the face key point information, wherein the key point information comprises a left mouth corner key point, an upper left key point, an upper right key point, a right mouth corner key point, a lower right key point and a lower left key point, which are respectively marked as Q1~Q6;
Calculating the mouth aspect ratio MAR, wherein the specific calculation formula is as follows:
MAR = (‖Q2 − Q6‖ + ‖Q3 − Q5‖) / (2‖Q1 − Q4‖)
wherein ‖·‖ denotes the Euclidean distance between two key points.
The mouth aspect ratio thresholds under the yawning, speaking and mouth-closed conditions are determined by experimental statistics, and whether the driver yawns for a long time is judged by comparison with the threshold: if the mouth aspect ratio MAR exceeds the set third threshold, a yawning state is judged to occur and a fatigue alarm is raised; otherwise, no fatigue alarm is raised.
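Both ratios reduce to the same six-point computation, sketched below in NumPy with the point ordering used above. The 0.2 and 0.6 thresholds are illustrative stand-ins for the second and third thresholds, which the text leaves to experimental statistics.

    import numpy as np

    def aspect_ratio(p: np.ndarray) -> float:
        # Six points ordered as in the text: p[0] left corner, p[1] upper-left,
        # p[2] upper-right, p[3] right corner, p[4] lower-right, p[5] lower-left.
        vertical = np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])
        horizontal = 2.0 * np.linalg.norm(p[0] - p[3])
        return vertical / horizontal

    EAR_THRESHOLD = 0.2   # assumed; EAR below this -> eyes closed
    MAR_THRESHOLD = 0.6   # assumed; MAR above this -> yawning

    def is_eye_closed(eye_pts) -> bool:
        return aspect_ratio(np.asarray(eye_pts, dtype=float)) < EAR_THRESHOLD

    def is_yawning(mouth_pts) -> bool:
        return aspect_ratio(np.asarray(mouth_pts, dtype=float)) > MAR_THRESHOLD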
Example 2, judging the long-time head lowering behavior of the driver according to the Euler angle of the head;
namely, whether the Euler angle of the head exceeds a set fourth threshold value is judged, if so, the driver is judged to have head lowering behavior, fatigue alarm is carried out, and otherwise, fatigue alarm is not carried out.
Furthermore, the fatigue judgment is made according to duration or to the number of consecutive occurrences. For example, when the head-lowering behavior is judged to last continuously for longer than a fifth threshold, long-time head lowering is considered to occur and a fatigue alarm is raised; otherwise, no fatigue alarm is raised. Alternatively, a fatigue alarm is raised when the number of consecutive head-lowering judgments exceeds a sixth threshold; otherwise, no fatigue alarm is raised.
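A duration/count debounce of this kind can be sketched as follows; the 3-second duration and 5-judgment count are illustrative stand-ins for the fifth and sixth thresholds, which the text does not fix numerically.

    import time

    class FatigueJudge:
        # Raises a fatigue alarm when a behavior (e.g. head lowering) persists
        # past a duration threshold or repeats past a count threshold.
        def __init__(self, duration_s: float = 3.0, max_count: int = 5):
            self.duration_s = duration_s
            self.max_count = max_count
            self.start = None
            self.count = 0

        def update(self, behavior_active: bool) -> bool:
            if not behavior_active:
                self.start, self.count = None, 0
                return False
            if self.start is None:
                self.start = time.monotonic()
            self.count += 1
            return (time.monotonic() - self.start >= self.duration_s
                    or self.count >= self.max_count)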
The above embodiments express only several implementations of the present application; their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (6)

1. The method for identifying the unsafe driving behavior based on the face characteristic points is characterized by comprising the following steps of:
acquiring an image acquired by a camera in the vehicle, and performing image brightness adaptive enhancement processing to obtain a processed brightness enhanced image;
inputting the brightness-enhanced image into a ResNet50 model integrated with a spatial attention mechanism, and judging the driver's unsafe driving behaviors, wherein the model integrates the spatial attention mechanism into each bottleneck block of the residual network ResNet50; each bottleneck block comprises a residual layer, a first feature output by the residual layer passes sequentially through a global pooling layer, a fully connected layer, a linear rectification function and another fully connected layer, the second feature thus output is matrix-multiplied with the first feature output by the residual layer, and the third feature output after the matrix multiplication is combined with the input feature of the residual layer through the skip connection to serve as the final output feature of the bottleneck block;
inputting the image with enhanced brightness into a face key point and Euler angle joint detection network, and detecting to obtain face key point information and head Euler angle information, wherein the face key point and Euler angle joint detection network comprises a face key point detection network PFLD for detecting the face key point information, the output of a backbone network of the face key point detection network PFLD is also connected with a head Euler angle detection network for detecting the head Euler angle information, and the head Euler angle detection network is an auxiliary sub-network of the face key point detection network PFLD;
and judging whether the driver has fatigue driving or not according to the face key point information and the head Euler angle information.
2. The method for identifying unsafe driving behavior based on human face characteristic points according to claim 1, wherein the acquiring of the image collected by the in-vehicle camera and the image brightness adaptive enhancement processing comprise:
after an image collected by an in-vehicle camera is obtained, calculating the brightness of the image;
if the image brightness is smaller than the first threshold value, enhancing the image brightness, otherwise, not enhancing the image brightness;
when image brightness enhancement is performed, the image histogram is computed, and the abnormal-value quantiles at the two ends of the histogram are recorded;
removing the values outside the quantile range, and stretching the quantile range to (0,255);
a processed brightness enhanced image is obtained.
3. The method of claim 1, wherein the loss function of the face keypoint and euler angle joint detection network is as follows:
Loss(x,y)=wing(x)+L2LossFunction(y)
wherein, x represents the absolute value difference between the key point of the predicted face and the true value, and y represents the absolute value difference between the predicted Euler angle of the head and the true value;
wing(x) = ω ln(1 + |x|/ε), if |x| < ω
wing(x) = |x| − C, otherwise
wherein ω is a non-negative number that limits the nonlinear part to the interval (−ω, ω), ε is a constant that constrains the curvature of the curve and improves training stability, and C is a constant, C = ω − ω ln(1 + ω/ε), that smoothly joins the piecewise-defined linear and nonlinear parts.
4. The method for identifying unsafe driving behavior based on human face characteristic points according to claim 1, wherein the determining whether fatigue driving occurs to the driver according to the human face key point information and the head euler angle information comprises:
determining 6 key point information of the eyes through the face key point information, wherein the key point information comprises a left eye corner key point, an upper left key point, an upper right key point, a right eye corner key point, a lower right key point and a lower left key point, which are respectively marked as P1~P6;
Calculating the eye aspect ratio EAR, wherein the specific calculation formula is as follows:
EAR = (‖P2 − P6‖ + ‖P3 − P5‖) / (2‖P1 − P4‖)
wherein ‖·‖ denotes the Euclidean distance between two key points;
and if the eye aspect ratio EAR falls below a set second threshold value, judging that an eye-closing state occurs and performing a fatigue alarm; otherwise, not performing a fatigue alarm.
5. The method for identifying unsafe driving behavior based on human face characteristic points according to claim 1, wherein the determining whether fatigue driving occurs to the driver according to the human face key point information and the head euler angle information comprises:
by a personDetermining 6 key point information of the mouth by the face key point information, wherein the key point information comprises a left corner key point, an upper left key point, an upper right key point, a right corner key point, a lower right key point and a lower left key point which are respectively marked as Q 1 ~Q 6
Calculating the mouth aspect ratio MAR, wherein the specific calculation formula is as follows:
MAR = (‖Q2 − Q6‖ + ‖Q3 − Q5‖) / (2‖Q1 − Q4‖)
wherein ‖·‖ denotes the Euclidean distance between two key points;
and if the mouth aspect ratio MAR exceeds a set third threshold value, judging that a yawning state occurs and performing a fatigue alarm; otherwise, not performing a fatigue alarm.
6. The method for identifying unsafe driving behavior based on human face characteristic points according to claim 1, wherein the determining whether fatigue driving occurs to the driver according to the human face key point information and the head euler angle information comprises:
and judging whether the Euler angle of the head exceeds a set fourth threshold, if so, judging that the head-lowering behavior of the driver occurs, and giving a fatigue alarm, otherwise, not giving the fatigue alarm.
CN202210651680.8A 2022-06-09 2022-06-09 Unsafe driving behavior identification method based on face characteristic points Pending CN114973214A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210651680.8A CN114973214A (en) 2022-06-09 2022-06-09 Unsafe driving behavior identification method based on face characteristic points

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210651680.8A CN114973214A (en) 2022-06-09 2022-06-09 Unsafe driving behavior identification method based on face characteristic points

Publications (1)

Publication Number Publication Date
CN114973214A true CN114973214A (en) 2022-08-30

Family

ID=82962354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210651680.8A Pending CN114973214A (en) 2022-06-09 2022-06-09 Unsafe driving behavior identification method based on face characteristic points

Country Status (1)

Country Link
CN (1) CN114973214A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117789181A (en) * 2024-02-27 2024-03-29 暨南大学 Driving fatigue detection method and system based on lightweight neural network image enhancement


Similar Documents

Publication Publication Date Title
CN108960065B (en) Driving behavior detection method based on vision
CN109726771B (en) Abnormal driving detection model building method, device and storage medium
CN108446678B (en) Dangerous driving behavior identification method based on skeletal features
CN108309311A (en) A kind of real-time doze of train driver sleeps detection device and detection algorithm
CN108596087B (en) Driving fatigue degree detection regression model based on double-network result
CN111753674A (en) Fatigue driving detection and identification method based on deep learning
CN112528843A (en) Motor vehicle driver fatigue detection method fusing facial features
CN107563346A (en) One kind realizes that driver fatigue sentences method for distinguishing based on eye image processing
CN111563468B (en) Driver abnormal behavior detection method based on attention of neural network
CN113743471A (en) Driving evaluation method and system
CN114973214A (en) Unsafe driving behavior identification method based on face characteristic points
CN110264670A (en) Based on passenger stock tired driver driving condition analytical equipment
CN108108651B (en) Method and system for detecting driver non-attentive driving based on video face analysis
CN115937830A (en) Special vehicle-oriented driver fatigue detection method
CN110232327B (en) Driving fatigue detection method based on trapezoid cascade convolution neural network
CN114220158A (en) Fatigue driving detection method based on deep learning
CN112241647B (en) Dangerous driving behavior early warning device and method based on depth camera
CN116012822B (en) Fatigue driving identification method and device and electronic equipment
CN109770922B (en) Embedded fatigue detection system and method
CN116935361A (en) Deep learning-based driver distraction behavior detection method
CN108960181B (en) Black smoke vehicle detection method based on multi-scale block LBP and hidden Markov model
CN115937829A (en) Method for detecting abnormal behaviors of operators in crane cab
CN115861982A (en) Real-time driving fatigue detection method and system based on monitoring camera
Wei et al. Driver sleepiness detection algorithm based on relevance vector machine
CN112329566A (en) Visual perception system for accurately perceiving head movements of motor vehicle driver

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination