CN114792437A - Method and system for analyzing safe driving behavior based on facial features - Google Patents


Info

Publication number
CN114792437A
CN114792437A (application CN202210041064.0A)
Authority
CN
China
Prior art keywords
face
frame
prediction
confidence
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210041064.0A
Other languages
Chinese (zh)
Inventor
王曦
万磊
王俊
王曼
杜超
韩飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Cbpm & Xinda Banking Technology Co ltd
Original Assignee
Shenzhen Cbpm & Xinda Banking Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Cbpm & Xinda Banking Technology Co ltd filed Critical Shenzhen Cbpm & Xinda Banking Technology Co ltd
Priority to CN202210041064.0A priority Critical patent/CN114792437A/en
Publication of CN114792437A publication Critical patent/CN114792437A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning

Abstract

The invention discloses a method and system for analyzing safe driving behavior based on facial features, belonging to the technical field of safe driving. A face detection module locates the face of the candidate target driver; based on the obtained face image, a facial key point detection module then locates the positions of the 3D key points of the face; the results are sent to a behavior detection module and a fatigue detection module according to the different located regions; finally, these detection modules recognize the driver's behaviors and classify the driver's fatigue. The purpose is to analyze driving behavior while guaranteeing both real-time performance and accuracy.

Description

Method and system for analyzing safe driving behavior based on facial features
Technical Field
The invention belongs to the technical field of safe driving, and particularly relates to a safe driving behavior analysis method and system based on facial features.
Background
With the rapid development of the economy, automobiles have become widely popularized. On many long-distance trips drivers must drive for hours or even overnight and easily fall into a fatigue state; when a driver also takes phone calls or smokes while driving, distraction becomes even more likely and traffic accidents easily follow. It is therefore necessary to monitor the driver's behavior and fatigue state and to issue reminders.
In the prior art, the driver's driving behavior is monitored by a monitoring device: when the driver performs behaviors that interfere with driving (such as smoking or making a phone call) or is in a fatigue state (such as incessant blinking or yawning) during driving, the system must detect and analyze the driver's current behavior or degree of fatigue in time and then issue a corresponding warning to the driver.
The prior art has the following problems:
existing driving behavior analysis can recognize some basic behaviors, but as the image background becomes complex, the recognized object is often not well separated from the background and recognition fails. Although deep learning methods can improve the accuracy of action recognition, the network scale of a deep learning model is often too large, giving the model too many parameters and too high a computation cost, so real-time detection is difficult to achieve on ordinarily configured hardware. Meanwhile, because masks must be worn during an epidemic, facial key point detection is greatly affected.
Disclosure of Invention
Aiming at the problems of the prior art, namely that the recognized object cannot be well separated from an increasingly complex image background so that recognition often fails, that a deep learning model's excessive network scale, parameters and computation make real-time detection difficult on ordinarily configured hardware even though deep learning can improve the accuracy of action recognition, and that mask wearing during an epidemic greatly affects facial key point detection, the invention provides a safe driving behavior analysis method and system based on facial features, whose purpose is to analyze driving behavior while guaranteeing both real-time performance and accuracy.
In order to achieve this purpose, the invention adopts the following technical scheme: a method and system for safe driving behavior analysis based on facial features, comprising:
s1: performing pixel-wise face localization on face images of various scales by jointly supervised and self-supervised multi-task learning, to locate the face position of the candidate target driver;
s2: according to the recognized face image of the driver, calibrating the face image with a 3D facial key point detection module to obtain a number of facial key points and the head pose used to detect facial behavior features;
s3: extracting the facial key point information and head pose information obtained in S2 and combining them into a specific image feature vector; the image feature vector of each frame is input into a convolutional neural network, which outputs a state feature vector; the state feature vector is then input into a BiLSTM to judge the driving state of the driver in real time;
s4: first generating a series of coarse-grained candidate box information with an RPN, then classifying and regressing it to further obtain more accurate candidate box information, and applying a feature fusion operation to the target detection network to obtain the behavior detection result.
Preferably, S1 in the present invention specifically includes:
s1.1: scaling the face image to 300 × 300 pixels;
s1.2: inputting the scaled image into a convolutional neural network, extracting face features and feeding them into a feature pyramid to obtain a calibrated face box with feature points, and calculating the predicted values of the calibrated face box, including a classification prediction value, a bounding box regression value and a feature point regression value;
s1.3: calculating the intersection-over-union (IoU) between the calibrated face box and all preset candidate face boxes, and taking the candidate face box with the largest IoU as the effective face box;
s1.4: calculating the feature point loss, bounding box loss and classification loss of the calibrated face box from the calibrated face box and the selected effective face box.
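The IoU matching of step S1.3 can be sketched in plain Python. This is a minimal illustration, not the patent's implementation; the (x1, y1, x2, y2) box format and the function names are assumptions:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_effective_box(calibrated, candidates):
    """Pick the candidate face box with the largest IoU against the
    calibrated face box (step S1.3)."""
    return max(candidates, key=lambda c: iou(calibrated, c))
```

The candidate box with the maximum IoU is then treated as the effective face box and used in the loss computation of step S1.4.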
Preferably, S2 in the present invention specifically includes:
recognizing the driver's face image based on the effective face box, then obtaining a series of feature maps containing facial key points through model training, and finally outputting a number of facial key points based on the facial features (eyes, eyebrows, nose, mouth) and the face contour, together with the head pose, which comprises an azimuth (yaw) angle, a pitch angle and a roll angle;
the error between the predicted and true values is then calculated by a loss function, whose formula is:
\min_{w} \sum_{i} L\big(y_i, f(x_i; w)\big) + \Phi(w)

wherein the loss function L takes the L2 loss:

L\big(y_i, f(x_i; w)\big) = \lVert y_i - f(x_i; w) \rVert_2^2

where f(x_i; w) denotes the predicted positions of the facial features and face contour output by the network, y_i denotes their real positions, i.e. the true values, and \Phi(w) is a regularization term on the parameter w that limits the coefficients.
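The keypoint loss just described can be sketched in plain Python. The flattened coordinate lists and the squared-norm form chosen for Φ(w) are illustrative assumptions, not taken from the patent:

```python
def regularized_l2(pred, truth, w, lam=1e-4):
    """L2 data term over predicted vs. true key point coordinates,
    plus a simple squared-norm regularizer Phi(w) = lam * ||w||^2.

    pred, truth: flat lists of coordinates; w: model parameters."""
    data = sum((p - t) ** 2 for p, t in zip(pred, truth))
    reg = lam * sum(v * v for v in w)
    return data + reg
```

In practice the data term is summed over all key points of a training batch and minimized over w, as in the formula above.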
Preferably, in S3 of the present invention, the driving states include attentive driving, fatigued driving, and looking around (glancing left and right).
Preferably, the S4 of the present invention is specifically:
s4.1: first generating a series of coarse-grained candidate box information through an RPN, then classifying and regressing it, and applying a feature fusion operation to the target detection network; the loss function adopted by the whole target detection network is:

L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)

The loss is defined as a weighted sum of the position error and the confidence error, where the weight coefficient \alpha is set to 1 by cross-validation and N is the number of positive prior boxes; x_{ij}^{p} \in \{0, 1\} is an indicator parameter: x_{ij}^{p} = 1 means that the i-th prior box is matched to the j-th ground truth, whose class is p; c is the class confidence prediction value, l is the predicted position of the bounding box corresponding to the prior box, and g is the position parameter of the ground truth;
s4.2: the position error is defined by the following formula:

L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\!\left(l_i^m - \hat{g}_j^m\right)

where b = (cx, cy, w, h) denotes a box by its center coordinates, width and height, and the encoded ground truth offsets are

\hat{g}_j^{cx} = (g_j^{cx} - d_i^{cx}) / d_i^{w}, \quad \hat{g}_j^{cy} = (g_j^{cy} - d_i^{cy}) / d_i^{h}

\hat{g}_j^{w} = \log(g_j^{w} / d_i^{w}), \quad \hat{g}_j^{h} = \log(g_j^{h} / d_i^{h})

i denotes the index of an anchor box that is a positive sample (Pos) in each training batch, g denotes the ground truth box, and d denotes the prior (default) box;
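The box offset encoding used in the position error of S4.2, and its inverse used later when decoding prediction boxes, can be sketched as follows. A minimal illustration assuming (cx, cy, w, h) box tuples; the function names are not from the patent:

```python
import math

def encode(gt, prior):
    """Encode a ground-truth box against a prior box as offsets
    (the ĝ formulas of the position error)."""
    gcx, gcy, gw, gh = gt
    dcx, dcy, dw, dh = prior
    return ((gcx - dcx) / dw, (gcy - dcy) / dh,
            math.log(gw / dw), math.log(gh / dh))

def decode(offsets, prior):
    """Inverse of encode: recover the real box position from
    predicted offsets and the prior box."""
    ocx, ocy, ow, oh = offsets
    dcx, dcy, dw, dh = prior
    return (ocx * dw + dcx, ocy * dh + dcy,
            dw * math.exp(ow), dh * math.exp(oh))
```

Encoding the regression targets this way makes the offsets scale-invariant with respect to the prior box size.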
s4.3: the confidence error is defined by the following formula:

L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\hat{c}_i^{p} - \sum_{i \in Neg} \log\hat{c}_i^{0}, \qquad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}

where x_{ij}^{p} denotes the true class of each prediction box and \hat{c}_i^{p} is the prediction category (class probability) of the network;
s4.4: when predicting driving behavior, first determine the class and confidence value of each prediction box according to the class confidences and filter out the boxes belonging to the background; then filter out the prediction boxes whose confidence is below the set confidence threshold; decode the remaining prediction boxes and sort them in descending order of confidence, obtaining the real position parameters of each prediction box from its prior box; keep the top-k prediction boxes and filter overlapping boxes with the NMS algorithm; the remaining prediction boxes are the behavior detection result.
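The filtering pipeline of S4.4 can be sketched as a greedy non-maximum suppression. A minimal illustration assuming (x1, y1, x2, y2) boxes paired with confidence scores; the threshold defaults are illustrative, with top_k=400 following the value used in Example 1:

```python
def iou_xyxy(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(detections, conf_thresh=0.5, iou_thresh=0.45, top_k=400):
    """detections: list of (box, score). Filter by confidence, sort in
    descending score order, keep top_k, then greedily suppress boxes
    that overlap an already-kept box (step S4.4)."""
    dets = [d for d in detections if d[1] >= conf_thresh]
    dets.sort(key=lambda d: d[1], reverse=True)
    dets = dets[:top_k]
    kept = []
    for box, score in dets:
        if all(iou_xyxy(box, kb) < iou_thresh for kb, _ in kept):
            kept.append((box, score))
    return kept
```

The boxes surviving NMS are reported as the behavior detection result.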
The invention also provides a system for analyzing safe driving behavior based on facial features, which comprises:
a face detection module: performing pixel-wise face localization on face images of various scales by jointly supervised and self-supervised multi-task learning, to locate the face position of the candidate target driver;
face key point detection module: according to the recognized face image of the driver, a 3D face key point detection module is used for calibrating the face image to obtain a plurality of face key points and head postures for detecting the behavior characteristics of the face;
a fatigue detection module: extracting the facial key point information and head pose information obtained by the facial key point detection module and combining them into a specific image feature vector; the image feature vector of each frame is input into a convolutional neural network, which outputs a state feature vector; the state feature vector is then input into a BiLSTM to judge the driving state of the driver in real time;
a behavior detection module: first generating a series of coarse-grained candidate box information with an RPN, then classifying and regressing it to further obtain more accurate candidate box information, and applying a feature fusion operation to the target detection network to obtain the behavior detection result.
Preferably, the face detection module of the present invention specifically positions the face of the driver:
step 1: scaling the face image to 300 × 300 pixels;
step 2: inputting the scaled image into a convolutional neural network, extracting face features and feeding them into a feature pyramid to obtain a calibrated face box with feature points, and calculating the predicted values of the calibrated face box, including a classification prediction value, a bounding box regression value and a feature point regression value;
step 3: calculating the intersection-over-union (IoU) between the calibrated face box and all preset candidate face boxes, and taking the candidate face box with the largest IoU as the effective face box;
step 4: calculating the feature point loss, bounding box loss and classification loss of the calibrated face box from the calibrated face box and the selected effective face box.
Preferably, the face key point detection module of the present invention specifically comprises:
recognizing the driver's face image based on the effective face box, then obtaining a series of feature maps containing facial key points through model training, and finally outputting a number of facial key points based on the facial features and the face contour, together with the head pose, which comprises an azimuth (yaw) angle, a pitch angle and a roll angle;
the error between the predicted and true values is then calculated by a loss function, whose formula is:
\min_{w} \sum_{i} L\big(y_i, f(x_i; w)\big) + \Phi(w)

wherein the loss function L takes the L2 loss:

L\big(y_i, f(x_i; w)\big) = \lVert y_i - f(x_i; w) \rVert_2^2

where f(x_i; w) denotes the predicted positions of the facial features and face contour output by the network, y_i denotes their real positions, i.e. the true values, and \Phi(w) is a regularization term on the parameter w that limits the coefficients.
Preferably, in the fatigue detection module of the present invention, the driving states include attentive driving, fatigued driving, and looking around.
Preferably, the behavior detection module of the invention specifically detects the behavior of the driver as follows:
first generating a series of coarse-grained candidate box information through an RPN, then classifying and regressing it, and applying a feature fusion operation to the target detection network; the loss function adopted by the whole target detection network is:

L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)

The loss is defined as a weighted sum of the position error and the confidence error, where the weight coefficient \alpha is set to 1 by cross-validation and N is the number of positive prior boxes; x_{ij}^{p} \in \{0, 1\} is an indicator parameter: x_{ij}^{p} = 1 means that the i-th prior box is matched to the j-th ground truth, whose class is p; c is the class confidence prediction value, l is the predicted position of the bounding box corresponding to the prior box, and g is the position parameter of the ground truth;
the position error is defined by the following formula:

L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\!\left(l_i^m - \hat{g}_j^m\right)

where b = (cx, cy, w, h) denotes a box by its center coordinates, width and height, and the encoded ground truth offsets are

\hat{g}_j^{cx} = (g_j^{cx} - d_i^{cx}) / d_i^{w}, \quad \hat{g}_j^{cy} = (g_j^{cy} - d_i^{cy}) / d_i^{h}

\hat{g}_j^{w} = \log(g_j^{w} / d_i^{w}), \quad \hat{g}_j^{h} = \log(g_j^{h} / d_i^{h})

i denotes the index of an anchor box that is a positive sample (Pos) in each training batch, g denotes the ground truth box, and d denotes the prior (default) box;
the confidence error is defined by the following formula:

L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\hat{c}_i^{p} - \sum_{i \in Neg} \log\hat{c}_i^{0}, \qquad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}

where x_{ij}^{p} denotes the true class of each prediction box and \hat{c}_i^{p} is the prediction category (class probability) of the network;
when predicting driving behavior, first determine the class and confidence value of each prediction box according to the class confidences and filter out the boxes belonging to the background; then filter out the prediction boxes whose confidence is below the set confidence threshold; decode the remaining prediction boxes and sort them in descending order of confidence, obtaining the real position parameters of each prediction box from its prior box; keep the top-k prediction boxes and filter overlapping boxes with the NMS algorithm; the remaining prediction boxes are the behavior detection result.
Compared with the prior art, the technical scheme of the invention has the following advantages/beneficial effects:
1. The method disclosed by the invention uses 3D key point detection to make up for defects of 2D key point detection in practical application, such as low recognition accuracy and low liveness detection accuracy.
2. In terms of fatigue detection, the fatigue state can be effectively judged by combining the facial key point information with the pitch angle information of the head pose in a designed convolutional neural network.
3. In the behavior detection module, the invention uses an RPN to generate a series of coarse-grained candidate box information, then classifies and regresses it to further obtain more accurate box information, and applies a feature fusion operation to the target detection network, effectively improving the detection of small targets.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a schematic flow chart of example 1 of the present invention.
Fig. 2 is a schematic diagram showing the detection of the key points in embodiment 1 of the present invention.
Fig. 3 is a schematic diagram of a key point face in embodiment 1 of the present invention.
Fig. 4 is a flowchart of the fatigue detection of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention are clearly and completely described below, and it is obvious that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive efforts based on the embodiments of the present invention, are within the scope of protection of the present invention. Thus, the detailed description of the embodiments of the present invention provided below is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention.
Example 1:
as shown in fig. 1, fig. 2, fig. 3 and fig. 4, the present invention provides a method and a system for analyzing safe driving behavior based on facial features, comprising:
s1: performing pixel-wise face localization on face images of various scales by jointly supervised and self-supervised multi-task learning, to locate the face position of the candidate target driver; S1 specifically includes:
s1.1: scaling the face image to 300 × 300 pixels;
s1.2: inputting the scaled image into a convolutional neural network, extracting face features and feeding them into a feature pyramid to obtain a calibrated face box with feature points, and calculating the predicted values of the calibrated face box, including a classification prediction value, a bounding box regression value and a feature point regression value;
s1.3: calculating the intersection-over-union (IoU) between the calibrated face box and all preset candidate face boxes, and taking the candidate face box with the largest IoU as the effective face box;
s1.4: calculating the feature point loss, bounding box loss and classification loss of the calibrated face box from the calibrated face box and the selected effective face box.
S2: calibrating the recognized driver face image with the 3D facial key point detection module and, as shown in Fig. 2, obtaining 68 facial key points and the head pose used to detect facial behavior features; S2 specifically includes:
recognizing the driver's face image based on the effective face box, obtaining a series of feature maps containing facial key points through model training, and finally outputting 68 facial key points based on the facial features and face contour, together with the head pose, which comprises an azimuth (yaw) angle, a pitch angle and a roll angle. As shown in Fig. 3, several important key points are: nose tip 31, nose root 28, chin 9, outer corner of the left eye 37, inner corner of the left eye 40, inner corner of the right eye 43, outer corner of the right eye 46, mouth center 67, right corner of the mouth 55, left side of the face 1, and right side of the face 17.
The error between the predicted value and the true value is then calculated by a loss function, the formula of which is:
\min_{w} \sum_{i} L\big(y_i, f(x_i; w)\big) + \Phi(w)

wherein the loss function L takes the L2 loss:

L\big(y_i, f(x_i; w)\big) = \lVert y_i - f(x_i; w) \rVert_2^2

where f(x_i; w) denotes the predicted positions of the facial features and face contour output by the network, y_i denotes their real positions, i.e. the true values, and \Phi(w) is a regularization term on the parameter w that limits the coefficients.
The true values in Example 1 represent the real positions of the facial key points, namely the facial features and the face contour.
S3: extracting the 68 facial key points and the head pose information obtained in S2 and combining them into a 68 × 6 image feature vector; the image feature vector of each frame is input into a convolutional neural network, which outputs a state feature vector; the state feature vector is then input into a BiLSTM to judge the driver's driving state in real time. The driving states include attentive driving, fatigued driving, and looking around. When judging the driving state of a driver wearing a mask, Example 1 judges the driver's fatigue level from expressions such as incessant blinking or yawning.
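One plausible way to assemble the 68 × 6 per-frame feature vector of S3 is sketched below. The exact layout is not specified in the patent; here each key point row is assumed to hold its 3D coordinates plus the three head pose angles:

```python
def frame_features(keypoints_3d, head_pose):
    """keypoints_3d: 68 (x, y, z) tuples; head_pose: (yaw, pitch, roll).
    Returns a 68 x 6 matrix: each row combines one key point's
    coordinates with the frame's head pose angles."""
    assert len(keypoints_3d) == 68 and len(head_pose) == 3
    return [list(p) + list(head_pose) for p in keypoints_3d]
```

The resulting matrix for each frame would then be fed to the convolutional network, whose state feature vectors are passed on to the BiLSTM.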
S4: first generating a series of coarse-grained candidate box information with an RPN, then classifying and regressing it to further obtain more accurate candidate box information, and applying a feature fusion operation to the target detection network to obtain the behavior detection result. S4 specifically includes:
s4.1: first generating a series of coarse-grained candidate box information through an RPN, then classifying and regressing it, and applying a feature fusion operation to the target detection network, which effectively improves the detection of small targets; the loss function adopted by the whole target detection network is:

L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)

The loss is defined as a weighted sum of the position error and the confidence error, where the weight coefficient \alpha is set to 1 by cross-validation and N is the number of positive prior boxes; x_{ij}^{p} \in \{0, 1\} is an indicator parameter: x_{ij}^{p} = 1 means that the i-th prior box is matched to the j-th ground truth, whose class is p; c is the class confidence prediction value, l is the predicted position of the bounding box corresponding to the prior box, and g is the position parameter of the ground truth;
s4.2: the position error is defined by the following formula (the invention optimizes the position with the Smooth L1 loss):

L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\!\left(l_i^m - \hat{g}_j^m\right)

\hat{g}_j^{cx} = (g_j^{cx} - d_i^{cx}) / d_i^{w}, \quad \hat{g}_j^{cy} = (g_j^{cy} - d_i^{cy}) / d_i^{h}

\hat{g}_j^{w} = \log(g_j^{w} / d_i^{w}), \quad \hat{g}_j^{h} = \log(g_j^{h} / d_i^{h})

where i denotes the index of an anchor box that is a positive sample (Pos) in each training batch; each detected box is optimized by comparing its prediction with the true position b = (cx, cy, w, h), i.e. the center coordinates and the width and height of the box;
\mathrm{smooth}_{L1} is a piecewise function: for an input x in [-1, 1] it takes the L2 loss, which removes the non-differentiable break point of L1 at 0; outside [-1, 1] it takes the L1 loss, which avoids gradient explosion on outliers:

\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}
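The piecewise behavior described above is easy to verify in a one-line sketch (illustrative only):

```python
def smooth_l1(x):
    """Smooth L1: quadratic (L2-like) on [-1, 1], linear (L1-like)
    outside; the two branches meet continuously at |x| = 1."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5
```

Near zero the gradient shrinks with x (no kink), while for large residuals the gradient stays bounded at ±1, which is what suppresses outlier gradient explosion.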
s4.3: the confidence error is defined by the following formula (the classification categories are optimized with a cross-entropy loss):
Figure RE-GDA0003685071460000111
wherein, the cross entropy loss function optimizes each detection box i belonging to different positive samples (pos) and negative samples (Neg),
Figure RE-GDA0003685071460000112
representing the true category for each predictive detection block,
Figure RE-GDA0003685071460000113
is a prediction category of the network;
Figure RE-GDA0003685071460000114
wherein the forecast category score for the network is normalized to between [0,1] by sotmax;
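The softmax normalization of the class scores can be sketched as follows (a minimal stdlib version; subtracting the maximum score is a standard numerical-stability trick, not something stated in the patent):

```python
import math

def softmax(scores):
    """Normalize raw class scores into probabilities in [0, 1]
    that sum to 1."""
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The resulting per-class probabilities are the \hat{c}_i^{p} values fed into the cross-entropy confidence loss.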
s4.4: when predicting driving behavior, first determine the class and confidence value of each prediction box according to the class confidences and filter out the boxes belonging to the background; then filter out the prediction boxes below the set confidence threshold; decode the remaining prediction boxes and clip them after decoding to keep the box positions inside the picture; sort them in descending order of confidence, obtaining the real position parameters of each prediction box from its prior box; keep the top 400 prediction boxes and filter overlapping boxes with the NMS algorithm; the remaining prediction boxes are the behavior detection result.
The invention also provides a system for analyzing safe driving behavior based on facial features, which comprises:
a face detection module: performing pixel-wise face localization on face images of various scales by jointly supervised and self-supervised multi-task learning, to locate the face position of the candidate target driver; the face detection module locates the driver's face as follows:
step 1: scaling the face scale image to pixels 300 x 300;
step 2: then inputting the face scale image into a convolutional neural network, extracting face features and inputting the face features into a feature pyramid to obtain a calibrated face frame with feature points, and calculating a predicted value of the calibrated face frame, wherein the predicted value comprises a classification predicted value, a boundary frame regression value and a feature point regression value;
and step 3: calculating the interaction ratio of the calibrated face frame and all preset candidate face frames, and taking the candidate face frame with the maximum value in the interaction ratio as an effective face frame;
and 4, step 4: and calculating the loss of the characteristic points, the loss of the boundary frame and the classification loss of the calibrated face frame through the calibrated face frame and the selected effective face frame.
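The IoU matching in steps 3 and 4 can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation; the helper names `iou` and `best_match` and the (x1, y1, x2, y2) box format are assumptions made for the example.

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union between one box and an array of boxes (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def best_match(calibrated_box, candidate_boxes):
    """Index of the candidate box with the highest IoU (the 'effective' face box)."""
    return int(np.argmax(iou(calibrated_box, candidate_boxes)))

# Example: the second candidate overlaps the calibrated box the most.
cal = np.array([10., 10., 50., 50.])
cands = np.array([[60., 60., 90., 90.],
                  [12., 12., 52., 52.],
                  [0., 0., 20., 20.]])
print(best_match(cal, cands))  # 1
```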
a face key point detection module: calibrating the recognized driver face image with a 3D face key point detection module to obtain 68 face key points and the head pose used for detecting facial behavior features; the face key point detection module specifically comprises:
recognizing the driver's face image based on the effective face box, obtaining through model training a series of feature maps containing face key points, and finally outputting 68 face key points based on the facial features and face contour together with the head pose, wherein the head pose comprises an azimuth (yaw) angle, a pitch angle and a roll angle;
the error between the predicted values and the true values is then computed by a loss function, whose formula is:

$$\min_{w} \sum_{i=1}^{N} l\big(y_i, f(x_i; w)\big) + \Phi(w)$$

wherein, taking the L2 loss,

$$l\big(y_i, f(x_i; w)\big) = \big\|y_i - f(x_i; w)\big\|_2^2$$

where $l(y_i, f(x_i; w))$ is the loss function, $f(x_i; w)$ represents the network's predicted positions of the facial features and face contour, $y_i$ represents their real positions, i.e. the true values, and $\Phi(w)$ is a regularization term on the parameter $w$ that constrains the coefficients.
a fatigue detection module: as shown in fig. 4, the 68 face key points and the head pose information obtained by the face key point detection module are extracted and combined into a 68 x 6 image feature vector; the image feature vector of each frame is then input into a convolutional neural network, which outputs a state feature vector; the state feature vector is input into a BiLSTM, and the driving state of the driver is judged in real time; the driving states include attentive driving, fatigue driving, and looking left and right.
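One way to assemble the 68 x 6 per-frame feature described above is sketched below. The channel layout (each keypoint's 3D coordinates concatenated with the three pose angles repeated per row) is an assumption; the patent only states that the keypoints and head pose are combined into a 68 x 6 vector.

```python
import numpy as np

def frame_feature(keypoints3d, pose):
    """Combine 68 3D keypoints with the (yaw, pitch, roll) head pose into a 68x6 feature.
    keypoints3d: (68, 3); pose: (3,). The layout is illustrative only."""
    pose_rep = np.tile(pose, (keypoints3d.shape[0], 1))      # repeat the pose for every keypoint row
    return np.concatenate([keypoints3d, pose_rep], axis=1)   # shape (68, 6)

feat = frame_feature(np.zeros((68, 3)), np.array([0.1, -0.2, 0.0]))
print(feat.shape)  # (68, 6)
```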
a behavior detection module: firstly generating a series of coarse-grained candidate box information with an RPN (region proposal network), then classifying and regressing it to obtain more accurate candidate box information, and applying a feature fusion operation to the target detection network to obtain the behavior detection result. The behavior detection module detects the driver's behavior specifically as follows:
firstly, a series of coarse-grained candidate box information is generated by the RPN, the candidate box information is then classified and regressed, and a feature fusion operation is adopted in the target detection network; the loss function adopted by the whole target detection network is:
$$L(x, c, l, g) = \frac{1}{N}\Big(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\Big)$$

the loss function is defined as the weighted sum of the position error and the confidence error, wherein the weight coefficient $\alpha$ is set to 1 by cross validation and $N$ is the number of positive prior-box samples;

$x_{ij}^{p} \in \{0, 1\}$ is an indicator parameter: $x_{ij}^{p} = 1$ indicates that the $i$-th prior box is matched to the $j$-th ground truth, whose class is $p$; $c$ is the category confidence predicted value, $l$ is the predicted position of the bounding box corresponding to the prior box, and $g$ is the position parameter of the ground truth;
for the position error, the following formula is used:

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\big(l_i^{m} - \hat{g}_j^{m}\big)$$

wherein $b = \{cx, cy, w, h\}$ are the box parameters and the encoded offsets are

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \qquad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \qquad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \qquad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}$$

$i$ denotes the index of each anchor box that is a positive sample (pos) in the training batch, $g$ denotes the ground-truth box, and $d$ denotes the prior (default) box;
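The offset encoding $\hat{g}$ above, together with the decoding used later at prediction time, can be sketched as an invertible pair. A NumPy sketch with boxes in (cx, cy, w, h) form; the function names are illustrative, not from the patent.

```python
import numpy as np

def encode(g, d):
    """Offsets g_hat of a ground-truth box g with respect to a prior box d (cx, cy, w, h)."""
    return np.array([(g[0] - d[0]) / d[2],
                     (g[1] - d[1]) / d[3],
                     np.log(g[2] / d[2]),
                     np.log(g[3] / d[3])])

def decode(l, d):
    """Invert the encoding: recover a box from predicted offsets l and prior box d."""
    return np.array([l[0] * d[2] + d[0],
                     l[1] * d[3] + d[1],
                     d[2] * np.exp(l[2]),
                     d[3] * np.exp(l[3])])

g = np.array([0.5, 0.5, 0.2, 0.3])
d = np.array([0.4, 0.4, 0.25, 0.25])
assert np.allclose(decode(encode(g, d), d), g)  # the round trip recovers the box
```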
for the confidence error, the softmax cross-entropy form is used:

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\big(\hat{c}_i^{p}\big) - \sum_{i \in Neg} \log\big(\hat{c}_i^{0}\big)$$

$$\hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p}\exp(c_i^{p})}$$

wherein $x_{ij}^{p}$ indicates the true category of each predicted detection box and $\hat{c}_i^{p}$ is the network's predicted probability for category $p$;
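A minimal sketch of the softmax confidence error: positive boxes are scored against their matched class and negative boxes against the background class 0. The function names and input shapes are illustrative.

```python
import numpy as np

def softmax(c):
    e = np.exp(c - c.max(axis=-1, keepdims=True))  # subtract the max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def conf_loss(scores_pos, cls_pos, scores_neg):
    """Softmax cross-entropy confidence error: positives against their true class,
    negatives against class 0 (background). scores_*: (n, num_classes); cls_pos: (n,)."""
    c_pos = softmax(scores_pos)
    c_neg = softmax(scores_neg)
    pos = -np.log(c_pos[np.arange(len(cls_pos)), cls_pos]).sum()
    neg = -np.log(c_neg[:, 0]).sum()
    return pos + neg
```

With uniform scores over 4 classes, each box contributes $-\log(1/4) = \log 4$ to the loss.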
when performing driving behavior prediction, first determine the category and confidence value of the driving behavior from the category confidence and filter out prediction boxes belonging to the background; then filter out prediction boxes below the set confidence threshold; decode the remaining prediction boxes and clip them after decoding so that the box positions do not exceed the picture boundary; sort the boxes in descending order of confidence value, obtain the real position parameters of each prediction box from the prior boxes, keep the top 400 prediction boxes, and filter out overlapping prediction boxes with the NMS algorithm; the remaining prediction boxes are the behavior detection result.
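The post-processing above can be sketched as greedy NMS after confidence thresholding and top-k truncation. A NumPy sketch; the threshold values `conf_thr` and `iou_thr` are illustrative assumptions, not values from the patent.

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.45, conf_thr=0.3, top_k=400):
    """Drop boxes below the confidence threshold, sort the rest by descending
    confidence, keep at most top_k, then greedily suppress overlapping boxes.
    Boxes are (x1, y1, x2, y2)."""
    keep_mask = scores > conf_thr
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = np.argsort(-scores)[:top_k]            # descending confidence, top_k kept
    boxes, scores = boxes[order], scores[order]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    kept = []
    idx = np.arange(len(boxes))
    while idx.size:
        i = idx[0]                                  # highest-confidence remaining box
        kept.append(i)
        x1 = np.maximum(boxes[i, 0], boxes[idx[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[idx[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[idx[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[idx[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[idx[1:]] - inter)
        idx = idx[1:][iou <= iou_thr]               # suppress heavily overlapping boxes
    return boxes[kept], scores[kept]

boxes = np.array([[0., 0., 10., 10.], [1., 1., 11., 11.], [50., 50., 60., 60.]])
scores = np.array([0.9, 0.8, 0.7])
kept_boxes, kept_scores = nms(boxes, scores)
print(len(kept_boxes))  # 2  (the second box overlaps the first and is suppressed)
```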
The above is only a preferred embodiment of the present invention, and it should be noted that the above preferred embodiment should not be considered as limiting the present invention, and the protection scope of the present invention should be subject to the scope defined by the claims. It will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the spirit and scope of the invention, and these modifications and adaptations should be considered within the scope of the invention.

Claims (10)

1. A method for safe driving behavior analysis based on facial features, comprising:
s1: performing pixel-level face localization on face images of various scales by jointly using supervised and self-supervised multi-task learning, so as to locate the face position of a candidate target driver;
s2: according to the recognized driver face image, calibrating the face image with a 3D face key point detection module to obtain a plurality of face key points and a head pose used for detecting facial behavior features;
s3: extracting the plurality of face key points and the head pose information obtained in S2 and combining them into an image feature vector; inputting the image feature vector of each frame into a convolutional neural network, which outputs a state feature vector; inputting the state feature vector into a BiLSTM, and judging the driving state of the driver in real time;
s4: firstly generating a series of coarse-grained candidate box information with an RPN, then classifying and regressing it to obtain more accurate candidate box information, and applying a feature fusion operation to the target detection network to obtain a behavior detection result.
2. The method for safe driving behavior analysis based on facial features as claimed in claim 1, wherein S1 is specifically:
s1.1: scaling the face image to 300 x 300 pixels;
s1.2: inputting the scaled face image into a convolutional neural network, extracting face features and feeding them into a feature pyramid to obtain a calibrated face box with feature points, and computing the predicted values of the calibrated face box, which comprise a classification predicted value, a bounding box regression value and a feature point regression value;
s1.3: computing the intersection-over-union (IoU) between the calibrated face box and all preset candidate face boxes, and taking the candidate face box with the largest IoU as the effective face box;
s1.4: computing the feature point loss, the bounding box loss and the classification loss of the calibrated face box from the calibrated face box and the selected effective face box.
3. The method for safe driving behavior analysis based on facial features as claimed in claim 2, wherein S2 is specifically:
recognizing the driver's face image based on the effective face box, obtaining through model training a series of feature maps containing face key points, and finally outputting the face key points based on the facial features and face contour together with the head pose, wherein the head pose comprises an azimuth (yaw) angle, a pitch angle and a roll angle;
the error between the predicted values and the true values is then computed by a loss function, whose formula is:

$$\min_{w} \sum_{i=1}^{N} l\big(y_i, f(x_i; w)\big) + \Phi(w)$$

wherein, taking the L2 loss,

$$l\big(y_i, f(x_i; w)\big) = \big\|y_i - f(x_i; w)\big\|_2^2$$

where $l(y_i, f(x_i; w))$ is the loss function, $f(x_i; w)$ represents the network's predicted positions of the facial features and face contour, $y_i$ represents their real positions, i.e. the true values, and $\Phi(w)$ is a regularization term on the parameter $w$ that constrains the coefficients.
4. The method for safe driving behavior analysis based on facial features as claimed in claim 1, wherein in S3 the driving states include attentive driving, fatigue driving, and looking left and right.
5. The method for safe driving behavior analysis based on facial features as claimed in claim 1, wherein S4 is specifically:
s4.1: firstly, a series of coarse-grained candidate box information is generated by an RPN (region proposal network), the candidate box information is then classified and regressed, and a feature fusion operation is adopted in the target detection network; the loss function adopted by the whole target detection network is:

$$L(x, c, l, g) = \frac{1}{N}\Big(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\Big)$$

the loss function is defined as the weighted sum of the position error and the confidence error, wherein the weight coefficient $\alpha$ is set to 1 by cross validation and $N$ is the number of positive prior-box samples; $x_{ij}^{p} \in \{0, 1\}$ is an indicator parameter: $x_{ij}^{p} = 1$ indicates that the $i$-th prior box is matched to the $j$-th ground truth, whose class is $p$; $c$ is the category confidence predicted value, $l$ is the predicted position of the bounding box corresponding to the prior box, and $g$ is the position parameter of the ground truth;
s4.2: for the position error, the following formula is used:

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\big(l_i^{m} - \hat{g}_j^{m}\big)$$

wherein $b = \{cx, cy, w, h\}$ and

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \qquad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \qquad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \qquad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}$$

$i$ denotes the index of each anchor box that is a positive sample (pos) in the training batch, $g$ denotes the ground-truth box, and $d$ denotes the prior (default) box;
s4.3: for the confidence error, the softmax cross-entropy form is used:

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\big(\hat{c}_i^{p}\big) - \sum_{i \in Neg} \log\big(\hat{c}_i^{0}\big), \qquad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p}\exp(c_i^{p})}$$

wherein $x_{ij}^{p}$ indicates the true category of each predicted detection box and $\hat{c}_i^{p}$ is the network's predicted probability for category $p$;
s4.4: when performing driving behavior prediction, first determine the category and confidence value of the driving behavior from the category confidence and filter out prediction boxes belonging to the background; then filter out prediction boxes below the set confidence threshold; decode the remaining prediction boxes, sort them in descending order of confidence value, obtain the real position parameters of each prediction box from the prior boxes, keep the top-k prediction boxes, and filter out overlapping prediction boxes with the NMS algorithm; the remaining prediction boxes are the behavior detection result.
6. A system for safe driving behavior analysis based on facial features, comprising:
a face detection module: performing pixel-level face localization on face images of various scales by jointly using supervised and self-supervised multi-task learning, so as to locate the face position of a candidate target driver;
a face key point detection module: according to the recognized driver face image, calibrating the face image with a 3D face key point detection module to obtain a plurality of face key points and a head pose used for detecting facial behavior features;
a fatigue detection module: extracting the plurality of face key points and the head pose information obtained by the face key point detection module and combining them into an image feature vector; inputting the image feature vector of each frame into a convolutional neural network, which outputs a state feature vector; inputting the state feature vector into a BiLSTM, and judging the driving state of the driver in real time;
a behavior detection module: firstly generating a series of coarse-grained candidate box information with an RPN, then classifying and regressing it to obtain more accurate candidate box information, and applying a feature fusion operation to the target detection network to obtain a behavior detection result.
7. The system for safe driving behavior analysis based on facial features as claimed in claim 6, wherein the face detection module locates the driver's face specifically as follows:
step 1: scaling the face image to 300 x 300 pixels;
step 2: inputting the scaled face image into a convolutional neural network, extracting face features and feeding them into a feature pyramid to obtain a calibrated face box with feature points, and computing the predicted values of the calibrated face box, which comprise a classification predicted value, a bounding box regression value and a feature point regression value;
step 3: computing the intersection-over-union (IoU) between the calibrated face box and all preset candidate face boxes, and taking the candidate face box with the largest IoU as the effective face box;
step 4: computing the feature point loss, the bounding box loss and the classification loss of the calibrated face box from the calibrated face box and the selected effective face box.
8. The system for safe driving behavior analysis based on facial features as claimed in claim 7, wherein the face key point detection module is specifically configured to:
recognize the driver's face image based on the effective face box, obtain through model training a series of feature maps containing face key points, and finally output a plurality of face key points based on the facial features and face contour together with the head pose, wherein the head pose comprises an azimuth (yaw) angle, a pitch angle and a roll angle;
the error between the predicted values and the true values is then computed by a loss function, whose formula is:

$$\min_{w} \sum_{i=1}^{N} l\big(y_i, f(x_i; w)\big) + \Phi(w)$$

wherein, taking the L2 loss,

$$l\big(y_i, f(x_i; w)\big) = \big\|y_i - f(x_i; w)\big\|_2^2$$

where $l(y_i, f(x_i; w))$ is the loss function, $f(x_i; w)$ represents the network's predicted positions of the facial features and face contour, $y_i$ represents their real positions, i.e. the true values, and $\Phi(w)$ is a regularization term on the parameter $w$ that constrains the coefficients.
9. The system for safe driving behavior analysis based on facial features as claimed in claim 6, wherein in the fatigue detection module the driving states include attentive driving, fatigue driving, and looking left and right.
10. The system for safe driving behavior analysis based on facial features as claimed in claim 6, wherein the behavior detection module detects the driver's behavior specifically as:
firstly, a series of coarse-grained candidate box information is generated by the RPN, the candidate box information is then classified and regressed, and a feature fusion operation is adopted in the target detection network; the loss function adopted by the whole target detection network is:

$$L(x, c, l, g) = \frac{1}{N}\Big(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\Big)$$

the loss function is defined as the weighted sum of the position error and the confidence error, wherein the weight coefficient $\alpha$ is set to 1 by cross validation and $N$ is the number of positive prior-box samples; $x_{ij}^{p} \in \{0, 1\}$ is an indicator parameter: $x_{ij}^{p} = 1$ indicates that the $i$-th prior box is matched to the $j$-th ground truth, whose class is $p$; $c$ is the category confidence predicted value, $l$ is the predicted position of the bounding box corresponding to the prior box, and $g$ is the position parameter of the ground truth;
for the position error, the following formula is used:

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\big(l_i^{m} - \hat{g}_j^{m}\big)$$

wherein $b = \{cx, cy, w, h\}$ and

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \qquad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \qquad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \qquad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}$$

$i$ denotes the index of each anchor box that is a positive sample (pos) in the training batch, $g$ denotes the ground-truth box, and $d$ denotes the prior (default) box;
for the confidence error, the softmax cross-entropy form is used:

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\big(\hat{c}_i^{p}\big) - \sum_{i \in Neg} \log\big(\hat{c}_i^{0}\big), \qquad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p}\exp(c_i^{p})}$$

wherein $x_{ij}^{p}$ indicates the true category of each predicted detection box and $\hat{c}_i^{p}$ is the network's predicted probability for category $p$;
when performing driving behavior prediction, first determine the category and confidence value of the driving behavior from the category confidence and filter out prediction boxes belonging to the background; then filter out prediction boxes below the set confidence threshold; decode the remaining prediction boxes, sort them in descending order of confidence value, obtain the real position parameters of each prediction box from the prior boxes, keep the top-k prediction boxes, and filter out overlapping prediction boxes with the NMS algorithm; the remaining prediction boxes are the behavior detection result.
CN202210041064.0A 2022-01-14 2022-01-14 Method and system for analyzing safe driving behavior based on facial features Pending CN114792437A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210041064.0A CN114792437A (en) 2022-01-14 2022-01-14 Method and system for analyzing safe driving behavior based on facial features


Publications (1)

Publication Number Publication Date
CN114792437A true CN114792437A (en) 2022-07-26

Family

ID=82460721


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117541865A (en) * 2023-11-14 2024-02-09 China University of Mining and Technology Identity analysis and mobile phone use detection method based on coarse-granularity depth estimation
CN117541865B (en) * 2023-11-14 2024-06-04 China University of Mining and Technology Identity analysis and mobile phone use detection method based on coarse-granularity depth estimation


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination