CN114792437A - Method and system for analyzing safe driving behavior based on facial features - Google Patents


Info

Publication number
CN114792437A
CN114792437A (application CN202210041064.0A)
Authority
CN
China
Prior art keywords
face
frame
prediction
confidence
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210041064.0A
Other languages
Chinese (zh)
Inventor
王曦
万磊
王俊
王曼
杜超
韩飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Cbpm & Xinda Banking Technology Co ltd
Original Assignee
Shenzhen Cbpm & Xinda Banking Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Cbpm & Xinda Banking Technology Co ltd filed Critical Shenzhen Cbpm & Xinda Banking Technology Co ltd
Priority to CN202210041064.0A priority Critical patent/CN114792437A/en
Publication of CN114792437A publication Critical patent/CN114792437A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning

Abstract

The invention discloses a method and system for analyzing safe driving behavior based on facial features, belonging to the technical field of safe driving. A face detection module locates the face of the candidate target driver; based on the obtained face image, a facial key point detection module then locates the positions of the 3D key points of the face; the results are sent to a behavior detection module and a fatigue detection module according to the different located regions; finally, these detection modules recognize the driver's behaviors and classify the driver's fatigue. The purpose is to analyze driving behavior while guaranteeing both real-time performance and accuracy.

Description

Method and system for analyzing safe driving behavior based on facial features
Technical Field
The invention belongs to the technical field of safe driving, and particularly relates to a safe driving behavior analysis method and system based on facial features.
Background
With the rapid development of the economy, automobiles have become widely popularized. On many long-distance trips drivers must drive for hours or even overnight and easily fall into a fatigue state; when a driver also takes phone calls or smokes while driving, distraction becomes even more likely and traffic accidents easily follow. It is therefore necessary to monitor the driver's behavior and fatigue state and to issue reminders.
In the prior art, the driver's driving behavior is monitored by a monitoring device: when the driver performs behaviors that interfere with driving (such as smoking or making a phone call) or is in a fatigue state (such as incessant blinking or yawning) during driving, the system must detect and analyze the driver's current behavior or degree of fatigue in time and then issue a corresponding warning to the driver.
The prior art has the following problems:
existing driving behavior analysis can recognize some basic behaviors, but as the image background becomes complex, the recognized object is often not well separated from the background and recognition fails. Although deep learning methods can improve the accuracy of action recognition, the network scale of a deep learning model is often too large, giving the model too many parameters and too high a computation cost, so real-time detection is difficult to achieve on ordinarily configured hardware. Meanwhile, because masks must be worn during an epidemic, facial key point detection is greatly affected.
Disclosure of Invention
Aiming at the problems of the prior art, namely that the recognized object cannot be well separated from an increasingly complex image background so that recognition often fails, that a deep learning model's excessive network scale, parameters and computation make real-time detection difficult on ordinarily configured hardware even though deep learning can improve the accuracy of action recognition, and that mask wearing during an epidemic greatly affects facial key point detection, the invention provides a safe driving behavior analysis method and system based on facial features, whose purpose is to analyze driving behavior while guaranteeing both real-time performance and accuracy.
In order to achieve this purpose, the invention adopts the following technical scheme: a method and system for safe driving behavior analysis based on facial features, comprising:
s1: performing pixel-wise face localization on face images of various scales by jointly supervised and self-supervised multi-task learning, to locate the face position of the candidate target driver;
s2: according to the recognized face image of the driver, calibrating the face image with a 3D facial key point detection module to obtain a number of facial key points and the head pose used to detect facial behavior features;
s3: extracting the facial key point information and head pose information obtained in S2 and combining them into a specific image feature vector; the image feature vector of each frame is input into a convolutional neural network, which outputs a state feature vector; the state feature vector is then input into a BiLSTM to judge the driving state of the driver in real time;
s4: first generating a series of coarse-grained candidate box information with an RPN, then classifying and regressing it to further obtain more accurate candidate box information, and applying a feature fusion operation to the target detection network to obtain the behavior detection result.
Preferably, S1 in the present invention specifically includes:
s1.1: scaling the face image to 300 × 300 pixels;
s1.2: inputting the scaled image into a convolutional neural network, extracting face features and feeding them into a feature pyramid to obtain a calibrated face box with feature points, and calculating the predicted values of the calibrated face box, including a classification prediction value, a bounding box regression value and a feature point regression value;
s1.3: calculating the intersection-over-union (IoU) between the calibrated face box and all preset candidate face boxes, and taking the candidate face box with the largest IoU as the effective face box;
s1.4: calculating the feature point loss, bounding box loss and classification loss of the calibrated face box from the calibrated face box and the selected effective face box.
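The IoU matching of step S1.3 can be sketched in plain Python. This is a minimal illustration, not the patent's implementation; the (x1, y1, x2, y2) box format and the function names are assumptions:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_effective_box(calibrated, candidates):
    """Pick the candidate face box with the largest IoU against the
    calibrated face box (step S1.3)."""
    return max(candidates, key=lambda c: iou(calibrated, c))
```

The candidate box with the maximum IoU is then treated as the effective face box and used in the loss computation of step S1.4.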
Preferably, S2 in the present invention specifically includes:
recognizing the driver's face image based on the effective face box, then obtaining a series of feature maps containing facial key points through model training, and finally outputting a number of facial key points based on the facial features (eyes, eyebrows, nose, mouth) and the face contour, together with the head pose, which comprises an azimuth (yaw) angle, a pitch angle and a roll angle;
the error between the predicted and true values is then calculated by a loss function, whose formula is:
\min_{w} \sum_{i} L\big(y_i, f(x_i; w)\big) + \Phi(w)

wherein the loss function L takes the L2 loss:

L\big(y_i, f(x_i; w)\big) = \lVert y_i - f(x_i; w) \rVert_2^2

where f(x_i; w) denotes the predicted positions of the facial features and face contour output by the network, y_i denotes their real positions, i.e. the true values, and \Phi(w) is a regularization term on the parameter w that limits the coefficients.
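The keypoint loss just described can be sketched in plain Python. The flattened coordinate lists and the squared-norm form chosen for Φ(w) are illustrative assumptions, not taken from the patent:

```python
def regularized_l2(pred, truth, w, lam=1e-4):
    """L2 data term over predicted vs. true key point coordinates,
    plus a simple squared-norm regularizer Phi(w) = lam * ||w||^2.

    pred, truth: flat lists of coordinates; w: model parameters."""
    data = sum((p - t) ** 2 for p, t in zip(pred, truth))
    reg = lam * sum(v * v for v in w)
    return data + reg
```

In practice the data term is summed over all key points of a training batch and minimized over w, as in the formula above.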
Preferably, in S3 of the present invention, the driving states include attentive driving, fatigued driving, and looking around (glancing left and right).
Preferably, the S4 of the present invention is specifically:
s4.1: first generating a series of coarse-grained candidate box information through an RPN, then classifying and regressing it, and applying a feature fusion operation to the target detection network; the loss function adopted by the whole target detection network is:

L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)

The loss is defined as a weighted sum of the position error and the confidence error, where the weight coefficient \alpha is set to 1 by cross-validation and N is the number of positive prior boxes; x_{ij}^{p} \in \{0, 1\} is an indicator parameter: x_{ij}^{p} = 1 means that the i-th prior box is matched to the j-th ground truth, whose class is p; c is the class confidence prediction value, l is the predicted position of the bounding box corresponding to the prior box, and g is the position parameter of the ground truth;
s4.2: the position error is defined by the following formula:

L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\!\left(l_i^m - \hat{g}_j^m\right)

where b = (cx, cy, w, h) denotes a box by its center coordinates, width and height, and the encoded ground truth offsets are

\hat{g}_j^{cx} = (g_j^{cx} - d_i^{cx}) / d_i^{w}, \quad \hat{g}_j^{cy} = (g_j^{cy} - d_i^{cy}) / d_i^{h}

\hat{g}_j^{w} = \log(g_j^{w} / d_i^{w}), \quad \hat{g}_j^{h} = \log(g_j^{h} / d_i^{h})

i denotes the index of an anchor box that is a positive sample (Pos) in each training batch, g denotes the ground truth box, and d denotes the prior (default) box;
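The box offset encoding used in the position error of S4.2, and its inverse used later when decoding prediction boxes, can be sketched as follows. A minimal illustration assuming (cx, cy, w, h) box tuples; the function names are not from the patent:

```python
import math

def encode(gt, prior):
    """Encode a ground-truth box against a prior box as offsets
    (the ĝ formulas of the position error)."""
    gcx, gcy, gw, gh = gt
    dcx, dcy, dw, dh = prior
    return ((gcx - dcx) / dw, (gcy - dcy) / dh,
            math.log(gw / dw), math.log(gh / dh))

def decode(offsets, prior):
    """Inverse of encode: recover the real box position from
    predicted offsets and the prior box."""
    ocx, ocy, ow, oh = offsets
    dcx, dcy, dw, dh = prior
    return (ocx * dw + dcx, ocy * dh + dcy,
            dw * math.exp(ow), dh * math.exp(oh))
```

Encoding the regression targets this way makes the offsets scale-invariant with respect to the prior box size.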
s4.3: the confidence error is defined by the following formula:

L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\hat{c}_i^{p} - \sum_{i \in Neg} \log\hat{c}_i^{0}, \qquad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}

where x_{ij}^{p} denotes the true class of each prediction box and \hat{c}_i^{p} is the prediction category (class probability) of the network;
s4.4: when predicting driving behavior, first determine the class and confidence value of each prediction box according to the class confidences and filter out the boxes belonging to the background; then filter out the prediction boxes whose confidence is below the set confidence threshold; decode the remaining prediction boxes and sort them in descending order of confidence, obtaining the real position parameters of each prediction box from its prior box; keep the top-k prediction boxes and filter overlapping boxes with the NMS algorithm; the remaining prediction boxes are the behavior detection result.
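The filtering pipeline of S4.4 can be sketched as a greedy non-maximum suppression. A minimal illustration assuming (x1, y1, x2, y2) boxes paired with confidence scores; the threshold defaults are illustrative, with top_k=400 following the value used in Example 1:

```python
def iou_xyxy(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(detections, conf_thresh=0.5, iou_thresh=0.45, top_k=400):
    """detections: list of (box, score). Filter by confidence, sort in
    descending score order, keep top_k, then greedily suppress boxes
    that overlap an already-kept box (step S4.4)."""
    dets = [d for d in detections if d[1] >= conf_thresh]
    dets.sort(key=lambda d: d[1], reverse=True)
    dets = dets[:top_k]
    kept = []
    for box, score in dets:
        if all(iou_xyxy(box, kb) < iou_thresh for kb, _ in kept):
            kept.append((box, score))
    return kept
```

The boxes surviving NMS are reported as the behavior detection result.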
The invention also provides a system for analyzing safe driving behavior based on facial features, which comprises:
a face detection module: performing pixel-wise face localization on face images of various scales by jointly supervised and self-supervised multi-task learning, to locate the face position of the candidate target driver;
face key point detection module: according to the recognized face image of the driver, a 3D face key point detection module is used for calibrating the face image to obtain a plurality of face key points and head postures for detecting the behavior characteristics of the face;
a fatigue detection module: extracting the facial key point information and head pose information obtained by the facial key point detection module and combining them into a specific image feature vector; the image feature vector of each frame is input into a convolutional neural network, which outputs a state feature vector; the state feature vector is then input into a BiLSTM to judge the driving state of the driver in real time;
a behavior detection module: first generating a series of coarse-grained candidate box information with an RPN, then classifying and regressing it to further obtain more accurate candidate box information, and applying a feature fusion operation to the target detection network to obtain the behavior detection result.
Preferably, the face detection module of the present invention specifically positions the face of the driver:
step 1: scaling the face image to 300 × 300 pixels;
step 2: inputting the scaled image into a convolutional neural network, extracting face features and feeding them into a feature pyramid to obtain a calibrated face box with feature points, and calculating the predicted values of the calibrated face box, including a classification prediction value, a bounding box regression value and a feature point regression value;
step 3: calculating the intersection-over-union (IoU) between the calibrated face box and all preset candidate face boxes, and taking the candidate face box with the largest IoU as the effective face box;
step 4: calculating the feature point loss, bounding box loss and classification loss of the calibrated face box from the calibrated face box and the selected effective face box.
Preferably, the face key point detection module of the present invention specifically comprises:
recognizing the driver's face image based on the effective face box, then obtaining a series of feature maps containing facial key points through model training, and finally outputting a number of facial key points based on the facial features and the face contour, together with the head pose, which comprises an azimuth (yaw) angle, a pitch angle and a roll angle;
the error between the predicted and true values is then calculated by a loss function, whose formula is:
\min_{w} \sum_{i} L\big(y_i, f(x_i; w)\big) + \Phi(w)

wherein the loss function L takes the L2 loss:

L\big(y_i, f(x_i; w)\big) = \lVert y_i - f(x_i; w) \rVert_2^2

where f(x_i; w) denotes the predicted positions of the facial features and face contour output by the network, y_i denotes their real positions, i.e. the true values, and \Phi(w) is a regularization term on the parameter w that limits the coefficients.
Preferably, in the fatigue detection module of the present invention, the driving states include attentive driving, fatigued driving, and looking around.
Preferably, the behavior detection module of the invention specifically detects the behavior of the driver as follows:
first generating a series of coarse-grained candidate box information through an RPN, then classifying and regressing it, and applying a feature fusion operation to the target detection network; the loss function adopted by the whole target detection network is:

L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)

The loss is defined as a weighted sum of the position error and the confidence error, where the weight coefficient \alpha is set to 1 by cross-validation and N is the number of positive prior boxes; x_{ij}^{p} \in \{0, 1\} is an indicator parameter: x_{ij}^{p} = 1 means that the i-th prior box is matched to the j-th ground truth, whose class is p; c is the class confidence prediction value, l is the predicted position of the bounding box corresponding to the prior box, and g is the position parameter of the ground truth;
the position error is defined by the following formula:

L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\!\left(l_i^m - \hat{g}_j^m\right)

where b = (cx, cy, w, h) denotes a box by its center coordinates, width and height, and the encoded ground truth offsets are

\hat{g}_j^{cx} = (g_j^{cx} - d_i^{cx}) / d_i^{w}, \quad \hat{g}_j^{cy} = (g_j^{cy} - d_i^{cy}) / d_i^{h}

\hat{g}_j^{w} = \log(g_j^{w} / d_i^{w}), \quad \hat{g}_j^{h} = \log(g_j^{h} / d_i^{h})

i denotes the index of an anchor box that is a positive sample (Pos) in each training batch, g denotes the ground truth box, and d denotes the prior (default) box;
the confidence error is defined by the following formula:

L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\hat{c}_i^{p} - \sum_{i \in Neg} \log\hat{c}_i^{0}, \qquad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}

where x_{ij}^{p} denotes the true class of each prediction box and \hat{c}_i^{p} is the prediction category (class probability) of the network;
when predicting driving behavior, first determine the class and confidence value of each prediction box according to the class confidences and filter out the boxes belonging to the background; then filter out the prediction boxes whose confidence is below the set confidence threshold; decode the remaining prediction boxes and sort them in descending order of confidence, obtaining the real position parameters of each prediction box from its prior box; keep the top-k prediction boxes and filter overlapping boxes with the NMS algorithm; the remaining prediction boxes are the behavior detection result.
Compared with the prior art, the technical scheme of the invention has the following advantages/beneficial effects:
1. The method disclosed by the invention uses 3D key point detection to make up for defects of 2D key point detection in practical application, such as low recognition accuracy and low liveness detection accuracy.
2. In terms of fatigue detection, the fatigue state can be effectively judged by combining the facial key point information with the pitch angle information of the head pose in a designed convolutional neural network.
3. In the behavior detection module, the invention uses an RPN to generate a series of coarse-grained candidate box information, then classifies and regresses it to further obtain more accurate box information, and applies a feature fusion operation to the target detection network, effectively improving the detection of small targets.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a schematic flow chart of example 1 of the present invention.
Fig. 2 is a schematic diagram showing the detection of the key points in embodiment 1 of the present invention.
Fig. 3 is a schematic diagram of a key point face in embodiment 1 of the present invention.
Fig. 4 is a flowchart of the fatigue detection of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention are clearly and completely described below, and it is obvious that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive efforts based on the embodiments of the present invention, are within the scope of protection of the present invention. Thus, the detailed description of the embodiments of the present invention provided below is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention.
Example 1:
as shown in fig. 1, fig. 2, fig. 3 and fig. 4, the present invention provides a method and a system for analyzing safe driving behavior based on facial features, comprising:
s1: performing pixel-wise face localization on face images of various scales by jointly supervised and self-supervised multi-task learning, to locate the face position of the candidate target driver; S1 specifically includes:
s1.1: scaling the face image to 300 × 300 pixels;
s1.2: inputting the scaled image into a convolutional neural network, extracting face features and feeding them into a feature pyramid to obtain a calibrated face box with feature points, and calculating the predicted values of the calibrated face box, including a classification prediction value, a bounding box regression value and a feature point regression value;
s1.3: calculating the intersection-over-union (IoU) between the calibrated face box and all preset candidate face boxes, and taking the candidate face box with the largest IoU as the effective face box;
s1.4: calculating the feature point loss, bounding box loss and classification loss of the calibrated face box from the calibrated face box and the selected effective face box.
S2: calibrating the recognized driver face image with the 3D facial key point detection module and, as shown in Fig. 2, obtaining 68 facial key points and the head pose used to detect facial behavior features; S2 specifically includes:
recognizing the driver's face image based on the effective face box, obtaining a series of feature maps containing facial key points through model training, and finally outputting 68 facial key points based on the facial features and face contour, together with the head pose, which comprises an azimuth (yaw) angle, a pitch angle and a roll angle. As shown in Fig. 3, several important key points are: nose tip 31, nose root 28, chin 9, outer corner of the left eye 37, inner corner of the left eye 40, inner corner of the right eye 43, outer corner of the right eye 46, mouth center 67, right corner of the mouth 55, left side of the face 1, and right side of the face 17.
The error between the predicted value and the true value is then calculated by a loss function, the formula of which is:
\min_{w} \sum_{i} L\big(y_i, f(x_i; w)\big) + \Phi(w)

wherein the loss function L takes the L2 loss:

L\big(y_i, f(x_i; w)\big) = \lVert y_i - f(x_i; w) \rVert_2^2

where f(x_i; w) denotes the predicted positions of the facial features and face contour output by the network, y_i denotes their real positions, i.e. the true values, and \Phi(w) is a regularization term on the parameter w that limits the coefficients.
The true values in Example 1 represent the real positions of the facial key points, namely the facial features and the face contour.
S3: extracting the 68 facial key points and the head pose information obtained in S2 and combining them into a 68 × 6 image feature vector; the image feature vector of each frame is input into a convolutional neural network, which outputs a state feature vector; the state feature vector is then input into a BiLSTM to judge the driver's driving state in real time. The driving states include attentive driving, fatigued driving, and looking around. When judging the driving state of a driver wearing a mask, Example 1 judges the driver's fatigue level from expressions such as incessant blinking or yawning.
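One plausible way to assemble the 68 × 6 per-frame feature vector of S3 is sketched below. The exact layout is not specified in the patent; here each key point row is assumed to hold its 3D coordinates plus the three head pose angles:

```python
def frame_features(keypoints_3d, head_pose):
    """keypoints_3d: 68 (x, y, z) tuples; head_pose: (yaw, pitch, roll).
    Returns a 68 x 6 matrix: each row combines one key point's
    coordinates with the frame's head pose angles."""
    assert len(keypoints_3d) == 68 and len(head_pose) == 3
    return [list(p) + list(head_pose) for p in keypoints_3d]
```

The resulting matrix for each frame would then be fed to the convolutional network, whose state feature vectors are passed on to the BiLSTM.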
S4: first generating a series of coarse-grained candidate box information with an RPN, then classifying and regressing it to further obtain more accurate candidate box information, and applying a feature fusion operation to the target detection network to obtain the behavior detection result. S4 specifically includes:
s4.1: first generating a series of coarse-grained candidate box information through an RPN, then classifying and regressing it, and applying a feature fusion operation to the target detection network, which effectively improves the detection of small targets; the loss function adopted by the whole target detection network is:

L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)

The loss is defined as a weighted sum of the position error and the confidence error, where the weight coefficient \alpha is set to 1 by cross-validation and N is the number of positive prior boxes; x_{ij}^{p} \in \{0, 1\} is an indicator parameter: x_{ij}^{p} = 1 means that the i-th prior box is matched to the j-th ground truth, whose class is p; c is the class confidence prediction value, l is the predicted position of the bounding box corresponding to the prior box, and g is the position parameter of the ground truth;
s4.2: the position error is defined by the following formula (the invention optimizes the position with the Smooth L1 loss):

L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\!\left(l_i^m - \hat{g}_j^m\right)

\hat{g}_j^{cx} = (g_j^{cx} - d_i^{cx}) / d_i^{w}, \quad \hat{g}_j^{cy} = (g_j^{cy} - d_i^{cy}) / d_i^{h}

\hat{g}_j^{w} = \log(g_j^{w} / d_i^{w}), \quad \hat{g}_j^{h} = \log(g_j^{h} / d_i^{h})

where i denotes the index of an anchor box that is a positive sample (Pos) in each training batch; each detected box is optimized by comparing its prediction with the true position b = (cx, cy, w, h), i.e. the center coordinates and the width and height of the box;
\mathrm{smooth}_{L1} is a piecewise function: for an input x in [-1, 1] it takes the L2 loss, which removes the non-differentiable break point of L1 at 0; outside [-1, 1] it takes the L1 loss, which avoids gradient explosion on outliers:

\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}
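The piecewise behavior described above is easy to verify in a one-line sketch (illustrative only):

```python
def smooth_l1(x):
    """Smooth L1: quadratic (L2-like) on [-1, 1], linear (L1-like)
    outside; the two branches meet continuously at |x| = 1."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5
```

Near zero the gradient shrinks with x (no kink), while for large residuals the gradient stays bounded at ±1, which is what suppresses outlier gradient explosion.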
s4.3: the confidence error is defined by the following formula (the classification categories are optimized with a cross-entropy loss):
Figure RE-GDA0003685071460000111
wherein, the cross entropy loss function optimizes each detection box i belonging to different positive samples (pos) and negative samples (Neg),
Figure RE-GDA0003685071460000112
representing the true category for each predictive detection block,
Figure RE-GDA0003685071460000113
is a prediction category of the network;
Figure RE-GDA0003685071460000114
wherein the forecast category score for the network is normalized to between [0,1] by sotmax;
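The softmax normalization of the class scores can be sketched as follows (a minimal stdlib version; subtracting the maximum score is a standard numerical-stability trick, not something stated in the patent):

```python
import math

def softmax(scores):
    """Normalize raw class scores into probabilities in [0, 1]
    that sum to 1."""
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The resulting per-class probabilities are the \hat{c}_i^{p} values fed into the cross-entropy confidence loss.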
s4.4: when predicting driving behavior, first determine the class and confidence value of each prediction box according to the class confidences and filter out the boxes belonging to the background; then filter out the prediction boxes below the set confidence threshold; decode the remaining prediction boxes and clip them after decoding to keep the box positions inside the picture; sort them in descending order of confidence, obtaining the real position parameters of each prediction box from its prior box; keep the top 400 prediction boxes and filter overlapping boxes with the NMS algorithm; the remaining prediction boxes are the behavior detection result.
The invention also provides a system for analyzing safe driving behavior based on facial features, which comprises:
a face detection module: performing pixel-wise face localization on face images of various scales by jointly supervised and self-supervised multi-task learning, to locate the face position of the candidate target driver; the face detection module locates the driver's face as follows:
step 1: scaling the face scale image to pixels 300 x 300;
step 2: then inputting the face scale image into a convolutional neural network, extracting face features and inputting the face features into a feature pyramid to obtain a calibrated face frame with feature points, and calculating a predicted value of the calibrated face frame, wherein the predicted value comprises a classification predicted value, a boundary frame regression value and a feature point regression value;
and step 3: calculating the interaction ratio of the calibrated face frame and all preset candidate face frames, and taking the candidate face frame with the maximum value in the interaction ratio as an effective face frame;
and 4, step 4: and calculating the loss of the characteristic points, the loss of the boundary frame and the classification loss of the calibrated face frame through the calibrated face frame and the selected effective face frame.
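The IoU matching in steps 3 and 4 can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation; the helper names `iou` and `best_match` and the (x1, y1, x2, y2) box format are assumptions made for the example.

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union between one box and an array of boxes (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def best_match(calibrated_box, candidate_boxes):
    """Index of the candidate box with the highest IoU (the 'effective' face box)."""
    return int(np.argmax(iou(calibrated_box, candidate_boxes)))

# Example: the second candidate overlaps the calibrated box the most.
cal = np.array([10., 10., 50., 50.])
cands = np.array([[60., 60., 90., 90.],
                  [12., 12., 52., 52.],
                  [0., 0., 20., 20.]])
print(best_match(cal, cands))  # 1
```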
a face key point detection module: calibrating the recognized driver face image with a 3D face key point detection module to obtain 68 face key points and the head pose used for detecting facial behavior features; the face key point detection module specifically comprises:
recognizing the driver's face image based on the effective face box, obtaining through model training a series of feature maps containing face key points, and finally outputting 68 face key points based on the facial features and face contour together with the head pose, wherein the head pose comprises an azimuth (yaw) angle, a pitch angle and a roll angle;
the error between the predicted values and the true values is then computed by a loss function, whose formula is:

$$\min_{w} \sum_{i=1}^{N} l\big(y_i, f(x_i; w)\big) + \Phi(w)$$

wherein, taking the L2 loss,

$$l\big(y_i, f(x_i; w)\big) = \big\|y_i - f(x_i; w)\big\|_2^2$$

where $l(y_i, f(x_i; w))$ is the loss function, $f(x_i; w)$ represents the network's predicted positions of the facial features and face contour, $y_i$ represents their real positions, i.e. the true values, and $\Phi(w)$ is a regularization term on the parameter $w$ that constrains the coefficients.
a fatigue detection module: as shown in fig. 4, the 68 face key points and the head pose information obtained by the face key point detection module are extracted and combined into a 68 x 6 image feature vector; the image feature vector of each frame is then input into a convolutional neural network, which outputs a state feature vector; the state feature vector is input into a BiLSTM, and the driving state of the driver is judged in real time; the driving states include attentive driving, fatigue driving, and looking left and right.
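One way to assemble the 68 x 6 per-frame feature described above is sketched below. The channel layout (each keypoint's 3D coordinates concatenated with the three pose angles repeated per row) is an assumption; the patent only states that the keypoints and head pose are combined into a 68 x 6 vector.

```python
import numpy as np

def frame_feature(keypoints3d, pose):
    """Combine 68 3D keypoints with the (yaw, pitch, roll) head pose into a 68x6 feature.
    keypoints3d: (68, 3); pose: (3,). The layout is illustrative only."""
    pose_rep = np.tile(pose, (keypoints3d.shape[0], 1))      # repeat the pose for every keypoint row
    return np.concatenate([keypoints3d, pose_rep], axis=1)   # shape (68, 6)

feat = frame_feature(np.zeros((68, 3)), np.array([0.1, -0.2, 0.0]))
print(feat.shape)  # (68, 6)
```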
a behavior detection module: firstly generating a series of coarse-grained candidate box information with an RPN (region proposal network), then classifying and regressing it to obtain more accurate candidate box information, and applying a feature fusion operation to the target detection network to obtain the behavior detection result. The behavior detection module detects the driver's behavior specifically as follows:
firstly, a series of coarse-grained candidate box information is generated by the RPN, the candidate box information is then classified and regressed, and a feature fusion operation is adopted in the target detection network; the loss function adopted by the whole target detection network is:
$$L(x, c, l, g) = \frac{1}{N}\Big(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\Big)$$

the loss function is defined as the weighted sum of the position error and the confidence error, wherein the weight coefficient $\alpha$ is set to 1 by cross validation and $N$ is the number of positive prior-box samples;

$x_{ij}^{p} \in \{0, 1\}$ is an indicator parameter: $x_{ij}^{p} = 1$ indicates that the $i$-th prior box is matched to the $j$-th ground truth, whose class is $p$; $c$ is the category confidence predicted value, $l$ is the predicted position of the bounding box corresponding to the prior box, and $g$ is the position parameter of the ground truth;
for the position error, the following formula is used:

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\big(l_i^{m} - \hat{g}_j^{m}\big)$$

wherein $b = \{cx, cy, w, h\}$ are the box parameters and the encoded offsets are

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \qquad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \qquad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \qquad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}$$

$i$ denotes the index of each anchor box that is a positive sample (pos) in the training batch, $g$ denotes the ground-truth box, and $d$ denotes the prior (default) box;
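The offset encoding $\hat{g}$ above, together with the decoding used later at prediction time, can be sketched as an invertible pair. A NumPy sketch with boxes in (cx, cy, w, h) form; the function names are illustrative, not from the patent.

```python
import numpy as np

def encode(g, d):
    """Offsets g_hat of a ground-truth box g with respect to a prior box d (cx, cy, w, h)."""
    return np.array([(g[0] - d[0]) / d[2],
                     (g[1] - d[1]) / d[3],
                     np.log(g[2] / d[2]),
                     np.log(g[3] / d[3])])

def decode(l, d):
    """Invert the encoding: recover a box from predicted offsets l and prior box d."""
    return np.array([l[0] * d[2] + d[0],
                     l[1] * d[3] + d[1],
                     d[2] * np.exp(l[2]),
                     d[3] * np.exp(l[3])])

g = np.array([0.5, 0.5, 0.2, 0.3])
d = np.array([0.4, 0.4, 0.25, 0.25])
assert np.allclose(decode(encode(g, d), d), g)  # the round trip recovers the box
```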
for the confidence error, the softmax cross-entropy form is used:

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\big(\hat{c}_i^{p}\big) - \sum_{i \in Neg} \log\big(\hat{c}_i^{0}\big)$$

$$\hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p}\exp(c_i^{p})}$$

wherein $x_{ij}^{p}$ indicates the true category of each predicted detection box and $\hat{c}_i^{p}$ is the network's predicted probability for category $p$;
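A minimal sketch of the softmax confidence error: positive boxes are scored against their matched class and negative boxes against the background class 0. The function names and input shapes are illustrative.

```python
import numpy as np

def softmax(c):
    e = np.exp(c - c.max(axis=-1, keepdims=True))  # subtract the max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def conf_loss(scores_pos, cls_pos, scores_neg):
    """Softmax cross-entropy confidence error: positives against their true class,
    negatives against class 0 (background). scores_*: (n, num_classes); cls_pos: (n,)."""
    c_pos = softmax(scores_pos)
    c_neg = softmax(scores_neg)
    pos = -np.log(c_pos[np.arange(len(cls_pos)), cls_pos]).sum()
    neg = -np.log(c_neg[:, 0]).sum()
    return pos + neg
```

With uniform scores over 4 classes, each box contributes $-\log(1/4) = \log 4$ to the loss.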
when performing driving behavior prediction, first determine the category and confidence value of the driving behavior from the category confidence and filter out prediction boxes belonging to the background; then filter out prediction boxes below the set confidence threshold; decode the remaining prediction boxes and clip them after decoding so that the box positions do not exceed the picture boundary; sort the boxes in descending order of confidence value, obtain the real position parameters of each prediction box from the prior boxes, keep the top 400 prediction boxes, and filter out overlapping prediction boxes with the NMS algorithm; the remaining prediction boxes are the behavior detection result.
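The post-processing above can be sketched as greedy NMS after confidence thresholding and top-k truncation. A NumPy sketch; the threshold values `conf_thr` and `iou_thr` are illustrative assumptions, not values from the patent.

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.45, conf_thr=0.3, top_k=400):
    """Drop boxes below the confidence threshold, sort the rest by descending
    confidence, keep at most top_k, then greedily suppress overlapping boxes.
    Boxes are (x1, y1, x2, y2)."""
    keep_mask = scores > conf_thr
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = np.argsort(-scores)[:top_k]            # descending confidence, top_k kept
    boxes, scores = boxes[order], scores[order]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    kept = []
    idx = np.arange(len(boxes))
    while idx.size:
        i = idx[0]                                  # highest-confidence remaining box
        kept.append(i)
        x1 = np.maximum(boxes[i, 0], boxes[idx[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[idx[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[idx[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[idx[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[idx[1:]] - inter)
        idx = idx[1:][iou <= iou_thr]               # suppress heavily overlapping boxes
    return boxes[kept], scores[kept]

boxes = np.array([[0., 0., 10., 10.], [1., 1., 11., 11.], [50., 50., 60., 60.]])
scores = np.array([0.9, 0.8, 0.7])
kept_boxes, kept_scores = nms(boxes, scores)
print(len(kept_boxes))  # 2  (the second box overlaps the first and is suppressed)
```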
The above is only a preferred embodiment of the present invention, and it should be noted that the above preferred embodiment should not be considered as limiting the present invention, and the protection scope of the present invention should be subject to the scope defined by the claims. It will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the spirit and scope of the invention, and these modifications and adaptations should be considered within the scope of the invention.

Claims (10)

1. A method for safe driving behavior analysis based on facial features, comprising:
s1: performing pixel-level face localization on face images of various scales by jointly using supervised and self-supervised multi-task learning, so as to locate the face position of a candidate target driver;
s2: according to the recognized driver face image, calibrating the face image with a 3D face key point detection module to obtain a plurality of face key points and a head pose used for detecting facial behavior features;
s3: extracting the plurality of face key points and the head pose information obtained in S2 and combining them into an image feature vector; inputting the image feature vector of each frame into a convolutional neural network, which outputs a state feature vector; inputting the state feature vector into a BiLSTM, and judging the driving state of the driver in real time;
s4: firstly generating a series of coarse-grained candidate box information with an RPN, then classifying and regressing it to obtain more accurate candidate box information, and applying a feature fusion operation to the target detection network to obtain a behavior detection result.
2. The method for safe driving behavior analysis based on facial features as claimed in claim 1, wherein S1 is specifically:
s1.1: scaling the face image to 300 x 300 pixels;
s1.2: inputting the scaled face image into a convolutional neural network, extracting face features and feeding them into a feature pyramid to obtain a calibrated face box with feature points, and computing the predicted values of the calibrated face box, which comprise a classification predicted value, a bounding box regression value and a feature point regression value;
s1.3: computing the intersection-over-union (IoU) between the calibrated face box and all preset candidate face boxes, and taking the candidate face box with the largest IoU as the effective face box;
s1.4: computing the feature point loss, the bounding box loss and the classification loss of the calibrated face box from the calibrated face box and the selected effective face box.
3. The method for safe driving behavior analysis based on facial features as claimed in claim 2, wherein S2 is specifically:
recognizing the driver's face image based on the effective face box, obtaining through model training a series of feature maps containing face key points, and finally outputting the face key points based on the facial features and face contour together with the head pose, wherein the head pose comprises an azimuth (yaw) angle, a pitch angle and a roll angle;
the error between the predicted values and the true values is then computed by a loss function, whose formula is:

$$\min_{w} \sum_{i=1}^{N} l\big(y_i, f(x_i; w)\big) + \Phi(w)$$

wherein, taking the L2 loss,

$$l\big(y_i, f(x_i; w)\big) = \big\|y_i - f(x_i; w)\big\|_2^2$$

where $l(y_i, f(x_i; w))$ is the loss function, $f(x_i; w)$ represents the network's predicted positions of the facial features and face contour, $y_i$ represents their real positions, i.e. the true values, and $\Phi(w)$ is a regularization term on the parameter $w$ that constrains the coefficients.
4. The method for safe driving behavior analysis based on facial features as claimed in claim 1, wherein in S3 the driving states include attentive driving, fatigue driving, and looking left and right.
5. The method for safe driving behavior analysis based on facial features as claimed in claim 1, wherein S4 is specifically:
s4.1: firstly, a series of coarse-grained candidate box information is generated by an RPN (region proposal network), the candidate box information is then classified and regressed, and a feature fusion operation is adopted in the target detection network; the loss function adopted by the whole target detection network is:

$$L(x, c, l, g) = \frac{1}{N}\Big(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\Big)$$

the loss function is defined as the weighted sum of the position error and the confidence error, wherein the weight coefficient $\alpha$ is set to 1 by cross validation and $N$ is the number of positive prior-box samples; $x_{ij}^{p} \in \{0, 1\}$ is an indicator parameter: $x_{ij}^{p} = 1$ indicates that the $i$-th prior box is matched to the $j$-th ground truth, whose class is $p$; $c$ is the category confidence predicted value, $l$ is the predicted position of the bounding box corresponding to the prior box, and $g$ is the position parameter of the ground truth;
s4.2: for the position error, the following formula is used:

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\big(l_i^{m} - \hat{g}_j^{m}\big)$$

wherein $b = \{cx, cy, w, h\}$ and

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \qquad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \qquad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \qquad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}$$

$i$ denotes the index of each anchor box that is a positive sample (pos) in the training batch, $g$ denotes the ground-truth box, and $d$ denotes the prior (default) box;
s4.3: for the confidence error, the softmax cross-entropy form is used:

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\big(\hat{c}_i^{p}\big) - \sum_{i \in Neg} \log\big(\hat{c}_i^{0}\big), \qquad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p}\exp(c_i^{p})}$$

wherein $x_{ij}^{p}$ indicates the true category of each predicted detection box and $\hat{c}_i^{p}$ is the network's predicted probability for category $p$;
s4.4: when performing driving behavior prediction, first determine the category and confidence value of the driving behavior from the category confidence and filter out prediction boxes belonging to the background; then filter out prediction boxes below the set confidence threshold; decode the remaining prediction boxes, sort them in descending order of confidence value, obtain the real position parameters of each prediction box from the prior boxes, keep the top-k prediction boxes, and filter out overlapping prediction boxes with the NMS algorithm; the remaining prediction boxes are the behavior detection result.
6. A system for safe driving behavior analysis based on facial features, comprising:
a face detection module: performing pixel-level face localization on face images of various scales by jointly using supervised and self-supervised multi-task learning, so as to locate the face position of a candidate target driver;
a face key point detection module: according to the recognized driver face image, calibrating the face image with a 3D face key point detection module to obtain a plurality of face key points and a head pose used for detecting facial behavior features;
a fatigue detection module: extracting the plurality of face key points and the head pose information obtained by the face key point detection module and combining them into an image feature vector; inputting the image feature vector of each frame into a convolutional neural network, which outputs a state feature vector; inputting the state feature vector into a BiLSTM, and judging the driving state of the driver in real time;
a behavior detection module: firstly generating a series of coarse-grained candidate box information with an RPN, then classifying and regressing it to obtain more accurate candidate box information, and applying a feature fusion operation to the target detection network to obtain a behavior detection result.
7. The system for safe driving behavior analysis based on facial features as claimed in claim 6, wherein the face detection module locates the driver's face specifically as follows:
step 1: scaling the face image to 300 x 300 pixels;
step 2: inputting the scaled face image into a convolutional neural network, extracting face features and feeding them into a feature pyramid to obtain a calibrated face box with feature points, and computing the predicted values of the calibrated face box, which comprise a classification predicted value, a bounding box regression value and a feature point regression value;
step 3: computing the intersection-over-union (IoU) between the calibrated face box and all preset candidate face boxes, and taking the candidate face box with the largest IoU as the effective face box;
step 4: computing the feature point loss, the bounding box loss and the classification loss of the calibrated face box from the calibrated face box and the selected effective face box.
8. The system for safe driving behavior analysis based on facial features as claimed in claim 7, wherein the face key point detection module is specifically configured to:
recognize the driver's face image based on the effective face box, obtain through model training a series of feature maps containing face key points, and finally output a plurality of face key points based on the facial features and face contour together with the head pose, wherein the head pose comprises an azimuth (yaw) angle, a pitch angle and a roll angle;
the error between the predicted values and the true values is then computed by a loss function, whose formula is:

$$\min_{w} \sum_{i=1}^{N} l\big(y_i, f(x_i; w)\big) + \Phi(w)$$

wherein, taking the L2 loss,

$$l\big(y_i, f(x_i; w)\big) = \big\|y_i - f(x_i; w)\big\|_2^2$$

where $l(y_i, f(x_i; w))$ is the loss function, $f(x_i; w)$ represents the network's predicted positions of the facial features and face contour, $y_i$ represents their real positions, i.e. the true values, and $\Phi(w)$ is a regularization term on the parameter $w$ that constrains the coefficients.
9. The system for safe driving behavior analysis based on facial features as claimed in claim 6, wherein in the fatigue detection module the driving states include attentive driving, fatigue driving, and looking left and right.
10. The system for safe driving behavior analysis based on facial features as claimed in claim 6, wherein the behavior detection module detects the driver's behavior specifically as:
firstly, a series of coarse-grained candidate box information is generated by the RPN, the candidate box information is then classified and regressed, and a feature fusion operation is adopted in the target detection network; the loss function adopted by the whole target detection network is:

$$L(x, c, l, g) = \frac{1}{N}\Big(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\Big)$$

the loss function is defined as the weighted sum of the position error and the confidence error, wherein the weight coefficient $\alpha$ is set to 1 by cross validation and $N$ is the number of positive prior-box samples; $x_{ij}^{p} \in \{0, 1\}$ is an indicator parameter: $x_{ij}^{p} = 1$ indicates that the $i$-th prior box is matched to the $j$-th ground truth, whose class is $p$; $c$ is the category confidence predicted value, $l$ is the predicted position of the bounding box corresponding to the prior box, and $g$ is the position parameter of the ground truth;
for the position error, the following formula is used:

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\big(l_i^{m} - \hat{g}_j^{m}\big)$$

wherein $b = \{cx, cy, w, h\}$ and

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \qquad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \qquad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \qquad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}$$

$i$ denotes the index of each anchor box that is a positive sample (pos) in the training batch, $g$ denotes the ground-truth box, and $d$ denotes the prior (default) box;
for the confidence error, the softmax cross-entropy form is used:

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\big(\hat{c}_i^{p}\big) - \sum_{i \in Neg} \log\big(\hat{c}_i^{0}\big), \qquad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p}\exp(c_i^{p})}$$

wherein $x_{ij}^{p}$ indicates the true category of each predicted detection box and $\hat{c}_i^{p}$ is the network's predicted probability for category $p$;
when performing driving behavior prediction, first determine the category and confidence value of the driving behavior from the category confidence and filter out prediction boxes belonging to the background; then filter out prediction boxes below the set confidence threshold; decode the remaining prediction boxes, sort them in descending order of confidence value, obtain the real position parameters of each prediction box from the prior boxes, keep the top-k prediction boxes, and filter out overlapping prediction boxes with the NMS algorithm; the remaining prediction boxes are the behavior detection result.
CN202210041064.0A 2022-01-14 2022-01-14 Method and system for analyzing safe driving behavior based on facial features Pending CN114792437A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210041064.0A CN114792437A (en) 2022-01-14 2022-01-14 Method and system for analyzing safe driving behavior based on facial features


Publications (1)

Publication Number Publication Date
CN114792437A true CN114792437A (en) 2022-07-26

Family

ID=82460721


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117541865A (en) * 2023-11-14 2024-02-09 China University of Mining and Technology Identity analysis and mobile phone use detection method based on coarse-granularity depth estimation
CN117541865B (en) * 2023-11-14 2024-06-04 China University of Mining and Technology Identity analysis and mobile phone use detection method based on coarse-granularity depth estimation


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination