CN114792437A - Method and system for analyzing safe driving behavior based on facial features - Google Patents
Method and system for analyzing safe driving behavior based on facial features
- Publication number
- CN114792437A (application CN202210041064.0A)
- Authority
- CN
- China
- Prior art keywords
- face
- frame
- prediction
- confidence
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Abstract
The invention discloses a method and a system for analyzing safe driving behavior based on facial features, belonging to the technical field of safe driving. A face detection module identifies the face position of the candidate target driver; based on the obtained face image, a face key point detection module locates the 3D key point positions of the face; the results are then sent to a behavior detection module and a fatigue detection module according to the different located positions; finally, the driver's behaviors are recognized and fatigue is classified in the different detection modules. The purpose is to analyze driving behavior while guaranteeing both real-time performance and accuracy.
Description
Technical Field
The invention belongs to the technical field of safe driving, and particularly relates to a safe driving behavior analysis method and system based on facial features.
Background
With the rapid development of the economy, automobiles have become widely popularized. On many long-distance trips drivers must drive for hours or overnight and easily fall into a fatigue state; when a driver takes a call or smokes, distraction becomes even more likely, and traffic accidents easily result. It is therefore currently necessary to monitor the behavior and fatigue state of the driver and to issue reminders.
In the prior art, the driving behavior of a driver is monitored by a monitor: for example, when the driver engages in behavior that interferes with driving (such as smoking or making a call) or is in a fatigue state (such as blinking or yawning incessantly), the system must detect and analyze the driver's current behavior or degree of fatigue in time and then issue a corresponding warning to the driver.
The prior art has the following problems:
Existing driving behavior analysis can recognize some basic behaviors, but as the image background becomes complex the recognized object is often not well distinguished from the background, leading to failures. Although deep learning methods can improve the accuracy of action recognition, the network scale of deep learning models is excessive, so the model parameters and the amount of computation are too large, and real-time detection is difficult to achieve when such models run on ordinarily configured hardware. Meanwhile, because masks must be worn during an epidemic, the detection of face key points is greatly affected.
Disclosure of Invention
In view of the above problems in the prior art, namely that the recognized object cannot be well distinguished from an increasingly complex image background and recognition therefore often fails; that, although deep learning methods can improve the accuracy of motion recognition, the excessive network scale of deep learning models makes the model parameters and computation too large for real-time detection on ordinarily configured hardware; and that mask-wearing during an epidemic greatly affects the detection of face key points, the invention provides a method and system for safe driving behavior analysis based on facial features, whose purpose is to analyze driving behavior while guaranteeing both real-time performance and accuracy.
To achieve this purpose, the invention adopts the following technical scheme. Provided are a method and a system for safe driving behavior analysis based on facial features, the method comprising:
s1: performing pixel-level face localization on face images of various scales by means of joint supervised and self-supervised multi-task learning, to locate the face position of the candidate target driver;
s2: according to the recognized face image of the driver, calibrating the face image with a 3D face key point detection module to obtain a plurality of face key points and the head posture used for detecting the behavior characteristics of the face;
s3: extracting the plurality of pieces of face key point information and head posture information obtained in S2 and combining them into a specific image feature vector; the image feature vector of each frame is input into a convolutional neural network, which outputs a state feature vector; the state feature vector is then input into a BiLSTM, and the driving state of the driver is judged in real time;
s4: first generating a series of coarse-grained candidate frame information with an RPN, then classifying and regressing it so as to further regress more accurate candidate frame information, and then applying a feature fusion operation in the target detection network to obtain the behavior detection result (a minimal orchestration sketch of these four stages follows this list).
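As an illustrative aid only (not part of the claimed method), the four stages can be read as the following minimal Python sketch; all class and argument names here are hypothetical placeholders, and each stage is assumed to be a callable supplied elsewhere:

```python
# Illustrative orchestration of stages S1-S4. All identifiers here are
# hypothetical placeholders, not names from the patent.

class DrivingBehaviorAnalyzer:
    def __init__(self, face_detector, landmark_detector, fatigue_net, behavior_net):
        self.face_detector = face_detector          # S1: pixel-level face localization
        self.landmark_detector = landmark_detector  # S2: 3D key points + head posture
        self.fatigue_net = fatigue_net              # S3: CNN + BiLSTM driving state
        self.behavior_net = behavior_net            # S4: RPN-based behavior detection

    def analyze(self, frame):
        face_frame = self.face_detector(frame)                            # S1
        keypoints, head_pose = self.landmark_detector(frame, face_frame)  # S2
        driving_state = self.fatigue_net(keypoints, head_pose)            # S3
        behaviors = self.behavior_net(frame, face_frame)                  # S4
        return driving_state, behaviors
```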
Preferably, S1 in the present invention specifically includes:
s1.1: scaling the input face image to 300 x 300 pixels;
s1.2: then inputting the face image into a convolutional neural network, extracting face features and feeding them into a feature pyramid to obtain a calibrated face frame with feature points, and calculating predicted values for the calibrated face frame, comprising a classification predicted value, a bounding-frame regression value and a feature-point regression value;
s1.3: calculating the intersection-over-union (IoU) ratio between the calibrated face frame and all preset candidate face frames, and taking the candidate face frame with the maximum IoU as the effective face frame (see the sketch following this list);
s1.4: calculating the feature-point loss, bounding-frame loss and classification loss of the calibrated face frame from the calibrated face frame and the selected effective face frame.
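A minimal sketch of the S1.3 selection step, assuming frames are given as (x1, y1, x2, y2) corner coordinates (the coordinate convention is an assumption of this sketch, not stated in the patent):

```python
def iou(frame_a, frame_b):
    """Intersection-over-union of two frames given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(frame_a[0], frame_b[0]), max(frame_a[1], frame_b[1])
    ix2, iy2 = min(frame_a[2], frame_b[2]), min(frame_a[3], frame_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (frame_a[2] - frame_a[0]) * (frame_a[3] - frame_a[1])
    area_b = (frame_b[2] - frame_b[0]) * (frame_b[3] - frame_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def select_effective_face_frame(calibrated_frame, candidate_frames):
    """S1.3: take the preset candidate frame with the highest IoU
    against the calibrated face frame as the effective face frame."""
    return max(candidate_frames, key=lambda c: iou(calibrated_frame, c))
```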
Preferably, S2 in the present invention specifically includes:
recognizing a face image of a driver based on an effective face frame, then obtaining a series of feature maps containing face key points through model training, and finally outputting a plurality of face key points based on five sense organs and face contours and a head posture, wherein the head posture comprises an azimuth angle, a pitch angle and a roll angle;
The error between the predicted values and the true values is then calculated by a loss function of the form

min_w Σ_i l(y_i, f(x_i; w)) + Φ(w)

where l(y_i, f(x_i; w)) is the per-sample loss and the loss function L takes the L2 loss; f(x_i; w) represents the network's predicted positions of the five sense organs and the facial contour; y_i represents their real positions, i.e. the true values; and Φ(w) is a regularization term on the parameter w used to constrain the coefficients.
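A minimal sketch of this objective in PyTorch style; the weight-decay value here is an illustrative assumption, not a value from the patent:

```python
import torch

def landmark_loss(pred, target, params, weight_decay=1e-4):
    """L2 landmark loss plus a regularization term Phi(w) on the weights."""
    l2 = torch.sum((pred - target) ** 2)  # l(y_i, f(x_i; w)) with the L2 loss
    reg = weight_decay * sum(torch.sum(p ** 2) for p in params)  # Phi(w)
    return l2 + reg
```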
Preferably, in S3 of the present invention, the driving states include attentive driving, fatigue driving, and looking around (glancing left and right).
Preferably, the S4 of the present invention is specifically:
s4.1: firstly, a series of coarse-grained candidate frame information is generated through the RPN, then the candidate frame information is classified and regressed, and a feature fusion operation is adopted in the target detection network; the loss function adopted by the whole target detection network is

L(x, c, l, g) = (1/N) (L_conf(x, c) + α L_loc(x, l, g))

i.e. a weighted sum of the position error L_loc and the confidence error L_conf, where the weight coefficient α is set to 1 through cross-validation and N is the number of positive prior-frame samples; x_ij^p ∈ {0, 1} is an indicator parameter, and x_ij^p = 1 indicates that the i-th prior frame is matched to the j-th ground truth whose class is p; c is the class-confidence predicted value, l is the predicted position of the bounding frame corresponding to the prior frame, and g is the ground-truth position parameter;
s4.2: the position error is defined by the following formula:

L_loc(x, l, g) = Σ_{i∈Pos} Σ_{b∈{cx,cy,w,h}} x_ij^k · smooth_L1(l_i^b − ĝ_j^b)

where b ranges over {cx, cy, w, h}, i denotes the anchor-frame index of each positive sample (Pos) in the training batch, g denotes the ground-truth frame, and d denotes the prior (default) frame from which the regression target ĝ is encoded;
s4.3: the confidence error is defined by the following formula:

L_conf(x, c) = − Σ_{i∈Pos} x_ij^p log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0)

where x_ij^p marks the true category of each prediction detection frame and ĉ_i^p is the network's predicted category probability;
s4.4: when predicting driving behavior, first determine the category and confidence value of each prediction frame according to the category confidence, and filter out prediction frames belonging to the background; then filter out prediction frames below the set confidence threshold; decode the remaining prediction frames, sort them in descending order of confidence value, obtain the real position parameters of each prediction frame from the prior frames, retain the top-k prediction frames, filter out overlapping prediction frames with the NMS algorithm, and take the remaining prediction frames as the behavior detection result.
The invention also provides a system for analyzing safe driving behavior based on facial features, which comprises:
a face detection module: performing pixel-level face localization on face images of various scales by means of joint supervised and self-supervised multi-task learning, to locate the face position of the candidate target driver;
face key point detection module: according to the recognized face image of the driver, a 3D face key point detection module is used for calibrating the face image to obtain a plurality of face key points and head postures for detecting the behavior characteristics of the face;
a fatigue detection module: extracting the plurality of pieces of face key point information and head posture information obtained by the face key point detection module and combining them into a specific image feature vector; the image feature vector of each frame is input into a convolutional neural network, which outputs a state feature vector; the state feature vector is then input into a BiLSTM, and the driving state of the driver is judged in real time;
a behavior detection module: first generating a series of coarse-grained candidate frame information with an RPN, then classifying and regressing it so as to further regress more accurate candidate frame information, and then applying a feature fusion operation in the target detection network to obtain the behavior detection result.
Preferably, the face detection module of the present invention specifically positions the face of the driver:
step 1: scaling the input face image to 300 x 300 pixels;
and 2, step: then inputting the face scale image into a convolutional neural network, extracting face features and inputting the face features into a feature pyramid to obtain a calibrated face frame with feature points, and calculating a predicted value of the calibrated face frame, wherein the predicted value comprises a classification predicted value, a boundary frame regression value and a feature point regression value;
and step 3: calculating the intersection-over-union (IoU) ratio between the calibrated face frame and all preset candidate face frames, and taking the candidate face frame with the maximum IoU as the effective face frame;
and 4, step 4: and calculating the loss of the characteristic points, the loss of the boundary frame and the classification loss of the calibrated face frame through the calibrated face frame and the selected effective face frame.
Preferably, the face key point detection module of the present invention specifically comprises:
recognizing a face image of a driver based on an effective face frame, then obtaining a series of feature maps containing face key points through model training, and finally outputting a plurality of face key points based on five sense organs and face contours and a head posture, wherein the head posture comprises an azimuth angle, a pitch angle and a roll angle;
The error between the predicted values and the true values is then calculated by a loss function of the form

min_w Σ_i l(y_i, f(x_i; w)) + Φ(w)

where l(y_i, f(x_i; w)) is the per-sample loss and the loss function L takes the L2 loss; f(x_i; w) represents the network's predicted positions of the five sense organs and the facial contour; y_i represents their real positions, i.e. the true values; and Φ(w) is a regularization term on the parameter w used to constrain the coefficients.
Preferably, in the fatigue detection module of the present invention, the driving states include attentive driving, fatigue driving, and looking around (glancing left and right).
Preferably, the behavior detection module of the invention specifically detects the behavior of the driver as follows:
firstly, a series of coarse-grained candidate frame information is generated through the RPN, then the candidate frame information is classified and regressed, and a feature fusion operation is adopted in the target detection network; the loss function adopted by the whole target detection network is

L(x, c, l, g) = (1/N) (L_conf(x, c) + α L_loc(x, l, g))

i.e. a weighted sum of the position error L_loc and the confidence error L_conf, where the weight coefficient α is set to 1 through cross-validation and N is the number of positive prior-frame samples; x_ij^p ∈ {0, 1} is an indicator parameter, and x_ij^p = 1 indicates that the i-th prior frame is matched to the j-th ground truth whose class is p; c is the class-confidence predicted value, l is the predicted position of the bounding frame corresponding to the prior frame, and g is the ground-truth position parameter;
the position error is defined by the following formula:

L_loc(x, l, g) = Σ_{i∈Pos} Σ_{b∈{cx,cy,w,h}} x_ij^k · smooth_L1(l_i^b − ĝ_j^b)

where b ranges over {cx, cy, w, h}, i denotes the anchor-frame index of each positive sample (Pos) in the training batch, g denotes the ground-truth frame, and d denotes the prior (default) frame from which the regression target ĝ is encoded;
the confidence error is defined by the following formula:

L_conf(x, c) = − Σ_{i∈Pos} x_ij^p log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0)

where x_ij^p marks the true category of each prediction detection frame and ĉ_i^p is the network's predicted category probability;
when predicting driving behavior, first determine the category and confidence value of each prediction frame according to the category confidence, and filter out prediction frames belonging to the background; then filter out prediction frames below the set confidence threshold; decode the remaining prediction frames, sort them in descending order of confidence value, obtain the real position parameters of each prediction frame from the prior frames, retain the top-k prediction frames, filter out overlapping prediction frames with the NMS algorithm, and take the remaining prediction frames as the behavior detection result.
Compared with the prior art, the technical scheme of the invention has the following advantages/beneficial effects:
1. The method disclosed by the invention uses 3D key point detection to compensate for certain shortcomings of 2D key point detection in practical application, such as low recognition accuracy and low liveness-detection accuracy.
2. In terms of fatigue detection, the fatigue state can be effectively judged by combining the face key point information with the pitch-angle information of the head posture in the designed convolutional neural network.
3. In the behavior detection module, the invention uses the RPN to generate a series of coarse-grained candidate frame information, then classifies and regresses it so as to further regress more accurate frame information, and adopts a feature fusion operation in the target detection network, thereby effectively improving the detection effect on small targets.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a schematic flow chart of example 1 of the present invention.
Fig. 2 is a schematic diagram showing the detection of the key points in embodiment 1 of the present invention.
Fig. 3 is a schematic diagram of a key point face in embodiment 1 of the present invention.
Fig. 4 is a flowchart of the fatigue detection of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention are clearly and completely described below, and it is obvious that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive efforts based on the embodiments of the present invention, are within the scope of protection of the present invention. Thus, the detailed description of the embodiments of the present invention provided below is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention.
Example 1:
as shown in fig. 1, fig. 2, fig. 3 and fig. 4, the present invention provides a method and a system for analyzing safe driving behavior based on facial features, comprising:
s1: performing pixel-level face localization on face images of various scales by means of joint supervised and self-supervised multi-task learning, to locate the face position of the candidate target driver; S1 specifically includes:
s1.1: scaling the input face image to 300 x 300 pixels;
s1.2: then inputting the face scale image into a convolutional neural network, extracting face features and inputting the face features into a feature pyramid to obtain a calibrated face frame with feature points, and calculating a predicted value of the calibrated face frame, wherein the predicted value comprises a classification predicted value, a boundary frame regression value and a feature point regression value;
s1.3: calculating the intersection-over-union (IoU) ratio between the calibrated face frame and all preset candidate face frames, and taking the candidate face frame with the maximum IoU as the effective face frame;
s1.4: and calculating the loss of the characteristic points, the loss of the boundary frame and the classification loss of the calibrated face frame through the calibrated face frame and the selected effective face frame.
S2: according to the recognized face image of the driver, calibrating it with the 3D face key point detection module to obtain, as shown in figure 2, 68 facial key points and the head posture used for detecting the behavior characteristics of the face; S2 specifically includes:
recognizing the face image of the driver based on the effective face frame, then obtaining a series of feature maps containing face key points through model training, and finally outputting 68 face key points based on the five sense organs and the facial contour, together with the head posture, which comprises an azimuth angle, a pitch angle and a roll angle. As shown in fig. 3, several important key points are, by index: the nose tip is 31, the nose root 28, the chin 9, the left eye outer corner 37, the left eye inner corner 40, the right eye inner corner 43, the right eye outer corner 46, the mouth center 67, the right mouth corner 55, the left side of the face 1, and the right side of the face 17 (collected in the mapping below).
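For reference, a plain Python mapping of the indices named above; the name strings are descriptive labels chosen here, not identifiers from the patent:

```python
# 1-based indices of the key landmarks named above, in the 68-point scheme.
KEY_LANDMARKS = {
    "nose_tip": 31, "nose_root": 28, "chin": 9,
    "left_eye_outer_corner": 37, "left_eye_inner_corner": 40,
    "right_eye_inner_corner": 43, "right_eye_outer_corner": 46,
    "mouth_center": 67, "mouth_right_corner": 55,
    "left_face": 1, "right_face": 17,
}
```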
The error between the predicted values and the true values is then calculated by a loss function of the form

min_w Σ_i l(y_i, f(x_i; w)) + Φ(w)

where l(y_i, f(x_i; w)) is the per-sample loss and the loss function L takes the L2 loss; f(x_i; w) represents the network's predicted positions of the five sense organs and the facial contour; y_i represents their real positions, i.e. the true values; and Φ(w) is a regularization term on the parameter w used to constrain the coefficients.
The true values in this example 1 represent the true locations of the face's key points, which are the five sense organs and the facial contours.
S3: extracting the 68 pieces of face key point information and the head posture information obtained in S2 and combining them into a 68 x 6 image feature vector; the image feature vector of each frame is input into a convolutional neural network, which outputs a state feature vector; the state feature vector is then input into a BiLSTM, and the driving state of the driver is judged in real time (a model sketch follows below). The driving states include attentive driving, fatigue driving, and looking around (glancing left and right). For the case of judging the driving state of a driver wearing a mask, example 1 judges the driver's fatigue level from expressions such as incessant blinking or yawning.
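A minimal PyTorch-style sketch of this per-frame CNN plus BiLSTM pipeline; the channel layout of the 68 x 6 feature vector, the hidden sizes and the layer counts are illustrative assumptions, not values from the patent:

```python
import torch
import torch.nn as nn

class FatigueClassifier(nn.Module):
    """S3 sketch: a per-frame CNN encodes the 68 x 6 key point / head
    posture features; a BiLSTM aggregates the per-frame state vectors."""

    def __init__(self, num_states=3, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(6, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_states)  # attentive / fatigued / looking around

    def forward(self, x):               # x: (batch, time, 6, 68)
        b, t, c, k = x.shape
        feats = self.cnn(x.reshape(b * t, c, k)).squeeze(-1)  # (b*t, 64) state features
        seq, _ = self.bilstm(feats.reshape(b, t, 64))         # (b, t, 2*hidden)
        return self.head(seq[:, -1])    # driving-state scores for the latest frame
```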
S4: first, an RPN is used to generate a series of coarse-grained candidate frame information, which is then classified and regressed so as to further regress more accurate candidate frame information; a feature fusion operation is then applied in the target detection network to obtain the behavior detection result. S4 specifically includes:
s4.1: firstly, a series of coarse-grained candidate frame information is generated through the RPN, then the candidate frame information is classified and regressed, and a feature fusion operation is adopted in the target detection network, effectively improving the detection effect on small targets; the loss function adopted by the whole target detection network is

L(x, c, l, g) = (1/N) (L_conf(x, c) + α L_loc(x, l, g))

i.e. a weighted sum of the position error L_loc and the confidence error L_conf, where the weight coefficient α is set to 1 through cross-validation and N is the number of positive prior-frame samples; x_ij^p ∈ {0, 1} is an indicator parameter, and x_ij^p = 1 indicates that the i-th prior frame is matched to the j-th ground truth whose class is p; c is the class-confidence predicted value, l is the predicted position of the bounding frame corresponding to the prior frame, and g is the ground-truth position parameter (the weighting is sketched in code below);
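A one-line sketch of how the two error terms combine, under the α = 1 and N-normalization just described:

```python
def detection_loss(conf_loss, loc_loss, num_pos, alpha=1.0):
    """Total loss: weighted sum of confidence and position errors,
    normalized by the number N of positive prior-frame samples
    (alpha = 1 per the cross-validation described above)."""
    return (conf_loss + alpha * loc_loss) / max(num_pos, 1)
```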
s4.2: the position error is defined by the following formula (the invention optimizes the position using the Smooth L1 loss):

L_loc(x, l, g) = Σ_{i∈Pos} Σ_{b∈{cx,cy,w,h}} x_ij^k · smooth_L1(l_i^b − ĝ_j^b)

where b ranges over {cx, cy, w, h} and i denotes the anchor-frame index of each positive sample (Pos) in the training batch; each detected frame is optimized by comparing its predicted position against the true position b (the center coordinates cx, cy and the width w and height h of the frame).

The Smooth L1 function is in fact piecewise:

smooth_L1(x) = 0.5 x² if |x| < 1, and |x| − 0.5 otherwise

so it behaves as an L2 loss when the input x lies in [-1, 1], which removes the break point of L1 at 0, and as an L1 loss when x lies outside [-1, 1], which solves the problem of exploding gradients caused by outliers (a direct transcription in code follows).
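The piecewise definition above transcribes directly:

```python
def smooth_l1(x):
    """Piecewise Smooth L1: L2-like inside [-1, 1], L1-like outside."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5
```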
s4.3: the confidence error is defined by the following formula (classification is optimized with the cross-entropy loss):

L_conf(x, c) = − Σ_{i∈Pos} x_ij^p log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0)

where the cross-entropy loss function optimizes each detection frame i over the positive samples (Pos) and negative samples (Neg); x_ij^p marks the true category of each prediction detection frame, and ĉ_i^p is the network's predicted category probability, normalized to [0, 1] by a softmax (a sketch follows below);
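A minimal sketch of this confidence term, assuming per-frame class logits and that class index 0 denotes the background (an assumption of this sketch):

```python
import torch
import torch.nn.functional as F

def confidence_loss(logits_pos, labels_pos, logits_neg):
    """Softmax cross-entropy: positive frames against their true class,
    negative frames against the background class (index 0 by assumption)."""
    loss_pos = F.cross_entropy(logits_pos, labels_pos, reduction="sum")
    background = torch.zeros(len(logits_neg), dtype=torch.long,
                             device=logits_neg.device)
    loss_neg = F.cross_entropy(logits_neg, background, reduction="sum")
    return loss_pos + loss_neg
```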
s4.4: when predicting driving behavior, first determine the category and confidence value of each prediction frame according to the category confidence, and filter out prediction frames belonging to the background; then filter out prediction frames below the set confidence threshold; decode the remaining prediction frames and clip them after decoding to prevent prediction-frame positions from exceeding the picture; sort them in descending order of confidence value, obtain the real position parameters of each prediction frame from the prior frames, then retain 400 prediction frames, filter out overlapping prediction frames with the NMS algorithm, and take the remaining prediction frames as the behavior detection result (sketched below).
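A minimal post-processing sketch of this step using torchvision's NMS; the confidence and IoU thresholds are illustrative assumptions (only the retained count of 400 comes from the text), and decoding/clipping is assumed to have been done beforehand:

```python
import torch
from torchvision.ops import nms

def filter_predictions(boxes, scores, labels, conf_thresh=0.5,
                       top_k=400, iou_thresh=0.45):
    """S4.4 sketch: drop low-confidence frames, keep the top-k by
    confidence, then suppress overlapping frames with NMS."""
    keep = scores > conf_thresh                      # confidence-threshold filter
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    order = scores.argsort(descending=True)[:top_k]  # descending sort, keep 400
    boxes, scores, labels = boxes[order], scores[order], labels[order]
    keep = nms(boxes, scores, iou_thresh)            # NMS removes overlaps
    return boxes[keep], scores[keep], labels[keep]
```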
The invention also provides a system for analyzing safe driving behavior based on facial features, which comprises:
a face detection module: performing pixel-level face localization on face images of various scales by means of joint supervised and self-supervised multi-task learning, to locate the face position of the candidate target driver; the face detection module locates the driver's face specifically as follows:
step 1: scaling the input face image to 300 x 300 pixels;
step 2: then inputting the face scale image into a convolutional neural network, extracting face features and inputting the face features into a feature pyramid to obtain a calibrated face frame with feature points, and calculating a predicted value of the calibrated face frame, wherein the predicted value comprises a classification predicted value, a boundary frame regression value and a feature point regression value;
and step 3: calculating the intersection-over-union (IoU) ratio between the calibrated face frame and all preset candidate face frames, and taking the candidate face frame with the maximum IoU as the effective face frame;
and 4, step 4: and calculating the loss of the characteristic points, the loss of the boundary frame and the classification loss of the calibrated face frame through the calibrated face frame and the selected effective face frame.
a face key point detection module: according to the recognized face image of the driver, calibrating it with the 3D face key point detection module to obtain 68 facial key points and the head posture used for detecting the facial behavior characteristics; the face key point detection module specifically comprises:
recognizing a face image of a driver based on an effective face frame, then obtaining a series of feature maps containing face key points through model training, and finally outputting 68 face key points based on five sense organs and face contours and a head posture, wherein the head posture comprises an azimuth angle, a pitch angle and a roll angle;
The error between the predicted values and the true values is then calculated by a loss function of the form

min_w Σ_i l(y_i, f(x_i; w)) + Φ(w)

where l(y_i, f(x_i; w)) is the per-sample loss and the loss function L takes the L2 loss; f(x_i; w) represents the network's predicted positions of the five sense organs and the facial contour; y_i represents their real positions, i.e. the true values; and Φ(w) is a regularization term on the parameter w used to constrain the coefficients.
a fatigue detection module: as shown in fig. 4, the 68 pieces of face key point information and the head posture information obtained by the face key point detection module are extracted and combined into a 68 x 6 image feature vector; the image feature vector of each frame is input into the convolutional neural network, which outputs a state feature vector; the state feature vector is then input into the BiLSTM, and the driving state of the driver is judged in real time; the driving states include attentive driving, fatigue driving, and looking around (glancing left and right).
a behavior detection module: first, an RPN generates a series of coarse-grained candidate frame information, which is then classified and regressed so as to further regress more accurate candidate frame information; a feature fusion operation is then applied in the target detection network to obtain the behavior detection result. The behavior detection module detects the driver's behavior specifically as follows:
firstly, a series of coarse-grained candidate frame information is generated through the RPN, then the candidate frame information is classified and regressed, and a feature fusion operation is adopted in the target detection network; the loss function adopted by the whole target detection network is

L(x, c, l, g) = (1/N) (L_conf(x, c) + α L_loc(x, l, g))

i.e. a weighted sum of the position error L_loc and the confidence error L_conf, where the weight coefficient α is set to 1 through cross-validation and N is the number of positive prior-frame samples; x_ij^p ∈ {0, 1} is an indicator parameter, and x_ij^p = 1 indicates that the i-th prior frame is matched to the j-th ground truth whose class is p; c is the class-confidence predicted value, l is the predicted position of the bounding frame corresponding to the prior frame, and g is the ground-truth position parameter;
the position error is defined by the following formula:

L_loc(x, l, g) = Σ_{i∈Pos} Σ_{b∈{cx,cy,w,h}} x_ij^k · smooth_L1(l_i^b − ĝ_j^b)

where b ranges over {cx, cy, w, h}, i denotes the anchor-frame index of each positive sample (Pos) in the training batch, g denotes the ground-truth frame, and d denotes the prior (default) frame from which the regression target ĝ is encoded;
the confidence error is defined by the following formula:

L_conf(x, c) = − Σ_{i∈Pos} x_ij^p log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0)

where x_ij^p marks the true category of each prediction detection frame and ĉ_i^p is the network's predicted category probability;
when predicting driving behavior, first determine the category and confidence value of each prediction frame according to the category confidence, and filter out prediction frames belonging to the background; then filter out prediction frames below the set confidence threshold; decode the remaining prediction frames and clip them after decoding to prevent prediction-frame positions from exceeding the picture; sort them in descending order of confidence value, obtain the real position parameters of each prediction frame from the prior frames, then retain 400 prediction frames, filter out overlapping prediction frames with the NMS algorithm, and take the remaining prediction frames as the behavior detection result.
The above is only a preferred embodiment of the present invention, and it should be noted that the above preferred embodiment should not be considered as limiting the present invention, and the protection scope of the present invention should be subject to the scope defined by the claims. It will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the spirit and scope of the invention, and these modifications and adaptations should be considered within the scope of the invention.
Claims (10)
1. A method for safe driving behavior analysis based on facial features, comprising:
s1: performing pixel-level face localization on face images of various scales by means of joint supervised and self-supervised multi-task learning, to locate the face position of the candidate target driver;
s2: according to the recognized face image of the driver, a 3D face key point detection module is used for calibrating the face image to obtain a plurality of face key points and head postures used for detecting the behavior characteristics of the face;
s3: extracting the plurality of pieces of face key point information and head posture information obtained in S2 and combining them into a specific image feature vector; the image feature vector of each frame is input into a convolutional neural network, which outputs a state feature vector; the state feature vector is then input into a BiLSTM, and the driving state of the driver is judged in real time;
s4: first generating a series of coarse-grained candidate frame information with an RPN, then classifying and regressing it so as to further regress more accurate candidate frame information, and then applying a feature fusion operation in the target detection network to obtain the behavior detection result.
2. The method for analyzing safe driving behavior based on facial features as claimed in claim 1, wherein the step S1 is specifically as follows:
s1.1: scaling the input face image to 300 x 300 pixels;
s1.2: then inputting the face scale image into a convolutional neural network, extracting face features and inputting the face features into a feature pyramid to obtain a calibrated face frame with feature points, and calculating a predicted value of the calibrated face frame, wherein the predicted value comprises a classification predicted value, a boundary frame regression value and a feature point regression value;
s1.3: calculating the intersection-over-union (IoU) ratio between the calibrated face frame and all preset candidate face frames, and taking the candidate face frame with the maximum IoU as the effective face frame;
s1.4: and calculating the loss of the characteristic points, the loss of the boundary frame and the classification loss of the calibrated face frame through the calibrated face frame and the selected effective face frame.
3. The method for analyzing safe driving behavior based on facial features as claimed in claim 2, wherein the step S2 is specifically as follows:
recognizing a face image of a driver based on an effective face frame, then obtaining a series of feature maps containing face key points through model training, and finally outputting the face key points based on the five sense organs and the face contour and a head posture, wherein the head posture comprises an azimuth angle, a pitch angle and a roll angle;
The error between the predicted values and the true values is then calculated by a loss function of the form

min_w Σ_i l(y_i, f(x_i; w)) + Φ(w)

where l(y_i, f(x_i; w)) is the per-sample loss and the loss function L takes the L2 loss; f(x_i; w) represents the network's predicted positions of the five sense organs and the facial contour; y_i represents their real positions, i.e. the true values; and Φ(w) is a regularization term on the parameter w used to constrain the coefficients.
4. The method for safe driving behavior analysis based on facial features as claimed in claim 1, wherein in S3 the driving states include attentive driving, fatigue driving, and looking around (glancing left and right).
5. The method for safe driving behavior analysis based on facial features according to claim 1, wherein S4 is specifically:
s4.1: firstly, a series of coarse-grained candidate frame information is generated through the RPN, then the candidate frame information is classified and regressed, and a feature fusion operation is adopted in the target detection network, wherein the loss function adopted by the whole target detection network is

L(x, c, l, g) = (1/N) (L_conf(x, c) + α L_loc(x, l, g))

i.e. a weighted sum of the position error L_loc and the confidence error L_conf, where the weight coefficient α is set to 1 through cross-validation and N is the number of positive prior-frame samples; x_ij^p ∈ {0, 1} is an indicator parameter, and x_ij^p = 1 indicates that the i-th prior frame is matched to the j-th ground truth whose class is p; c is the class-confidence predicted value, l is the predicted position of the bounding frame corresponding to the prior frame, and g is the ground-truth position parameter;
s4.2: the position error is defined by the following formula:

L_loc(x, l, g) = Σ_{i∈Pos} Σ_{b∈{cx,cy,w,h}} x_ij^k · smooth_L1(l_i^b − ĝ_j^b)

where b ranges over {cx, cy, w, h}, i denotes the anchor-frame index of each positive sample (Pos) in the training batch, g denotes the ground-truth frame, and d denotes the prior (default) frame from which the regression target ĝ is encoded;
s4.3: the confidence error is defined by the following formula:

L_conf(x, c) = − Σ_{i∈Pos} x_ij^p log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0)

where x_ij^p marks the true category of each prediction detection frame and ĉ_i^p is the network's predicted category probability;
s4.4: when predicting driving behavior, first determine the category and confidence value of each prediction frame according to the category confidence, and filter out prediction frames belonging to the background; then filter out prediction frames below the set confidence threshold; decode the remaining prediction frames, sort them in descending order of confidence value, obtain the real position parameters of each prediction frame from the prior frames, retain the top-k prediction frames, filter out overlapping prediction frames with the NMS algorithm, and take the remaining prediction frames as the behavior detection result.
6. A system for safe driving behavior analysis based on facial features, comprising:
a face detection module: performing pixel-level face localization on face images of various scales by means of joint supervised and self-supervised multi-task learning, to locate the face position of the candidate target driver;
face key point detection module: according to the recognized face image of the driver, a 3D face key point detection module is used for calibrating the face image to obtain a plurality of face key points and head postures used for detecting the behavior characteristics of the face;
a fatigue detection module: extracting the plurality of pieces of face key point information and head posture information obtained by the face key point detection module and combining them into a specific image feature vector; the image feature vector of each frame is input into a convolutional neural network, which outputs a state feature vector; the state feature vector is then input into a BiLSTM, and the driving state of the driver is judged in real time;
a behavior detection module: first generating a series of coarse-grained candidate frame information with an RPN, then classifying and regressing it so as to further regress more accurate candidate frame information, and then applying a feature fusion operation in the target detection network to obtain the behavior detection result.
7. The system for safe driving behavior analysis based on facial features as claimed in claim 6, wherein the face detection module is specifically configured to locate the face of the driver by:
step 1: scaling the input face image to 300 x 300 pixels;
step 2: then inputting the face scale image into a convolutional neural network, extracting face features and inputting the face features into a feature pyramid to obtain a calibrated face frame with feature points, and calculating a predicted value of the calibrated face frame, wherein the predicted value comprises a classification predicted value, a boundary frame regression value and a feature point regression value;
and step 3: calculating the intersection-over-union (IoU) ratio between the calibrated face frame and all preset candidate face frames, and taking the candidate face frame with the maximum IoU as the effective face frame;
and 4, step 4: and calculating the loss of the characteristic points, the loss of the boundary frame and the classification loss of the calibrated face frame through the calibrated face frame and the selected effective face frame.
8. The system for safe driving behavior analysis based on facial features according to claim 7, wherein the face keypoint detection module is specifically:
recognizing a face image of a driver based on an effective face frame, then obtaining a series of feature maps containing face key points through model training, and finally outputting a plurality of face key points based on five sense organs and face contours and a head posture, wherein the head posture comprises an azimuth angle, a pitch angle and a roll angle;
The error between the predicted values and the true values is then calculated by a loss function of the form

min_w Σ_i l(y_i, f(x_i; w)) + Φ(w)

where l(y_i, f(x_i; w)) is the per-sample loss and the loss function L takes the L2 loss; f(x_i; w) represents the network's predicted positions of the five sense organs and the facial contour; y_i represents their real positions, i.e. the true values; and Φ(w) is a regularization term on the parameter w used to constrain the coefficients.
9. The system for safe driving behavior analysis based on facial features as claimed in claim 6, wherein in the fatigue detection module the driving states include attentive driving, fatigue driving, and looking around (glancing left and right).
10. The system for safe driving behavior analysis based on facial features as claimed in claim 6, wherein the behavior detection module specifically detects the behavior of the driver as:
firstly, a series of coarse-grained candidate frame information is generated through the RPN, then the candidate frame information is classified and regressed, and a feature fusion operation is adopted in the target detection network; the loss function adopted by the whole target detection network is

L(x, c, l, g) = (1/N) (L_conf(x, c) + α L_loc(x, l, g))

i.e. a weighted sum of the position error L_loc and the confidence error L_conf, where the weight coefficient α is set to 1 through cross-validation and N is the number of positive prior-frame samples; x_ij^p ∈ {0, 1} is an indicator parameter, and x_ij^p = 1 indicates that the i-th prior frame is matched to the j-th ground truth whose class is p; c is the class-confidence predicted value, l is the predicted position of the bounding frame corresponding to the prior frame, and g is the ground-truth position parameter;
the position error is defined by the following formula:

L_loc(x, l, g) = Σ_{i∈Pos} Σ_{b∈{cx,cy,w,h}} x_ij^k · smooth_L1(l_i^b − ĝ_j^b)

where b ranges over {cx, cy, w, h}, i denotes the anchor-frame index of each positive sample (Pos) in the training batch, g denotes the ground-truth frame, and d denotes the prior (default) frame from which the regression target ĝ is encoded;
the confidence error is defined by the following formula:

L_conf(x, c) = − Σ_{i∈Pos} x_ij^p log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0)

where x_ij^p marks the true category of each prediction detection frame and ĉ_i^p is the network's predicted category probability;
when predicting driving behavior, first determine the category and confidence value of each prediction frame according to the category confidence, and filter out prediction frames belonging to the background; then filter out prediction frames below the set confidence threshold; decode the remaining prediction frames, sort them in descending order of confidence value, obtain the real position parameters of each prediction frame from the prior frames, retain the top-k prediction frames, filter out overlapping prediction frames with the NMS algorithm, and take the remaining prediction frames as the behavior detection result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210041064.0A CN114792437A (en) | 2022-01-14 | 2022-01-14 | Method and system for analyzing safe driving behavior based on facial features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210041064.0A CN114792437A (en) | 2022-01-14 | 2022-01-14 | Method and system for analyzing safe driving behavior based on facial features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114792437A true CN114792437A (en) | 2022-07-26 |
Family
ID=82460721
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210041064.0A Pending CN114792437A (en) | 2022-01-14 | 2022-01-14 | Method and system for analyzing safe driving behavior based on facial features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114792437A (en) |
- 2022-01-14: application CN202210041064.0A filed; published as CN114792437A (status: Pending)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117541865A (en) * | 2023-11-14 | 2024-02-09 | 中国矿业大学 | Identity analysis and mobile phone use detection method based on coarse-granularity depth estimation |
CN117541865B (en) * | 2023-11-14 | 2024-06-04 | 中国矿业大学 | Identity analysis and mobile phone use detection method based on coarse-granularity depth estimation |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |