CN117456516A - Driver fatigue driving state detection method and device - Google Patents

Driver fatigue driving state detection method and device

Info

Publication number
CN117456516A
CN117456516A (application CN202311428065.1A)
Authority
CN
China
Prior art keywords
fatigue
driver
network
frame
mouth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311428065.1A
Other languages
Chinese (zh)
Inventor
郑鑫
李敬兆
陈建
姚远
罗斌
张杨
陆毓辉
刘辉
刘涛
郎贵彬
王磊
周小锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dingji Coal Mine Huaihu Coal And Electricity Co ltd
Original Assignee
Dingji Coal Mine Huaihu Coal And Electricity Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dingji Coal Mine Huaihu Coal And Electricity Co ltd
Priority to CN202311428065.1A
Publication of CN117456516A
Legal status: Pending

Classifications

    • G06V 20/597: Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G06N 3/045: Combinations of networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06V 10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/806: Fusion of extracted features (combining data at the sensor, preprocessing, feature extraction or classification level)
    • G06V 40/171: Local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships
    • G06V 40/176: Dynamic expression
    • G06V 40/18: Eye characteristics, e.g. of the iris


Abstract

The invention provides a method and a device for detecting a driver's fatigue driving state. A fatigue feature detection model is obtained by training an improved YOLOv7 network and is used to analyze collected face image information of the driver. The fatigue analysis combines the detected eye and mouth fatigue features with a Facial Action Coding System (FACS) analysis of the fatigue expression based on the change data of the muscles around the driver's eyes and mouth; the fatigue degree is classified and an early warning of the fatigue state is given, improving the accuracy and real-time performance of fatigue detection.

Description

Driver fatigue driving state detection method and device
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a device for detecting fatigue driving state of a driver.
Background
With the rapid development of the economy and the continuous improvement of living standards, the number of cars and trucks keeps growing; while this brings great convenience, road traffic accidents also keep increasing. Fatigue driving is a major cause of such accidents: on highways in particular, 50% of drivers have experienced fatigue driving. Solving the traffic problems caused by fatigue driving is therefore especially important, and detecting the driver's driving state so as to remind the driver in time is an efficient way to do so.
In the related art, whether the driver is fatigued is judged by determining whether the eyes close frequently or the driver yawns frequently. Two main methods exist for detecting the opening and closing of the eyes or mouth: one judges it from Dlib facial key-point localization; the other detects the opening and closing of the eyes or mouth directly with a network.
At present, most fatigue driving detection methods judge the driver's fatigue state from a single signal; with such a single basis for judgment, accuracy and real-time performance are both poor.
Disclosure of Invention
To solve these problems, the invention provides a driver fatigue driving state detection method that markedly improves both the accuracy and the speed of driver fatigue detection, thereby reducing the probability of accidents and overcoming problems such as a high missed-detection rate and poor real-time performance.
The invention provides a driver fatigue driving state detection method, which comprises the following steps:
collecting face image information of a driver;
inputting the face image information of the driver into the fatigue feature detection model, extracting the eye fatigue features and mouth fatigue features in the face image information by means of the dual-channel attention mechanism of the fatigue feature detection model, and analyzing the change data of the muscles around the driver's eyes and mouth using the Facial Action Coding System (FACS) description of fatigue expressions;
performing fatigue analysis on the driver by fusing the eye fatigue features, the mouth fatigue features, the change data of the muscles around the eyes and the change data of the muscles around the mouth to obtain a fatigue state analysis result of the driver;
and outputting the fatigue state analysis result.
Optionally, before the step of inputting the face image information of the driver into the fatigue feature detection model, the method further includes:
creating a sample dataset for fatigue detection;
constructing an improved YOLOv7 network, wherein the YOLOv7 network uses a MobileNet-V2 network structure and is provided with a dual-channel attention mechanism for extracting fatigue features;
and training the YOLOv7 network by using the sample data set to obtain a trained fatigue characteristic detection model.
Optionally, the dual-channel attention mechanism in the constructed improved YOLOv7 network includes an upper branch and a lower branch;
the upper branch compresses the w × h × c input into 1 × 1 × c through global max pooling and generates 1 × 1 × c attention weight coefficients through an activation function, where c denotes the number of channels, w the width and h the height;
the lower branch extracts the spatial information of the input features in a multi-branch manner, the input channel dimension of each branch being c; by compressing the channel dimension of the input tensor, spatial information at different scales on each channel feature map can be extracted effectively, and after a 3 × 3 convolution is applied to the feature map, the two split parts are concatenated;
and the attention weight coefficients obtained from the upper branch are multiplied by the feature map of the lower branch to obtain the final feature map.
Optionally, the loss function of the constructed improved YOLOv7 network starts from the CIoU loss:

L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αv, with α = v/((1 − IoU) + v) and v = (4/π²)·(arctan(w^gt/h^gt) − arctan(w/h))²,

wherein b and b^gt represent the prediction frame and the real frame respectively, c is the diagonal distance of the minimum enclosing region of the prediction frame and the real frame, α is a balance parameter, ρ is the Euclidean distance, v measures whether the aspect ratios are consistent, IoU represents the intersection-over-union of the real frame and the predicted frame, w^gt and h^gt represent the width and height of the real frame, and w and h the width and height of the prediction frame;
when the aspect ratios of the real frame and the predicted frame are equal, v is zero and the penalty term has no effect; to prevent the aspect-ratio penalty term from losing its effect, an ACIoU loss function is designed, in which S_b is the area of the corresponding network output frame, S_bgt is the area of the frame marked in the data set, and λ is a regularization parameter: λ = 0 when the aspect ratio of the predicted frame equals that of the real frame, in which case the area where the real frame and the predicted frame do not intersect is calculated; λ = 1 when the two aspect ratios differ, in which case the aspect-ratio penalty term is calculated to compute the loss.
Optionally, the creating a sample dataset for fatigue detection comprises:
collecting a driver fatigue detection monitoring video collected based on a camera, respectively extracting an eye opening picture, an eye closing picture, a mouth opening picture and a mouth closing picture to serve as sample pictures, and adding corresponding labels for each sample picture;
and splicing the sample pictures by means of Mosaic data enhancement combined with adaptive picture scaling, random cropping and random arrangement to generate a sample data set.
Optionally, before training the YOLOv7 network using the sample data set, further comprises:
applying an NMS algorithm to the sample data set to perform non-maximum suppression on the final target detection frames and obtain the optimal target frames;
and dividing the sample data set in a ratio of 6:1:1 into a training set, a verification set and a test set for training, verifying and testing the YOLOv7 network respectively.
Optionally, the analyzing of the change data of the muscles around the driver's eyes and mouth using the Facial Action Coding System (FACS) description of fatigue expressions includes:
in the facial action system, the facial expression of a person is divided into 64 independent and interrelated action units (AU);
the combined occurrence of action units AU1 and AU2 appears in deep fatigue; AU7 occurs in a fatigue state where the driving environment is uncomfortable; AU26 and AU43 each indicate a fatigue state when present alone and a deep fatigue state when they occur in combination.
Optionally, performing fatigue analysis on the driver by fusing the eye fatigue features, the mouth fatigue features, the change data of the muscles around the eyes and the change data of the muscles around the mouth includes:
judging the driver to be in a fatigue state when the blink frequency per minute exceeds a first threshold, and in a deep fatigue state when it exceeds a second threshold;
considering the driver to be in a fatigue state when the duration of a yawn exceeds a third threshold, and in a deep fatigue state when it exceeds a fourth threshold;
and judging the fatigue degree from the four fatigue features of blink frequency, yawning time, muscle changes around the eyes and muscle changes around the mouth through the fatigue-degree network judgment model of fig. 6, obtaining a fatigue state analysis result of the driver being in a normal state, a fatigue state or a deep fatigue state.
Optionally, the network model includes 3 convolutional layers, 3 pooling layers and 1 fully connected layer; the 1st convolution, of size 3 × 3 × 4, performs the first extraction of the fatigue features; the 2nd convolution, of size 3 × 3 × 8, performs the second extraction; the 3rd convolution, of size 3 × 3 × 12, performs the third extraction, the three stages progressively refining deep feature information; the layers are connected through 3 pooling layers of size 2 × 1 with stride 1 to improve operation efficiency.
Optionally, the outputting the fatigue state analysis result includes:
transmitting the fatigue state analysis result to a vehicle-mounted NVR (Network Video Recorder) through the vehicle-mounted network in combination with a vehicle-mounted switch;
taking the vehicle-mounted NVR as a data forwarding center, and transmitting the fatigue state analysis result to a monitoring screen in cooperation with a vehicle-mounted switch;
when the driver is in a fatigue state, automatically starting the entertainment system;
when the driver is in a deep fatigue state, reminding information for reminding the driver to rest is sent.
The invention also provides a driver fatigue driving state detection device comprising one or more processors and a non-transitory computer-readable storage medium storing program instructions which, when executed by the one or more processors, implement the method according to any one of claims 1-9.
The beneficial effects of the invention are as follows:
1. The driver fatigue driving state detection method detects the driver's fatigue degree by detecting whether the eyes and the mouth are open or closed and by analyzing the AU values of the muscles around the mouth and around the eyes, classifies the fatigue degree and gives an early warning of the fatigue state.
2. The improved YOLOv7 model replaces the CSPDarknet53 backbone network structure with a MobileNet-V2 network structure, introduces a dual-channel attention mechanism to extract fatigue feature information and improves the loss function, thereby improving the accuracy and real-time performance of fatigue detection.
Drawings
In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 shows a schematic flow diagram of an embodiment of the present invention;
FIG. 2 shows a Mosaic data enhancement diagram of an embodiment of the present invention;
FIG. 3 shows a schematic diagram of a network structure of a modified YOLOv7 model according to an embodiment of the present invention;
FIG. 4 illustrates a two-channel attention mechanism diagram of an embodiment of the present invention;
FIG. 5 illustrates a data set class classification scheme in accordance with an embodiment of the invention;
FIG. 6 illustrates a fatigue determination network diagram of an embodiment of the present invention;
FIG. 7 shows a schematic diagram of a loss function before and after modification of an embodiment of the present invention;
FIG. 8 shows a schematic diagram of a loss function of different models according to an embodiment of the invention;
FIG. 9 is a diagram showing comparison of different algorithm evaluation indexes according to an embodiment of the present invention.
Detailed Description
As used herein, the terms "first," "second," and the like may be used to describe elements in exemplary embodiments of the present invention. These terms are only used to distinguish one element from another element, and the inherent feature or sequence of the corresponding element, etc. is not limited by the terms. Unless defined otherwise, all terms (including technical or scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries are to be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Those skilled in the art will understand that the devices and methods of the present invention described herein and illustrated in the accompanying drawings are non-limiting exemplary embodiments and that the scope of the present invention is defined solely by the claims. The features illustrated or described in connection with one exemplary embodiment may be combined with the features of other embodiments. Such modifications and variations are intended to be included within the scope of the present invention.
Furthermore, it should be understood that one or more of the following methods or aspects thereof may be performed by at least one control system, control unit, or controller. The terms "control unit", "controller", "control module" or "main control module" may refer to a hardware device comprising a memory and a processor. The memory or computer-readable storage medium is configured to store program instructions, and the processor is specifically configured to execute the program instructions to perform one or more processes that will be described further below. Moreover, it should be appreciated that the following methods may be performed by a processor in combination with one or more other components, as will be appreciated by those of ordinary skill in the art.
The embodiment of the invention provides a driver fatigue driving state detection method based on the YOLOv7 algorithm which, as illustrated in fig. 1, may comprise the following steps S1 to S7. Experiments on the method were carried out on a PC running a 64-bit Windows 11 operating system; the deep learning model was built and trained in the Python programming language on the PyTorch deep learning framework, and an RTX 3070 Ti (8 GB) GPU was used for training acceleration.
Step S1, a sample dataset for fatigue detection is created.
In some embodiments, fatigue detection monitoring video of a mine hoist driver can be collected and 5000 pictures selected, comprising 1417 open-eye pictures, 1412 closed-eye pictures, 1282 open-mouth (yawning) pictures and 889 closed-mouth pictures; the pictures are spliced by Mosaic data enhancement combined with adaptive picture scaling, random cropping and random arrangement.
Mosaic data enhancement splices 4 pictures by random scaling, random cropping and random arrangement; the process is shown in fig. 2. This processing enriches the backgrounds of the pictures to be detected and shrinks the targets to some extent, thereby augmenting small targets; feeding the processed pictures into the network for training is equivalent to computing on the data of 4 pictures at a time, so a single GPU can achieve a better effect.
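As an illustration of this step, a minimal sketch of four-picture Mosaic stitching is given below. It is a sketch under stated assumptions: the function name, the gray fill value of 114 and the resize-based quadrant placement are illustrative choices, and the label-box remapping that a real training pipeline also needs is omitted.

```python
import random

import cv2  # OpenCV, assumed available
import numpy as np


def mosaic4(images, out_size=640):
    """Splice 4 pictures into one canvas around a random center point
    (a minimal sketch of Mosaic data enhancement; labels not remapped)."""
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)
    # Random mosaic center inside the middle half of the canvas.
    cx = random.randint(out_size // 4, 3 * out_size // 4)
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    quadrants = [(0, 0, cx, cy), (cx, 0, out_size, cy),
                 (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(random.sample(images, 4), quadrants):
        # Random scaling/cropping is approximated here by resizing each
        # picture to its quadrant; cv2.resize takes (width, height).
        canvas[y1:y2, x1:x2] = cv2.resize(img, (x2 - x1, y2 - y1))
    return canvas
```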
And S2, building an improved YOLOv7 network.
The YOLOv7 algorithm is one of the most advanced target detection algorithms at present; its speed and accuracy exceed those of most known target detectors. The YOLOv7 network model is mainly composed of convolutions, an E-ELAN (extended ELAN) module, an MP-1 module and an SPPCSPC module. The ELAN module mainly controls the shortest and longest gradient paths, making the network more robust and able to learn more features. The ELAN-W module is similar to the ELAN module but differs in the number of output branches, which enhances the learning capacity of the network. The MP-1 module extracts features in 2 parts, which benefits information fusion. The SPPCSPC module adds multiple parallel MaxPool operations after the convolution, which alleviates problems such as image distortion caused by image processing operations and avoids the convolutional neural network extracting redundant features from the image. The head adopts an FPN+PAN structure, enhancing the transmission of semantic information, the localization of feature information and the prediction capability of the network structure.
In this embodiment, YOLOv7 is improved as shown in fig. 3: the improved YOLOv7 network replaces the CSPDarknet53 backbone network structure with a MobileNet-V2 network structure and adds a dual-channel attention mechanism for extracting fatigue features. With the dual-channel attention mechanism, the network can focus more on the feature information of the eyes and mouth under limited resources, which improves the accuracy and real-time performance of driver fatigue detection and allows better detection of whether the driver is driving while fatigued.
The backbone feature extraction network CSPDarknet53 of YOLOv7 has many parameters, so model training takes a long time. MobileNet-V2 is a lightweight backbone feature extraction network, and using it in place of CSPDarknet53 improves the detection speed of the network considerably. The network structure of MobileNet-V2 consists mainly of 3 two-dimensional convolution operations, 7 linear bottleneck layers and one average pooling operation.
The backbone of the YOLOv7 algorithm continuously downsamples the input image during convolution, so the feature map becomes smaller while its depth increases; although each dimension of the feature map represents different extracted information, not all of it is useful. Moreover, the backbone attends equally to the target region and other regions, so its ability to learn the feature information of the target region is weak, whereas adding an attention mechanism lets the network pay more attention to the driver's fatigue feature information under limited resources. This embodiment strengthens the network's ability to extract features of small targets by introducing a dual-channel attention mechanism into the model. As shown in fig. 4, the mechanism comprises an upper branch and a lower branch. The upper branch first compresses the w × h × c dimension into 1 × 1 × c through global max pooling (GMP) and then generates 1 × 1 × c attention weight coefficients through an activation function. The lower branch extracts the spatial information of the input features in a multi-branch manner, with input channel dimension c for each branch; by compressing the channel dimension of the input tensor, spatial information at different scales on each channel feature map can be extracted effectively. After a 3 × 3 convolution is applied to the feature map, the two split parts are concatenated, and the attention weight coefficients obtained from the upper branch are multiplied by the feature map of the lower branch, improving the extraction of small-target features. The dual-channel attention mechanism can extract both global and local features of the picture and improves the network's detection of small targets. Concretely, a 640 × 640 picture to be detected is duplicated: one copy is compressed to 1 × 1 × c through global max pooling and an attention weight coefficient is generated through an activation function; the identical copy undergoes a 3 × 3 convolution followed by a split and a final concatenation to obtain a feature map, and the previously generated attention weight coefficient is multiplied by this feature map to obtain the final feature map.
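A minimal PyTorch sketch of such a dual-channel attention module is given below. Where the description leaves details open, the choices here are assumptions: the activation function (sigmoid), the use of two lower-branch convolutions at different dilations as the multi-branch spatial extraction, and an even channel count.

```python
import torch
import torch.nn as nn


class DualChannelAttention(nn.Module):
    """Sketch of the dual-channel attention described above."""

    def __init__(self, channels: int):
        super().__init__()
        # Upper branch: global max pooling compresses w x h x c to 1 x 1 x c,
        # then an activation produces the attention weight coefficients.
        self.gmp = nn.AdaptiveMaxPool2d(1)
        self.act = nn.Sigmoid()
        # Lower branch: compress the channel dimension, then extract spatial
        # information at different scales in parallel branches (assumes an
        # even channel count so the two halves concatenate back to C).
        self.compress = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.branch_a = nn.Conv2d(channels // 2, channels // 2, 3, padding=1)
        self.branch_b = nn.Conv2d(channels // 2, channels // 2, 3,
                                  padding=2, dilation=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.act(self.gmp(x))                    # (N, C, 1, 1) weights
        y = self.compress(x)
        # Concatenate the two split parts back to C channels.
        y = torch.cat([self.branch_a(y), self.branch_b(y)], dim=1)
        # Multiply the upper-branch weights by the lower-branch feature map.
        return w * y
```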
The loss function of the improved YOLOv7 network in this embodiment starts from the CIoU loss:

L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αv, with α = v/((1 − IoU) + v) and v = (4/π²)·(arctan(w^gt/h^gt) − arctan(w/h))²,

wherein b and b^gt represent the prediction frame and the real frame respectively, c is the diagonal distance of the minimum enclosing region of the prediction frame and the real frame, α is a balance parameter, ρ is the Euclidean distance, v measures whether the aspect ratios are consistent, IoU represents the intersection-over-union of the real frame and the predicted frame, w^gt and h^gt represent the width and height of the real frame, and w and h the width and height of the prediction frame.
When the aspect ratios of the real frame and the predicted frame are equal, v is zero and the penalty term has no effect. To prevent the aspect-ratio penalty term from losing its effect, an ACIoU loss function is designed, in which S_b is the area of the corresponding network output frame, S_bgt is the area of the frame marked in the data set, and λ is a regularization parameter: λ = 0 when the aspect ratio of the predicted frame equals that of the real frame, in which case the area where the real frame (marked in the data set) and the predicted frame (output by the network) do not intersect is calculated; λ = 1 when the two aspect ratios differ, in which case the loss is calculated through the αv term.
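For reference, a PyTorch sketch of the CIoU term above follows. Only the standard CIoU part is implemented; the ACIoU-specific λ switch is not, since its closed form is not reproduced in the text, so this is an illustration rather than the patent's exact loss.

```python
import math

import torch


def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss for boxes in (x1, y1, x2, y2) form; returns per-box loss."""
    # Intersection and union.
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    w_p, h_p = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w_t, h_t = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    union = w_p * h_p + w_t * h_t - inter + eps
    iou = inter / union
    # Squared center distance over the squared diagonal of the enclosing box.
    rho2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2
            + (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term v and balance parameter alpha.
    v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps))
                              - torch.atan(w_p / (h_p + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```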
And step S3, training the YOLOv7 network by using the sample data set to obtain a trained fatigue characteristic detection model.
Specifically, a camera can be used to collect facial images of the driver, and the collected images are annotated with a labeling assistant tool in PASCAL VOC format. Four categories are used: closed_eye, closed_mouth, open_eye and open_mouth, where closed_eye indicates that the driver's eyes are closed, closed_mouth that the mouth is not open, open_eye that the eyes are open and open_mouth that the mouth is open. The numbers of labels are shown in fig. 5 and Table 1. The sample data set is further divided in a ratio of 6:1:1 into a training set, a verification set and a test set for training, verifying and testing the YOLOv7 network respectively.
Taking the above embodiment as an example, the training set has 4000 pictures, the test set has 500 pictures and the verification set has 500 pictures.
TABLE 1
An NMS algorithm is applied to the data set annotated in step S1 to perform non-maximum suppression on the final target detection frames and obtain the optimal target frames, after which model training proceeds using each picture in the sample data set.
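Non-maximum suppression itself can be done with the standard torchvision operator; the wrapper below is a small sketch, with the IoU threshold of 0.45 an illustrative assumption.

```python
import torch
from torchvision.ops import nms


def keep_best_boxes(boxes: torch.Tensor, scores: torch.Tensor,
                    iou_thresh: float = 0.45):
    """Suppress overlapping detections and keep the best-scoring target
    boxes; `boxes` is (N, 4) in (x1, y1, x2, y2) form, `scores` is (N,)."""
    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep]
```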
Further, an inference test can be performed on the trained YOLOv7 model.
The model evaluation indexes adopted in this embodiment are the detection speed and the mean average precision (mAP), computed in the standard way:

P = TP/(TP + FP) (5), R = TP/(TP + FN) (6), mAP = (1/N)·Σᵢ APᵢ with AP = ∫₀¹ P(R) dR (7),

wherein TP (true positive) indicates that the true value and the predicted value are both positive; FP (false positive) indicates that the true value is negative but the predicted value is positive; FN (false negative) indicates that the true value is positive but the predicted value is negative. mAP measures recognition accuracy and is obtained by averaging the AP values over all classes. Testing and evaluating the fatigue state detection model trained in this embodiment with formulas (5) to (7), its mAP reaches 0.9885 and its detection speed reaches 70 FPS. Figs. 7 to 9 and Table 2 show the performance comparisons of the respective models.
Table 2 algorithm performance comparison
And S4, acquiring face image information of the driver.
The trained model can be used for real-time detection: in actual use, the driver's face image information is collected through a camera so that the trained fatigue feature detection model can perform fatigue analysis.
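In practice this reduces to a camera capture loop feeding frames to the trained detector; the OpenCV sketch below assumes a hypothetical `detect_fn` handle to the trained fatigue feature detection model and a default camera index of 0.

```python
import cv2


def run_camera_detection(detect_fn, cam_index=0):
    """Capture frames from the camera and pass each one to the trained
    detector; `detect_fn` stands in for the model's inference call."""
    cap = cv2.VideoCapture(cam_index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            results = detect_fn(frame)  # e.g. eye/mouth open-closed boxes
            # ... feed `results` into the fatigue-analysis stage ...
            if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to stop
                break
    finally:
        cap.release()
```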
And S5, inputting the face image information of the driver into the fatigue feature detection model, extracting the eye fatigue feature and the mouth fatigue feature in the face image information by utilizing a double-channel attention mechanism of the fatigue feature detection model, and analyzing the change data of the muscles around eyes and the change data of the muscles around the mouth of the driver by utilizing a facial expression coding system FACS of the fatigue expression.
To further describe the qualitative character of fatigue expressions, a general means of expression analysis is introduced: the diversified expressions are feature-quantized and mathematically coded. In the facial action system, the facial expression of a person is divided into 64 independent and interrelated action units (AU), and the fatigue degree is judged from the different AU values.
AU1 and AU2 occurring in combination can be judged as deep fatigue; AU7 occurs in a fatigue state where the driving environment is uncomfortable; AU26 and AU43 each indicate a fatigue state, and the combination of AU26 and AU43 indicates a deep fatigue state.
Step S6: fatigue analysis is performed on the driver by fusing the eye fatigue features, the mouth fatigue features, the change data of the muscles around the eyes and the change data of the muscles around the mouth to obtain a fatigue state analysis result of the driver.
In a normal state a person blinks 10-20 times per minute and each blink lasts 100-400 ms; in a fatigued state the blink frequency increases and the duration of each blink is prolonged. A blink frequency of more than 20 times per minute is judged to be a fatigue state (including the eye-closing state), and more than 40 times per minute a deep fatigue state.
Yawning is a conditioned reflex in a fatigue state and typically lasts 3 s-5 s. A yawn lasting more than 5 s indicates that the driver is in a fatigue state, and more than 10 s a deep fatigue state.
This embodiment judges the driver's fatigue state by fusing eye-closing time, yawning time, muscle changes around the eyes and muscle changes around the mouth: a blink frequency above 20 per minute indicates a fatigue state (including the eye-closing state) and above 40 per minute deep fatigue; a yawn longer than 5 s indicates a fatigue state and longer than 10 s deep fatigue; AU7, AU26 and AU43 each indicate a fatigue state when appearing alone, any two of them in combination indicate a deep fatigue state, and AU1+AU2 also indicates deep fatigue.
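Written out as code, the fusion rules above amount to a small decision function. The sketch below encodes exactly the stated thresholds and AU rules; the function name and the representation of detected action units as a set of integer IDs are illustrative assumptions.

```python
def fatigue_level(blinks_per_min: float, yawn_sec: float, aus: set) -> str:
    """Map blink frequency, yawn duration and detected AUs (e.g. {1, 2})
    to one of the three states described above."""
    deep = (blinks_per_min > 40
            or yawn_sec > 10
            or {1, 2} <= aus                 # AU1 + AU2 together
            or len(aus & {7, 26, 43}) >= 2)  # any two of AU7/AU26/AU43
    if deep:
        return "deep fatigue"
    tired = (blinks_per_min > 20
             or yawn_sec > 5
             or len(aus & {7, 26, 43}) == 1)
    return "fatigue" if tired else "normal"
```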
Fatigue judgment on the 4 fatigue features of blink frequency, yawning time, muscle changes around the eyes and muscle changes around the mouth is performed through a network model. As shown in fig. 6, the network model comprises 3 convolutional layers, 3 pooling layers and 1 fully connected layer. The 1st convolution, of size 3 × 3 × 4, performs a coarse extraction of the fatigue features; the 2nd convolution, of size 3 × 3 × 8, extracts them more precisely; the 3rd convolution, of size 3 × 3 × 12, extracts them accurately. The layers are connected through 3 pooling layers of size 2 × 1 with stride 1, which improves operation efficiency. A 1 × 4 fully connected layer normalizes the scores of the normal, fatigue and deep-fatigue states into probabilities in (0, 1), and a Softmax output layer finally takes the state with the highest probability as the fatigue grade.
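A PyTorch sketch of this fatigue-degree network follows. Since fig. 6 is not reproduced here, the input layout (the four fused features arranged as a small single-channel map) and the reading of the pooling layers as 2 × 1 kernels with stride 1 are assumptions.

```python
import torch
import torch.nn as nn


class FatigueNet(nn.Module):
    """3 convs (4, 8, 12 channels), 3 pools, 1 FC layer, Softmax over the
    three states: normal, fatigue, deep fatigue."""

    def __init__(self, in_hw=(8, 8)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(),    # first extraction
            nn.MaxPool2d((2, 1), stride=1),
            nn.Conv2d(4, 8, 3, padding=1), nn.ReLU(),    # second extraction
            nn.MaxPool2d((2, 1), stride=1),
            nn.Conv2d(8, 12, 3, padding=1), nn.ReLU(),   # third extraction
            nn.MaxPool2d((2, 1), stride=1),
        )
        h, w = in_hw
        # Each 2x1 pool with stride 1 shrinks the height by 1.
        self.fc = nn.Linear(12 * (h - 3) * w, 3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 1, H, W) map built from blink rate, yawn time and AU signals.
        z = self.features(x).flatten(1)
        return torch.softmax(self.fc(z), dim=1)  # probabilities of 3 states
```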
To verify the detection effect of the model on the 3 states, 2 testers were randomly selected for testing; each tester performed 100 simulation experiments, and the experimental data are shown in Table 3.
Table 3
And S7, outputting the fatigue state analysis result.
The fatigue state analysis result is transmitted to the vehicle-mounted NVR through the vehicle-mounted network in combination with a vehicle-mounted switch; with the vehicle-mounted NVR as data-forwarding center, the result is transmitted to a monitoring screen in cooperation with the vehicle-mounted switch. When the driver is in a fatigue state, the entertainment system is started automatically; when the driver is in a deep fatigue state, reminder information is sent to prompt the driver to rest.
The embodiment of the invention also provides a driver fatigue driving state detection device comprising one or more processors and a non-transitory computer-readable storage medium storing program instructions which, when executed by the one or more processors, implement the method according to the above embodiments.
According to the embodiment of the invention, the fatigue state of the driver is judged by the method, corresponding judgment information is sent to the monitoring screen, the entertainment system is automatically started when the fatigue state of the driver occurs, the driver is stimulated, the mental state of the driver is improved, and the driver is reminded to rest when the deep fatigue state of the driver occurs.
The embodiment of the invention balances the speed and precision of driver fatigue driving state detection, providing a fatigue state detection algorithm that takes YOLOv7 as the basic network model, applies data enhancement processing, improves the network and fuses multiple features for fatigue judgment. Comparison with other algorithms verifies that indexes such as accuracy and real-time performance are improved to a certain extent.
The figures and detailed description of the invention referred to above as examples of the invention are intended to illustrate the invention, but not to limit the meaning or scope of the invention described in the claims. Accordingly, modifications may be readily made by one skilled in the art from the foregoing description. In addition, one skilled in the art may delete some of the constituent elements described herein without deteriorating the performance, or may add other constituent elements to improve the performance. Furthermore, one skilled in the art may vary the order of the steps of the methods described herein depending on the environment of the process or equipment. Thus, the scope of the invention should be determined not by the embodiments described above, but by the claims and their equivalents.
While the invention has been described in connection with what is presently considered to be practical, it is to be understood that the invention is not to be limited to the disclosed embodiments, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method for detecting a driver fatigue driving state, the method comprising:
collecting face image information of a driver;
inputting the face image information of the driver into a fatigue feature detection model, extracting eye fatigue features and mouth fatigue features in the face image information, and analyzing the change data of the muscles around the driver's eyes and mouth by using the facial action coding system FACS description of fatigue expressions;
performing fatigue analysis on the driver by fusing the eye fatigue features, the mouth fatigue features, the change data of the muscles around the eyes and the change data of the muscles around the mouth to obtain a fatigue state analysis result of the driver;
and outputting the fatigue state analysis result.
2. The method according to claim 1, wherein before the inputting of the face image information of the driver into the fatigue feature detection model, the method further comprises:
creating a sample dataset for fatigue detection;
constructing a YOLOv7 network, wherein the YOLOv7 network uses a MobileNet-V2 network structure and is provided with a dual-channel attention mechanism for extracting fatigue features;
and training the YOLOv7 network by using the sample data set to obtain a trained fatigue characteristic detection model.
3. The method according to claim 2, characterized in that the dual-channel attention mechanism in the constructed improved YOLOv7 network comprises an upper branch and a lower branch;
the upper branch compresses the w × h × c input into 1 × 1 × c through global max pooling and generates 1 × 1 × c attention weight coefficients through an activation function, where c denotes the number of channels, w the width and h the height;
the lower branch extracts the spatial information of the input features in a multi-branch manner, the input channel dimension of each branch being c, and the two split parts are concatenated after a 3 × 3 convolution is applied to the feature map;
and the attention weight coefficients obtained from the upper branch are multiplied by the feature map of the lower branch to obtain the final feature map.
4. A method according to claim 3, characterized in that the loss function of the improved YOLOv7 network constructed in step S2 starts from the CIoU loss:

L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αv, with α = v/((1 − IoU) + v) and v = (4/π²)·(arctan(w^gt/h^gt) − arctan(w/h))²,

wherein b and b^gt represent the prediction frame and the real frame respectively, c is the diagonal distance of the minimum enclosing region of the prediction frame and the real frame, α is a balance parameter, ρ is the Euclidean distance, v measures whether the aspect ratios are consistent, IoU represents the intersection-over-union of the real frame and the predicted frame, w^gt and h^gt represent the width and height of the real frame, and w and h the width and height of the prediction frame;
when the aspect ratios of the real frame and the predicted frame are equal, v is zero;
an ACIoU loss function is used, in which S_b is the area of the corresponding network output frame, S_bgt is the area of the frame marked in the data set, and λ is a regularization parameter: λ = 0 when the aspect ratio of the predicted frame equals that of the real frame, in which case the area where the real frame and the predicted frame do not intersect is calculated; λ = 1 when the two aspect ratios differ, in which case the aspect-ratio penalty term is calculated to compute the loss.
5. The method of claim 2, wherein creating a sample dataset for fatigue detection comprises:
collecting a driver fatigue detection monitoring video collected based on a camera, respectively extracting an eye opening picture, an eye closing picture, a mouth opening picture and a mouth closing picture to serve as sample pictures, and adding corresponding labels for each sample picture;
and splicing the sample pictures in a mode of combining the mosaics data enhancement with the self-adaptive picture scaling, random cutting and random arrangement to generate a sample data set.
6. The method of claim 5, wherein before step S3 of training the YOLOv7 network with the sample data set, the method further comprises:
applying an NMS algorithm to the sample data set to perform non-maximum suppression on the final target detection frames and obtain the optimal target frames;
and dividing the sample data set in a ratio of 6:1:1 into a training set, a verification set and a test set for training, verifying and testing the YOLOv7 network respectively.
7. The method of claim 1, wherein the analyzing of the change data of the muscles around the driver's eyes and mouth using the facial expression coding system FACS of the fatigue expression comprises:
the combined occurrence of action units AU1 and AU2 appears in deep fatigue; AU7 occurs in a fatigue state where the driving environment is uncomfortable; AU26 and AU43 each indicate a fatigue state when present alone and a deep fatigue state when they occur in combination.
8. The method according to any one of claims 1 to 7, wherein the fatigue analysis of the driver by fusing the eye fatigue features, the mouth fatigue features, the change data of the muscles around the eyes and the change data of the muscles around the mouth comprises:
judging the driver to be in a fatigue state when the blink frequency per minute exceeds a first threshold, and in a deep fatigue state when it exceeds a second threshold;
considering the driver to be in a fatigue state when the duration of a yawn exceeds a third threshold, and in a deep fatigue state when it exceeds a fourth threshold;
and judging the fatigue degree from the four fatigue features of blink frequency, yawning time, muscle changes around the eyes and muscle changes around the mouth through the fatigue-degree network judgment model of FIG. 6, to obtain a fatigue state analysis result of the driver being in a normal state, a fatigue state or a deep fatigue state;
the outputting the fatigue state analysis result includes:
transmitting the fatigue state analysis result to a vehicle-mounted NVR through a vehicle-mounted network in combination with a vehicle-mounted switch;
taking the vehicle-mounted NVR as a data forwarding center, and transmitting the fatigue state analysis result to a monitoring screen in cooperation with a vehicle-mounted switch;
when the driver is in a fatigue state, automatically starting the entertainment system;
when the driver is in a deep fatigue state, reminding information for reminding the driver to rest is sent.
9. The method of claim 8, wherein the network model comprises: 3 convolutional layers, 3 pooling layers and 1 fully connected layer;
the 1st convolution, of size 3 × 3 × 4, performs the first extraction of the fatigue features; the 2nd convolution, of size 3 × 3 × 8, performs the second extraction; the 3rd convolution, of size 3 × 3 × 12, performs the third extraction, the three stages progressively refining deep feature information; the layers are connected through 3 pooling layers of size 2 × 1 with stride 1 to improve operation efficiency.
10. A driver fatigue driving state detection device, characterized by comprising one or more processors and a non-transitory computer-readable storage medium storing program instructions which, when executed by the one or more processors, implement the method according to any one of claims 1-9.
CN202311428065.1A 2023-10-30 2023-10-30 Driver fatigue driving state detection method and device Pending CN117456516A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311428065.1A CN117456516A (en) 2023-10-30 2023-10-30 Driver fatigue driving state detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311428065.1A CN117456516A (en) 2023-10-30 2023-10-30 Driver fatigue driving state detection method and device

Publications (1)

Publication Number Publication Date
CN117456516A (en) 2024-01-26

Family

ID=89587013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311428065.1A Pending CN117456516A (en) 2023-10-30 2023-10-30 Driver fatigue driving state detection method and device

Country Status (1)

Country Link
CN (1) CN117456516A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117975665A (en) * 2024-03-28 2024-05-03 钧捷智能(深圳)有限公司 DMS driver fatigue grade identification system
CN117975665B (en) * 2024-03-28 2024-07-02 钧捷智能(深圳)有限公司 DMS driver fatigue grade identification system

Similar Documents

Publication Publication Date Title
CN108182409B (en) Living body detection method, living body detection device, living body detection equipment and storage medium
CN108791299A (en) A kind of driving fatigue detection of view-based access control model and early warning system and method
CN111126366B (en) Method, device, equipment and storage medium for distinguishing living human face
CN112528961B (en) Video analysis method based on Jetson Nano
CN108319909A (en) A kind of driving behavior analysis method and system
CN112818871B (en) Target detection method of full fusion neural network based on half-packet convolution
CN114202711A (en) Intelligent monitoring method, device and system for abnormal behaviors in train compartment
CN112906631A (en) Dangerous driving behavior detection method and detection system based on video
CN110555346A (en) Driver emotion detection method and device, electronic equipment and storage medium
CN112614102A (en) Vehicle detection method, terminal and computer readable storage medium thereof
CN115937830A (en) Special vehicle-oriented driver fatigue detection method
CN108108651B (en) Method and system for detecting driver non-attentive driving based on video face analysis
CN113269111B (en) Video monitoring-based elevator abnormal behavior detection method and system
CN113205060A (en) Human body action detection method adopting circulatory neural network to judge according to bone morphology
CN117456516A (en) Driver fatigue driving state detection method and device
CN116434173A (en) Road image detection method, device, electronic equipment and storage medium
CN112381068B (en) Method and system for detecting 'playing mobile phone' of person
CN115393802A (en) Railway scene unusual invasion target identification method based on small sample learning
CN114898140A (en) Behavior detection method and device based on PAA algorithm and readable medium
CN114792437A (en) Method and system for analyzing safe driving behavior based on facial features
CN114758326A (en) Real-time traffic post working behavior state detection system
CN112329566A (en) Visual perception system for accurately perceiving head movements of motor vehicle driver
CN115713751A (en) Fatigue driving detection method, device, storage medium and apparatus
CN113573009A (en) Video processing method, video processing device, computer equipment and storage medium
Shewale et al. Real time driver drowsiness detection system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination