CN110674701A - Driver fatigue state rapid detection method based on deep learning - Google Patents

Driver fatigue state rapid detection method based on deep learning

Info

Publication number
CN110674701A
CN110674701A (application CN201910824958.5A)
Authority
CN
China
Prior art keywords
face
network
fatigue
deep learning
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910824958.5A
Other languages
Chinese (zh)
Inventor
路小波 (Lu Xiaobo)
张晨 (Zhang Chen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201910824958.5A
Publication of CN110674701A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/59 Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597 Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a deep-learning-based method for rapidly detecting a driver's fatigue state, comprising the following steps: (1) collect a color image of the driver while driving, detect the driver's face in the image with a deep learning method, and mark it with a regression box; (2) feed the face bounding regression box into a multi-task learning network, which outputs facial key points and the head pose angles; (3) build a spatio-temporal fatigue feature sequence from the facial key points and head pose angles, feed the sequence into a fatigue recognition deep learning network, and output the fatigue state recognition result. The method addresses both the real-time and accuracy requirements of driver fatigue state detection: by compressing and optimizing the deep learning network models, it designs corresponding optimizations that minimize network size and maximize algorithm speed while preserving accuracy.

Description

Driver fatigue state rapid detection method based on deep learning
Technical Field
The invention belongs to the field of pattern recognition and relates to a method for rapidly detecting a driver's fatigue state based on deep learning.
Background
Research shows that fatigue driving is one of the main causes of road traffic accidents, so research on fatigue detection algorithms is of great significance for improving road traffic safety. In recent years, as road safety has received increasing attention, driver fatigue state detection has become a hot research topic, and various enterprises and research institutes have developed many different detection schemes. Traditional fatigue detection methods suffer from poor real-time performance, the need for contact with the driver's limbs, and low robustness, and therefore have not seen wide adoption. Meanwhile, with the emergence of high-performance GPUs and the development of artificial intelligence chips, deep learning has advanced rapidly in the image field and performs very well across many domains, making it feasible to deploy deep learning methods on embedded platforms.
The invention provides a deep-learning-based algorithm for rapidly detecting a driver's fatigue state, intended mainly for driver fatigue detection scenarios. On the premise of preserving detection accuracy, it greatly reduces the size of the neural networks and accelerates detection, so the algorithm can be ported to low-compute embedded platforms. The algorithm is divided into three parts: a face detection algorithm, a facial key point and head pose angle detection algorithm, and a fatigue state detection algorithm.
Face detection is a key link in fatigue driving detection: it obtains the position and size of the face in the image and serves the subsequent fatigue recognition. Facial key point detection is likewise essential, since incorrect key point localization severely degrades fatigue detection. Fatigue recognition methods based on physiological signals are costly and require direct contact with the driver's limbs, so they have not been widely adopted; video-based methods, being contactless, low-cost, and easy to implement, have become the popular research direction in fatigue recognition.
Disclosure of Invention
The invention aims to solve the above problems and provides a deep-learning-based algorithm for rapidly detecting a driver's fatigue state, intended mainly for driver fatigue detection scenarios.
To achieve this purpose, the invention adopts the following method. The disclosed deep-learning-based driver fatigue state rapid detection algorithm comprises at least a face detection algorithm, a facial key point and head pose angle detection algorithm, and a fatigue state recognition algorithm, and proceeds according to the following steps:
Step 1: collect a color image of the driving state, detect the face in the image with a three-level cascaded deep neural network, and mark it with a regression box. The specific process is as follows:
Step 1.1: input the whole image into the first-level face candidate box generation network, which processes every 12 × 12 window in the image; the network output layer maps each window to a two-dimensional face classification vector and a four-dimensional bounding box regression offset, the offset being used to correct the face regression box.
Step 1.2: scale the face candidate box images output by the first-level network to 24 × 24 as input to the second-level face candidate box coarse-screening network; after network learning, the output layer produces a face classification vector and a bounding box regression vector.
Step 1.3: scale the face candidate box images output by the second-level network to 48 × 48 as input to the third-level face candidate box fine-screening network; after network learning, the global average pooling output layer, together with a non-maximum suppression algorithm based on localization confidence, produces the face localization vector, face classification vector, and face bounding box regression vector.
Step 2: take the face bounding box output by step 1 as input to a deep learning network based on multi-task learning, which outputs the facial key points and head pose angles.
Step 2.1: replace the traditional large convolution structures in the network with a feature-extraction minimal unit structure, which greatly shrinks the network and speeds up the algorithm at the cost of only a slight reduction in accuracy.
Step 2.2: scale the face bounding box output by step 1 to 128 × 128 and input it into the multi-task deep learning network built from the feature-extraction minimal units; the network outputs a vector of 68 facial key points and a vector of three head pose angles (pitch, yaw, and roll).
Step 3: build a spatio-temporal fatigue feature sequence from the facial key points and head pose angles obtained in step 2, input the sequence into a fatigue recognition deep learning network, and output the fatigue recognition result.
Step 3.1: input the left- and right-eye key points obtained in step 2 into an eye state recognition network, which outputs an eye state classification vector indicating whether the eyes are open or closed.
Step 3.2: correct the mouth key points obtained in step 2 for head pose inclination, compute the mouth opening degree from the corrected key points, and judge whether the mouth indicates a yawning (fatigue) state or a normal state by thresholding the opening degree.
Step 3.3: take the left- and right-eye states, the mouth opening degree, and the head pitch angle as fatigue features; extract these facial fatigue features from each video frame to obtain a length-4 fatigue description feature vector per frame, expressed as:
v_t = (x_leye(t), x_reye(t), x_mouth(t), x_pose(t))
where x_leye(t) and x_reye(t) are the left- and right-eye states, x_mouth(t) is the mouth opening degree, and x_pose(t) is the head pitch angle.
Step 3.4: combining a plurality of frames of image fatigue description feature vectors with a time window size from a video in a frame extraction selection mode to form a space-time fatigue feature sequence, wherein the sequence expression is as follows:
Fi={vt,vt+k,vt+2k,...,vt+nk}
wherein n is the length of the time window, and k is the number of the fixed interval frames of the frame extraction.
In a preferred embodiment of the invention, the output layer of the first-level network in step 1.1 is a fully convolutional network (FCN) structure, and the output layer of the second-level network in step 1.2 is a global average pooling (GAP) layer, computed as:
f_GAPOut(x) = (1 / (M × N)) · Σ_(i=1..M) Σ_(j=1..N) x_ij
where f_GAPOut(x) is the output of the global average pooling layer, M and N are the feature map dimensions, and x_ij is a pixel value of the feature map.
The output layer of the third-level network in step 1.3 is a global average pooling structure combined with a non-maximum suppression algorithm based on localization confidence. Localization confidence is defined as the overlap rate (IoU) between the candidate bounding box and the ground-truth face box:
P_loc = IoU = S(A ∩ B) / S(A ∪ B)
where A denotes the candidate bounding box in the input image, B denotes the ground-truth face bounding box, S denotes region area, and P_loc is the localization confidence.
In a preferred embodiment of the invention, the feature-extraction minimal unit structure in step 2.1 has three features: (1) a depthwise separable convolution structure splits the standard convolution into a depthwise convolution and a pointwise convolution; (2) a shortcut connection joins the input feature map to the output feature map at the end of the unit; (3) the convolutional layers use the LeakyReLU activation function in place of the traditional ReLU and Sigmoid, computed as:
f(x) = x for x ≥ 0, and f(x) = αx for x < 0, where α is a small positive constant.
In a preferred embodiment of the present invention, in step 3.4, n = 60 and k = 1.
Advantageous effects:
1. In the first-level face candidate box generation network, the invention uses a fully convolutional network to generate face candidate boxes. A fully convolutional network recovers the category of each pixel from abstract features, extending image-level classification to pixel-level classification, so only one forward pass over the whole image is required, effectively reducing the computation that a sliding window would incur.
2. In the second-level face candidate box coarse-screening network, the invention replaces the traditional fully connected layer with a global average pooling layer, eliminating the huge parameter count that fully connected layers bring.
3. In the third-level face candidate box fine-screening network, the invention proposes a feature-extraction minimal unit structure to replace the traditional large convolution structure, reducing the network model size and accelerating the algorithm. Specifically:
3.1 Depthwise separable convolution greatly improves the network's compression rate and the detection speed of the convolutional neural network, and also allows the network to be made deeper, improving performance while remaining deployable on mobile devices.
3.2 The invention uses a shortcut structure at the end of the unit to join the input feature map to the output feature map, equivalent to splicing feature maps across convolution layers; the network output thus combines the features extracted by the convolution kernels with the original features, alleviating the degradation problem in deep network models.
3.3 The invention replaces the ReLU activation function with LeakyReLU, which multiplies the negative half of the input by a very small weight so that the negative region no longer saturates and dies, avoiding the problem of neurons in the negative interval ceasing to learn.
Drawings
FIG. 1 is an overall flow chart of the method of the present invention;
FIG. 2 is a network architecture diagram of the face detection part of the algorithm;
FIG. 3 is a network architecture diagram of the feature-extraction minimal unit;
FIG. 4 is a structure diagram of the fatigue state recognition network.
Detailed Description
The detailed process of the invention is described clearly and completely below with reference to the drawings and the embodiments of the specification.
The flow of the face fatigue state detection algorithm is shown in FIGS. 1 to 4; the algorithm proceeds according to the following steps:
Step 1: collect a color image of the driving state, detect the face in the image with a three-level cascaded deep neural network, and mark it with a regression box. The specific process is as follows:
Step 1.1: input the whole image into the first-level face candidate box generation network, which processes every 12 × 12 window in the image; the network output layer maps each window to a two-dimensional face classification vector and a four-dimensional bounding box regression offset, the offset being used to correct the face regression box.
Step 1.2: scale the face candidate box images output by the first-level network to 24 × 24 as input to the second-level face candidate box coarse-screening network; after network learning, the output layer produces a face classification vector and a bounding box regression vector.
Step 1.3: scale the face candidate box images output by the second-level network to 48 × 48 as input to the third-level face candidate box fine-screening network; after network learning, the global average pooling output layer, together with a non-maximum suppression algorithm based on localization confidence, produces the face localization vector, face classification vector, and face bounding box regression vector.
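For orientation, the sketch below shows one way the three cascaded stages could be chained in code. It is a minimal illustration only: the stage callables and the crop helper are assumptions, not structures named in this specification.

```python
import numpy as np
import cv2

def crop_and_resize(image, box, size):
    """Crop a candidate box (x1, y1, x2, y2) and rescale it to a stage's input size."""
    x1, y1, x2, y2 = [int(v) for v in box]
    patch = image[max(y1, 0):y2, max(x1, 0):x2]
    return cv2.resize(patch, (size, size))

def cascade_detect(image, stage1, stage2, stage3):
    """stage1/2/3 are assumed callables wrapping the three trained networks."""
    # Stage 1: one fully convolutional pass over the whole image yields face
    # scores and box-regression offsets for every 12 x 12 window.
    boxes = stage1(image)
    # Stage 2: coarse screening of candidates rescaled to 24 x 24.
    boxes = stage2(np.stack([crop_and_resize(image, b, 24) for b in boxes]), boxes)
    # Stage 3: fine screening at 48 x 48; the third network's output step applies
    # non-maximum suppression driven by localization confidence.
    boxes = stage3(np.stack([crop_and_resize(image, b, 48) for b in boxes]), boxes)
    return boxes
```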
Step 2: take the face bounding box output by step 1 as input to a deep learning network based on multi-task learning, which outputs the facial key points and head pose angles.
Step 2.1: replace the traditional large convolution structures in the network with a feature-extraction minimal unit structure, which greatly shrinks the network and speeds up the algorithm at the cost of only a slight reduction in accuracy.
Step 2.2: scale the face bounding box output by step 1 to 128 × 128 and input it into the multi-task deep learning network built from the feature-extraction minimal units; the network outputs a vector of 68 facial key points and a vector of three head pose angles (pitch, yaw, and roll).
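A minimal sketch of the multi-task output heads is given below in PyTorch. The 68-point and 3-angle outputs follow from the text; the 256-dimensional backbone feature is an illustrative assumption.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Illustrative output heads: 68 landmark (x, y) pairs and 3 head-pose angles."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.landmarks = nn.Linear(feat_dim, 68 * 2)   # 136-d landmark vector
        self.pose = nn.Linear(feat_dim, 3)             # pitch, yaw, roll

    def forward(self, feats):
        return self.landmarks(feats), self.pose(feats)

# Usage on a 128 x 128 face crop, after an (assumed) backbone of minimal units:
# feats = backbone(face_crop_128)          # shape (batch, 256)
# points, angles = MultiTaskHead()(feats)
```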
Step 3: build a spatio-temporal fatigue feature sequence from the facial key points and head pose angles obtained in step 2, input the sequence into a fatigue recognition deep learning network, and output the fatigue recognition result.
Step 3.1: input the left- and right-eye key points obtained in step 2 into an eye state recognition network, which outputs an eye state classification vector indicating whether the eyes are open or closed.
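A minimal sketch of such an eye-state classifier follows; the six landmarks per eye match the standard 68-point layout, while the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EyeStateNet(nn.Module):
    """Tiny illustrative eye-state classifier over eye key points."""
    def __init__(self, n_points=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_points * 2, 32), nn.LeakyReLU(0.01),
            nn.Linear(32, 2),            # open vs. closed classification vector
        )

    def forward(self, pts):              # pts: (batch, n_points, 2)
        return self.net(pts.flatten(1))
```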
Step 3.2: correct the mouth key points obtained in step 2 for head pose inclination, compute the mouth opening degree from the corrected key points, and judge whether the mouth indicates a yawning (fatigue) state or a normal state by thresholding the opening degree.
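The specification does not spell out the opening-degree formula or the threshold value. One common choice, shown below purely as an assumption, is the ratio of inner-lip vertical distance to mouth width, with an illustrative threshold.

```python
import numpy as np

def mouth_opening_degree(mouth_pts):
    """Assumed measure: vertical inner-lip distance over mouth width.
    mouth_pts holds the 20 mouth landmarks (points 48-67 of the 68-point layout)."""
    top, bottom = mouth_pts[62 - 48], mouth_pts[66 - 48]   # inner upper/lower lip
    left, right = mouth_pts[60 - 48], mouth_pts[64 - 48]   # inner mouth corners
    return np.linalg.norm(top - bottom) / (np.linalg.norm(left - right) + 1e-6)

def is_yawning(mouth_pts, threshold=0.6):
    # The threshold value is illustrative, not taken from this specification.
    return mouth_opening_degree(mouth_pts) > threshold
```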
Step 3.3: take the left- and right-eye states, the mouth opening degree, and the head pitch angle as fatigue features; extract these facial fatigue features from each video frame to obtain a length-4 fatigue description feature vector per frame, expressed as:
v_t = (x_leye(t), x_reye(t), x_mouth(t), x_pose(t))
where x_leye(t) and x_reye(t) are the left- and right-eye states, x_mouth(t) is the mouth opening degree, and x_pose(t) is the head pitch angle.
Step 3.4: combining a plurality of frames of image fatigue description feature vectors with a time window size from a video in a frame extraction selection mode to form a space-time fatigue feature sequence, wherein the sequence expression is as follows:
Fi={vt,vt+k,vt+2k,...,vt+nk}
wherein n is the length of the time window, and k is the number of the fixed interval frames of the frame extraction. The patent shows that the better scheme is n-60 and k-1 through experimental comparison.
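Under those preferred values, assembling one sequence from the per-frame feature vectors could look like the sketch below; the array layout is an assumption.

```python
import numpy as np

def build_feature_sequence(frame_features, t, n=60, k=1):
    """Assemble F_i = {v_t, v_(t+k), ..., v_(t+nk)} from per-frame length-4
    fatigue vectors; n = 60 and k = 1 are the preferred values above."""
    return np.stack([frame_features[t + i * k] for i in range(n + 1)])  # (n + 1, 4)
```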
Step 3.5: process a video segment, input the resulting spatio-temporal fatigue feature sequence into a fatigue recognition network based on long short-term memory (LSTM), and output the fatigue recognition result for that segment.
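A minimal sketch of such an LSTM-based recognizer is shown below. The length-4 input and the fatigued/normal output follow from the text; the hidden size and single-layer topology are assumptions.

```python
import torch
import torch.nn as nn

class FatigueLSTM(nn.Module):
    """Illustrative LSTM classifier over the spatio-temporal fatigue sequence."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=4, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 2)   # fatigued vs. normal

    def forward(self, seq):              # seq: (batch, 61, 4) with n = 60, k = 1
        out, _ = self.lstm(seq)
        return self.fc(out[:, -1])       # classify from the last time step
```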
The output layer of the first-level network in step 1.1 is a fully convolutional network (FCN) structure, and the output layer of the second-level network in step 1.2 is a global average pooling (GAP) layer, computed as:
f_GAPOut(x) = (1 / (M × N)) · Σ_(i=1..M) Σ_(j=1..N) x_ij
where f_GAPOut(x) is the output of the global average pooling layer, M and N are the feature map dimensions, and x_ij is a pixel value of the feature map.
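As a quick illustration of the formula, global average pooling collapses each M × N feature map to a single scalar per channel:

```python
import torch

fmap = torch.randn(32, 6, 6)        # 32 feature maps, each 6 x 6 (M = N = 6)
gap_out = fmap.mean(dim=(1, 2))     # f_GAPOut per channel, shape (32,)
```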
The output layer of the third-level network in step 1.3 is a global average pooling structure combined with a non-maximum suppression algorithm based on localization confidence. Localization confidence is defined as the overlap rate (IoU) between the candidate bounding box and the ground-truth face box:
P_loc = IoU = S(A ∩ B) / S(A ∪ B)
where A denotes the candidate bounding box in the input image, B denotes the ground-truth face bounding box, S denotes region area, and P_loc is the localization confidence.
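The localization confidence reduces to the standard IoU computation; a straightforward rendering, assuming boxes in (x1, y1, x2, y2) corner format:

```python
def localization_confidence(box_a, box_b):
    """P_loc = IoU = S(A ∩ B) / S(A ∪ B) for corner-format boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)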
The feature-extraction minimal unit structure in step 2.1 has three features: (1) a depthwise separable convolution structure splits the standard convolution into a depthwise convolution and a pointwise convolution; (2) a shortcut connection joins the input feature map to the output feature map at the end of the unit; (3) the convolutional layers use the LeakyReLU activation function in place of the traditional ReLU and Sigmoid, computed as:
f(x) = x for x ≥ 0, and f(x) = αx for x < 0, where α is a small positive constant.
in the first-level face candidate frame generation network, the traditional method uses an image pyramid sliding window method to generate face candidate frames, which requires that the candidate frames obtained by each sliding window need to be subjected to forward network calculation once to obtain the face classification confidence. The invention uses the full convolution network to generate the face candidate frame, the full convolution network can recover the category of each pixel from the abstract characteristics, the classification of the image level is extended to the classification of the pixel level, only one network forward calculation is needed to be executed on the whole image, and the calculation amount brought by using the sliding window can be effectively reduced.
In the second-level face candidate box coarse-screening network, the traditional approach uses a fully connected structure, in which every node connects to all nodes of the previous layer, so the fully connected layers carry an excessive number of parameters. The invention replaces them with a global average pooling layer to eliminate this parameter overhead.
In the third-level face candidate box fine-screening network, traditional face detection algorithms merge overlapping face boxes in the candidate set with non-maximum suppression based on classification confidence. However, localization confidence correlates more closely with the ground-truth bounding box than classification confidence does, so non-maximum suppression based on localization confidence yields a more accurate localization bounding box.
In addition, to compress the face key point and head pose angle network, the invention proposes a feature-extraction minimal unit structure to replace the traditional large convolution structure, reducing the network model size and accelerating the algorithm. The structure makes the following improvements:
In a standard convolution, each kernel convolves with every channel of the input simultaneously; in a depthwise convolution, each kernel convolves with only one channel, and a pointwise convolution then recombines the resulting feature maps into new ones. Depthwise separable convolution thus greatly improves the network's compression rate and the detection speed of the convolutional neural network, and allows the network to be made deeper, improving performance while remaining deployable on mobile devices. A worked example of the savings follows.
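For a single 3 × 3 convolution with 64 input and 64 output channels, the parameter counts compare as follows:

```python
# Weight counts for one 3x3 conv layer with C_in = C_out = 64.
c_in, c_out, k = 64, 64, 3
standard = k * k * c_in * c_out                  # 36,864 weights
separable = k * k * c_in + c_in * c_out          # 576 + 4,096 = 4,672 weights
print(standard / separable)                      # about 7.9x fewer parameters
```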
In a traditional CNN, the deeper the network, the smaller the gradients of the earlier layers become as gradients propagate from back to front, which easily causes vanishing gradients and makes deep convolutional networks hard to train. The invention uses a shortcut structure at the end of the unit to join the input feature map to the output feature map, equivalent to splicing feature maps across convolution layers; the network output thus combines the features extracted by the convolution kernels with the original features, alleviating the degradation problem in deep network models.
The activation function used in a standard convolutional layer is ReLU; for negative inputs its output is always 0 and its first derivative is also zero, so the affected neuron's parameters cannot be updated, which reduces the network's fitting ability in function-approximation tasks such as key point detection. The invention replaces ReLU with LeakyReLU, which multiplies the negative half of the input by a very small weight so that the negative region no longer saturates and dies, avoiding the problem of neurons in the negative interval ceasing to learn.

Claims (4)

1. A driver fatigue state rapid detection method based on deep learning is characterized by comprising the following steps:
step 1: collect a color image of the driving state, detect the face in the image with a three-level cascaded deep neural network, and mark it with a regression box, the specific process being as follows:
step 1.1: input the whole image into the first-level face candidate box generation network, which processes every 12 × 12 window in the image; the network output layer maps each window to a two-dimensional face classification vector and a four-dimensional bounding box regression offset, the offset being used to correct the face regression box;
step 1.2: scale the face candidate box images output by the first-level network to 24 × 24 as input to the second-level face candidate box coarse-screening network; after network learning, the output layer produces a face classification vector and a bounding box regression vector;
step 1.3: scale the face candidate box images output by the second-level network to 48 × 48 as input to the third-level face candidate box fine-screening network; after network learning, the global average pooling output layer, together with a non-maximum suppression algorithm based on localization confidence, produces the face localization vector, face classification vector, and face bounding box regression vector;
step 2: take the face bounding box output by step 1 as input to a deep learning network based on multi-task learning, which outputs the facial key points and head pose angles, specifically comprising:
step 2.1: replace the traditional large convolution structures in the network with a feature-extraction minimal unit structure;
step 2.2: scale the face bounding box output by step 1 to 128 × 128 and input it into the multi-task deep learning network built from the feature-extraction minimal units; the network outputs a vector of 68 facial key points and a vector of three head pose angles (pitch, yaw, and roll);
step 3: build a spatio-temporal fatigue feature sequence from the facial key points and head pose angles obtained in step 2, input the sequence into a fatigue recognition deep learning network, and output the fatigue recognition result, specifically comprising:
step 3.1: input the left- and right-eye key points obtained in step 2 into an eye state recognition network, which outputs an eye state classification vector indicating whether the eyes are open or closed;
step 3.2: correct the mouth key points obtained in step 2 for head pose inclination, compute the mouth opening degree from the corrected key points, and judge whether the mouth indicates a yawning (fatigue) state or a normal state by thresholding the opening degree;
step 3.3: take the left- and right-eye states, the mouth opening degree, and the head pitch angle as fatigue features; extract these facial fatigue features from each video frame to obtain a length-4 fatigue description feature vector per frame, expressed as:
v_t = (x_leye(t), x_reye(t), x_mouth(t), x_pose(t))
where x_leye(t) and x_reye(t) are the left- and right-eye states, x_mouth(t) is the mouth opening degree, and x_pose(t) is the head pitch angle;
step 3.4: combining a plurality of frames of image fatigue description feature vectors with a time window size from a video in a frame extraction selection mode to form a space-time fatigue feature sequence, wherein the sequence expression is as follows:
Fi={vt,vt+k,vt+2k,...,vt+nk}
wherein n is the length of the time window, and k is the number of the fixed interval frames of the frame extraction.
step 3.5: process a video segment, input the resulting spatio-temporal fatigue feature sequence into a fatigue recognition network based on long short-term memory (LSTM), and output the fatigue recognition result for that segment.
2. The method for rapidly detecting the fatigue state of the driver based on deep learning according to claim 1, wherein the output layer of the first-level network in step 1.1 is a fully convolutional network structure, and the output layer of the second-level network in step 1.2 is a global average pooling layer, computed as:
f_GAPOut(x) = (1 / (M × N)) · Σ_(i=1..M) Σ_(j=1..N) x_ij
where f_GAPOut(x) is the output of the global average pooling layer, M and N are the feature map dimensions, and x_ij is a pixel value of the feature map;
the output layer of the third-level network in step 1.3 is a global average pooling structure combined with a non-maximum suppression algorithm based on localization confidence, the localization confidence being defined as the overlap rate (IoU) between the candidate bounding box and the ground-truth face box:
P_loc = IoU = S(A ∩ B) / S(A ∪ B)
where A denotes the candidate bounding box in the input image, B denotes the ground-truth face bounding box, S denotes region area, and P_loc is the localization confidence.
3. The method for rapidly detecting the fatigue state of the driver based on deep learning according to claim 1, wherein the feature-extraction minimal unit structure in step 2.1 has three features: (1) a depthwise separable convolution structure splits the standard convolution into a depthwise convolution and a pointwise convolution; (2) a shortcut connection joins the input feature map to the output feature map at the end of the unit; (3) the convolutional layers use the LeakyReLU activation function in place of the traditional ReLU and Sigmoid, computed as:
f(x) = x for x ≥ 0, and f(x) = αx for x < 0, where α is a small positive constant.
4. The deep learning-based rapid detection method for the fatigue state of the driver according to claim 1, wherein in step 3.4, n = 60 and k = 1.
CN201910824958.5A 2019-09-02 2019-09-02 Driver fatigue state rapid detection method based on deep learning Pending CN110674701A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910824958.5A CN110674701A (en) 2019-09-02 2019-09-02 Driver fatigue state rapid detection method based on deep learning


Publications (1)

Publication Number Publication Date
CN110674701A true CN110674701A (en) 2020-01-10

Family

ID=69075921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910824958.5A Pending CN110674701A (en) 2019-09-02 2019-09-02 Driver fatigue state rapid detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN110674701A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105769120A (en) * 2016-01-27 2016-07-20 深圳地平线机器人科技有限公司 Fatigue driving detection method and device
CN108309311A (en) * 2018-03-27 2018-07-24 北京华纵科技有限公司 A kind of real-time doze of train driver sleeps detection device and detection algorithm
CN109740477A (en) * 2018-12-26 2019-05-10 联创汽车电子有限公司 Study in Driver Fatigue State Surveillance System and its fatigue detection method
CN110119676A (en) * 2019-03-28 2019-08-13 广东工业大学 A kind of Driver Fatigue Detection neural network based

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LI, H. et al.: "A convolutional neural network cascade for face detection", IEEE Conference on Computer Vision and Pattern Recognition *
MIN LIN et al.: "Network In Network", arXiv:1312.4400v3 *
LIU Tianliang et al.: "Human action recognition fusing spatio-temporal dual-network streams and visual attention" (in Chinese), Journal of Electronics & Information Technology *
YANG Long et al.: "SAR ship target detection based on deep convolutional neural networks" (in Chinese), http://kns.cnki.net/kcms/detail/11.2422.TN.20190725.1519.002.html *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242049A (en) * 2020-01-15 2020-06-05 武汉科技大学 Student online class learning state evaluation method and system based on facial recognition
CN111242049B (en) * 2020-01-15 2023-08-04 武汉科技大学 Face recognition-based student online class learning state evaluation method and system
CN111645695A (en) * 2020-06-28 2020-09-11 北京百度网讯科技有限公司 Fatigue driving detection method and device, computer equipment and storage medium
CN111645695B (en) * 2020-06-28 2022-08-09 北京百度网讯科技有限公司 Fatigue driving detection method and device, computer equipment and storage medium
WO2022001091A1 (en) * 2020-06-29 2022-01-06 北京百度网讯科技有限公司 Dangerous driving behavior recognition method and apparatus, and electronic device and storage medium
CN111898473A (en) * 2020-07-10 2020-11-06 华南农业大学 Driver state real-time monitoring method based on deep learning
CN111898473B (en) * 2020-07-10 2023-09-01 华南农业大学 Driver state real-time monitoring method based on deep learning
CN112101103A (en) * 2020-08-07 2020-12-18 东南大学 Video driver fatigue detection method based on deep integration network
CN112101103B (en) * 2020-08-07 2022-08-09 东南大学 Video driver fatigue detection method based on deep integration network
CN111738262A (en) * 2020-08-21 2020-10-02 北京易真学思教育科技有限公司 Target detection model training method, target detection model training device, target detection model detection device, target detection equipment and storage medium
CN112304435A (en) * 2020-10-10 2021-02-02 广州中大数字家庭工程技术研究中心有限公司 Human body thermal imaging temperature measurement method combining face recognition
CN112733628A (en) * 2020-12-28 2021-04-30 杭州电子科技大学 Fatigue driving state detection method based on MobileNet-V3
CN112668480B (en) * 2020-12-29 2023-08-04 上海高德威智能交通系统有限公司 Head attitude angle detection method and device, electronic equipment and storage medium
CN112668480A (en) * 2020-12-29 2021-04-16 上海高德威智能交通系统有限公司 Head attitude angle detection method and device, electronic equipment and storage medium
CN112686187A (en) * 2021-01-05 2021-04-20 四川铁投信息技术产业投资有限公司 Road traffic abnormal state detection method and device based on deep learning video classification
CN113361452A (en) * 2021-06-24 2021-09-07 中国科学技术大学 Driver fatigue driving real-time detection method and system based on deep learning
CN113537115A (en) * 2021-07-26 2021-10-22 东软睿驰汽车技术(沈阳)有限公司 Method and device for acquiring driving state of driver and electronic equipment
CN113780158A (en) * 2021-09-08 2021-12-10 宁波书写芯忆科技有限公司 Intelligent attention force detection method
CN113780158B (en) * 2021-09-08 2023-10-31 宁波书写芯忆科技有限公司 Intelligent concentration detection method
CN114821713A (en) * 2022-04-08 2022-07-29 湖南大学 Fatigue driving detection method based on Video transducer
CN114821747A (en) * 2022-05-26 2022-07-29 深圳市科荣软件股份有限公司 Method and device for identifying abnormal state of construction site personnel
CN115565159A (en) * 2022-09-28 2023-01-03 华中科技大学 Construction method and application of fatigue driving detection model
CN115565159B (en) * 2022-09-28 2023-03-28 华中科技大学 Construction method and application of fatigue driving detection model

Similar Documents

Publication Publication Date Title
CN110674701A (en) Driver fatigue state rapid detection method based on deep learning
CN108830252B (en) Convolutional neural network human body action recognition method fusing global space-time characteristics
CN110570458B (en) Target tracking method based on internal cutting and multi-layer characteristic information fusion
Chen et al. Survey of pedestrian action recognition techniques for autonomous driving
CN106778796B (en) Human body action recognition method and system based on hybrid cooperative training
WO2021213158A1 (en) Real-time face summarization service method and system for intelligent video conference terminal
CN109948721B (en) Video scene classification method based on video description
CN110378208B (en) Behavior identification method based on deep residual error network
CN112288627B (en) Recognition-oriented low-resolution face image super-resolution method
CN109190561B (en) Face recognition method and system in video playing
CN110956082B (en) Face key point detection method and detection system based on deep learning
CN109063626B (en) Dynamic face recognition method and device
CN109543632A (en) A kind of deep layer network pedestrian detection method based on the guidance of shallow-layer Fusion Features
CN109635693B (en) Front face image detection method and device
CN105243376A (en) Living body detection method and device
KR20070016849A (en) Method and apparatus for serving prefer color conversion of skin color applying face detection and skin area detection
CN111402237A (en) Video image anomaly detection method and system based on space-time cascade self-encoder
CN113792635A (en) Gesture recognition method based on lightweight convolutional neural network
CN111768354A (en) Face image restoration system based on multi-scale face part feature dictionary
CN111401116B (en) Bimodal emotion recognition method based on enhanced convolution and space-time LSTM network
CN106778576A (en) A kind of action identification method based on SEHM feature graphic sequences
CN114155512A (en) Fatigue detection method and system based on multi-feature fusion of 3D convolutional network
CN109784215A (en) A kind of in-vivo detection method and system based on improved optical flow method
CN110503049B (en) Satellite video vehicle number estimation method based on generation countermeasure network
Wang et al. An attention self-supervised contrastive learning based three-stage model for hand shape feature representation in cued speech

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200110)