US20230154207A1 - Driver fatigue detection method and system based on combining a pseudo-3d convolutional neural network and an attention mechanism - Google Patents
- Publication number
- US20230154207A1 (U.S. Application No. 17/043,681)
- Authority
- US
- United States
- Prior art keywords
- module
- attention
- driver fatigue
- fatigue detection
- convolution kernel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/59—Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
- G06V20/597—Recognising the driver's state or behaviour, e.g. attention or drowsiness
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B21/00—Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
- G08B21/02—Alarms for ensuring the safety of persons
- G08B21/06—Alarms for ensuring the safety of persons indicating a condition of sleep, e.g. anti-dozing alarms
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B21/00—Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
- G08B21/18—Status alarms
Definitions
- the 2D SA module uses a 2D convolution kernel to obtain a weight map of a feature layer in the spatial dimension; for a feature map F ∈ R(F×H×W×C), the SA module learns a weight of M_s ∈ R(1×W×H×1) to determine the importance of each feature map.
- the attention module is divided into three different P3D-Attention modules to obtain a network model
- a method of extracting the frame sequence from the video of a driver and processing the frame sequence specifically includes: capturing the video for approximately 5 seconds each time to extract 180 video frames.
- the step of performing the spatiotemporal feature learning through the P3D convolution module specifically includes: simulating a 3×3×3 convolution in spatial and temporal domains and decoupling the 3×3×3 convolution in time and space through a P3D architecture that uses a 1×3×3 convolution kernel and a 3×1×1 convolution kernel; and based on the P3D architecture, cascading P3D architectures with sizes of 32, 64 and 128 to obtain an image feature.
- the present invention further proposes a driver fatigue detection system based on combining a P3D CNN and an attention mechanism, including:
- the video capture and crop module captures a real-time video stream of upper body information of the driver
- the present invention designs a P3D-Attention module.
- the P3D-Attention module is based on the decoupling of spatial and temporal convolutions by the P3D module, and is respectively integrated with a SA module and a DCAM that are adaptive, thereby fully integrating the spatiotemporal features. In this way, the correlation of important channel features is improved and the global correlation of feature maps is increased, thereby improving the performance of fatigue driving prediction.
- FIG. 1 schematically shows the modular design of the driver fatigue detection method based on combining the pseudo-three-dimensional (P3D) convolutional neural network (CNN) and the attention mechanism.
- FIG. 2 is a schematic diagram showing the cascading of the three P3D modules.
- FIG. 3 schematically shows the structure of the channel attention (CA) module.
- FIG. 4 schematically shows the structure of the spatial attention (SA) module.
- FIG. 5 schematically shows the P3D-Attention-A architecture.
- FIG. 6 schematically shows the P3D-Attention-B architecture.
- FIG. 7 schematically shows the P3D-Attention-C architecture.
- FIG. 8 A is a schematic diagram showing the features before being processed by the attention mechanism.
- FIG. 8 B is a schematic diagram showing the features after being processed by the attention mechanism.
- FIG. 9 is a schematic diagram showing the cascaded modules of P3D-Attention.
- the present invention proposes a driver fatigue detection method based on combining a pseudo-three-dimensional (P3D) convolutional neural network (CNN) and an attention mechanism, which includes the following steps:
- Step 1 a method of extracting and processing the video frame sequence from the video includes: The video is captured for approximately 5 seconds each time to extract 180 video frames.
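The extraction step above amounts to uniform index sampling over a capture window. The helper below is a hypothetical sketch (the patent only states that each roughly 5-second capture yields 180 frames; the subsampling to 90 frames mirrors the 90-frame input mentioned later in the description):

```python
def sample_frame_indices(total_frames: int, n_samples: int) -> list:
    """Evenly spread n_samples indices over [0, total_frames)."""
    step = total_frames / n_samples
    return [int(i * step) for i in range(n_samples)]

# keep all 180 frames of a ~5-second capture, or subsample 90 of them
all_frames = sample_frame_indices(180, 180)
half = sample_frame_indices(180, 90)
assert len(all_frames) == 180 and all_frames[-1] == 179
assert len(half) == 90 and half[-1] == 178
```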
- the step of performing the spatiotemporal feature learning through the P3D convolution module specifically includes:
- the P3D architecture uses a 1×3×3 convolution kernel and a 3×1×1 convolution kernel to simulate a 3×3×3 convolution in spatial and temporal domains, and decouples the 3×3×3 convolution in time and space.
- P3D architectures with the sizes of 32, 64, and 128 are cascaded to obtain an image feature, and down-sampling is performed through a max pooling layer, as shown in FIG. 2 .
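A quick way to see the benefit of this decoupling is to count weights: a 1×3×3 kernel followed by a 3×1×1 kernel covers the same spatiotemporal neighborhood as a full 3×3×3 kernel with far fewer parameters. The channel width below is illustrative, not taken from the patent:

```python
def conv3d_weight_count(kt, kh, kw, c_in, c_out):
    # number of weights in a 3D convolution (bias ignored)
    return kt * kh * kw * c_in * c_out

c = 64                                     # illustrative channel width
full = conv3d_weight_count(3, 3, 3, c, c)  # full 3x3x3 kernel
s = conv3d_weight_count(1, 3, 3, c, c)     # spatial 1x3x3 kernel (S)
t = conv3d_weight_count(3, 1, 1, c, c)     # temporal 3x1x1 kernel (T)
# the decoupled pair is 2.25x smaller than the full 3D kernel
assert full == 110592 and s + t == 49152
```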
- the step of constructing the P3D-Attention module specifically includes: On the basis that the P3D module decouples the 3D convolution into a spatial and temporal convolution, the attention module is integrated to design three different P3D-Attention modules, i.e., P3D-Attention-A to P3D-Attention-C to obtain the network model.
- This module uses a DCAM to make key frames play a more important role in classification.
- the SA module is introduced to automatically assign different attention to different joint points according to the feature map; attention is paid to positions such as the eyes and mouth mentioned in the prior knowledge, and the interference of the background and noise on recognition is removed.
- the attention mechanism is expressed as F′ = M(F) ⊗ F, where M represents the attention module, F represents the feature map, and ⊗ represents element-wise multiplication of the matrix.
- Step 3.1 In order to adapt to the 3D convolution, a module named DCAM is constructed, as shown in FIG. 3 .
- the DCAM of the present invention applies attention between video frames and on the channels of each frame, instead of only on the time frame level.
- F in R represents the number of frames
- C represents the number of channels in each frame
- H and W represent the height and width of the feature map, but the contributions of the channels to the final detection result are not equal.
- the DCAM learns the weight of M_c ∈ R(F×1×1×C) to determine the importance of each channel.
- the attention on the frame and channel is expressed by transposing the feature map with a size of (F, H, W, C) into (H, W, F×C), embedding (H, W, F×C) in the 2D SA module, and learning the weights of M_c ∈ R(F×1×1×1) and M_c ∈ R(1×1×1×C), respectively.
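The two DCAM operations can be sketched on tensor shapes. The sizes are illustrative, and a global average plus sigmoid stands in for the learned attention weights, so this is only a shape-level sketch of the mechanism:

```python
import numpy as np

F_, H, W, C = 8, 16, 16, 32            # illustrative sizes
feat = np.random.rand(F_, H, W, C)     # feature map (F, H, W, C)

# per-(frame, channel) gate analogous to Mc in R(F x 1 x 1 x C); a global
# average + sigmoid stands in for the learned weights
mc = 1.0 / (1.0 + np.exp(-feat.mean(axis=(1, 2), keepdims=True)))
gated = feat * mc                      # broadcast multiply over H and W
assert mc.shape == (F_, 1, 1, C) and gated.shape == feat.shape

# transposing (F, H, W, C) -> (H, W, F*C) lets a 2D attention module
# weight frames and channels jointly
flat = feat.transpose(1, 2, 0, 3).reshape(H, W, F_ * C)
assert flat.shape == (H, W, F_ * C)
```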
- Step 3.2 In order to obtain key information, the human visual mechanism pays more attention to the main target rather than the background. Therefore, in the present invention, the weight map of the feature layer in the spatial dimension is obtained through the SA module. Taking the feature map F ∈ R(F×H×W×C) as an example, the SA module learns the weight of M_s ∈ R(1×W×H×1) to determine the importance of each feature map.
- the SA mechanism mainly uses a 2D convolution kernel to obtain a spatial feature weight.
- the scene in the car hardly changes. Therefore, different from other tasks that need to consider multiple scales, the special scene of fatigue detection can use convolution kernels of different sizes to adapt to convolution features of different depths.
- the modular architecture is shown in FIG. 4 .
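The spatial weighting can likewise be sketched on shapes: one weight per spatial position, broadcast over frames and channels. A mean plus sigmoid stands in for the learned 2D convolution, and the tensor sizes are assumptions:

```python
import numpy as np

feat = np.random.rand(8, 16, 16, 32)   # (F, H, W, C), illustrative sizes

# one weight per spatial position, analogous to Ms in R(1 x W x H x 1);
# a mean over frames and channels + sigmoid stands in for the 2D conv
ms = 1.0 / (1.0 + np.exp(-feat.mean(axis=(0, 3), keepdims=True)))
assert ms.shape == (1, 16, 16, 1)
weighted = feat * ms                   # broadcast over F and C
assert weighted.shape == feat.shape
```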
- Step 3.3 In performing the fatigue driving detection task, the data input to the model is a sequence of continuous video frames.
- the attention module is integrated to design three different P3D-Attention blocks, i.e., P3D-Attention-A to P3D-Attention-C to obtain the network model, as shown in FIGS. 5 - 7 .
- This module uses the DCAM to make key frames play a more important role in classification.
- the SA module is introduced to automatically assign different attention to different joint points according to the feature map; attention is paid to positions such as the eyes and mouth mentioned in the prior knowledge, and the interference of the background and noise on recognition is removed.
- P3D-Attention-A Based on the residual unit (RU), the original P3D-A module first cascades the temporal 1D convolution kernel (T) to the spatial 2D convolution kernel (S) before considering the stacked architecture. Therefore, these two convolution kernels can directly influence each other on the same path, and only the temporal 1D convolution kernel is directly connected to the final output. Since the fatigue detection task does not require excessively deep convolutional layers, the RU is removed and only P3D-A is retained.
- the SA module is cascaded after S, and then the CA module is cascaded after T to obtain the P3D-Attention-A architecture, as shown in Equation 2: X_(t+1) = CA(T(SA(S(X_t)))), where:
- X t represents an input feature map
- X t+1 represents an output obtained after the attention mechanism is applied
- X t and X t+1 have the same feature dimension.
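The serial P3D-Attention-A path (SA after S, then CA after T) can be sketched as a shape-preserving composition. Here the convolutions are identity placeholders and the attention gates are the same mean-plus-sigmoid stand-ins as above, so only the ordering and the dimension-preservation property are demonstrated:

```python
import numpy as np

def S(x):   # placeholder for the spatial 1x3x3 convolution
    return x

def T(x):   # placeholder for the temporal 3x1x1 convolution
    return x

def SA(x):  # spatial attention gate (mean over frames/channels + sigmoid)
    return x * (1.0 / (1.0 + np.exp(-x.mean(axis=(0, 3), keepdims=True))))

def CA(x):  # channel attention gate (mean over spatial positions + sigmoid)
    return x * (1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2), keepdims=True))))

x_t = np.random.rand(8, 16, 16, 32)
x_next = CA(T(SA(S(x_t))))         # serial path: SA after S, then CA after T
assert x_next.shape == x_t.shape   # input and output feature dimensions match
```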
- P3D-Attention-B The original P3D-B uses the indirect influence between two convolution kernels, so that the two convolution kernels process convolution features in parallel.
- the SA module is cascaded after S, and then the CA module is cascaded after T, with the two parallel paths combined, which is expressed as follows: X_(t+1) = SA(S(X_t)) + CA(T(X_t)).
- P3D-Attention-C The original P3D-C module is a compromise between P3D-A and P3D-B, through which the direct influence between S, T and the final output is simultaneously established. Specifically, in order to achieve a direct connection between S and the final output based on the cascaded P3D-A architecture, the P3D-Attention-C is constructed by introducing the attention module, which is expressed as: X_(t+1) = SA(S(X_t)) + CA(T(SA(S(X_t)))).
- the P3D-Attention architecture includes P3D-Attention-A, P3D-Attention-B and P3D-Attention-C in sequence.
- Step 3.4 The attention mechanism assigns different weights to different channels and features. After several convolutions, the spatiotemporal feature information of 90 frames is fused into 7 frames.
- FIG. 8A and FIG. 8B show the comparison between the features on the same channel before and after being processed by the CA mechanism and the SA mechanism, which illustrates that the features of the face and the more important features of the eyes and mouth are more obvious.
- Step 3.5 Three P3D-Attention modules (with sizes of 128, 256 and 256) are cascaded in the network architecture of the present invention to obtain key features, and the 3D MP layer is cascaded after the P3D-Attention-A module with a size of 128 to perform down-sampling, as shown in FIG. 9.
- a method of using the 2D global average pooling (GAP) layer in the 3D convolution specifically includes: After the video frames are processed by the three P3D modules and three P3D-Attention modules, the time signals are not completely folded. In order to obtain more temporal feature information, the features are transposed and input into the 2D GAP layer instead of using 3D GAP. Finally, the features are input into Softmax and classified.
- Step 4.1 The GAP layer is used in order to replace a fully connected layer, reduce the number of parameters, and prevent overfitting.
- the fully connected layer is replaced with the GAP layer, the feature map output by the convolution architecture is transposed, and then more temporal features are retained through the 2D GAP layer.
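The difference between 3D GAP and the transposed 2D GAP is visible directly on tensor shapes. The 7×4×4×256 feature size below is an assumption chosen to match the 7 fused frames mentioned in Step 3.4:

```python
import numpy as np

feat = np.random.rand(7, 4, 4, 256)   # (F, H, W, C); sizes are illustrative

# 3D GAP collapses time and space together: one 256-d vector, time is lost
gap3d = feat.mean(axis=(0, 1, 2))
assert gap3d.shape == (256,)

# 2D GAP over H and W keeps one pooled descriptor per frame,
# preserving temporal information for the Softmax classifier
gap2d = feat.mean(axis=(1, 2))        # (F, C): one vector per frame
assert gap2d.shape == (7, 256)
flattened = gap2d.reshape(-1)         # flattened input for Softmax
assert flattened.shape == (7 * 256,)
```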
- Step 4.2 The output of the 2D GAP layer is used as the input of Softmax to classify driver behaviors. A warning is issued when driver fatigue is detected.
- Step 4.3 The entire network is a CNN architecture.
- F1-score, also known as balanced F-score, is defined as the harmonic average of precision P and recall rate R.
- Precision refers to a ratio of the number of true positives to the number of positive samples determined by a classifier.
- TP indicates true positives correctly classified by the classifier
- FP indicates false positives incorrectly classified by the classifier
- Recall (R) refers to a ratio of the number of predicted true positives to the total number of positive samples.
- F1-score refers to a harmonic average of P and R.
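The three metrics above reduce to a few lines of code; the confusion counts in the example are made up for illustration:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    p = tp / (tp + fp)        # precision: TP / samples predicted positive
    r = tp / (tp + fn)        # recall: TP / all actual positive samples
    f1 = 2 * p * r / (p + r)  # harmonic average of P and R
    return p, r, f1

p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
assert abs(p - 0.9) < 1e-12
assert abs(r - 0.75) < 1e-12
assert abs(f1 - 2 * 0.9 * 0.75 / 1.65) < 1e-12
```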
- a driver fatigue detection system based on combining a P3D CNN and an attention mechanism includes a video capture and crop module, a method integration module and a display module.
- the video capture and crop module comprises cameras fixedly installed directly in front of or on the left and right sides of a driver in a cab, and is configured to capture a real-time video stream of upper body information of the driver.
- the real-time video stream is displayed on the display module and is transmitted as input information into the method integration module.
- the method integration module is configured to encapsulate the driver fatigue detection method based on combining the P3D CNN and the attention mechanism, and reserve an interface to form a black box.
- the input of the method integration module is video stream data in a correct format.
- the display module is used as an image presentation carrier and configured to display input video image information, output driver fatigue detection state information, and warning information output after the driver fatigue is detected.
- the present invention designs a P3D-Attention module.
- the P3D-Attention module is based on the decoupling of spatial and temporal convolutions by the P3D module, and is respectively integrated with the adaptive SA module and DCAM, thereby fully integrating the spatiotemporal features.
- the correlation of important channel features is improved and the global correlation of feature maps is increased, thereby improving the performance of fatigue driving prediction.
- the model size of the method of the present invention is 42.5 MB, which is 1/9 of that of the LSTM fusion method.
- the prediction of a 180-frame video takes approximately 660 milliseconds, which is approximately 11% of that of the LSTM fusion method.
Description
- This application is the national phase entry of International Application No. PCT/CN2020/109693, filed on Aug. 18, 2020, which is based upon and claims priority to Chinese Patent Application No. 202010522475.2, filed on Jun. 10, 2020, the entire contents of which are incorporated herein by reference.
- The present invention relates to the technical field of intelligent video analysis, and more particularly, to a driver fatigue detection method and system based on combining a pseudo-three-dimensional (P3D) convolutional neural network (CNN) and an attention mechanism.
- Fatigue driving is one of the major causes of traffic accidents. Drivers in a fatigue state often feel drowsy and have transient losses of consciousness, which impair their alertness. Moreover, a driver who experiences fatigue while driving is less likely to have the capacity to react to sudden events, traffic controls and dangerous situations, all of which can lead to motor vehicle accidents. The American Automobile Association (AAA) reported that 7% of all motor vehicle accidents and 21% of fatal traffic accidents were caused by fatigued drivers, indicating that fatigue driving constitutes a large proportion of road traffic accidents. Prior art methods for detecting fatigue driving behavior can be divided into three types: the physiological parameters based detection method, the vehicle behavior based detection method and the facial feature analysis based detection method.
- The physiological parameters based detection method requires a variety of sensors in contact with the driver's body, and determines whether the driver is in the fatigue state based on the detection of different physiological signals, such as electrocardiography (ECG) and/or electroencephalogram (EEG). Currently known systems used in this method measure, for example, electrodermal activity (EDA), respiration and ECG. In general, the physiological parameters based detection method has high fatigue driving detection accuracy, but it relies on experimental equipment and is considered invasive, which limits its application range. The vehicle behavior based detection method uses vehicle behavior parameters, such as lane departure, steering wheel angle (SWA) and yaw angle (YA), to detect fatigue driving behavior. Similar to the physiological parameters based detection method, this method also depends on external factors such as road conditions.
- The facial feature analysis based detection method extracts the facial feature points of the driver, compares the fatigue state and the normal state of the driver, and detects fatigue behavior characteristics of the driver such as head movement posture, eye state (blinking) and yawning. This method outperforms the above two methods owing to its advantages of being non-invasive and easy to implement. One proposed method continuously records a driver's eyelid movement through infrared sensors and studies the effectiveness of spontaneous blinking parameters. This proposed method investigates the sub-components of blinking duration, namely the closing time and the reopening time. Studies have shown that the blinking duration and reopening time change reliably with increasing drowsiness. The performance of the latest in-vehicle fatigue prediction measures based on tracking the driver's eye movements has been evaluated, including marking blink candidates that meet the criteria (minimum/maximum duration, shape, and minimum amplitude) as valid blinks. A facial recognition algorithm based on edge detection and texture measurement is also used to segment the eyes and calculate the eye features that change over time. This method obtains 95.83% effectiveness under high illumination, and 87.5% effectiveness under medium illumination. The human face extraction system uses a support vector machine (SVM) based on facial extraction and a mouth detection method based on circular Hough transform (CHT) to detect the movement around the driver's mouth area, determining the fatigue state of the driver based on the opening degree of the driver's mouth. The above methods depend on artificial features and often cannot thoroughly explore the complex relationship between different visual cues. Additionally, these methods ignore the fact that the eyes and the mouth may be occluded and that the yawning time and mouth opening degree vary from person to person, and they do not consider the driver's facial expression changes, head movement posture, and so on.
- Different from the above artificial feature based facial feature analysis and detection methods, a CNN based image spatial feature extraction system and a long short-term memory (LSTM) based image temporal feature analysis system use LSTM to integrate information over the time series to obtain the optimal determination performance, and the frame-level CNN feature output is aggregated into video-level features for prediction. The study results show that yawning detection obtains an accuracy rate of over 87% when performed on a continuous video. Another method performs transfer learning on a trained Inception-v3 model and uses it to extract spatial features; the extracted spatial features are then input into the LSTM layer to integrate the temporal features for the prediction of the fatigue state. The multi-CNN-based driver activity detection model uses a Gaussian mixture model to segment the original image and extract the driver's physical features from the background. This model can effectively determine whether the driver is distracted, and its accuracy rate can reach 91.4%. These methods are more robust than the artificial feature based methods and better capture the relationship between different cues. However, due to the use of the GoogLeNet and Inception-v3 models for spatial feature extraction, the prediction model has massive parameters and contains a large amount of redundant spatial data. The convolutional spatial features are converted into a one-dimensional (1D) vector and input into the time series model, without considering the spatial correlation and without removing the interference of the background and noise on recognition. As a result, the temporal and spatial features cannot be well integrated.
- An objective of the present invention is to improve the performance of fatigue driving prediction by providing a driver fatigue detection model based on combining a pseudo-three-dimensional (P3D) convolutional neural network (CNN) and an attention mechanism. The present invention creates a P3D-Attention module based on the decoupling of spatial and temporal convolutions by a P3D module, which is respectively integrated with an adaptive spatial attention (SA) module and an adaptive dual-channel attention model (DCAM), thereby fully integrating the spatiotemporal features. In this way, the correlation of important channel features is improved and the global correlation of feature maps is increased.
- The present invention discloses a driver fatigue detection method based on combining a P3D CNN and an attention mechanism, including:
- a step of extracting a frame sequence from a video;
- a step of performing spatiotemporal feature learning through a P3D convolution module;
- a step of constructing a P3D-Attention module, and applying attention on channels and a feature map through the attention mechanism;
- a step of applying attention on a time frame and a space frame through a dual-channel attention model (DCAM) to strengthen key frames; automatically assigning different attention to different joint points according to the feature map, paying attention to a position mentioned in the prior knowledge, and removing interference of a background and noise on recognition; the attention mechanism being expressed as:
- F′ = M(F) ⊗ F (1)
- where, M represents the attention module, and F represents the feature map; and ⊗ represents element-wise multiplication of the matrix; and
- a classification step.
- Further, the DCAM applies attention between video frames and on channels of each frame; for a feature map F ∈ R(F×H×W×C), F in R represents the number of frames, C represents the number of channels in each frame, and H and W represent the height and width of the feature map; a weight of Mc ∈ R(F×1×1×C) is learned to determine the importance of each channel; the feature map with a size of (F, H, W, C) is transposed into (H, W, F×C), and (H, W, F×C) is combined with a two-dimensional (2D) spatial attention (SA) module; weights of Mf ∈ R(F×1×1×1) and Mc ∈ R(1×1×1×C) are respectively learned to express attention on the frame and channel.
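The dual-channel weighting described above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the weight tensors below are fixed placeholders, whereas in the invention they are learned, and the shapes are assumed example values.

```python
import numpy as np

# Hypothetical sketch of the DCAM weighting: a feature map of shape
# (F, H, W, C) is scaled by a per-channel weight Mc of shape (1, 1, 1, C)
# and a per-frame weight Mf of shape (F, 1, 1, 1). Random placeholders
# stand in for the learned weights.
F, H, W, C = 4, 8, 8, 16
feat = np.ones((F, H, W, C))

Mc = np.random.rand(1, 1, 1, C)   # channel attention weights
Mf = np.random.rand(F, 1, 1, 1)   # frame (temporal) attention weights

# Element-wise multiplication (the tensor product in Equation (1))
# broadcasts the weights across the remaining axes.
out = feat * Mc * Mf
assert out.shape == (F, H, W, C)

# The (F, H, W, C) map can also be transposed to (H, W, F*C) so that a
# 2D spatial-attention module can operate on it, as the text describes.
flat = feat.transpose(1, 2, 0, 3).reshape(H, W, F * C)
assert flat.shape == (H, W, F * C)
```

The broadcasting rules make the per-frame and per-channel scalings a single multiply each, which is the whole computational cost of applying the learned weights at inference time.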
- Further, the 2D SA module uses a 2D convolution kernel to obtain a weight map of a feature layer in a spatial dimension; a SA module of F ∈ R(F×H×W×C) learns a weight of Ms ∈ R(1×W×H×1) to determine the importance of each feature map.
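The spatial weighting can be illustrated the same way. In this sketch the spatial weight map Ms is a hand-made placeholder that emphasizes a center region (standing in for positions such as the eyes and mouth); in the invention it is produced by a 2D convolution.

```python
import numpy as np

# Illustrative sketch of the spatial-attention (SA) weighting: a single
# weight map Ms of shape (1, H, W, 1) scales every frame and channel of
# the (F, H, W, C) feature map. The weight map here is a fixed
# placeholder, not a learned convolution output.
F, H, W, C = 4, 8, 8, 16
feat = np.random.rand(F, H, W, C)

Ms = np.zeros((1, H, W, 1))
Ms[0, 2:6, 2:6, 0] = 1.0          # pretend the centre region is salient

out = feat * Ms                   # broadcasts over frames and channels
assert out.shape == (F, H, W, C)
assert np.all(out[:, 0, 0, :] == 0)   # background positions suppressed
```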
- Further, on the basis that the P3D module decouples a 3D convolution into a spatial and temporal convolution, the attention module is divided into three different P3D-Attention modules to obtain a network model;
- P3D-Attention-A: a temporal 1D convolution kernel T is cascaded to a spatial 2D convolution kernel S, a SA module is cascaded after S, and a channel attention (CA) module is cascaded after T, to obtain a P3D-Attention-A architecture; the temporal 1D convolution kernel T is directly connected to a final output, as shown in Equation (2):
- Xt+1 = CA(T(SA(S(Xt)))) (2), with SA(F) = Ms(F) ⊗ F and CA(F) = Mc(F) ⊗ F denoting element-wise application of the spatial and channel attention weights;
- where, Xt represents an input feature map; Xt+1 represents an output obtained after the attention mechanism is applied; Xt and Xt+1 have the same feature dimension;
- P3D-Attention-B: an original P3D-B module uses an indirect influence between two convolution kernels, so that the two convolution kernels process convolution features in parallel; after a residual unit (RU) is removed, a SA module is cascaded after S, and then a CA module is cascaded after T, which is expressed as follows:
- Xt+1 = SA(S(Xt)) + CA(T(Xt)) (3)
- P3D-Attention-C: an original P3D-C module is a compromise between P3D-A and P3D-B, through which a direct influence between S, T and the final output is simultaneously established; in order to achieve a direct connection between S and the final output based on the cascaded P3D-A architecture, an attention module is introduced to construct the P3D-Attention-C, which is expressed as:
- Xt+1 = SA(S(Xt)) + CA(T(SA(S(Xt)))) (4)
- and
- the attention mechanism assigns different weights to different channels and features; after several convolutions, the spatial and temporal feature information is fused to obtain key features, and a 3D max pooling layer is cascaded after the P3D-Attention-A module for down sampling.
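The 3D max pooling down-sampling mentioned above can be sketched as non-overlapping window maxima over the frame, height and width axes. This is a generic illustration with assumed shapes, not the patent's exact layer configuration.

```python
import numpy as np

# Sketch of 3D max pooling for down-sampling after the P3D-Attention-A
# module: non-overlapping k x k x k windows over the (frames, height,
# width) axes, applied independently per channel.
def max_pool_3d(x, k=2):
    """x has shape (F, H, W, C); F, H and W must be divisible by k."""
    f, h, w, c = x.shape
    blocks = x.reshape(f // k, k, h // k, k, w // k, k, c)
    return blocks.max(axis=(1, 3, 5))

vol = np.arange(4 * 4 * 4 * 2, dtype=float).reshape(4, 4, 4, 2)
pooled = max_pool_3d(vol)
assert pooled.shape == (2, 2, 2, 2)      # each axis halved
assert pooled.max() == vol.max()         # maxima are preserved
```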
- Further, the classification step includes:
- introducing a 2D global average pooling (GAP) layer after the P3D-Attention module; wherein after the video frame passes through three P3D modules and three P3D-Attention modules, the features are input into the 2D GAP layer after being transposed, and finally the features are input into Softmax and classified.
- Further, the classification step specifically includes:
- replacing a fully connected layer with the GAP layer; transposing the feature map output by the convolution architecture, and retaining more temporal features through the 2D GAP layer;
- using an output of the 2D GAP layer as an input of Softmax to classify driver behaviors; and issuing a warning when driver fatigue is detected, wherein
- the entire network is a convolutional neural network architecture, and the performance of the model is evaluated using F1-score to reduce the misjudgment of the model during training.
- Further, a method of extracting the frame sequence from the video of a driver and processing the frame sequence specifically includes: capturing the video for approximately 5 seconds each time to extract 180 video frames.
- Further, the step of performing the spatiotemporal feature learning through the P3D convolution module specifically includes: simulating a 3×3×3 convolution in the spatial and temporal domains and decoupling the 3×3×3 convolution in time and space through a P3D architecture that uses a 1×3×3 convolution kernel and a 3×1×1 convolution kernel; and, based on the P3D architecture, cascading P3D architectures with sizes of 32, 64 and 128 to obtain an image feature.
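The saving from this factorization is easy to quantify per kernel. The count below ignores channel dimensions and biases, so it only illustrates the per-kernel weight reduction, not the total model size.

```python
# The P3D factorisation replaces a full 3x3x3 kernel with a 1x3x3
# spatial kernel followed by a 3x1x1 temporal kernel. A quick weight
# count per input/output channel pair (bias ignored) shows the saving:
full_3d  = 3 * 3 * 3           # 27 weights for a 3x3x3 kernel
spatial  = 1 * 3 * 3           # 9 weights for the 1x3x3 kernel
temporal = 3 * 1 * 1           # 3 weights for the 3x1x1 kernel
factored = spatial + temporal  # 12 weights in total

assert full_3d == 27 and factored == 12
print(f"P3D uses {factored}/{full_3d} weights per kernel "
      f"({factored / full_3d:.0%} of a full 3D convolution)")
```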
- The present invention further proposes a driver fatigue detection system based on combining a P3D CNN and an attention mechanism, including:
- a video capture and crop module, configured to provide continuous video stream information of a driver; and
- a driver fatigue detection module, configured to detect driver fatigue using the detection method.
- Further, the video capture and crop module captures a real-time video stream of upper body information of the driver;
- the driver fatigue detection module reserves an interface, and the input of the driver fatigue detection module is video stream data in a correct format.
- 1) The present invention designs a P3D-Attention module. The P3D-Attention module is based on the decoupling of spatial and temporal convolutions by the P3D module, and is respectively integrated with an adaptive SA module and an adaptive DCAM, thereby fully integrating the spatiotemporal features. In this way, the correlation of important channel features is improved and the global correlation of feature maps is increased, thereby improving the performance of fatigue driving prediction.
- 2) A comparative test performed on the public dataset Yawning Detection Dataset (YawDD) shows that the F1-score of the method of the present invention reaches 99.89%, and the recall rate on the yawning category reaches 100%. On the dataset named University of Texas at Arlington Real-Life Drowsiness Dataset (UTA-RLDD), the F1-score of the method of the present invention reaches 99.64% on the test set, and the recall rate reaches 100% on the drowsy category.
FIG. 1 schematically shows modular design of the driver fatigue detection method based on combining the pseudo-three-dimensional (P3D) convolutional neural network (CNN) and the attention mechanism. -
FIG. 2 is a schematic diagram showing the cascading of the three P3D modules. -
FIG. 3 schematically shows the structure of the channel attention (CA) module. -
FIG. 4 schematically shows the structure of the spatial attention (SA) module. -
FIG. 5 schematically shows the P3D-Attention-A architecture. -
FIG. 6 schematically shows the P3D-Attention-B architecture. -
FIG. 7 schematically shows the P3D-Attention-C architecture. -
FIG. 8A is a schematic diagram showing the features before being processed by the attention mechanism. -
FIG. 8B is a schematic diagram showing the features after being processed by the attention mechanism. -
FIG. 9 is a schematic diagram showing the cascaded modules of P3D-Attention. - The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention. Apparently, the described embodiments are merely a part rather than all of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts shall fall within the scope of protection of the present invention.
- The embodiments of the present invention are described in detail below with reference to the drawings.
- The present invention proposes a driver fatigue detection method based on combining a pseudo-three-dimensional (P3D) convolutional neural network (CNN) and an attention mechanism, which includes the following steps:
- Step 1: A video frame sequence is extracted from a driver’s video and processed.
- Step 2: Spatiotemporal feature learning is performed through a P3D convolution module.
- Step 3: A P3D-Attention module is constructed, and attention is applied on a channel and a feature map through the attention mechanism.
- Step 4: A 3D GAP layer is replaced with a 2D GAP layer to obtain more expressive features, and a classification is performed through a Softmax classification layer.
- In Step 1, a method of extracting and processing the video frame sequence from the video includes: The video is captured for approximately 5 seconds each time to extract 180 video frames.
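A minimal sketch of this frame-extraction step is given below, assuming the source capture may have more frames than needed and is subsampled to 180 evenly spaced frames; the source frame rate used in the example is an assumption, not taken from the patent.

```python
# Illustrative helper for Step 1: given a capture of roughly 5 seconds,
# pick 180 evenly spaced frame indices from the available frames.
def sample_frame_indices(total_frames: int, n_samples: int = 180) -> list[int]:
    step = total_frames / n_samples
    return [int(i * step) for i in range(n_samples)]

# Example: a hypothetical 5-second capture at 60 fps yields 300 frames.
indices = sample_frame_indices(total_frames=300)
assert len(indices) == 180
assert indices[0] == 0 and indices[-1] < 300
```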
- In Step 2, the step of performing the spatiotemporal feature learning through the P3D convolution module specifically includes: The P3D architecture uses a 1×3×3 convolution kernel and a 3×1×1 convolution kernel to simulate a 3×3×3 convolution in spatial and temporal domains, and decouples the 3×3×3 convolution in time and space. Based on the P3D architecture, P3D architectures with the sizes of 32, 64, and 128 are cascaded to obtain an image feature, and down-sampling is performed through a max pooling layer, as shown in
FIG. 2 . - In Step 3, the step of constructing the P3D-Attention module specifically includes: On the basis that the P3D module decouples the 3D convolution into a spatial and temporal convolution, the attention module is integrated to design three different P3D-Attention modules, i.e., P3D-Attention-A to P3D-Attention-C, to obtain the network model. This module uses a DCAM to make key frames play a more important role in classification. In addition, the SA module is introduced to automatically assign different attention to different joint points according to the feature map, attention is paid to positions such as the eyes and mouth mentioned in the prior knowledge, and the interference of the background and noise on recognition is removed. The attention mechanism is expressed as:
- F′ = M(F) ⊗ F (1)
- Step 3.1: In order to adapt to the 3D convolution, a module named DCAM is constructed, as shown in
FIG. 3 . In order to use the attention mechanism on the 3D convolution, the DCAM of the present invention applies attention between video frames and on the channels of each frame, instead of only on the time frame level. Taking the feature map F ∈ R(F×H×W×C) as an example, F in R represents the number of frames; C represents the number of channels in each frame; H and W represent the height and width of the feature map; the contributions of the channels to the final detection result are not equal. The DCAM learns the weight of Mc ∈ R(F×1×1×C) to determine the importance of each channel. The attention on the frame and channel is expressed by transposing the feature map with a size of (F, H, W, C) into (H, W, F×C), embedding (H, W, F×C) in the 2D SA module, and learning the weights of Mf ∈ R(F×1×1×1) and Mc ∈ R(1×1×1×C), respectively. - Step 3.2: In order to obtain key information, the human visual mechanism pays more attention to the main target rather than the background. Therefore, in the present invention, the weight map of the feature layer in the spatial dimension is obtained through the SA module. Taking the feature map F ∈ R(F×H×W×C) as an example, the SA module learns the weight of Ms ∈ R(1×W×H×1) to determine the importance of each feature map. The SA mechanism mainly uses a 2D convolution kernel to obtain a spatial feature weight. During the driving process, the scene in the car hardly changes. Therefore, different from other tasks that need to consider multiple scales, the special scene of fatigue detection can use convolution kernels of different sizes to adapt to convolution features of different depths. The modular architecture is shown in
FIG. 4 . - Step 3.3: In performing the fatigue driving detection task, the data input by the model is a continuous frame of a video. On the basis that the P3D module decouples the 3D convolution into a spatial and temporal convolution, the attention module is integrated to design three different P3D-Attention blocks, i.e., P3D-Attention-A to P3D-Attention-C to obtain the network model, as shown in
FIGS. 5-7 . This module uses the DCAM to make key frames play a more important role in classification. In addition, the SA module is introduced to automatically assign different attention to different joint points according to the feature map, attention is paid to positions such as eyes and mouth mentioned in the prior knowledge, and the interference of the background and noise on recognition is removed. - P3D-Attention-A: Based on the RU, the original P3D-A module first cascades the temporal 1D convolution kernel (T) to the spatial 2D convolution kernel (S) before considering the stacked architecture. Therefore, these two convolution kernels can directly influence each other on the same path, and only the temporal 1D convolution kernel is directly connected to the final output. Since the fatigue detection task does not require excessively deep convolutional layers, the RU is removed and only P3D-A is retained. The SA module is cascaded after S, and then the CA module is cascaded after T to obtain the P3D-Attention-A architecture, as shown in Equation 2:
- Xt+1 = CA(T(SA(S(Xt)))) (2), with SA(F) = Ms(F) ⊗ F and CA(F) = Mc(F) ⊗ F denoting element-wise application of the spatial and channel attention weights;
- where, Xt represents an input feature map; Xt+1 represents an output obtained after the attention mechanism is applied; and Xt and Xt+1 have the same feature dimension.
- P3D-Attention-B: The original P3D-B uses the indirect influence between two convolution kernels, so that the two convolution kernels process convolution features in parallel. After the residual unit is removed, the SA module is cascaded after S, and then the CA module is cascaded after T, which is expressed as follows:
- Xt+1 = SA(S(Xt)) + CA(T(Xt)) (3)
- P3D-Attention-C: The original P3D-C module is a compromise between P3D-A and P3D-B, through which the direct influence between S, T and the final output is simultaneously established. Specifically, in order to achieve a direct connection between S and the final output based on the cascaded P3D-A architecture, the P3D-Attention-C is constructed by introducing the attention module, which is expressed as:
- Xt+1 = SA(S(Xt)) + CA(T(SA(S(Xt)))) (4)
- As shown in
FIGS. 5 to 7 , the P3D-Attention architecture includes P3D-Attention-A, P3D-Attention-B and P3D-Attention-C in sequence. - Step 3.4: The attention mechanism assigns different weights to different channels and features. After several convolutions, the spatiotemporal feature information of 90 frames is fused into 7 frames.
FIG. 8A and FIG. 8B show the comparison between the features on the same channel before and after being processed by the CA mechanism and the SA mechanism, which illustrates that the features of the face, and in particular the more important features of the eyes and mouth, become more obvious. - Step 3.5: Three P3D-Attention modules (with sizes of 128, 256 and 256) are cascaded in the network architecture of the present invention to obtain key features, and the 3D max pooling layer is cascaded after the P3D-Attention-A module with a size of 128 to perform down sampling, as shown in
FIG. 9 . - Further, in Step 4, a method of using the 2D GAP layer in the 3D convolution specifically includes: After the video frames are processed by the three P3D modules and three P3D-Attention modules, the time signals are not completely folded. In order to obtain more temporal feature information, the features are transposed and input into the 2D GAP layer instead of using 3D GAP. Finally, the features are input into Softmax and classified.
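The difference between the 2D and 3D global average pooling choices can be sketched with assumed shapes: pooling only over the spatial axes keeps one feature vector per frame, while a 3D GAP would also fold the frame axis away.

```python
import numpy as np

# Sketch of why the 2D GAP keeps more temporal information: pooling a
# (F, H, W, C) feature volume only over H and W yields one C-vector per
# frame (F*C values), whereas a 3D GAP collapses the frame axis too,
# leaving a single C-vector. Shapes are assumed example values.
F, H, W, C = 7, 4, 4, 256
feat = np.random.rand(F, H, W, C)

gap_2d = feat.mean(axis=(1, 2))        # shape (F, C): per-frame features
gap_3d = feat.mean(axis=(0, 1, 2))     # shape (C,): frame axis folded away

assert gap_2d.shape == (7, 256)
assert gap_3d.shape == (256,)
assert gap_2d.size == 7 * gap_3d.size  # 7x more values reach the classifier
```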
- Step 4.1: The GAP layer is used in order to replace a fully connected layer, reduce the number of parameters, and prevent overfitting. In some embodiments, the fully connected layer is replaced with the GAP layer, the feature map output by the convolution architecture is transposed, and then more temporal features are retained through the 2D GAP layer.
- Step 4.2: The output of the 2D GAP layer is used as the input of Softmax to classify driver behaviors. A warning is issued when driver fatigue is detected.
- Step 4.3: The entire network is a CNN architecture. In order to reduce the misjudgment of the model during training, the present invention uses F1-score instead of accuracy to evaluate the performance of the model. F1-Score, also known as balanced F score, is defined as a harmonic average of precision P and recall rate R.
- Precision (P) refers to a ratio of the number of true positives to the number of positive samples determined by a classifier.
- P = TP / (TP + FP)
- where, TP indicates true positives correctly classified by the classifier; FP indicates false positives incorrectly classified by the classifier.
- Recall (R) refers to a ratio of the number of predicted true positives to the total number of positive samples.
- R = TP / (TP + FN)
- where, FN indicates false negatives incorrectly classified by the classifier.
- F1-score refers to a harmonic average of P and R.
- F1 = 2 × P × R / (P + R)
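The precision, recall and F1-score definitions above can be computed directly from a classifier's confusion counts; the numbers in the example are illustrative, not the patent's reported results.

```python
# Precision P = TP / (TP + FP), recall R = TP / (TP + FN), and
# F1 = 2PR / (P + R), computed from raw confusion counts.
def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Example: 90 true positives, 10 false positives and 10 false negatives
# give P = R = 0.9, hence F1 = 0.9.
assert abs(f1_score(90, 10, 10) - 0.9) < 1e-12
```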
- Further, a driver fatigue detection system based on combining a P3D CNN and an attention mechanism includes a video capture and crop module, a method integration module and a display module. The video capture and crop module uses cameras fixedly installed directly in front of or on the left and right sides of the driver in a cab, and is configured to capture a real-time video stream of upper body information of the driver. The real-time video stream is displayed on the display module and is transmitted as input information into the method integration module.
- The method integration module is configured to encapsulate the driver fatigue detection method based on combining the P3D CNN and the attention mechanism, and reserve an interface to form a black box. The input of the method integration module is video stream data in a correct format.
- The display module is used as an image presentation carrier and configured to display input video image information, output driver fatigue detection state information, and warning information output after the driver fatigue is detected.
- The present invention designs a P3D-Attention module. The P3D-Attention module is based on the decoupling of spatial and temporal convolutions by the P3D module, and is respectively integrated with the adaptive SA module and DCAM, thereby fully integrating the spatiotemporal features. Thus, the correlation of important channel features is improved and the global correlation of feature maps is increased, thereby improving the performance of fatigue driving prediction. Compared with the LSTM fusion method using the Inception-V3 model, the model size of the method of the present invention is 42.5 MB, which is ⅑ of that of the LSTM fusion method. The prediction of a 180-frame video (approximately 5 seconds) takes approximately 660 milliseconds, which is approximately 11% of that of the LSTM fusion method.
- The above description is merely the specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any modification or replacement easily conceived by those skilled in the art within the technical scope of the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection defined by the claims.
Claims (20)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010522475.2A CN111428699B (en) | 2020-06-10 | 2020-06-10 | Driving fatigue detection method and system combining pseudo-3D convolutional neural network and attention mechanism |
CN202010522475.2 | 2020-06-10 | ||
PCT/CN2020/109693 WO2021248687A1 (en) | 2020-06-10 | 2020-08-18 | Driving fatigue detection method and system combining pseudo 3d convolutional neural network and attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
US20230154207A1 true US20230154207A1 (en) | 2023-05-18 |
US11783601B2 US11783601B2 (en) | 2023-10-10 |
Family
ID=71551314
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/043,681 Active 2041-12-26 US11783601B2 (en) | 2020-06-10 | 2020-08-18 | Driver fatigue detection method and system based on combining a pseudo-3D convolutional neural network and an attention mechanism |
Country Status (3)
Country | Link |
---|---|
US (1) | US11783601B2 (en) |
CN (1) | CN111428699B (en) |
WO (1) | WO2021248687A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116740649A (en) * | 2023-08-07 | 2023-09-12 | 山东科技大学 | Deep learning-based real-time detection method for behavior of crewman falling into water beyond boundary |
CN116778395A (en) * | 2023-08-21 | 2023-09-19 | 成都理工大学 | Mountain torrent flood video identification monitoring method based on deep learning |
CN117079256A (en) * | 2023-10-18 | 2023-11-17 | 南昌航空大学 | Fatigue driving detection algorithm based on target detection and key frame rapid positioning |
CN117218720A (en) * | 2023-08-25 | 2023-12-12 | 中南民族大学 | Footprint identification method, system and related device of composite attention mechanism |
Families Citing this family (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111428699B (en) | 2020-06-10 | 2020-09-22 | 南京理工大学 | Driving fatigue detection method and system combining pseudo-3D convolutional neural network and attention mechanism |
CN111985343B (en) * | 2020-07-23 | 2024-04-09 | 深圳大学 | Construction method of behavior recognition depth network model and behavior recognition method |
CN111860427B (en) * | 2020-07-30 | 2022-07-01 | 重庆邮电大学 | Driving distraction identification method based on lightweight class eight-dimensional convolutional neural network |
CN111985617B (en) * | 2020-08-14 | 2023-09-26 | 杭州海康威视数字技术股份有限公司 | Processing method and device of 3D convolutional neural network on neural network processor |
CN112131981B (en) * | 2020-09-10 | 2021-06-22 | 山东大学 | Driver fatigue detection method based on skeleton data behavior recognition |
CN112507920B (en) * | 2020-12-16 | 2023-01-24 | 重庆交通大学 | Examination abnormal behavior identification method based on time displacement and attention mechanism |
CN114642413A (en) * | 2020-12-21 | 2022-06-21 | 奥泰医疗系统有限责任公司 | MRI head 3D image automatic scanning positioning method based on deep learning |
CN113435234B (en) * | 2021-03-25 | 2024-01-23 | 北京邮电大学 | Driver visual saliency area prediction method based on bimodal video EEG data |
CN113065450B (en) * | 2021-03-29 | 2022-09-20 | 重庆邮电大学 | Human body action recognition method based on separable three-dimensional residual error attention network |
CN113076884B (en) * | 2021-04-08 | 2023-03-24 | 华南理工大学 | Cross-mode eye state identification method from near infrared light to visible light |
CN113505305A (en) * | 2021-05-11 | 2021-10-15 | 清华大学 | Collaborative filtering recommendation method and system based on decoupling type two-channel hypergraph neural network |
CN113283338A (en) * | 2021-05-25 | 2021-08-20 | 湖南大学 | Method, device and equipment for identifying driving behavior of driver and readable storage medium |
CN113255530B (en) * | 2021-05-31 | 2024-03-29 | 合肥工业大学 | Attention-based multichannel data fusion network architecture and data processing method |
CN113592900A (en) * | 2021-06-11 | 2021-11-02 | 安徽大学 | Target tracking method and system based on attention mechanism and global reasoning |
CN114241453B (en) * | 2021-12-20 | 2024-03-12 | 东南大学 | Driver distraction driving monitoring method utilizing key point attention |
CN114332592B (en) * | 2022-03-11 | 2022-06-21 | 中国海洋大学 | Ocean environment data fusion method and system based on attention mechanism |
CN114565977B (en) * | 2022-03-16 | 2023-05-02 | 电子科技大学 | Gait feature extraction method |
CN114821421A (en) * | 2022-04-28 | 2022-07-29 | 南京理工大学 | Traffic abnormal behavior detection method and system |
CN114758302A (en) * | 2022-05-07 | 2022-07-15 | 广东电网有限责任公司广州供电局 | Electric power scene abnormal behavior detection method based on decentralized attention mechanism |
CN114821968B (en) * | 2022-05-09 | 2022-09-13 | 西南交通大学 | Intervention method, device and equipment for fatigue driving of motor car driver and readable storage medium |
CN115049969B (en) * | 2022-08-15 | 2022-12-13 | 山东百盟信息技术有限公司 | Bad video detection method for improving YOLOv3 and BiConvLSTM |
CN115272776B (en) * | 2022-09-26 | 2023-01-20 | 山东锋士信息技术有限公司 | Hyperspectral image classification method based on double-path convolution and double attention and storage medium |
CN115561243B (en) * | 2022-09-30 | 2023-05-23 | 东莞市言科新能源有限公司 | Pole piece quality monitoring system and method in lithium battery preparation |
CN115578719B (en) * | 2022-10-13 | 2024-05-17 | 中国矿业大学 | YM_SSH-based fatigue state detection method for lightweight target detection |
CN115578615B (en) * | 2022-10-31 | 2023-05-09 | 成都信息工程大学 | Night traffic sign image detection model building method based on deep learning |
CN115919315B (en) * | 2022-11-24 | 2023-08-29 | 华中农业大学 | Cross-main-body fatigue detection deep learning method based on EEG channel multi-scale parallel convolution |
CN115775236B (en) * | 2022-11-24 | 2023-07-14 | 广东工业大学 | Visual detection method and system for surface micro defects based on multi-scale feature fusion |
CN115762787B (en) * | 2022-11-24 | 2023-07-07 | 浙江大学 | Eyelid disease operation curative effect evaluation method and system |
CN115979350A (en) * | 2023-03-20 | 2023-04-18 | 北京航天华腾科技有限公司 | Data acquisition system of ocean monitoring equipment |
CN117197727B (en) * | 2023-11-07 | 2024-02-02 | 浙江大学 | Global space-time feature learning-based behavior detection method and system |
CN117612142A (en) * | 2023-11-14 | 2024-02-27 | 中国矿业大学 | Head posture and fatigue state detection method based on multi-task joint model |
CN117576666B (en) * | 2023-11-17 | 2024-05-10 | 合肥工业大学 | Dangerous driving behavior detection method based on multi-scale dynamic convolution attention weighting |
CN117831301B (en) * | 2024-03-05 | 2024-05-07 | 西南林业大学 | Traffic flow prediction method combining three-dimensional residual convolution neural network and space-time attention mechanism |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427807A (en) * | 2019-06-21 | 2019-11-08 | 诸暨思阔信息科技有限公司 | A kind of temporal events motion detection method |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0785280B2 (en) | 1992-08-04 | 1995-09-13 | タカタ株式会社 | Collision prediction judgment system by neural network |
US20190244108A1 (en) * | 2018-02-08 | 2019-08-08 | Cognizant Technology Solutions U.S. Corporation | System and Method For Pseudo-Task Augmentation in Deep Multitask Learning |
CN110188239B (en) * | 2018-12-26 | 2021-06-22 | 北京大学 | Double-current video classification method and device based on cross-mode attention mechanism |
CN110263638B (en) * | 2019-05-16 | 2023-04-18 | 山东大学 | Video classification method based on significant information |
CN110378281A (en) * | 2019-07-17 | 2019-10-25 | 青岛科技大学 | Group Activity recognition method based on pseudo- 3D convolutional neural networks |
CN111428699B (en) * | 2020-06-10 | 2020-09-22 | 南京理工大学 | Driving fatigue detection method and system combining pseudo-3D convolutional neural network and attention mechanism |
-
2020
- 2020-06-10 CN CN202010522475.2A patent/CN111428699B/en active Active
- 2020-08-18 US US17/043,681 patent/US11783601B2/en active Active
- 2020-08-18 WO PCT/CN2020/109693 patent/WO2021248687A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
Lei et al, "Infant Brain MRI Segmentation with Dilated Convolution Pyramid Downsampling and Self-attention", 2019, arXiv preprint arXiv:1912 (9 PAGES) (Year: 2019) * |
Also Published As
Publication number | Publication date |
---|---|
CN111428699A (en) | 2020-07-17 |
CN111428699B (en) | 2020-09-22 |
US11783601B2 (en) | 2023-10-10 |
WO2021248687A1 (en) | 2021-12-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11783601B2 (en) | Driver fatigue detection method and system based on combining a pseudo-3D convolutional neural network and an attention mechanism | |
Mandal et al. | Towards detection of bus driver fatigue based on robust visual analysis of eye state | |
Craye et al. | Driver distraction detection and recognition using RGB-D sensor | |
CN111753674A (en) | Fatigue driving detection and identification method based on deep learning | |
Yan et al. | Recognizing driver inattention by convolutional neural networks | |
Dipu et al. | Real-time driver drowsiness detection using deep learning | |
Kumar et al. | Driver drowsiness detection using modified deep learning architecture | |
CN112949345A (en) | Fatigue monitoring method and system, automobile data recorder and intelligent cabin | |
Dari et al. | A neural network-based driver gaze classification system with vehicle signals | |
Pandey et al. | Temporal and spatial feature based approaches in drowsiness detection using deep learning technique | |
Liu et al. | 3dcnn-based real-time driver fatigue behavior detection in urban rail transit | |
Pandey et al. | A survey on visual and non-visual features in Driver’s drowsiness detection | |
Kassem et al. | Drivers fatigue level prediction using facial, and head behavior information | |
Yarlagadda et al. | Driver drowsiness detection using facial parameters and rnns with lstm | |
Sistla et al. | Stacked ensemble classification based real-time driver drowsiness detection | |
AU2021103045A4 (en) | Drowsy detection using image processing | |
Mridha et al. | Driver Drowsiness Alert System Using Real-Time Detection | |
Adarsh et al. | Drowsiness Detection System in Real Time Based on Behavioral Characteristics of Driver using Machine Learning Approach | |
Subbaiah et al. | Driver drowsiness detection methods: A comprehensive survey | |
QU et al. | Multi-Attention Fusion Drowsy Driving Detection Model | |
Verma et al. | DRIVER DROWSINESS DETECTION | |
You et al. | R2DS: A novel hierarchical framework for driver fatigue detection in mountain freeway | |
Mishra | DRIVER DROWSINESS DETECTION | |
Pulluri et al. | An Efficient Vision based Method for Detecting Drowsiness in Real-time | |
Zahrae et al. | Enhancing Driver Safety: Real-time Drowsiness Detection with InceptionV3 Transfer Learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: NANJING UNIVERSITY OF SCIENCE AND TECHNOLOGY, CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:QI, YONG;ZHUANG, YUAN;REEL/FRAME:053926/0038 Effective date: 20200901 |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |