CN113408389A - Method for intelligently recognizing drowsiness action of driver - Google Patents

Method for intelligently recognizing drowsiness action of driver

Info

Publication number
CN113408389A
CN113408389A
Authority
CN
China
Prior art keywords
drowsiness
driver
optical flow
model
image information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110650708.1A
Other languages
Chinese (zh)
Inventor
唐明伟
李林熹
赵潇然
毛红运
曾晟珂
陈晓亮
何明星
徐杨胜
王鹏程
王刘萱
蒙科竹
陶林平
田佳鑫
蒋一铭
杨凌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xihua University
Original Assignee
Xihua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xihua University filed Critical Xihua University
Priority to CN202110650708.1A priority Critical patent/CN113408389A/en
Publication of CN113408389A publication Critical patent/CN113408389A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention provides a method for intelligently identifying drowsiness actions of a driver, which comprises the following steps: step one: acquiring a video stream during the driving process of a driver; step two: preprocessing the video stream to obtain grayscale image information and optical flow image information of the video stream; step three: taking the grayscale image information and the optical flow image information as the input of a drowsiness action recognition model, thereby obtaining the drowsiness action of the driver. The invention takes the optical flow image as an input of the drowsiness action detection model; since the optical flow image stores the motion information of the object, the detection accuracy is further improved.

Description

Method for intelligently recognizing drowsiness action of driver
Technical Field
The invention relates to the technical field of image processing, in particular to a method for intelligently identifying drowsiness actions of a driver.
Background
In recent years, deep learning has exhibited excellent performance in various applications. CNN, also known as the two-dimensional convolutional neural network (2D-CNN), is one of the most powerful deep learning algorithms in image recognition and classification. In the field of drowsiness detection, Zhao et al. proposed a CNN-based drowsiness detection method. They extract a face region from an image, classify the images according to the state of the eyes using the proposed CNN model, and then determine whether the subject is drowsy according to PERCLOS values. By using CNN for feature extraction, the drowsiness detection accuracy is significantly improved. However, since a 2D-CNN convolves only the width and height of the image and does not include temporal features, limitations remain: 2D-CNNs do not take into account the motion information contained in a continuous sequence of frames. When the driver frequently yawns or nods off, it indicates that the driver is drowsy. These actions are dynamic and cannot be reflected in a single image, whereas a 2D-CNN can only determine the static state of the driver. For example, an eye-closed image may capture a normal blink, or it may be one frame of a slow blink, where a slow blink means that the driver's blinking has slowed down due to fatigue, or the driver has simply fallen asleep with eyes closed. The duration of the eye-closed state during a slow blink is longer than during a normal blink, which cannot be reflected in a two-dimensional image. To extract temporal features from a continuous sequence of frames, the three-dimensional convolutional neural network (3D-CNN) was therefore proposed, which integrates spatio-temporal information into a single model to capture discriminative features in the spatio-temporal dimension. However, the detection accuracy of the three-dimensional convolutional neural network is still not high enough.
Disclosure of Invention
The invention aims to solve the defects in the prior art and provides a method for intelligently identifying drowsiness actions of a driver with higher identification precision.
A method for intelligently recognizing drowsiness actions of a driver comprises the following steps:
step one: acquiring a video stream during the driving process of a driver;
step two: preprocessing the video stream to obtain gray image information and optical flow image information of the video stream;
step three: and taking the gray scale image information and the optical flow image information as the input of a drowsiness action recognition model, thereby obtaining the drowsiness action of the driver.
Further, according to the method for intelligently recognizing the drowsiness action of the driver as described above, in the second step, the preprocessing includes:
step 2-1: dividing the video stream into a plurality of video segments at certain time intervals;
step 2-2: extracting image information of the video clip according to frames, and respectively converting the image information of each frame to obtain a gray level image sequence;
step 2-3: and calculating optical flow information between two frames according to the optical flow between adjacent frames, thereby obtaining an optical flow image sequence.
Further, in the method for intelligently identifying the drowsiness action of the driver as described above, the time interval in step 2-1 is such that a video clip is intercepted every 3 seconds.
Further, according to the method for intelligently recognizing the drowsiness of the driver as described above, the image size of each frame is 224 × 224.
Further, in the method for intelligently recognizing the drowsiness action of the driver as described above, the action categories recognized by the model include: normal driving, nodding, slow blinking, and yawning.
Further, in the method for intelligently recognizing drowsiness of the driver as described above, the drowsiness action recognition model comprises: convolutional layers, pooling layers, fully-connected layers and a softmax classifier, connected in sequence;
wherein there are 4 convolutional layers; the first convolutional layer comprises 8 convolution kernels, each of size 3 × 3 × 3; the second and third convolutional layers have 16 convolution kernels each, and the fourth convolutional layer has 8 convolution kernels;
the full connection layer comprises two full connection layers, namely fc1 full connection layer and fc2 full connection layer; the number of the neurons of the fc1 full connection layer is 23520, and the number of the neurons of the fc2 full connection layer is 64.
Beneficial effects:
the invention provides a drowsiness action recognition method based on 3D-CNN, which can extract spatial and temporal features from an input image sequence and is beneficial to action recognition.
The invention also uses the optical flow image as the input of the drowsiness action detection model. The optical flow image stores the motion information of the object, so that the detection accuracy is further improved.
The method provided by the invention can identify a plurality of drowsiness actions, and the experimental result shows that the classification accuracy of the model reaches 86.6%, which is competitive with the existing method.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a conceptual diagram of temporal IoU;
FIG. 3 is a diagram of a drowsiness action recognition model;
FIG. 4 is a schematic view of an optical flow image;
FIG. 5 is a graph comparing the effect of data enhancement and optical flow on model accuracy;
FIG. 6 is a graph comparing the effect of data enhancement and optical flow on model loss.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention are described clearly and completely below, and it is obvious that the described embodiments are some, not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention is improved on the basis of a three-dimensional convolution neural network model, and an optical flow image of object motion information is input into the model for training.
As shown in FIG. 1, the invention provides a drowsiness action recognition model based on a three-dimensional convolutional neural network for driver drowsiness detection research. The model can recognize four different actions, including a non-drowsy action (normal driving) and three drowsy actions (nodding, slow blinking, yawning); the input of the model is a 10-frame grayscale image sequence and a 9-frame optical flow image sequence. First, 10 frames are extracted from a 3-second video clip, and each frame is converted into a grayscale image. In addition, we calculate the optical flow between adjacent frames. The grayscale image sequence and the optical flow image sequence are then input into a pre-trained drowsiness action recognition model, and a 1 × 4 vector is obtained representing the probability of each class.
Data pre-processing
The acquired video stream needs to be preprocessed first. The drowsiness action recognition model is the core of the proposed drowsiness detection scheme; it can recognize four behaviors: normal driving, nodding, slow blinking, and yawning. Since the model is based on a three-dimensional convolutional neural network, its input is a sequence of frames, so successive frames must be extracted from the video. Each video in the National Tsing Hua University Driver Drowsiness Detection (NTHU-DDD) dataset lasts about 1 minute, i.e., 1800 frames, which is too large as a model input given the limited computing power of the experimental equipment. Therefore, we need to clip these videos into video segments. A clipped segment cannot be too long: a segment that is too long contains multiple drowsiness actions, and the model then fails to converge during training. Nor can it be too short, which would fail to capture the characteristics of an action, because a drowsiness action may contain multiple phases. For example, yawning includes an opening-mouth phase, a mouth-open-to-maximum-and-hold phase, and a closing-mouth phase, and an overly short segment would cause the model to recognize different phases of the same action as different actions. For this reason, we counted the duration of each drowsiness action in part of the videos and found that the duration of these actions is above 3 seconds. To ensure that all drowsiness actions are detected, we set the duration of a video segment to 3 seconds and apply this setting to the training and test sets of NTHU-DDD.
We use the concept of temporal IoU to convert the frame-level annotations in the NTHU-DDD dataset into segment-level annotations; the concept of temporal IoU is shown in FIG. 2. The frame-level annotation value that accounts for more than 50% of a segment's frames is designated as the annotation value of that segment. Furthermore, we extract 10 grayscale frames from each 3-second video segment at average time intervals, and then calculate the optical flow between adjacent frames using the LK optical flow method to capture the motion information of the object. Each frame is then resized to 224 × 224 and input into the model. The most common method of preventing model overfitting is to artificially enlarge the dataset using label-preserving transformations. In the present invention, we apply horizontal image flipping and brightness-contrast transformation to augment the dataset. The data generated by data enhancement enlarges the number of training samples and increases their diversity, which prevents overfitting and improves the performance of the model.
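The label-preserving transformations mentioned above can be sketched in NumPy as follows. The specific brightness/contrast coefficients are assumptions for illustration; the patent only states that flipping and a brightness-contrast transformation are applied:

```python
import numpy as np

def horizontal_flip(seq: np.ndarray) -> np.ndarray:
    """Flip every frame of a (frames, height, width) sequence left-right;
    the class label is unchanged, so the transform is label-preserving."""
    return seq[:, :, ::-1].copy()

def brightness_contrast(seq: np.ndarray, alpha: float = 1.2,
                        beta: float = 10.0) -> np.ndarray:
    """Apply out = alpha * in + beta per pixel, clipped to the 8-bit range.
    alpha scales contrast, beta shifts brightness (illustrative values)."""
    out = seq.astype(np.float32) * alpha + beta
    return np.clip(out, 0, 255).astype(np.uint8)

# a dummy 10-frame grayscale sequence at the 224 x 224 size used by the model
seq = np.random.randint(0, 256, size=(10, 224, 224), dtype=np.uint8)
augmented = [seq, horizontal_flip(seq), brightness_contrast(seq)]
```

Applying both transforms to every original sequence is consistent with the reported growth of the training set from 6918 to 27672 sequences (each sample plus its flipped and intensity-shifted variants).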
Drowsiness action recognition model
The input of the 3D-CNN is an image sequence rather than a single image, so the convolution kernels and the generated feature maps are three-dimensional, and the output feature maps contain spatio-temporal feature representations. Formally, the value at position (i, j, k) in a feature map of layer l+1 is denoted P^(l+1)(i, j, k):

P^(l+1)(i, j, k) = α(f(i, j, k) + b)   (1)

f(i, j, k) = Σ_n Σ_{x=0}^{w-1} Σ_{y=0}^{h-1} Σ_{z=0}^{d-1} w_n(x, y, z) · P_n^l(s·i + x, s·j + y, s·k + z)   (2)

where α denotes the activation function and n indexes the feature maps of layer l; w, h and d denote the width, height and depth of the convolution kernel; w_n(x, y, z) denotes the kernel weight at position (x, y, z); s denotes the stride; and b denotes the bias.
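Equations (1) and (2) can be rendered directly in NumPy. The sketch below computes a single output position, mirroring the summation term for term; ReLU is an illustrative choice for the activation α and is not specified by the patent:

```python
import numpy as np

def conv3d_point(feature_maps, kernels, i, j, k, stride=1, bias=0.0):
    """Compute P^(l+1)(i, j, k) per equations (1)-(2).

    feature_maps: array (N, W, H, D) -- the N feature cubes of layer l
    kernels:      array (N, w, h, d) -- one w*h*d kernel per input cube
    """
    n_maps = feature_maps.shape[0]
    _, w, h, d = kernels.shape
    f = 0.0
    for n in range(n_maps):                      # sum over feature maps
        for x in range(w):
            for y in range(h):
                for z in range(d):               # sum over the kernel volume
                    f += kernels[n, x, y, z] * feature_maps[
                        n, stride * i + x, stride * j + y, stride * k + z]
    return max(0.0, f + bias)                    # alpha = ReLU (assumption)

rng = np.random.default_rng(0)
fmaps = rng.standard_normal((2, 8, 8, 4))
kerns = rng.standard_normal((2, 3, 3, 3))
p = conv3d_point(fmaps, kerns, 0, 0, 0)
```

A real implementation would of course vectorize this, but the quadruple loop makes the correspondence with the four summations in equation (2) explicit.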
The drowsiness action recognition model provided by the invention is based on 3D-CNN and comprises two input streams: a 10-frame grayscale image sequence and a 9-frame optical flow image sequence. The grayscale image stream passes through four convolutional layers and four pooling layers; similarly, the optical flow image stream also passes through four convolutional layers and four pooling layers. The two outputs are then flattened into one-dimensional vectors and connected to the fully-connected layers, and finally the classification result of the input image sequence is obtained. The model architecture is shown in FIG. 3, where gray denotes the grayscale image input stream and flow denotes the optical flow image input stream. 224 × 224 × 10 denotes that the input images are 224 × 224 in width and height, and 10 is the depth (number of frames) of the input image sequence. C_{i-j} k@w × h × d indicates that the jth convolutional layer of the ith input stream has k convolution kernels of size w × h and depth d. m@w × h × d on the left side of a cube in the figure indicates that the current layer has m feature cubes (corresponding to feature maps in two-dimensional convolution) of size w × h × d. The first convolutional layer contains 8 convolution kernels, each of size 3 × 3 × 3; the second and third convolutional layers have 16 convolution kernels each, and the fourth convolutional layer has 8. S_{i-j} w × h × d indicates that the pooling size of the jth pooling layer of the ith input stream is w × h × d. fc denotes a fully-connected layer: the fc1 layer stretches the learned feature cubes into one dimension, with 23520 neurons in total, and is connected to the 64 neurons of the fc2 layer; fc2 is finally connected to the 4 neurons of the output layer, and the final classification result is obtained through a softmax classifier.
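Since the exact pooling kernel sizes are not spelled out in the text, the shapes flowing through the network cannot be reproduced exactly here, but the per-dimension arithmetic of a valid (unpadded) convolution or pooling step is a one-liner; this small helper, with illustrative inputs, shows how each 3 × 3 × 3 convolution shrinks the feature cube:

```python
def conv_out(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Output extent of a valid convolution/pooling along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1

# one 3x3x3 convolution (stride 1, no padding) on a 224 x 224 x 10 input:
w = conv_out(224, 3)      # spatial extent after the convolution
d = conv_out(10, 3)       # temporal extent (frame depth) after the convolution
half = conv_out(224, 2, stride=2)  # e.g. a 2x2 spatial pooling halves the width
```

The same formula applied layer by layer (with the model's actual pooling sizes) would yield the 23520-element flattened vector fed to fc1.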
LK optical flow method
Because three-dimensional convolution has one more (temporal) dimension than two-dimensional convolution, the model can learn feature representations of the input images in both time and space, and previous research has shown that three-dimensional convolution applies well to the field of drowsiness detection. The invention introduces the optical flow method into the drowsiness action recognition model, and the experimental results show that optical flow images are indeed helpful for recognizing drowsiness actions.
The concept of optical flow was first proposed by Gibson in 1950 as the instantaneous velocity of pixel motion in spatially moving objects in the imaging plane. And finding the corresponding relation between the current frame and the previous frame by utilizing the change of the pixel values in the image sequence and the correlation between the adjacent frames, thereby calculating the motion information of the object between the adjacent frames. In general, optical flow is caused by movement of the foreground object itself, movement of the camera, or movement of both in the scene. The optical flow algorithm evaluates the deformation between two images, with the basic assumption that the pixel values of moving objects do not change in the image sequence. Based on this assumption, we can derive the constraint equation for the image:
I(x,y,t)=I(x+Δx,y+Δy,t+Δt) (3)
where (x, y) denotes the coordinates of a pixel in the image, I(x, y, t) denotes the brightness of the pixel at position (x, y) at time t, Δx denotes the displacement of the pixel in the horizontal direction after time Δt, and Δy denotes its displacement in the vertical direction after time Δt.
Expanding the right-hand side of equation (3) as a Taylor series gives:

I(x+Δx, y+Δy, t+Δt) = I(x, y, t) + (∂I/∂x)·Δx + (∂I/∂y)·Δy + (∂I/∂t)·Δt + H.O.T.   (4)

where ∂I/∂x, ∂I/∂y and ∂I/∂t denote the partial derivatives of I with respect to x, y and t, respectively, and H.O.T. denotes the higher-order terms of the Taylor expansion, which can be taken as 0 under the small-motion assumption of the optical flow method. Combining equations (3) and (4) gives:

(∂I/∂x)·Δx + (∂I/∂y)·Δy + (∂I/∂t)·Δt = 0   (5)

Dividing both sides of equation (5) by Δt gives:

(∂I/∂x)·(Δx/Δt) + (∂I/∂y)·(Δy/Δt) + ∂I/∂t = 0   (6)
Let:

I_x = ∂I/∂x,  I_y = ∂I/∂y,  I_t = ∂I/∂t,  V_x = Δx/Δt,  V_y = Δy/Δt   (7)
then:

I_x·V_x + I_y·V_y = -I_t   (8)

Equation (8) is a single equation in two unknowns, V_x and V_y, and therefore cannot be solved on its own. Lucas and Kanade therefore proposed an additional assumption, spatial consistency: the pixels within a neighborhood of the target pixel all move in the same direction; the resulting algorithm is named the LK optical flow method. Assuming the neighborhood is of size m × m, we have:

I_x(p_n)·V_x + I_y(p_n)·V_y = -I_t(p_n),  n = 1, 2, …, m²   (9)
where p_n is the nth pixel in the window. Equation (9) can be expressed in matrix form:
Av=-b (10)
where

A = [ I_x(p_1) I_y(p_1) ; I_x(p_2) I_y(p_2) ; … ; I_x(p_{m²}) I_y(p_{m²}) ],  v = (V_x, V_y)ᵀ,  b = ( I_t(p_1), I_t(p_2), …, I_t(p_{m²}) )ᵀ
Equation (10) is an overdetermined system of equations, because the number of equations is greater than the number of unknowns; we can solve it using the least squares method:
AᵀA v = Aᵀ(-b)   (11)
where Aᵀ is the transpose of matrix A. Multiplying both sides of equation (11) on the left by the inverse of AᵀA gives:

v = (AᵀA)⁻¹ Aᵀ(-b)   (12)

Written out in terms of the image derivatives, the solution is:

(V_x, V_y)ᵀ = [ Σ I_x(p_n)²  Σ I_x(p_n)I_y(p_n) ; Σ I_x(p_n)I_y(p_n)  Σ I_y(p_n)² ]⁻¹ · ( -Σ I_x(p_n)I_t(p_n), -Σ I_y(p_n)I_t(p_n) )ᵀ   (13)
the finally obtained vector (V)x,Vy) Namely the light stream calculated by the L-K light stream. In the present invention, a window size of 5 × 5 is preferable, and the generated optical flow image is as shown in fig. 4.
Results of the experiment
We first trained the drowsiness action recognition model on a training set containing 6918 image sequences taken from the training set of NTHU-DDD, which together contain simulated driving data from 18 subjects. The 1460 image sequences of the validation set all come from the validation set of NTHU-DDD and contain simulated driving data from 4 subjects. The data distribution of the training set is shown in Table 1.
In fact, the normal driving category is the most represented category in the entire dataset, at approximately 60%. In order to balance the proportion of drowsy and non-drowsy actions in the dataset, we discarded 20% of the data in the normal driving category, controlling the ratio of drowsy to non-drowsy action data at around 1:1. In addition, the proportions of the five scene types in the training set are close to 1:1:1:1:1. Finally, we extended the training set with the data enhancement methods mentioned in 3.2: we applied a horizontal flip transform and changed brightness and contrast, finally obtaining 27672 image sequences as the training set.
TABLE 1 training set data distribution
Effect of optical flow and data enhancement on model
The drowsiness action recognition model provided by the invention comprises two input streams: a grayscale image sequence and an optical flow image sequence. From the description of the LK optical flow method above, we know that optical flow images hold object motion information beneficial for drowsiness action recognition, so we calculate the optical flow between adjacent frames and input it into the model. Both data enhancement and optical flow influence the performance of the model and improve its classification accuracy; we quantify the influence of these two factors on accuracy through experiments and determine which is the main reason for the improvement. Using the controlled variable method, we performed four sets of comparative experiments: (1) training with only the raw data; (2) applying only data enhancement to the original dataset; (3) adding optical flow images to the original dataset as model input; (4) applying data enhancement while also adding optical flow as model input. The other parameters of the four experiments were kept consistent: the optimizer is Adam and the learning rate is set to 0.001. The effect of data enhancement and optical flow images on model accuracy is shown in FIG. 5, and the effect on the training loss is shown in FIG. 6.
Figures 5 and 6 show the variation of accuracy and loss over 20 epochs for the four controlled experiments. As can be seen, when only the original dataset is used for training, the accuracy of the model fluctuates greatly during training. Although the final accuracy is not much different from that of the model with data enhancement applied, the loss curve shows that the model trained only on the original data begins to fluctuate from the 6th epoch, with the loss rebounding after each decrease and not decreasing in the end; we can therefore conclude that the model trained only on the original data has overfitted. In addition, the accuracy of the model using both data enhancement and optical flow reaches 86.6%, while the model using only data enhancement reaches 79.1% and the model using only optical flow reaches 81.7%; the effect of optical flow on model accuracy is thus more significant than that of data enhancement.
F1 score
The F1 score is an index used in statistics to measure the accuracy of a binary classification model; it takes into account both the precision and the recall of the model. The F1 score can be viewed as the harmonic mean of precision and recall, with a maximum of 1 and a minimum of 0. It is calculated by the following formula:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
the accuracy and recall may be calculated from the confusion matrix in table 2. True examples (TP) in the table indicate that the predicted example is actually Positive, and the prediction is also Positive; the True Negative case (TN, True Negative) indicates that the actual is Negative and the prediction is also Negative; false Positive case (FP) indicates that actually negative, but predicted Positive; false Negative (FN) indicates that actually positive, but predicted Negative. The accuracy and recall calculation is as follows:
Precision = TP / (TP + FP)

Recall = TP / (TP + FN)
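The three formulas above translate directly into a small helper that computes precision, recall and F1 from the confusion-matrix counts:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN),
    F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# illustrative counts, not the patent's experimental data
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
```

With equal false positives and false negatives, precision, recall and F1 coincide; the harmonic mean only drops below the arithmetic mean when the two rates diverge.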
TABLE 2 confusion matrix

                      Predicted positive        Predicted negative
Actually positive     TP (true positive)        FN (false negative)
Actually negative     FP (false positive)       TN (true negative)
Table 3 Precision and recall of each category on the validation set
Table 3 shows the model's precision and recall on the validation set. The precision of the yawning category is the highest: the yawning action includes the processes of opening and closing the mouth, so compared with the other categories its features are more obvious, the facial changes are larger, and the model learns its features more easily. The recall and precision of slow blinking are the lowest, because slow blinking is related only to the movement of the eyes, and the eye region occupies a small proportion of the face, which hinders recognition; recognition is even harder in scenes with glasses or sunglasses. As can be seen from the table, the slow blinking category has 699 samples, of which 137 are predicted as normal driving, which means the model easily confuses the normal driving and slow blinking categories, and its ability to distinguish the two remains to be improved. Nevertheless, the classification accuracy of the model as a whole is good, and by calculation the model's F1 score is 0.861.
NTHU-DDD is the most widely used drowsiness detection dataset, and many algorithms use it as a benchmark for comparison with other algorithms. However, these algorithms process the NTHU-DDD dataset differently: some convert the videos into images and train using the frame-level annotations in the dataset, while others cut the dataset into video segments and train after converting the frame-level annotations into segment-level annotations. The present invention adopts the latter approach, training with segment-level labels, and therefore selects existing methods that use the same segment-level labeling for comparison.
Yu et al. proposed a model based on a three-dimensional convolutional neural network for drowsiness detection, which consists of three modules for representation learning, scene understanding and feature fusion, respectively. The model generates a spatio-temporal representation from multiple successive frames and analyzes scene conditions defined by head, eye and mouth movements. The analysis results of the scene condition understanding model are then used as auxiliary information for drowsiness detection. Finally, the method generates fusion features using the spatio-temporal representation, the scene conditions and the classification results; it is shown that fusing features improves the performance of drowsiness detection. The method uses two feature fusion strategies: IAA (Independent-Averaged Architecture) and FFA (Feature-Fused Architecture). A Condition-Adaptive Learning Framework (CARLF) is then proposed, which contains 4 models: spatio-temporal representation learning, scene understanding, feature fusion and drowsiness detection. The features extracted by spatio-temporal representation learning describe motion in the video. Scene condition understanding classifies various conditions of the driver and scene conditions related to the driving situation, such as whether glasses are worn, the lighting conditions, and the movement of facial elements such as the head, eyes and mouth. Feature fusion generates a condition-adaptive representation using the two kinds of features extracted by the models above, and the drowsiness detection model uses this condition-adaptive representation to identify the drowsiness state of the driver. The condition-adaptive representation learning framework can extract more discriminative features for each scene condition, so the drowsiness detection method can provide more accurate results under various driving conditions.
A comparison of the results of the proposed method with the above methods is shown in Table 4. It can be seen that the method provided by the invention is clearly superior to the existing methods; due to the introduction of optical flow, the model learns more features beneficial to motion recognition.
Table 4 compares the results with the prior art method
Temporal complexity of model
Due to the introduction of the three-dimensional convolutional neural network, the calculation amount of the model is greatly increased compared with that of the two-dimensional convolutional neural network. Theoretically, the time complexity of the drowsiness action detection model based on the three-dimensional convolution neural network is
T = O( Σ_{i=1}^{d} W_i · H_i · D_i · n_i · m_i · k_i )

where d represents the number of convolutional layers in the model; W_i, H_i and D_i represent the width, height and depth (number of frames in the image sequence) of the input feature map of the ith convolutional layer; and n_i, m_i and k_i represent the width, height and depth of its 3D convolution kernel. The model consumes a large amount of computing resources during training, but at prediction time only 0.2 seconds are needed to predict a 3 s video segment; adding the time spent computing optical flow, the frame rate of the whole scheme is 25.3 fps, which basically meets the standard of real-time detection.
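The per-layer product in the complexity expression can be tallied in a few lines of Python; the layer dimensions below are illustrative placeholders, not the patent's exact configuration:

```python
def conv3d_cost(layers):
    """Sum W_i*H_i*D_i*n_i*m_i*k_i over layers, i.e. the kernel-volume
    multiply count per output position in the expression above."""
    return sum(W * H * D * n * m * k for (W, H, D, n, m, k) in layers)

# (W, H, D) input feature-map size, (n, m, k) kernel size -- illustrative
layers = [(224, 224, 10, 3, 3, 3), (112, 112, 5, 3, 3, 3)]
total = conv3d_cost(layers)
```

Note how the extra depth factors D_i and k_i, absent from the 2D case, account for the increased computation of the three-dimensional model discussed above.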
In conclusion, optical flow images are used as an input stream and fed into a drowsiness action recognition model based on a three-dimensional convolutional neural network for feature extraction, and the final experimental results show that the model learns the motion information contained in the optical flow images well. The scheme first extracts 10 frames from a 3-second video segment at average intervals, converts them into grayscale images, and calculates the optical flow between adjacent frames using the LK optical flow method. The 10-frame grayscale image sequence and the 9-frame optical flow image sequence are then input into the pre-trained drowsiness action recognition model. Because the model is based on a three-dimensional convolutional neural network, it has one more (temporal) dimension than a two-dimensional convolutional neural network, so it can better extract the temporal feature representations in the image sequence, which benefits action recognition. The model can identify four classes of actions, including one non-drowsiness action (normal driving) and three drowsiness actions (yawning, slow blinking and nodding); finally, through the model's computation, the input image sequence is converted into a 1 × 4 vector representing the probability of each class, from which it is determined whether the input contains a drowsiness action.
Experimental results show that the final classification accuracy of the proposed method is 86.6%, and it can effectively detect driver drowsiness. Compared with existing methods based on vehicle parameters, the proposed method has higher detection accuracy. Compared with methods based on physiological parameters, the proposed method is more convenient: no electrodes need to be attached to the driver's body, which would interfere with normal driving, and the equipment cost is lower. Existing computer-vision detection methods based on handcrafted features commonly use the PERCLOS and FOM metrics for drowsiness detection, but generally one such algorithm can detect only one drowsiness action, so the detection efficiency is not high. The proposed method only needs the drowsiness action recognition model to be trained in advance; one model can detect multiple drowsiness actions, and the detection efficiency is higher.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (6)

1. A method for intelligently recognizing a drowsiness action of a driver, characterized by comprising the following steps:
step one: acquiring a video stream of the driver during driving;
step two: preprocessing the video stream to obtain grayscale image information and optical flow image information of the video stream;
step three: taking the grayscale image information and the optical flow image information as the input of a drowsiness action recognition model, thereby obtaining the drowsiness action of the driver.
2. The method for intelligently recognizing the drowsiness action of the driver according to claim 1, wherein the preprocessing in step two comprises:
step 2-1: dividing the video stream into a plurality of video segments at a certain time interval;
step 2-2: extracting the image information of each video segment frame by frame, and converting each frame of image information to obtain a grayscale image sequence;
step 2-3: calculating the optical flow between each pair of adjacent frames, thereby obtaining an optical flow image sequence.
3. The method for intelligently recognizing the drowsiness action of the driver according to claim 1, wherein the time interval in step 2-1 is 3 seconds, i.e., a video segment is intercepted every 3 seconds.
4. The method for intelligently recognizing the drowsiness action of the driver according to claim 1, wherein each frame of image has a size of 224 × 224.
5. The method for intelligently recognizing the drowsiness action of the driver according to claim 1, wherein the recognized driver actions comprise: normal driving, nodding, slow blinking and yawning.
6. The method for intelligently recognizing the drowsiness action of the driver according to claim 1, wherein the drowsiness action recognition model comprises: convolutional layers, pooling layers, fully connected layers and a softmax classifier connected in sequence;
wherein there are 4 convolutional layers: the first convolutional layer comprises 8 convolution kernels, each of size 3 × 3; the second and third convolutional layers each comprise 16 convolution kernels; and the fourth convolutional layer comprises 8 convolution kernels;
the fully connected layers comprise two layers, the fc1 fully connected layer and the fc2 fully connected layer; the fc1 layer has 23520 neurons and the fc2 layer has 64 neurons.
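The parameter budget of the model in claim 6 can be sketched with simple bookkeeping. Several assumptions are made here: the kernels are taken as 3 × 3 × 3 (the claim writes 3 × 3, but the description specifies a three-dimensional convolutional network), the first layer is assumed to take a single grayscale input channel, and only the fc2 and classifier weights are counted because the claim does not state the input width of fc1.

```python
def conv3d_params(in_ch, out_ch, k=3):
    """Weights (out * in * k^3) plus one bias per output kernel."""
    return out_ch * (in_ch * k ** 3 + 1)

# Channel plan from claim 6: 8 -> 16 -> 16 -> 8 kernels per layer.
# The input channel count (1, grayscale) is an assumption, not in the claim.
conv_layers = [(1, 8), (8, 16), (16, 16), (16, 8)]
conv_params = sum(conv3d_params(i, o) for i, o in conv_layers)

fc2_params = 23520 * 64 + 64   # fc1 (23520 neurons) -> fc2 (64 neurons)
head_params = 64 * 4 + 4       # fc2 -> softmax over the 4 action classes
```

Under these assumptions the four convolutional layers hold about 14 thousand parameters while the 23520 → 64 fully connected stage holds about 1.5 million, so nearly all of the model's capacity sits in that one layer.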
CN202110650708.1A 2021-06-10 2021-06-10 Method for intelligently recognizing drowsiness action of driver Pending CN113408389A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110650708.1A CN113408389A (en) 2021-06-10 2021-06-10 Method for intelligently recognizing drowsiness action of driver

Publications (1)

Publication Number Publication Date
CN113408389A true CN113408389A (en) 2021-09-17

Family

ID=77683634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110650708.1A Pending CN113408389A (en) 2021-06-10 2021-06-10 Method for intelligently recognizing drowsiness action of driver

Country Status (1)

Country Link
CN (1) CN113408389A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188637A (en) * 2019-05-17 2019-08-30 西安电子科技大学 A kind of Activity recognition technical method based on deep learning
CN110543848A (en) * 2019-08-29 2019-12-06 交控科技股份有限公司 Driver action recognition method and device based on three-dimensional convolutional neural network
CN112699802A (en) * 2020-12-31 2021-04-23 青岛海山慧谷科技有限公司 Driver micro-expression detection device and method
CN112766145A (en) * 2021-01-15 2021-05-07 深圳信息职业技术学院 Method and device for identifying dynamic facial expressions of artificial neural network
CN112800988A (en) * 2021-02-02 2021-05-14 安徽工业大学 C3D behavior identification method based on feature fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Mao Hongyun: "Research on Driver Drowsiness Detection Method Based on Convolutional Neural Network", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 2, pages 035-198 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116311539A (en) * 2023-05-19 2023-06-23 亿慧云智能科技(深圳)股份有限公司 Sleep motion capturing method, device, equipment and storage medium based on millimeter waves
CN116311539B (en) * 2023-05-19 2023-07-28 亿慧云智能科技(深圳)股份有限公司 Sleep motion capturing method, device, equipment and storage medium based on millimeter waves


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination