CN112183220A - Driver fatigue detection method and system and computer storage medium - Google Patents
- Publication number
- CN112183220A (application CN202010918289.0A)
- Authority
- CN
- China
- Prior art keywords
- driver
- mouth
- feature points
- image
- face
- Prior art date
- Legal status (assumed, not a legal conclusion): Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/59—Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
- G06V20/597—Recognising the driver's state or behaviour, e.g. attention or drowsiness
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/166—Detection; Localisation; Normalisation using acquisition arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
Abstract
The invention relates to a driver fatigue detection method, a driver fatigue detection system, and a computer storage medium. The method comprises the following steps: periodically acquiring a current frame face image of the driver; detecting the current frame face image with a preset progressive calibration network to locate the feature points in it; determining left-eye, right-eye, and mouth region images according to the feature-point positioning result, extracting target feature points in those region images, and sparsely representing the target feature points to obtain sparse feature vectors; processing the sparse feature vectors with a pre-trained neural network classification model to output eye and mouth classification results for the current frame; and judging whether the driver is fatigued according to the eye and mouth classification results of consecutive multi-frame face images of the driver within one time sequence. The invention improves both the accuracy and the real-time performance of driver fatigue detection.
Description
Technical Field
The invention relates to the technical field of safe driving of automobiles, in particular to a method and a system for detecting fatigue of a driver and a computer storage medium.
Background
Existing driver fatigue detection methods mainly judge whether a driver is driving while fatigued from facial information, particularly eye image information. During actual driving, however, the lighting conditions inside the vehicle and the driver's head pose are complex and changeable. Conventional face detection and feature point positioning methods cannot adapt to these conditions when applied to in-vehicle driver face recognition, so their detection accuracy and real-time performance still need improvement.
Disclosure of Invention
The invention aims to provide a driver fatigue detection method, a system and a computer storage medium thereof, so as to improve the accuracy and the real-time performance of the driver fatigue detection.
To achieve the above object, according to a first aspect, an embodiment of the present invention proposes a driver fatigue detection method, including:
step S1, periodically acquiring a current frame face image of the driver;
step S2, detecting the current frame face image by using a preset progressive calibration network so as to position the feature points in the current frame face image; wherein the feature points comprise left and right eye feature points and mouth feature points;
step S3, determining a left eye region image, a right eye region image and a mouth region image according to the positioning result of the feature points, extracting target feature points in the left eye region image, the right eye region image and the mouth region image, and performing sparse representation on the target feature points to obtain sparse feature vectors;
step S4, processing the sparse feature vector by using a pre-trained neural network classification model, and outputting an eye classification result and a mouth classification result of the current frame face image;
step S5, judging whether the driver is tired according to the eye classification result and the mouth classification result of the continuous multiframe human face images of the driver in a time sequence; wherein, a plurality of frames of face images of the driver are periodically acquired in one time sequence.
Optionally, the step S1 includes:
step S11, acquiring a current frame original image output by a vehicle camera;
step S12, reducing the size of the original image of the current frame according to a preset proportion;
step S13, reducing the size of the largest face in the original image of the current frame according to a preset proportion;
step S14, determining a face search area in the current frame original image from the face position in the previous frame original image, and performing face detection on that search area with a sliding window to obtain the current frame face image.
Optionally, the step S2 includes:
step S21, carrying out image transformation on the current frame face image obtained in step S1 according to the following formula;
Ri(x,y) = log[Ii(x,y)/Li(x,y)] = log Ii(x,y) − log[F(x,y) * Ii(x,y)]
where Ii(x,y) is the i-th color component of the current frame face image, Ri(x,y) is the reflected-light information of the i-th color component, Li(x,y) is the illumination information of the i-th color component, "*" denotes the convolution operation, F(x,y) is the surround function with scale factor σ, and K is a normalization constant such that F(x,y) satisfies ∬F(x,y)dxdy = 1;
step S22, detecting the current frame face image transformed in step S21 with the preset progressive calibration network, so as to locate the feature points in the current frame face image.
Optionally, the step S3 includes:
s31, constructing a target area feature point set according to the positioning result of the feature points;
step S32, according to the target region feature point set, defining and describing the relative coordinate position relationship between each point of the left and right eye target regions and the mouth target region on a face plane coordinate system, and determining the peripheral rectangular frame with the minimum area of the left and right eye target regions and the mouth target region according to the relative coordinate position relationship;
step S33, performing inclination angle correction on the peripheral rectangular frame with the minimum area of the left and right eye target areas and the mouth target area, and acquiring left and right eye area images and mouth area images according to the peripheral rectangular frame with the minimum area of the left and right eye target areas and the mouth target area after inclination angle correction;
step S34, extracting the target feature points in the left-eye, right-eye, and mouth region images, and sparsely representing the target feature points to obtain sparse feature vectors.
Optionally, the step S31 includes:
if feature points are missing from the current frame face image, supplementing the missing eye and mouth region feature points according to a pre-constructed "three-section, five-eye" (三庭五眼) facial-proportion digital model of the driver; the model comprises image information of the driver's face viewed frontally and at various yaw and pitch angles;
wherein:
the points of the left and right eye target areas comprise: the corners (canthi) of the left and right eyes, the centers of the left and right eyeballs, and the centers of the upper and lower eyelid contours of both eyes;
the points of the mouth target area comprise: the left and right corners of the mouth, the center of the upper lip contour, and the center of the lower lip contour.
Optionally, wherein:
the eye classification result includes: both eyes closed, both eyes slightly closed and both eyes open;
the mouth classification result includes: mouth closed, mouth half open, and mouth wide open as in yawning.
Optionally, the neural network classification model includes 3 convolutional layers, 3 pooling layers connected to the 3 convolutional layers in a one-to-one correspondence, a first fully-connected layer connected to the 3 pooling layers, and a second fully-connected layer connected to the first fully-connected layer; the 3 convolutional layers are respectively used for carrying out convolution processing on the left eye region image, the right eye region image and the mouth region image to obtain a feature vector which adopts sparse representation, the 3 pooling layers are respectively used for pooling convolution results of the 3 convolutional layers, and the first full-connection layer and the second full-connection layer are used for classifying outputs of the 3 pooling layers to obtain a classification result.
Optionally, the step S5 includes:
calculating a value f = Nc/Nt × 100% from the eye classification results of consecutive multi-frame face images of the driver in one time sequence, and judging whether the driver's eyes are closed by comparing f with a preset threshold; here Nc is the number of frames in which the driver's eyes are in the closed state within the period, and Nt is the number of frames within the period in which the driver's eye state was validly recognized;
determining the duration of the opening of the mouth of the driver according to the mouth classification result of continuous multiframe face images of the driver in a time sequence, and judging that the yawning of the driver is performed when the duration is greater than or equal to the preset duration and the relative position relation of the coordinates of the feature points of the mouth in the yawning state is met.
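The eye-state criterion above is a PERCLOS-style ratio. A minimal sketch, assuming simple string labels for the eye states and an illustrative 40% threshold (the patent does not fix a numeric threshold):

```python
def perclos(eye_states, closed_label="closed"):
    """f = Nc / Nt * 100%: share of closed-eye frames among the validly
    recognised frames in one time sequence (None marks invalid frames)."""
    valid = [s for s in eye_states if s is not None]   # Nt: valid frames
    if not valid:
        return 0.0
    nc = sum(1 for s in valid if s == closed_label)    # Nc: closed frames
    return nc / len(valid) * 100.0

def eyes_fatigued(eye_states, threshold=40.0):
    # threshold is an assumed illustrative value, not from the patent
    return perclos(eye_states) >= threshold
```

The mouth criterion would be analogous: count consecutive "mouth wide open" frames against a preset duration before declaring a yawn.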
In a second aspect, an embodiment of the present invention provides a driver fatigue detection system for performing the driver fatigue detection method of the first aspect, the system including:
the image acquisition unit is used for periodically acquiring a current frame face image of the driver;
the characteristic point detection unit is used for detecting the current frame face image by using a preset progressive calibration network so as to position the characteristic points in the current frame face image; wherein the feature points comprise left and right eye feature points and mouth feature points;
the region image acquisition unit is used for determining a left eye region image, a right eye region image and a mouth region image according to the positioning result of the feature points, extracting target feature points in the left eye region image, the right eye region image and the mouth region image, and performing sparse representation on the target feature points to obtain sparse feature vectors;
the classification unit is used for processing the sparse feature vector by utilizing a pre-trained neural network classification model and outputting an eye classification result and a mouth classification result of the current frame face image;
a fatigue determination unit for determining whether the driver is tired based on the eye classification result and the mouth classification result of the continuous multi-frame face images of the driver in one time sequence; wherein, a plurality of frames of face images of the driver are periodically acquired in one time sequence.
According to a third aspect, an embodiment of the invention proposes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the driver fatigue detection method according to the first aspect.
The embodiments of the invention provide a driver fatigue detection method, a corresponding system, and a computer storage medium. Multi-frame face images of the driver within one time sequence, acquired by a camera, are detected by a cascade of several convolutional neural networks to locate the feature points. Left-eye, right-eye, and mouth target region images are then extracted from the images according to the positioning result. A neural network classification model extracts sparse-representation-based feature vectors from those region images and classifies them, and finally whether the driver is fatigued is judged from the classification results of the multi-frame images in the time sequence. Cascaded multi-network detection greatly accelerates image processing, and the sparse feature vectors allow the method to adapt to the complex and changeable in-vehicle illumination and driver head poses encountered during actual driving, improving both the accuracy and the real-time performance of driver fatigue detection compared with the prior art.
Additional features and advantages of the invention will be set forth in the description which follows.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart illustrating a method for detecting fatigue of a driver according to an embodiment of the present invention.
Fig. 2 is an original face image in the non-ideal environment in the present embodiment.
Fig. 3 is the face image of Fig. 2 after the processing of step S2.
Fig. 4 is a schematic diagram of the principle of the progressive calibration network (PCN) in this embodiment.
Fig. 5 is a schematic structural diagram of a neural network classification model in this embodiment.
FIG. 6 is a block diagram of a driver fatigue detection system according to another embodiment of the present invention.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In addition, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present invention. It will be understood by those skilled in the art that the present invention may be practiced without some of these specific details. In some instances, well known means have not been described in detail so as not to obscure the present invention.
Referring to fig. 1, an embodiment of the present invention provides a method for detecting driver fatigue, including:
step S1, periodically acquiring a current frame face image of the driver;
step S2, detecting the current frame face image by using a preset progressive calibration network so as to position the feature points in the current frame face image; wherein the feature points comprise left and right eye feature points and mouth feature points; specifically, localization refers to marking feature points in an image;
step S3, determining a left eye region image, a right eye region image and a mouth region image according to the positioning result of the feature points, extracting target feature points in the left eye region image, the right eye region image and the mouth region image, and performing sparse representation on the target feature points to obtain sparse feature vectors;
step S4, processing the sparse feature vector by using a pre-trained neural network classification model, and outputting an eye classification result and a mouth classification result of the current frame face image;
step S5, judging whether the driver is tired according to the eye classification result and the mouth classification result of the continuous multiframe human face images of the driver in a time sequence; wherein, a plurality of frames of face images of the driver are periodically acquired in one time sequence.
In the method of this embodiment, multi-frame face images of the driver within one time sequence, acquired by a camera, are detected by a cascade of several convolutional neural networks to locate the feature points. Left-eye, right-eye, and mouth target region images are then extracted according to the positioning result; a neural network classification model extracts sparse-representation-based feature vectors from those region images and classifies them; and finally whether the driver is fatigued is judged from the classification results of the multi-frame images in the time sequence. Cascaded multi-network detection greatly accelerates image processing, and the sparse feature vectors allow the method to adapt to the complex and changeable in-vehicle illumination and driver head poses encountered in actual driving, improving both the accuracy and the real-time performance of driver fatigue detection compared with the prior art.
Optionally, step S1 in this embodiment includes:
step S11, acquiring a current frame original image output by a vehicle camera; where the original image includes the driver's face and other factors in the vehicle, such as other "irrelevant people".
Step S12, reducing the size of the current frame original image according to a preset ratio; this lowers the number of image pyramid levels and accelerates image processing in the initial state.
Step S13, reducing the size of the largest face in the original image of the current frame according to a preset proportion;
specifically, the purpose of reducing the maximum face size is to reduce the size of the face image of the driver, and facilitate the use of a sliding window for rapid detection. The reason is that when there are many possible persons in the current vehicle, the acquired face image may not be the face image of the driver, and the "driver face image" acquired in step S11 may also include "irrelevant persons" who enter the camera view at the back, so that the driver is the image with the largest face area, and after the position of the real driver face image is located, the face image in the area needs to be scaled down, which facilitates the sliding window search for the face features in step S14.
Step S14, determining a face search area in the current frame original image from the face position in the previous frame original image, and performing face detection on that search area with a sliding window to obtain the current frame face image. Specifically, the face region of the previous frame original image can be suitably enlarged and used as the sliding-window search area of the current frame; this simplifies the detection area, narrows the feature extraction range, and further accelerates feature extraction.
Illustratively, the face frame of the previous frame original image is enlarged 1.3 times horizontally and 1.5 times vertically to serve as the face search range of the current frame original image.
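The previous-frame enlargement can be sketched as follows. The 1.3×/1.5× factors come from the example above; the (x, y, w, h) box format and the clipping to image bounds are assumptions:

```python
def expand_face_box(box, img_w, img_h, kx=1.3, ky=1.5):
    """Enlarge the previous-frame face box (x, y, w, h) about its centre to
    form the current-frame sliding-window search region, clipped to the image."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2          # box centre stays fixed
    nw, nh = w * kx, h * ky                # horizontal / vertical enlargement
    nx, ny = max(0.0, cx - nw / 2), max(0.0, cy - nh / 2)
    return (nx, ny, min(nw, img_w - nx), min(nh, img_h - ny))
```

Restricting the sliding window to this region is what narrows the feature extraction range between consecutive frames.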
Optionally, step S2 in this embodiment includes:
step S21, carrying out image transformation on the current frame face image obtained in step S1 according to the following formula;
let the acquired face image be represented as I(x,y) = L(x,y) · R(x,y);
where R(x,y) is the reflected-light (reflectance) information of the face and L(x,y) is the illumination information;
to reduce the computational load of the image and accelerate face image preprocessing, the face image is transformed as follows:
log I(x,y) = log L(x,y) + log R(x,y);
since R(x,y) and L(x,y) carry different information and vary at clearly different spatial frequencies, based on Retinex theory, the face output of the i-th color component is:
Ri(x,y) = log[Ii(x,y)/Li(x,y)] = log Ii(x,y) − log[F(x,y) * Ii(x,y)]
where Ii(x,y) is the i-th color component of the current frame face image, Ri(x,y) is the reflected-light information of the i-th color component, Li(x,y) is the illumination information of the i-th color component, "*" denotes the convolution operation, F(x,y) is the surround function with scale factor σ, and K is a normalization constant such that F(x,y) satisfies ∬F(x,y)dxdy = 1.
Fig. 2 is an original face image in a non-ideal environment, and fig. 3 is a face image processed in step S2.
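A minimal single-scale Retinex sketch of the transform above. The patent leaves F(x,y) implicit, so the choice of a normalised Gaussian surround and the σ value are assumptions:

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius=None):
    radius = radius or int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()   # normalisation plays the role of K: the kernel sums to 1

def single_scale_retinex(channel, sigma=30.0):
    """R_i = log I_i - log(F * I_i) for one colour component; F * I_i is
    computed as a separable Gaussian blur (rows then columns)."""
    k = gaussian_kernel_1d(sigma)
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, channel)
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, blurred)
    eps = 1e-6   # avoid log(0)
    return np.log(channel + eps) - np.log(blurred + eps)
```

On a uniform image the output is zero away from the borders, since subtracting the log of the local surround removes the (smooth) illumination component and keeps only reflectance detail.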
Step S22, detecting the current frame face image transformed in step S21 with the preset progressive calibration network, so as to locate the feature points in the current frame face image.
Specifically, the head pose of a non-frontal face reduces the accuracy of the face detection algorithm, so the key points are positioned inaccurately and the regions of interest cannot be extracted precisely. To address this problem, this step detects the face and positions the feature points with a progressive calibration network (PCN).
In this embodiment, the PCN cascades 3 CNNs to predict the face bounding box and face rotation angle from coarse to fine. As shown in fig. 4, PCN-1 performs a binary classification of the face angle while predicting the face bounding box, turning faces in [−180°, 180°] into [−90°, 90°]. Similarly, PCN-2 performs a three-way classification of the face angle, limiting its range to [−45°, 45°]. PCN-3 predicts the precise angle by angle-deviation regression. The final face offset angle is the sum of the angles predicted by the 3 networks: Θ_RIP = θ1 + θ2 + θ3. The resulting offset angle is used to rotate the deflected face back to the frontal pose, solving the problem that a non-frontal head pose degrades face detection accuracy, key-point positioning, and region-of-interest extraction.
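The coarse-to-fine decomposition can be simulated as below. This is a toy sketch with ideal (oracle) stage decisions derived from a known ground-truth angle; the real PCN-1/2/3 predict each component from image crops:

```python
def pcn_angle_estimate(true_rip):
    """Simulate Theta_RIP = theta1 + theta2 + theta3 with perfect stages."""
    # Stage 1 (PCN-1): binary decision -- rotate upside-down faces by 180
    # degrees, bringing the in-plane angle into [-90, 90].
    theta1 = 180.0 if abs(true_rip) > 90 else 0.0
    r = (true_rip + theta1 + 180) % 360 - 180   # remaining angle, wrapped
    # Stage 2 (PCN-2): three-way classification into {-90, 0, 90},
    # narrowing the residual to [-45, 45].
    theta2 = min((-90.0, 0.0, 90.0), key=lambda c: abs(r - c))
    r -= theta2
    # Stage 3 (PCN-3): fine regression of the remaining deviation.
    theta3 = r
    return theta1 + theta2 + theta3
```

Modulo 360°, the sum of the three per-stage angles recovers the original face offset, which is then used to rotate the face to the frontal pose.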
Optionally, step S3 in this embodiment includes:
s31, constructing a target area feature point set according to the positioning result of the feature points;
step S32, according to the target region feature point set, defining and describing the relative coordinate position relationship between each point of the left and right eye target regions and the mouth target region on a face plane coordinate system, and determining the peripheral rectangular frame with the minimum area of the left and right eye target regions and the mouth target region according to the relative coordinate position relationship;
step S33, performing inclination angle correction on the peripheral rectangular frame with the minimum area of the left and right eye target areas and the mouth target area, and acquiring left and right eye area images and mouth area images according to the peripheral rectangular frame with the minimum area of the left and right eye target areas and the mouth target area after inclination angle correction;
Step S34, extracting the target feature points in the left-eye, right-eye, and mouth region images, and sparsely representing the target feature points to obtain sparse feature vectors.
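A simplified sketch of steps S32 and S33: estimate the region's tilt from its feature points and take the bounding box in the de-tilted frame. A production version would fit a true minimum-area rectangle (e.g. rotating calipers); the point ordering and angle convention here are assumptions:

```python
import math

def tilt_corrected_bbox(points):
    """Approximate the minimum-area peripheral rectangle of a feature-point
    set: rotate the points so the first-to-last axis (e.g. the eye-corner
    line) is horizontal, then take the axis-aligned bounding box."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    angle = math.atan2(y1 - y0, x1 - x0)          # tilt of the region
    c, s = math.cos(-angle), math.sin(-angle)     # inverse rotation
    rot = [(x * c - y * s, x * s + y * c) for x, y in points]
    xs, ys = [p[0] for p in rot], [p[1] for p in rot]
    return angle, (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))
```

The returned angle is the tilt correction applied to the rectangle before cropping the eye or mouth region image.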
Optionally, step S31 in this embodiment includes:
if feature points are missing from the current frame face image, supplementing the missing eye and mouth region feature points according to the pre-constructed "three-section, five-eye" facial-proportion digital model of the driver;
the model is constructed as follows: face images of the driver are acquired frontally and at different yaw angles, pitch angles, and illumination conditions, and a "three-section, five-eye" digital model of the driver's face, frontal and at the various yaw and pitch angles, is built from them.
It should be noted that each driver has his or her own "three-section, five-eye" digital model. By combining this pre-constructed model with the subset of feature points that were detected and positioned, missing feature points can be supplemented by simulation when the driver's head is strongly deflected or pitched, or when few feature points are visible due to partial occlusion, which greatly improves recognition accuracy.
Based on this, the embodiment realizes accurate detection of the driver's face and accurate and rapid positioning of the feature points under the conditions of different illumination conditions, large-amplitude human face deflection, human face pitching and partial shielding in a complex in-vehicle specific environment, and improves the processing speed while ensuring the accuracy.
Wherein:
the points of the left and right eye target areas comprise: the corners (canthi) of the left and right eyes, the centers of the left and right eyeballs, and the centers of the upper and lower eyelid contours of both eyes;
the points of the mouth target area comprise: the left and right corners of the mouth, the center of the upper lip contour, and the center of the lower lip contour.
Optionally, in this embodiment:
the eye classification result includes: both eyes closed, both eyes slightly closed and both eyes open;
the mouth classification result includes: mouth closed, mouth half open, and mouth wide open as in yawning.
Optionally, referring to fig. 5, in this embodiment, the neural network classification model includes 3 convolutional layers, 3 pooling layers connected to the 3 convolutional layers in a one-to-one correspondence, a first fully-connected layer connected to the 3 pooling layers, and a second fully-connected layer connected to the first fully-connected layer; the 3 convolutional layers are respectively used for carrying out convolution processing on the left eye region image, the right eye region image and the mouth region image to obtain a feature vector which adopts sparse representation, the 3 pooling layers are respectively used for pooling convolution results of the 3 convolutional layers, and the first full-connection layer and the second full-connection layer are used for classifying outputs of the 3 pooling layers to obtain a classification result.
Specifically, of the 3 convolutional layers, layers 1 and 2 contain 32 convolution kernels each and layer 3 contains 64, with a kernel size of 8. Of the 3 pooling layers, pooling layer 1 uses max pooling, taking the maximum value in each local receptive field, while pooling layers 2 and 3 use average pooling, taking the mean value in each local receptive field. The first and second fully-connected layers contain 64 and 2 neurons respectively, where the number of neurons in the second fully-connected layer equals the number of output categories, namely the eye classification result and the mouth classification result.
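The two pooling variants named above can be sketched directly. The 2×2 window with stride 2 is an assumption; the patent does not state the pooling window size.

```python
# Sketch of the pooling step: layer 1 keeps the maximum of each local
# receptive field, layers 2 and 3 keep the average. A 2x2 window with
# stride 2 is assumed here; the patent does not specify it.

def pool2d(feature_map, window=2, mode="max"):
    """Pool a 2-D feature map (list of lists) with a square window."""
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for i in range(0, h - window + 1, window):
        row = []
        for j in range(0, w - window + 1, window):
            patch = [feature_map[i + di][j + dj]
                     for di in range(window) for dj in range(window)]
            row.append(max(patch) if mode == "max" else sum(patch) / len(patch))
        out.append(row)
    return out

fm = [[1.0, 2.0, 5.0, 6.0],
      [3.0, 4.0, 7.0, 8.0],
      [0.0, 0.0, 1.0, 1.0],
      [0.0, 4.0, 1.0, 1.0]]
print(pool2d(fm, mode="max"))   # [[4.0, 8.0], [4.0, 1.0]]
print(pool2d(fm, mode="avg"))   # [[2.5, 6.5], [1.0, 1.0]]
```

Max pooling preserves the strongest local response (useful early, when edges matter most), while average pooling smooths the deeper feature maps, which matches the layer ordering described in the embodiment.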
Illustratively, during training, feature vectors are constructed by sparse representation: for classifying faces processed according to Retinex theory, features that describe the face image category are extracted. First, the face is partitioned, i.e., the face image is divided into regions according to the three-section, five-eye proportions, and the eye feature points and mouth feature points are extracted from the middle and lower sections of the face respectively. Then a sparse representation algorithm extracts features from each face sub-block, yielding the classification feature set used for face recognition.
Specifically, a sampling operation is performed on a face sub-block Bb, i.e., the eye feature points or mouth feature points in Bb are collected, and all sampling results are combined into a vector vb ∈ R^d (d = m × n). The vectors of all sampling results form the feature vector of the sub-block, so that the face image can be sparsely represented.
For all face image training samples, the feature dictionary formed by the same face sub-blocks is specifically as follows:
Ab = [v1,1, v1,2, …, v1,n1, v2,1, …, v2,n2, …, vk,1, …, vk,nk] ∈ R^(d×N);
where vi,j is the sparse-representation feature of sub-block Bb of the j-th face in the i-th class, N is the number of face image training samples, k is the number of face classes, and d is the feature dimension.
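Assembling the dictionary Ab can be sketched as follows. The sub-block contents are synthetic; in the patent each column would be the sampled eye or mouth feature points of one training face.

```python
# Sketch: assemble the feature dictionary Ab for one face sub-block Bb.
# Each training face contributes one column v in R^d (d = m*n), ordered
# class by class as in the formula above. The data here is synthetic.

def sample_subblock(block):
    """Flatten an m x n sub-block of sampled feature values into a vector."""
    return [value for row in block for value in row]

def build_dictionary(classes):
    """classes: one entry per face class, each a list of m x n sub-blocks.
    Returns the dictionary columns and the class label of each column."""
    columns, labels = [], []
    for class_id, blocks in enumerate(classes):
        for block in blocks:
            columns.append(sample_subblock(block))
            labels.append(class_id)
    return columns, labels

class0 = [[[0.1, 0.2], [0.3, 0.4]]]                             # n1 = 1 sample
class1 = [[[0.9, 0.8], [0.7, 0.6]], [[0.5, 0.5], [0.5, 0.5]]]   # n2 = 2 samples
Ab, labels = build_dictionary([class0, class1])
print(len(Ab), len(Ab[0]), labels)  # 3 columns of dimension 4, labels [0, 1, 1]
```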
Let yb denote the feature vector of sub-block Bb in a test face image y; its sparse representation is obtained by solving:
min ||x||0  s.t.  yb = Ab·x;
where ||·||0 denotes the l0 norm.
Since the l0 problem is hard to solve directly, it is converted to the l1-norm problem:
min ||x||1  s.t.  yb = Ab·x;
After the solution x is obtained, the residual with respect to the i-th face class is calculated:
ri(yb) = ||yb − Ab·δi(x)||2;
where δi(x) ∈ R^N is the vector obtained from x by keeping the coefficients associated with class i and setting all others to zero.
On this basis, the approximate feature vector of the test face can be obtained from the features of the training samples.
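The residual rule above can be sketched with a toy dictionary. The sparse code x is assumed to have been found already (e.g. by an l1 solver); solving the minimization itself is outside this sketch, and the data is synthetic.

```python
import math

# Sketch of the class-residual rule r_i(yb) = ||yb - Ab * delta_i(x)||_2:
# delta_i(x) keeps only the coefficients of x whose dictionary columns
# belong to class i, and the class with the smallest residual wins.
# Ab is stored row-major here (d rows, N columns).

def residual(yb, Ab, labels, x, class_id):
    d = len(yb)
    recon = [0.0] * d
    for col, (label, coef) in enumerate(zip(labels, x)):
        if label == class_id and coef != 0.0:
            for row in range(d):
                recon[row] += coef * Ab[row][col]
    return math.sqrt(sum((yb[r] - recon[r]) ** 2 for r in range(d)))

def classify(yb, Ab, labels, x):
    return min(set(labels), key=lambda c: residual(yb, Ab, labels, x, c))

# Two-column toy dictionary: column 0 belongs to class 0, column 1 to class 1.
Ab = [[1.0, 0.0],
      [0.0, 1.0]]
labels = [0, 1]
yb = [0.0, 2.0]
x = [0.0, 2.0]   # sparse code reconstructing yb from class 1 only
print(classify(yb, Ab, labels, x))  # 1
```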
The feature vectors and the corresponding face classes are then labeled, and the training and test sample sets for face recognition are combined, with the feature vectors serving as the input of the classification model and the face classes as its output.
Optionally, step S5 in this embodiment includes:
step S51, calculating a value f from the eye classification results of consecutive frames of the driver's face images within one time sequence, using the formula f = Nc/Nt × 100%, and judging whether the driver's eyes are closed by comparing f with a preset threshold; wherein Nc is the number of frames in which the driver's eyes are in the closed state during the period, and Nt is the number of valid eye-state recognition frames during the period;
preferably, the preset threshold value in this embodiment is 0.5.
Specifically, in a given frame, if both eyes are recognized as closed, the frame counts as a valid closed-eye frame; if the two eyes give inconsistent results, the frame counts as an intermediate valid frame; if both eyes are recognized as open, the frame counts as a valid open-eye frame.
The number of valid recognition frames Nt is therefore the sum of valid closed-eye frames, intermediate valid frames, and valid open-eye frames.
Step S52, determining how long the driver's mouth remains open from the mouth classification results of consecutive frames within one time sequence; when this duration is greater than or equal to a preset duration and the relative coordinate positions of the mouth feature points match those of the yawning state, the driver is judged to be yawning.
Preferably, in this embodiment, the preset time period is 2 seconds.
Preferably, in this embodiment, one time sequence of valid eye-state recognition frames is defined as 3 seconds.
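Step S52 can be sketched as a consecutive-run check over the per-frame mouth states. The frame rate and state labels are illustrative assumptions; the feature-point geometry check mentioned above is omitted here.

```python
# Sketch of step S52: flag a yawn when the mouth stays in the wide-open
# state for at least the preset duration (2 s in this embodiment).
# The frame rate and the state labels are illustrative assumptions.

def detect_yawn(mouth_states, fps, min_duration_s=2.0):
    """mouth_states: per-frame labels, e.g. 'closed', 'half', 'wide'."""
    needed = int(min_duration_s * fps)  # frames required for a yawn
    run = 0
    for state in mouth_states:
        run = run + 1 if state == "wide" else 0
        if run >= needed:
            return True
    return False

fps = 10
states = ["closed"] * 5 + ["wide"] * 25 + ["half"] * 5   # 2.5 s wide open
print(detect_yawn(states, fps))          # True
print(detect_yawn(["wide"] * 10, fps))   # False: only 1 s wide open
```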
Referring to fig. 6, another embodiment of the present invention provides a driver fatigue detection system for performing the driver fatigue detection method according to the above embodiment, the system including:
the image acquisition unit 1 is used for periodically acquiring the current frame face image of the driver;
the feature point detection unit 2 is configured to detect the current frame face image by using a preset progressive calibration network, so as to locate feature points in the current frame face image; wherein the feature points comprise left and right eye feature points and mouth feature points;
the region image acquisition unit 3 is configured to determine left and right eye region images and mouth region images according to the positioning result of the feature points, extract target feature points in the left and right eye region images and mouth region images, and perform sparse representation on the target feature points to obtain sparse feature vectors;
the classification unit 4 is used for processing the sparse feature vector by using a pre-trained neural network classification model and outputting an eye classification result and a mouth classification result of the current frame face image;
a fatigue determination unit 5 for determining whether the driver is fatigued based on the eye classification results and mouth classification results of consecutive frames of the driver's face images within one time sequence; wherein multiple frames of the driver's face image are acquired periodically within one time sequence.
The above-described system embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
It should be noted that the system described in the foregoing embodiment corresponds to the method described in the foregoing embodiment, and therefore, portions of the system described in the foregoing embodiment that are not described in detail can be obtained by referring to the content of the method described in the foregoing embodiment, and details are not described here.
Also, the driver fatigue detection system according to the above-described embodiment, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer-readable storage medium.
Illustratively, the computer-readable storage medium may include any entity or device capable of carrying the computer program code, such as a recording medium, USB flash drive, removable hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signal, telecommunications signal, or software distribution medium.
Another embodiment of the present invention also proposes a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the driver fatigue detection method of the above-mentioned embodiment.
Specifically, the computer-readable storage medium may include any entity or device capable of carrying the computer program code, such as a recording medium, USB flash drive, removable hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signal, telecommunications signal, or software distribution medium.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary rather than exhaustive, and is not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (10)
1. A driver fatigue detection method, characterized by comprising:
step S1, periodically acquiring a current frame face image of the driver;
step S2, detecting the current frame face image by using a preset progressive calibration network so as to position the feature points in the current frame face image; wherein the feature points comprise left and right eye feature points and mouth feature points;
step S3, determining a left eye region image, a right eye region image and a mouth region image according to the positioning result of the feature points, extracting target feature points in the left eye region image, the right eye region image and the mouth region image, and performing sparse representation on the target feature points to obtain sparse feature vectors;
step S4, processing the sparse feature vector by using a pre-trained neural network classification model, and outputting an eye classification result and a mouth classification result of the current frame face image;
step S5, judging whether the driver is tired according to the eye classification result and the mouth classification result of the continuous multiframe human face images of the driver in a time sequence; wherein, a plurality of frames of face images of the driver are periodically acquired in one time sequence.
2. The driver fatigue detection method according to claim 1, wherein the step S1 includes:
step S11, acquiring a current frame original image output by a vehicle camera;
step S12, reducing the size of the original image of the current frame according to a preset proportion;
step S13, reducing the size of the largest face in the original image of the current frame according to a preset proportion;
and step S14, determining a face search area in the current frame original image by using the face position in the previous frame original image, and performing face detection on the face search area by using a sliding window to obtain the current frame face image.
3. The driver fatigue detection method according to claim 1, wherein the step S2 includes:
step S21, carrying out image transformation on the current frame face image obtained in step S1 according to the following formula;
Ri(x,y)=log[Ii(x,y)/Li(x,y)]=log Ii(x,y)-log[F(x,y)*Ii(x,y)]
where Ii(x, y) is the i-th color component of the current frame face image, Ri(x, y) is the reflected-light information of the i-th color component, Li(x, y) is the illumination information of the i-th color component, * denotes the convolution operation, σ is the scale factor of the surround function F(x, y), and K is a constant chosen so that F(x, y) satisfies ∫∫ F(x, y) dx dy = 1;
and step S22, detecting the current frame face image transformed in the step S21 by using a preset progressive calibration network so as to position the feature points in the current frame face image.
4. The driver fatigue detection method according to claim 1, wherein the step S3 includes:
s31, constructing a target area feature point set according to the positioning result of the feature points;
step S32, according to the target region feature point set, defining and describing the relative coordinate position relationship between each point of the left and right eye target regions and the mouth target region on a face plane coordinate system, and determining the peripheral rectangular frame with the minimum area of the left and right eye target regions and the mouth target region according to the relative coordinate position relationship;
step S33, performing inclination angle correction on the peripheral rectangular frame with the minimum area of the left and right eye target areas and the mouth target area, and acquiring left and right eye area images and mouth area images according to the peripheral rectangular frame with the minimum area of the left and right eye target areas and the mouth target area after inclination angle correction;
and step S34, extracting target feature points in the left eye region image, the right eye region image and the mouth region image, and performing sparse representation on the target feature points to obtain sparse feature vectors.
5. The driver fatigue detection method according to claim 4, wherein the step S31 includes:
if feature points are missing from the current frame face image, supplementing the missing feature points of the eye and mouth regions according to a pre-constructed three-section, five-eye digital model of the driver; wherein the three-section, five-eye digital model comprises image information of the driver's face in the frontal pose and at various deflection and pitch angles;
wherein:
the points of the left and right eye target areas include: the canthi of the left and right eyes, the centers of the left and right eyeballs, and the centers of the upper and lower eyelid contours of the left and right eyes;
the various points of the mouth target area include: left and right corners of the mouth, center of contour of the upper lip, and center of contour of the lower lip.
6. The driver fatigue detection method according to claim 1, wherein:
the eye classification result includes: both eyes closed, both eyes slightly closed and both eyes open;
the mouth classification result includes: mouth closed, mouth half open, and mouth wide open as when yawning.
7. The driver fatigue detection method according to claim 6, wherein the neural network classification model includes 3 convolutional layers, 3 pooling layers connected in one-to-one correspondence with the 3 convolutional layers, a first fully-connected layer connected to the 3 pooling layers, and a second fully-connected layer connected to the first fully-connected layer; the 3 convolutional layers are respectively used for carrying out convolution processing on the sparse feature vector, the 3 pooling layers are respectively used for pooling convolution results of the 3 convolutional layers, and the first full-connection layer and the second full-connection layer are used for classifying outputs of the 3 pooling layers to obtain classification results.
8. The driver fatigue detection method according to claim 6, wherein the step S5 includes:
calculating a value f from the eye classification results of consecutive frames of the driver's face images within one time sequence, using the formula f = Nc/Nt × 100%, and judging whether the driver's eyes are closed by comparing f with a preset threshold; wherein Nc is the number of frames in which the driver's eyes are in the closed state during the period, and Nt is the number of valid eye-state recognition frames during the period;
determining the duration for which the driver's mouth remains open from the mouth classification results of consecutive frames of the driver's face images within one time sequence, and judging that the driver is yawning when the duration is greater than or equal to a preset duration and the relative coordinate positions of the mouth feature points match those of the yawning state.
9. A driver fatigue detection system for performing the method of any of claims 1-8, the system comprising:
the image acquisition unit is used for periodically acquiring the current frame face image of the driver;
the characteristic point detection unit is used for detecting the current frame face image by using a preset progressive calibration network so as to position the characteristic points in the current frame face image; wherein the feature points comprise left and right eye feature points and mouth feature points;
the region image acquisition unit is used for determining a left eye region image, a right eye region image and a mouth region image according to the positioning result of the feature points, extracting target feature points in the left eye region image, the right eye region image and the mouth region image, and performing sparse representation on the target feature points to obtain sparse feature vectors;
the classification unit is used for processing the sparse feature vector by utilizing a pre-trained neural network classification model and outputting an eye classification result and a mouth classification result of the current frame face image;
a fatigue determination unit for determining whether the driver is tired based on the eye classification result and the mouth classification result of the continuous multi-frame face images of the driver in one time sequence; wherein, a plurality of frames of face images of the driver are periodically acquired in one time sequence.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program, when executed by a processor, implements the driver fatigue detection method of any of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010918289.0A CN112183220A (en) | 2020-09-04 | 2020-09-04 | Driver fatigue detection method and system and computer storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112183220A true CN112183220A (en) | 2021-01-05 |
Family
ID=73924129
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010918289.0A Pending CN112183220A (en) | 2020-09-04 | 2020-09-04 | Driver fatigue detection method and system and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112183220A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113095146A (en) * | 2021-03-16 | 2021-07-09 | 深圳市雄帝科技股份有限公司 | Mouth state classification method, device, equipment and medium based on deep learning |
CN113723339A (en) * | 2021-09-08 | 2021-11-30 | 西安联乘智能科技有限公司 | Fatigue driving detection method, storage medium, and electronic device |
CN117282038A (en) * | 2023-11-22 | 2023-12-26 | 杭州般意科技有限公司 | Light source adjusting method and device for eye phototherapy device, terminal and storage medium |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104952209A (en) * | 2015-04-30 | 2015-09-30 | 广州视声光电有限公司 | Driving prewarning method and device |
CN106485214A (en) * | 2016-09-28 | 2017-03-08 | 天津工业大学 | A kind of eyes based on convolutional neural networks and mouth state identification method |
US20170119298A1 (en) * | 2014-09-02 | 2017-05-04 | Hong Kong Baptist University | Method and Apparatus for Eye Gaze Tracking and Detection of Fatigue |
CN106682603A (en) * | 2016-12-19 | 2017-05-17 | 陕西科技大学 | Real time driver fatigue warning system based on multi-source information fusion |
CN107133595A (en) * | 2017-05-11 | 2017-09-05 | 南宁市正祥科技有限公司 | The eyes opening and closing detection method of infrared image |
CN107704805A (en) * | 2017-09-01 | 2018-02-16 | 深圳市爱培科技术股份有限公司 | method for detecting fatigue driving, drive recorder and storage device |
CN109063545A (en) * | 2018-06-13 | 2018-12-21 | 五邑大学 | A kind of method for detecting fatigue driving and device |
US20190126935A1 (en) * | 2014-09-22 | 2019-05-02 | Brian K. Phillips | Method and system for impaired driving detection, monitoring and accident prevention with driving habits |
CN110119676A (en) * | 2019-03-28 | 2019-08-13 | 广东工业大学 | A kind of Driver Fatigue Detection neural network based |
CN110532976A (en) * | 2019-09-03 | 2019-12-03 | 湘潭大学 | Method for detecting fatigue driving and system based on machine learning and multiple features fusion |
WO2020024395A1 (en) * | 2018-08-02 | 2020-02-06 | 平安科技(深圳)有限公司 | Fatigue driving detection method and apparatus, computer device, and storage medium |
CN111291590A (en) * | 2018-12-06 | 2020-06-16 | 广州汽车集团股份有限公司 | Driver fatigue detection method, driver fatigue detection device, computer equipment and storage medium |
CN111582086A (en) * | 2020-04-26 | 2020-08-25 | 湖南大学 | Fatigue driving identification method and system based on multiple characteristics |
Non-Patent Citations (5)
Title |
---|
KENING LI等: "A Fatigue Driving Detection Algorithm Based on Facial Multi-Feature Fusion", IEEE ACCESS, vol. 8, pages 101244 - 101259, XP011792330, DOI: 10.1109/ACCESS.2020.2998363 * |
ZHONGMIN LIU等: "Driver fatigue detection based on deeply-learned facial expression representation", JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, vol. 71, pages 1 - 7 * |
刘小双等: "基于眼口状态的疲劳检测系统", 传感器与微系统, no. 10, pages 113 - 115 * |
朱玉斌等: "基于级联宽度学习的疲劳驾驶检测", 计算机工程与设计, vol. 41, no. 2, pages 537 - 541 * |
高宁: "面向驾驶人疲劳检测的人脸分析方法研究", 中国博士学位论文全文数据库 工程科技II辑, no. 8, pages 035 - 15 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||