CN116189155A - Fatigue driving detection method and system based on deep features and graph attention mechanism - Google Patents

Fatigue driving detection method and system based on deep features and graph attention mechanism Download PDF

Info

Publication number
CN116189155A
Authority
CN
China
Prior art keywords
sequence
fatigue driving
facial feature
driving detection
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211619272.0A
Other languages
Chinese (zh)
Inventor
常发亮
黄一鸣
刘春生
路彦沙
常致富
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202211619272.0A
Publication of CN116189155A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/59: Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V 20/597: Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161: Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a fatigue driving detection method and system based on deep features and a graph attention mechanism. The method comprises the following steps: constructing a sample training set from a plurality of original video sequences, each labelled as fatigued or non-fatigued driving; constructing a fatigue driving detection model and training it with the sample training set, wherein the training process comprises: obtaining a spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding; learning, based on a graph attention mechanism, the correlations between the spatiotemporal facial features and the importance weight of each feature for the final classification; updating the spatiotemporal facial feature sequence based on the correlations; performing weighted fusion of the importance weights with the updated spatiotemporal facial feature sequence to obtain fused features; and performing classification training based on the fused features. An original video sequence to be detected is then input into the trained fatigue driving detection model to complete fatigue driving detection, improving the accuracy and generalization of fatigue driving detection.

Description

Fatigue driving detection method and system based on deep features and graph attention mechanism
Technical Field
The invention relates to the technical field of fatigue driving detection, and in particular to a fatigue driving detection method and system based on deep features and a graph attention mechanism.
Background
With the rapid development of expressways in modern society, fatigue driving has become one of the main causes of traffic accidents. Drivers are prone to fatigue during long periods of driving, so to improve the travel safety of drivers and passengers, the driver's fatigue state needs to be detected and a timely reminder given. Accordingly, as safety awareness and technology have advanced, fatigue driving detection techniques have emerged.
The fatigue driving detection methods in common use today mainly comprise methods based on physiological information, methods based on computer vision, and methods based on changes in vehicle driving behaviour. Computer-vision-based methods, which detect fatigue from the driver's facial features such as the eyes, mouth, head and expression, achieve higher accuracy at lower cost, do not interfere with normal driving, and offer better practicality, acceptability and comfort, so they are the most widely applied. However, existing computer-vision-based fatigue driving detection methods still have algorithmic shortcomings.
First, the direct interrelationships among the driver's facial features are not fully exploited. Conventional approaches typically use 3D CNNs and LSTMs to model the relationships between driver facial features, such as "Long-term multi-granularity deep framework for driver drowsiness detection" disclosed by J. Lyu et al. and "Driver Yawning Detection Based on Subtle Facial Action Recognition" disclosed by H. Yang et al. However, a 3D CNN is limited by the receptive field of its convolution kernels: it can only build temporal features over a few consecutive feature maps and struggles to construct long-term correlations. An LSTM processes the features sequentially, so features late in the sequence only receive aggregated information from features earlier in the sequence, and it is difficult to construct correlations between features far apart in the sequence. The direct correlations between facial features thus remain unexplored in existing detection methods.
Second, existing methods lack the ability to distinguish peak-frame features from non-peak-frame features. Existing driver fatigue detection methods treat every acquired facial feature equally, ignoring the fact that different facial features contribute differently to the final classification. In a yawning video clip, for example, the driver's facial expression may change only subtly at first and then develop a strong appearance, i.e. tightly closed eyes and a wide-open mouth. The former frames are usually called non-peak frames and the latter peak frames; features from peak frames have greater reference value for the final classification than features from non-peak frames. Existing methods cannot distinguish the two and therefore fail to highlight the importance of peak-frame features.
In summary, existing computer-vision-based fatigue driving detection methods suffer from these defects, and their detection accuracy and generalization are poor.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a fatigue driving detection method and system based on deep features and a graph attention mechanism, which solve the problems of under-utilized direct facial feature correlations and the inability to distinguish peak-frame features from non-peak-frame features in existing detection methods, and improve the accuracy and generalization of fatigue driving detection.
In a first aspect, the present disclosure provides a fatigue driving detection method based on deep features and a graph attention mechanism, comprising:
acquiring a plurality of original video sequences containing facial images of a driver, labelling each original video sequence as fatigued or non-fatigued driving, and constructing a sample training set from the labelled original video sequences;
constructing a fatigue driving detection model and training it with the sample training set, wherein the training process comprises: obtaining a spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding; learning, based on a graph attention mechanism, the correlations between the spatiotemporal facial features and the importance weight of each feature for the final classification; updating the spatiotemporal facial feature sequence based on the correlations; performing weighted fusion of the importance weights with the updated spatiotemporal facial feature sequence to obtain fused features; and performing classification training based on the fused features;
and inputting an original video sequence to be detected, which contains facial images of the driver, into the trained fatigue driving detection model to complete fatigue driving detection.
According to a further technical scheme, the fatigue driving detection model comprises a spatiotemporal facial feature extraction network, a multi-head graph attention network and a weighted graph attention feature fusion network.
According to a further technical scheme, obtaining the spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding comprises:
acquiring a facial image sequence through a face detection algorithm from the original video sequence containing facial images of the driver;
extracting, with a convolutional neural network, the visual representation feature of each image in the facial image sequence to form a spatial facial feature sequence;
and position-encoding the spatial facial feature sequence with a multi-frequency cosine position function, injecting position information into each facial feature in the sequence, to obtain the spatiotemporal facial feature sequence.
According to a further technical scheme, position-encoding the spatial facial feature sequence with the multi-frequency cosine position function and injecting position information into each facial feature in the sequence to obtain the spatiotemporal facial feature sequence comprises:
encoding with the multi-frequency cosine position function to obtain a position encoding sequence;
and adding the position encoding sequence to the spatial facial feature sequence to obtain the spatiotemporal facial feature sequence.
According to a further technical scheme, learning the correlations between the spatiotemporal facial features based on the graph attention mechanism and updating the spatiotemporal facial feature sequence based on the correlations comprises:
treating each spatiotemporal facial feature of the sequence as a node to form node features, thereby constructing a complete directed graph;
mapping the input node features into a feature subspace and computing an attention coefficient for each pair of nodes in the directed graph through a shared self-attention mechanism, thereby constructing an adjacency matrix;
and fusing the adjacency matrix with the node feature matrix to update the node features, then concatenating the node features updated by each attention head into new node features to obtain the updated spatiotemporal facial feature sequence.
According to a further technical scheme, learning the importance weight of each feature for the final classification based on the graph attention mechanism and performing weighted fusion with the updated spatiotemporal facial feature sequence based on the importance weights to obtain fused features comprises:
projecting the node features into a unified linear space, computing an attention coefficient for each pair of nodes in the graph with a shared self-attention mechanism, and constructing an adjacency matrix from the attention coefficients;
summing the rows of the adjacency matrix to form a weight vector, each element of which is the importance weight of the corresponding facial feature;
and performing weighted fusion of the updated spatiotemporal facial feature sequence with the weight vector to obtain the fused features.
According to a further technical scheme, the loss function is the cross entropy function:

$$L = -\frac{1}{K}\sum_{i=1}^{K}\sum_{cls} y_i^{cls}\,\log\!\left(p_i^{cls}\right)$$

where $K$ is the number of samples per batch; $cls$ denotes the class, with $cls = 0/1$, 0 for non-fatigued and 1 for fatigued; $y_i^{cls}$ is the sample label indicating that the $i$-th sample belongs to class $cls$; and $p_i^{cls}$ is the probability of detecting the $i$-th sample as class $cls$.
In a second aspect, the present disclosure provides a fatigue driving detection system based on deep features and a graph attention mechanism, comprising:
a sample training set construction module, configured to acquire a plurality of original video sequences containing facial images of a driver, label each original video sequence as fatigued or non-fatigued driving, and construct a sample training set from the labelled original video sequences;
a detection model construction module, configured to construct a fatigue driving detection model comprising a spatiotemporal facial feature extraction network, a multi-head graph attention network and a weighted graph attention feature fusion network;
a detection model training module, configured to train the fatigue driving detection model with the sample training set, wherein the training process comprises: obtaining a spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding; learning, based on a graph attention mechanism, the correlations between the spatiotemporal facial features and the importance weight of each feature for the final classification; updating the spatiotemporal facial feature sequence based on the correlations; performing weighted fusion of the importance weights with the updated spatiotemporal facial feature sequence to obtain fused features; and performing classification training based on the fused features;
and a detection module, configured to input an original video sequence to be detected, which contains facial images of the driver, into the trained fatigue driving detection model to complete fatigue driving detection.
In a third aspect, the present disclosure further provides an electronic device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method of the first aspect.
In a fourth aspect, the present disclosure also provides a computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of the first aspect.
The one or more of the above technical solutions have the following beneficial effects:
1. The invention provides a fatigue driving detection method and system based on deep features and a graph attention mechanism. The direct interrelationship of each pair of features in the driver's facial feature sequence is learned autonomously with a graph attention mechanism, and the facial feature information is updated according to the learned interrelationships. For the updated facial features, a graph attention mechanism is again used to autonomously learn each feature's contribution to the final classification and assign it a corresponding importance weight, and the features are weighted and fused. Based on the fused features, the detection model can effectively distinguish peak-frame features from non-peak-frame features and make full use of peak-frame feature information, thereby improving the accuracy and generalization of fatigue driving detection.
2. The fatigue driving detection method and system provided by the invention solve the problems of under-utilized direct facial feature correlations and the inability to distinguish peak-frame features from non-peak-frame features in existing detection methods, and improve the accuracy and generalization of fatigue driving detection.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a schematic diagram of the network structure of the fatigue driving detection model based on deep features and a graph attention mechanism in the first embodiment of the present invention;
Fig. 2 is a schematic diagram of a complete directed graph according to the first embodiment of the present invention;
Fig. 3 is a flowchart of graph attention learning according to the first embodiment of the present invention;
Fig. 4 is a histogram visualising the importance weight of each facial feature in the facial feature sequence corresponding to the input facial images according to the first embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the present invention. As used herein, the singular forms are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, it is to be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components and/or combinations thereof.
Example 1
This embodiment provides a fatigue driving detection method based on deep features and a graph attention mechanism, which detects fatigue driving from a clip of the driver's driving video through the graph attention mechanism. Based on the graph attention mechanism, the direct correlation of each pair of features in the driver's facial feature sequence is learned autonomously, and the facial feature information is updated according to the learned correlations. For the updated facial features, a graph attention mechanism is again used to autonomously learn each feature's contribution to the final classification and assign it a corresponding importance weight, and the features are weighted and fused. Based on the fused features, the detection model can effectively distinguish peak-frame features from non-peak-frame features and make full use of peak-frame feature information, thereby improving the accuracy and generalization of fatigue driving detection.
The fatigue driving detection method provided by the embodiment specifically comprises the following steps:
acquiring a plurality of original video sequences containing facial images of a driver, labelling each original video sequence as fatigued or non-fatigued driving, and constructing a sample training set from the labelled original video sequences;
constructing a fatigue driving detection model comprising a spatiotemporal facial feature extraction network, a multi-head graph attention network and a weighted graph attention feature fusion network;
training the fatigue driving detection model with the sample training set, wherein the training process comprises: obtaining a spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding; learning, based on a graph attention mechanism, the correlations between the spatiotemporal facial features and the importance weight of each feature for the final classification; updating the spatiotemporal facial feature sequence based on the correlations; performing weighted fusion of the importance weights with the updated spatiotemporal facial feature sequence to obtain fused features; and performing classification training based on the fused features;
and inputting an original video sequence to be detected, which contains facial images of the driver, into the trained fatigue driving detection model to complete fatigue driving detection.
In the detection method, a plurality of original video sequences containing facial images of the driver are first acquired by a camera in the cab; each original video sequence is labelled as fatigued or non-fatigued driving, and the sample training set is constructed from the labelled sequences.
Next, the fatigue driving detection model is constructed and trained with the sample training set. The constructed model comprises a spatiotemporal facial feature extraction network, a multi-head graph attention network and a weighted graph attention feature fusion network: in the spatiotemporal facial feature extraction network, a spatiotemporal facial feature sequence is obtained from the original video sequence through facial feature extraction and position encoding; in the multi-head graph attention network, the correlations between the spatiotemporal facial features are learned with a graph attention mechanism and the spatiotemporal facial feature sequence is updated according to them; in the weighted graph attention feature fusion network, the importance weight of each feature for the final classification is learned with a graph attention mechanism and fused with the updated spatiotemporal facial feature sequence to obtain fused features; finally, fatigued/non-fatigued classification training is performed on the fused features to complete training of the fatigue driving detection model.
The fatigue driving detection method provided by this embodiment is implemented with the constructed fatigue driving detection model (DCFGA-Net) based on deep features and a graph attention mechanism; the network structure of the detection model is shown in Fig. 1, and the training process comprises the following steps:
step 1, acquiring a space-time facial feature sequence through facial feature extraction and position coding based on an original video sequence in a space-time facial feature extraction network.
Step 1.1, acquiring a face image sequence through a face detection algorithm based on an original video sequence containing a face image of a driver.
In the step 1.1, firstly, an MTCNN face detection algorithm is adopted for an original video sequence containing face images of a driver, position coordinates of face boundary boxes appearing in all video frames in the original video sequence are detected, cutting is performed based on the position coordinates of the face boundary boxes, face images in the video frame images are cut out, and a face image sequence is obtained.
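By way of illustration, a minimal sketch of step 1.1 is given below. It assumes the facenet-pytorch implementation of MTCNN and OpenCV for frame reading; the embodiment names the MTCNN algorithm but no particular library, and the 224x224 crop size is likewise an assumption.

```python
# Sketch of step 1.1: crop the driver's face from every frame of a clip.
# facenet-pytorch's MTCNN and the 224x224 output size are assumptions;
# the embodiment names the MTCNN algorithm but not an implementation.
import cv2
from PIL import Image
from facenet_pytorch import MTCNN

mtcnn = MTCNN(select_largest=True, post_process=False)

def crop_face_sequence(video_path, size=(224, 224)):
    """Return the facial image sequence cropped from one video clip."""
    faces, cap = [], cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        boxes, _ = mtcnn.detect(rgb)       # face bounding-box coordinates
        if boxes is None:                  # skip frames with no detected face
            continue
        x1, y1, x2, y2 = (int(v) for v in boxes[0])
        faces.append(rgb.crop((x1, y1, x2, y2)).resize(size))
    cap.release()
    return faces
```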
Step 1.2: with a convolutional neural network, extract the visual representation feature of each image in the facial image sequence to form the spatial facial feature sequence.
In step 1.2, the visual representation feature (i.e. the spatial facial feature) of each facial image is extracted with a VGG16 convolutional neural network pre-trained on the ImageNet dataset, constructing the spatial facial feature sequence $R = \{r_1, r_2, \ldots, r_N\}$ with $r_n \in \mathbb{R}^{d_f}$, where $N$ is the length of the input video sequence, i.e. the number of video frames, and $d_f$ is the dimension of each spatial facial feature.
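A sketch of step 1.2 follows, using the ImageNet-pretrained VGG16 from torchvision (consistent with the software environment described later). Taking the 4096-dimensional output of the first fully connected block as $d_f$ is an assumption; the embodiment does not fix the extraction layer.

```python
# Sketch of step 1.2: one spatial facial feature r_n per cropped face.
# Using the first FC block of VGG16 (4096-d output) as d_f is an assumption.
import torch
import torchvision.models as models
import torchvision.transforms as T

vgg16 = models.vgg16(pretrained=True).eval()
extractor = torch.nn.Sequential(
    vgg16.features, vgg16.avgpool, torch.nn.Flatten(),
    *list(vgg16.classifier.children())[:2],  # Linear(25088, 4096) + ReLU
)

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def spatial_feature_sequence(face_images):
    """Map N cropped face images to the (N, d_f) sequence R = {r_1..r_N}."""
    batch = torch.stack([preprocess(img) for img in face_images])
    return extractor(batch)
```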
Step 1.3: position-encode the spatial facial feature sequence with a multi-frequency cosine position function, injecting position information into each facial feature, to obtain the spatiotemporal facial feature sequence for the subsequent multi-head graph attention network to learn from.
In step 1.3, considering that the absolute position of each video frame is well defined and that the facial features of one video clip have a clear temporal order, this embodiment constructs a position-encoded feature extractor (PEFE) that injects position information into the facial features of the sequence with the multi-frequency cosine position function. The multi-frequency cosine position function is expressed as:

$$pe(pos, 2i) = \sin\!\left(\frac{pos}{base^{2i/d_f}}\right) \tag{1}$$

$$pe(pos, 2i+1) = \cos\!\left(\frac{pos}{base^{2i/d_f}}\right) \tag{2}$$

where $pos$ is the position of the facial feature in the facial feature sequence, $i$ indexes the feature dimension, and $base$ is a predefined hyper-parameter.
Encoding in this way yields the position encoding sequence $PE = \{pe_1, pe_2, \ldots, pe_N\}$, each item $pe_n \in \mathbb{R}^{d_f}$ of which has the same dimension as a spatial facial feature. The position encoding sequence is therefore added to the spatial facial feature sequence to obtain the spatiotemporal facial feature sequence $F = \{f_1, f_2, \ldots, f_N\}$, in which each feature $f_n = r_n + pe_n$ contains both the spatial feature information of the corresponding facial image and its temporal position in the video sequence.
Step 2: in the multi-head graph attention network (Multi-heads GAT), learn the correlations between the spatiotemporal facial features with a graph attention mechanism and update the spatiotemporal facial feature sequence according to the correlations.
Step 2.1: treat each spatiotemporal facial feature of the sequence as a node and construct a complete directed graph.
In step 2.1, this embodiment uses the multi-head graph attention network to construct the direct correlations between the spatiotemporal facial features and updates the feature information according to the learned correlations. Unlike application scenarios such as social networks, the features of a video sequence have no explicit topological relationship. This embodiment therefore proposes the following assumptions: 1) broad correlations exist between the spatiotemporal facial features of the same sequence, i.e. each video frame (regarded as a node of the graph) is correlated with every other video frame (node); 2) the relationships between spatiotemporal facial features are directional, and the effect of node i on node j is not necessarily the same as the effect of node j on node i. Under these assumptions, the spatiotemporal facial feature sequence forms a fully connected directed graph.
The constructed directed graph is a triplet (V(G), E(G), ψ(G)), where G is the graph, V(G) is the set of graph nodes, E(G) is the set of directed edges, ψ(G) is the relationship function, and each element of E(G) corresponds to an ordered pair of elements of V(G). The complete directed graph, constructed as shown in Fig. 2, contains three nodes V(G) = {h_i, h_j, h_k} and nine directed edges E(G) = {e_ij, e_ji, e_ik, e_ki, e_jk, e_kj, e_ii, e_jj, e_kk}, each directed edge being defined by the relationship function ψ(G). In this embodiment, V(G) is the spatiotemporal facial feature sequence, E(G) is the set of direct correlations between the spatiotemporal facial features, and ψ(G) is a self-attention function.
Step 2.2: construct the multi-head graph attention network from parallel self-attention heads; before each graph attention head, a fully connected feed-forward layer maps the input node features into a feature subspace.
Specifically, a graph attention mechanism is introduced on the directed graph structure to construct the interrelationships between the spatiotemporal facial features; the graph attention learning process is shown in Fig. 3. To enable the fatigue driving detection model to learn autonomously and jointly attend to information from different feature subspaces, a multi-head graph attention architecture is constructed. In each graph attention head, a shared linear transformation matrix $M \in \mathbb{R}^{d_f \times d_h}$ maps the spatiotemporal facial features $F = \{f_1, f_2, \ldots, f_N\}$, i.e.:

$$H = F \cdot M = \{f_1 M, f_2 M, \ldots, f_N M\} \tag{3}$$
step 2.3, calculating the attention coefficient for each pair of nodes in the directed graph through a shared self-attention mechanism, and constructing an adjacency matrix according to the attention coefficient. Specifically, a shared attention mechanism
Figure BDA0003999086300000101
Is used to calculate h= { H 1 ,h 2 ,...,h N }, wherein->
Figure BDA0003999086300000102
Attention coefficients for each pair of nodes. The calculation method can be expressed as:
e ij =LeakyReLU(a(h i ||h j )) (4)
in the above-mentioned method, the step of, vector concatenation is represented by I, e ij The importance of the feature representing the node j to the node i is the direct relation between the feature of the node i and the feature of the node j. In order to make the attention coefficients of different nodes easy to compare, the attention coefficients of all neighboring nodes j to node i are normalized using the softmax function, i.e.:
Figure BDA0003999086300000103
/>
from the empirical perfect directed graph described above, a corresponding adjacency matrix can be obtained:
Figure BDA0003999086300000104
step 2.4, fusing the adjacency matrix which is learned autonomously with the node characteristic matrix, and updating the node characteristics so that the characteristics of different nodes are complementary, and making up for the information deficiency; and then, splicing (jointing) the updated characteristics of each attention head to form new node characteristics, namely acquiring an updated space-time facial characteristic sequence.
Specifically, from the adjacency matrix, the output of each attention head can be obtained:
Figure BDA0003999086300000111
where σ represents a nonlinear activation function. The outputs of the attention heads are then spliced together to form the output of the multi-head graph attention network, namely:
Figure BDA0003999086300000112
wherein the feature vector
Figure BDA0003999086300000113
Where hd=1, 2, …, heads represents the vector a n From the hd th attention head. Each row O of the multi-head graph attention network output n Representing an updated node feature, which is constructed based on the correlation between the original spatiotemporal facial feature and the features learned by the attention mechanism.
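Steps 2.2-2.4 can be sketched as the following module. The per-head projection corresponds to equation (3), the LeakyReLU attention over all ordered node pairs to equation (4), the softmax normalisation to equations (5)-(6), and the head update and concatenation to equations (7)-(8); the ELU activation for σ and the LeakyReLU slope of 0.2 are assumptions, since the embodiment leaves the nonlinearity unspecified.

```python
# Sketch of the multi-head graph attention network (steps 2.2-2.4) over
# the complete directed graph of spatiotemporal facial features.
# sigma = ELU and LeakyReLU slope 0.2 are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadGAT(nn.Module):
    def __init__(self, d_f, d_h, heads=4):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d_f, d_h, bias=False) for _ in range(heads)])
        self.attn = nn.ModuleList([nn.Linear(2 * d_h, 1, bias=False) for _ in range(heads)])

    def forward(self, feats):                  # feats: (N, d_f) sequence F
        outs = []
        for proj, attn in zip(self.proj, self.attn):
            h = proj(feats)                    # H = F . M, eq. (3)
            n = h.size(0)
            # Concatenate h_i || h_j for every ordered pair of the graph.
            pairs = torch.cat([h.repeat_interleave(n, 0), h.repeat(n, 1)], dim=1)
            e = F.leaky_relu(attn(pairs), 0.2).view(n, n)  # e_ij, eq. (4)
            alpha = torch.softmax(e, dim=1)                # adjacency A, eqs. (5)-(6)
            outs.append(F.elu(alpha @ h))                  # head output, eq. (7)
        return torch.cat(outs, dim=1)                      # O, eq. (8): (N, heads*d_h)
```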
Step 3: in the weighted graph attention feature fusion network (Weighted-GAT Feature Fusion), learn the importance weight of each feature for the final classification with a graph attention mechanism, and fuse the importance weights with the updated spatiotemporal facial feature sequence to obtain the fused features.
After the multi-head graph attention network, the node features have been updated according to the learned inter-feature correlations, and these updated features are more valuable to the final classification than the feature information originally extracted by the convolutional neural network. However, since different features contribute to the final classification to different degrees, peak-frame features must still be distinguished from non-peak-frame features so that the peak-frame features contribute more to the final classification. In addition, the sequence-form output needs to be further fused for binary classification. To meet these requirements, this embodiment provides a weighted graph attention feature fusion network, implemented by the following steps:
Step 3.1: reduce the dimension of the updated node features with a fully connected feed-forward layer, projecting them into a unified linear space for the next stage of graph learning.
Specifically, because each row $O_n$ output by the multi-head graph attention network contains components $a_n^{hd}$ from different feature subspaces, this embodiment uses a transformation matrix $M_o \in \mathbb{R}^{(heads \cdot d_h) \times d_o}$, where $d_o$ is the output vector dimension, to project the node features into a unified linear space, completing the linear transformation needed for further graph learning.
Step 3.2: compute an attention coefficient for each pair of nodes in the graph with a shared self-attention mechanism and construct an adjacency matrix from the coefficients; then sum the rows of the adjacency matrix to form a weight vector, each element of which is the importance weight of the corresponding facial feature.
Specifically, as in the multi-head graph attention network, the weighted graph attention feature fusion network uses a shared attention mechanism $a: \mathbb{R}^{d_o} \times \mathbb{R}^{d_o} \rightarrow \mathbb{R}$ to compute the attention coefficients of each pair of graph nodes:

$$\beta_{ij} = \mathrm{softmax}_j\Big(\mathrm{LeakyReLU}\big(a(O_i M_o \,\|\, O_j M_o)\big)\Big) \tag{9}$$

The output of this graph attention layer can then be expressed as the product of the adjacency matrix $B = (\beta_{ij}) \in \mathbb{R}^{N \times N}$ formed by the attention coefficients and the linearly transformed output matrix of the multi-head graph attention network:

$$Output = B \cdot (O \cdot M_o) \tag{10}$$

Equation (10) presents the whole computation clearly and defines how the importance weights are obtained: the adjacency matrix is summed over its rows to form the weight vector, each element of which is the importance weight of the corresponding facial feature:

$$w_n = \sum_{i=1}^{N} \beta_{in} \tag{11}$$

Afterwards, through row average pooling (RAP), the reconstructed facial features are weighted and fused with the importance weight of each feature to form the final fused feature:

$$F_{fused} = \frac{1}{N}\sum_{n=1}^{N} w_n \cdot O_n M_o \tag{12}$$

Through the above scheme, the final feature is computed from the reconstructed facial features $O_n$ and the importance weights $w_n$, where $M_o$ is the linear transformation matrix employed by the weighted graph attention feature fusion network.
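The weighted graph attention feature fusion of steps 3.1-3.2 can be sketched as follows; the row-wise summation of the adjacency matrix into importance weights and the weighted row-average-pooled fusion follow the reconstructed forms of equations (9)-(12) above, which are themselves an interpretation of the original images.

```python
# Sketch of the weighted graph attention feature fusion (steps 3.1-3.2):
# project with M_o, learn a second adjacency matrix, sum it row-wise into
# importance weights w, and fuse by weighted row average pooling (RAP).
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedGATFusion(nn.Module):
    def __init__(self, d_in, d_o):
        super().__init__()
        self.mo = nn.Linear(d_in, d_o, bias=False)      # transformation matrix M_o
        self.attn = nn.Linear(2 * d_o, 1, bias=False)   # shared attention a

    def forward(self, nodes):              # nodes: (N, d_in) output O of the GAT
        h = self.mo(nodes)                 # unified linear space, step 3.1
        n = h.size(0)
        pairs = torch.cat([h.repeat_interleave(n, 0), h.repeat(n, 1)], dim=1)
        e = F.leaky_relu(self.attn(pairs), 0.2).view(n, n)
        beta = torch.softmax(e, dim=1)     # adjacency matrix B, eq. (9)
        w = beta.sum(dim=0)                # row-wise sum -> importance weights, eq. (11)
        return (w.unsqueeze(1) * h).mean(dim=0)  # weighted RAP fusion, eq. (12)
```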
Step 4: perform fatigued/non-fatigued classification training on the fused features to complete training of the fatigue driving detection model.
Finally, the obtained fused features are fed into a fully connected layer for fatigued/non-fatigued classification training until the loss function converges, completing the training of the fatigue driving detection model.
During iterative training, this embodiment optimises the network model parameters with a cross entropy function:

$$L = -\frac{1}{K}\sum_{i=1}^{K}\sum_{cls} y_i^{cls}\,\log\!\left(p_i^{cls}\right) \tag{13}$$

where $K$ is the number of samples per batch and $cls$ denotes the class (binary in this embodiment: $cls = 0/1$, 0 for non-fatigued, 1 for fatigued); $y_i^{cls}$ is the sample label indicating that the $i$-th sample belongs to class $cls$, and $p_i^{cls}$, the softmax of the model output, is the probability of detecting the $i$-th sample as class $cls$.
Iterative training is performed through the back-propagation algorithm with the cross entropy loss as the cost function. This embodiment selects the AdamW network parameter optimiser, sets the training period to 50 epochs, the batch size to 16 samples per batch, and the initial learning rate to 0.0001, with the learning rate decayed at a rate of 0.05 after half of the training epochs.
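A sketch of this training configuration follows. build_dcfga_net and train_loader are assumed, hypothetical helpers standing in for the assembled DCFGA-Net and the NTHU-DDD sample loader, and the MultiStepLR schedule is one reading of the learning-rate decay described above.

```python
# Sketch of the training loop: AdamW, cross entropy (eq. 13), 50 epochs,
# batch size 16, initial lr 1e-4 decayed by 0.05 at the halfway epoch.
# build_dcfga_net and train_loader are assumed, hypothetical helpers.
import torch
import torch.nn as nn

model = build_dcfga_net()                # the assembled detection model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[25], gamma=0.05)
criterion = nn.CrossEntropyLoss()        # cross entropy of eq. (13)

for epoch in range(50):
    for clips, labels in train_loader:   # batches of 16 labelled sequences
        optimizer.zero_grad()
        logits = model(clips)            # fused feature -> FC layer -> 2 logits
        loss = criterion(logits, labels) # labels: 0 non-fatigued, 1 fatigued
        loss.backward()                  # back-propagation
        optimizer.step()
    scheduler.step()
```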
The trained fatigue driving detection model is obtained through the above steps.
Finally, the original video sequence to be detected, containing facial images of the driver, is input into the trained fatigue driving detection model to complete fatigue driving detection.
The fatigue driving detection proposed in this embodiment is further described and verified by the following experiments.
The hardware of the verification instance is: Ubuntu 16.04, an Intel i9-9900X CPU, 64 GB of RAM and one TITAN Xp GPU; the software environment is Python 3.7, PyTorch 1.6.0 and torchvision 0.7.0. The data come from the NTHU-DDD dataset, which simulates five driver scenarios: bare face in daytime, glasses in daytime, bare face at night, glasses at night, and sunglasses in daytime. Each scenario comprises four video segments recording four driver states: non-drowsy, drowsy, slow blinking with nodding, and yawning. The dataset is divided into a training set of 18 drivers, a test set of 4 drivers and a validation set of 14 drivers.
Fatigue driving detection experiments were performed on the NTHU-DDD dataset. To make the experimental results more convincing, k-fold cross-validation is used: the dataset is first randomly divided into k sub-datasets of non-overlapping samples with approximately equal sample counts; each sub-dataset is then selected in turn as the evaluation set, with the remaining k-1 sub-datasets as the training set, and the average of the k experimental results is taken as the final result. In this experiment, k = 3 is selected.
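The k-fold protocol can be sketched as below (k = 3) with scikit-learn's KFold split; train_and_evaluate is an assumed, hypothetical helper that trains on the given folds and returns an evaluation score.

```python
# Sketch of 3-fold cross-validation: disjoint folds of roughly equal
# size, each used once for evaluation; the k results are averaged.
# train_and_evaluate is an assumed, hypothetical helper.
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(samples, labels, k=3):
    scores = []
    for tr, ev in KFold(n_splits=k, shuffle=True).split(samples):
        scores.append(train_and_evaluate(samples[tr], labels[tr],
                                         samples[ev], labels[ev]))
    return float(np.mean(scores))        # final result: mean over the k folds
```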
Fatigue driving detection was performed with existing algorithms and with the method proposed in this embodiment (DCFGA-Net); the detection results are shown in Table 1 below, with recognition accuracy and F1 score as the evaluation indices (the larger the value, the better the performance). The scheme of this embodiment clearly outperforms the existing algorithms.
Table 1 Detection results of different fatigue driving detection algorithms
In addition, ablation experiments were performed in this embodiment to demonstrate the effectiveness of the position encoding, the multi-head graph attention network and the weighted graph attention feature fusion network. Tables 2, 3 and 4 demonstrate the performance improvements brought to the method by the multi-head graph attention module, the weighted graph attention feature fusion module and the position encoding, respectively.
Table 2 Detection results of the fatigue driving detection algorithm with different multi-head graph attention networks
Table 3 Detection results of the detection algorithm with and without the weighted graph attention feature fusion network
Table 4 Detection results of the detection algorithm with different position encodings
In addition, Fig. 4 shows the importance weight assigned to each facial image by the weighted graph attention feature fusion network, further illustrating that the method described in this embodiment can distinguish peak frames from non-peak frames and thereby improve detection accuracy.
Example 2
This embodiment provides a fatigue driving detection system based on deep features and a graph attention mechanism, comprising:
a sample training set construction module, configured to acquire a plurality of original video sequences containing facial images of a driver, label each original video sequence as fatigued or non-fatigued driving, and construct a sample training set from the labelled original video sequences;
a detection model construction module, configured to construct a fatigue driving detection model comprising a spatiotemporal facial feature extraction network, a multi-head graph attention network and a weighted graph attention feature fusion network;
a detection model training module, configured to train the fatigue driving detection model with the sample training set, wherein the training process comprises: obtaining a spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding; learning, based on a graph attention mechanism, the correlations between the spatiotemporal facial features and the importance weight of each feature for the final classification; updating the spatiotemporal facial feature sequence based on the correlations; performing weighted fusion of the importance weights with the updated spatiotemporal facial feature sequence to obtain fused features; and performing classification training based on the fused features;
and a detection module, configured to input an original video sequence to be detected, which contains facial images of the driver, into the trained fatigue driving detection model to complete fatigue driving detection.
Example 3
This embodiment provides an electronic device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the fatigue driving detection method based on deep features and a graph attention mechanism described above.
Example 4
This embodiment also provides a computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the fatigue driving detection method based on deep features and a graph attention mechanism described above.
The steps involved in Examples 2 to 4 correspond to Example 1 of the method; for details, refer to the relevant description of Example 1. The term "computer-readable storage medium" should be taken to include a single medium or multiple media storing one or more sets of instructions; it should also be understood to include any medium capable of storing, encoding or carrying a set of instructions for execution by a processor that cause the processor to perform any one of the methods of the present invention.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented by general-purpose computing means; alternatively, they may be implemented with program code executable by computing means, so that they may be stored in storage means and executed by computing means, fabricated separately as individual integrated circuit modules, or fabricated with several of their modules or steps in a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims (10)

1. A fatigue driving detection method based on deep features and a graph attention mechanism, characterized by comprising:
acquiring a plurality of original video sequences containing facial images of a driver, labelling each original video sequence as fatigued or non-fatigued driving, and constructing a sample training set from the labelled original video sequences;
constructing a fatigue driving detection model and training it with the sample training set, wherein the training process comprises: obtaining a spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding; learning, based on a graph attention mechanism, the correlations between the spatiotemporal facial features and the importance weight of each feature for the final classification; updating the spatiotemporal facial feature sequence based on the correlations; performing weighted fusion of the importance weights with the updated spatiotemporal facial feature sequence to obtain fused features; and performing classification training based on the fused features;
and inputting an original video sequence to be detected, which contains facial images of the driver, into the trained fatigue driving detection model to complete fatigue driving detection.
2. The fatigue driving detection method based on deep features and a graph attention mechanism according to claim 1, wherein the fatigue driving detection model comprises a spatiotemporal facial feature extraction network, a multi-head graph attention network and a weighted graph attention feature fusion network.
3. The fatigue driving detection method based on deep features and a graph attention mechanism according to claim 1, wherein obtaining the spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding comprises:
acquiring a facial image sequence through a face detection algorithm from the original video sequence containing facial images of the driver;
extracting, with a convolutional neural network, the visual representation feature of each image in the facial image sequence to form a spatial facial feature sequence;
and position-encoding the spatial facial feature sequence with a multi-frequency cosine position function, injecting position information into each facial feature in the sequence, to obtain the spatiotemporal facial feature sequence.
4. The fatigue driving detection method based on deep features and a graph attention mechanism according to claim 3, wherein position-encoding the spatial facial feature sequence with the multi-frequency cosine position function and injecting position information into each facial feature in the sequence to obtain the spatiotemporal facial feature sequence comprises:
encoding with the multi-frequency cosine position function to obtain a position encoding sequence;
and adding the position encoding sequence to the spatial facial feature sequence to obtain the spatiotemporal facial feature sequence.
5. The fatigue driving detection method based on deep features and a graph attention mechanism according to claim 1, wherein learning the correlations between the spatiotemporal facial features based on the graph attention mechanism and updating the spatiotemporal facial feature sequence based on the correlations comprises:
treating each spatiotemporal facial feature of the sequence as a node to form node features, thereby constructing a complete directed graph;
mapping the input node features into a feature subspace and computing an attention coefficient for each pair of nodes in the directed graph through a shared self-attention mechanism, thereby constructing an adjacency matrix;
and fusing the adjacency matrix with the node feature matrix to update the node features, then concatenating the node features updated by each attention head into new node features to obtain the updated spatiotemporal facial feature sequence.
6. The fatigue driving detection method based on deep features and a graph attention mechanism according to claim 1, wherein learning the importance weight of each feature for the final classification based on the graph attention mechanism and performing weighted fusion with the updated spatiotemporal facial feature sequence based on the importance weights to obtain fused features comprises:
projecting the node features into a unified linear space, computing an attention coefficient for each pair of nodes in the graph with a shared self-attention mechanism, and constructing an adjacency matrix from the attention coefficients;
summing the rows of the adjacency matrix to form a weight vector, each element of which is the importance weight of the corresponding facial feature;
and performing weighted fusion of the updated spatiotemporal facial feature sequence with the weight vector to obtain the fused features.
7. The fatigue driving detection method based on deep features and a graph attention mechanism according to claim 1, wherein the loss function is the cross entropy function:

$$L = -\frac{1}{K}\sum_{i=1}^{K}\sum_{cls} y_i^{cls}\,\log\!\left(p_i^{cls}\right)$$

where $K$ is the number of samples per batch; $cls$ denotes the class, with $cls = 0/1$, 0 for non-fatigued and 1 for fatigued; $y_i^{cls}$ is the sample label indicating that the $i$-th sample belongs to class $cls$; and $p_i^{cls}$ is the probability of detecting the $i$-th sample as class $cls$.
8. A fatigue driving detection system based on deep features and a graph attention mechanism, characterized by comprising:
a sample training set construction module, configured to acquire a plurality of original video sequences containing facial images of a driver, label each original video sequence as fatigued or non-fatigued driving, and construct a sample training set from the labelled original video sequences;
a detection model construction module, configured to construct a fatigue driving detection model comprising a spatiotemporal facial feature extraction network, a multi-head graph attention network and a weighted graph attention feature fusion network;
a detection model training module, configured to train the fatigue driving detection model with the sample training set, wherein the training process comprises: obtaining a spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding; learning, based on a graph attention mechanism, the correlations between the spatiotemporal facial features and the importance weight of each feature for the final classification; updating the spatiotemporal facial feature sequence based on the correlations; performing weighted fusion of the importance weights with the updated spatiotemporal facial feature sequence to obtain fused features; and performing classification training based on the fused features;
and a detection module, configured to input an original video sequence to be detected, which contains facial images of the driver, into the trained fatigue driving detection model to complete fatigue driving detection.
9. An electronic device, characterized by comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the fatigue driving detection method based on deep features and a graph attention mechanism according to any one of claims 1-7.
10. A computer-readable storage medium, characterized by storing computer instructions which, when executed by a processor, perform the steps of the fatigue driving detection method based on deep features and a graph attention mechanism according to any one of claims 1-7.
CN202211619272.0A 2022-12-14 2022-12-14 Fatigue driving detection method and system based on deep features and graph attention mechanism Pending CN116189155A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211619272.0A 2022-12-14 2022-12-14 Fatigue driving detection method and system based on deep features and graph attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211619272.0A 2022-12-14 2022-12-14 Fatigue driving detection method and system based on deep features and graph attention mechanism

Publications (1)

Publication Number Publication Date
CN116189155A 2023-05-30

Family

ID=86431760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211619272.0A Fatigue driving detection method and system based on deep features and graph attention mechanism 2022-12-14 2022-12-14

Country Status (1)

Country Link
CN (1) CN116189155A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116959078A (en) * 2023-09-14 2023-10-27 山东理工职业学院 Method for constructing fatigue detection model, fatigue detection method and device thereof
CN116959078B (en) * 2023-09-14 2023-12-05 山东理工职业学院 Method for constructing fatigue detection model, fatigue detection method and device thereof
CN117079255A (en) * 2023-10-17 2023-11-17 江西开放大学 Fatigue driving detection method based on face recognition and voice interaction
CN117079255B (en) * 2023-10-17 2024-01-05 江西开放大学 Fatigue driving detection method based on face recognition and voice interaction
CN117152155A (en) * 2023-10-31 2023-12-01 海杰亚(北京)医疗器械有限公司 Multi-needle ablation planning method and device, storage medium and electronic equipment
CN117152155B (en) * 2023-10-31 2024-02-13 海杰亚(北京)医疗器械有限公司 Multi-needle ablation planning method and device, storage medium and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination