CN116189155A - Fatigue driving detection method and system based on deep features and graph attention mechanism - Google Patents

Fatigue driving detection method and system based on deep features and graph attention mechanism Download PDF

Info

Publication number
CN116189155A
Authority
CN
China
Prior art keywords
sequence
fatigue driving
facial feature
driving detection
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211619272.0A
Other languages
Chinese (zh)
Inventor
常发亮
黄一鸣
刘春生
路彦沙
常致富
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202211619272.0A
Publication of CN116189155A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/59: Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V 20/597: Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161: Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a fatigue driving detection method and system based on deep features and a graph attention mechanism. The method comprises the following steps: constructing a sample training set from a plurality of original video sequences, each labelled as fatigued or non-fatigued driving; constructing a fatigue driving detection model and training it with the sample training set, wherein the training process comprises: obtaining a spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding; learning, based on a graph attention mechanism, the correlations between the spatiotemporal facial features and the importance weight of each feature for the final classification; updating the spatiotemporal facial feature sequence based on the correlations; performing weighted fusion of the importance weights with the updated spatiotemporal facial feature sequence to obtain fused features; and performing classification training based on the fused features. An original video sequence to be detected is then input into the trained fatigue driving detection model to complete fatigue driving detection, improving the accuracy and generalization of fatigue driving detection.

Description

Fatigue driving detection method and system based on deep features and graph attention mechanism
Technical Field
The invention relates to the technical field of fatigue driving detection, and in particular to a fatigue driving detection method and system based on deep features and a graph attention mechanism.
Background
With the rapid development of expressways in modern society, fatigue driving has become one of the main causes of traffic accidents. Drivers are prone to fatigue during long periods of driving, so to improve the travel safety of drivers and passengers, the driver's fatigue state needs to be detected and a timely reminder given. Accordingly, as safety awareness and technology have advanced, fatigue driving detection techniques have emerged.
The fatigue driving detection methods in common use today mainly comprise methods based on physiological information, methods based on computer vision, and methods based on changes in vehicle driving behaviour. Computer-vision-based methods, which detect fatigue from the driver's facial features such as the eyes, mouth, head and expression, achieve higher accuracy at lower cost, do not interfere with normal driving, and offer better practicality, acceptability and comfort, so they are the most widely applied. However, existing computer-vision-based fatigue driving detection methods still have algorithmic shortcomings.
First, the direct interrelationships among the driver's facial features are not fully exploited. Conventional approaches typically use 3D CNNs and LSTMs to model the relationships between driver facial features, such as "Long-term multi-granularity deep framework for driver drowsiness detection" disclosed by J. Lyu et al. and "Driver Yawning Detection Based on Subtle Facial Action Recognition" disclosed by H. Yang et al. However, a 3D CNN is limited by the receptive field of its convolution kernels: it can only build temporal features over a few consecutive feature maps and struggles to construct long-term correlations. An LSTM processes the features sequentially, so features late in the sequence only receive aggregated information from features earlier in the sequence, and it is difficult to construct correlations between features far apart in the sequence. The direct correlations between facial features thus remain unexplored in existing detection methods.
Second, existing methods lack the ability to distinguish peak-frame features from non-peak-frame features. Existing driver fatigue detection methods treat every acquired facial feature equally, ignoring the fact that different facial features contribute differently to the final classification. In a yawning video clip, for example, the driver's facial expression may change only subtly at first and then develop a strong appearance, i.e. tightly closed eyes and a wide-open mouth. The former frames are usually called non-peak frames and the latter peak frames; features from peak frames have greater reference value for the final classification than features from non-peak frames. Existing methods cannot distinguish the two and therefore fail to highlight the importance of peak-frame features.
In summary, existing computer-vision-based fatigue driving detection methods suffer from these defects, and their detection accuracy and generalization are poor.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a fatigue driving detection method and system based on deep features and a graph attention mechanism, which solve the problems of under-utilized direct facial feature correlations and the inability to distinguish peak-frame features from non-peak-frame features in existing detection methods, and improve the accuracy and generalization of fatigue driving detection.
In a first aspect, the present disclosure provides a fatigue driving detection method based on deep features and a graph attention mechanism, comprising:
acquiring a plurality of original video sequences containing facial images of a driver, labelling each original video sequence as fatigued or non-fatigued driving, and constructing a sample training set from the labelled original video sequences;
constructing a fatigue driving detection model and training it with the sample training set, wherein the training process comprises: obtaining a spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding; learning, based on a graph attention mechanism, the correlations between the spatiotemporal facial features and the importance weight of each feature for the final classification; updating the spatiotemporal facial feature sequence based on the correlations; performing weighted fusion of the importance weights with the updated spatiotemporal facial feature sequence to obtain fused features; and performing classification training based on the fused features;
and inputting an original video sequence to be detected, which contains facial images of the driver, into the trained fatigue driving detection model to complete fatigue driving detection.
According to a further technical scheme, the fatigue driving detection model comprises a spatiotemporal facial feature extraction network, a multi-head graph attention network and a weighted graph attention feature fusion network.
According to a further technical scheme, obtaining the spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding comprises:
acquiring a facial image sequence through a face detection algorithm from the original video sequence containing facial images of the driver;
extracting, with a convolutional neural network, the visual representation feature of each image in the facial image sequence to form a spatial facial feature sequence;
and position-encoding the spatial facial feature sequence with a multi-frequency cosine position function, injecting position information into each facial feature in the sequence, to obtain the spatiotemporal facial feature sequence.
According to a further technical scheme, position-encoding the spatial facial feature sequence with the multi-frequency cosine position function and injecting position information into each facial feature in the sequence to obtain the spatiotemporal facial feature sequence comprises:
encoding with the multi-frequency cosine position function to obtain a position encoding sequence;
and adding the position encoding sequence to the spatial facial feature sequence to obtain the spatiotemporal facial feature sequence.
According to a further technical scheme, learning the correlations between the spatiotemporal facial features based on the graph attention mechanism and updating the spatiotemporal facial feature sequence based on the correlations comprises:
treating each spatiotemporal facial feature of the sequence as a node to form node features, thereby constructing a complete directed graph;
mapping the input node features into a feature subspace and computing an attention coefficient for each pair of nodes in the directed graph through a shared self-attention mechanism, thereby constructing an adjacency matrix;
and fusing the adjacency matrix with the node feature matrix to update the node features, then concatenating the node features updated by each attention head into new node features to obtain the updated spatiotemporal facial feature sequence.
According to a further technical scheme, learning the importance weight of each feature for the final classification based on the graph attention mechanism and performing weighted fusion with the updated spatiotemporal facial feature sequence based on the importance weights to obtain fused features comprises:
projecting the node features into a unified linear space, computing an attention coefficient for each pair of nodes in the graph with a shared self-attention mechanism, and constructing an adjacency matrix from the attention coefficients;
summing the rows of the adjacency matrix to form a weight vector, each element of which is the importance weight of the corresponding facial feature;
and performing weighted fusion of the updated spatiotemporal facial feature sequence with the weight vector to obtain the fused features.
According to a further technical scheme, the loss function is the cross entropy function:

$$L = -\frac{1}{K}\sum_{i=1}^{K}\sum_{cls} y_i^{cls}\,\log\!\left(p_i^{cls}\right)$$

where $K$ is the number of samples per batch; $cls$ denotes the class, with $cls = 0/1$, 0 for non-fatigued and 1 for fatigued; $y_i^{cls}$ is the sample label indicating that the $i$-th sample belongs to class $cls$; and $p_i^{cls}$ is the probability of detecting the $i$-th sample as class $cls$.
In a second aspect, the present disclosure provides a fatigue driving detection system based on deep features and a graph attention mechanism, comprising:
a sample training set construction module, configured to acquire a plurality of original video sequences containing facial images of a driver, label each original video sequence as fatigued or non-fatigued driving, and construct a sample training set from the labelled original video sequences;
a detection model construction module, configured to construct a fatigue driving detection model comprising a spatiotemporal facial feature extraction network, a multi-head graph attention network and a weighted graph attention feature fusion network;
a detection model training module, configured to train the fatigue driving detection model with the sample training set, wherein the training process comprises: obtaining a spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding; learning, based on a graph attention mechanism, the correlations between the spatiotemporal facial features and the importance weight of each feature for the final classification; updating the spatiotemporal facial feature sequence based on the correlations; performing weighted fusion of the importance weights with the updated spatiotemporal facial feature sequence to obtain fused features; and performing classification training based on the fused features;
and a detection module, configured to input an original video sequence to be detected, which contains facial images of the driver, into the trained fatigue driving detection model to complete fatigue driving detection.
In a third aspect, the present disclosure further provides an electronic device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method of the first aspect.
In a fourth aspect, the present disclosure also provides a computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of the first aspect.
The one or more of the above technical solutions have the following beneficial effects:
1. The invention provides a fatigue driving detection method and system based on deep features and a graph attention mechanism. The direct interrelationship of each pair of features in the driver's facial feature sequence is learned autonomously with a graph attention mechanism, and the facial feature information is updated according to the learned interrelationships. For the updated facial features, a graph attention mechanism is again used to autonomously learn each feature's contribution to the final classification and assign it a corresponding importance weight, and the features are weighted and fused. Based on the fused features, the detection model can effectively distinguish peak-frame features from non-peak-frame features and make full use of peak-frame feature information, thereby improving the accuracy and generalization of fatigue driving detection.
2. The fatigue driving detection method and system provided by the invention solve the problems of under-utilized direct facial feature correlations and the inability to distinguish peak-frame features from non-peak-frame features in existing detection methods, and improve the accuracy and generalization of fatigue driving detection.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a schematic diagram of the network structure of the fatigue driving detection model based on deep features and a graph attention mechanism in the first embodiment of the present invention;
Fig. 2 is a schematic diagram of a complete directed graph according to the first embodiment of the present invention;
Fig. 3 is a flowchart of graph attention learning according to the first embodiment of the present invention;
Fig. 4 is a histogram visualising the importance weight of each facial feature in the facial feature sequence corresponding to the input facial images according to the first embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the present invention. As used herein, the singular forms are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, it is to be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components and/or combinations thereof.
Example 1
This embodiment provides a fatigue driving detection method based on deep features and a graph attention mechanism, which detects fatigue driving from a clip of the driver's driving video through the graph attention mechanism. Based on the graph attention mechanism, the direct correlation of each pair of features in the driver's facial feature sequence is learned autonomously, and the facial feature information is updated according to the learned correlations. For the updated facial features, a graph attention mechanism is again used to autonomously learn each feature's contribution to the final classification and assign it a corresponding importance weight, and the features are weighted and fused. Based on the fused features, the detection model can effectively distinguish peak-frame features from non-peak-frame features and make full use of peak-frame feature information, thereby improving the accuracy and generalization of fatigue driving detection.
The fatigue driving detection method provided by the embodiment specifically comprises the following steps:
acquiring a plurality of original video sequences containing facial images of a driver, labelling each original video sequence as fatigued or non-fatigued driving, and constructing a sample training set from the labelled original video sequences;
constructing a fatigue driving detection model comprising a spatiotemporal facial feature extraction network, a multi-head graph attention network and a weighted graph attention feature fusion network;
training the fatigue driving detection model with the sample training set, wherein the training process comprises: obtaining a spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding; learning, based on a graph attention mechanism, the correlations between the spatiotemporal facial features and the importance weight of each feature for the final classification; updating the spatiotemporal facial feature sequence based on the correlations; performing weighted fusion of the importance weights with the updated spatiotemporal facial feature sequence to obtain fused features; and performing classification training based on the fused features;
and inputting an original video sequence to be detected, which contains facial images of the driver, into the trained fatigue driving detection model to complete fatigue driving detection.
In the detection method, a plurality of original video sequences containing facial images of the driver are first acquired by a camera in the cab; each original video sequence is labelled as fatigued or non-fatigued driving, and the sample training set is constructed from the labelled sequences.
Next, the fatigue driving detection model is constructed and trained with the sample training set. The constructed model comprises a spatiotemporal facial feature extraction network, a multi-head graph attention network and a weighted graph attention feature fusion network: in the spatiotemporal facial feature extraction network, a spatiotemporal facial feature sequence is obtained from the original video sequence through facial feature extraction and position encoding; in the multi-head graph attention network, the correlations between the spatiotemporal facial features are learned with a graph attention mechanism and the spatiotemporal facial feature sequence is updated according to them; in the weighted graph attention feature fusion network, the importance weight of each feature for the final classification is learned with a graph attention mechanism and fused with the updated spatiotemporal facial feature sequence to obtain fused features; finally, fatigued/non-fatigued classification training is performed on the fused features to complete training of the fatigue driving detection model.
The fatigue driving detection method provided by this embodiment is implemented with the constructed fatigue driving detection model (DCFGA-Net) based on deep features and a graph attention mechanism; the network structure of the detection model is shown in Fig. 1, and the training process comprises the following steps:
step 1, acquiring a space-time facial feature sequence through facial feature extraction and position coding based on an original video sequence in a space-time facial feature extraction network.
Step 1.1, acquiring a face image sequence through a face detection algorithm based on an original video sequence containing a face image of a driver.
In the step 1.1, firstly, an MTCNN face detection algorithm is adopted for an original video sequence containing face images of a driver, position coordinates of face boundary boxes appearing in all video frames in the original video sequence are detected, cutting is performed based on the position coordinates of the face boundary boxes, face images in the video frame images are cut out, and a face image sequence is obtained.
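By way of illustration, a minimal sketch of step 1.1 is given below. It assumes the facenet-pytorch implementation of MTCNN and OpenCV for frame reading; the embodiment names the MTCNN algorithm but no particular library, and the 224x224 crop size is likewise an assumption.

```python
# Sketch of step 1.1: crop the driver's face from every frame of a clip.
# facenet-pytorch's MTCNN and the 224x224 output size are assumptions;
# the embodiment names the MTCNN algorithm but not an implementation.
import cv2
from PIL import Image
from facenet_pytorch import MTCNN

mtcnn = MTCNN(select_largest=True, post_process=False)

def crop_face_sequence(video_path, size=(224, 224)):
    """Return the facial image sequence cropped from one video clip."""
    faces, cap = [], cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        boxes, _ = mtcnn.detect(rgb)       # face bounding-box coordinates
        if boxes is None:                  # skip frames with no detected face
            continue
        x1, y1, x2, y2 = (int(v) for v in boxes[0])
        faces.append(rgb.crop((x1, y1, x2, y2)).resize(size))
    cap.release()
    return faces
```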
Step 1.2: with a convolutional neural network, extract the visual representation feature of each image in the facial image sequence to form the spatial facial feature sequence.
In step 1.2, the visual representation feature (i.e. the spatial facial feature) of each facial image is extracted with a VGG16 convolutional neural network pre-trained on the ImageNet dataset, constructing the spatial facial feature sequence $R = \{r_1, r_2, \ldots, r_N\}$ with $r_n \in \mathbb{R}^{d_f}$, where $N$ is the length of the input video sequence, i.e. the number of video frames, and $d_f$ is the dimension of each spatial facial feature.
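A sketch of step 1.2 follows, using the ImageNet-pretrained VGG16 from torchvision (consistent with the software environment described later). Taking the 4096-dimensional output of the first fully connected block as $d_f$ is an assumption; the embodiment does not fix the extraction layer.

```python
# Sketch of step 1.2: one spatial facial feature r_n per cropped face.
# Using the first FC block of VGG16 (4096-d output) as d_f is an assumption.
import torch
import torchvision.models as models
import torchvision.transforms as T

vgg16 = models.vgg16(pretrained=True).eval()
extractor = torch.nn.Sequential(
    vgg16.features, vgg16.avgpool, torch.nn.Flatten(),
    *list(vgg16.classifier.children())[:2],  # Linear(25088, 4096) + ReLU
)

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def spatial_feature_sequence(face_images):
    """Map N cropped face images to the (N, d_f) sequence R = {r_1..r_N}."""
    batch = torch.stack([preprocess(img) for img in face_images])
    return extractor(batch)
```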
Step 1.3: position-encode the spatial facial feature sequence with a multi-frequency cosine position function, injecting position information into each facial feature, to obtain the spatiotemporal facial feature sequence for the subsequent multi-head graph attention network to learn from.
In step 1.3, considering that the absolute position of each video frame is well defined and that the facial features of one video clip have a clear temporal order, this embodiment constructs a position-encoded feature extractor (PEFE) that injects position information into the facial features of the sequence with the multi-frequency cosine position function. The multi-frequency cosine position function is expressed as:

$$pe(pos, 2i) = \sin\!\left(\frac{pos}{base^{2i/d_f}}\right) \tag{1}$$

$$pe(pos, 2i+1) = \cos\!\left(\frac{pos}{base^{2i/d_f}}\right) \tag{2}$$

where $pos$ is the position of the facial feature in the facial feature sequence, $i$ indexes the feature dimension, and $base$ is a predefined hyper-parameter.
Encoding in this way yields the position encoding sequence $PE = \{pe_1, pe_2, \ldots, pe_N\}$, each item $pe_n \in \mathbb{R}^{d_f}$ of which has the same dimension as a spatial facial feature. The position encoding sequence is therefore added to the spatial facial feature sequence to obtain the spatiotemporal facial feature sequence $F = \{f_1, f_2, \ldots, f_N\}$, in which each feature $f_n = r_n + pe_n$ contains both the spatial feature information of the corresponding facial image and its temporal position in the video sequence.
Step 2: in the multi-head graph attention network (Multi-heads GAT), learn the correlations between the spatiotemporal facial features with a graph attention mechanism and update the spatiotemporal facial feature sequence according to the correlations.
Step 2.1: treat each spatiotemporal facial feature of the sequence as a node and construct a complete directed graph.
In step 2.1, this embodiment uses the multi-head graph attention network to construct the direct correlations between the spatiotemporal facial features and updates the feature information according to the learned correlations. Unlike application scenarios such as social networks, the features of a video sequence have no explicit topological relationship. This embodiment therefore proposes the following assumptions: 1) broad correlations exist between the spatiotemporal facial features of the same sequence, i.e. each video frame (regarded as a node of the graph) is correlated with every other video frame (node); 2) the relationships between spatiotemporal facial features are directional, and the effect of node i on node j is not necessarily the same as the effect of node j on node i. Under these assumptions, the spatiotemporal facial feature sequence forms a fully connected directed graph.
The constructed directed graph is a triplet (V(G), E(G), ψ(G)), where G is the graph, V(G) is the set of graph nodes, E(G) is the set of directed edges, ψ(G) is the relationship function, and each element of E(G) corresponds to an ordered pair of elements of V(G). The complete directed graph, constructed as shown in Fig. 2, contains three nodes V(G) = {h_i, h_j, h_k} and nine directed edges E(G) = {e_ij, e_ji, e_ik, e_ki, e_jk, e_kj, e_ii, e_jj, e_kk}, each directed edge being defined by the relationship function ψ(G). In this embodiment, V(G) is the spatiotemporal facial feature sequence, E(G) is the set of direct correlations between the spatiotemporal facial features, and ψ(G) is a self-attention function.
Step 2.2: construct the multi-head graph attention network from parallel self-attention heads; before each graph attention head, a fully connected feed-forward layer maps the input node features into a feature subspace.
Specifically, a graph attention mechanism is introduced on the directed graph structure to construct the interrelationships between the spatiotemporal facial features; the graph attention learning process is shown in Fig. 3. To enable the fatigue driving detection model to learn autonomously and jointly attend to information from different feature subspaces, a multi-head graph attention architecture is constructed. In each graph attention head, a shared linear transformation matrix $M \in \mathbb{R}^{d_f \times d_h}$ maps the spatiotemporal facial features $F = \{f_1, f_2, \ldots, f_N\}$, i.e.:

$$H = F \cdot M = \{f_1 M, f_2 M, \ldots, f_N M\} \tag{3}$$
step 2.3, calculating the attention coefficient for each pair of nodes in the directed graph through a shared self-attention mechanism, and constructing an adjacency matrix according to the attention coefficient. Specifically, a shared attention mechanism
Figure BDA0003999086300000101
Is used to calculate h= { H 1 ,h 2 ,...,h N }, wherein->
Figure BDA0003999086300000102
Attention coefficients for each pair of nodes. The calculation method can be expressed as:
e ij =LeakyReLU(a(h i ||h j )) (4)
in the above-mentioned method, the step of, vector concatenation is represented by I, e ij The importance of the feature representing the node j to the node i is the direct relation between the feature of the node i and the feature of the node j. In order to make the attention coefficients of different nodes easy to compare, the attention coefficients of all neighboring nodes j to node i are normalized using the softmax function, i.e.:
Figure BDA0003999086300000103
/>
from the empirical perfect directed graph described above, a corresponding adjacency matrix can be obtained:
Figure BDA0003999086300000104
step 2.4, fusing the adjacency matrix which is learned autonomously with the node characteristic matrix, and updating the node characteristics so that the characteristics of different nodes are complementary, and making up for the information deficiency; and then, splicing (jointing) the updated characteristics of each attention head to form new node characteristics, namely acquiring an updated space-time facial characteristic sequence.
Specifically, from the adjacency matrix, the output of each attention head can be obtained:
Figure BDA0003999086300000111
where σ represents a nonlinear activation function. The outputs of the attention heads are then spliced together to form the output of the multi-head graph attention network, namely:
Figure BDA0003999086300000112
wherein the feature vector
Figure BDA0003999086300000113
Where hd=1, 2, …, heads represents the vector a n From the hd th attention head. Each row O of the multi-head graph attention network output n Representing an updated node feature, which is constructed based on the correlation between the original spatiotemporal facial feature and the features learned by the attention mechanism.
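Steps 2.2-2.4 can be sketched as the following module. The per-head projection corresponds to equation (3), the LeakyReLU attention over all ordered node pairs to equation (4), the softmax normalisation to equations (5)-(6), and the head update and concatenation to equations (7)-(8); the ELU activation for σ and the LeakyReLU slope of 0.2 are assumptions, since the embodiment leaves the nonlinearity unspecified.

```python
# Sketch of the multi-head graph attention network (steps 2.2-2.4) over
# the complete directed graph of spatiotemporal facial features.
# sigma = ELU and LeakyReLU slope 0.2 are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadGAT(nn.Module):
    def __init__(self, d_f, d_h, heads=4):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d_f, d_h, bias=False) for _ in range(heads)])
        self.attn = nn.ModuleList([nn.Linear(2 * d_h, 1, bias=False) for _ in range(heads)])

    def forward(self, feats):                  # feats: (N, d_f) sequence F
        outs = []
        for proj, attn in zip(self.proj, self.attn):
            h = proj(feats)                    # H = F . M, eq. (3)
            n = h.size(0)
            # Concatenate h_i || h_j for every ordered pair of the graph.
            pairs = torch.cat([h.repeat_interleave(n, 0), h.repeat(n, 1)], dim=1)
            e = F.leaky_relu(attn(pairs), 0.2).view(n, n)  # e_ij, eq. (4)
            alpha = torch.softmax(e, dim=1)                # adjacency A, eqs. (5)-(6)
            outs.append(F.elu(alpha @ h))                  # head output, eq. (7)
        return torch.cat(outs, dim=1)                      # O, eq. (8): (N, heads*d_h)
```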
Step 3: in the weighted graph attention feature fusion network (Weighted-GAT Feature Fusion), learn the importance weight of each feature for the final classification with a graph attention mechanism, and fuse the importance weights with the updated spatiotemporal facial feature sequence to obtain the fused features.
After the multi-head graph attention network, the node features have been updated according to the learned inter-feature correlations, and these updated features are more valuable to the final classification than the feature information originally extracted by the convolutional neural network. However, since different features contribute to the final classification to different degrees, peak-frame features must still be distinguished from non-peak-frame features so that the peak-frame features contribute more to the final classification. In addition, the sequence-form output needs to be further fused for binary classification. To meet these requirements, this embodiment provides a weighted graph attention feature fusion network, implemented by the following steps:
Step 3.1: reduce the dimension of the updated node features with a fully connected feed-forward layer, projecting them into a unified linear space for the next stage of graph learning.
Specifically, because each row $O_n$ output by the multi-head graph attention network contains components $a_n^{hd}$ from different feature subspaces, this embodiment uses a transformation matrix $M_o \in \mathbb{R}^{(heads \cdot d_h) \times d_o}$, where $d_o$ is the output vector dimension, to project the node features into a unified linear space, completing the linear transformation needed for further graph learning.
Step 3.2: compute an attention coefficient for each pair of nodes in the graph with a shared self-attention mechanism and construct an adjacency matrix from the coefficients; then sum the rows of the adjacency matrix to form a weight vector, each element of which is the importance weight of the corresponding facial feature.
Specifically, as in the multi-head graph attention network, the weighted graph attention feature fusion network uses a shared attention mechanism $a: \mathbb{R}^{d_o} \times \mathbb{R}^{d_o} \rightarrow \mathbb{R}$ to compute the attention coefficients of each pair of graph nodes:

$$\beta_{ij} = \mathrm{softmax}_j\Big(\mathrm{LeakyReLU}\big(a(O_i M_o \,\|\, O_j M_o)\big)\Big) \tag{9}$$

The output of this graph attention layer can then be expressed as the product of the adjacency matrix $B = (\beta_{ij}) \in \mathbb{R}^{N \times N}$ formed by the attention coefficients and the linearly transformed output matrix of the multi-head graph attention network:

$$Output = B \cdot (O \cdot M_o) \tag{10}$$

Equation (10) presents the whole computation clearly and defines how the importance weights are obtained: the adjacency matrix is summed over its rows to form the weight vector, each element of which is the importance weight of the corresponding facial feature:

$$w_n = \sum_{i=1}^{N} \beta_{in} \tag{11}$$

Afterwards, through row average pooling (RAP), the reconstructed facial features are weighted and fused with the importance weight of each feature to form the final fused feature:

$$F_{fused} = \frac{1}{N}\sum_{n=1}^{N} w_n \cdot O_n M_o \tag{12}$$

Through the above scheme, the final feature is computed from the reconstructed facial features $O_n$ and the importance weights $w_n$, where $M_o$ is the linear transformation matrix employed by the weighted graph attention feature fusion network.
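The weighted graph attention feature fusion of steps 3.1-3.2 can be sketched as follows; the row-wise summation of the adjacency matrix into importance weights and the weighted row-average-pooled fusion follow the reconstructed forms of equations (9)-(12) above, which are themselves an interpretation of the original images.

```python
# Sketch of the weighted graph attention feature fusion (steps 3.1-3.2):
# project with M_o, learn a second adjacency matrix, sum it row-wise into
# importance weights w, and fuse by weighted row average pooling (RAP).
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedGATFusion(nn.Module):
    def __init__(self, d_in, d_o):
        super().__init__()
        self.mo = nn.Linear(d_in, d_o, bias=False)      # transformation matrix M_o
        self.attn = nn.Linear(2 * d_o, 1, bias=False)   # shared attention a

    def forward(self, nodes):              # nodes: (N, d_in) output O of the GAT
        h = self.mo(nodes)                 # unified linear space, step 3.1
        n = h.size(0)
        pairs = torch.cat([h.repeat_interleave(n, 0), h.repeat(n, 1)], dim=1)
        e = F.leaky_relu(self.attn(pairs), 0.2).view(n, n)
        beta = torch.softmax(e, dim=1)     # adjacency matrix B, eq. (9)
        w = beta.sum(dim=0)                # row-wise sum -> importance weights, eq. (11)
        return (w.unsqueeze(1) * h).mean(dim=0)  # weighted RAP fusion, eq. (12)
```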
Step 4: perform fatigued/non-fatigued classification training on the fused features to complete training of the fatigue driving detection model.
Finally, the obtained fused features are fed into a fully connected layer for fatigued/non-fatigued classification training until the loss function converges, completing the training of the fatigue driving detection model.
During iterative training, this embodiment optimises the network model parameters with a cross entropy function:

$$L = -\frac{1}{K}\sum_{i=1}^{K}\sum_{cls} y_i^{cls}\,\log\!\left(p_i^{cls}\right) \tag{13}$$

where $K$ is the number of samples per batch and $cls$ denotes the class (binary in this embodiment: $cls = 0/1$, 0 for non-fatigued, 1 for fatigued); $y_i^{cls}$ is the sample label indicating that the $i$-th sample belongs to class $cls$, and $p_i^{cls}$, the softmax of the model output, is the probability of detecting the $i$-th sample as class $cls$.
Iterative training is performed through the back-propagation algorithm with the cross entropy loss as the cost function. This embodiment selects the AdamW network parameter optimiser, sets the training period to 50 epochs, the batch size to 16 samples per batch, and the initial learning rate to 0.0001, with the learning rate decayed at a rate of 0.05 after half of the training epochs.
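A sketch of this training configuration follows. build_dcfga_net and train_loader are assumed, hypothetical helpers standing in for the assembled DCFGA-Net and the NTHU-DDD sample loader, and the MultiStepLR schedule is one reading of the learning-rate decay described above.

```python
# Sketch of the training loop: AdamW, cross entropy (eq. 13), 50 epochs,
# batch size 16, initial lr 1e-4 decayed by 0.05 at the halfway epoch.
# build_dcfga_net and train_loader are assumed, hypothetical helpers.
import torch
import torch.nn as nn

model = build_dcfga_net()                # the assembled detection model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[25], gamma=0.05)
criterion = nn.CrossEntropyLoss()        # cross entropy of eq. (13)

for epoch in range(50):
    for clips, labels in train_loader:   # batches of 16 labelled sequences
        optimizer.zero_grad()
        logits = model(clips)            # fused feature -> FC layer -> 2 logits
        loss = criterion(logits, labels) # labels: 0 non-fatigued, 1 fatigued
        loss.backward()                  # back-propagation
        optimizer.step()
    scheduler.step()
```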
The trained fatigue driving detection model is obtained through the above steps.
Finally, the original video sequence to be detected, containing facial images of the driver, is input into the trained fatigue driving detection model to complete fatigue driving detection.
The fatigue driving detection proposed in this embodiment is further described and verified by the following experiments.
The hardware of the verification instance is: Ubuntu 16.04, an Intel i9-9900X CPU, 64 GB of RAM and one TITAN Xp GPU; the software environment is Python 3.7, PyTorch 1.6.0 and torchvision 0.7.0. The data come from the NTHU-DDD dataset, which simulates five driver scenarios: bare face in daytime, glasses in daytime, bare face at night, glasses at night, and sunglasses in daytime. Each scenario comprises four video segments recording four driver states: non-drowsy, drowsy, slow blinking with nodding, and yawning. The dataset is divided into a training set of 18 drivers, a test set of 4 drivers and a validation set of 14 drivers.
Fatigue driving detection experiments were performed on the NTHU-DDD dataset. To make the experimental results more convincing, k-fold cross-validation is used: the dataset is first randomly divided into k sub-datasets of non-overlapping samples with approximately equal sample counts; each sub-dataset is then selected in turn as the evaluation set, with the remaining k-1 sub-datasets as the training set, and the average of the k experimental results is taken as the final result. In this experiment, k = 3 is selected.
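The k-fold protocol can be sketched as below (k = 3) with scikit-learn's KFold split; train_and_evaluate is an assumed, hypothetical helper that trains on the given folds and returns an evaluation score.

```python
# Sketch of 3-fold cross-validation: disjoint folds of roughly equal
# size, each used once for evaluation; the k results are averaged.
# train_and_evaluate is an assumed, hypothetical helper.
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(samples, labels, k=3):
    scores = []
    for tr, ev in KFold(n_splits=k, shuffle=True).split(samples):
        scores.append(train_and_evaluate(samples[tr], labels[tr],
                                         samples[ev], labels[ev]))
    return float(np.mean(scores))        # final result: mean over the k folds
```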
Fatigue driving detection was performed with existing algorithms and with the method proposed in this embodiment (DCFGA-Net); the detection results are shown in Table 1 below, with recognition accuracy and F1 score as the evaluation indices (the larger the value, the better the performance). The scheme of this embodiment clearly outperforms the existing algorithms.
Table 1 Detection results of different fatigue driving detection algorithms
In addition, ablation experiments were performed in this embodiment to demonstrate the effectiveness of the position encoding, the multi-head graph attention network and the weighted graph attention feature fusion network. Tables 2, 3 and 4 demonstrate the performance improvements brought to the method by the multi-head graph attention module, the weighted graph attention feature fusion module and the position encoding, respectively.
Table 2 Detection results of the fatigue driving detection algorithm with different multi-head graph attention networks
Table 3 Detection results of the detection algorithm with and without the weighted graph attention feature fusion network
Table 4 Detection results of the detection algorithm with different position encodings
In addition, Fig. 4 shows the importance weight assigned to each facial image by the weighted graph attention feature fusion network, further illustrating that the method described in this embodiment can distinguish peak frames from non-peak frames and thereby improve detection accuracy.
Example 2
This embodiment provides a fatigue driving detection system based on deep features and a graph attention mechanism, comprising:
a sample training set construction module, configured to acquire a plurality of original video sequences containing facial images of a driver, label each original video sequence as fatigued or non-fatigued driving, and construct a sample training set from the labelled original video sequences;
a detection model construction module, configured to construct a fatigue driving detection model comprising a spatiotemporal facial feature extraction network, a multi-head graph attention network and a weighted graph attention feature fusion network;
a detection model training module, configured to train the fatigue driving detection model with the sample training set, wherein the training process comprises: obtaining a spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding; learning, based on a graph attention mechanism, the correlations between the spatiotemporal facial features and the importance weight of each feature for the final classification; updating the spatiotemporal facial feature sequence based on the correlations; performing weighted fusion of the importance weights with the updated spatiotemporal facial feature sequence to obtain fused features; and performing classification training based on the fused features;
and a detection module, configured to input an original video sequence to be detected, which contains facial images of the driver, into the trained fatigue driving detection model to complete fatigue driving detection.
Example 3
This embodiment provides an electronic device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the fatigue driving detection method based on deep features and a graph attention mechanism described above.
Example 4
This embodiment also provides a computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the fatigue driving detection method based on deep features and a graph attention mechanism described above.
The steps involved in Examples 2 to 4 correspond to Example 1 of the method; for details, refer to the relevant description of Example 1. The term "computer-readable storage medium" should be taken to include a single medium or multiple media storing one or more sets of instructions; it should also be understood to include any medium capable of storing, encoding or carrying a set of instructions for execution by a processor that cause the processor to perform any one of the methods of the present invention.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented by general-purpose computing means; alternatively, they may be implemented with program code executable by computing means, so that they may be stored in storage means and executed by computing means, fabricated separately as individual integrated circuit modules, or fabricated with several of their modules or steps in a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims (10)

1. A fatigue driving detection method based on deep features and a graph attention mechanism, characterized by comprising:
acquiring a plurality of original video sequences containing facial images of a driver, labelling each original video sequence as fatigued or non-fatigued driving, and constructing a sample training set from the labelled original video sequences;
constructing a fatigue driving detection model and training it with the sample training set, wherein the training process comprises: obtaining a spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding; learning, based on a graph attention mechanism, the correlations between the spatiotemporal facial features and the importance weight of each feature for the final classification; updating the spatiotemporal facial feature sequence based on the correlations; performing weighted fusion of the importance weights with the updated spatiotemporal facial feature sequence to obtain fused features; and performing classification training based on the fused features;
and inputting an original video sequence to be detected, which contains facial images of the driver, into the trained fatigue driving detection model to complete fatigue driving detection.
2. The fatigue driving detection method based on deep features and a graph attention mechanism according to claim 1, wherein the fatigue driving detection model comprises a spatiotemporal facial feature extraction network, a multi-head graph attention network and a weighted graph attention feature fusion network.
3. The fatigue driving detection method based on deep features and a graph attention mechanism according to claim 1, wherein obtaining the spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding comprises:
acquiring a facial image sequence through a face detection algorithm from the original video sequence containing facial images of the driver;
extracting, with a convolutional neural network, the visual representation feature of each image in the facial image sequence to form a spatial facial feature sequence;
and position-encoding the spatial facial feature sequence with a multi-frequency cosine position function, injecting position information into each facial feature in the sequence, to obtain the spatiotemporal facial feature sequence.
4. The fatigue driving detection method based on deep features and a graph attention mechanism according to claim 3, wherein position-encoding the spatial facial feature sequence with the multi-frequency cosine position function and injecting position information into each facial feature in the sequence to obtain the spatiotemporal facial feature sequence comprises:
encoding with the multi-frequency cosine position function to obtain a position encoding sequence;
and adding the position encoding sequence to the spatial facial feature sequence to obtain the spatiotemporal facial feature sequence.
5. The fatigue driving detection method based on deep features and a graph attention mechanism according to claim 1, wherein learning the correlations between the spatiotemporal facial features based on the graph attention mechanism and updating the spatiotemporal facial feature sequence based on the correlations comprises:
treating each spatiotemporal facial feature of the sequence as a node to form node features, thereby constructing a complete directed graph;
mapping the input node features into a feature subspace and computing an attention coefficient for each pair of nodes in the directed graph through a shared self-attention mechanism, thereby constructing an adjacency matrix;
and fusing the adjacency matrix with the node feature matrix to update the node features, then concatenating the node features updated by each attention head into new node features to obtain the updated spatiotemporal facial feature sequence.
6. The fatigue driving detection method based on deep features and a graph attention mechanism according to claim 1, wherein learning the importance weight of each feature for the final classification based on the graph attention mechanism and performing weighted fusion with the updated spatiotemporal facial feature sequence based on the importance weights to obtain fused features comprises:
projecting the node features into a unified linear space, computing an attention coefficient for each pair of nodes in the graph with a shared self-attention mechanism, and constructing an adjacency matrix from the attention coefficients;
summing the rows of the adjacency matrix to form a weight vector, each element of which is the importance weight of the corresponding facial feature;
and performing weighted fusion of the updated spatiotemporal facial feature sequence with the weight vector to obtain the fused features.
7. The fatigue driving detection method based on deep features and a graph attention mechanism according to claim 1, wherein the loss function is the cross entropy function:

$$L = -\frac{1}{K}\sum_{i=1}^{K}\sum_{cls} y_i^{cls}\,\log\!\left(p_i^{cls}\right)$$

where $K$ is the number of samples per batch; $cls$ denotes the class, with $cls = 0/1$, 0 for non-fatigued and 1 for fatigued; $y_i^{cls}$ is the sample label indicating that the $i$-th sample belongs to class $cls$; and $p_i^{cls}$ is the probability of detecting the $i$-th sample as class $cls$.
8. A fatigue driving detection system based on deep features and a graph attention mechanism, characterized by comprising:
a sample training set construction module, configured to acquire a plurality of original video sequences containing facial images of a driver, label each original video sequence as fatigued or non-fatigued driving, and construct a sample training set from the labelled original video sequences;
a detection model construction module, configured to construct a fatigue driving detection model comprising a spatiotemporal facial feature extraction network, a multi-head graph attention network and a weighted graph attention feature fusion network;
a detection model training module, configured to train the fatigue driving detection model with the sample training set, wherein the training process comprises: obtaining a spatiotemporal facial feature sequence from the original video sequence through facial feature extraction and position encoding; learning, based on a graph attention mechanism, the correlations between the spatiotemporal facial features and the importance weight of each feature for the final classification; updating the spatiotemporal facial feature sequence based on the correlations; performing weighted fusion of the importance weights with the updated spatiotemporal facial feature sequence to obtain fused features; and performing classification training based on the fused features;
and a detection module, configured to input an original video sequence to be detected, which contains facial images of the driver, into the trained fatigue driving detection model to complete fatigue driving detection.
9. An electronic device, characterized by comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the fatigue driving detection method based on deep features and a graph attention mechanism according to any one of claims 1-7.
10. A computer-readable storage medium, characterized by storing computer instructions which, when executed by a processor, perform the steps of the fatigue driving detection method based on deep features and a graph attention mechanism according to any one of claims 1-7.
CN202211619272.0A 2022-12-14 2022-12-14 Fatigue driving detection method and system based on deep features and graph attention mechanism Pending CN116189155A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211619272.0A 2022-12-14 2022-12-14 Fatigue driving detection method and system based on deep features and graph attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211619272.0A 2022-12-14 2022-12-14 Fatigue driving detection method and system based on deep features and graph attention mechanism

Publications (1)

Publication Number Publication Date
CN116189155A 2023-05-30

Family

ID=86431760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211619272.0A Fatigue driving detection method and system based on deep features and graph attention mechanism 2022-12-14 2022-12-14

Country Status (1)

Country Link
CN (1) CN116189155A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116959078A (en) * 2023-09-14 2023-10-27 山东理工职业学院 Method for constructing fatigue detection model, fatigue detection method and device thereof
CN116959078B (en) * 2023-09-14 2023-12-05 山东理工职业学院 Method for constructing fatigue detection model, fatigue detection method and device thereof
CN117079255A (en) * 2023-10-17 2023-11-17 江西开放大学 Fatigue driving detection method based on face recognition and voice interaction
CN117079255B (en) * 2023-10-17 2024-01-05 江西开放大学 Fatigue driving detection method based on face recognition and voice interaction
CN117152155A (en) * 2023-10-31 2023-12-01 海杰亚(北京)医疗器械有限公司 Multi-needle ablation planning method and device, storage medium and electronic equipment
CN117152155B (en) * 2023-10-31 2024-02-13 海杰亚(北京)医疗器械有限公司 Multi-needle ablation planning method and device, storage medium and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination