CN114782933A - Driver fatigue detection system based on multi-modal Transformer network - Google Patents

Driver fatigue detection system based on multi-modal Transformer network

Info

Publication number
CN114782933A
Authority
CN
China
Prior art keywords
token
image
electroencephalogram
eeg
facial expression
Prior art date
Legal status
Pending
Application number
CN202210501128.0A
Other languages
Chinese (zh)
Inventor
陆闻滢 (Lu Wenying)
Current Assignee
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Original Assignee
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority date
Filing date
Publication date
Application filed by Institute of Artificial Intelligence of Hefei Comprehensive National Science Center filed Critical Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority to CN202210501128.0A
Publication of CN114782933A
Legal status: Pending

Classifications

    • A61B5/16 Devices for psychotechnics; testing reaction times; devices for evaluating the psychological state
    • A61B5/168 Evaluating attention deficit, hyperactivity
    • A61B5/18 Devices for psychotechnics, testing reaction times or evaluating the psychological state, for vehicle drivers or machine operators
    • A61B5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/369 Electroencephalography [EEG]
    • A61B5/7235 Details of waveform analysis
    • A61B5/7267 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems, involving training the classification device
    • G06F18/25 Pattern recognition; fusion techniques
    • G06N3/045 Neural networks; combinations of networks
    • G06N3/08 Neural networks; learning methods
    • A61B2503/22 Motor vehicle operators, e.g. drivers, pilots, captains
    • G06F2218/12 Pattern recognition specially adapted for signal processing; classification; matching


Abstract

The invention discloses a driver fatigue detection system based on a multi-modal Transformer network, relating to deep learning and intelligent biomedical signal processing. A data acquisition module acquires the driver's electroencephalogram (EEG) signal and facial expression image patches; an embedding module embeds the EEG signal and the facial expression image patches to obtain an EEG token and image patch tokens; a self-attention analysis module performs multi-head self-attention analysis on the EEG token and the image patch tokens through a Transformer encoder to obtain an EEG token vector and an image patch token vector; and a result prediction module predicts the result from the token vectors through a prediction function. The invention can derive the driver's driving state from the driver's EEG and facial expression and give feedback in real time, effectively improving driving safety.

Description

Driver fatigue detection system based on a multi-modal Transformer network
Technical Field
The invention relates to deep learning and intelligent biomedical signal processing, and in particular to a driver fatigue detection system based on a multi-modal Transformer network.
Background
Fatigue driving poses a major hidden traffic hazard, and the accidents it causes are too numerous to enumerate. According to road traffic survey statistics, fatigue driving accounts for more than 40% of serious traffic accidents, making it one of the three leading causes of such accidents, and for up to 21% of fatal accidents. If a driver could be detected in a fatigued driving state in time, and automated driving could then intervene, accidents would be reduced and many lives saved.
Existing driver fatigue detection models are mainly single-modal, based either on the electroencephalogram (EEG) or on facial expression recognition.
EEG-based detection is realized through a brain-computer interface (BCI). A BCI is a direct link established between the brain of a human or animal and an external device, enabling information exchange between brain and device. The technique exploits the different responses the human brain produces to different stimuli or cognitive activities to obtain different types of EEG signals, and derives control instructions by amplifying, filtering, acquiring, feature-extracting, and classifying these signals.
Facial expressions play an important role in interpersonal communication and express human emotion more intuitively and accurately than media such as text and speech. Facial expression recognition roughly comprises three stages: preprocessing the facial image, learning facial expression features, and classifying the facial expression; the person's physiological and psychological state is then derived from the expression classification.
Information processing for EEG or facial expression is mainly implemented with convolutional neural networks and recurrent neural networks. However, both architectures leave room for improvement in modeling long-range dependencies among input features and in parallelism. For a real-time detection system, the inference time of a model that captures long-range dependencies of the input features becomes a bottleneck limiting performance.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a driver fatigue detection system based on a multi-modal Transformer network, solving the problem that the inference time of existing information processing network models constrains model performance.
The driver fatigue detection system based on a multi-modal Transformer network of the invention comprises:
a data acquisition module for acquiring the driver's electroencephalogram (EEG) signal and facial expression image, and for dividing the facial expression image into facial expression image patches;
an embedding module for embedding the EEG signal and the facial expression image patches to obtain an EEG token and image patch tokens;
a self-attention analysis module for performing multi-head self-attention analysis on the EEG token and the image patch tokens through a Transformer encoder to obtain an EEG token vector and an image patch token vector;
and a result prediction module for predicting the result from the EEG token vector and the image patch token vector through a prediction function that performs dimensionality reduction.
The prediction function is:
prediction = tanh(W(Pooling([EEG class token; Image class token])) + b);
where prediction is the predicted result value; EEG class token is the EEG token vector; Image class token is the image patch token vector; Pooling is a pooling function; b is a bias vector; W is a weight matrix.
A loss function is constructed according to the Transformer's classification tasks, and the predicted result value is analyzed through the loss function so as to optimize it.
The loss function is:
Loss = -Σ_task Σ_{i=1..class} y_i · log(p_i);
where y_i indicates whether the sample belongs to the i-th class; p_i is the probability the model assigns to the sample belonging to the i-th class; task indexes the classification tasks; class is the number of classes.
The EEG token vector and the image patch token vector are obtained through the following formulas:
x' = MSA(LN(x)) + x;
output = tanh(MLP(LN(x')) + x');
where MSA is the multi-head self-attention function; LN is the layer normalization function; x is the input value of the Transformer encoder; x' is the output of x after the multi-head self-attention operation; MLP is the multilayer perceptron function; tanh is the classifier function; output is the output value of the Transformer encoder.
The EEG signal and the facial expression image patches are embedded through the following formulas:
EEG = [EEG_class; EEG_1; …; EEG_t] + pos_EEG;
Image = [Image_class; ImagePatch_1; …; ImagePatch_n] + pos_Image;
where EEG is the EEG token matrix and EEG ∈ R^((t+1)×c); R is the real number field; t is the number of sampling points in the time dimension; c is the number of channels; EEG_class is the EEG class token; pos_EEG is the EEG position embedding matrix, with pos_EEG ∈ R^((t+1)×c); Image is the image patch token matrix; Image_class is the facial expression patch class token; pos_Image is the patch position embedding matrix, with pos_Image ∈ R^((n+1)×c).
Position vectors are added to the EEG token and the image patch tokens through the following formulas:
PE(pos, 2i) = sin(pos / 10000^(2i/c));
PE(pos, 2i+1) = cos(pos / 10000^(2i/c));
where PE(pos, 2i) is the element at even dimension 2i and PE(pos, 2i+1) the element at odd dimension 2i+1; pos is the position of the EEG sample or facial expression patch in the time sequence, i.e., the row of the position embedding matrix; i indexes the dimensions of the row vector.
The number of facial expression image patches into which the facial expression image is divided is:
n = H × L / P²;
where P is the patch side length; L is the image length; H is the image width; C is the number of channels.
Advantageous effects
The advantages of the invention are as follows: the EEG signal and the facial expression signal are fused by the Transformer-based embedding module, self-attention analysis module, and result prediction module; the driver's driving state can be obtained from the driver's EEG and facial expression, and feedback can be given in real time, which solves the problem that the inference time of existing information processing network models constrains model performance.
Drawings
FIG. 1 is a simplified flow diagram of the detection system according to the present invention;
FIG. 2 is a schematic flow chart of the data acquisition module, the embedding module, and the self-attention analysis module according to the present invention;
FIG. 3 is a schematic flow chart of the self-attention analysis module according to the present invention.
Detailed Description
The invention is further described below with reference to embodiments, which are not to be construed as limiting the invention in any way; any modification made within the scope of the claims remains covered by the claims of the invention.
Referring to FIGS. 1 to 3, the driver fatigue detection system based on a multi-modal Transformer network of the present invention comprises a data acquisition module, an embedding module, a self-attention analysis module, and a result prediction module. The embedding module, the self-attention analysis module, and the result prediction module are all implemented on the basis of Transformer blocks.
In this embodiment, the data acquisition module is used to acquire the driver's EEG signal and facial expression image patches.
For EEG acquisition, EEG acquisition equipment collects the driver's brain signals, yielding a series of multi-channel EEG signals. Because acquisition spans a certain duration, the EEG signal can be represented as a time series [EEG_1, EEG_2, …, EEG_t].
A camera placed behind the car's front windshield captures the driver's video stream. Because the acquired EEG signal is a time series with a certain duration, the video frames captured within that duration are averaged to obtain a mean image of the driver's facial expression. The mean image is cut into n = H × L / P² facial expression image patches, each of size C × P × P.
Here, P is the patch side length; L is the image length; H is the image width; C is the number of channels.
The n facial expression image patches can then be represented as the sequence [ImagePatch_1; …; ImagePatch_n].
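To make this preprocessing concrete, the following minimal Python (PyTorch) sketch averages the frames of the video window and cuts the mean image into n = H × L / P² patches. The 224 × 224 resolution, the frame count, and the function names are assumptions for illustration, not the patent's reference implementation:

    import torch

    def average_frames(frames: torch.Tensor) -> torch.Tensor:
        # Average the video frames captured during the EEG window:
        # (num_frames, C, H, L) -> mean image (C, H, L).
        return frames.mean(dim=0)

    def split_into_patches(image: torch.Tensor, p: int) -> torch.Tensor:
        # Cut a (C, H, L) image into n = H*L/P^2 patches, returned as the
        # sequence [ImagePatch_1; ...; ImagePatch_n] of flattened C*P*P patches.
        c, h, l = image.shape
        assert h % p == 0 and l % p == 0, "H and L must be divisible by P"
        patches = image.unfold(1, p, p).unfold(2, p, p)   # (C, H/P, L/P, P, P)
        patches = patches.permute(1, 2, 0, 3, 4)          # (H/P, L/P, C, P, P)
        return patches.reshape(-1, c * p * p)             # (n, C*P*P)

    # Example: a 3-channel 224 x 224 mean image with P = 16 gives n = 196 patches.
    mean_image = average_frames(torch.randn(30, 3, 224, 224))
    patches = split_into_patches(mean_image, 16)          # (196, 768)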
The embedding module embeds the EEG signal and the facial expression image patches to obtain an EEG token matrix and an image patch token matrix, which serve as the input of the multi-modal Transformer model.
Specifically, the Transformer embeds the input according to the following formulas:
EEG = [EEG_class; EEG_1; …; EEG_t] + pos_EEG;
Image = [Image_class; ImagePatch_1; …; ImagePatch_n] + pos_Image;
where EEG is the EEG token matrix and EEG ∈ R^((t+1)×c); R is the real number field; t is the number of sampling points in the time dimension; c is the number of channels; EEG_class is the EEG class token, which, after being encoded by the Transformer encoder, is fed to a multilayer perceptron to classify the EEG signal; pos_EEG is the EEG position embedding matrix, with pos_EEG ∈ R^((t+1)×c); Image is the image patch token matrix; Image_class is the facial expression patch class token; pos_Image is the patch position embedding matrix, with pos_Image ∈ R^((n+1)×c).
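As a hedged illustration of these two formulas, the sketch below prepends a class token to each token sequence and adds the position embedding. The channel width c, the learnable class tokens, and the helper name embed are assumptions for the example; the position matrices are zero stand-ins here, since their sinusoidal construction is sketched after the next passage:

    import torch

    t, n, c = 128, 196, 768   # EEG samples, image patches, channel width (illustrative)

    # Class tokens, taken as learnable ViT-style parameters (an assumption;
    # the patent only names EEG_class and Image_class).
    eeg_class = torch.nn.Parameter(torch.zeros(1, c))
    img_class = torch.nn.Parameter(torch.zeros(1, c))

    # Position embedding matrices pos_EEG and pos_Image; zeros as stand-ins,
    # to be built from the sinusoidal formulas sketched below.
    pos_eeg = torch.zeros(t + 1, c)
    pos_img = torch.zeros(n + 1, c)

    def embed(tokens: torch.Tensor, cls: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        # [class; token_1; ...; token_k] + pos, per the formulas above.
        return torch.cat([cls, tokens], dim=0) + pos

    eeg = embed(torch.randn(t, c), eeg_class, pos_eeg)     # (t+1, c), EEG in R^((t+1)xc)
    image = embed(torch.randn(n, c), img_class, pos_img)   # (n+1, c)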
Since the Transformer's embedding of the input is order-agnostic, a vector of positional features must be added to each element of the token matrix. The position vector of the elements in the token matrix is determined by the following formulas.
PE(pos, 2i) = sin(pos / 10000^(2i/c));
PE(pos, 2i+1) = cos(pos / 10000^(2i/c));
where PE(pos, 2i) is the element at even dimension 2i and PE(pos, 2i+1) the element at odd dimension 2i+1; pos is the position of the EEG sample or facial expression patch in the time sequence, i.e., the row of the position embedding matrix; i indexes the dimensions of the row vector.
Embedding the EEG signal and the facial expression patches in this way gives each signal or patch a unique position code, and keeps the distance between any two positions consistent across time series of different lengths. Furthermore, because trigonometric functions are used to encode the positions, the position code remains bounded even as the sequence grows indefinitely.
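A minimal sketch of this sinusoidal position embedding, assuming an even channel width c, might be:

    import torch

    def sinusoidal_pe(length: int, c: int) -> torch.Tensor:
        # PE(pos, 2i) = sin(pos / 10000^(2i/c)); PE(pos, 2i+1) = cos(pos / 10000^(2i/c)).
        position = torch.arange(length, dtype=torch.float32).unsqueeze(1)  # (length, 1)
        two_i = torch.arange(0, c, 2, dtype=torch.float32)                 # 2i = 0, 2, 4, ...
        angle = position / (10000.0 ** (two_i / c))                        # (length, c/2)
        pe = torch.zeros(length, c)
        pe[:, 0::2] = torch.sin(angle)   # even dimensions
        pe[:, 1::2] = torch.cos(angle)   # odd dimensions
        return pe

    pos_eeg = sinusoidal_pe(128 + 1, 768)   # pos_EEG in R^((t+1)xc) for t = 128, c = 768

Because sine and cosine are bounded, every entry stays in [-1, 1] no matter how long the sequence grows, which is exactly the boundedness property noted above.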
The self-attention analysis module performs self-attention analysis on the EEG tokens and the image patch tokens through the Transformer encoder to obtain an EEG token vector and an image patch token vector. The EEG token vector represents the correlations within the EEG signal; the image patch token vector represents the associations among the facial expression image patches.
The EEG token vector and the image patch token vector can be obtained through the following formulas:
x' = MSA(LN(x)) + x;
output = tanh(MLP(LN(x')) + x');
where MSA is the multi-head self-attention function; LN is the layer normalization function; x is the input value of the encoder (x is EEG for the EEG classification task and Image for the image classification task); x' is the output of x after the multi-head self-attention operation; MLP is the multilayer perceptron function; tanh is the classifier function; output is the output value of the encoder, i.e., the EEG token vector or the image patch token vector. After the encoder input passes through a series of such encoder operations, the token vectors learn the correlations within the EEG signal and among the image patches.
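A minimal PyTorch sketch of one such encoder block follows; it mirrors the two formulas literally, including the tanh on the residual output, which a standard ViT block would omit. The head count and MLP width are illustrative assumptions:

    import torch
    import torch.nn as nn

    class EncoderBlock(nn.Module):
        # x' = MSA(LN(x)) + x;  output = tanh(MLP(LN(x')) + x').
        def __init__(self, c: int, heads: int = 8):
            super().__init__()
            self.ln1 = nn.LayerNorm(c)
            self.msa = nn.MultiheadAttention(c, heads, batch_first=True)
            self.ln2 = nn.LayerNorm(c)
            self.mlp = nn.Sequential(nn.Linear(c, 4 * c), nn.GELU(), nn.Linear(4 * c, c))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h = self.ln1(x)
            x = self.msa(h, h, h, need_weights=False)[0] + x   # x' = MSA(LN(x)) + x
            return torch.tanh(self.mlp(self.ln2(x)) + x)       # output = tanh(MLP(LN(x')) + x')

    block = EncoderBlock(c=768)
    out = block(torch.randn(1, 129, 768))   # batch of 1, the (t+1) EEG tokens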
The multi-head self-attention mechanism is described in detail below. It can be expressed by the following formulas.
Attention(Q, K, V) = softmax(Q · K^T / √d_k) · V;
MSA(x) = Concat(head_1, …, head_h), where head_j = Attention(Q_j, K_j, V_j);
In the above formulas, Q, K, and V are the three parameter factors of the multi-head self-attention mechanism, namely Query, Key, and Value.
The three parameter factors are obtained by multiplying the same input value by three different weight matrices:
Q = W_q × Input;
K = W_k × Input;
V = W_v × Input;
where W_q, W_k, and W_v are all weight matrices, and Input is the input value, equal to x.
A set of attention weights is obtained by computing the correlation between Q and K, and the softmax function maps the weights to the interval [0, 1]. However, when the vector dimension is high, Q · K^T reaches a large magnitude, which causes the gradient of the softmax function to become small; Q · K^T is therefore divided by √d_k to normalize over the dimension and eliminate its influence.
In this embodiment, to improve the parallelism of the model, the input EEG signal or mean image can also be divided into several attention heads by the multi-head self-attention mechanism; that is, the input EEG or image patch matrix is split into several parts, and the results of the parts are concatenated after the self-attention operation, which improves the parallelism of the model.
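The following sketch implements scaled dot-product attention and the head splitting and concatenation just described; it omits the usual output projection for brevity, and the weight initialization is purely illustrative:

    import math
    import torch

    def attention(q, k, v):
        # softmax(Q K^T / sqrt(d_k)) V, the scaled dot-product attention above.
        d_k = q.size(-1)
        weights = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(d_k), dim=-1)
        return weights @ v

    def multi_head(x, w_q, w_k, w_v, heads: int):
        # Split Q, K, V into several heads, attend per head in parallel,
        # then concatenate the per-head results.
        tokens, c = x.shape
        q, k, v = x @ w_q, x @ w_k, x @ w_v            # from the three weight matrices
        def split(m):
            return m.view(tokens, heads, c // heads).transpose(0, 1)
        out = attention(split(q), split(k), split(v))  # (heads, tokens, c/heads)
        return out.transpose(0, 1).reshape(tokens, c)  # concatenated heads

    x = torch.randn(129, 768)                          # the (t+1) EEG tokens
    w_q, w_k, w_v = (torch.randn(768, 768) * 768 ** -0.5 for _ in range(3))
    y = multi_head(x, w_q, w_k, w_v, heads=8)          # (129, 768)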
After the token vectors are obtained, the result prediction module predicts the result from the EEG token vector and the image patch token vector through a prediction function that performs dimensionality reduction.
The prediction is realized by the following function:
prediction = tanh(W(Pooling([EEG class token; Image class token])) + b);
where prediction is the predicted result value; EEG class token is the EEG token vector; Image class token is the image patch token vector; Pooling is a pooling function used to reduce the dimensionality of the token vectors; b is a bias vector; W is a weight matrix. The parameters W and b together form a fully connected layer, which applies a linear transformation to the pooled vector.
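A hedged sketch of this prediction head follows. Mean pooling over the two class token vectors and a two-way output (fatigued versus alert) are assumptions, since the patent only names "a pooling function" and does not fix the output dimensionality:

    import torch
    import torch.nn as nn

    class PredictionHead(nn.Module):
        # prediction = tanh(W(Pooling([EEG class token; Image class token])) + b)
        def __init__(self, c: int, num_outputs: int = 2):
            super().__init__()
            self.fc = nn.Linear(c, num_outputs)   # W and b form the fully connected layer

        def forward(self, eeg_cls: torch.Tensor, img_cls: torch.Tensor) -> torch.Tensor:
            # Pooling([...; ...]): mean over the pair of class tokens (assumed).
            pooled = torch.stack([eeg_cls, img_cls], dim=1).mean(dim=1)
            return torch.tanh(self.fc(pooled))

    head = PredictionHead(c=768)
    prediction = head(torch.randn(4, 768), torch.randn(4, 768))   # (batch, num_outputs)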
The Transformer network originated in the field of natural language processing; its core is to learn the relationships between elements at different positions of a sequence through a module called the self-attention mechanism. Compared with a convolutional neural network, the Transformer has a more flexible receptive field and better captures long-range dependencies in the feature map; compared with a recurrent neural network, it has superior parallelism, since it does not need to wait for the previous state's output as model input. This embodiment therefore fuses the information of the two modalities, EEG and facial expression, through the Transformer network; it can derive the driver's driving state from the driver's EEG and facial expression and give feedback in real time, solving the problem that the inference time of existing information processing network models constrains model performance. In addition, a convolution-free Transformer model outperforms a convolutional neural network in parameter count and inference time.
Preferably, this embodiment also designs a loss function so that the Transformer can process the visual features directly with its encoder and let them interact with the multi-modal features, without adding a separate visual encoder.
Since the Transformer model comprises three classification tasks, the loss function of this embodiment is designed as:
Loss = -Σ_task Σ_{i=1..class} y_i · log(p_i);
where y_i indicates whether the sample belongs to the i-th class; p_i is the probability the model assigns to the sample belonging to the i-th class; task indexes the classification tasks; class is the number of classes.
Analyzing the predicted result value through the loss function optimizes the prediction, so that the Transformer model feeds back the driver's state more accurately.
When y_i = 1, the closer p_i is to 1, the smaller the loss and the smaller the gradient passed back when the sample is back-propagated; the closer p_i is to 0, the larger the loss, i.e., the larger the gradient passed back.
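A minimal sketch of this multi-task cross-entropy, assuming one-hot labels and a two-class (fatigued versus alert) setup per task, might be:

    import torch

    def multi_task_cross_entropy(p: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # Loss = -sum over tasks and classes of y_i * log(p_i).
        # p: predicted probabilities, (num_tasks, num_classes); y: one-hot labels.
        return -(y * torch.log(p.clamp_min(1e-12))).sum()

    # Three classification tasks, two classes each (class count assumed).
    p = torch.softmax(torch.randn(3, 2), dim=-1)
    y = torch.nn.functional.one_hot(torch.tensor([1, 0, 1]), num_classes=2).float()
    loss = multi_task_cross_entropy(p, y)

The clamp guards against log(0); as noted above, a confident correct prediction (p_i near 1 where y_i = 1) contributes almost nothing to the loss and passes back almost no gradient.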
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the structure of the invention, and these will not affect the effect of its implementation or the practicability of the patent.

Claims (8)

1. A driver fatigue detection system based on a multi-modal Transformer network, characterized by comprising:
a data acquisition module for acquiring the driver's electroencephalogram (EEG) signal and facial expression image, and for dividing the facial expression image into facial expression image patches;
an embedding module for embedding the EEG signal and the facial expression image patches to obtain an EEG token and image patch tokens;
a self-attention analysis module for performing multi-head self-attention analysis on the EEG token and the image patch tokens through a Transformer encoder to obtain an EEG token vector and an image patch token vector;
and a result prediction module for predicting the result from the EEG token vector and the image patch token vector through a prediction function that performs dimensionality reduction.
2. The driver fatigue detection system based on a multi-modal Transformer network of claim 1, wherein the prediction function is:
prediction = tanh(W(Pooling([EEG class token; Image class token])) + b);
where prediction is the predicted result value; EEG class token is the EEG token vector; Image class token is the image patch token vector; Pooling is a pooling function; b is a bias vector; W is a weight matrix.
3. The driver fatigue detection system based on a multi-modal Transformer network of claim 2, wherein a loss function is constructed according to the Transformer's classification tasks, and the predicted result value is analyzed through the loss function so as to optimize it.
4. The driver fatigue detection system based on a multi-modal Transformer network of claim 3, wherein the loss function is:
Loss = -Σ_task Σ_{i=1..class} y_i · log(p_i);
where y_i indicates whether the sample belongs to the i-th class; p_i is the probability the model assigns to the sample belonging to the i-th class; task indexes the classification tasks; class is the number of classes.
5. The driver fatigue detection system based on a multi-modal Transformer network of claim 1, wherein the EEG token vector and the image patch token vector are each obtained by the following formulas:
x' = MSA(LN(x)) + x;
output = tanh(MLP(LN(x')) + x');
where MSA is the multi-head self-attention function; LN is the layer normalization function; x is the input value of the Transformer encoder; x' is the output of x after the multi-head self-attention operation; MLP is the multilayer perceptron function; tanh is the classifier function; output is the output value of the Transformer encoder.
6. The driver fatigue detection system based on a multi-modal Transformer network of claim 1, wherein the EEG signal and the facial expression image patches are embedded by the following formulas:
EEG = [EEG_class; EEG_1; …; EEG_t] + pos_EEG;
Image = [Image_class; ImagePatch_1; …; ImagePatch_n] + pos_Image;
where EEG is the EEG token matrix and EEG ∈ R^((t+1)×c); R is the real number field; t is the number of sampling points in the time dimension; c is the number of channels; EEG_class is the EEG class token; pos_EEG is the EEG position embedding matrix, with pos_EEG ∈ R^((t+1)×c); Image is the image patch token matrix; Image_class is the facial expression patch class token; pos_Image is the patch position embedding matrix, with pos_Image ∈ R^((n+1)×c).
7. The driver fatigue detection system based on a multi-modal Transformer network of claim 6, wherein position vectors are added to the EEG token and the image patch tokens by the following formulas:
PE(pos, 2i) = sin(pos / 10000^(2i/c));
PE(pos, 2i+1) = cos(pos / 10000^(2i/c));
where PE(pos, 2i) is the element at even dimension 2i and PE(pos, 2i+1) the element at odd dimension 2i+1; pos is the position of the EEG sample or facial expression patch in the time sequence, i.e., the row of the position embedding matrix; i indexes the dimensions of the row vector.
8. The driver fatigue detection system based on a multi-modal Transformer network of claim 1, wherein the number of facial expression image patches into which the facial expression image is divided is:
n = H × L / P²;
where P is the patch side length; L is the image length; H is the image width; C is the number of channels.
CN202210501128.0A 2022-05-09 2022-05-09 Driver fatigue detection system based on multi-modal Transformer network Pending CN114782933A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210501128.0A CN114782933A (en) 2022-05-09 2022-05-09 Driver fatigue detection system based on multi-modal Transformer network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210501128.0A CN114782933A (en) 2022-05-09 2022-05-09 Driver fatigue detection system based on multi-modal Transformer network

Publications (1)

Publication Number Publication Date
CN114782933A 2022-07-22

Family

ID=82437941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210501128.0A Pending CN114782933A (en) 2022-05-09 2022-05-09 Driver fatigue detection system based on multi-modal Transformer network

Country Status (1)

Country Link
CN (1) CN114782933A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116831581A (en) * 2023-06-15 2023-10-03 中南大学 Remote physiological sign extraction-based driver state monitoring method and system
WO2024060917A1 (en) * 2022-09-23 2024-03-28 中国电信股份有限公司 Defect identification method, apparatus and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination