WO2024015018A1 - Cognitive workload recognition from temporal series information - Google Patents
- Publication number: WO2024015018A1 (PCT/SG2023/050490)
- Authority
- WO
- WIPO (PCT)
Classifications
- A61B5/7235 — Details of waveform analysis
- A61B5/7264 — Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
- A61B5/7267 — Classification of physiological signals or data involving training the classification device
- A61B5/163 — Devices for evaluating the psychological state by tracking eye movement, gaze, or pupil change
- A61B5/18 — Devices for psychotechnics; testing reaction times; evaluating the psychological state for vehicle drivers or machine operators
- A61B5/374 — Detecting the frequency distribution of signals, e.g. detecting delta, theta, alpha, beta or gamma waves
- B60W40/08 — Estimation or calculation of non-directly measurable driving parameters related to drivers or passengers
- G06N3/0442 — Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/045 — Combinations of networks
- G06N3/09 — Supervised learning
- A61B2503/20 — Workers
- A61B2503/22 — Motor vehicles operators, e.g. drivers, pilots, captains
- B60W2520/105 — Longitudinal acceleration
- B60W2520/125 — Lateral acceleration
- B60W2520/14 — Yaw
- B60W2540/22 — Psychological state; stress level or workload
- B60W2540/225 — Direction of gaze
Definitions
- the present invention relates, in general terms, to systems and methods for training supervised learning models for cognitive workload recognition.
- Driver workload inference is significant for the design of intelligent human-machine cooperative driving schemes. Such inference allows systems to alert drivers before potentially dangerous manoeuvres are performed and to achieve a safer control transition. However, pattern variations among individual drivers and sensor artefacts pose great challenges to existing cognitive workload recognition approaches.
- IVI in-vehicle infotainment
- navigation systems provide real-time guidance to drivers, but the visual-manual tasks and auditory-verbal activities they involve are secondary to driving and increase the driver's mental workload, thereby increasing the risk of distraction.
- ARecNet Attention-enabled Recognition Network
- EEG electroencephalogram
- An "external state" may be a "vehicle state" in embodiments applied to cognitive workload recognition for vehicle drivers.
- Previous machine learning technologies consider single input modalities, which cannot fully exploit the complementarity of multimodal data in assessing cognitive workload due to information redundancy.
- ARecNet employs a feature-level fusion architecture across input modes, to recognize driver cognitive workload.
- a system that trains a supervised learning model for cognitive workload recognition, wherein the system comprises a plurality of processors configured to train the model by employing a sequence-to-sequence learning paradigm, wherein the model comprises a main long short-term memory (LSTM) network, an auxiliary LSTM network and a classifier layer; wherein the model is configured to output a predicted cognitive workload level of a user in response to input of temporal series information related to the user over a plurality of time steps.
- LSTM main long short-term memory
- ARecNet may embody a method to train a supervised learning model for cognitive workload recognition by employing a sequence-to-sequence learning paradigm.
- the model comprises a main long short-term memory (LSTM) network, an auxiliary LSTM network and a classifier layer.
- the model is configured to output a predicted cognitive workload level of a user in response to input of temporal series information related to the user over a plurality of time steps. It does so by, at each said time step: updating the main LSTM network to map the temporal series information to a sequence of hidden states; updating the auxiliary LSTM network to generate weights for the main LSTM network; and obtaining the predicted cognitive workload level by processing the hidden states and the weights through the classifier layer.
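The per-time-step loop described above can be sketched numerically. The following is a pure-Python toy, not the patented network: the auxiliary recurrence, the gain-style weight modulation and all dimensions are illustrative stand-ins for the HyperLSTM machinery described later.

```python
import math, random

random.seed(0)

HID, INP, CLASSES, T = 4, 3, 3, 5  # toy sizes; not taken from the patent

def rand_mat(r, c):
    return [[random.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(r)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def tanh_vec(v):
    return [math.tanh(x) for x in v]

def softmax(v):
    mx = max(v)
    e = [math.exp(x - mx) for x in v]
    s = sum(e)
    return [x / s for x in e]

W_aux = rand_mat(HID, HID + INP)   # auxiliary recurrence weights
P = rand_mat(HID, HID)             # projects aux state to per-unit gains
W_in, U_rec = rand_mat(HID, INP), rand_mat(HID, HID)
W_cls = rand_mat(CLASSES, HID)

h_main = [0.0] * HID
h_aux = [0.0] * HID
xs = [[random.uniform(-1, 1) for _ in range(INP)] for _ in range(T)]

for x_t in xs:
    # Step 108': auxiliary network sees [main hidden; input] and emits gains
    # that modulate the main network's weights (a simplified hypernetwork).
    h_aux = tanh_vec(matvec(W_aux, h_main + x_t))
    gains = matvec(P, h_aux)
    # Step 106': main recurrence with dynamically scaled pre-activations.
    pre = [g * (a + b) for g, a, b in
           zip(gains, matvec(W_in, x_t), matvec(U_rec, h_main))]
    h_main = tanh_vec(pre)

# Step 110': classifier over the last hidden state.
probs = softmax(matvec(W_cls, h_main))
level = max(range(CLASSES), key=lambda k: probs[k])
print(level, round(sum(probs), 6))
```

The gating of pre-activations here is a simplification; the full scheme, in which the auxiliary network generates complete weight matrices, is described with equation (9) below.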
- the temporal series information in the phrase "input of temporal series information related to the user over a plurality of time steps", and similar, refers to information relating to the user themselves (e.g. of the driver, where the user is a driver) and information relating to external states (e.g. of a vehicle, such as a neighbouring vehicle or a vehicle being driven by the user, where the user is a driver).
- embodiments of the present invention establish an attention-enabled decision-level fusion architecture to infer driver cognitive workload levels. This suggests the availability of a viable generic technique for capturing useful feature representations from time-series multimodal information.
- embodiments of the present invention involve constructing a novel driver workload dataset, including multimodal signals and multiple driving scenarios.
- Figure 1 is a schematic overview of an attention-enabled cognitive workload recognition method and model, with multimodal information fusion in accordance with present teachings;
- Figure 2 is a platform for multi-modal information capture, which can be applied as a driver-in-the-loop platform;
- Figure 3 illustrates a method to train a supervised learning model for cognitive workload recognition, in accordance with present teachings
- Figure 4 shows activity power spectra for typical components and the corresponding scalp topographies, in which image (a) shows eye artefact, (b) shows muscle artefact and (c) is normal;
- Figure 6 shows confusion matrices for driver cognitive workload recognition with different historical horizons in typical driving scenarios, in which image (a) is sunny noon, (b) is foggy dusk, and (c) is rainy night;
- Figure 8 is the outcome of statistical tests of decision-level fusion-based approaches with HyperLSTM modules.
- Figure 9 is a block diagram of a system for cognitive workload recognition (estimation).
- ARecNet is an attention-enabled recognition network with a decision-level fusion architecture that assesses cognitive workload estimation performance.
- the present methodology employs a cross-attention mechanism to enhance useful feature representations learned by hyper long short-term memory (HyperLSTM) based modules from time-series multimodal information, e.g., EEG signals, eye movements, external states or behaviours (e.g. vehicle states or vehicle behaviour or, where the user is an athlete, the state or behaviour of the user's body and/or competing athletes around the user - such as on a running track).
- HyperLSTM hyper long short-term memory
- Figure 1 schematically represents a method 100, for training a supervised learning model for cognitive workload recognition, in the context of multi-modal input information acquisition 102.
- the input information comprises temporal or time series information related to the vehicle and/or driver over a plurality of time steps.
- This information can be acquired through any appropriate mechanism - e.g. the information can be extracted from a database or recorded and used for real-time training.
- the temporal series information may have a single input mode - e.g. EEG, eye tracking/movement or vehicle behavior/performance - the present temporal series information is multi-modal, thus having multiple input modes.
- Each input mode is a respective one of EEG signals, eye movements and vehicle states.
- the temporal series information is captured from monitoring drivers during normal on-road driving.
- the temporal series information is captured through the driver-in- the-loop experimental platform 200, shown in Figure 2.
- the platform 200 may have any appropriate configuration to facilitate data acquisition of the desired input modes, and presently comprises a physical simulator 202 (e.g. Logitech G29), an image capture device or system for tracking eye movement 204 (e.g. an infrared eye tracker such as Tobii Pro), and a wired or wireless EEG headset.
- n-back tasks with varying difficulty may be employed to modulate cognitive workload levels objectively.
- the n-back task may be a visual-auditory mixed n-back task for regulating cognitive loads on drivers. This task can therefore reflect cognitive workload introduced by both visual and auditory information during driving.
- the tests may comprise secondary tasks with a varied amount of information that participants need to memorize and respond to, such as maintaining speed through traffic, driving from origin to destination, and others.
- These secondary tasks enable the system 200 to obtain three classes of driver cognitive workload, namely slight level, moderate level and intensive level, which correspond to ground truth labels - e.g. ternary ground truth labels for three input modes.
- the data may be pre-processed. This can be necessary to remove artifacts such as blink, facial and body movement artifacts from eye tracking data.
- Various techniques can be adopted for removing noise from signals, or removing signals, before extracting sub-band components from raw data. Band-pass filtering (low- and/or high-band) and notch filtering (to remove power supply noise) may restrict the band spectrum to relevant information - e.g. 1-30 Hz, and independent component analysis (ICA) may be used to reject artefact-induced signal components.
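As an illustration of the notch-filtering step, the following sketch implements a generic second-order IIR notch (the RBJ "Audio EQ Cookbook" biquad), applied to synthetic 50 Hz power-line noise and a 5 Hz in-band tone. The sampling rate, notch frequency and Q are assumed values, not taken from the patent.

```python
import math

def notch_coeffs(f0, fs, q=30.0):
    """RBJ biquad notch coefficients (a generic stand-in, not the patent's filter)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1.0, -2 * math.cos(w0), 1.0]
    a = [1 + alpha, -2 * math.cos(w0), 1 - alpha]
    return [bi / a[0] for bi in b], [ai / a[0] for ai in a]

def filt(b, a, x):
    # Direct-form-I biquad with zero initial state.
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0]*xn + b[1]*x1 + b[2]*x2 - a[1]*y1 - a[2]*y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 250.0
t = [i / fs for i in range(int(2 * fs))]
mains = [math.sin(2 * math.pi * 50 * ti) for ti in t]  # 50 Hz power-line noise
theta = [math.sin(2 * math.pi * 5 * ti) for ti in t]   # 5 Hz in-band tone
b, a = notch_coeffs(50.0, fs)

# Steady-state portion: the notch crushes 50 Hz but passes 5 Hz.
print(rms(filt(b, a, mains)[250:]), rms(filt(b, a, theta)[250:]))
```

Band-pass restriction to 1-30 Hz and ICA-based artefact rejection would, in practice, be done with a signal-processing library; only the notch idea is shown here.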
- ICA independent component analysis
- the temporal series information captured by the system 200 is transmitted or conveyed to a system 210 that employs method 100 to train a supervised learning model 102 for cognitive workload recognition.
- the model 104 comprises a main long short-term memory (LSTM) network 106, an auxiliary LSTM network 108 and a classifier layer 110.
- the model 104 is configured to output a predicted cognitive workload level of a vehicle driver in response to input of the temporal series information over a plurality of time steps.
- the method 100 (as also reflected in Figure 3) employs a sequence-to-sequence learning paradigm.
- the learning paradigm comprises, at each time step:
- step 106' updating the main LSTM network 106 to map the temporal series information to a sequence of hidden states
- step 108' updating the auxiliary LSTM network 108 to generate weights for the main LSTM network
- step 110' obtaining the predicted cognitive workload level by processing the hidden states and the weights through a classifier layer of the model.
- step 108' may be performed before step 106'.
- the driver cognitive load recognition is formulated as a supervised classification problem.
- the workload levels are adopted as labels, as shown in Figure 1.
- the tasks users/drivers are asked to perform will be assigned a predetermined workload level based on the anticipated difficulty or amount of attention required to successfully complete the task.
- an EEG measurement system may be mounted to a user/driver, and the user/driver may be in a vehicle with one or more image capture devices and sensors mounted to capture images (e.g. a video feed) of the driving environment, vehicle parameters - e.g. vehicle speed - and/or eye movements of the user/driver.
- the images may be cross-referenced to simulated tasks or otherwise processed to ascertain workload levels at multiple time intervals while driving.
- the present discussion will be made with reference to a simulated environment, but it will be appreciated that the same or similar teachings may be employed in respect of a real world environment.
- the temporal series information is multimodal, comprising electroencephalogram (EEG) signals, eye movements and vehicle states. Consequently, the dataset is given as: D = {(X^j, y^j)}, j = 1, ..., N (1)
- X^j denotes the multimodal temporal sequence of length l
- j indexes the j-th sample
- N is the total number of samples.
- x represents the feature vectors across the time steps, with x_i being the feature vector at the i-th time step.
- the features may describe one or more relationships between the multimodal input data and driver cognitive load.
- v = [Δv_x, Δv_y, Δa_x, Δa_y, Δv, γ] represents the instantaneous longitudinal and lateral velocities/accelerations of the vehicle with respect to the front one, the relative resultant velocity and the yaw rate, respectively. In some embodiments, only a proper subset of these quantities is required for assessing cognitive workload. Accordingly, x_i ∈ R^Dim, where Dim is the total dimension from concatenating all feature vectors in x. In the embodiment given above, Dim is 14 (w (4), m (4) and v (6)) and thus x_i ∈ R^14.
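The concatenation yielding a 14-dimensional feature vector can be illustrated directly; the numeric values below are hypothetical, and only the dimensions (4 + 4 + 6 = 14) follow the text.

```python
# Hypothetical per-time-step feature values for the three input modes.
w = [0.31, 0.22, 0.18, 0.29]                 # EEG band powers (4 features)
m = [0.5, 0.1, 2.3, 0.04]                    # eye-movement features (4 features)
v = [1.2, -0.3, 0.05, -0.01, 1.24, 0.002]    # relative velocities/accelerations,
                                             # resultant velocity, yaw rate (6 features)

x_i = w + m + v   # concatenated feature vector for one time step
Dim = len(x_i)
print(Dim)  # 14
```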
- the model itself comprises a main LSTM 106 that maps the temporal series information 108 to a sequence of hidden states 110.
- the model 104 further comprises at least one auxiliary LSTM network 112 and a classifier layer 114.
- the model further comprises an attention mechanism 116.
- the number of HyperLSTMs may correspond to the number of modal inputs - for example, three or four HyperLSTMs will be used for three or four modal inputs, respectively.
- the model 104 comprises a plurality of auxiliary LSTM networks, herein referred to as hyper long short-term memory (HyperLSTM) based modules each associated with a main LSTM, marked 106, 118, 120 for input modes EEG, eye movements and vehicle performance, respectively.
- Each HyperLSTM module comprises a LSTM network and a HyperLSTM network.
- the number of HyperLSTM networks or modules may be the same as the number of weights or hyperparameters to be dynamically learned. For example, for each input mode the hyperparameters may be the standard cell, input, output and forget gate values.
- the HyperLSTM, a variant of HyperNetworks, is an auxiliary LSTM network that is designed to dynamically learn hyperparameters, i.e., the weights of each main LSTM cell at each time step.
- the HyperLSTM-based module is a dual-network architecture that jointly captures time-series feature representations and adapts itself through dynamic hyperparameter learning from the multimodal information. This joint capturing of information assists with managing data variability among individual drivers.
- the dual-network architecture involves the HyperLSTM output being fed to the LSTM, in each HyperLSTM module.
- the update of the main LSTM network is denoted as: h_t = LSTM(x_t, h_{t-1}; W_t, U_t, b_t) (5), where the weight matrices W_t, U_t and biases b_t are generated dynamically at each time step.
- the input of the HyperLSTM network is the concatenation of the hidden state h_{t-1} of the main LSTM network (with reference to equations (4) and (5)) and the EEG signals w_t: x̂_t = [h_{t-1}; w_t]
- weights are functions of a set of embeddings, where the embeddings are linear projections of the hidden states of the relevant auxiliary LSTM network. More formulaically, weight matrices W*, U* and b* are functions of a set of embeddings z*_h, z*_x and z*_b, respectively, which are linear projections of the hidden states ĥ_t of the HyperLSTM cells: z* = P* ĥ_t, where the embedding size N_z can be set to any desired value, based on memory usage requirements. For example, N_z can be set to 16 to reduce the memory usage required by the ARecNet. N_ĥ is the hidden size of the HyperLSTM. At each time step, the weight matrices of the main LSTM cells are dynamically formulated.
- the dynamic formulation may follow: W*(z*) = ⟨W*_z, z*⟩ (9), where ⟨·,·⟩ denotes the tensor dot product. Accordingly, the last hidden state of the main LSTM (106") is obtained as the representation of the EEG information. Similarly, learned representations of eye movements and vehicle states are mapped, the last hidden states of the respective main LSTMs being labelled 108" and 110", respectively. For this reason, the input modes corresponding to representations w, m and v have been replaced with k in various formulae, to indicate that k may be any one of the input modes.
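The tensor dot product in equation (9) can be sketched as follows. The sizes N_h and N_z are toy values (the text suggests N_z = 16 in practice), and the random tensor stands in for learned parameters.

```python
import random
random.seed(1)

Nh, Nz = 3, 2  # main hidden size and embedding size (toy values)

# Static 3-D parameter tensor of shape (Nz, Nh, Nh); the dynamic weight is
# its tensor dot product with the embedding z, as in equation (9).
W_z = [[[random.uniform(-1, 1) for _ in range(Nh)] for _ in range(Nh)]
       for _ in range(Nz)]

def dynamic_weight(z):
    # <W_z, z>: contract the embedding dimension, yielding an (Nh x Nh) matrix.
    return [[sum(z[k] * W_z[k][i][j] for k in range(Nz)) for j in range(Nh)]
            for i in range(Nh)]

z_t = [0.5, -0.2]          # embedding: a linear projection of the HyperLSTM hidden state
W_t = dynamic_weight(z_t)  # time-step-specific weights for the main LSTM cell
print(len(W_t), len(W_t[0]))  # 3 3
```

Because the contraction is linear in z, scaling the embedding scales the generated weights, which is how the auxiliary network steers the main cell per time step.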
- the outputs of the last hidden state for each input mode, 106", 108", 110" is given to the classifier layer 114.
- the outputs may be concatenated.
- the learning or feature representations (collectively 122) obtained by each HyperLSTM-based module (h in equation (10)) undergo an equidimensional projection through a fully connected layer 124 (H_z in equation (10)) - i.e. the input dimensions are the same, and the output dimensions are consistent with the input dimensions.
- the results are concatenated at 126 as: H = [FC(h^w); FC(h^m); FC(h^v)] (10), wherein the fully connected layer weights are parameters to be learned. Then, similarity scores of the feature representations of different information sources are computed using an attention matrix 128, formulated as: M_att = softmax(H H^T) (11), where M_att is utilized to enhance useful representations through increasing their scores automatically.
- M_att can be regarded as a weight matrix. M_att and the representations 112 will be multiplied, as shown by the connecting path between the two in Figure 1.
- the hidden states are integrated by an integration layer, presently a max pooling layer 130, such that: h_att = maxpool(M_att H) (12), where h_att ∈ R^{N_h} denotes the attention-based hidden state with strengthened feature representations.
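The attention-and-pooling steps of equations (10)-(12) can be sketched as follows, assuming dot-product similarity with a row-wise softmax; the three per-mode representation vectors are hypothetical values.

```python
import math

# Three per-mode representations after the equidimensional projection.
H = [[0.2, 0.8, 0.1],   # EEG
     [0.3, 0.7, 0.0],   # eye movements
     [0.9, 0.1, 0.4]]   # vehicle states

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

# Similarity scores between modes, row-softmaxed into an attention matrix.
scores = [[sum(a * b for a, b in zip(H[i], H[j])) for j in range(3)]
          for i in range(3)]
M_att = [softmax(row) for row in scores]

# Weight the representations, then max-pool across modes into one hidden state.
weighted = [[sum(M_att[i][j] * H[j][d] for j in range(3)) for d in range(3)]
            for i in range(3)]
h_att = [max(weighted[i][d] for i in range(3)) for d in range(3)]
print([round(x, 3) for x in h_att])
```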
- the ARecNet performs a nonlinear projection through a classifier layer 114: ŷ = softmax(W_c h_att + b_c) (14), where W_c and b_c are parameters to be learned.
- the predicted cognitive workload level is obtained, being either 0, 1 or 2, corresponding to slight, moderate or intensive cognitive workload.
- the classifier layer 114 may have any appropriate architecture.
- the classifier layer comprises a fully connected layer followed by a softmax activation layer that produces the predicted label ŷ.
- the classifier layer 114 in Figure 1 further comprises a fully connected layer and a rectified linear unit, the output of which is fed into a second fully connected layer and from there into the softmax layer.
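A minimal sketch of a fully-connected → ReLU → fully-connected → softmax classifier of this kind, with randomly initialised stand-in parameters (dimensions are illustrative, not the patent's):

```python
import math, random
random.seed(2)

def rand_mat(r, c):
    return [[random.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(r)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

W1, W2 = rand_mat(5, 3), rand_mat(3, 5)  # FC -> ReLU -> FC -> softmax

def classify(h_att):
    return softmax(matvec(W2, relu(matvec(W1, h_att))))

probs = classify([0.4, 0.9, 0.2])
level = max(range(3), key=lambda k: probs[k])  # 0: slight, 1: moderate, 2: intensive
print(level, round(sum(probs), 6))
```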
- Label smoothing may be performed using any appropriate method.
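One common method, shown here as an assumed choice rather than the patent's, is uniform label smoothing over the three workload classes:

```python
def smooth_label(cls, num_classes=3, eps=0.1):
    """Uniform label smoothing: a one-hot target mixed with the uniform distribution."""
    return [(1 - eps) * (1.0 if k == cls else 0.0) + eps / num_classes
            for k in range(num_classes)]

print(smooth_label(1))  # ≈ [0.033, 0.933, 0.033]
```

The smoothed targets still sum to one, but the classifier is discouraged from producing over-confident predictions for a single workload level.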
- Slight level: only the primary task is required to be accomplished, i.e., participants need to avoid other vehicles and reach the destination. Moderate level: in addition to avoiding all obstacles, participants need to recall the colour category of the previous obstacle and press the corresponding button as they drive past a new one.
- Intensive level: apart from the primary and visual tasks, participants also need to listen to a pre-recorded series of 15 letters separated by approximately 4 second intervals and count the number of times two identical letters appeared in pairs in the sequence, e.g., "H, H".
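The pair-counting rule of the auditory task can be expressed directly; the example sequence is illustrative, not one used in the experiments.

```python
def count_pairs(seq):
    # Count positions where a letter immediately repeats (a 1-back match).
    return sum(1 for a, b in zip(seq, seq[1:]) if a == b)

print(count_pairs(["H", "A", "H", "H", "B", "B"]))  # 2
```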
- the driver workload dataset extracted as set out with reference to Figure 2 can be migrated to both the performance evaluation of other recognition approaches and extended studies involving the cognitive workload, such as driving authority allocation and takeover strategies design, etc.
- In human-machine cooperative driving, the driver's authority is usually determined based on the driver's workload. For a very high driver workload, the driving authority can be zero. Consequently, to ensure safety, the driver's inputs will not be executed.
- the range of driver authority can be [0, 1]. 0 indicates that the vehicle has been taken over by the machine, and 1 means that the vehicle is completely controlled by the human.
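A hypothetical mapping from the predicted workload level to an authority value in [0, 1], consistent with the endpoints described above (this linear form is an assumption, not a mapping specified by the patent):

```python
def driving_authority(workload_level):
    """Hypothetical linear mapping from predicted level (0/1/2) to authority in [0, 1]."""
    return 1.0 - workload_level / 2.0

print([driving_authority(k) for k in (0, 1, 2)])  # [1.0, 0.5, 0.0]
```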
- the dataset is multimodal, presently containing three types of information, i.e., EEG signals, eye gaze and vehicle states.
- the dataset contains multiple scenarios - lighting conditions, colours, speeds, obstacles, audio and/or visual tasks.
- the dataset can, for example, reveal the influence of varied visibility on driver workload recognition.
- the dataset is multi-sensory. For example, visual-auditory mixed stimuli can be used, requiring drivers to respond to visual information, which reflects the visual-induced cognitive workload in the real world, and audio stimuli, which reflects auditory activities such as voice navigation and phone calls during practical driving.
- Pre-processing is performed, as set out with reference to Figure 2, to remove artifacts from the data.
- an activity power spectrum of three typical components and the corresponding scalp topographies can be produced as shown in Figure 4.
- the artifact in image (a) of Figure 4 is produced by eye activities such as blinking, in which high power at low frequencies is concentrated close to the eyes. Muscle artifacts are also evident, as shown in image (b) of Figure 4, the muscle artefact having relatively high power at high frequencies (20-30 Hz) with a localized distribution on the scalp topography.
- Image (c) of Figure 4 represents the normal component generated by brain-related activities. Image (c) is therefore adopted to calculate the power of various frequency bands.
- the recognition performance of the present methodology can be evaluated through various metrics, including average accuracy, precision (Pr), recall (Re) and F1 score, which are formulated as: Pr = TP/(TP + FP), Re = TP/(TP + FN), F1 = 2·Pr·Re/(Pr + Re), with average accuracy being the proportion of correctly classified samples.
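These metrics can be computed from scratch as below, macro-averaged over the three workload classes; the example label vectors are hypothetical.

```python
def macro_metrics(y_true, y_pred, num_classes=3):
    """Overall accuracy plus macro-averaged precision, recall and F1."""
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    prs, res, f1s = [], [], []
    for k in range(num_classes):
        tp = sum(t == k and p == k for t, p in zip(y_true, y_pred))
        fp = sum(t != k and p == k for t, p in zip(y_true, y_pred))
        fn = sum(t == k and p != k for t, p in zip(y_true, y_pred))
        pr = tp / (tp + fp) if tp + fp else 0.0
        re = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * pr * re / (pr + re) if pr + re else 0.0
        prs.append(pr); res.append(re); f1s.append(f1)
    n = num_classes
    return acc, sum(prs) / n, sum(res) / n, sum(f1s) / n

print(macro_metrics([0, 1, 2, 1, 0, 2], [0, 1, 1, 1, 0, 2]))
```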
- MTS-CNN variant of a CNN-based architecture
- DecNet variant of an LSTM-based network
- CNN-LSTM model CNN-LSTM model
- m-HyperLSTM variant of HyperNetworks which uses only one HyperLSTM-based module
- the present methodology can effectively capture and strengthen useful time-series feature representations through HyperLSTM-based modules and a cross-attention mechanism.
- the designed model was trained using an Adam optimizer, with a desired learning rate - e.g. 0.001.
- the batch size is selected to obtain a trade-off between the training time and model generalization ability (e.g. batch size of 64).
- the recognition accuracies and standard deviations of the present methodology with varied time-series information and historical horizons under typical driving scenarios are shown in Figure 5.
- vehicle states clearly have a lower influence on cognitive workload recognition than physiological and visual information. Since extra mental workload is generally required to ensure safe driving with decreased visibility, classification becomes more difficult as average cognitive workload increases (e.g. in rough inverse proportion to visibility).
- the multimodal information fusion-based ARecNet has a relatively stable recognition performance in varied environments, and has a lower standard deviation in most cases, indicating better stability.
- each confusion matrix displays the average result of five-fold cross validation.
- the rows show the classification accuracy of each workload level (slight, moderate and intensive) with respect to the predicted values, and the rightmost column shows the recognition results with respect to the ground-truth labels. Recognition accuracy increases with extended historical horizons.
- the macro-average curves are nearly the same in all of images (a), (b) and (c), indicating superior comprehensive recognition performance of the present methodology in varied weather conditions.
- the optimal threshold point is at the tangent of the corresponding precision-recall curve and the F1-score curve.
- Table I: ablation study on the recognition accuracy of HyperLSTM and cross-attention with varied historical horizons in different driving scenarios.
- HyperLSTM greatly improved the model performance in all cases, further indicating its superior feature capturing ability compared to conventional LSTM models.
- the influence of the cross-attention mechanism remains inconspicuous in Table I, especially for HyperLSTM-based models.
- a paired t-test was also employed to determine the statistical significance of cross-attention, with five-fold cross validation performed 50 times with the same sequence of random seeds, and statistical results presented in Figure 8 - a single asterisk (*) and double asterisks (**) represent p-values lower than 0.05 and 0.01, respectively.
- Cross attention provided no statistically relevant improvement at a 1 s historical horizon, but provided significantly better performance for longer horizons - e.g. 4 s.
- these phenomena demonstrate that the cross-attention mechanism is better at strengthening useful learning representations of longer-sequence multimodal information.
- Embodiments of the present methodology seek to address the two limitations of most previous driver workload recognition models in practical applications, namely single-modality indicators, and time-series signal distortion.
- the present, decision-level multimodal information fusion architecture can employ a cross-attention mechanism to strengthen useful feature representations captured by the HyperLSTM-based module from individual information sources.
- Experimental results demonstrate that the proposed models are advantageous over other baseline approaches in terms of recognition accuracy and robustness.
- the data collection methodology provides a generic driver monitoring framework for advanced driving assistance systems (ADAS).
- ADAS advanced driving assistance systems
- with a minor alteration to the structure of the model - e.g. adding or removing HyperLSTM modules (each containing one or more HyperLSTM networks and a main LSTM network) according to the number of information sources/input modes, such as road types and traffic condition monitoring - it can be utilized for driver distraction/fatigue detection.
- accurate driver states recognition can provide a decision-making basis for the mutual takeover of drivers and vehicles, which is beneficial to other ADAS technologies such as lane departure warning systems, traffic jam assistant systems, and others.
- the present framework can also be extended into specific application fields involving multiple biosensors, for example, athlete health stress estimation and air traffic controller state monitoring.
- An end-user computing device or system referred to in this disclosure comprises a smartphone device, a tablet device, a laptop device or the like that is used by an end user to train a supervised learning model for cognitive workload recognition, or to implement that model for real-time cognitive workload recognition.
- Figure 9 illustrates a schematic diagram of one such device, computing device 900.
- the device 900 comprises one or more processing units 910 with access to one or more pre-processors 902 (if used) for pre-processing input data - e.g. from EEG, vehicle behaviour and/or eye tracking - and has a communication channel to a camera/external device(s) 904 for collecting input data.
- the device 900 further comprises auxiliary network module(s) 906 each comprising auxiliary network(s) 908 and a main LSTM network 910, an attention mechanism 912 and a classifier layer 914 (the term "classifier layer" may be used to refer to a single layer or multiple layers in a machine learning network, depending on the context used herein).
- the external device(s) 904 may be integral with, or unitary with, system 900, or may be separate.
- System 900 may be in communication (e.g. over network 918) with one or more server systems 916 that serve as a back-end system for an application executing on the system 900.
- server system 916 may be a backend application server of a relevant application for input data evaluation executing on the system 900.
- the server system 916 may transmit code or information to the system 900 and may receive information from system 900 obtained after pre-processing input data captured by external device(s) 904.
- the code running the methodology, and/or input data whether before or after pre-processing, may be stored in memory 920.
Abstract
Disclosed is a method to train a supervised learning model for cognitive workload recognition by employing a sequence-to-sequence learning paradigm. The model comprises a main long short-term memory (LSTM) network, an auxiliary LSTM network and a classifier layer. The model is configured to output a predicted cognitive workload level of a user in response to input of temporal series information related to the user over a plurality of time steps. It does so by, at each said time step: updating the main LSTM network to map the temporal series information to a sequence of hidden states; updating the auxiliary LSTM network to generate weights for the main LSTM network; and obtaining the predicted cognitive workload level by processing the hidden states and the weights through the classifier layer.
Description
COGNITIVE WORKLOAD RECOGNITION FROM TEMPORAL SERIES INFORMATION
Technical Field
The present invention relates, in general terms, to systems and methods for training supervised learning models for cognitive workload recognition.
Background
Driver workload inference is significant for the design of intelligent human-machine cooperative driving schemes. Such inference allows systems to alert drivers before potentially dangerous manoeuvres are performed and achieve a safer control transition. However, pattern variations among individual drivers and sensor artefacts pose great challenges to existing cognitive workload recognition approaches.
Various advanced functions have been developed for intelligent vehicles to improve the driving experience and convenience, but each has its drawbacks. For example, in-vehicle infotainment (IVI) such as navigation systems provide real-time guidance to drivers but visual-manual tasks and auditory-verbal activities increase the driver's mental workload and are secondary to driving, thereby increasing the risk of distraction.
Various approaches to obtaining cognitive load levels have been studied. The most straightforward are subjective measures, requiring drivers to conduct self-evaluation by completing questionnaires after driving tasks. Subjective measuring approaches typically provide cumulative estimations of the cognitive workload based on drivers' memories; while intuitive, these methods are uncertain since they are susceptible to memory bias.
Studies have been performed to estimate cognitive workload based on physiological and vehicle indicators. Such methods typically require extended sampling windows (e.g. 2-5 min recordings for heart rate), or are susceptible to noise from changes in driving conditions (e.g. machine vision-based techniques for eye tracking deteriorate in low light). Vehicle indicators such as steering angles, vehicle speeds, and accelerations can be used, but are insensitive to low workload levels.
Many learning-based approaches have been developed for the recognition of driver cognitive load from different measured signals. Some such approaches employ deep machine learning technologies. These machine learning-based methods commonly require manual feature extraction from raw data. However, these methods commonly assess cognitive workload based on individual input modes - e.g. eye tracking or vehicle behaviour - and thereby fail to take advantage of the benefits of multi-modal information.
It would be desirable to overcome or ameliorate at least one of the above-described problems, or at least to provide a useful alternative.
Summary
To address the aforementioned challenges, proposed herein is an Attention- enabled Recognition Network (ARecNet) for recognizing driver cognitive load in real time using multiple input modes - e.g. electroencephalogram (EEG) signals, eye movements and external states. An "external state" may be a "vehicle state" in embodiments applied to cognitive workload recognition for vehicle drivers. Previous machine learning technologies consider single input modalities which cannot fully exploit the complementarity of multimodal data in assessing cognitive workload, due to the information redundancy. In contrast, ARecNet employs a feature-level fusion architecture across input modes, to recognize driver cognitive workload.
Also disclosed is a system that trains a supervised learning model for cognitive workload recognition, wherein the system comprises a plurality of processors configured to train the model by employing a sequence-to-sequence learning paradigm, wherein the model comprises a main long short-term memory (LSTM) network, an auxiliary LSTM network and a classifier layer; wherein the model is configured to output a predicted cognitive workload level of a user in response to input of temporal series information related to the user over a plurality of time steps. It does so by, at each said time step: updating the main LSTM network to map the temporal series information to a sequence of hidden states; updating the auxiliary LSTM network to generate weights for the main LSTM network; and obtaining the predicted cognitive workload level by processing the hidden states and the weights through a classifier layer of the model.
Relevantly, ARecNet may embody a method to train a supervised learning model for cognitive workload recognition by employing a sequence-to-sequence learning paradigm. The model comprises a main long short-term memory (LSTM) network, an auxiliary LSTM network and a classifier layer. The model is configured to output a predicted cognitive workload level of a user in response to input of temporal series information related to the user over a plurality of time steps. It does so by, at each said time step: updating the main LSTM network to map the temporal series information to a sequence of hidden states; updating the auxiliary LSTM network to generate weights for the main LSTM network; and obtaining the predicted cognitive workload level by processing the hidden states and the weights through the classifier layer.
The temporal series information in the phrase "input of temporal series information related to the user over a plurality of time steps", and similar, refers to information relating to the user themselves (e.g. of the driver, where the user is a driver) and information relating to external states (e.g. of a vehicle, such as a neighbouring vehicle or a vehicle being driven by the user, where the user is a driver).
Advantageously, embodiments of the present invention establish an attention- enabled decision-level fusion architecture to infer driver cognitive workload levels. This suggests the availability of a viable generic technique for capturing useful feature representation from time-series multimodal information.
Advantageously, embodiments of the present invention involve constructing a novel driver workload dataset, including multimodal signals and multiple driving scenarios.
Brief description of the drawings
Embodiments of the present invention will now be described, by way of non-limiting example, with reference to the drawings in which:
Figure 1 is a schematic overview of an attention-enabled cognitive workload recognition method and mode, with multimodal information fusion in accordance with present teachings;
Figure 2 is a platform for multi-modal information capture, which can be applied as a driver-in-the-loop platform;
Figure 3 illustrates a method to train a supervised learning model for cognitive workload recognition, in accordance with present teachings;
Figure 4 shows activity power spectra for typical components and the corresponding scalp topographies, in which image (a) shows eye artefact, (b) shows muscle artefact and (c) is normal;
Figure 5 shows the results of recognition accuracy assessment of the present methodology with varied temporal series information and historical horizons (image (a) tw = 1 s, (b) tw = 2 s, (c) tw = 4 s) in typical driving scenarios;
Figure 6 shows confusion matrices for driver cognitive workload recognition with different historical horizons in typical driving scenarios, in which image (a) is sunny noon, (b) is foggy dusk, and (c) is rainy night;
Figure 7 provides the precision-recall curves of cognitive workload level recognition with the historical horizon tw = 1 s in different driving scenarios, being (a) sunny noon, (b) foggy dusk, and (c) rainy night, where shaded areas represent the extrema across the 5-fold cross validation;
Figure 8 is the outcome of statistical tests of decision-level fusion-based approaches with HyperLSTM modules; and
Figure 9 is a block diagram of a system for cognitive workload recognition (estimation).
Detailed description
Disclosed is ARecNet, an attention-enabled recognition network with a decision-level fusion architecture for cognitive workload estimation. Specifically, the present methodology employs a cross-attention mechanism to enhance useful feature representations learned by hyper long short-term memory (HyperLSTM) based modules from time-series multimodal information, e.g., EEG signals, eye movements, and external states or behaviours (e.g. vehicle states or vehicle behaviour or, where the user is an athlete, the state or behaviour of the user's body and/or competing athletes around the user - such as on a running track). Also disclosed is the construction of a novel dataset containing multiple driving scenarios for evaluating model performance across different historical horizons and decision thresholds.
The description below will be made with reference to a driver (user) and vehicle states, for illustration purposes only. Without loss of generality, the same
teachings apply to other types of user such as an athlete, where the external states and external behaviours, being "vehicle states" and "vehicle behaviours", can be substituted for the 'states' and 'behaviours' of the athlete and/or one or more neighbouring athletes (on the same team - e.g. in volleyball - or different teams - e.g. in a competitive running event), or such as an air traffic controller (user) where the "vehicle states" and "vehicle behaviours" can be replaced by the "aeroplane states" and "aeroplane behaviours" of the aeroplane currently being directed by the air traffic controller and/or aeroplanes other than the aeroplane currently being directed by the air traffic controller.
Figure 1 schematically represents a method 100, for training a supervised learning model for cognitive workload recognition, in the context of multi-modal input information acquisition 102. The input information comprises temporal or time series information related to the vehicle and/or driver over a plurality of time steps. This information can be acquired through any appropriate mechanism - e.g. the information can be extracted from a database or recorded and used for real-time training.
While the temporal series information may have a single input mode - e.g. EEG, eye tracking/movement or vehicle behavior/performance - the present temporal series information is multi-modal, thus having multiple input modes. Each input mode is a respective one of EEG signals, eye movements and vehicle states.
In some embodiments, the temporal series information is captured from monitoring drivers during normal on-road driving. However, in the present embodiment the temporal series information is captured through the driver-in-the-loop experimental platform 200, shown in Figure 2. The platform 200 may have any appropriate configuration to facilitate data acquisition of the desired input modes, and presently comprises a physical simulator 202 (e.g. Logitech G29), an image capture device or system for tracking eye movement 204 (e.g.
an infrared eye tracker such as Tobii Pro), and a wired or wireless EEG headset
206 (e.g. EMOTIVE EPOC Flex, with 32 channels).
Data are captured for different weather and lighting conditions, such as sunny noon, foggy dusk, and rainy night, to be able to learn cognitive workload information across various levels of visibility and stress. Since human mental workload cannot be directly observed, the method may involve collecting the input temporal series information by subjecting one or more drivers to tests of varying difficulty. For example, n-back tasks with varying difficulty may be employed to modulate cognitive workload levels objectively. The n-back task may be a visual-auditory mixed n-back task for regulating cognitive loads on drivers. This task can therefore reflect cognitive workload introduced by both visual and auditory information during driving.
In each driving environment, or under each set of driving conditions, the tests may comprise secondary tasks with varying amounts of information that participants need to memorize and respond to, such as maintaining speed through traffic, driving from origin to destination, and others. These secondary tasks enable the system 200 to obtain three classes of driver cognitive workload, namely slight level, moderate level and intensive level, which correspond to ground truth labels - e.g. ternary ground truth labels for three input modes.
The data may be pre-processed. This can be necessary to remove artefacts such as blink, facial and body movement artefacts from the captured data. Various techniques can be adopted for removing noise from signals, or removing signals, before extracting sub-band components from raw data. Band-pass filtering (low- and/or high-pass) and notch filtering (to remove power supply noise) may restrict the band spectrum to relevant information - e.g. 1-30 Hz - and independent component analysis (ICA) may be used to reject artefact-induced signal components.
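By way of non-limiting illustration, such pre-processing may be sketched as follows. The sampling rate, filter orders and notch frequency are assumptions (the text does not specify them), and the ICA artefact-rejection step is only noted in a comment:

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 128  # assumed EEG sampling rate (Hz); not specified in the text

def preprocess_eeg(raw, fs=FS, band=(1.0, 30.0), mains_hz=50.0):
    """Band-pass raw EEG to 1-30 Hz and notch out power-supply noise.

    raw: array of shape (n_channels, n_samples). Artefact rejection by
    independent component analysis would follow (e.g. with a library
    such as MNE) and is omitted here.
    """
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, raw, axis=-1)
    bn, an = iirnotch(mains_hz, Q=30.0, fs=fs)
    return filtfilt(bn, an, filtered, axis=-1)

rng = np.random.default_rng(0)
eeg = rng.standard_normal((32, 4 * FS))  # 32 channels, 4 s of synthetic data
clean = preprocess_eeg(eeg)
```

Zero-phase filtering (`filtfilt`) is used so the pre-processing introduces no temporal lag into the time-series features.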
The temporal series information captured by the system 200 is transmitted or conveyed to a system 210 that employs method 100 to train a supervised learning model 102 for cognitive workload recognition.
With further reference to Figure 1, the model 104 comprises a main long short-term memory (LSTM) network 106, an auxiliary LSTM network 108 and a classifier layer 110. The model 104 is configured to output a predicted cognitive workload level of a vehicle driver in response to input of the temporal series information over a plurality of time steps. Via model 104, the method 100 (as also reflected in Figure 3) employs a sequence-to-sequence learning paradigm. The learning paradigm comprises, at each time step:
- step 106' - updating the main LSTM network 106 to map the temporal series information to a sequence of hidden states
- step 108' - updating the auxiliary LSTM network 108 to generate weights for the main LSTM network
- step 110' - obtaining the predicted cognitive workload level by processing the hidden states and the weights through a classifier layer of the model.
The order of the steps may be changed. For example, step 108' may be performed before step 106'.
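By way of non-limiting illustration, the three per-time-step updates may be sketched in NumPy as follows. All dimensions and the random stand-in parameters are assumptions, and the auxiliary network here generates the main LSTM's weights by a simplified row-wise scaling rather than the full embedding-based formulation given with equation (9) below:

```python
import numpy as np

# Illustrative dimensions: 14-dim features, assumed hidden sizes, 3 classes.
N_IN, N_H, N_AUX, N_CLASSES = 14, 32, 16, 3
rng = np.random.default_rng(1)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h, c, W):
    """Generic LSTM cell; W maps [x; h] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

# Static parameters: base main-LSTM weights, the auxiliary LSTM's own weights,
# a projection from its hidden state to per-row weight scales, and a classifier.
W_main0 = 0.1 * rng.standard_normal((4 * N_H, N_IN + N_H))
W_aux = 0.1 * rng.standard_normal((4 * N_AUX, N_IN + N_H + N_AUX))
P_scale = 0.1 * rng.standard_normal((4 * N_H, N_AUX))
W_cls = 0.1 * rng.standard_normal((N_CLASSES, N_H))

x_seq = rng.standard_normal((8, N_IN))  # 8 time steps of 14-dim features
h = c = np.zeros(N_H)
h_aux = c_aux = np.zeros(N_AUX)
for x_t in x_seq:
    # auxiliary LSTM sees [x_t; h_{t-1}] and updates its own state first
    h_aux, c_aux = lstm_step(np.concatenate([x_t, h]), h_aux, c_aux, W_aux)
    # it generates (row-wise scaled) weights for the main LSTM at this step
    W_main = (1.0 + P_scale @ h_aux)[:, None] * W_main0
    # main LSTM maps the input to the next hidden state
    h, c = lstm_step(x_t, h, c, W_main)

# classifier layer: softmax over the last hidden state
logits = W_cls @ h
probs = np.exp(logits - logits.max())
probs /= probs.sum()
level = int(np.argmax(probs))  # 0 = slight, 1 = moderate, 2 = intensive
```

The loop above performs the auxiliary update before the main update; as noted, the order of these steps may be changed.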
To make use of the model 104, driver cognitive load recognition is formulated as a supervised classification problem. In this problem, the workload levels are adopted as labels, as shown in Figure 1. For data capture from a simulated environment, the tasks users/drivers are asked to perform will be assigned a predetermined workload level based on the anticipated difficulty or amount of attention required to successfully complete the task. For data capture in a real-world environment, an EEG measurement system may be mounted to a user/driver, and the user/driver may be in a vehicle with one or more image capture devices and sensors mounted to capture images (e.g. a video feed) of the driving environment, vehicle parameters - e.g. vehicle speed - and/or eye movements of the user/driver. The images may be cross-referenced to
simulated tasks or otherwise processed to ascertain workload levels at multiple time intervals while driving. For illustration purposes, the present discussion will be made with reference to a simulated environment, but it will be appreciated that the same or similar teachings may be employed in respect of a real world environment. In this embodiment, the temporal series information is multimodal, comprising electroencephalogram (EEG) signals, eye movements and vehicle states. Consequently, given the dataset as:
$$D = \big\{(X^j, y^j)\big\}_{j=1}^{N} \tag{1}$$
where $X^j$ denotes a multimodal temporal sequence of length $l$, $y^j$ is the corresponding driver cognitive workload, which is categorized into three levels, i.e., slight ($y = 0$), moderate ($y = 1$) and intensive ($y = 2$), $j$ indexes the $j$th sample, and $N$ is the total number of samples. For each sample:
$$X^j = [x_1, x_2, \ldots, x_l] \tag{2}$$
wherein $X^j$ represents the feature vectors across the time steps, with $x_i$ being the feature vector at the $i$th time step. The features may describe one or more relationships between the multi-modal input data and driver cognitive load. The temporal series information consists of EEG signals, eye movements and vehicle motion states, denoted by $X = [W, M, V]$ and $x = [w, m, v]$, respectively. Specifically, $w = [w_\delta, w_\theta, w_\alpha, w_\beta]$ is the power of four typical EEG frequency bands, i.e., delta (1-4 Hz), theta (4-8 Hz), alpha (8-13 Hz) and beta (13-30 Hz); greater, fewer or different frequency bands may be used. $m = [m_{Cx}, m_{Cy}, m_{Sx}, m_{Sy}]$ comprises the horizontal/vertical coordinates and speeds of the eye gaze; in some embodiments, only the coordinates may be provided and in other embodiments only the speeds of eye gaze may be provided. $v = [\Delta v_x, \Delta v_y, \Delta a_x, \Delta a_y, \Delta v, \gamma]$ represents the instantaneous longitudinal and lateral velocities/accelerations of the vehicle with respect to the front one, the relative resultant velocity and the yaw rate, respectively; in some embodiments, only a proper subset of these velocities is required for assessing cognitive workload. Accordingly, $x \in \mathbb{R}^{Dim}$, where $Dim$ is the total dimension from concatenating all feature vectors in $x$. In the embodiment given above, $Dim$ is 14 ($w$ (4), $m$ (4) and $v$ (6)).
The model then learns to generate the corresponding workload level $\hat{y}$ based on the temporal series information:
$$\hat{y} = f(X) \tag{3}$$
where $\hat{y} \in \{0, 1, 2\}$ is the label of cognitive workload levels.
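By way of non-limiting illustration, a single per-time-step feature vector may be assembled as below; the numerical values are dummies, not measured data:

```python
import numpy as np

# Illustrative per-time-step feature vector x = [w, m, v]; values are dummies.
w = np.array([0.8, 0.5, 0.3, 0.2])              # delta/theta/alpha/beta band power
m = np.array([0.12, -0.20, 5.1, 3.4])           # gaze x/y coordinates and speeds
v = np.array([0.5, 0.0, 0.1, 0.0, 0.5, 0.01])   # relative velocities/accelerations, yaw rate
x = np.concatenate([w, m, v])
Dim = x.shape[0]  # 4 + 4 + 6 = 14
```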
With reference to Figure 1, the model itself comprises a main LSTM 106 that maps the temporal series information 108 to a sequence of hidden states 110. The model 104 further comprises at least one auxiliary LSTM network 112 and a classifier layer 114. In some embodiments the model further comprises an attention mechanism 116. The number of HyperLSTMs may correspond to the number of modal inputs - for example, three or four HyperLSTMs will be used for three or four modal inputs, respectively.
As shown, the model 104 comprises a plurality of auxiliary LSTM networks, herein referred to as hyper long short-term memory (HyperLSTM) based modules, each associated with a main LSTM and marked 106, 118 and 120 for the input modes EEG, eye movements and vehicle performance, respectively. Each HyperLSTM module comprises an LSTM network and a HyperLSTM network. The number of HyperLSTM networks or modules may be the same as the number of weights or hyperparameters to be dynamically learned. For example, for each input mode the hyperparameters may be the standard cell, input, output and forget gate values.
The HyperLSTM, a variant of HyperNetworks, is an auxiliary LSTM network that is designed to dynamically learn hyperparameters, i.e., the weights of each main LSTM cell at each time step. The HyperLSTM-based module is a dual-network architecture that jointly captures time-series feature representations and adapts itself through dynamic hyperparameter learning from the multimodal information. This joint capturing of information assists with managing data variability among individual drivers. The dual-network architecture involves the HyperLSTM output being fed to the LSTM, in each HyperLSTM module.
Regarding using the HyperLSTM-based module for mapping EEG signals (the description for other input modes is the same, but with the feature vector for the relevant input mode - so $h_t^k$ refers to the hidden state at time $t$ in temporal series $k$ where, for the input modes mentioned above, $k$ is one of $w$, $m$ and $v$): given EEG temporal sequences $W$, for all $t \in \{1, 2, \ldots, l\}$, the main LSTM network maps the time-series information to a sequence of hidden states $[h_1^w, h_2^w, \ldots, h_l^w]$ via the following updates:
$$\begin{aligned} i_t &= \sigma(W_i w_t + I_i h_{t-1}^w + b_i), & f_t &= \sigma(W_f w_t + I_f h_{t-1}^w + b_f),\\ o_t &= \sigma(W_o w_t + I_o h_{t-1}^w + b_o), & \tilde{c}_t &= \tanh(W_c w_t + I_c h_{t-1}^w + b_c),\\ c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, & h_t^w &= o_t \odot \tanh(c_t) \end{aligned} \tag{4}$$
where $\sigma$ and $\tanh$ are the sigmoid function and hyperbolic tangent function, respectively, $\odot$ represents the element-wise product, $W_* \in \mathbb{R}^{N_h \times N_w}$, $I_* \in \mathbb{R}^{N_h \times N_h}$ and $b_* \in \mathbb{R}^{N_h}$ are parameters generated by the HyperLSTM, $*$ denotes one of the $\{i, f, o, c\}$ gates, $N_h$ is the hidden size of the main LSTM network and $N_w = 4$ is the number of EEG features. The update of the main LSTM network is denoted as:
$$h_t^w, c_t = \mathrm{LSTM}\big(h_{t-1}^w, w_t; W_*, I_*, b_*\big) \tag{5}$$
The input of the HyperLSTM network is the concatenation of the hidden state from the main LSTM network (per equations (4) and (5)) and the EEG signals $w_t$: $\hat{x}_t = [h_{t-1}^w; w_t]$.
In the described methodology, weights are functions of a set of embeddings, where the embeddings are linear projections of the hidden states of the relevant auxiliary LSTM network. More formulaically, the weight matrices $W_*$, $I_*$ and $b_*$ are functions of a set of embeddings $z_x^*$, $z_h^*$ and $z_b^*$, respectively, which are linear projections of the hidden states $\hat{h}_{t-1}$ of HyperLSTM cells:
$$z_x^* = P_x^*\,\hat{h}_{t-1}, \qquad z_h^* = P_h^*\,\hat{h}_{t-1}, \qquad z_b^* = P_b^*\,\hat{h}_{t-1} \tag{6-8}$$
where $z_x^*, z_h^*, z_b^* \in \mathbb{R}^{N_z}$. The embedding size $N_z$ can be set to any desired value, based on memory usage requirements. For example, $N_z$ can be set to 16 to reduce the memory usage required by the ARecNet. $N_{\hat{h}}$ is the hidden size of the HyperLSTM. At each time step, the weight matrices of main LSTM cells are dynamically formulated. The dynamic formulation may follow:
$$W_* = \langle T_x^*, z_x^* \rangle, \qquad I_* = \langle T_h^*, z_h^* \rangle, \qquad b_* = \langle T_b^*, z_b^* \rangle \tag{9}$$
where $T_x^* \in \mathbb{R}^{N_h \times N_w \times N_z}$, $T_h^* \in \mathbb{R}^{N_h \times N_h \times N_z}$ and $T_b^* \in \mathbb{R}^{N_h \times N_z}$ are learnable tensors, and $\langle \cdot, \cdot \rangle$ denotes the tensor dot product.
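By way of non-limiting illustration, the dynamic weight formulation may be sketched as a tensor dot product between a static parameter tensor and an embedding. The only dimension fixed by the text is the embedding size ($N_z$ = 16); the other dimensions and the random stand-in parameters are assumptions:

```python
import numpy as np

# N_z = 16 follows the text; N_h and N_hyp are assumed hidden sizes, and
# the parameter arrays are random stand-ins for learned values.
N_h, N_z, N_hyp = 32, 16, 24
rng = np.random.default_rng(2)

h_hyp = rng.standard_normal(N_hyp)   # hidden state of the HyperLSTM cell

# embedding: a linear projection of the HyperLSTM hidden state
P = rng.standard_normal((N_z, N_hyp))
z_h = P @ h_hyp

# weight matrix formed as a tensor dot product of a static 3-D parameter
# tensor with the embedding, re-evaluated at every time step
T = rng.standard_normal((N_h, N_h, N_z))
W_h = np.tensordot(T, z_h, axes=([2], [0]))  # shape (N_h, N_h)
```

Because only the small projection and the tensor contraction depend on the time step, the per-step cost of regenerating the weights stays modest even though every main-LSTM weight changes dynamically.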
Accordingly, the last hidden state of the main LSTM, i.e., $h_l^w$ (106"), is obtained as the representation of the EEG information. Similarly, learning representations of eye movements and vehicle states are mapped as $h_l^m$ and $h_l^v$, respectively, the last hidden states of the respective main LSTMs being labelled 108" and 110", respectively. For this reason, the input modes corresponding to representations $w$, $m$ and $v$ have been replaced with $k$ in various formulae, to indicate that $k$ may be any one of the input modes.
In some embodiments, the outputs of the last hidden state for each input mode, 106", 108", 110", are given to the classifier layer 114. To achieve this, the outputs may be concatenated. However, in the embodiment shown in Figure 1, the learning or feature representations (collectively 122) obtained by each HyperLSTM-based module ($h$ in equation (10)) undergo an equidimensional projection through a fully connected layer 124 ($H^z$ in equation (10) - i.e. the input dimensions are the same, and the output dimensions are consistent with the input dimensions). The results are concatenated at 126 as:
$$H = \big[H_w^z; H_m^z; H_v^z\big], \qquad H_k^z = W_p h_l^k + b_p \tag{10}$$
wherein $W_p$ and $b_p$ are parameters to be learned and $H \in \mathbb{R}^{3 \times N_h}$. Then, similarity scores of feature representations of different information sources are computed using an attention matrix 128, formulated as:
$$M_{att} = \mathrm{softmax}\big(H H^\top\big) \tag{11}$$
where $M_{att} \in \mathbb{R}^{3 \times 3}$ is utilized to enhance useful representations through increasing their scores automatically. $M_{att}$ can be regarded as a weight matrix; $M_{att}$ and the representations 122 are multiplied, as shown by the connecting path between the two in Figure 1. The hidden states are integrated by an integration layer, presently a max pooling layer 130, such that:
$$h_{att} = \mathrm{maxpool}\big(M_{att} H\big) \tag{12}$$
where $h_{att} \in \mathbb{R}^{N_h}$ denotes the attention-based hidden state with strengthened feature representations. To improve training stability, $h_{att}$ may be normalized (at layer 132) as:
$$h_{att} = \mathrm{layernorm}(h_{att}) \tag{13}$$
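By way of non-limiting illustration, the similarity scoring, max pooling and layer normalization may be sketched as below. The row-wise softmax normalization of the dot-product scores, the hidden size and the random stand-in representations are assumptions:

```python
import numpy as np

# Three modality representations (EEG, eye movement, vehicle states), each
# already projected to a common dimension N_h; random stand-ins for features.
N_h = 32
rng = np.random.default_rng(3)
H = rng.standard_normal((3, N_h))

# similarity scores between modality representations; the row-wise softmax
# normalization of the dot-product scores is an assumption
scores = H @ H.T
M_att = np.exp(scores - scores.max(axis=1, keepdims=True))
M_att /= M_att.sum(axis=1, keepdims=True)

# attention-weighted representations, integrated by max pooling across modalities
h_att = (M_att @ H).max(axis=0)

# layer normalization to improve training stability
h_att = (h_att - h_att.mean()) / (h_att.std() + 1e-5)
```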
To obtain the probability of each label, the ARecNet performs a nonlinear projection through a classifier layer 114:
$$\hat{y} = \mathrm{softmax}\big(W_c h_{att} + b_c\big) \tag{14}$$
where $W_c$ and $b_c$ are parameters to be learned. Eventually, the predicted cognitive workload level $\hat{y}$ is obtained, being either 0, 1 or 2, corresponding to slight, moderate or intensive cognitive workload. The classifier layer 114 may have any appropriate architecture. In some embodiments, the classifier layer comprises a fully connected layer followed by a softmax activation layer that produces $\hat{y}$. The classifier layer 114 in Figure 1 further comprises a fully connected layer and a rectified linear unit, the output of which is fed into a second fully connected layer and from there into the softmax layer.
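By way of non-limiting illustration, the fully connected / ReLU / fully connected / softmax stack may be sketched as follows; the layer widths and the random stand-in weights are assumptions:

```python
import numpy as np

# Layer widths are assumptions; weights are random stand-ins for learned values.
N_h, N_mid, N_classes = 32, 16, 3
rng = np.random.default_rng(4)
h_att = rng.standard_normal(N_h)  # attention-based hidden state

W1, b1 = rng.standard_normal((N_mid, N_h)), np.zeros(N_mid)
W2, b2 = rng.standard_normal((N_classes, N_mid)), np.zeros(N_classes)

hidden = np.maximum(0.0, W1 @ h_att + b1)   # fully connected + ReLU
logits = W2 @ hidden + b2                   # second fully connected layer
p = np.exp(logits - logits.max())
p /= p.sum()                                # softmax activation
y_hat = int(np.argmax(p))                   # 0 = slight, 1 = moderate, 2 = intensive
```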
During training, instead of relying on a single label, a sequence-to-sequence (Seq2Seq) learning paradigm is employed. For the given dataset $D$, an optimized cross-entropy loss function is adopted:
$$\mathcal{L} = -\frac{1}{N l}\sum_{j=1}^{N}\sum_{t=1}^{l} \log P\big(y^j \,\big|\, x_{\le t}^j\big) \tag{15}$$
where $x_{\le t} = [x_1, x_2, \ldots, x_t]$ denotes the subsequence of $X$ up to time step $t$. In addition to encouraging the ARecNet to extract feature representations from early observations, the given loss function reduces the possibility of overfitting when the current information is insufficient for recognition.
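By way of non-limiting illustration, averaging the cross-entropy over the predictions made from every subsequence may be sketched as follows; the toy probability values are dummies:

```python
import numpy as np

def seq2seq_ce_loss(probs_per_step, y_true):
    """Cross-entropy averaged over the predictions made from each
    subsequence x_{<=t}, so early observations also contribute.

    probs_per_step: (T, K) predicted label distributions, one per time step.
    y_true: integer label shared by the whole sequence.
    """
    return float(-np.mean(np.log(probs_per_step[:, y_true] + 1e-12)))

# toy sequence of three per-step predictions for a 3-class problem
probs = np.array([[0.5, 0.3, 0.2],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.8, 0.1]])
loss = seq2seq_ce_loss(probs, y_true=1)  # -(ln 0.3 + ln 0.6 + ln 0.8) / 3
```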
The cognitive workload cannot be observed directly, resulting in inherent uncertainty in its label. Therefore, a regularization technique, namely label smoothing, is introduced to improve the model generalization ability. Label smoothing may be performed using any appropriate method. For example, label smoothing may comprise assigning the real cognitive workload label a probability that penalises overconfident predictions, e.g. the real cognitive workload label $y^j$ may be assigned a probability $1 - \epsilon$, while the probability of each other label is replaced by $\epsilon/(k-1)$, wherein a tunable parameter $\epsilon$ is set to 0.1 and $k = 3$ is the number of labels (i.e. 0, 1, 2).
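By way of non-limiting illustration, the smoothed target may be constructed as below; splitting the remaining probability mass evenly over the other classes is the common choice assumed here:

```python
import numpy as np

def smooth_label(y, k=3, eps=0.1):
    """Soften a one-hot target: the true class keeps probability 1 - eps;
    splitting the remaining mass evenly over the other k - 1 classes is a
    common choice assumed here."""
    target = np.full(k, eps / (k - 1))
    target[y] = 1.0 - eps
    return target

target = smooth_label(1)  # [0.05, 0.9, 0.05] for k = 3, eps = 0.1
```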
In experiments, data were collected from fourteen participants (10 males, 4 females) with varied ages and driving experience. Data collection was performed using the system described with reference to Figure 2. The simulated driving environment was a three-lane expressway with several stationary vehicles distributed randomly across the three lanes. For all driving scenarios, the primary objective was to drive along a straight road and avoid the stationary vehicles. Participants were asked to maintain the vehicle speed within a predetermined speed range - e.g. 80-90 km/h - to ensure a consistent workload level throughout driving. Obstacles with various paint colours were placed at regular intervals, and vehicle paint was classified into two categories, namely dark colours (black, navy, sepia) and bright colours (white, red, yellow). The driving tasks at the different cognitive workload levels are described below:
Slight level: Only the primary task is required to be accomplished, i.e., participants need to avoid other vehicles and reach the destination.
Moderate level: In addition to avoiding all obstacles, participants need to recall the colour category of the previous obstacle and press the corresponding button as they drive past a new one.
Intensive level: Apart from the primary and visual tasks, participants also need to listen to a pre-recorded series of 15 letters separated by approximately 4-second intervals and count the number of times two identical letters appeared in pairs in the sequence, e.g., "H, H".
Audio stimuli are present in all experiments, so that their effect on EEG signals is consistent across workload levels; however, only participants at the intensive workload level react to them. To ensure no single lane was free of obstacles for an extended stretch of road, a custom-defined discrete distribution for obstacle locations is employed (equation (16)), where K_T is the distance between the current obstacle and the previous one in lane T, and IntervalSize denotes the distance between two adjacent obstacles. Moreover, both visual and audio stimuli are regenerated randomly in each experiment to rule out the human memory effect.
The driver workload dataset extracted as set out with reference to Figure 2 can be applied both to the performance evaluation of other recognition approaches and to extended studies involving cognitive workload, such as driving authority allocation and takeover strategy design. In human-machine cooperative driving, the driver's authority is usually determined from the driver's state. For a very high driver workload, the driving authority can be set to zero; consequently, to ensure safety, the driver's inputs will not be executed. Generally, the range of driver authority is [0, 1]: 0 indicates that the vehicle has been taken over by the machine, and 1 means that the vehicle is completely controlled by the human.
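One way such an authority value might be derived from a predicted workload distribution is sketched below. This is purely hypothetical: the function name and the per-level authority weights are assumptions, not values from the disclosure.

```python
def driver_authority(workload_probs, weights=(1.0, 0.6, 0.2)):
    """Map a predicted workload distribution to an authority value in [0, 1].

    workload_probs: predicted probabilities for (slight, moderate, intensive)
    workload. weights: hypothetical authority levels per workload class.
    The expectation under the prediction gives a smooth hand-over signal
    (0 = full machine control, 1 = full human control).
    """
    return sum(p * w for p, w in zip(workload_probs, weights))
```

A confident "intensive" prediction then yields a low authority value, prompting the machine to take over more control.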
The dataset is multimodal, presently containing three types of information, i.e., EEG signals, eye gaze and vehicle states. This facilitates identification of workload response on one channel (mode) where that response is not evident on another channel (mode). The dataset contains multiple scenarios - lighting conditions, colours, speeds, obstacles, audio and/or visual tasks. The dataset can, for example, reveal the influence of varied visibility on driver workload recognition. The dataset is multi-sensory. For example, visual-auditory mixed stimuli can be used, requiring drivers to respond to visual information, which reflects the visual-induced cognitive workload in the real world, and audio stimuli, which reflects auditory activities such as voice navigation and phone calls during practical driving.
Pre-processing is performed, as set out with reference to Figure 2, to remove artifacts from the data. By identifying artifacts, an activity power spectrum of three typical components and the corresponding scalp topographies can be produced as shown in Figure 4. The artifact in image (a) of Figure 3 is produced by eye activities such as blinks, in which high power at low frequencies is concentrated close to the eyes. Muscle artifacts are also evident, as shown in image (b) of Figure 3, the muscle artifact having relatively high power at high frequencies (20-30 Hz) with a localized distribution on the scalp topography. Image (c) of Figure 3 represents the normal component generated by brain-related activities. Image (c) is therefore adopted to calculate the power of various frequency bands. The power may be calculated through:

P_Ψ = Σ_{j∈Ω} ∫_{f_l}^{f_u} S_j(f) df    (17)

where P_Ψ, with Ψ ∈ {δ, θ, α, β}, is the power of the corresponding EEG band, Ω represents the set of EEG channels, f_l and f_u are the lower and upper frequency limits of the Ψ-band, and S_j(f) is the power spectral density (PSD) of the jth channel, which is calculated using the fast Fourier transform (FFT) with a Hamming window.
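A minimal numpy sketch of equation (17), computing a Hamming-windowed periodogram per channel and integrating it over a frequency band; the band edges and the single-window PSD estimate are simplifying assumptions (a practical implementation might average over segments, e.g. Welch's method):

```python
import numpy as np

# Conventional EEG band edges in Hz (assumed, not specified in the text).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_power(eeg, fs, band):
    """Summed band power over all EEG channels.

    eeg: (channels, samples) array; fs: sampling rate in Hz;
    band: (f_lower, f_upper) tuple in Hz.
    """
    n = eeg.shape[1]
    windowed = eeg * np.hamming(n)                      # Hamming window per channel
    psd = np.abs(np.fft.rfft(windowed, axis=1)) ** 2 / (fs * n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs < band[1])
    # Integrate the PSD over the band, then sum across channels (the Σ over Ω).
    return float(np.sum(psd[:, mask]) * (freqs[1] - freqs[0]))
```

For a pure 10 Hz test tone, almost all of the estimated power falls in the α band, as expected.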
All features are uniformly resampled to 10 Hz, i.e., a historical horizon of t_w seconds yields l = 10·t_w samples, and normalized (z-score) to the same scale. Also, the input sequences are extracted using a sliding window with 90% overlap to augment training samples.
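The resampling step aside, the z-score normalization and overlapping-window extraction can be sketched as follows (function names are illustrative):

```python
import numpy as np

def zscore(features):
    """Normalize each feature column to zero mean and unit variance."""
    return (features - features.mean(axis=0)) / features.std(axis=0)

def sliding_windows(features, win_len, overlap=0.9):
    """Extract overlapping subsequences to augment training samples.

    features: (T, d) resampled feature matrix; win_len: samples per window
    (e.g. 10 * t_w at 10 Hz); overlap: fraction shared by adjacent windows.
    """
    step = max(1, int(round(win_len * (1.0 - overlap))))
    starts = range(0, features.shape[0] - win_len + 1, step)
    return np.stack([features[s:s + win_len] for s in starts])
```

With 90% overlap and a 10-sample window the stride is a single sample, so a 100-step recording yields 91 training windows rather than 10 disjoint ones.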
The recognition performance of the present methodology can be evaluated through various metrics, including average accuracy (Acc), precision (Pr), recall (Re) and F1 score, which are formulated as:

Acc = (tp + tn)/(tp + tn + fp + fn), Pr = tp/(tp + fp), Re = tp/(tp + fn)    (18)

F1 = 2·Pr·Re/(Pr + Re)    (19)

where tp, tn, fp and fn represent true positives, true negatives, false positives and false negatives, respectively.
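These standard metrics follow directly from the confusion counts; a minimal sketch (the function name is illustrative):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score from confusion counts,
    per equations (18) and (19)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pr = tp / (tp + fp)                      # precision: correctness of positives
    re = tp / (tp + fn)                      # recall: coverage of actual positives
    f1 = 2 * pr * re / (pr + re)             # harmonic mean of precision and recall
    return acc, pr, re, f1
```

For multi-class workload recognition these would be computed per class and then macro-averaged, matching the macro-average precision-recall curves discussed with reference to Figure 7.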
To test performance enhancement over known methods, a comparative study was performed between the present methodology and previous learning-based methods, namely MTS-CNN (a variant of a CNN-based architecture), DecNet (a variant of an LSTM-based network), a CNN-LSTM model, and m-HyperLSTM (a variant of HyperNetworks which uses only one HyperLSTM-based module).
The present methodology can effectively capture and strengthen useful time-series feature representations through HyperLSTM-based modules and a cross-attention mechanism. The designed model was trained using an Adam optimizer with a suitable learning rate - e.g. 0.001. The batch size is selected to obtain a trade-off between training time and model generalization ability (e.g. a batch size of 64).
The recognition accuracies and standard deviations of the present methodology with varied time-series information and historical horizons under typical driving scenarios are shown in Figure 5. Based on recognition accuracy, vehicle states clearly have a lower influence on cognitive workload than physiological and visual information. Since extra mental workload is generally required to ensure safe driving with decreased visibility, classification becomes more difficult as average cognitive workload increases (e.g. in rough inverse proportion to visibility). By comparison, the multimodal information fusion-based ARecNet has a relatively stable recognition performance in varied environments, and has a lower standard deviation in most cases, indicating better stability.
The recognition results of the multimodal information fusion-based ARecNet with different historical horizons in typical driving scenarios are shown in Figure 6, in which each confusion matrix displays the average result of five-fold cross validation. In each set of confusion matrices - (a) sunny noon, (b) foggy dusk and (c) rainy night - the rows (top to bottom) give the recognition accuracy for slight, moderate and intensive cognitive workload with respect to the predicted values, and the rightmost column shows the recognition results with respect to the ground-truth label. Recognition accuracy increases with extended historical horizons.
Figure 7 shows precision-recall curves illustrating the influence of different decision thresholds η on each workload level (historical horizon tw = 1 s). The macro-average curves are nearly the same in all of images (a), (b) and (c), indicating superior comprehensive recognition performance of the present methodology in varied weather conditions. The optimal threshold point is at the tangent of the corresponding precision-recall curve and the F1-score curve.
In experiments against the known workload recognition models mentioned above, the performance of m-HyperLSTM universally surpassed the LSTM-based models, and the performance of the DecNet and CNN-LSTM models was close to m-HyperLSTM in some specific situations - this suggests the adaptive module can capture feature representations more effectively than static ones. The F1 scores of the present methodology were significantly higher than those for m-HyperLSTM, especially at the historical horizon tw = 1 s, where the F1 score increased by at least 3.32%. These phenomena indicate the superiority of the decision-level fusion architecture in the present disclosure.
In ablation experiments the effects of HyperLSTM and cross-attention within the ARecNet were tested. In this regard, a variant was also tested that lacks an attention mechanism (herein referred to as RecNet). The results are in Table I.
| Variants | | | RecNet | ARecNet |
|---|---|---|---|---|
| HyperLSTM | ✗ | ✗ | ✓ | ✓ |
| Cross attention | ✗ | ✓ | ✗ | ✓ |
| Driving scenario (high visibility): Sunny noon | | | | |
| tw = 1 s | 0.832 (↓4.81%) | 0.843 (↓3.55%) | 0.877 (↑0.34%) | 0.874 |
| tw = 2 s | 0.879 (↓4.66%) | 0.892 (↓3.25%) | 0.918 (↓0.43%) | 0.922 |
| tw = 4 s | 0.882 (↓7.16%) | 0.911 (↓4.11%) | 0.945 (↓0.53%) | 0.950 |
| Driving scenario (medium visibility): Foggy dusk | | | | |
| tw = 1 s | 0.772 (↓5.04%) | 0.793 (↓2.46%) | 0.817 (↑0.49%) | 0.813 |
| tw = 2 s | 0.836 (↓4.89%) | 0.856 (↓2.62%) | 0.874 (↓0.57%) | 0.879 |
| tw = 4 s | 0.842 (↓8.48%) | 0.884 (↓3.91%) | 0.913 (↓0.76%) | 0.920 |
| Driving scenario (low visibility): Rainy night | | | | |
| tw = 1 s | 0.756 (↓3.69%) | 0.772 (↓1.66%) | 0.782 (↓0.38%) | 0.785 |
| tw = 2 s | 0.787 (↓4.61%) | 0.801 (↓2.91%) | 0.816 (↓1.09%) | 0.825 |
| tw = 4 s | 0.793 (↓8.11%) | 0.831 (↓3.71%) | 0.853 (↓1.16%) | 0.863 |

Table I: ablation study on the recognition accuracy of HyperLSTM and cross attention with varied historical horizons in different driving scenarios (values in parentheses are the relative difference versus ARecNet).
HyperLSTM greatly improved the model performance in all cases, further indicating its superior feature-capturing ability compared to conventional LSTM models. The influence of the cross-attention mechanism remains inconspicuous in Table I, especially for the HyperLSTM-based models. A paired t-test was therefore employed to determine the statistical significance of cross attention, with five-fold cross validation performed 50 times with the same sequence of random seeds; the statistical results are presented in Figure 8, where a single asterisk (*) and double asterisks (**) represent p values lower than 0.05 and 0.01, respectively. Cross attention provided no statistically significant improvement at a 1 s historical horizon, but provided significantly better performance for longer horizons - e.g. 4 s. This demonstrates that the cross-attention mechanism is better at strengthening useful learning representations of longer-sequence multimodal information.
Embodiments of the present methodology seek to address two limitations of most previous driver workload recognition models in practical applications, namely single-modality indicators and time-series signal distortion. The present decision-level multimodal information fusion architecture can employ a cross-attention mechanism to strengthen useful feature representations captured by the HyperLSTM-based module from individual information sources. Experimental results demonstrate that the proposed models are advantageous over other baseline approaches in terms of recognition accuracy and robustness.
In addition to driver workload estimation, the data collection methodology provides a generic driver monitoring framework for advanced driving assistance systems (ADAS). With a minor alteration of the structure of the model according to the number of information sources/input modes - e.g. adding or removing HyperLSTM modules, each containing one or more HyperLSTM networks and a main LSTM network, to accommodate additional sources such as road types and traffic condition monitoring - it can be utilized for driver distraction/fatigue detection. Meanwhile, accurate driver state recognition can provide a decision-making basis for the mutual takeover of drivers and vehicles, which is beneficial to other ADAS technologies such as lane departure warning systems, traffic jam assistant systems, and others.
The present framework can also be extended into specific application fields involving multiple biosensors, for example, athlete health stress estimation and air traffic controller states monitoring, etc.
An end-user computing device or system referred to in this disclosure comprises a smartphone device, a tablet device, a laptop device etc. that is used by an end user to train a supervised learning model for cognitive workload recognition, or to implement that model for real-time cognitive workload recognition. Computing device 900 of Figure 9 illustrates a schematic diagram of one such device. The device 900 comprises one or more processing units 910 with access to one or more pre-processors 902 (if used) for pre-processing input data - e.g. from EEG, vehicle behaviour and/or eye tracking. The device 900 has a communication channel to camera/external device(s) 904 for collecting input data (e.g. image capture device(s), vehicle monitoring sensors for speed, yaw and others, an EEG system, etc.), and comprises auxiliary network module(s) 906 each comprising auxiliary network(s) 908 and a main LSTM network 910, an attention mechanism 912 and a classifier layer 914 (the term "classifier layer" may be used to refer to a single layer or multiple layers in a machine learning network, depending on the context used herein).
The external device(s) 904 may be integral with, or unitary with, system 900, or may be separate.
System 900 may be in communication (e.g. over network 918) with one or more server system 916 that serve as a back end system for an application executing on the system 900. For example, in embodiments where system 900 is a smartphone, the server system 916 may be a backend application server of a relevant application for input data evaluation executing on the system 900. The server system 916 may transmit code or information to the system 900 and may receive information from system 900 obtained after pre-processing input data captured by external device(s) 904.
The code running the methodology, and/or input data whether before or after pre-processing, may be stored in memory 920.
It will be appreciated that many further modifications and permutations of various aspects of the described embodiments are possible. Accordingly, the
described aspects are intended to embrace all such alterations, modifications, and variations that fall within the spirit and scope of the appended claims.
Throughout this specification and the claims which follow, unless the context requires otherwise, the word "comprise", and variations such as "comprises" and "comprising", will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps. The reference in this specification to any prior publication (or information derived from it), or to any matter which is known, is not, and should not be taken as an acknowledgment or admission or any form of suggestion that that prior publication (or information derived from it) or known matter forms part of the common general knowledge in the field of endeavour to which this specification relates.
Claims
1. A method to train a supervised learning model for cognitive workload recognition by employing a sequence-to-sequence learning paradigm, wherein the model comprises a main long short-term memory (LSTM) network, an auxiliary LSTM network and a classifier layer; wherein the model is configured to output a predicted cognitive workload level of a user in response to input of temporal series information related to the user over a plurality of time steps by, at each said time step: updating the main LSTM network to map the temporal series information to a sequence of hidden states; updating the auxiliary LSTM network to generate weights for the main LSTM network; and obtaining the predicted cognitive workload level by processing the hidden states and the weights through the classifier layer.
2. The method of claim 1, wherein the temporal series information is multimodal, the model comprising, for each input mode, a main long short-term memory (LSTM) network, an auxiliary LSTM network and a classifier layer, wherein the hidden states from each main LSTM network are integrated and the classifier layer obtains the predicted cognitive workload level by processing the integrated hidden states and the weights.
3. The method of claim 2, wherein each input mode is a respective one of electroencephalogram (EEG) signals, eye movements and external states.
4. The method of claim 2 or 3, wherein the model comprises an attention mechanism for learning cross-attention parameters between input modes and integrating the hidden states using the cross-attention parameters.
5. The method of any one of claims 1 to 4, wherein updating the main LSTM network is according to
where h_t^k refers to the hidden state at time t in temporal series k, x_t^k refers to the temporal series information at time t, and c_t refers to a gate for selectively carrying information from the previous time step through a forget gate at time t.
6. The method of claim 5, wherein updating the auxiliary LSTM network is according to
where x̂_t is the concatenation of the hidden state ĥ_{t-1} from the auxiliary LSTM network and x_t.
7. The method of any one of claims 1 to 6, wherein the weights are functions of a set of embeddings, wherein the embeddings are linear projections of the hidden states of the auxiliary LSTM network.
8. The method of any one of claims 1 to 7, wherein the classifier layer is a fully connected layer followed by a softmax activation.
9. The method of any one of claims 1 to 8, wherein employing the sequence-to-sequence learning paradigm comprises adopting a cross-entropy loss function according to
where x<t represents the subsequence of the hidden states.
10. The method of any one of claims 1 to 9, comprising using label smoothing to improve generalization ability of the model.
11. The method of any one of claims 1 to 10, comprising using an Adam optimizer to train the model.
12. The method of any one of claims 1 to 11, wherein the user is a driver.
13. A system that trains a supervised learning model for cognitive workload recognition, wherein the system comprises a plurality of processors configured to train the model by employing a sequence-to-sequence learning paradigm, wherein the model comprises a main long short-term memory (LSTM) network, an auxiliary LSTM network and a classifier layer; wherein the model is configured to output a predicted cognitive workload level of a user in response to input of temporal series information related to the user over a plurality of time steps by, at each said time step: updating the main LSTM network to map the temporal series information to a sequence of hidden states; updating the auxiliary LSTM network to generate weights for the main LSTM network; and obtaining the predicted cognitive workload level by processing the hidden states and the weights through a classifier layer of the model.
14. The system of claim 12, wherein the temporal series information is multi-modal, the model comprising, for each input mode, a main long short-term memory (LSTM) network, an auxiliary LSTM network and a
classifier layer, wherein the hidden states from each main LSTM network are integrated and the classifier layer obtains the predicted cognitive workload level by processing the integrated hidden states and the weights.
15. The system of claim 13, wherein each input mode is a respective one of electroencephalogram (EEG) signals, eye movements and external states.
16. The system of claim 13 or 14, wherein the model comprises an attention mechanism for learning cross-attention parameters between input modes and integrating the hidden states using the cross-attention parameters.
17. The system of any one of claims 12 to 15, wherein updating the main LSTM network is according to
where ht refers to the hidden state at time t, xt refers to the temporal series information at time t, and ct refers to a gate for selectively carrying information from the previous time step through a forget gate at time t.
18. The system of claim 16, wherein updating the auxiliary LSTM network is according to
where x̂_t is the concatenation of the hidden state ĥ_{t-1} from the auxiliary LSTM network and x_t.
19. The system of any one of claims 12 to 17, wherein the weights are functions of a set of embeddings, wherein the embeddings are linear projections of the hidden states of the auxiliary LSTM network.
20. The system of any one of claims 12 to 18, wherein the classifier layer is a fully connected layer followed by a softmax activation.
21. The system of any one of claims 12 to 19, wherein employing the sequence-to-sequence learning paradigm comprises adopting a cross-entropy loss function according to
where x<t represents the subsequence of the hidden states.
22. The system of any one of claims 12 to 20, wherein the processors are configured to use label smoothing to improve generalization ability of the model.
23. The system of any one of claims 12 to 22, wherein the processors are configured to use an Adam optimizer to train the model.
24. The system of any one of claims 12 to 23, wherein the user is a driver.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG10202250435C | 2022-07-12 | ||
SG10202250435C | 2022-07-12 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024015018A1 true WO2024015018A1 (en) | 2024-01-18 |
Family
ID=89537553
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/SG2023/050490 WO2024015018A1 (en) | 2022-07-12 | 2023-07-12 | Cognitive workload recognition from temporal series information |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024015018A1 (en) |
Non-Patent Citations (3)
Title |
---|
CHAOPENG PAN; HAOTIAN CAO; WEIWEI ZHANG; XIAOLIN SONG; MINGJUN LI: "Driver activity recognition using spatial‐temporal graph convolutional LSTM networks with attention mechanism", IET INTELLIGENT TRANSPORT SYSTEMS, vol. 15, no. 2, 21 December 2020 (2020-12-21), Michael Faraday House, Six Hills Way, Stevenage, Herts. SG1 2AY, UK , pages 297 - 307, XP006116504, ISSN: 1751-956X, DOI: 10.1049/itr2.12025 * |
HUA QIANG, JIN LISHENG, JIANG YUYING, GAO MING, GUO BAICANG: "Cognitive Distraction State Recognition of Drivers at a Nonsignalized Intersection in a Mixed Traffic Environment", ADVANCES IN CIVIL ENGINEERING, vol. 2021, 3 March 2021 (2021-03-03), pages 1 - 16, XP093132333, ISSN: 1687-8086, DOI: 10.1155/2021/6676807 * |
RUOHAN WANG; PIERLUIGI V. AMADORI; YIANNIS DEMIRIS: "Real-Time Workload Classification during Driving using HyperNetworks", ARXIV.ORG, 7 October 2018 (2018-10-07), 201 Olin Library Cornell University Ithaca, NY 14853 , XP080930566 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23840073 Country of ref document: EP Kind code of ref document: A1 |