CN116881853B - Attention assessment method, system, equipment and medium based on multi-mode fusion - Google Patents
Attention assessment method, system, equipment and medium based on multi-mode fusion Download PDFInfo
- Publication number
- CN116881853B CN116881853B CN202311154786.8A CN202311154786A CN116881853B CN 116881853 B CN116881853 B CN 116881853B CN 202311154786 A CN202311154786 A CN 202311154786A CN 116881853 B CN116881853 B CN 116881853B
- Authority
- CN
- China
- Prior art keywords
- scoring
- model
- sub
- data
- accuracy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 55
- 230000004927 fusion Effects 0.000 title claims abstract description 20
- 238000012549 training Methods 0.000 claims description 89
- 230000006870 function Effects 0.000 claims description 42
- 238000011156 evaluation Methods 0.000 claims description 39
- 230000004424 eye movement Effects 0.000 claims description 39
- 230000001815 facial effect Effects 0.000 claims description 29
- 238000012790 confirmation Methods 0.000 claims description 27
- 238000012795 verification Methods 0.000 claims description 21
- 230000015654 memory Effects 0.000 claims description 20
- 238000004364 calculation method Methods 0.000 claims description 16
- 230000004069 differentiation Effects 0.000 claims description 15
- 238000004590 computer program Methods 0.000 claims description 14
- 238000012360 testing method Methods 0.000 claims description 11
- 238000003860 storage Methods 0.000 claims description 8
- 238000002372 labelling Methods 0.000 claims description 6
- 238000000302 molecular modelling Methods 0.000 claims description 6
- 238000005303 weighing Methods 0.000 claims description 6
- 210000004556 brain Anatomy 0.000 claims description 5
- 238000010200 validation analysis Methods 0.000 claims 1
- 210000001508 eye Anatomy 0.000 description 7
- 238000010586 diagram Methods 0.000 description 6
- 210000003128 head Anatomy 0.000 description 6
- 230000008569 process Effects 0.000 description 6
- 230000001133 acceleration Effects 0.000 description 5
- 238000000605 extraction Methods 0.000 description 5
- 239000012141 concentrate Substances 0.000 description 4
- 230000000694 effects Effects 0.000 description 4
- 210000001747 pupil Anatomy 0.000 description 4
- 230000004434 saccadic eye movement Effects 0.000 description 4
- 238000004458 analytical method Methods 0.000 description 3
- 230000008859 change Effects 0.000 description 3
- 230000001149 cognitive effect Effects 0.000 description 3
- 238000013527 convolutional neural network Methods 0.000 description 3
- 230000008921 facial expression Effects 0.000 description 3
- 238000007781 pre-processing Methods 0.000 description 3
- 230000004044 response Effects 0.000 description 3
- 230000035945 sensitivity Effects 0.000 description 3
- 230000005611 electricity Effects 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 239000000284 extract Substances 0.000 description 2
- 238000002599 functional magnetic resonance imaging Methods 0.000 description 2
- 238000012880 independent component analysis Methods 0.000 description 2
- 238000012544 monitoring process Methods 0.000 description 2
- 230000001537 neural effect Effects 0.000 description 2
- 238000005070 sampling Methods 0.000 description 2
- 230000001360 synchronised effect Effects 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 1
- 210000005252 bulbus oculi Anatomy 0.000 description 1
- 125000004122 cyclic group Chemical group 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000006073 displacement reaction Methods 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 238000012407 engineering method Methods 0.000 description 1
- 238000013210 evaluation model Methods 0.000 description 1
- 230000004399 eye closure Effects 0.000 description 1
- 230000004418 eye rotation Effects 0.000 description 1
- 210000001652 frontal lobe Anatomy 0.000 description 1
- 238000009499 grossing Methods 0.000 description 1
- 230000004886 head movement Effects 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 208000022119 inability to concentrate Diseases 0.000 description 1
- 238000007477 logistic regression Methods 0.000 description 1
- 230000007787 long-term memory Effects 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 230000006996 mental state Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 210000004761 scalp Anatomy 0.000 description 1
- 230000006403 short-term memory Effects 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 230000006641 stabilisation Effects 0.000 description 1
- 238000011105 stabilization Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
- G06F18/256—Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method, a system, equipment and a medium for attention assessment based on multi-mode fusion, which have the technical scheme that multi-mode data of a user to be assessed are collected, wherein the multi-mode data comprise: a plurality of current modality data reflecting the attention characteristics of the user to be evaluated from different angles; inputting each current mode data in the multi-mode data into a corresponding trained scoring sub-model to obtain a corresponding sub-scoring value; and carrying out linear weighting according to each sub-score value, the accuracy weight corresponding to each sub-score value and the time attenuation weight corresponding to each sub-score value to obtain a comprehensive score value, wherein each accuracy weight is determined according to the accuracy of the corresponding score sub-model, each time attenuation weight is determined according to each corresponding score sub-model and a preset time attenuation function, the attention state can be reflected more comprehensively by comprehensively utilizing the multi-mode data, and the accuracy is higher than that of a single mode.
Description
Technical Field
The invention belongs to the technical field of attention assessment, and particularly relates to an attention assessment method, system, equipment and medium based on multi-mode fusion.
Background
Attention has an important impact on the efficiency of the work and life of a person. Attention states can generally be divided into two categories, concentrate and disperse. The inability to concentrate on long periods of time can seriously affect the quality of work and life, and therefore, being able to accurately assess the individual's state of attention in real time is of great importance to improving attention management.
Current studies to evaluate the state of attention are mainly based on physiological signal analysis, representative physiological signals including electroencephalogram (EEG), functional magnetic resonance (fMRI, functional magnetic resonance imaging), eye movement signals, facial videos, and the like. EEG is used as a noninvasive physiological signal, the most widely applied traditional method based on EEG mainly extracts time domain and frequency domain features, but the manually designed features cannot describe the complex dynamics of EEG well. In recent years, the deep learning technology is applied to EEG analysis, and the attention-related time sequence characteristics can be directly learned from the original EEG signals through a cyclic neural network and the like, so that the effect is remarkably improved compared with the traditional characteristic engineering method.
However, EEG signals are susceptible to multifaceted factors, accurate determination of the state of attention is difficult to achieve with only EEG, eye movements and facial videos contain attention state information in addition to EEG, eye movements can reflect line of sight focus, and facial expression changes are also associated with mental states. The existing system such as FaceReader can analyze facial expression to judge psychological states, but most of the existing systems are single-mode analysis, so that effective fusion of multi-mode physiological signals is not realized, and judging accuracy is uneven. Therefore, how to perform the fusion of the multi-mode physiological signals and improve the accuracy and reliability of the judgment of the attention state is a current technical problem.
Disclosure of Invention
The invention aims to provide a attention assessment method, a system, equipment and a medium based on multi-mode fusion, which comprehensively utilize multi-mode data, can reflect attention states more comprehensively and have higher accuracy than single mode.
The first aspect of the invention discloses a attention assessment method based on multi-mode fusion, which is characterized by comprising the following steps:
collecting multi-modal data of a user to be evaluated, wherein the multi-modal data comprises: a plurality of current modality data reflecting the attention characteristics of the user to be evaluated from different angles;
inputting each current mode data in the multi-mode data into a corresponding trained scoring sub-model to obtain a corresponding sub-scoring value;
and carrying out linear weighting according to each sub-score value, the accuracy weight corresponding to each sub-score value and the time attenuation weight corresponding to each sub-score value to obtain a comprehensive score value, wherein each accuracy weight is determined according to the accuracy of the corresponding score sub-model, and each time attenuation weight is determined according to each corresponding score sub-model and a preset time attenuation function.
Optionally, the time decay function is:
,
Wherein n is the number of scoring sub-models, 0<i<n, i is a positive integer,for the current time +.>Is the optimal time attenuation coefficient corresponding to the scoring sub-model i, < ->The time attenuation weight corresponding to the scoring submodel i is obtained;
the method for determining the optimal time attenuation coefficient comprises the following steps:
defining time attenuation coefficients of the evaluation molecular models;
acquiring a plurality of training data corresponding to each evaluation molecular model, and labeling all the training data with real labels;
inputting each training data into a corresponding scoring sub-model to obtain a corresponding prediction score, and comparing each prediction score with a corresponding real label to obtain a corresponding comparison result;
and accumulating and summing all comparison results corresponding to each evaluation molecular model to obtain a loss function, updating the corresponding time attenuation coefficient through back propagation of the loss function, and obtaining the corresponding optimal time attenuation coefficient after multiple iterative training.
Optionally, each evaluation molecular model is obtained through training of the following steps:
acquiring a plurality of historical mode data of the user to be evaluated corresponding to each scoring sub-model;
judging whether the sum of the number of the historical mode data and the current mode data corresponding to each evaluation molecular model is smaller than a preset threshold value;
If yes, loading each pre-trained general scoring model, selecting a first preset percentage of modal data from all historical modal data and current modal data corresponding to each general scoring model as a corresponding fine tuning data set, fixing parameters of all layers except the last layer of each general scoring model, training the last layer by using the corresponding fine tuning data set, and updating the weight of the last layer to obtain a corresponding scoring sub-model;
if not, loading each pre-trained general scoring model, selecting a second preset percentage of mode data from all the historical mode data and the current mode data corresponding to each general scoring model as a training set, using the rest of mode data as a verification set, training the corresponding general scoring model by using the training set, testing the trained general scoring model by using the corresponding verification set every preset training round number until the loss function of the corresponding verification set converges, and using the trained general scoring model as a corresponding scoring sub-model.
Optionally, the determining, by the accuracy weights, the accuracy of the corresponding scoring sub-models includes:
Extracting a corresponding feature set from each current mode data of the user to be evaluated;
calculating the distinguishing degree of the corresponding feature set on each evaluation molecular model;
determining the accuracy weight of each scoring submodel according to an accuracy weight calculation formula, wherein the accuracy weight calculation formula is as follows:
;
;
,
wherein n is the number of scoring sub-models, 0<i<n, i is a positive integer,for scoring the accuracy corresponding to submodel i, < ->For scoring the degree of differentiation corresponding to submodel i, < >>For the average differentiation, ++>For the weight adjustment factor, +.>And (5) weighing the accuracy corresponding to the scoring submodel i.
Optionally, the method further comprises:
judging whether the comprehensive score value is lower than a preset score value or not;
if yes, judging whether the comprehensive score value rises to be more than a preset score value in the confirmation time window, if yes, judging that the time is short, if not, judging that the time is real, and sending out a warning;
if not, the normal attention state is determined.
Optionally, the method for determining the confirmation time window includes:
collecting a plurality of sections of historical comprehensive scoring value sequences, wherein the comprehensive scoring values in the historical comprehensive scoring value sequences are selected according to preset quantity;
Calculating the mean value and standard deviation of each historical comprehensive scoring value sequence;
setting a candidate time sequence, wherein the candidate time sequence comprises a plurality of time windows;
counting the proportion of the true low-attention state with correct prediction of all the historical comprehensive score value sequences to the true low-attention state in each time window to obtain a corresponding recall rate, and taking the time window with the highest recall rate as a first window;
calculating a first average value of the average values of all the historical comprehensive score value sequences, calculating a second average value of the standard values of all the historical comprehensive score value sequences, and calculating a second window through the first average value and the second average value;
and testing the recall rates of the first window and the second window on the historical comprehensive score value sequence, judging whether the recall rate of the second window is higher than that of the first window, if so, taking the second window as a confirmation time window, and if not, taking the first window as the confirmation time window.
Optionally, the multi-modal data includes: any of brain electrical data, eye movement data, and facial video data.
The second aspect of the invention discloses a attention assessment system based on multi-modal fusion, comprising:
The data acquisition module is used for acquiring multi-mode data of a user to be evaluated, wherein the multi-mode data comprises: a plurality of current modality data reflecting the attention characteristics of the user to be evaluated from different angles;
the scoring sub-module is used for inputting each current mode data in the multi-mode data into a corresponding trained scoring sub-model to obtain a corresponding sub-scoring value;
and the comprehensive scoring module is used for carrying out linear weighting according to each sub-scoring value, the accuracy weight corresponding to each sub-scoring value and the time attenuation weight corresponding to each sub-scoring value to obtain a comprehensive scoring value, wherein each accuracy weight is determined according to the accuracy of the corresponding scoring sub-model, and each time attenuation weight is determined according to each corresponding scoring sub-model and a preset time attenuation function.
In a third aspect the invention discloses a computer device comprising a memory storing a computer program and a processor implementing the steps of the method described above when the processor executes the computer program.
A fourth aspect of the invention discloses a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method described above.
The technical scheme provided by the invention has the following advantages and effects: the attention state can be reflected more comprehensively by comprehensively utilizing the multi-mode data, and the accuracy is higher than that of a single mode; the time attenuation weights of the corresponding scoring submodels are determined through the scoring submodels and the preset time attenuation function, so that the influence of the latest data on attention assessment is larger, meanwhile, the characteristic of attention attenuation along with time is simulated, the cognitive characteristic that the attention cannot be continuously stabilized is met, and the real-time assessment and monitoring of high sensitivity of the attention variation can be realized; and training the last layer of the general scoring model by using the fine-tuning data set to obtain a corresponding scoring sub-model, wherein the scoring sub-model is a model obtained after adjustment according to individual data of the user to be evaluated, after continuously collecting multi-modal data of the user to be evaluated, the historical modal data of the user to be evaluated is gradually increased until the user to be evaluated becomes an old user, loading the corresponding pre-trained general scoring model, randomly initializing network weights, not loading the pre-training weights, training the corresponding general scoring model by using the training set, obtaining model parameters when training is unstable, obtaining the corresponding scoring sub-model, and realizing individual modeling of the user to be evaluated.
Drawings
FIG. 1 is a flow chart of a method for attention assessment based on multi-modal fusion, disclosed in an embodiment of the invention;
FIG. 2 is a block diagram of a multi-modal fusion-based attention assessment system in accordance with an embodiment of the present invention;
fig. 3 is an internal structural diagram of a computer device disclosed in an embodiment of the present invention.
Detailed Description
In order that the invention may be readily understood, a more particular description of specific embodiments thereof will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
As used herein, the terms "first and second …" are used merely to distinguish between names and not to represent a particular number or order unless otherwise specified or defined.
The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items, unless specifically stated or otherwise defined.
The term "fixed" or "connected" as used herein may be directly fixed or connected to an element, or indirectly fixed or connected to an element.
As shown in fig. 1, an embodiment of the present invention discloses a method for evaluating attention based on multi-modal fusion, including:
step S1, acquiring multi-modal data of a user to be evaluated, wherein the multi-modal data comprises: a plurality of current modality data reflecting the attention characteristics of the user to be evaluated from different angles.
The multi-modal data in this embodiment includes: electroencephalogram data, eye movement data, and face video data; in other embodiments the multimodal data may be any two of electroencephalogram data, eye movement data, and facial video data; in practical application, loadable portable electroencephalogram acquisition equipment can be used, sampling rate is selected to be 250Hz or more, electroencephalogram detail is ensured to be read, relevant area electroencephalogram data such as frontal lobe, top lobe and the like are acquired, scalp of a user to be evaluated is required to be clean in the acquisition process, electrode contact skin is good, signal quality is ensured, a non-contact eye tracker can be used, an infrared technology is adopted, frequency is more than or equal to 100Hz, equipment is calibrated in the acquisition process, accurate pupil position and gaze point information is ensured to be acquired, a high-definition camera can be used, resolution ratio is 1080p or more, face detail of the user to be evaluated is acquired, imaging angle is ensured to be shot, partial area distortion is avoided, light conditions are sufficient and uniform in the acquisition process, and diffuse reflection is avoided to influence image quality.
And S2, inputting each current mode data in the multi-mode data into a corresponding trained scoring sub-model to obtain a corresponding sub-scoring value.
In this embodiment, after the electroencephalogram data, the eye movement data and the face video data are acquired, preprocessing is required to be performed on the electroencephalogram data, the eye movement data and the face video data, for example, the electroencephalogram data is filtered by using a band-pass filter, the frequency range is generally 0.5-70Hz, direct current drift and high-frequency noise can be effectively filtered, a common band-pass filter comprises an FIR (Finite Impulse Response ) filter, an IIR (Infinite Impulse Response, infinite impulse response) filter and the like, noise components caused by the eye electricity, myoelectricity and the like can be separated and removed by using a method of Independent Component Analysis (ICA) and the like, and in other embodiments, noise reduction can be achieved by using a method of narrow-band wave trap and the like. Segmenting the filtered and noise-reduced electroencephalogram data according to a time window to obtain a plurality of electroencephalogram data segments, wherein the time window can take 2-10 seconds, extracting characteristics of each electroencephalogram data segment, such as calculating statistical characteristics of mean value, variance, peak value, time domain waveform signal energy and the like of each electroencephalogram data segment, using fast Fourier transform to each electroencephalogram data segment to obtain a power spectrum of each frequency band, extracting relative energy or power characteristics of main frequency bands such as delta (1-3 Hz), theta (4-7 Hz), alpha (8-13 Hz), beta (14-30 Hz) and the like, extracting complexity and randomness characteristics of each electroencephalogram data segment by adopting methods such as relevant dimension, maximum Lyapunov index and the like, obtaining a time-frequency diagram by using methods such as wavelet transform and the like, describing energy distribution of each electroencephalogram data segment in time and frequency, and extracting characteristics such as relativity and synchronicity among electroencephalogram data segments of different parts.
For preprocessing of eye movement data, a low-pass filter can be used for smoothing and denoising the eye movement data, the low-pass filter can be used for carrying out eye movement calibration by adopting a mean value filter, gaussian filter and the like, then eye movement calibration is carried out by adopting a stimulus reference point, a mapping between an eye movement angle and a display coordinate is established for eliminating errors caused by head movement, in addition, distortion samples caused by eye rotation and eye closure in the eye movement data are required to be detected and removed, a speed threshold and an acceleration threshold can be set for identifying the distortion samples, such as an upper speed threshold, a lower speed threshold, an upper acceleration threshold and an acceleration lower limit threshold are set, eye movement data which are not between the upper speed threshold and the lower speed threshold are removed, eye movement data which are not between the upper acceleration threshold and the lower acceleration threshold are then divided into eye movement data segments with fixed time length, so that characteristics can be extracted, such as calculating statistical characteristics of fixation times, fixation duration and the like in each eye movement data segment, analyzing the viewing range, saccade path length, saccade average saccade path length, saccade diameter and the like of each eye movement data segment, and the characteristics of the eye movement data which are extracted from the eye movement data are subjected to the stimulus, and the pupil movement characteristics are predicted, and the pupil movement characteristics are changed, and the pupil movement characteristics are described.
For preprocessing of video face data, firstly, the illumination condition and shielding condition of each frame of image in the video face data are checked, then, images with quality which cannot meet the requirements are removed, and threshold values such as brightness, contrast and integrity can be set. After eliminating images with quality which cannot meet the requirements in video facial data, facial positioning can be performed by using Haar, HOG (Histogram of Oriented Gradient, direction gradient histogram) and other features, facial areas are extracted, the Haar features comprise edge features, linear features, central features and diagonal features, then the extracted facial images are rotated, scaled and corrected according to the positions of eyes, the corrected facial images are cut according to the required size to extract facial local areas, the facial image quality is improved by using histogram equalization, denoising and other methods, image enhancement of the facial images is realized, feature extraction is performed on the facial images after the image enhancement, if an expression descriptor based on feature point displacement is constructed, basic expression categories are represented, so that expression features are extracted, microscopic actions are captured by adopting methods such as optical flow, image enhancement and the like, micro expression appearance is judged, micro expression features are extracted, head three-dimensional rotation is estimated, the direction of attention deviation is judged to extract head posture features, and the eye positioning is combined with the eye opening degree and other features are analyzed to extract eye features.
The scoring sub-model in this embodiment includes: an electroencephalogram scoring sub-model, an eye movement scoring sub-model, and a face scoring sub-model; the electroencephalogram scoring sub-model uses an LSTM (Long Short Term Memory, long-term memory network) model, a time step folding layer is added behind the LSTM, the sub-scoring value of each time step can be obtained, the time step folding layer is a full-connection layer, the preprocessed electroencephalogram data is time sequence data, the preprocessed electroencephalogram data is input into the electroencephalogram scoring sub-model, the LSTM can learn the long-term dependency characteristic of a time sequence (namely the electroencephalogram data), long-distance time sequence rules are captured and transmitted through memory cells, the time sequence characteristic is obtained, the LSTM also comprises gating structures such as an input gate, an output gate and the like, unimportant information can be filtered, the time sequence characteristic of the preprocessed electroencephalogram data is obtained after the LSTM, and the time sequence characteristic is converted into the sub-scoring value of the electroencephalogram data corresponding to each time point after the time step folding layer and the full-connection layer are processed; the eye movement scoring sub-model uses a convolutional neural network model, the preprocessed eye movement data is usually a two-dimensional eye movement track image or thermodynamic diagram, the convolutional neural network can extract local features of the eye movement track image or thermodynamic diagram, such as eyeball accurate positioning, fixation point gathering and the like, the convolutional kernel automatically learns the features by sliding sampling at different positions, fine changes of the eye movement track can be detected, and the convolutional layer outputs sub-scoring values corresponding to the eye movement data through a full-connection layer after extracting the local features; the facial scoring sub-model uses a convolutional neural network model, the preprocessed facial video data is usually a facial image sequence, facial expression features are extracted through a convolutional layer, local connection is added, a part of interest region is connected, for example, the interest region is concentrated in key regions such as eyes and mouth, micro expression features are learned, after the facial features are extracted through the convolutional layer, the facial features are input into a full-connection layer to be converted into sub-scoring values corresponding to the facial video data.
Specifically, each evaluation molecular model is obtained through training by the following steps:
acquiring a plurality of historical mode data of the user to be evaluated corresponding to each scoring sub-model;
judging whether the sum of the number of the historical mode data and the current mode data corresponding to each evaluation molecular model is smaller than a preset threshold value;
if yes, loading each pre-trained general scoring model, selecting a first preset percentage of modal data from all historical modal data and current modal data corresponding to each general scoring model as a corresponding fine tuning data set, fixing parameters of all layers except the last layer of each general scoring model, training the last layer by using the corresponding fine tuning data set, and updating the weight of the last layer to obtain a corresponding scoring sub-model;
if not, loading each pre-trained general scoring model, selecting a second preset percentage of mode data from all the historical mode data and the current mode data corresponding to each general scoring model as a training set, using the rest of mode data as a verification set, training the corresponding general scoring model by using the training set, testing the trained general scoring model by using the corresponding verification set every preset training round number until the loss function of the corresponding verification set converges, and using the trained general scoring model as a corresponding scoring sub-model.
In this embodiment, the preset threshold may be set to 200 groups, in the case that the user to be evaluated is a new user, no history mode data exists, at this time, the number of collected current mode data is generally smaller than the preset threshold, then each pre-trained general scoring model needs to be loaded, if the collected current mode data is electroencephalogram data, the LSTM model corresponding to the electroencephalogram data is trained by using a preset training sample, the training method adopts a training method in the prior art, and the preset training sample is the electroencephalogram data of other users collected in advance, so as to obtain a general scoring model corresponding to the trained electroencephalogram data; under the condition that the acquired current mode data is eye movement data, training a convolutional neural model corresponding to the eye movement data by using a preset training sample, wherein the training method adopts a training method in the prior art, and the preset training sample is eye movement data of other users acquired in advance, so that a general scoring model corresponding to the trained eye movement data is obtained; under the condition that the acquired current modal data is facial video data, training a convolutional neural model corresponding to the facial video data by using a preset training sample, wherein the training method adopts a training method in the prior art, the preset training sample is the facial video data of other users acquired in advance, a general scoring model corresponding to the trained facial video data is obtained, after the trained general scoring model corresponding to each current modal data is obtained, the first preset percentage is 10% -20%, if the number of the acquired current modal data of the user to be evaluated is 50, 10 groups (namely, the first preset percentage is 20%) are randomly selected from the current modal data and serve as a corresponding fine tuning data set, the corresponding general scoring model is trained by using the fine tuning data set, specifically, parameters of all layers (such as a convolutional layer, a folding layer and the like) except the last layer (namely, a full-connection layer) in the general scoring model are fixed, the initial learning rate is set to be 0.00001, the number of training iteration rounds is 5, and the small learning rate and the number of times of overlapping are avoided are adopted; training for 5 rounds by utilizing the fine adjustment data set, and updating the weight of the last layer to obtain a corresponding scoring sub-model, wherein the scoring sub-model is a model obtained after adjustment according to the individual data of the user to be evaluated; after continuously collecting multi-modal data of a user to be evaluated, the historical modal data of the user to be evaluated is gradually increased, and in the process that the historical modal data is gradually increased, a first preset percentage of modal data is selected from the historical modal data and the current modal data to serve as a corresponding fine adjustment data set, so that training is continuously carried out on all universal scoring models, and dynamic adjustment of model parameters is achieved; under the condition that the sum of the historical mode data corresponding to each evaluation molecular model and the current mode data of the user to be evaluated is not smaller than a preset threshold value, the user to be evaluated is already the old user, the second preset percentage is 80% -90%, if the sum of the historical mode data corresponding to the evaluation molecular models and the current mode data is 600 groups of data, 80% of data (480 groups) are taken as training sets, 20% of data (120 groups) are taken as verification sets, a corresponding pre-trained general score model is loaded, network weights are randomly initialized, the pre-training weights are not loaded, a smaller learning rate is set, for example, 0.001 is set, the general score model corresponding to training is used, the number of iteration rounds is 100, each training round is carried out for 10 times, the general score model is tested by using the corresponding verification sets, a loss function corresponding to the verification set of each test is recorded, if the loss function of continuous 5 times of iteration is not reduced, the loss function corresponding to the verification sets is converged, model parameters of training loss stabilization are ended in advance, the corresponding training loss functions are obtained, the corresponding individual score models are obtained, and the individual evaluation model to be evaluated is realized.
In this embodiment, after the scoring sub-model corresponding to each current mode data is obtained, the corresponding scoring sub-model can be further fine-tuned by using each current mode data, so as to enhance model generalization. And feeding back the performance of each scoring sub-model on the corresponding current modal data to the corresponding general scoring model, and repeating the process as a supervision signal for general scoring model optimization, and continuously using each current modal data to jointly optimize the corresponding scoring sub-model and the corresponding general scoring model to realize group coordination.
And S3, carrying out linear weighting according to each sub-score value, the accuracy weight corresponding to each sub-score value and the time attenuation weight corresponding to each sub-score value to obtain a comprehensive score value, wherein each accuracy weight is determined according to the accuracy of the corresponding score sub-model, and each time attenuation weight is determined according to each corresponding score sub-model and a preset time attenuation function.
In this embodiment, the time decay function is:
,
wherein n is the number of scoring sub-models, 0<i<n, i is a positive integer,for the current time +.>Is the optimal time attenuation coefficient corresponding to the scoring sub-model i, < - >The time attenuation weight corresponding to the scoring submodel i is obtained;
the method for determining the optimal time attenuation coefficient comprises the following steps:
defining time attenuation coefficients of the evaluation molecular models;
acquiring a plurality of training data corresponding to each evaluation molecular model, and labeling all the training data with real labels;
inputting each training data into a corresponding scoring sub-model to obtain a corresponding prediction score, and comparing each prediction score with a corresponding real label to obtain a corresponding comparison result;
and accumulating and summing all comparison results corresponding to each evaluation molecular model to obtain a loss function, updating the corresponding time attenuation coefficient through back propagation of the loss function, and obtaining the corresponding optimal time attenuation coefficient after multiple iterative training.
In particular, the method comprises the steps of,the rate of decay of the sub-score values of scoring sub-model i over time is controlled, and different settings can be setReflecting the timeliness of different scoring submodels i, such as the electroencephalogram scoring submodel with very strong timeliness and arrangementAre generally larger; for the eye movement scoring submodel, the timeliness is weaker, set +.>Is usually smaller, in this embodiment +. >Initializing to a small value, e.g. 0.01, i.e. defining a time decay factor for each of said scoring models, inputting each of said training data into the correspondingThe prediction scores and the corresponding real labels are compared, the difference between the prediction scores and the real labels is calculated, MSE (Mean squared error, root mean square error), MAE (meanabsolute error, average absolute error), cross entropy and the like can be used as difference metrics to obtain comparison results, the prediction scores of all training samples and the difference of the real labels are accumulated and summed to obtain a final loss function, the corresponding time attenuation coefficient value is updated through the back propagation of the loss function, the loss function is minimized, the corresponding time attenuation coefficient is continuously optimized through a plurality of training rounds until the loss function converges or the maximum round reaches, the finally obtained time attenuation coefficient is the optimal time attenuation coefficient of the corresponding scoring sub-model, the optimal time attenuation coefficient is set, the sensitivity of attention evaluation to time is reflected, the time attenuation weight of the scoring sub-model is determined through the optimal time attenuation coefficient and the time attenuation function, so that the influence of the latest data on the attention evaluation is larger, meanwhile, the characteristic of attention attenuation along with time is simulated, the attention is not continuously stable, and the cognitive characteristic of high attention can be realized.
In this embodiment, the determining, by using the accuracy weights, the accuracy of the corresponding scoring sub-models includes:
extracting a corresponding feature set from each current mode data of the user to be evaluated;
calculating the distinguishing degree of the corresponding feature set on each evaluation molecular model;
determining the accuracy weight of each scoring submodel according to an accuracy weight calculation formula, wherein the accuracy weight calculation formula is as follows:
;
;
,
wherein n is the number of scoring sub-models, 0<i<n, i is a positive integer,for scoring the accuracy corresponding to submodel i, < ->For scoring the degree of differentiation corresponding to submodel i, < >>For the average differentiation, ++>For the weight adjustment factor, +.>And (5) weighing the accuracy corresponding to the scoring submodel i.
Specifically, extracting features in each current mode data of the user to be evaluated to form a corresponding feature set, for example, a feature set corresponding to the electroencephalogram data is an electroencephalogram feature set, a feature set corresponding to the eye movement data is an eye movement feature set, a feature set corresponding to the face video data is a face video feature set, the distinguishing degree of the corresponding feature set is calculated on each evaluation molecular model, N feature sets corresponding to each mode data need to be collected in advance, each feature set of the user to be evaluated and each feature set of the N users are input into a corresponding scoring sub-model, each scoring sub-model obtains a plurality of feature outputs, a classifier is used for predicting which of the user to be evaluated and the N users belongs to, and the classification accuracy is calculated and is the distinguishing degree of the corresponding scoring sub-model; nodding is a common attention-related feature as in facial video data, where the head remains relatively stationary while the average person is concentrating on work, but there are differences between individuals. For example, user a may have a nodding habit when concentrating, which is an individual characteristic of a, while other users may not nod when concentrating, which is different from user a. If such individual differences are not taken into consideration, when the system detects the user a nod, it may be misjudged as inattentive because the nod is generally regarded as a feature of inattention. In the method, the degree of distinction of the face scoring submodel is calculated, so that the nod has strong distinction degree on the user A, that is, the nod features can effectively distinguish different attention states of the user A, and then the weight of the nod features in the face scoring submodel can be enhanced, and personalized parameter adjustment can be carried out. When the user A clicks, the user A can correctly judge that the user A is actually in a concentration state. Thereby improving the accuracy of the assessment of the attentiveness status of user a. The specific calculation process is as follows:
(1) A section of facial video data of the user A in different attention states is collected, and head position change among video frames is extracted as nodding features.
(2) When the user A concentrates on, the head position change is larger than the head position change when not concentrating on, the concentrated state point feature set is set as F1, and the non-concentrated state point feature set is set as F2.
(3) The other 3 users' nod feature sets in both the concentrate and the non-concentrate state are collected and respectively marked as f1_1, f2_1, f1_2, f2_2, f1_3, f2_3.
(4) These 8 feature sets are input into the face scoring sub-model, each resulting in an output score O1_ A, O2_ A, O _1, O2_1, O1_2, O2_2, O1_3, O2_3.
(5) The 8 output scores were used as samples, and the labels were a focus, a not focus, 1 not focus, 2 not focus, 3 not focus, and construct a classification problem.
(6) After training using the Logistic regression classifier, the 8 samples, O1_ A, O2_ A, O1_1, O2_1, O1_2, O2_2, O1_3, O2_3, were predicted. Assuming that the prediction results are that the predictions of O1_ A, O2_ A, O _1, O2_1, O1_2 and O2_2 are correct, and the predictions of O1_3 and O2_3 are incorrect, the calculated classification accuracy is equal to that the correct number of samples accounts for 75% of the total number of samples, and the differentiation of the nodding feature set on the face scoring sub-model is 75%.
After the differentiation degree corresponding to each scoring sub-model is obtained, the accuracy weight of each scoring sub-model is obtained through calculation according to an accuracy weight calculation formula, and after the modal data of the user to be evaluated and other users are continuously collected, the differentiation degree can be recalculated, the accuracy weight can be adjusted according to the collected modal data of the user to be evaluated and other users, and the accuracy weight of each scoring sub-model can be optimized in real time. And finally obtaining the accuracy weight matched with the individual difference of the user to be evaluated for attention scoring.
After each sub-score value, the accuracy weight corresponding to each sub-score value and the time attenuation weight corresponding to each sub-score value are obtained, a comprehensive score value is obtained through calculation according to a score formula, wherein the score formula is as follows:
,
wherein n is the number of scoring sub-models, 0<i<n, i is a positive integer,for the current time +.>Is the accuracy weight corresponding to scoring submodel i, < ->Time decay weight corresponding to scoring submodel i, < ->Is a comprehensive score value.
The embodiment further includes:
judging whether the comprehensive score value is lower than a preset score value or not;
if yes, judging whether the comprehensive score value rises to be more than a preset score value in the confirmation time window, if yes, judging that the time is short, if not, judging that the time is real, and sending out a warning;
If not, the normal attention state is determined.
Specifically, the confirmation time window is a time period, such as 1 minute, 3 minutes, 5 minutes or 10 minutes, through the setting of the confirmation time window, when the comprehensive score value is detected to be lower than the preset threshold value, a confirmation stage is started, whether the comprehensive score value is returned to be higher than a normal value or not in the confirmation time window is judged, if the comprehensive score value is returned to be higher than the normal value in the confirmation time window, the user to be evaluated is indicated to be in a short-time low-attention state, if the comprehensive score value is not returned to be higher than the normal value in the confirmation time window, the user to be evaluated is indicated to be in a real low-attention state, a warning is sent, and through the confirmation stage, false warning of the short-time low-attention is avoided.
The method for determining the confirmation time window in this embodiment includes:
collecting a plurality of sections of historical comprehensive scoring value sequences, wherein the comprehensive scoring values in the historical comprehensive scoring value sequences are selected according to preset quantity;
calculating the mean value and standard deviation of each historical comprehensive scoring value sequence;
setting a candidate time sequence, wherein the candidate time sequence comprises a plurality of time windows;
Counting the proportion of the true low-attention state with correct prediction of all the historical comprehensive score value sequences to the actual low-attention state in each time window to obtain a corresponding recall rate, and taking the time window with the highest recall rate as a first window;
calculating a first average value of the average values of all the historical comprehensive score value sequences, calculating a second average value of the standard values of all the historical comprehensive score value sequences, and calculating a second window through the first average value and the second average value;
and testing the recall rates of the first window and the second window on the historical comprehensive score value sequence, judging whether the recall rate of the second window is higher than that of the first window, if so, taking the second window as a confirmation time window, and if not, taking the first window as the confirmation time window.
Specifically, a section of historical comprehensive score value sequence may be { score1, score 2..score n }, the candidate time sequence may be { T1, T2,..tm }, e.g., {1min, 3min, 5min, 10min }, when the comprehensive score value in the historical comprehensive score value sequence is lower than the preset score value, each time window is used to predict the real low-attention state corresponding to all the historical comprehensive score value sequences, then the recall rate corresponding to each time window is counted, the higher the recall rate is, the more the identifiable low-attention time period is indicated, the better the effect is, the recall rate curves of different time windows can be drawn, the x-axis is the different time windows, the y-axis is the corresponding recall rate, the curve is drawn by using the (time window, the recall rate) coordinate point, the peak value of the curve is observed, and the time window with the highest recall rate is found as the first window.
The second window is calculated in this embodiment using the following formula,
,
wherein,for the second window, k1 and k2 are preset coefficients, < >>For the first average, +.>For the second average, the values of k1 and k2 may be trained by grid search, specifically, the values of k1 and k2 are set to be super-parameters, e.g., the k1 value ranges of [0.5,1,2]The k2 value range is [5,10,20 ]]Establishing a two-dimensional parameter grid, wherein the horizontal axis is the value of k1, the vertical axis is the value of k2, 9 combinations (k 1, k 2) exist in the two-dimensional parameter grid, training a scoring sub-model for each combination (k 1, k 2), calculating an evaluation index on a verification set, such as accuracy, traversing all combinations, finding out the set of super-parameter values (k 1, k 2) with the best evaluation index, retraining the scoring sub-model by using the optimal (k 1, k 2) to obtain a final model, selecting the optimal parameters by traversing different combinations in a super-parameter space by using the verification set evaluation index,thereby finding the best values of k1 and k 2; by simultaneously utilizing the information of the mean value and the standard deviation, the whole level and the fluctuation range of the historical score can be more comprehensively evaluated, more reasonable window time is adaptively calculated, if the recall rate of the second window is higher than that of the first window, the self-adaptive mode is described to truly promote the confirmation effect, the second window is determined to be the confirmation time window, and otherwise, the first window is determined to be the confirmation time window.
According to the attention assessment method based on multi-mode fusion, disclosed by the embodiment of the invention, the attention state can be reflected more comprehensively by comprehensively utilizing multi-mode data such as brain electricity data, eye movement data and facial video data, and the accuracy is higher than that of a single mode; the time attenuation weights of the corresponding scoring submodels are determined through the scoring submodels and the preset time attenuation function, so that the influence of the latest data on attention assessment is larger, meanwhile, the characteristic of attention attenuation along with time is simulated, the cognitive characteristic that the attention cannot be continuously stabilized is met, and the real-time assessment and monitoring of high sensitivity of the attention variation can be realized; and training the last layer of the general scoring model by using the fine-tuning data set to obtain a corresponding scoring sub-model, wherein the scoring sub-model is a model obtained after adjustment according to individual data of the user to be evaluated, after continuously collecting multi-modal data of the user to be evaluated, the historical modal data of the user to be evaluated is gradually increased until the user to be evaluated becomes an old user, loading the corresponding pre-trained general scoring model, randomly initializing network weights, not loading the pre-training weights, training the corresponding general scoring model by using the training set, obtaining model parameters when training is unstable, obtaining the corresponding scoring sub-model, and realizing individual modeling of the user to be evaluated.
As shown in fig. 2, an embodiment of the present invention discloses an attention assessment system based on multi-modal fusion, including:
the data acquisition module 10 is configured to acquire multi-modal data of a user to be evaluated, where the multi-modal data includes: a plurality of current modality data reflecting the attention characteristics of the user to be evaluated from different angles;
the scoring sub-module 20 is configured to input each current mode data in the multi-mode data into a corresponding trained scoring sub-model to obtain a corresponding sub-scoring value;
and the comprehensive scoring module 30 is configured to perform linear weighting according to each sub-score value, an accuracy weight corresponding to each sub-score value, and a time attenuation weight corresponding to each sub-score value to obtain a comprehensive score value, where each accuracy weight is determined according to the accuracy of the corresponding scoring sub-model, and each time attenuation weight is determined according to each corresponding scoring sub-model and a preset time attenuation function.
For specific configurations of the trip hotspot extraction system, reference may be made to the configurations of the trip hotspot extraction method above, which are not described herein. The modules of the travel hotspot extraction system can be all or partially realized by software, hardware and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a trip hotspot extraction method.
It will be appreciated by those skilled in the art that the structure shown in fig. 3 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory storing a computer program and a processor that when executing the computer program performs the steps of:
collecting multi-modal data of a user to be evaluated, wherein the multi-modal data comprises: a plurality of current modality data reflecting the attention characteristics of the user to be evaluated from different angles;
inputting each current mode data in the multi-mode data into a corresponding trained scoring sub-model to obtain a corresponding sub-scoring value;
and carrying out linear weighting according to each sub-score value, the accuracy weight corresponding to each sub-score value and the time attenuation weight corresponding to each sub-score value to obtain a comprehensive score value, wherein each accuracy weight is determined according to the accuracy of the corresponding score sub-model, and each time attenuation weight is determined according to each corresponding score sub-model and a preset time attenuation function.
In one embodiment, the time decay function is:
,
wherein n is the number of scoring sub-models, 0<i<n, i is a positive integer,for the current time +.>Is the optimal time attenuation coefficient corresponding to the scoring sub-model i, < - >The time attenuation weight corresponding to the scoring submodel i is obtained;
the method for determining the optimal time attenuation coefficient comprises the following steps:
defining time attenuation coefficients of the evaluation molecular models;
acquiring a plurality of training data corresponding to each evaluation molecular model, and labeling all the training data with real labels;
inputting each training data into a corresponding scoring sub-model to obtain a corresponding prediction score, and comparing each prediction score with a corresponding real label to obtain a corresponding comparison result;
and accumulating and summing all comparison results corresponding to each evaluation molecular model to obtain a loss function, updating the corresponding time attenuation coefficient through back propagation of the loss function, and obtaining the corresponding optimal time attenuation coefficient after multiple iterative training.
In one embodiment, each of the scoring models is trained by:
acquiring a plurality of historical mode data of the user to be evaluated corresponding to each scoring sub-model;
judging whether the sum of the number of the historical mode data and the current mode data corresponding to each evaluation molecular model is smaller than a preset threshold value;
if yes, loading each pre-trained general scoring model, selecting a first preset percentage of modal data from all historical modal data and current modal data corresponding to each general scoring model as a corresponding fine tuning data set, fixing parameters of all layers except the last layer of each general scoring model, training the last layer by using the corresponding fine tuning data set, and updating the weight of the last layer to obtain a corresponding scoring sub-model;
If not, loading each pre-trained general scoring model, selecting a second preset percentage of mode data from all the historical mode data and the current mode data corresponding to each general scoring model as a training set, using the rest of mode data as a verification set, training the corresponding general scoring model by using the training set, testing the trained general scoring model by using the corresponding verification set every preset training round number until the loss function of the corresponding verification set converges, and using the trained general scoring model as a corresponding scoring sub-model.
In one embodiment, the determining the accuracy weights according to the accuracy of the corresponding scoring sub-model includes:
extracting a corresponding feature set from each current mode data of the user to be evaluated;
calculating the distinguishing degree of the corresponding feature set on each evaluation molecular model;
determining the accuracy weight of each scoring submodel according to an accuracy weight calculation formula, wherein the accuracy weight calculation formula is as follows:
;
;
,
wherein n is the number of scoring sub-models, 0<i<n, i is a positive integer,for scoring the accuracy corresponding to submodel i, < - >For scoring the degree of differentiation corresponding to submodel i, < >>For the average differentiation, ++>For the weight adjustment factor, +.>And (5) weighing the accuracy corresponding to the scoring submodel i.
In one embodiment, further comprising:
judging whether the comprehensive score value is lower than a preset score value or not;
if yes, judging whether the comprehensive score value rises to be more than a preset score value in the confirmation time window, if yes, judging that the time is short, if not, judging that the time is real, and sending out a warning;
if not, the normal attention state is determined.
In one embodiment, the method for determining the acknowledgement time window includes:
collecting a plurality of sections of historical comprehensive scoring value sequences, wherein the comprehensive scoring values in the historical comprehensive scoring value sequences are selected according to preset quantity;
calculating the mean value and standard deviation of each historical comprehensive scoring value sequence;
setting a candidate time sequence, wherein the candidate time sequence comprises a plurality of time windows;
counting the proportion of the true low-attention state with correct prediction of all the historical comprehensive score value sequences to the true low-attention state in each time window to obtain a corresponding recall rate, and taking the time window with the highest recall rate as a first window;
Calculating a first average value of the average values of all the historical comprehensive score value sequences, calculating a second average value of the standard values of all the historical comprehensive score value sequences, and calculating a second window through the first average value and the second average value;
and testing the recall rates of the first window and the second window on the historical comprehensive score value sequence, judging whether the recall rate of the second window is higher than that of the first window, if so, taking the second window as a confirmation time window, and if not, taking the first window as the confirmation time window.
In one embodiment, the multimodal data includes: any of brain electrical data, eye movement data, and facial video data.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
collecting multi-modal data of a user to be evaluated, wherein the multi-modal data comprises: a plurality of current modality data reflecting the attention characteristics of the user to be evaluated from different angles;
inputting each current mode data in the multi-mode data into a corresponding trained scoring sub-model to obtain a corresponding sub-scoring value;
And carrying out linear weighting according to each sub-score value, the accuracy weight corresponding to each sub-score value and the time attenuation weight corresponding to each sub-score value to obtain a comprehensive score value, wherein each accuracy weight is determined according to the accuracy of the corresponding score sub-model, and each time attenuation weight is determined according to each corresponding score sub-model and a preset time attenuation function.
In one embodiment, the time decay function is:
,
wherein n is the number of scoring sub-models, 0<i<n, i is a positive integer,for the current time +.>Is the optimal time attenuation coefficient corresponding to the scoring sub-model i, < ->The time attenuation weight corresponding to the scoring submodel i is obtained;
the method for determining the optimal time attenuation coefficient comprises the following steps:
defining time attenuation coefficients of the evaluation molecular models;
acquiring a plurality of training data corresponding to each evaluation molecular model, and labeling all the training data with real labels;
inputting each training data into a corresponding scoring sub-model to obtain a corresponding prediction score, and comparing each prediction score with a corresponding real label to obtain a corresponding comparison result;
and accumulating and summing all comparison results corresponding to each evaluation molecular model to obtain a loss function, updating the corresponding time attenuation coefficient through back propagation of the loss function, and obtaining the corresponding optimal time attenuation coefficient after multiple iterative training.
In one embodiment, each of the scoring models is trained by:
acquiring a plurality of historical mode data of the user to be evaluated corresponding to each scoring sub-model;
judging whether the sum of the number of the historical mode data and the current mode data corresponding to each evaluation molecular model is smaller than a preset threshold value;
if yes, loading each pre-trained general scoring model, selecting a first preset percentage of modal data from all historical modal data and current modal data corresponding to each general scoring model as a corresponding fine tuning data set, fixing parameters of all layers except the last layer of each general scoring model, training the last layer by using the corresponding fine tuning data set, and updating the weight of the last layer to obtain a corresponding scoring sub-model;
if not, loading each pre-trained general scoring model, selecting a second preset percentage of mode data from all the historical mode data and the current mode data corresponding to each general scoring model as a training set, using the rest of mode data as a verification set, training the corresponding general scoring model by using the training set, testing the trained general scoring model by using the corresponding verification set every preset training round number until the loss function of the corresponding verification set converges, and using the trained general scoring model as a corresponding scoring sub-model.
In one embodiment, the determining the accuracy weights according to the accuracy of the corresponding scoring sub-model includes:
extracting a corresponding feature set from each current mode data of the user to be evaluated;
calculating the distinguishing degree of the corresponding feature set on each evaluation molecular model;
determining the accuracy weight of each scoring submodel according to an accuracy weight calculation formula, wherein the accuracy weight calculation formula is as follows:
;
;
,
wherein n is the number of scoring sub-models, 0<i<n, i is a positive integer,for scoring the accuracy corresponding to submodel i, < ->For scoring the degree of differentiation corresponding to submodel i, < >>For the average differentiation, ++>For the weight adjustment factor, +.>And (5) weighing the accuracy corresponding to the scoring submodel i.
In one embodiment, further comprising:
judging whether the comprehensive score value is lower than a preset score value or not;
if yes, judging whether the comprehensive score value rises to be more than a preset score value in the confirmation time window, if yes, judging that the time is short, if not, judging that the time is real, and sending out a warning;
if not, the normal attention state is determined.
In one embodiment, the method for determining the acknowledgement time window includes:
Collecting a plurality of sections of historical comprehensive scoring value sequences, wherein the comprehensive scoring values in the historical comprehensive scoring value sequences are selected according to preset quantity;
calculating the mean value and standard deviation of each historical comprehensive scoring value sequence;
setting a candidate time sequence, wherein the candidate time sequence comprises a plurality of time windows;
counting the proportion of the true low-attention state with correct prediction of all the historical comprehensive score value sequences to the true low-attention state in each time window to obtain a corresponding recall rate, and taking the time window with the highest recall rate as a first window;
calculating a first average value of the average values of all the historical comprehensive score value sequences, calculating a second average value of the standard values of all the historical comprehensive score value sequences, and calculating a second window through the first average value and the second average value;
and testing the recall rates of the first window and the second window on the historical comprehensive score value sequence, judging whether the recall rate of the second window is higher than that of the first window, if so, taking the second window as a confirmation time window, and if not, taking the first window as the confirmation time window.
In one embodiment, the multimodal data includes: any of brain electrical data, eye movement data, and facial video data.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
Claims (8)
1. A method of attention assessment based on multimodal fusion, comprising:
collecting multi-modal data of a user to be evaluated, wherein the multi-modal data comprises: a plurality of current modality data reflecting the attention characteristics of the user to be evaluated from different angles;
inputting each current mode data in the multi-mode data into a corresponding trained scoring sub-model to obtain a corresponding sub-scoring value;
performing linear weighting according to each sub-score value, an accuracy weight corresponding to each sub-score value and a time attenuation weight corresponding to each sub-score value to obtain a comprehensive score value, wherein each accuracy weight is determined according to the accuracy of a corresponding score sub-model, and each time attenuation weight is determined according to each corresponding score sub-model and a preset time attenuation function;
the time decay function is:
,
Wherein n is the number of scoring sub-models, 0<i<n, i is a positive integer,for the current time +.>Is the optimal time attenuation coefficient corresponding to the scoring sub-model i, < ->The time attenuation weight corresponding to the scoring submodel i is obtained;
the method for determining the optimal time attenuation coefficient comprises the following steps:
defining time attenuation coefficients of the evaluation molecular models;
acquiring a plurality of training data corresponding to each evaluation molecular model, and labeling all the training data with real labels;
inputting each training data into a corresponding scoring sub-model to obtain a corresponding prediction score, and comparing each prediction score with a corresponding real label to obtain a corresponding comparison result;
accumulating and summing all comparison results corresponding to each evaluation molecular model to obtain a loss function, updating a corresponding time attenuation coefficient through counter propagation of the loss function, and obtaining a corresponding optimal time attenuation coefficient after multiple iterative training;
the accuracy weights are determined according to the accuracy of the corresponding scoring submodel, and the method comprises the following steps:
extracting a corresponding feature set from each current mode data of the user to be evaluated;
calculating the distinguishing degree of the corresponding feature set on each evaluation molecular model;
Determining the accuracy weight of each scoring submodel according to an accuracy weight calculation formula, wherein the accuracy weight calculation formula is as follows:
,
wherein n is the number of scoring sub-models, 0<i<n, i is a positive integer,for the accuracy corresponding to the scoring sub-model i,for scoring the degree of differentiation corresponding to submodel i, < >>For the average differentiation, ++>For the weight adjustment factor, +.>And (5) weighing the accuracy corresponding to the scoring submodel i.
2. The attention assessment method based on multi-modal fusion as claimed in claim 1, wherein each of the assessment models is trained by:
acquiring a plurality of historical mode data of the user to be evaluated corresponding to each scoring sub-model;
judging whether the sum of the number of the historical mode data and the current mode data corresponding to each evaluation molecular model is smaller than a preset threshold value;
if yes, loading each pre-trained general scoring model, selecting a first preset percentage of modal data from all historical modal data and current modal data corresponding to each general scoring model as a corresponding fine tuning data set, fixing parameters of all layers except the last layer of each general scoring model, training the last layer by using the corresponding fine tuning data set, and updating the weight of the last layer to obtain a corresponding scoring sub-model;
If not, loading each pre-trained general scoring model, selecting a second preset percentage of mode data from all the historical mode data and the current mode data corresponding to each general scoring model as a training set, using the rest of mode data as a verification set, training the corresponding general scoring model by using the training set, testing the trained general scoring model by using the corresponding verification set every preset training round number until the loss function of the corresponding verification set converges, and using the trained general scoring model as a corresponding scoring sub-model.
3. The attention assessment method based on multi-modal fusion as claimed in claim 1, further comprising:
judging whether the comprehensive score value is lower than a preset score value or not;
if yes, judging whether the comprehensive score value rises to be more than a preset score value in the confirmation time window, if yes, judging that the time is short, if not, judging that the time is real, and sending out a warning;
if not, the normal attention state is determined.
4. A method of attention assessment based on multimodal fusion as claimed in claim 3, wherein the method of determining the validation time window comprises:
Collecting a plurality of sections of historical comprehensive scoring value sequences, wherein the comprehensive scoring values in the historical comprehensive scoring value sequences are selected according to preset quantity;
calculating the mean value and standard deviation of each historical comprehensive scoring value sequence;
setting a candidate time sequence, wherein the candidate time sequence comprises a plurality of time windows;
counting the proportion of the true low-attention state with correct prediction of all the historical comprehensive score value sequences to the true low-attention state in each time window to obtain a corresponding recall rate, and taking the time window with the highest recall rate as a first window;
calculating a first average value of the average values of all the historical comprehensive score value sequences, calculating a second average value of the standard values of all the historical comprehensive score value sequences, and calculating a second window through the first average value and the second average value;
and testing the recall rates of the first window and the second window on the historical comprehensive score value sequence, judging whether the recall rate of the second window is higher than that of the first window, if so, taking the second window as a confirmation time window, and if not, taking the first window as the confirmation time window.
5. The attention assessment method based on multimodal fusion as recited in claim 1, wherein the multimodal data includes: any of brain electrical data, eye movement data, and facial video data.
6. An attention assessment system based on multimodal fusion, comprising:
the data acquisition module is used for acquiring multi-mode data of a user to be evaluated, wherein the multi-mode data comprises: a plurality of current modality data reflecting the attention characteristics of the user to be evaluated from different angles;
the scoring sub-module is used for inputting each current mode data in the multi-mode data into a corresponding trained scoring sub-model to obtain a corresponding sub-scoring value;
the comprehensive scoring module is used for carrying out linear weighting according to each sub-scoring value, the accuracy weight corresponding to each sub-scoring value and the time attenuation weight corresponding to each sub-scoring value to obtain a comprehensive scoring value, wherein each accuracy weight is determined according to the accuracy of the corresponding scoring sub-model, each time attenuation weight is determined according to each corresponding scoring sub-model and a preset time attenuation function, and the time attenuation function is as follows:
,
wherein n is the number of scoring sub-models, 0<i<n, i is a positive integer,for the current time +.>Is the optimal time attenuation coefficient corresponding to the scoring sub-model i, < ->The time attenuation weight corresponding to the scoring submodel i is obtained;
The method for determining the optimal time attenuation coefficient comprises the following steps:
defining time attenuation coefficients of the evaluation molecular models;
acquiring a plurality of training data corresponding to each evaluation molecular model, and labeling all the training data with real labels;
inputting each training data into a corresponding scoring sub-model to obtain a corresponding prediction score, and comparing each prediction score with a corresponding real label to obtain a corresponding comparison result;
accumulating and summing all comparison results corresponding to each evaluation molecular model to obtain a loss function, updating a corresponding time attenuation coefficient through counter propagation of the loss function, and obtaining a corresponding optimal time attenuation coefficient after multiple iterative training;
the accuracy weights are determined according to the accuracy of the corresponding scoring submodel, and the method comprises the following steps:
extracting a corresponding feature set from each current mode data of the user to be evaluated;
calculating the distinguishing degree of the corresponding feature set on each evaluation molecular model;
determining the accuracy weight of each scoring submodel according to an accuracy weight calculation formula, wherein the accuracy weight calculation formula is as follows:
,
wherein n is the number of scoring sub-models, 0 <i<n, i is a positive integer,for the accuracy corresponding to the scoring sub-model i,for scoring the degree of differentiation corresponding to submodel i, < >>For the average differentiation, ++>For the weight adjustment factor, +.>And (5) weighing the accuracy corresponding to the scoring submodel i.
7. Computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 5 when the computer program is executed.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311154786.8A CN116881853B (en) | 2023-09-08 | 2023-09-08 | Attention assessment method, system, equipment and medium based on multi-mode fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311154786.8A CN116881853B (en) | 2023-09-08 | 2023-09-08 | Attention assessment method, system, equipment and medium based on multi-mode fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116881853A CN116881853A (en) | 2023-10-13 |
CN116881853B true CN116881853B (en) | 2024-01-05 |
Family
ID=88257314
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311154786.8A Active CN116881853B (en) | 2023-09-08 | 2023-09-08 | Attention assessment method, system, equipment and medium based on multi-mode fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116881853B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117687313B (en) * | 2023-12-29 | 2024-07-12 | 广东福临门世家智能家居有限公司 | Intelligent household equipment control method and system based on intelligent door lock |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103995804A (en) * | 2013-05-20 | 2014-08-20 | 中国科学院计算技术研究所 | Cross-media topic detection method and device based on multimodal information fusion and graph clustering |
CN111861569A (en) * | 2020-07-23 | 2020-10-30 | 中国工商银行股份有限公司 | Product information recommendation method and device |
CN114366103A (en) * | 2022-01-07 | 2022-04-19 | 北京师范大学 | Attention assessment method and device and electronic equipment |
CN116127853A (en) * | 2023-03-03 | 2023-05-16 | 北京工业大学 | Unmanned driving overtaking decision method based on DDPG (distributed data base) with time sequence information fused |
CN116386862A (en) * | 2023-02-10 | 2023-07-04 | 平安科技(深圳)有限公司 | Multi-modal cognitive impairment evaluation method, device, equipment and storage medium |
CN116530938A (en) * | 2023-05-09 | 2023-08-04 | 中国人民解放军军事科学院系统工程研究院 | Cognitive enhancement training system and method |
-
2023
- 2023-09-08 CN CN202311154786.8A patent/CN116881853B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103995804A (en) * | 2013-05-20 | 2014-08-20 | 中国科学院计算技术研究所 | Cross-media topic detection method and device based on multimodal information fusion and graph clustering |
CN111861569A (en) * | 2020-07-23 | 2020-10-30 | 中国工商银行股份有限公司 | Product information recommendation method and device |
CN114366103A (en) * | 2022-01-07 | 2022-04-19 | 北京师范大学 | Attention assessment method and device and electronic equipment |
CN116386862A (en) * | 2023-02-10 | 2023-07-04 | 平安科技(深圳)有限公司 | Multi-modal cognitive impairment evaluation method, device, equipment and storage medium |
CN116127853A (en) * | 2023-03-03 | 2023-05-16 | 北京工业大学 | Unmanned driving overtaking decision method based on DDPG (distributed data base) with time sequence information fused |
CN116530938A (en) * | 2023-05-09 | 2023-08-04 | 中国人民解放军军事科学院系统工程研究院 | Cognitive enhancement training system and method |
Non-Patent Citations (2)
Title |
---|
Correlated Attention Networks for Multimodal Emotion Recognition;Jie-Lin Qiu等;《2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)》;第2656-2660页 * |
基于多模态情感融合的个性化推荐算法研究;肖旋;《中国优秀硕士学位论文全文数据库信息科技辑》(第03期);第I138-3325页 * |
Also Published As
Publication number | Publication date |
---|---|
CN116881853A (en) | 2023-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | Driver fatigue detection based on eye state recognition | |
Kalka et al. | Estimating and fusing quality factors for iris biometric images | |
CN102547123B (en) | Self-adapting sightline tracking system and method based on face recognition technology | |
CA2559381C (en) | Interactive system for recognition analysis of multiple streams of video | |
JP7070605B2 (en) | Focus range estimator, its method and program | |
CN116881853B (en) | Attention assessment method, system, equipment and medium based on multi-mode fusion | |
CN108596087B (en) | Driving fatigue degree detection regression model based on double-network result | |
CN102136024A (en) | Biometric feature identification performance assessment and diagnosis optimizing system | |
CN110464367B (en) | Psychological anomaly detection method and system based on multi-channel cooperation | |
CN105373767A (en) | Eye fatigue detection method for smart phones | |
CN112733772B (en) | Method and system for detecting real-time cognitive load and fatigue degree in warehouse picking task | |
Busey et al. | Characterizing human expertise using computational metrics of feature diagnosticity in a pattern matching task | |
CN116343284A (en) | Attention mechanism-based multi-feature outdoor environment emotion recognition method | |
Rigas et al. | Towards a multi-source fusion approach for eye movement-driven recognition | |
CN117370828A (en) | Multi-mode feature fusion emotion recognition method based on gating cross-attention mechanism | |
CN111297327A (en) | Sleep analysis method, system, electronic equipment and storage medium | |
CN112907635B (en) | Method for extracting abnormal eye movement characteristics based on geometric analysis | |
CN114424941A (en) | Fatigue detection model construction method, fatigue detection method, device and equipment | |
CN114298189A (en) | Fatigue driving detection method, device, equipment and storage medium | |
CN113887365A (en) | Special personnel emotion recognition method and system based on multi-mode data fusion | |
CN111222374A (en) | Lie detection data processing method and device, computer equipment and storage medium | |
CN117171658A (en) | Cognitive load judging method based on brain intelligent technology | |
Wild et al. | Impact of (segmentation) quality on long vs. short‐timespan assessments in iris recognition performance | |
CN115153476A (en) | Sleep evaluation method and device based on multi-dimensional data, electronic equipment and medium | |
Akinci et al. | A video-based eye pupil detection system for diagnosing bipolar disorder |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |