CN117137488A - Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images - Google Patents

Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images

Info

Publication number
CN117137488A
CN117137488A CN202311405231.6A
Authority
CN
China
Prior art keywords
facial expression
task
feature
module
depression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311405231.6A
Other languages
Chinese (zh)
Other versions
CN117137488B (en)
Inventor
吕玉丹
杨鑫
王长明
姚翰
张永祥
殷雪峰
李童
张肇轩
尹宝才
张远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Jilin University
Original Assignee
Dalian University of Technology
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology, Jilin University filed Critical Dalian University of Technology
Priority to CN202311405231.6A priority Critical patent/CN117137488B/en
Publication of CN117137488A publication Critical patent/CN117137488A/en
Application granted granted Critical
Publication of CN117137488B publication Critical patent/CN117137488B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • A61B 5/165 Evaluating the state of mind, e.g. depression, anxiety
    • A61B 5/0033 Features or image-related aspects of imaging apparatus classified in A61B 5/00, e.g. for MRI, optical tomography or impedance tomography apparatus; arrangements of imaging apparatus in a room
    • A61B 5/0077 Devices for viewing the surface of the body, e.g. camera, magnifying lens
    • A61B 5/369 Electroencephalography [EEG]
    • A61B 5/7267 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems, involving training the classification device
    • G06V 10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/82 Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 40/168 Feature extraction; Face representation (human faces)
    • G06V 40/174 Facial expression recognition
    • G16H 50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for computer-aided diagnosis, e.g. based on medical expert systems

Abstract

The invention discloses an auxiliary identification method for depression symptoms based on electroencephalogram (EEG) data and facial expression images. Its main design idea is to evaluate the degree of conflict-processing impairment by organically combining EEG physiological indexes with facial image indexes, centered on the core depressive symptoms of impaired conflict processing and negative bias, thereby extracting objective and quantitative indexes for the identification and prediction of depression. Specifically, an EEG stimulation experiment is executed based on a preset experimental paradigm, and the subject's facial expression image data and individual EEG data are collected synchronously; after the N270 waveform is obtained through analysis, multi-feature extraction is performed on the facial expression image data and the N270 waveform, the extracted features are integrated and input into a trained neural network model, and an auxiliary identification result for depression symptoms is output. The invention not only provides objective indexes from experimental waveforms and image data to assist doctors in identifying depression, but also offers high accuracy, strong robustness and a low error rate.

Description

Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images
Technical Field
The invention relates to the field of applied artificial intelligence technology, in particular to an auxiliary identification method for depression symptoms based on electroencephalogram (EEG) data and facial expression images.
Background
Depression is a common affective disorder that seriously affects patients' physical and mental health. How to better assist doctors in identifying, differentiating and predicting depression, and how to construct objective indicators for doing so, has become a long-term and important goal of computer-aided diagnosis and treatment of depression.
The event-related potential (ERP) paradigm is a neurophysiological experimental design and analysis method used to study the electrophysiological response of the brain to a specific stimulus or task. ERPs are also known as "cognitive potentials" and are highly informative for revealing psycho-cognitive processes. N270 is a negative wave recorded around 270 ms after stimulus onset when two stimuli do not fully match; it reflects the brain's capacity to process conflicting information, can serve as an electrophysiological index of conflict-processing ability, and has strong specificity. Patients with depression show marked impairment of conflict-processing ability together with the core symptom of negative bias, which are also the main reasons why such patients cannot maintain normal social connections or recover social function. Thus N270, as an electrophysiological index for evaluating conflict-information processing capability, can serve as a specific objective index for predicting depression, independently of the depressive syndrome itself. Previous imaging studies have shown that the conflict-processing system of depressive patients is impaired in work and social activities; the invention therefore considers that N270 can be used as an objective and sensitive index for predicting depression.
However, the previously known N270 task paradigms suffer from the following problems: 1. the signal-to-noise ratio is low, so a large number of repeated trials and heavy data processing are needed to extract reliable components; the test procedure is cumbersome, the learning cost for the subject is high, and the time and workload of the test are large; 2. in the stimulus presentation system, the event code and the stimulus code are not precisely synchronized, so timing errors exist; 3. the materials used in the paradigm are mainly numbers, letters and figures, which are poorly suited to the negative-bias symptom of depressed populations; 4. in particular, relying on N270 EEG data alone gives relatively limited reliability, which affects the subsequent generation of guidance reports.
Disclosure of Invention
In view of the foregoing, the present invention is directed to an auxiliary identification method for depressive disorder based on brain electrical data and facial expression images, so as to solve the above-mentioned technical problems.
The technical scheme adopted by the invention is as follows:
the invention provides a depression symptom auxiliary identification method based on brain electrical data and facial expression images, which comprises the following steps:
performing an electroencephalogram stimulation experiment based on a preset experimental paradigm, and synchronously collecting the subject's facial expression image data and individual electroencephalogram data;
analyzing the electroencephalogram data to obtain an N270 waveform;
performing multi-feature extraction on the facial expression image data and the N270 waveform;
and inputting the integrated multi-feature into a trained neural network model based on a self-attention mechanism, and outputting an auxiliary identification result of the depression symptoms.
In at least one possible implementation manner, the editing process of the preset experimental paradigm includes:
based on Matlab and E-prime, gray photos with consistent physical properties are used; the sex ratio in the gray photos is balanced, neutral and negative emotional expressions appear in equal proportion, the faces carry no distinguishing marks, and partial face occlusion is applied to half of the photos;
a given number of trials, the duration of a single trial, and the total duration of the experiment are set.
In at least one possible implementation manner, the multi-feature extraction includes:
extracting facial features and physiological signal features from the facial expression image data by utilizing a pre-trained multi-task depth prediction model;
extracting waveform features based on the N270 waveform;
and obtaining emotion feature vectors based on the facial expression image data and the N270 waveform.
In at least one possible implementation manner, the multi-task depth prediction model comprises a common feature extraction module and a multi-task feature fusion module;
the common feature extraction module is used for extracting features of each task and recovering a rough depth map, a semantic segmentation map and a surface vector map;
the multi-task feature fusion module is used for carrying out multi-task fusion on the features extracted by the common feature extraction module, and can distinguish the common semantic features of each task and the unique semantic features of each task.
In at least one possible implementation manner, the common feature extraction module adopts a single-input multi-output network and is composed of at least four parts: an encoder, a multi-dimensional decoder, a multi-scale feature fusion sub-module and a refinement sub-module;
the encoder is used for extracting characteristics of various scales;
the multi-dimensional decoder is used for gradually expanding the final characteristics of the encoder through an up-sampling module and reducing the number of channels at the same time;
the multi-scale feature fusion submodule is used for combining different information of a plurality of scales into one;
the refinement sub-module is used for adjusting the output size of the image and the channel number.
In at least one possible implementation manner, the multi-task feature fusion module adopts a multi-input multi-output network and is composed of at least two parts: the multi-input feature fusion module is used for fusing the multi-task features output by the previous module; the feature decoding section is a multi-output decoder.
In at least one possible implementation manner, the outputting the auxiliary identification result of the depression disorder includes:
fusing the multiple features in time sequence to obtain space-time feature vectors;
and inputting the space-time feature vector into a Transformer encoder model, and classifying by softmax to obtain an auxiliary recognition result.
Compared with the prior art, the main design concept of the invention is to evaluate the degree of conflict-processing impairment by organically combining EEG physiological indexes with facial image indexes, centered on the core depressive symptoms of impaired conflict processing and negative bias, thereby extracting objective and quantitative indexes for the identification and prediction of depression. Specifically, an EEG stimulation experiment is executed based on a preset experimental paradigm, and the subject's facial expression image data and individual EEG data are collected synchronously; after the N270 waveform is obtained through analysis, multi-feature extraction is performed on the facial expression image data and the N270 waveform, the extracted features are integrated and input into a trained neural network model, and an auxiliary identification result for depression symptoms is output. The invention not only provides objective indexes from experimental waveforms and image data to assist doctors in identifying depression, but also offers high accuracy, strong robustness and a low error rate.
Furthermore, machine learning is applied to the detection and feature extraction of facial expressions or micro-expressions of depressive patients and is combined with deep learning based on the multi-task module, so that the sensitivity, accuracy and specificity of identifying and predicting depression indicators can be remarkably improved.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described with reference to the accompanying drawings, in which:
fig. 1 is a flow chart of an auxiliary identification method for depressive disorder based on brain electrical data and facial expression images according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
The invention provides an embodiment of a depression disorder auxiliary identification method based on brain electrical data and facial expression images, specifically, as shown in fig. 1, the method comprises the following steps:
Step S1, performing an electroencephalogram stimulation experiment based on a preset experimental paradigm, and synchronously collecting the subject's facial expression image data (facial expressions can be captured and recorded in close-up by a high-definition camera) and individual electroencephalogram data (specifically, electroencephalogram time-domain data);
In the preset paradigm mentioned here, the specific editing process may include the following parts: based on Matlab and E-prime, gray photos with consistent physical properties are used; the sex ratio in the gray photos is balanced, neutral and negative emotional expressions appear in equal proportion, the faces carry no distinguishing marks (such as glasses, beards, skin nevi, jewelry and the like), and partial face occlusion is applied to half of the photos; the preset number of trials, the duration of a single trial and the total duration of the experiment are set, for example at least 180 trials, each lasting 500 ms, with the pictures and inter-stimulus intervals presented in sequence for a total duration of about 8 minutes. A minimal worked example of these timing parameters follows.
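The sketch below builds such a trial list and checks the stated durations. The inter-stimulus interval is an assumption inferred from the roughly 8-minute total, not a value given in the patent, and all variable names are illustrative.

```python
# Minimal sketch of the stated paradigm parameters: >= 180 trials, 500 ms stimulus each,
# neutral/negative faces in equal proportion, half of the photos partially occluded.
# The inter-stimulus interval (ISI) is an assumed value chosen so that the total run
# time comes out near the stated ~8 minutes; it is not specified in the patent.
import random

N_TRIALS = 180
STIM_MS = 500
ISI_MS = 2170          # assumption: a ~2.17 s gap gives (500 + 2170) * 180 ≈ 8.0 min

conditions = ["neutral"] * (N_TRIALS // 2) + ["negative"] * (N_TRIALS // 2)
occluded = [i % 2 == 0 for i in range(N_TRIALS)]   # half of the photos partially occluded
trials = list(zip(conditions, occluded))
random.shuffle(trials)

total_min = N_TRIALS * (STIM_MS + ISI_MS) / 1000 / 60
print(f"{N_TRIALS} trials, ~{total_min:.1f} minutes")   # ~8.0 minutes
```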
The novel N270 task paradigm has the following advantages: 1. the experimental flow is simple and easy for the subject to perform; 2. the parameter design is reasonable, the evoked waveform is highly repeatable, the signal-to-noise ratio is high, the timing system is accurately calibrated, and the timing error range is controllable; 3. the picture material covers negative and neutral emotions and targets the core negative-bias symptom of depressed populations; 4. the task is sensitive and highly specific, is designed for the core symptom of impaired conflict-processing capacity, and can serve as an objective index for early prediction of depression.
In addition, analysis shows that depression involves core symptoms such as the depressive syndrome, impaired conflict processing and negative bias, together with changes in facial expressions or micro-expressions, for example more sad, negative expressions and fewer smiling, positive expressions. Facial expressions are produced by muscle contraction, but their changes are consistent with internal psychological changes, so they can serve as a window for conveying information such as emotion, intention and desire. When stimulated by conflicting information (information that conflicts with the patient's thinking, values or expectations), depressive patients, whose conflict-processing capacity is weakened, experience negative emotions such as anxiety and stress, and their facial expressions further shift towards negative emotions such as depression, despair and helplessness. Accordingly, while the subject performs the task of the novel N270 paradigm that stimulates the brain's conflict-processing system, the patient's facial expressions can be collected synchronously as reinforcing features, and these facial expression features can be deeply learned with an AI-assisted algorithm, further strengthening the identification of conflict-processing capacity in depressed populations and the extraction and quantification of facial features.
S2, analyzing the electroencephalogram data to obtain an N270 waveform;
in actual operation, mainly may include:
after the electroencephalogram data is imported and independent component analysis is carried out, setting an N270 segmentation time window;
the data segments of the different-stimulus and same-stimulus conditions are averaged separately, the N270 waveform is obtained by waveform subtraction, and it is identified as a negative wave between 220 ms and 380 ms after stimulus presentation.
The complete waveform analysis process can proceed as follows (an equivalent code sketch is given after step j)):
a) Importing data: starting Matlab and EEGlab, and selecting corresponding import format to import data;
b) Downsampling: the sampling rate is adjusted to 250Hz, data is compressed, and the influence of high-frequency information is reduced;
c) Import channel information: determine the coordinates of each electrode; reject unused electrodes, screening the leads according to actual conditions and needs;
d) Re-referencing: taking bilateral mastoid as a reference;
e) And (3) filtering: carrying out band-pass filtering of 0.5Hz-45 Hz;
f) Independent component analysis (ICA): perform ICA on the data to remove artifacts in the signal, in particular ocular artifacts, and remove noise components;
g) Segmentation: the N270 segment time window is set to 200ms before stimulation and 800ms after stimulation;
h) Baseline correction: correction was performed with 200ms before stimulation as baseline;
i) Superposition averaging: the data segments of the different-stimulus and same-stimulus conditions are averaged separately, and the face-same waveform is subtracted from the face-different waveform to obtain the N270 waveform;
j) ERP component identification: n270 is a negative wave between 220-380ms after stimulus presentation.
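As an illustration only, the same preprocessing chain can be sketched with the open-source MNE-Python package (the patent itself performs these steps in Matlab/EEGlab). The file name, channel names, ICA component index and condition names below are placeholders, not values taken from the patent.

```python
# Hedged sketch of the N270 preprocessing chain using MNE-Python
# (the patent uses Matlab/EEGlab; function choices and event names here are assumptions).
import mne

raw = mne.io.read_raw_brainvision("subject01.vhdr", preload=True)  # hypothetical file
raw.resample(250)                                   # b) downsample to 250 Hz
raw.set_montage("standard_1020")                    # c) electrode coordinates
raw.drop_channels(["HEOG", "VEOG"])                 # c) reject unused leads (assumed names)
raw.set_eeg_reference(["M1", "M2"])                 # d) re-reference to bilateral mastoids
raw.filter(l_freq=0.5, h_freq=45.0)                 # e) 0.5-45 Hz band-pass

ica = mne.preprocessing.ICA(n_components=20, random_state=0)
ica.fit(raw)                                        # f) ICA; ocular components would be
ica.exclude = [0]                                   #    selected by inspection (assumed index)
raw = ica.apply(raw)

events, event_id = mne.events_from_annotations(raw) # event codes from the stimulation program
epochs = mne.Epochs(raw, events, event_id,          # g) segment -200 ms .. +800 ms
                    tmin=-0.2, tmax=0.8,
                    baseline=(-0.2, 0.0),           # h) baseline correction
                    preload=True)

# i) average per condition and subtract to isolate the N270 difference wave
evoked_diff = epochs["face_different"].average()    # condition names are placeholders
evoked_same = epochs["face_same"].average()
n270 = mne.combine_evoked([evoked_diff, evoked_same], weights=[1, -1])

# j) N270: negative deflection between 220 and 380 ms after stimulus onset
n270_window = n270.copy().crop(tmin=0.22, tmax=0.38)
```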
S3, carrying out multi-feature extraction on facial expression image data and N270 waveforms;
in particular, the multi-feature extraction may include:
extracting facial features and physiological signal features based on the facial expression image data;
extracting waveform features based on the N270 waveform;
and obtaining emotion feature vectors based on the facial expression image data and the N270 waveform.
It should be noted that a depth prediction algorithm that integrates the features of multiple tasks is an algorithm capable of extracting the features shared by the tasks and exploiting the interrelationships between the tasks so that they promote each other, thereby performing feature fusion or parameter sharing through task interaction. Existing models for integrating multi-task features mostly fuse features only through parameter sharing and do not use the common features to express how the tasks help one another or what distinguishes them; by introducing a deep learning method based on a multi-task module, the invention can effectively address this problem.
For the process of feature extraction described above, reference is given here to the implementation as follows:
(1) Facial features
Step S31, carrying out framing processing on the facial expression image data to obtain a plurality of image sequences;
s32, extracting facial features of the image sequence by utilizing a pre-trained multitask depth prediction model to obtain a facial frame sequence;
step S33, extracting optical flow from adjacent frames of the face frame sequence by adopting an optical flow method to obtain an optical flow sequence;
and step S34, fusing and expanding the face frame sequence and the optical flow sequence to obtain a face one-dimensional vector, and then carrying out linear mapping on the face one-dimensional vector to obtain a face embedded vector.
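A hedged sketch of steps S31 to S34 follows. It uses OpenCV's Haar cascade face detector and Farneback optical flow as stand-ins for the patent's pre-trained multi-task depth prediction model, and a fixed random matrix as a stand-in for the learned linear mapping, so every concrete choice below (frame size, embedding dimension, detector) is an assumption.

```python
# Hedged sketch of steps S31-S34: face-frame extraction, optical flow, and linear embedding.
import cv2
import numpy as np

def face_embeddings(video_path, embed_dim=128, size=(112, 112)):
    cap = cv2.VideoCapture(video_path)
    face_det = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    frames = []
    while True:                                   # S31: split the video into an image sequence
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        boxes = face_det.detectMultiScale(gray, 1.1, 5)
        if len(boxes) == 0:
            continue
        x, y, w, h = boxes[0]                     # S32: crop the face region (stand-in detector)
        frames.append(cv2.resize(gray[y:y + h, x:x + w], size))
    cap.release()

    flows = [cv2.calcOpticalFlowFarneback(         # S33: optical flow between adjacent frames
                 frames[i], frames[i + 1], None,
                 0.5, 3, 15, 3, 5, 1.2, 0)
             for i in range(len(frames) - 1)]

    # S34: fuse face frames and flow, flatten to a 1-D vector, then linearly map to an embedding
    dim = size[0] * size[1] * 3                    # face frame + 2-channel flow, flattened
    rng = np.random.default_rng(0)
    W = rng.standard_normal((embed_dim, dim)).astype(np.float32) / np.sqrt(dim)
    feats = []
    for f, fl in zip(frames[:-1], flows):
        vec = np.concatenate([f.ravel(), fl.ravel()]).astype(np.float32)
        feats.append(W @ vec)                      # W stands in for a learned linear mapping
    return np.stack(feats)
```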
(2) Physiological signal characteristics
Step S310, extracting a region of interest of cheek parts from the image sequence by using the multi-task depth prediction model to obtain a sequence of interest;
and step S320, flattening the region-of-interest sequence into a one-dimensional vector, and then linearly mapping it to obtain a physiological signal embedded vector.
(3) Waveform characteristics
And step S311, extracting basic features of N270 waveforms by taking a frame as a unit, forming the basic features into waveform one-dimensional vectors, and then carrying out linear mapping on the waveform one-dimensional vectors to obtain waveform embedded vectors.
(4) Emotional characteristics
Step S301, extracting a multidimensional facial feature vector from the image sequence;
step S302, performing Fourier transform on an N270 waveform to convert the waveform into frequency domain data, performing block division on the frequency domain data matrix according to different frequency ranges by taking frequency as a reference to obtain a block frequency domain matrix, calculating covariance matrixes of all sub-blocks of the block frequency domain matrix, and calculating LES of all the covariance matrixes to obtain LES feature vectors;
step S303, splicing the multi-dimensional facial feature vector and the LES feature vector to obtain a fusion vector; and inputting the fusion vector into a pre-trained emotion feature extraction model to obtain an emotion embedded vector corresponding to the image sequence.
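The following sketch illustrates steps S301 to S303. The patent does not expand the abbreviation LES, so it is interpreted here as the log-eigenvalue spectrum of each covariance sub-block; that interpretation, the frequency band edges, and the omission of the pre-trained emotion feature extraction model are all assumptions.

```python
# Hedged sketch of the emotion feature vector (steps S301-S303).
import numpy as np

def les_features(n270_segment, sfreq=250.0,
                 bands=((0.5, 4), (4, 8), (8, 13), (13, 30), (30, 45))):
    """n270_segment: array (n_channels, n_times) holding the N270 segment."""
    spectrum = np.fft.rfft(n270_segment, axis=-1)          # S302: Fourier transform
    freqs = np.fft.rfftfreq(n270_segment.shape[-1], d=1.0 / sfreq)
    feats = []
    for lo, hi in bands:                                   # block the frequency-domain matrix
        block = np.abs(spectrum[:, (freqs >= lo) & (freqs < hi)])
        cov = np.cov(block)                                 # covariance of the sub-block
        eigvals = np.linalg.eigvalsh(cov)                   # symmetric, so eigvalsh is safe
        feats.append(np.log(eigvals + 1e-10))               # "LES" feature (assumed definition)
    return np.concatenate(feats)

def emotion_vector(face_vec, n270_segment):
    """S303: concatenate the facial feature vector with the LES feature vector.
    In the patent this fused vector is then fed to a pre-trained emotion
    feature-extraction model; that model is omitted here."""
    return np.concatenate([face_vec, les_features(n270_segment)])
```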
And S4, inputting the integrated multi-feature into a trained neural network model based on a self-attention mechanism, and outputting an auxiliary identification result of the depression symptoms.
The following approach may be adopted for this step specifically (but is not limited to it); a code sketch follows the two sub-steps:
fusing the multiple features in time sequence to obtain space-time feature vectors;
and inputting the space-time feature vector into a Transformer encoder model, and classifying by softmax to obtain an auxiliary recognition result.
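A hedged PyTorch sketch of this classification stage follows; the feature dimension, sequence length, number of layers and the two-class output are illustrative assumptions, not parameters stated in the patent.

```python
# Hedged PyTorch sketch of step S4: temporally fused feature vectors are fed to a
# Transformer encoder followed by a softmax classifier.
import torch
import torch.nn as nn

class DepressionAuxClassifier(nn.Module):
    def __init__(self, feat_dim=128, n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.cls_head = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        # x: (batch, time, feat_dim), the spatio-temporal sequence obtained by fusing the
        # face, physiological-signal, waveform and emotion embeddings over time
        h = self.encoder(x)
        logits = self.cls_head(h.mean(dim=1))      # pool over the time dimension
        return torch.softmax(logits, dim=-1)       # auxiliary identification probabilities

# usage sketch: probs = DepressionAuxClassifier()(torch.randn(4, 30, 128))
```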
Further, regarding the foregoing multi-task depth prediction model, network training may be performed through two modules: one is the common feature extraction module, responsible for extracting the features of each task and recovering a rough depth map, a semantic segmentation map and a surface vector map; the other is the multi-task feature fusion module, responsible for the multi-task fusion of the features extracted by the common feature extraction module, which lets the network distinguish the semantic features common to all tasks from those unique to each task, so that the finally recovered images are more structurally coherent. Specifically:
(I) Regarding the common feature extraction module
(1) Network structure
The common feature extraction module comprises four parts: an encoder, a multi-dimensional decoder, a multi-scale feature fusion sub-module and a refinement sub-module. Specifically, the encoder may consist of four convolutional layers responsible for extracting features at the 1/4, 1/8, 1/16 and 1/32 scales; the multi-dimensional decoder adopts four up-sampling modules, so that the final features of the encoder are gradually enlarged while the number of channels is reduced; the multi-scale feature fusion sub-module integrates the four different-scale features from the encoder using up-sampling and channel concatenation: corresponding to the encoder, its four layer outputs (each with 16 channels) are up-sampled by ×2, ×4, ×8 and ×16 respectively so that they have the same size as the final output, concatenated along the channel dimension, and further transformed by a convolution layer to obtain an output with 64 channels. The main purpose of the multi-scale feature fusion sub-module is to combine the different information of multiple scales into one, so that the lower-layer outputs of the encoder retain information of finer spatial resolution and help recover the detail lost through repeated down-sampling. Finally, the refinement sub-module adjusts the output size and the number of channels of the image; three convolution layers are used, corresponding to the three tasks, so that the number of output channels is restored to 1 channel for the depth image, 1 channel for the semantic segmentation image and 3 channels for the surface vector map, which facilitates loss calculation and back propagation. A structural code sketch is given after step e) below. Schematically, in the above embodiment:
a) Carrying out framing treatment on the facial expression image data and the N270 waveform respectively to obtain a plurality of image sequences;
b) Each image is adjusted to an RGB image of size 320 × 240 × 3;
c) Four layers of characteristics are obtained through four convolution layers of the encoder;
d) The four-layer features are decoded by four sampling layers of the decoder;
e) The up-sampled outputs of different scales are then fused by the multi-scale fusion sub-module, and finally the shapes of the depth image, the semantic segmentation map and the surface vector map are recovered through different deconvolution layers in the refinement sub-module.
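The following PyTorch sketch mirrors this single-input multi-output structure (4-layer encoder, 4-step up-sampling decoder, multi-scale fusion to 64 channels, and three refinement heads with 1, 1 and 3 output channels); channel widths not stated in the text are assumptions.

```python
# Hedged sketch of the common feature extraction module.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout, stride=2):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, 1), nn.ReLU(inplace=True))

class CommonFeatureExtractor(nn.Module):
    def __init__(self, n_seg_classes=1):
        super().__init__()
        # encoder: features at 1/4, 1/8, 1/16 and 1/32 of the input resolution
        self.enc = nn.ModuleList([conv_block(3, 16, 4), conv_block(16, 32),
                                  conv_block(32, 64), conv_block(64, 128)])
        # decoder: progressively enlarge the last encoder feature while reducing channels
        self.dec = nn.ModuleList([conv_block(128, 64, 1), conv_block(64, 32, 1),
                                  conv_block(32, 16, 1), conv_block(16, 16, 1)])
        # each encoder level is squeezed to 16 channels before multi-scale fusion
        self.squeeze = nn.ModuleList([nn.Conv2d(c, 16, 1) for c in (16, 32, 64, 128)])
        self.fuse = nn.Conv2d(4 * 16, 64, 3, 1, 1)        # multi-scale fusion -> 64 channels
        self.head_depth = nn.Conv2d(64, 1, 3, 1, 1)        # refinement heads
        self.head_seg = nn.Conv2d(64, n_seg_classes, 3, 1, 1)
        self.head_normal = nn.Conv2d(64, 3, 3, 1, 1)

    def forward(self, x):                                   # x: (B, 3, 240, 320)
        feats, h = [], x
        for enc in self.enc:                                # encoder path
            h = enc(h)
            feats.append(h)
        for dec in self.dec:                                # decoder path on the last feature
            h = F.interpolate(dec(h), scale_factor=2, mode="bilinear", align_corners=False)
        target = h.shape[-2:]
        ups = [F.interpolate(s(f), size=target, mode="bilinear", align_corners=False)
               for s, f in zip(self.squeeze, feats)]        # x2/x4/x8/x16 up-sampling
        fused = self.fuse(torch.cat(ups, dim=1))
        return self.head_depth(fused), self.head_seg(fused), self.head_normal(fused)
```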
(2) Training process
a) Establish loss functions between the depth image, semantic segmentation map and surface vector map in the dataset and the corresponding images predicted by the network from the RGB image, with 4 images processed per batch;
b) Updating network parameters by reducing the loss function;
c) Iterate the updates, with the number of iterations set to 100, until the loss function converges.
The common feature extraction module is a single-input multi-output network. In the training process, the network must establish loss functions between the depth image, semantic segmentation map and surface vector map in the dataset and the corresponding images predicted by the network from the RGB input, update the network parameters and iterate until the loss function converges, which yields the trained network. Because three tasks need to be processed simultaneously while preserving the correlation among them, the loss function is divided into three parts; the specific feature extraction loss function is:
L_task = L_depth + L_seg + L_normal
the function and purpose of the loss function of each part are as follows:
(a) Depth map loss function, namely:
L_depth = L_1 + L_grad
This loss is the sum of an L1 loss and a gradient loss. For each pixel point i, the predicted depth and the true depth are d_i and D_i respectively; the L1 term, L_1 = (1/N) * Σ_i |d_i - D_i|, constrains the difference between d_i and D_i and, as the main part of the loss function, guarantees accuracy.
(b) The semantic segmentation loss function is a cross-entropy function, a loss commonly used in semantic segmentation to describe the distance between two probability distributions; in this model it constrains the predicted probability of the semantic class of each object in the image. A standard per-pixel cross-entropy of this form can be written as
L_seg = -(1/N) * Σ_i Σ_j s_ij · log(S_ij)
where S is the predicted semantic segmentation map and s the truth image; for each pixel i only one element s_ij along the class dimension is non-zero (equal to 1). Since the cross-entropy loss is only concerned with the prediction probability of the correct class, the classification result is correct as long as that probability is large enough.
(c) Surface vector loss function: measures the accuracy of the surface normal n_i^d estimated from the predicted depth map against the surface normal n_i^g of the true data. The surface vector loss is also derived from depth gradients, but it measures the angle between the two surface normals, so it is very sensitive to the depth structure and promotes consistency of the predicted structure; a loss of this form can be expressed as
L_normal = (1/N) * Σ_i ( 1 - <n_i^d, n_i^g> / (||n_i^d|| · ||n_i^g||) ).
(d) Gradient loss function: constrains the gradient changes of the depth error along the x and y axes. Writing e_i = d_i - D_i, with g_x(e_i) and g_y(e_i) the gradients of e_i along the x and y axes, a loss of this form can be expressed as
L_grad = (1/N) * Σ_i ( |g_x(e_i)| + |g_y(e_i)| ).
It detects edge information sensitively, since depth is usually discontinuous at object boundaries. Note that the gradient loss and the preceding depth loss are different types of errors, so the network is trained with a weighted combination of them.
In the training process of the common feature extraction module, loss functions are designed between the images of the 3 tasks obtained at the output of the network and their corresponding truth values, and back-propagated by gradient descent to update the network parameters; training is complete when the loss function converges. In one training embodiment of the common feature extraction module algorithm, the number of iterations is set to 100, 4 images are processed per batch, the initial learning rate is 10^-4 and is reduced to one tenth of its value every 20 iterations, and the converged loss value obtained after 100 iterations is 0.1245. A hedged training-loop sketch follows.
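The sketch below implements the three-part loss and the stated schedule (100 iterations, 4 images per batch, initial learning rate 10^-4 divided by 10 every 20 iterations); the optimizer choice, the binary form of the segmentation loss and the data-loader interface are assumptions. `model` refers to the CommonFeatureExtractor sketched above.

```python
# Hedged PyTorch sketch of L_task = L_depth + L_seg + L_normal and the training schedule.
import torch
import torch.nn.functional as F

def task_loss(pred_d, pred_s, pred_n, gt_d, gt_s, gt_n):
    # depth: L1 term plus gradient term on the error map e = pred_d - gt_d
    e = pred_d - gt_d
    l1 = e.abs().mean()
    gx = (e[..., :, 1:] - e[..., :, :-1]).abs().mean()
    gy = (e[..., 1:, :] - e[..., :-1, :]).abs().mean()
    l_depth = l1 + gx + gy
    # segmentation: cross entropy (binary form for a 1-channel map; a multi-class map
    # would use F.cross_entropy with integer labels instead)
    l_seg = F.binary_cross_entropy_with_logits(pred_s, gt_s)
    # normals: one minus cosine similarity between predicted and true surface normals
    l_normal = (1 - F.cosine_similarity(pred_n, gt_n, dim=1)).mean()
    return l_depth + l_seg + l_normal

def train(model, loader, n_iters=100):
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=20, gamma=0.1)
    for _, (rgb, gt_d, gt_s, gt_n) in zip(range(n_iters), loader):  # batches of 4 images
        pred_d, pred_s, pred_n = model(rgb)
        loss = task_loss(pred_d, pred_s, pred_n, gt_d, gt_s, gt_n)
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()            # divide the learning rate by 10 every 20 iterations
```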
(II) Regarding the multi-task feature fusion module
(1) Network structure
The multi-task feature fusion module consists of two parts, wherein the first part is a multi-input feature fusion module which is responsible for fusing the multi-task features output by the previous module, and the network used is a densely connected U-net; the second part is a feature decoding part, similar to the decoder part of the previous part, and is a multi-output decoder, so that the description is omitted. In connection with the previous examples, in particular:
a) Creating a coding path consisting of a plurality of streams, each stream handling the image form of a different task of the previous module;
b) The three tasks are encoded through the U-net encoder, the pooling features obtained after the primary convolution pooling operation of the image of the task 1 are combined with the secondary pooling features of the task 2, and the pooling features of the task 3 are combined after passing through the convolution layer, so that the feature sharing performance is ensured;
c) The obtained common features firstly obtain a common upsampling feature through an upsampling operation;
d) Decoding the up-sampling feature together with the previous pooling feature, and respectively sending the up-sampling feature and the pooling features with different scales of three tasks to a decoder;
e) Connecting with the features extracted by the previous tasks and recovering the original shape of each task through an up-sampling layer;
f) The restored depth image, semantic segmentation map, and surface vector map are loss compared to truth values in the dataset to update parameters in the network.
The multi-task feature fusion module adds dense connections to the original U-net network, which effectively strengthens feature extraction in the multi-input setting. To realise the dense connections, a coding path consisting of several streams is first created, each stream processing the image form of a different task from the previous module. The main purpose of using a separate stream for each image form is to avoid fusing information prematurely in the early stages, which would limit the network's ability to capture complex relationships between the modes. As can be seen from the network structure, the 3 tasks are encoded by the U-net encoders, but, unlike a standard U-net, interaction occurs as features pass through the different convolution layers: for example, the image of task 1 undergoes a first convolution-pooling operation, the resulting pooling features are combined with the second-level pooling features of task 2, and these are in turn combined with the pooling features of task 3 after passing through the convolution layers. In this way features flow between tasks and their commonality is ensured. In the decoder part, the obtained common features first pass through an up-sampling operation to obtain a common up-sampled feature, which is then decoded together with the earlier pooling features; the pooling features of the 3 tasks at their different scales are fed into the decoder, connected with the features extracted for the corresponding tasks, and passed through up-sampling layers to restore the original shape of each task. The restored depth image, semantic segmentation map and surface vector map are compared, via the loss, with the truth values in the dataset to update the parameters of the network. This combination of channel connection and down-sampling connection further fuses the features of different tasks and promotes conversion between tasks; unlike the common features in the common feature extraction module, it can reflect the connections between different tasks while keeping the original features, highlighting the fusion between multi-task features. A hedged structural sketch of this module follows.
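The following PyTorch sketch captures the multi-stream structure in miniature: three encoding streams whose pooled features are exchanged between streams, a shared bottleneck for the common features, and one decoding head per task; channel widths, the number of stages and the exact interaction points are assumptions.

```python
# Hedged sketch of the multi-input multi-output fusion module.
import torch
import torch.nn as nn
import torch.nn.functional as F

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, 1, 1), nn.ReLU(inplace=True))

class MultiTaskFusion(nn.Module):
    def __init__(self, in_ch=(1, 1, 3), width=16):
        super().__init__()
        self.enc1 = nn.ModuleList([block(c, width) for c in in_ch])       # per-stream stage 1
        self.enc2 = nn.ModuleList([block(width * 2, width * 2)             # stage 2 sees its own
                                   for _ in in_ch])                        # + a neighbour's features
        self.bottleneck = block(width * 6, width * 4)                       # shared (common) features
        self.dec = nn.ModuleList([nn.Sequential(block(width * 4, width),    # one head per task
                                                nn.Conv2d(width, c, 3, 1, 1))
                                  for c in in_ch])

    def forward(self, depth, seg, normal):
        xs = [depth, seg, normal]
        f1 = [F.max_pool2d(e(x), 2) for e, x in zip(self.enc1, xs)]         # first conv + pool
        # dense interaction: stream i also receives the pooled features of stream (i+1) % 3
        f2 = [F.max_pool2d(e(torch.cat([f1[i], f1[(i + 1) % 3]], dim=1)), 2)
              for i, e in enumerate(self.enc2)]
        shared = self.bottleneck(torch.cat(f2, dim=1))                       # common features
        up = F.interpolate(shared, scale_factor=4, mode="bilinear", align_corners=False)
        return tuple(d(up) for d in self.dec)                                # restore each task
```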
(2) Training process
a) Establish loss functions between the outputs and the corresponding ground-truth depth image, semantic segmentation map and surface vector map in the dataset, with 4 images processed per batch;
b) Updating network parameters by reducing a loss function, wherein the loss function is the same as the common feature extraction module;
c) Iterate the updates, with the number of iterations set to 100, until the loss function converges.
The multi-task feature fusion module is a multi-input multi-output network. During training, loss functions must be established between its outputs and the ground-truth depth image, semantic segmentation map and surface vector map in the database; the network parameters are updated by reducing the loss function and iterated until it converges, giving the trained network. When training the network, this module and the common feature extraction module can be trained jointly to form an end-to-end neural network: a single RGB image is input, and the corresponding depth image, semantic segmentation map and surface vector map are output. Since the 3 tasks are the same as in the common feature extraction module, the same loss functions are used as constraints and are not described again. In the training process of this module, the model obtains the images of the 3 tasks directly at the output of the network, designs loss functions between them and their corresponding truth values, and back-propagates by gradient descent to update the network parameters; training is complete when the loss function converges. In one training run of this model, the number of iterations is set to 100, 4 images are processed per batch, the initial learning rate is 10^-4 and is reduced to one tenth of its value every 20 iterations, and the converged loss value obtained after 100 iterations is 0.1159.
In summary, the main design concept of the invention is to evaluate the degree of conflict-processing impairment by organically combining EEG physiological indexes with facial image indexes, centered on the core depressive symptoms of impaired conflict processing and negative bias, thereby providing objective and quantitative indexes for the identification and prediction of depression. Specifically, an EEG stimulation experiment is executed based on a preset experimental paradigm, and the subject's facial expression image data and individual EEG data are collected synchronously; after the N270 waveform is obtained through analysis, multi-feature extraction is performed on the facial expression image data and the N270 waveform, the extracted features are integrated and input into a trained neural network model, and an auxiliary identification result for depression symptoms is output. The invention not only provides objective indexes from experimental waveforms and image data to assist doctors in identifying depression, but also offers high accuracy, strong robustness and a low error rate.
In the embodiments of the present invention, "at least one" means one or more, and "a plurality" means two or more. "and/or", describes an association relation of association objects, and indicates that there may be three kinds of relations, for example, a and/or B, and may indicate that a alone exists, a and B together, and B alone exists. Wherein A, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of the following" and the like means any combination of these items, including any combination of single or plural items. For example, at least one of a, b and c may represent: a, b, c, a and b, a and c, b and c or a and b and c, wherein a, b and c can be single or multiple.
The construction, features and effects of the present invention are described in detail according to the embodiments shown in the drawings, but the above is only a preferred embodiment of the present invention, and it should be understood that the technical features of the above embodiment and the preferred mode thereof can be reasonably combined and matched into various equivalent schemes by those skilled in the art without departing from or changing the design concept and technical effects of the present invention; therefore, the invention is not limited to the embodiments shown in the drawings, but is intended to be within the scope of the invention as long as changes made in the concept of the invention or modifications to the equivalent embodiments do not depart from the spirit of the invention as covered by the specification and drawings.

Claims (7)

1. An auxiliary identification method for depression symptoms based on brain electrical data and facial expression images is characterized by comprising the following steps:
performing an electroencephalogram stimulation experiment based on a preset experimental paradigm, and synchronously collecting the subject's facial expression image data and individual electroencephalogram data;
analyzing the electroencephalogram data to obtain an N270 waveform;
performing multi-feature extraction on the facial expression image data and the N270 waveform;
and inputting the integrated multi-feature into a trained neural network model based on a self-attention mechanism, and outputting an auxiliary identification result of the depression symptoms.
2. The method for assisting in identifying a depressive disorder based on electroencephalogram data and facial expression images according to claim 1, wherein the editing process of the preset experimental paradigm comprises:
based on Matlab and E-prime, gray photos with consistent physical properties are used; the sex ratio in the gray photos is balanced, neutral and negative emotional expressions appear in equal proportion, the faces carry no distinguishing marks, and partial face occlusion is applied to half of the photos;
a given number of trials, the duration of a single trial, and the total duration of the experiment are set.
3. The method for assisting in the identification of a depressive disorder based on electroencephalogram data and facial expression images according to claim 1, wherein the multi-feature extraction comprises:
extracting facial features and physiological signal features from the facial expression image data by utilizing a pre-trained multi-task depth prediction model;
extracting waveform features based on the N270 waveform;
and obtaining emotion feature vectors based on the facial expression image data and the N270 waveform.
4. The method for assisting in identifying a depression disorder based on electroencephalogram data and facial expression images according to claim 3, wherein the multi-task depth prediction model comprises a common feature extraction module and a multi-task feature fusion module;
the common feature extraction module is used for extracting features of each task and recovering a rough depth map, a semantic segmentation map and a surface vector map;
the multi-task feature fusion module is used for carrying out multi-task fusion on the features extracted by the common feature extraction module, and can distinguish the common semantic features of each task and the unique semantic features of each task.
5. The method for assisting in identifying a depressive disorder based on electroencephalogram data and facial expression images according to claim 4, wherein the common feature extraction module adopts a single-input multi-output network and is composed of at least four parts: an encoder, a multi-dimensional decoder, a multi-scale feature fusion sub-module and a refinement sub-module;
the encoder is used for extracting characteristics of various scales;
the multi-dimensional decoder is used for gradually expanding the final characteristics of the encoder through an up-sampling module and reducing the number of channels at the same time;
the multi-scale feature fusion submodule is used for combining different information of a plurality of scales into one;
the refinement sub-module is used for adjusting the output size of the image and the channel number.
6. The method for assisting in identifying a depressive disorder based on electroencephalogram data and facial expression images according to claim 4, wherein the multi-task feature fusion module adopts a multi-input multi-output network and is composed of at least two parts: the multi-input feature fusion module is used for fusing the multi-task features output by the previous module; the feature decoding section is a multi-output decoder.
7. The auxiliary identification method for depression symptoms based on brain electrical data and facial expression images according to any one of claims 1 to 6, wherein the outputting the auxiliary identification result for depression symptoms comprises:
fusing the multiple features in time sequence to obtain space-time feature vectors;
and inputting the space-time feature vector into a Transformer encoder model, and classifying by softmax to obtain an auxiliary recognition result.
CN202311405231.6A 2023-10-27 2023-10-27 Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images Active CN117137488B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311405231.6A CN117137488B (en) 2023-10-27 2023-10-27 Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311405231.6A CN117137488B (en) 2023-10-27 2023-10-27 Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images

Publications (2)

Publication Number Publication Date
CN117137488A (en) 2023-12-01
CN117137488B (en) 2024-01-26

Family

ID=88902970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311405231.6A Active CN117137488B (en) 2023-10-27 2023-10-27 Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images

Country Status (1)

Country Link
CN (1) CN117137488B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN201360302Y (en) * 2009-02-12 2009-12-09 南靖万利达科技有限公司 Television and computer all-in-one machine
US20210113129A1 (en) * 2016-12-01 2021-04-22 Sin-Ger Huang A system for determining emotional or psychological states
KR20190130808A (en) * 2018-05-15 2019-11-25 연세대학교 산학협력단 Emotion Classification Device and Method using Convergence of Features of EEG and Face
CN109171769A (en) * 2018-07-12 2019-01-11 西北师范大学 It is a kind of applied to depression detection voice, facial feature extraction method and system
US20210353224A1 (en) * 2018-10-15 2021-11-18 The Board Of Trustees Of The Leland Stanford Junior University Treatment of depression using machine learning
CN109157231A (en) * 2018-10-24 2019-01-08 阿呆科技(北京)有限公司 Portable multi-channel Depression trend assessment system based on emotional distress task
WO2021104099A1 (en) * 2019-11-29 2021-06-03 中国科学院深圳先进技术研究院 Multimodal depression detection method and system employing context awareness
CN111797747A (en) * 2020-06-28 2020-10-20 道和安邦(天津)安防科技有限公司 Potential emotion recognition method based on EEG, BVP and micro-expression
CN114748072A (en) * 2022-01-21 2022-07-15 上海大学 Electroencephalogram-based information analysis and rehabilitation training system and method for depression auxiliary diagnosis
CN116467672A (en) * 2023-03-29 2023-07-21 吉林大学 Depression recognition system based on brain electricity-voice bimodal decision fusion
CN116602676A (en) * 2023-05-10 2023-08-18 浙江工业大学 Electroencephalogram emotion recognition method and system based on multi-feature fusion and CLSTN

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIJUAN DUAN ET AL: "Machine Learning Approaches for MDD Detection and Emotion Decoding Using EEG Signals", Frontiers in Human Neuroscience *
PENG YAN ET AL: "Impaired cognitive processing of conflict information in patients with depression: evidence from the event-related potential N270", Chinese Journal of Health Psychology, vol. 30, no. 10 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117633667A (en) * 2024-01-26 2024-03-01 吉林大学第一医院 N270 waveform-based depression symptom identification method, device and equipment

Also Published As

Publication number Publication date
CN117137488B (en) 2024-01-26

Similar Documents

Publication Publication Date Title
Seal et al. DeprNet: A deep convolution neural network framework for detecting depression using EEG
CN109886273B (en) CMR image segmentation and classification system
CN111528859B (en) Child ADHD screening and evaluating system based on multi-modal deep learning technology
CN117137488B (en) Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images
US20140003658A1 (en) Method and apparatus for coding of eye and eye movement data
Pal et al. Deep learning techniques for prediction and diagnosis of diabetes mellitus
CN111920420A (en) Patient behavior multi-modal analysis and prediction system based on statistical learning
CN115272295A (en) Dynamic brain function network analysis method and system based on time domain-space domain combined state
CN117216546A (en) Model training method, device, electronic equipment, storage medium and program product
Liu et al. PRA-Net: Part-and-Relation Attention Network for depression recognition from facial expression
Tosun et al. Novel eye‐blink artefact detection algorithm from raw EEG signals using FCN‐based semantic segmentation method
Creagh et al. Interpretable deep learning for the remote characterisation of ambulation in multiple sclerosis using smartphones
CN117237351B (en) Ultrasonic image analysis method and related device
Maria et al. A comparative study on prominent connectivity features for emotion recognition from EEG
Chen et al. DCTNet: Hybrid deep neural network-based EEG signal for detecting depression
CN116452593B (en) Method, device and system for constructing AI evaluation model of vascular cognitive disorder
Qendro et al. High frequency eeg artifact detection with uncertainty via early exit paradigm
CN115909438A (en) Pain expression recognition system based on depth time-space domain convolutional neural network
CN116383618A (en) Learning concentration assessment method and device based on multi-mode data
CN114246588A (en) Depression research method
Sridurga et al. Detecting Autism Spectrum Syndrome using VGG19 and Xception Networks
Nagarhalli et al. Evaluating the Effectiveness of the Convolution Neural Network in Detecting Brain Tumors
CN115429272B (en) Psychological health state assessment method and system based on multi-mode physiological signals
Gopi Brain tissue segmentation to detect schizophrenia in gray matter using MR images
Tekulu et al. Schizophrenia Disease Classification with Deep learning and Convolutional Neural Network Architectures using TensorFlow

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant