CN117137488A - Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images - Google Patents

Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images

Info

Publication number
CN117137488A
CN117137488A CN202311405231.6A
Authority
CN
China
Prior art keywords
facial expression
task
feature
module
depression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311405231.6A
Other languages
Chinese (zh)
Other versions
CN117137488B (en)
Inventor
吕玉丹
杨鑫
王长明
姚翰
张永祥
殷雪峰
李童
张肇轩
尹宝才
张远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Jilin University
Original Assignee
Dalian University of Technology
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology, Jilin University filed Critical Dalian University of Technology
Priority to CN202311405231.6A priority Critical patent/CN117137488B/en
Publication of CN117137488A publication Critical patent/CN117137488A/en
Application granted granted Critical
Publication of CN117137488B publication Critical patent/CN117137488B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • A61B 5/165 Evaluating the state of mind, e.g. depression, anxiety
    • A61B 5/0033 Features or image-related aspects of imaging apparatus classified in A61B 5/00, e.g. for MRI, optical tomography or impedance tomography apparatus; arrangements of imaging apparatus in a room
    • A61B 5/0077 Devices for viewing the surface of the body, e.g. camera, magnifying lens
    • A61B 5/369 Electroencephalography [EEG]
    • A61B 5/7267 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems, involving training the classification device
    • G06V 10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/82 Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 40/168 Feature extraction; Face representation (human faces)
    • G06V 40/174 Facial expression recognition
    • G16H 50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for computer-aided diagnosis, e.g. based on medical expert systems

Abstract

The invention discloses an auxiliary identification method for depression symptoms based on electroencephalogram (EEG) data and facial expression images. Its main design idea is to evaluate the degree of conflict-processing impairment by organically combining EEG physiological indexes with facial image indexes, centered on the core depressive symptoms of impaired conflict processing and negative bias, thereby extracting objective and quantitative indexes for the identification and prediction of depression. Specifically, an EEG stimulation experiment is executed based on a preset experimental paradigm, and the subject's facial expression image data and individual EEG data are collected synchronously; after the N270 waveform is obtained through analysis, multi-feature extraction is performed on the facial expression image data and the N270 waveform, the extracted features are integrated and input into a trained neural network model, and an auxiliary identification result for depression symptoms is output. The invention not only provides objective indexes from experimental waveforms and image data to assist doctors in identifying depression, but also offers high accuracy, strong robustness and a low error rate.

Description

Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images
Technical Field
The invention relates to the field of applied artificial intelligence technology, in particular to an auxiliary identification method for depression symptoms based on electroencephalogram (EEG) data and facial expression images.
Background
Depression is a common affective disorder that seriously affects patients' physical and mental health. How to better assist doctors in identifying, differentiating and predicting depression, and how to construct objective indicators for doing so, has become a long-term and important goal of computer-aided diagnosis and treatment of depression.
The event-related potential (ERP) paradigm is a neurophysiological experimental design and analysis method used to study the electrophysiological response of the brain to a specific stimulus or task. ERPs are also known as "cognitive potentials" and are highly informative for revealing psycho-cognitive processes. N270 is a negative wave recorded around 270 ms after stimulus onset when two stimuli do not fully match; it reflects the brain's capacity to process conflicting information, can serve as an electrophysiological index of conflict-processing ability, and has strong specificity. Patients with depression show marked impairment of conflict-processing ability together with the core symptom of negative bias, which are also the main reasons why such patients cannot maintain normal social connections or recover social function. Thus N270, as an electrophysiological index for evaluating conflict-information processing capability, can serve as a specific objective index for predicting depression, independently of the depressive syndrome itself. Previous imaging studies have shown that the conflict-processing system of depressive patients is impaired in work and social activities; the invention therefore considers that N270 can be used as an objective and sensitive index for predicting depression.
However, the previously known N270 task paradigms suffer from the following problems: 1. the signal-to-noise ratio is low, so a large number of repeated trials and heavy data processing are needed to extract reliable components; the test procedure is cumbersome, the learning cost for the subject is high, and the time and workload of the test are large; 2. in the stimulus presentation system, the event code and the stimulus code are not precisely synchronized, so timing errors exist; 3. the materials used in the paradigm are mainly numbers, letters and figures, which are poorly suited to the negative-bias symptom of depressed populations; 4. in particular, relying on N270 EEG data alone gives relatively limited reliability, which affects the subsequent generation of guidance reports.
Disclosure of Invention
In view of the foregoing, the present invention is directed to an auxiliary identification method for depressive disorder based on brain electrical data and facial expression images, so as to solve the above-mentioned technical problems.
The technical scheme adopted by the invention is as follows:
the invention provides a depression symptom auxiliary identification method based on brain electrical data and facial expression images, which comprises the following steps:
performing an electroencephalogram stimulation experiment based on a preset experimental paradigm, and synchronously collecting the subject's facial expression image data and individual electroencephalogram data;
analyzing the electroencephalogram data to obtain an N270 waveform;
performing multi-feature extraction on the facial expression image data and the N270 waveform;
and inputting the integrated multi-feature into a trained neural network model based on a self-attention mechanism, and outputting an auxiliary identification result of the depression symptoms.
In at least one possible implementation manner, the editing process of the preset experimental paradigm includes:
based on Matlab and E-prime, gray photos with consistent physical properties are used; the sex ratio in the gray photos is balanced, neutral and negative emotional expressions appear in equal proportion, the faces carry no distinguishing marks, and partial face occlusion is applied to half of the photos;
a given number of trials, the duration of a single trial, and the total duration of the experiment are set.
In at least one possible implementation manner, the multi-feature extraction includes:
extracting facial features and physiological signal features from the facial expression image data by utilizing a pre-trained multi-task depth prediction model;
extracting waveform features based on the N270 waveform;
and obtaining emotion feature vectors based on the facial expression image data and the N270 waveform.
In at least one possible implementation manner, the multi-task depth prediction model comprises a common feature extraction module and a multi-task feature fusion module;
the common feature extraction module is used for extracting features of each task and recovering a rough depth map, a semantic segmentation map and a surface vector map;
the multi-task feature fusion module is used for carrying out multi-task fusion on the features extracted by the common feature extraction module, and can distinguish the common semantic features of each task and the unique semantic features of each task.
In at least one possible implementation manner, the common feature extraction module adopts a single-input multi-output network and is composed of at least four parts: an encoder, a multi-dimensional decoder, a multi-scale feature fusion sub-module and a refinement sub-module;
the encoder is used for extracting characteristics of various scales;
the multi-dimensional decoder is used for gradually expanding the final characteristics of the encoder through an up-sampling module and reducing the number of channels at the same time;
the multi-scale feature fusion submodule is used for combining different information of a plurality of scales into one;
the refinement sub-module is used for adjusting the output size of the image and the channel number.
In at least one possible implementation manner, the multi-task feature fusion module adopts a multi-input multi-output network and is composed of at least two parts: the multi-input feature fusion module is used for fusing the multi-task features output by the previous module; the feature decoding section is a multi-output decoder.
In at least one possible implementation manner, the outputting the auxiliary identification result of the depression disorder includes:
fusing the multiple features in time sequence to obtain space-time feature vectors;
and inputting the space-time feature vector into a Transformer encoder model, and classifying by softmax to obtain an auxiliary recognition result.
Compared with the prior art, the main design concept of the invention is to evaluate the degree of conflict-processing impairment by organically combining EEG physiological indexes with facial image indexes, centered on the core depressive symptoms of impaired conflict processing and negative bias, thereby extracting objective and quantitative indexes for the identification and prediction of depression. Specifically, an EEG stimulation experiment is executed based on a preset experimental paradigm, and the subject's facial expression image data and individual EEG data are collected synchronously; after the N270 waveform is obtained through analysis, multi-feature extraction is performed on the facial expression image data and the N270 waveform, the extracted features are integrated and input into a trained neural network model, and an auxiliary identification result for depression symptoms is output. The invention not only provides objective indexes from experimental waveforms and image data to assist doctors in identifying depression, but also offers high accuracy, strong robustness and a low error rate.
Furthermore, machine learning is applied to the detection and feature extraction of facial expressions or micro-expressions of depressive patients and is combined with deep learning based on the multi-task module, so that the sensitivity, accuracy and specificity of identifying and predicting depression indicators can be remarkably improved.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described with reference to the accompanying drawings, in which:
fig. 1 is a flow chart of an auxiliary identification method for depressive disorder based on brain electrical data and facial expression images according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
The invention provides an embodiment of a depression disorder auxiliary identification method based on brain electrical data and facial expression images, specifically, as shown in fig. 1, the method comprises the following steps:
Step S1, performing an electroencephalogram stimulation experiment based on a preset experimental paradigm, and synchronously collecting the subject's facial expression image data (facial expressions can be captured and recorded in close-up by a high-definition camera) and individual electroencephalogram data (specifically, electroencephalogram time-domain data);
In the preset paradigm mentioned here, the specific editing process may include the following parts: based on Matlab and E-prime, gray photos with consistent physical properties are used; the sex ratio in the gray photos is balanced, neutral and negative emotional expressions appear in equal proportion, the faces carry no distinguishing marks (such as glasses, beards, skin nevi, jewelry and the like), and partial face occlusion is applied to half of the photos; the preset number of trials, the duration of a single trial and the total duration of the experiment are set, for example at least 180 trials, each lasting 500 ms, with the pictures and inter-stimulus intervals presented in sequence for a total duration of about 8 minutes. A minimal worked example of these timing parameters follows.
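The sketch below builds such a trial list and checks the stated durations. The inter-stimulus interval is an assumption inferred from the roughly 8-minute total, not a value given in the patent, and all variable names are illustrative.

```python
# Minimal sketch of the stated paradigm parameters: >= 180 trials, 500 ms stimulus each,
# neutral/negative faces in equal proportion, half of the photos partially occluded.
# The inter-stimulus interval (ISI) is an assumed value chosen so that the total run
# time comes out near the stated ~8 minutes; it is not specified in the patent.
import random

N_TRIALS = 180
STIM_MS = 500
ISI_MS = 2170          # assumption: a ~2.17 s gap gives (500 + 2170) * 180 ≈ 8.0 min

conditions = ["neutral"] * (N_TRIALS // 2) + ["negative"] * (N_TRIALS // 2)
occluded = [i % 2 == 0 for i in range(N_TRIALS)]   # half of the photos partially occluded
trials = list(zip(conditions, occluded))
random.shuffle(trials)

total_min = N_TRIALS * (STIM_MS + ISI_MS) / 1000 / 60
print(f"{N_TRIALS} trials, ~{total_min:.1f} minutes")   # ~8.0 minutes
```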
The novel N270 task paradigm has the following advantages: 1. the experimental flow is simple and easy for the subject to perform; 2. the parameter design is reasonable, the evoked waveform is highly repeatable, the signal-to-noise ratio is high, the timing system is accurately calibrated, and the timing error range is controllable; 3. the picture material covers negative and neutral emotions and targets the core negative-bias symptom of depressed populations; 4. the task is sensitive and highly specific, is designed for the core symptom of impaired conflict-processing capacity, and can serve as an objective index for early prediction of depression.
In addition, analysis shows that depression involves core symptoms such as the depressive syndrome, impaired conflict processing and negative bias, together with changes in facial expressions or micro-expressions, for example more sad, negative expressions and fewer smiling, positive expressions. Facial expressions are produced by muscle contraction, but their changes are consistent with internal psychological changes, so they can serve as a window for conveying information such as emotion, intention and desire. When stimulated by conflicting information (information that conflicts with the patient's thinking, values or expectations), depressive patients, whose conflict-processing capacity is weakened, experience negative emotions such as anxiety and stress, and their facial expressions further shift towards negative emotions such as depression, despair and helplessness. Accordingly, while the subject performs the task of the novel N270 paradigm that stimulates the brain's conflict-processing system, the patient's facial expressions can be collected synchronously as reinforcing features, and these facial expression features can be deeply learned with an AI-assisted algorithm, further strengthening the identification of conflict-processing capacity in depressed populations and the extraction and quantification of facial features.
S2, analyzing the electroencephalogram data to obtain an N270 waveform;
in actual operation, mainly may include:
after the electroencephalogram data is imported and independent component analysis is carried out, setting an N270 segmentation time window;
the data segments of the different-stimulus and same-stimulus conditions are averaged separately, the N270 waveform is obtained by waveform subtraction, and it is identified as a negative wave between 220 ms and 380 ms after stimulus presentation.
The complete waveform analysis process can proceed as follows (an equivalent code sketch is given after step j)):
a) Importing data: starting Matlab and EEGlab, and selecting corresponding import format to import data;
b) Downsampling: the sampling rate is adjusted to 250Hz, data is compressed, and the influence of high-frequency information is reduced;
c) Import channel information: determine the coordinates of each electrode; reject unused electrodes, screening the leads according to actual conditions and needs;
d) Re-referencing: taking bilateral mastoid as a reference;
e) And (3) filtering: carrying out band-pass filtering of 0.5Hz-45 Hz;
f) Independent component analysis (ICA): perform ICA on the data to remove artifacts in the signal, in particular ocular artifacts, and remove noise components;
g) Segmentation: the N270 segment time window is set to 200ms before stimulation and 800ms after stimulation;
h) Baseline correction: correction was performed with 200ms before stimulation as baseline;
i) Superposition averaging: the data segments of the different-stimulus and same-stimulus conditions are averaged separately, and the face-same waveform is subtracted from the face-different waveform to obtain the N270 waveform;
j) ERP component identification: n270 is a negative wave between 220-380ms after stimulus presentation.
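As an illustration only, the same preprocessing chain can be sketched with the open-source MNE-Python package (the patent itself performs these steps in Matlab/EEGlab). The file name, channel names, ICA component index and condition names below are placeholders, not values taken from the patent.

```python
# Hedged sketch of the N270 preprocessing chain using MNE-Python
# (the patent uses Matlab/EEGlab; function choices and event names here are assumptions).
import mne

raw = mne.io.read_raw_brainvision("subject01.vhdr", preload=True)  # hypothetical file
raw.resample(250)                                   # b) downsample to 250 Hz
raw.set_montage("standard_1020")                    # c) electrode coordinates
raw.drop_channels(["HEOG", "VEOG"])                 # c) reject unused leads (assumed names)
raw.set_eeg_reference(["M1", "M2"])                 # d) re-reference to bilateral mastoids
raw.filter(l_freq=0.5, h_freq=45.0)                 # e) 0.5-45 Hz band-pass

ica = mne.preprocessing.ICA(n_components=20, random_state=0)
ica.fit(raw)                                        # f) ICA; ocular components would be
ica.exclude = [0]                                   #    selected by inspection (assumed index)
raw = ica.apply(raw)

events, event_id = mne.events_from_annotations(raw) # event codes from the stimulation program
epochs = mne.Epochs(raw, events, event_id,          # g) segment -200 ms .. +800 ms
                    tmin=-0.2, tmax=0.8,
                    baseline=(-0.2, 0.0),           # h) baseline correction
                    preload=True)

# i) average per condition and subtract to isolate the N270 difference wave
evoked_diff = epochs["face_different"].average()    # condition names are placeholders
evoked_same = epochs["face_same"].average()
n270 = mne.combine_evoked([evoked_diff, evoked_same], weights=[1, -1])

# j) N270: negative deflection between 220 and 380 ms after stimulus onset
n270_window = n270.copy().crop(tmin=0.22, tmax=0.38)
```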
S3, carrying out multi-feature extraction on facial expression image data and N270 waveforms;
in particular, the multi-feature extraction may include:
extracting facial features and physiological signal features based on the facial expression image data;
extracting waveform features based on the N270 waveform;
and obtaining emotion feature vectors based on the facial expression image data and the N270 waveform.
It should be noted that a depth prediction algorithm that integrates the features of multiple tasks is an algorithm capable of extracting the features shared by the tasks and exploiting the interrelationships between the tasks so that they promote each other, thereby performing feature fusion or parameter sharing through task interaction. Existing models for integrating multi-task features mostly fuse features only through parameter sharing and do not use the common features to express how the tasks help one another or what distinguishes them; by introducing a deep learning method based on a multi-task module, the invention can effectively address this problem.
For the process of feature extraction described above, reference is given here to the implementation as follows:
(1) Facial features
Step S31, carrying out framing processing on the facial expression image data to obtain a plurality of image sequences;
s32, extracting facial features of the image sequence by utilizing a pre-trained multitask depth prediction model to obtain a facial frame sequence;
step S33, extracting optical flow from adjacent frames of the face frame sequence by adopting an optical flow method to obtain an optical flow sequence;
and step S34, fusing and expanding the face frame sequence and the optical flow sequence to obtain a face one-dimensional vector, and then carrying out linear mapping on the face one-dimensional vector to obtain a face embedded vector.
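A hedged sketch of steps S31 to S34 follows. It uses OpenCV's Haar cascade face detector and Farneback optical flow as stand-ins for the patent's pre-trained multi-task depth prediction model, and a fixed random matrix as a stand-in for the learned linear mapping, so every concrete choice below (frame size, embedding dimension, detector) is an assumption.

```python
# Hedged sketch of steps S31-S34: face-frame extraction, optical flow, and linear embedding.
import cv2
import numpy as np

def face_embeddings(video_path, embed_dim=128, size=(112, 112)):
    cap = cv2.VideoCapture(video_path)
    face_det = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    frames = []
    while True:                                   # S31: split the video into an image sequence
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        boxes = face_det.detectMultiScale(gray, 1.1, 5)
        if len(boxes) == 0:
            continue
        x, y, w, h = boxes[0]                     # S32: crop the face region (stand-in detector)
        frames.append(cv2.resize(gray[y:y + h, x:x + w], size))
    cap.release()

    flows = [cv2.calcOpticalFlowFarneback(         # S33: optical flow between adjacent frames
                 frames[i], frames[i + 1], None,
                 0.5, 3, 15, 3, 5, 1.2, 0)
             for i in range(len(frames) - 1)]

    # S34: fuse face frames and flow, flatten to a 1-D vector, then linearly map to an embedding
    dim = size[0] * size[1] * 3                    # face frame + 2-channel flow, flattened
    rng = np.random.default_rng(0)
    W = rng.standard_normal((embed_dim, dim)).astype(np.float32) / np.sqrt(dim)
    feats = []
    for f, fl in zip(frames[:-1], flows):
        vec = np.concatenate([f.ravel(), fl.ravel()]).astype(np.float32)
        feats.append(W @ vec)                      # W stands in for a learned linear mapping
    return np.stack(feats)
```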
(2) Physiological signal characteristics
Step S310, extracting a region of interest of cheek parts from the image sequence by using the multi-task depth prediction model to obtain a sequence of interest;
and step S320, flattening the region-of-interest sequence into a one-dimensional vector, and then linearly mapping it to obtain a physiological signal embedded vector.
(3) Waveform characteristics
And step S311, extracting basic features of N270 waveforms by taking a frame as a unit, forming the basic features into waveform one-dimensional vectors, and then carrying out linear mapping on the waveform one-dimensional vectors to obtain waveform embedded vectors.
(4) Emotional characteristics
Step S301, extracting a multidimensional facial feature vector from the image sequence;
step S302, performing Fourier transform on an N270 waveform to convert the waveform into frequency domain data, performing block division on the frequency domain data matrix according to different frequency ranges by taking frequency as a reference to obtain a block frequency domain matrix, calculating covariance matrixes of all sub-blocks of the block frequency domain matrix, and calculating LES of all the covariance matrixes to obtain LES feature vectors;
step S303, splicing the multi-dimensional facial feature vector and the LES feature vector to obtain a fusion vector; and inputting the fusion vector into a pre-trained emotion feature extraction model to obtain an emotion embedded vector corresponding to the image sequence.
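The following sketch illustrates steps S301 to S303. The patent does not expand the abbreviation LES, so it is interpreted here as the log-eigenvalue spectrum of each covariance sub-block; that interpretation, the frequency band edges, and the omission of the pre-trained emotion feature extraction model are all assumptions.

```python
# Hedged sketch of the emotion feature vector (steps S301-S303).
import numpy as np

def les_features(n270_segment, sfreq=250.0,
                 bands=((0.5, 4), (4, 8), (8, 13), (13, 30), (30, 45))):
    """n270_segment: array (n_channels, n_times) holding the N270 segment."""
    spectrum = np.fft.rfft(n270_segment, axis=-1)          # S302: Fourier transform
    freqs = np.fft.rfftfreq(n270_segment.shape[-1], d=1.0 / sfreq)
    feats = []
    for lo, hi in bands:                                   # block the frequency-domain matrix
        block = np.abs(spectrum[:, (freqs >= lo) & (freqs < hi)])
        cov = np.cov(block)                                 # covariance of the sub-block
        eigvals = np.linalg.eigvalsh(cov)                   # symmetric, so eigvalsh is safe
        feats.append(np.log(eigvals + 1e-10))               # "LES" feature (assumed definition)
    return np.concatenate(feats)

def emotion_vector(face_vec, n270_segment):
    """S303: concatenate the facial feature vector with the LES feature vector.
    In the patent this fused vector is then fed to a pre-trained emotion
    feature-extraction model; that model is omitted here."""
    return np.concatenate([face_vec, les_features(n270_segment)])
```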
And S4, inputting the integrated multi-feature into a trained neural network model based on a self-attention mechanism, and outputting an auxiliary identification result of the depression symptoms.
The following approach may be adopted for this step specifically (but is not limited to it); a code sketch follows the two sub-steps:
fusing the multiple features in time sequence to obtain space-time feature vectors;
and inputting the space-time feature vector into a Transformer encoder model, and classifying by softmax to obtain an auxiliary recognition result.
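A hedged PyTorch sketch of this classification stage follows; the feature dimension, sequence length, number of layers and the two-class output are illustrative assumptions, not parameters stated in the patent.

```python
# Hedged PyTorch sketch of step S4: temporally fused feature vectors are fed to a
# Transformer encoder followed by a softmax classifier.
import torch
import torch.nn as nn

class DepressionAuxClassifier(nn.Module):
    def __init__(self, feat_dim=128, n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.cls_head = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        # x: (batch, time, feat_dim), the spatio-temporal sequence obtained by fusing the
        # face, physiological-signal, waveform and emotion embeddings over time
        h = self.encoder(x)
        logits = self.cls_head(h.mean(dim=1))      # pool over the time dimension
        return torch.softmax(logits, dim=-1)       # auxiliary identification probabilities

# usage sketch: probs = DepressionAuxClassifier()(torch.randn(4, 30, 128))
```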
Further, regarding the foregoing multi-task depth prediction model, network training may be performed through two modules: one is the common feature extraction module, responsible for extracting the features of each task and recovering a rough depth map, a semantic segmentation map and a surface vector map; the other is the multi-task feature fusion module, responsible for the multi-task fusion of the features extracted by the common feature extraction module, which lets the network distinguish the semantic features common to all tasks from those unique to each task, so that the finally recovered images are more structurally coherent. Specifically:
(I) Regarding the common feature extraction module
(1) Network structure
The common feature extraction module comprises four parts: an encoder, a multi-dimensional decoder, a multi-scale feature fusion sub-module and a refinement sub-module. Specifically, the encoder may consist of four convolutional layers responsible for extracting features at the 1/4, 1/8, 1/16 and 1/32 scales; the multi-dimensional decoder adopts four up-sampling modules, so that the final features of the encoder are gradually enlarged while the number of channels is reduced; the multi-scale feature fusion sub-module integrates the four different-scale features from the encoder using up-sampling and channel concatenation: corresponding to the encoder, its four layer outputs (each with 16 channels) are up-sampled by ×2, ×4, ×8 and ×16 respectively so that they have the same size as the final output, concatenated along the channel dimension, and further transformed by a convolution layer to obtain an output with 64 channels. The main purpose of the multi-scale feature fusion sub-module is to combine the different information of multiple scales into one, so that the lower-layer outputs of the encoder retain information of finer spatial resolution and help recover the detail lost through repeated down-sampling. Finally, the refinement sub-module adjusts the output size and the number of channels of the image; three convolution layers are used, corresponding to the three tasks, so that the number of output channels is restored to 1 channel for the depth image, 1 channel for the semantic segmentation image and 3 channels for the surface vector map, which facilitates loss calculation and back propagation. A structural code sketch is given after step e) below. Schematically, in the above embodiment:
a) Carrying out framing treatment on the facial expression image data and the N270 waveform respectively to obtain a plurality of image sequences;
b) Each image is adjusted to an RGB image of size 320 × 240 × 3;
c) Four layers of characteristics are obtained through four convolution layers of the encoder;
d) The four-layer features are decoded by four sampling layers of the decoder;
e) The up-sampled outputs of different scales are then fused by the multi-scale fusion sub-module, and finally the shapes of the depth image, the semantic segmentation map and the surface vector map are recovered through different deconvolution layers in the refinement sub-module.
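The following PyTorch sketch mirrors this single-input multi-output structure (4-layer encoder, 4-step up-sampling decoder, multi-scale fusion to 64 channels, and three refinement heads with 1, 1 and 3 output channels); channel widths not stated in the text are assumptions.

```python
# Hedged sketch of the common feature extraction module.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout, stride=2):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, 1), nn.ReLU(inplace=True))

class CommonFeatureExtractor(nn.Module):
    def __init__(self, n_seg_classes=1):
        super().__init__()
        # encoder: features at 1/4, 1/8, 1/16 and 1/32 of the input resolution
        self.enc = nn.ModuleList([conv_block(3, 16, 4), conv_block(16, 32),
                                  conv_block(32, 64), conv_block(64, 128)])
        # decoder: progressively enlarge the last encoder feature while reducing channels
        self.dec = nn.ModuleList([conv_block(128, 64, 1), conv_block(64, 32, 1),
                                  conv_block(32, 16, 1), conv_block(16, 16, 1)])
        # each encoder level is squeezed to 16 channels before multi-scale fusion
        self.squeeze = nn.ModuleList([nn.Conv2d(c, 16, 1) for c in (16, 32, 64, 128)])
        self.fuse = nn.Conv2d(4 * 16, 64, 3, 1, 1)        # multi-scale fusion -> 64 channels
        self.head_depth = nn.Conv2d(64, 1, 3, 1, 1)        # refinement heads
        self.head_seg = nn.Conv2d(64, n_seg_classes, 3, 1, 1)
        self.head_normal = nn.Conv2d(64, 3, 3, 1, 1)

    def forward(self, x):                                   # x: (B, 3, 240, 320)
        feats, h = [], x
        for enc in self.enc:                                # encoder path
            h = enc(h)
            feats.append(h)
        for dec in self.dec:                                # decoder path on the last feature
            h = F.interpolate(dec(h), scale_factor=2, mode="bilinear", align_corners=False)
        target = h.shape[-2:]
        ups = [F.interpolate(s(f), size=target, mode="bilinear", align_corners=False)
               for s, f in zip(self.squeeze, feats)]        # x2/x4/x8/x16 up-sampling
        fused = self.fuse(torch.cat(ups, dim=1))
        return self.head_depth(fused), self.head_seg(fused), self.head_normal(fused)
```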
(2) Training process
a) Establish loss functions between the depth image, semantic segmentation map and surface vector map in the dataset and the corresponding images predicted by the network from the RGB image, with 4 images processed per batch;
b) Updating network parameters by reducing the loss function;
c) Iterate the updates, with the number of iterations set to 100, until the loss function converges.
The common feature extraction module is a single-input multi-output network. In the training process, the network must establish loss functions between the depth image, semantic segmentation map and surface vector map in the dataset and the corresponding images predicted by the network from the RGB input, update the network parameters and iterate until the loss function converges, which yields the trained network. Because three tasks need to be processed simultaneously while preserving the correlation among them, the loss function is divided into three parts; the specific feature extraction loss function is:
L_task = L_depth + L_seg + L_normal
the function and purpose of the loss function of each part are as follows:
(a) Depth map loss function, namely:
L_depth = L_1 + L_grad
This loss is the sum of an L1 loss and a gradient loss. For each pixel point i, the predicted depth and the true depth are d_i and D_i respectively; the L1 term, L_1 = (1/N) * Σ_i |d_i - D_i|, constrains the difference between d_i and D_i and, as the main part of the loss function, guarantees accuracy.
(b) The semantic segmentation loss function is a cross-entropy function, a loss commonly used in semantic segmentation to describe the distance between two probability distributions; in this model it constrains the predicted probability of the semantic class of each object in the image. A standard per-pixel cross-entropy of this form can be written as
L_seg = -(1/N) * Σ_i Σ_j s_ij · log(S_ij)
where S is the predicted semantic segmentation map and s the truth image; for each pixel i only one element s_ij along the class dimension is non-zero (equal to 1). Since the cross-entropy loss is only concerned with the prediction probability of the correct class, the classification result is correct as long as that probability is large enough.
(c) Surface vector loss function: measures the accuracy of the surface normal n_i^d estimated from the predicted depth map against the surface normal n_i^g of the true data. The surface vector loss is also derived from depth gradients, but it measures the angle between the two surface normals, so it is very sensitive to the depth structure and promotes consistency of the predicted structure; a loss of this form can be expressed as
L_normal = (1/N) * Σ_i ( 1 - <n_i^d, n_i^g> / (||n_i^d|| · ||n_i^g||) ).
(d) Gradient loss function: constrains the gradient changes of the depth error along the x and y axes. Writing e_i = d_i - D_i, with g_x(e_i) and g_y(e_i) the gradients of e_i along the x and y axes, a loss of this form can be expressed as
L_grad = (1/N) * Σ_i ( |g_x(e_i)| + |g_y(e_i)| ).
It detects edge information sensitively, since depth is usually discontinuous at object boundaries. Note that the gradient loss and the preceding depth loss are different types of errors, so the network is trained with a weighted combination of them.
In the training process of the common feature extraction module, loss functions are designed between the images of the 3 tasks obtained at the output of the network and their corresponding truth values, and back-propagated by gradient descent to update the network parameters; training is complete when the loss function converges. In one training embodiment of the common feature extraction module algorithm, the number of iterations is set to 100, 4 images are processed per batch, the initial learning rate is 10^-4 and is reduced to one tenth of its value every 20 iterations, and the converged loss value obtained after 100 iterations is 0.1245. A hedged training-loop sketch follows.
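The sketch below implements the three-part loss and the stated schedule (100 iterations, 4 images per batch, initial learning rate 10^-4 divided by 10 every 20 iterations); the optimizer choice, the binary form of the segmentation loss and the data-loader interface are assumptions. `model` refers to the CommonFeatureExtractor sketched above.

```python
# Hedged PyTorch sketch of L_task = L_depth + L_seg + L_normal and the training schedule.
import torch
import torch.nn.functional as F

def task_loss(pred_d, pred_s, pred_n, gt_d, gt_s, gt_n):
    # depth: L1 term plus gradient term on the error map e = pred_d - gt_d
    e = pred_d - gt_d
    l1 = e.abs().mean()
    gx = (e[..., :, 1:] - e[..., :, :-1]).abs().mean()
    gy = (e[..., 1:, :] - e[..., :-1, :]).abs().mean()
    l_depth = l1 + gx + gy
    # segmentation: cross entropy (binary form for a 1-channel map; a multi-class map
    # would use F.cross_entropy with integer labels instead)
    l_seg = F.binary_cross_entropy_with_logits(pred_s, gt_s)
    # normals: one minus cosine similarity between predicted and true surface normals
    l_normal = (1 - F.cosine_similarity(pred_n, gt_n, dim=1)).mean()
    return l_depth + l_seg + l_normal

def train(model, loader, n_iters=100):
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=20, gamma=0.1)
    for _, (rgb, gt_d, gt_s, gt_n) in zip(range(n_iters), loader):  # batches of 4 images
        pred_d, pred_s, pred_n = model(rgb)
        loss = task_loss(pred_d, pred_s, pred_n, gt_d, gt_s, gt_n)
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()            # divide the learning rate by 10 every 20 iterations
```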
(II) Regarding the multi-task feature fusion module
(1) Network structure
The multi-task feature fusion module consists of two parts, wherein the first part is a multi-input feature fusion module which is responsible for fusing the multi-task features output by the previous module, and the network used is a densely connected U-net; the second part is a feature decoding part, similar to the decoder part of the previous part, and is a multi-output decoder, so that the description is omitted. In connection with the previous examples, in particular:
a) Creating a coding path consisting of a plurality of streams, each stream handling the image form of a different task of the previous module;
b) The three tasks are encoded through the U-net encoder, the pooling features obtained after the primary convolution pooling operation of the image of the task 1 are combined with the secondary pooling features of the task 2, and the pooling features of the task 3 are combined after passing through the convolution layer, so that the feature sharing performance is ensured;
c) The obtained common features firstly obtain a common upsampling feature through an upsampling operation;
d) Decoding the up-sampling feature together with the previous pooling feature, and respectively sending the up-sampling feature and the pooling features with different scales of three tasks to a decoder;
e) Connecting with the features extracted by the previous tasks and recovering the original shape of each task through an up-sampling layer;
f) The restored depth image, semantic segmentation map, and surface vector map are loss compared to truth values in the dataset to update parameters in the network.
The multi-task feature fusion module adds dense connections to the original U-net network, which effectively strengthens feature extraction in the multi-input setting. To realise the dense connections, a coding path consisting of several streams is first created, each stream processing the image form of a different task from the previous module. The main purpose of using a separate stream for each image form is to avoid fusing information prematurely in the early stages, which would limit the network's ability to capture complex relationships between the modes. As can be seen from the network structure, the 3 tasks are encoded by the U-net encoders, but, unlike a standard U-net, interaction occurs as features pass through the different convolution layers: for example, the image of task 1 undergoes a first convolution-pooling operation, the resulting pooling features are combined with the second-level pooling features of task 2, and these are in turn combined with the pooling features of task 3 after passing through the convolution layers. In this way features flow between tasks and their commonality is ensured. In the decoder part, the obtained common features first pass through an up-sampling operation to obtain a common up-sampled feature, which is then decoded together with the earlier pooling features; the pooling features of the 3 tasks at their different scales are fed into the decoder, connected with the features extracted for the corresponding tasks, and passed through up-sampling layers to restore the original shape of each task. The restored depth image, semantic segmentation map and surface vector map are compared, via the loss, with the truth values in the dataset to update the parameters of the network. This combination of channel connection and down-sampling connection further fuses the features of different tasks and promotes conversion between tasks; unlike the common features in the common feature extraction module, it can reflect the connections between different tasks while keeping the original features, highlighting the fusion between multi-task features. A hedged structural sketch of this module follows.
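The following PyTorch sketch captures the multi-stream structure in miniature: three encoding streams whose pooled features are exchanged between streams, a shared bottleneck for the common features, and one decoding head per task; channel widths, the number of stages and the exact interaction points are assumptions.

```python
# Hedged sketch of the multi-input multi-output fusion module.
import torch
import torch.nn as nn
import torch.nn.functional as F

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, 1, 1), nn.ReLU(inplace=True))

class MultiTaskFusion(nn.Module):
    def __init__(self, in_ch=(1, 1, 3), width=16):
        super().__init__()
        self.enc1 = nn.ModuleList([block(c, width) for c in in_ch])       # per-stream stage 1
        self.enc2 = nn.ModuleList([block(width * 2, width * 2)             # stage 2 sees its own
                                   for _ in in_ch])                        # + a neighbour's features
        self.bottleneck = block(width * 6, width * 4)                       # shared (common) features
        self.dec = nn.ModuleList([nn.Sequential(block(width * 4, width),    # one head per task
                                                nn.Conv2d(width, c, 3, 1, 1))
                                  for c in in_ch])

    def forward(self, depth, seg, normal):
        xs = [depth, seg, normal]
        f1 = [F.max_pool2d(e(x), 2) for e, x in zip(self.enc1, xs)]         # first conv + pool
        # dense interaction: stream i also receives the pooled features of stream (i+1) % 3
        f2 = [F.max_pool2d(e(torch.cat([f1[i], f1[(i + 1) % 3]], dim=1)), 2)
              for i, e in enumerate(self.enc2)]
        shared = self.bottleneck(torch.cat(f2, dim=1))                       # common features
        up = F.interpolate(shared, scale_factor=4, mode="bilinear", align_corners=False)
        return tuple(d(up) for d in self.dec)                                # restore each task
```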
(2) Training process
a) Establish loss functions between the outputs and the corresponding ground-truth depth image, semantic segmentation map and surface vector map in the dataset, with 4 images processed per batch;
b) Updating network parameters by reducing a loss function, wherein the loss function is the same as the common feature extraction module;
c) Iterate the updates, with the number of iterations set to 100, until the loss function converges.
The multi-task feature fusion module is a multi-input multi-output network. During training, loss functions must be established between its outputs and the ground-truth depth image, semantic segmentation map and surface vector map in the database; the network parameters are updated by reducing the loss function and iterated until it converges, giving the trained network. When training the network, this module and the common feature extraction module can be trained jointly to form an end-to-end neural network: a single RGB image is input, and the corresponding depth image, semantic segmentation map and surface vector map are output. Since the 3 tasks are the same as in the common feature extraction module, the same loss functions are used as constraints and are not described again. In the training process of this module, the model obtains the images of the 3 tasks directly at the output of the network, designs loss functions between them and their corresponding truth values, and back-propagates by gradient descent to update the network parameters; training is complete when the loss function converges. In one training run of this model, the number of iterations is set to 100, 4 images are processed per batch, the initial learning rate is 10^-4 and is reduced to one tenth of its value every 20 iterations, and the converged loss value obtained after 100 iterations is 0.1159.
In summary, the main design concept of the invention is to evaluate the degree of conflict-processing impairment by organically combining EEG physiological indexes with facial image indexes, centered on the core depressive symptoms of impaired conflict processing and negative bias, thereby providing objective and quantitative indexes for the identification and prediction of depression. Specifically, an EEG stimulation experiment is executed based on a preset experimental paradigm, and the subject's facial expression image data and individual EEG data are collected synchronously; after the N270 waveform is obtained through analysis, multi-feature extraction is performed on the facial expression image data and the N270 waveform, the extracted features are integrated and input into a trained neural network model, and an auxiliary identification result for depression symptoms is output. The invention not only provides objective indexes from experimental waveforms and image data to assist doctors in identifying depression, but also offers high accuracy, strong robustness and a low error rate.
In the embodiments of the present invention, "at least one" means one or more, and "a plurality" means two or more. "and/or", describes an association relation of association objects, and indicates that there may be three kinds of relations, for example, a and/or B, and may indicate that a alone exists, a and B together, and B alone exists. Wherein A, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of the following" and the like means any combination of these items, including any combination of single or plural items. For example, at least one of a, b and c may represent: a, b, c, a and b, a and c, b and c or a and b and c, wherein a, b and c can be single or multiple.
The construction, features and effects of the present invention are described in detail according to the embodiments shown in the drawings, but the above is only a preferred embodiment of the present invention, and it should be understood that the technical features of the above embodiment and the preferred mode thereof can be reasonably combined and matched into various equivalent schemes by those skilled in the art without departing from or changing the design concept and technical effects of the present invention; therefore, the invention is not limited to the embodiments shown in the drawings, but is intended to be within the scope of the invention as long as changes made in the concept of the invention or modifications to the equivalent embodiments do not depart from the spirit of the invention as covered by the specification and drawings.

Claims (7)

1. An auxiliary identification method for depression symptoms based on brain electrical data and facial expression images is characterized by comprising the following steps:
performing an electroencephalogram stimulation experiment based on a preset experimental paradigm, and synchronously collecting the subject's facial expression image data and individual electroencephalogram data;
analyzing the electroencephalogram data to obtain an N270 waveform;
performing multi-feature extraction on the facial expression image data and the N270 waveform;
and inputting the integrated multi-feature into a trained neural network model based on a self-attention mechanism, and outputting an auxiliary identification result of the depression symptoms.
2. The method for assisting in identifying a depressive disorder based on electroencephalogram data and facial expression images according to claim 1, wherein the editing process of the preset experimental paradigm comprises:
based on Matlab and E-prime, gray photos with consistent physical properties are used; the sex ratio in the gray photos is balanced, neutral and negative emotional expressions appear in equal proportion, the faces carry no distinguishing marks, and partial face occlusion is applied to half of the photos;
a given number of trials, the duration of a single trial, and the total duration of the experiment are set.
3. The method for assisting in the identification of a depressive disorder based on electroencephalogram data and facial expression images according to claim 1, wherein the multi-feature extraction comprises:
extracting facial features and physiological signal features from the facial expression image data by utilizing a pre-trained multi-task depth prediction model;
extracting waveform features based on the N270 waveform;
and obtaining emotion feature vectors based on the facial expression image data and the N270 waveform.
4. The method for assisting in identifying a depression disorder based on electroencephalogram data and facial expression images according to claim 3, wherein the multi-task depth prediction model comprises a common feature extraction module and a multi-task feature fusion module;
the common feature extraction module is used for extracting features of each task and recovering a rough depth map, a semantic segmentation map and a surface vector map;
the multi-task feature fusion module is used for carrying out multi-task fusion on the features extracted by the common feature extraction module, and can distinguish the common semantic features of each task and the unique semantic features of each task.
5. The method for assisting in identifying a depressive disorder based on electroencephalogram data and facial expression images according to claim 4, wherein the common feature extraction module adopts a single-input multi-output network and is composed of at least four parts: an encoder, a multi-dimensional decoder, a multi-scale feature fusion sub-module and a refinement sub-module;
the encoder is used for extracting characteristics of various scales;
the multi-dimensional decoder is used for gradually expanding the final characteristics of the encoder through an up-sampling module and reducing the number of channels at the same time;
the multi-scale feature fusion submodule is used for combining different information of a plurality of scales into one;
the refinement sub-module is used for adjusting the output size of the image and the channel number.
6. The method for assisting in identifying a depressive disorder based on electroencephalogram data and facial expression images according to claim 4, wherein the multi-task feature fusion module adopts a multi-input multi-output network and is composed of at least two parts: the multi-input feature fusion module is used for fusing the multi-task features output by the previous module; the feature decoding section is a multi-output decoder.
7. The auxiliary identification method for depression symptoms based on brain electrical data and facial expression images according to any one of claims 1 to 6, wherein the outputting the auxiliary identification result for depression symptoms comprises:
fusing the multiple features in time sequence to obtain space-time feature vectors;
and inputting the space-time feature vector into a Transformer encoder model, and classifying by softmax to obtain an auxiliary recognition result.
CN202311405231.6A 2023-10-27 2023-10-27 Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images Active CN117137488B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311405231.6A CN117137488B (en) 2023-10-27 2023-10-27 Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311405231.6A CN117137488B (en) 2023-10-27 2023-10-27 Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images

Publications (2)

Publication Number Publication Date
CN117137488A (en) 2023-12-01
CN117137488B (en) 2024-01-26

Family

ID=88902970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311405231.6A Active CN117137488B (en) 2023-10-27 2023-10-27 Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images

Country Status (1)

Country Link
CN (1) CN117137488B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN201360302Y (en) * 2009-02-12 2009-12-09 南靖万利达科技有限公司 Television and computer all-in-one machine
US20210113129A1 (en) * 2016-12-01 2021-04-22 Sin-Ger Huang A system for determining emotional or psychological states
KR20190130808A (en) * 2018-05-15 2019-11-25 연세대학교 산학협력단 Emotion Classification Device and Method using Convergence of Features of EEG and Face
CN109171769A (en) * 2018-07-12 2019-01-11 西北师范大学 It is a kind of applied to depression detection voice, facial feature extraction method and system
US20210353224A1 (en) * 2018-10-15 2021-11-18 The Board Of Trustees Of The Leland Stanford Junior University Treatment of depression using machine learning
CN109157231A (en) * 2018-10-24 2019-01-08 阿呆科技(北京)有限公司 Portable multi-channel Depression trend assessment system based on emotional distress task
WO2021104099A1 (en) * 2019-11-29 2021-06-03 中国科学院深圳先进技术研究院 Multimodal depression detection method and system employing context awareness
CN111797747A (en) * 2020-06-28 2020-10-20 道和安邦(天津)安防科技有限公司 Potential emotion recognition method based on EEG, BVP and micro-expression
CN114748072A (en) * 2022-01-21 2022-07-15 上海大学 Electroencephalogram-based information analysis and rehabilitation training system and method for depression auxiliary diagnosis
CN116467672A (en) * 2023-03-29 2023-07-21 吉林大学 Depression recognition system based on brain electricity-voice bimodal decision fusion
CN116602676A (en) * 2023-05-10 2023-08-18 浙江工业大学 Electroencephalogram emotion recognition method and system based on multi-feature fusion and CLSTN

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIJUAN DUAN ET AL: "Machine Learning Approaches for MDD Detection and Emotion Decoding Using EEG Signals", Frontiers in Human Neuroscience *
PENG YAN ET AL: "Impaired cognitive processing of conflict information in patients with depression: evidence from the event-related potential N270", Chinese Journal of Health Psychology, vol. 30, no. 10 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117633667A (en) * 2024-01-26 2024-03-01 吉林大学第一医院 N270 waveform-based depression symptom identification method, device and equipment

Also Published As

Publication number Publication date
CN117137488B (en) 2024-01-26

Similar Documents

Publication Publication Date Title
Seal et al. DeprNet: A deep convolution neural network framework for detecting depression using EEG
CN109886273B (en) CMR image segmentation and classification system
CN111528859B (en) Child ADHD screening and evaluating system based on multi-modal deep learning technology
CN117137488B (en) Auxiliary identification method for depression symptoms based on electroencephalogram data and facial expression images
US20140003658A1 (en) Method and apparatus for coding of eye and eye movement data
Pal et al. Deep learning techniques for prediction and diagnosis of diabetes mellitus
CN111920420A (en) Patient behavior multi-modal analysis and prediction system based on statistical learning
CN115272295A (en) Dynamic brain function network analysis method and system based on time domain-space domain combined state
CN117216546A (en) Model training method, device, electronic equipment, storage medium and program product
Liu et al. PRA-Net: Part-and-Relation Attention Network for depression recognition from facial expression
Tosun et al. Novel eye‐blink artefact detection algorithm from raw EEG signals using FCN‐based semantic segmentation method
Creagh et al. Interpretable deep learning for the remote characterisation of ambulation in multiple sclerosis using smartphones
CN117237351B (en) Ultrasonic image analysis method and related device
Maria et al. A comparative study on prominent connectivity features for emotion recognition from EEG
Chen et al. DCTNet: Hybrid deep neural network-based EEG signal for detecting depression
CN116452593B (en) Method, device and system for constructing AI evaluation model of vascular cognitive disorder
Qendro et al. High frequency eeg artifact detection with uncertainty via early exit paradigm
CN115909438A (en) Pain expression recognition system based on depth time-space domain convolutional neural network
CN116383618A (en) Learning concentration assessment method and device based on multi-mode data
CN114246588A (en) Depression research method
Sridurga et al. Detecting Autism Spectrum Syndrome using VGG19 and Xception Networks
Nagarhalli et al. Evaluating the Effectiveness of the Convolution Neural Network in Detecting Brain Tumors
CN115429272B (en) Psychological health state assessment method and system based on multi-mode physiological signals
Gopi Brain tissue segmentation to detect schizophrenia in gray matter using MR images
Tekulu et al. Schizophrenia Disease Classification with Deep learning and Convolutional Neural Network Architectures using TensorFlow

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant