CN115732076A - Fusion analysis method for multi-modal depression data - Google Patents
- Publication number
- CN115732076A (application number CN202211433256.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- depression
- value
- attention
- multimodal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
Description
Technical Field
The invention belongs to the field of multimodal data fusion, and in particular relates to a multimodal fusion analysis method applied to emotion recognition.
Background Art
Owing to its high incidence and severity, depression has become an internationally recognized public health problem that seriously threatens physical and mental health; early identification and early intervention are crucial to reducing its risk. Traditionally, depression is diagnosed by physicians on the basis of clinical experience and rating scales. This approach relies mainly on single-modality data and suffers from subjective bias, lag, passivity, and limited scope. Jeffery et al. found that multimodal techniques identify depression more effectively than unimodal ones.
Multimodal techniques process or fit data from several modalities simultaneously to improve model performance. Data from different modalities are hard to align and fuse because they differ both in form of expression and in meaning. In image-and-audio recognition tasks, for example, image data are usually presented as pictures while language data are presented as text, and the two are difficult to fuse because their forms differ; in gene-sequencing analysis, data produced by different sequencing methods are difficult to fuse because their meanings differ.
Existing work has explored multimodal techniques extensively. Dupont, S. et al. used hidden Markov models combined with finite automata to align speech data with image data and used the bimodal data to recognize speech and images. This method fuses data of different forms to some extent, but it remains inefficient and generalizes poorly. Another line of work uses neural networks for multi-source fusion. Zeng, X. et al. used a multimodal autoencoder to fuse ten kinds of drug description information (such as side effects and action pathways) and took disease types as input to partition diseases and match drug classes; they then segmented disease symptoms and adjusted the corresponding dosage according to how each symptom varies across individuals. This fusion method fails to fully consider the relationships between modalities and cannot fuse data of different forms of expression. In summary, despite many attempts at multimodal technology, no existing method fuses multimodal data well.
Summary of the Invention
To solve the above problems, the object of the present invention is to provide a fusion analysis method for multimodal depression data.
To achieve this object, the technical solution of the present invention is as follows. In a fusion analysis method for multimodal depression data, data of different categories are entered in multiple stages, and emotional features are extracted from the entered data. The features of each modality are then passed through three linear layers to obtain K, Q, and V representations. According to the integrated depression data attention mechanism, K and Q are used to compute the attention A of each modality, and A·V serves as the fused feature for downstream tasks. Because of this attention mechanism, the fused features contain multimodal information and can assist downstream classification tasks.
Further, the method includes the following steps:
S1, data preprocessing: the data are divided into text data, image data, and audio data;
S2, integrated depression data attention: the preprocessed data are processed by the attention mechanism to obtain features containing multimodal information;
S3, depression recognition: the features containing multimodal information are concatenated and passed through a linear layer that outputs a fused data feature; the neurons of the last layer use the softmax function as the activation function and output the classification prediction.
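The classification step S3 — concatenating the per-modality features, applying one linear layer, and producing softmax class probabilities — can be sketched as follows. This is a minimal NumPy illustration; the feature dimensions, weight shapes, and the two-class output are assumptions for the example, not values taken from the patent.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def classify(video_feat, audio_feat, text_feat, W, b):
    """Concatenate modality features, apply one linear layer,
    and output softmax class probabilities (sketch of step S3)."""
    fused = np.concatenate([video_feat, audio_feat, text_feat])  # feature splice
    logits = fused @ W + b                                       # linear layer
    return softmax(logits)                                       # last-layer activation

# Hypothetical dimensions: 64-dim features per modality, 2 output classes.
rng = np.random.default_rng(0)
probs = classify(rng.normal(size=64), rng.normal(size=64), rng.normal(size=64),
                 W=rng.normal(size=(192, 2)) * 0.01, b=np.zeros(2))
```

The output is a probability distribution over the classes, suitable as input to the cross-entropy loss mentioned later in the patent.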
Further, the text data in S1 include rating scales and electronic medical records, which undergo preliminary feature screening, missing-value handling, feature encoding, and normalization.
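A minimal sketch of the text-data preprocessing described above (missing-value handling followed by normalization); mean imputation and min-max scaling are illustrative assumptions, since the patent does not specify which strategies are used.

```python
import numpy as np

def preprocess_scale_features(X):
    """X: 2-D float array of scale/EMR features, with NaN marking missing values.
    Returns min-max normalized features with missing values imputed by column mean."""
    X = np.asarray(X, dtype=float).copy()
    # Missing-value handling: impute each column with its mean.
    col_mean = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_mean[cols]
    # Normalization: scale each column to [0, 1].
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return (X - lo) / span

X = np.array([[1.0, 10.0], [np.nan, 30.0], [3.0, np.nan]])
Xn = preprocess_scale_features(X)
```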
Further, the video data in S1 are sampled at 20 frames per second. After denoising and artifact removal, face position detection is performed on each frame, the images are aligned according to the eye positions, and each frame is then cropped to a 256×256-pixel face image.
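The eye-based cropping can be sketched in pure NumPy as below. A real pipeline would use a face/eye detector to obtain the coordinates; centering the 256×256 window on the midpoint between the eyes is an illustrative assumption, as the patent does not specify the exact alignment rule.

```python
import numpy as np

def crop_face(frame, left_eye, right_eye, size=256):
    """Crop a size×size patch centered on the midpoint between the eyes.
    frame: H×W×3 uint8 image; eye coordinates given as (row, col)."""
    cy = (left_eye[0] + right_eye[0]) // 2
    cx = (left_eye[1] + right_eye[1]) // 2
    h, w = frame.shape[:2]
    # Clamp the window so it stays inside the frame bounds.
    top = int(np.clip(cy - size // 2, 0, max(h - size, 0)))
    left = int(np.clip(cx - size // 2, 0, max(w - size, 0)))
    return frame[top:top + size, left:left + size]

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # one hypothetical video frame
patch = crop_face(frame, left_eye=(200, 280), right_eye=(200, 360))
```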
Further, the audio data in S1 are aligned with the image set obtained by frame extraction, and Mel-frequency cepstral coefficients (MFCCs) are extracted for each aligned speech segment.
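Aligning the audio with the 20 fps frame grid amounts to slicing the waveform into 50 ms segments, one per video frame; MFCCs would then be computed per segment (that step, typically done with a signal-processing library, is omitted here). The 16 kHz sample rate is an assumption for the example.

```python
import numpy as np

def align_audio_to_frames(waveform, sample_rate, fps=20):
    """Split a 1-D waveform into one segment per video frame.
    At 20 fps each segment spans sample_rate / 20 samples (50 ms)."""
    seg_len = sample_rate // fps
    n_frames = len(waveform) // seg_len
    # One row per video frame; trailing samples that do not fill a segment are dropped.
    return waveform[:n_frames * seg_len].reshape(n_frames, seg_len)

wave = np.random.default_rng(0).normal(size=16000)  # 1 s of audio at an assumed 16 kHz
segments = align_audio_to_frames(wave, sample_rate=16000)
```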
Further, in S2 the K, Q, and V values corresponding to the text, image, and audio data are computed; the K and Q values are used to compute auxiliary attentions for video, audio, and text; the three auxiliary attentions are concatenated and passed through a softmax function to form the video, audio, and text attention, which is multiplied by the V value computed in the previous step.
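One way the fusion attention described above could look in code, as we read the claim: each modality's features pass through three linear maps to obtain K, Q, and V; K and Q yield per-modality auxiliary attentions; the concatenated auxiliary attentions go through a softmax to form the final attention, which weights V. The dimensions, the scaled-dot-product form of the auxiliary attention, and the weighted-sum combination are assumptions, since the patent does not give formulas.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def idda_fuse(feats, Wk, Wq, Wv):
    """feats: dict of modality name -> (d,) feature vector.
    Wk/Wq/Wv: dicts of modality -> (d, d) linear-layer weights.
    Returns one fused (d,) feature per modality (the A·V of the claim)."""
    K = {m: x @ Wk[m] for m, x in feats.items()}
    Q = {m: x @ Wq[m] for m, x in feats.items()}
    V = {m: x @ Wv[m] for m, x in feats.items()}
    fused = {}
    for m in feats:
        # Auxiliary attention of modality m toward every modality (scaled dot product).
        aux = np.array([Q[m] @ K[n] / np.sqrt(len(K[n])) for n in feats])
        A = softmax(aux)  # concatenated auxiliary attentions -> softmax
        # Attention-weighted sum over the V values of all modalities.
        fused[m] = sum(a * V[n] for a, n in zip(A, feats))
    return fused

d = 8
rng = np.random.default_rng(0)
mods = ["video", "audio", "text"]
feats = {m: rng.normal(size=d) for m in mods}
make_w = lambda: {m: rng.normal(size=(d, d)) / np.sqrt(d) for m in mods}
out = idda_fuse(feats, make_w(), make_w(), make_w())
```

Because every modality attends over all three K vectors, each fused vector mixes multimodal information, which is the stated purpose of the mechanism.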
Further, the prediction in S3 uses a cross-entropy loss function to fit the difference between the predicted values and the true values.
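The cross-entropy objective can be written out as follows (sketch; integer class labels such as a binary depressed/not-depressed coding are an assumption):

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy between softmax outputs and true labels.
    probs: (n, c) predicted class probabilities; labels: (n,) integer classes."""
    n = len(labels)
    picked = probs[np.arange(n), labels]   # probability assigned to the true class
    return -np.mean(np.log(picked + eps))  # -(1/n) * sum_i log p(y_i)

probs = np.array([[0.9, 0.1], [0.2, 0.8]])
loss = cross_entropy(probs, np.array([0, 1]))
```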
After adopting the above scheme, the following beneficial effects are achieved. 1. Compared with the prior art, which studies depression through a single modality and is therefore affected by factors such as individual differences, this technical solution derives discriminative features of individual patient differences from case records and then fuses the patient's images, movements, and voice according to those features, achieving a comprehensive diagnosis.
2. Compared with traditional concatenation-based data fusion, this technical solution addresses three problems: representation, combining the information that different modalities carry in their media; alignment, aligning information across modalities and handling possible dependencies; and transformation, unifying the form of information from multiple modalities.
Brief Description of the Drawings
Figure 1 is the framework of the multimodal fusion perinatal depression assessment model;
Figure 2 shows the integrated depression data attention mechanism.
Detailed Description of Embodiments
The invention is described in further detail below through specific embodiments.
The embodiment is essentially as shown in Figures 1 and 2. In a fusion analysis method for multimodal depression data, data of different categories are entered in multiple stages, and emotional features are extracted from the entered data. The features of each modality are passed through three linear layers to obtain K, Q, and V representations; then, according to the integrated depression data attention mechanism, K and Q are used to compute the attention A of each modality, and A·V serves as the fused feature for downstream tasks. Because of this attention mechanism, the fused features contain multimodal information and can assist downstream classification tasks.
The specific implementation is as follows. The input of the present invention consists of video, audio, and text data. The method has three main stages: data preprocessing, the Integrated Depression Data Attention mechanism (IDDA), and depression recognition. It includes the following steps.
S1, data preprocessing: the data are divided into text data, image data, and audio data. The text data include rating scales and electronic medical records, which undergo preliminary feature screening, missing-value handling, feature encoding, and normalization. The video data are sampled at 20 frames per second; after denoising and artifact removal, face position detection is performed on each frame, the images are aligned according to the eye positions, and each frame is cropped to a 256×256-pixel face image. The audio data are aligned with the image set obtained by frame extraction, and Mel-frequency cepstral coefficients are extracted for each aligned speech segment;
S2, integrated depression data attention: the preprocessed data are processed to obtain features containing multimodal information. To fuse the data better, we propose a new multimodal data fusion mechanism (IDDA). First, the K, Q, and V values corresponding to each of the three data types are computed. Then the K and Q values are used to compute auxiliary attentions for video, audio, and text;
The three auxiliary attentions are concatenated and passed through a softmax function to form the video, audio, and text attention, which is multiplied by the V value computed in the previous step to obtain features containing multimodal information.
The features containing multimodal information are concatenated and passed through a linear layer that outputs a fused data feature, which serves as the input to the downstream task.
S3, depression recognition: the features containing multimodal information are concatenated and passed through a linear layer that outputs a fused data feature. We chose an LSTM as the classifier for the downstream task. The model is optimized with the Adam optimizer; the neurons of the last layer use the softmax function as the activation function and output the classification prediction. A cross-entropy loss function fits the difference between the predicted values and the true values, and the learning rate of the model is 0.001.
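The Adam update used here (with the stated learning rate of 0.001) can be sketched in NumPy; the β and ε values are the common defaults and are assumptions, since the patent states only the learning rate.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single parameter array.
    m, v: running first/second moment estimates; t: 1-based step count."""
    m = b1 * m + (1 - b1) * grad          # biased first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2     # biased second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

w = np.array([1.0, -2.0])                 # hypothetical parameters
m = np.zeros_like(w); v = np.zeros_like(w)
grad = np.array([0.5, -0.5])
w, m, v = adam_step(w, grad, m, v, t=1)
```

On the first step the bias-corrected update reduces to roughly lr × sign(grad), so each parameter moves by about 0.001 against its gradient.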
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between them. Moreover, the terms "comprise", "include", or any variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus.
What is described above is only an embodiment of the present invention; well-known specific structures and characteristics of the scheme are not described at length here. A person of ordinary skill in the art possesses all the common technical knowledge in the field as of the filing date or priority date, can access all prior art in the field, and is able to apply the conventional experimental means available before that date; guided by this application, such a person can refine and implement this scheme, and typical well-known structures or methods should not be an obstacle to implementing this application. It should be pointed out that, for those skilled in the art, several modifications and improvements can be made without departing from the structure of the present invention; these should also be regarded as within the scope of protection of the present invention and do not affect the effect of implementing the invention or the utility of the patent. The scope of protection claimed by this application shall be defined by the claims, and the specific embodiments and other statements in the description may be used to interpret the content of the claims.
Claims (7)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211433256.2A | 2022-11-16 | 2022-11-16 | Fusion analysis method for multi-modal depression data |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN115732076A (en) | 2023-03-03 |
Family
ID=85296043
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202211433256.2A (Pending) | CN115732076A (en) | 2022-11-16 | 2022-11-16 |
Country Status (1)

| Country | Link |
|---|---|
| CN | CN115732076A (en) |
Cited By (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116563920A | 2023-05-06 | 2023-08-08 | 北京中科睿途科技有限公司 | Method and device for identifying age in cabin environment based on multi-mode information |
| CN116563920B | 2023-05-06 | 2023-10-13 | 北京中科睿途科技有限公司 | Method and device for identifying age in cabin environment based on multi-mode information |
| CN116259407A | 2023-05-16 | 2023-06-13 | 季华实验室 | Disease diagnosis method, device, equipment and medium based on multimodal data |
| CN118507036A | 2024-07-17 | 2024-08-16 | 长春理工大学中山研究院 | Emotion semantic multi-mode depression tendency recognition system |
Similar Documents

| Publication | Title |
|---|---|
| CN115732076A | Fusion analysis method for multi-modal depression data |
| Dhuheir et al. | Emotion recognition for healthcare surveillance systems using neural networks: A survey |
| Sonawane et al. | Review of automated emotion-based quantification of facial expression in Parkinson's patients |
| Bhaskar et al. | LSTM model for visual speech recognition through facial expressions |
| CN103996155A | Intelligent interaction and psychological comfort robot service system |
| WO2022031725A1 | Ensemble machine-learning models to detect respiratory syndromes |
| CN112101096A | Suicide emotion perception method based on multi-mode fusion of voice and micro-expression |
| CN116230234A | Method and system for identifying mental health abnormalities with multimodal feature consistency |
| CN111326178A | Multi-mode speech emotion recognition system and method based on convolutional neural network |
| CN117409454B | Facial muscle movement monitoring-based emotion dynamic recognition method and device |
| CN117796810B | A multidimensional psychological state assessment method based on multimodal fusion |
| CN115035438A | Sentiment analysis method, device and electronic device |
| Kumar et al. | Chest X ray and cough sample based deep learning framework for accurate diagnosis of COVID-19 |
| CN117763446B | Multi-mode emotion recognition method and device |
| CN114492579A | Emotion recognition method, camera device, emotion recognition device and storage device |
| Palaniappan et al. | Reliable system for respiratory pathology classification from breath sound signals |
| Zhang et al. | Multimodal sensing for depression risk detection: Integrating audio, video, and text data |
| CN118398173A | Multi-mode depression detection method based on hierarchical cross attention mechanism |
| Sahu et al. | Novel framework for Alzheimer early diagnosis using inductive transfer learning techniques |
| Jose | Frame work for EEG based emotion recognition based on hybrid neural network |
| Li et al. | End-to-end multimodal emotion recognition based on facial expressions and remote photoplethysmography signals |
| Krishna et al. | Different approaches in depression analysis: A review |
| Chen et al. | SVM-based identification of pathological voices |
| CN118507036B | Emotion semantic multi-mode depression tendency recognition system |
| CN118629676A | Visitation management method and electronic device |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |