CN111985532A - Scene-level context-aware emotion recognition deep network method - Google Patents
Scene-level context-aware emotion recognition deep network method
- Publication number
- CN111985532A (application CN202010664287.3A)
- Authority
- CN
- China
- Prior art keywords
- body part
- context
- layer
- network
- emotion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 38
- 230000008909 emotion recognition Effects 0.000 title claims abstract description 26
- 230000002996 emotional effect Effects 0.000 claims abstract description 56
- 238000012549 training Methods 0.000 claims abstract description 55
- 230000004927 fusion Effects 0.000 claims abstract description 48
- 230000003044 adaptive effect Effects 0.000 claims abstract description 39
- 238000012360 testing method Methods 0.000 claims abstract description 38
- 230000008451 emotion Effects 0.000 claims abstract description 24
- 238000013527 convolutional neural network Methods 0.000 claims abstract description 22
- 230000037007 arousal Effects 0.000 claims abstract description 13
- 238000002372 labelling Methods 0.000 claims abstract description 10
- 238000012545 processing Methods 0.000 claims abstract description 9
- 238000011176 pooling Methods 0.000 claims description 19
- 238000010606 normalization Methods 0.000 claims description 15
- 230000008569 process Effects 0.000 claims description 12
- 238000004364 calculation method Methods 0.000 claims description 10
- 238000000605 extraction Methods 0.000 claims description 8
- 230000006870 function Effects 0.000 claims description 8
- 238000004422 calculation algorithm Methods 0.000 claims description 4
- 210000000746 body region Anatomy 0.000 claims 1
- 238000013507 mapping Methods 0.000 claims 1
- 238000011160 research Methods 0.000 abstract description 6
- 238000005070 sampling Methods 0.000 description 11
- 238000010586 diagram Methods 0.000 description 5
- 230000003993 interaction Effects 0.000 description 4
- 238000004458 analytical method Methods 0.000 description 3
- 239000011159 matrix material Substances 0.000 description 3
- 230000004913 activation Effects 0.000 description 2
- 230000014509 gene expression Effects 0.000 description 2
- 230000003068 static effect Effects 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000002354 daily effect Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000003203 everyday effect Effects 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 230000008921 facial expression Effects 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 230000006996 mental state Effects 0.000 description 1
- 238000003909 pattern recognition Methods 0.000 description 1
- 230000008447 perception Effects 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
- 238000010200 validation analysis Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a scene-level context-aware emotion recognition deep network method. A body part image set X_B is obtained by reading the body annotation values and the original emotion annotation values of the training sample set X_in. X_in and X_B are normalized and fed into the upper and lower convolutional neural networks, respectively, to extract the body-part emotional feature T_F and the contextual emotional feature T_C; T_F and T_C are then fed into the upper and lower adaptive layers to obtain the fusion weights λ_F and λ_C. T_F, T_C, λ_F and λ_C are fused into the emotional fusion feature T_A, which is linearly mapped by a fully connected layer to obtain initial predictions of arousal and valence. The loss between the two initial predictions and the original emotion annotations is measured, the network gradually converges, training is completed, and the network model is obtained. The processed test sample set is then fed into the network model to obtain the predicted label values of the test sample set X_tn. When fusing features, the method accounts for the degree to which features of different attributes influence a person's emotion, and improves the prediction performance of the model while enriching research on image-based emotion recognition.
Description
Technical Field
The invention belongs to the technical field of pattern recognition, and in particular relates to a scene-level context-aware emotion recognition deep network method.
Background Art
Emotion is a necessary form through which people express their feelings. In daily life, understanding and recognizing a person's emotions from the actual scene they are in helps to perceive their mental state, predict their behavior, and interact with them effectively. As early as the 1990s, the concept of affective computing was proposed by the MIT Media Lab, and scientists have since worked on converting complex human emotions into numerical information that computers can recognize, in order to better realize human-computer interaction and make computers more intelligent. This has become one of the key problems to be solved urgently in the era of artificial intelligence.
Traditionally, emotion recognition for static images has been studied mainly on face images. For a face image, a pre-defined feature extraction method is used to extract emotional features, which are fed into a classifier (or regressor) for model training, and emotion prediction is finally achieved. However, emotion recognition based on face images is easily affected by natural-environment and sample factors such as pose, illumination, and differences between faces.
Psychological research has found that face images convey roughly 55% of the emotional information transmitted visually. In daily emotional communication, a person's emotion can be judged not only from the target person's facial expression but also from rich contextual information such as the person's actions, their interactions with others, and the scene they are in; even in the extreme case where no face can be detected, the subject's emotion can still be estimated from a large amount of contextual information.
In recent years, complex emotion recognition methods based on deep convolutional networks have attracted attention, letting the network learn and analyze emotional features on its own instead of relying on hand-crafted definitions. However, current deep learning methods mainly perform emotion analysis on face images, lack comprehensive consideration of how people appear in complex natural scenes, and do not consider the influence of scene-level context information on recognizing the emotions of people in a scene. At the same time, the fusion of features with different attributes is under-studied, and existing models often ignore how much each type of feature contributes to emotional state recognition.
Summary of the Invention
The purpose of the present invention is to provide a scene-level context-aware emotion recognition deep network method, which addresses the limitations of the prior art: emotion analysis based on static images has a narrow scope, targeting only face images, and features with different attributes are fused by direct concatenation.
The technical solution adopted by the present invention is a scene-level context-aware emotion recognition deep network method, which specifically includes the following steps:
Step 1. Collect images and determine the training sample set X_in and the test sample set X_tn.
Step 2. Read the body annotation value and the original emotion annotation value of each sample in the training sample set X_in, and extract the body part of each sample according to the body annotation value to obtain the body part image set X_B.
Step 3. Perform intra-set normalization on the training sample set X_in to obtain the contextual emotion image set X_im; perform intra-set normalization on the body part image set X_B to obtain the normalized body part image set X_body.
Step 4. Feed the normalized body part image set X_body into the upper convolutional neural network to extract the body-part emotional feature T_F, and feed the contextual emotion image set X_im into the lower convolutional neural network to extract the scene-level contextual emotional feature T_C.
Step 5. Feed the body-part emotional feature T_F and the scene-level contextual emotional feature T_C into the upper and lower adaptive layers, respectively, for adaptive feature learning; the upper adaptive layer outputs the body-part fusion weight λ_F, and the lower adaptive layer outputs the context fusion weight λ_C.
Step 6. Perform weighted fusion of the body-part emotional feature T_F and the scene-level contextual emotional feature T_C with the body-part fusion weight λ_F and the context fusion weight λ_C to obtain the emotional fusion feature T_A that incorporates context information. T_A is then linearly mapped by the fully connected layers to obtain initial predictions of arousal and valence. A KL divergence loss function measures the loss between the initial arousal and valence predictions and the corresponding original emotion annotations; through network backpropagation over multiple iterations, the network weights are updated and the loss is gradually reduced so that the algorithm converges, training is completed, and the network model is obtained.
Step 7. Extract the body part of each test sample in the test sample set X_tn according to step 2 to obtain the test body part image set X_tB; then normalize the test sample set X_tn and the test body part image set X_tB according to step 3, feed them into the network model obtained in step 6, and finally obtain the predicted arousal and valence label values of the test sample set X_tn.
The present invention is further characterized in that:
The specific steps of extracting body parts from the training sample set X_in in step 2 are as follows:
Step 2.1. Read the body annotation (B_x1, B_y1, B_x2, B_y2) of each sample in the training sample set X_in, where (B_x1, B_y1) and (B_x2, B_y2) are the coordinates of the two diagonally opposite corners of the body part, and calculate the position and size parameter set (B_x1, B_y1, B_w, B_h) by formula (1):

B_w = B_x2 - B_x1,  B_h = B_y2 - B_y1    (1)

In formula (1), B_w is the width of the body part image and B_h is the height of the body part image.
Step 2.2. Crop each sample in the training sample set X_in according to the parameter set obtained in step 2.1 to obtain the body part image set X_B.
The intra-set normalization of the training sample set X_in in step 3 is performed as

X_im = (X_in - x_mean) / σ    (2)

In formula (2), X_in is the training sample set, X_im is the contextual emotion image set, σ is the standard-deviation image of the training sample set, and x_mean is the mean image of the training sample set. x_mean and σ are defined as

x_mean = (1/n) Σ_{i=1}^{n} x_i    (3)

σ = sqrt( (1/n) Σ_{i=1}^{n} (x_i - x_mean)² )    (4)

In formulas (3) and (4), x_i is any single sample in the training sample set X_in, and n is the total number of training samples, n ≥ 1.
The intra-set normalization of the body part image set X_B in step 3 is performed as

X_body = (X_B - x'_mean) / σ'    (5)

In formula (5), X_B is the body part image set, X_body is the normalized body part image set, σ' is the standard-deviation image of the body part image set, and x'_mean is the mean image of the body part image set. x'_mean and σ' are defined as

x'_mean = (1/n) Σ_{i=1}^{n} x'_i    (6)

σ' = sqrt( (1/n) Σ_{i=1}^{n} (x'_i - x'_mean)² )    (7)

In formulas (6) and (7), x'_i is any single sample in the body part image set X_B, and n is the total number of training samples, n ≥ 1.
In step 4, the upper and lower convolutional neural networks have the same structural parameters, and both adopt the VGG16 architecture.
The body-part emotional feature T_F and the contextual emotional feature T_C in step 4 are calculated as follows:

T_F = F(X_body, W_F)    (8)

T_C = F(X_im, W_C)    (9)

In formula (8), W_F denotes all parameters of the convolutional and pooling layers of the upper convolutional neural network; in formula (9), W_C denotes all parameters of the convolutional and pooling layers of the lower convolutional neural network; F denotes the convolution and pooling operations of the feature extraction network.
The body-part fusion weight λ_F and the context fusion weight λ_C in step 5 are calculated as follows:

λ_F = F(T_F, W_D)    (10)

λ_C = F(T_C, W_E)    (11)

In formula (10), W_D denotes the network parameters of the upper adaptive layer; in formula (11), W_E denotes the network parameters of the lower adaptive layer; and λ_F + λ_C = 1.
In step 5, the upper and lower adaptive layers have exactly the same network structure. Specifically, each of the upper and lower adaptive layers contains one max-pooling layer, two convolutional layers, and one Softmax layer.
In step 6, the weighted fusion of the body-part emotional feature T_F and the scene-level contextual feature T_C with the body-part fusion weight λ_F and the context fusion weight λ_C is calculated as

T_A = (λ_F ⊗ T_F) Π (λ_C ⊗ T_C)    (12)

In formula (12), T_A is the emotional fusion feature, Π is the concatenation operator, which splices the weighted body-part emotional feature and the weighted scene-level contextual emotional feature, and ⊗ denotes the convolution operation between the features of different attributes and their fusion weights.
The beneficial effects of the present invention are as follows. The scene-level context-aware emotion recognition deep network method of the present invention proposes a two-stage contextual emotion recognition network: in the first stage, two convolutional neural networks extract features of the person and of the scene in a complex image separately; in the second stage, an adaptive network performs weighted fusion of the features with different attributes from the two sub-networks. The two-stage contextual emotion recognition network, on the one hand, overcomes the limitation that existing image-based emotion recognition tasks mainly target face image data; on the other hand, it fully considers how much features of different attributes influence a person's emotion when fusing features, improving the prediction performance of the model while enriching research on image-based emotion recognition.
Brief Description of the Drawings
Figure 1 is the overall flow chart of the scene-level context-aware emotion recognition deep network method of the present invention;
Figure 2 shows complex emotional images and their emotion-dimension annotation information;
Figure 3 is a schematic diagram of the convolution operation;
Figure 4 is a schematic diagram of enlarging the receptive field by stacking small convolution kernels;
Figure 5 is a schematic diagram of the pooling operation.
Detailed Description of the Embodiments
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The specific process of the scene-level context-aware emotion recognition deep network method of the present invention is shown in Figure 1 and specifically includes the following steps:
Step 1. Collect images and determine the training sample set X_in and the test sample set X_tn.
Each training sample and each test sample has its corresponding original emotion annotation value and body annotation value.
For the training sample set X_in, the original emotion annotations form an n×2-dimensional vector y = [(a_1, v_1), (a_2, v_2), ..., (a_n, v_n)], where (a_1, v_1) are the arousal and valence labels of the first sample in X_in, ..., and (a_n, v_n) are the arousal and valence labels of the n-th sample in X_in. The body part annotations form an n×4-dimensional vector whose first row is the body annotation of the first sample in the body part image set X_B, ..., and whose n-th row is the body annotation of the n-th sample in X_B.
For the test sample set X_tn, the original emotion annotations form an m×2-dimensional vector ty = [(ta_1, tv_1), (ta_2, tv_2), ..., (ta_m, tv_m)], and the body part annotations form an m×4-dimensional vector, where m is the number of test samples.
Step 2. Read the body annotation value and the original emotion annotation value of each sample in the training sample set X_in, and extract the body part of each sample according to the body annotation value to obtain the body part image set X_B.
The specific steps of extracting body parts from the training sample set X_in are as follows:
Step 2.1. Read the body annotation (B_x1, B_y1, B_x2, B_y2) of each sample in the training sample set X_in, where (B_x1, B_y1) and (B_x2, B_y2) are the coordinates of the two diagonally opposite corners of the body part, and calculate the position and size parameter set (B_x1, B_y1, B_w, B_h) by formula (1):

B_w = B_x2 - B_x1,  B_h = B_y2 - B_y1    (1)

In formula (1), B_w is the width of the body part image and B_h is the height of the body part image.
Step 2.2. Crop each training sample in the training sample set X_in according to the parameter set obtained in step 2.1 to obtain the body part image set X_B.
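For illustration only, the following is a minimal Python sketch of the cropping in steps 2.1 and 2.2; the helper name crop_body_part and the use of the Pillow library are assumptions and are not part of the patented method.

```python
# Minimal sketch of steps 2.1-2.2 (hypothetical helper, not from the patent):
# crop a body-part image from a sample using its diagonal-corner annotation.
from PIL import Image

def crop_body_part(image_path, bx1, by1, bx2, by2):
    """Return the body-part crop defined by the two diagonally opposite corners."""
    bw = bx2 - bx1          # body-part width  B_w, as in formula (1)
    bh = by2 - by1          # body-part height B_h, as in formula (1)
    img = Image.open(image_path)
    # PIL crops with a (left, upper, right, lower) box
    return img.crop((bx1, by1, bx1 + bw, by1 + bh))

# Example usage: body = crop_body_part("sample_0001.jpg", 120, 80, 260, 400)
```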
Step 3. Perform intra-set normalization on the training sample set X_in to obtain the contextual emotion image set X_im; perform intra-set normalization on the body part image set X_B to obtain the normalized body part image set X_body.
The intra-set normalization of the training sample set X_in is performed as

X_im = (X_in - x_mean) / σ    (2)

In formula (2), X_in is the training sample set, X_im is the contextual emotion image set, σ is the standard-deviation image of the training sample set, and x_mean is the mean image of the training sample set. x_mean and σ are defined as

x_mean = (1/n) Σ_{i=1}^{n} x_i    (3)

σ = sqrt( (1/n) Σ_{i=1}^{n} (x_i - x_mean)² )    (4)

In formulas (3) and (4), x_i is any single sample in the training sample set X_in, and n is the total number of training samples, n ≥ 1.
The intra-set normalization of the body part image set X_B is performed as

X_body = (X_B - x'_mean) / σ'    (5)

In formula (5), X_B is the body part image set, X_body is the normalized body part image set, σ' is the standard-deviation image of the body part image set, and x'_mean is the mean image of the body part image set. x'_mean and σ' are defined as

x'_mean = (1/n) Σ_{i=1}^{n} x'_i    (6)

σ' = sqrt( (1/n) Σ_{i=1}^{n} (x'_i - x'_mean)² )    (7)

In formulas (6) and (7), x'_i is any single sample in the body part image set X_B, and n is the total number of training samples, n ≥ 1.
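As an illustration of the intra-set normalization of formulas (2)-(7), the following is a minimal NumPy sketch; the array layout (n, H, W, C) and the small epsilon added to avoid division by zero are assumptions.

```python
# Minimal NumPy sketch of the intra-set normalization in formulas (2)-(7).
import numpy as np

def normalize_set(images):
    """images: array of shape (n, H, W, C); returns (X - mean) / std as in formula (2)/(5)."""
    x_mean = images.mean(axis=0)            # mean image, formula (3)/(6)
    sigma = images.std(axis=0) + 1e-8       # standard-deviation image, formula (4)/(7); eps avoids /0
    return (images - x_mean) / sigma

# X_im = normalize_set(X_in); X_body = normalize_set(X_B)
```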
Step 4. Feed the normalized body part image set X_body into the upper convolutional neural network to extract the body-part emotional feature T_F, and feed the contextual emotion image set X_im into the lower convolutional neural network to extract the scene-level contextual emotional feature T_C.
Step 4.1. Initialize the parameters of the overall network architecture, including all convolutional layers, pooling layers, and fully connected layers: the weights of each layer are initialized to follow a Gaussian distribution with mean 0 and standard deviation 1, and the bias terms are uniformly initialized to 0.001.
Step 4.2. Feed the body part image set X_body into the upper convolutional neural network and the contextual emotion image set X_im into the lower convolutional neural network. The upper and lower convolutional neural network models have the same structure, both adopting the VGG16 architecture; the VGG16 architecture parameters are listed in Table 1 below:
Table 1. Architecture parameters of the emotional feature extraction convolutional network
As Table 1 shows, the five convolutional layers C1, C2, C3, C4 and C5 of the network produce 64, 128, 256, 512 and 512 feature maps, respectively. Each feature map is obtained by convolving the input image, or the output X_m of the previous layer, with the corresponding number of convolution templates K_uv and then adding a bias term b_v; the convolution process is shown in Figure 3, and a feature map is computed as

X_v = X_m ⊛ K_uv + b_v    (13)

In formula (13), u ∈ {1, 2, 3, 4, 5} indexes the convolutional layer, v ranges over the number of convolution templates in each layer (64, 128, 256, 512 and 512, respectively), and ⊛ denotes a convolution operation with stride 1. All convolution kernels are 3×3; stacking small convolution kernels enlarges the receptive field of the convolutional layers while effectively reducing the number of parameters. The receptive field is illustrated in Figure 4.
For the pooling layers S1, S2, S3 and S4, the results of the corresponding convolutional layers are sampled by max pooling. In the present invention the pooling sampling region is 2×2 with a stride of 2; the pooling process is shown in Figure 5. For example, the first 2×2 sampling region of the first feature map X_m of convolutional layer C1 yields the first input O_1 of the first feature map of pooling layer S1, where the sampling takes the maximum value within the 2×2 region; the other outputs are obtained similarly, and the horizontal and vertical spatial resolutions after sampling become 1/2 of the original.
Step 4.3. After the normalized body part image set X_body and the contextual emotion image set X_im pass through the upper and lower convolutional neural networks, respectively, the body-part emotional feature T_F and the scene-level contextual emotional feature T_C are obtained. The calculation can be expressed as

T_F = F(X_body, W_F)    (8)

T_C = F(X_im, W_C)    (9)

In formula (8), W_F denotes the network parameters involved in extracting the body-part emotional feature in the upper stream; in formula (9), W_C denotes the network parameters involved in extracting the scene-level context feature in the lower stream; F denotes the computation of the convolutional and pooling layers of the feature extraction network.
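For illustration, the two-stream feature extraction of step 4 could be sketched as follows in PyTorch; the use of torchvision's VGG16 and the 224×224 input size are assumptions, and the patent itself initializes the weights from a Gaussian distribution rather than reusing any pretrained model.

```python
# Hedged sketch of the two-stream feature extraction of step 4.
import torch
import torchvision.models as models

body_net = models.vgg16().features      # upper stream: body-part emotional feature T_F
context_net = models.vgg16().features   # lower stream: scene-level contextual feature T_C

def extract_features(x_body, x_im):
    # x_body, x_im: tensors of shape (N, 3, 224, 224) after normalization
    t_f = body_net(x_body)      # T_F = F(X_body, W_F), formula (8)
    t_c = context_net(x_im)     # T_C = F(X_im,  W_C), formula (9)
    return t_f, t_c
```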
Step 5. Feed the body-part emotional feature T_F and the scene-level contextual emotional feature T_C into the upper and lower adaptive layers, respectively, for adaptive weight learning; the upper adaptive layer outputs the body-part fusion weight λ_F, and the lower adaptive layer outputs the context fusion weight λ_C.
The upper and lower adaptive layers have exactly the same network structure: each contains one max-pooling layer, two convolutional layers, and one Softmax layer. The overall structural parameters are shown in Table 2 below:
Table 2. Architecture parameters of the adaptive fusion network
Finally, the Softmax layer outputs the body-part fusion weight λ_F and the context fusion weight λ_C, computed as

λ_F = F(T_F, W_D)    (10)

λ_C = F(T_C, W_E)    (11)

In formula (10), W_D denotes the network parameters of the upper adaptive layer; in formula (11), W_E denotes the network parameters of the lower adaptive layer. The final Softmax layer of the adaptive network constrains the fusion weights so that λ_F + λ_C = 1.
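The adaptive fusion layers of step 5 could be approximated as in the sketch below; since Table 2 is not reproduced here, all channel counts, kernel sizes and the way the two branch scores are coupled through a single softmax are assumptions, chosen only to honor the constraint λ_F + λ_C = 1.

```python
# Hedged sketch of the adaptive fusion layers (step 5): each branch reduces a feature
# map to one scalar score; a softmax over the two scores gives lambda_F and lambda_C.
import torch
import torch.nn as nn

class AdaptiveBranch(nn.Module):
    def __init__(self, in_channels=512):
        super().__init__()
        self.pool = nn.MaxPool2d(2)                              # the single max-pooling layer
        self.conv1 = nn.Conv2d(in_channels, 128, 3, padding=1)   # first convolution (assumed width)
        self.conv2 = nn.Conv2d(128, 1, 3, padding=1)             # second convolution -> 1-channel map

    def forward(self, t):
        s = self.conv2(torch.relu(self.conv1(self.pool(t))))
        return s.mean(dim=(1, 2, 3))                             # one scalar score per sample

body_branch = AdaptiveBranch()
context_branch = AdaptiveBranch()

def fusion_weights(t_f, t_c):
    scores = torch.stack([body_branch(t_f), context_branch(t_c)], dim=1)
    lam = torch.softmax(scores, dim=1)                           # enforces lambda_F + lambda_C = 1
    return lam[:, 0], lam[:, 1]                                  # lambda_F, lambda_C
```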
Step 6. Perform weighted fusion of the body-part emotional feature T_F and the scene-level contextual emotional feature T_C with the body-part fusion weight λ_F and the context fusion weight λ_C to obtain the emotional fusion feature T_A that incorporates context information. T_A is then linearly mapped by the fully connected layers to obtain initial predictions of arousal and valence. A KL divergence loss function measures the loss between the initial arousal and valence predictions and the corresponding original emotion annotations; through network backpropagation over multiple iterations, the network weights are updated and the loss is gradually reduced so that the algorithm converges, training is completed, and the network model is obtained.
Step 6.1. Perform weighted fusion of the body-part emotional feature T_F and the scene-level contextual emotional feature T_C with the body-part fusion weight λ_F and the context fusion weight λ_C to obtain the emotional fusion feature T_A:

T_A = (λ_F ⊗ T_F) Π (λ_C ⊗ T_C)    (12)

In formula (12), Π is the concatenation operator, which splices the weighted body-part emotional feature and the weighted scene-level contextual emotional feature, and ⊗ denotes the convolution operation between the features of different attributes and their fusion weights.
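A minimal sketch of the weighted fusion of formula (12) follows; flattening the weighted feature maps before concatenation is an assumption about the tensor layout.

```python
# Sketch of formula (12): scale each feature by its fusion weight and concatenate.
import torch

def fuse(t_f, t_c, lam_f, lam_c):
    # t_f, t_c: (N, C, H, W); lam_f, lam_c: (N,)
    wf = lam_f.view(-1, 1, 1, 1) * t_f       # lambda_F applied to T_F
    wc = lam_c.view(-1, 1, 1, 1) * t_c       # lambda_C applied to T_C
    return torch.cat([wf.flatten(1), wc.flatten(1)], dim=1)   # T_A, concatenation operator
```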
Step 6.2. Feed the fusion feature T_A into the fully connected layers for processing. Since the predicted values are continuous, the last fully connected layer uses a linear activation function. The parameter structure of the fully connected layers is as follows:
Table of fully connected layer parameters
Step 6.3. The final 256-dimensional emotional feature is linearly mapped by the fully connected layer Fc10 into the 2-dimensional predicted label values, arousal and valence. The loss function is the KL divergence loss, which measures the loss between the predicted label values and the original label values; the network is trained by backpropagation for 80 iterations in total, updating the network weights and gradually reducing the loss so that the algorithm converges and training is completed.
The loss function used is the KL divergence, defined as

KL(p ‖ q) = Σ_{i=1}^{n} p(y_i) log( p(y_i) / q(ly_i) )    (14)

In formula (14), p(y_i) denotes the true distribution of the original emotion label y, q(ly_i) denotes the distribution of the label value ly predicted by the model, and n denotes the total number of training samples.
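The KL divergence loss of formula (14) could be sketched as follows; the text does not state how the continuous labels are turned into the distributions p and q, so the softmax normalization used below is an assumption.

```python
# Hedged sketch of the KL-divergence loss of formula (14).
import torch
import torch.nn.functional as F

def kl_loss(pred, target):
    # pred, target: tensors of shape (N, 2) holding (arousal, valence)
    q = F.log_softmax(pred.flatten(), dim=0)   # predicted distribution q(ly), in log space
    p = F.softmax(target.flatten(), dim=0)     # label distribution p(y) (normalization is assumed)
    return F.kl_div(q, p, reduction='sum')     # sum_i p(y_i) * log(p(y_i) / q(ly_i))
```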
The backpropagation of the convolutional neural network adopted in the present invention covers three cases:
(1) When a pooling layer is followed by a fully connected layer, the error is propagated back from one fully connected layer into multiple down-sampling layers, and the gradient of every pixel in the feature map must be obtained, as shown in formula (15):

δ_l^j = f'(u_l^j) ⊙ ( pad(δ_{l+1}^j) ⊛ rot180(W_{l+1}^j) )    (15)

In formula (15), f'(u_l^j) is the partial derivative of the activation function of layer l, j indexes the feature maps of the current layer, and δ_{l+1}^j is the bias gradient of layer l+1: the weight matrix W_{l+1}^j of layer l+1 is first rotated by 180 degrees, the neighborhood around δ_{l+1}^j is zero-padded, and the result is convolved with rot180(W_{l+1}^j); ⊙ denotes the element-wise product of two matrices. After the bias gradient of the corresponding element in the current feature map is obtained, the bias gradient and weight gradient of the down-sampling layer are given by formula (16), in which d_l^j = downsample(x_{l-1}^j) is the down-sampling result of the j-th feature map of layer l-1.
(2) When a pooling layer is followed by a convolutional layer, the situation is similar to case (1), and the bias and weight gradients are solved in the same way.
(3) When a convolutional layer is followed by a pooling layer, the feature maps correspond one to one. Likewise, the bias gradient δ_l^j of each pixel in the current feature map is first obtained:

δ_l^j = w_{l+1}^j ( f'(u_l^j) × upsample(δ_{l+1}^j) )    (17)

In formula (17), upsample(δ_{l+1}^j) denotes up-sampling δ_{l+1}^j: the j-th down-sampled result of layer l+1 is up-sampled and restored to the same size as the convolutional feature map, so that it can be multiplied element-wise with the matrix f'(u_l^j). The bias gradient and weight gradient of the convolutional layer are then given by formulas (18) and (19), in which w_l^j is the convolution kernel corresponding to the j-th feature map x_l^j of layer l, and p_l^j is the result obtained by convolving the j-th feature map x_{l-1}^j of layer l-1 with the convolution kernel w_l^j.
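The upsample(·) operation in formula (17) can be illustrated with a small NumPy sketch; the Kronecker expansion below shows uniform routing of the pooled gradient, whereas for max pooling the gradient would instead be routed only to the argmax positions.

```python
# Illustrative sketch of upsample(.) in formula (17): the gradient of a 2x2, stride-2
# pooling layer is expanded back to the spatial size of the convolutional feature map.
import numpy as np

def upsample_delta(delta, k=2):
    """delta: (H, W) gradient at layer l+1 -> (k*H, k*W) gradient map at layer l."""
    return np.kron(delta, np.ones((k, k)))

# delta_l = w * (f_prime(u_l) * upsample_delta(delta_l_plus_1))   # shape of formula (17)
```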
Step 7. For the test sample set X_tn, extract the body part of each test sample according to step 2 to obtain the test body part image set X_tB; then normalize the test sample set X_tn and the test body part image set X_tB according to step 3, feed them into the network model obtained in step 6, and finally obtain the predicted arousal and valence label values of the test sample set X_tn.
The specific process of step 7 is as follows:
Step 7.1. Read the body annotation (tB_x1, tB_y1, tB_x2, tB_y2) of each sample in the test sample set X_tn, and calculate the position and size parameter set (tB_x1, tB_y1, tB_w, tB_h) as in formula (1).

Step 7.2. Crop the test sample set X_tn according to the parameter set obtained in step 7.1 to obtain the test body part image set X_tB.
Step 7.3. Following step 3, perform intra-set normalization on the test sample set X_tn and the test body part image set X_tB, respectively, to obtain the corresponding test contextual emotion image set X_tm and the normalized test body part image set X_tbody.
Step 7.4. Feed the normalized test body part image set X_tbody into the upper stream of the network model obtained in step 6 and the test contextual emotion image set X_tm into the lower stream of the network model obtained in step 6; the model prediction yields the predicted arousal and valence label values of the test sample set X_tn.
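Putting the pieces together, the test-time pipeline of step 7 could look like the sketch below, reusing the helpers sketched earlier (normalize_set, extract_features, fusion_weights, fuse); the inputs are assumed to be NumPy arrays of already-cropped 224×224 images, and fc_head stands in for the fully connected layers of step 6, whose exact sizes are not reproduced here.

```python
# Hedged sketch of the test-time pipeline of step 7.
import torch

def predict(x_tn, x_tb, fc_head):
    # x_tn: (m, 224, 224, 3) test images; x_tb: (m, 224, 224, 3) cropped test body parts
    x_tm = torch.as_tensor(normalize_set(x_tn), dtype=torch.float32).permute(0, 3, 1, 2)
    x_tbody = torch.as_tensor(normalize_set(x_tb), dtype=torch.float32).permute(0, 3, 1, 2)
    t_f, t_c = extract_features(x_tbody, x_tm)     # upper and lower streams
    lam_f, lam_c = fusion_weights(t_f, t_c)        # adaptive fusion weights
    t_a = fuse(t_f, t_c, lam_f, lam_c)             # emotional fusion feature T_A
    return fc_head(t_a)                            # (m, 2): predicted arousal and valence for X_tn
```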
Example
The experiments of the present invention are conducted on the EMOTIC database. The EMOTIC dataset provides rich emotional images of complex scenes; each image contains not only the subject to be assessed but also scene-level context information involving the environment and other factors. The dataset contains 23,554 samples in total, divided into 17,077 training samples, 2,088 validation samples, and 4,389 test samples. Its annotations include not only discrete labels and continuous dimensional labels but also the body-part annotation of the subject in each image, which facilitates scene-level context research. Some complex emotional images and their annotations are shown in Figure 2.
The experimental results are compared as follows:
1) Influence of different feature fusion methods on emotion recognition
Since features extracted by different network structures often have different attributes, directly concatenating the body-part emotional features and the scene-level contextual emotional features, two kinds of features with different attributes, does not provide optimal discriminative performance. Therefore, to verify the effectiveness of the adaptive fusion network, the same experimental settings are used to compare the features output by the two convolutional neural networks when fused by direct concatenation and by the adaptive fusion network. The experimental results are shown in Table 3 below:
Table 3. Influence of different feature fusion methods on emotion recognition
The data in the table show that the adaptive fusion network designed by the present invention fuses the emotional features better than directly concatenating the features of the two different attributes. This verifies the effectiveness of introducing the adaptive fusion network into the contextual emotion recognition network structure.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010664287.3A CN111985532B (en) | 2020-07-10 | 2020-07-10 | Scene-level context-aware emotion recognition deep network method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010664287.3A CN111985532B (en) | 2020-07-10 | 2020-07-10 | Scene-level context-aware emotion recognition deep network method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111985532A true CN111985532A (en) | 2020-11-24 |
CN111985532B CN111985532B (en) | 2021-11-09 |
Family
ID=73439067
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010664287.3A Active CN111985532B (en) | 2020-07-10 | 2020-07-10 | Scene-level context-aware emotion recognition deep network method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111985532B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112733756A (en) * | 2021-01-15 | 2021-04-30 | 成都大学 | Remote sensing image semantic segmentation method based on W divergence countermeasure network |
CN113011504A (en) * | 2021-03-23 | 2021-06-22 | 华南理工大学 | Virtual reality scene emotion recognition method based on visual angle weight and feature fusion |
CN113076905A (en) * | 2021-04-16 | 2021-07-06 | 华南理工大学 | Emotion recognition method based on context interaction relationship |
CN114764906A (en) * | 2021-01-13 | 2022-07-19 | 长沙中车智驭新能源科技有限公司 | Multi-sensor post-fusion method for automatic driving, electronic equipment and vehicle |
CN117636426A (en) * | 2023-11-20 | 2024-03-01 | 北京理工大学珠海学院 | A facial and situational emotion recognition method based on attention mechanism |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105512680A (en) * | 2015-12-02 | 2016-04-20 | 北京航空航天大学 | Multi-view SAR image target recognition method based on depth neural network |
CN108830296A (en) * | 2018-05-18 | 2018-11-16 | 河海大学 | A kind of improved high score Remote Image Classification based on deep learning |
CN109977413A (en) * | 2019-03-29 | 2019-07-05 | 南京邮电大学 | A kind of sentiment analysis method based on improvement CNN-LDA |
WO2019174376A1 (en) * | 2018-03-14 | 2019-09-19 | 大连理工大学 | Lung texture recognition method for extracting appearance and geometrical feature based on deep neural network |
CN110399490A (en) * | 2019-07-17 | 2019-11-01 | 武汉斗鱼网络科技有限公司 | A kind of barrage file classification method, device, equipment and storage medium |
CN110472245A (en) * | 2019-08-15 | 2019-11-19 | 东北大学 | A kind of multiple labeling emotional intensity prediction technique based on stratification convolutional neural networks |
- 2020-07-10: CN application CN202010664287.3A filed, granted as patent CN111985532B (status: active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105512680A (en) * | 2015-12-02 | 2016-04-20 | 北京航空航天大学 | Multi-view SAR image target recognition method based on depth neural network |
WO2019174376A1 (en) * | 2018-03-14 | 2019-09-19 | 大连理工大学 | Lung texture recognition method for extracting appearance and geometrical feature based on deep neural network |
CN108830296A (en) * | 2018-05-18 | 2018-11-16 | 河海大学 | A kind of improved high score Remote Image Classification based on deep learning |
CN109977413A (en) * | 2019-03-29 | 2019-07-05 | 南京邮电大学 | A kind of sentiment analysis method based on improvement CNN-LDA |
CN110399490A (en) * | 2019-07-17 | 2019-11-01 | 武汉斗鱼网络科技有限公司 | A kind of barrage file classification method, device, equipment and storage medium |
CN110472245A (en) * | 2019-08-15 | 2019-11-19 | 东北大学 | A kind of multiple labeling emotional intensity prediction technique based on stratification convolutional neural networks |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114764906A (en) * | 2021-01-13 | 2022-07-19 | 长沙中车智驭新能源科技有限公司 | Multi-sensor post-fusion method for automatic driving, electronic equipment and vehicle |
CN112733756A (en) * | 2021-01-15 | 2021-04-30 | 成都大学 | Remote sensing image semantic segmentation method based on W divergence countermeasure network |
CN113011504A (en) * | 2021-03-23 | 2021-06-22 | 华南理工大学 | Virtual reality scene emotion recognition method based on visual angle weight and feature fusion |
CN113011504B (en) * | 2021-03-23 | 2023-08-22 | 华南理工大学 | Emotion Recognition Method for Virtual Reality Scenes Based on Perspective Weight and Feature Fusion |
CN113076905A (en) * | 2021-04-16 | 2021-07-06 | 华南理工大学 | Emotion recognition method based on context interaction relationship |
CN117636426A (en) * | 2023-11-20 | 2024-03-01 | 北京理工大学珠海学院 | A facial and situational emotion recognition method based on attention mechanism |
Also Published As
Publication number | Publication date |
---|---|
CN111985532B (en) | 2021-11-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111985532B (en) | Scene-level context-aware emotion recognition deep network method | |
CN112990054B (en) | Compact linguistics-free facial expression embedding and novel triple training scheme | |
CN107066583B (en) | A kind of picture and text cross-module state sensibility classification method based on the fusion of compact bilinearity | |
CN110188598A (en) | A Real-time Hand Pose Estimation Method Based on MobileNet-v2 | |
CN114936623B (en) | Aspect-level emotion analysis method integrating multi-mode data | |
CN108427921A (en) | A kind of face identification method based on convolutional neural networks | |
CN111028319B (en) | Three-dimensional non-photorealistic expression generation method based on facial motion unit | |
CN111783622A (en) | Method, apparatus, device and computer-readable storage medium for facial expression recognition | |
CN111860362A (en) | Generating face image correction model and method and device for correcting face image | |
CN110674685B (en) | A Human Analytical Segmentation Model and Method Based on Edge Information Enhancement | |
CN114240891A (en) | Welding spot quality identification method fusing knowledge graph and graph convolution neural network | |
Sevli et al. | Turkish sign language digits classification with CNN using different optimizers | |
Depuru et al. | Convolutional neural network based human emotion recognition system: A deep learning approach | |
CN108985298A (en) | A kind of human body clothing dividing method based on semantic consistency | |
CN115937945A (en) | Visual transform and convolution network fusion-based facial expression recognition method | |
CN116129141A (en) | Medical data processing method, apparatus, device, medium and computer program product | |
Fu et al. | Personality trait detection based on ASM localization and deep learning | |
CN115982652A (en) | A Cross-Modal Sentiment Analysis Method Based on Attention Network | |
CN112560712B (en) | Behavior recognition method, device and medium based on time enhancement graph convolutional network | |
CN114613016A (en) | Gesture image feature extraction method based on Xscene network improvement | |
Wang | Improved facial expression recognition method based on gan | |
CN107704810A (en) | A kind of expression recognition method suitable for medical treatment and nursing | |
CN118298369A (en) | Classroom learning effect recognition method and device | |
CN106778579A (en) | A kind of head pose estimation method based on accumulative attribute | |
CN114898418B (en) | Complex emotion detection method and system based on annular model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |