CN111797936A - Image emotion classification method and device based on saliency detection and multi-level feature fusion

Publication number: CN111797936A (granted as CN111797936B)
Authority: CN (China)

Prior art keywords: image, emotion, emotional, branch, feature map

Legal status: Granted
Application number: CN202010670001.2A
Other languages: Chinese (zh)
Other versions: CN111797936B
Inventors: 邓泽林, 朱其然
Assignee: Changsha University of Science and Technology
Current legal status: Active

Classifications

    • G06F18/24 - Pattern recognition; classification techniques
    • G06F18/214 - Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/253 - Pattern recognition; fusion techniques of extracted features
    • G06N3/045 - Neural networks; combinations of networks
    • G06N3/08 - Neural networks; learning methods
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses an image emotion classification method based on saliency detection and multi-level feature fusion. First, a saliency detection network extracts the saliency map of an emotional image. A Siamese neural network then uses the feature maps of the saliency map to modulate the feature maps of the corresponding emotional image, so that the Inception-v4 network pays more attention to the emotion-expressing regions of the image, which effectively improves the accuracy of image emotion classification. Finally, the Inception-v4 network classifies the feature maps modulated by the Siamese network to obtain the emotion category of the image. The method accurately locates the emotion-expressing regions of an emotional image and, through feature modulation, directs the classification network's attention to those regions, thereby effectively improving classification accuracy.

Description

Image emotion classification method and device based on saliency detection and multi-level feature fusion

Technical Field

The invention relates to the technical field of image emotion classification, and in particular to an image emotion classification method and device based on saliency detection and multi-level feature fusion.

Background

With the development and popularization of photography and social networks, people have become accustomed to sharing experiences and expressing opinions online through images and videos, which creates an urgent need for processing and understanding image and video content. Compared with low-level visual appearance, humans perceive and understand high-level semantics and emotions far better. In recent years, emotion-level analysis of image content has received extensive attention in psychology, affective computing, and the multimedia community. Analyzing images at the emotional level is a central task of image content analysis, with broad applications in human-computer interaction, public opinion analysis, and image retrieval.

The emotional expression of an image is mainly determined by its emotion-expressing regions, such as the objects in the image. Existing methods do not give these regions additional attention and therefore cannot extract sufficiently discriminative emotional features.

Summary of the Invention

The invention provides an image emotion classification method and device based on saliency detection and multi-level feature fusion, which overcome defects of the prior art such as insufficient attention to emotion-expressing regions.

To achieve the above object, the invention proposes an image emotion classification method based on saliency detection and multi-level feature fusion, comprising:

constructing an emotional image set, where the emotional image set includes labeled emotional images;

establishing a training set and a validation set from the emotional image set, and using a saliency detection network to extract the saliency maps of the emotional images in the training set and the validation set;

inputting the emotional images and saliency maps of the training set into a pre-built image emotion classification model, where the model includes a Siamese neural network and an Inception-v4 network;

training the Siamese neural network with the emotional images and saliency maps of the training set, and using the trained Siamese network to modulate the features of each emotional image with its saliency map, obtaining modulated feature maps;

training the Inception-v4 network with the modulated feature maps to obtain a trained image emotion classification model;

validating the trained image emotion classification model with the emotional images and saliency maps of the validation set;

inputting an image to be classified and its saliency map into the validated model to obtain the image's emotion category.

To achieve the above object, the invention also proposes an image emotion classification device based on saliency detection and multi-level feature fusion, comprising:

an image set construction module for constructing an emotional image set, where the emotional image set includes labeled emotional images;

a saliency map acquisition module for establishing a training set and a validation set from the emotional image set and extracting the saliency maps of the emotional images in both sets with a saliency detection network;

a model training module for inputting the emotional images and saliency maps of the training set into a pre-built image emotion classification model comprising a Siamese neural network and an Inception-v4 network; training the Siamese network with the emotional images and saliency maps; using the trained Siamese network to modulate the features of each emotional image with its saliency map to obtain modulated feature maps; and training the Inception-v4 network with the modulated feature maps to obtain a trained image emotion classification model;

a model validation module for validating the trained model with the emotional images and saliency maps of the validation set;

a classification module for inputting an image to be classified and its saliency map into the validated model to obtain the image's emotion category.

To achieve the above object, the invention further proposes a computer device comprising a memory and a processor, where the memory stores a computer program and the processor implements the steps of the above method when executing the program.

To achieve the above object, the invention further proposes a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above method.

Compared with the prior art, the invention has the following beneficial effects:

The method first uses a saliency detection network to extract the saliency map of an emotional image. A Siamese neural network then uses the feature maps of the saliency map to modulate the feature maps of the corresponding emotional image, so that the Inception-v4 network pays more attention to the image's emotion-expressing regions, effectively improving classification accuracy. Finally, the Inception-v4 network classifies the modulated feature maps to obtain the emotion category of the image. The method accurately locates the emotion-expressing regions of an emotional image and, through feature modulation, directs the classification network's attention to them, thereby effectively improving the accuracy of image emotion classification.

Brief Description of the Drawings

To explain the embodiments of the invention or the prior art more clearly, the drawings required for the description are briefly introduced below. The drawings show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the image emotion classification method based on saliency detection and multi-level feature fusion provided by the invention;

Fig. 2 is the overall structure diagram of the image emotion classification method;

Fig. 3 is the structure diagram of the image emotion classification model provided by the invention;

Fig. 4 shows the emotion wheel and the emotion distances.

The objects, features, and advantages of the invention are further described below with reference to the drawings and the embodiments.

Detailed Description

The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the drawings. The described embodiments are only a part of the embodiments of the invention, not all of them; all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the invention.

In addition, the technical solutions of the various embodiments may be combined with each other, provided that the combination can be realized by those of ordinary skill in the art; where a combination is contradictory or unrealizable, it should be considered not to exist and outside the protection scope claimed by the invention.

The invention proposes an image emotion classification method based on saliency detection and multi-level feature fusion. As shown in Fig. 1 and Fig. 2, the method includes:

101: Construct an emotional image set; the set includes labeled emotional images.

The emotional images come from the International Affective Picture System subset (IAPSa), the abstract paintings dataset (Abstract), the artistic photograph set (ArtPhoto), and a number of weakly labeled emotional images collected from Flickr and Instagram.

"Labeled" means that each emotional image in the set has been annotated with an emotion category in advance.

The emotion categories are anger, disgust, fear, sadness, amusement, awe, contentment, and excitement.
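For reference, the eight categories and the negative/positive polarity split used later for the distributed labels can be written as a small lookup table. A minimal Python sketch, with helper names chosen here for illustration:

```python
# The eight Mikels emotion categories used by the classifier
# (the final fully connected layer has 8 outputs).
EMOTIONS = ["anger", "disgust", "fear", "sadness",
            "amusement", "awe", "contentment", "excitement"]

# Polarity split described later in the text: the first four are
# negative (N), the last four positive (P).
NEGATIVE = set(EMOTIONS[:4])
POSITIVE = set(EMOTIONS[4:])

def same_polarity(i: int, j: int) -> bool:
    """True if emotion indices i and j fall in the same polarity group."""
    return (EMOTIONS[i] in NEGATIVE) == (EMOTIONS[j] in NEGATIVE)
```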

102: Establish a training set and a validation set from the emotional image set, and use a saliency detection network to extract the saliency maps of the emotional images in both sets.

For the saliency detection network, see Liu J-J, Hou Q, Cheng M-M, et al., "A Simple Pooling-Based Design for Real-Time Salient Object Detection," 2019.
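As a rough usage sketch of this preprocessing step (not the patent's own code), any model mapping a 3-channel image to a 1-channel saliency map fits the pipeline; `load_pretrained_poolnet` below is a hypothetical helper standing in for loading the cited network, and the input size is an assumption:

```python
import torch
from PIL import Image
from torchvision import transforms

# Hypothetical loader for the pooling-based saliency network of Liu et al. (2019).
saliency_model = load_pretrained_poolnet()
saliency_model.eval()

to_tensor = transforms.Compose([
    transforms.Resize((299, 299)),  # assumed size; the patent does not fix one
    transforms.ToTensor(),
])

@torch.no_grad()
def extract_saliency(image_path: str) -> torch.Tensor:
    """Return the saliency map b(x, y) of shape (1, H, W) for image a(x, y, z)."""
    img = to_tensor(Image.open(image_path).convert("RGB")).unsqueeze(0)
    return saliency_model(img).squeeze(0)
```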

103: Input the emotional images and saliency maps of the training set into a pre-built image emotion classification model; the model includes a Siamese neural network and an Inception-v4 network.

A Siamese neural network consists of two identical neural networks that share weights.

Inception-v4 is a classification network with excellent performance.

104: Train the Siamese neural network with the emotional images and saliency maps of the training set, and use the trained network to modulate the features of each emotional image with its saliency map, obtaining modulated feature maps.

The Siamese neural network lets the feature maps of the saliency map modulate the feature maps of the corresponding emotional image, so that the Inception-v4 network pays more attention to the image's emotion-expressing regions, effectively improving the accuracy of image emotion classification.

105: Train the Inception-v4 network with the modulated feature maps to obtain a trained image emotion classification model.

106: Validate the trained image emotion classification model with the emotional images and saliency maps of the validation set.

107: Input the image to be classified and its saliency map into the validated model to obtain the image's emotion category.


In one embodiment, for step 103, the Siamese neural network, shown in Fig. 3, includes a first branch and a second branch: the first branch extracts features from the emotional image, and the second branch extracts features from the saliency map. Both branches consist of four convolutional layers (BasicConv2d) with the same kernel sizes. The outputs of the second and fourth convolutional layers of the second branch are connected to the outputs of the second and fourth convolutional layers of the first branch.

To keep the features of the emotional image and the features of the saliency map in similar feature spaces as they propagate through the network, both branches of the Siamese network consist of four convolutional layers with the same kernel sizes. For the first branch, the input and output channels of all four convolutional layers are 3; for the second branch, the input and output channels of all four layers are 1.

After the saliency map of the emotional image is obtained from the saliency detection network, the Siamese neural network is introduced so that the saliency map can successfully constrain the classification model to give higher attention to the image's emotional regions. In the Siamese network, the saliency map in the second branch modulates the emotional image in the first branch, enhancing the attention paid to the emotional regions of the image.
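Before the step-by-step modulation procedure below, the two-branch structure itself can be sketched in PyTorch. The 3×3 kernel size and the composition of BasicConv2d (convolution, batch normalization, ReLU) are assumptions, since the text fixes only the layer count and channel widths:

```python
import torch
import torch.nn as nn

class BasicConv2d(nn.Module):
    """Conv + BN + ReLU; the exact composition of BasicConv2d is assumed here,
    modeled on the block of the same name in Inception-style networks."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

class SiameseBranches(nn.Module):
    """Two branches of four conv layers each, as described in the text: the
    image branch keeps 3 channels throughout, the saliency branch 1 channel.
    The cross-connections at layers 2 and 4 appear in the modulation sketch
    given after equations (1) and (2)."""
    def __init__(self):
        super().__init__()
        # First branch: emotional image a(x, y, z), 3 -> 3 channels at every layer.
        self.img = nn.ModuleList([BasicConv2d(3, 3) for _ in range(4)])
        # Second branch: saliency map b(x, y), 1 -> 1 channel at every layer.
        self.sal = nn.ModuleList([BasicConv2d(1, 1) for _ in range(4)])
```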

In one embodiment, for step 104, the trained Siamese neural network modulates the features of an emotional image with its saliency map to obtain a modulated feature map, as follows:

401: Input the emotional image a(x, y, z) of the training set into the first branch of the trained Siamese network, and the saliency map b(x, y) into the second branch, where x, y are the spatial coordinates and z indexes the three color channels of the emotional image.

402: Take the feature map S output by the second convolutional layer of the second branch (S ∈ R^{w×h}, where w and h are the width and height of the feature map), multiply it element-wise with the feature map T output by the second convolutional layer of the first branch (T ∈ R^{c×w×h}, where c is the number of channels) to obtain the feature map H (H ∈ R^{c×w×h}), and add H element-wise to T to obtain the feature map G.

The multiplication modulates the feature map T with the feature map S.

The addition re-emphasizes the features of the original input image, avoiding the problem that parts of T are completely suppressed after the multiplication.

403: Input the feature map G into the third convolutional layer of the first branch, and the feature map S into the third convolutional layer of the second branch.

404: Take the feature map S′ output by the fourth convolutional layer of the second branch, multiply it element-wise with the feature map T′ output by the fourth convolutional layer of the first branch to obtain the feature map H′, and add H′ element-wise to T′ to obtain the modulated feature map F.

The feature modulation gives the salient regions of the feature map T larger response values, i.e., it makes the Inception-v4 network pay more attention to the emotion-expressing regions of the emotional image. Two consecutive modulations are performed to make the effect pronounced.

In another embodiment, since the operation is independent of the number of feature maps S and T, only a single feature map S is considered. The feature map G is computed as:

G=f(T(w,h,c)*[S(w,h)+1]) (1)G=f(T(w,h,c)*[S(w,h)+1]) (1)

where f is the Sigmoid activation function, which keeps 0 < T ∈ R^{w×h} < 1 and thus guarantees a suitable range for the feature modulation; T(w,h,c) is the feature map output by the second convolutional layer of the first branch; S(w,h) is the feature map output by the second convolutional layer of the second branch; and w, h, c are the width, height, and number of channels of the feature map, respectively.

The modulated feature map F is computed as:

F=f(T′(w,h,c)*[S′(w,h)+1]) (2)F=f(T'(w,h,c)*[S'(w,h)+1]) (2)

where T′(w,h,c) is the feature map output by the fourth convolutional layer of the first branch and S′(w,h) is the feature map output by the fourth convolutional layer of the second branch.
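Putting steps 401 to 404 and equations (1) and (2) together, the modulation forward pass can be sketched as follows, building on the SiameseBranches sketch above; broadcasting the single-channel S over the c channels of T is an implementation assumption:

```python
import torch

def modulate(T: torch.Tensor, S: torch.Tensor) -> torch.Tensor:
    """Equations (1)/(2): sigmoid(T * (S + 1)).
    T: (B, c, h, w) image-branch features; S: (B, 1, h, w) saliency-branch
    features. S broadcasts over the c channels of T, realizing the
    multiply-then-add of steps 402/404, since T*S + T = T*(S + 1)."""
    return torch.sigmoid(T * (S + 1.0))

def siamese_forward(branches, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """a: (B, 3, H, W) emotional image; b: (B, 1, H, W) saliency map.
    branches is a SiameseBranches instance from the sketch above.
    Returns the modulated feature map F fed to the Inception-v4 network."""
    t = branches.img[0](a)      # layer 1, image branch
    s = branches.sal[0](b)      # layer 1, saliency branch
    t = branches.img[1](t)      # layer 2 outputs T ...
    s = branches.sal[1](s)      # ... and S
    g = modulate(t, s)          # first modulation (step 402, eq. (1))
    t = branches.img[2](g)      # layer 3 (step 403)
    s = branches.sal[2](s)
    t = branches.img[3](t)      # layer 4 outputs T' ...
    s = branches.sal[3](s)      # ... and S'
    return modulate(t, s)       # second modulation (step 404, eq. (2))
```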

In the next embodiment, the Inception-v4 network, shown in Fig. 3, has a multi-branch structure comprising, in order, three convolutional layers (BasicConv2d), one Mixed_3a module, one Mixed_4a module, one Mixed_5a module, four Inception_A modules, one Reduction_A module, seven Inception_B modules, one Reduction_B module, three Inception_C modules, one average pooling layer, and one fully connected layer. A side-branch structure (BasicConv2d) consisting of a single convolutional layer is introduced after each of the Mixed_5a, Reduction_A, and Reduction_B modules. The outputs of the three side branches are connected to a fully connected layer that fuses the side-branch features and feeds the fused feature to the output of the average pooling layer.

Each of the three side branches consists of one convolutional layer with 256 output channels, kernel size 1, and stride 1. To fuse the features output by the three side branches, this embodiment places a fully connected layer with 256 neurons after them. Finally, the number of neurons in the last fully connected layer of the Inception-v4 network is set to 8, corresponding to the 8 emotion categories, to serve as the final classifier. With this structure, three features L1, L2, L3 at different depths of the deep network are obtained from the three side branches; together with the top-level feature L4, four levels of feature maps are obtained from the Inception-v4 network.

After the features of different levels are obtained, how they are integrated in the model is extremely important. Observation shows that in image emotion recognition, the semantic features of an image play a larger role than its style features. Therefore, this embodiment gives semantic features a higher degree of attention during feature fusion.

The feature fusion of this embodiment proceeds as follows:

Step 1: Concatenate the features L1, L2, L3 and feed the result into the fully connected layer to obtain the feature L.

Step 2: Concatenate the feature L with the semantic feature L4 to obtain the final classification feature F (see the sketch below).
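A hedged sketch of this fusion head, with the 256-channel 1×1 side convolutions, the 256-unit fusion layer, and the 8-way classifier taken from the text; the global average pooling before the fully connected layers and the tap-point channel widths (384, 1024, 1536 at Mixed_5a, Reduction_A, Reduction_B in standard Inception-v4) are assumptions:

```python
import torch
import torch.nn as nn

class MultiLevelFusionHead(nn.Module):
    """Side-branch extraction and multi-level fusion described in the text."""
    def __init__(self, tap_channels=(384, 1024, 1536), top_dim=1536, n_classes=8):
        super().__init__()
        # One 1x1, stride-1 conv with 256 output channels per side branch.
        self.side = nn.ModuleList(
            nn.Conv2d(c, 256, kernel_size=1, stride=1) for c in tap_channels
        )
        self.pool = nn.AdaptiveAvgPool2d(1)       # assumed pooling before the FCs
        self.fuse = nn.Linear(3 * 256, 256)       # step 1: fuses L1, L2, L3 into L
        self.classifier = nn.Linear(256 + top_dim, n_classes)  # 8-way classifier

    def forward(self, taps, top):
        # taps: three feature maps tapped after Mixed_5a, Reduction_A, Reduction_B
        # top:  (B, top_dim) pooled top-level semantic feature L4 of the backbone
        side_feats = [self.pool(conv(t)).flatten(1)
                      for conv, t in zip(self.side, taps)]
        L = self.fuse(torch.cat(side_feats, dim=1))   # step 1
        F = torch.cat([L, top], dim=1)                # step 2
        return self.classifier(F)                     # emotion logits z_i
```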

This embodiment fully considers the influence of different types of features on emotional arousal and thus constructs more expressive features, further narrowing the gap between emotional features and emotional expression and improving the accuracy of emotion classification.

In this embodiment, a multi-branch structure is introduced on top of the Inception-v4 network to obtain the style features and semantic features of the image at the low and high levels of the network respectively; the multi-level features are then fused, and the emotion-type prediction distribution of the model is obtained through a Softmax.

In one embodiment, the last fully connected layer of the Inception-v4 network produces, through a Softmax, the probability that the emotional image belongs to the i-th emotion category:

y_i = exp(z_i) / Σ_{j=1}^{C} exp(z_j)   (3)

where y_i is the probability that the emotional image belongs to the i-th emotion category, z_i is the activation value for the i-th category, z_j is the activation value for the j-th category, and C is the number of emotion categories.

In another embodiment, for step 105, because the emotion datasets are relatively small, a transfer learning strategy is adopted: the Inception-v4 network is first pre-trained on the ImageNet dataset and then trained with the modulated feature maps.
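As a usage sketch of this strategy, ImageNet weights for Inception-v4 are available, for example, from the timm library; the library choice is an illustration, not part of the disclosure:

```python
import timm

# Inception-v4 pre-trained on ImageNet, with the final fully connected layer
# resized to the 8 emotion categories described in the text.
model = timm.create_model("inception_v4", pretrained=True, num_classes=8)
```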

In the next embodiment, the image emotion classification model adopts a multi-task loss function based on an emotion-diversity constraint:

L_multi = L_cls + λ·L_ed   (4)

where L_cls is the traditional classification loss, taking the standard cross-entropy form over the label probabilities q and the predictions p:

L_cls = -Σ_{i=1}^{C} q_i · log(p_i)   (5)

L_ed, defined in equation (6), is the loss over the other accompanying emotions, measuring the discrepancy between the predicted distribution p and the distributed label f. Equations (7) and (8) define the distributed label f(i) from the Mikels'-wheel distances: emotions sharing the dominant emotion's polarity receive probability mass that decreases with their distance dis_ij from the dominant emotion, and emotions of the opposite polarity receive probability 0.

In these formulas, L_multi is the multi-task loss; λ is a weight; q_i is the probability, given by the image's emotion label, that the image belongs to the i-th emotion category; p_i is the model's predicted probability for the i-th category; f(i) is the probability of the other accompanying emotions; j denotes the dominant emotion and i the other accompanying emotions; p_j is the category probability of the dominant emotion j; dis_ij is the distance between emotion i and the dominant emotion j defined on Mikels' wheel; and i, j ∈ B means that emotion categories i and j have the same polarity.

In the collection of most image emotion datasets, a majority-voting strategy is widely adopted to obtain the emotion label of each image. Starting from the diversity of emotional expression, the distribution of emotions is instead estimated in the form of label probabilities. Inspired by research in emotion theory, the relation between two emotions determines how similar they are, and the range from similar to completely opposite can be expressed through Mikels' wheel (see Fig. 4, where (a) is the emotion wheel and (b) the emotion distances). From the distances defined on Mikels' wheel, the distance between two emotions can be computed, and this distance represents their similarity: the smaller the distance dis_ij between emotion i and emotion j, the more similar the two emotions. Therefore, through the definitions of Mikels' wheel, the probability labels of emotional images can be obtained.

To obtain more reasonable probability labels, the invention further draws on emotion theory: the emotions of emotional images are divided into a negative polarity N (anger, disgust, fear, sadness) and a positive polarity P (amusement, awe, contentment, excitement). Emotion theory shows that the diverse expression of emotion is in fact mostly one dominant emotion accompanied by several other emotions of the same polarity, so the polarity relation is introduced when generating the label probabilities.

From the dominant emotion and the distances defined on Mikels' wheel, the probability distribution over the other emotions with the same polarity as the dominant emotion is computed, while the probabilities of the emotions with the opposite polarity are set to 0; the computation is expressed in equations (7) and (8).
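Because equations (7) and (8) survive only as variable definitions, the following Python sketch shows one plausible realization consistent with the description: same-polarity probabilities decay with the wheel distance, opposite-polarity probabilities are zero. Both the inverse-distance weighting and the illustrative distance function are assumptions:

```python
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "sadness",
            "amusement", "awe", "contentment", "excitement"]
NEG, POS = set(range(4)), set(range(4, 8))

def mikels_distance(i: int, j: int) -> float:
    """Illustrative stand-in for dis_ij from Mikels' wheel (Fig. 4(b)).
    The real values come from the wheel; this placeholder just counts
    steps around an assumed 8-position ring."""
    d = abs(i - j) % 8
    return min(d, 8 - d)

def distributed_label(dominant: int, alpha: float = 1.0) -> np.ndarray:
    """One plausible form of equations (7)-(8): mass proportional to
    1/(dis_ij + alpha) for same-polarity emotions, 0 otherwise, normalized."""
    group = NEG if dominant in NEG else POS
    f = np.zeros(len(EMOTIONS))
    for i in group:
        f[i] = 1.0 / (mikels_distance(i, dominant) + alpha)
    return f / f.sum()

print(distributed_label(EMOTIONS.index("fear")).round(3))
```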

Through the category labels and the probability labels of the emotional images, the multi-task loss function of equation (4) is introduced.

The distributed labels of the emotional images are generated according to Mikels' wheel, the emotion-distribution loss between the predicted distribution and the distributed labels is computed, and, combined with the classification loss through the weight λ, a new loss function is formed that constrains the diversity of the image's emotional expression.
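A hedged PyTorch sketch of this combined loss: the classification term is the cross-entropy of equation (5), and the distribution term is written here as a KL divergence between the distributed label f and the prediction p, one common choice where the exact form of equation (6) is not preserved:

```python
import torch
import torch.nn.functional as F

def multi_task_loss(logits: torch.Tensor,
                    hard_label: torch.Tensor,
                    dist_label: torch.Tensor,
                    lam: float = 0.5) -> torch.Tensor:
    """L_multi = L_cls + lambda * L_ed  (equation (4)).
    logits:     (B, 8) activations a_i of the last fully connected layer
    hard_label: (B,)   dominant-emotion indices (the voted category label)
    dist_label: (B, 8) distributed labels f from Mikels' wheel
    lam:        the weight lambda; its value is not given in the source."""
    l_cls = F.cross_entropy(logits, hard_label)   # equation (5)
    log_p = F.log_softmax(logits, dim=1)          # log p_i via equation (3)
    # KL(f || p) as a stand-in for equation (6); sums over classes,
    # averages over the batch.
    l_ed = F.kl_div(log_p, dist_label, reduction="batchmean")
    return l_cls + lam * l_ed
```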

In this embodiment, stochastic gradient descent is used to optimize the multi-task loss L_multi, where {a_i | i = 1, 2, ..., C} denotes the activation values of the C emotion categories in the last fully connected layer.

The gradient of L_multi with respect to the activation a_i is computed through equation (9).
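A minimal training-step sketch for this optimization, reusing the model and the multi_task_loss sketched above; the learning rate and momentum are placeholders, since the source names only stochastic gradient descent:

```python
import torch

# Placeholder hyperparameters; the source specifies only that SGD is used.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_step(images, saliency, hard_label, dist_label):
    """One SGD step on L_multi, using the multi_task_loss sketched above."""
    logits = model(images, saliency)   # Siamese modulation then Inception-v4
    loss = multi_task_loss(logits, hard_label, dist_label, lam=0.5)
    optimizer.zero_grad()
    loss.backward()                    # gradients of L_multi per equation (9)
    optimizer.step()
    return loss.item()
```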

The expression of image emotion is subjective and diverse, yet most emotion datasets are collected with a single emotion label produced by voting. In reality, different regions of an image may express different emotions, and it is difficult to assign an image a single emotion type: the emotional expression of an image is usually one dominant emotion accompanied by one or more other emotions of the same polarity. Using a single emotion label therefore makes the labeling of image emotion datasets inaccurate and largely harms the accuracy of image emotion classification. This embodiment therefore introduces the polarity relation and, through the multi-task loss function, jointly considers the dominant emotion and the other emotions of the same polarity, improving the accuracy of image emotion classification.

In this embodiment, starting from the diversity of image emotional expression, a new loss function that incorporates the emotion-distribution loss is proposed to constrain the diversity of the image's emotional expression.

The invention also proposes an image emotion classification device based on saliency detection and multi-level feature fusion, comprising:

an image set construction module for constructing an emotional image set, where the emotional image set includes labeled emotional images;

a saliency map acquisition module for establishing a training set and a validation set from the emotional image set and extracting the saliency maps of the emotional images in both sets with a saliency detection network;

a model training module for inputting the emotional images and saliency maps of the training set into a pre-built image emotion classification model comprising a Siamese neural network and an Inception-v4 network; training the Siamese network with the emotional images and saliency maps; using the trained Siamese network to modulate the features of each emotional image with its saliency map to obtain modulated feature maps; and training the Inception-v4 network with the modulated feature maps to obtain a trained image emotion classification model;

a model validation module for validating the trained model with the emotional images and saliency maps of the validation set;

a classification module for inputting an image to be classified and its saliency map into the validated model to obtain the image's emotion category.

在其中一个实施例中,对于模型训练模块,孪生神经网络如图3所示,包括第一分支和第二分支,第一分支用于对情感图像进行特征提取,第二分支用于对显著性图进行特征提取;所述第一分支和第二分支均由卷积核大小相同的四个卷积层(BasicConv2d)构成;所述第二分支中第二个和第四个卷积层的输出端与所述第一分支中第二个和第四个卷积层的输出端相连接。In one embodiment, for the model training module, the Siamese neural network is shown in FIG. 3, including a first branch and a second branch, the first branch is used for feature extraction on emotional images, and the second branch is used for saliency Feature extraction is performed on the graph; the first branch and the second branch are both composed of four convolutional layers (BasicConv2d) with the same convolution kernel size; the outputs of the second and fourth convolutional layers in the second branch The terminal is connected to the output terminal of the second and fourth convolutional layers in the first branch.

为了保证情感图像的特征和显著性图的特征在网络传播中保持相似的特征空间,该孪生神经网络的两个分支都由卷积核大小相同的四个卷积层构成。对于第一分支,四个卷积层的输入和输出通道都为3;而第二分支的四个卷积层的输入和输入通道都为1。In order to ensure that the features of emotional images and the features of saliency maps maintain a similar feature space in the network propagation, both branches of this Siamese neural network are composed of four convolutional layers with the same convolution kernel size. For the first branch, the input and output channels of the four convolutional layers are all 3; while the input and input channels of the four convolutional layers of the second branch are all 1.

通过显著性检测网络得到情感图像的显著性图后,为了让显著性图能够成功的约束分类模型给予情感图像的情感区域更高的关注度,引入了孪生神经网络。在孪生神经网络中,使用第二分支中的显著性图来调制第一分支中的情感图像,完成对情感图像中情感区域关注度的增强。After obtaining the saliency map of the emotional image through the saliency detection network, in order to allow the saliency map to successfully constrain the classification model to give higher attention to the emotional region of the emotional image, a Siamese neural network is introduced. In the Siamese neural network, the saliency map in the second branch is used to modulate the emotional images in the first branch, and the attention to the emotional regions in the emotional images is enhanced.

在某个实施例中,模型训练模块还包括:In a certain embodiment, the model training module further includes:

401:将训练集中的情感图像a(x,y,z)输入训练好的孪生神经网络的第一分支,显著性图b(x,y)输入第二分支;其中,x,y为坐标系,z为情感图像的三个颜色通道。401: Input the emotional image a(x, y, z) in the training set into the first branch of the trained Siamese neural network, and the saliency map b(x, y) into the second branch; where x, y are the coordinate systems , z is the three color channels of the emotional image.

402:获取第二分支中第二个卷积层输出的特征图S(S∈Rw×h,其中w为特征图的宽,h为特征图的高),将所述特征图S与所述第一分支中第二个卷积层输出的特征图T(T∈Rc ×w×h,c为特征图的通道数)进行对应元素的乘法运算,获得特征图H(H∈Rc×w×h),并将所述特征图H与所述特征图T进行对应元素的加法运算,获得特征图G;402: Obtain the feature map S output by the second convolutional layer in the second branch (S∈R w×h , where w is the width of the feature map, and h is the height of the feature map), and the feature map S and the The feature map T (T ∈ R c ×w × h , c is the number of channels of the feature map) output by the second convolutional layer in the first branch is multiplied by the corresponding elements to obtain the feature map H (H ∈ R c ×w×h ), and the feature map H and the feature map T are added with corresponding elements to obtain the feature map G;

乘法运算是为了利用特征图S调制特征图T;The multiplication operation is to use the feature map S to modulate the feature map T;

加法运算是为了避免乘法运算后特征图T部分特征被完全忽略的问题,加法运算以再次强调原图(输入的情感图像)的特征。The addition operation is to avoid the problem that the features of the T part of the feature map are completely ignored after the multiplication operation, and the addition operation re-emphasizes the features of the original image (input emotional image).

403:将所述特征图G输入第一分支的第三个卷积层,将所述特征图S输入第二分支的第三个卷积层;403: Input the feature map G into the third convolutional layer of the first branch, and input the feature map S into the third convolutional layer of the second branch;

404:获取第二分支中第四个卷积层输出的特征图S′,将所述特征图S′与所述第一分支中第四个卷积层输出的特征图T′进行对应元素的乘法运算,获得特征图H′,并将所述特征图H′与所述特征图T′进行对应元素的加法运算,获得调制特征图F。404: Obtain the feature map S' output by the fourth convolutional layer in the second branch, and perform the corresponding element comparison between the feature map S' and the feature map T' output by the fourth convolutional layer in the first branch. A multiplication operation is performed to obtain a feature map H', and an addition operation of corresponding elements is performed on the feature map H' and the feature map T' to obtain a modulation feature map F.

特征调制是为了让特征图T中的显著性区域获得更大的响应值,即为了让Inception-v4网络给予情感图像中的情感表达区域更多的关注度。同时,为了确保效果明显,进行了两次连续的调制。The feature modulation is to make the salient region in the feature map T obtain a larger response value, that is, to let the Inception-v4 network give more attention to the emotional expression region in the emotional image. At the same time, to ensure that the effect is obvious, two consecutive modulations are carried out.

在另一个实施例中,由于特征图S和特征图T的数量无关,只考虑单个特征图S。特征图G的计算公式为:In another embodiment, since the number of feature maps S and feature maps T is irrelevant, only a single feature map S is considered. The calculation formula of the feature map G is:

G=f(T(w,h,c)*[S(w,h)+1]) (1)G=f(T(w,h,c)*[S(w,h)+1]) (1)

式中,f表示Sigmoid激活函数,使0<T∈Rw×h<1,从而为特征调制保证合适的范围;T(w,h,c)表示第一分支中第二个卷积层输出的特征图;S(w,h)表示第二分支中第二个卷积层输出的特征图;w,h,c分别表示特征图的宽、高和通道数;In the formula, f represents the sigmoid activation function, so that 0<T∈Rw ×h <1, so as to ensure a suitable range for feature modulation; T(w,h,c) represents the output of the second convolutional layer in the first branch The feature map of ; S(w, h) represents the feature map output by the second convolutional layer in the second branch; w, h, c represent the width, height and number of channels of the feature map, respectively;

调制特征图F的计算公式为:The calculation formula of the modulation feature map F is:

F=f(T′(w,h,c)*[S′(w,h)+1]) (2)F=f(T'(w,h,c)*[S'(w,h)+1]) (2)

式中,T′(w,h,c)表示第一分支中第四个卷积层输出的特征图;S′(w,h)表示第二分支中第四个卷积层输出的特征图。In the formula, T'(w,h,c) represents the feature map output by the fourth convolutional layer in the first branch; S'(w,h) represents the feature map output by the fourth convolutional layer in the second branch .

在下一个实施例中,Inception-v4网络如图3所示,为多分支结构,依次包括三个卷积层(BasicConv2d)、一个Mixed_3a模块、一个Mixed_4a模块、一个Mixed_5a模块、四个Inception_A模块、一个Reduction_A模块、七个Inception_B模块、一个Reduction_B模块、三个Inception_C模块、一个平均池化层(Avarage Pooling)和一个全连接层(FullyConnection);所述Mixed_5a模块、Reduction-A模块和Reduction-B模块之后分别引入侧分枝结构(BasicConv2d),所述侧分枝结构由一个卷积层组成;三个所述侧分枝结构的输出端连接一个全连接层(Fully Connection),所述全连接层用于将三个侧分枝结构输出的侧分枝特征融合并输出融合特征,并将所述融合特征输出至所述平均池化层的输出端。In the next embodiment, the Inception-v4 network is shown in Fig. 3, which is a multi-branch structure, including three convolutional layers (BasicConv2d), one Mixed_3a module, one Mixed_4a module, one Mixed_5a module, four Inception_A modules, one Reduction_A module, seven Inception_B modules, one Reduction_B module, three Inception_C modules, one average pooling layer (Avarage Pooling) and one fully connected layer (FullyConnection); after the Mixed_5a module, Reduction-A module and Reduction-B module The side branch structure (BasicConv2d) is respectively introduced, and the side branch structure consists of a convolutional layer; the outputs of the three side branch structures are connected to a fully connected layer (Fully Connection), and the fully connected layer uses It fuses the side branch features output by the three side branch structures and outputs the fused feature, and outputs the fused feature to the output end of the average pooling layer.

三个所述侧分枝结构分别由一个输出通道为256的卷积层构成,卷积核的尺寸为1,卷积步长为1。为了融合三个侧分枝结构输出的特征,本实施例在三个侧分枝结构后面定义了层神经元个数为256的全连接层(Fully Connection)。最后,把在Inception-v4网络最后的全连接层的神经元数量设置为8,对应8个情感类别,作为最终的分类器。通过上述网络结构,可以从三个侧分枝结构获取到深度网络中三个不同层次的特征L1,L2,L3,加上深度网络的顶层特征L4,共得到Inception-v4网络中四个层次的特征图。The three side branch structures are respectively composed of a convolutional layer with an output channel of 256, the size of the convolution kernel is 1, and the convolution stride is 1. In order to fuse the features output by the three side branch structures, in this embodiment, a fully connected layer (Fully Connection) with 256 layer neurons is defined behind the three side branch structures. Finally, set the number of neurons in the last fully connected layer of the Inception-v4 network to 8, corresponding to 8 emotion categories, as the final classifier. Through the above network structure, three different levels of features L 1 , L 2 , L 3 in the deep network can be obtained from the three side branch structures, and the top-level feature L 4 of the deep network can be obtained. Inception-v4 network Four levels of feature maps.

在得到不同层次的特征后,在模型方法中如何整合这些不同层次的特征是及其重要的。通过观察发现,在图像情感识别中,相对于图像的风格特征,图像的语义特征在情感的识别中起到更大的作用。所以,本实施例在特征融合的时候,给予语义特征更高的关注度。After obtaining different levels of features, how to integrate these different levels of features in the model method is extremely important. Through observation, it is found that in image emotion recognition, the semantic features of images play a greater role in emotion recognition than the style features of images. Therefore, in this embodiment, when the feature is fused, a higher degree of attention is given to the semantic feature.

本实施例的特征融合具体为:The feature fusion of this embodiment is specifically:

步骤1:先对特征L1,L2,L3进行concat操作,然后输入到全连接层中,得到特征L。Step 1: First perform the concat operation on the features L 1 , L 2 , and L 3 , and then input them into the fully connected layer to obtain the feature L.

步骤2:对特征L和语义特征L4进行concat操作,得到最终的分类特征F。Step 2: Concat the feature L and the semantic feature L 4 to obtain the final classification feature F.

本实施例充分考虑不同类型特征对情感唤醒的影响,从而制定更具有表达力的特征,以进一步解决情感特征和情感表达之间的鸿沟问题,从而提高情感分类的准确性。This embodiment fully considers the influence of different types of features on emotional arousal, so as to formulate more expressive features to further solve the problem of the gap between emotional features and emotional expression, thereby improving the accuracy of emotion classification.

本实施例中,在Inception-v4网络的基础上引入多分支结构,以分别在网络的低层次和高层次获取到图像的风格特征和语义特征,然后对多层次的特征进行融合,通过Softmax方式得到图像情感分类模型的情感类型预测分布。In this embodiment, a multi-branch structure is introduced on the basis of the Inception-v4 network, so as to obtain the style and semantic features of the image at the low-level and high-level of the network respectively, and then fuse the multi-level features through the Softmax method. Get the sentiment type prediction distribution of the image sentiment classification model.

在某个实施例中,Inception-v4网络中最后一个全连接层通过Softmax方式得到情感图像属于第i类情感类别的概率:In a certain embodiment, the last fully connected layer in the Inception-v4 network obtains the probability that the emotional image belongs to the i-th emotional category by Softmax:

Figure BDA0002581926030000171
Figure BDA0002581926030000171

式中,yi表示情感图像属于第i类情感类别的概率;zi表示情感图像属于第i类的激活值;zj表示情感图像属于第j类的激活值;C表示情感类别。In the formula, yi represents the probability that the emotional image belongs to the i-th emotional category; z i represents the activation value of the emotional image belonging to the i-th category; z j represents the activation value of the emotional image belonging to the j-th category; C represents the emotional category.

在另一个实施例中,在模型训练模块中,因为情感数据集的数据量相对较小,所以先采用迁移学习的策略,先将Inception-v4网络在ImageNet数据集上进行了预训练。之后再利用所述调制特征图对所述Inception-v4网络进行训练。In another embodiment, in the model training module, because the data volume of the emotion data set is relatively small, the strategy of transfer learning is adopted first, and the Inception-v4 network is pre-trained on the ImageNet data set. Then, the Inception-v4 network is trained using the modulation feature map.

在下一个实施例中,在模型训练模块中,所述图像情感分类模型采用基于情感多样性约束的多任务损失函数,所述多任务损失函数为:In the next embodiment, in the model training module, the image emotion classification model adopts a multi-task loss function based on emotion diversity constraints, and the multi-task loss function is:

Lmulti=Lcls+λLed (4)L multi =L cls +λL ed (4)

Figure BDA0002581926030000172
Figure BDA0002581926030000172

Figure BDA0002581926030000173
Figure BDA0002581926030000173

Figure BDA0002581926030000174
Figure BDA0002581926030000174

Figure BDA0002581926030000175
Figure BDA0002581926030000175

式中,Lmulti表示多任务损失函数;Lcls表示传统的分类损失;Led表示其他伴随情绪的损失;λ表示权重;qi表示图像情感标签标定图像属于第i类情感的概率;pi表示模型预测情感图像属于第i类情感类别的概率;f(i)表示其他伴随情绪的概率;j表示主导情绪;i表示其他伴随情绪;

Figure BDA0002581926030000181
表示主导情绪的概率;pj表示主导情绪j的类别概率;disij表示Mikels’wheel中定义的情绪i和主导情绪j之间的距离;i,j∈B表示情感类别i和j在同一个极性中。In the formula, L multi represents the multi-task loss function; L cls represents the traditional classification loss; L ed represents the loss of other accompanying emotions ; λ represents the weight; represents the probability that the model predicts that the emotional image belongs to the i-th emotional category; f(i) represents the probability of other accompanying emotions; j represents the dominant emotion; i represents other accompanying emotions;
Figure BDA0002581926030000181
represents the probability of the dominant emotion; p j represents the category probability of the dominant emotion j; dis ij represents the distance between the emotion i and the dominant emotion j defined in Mikels'wheel; i,j∈B represents the emotion category i and j in the same in polarity.

在大多数图像情感数据的收集中,多数投票的策略被广泛的采用,以获取到图像的情感标签。而从情感表达的多样性出发,考虑基于标签概率的形式估计情感的分布。受情感理论研究的启发,两个情感的关系决定着两者的相似程度。而两个情感从相似到完全相反可以通过Mikels’Wheel(见图4,图4中(a)为情感轮,(b)为情感距离)表达,通过Mikels’Wheel定义的距离,可以计算出两个情感间的距离,而距离表示着这两个情感的相似度,即情感i和情感j之间的距离dij越小,表示这两个情感越相似。所以,通过Mikels’Wheel的定义,可以得到情感图像的概率标签。In the collection of most image sentiment data, the majority voting strategy is widely adopted to obtain the sentiment labels of images. Starting from the diversity of emotional expressions, we consider the distribution of emotions based on the probability of labels. Inspired by research in emotion theory, the relationship between two emotions determines how similar they are. And two emotions from similar to completely opposite can be expressed through Mikels'Wheel (see Figure 4, Figure 4 (a) is the emotional wheel, (b) is the emotional distance), through the distance defined by Mikels'Wheel, can calculate the two The distance between the emotions, and the distance represents the similarity of the two emotions, that is, the smaller the distance d ij between the emotion i and the emotion j, the more similar the two emotions are. Therefore, through the definition of Mikels'Wheel, the probability label of emotional image can be obtained.

为了得到更合理的概率标签,本发明进一步结合情感理论的研究,情感图像的情感分为消极(anger、disgust、fear、sadness)N和积极(amusement、awe、contentment、excitement)P两个极性,通过情感理论研究可知,情感的多样性表达,实际上更多是一个主导情感伴随着多个相同极性的其他情感,所以在生成标签概率的时候,引入情感极性的关系。In order to obtain a more reasonable probability label, the present invention further combines the research of emotion theory. The emotion of emotional images is divided into two polarities: negative (anger, disgust, fear, sadness) N and positive (amusement, awe, contentment, excitement) P , through the study of emotion theory, the diversity expression of emotion is actually more of a dominant emotion accompanied by multiple other emotions of the same polarity, so when generating label probability, the relationship of emotion polarity is introduced.

Given the dominant emotion and the distance definition of Mikels' Wheel, the probability distribution over the other emotions of the same polarity as the dominant emotion is computed, while the probability of each opposite-polarity emotion is set to 0. The calculation is expressed by formulas (7) and (8).
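As an illustration of how such a label might be built (formulas (7) and (8) themselves survive only as images), the sketch below assigns each same-polarity emotion a weight inversely proportional to its Mikels'-wheel distance from the dominant emotion, sets opposite-polarity emotions to 0, and normalizes. The inverse-distance weighting, the treatment of the dominant emotion's own mass, and the example distances are editorial assumptions:

```python
NEGATIVE = ["anger", "disgust", "fear", "sadness"]
POSITIVE = ["amusement", "awe", "contentment", "excitement"]

def probability_label(dominant, distances):
    """Hedged sketch of a Mikels'-wheel probability label.

    dominant  : name of the dominant emotion j
    distances : dict mapping each same-polarity emotion i to dis_ij;
                the dominant emotion's own distance is clamped to 1
                here (an assumption) so it receives the largest mass.
    """
    polarity = NEGATIVE if dominant in NEGATIVE else POSITIVE
    weights = {
        e: (1.0 / max(distances.get(e, 1.0), 1.0)) if e in polarity else 0.0
        for e in NEGATIVE + POSITIVE
    }
    total = sum(weights.values())
    return {e: w / total for e, w in weights.items()}

# Example with assumed distances: "fear" receives the largest mass, the
# other negative emotions share the rest, and positive emotions get 0.
label = probability_label("fear", {"fear": 0, "anger": 3, "disgust": 3, "sadness": 2})
```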

Using the category labels and probability labels of the emotional images, the multi-task loss function of formula (4) is introduced.

Distributed labels for the emotional images are generated according to Mikels' Wheel; the emotion distribution loss between the predicted distribution and the distributed labels is computed; and, by introducing the weight λ, this loss is combined with the classification loss to form a new loss function that constrains the diversity of image expression.

In this embodiment, stochastic gradient descent is used to optimize the above multi-task loss function L_multi, and {a_i | i = 1, 2, ..., C} is defined, where a_i denotes the activation value of the i-th emotion category in the last fully connected layer.

The gradient can be calculated by equation (9):

[Equation (9), given only as an image in the source: the gradient of L_multi with respect to the activations a_i.]
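For reference, under the cross-entropy forms of L_cls and L_ed assumed in the sketch above, with p_i obtained from the activations a_i by Softmax, the gradient takes the standard form below. This is an editorial derivation consistent with those assumptions, not necessarily the patent's verbatim equation (9):

```latex
% Assuming L_cls = -\sum_i q_i \log p_i, \; L_ed = -\sum_i f(i)\log p_i,
% and p_i = e^{a_i} / \sum_{j=1}^{C} e^{a_j}:
\frac{\partial L_{multi}}{\partial a_i}
  = \frac{\partial L_{cls}}{\partial a_i} + \lambda\,\frac{\partial L_{ed}}{\partial a_i}
  = \bigl(p_i - q_i\bigr) + \lambda\bigl(p_i - f(i)\bigr)
```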

The expression of image emotion is subjective and diverse, yet most emotion datasets are collected with a single emotion label produced by voting. In reality, different regions of an image may express different emotions, making it difficult to assign a single emotion type to an image; the emotional expression of an image is often a dominant emotion accompanied by one or more other emotions of the same polarity. Using a single emotion label therefore makes the labeling of image emotion datasets inaccurate, which substantially degrades the accuracy of image emotion classification. This embodiment accordingly introduces the polarity relation among emotions and, through the multi-task loss function, jointly considers the dominant emotion and the other emotions of the same polarity, improving the accuracy of image emotion classification.

In this embodiment, starting from the diversity of image emotion expression, a new loss function that incorporates an emotion distribution loss is proposed to constrain the diversity of image emotion expression.
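Putting the pieces together, a minimal sketch of one stochastic-gradient-descent step over this loss follows; the linear stand-in model, the random stand-in data, and the multi_task_loss sketch from above are all illustrative placeholders for the full saliency-modulated Inception-v4 pipeline:

```python
import torch

C = 8                                          # eight emotion categories
model = torch.nn.Linear(2048, C)               # stand-in for the full classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

feats = torch.randn(4, 2048)                   # stand-in fused image features
q = torch.eye(C)[torch.randint(C, (4,))]       # one-hot voted labels q_i
f_dist = torch.softmax(torch.randn(4, C), 1)   # stand-in wheel labels f(i)

optimizer.zero_grad()
loss = multi_task_loss(model(feats), q, f_dist, lam=0.5)  # sketch above
loss.backward()                                # gradients as in the derivation
optimizer.step()
```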

The present invention further provides a computer device comprising a memory and a processor, the memory storing a computer program; when the processor executes the computer program, the steps of the above method are implemented.

The present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above method are implemented.

The above descriptions are only preferred embodiments of the present invention and do not limit the patent scope of the present invention. Any equivalent structural transformation made using the contents of the description and drawings of the present invention under its inventive concept, and any direct or indirect application in other related technical fields, falls within the patent protection scope of the present invention.

Claims (10)

1. An image emotion classification method based on saliency detection and multi-level feature fusion, characterized in that the image emotion classification method comprises:
constructing an emotional image set, the emotional image set comprising labeled emotional images;
establishing a training set and a verification set from the emotional image set, and extracting saliency maps of the emotional images in the training set and the verification set using a saliency detection network;
inputting the emotional images and saliency maps in the training set into a pre-built image emotion classification model, the image emotion classification model comprising a Siamese neural network and an Inception-v4 network;
training the Siamese neural network with the emotional images and saliency maps in the training set, and using the trained Siamese neural network to perform feature modulation on each emotional image with its saliency map in the training set, obtaining modulated feature maps;
training the Inception-v4 network with the modulated feature maps to obtain a trained image emotion classification model;
verifying the trained image emotion classification model with the emotional images and saliency maps in the verification set;
inputting an image to be classified and its saliency map into the verified image emotion classification model for classification, obtaining the image emotion category.

2. The image emotion classification method according to claim 1, characterized in that the Siamese neural network comprises a first branch and a second branch, the first branch performing feature extraction on the emotional image and the second branch performing feature extraction on the saliency map; the first branch and the second branch each consist of four convolutional layers with the same convolution kernel size; and the outputs of the second and fourth convolutional layers of the second branch are connected to the outputs of the second and fourth convolutional layers of the first branch.

3. The image emotion classification method according to claim 2, characterized in that performing feature modulation on the corresponding emotional image with the saliency map in the training set through the trained Siamese neural network to obtain the modulated feature map comprises:
inputting the emotional images in the training set into the first branch of the trained Siamese neural network and the saliency maps into the second branch;
obtaining the feature map S output by the second convolutional layer of the second branch, performing element-wise multiplication of the feature map S with the feature map T output by the second convolutional layer of the first branch to obtain a feature map H, and performing element-wise addition of the feature map H and the feature map T to obtain a feature map G;
inputting the feature map G into the third convolutional layer of the first branch and the feature map S into the third convolutional layer of the second branch;
obtaining the feature map S′ output by the fourth convolutional layer of the second branch, performing element-wise multiplication of the feature map S′ with the feature map T′ output by the fourth convolutional layer of the first branch to obtain a feature map H′, and performing element-wise addition of the feature map H′ and the feature map T′ to obtain the modulated feature map F.

4. The image emotion classification method according to claim 3, characterized in that the feature map G is calculated as:
G = f(T(w,h,c) * [S(w,h) + 1])    (1)
where f denotes the Sigmoid activation function; T(w,h,c) denotes the feature map output by the second convolutional layer of the first branch; S(w,h) denotes the feature map output by the second convolutional layer of the second branch; and w, h, c denote the width, height and number of channels of the feature map, respectively;
and the modulated feature map F is calculated as:
F = f(T′(w,h,c) * [S′(w,h) + 1])    (2)
where T′(w,h,c) denotes the feature map output by the fourth convolutional layer of the first branch and S′(w,h) denotes the feature map output by the fourth convolutional layer of the second branch.

5. The image emotion classification method according to claim 1, characterized in that the Inception-v4 network is a multi-branch structure comprising, in sequence, three convolutional layers, one Mixed_3a module, one Mixed_4a module, one Mixed_5a module, four Inception_A modules, one Reduction_A module, seven Inception_B modules, one Reduction_B module, three Inception_C modules, one average pooling layer and one fully connected layer; a side-branch structure, consisting of one convolutional layer, is introduced after each of the Mixed_5a, Reduction_A and Reduction_B modules; and the outputs of the three side-branch structures are connected to a fully connected layer which fuses the side-branch features and outputs the fused features to the output of the average pooling layer.

6. The image emotion classification method according to claim 5, characterized in that the last fully connected layer of the Inception-v4 network obtains the probability that an emotional image belongs to the i-th emotion category by way of Softmax:
y_i = e^{z_i} / Σ_{j=1}^{C} e^{z_j}    (3)
where y_i denotes the probability that the emotional image belongs to the i-th emotion category; z_i denotes the activation value of the emotional image for the i-th category; z_j denotes the activation value for the j-th category; and C denotes the number of emotion categories.

7. The image emotion classification method according to claim 1, characterized in that the image emotion classification model adopts a multi-task loss function based on emotion diversity constraints, the multi-task loss function being:
L_multi = L_cls + λL_ed    (4)
[Equations (5)–(8), given only as images in the source: the classification loss L_cls, the emotion distribution loss L_ed, the dominant-emotion probability f(j), and the same-polarity accompanying-emotion probability f(i).]
where L_multi denotes the multi-task loss function; L_cls denotes the conventional classification loss; L_ed denotes the loss over the other accompanying emotions; λ denotes a weight; q_i denotes the probability, given by the image's emotion label, that the image belongs to the i-th emotion category; p_i denotes the model's predicted probability that the emotional image belongs to the i-th emotion category; j denotes the dominant emotion; i denotes the other accompanying emotions; f(j) denotes the probability of the dominant emotion; p_j denotes the class probability of the dominant emotion j; f(i) denotes the probability of the other accompanying emotions; dis_ij denotes the distance between emotion i and the dominant emotion j as defined on Mikels' wheel; and i, j ∈ B indicates that emotion categories i and j lie in the same polarity.

8. An image emotion classification device based on saliency detection and multi-level feature fusion, characterized by comprising:
an image set construction module for constructing an emotional image set, the emotional image set comprising labeled emotional images;
a saliency map acquisition module for establishing a training set and a verification set from the emotional image set, and extracting saliency maps of the emotional images in the training set and the verification set using a saliency detection network;
a model training module for inputting the emotional images and saliency maps in the training set into a pre-built image emotion classification model, the image emotion classification model comprising a Siamese neural network and an Inception-v4 network; training the Siamese neural network with the emotional images and saliency maps in the training set, and using the trained Siamese neural network to perform feature modulation on the corresponding emotional images with the saliency maps in the training set to obtain modulated feature maps; and training the Inception-v4 network with the modulated feature maps to obtain a trained image emotion classification model;
a model verification module for verifying the trained image emotion classification model with the emotional images and saliency maps in the verification set;
a classification module for inputting an image to be classified and its saliency map into the verified image emotion classification model for classification, obtaining the image emotion category.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202010670001.2A 2020-07-13 2020-07-13 Image emotion classification method and device based on saliency detection and multi-level feature fusion Active CN111797936B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010670001.2A CN111797936B (en) 2020-07-13 2020-07-13 Image emotion classification method and device based on saliency detection and multi-level feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010670001.2A CN111797936B (en) 2020-07-13 2020-07-13 Image emotion classification method and device based on saliency detection and multi-level feature fusion

Publications (2)

Publication Number Publication Date
CN111797936A true CN111797936A (en) 2020-10-20
CN111797936B CN111797936B (en) 2023-08-08

Family

ID=72808462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010670001.2A Active CN111797936B (en) 2020-07-13 2020-07-13 Image emotion classification method and device based on saliency detection and multi-level feature fusion

Country Status (1)

Country Link
CN (1) CN111797936B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368798A (en) * 2017-07-07 2017-11-21 四川大学 A kind of crowd's Emotion identification method based on deep learning
CN109165551A (en) * 2018-07-13 2019-01-08 广东工业大学 A kind of expression recognition method of adaptive weighted fusion conspicuousness structure tensor and LBP feature
US20200035259A1 (en) * 2018-07-27 2020-01-30 Microsoft Technology Licensing, Llc Systems, methods, and computer-readable media for improved audio feature discovery using a neural network
CN110796150A (en) * 2019-10-29 2020-02-14 中山大学 An Image Emotion Recognition Method Based on Emotional Saliency Region Detection
CN111026898A (en) * 2019-12-10 2020-04-17 云南大学 Weak supervision image emotion classification and positioning method based on cross space pooling strategy

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
SUSHAMA TELRANDHE et al.: "Automatic Fetal Facial Expression Recognition by Hybridizing Saliency Maps with Recurrent Neural Network", 2019 IEEE Bombay Section Signature Conference, pages 1-6 *
ZELIN DENG et al.: "A Saliency Detection and Gram Matrix Transform-Based Convolutional Neural Network for Image Emotion Classification", Security and Communication Networks, pages 1-12 *
ZHENYUE QIN et al.: "Visual Saliency Maps Can Apply to Facial Expression Recognition", arXiv, pages 1-8 *
卿粼波; 熊文诗; 周文俊; 熊珊珊; 吴晓红: "Group emotion recognition based on multi-stream CNN-LSTM networks" (基于多流CNN-LSTM网络的群体情绪识别), Application Research of Computers (《计算机应用研究》), vol. 35, no. 12, pages 3828-3831 *
王金华: "Research on deep learning speech emotion recognition algorithms based on IAM" (基于IAM的深度学习语音情感识别算法研究), China Masters' Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》), pages 136-317 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549848A (en) * 2020-11-25 2022-05-27 顺丰科技有限公司 Image recognition method and device, computer equipment and storage medium
CN114549849A (en) * 2020-11-25 2022-05-27 顺丰科技有限公司 Image recognition method and device, computer equipment and storage medium
CN112861978A (en) * 2021-02-20 2021-05-28 齐齐哈尔大学 Multi-branch feature fusion remote sensing scene image classification method based on attention mechanism
CN112861978B (en) * 2021-02-20 2022-09-02 齐齐哈尔大学 Multi-branch feature fusion remote sensing scene image classification method based on attention mechanism
CN113017630A (en) * 2021-03-02 2021-06-25 贵阳像树岭科技有限公司 Visual perception emotion recognition method
CN114937182A (en) * 2022-04-18 2022-08-23 江西师范大学 Image emotion distribution prediction method based on emotion wheel and convolutional neural network
CN114937182B (en) * 2022-04-18 2024-04-09 江西师范大学 Image emotion distribution prediction method based on emotion wheel and convolutional neural network

Also Published As

Publication number Publication date
CN111797936B (en) 2023-08-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant