CN110287320A - A deep learning multi-classification sentiment analysis model combining an attention mechanism - Google Patents
A deep learning multi-classification sentiment analysis model combining an attention mechanism
- Publication number
- CN110287320A CN110287320A CN201910553755.7A CN201910553755A CN110287320A CN 110287320 A CN110287320 A CN 110287320A CN 201910553755 A CN201910553755 A CN 201910553755A CN 110287320 A CN110287320 A CN 110287320A
- Authority
- CN
- China
- Prior art keywords
- word
- cnn
- text
- model
- feature extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The present invention relates to a deep learning multi-classification sentiment analysis model combining an attention mechanism, and belongs to the field of natural language processing. The invention analyzes the weaknesses of existing CNN and LSTM networks in text sentiment analysis and proposes a deep learning multi-classification sentiment analysis model that combines an attention mechanism. The model uses an attention mechanism to fuse the local features extracted by the CNN network with the word-order features extracted by the LSTM model, and applies the idea of an ensemble model at the classification layer: the sentiment features extracted by the CNN network and the LSTM network are concatenated to form the final sentiment features extracted by the model. Comparative experiments show that the accuracy of the model is significantly improved.
Description
Technical Field
The invention belongs to the field of text information processing, and relates to a deep learning multi-classification emotion analysis model combining an attention mechanism.
Background
With the continuous rise of social networks such as microblogs and Twitter, the internet is not only a source of daily information but also an indispensable platform for people to express their opinions. As people comment on hot events, review films, and describe product experiences in online communities, a large amount of text with emotional color (joy, anger, sadness, and so on) is generated. Effective sentiment analysis of this text gives a better picture of users' interests and concerns. However, as attention to network information grows, online communities generate massive amounts of emotionally colored text every day, and manual labeling alone is far from sufficient for the task; text sentiment analysis has therefore become a research hotspot in the field of natural language processing.
With the successful application of deep learning in computer vision, more and more deep learning techniques are also being applied to natural language processing. The advantages of deep learning are that it can automatically extract features from text and has strong expressive power on big data. At present, the mainstream deep-learning methods for text sentiment analysis are the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN); sentiment analysis models based on these two methods suffer from low accuracy, mainly for the following reasons:
Firstly, in text sentiment analysis a convolutional neural network effectively captures sentiment information at different positions by enlarging the convolution kernel, thereby obtaining local sentiment features of the text. In the process of convolution, however, the context between word sequences is ignored. In text sentiment analysis the precedence relationship of the word order is very important, and without this word-order feature information the results show a certain deviation.
Secondly, a recurrent neural network effectively models the sequential nature of text through its forward and backward dependencies and can extract the word-order relationships and semantic information of the text, so it can achieve good results in text sentiment analysis. However, when the samples are long or the language scene is complex, the useful span of sentiment information varies greatly in position and length, and the performance of Long Short-Term Memory (LSTM) networks is also limited.
The invention makes full use of the attention mechanism, the CNN network, and the LSTM network, and proposes and implements a deep learning multi-classification emotion analysis model combining an attention mechanism. The model can effectively improve the accuracy of text emotion analysis.
Disclosure of Invention
The invention provides a deep learning multi-classification emotion analysis model based on an attention mechanism. The model combines a CNN network and an LSTM network to fuse emotional features. First, the local features of the text to be analyzed are extracted with the multi-scale convolution kernels of a CNN network; the local features extracted by the CNN are then fused into an LSTM network through an attention mechanism. Finally, following the idea of an ensemble model, the pooling-layer result of the CNN network and the feature extraction result of the LSTM network are concatenated as the final model output. Experiments show that the accuracy of the model in text emotion analysis is remarkably improved.
In order to achieve the purpose, the invention adopts the following technical scheme:
1. A deep learning multi-classification emotion analysis method combining an attention mechanism, characterized by comprising the following steps:
step (1) data preprocessing
Let the emotion data set be expressed as G = [(segtxt_1, y_1), (segtxt_2, y_2), ..., (segtxt_N, y_N)], where segtxt_i denotes the i-th sample, y_i its corresponding emotion category label, and N the number of samples in data set G. The samples in G are subjected to data preprocessing.
The preprocessed data set is denoted G' = [(seg_1, y_1), (seg_2, y_2), ..., (seg_M, y_M)], where seg_i is the i-th sample in data set G', y_i its corresponding emotion category label, and M the number of samples in data set G';
step (2) input of the constructed model
For any sample (seg, y) to be analyzed in data set G', it is further detailed as:
seg = [w_1, w_2, w_3, ..., w_d]^T (1)
y = [0, 0, 1, ..., 0] (2)
where w_i ∈ R^ε is the one-hot encoding of the i-th word of the text to be analyzed according to the vocabulary wordList, ε is the size of the vocabulary wordList, and d is the sentence length of the text. y ∈ R^p is the one-hot encoding of the emotion category, and p is the number of classes the model is to distinguish. The word-vector embedding matrix of the sample can then be represented as:
X = seg * E^T (3)
where X ∈ R^(d×m), X = [x_1, x_2, ..., x_d]^T is the word-vector matrix representation of the text to be analyzed, m is the dimension of the word vectors, x_i ∈ R^m is the word vector of the i-th word in the text, and E denotes the word-vector embedding layer;
step (3) constructing a deep learning multi-classification emotion analysis model
The deep learning multi-classification emotion analysis model comprises a CNN-based local feature extraction stage and an LSTM-based word-order feature extraction stage. The pooling-layer result C_Cnn of the CNN-based local feature extraction stage and the result C'_Rnn of the LSTM-based word-order feature extraction stage are concatenated, i.e. the vector [C_Cnn; C'_Rnn] is taken as the feature vector finally extracted by the model. The feature vector [C_Cnn; C'_Rnn] is then passed through a fully connected layer to obtain the final model output vector ŷ ∈ R^p, where p is the number of classes the model is to distinguish.
The local feature extraction stage based on the CNN network comprises the following contents:
inputting a word vector matrix representation X of the text to be analyzed of a formula 3 in a local feature extraction stage;
the local feature extraction stage is based on a CNN network and comprises two layers in total, namely a convolutional layer and a pooling layer, wherein:
The convolution layer convolves the text to be analyzed with n convolution kernels of different scales, with k filters (i.e. k neurons) per scale;
In the pooling layer, the vectors obtained by convolution are down-sampled with max pooling, selecting the locally optimal features; each filter is thus reduced by the max-pooling layer to a single scalar, which represents the optimal emotional feature found by that filter;
The output of the local feature extraction module is C_Cnn = [c_1, c_2, ..., c_nk]: the optimal features selected in the pooling layer by the filters of different sizes are concatenated into C_Cnn, which is taken as the output of this module, where C_Cnn ∈ R^(nk) and nk is the total number of filters in the convolution layer;
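A minimal sketch of this stage is given below (PyTorch), assuming the scale set [2, 3, 4, 5] and k = 128 filters per scale stated in the embodiment; the ReLU activation matches formula (8) later in the text.

```python
import torch
import torch.nn as nn

class LocalFeatureCNN(nn.Module):
    """CNN-based local feature extraction: n convolution scales,
    k filters per scale, followed by global max pooling."""
    def __init__(self, m=256, scales=(2, 3, 4, 5), k=128):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(in_channels=m, out_channels=k, kernel_size=s)
            for s in scales)

    def forward(self, X):                        # X: (batch, d, m)
        X = X.transpose(1, 2)                    # Conv1d expects (batch, m, d)
        pooled = []
        for conv in self.convs:
            z = torch.relu(conv(X))              # (batch, k, d-s+1)
            pooled.append(z.max(dim=2).values)   # one scalar per filter
        return torch.cat(pooled, dim=1)          # C_Cnn: (batch, n*k)

C_cnn = LocalFeatureCNN()(torch.randn(8, 64, 256))
print(C_cnn.shape)                               # torch.Size([8, 512])
```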
The LSTM-based word-order feature extraction stage comprises the following contents:
Multi-scale CNN local feature extraction: the convolution results of the k filters of each convolution scale in the CNN-based local feature extraction stage are concatenated to obtain the set Z_Cnn; each vector Z_i in Z_Cnn is then fed into a GLU mechanism (a gated convolutional network), and the results are denoted {π_1, π_2, ..., π_n}, completing the multi-scale CNN local feature extraction,
where Z_Cnn = {Z_1, Z_2, ..., Z_n} and Z_i is the concatenation of the convolution results of the k filters of scale i;
π_i = (Z_i · W_1 + b_1) ⊗ σ(Z_i · W_2 + b_2)
where Z_i denotes the concatenation of the convolution results of the k filters of one scale, W_1, W_2 ∈ R^(λ×q) are weight matrices, λ is the dimension of the corresponding weight matrix, b_1, b_2 ∈ R^q are offsets, σ denotes the sigmoid function, π_i ∈ R^q, and q is the output dimension of the LSTM network;
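A sketch of the GLU step might look as follows (PyTorch); the input dimension λ is illustrative here, since it depends on how the k convolution results are concatenated into Z_i:

```python
import torch
import torch.nn as nn

class GLU(nn.Module):
    """Gated linear unit: pi_i = (Z_i·W1 + b1) ⊗ σ(Z_i·W2 + b2)."""
    def __init__(self, lam, q):
        super().__init__()
        self.linear = nn.Linear(lam, q)   # W1, b1
        self.gate = nn.Linear(lam, q)     # W2, b2

    def forward(self, Z_i):
        return self.linear(Z_i) * torch.sigmoid(self.gate(Z_i))

glu = GLU(lam=1024, q=256)                # lam is a placeholder dimension
pi_i = glu(torch.randn(8, 1024))          # pi_i ∈ R^q per sample
```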
The local feature extraction results {π_1, π_2, ..., π_n} of the multi-scale CNN network are then integrated into the LSTM network through an attention mechanism, yielding the output C'_Rnn of the LSTM-based word-order feature extraction stage, i.e.
C'_Rnn = [h_d^fw ; h_1^bw]
where h_d^fw is the output of the LSTM module corresponding to the last word of the text to be analyzed in the forward direction, and h_1^bw is the output of the LSTM module corresponding to the first word in the backward direction; the invention adopts a bidirectional LSTM, i.e. a BiLSTM model.
Forward propagation is computed as follows:
d is the length of the text to be analyzed, and each word of the text corresponds in turn to one LSTM module.
In forward propagation, let the output of the (t-1)-th LSTM module be h_(t-1). The output h_t of the t-th LSTM module is then computed as:
e_(t,i) = h_(t-1) · π_i
where e_(t,i) is the dot product of the two vectors, also called the scoring function, used to measure the similarity between the output h_(t-1) of the LSTM for the previous word and the current local feature vector;
α_(t,i) = exp(e_(t,i)) / Σ_j exp(e_(t,j))
where α_(t,i) ∈ R is the weight of feature π_i;
s_(t-1) = Σ_i α_(t,i) · π_i
where s_(t-1) ∈ R^q is the weighted combination of the convolution features; s_(t-1) is used in place of h_(t-1) and, combined with the word vector x_t of the current word, is used to compute the output h_t of the current LSTM module:
h_t = LSTM(s_(t-1), x_t)
Backward propagation follows the same calculation as forward propagation and is not repeated here;
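The forward recurrence above can be sketched as follows (PyTorch). This is one reading of the patent's wiring, in which the attention context s_(t-1) replaces h_(t-1) as the recurrent state fed to the LSTM cell together with the current word vector x_t; it is an illustrative sketch, not the authoritative implementation.

```python
import torch
import torch.nn as nn

def attentive_lstm_step(cell, h_prev, c_prev, x_t, pi):
    """One forward step: score h_{t-1} against the local features pi,
    softmax into weights alpha, form the context s_{t-1}, and compute
    h_t = LSTM(s_{t-1}, x_t)."""
    e = pi @ h_prev.squeeze(0)                          # scores e_{t,i}, shape (n,)
    alpha = torch.softmax(e, dim=0)                     # weights alpha_{t,i}
    s = (alpha.unsqueeze(1) * pi).sum(0, keepdim=True)  # s_{t-1}: (1, q)
    # s_{t-1} is used in place of h_{t-1} as the recurrent state.
    return cell(x_t.unsqueeze(0), (s, c_prev))

q, m, n = 256, 256, 4                              # embodiment dimensions
cell = nn.LSTMCell(input_size=m, hidden_size=q)
h, c = torch.zeros(1, q), torch.zeros(1, q)
pi = torch.randn(n, q)                             # local features pi_1..pi_n
for x_t in torch.randn(64, m):                     # one pass over d = 64 words
    h, c = attentive_lstm_step(cell, h, c, x_t, pi)
```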
Step (4), model training: the training data are input into the multi-classification emotion analysis model, the parameters are adjusted with a cross-entropy loss function combined with the back-propagation (BP) algorithm, and softmax regression is used as the classification algorithm to complete the training;
Step (5), model analysis: the text to be analyzed is input into the trained model, which finally outputs the emotion classification result of the analyzed text.
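Steps (4) and (5) can be sketched as follows (PyTorch). The patent specifies cross-entropy loss, back-propagation, and softmax classification; the optimiser choice and the `loader` / `model` objects here are assumptions for illustration.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-3):
    """Step (4): cross-entropy loss minimised by back-propagation.
    The Adam optimiser is an assumption; the patent only names BP."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    # CrossEntropyLoss applies the softmax internally, so the fully
    # connected layer outputs raw scores over the p classes.
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for X, y in loader:               # word indices, integer labels
            opt.zero_grad()
            loss = loss_fn(model(X), y)
            loss.backward()               # back-propagation (BP)
            opt.step()

def analyse(model, X):
    """Step (5): emotion category of a preprocessed text."""
    with torch.no_grad():
        return model(X).argmax(dim=1)     # predicted class index
```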
The preprocessing process comprises the following steps:
1) Word segmentation, stop-word removal, conversion of English uppercase to lowercase, and conversion of traditional Chinese characters to simplified Chinese.
2) Select the words whose frequency in data set G is greater than or equal to σ, and construct the vocabulary wordList = {word_1, word_2, ..., word_ε}, where word_i is the i-th word in the vocabulary wordList and ε is the total number of words in data set G whose frequency reaches σ.
3) For each sample in data set G, if its length is greater than d the sample is deleted, and if its length is less than d it is padded with the symbol </>.
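A sketch of these three steps in Python follows; `tokenize`, `STOPWORDS` and `to_simplified` are placeholders for a word segmenter, a stop-word list and a traditional-to-simplified converter, none of which the patent names.

```python
from collections import Counter

def preprocess(samples, sigma=2, d=64, pad="</>"):
    """samples: iterable of (text, label). Returns the vocabulary
    wordList and the cleaned, fixed-length samples."""
    tokenized = []
    for text, label in samples:
        # step 1): segment, drop stop words, lowercase, simplify
        words = [w.lower() for w in tokenize(to_simplified(text))
                 if w not in STOPWORDS]
        tokenized.append((words, label))
    # step 2): vocabulary of words with frequency >= sigma
    freq = Counter(w for words, _ in tokenized for w in words)
    word_list = [w for w, n in freq.items() if n >= sigma]
    # step 3): samples longer than d are deleted; shorter ones padded
    cleaned = [(words + [pad] * (d - len(words)), label)
               for words, label in tokenized if len(words) <= d]
    return word_list, cleaned
```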
The convolution-layer calculation formula of the CNN-based local feature extraction module is:
z = f(Σ W^T * x_(i:i+s-1) + b) (8)
where z is the feature vector obtained by convolving one neuron over the text to be analyzed, f(·) is the activation function, W ∈ R^(s×m) is the weight matrix of the neuron (parameters are shared within the same neuron), s×m is the size of the convolution kernel, b is the bias, and x_(i:i+s-1) denotes the word vectors of the i-th to (i+s-1)-th words in the text sentence.
The training data is preprocessed data.
The convolution layer of the CNN-based local feature extraction stage adopts 4 convolution kernels of different scales. Training ends when the accuracy no longer changes or the set number of iterations is reached.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a structural diagram of a deep learning multi-classification emotion analysis model combined with an attention mechanism.
Detailed Description
The following describes the embodiments of the present invention in further detail with reference to the drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
The method provided by the invention is realized by the following steps in sequence:
step (1) data preprocessing
The emotion corpus data set is expressed as G = [(segtxt_1, y_1), (segtxt_2, y_2), ..., (segtxt_N, y_N)], where segtxt_i denotes the i-th sample and y_i its corresponding emotion category label. N is the number of samples in data set G; the emotion labels fall into four categories (joy, anger, disgust, and depression), and N = 80000, with 20000 samples per category. Data preprocessing of the samples in G comprises the following steps:
1) Word segmentation, stop-word removal, conversion of English uppercase to lowercase, and conversion of traditional Chinese characters to simplified Chinese.
2) Select the words whose frequency in data set G is greater than or equal to σ, and construct the vocabulary wordList = {word_1, word_2, ..., word_ε}, where word_i is the i-th word in the vocabulary and ε is the total number of words in data set G whose frequency reaches σ. With σ = 2, the final data set G contains 41763 words of frequency ≥ 2, i.e. ε = 41763.
3) After the above processing, for each sample in data set G, if its length is greater than d the sample is deleted, and if its length is less than d it is padded with the symbol </>; d is set to 64.
The preprocessed data set is denoted G' = [(seg_1, y_1), (seg_2, y_2), ..., (seg_M, y_M)], where seg_i is the i-th sample in data set G' and y_i its corresponding emotion category label; M is the number of samples in data set G'. The final data set G' contains 73150 samples; the number of samples per emotion category is shown in Table 1:
TABLE 1. Number of samples of each class after preprocessing (table content not reproduced in this text)
Step (2) constructing the model input
For any sample (seg, y) to be analyzed in data set G', it is further detailed as:
seg = [w_1, w_2, w_3, ..., w_d]^T (1)
y = [0, 0, 1, ..., 0] (2)
where w_i ∈ R^ε is the one-hot encoding of the i-th word of the text to be analyzed according to the vocabulary wordList, ε is the size of the vocabulary wordList, and the sentence length d of the text is 64. y ∈ R^p is the one-hot encoding of the emotion category, p is the number of classes the model is to distinguish, and p = 4. The word-vector embedding matrix of the sample can then be represented as:
X = seg * E^T (3)
where X ∈ R^(d×m), X = [x_1, x_2, ..., x_d]^T is the word-vector matrix representation of the text to be analyzed, and the word-vector dimension m is 256. x_i ∈ R^m is the word-vector representation of the i-th word in the text, and E denotes the word-vector embedding layer, for which the Wikipedia open-source word2vec word vectors are adopted; X is then used as the input of the network model.
Step (3) constructing a deep learning multi-classification emotion analysis model
The deep learning multi-classification emotion analysis model comprises a CNN-based local feature extraction stage and an LSTM-based word-order feature extraction stage. The pooling-layer result C_Cnn of the CNN-based local feature extraction stage and the result C'_Rnn of the LSTM-based word-order feature extraction stage are concatenated, i.e. the vector [C_Cnn; C'_Rnn] is taken as the feature vector finally extracted by the model. The feature vector [C_Cnn; C'_Rnn] is then passed through a fully connected layer to obtain the final model output vector ŷ ∈ R^p, where p is the number of classes the model is to distinguish.
The local feature extraction stage based on the CNN network comprises the following contents:
inputting a word vector matrix representation X of the text to be analyzed of a formula 3 in a local feature extraction stage;
the local feature extraction stage is based on a CNN network and comprises two layers in total, namely a convolutional layer and a pooling layer, wherein:
The convolution layer convolves the text to be analyzed with n convolution kernels of different scales, with k filters (i.e. k neurons) per scale; in the invention n and k are 4 and 128 respectively.
In the pooling layer, the vector obtained by convolution is down-sampled by adopting a maximum pooling layer method, and local optimal features are selected, so that each filter becomes a scalar through the maximum pooling layer, and the scalar represents the optimal emotional features in the filter;
The output of the local feature extraction module is C_Cnn = [c_1, c_2, ..., c_nk]: the optimal features selected in the pooling layer by the filters of different sizes are concatenated into C_Cnn, which is taken as the output of this module, where C_Cnn ∈ R^(nk) and nk, the total number of filters in the convolution layer, is 512;
The LSTM-based word-order feature extraction stage comprises the following contents:
Multi-scale CNN local feature extraction: the convolution results of the k filters of each convolution scale in the CNN-based local feature extraction stage are concatenated to obtain the set Z_Cnn; each vector Z_i in Z_Cnn is then fed into a GLU mechanism (a gated convolutional network), and the results are denoted {π_1, π_2, ..., π_n}, completing the multi-scale CNN local feature extraction,
where Z_Cnn = {Z_1, Z_2, ..., Z_n} and Z_i is the concatenation of the convolution results of the k filters of scale i;
π_i = (Z_i · W_1 + b_1) ⊗ σ(Z_i · W_2 + b_2)
where Z_i denotes the concatenation of the convolution results of the k filters of one scale, W_1, W_2 ∈ R^(λ×q) are weight matrices, λ is the dimension of the corresponding weight matrix, b_1, b_2 ∈ R^q are offsets, σ denotes the sigmoid function, π_i ∈ R^q, and q, the output dimension of the LSTM network, is 256;
The local feature extraction results {π_1, π_2, ..., π_n} of the multi-scale CNN network are then integrated into the LSTM network through an attention mechanism, yielding the output C'_Rnn of the LSTM-based word-order feature extraction stage, i.e.
C'_Rnn = [h_d^fw ; h_1^bw]
where h_d^fw is the output of the LSTM module corresponding to the last word of the text to be analyzed in the forward direction, and h_1^bw is the output of the LSTM module corresponding to the first word in the backward direction; the invention adopts a bidirectional LSTM, i.e. a BiLSTM model.
Forward propagation is computed as follows:
d is the length of the text to be analyzed, and each word of the text corresponds in turn to one LSTM module.
In forward propagation, let the output of the (t-1)-th LSTM module be h_(t-1). The output h_t of the t-th LSTM module is then computed as:
e_(t,i) = h_(t-1) · π_i
where e_(t,i) is the dot product of the two vectors, also called the scoring function, used to measure the similarity between the output h_(t-1) of the LSTM for the previous word and the current local feature vector;
α_(t,i) = exp(e_(t,i)) / Σ_j exp(e_(t,j))
where α_(t,i) ∈ R is the weight of feature π_i;
s_(t-1) = Σ_i α_(t,i) · π_i
where s_(t-1) ∈ R^q is the weighted combination of the convolution features; s_(t-1) is used in place of h_(t-1) and, combined with the word vector x_t of the current word, is used to compute the output h_t of the current LSTM module:
h_t = LSTM(s_(t-1), x_t)
Backward propagation follows the same calculation as forward propagation and is not repeated here;
Step (4), model training: the training data are input into the multi-classification emotion analysis model, the parameters are adjusted with a cross-entropy loss function combined with the back-propagation (BP) algorithm, and softmax regression is used as the classification algorithm to complete the training.
Step (5), model analysis: the text to be analyzed is input into the trained model, which finally outputs the emotion classification result of the analyzed text.
The convolution-layer calculation formula of the CNN-based local feature extraction module is:
z = f(Σ W^T * x_(i:i+s-1) + b) (8)
where z is the feature vector obtained by convolving one neuron over the text to be analyzed, f(·) is the activation function, W ∈ R^(s×m) is the weight matrix of the neuron (parameters are shared within the same neuron), s×m is the size of the convolution kernel, b is the bias, and x_(i:i+s-1) denotes the word vectors of the i-th to (i+s-1)-th words in the text sentence; s takes the four different convolution sizes [2, 3, 4, 5], and f(·) is the ReLU activation function.
The training data is preprocessed data.
The convolution layer of the CNN-based local feature extraction stage adopts 4 convolution kernels of different scales. Training ends when the accuracy no longer changes or the set number of iterations is reached.
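Putting the embodiment's parameters together, an end-to-end sketch of the network might look as follows (PyTorch). The word-order branch is simplified to a plain BiLSTM here, with the per-step attention wiring sketched earlier omitted for brevity; it is an illustrative assembly, not the patent's reference implementation.

```python
import torch
import torch.nn as nn

class AttentionCnnLstm(nn.Module):
    """Sketch with the embodiment's values: d=64, m=256, scales
    [2,3,4,5], k=128 (nk=512), q=256, p=4."""
    def __init__(self, eps=41763, m=256, scales=(2, 3, 4, 5),
                 k=128, q=256, p=4):
        super().__init__()
        self.q = q
        self.embed = nn.Embedding(eps, m)
        self.convs = nn.ModuleList(nn.Conv1d(m, k, s) for s in scales)
        self.bilstm = nn.LSTM(m, q, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(len(scales) * k + 2 * q, p)

    def forward(self, seg):                          # seg: (batch, d) indices
        X = self.embed(seg)                          # (batch, d, m)
        Xc = X.transpose(1, 2)
        c_cnn = torch.cat([torch.relu(conv(Xc)).max(dim=2).values
                           for conv in self.convs], dim=1)  # (batch, nk)
        out, _ = self.bilstm(X)                      # (batch, d, 2q)
        # C'_Rnn: last forward output and first backward output
        c_rnn = torch.cat([out[:, -1, :self.q], out[:, 0, self.q:]], dim=1)
        return self.fc(torch.cat([c_cnn, c_rnn], dim=1))   # (batch, p)

logits = AttentionCnnLstm()(torch.randint(0, 41763, (8, 64)))
print(logits.shape)                                  # torch.Size([8, 4])
```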
1. Experimental analysis
In the testing stage, 2000 samples of each of the four emotion categories (joy, anger, disgust, and depression) are selected. Accuracy (Acc) is used as the evaluation index, the model parameters are kept unchanged during testing, and the test-set results are shown in Table 2:
TABLE 2. Comparison of emotion analysis results (table content not reproduced in this text)
Table 2 compares the test results of several models: experiment 1 is an ordinary single-scale CNN network model with convolution kernel size 3, experiment 2 is an ordinary LSTM network, and experiment 3 is the attention-based text emotion analysis model proposed herein.
Compared with the ordinary CNN and LSTM networks, the attention-based emotion analysis model proposed by the invention achieves a clearly higher accuracy, which shows that the method effectively extracts both the local feature information of the CNN network and the word-order feature information of the LSTM network, demonstrating its effectiveness.
Claims (6)
1. A deep learning multi-classification emotion analysis method combining an attention mechanism, characterized by comprising the following steps:
step (1) data preprocessing
Let the emotion data set be expressed as G = [(segtxt_1, y_1), (segtxt_2, y_2), ..., (segtxt_N, y_N)], where segtxt_i denotes the i-th sample, y_i its corresponding emotion category label, and N the number of samples in data set G; the samples in G are subjected to data preprocessing,
and the preprocessed data set is denoted G' = [(seg_1, y_1), (seg_2, y_2), ..., (seg_M, y_M)], where seg_i is the i-th sample in data set G', y_i its corresponding emotion category label, and M the number of samples in data set G';
step (2) input of the constructed model
For any sample (seg, y) to be analyzed in data set G', it is further detailed as:
seg = [w_1, w_2, ..., w_i, ..., w_d]^T (1)
y = [0, 0, 1, ..., 0] (2)
where w_i ∈ R^ε is the one-hot encoding of the i-th word of the text to be analyzed according to the vocabulary wordList, ε is the size of the vocabulary wordList, d is the sentence length of the text, y ∈ R^p is the one-hot encoding of the emotion category, and p is the number of classes the model is to distinguish; the word-vector embedding matrix of the sample can be represented as:
X = seg * E^T (3)
where X ∈ R^(d×m), X = [x_1, x_2, ..., x_d]^T is the word-vector matrix representation of the text to be analyzed, m is the dimension of the word vectors, x_i ∈ R^m is the word vector of the i-th word in the text, and E denotes the word-vector embedding layer;
step (3) constructing a deep learning multi-classification emotion analysis model
The deep learning multi-classification emotion analysis model comprises a CNN-based local feature extraction stage and an LSTM-based word-order feature extraction stage; the pooling-layer result C_Cnn of the CNN-based local feature extraction stage and the result C'_Rnn of the LSTM-based word-order feature extraction stage are concatenated, i.e. the vector [C_Cnn; C'_Rnn] is taken as the feature vector finally extracted by the model, and the feature vector [C_Cnn; C'_Rnn] is then passed through a fully connected layer to obtain the final model output vector ŷ ∈ R^p, where p is the number of classes the model is to distinguish,
the local feature extraction stage based on the CNN network comprises the following contents:
inputting a word vector matrix representation X of the text to be analyzed of a formula 3 in a local feature extraction stage;
the local feature extraction stage is based on a CNN network and comprises two layers in total, namely a convolutional layer and a pooling layer, wherein:
The convolution layer convolves the text to be analyzed with n convolution kernels of different scales, with k filters (i.e. k neurons) per scale;
In the pooling layer, the vectors obtained by convolution are down-sampled with max pooling, selecting the locally optimal features; each filter is thus reduced by the max-pooling layer to a single scalar, which represents the optimal emotional feature found by that filter;
The output of the local feature extraction module is C_Cnn = [c_1, c_2, ..., c_nk]: the optimal features selected in the pooling layer by the filters of different sizes are concatenated into C_Cnn, which is taken as the output of this module, where C_Cnn ∈ R^(nk) and nk is the total number of filters in the convolution layer;
The LSTM-based word-order feature extraction stage comprises the following contents:
Multi-scale CNN local feature extraction: the convolution results of the k filters of each convolution scale in the CNN-based local feature extraction stage are concatenated to obtain the set Z_Cnn; each vector Z_i in Z_Cnn is then fed into a GLU mechanism (a gated convolutional network), and the results are denoted {π_1, π_2, ..., π_n}, completing the multi-scale CNN local feature extraction,
where Z_Cnn = {Z_1, Z_2, ..., Z_n} and Z_i is the concatenation of the convolution results of the k filters of scale i;
π_i = (Z_i · W_1 + b_1) ⊗ σ(Z_i · W_2 + b_2)
where Z_i denotes the concatenation of the convolution results of the k filters of one scale, W_1, W_2 ∈ R^(λ×q) are weight matrices, λ is the dimension of the corresponding weight matrix, b_1, b_2 ∈ R^q are offsets, σ denotes the sigmoid function, π_i ∈ R^q, and q is the output dimension of the LSTM network;
The local feature extraction results {π_1, π_2, ..., π_n} of the multi-scale CNN network are then integrated into the LSTM network through an attention mechanism, yielding the output C'_Rnn of the LSTM-based word-order feature extraction stage, i.e.
C'_Rnn = [h_d^fw ; h_1^bw]
where h_d^fw is the output of the LSTM module corresponding to the last word of the text to be analyzed in the forward direction, and h_1^bw is the output of the LSTM module corresponding to the first word in the backward direction; a bidirectional LSTM, i.e. a BiLSTM model, is used,
Forward propagation is computed as follows:
d is the length of the text to be analyzed, and each word of the text corresponds in turn to one LSTM module.
In forward propagation, let the output of the (t-1)-th LSTM module be h_(t-1). The output h_t of the t-th LSTM module is then computed as:
e_(t,i) = h_(t-1) · π_i
where e_(t,i) is the dot product of the two vectors, also called the scoring function, used to measure the similarity between the output h_(t-1) of the LSTM for the previous word and the current local feature vector;
α_(t,i) = exp(e_(t,i)) / Σ_j exp(e_(t,j))
where α_(t,i) ∈ R is the weight of feature π_i;
s_(t-1) = Σ_i α_(t,i) · π_i
where s_(t-1) ∈ R^q is the weighted combination of the convolution features; s_(t-1) is used in place of h_(t-1) and, combined with the word vector x_t of the current word, is used to compute the output h_t of the current LSTM module:
h_t = LSTM(s_(t-1), x_t)
Backward propagation follows the same calculation as forward propagation and is not repeated here;
Step (4), model training: the training data are input into the multi-classification emotion analysis model, the parameters are adjusted with a cross-entropy loss function combined with the back-propagation (BP) algorithm, and softmax regression is used as the classification algorithm to complete the training;
Step (5), model analysis: the text to be analyzed is input into the trained model, which finally outputs the emotion classification result of the analyzed text.
2. The deep learning multi-classification emotion analysis method combining an attention mechanism as claimed in claim 1, wherein the preprocessing process comprises the following steps:
1) word segmentation, stop-word removal, conversion of English uppercase to lowercase, and conversion of traditional Chinese characters to simplified Chinese,
2) selecting the words whose frequency in data set G is greater than or equal to σ, and constructing the vocabulary wordList = {word_1, word_2, ..., word_ε}, where word_i is the i-th word in the vocabulary wordList and ε is the total number of words in data set G whose frequency reaches σ,
3) for each sample in data set G, if its length is greater than d the sample is deleted, and if its length is less than d it is padded with the symbol </>.
3. The deep learning multi-classification emotion analysis method combining an attention mechanism as claimed in claim 1, wherein the convolution-layer calculation formula of the CNN-based local feature extraction module is:
z = f(Σ W^T * x_(i:i+s-1) + b) (8)
where z is the feature vector obtained by convolving one neuron over the text to be analyzed, f(·) is the activation function, W ∈ R^(s×m) is the weight matrix of the neuron (parameters are shared within the same neuron), s×m is the size of the convolution kernel, b is the bias, and x_(i:i+s-1) denotes the word vectors of the i-th to (i+s-1)-th words in the text sentence.
4. The method as claimed in claim 1, wherein the training data is preprocessed data.
5. The method for deep learning multi-classification emotion analysis combined with attention mechanism as claimed in claim 1, wherein the convolutional layer of the local feature extraction stage based on the CNN network employs 4 convolutional kernels with different scales.
6. The method as claimed in claim 1, wherein the end condition of the training is that the accuracy is not changed or the set number of iterations is reached.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910553755.7A CN110287320B (en) | 2019-06-25 | 2019-06-25 | Deep learning multi-classification emotion analysis model combining attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910553755.7A CN110287320B (en) | 2019-06-25 | 2019-06-25 | Deep learning multi-classification emotion analysis model combining attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110287320A true CN110287320A (en) | 2019-09-27 |
CN110287320B CN110287320B (en) | 2021-03-16 |
Family
ID=68005491
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910553755.7A Active CN110287320B (en) | 2019-06-25 | 2019-06-25 | Deep learning multi-classification emotion analysis model combining attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110287320B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108460089A (en) * | 2018-01-23 | 2018-08-28 | 哈尔滨理工大学 | Diverse characteristics based on Attention neural networks merge Chinese Text Categorization |
CN109670169A (en) * | 2018-11-16 | 2019-04-23 | 中山大学 | A kind of deep learning sensibility classification method based on feature extraction |
CN109710761A (en) * | 2018-12-21 | 2019-05-03 | 中国标准化研究院 | The sentiment analysis method of two-way LSTM model based on attention enhancing |
Non-Patent Citations (3)
Title |
---|
MING-HSIANG SU et al.: "LSTM-based Text Emotion Recognition Using Semantic and Emotional Word Vectors", 2018 First Asian Conference on Affective Computing and Intelligent Interaction |
THITITORN SENEEWONG NA AYUTTHAYA et al.: "Thai Sentiment Analysis via Bidirectional LSTM-CNN Model with Embedding Vectors and Sentic Features", 2018 International Joint Symposium on Artificial Intelligence and Natural Language Processing |
GUAN Pengfei et al.: "Attention-enhanced bidirectional LSTM for sentiment analysis", Journal of Chinese Information Processing |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866113A (en) * | 2019-09-30 | 2020-03-06 | 浙江大学 | Text classification method based on sparse self-attention mechanism fine-tuning Bert model |
CN110866113B (en) * | 2019-09-30 | 2022-07-26 | 浙江大学 | Text classification method based on sparse self-attention mechanism fine-tuning burt model |
CN110855474B (en) * | 2019-10-21 | 2022-06-17 | 广州杰赛科技股份有限公司 | Network feature extraction method, device, equipment and storage medium of KQI data |
CN110855474A (en) * | 2019-10-21 | 2020-02-28 | 广州杰赛科技股份有限公司 | Network feature extraction method, device, equipment and storage medium of KQI data |
CN111079547A (en) * | 2019-11-22 | 2020-04-28 | 武汉大学 | Pedestrian moving direction identification method based on mobile phone inertial sensor |
CN111079985A (en) * | 2019-11-26 | 2020-04-28 | 昆明理工大学 | Criminal case criminal period prediction method based on BERT and fused with distinguishable attribute features |
CN111914084A (en) * | 2020-01-09 | 2020-11-10 | 北京航空航天大学 | Deep learning-based emotion label text generation and evaluation system |
CN111339768B (en) * | 2020-02-27 | 2024-03-05 | 携程旅游网络技术(上海)有限公司 | Sensitive text detection method, system, electronic equipment and medium |
CN111339768A (en) * | 2020-02-27 | 2020-06-26 | 携程旅游网络技术(上海)有限公司 | Sensitive text detection method, system, electronic device and medium |
WO2021174922A1 (en) * | 2020-03-02 | 2021-09-10 | 平安科技(深圳)有限公司 | Statement sentiment classification method and related device |
CN111291832A (en) * | 2020-03-11 | 2020-06-16 | 重庆大学 | Sensor data classification method based on Stack integrated neural network |
CN111402953A (en) * | 2020-04-02 | 2020-07-10 | 四川大学 | Protein sequence classification method based on hierarchical attention network |
CN111402953B (en) * | 2020-04-02 | 2022-05-03 | 四川大学 | Protein sequence classification method based on hierarchical attention network |
US20230160942A1 (en) * | 2020-04-22 | 2023-05-25 | Qingdao Topscomm Communication Co., Ltd | Fault arc signal detection method using convolutional neural network |
US11860216B2 (en) * | 2020-04-22 | 2024-01-02 | Qingdao Topscomm Communication Co., Ltd | Fault arc signal detection method using convolutional neural network |
CN111582397B (en) * | 2020-05-14 | 2023-04-07 | 杭州电子科技大学 | CNN-RNN image emotion analysis method based on attention mechanism |
CN111582397A (en) * | 2020-05-14 | 2020-08-25 | 杭州电子科技大学 | CNN-RNN image emotion analysis method based on attention mechanism |
CN111881262A (en) * | 2020-08-06 | 2020-11-03 | 重庆邮电大学 | Text emotion analysis method based on multi-channel neural network |
CN111881262B (en) * | 2020-08-06 | 2022-05-20 | 重庆邮电大学 | Text emotion analysis method based on multi-channel neural network |
CN112598065B (en) * | 2020-12-25 | 2023-05-30 | 天津工业大学 | Memory-based gating convolutional neural network semantic processing system and method |
CN112597279A (en) * | 2020-12-25 | 2021-04-02 | 北京知因智慧科技有限公司 | Text emotion analysis model optimization method and device |
CN112598065A (en) * | 2020-12-25 | 2021-04-02 | 天津工业大学 | Memory-based gated convolutional neural network semantic processing system and method |
CN112818123A (en) * | 2021-02-08 | 2021-05-18 | 河北工程大学 | Emotion classification method for text |
CN113268592A (en) * | 2021-05-06 | 2021-08-17 | 天津科技大学 | Short text object emotion classification method based on multi-level interactive attention mechanism |
CN113377901B (en) * | 2021-05-17 | 2022-08-19 | 内蒙古工业大学 | Mongolian text emotion analysis method based on multi-size CNN and LSTM models |
CN113377901A (en) * | 2021-05-17 | 2021-09-10 | 内蒙古工业大学 | Mongolian text emotion analysis method based on multi-size CNN and LSTM models |
CN113239199A (en) * | 2021-05-18 | 2021-08-10 | 重庆邮电大学 | Credit classification method based on multi-party data set |
CN113379818B (en) * | 2021-05-24 | 2022-06-07 | 四川大学 | Phase analysis method based on multi-scale attention mechanism network |
CN113379818A (en) * | 2021-05-24 | 2021-09-10 | 四川大学 | Phase analysis method based on multi-scale attention mechanism network |
CN113177111A (en) * | 2021-05-28 | 2021-07-27 | 中国人民解放军国防科技大学 | Chinese text emotion analysis method and device, computer equipment and storage medium |
CN114298025A (en) * | 2021-12-01 | 2022-04-08 | 国家电网有限公司华东分部 | Emotion analysis method based on artificial intelligence |
CN114547299A (en) * | 2022-02-18 | 2022-05-27 | 重庆邮电大学 | Short text sentiment classification method and device based on composite network model |
CN114662547A (en) * | 2022-04-07 | 2022-06-24 | 天津大学 | MSCRNN emotion recognition method and device based on electroencephalogram signals |
CN114897078A (en) * | 2022-05-19 | 2022-08-12 | 辽宁大学 | Short text similarity calculation method based on deep learning and topic model |
CN115116448A (en) * | 2022-08-29 | 2022-09-27 | 四川启睿克科技有限公司 | Voice extraction method, neural network model training method, device and storage medium |
CN115116448B (en) * | 2022-08-29 | 2022-11-15 | 四川启睿克科技有限公司 | Voice extraction method, neural network model training method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110287320B (en) | 2021-03-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110287320B (en) | Deep learning multi-classification emotion analysis model combining attention mechanism | |
CN107608956B (en) | Reader emotion distribution prediction algorithm based on CNN-GRNN | |
CN106650813B (en) | A kind of image understanding method based on depth residual error network and LSTM | |
CN111126386B (en) | Sequence domain adaptation method based on countermeasure learning in scene text recognition | |
CN108334605B (en) | Text classification method and device, computer equipment and storage medium | |
CN109241255B (en) | Intention identification method based on deep learning | |
CN108614875B (en) | Chinese emotion tendency classification method based on global average pooling convolutional neural network | |
CN107609009B (en) | Text emotion analysis method and device, storage medium and computer equipment | |
CN110059188B (en) | Chinese emotion analysis method based on bidirectional time convolution network | |
CN109933664B (en) | Fine-grained emotion analysis improvement method based on emotion word embedding | |
CN106980683B (en) | Blog text abstract generating method based on deep learning | |
CN110083700A (en) | A kind of enterprise's public sentiment sensibility classification method and system based on convolutional neural networks | |
CN109740148A (en) | A kind of text emotion analysis method of BiLSTM combination Attention mechanism | |
CN109284506A (en) | A kind of user comment sentiment analysis system and method based on attention convolutional neural networks | |
CN107818084B (en) | Emotion analysis method fused with comment matching diagram | |
CN106886580B (en) | Image emotion polarity analysis method based on deep learning | |
CN110414009B (en) | Burma bilingual parallel sentence pair extraction method and device based on BilSTM-CNN | |
CN111127146B (en) | Information recommendation method and system based on convolutional neural network and noise reduction self-encoder | |
CN107832663A (en) | A kind of multi-modal sentiment analysis method based on quantum theory | |
CN112364638B (en) | Personality identification method based on social text | |
CN110188195B (en) | Text intention recognition method, device and equipment based on deep learning | |
CN107247703A (en) | Microblog emotional analysis method based on convolutional neural networks and integrated study | |
CN110472245B (en) | Multi-label emotion intensity prediction method based on hierarchical convolutional neural network | |
CN110263174B (en) | Topic category analysis method based on focus attention | |
CN110046356B (en) | Label-embedded microblog text emotion multi-label classification method |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |