CN115544252A - Text emotion classification method based on attention static routing capsule network - Google Patents

Text emotion classification method based on attention static routing capsule network

Info

Publication number
CN115544252A
Authority
CN
China
Prior art keywords
text
attention
layer
vector
capsule
Prior art date
Legal status
Pending
Application number
CN202211152911.7A
Other languages
Chinese (zh)
Inventor
苏依拉
杨佩恒
仁庆道尔吉
吉亚图
乌尼尔
路敏
Current Assignee
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date
Filing date
Publication date
Application filed by Inner Mongolia University of Technology filed Critical Inner Mongolia University of Technology
Priority to CN202211152911.7A
Publication of CN115544252A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A text sentiment classification method based on an attention static routing capsule network: collect unlabeled text data of a target language; train word-vector representations of the target language from the unlabeled data with the word2vec method; collect labeled text data of the target language; construct a classification model based on the attention static routing capsule network; train the classification model on the labeled target-language data under supervision; and evaluate the trained model with accuracy, precision, recall and F1 score to obtain a text sentiment classification model that meets the requirements, which is then used to classify input text. The method improves the extraction of text features and the modeling of the relations among them, and ultimately improves the precision of text sentiment classification.

Description

Text emotion classification method based on attention static routing capsule network
Technical Field
The invention belongs to the technical field of artificial intelligence and text emotion classification, and particularly relates to a text emotion classification method based on an attention static routing capsule network.
Background
Text sentiment classification is one of the most basic and important tasks in the field of machine learning. Traditionally, term frequency-inverse document frequency (TF-IDF) is used as the feature representation of a text, and a general classifier such as a support vector machine (SVM) or logistic regression then performs the sentiment classification.
In recent years, however, the continued development of deep learning has made it possible to learn distributed representations of words and documents efficiently, which further improves the accuracy of text emotion classification. The main deep learning models used in the field of text emotion classification are based on convolutional neural networks (CNN), recurrent neural networks (RNN) and, more recently, the popular Transformer architecture. In 2017, Hinton proposed the capsule network to address shortcomings of the convolutional neural network and applied it to image processing, showing that it is effective at understanding spatial relationships in high-level data. Researchers later applied the capsule network to text processing with good results, demonstrating its advantages for processing text information. However, information transfer between the capsule layers of the traditional capsule network uses a dynamic routing mechanism, which must iteratively recompute the weights of the different capsules for each input, a very time-consuming process.
Therefore, how to reduce the time spent on routing between capsules in the capsule network without reducing model accuracy has become an urgent problem in the field.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a text emotion classification method based on an attention static routing capsule network, so as to improve the extraction capability of text features and the relation modeling capability among the text features, and finally improve the accuracy of text emotion classification.
In order to achieve the purpose, the invention adopts the technical scheme that:
a text emotion classification method based on an attention static routing capsule network comprises the following steps:
step 1, collecting unlabeled text data of a target language, the target language being the language in which the text emotion classification task is ultimately to be performed;
step 2, training with the unlabeled text data from step 1 and the word2vec method to obtain the word-vector representation of the target language;
step 3, collecting text data with labels of the target language;
step 4, constructing a classification model based on the attention static routing capsule network;
step 5, performing supervised training on the classification model in the step 4 by using the text data with the labels of the target language obtained in the step 3;
and step 6, evaluating the classification model trained in step 5 using accuracy, precision, recall and F1 score to obtain a text emotion classification model that meets the requirements, and classifying input text with this model.
In one embodiment, in step 1, text data is collected and cleaned, and useless text and non-text content are removed to obtain the unlabeled text data, the total word count of the unlabeled text being no less than 1,000,000 words.
In one embodiment, in step 2, word-embedding pre-training is performed with the continuous bag-of-words (CBOW) model of word2vec to obtain real-valued vectors for all words of the target language, that is, the word-vector representation.
In one embodiment, in step 3, text data is collected and cleaned, useless text and non-text content are removed, and the emotional tendency of each text is then labeled manually.
In one embodiment, the model components of the classification model comprise: a word2vec word embedding layer, a two-dimensional convolutional layer, a fully connected layer, a squeeze pooling layer, a primary capsule layer, an intermediate capsule layer, a high-level capsule layer and a classification capsule layer;
the word2vec word embedding layer maps text into a word-vector sequence; the word-vector sequence forms a real-valued matrix that is input to the two-dimensional convolutional layer as a single-channel picture, and the two-dimensional convolutional layer extracts multi-scale features of the text with multi-scale convolutions and converts them into vector capsules;
the fully connected layer unifies the dimensions of the multi-scale features extracted by the two-dimensional convolutional layer, after which the dimension-unified multi-scale features are fused based on attention weights;
the squeeze pooling layer compresses the fused features into vectors whose norm lies between 0 and 1, which then serve as the input of the primary capsule layer;
the primary capsule layer, intermediate capsule layer, high-level capsule layer and classification capsule layer propagate the raw semantic information extracted by the convolutional layer, stage by stage via attention static routing, to the model output, yielding the category of the text emotion.
In one embodiment, the step 5 training process is as follows:
1) The text data to be classified $T=\{w_1,w_2,\ldots,w_n\}$ is input into the word2vec word embedding layer, and each word $w_i$ is mapped to a real-valued vector $v_i\in\mathbb{R}^d$, so that the whole text becomes a matrix $D=\{v_1,v_2,\ldots,v_n\}\in\mathbb{R}^{d\times n}$, where d is the word-vector dimension and n is the text length;
2) The matrix D is input into the two-dimensional convolutional layer as a single-channel picture, and feature extraction with multi-scale convolution kernels yields the multi-scale features; the output height is

$$\left\lfloor \frac{n_h - k_h + p_h + s_h}{s_h} \right\rfloor$$

where $\lfloor\cdot\rfloor$ denotes rounding down, $n_h$ is the height of the matrix D, $k_h$ the height of the convolution kernel, $p_h$ the vertical padding, and $s_h$ the vertical stride.
3) The fully connected layer maps the multi-scale features to the same dimension, giving multi-scale output features $g_i$;
4) The multi-scale output features $g_i$ are fused by attention-weighted summation on the same output channel, giving fused features $s_i$;
5) In squeeze pooling, the fused feature $s_i$ is compressed by the squeeze operation into a vector c whose norm lies between 0 and 1 and is then input to the subsequent capsule layers; the squeeze operation is

$$c=\frac{\|s\|^2}{1+\|s\|^2}\cdot\frac{s}{\|s\|}$$
6) The primary capsule layer, the intermediate capsule layer, the high-level capsule layer and the classification capsule layer are all connected, and the routing mode among the capsules adopts an attention static routing mechanism.
In one embodiment, there are 5 multi-scale convolution kernels in the two-dimensional convolutional layer, of sizes 1×d, 3×d, 5×d, 7×d and 9×d, with vertical stride $s_h=1$, vertical padding $p_h=0$ and 256 output channels each; the computed multi-scale convolution output shapes are $o_1\in\mathbb{R}^{n\times1}$, $o_2\in\mathbb{R}^{(n-2)\times1}$, $o_3\in\mathbb{R}^{(n-4)\times1}$, $o_4\in\mathbb{R}^{(n-6)\times1}$ and $o_5\in\mathbb{R}^{(n-8)\times1}$; the fully connected layers are $W_1\in\mathbb{R}^{e\times n}$, $W_2\in\mathbb{R}^{e\times(n-2)}$, $W_3\in\mathbb{R}^{e\times(n-4)}$, $W_4\in\mathbb{R}^{e\times(n-6)}$ and $W_5\in\mathbb{R}^{e\times(n-8)}$; the dimensions are unified as $W_i o_i=g_i\in\mathbb{R}^{e\times1}$, where $g_i$ are the multi-scale output features of common dimension and e is the dimension after unifying the multi-scale features.
In one embodiment, the weighted fusion method is as follows:
the multi-scale output features are m vectors g on each channel i ∈R e Let g i =k i =v i ∈R e (ii) a Setting a query vector q epsilon R for querying semantic feature importance e And m key value pairs (k) 1 ,v 1 ),…,(k m ,v m ) Fusing the multi-scale features based on attention weights is represented as the following formula:
Figure BDA0003857635110000041
s∈R e
let g i =k i =v i ∈R e Q represents query, k represents key, v represents value; q, k 1 …k m ,v 1 …v i Is a function input, the function relation is
Figure BDA0003857635110000042
Wherein q and k i Attention weight of (a, k) of (b) i ) Is a function of attention scoring
Figure BDA0003857635110000043
The vectors q and k i Mapping into scalar, and calculating with softmax to obtain real number weight between 0-1, alpha (q, k) i ) The calculation formula of (a) is as follows:
Figure BDA0003857635110000044
α(q,k i )∈R
The attention scoring function $a(\cdot)$ is computed with additive attention. Given the vector $q\in\mathbb{R}^e$, a vector $k_i\in\mathbb{R}^e$, learnable parameter matrices $W_q\in\mathbb{R}^{e\times e}$ and $W_k\in\mathbb{R}^{e\times e}$, and a learnable parameter vector $w_v\in\mathbb{R}^{1\times e}$: the matrix $W_q$ is multiplied with the vector q, the matrix $W_k$ is multiplied with the vector $k_i$, the two results are added and passed through a tanh nonlinearity, and the result is multiplied by $w_v$, finally giving the attention score, a real number:

$$a(q,k_i)=w_v\tanh\bigl(W_q q + W_k k_i\bigr)\in\mathbb{R}$$
in one embodiment, the static routing mechanism assigns a weight to each vector by means of a learnable parameter matrix and an attention mechanism, lower level capsules 1, 2, 3, respectively outputting a vector v 1 、v 2 And v 3 Using additive attention scoring function
Figure BDA0003857635110000053
Scoring each output vector, inputting the attention score into softmax operation to obtain corresponding weight, and combining the weight with v 1 、v 2 And v 3 Carrying out weighted summation to obtain a vector y, and then carrying out extrusion operation on the vector y to obtain a vector v with the modular length between 0 and 1 i And then the mixture is sent into the next layer of capsules.
Compared with the prior art, the invention has the beneficial effects that:
First, the invention designs a new model structure. The overall structure of the capsule network based on attention static routing (CapsNet-ASR) is: word embedding layer, convolutional layer, primary capsule layer, intermediate capsule layer, high-level capsule layer and classification capsule layer. Second, dynamic routing between the capsule layers is replaced by the invention's attention static routing mechanism, and the network automatically learns during training how to assign routing weights to the lower-level capsules, improving routing efficiency. Third, the convolutional layer uses multi-scale convolution kernels to better extract text information, and an attention mechanism performs weighted fusion of the multi-scale convolution features on the same output channel. Finally, the usual pooling operation of convolutional neural networks is replaced by the squeeze operation to improve the modeling of the relations among semantic features. Through these improvements, the extraction of text features and the modeling of the relations among them can be effectively improved, ultimately improving the precision of text emotion classification.
Drawings
FIG. 1 is a diagram of a classification model architecture for an attention-based static routing capsule network.
FIG. 2 is a schematic diagram of fusion of multi-scale features based on attention weights.
Fig. 3 is an attention static routing diagram.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
Compared with existing text emotion classification methods, the invention replaces the dynamic routing process of the capsule network with static routing based on an attention mechanism, so that the network automatically learns during training how to assign routing weights to the lower-level capsules, and it replaces the usual pooling operation of convolutional neural networks with a squeeze operation. The overall structure of the network model is: word embedding layer, two-dimensional convolutional layer, fully connected layer, squeeze pooling layer, primary capsule layer, intermediate capsule layer, high-level capsule layer and classification capsule layer.
Specifically, the method comprises the following steps:
Step 1, collecting and collating unlabeled text data of a target language.
This step mainly collects the corresponding text data according to the specific task, where the target language is the language in which the text emotion classification task is ultimately performed. The purpose of word embedding is to let subsequent neural networks recognize similarity between words by mapping each word to a real-valued vector. Word embedding needs only the context information of each word, so only unlabeled text of the target language has to be collected.
For example, if the target task is microblog comment sentiment analysis, a large amount of unlabeled Chinese comment text from the microblog platform is collected and collated. In this step the text data is collected and cleaned: useless text and non-text content such as hyperlinks, symbols and emoticons are removed to obtain the unlabeled text data. For instance, various articles on the internet or various microblog texts (topic texts, comment texts, etc.) are collected, and irrelevant hyperlinks, symbols, emoticons and the like are removed. To ensure the accuracy of the word vectors, the more unlabeled text used for pre-training, the better; in general the total word count should be no less than 1,000,000 words.
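A minimal cleaning sketch in Python is shown below; the concrete regular expressions are illustrative assumptions, not patterns specified by the invention:

```python
import re

def clean_text(raw: str) -> str:
    """Strip hyperlinks, bracketed emoticons and stray symbols from a
    collected comment (illustrative patterns, not from the patent)."""
    text = re.sub(r"https?://\S+", "", raw)   # hyperlinks
    text = re.sub(r"\[[^\]]+\]", "", text)    # bracketed emoticons such as [smile]
    text = re.sub(r"[^\w\s]", "", text)       # remaining symbols; \w keeps CJK characters
    return text.strip()

print(clean_text("这个博主真是年轻有为 [赞] https://example.com"))  # -> 这个博主真是年轻有为
```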
Step 2, training with the unlabeled text data of step 1 and the word2vec method to obtain the word-vector representation of the target language.
Specifically, word-frequency statistics are computed on the unlabeled text data collected and collated in step 1, such as microblog comment text, and a vocabulary is built in which each word corresponds to a trainable real-valued vector $w_i\in\mathbb{R}^d$. Word-embedding training is then performed. The word2vec method has two variants, the skip-gram model and the continuous bag-of-words (CBOW) model. The invention adopts a self-supervised training scheme and uses the CBOW model of word2vec for word-embedding pre-training, obtaining real-valued vectors for all words of the target language, that is, the word-vector representation.
The continuous bag-of-words model assumes that a center word is generated from the context words surrounding it in the text sequence. For example, in the (character-level) text sequence "i", "people", "love", "self", "have", "ancestor", "country", with "love" as the center word and a context window of 2, the continuous bag-of-words model considers the conditional probability of generating the center word "love" from the context words "i", "people", "self", "have", i.e. P("love" | "i", "people", "self", "have").
All words in the collated text are trained with maximum likelihood estimation: the conditional probability is maximized as the objective while the word vectors $w_i\in\mathbb{R}^d$ are iteratively updated.
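CBOW pre-training of this kind can be run, for example, with the gensim library. The following sketch is illustrative; hyperparameters such as vector_size=64 (matching the d = 64 used later) and window=2 are assumptions:

```python
from gensim.models import Word2Vec

# tokenized sentences from the cleaned, unlabeled corpus (toy example)
sentences = [
    ["我", "们", "爱", "自", "己", "的", "祖", "国"],
    ["这个", "博主", "真是", "年轻", "有为"],
]
model = Word2Vec(
    sentences,
    vector_size=64,  # word-vector dimension d
    window=2,        # context window, as in the "love" example above
    sg=0,            # sg=0 selects the CBOW architecture
    min_count=1,     # keep every word of the toy corpus in the vocabulary
)
vec = model.wv["爱"]  # 64-dimensional real vector for the word "爱" (love)
print(vec.shape)      # (64,)
```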
Step 3, collecting and collating labeled text data of the target language.
To realize the text emotion classification task, the model must be trained under supervision, so labeled text data needs to be collected, where each training sample is a text plus a category label, for example comment texts and their corresponding emotion labels in microblog comment sentiment classification.
Illustratively, and similarly to step 1, this step also collects text data and cleans it to remove useless text and non-text content, after which the emotional tendency (positive, negative or neutral) of each text is labeled manually. Taking microblog sentiment analysis as an example: microblog comment texts are collected, emoticons, hyperlinks and other non-text content are removed, and the emotional tendency of each comment text is labeled manually, so that each comment becomes a text-label pair. For example, for collected comment text such as "this blogger is really young and promising; we should learn from him", the sentiment label is manually set to "positive". This yields one piece of labeled text data in the field of microblog comment sentiment classification.
Step 4, constructing a classification model based on the attention static routing capsule network.
The model code of the classification model can be written with the PyTorch framework, which is currently widely used in academia. Referring to fig. 1, the model components comprise: a word2vec word embedding layer, a two-dimensional convolutional layer, a fully connected layer, a squeeze pooling layer, a primary capsule layer, an intermediate capsule layer, a high-level capsule layer and a classification capsule layer, wherein:
the word2vec word embedding layer maps text into a word-vector sequence; the word-vector sequence forms a real-valued matrix that is input to the two-dimensional convolutional layer as a single-channel picture, and the two-dimensional convolutional layer extracts multi-scale features of the text with multi-scale convolutions and converts them into vector capsules;
the fully connected layer unifies the dimensions of the multi-scale features extracted by the two-dimensional convolutional layer, after which the dimension-unified multi-scale features are fused based on attention weights;
the squeeze pooling layer compresses the fused features into vectors whose norm lies between 0 and 1, which then serve as the input of the primary capsule layer;
the primary capsule layer, intermediate capsule layer, high-level capsule layer and classification capsule layer propagate the raw semantic information extracted by the convolutional layer, stage by stage via attention static routing, to the model output, yielding the category of the text emotion.
The overall sequence of the classification model during training is as follows:
1) The text data to be classified is input into the word2vec word embedding layer. The input is $T=\{w_1,w_2,\ldots,w_n\}$; each word $w_i$ is mapped to a real-valued vector $v_i\in\mathbb{R}^d$, so that the whole text becomes a matrix $D=\{v_1,v_2,\ldots,v_n\}\in\mathbb{R}^{d\times n}$, where d is the word-vector dimension and n is the text length.
As shown in fig. 1, the example text "this blogger is really young and promising" is mapped by the word embedding layer into a real-valued matrix $D\in\mathbb{R}^{d\times n}$, with text length n = 10 and hyperparameter d = 64.
2) The matrix D is input into the two-dimensional convolutional layer as a single-channel picture, and feature extraction with multi-scale convolution kernels yields the multi-scale features; the output height is determined by

$$\left\lfloor \frac{n_h - k_h + p_h + s_h}{s_h} \right\rfloor$$

where $\lfloor\cdot\rfloor$ denotes rounding down, $n_h$ is the height of the matrix D, $k_h$ the height of the convolution kernel, $p_h$ the vertical padding, and $s_h$ the vertical stride.
Illustratively, there are 5 multi-scale convolution kernels of sizes 1×d, 3×d, 5×d, 7×d and 9×d, with vertical stride $s_h=1$, vertical padding $p_h=0$ and 256 output channels each. The computed multi-scale convolution output shapes are $o_1\in\mathbb{R}^{n\times1}$, $o_2\in\mathbb{R}^{(n-2)\times1}$, $o_3\in\mathbb{R}^{(n-4)\times1}$, $o_4\in\mathbb{R}^{(n-6)\times1}$ and $o_5\in\mathbb{R}^{(n-8)\times1}$. For the example text above, the output shapes are $o_1\in\mathbb{R}^{10\times1}$, $o_2\in\mathbb{R}^{8\times1}$, $o_3\in\mathbb{R}^{6\times1}$, $o_4\in\mathbb{R}^{4\times1}$ and $o_5\in\mathbb{R}^{2\times1}$.
3) Because the multi-scale features have different output shapes on each output channel, the fully connected layer maps them to the same dimension, giving multi-scale output features $g_i$. Specifically: with fully connected layers $W_1\in\mathbb{R}^{e\times n}$, $W_2\in\mathbb{R}^{e\times(n-2)}$, $W_3\in\mathbb{R}^{e\times(n-4)}$, $W_4\in\mathbb{R}^{e\times(n-6)}$ and $W_5\in\mathbb{R}^{e\times(n-8)}$ (for the example text: $W_1\in\mathbb{R}^{e\times10}$, $W_2\in\mathbb{R}^{e\times8}$, $W_3\in\mathbb{R}^{e\times6}$, $W_4\in\mathbb{R}^{e\times4}$, $W_5\in\mathbb{R}^{e\times2}$), the dimensions are unified as $W_i o_i=g_i\in\mathbb{R}^{e\times1}$, where $g_i$ are the multi-scale output features of common dimension and e is the unified feature dimension. Each matrix-vector product $W_i o_i$ yields an e-dimensional vector; for example, $W_1$ is an e×n matrix and $o_1$ an n×1 vector, so $W_1 o_1$ is an e×1 vector.
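Steps 2) and 3) can be sketched in PyTorch, the framework the description itself suggests; batch handling and the value e = 128 are assumptions, while the kernel sizes, stride, padding and 256 output channels follow the text above:

```python
import torch
import torch.nn as nn

d, n, e = 64, 10, 128            # word dim, text length, unified dim e (e is assumed)
D = torch.randn(1, 1, n, d)      # matrix D as a single-channel "picture": (batch, 1, n, d)

kernel_heights = [1, 3, 5, 7, 9]
convs = nn.ModuleList(
    nn.Conv2d(1, 256, kernel_size=(k, d), stride=1, padding=0) for k in kernel_heights
)
fcs = nn.ModuleList(nn.Linear(n - k + 1, e) for k in kernel_heights)

g = []
for conv, fc in zip(convs, fcs):
    o = conv(D).squeeze(-1)      # (1, 256, n-k+1): width collapses since kernel width = d
    g.append(fc(o))              # W_i o_i = g_i: one e-dimensional vector per channel
g = torch.stack(g, dim=2)        # m = 5 scale features per output channel
print(g.shape)                   # torch.Size([1, 256, 5, 128])
```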
4) The dimension-unified multi-scale output features $g_i$ are fused by attention-weighted summation on the same output channel, giving fused features $s_i$.
Referring to fig. 2, the multi-scale output features are m vectors $g_i\in\mathbb{R}^e$ on each channel; let $g_i=k_i=v_i\in\mathbb{R}^e$, where q denotes the query, $k_i$ the keys and $v_i$ the values. Suppose there is a query vector $q\in\mathbb{R}^e$ for querying semantic-feature importance and m key-value pairs $(k_1,v_1),\ldots,(k_m,v_m)$; the attention-weighted fusion of the multi-scale features can then be expressed as

$$s=f\bigl(q,(k_1,v_1),\ldots,(k_m,v_m)\bigr)=\sum_{i=1}^{m}\alpha(q,k_i)\,v_i\in\mathbb{R}^e$$

The attention weight $\alpha(q,k_i)$ of q and $k_i$ is obtained by mapping the vectors q and $k_i$ to a scalar with an attention scoring function $a(q,k_i)$ and normalizing with a softmax into a real-valued weight between 0 and 1:

$$\alpha(q,k_i)=\operatorname{softmax}\bigl(a(q,k_i)\bigr)=\frac{\exp\bigl(a(q,k_i)\bigr)}{\sum_{j=1}^{m}\exp\bigl(a(q,k_j)\bigr)}\in\mathbb{R}$$
The attention scoring function $a(\cdot)$ is computed with additive attention. Given the vector $q\in\mathbb{R}^e$, a vector $k_i\in\mathbb{R}^e$, learnable parameter matrices $W_q\in\mathbb{R}^{e\times e}$ and $W_k\in\mathbb{R}^{e\times e}$, and a learnable parameter vector $w_v\in\mathbb{R}^{1\times e}$: the matrix $W_q$ is multiplied with the vector q, the matrix $W_k$ is multiplied with the vector $k_i$, the two results are added and passed through a tanh nonlinearity, and the result is multiplied by $w_v$, finally giving the attention score, a real number:

$$a(q,k_i)=w_v\tanh\bigl(W_q q + W_k k_i\bigr)\in\mathbb{R}$$
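A PyTorch sketch of this attention-weighted fusion follows; treating the query vector q as a learnable parameter is an assumption, since the text only posits a query vector for semantic-feature importance:

```python
import torch
import torch.nn as nn

class AdditiveAttentionFusion(nn.Module):
    """Fuse m same-dimension scale features g_i (= k_i = v_i) into one vector s."""
    def __init__(self, e: int):
        super().__init__()
        self.W_q = nn.Linear(e, e, bias=False)  # learnable matrix W_q
        self.W_k = nn.Linear(e, e, bias=False)  # learnable matrix W_k
        self.w_v = nn.Linear(e, 1, bias=False)  # learnable vector w_v
        self.q = nn.Parameter(torch.randn(e))   # query vector q (assumed learnable)

    def forward(self, g: torch.Tensor) -> torch.Tensor:
        # g: (..., m, e); additive attention score a(q, k_i) for every k_i = g_i
        scores = self.w_v(torch.tanh(self.W_q(self.q) + self.W_k(g))).squeeze(-1)
        alpha = torch.softmax(scores, dim=-1)          # α(q, k_i), real weights in (0, 1)
        return (alpha.unsqueeze(-1) * g).sum(dim=-2)   # s = Σ_i α(q, k_i) · v_i

fuse = AdditiveAttentionFusion(e=128)
s = fuse(torch.randn(1, 256, 5, 128))  # fuse the m = 5 scales on each channel
print(s.shape)                         # torch.Size([1, 256, 128])
```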
5) In squeeze pooling, the fused feature $s_i$ is compressed by the squeeze operation into a vector c whose norm lies between 0 and 1 and is then fed into the subsequent capsule layer:

$$c=\frac{\|s\|^2}{1+\|s\|^2}\cdot\frac{s}{\|s\|}$$
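A sketch of the squeeze operation, which matches the standard capsule-network squash function; the eps safeguard against division by zero is an implementation assumption:

```python
import torch

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    """Rescale s so that its norm lies in (0, 1) while preserving its
    direction, per the formula above."""
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)
```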
6) The primary capsule layer, intermediate capsule layer, high-level capsule layer and classification capsule layer are all connected, and routing between the capsules adopts the attention static routing mechanism.
The conventional dynamic routing mechanism assigns weights to the output vectors of the lower-level capsules iteratively, whereas the attention static routing mechanism assigns a weight to each vector with a learnable parameter matrix and an attention mechanism. As shown in fig. 3, lower-level capsules 1, 2 and 3 output vectors $v_1$, $v_2$ and $v_3$ respectively; the additive attention scoring function $a(\cdot)$ scores each output vector, the attention scores are passed through a softmax operation to obtain the corresponding weights, and the weighted sum of $v_1$, $v_2$ and $v_3$ with these weights gives a vector y; the squeeze operation applied to y then yields a vector with norm between 0 and 1, which is fed into the next capsule layer.
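The following sketch shows one capsule layer with attention static routing; it reuses the squash function sketched above, and the per-capsule-pair prediction transform W and all shapes are assumptions, since the text states only that a learnable parameter matrix and additive attention assign the routing weights:

```python
import torch
import torch.nn as nn

class AttentionStaticRouting(nn.Module):
    """A capsule layer whose lower-to-upper weights come from additive
    attention instead of dynamic-routing iterations."""
    def __init__(self, in_caps: int, in_dim: int, out_caps: int, out_dim: int):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(out_caps, in_caps, in_dim, out_dim))
        self.W_q = nn.Linear(out_dim, out_dim, bias=False)
        self.W_k = nn.Linear(out_dim, out_dim, bias=False)
        self.w_v = nn.Linear(out_dim, 1, bias=False)
        self.q = nn.Parameter(torch.randn(out_dim))

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        # v: (batch, in_caps, in_dim) -> predictions u: (batch, out_caps, in_caps, out_dim)
        u = torch.einsum("bid,oide->boie", v, self.W)
        # additive attention scores a(q, k_i) replace the routing iterations
        a = self.w_v(torch.tanh(self.W_q(self.q) + self.W_k(u))).squeeze(-1)
        alpha = torch.softmax(a, dim=-1)            # weights over the lower capsules
        y = (alpha.unsqueeze(-1) * u).sum(dim=2)    # weighted sum -> (batch, out_caps, out_dim)
        return squash(y)                            # norm in (0, 1), fed to the next layer

layer = AttentionStaticRouting(in_caps=3, in_dim=8, out_caps=2, out_dim=16)
print(layer(torch.randn(4, 3, 8)).shape)            # torch.Size([4, 2, 16])
```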
Step 5, performing supervised training on the classification model of step 4 with the labeled target-language text data obtained in step 3. For example, the text "this blogger is really young and promising" has been manually labeled with the emotion category "positive". The loss between the prediction $\hat{y}$ produced during supervised training and the actual class "positive" is computed, and the model parameters are updated with the backpropagation algorithm.
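One supervised training step might look like the sketch below; the cross-entropy loss, the Adam optimizer and the stand-in linear model are assumptions made so the loop is runnable, as the text specifies only loss computation and backpropagation:

```python
import torch
import torch.nn as nn

model = nn.Linear(64, 3)                 # stand-in for CapsNet-ASR; 3 emotion classes
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()          # loss choice is an assumption

x = torch.randn(8, 64)                   # toy batch of text representations
y = torch.randint(0, 3, (8,))            # manually labeled emotion categories

logits = model(x)                        # predictions ŷ
loss = loss_fn(logits, y)                # loss against the actual classes
optimizer.zero_grad()
loss.backward()                          # backpropagation
optimizer.step()                         # update the model parameters
print(float(loss))
```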
Step 6, evaluating the classification model trained in step 5 with accuracy, precision, recall and F1 score. After training, the model is tested on a portion of the data reserved as a test set and not used for training. Based on the test results, the model is evaluated with accuracy, precision, recall and F1 score, finally yielding a text emotion classification model that meets the requirements, which is used for emotion classification of input text.
Accuracy is the proportion of all samples that are predicted correctly:

$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}$$
the Precision indicates how many of the samples predicted to be positive are true positive samples, and the calculation formula is as follows:
Figure BDA0003857635110000112
the Recall rate recalling indicates how much the positive case in the sample is predicted correctly, and the calculation formula is as follows:
Figure BDA0003857635110000113
the F1Score (F1 Score) is an index used statistically to measure the accuracy of the two-class model. The method simultaneously considers the accuracy rate and the recall rate of the classification model. The F1score can be viewed as a harmonic mean of the model accuracy and recall, reflecting the robustness of the model, with a maximum of 1 and a minimum of 0. The formula for the F1-score calculation is shown below:
Figure BDA0003857635110000114
in order to obtain accuracy, precision, recall and F1Score, a confusion matrix needs to be drawn for statistics, TP, TN, FP and FN are obtained respectively, under a classification task, four different combinations exist between a prediction result and an actual result, and the confusion matrix can be formed as shown in the following table:
Figure BDA0003857635110000115
in the confusion matrix, TP (True Positive) represents the number of samples that are actually True Positive samples among the samples whose prediction results are Positive samples; FP (False Positive) represents the number of samples that are not Positive in fact among samples whose prediction results are Positive; FP (False Negative) represents the number of samples which are not Negative in reality in the samples with the Negative prediction result; TN (True Negative) represents the number of samples that are actually Negative in the samples whose prediction results are Negative;
the positive examples and the negative examples are relative, for example, in the emotion classification task, the emotion classification of a sentence can be three types of positive, neutral and negative. If positive is chosen as the positive case, then neutral and negative are together called negative cases.
The model CapsNet-ASR of the invention was evaluated on the Chinese emotion text datasets ASAP, ChnSentiCorp, NLPCC14-SC and SE-ABSA16 and compared with conventional models, using accuracy as the metric. The comparison results are shown in the following table; the improvement brought by the method in Chinese text sentiment classification is clearly significant.
Accuracy (%)              ASAP    ChnSentiCorp    NLPCC14-SC    SE-ABSA16
RNN                       75.9    84.4            83.8          83.1
LSTM                      80.3    85.7            84.5          89.5
Generic capsule network   81.2    88.9            87.5          90.8
CapsNet-ASR               84.5    92.2            91.9          91.5
In addition, because CapsNet-ASR uses a static routing mechanism while the generic capsule network uses dynamic routing, the training time of the invention is theoretically shorter. Both models were therefore trained for 60 epochs on each of the datasets ASAP, ChnSentiCorp, NLPCC14-SC and SE-ABSA16. The results show that the training time of the CapsNet-ASR model is clearly shorter than that of the generic capsule network; the following table lists the training time of each model on each dataset, in hours.
Training time (h)         ASAP    ChnSentiCorp    NLPCC14-SC    SE-ABSA16
Generic capsule network   8       14              16            9
CapsNet-ASR               3       8               10            6

Claims (9)

1. A text emotion classification method based on an attention static routing capsule network is characterized by comprising the following steps:
step 1, collecting unlabeled text data of a target language, the target language being the language in which the text emotion classification task is ultimately to be performed;
step 2, training with the unlabeled text data from step 1 and the word2vec method to obtain the word-vector representation of the target language;
step 3, collecting text data with labels of the target language;
step 4, constructing a classification model based on the attention static routing capsule network;
step 5, performing supervised training on the classification model in the step 4 by using the text data with the labels of the target language obtained in the step 3;
and step 6, evaluating the classification model trained in step 5 using accuracy, precision, recall and F1 score to obtain a text emotion classification model that meets the requirements, and classifying input text with this model.
2. The text emotion classification method based on the attention static routing capsule network according to claim 1, wherein in step 1, text data is collected and cleaned to remove useless text and non-text content, obtaining the unlabeled text data, the total word count of the unlabeled text being no less than 1,000,000 words.
3. The text emotion classification method based on the attention static routing capsule network according to claim 1, wherein in step 2, word-embedding pre-training is performed with the continuous bag-of-words (CBOW) model of word2vec to obtain real-valued vectors for all words of the target language, that is, the word-vector representation.
4. The text emotion classification method based on the attention static routing capsule network according to claim 1, wherein in step 3, text data is collected and cleaned to remove useless text and non-text content, and the emotional tendency of each text is then labeled manually.
5. The text emotion classification method based on the attention static routing capsule network according to claim 1, wherein the model components of the classification model comprise: a word2vec word embedding layer, a two-dimensional convolutional layer, a fully connected layer, a squeeze pooling layer, a primary capsule layer, an intermediate capsule layer, a high-level capsule layer and a classification capsule layer;
the word2vec word embedding layer maps text into a word-vector sequence; the word-vector sequence forms a real-valued matrix that is input to the two-dimensional convolutional layer as a single-channel picture, and the two-dimensional convolutional layer extracts multi-scale features of the text with multi-scale convolutions and converts them into vector capsules;
the fully connected layer unifies the dimensions of the multi-scale features extracted by the two-dimensional convolutional layer, after which the dimension-unified multi-scale features are fused based on attention weights;
the squeeze pooling layer compresses the fused features into vectors whose norm lies between 0 and 1, which then serve as the input of the primary capsule layer;
the primary capsule layer, intermediate capsule layer, high-level capsule layer and classification capsule layer propagate the raw semantic information extracted by the convolutional layer, stage by stage via attention static routing, to the model output, yielding the category of the text emotion.
6. The text emotion classification method based on attention static routing capsule network as claimed in claim 4, wherein in the step 5, the training process is as follows:
1) The text data to be classified $T=\{w_1,w_2,\ldots,w_n\}$ is input into the word2vec word embedding layer, and each word $w_i$ is mapped to a real-valued vector $v_i\in\mathbb{R}^d$, so that the whole text becomes a matrix $D=\{v_1,v_2,\ldots,v_n\}\in\mathbb{R}^{d\times n}$, where d is the word-vector dimension and n is the text length;
2) The matrix D is input into the two-dimensional convolutional layer as a single-channel picture, and feature extraction with multi-scale convolution kernels yields the multi-scale features; the output height is

$$\left\lfloor \frac{n_h - k_h + p_h + s_h}{s_h} \right\rfloor$$

where $\lfloor\cdot\rfloor$ denotes rounding down, $n_h$ is the height of the matrix D, $k_h$ the height of the convolution kernel, $p_h$ the vertical padding, and $s_h$ the vertical stride.
3) The fully connected layer maps the multi-scale features to the same dimension, giving multi-scale output features $g_i$;
4) The multi-scale output features $g_i$ are fused by attention-weighted summation on the same output channel, giving fused features $s_i$;
5) In squeeze pooling, the fused feature $s_i$ is compressed by the squeeze operation into a vector c whose norm lies between 0 and 1 and is then input to the subsequent capsule layer; the squeeze operation is

$$c=\frac{\|s\|^2}{1+\|s\|^2}\cdot\frac{s}{\|s\|}$$
6) The primary capsule layer, the intermediate capsule layer, the high-level capsule layer and the classification capsule layer are all connected, and the routing mode among the capsules adopts an attention static routing mechanism.
7. The text emotion classification method based on the attention static routing capsule network according to claim 6, wherein there are 5 multi-scale convolution kernels in the two-dimensional convolutional layer, of sizes 1×d, 3×d, 5×d, 7×d and 9×d, with vertical stride $s_h=1$, vertical padding $p_h=0$ and 256 output channels each; the computed multi-scale convolution output shapes are $o_1\in\mathbb{R}^{n\times1}$, $o_2\in\mathbb{R}^{(n-2)\times1}$, $o_3\in\mathbb{R}^{(n-4)\times1}$, $o_4\in\mathbb{R}^{(n-6)\times1}$ and $o_5\in\mathbb{R}^{(n-8)\times1}$; the fully connected layers are $W_1\in\mathbb{R}^{e\times n}$, $W_2\in\mathbb{R}^{e\times(n-2)}$, $W_3\in\mathbb{R}^{e\times(n-4)}$, $W_4\in\mathbb{R}^{e\times(n-6)}$ and $W_5\in\mathbb{R}^{e\times(n-8)}$; the dimensions are unified as $W_i o_i=g_i\in\mathbb{R}^{e\times1}$, where $g_i$ are the multi-scale output features of common dimension and e is the unified feature dimension.
8. The text emotion classification method based on attention static routing capsule network according to claim 6, wherein the weighted fusion method is as follows:
the multi-scale output features are m vectors $g_i\in\mathbb{R}^e$ on each channel; let $g_i=k_i=v_i\in\mathbb{R}^e$, where q denotes the query, $k_i$ the keys and $v_i$ the values; given a query vector $q\in\mathbb{R}^e$ for querying semantic-feature importance and m key-value pairs $(k_1,v_1),\ldots,(k_m,v_m)$, the attention-weighted fusion of the multi-scale features is expressed as

$$s=f\bigl(q,(k_1,v_1),\ldots,(k_m,v_m)\bigr)=\sum_{i=1}^{m}\alpha(q,k_i)\,v_i\in\mathbb{R}^e$$

wherein the attention weight $\alpha(q,k_i)$ of q and $k_i$ is obtained by mapping the vectors q and $k_i$ to a scalar with an attention scoring function $a(q,k_i)$ and normalizing with a softmax into a real-valued weight between 0 and 1:

$$\alpha(q,k_i)=\operatorname{softmax}\bigl(a(q,k_i)\bigr)=\frac{\exp\bigl(a(q,k_i)\bigr)}{\sum_{j=1}^{m}\exp\bigl(a(q,k_j)\bigr)}\in\mathbb{R}$$
the attention scoring function $a(\cdot)$ is computed with additive attention: given the vector $q\in\mathbb{R}^e$, a vector $k_i\in\mathbb{R}^e$, learnable parameter matrices $W_q\in\mathbb{R}^{e\times e}$ and $W_k\in\mathbb{R}^{e\times e}$, and a learnable parameter vector $w_v\in\mathbb{R}^{1\times e}$, the matrix $W_q$ is multiplied with the vector q, the matrix $W_k$ is multiplied with the vector $k_i$, the two results are added and passed through a tanh nonlinearity, and the result is multiplied by $w_v$, finally giving the attention score, a real number:

$$a(q,k_i)=w_v\tanh\bigl(W_q q + W_k k_i\bigr)\in\mathbb{R}$$
9. The method according to claim 6, wherein the attention static routing mechanism assigns a weight to each vector by means of a learnable parameter matrix and an attention mechanism: lower-level capsules 1, 2 and 3 output vectors $v_1$, $v_2$ and $v_3$ respectively; the additive attention scoring function $a(\cdot)$ scores each output vector, the attention scores are passed through a softmax operation to obtain the corresponding weights, the weighted sum of $v_1$, $v_2$ and $v_3$ with these weights gives a vector y, and the squeeze operation applied to y yields a vector with norm between 0 and 1, which is fed into the next capsule layer.
CN202211152911.7A 2022-09-21 2022-09-21 Text emotion classification method based on attention static routing capsule network Pending CN115544252A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211152911.7A CN115544252A (en) 2022-09-21 2022-09-21 Text emotion classification method based on attention static routing capsule network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211152911.7A CN115544252A (en) 2022-09-21 2022-09-21 Text emotion classification method based on attention static routing capsule network

Publications (1)

Publication Number Publication Date
CN115544252A true CN115544252A (en) 2022-12-30

Family

ID=84726699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211152911.7A Pending CN115544252A (en) 2022-09-21 2022-09-21 Text emotion classification method based on attention static routing capsule network

Country Status (1)

Country Link
CN (1) CN115544252A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116304842A (en) * 2023-05-18 2023-06-23 南京信息工程大学 Capsule network text classification method based on CFC structure improvement
CN116304585A (en) * 2023-05-18 2023-06-23 中国第一汽车股份有限公司 Emotion recognition and model training method and device, electronic equipment and storage medium
CN116304585B (en) * 2023-05-18 2023-08-15 中国第一汽车股份有限公司 Emotion recognition and model training method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108399158B (en) Attribute emotion classification method based on dependency tree and attention mechanism
CN110245229B (en) Deep learning theme emotion classification method based on data enhancement
CN108363743B (en) Intelligent problem generation method and device and computer readable storage medium
CN104794169B (en) A kind of subject terminology extraction method and system based on sequence labelling model
CN107025284A (en) The recognition methods of network comment text emotion tendency and convolutional neural networks model
CN112001186A (en) Emotion classification method using graph convolution neural network and Chinese syntax
JPH07295989A (en) Device that forms interpreter to analyze data
CN115544252A (en) Text emotion classification method based on attention static routing capsule network
CN112256866B (en) Text fine-grained emotion analysis algorithm based on deep learning
CN110750648A (en) Text emotion classification method based on deep learning and feature fusion
CN113343690B (en) Text readability automatic evaluation method and device
CN110717330A (en) Word-sentence level short text classification method based on deep learning
CN113806547B (en) Deep learning multi-label text classification method based on graph model
CN111368082A (en) Emotion analysis method for domain adaptive word embedding based on hierarchical network
CN112287197B (en) Method for detecting sarcasm of case-related microblog comments described by dynamic memory cases
CN111582506A (en) Multi-label learning method based on global and local label relation
CN114742071B (en) Cross-language ideas object recognition analysis method based on graph neural network
CN112732872A (en) Biomedical text-oriented multi-label classification method based on subject attention mechanism
Mozafari et al. Emotion detection by using similarity techniques
CN115329085A (en) Social robot classification method and system
Baboo et al. Sentiment analysis and automatic emotion detection analysis of twitter using machine learning classifiers
Sajeevan et al. An enhanced approach for movie review analysis using deep learning techniques
Mehendale et al. Cyber bullying detection for Hindi-English language using machine learning
CN116775880A (en) Multi-label text classification method and system based on label semantics and transfer learning
CN116562302A (en) Multi-language event viewpoint object identification method integrating Han-Yue association relation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination