CN115544252A

CN115544252A - Text emotion classification method based on attention static routing capsule network

Info

Publication number: CN115544252A
Application number: CN202211152911.7A
Authority: CN
Inventors: 苏依拉; 杨佩恒; 仁庆道尔吉; 吉亚图; 乌尼尔; 路敏
Original assignee: Inner Mongolia University of Technology
Current assignee: Inner Mongolia University of Technology
Priority date: 2022-09-21
Filing date: 2022-09-21
Publication date: 2022-12-30

Abstract

A text sentiment classification method based on an attention static routing capsule network comprises: collecting unlabeled text data of a target language; training on the unlabeled text data with the word2vec method to obtain word vector representations of the target language; collecting labeled text data of the target language; constructing a classification model based on an attention static routing capsule network; performing supervised training of the classification model with the labeled text data of the target language; and evaluating the trained classification model by accuracy, precision, recall and F1 score to obtain a text sentiment classification model that meets the requirements, which is then used to classify input text. The method improves the extraction of text features and the modeling of relationships among them, and thereby improves the accuracy of text sentiment classification.

Description

Text emotion classification method based on attention static routing capsule network

Technical Field

The invention belongs to the technical field of artificial intelligence and text emotion classification, and particularly relates to a text emotion classification method based on an attention static routing capsule network.

Background

Text sentiment classification is one of the most basic and important tasks in the field of machine learning. Conventionally, term frequency-inverse document frequency (TF-IDF) is used as the feature representation of text, and a general classifier such as a support vector machine (SVM) or logistic regression is then used for text emotion classification.

However, in recent years the continued development of deep learning has made it possible to learn distributed representations of words and documents efficiently, which has further improved the accuracy of text emotion classification. The deep learning models used in this field are mainly based on convolutional neural networks (CNN), recurrent neural networks (RNN) and, more recently, the widely adopted Transformer architecture. In 2017, Hinton proposed the capsule network to address shortcomings of the convolutional neural network and applied it to image processing, showing that it is effective at capturing spatial relationships in high-level data. Researchers later applied the capsule network to text processing with good results, showing that it also has advantages for processing text. In the traditional capsule network, information is passed between capsule layers by a dynamic routing mechanism, which must iteratively recompute the weights of the capsules for each input, so the process is very time-consuming.

Therefore, how to reduce the time spent on routing between capsules in the capsule network without reducing the accuracy of the model becomes an urgent problem to be solved in the field.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention aims to provide a text emotion classification method based on an attention static routing capsule network, so as to improve the extraction capability of text features and the relation modeling capability among the text features, and finally improve the accuracy of text emotion classification.

In order to achieve the purpose, the invention adopts the technical scheme that:

a text emotion classification method based on an attention static routing capsule network comprises the following steps:

step 1, collecting unlabeled text data of a target language, the target language being the language in which the text emotion classification task is ultimately performed;

step 2, training on the unlabeled text data from step 1 with the word2vec method to obtain word vector representations of the target language;

step 3, collecting labeled text data of the target language;

step 4, constructing a classification model based on the attention static routing capsule network;

step 5, performing supervised training of the classification model from step 4 with the labeled text data of the target language obtained in step 3;

step 6, evaluating the classification model trained in step 5 by accuracy, precision, recall and F1 score to obtain a text emotion classification model that meets the requirements, and classifying input text with that model.

In one embodiment, in step 1, text data is collected and cleaned, and useless text and non-text content are removed to obtain the unlabeled text data, wherein the unlabeled text contains no fewer than 1,000,000 words in total.

In one embodiment, in step 2, word embedding pre-training is performed with the continuous bag-of-words (CBOW) model of word2vec to obtain real-valued vectors for all words in the target language, that is, the word vector representation.

In one embodiment, in step 3, text data is collected and cleaned, useless text and non-text content are removed, and the emotional tendency of each text is then labeled manually.

In one embodiment, the model components of the classification model comprise: a word2vec word embedding layer, a two-dimensional convolution layer, a fully connected layer, an extrusion (squash) pooling layer, a primary capsule layer, an intermediate capsule layer, a high-level capsule layer and a classification capsule layer;

the word2vec word embedding layer is used for mapping text into a word vector sequence; the word vector sequence forms a real matrix that is input into the two-dimensional convolution layer as a single-channel image, and the two-dimensional convolution layer extracts multi-scale features of the text by multi-scale convolution and converts them into vector capsules;

the fully connected layer is used for unifying the dimensions of the multi-scale features extracted by the two-dimensional convolution layer, after which the dimension-unified multi-scale features are fused based on attention weights;

the extrusion pooling layer is used for compressing the fused features into vectors with a modulus between 0 and 1, which then serve as the input of the primary capsule layer;

the primary capsule layer, the intermediate capsule layer, the high-level capsule layer and the classification capsule layer are used for passing the low-level semantic information extracted by the convolution layer step by step to the model output through attention static routing, so that the category of the text emotion is obtained.

In one embodiment, the step 5 training process is as follows:

1) The text data to be classified, $T = \{w_1, w_2, \ldots, w_n\}$, is input into the word2vec word embedding layer, and each word $w_i$ is mapped to a real vector $v_i \in \mathbb{R}^d$, so that the entire text becomes a matrix $D = \{v_1, v_2, \ldots, v_n\} \in \mathbb{R}^{d \times n}$, wherein $d$ is the dimension of the word vector and $n$ is the length of the text;

2) The matrix $D$ is input into the two-dimensional convolution layer as a single-channel image, and features are extracted from it with multi-scale convolution kernels to obtain multi-scale features, wherein the formula of the output shape is as follows:

wherein $\lfloor \cdot \rfloor$ denotes rounding down, $n_h$ is the longitudinal length of the matrix $D$, $k_h$ is the longitudinal length of the convolution kernel, $p_h$ is the longitudinal padding, and $s_h$ is the longitudinal stride;

3) The multi-scale features are brought to the same dimension by the fully connected layer to obtain the multi-scale output features $g_i$;

4) The multi-scale output features $g_i$ are fused by attention-weighted summation on the same output channel to obtain the fused feature $s_i$;

5) In extrusion pooling, the fused feature $s_i$ is compressed by the extrusion (squash) operation into a vector $c$ with a modulus no greater than 1, which is input into the subsequent capsule layers, wherein the formula of the extrusion operation is as follows:

6) The primary capsule layer, the intermediate capsule layer, the high-level capsule layer and the classification capsule layer are all connected, and the routing mode among the capsules adopts an attention static routing mechanism.
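The formulas that steps 2) and 5) refer to are not reproduced in this text. As a hedged reading only, the relations below give the standard convolution output-length formula and the standard capsule squash function. Treating $p_h$ as the total number of padded rows, and taking the squash form from the original capsule network literature, are both assumptions rather than statements of the patented formulas; $o_h$ is a symbol introduced here for the output length.

```latex
% Output length along the text axis (assumption: p_h counts all padded rows)
o_h = \left\lfloor \frac{n_h - k_h + p_h + s_h}{s_h} \right\rfloor

% Standard squash: keeps the direction of s_i and maps its modulus into [0, 1)
c = \frac{\lVert s_i \rVert^{2}}{1 + \lVert s_i \rVert^{2}} \cdot \frac{s_i}{\lVert s_i \rVert}
```

With $p_h = 0$ and $s_h = 1$ the first relation reduces to $n_h - k_h + 1$, so a kernel of longitudinal length 3 applied to a text of length $n$ gives an output of length $n - 2$.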

In one embodiment, the two-dimensional convolution layer has 5 multi-scale convolution kernels, of sizes $1 \times d$, $3 \times d$, $5 \times d$, $7 \times d$ and $9 \times d$, with longitudinal stride $s_h = 1$, longitudinal padding $p_h = 0$ and 256 output channels each, and the computed output shapes of the multi-scale convolution are: $o_1 \in \mathbb{R}^{n \times 1}$, $o_2 \in \mathbb{R}^{(n-2) \times 1}$, $o_3 \in \mathbb{R}^{(n-4) \times 1}$, $o_4 \in \mathbb{R}^{(n-6) \times 1}$, $o_5 \in \mathbb{R}^{(n-8) \times 1}$; the fully connected layers are respectively: $W_1 \in \mathbb{R}^{e \times n}$, $W_2 \in \mathbb{R}^{e \times (n-2)}$, $W_3 \in \mathbb{R}^{e \times (n-4)}$, $W_4 \in \mathbb{R}^{e \times (n-6)}$, $W_5 \in \mathbb{R}^{e \times (n-8)}$; the dimensions are unified as $W_i o_i = g_i \in \mathbb{R}^{e \times 1}$, wherein $g_i$ are the multi-scale output features of the same dimension and $e$ is the dimension after the multi-scale features are unified.

In one embodiment, the weighted fusion method is as follows:

the multi-scale output features are $m$ vectors $g_i \in \mathbb{R}^e$ on each channel, and let $g_i = k_i = v_i \in \mathbb{R}^e$; given a query vector $q \in \mathbb{R}^e$ for querying semantic feature importance and $m$ key-value pairs $(k_1, v_1), \ldots, (k_m, v_m)$, the fusion of the multi-scale features based on attention weights is expressed by the following formula:

$$s = \sum_{i=1}^{m} \alpha(q, k_i)\, v_i \in \mathbb{R}^e$$

wherein $q$ represents the query, $k$ the key and $v$ the value, and $q, k_1, \ldots, k_m, v_1, \ldots, v_m$ are the inputs of the fusion function;

the attention weight $\alpha(q, k_i)$ of $q$ and $k_i$ is obtained by mapping the vectors $q$ and $k_i$ to a scalar with the attention scoring function $a$ and then applying softmax, which gives a real-valued weight between 0 and 1; $\alpha(q, k_i)$ is calculated as follows:

$$\alpha(q, k_i) = \mathrm{softmax}\big(a(q, k_i)\big) = \frac{\exp\big(a(q, k_i)\big)}{\sum_{j=1}^{m} \exp\big(a(q, k_j)\big)} \in \mathbb{R}$$

the attention scoring function $a$ uses additive attention: given a vector $q \in \mathbb{R}^e$, a vector $k_i \in \mathbb{R}^e$, a learnable parameter matrix $W_q \in \mathbb{R}^{e \times e}$, a learnable parameter matrix $W_k \in \mathbb{R}^{e \times e}$ and a learnable parameter vector $w_v$ (applied as the row vector $w_v^{\top} \in \mathbb{R}^{1 \times e}$), the product of the matrix $W_q$ and the vector $q$ is added to the product of the matrix $W_k$ and the vector $k_i$, the sum is passed through the tanh function for nonlinear transformation, and the transpose of $w_v$ is multiplied by the result of the nonlinear transformation to obtain the attention score, which is a real number, calculated as follows:

$$a(q, k_i) = w_v^{\top} \tanh(W_q q + W_k k_i) \in \mathbb{R}$$

In one embodiment, the attention static routing mechanism assigns a weight to each vector by means of a learnable parameter matrix and an attention mechanism: low-level capsules 1, 2 and 3 respectively output vectors $v_1$, $v_2$ and $v_3$; the additive attention scoring function $a$ scores each output vector; the attention scores are fed into a softmax operation to obtain the corresponding weights; the weights are used for a weighted summation of $v_1$, $v_2$ and $v_3$ to obtain a vector $y$; and the extrusion (squash) operation is applied to $y$ to obtain a vector $v_i$ with a modulus between 0 and 1, which is then fed into the next capsule layer.

Compared with the prior art, the invention has the beneficial effects that:

First, the invention designs a new model structure, a capsule network based on attention static routing (CapsNet-ASR), whose overall structure is: word embedding layer, convolution layer, primary capsule layer, intermediate capsule layer, high-level capsule layer and classification capsule layer. Second, dynamic routing between the capsule layers is replaced by the attention static routing mechanism of the invention, so that the network learns during training how to assign routing weights to the lower-level capsules, which improves routing efficiency. Third, the convolution layer of the model uses multi-scale convolution kernels to better extract text information, and the multi-scale convolution features are fused by attention-weighted summation on the same output channel. Finally, the pooling operation usual in convolutional neural networks is replaced by an extrusion (squash) operation to improve the modeling of relationships among semantic features. Together these improvements strengthen the extraction of text features and the modeling of relationships among them, and ultimately improve the accuracy of text emotion classification.

Drawings

FIG. 1 is a diagram of a classification model architecture for an attention-based static routing capsule network.

FIG. 2 is a schematic diagram of fusion of multi-scale features based on attention weights.

Fig. 3 is an attention static routing diagram.

Detailed Description

The embodiments of the present invention will be described in detail below with reference to the drawings and examples.

Compared with existing text emotion classification methods, the invention replaces the dynamic routing process of the capsule network with static routing based on an attention mechanism, so that the network learns during training how to assign routing weights to the lower-level capsules; it also replaces the pooling operation usual in convolutional neural networks with an extrusion (squash) operation. The overall structure of the network model is: a word embedding layer, a two-dimensional convolution layer, a fully connected layer, an extrusion pooling layer, a primary capsule layer, an intermediate capsule layer, a high-level capsule layer and a classification capsule layer.

Specifically, the method comprises the following steps:

Step 1, collecting and organizing unlabeled text data of the target language.

This step collects text data appropriate to the specific task; the target language is the language in which the text emotion classification task is ultimately performed. The purpose of word embedding is to map each word to a real-valued vector so that the subsequent neural network can recognize similarity between words. Word embedding training relies only on the context of each word, so only unlabeled text in the target language needs to be collected.

For example, if the target task is microblog comment sentiment analysis, a large amount of unlabeled Chinese comment text is collected from the microblog platform and organized. Illustratively, in this step the text data is collected and cleaned, and useless text and non-text content such as hyperlinks, symbols and emoticons are removed to obtain the unlabeled text data. For instance, articles from the internet or texts from a microblog (topic texts, comment texts, etc.) are collected, and irrelevant hyperlinks, symbols, emoticons and the like are then removed. To ensure the quality of the word vectors, the more unlabeled text used for pre-training the better; in general the total should be no fewer than 1,000,000 words.
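The cleaning described above could look roughly like the following sketch. The patent does not say how hyperlinks, symbols and emoticons are detected, so the three regular expressions and the function name `clean_text` are assumptions made purely for illustration.

```python
import re

# Hypothetical cleaning rules; the source only names what is removed, not how.
URL_RE = re.compile(r"https?://\S+")                  # hyperlinks
BRACKET_EMOTICON_RE = re.compile(r"\[[^\[\]]{1,8}\]")  # bracketed emoticon codes such as "[smile]"
SYMBOL_RE = re.compile(r"[^\w\s]")                    # anything that is not a word character or whitespace


def clean_text(raw: str) -> str:
    """Remove hyperlinks, bracketed emoticon codes and symbols, then collapse whitespace."""
    text = URL_RE.sub(" ", raw)
    text = BRACKET_EMOTICON_RE.sub(" ", text)
    text = SYMBOL_RE.sub(" ", text)
    return " ".join(text.split())


cleaned = clean_text("Great post!! [smile] see https://example.com/a?b=1 :)")
```

In Python 3, `\w` matches Unicode word characters, so Chinese characters survive the symbol filter while punctuation and pictographic emoji do not.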

Step 2, training on the unlabeled text data from step 1 with the word2vec method to obtain word vector representations of the target language.

Specifically, word frequency statistics are computed over the unlabeled text data collected and organized in step 1 (for example, microblog comment texts), a vocabulary is built, and each word in the vocabulary is assigned a real vector $w_i \in \mathbb{R}^d$ to be trained. Word embedding training is then performed. The word2vec method comprises two models, the skip-gram model and the continuous bag-of-words (CBOW) model. The invention uses self-supervised training with the CBOW model of word2vec for word embedding pre-training, obtaining real-valued vectors for all words in the target language, that is, the word vector representation.

The continuous bag-of-words model assumes that a center word is generated from the context words surrounding it in the text sequence. For example, in the text sequence "i", "people", "love", "self", "have", "ancestor", "country", with "love" as the center word and a context window of 2, the continuous bag-of-words model considers the conditional probability of generating the center word "love" from the context words "i", "people", "self", "have", i.e.: P("love" | "i", "people", "self", "have").

All words in the prepared text are trained by maximum likelihood estimation, with the objective of maximizing this conditional probability, and the word vectors $w_i \in \mathbb{R}^d$ are updated iteratively.
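To make the context-window idea concrete, the sketch below lists the (context words, center word) pairs that a CBOW-style objective would be trained on. It only builds the pairs; it does not train word vectors. The function name `cbow_pairs` is an illustrative assumption.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, center_word) pairs for a CBOW-style objective."""
    pairs = []
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # up to `window` words before the center
        right = tokens[i + 1:i + 1 + window]     # up to `window` words after the center
        pairs.append((left + right, center))
    return pairs


tokens = ["i", "people", "love", "self", "have", "ancestor", "country"]
pairs = cbow_pairs(tokens, window=2)
context_for_love, center = pairs[2]
```

For the center word "love" the context is the four neighbouring tokens, which is the conditioning set in the probability written above; words at the edges of the sequence simply get a shorter context.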

Step 3, collecting and organizing labeled text data of the target language.

To accomplish the text emotion classification task, the model must be trained with supervision, so labeled text data needs to be collected, in which each training sample consists of a text and a category label, for example a comment text and its emotion label in microblog comment emotion classification.

Illustratively, as in step 1, this step also collects text data and cleans it to remove useless text and non-text content, after which the emotional tendency (positive, negative or neutral) of each text is labeled manually. Taking microblog sentiment analysis as an example, microblog comment texts are collected, emoticons, hyperlinks and other non-text content are removed, the emotional tendency of each comment text is labeled manually, and each comment finally becomes a text-emotion label pair. For example, a collected comment such as "this blogger is really young, we should learn from you" is manually given the emotion label "positive". In this way, a labeled text data set for microblog comment sentiment classification is obtained.

Step 4, constructing a classification model based on the attention static routing capsule network.

The model code of the classification model of the invention can be written with the PyTorch framework, which is currently widely used in academia. Referring to Fig. 1, the model components include: a word2vec word embedding layer, a two-dimensional convolution layer, a fully connected layer, an extrusion (squash) pooling layer, a primary capsule layer, an intermediate capsule layer, a high-level capsule layer and a classification capsule layer, wherein:

the word2vec word embedding layer is used for mapping the text into a word vector sequence; the word vector sequence forms a real matrix that is input into the two-dimensional convolution layer as a single-channel image, and the two-dimensional convolution layer extracts multi-scale features of the text by multi-scale convolution and converts them into vector capsules;

the fully connected layer is used for unifying the dimensions of the multi-scale features extracted by the two-dimensional convolution layer, after which the dimension-unified multi-scale features are fused based on attention weights;

the extrusion pooling layer is used for compressing the fused features into vectors with a modulus between 0 and 1, which then serve as the input of the primary capsule layer;

the primary capsule layer, the intermediate capsule layer, the high-level capsule layer and the classification capsule layer are used for passing the low-level semantic information extracted by the convolution layer step by step to the model output through attention static routing, so that the category of the text emotion is obtained.

The overall sequence of the classification model during training is as follows:

1) The text data to be classified is input into the word2vec word embedding layer. The input is $T = \{w_1, w_2, \ldots, w_n\}$; each word $w_i$ is mapped to a real vector $v_i \in \mathbb{R}^d$, so that the entire text becomes a matrix $D = \{v_1, v_2, \ldots, v_n\} \in \mathbb{R}^{d \times n}$, where $d$ is the dimension of the word vector and $n$ is the length of the text.

As shown in Fig. 1, the text "this blogger is really young" is mapped, after passing through the word embedding layer, into a real matrix $D \in \mathbb{R}^{d \times n}$, with text length $n = 10$ and hyperparameter $d = 64$.

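2) The matrix $D$ is input into the two-dimensional convolution layer as a single-channel image, and features are extracted from it with multi-scale convolution kernels to obtain multi-scale features; the output shape is determined by the following formula:

wherein $\lfloor \cdot \rfloor$ denotes rounding down, $n_h$ is the longitudinal length of the matrix $D$, $k_h$ is the longitudinal length of the convolution kernel, $p_h$ is the longitudinal padding, and $s_h$ is the longitudinal stride.

Illustratively, there are 5 multi-scale convolution kernels, of sizes $1 \times d$, $3 \times d$, $5 \times d$, $7 \times d$ and $9 \times d$. The longitudinal stride is $s_h = 1$, the longitudinal padding is $p_h = 0$, and each kernel size has 256 output channels. The computed output shapes of the multi-scale convolution are respectively: $o_1 \in \mathbb{R}^{n \times 1}$, $o_2 \in \mathbb{R}^{(n-2) \times 1}$, $o_3 \in \mathbb{R}^{(n-4) \times 1}$, $o_4 \in \mathbb{R}^{(n-6) \times 1}$, $o_5 \in \mathbb{R}^{(n-8) \times 1}$. For the above text "this blogger is really young" ($n = 10$), the output shapes are $o_1 \in \mathbb{R}^{10 \times 1}$, $o_2 \in \mathbb{R}^{8 \times 1}$, $o_3 \in \mathbb{R}^{6 \times 1}$, $o_4 \in \mathbb{R}^{4 \times 1}$, $o_5 \in \mathbb{R}^{2 \times 1}$.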
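The output lengths listed above can be checked with a few lines of arithmetic. The sketch below assumes the standard convolution output-length relation and treats $p_h$ as the total number of padded rows; the helper name `conv_out_len` is hypothetical.

```python
def conv_out_len(n_h, k_h, p_h=0, s_h=1):
    """Output length along the text axis: floor((n_h - k_h + p_h + s_h) / s_h).

    Assumption: p_h is the total number of padded rows (both sides combined).
    """
    return (n_h - k_h + p_h + s_h) // s_h


n = 10                               # text length from the example
kernel_heights = [1, 3, 5, 7, 9]     # longitudinal sizes of the five kernels
out_lens = [conv_out_len(n, k) for k in kernel_heights]
```

With stride 1 and no padding each kernel of longitudinal length $k$ shortens the sequence by $k - 1$, which is why the five feature vectors have different lengths and need to be unified afterwards.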

3) Because the output shapes of the multi-scale features differ within each output channel, the fully connected layers are needed to bring the multi-scale features to the same dimension, yielding the multi-scale output features $g_i$. The specific method is as follows:

Fully connected layers $W_1 \in \mathbb{R}^{e \times n}$, $W_2 \in \mathbb{R}^{e \times (n-2)}$, $W_3 \in \mathbb{R}^{e \times (n-4)}$, $W_4 \in \mathbb{R}^{e \times (n-6)}$, $W_5 \in \mathbb{R}^{e \times (n-8)}$ are used (for the above text "this blogger is really young", $W_1 \in \mathbb{R}^{e \times 10}$, $W_2 \in \mathbb{R}^{e \times 8}$, $W_3 \in \mathbb{R}^{e \times 6}$, $W_4 \in \mathbb{R}^{e \times 4}$, $W_5 \in \mathbb{R}^{e \times 2}$). The dimensions are unified as $W_i o_i = g_i \in \mathbb{R}^{e \times 1}$, where $g_i$ are the multi-scale output features of the same dimension and $e$ is the dimension after the multi-scale features are unified. Every matrix-vector product $W_i o_i$ yields an $e$-dimensional vector; for example, $W_1$ is an $e \times n$ matrix and $o_1$ is an $n \times 1$ vector, so $W_1 o_1$ is an $e \times 1$ vector.
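The dimension unification in step 3) is a plain matrix-vector product per scale. The following sketch uses random numbers in place of learned weights and convolution outputs, for one output channel only; the value `e = 16` and the helper names are assumptions chosen for illustration.

```python
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def make_fc(e, in_len, rng):
    """Random e x in_len weight matrix standing in for a learned W_i."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_len)] for _ in range(e)]


rng = random.Random(0)
e = 16                               # unified dimension (hypothetical value)
out_lens = [10, 8, 6, 4, 2]          # lengths of o_1..o_5 for n = 10

o = [[rng.random() for _ in range(length)] for length in out_lens]  # o_i, one channel
W = [make_fc(e, length, rng) for length in out_lens]                # W_i in R^{e x len(o_i)}
g = [matvec(W_i, o_i) for W_i, o_i in zip(W, o)]                    # g_i in R^e
```

Each scale gets its own weight matrix because the input lengths differ, while all five outputs share the length `e`, which is what makes the later weighted sum well defined.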

4) After the dimensions have been unified, the multi-scale output features $g_i$ are fused by attention-weighted summation on the same output channel to obtain the fused feature $s_i$.

Referring to Fig. 2, the multi-scale output features are $m$ vectors $g_i \in \mathbb{R}^e$ on each channel, and let $g_i = k_i = v_i \in \mathbb{R}^e$. Suppose there is a query vector $q \in \mathbb{R}^e$ for querying semantic feature importance and $m$ key-value pairs $(k_1, v_1), \ldots, (k_m, v_m)$; the fusion of the multi-scale features based on attention weights can then be expressed by the following formula:

$$s = \sum_{i=1}^{m} \alpha(q, k_i)\, v_i \in \mathbb{R}^e$$

wherein $q$ represents the query, $k$ the key and $v$ the value, and $q, k_1, \ldots, k_m, v_1, \ldots, v_m$ are the inputs of the fusion function.

The attention weight $\alpha(q, k_i)$ of $q$ and $k_i$ is obtained by mapping the vectors $q$ and $k_i$ to a scalar with the attention scoring function $a$ and then applying softmax, which gives a real-valued weight between 0 and 1. The attention weight $\alpha(q, k_i)$ is calculated as follows:

$$\alpha(q, k_i) = \mathrm{softmax}\big(a(q, k_i)\big) = \frac{\exp\big(a(q, k_i)\big)}{\sum_{j=1}^{m} \exp\big(a(q, k_j)\big)} \in \mathbb{R}$$

The attention scoring function $a$ uses additive attention. Given a vector $q \in \mathbb{R}^e$, a vector $k_i \in \mathbb{R}^e$, a learnable parameter matrix $W_q \in \mathbb{R}^{e \times e}$, a learnable parameter matrix $W_k \in \mathbb{R}^{e \times e}$ and a learnable parameter vector $w_v$ (applied as the row vector $w_v^{\top} \in \mathbb{R}^{1 \times e}$), the product of the matrix $W_q$ and the vector $q$ is added to the product of the matrix $W_k$ and the vector $k_i$, the sum is passed through the tanh function for nonlinear transformation, and the transpose of $w_v$ is multiplied by the result of the nonlinear transformation to obtain the attention score, which is a real number. The calculation formula is as follows:

$$a(q, k_i) = w_v^{\top} \tanh(W_q q + W_k k_i) \in \mathbb{R}$$
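The attention-weighted fusion of step 4) can be sketched in plain Python as below. The parameters are random stand-ins for values that would be learned during training, the sizes `e = 4` and `m = 5` are toy values, and all function names are assumptions made for this illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def additive_score(q, k, W_q, W_k, w_v):
    """Additive attention score: w_v . tanh(W_q q + W_k k), a single real number."""
    hidden = [math.tanh(a + b) for a, b in zip(matvec(W_q, q), matvec(W_k, k))]
    return sum(w * h for w, h in zip(w_v, hidden))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def attention_fuse(q, g, W_q, W_k, w_v):
    """Fuse m equal-length vectors g_i (keys = values = g_i) into one vector s."""
    alphas = softmax([additive_score(q, k, W_q, W_k, w_v) for k in g])
    e = len(g[0])
    s = [sum(a * v[j] for a, v in zip(alphas, g)) for j in range(e)]
    return s, alphas


# Toy sizes and random parameters (hypothetical; in training these are learned).
rng = random.Random(0)
e, m = 4, 5


def rand_vec():
    return [rng.uniform(-1, 1) for _ in range(e)]


q, w_v = rand_vec(), rand_vec()
W_q = [rand_vec() for _ in range(e)]
W_k = [rand_vec() for _ in range(e)]
g = [rand_vec() for _ in range(m)]
s, alphas = attention_fuse(q, g, W_q, W_k, w_v)
```

The weights `alphas` sum to 1, so the fused vector `s` is a convex combination of the five scale-specific features on that channel.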

5) In extrusion pooling, the fused feature $s_i$ is compressed by the extrusion (squash) operation into a vector $c$ with a modulus no greater than 1, which is then fed into the subsequent capsule layers, as shown below.

6) The primary capsule layer, the intermediate capsule layer, the high-level capsule layer and the classification capsule layer are all connected, and routing between capsules uses the attention static routing mechanism.

The conventional dynamic routing mechanism assigns weights to the output vectors of the low-level capsules iteratively, whereas the attention static routing mechanism assigns a weight to each vector by means of a learnable parameter matrix and an attention mechanism. As shown in Fig. 3, low-level capsules 1, 2 and 3 respectively output vectors $v_1$, $v_2$ and $v_3$; the additive attention scoring function $a$ scores each output vector, the attention scores are fed into a softmax operation to obtain the corresponding weights, the weights are used for a weighted summation of $v_1$, $v_2$ and $v_3$ to obtain a vector $y$, and the extrusion operation is applied to $y$ to obtain a vector $v_i$ with a modulus between 0 and 1, which is then fed into the next capsule layer.
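A single routing step of this kind (score, softmax, weighted sum, squash) can be sketched as follows. Two things here are assumptions rather than the patented definitions: the squash function is the standard one from the capsule network literature, and `toy_score` is a hypothetical stand-in for the learned additive scoring function.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def squash(y, eps=1e-9):
    """Standard capsule squash (assumed form): keeps direction, maps the norm into [0, 1)."""
    sq_norm = sum(c * c for c in y)
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * c for c in y]


def static_route(lower_outputs, score_fn):
    """One non-iterative routing step: score, softmax, weighted sum, squash."""
    weights = softmax([score_fn(v) for v in lower_outputs])
    dim = len(lower_outputs[0])
    y = [sum(w * v[j] for w, v in zip(weights, lower_outputs)) for j in range(dim)]
    return squash(y), weights


def toy_score(v):
    """Hypothetical stand-in for the learned scoring function."""
    return math.tanh(0.5 * v[0] - 0.25 * v[1])


v1, v2, v3 = [0.2, 0.1], [0.6, -0.3], [-0.4, 0.5]
out, weights = static_route([v1, v2, v3], toy_score)
out_norm = math.sqrt(sum(c * c for c in out))
squashed_norm = math.sqrt(sum(c * c for c in squash([3.0, 4.0])))
```

Unlike dynamic routing, nothing in `static_route` loops over routing iterations: the weights come from one forward pass through the scoring function, which is the source of the time saving the description claims.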

Step 5, performing supervised training of the classification model from step 4 with the labeled text data of the target language obtained in step 3. For example, the emotion category of the text "this blogger is really young" has been manually labeled as "positive". During supervised training, the loss is calculated between the predicted result of the model and the actual category "positive", and the model parameters are updated with the backpropagation algorithm.

Step 6, evaluating the classification model trained in step 5 by accuracy, precision, recall and F1 score. After training is completed, the model is tested on a test data set that was not used for training. The test results are evaluated by accuracy, precision, recall and F1 score, finally yielding a text emotion classification model that meets the requirements, which is then used to classify the emotion of input text.

Accuracy is the proportion of all samples that are predicted correctly, calculated as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision indicates how many of the samples predicted as positive are truly positive, calculated as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

Recall indicates how many of the positive examples in the samples are predicted correctly, calculated as follows:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

The F1 score is a statistical index for measuring the accuracy of a binary classification model. It takes both the precision and the recall of the classification model into account. The F1 score can be viewed as the harmonic mean of the model's precision and recall, reflecting the robustness of the model, with a maximum of 1 and a minimum of 0. The F1 score is calculated as follows:

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

To obtain accuracy, precision, recall and F1 score, a confusion matrix is compiled to count TP, TN, FP and FN. In a classification task there are four possible combinations of predicted and actual results, which form the confusion matrix shown in the following table:

	Predicted positive	Predicted negative
Actually positive	TP	FN
Actually negative	FP	TN

In the confusion matrix, TP (True Positive) is the number of samples predicted as positive that are actually positive; FP (False Positive) is the number of samples predicted as positive that are actually not positive; FN (False Negative) is the number of samples predicted as negative that are actually not negative; TN (True Negative) is the number of samples predicted as negative that are actually negative.

Positive and negative examples are relative. For example, in the emotion classification task the emotion of a sentence can be one of three categories: positive, neutral and negative. If positive is chosen as the positive example, then neutral and negative together are called negative examples.
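The one-versus-rest counting just described can be sketched as follows. The labels and predictions are made-up toy data, and the function names are assumptions; the metric formulas are the standard definitions in terms of TP, TN, FP and FN.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, TN, FP, FN, treating `positive` as the positive class and all others as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts (0.0 when undefined)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data, invented for illustration only.
y_true = ["positive", "positive", "neutral", "negative", "positive", "negative"]
y_pred = ["positive", "neutral", "neutral", "positive", "positive", "negative"]

counts = one_vs_rest_counts(y_true, y_pred, positive="positive")
scores = metrics(*counts)
```

Note that the accuracy computed this way is the one-versus-rest accuracy for the chosen positive class: a neutral text predicted as negative would still count as a true negative here.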

The CapsNet-ASR model of the invention was evaluated on the Chinese emotion text data sets ASAP, ChnSentiCorp, NLPCC14-SC and SE-ABSA16 and compared with traditional models, using accuracy as the index. The comparison results are shown in the following table (accuracy in percent); the method gives a clear improvement in Chinese text sentiment classification.

Model	ASAP	ChnSentiCorp	NLPCC14-SC	SE-ABSA16
RNN	75.9	84.4	83.8	83.1
LSTM	80.3	85.7	84.5	89.5
Generic capsule network	81.2	88.9	87.5	90.8
CapsNet-ASR	84.5	92.2	91.9	91.5

In addition, because CapsNet-ASR uses a static routing mechanism whereas the generic capsule network uses a dynamic routing mechanism, the training time of the invention should in theory be shorter. To verify this, the two models were each trained for 60 epochs on the data sets ASAP, ChnSentiCorp, NLPCC14-SC and SE-ABSA16. The experimental results show that the training time of the CapsNet-ASR model is clearly shorter than that of the generic capsule network. The results are shown in the following table, in which the figures are the training times of the models on each data set, in hours.

Model	ASAP	ChnSentiCorp	NLPCC14-SC	SE-ABSA16
Generic capsule network	8	14	16	9
CapsNet-ASR	3	8	10	6
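The reduction in training time can be expressed as a ratio per data set. The short sketch below only re-uses the hours reported in the table above; the dictionary layout and variable names are illustrative.

```python
# Training hours from the table above: (generic capsule network, CapsNet-ASR).
train_hours = {
    "ASAP": (8, 3),
    "ChnSentiCorp": (14, 8),
    "NLPCC14-SC": (16, 10),
    "SE-ABSA16": (9, 6),
}

# Ratio of generic training time to CapsNet-ASR training time, rounded to two decimals.
speedup = {name: round(generic / asr, 2) for name, (generic, asr) in train_hours.items()}
```

On these figures the generic capsule network takes between 1.5 and about 2.7 times as long to train as CapsNet-ASR for the same 60 epochs.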

Claims

1. A text emotion classification method based on an attention static routing capsule network is characterized by comprising the following steps:

step 2, training on the unlabeled text data from step 1 with the word2vec method to obtain word vector representations of the target language;

step 3, collecting labeled text data of the target language;

2. The text emotion classification method based on attention static routing capsule network according to claim 1, wherein in step 1, text data is collected and cleaned to remove useless text and non-text content so as to obtain the unlabeled text data, wherein the unlabeled text contains no fewer than 1,000,000 words in total.

3. The text emotion classification method based on attention static routing capsule network according to claim 1, wherein in step 2, word embedding pre-training is performed with the continuous bag-of-words (CBOW) model of word2vec to obtain real-valued vectors for all words in the target language, namely the word vector representation.

4. The text emotion classification method based on attention static routing capsule network according to claim 1, wherein in step 3, text data is collected and cleaned to remove useless text and non-text content, and the emotional tendency of each text is then labeled manually.

5. The text emotion classification method based on attention static routing capsule network according to claim 1, wherein the model components of the classification model comprise: a word2vec word embedding layer, a two-dimensional convolution layer, a fully connected layer, an extrusion (squash) pooling layer, a primary capsule layer, an intermediate capsule layer, a high-level capsule layer and a classification capsule layer;

6. The text emotion classification method based on attention static routing capsule network as claimed in claim 4, wherein in the step 5, the training process is as follows:

1) The text data to be classified, $T = \{w_1, w_2, \ldots, w_n\}$, is input into the word2vec word embedding layer, and each word $w_i$ is mapped to a real vector $v_i \in \mathbb{R}^d$, so that the entire text becomes a matrix $D = \{v_1, v_2, \ldots, v_n\} \in \mathbb{R}^{d \times n}$, wherein $d$ is the dimension of the word vector and $n$ is the length of the text;

wherein $\lfloor \cdot \rfloor$ denotes rounding down, $n_h$ is the longitudinal length of the matrix $D$, $k_h$ is the longitudinal length of the convolution kernel, $p_h$ is the longitudinal padding, and $s_h$ is the longitudinal stride.

4) The multi-scale output features $g_i$ are fused by attention-weighted summation on the same output channel to obtain the fused feature $s_i$;

7. The text emotion classification method based on attention static routing capsule network according to claim 6, wherein the two-dimensional convolution layer has 5 multi-scale convolution kernels, of sizes $1 \times d$, $3 \times d$, $5 \times d$, $7 \times d$ and $9 \times d$, with longitudinal stride $s_h = 1$, longitudinal padding $p_h = 0$ and 256 output channels each, and the computed output shapes of the multi-scale convolution are: $o_1 \in \mathbb{R}^{n \times 1}$, $o_2 \in \mathbb{R}^{(n-2) \times 1}$, $o_3 \in \mathbb{R}^{(n-4) \times 1}$, $o_4 \in \mathbb{R}^{(n-6) \times 1}$, $o_5 \in \mathbb{R}^{(n-8) \times 1}$; the fully connected layers are respectively: $W_1 \in \mathbb{R}^{e \times n}$, $W_2 \in \mathbb{R}^{e \times (n-2)}$, $W_3 \in \mathbb{R}^{e \times (n-4)}$, $W_4 \in \mathbb{R}^{e \times (n-6)}$, $W_5 \in \mathbb{R}^{e \times (n-8)}$; the dimensions are unified as $W_i o_i = g_i \in \mathbb{R}^{e \times 1}$, wherein $g_i$ are the multi-scale output features of the same dimension and $e$ is the dimension after the multi-scale features are unified.

8. The text emotion classification method based on attention static routing capsule network according to claim 6, wherein the weighted fusion method is as follows:

$$s = \sum_{i=1}^{m} \alpha(q, k_i)\, v_i \in \mathbb{R}^e$$

wherein the attention weight $\alpha(q, k_i)$ of $q$ and $k_i$ is obtained by mapping the vectors $q$ and $k_i$ to a scalar with the attention scoring function $a$ and then applying softmax, which gives a real-valued weight between 0 and 1; $\alpha(q, k_i)$ is calculated as follows:

$$\alpha(q, k_i) = \mathrm{softmax}\big(a(q, k_i)\big) = \frac{\exp\big(a(q, k_i)\big)}{\sum_{j=1}^{m} \exp\big(a(q, k_j)\big)} \in \mathbb{R}$$

the attention scoring function $a$ uses additive attention: given a vector $q \in \mathbb{R}^e$, a vector $k_i \in \mathbb{R}^e$, a learnable parameter matrix $W_q \in \mathbb{R}^{e \times e}$, a learnable parameter matrix $W_k \in \mathbb{R}^{e \times e}$ and a learnable parameter vector $w_v$ (applied as the row vector $w_v^{\top} \in \mathbb{R}^{1 \times e}$), the product of the matrix $W_q$ and the vector $q$ is added to the product of the matrix $W_k$ and the vector $k_i$, the sum is passed through the tanh function for nonlinear transformation, and the transpose of $w_v$ is multiplied by the result of the nonlinear transformation to obtain the attention score, which is a real number, calculated as follows:

$$a(q, k_i) = w_v^{\top} \tanh(W_q q + W_k k_i) \in \mathbb{R}$$

9. The text emotion classification method based on attention static routing capsule network according to claim 6, wherein the attention static routing mechanism assigns a weight to each vector by means of a learnable parameter matrix and an attention mechanism: low-level capsules 1, 2 and 3 respectively output vectors $v_1$, $v_2$ and $v_3$; the additive attention scoring function $a$ scores each output vector; the attention scores are fed into a softmax operation to obtain the corresponding weights; the weights are used for a weighted summation of $v_1$, $v_2$ and $v_3$ to obtain a vector $y$; and the extrusion (squash) operation is applied to $y$ to obtain a vector $v_i$ with a modulus between 0 and 1, which is then fed into the next capsule layer.