CN115757792A - Microblog text sentiment classification method based on deep learning - Google Patents

Info

Publication number: CN115757792A
Application number: CN202211504882.6A
Authority: CN (China)
Prior art keywords: text, emotion, word, sentiment, vector
Legal status: Pending (the legal status is an assumption, not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 陈藜文, 肖正
Current and original assignee: Hunan University (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Application filed by Hunan University
Priority to CN202211504882.6A
Publication of CN115757792A
Pending legal-status Critical Current

Classifications

    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management (Y02D: climate change mitigation technologies in information and communication technologies)
Abstract

The invention provides a microblog text sentiment classification method based on deep learning. The method obtains a microblog source text and preprocesses it; inputs the preprocessed text into a GloVe pre-training model and a BERT pre-training model to generate the corresponding word vectors; inputs the words into the SenticNet sentiment dictionary to obtain sentiment polarity values; stacks and embeds the generated word vectors and feeds them into a CNN and a BiGRU, which output local and global feature vectors; splices the local feature vectors, the global feature vectors and the sentiment dimension distribution vectors and inputs them into a fully connected layer; classifies with a Softmax function; averages the resulting sentiment polarity values; and judges sentiment tendency against a set threshold. The method addresses the poor feature-extraction capability of traditional sentiment classification models and their inability to handle polysemy, sarcastic semantics and the like, and improves the sentiment classification effect on text.

Description

Microblog text sentiment classification method based on deep learning
Technical Field
The invention relates to the technical field of natural language processing, in particular to a microblog text sentiment classification method based on deep learning.
Background
Text emotion analysis refers to the process of analyzing, processing and extracting subjective text with emotional coloring using natural language processing and text mining techniques, and it has remained one of the hot research problems in these fields in recent years. According to task type, emotion analysis can be divided into sub-problems such as emotion classification, emotion retrieval and emotion extraction. Emotion classification, also called emotion tendency analysis, identifies whether the emotion of a given subjective text tends to be positive or negative.
The main methods currently used to analyze the emotional tendency of subjective text are the semantics-based emotion dictionary method, machine learning-based methods and deep learning-based methods, described as follows:
1. The semantics-based emotion dictionary method. The construction of the emotion dictionary is the precondition and basis of emotion classification; in actual use, emotion dictionaries are divided into four types: general emotion words, degree adverbs, negation words and domain words. At home and abroad, emotion dictionaries are mainly built from existing semantic resource vocabularies, and the text is treated as a set of words; the text is decomposed into paragraphs and syntactically analyzed according to formulated rules of linguistic expression, with the emotion dictionary annotated manually. For English, work mainly extends the WordNet dictionary, using synonymy and near-synonymy relations among words in WordNet to judge the emotional tendency of emotion words and thereby the emotional polarity of the text's viewpoint. For Chinese, work mainly extends HowNet, using a semantic similarity calculation method to compute the similarity between a word and reference emotion words to judge the word's emotional tendency. In summary, words in the text are matched against the emotion dictionary and an emotional-tendency score for the text is calculated; the classification effect depends on the quality of the emotion dictionary, and generality across different domains is low.
2. Machine learning-based methods. Text tendency is labeled manually to serve as a training set, text emotion features are extracted, an emotion classifier is built with a machine learning method, and the classifier assigns a tendency class to the text to be classified. Commonly used emotion classification features include emotion words, parts of speech, syntactic structures, negation expression templates, connectives and semantic topics; feature extraction methods include Information Gain (IG), the CHI statistic (chi-square, CHI) and Document Frequency (DF); commonly used classification methods include the centroid vector method, K-Nearest-Neighbor (KNN) classification, naive Bayes, support vector machines, conditional random fields, maximum entropy classification, decision trees and the like. These methods need manual labeling, which is laborious and time-consuming, and the choice of features directly affects the performance of the emotion analysis task.
3. Deep learning-based methods. In recent years, deep learning algorithms have been overtaking other traditional emotion analysis methods. These algorithms detect emotions or opinions in text without feature engineering. A variety of deep learning algorithms, notably recurrent neural networks and convolutional neural networks, have been applied to sentiment analysis and provide more accurate results than machine learning models. For emotion analysis of Chinese text on social platforms, however, problems such as short texts, polysemy and irony degrade the effect of deep learning and text feature extraction, and thus the emotion classification effect.
Disclosure of Invention
Aiming at the defects or improvement requirements of the prior art, the invention provides a microblog emotion classification method based on deep learning. To solve the problems that traditional emotion classification models have poor feature-extraction capability and cannot handle polysemy, ironic semantics and the like, the feature-extraction mode of the text is improved by stacking and embedding static and dynamic word vectors. Because dependency relationships in the original information and the context information are otherwise not fully considered, a CNN and a BiGRU are constructed to extract local and global features in parallel. To capture the influence of specific words on the emotional polarity of the text, the SenticNet dictionary is introduced to calculate emotion polarity and emotion dimension distribution; the features of the text and the words are fused and a fine-grained analysis is performed across four emotional dimensions, improving the text emotion classification effect.
According to a first aspect of the invention, a microblog text sentiment classification method based on deep learning is provided, and is characterized by comprising the following steps:
step 1: and preprocessing microblog text data, wherein the microblog text data are crawled as source texts, the source texts are subjected to data cleaning through regular expressions, special symbols and labels are removed, chinese word segmentation is carried out on the source texts, stop words in the source texts are removed through a natural language processing tool kit, interference features are eliminated, and texts D are obtained.
Step 2: generating word vectors, which includes converting the words in text D into static word vectors S_g = [w_g1, w_g2, ..., w_gt] according to a GloVe model, converting the words in text D into dynamic word vectors S_b = [w_b1, w_b2, ..., w_bt] according to a BERT model, and performing similarity calculation on the words in text D according to the SenticNet emotion dictionary to obtain a first emotion polarity value P_sentic = polarity(w_i) and an emotion dimension distribution vector S_w = [pleasantness_w, attention_w, sensitivity_w, aptitude_w], where t is the dimension of the generated word vectors.
Step 3: extracting feature vectors with the neural network models, which includes stack-embedding the static word vectors S_g = [w_g1, w_g2, ..., w_gt] and the dynamic word vectors S_b = [w_b1, w_b2, ..., w_bt] to generate the input word vectors S_input, inputting S_input into a convolutional neural network to obtain local feature vectors, and inputting S_input into a BiGRU model to obtain global feature vectors.
Step 4: text emotion classification, which includes splicing the local feature vectors, the global feature vectors and the emotion dimension distribution vector, inputting the spliced vector into a fully connected layer for processing, then inputting the result into a Softmax classifier to calculate an emotion polarity value, obtaining a second emotion polarity value P_CNN_BiGRU, averaging the first and second emotion polarity values to obtain the emotion polarity value P_D of the text D, performing emotion classification judgment against a preset threshold, and outputting the text emotion classification result.
Further, the microblog text sentiment classification method based on deep learning provided by the invention is characterized in that the step 2 comprises the following steps:
Step 2-1: the GloVe model is a word characterization tool based on global word-frequency statistics. A co-occurrence matrix X is constructed from the text D, where each element x_ij represents the number of co-occurrences of word i and word j within a context window of a particular size; a weight is calculated with the decay function decay(d) = 1/d based on the distance d between the two words in the context window. Word vectors w_i are generated from the co-occurrence matrix, and the static word vectors S_g = [w_g1, w_g2, ..., w_gt] are output.
Step 2-2: a BERT model of the Chinese_L-12_H-768_A-12 configuration is adopted, in which the number of Transformer Encoder layers is 12, the hidden-layer dimensionality is 768, and the number of self-attention heads is 12. The word-vector representation of text D is input; the Transformer Encoder layers and the self-attention mechanism fuse full-text semantic information into an enhanced vector representation of each word in text D, and the dynamic word vectors S_b = [w_b1, w_b2, ..., w_bt] are output.
Step 2-3: the SenticNet emotion dictionary is used to calculate emotion intensity. SenticNet is a concept-level knowledge base that provides semantic, emotion and polarity associations: semantics refers to the five concepts most semantically related to an input concept, emotion refers to the emotion values in four emotional dimensions (pleasantness, attention, sensitivity, aptitude), and polarity is a value in the interval [-1, 1]. By inputting the words in text D, the first emotion polarity value P_sentic = polarity(w_i) and the emotion dimension distribution vector S_w = [pleasantness_w, attention_w, sensitivity_w, aptitude_w] are obtained.
Further, the microblog text sentiment classification method based on deep learning provided by the invention is characterized in that the step 3 comprises the following steps:
Step 3-1: the convolutional layer is composed of several feature maps, each composed of several neurons; a neuron is connected to the previous layer through a convolution kernel over the input S_input. Each convolution kernel extracts a certain local feature to obtain a local feature matrix, in which each row corresponds to the word vector of one word. When text features are extracted, the convolution kernels perform the convolution operation from top to bottom; after the convolution operation is finished, a nonlinear mapping is applied to the result of each kernel, using the ReLU function f(x) = ReLU(x) = max(0, x) as the activation function, to obtain the feature matrix Z = f(W * S_input + b) = ReLU(W * S_input + b), where W is a weight matrix and b is a bias term. The convolutional layer adopts convolution kernels of three different sizes (Conv2, Conv3, Conv5) to capture features between word sequences at different distances. The pooling layer extracts the largest feature value in each pooling region of the feature map by max pooling to reduce the dimension of the feature information: Z_max = max(Z_i) represents the feature extracted by the max-pooling layer, where Z_i is the i-th feature map Z and max takes the maximum value.
Step 3-2: the BiGRU model is used to extract global features of the text data. It contains a reset gate and an update gate, and obtains the two gating states from the previous state h_{t-1} and the input S_input-t of the current node. The hidden layer h_t at time t is obtained as the weighted sum of the forward hidden-layer state h_t^f and the reverse hidden-layer state h_t^b, calculated as follows:

h_t^f = GRU(S_input-t, h_{t-1}^f),

h_t^b = GRU(S_input-t, h_{t-1}^b),

h_t = w_t * h_t^f + v_t * h_t^b + b_t,

where S_input-t represents the input of the current hidden layer, h_{t-1}^f represents the forward hidden-layer state at time (t-1), h_{t-1}^b represents the reverse hidden-layer state at time (t-1), w_t and v_t respectively represent the forward and reverse hidden-layer weights of the BiGRU model at time t, and b_t represents the bias of the hidden-layer state at time t.
Further, the microblog text sentiment classification method based on deep learning provided by the invention is characterized in that the step 4 comprises the following steps:
The local feature vector Z_max, the global feature vector h_t and the emotion dimension distribution vector S_w are spliced and processed by a fully connected layer; a Dropout layer is fused before the fully connected layer to prevent model overfitting, and the result is input into the Softmax classifier for the text emotion classification operation:

Y' = Dropout(a * y) + b,

Y = softmax(Y'),

where a and b are the weight matrix and bias value of the fully connected layer, y is the spliced feature vector, and the second emotion polarity value P_CNN_BiGRU = polarity(Y) is output.
P_CNN_BiGRU = polarity(Y) and P_sentic = polarity(w_i) are averaged to obtain the emotion polarity value of the text D:

P_D = (P_sentic + P_CNN_BiGRU) / 2,

and emotion classification judgment is performed according to a preset threshold, outputting the text emotion classification result.
According to a third aspect of the present invention, there is provided a computer apparatus comprising:
a memory to store instructions; and a processor for invoking the memory-stored instructions to perform the method of the first or second aspect.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium characterized by instructions stored thereon which, when executed by a processor, perform the method of the first or second aspect.
Compared with the prior art, the technical scheme of the invention at least has the following beneficial effects:
1. Because the invention adopts a word-embedding mode that combines the static word vectors of a GloVe model with the dynamic word vectors of a BERT model, the representation of text information can be adjusted according to context, addressing the traditional models' difficulty with polysemy; in addition, the GloVe word-embedding method trains faster than Word2Vec.
2. The invention extracts local and global features by inputting the word vectors into the CNN and BiGRU models simultaneously, and combines them with the emotion dimension distribution vectors extracted from the SenticNet knowledge base of semantic, emotion and polarity associations, so the features of the text data are obtained more comprehensively, yielding a better and more stable emotion classification effect.
3. The disclosed method is stable and is more generally applicable to emotion classification of text.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flowchart illustrating a microblog emotion classification method based on deep learning according to an exemplary embodiment.
FIG. 2 is a flowchart illustrating the stacking embedding of word vectors, according to an example embodiment.
FIG. 3 is a flowchart illustrating a microblog emotion classification method based on deep learning according to another exemplary embodiment.
FIG. 4 is an illustration of a BERT pre-training model in accordance with an exemplary embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the respective embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides a microblog emotion classification method based on deep learning, which comprises the following steps 1-4 as shown in figure 1:
step 1: the method comprises the steps of crawling microblog text data as a source text, cleaning the source text by a regular expression, removing special symbols and labels, performing Chinese word segmentation on the source text, removing stop words in the source text by a natural language processing toolkit, removing interference characteristics and obtaining a text D.
The method first obtains the source text, i.e., collects microblog data, and then preprocesses the text; the preprocessing includes tokenization, stop-word removal, POS tagging and the like.
In step 1, microblog data are crawled from Sina Weibo as the source text; the source text is cleaned with regular expressions, removing special symbols and useless labels; Chinese word segmentation is performed on the source text with the jieba segmentation tool; and preprocessing operations such as stop-word removal are performed with a natural language processing toolkit to remove interference features and reduce the complexity of subsequent processing, obtaining the text D.
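As a rough illustration of this preprocessing step, the following self-contained sketch uses only regular expressions; the tiny stop-word list and the whitespace-split tokenizer are stand-ins for the full NLP-toolkit stop-word list and the jieba segmenter:

```python
import re

# Hypothetical stop-word list; the method uses the full list shipped
# with an NLP toolkit.
STOP_WORDS = {"的", "了", "是", "我"}

def preprocess(source_text, stop_words=STOP_WORDS):
    """Clean one raw microblog post: strip URLs, @-mentions, #topic# labels
    and special symbols, then tokenize and drop stop words. Whitespace
    split stands in for the jieba segmenter to keep the sketch
    dependency-free."""
    text = re.sub(r"https?://\S+", "", source_text)    # remove links
    text = re.sub(r"@\S+", "", text)                   # remove @user mentions
    text = re.sub(r"#[^#]*#", "", text)                # remove #topic# labels
    text = re.sub(r"[^\w ]+", " ", text)               # drop special symbols
    tokens = text.split()                              # stand-in for jieba.lcut
    return [w for w in tokens if w not in stop_words]

tokens = preprocess("转发 #话题# @user https://t.cn/xyz 今天 心情 很好 !!!")
```

In a real pipeline, `jieba.lcut` would replace the whitespace split so that unsegmented Chinese text is handled correctly.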
Step 2: generating word vectors, which includes converting the words in text D into static word vectors S_g = [w_g1, w_g2, ..., w_gt] according to a GloVe model, converting the words in text D into dynamic word vectors S_b = [w_b1, w_b2, ..., w_bt] according to a BERT model, and performing similarity calculation on the words in text D according to the SenticNet emotion dictionary to obtain the first emotion polarity value P_sentic = polarity(w_i) and the emotion dimension distribution vector S_w = [pleasantness_w, attention_w, sensitivity_w, aptitude_w], where t is the dimension of the generated word vectors.
The text or words obtained in step 1 are mapped to real-valued vectors, and words with the same or related semantics are represented as similar vectors, so that the machine can understand, for example, that the vector for "king" - "man" + "woman" is close to the vector for "queen". The word-vector step adopts a GloVe model and a BERT model to generate the corresponding static and dynamic word vectors respectively, and the text words are input into the SenticNet emotion dictionary to generate the corresponding emotion polarity values and emotion dimension distribution vectors.
Step 3: extracting feature vectors based on the neural network models, which includes stack-embedding the static word vectors S_g = [w_g1, w_g2, ..., w_gt] and the dynamic word vectors S_b = [w_b1, w_b2, ..., w_bt] to generate the input word vectors S_input, inputting S_input into a convolutional neural network to obtain local feature vectors, and inputting S_input into a BiGRU model to obtain global feature vectors.
Step 4: text sentiment classification, which includes splicing the local feature vectors, the global feature vectors and the sentiment dimension distribution vector, inputting the spliced vector into a fully connected layer for processing, then inputting the result into a Softmax classifier to calculate a sentiment polarity value, obtaining the second sentiment polarity value P_CNN_BiGRU, averaging the first and second sentiment polarity values to obtain the sentiment polarity value P_D of the text D, performing sentiment classification judgment against a preset threshold, and outputting the text sentiment classification result.
Aiming at the problems that traditional emotion classification models have poor feature-extraction capability and cannot handle polysemy, irony and the like, the feature-extraction mode of the text is improved by stacking and embedding static and dynamic word vectors. Because dependency relationships in the original information and the context information are otherwise not fully considered, a CNN and a BiGRU are constructed to extract local and global features in parallel. Aiming at the influence of specific words on the emotional polarity of the text, the SenticNet dictionary is introduced to calculate emotion polarity and emotion dimension distribution; the features of the text and the words are fused and fine-grained analysis is performed across the four emotional dimensions, improving the effect of text emotion classification.
In some embodiments, the word vectors of step 2 are generated with the GloVe model, the BERT model and SenticNet, specifically including:
Step 2-1: GloVe is a word characterization tool based on global word-frequency statistics. A co-occurrence matrix X is constructed from the preprocessed text D, where each element x_ij represents the number of times word i and word j co-occur within a context window of a particular size; this captures certain semantic properties between words. Since the contribution of a pair should depend on the distance between the words, the decay function decay(d) = 1/d of the distance d between the two words in the context window is used to calculate the weights, so that two words further apart are weighted less heavily in the count. Word vectors w_i are generated from the co-occurrence matrix, and the static word vectors S_g = [w_g1, w_g2, ..., w_gt] are output.
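The co-occurrence counting with the 1/d decay weighting can be sketched as follows; GloVe's subsequent weighted least-squares fit of the actual word vectors is omitted, and the window size and tokens are purely illustrative:

```python
from collections import defaultdict

def cooccurrence(tokens, window=3):
    """GloVe-style weighted co-occurrence counts: each pair of words at
    distance d <= window contributes decay(d) = 1/d to x_ij, so words
    further apart in the context window are weighted less."""
    x = defaultdict(float)
    for pos, wi in enumerate(tokens):
        for d in range(1, window + 1):
            if pos + d < len(tokens):
                wj = tokens[pos + d]
                x[(wi, wj)] += 1.0 / d   # decay(d) = 1/d
                x[(wj, wi)] += 1.0 / d   # count the symmetric context too
    return dict(x)

x = cooccurrence(["a", "b", "a", "c"], window=2)
```

Here ("a", "b") co-occurs twice at distance 1 (weight 2.0), while ("b", "c") co-occurs once at distance 2 (weight 0.5), illustrating how the decay favors nearby pairs.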
Step 2-2: the BERT (Bidirectional Encoder Representations from Transformers) model, as shown in FIG. 4, is pre-trained on both the Masked LM and Next Sentence Prediction tasks. The method adopts the Chinese_L-12_H-768_A-12 configuration, i.e., 12 Transformer Encoder layers, a hidden-layer dimensionality of 768, and 12 self-attention heads. Taking the word-vector representation of the text as input, the Transformer Encoder and self-attention mechanism layers fuse full-text semantic information into an enhanced vector representation of each word, and the dynamic word vectors S_b = [w_b1, w_b2, ..., w_bt] are output.
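How a self-attention layer fuses full-text context into each word vector can be illustrated with a minimal single-head scaled dot-product attention in pure Python. This is a sketch only: a real BERT encoder layer additionally learns Q/K/V projection matrices, uses multiple heads and feed-forward sublayers, whereas here Q = K = V = the raw vectors:

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(vectors):
    """Single-head scaled dot-product self-attention over a list of word
    vectors. Each output vector is a context-weighted mixture of all the
    word vectors, which is the mechanism by which the encoder blends
    full-text semantics into every token's representation."""
    d_k = math.sqrt(len(vectors[0]))
    out = []
    for q in vectors:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / d_k for k in vectors]
        weights = softmax(scores)           # attention weights sum to 1
        out.append([sum(w * v[dim] for w, v in zip(weights, vectors))
                    for dim in range(len(vectors[0]))])
    return out

ctx = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Each output row is a convex combination of the input vectors, with each token attending most strongly to itself in this toy input.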
Step 2-3: the SenticNet emotion dictionary is used to calculate emotion intensity. SenticNet is a concept-level knowledge base that provides semantic, emotion and polarity associations: semantics refers to the five concepts most semantically related to the input concept, emotion refers to the emotion values in the four emotional dimensions (pleasantness, attention, sensitivity, aptitude), and polarity is a value in the interval [-1, 1]. SenticNet can be downloaded as a standalone XML file, accessed through an API, or called as a Python third-party library. By inputting the words in text D, the emotion polarity value P_sentic = polarity(w_i) and the emotion dimension distribution vector S_w = [pleasantness_w, attention_w, sensitivity_w, aptitude_w] of the text words are obtained.
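A toy stand-in for the SenticNet lookup might look like this; the two entries and all their numeric values are invented for illustration (real entries come from the XML dump, the web API, or the Python package):

```python
# Hypothetical miniature of a SenticNet-style concept table.
SENTIC = {
    "happy": {"polarity": 0.87, "dims": (0.9, 0.3, 0.0, 0.5)},
    "sad":   {"polarity": -0.76, "dims": (-0.8, 0.1, 0.4, -0.2)},
}

def text_polarity(words, kb=SENTIC):
    """First emotion polarity value P_sentic: mean polarity of the words
    found in the knowledge base (unknown words are skipped)."""
    vals = [kb[w]["polarity"] for w in words if w in kb]
    return sum(vals) / len(vals) if vals else 0.0

def dim_vector(word, kb=SENTIC):
    """Emotion dimension distribution S_w = (pleasantness, attention,
    sensitivity, aptitude) for one word; zeros if the word is unknown."""
    return kb.get(word, {"dims": (0.0, 0.0, 0.0, 0.0)})["dims"]

p_sentic = text_polarity(["happy", "sad", "unknown"])
```

Averaging word polarities into a text-level P_sentic is one plausible aggregation; the patent does not spell out the exact aggregation formula for this step.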
In some embodiments, step 3 stack-embeds the static word vectors S_g = [w_g1, w_g2, ..., w_gt] and dynamic word vectors S_b = [w_b1, w_b2, ..., w_bt] obtained in step 2 to generate the input word vectors S_input, as shown in FIG. 2. S_input is input into a convolutional neural network (CNN) to obtain the local features of the input information: the convolutional layer applies its convolution kernels to the input word vectors, turning the original word-vector sequence into a sequence of convolved abstractions, i.e., the local feature vectors, which are then input into a max-pooling layer for feature dimension reduction, retaining the salient feature vectors while reducing the data volume, the number of parameters and the amount of computation. S_input is also input into a Bidirectional Gated Recurrent Unit (BiGRU) to obtain the global features of the input information; taking the context information into account, the global feature vectors are output.
Specifically, a two-channel structure of the convolutional neural network CNN and the BiGRU is constructed, and the input word vectors S_input are fed into the CNN and the BiGRU to obtain the local and global feature vectors of the input information:
the method comprises the following steps: 3-1: the CNN consists of two parts, a convolutional layer and a pooling layer:
the convolutional layer is composed of several characteristic patternsEach characteristic diagram is composed of multiple neurons connected with the previous layer by convolution kernel, and input
Figure BDA0003968705510000086
Each convolution kernel should extract a certain partial feature to obtain the partial feature matrix. Each line in the matrix corresponds to a word vector of a word, when text features are extracted, convolution operation is carried out on a convolution kernel from top to bottom, after the convolution operation is completed, nonlinear mapping is carried out on a convolution result of the convolution kernel, and a feature matrix Z is obtained as follows:
Z=f(W*S input +b),
where W is the weight matrix and b is the bias term.
Nonlinear mapping is performed on the convolution result of each convolution kernel in the CNN, and a ReLU function is generally used as an activation function, as shown in the formula:
f=relu=max(0,x),
thus, the feature matrix Z can be expressed as:
Z=f(W*S input +b)=relu(W*S input +b),
the convolution layer adopts convolution kernels (Conv 2, conv3 and Conv 5) with three different sizes to obtain features among different distance word sequences, so that local features can be extracted more comprehensively.
The pooling layer extracts the largest characteristic value of the pooling region in the characteristic diagram by a maximum pooling method, so that the dimension reduction and Z measurement can be performed on the characteristic information max A feature map representing the maximum pooling layer extraction, as follows:
Z max =max(Z i ),
wherein, Z i The ith feature map Z is shown, and max is the maximum value.
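A minimal sketch of one convolution-ReLU-max-pooling channel over a sentence matrix follows. The method uses three kernel sizes (Conv2, Conv3, Conv5) and many kernels per size; this sketch shows a single Conv2-sized kernel with made-up values:

```python
def conv_relu_maxpool(s_input, kernel, bias):
    """One CNN channel from step 3-1: slide a k-row kernel down the
    sentence matrix s_input (one word vector per row), compute
    Z_i = ReLU(W * window + b) for every window, then max-pool the
    feature map to a single value Z_max."""
    k = len(kernel)
    feature_map = []
    for i in range(len(s_input) - k + 1):
        z = sum(kernel[r][c] * s_input[i + r][c]
                for r in range(k) for c in range(len(kernel[0]))) + bias
        feature_map.append(max(0.0, z))    # ReLU activation
    return max(feature_map)                # max pooling over the feature map

# Toy sentence of four 2-d word vectors and one Conv2-sized kernel.
sent = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kern = [[1.0, 1.0], [1.0, 1.0]]
z_max = conv_relu_maxpool(sent, kern, bias=-1.0)
```

The three windows score 1.0, 2.0 and 1.0 after ReLU, so max pooling keeps 2.0 as this kernel's single feature value.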
Step 3-2: the BiGRU model is used to extract global features of the text data. It contains a reset gate and an update gate, and obtains the two gating states from the previous state h_{t-1} and the input S_input-t of the current node. The current hidden layer h_t at time t is obtained as the weighted sum of the forward hidden-layer state h_t^f and the reverse hidden-layer state h_t^b, calculated by the following formulas:

h_t^f = GRU(S_input-t, h_{t-1}^f),

h_t^b = GRU(S_input-t, h_{t-1}^b),

h_t = w_t * h_t^f + v_t * h_t^b + b_t,

where S_input-t represents the input of the current hidden layer, h_{t-1}^f represents the forward hidden-layer state at time (t-1), h_{t-1}^b represents the reverse hidden-layer state at time (t-1), w_t and v_t respectively represent the forward and reverse hidden-layer weights of the BiGRU at time t, and b_t represents the bias of the hidden-layer state at time t.
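A scalar toy version of the forward/backward pass and the weighted combination h_t = w_t * h^f + v_t * h^b + b_t can be sketched as below; the gate weights are illustrative scalar constants, not trained parameters, whereas a real GRU operates on vectors with weight matrices:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h_prev, p):
    """Minimal scalar GRU cell: the update gate z and reset gate r decide
    how much of the previous hidden state to keep versus overwrite."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev)          # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev)          # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev))
    return (1.0 - z) * h_prev + z * h_cand

def bigru(seq, p, w_t=0.5, v_t=0.5, b_t=0.0):
    """Run the GRU forward and backward over seq and combine the final
    states as h_t = w_t * h_f + v_t * h_b + b_t."""
    h_f = h_b = 0.0
    for x in seq:
        h_f = gru_step(x, h_f, p)          # forward hidden state
    for x in reversed(seq):
        h_b = gru_step(x, h_b, p)          # backward hidden state
    return w_t * h_f + v_t * h_b + b_t

params = {"wz": 1.0, "uz": 0.5, "wr": 1.0, "ur": 0.5, "wh": 1.0, "uh": 0.5}
h_t = bigru([0.2, -0.1, 0.4], params)
```

Because the candidate state passes through tanh and the output is a convex-style mixture, the combined state stays bounded in (-1, 1) for w_t + v_t = 1 and b_t = 0.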
In some embodiments, step 4 splices the local feature vector Z_max and the global feature vector h_t obtained in step 3 with the emotion dimension distribution vector S_w obtained in step 2, processes the result through a fully connected layer, and applies Dropout before the fully connected layer to prevent overfitting of the model. The result is input to a Softmax classifier for text emotion classification, as follows:

Y' = Dropout(a*y) + b,
Y = softmax(Y'),

wherein a and b are the weight matrix and bias value of the fully connected layer, y is the spliced feature vector, and the output emotion polarity value is P_CNN_BiGRU = polarity(Y).
The obtained emotion polarity value P_CNN_BiGRU = polarity(Y) and the polarity value P_sentic = polarity(w_i) obtained from the SenticNet emotion dictionary in step 2 are equalized as follows:

P_D = (P_CNN_BiGRU + P_sentic) / 2.

Finally, the emotion polarity value P_D of the text is output, and the emotional tendency is determined by a threshold.
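The splice-Dropout-Softmax step and the equalization with the dictionary polarity can be sketched as follows. Shapes, the keep probability, and function names are illustrative assumptions; the averaging assumes the equalization is a simple arithmetic mean, as stated in the claims:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_and_classify(Z_max, h_t, S_w, a, b, keep_prob=0.5, rng=None):
    """Concatenate local, global, and sentic features; Dropout; FC layer; Softmax."""
    y = np.concatenate([Z_max, h_t, S_w])    # spliced feature vector y
    if rng is not None:                       # Dropout applied before the FC layer
        mask = rng.random(y.shape) < keep_prob
        y = y * mask / keep_prob
    return softmax(a @ y + b)                 # Y' = a*y + b, Y = softmax(Y')

def equalize_polarity(p_cnn_bigru, p_sentic):
    """Average the network polarity with the SenticNet dictionary polarity."""
    return (p_cnn_bigru + p_sentic) / 2.0

rng = np.random.default_rng(2)
Y = fuse_and_classify(rng.normal(size=3),    # Z_max (3 pooled kernel features)
                      rng.normal(size=4),    # h_t   (global feature vector)
                      rng.normal(size=4),    # S_w   (four affective dimensions)
                      rng.normal(size=(2, 11)), np.zeros(2), rng=rng)
P_D = equalize_polarity(p_cnn_bigru=0.8, p_sentic=0.4)
print(Y.sum(), P_D)
```

At inference time Dropout would be disabled (pass `rng=None`), and P_D would be compared against the set threshold to decide the emotional tendency.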
In summary, the invention provides a microblog text sentiment classification method based on deep learning, and the flow is as shown in fig. 3:
(1) Acquiring a microblog source text, preprocessing the text and generating a text D;
(2) Simultaneously inputting the text D into a GloVe pre-training model and a BERT pre-training model to generate the corresponding word vectors S_g = [w_g1, w_g2, ..., w_gt] and S_b = [w_b1, w_b2, ..., w_bt], respectively, and inputting the text D into the SenticNet emotion dictionary to obtain the emotion polarity value P_sentic and the emotion dimension distribution vector S_w;
(3) Stacking and embedding the generated word vectors to obtain the input word vector S_input, simultaneously inputting it into the CNN and the BiGRU, and outputting the corresponding local feature vector Z_max and global feature vector h_t;
(4) Splicing the local feature vector Z_max, the global feature vector h_t and the emotion dimension distribution vector S_w, inputting the result into a fully connected layer, and classifying with a Softmax function to obtain the emotion polarity value P_CNN_BiGRU; equalizing P_CNN_BiGRU and P_sentic, outputting the emotion polarity value P_D of the text, and judging the emotional tendency through a set threshold to obtain the final emotion classification result.
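Steps (1) and (4) of this flow can be sketched as follows. The regex pattern, stop-word set, and threshold are illustrative assumptions (the patent does not fix them), and Chinese word segmentation is omitted for brevity:

```python
import re

def preprocess(source_text, stopwords):
    """Step (1): regex-clean the source text, then drop stop words.

    The pattern strips @-mentions, #hashtag# labels, URLs, and any character
    that is neither a word character nor a CJK character (illustrative choice).
    """
    text = re.sub(r'@[\w\-]+|#[^#]*#|https?://\S+|[^\w\u4e00-\u9fa5]', ' ',
                  source_text)
    return [tok for tok in text.split() if tok not in stopwords]

def classify(P_D, threshold=0.0):
    """Step (4): judge the emotional tendency of the text against a threshold."""
    return 'positive' if P_D > threshold else 'negative'

D = preprocess('Check out #topic# this is great!!! http://t.cn/xyz',
               stopwords={'is', 'this'})
print(D, classify(P_D=0.6))
```

In the full pipeline, D would then be fed through steps (2) and (3) to produce the P_D value that `classify` thresholds.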
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (6)

1. A microblog text sentiment classification method based on deep learning is characterized by comprising the following steps:
step 1: preprocessing microblog text data, comprising crawling microblog text data as a source text, cleaning the source text with regular expressions to remove special symbols and labels, performing Chinese word segmentation on the source text, and removing stop words and interference features with a natural language processing toolkit to obtain a text D;
step 2: generating word vectors, comprising converting words in the text D into static word vectors S_g = [w_g1, w_g2, ..., w_gt] according to a GloVe model, converting words in the text D into dynamic word vectors S_b = [w_b1, w_b2, ..., w_bt] according to a BERT model, and performing similarity calculation on words in the text D according to a SenticNet emotion dictionary to obtain a first emotion polarity value P_sentic = polarity(w_i) and an emotion dimension distribution vector S_w = [pleasantness_w, attention_w, sensitivity_w, aptitude_w], wherein t is the dimension of the generated word vectors;
step 3: extracting feature vectors based on neural network models, comprising stacking and embedding the static word vectors S_g = [w_g1, w_g2, ..., w_gt] and the dynamic word vectors S_b = [w_b1, w_b2, ..., w_bt] to generate the input word vector S_input, inputting S_input into a convolutional neural network to obtain a local feature vector, and inputting S_input into a BiGRU model to obtain a global feature vector;
step 4: classifying text emotion, comprising splicing the local feature vector, the global feature vector and the emotion dimension distribution vector, inputting the spliced vector into a fully connected layer for processing, inputting the result into a Softmax classifier to calculate an emotion polarity value, obtaining a second emotion polarity value P_CNN_BiGRU, averaging the first emotion polarity value and the second emotion polarity value to obtain the emotion polarity value P_D of the text D, performing emotion classification judgment according to a preset threshold, and outputting a text emotion classification result.
2. The deep learning-based microblog text sentiment classification method according to claim 1, wherein the step 2 comprises:
step 2-1: the GloVe model is a word-representation tool based on global word-frequency statistics; a co-occurrence matrix X is constructed from the text D, wherein each element x_ij represents the number of co-occurrences of word i and word j in a context window of a particular size, weights are calculated by a decay function decay = 1/d based on the distance d between the two words in the context window, word vectors w_i are generated from the co-occurrence matrix, and the static word vectors S_g = [w_g1, w_g2, ..., w_gt] are output;
Step 2-2: adopting a BERT model of Chinese _ L-12_H-768_A-12, wherein the number of transform Encoder layers is 12, the dimensionality of a hidden layer is 768, the number of self-attention heads of a self-attention machine making layer is 12, inputting word vector representation of a text D, carrying out enhanced vector representation of full-text semantic information fusion on words in the text D through the transform Encoder layers and the self-attention machine making layer, and outputting a dynamic word vector S b =[w b1 ,w b2 ,...,w bt ];
Step 2-3: the sentiment dictionary of SenticNet is used for calculating sentiment intensity, and SenticNet is a knowledge base of concept hierarchy and provides the concepts of semanteme, sentiment and polarity association, wherein the semanteme refers to five concepts which are most semantically related to an input concept, the sentiment refers to sentiment values of four sentiment dimensions (affection, sentiment, preference) and an interval [ -1,1]The first emotion polarity value P is obtained by inputting words in the text D sentic =polarity(w i ) And an emotion dimension distribution vector S w =[pleasantness w ,attention w ,sensitivity w ,aptitude w ];
3. The microblog text sentiment classification method based on deep learning according to claim 2, wherein the step 3 comprises the following steps:
step 3-1: the convolution layer is composed of a plurality of feature maps, each feature map is composed of a plurality of neurons, and each neuron is connected to the previous layer through a convolution kernel; the input S_input is received, and each convolution kernel extracts a certain local feature to obtain a local feature matrix, wherein each row in the matrix corresponds to the word vector of one word; when text features are extracted, the convolution kernels perform the convolution operation from top to bottom, and after the convolution operation is finished, a nonlinear mapping is applied to the convolution result using the ReLU function f(x) = max(0, x) as the activation function, giving the feature matrix Z = f(W * S_input + b) = relu(W * S_input + b), wherein W is a weight matrix and b is a bias term; the convolution layer adopts convolution kernels of three different sizes (Conv2, Conv3, Conv5) to obtain features across word sequences at different distances, and the pooling layer extracts the largest feature value of each pooling region in the feature map by max pooling to reduce the dimensionality of the feature information, wherein Z_max = max(Z_i) denotes the feature map extracted by the max-pooling layer, Z_i denotes the ith feature map Z, and max takes the maximum value;
step 3-2: the BiGRU model is used to extract global features of the text data; it comprises a reset gate and an update gate, and obtains the two gating states from the previous state h_{t-1} and the input S_{input-t} of the current node; the hidden layer h_t at time t in the model is the weighted sum of the forward hidden layer h_t^f and the backward hidden layer h_t^b, calculated as follows:

h_t^f = GRU(S_{input-t}, h_{t-1}^f),
h_t^b = GRU(S_{input-t}, h_{t-1}^b),
h_t = w_t * h_t^f + v_t * h_t^b + b_t,

wherein S_{input-t} denotes the input of the current hidden layer, h_{t-1}^f denotes the forward hidden-layer state at time (t-1), h_{t-1}^b denotes the backward hidden-layer state at time (t-1), w_t and v_t respectively denote the forward and backward hidden-layer weights of the BiGRU model at time t, and b_t denotes the bias of the hidden-layer state at time t.
4. The microblog text sentiment classification method based on deep learning according to claim 3, wherein the step 4 comprises the following steps:
splicing the local feature vector Z_max, the global feature vector h_t and the emotion dimension distribution vector S_w, processing through a fully connected layer, preventing overfitting of the model by applying Dropout before the fully connected layer, and inputting the result into a Softmax classifier for text emotion classification, obtaining:

Y' = Dropout(a*y) + b,
Y = softmax(Y'),

wherein a and b are the weight matrix and bias value of the fully connected layer and y is the spliced feature vector, and the second emotion polarity value P_CNN_BiGRU = polarity(Y) is output;
averaging P_CNN_BiGRU = polarity(Y) and P_sentic = polarity(w_i) to obtain the emotion polarity value of the text D:

P_D = (P_CNN_BiGRU + P_sentic) / 2;

and performing emotion classification judgment according to a preset threshold, and outputting a text emotion classification result.
5. A computer device, comprising:
a memory to store instructions; and
a processor for invoking the instructions stored in the memory to execute the deep-learning-based microblog text sentiment classification method according to any one of claims 1-4.
6. A computer-readable storage medium storing instructions which, when executed by a processor, perform the deep-learning-based microblog text sentiment classification method according to any one of claims 1-4.
CN202211504882.6A 2022-11-29 2022-11-29 Microblog text sentiment classification method based on deep learning Pending CN115757792A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211504882.6A CN115757792A (en) 2022-11-29 2022-11-29 Microblog text sentiment classification method based on deep learning


Publications (1)

Publication Number Publication Date
CN115757792A true CN115757792A (en) 2023-03-07

Family

ID=85339608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211504882.6A Pending CN115757792A (en) 2022-11-29 2022-11-29 Microblog text sentiment classification method based on deep learning

Country Status (1)

Country Link
CN (1) CN115757792A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116108859A (en) * 2023-03-17 2023-05-12 美云智数科技有限公司 Emotional tendency determination, sample construction and model training methods, devices and equipment
CN117521639A (en) * 2024-01-05 2024-02-06 湖南工商大学 Text detection method combined with academic text structure
CN117521639B (en) * 2024-01-05 2024-04-02 湖南工商大学 Text detection method combined with academic text structure


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination