CN114547299A - Short text sentiment classification method and device based on composite network model - Google Patents

Short text sentiment classification method and device based on composite network model

Info

Publication number
CN114547299A
CN114547299A
Authority
CN
China
Prior art keywords
convolution
vector
network
feature vector
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210150471.5A
Other languages
Chinese (zh)
Inventor
李校林
赵路伟
伍晓思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202210150471.5A priority Critical patent/CN114547299A/en
Publication of CN114547299A publication Critical patent/CN114547299A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of natural language processing, and particularly relates to a short text sentiment classification method and device based on a composite network model. The method comprises the steps of acquiring a text data set and processing it; inputting the processed text data set into a BERT model to obtain word vector representations; sending the word vector representations into a TextCNN network and a BiGRU network respectively to obtain a first feature vector and a second feature vector; and splicing the first feature vector and the second feature vector to obtain a spliced feature vector. The method combines the TextCNN and BiGRU networks in parallel, so that the model can fully combine the ability of the TextCNN to extract local features with the ability of the BiGRU to associate context information, makes up for the shortcomings of training with a single network, and effectively avoids negative interaction between the TextCNN and BiGRU networks.

Description

Short text sentiment classification method and device based on composite network model
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a short text emotion classification method and device based on a composite network model.
Background
The emotion analysis task, also called opinion mining, is a popular research task in the field of natural language processing. It integrates knowledge from disciplines such as computer science, signal processing and artificial intelligence with web data mining, and its main goal is to mine the emotional tendency expressed by commentators from given text content. Analyzing text emotion through deep learning is a research hotspot in natural language processing, and in recent years many researchers have applied different neural networks to text emotion analysis and achieved good results.
However, the conventional Chinese short text emotion analysis task generally adopts a static pre-training model to generate word vectors: each word is mapped to a single fixed word vector, so characteristics such as polysemy, grammar and semantics cannot be handled differently according to the surrounding text. Meanwhile, in the choice of neural network, the traditional text convolutional neural network (TextCNN) can extract local features of text data by exploiting position invariance, but it cannot represent the pooled word vectors in combination with the other word vectors of the context. An RNN, by contrast, can fully combine the word vectors of the context during learning, so that the output vector reflects the associated information in the sentence well; however, the traditional RNN suffers from vanishing and exploding gradients during training, and a unidirectional RNN cannot comprehensively combine the context on both sides when training on word vectors, so the training effect still needs improvement.
Disclosure of Invention
In order to solve the above problems, the invention provides a short text sentiment classification method and device based on a composite network model, which combine the TextCNN and BiGRU networks in parallel, so that the model can fully combine the ability of the TextCNN to extract local features with the ability of the BiGRU to associate context information, makes up for the shortcomings of training with a single network, and effectively avoids negative interaction between the TextCNN and BiGRU networks.
A short text sentiment classification method based on a composite network model constructs a BERT-CNN-BiGRU composite network model; the model comprises a BERT model, a TextCNN network, a BiGRU network and an improved self-attention module, and the method comprises the following steps:
s1, acquiring a Chinese short text data set and processing it, wherein the processing includes defining a BERT tokenizer, removing stop words, word segmentation, length padding, splitting the data set, and the like;
s2, inputting the processed Chinese short text data set into a BERT model to obtain word vector representation;
s3, respectively sending the word vector representations into a TextCNN network and a BiGRU network to obtain a first feature vector and a second feature vector;
s4, splicing the first feature vector and the second feature vector to obtain a spliced feature vector;
and S5, sending the spliced feature vectors into an improved self-attention module to obtain classified feature vectors, and obtaining a classification result through a linear layer with a softmax activation function.
Furthermore, the TextCNN network is a three-channel text convolution network, which includes a first convolution layer with a convolution kernel of 2, a second convolution layer with a convolution kernel of 3, a third convolution layer with a convolution kernel of 4, and a pooling layer, and the process of obtaining the first feature vector through the TextCNN network is as follows:
convolving the word vector representation with a first convolution layer, a second convolution layer and a third convolution layer respectively to obtain a first convolution result, a second convolution result and a third convolution result;
pooling the first convolution result, the second convolution result and the third convolution result through a pooling layer to obtain a first low-dimensional vector, a second low-dimensional vector and a third low-dimensional vector;
and splicing the first low-dimensional vector, the second low-dimensional vector and the third low-dimensional vector to obtain a first feature vector.
Further, the calculation formula of the three convolutional layers is as follows:
C = σ(W ⊗ X + b);
wherein ⊗ denotes the convolution operation, C is the convolution result, W is the weight vector of the convolution kernel, X is the word vector matrix formed by the word vector representations, b is the bias vector, and σ is the sigmoid activation function.
Further, the pooling layer respectively obtains maximum eigenvalues of the first convolution result, the second convolution result and the third convolution result as a first low-dimensional vector, a second low-dimensional vector and a third low-dimensional vector, and the formulas are expressed as:
M_i = max(c_i1, c_i2, ..., c_ij);
wherein M_i represents the maximum feature value obtained after pooling the first convolution result, the second convolution result or the third convolution result, and c_ij represents the value generated by the j-th convolution in the i-th convolution result: each sliding of the convolution window produces one value, the values produced by the repeated convolutions are spliced, and the spliced result is the convolution result output by the convolution layer. Max pooling takes the maximum value over the convolution results of the repeated convolution operations of one convolution layer, thereby achieving dimension reduction.
Further, the calculation formula of the improved self-attention module is as follows:
h' = softmax(X·X^T/√d_x + W)·X;
wherein h' is the classification feature vector, X is the spliced feature vector, 1/√d_x is the adjustment factor (d_x being the dimension of X), and W is a position weight parameter.
A short text sentiment classification device based on a composite network model comprises:
the data processing module is used for processing the acquired text data;
the BERT module is used for acquiring word vector representation of the text processed in the data processing module;
the text convolution module is used for performing convolution pooling on the word vector representation output by the BERT module to obtain a first feature vector;
the gating cycle module is used for acquiring the context information of the word vector representations output by the BERT module and outputting a second feature vector;
the splicing unit is used for splicing the first feature vector output by the text convolution module and the second feature vector output by the gating cycle module to obtain a spliced vector;
the improved self-attention module is used for screening key information in the spliced vector to obtain a classified feature vector;
and the classification module is used for acquiring the emotion classification result of the text data according to the classification feature vector.
The invention has the beneficial effects that:
the invention provides a short text sentiment classification method and a short text sentiment classification device based on a composite network model, wherein a BERT-CNN-BiGRU composite network model is constructed, TextCNN and BiGRU networks are combined in parallel in the model, the training efficiency is higher, and the classification time is shortened. Position information is introduced into the self-attention mechanism for improvement, the improved self-attention mechanism is utilized for weighting and summing, so that the characteristics obtained by the two mechanisms are fully combined, the training efficiency is improved, the attention weight parameters can reflect key information, and the text emotion classification accuracy is greatly improved.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a diagram of a BERT-CNN-BiGRU composite network model structure according to the present invention;
FIG. 3 is a diagram of a GRU network architecture;
fig. 4 is a diagram illustrating the structure of the TextCNN network according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention constructs a BERT-CNN-BiGRU composite network model based on a BERT model and a CNN-BiGRU composite neural network; the structure of the model is shown in figure 2. The BERT-CNN-BiGRU composite network model combines its network modules in a series-parallel manner: a short text data set is sent into the BERT model for training to obtain a sequence of word vector representations, and this sequence is then sent into a TextCNN network and a BiGRU network respectively for training. The TextCNN and BiGRU networks are parallel and their training processes are independent. The two networks are trained separately to obtain their corresponding feature vectors, which are spliced and then fused through an improved self-attention mechanism, so that the feature vector finally generated for classification fully combines the ability of the TextCNN network to extract and capture local feature information with the ability of the BiGRU network to combine context-associated semantic information. Classification is finally performed through a linear layer with a softmax activation function to obtain the final judgment result.
In the overall training process of the model, the TextCNN and BiGRU networks are combined in parallel. In a natural language processing task, the TextCNN network can extract and capture local feature information of an input text sequence by exploiting its position invariance, but operations such as convolution and pooling inevitably lose the information outside the moving window, so the output feature vector cannot be well combined with the full-text information for classification. An RNN such as the BiGRU network, when processing an input text sequence, can fully combine context-associated semantic information according to the time or input order, but it may also pick up a certain amount of interference information that affects the judgment, and it has no ability to capture local feature information. If the TextCNN and BiGRU networks were combined in series, their training results would interfere with each other: after local feature extraction by the TextCNN, the input text vector loses the information outside the sliding window and can no longer be effectively combined with context-associated semantic information when sent into the BiGRU network; conversely, if context-associated semantic information is first extracted and combined through the BiGRU network, the text vector structure is damaged when it is then sent into the TextCNN network, so local feature information cannot be extracted effectively. The invention therefore trains the TextCNN and BiGRU networks in a parallel combination, which combines the advantages of the two networks and effectively avoids negative interaction between them. The training efficiency and the training results are obviously improved.
In an embodiment, a short text sentiment classification method based on a composite network model, as shown in fig. 1, includes the following steps:
s1, acquiring a microblog text data set and processing the microblog text data set;
s2, inputting the processed microblog text data set into a BERT model to obtain word vector representation;
s3, respectively sending the word vector representations into a TextCNN network and a BiGRU network to obtain a first feature vector and a second feature vector;
s4, splicing the first feature vector and the second feature vector to obtain a spliced feature vector;
and S5, sending the spliced feature vectors into an improved self-attention module to obtain classified feature vectors, and obtaining a classification result through a linear layer with a softmax activation function.
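For illustration only, a minimal PyTorch-style sketch of how steps S2 to S5 can be wired together is given below. The hyper-parameters (number of filters, GRU hidden size), the "bert-base-chinese" pre-trained weights and the use of the final BiGRU hidden states as the second feature vector are assumptions rather than details fixed by the patent, and the improved self-attention of step S5 is simplified to a plain linear classifier here (the attention module itself is sketched separately in a later section).

import torch
import torch.nn as nn
from transformers import BertModel   # assumes the HuggingFace transformers package is available

class BertCnnBiGruClassifier(nn.Module):
    """Sketch of the BERT-CNN-BiGRU composite network (steps S2-S5)."""
    def __init__(self, num_classes=2, embed_dim=768, num_filters=100, gru_hidden=128):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")          # S2: word vector representation
        # three-channel TextCNN branch: kernels spanning 2, 3 and 4 words
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, kernel_size=k) for k in (2, 3, 4)])
        # BiGRU branch over the same word vectors
        self.bigru = nn.GRU(embed_dim, gru_hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(3 * num_filters + 2 * gru_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        x = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state  # (B, L, 768)
        # S3a: first feature vector - sigmoid convolution + max pooling per channel, then splice
        c = x.transpose(1, 2)                                                # (B, 768, L) for Conv1d
        f1 = torch.cat([torch.sigmoid(conv(c)).max(dim=2).values for conv in self.convs], dim=1)
        # S3b: second feature vector - final forward and backward BiGRU states
        _, h_n = self.bigru(x)                                               # h_n: (2, B, gru_hidden)
        f2 = torch.cat([h_n[0], h_n[1]], dim=1)
        spliced = torch.cat([f1, f2], dim=1)                                 # S4: spliced feature vector
        logits = self.classifier(spliced)                                    # S5 simplified (no attention here)
        return torch.softmax(logits, dim=-1)                                 # classification result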
The data set adopted in this embodiment is the open-source data set weibo_senti_100k, which is composed of text comment data from Sina Weibo microblogs; the comments originate from a large number of different types of microblogs and cover multiple types of microblog comments. Emoji expressions, which are often used in comments, have been replaced with the corresponding textual information. The emotion tags of the microblog comments in the data set fall into two categories [0, 1]: comments with negative emotion polarity are tagged 0 and comments with positive emotion polarity are tagged 1. There are 119,988 microblog comments in total, of which 59,993 are positive and 59,995 are negative.
Specifically, the microblog data set is characterized by short text sentences; although it contains a large number of emoji emoticons and special characters, each comment has a single emotion polarity and carries considerable interference information, so the data set is suitable for operating directly on whole sentences rather than for segmentation operations.
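As a hedged illustration of the preprocessing in step S1, the sketch below loads the microblog comments and tokenizes them with the BERT tokenizer, applying length padding and a data-set split. The file name, column names, maximum length and split ratio are assumptions for illustration; the patent does not fix these values, and stop-word removal is omitted here.

import pandas as pd
from sklearn.model_selection import train_test_split
from transformers import BertTokenizer

# assumed CSV layout: a "label" column (0 = negative, 1 = positive) and a "review" text column
data = pd.read_csv("weibo_senti_100k.csv")
texts, labels = data["review"].tolist(), data["label"].tolist()

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # define the BERT tokenizer
encodings = tokenizer(texts,
                      padding="max_length",   # length padding
                      truncation=True,
                      max_length=128,         # assumed maximum length for short texts
                      return_tensors="pt")

# divide the data set (an 80/20 split is assumed here)
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42)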
The RNN may suffer from vanishing or exploding gradients as the number of time steps increases during training, which prevents it from effectively extracting and using historical information. The LSTM network uses three gates to maintain and control the cell state: a forget gate discards previously useless historical information and an input gate stores new valid information, thereby overcoming these deficiencies of the RNN.
The GRU network improves and simplifies the LSTM network, effectively addressing the problem that the LSTM structure is too complex. The GRU network structure is shown in fig. 3. The GRU replaces the forget gate and input gate of the original LSTM with a single update gate, which decides which information to keep or discard after the computation is finished. The other gate in the GRU is a reset gate, which manages how the new input is combined with the previous computation result and determines to what extent the hidden-layer information of the previous moment is ignored. The GRU merges the cell state and hidden state of the LSTM into one, and, while retaining the effectiveness of the LSTM, has a simpler structure, fewer parameters, better model convergence and shorter training time.
Preferably, the forward propagation formula of the GRU network is:
r_t = σ(W_r · [h_{t-1}, x_t]);
z_t = σ(W_z · [h_{t-1}, x_t]);
h̃_t = tanh(W_h · [r_t ⊙ h_{t-1}, x_t]);
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t;
wherein z_t is the update gate at time t, r_t is the reset gate at time t, x_t is the input fed into the GRU network at time t (i.e. the word vector representation output by the BERT model), h_t denotes the activation state at time t, h̃_t denotes the candidate activation state at time t, W_r, W_z and W_h are the corresponding weight parameter matrices, and σ is the sigmoid activation function.
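To make the forward-propagation formulas concrete, a minimal NumPy sketch of a single GRU time step is given below; it follows the four formulas above directly, and the weight shapes and the convention of concatenating [h_{t-1}, x_t] are assumptions consistent with those formulas (bias terms are omitted, as in the formulas above).

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_r, W_z, W_h):
    # x_t: input at time t (e.g. a BERT word vector); h_prev: hidden state h_{t-1}
    # W_r, W_z, W_h: weight matrices of shape (hidden_dim, hidden_dim + input_dim)
    concat = np.concatenate([h_prev, x_t])
    r_t = sigmoid(W_r @ concat)                                   # reset gate
    z_t = sigmoid(W_z @ concat)                                   # update gate
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))   # candidate activation state
    return (1.0 - z_t) * h_prev + z_t * h_cand                    # new activation state h_t

# toy usage with random weights: 6-dimensional input, 4-dimensional hidden state
rng = np.random.default_rng(0)
h_t = gru_step(rng.normal(size=6), np.zeros(4),
               rng.normal(size=(4, 10)), rng.normal(size=(4, 10)), rng.normal(size=(4, 10)))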
When processing the sequence data output by BERT, a unidirectional GRU network can only combine the preceding semantic information with the input at the next time step according to the temporal order, and cannot effectively capture the related information that follows. For this reason, a bidirectional GRU (BiGRU) network is used for training.
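In practice the bidirectional GRU branch can be realised with a standard library implementation; the short sketch below uses PyTorch's nn.GRU, with a hidden size of 128 assumed for illustration, and concatenates the final forward and backward states into the second feature vector (one reasonable reading of the design, not a detail fixed by the patent).

import torch
import torch.nn as nn

bigru = nn.GRU(input_size=768, hidden_size=128, batch_first=True, bidirectional=True)
word_vecs = torch.randn(8, 64, 768)          # (batch, seq_len, BERT dim) placeholder input
outputs, h_n = bigru(word_vecs)              # outputs: (8, 64, 256); h_n: (2, 8, 128)
second_feature = torch.cat([h_n[0], h_n[1]], dim=1)   # (8, 256) forward + backward final states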
The TextCNN network adopted by the invention is a three-channel text convolution network whose structure is shown in figure 4, where W_i in the figure is the i-th input word vector. The word vectors output by the BERT model in this experiment are 768-dimensional, and in NLP tasks the convolution kernel width is set to the full word-vector dimension of 768, so convolution kernels spanning 2, 3 and 4 words are used;
specifically, the first convolution layer with convolution kernel 2, the second convolution layer with convolution kernel 3, and the third convolution layer with convolution kernel 4, and the process of obtaining the first feature vector through the TextCNN network in step S3 is as follows:
convolving the word vector representation with a first convolution layer, a second convolution layer and a third convolution layer respectively to obtain a first convolution result, a second convolution result and a third convolution result;
pooling the first convolution result, the second convolution result and the third convolution result through a pooling layer to obtain a first low-dimensional vector, a second low-dimensional vector and a third low-dimensional vector;
and splicing the first low-dimensional vector, the second low-dimensional vector and the third low-dimensional vector to obtain a first feature vector.
Specifically, the calculation formula of the convolutional layer is as follows:
C = σ(W ⊗ X + b);
wherein ⊗ denotes the convolution operation, C is the convolution result, W is the weight vector of the convolution kernel, X is the matrix formed by the word vector representations, b is the bias vector, and σ is the sigmoid activation function.
The pooling layer can extract local feature information captured by the convolution layer and play an effective dimension reduction role, and the formula is as follows:
M_i = max(c_i1, c_i2, ..., c_ij);
wherein M_i represents the maximum feature value obtained after pooling the first convolution result, the second convolution result or the third convolution result, and c_ij represents the value generated by the j-th convolution in the i-th convolution result. After convolution with kernels spanning 2, 3 and 4 word vectors, the results of the three channels have different sizes; max pooling screens the maximum feature value from the feature values generated by each sliding window, and the screened maximum values are spliced to obtain a new feature vector that combines the feature values of all sliding windows, thereby eliminating the length differences between the vector sequences produced by the different kernel sizes and achieving dimension reduction.
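A minimal sketch of the three-channel TextCNN branch following the convolution and max-pooling formulas above is given below; the number of filters per channel is an assumed value, and the sigmoid activation mirrors the convolution formula.

import torch
import torch.nn as nn

class TextCNNEncoder(nn.Module):
    # Three-channel TextCNN: kernels spanning 2, 3 and 4 words, sigmoid convolution, max pooling, splice.
    def __init__(self, embed_dim=768, num_filters=100):      # num_filters is an assumed value
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, kernel_size=k) for k in (2, 3, 4)])

    def forward(self, word_vecs):                 # word_vecs: (batch, seq_len, embed_dim)
        x = word_vecs.transpose(1, 2)             # (batch, embed_dim, seq_len) for Conv1d
        pooled = []
        for conv in self.convs:
            c = torch.sigmoid(conv(x))            # C = sigma(W (*) X + b), one channel
            pooled.append(torch.max(c, dim=2).values)   # M_i = max(c_i1, ..., c_ij)
        return torch.cat(pooled, dim=1)           # first feature vector, (batch, 3 * num_filters)

first_feature = TextCNNEncoder()(torch.randn(8, 64, 768))    # -> shape (8, 300)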
The purpose of the attention mechanism is to give higher weight to the key information while keeping the original feature information, so that the key information has a larger influence on the final classification result. The self-attention mechanism attends only to the internal elements of the sequence itself, which gives higher training efficiency and more accurate results. The calculation formula of the self-attention mechanism is as follows:
H = softmax(X·X^T/√d_x)·X;
wherein H is the output of the self-attention mechanism, X is the input vector sequence, and d_x is its dimension.
In the calculation of the self-attention mechanism, vectors at different positions in the spliced vector sequence have different influences on the final result; for example, a vector near the front often obtains a larger attention weight parameter because its observation window is smaller and it is combined with less other information, which distorts the final judgment result.
Specifically, the calculation formula for the improved self-attention module is as follows:
h' = softmax(X·X^T/√d_x + W)·X;
wherein h' is the classification feature vector, X is the spliced feature vector, 1/√d_x is the adjustment factor (d_x being the dimension of X), and W is the position weight parameter matrix, expressed as:
W = [a_mn];
where a_mn is a position weight parameter.
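The improved self-attention module can be sketched as below. The exact placement of the position weight matrix follows the reconstructed formula above (added to the scaled score matrix before the softmax); this placement, the learnable zero initialisation of the matrix, and the treatment of the spliced features as a sequence are assumptions for illustration rather than verified details of the patent.

import math
import torch
import torch.nn as nn

class ImprovedSelfAttention(nn.Module):
    # Position-weighted self-attention: h' = softmax(X X^T / sqrt(d_x) + W) X
    def __init__(self, seq_len, dim):
        super().__init__()
        self.scale = 1.0 / math.sqrt(dim)                              # adjustment factor 1/sqrt(d_x)
        self.pos_weight = nn.Parameter(torch.zeros(seq_len, seq_len))  # position weight matrix W = [a_mn]

    def forward(self, x):                                      # x: (batch, seq_len, dim) spliced features
        scores = torch.bmm(x, x.transpose(1, 2)) * self.scale  # X X^T / sqrt(d_x)
        attn = torch.softmax(scores + self.pos_weight, dim=-1) # balance the early-position bias with W
        return torch.bmm(attn, x)                              # classification feature vectors h'

h_prime = ImprovedSelfAttention(seq_len=16, dim=556)(torch.randn(8, 16, 556))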
In this embodiment, the microblog data set is used for training, and accuracy, precision, recall and the F1 value are used for evaluation after model training is completed.
The accuracy is the proportion of correct judgment in all judgments, which represents the proportion of samples with correct prediction results in the total samples, and the formula is as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN);
the precision rate is also called precision rate, and it represents the proportion of the correctly predicted samples in all the samples predicted to be positive according to the predicted result, and its formula is:
Figure BDA0003510160530000093
the recall ratio is called recall ratio, and represents the proportion of the correctly predicted positive samples in all the actually positive samples, and the formula is as follows:
Figure BDA0003510160530000094
the F1 value is a comprehensive evaluation index after combining the accuracy rate and the recall rate, the larger the value is, the better the value is, and the formula is as follows:
Figure BDA0003510160530000095
TP (True Positives) denotes positive-class samples correctly predicted as positive, FP (False Positives) denotes negative-class samples incorrectly predicted as positive, FN (False Negatives) denotes positive-class samples incorrectly predicted as negative, and TN (True Negatives) denotes negative-class samples correctly predicted as negative.
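The four evaluation indexes can be computed directly from the confusion-matrix counts defined above; the short sketch below does so, with made-up counts in the usage line (they are not results from the patent).

def classification_metrics(tp, fp, fn, tn):
    # accuracy, precision, recall and F1 from the confusion-matrix counts
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(classification_metrics(tp=55, fp=5, fn=7, tn=53))   # illustrative counts only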
To verify the performance of the model presented herein, comparative experiments based on BERT and CNN-BiGRU composite neural networks were set up, and the accuracy, precision, recall, and F1 values were taken for each experiment and compared.
The comparison comprised the following five experiments:
(1) W2V-CNN-BiGRU: word vectors produced by the traditional pre-training model word2vec are trained in the composite network designed herein.
(2) BERT: a single BERT model is trained directly, and the obtained results are classified directly.
(3) BERT-TextCNN: BERT is used as the pre-training model to obtain and output word vectors, which are then trained through a TextCNN network.
(4) BERT-BiGRU: BERT is used as the pre-training model to obtain and output word vectors, which are then trained through a BiGRU network.
(5) BERT-CNN-BiGRU: BERT is used as the pre-training model to obtain word vectors, the output is sent into a TextCNN network and a BiGRU network respectively for training, the outputs of the two networks are spliced, and a weighted calculation is performed by the improved self-attention mechanism.
The results obtained after test comparison of the individual experiments on the test set are shown in table 1:
TABLE 1 comparative results
As can be seen from Table 1, the indexes of the BERT-CNN-BiGRU model show clear advantages over the other models. The BERT model is built by stacking Transformer encoders, and the self-attention mechanism in the Transformer can fully combine the context of the input data when generating word vectors, which is a great advantage of BERT over the traditional word2vec model. Therefore, even when the BERT model is trained alone and its output feature vectors are classified directly, the result is still clearly superior to the w2v composite network model of experiment 1, with greatly improved accuracy and F1 value. However, there is still room for improvement when the BERT output is classified only after local features or context-associated information are extracted by another network: connecting BERT to either the TextCNN or the BiGRU network obviously improves the accuracy and F1 value of the final output compared with training the BERT model alone. Comparing experiments 3 and 4 together with the preceding training procedure, the TextCNN model performs slightly better on the task herein than the model using the BiGRU network alone. In this embodiment, the BERT-CNN-BiGRU model fully combines the ability of the CNN network to extract local features with the ability of the BiGRU to extract context-associated semantic information, and the resulting feature vector contains more key information, so the accuracy and F1 value of experiment 5 are the highest among all comparison experiments.
A short text sentiment classification device based on a composite network model comprises:
the data processing module is used for processing the acquired text data;
the BERT module is used for acquiring word vector representation of the text processed in the data processing module;
the text convolution module is used for performing convolution pooling on the word vector representation output by the BERT module to obtain a first feature vector;
the gating cycle module is used for acquiring the context information of the word vector representations output by the BERT module and outputting a second feature vector;
the splicing unit is used for splicing the first feature vector output by the text convolution module and the second feature vector output by the gating cycle module to obtain a spliced vector;
the improved self-attention module is used for screening key information in the spliced vector to obtain a classified feature vector;
and the classification module is used for acquiring the emotion classification result of the text data according to the classification feature vector.
Specifically, the text convolution module is used for extracting local features in a word vector sequence by utilizing the position invariance of a convolution neural network and screening local key contents;
specifically, the gating cycle module adopts a BiGRU network which is a bidirectional improved RNN network, and the gating cycle module has the function of fully combining context associated semantic information to ensure that a vector sequence output by the gating cycle module can fully combine context semantic information in a text;
specifically, the improved self-attention module has the function of combining the local feature vectors extracted and captured by the TextCNN network with the vectors with context-associated semantic information generated by the BiGRU network, so that the vectors finally used for classification can combine the advantages of the local feature vectors and the vectors, thereby improving the classification accuracy. And more important information can be given more weight when the attention mechanism weight is calculated, so that the influence on the final classification result is larger.
In the self-attention mechanism, a weight parameter iterator, namely a position weight parameter matrix is added, and the function of the weight parameter iterator is to improve the defects of the TextCNN and the BiGRU network in the training process, for example, when the self-attention mechanism is calculated, a vector with a front position tends to obtain larger attention weight. Balancing it with this weighting parameter matrix will result in a better classification.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A short text sentiment classification method based on a composite network model is characterized in that a BERT-CNN-BiGRU composite network model is constructed, the model comprises a BERT model, a TextCNN network, a BiGRU network and an improved self-attention module, and the short text sentiment classification method based on the composite network model comprises the following steps:
s1, acquiring a Chinese short text data set, and processing the Chinese short text data set;
s2, inputting the processed Chinese short text data set into a BERT model to obtain word vector representation;
s3, respectively sending the word vector representations into a TextCNN network and a BiGRU network to obtain a first feature vector and a second feature vector;
s4, splicing the first feature vector and the second feature vector to obtain a spliced feature vector;
and S5, sending the spliced feature vectors into an improved self-attention module to obtain classified feature vectors, and obtaining a classification result through a linear layer with a softmax activation function.
2. The short text sentiment classification method based on the composite network model according to claim 1, wherein the TextCNN network is a three-channel text convolution network, which includes a first convolution layer with a convolution kernel of 2, a second convolution layer with a convolution kernel of 3, a third convolution layer with a convolution kernel of 4 and a pooling layer, and the process of obtaining the first feature vector through the TextCNN network is as follows:
respectively sending the word vector representations into a first convolution layer, a second convolution layer and a third convolution layer for convolution to obtain a first convolution result, a second convolution result and a third convolution result;
pooling the first convolution result, the second convolution result and the third convolution result through a pooling layer to obtain a first low-dimensional vector, a second low-dimensional vector and a third low-dimensional vector;
and splicing the first low-dimensional vector, the second low-dimensional vector and the third low-dimensional vector to obtain a first feature vector.
3. The method for short text sentiment classification based on a composite network model according to claim 2, wherein the calculation formula of the three convolutional layers is as follows:
C = σ(W ⊗ X + b);
wherein ⊗ denotes the convolution operation, C is the convolution result, W is the weight vector of the convolution kernel, X is the word vector matrix formed by the word vector representations, b is the bias vector, and σ is the sigmoid activation function.
4. The method for short text sentiment classification based on a composite network model according to claim 2, wherein the pooling layer respectively obtains maximum eigenvalues of the first convolution result, the second convolution result and the third convolution result as a first low-dimensional vector, a second low-dimensional vector and a third low-dimensional vector, and the formulas are as follows:
M_i = max(c_i1, c_i2, ..., c_ij);
wherein M_i represents the maximum feature value obtained after pooling the first convolution result, the second convolution result or the third convolution result, and c_ij represents the value generated by the j-th convolution in the i-th convolution result.
5. The method for short text sentiment classification based on a composite network model according to claim 1, wherein the calculation formula of the improved self-attention module is as follows:
h' = softmax(X·X^T/√d_x + W)·X;
wherein h' is the classification feature vector, X is the spliced feature vector, 1/√d_x is the adjustment factor (d_x being the dimension of X), and W is a position weight parameter.
6. A short text sentiment classification device based on a composite network model, characterized by comprising:
the data processing module is used for processing the acquired text data;
the BERT module is used for acquiring word vector representation of the text processed in the data processing module;
the text convolution module is used for performing convolution pooling on the word vector representation output by the BERT module to obtain a first feature vector;
the gating cycle module is used for acquiring the context information of the word vector representations output by the BERT module and outputting a second feature vector;
the splicing unit is used for splicing the first feature vector output by the text convolution module and the second feature vector output by the gating cycle module to obtain a spliced vector;
the improved self-attention module is used for screening key information in the spliced vector to obtain a classified feature vector;
and the classification module is used for acquiring the emotion classification result of the text data according to the classification feature vector.
CN202210150471.5A 2022-02-18 2022-02-18 Short text sentiment classification method and device based on composite network model Pending CN114547299A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210150471.5A CN114547299A (en) 2022-02-18 2022-02-18 Short text sentiment classification method and device based on composite network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210150471.5A CN114547299A (en) 2022-02-18 2022-02-18 Short text sentiment classification method and device based on composite network model

Publications (1)

Publication Number Publication Date
CN114547299A true CN114547299A (en) 2022-05-27

Family

ID=81675808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210150471.5A Pending CN114547299A (en) 2022-02-18 2022-02-18 Short text sentiment classification method and device based on composite network model

Country Status (1)

Country Link
CN (1) CN114547299A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115374285A (en) * 2022-10-26 2022-11-22 思创数码科技股份有限公司 Government affair resource catalog theme classification method and system
CN115374285B (en) * 2022-10-26 2023-02-07 思创数码科技股份有限公司 Government affair resource catalog theme classification method and system
CN116205222A (en) * 2023-05-06 2023-06-02 南京邮电大学 Aspect-level emotion analysis system and method based on multichannel attention fusion
CN117521639A (en) * 2024-01-05 2024-02-06 湖南工商大学 Text detection method combined with academic text structure
CN117521639B (en) * 2024-01-05 2024-04-02 湖南工商大学 Text detection method combined with academic text structure

Similar Documents

Publication Publication Date Title
Luan et al. Research on text classification based on CNN and LSTM
CN112667818B (en) GCN and multi-granularity attention fused user comment sentiment analysis method and system
CN114547299A (en) Short text sentiment classification method and device based on composite network model
CN110232395B (en) Power system fault diagnosis method based on fault Chinese text
CN110598005A (en) Public safety event-oriented multi-source heterogeneous data knowledge graph construction method
CN113255320A (en) Entity relation extraction method and device based on syntax tree and graph attention machine mechanism
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN113705238B (en) Method and system for analyzing aspect level emotion based on BERT and aspect feature positioning model
CN113255366B (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN112528668A (en) Deep emotion semantic recognition method, system, medium, computer equipment and terminal
CN113449204B (en) Social event classification method and device based on local aggregation graph attention network
Zhang Research on text classification method based on LSTM neural network model
CN112199503B (en) Feature-enhanced unbalanced Bi-LSTM-based Chinese text classification method
CN112784532A (en) Multi-head attention memory network for short text sentiment classification
CN114722835A (en) Text emotion recognition method based on LDA and BERT fusion improved model
CN114547230A (en) Intelligent administrative law enforcement case information extraction and case law identification method
CN114694255A (en) Sentence-level lip language identification method based on channel attention and time convolution network
Gao et al. REPRESENTATION LEARNING OF KNOWLEDGE GRAPHS USING CONVOLUTIONAL NEURAL NETWORKS.
CN111723572B (en) Chinese short text correlation measurement method based on CNN convolutional layer and BilSTM
CN113779966A (en) Mongolian emotion analysis method of bidirectional CNN-RNN depth model based on attention
CN113255360A (en) Document rating method and device based on hierarchical self-attention network
CN112560440A (en) Deep learning-based syntax dependence method for aspect-level emotion analysis
CN113283605B (en) Cross focusing loss tracing reasoning method based on pre-training model
CN114722798A (en) Ironic recognition model based on convolutional neural network and attention system
CN114693949A (en) Multi-modal evaluation object extraction method based on regional perception alignment network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination