CN115630653A - Network popular language emotion analysis method based on BERT and BiLSTM - Google Patents

Network popular language emotion analysis method based on BERT and BiLSTM

Info

Publication number
CN115630653A
Authority
CN
China
Prior art keywords
model
bert
bilstm
network
popular
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211362305.8A
Other languages
Chinese (zh)
Inventor
李新路
雷园园
嵇圣硙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University
Original Assignee
Hefei University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University
Priority to CN202211362305.8A
Publication of CN115630653A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of text analysis, and in particular to a network popular language emotion analysis method based on BERT and BiLSTM, comprising the following steps: constructing a network popular language emotion analysis model, BERT-BiLSTM, which combines Bidirectional Encoder Representations from Transformers with a bidirectional long short-term memory network; the BERT pre-trained model is fine-tuned to generate dynamic representations of network popular word vectors, which are then input into the BiLSTM for training so that the local and global semantic features of the text are obtained before emotion classification is performed; acquiring data from the Internet, establishing a data set, and labelling and classifying it; screening the classified data set to obtain standard data, inputting the standard data into the BERT-BiLSTM deep learning model for analysis, obtaining the analysis result, and judging the emotion of the popular language; and performing emotional-tendency analysis on network popular language so as to track in real time the direction in which language in poor taste is spreading.

Description

Network popular language emotion analysis method based on BERT and BiLSTM
Technical Field
The invention relates to the technical field of text analysis, and in particular to a network popular language emotion analysis method based on BERT and BiLSTM.
Background
With the development of deep learning in natural language processing, deep neural networks have achieved outstanding results in sentiment analysis. Kim addressed the sentiment classification problem with convolutional neural networks (CNN). Cho proposed the gated recurrent unit (GRU) to model contexts with long-range dependency problems, with significant improvements across natural language processing tasks. Qu and Wang proposed a sentiment analysis model based on a hierarchical attention network whose accuracy improves on a recurrent neural network by 5%. To address the inability of traditional neural networks to fully capture the context of a sentence or comment, some scholars have employed variants of recurrent neural networks (RNNs), such as LSTM or GRU, for sentiment analysis. The LSTM model performs sentiment classification at the phrase level so as to incorporate linguistic regularities such as negation, intensity and polarity. Nio used a bidirectional LSTM trained on tagged text to handle the syntax and semantics of Japanese. For polarity classification of phrase-level words, Zhang proposed a BiGRU model with multiple inputs and outputs.
In the above research, the sample data is essentially regular, formal written text. By contrast, most of the network popular language text studied in this application appears in free form and has no clear semantic structure; CNN and RNN models cannot process such text well and therefore struggle to resolve the different emotions that internet slang carries in context. In addition, network popular language contains a great deal of polysemy, which calls for richer, dynamic word-vector representations.
Disclosure of Invention
In order to overcome the defects of the prior art, the main purpose of the invention is to provide a method that combines a BERT pre-trained model with a BiLSTM bidirectional long short-term memory network to analyse the sentiment of network popular language. Dynamic word vectors are generated by the pre-trained BERT model and input into the bidirectional long short-term memory network BiLSTM, which extracts the emotional features of the text from the contextual semantics contained in the word vectors; finally, a Softmax layer determines the emotional polarity of the current text.
In order to achieve this purpose, the invention adopts the following technical scheme. A network popular language emotion analysis method based on BERT and BiLSTM comprises the following steps:
constructing a network popular language emotion analysis model from a BERT pre-trained language model and a BiLSTM bidirectional long short-term memory network model;
using BERT as the upstream module and the BiLSTM bidirectional long short-term memory network model as the downstream module of the network popular language emotion analysis model, fine-tuning the BERT pre-trained language model to generate dynamic representations of network popular word vectors, inputting the dynamic representations into the BiLSTM bidirectional long short-term memory network model for training, and performing emotion classification after obtaining the local and global semantic features of the text;
and acquiring the network popular language data to be analyzed from the Internet, and analyzing it with the trained network popular language emotion analysis model to obtain the emotion classification result corresponding to the network popular language.
Preferably, the BERT pre-training language model is pre-trained through large-scale corpora to obtain model network parameters adapted to a general natural language processing task, and the pre-training network parameters are adjusted according to a text of a current task.
Preferably, the adjustment of the BERT pre-training language model is to use parameters obtained by pre-training as initial values of the model, input an artificially labeled data set according to a downstream task, balance the relationship between the data and the model, further fit and converge the BERT pre-training language model, and obtain a model for pre-training the downstream task.
Preferably, the acquired popular language data to be analyzed is screened to obtain a data set meeting the input standard of the popular language emotion analysis model, and the data set is classified into a training set, a test set and a verification set, wherein the training set accounts for 80%, the test set accounts for 10%, and the verification set accounts for 10%.
Preferably, the labels are divided into two categories, negative and positive.
Preferably, the result of multiplying the output of the BERT pre-trained language model by a learnable weight is used as the input of the BiLSTM bidirectional long short-term memory network model.
Compared with the prior art, the method uses a deep learning model that combines BERT and BiLSTM: the BERT model is fine-tuned on the downstream task to generate dynamic representations of network popular word vectors, so that the embedding layer captures the rich information of the text and the problem of static word-vector representation is solved. The generated word vectors are then input into the BiLSTM bidirectional long short-term memory network model for feature extraction, which obtains the local and global semantic features of the text, highlights its emotional polarity, and performs emotion classification. Experimental results show that the model performs well on the comprehensive evaluation index F1, and that it is of significance and research value for the sentiment analysis of irregular texts such as network popular language.
Drawings
FIG. 1 is a diagram of the BERT-BiLSTM network popular language model architecture of the present invention;
FIG. 2 shows the BiLSTM-layer language model structure of the present invention;
FIG. 3 is a comparison chart of F1 values of the models according to the present invention;
FIG. 4 is a Train Acc/Loss graph according to the present invention;
FIG. 5 is a Val Acc/Loss graph according to the present invention;
FIG. 6 is a flow chart of the present invention.
Detailed Description
The invention is further described with reference to the following figures and embodiments.
Example 1:
Referring to FIG. 6, a network popular language emotion analysis method based on BERT and BiLSTM includes the following steps:
constructing a network popular language emotion analysis model from a BERT pre-trained language model and a BiLSTM bidirectional long short-term memory network model; the emotion analysis model uses BERT as the upstream module and BiLSTM as the downstream module, fine-tunes the BERT pre-trained model to generate dynamic representations of network popular word vectors, inputs the dynamic representations into the BiLSTM for training, obtains the local and global semantic features of the text, and then performs emotion classification; and acquiring the network popular language data to be analyzed from the Internet and analyzing it with the trained emotion analysis model to obtain the emotion classification result corresponding to the network popular language.
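For orientation, the following Python sketch (PyTorch with the HuggingFace transformers tokenizer; the checkpoint name, the classify_slang helper and the model's call signature are assumptions for illustration, not the patented implementation) shows how a trained model of this kind would be applied to new texts in the last step; the model construction and training themselves are detailed in Example 2.

```python
# Illustrative sketch of the final classification step only; names are assumptions.
import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed Chinese BERT checkpoint

def classify_slang(texts, model, device="cpu"):
    """Tokenize network popular language texts and return 0 (negative) / 1 (positive) per text."""
    enc = tokenizer(texts, padding=True, truncation=True, max_length=64, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        logits = model(enc["input_ids"].to(device), enc["attention_mask"].to(device))
    return logits.argmax(dim=-1).tolist()
```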
As a further optimization of the invention, the BERT model is pre-trained through large-scale linguistic data to obtain model network parameters suitable for a general natural language processing task, and the pre-trained network parameters are adjusted according to the text of the current task.
As a further optimization of the invention, the adjustment of the BERT model is to use parameters obtained by pre-training as initial values of the model, input an artificially labeled data set according to a downstream task, balance the relationship between data and the model, further fit and converge the BERT model, and obtain a model which can be used for pre-training of the downstream task.
As a further optimization of the method, the acquired network popular language data to be analyzed is screened to obtain a data set which accords with the input standard of the network popular language emotion analysis model, and the data set is classified into a training set, a test set and a verification set, wherein the training set accounts for 80%, the test set accounts for 10% and the verification set accounts for 10%.
As a further optimization of the present invention, the labels fall into two categories, negative and positive.
As a further optimization of the present invention, the result of multiplying the BERT pre-trained model output by a learnable weight is used as the input of the BiLSTM model.
Example 2
Referring to FIGS. 1-5 and building on embodiment 1, the combined prediction model used in the present application to construct the network popular language emotion analysis model is BERT-BiLSTM, a bidirectional long short-term memory network stacked on Bidirectional Encoder Representations from Transformers. It should be noted that the BERT-BiLSTM proposed in the present application uses BERT as the upstream module and BiLSTM as the downstream module: the BERT pre-trained model is fine-tuned and its output is then fed into the BiLSTM for training, rather than treating BERT and BiLSTM as a loosely combined pair of models. The structure of the BERT-BiLSTM network popular language model is shown in FIG. 1.
Fine-tuning the BERT pre-trained model to generate dynamic representations of network popular word vectors, inputting these into the BiLSTM for training, obtaining the local and global semantic features of the text, and performing emotion classification;
and acquiring the network popular language data to be analyzed from the Internet, and analyzing it with the trained network popular language emotion analysis model to obtain the emotion classification result corresponding to the network popular language.
At present, the BERT model has strong advantages in NLP tasks such as text classification and emotion analysis. It is therefore well suited to network popular language, which is strongly context-dependent and rife with polysemy. BERT is a pre-trained neural network model for transfer learning; its Transformer-based bidirectional encoder takes the information both before and after a word into account when processing that word, thereby capturing the semantics of the word's context. The BERT base model stacks 12 encoder layers, each with 12 self-attention heads and a hidden size of 768, and the output of the final layer is provided to the downstream task as high-quality word vectors. BERT is pre-trained on large-scale corpora to obtain network parameters suited to general natural language processing tasks; these pre-trained parameters are then fine-tuned with the text of the current task so that the model adapts to the specific downstream task.
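As a minimal sketch of obtaining such contextual word vectors from a pre-trained BERT-base encoder (the HuggingFace transformers library and the bert-base-chinese checkpoint are assumptions; neither is named in the original description):

```python
import torch
from transformers import BertModel, BertTokenizer

# BERT base: 12 stacked encoder layers, 12 self-attention heads, hidden size 768.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-chinese")

enc = tokenizer("yyds", return_tensors="pt")                      # an example network buzzword
with torch.no_grad():
    out = bert(**enc)
print(out.last_hidden_state.shape)   # torch.Size([1, seq_len, 768]): one 768-d vector per token
```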
BERT can compute word representations from contextual information and adjust them to the sense of the word when fusing that context, whereas a word represented by Word2Vec cannot encode its context.
The input to the BERT model is the sum of three embeddings, Token embedding, Segment embedding and Position embedding, which together produce the word-vector representation. A [CLS] token is placed at the start of the sequence as the first token (its vector can be randomly initialised) and [SEP] tokens mark the boundaries between sentences; Segment embedding distinguishes the different popular-language texts in a pair (for example, when judging whether they are semantically similar), and Position embedding encodes the position of each word in the popular-language text. The input vector of the BERT model therefore contains not only short-text semantics but also information distinguishing the sentences and the positions of the words. BERT is pre-trained with the Masked Language Model (MLM) objective, which randomly masks input words and then predicts the original vocabulary of the masked positions from their context. With Next Sentence Prediction (NSP), markers are inserted at the beginning and end of each sentence and the model learns inter-sentence relational features by predicting whether two sentences are adjacent. This bidirectional training gives BERT a deeper sense of linguistic context and the ability to learn the meaning of a word from its surroundings. For different natural language processing downstream tasks, the model input is fine-tuned: the parameters obtained by pre-training are used directly as initial values of the model, a manually labelled data set for the downstream task is input, and the relationship between the data and the model is balanced so that the BERT model further fits and converges, finally yielding a model usable by the downstream task.
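A short sketch of what the three input embeddings correspond to in practice (again assuming the HuggingFace tokenizer; the example buzzwords are arbitrary):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")    # assumed checkpoint

# A sentence pair is encoded as: [CLS] text_a [SEP] text_b [SEP]
enc = tokenizer("绝绝子", "yyds")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))          # token sequence with [CLS]/[SEP] marks
print(enc["token_type_ids"])   # segment ids: 0 for the first text, 1 for the second
# Position embeddings are added inside BertModel from each token's index and are not
# passed explicitly; the three embeddings are summed to form the input representation.
```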
Network popular language text suffers from fragmented language, incomplete semantic structure and irregular features, and BERT has a strong ability to learn the statistical features of neighbouring words. The BiLSTM model is then used to extract contextual features from the input sequence, and a Softmax classifier analyses the feature vector output by the BiLSTM to classify the emotional polarity of the network popular language.
After training, the hidden-layer output of the BERT part of the BERT-BiLSTM model, C = {T_0, T_1, T_2, ..., T_N} ∈ R^n (including [CLS]), is multiplied by a learnable weight W_a and used as the input of the BiLSTM model. The calculation formula is:
a_i = g1(W_a C_i + b_a)   (1)
where 1 ≤ i ≤ n and n is the dimension of the feature vector output by the fine-tuned BERT model; a_i ∈ R^{d_a} is the input vector of the BiLSTM layer; b_a is a bias vector of dimension d_a; the activation function g1 is the Sigmoid function, which maps the input vector into the hidden layer.
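Formula (1) amounts to a learnable linear projection of the BERT hidden states followed by a Sigmoid; a minimal PyTorch rendering (the dimension d_a = 256 is an assumption for illustration):

```python
import torch
import torch.nn as nn

bert_dim, d_a = 768, 256              # 768 from BERT base; d_a is an assumed value
proj = nn.Linear(bert_dim, d_a)       # holds the learnable weight W_a and bias b_a
g1 = nn.Sigmoid()

C = torch.randn(1, 64, bert_dim)      # stand-in for the fine-tuned BERT hidden states C
a = g1(proj(C))                       # a_i = g1(W_a C_i + b_a), the input sequence of the BiLSTM layer
```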
The ordinary LSTM model computes a unidirectional hidden-layer sequence h, whereas the BiLSTM model computes a forward hidden-layer vector h_i^→ and a backward hidden-layer vector h_i^← and combines the two as its output. Let v_i denote the output vector of the hidden layer at moment i; the input-output process of the BiLSTM layer is shown in FIG. 2, and the calculation formula is:
v_i = [h_i^→ ; h_i^←]   (2)
In addition, the activation-function formula of the model's hidden layer is:
h_i^d = g2(W_a^d a_i + W_h^d h_{i-1}^d + b^d)   (3)
where g2 is the tanh function; W_a^d is the weight matrix of a_i; d ∈ {0, 1} indexes the two directions of the hidden layer (forward and backward); W_h^d is the weight matrix of the hidden-layer output h^d at the previous moment, i.e. moment i-1; and b^d is the bias vector for direction index d. The output sequences h_i^d of all hidden layers are then spliced together to obtain the final feature vector H of the sentence. The feature vector H is then passed in turn through a fully connected layer with the ReLU activation function and a fully connected layer with the Softmax activation function to classify the emotional tendency. The probability of each emotional-tendency class is calculated as:
p(y | H, W_s, b_s) = softmax(W_s H + b_s)   (4)
where W_s ∈ R^{|s|×|l|} and b_s ∈ R^{|l|} are parameters of the output layer and l is the number of classes.
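Putting formulas (1) through (4) together, one possible PyTorch sketch of the BERT-BiLSTM classifier follows (the HuggingFace BertModel is assumed for the upstream module; the layer sizes and the mean pooling used to form the sentence vector H are illustrative assumptions rather than the exact configuration of the invention):

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertBiLSTM(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", d_a=256, lstm_hidden=256, num_classes=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)          # upstream module, fine-tuned end to end
        self.proj = nn.Linear(768, d_a)                           # learnable weight W_a and bias b_a (formula 1)
        self.bilstm = nn.LSTM(d_a, lstm_hidden, batch_first=True,
                              bidirectional=True)                 # forward and backward hidden states (formulas 2-3)
        self.fc1 = nn.Linear(2 * lstm_hidden, lstm_hidden)        # fully connected layer with ReLU
        self.fc2 = nn.Linear(lstm_hidden, num_classes)            # output layer W_s, b_s (formula 4)

    def forward(self, input_ids, attention_mask):
        C = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state   # hidden states including [CLS]
        a = torch.sigmoid(self.proj(C))                           # a_i = g1(W_a C_i + b_a)
        v, _ = self.bilstm(a)                                     # v_i = [h_i forward ; h_i backward]
        H = v.mean(dim=1)                                         # sentence feature vector H (pooling is an assumption)
        logits = self.fc2(torch.relu(self.fc1(H)))
        return logits                                             # softmax is applied by the loss or at inference
```

At inference time, torch.softmax(logits, dim=-1) gives the class probabilities of formula (4).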
The data set used in this application comes partly from the Internet corpus of the Sogou Laboratory and partly from comment texts containing popular terms collected from trending microblog (Weibo) events, giving 90000 network popular language texts in total. Roughly twenty thousand texts suffered from problems such as excessive length, mixed emotional tendency or too many special symbols, so the original text data was cleaned and screened, leaving 60000 network popular language texts that meet the model's input standard. Most special texts such as network popular language carry no emotion-tendency labels, so to ensure data validity the text contents were manually labelled and classified (the labels fall into two categories, negative and positive), and the corpus was further refined by comparing the manual labels against an algorithmic analysis. Since this application studies binary sentiment classification, negative network popular language is denoted by 0 and positive by 1. 80% of the data set was selected as the training set, 10% as the test set and 10% as the validation set (see Table 1).
TABLE 1 network popular language text collections
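As an illustration of the 80%/10%/10% division summarised in Table 1, a sketch assuming the cleaned texts and their 0/1 labels are already in memory (scikit-learn's train_test_split is used here purely for illustration):

```python
from sklearn.model_selection import train_test_split

def split_dataset(texts, labels, seed=42):
    """Split ~60000 labelled texts into 80% train, 10% test, 10% validation."""
    x_train, x_rest, y_train, y_rest = train_test_split(
        texts, labels, test_size=0.2, random_state=seed, stratify=labels)
    x_test, x_val, y_test, y_val = train_test_split(
        x_rest, y_rest, test_size=0.5, random_state=seed, stratify=y_rest)
    return (x_train, y_train), (x_test, y_test), (x_val, y_val)
```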
The model in this application was run on the Ubuntu 20.04 operating system with a GeForce RTX 3080 graphics card. The experimental framework was built with PyTorch; the development environment was torch 1.11.0 and the development language Python 3.7.
The dynamic learning rate and early stopping used with the BERT model determine the parameters num_epochs and learning_rate. Setting num_epochs = 9 means the model is trained for 9 epochs; the accuracy of each epoch is recorded, the current learning rate is reduced if the accuracy does not improve over the previous epochs, and training stops if the test accuracy fails to improve for 9 consecutive epochs. The learning rate of the model is 5e-5. Batch_size is the number of training samples in each batch and can be set to 16, 32, 64 and so on. With Batch_size = 16 or 32, training becomes slower as the number of iterations grows, while a larger Batch_size better represents the characteristics of the whole data set and stabilises the direction of gradient descent; with Batch_size = 256 the number of iterations drops but the parameters are revised slowly, so Batch_size was finally set to 128. Pad_size is the processing length of each text: short texts are padded and long texts are truncated. Since network popular language texts are short, Pad_size was set to 64. Hidden_size is 768 in the BERT model, i.e. the number of neurons in the model's hidden layer; it is left unchanged in the combined model, so Hidden_size remains 768 (see Table 2).
TABLE 2 Experimental parameters
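The settings summarised in Table 2 can be gathered into a configuration and a training skeleton; the AdamW optimizer, the ReduceLROnPlateau scheduler and the evaluate helper below are illustrative stand-ins for the dynamic-learning-rate and early-stopping strategy described above, not the exact training code of the invention:

```python
import torch

config = dict(num_epochs=9, learning_rate=5e-5, batch_size=128, pad_size=64, hidden_size=768)

def train(model, train_loader, val_loader, evaluate, patience_limit=3, device="cuda"):
    optimizer = torch.optim.AdamW(model.parameters(), lr=config["learning_rate"])
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max")  # lower lr when val accuracy stalls
    criterion = torch.nn.CrossEntropyLoss()
    best_acc, patience = 0.0, 0
    for epoch in range(config["num_epochs"]):
        model.train()
        for input_ids, attention_mask, labels in train_loader:
            optimizer.zero_grad()
            logits = model(input_ids.to(device), attention_mask.to(device))
            loss = criterion(logits, labels.to(device))
            loss.backward()
            optimizer.step()
        acc = evaluate(model, val_loader, device)   # hypothetical helper returning validation accuracy
        scheduler.step(acc)
        if acc > best_acc:
            best_acc, patience = acc, 0
        else:
            patience += 1
            if patience >= patience_limit:          # early stopping when accuracy no longer improves
                break
    return best_acc
```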
In the present application, Precision, Recall and the F1 value (F1-score) are used as evaluation indexes. Precision represents the proportion of predicted positive samples that are actually positive; Recall reflects the model's ability to recognise positive samples; F1 reaches its maximum value only when precision and recall are both 1. When the positive and negative sample sets are unbalanced, the computed precision and recall can diverge. By combining the Precision and Recall indexes, the F1-score reflects classification performance more comprehensively: the better the classifier, the closer the F1 value is to 1. The quantities involved are defined as follows:
Results               Actually positive    Actually negative
Predicted positive    TP                   FP
Predicted negative    FN                   TN
The calculation formula for Precision can be expressed as:
Precision = TP / (TP + FP)
The calculation formula for Recall can be expressed as:
Recall = TP / (TP + FN)
The calculation formula for the F1 value can be expressed as:
F1 = 2 × Precision × Recall / (Precision + Recall)
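A small sketch computing the same three metrics from the confusion-matrix counts defined above (the counts in the comment are made up for illustration):

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# precision_recall_f1(938, 62, 64) -> approximately (0.938, 0.936, 0.937); illustrative counts only
```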
comparative analysis for experiments
In this application, 8 groups of experiments were carried out and Precision, Recall and F1 values were obtained on the test set for each; the comparison results on the self-built network popular phrase data set are shown in Table 3.
TABLE 3 comparison of models
Model Precision(%) Recall(%) F1-score(%)
BERT 93.86 93.55 93.71
BERT-RNN 94.27 93.00 93.63
BERT-BiLSTM 93.78 93.80 93.79
TextCNN 92.83 91.77 92.29
BiLSTM 92.05 92.96 92.50
BiLSTM-Attention 93.59 93.63 92.61
ERNIE 94.20 93.28 93.74
BERT-CNN 94.32 92.81 93.56
From the experimental results in Table 3, the BERT-BiLSTM model presented here outperforms the other models on the network popular language data set. First, the BERT-BiLSTM model has the highest F1-score, showing that the model has strong overall capability and good generalisation; the F1-score combines the other two indexes and reflects overall performance. Second, the Recall of the BERT-BiLSTM model is also the highest of all models, indicating that it identifies the largest number of positive samples and covers the training samples broadly, reflecting the sensitivity of the model. Third, BERT-BiLSTM performs well in precision: its value is close to that of BERT and only slightly below BERT-CNN and BERT-RNN, a gap of just 0.54%, which is a consequence of the model's higher recall indirectly lowering precision. It is worth noting that the BiLSTM model is particularly good at learning the contextual features of text, which is why the three models BERT-BiLSTM, BiLSTM and BiLSTM-Attention obtain better results in Recall and F1-score. Clearly the BERT-BiLSTM model trains best, demonstrating that the word vectors obtained from the BERT pre-trained model contain more complete context information and help the subsequent model extract text information.
FIG. 3 compares the F1 values of the eight experiments. The F1 index of the BERT-BiLSTM model is the highest among the seven comparison experiments designed here, which shows the improvement in overall classification performance and in the ability to distinguish classes, and demonstrates that the model is strong at emotion analysis.
FIG. 4 and FIG. 5 show the behaviour of the BERT-BiLSTM model on the training set and the validation set of the self-built network popular language data set; the curves of Train Acc, Train Loss, Val Acc and Val Loss show that the performance of the model on the validation set stabilises. The final loss (Test Loss) and accuracy (Test Acc) on the test set were 0.15 and 93.74%, respectively.
The wide use of social media platforms has produced an endless variety of expressions, and with it a flood of network popular phrases that lack complete semantic structure and obvious emotional features. Many network popular phrases originate in comments and chats around trending events on Weibo, Douyin or game platforms. Performing emotional-tendency analysis on them, so that the spread of language in poor taste can be tracked in real time, is of great significance for purifying the online environment.
This embodiment collects the most widely used network popular phrases of the last decade as the research object, introduces the BERT and BiLSTM models, and provides a new method for feature extraction and emotional-tendency classification of network popular language text. To address the problems that static word vectors cannot cover contextual semantic information and that irregular, informal written text is hard to process, the method pre-trains on the network popular language text with a BERT model to obtain feature representations of the input text, which are then used as the input of a bidirectional LSTM model for emotion-classification training. The model achieves fairly good results in experiments on the network popular language text data set. At the same time, because the amount of data collected is not yet large enough to mine more emotional features and the degree of negative words cannot be finely distinguished, follow-up research will expand the data range for deeper study and provide better methods and suggestions for the sentiment analysis of network popular phrases.
It is to be noted that, in the present invention, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The above embodiments are merely illustrative and should not be construed as limiting the scope of the invention, which is intended to be covered by the claims.

Claims (6)

1. A network popular language emotion analysis method based on BERT and BiLSTM, characterized by comprising the following steps:
constructing a network popular language emotion analysis model from a BERT pre-trained language model and a BiLSTM bidirectional long short-term memory network model;
using the BERT pre-trained language model as the upstream module and the BiLSTM bidirectional long short-term memory network model as the downstream module of the network popular language emotion analysis model, fine-tuning the BERT pre-trained language model to generate dynamic representations of network popular word vectors, inputting the dynamic representations into the BiLSTM bidirectional long short-term memory network model for training, and performing emotion classification after obtaining the local and global semantic features of the text;
and acquiring the network popular language data to be analyzed from the Internet, and analyzing it with the trained network popular language emotion analysis model to obtain the emotion classification result corresponding to the network popular language.
2. The network popular language emotion analysis method based on BERT and BiLSTM as claimed in claim 1, wherein said BERT pre-trained language model is pre-trained on large-scale corpora to obtain model network parameters adapted to general natural language processing tasks, and the pre-trained network parameters are adjusted according to the text of the current task.
3. The network popular language emotion analysis method based on BERT and BiLSTM as claimed in claim 2, wherein the adjustment of the BERT pre-trained language model is to use the parameters obtained from pre-training as initial values of the model and to input a manually labelled data set according to the downstream task, balancing the relationship between the data and the model so that the BERT pre-trained language model further fits and converges, thereby obtaining a model for the downstream task.
4. The method as claimed in claim 3, wherein the obtained popular language data to be analyzed is screened to obtain a data set meeting the input standard of the popular language emotion analysis model, and the data set is classified into a training set, a testing set and a validation set, wherein the training set accounts for 80%, the testing set accounts for 10%, and the validation set accounts for 10%.
5. The network popular language emotion analysis method based on BERT and BiLSTM as claimed in claim 3, wherein said labels are divided into two categories, negative and positive.
6. The method of claim 1, wherein the result of multiplying the output of the BERT pre-trained language model by a learnable weight is used as the input of the BiLSTM bidirectional long short-term memory network model.
CN202211362305.8A 2022-11-02 2022-11-02 Network popular language emotion analysis method based on BERT and BiLSTM Pending CN115630653A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211362305.8A CN115630653A (en) 2022-11-02 2022-11-02 Network popular language emotion analysis method based on BERT and BiLSTM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211362305.8A CN115630653A (en) 2022-11-02 2022-11-02 Network popular language emotion analysis method based on BERT and BiLSTM

Publications (1)

Publication Number Publication Date
CN115630653A true CN115630653A (en) 2023-01-20

Family

ID=84908322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211362305.8A Pending CN115630653A (en) Network popular language emotion analysis method based on BERT and BiLSTM

Country Status (1)

Country Link
CN (1) CN115630653A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115811630A (en) * 2023-02-09 2023-03-17 成都航空职业技术学院 Education informatization method based on artificial intelligence
CN116737922A (en) * 2023-03-10 2023-09-12 云南大学 Tourist online comment fine granularity emotion analysis method and system


Similar Documents

Publication Publication Date Title
Du et al. Explicit interaction model towards text classification
CN107092596B (en) Text emotion analysis method based on attention CNNs and CCR
CN107358948B (en) Language input relevance detection method based on attention model
KR102008845B1 (en) Automatic classification method of unstructured data
CN110472003B (en) Social network text emotion fine-grained classification method based on graph convolution network
CN109284506A (en) A kind of user comment sentiment analysis system and method based on attention convolutional neural networks
CN108829662A (en) A kind of conversation activity recognition methods and system based on condition random field structuring attention network
CN110502753A (en) A kind of deep learning sentiment analysis model and its analysis method based on semantically enhancement
CN115630653A (en) Network popular language emotion analysis method based on BERT and BiLSTM
CN111414481A (en) Chinese semantic matching method based on pinyin and BERT embedding
CN112818118B (en) Reverse translation-based Chinese humor classification model construction method
CN112784041B (en) Chinese short text sentiment orientation analysis method
WO2023004528A1 (en) Distributed system-based parallel named entity recognition method and apparatus
CN111949790A (en) Emotion classification method based on LDA topic model and hierarchical neural network
CN113094502A (en) Multi-granularity takeaway user comment sentiment analysis method
CN111339772B (en) Russian text emotion analysis method, electronic device and storage medium
Tavan et al. Persian emoji prediction using deep learning and emoji embedding
CN112199503B (en) Feature-enhanced unbalanced Bi-LSTM-based Chinese text classification method
Alhaidari et al. Detecting irony in arabic microblogs using deep convolutional neural networks
CN116644760A (en) Dialogue text emotion analysis method based on Bert model and double-channel model
CN115906816A (en) Text emotion analysis method of two-channel Attention model based on Bert
Yang et al. Multi-applicable text classification based on deep neural network
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
CN113342964B (en) Recommendation type determination method and system based on mobile service
Wang Text emotion detection based on Bi-LSTM network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination