CN114595693A - Text emotion analysis method based on deep learning - Google Patents

Text emotion analysis method based on deep learning

Info

Publication number
CN114595693A
CN114595693A
Authority
CN
China
Prior art keywords
deep learning
text
model
attention
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011428365.6A
Other languages
Chinese (zh)
Inventor
蔡颖凯
王楚
王忠锋
张冶
曹世龙
关艳
高曦莹
宋纯贺
李力刚
赵洪莹
邹云峰
夏靖怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Marketing Service Center Of State Grid Liaoning Electric Power Co ltd
Shenyang Institute of Automation of CAS
Original Assignee
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Marketing Service Center Of State Grid Liaoning Electric Power Co ltd
Shenyang Institute of Automation of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jiangsu Electric Power Co ltd Marketing Service Center, Marketing Service Center Of State Grid Liaoning Electric Power Co ltd, Shenyang Institute of Automation of CAS filed Critical State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Priority to CN202011428365.6A
Publication of CN114595693A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/237 Lexical tools
    • G06F 40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a text emotion analysis method based on deep learning, comprising the following steps. Step one: preprocess text sample data and manually label emotion assessment grades in advance. Step two: construct a Self-Attention deep learning model for online text emotion analysis and train it with training set data; at each iteration, compute the loss function and the gradients of the output-layer neurons, and perform forward and back propagation to update the parameter values of every layer until a stopping condition is reached, yielding the optimized Self-Attention deep learning model and its network parameters. Step three: acquire actual text corpus data and process it with the optimized Self-Attention deep learning model to obtain the online text emotion analysis result.

Description

Text emotion analysis method based on deep learning
Technical Field
The invention belongs to the field of machine learning and text mining, and particularly relates to a text emotion analysis method based on deep learning.
Background
Online interactive platforms have greatly changed how people communicate and think, driving explosive growth of user-generated content. In recent years, the large volume of text generated by users has become one of the most representative sources of big data, and mining and analyzing user-generated information has become an important component of social development research. As an emerging information processing technology, emotion analysis of social media text, which analyzes, processes, summarizes, and reasons over subjective text carrying emotion, has received much attention in both academia and industry and has been widely applied across the internet. It also has a wide range of applications in daily life, for example in user interaction with service robots in electric power business halls. Traditional emotion analysis research focuses mainly on the emotion of the text itself but ignores individual differences among users in how they express emotion, which degrades the quality of the analysis results. To address these problems, the present invention targets the personalized emotion analysis problem. Considering the wide application of BP neural network technology in social media processing, the invention proposes several models based on the BP neural network to apply the personalized social media text emotion analysis method to online commodity reviews.
Sentiment Analysis (SA) is the process of analyzing, processing, summarizing, and reasoning over subjective text with sentiment expressions (e.g., microblogs, online reviews, online news). The history of sentiment analysis research is not long: it began to receive wide attention around 2000 and developed rapidly, gradually becoming a hot topic in natural language processing and text mining. There are many alternative names and similar techniques, such as opinion mining, emotion mining, and subjectivity analysis, all of which can be studied under sentiment analysis. For example, for movie reviews, the user's evaluation of the movie can be identified and analyzed; for product reviews of a digital camera, the user's emotional tendency toward indicators such as "price", "size", and "zoom" can be analyzed. Sentiment analysis has become a comprehensive research field spanning natural language processing, information retrieval, computational linguistics, machine learning, and artificial intelligence. Existing text emotion analysis algorithms mainly address the user's viewpoints and opinions about the text; because they lack any interpretation of the user characteristics behind the text, it is difficult for these algorithms to fully and accurately reflect the user's true emotional expression. To overcome these deficiencies, the invention provides a personalized text emotion analysis method by introducing the influence of users and even product features. Although emotion analysis of text is widely studied and performs well in many public evaluation tasks, there has been little research on practically usable text emotion analysis tools, particularly personalized ones, which have to some extent been overlooked by scholars and industry. Meanwhile, dynamically capturing the personalized emotion preferences of users poses a new challenge for emotion analysis. The invention first introduces the Attention Model idea: an Attention Model can compute the importance of each part of the input, tracking the object to be classified and focusing on the classification targets of interest. An LSTM deep neural network based on the Attention model can effectively alleviate text information redundancy, information loss, and other long-term dependency problems. To design a lighter network structure, the invention also improves on the Attention-LSTM network model to obtain an Attention-GRU network model for personalized emotion classification.
Disclosure of Invention
In order to overcome the above deficiencies in the prior art, the technical scheme adopted by the invention is as follows:
A text emotion analysis method based on deep learning comprises the following steps:
Step one: preprocess text sample data and manually label emotion assessment grades in advance;
Step two: construct a Self-Attention deep learning model for online text emotion analysis and train it with training set data; at each iteration, compute the loss function and the gradients of the output-layer neurons, and perform forward and back propagation to update the network parameter values of every layer until a stopping condition is reached, yielding the optimized Self-Attention deep learning model and its network parameters;
Step three: collect actual text corpus data and process it with the optimized Self-Attention deep learning model to obtain the online text emotion analysis result.
In step two, constructing the Self-Attention deep learning model for online text emotion analysis and training it with training set data comprises the following steps:
Modeling is carried out with a BERT deep learning model under the Self-Attention framework.
The word embeddings of all words of an input sentence serve as the semantic representation of the sentence. From this input representation, the hidden-layer semantic representation is obtained with the linear operation of matrix multiplication and a nonlinear activation function. The hidden-layer representation is then reduced in dimension to obtain a sentence-level semantic representation. The sentence representation is combined with the user representation and fed into the classification layer, incorporating the effect of user information at the sentence level. The classification layer maps the resulting vector into a two-dimensional emotion space and performs emotion classification with softmax.
Model input: training set D = {(x1, ..., xk), uk}, k = 1, ..., M, where M is the number of training samples;
Model output: the trained, optimized BERT deep learning model and its network parameters;
where (x1, ..., xk) are word vectors and uk is the manually pre-labeled emotion rating representing the user's characteristics.
The training set D is obtained after the sentences in the original online text sample data undergo word segmentation and stop-word removal.
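As an illustration of this preprocessing step, the following is a minimal Python sketch, assuming the jieba segmenter for Chinese text and a hypothetical stop-word file stopwords.txt (neither is specified in the original):

```python
# A minimal preprocessing sketch for step one: word segmentation with jieba
# and stop-word removal; "stopwords.txt" is a hypothetical resource file.
import jieba

def load_stopwords(path="stopwords.txt"):
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def preprocess(sentence, stopwords):
    # Segment the sentence into words, then drop stop words and whitespace.
    return [w for w in jieba.cut(sentence) if w.strip() and w not in stopwords]

stopwords = load_stopwords()
tokens = preprocess("这家餐厅的服务态度非常好", stopwords)
# tokens is the word sequence x1, ..., xn fed to the embedding layer;
# the emotion rating uk is attached to each sample by manual labeling.
```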
The sample data are divided into a training set and a verification set; the BERT deep learning model is trained with the data in the training set and verified with the data in the verification set.
The construction of the BERT deep learning model for text emotion analysis specifically comprises the following steps:
1) The Encoder-Decoder (coding-decoding) module of the Attention model is used for the following processing:
The input natural sequence or feature matrix sequence is machine-encoded in the Encoder module, expressed as:
C = F(x1, x2, x3, ..., xn)
where x1, x2, x3, ..., xn are the word vectors obtained after word segmentation and stop-word removal, F is the Encoder encoding function, and C is the word-vector form encoded according to that function;
The Encoder-encoded natural sequence or feature matrix sequence is decoded in the Decoder module, expressed as:
yi = G(C, y1, y2, y3, ..., yi-1)
where y1, y2, y3, ..., yi-1, yi are the decoded word vectors, i is the index, and G is the Decoder decoding function;
2) Segment the emotion text and obtain the word representation T by fine-tuning the BERT model:
Ti = BERT(yi)
A new user vector representation is obtained by embedding the user vector:
Ni = Ti ⊕ Eu
where Eu is the newly added user vector and ⊕ denotes combining the word vector with the user vector;
3) For each new user vector Ni, the Bi-LSTM layer computes the forward propagation result hi(forward) and the backward propagation result hi(backward), and the latest representation of the i-th word of the text is obtained by concatenating the two:
hi = [hi(forward); hi(backward)]
the high level of feature acquisition by Attention indicates that the classification of text is predicted at the classification level, as follows:
ci=softmax(Wwhi+bw)
wherein the content of the first and second substances,
Figure BDA0002819969560000046
to classify the final parameters of the network, bw∈RlC represents probability distribution belonging to the current ith class for the classification result bias information;
the network is trained using a cross entropy loss function as follows:
Figure BDA0002819969560000047
wherein u iskA truth label representing an emotion sample i, ciPredicting class y for a sampleiThe probability of (c). Network parameters are optimized using random gradient descent (SGD).
The verification data are input into the BERT deep learning model to obtain corresponding evaluation grade results, which are compared against the manually pre-labeled emotion evaluation grades; if the results fall within the error range, iteration stops and the optimized BERT deep learning model and its network parameters are obtained.
The method further comprises: automatically mapping and associating the text emotion analysis results with the user's original actual text corpus data according to the emotion rating, labeling the preference degree, and visualizing the result to display the user's preference degree intuitively.
The method colors and labels the actual text corpus data of different users with a color gradient sequence running from strong and dark to light and weak.
The invention has the advantages that:
text emotion analysis is handled in the present invention using a deep neural network based approach. The deep learning method is completely different from the traditional machine learning method, is mainly based on a neural network method, can autonomously perform feature representation learning of discrimination, does not need to design and train features and dictionaries of texts in advance, and can perform deep capture on semantic information. The method is used for text emotion analysis.
Drawings
FIG. 1 is the general framework of the Encoder-Decoder model;
FIG. 2 is a schematic diagram of the Self-Attention structure;
FIG. 3 is the BERT model structure;
FIG. 4 is a block diagram of the BERT-Attention-BiLSTM network;
FIG. 5 is the Bi-LSTM layer;
FIG. 6 is the classification layer;
FIG. 7 is a comparison on Yelp2013 of the algorithm with conventional classification methods;
FIG. 8 is a comparison on Yelp2014 of the algorithm with conventional classification methods;
FIG. 9 is a comparison on Yelp2013 of the algorithm with other deep learning models;
FIG. 10 is a comparison on Yelp2014 of the algorithm with other deep learning models;
Detailed Description
The Attention Model simulates how the human brain attends to objects of interest: it can focus recognition on regions of interest. Its core idea draws on the way the human brain, at a specific moment and in a specific region of a specific scene, concentrates more attention on certain things while ignoring secondary or uninteresting parts; it is thus a model of the optimal allocation of the brain's resources. The principle is to allocate more attention to key and interesting parts and little or none to the rest, using computational resources rationally and removing the influence of non-key or interfering factors. The Attention Model was first applied in computer vision for tasks such as image recognition, classification, and object detection, with good results. Attention models were later used for image-to-text conversion, i.e., converting pictures into natural language descriptions that humans can easily understand, making traditional models more effective in this respect.
The data sets employed by the invention are Yelp2013 and Yelp2014. The Yelp data set includes 4.7 million user reviews, information on more than 150,000 merchants, 200,000 pictures, and 12 metropolitan areas. It additionally covers over 1 million tips from more than 1.1 million users, over 1.2 million merchant attributes (such as business hours, whether there is a parking lot, whether reservations can be made, and environment information), and the aggregated number of user check-ins at each merchant over time. Reviews in the data set are divided into 5 levels, labeled in English: "Eek! Methinks not.", "Meh. I've experienced better.", "A-OK.", "Yay! I'm a fan.", and "Woohoo! As good as it gets!". As shown in Table 1, the more stars a customer gives a merchant, the better the customer's evaluation.
TABLE 1 evaluation of a Sandwich restaurant on the Yelp Web site and its rating examples
(Table 1 is provided as an image in the original publication.)
The invention mainly describes the Attention Model in the field of Natural Language Processing (NLP), where it is often used together with the Encoder-Decoder model. The invention explains the working principle and effect of the Attention model through the usage and flow of its Encoder-Decoder module, and introduces the application method and experimental results of the Attention model in personalized emotion classification.
The Encoder-Decoder model is a classic encoding-decoding natural language processing model: a group of natural sequences or feature matrix sequences is input and converted into another, transformed (encoded) group of sequences. Its core idea is that the Encoder module encodes an input natural sequence (which may be a matrix sequence after word vectorization, or features obtained from a deep neural network) into a form that is easy for a computer to calculate and process, yielding an encoded natural sequence or feature matrix sequence; this encoded sequence is then input to the Decoder module for interpretation (decoding), and finally the transformed natural sequence or feature matrix sequence, now easy to identify and classify, is output. The Encoder-Decoder model is highly general and usable and can be conveniently combined with various traditional and deep neural network models; a variety of coding models can be used in the Encoder module to encode natural or feature matrix sequences. FIG. 1 shows the resulting general framework of the Encoder-Decoder model.
The input is typically a natural sequence or feature matrix sequence X = (x1, x2, x3, ..., xn), and the output is the decoded natural sequence or feature matrix sequence Y = (y1, y2, y3, ..., yn). The input sequence undergoes specific machine encoding in the Encoder module; the invention uses C to denote the encoded natural sequence or feature matrix sequence:
C = F(x1, x2, x3, ..., xn)
where x1, x2, x3, ..., xn are the word vectors obtained after word segmentation and stop-word removal, F is the Encoder encoding function, and C is the encoded word-vector form.
The Encoder-encoded natural sequence or feature matrix sequence is decoded in the Decoder module. For example, to output yi, the computer uses C and the previously generated y1, y2, y3, ..., yi-1, so yi is computed as:
yi = G(C, y1, y2, y3, ..., yi-1)
where y1, y2, y3, ..., yi-1, yi are the decoded word vectors, i is the index, and G is the Decoder decoding function.
It can thus be seen that when the Decoder computes each output yi, the semantic information used is the same: always the code generated from x1, x2, x3, ..., xn by the Encoder. That is, every element of the input sequence exerts the same influence on the output sequence, whereas the magnitude of these influences should be determined by the position of each element in the input sequence and its relevance to the target. Furthermore, for long input sequences, even though some codec models can effectively preserve the influence of historical data, part of the information relevant to the result is lost because of the dimensional limits of the semantic Encoder vector in natural language. From this analysis, the invention observes that for sequences encoded under the same rule, the decoded data generated in the decoding stage influence the output identically, and this encoding-decoding mechanism differs greatly from an optimal way of allocating attention. Therefore, to give machine encoding the same or a similar effect as the human brain's attention, an Attention model mechanism mimicking human attention is introduced. Its principle is to compute, at the decoder stage, an attention probability distribution of the input natural sequence or feature matrix sequence with respect to the current output yi: a unique semantic code corresponding to the target of interest is computed for each output, integrating the attention probability distribution of the encoded input features over the current output, which optimizes the current output result.
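To make the Encoder-Decoder flow concrete, the following is a minimal PyTorch sketch of C = F(x1, ..., xn) and yi = G(C, y1, ..., yi-1); the GRU cells and layer sizes are illustrative assumptions, not the patent's prescribed architecture:

```python
# A minimal GRU-based Encoder-Decoder sketch: the Encoder compresses the
# input sequence into a code C, and the Decoder predicts the next yi from
# C and the previously generated outputs.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x):                 # x: (batch, n) token ids
        _, h = self.gru(self.emb(x))      # h plays the role of the code C
        return h

class Decoder(nn.Module):
    def __init__(self, vocab, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, y_prev, C):         # y_prev: tokens y1..y(i-1)
        o, _ = self.gru(self.emb(y_prev), C)
        return self.out(o[:, -1])         # distribution over the next yi
```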
The Attention-Model structure of the invention adopts the most popular Self-Attention structure form to process data.
The structure of Self-Attention is shown in FIG. 2. The three matrices Q (query), K (key), and V (value) all come from the same input. The invention computes the dot product of Q and K and divides it by the square root of the dimension of the query and key vectors to prevent the result from becoming too large, normalizes the result into a probability distribution with softmax, and multiplies by the matrix V to obtain the weighted-sum representation. The attention weights of the output elements are computed as:
Attention(Q, K, V) = softmax(QK^T / sqrt(dk)) · V
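A minimal PyTorch sketch of this scaled dot-product Self-Attention computation; the projection matrices Wq, Wk, Wv are assumed inputs:

```python
# A sketch of the Self-Attention weight computation described above:
# softmax(Q K^T / sqrt(dk)) V, with Q, K, V derived from the same input.
import math
import torch
import torch.nn.functional as F

def self_attention(x, Wq, Wk, Wv):
    # x: (batch, n, d); Wq/Wk/Wv: (d, dk) projection matrices
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.transpose(-2, -1) / math.sqrt(K.size(-1))
    weights = F.softmax(scores, dim=-1)   # attention probability distribution
    return weights @ V                    # weighted-sum representation
```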
most of the existing models use Word2Vec pre-training Word vectors, however, the Word vectors trained by using the models have a problem when the Word vectors required by the invention are generated. Because the word vector acquired by the method belongs to one of static codes, the same word is still expressed in different context environments, and the semantic understanding of the model to different situations is deviated. To address this problem, the present invention selects the BERT pre-training language model as the word vector generation model. The BERT processing structure is a new natural language representation model, the BERT model can better represent the spatial interrelation of semantic information of sentences in the whole text, namely represent different expression meanings and information of the same text information in different contexts, on one hand, the sentences in the same or similar contexts have similar expression meanings, and theoretically, the distances of the sentences in the space are closer than the distances of the sentences in the space. On the other hand, the BERT model uses an operation method substantially similar to the understanding of the human brain when processing vector operations between sentences, and its model structure is shown in fig. 3.
The input to the BERT model is the sum of 3 vectors. For each input word, its representation comprises 3 parts: a word vector (token embeddings), a segment vector (segment embeddings), and a position vector (position embeddings). The word vector encodes the current word, the segment vector encodes the position of the sentence containing the current word, and the position vector encodes the position of the current word; each sentence uses CLS and SEP as its beginning and end markers.
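As a sketch of this three-part input, the following sums token, segment, and position embeddings in PyTorch; the vocabulary and dimension sizes are assumptions in the spirit of BERT-base:

```python
# A sketch of the BERT input: the representation of each position is the
# sum of token, segment, and position embeddings; 21128/512/768 are the
# assumed BERT-base-Chinese sizes, and 101/102 are the [CLS]/[SEP] ids.
import torch
import torch.nn as nn

vocab_size, max_len, d = 21128, 512, 768
tok_emb = nn.Embedding(vocab_size, d)      # token embeddings
seg_emb = nn.Embedding(2, d)               # segment embeddings
pos_emb = nn.Embedding(max_len, d)         # position embeddings

ids  = torch.tensor([[101, 2769, 1599, 3614, 102]])   # [CLS] ... [SEP]
segs = torch.zeros_like(ids)                          # single-sentence input
pos  = torch.arange(ids.size(1)).unsqueeze(0)
x = tok_emb(ids) + seg_emb(segs) + pos_emb(pos)       # (1, 5, 768)
```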
The most important part of the BERT model is the bidirectional Transformer coding layer, which performs text feature extraction using the Transformer's Encoder feature extractor. The Encoder consists of a self-attention mechanism and a feed-forward neural network. Its core is self-attention, which can find the degree of association between each word and every other word in the sentence without distance limits; relations between words dozens or even hundreds of positions apart can still be found, so the left and right context of each word is fully mined and a bidirectional representation of the word is obtained.
The model is built around three structural components: a BERT layer, a BiLSTM layer, and an Attention layer; its structure is shown in FIG. 4. In total the model is divided into five layers: a BERT layer, a user vector embedding layer, a Bi-LSTM layer, an attention layer, and a classification layer.
BERT layer: first the emotion text is segmented into words, and the word representation T is obtained by fine-tuning the BERT model. That is, given text D = {x1, x2, x3, ..., xn}, where xi represents a word in the given text, the word-vector representation of each word is obtained from the BERT word-vector model:
Ti = BERT(yi)
where Ti ∈ R^d is the word-vector representation of each word and d is the dimension of the word vector.
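In practice, such contextual word vectors Ti can be obtained, for example, with the Hugging Face transformers library; the bert-base-chinese checkpoint below is an assumption, since the original does not name one:

```python
# A sketch of obtaining Ti = BERT(.) contextual word vectors with the
# transformers library; the checkpoint name is an assumed example.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

enc = tokenizer("服务很周到", return_tensors="pt")
with torch.no_grad():
    out = model(**enc)
T = out.last_hidden_state          # (1, seq_len, d): one vector Ti per token
```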
User vector embedding layer: each user is represented as a vector Eu ∈ R^d. The invention then obtains a new user vector representation:
Ni = Ti ⊕ Eu
where ⊕ denotes combining the word vector Ti with the user vector Eu.
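A minimal sketch of the user vector embedding layer, assuming the combination Ni = Ti ⊕ Eu is concatenation and with illustrative sizes:

```python
# A sketch of the user vector embedding layer: one learned vector Eu per
# user, combined (here: concatenated, an assumption) with each word vector.
import torch
import torch.nn as nn

num_users, d = 10000, 768                 # illustrative sizes
user_emb = nn.Embedding(num_users, d)     # one learned vector Eu per user

def combine(T, user_ids):
    # T: (batch, n, d) word vectors; user_ids: (batch,)
    Eu = user_emb(user_ids).unsqueeze(1).expand(-1, T.size(1), -1)
    return torch.cat([T, Eu], dim=-1)     # Ni: (batch, n, 2d)
```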
Bi-LSTM layer: two operations are performed on each new user vector Ni, computing a forward propagation result and a backward propagation result:
hi(forward) = LSTM(forward)(Ni)
hi(backward) = LSTM(backward)(Ni)
where hi(forward), hi(backward) ∈ R^dh and dh is the number of hidden neurons in each LSTM. This yields the latest representation of the i-th word of the text:
hi = [hi(forward); hi(backward)]
The text representation matrix H = [h1; h2; h3; ...; hn] is then obtained, where hi ∈ R^(2dh).
The Bi-LSTM layer network structure is shown in figure 5.
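A minimal PyTorch sketch of this layer; the input size matches the concatenated Ni above, and dh is an illustrative assumption:

```python
# A sketch of the Bi-LSTM layer: forward and backward passes over Ni are
# concatenated into hi, so the output dimension is 2 * dh.
import torch
import torch.nn as nn

d, dh = 768, 256                          # illustrative sizes
bilstm = nn.LSTM(input_size=2 * d, hidden_size=dh,
                 batch_first=True, bidirectional=True)

N = torch.randn(1, 20, 2 * d)             # (batch, n, 2d) user-aware vectors
H, _ = bilstm(N)                          # H: (1, 20, 2*dh), hi = [fwd; bwd]
```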
Attention layer: in this part, the invention provides a new form of attention: the attention between the i-th word and all other words is computed in order to find emotion-related words. The attention weight aij between words i and j is computed from the Bi-LSTM feature representations hi and hj of the different words and normalized over j with softmax. The final hi is then defined as the attention-weighted sum:
hi = Σj aij · hj
and this final hi serves as the high-level feature representation of the emotional sentence.
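A minimal sketch of this word-to-word attention; the dot-product score is an assumption, since the original gives the scoring formula only as an image:

```python
# A sketch of the word-to-word attention layer: each word's score against
# every other word is softmax-normalized and used to re-weight the Bi-LSTM
# features; the dot-product scoring function is an assumed choice.
import torch
import torch.nn.functional as F

def word_attention(H):
    # H: (batch, n, 2*dh) Bi-LSTM features hi
    scores = H @ H.transpose(-2, -1)          # a(i, j) for every word pair
    alpha = F.softmax(scores, dim=-1)         # attention distribution per word
    return alpha @ H                          # final hi: weighted sum over hj
```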
Classification layer: based on the high-level feature representation obtained by the Attention layer, the text category is predicted at the classification layer:
ci = softmax(Ww hi + bw)
where Ww is the final parameter matrix of the classification network, bw ∈ R^l is the classification bias, and c represents the probability distribution. The layer structure is shown in fig. 6.
This part again trains the network with the cross-entropy loss function:
L = - Σk uk · log(ck)
where uk is the ground-truth label of emotion sample k and ck is the probability that the sample is predicted as its class. The optimizer uses stochastic gradient descent (SGD) to optimize the network parameters.
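A single training-step sketch for the classification layer and loss; sizes and the learning rate are illustrative assumptions:

```python
# A sketch of one training step: softmax classification over Ww hi + bw,
# cross-entropy loss, and a stochastic gradient descent update.
import torch
import torch.nn as nn

dh, num_classes = 256, 5
clf = nn.Linear(2 * dh, num_classes)          # Ww and bw
opt = torch.optim.SGD(clf.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()               # applies log-softmax internally

h = torch.randn(32, 2 * dh)                   # sentence-level features
labels = torch.randint(0, num_classes, (32,)) # manually labeled ratings uk
loss = loss_fn(clf(h), labels)
opt.zero_grad()
loss.backward()                               # gradients at the output layer
opt.step()                                    # update the network parameters
```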
During model training, the method is trained and tested with a word-vector classification model and a sentence-vector classification model, respectively. The word-vector classification model segments a document into a sequence of words and feeds them into the deep neural network for training; the sentence-vector classification model feeds word vectors into the deep neural network as whole sentences. When computing word vectors, special characters such as repeated punctuation and garbled codes are filtered out and deleted. Sentence vectors are formed by splicing word vectors to a fixed length before being sent to the network for training, as sketched below. The hyper-parameters used during model training are listed in Table 2.
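A minimal sketch of forming fixed-length sentence inputs as described above; max_len is an illustrative assumption:

```python
# A sketch of the sentence-vector construction: word vectors (after
# filtering special characters) are truncated or zero-padded to max_len.
import torch

def to_fixed_length(word_vecs, max_len=128):
    # word_vecs: (n, d) word vectors of one sentence
    n, d = word_vecs.shape
    if n >= max_len:
        return word_vecs[:max_len]
    pad = torch.zeros(max_len - n, d)
    return torch.cat([word_vecs, pad], dim=0)  # spliced to fixed length
```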
TABLE 2 Hyper-parameter settings
(Table 2 is provided as an image in the original publication.)
The model training procedure is shown in table 3:
TABLE 3 Algorithm flow
(Table 3 is provided as an image in the original publication.)
To evaluate the performance of the proposed model, the following evaluation criteria are used. Emotion classification is generally divided into binomial classification and multinomial classification.
For binomial classification with a class C in the sample:
TABLE 4 Binomial classification result matrix

                 Predicted as C    Predicted as not C
Actual C              TP                  FN
Actual not C          FP                  TN
For multinomial classification, assume there are classes Ci, i ∈ [1 ... Nc] (Nc is the number of categories):
TABLE 5 Multinomial classification result matrix

                 Predicted C1   Predicted C2   ...   Predicted CNc
Actual C1            N11            N12        ...       N1Nc
Actual C2            N21            N22        ...       N2Nc
...                  ...            ...        ...       ...
Actual CNc           NNc1           NNc2       ...       NNcNc

where Nij represents the number of samples of class i predicted as class j.
The evaluation indexes are then:
(1) Accuracy
Accuracy is a common evaluation index for neural network models, defined as the ratio of the number of samples correctly classified by the classifier to the total number of samples for a given data set. Accuracy can, to a certain extent, indicate whether a classifier is valid, but it cannot always evaluate a classifier effectively. As a simple example, consider a severely unbalanced set with 90% positive and 10% negative samples: predicting every sample as positive already yields 90% accuracy, yet says nothing about whether the classifier is valid. This shows that high accuracy is inflated when the samples are unbalanced.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
(2) Precision
Precision is the probability that a sample predicted as positive is actually positive; its calculation formula is given below. Accuracy and precision look somewhat similar but are completely different concepts: precision measures how correct the predictions are within the positive predictions, while accuracy measures the overall prediction correctness across both positive and negative samples.
Precision = TP / (TP + FP)
(3) Recall
Recall is defined as the probability that an actual positive sample is predicted as positive; its calculation formula is given below. An application scenario for recall: in website comments, users who give bad reviews are of more concern than users who give good ones, since misclassifying too many bad-review users may mislead subsequent users' judgment. The higher the recall, the higher the probability that a genuinely bad-review user is identified.
Recall = TP / (TP + FN)
(4) F-score
From the formulas alone, Precision and Recall have no necessary correlation; on large-scale data sets, however, the two criteria tend to constrain each other. Ideally both indexes would be high, but in general, the higher the precision, the lower the recall, so in practice a trade-off is made according to the specific situation. To balance the two indexes comprehensively, the F-score is therefore introduced; it is the harmonic value that jointly considers precision and recall:
F-score = 2 · Precision · Recall / (Precision + Recall)
(5) RMSE
The root mean square error, also called the standard error, measures the deviation of the predicted values; the best fit is achieved when RMSE = 0. It is another comprehensive indicator for error analysis. Its calculation formula is:
RMSE = sqrt( (1/N) · Σi (ŷi - yi)^2 )
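The five criteria can be computed, for example, with scikit-learn; macro averaging for the multi-class case is an assumption:

```python
# A sketch of the five evaluation criteria on toy 5-level rating data.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, mean_squared_error)

y_true = np.array([5, 4, 3, 5, 1])   # manually labeled ratings
y_pred = np.array([5, 4, 4, 5, 2])   # model predictions

acc  = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average="macro", zero_division=0)
rec  = recall_score(y_true, y_pred, average="macro", zero_division=0)
f1   = f1_score(y_true, y_pred, average="macro", zero_division=0)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
```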
The experiments use the Python 3.7 programming language with the PyTorch 1.5 deep learning framework on the Jupyter Notebook platform. The data sets used are "Yelp13" and "Yelp14". Each data set is split with 80% as the training set, 10% as the validation set used to save the model, and 10% as the test set used to measure the model's performance. This section compares the three models with the baseline models. The performance evaluation criteria are accuracy, precision, recall, F-score, and RMSE.
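A minimal sketch of this 80/10/10 split, assuming samples is a list of (text, user, rating) tuples:

```python
# A sketch of the 80/10/10 train/validation/test split described above.
import random

def split_dataset(samples, seed=42):
    random.seed(seed)
    random.shuffle(samples)
    n = len(samples)
    train = samples[:int(0.8 * n)]
    val   = samples[int(0.8 * n):int(0.9 * n)]   # for saving/selecting models
    test  = samples[int(0.9 * n):]               # for final evaluation
    return train, val, test
```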
TABLE 6 Yelp2013 data set Algorithm comparison results
(Table 6 is provided as an image in the original publication.)
TABLE 7 Yelp2014 data set Algorithm comparison results
(Table 7 is provided as an image in the original publication.)
As Tables 6 and 7 and the bar charts in FIGS. 7 and 8 show, Bert-Attention achieves the best results on all evaluation criteria. The Bert model has been applied successfully to many natural language processing tasks, and word representations obtained with Bert capture word semantics well. In Table 6, the Bert-Attention model is nearly 3% higher than the Attention-LSTM model on the F-score index, and in Table 7 it is nearly 4% higher. Attention-LSTM and Attention-GRU perform similarly on both data sets because they are structurally similar and have comparable ability to capture semantic features. The tables show that all three models outperform the other baseline models, verifying again that the deep learning models designed by the invention perform better in personalized emotion analysis and can further improve the accuracy, recall, and precision of emotion classification. The invention designs an attention-based deep neural network emotion analysis model for user personalization; after the attention structure is added to the deep learning model, relevant feature information at different positions of the text can be obtained effectively and similar semantic features can be extracted effectively. As the bar charts show, the attention-based deep learning models achieve lower RMSE, indicating that in personalized emotion analysis the deep learning models are more stable than traditional machine learning models.
To test whether the attention mechanism of the model is effective, the invention designs an ablation experiment comparing the experimental results of the models with and without attention, as shown in Tables 8 and 9.
TABLE 8 Yelp2013 data set Algorithm comparison results
(Table 8 is provided as an image in the original publication.)
TABLE 9 Yelp2014 data set Algorithm comparison results
(Table 9 is provided as an image in the original publication.)
As Tables 8 and 9 and the bar charts in FIGS. 9 and 10 show, the structures with the attention model achieve higher F-score and accuracy than the structures without it. From Table 8, Attention-LSTM is 1.41% higher in accuracy than the LSTM model, the Attention-GRU model is 1.67% higher than the GRU model on F-score, and the Bert-Attention model is 1.23% higher than the Bert model. These results illustrate the importance of the attention model in personalized emotion analysis. The attention model captures the important features of a sentence well, locates key information more accurately, filters out unimportant information, extracts important individual emotional features, and thereby further improves the classification performance of the model.
From the mechanism and principle of the Attention model and the computation of its attention-weight formula, it can be seen that by computing an attention probability distribution, the Attention model highlights the effect of the key input natural sequences or accumulated feature matrices on the analysis of the output feature coding sequence, and it optimizes traditional deep network models well. Analysis of the Attention model's applications across various fields shows that its idea has a wide range of application and performs well on tasks such as text classification and emotion analysis in the current natural language processing field.
Traditional models depend heavily on an emotion dictionary, and an accurate emotion dictionary is difficult to construct. These shortcomings led researchers to seek more convenient solutions, and machine-learning-based analysis methods emerged. Machine-learning-based methods are very effective, but for different data an appropriate classifier and text-feature extraction method must be chosen to obtain good analysis results. With the research and application of deep learning and neural networks, emotion analysis based on them has become a general-purpose solution. A model combining deep learning with an attention mechanism generalizes strongly across different data sets and can be applied to emotion analysis on many data sets without the model selection a traditional machine learning model requires. The model provided by the invention captures the personalized differences between users and latent personalization factors such as language habits, user personality, and opinion bias. It also solves the user cold-start problem simply and effectively. Most importantly, the model provided by the invention far outperforms traditional machine learning models on every evaluation index and offers a better solution to the emotion analysis problem.

Claims (9)

1. A text emotion analysis method based on deep learning, characterized by comprising the following steps:
Step one: preprocess text sample data and manually label emotion assessment grades in advance;
Step two: construct a Self-Attention deep learning model for online text emotion analysis and train it with training set data; at each iteration, compute the loss function and the gradients of the output-layer neurons, and perform forward and back propagation to update the network parameter values of every layer until a stopping condition is reached, yielding the optimized Self-Attention deep learning model and its network parameters;
Step three: acquire actual text corpus data and process it with the optimized Self-Attention deep learning model to obtain the online text emotion analysis result.
2. The method of claim 1, wherein in step two, constructing the Self-Attention deep learning model for online text emotion analysis and training it with training set data comprises:
modeling with a BERT deep learning model under the Self-Attention framework;
taking the word embeddings of all words of an input sentence as the semantic representation of the sentence; obtaining the hidden-layer semantic representation from the input representation with the linear operation of matrix multiplication and a nonlinear activation function; obtaining a sentence-level semantic representation from the hidden-layer representation with a dimension-reduction operation; combining the sentence representation with the user representation and feeding the combination into the classification layer, incorporating the effect of user information at the sentence level; the classification layer maps the resulting vector into a two-dimensional emotion space and performs emotion classification with softmax;
model input: training set D = {(x1, ..., xk), uk}, k = 1, ..., M, where M is the number of training samples;
model output: the trained, optimized BERT deep learning model and its network parameters;
where (x1, ..., xk) are word vectors and uk is the manually pre-labeled emotion rating representing the user's characteristics.
3. The text emotion analysis method based on deep learning of claim 2, wherein the training set D is obtained after the sentences in the original online text sample data undergo word segmentation and stop-word removal.
4. The method according to claim 2, wherein the sample data is divided into a training set and a verification set, the BERT deep learning model is trained by using the data in the training set, and the verification is performed by using the data in the verification set.
5. The emotion analysis method based on deep learning of claim 2, wherein constructing the BERT deep learning model for text emotion analysis specifically comprises:
1) using the Encoder-Decoder (coding-decoding) module of the Attention model for the following processing:
machine-encoding the input natural sequence or feature matrix sequence in the Encoder module, expressed as:
C = F(x1, x2, x3, ..., xn)
where x1, x2, x3, ..., xn are the word vectors obtained after word segmentation and stop-word removal, F is the Encoder encoding function, and C is the word-vector form encoded according to that function;
decoding the Encoder-encoded natural sequence or feature matrix sequence in the Decoder module, expressed as:
yi = G(C, y1, y2, y3, ..., yi-1)
where y1, y2, y3, ..., yi-1, yi are the decoded word vectors, i is the index, and G is the Decoder decoding function;
2) segmenting the emotion text and obtaining the word representation T by fine-tuning the BERT model:
Ti = BERT(yi)
obtaining a new user vector representation by embedding the user vector:
Ni = Ti ⊕ Eu
where Eu is the newly added user vector and ⊕ denotes combining the word vector with the user vector;
3) using the Bi-LSTM layer to compute, for each new user vector Ni, the forward propagation result hi(forward) and the backward propagation result hi(backward), and obtaining the latest representation of the i-th word of the text by concatenation:
hi = [hi(forward); hi(backward)]
6. The method of claim 5, wherein the high-level features obtained by the Attention layer predict the text category at the classification layer as follows:
ci = softmax(Ww hi + bw)
where Ww is the final parameter matrix of the classification network, bw ∈ R^l is the classification bias, and ci represents the probability distribution of belonging to the current i-th class;
the network is trained with the cross-entropy loss function:
L = - Σk uk · log(ck)
where uk is the ground-truth label of emotion sample k and ck is the probability that the sample is predicted as its class; the network parameters are optimized with stochastic gradient descent (SGD).
7. The text emotion analysis method based on deep learning of claim 1, wherein verification data are input into the BERT deep learning model to obtain corresponding evaluation grade results, which are compared against the manually pre-labeled emotion evaluation grades; if the results fall within the error range, iteration stops, and the optimized BERT deep learning model and its network parameters are obtained.
8. The text emotion analysis method based on deep learning of any one of claims 1-7, further comprising: automatically mapping and associating the text emotion analysis results with the user's original actual text corpus data according to the emotion rating level, labeling the preference degree, and visualizing the result to display the user's preference degree intuitively.
9. The text emotion analysis method based on deep learning of claim 8, wherein the actual text corpus data of different users are colored and labeled with a color gradient sequence running from strong and dark to light and weak.
CN202011428365.6A 2020-12-07 2020-12-07 Text emotion analysis method based on deep learning Pending CN114595693A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011428365.6A CN114595693A (en) 2020-12-07 2020-12-07 Text emotion analysis method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011428365.6A CN114595693A (en) 2020-12-07 2020-12-07 Text emotion analysis method based on deep learning

Publications (1)

Publication Number Publication Date
CN114595693A (en) 2022-06-07

Family

ID=81803248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011428365.6A Pending CN114595693A (en) 2020-12-07 2020-12-07 Text emotion analysis method based on deep learning

Country Status (1)

Country Link
CN (1) CN114595693A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115811630A (en) * 2023-02-09 2023-03-17 成都航空职业技术学院 Education informatization method based on artificial intelligence
CN115811630B (en) * 2023-02-09 2023-05-02 成都航空职业技术学院 Education informatization method based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN109740148B (en) Text emotion analysis method combining BiLSTM with Attention mechanism
CN111241837B (en) Theft case legal document named entity identification method based on anti-migration learning
CN107862087B (en) Emotion analysis method and device based on big data and deep learning and storage medium
CN110287320A (en) A kind of deep learning of combination attention mechanism is classified sentiment analysis model more
CN111914096A (en) Public transport passenger satisfaction evaluation method and system based on public opinion knowledge graph
CN110287323B (en) Target-oriented emotion classification method
CN113051916B (en) Interactive microblog text emotion mining method based on emotion offset perception in social network
CN108563638B (en) Microblog emotion analysis method based on topic identification and integrated learning
CN107688870B (en) Text stream input-based hierarchical factor visualization analysis method and device for deep neural network
CN113505204B (en) Recall model training method, search recall device and computer equipment
CN110929034A (en) Commodity comment fine-grained emotion classification method based on improved LSTM
CN111259153B (en) Attribute-level emotion analysis method of complete attention mechanism
CN112905739B (en) False comment detection model training method, detection method and electronic equipment
CN109101490B (en) Factual implicit emotion recognition method and system based on fusion feature representation
CN112069320B (en) Span-based fine-grained sentiment analysis method
CN112256866A (en) Text fine-grained emotion analysis method based on deep learning
CN112101040A (en) Ancient poetry semantic retrieval method based on knowledge graph
CN115952292B (en) Multi-label classification method, apparatus and computer readable medium
CN116737922A (en) Tourist online comment fine granularity emotion analysis method and system
CN116680363A (en) Emotion analysis method based on multi-mode comment data
CN113704459A (en) Online text emotion analysis method based on neural network
CN112862569B (en) Product appearance style evaluation method and system based on image and text multi-modal data
CN114356990A (en) Base named entity recognition system and method based on transfer learning
CN112925983A (en) Recommendation method and system for power grid information
CN114595693A (en) Text emotion analysis method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination