CN110245229B - Deep learning theme emotion classification method based on data enhancement - Google Patents

Deep learning theme emotion classification method based on data enhancement Download PDF

Info

Publication number
CN110245229B
CN110245229B (application CN201910365005.7A)
Authority
CN
China
Prior art keywords
sentence
word
training
emotion
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910365005.7A
Other languages
Chinese (zh)
Other versions
CN110245229A (en)
Inventor
周晨星 (Zhou Chenxing)
赖韩江 (Lai Hanjiang)
印鉴 (Yin Jian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201910365005.7A priority Critical patent/CN110245229B/en
Publication of CN110245229A publication Critical patent/CN110245229A/en
Application granted granted Critical
Publication of CN110245229B publication Critical patent/CN110245229B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/353 - Classification into predefined classes
    • G06F16/355 - Class or cluster creation or modification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a data-enhancement-based deep learning topic sentiment classification method in which words obtain preliminary semantic information through the BERT pre-trained language model, contextual semantic features between words are then learned through a bidirectional GRU network, and a data enhancement method is additionally provided. Experiments on the corresponding data sets show a substantial improvement over prior sentiment classification methods.

Description

Deep learning theme emotion classification method based on data enhancement
Technical Field
The invention relates to the field of natural language processing, in particular to a deep learning theme emotion classification method based on data enhancement.
Background
In recent years, Internet technology has matured, and people have become accustomed to communicating and expressing their ideas online. A large amount of text has accumulated on the Internet as a result. Sentiment analysis aims to extract from this text the viewpoints and tendencies that customers express about a given thing, and to provide technical support for downstream application scenarios such as improving a shop's offerings; it therefore has high application value in both academia and industry.
Topic-level sentiment analysis judges the sentiment tendency of a sentence with respect to a given topic, and plays a very important role in sentiment analysis. The commonly used approaches are mainly sentiment-dictionary-based methods and machine-learning-based methods. A sentiment-dictionary-based method evaluates the sentiment tendency of a sentence about a topic by counting the occurrences of topic-related sentiment words in the sentence and combining their sentiment polarities; the key step is determining which sentiment words relate to the given topic and then performing statistical analysis. This method is simple and easy to operate, but its drawbacks are also apparent: 1. it places high demands on the quality of the constructed sentiment dictionary, and words that express sentiment implicitly are easily overlooked, reducing the accuracy of the analysis; 2. the sentiment words of the current sentence about a given topic must be located precisely, and inaccurate localization causes misclassification and degrades performance. For these reasons, dictionary-based methods have gradually been replaced by other methods. Much current research instead adopts machine-learning sentiment analysis, which first treats the problem as classification: features useful for judging sentiment about a topic are selected from labeled training samples, and a classifier model (such as k-nearest neighbors (KNN), Naive Bayes, or a support vector machine (SVM)) is trained to predict the sentiment polarity of an unseen sentence about a given topic. This performs better than dictionary-based methods but still falls short of expectations.
A key reason for the currently limited classification performance is the small scale of the data sets. It is therefore contemplated that augmenting the original data set with new data would give the deep network greater classification capability. Because more than one word in a sentence can influence its sentiment polarity toward a given topic, the word with the greatest influence on the polarity judgment is mined from each sentence and fed back into the deep network as part of a new training set for retraining. On the one hand this expands the data set, strengthening the deep network's extraction and learning of the data set's features; on the other hand it improves the network's ability to analyze sentences whose sentiment polarity is not obvious. In this way the classifier achieves a better classification effect and higher accuracy.
Disclosure of Invention
The invention provides a data-enhancement-based deep learning topic sentiment classification method with high accuracy.
To achieve this technical effect, the technical scheme of the invention is as follows:
a deep learning theme emotion classification method based on data enhancement comprises the following steps:
S1: establishing a deep learning network model G that generates the semantic information and feature representation of a sentence and serves as a classifier;
S2: selecting, according to the deep learning network model G, the words in the training set that most strongly influence sentiment analysis, to form a new training set;
S3: retraining the deep learning network model G on the original training set together with the new training set, and then testing.
Further, the specific process of step S1 is:
S11: using the BERT pre-trained language model, each word in the sentence is represented after pre-training by a low-dimensional dense real-valued vector; because the BERT pre-trained language model already contains semantic modeling for each word, every word output by BERT carries semantic information, and the whole sentence is then represented as X = [x_1, …, x_t, …, x_n], where n is the length of the sentence and each word vector of the matrix X has dimension 768;
S12: on top of the word-vector representations from the BERT layer, the model must still learn the context of each word in the sentence, so a bidirectional GRU network is used to learn the contextual information of the sentence; let each word correspond to a time step t; the input of each GRU cell is the word vector x_t at the current time t together with the GRU hidden-layer output h_{f,t-1} at time t-1, giving the forward GRU representation H_f = [h_{f,1}, …, h_{f,t}, …, h_{f,n}]; similarly, the backward GRU is denoted H_b = [h_{b,1}, …, h_{b,t}, …, h_{b,n}];
S13: to learn the relationship between each word of the sentence and the topic word, an Attention layer is constructed to compute the weight of each word with respect to the topic word; a larger weight means the word has a greater influence on the sentiment polarity of the sentence with respect to the current topic. First, each word is represented via S12 as H = H_f + H_b, and the word vector of the current topic word is denoted e_N; the two are concatenated and passed through the tanh activation function, giving the vector M = tanh([H; e_N]); a parameter W is then learned to compute each word's weight with respect to the topic word, and the weights are multiplied with the GRU outputs of the words at the corresponding positions to obtain the overall representation r of the sentence with respect to the topic word, where r = H · softmax(W^T M);
S14: establishing the final output layer: the sentence representation r obtained in step S13 is mapped through two fully connected layers and one softmax layer onto three classification categories, corresponding respectively to the probabilities that the sentiment polarity of the current sentence is positive, negative or neutral; the polarity with the maximum probability is then output as the result;
S15: training on the training data of the data set according to the above procedure, adopting cross entropy as the loss function during training, optimizing with the Adam optimizer, and adopting L2 regularization to prevent overfitting; finally, the parameters of the network are saved.
Further, the specific process of step S2 is:
S21: replacing each word of each sentence of the training data with [MASK], one at a time; let the current sentence be s = [w_1, …, w_t, …, w_n], where n is the number of words in the sentence; replacing the words one by one yields the sentence set s' of n sentences in total, where s' = {[[MASK], …, w_t, …, w_n], …, [w_1, …, [MASK], …, w_n], …, [w_1, …, w_t, …, [MASK]]}.
S22: reloading the network parameters saved in S15 to obtain the previously trained network G; each sentence of s' is input into the network G to obtain a predicted sentiment-polarity probability distribution, and the sentence farthest from the true sentiment distribution is selected and put into a new training set, yielding a training set of double the size; in this way a word that influences the sentiment polarity is mined for each sentence, which strengthens the classification ability of the model.
Further, the specific process of step S3 is:
S31: putting the training set generated in S22 together with the original training data, as one training set, into the network trained in S15, and then training again according to the procedure of S1, still adopting cross entropy as the loss function during training, Adam as the optimizer, and L2 regularization, with the learning rate set to 0.01; the model converges after training for 5 epochs.
S32: testing the test data on the network trained in S31, with accuracy adopted as the evaluation metric.
Further, the decision rule for the sentence farthest from the true sentiment distribution in step S22 is:
Suppose the true sentiment distribution is (y_1, y_2, y_3, …, y_n), with true label y_t, and the set of probability distributions of all predictions is {(x_{11}, x_{12}, x_{13}, …, x_{1n}), (x_{21}, x_{22}, x_{23}, …, x_{2n}), …, (x_{m1}, x_{m2}, x_{m3}, …, x_{mn})}; find the smallest x_{it}; the distribution (x_{i1}, x_{i2}, x_{i3}, …, x_{in}) to which it belongs is the prediction of the sentence farthest from the true sentiment distribution.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention can enable words to obtain a primary semantic information through a bert pre-training language model, then learns the context semantic features between the words through a bidirectional GRU network, and simultaneously provides a data enhancement method. Experiments on corresponding data sets show that compared with the prior emotion classification method, the emotion classification method has a large improvement.
Drawings
FIG. 1 is a schematic flow diagram of a data enhanced network according to the present invention;
fig. 2 is a schematic diagram of a complete model of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the present embodiments, certain elements of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described with reference to the drawings and the embodiments.
Example 1
As shown in fig. 1, a deep learning topic emotion classification method based on data enhancement includes the following steps:
S1: establishing a deep learning network model G that generates the semantic information and feature representation of a sentence and serves as a classifier;
S2: selecting, according to the deep learning network model G, the words in the training set that most strongly influence sentiment analysis, to form a new training set;
S3: retraining the deep learning network model G on the original training set together with the new training set, and then testing.
The specific process of step S1 is:
S11: using the BERT pre-trained language model, each word in the sentence is represented after pre-training by a low-dimensional dense real-valued vector; because the BERT pre-trained language model already contains semantic modeling for each word, every word output by BERT carries semantic information, and the whole sentence is then represented as X = [x_1, …, x_t, …, x_n], where n is the length of the sentence and each word vector of the matrix X has dimension 768;
S12: on top of the word-vector representations from the BERT layer, the model must still learn the context of each word in the sentence, so a bidirectional GRU network is used to learn the contextual information of the sentence; let each word correspond to a time step t; the input of each GRU cell is the word vector x_t at the current time t together with the GRU hidden-layer output h_{f,t-1} at time t-1, giving the forward GRU representation H_f = [h_{f,1}, …, h_{f,t}, …, h_{f,n}]; similarly, the backward GRU is denoted H_b = [h_{b,1}, …, h_{b,t}, …, h_{b,n}];
S13: to learn the relationship between each word of the sentence and the topic word, an Attention layer is constructed to compute the weight of each word with respect to the topic word; a larger weight means the word has a greater influence on the sentiment polarity of the sentence with respect to the current topic. First, each word is represented via S12 as H = H_f + H_b, and the word vector of the current topic word is denoted e_N; the two are concatenated and passed through the tanh activation function, giving the vector M = tanh([H; e_N]); a parameter W is then learned to compute each word's weight with respect to the topic word, and the weights are multiplied with the GRU outputs of the words at the corresponding positions to obtain the overall representation r of the sentence with respect to the topic word, where r = H · softmax(W^T M);
S14: establishing the final output layer: the sentence representation r obtained in step S13 is mapped through two fully connected layers and one softmax layer onto three classification categories, corresponding respectively to the probabilities that the sentiment polarity of the current sentence is positive, negative or neutral; the polarity with the maximum probability is then output as the result;
S15: training on the training data of the data set according to the above procedure, adopting cross entropy as the loss function during training, optimizing with the Adam optimizer, and adopting L2 regularization to prevent overfitting; finally, the parameters of the network are saved.
The specific process of step S2 is:
S21: replacing each word of each sentence of the training data with [MASK], one at a time; let the current sentence be s = [w_1, …, w_t, …, w_n], where n is the number of words in the sentence; replacing the words one by one yields the sentence set s' of n sentences in total, where s' = {[[MASK], …, w_t, …, w_n], …, [w_1, …, [MASK], …, w_n], …, [w_1, …, w_t, …, [MASK]]}.
S22: reloading the network parameters saved in S15 to obtain the previously trained network G; each sentence of s' is input into the network G to obtain a predicted sentiment-polarity probability distribution, and the sentence farthest from the true sentiment distribution is selected and put into a new training set, yielding a training set of double the size; in this way a word that influences the sentiment polarity is mined for each sentence, which strengthens the classification ability of the model.
The specific process of step S3 is:
S31: putting the training set generated in S22 together with the original training data, as one training set, into the network trained in S15, and then training again according to the procedure of S1, still adopting cross entropy as the loss function during training, Adam as the optimizer, and L2 regularization, with the learning rate set to 0.01; the model converges after training for 5 epochs.
S32: testing the test data on the network trained in S31, with accuracy adopted as the evaluation metric.
The decision rule for the sentence farthest from the true sentiment distribution in step S22 is:
Suppose the true sentiment distribution is (y_1, y_2, y_3, …, y_n), with true label y_t, and the set of probability distributions of all predictions is {(x_{11}, x_{12}, x_{13}, …, x_{1n}), (x_{21}, x_{22}, x_{23}, …, x_{2n}), …, (x_{m1}, x_{m2}, x_{m3}, …, x_{mn})}; find the smallest x_{it}; the distribution (x_{i1}, x_{i2}, x_{i3}, …, x_{in}) to which it belongs is the prediction of the sentence farthest from the true sentiment distribution.
Example 2
The data set adopted by the method comes from Task 12 of SemEval-2015, the series of computational semantic analysis evaluations organized under SIGLEX, the Special Interest Group on the Lexicon of the Association for Computational Linguistics. There are 13 topic words for restaurant reviews and 87 topic words for laptop reviews, and both review domains contain only three sentiment labels: positive, neutral and negative. The data set used in the invention is summarized as follows:
Dataset      Train   Test   Topics
Restaurant   1478    775    13
Laptop       1972    948    87
the construction of the network N is shown in the left part of fig. 2.
Take this sentence as an example: "The food is good and delicious, but the service is terrible." Topic: service (label: negative). The original sentence is input into the model, and after BERT a 768 × 14 matrix is obtained. This matrix is then input into the bidirectional GRU to learn word-to-word contextual information, yielding the forward GRU output H_f = [h_{f,1}, …, h_{f,t}, …, h_{f,n}] and the backward GRU output H_b = [h_{b,1}, …, h_{b,t}, …, h_{b,n}]. The forward and backward outputs are then added dimension-wise to obtain the final overall representation of each word: H = [h_1, h_2, …, h_n]. Next, each word of the matrix is concatenated with the word vector of the topic word in order to compute each word's relevance to the topic word. Topic words are represented with GloVe vectors of dimension 300, so the concatenated matrix M has dimensions 1068 × 14. A parameter W is then learned to compute the weight of each word, calculated as softmax(W^T M), giving the weight matrix α = [α_1, α_2, …, α_n], whose elements sum to 1. The weight matrix is multiplied with the representation H of each word and summed, giving the representation r of the whole sentence, a 1 × 768 vector. Finally, r is mapped through two fully connected layers and a softmax layer onto three classification categories, corresponding to positive, negative and neutral.
The model just built is trained with the existing training set, using cross entropy as the loss function, the Adam optimizer for optimization, and L2 regularization to prevent overfitting. After training, the parameters are saved, and the model is then used to expand the data set. The idea of the expansion comes from the following observation. In "The food is good and delicious, but the service is terrible," the negative polarity toward service can be judged in two ways: one through the word "terrible," the other through "good" and "but": "but" signals a contrastive turn, and since "good" is positive, whatever is described after "but" is negative. Therefore, even if the word "terrible" is masked, the sentiment polarity toward service can still be inferred through "good" and "but." Forcing the model to use this kind of judgment makes it learn the semantic information in the sentence and strengthens its classification ability.
Which words, then, should be masked so as to increase the difficulty of predicting the sentiment polarity of the current sentence while also enlarging the data set? As shown in fig. 2, the following steps are adopted. For the sentence above, "The food is good and delicious, but the service is terrible," the masked word is replaced by [MASK], producing 14 sentences (because there are 14 words). Putting these sentences into the just-trained model yields a predicted probability distribution for each sentence; the sentence whose distribution is farthest from the true sentiment distribution is then selected and put into the new training set. (For example, if the two predicted distributions are (0.1, 0.8, 0.1) and (0.3, 0.5, 0.2), and the true sentiment distribution is (0, 1, 0), then (0.3, 0.5, 0.2) is considered farthest from the true sentiment distribution.) Performing this operation on every sentence of the training set yields a new training set the same size as the original.
After the new data set is obtained, it is merged with the original data set into a larger one, and the network is trained again on the basis of the previous model, still using cross entropy as the loss function, Adam as the optimizer, and L2 regularization, with the learning rate set to 0.01; the model begins to converge after 5 training epochs. The test data is then fed into the trained model for testing.
To demonstrate the effect of the experiment, the method is compared against a current well-performing topic sentiment classification model (Word & Clause level). The evaluation index is accuracy, defined as the percentage of samples in the whole test data set whose labels the model predicts correctly. The experimental results are as follows:
(Accuracy comparison table not reproduced; it appears only as image BDA0002047896100000081 in the original document.)
The results show a substantial improvement over the previous method. From the standpoint of data enhancement, the invention improves the model's ability to classify hard-to-classify sentences in topic sentiment analysis, and it makes reasonable use of a pre-trained language model, BERT.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, and equivalents thereof are intended to be included in the scope of the present invention.
The same or similar reference numerals correspond to the same or similar parts;
the positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.
The same or similar reference numerals correspond to the same or similar parts;
the positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (4)

1. A deep learning theme emotion classification method based on data enhancement is characterized by comprising the following steps:
S1: establishing a deep learning network model G that generates the semantic information and feature representation of a sentence and serves as a classifier; the specific process is as follows:
S11: using the BERT pre-trained language model, each word in the sentence is represented after pre-training by a low-dimensional dense real-valued vector; because the BERT pre-trained language model itself already contains semantic modeling for each word, every word output by BERT carries semantic information, and the whole sentence is then represented as X = [x_1, …, x_t, …, x_n], where n is the length of the sentence and each word vector of the matrix X has dimension 768;
S12: since the word vectors from the BERT layer already carry certain semantic information and the model still needs to learn the context of each word of the sentence, a bidirectional GRU network is used to learn the contextual information of the sentence; assuming each word corresponds to a time step t, the input of each GRU cell is the word vector x_t at the current time t together with the GRU hidden-layer output h_{f,t-1} at time t-1, giving the forward GRU representation H_f = [h_{f,1}, …, h_{f,t}, …, h_{f,n}]; similarly, the backward GRU is denoted H_b = [h_{b,1}, …, h_{b,t}, …, h_{b,n}];
S13: to learn the relationship between each word of the sentence and the topic word, an Attention layer is constructed to compute the weight of each word with respect to the topic word; a larger weight means the word has a greater influence on the sentiment polarity of the sentence with respect to the current topic. First, each word is represented via S12 as H = H_f + H_b, and the word vector of the current topic word is denoted e_N; the two are concatenated and passed through the tanh activation function, giving the vector M = tanh([H; e_N]); a parameter W is then learned to compute each word's weight with respect to the topic word, and the weights are multiplied with the GRU outputs of the words at the corresponding positions to obtain the overall representation r of the sentence with respect to the topic word, where r = H · softmax(W^T M);
S14: establishing the final output layer: the sentence representation r obtained in step S13 is mapped through two fully connected layers and one softmax layer onto three classification categories, corresponding respectively to the probabilities that the sentiment polarity of the current sentence is positive, negative or neutral; the polarity with the maximum probability is then output as the result;
S15: training on the training data of the data set according to steps S11 to S14, adopting cross entropy as the loss function during training, optimizing with the Adam optimizer, adopting L2 regularization to prevent overfitting, and finally saving the parameters of the network;
S2: selecting, according to the deep learning network model G, the words in the training set that most strongly influence sentiment analysis, to form a new training set;
S3: retraining the deep learning network model G on the original training set together with the new training set, and then testing.
2. The deep learning topic emotion classification method based on data enhancement as claimed in claim 1, wherein the specific process of step S2 is:
S21: replacing each word of each sentence of the training data with [MASK], one at a time; let the current sentence be s = [w_1, …, w_t, …, w_n], where n is the number of words in the sentence; replacing the words one by one yields the sentence set s' of n sentences in total, where s' = {[[MASK], …, w_t, …, w_n], …, [w_1, …, [MASK], …, w_n], …, [w_1, …, w_t, …, [MASK]]}.
S22: reloading the network parameters saved in S15 to obtain the previously trained network G; each sentence of s' is input into the network G to obtain a predicted sentiment-polarity probability distribution, and the sentence farthest from the true sentiment distribution is selected and put into a new training set, yielding a training set of double the size; in this way a word that influences the sentiment polarity is mined for each sentence, which strengthens the classification ability of the model.
3. The method for classifying deep learning topic emotions based on data enhancement according to claim 2, wherein the specific process of the step S3 is as follows:
S31: putting the training set generated in S22 together with the original training data, as one training set, into the network trained in S15, and then training again according to the procedure of S1, still adopting cross entropy as the loss function during training, Adam as the optimizer, and L2 regularization, with the learning rate set to 0.01; the model converges after training for 5 epochs;
S32: testing the test data on the network trained in S31, with accuracy adopted as the evaluation metric.
4. The method for classifying deep learning topic emotions based on data enhancement as claimed in claim 3, wherein the decision rule of the sentence farthest from the real emotion distribution in step S22 is:
Suppose the true sentiment distribution is (y_1, y_2, y_3, …, y_n), with true label y_t, and the set of probability distributions of all predictions is {(x_{11}, x_{12}, x_{13}, …, x_{1n}), (x_{21}, x_{22}, x_{23}, …, x_{2n}), …, (x_{m1}, x_{m2}, x_{m3}, …, x_{mn})}; find the smallest x_{it}; the distribution (x_{i1}, x_{i2}, x_{i3}, …, x_{in}) to which it belongs is the prediction of the sentence farthest from the true sentiment distribution.
CN201910365005.7A 2019-04-30 2019-04-30 Deep learning theme emotion classification method based on data enhancement Active CN110245229B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910365005.7A CN110245229B (en) 2019-04-30 2019-04-30 Deep learning theme emotion classification method based on data enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910365005.7A CN110245229B (en) 2019-04-30 2019-04-30 Deep learning theme emotion classification method based on data enhancement

Publications (2)

Publication Number Publication Date
CN110245229A CN110245229A (en) 2019-09-17
CN110245229B true CN110245229B (en) 2023-03-28

Family

ID=67883613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910365005.7A Active CN110245229B (en) 2019-04-30 2019-04-30 Deep learning theme emotion classification method based on data enhancement

Country Status (1)

Country Link
CN (1) CN110245229B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826336B (en) * 2019-09-18 2020-11-06 华南师范大学 Emotion classification method, system, storage medium and equipment
CN110728153A (en) * 2019-10-15 2020-01-24 天津理工大学 Multi-category emotion classification method based on model fusion
CN112685558B (en) * 2019-10-18 2024-05-17 普天信息技术有限公司 Training method and device for emotion classification model
CN111104512B (en) * 2019-11-21 2020-12-22 腾讯科技(深圳)有限公司 Game comment processing method and related equipment
CN110956579B (en) * 2019-11-27 2023-05-23 中山大学 Text picture rewriting method based on generation of semantic segmentation map
CN111079406B (en) * 2019-12-13 2022-01-11 华中科技大学 Natural language processing model training method, task execution method, equipment and system
CN111309871B (en) * 2020-03-26 2024-01-30 普华讯光(北京)科技有限公司 Method for matching degree between demand and output result based on text semantic analysis
US11468239B2 (en) 2020-05-22 2022-10-11 Capital One Services, Llc Joint intent and entity recognition using transformer models
CN111597328B (en) * 2020-05-27 2022-10-18 青岛大学 New event theme extraction method
CN111859908B (en) * 2020-06-30 2024-01-19 北京百度网讯科技有限公司 Emotion learning pre-training method and device, electronic equipment and readable storage medium
CN112069320B (en) * 2020-09-10 2022-06-28 东北大学秦皇岛分校 Span-based fine-grained sentiment analysis method
CN112765993A (en) * 2021-01-20 2021-05-07 上海德拓信息技术股份有限公司 Semantic parsing method, system, device and readable storage medium
CN113297842A (en) * 2021-05-25 2021-08-24 湖北师范大学 Text data enhancement method
CN113255365A (en) * 2021-05-28 2021-08-13 湖北师范大学 Text data enhancement method, device and equipment and computer readable storage medium
CN113723075B (en) * 2021-08-28 2023-04-07 重庆理工大学 Specific target emotion analysis method for enhancing and resisting learning by fusing word shielding data
CN114580430B (en) * 2022-02-24 2024-04-05 大连海洋大学 Method for extracting fish disease description emotion words based on neural network
CN115662435B (en) 2022-10-24 2023-04-28 福建网龙计算机网络信息技术有限公司 Virtual teacher simulation voice generation method and terminal
CN116821333A (en) * 2023-05-30 2023-09-29 重庆邮电大学 Internet video script role emotion recognition method based on big data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107577785A (en) * 2017-09-15 2018-01-12 南京大学 A kind of level multi-tag sorting technique suitable for law identification
CN109670169A (en) * 2018-11-16 2019-04-23 中山大学 A kind of deep learning sensibility classification method based on feature extraction

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7457801B2 (en) * 2005-11-14 2008-11-25 Microsoft Corporation Augmenting a training set for document categorization
CN109034092A (en) * 2018-08-09 2018-12-18 燕山大学 Accident detection method for monitoring system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107577785A (en) * 2017-09-15 2018-01-12 南京大学 A kind of level multi-tag sorting technique suitable for law identification
CN109670169A (en) * 2018-11-16 2019-04-23 中山大学 A kind of deep learning sensibility classification method based on feature extraction

Also Published As

Publication number Publication date
CN110245229A (en) 2019-09-17

Similar Documents

Publication Publication Date Title
CN110245229B (en) Deep learning theme emotion classification method based on data enhancement
CN108446271B (en) Text emotion analysis method of convolutional neural network based on Chinese character component characteristics
CN108399158B (en) Attribute emotion classification method based on dependency tree and attention mechanism
CN108984526B (en) Document theme vector extraction method based on deep learning
WO2019153737A1 (en) Comment assessing method, device, equipment and storage medium
CN110263325B (en) Chinese word segmentation system
CN110969020A (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN113505200B (en) Sentence-level Chinese event detection method combined with document key information
CN111339260A (en) BERT and QA thought-based fine-grained emotion analysis method
Dastgheib et al. The application of deep learning in persian documents sentiment analysis
CN113392209A (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN112818698B (en) Fine-grained user comment sentiment analysis method based on dual-channel model
CN109271636B (en) Training method and device for word embedding model
CN113987187A (en) Multi-label embedding-based public opinion text classification method, system, terminal and medium
CN115952292B (en) Multi-label classification method, apparatus and computer readable medium
CN112905736A (en) Unsupervised text emotion analysis method based on quantum theory
Thattinaphanich et al. Thai named entity recognition using Bi-LSTM-CRF with word and character representation
Suyanto Synonyms-based augmentation to improve fake news detection using bidirectional LSTM
CN113094502A (en) Multi-granularity takeaway user comment sentiment analysis method
CN115687609A (en) Zero sample relation extraction method based on Prompt multi-template fusion
CN109670169B (en) Deep learning emotion classification method based on feature extraction
Seilsepour et al. Self-supervised sentiment classification based on semantic similarity measures and contextual embedding using metaheuristic optimizer
CN112699685A (en) Named entity recognition method based on label-guided word fusion
CN116562286A (en) Intelligent configuration event extraction method based on mixed graph attention
CN116049349A (en) Small sample intention recognition method based on multi-level attention and hierarchical category characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant