CN110245229A - Deep learning topic sentiment classification method based on data augmentation - Google Patents

Deep learning topic sentiment classification method based on data augmentation

Info

Publication number
CN110245229A
CN110245229A (application CN201910365005.7A)
Authority
CN
China
Prior art keywords
sentence
word
training
data
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910365005.7A
Other languages
Chinese (zh)
Other versions
CN110245229B (en)
Inventor
周晨星
赖韩江
印鉴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat-sen University
Original Assignee
Sun Yat-sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat-sen University
Priority to CN201910365005.7A priority Critical patent/CN110245229B/en
Publication of CN110245229A publication Critical patent/CN110245229A/en
Application granted granted Critical
Publication of CN110245229B publication Critical patent/CN110245229B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification

Abstract

The present invention provides a deep-learning topic sentiment classification method based on data augmentation. The method first lets each word obtain preliminary semantic information through the BERT pre-trained language model, and then learns the contextual semantic features between words through a bidirectional GRU network. At the same time, a data augmentation method is proposed: by removing the word that most influences the sentiment polarity of each sentence, the model is forced to learn to judge the sentiment polarity of more difficult sentences, while the expanded data set lets the model capture features from more data. Experiments on the corresponding data sets show a clear improvement over previous sentiment classification methods.

Description

Deep learning topic sentiment classification method based on data augmentation
Technical field
The present invention relates to the field of natural language processing, and more particularly to a deep-learning topic sentiment classification method based on data augmentation.
Background technique
In recent years, Internet technology has matured and people have grown accustomed to exchanging ideas and expressing their opinions online. This process leaves a great deal of text on the Internet, and sentiment analysis technology aims to mine from that text the opinions and emotional tendencies users express about particular things, providing technical support for subsequent concrete application scenarios such as shop improvement. Sentiment analysis technology therefore has high application value in both academia and industry.
Topic-level sentiment analysis judges the emotional tendency of a sentence with respect to some topic, and plays a decisive role in sentiment analysis. Current analysis methods mainly include sentiment-dictionary-based methods and machine-learning-based methods. The dictionary-based method finds the sentiment words related to a given topic in a sentence and evaluates the overall sentiment orientation of the sentence toward that topic from the number and polarities of those words; the key step is determining which sentiment words are relevant to the given topic before performing the statistical analysis. This method is simple and easy to operate, but its disadvantages are also obvious: 1. it places very high demands on the quality of the constructed sentiment dictionary, and words that express sentiment implicitly are easily ignored, which lowers the accuracy of the sentiment analysis; 2. it needs to precisely locate the sentiment words about the topic in the current sentence, and inaccurate positioning likewise reduces classification performance. Dictionary-based methods are therefore gradually being replaced by other methods. Much current research uses machine-learning sentiment analysis methods, which first treat the task as a classification problem: features helpful for topic-oriented sentiment analysis are selected from labeled training samples, and a classifier model (such as k-nearest neighbors, naive Bayes, or a support vector machine) is trained to predict the sentiment polarity of an unseen sentence about some topic. This performs better than dictionary-based classification, but still falls short of expectations.
One reason for the currently unsatisfactory classification performance is the small scale of the data sets. Consider expanding new data on the basis of the original data set so that the deep network gains stronger classification ability. Since the word that influences the sentiment polarity toward a topic in a sentence is not unique, cutting out the single word that most influences the polarity judgment and feeding the result back into the deep network as a new training sample serves two purposes: on the one hand it expands the data set, strengthening the deep network's extraction and learning of the data set's features; on the other hand it strengthens the deep network's ability to analyze sentences whose sentiment polarity is not obvious. In this way the classifier performs better and its accuracy is higher.
Summary of the invention
The present invention provides a deep-learning topic sentiment classification method based on data augmentation with higher accuracy.
In order to achieve the above technical effect, the technical scheme of the present invention is as follows:
A deep-learning topic sentiment classification method based on data augmentation, comprising the following steps:
S1: establish a deep learning network model G that produces the semantic information, feature representation, and classifier of a sentence;
S2: according to the deep learning network model G, pick out the word that most influences the sentiment analysis of each sentence in the training set to form a new training set;
S3: train the deep learning network model G again on the original training set and the new training set, then test.
Further, the detailed process of step S1 is:
S11: using the BERT pre-trained language model, each word in a pre-processed sentence is represented by a low-dimensional, dense real-valued vector; since the BERT pre-trained language model itself already models the semantics of each word, every word output by BERT carries semantic information, and the whole sentence is then expressed as X = [x_1, …, x_t, …, x_n], where n is the length of the sentence and each vector in the matrix X has 768 dimensions;
S12: the word vectors produced by the BERT layer already carry some semantic information, but the model must also learn the contextual information of each word of the sentence, so a bidirectional GRU network is used to learn the contextual information of the sentence; each word corresponds to one time step t, and the input of each GRU cell is the word vector x_t at the current time t together with the hidden-state output h_f(t-1) of the GRU cell at time t-1, giving the forward GRU representation H_f = [h_f1, …, h_ft, …, h_fn] and, similarly, the backward GRU representation H_b = [h_b1, …, h_bt, …, h_bn];
S13: in order to learn the relationship between each word of the sentence and the topic word, an attention layer is built to compute the weight of each word with respect to the topic word; the larger the weight, the more that word influences the sentiment polarity of the sentence toward the current topic. First, each word is represented via S12 as H = H_f + H_b and the word vector of the current topic word is denoted e_N; the two are concatenated and passed through a tanh activation, giving the vector M = tanh([H; e_N]); a parameter W is then learned to compute the weight of each word with respect to the topic word, and multiplying by the GRU output of the word at the corresponding position yields the overall representation r of the sentence with respect to the topic word, where r = H·softmax(W^T M);
S14: the last output layer is built; the sentence representation r obtained in S13 is mapped through two fully connected layers and one softmax layer into three classification categories, corresponding to the probabilities that the sentiment polarity of the current sentence is positive, negative, or neutral, and the sentiment polarity with the largest probability is then output as the result;
S15: the training data in the data set is trained once according to the above process, using cross entropy as the loss function, the ADAM optimizer for optimization, and L2 regularization to prevent overfitting; finally the parameters of the network are saved.
Further, the detailed process of step S2 is:
S21: each word of each sentence of the training data is replaced with [MASK], one at a time; if the current sentence is s = [w_1, …, w_t, …, w_n], where n is the number of words the sentence contains, then replacing word by word gives, for each sentence, a sentence set s' of n sentences, where s' is {[[MASK], …, w_t, …, w_n], …, [w_1, …, [MASK], …, w_n], …, [w_1, …, w_t, …, [MASK]]}.
S22: the network parameters saved in S15 are reloaded to obtain the previously trained network G; each sentence of s' is then input into the network G separately to obtain a predicted sentiment-polarity probability distribution, the sentence whose distribution is farthest from the true sentiment distribution is selected and placed into the new training set, and a new training set as large as the original is obtained; every one of its sentences has had the word that most influences its sentiment polarity cut out, which strengthens the classification ability of the model.
Further, the detailed process of step S3 is:
S31: the training set generated in S22 together with the original training data is used as the training set and fed into the network already trained in S15, which is then trained once more according to the process of S1; training still uses cross entropy as the loss function and ADAM as the optimizer, with L2 regularization, the learning rate is set to 0.01, and the model converges after 5 epochs of training.
S32: the test data is fed into the network trained in S31 for testing, and the test metric is measured by accuracy.
Further, the decision rule in step S22 for the sentence farthest from the true sentiment distribution is:
Assume the true sentiment distribution is y_1, y_2, y_3, …, y_n with true label y_t, and the set of all predicted probability distributions is {(x_11, x_12, x_13, …, x_1n), (x_21, x_22, x_23, …, x_2n), …, (x_m1, x_m2, x_m3, …, x_mn)}; find the smallest x_it, and the distribution it belongs to, (x_i1, x_i2, x_i3, …, x_in), is the sentence farthest from the true sentiment distribution.
Compared with the prior art, the beneficial effects of the technical solution of the present invention are:
The present invention first lets each word obtain preliminary semantic information through the BERT pre-trained language model, and then learns the contextual semantic features between words through a bidirectional GRU network. At the same time, it proposes a data augmentation method: by removing the word that most influences the sentiment polarity of each sentence, the model is forced to learn to judge the sentiment polarity of more difficult sentences, while the expanded data set lets the model capture features from more data. Experiments on the corresponding data sets show a clear improvement over previous sentiment classification methods.
Detailed description of the invention
Fig. 1 is a schematic diagram of the network flow of the data augmentation of the present invention;
Fig. 2 is a schematic diagram of the complete model of the present invention.
Specific embodiment
The attached figures are for illustrative purposes only and should not be understood as limiting this patent;
in order to better illustrate this embodiment, certain components in the figures may be omitted, enlarged, or reduced, and do not represent the size of the actual product;
for those skilled in the art, it is to be understood that certain known structures and their explanations may be omitted from the figures.
The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
Embodiment 1
As shown in Fig. 1, a deep-learning topic sentiment classification method based on data augmentation comprises the following steps:
S1: establish a deep learning network model G that produces the semantic information, feature representation, and classifier of a sentence;
S2: according to the deep learning network model G, pick out the word that most influences the sentiment analysis of each sentence in the training set to form a new training set;
S3: train the deep learning network model G again on the original training set and the new training set, then test.
The detailed process of step S1 is:
S11: using the BERT pre-trained language model, each word in a pre-processed sentence is represented by a low-dimensional, dense real-valued vector; since the BERT pre-trained language model itself already models the semantics of each word, every word output by BERT carries semantic information, and the whole sentence is then expressed as X = [x_1, …, x_t, …, x_n], where n is the length of the sentence and each vector in the matrix X has 768 dimensions;
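A minimal sketch of this step in Python, assuming the HuggingFace transformers library (the patent names no implementation); bert-base models emit the 768-dimensional token vectors described above:

import torch
from transformers import BertTokenizer, BertModel

# Assumed setup: bert-base-uncased emits one 768-dimensional vector per token.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

sentence = "The food is so good and delicious, but the staff is terrible."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = bert(**inputs)

# X corresponds to X = [x_1, ..., x_t, ..., x_n]; its shape is
# (1, number of tokens including BERT's special tokens, 768).
X = outputs.last_hidden_state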
S12: the word vectors produced by the BERT layer already carry some semantic information, but the model must also learn the contextual information of each word of the sentence, so a bidirectional GRU network is used to learn the contextual information of the sentence; each word corresponds to one time step t, and the input of each GRU cell is the word vector x_t at the current time t together with the hidden-state output h_f(t-1) of the GRU cell at time t-1, giving the forward GRU representation H_f = [h_f1, …, h_ft, …, h_fn] and, similarly, the backward GRU representation H_b = [h_b1, …, h_bt, …, h_bn];
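A sketch of the bidirectional GRU in PyTorch (framework assumed); a hidden size of 768 is an assumption chosen so that the sum H = H_f + H_b used in S13 keeps the 768 dimensions that appear later:

import torch
import torch.nn as nn

gru = nn.GRU(input_size=768, hidden_size=768,
             bidirectional=True, batch_first=True)

X = torch.randn(1, 14, 768)   # stand-in BERT output for a 14-token sentence
out, _ = gru(X)               # shape (1, 14, 2 * 768)

H_f = out[:, :, :768]         # forward states  [h_f1, ..., h_fn]
H_b = out[:, :, 768:]         # backward states [h_b1, ..., h_bn]
H = H_f + H_b                 # element-wise sum used by the attention layer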
S13: in order to learn the relationship between each word of the sentence and the topic word, an attention layer is built to compute the weight of each word with respect to the topic word; the larger the weight, the more that word influences the sentiment polarity of the sentence toward the current topic. First, each word is represented via S12 as H = H_f + H_b and the word vector of the current topic word is denoted e_N; the two are concatenated and passed through a tanh activation, giving the vector M = tanh([H; e_N]); a parameter W is then learned to compute the weight of each word with respect to the topic word, and multiplying by the GRU output of the word at the corresponding position yields the overall representation r of the sentence with respect to the topic word, where r = H·softmax(W^T M);
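A sketch of the attention layer under the same assumptions; the topic-word vector e_N is broadcast across the n word positions so that M = tanh([H; e_N]) has 768 + 300 = 1068 rows, matching the dimensions given in Embodiment 2:

import torch
import torch.nn as nn
import torch.nn.functional as F

n, d_h, d_e = 14, 768, 300
H = torch.randn(d_h, n)        # summed GRU outputs, one column per word
e_N = torch.randn(d_e, 1)      # stand-in GloVe vector of the topic word

# M = tanh([H; e_N]) with e_N repeated at every word position
M = torch.tanh(torch.cat([H, e_N.expand(d_e, n)], dim=0))   # (1068, n)

W = nn.Parameter(torch.randn(d_h + d_e, 1))    # the learned parameter W
alpha = F.softmax(W.t() @ M, dim=1)            # (1, n); weights sum to 1
r = H @ alpha.t()                              # r = H * softmax(W^T M), (768, 1)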
S14: the last output layer is built; the sentence representation r obtained in S13 is mapped through two fully connected layers and one softmax layer into three classification categories, corresponding to the probabilities that the sentiment polarity of the current sentence is positive, negative, or neutral, and the sentiment polarity with the largest probability is then output as the result;
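A sketch of the output layer: two fully connected layers followed by softmax over the three polarity classes (the width of the first layer and the activation between the layers are assumptions, since the patent does not specify them):

import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(768, 256),   # first fully connected layer (width assumed)
    nn.ReLU(),             # activation assumed
    nn.Linear(256, 3),     # second fully connected layer -> 3 classes
)

r = torch.randn(1, 768)                       # sentence representation from S13
probs = torch.softmax(classifier(r), dim=1)   # P(positive), P(negative), P(neutral)
prediction = probs.argmax(dim=1)              # polarity with the largest probability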
S15: the training data in the data set is trained once according to the above process, using cross entropy as the loss function, the ADAM optimizer for optimization, and L2 regularization to prevent overfitting; finally the parameters of the network are saved.
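A sketch of this training pass, assuming a model object wrapping the S11-S14 stack and a standard data loader; L2 regularization is realized as ADAM weight decay, whose value is an assumption:

import torch
import torch.nn as nn

def train_once(model, train_loader, epochs=5, lr=0.01, l2=1e-5):
    # One training pass: cross-entropy loss, ADAM optimizer, L2 regularization.
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=l2)
    for _ in range(epochs):
        for sentences, topics, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(sentences, topics), labels)
            loss.backward()
            optimizer.step()
    torch.save(model.state_dict(), "model_G.pt")   # parameters saved for S2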
The detailed process of step S2 is:
S21: each word of each sentence of the training data is replaced with [MASK], one at a time; if the current sentence is s = [w_1, …, w_t, …, w_n], where n is the number of words the sentence contains, then replacing word by word gives, for each sentence, a sentence set s' of n sentences, where s' is {[[MASK], …, w_t, …, w_n], …, [w_1, …, [MASK], …, w_n], …, [w_1, …, w_t, …, [MASK]]}.
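A sketch of the masking step; mask_variants is a hypothetical helper that produces the n copies of a sentence, each with one word replaced by [MASK]:

def mask_variants(words):
    # Return the n masked copies of a sentence, one per word position.
    return [words[:i] + ["[MASK]"] + words[i + 1:] for i in range(len(words))]

s = "The food is so good and delicious , but the staff is terrible .".split()
s_prime = mask_variants(s)
# s_prime[0]  == ['[MASK]', 'food', 'is', ...]
# s_prime[-1] == ['The', 'food', ..., 'is', 'terrible', '[MASK]']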
S22: the network parameters saved in S15 are reloaded to obtain the previously trained network G; each sentence of s' is then input into the network G separately to obtain a predicted sentiment-polarity probability distribution, the sentence whose distribution is farthest from the true sentiment distribution is selected and placed into the new training set, and a new training set as large as the original is obtained; every one of its sentences has had the word that most influences its sentiment polarity cut out, which strengthens the classification ability of the model.
The detailed process of step S3 is:
S31: the training set generated in S22 together with the original training data is used as the training set and fed into the network already trained in S15, which is then trained once more according to the process of S1; training still uses cross entropy as the loss function and ADAM as the optimizer, with L2 regularization, the learning rate is set to 0.01, and the model converges after 5 epochs of training.
S32: the test data is fed into the network trained in S31 for testing, and the test metric is measured by accuracy.
The decision rule in step S22 for the sentence farthest from the true sentiment distribution is:
Assume the true sentiment distribution is y_1, y_2, y_3, …, y_n with true label y_t, and the set of all predicted probability distributions is {(x_11, x_12, x_13, …, x_1n), (x_21, x_22, x_23, …, x_2n), …, (x_m1, x_m2, x_m3, …, x_mn)}; find the smallest x_it, and the distribution it belongs to, (x_i1, x_i2, x_i3, …, x_in), is the sentence farthest from the true sentiment distribution.
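A sketch of this rule, reading it as: the masked variant whose predicted probability x_it on the true label t is smallest is the farthest from the true distribution (the model and its calling convention are assumptions carried over from the earlier sketches):

import torch

def farthest_variant(model, variants, topic, true_label):
    # Pick the masked sentence whose predicted probability on the
    # true label is smallest -- the x_it of the decision rule.
    with torch.no_grad():
        scores = [torch.softmax(model(v, topic), dim=-1).squeeze()[true_label].item()
                  for v in variants]
    return variants[scores.index(min(scores))]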
Embodiment 2
The data set used by this method comes from the series of computational semantic parsing system evaluations designed in 2015 under the Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics; the data come from Task 12 of that evaluation and contain two parts: one part consists of customer reviews of restaurants, the other of customer reviews of laptop computers. There are 13 topic words for the restaurant reviews and 87 topic words for the laptop reviews, and both parts contain only three sentiment labels: positive, neutral, and negative. The basic statistics of the data set used in the present invention are shown in the following table:
Dataset Train Test Topics
Restaurant 1478 775 13
Laptop 1972 948 87
The construction of the network is shown in the left-hand part of Fig. 2.
Take the following sentence as an example: "The food is so good and delicious, but the staff is terrible. Topic: service" (the label is negative). The original sentence is input into the model, and after BERT a 768 × 14 matrix is obtained. This matrix is then input into the bidirectional GRU to learn the contextual information between words, giving the forward GRU output H_f = [h_f1, …, h_ft, …, h_fn] and the backward GRU output H_b = [h_b1, …, h_bt, …, h_bn]. The two outputs are added dimension-wise to obtain the overall representation of each word: H = [h_1, h_2, …, h_n]. Next, this matrix is concatenated with the word vector of the topic word in order to compute the relevance of each word to the topic word. The topic word is represented with GloVe vectors of dimension 300, so the matrix M after concatenation has dimension 1068 × 14. A parameter W is then learned to compute the weight of each word, calculated as softmax(W^T M); the resulting weight matrix is α = [α_1, α_2, …, α_n], whose elements sum to 1. The weights are multiplied by the corresponding word representations in H and summed to obtain the representation r of the whole sentence, where r is a 1 × 768 vector. It is then passed through two fully connected layers and finally mapped by a softmax layer into three classification categories, corresponding to the sentiment classes positive, negative, and neutral.
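The dimensions in this walkthrough can be checked with a small sketch (framework and stand-in tensors as assumed in the earlier sketches):

import torch

n = 14                                  # tokens in the example sentence
H = torch.randn(768, n)                 # summed BiGRU output: 768 x 14
e_N = torch.randn(300, 1)               # GloVe vector of "service": 300-dim

M = torch.cat([H, e_N.expand(300, n)], dim=0)
print(M.shape)                          # torch.Size([1068, 14])

alpha = torch.softmax(torch.randn(1, n), dim=1)   # stand-in attention weights
print(round(float(alpha.sum()), 4))     # 1.0 -- the weights sum to 1
r = (H @ alpha.t()).t()
print(r.shape)                          # torch.Size([1, 768])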
The model just built is first trained with the existing training set, using cross entropy as the loss function, the ADAM optimizer for optimization, and L2 regularization to prevent overfitting. The parameters are saved after training, and the model is then used to expand the data set. The idea of the expansion comes from how one would judge a sentence such as "The food is so good and delicious, but the staff is terrible. Topic: service". The sentiment polarity of this sentence toward the topic service can be judged to be negative from two angles: one is the word "terrible"; the other is the contrast introduced by "good" followed by "but" — since "good" is positive, the object described after "but" must be negative. So even if the word "terrible" is masked, the sentiment polarity toward service can still be inferred from "good" and "but". Forcing the model to use this kind of judgment makes it learn the semantic information within the sentence, strengthening its classification ability.
What kind of word should be masked to increase the difficulty of predicting the sentiment polarity of the current sentence while at the same time enlarging the data set? As in Fig. 2, the steps are as follows. Taking the sentence above, "The food is so good and delicious, but the staff is terrible.", each word from the first to the last is masked in turn, substituting it with [MASK]; this generates 14 sentences (because the sentence has 14 words). These sentences are each fed into the already-trained model to obtain a predicted probability distribution per sentence, and the sentence whose distribution is farthest from the true sentiment distribution is selected (for example, if two predicted distributions are 0.1 0.8 0.1 and 0.3 0.5 0.2 and the true distribution is 0 1 0, then the distribution 0.3 0.5 0.2 is considered farthest from the true distribution) and placed into the new training set. For the current sentence, "The food is so good and delicious, but the staff is [MASK]" would be selected and placed into the new training set. Performing this operation on every sentence of the training set yields a new training set as large as the original.
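Applying the rule from S22 to the numeric example above (the smaller the probability assigned to the true label, the farther the distribution):

true_label = 1                   # true distribution 0 1 0 -> label index 1
p1 = [0.1, 0.8, 0.1]
p2 = [0.3, 0.5, 0.2]

farthest = min([p1, p2], key=lambda p: p[true_label])
print(farthest)                  # [0.3, 0.5, 0.2] -- farthest from the truth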
After the new data set is obtained, it is merged with the original data set into one larger data set, and the network is retrained once on the basis of the previous model, still using cross entropy as the loss function and ADAM as the optimizer, with L2 regularization; the learning rate is set to 0.01, and the model begins to converge after 5 epochs of training. The test data is then fed into the trained model for testing.
To demonstrate the effectiveness of this experiment, it is compared for verification against a currently effective topic sentiment classification model (Word&Clause-level). The evaluation metric is accuracy, defined as the percentage of samples the model predicts correctly out of the total number of samples in the test data set. The experimental results are as follows:
As can be seen from the results, the present invention achieves a considerable improvement over previous methods. Starting from the angle of data augmentation, it improves the model's classification ability in topic sentiment analysis on sentences that are harder to classify, and in this way makes reasonable use of the BERT pre-trained language model.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by it. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention should be an equivalent replacement and is included within the protection scope of the present invention.
The same or similar reference numerals correspond to the same or similar components;
the positional relationships described in the drawings are for illustration only and should not be understood as a limitation on this patent;
obviously, the above embodiments are merely examples given to clearly illustrate the present invention and are not a limitation on its embodiments. For those of ordinary skill in the art, other variations or changes in different forms may also be made on the basis of the above description. It is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included within the protection scope of the claims of the present invention.

Claims (5)

1. A deep-learning topic sentiment classification method based on data augmentation, characterized by comprising the following steps:
S1: establish a deep learning network model G that produces the semantic information, feature representation, and classifier of a sentence;
S2: according to the deep learning network model G, pick out the word that most influences the sentiment analysis of each sentence in the training set to form a new training set;
S3: train the deep learning network model G again on the original training set and the new training set, then test.
2. The deep-learning topic sentiment classification method based on data augmentation according to claim 1, characterized in that the detailed process of step S1 is:
S11: using the BERT pre-trained language model, each word in a pre-processed sentence is represented by a low-dimensional, dense real-valued vector; since the BERT pre-trained language model itself already models the semantics of each word, every word output by BERT carries semantic information, and the whole sentence is then expressed as X = [x_1, …, x_t, …, x_n], where n is the length of the sentence and each vector in the matrix X has 768 dimensions;
S12: the word vectors produced by the BERT layer already carry some semantic information, but the model must also learn the contextual information of each word of the sentence, so a bidirectional GRU network is used to learn the contextual information of the sentence; each word corresponds to one time step t, and the input of each GRU cell is the word vector x_t at the current time t together with the hidden-state output h_f(t-1) of the GRU cell at time t-1, giving the forward GRU representation H_f = [h_f1, …, h_ft, …, h_fn] and, similarly, the backward GRU representation H_b = [h_b1, …, h_bt, …, h_bn];
S13: in order to learn the relationship between each word of the sentence and the topic word, an attention layer is built to compute the weight of each word with respect to the topic word; the larger the weight, the more that word influences the sentiment polarity of the sentence toward the current topic. First, each word is represented via S12 as H = H_f + H_b and the word vector of the current topic word is denoted e_N; the two are concatenated and passed through a tanh activation, giving the vector M = tanh([H; e_N]); a parameter W is then learned to compute the weight of each word with respect to the topic word, and multiplying by the GRU output of the word at the corresponding position yields the overall representation r of the sentence with respect to the topic word, where r = H·softmax(W^T M);
S14: the last output layer is built; the sentence representation r obtained in S13 is mapped through two fully connected layers and one softmax layer into three classification categories, corresponding to the probabilities that the sentiment polarity of the current sentence is positive, negative, or neutral, and the sentiment polarity with the largest probability is then output as the result;
S15: the training data in the data set is trained once according to the above process, using cross entropy as the loss function, the ADAM optimizer for optimization, and L2 regularization to prevent overfitting; finally the parameters of the network are saved.
3. The deep-learning topic sentiment classification method based on data augmentation according to claim 2, characterized in that the detailed process of step S2 is:
S21: each word of each sentence of the training data is replaced with [MASK], one at a time; if the current sentence is s = [w_1, …, w_t, …, w_n], where n is the number of words the sentence contains, then replacing word by word gives, for each sentence, a sentence set s' of n sentences, where s' is {[[MASK], …, w_t, …, w_n], …, [w_1, …, [MASK], …, w_n], …, [w_1, …, w_t, …, [MASK]]}.
S22: the network parameters saved in S15 are reloaded to obtain the previously trained network G; each sentence of s' is then input into the network G separately to obtain a predicted sentiment-polarity probability distribution, the sentence whose distribution is farthest from the true sentiment distribution is selected and placed into the new training set, and a new training set as large as the original is obtained; every one of its sentences has had the word that most influences its sentiment polarity cut out, which strengthens the classification ability of the model.
4. The deep-learning topic sentiment classification method based on data augmentation according to claim 3, characterized in that the detailed process of step S3 is:
S31: the training set generated in S22 together with the original training data is used as the training set and fed into the network already trained in S15, which is then trained once more according to the process of S1; training still uses cross entropy as the loss function and ADAM as the optimizer, with L2 regularization, the learning rate is set to 0.01, and the model converges after 5 epochs of training.
S32: the test data is fed into the network trained in S31 for testing, and the test metric is measured by accuracy.
5. The deep-learning topic sentiment classification method based on data augmentation according to claim 4, characterized in that the decision rule in step S22 for the sentence farthest from the true sentiment distribution is:
Assume the true sentiment distribution is y_1, y_2, y_3, …, y_n with true label y_t, and the set of all predicted probability distributions is {(x_11, x_12, x_13, …, x_1n), (x_21, x_22, x_23, …, x_2n), …, (x_m1, x_m2, x_m3, …, x_mn)}; find the smallest x_it, and the distribution it belongs to, (x_i1, x_i2, x_i3, …, x_in), is the sentence farthest from the true sentiment distribution.
CN201910365005.7A 2019-04-30 2019-04-30 Deep learning topic sentiment classification method based on data augmentation Active CN110245229B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910365005.7A CN110245229B (en) Deep learning topic sentiment classification method based on data augmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910365005.7A CN110245229B (en) Deep learning topic sentiment classification method based on data augmentation

Publications (2)

Publication Number Publication Date
CN110245229A true CN110245229A (en) 2019-09-17
CN110245229B CN110245229B (en) 2023-03-28

Family

ID=67883613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910365005.7A Active CN110245229B (en) Deep learning topic sentiment classification method based on data augmentation

Country Status (1)

Country Link
CN (1) CN110245229B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070112753A1 (en) * 2005-11-14 2007-05-17 Microsoft Corporation Augmenting a training set for document categorization
CN107577785A (en) * 2017-09-15 2018-01-12 南京大学 A kind of level multi-tag sorting technique suitable for law identification
CN109034092A (en) * 2018-08-09 2018-12-18 燕山大学 Accident detection method for monitoring system
CN109670169A (en) * 2018-11-16 2019-04-23 中山大学 A kind of deep learning sensibility classification method based on feature extraction

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826336A (en) * 2019-09-18 2020-02-21 华南师范大学 Emotion classification method, system, storage medium and equipment
CN110728153A (en) * 2019-10-15 2020-01-24 天津理工大学 Multi-category emotion classification method based on model fusion
CN112685558A (en) * 2019-10-18 2021-04-20 普天信息技术有限公司 Emotion classification model training method and device
CN111104512A (en) * 2019-11-21 2020-05-05 腾讯科技(深圳)有限公司 Game comment processing method and related equipment
CN111104512B (en) * 2019-11-21 2020-12-22 腾讯科技(深圳)有限公司 Game comment processing method and related equipment
CN110956579A (en) * 2019-11-27 2020-04-03 中山大学 Text image rewriting method based on semantic segmentation graph generation
CN110956579B (en) * 2019-11-27 2023-05-23 中山大学 Text picture rewriting method based on generation of semantic segmentation map
CN111079406A (en) * 2019-12-13 2020-04-28 华中科技大学 Natural language processing model training method, task execution method, equipment and system
CN111309871A (en) * 2020-03-26 2020-06-19 普华讯光(北京)科技有限公司 Method for matching degree between requirement and output result based on text semantic analysis
CN111309871B (en) * 2020-03-26 2024-01-30 普华讯光(北京)科技有限公司 Method for matching degree between demand and output result based on text semantic analysis
US11468239B2 (en) 2020-05-22 2022-10-11 Capital One Services, Llc Joint intent and entity recognition using transformer models
CN111597328A (en) * 2020-05-27 2020-08-28 青岛大学 New event theme extraction method
CN111859908A (en) * 2020-06-30 2020-10-30 北京百度网讯科技有限公司 Pre-training method and device for emotion learning, electronic equipment and readable storage medium
CN111859908B (en) * 2020-06-30 2024-01-19 北京百度网讯科技有限公司 Emotion learning pre-training method and device, electronic equipment and readable storage medium
CN112069320B (en) * 2020-09-10 2022-06-28 东北大学秦皇岛分校 Span-based fine-grained sentiment analysis method
CN112069320A (en) * 2020-09-10 2020-12-11 东北大学秦皇岛分校 Span-based fine-grained emotion analysis method
CN112765993A (en) * 2021-01-20 2021-05-07 上海德拓信息技术股份有限公司 Semantic parsing method, system, device and readable storage medium
CN113297842A (en) * 2021-05-25 2021-08-24 湖北师范大学 Text data enhancement method
CN113255365A (en) * 2021-05-28 2021-08-13 湖北师范大学 Text data enhancement method, device and equipment and computer readable storage medium
CN113723075A (en) * 2021-08-28 2021-11-30 重庆理工大学 Specific target emotion analysis method for enhancing and counterlearning of fused word shielding data
CN114580430A (en) * 2022-02-24 2022-06-03 大连海洋大学 Method for extracting fish disease description emotion words based on neural network
CN114580430B (en) * 2022-02-24 2024-04-05 大连海洋大学 Method for extracting fish disease description emotion words based on neural network
CN115662435A (en) * 2022-10-24 2023-01-31 福建网龙计算机网络信息技术有限公司 Virtual teacher simulation voice generation method and terminal
US11727915B1 (en) 2022-10-24 2023-08-15 Fujian TQ Digital Inc. Method and terminal for generating simulated voice of virtual teacher

Also Published As

Publication number Publication date
CN110245229B (en) 2023-03-28

Similar Documents

Publication Publication Date Title
CN110245229A (en) A kind of deep learning theme sensibility classification method based on data enhancing
CN108399158B (en) Attribute emotion classification method based on dependency tree and attention mechanism
Vadicamo et al. Cross-media learning for image sentiment analysis in the wild
CN109558487A (en) Document Classification Method based on the more attention networks of hierarchy
CN110532554A (en) A kind of Chinese abstraction generating method, system and storage medium
CN111143576A (en) Event-oriented dynamic knowledge graph construction method and device
CN110222178A (en) Text sentiment classification method, device, electronic equipment and readable storage medium storing program for executing
CN107688870A (en) A kind of the classification factor visual analysis method and device of the deep neural network based on text flow input
CN112256866A (en) Text fine-grained emotion analysis method based on deep learning
CN112434164B (en) Network public opinion analysis method and system taking topic discovery and emotion analysis into consideration
CN108920446A (en) A kind of processing method of Engineering document
Sujana et al. Rumor detection on Twitter using multiloss hierarchical BiLSTM with an attenuation factor
CN109670169B (en) Deep learning emotion classification method based on feature extraction
CN111259147A (en) Sentence-level emotion prediction method and system based on adaptive attention mechanism
Nabil et al. Cufe at semeval-2016 task 4: A gated recurrent model for sentiment classification
CN113535949A (en) Multi-mode combined event detection method based on pictures and sentences
Torres, Carmen Vaca Cross-lingual perspectives about crisis-related conversations on Twitter
Wang et al. Distant supervised relation extraction with position feature attention and selective bag attention
Lisjana et al. Classifying complaint reports using rnn and handling imbalanced dataset
CN115906824A (en) Text fine-grained emotion analysis method, system, medium and computing equipment
Handayani et al. Sentiment Analysis Of Electric Cars Using Recurrent Neural Network Method In Indonesian Tweets
Sadr et al. A novel deep learning method for textual sentiment analysis
Jiang et al. Network public comments sentiment analysis based on multilayer convolutional neural network
Preetham et al. Comparative Analysis of Research Papers Categorization using LDA and NMF Approaches
Urkude et al. Comparative analysis on machine learning techniques: a case study on Amazon product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant