CN107153642A - Analysis method for recognizing the sentiment orientation of text comments based on a neural network - Google Patents
Analysis method for recognizing the sentiment orientation of text comments based on a neural network
- Publication number
- CN107153642A CN201710342178.8A
- Authority
- CN
- China
- Prior art keywords
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses an analysis method for recognizing the sentiment orientation of text comments based on a neural network, belonging to the technical field of computer language and text processing. In processing text-comment data, each sentence is first accurately segmented into individual words or characters using CBOW processing, and each sentence has a corresponding class label. The sentiment tendency of each comment is then discriminated with a long short-term memory (LSTM) model, yielding a label for each sentence; the labels are compared with the true labels to obtain the accuracy, and the neural network model is trained to its best accuracy, thus achieving the purpose of analyzing the sentiment orientation of text comments by neural network recognition. Accelerating the training of the network with a GPU not only improves the accuracy of sentiment classification but also speeds up training on large-scale corpora. The method can effectively recognize the sentiment orientation of comments and has particularly good application prospects in fields such as e-commerce and film.
Description
Technical field
The invention belongs to the technical field of computer language and text processing, and more particularly relates to an analysis method for recognizing the sentiment orientation of text comments based on a neural network.
Background technology
With the continuing development and growing popularity of the computer Internet, people urgently need to manage the increasingly abundant text-comment resources on the network and to achieve effective, accurate sentiment classification of massive text-comment resources. Traditional text sentiment classification is treated as a text-categorization task and is far from able to judge the sentiment tendency of text comments accurately.
Sentiment analysis accurately classifies the emotion expressed by text comments and helps people conveniently and efficiently recognize the sentiment orientation of a comment. For current methods of sentiment classification of text comments, see: [1] patent application No. 201410602800.0, "A text sentiment analysis method and device based on support vector machines"; [2] patent application No. 201510452024.5, "A Chinese text sentiment analysis method based on computerized information processing technology"; [3] patent application No. 201410224628.X, "A network text sentiment analysis method based on sentiment values". These documents mostly construct text features and their weights and then perform text sentiment recognition with some classification algorithm.
Handling text-comment data with the above methods has shortcomings. For example, text comments are typically short, and the above methods use only word features, ignoring the important role of word order in sentiment classification. These methods therefore cannot recognize the sentiment tendency of text comments well.
The content of the invention
The purpose of the present invention is to propose an analysis method for recognizing the sentiment orientation of text comments based on a neural network, characterized in that, in processing text-comment data, the continuous bag-of-words CBOW model (Continuous Bag-of-Words Model) is used as the method for training word vectors, and the long short-term memory LSTM model (Long Short-Term Memory, LSTM) is then used to discriminate the sentiment tendency of comments. The specific steps are as follows:
Step 1: corpus preprocessing. Each sentence is accurately segmented into individual words or characters. Each sentence has a corresponding class label, i.e. 0, 1, 2, representing negative, neutral, and positive respectively. Each class label needs to be converted into a three-dimensional vector here: 0 is converted to [1 0 0], 1 to [0 1 0], and 2 to [0 0 1]. The purpose of this conversion is to contrast against the label obtained for each sentence after training.
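To make this preprocessing concrete, the following is a minimal sketch in Python; the jieba segmenter and the toy corpus are illustrative assumptions (the patent does not name a specific segmentation tool), while the one-hot mapping is exactly the conversion described above.

```python
import jieba  # Chinese word segmentation (an assumed tool choice)

# Hypothetical labelled corpus: (comment text, class label) with 0=negative, 1=neutral, 2=positive
corpus = [("这部电影太好看了", 2), ("物流很慢", 0), ("还可以吧", 1)]

# One-hot conversion exactly as described: 0 -> [1 0 0], 1 -> [0 1 0], 2 -> [0 0 1]
ONE_HOT = {0: [1, 0, 0], 1: [0, 1, 0], 2: [0, 0, 1]}

sentences, labels = [], []
for text, label in corpus:
    sentences.append(jieba.lcut(text))  # accurate segmentation into individual words
    labels.append(ONE_HOT[label])       # three-dimensional label vector for the later contrast
```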
Step 2: word-vector training. The words obtained after segmenting the text-comment corpus are trained with CBOW, yielding the vector corresponding to each word; the dimension of the vectors can be configured as needed.
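A minimal training sketch, assuming the gensim library's Word2Vec implementation: `sg=0` selects CBOW and `hs=1` enables the hierarchical softmax derived later in this description; the corpus and vector dimension are illustrative.

```python
from gensim.models import Word2Vec

# Toy segmented corpus; in practice this is the output of the segmentation step above
sentences = [["这部", "电影", "太", "好看", "了"], ["物流", "很", "慢"], ["还", "可以", "吧"]]

# sg=0 selects CBOW; hs=1 with negative=0 selects hierarchical softmax; dimension set via vector_size
w2v = Word2Vec(sentences, sg=0, hs=1, negative=0, vector_size=100, window=5, min_count=1)

vec = w2v.wv["电影"]  # the 100-dimensional vector corresponding to one word
```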
Step 3: LSTM training. The present invention uses an LSTM as the classification model, with the sentiment-labelled corpus as the training set; after the processing of step 1 and step 2, the problem is converted into the problem of training a neural-network classification model.
Assume a sentence $l$ contains $m$ words, with corresponding word vectors $V = \{v_1, v_2, \ldots, v_m\}$; the sentence $l$ is then represented by $V$, and the vectors corresponding to the words of each sentence are fed directly into a recurrent neural network for processing. The recurrent network uses the conventional model, the long short-term memory LSTM model: each word of a sentence corresponds to one LSTM cell in the recurrent neural network, i.e. to one of the words in the training of the actual sentence, and all the LSTM cells are connected in sequence according to the position relationship of the words to form a chain structure, so that training can be carried out. The word vectors $V = \{v_1, v_2, \ldots, v_m\}$ of the sentence serve in turn as the input value $x_t$ of each LSTM cell, and the output $h_t$ of the last LSTM cell of each sentence is taken as the three-dimensional vector output of that sentence. The output value of each sentence is then used as the input of the Softmax function, defined as
$$P(y = i \mid x) = \frac{e^{x_i}}{\sum_{j=1}^{k} e^{x_j}}$$
This function is a probability-distribution function and yields three probability values whose sum is 1. In the formula, $e^{x_i}$ computes the value of each class and $\sum_{j=1}^{k} e^{x_j}$ computes the sum of the values of the $k$ classes. With three class labels, the probability of the first class is $e^{x_1} / \sum_{j=1}^{3} e^{x_j}$, the probability of the second class is $e^{x_2} / \sum_{j=1}^{3} e^{x_j}$, and the probability of the third class is $e^{x_3} / \sum_{j=1}^{3} e^{x_j}$. The probability values of the classes are compared, the class of maximum probability is taken as the sentiment category of the sentence, and the maximum class-probability value determines the label of each sentence. The labels are then compared with the true labels to obtain the accuracy; by training the neural network model to its best accuracy, while also optimizing its parameters, the purpose of recognizing the sentiment orientation of text comments with a neural network is achieved.
In the above formulas: $V$ denotes the $m$ word vectors corresponding to the $m$ words of a sentence, $v_1$ denotes the word vector of the first word, $v_2$ the word vector of the second word, and so on. $x_t$ is the input value of the LSTM cell at time $t$, i.e. the word vector of the $t$-th word serves as the input of the LSTM cell; the word vector $v_1$ of the first word serves as the first input value $x_1$. $h_t$ is the output of the LSTM cell and is composed of two parts: first a sigmoid layer yields an initial output, then the cell state $C_t$ is scaled to between -1 and 1 with tanh, and the result is multiplied by the sigmoid output to give the output of the model.
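The classification step can be sketched with the Keras API as follows; this is an assumed implementation with illustrative sizes, in which an embedding layer stands in for the CBOW vectors of step 2, a single LSTM chain returns the last output $h_t$, and a 3-way softmax layer produces the label probabilities whose accuracy against the true labels is tracked.

```python
import numpy as np
import tensorflow as tf

vocab_size, dim, max_len, num_classes = 5000, 100, 50, 3  # illustrative sizes

model = tf.keras.Sequential([
    # In practice the embedding weights would be initialized from the CBOW vectors of step 2
    tf.keras.layers.Embedding(vocab_size, dim),
    tf.keras.layers.LSTM(128),                                 # chain of LSTM cells; returns the last h_t
    tf.keras.layers.Dense(num_classes, activation="softmax"),  # three probability values summing to 1
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# X: integer word indices padded to max_len; Y: the one-hot labels from step 1
X = np.random.randint(0, vocab_size, size=(32, max_len))                                   # placeholder batch
Y = tf.keras.utils.to_categorical(np.random.randint(0, num_classes, size=32), num_classes)
model.fit(X, Y, epochs=1, verbose=0)
```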
The basic idea and steps of the word-vector training of step 2 are as follows. A language model formally describes a character string $S$ of $T$ words as the probability of natural language $P(w_1, w_2, w_3, \ldots, w_T)$, where $w_1$ to $w_T$ denote in turn each word of the sentence, i.e. the following reasoning:
$$P(s) = P(w_1, w_2, \ldots, w_T) = P(w_1) P(w_2 \mid w_1) P(w_3 \mid w_1, w_2) \cdots P(w_T \mid w_1, w_2, w_3, \ldots, w_{T-1})$$
That is, once the first word is determined, each later word's probability of occurrence is conditioned on the words appearing before it. (For example, the sentence "everybody likes eating apples" yields four words after segmentation, "everybody", "likes", "eating", "apples", and the natural-language probability of the sentence is P(everybody, likes, eating, apples) = P(everybody) P(likes | everybody) P(eating | everybody, likes) P(apples | everybody, likes, eating).) Each probability can be obtained separately, and the above formula simplifies to:
$$P(s) = \prod_{i=1}^{T} P(w_i \mid Context_i)$$
When $Context_i$ is empty, $P(w_i \mid Context_i)$ is simply $P(w_i)$ itself.
The core of the CBOW model lies in the gradient computation, and its key technique is Hierarchical Softmax, which requires some knowledge of Huffman trees. Each word of the dictionary serves as a leaf node of the Huffman tree. For some leaf node of the Huffman tree, assume it corresponds to the word $w$ in the dictionary; to ease the following computations, some notation is introduced:
(1) $p^w$: the path from the root node to the leaf node corresponding to $w$;
(2) $l^w$: the number of nodes contained in the path $p^w$;
(3) $p_1^w, p_2^w, \ldots, p_{l^w}^w$: the $l^w$ nodes of the path $p^w$, where $p_{l^w}^w$ denotes the node corresponding to the word $w$;
(4) $d_2^w, d_3^w, \ldots, d_{l^w}^w \in \{0, 1\}$: the Huffman code of the word $w$, where $d_j^w$ denotes the code corresponding to the $j$-th node in the path $p^w$;
(5) $\theta_1^w, \theta_2^w, \ldots, \theta_{l^w-1}^w$: the vectors corresponding to the non-leaf nodes in the path $p^w$, where $\theta_j^w$ denotes the vector corresponding to the $j$-th non-leaf node;
For any word $w$ in the dictionary, there exists a unique path $p^w$ in the Huffman tree from the root node to the node corresponding to $w$. The path $p^w$ contains $l^w - 1$ branches; regarding each branch as a binary classification, every classification produces a probability, and multiplying these probabilities together is exactly the required $P(w \mid Context(w))$.
The general formula of the conditional probability $P(w \mid Context(w))$ is written as:
$$P(w \mid Context(w)) = \prod_{j=2}^{l^w} P(d_j^w \mid X_w, \theta_{j-1}^w)$$
where:
$$P(d_j^w \mid X_w, \theta_{j-1}^w) = \begin{cases} \sigma(X_w^T \theta_{j-1}^w), & d_j^w = 0 \\ 1 - \sigma(X_w^T \theta_{j-1}^w), & d_j^w = 1 \end{cases}$$
Combining and rearranging according to the above formula gives:
$$P(d_j^w \mid X_w, \theta_{j-1}^w) = [\sigma(X_w^T \theta_{j-1}^w)]^{1-d_j^w} \cdot [1 - \sigma(X_w^T \theta_{j-1}^w)]^{d_j^w}$$
In the formula: $P(d_j^w \mid X_w, \theta_{j-1}^w)$ denotes the probability of each classification result on the way from the root node of the Huffman tree to the leaf node. According to logistic regression, the probability that a node is classified into the positive class is $\sigma(X_w^T \theta_{j-1}^w)$, and the probability that it is classified into the negative class is $1 - \sigma(X_w^T \theta_{j-1}^w)$; combining the two formulas gives exactly the above formula. $\theta$: the vector corresponding to a non-leaf node. $\sigma$: the sigmoid function, with formula $\sigma(x) = \frac{1}{1 + e^{-x}}$. $X_w$: the accumulated sum of the $2c$ vectors of the input layer, i.e. $X_w = \sum_{i=1}^{2c} v(Context(w)_i)$, where $2c$ means that the current word $w$ has $c$ words before it and $c$ words after it.
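The per-node binary classifications along the Huffman path multiply into $P(w \mid Context(w))$, as the following toy sketch illustrates (the path codes and node vectors are assumed to be given):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def huffman_path_prob(x_w, thetas, codes):
    """P(w | Context(w)) as the product of the binary decisions along the Huffman path.

    x_w    -- X_w, the summed context word vectors
    thetas -- theta_1^w ... theta_{l^w-1}^w, vectors of the non-leaf nodes on the path
    codes  -- d_2^w ... d_{l^w}^w, the Huffman code of w (0 = positive class, 1 = negative class)
    """
    prob = 1.0
    for theta, d in zip(thetas, codes):
        p_pos = sigmoid(x_w @ theta)              # sigma(X_w^T theta_{j-1}^w)
        prob *= p_pos if d == 0 else 1.0 - p_pos  # pick the branch probability for code d_j^w
    return prob

# Tiny usage example with random vectors (purely illustrative)
rng = np.random.default_rng(0)
print(huffman_path_prob(rng.normal(size=8), [rng.normal(size=8) for _ in range(3)], [0, 1, 0]))
```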
The objective function of a neural-network-based language model is usually taken to be the following log-likelihood function:
$$\Gamma = \sum_{w \in C} \log P(w \mid Context(w))$$
Substituting $P(w \mid Context(w))$ into the log-likelihood function $\Gamma$ gives:
$$\begin{aligned} \Gamma &= \sum_{w \in C} \log \prod_{j=2}^{l^w} \{[\sigma(X_w^T \theta_{j-1}^w)]^{1-d_j^w} \times [1 - \sigma(X_w^T \theta_{j-1}^w)]^{d_j^w}\} \\ &= \sum_{w \in C} \sum_{j=2}^{l^w} \{(1 - d_j^w) \cdot \log[\sigma(X_w^T \theta_{j-1}^w)] + d_j^w \cdot \log[1 - \sigma(X_w^T \theta_{j-1}^w)]\} \end{aligned}$$
For convenience of gradient derivation, the content inside the braces of the double summation above is denoted $\Gamma(w, j)$, i.e.:
$$\Gamma(w, j) = (1 - d_j^w) \cdot \log[\sigma(X_w^T \theta_{j-1}^w)] + d_j^w \cdot \log[1 - \sigma(X_w^T \theta_{j-1}^w)]$$
The above $\Gamma$ is then the objective function of the CBOW model; the next step is to optimize the objective function, using stochastic gradient ascent, i.e. to maximize the objective function.
The idea of stochastic gradient ascent is: each time a sample (Context(w), w) is taken, all the parameters in the objective function are refreshed once. The gradients of $\Gamma(w, j)$ with respect to these vectors are given first. The gradient of $\Gamma(w, j)$ with respect to $\theta_{j-1}^w$ is computed by taking the derivative with respect to $\theta_{j-1}^w$:
$$\frac{\partial \Gamma(w, j)}{\partial \theta_{j-1}^w} = [1 - d_j^w - \sigma(X_w^T \theta_{j-1}^w)] X_w$$
Then the update formula for $\theta_{j-1}^w$ can be written as:
$$\theta_{j-1}^w := \theta_{j-1}^w + \eta [1 - d_j^w - \sigma(X_w^T \theta_{j-1}^w)] X_w$$
where $\eta$ denotes the learning rate.
Next the gradient of $\Gamma(w, j)$ with respect to $X_w$ is computed; inspecting $\Gamma(w, j)$ shows that $\theta_{j-1}^w$ and $X_w$ play symmetric roles in it, so the derivation is the same as above:
$$\frac{\partial \Gamma(w, j)}{\partial X_w} = [1 - d_j^w - \sigma(X_w^T \theta_{j-1}^w)] \theta_{j-1}^w$$
The final goal is the word vector of each word in the dictionary, while $X_w$ here denotes the accumulation of the word vectors in Context(w); $\frac{\partial \Gamma(w, j)}{\partial X_w}$ is therefore used to update each $v(\tilde{w})$, $\tilde{w} \in Context(w)$:
$$v(\tilde{w}) := v(\tilde{w}) + \eta \sum_{j=2}^{l^w} \frac{\partial \Gamma(w, j)}{\partial X_w}, \quad \tilde{w} \in Context(w)$$
That is, $\frac{\partial \Gamma(w, j)}{\partial X_w}$ is contributed to each word vector in Context(w); equal contribution is used here, and thus the word vector of each word can be obtained.
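Combining the two gradients, one stochastic-gradient-ascent step of the CBOW model on a sample (Context(w), w) can be sketched as follows (a minimal NumPy version under the notation above; the learning rate and shapes are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbow_hs_step(context_vecs, thetas, codes, eta=0.025):
    """One stochastic-gradient-ascent refresh on a sample (Context(w), w).

    context_vecs -- word vectors v(w~) of the context words, updated in place
    thetas       -- node vectors theta_{j-1}^w along the Huffman path, updated in place
    codes        -- Huffman code d_j^w of w, j = 2 .. l^w
    """
    x_w = np.sum(context_vecs, axis=0)   # X_w: accumulation of the context word vectors
    e = np.zeros_like(x_w)               # accumulates eta * dGamma(w, j)/dX_w over the path
    for j in range(len(thetas)):
        g = eta * (1 - codes[j] - sigmoid(x_w @ thetas[j]))
        e += g * thetas[j]               # gradient w.r.t. X_w uses theta (by symmetry)
        thetas[j] = thetas[j] + g * x_w  # theta += eta*[1 - d - sigma(X^T theta)]*X_w
    for v in context_vecs:
        v += e                           # equal contribution to every context word vector
```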
The structure of the LSTM model is as follows:
(1) Forget gate layer: decides what information to discard from the cell state. The gate reads $h_{t-1}$ and $x_t$ and outputs a number between 0 and 1 for each entry in the cell state $C_{t-1}$, where 1 means "keep entirely" and 0 means "discard entirely". It is expressed as
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f);$$
(2) Candidate layer: decides what new information is stored in the cell state, and consists of two parts. First, a sigmoid layer called the "input gate layer" decides which values will be updated; second, a tanh layer creates a vector of new candidate values $\tilde{C}_t$ to be added to the state. The update to the state is produced from these two pieces of information, expressed as
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C);$$
(3) Cell-state update: the old cell state $C_{t-1}$ is updated to the new cell state $C_t$. The old state $C_{t-1}$ is multiplied by $f_t$, discarding the information determined to be discarded, and then $i_t * \tilde{C}_t$, the new candidate values, are added, scaled by how much each state value was decided to be updated. It is expressed as
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t;$$
(4) Output gate layer: finally, it must be determined what value to output. This output value is based on the cell state: a sigmoid layer is run to decide which parts of the cell state will be output; the cell state is then processed through the tanh function to obtain a value between -1 and 1 and multiplied by the output of the sigmoid. The final output is expressed as:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t);$$
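The four gates above amount to the following single-time-step computation, shown here as a minimal NumPy sketch (the weight matrices act on the concatenation $[h_{t-1}, x_t]$; shapes are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    """One LSTM time step computing f_t, i_t, C~_t, C_t, o_t and h_t as defined above."""
    z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)        # forget gate: what to discard from the cell state
    i_t = sigmoid(W_i @ z + b_i)        # input gate: which values to update
    C_tilde = np.tanh(W_C @ z + b_C)    # candidate values
    C_t = f_t * C_prev + i_t * C_tilde  # cell-state update
    o_t = sigmoid(W_o @ z + b_o)        # output gate: which parts of the state to output
    h_t = o_t * np.tanh(C_t)            # output, scaled to between -1 and 1 by tanh
    return h_t, C_t
```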
The above is the content of the LSTM model: each word of a sentence corresponds to one LSTM cell in the recurrent neural network, i.e. to one of the words in the training of the actual sentence, and all the LSTM cells are connected in sequence according to the position relationship of the words to form a chain structure, so that training can be carried out. The word vectors $V = \{v_1, v_2, \ldots, v_m\}$ of the sentence serve in turn as the input value $x_t$ of each LSTM cell, and the output $h_t$ of the last LSTM cell of each sentence is taken as the three-dimensional vector output of the sentence.
(5) The output value of each sentence is used as the input of the Softmax function, defined as $P(y = i \mid x) = e^{x_i} / \sum_{j=1}^{k} e^{x_j}$. This function is a probability-distribution function, and the three probability values sum to 1. The probability values of the classes are compared, the class of maximum probability is taken as the sentiment category of the sentence, and the label of each sentence is obtained. The labels are then compared with the true labels to obtain the accuracy; by training the neural network model to its best accuracy, while also optimizing its parameters, the purpose of recognizing the sentiment orientation of text comments with a neural network is achieved.
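The final Softmax, label assignment, and accuracy comparison described above can be sketched as follows (the logits standing in for the last-cell outputs $h_t$ and the true labels are hypothetical):

```python
import numpy as np

def softmax(x):
    ex = np.exp(x - np.max(x))  # shift for numerical stability; the ratios are unchanged
    return ex / ex.sum()

logits = np.array([[2.1, 0.3, -1.0], [0.1, 0.2, 1.5]])  # hypothetical last-cell outputs, one row per sentence
probs = np.array([softmax(row) for row in logits])       # three probabilities per sentence, summing to 1
pred = probs.argmax(axis=1)                              # class of maximum probability = predicted label
true = np.array([0, 2])                                  # hypothetical true labels
accuracy = float((pred == true).mean())                  # compared with the true labels to obtain accuracy
print(pred, accuracy)
```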
The beneficial effects of the invention are that it solves the following problems present in traditional text sentiment classification:
(1) The word vectors trained with the CBOW method are dense, real-valued vectors; the method can effectively exploit large amounts of unlabelled data to obtain a more accurate semantic characterization of words in the semantic space, while avoiding the sparsity and curse-of-dimensionality drawbacks of traditional one-hot representations.
(2) Compared with conventional classification techniques, the LSTM can not only use the word information of a comment text but also model word order, yielding a document representation specific to comment language.
(3) The training process of the neural network can be accelerated with a GPU, which not only improves the accuracy of sentiment classification but also speeds up training on large-scale corpora. The sentiment orientation of comments can be recognized effectively, with particularly good application prospects in fields such as e-commerce and film.
Brief description of the drawings
Fig. 1 is the flow chart of the analysis for recognizing the sentiment orientation of text comments.
Fig. 2 is a schematic diagram of the structure of the LSTM.
Embodiment
The present invention proposes an analysis method for recognizing the sentiment orientation of text comments based on a neural network, which is explained below with reference to the accompanying drawings.
In processing text-comment data, the present invention uses the continuous bag-of-words CBOW model (Continuous Bag-of-Words Model) as the method for training word vectors, and then uses the long short-term memory LSTM model (Long Short-Term Memory, LSTM) to discriminate the sentiment tendency of comments. The specific steps, as shown in the flow chart of Fig. 1, are as follows:
Step 1: corpus preprocessing. Each sentence is accurately segmented into individual words or characters. Each sentence has a corresponding class label, i.e. 0, 1, 2, representing negative, neutral, and positive respectively. Each class label needs to be converted into a three-dimensional vector here: 0 is converted to [1 0 0], 1 to [0 1 0], and 2 to [0 0 1]. The purpose of this conversion is to contrast against the label obtained for each sentence after training.
Step 2: word-vector training. The words obtained after segmenting the text-comment corpus are trained with CBOW, yielding the vector corresponding to each word; the dimension of the vectors can be configured as needed.
The basic idea and steps of the word-vector training are as follows. A language model formally describes a character string $S$ of $T$ words as the probability of natural language $P(w_1, w_2, w_3, \ldots, w_T)$, where $w_1$ to $w_T$ denote in turn each word of the sentence, i.e. the following reasoning:
$$P(s) = P(w_1, w_2, \ldots, w_T) = P(w_1) P(w_2 \mid w_1) P(w_3 \mid w_1, w_2) \cdots P(w_T \mid w_1, w_2, w_3, \ldots, w_{T-1})$$
That is, once the first word is determined, each later word's probability of occurrence is conditioned on the words appearing before it. (For example, the sentence "everybody likes eating apples" yields four words after segmentation, "everybody", "likes", "eating", "apples", and the natural-language probability of the sentence is P(everybody, likes, eating, apples) = P(everybody) P(likes | everybody) P(eating | everybody, likes) P(apples | everybody, likes, eating).) Each probability can be obtained separately, and the above formula simplifies to:
$$P(s) = \prod_{i=1}^{T} P(w_i \mid Context_i)$$
When $Context_i$ is empty, $P(w_i \mid Context_i)$ is simply $P(w_i)$ itself.
The core of the CBOW model lies in the gradient computation, and its key technique is Hierarchical Softmax, which requires some knowledge of Huffman trees. Each word of the dictionary serves as a leaf node of the Huffman tree. For some leaf node of the Huffman tree, assume it corresponds to the word $w$ in the dictionary; to ease the following computations, some notation is introduced:
(1) $p^w$: the path from the root node to the leaf node corresponding to $w$;
(2) $l^w$: the number of nodes contained in the path $p^w$;
(3) $p_1^w, p_2^w, \ldots, p_{l^w}^w$: the $l^w$ nodes of the path $p^w$, where $p_{l^w}^w$ denotes the node corresponding to the word $w$;
(4) $d_2^w, d_3^w, \ldots, d_{l^w}^w \in \{0, 1\}$: the Huffman code of the word $w$, where $d_j^w$ denotes the code corresponding to the $j$-th node in the path $p^w$;
(5) $\theta_1^w, \theta_2^w, \ldots, \theta_{l^w-1}^w$: the vectors corresponding to the non-leaf nodes in the path $p^w$, where $\theta_j^w$ denotes the vector corresponding to the $j$-th non-leaf node;
For any word $w$ in the dictionary, there exists a unique path $p^w$ in the Huffman tree from the root node to the node corresponding to $w$. The path $p^w$ contains $l^w - 1$ branches; regarding each branch as a binary classification, every classification produces a probability, and multiplying these probabilities together is exactly the required $P(w \mid Context(w))$.
The general formula of the conditional probability $P(w \mid Context(w))$ is written as:
$$P(w \mid Context(w)) = \prod_{j=2}^{l^w} P(d_j^w \mid X_w, \theta_{j-1}^w)$$
where:
$$P(d_j^w \mid X_w, \theta_{j-1}^w) = \begin{cases} \sigma(X_w^T \theta_{j-1}^w), & d_j^w = 0 \\ 1 - \sigma(X_w^T \theta_{j-1}^w), & d_j^w = 1 \end{cases}$$
Combining and rearranging according to the above formula gives:
$$P(d_j^w \mid X_w, \theta_{j-1}^w) = [\sigma(X_w^T \theta_{j-1}^w)]^{1-d_j^w} \cdot [1 - \sigma(X_w^T \theta_{j-1}^w)]^{d_j^w}$$
In the formula: $P(d_j^w \mid X_w, \theta_{j-1}^w)$ denotes the probability of each classification result on the way from the root node of the Huffman tree to the leaf node. According to logistic regression, the probability that a node is classified into the positive class is $\sigma(X_w^T \theta_{j-1}^w)$, and the probability that it is classified into the negative class is $1 - \sigma(X_w^T \theta_{j-1}^w)$; combining the two formulas gives exactly the above formula. $\theta$: the vector corresponding to a non-leaf node. $\sigma$: the sigmoid function, with formula $\sigma(x) = \frac{1}{1 + e^{-x}}$. $X_w$: the accumulated sum of the $2c$ vectors of the input layer, i.e. $X_w = \sum_{i=1}^{2c} v(Context(w)_i)$, where $2c$ means that the current word $w$ has $c$ words before it and $c$ words after it.
The objective function of a neural-network-based language model is usually taken to be the following log-likelihood function:
$$\Gamma = \sum_{w \in C} \log P(w \mid Context(w))$$
Substituting $P(w \mid Context(w))$ into the log-likelihood function $\Gamma$ gives:
$$\begin{aligned} \Gamma &= \sum_{w \in C} \log \prod_{j=2}^{l^w} \{[\sigma(X_w^T \theta_{j-1}^w)]^{1-d_j^w} \times [1 - \sigma(X_w^T \theta_{j-1}^w)]^{d_j^w}\} \\ &= \sum_{w \in C} \sum_{j=2}^{l^w} \{(1 - d_j^w) \cdot \log[\sigma(X_w^T \theta_{j-1}^w)] + d_j^w \cdot \log[1 - \sigma(X_w^T \theta_{j-1}^w)]\} \end{aligned}$$
For convenience of gradient derivation, the content inside the braces of the double summation above is denoted $\Gamma(w, j)$, i.e.:
$$\Gamma(w, j) = (1 - d_j^w) \cdot \log[\sigma(X_w^T \theta_{j-1}^w)] + d_j^w \cdot \log[1 - \sigma(X_w^T \theta_{j-1}^w)]$$
The above $\Gamma$ is then the objective function of the CBOW model; the next step is to optimize the objective function, using stochastic gradient ascent, i.e. to maximize the objective function.
The idea of stochastic gradient ascent is: each time a sample (Context(w), w) is taken, all the parameters in the objective function are refreshed once. The gradients of $\Gamma(w, j)$ with respect to these vectors are given first. The gradient of $\Gamma(w, j)$ with respect to $\theta_{j-1}^w$ is computed by taking the derivative with respect to $\theta_{j-1}^w$:
$$\frac{\partial \Gamma(w, j)}{\partial \theta_{j-1}^w} = [1 - d_j^w - \sigma(X_w^T \theta_{j-1}^w)] X_w$$
Then the update formula for $\theta_{j-1}^w$ can be written as:
$$\theta_{j-1}^w := \theta_{j-1}^w + \eta [1 - d_j^w - \sigma(X_w^T \theta_{j-1}^w)] X_w$$
where $\eta$ denotes the learning rate.
Next the gradient of $\Gamma(w, j)$ with respect to $X_w$ is computed; inspecting $\Gamma(w, j)$ shows that $\theta_{j-1}^w$ and $X_w$ play symmetric roles in it, so the derivation is the same as above:
$$\frac{\partial \Gamma(w, j)}{\partial X_w} = [1 - d_j^w - \sigma(X_w^T \theta_{j-1}^w)] \theta_{j-1}^w$$
The final goal is the word vector of each word in the dictionary, while $X_w$ here denotes the accumulation of the word vectors in Context(w); $\frac{\partial \Gamma(w, j)}{\partial X_w}$ is therefore used to update each $v(\tilde{w})$, $\tilde{w} \in Context(w)$:
$$v(\tilde{w}) := v(\tilde{w}) + \eta \sum_{j=2}^{l^w} \frac{\partial \Gamma(w, j)}{\partial X_w}, \quad \tilde{w} \in Context(w)$$
That is, $\frac{\partial \Gamma(w, j)}{\partial X_w}$ is contributed to each word vector in Context(w); equal contribution is used here, and thus the word vector of each word can be obtained.
Step 3: LSTM training. The present invention uses an LSTM as the classification model, with the sentiment-labelled corpus as the training set; after the processing of step 1 and step 2, the problem is converted into the problem of training a neural-network classification model.
Assume a sentence $l$ contains $m$ words, with corresponding word vectors $V = \{v_1, v_2, \ldots, v_m\}$; the sentence $l$ is then represented by $V$, where $V$ denotes the $m$ word vectors corresponding to the $m$ words of a sentence, $v_1$ denotes the word vector of the first word, $v_2$ the word vector of the second word, and so on. The vectors corresponding to the words of each sentence are fed directly into a recurrent neural network for processing. The recurrent network uses the conventional model, the long short-term memory LSTM model: each word of a sentence corresponds to one LSTM cell in the recurrent neural network, i.e. to one of the words in the training of the actual sentence, and all the LSTM cells are connected in sequence according to the position relationship of the words to form a chain structure, so that training can be carried out. The word vectors $V = \{v_1, v_2, \ldots, v_m\}$ of the sentence serve in turn as the input value $x_t$ of each LSTM cell; $x_t$, the input value at time $t$, is the word vector of the $t$-th word serving as the input of the LSTM cell. For example, the word vector $v_1$ of the first word serves as the first input value $x_1$. The output $h_t$ of the last LSTM cell of each sentence is taken; $h_t$, the output of the LSTM cell, is composed of two parts: first a sigmoid layer yields an initial output, then the cell state $C_t$ is scaled to between -1 and 1 with tanh, and the result is multiplied by the sigmoid output to give the output of the model. This $h_t$ is taken as the three-dimensional vector output of the sentence. The output value of each sentence is then used as the input of the Softmax function, defined as
$$P(y = i \mid x) = \frac{e^{x_i}}{\sum_{j=1}^{k} e^{x_j}}$$
where $e^{x_i}$ computes the value of each class and $\sum_{j=1}^{k} e^{x_j}$ computes the sum of the values of the $k$ classes. With three class labels, the probability of the first class is $e^{x_1} / \sum_{j=1}^{3} e^{x_j}$, the probability of the second class is $e^{x_2} / \sum_{j=1}^{3} e^{x_j}$, and the probability of the third class is $e^{x_3} / \sum_{j=1}^{3} e^{x_j}$. This function is a probability-distribution function and yields three probability values whose sum is 1. The probability values of the classes are compared, the class of maximum probability is taken as the sentiment category of the sentence, and the label of each sentence is obtained. The labels are then compared with the true labels to obtain the accuracy; by training the neural network model to its best accuracy, while also optimizing its parameters, the purpose of recognizing the sentiment orientation of text comments with a neural network is achieved. Specifically, the structure of the LSTM model, as shown in Fig. 2, is as follows:
(1) Forget gate layer: decides what information to discard from the cell state. The gate reads $h_{t-1}$ and $x_t$ and outputs a number between 0 and 1 for each entry in the cell state $C_{t-1}$, where 1 means "keep entirely" and 0 means "discard entirely". It is expressed as
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f);$$
(2) Candidate layer: decides what new information is stored in the cell state, and consists of two parts. First, a sigmoid layer called the "input gate layer" decides which values will be updated; second, a tanh layer creates a vector of new candidate values $\tilde{C}_t$ to be added to the state. The update to the state is produced from these two pieces of information, expressed as
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C);$$
(3) Cell-state update: the old cell state $C_{t-1}$ is updated to the new cell state $C_t$. The old state $C_{t-1}$ is multiplied by $f_t$, discarding the information determined to be discarded, and then $i_t * \tilde{C}_t$, the new candidate values, are added, scaled by how much each state value was decided to be updated. It is expressed as
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t;$$
(4) Output gate layer: finally, it must be determined what value to output. This output value is based on the cell state: a sigmoid layer is run to decide which parts of the cell state will be output; the cell state is then processed through the tanh function to obtain a value between -1 and 1 and multiplied by the output of the sigmoid. The final output is expressed as:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t);$$
The above is the content of the LSTM model: each word of a sentence corresponds to one LSTM cell in the recurrent neural network, i.e. to one of the words in the training of the actual sentence, and all the LSTM cells are connected in sequence according to the position relationship of the words to form a chain structure, so that training can be carried out. The word vectors $V = \{v_1, v_2, \ldots, v_m\}$ of the sentence serve in turn as the input value $x_t$ of each LSTM cell, and the output $h_t$ of the last LSTM cell of each sentence is taken as the three-dimensional vector output of the sentence.
(5) The output value of each sentence is used as the input of the Softmax function, defined as $P(y = i \mid x) = e^{x_i} / \sum_{j=1}^{k} e^{x_j}$. This function is a probability-distribution function, and the three probability values sum to 1. The probability values of the classes are compared, the class of maximum probability is taken as the sentiment category of the sentence, and the label of each sentence is obtained. The labels are then compared with the true labels to obtain the accuracy; by training the neural network model to its best accuracy, while also optimizing its parameters, the purpose of recognizing the sentiment orientation of text comments with a neural network is achieved.
Claims (3)
1. An analysis method for recognizing the sentiment orientation of text comments based on a neural network, characterized in that, in processing text-comment data, the continuous bag-of-words CBOW model is used as the method for training word vectors, and the long short-term memory LSTM model is then used to discriminate the sentiment tendency of comments; the specific steps are as follows:
Step 1: corpus preprocessing; each sentence is accurately segmented into individual words or characters; each sentence has a corresponding class label, i.e. 0, 1, 2, representing negative, neutral, and positive respectively; each class label needs to be converted into a three-dimensional vector here, i.e. 0 is converted to [1 0 0], 1 to [0 1 0], and 2 to [0 0 1]; the purpose of this conversion is to contrast against the label obtained for each sentence after training;
Step 2: word-vector training; the words obtained after segmenting the text-comment corpus are trained with CBOW, yielding the vector corresponding to each word; the dimension of the vectors can be configured as needed;
Step 3: LSTM training; the present invention uses an LSTM as the classification model, with the sentiment-labelled corpus as the training set; after the processing of step 1 and step 2, the problem is converted into the problem of training a neural-network classification model;
Assume a sentence $l$ contains $m$ words, with corresponding word vectors $V = \{v_1, v_2, \ldots, v_m\}$; the sentence $l$ is then represented by $V$, and the vectors corresponding to the words of each sentence are fed directly into a recurrent neural network for processing; the recurrent network uses the conventional model, the long short-term memory LSTM model: each word of a sentence corresponds to one LSTM cell in the recurrent neural network, i.e. to one of the words in the training of the actual sentence, and all the LSTM cells are connected in sequence according to the position relationship of the words to form a chain structure, so that training can be carried out; the word vectors $V = \{v_1, v_2, \ldots, v_m\}$ of the sentence serve in turn as the input value $x_t$ of each LSTM cell, and the output $h_t$ of the last LSTM cell of each sentence is taken as the three-dimensional vector output of that sentence; the output value of each sentence is then used as the input of the Softmax function, defined as
$$P(y = i \mid x) = \frac{e^{x_i}}{\sum_{j=1}^{k} e^{x_j}}$$
This function is a probability-distribution function and yields three probability values whose sum is 1; in the formula, $e^{x_i}$ computes the value of each class and $\sum_{j=1}^{k} e^{x_j}$ computes the sum of the values of the $k$ classes; with three class labels, the probability of the first class is $e^{x_1} / \sum_{j=1}^{3} e^{x_j}$, the probability of the second class is $e^{x_2} / \sum_{j=1}^{3} e^{x_j}$, and the probability of the third class is $e^{x_3} / \sum_{j=1}^{3} e^{x_j}$; the probability values of the classes are compared, the class of maximum probability is taken as the sentiment category of the sentence, and the maximum class-probability value determines the label of each sentence; the labels are then compared with the true labels to obtain the accuracy; by training the neural network model to its best accuracy, while also optimizing its parameters, the purpose of recognizing the sentiment orientation of text comments with a neural network is achieved;
In the above formulas: $V$ denotes the $m$ word vectors corresponding to the $m$ words of a sentence, $v_1$ denotes the word vector of the first word, $v_2$ the word vector of the second word, and so on; $x_t$ is the input value of the LSTM cell at time $t$, i.e. the word vector of the $t$-th word serves as the input of the LSTM cell; the word vector $v_1$ of the first word serves as the first input value $x_1$; $h_t$ is the output of the LSTM cell and is composed of two parts: first a sigmoid layer yields an initial output, then the cell state $C_t$ is scaled to between -1 and 1 with tanh, and the result is multiplied by the sigmoid output to give the output of the model.
2. The analysis method for recognizing the sentiment orientation of text comments based on a neural network according to claim 1, characterized in that the basic idea and steps of the word-vector training of step 2 are as follows: a language model formally describes a character string $S$ of $T$ words as the probability of natural language $P(w_1, w_2, w_3, \ldots, w_T)$, where $w_1$ to $w_T$ denote in turn each word of the sentence, i.e. the following reasoning: $P(s) = P(w_1, w_2, \ldots, w_T) = P(w_1) P(w_2 \mid w_1) P(w_3 \mid w_1, w_2) \cdots P(w_T \mid w_1, w_2, w_3, \ldots, w_{T-1})$
That is, once the first word is determined, each later word's probability of occurrence is conditioned on the words appearing before it; each probability can be obtained separately, and the above formula simplifies to:
$$P(s) = P(w_1, w_2, w_3, \ldots, w_T) = \prod_{i=1}^{T} P(w_i \mid Context_i),$$
When $Context_i$ is empty, $P(w_i \mid Context_i)$ is simply $P(w_i)$ itself;
The core of the CBOW model lies in the gradient computation, and its key technique is Hierarchical Softmax, which requires some knowledge of Huffman trees; each word of the dictionary serves as a leaf node of the Huffman tree; for some leaf node of the Huffman tree, assume it corresponds to the word $w$ in the dictionary; to ease the following computations, some notation is introduced:
(1) $p^w$: the path from the root node to the leaf node corresponding to $w$;
(2) $l^w$: the number of nodes contained in the path $p^w$;
(3) $p_1^w, p_2^w, \ldots, p_{l^w}^w$: the $l^w$ nodes of the path $p^w$, where $p_{l^w}^w$ denotes the node corresponding to the word $w$;
(4) $d_2^w, d_3^w, \ldots, d_{l^w}^w \in \{0, 1\}$: the Huffman code of the word $w$, where $d_j^w$ denotes the code corresponding to the $j$-th node in the path $p^w$;
(5) $\theta_1^w, \theta_2^w, \ldots, \theta_{l^w-1}^w$: the vectors corresponding to the non-leaf nodes in the path $p^w$, where $\theta_j^w$ denotes the vector corresponding to the $j$-th non-leaf node;
For any word $w$ in the dictionary, there exists a unique path $p^w$ in the Huffman tree from the root node to the node corresponding to $w$; the path $p^w$ contains $l^w - 1$ branches; regarding each branch as a binary classification, every classification produces a probability, and multiplying these probabilities together is exactly the required $P(w \mid Context(w))$;
The general formula of the conditional probability $P(w \mid Context(w))$ is written as:
$$P(w \mid Context(w)) = \prod_{j=2}^{l^w} P(d_j^w \mid X_w, \theta_{j-1}^w)$$
Wherein:
$$P(d_j^w \mid X_w, \theta_{j-1}^w) = \begin{cases} \sigma(X_w^T \theta_{j-1}^w), & d_j^w = 0 \\ 1 - \sigma(X_w^T \theta_{j-1}^w), & d_j^w = 1 \end{cases}$$
Combining and rearranging according to the above formula gives:
$$P(d_j^w \mid X_w, \theta_{j-1}^w) = [\sigma(X_w^T \theta_{j-1}^w)]^{1-d_j^w} \cdot [1 - \sigma(X_w^T \theta_{j-1}^w)]^{d_j^w}$$
In the formula: $P(d_j^w \mid X_w, \theta_{j-1}^w)$ denotes the probability of each classification result on the way from the root node of the Huffman tree to the leaf node; according to logistic regression, the probability that a node is classified into the positive class is $\sigma(X_w^T \theta_{j-1}^w)$, and the probability that it is classified into the negative class is $1 - \sigma(X_w^T \theta_{j-1}^w)$; combining the two formulas gives exactly the above formula; $\theta$: the vector corresponding to a non-leaf node; $\sigma$: the sigmoid function, with formula $\sigma(x) = \frac{1}{1 + e^{-x}}$; $X_w$: the accumulated sum of the $2c$ vectors of the input layer, i.e. $X_w = \sum_{i=1}^{2c} v(Context(w)_i)$, where $2c$ means that the current word $w$ has $c$ words before it and $c$ words after it;
The objective function of a neural-network-based language model is usually taken to be the following log-likelihood function:
$$\Gamma = \sum_{w \in C} \log P(w \mid Context(w))$$
Substituting $P(w \mid Context(w))$ into the log-likelihood function $\Gamma$ gives:
$$\begin{aligned} \Gamma &= \sum_{w \in C} \log \prod_{j=2}^{l^w} \{[\sigma(X_w^T \theta_{j-1}^w)]^{1-d_j^w} \times [1 - \sigma(X_w^T \theta_{j-1}^w)]^{d_j^w}\} \\ &= \sum_{w \in C} \sum_{j=2}^{l^w} \{(1 - d_j^w) \cdot \log[\sigma(X_w^T \theta_{j-1}^w)] + d_j^w \cdot \log[1 - \sigma(X_w^T \theta_{j-1}^w)]\} \end{aligned}$$
For convenience of gradient derivation, the content inside the braces of the double summation above is denoted $\Gamma(w, j)$, i.e.:
$$\Gamma(w, j) = (1 - d_j^w) \cdot \log[\sigma(X_w^T \theta_{j-1}^w)] + d_j^w \cdot \log[1 - \sigma(X_w^T \theta_{j-1}^w)]$$
The above $\Gamma$ is then the objective function of the CBOW model; the next step is to optimize the objective function, using stochastic gradient ascent, i.e. to maximize the objective function;
The idea of stochastic gradient ascent is: each time a sample (Context(w), w) is taken, all the parameters in the objective function are refreshed once; the gradients of $\Gamma(w, j)$ with respect to these vectors are given first; the gradient of $\Gamma(w, j)$ with respect to $\theta_{j-1}^w$ is computed first, i.e. taking the derivative with respect to $\theta_{j-1}^w$:
$$\frac{\partial \Gamma(w, j)}{\partial \theta_{j-1}^w} = [1 - d_j^w - \sigma(X_w^T \theta_{j-1}^w)] X_w$$
Then the update formula for $\theta_{j-1}^w$ can be written as:
$$\theta_{j-1}^w := \theta_{j-1}^w + \eta [1 - d_j^w - \sigma(X_w^T \theta_{j-1}^w)] X_w$$
where $\eta$ denotes the learning rate;
Next the gradient of $\Gamma(w, j)$ with respect to $X_w$ is computed; inspecting $\Gamma(w, j)$ shows that $\theta_{j-1}^w$ and $X_w$ play symmetric roles in it, so the derivation is the same as above:
$$\frac{\partial \Gamma(w, j)}{\partial X_w} = [1 - d_j^w - \sigma(X_w^T \theta_{j-1}^w)] \theta_{j-1}^w$$
The final goal is the word vector of each word in the dictionary, while $X_w$ here denotes the accumulation of the word vectors in Context(w); $\frac{\partial \Gamma(w, j)}{\partial X_w}$ is therefore used to update each $v(\tilde{w})$, $\tilde{w} \in Context(w)$:
$$v(\tilde{w}) := v(\tilde{w}) + \eta \sum_{j=2}^{l^w} \frac{\partial \Gamma(w, j)}{\partial X_w}, \quad \tilde{w} \in Context(w)$$
That is, $\frac{\partial \Gamma(w, j)}{\partial X_w}$ is contributed to each word vector in Context(w); equal contribution is used here, and thus the word vector of each word can be obtained.
3. The analysis method for recognizing the sentiment orientation of text comments based on a neural network according to claim 1 or claim 2, characterized in that the structure of the LSTM model is as follows:
(1) Forget gate layer: decides what information to discard from the cell state; the gate reads $h_{t-1}$ and $x_t$ and outputs a number between 0 and 1 for each entry in the cell state $C_{t-1}$, where 1 means "keep entirely" and 0 means "discard entirely"; it is expressed as
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f);$$
(2) Candidate layer: decides what new information is stored in the cell state, and consists of two parts; first, a sigmoid layer called the "input gate layer" decides which values will be updated; second, a tanh layer creates a vector of new candidate values $\tilde{C}_t$ to be added to the state; the update to the state is produced from these two pieces of information, expressed as
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C);$$
(3) Cell-state update: the old cell state $C_{t-1}$ is updated to the new cell state $C_t$; the old state $C_{t-1}$ is multiplied by $f_t$, discarding the information determined to be discarded, and then $i_t * \tilde{C}_t$, the new candidate values, are added, scaled by how much each state value was decided to be updated; it is expressed as
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t;$$
(4) Output gate layer: finally, it must be determined what value to output; this output value is based on the cell state: a sigmoid layer is run to decide which parts of the cell state will be output; the cell state is then processed through the tanh function to obtain a value between -1 and 1 and multiplied by the output of the sigmoid; the final output is expressed as:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t);$$
The above is the content of the LSTM model: each word of a sentence corresponds to one LSTM cell in the recurrent neural network, i.e. to one of the words in the training of the actual sentence, and all the LSTM cells are connected in sequence according to the position relationship of the words to form a chain structure, so that training can be carried out; the word vectors $V = \{v_1, v_2, \ldots, v_m\}$ of the sentence serve in turn as the input value $x_t$ of each LSTM cell, and the output $h_t$ of the last LSTM cell of each sentence is taken as the three-dimensional vector output of the sentence;
(5) The output value of each sentence is used as the input of the Softmax function, defined as $P(y = i \mid x) = e^{x_i} / \sum_{j=1}^{k} e^{x_j}$; this function is a probability-distribution function, and the three probability values sum to 1; the probability values of the classes are compared, the class of maximum probability is taken as the sentiment category of the sentence, and the label of each sentence is obtained; the labels are then compared with the true labels to obtain the accuracy; by training the neural network model to its best accuracy, while also optimizing its parameters, the purpose of recognizing the sentiment orientation of text comments with a neural network is achieved.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710342178.8A CN107153642A (en) | 2017-05-16 | 2017-05-16 | Analysis method for recognizing the sentiment orientation of text comments based on a neural network
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710342178.8A CN107153642A (en) | 2017-05-16 | 2017-05-16 | Analysis method for recognizing the sentiment orientation of text comments based on a neural network
Publications (1)
Publication Number | Publication Date |
---|---|
CN107153642A true CN107153642A (en) | 2017-09-12 |
Family
ID=59793270
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710342178.8A Pending CN107153642A (en) | 2017-05-16 | 2017-05-16 | Analysis method for recognizing the sentiment orientation of text comments based on a neural network
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107153642A (en) |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107491490A (en) * | 2017-07-19 | 2017-12-19 | 华东师范大学 | Text sentiment classification method based on Emotion center |
CN108038492A (en) * | 2017-11-23 | 2018-05-15 | 西安理工大学 | A kind of perceptual term vector and sensibility classification method based on deep learning |
CN108133038A (en) * | 2018-01-10 | 2018-06-08 | 重庆邮电大学 | A kind of entity level emotional semantic classification system and method based on dynamic memory network |
CN108415953A (en) * | 2018-02-05 | 2018-08-17 | 华融融通(北京)科技有限公司 | A kind of non-performing asset based on natural language processing technique manages knowledge management method |
CN108519976A (en) * | 2018-04-04 | 2018-09-11 | 郑州大学 | The method for generating extensive sentiment dictionary based on neural network |
CN108595592A (en) * | 2018-04-19 | 2018-09-28 | 成都睿码科技有限责任公司 | A kind of text emotion analysis method based on five-stroke form code character level language model |
CN108628834A (en) * | 2018-05-14 | 2018-10-09 | 国家计算机网络与信息安全管理中心 | A kind of word lists dendrography learning method based on syntax dependence |
CN108764268A (en) * | 2018-04-02 | 2018-11-06 | 华南理工大学 | A kind of multi-modal emotion identification method of picture and text based on deep learning |
CN108829672A (en) * | 2018-06-05 | 2018-11-16 | 平安科技(深圳)有限公司 | Sentiment analysis method, apparatus, computer equipment and the storage medium of text |
CN108959268A (en) * | 2018-07-20 | 2018-12-07 | 科大讯飞股份有限公司 | A kind of text emotion analysis method and device |
CN109036570A (en) * | 2018-05-31 | 2018-12-18 | 北京云知声信息技术有限公司 | The filter method and system of the non-case history content of Ultrasonography |
CN109086393A (en) * | 2018-07-27 | 2018-12-25 | 贵州中科恒运软件科技有限公司 | A kind of the analysis of public opinion system and method |
CN109255027A (en) * | 2018-08-27 | 2019-01-22 | 上海宝尊电子商务有限公司 | A kind of method and apparatus of electric business comment sentiment analysis noise reduction |
CN109446414A (en) * | 2018-09-28 | 2019-03-08 | 武汉大学 | A kind of software information website fast tag recommended method based on neural network classification |
CN109461037A (en) * | 2018-12-17 | 2019-03-12 | 北京百度网讯科技有限公司 | Comment on viewpoint clustering method, device and terminal |
CN109460508A (en) * | 2018-10-10 | 2019-03-12 | 浙江大学 | A kind of efficient comment spam groups of users detection method |
CN109543036A (en) * | 2018-11-20 | 2019-03-29 | 四川长虹电器股份有限公司 | Text Clustering Method based on semantic similarity |
CN109597997A (en) * | 2018-12-07 | 2019-04-09 | 上海宏原信息科技有限公司 | Based on comment entity, aspect grade sensibility classification method and device and its model training |
CN109739978A (en) * | 2018-12-11 | 2019-05-10 | 中科恒运股份有限公司 | A kind of Text Clustering Method, text cluster device and terminal device |
CN109800438A (en) * | 2019-02-01 | 2019-05-24 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
CN110020147A (en) * | 2017-11-29 | 2019-07-16 | 北京京东尚科信息技术有限公司 | Model generates, method for distinguishing, system, equipment and storage medium are known in comment |
WO2019149076A1 (en) * | 2018-02-05 | 2019-08-08 | 阿里巴巴集团控股有限公司 | Word vector generation method, apparatus and device |
CN110110137A (en) * | 2019-03-19 | 2019-08-09 | 咪咕音乐有限公司 | A kind of method, apparatus, electronic equipment and the storage medium of determining musical features |
CN110134966A (en) * | 2019-05-21 | 2019-08-16 | 中电健康云科技有限公司 | A kind of sensitive information determines method and device |
CN110264311A (en) * | 2019-05-30 | 2019-09-20 | 佛山科学技术学院 | A kind of business promotion accurate information recommended method and system based on deep learning |
CN110427616A (en) * | 2019-07-19 | 2019-11-08 | 山东科技大学 | A kind of text emotion analysis method based on deep learning |
CN111523319A (en) * | 2020-04-10 | 2020-08-11 | 广东海洋大学 | Microblog emotion analysis method based on scene LSTM structure network |
CN111771208A (en) * | 2018-02-19 | 2020-10-13 | 博朗有限公司 | Apparatus and method for implementing positioning of a movable processing device |
CN111881249A (en) * | 2020-06-08 | 2020-11-03 | 江苏大学 | Method for judging text emotion tendentiousness based on recurrent neural network |
CN112069311A (en) * | 2020-08-04 | 2020-12-11 | 北京声智科技有限公司 | Text extraction method, device, equipment and medium |
WO2021135457A1 (en) * | 2020-08-06 | 2021-07-08 | 平安科技(深圳)有限公司 | Recurrent neural network-based emotion recognition method, apparatus, and storage medium |
CN113112310A (en) * | 2021-05-12 | 2021-07-13 | 北京大学 | Commodity service culture added value assessment method, device and system |
CN114168730A (en) * | 2021-11-26 | 2022-03-11 | 一拓通信集团股份有限公司 | Consumption tendency analysis method based on BilSTM and SVM |
CN115759088A (en) * | 2023-01-10 | 2023-03-07 | 中国测绘科学研究院 | Text analysis method and storage medium for comment information |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105809186A (en) * | 2016-02-25 | 2016-07-27 | 中国科学院声学研究所 | Emotion classification method and system |
CN105868317A (en) * | 2016-03-25 | 2016-08-17 | 华中师范大学 | Digital education resource recommendation method and system |
CN105955959A (en) * | 2016-05-06 | 2016-09-21 | 深圳大学 | Sentiment classification method and system |
CN106294684A (en) * | 2016-08-06 | 2017-01-04 | 上海高欣计算机系统有限公司 | The file classification method of term vector and terminal unit |
CN106407178A (en) * | 2016-08-25 | 2017-02-15 | 中国科学院计算技术研究所 | Session abstract generation method and device |
US20170053646A1 (en) * | 2015-08-17 | 2017-02-23 | Mitsubishi Electric Research Laboratories, Inc. | Method for using a Multi-Scale Recurrent Neural Network with Pretraining for Spoken Language Understanding Tasks |
CN106599933A (en) * | 2016-12-26 | 2017-04-26 | 哈尔滨工业大学 | Text emotion classification method based on the joint deep learning model |
CN106601226A (en) * | 2016-11-18 | 2017-04-26 | 中国科学院自动化研究所 | Phoneme duration prediction modeling method and phoneme duration prediction method |
-
2017
- 2017-05-16 CN CN201710342178.8A patent/CN107153642A/en active Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170053646A1 (en) * | 2015-08-17 | 2017-02-23 | Mitsubishi Electric Research Laboratories, Inc. | Method for using a Multi-Scale Recurrent Neural Network with Pretraining for Spoken Language Understanding Tasks |
CN105809186A (en) * | 2016-02-25 | 2016-07-27 | 中国科学院声学研究所 | Emotion classification method and system |
CN105868317A (en) * | 2016-03-25 | 2016-08-17 | 华中师范大学 | Digital education resource recommendation method and system |
CN105955959A (en) * | 2016-05-06 | 2016-09-21 | 深圳大学 | Sentiment classification method and system |
CN106294684A (en) * | 2016-08-06 | 2017-01-04 | 上海高欣计算机系统有限公司 | Word-vector-based text classification method and terminal device |
CN106407178A (en) * | 2016-08-25 | 2017-02-15 | 中国科学院计算技术研究所 | Conversation summary generation method and device |
CN106601226A (en) * | 2016-11-18 | 2017-04-26 | 中国科学院自动化研究所 | Phoneme duration prediction modeling method and phoneme duration prediction method |
CN106599933A (en) * | 2016-12-26 | 2017-04-26 | 哈尔滨工业大学 | Text emotion classification method based on a joint deep learning model |
Non-Patent Citations (1)
Title |
---|
LI JIA et al.: "Tweet modeling with LSTM recurrent neural networks for hashtag recommendation", 2016 International Joint Conference on Neural Networks * |
Cited By (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107491490B (en) * | 2017-07-19 | 2020-10-13 | 华东师范大学 | Text emotion classification method based on emotion center |
CN107491490A (en) * | 2017-07-19 | 2017-12-19 | 华东师范大学 | Text emotion classification method based on emotion center |
CN108038492A (en) * | 2017-11-23 | 2018-05-15 | 西安理工大学 | Sentiment word vector and sentiment classification method based on deep learning |
CN110020147A (en) * | 2017-11-29 | 2019-07-16 | 北京京东尚科信息技术有限公司 | Model generation and comment recognition method, system, device, and storage medium |
CN108133038A (en) * | 2018-01-10 | 2018-06-08 | 重庆邮电大学 | Entity level emotion classification system and method based on dynamic memory network |
CN108133038B (en) * | 2018-01-10 | 2022-03-22 | 重庆邮电大学 | Entity level emotion classification system and method based on dynamic memory network |
CN108415953A (en) * | 2018-02-05 | 2018-08-17 | 华融融通(北京)科技有限公司 | Method for managing bad asset management knowledge based on natural language processing technology |
US10824819B2 (en) | 2018-02-05 | 2020-11-03 | Alibaba Group Holding Limited | Generating word vectors by recurrent neural networks based on n-ary characters |
WO2019149076A1 (en) * | 2018-02-05 | 2019-08-08 | 阿里巴巴集团控股有限公司 | Word vector generation method, apparatus and device |
CN108415953B (en) * | 2018-02-05 | 2021-08-13 | 华融融通(北京)科技有限公司 | Method for managing bad asset management knowledge based on natural language processing technology |
CN111771208A (en) * | 2018-02-19 | 2020-10-13 | 博朗有限公司 | Apparatus and method for implementing positioning of a movable processing device |
CN108764268A (en) * | 2018-04-02 | 2018-11-06 | 华南理工大学 | Image-text multimodal emotion recognition method based on deep learning |
CN108519976A (en) * | 2018-04-04 | 2018-09-11 | 郑州大学 | Method for generating a large-scale sentiment dictionary based on neural networks |
CN108595592A (en) * | 2018-04-19 | 2018-09-28 | 成都睿码科技有限责任公司 | Text sentiment analysis method based on a five-stroke (Wubi) code character-level language model |
CN108628834A (en) * | 2018-05-14 | 2018-10-09 | 国家计算机网络与信息安全管理中心 | Word expression learning method based on syntactic dependency relationship |
CN108628834B (en) * | 2018-05-14 | 2022-04-15 | 国家计算机网络与信息安全管理中心 | Word expression learning method based on syntactic dependency relationship |
CN109036570B (en) * | 2018-05-31 | 2021-08-31 | 云知声智能科技股份有限公司 | Method and system for filtering non-medical record content of ultrasound department |
CN109036570A (en) * | 2018-05-31 | 2018-12-18 | 北京云知声信息技术有限公司 | Method and system for filtering non-medical record content of ultrasound department |
CN108829672A (en) * | 2018-06-05 | 2018-11-16 | 平安科技(深圳)有限公司 | Text sentiment analysis method, apparatus, computer device, and storage medium |
CN108959268B (en) * | 2018-07-20 | 2023-01-17 | 科大讯飞股份有限公司 | Text emotion analysis method and device |
CN108959268A (en) * | 2018-07-20 | 2018-12-07 | 科大讯飞股份有限公司 | Text emotion analysis method and device |
CN109086393A (en) * | 2018-07-27 | 2018-12-25 | 贵州中科恒运软件科技有限公司 | Public opinion analysis system and method |
CN109255027B (en) * | 2018-08-27 | 2022-06-24 | 上海宝尊电子商务有限公司 | E-commerce comment sentiment analysis noise reduction method and device |
CN109255027A (en) * | 2018-08-27 | 2019-01-22 | 上海宝尊电子商务有限公司 | E-commerce comment sentiment analysis noise reduction method and device |
CN109446414B (en) * | 2018-09-28 | 2021-08-17 | 武汉大学 | Software information site rapid label recommendation method based on neural network classification |
CN109446414A (en) * | 2018-09-28 | 2019-03-08 | 武汉大学 | Software information site rapid label recommendation method based on neural network classification |
CN109460508A (en) * | 2018-10-10 | 2019-03-12 | 浙江大学 | Efficient spam comment user group detection method |
CN109460508B (en) * | 2018-10-10 | 2021-10-15 | 浙江大学 | Efficient spam comment user group detection method |
CN109543036A (en) * | 2018-11-20 | 2019-03-29 | 四川长虹电器股份有限公司 | Text clustering method based on semantic similarity |
CN109597997B (en) * | 2018-12-07 | 2023-05-02 | 上海宏原信息科技有限公司 | Comment entity and aspect-level emotion classification method and device and model training thereof |
CN109597997A (en) * | 2018-12-07 | 2019-04-09 | 上海宏原信息科技有限公司 | Comment entity and aspect-level emotion classification method and device and model training thereof |
CN109739978A (en) * | 2018-12-11 | 2019-05-10 | 中科恒运股份有限公司 | Text clustering method, text clustering device, and terminal device |
CN109461037A (en) * | 2018-12-17 | 2019-03-12 | 北京百度网讯科技有限公司 | Comment viewpoint clustering method, device, and terminal |
CN109800438A (en) * | 2019-02-01 | 2019-05-24 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
CN109800438B (en) * | 2019-02-01 | 2020-03-31 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
CN110110137A (en) * | 2019-03-19 | 2019-08-09 | 咪咕音乐有限公司 | Method, apparatus, electronic device, and storage medium for determining musical features |
CN110134966A (en) * | 2019-05-21 | 2019-08-16 | 中电健康云科技有限公司 | Sensitive information determination method and device |
CN110264311A (en) * | 2019-05-30 | 2019-09-20 | 佛山科学技术学院 | Precise business promotion information recommendation method and system based on deep learning |
CN110427616A (en) * | 2019-07-19 | 2019-11-08 | 山东科技大学 | Text emotion analysis method based on deep learning |
CN110427616B (en) * | 2019-07-19 | 2023-06-09 | 山东科技大学 | Text emotion analysis method based on deep learning |
CN111523319A (en) * | 2020-04-10 | 2020-08-11 | 广东海洋大学 | Microblog emotion analysis method based on scene LSTM structure network |
CN111523319B (en) * | 2020-04-10 | 2023-06-30 | 广东海洋大学 | Microblog emotion analysis method based on scene LSTM structure network |
CN111881249A (en) * | 2020-06-08 | 2020-11-03 | 江苏大学 | Method for judging text emotion tendency based on a recurrent neural network |
CN112069311A (en) * | 2020-08-04 | 2020-12-11 | 北京声智科技有限公司 | Text extraction method, device, equipment and medium |
WO2021135457A1 (en) * | 2020-08-06 | 2021-07-08 | 平安科技(深圳)有限公司 | Recurrent neural network-based emotion recognition method, apparatus, and storage medium |
CN113112310A (en) * | 2021-05-12 | 2021-07-13 | 北京大学 | Commodity service culture added value assessment method, device and system |
CN114168730A (en) * | 2021-11-26 | 2022-03-11 | 一拓通信集团股份有限公司 | Consumption tendency analysis method based on BiLSTM and SVM |
CN115759088A (en) * | 2023-01-10 | 2023-03-07 | 中国测绘科学研究院 | Text analysis method and storage medium for comment information |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107153642A (en) | | Analysis method for recognizing the sentiment orientation of text comments based on a neural network |
CN108492200B (en) | | User attribute inference method and device based on convolutional neural network |
CN112487143B (en) | | Multi-label text classification method based on public opinion big data analysis |
CN111931506B (en) | | Entity relationship extraction method based on graph information enhancement |
CN103955451B (en) | | Method for judging the emotional tendency of short texts |
CN108664632A (en) | | Text sentiment classification algorithm based on convolutional neural networks and an attention mechanism |
CN109558487A (en) | | Document classification method based on hierarchical multi-attention networks |
CN107273355A (en) | | Chinese word vector generation method based on joint character-word training |
CN107038480A (en) | | Text sentiment classification method based on convolutional neural networks |
CN109697232A (en) | | Chinese text sentiment analysis method based on deep learning |
CN107544957A (en) | | Sentiment orientation analysis method for commercial product target words |
CN107203511A (en) | | Named entity recognition method for network text based on neural network probabilistic disambiguation |
CN107562784A (en) | | Short text classification method based on the ResLCNN model |
CN106886516A (en) | | Method and device for automatically identifying sentence relationships and entities |
CN109800411A (en) | | Extraction method for clinical medical entities and their attributes |
CN110472042B (en) | | Fine-grained emotion classification method |
CN106897371B (en) | | Chinese text classification system and method |
CN107908614A (en) | | Named entity recognition method based on Bi-LSTM |
CN108038205B (en) | | Viewpoint analysis prototype system for Chinese microblogs |
CN107025284A (en) | | Method for recognizing the sentiment tendency of online comment text, and convolutional neural network model |
CN106354710A (en) | | Neural network relation extraction method |
CN107943784A (en) | | Relation extraction method based on generative adversarial networks |
CN109934261A (en) | | Knowledge-driven parameter transformation model and its few-shot learning method |
CN107122349A (en) | | Text feature word extraction method based on word2vec-LDA models |
CN113407660B (en) | | Unstructured text event extraction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170912 |