CN108038166A

CN108038166A - A Chinese microblog sentiment analysis method based on the subjective and objective skewed popularity of lexical items

Info

Publication number: CN108038166A
Application number: CN201711279503.7A
Authority: CN
Inventors: 刘进; 郭峻材; 陈雪; 崔晓晖
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2017-12-06
Filing date: 2017-12-06
Publication date: 2018-05-15

Abstract

The present invention relates to a Chinese microblog sentiment analysis method based on the subjective and objective skewed popularity of lexical items. The steps are as follows: (1) obtain the target microblog data set to be analyzed; (2) preprocess each microblog post by word segmentation, part-of-speech tagging and stop-word filtering, and merge each emotion word with the negation word that immediately precedes it; (3) introduce emotion prior knowledge and skewed popularity prior knowledge into the preprocessed microblog data; (4) sample the skewed popularity, emotion and theme labels of each lexical item using the Gibbs sampling algorithm; (5) calculate the joint skewed popularity and emotion distribution variable of each microblog post; (6) calculate the final emotion polarity probability distribution of each microblog post and thereby determine its emotion polarity. For microblog data, the method proposes the concept of the subjective and objective skewed popularity of a lexical item (abbreviated as skewed popularity) and jointly models the relationship among skewed popularity, emotion and theme using the Gibbs sampling algorithm. The method is simple and practical and can significantly improve microblog emotion classification performance.

Description

A Chinese microblog sentiment analysis method based on the subjective and objective skewed popularity of lexical items

Technical field

The present invention relates to a sentiment analysis method for Chinese microblogs. Specifically, for a microblog data set, it proposes the concept of the skewed popularity of a lexical item, introduces both emotion prior knowledge and skewed popularity prior knowledge, and, based on the relationship among skewed popularity, emotion and theme, jointly samples the three using the Gibbs sampling algorithm. It then calculates the joint skewed popularity and emotion distribution variable of each microblog post, calculates the final emotion probability distribution of each post, and thereby determines the emotion polarity of the post. It is a Chinese microblog sentiment analysis method based on the subjective and objective skewed popularity of lexical items.

Background technology

In recent years, with the rapid development of Internet technology, social media platforms of all kinds have emerged rapidly. People increasingly use social media such as microblogs to express their emotions or opinions, and a massive number of microblog posts are produced and spread every day. Compared with traditional long text, microblog short text is brief, colloquial, non-standard and sparse in features. How to effectively mine emotion or opinion knowledge from it has become an important research direction.

At present there are two main classes of microblog sentiment analysis methods: methods based on sentiment dictionaries and methods based on machine learning. Dictionary-based methods mainly use the emotion words in a sentiment dictionary and determine the emotion polarity or intensity of a text through keyword matching; their major drawback is excessive dependence on surface features. Machine-learning methods are further divided into fully supervised, weakly supervised and unsupervised methods. Fully supervised methods first train an emotion classifier on a large manually labeled data set and then use the trained classifier for further emotion classification; the manual labeling is especially time-consuming and laborious. Weakly supervised methods mainly use noisy tokens in social network text, such as emoticons, as the emotion labels of the text and then train a classifier as in the fully supervised setting, but the noise in the labels also affects classifier performance. Unsupervised methods require no training set and mostly use emotion words as emotion prior knowledge to guide sentiment analysis.

Recent studies suggest that the emotion of a text depends on its theme, and many joint models of emotion and theme have been built on this idea. Such unsupervised methods first construct a reasonable lexical item generation model from the relationship between emotion and theme, then jointly sample the emotion and theme of each lexical item using Gibbs sampling, calculate the emotion distribution of the text, and take the emotion category with the highest probability as the emotion category of the text.

The above methods, which analyze microblog emotion based on the relationship between emotion and theme, have the following shortcomings:

(1) they consider only the dependence between emotion and theme and do not take into account the influence of skewed popularity on emotion;

(2) when applied to the microblog domain, they cannot make good use of emoticons, the most typical emotional feature;

(3) because they do not consider skewed popularity, they cannot use the skewed popularity prior knowledge contained in emoticons and in the parts of speech of lexical items.

Summary of the invention

The object of the present invention is to address the deficiencies of current Chinese microblog sentiment analysis by providing a Chinese microblog sentiment analysis method based on the subjective and objective skewed popularity of lexical items. The method proposes the concept of the skewed popularity of a lexical item, introduces both emotion prior knowledge and skewed popularity prior knowledge, and, based on the relationship among skewed popularity, emotion and theme, jointly samples the three using the Gibbs sampling algorithm. It then calculates the joint skewed popularity and emotion distribution variable of each microblog post, calculates the final emotion probability distribution of each post, and thereby determines the emotion polarity of the post.

To achieve the above object, the concept of the invention is as follows: obtain the target microblog data set to be analyzed and preprocess it; introduce emotion prior knowledge and skewed popularity prior knowledge, and sample the skewed popularity, emotion and theme labels of each lexical item using the Gibbs sampling algorithm; calculate the joint skewed popularity and emotion distribution variable of each microblog post; calculate the final emotion polarity probability distribution of each post and thereby determine its emotion polarity.

In accordance with the above inventive concept, the present invention adopts the following technical solution:

A Chinese microblog sentiment analysis method based on the subjective and objective skewed popularity of lexical items, characterized in that it comprises the following steps:

Step 1: obtain the target microblog data set to be analyzed;

Step 2: preprocess each microblog post by word segmentation, part-of-speech tagging and stop-word filtering, and merge each emotion word with the negation word that immediately precedes it;

Step 3: introduce emotion prior knowledge and skewed popularity prior knowledge into the preprocessed microblog data. The emotion prior knowledge comprises emotion words and emoticons. The skewed popularity of a lexical item is either subjective or objective: the former means the item tends to express subjective emotion, and the latter means it tends to describe objective things. The method uses emoticons as subjective skewed popularity prior knowledge, and time words, place words and pronouns as objective skewed popularity prior knowledge. The process of introducing the emotion and skewed popularity prior knowledge is specifically:

Step 3a: construct an empty S × V emotion transfer matrix λ, a K × V skewed popularity transfer matrix η, a K × S × T × V matrix β and the final prior matrix F(β, η, λ), where S, T, K and V denote the number of emotions, the number of themes, the number of skewed popularities and the number of distinct lexical items in the data set, respectively;

Step 3b: initialize the elements of $\eta_{K\times V}$ and $\lambda_{S\times V}$ to 1;

Step 3c: for each lexical item w ∈ {1, ..., V}, each skewed popularity label c ∈ {1, ..., K} and each emotion label l ∈ {1, ..., S}, if w is skewed popularity prior knowledge, the element $\eta_{cw}$ of $\eta_{K\times V}$ is updated as follows:

If w is emotion prior knowledge, the element $\lambda_{lw}$ of $\lambda_{S\times V}$ is updated as follows:

where K(w) is the skewed popularity label corresponding to w, and S(w) is the emotion label corresponding to w;

Step 3d: for each lexical item w ∈ {1, ..., V}, each emotion label l ∈ {1, ..., S}, each skewed popularity label c ∈ {1, ..., K} and each theme z ∈ {1, ..., T}, the final prior $F_{c,l,z,w}(\beta,\eta,\lambda)$ is:

$$F_{c,l,z,w}(\beta,\eta,\lambda)=\eta_{c,w}\cdot\beta_{c,l,z,w}\cdot\lambda_{l,w}$$

Step 4: based on the preprocessed microblog data and the prior knowledge, sample the skewed popularity, emotion and theme labels of each lexical item using the Gibbs sampling algorithm. For the lexical item $w_i$ at each position i in the data set, the skewed popularity label $c_i$, emotion label $l_i$ and theme label $z_i$ are sampled as follows:

$$
\begin{aligned}
&P(c_i=k,\ l_i=s,\ z_i=t \mid w, c^{-i}, l^{-i}, z^{-i}, \varepsilon, \gamma, \alpha, \beta, \eta, \lambda) \propto \\
&\quad \frac{N_{d,k}^{-i}+\varepsilon}{N_{d}^{-i}+K\cdot\varepsilon}\cdot\frac{N_{d,k,s}^{-i}+\gamma}{N_{d,k}^{-i}+S\cdot\gamma}\cdot\frac{N_{d,k,s,t}^{-i}+\alpha_{k,s,t}}{N_{d,k,s}^{-i}+\sum_{t'=1}^{T}\alpha_{k,s,t'}}\cdot\frac{N_{k,s,t,w_i}^{-i}+F_{k,s,t,w_i}}{N_{k,s,t}^{-i}+\sum_{v=1}^{V}F_{k,s,t,v}}
\end{aligned}
$$

where $N_d$ denotes the number of lexical items in the text d that contains $w_i$; $N_{d,k}$ denotes the number of lexical items in text d assigned to skewed popularity k; $N_{d,k,s}$ denotes the number of lexical items in text d assigned to skewed popularity k and emotion s; $N_{d,k,s,t}$ denotes the number of lexical items in text d assigned to skewed popularity k, emotion s and theme t; $N_{k,s,t}$ denotes the number of lexical items in the data set assigned to skewed popularity k, emotion s and theme t; and $N_{k,s,t,w}$ denotes the number of occurrences of lexical item w in the data set assigned to skewed popularity k, emotion s and theme t. In addition, ε and γ are the prior counts of the skewed popularity and emotion labels and are empirical values; α is the prior count of the themes and is learned by maximum likelihood estimation; and the superscript -i indicates that the current lexical item is excluded;

Step 5: after a certain number of sampling iterations, calculate the joint skewed popularity and emotion distribution variable of each microblog post. For microblog post d it is calculated as follows:

$$\pi_{d,k,s}=\frac{N_{d,k,s}+\gamma}{N_{d,k}+S\cdot\gamma}$$

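Step 6: calculate the final emotion probability distribution of each microblog post and select the emotion polarity with the highest probability as the emotion polarity of the post. For microblog post d the final emotion probability distribution is calculated as follows:

$$\Pi_{d,s}=W_{sub}\cdot\frac{N_{d,k_{sub},s}+\gamma}{N_{d,k_{sub}}+S\cdot\gamma}+W_{obj}\cdot\frac{N_{d,k_{obj},s}+\gamma}{N_{d,k_{obj}}+S\cdot\gamma},\qquad W_{sub}>W_{obj}>0$$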

Compared with the prior art, the Chinese microblog sentiment analysis method based on the subjective and objective skewed popularity of lexical items of the present invention has the following outstanding features and advantages. First, it treats emotion, skewed popularity and theme in microblog text as interrelated and thus analyzes the semantics of the text more deeply. Second, it introduces both emotion prior knowledge and skewed popularity prior knowledge, which not only makes full use of emoticons but also incorporates the part-of-speech tags of lexical items, so that the emotion-related features of the text are exploited to a greater extent. Third, it uses the influence of skewed popularity on emotion and jointly samples emotion, skewed popularity and theme, which makes the final emotion classification more accurate.

Brief description of the drawings

Fig. 1 is a flow chart of the Chinese microblog sentiment analysis method based on the subjective and objective skewed popularity of lexical items of the present invention.

Embodiment

The embodiments of the present invention are further described below with reference to the drawing.

The principle of the method of the present invention is introduced first:

A Chinese microblog sentiment analysis method based on the subjective and objective skewed popularity of lexical items, characterized in that its specific steps are as follows:

(1) obtain the target microblog data set to be analyzed;

(2) preprocess each microblog post by word segmentation, part-of-speech tagging and stop-word filtering, and merge each emotion word with the negation word that immediately precedes it;

(3) introduce emotion prior knowledge and skewed popularity prior knowledge into the preprocessed microblog data;

(4) based on the preprocessed microblog data and the prior knowledge, sample the skewed popularity, emotion and theme labels of each lexical item using the Gibbs sampling algorithm;

(5) after a certain number of sampling iterations, calculate the joint skewed popularity and emotion distribution variable of each microblog post;

(6) calculate the final emotion probability distribution of each microblog post and select the emotion polarity with the highest probability as the emotion polarity of the post.

In step (2), once an emotion word and a negation word have been combined into a new word, the new word no longer carries the emotion attribute that the emotion word had before.
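The negation handling of step (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `merge_negations`, the sample word sets and the choice to join the two tokens by plain concatenation are all hypothetical.

```python
def merge_negations(tokens, negation_words, emotion_words):
    """Merge each emotion word with the negation word directly before it.

    The merged token is a new lexical item that is absent from the emotion
    lexicon, so it no longer carries the original emotion prior.
    """
    merged = []
    i = 0
    while i < len(tokens):
        is_negated_emotion = (
            tokens[i] in negation_words
            and i + 1 < len(tokens)
            and tokens[i + 1] in emotion_words
        )
        if is_negated_emotion:
            merged.append(tokens[i] + tokens[i + 1])  # assumed joining rule
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical word lists, for illustration only.
NEGATION_WORDS = {"不"}
EMOTION_WORDS = {"开心", "难过"}

example = merge_negations(["今天", "不", "开心"], NEGATION_WORDS, EMOTION_WORDS)
```

Because the merged item is looked up in the lexicon as a whole, it would receive no emotion prior in step (3) unless the lexicon lists it explicitly.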

In step (3), the emotion prior knowledge comprises emotion words and emoticons. The skewed popularity of a lexical item is either subjective or objective: the former means the item tends to express subjective emotion, and the latter means it tends to describe objective things. The method uses emoticons as subjective skewed popularity prior knowledge, and time words, place words and pronouns as objective skewed popularity prior knowledge. The process of introducing the emotion and skewed popularity prior knowledge is specifically:

(3a) construct an empty S × V emotion transfer matrix λ, a K × V skewed popularity transfer matrix η, a K × S × T × V matrix β and the final prior matrix F(β, η, λ), where S, T, K and V denote the number of emotions, the number of themes, the number of skewed popularities and the number of distinct lexical items in the data set, respectively;

(3b) initialize the elements of $\eta_{K\times V}$ and $\lambda_{S\times V}$ to 1;

(3c) for each lexical item w ∈ {1, ..., V}, each skewed popularity label c ∈ {1, ..., K} and each emotion label l ∈ {1, ..., S}, if w is skewed popularity prior knowledge, the element $\eta_{cw}$ of $\eta_{K\times V}$ is updated as follows:

If w is emotion prior knowledge, the element $\lambda_{lw}$ of $\lambda_{S\times V}$ is updated as follows:

(3d) for each lexical item w ∈ {1, ..., V}, each emotion label l ∈ {1, ..., S}, each skewed popularity label c ∈ {1, ..., K} and each theme z ∈ {1, ..., T}, the final prior $F_{c,l,z,w}(\beta,\eta,\lambda)$ is:

$$F_{c,l,z,w}(\beta,\eta,\lambda)=\eta_{c,w}\cdot\beta_{c,l,z,w}\cdot\lambda_{l,w}$$
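Steps (3a) to (3d) can be sketched in plain Python as below. The update formulas of step (3c) are not reproduced in this text, so the sketch assumes an indicator update (1 for the label that matches the prior knowledge of w, 0 for every other label); that assumption agrees with the [0,1] and [1,0] columns quoted in embodiment two but is not confirmed by the source. The function name `build_prior`, the dictionary arguments and the use of a single constant for β are likewise hypothetical.

```python
def build_prior(K, S, T, V, beta_value, skew_prior, emotion_prior):
    """Return F[c][l][z][w] = eta[c][w] * beta[c][l][z][w] * lam[l][w].

    skew_prior maps a lexical item index w to its skewed popularity label;
    emotion_prior maps w to its emotion label. Items without prior
    knowledge keep the initial value 1 in eta and lam.
    """
    eta = [[1.0] * V for _ in range(K)]  # (3b) initialise to 1
    lam = [[1.0] * V for _ in range(S)]
    # (3c) ASSUMED indicator update: 1 for the matching label, 0 otherwise.
    for w, label in skew_prior.items():
        for c in range(K):
            eta[c][w] = 1.0 if c == label else 0.0
    for w, label in emotion_prior.items():
        for l in range(S):
            lam[l][w] = 1.0 if l == label else 0.0
    # (3d) element-wise product; beta is a single constant in this sketch.
    return [[[[eta[c][w] * beta_value * lam[l][w] for w in range(V)]
              for _ in range(T)]
             for l in range(S)]
            for c in range(K)]


# Labels: skewed popularity 0 = objective, 1 = subjective;
# emotion 0 = positive, 1 = negative. Item 5 plays the role of an emoticon.
F = build_prior(K=2, S=2, T=3, V=6, beta_value=0.1,
                skew_prior={5: 1}, emotion_prior={5: 0})
```

With these inputs the emoticon keeps a non-zero prior only for the subjective and positive combination, while items without prior knowledge keep the value of β everywhere.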

In step (4), for the lexical item $w_i$ at each position i in the data set, the skewed popularity label $c_i$, emotion label $l_i$ and theme label $z_i$ are sampled as follows:

$$
\begin{aligned}
&P(c_i=k,\ l_i=s,\ z_i=t \mid w, c^{-i}, l^{-i}, z^{-i}, \varepsilon, \gamma, \alpha, \beta, \eta, \lambda) \propto \\
&\quad \frac{N_{d,k}^{-i}+\varepsilon}{N_{d}^{-i}+K\cdot\varepsilon}\cdot\frac{N_{d,k,s}^{-i}+\gamma}{N_{d,k}^{-i}+S\cdot\gamma}\cdot\frac{N_{d,k,s,t}^{-i}+\alpha_{k,s,t}}{N_{d,k,s}^{-i}+\sum_{t'=1}^{T}\alpha_{k,s,t'}}\cdot\frac{N_{k,s,t,w_i}^{-i}+F_{k,s,t,w_i}}{N_{k,s,t}^{-i}+\sum_{v=1}^{V}F_{k,s,t,v}}
\end{aligned}
$$

where $N_d$ denotes the number of lexical items in the text d that contains $w_i$; $N_{d,k}$ denotes the number of lexical items in text d assigned to skewed popularity k; $N_{d,k,s}$ denotes the number of lexical items in text d assigned to skewed popularity k and emotion s; $N_{d,k,s,t}$ denotes the number of lexical items in text d assigned to skewed popularity k, emotion s and theme t; $N_{k,s,t}$ denotes the number of lexical items in the data set assigned to skewed popularity k, emotion s and theme t; and $N_{k,s,t,w}$ denotes the number of occurrences of lexical item w in the data set assigned to skewed popularity k, emotion s and theme t. In addition, ε and γ are the prior counts of the skewed popularity and emotion labels and are empirical values; α is the prior count of the themes and is learned by maximum likelihood estimation; and the superscript -i indicates that the current lexical item is excluded.
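The four factors of the sampling formula (as given in the claims) map directly onto count tables. The sketch below evaluates the unnormalised weight for a single label triple; the function name `conditional_weight`, the tuple layout of `counts` and the tiny all-zero state are assumptions made for illustration.

```python
def conditional_weight(k, s, t, w, d, counts, F, alpha, eps, gamma, K, S):
    """Unnormalised P(c_i=k, l_i=s, z_i=t | ...) for one lexical item.

    counts must already exclude the current lexical item (the "-i" counts).
    """
    n_d, n_dk, n_dks, n_dkst, n_kst, n_kstw = counts
    skew_term = (n_dk[d][k] + eps) / (n_d[d] + K * eps)
    emotion_term = (n_dks[d][k][s] + gamma) / (n_dk[d][k] + S * gamma)
    theme_term = (n_dkst[d][k][s][t] + alpha[k][s][t]) / (
        n_dks[d][k][s] + sum(alpha[k][s]))
    word_term = (n_kstw[k][s][t][w] + F[k][s][t][w]) / (
        n_kst[k][s][t] + sum(F[k][s][t]))
    return skew_term * emotion_term * theme_term * word_term


# Hand-made state: K = S = 2, one theme, two lexical items, one empty post.
counts = (
    [0],                                                 # N_d
    [[0, 0]],                                            # N_{d,k}
    [[[0, 0], [0, 0]]],                                  # N_{d,k,s}
    [[[[0], [0]], [[0], [0]]]],                          # N_{d,k,s,t}
    [[[0], [0]], [[0], [0]]],                            # N_{k,s,t}
    [[[[0, 0]], [[0, 0]]], [[[0, 0]], [[0, 0]]]],        # N_{k,s,t,w}
)
F = [[[[0.5, 0.5]], [[0.5, 0.5]]], [[[0.5, 0.5]], [[0.5, 0.5]]]]
alpha = [[[1.0], [1.0]], [[1.0], [1.0]]]

weight = conditional_weight(0, 0, 0, 0, 0, counts, F, alpha,
                            eps=1.0, gamma=1.0, K=2, S=2)
```

With all counts at zero the weight reduces to the product of the prior ratios, 0.5 · 0.5 · 1 · 0.5. A sampler would compute this weight for every (k, s, t) triple and draw the new labels in proportion to the weights.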

In step (5), the joint skewed popularity and emotion distribution variable of microblog post d is calculated as follows:

$$\pi_{d,k,s}=\frac{N_{d,k,s}+\gamma}{N_{d,k}+S\cdot\gamma}$$
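The joint variable of step (5) is a smoothed ratio of two per-post counts, as in the claims: $N_{d,k,s}$ plus γ over $N_{d,k}$ plus S·γ. A small sketch for a single post follows; the function name `joint_distribution` and the example counts are made up for illustration.

```python
def joint_distribution(n_dk, n_dks, gamma):
    """pi[k][s] = (N_{d,k,s} + gamma) / (N_{d,k} + S * gamma) for one post d."""
    S = len(n_dks[0])
    return [
        [(n_dks[k][s] + gamma) / (n_dk[k] + S * gamma) for s in range(S)]
        for k in range(len(n_dk))
    ]


# Hypothetical counts for one post: rows are [objective, subjective],
# columns are [positive, negative].
pi = joint_distribution(n_dk=[3, 4], n_dks=[[2, 1], [4, 0]], gamma=0.5)
```

Each row of `pi` sums to 1, so it can be read as the emotion distribution of the post conditioned on one skewed popularity.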

In step (6), the final emotion probability distribution of microblog post d is calculated as follows:

$$\Pi_{d,s}=W_{sub}\cdot\frac{N_{d,k_{sub},s}+\gamma}{N_{d,k_{sub}}+S\cdot\gamma}+W_{obj}\cdot\frac{N_{d,k_{obj},s}+\gamma}{N_{d,k_{obj}}+S\cdot\gamma},\qquad W_{sub}>W_{obj}>0$$

where $k_{sub}$ denotes the subjective skewed popularity label, $k_{obj}$ denotes the objective skewed popularity label, and $W_{sub}$ and $W_{obj}$ denote the weights that the emotion distributions under subjective and objective skewed popularity, respectively, carry in the overall emotion distribution. Naturally, the emotion distribution under subjective skewed popularity carries the larger weight.
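The weighted combination of step (6) can be sketched as follows. The source only requires $W_{sub} > W_{obj} > 0$; the values 0.7 and 0.3, the function name `final_emotion_scores` and the input distribution are hypothetical.

```python
def final_emotion_scores(pi, k_sub, k_obj, w_sub, w_obj):
    """Pi[s] = W_sub * pi[k_sub][s] + W_obj * pi[k_obj][s] for one post."""
    if not w_sub > w_obj > 0:
        raise ValueError("expected W_sub > W_obj > 0")
    return [
        w_sub * p_sub + w_obj * p_obj
        for p_sub, p_obj in zip(pi[k_sub], pi[k_obj])
    ]


# Hypothetical per-post joint distribution: rows [objective, subjective],
# columns [positive, negative].
pi = [[0.625, 0.375], [0.9, 0.1]]
scores = final_emotion_scores(pi, k_sub=1, k_obj=0, w_sub=0.7, w_obj=0.3)
```

Since each row of `pi` sums to 1, the scores sum to $W_{sub} + W_{obj}$; choosing weights that add up to 1 therefore yields a probability distribution directly, and other choices need a final normalisation.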

The following are specific embodiments that apply the above method.

Embodiment one: referring to Fig. 1, the Chinese microblog sentiment analysis method based on the subjective and objective skewed popularity of lexical items is characterized in that, for a microblog data set, it proposes the concept of the skewed popularity of a lexical item, introduces both emotion prior knowledge and skewed popularity prior knowledge, jointly samples skewed popularity, emotion and theme using the Gibbs sampling algorithm based on the relationship among the three, and thereby determines the emotion polarity of each microblog post.

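The process of introducing the emotion prior knowledge and the skewed popularity prior knowledge is as follows:

(3b) the elements of $\eta_{K\times V}$ and $\lambda_{S\times V}$ are initialized to 1;

If w is emotion prior knowledge, the element $\lambda_{lw}$ of $\lambda_{S\times V}$ is updated as follows:

The final prior is:

$$F_{c,l,z,w}(\beta,\eta,\lambda)=\eta_{c,w}\cdot\beta_{c,l,z,w}\cdot\lambda_{l,w}$$

For the lexical item $w_i$ at each position i in the data set, the sampling formula for the skewed popularity label $c_i$, emotion label $l_i$ and theme label $z_i$ is as follows:

$$
\begin{aligned}
&P(c_i=k,\ l_i=s,\ z_i=t \mid w, c^{-i}, l^{-i}, z^{-i}, \varepsilon, \gamma, \alpha, \beta, \eta, \lambda) \propto \\
&\quad \frac{N_{d,k}^{-i}+\varepsilon}{N_{d}^{-i}+K\cdot\varepsilon}\cdot\frac{N_{d,k,s}^{-i}+\gamma}{N_{d,k}^{-i}+S\cdot\gamma}\cdot\frac{N_{d,k,s,t}^{-i}+\alpha_{k,s,t}}{N_{d,k,s}^{-i}+\sum_{t'=1}^{T}\alpha_{k,s,t'}}\cdot\frac{N_{k,s,t,w_i}^{-i}+F_{k,s,t,w_i}}{N_{k,s,t}^{-i}+\sum_{v=1}^{V}F_{k,s,t,v}}
\end{aligned}
$$

where $N_d$ denotes the number of lexical items in the text d that contains $w_i$; $N_{d,k}$ denotes the number of lexical items in text d assigned to skewed popularity k; $N_{d,k,s}$ denotes the number of lexical items in text d assigned to skewed popularity k and emotion s; $N_{d,k,s,t}$ denotes the number of lexical items in text d assigned to skewed popularity k, emotion s and theme t; $N_{k,s,t}$ denotes the number of lexical items in the data set assigned to skewed popularity k, emotion s and theme t; and $N_{k,s,t,w}$ denotes the number of occurrences of lexical item w in the data set assigned to skewed popularity k, emotion s and theme t. In addition, ε and γ are the prior counts of the skewed popularity and emotion labels and are empirical values; α is the prior count of the themes and is learned by maximum likelihood estimation; and the superscript -i indicates that the current lexical item is excluded.

The joint skewed popularity and emotion distribution variable of each microblog post d is calculated as follows:

$$\pi_{d,k,s}=\frac{N_{d,k,s}+\gamma}{N_{d,k}+S\cdot\gamma}$$

The final emotion probability distribution of microblog post d is calculated as follows:

$$\Pi_{d,s}=W_{sub}\cdot\frac{N_{d,k_{sub},s}+\gamma}{N_{d,k_{sub}}+S\cdot\gamma}+W_{obj}\cdot\frac{N_{d,k_{obj},s}+\gamma}{N_{d,k_{obj}}+S\cdot\gamma},\qquad W_{sub}>W_{obj}>0$$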

Embodiment two: in this application of the Chinese microblog sentiment analysis method based on the subjective and objective skewed popularity of lexical items, 3000 microblog posts are crawled from the Sina Weibo website as the target data set to be analyzed. As shown in Fig. 1, the steps of the method of this embodiment are as follows:

S1. Crawl 3000 microblog posts from the Sina Weibo website as the target data set to be analyzed, for example the post "Today I bought a new mobile phone, so happy! [heartily]";

S2. Preprocess each microblog post by word segmentation, part-of-speech tagging and stop-word filtering, and merge each emotion word with the negation word that immediately precedes it. For example, using NLPIR of the Chinese Academy of Sciences as the word segmentation and part-of-speech tagging tool, the post "Today I bought a new mobile phone, so happy! [heartily]" becomes the lexical item sequence {"today", "buying", "new", "mobile phone", "good", "happy", "[heartily]"}, with the corresponding part-of-speech sequence {time word, verb, adjective, noun, adjective, prefix, character string};
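The tagged sequence produced here is what the prior-knowledge step consumes: the method treats emoticons as subjective prior knowledge and time words, place words and pronouns as objective prior knowledge. A minimal lookup over the example sequence might look like this; the function name `skew_prior_label`, the string labels and the explicit emoticon set are assumptions, and no NLPIR call is shown because its API is not described in this text.

```python
OBJECTIVE_POS = {"time word", "place word", "pronoun"}


def skew_prior_label(token, pos, emoticons):
    """Return 'subjective', 'objective' or None (no prior knowledge)."""
    if token in emoticons:
        return "subjective"
    if pos in OBJECTIVE_POS:
        return "objective"
    return None


tokens = ["today", "buying", "new", "mobile phone", "good", "happy", "[heartily]"]
pos_tags = ["time word", "verb", "adjective", "noun", "adjective", "prefix",
            "character string"]
EMOTICONS = {"[heartily]"}  # hypothetical emoticon lexicon

priors = {}
for token, pos in zip(tokens, pos_tags):
    priors[token] = skew_prior_label(token, pos, EMOTICONS)
```

In this example only "today" (a time word) and "[heartily]" (an emoticon) carry skewed popularity prior knowledge; the remaining items are left to the sampler.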

S3. Introduce emotion prior knowledge and skewed popularity prior knowledge into the preprocessed microblog data. For example, consider the emoticon "[heartily]" in the processed lexical item sequence {"today", "buying", "new", "mobile phone", "good", "happy", "[heartily]"}. Its index in the corpus lexicon is 5 and its polarity is positive, so its $\eta_{\cdot,5}$ is $[0,1]^T$, whose elements are the objective and subjective skewed popularity priors respectively, and its $\lambda_{\cdot,5}$ is $[1,0]^T$, whose elements are the positive and negative polarity priors respectively. With all elements of β set to 0.1, the final prior $F_{\cdot,\cdot,z,5}$ for each theme z ∈ {1, ..., T} follows from $F_{c,l,z,w}=\eta_{c,w}\cdot\beta_{c,l,z,w}\cdot\lambda_{l,w}$: it equals 0.1 for the combination of subjective skewed popularity and positive polarity, and 0 for every other combination;

S4. For the lexical item $w_i$ at each position i in the data set, sample the skewed popularity label $c_i$, emotion label $l_i$ and theme label $z_i$. For example, for the emoticon "[heartily]" in the processed microblog post d "Today I bought a new mobile phone, so happy! [heartily]", the sampling formula for its skewed popularity, emotion and theme labels is as follows:

$$
\begin{aligned}
&P(c_i=k,\ l_i=s,\ z_i=t \mid w, c^{-i}, l^{-i}, z^{-i}, \varepsilon, \gamma, \alpha, \beta, \eta, \lambda) \propto \\
&\quad \frac{N_{d,k}^{-i}+\varepsilon}{N_{d}^{-i}+K\cdot\varepsilon}\cdot\frac{N_{d,k,s}^{-i}+\gamma}{N_{d,k}^{-i}+S\cdot\gamma}\cdot\frac{N_{d,k,s,t}^{-i}+\alpha_{k,s,t}}{N_{d,k,s}^{-i}+\sum_{t'=1}^{T}\alpha_{k,s,t'}}\cdot\frac{N_{k,s,t,w_i}^{-i}+F_{k,s,t,w_i}}{N_{k,s,t}^{-i}+\sum_{v=1}^{V}F_{k,s,t,v}}
\end{aligned}
$$

where $N_d$ denotes the number of lexical items in the text d that contains the current lexical item $w_i$; $N_{d,k}$ denotes the number of lexical items in text d assigned to skewed popularity k; $N_{d,k,s}$ denotes the number of lexical items in text d assigned to skewed popularity k and emotion s; $N_{d,k,s,t}$ denotes the number of lexical items in text d assigned to skewed popularity k, emotion s and theme t; $N_{k,s,t}$ denotes the number of lexical items in the data set assigned to skewed popularity k, emotion s and theme t; $N_{k,s,t,w}$ denotes the number of occurrences of lexical item w in the data set assigned to skewed popularity k, emotion s and theme t; and the superscript -i indicates that the current lexical item is excluded. α is the prior count of the themes and is learned by maximum likelihood estimation. In addition, ε and γ are the prior counts of the skewed popularity and emotion labels and are empirical values, set as $\gamma=0.1\cdot AL/S$ and $\varepsilon=0.1\cdot AL/K$, where AL denotes the average text length of the microblog data set;
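The empirical settings for γ and ε can be computed directly from the preprocessed data set. The sketch below assumes that the average text length AL is measured in lexical items per post after preprocessing, which the source does not state explicitly; the function name `empirical_priors` and the sample data are hypothetical.

```python
def empirical_priors(token_lists, K, S):
    """Return (gamma, eps) with gamma = 0.1 * AL / S and eps = 0.1 * AL / K.

    AL is taken here as the mean number of lexical items per post.
    """
    if not token_lists:
        raise ValueError("the data set is empty")
    average_length = sum(len(tokens) for tokens in token_lists) / len(token_lists)
    gamma = 0.1 * average_length / S
    eps = 0.1 * average_length / K
    return gamma, eps


# Two hypothetical posts with 7 and 13 lexical items, so AL = 10.
posts = [["w"] * 7, ["w"] * 13]
gamma, eps = empirical_priors(posts, K=2, S=2)
```

With two skewed popularities and two emotion polarities both priors take the same value, 0.05 times the average length.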

S5. After a certain number of sampling iterations, calculate the joint skewed popularity and emotion distribution variable of each microblog post. For example, after 1000 sampling iterations, the joint skewed popularity and emotion distribution variable of microblog post d "Today I bought a new mobile phone, so happy! [heartily]" is calculated as follows:

$$\pi_{d,k,s}=\frac{N_{d,k,s}+\gamma}{N_{d,k}+S\cdot\gamma}$$
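The sampling iterations referred to here follow the usual collapsed Gibbs pattern: remove the current lexical item from the counts, weight every label triple with the sampling formula, draw a new triple and restore the counts. The toy sampler below illustrates that loop under several assumptions: labels are initialised uniformly at random, α is fixed rather than learned by maximum likelihood estimation, and the data, sizes and iteration count are made up. It is a sketch, not the implementation used in the patent.

```python
import random


def filled(value, *shape):
    """Nested list of the given shape with every entry equal to value."""
    if len(shape) == 1:
        return [value] * shape[0]
    return [filled(value, *shape[1:]) for _ in range(shape[0])]


def gibbs_sample(docs, K, S, T, V, F, alpha, eps, gamma, iterations, seed=0):
    """Toy sampler; returns the per-post counts N_{d,k} and N_{d,k,s}."""
    rng = random.Random(seed)
    labels = [(k, s, t) for k in range(K) for s in range(S) for t in range(T)]
    D = len(docs)
    n_d, n_dk, n_dks = filled(0, D), filled(0, D, K), filled(0, D, K, S)
    n_dkst = filled(0, D, K, S, T)
    n_kst, n_kstw = filled(0, K, S, T), filled(0, K, S, T, V)

    def update(d, w, label, delta):
        k, s, t = label
        n_d[d] += delta
        n_dk[d][k] += delta
        n_dks[d][k][s] += delta
        n_dkst[d][k][s][t] += delta
        n_kst[k][s][t] += delta
        n_kstw[k][s][t][w] += delta

    def weight(d, w, label):
        k, s, t = label
        return ((n_dk[d][k] + eps) / (n_d[d] + K * eps)
                * (n_dks[d][k][s] + gamma) / (n_dk[d][k] + S * gamma)
                * (n_dkst[d][k][s][t] + alpha[k][s][t])
                / (n_dks[d][k][s] + sum(alpha[k][s]))
                * (n_kstw[k][s][t][w] + F[k][s][t][w])
                / (n_kst[k][s][t] + sum(F[k][s][t])))

    # Random initial assignment of a label triple to every lexical item.
    assign = []
    for d, doc in enumerate(docs):
        assign.append([rng.choice(labels) for _ in doc])
        for w, label in zip(doc, assign[d]):
            update(d, w, label, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assign[d][i], -1)  # exclude item i ("-i" counts)
                weights = [weight(d, w, label) for label in labels]
                assign[d][i] = rng.choices(labels, weights=weights)[0]
                update(d, w, assign[d][i], +1)
    return n_dk, n_dks


K, S, T, V = 2, 2, 2, 4
docs = [[0, 1, 2, 3], [3, 3, 1]]          # hypothetical lexical item indices
F = filled(0.1, K, S, T, V)               # constant prior, no lexicon knowledge
alpha = filled(0.5, K, S, T)              # fixed here, not learned
n_dk, n_dks = gibbs_sample(docs, K, S, T, V, F, alpha,
                           eps=0.35, gamma=0.35, iterations=20)
```

The returned counts are exactly the quantities that enter the formula for the joint distribution variable above.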

S6. Calculate the final emotion probability distribution of each microblog post and select the emotion polarity with the highest probability as the emotion polarity of the post. For example, the final emotion probability distribution of microblog post d "Today I bought a new mobile phone, so happy! [heartily]" is calculated as follows:

$$\Pi_{d,s}=W_{sub}\cdot\frac{N_{d,k_{sub},s}+\gamma}{N_{d,k_{sub}}+S\cdot\gamma}+W_{obj}\cdot\frac{N_{d,k_{obj},s}+\gamma}{N_{d,k_{obj}}+S\cdot\gamma},\qquad W_{sub}>W_{obj}>0$$

where $k_{sub}$ denotes the subjective skewed popularity label and $k_{obj}$ denotes the objective skewed popularity label. After normalization, the probabilities that this microblog post belongs to the positive and negative emotion polarities are 0.893434 and 0.106566 respectively, so the emotion polarity of this post is judged to be positive.
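The final decision is a normalisation followed by an arg-max over the polarities. The sketch below uses the two probabilities quoted above as input; the function name `classify` and the polarity names are assumptions.

```python
def classify(scores, polarity_names):
    """Normalise the scores and return (most probable polarity, probabilities)."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    probabilities = [score / total for score in scores]
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return polarity_names[best], probabilities


# Probabilities reported for the example post in this embodiment.
label, probabilities = classify([0.893434, 0.106566], ["positive", "negative"])
```

Normalising first keeps the function usable when the weights $W_{sub}$ and $W_{obj}$ do not add up to 1.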

The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the present invention belongs may make various modifications or additions to the described embodiments, or replace them in a similar manner, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims

1. A Chinese microblog sentiment analysis method based on the subjective and objective skewed popularity of lexical items, characterized in that it comprises the following steps:

Step 1: obtain the target microblog data set to be analyzed;

Step 2: preprocess each microblog post by word segmentation, part-of-speech tagging and stop-word filtering, and merge each emotion word with the negation word that immediately precedes it;

Step 3: introduce emotion prior knowledge and skewed popularity prior knowledge into the preprocessed microblog data. The emotion prior knowledge comprises emotion words and emoticons. The skewed popularity of a lexical item is either subjective or objective: the former means the item tends to express subjective emotion, and the latter means it tends to describe objective things. The method uses emoticons as subjective skewed popularity prior knowledge, and time words, place words and pronouns as objective skewed popularity prior knowledge. The process of introducing the emotion and skewed popularity prior knowledge is specifically:

Step 3a: construct an empty S × V emotion transfer matrix λ, a K × V skewed popularity transfer matrix η, a K × S × T × V matrix β and the final prior matrix F(β, η, λ), where S, T, K and V denote the number of emotions, the number of themes, the number of skewed popularities and the number of distinct lexical items in the data set, respectively;

Step 3b: initialize the elements of $\eta_{K\times V}$ and $\lambda_{S\times V}$ to 1;

Step 3c: for each lexical item w ∈ {1, ..., V}, each skewed popularity label c ∈ {1, ..., K} and each emotion label l ∈ {1, ..., S}, if w is skewed popularity prior knowledge, the element $\eta_{cw}$ of $\eta_{K\times V}$ is updated as follows:

If w is emotion prior knowledge, the element $\lambda_{lw}$ of $\lambda_{S\times V}$ is updated as follows:

where K(w) is the skewed popularity label corresponding to w, and S(w) is the emotion label corresponding to w;

Step 3d: for each lexical item w ∈ {1, ..., V}, each emotion label l ∈ {1, ..., S}, each skewed popularity label c ∈ {1, ..., K} and each theme z ∈ {1, ..., T}, the final prior $F_{c,l,z,w}(\beta,\eta,\lambda)$ is:

$$F_{c,l,z,w}(\beta,\eta,\lambda)=\eta_{c,w}\cdot\beta_{c,l,z,w}\cdot\lambda_{l,w}$$

Step 4: based on the preprocessed microblog data and the prior knowledge, sample the skewed popularity, emotion and theme labels of each lexical item using the Gibbs sampling algorithm. For the lexical item $w_i$ at each position i in the data set, the skewed popularity label $c_i$, emotion label $l_i$ and theme label $z_i$ are sampled as follows:

$$
\begin{aligned}
&P(c_i=k,\ l_i=s,\ z_i=t \mid w, c^{-i}, l^{-i}, z^{-i}, \varepsilon, \gamma, \alpha, \beta, \eta, \lambda) \propto \\
&\quad \frac{N_{d,k}^{-i}+\varepsilon}{N_{d}^{-i}+K\cdot\varepsilon}\cdot\frac{N_{d,k,s}^{-i}+\gamma}{N_{d,k}^{-i}+S\cdot\gamma}\cdot\frac{N_{d,k,s,t}^{-i}+\alpha_{k,s,t}}{N_{d,k,s}^{-i}+\sum_{t'=1}^{T}\alpha_{k,s,t'}}\cdot\frac{N_{k,s,t,w_i}^{-i}+F_{k,s,t,w_i}}{N_{k,s,t}^{-i}+\sum_{v=1}^{V}F_{k,s,t,v}}
\end{aligned}
$$

where $N_d$ denotes the number of lexical items in the text d that contains $w_i$; $N_{d,k}$ denotes the number of lexical items in text d assigned to skewed popularity k; $N_{d,k,s}$ denotes the number of lexical items in text d assigned to skewed popularity k and emotion s; $N_{d,k,s,t}$ denotes the number of lexical items in text d assigned to skewed popularity k, emotion s and theme t; $N_{k,s,t}$ denotes the number of lexical items in the data set assigned to skewed popularity k, emotion s and theme t; and $N_{k,s,t,w}$ denotes the number of occurrences of lexical item w in the data set assigned to skewed popularity k, emotion s and theme t. In addition, ε and γ are the prior counts of the skewed popularity and emotion labels and are empirical values; α is the prior count of the themes and is learned by maximum likelihood estimation; and the superscript -i indicates that the current lexical item is excluded;

Step 5: after a certain number of sampling iterations, calculate the joint skewed popularity and emotion distribution variable of each microblog post. For microblog post d it is calculated as follows:

$$\pi_{d,k,s}=\frac{N_{d,k,s}+\gamma}{N_{d,k}+S\cdot\gamma}$$

Step 6: calculate the final emotion probability distribution of each microblog post and select the emotion polarity with the highest probability as the emotion polarity of the post. For microblog post d the final emotion probability distribution is calculated as follows:

$$\Pi_{d,s}=W_{sub}\cdot\frac{N_{d,k_{sub},s}+\gamma}{N_{d,k_{sub}}+S\cdot\gamma}+W_{obj}\cdot\frac{N_{d,k_{obj},s}+\gamma}{N_{d,k_{obj}}+S\cdot\gamma},\qquad W_{sub}>W_{obj}>0.$$