CN105117428A

CN105117428A - Web comment sentiment analysis method based on word alignment model

Info

Publication number: CN105117428A
Application number: CN201510471154.3A
Authority: CN
Inventors: 程红蓉; 唐明霜; 蔡腾远; 郭彦伟; 张锋
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2015-08-04
Filing date: 2015-08-04
Publication date: 2015-12-02
Anticipated expiration: 2035-08-04
Also published as: CN105117428B

Abstract

The present invention belongs to the field of sentiment analysis in natural language processing and discloses a Web comment sentiment analysis method based on a word alignment model. The method comprises four stages. First, comment information is obtained from webpages and its content is preprocessed. Second, candidate opinion words and candidate evaluation object words are obtained from the comments using an improved machine translation model. Third, the opinion words and evaluation objects are extracted from the candidate word list using both the opinion relations between opinion words and evaluation object words and a characteristic index of each word. Finally, an effective multi-class regression model determines the sentiment orientation of each opinion word toward its evaluation object. Experiments with the disclosed method on comment data sets from several categories gave good results.

Description

Web comment sentiment analysis method based on a word alignment model

Technical field

The present invention relates to the fields of the Internet, natural language processing and machine learning. It concerns specifically a method for implementing Web comment sentiment analysis based on a word alignment model.

Background art

With the arrival of Web 2.0 and the rapid development of the mobile Internet, information on the Internet has grown explosively. The International Telecommunication Union (ITU) report "Information & Communication Technology in 2014" states that by the end of 2014 the number of Internet users worldwide would reach about 3 billion. Most network users have also changed from consumers of online information into its producers, so both the amount of Web content and the traffic it receives are increasing rapidly. Sentiment analysis analyzes and mines this information, such as news, blog posts, product reviews, e-mails and forum posts.

As e-commerce becomes more standardized and developed, more and more users shop online. When shopping online, users have no hands-on knowledge of a product, so they tend to read its reviews before deciding whether to buy. Manufacturers and e-commerce companies want to understand a brand's reputation in order to judge future sales trends. They are no longer limited to questionnaires or follow-up phone calls, because they can obtain the market feedback they need directly from online product reviews. In addition, recommending products to users based on sentiment analysis of product reviews is a very widely used application. Mining deep sentiment information effectively from massive volumes of product reviews has therefore become an urgent need across industries, and sentiment analysis of online reviews (sentiment analysis / opinion mining) has naturally become a current research hotspot.

The main tasks of sentiment analysis are the extraction of sentiment-bearing information and sentiment classification. Extraction identifies the elements related to sentiment orientation at the word, sentence or document level. In recent years, finer-grained work has also been done on extracting evaluation objects (opinion targets, also called product features).

Hu and Liu (Hu et al., 2004) argue that evaluation objects are usually nouns or noun phrases that reviewers mention frequently. They therefore use association rule mining to extract nouns and noun phrases with a minimum support of 1% as frequent evaluation objects. They then take the adjectives in sentences containing an evaluation object as opinion words. Finally, they combine the extracted frequent evaluation objects and opinion words to extract infrequent evaluation objects.

Popescu and Etzioni (Popescu et al., 2005) improved Hu and Liu's method. They first define a set of meronymy discriminators (part-whole indicator words) for each product category. They then compute the pointwise mutual information (PMI) between these discriminators and a noun to estimate how likely the noun is to be an evaluation object.

Qiu et al. (Qiu et al., 2011) extract evaluation objects with a double propagation algorithm based on syntactic relations between words. Liu et al. (Liu et al., 2012) were the first to apply word-based statistical machine translation to sentiment analysis, jointly extracting evaluation objects and opinion words.

Sentiment classification is a text classification problem, whether it operates on words, phrases, sentences or documents. It is mainly approached with supervised learning. The most commonly used classifiers are naive Bayes, support vector machines, maximum entropy models, k-nearest neighbors and conditional random fields. Semi-supervised and unsupervised methods are also widely used, and they are simpler to implement than supervised methods. However, semantic similarity between opinion words is hard to compute, and their classification accuracy is lower than that of supervised learning. Most current sentiment classification therefore still uses supervised methods.

Summary of the invention

Based on the above background, the present invention proposes a sentiment analysis method for product reviews. The method gives users a reference for purchase decisions. It also gives manufacturers and e-commerce companies effective feedback, so that they can improve their products or recommend products to users.

The core of the proposed method is the extraction of opinion words and evaluation object words. Traditional methods rely on only one source of evidence: either the opinion relations between opinion words and evaluation objects, or the characteristics of the words themselves. In contrast, the present invention combines the opinion relations with the word characteristics to extract opinion words and evaluation objects jointly, which yields higher extraction precision. Finally, a supervised machine learning method classifies the sentiment of each opinion word. This determines the sentiment orientation expressed toward the corresponding evaluation object. The specific steps are as follows:

1. Data preprocessing

The data used by the present invention are product reviews collected from the Web by a crawler. The text is often written in a nonstandard format. To reduce the impact on text analysis and sentiment classification, the data are first preprocessed, for example by removing empty records, whitespace, repeated punctuation and web page tags. A word segmentation tool then segments and POS-tags the cleaned text. Finally, the segmented text is split into sentences at punctuation marks (comma, full stop, exclamation mark).
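The cleaning and sentence-splitting part of this step can be sketched in Python as below. This is a minimal illustration, not the patent's implementation. The regular expressions and the helper name `preprocess` are assumptions. All whitespace is removed because the reviews are Chinese. Word segmentation and POS tagging (done with NLPIR in the embodiment) are not shown.

```python
import re

# Assumed patterns; the patent does not specify exact cleaning rules.
TAG_RE = re.compile(r"<[^>]+>")                         # web page (HTML) tags
REPEAT_PUNCT_RE = re.compile(r"([，。！？,.!?、~])\1+")  # runs of the same punctuation mark
SPLIT_RE = re.compile(r"[，,。.！!]")                    # comma, full stop, exclamation mark


def preprocess(review: str) -> list[str]:
    """Clean one raw review and split it into sentence fragments (hypothetical helper)."""
    text = TAG_RE.sub("", review)            # remove web page tags
    text = re.sub(r"\s+", "", text)          # remove spaces (appropriate for Chinese text only)
    text = REPEAT_PUNCT_RE.sub(r"\1", text)  # collapse repeated punctuation to one mark
    # Split at commas, full stops and exclamation marks; drop empty pieces.
    return [s for s in SPLIT_RE.split(text) if s]


print(preprocess("<p>房间很温馨，，感觉像回家一样！！</p>"))
```

The output of this step would then be passed to the segmenter and POS tagger sentence by sentence.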

2. Extracting candidate opinion words and evaluation objects

Candidate extraction rests on one assumption: all nouns/noun phrases are candidate evaluation objects, and all adjectives/verbs are candidate opinion words. This assumption has been widely used in earlier sentiment analysis work and has proved effective. Under it, extracting opinion words and evaluation objects becomes the task of extracting (adjective/verb, noun/noun phrase) word pairs from the text.

The present invention performs this task by turning the word-based translation model (WTM, Word-Based Translation Model) into a monolingual word alignment model. The modification works as follows. A noun/noun phrase (adjective/verb) may align only to an adjective/verb (noun/noun phrase) or to NULL. Words of all other parts of speech align to themselves.

The text processed in Step 1 is copied to produce a parallel corpus, and the two identical corpora are the input of the model. Consider a sentence in the corpus with n words, S = {ω_1, ω_2, ..., ω_n}. Its word pairs A = {(j, a_j) | j ∈ [1, n]} are obtained from the following probability:

P(A \mid S) \propto \prod_{k=1}^{n} n(\phi_k \mid \omega_k) \prod_{j=1}^{n} t(\omega_j \mid \omega_{a_j}) \, d(j \mid a_j, n) \qquad (1)

Here t(ω_j | ω_{a_j}) measures how likely the noun/noun phrase (adjective/verb) at position j and the adjective/verb (noun/noun phrase) at position a_j are to occur together in a sentence. If an adjective/verb and a noun/noun phrase frequently co-occur in the corpus, t(ω_j | ω_{a_j}) is large. The term d(j | a_j, n) models word position: it is the probability that the word at position a_j aligns to the word at position j. The term n(φ_k | ω_k) models the fertility of the alignment, that is, one-to-many alignments, where φ_k is the number of words aligned to ω_k. The word pairs are obtained by finding the alignment A that maximizes this probability.
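The sketch below makes the constrained alignment concrete. It trains only the lexical table t(ω_j | ω_{a_j}) with IBM Model 1-style EM, then picks the most probable allowed partner for each noun or adjective/verb. It is a simplification under stated assumptions. The fertility term n(φ_k | ω_k) and the distortion term d(j | a_j, n) of formula (1) are omitted. The POS tag sets `NOUN` and `OPINION` are hypothetical, and `train` and `align` are illustrative helper names.

```python
from collections import defaultdict

NOUN = {"n"}          # hypothetical tag set for nouns/noun phrases
OPINION = {"a", "v"}  # hypothetical tag set for adjectives/verbs
NULL = "<NULL>"


def candidates(sent, j):
    """Positions word j may align to under the POS constraint (None means NULL)."""
    pos = sent[j][1]
    if pos in NOUN:
        allowed = OPINION
    elif pos in OPINION:
        allowed = NOUN
    else:
        return [j]  # words of other parts of speech align to themselves
    return [i for i, (_, p) in enumerate(sent) if i != j and p in allowed] + [None]


def source(sent, i):
    return NULL if i is None else sent[i][0]


def train(corpus, iters=10):
    """EM estimate of t(w_j | w_aj). The parallel corpus is an exact copy,
    so each sentence serves as both the source and the target side."""
    t = defaultdict(lambda: 1.0)  # uniform initialisation
    for _ in range(iters):
        count, total = defaultdict(float), defaultdict(float)
        for sent in corpus:
            for j, (wj, _) in enumerate(sent):
                srcs = [source(sent, i) for i in candidates(sent, j)]
                z = sum(t[(wj, s)] for s in srcs)
                for s in srcs:  # E-step: expected alignment counts
                    c = t[(wj, s)] / z
                    count[(wj, s)] += c
                    total[s] += c
        # M-step: renormalise the expected counts per source word
        t = defaultdict(float, {k: v / total[k[1]] for k, v in count.items()})
    return t


def align(sent, t):
    """Best allowed partner for each noun/opinion word (position or None for NULL)."""
    links = []
    for j, (wj, _) in enumerate(sent):
        cands = candidates(sent, j)
        if cands == [j]:
            continue  # self-aligned word of another part of speech
        best = max(cands, key=lambda i: t[(wj, source(sent, i))])
        links.append((j, best))
    return links


corpus = [
    [("room", "n"), ("very", "d"), ("warm", "a")],
    [("service", "n"), ("good", "a")],
    [("room", "n"), ("clean", "a")],
]
t = train(corpus)
print(align(corpus[0], t))  # [(0, 2), (2, 0)]: room <-> warm
```

Because each position's partner is chosen independently in this Model 1-style simplification, the per-position argmax is also the most probable whole alignment under the simplified model.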

3. Obtaining accurate opinion words and evaluation objects

1) Computing the relation value between opinion words and evaluation objects

Once all word pairs have been obtained with the above method, the alignment probability between a noun/noun phrase and an adjective/verb is computed as follows:

P(\omega_o \mid \omega_t) = \frac{\mathrm{count}(\omega_o, \omega_t)}{\mathrm{count}(\omega_t)} \qquad (2)

Here ω_o is an adjective/verb and ω_t is a noun/noun phrase. P(ω_o | ω_t) is the alignment probability from the noun/noun phrase to the adjective/verb, and P(ω_t | ω_o), the alignment probability in the other direction, is obtained in the same way. These alignment probabilities formulate the potential opinion relation between an opinion word and an evaluation object. This relation is denoted Association and is computed as:

\mathrm{Association}(\omega_o, \omega_t) = \left( \lambda \, P(\omega_t \mid \omega_o) + (1 - \lambda) \, P(\omega_o \mid \omega_t) \right)^{-1} \qquad (3)
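Formulas (2) and (3) can be computed directly from the alignment links. The Python sketch below assumes that count(·) counts alignment links produced in Step 2; the text does not say whether counts are taken over links or over raw occurrences. It uses λ = 0.5 as a placeholder value and implements formula (3) exactly as printed. `alignment_probs` and `association` are hypothetical names.

```python
from collections import Counter


def alignment_probs(links):
    """links: (opinion_word, target_word) pairs collected from all aligned sentences."""
    joint = Counter(links)
    n_o = Counter(o for o, _ in links)  # count(w_o), assumed to count links
    n_t = Counter(t for _, t in links)  # count(w_t), assumed to count links
    p_o_given_t = {(o, t): c / n_t[t] for (o, t), c in joint.items()}  # formula (2)
    p_t_given_o = {(o, t): c / n_o[o] for (o, t), c in joint.items()}  # reverse direction
    return p_o_given_t, p_t_given_o


def association(o, t, p_o_given_t, p_t_given_o, lam=0.5):
    """Formula (3) as printed: reciprocal of the lambda-weighted mean of both directions."""
    return 1.0 / (lam * p_t_given_o[(o, t)] + (1 - lam) * p_o_given_t[(o, t)])


links = [("warm", "room"), ("clean", "room"), ("warm", "room"), ("good", "service")]
p_ot, p_to = alignment_probs(links)
print(round(p_ot[("warm", "room")], 3), p_to[("warm", "room")])  # 0.667 1.0
print(round(association("warm", "room", p_ot, p_to), 6))        # 1.2
```

As printed, formula (3) gives a smaller Association to pairs with higher alignment probabilities. The related word-alignment formulation of Liu et al. instead combines the two directions as a weighted harmonic mean, (λ/P(ω_t | ω_o) + (1 − λ)/P(ω_o | ω_t))^{-1}, which may be what the original formula intends. The sketch does not assume this.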

As described above, the present invention extracts opinion words and evaluation objects by combining the potential opinion relation with a characteristic index of each word itself, denoted Indicator. Opinion words and evaluation objects have different Indicators, so the present invention computes them with different methods.

2) Computing the Indicator of evaluation objects

The present invention treats nouns/noun phrases as candidate evaluation objects. Such words are domain-specific. In addition, candidates that occur frequently and are evenly distributed in the corpus are more likely to be evaluation objects. Based on these two observations, the present invention computes the Indicator of evaluation objects from information entropy and from document frequency measured against m corpora unrelated to the domain of the review corpus.

Assume that the reviews d_j in the review corpus C = {d_1, d_2, ..., d_n} are mutually independent. The more evenly a candidate evaluation object word t_i is distributed across the corpus, the larger its information entropy, so the entropy reflects the word's distribution well. In the present invention the information entropy is computed as:

IE(t_i) = -\sum_{j=1}^{n} p(d_j, t_i) \log p(d_j, t_i) \qquad (4)

Here p(d_j, t_i) is the probability that candidate word t_i occurs in review d_j, computed as:

p(d_j, t_i) = \frac{tf_{ij}}{\sum_{j=1}^{n} tf_{ij}} \qquad (5)

Here tf_ij is the term frequency of candidate word t_i in the j-th review. If t_i occurs in only one review, then p(d_j, t_i) = 1, so log p(d_j, t_i) = 0 and IE(t_i) = 0. Subsequent calculations require IE(t_i) to be nonzero, so the present invention adds a very small constant ε = 0.0001 to the denominator of formula (5), giving:

p(d_j, t_i) = \frac{tf_{ij}}{\sum_{j=1}^{n} tf_{ij} + \epsilon} \qquad (6)
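Formulas (4) and (6) translate directly into code. In this sketch the logarithm base is assumed to be e, since the text does not specify it. `tf_row` is a hypothetical list holding tf_ij for one candidate t_i across the n reviews. Reviews where the word does not occur are skipped, because they contribute 0 · log 0 = 0 to the sum.

```python
import math


def information_entropy(tf_row, eps=1e-4):
    """IE(t_i) from formula (4), with p(d_j, t_i) smoothed as in formula (6)."""
    denom = sum(tf_row) + eps
    ie = 0.0
    for tf in tf_row:
        if tf > 0:  # reviews not containing t_i contribute nothing
            p = tf / denom
            ie -= p * math.log(p)
    return ie


print(information_entropy([2, 2, 2, 2]))  # evenly spread: close to log(4)
print(information_entropy([8, 0, 0, 0]))  # one review only: small but non-zero thanks to eps
```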

Entropy favors high-frequency words, but this causes two problems. High-frequency words may include common generic terms such as "person" and "thing". At the same time, low-frequency words that were left out may still include evaluation objects. To compensate, the present invention uses m corpora that are unrelated to the domain of the review corpus but of the same size, and combines them with document frequency. Ds(t_i)_j denotes the distribution information of candidate word t_i with respect to the j-th domain-unrelated corpus:

Ds(t_i)_j = \begin{cases} \alpha \times \log(1 + df_{in}) & \text{if } df_{out_j} = 0 \\ \dfrac{\log(1 + df_{in})}{\log(1 + df_{out_j})} & \text{otherwise} \end{cases} \qquad (7)

Here df_in is the document frequency of candidate word t_i in the review corpus, and df_{out_j} is its document frequency in the j-th domain-unrelated corpus. When df_{out_j} = 0, t_i is very likely a domain-specific evaluation object word. To increase the influence of document frequency on Ds(t_i)_j in this case, the parameter α is set greater than 1.

Finally, the Indicator of an evaluation object is computed as:

I(t_i) = \overline{Ds(t_i)} \times IE(t_i) \qquad (8)

where

\overline{Ds(t_i)} = \frac{1}{m} \sum_{j=1}^{m} Ds(t_i)_j
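Formulas (7) and (8) can then be combined with the entropy value. The sketch below is illustrative. α = 2.0 is only a placeholder that satisfies the stated requirement α > 1, and the natural logarithm is assumed. `ds` and `target_indicator` are hypothetical helper names, and `df_outs` holds df_{out_j} for each of the m domain-unrelated corpora.

```python
import math


def ds(df_in, df_out, alpha=2.0):
    """Formula (7): distribution information of t_i w.r.t. one unrelated corpus."""
    if df_out == 0:
        return alpha * math.log(1 + df_in)  # absent from the unrelated corpus: boosted
    return math.log(1 + df_in) / math.log(1 + df_out)


def target_indicator(ie, df_in, df_outs, alpha=2.0):
    """Formula (8): mean of Ds over the m unrelated corpora, multiplied by IE(t_i)."""
    ds_bar = sum(ds(df_in, d, alpha) for d in df_outs) / len(df_outs)
    return ds_bar * ie


# A domain word (rare elsewhere) vs. a generic word (common everywhere), same entropy.
domain_word = target_indicator(1.0, df_in=50, df_outs=[0, 0, 1, 0, 2])
generic_word = target_indicator(1.0, df_in=50, df_outs=[400, 380, 420, 390, 410])
print(domain_word, generic_word)  # the domain word scores much higher
```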

3) Computing the Indicator of opinion words

Candidate opinion words are adjectives/verbs. Most of them are not domain-specific, e.g. "good", "dislike" and "like". A small number are domain-specific, e.g. "delicious" in restaurant reviews and "thrilling" in movie reviews. The present invention computes the Indicator of a candidate opinion word from its document frequency and its distribution:

I(o_i) = \log(1 + df_i) \times D_i \qquad (9)

Here df_i is the document frequency of candidate word o_i in the review corpus. D_i describes the distribution of the candidate word and is defined in terms of two quantities: tf_ij, the term frequency of o_i in the j-th review, and the average term frequency of o_i over all reviews.

Steps 1) to 3) above give two kinds of values: the opinion relation value Association between opinion words and evaluation objects, and the Indicator of each word. Association and Indicator are combined into a single score for filtering candidate words, called the energy value (Energy) of a candidate word. Words in the candidate list whose energy exceeds a threshold are selected as the final opinion words (or evaluation object words). The present invention models this as a bipartite graph and computes the energy values of opinion words and evaluation objects with the Random Walk with Restart (RWR) algorithm:

\begin{aligned} E(t) &= \lambda \times R \times E(o) + (1 - \lambda) \times I_t \\ E(o) &= \lambda \times R \times E(t) + (1 - \lambda) \times I_o \end{aligned} \qquad (10)

Here E(t) and E(o) are the energy values of evaluation object words and opinion words respectively. R is the relation matrix, where R_ij is the Association weight between the i-th candidate evaluation object word and the j-th candidate opinion word. I_t is the vector of Indicators of the candidate evaluation objects, with each element computed by formula (8). I_o is the vector of Indicators of the candidate opinion words, with each element computed by formula (9). λ ∈ [0, 1] is a trade-off parameter.
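A minimal NumPy sketch of the iteration in formula (10) follows. R_ij links the i-th evaluation object to the j-th opinion word, so R has shape |t| × |o|. The update for E(o) therefore uses the transpose Rᵀ so that the dimensions match. Several details are assumptions of the sketch rather than details given in the text: scaling R by its largest singular value, the simultaneous update of both sides, the stopping tolerance, λ = 0.5 and the threshold in the usage line. With that scaling and λ < 1, the iteration is a contraction and converges.

```python
import numpy as np


def rwr_energy(R, I_t, I_o, lam=0.5, tol=1e-10, max_iter=1000):
    """Iterate formula (10) on the bipartite graph until the energies stop changing.

    R[i, j] is the Association weight between candidate target i and opinion word j.
    """
    R = np.asarray(R, dtype=float)
    norm = np.linalg.norm(R, 2)  # largest singular value
    if norm > 0:
        R = R / norm  # assumed scaling so that lam < 1 guarantees convergence
    I_t, I_o = np.asarray(I_t, dtype=float), np.asarray(I_o, dtype=float)
    E_t, E_o = I_t.copy(), I_o.copy()  # start from the Indicator values
    for _ in range(max_iter):
        new_t = lam * R @ E_o + (1 - lam) * I_t
        new_o = lam * R.T @ E_t + (1 - lam) * I_o  # transpose for the opinion-word side
        delta = max(np.abs(new_t - E_t).max(), np.abs(new_o - E_o).max())
        E_t, E_o = new_t, new_o
        if delta < tol:
            break
    return E_t, E_o


# Target 0 is linked to opinion word 0; target 1 has no opinion relation.
R = [[1.0, 0.0], [0.0, 0.0]]
E_t, E_o = rwr_energy(R, I_t=[1.0, 1.0], I_o=[1.0, 1.0])
print(E_t, E_o)  # both approximately [1.0, 0.5]
selected_targets = np.flatnonzero(E_t > 0.75)  # hypothetical threshold: keeps target 0
```

The isolated target keeps only the restart share (1 − λ) of its Indicator, which is why words without opinion relations fall below the threshold.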

4. Sentiment polarity classification of opinion words

According to the method for the embodiment of the present invention, the Sentiment orientation of emotion word corresponding to evaluation object finally to be obtained.The present invention uses a kind of effective polytypic regression model Softmax to carry out emotional semantic classification to emotion word, and the Sentiment orientation of emotion word is divided three classes (positive, neutral, negative), in regression model, use (3,2,1) to represent classification respectively.For given data, emotion word is converted into the proper vector of applicable Softmax by the first word vector model of the present invention.Input the data { (x of Softmax regression model ⁽¹⁾, y ⁽¹⁾) ..., (x ⁽ⁿ⁾, y ⁽ⁿ⁾), wherein y ⁽ⁿ⁾{ 1,2,3} represents classification to ∈, input feature vector x ⁽ⁱ⁾∈ R ⁿ⁺¹, the dimension of representation feature vector x is n+1.By 5 folding cross validation training pattern parameters.The highest that classification of the probability finally exported is as prediction classification.The result of emotional semantic classification is shown with an example at this, as hotel's comment: " room in hotel is very warm; feel just as going back home ", extract emotion word and evaluation object word to (room, warm), classified by Softmax, the emotion classification of " warmth " correspondence is 3, being evaluated as " just " then to " room ", be expressed as (room ,+).

Description of the drawings

Fig. 1 is the overall framework diagram of the Web comment sentiment analysis method based on a word alignment model according to the present invention;

Fig. 2 is the processing flow of the Web comment sentiment analysis method based on a word alignment model according to the present invention;

Fig. 3 is the model diagram for extracting opinion words and evaluation objects in the Web comment sentiment analysis method based on a word alignment model according to the present invention;

Fig. 4 is the flowchart for determining the sentiment class of opinion words in the Web comment sentiment analysis method based on a word alignment model according to the present invention.

Embodiment

Embodiments of the invention are described in detail below with reference to the accompanying drawings and specific examples. The embodiments described with reference to the drawings are exemplary. They serve only to explain the present invention and should not be construed as limiting it.

The present invention is a Web comment sentiment analysis method based on a word alignment model, used mainly for sentiment analysis of product reviews on the Internet. As shown in Fig. 1 and Fig. 2, it comprises the following steps:

S1. Obtain review information from the Web. The data in this specific embodiment were collected with a crawler from four websites: camera reviews from JD.com (Jingdong), book reviews from Dangdang.com, hotel reviews from Ctrip.com and restaurant reviews from Dianping. The size of each data set is shown in Table 1.

Table 1 Review data sets

Domain	Reviews	Review sentences
Camera	17052	63574
Book	9473	21630
Hotel	2331	7365
Restaurant	35519	346832

S2. Preprocess the data

Data collected from the Web are usually nonstandard, so web page tags and repeated punctuation marks are removed first. The text is then segmented with NLPIR, the word segmentation system of the Chinese Academy of Sciences, giving corpus C. Each review is then split into sentences at commas, full stops and exclamation marks, giving corpus C1.

S3. Obtain candidate opinion words and evaluation objects from the review information, as follows:

1) Copy the text corpus C1 obtained in step S2 to produce a parallel corpus C2.

2) Modify the word-based machine translation model as follows: a noun/noun phrase (adjective/verb) may align only to an adjective/verb (noun/noun phrase) or to NULL, and words of all other parts of speech align to themselves.

3) Input the data sets C1 and C2 from 1) into the model modified in 2). This yields (noun/noun phrase, adjective/verb/NULL) or (adjective/verb, noun/noun phrase/NULL) word pairs.

S4. Extract accurate word pairs from the candidate word pairs

1) Using all word pairs in the review corpus obtained in S3, compute the opinion relation Association of each word pair with formula (3).

2) Candidate evaluation object words are domain-specific. This embodiment therefore uses 5 corpora, D1, D2, D3, D4 and D5, that are unrelated to the domain of the review corpus and of the same size as C1. The Indicator of each candidate evaluation object is computed from information entropy and document frequency, as in formula (8).

3) Most opinion words are not domain-specific, e.g. "good", "like" and "ugly"; only a few are, e.g. "delicious" in restaurant reviews. Based on this, this embodiment computes the Indicator of candidate opinion words from each word's document frequency and distribution, as in formula (9).

4) Steps 1), 2) and 3) give the opinion relation value Association of the candidate words and their respective Indicator values. To model these two factors, this embodiment constructs a bipartite graph G(V, E, R), shown in Fig. 3. Its vertices v_t ∈ V are candidate evaluation object words, and its vertices v_o ∈ V are candidate opinion words. E is the set of edges between vertices: an edge joins v_t and v_o when they have an opinion relation. R is the set of edge weights, each being the Association computed in 1). The energy value (Energy) of each candidate word is then computed iteratively with Random Walk with Restart (RWR) using formula (10). Words in the candidate list whose energy exceeds a threshold are selected as the final opinion words and evaluation object words.

Evaluation criteria: this embodiment uses precision, recall and F1 as evaluation metrics. After manual verification, the precision, recall and F1 on the four data sets are as follows:

Table 2 Experimental results

Data set	Precision	Recall	F1
Book reviews	0.64	0.92	0.75
Camera reviews	0.60	0.76	0.67
Hotel reviews	0.70	0.87	0.77
Restaurant reviews	0.63	0.85	0.72
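F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R). The small check below recomputes it from the rounded precision and recall in two rows of Table 2; the function name `f1` is just for illustration. The table's inputs are themselves rounded, so recomputed values can differ from a reported F1 in the last digit.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f1(0.64, 0.92), 2))  # Book reviews: 0.75
print(round(f1(0.60, 0.76), 2))  # Camera reviews: 0.67
```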

S5. Sentiment orientation

This embodiment uses Softmax, an effective multi-class regression model, to determine the sentiment polarity of opinion words. The polarity has three classes, positive, neutral and negative, represented by the numbers 3, 2 and 1 respectively. Before the Softmax model is trained, the sentiment classes of the opinion words are labeled manually. In this embodiment, three annotators first label the opinion words independently. Where their labels differ, the three discuss and agree on the final label. The opinion words are then converted into n-dimensional feature vectors with a word vector model, which in this embodiment is word2vec. Finally, a predicted class is obtained for each opinion word. The experimental flow is shown in Fig. 4.

The embodiments of the present invention have been described above by way of illustration, so that researchers in the field can understand the invention. The invention is not, however, limited to the scope of these specific embodiments. A person of ordinary skill in the art may make modifications, changes, substitutions and variations to these embodiments without departing from the principle and spirit of the present invention. All such changes shall fall within the scope of protection of the claims of the present invention.

Claims

1. A Web comment sentiment analysis method based on a word alignment model, comprising the following steps:

Step 1: process comment data collected from the Internet by removing repeated punctuation marks and web page tags, then perform word segmentation and part-of-speech tagging. Split the tagged data into short sentences at commas, full stops and exclamation marks.

Step 2: modify the word-based machine translation model and apply this bilingual translation model as a monolingual word alignment model to extract candidate opinion word and evaluation object word pairs.

Step 3: combine the opinion relations between opinion words and evaluation object words with the characteristics of the words themselves to extract accurate opinion words and evaluation objects from the candidate word pairs.

Step 4: determine the sentiment polarity of the opinion words with an effective multi-class regression model.

2. The method according to claim 1, characterized in that, in step 2, the corpus split into sentences in step 1 is copied to generate a parallel corpus, and the two identical corpora are used as the input corpus of the monolingual alignment model.

3. The method according to claim 1 or 2, characterized in that step 3 further comprises the following steps:

3.1 Compute the opinion relation values between opinion words and evaluation objects from all obtained word pairs.

3.2 Compute the word characteristic value Indicator of candidate evaluation object words

Candidate evaluation object words are domain-specific. Therefore m corpora are used that are unrelated to the domain of the review corpus but of the same size as the original corpus. The Indicator of candidate evaluation objects is computed from information entropy and document frequency.

A candidate in the evaluation object list that has a high term frequency and is evenly distributed in the corpus is more likely to be an evaluation object. Each review in the review corpus C is treated as an independent category, and corpus C contains n reviews, C = {d_1, d_2, ..., d_n}. The more evenly an evaluation object word t_i is distributed across the corpus, the larger its information entropy. Formally:

IE(t_i) = -\sum_{j=1}^{n} p(d_j, t_i) \log p(d_j, t_i) \qquad (1)

Here p(d_j, t_i) is the probability that candidate word t_i occurs in review d_j, computed as:

p(d_j, t_i) = \frac{tf_{ij}}{\sum_{j=1}^{n} tf_{ij}} \qquad (2)

Here tf_ij is the term frequency of candidate word t_i in the j-th review. If t_i occurs in only one review, then p(d_j, t_i) = 1, so log p(d_j, t_i) = 0 and IE(t_i) = 0. Subsequent calculations require IE(t_i) to be nonzero, so a very small constant ε = 0.0001 is added to the denominator of formula (2), giving:

p(d_j, t_i) = \frac{tf_{ij}}{\sum_{j=1}^{n} tf_{ij} + \epsilon} \qquad (3)

Information entropy favors high-frequency words. However, high-frequency words may include common generic terms, while low-frequency words that were left out may still include evaluation objects. Therefore m corpora unrelated to the domain of the review corpus are used and combined with document frequency, as follows:

Ds(t_i)_j = \begin{cases} \alpha \times \log(1 + df_{in}) & \text{if } df_{out_j} = 0 \\ \dfrac{\log(1 + df_{in})}{\log(1 + df_{out_j})} & \text{otherwise} \end{cases} \qquad (4)

Here df_in is the document frequency of candidate word t_i in the review corpus, and df_{out_j} is its document frequency in the j-th domain-unrelated corpus. When df_{out_j} = 0, t_i is very likely a domain-specific evaluation object word; α is therefore a parameter greater than 1.

From the above, the Indicator of an evaluation object is computed as:

I(t_i) = \overline{Ds(t_i)} \times IE(t_i) \qquad (5)

where

\overline{Ds(t_i)} = \frac{1}{m} \sum_{j=1}^{m} Ds(t_i)_j

3.3 Compute the word characteristic value Indicator of candidate opinion words

Most opinion words are not domain-specific, e.g. "good", "dislike" and "like"; a small number are, e.g. "delicious" in restaurant reviews. The Indicator of a candidate opinion word is computed from its document frequency and its distribution:

I(o_i) = \log(1 + df_i) \times D_i \qquad (6)

3.4 Obtain accurate opinion words and evaluation objects

To model the above two factors, a bipartite graph is constructed. The energy values of opinion words and evaluation objects are computed iteratively on it with a random walk algorithm. Words in the candidate list whose energy exceeds a threshold are selected as the final opinion words (or evaluation object words). The formulas are:

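\begin{aligned} E(t) &= \lambda \times R \times E(o) + (1 - \lambda) \times I_t \\ E(o) &= \lambda \times R \times E(t) + (1 - \lambda) \times I_o \end{aligned} \qquad (7)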

Here E(t) and E(o) are the energy values of evaluation object words and opinion words respectively. R is the relation matrix, where R_ij is the Association weight between the i-th candidate evaluation object word and the j-th candidate opinion word. I_t is the vector of Indicators of the candidate evaluation objects, with each element computed by formula (5). I_o is the vector of Indicators of the candidate opinion words, with each element computed by formula (6). λ ∈ [0, 1] is a trade-off parameter.