Background Art
With the steady development of Internet technology, a growing number of Internet users express their opinions on commercial products, emergencies, government work, and the like through blogs, microblogs, forums, online news comments, and similar channels. Companies in certain sectors, such as digital products, groceries, and hotels, need to learn promptly how customers evaluate their products and how satisfied they are, so that they can adjust their products to cope with today's fierce competition. Government departments likewise need to learn promptly how Internet users regard government work or high-profile news events, so that they can gauge public sentiment, make sound decisions, and prevent serious incidents. Because the volume of online information is large and growing quickly, sentiment analysis has to be automated by computer.
In current research at home and abroad, the goal of sentiment analysis is to classify a text as a positive or a negative evaluation. For example, given a product review, the system must decide whether the reviewer expresses a positive or a negative opinion (in some cases a neutral class is also introduced). Sentiment analysis is mainly used to determine quickly the prevailing public view of an object. The task resembles traditional topic-based text classification, so current research mostly applies the same techniques: supervised learning, semi-supervised learning, and unsupervised learning. The two tasks nevertheless differ. Topic-based text classification assigns documents to predefined topic categories such as politics, science, or sports, and topic-related words are what matter. In opinion classification, topic-related words are unimportant; what matters are opinion words that signal a positive or negative view, such as "good", "outstanding", "sad", and "poor". The domain in which an opinion word appears also matters, because the same word can express different opinion tendencies in different domains. Accordingly, sentiment analysis techniques can be divided by domain into single-domain and cross-domain techniques.
1. Single-domain sentiment analysis
Single-domain sentiment analysis trains a classification model on an annotated sentiment corpus from one domain and then uses that model for subsequent sentiment analysis tasks in the same domain. Supervised learning algorithms dominate this category, for example the k-nearest neighbor algorithm (kNN), the Naive Bayes algorithm, and the support vector machine (SVM).
Because corpus annotation is tedious, semi-supervised learning algorithms such as expectation-maximization (EM), and unsupervised approaches such as scoring-function methods, are also widely used in sentiment analysis research. Experiments nevertheless show that supervised learning outperforms semi-supervised and unsupervised algorithms in single-domain opinion classification.
2. Cross-domain sentiment analysis
The domain-transfer problem is common in sentiment analysis applications. Take the sentiment word "high": in "housing prices are high" the word is negative, whereas in "income is high" it is positive. Applying the supervised learning algorithms commonly used in the single-domain setting to such cases lowers classifier accuracy. Current research offers three main approaches to the domain-transfer problem. The first is unsupervised learning, namely the semantic orientation method used in English sentiment analysis. This method first part-of-speech tags every token, then filters for specified part-of-speech sequences and keeps the sequences that match, and then scores the sentiment polarity of each retained phrase using the pointwise mutual information (PMI) algorithm together with the AltaVista search engine. Finally, it combines the phrase scores into a sentiment polarity score for a sentence or passage. The second approach finds generalizable features shared by the training-set domain (the source domain) and the test-set domain (the target domain), and then completes the transfer with a semi-supervised learning algorithm. The third approach is corpus transfer: it computes which parts of the target-domain corpus resemble the source-domain corpus and then retrains the classifier to obtain a classification model; it is based on supervised learning. Although the latter two approaches avoid the drawbacks of the first, both require retraining the classifier, so the overall algorithm is still inefficient.
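For reference, the PMI-based scoring used by the semantic orientation method is commonly written as follows. This is the standard textbook formulation, not a formula taken from this document; the reference words $w_{+}$ and $w_{-}$ are assumptions (a strongly positive and a strongly negative word), and the probabilities are typically estimated from search-engine hit counts.

```latex
\mathrm{PMI}(x, y) = \log_2 \frac{p(x, y)}{p(x)\,p(y)}
\qquad
\mathrm{SO}(\text{phrase}) = \mathrm{PMI}(\text{phrase}, w_{+}) - \mathrm{PMI}(\text{phrase}, w_{-})
```

A phrase with a positive SO value co-occurs more with the positive reference word than with the negative one, and is scored as positive.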
Some solutions for automated sentiment analysis of text have already been proposed. For example, patent application CN201210154332.6 (title: A text sentiment classification method and system; filing date: 2012-05-17; applicant: University of Suzhou) discloses a text sentiment classification method comprising: looking up sentiment words in the text to be classified against a preset sentiment vocabulary and obtaining the polarity of each sentiment word from that vocabulary; using two polarity-shift rules to judge whether the polarity of each sentiment word is shifted; computing, from the polarities and the polarity-shift results, the probability that each word of the text appears in texts of each polarity; and classifying the text with a Bayesian classifier model based on those probabilities. Such schemes mainly judge the polarity of a text as positive, neutral, or negative. Sentiment classification, however, differs from text classification: different people may read different sentiment into the same comment. Consider the text "This phone runs programs very fast and the screen is also fine; it is only somewhat lacking in standby time." Some may consider it neutral, some negative, and some even positive. The schemes above assign a text an overly absolute polarity that cannot match everyone's reading, so the accuracy of the analysis is not high.
Therefore, how to effectively improve the accuracy of text sentiment analysis remains a technical problem in urgent need of a solution.
Summary of the invention
In view of this, the object of the present invention is to provide a text sentiment index calculation method and system that can effectively improve the accuracy of text sentiment analysis.
To achieve the above object, the invention provides a text sentiment index calculation method comprising:
Step A: build a domain-independent sentiment dictionary: select multiple domain-independent sentiment words and store these words and their corresponding sentiment scores in the domain-independent sentiment dictionary;
Step B: split the text to be calculated into multiple clauses at punctuation marks and segment each clause into tokens; then find the domain-independent sentiment words contained in each clause in turn, adjust the sentiment score of each such word using the sentiment index calculation rules, and finally compute the sentiment score of each clause from the adjusted scores, thereby obtaining the sentiment index value of the text.
In step B, finding the domain-independent sentiment words contained in each clause in turn and adjusting their sentiment scores using the sentiment index calculation rules further comprises:
Step B1: judge whether the tokens of each clause include a domain-independent sentiment word; if so, retrieve the sentiment score F of that word from the domain-independent sentiment dictionary and continue to the next step; if not, the flow ends;
Step B2: judge whether any token located before the domain-independent sentiment word in the clause is an adversative; if so, adjust the sentiment score F of the word to 2*F and continue to the next step; if not, continue to the next step. The adversatives include, but are not limited to: "but" and its variants, "wilfully", "just", "as", "so that", "unexpectedly", and "surprisingly";
Step B3: judge whether a negation word occurs before the domain-independent sentiment word in the clause at a word distance of at most 2 from it; if so, adjust the sentiment score F of the word to -F and continue to the next step; if not, continue to the next step. The negation words include, but are not limited to: "not" and "do not have";
Step B4: judge whether an adverb occurs in the clause at a word distance of at most 2 from the domain-independent sentiment word; if so, adjust the sentiment score F of the word to 2*F and continue to the next step; if not, continue to the next step. The adverbs include, but are not limited to: "very" and "especially";
Step B5: judge whether the domain-independent sentiment word is at the end of the clause; if so, adjust the sentiment score F of the word to 2*F.
To achieve the above object, the invention also provides a text sentiment index calculation system comprising:
A domain-independent sentiment dictionary construction device, configured to build the domain-independent sentiment dictionary: it selects multiple domain-independent sentiment words and stores these words and their corresponding sentiment scores in the domain-independent sentiment dictionary;
A text sentiment index calculation device, configured to split the input text to be calculated into multiple clauses at punctuation marks and segment each clause into tokens; then to find the domain-independent sentiment words contained in each clause in turn, adjust the sentiment score of each such word using the sentiment index calculation rules, and finally compute the sentiment score of each clause from the adjusted scores, thereby obtaining the sentiment index value of the text.
The text sentiment index calculation device further comprises:
A text input unit, configured to accept the text to be calculated and send it to the text sentiment calculation unit;
A text sentiment calculation unit, configured to split the text to be calculated into multiple clauses and segment each clause into tokens; to compare each token of a clause, one by one, with the domain-independent sentiment words stored in the domain-independent sentiment dictionary and, on a match, send the clause and its matching domain-independent sentiment word to the sentiment word score calculation unit; to receive the adjusted sentiment scores returned by the sentiment word score calculation unit; to compute the sentiment score of each clause from the domain-independent sentiment words it contains; and finally to compute the sentiment index value of the text from the sentiment scores of all clauses;
A sentiment word score calculation unit, configured to receive the clause and its corresponding domain-independent sentiment word from the text sentiment calculation unit, retrieve the word's sentiment score from the domain-independent sentiment dictionary, adjust that score using the sentiment index calculation rules, and send the adjusted score back to the text sentiment calculation unit. The sentiment index calculation rules may be based on Chinese syntactic analysis and adjust the sentiment score of a domain-independent sentiment word in a clause according to the positions in the clause of that word and of adversatives, negation words, and adverbs.
Compared with the prior art, the invention has the following beneficial effects. It does not require retraining a classifier in practical use, so it executes efficiently. It takes full account of the domain transferability of sentiment words and of the characteristics of Chinese expression. Because sentiment classification differs from text classification, and different people may read different sentiment into the same comment, the invention expresses the result of text sentiment analysis as a text sentiment index rather than a specific polarity, within a given interval [text sentiment index minimum, text sentiment index maximum]. The closer the index value is to the maximum, the stronger the positive sentiment expressed by the text; the closer it is to the minimum, the stronger the negative sentiment. This avoids overly absolute judgments and effectively improves the accuracy of text sentiment analysis and user satisfaction.
Embodiment
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments.
Sentiment words play an important role in sentiment analysis research, especially domain-independent sentiment words. For example, "good" and "bad" are domain-independent sentiment words, because in the overwhelming majority of domains "good" expresses positive sentiment and "bad" expresses negative sentiment. A word such as "high", by contrast, is domain-dependent: in "his income is very high" the word expresses positive sentiment, whereas in "housing prices are very high now" it expresses negative sentiment. The invention therefore first collects a large number of sentiment words from sources such as the Internet and then computes their sentiment scores on a sentiment corpus containing data from multiple domains. The results show that, for both positive and negative sentiment, the words with the highest or lowest scores are the domain-independent ones. The invention then uses the domain-independent sentiment words and their scores, together with Chinese syntactic analysis, to compute the sentiment index value of the text under test. The sentiment index describes the strength of the sentiment expressed by the text; the index interval may be chosen as -150 to 150, and the closer a score lies to either end, the stronger the negative or positive sentiment of the text.
As shown in Fig. 1, the three circles represent the sentiment words, already separated into positive and negative, of three domains. The blackened intersection in the middle represents the sentiment words shared by all three domains, which can therefore be regarded as the domain-independent sentiment words of these domains.
As shown in Fig. 2, the text sentiment index calculation method of the invention comprises:
Step A: build a domain-independent sentiment dictionary: select multiple domain-independent sentiment words and store these words and their corresponding sentiment scores in the domain-independent sentiment dictionary;
Step B: split the text to be calculated into multiple clauses at punctuation marks and segment each clause into tokens; then find the domain-independent sentiment words contained in each clause in turn, adjust the sentiment score of each such word using the sentiment index calculation rules, and finally compute the sentiment score of each clause from the adjusted scores, thereby obtaining the sentiment index value of the text.
The sentiment index calculation rules may adjust the sentiment score of a domain-independent sentiment word according to the positions in the clause of that word and of other tokens (such as adversatives, negation words, and adverbs).
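The clause-splitting part of step B can be sketched as follows. This is a minimal illustration, not the patent's implementation: the punctuation set, the function names, and the whitespace tokenizer are all assumptions, and Chinese text would need a proper word segmenter in place of `tokenize`.

```python
import re

# Assumed set of clause-ending punctuation (ASCII and full-width forms).
CLAUSE_PUNCTUATION = r"[,.;:!?，。；：！？]"

def split_into_clauses(text):
    """Split text at punctuation marks and drop empty pieces."""
    parts = re.split(CLAUSE_PUNCTUATION, text)
    return [part.strip() for part in parts if part.strip()]

def tokenize(clause):
    """Placeholder tokenizer: whitespace split (hypothetical stand-in
    for a Chinese word segmenter)."""
    return clause.split()

clauses = split_into_clauses("The screen is very good, but the battery is not good.")
tokens = [tokenize(clause) for clause in clauses]
```

Each clause is then processed independently, so the later position-based rules only ever look at tokens inside one clause.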
As shown in Fig. 3, step A of Fig. 2 further comprises:
Step A1: obtain multiple sentiment words and, from (i) the number of positive or negative documents in the sentiment score calculation corpus that contain the sentiment word, (ii) the probability that the sentiment word co-occurs with positive or negative documents, (iii) the probability that a positive or negative document occurs, and (iv) the probability that the sentiment word occurs in positive or negative documents, compute the co-occurrence score of each sentiment word with positive sentiment and with negative sentiment.
The sentiment score calculation corpus may be a sentiment corpus that covers multiple domains. For example, the corpus shown in Table 1 contains opinion classification data from three domains: notebook computers, hotels, and books:
Table 1. Sentiment score calculation corpus
In step A1, the formula for the co-occurrence score $CP(e_p, w)$ of sentiment word $w$ with positive sentiment is:
[formula for $CP(e_p, w)$ missing from the source text]
where $c(e_p, w)$ is the number of positive documents in the sentiment score calculation corpus that contain sentiment word $w$;
$P(e_p, w \mid e_p)$ is the probability that sentiment word $w$ co-occurs with a positive document in the corpus, with value $P(e_p, w \mid e_p) = \frac{c(e_p, w)}{N_p}$, where $N_p$ is the number of positive documents in the corpus;
$P(e_p)$ is the probability that a positive document occurs in the corpus, with value $P(e_p) = \frac{c(e_p)}{N_d}$, where $c(e_p)$ is the number of positive documents in the corpus and $N_d$ is the number of all documents in the corpus;
$P(w \mid e_p)$ is the probability that sentiment word $w$ occurs in the positive documents of the corpus, with value $P(w \mid e_p) = \frac{c(w \mid e_p)}{words_p}$, where $c(w \mid e_p)$ is the number of times $w$ occurs in the positive documents and $words_p$ is the total word count of the positive documents.
The formula for the co-occurrence score $CN(e_n, w)$ of sentiment word $w$ with negative sentiment is:
[formula for $CN(e_n, w)$ missing from the source text]
where $c(e_n, w)$ is the number of negative documents in the sentiment score calculation corpus that contain sentiment word $w$;
$P(e_n, w \mid e_n)$ is the probability that sentiment word $w$ co-occurs with a negative document in the corpus, with value $P(e_n, w \mid e_n) = \frac{c(e_n, w)}{N_n}$, where $N_n$ is the number of negative documents in the corpus;
$P(e_n)$ is the probability that a negative document occurs in the corpus, with value $P(e_n) = \frac{c(e_n)}{N_d}$, where $c(e_n)$ is the number of negative documents in the corpus;
$P(w \mid e_n)$ is the probability that sentiment word $w$ occurs in the negative documents of the corpus, with value $P(w \mid e_n) = \frac{c(w \mid e_n)}{words_n}$, where $c(w \mid e_n)$ is the number of times $w$ occurs in the negative documents and $words_n$ is the total word count of the negative documents.
Step A2: using max-min normalization, normalize the co-occurrence scores of each sentiment word with positive sentiment and with negative sentiment. The co-occurrence score of sentiment word $w$ with positive sentiment is normalized as:
$CP(e_p, w) \leftarrow \frac{CP(e_p, w) - CP(e_p, w)_{min}}{CP(e_p, w)_{max} - CP(e_p, w)_{min}}$
where $CP(e_p, w)_{min}$ is the minimum and $CP(e_p, w)_{max}$ the maximum of the co-occurrence scores of all sentiment words with positive sentiment. The co-occurrence score of sentiment word $w$ with negative sentiment is normalized as:
$CN(e_n, w) \leftarrow \frac{CN(e_n, w) - CN(e_n, w)_{min}}{CN(e_n, w)_{max} - CN(e_n, w)_{min}}$
where $CN(e_n, w)_{min}$ is the minimum and $CN(e_n, w)_{max}$ the maximum of the co-occurrence scores of all sentiment words with negative sentiment;
Step A3: compute the positive polarity difference of each sentiment word from the difference between its co-occurrence scores with positive and negative sentiment. The positive polarity difference of sentiment word $w$ is $DValue(w, p, n) = (CP(e_p, w) - CN(e_n, w)) \cdot \beta$, where $\beta$ is an adjustment factor. Its purpose is to make the positive polarity difference of cross-domain sentiment words greater than 1.0, which facilitates subsequent adjustment of the conditional probabilities in the Naive Bayes algorithm; for example, $\beta$ may be set to 10000.
Step A4: set a polarity difference threshold T (T is a real number greater than 0, for example T = 1.0) and judge whether the positive polarity difference of each sentiment word is greater than T or less than -T; if so, take the positive polarity difference as the word's sentiment score and store the word and its score in the domain-independent sentiment dictionary. If the positive polarity difference is greater than T, the word is a domain-independent positive sentiment word; if it is less than -T, the word is a domain-independent negative sentiment word.
In other words, if the value of DValue(w, p, n) is greater than T or less than -T, the word is a domain-independent sentiment word and is saved in the domain-independent sentiment dictionary. If the value of DValue(w, p, n) lies within [-T, T], the word occurs with nearly the same frequency in the positive and negative documents of the sentiment score calculation corpus; that is, it may express positive sentiment in some domains and negative sentiment in others, so it is not a domain-independent sentiment word.
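Steps A2 to A4 can be sketched as follows, assuming the raw co-occurrence scores from step A1 are already available as dictionaries. The function names, the toy scores, and the handling of the degenerate case where all scores are equal are assumptions made for illustration; the values β = 10000 and T = 1.0 are the examples given in the text.

```python
def min_max_normalize(scores):
    """Max-min normalize a {word: score} mapping into [0, 1] (step A2)."""
    lowest, highest = min(scores.values()), max(scores.values())
    if highest == lowest:
        # Assumed fallback: all scores equal, so no spread to normalize.
        return {word: 0.0 for word in scores}
    return {word: (s - lowest) / (highest - lowest) for word, s in scores.items()}

def build_lexicon(cp_raw, cn_raw, beta=10000.0, threshold=1.0):
    """Keep words whose positive polarity difference lies outside [-T, T]."""
    cp = min_max_normalize(cp_raw)
    cn = min_max_normalize(cn_raw)
    lexicon = {}
    for word in cp:
        dvalue = (cp[word] - cn[word]) * beta           # step A3
        if dvalue > threshold or dvalue < -threshold:   # step A4
            lexicon[word] = dvalue
    return lexicon

# Toy raw co-occurrence scores (made-up numbers, not from the patent).
cp_raw = {"good": 8.0, "bad": 1.0, "high": 4.5}
cn_raw = {"good": 1.0, "bad": 9.0, "high": 5.0}
lexicon = build_lexicon(cp_raw, cn_raw)
```

In this toy data "high" scores equally on both sides after normalization, so its polarity difference is 0 and it is left out of the dictionary, which mirrors the domain-dependent behavior described above.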
As shown in Fig. 4, in step B of Fig. 2, finding the domain-independent sentiment words contained in each clause in turn and adjusting their sentiment scores using the sentiment index calculation rules further comprises:
Step B1: judge whether the tokens of each clause include a domain-independent sentiment word; if so, retrieve the sentiment score F of that word from the domain-independent sentiment dictionary and continue to the next step; if not, the flow ends;
Step B1 may further comprise: extracting each token of the clause in turn and comparing it with the domain-independent sentiment words stored in the domain-independent sentiment dictionary; a match means the token is a domain-independent sentiment word.
Step B2: judge whether any token located before the domain-independent sentiment word in the clause is an adversative; if so, adjust the sentiment score F of the word to 2*F and continue to the next step; if not, continue to the next step;
The adversatives include, but are not limited to: "but" and its variants, "wilfully", "just", "as", "so that", "unexpectedly", and "surprisingly".
Step B2 may further comprise: building an adversative dictionary and comparing the tokens located before the domain-independent sentiment word in the clause with the adversatives stored in that dictionary; a match means an adversative precedes the domain-independent sentiment word in the clause.
Step B3: judge whether a negation word occurs before the domain-independent sentiment word in the clause at a word distance of at most 2 from it; if so, adjust the sentiment score F of the word to -F and continue to the next step; if not, continue to the next step;
The negation words include, but are not limited to: "not" and "do not have".
Step B4: judge whether an adverb occurs in the clause at a word distance of at most 2 from the domain-independent sentiment word; if so, adjust the sentiment score F of the word to 2*F and continue to the next step; if not, continue to the next step;
The adverbs include, but are not limited to: "very" and "especially".
Step B5: judge whether the domain-independent sentiment word is at the end of the clause; if so, adjust the sentiment score F of the word to 2*F and end the flow; if not, end the flow.
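Steps B1 to B5 can be sketched as a single pass over a tokenized clause. This is an illustration under stated assumptions, not the patent's implementation: the English word lists are hypothetical stand-ins for the Chinese ones, adjacent tokens are taken to have word distance 1, and the adverb in step B4 is allowed on either side of the sentiment word because the text does not restrict its position.

```python
# Hypothetical English stand-ins for the Chinese word lists in the text.
ADVERSATIVES = {"but", "however"}
NEGATIONS = {"not", "no"}
DEGREE_ADVERBS = {"very", "especially"}
MAX_DISTANCE = 2  # word-distance limit used in steps B3 and B4

def adjust_scores(tokens, lexicon):
    """Return (word, adjusted score) for each lexicon word in a clause."""
    adjusted = []
    for i, token in enumerate(tokens):
        if token not in lexicon:                          # B1: not a sentiment word
            continue
        score = lexicon[token]
        near_before = tokens[max(0, i - MAX_DISTANCE):i]
        near_after = tokens[i + 1:i + 1 + MAX_DISTANCE]
        if any(t in ADVERSATIVES for t in tokens[:i]):    # B2: adversative before
            score *= 2
        if any(t in NEGATIONS for t in near_before):      # B3: nearby negation before
            score = -score
        if any(t in DEGREE_ADVERBS for t in near_before + near_after):  # B4
            score *= 2
        if i == len(tokens) - 1:                          # B5: clause-final word
            score *= 2
        adjusted.append((token, score))
    return adjusted

lexicon = {"good": 3.0, "bad": -2.0}
example = adjust_scores(["but", "the", "battery", "is", "not", "good"], lexicon)
```

In the example, "good" starts at 3.0, is doubled by the preceding "but", negated by "not", and doubled again for being clause-final. The rules are applied in the fixed order B2 to B5, so their effects multiply.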
In step B of Fig. 2, computing the sentiment score of each clause from the adjusted sentiment scores of the domain-independent sentiment words, and thereby the sentiment index value of the text to be calculated, further comprises:
Step 1: compute the sentiment score of the clause from the sentiment scores of all domain-independent sentiment words it contains: FU = F(A1) + F(A2) + ... + F(An), where A1, A2, ..., An are the domain-independent sentiment words contained in the clause and F(A1), F(A2), ..., F(An) are their adjusted sentiment scores;
Step 2: judge whether the number of tokens in the clause is less than the token maximum (for example, 20); if so, adjust the sentiment score FU of the clause to 2*FU and continue to the next step; if not, continue to the next step;
Step 3: compute the sentiment index value of the text to be calculated as the sum of the sentiment scores of all clauses;
Step 4: judge whether the sentiment index value of the text is greater than the text sentiment index maximum (for example, 15000); if so, set the sentiment index value to the text sentiment index maximum and end the flow; if not, continue to the next step;
Step 5: judge whether the sentiment index value of the text is less than the text sentiment index minimum (for example, -15000); if so, set the sentiment index value to the text sentiment index minimum and end the flow; if not, end the flow.
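Steps 1 to 5 amount to a sum per clause, a doubling for short clauses, and a final clamp. The sketch below uses the example constants from the text (token maximum 20, index bounds ±15000); the function names and the input layout, a list of (adjusted scores, token count) pairs, are assumptions made for illustration.

```python
MAX_TOKENS = 20       # token maximum from step 2 (example value in the text)
INDEX_MAX = 15000     # text sentiment index maximum (example value)
INDEX_MIN = -15000    # text sentiment index minimum (example value)

def clause_score(adjusted_scores, token_count):
    """Steps 1-2: sum the adjusted scores, doubling for short clauses."""
    fu = sum(adjusted_scores)
    if token_count < MAX_TOKENS:
        fu *= 2
    return fu

def sentiment_index(clauses):
    """Steps 3-5: sum the clause scores and clamp to the index interval.

    `clauses` is a list of (adjusted scores, token count) pairs.
    """
    total = sum(clause_score(scores, count) for scores, count in clauses)
    return max(INDEX_MIN, min(INDEX_MAX, total))

# One short clause (doubled) and one long clause (not doubled).
index = sentiment_index([([12.0], 5), ([-12.0, 3.0], 25)])
```

Clamping with `max`/`min` gives the same result as the two sequential checks in steps 4 and 5, since a value cannot exceed the maximum and fall below the minimum at once.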
The text sentiment index calculation system of the invention comprises a domain-independent sentiment dictionary construction device and a text sentiment index calculation device, wherein:
The domain-independent sentiment dictionary construction device is configured to build the domain-independent sentiment dictionary: it selects multiple domain-independent sentiment words and stores these words and their corresponding sentiment scores in the domain-independent sentiment dictionary;
The text sentiment index calculation device is configured to split the input text to be calculated into multiple clauses at punctuation marks and segment each clause into tokens; then to find the domain-independent sentiment words contained in each clause in turn, adjust the sentiment score of each such word using the sentiment index calculation rules, and finally compute the sentiment score of each clause from the adjusted scores, thereby obtaining the sentiment index value of the text.
As shown in Fig. 5, the domain-independent sentiment dictionary construction device further comprises a sentiment co-occurrence score calculation unit, a normalization unit, a positive polarity difference calculation unit, and a domain-independent sentiment word judging unit, wherein:
Emotion co-occurrence score calculating unit, for obtaining multiple emotion words, calculating for each emotion word its co-occurrence score with positive emotion and with negative emotion, and sending these co-occurrence scores to the normalization unit. The scores are calculated from: the number of positive (or negative) documents in the emotion score calculation corpus that contain the emotion word; the probability that the emotion word and a positive (or negative) document occur jointly; the probability that a positive (or negative) document occurs; and the probability that the emotion word occurs in the positive (or negative) documents. The emotion score calculation corpus may be an emotion corpus containing data from multiple fields.
The co-occurrence score $CP(e_p, w)$ of emotion word $w$ with positive emotion is given by a formula (not reproduced in this text) in which:
$c(e_p, w)$ is the number of positive documents in the emotion score calculation corpus that contain emotion word $w$;
$P(e_p, w \mid e_p)$ is the probability that emotion word $w$ and a positive document occur jointly in the corpus, with value $P(e_p, w \mid e_p) = c(e_p, w) / n_p$, where $n_p$ is the number of positive documents in the corpus;
$p(e_p)$ is the probability that a positive document occurs in the corpus, with value $p(e_p) = c(e_p) / N_d$, where $c(e_p)$ is the number of positive documents and $N_d$ is the number of all documents in the corpus;
$p(w \mid e_p)$ is the probability that emotion word $w$ occurs in the positive documents of the corpus, with value $p(w \mid e_p) = c(w \mid e_p) / words_p$, where $c(w \mid e_p)$ is the number of times emotion word $w$ occurs in the positive documents and $words_p$ is the total word frequency of the positive documents (that is, the total number of words in the positive documents).
The co-occurrence score $CN(e_n, w)$ of emotion word $w$ with negative emotion is given by the corresponding formula (likewise not reproduced in this text) in which:
$c(e_n, w)$ is the number of negative documents in the emotion score calculation corpus that contain emotion word $w$;
$P(e_n, w \mid e_n)$ is the probability that emotion word $w$ and a negative document occur jointly in the corpus, with value $P(e_n, w \mid e_n) = c(e_n, w) / n_n$, where $n_n$ is the number of negative documents in the corpus;
$p(e_n)$ is the probability that a negative document occurs in the corpus, with value $p(e_n) = c(e_n) / N_d$, where $c(e_n)$ is the number of negative documents in the corpus;
$p(w \mid e_n)$ is the probability that emotion word $w$ occurs in the negative documents of the corpus, with value $p(w \mid e_n) = c(w \mid e_n) / words_n$, where $c(w \mid e_n)$ is the number of times emotion word $w$ occurs in the negative documents and $words_n$ is the total word frequency of the negative documents;
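The component terms defined for the co-occurrence score can be illustrated with a minimal sketch. It assumes a hypothetical toy corpus of labelled, already segmented documents and a hypothetical helper `component_probabilities`; it also assumes that each probability is the plain ratio of the counts named in its definition. It computes only the component terms, not the co-occurrence score itself.

```python
from collections import Counter

# Hypothetical toy corpus of (label, segmented words); purely illustrative data.
CORPUS = [
    ("pos", ["screen", "good", "battery", "good"]),
    ("pos", ["service", "satisfying"]),
    ("neg", ["battery", "poor"]),
    ("neg", ["screen", "good", "price", "poor"]),
]


def component_probabilities(corpus, word, label):
    """Return the three component terms for `word` and sentiment `label`.

    Assumption of this sketch: each probability is the ratio of the counts
    named in its definition.
    """
    docs = [tokens for lab, tokens in corpus if lab == label]
    n_label = len(docs)                                   # n_p (equals c(e_p))
    n_all = len(corpus)                                   # N_d
    docs_with_word = sum(1 for t in docs if word in t)    # c(e_p, w)
    occurrences = sum(Counter(t)[word] for t in docs)     # c(w | e_p)
    total_words = sum(len(t) for t in docs)               # words_p
    return {
        "p_joint": docs_with_word / n_label,   # P(e_p, w | e_p)
        "p_label": n_label / n_all,            # p(e_p)
        "p_word": occurrences / total_words,   # p(w | e_p)
    }


pos = component_probabilities(CORPUS, "good", "pos")
neg = component_probabilities(CORPUS, "good", "neg")
```

Passing `"neg"` as the label yields the corresponding negative-side terms from the same counting logic.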
Normalization unit, for normalizing, by the maximum-minimum normalization method, the co-occurrence scores of the emotion words with positive emotion and with negative emotion calculated by the emotion co-occurrence score calculating unit, and sending the normalized co-occurrence scores to the positive polarity difference computing unit, wherein each score is rescaled as:
$$CP(e_p, w) \leftarrow \frac{CP(e_p, w) - CP(e_p, w)_{min}}{CP(e_p, w)_{max} - CP(e_p, w)_{min}}, \qquad CN(e_n, w) \leftarrow \frac{CN(e_n, w) - CN(e_n, w)_{min}}{CN(e_n, w)_{max} - CN(e_n, w)_{min}}$$
$CP(e_p, w)_{min}$ is the minimum of the co-occurrence scores of all emotion words with positive emotion, and $CP(e_p, w)_{max}$ is the maximum of the co-occurrence scores of all emotion words with positive emotion;
$CN(e_n, w)_{min}$ is the minimum of the co-occurrence scores of all emotion words with negative emotion, and $CN(e_n, w)_{max}$ is the maximum of the co-occurrence scores of all emotion words with negative emotion;
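Maximum-minimum normalization of a set of co-occurrence scores can be sketched as follows. The function name `min_max_normalize`, the sample scores, and the handling of the case where all scores are equal are assumptions made for illustration; the standard form of the method, rescaling to the range [0, 1], is assumed.

```python
def min_max_normalize(scores):
    """Rescale a {word: score} mapping to [0, 1] by max-min normalization."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: mapping everything to 0.0 is a choice of this sketch.
        return {word: 0.0 for word in scores}
    return {word: (s - lo) / (hi - lo) for word, s in scores.items()}


# Hypothetical raw co-occurrence scores with positive emotion.
raw_cp = {"like": 8.0, "good": 6.0, "poor": 2.0}
norm_cp = min_max_normalize(raw_cp)
```

The same function would be applied separately to the scores with negative emotion, since each set has its own minimum and maximum.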
Positive polarity difference computing unit, for calculating the positive polarity difference of each emotion word from the difference between its normalized co-occurrence scores with positive emotion and with negative emotion, and sending the positive polarity difference of the emotion word to the field-independent emotion word judging unit, wherein: $DValue(w, p, n) = (CP(e_p, w) - CN(e_n, w)) \cdot \beta$, where $DValue(w, p, n)$ is the positive polarity difference of emotion word $w$ and $\beta$ is an adjustment factor parameter;
Field-independent emotion word judging unit, for setting a polarity difference threshold T (T being a real number greater than 0) and judging whether the positive polarity difference of each emotion word sent by the positive polarity difference computing unit is greater than T or less than -T; if so, the positive polarity difference of the emotion word is taken as its emotion score, and the emotion word and its emotion score are saved in the field-independent emotion dictionary.
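The polarity difference and the threshold test can be sketched together as below. The function name `build_dictionary`, the sample normalized scores, and the values chosen for β and T are hypothetical; the sketch assumes the adjustment factor β multiplies the score difference.

```python
def build_dictionary(cp, cn, beta, threshold):
    """Keep words whose polarity difference exceeds T or falls below -T.

    cp, cn: normalized co-occurrence scores with positive / negative emotion.
    beta: adjustment factor (assumed here to multiply the difference).
    threshold: polarity difference threshold T, a real number greater than 0.
    """
    if threshold <= 0:
        raise ValueError("threshold T must be greater than 0")
    dictionary = {}
    for word in cp:
        d_value = (cp[word] - cn[word]) * beta
        if d_value > threshold or d_value < -threshold:
            # The polarity difference itself becomes the emotion score.
            dictionary[word] = d_value
    return dictionary


# Hypothetical normalized scores; "big" is polarity-neutral and is filtered out.
cp = {"like": 1.0, "good": 0.75, "big": 0.5, "poor": 0.0}
cn = {"like": 0.0, "good": 0.25, "big": 0.5, "poor": 1.0}
lexicon = build_dictionary(cp, cn, beta=10.0, threshold=3.0)
```

Words with a strongly positive or strongly negative difference are retained with signed scores, while words whose difference lies within [-T, T] are discarded.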
As shown in Figure 6, the text emotion index calculation device further includes a text input unit, a text emotion computing unit, and an emotion word score calculating unit, wherein:
Text input unit, for inputting the text to be calculated and sending the text to be calculated to the text emotion computing unit;
Text emotion computing unit, for dividing the text to be calculated into multiple clauses and performing word segmentation on each clause; comparing each segmented word in a clause, one by one, with the field-independent emotion words stored in the field-independent emotion dictionary and, on a match, sending the clause and its matching field-independent emotion words to the emotion word score calculating unit; receiving the emotion scores of the field-independent emotion words returned by the emotion word score calculating unit; accumulating the emotion score of each clause from the field-independent emotion words that the clause contains; and finally accumulating the emotion index value of the text to be calculated from the emotion scores of all clauses;
Emotion word score calculating unit, for receiving the clause and its corresponding field-independent emotion words sent by the text emotion computing unit; extracting the emotion scores of those field-independent emotion words from the field-independent emotion dictionary; adjusting the emotion scores of the field-independent emotion words contained in the clause using the emotion index calculation rules; and finally sending the adjusted emotion scores of the field-independent emotion words to the text emotion computing unit. The emotion index calculation rules may be based on Chinese syntactic parsing: the emotion score of each field-independent emotion word in a clause is adjusted according to the positions in the clause of that emotion word and of any adversative words, negation words, and adverbs.
To explain the present invention more clearly, an example is given below:
1. First, 8936 emotion words are downloaded from HowNet (HOWNET), of which 4566 are positive emotion words and 4370 are negative emotion words.
2. With the present invention, 1074 field-independent positive emotion words and 365 field-independent negative emotion words can be selected. Table 2 lists the field-independent emotion words whose emotion scores rank in the top five and the bottom five, together with their emotion scores:
Table 2. The top five and bottom five field-independent emotion words and their emotion scores
As the data in Table 2 show, both the positive and the negative emotion words fit the notion of field independence well. For example, the top-ranked positive emotion word, "like", is also field-independent by our subjective judgment: there are very few fields in which the word "like" expresses a negative emotion. Likewise, "love", "satisfaction", "disappointment", and "gloomy" in Table 2 are all clearly field-independent emotion words.
3. The text to be calculated is: "The weather is especially good, and the mood is also good."
(1) Clause 1 is "the weather is especially good"; clause 2 is "and the mood is also good".
(2) For clause 1, 'good' is a field-independent emotion word present in the field-independent emotion dictionary, with an emotion score of 10. The adverb 'especially' also occurs in clause 1, at a distance of 1 from 'good', and 'good' is at the end of the clause. The emotion score of 'good' is therefore adjusted to 10*2*2=40. Because clause 1 contains fewer than 20 segmented words, the emotion score of clause 1 is 40*2=80.
(3) For clause 2, 'good' is a field-independent emotion word present in the field-independent emotion dictionary, with an emotion score of 10, and 'good' is at the end of the clause. The emotion score of 'good' is therefore adjusted to 10*2=20. Because clause 2 contains fewer than 20 segmented words, the emotion score of clause 2 is 20*2=40.
(4) The emotion index value of the text to be calculated is 80+40=120.
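The arithmetic of this example can be reproduced with a short sketch. The multipliers below are read off the example only (x2 for an adverb at distance 1, x2 for an emotion word at the end of the clause, x2 for a clause with fewer than 20 segmented words) and are not the complete rule set; the names `LEXICON`, `ADVERBS`, `clause_score`, and `text_index`, and the English word segmentation, are hypothetical.

```python
# Hypothetical field-independent emotion dictionary and adverb list.
LEXICON = {"good": 10}
ADVERBS = {"especially"}  # 'also' is not treated as an adverb here, as in the example


def clause_score(tokens):
    """Score one segmented clause using the multipliers seen in the example."""
    total = 0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        score = LEXICON[token]
        # Adverb at distance 1 from the emotion word: x2.
        neighbours = tokens[max(i - 1, 0):i] + tokens[i + 1:i + 2]
        if any(t in ADVERBS for t in neighbours):
            score *= 2
        # Emotion word at the end of the clause: x2.
        if i == len(tokens) - 1:
            score *= 2
        total += score
    # Clause with fewer than 20 segmented words: x2.
    if len(tokens) < 20:
        total *= 2
    return total


def text_index(clauses):
    """Emotion index of a text: the sum of its clause scores."""
    return sum(clause_score(clause) for clause in clauses)


clauses = [
    ["weather", "especially", "good"],  # clause 1
    ["mood", "also", "good"],           # clause 2
]
index = text_index(clauses)  # 80 + 40 = 120
```

Keeping the word-level adjustments separate from the clause-level multiplier mirrors the division of work between the emotion word score calculating unit and the text emotion computing unit.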
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.