CN105956095B

CN105956095B - Method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary

Info

Publication number: CN105956095B
Application number: CN201610286515.1A
Authority: CN
Inventors: 于瑞国; 林榆旺; 王建荣; 于健; 喻梅; 刘江月
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2016-04-29
Filing date: 2016-04-29
Publication date: 2019-11-05
Anticipated expiration: 2036-04-29
Also published as: CN105956095A

Abstract

The invention discloses a method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary. The method comprises: step (1), obtaining a Chinese dictionary corresponding to the ANEW dictionary by translation; step (2), vocabulary screening, in which words unsuitable for sentiment analysis are deleted from the Chinese dictionary obtained in step (1); step (3), normalizing the emotional values so that the emotional value of each word lies between -1 and 1; step (4), expanding the sentiment dictionary based on the extended edition of the synonym thesaurus Tongyici Cilin; step (5), expanding the dictionary based on an improved SO-PMI algorithm; step (6), performing rule-based sentiment orientation analysis on microblog text; and step (7), executing a sentiment analysis algorithm based on weight factors. Compared with the prior art, the invention is not limited by corpus size and can run fully unsupervised, which makes it well suited to the large volumes of unlabeled data found on microblogs.

Description

Method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary

Technical field

The invention belongs to the field of data mining and information retrieval, and in particular relates to a psychological early-warning model based on a fine-grained sentiment dictionary.

Background art

At present, most prior-art research on text analysis addresses sentiment analysis of English text, including polarity dictionaries, context-relation converters, and the like. Chinese, however, has a very large vocabulary, and manually annotating Chinese affective resources requires an enormous amount of work. Research on how to rapidly build a Chinese emotion vocabulary from existing English resources is therefore of great significance.

Building a sentiment dictionary requires a way to measure words. The three-dimensional emotion model PAD (Pleasure-Displeasure, Arousal-Nonarousal, Dominance-Submissiveness), proposed by Mehrabian, is the most widely applied emotion model. P denotes the degree of pleasure, A denotes the degree of arousal, and D denotes the degree of dominance. The emotional category represented by a word can be measured with the PAD model, as shown in Table 1:

Table 1. Examples of the emotion types corresponding to each PAD dimension

Professors Margaret M. Bradley and Peter J. Lang, researchers at a psychology research center at the University of Florida, proposed a dictionary for standardizing the emotional ratings of English vocabulary: the Affective Norms for English Words (ANEW). The ANEW emotion vocabulary takes PAD as its prototype and rates written material along the three PAD dimensions. Researchers in various countries have also built on ANEW, rating words in their own languages.

Summary of the invention

In view of the above prior art and its problems, the invention proposes a method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary. It covers the construction and expansion of a Chinese sentiment dictionary, the detection of sentiment orientation in microblog text, and psychological early warning. It contributes groundwork to research directions such as Chinese text research, sentiment analysis, and Internet public-opinion analysis, and supports further research on text. It thereby improves the efficiency of work on Chinese text and provides a psychological early-warning method that detects the psychological emotion expressed in text in a timely manner and tracks online public opinion and the sentiment orientation of users.

The invention proposes a method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary. The method comprises the following steps:

Step 1: obtain a Chinese dictionary corresponding to the ANEW dictionary by translation;

Step 2: vocabulary screening. Delete from the Chinese dictionary obtained in step 1 the words that are unsuitable for sentiment analysis;

Step 3: normalize the emotional values so that the emotional value of each word lies between -1 and 1. The normalization formula is expressed as:

where Avevalue denotes the average emotional intensity, Maxvalue denotes the maximum intensity in the category to which the emotion word belongs, Minvalue denotes the minimum intensity in that category, X denotes the emotional intensity of the word, and Y denotes the emotional intensity after normalization;

Step 4: expand the sentiment dictionary based on the extended edition of the synonym thesaurus Tongyici Cilin;

Step 5: expand the dictionary based on an improved SO-PMI algorithm. The specific processing is as follows:

According to the following formula:

$$\mathrm{SO}(\mathit{word}) = \max_{i} W_i(\mathit{word})$$

where γ is an adjustment coefficient, w_ij denotes the j-th benchmark word in the i-th emotional category, and W_i(word) denotes the SO-PMI value of the new word word with respect to the emotion words of the i-th category;

Among the SO-PMI values computed for the new word word across the different categories, the maximum is selected;

Step 6: perform rule-based sentiment orientation analysis on microblog text, including word segmentation, expanding the text extraction rules, shifting polarity words, handling degree adverbs, and analyzing the structures "negation word + degree adverb + emotion word" and "degree adverb + negation word + emotion word", to which different weights are assigned;

Step 7: execute the sentiment analysis algorithm based on weight factors, which is expressed as:

where SO(S) is the sentiment orientation value of sentence S, W_ij denotes the emotional value of the i-th emotion word W belonging to emotional category j, C_i is the weight factor modifying that emotion word, and α is an adjustment coefficient.

Compared with the prior art, the above technical solution has the following advantages: it is not limited by corpus size, can run fully unsupervised, and is well suited to the large volumes of unlabeled data found on microblogs.

Brief description of the drawings

Fig. 1 is an overall flow diagram of the method of the invention for constructing a psychological early-warning model based on a fine-grained sentiment dictionary.

Specific embodiments

The technical solution of the invention is described in further detail below with reference to the drawing and specific embodiments.

As shown in Fig. 1, the method of the invention for constructing a psychological early-warning model based on a fine-grained sentiment dictionary comprises the following steps:

Step 1: translation. Obtain the Chinese dictionary corresponding to the ANEW dictionary by a combination of manual and machine translation. This step comprises the following processing:

Process 1: collate and merge all the lexical information in the ANEW dictionary, and remove English past-tense word forms that cannot be expressed in Chinese. Process 2: obtain a bilingual table by machine translation and, during translation, check each entry word by word to confirm its accuracy and avoid major ambiguity. Process 3: correct entries that are rendered inconsistently between the Chinese and English dictionaries, and add to the dictionary the option best suited to sentiment analysis. A Chinese vocabulary of a certain scale is finally obtained;

Step 2: vocabulary screening. Delete words that are unsuitable for sentiment analysis. This step comprises the following processing: delete words whose expression differs between Chinese and English, since such words would affect the sentiment analysis results;

Step 3: normalize the emotional values so that the emotional value of each word lies between -1 and 1. This comprises the following processing: the rating scale for emotion words in ANEW ranges from 1 to 8, and as the value increases it spans the range from negative emotional intensity to positive emotional intensity. Taking into account the polar opposition of emotion words and the rating standard of the PAD dimension values, the intensity value of each word is normalized using formula (1),

where Avevalue denotes the average emotional intensity, Maxvalue denotes the maximum intensity in the category to which the emotion word belongs, Minvalue denotes the minimum intensity in that category, X denotes the emotional intensity of the word, and Y denotes the emotional intensity after normalization;
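Formula (1) itself is not reproduced in this text, so the following Python sketch only illustrates one possible normalization that uses the same quantities and maps ratings into the range -1 to 1. The specific form (centering on Avevalue and dividing by the larger distance to Maxvalue or Minvalue) and the function name `normalize_intensity` are assumptions for illustration, not the formula of the invention.

```python
def normalize_intensity(x, ave_value, max_value, min_value):
    """Map a raw rating x into [-1, 1], centred on ave_value.

    ASSUMED form, not formula (1) of the patent:
        Y = (X - Avevalue) / max(Maxvalue - Avevalue, Avevalue - Minvalue)
    """
    if not min_value <= x <= max_value:
        raise ValueError("x must lie within [min_value, max_value]")
    # Use the larger half-range so that both extremes stay inside [-1, 1].
    half_range = max(max_value - ave_value, ave_value - min_value)
    if half_range == 0:
        return 0.0  # degenerate category in which every word has the same rating
    return (x - ave_value) / half_range


if __name__ == "__main__":
    # Hypothetical category with ratings from 1 to 8 and an average of 4.5.
    for rating in (1, 4.5, 6, 8):
        print(rating, normalize_intensity(rating, 4.5, 8, 1))
```

Under this assumed form, ratings above the category average become positive values and ratings below it become negative values, which matches the polar opposition described above.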

Step 4: expand the sentiment dictionary based on the extended edition of the synonym thesaurus Tongyici Cilin. This comprises the following processing. Process 1: take microblog data as the corpus and filter out words with an extremely low frequency of occurrence, so as to build a more efficient sentiment dictionary. Process 2: compute lexical semantic similarity using the word-similarity algorithm of the Harbin Institute of Technology synonym thesaurus, in combination with existing semantic dictionaries;
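The idea of step 4 can be sketched in Python as follows. This is a simplified illustration, not the similarity algorithm of the thesaurus: it assumes that synonym groups are already available as plain sets of words, that a new word inherits the mean emotional value of the seed words in its group, and that "extremely low frequency" means fewer than `min_freq` occurrences in the corpus. The names `expand_with_synonyms`, `synonym_groups`, and `min_freq` are hypothetical.

```python
from collections import Counter


def expand_with_synonyms(seed, synonym_groups, corpus_tokens, min_freq=2):
    """Expand a seed sentiment dictionary using synonym groups.

    seed:           dict mapping word -> normalized emotional value
    synonym_groups: iterable of sets of synonymous words (stand-in for the thesaurus)
    corpus_tokens:  list of tokens from the microblog corpus
    min_freq:       ASSUMED threshold below which a word is treated as too rare
    """
    freq = Counter(corpus_tokens)
    expanded = dict(seed)
    for group in synonym_groups:
        seed_values = [seed[w] for w in group if w in seed]
        if not seed_values:
            continue  # no known emotion word in this group, nothing to propagate
        mean_value = sum(seed_values) / len(seed_values)
        for w in group:
            # Skip words already in the dictionary and words that are too rare.
            if w not in expanded and freq[w] >= min_freq:
                expanded[w] = mean_value
    return expanded


if __name__ == "__main__":
    seed = {"happy": 0.8}
    groups = [{"happy", "glad", "joyful"}]
    tokens = ["glad", "glad", "joyful"]
    print(expand_with_synonyms(seed, groups, tokens))
```

In the example, "glad" is added because it occurs twice in the corpus, while "joyful" is filtered out as too rare.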

Step 5: expand the dictionary based on an improved SO-PMI algorithm. This comprises the following processing. Process 1: expand the sentiment dictionary with new Internet words; select benchmark words for positive and negative emotion, denoted PS and NS respectively. Process 2: compute the PMI values of the new word word with respect to the sets PS and NS, denoted WP and WN respectively. The PMI value between one word and another is computed as shown in formula (2),

where N denotes the total number of words in the corpus, f(word₁, word₂) denotes the frequency with which word₁ and word₂ occur together in the corpus, f(word₁) denotes the frequency with which word₁ occurs in the corpus, f(word₂) denotes the frequency with which word₂ occurs in the corpus, and log₂(·) is the base-2 logarithm. In formula (2), word₁ may be taken as the new word and word₂ as a word in the set PS or NS (or vice versa).
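Formula (2) is not reproduced in this text. The Python sketch below uses the standard definition of pointwise mutual information, which involves exactly the quantities listed above, PMI(word₁, word₂) = log₂(N · f(word₁, word₂) / (f(word₁) · f(word₂))); that this is the exact form of formula (2) is an assumption. The function name `pmi` and the handling of zero counts are illustrative choices.

```python
import math


def pmi(n_total, f_joint, f_word1, f_word2):
    """Standard pointwise mutual information in bits.

    n_total: total number of words in the corpus (N)
    f_joint: frequency with which the two words occur together
    f_word1: frequency of the first word
    f_word2: frequency of the second word
    """
    if f_joint == 0 or f_word1 == 0 or f_word2 == 0:
        # The logarithm is undefined for zero counts; returning -inf is one
        # convention, smoothing the counts beforehand is another.
        return float("-inf")
    return math.log2(n_total * f_joint / (f_word1 * f_word2))


if __name__ == "__main__":
    # Hypothetical counts: the pair co-occurs 4 times in a 64-word corpus.
    print(pmi(64, 4, 8, 8))
```

A positive value means the two words occur together more often than would be expected if they were independent.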

When the PMI value is computed between a word and a set of words, formula (3) is used,

where WordSet denotes a set of words and word' is a word in WordSet;

Process 3: the SO-PMI value is computed as shown in formula (4),

$$\mathrm{SO}(\mathit{word}) = \mathrm{PMI}(\mathit{word}, \mathrm{PS}) - \mathrm{PMI}(\mathit{word}, \mathrm{NS}) \tag{4}$$

where SO(word) denotes the SO-PMI value of the word word. If the value is greater than 0, the word is added to the positive dictionary; if it is less than 0, the word is added to the negative dictionary; otherwise it is added to neither dictionary.
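Formula (4) and the decision rule above can be sketched in Python as follows. Formula (3) is not reproduced in this text, so the sketch assumes the common SO-PMI convention that the PMI between a word and a set is the sum of the pairwise PMI values over the words in the set, and it treats pairs that never co-occur as contributing 0. The function names and the layout of the count tables (`word_freq`, `pair_freq`) are hypothetical.

```python
import math


def pmi(word1, word2, n_total, word_freq, pair_freq):
    """Pairwise PMI from precomputed counts (standard definition, assumed)."""
    joint = pair_freq.get(frozenset((word1, word2)), 0)
    if joint == 0:
        return 0.0  # ASSUMPTION: pairs that never co-occur are uninformative
    return math.log2(n_total * joint / (word_freq[word1] * word_freq[word2]))


def pmi_with_set(word, word_set, n_total, word_freq, pair_freq):
    """PMI between a word and a set, ASSUMED to be the sum over the set."""
    return sum(pmi(word, w, n_total, word_freq, pair_freq) for w in word_set)


def so_pmi(word, ps, ns, n_total, word_freq, pair_freq):
    """Formula (4): SO(word) = PMI(word, PS) - PMI(word, NS)."""
    return (pmi_with_set(word, ps, n_total, word_freq, pair_freq)
            - pmi_with_set(word, ns, n_total, word_freq, pair_freq))


def assign_dictionary(so_value):
    """Positive dictionary if SO > 0, negative if SO < 0, otherwise neither."""
    if so_value > 0:
        return "positive"
    if so_value < 0:
        return "negative"
    return None


if __name__ == "__main__":
    # Hypothetical counts for a new word and one benchmark word per polarity.
    word_freq = {"new": 10, "good": 20, "bad": 20}
    pair_freq = {frozenset(("new", "good")): 8, frozenset(("new", "bad")): 1}
    value = so_pmi("new", {"good"}, {"bad"}, 1000, word_freq, pair_freq)
    print(value, assign_dictionary(value))
```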

Process 4: the improved method of the invention is given by formulas (5) and (6),

$$\mathrm{SO}(\mathit{word}) = \max_{i} W_i(\mathit{word}) \tag{6}$$

where γ is an adjustment coefficient, w_ij denotes the j-th benchmark word in the i-th emotional category, and W_i(word) denotes the SO-PMI value of the new word word with respect to the emotion words of the i-th category. Formula (6) takes the SO-PMI values of the new word word computed across the different categories and selects the maximum, which also yields the category to which the word belongs.
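The selection in formula (6) can be sketched in Python as follows. Formula (5), which defines W_i(word) and involves γ, is not reproduced in this text, so the sketch takes the per-category values as given input rather than computing them. The function name `classify_by_max_so` and the example category names are hypothetical.

```python
def classify_by_max_so(category_scores):
    """Return (category, SO value) for the category with the largest W_i(word).

    category_scores: dict mapping emotional category -> W_i(word), assumed to
    have been computed beforehand (formula (5) is not reproduced here).
    """
    if not category_scores:
        raise ValueError("at least one category score is required")
    best_category = max(category_scores, key=category_scores.get)
    return best_category, category_scores[best_category]


if __name__ == "__main__":
    # Hypothetical per-category values for one new word.
    scores = {"happiness": 1.2, "anger": -0.3, "sadness": 0.4}
    print(classify_by_max_so(scores))
```

Returning the category together with the value reflects the point made above: taking the maximum gives both SO(word) and the fine-grained category of the new word.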

Step 6: perform rule-based sentiment orientation analysis on microblog text. This comprises the following processing. Process 1: perform word segmentation using ICTCLAS, the Chinese lexical analyzer developed by the Institute of Computing Technology, Chinese Academy of Sciences. Process 2: expand the text extraction rules to obtain the rules used by the invention, which are shown in Table 3. Process 3: shift polarity words: the emotion modified by a negation word (such as "not" or "not necessarily") is multiplied by a coefficient of -1. For sentences containing an adversative (such as "although"), sentiment analysis is performed only on the latter half of the sentence. Process 4: handle degree adverbs, which are divided into five levels according to emotional intensity, with values between 0.5 and 3. Process 5: analyze the structures "negation word + degree adverb + emotion word" and "degree adverb + negation word + emotion word" and assign them different weights.
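The rules of processes 3 to 5 can be sketched in Python on text that has already been segmented. The word lists, the five degree values, the two structure weights, and the use of the last comma as the clause boundary are all hypothetical placeholders chosen for illustration; the source states only that negation multiplies by -1, that degree values lie between 0.5 and 3, and that the two structures receive different weights.

```python
# Hypothetical English stand-ins for the Chinese word lists.
NEGATIONS = {"not", "never"}
ADVERSATIVES = {"although", "but"}
# Five ASSUMED degree levels with values in the stated range 0.5 to 3.
DEGREE = {"slightly": 0.5, "fairly": 1.0, "very": 1.5, "extremely": 2.0, "utterly": 3.0}
# ASSUMED weights for the two modifier orders; the source gives no values.
STRUCTURE_WEIGHT = {("neg", "deg"): 0.5, ("deg", "neg"): 1.0}


def latter_clause(tokens):
    """Keep only the tokens after the last comma when an adversative is present."""
    if not any(t in ADVERSATIVES for t in tokens) or "," not in tokens:
        return tokens
    last_comma = len(tokens) - 1 - tokens[::-1].index(",")
    return tokens[last_comma + 1:]


def score_tokens(tokens, lexicon):
    """Return the modified emotional value of each emotion word in the sentence."""
    scores = []
    sign, degree, order = 1.0, 1.0, []
    for token in latter_clause(tokens):
        if token in NEGATIONS:
            sign *= -1.0  # polarity shift: multiply by -1
            order.append("neg")
        elif token in DEGREE:
            degree *= DEGREE[token]
            order.append("deg")
        elif token in lexicon:
            weight = STRUCTURE_WEIGHT.get(tuple(order), 1.0)
            scores.append(lexicon[token] * sign * degree * weight)
            sign, degree, order = 1.0, 1.0, []  # modifiers apply to one word only
    return scores


if __name__ == "__main__":
    lexicon = {"happy": 0.8, "sad": -0.6}
    print(score_tokens(["not", "very", "happy"], lexicon))
    print(score_tokens(["although", "sad", ",", "very", "happy"], lexicon))
```

Tracking the order of the modifiers is what lets the sketch distinguish "negation word + degree adverb" from "degree adverb + negation word" before the emotion word is reached.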

Step 7: execute the sentiment analysis algorithm based on weight factors. The text sentiment orientation classification algorithm based on weighting factors (WF-SO) of the invention is shown in formula (7),

where SO(S) is the sentiment orientation value of sentence S, W_ij denotes the emotional value of the i-th emotion word W belonging to emotional category j, C_i is the weight factor modifying that emotion word, and α is an adjustment coefficient. When α is 1, the orientation of the text is the category of the emotion words that occur most frequently in the sentence; as α tends to infinity, the orientation of the text is the category of the word with the greatest emotional intensity in the sentence.
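Formula (7) is not reproduced in this text, and the sketch below does not attempt to reconstruct it. It only illustrates in Python the two limiting behaviours described above: the most frequent category when α is 1, and the category of the most intense word as α tends to infinity. The function names and the input layout (a list of pairs of category and weighted value C_i · W_ij) are hypothetical.

```python
from collections import Counter


def orientation_alpha_one(emotion_words):
    """Limiting case alpha = 1: category whose emotion words occur most often.

    emotion_words: list of (category, weighted_value) pairs for one sentence,
    where weighted_value stands for C_i * W_ij.
    """
    if not emotion_words:
        raise ValueError("the sentence contains no emotion words")
    counts = Counter(category for category, _ in emotion_words)
    return counts.most_common(1)[0][0]


def orientation_alpha_infinite(emotion_words):
    """Limiting case alpha -> infinity: category of the most intense word."""
    if not emotion_words:
        raise ValueError("the sentence contains no emotion words")
    category, _ = max(emotion_words, key=lambda pair: abs(pair[1]))
    return category


if __name__ == "__main__":
    # Hypothetical sentence: two mild "happiness" words and one strong "anger" word.
    words = [("happiness", 0.3), ("happiness", 0.2), ("anger", -0.9)]
    print(orientation_alpha_one(words))
    print(orientation_alpha_infinite(words))
```

The example shows why α matters: the same sentence is labelled "happiness" by frequency and "anger" by intensity, and intermediate values of α trade off between the two.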

Table 2. Comparison of macro-average experimental results

Table 3. Text extraction rules

The invention uses the data provided by the Chinese microblog orientation analysis evaluation of NLP&CC (Natural Language Processing & Chinese Computing) 2013. Following the NLP&CC requirements, emotion sentences are identified and classified. The experiment yields a precision of 0.3420, a recall of 0.8873, and an F value of 0.4935. Although the precision of the invention is relatively low, its method of building the sentiment dictionary has the following advantages: it is not limited by corpus size, can run fully unsupervised, and is well suited to the large volumes of unlabeled data found on microblogs.

The macro average is the arithmetic mean of the performance indicators of each emotion class, and the micro average is the arithmetic mean of the performance indicators of each instance document. In the microblog emotion classification experiment (like, happiness, anger, sadness, fear, disgust, surprise), the invention obtains a micro-average precision, recall, and F value of 0.3332, 0.2959, and 0.3134 respectively, and a macro-average precision, recall, and F value of 0.3411, 0.2232, and 0.2698 respectively. Overall, the results are fairly satisfactory. Table 2 compares the method of the invention with other methods on the macro-average indicators.
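The macro average described above can be sketched in Python as follows. The source does not define the F value, so the sketch assumes it is the usual harmonic mean of precision and recall; the function names and the per-class figures in the example are hypothetical.

```python
def f_value(precision, recall):
    """ASSUMED definition: harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(per_class):
    """Arithmetic mean of per-class precision and recall, plus the resulting F value.

    per_class: dict mapping emotion class -> (precision, recall)
    """
    if not per_class:
        raise ValueError("at least one class is required")
    n = len(per_class)
    precision = sum(p for p, _ in per_class.values()) / n
    recall = sum(r for _, r in per_class.values()) / n
    return precision, recall, f_value(precision, recall)


if __name__ == "__main__":
    # Hypothetical per-class figures for two emotion classes.
    print(macro_average({"happiness": (0.5, 0.5), "anger": (1.0, 0.5)}))
    # F values recomputed from the reported average precision and recall.
    print(round(f_value(0.3332, 0.2959), 4), round(f_value(0.3411, 0.2232), 4))
```

Under this assumption, the reported micro-average and macro-average F values agree with the harmonic mean of the corresponding reported precision and recall.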

Claims

1. A method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary, characterized in that the method comprises the following steps:

Step (1): obtain a Chinese dictionary corresponding to the ANEW dictionary by translation;

Step (2): vocabulary screening. Delete from the Chinese dictionary obtained in step (1) the words that are unsuitable for sentiment analysis;

Step (3): normalize the emotional values so that the emotional value of each word lies between -1 and 1. The normalization formula is expressed as:

where Avevalue denotes the average emotional intensity, Maxvalue denotes the maximum intensity in the category to which the emotion word belongs, Minvalue denotes the minimum intensity in that category, X denotes the emotional intensity of the word, and Y denotes the emotional intensity after normalization;

Step (4): expand the sentiment dictionary based on the extended edition of the synonym thesaurus Tongyici Cilin. Filter out of the corpus the words with an extremely low frequency of occurrence, so as to build a more efficient sentiment dictionary; compute lexical semantic similarity using the word-similarity algorithm of the Harbin Institute of Technology synonym thesaurus, in combination with existing semantic dictionaries;

Step (5): expand the dictionary based on an improved SO-PMI algorithm. The specific processing is as follows:

According to the following formula:

$$\mathrm{SO}(\mathit{word}) = \max_{i} W_i(\mathit{word})$$

where γ is an adjustment coefficient, w_ij denotes the j-th benchmark word in the i-th emotional category, and W_i(word) denotes the SO-PMI value of the new word word with respect to the emotion words of the i-th category;

Among the SO-PMI values computed for the new word word across the different categories, the maximum is selected;

Step (6): perform rule-based sentiment orientation analysis on microblog text, including word segmentation, expanding the text extraction rules, shifting polarity words, handling degree adverbs, and analyzing the structures "negation word + degree adverb + emotion word" and "degree adverb + negation word + emotion word", to which different weights are assigned;

Step (7): execute the sentiment analysis algorithm based on weight factors, which is expressed as:

where SO(S) is the sentiment orientation value of sentence S, W_ij denotes the emotional value of the i-th emotion word W belonging to emotional category j, C_i is the weight factor modifying that emotion word, and α is an adjustment coefficient.