CN105956095B - Method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary - Google Patents
- Publication number
- CN105956095B (application number CN201610286515.1A)
- Authority
- CN
- China
- Prior art keywords
- word
- dictionary
- sentiment
- emotional
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary. The method comprises: step (1), obtaining the Chinese dictionary corresponding to the ANEW dictionary by translation; step (2), vocabulary screening, in which words unsuitable for sentiment analysis are deleted from the Chinese dictionary obtained in step (1); step (3), normalizing the emotional values so that the emotional value of each word lies between -1 and 1; step (4), expanding the sentiment dictionary based on the extended edition of the Tongyici Cilin synonym thesaurus; step (5), expanding the dictionary based on an improved SO-PMI algorithm; step (6), performing rule-based sentiment orientation analysis on microblog text; and step (7), executing a sentiment analysis algorithm based on weight factors. Compared with the prior art, the invention is not limited by corpus size and can run fully unsupervised, which makes it well suited to the large volumes of unlabeled data found on microblogs.
Description
Technical field
The invention belongs to the field of data mining and information retrieval, and in particular relates to a psychological early-warning model based on a fine-grained sentiment dictionary.
Background technique
At present, most prior-art research on text analysis addresses sentiment analysis of English text, including polarity dictionaries, contextual relation converters and the like. Chinese, however, has a very large vocabulary, so manually annotating Chinese affective resources requires an enormous amount of work. Research on how to build a Chinese sentiment vocabulary quickly from existing English resources is therefore of great significance.
Building a sentiment dictionary requires a way to measure words. The three-dimensional emotion model PAD (Pleasure-Displeasure, Arousal-Nonarousal, Dominance-Submissiveness), proposed by Mehrabian, is the most widely applied emotion model. P stands for pleasure, A for arousal and D for dominance. The emotional category represented by a word can be measured with the PAD model, as shown in Table 1.
Table 1: Examples of the affective types corresponding to each PAD dimension
Professors Margaret M. Bradley and Peter J. Lang, researchers at a psychology research center of the University of Florida, proposed a dictionary for standardizing the emotional ratings of English words: the Affective Norms for English Words (ANEW). The ANEW emotion vocabulary takes PAD as its prototype and rates written material along the three PAD dimensions. Researchers in other countries have built on ANEW to produce ratings for their own languages.
Summary of the invention
In view of the above prior art and its problems, the invention proposes a method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary. It covers the construction and expansion of a Chinese sentiment dictionary, the detection of sentiment orientation in microblog text, and psychological early warning. The invention contributes basic research, particularly in Chinese text research, sentiment analysis and online public opinion analysis, on which further text research can build. It thereby improves the efficiency of work on Chinese text and provides a psychological early-warning method that identifies the psychological and emotional state expressed in text and tracks online public opinion and users' sentiment orientation in a timely manner.
The invention proposes a method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary. The method comprises the following steps:
Step 1, obtaining the Chinese dictionary corresponding to the ANEW dictionary by translation;
Step 2, vocabulary screening: deleting from the Chinese dictionary obtained in step 1 the words that are unsuitable for sentiment analysis;
Step 3, normalizing the emotional values so that the emotional value of each word lies between -1 and 1. In the normalization formula, Avevalue denotes the average emotional intensity, Maxvalue denotes the maximum intensity in the category to which the emotion word belongs, Minvalue denotes the minimum intensity in that category, X denotes the emotional intensity of the word, and Y denotes the emotional intensity after normalization;
Step 4, expanding the sentiment dictionary based on the extended edition of the Tongyici Cilin synonym thesaurus;
Step 5, expanding the dictionary based on an improved SO-PMI algorithm, processed according to the following formula:
$$SO(word) = \max_i W_i(word)$$
where γ is an adjustment coefficient, $w_{ij}$ denotes the j-th benchmark word in the i-th emotional category, and $W_i(word)$ denotes the SO-PMI value of the new word with respect to the emotion words of the i-th category;
the SO-PMI value of the new word is computed for each category, and the category giving the maximum SO-PMI value is selected for the new word;
Step 6, performing rule-based sentiment orientation analysis on microblog text, including: word segmentation; expanding the text extraction rules; shifting the polarity of words; handling degree adverbs; and analyzing the structures "negation word + degree adverb + emotion word" and "degree adverb + negation word + emotion word", which are assigned different weights;
Step 7, executing the sentiment analysis algorithm based on weight factors, in which SO(S) is the sentiment orientation value of sentence S, $W_{ij}$ is the emotional value of the i-th emotion word W, belonging to emotional category j, $C_i$ is the weight factor modifying that emotion word, and α is an adjustment coefficient.
Compared with the prior art, the above technical solution has the following advantages: it is not limited by corpus size, it can run fully unsupervised, and it is well suited to the large volumes of unlabeled data found on microblogs.
Brief description of the drawings
Fig. 1 is a schematic overall flowchart of the method of the invention for constructing a psychological early-warning model based on a fine-grained sentiment dictionary.
Detailed description of the embodiments
The technical solution of the invention is described in further detail below with reference to the drawing and specific embodiments.
As shown in Fig. 1, the method of the invention for constructing a psychological early-warning model based on a fine-grained sentiment dictionary specifically comprises the following steps:
Step 1, translation: the Chinese dictionary corresponding to the ANEW dictionary is obtained by combining manual and machine translation. This step comprises the following processing:
Process 1: collate and merge all lexical information in the ANEW dictionary, and remove English past-tense word forms, which cannot be expressed in Chinese. Process 2: obtain a bilingual (Chinese-English) table by machine translation, and during translation check each word individually to confirm its accuracy and avoid serious ambiguity. Process 3: correct entries that are inconsistent between the Chinese and English dictionaries, selecting the option best suited to sentiment analysis for inclusion in the dictionary. The result is a Chinese vocabulary of a certain scale;
Step 2, vocabulary screening: some words unsuitable for sentiment analysis are deleted. Specifically, words that are expressed differently in Chinese and English are deleted, because such words would affect the sentiment analysis results;
Step 3, the emotional values are normalized so that the emotional value of each word lies between -1 and 1. Specifically: the rating scale for emotion words in ANEW ranges from 1 to 8, with increasing values representing the progression from negative to positive emotional intensity. Taking into account the polar opposition of emotion words and the rating standard of the PAD dimension values, the intensity value of each word is normalized using normalization formula (1), where Avevalue denotes the average emotional intensity, Maxvalue denotes the maximum intensity in the category to which the emotion word belongs, Minvalue denotes the minimum intensity in that category, X denotes the emotional intensity of the word, and Y denotes the emotional intensity after normalization;
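Formula (1) itself does not appear in this text, so the following Python sketch only illustrates one plausible reading of the quantities defined above: a piecewise-linear mapping that sends Minvalue to -1, Avevalue to 0 and Maxvalue to 1. The function name `normalize_emotion` and the piecewise form are assumptions made for illustration, not the formula of the invention.

```python
def normalize_emotion(x, ave_value, max_value, min_value):
    """Map a raw intensity x onto [-1, 1].

    Assumed piecewise-linear form (not taken from the patent):
    min_value -> -1, ave_value -> 0, max_value -> 1.
    """
    if not min_value <= x <= max_value:
        raise ValueError("x must lie between min_value and max_value")
    # Scale each side of the average by its own half-range.
    span = (max_value - ave_value) if x >= ave_value else (ave_value - min_value)
    return 0.0 if span == 0 else (x - ave_value) / span


# Illustrative values on the 1~8 scale described above; 4.5 is a made-up average.
print(normalize_emotion(8.0, 4.5, 8.0, 1.0))
print(normalize_emotion(1.0, 4.5, 8.0, 1.0))
```

Scaling the two sides of the average separately keeps the average at exactly 0 even when it is not the midpoint of the scale, which is one way to respect the polar opposition of emotion words mentioned in this step.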
Step 4, the sentiment dictionary is expanded based on the extended edition of the Tongyici Cilin synonym thesaurus. Process 1: microblog data is selected as the corpus, and words with extremely low frequency are filtered out, so as to build a more efficient sentiment dictionary. Process 2: the word similarity algorithm of the Harbin Institute of Technology Chinese thesaurus is used, in combination with existing semantic dictionaries, to compute the semantic similarity of words;
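The two processes of step 4 can be sketched as follows. This is a simplified illustration, not the thesaurus similarity algorithm itself: plain synonym groups stand in for the thesaurus, each new synonym simply inherits the mean value of the seed words in its group, and the names `expand_with_synonyms` and `min_freq` as well as the toy English data are all hypothetical.

```python
from collections import Counter


def expand_with_synonyms(seed, synonym_groups, corpus_tokens, min_freq=2):
    """Add synonyms of seed words to the lexicon, skipping rare words.

    seed: {word: emotional value}; synonym_groups: iterable of word sets
    (a stand-in for thesaurus entries); corpus_tokens: list of tokens.
    """
    freq = Counter(corpus_tokens)
    expanded = dict(seed)
    for group in synonym_groups:
        known = [seed[w] for w in group if w in seed]
        if not known:
            continue  # no seed word in this group, nothing to propagate
        mean_value = sum(known) / len(known)
        for w in group:
            # Process 1: filter out words that are too rare in the corpus.
            if w not in expanded and freq[w] >= min_freq:
                expanded[w] = mean_value
    return expanded


seed = {"happy": 0.8, "sad": -0.7}
groups = [{"happy", "glad", "joyful"}, {"sad", "gloomy"}]
tokens = ["glad", "glad", "gloomy", "joyful", "gloomy", "happy"]
lexicon = expand_with_synonyms(seed, groups, tokens)
print(lexicon)
```

In this toy run "joyful" is left out because it occurs only once in the corpus, while "glad" and "gloomy" inherit the values of their seed synonyms.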
Step 5, the dictionary is expanded based on an improved SO-PMI algorithm. Process 1: internet neologisms are used to expand the sentiment dictionary; benchmark words for positive and negative emotion are selected and denoted PS and NS respectively. Process 2: for a new word, the PMI values with respect to the sets PS and NS are computed and denoted WP and WN respectively. The PMI value between one word and another is computed as shown in formula (2), where N denotes the total number of words in the corpus, f(word1, word2) denotes the frequency with which word1 and word2 occur together in the corpus, f(word1) denotes the frequency of word1 in the corpus, f(word2) denotes the frequency of word2 in the corpus, and log2(·) is the base-2 logarithm. In formula (2), word1 may be taken to be the new word and word2 a word in the PS or NS set (or vice versa).
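Formula (2) is not reproduced in this text. The quantities listed above match the usual count-based definition of pointwise mutual information, PMI(word1, word2) = log2(N · f(word1, word2) / (f(word1) · f(word2))), so the sketch below assumes that standard form. The function name `pmi` and the handling of zero counts are illustrative choices.

```python
import math


def pmi(n_total, f_joint, f_word1, f_word2):
    """Count-based PMI, assuming the standard definition
    log2(N * f(w1, w2) / (f(w1) * f(w2)))."""
    if f_joint == 0 or f_word1 == 0 or f_word2 == 0:
        # The words never co-occur (or one never occurs): PMI is unbounded below.
        return float("-inf")
    return math.log2(n_total * f_joint / (f_word1 * f_word2))


# Made-up counts: the pair co-occurs four times more often than chance predicts.
print(pmi(n_total=1000, f_joint=20, f_word1=50, f_word2=100))
```

In practice a small smoothing constant is often added to the counts so that unseen pairs do not produce infinite values; that refinement is left out here.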
If the PMI value is computed between a word and a set of words, formula (3) is used, where WordSet denotes a set of words and word' is a word in WordSet;
Process 3: the SO-PMI value is computed as shown in formula (4),
$$SO(word) = PMI(word, PS) - PMI(word, NS) \tag{4}$$
where SO(word) denotes the SO-PMI value of the word. If the value is greater than 0, the word is added to the positive dictionary; if it is less than 0, the word is added to the negative dictionary; otherwise it is added to neither.
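The decision rule of formula (4) can be sketched in a few lines. Formula (3) is not reproduced in this text, so the sketch assumes that the PMI between a word and a word set is the sum of the pairwise PMI values over the set, as in the classic SO-PMI formulation. The helper names and the toy PMI table are hypothetical.

```python
def set_pmi(word, word_set, pmi):
    """PMI between a word and a word set (assumed: sum of pairwise PMI)."""
    return sum(pmi(word, w) for w in word_set)


def so_pmi(word, ps, ns, pmi):
    """Formula (4): SO(word) = PMI(word, PS) - PMI(word, NS)."""
    return set_pmi(word, ps, pmi) - set_pmi(word, ns, pmi)


def assign_dictionary(so_value):
    """Positive dictionary if SO > 0, negative if SO < 0, otherwise none."""
    if so_value > 0:
        return "positive"
    if so_value < 0:
        return "negative"
    return None


# Toy pairwise PMI table standing in for corpus statistics.
toy = {("awesome", "good"): 2.0, ("awesome", "happy"): 1.5,
       ("awesome", "bad"): -1.0, ("awesome", "sad"): 0.5}


def toy_pmi(a, b):
    return toy.get((a, b), 0.0)


so = so_pmi("awesome", ["good", "happy"], ["bad", "sad"], toy_pmi)
print(so, assign_dictionary(so))
```

Passing the pairwise PMI in as a function keeps the decision rule independent of how the corpus counts are gathered.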
Process 4: the improved method of the invention is shown in formula (5) and formula (6),
$$SO(word) = \max_i W_i(word) \tag{6}$$
where γ is an adjustment coefficient, $w_{ij}$ denotes the j-th benchmark word in the i-th emotional category, and $W_i(word)$ denotes the SO-PMI value of the new word with respect to the emotion words of the i-th category. Formula (6) states that the SO-PMI value of the new word is computed for each category and the maximum is selected, which also yields the category to which the word belongs.
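The selection in formula (6) amounts to an argmax over emotion categories. Formula (5), which defines $W_i(word)$, is not reproduced in this text, so the sketch below assumes $W_i(word) = \gamma \sum_j PMI(word, w_{ij})$ purely for illustration; the role of γ, the function names and the toy data are assumptions.

```python
def class_scores(word, benchmark_words, pmi, gamma=1.0):
    """W_i(word) for every category i.

    Assumed form (formula (5) is not available here):
    W_i(word) = gamma * sum_j PMI(word, w_ij).
    """
    return {label: gamma * sum(pmi(word, w) for w in words)
            for label, words in benchmark_words.items()}


def best_class(word, benchmark_words, pmi, gamma=1.0):
    """Formula (6): pick the category with the largest W_i(word)."""
    scores = class_scores(word, benchmark_words, pmi, gamma)
    label = max(scores, key=scores.get)
    return label, scores[label]


# Toy benchmark words per category and a toy pairwise PMI table.
benchmark = {"happiness": ["joy", "glad"], "anger": ["rage"], "sadness": ["grief"]}
toy = {("newword", "joy"): 1.0, ("newword", "glad"): 0.5,
       ("newword", "rage"): 2.5, ("newword", "grief"): -0.5}


def toy_pmi(a, b):
    return toy.get((a, b), 0.0)


print(best_class("newword", benchmark, toy_pmi))
```

Compared with the two-way rule of formula (4), this returns both a score and a fine-grained category label for each new word.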
Step 6, rule-based sentiment orientation analysis is performed on microblog text. Process 1: word segmentation is performed with ICTCLAS, the Chinese lexical analyzer developed by the Institute of Computing Technology, Chinese Academy of Sciences. Process 2: the text extraction rules are expanded to obtain the rules used by the invention, shown in Table 3. Process 3: word polarity is shifted: the sentiment modified by a negation word (such as "not" or "not necessarily") is multiplied by a coefficient of -1. For sentences containing an adversative conjunction (such as "although"), sentiment analysis is performed only on the latter half of the sentence. Process 4: degree adverbs are handled by dividing them into five levels according to emotional intensity, with values between 0.5 and 3. Process 5: the structures "negation word + degree adverb + emotion word" and "degree adverb + negation word + emotion word" are analyzed and assigned different weights.
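The rules of processes 3 to 5 can be sketched on pre-segmented tokens as follows. The word lists, the five degree values and the weights for the two three-word structures are hypothetical: the text states only that negation multiplies by -1, that degree adverbs fall into five levels between 0.5 and 3, and that the two structures receive different weights. English toy words and the marker "but" stand in for the Chinese vocabulary.

```python
NEGATIONS = {"not", "never"}
# Five hypothetical levels within the 0.5-3 range given in the text.
DEGREE = {"slightly": 0.5, "fairly": 1.0, "very": 1.5, "extremely": 2.0, "utterly": 3.0}
ADVERSATIVES = {"but", "however"}
LEXICON = {"happy": 0.75, "sad": -0.5}  # toy normalized emotional values


def modified_values(tokens):
    """Return (emotion word, modified value) pairs for one tokenized sentence."""
    # Adversative rule: analyze only the part after the last adversative marker.
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in ADVERSATIVES:
            tokens = tokens[i + 1:]
            break
    results = []
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        value = LEXICON[tok]
        prev1 = tokens[i - 1] if i >= 1 else None
        prev2 = tokens[i - 2] if i >= 2 else None
        if prev1 in DEGREE and prev2 in NEGATIONS:
            # negation + degree + emotion: softened reversal (assumed weight -0.5)
            value *= -0.5 * DEGREE[prev1]
        elif prev1 in NEGATIONS and prev2 in DEGREE:
            # degree + negation + emotion: intensified reversal (assumed weight -1)
            value *= -1.0 * DEGREE[prev2]
        elif prev1 in NEGATIONS:
            value *= -1.0  # plain negation flips polarity
        elif prev1 in DEGREE:
            value *= DEGREE[prev1]  # plain degree adverb scales intensity
        results.append((tok, value))
    return results


print(modified_values("not very happy".split()))
print(modified_values("sad but very happy".split()))
```

Checking the three-word structures before the two-word ones ensures that "not very happy" and "very not happy" are weighted differently instead of both falling through to the plain negation rule.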
Step 7, the sentiment analysis algorithm based on weight factors is executed. The algorithm of the invention (Text Sentiment Orientation classification algorithm based on Weighting Factor, WF-SO) is shown in formula (7), where SO(S) is the sentiment orientation value of sentence S, $W_{ij}$ is the emotional value of the i-th emotion word W, belonging to emotional category j, $C_i$ is the weight factor modifying that emotion word, and α is an adjustment coefficient. When α is 1, the orientation of the text is the category of the emotion words that occur most often in the sentence; as α tends to infinity, the orientation of the text is the category of the word with the greatest emotional intensity in the sentence.
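Formula (7) is not reproduced in this text. The sketch below uses an assumed scoring form, score_j = Σ_i |C_i · W_ij|^(α-1) followed by an argmax over categories j, chosen only because it reproduces the two limiting cases stated above: α = 1 counts occurrences per category, and a large α lets the strongest single word dominate. The function name `wf_so` and the input layout are hypothetical.

```python
from collections import defaultdict


def wf_so(emotion_hits, alpha=1.0):
    """Pick the emotion category of a sentence.

    emotion_hits: list of (category, emotional value W_ij, weight factor C_i).
    Assumed score (not the patent's formula): sum of |C_i * W_ij| ** (alpha - 1).
    """
    scores = defaultdict(float)
    for category, value, weight in emotion_hits:
        scores[category] += abs(weight * value) ** (alpha - 1)
    return max(scores, key=scores.get) if scores else None


# Two mild "happiness" words versus one strong, intensified "anger" word.
hits = [("happiness", 0.25, 1.0), ("happiness", 0.25, 1.0), ("anger", 0.5, 1.5)]
print(wf_so(hits, alpha=1))   # most frequent category wins
print(wf_so(hits, alpha=10))  # strongest word dominates
```

Intermediate values of α trade off how often a category appears against how intense its words are.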
Table 2: Macro-average comparison of experimental results
Table 3: Text extraction rules
The invention uses the data provided by the Chinese microblog sentiment analysis evaluation of NLP&CC (Natural Language Processing & Chinese Computing) 2013. Following the NLP&CC requirements, emotional sentences are identified and classified. The experiment yields a precision of 0.3420, a recall of 0.8873 and an F value of 0.4935. Although the precision is relatively low, the dictionary construction method of the invention has the advantage of not being limited by corpus size and of running fully unsupervised, which makes it well suited to the large volumes of unlabeled data found on microblogs.
The macro average is the arithmetic mean of the performance indicators of each emotion class, and the micro average is the arithmetic mean of the performance indicators over each instance (document). In the microblog emotion classification experiment (like, happiness, anger, sadness, fear, disgust, surprise), the invention obtains micro-average precision, recall and F value of 0.3332, 0.2959 and 0.3134 respectively, and macro-average precision, recall and F value of 0.3411, 0.2232 and 0.2698 respectively. Overall, the results are fairly satisfactory. Table 2 compares the macro-average indicators of the method of the invention with those of other methods.
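The relationship between these indicators can be checked with a short sketch. It assumes the F value is the usual harmonic mean F = 2PR / (P + R), which is consistent with the micro- and macro-average figures reported above; the function names and the per-class counts in the second part are made up for illustration.

```python
def f_value(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of the F value)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_micro(counts):
    """counts: {class: (tp, fp, fn)} -> ((macro P, macro R), (micro P, micro R))."""
    def ratio(a, b):
        return a / b if b else 0.0

    precisions = [ratio(tp, tp + fp) for tp, fp, fn in counts.values()]
    recalls = [ratio(tp, tp + fn) for tp, fp, fn in counts.values()]
    # Macro: average the per-class indicators.
    macro = (sum(precisions) / len(precisions), sum(recalls) / len(recalls))
    # Micro: pool the counts over all instances first.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = (ratio(tp, tp + fp), ratio(tp, tp + fn))
    return macro, micro


# Reported micro- and macro-average precision and recall from the text.
print(round(f_value(0.3332, 0.2959), 4))
print(round(f_value(0.3411, 0.2232), 4))

# Hypothetical per-class counts, only to show the two averaging schemes.
macro, micro = macro_micro({"a": (1, 1, 0), "b": (3, 0, 1)})
print(macro, micro)
```

Macro averaging weights every emotion class equally, so rare classes influence it as much as common ones, whereas micro averaging is dominated by the classes with the most instances.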
Claims (1)
1. A method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary, characterized in that the method comprises the following steps:
Step (1), obtaining the Chinese dictionary corresponding to the ANEW dictionary by translation;
Step (2), vocabulary screening: deleting from the Chinese dictionary obtained in step (1) the words that are unsuitable for sentiment analysis;
Step (3), normalizing the emotional values so that the emotional value of each word lies between -1 and 1, wherein, in the normalization formula, Avevalue denotes the average emotional intensity, Maxvalue denotes the maximum intensity in the category to which the emotion word belongs, Minvalue denotes the minimum intensity in that category, X denotes the emotional intensity of the word, and Y denotes the emotional intensity after normalization;
Step (4), expanding the sentiment dictionary based on the extended edition of the Tongyici Cilin synonym thesaurus: filtering out words with extremely low frequency in the corpus so as to build a more efficient sentiment dictionary; and using the word similarity algorithm of the Harbin Institute of Technology Chinese thesaurus, in combination with existing semantic dictionaries, to compute the semantic similarity of words;
Step (5), expanding the dictionary based on an improved SO-PMI algorithm, processed according to the following formula:
$$SO(word) = \max_i W_i(word)$$
wherein γ is an adjustment coefficient, $w_{ij}$ denotes the j-th benchmark word in the i-th emotional category, and $W_i(word)$ denotes the SO-PMI value of the new word with respect to the emotion words of the i-th category;
the SO-PMI value of the new word is computed for each category, and the category giving the maximum SO-PMI value is selected for the new word;
Step (6), performing rule-based sentiment orientation analysis on microblog text, including: word segmentation; expanding the text extraction rules; shifting the polarity of words; handling degree adverbs; and analyzing the structures "negation word + degree adverb + emotion word" and "degree adverb + negation word + emotion word", which are assigned different weights;
Step (7), executing the sentiment analysis algorithm based on weight factors, wherein SO(S) is the sentiment orientation value of sentence S, $W_{ij}$ is the emotional value of the i-th emotion word W, belonging to emotional category j, $C_i$ is the weight factor modifying that emotion word, and α is an adjustment coefficient.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610286515.1A CN105956095B (en) | 2016-04-29 | 2016-04-29 | Method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105956095A CN105956095A (en) | 2016-09-21 |
CN105956095B true CN105956095B (en) | 2019-11-05 |
Family
ID=56914867
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610286515.1A Active CN105956095B (en) | Method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary | 2016-04-29 | 2016-04-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105956095B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106708805A (en) * | 2016-12-30 | 2017-05-24 | 深圳天珑无线科技有限公司 | Text statistics-based psychoanalysis method and device |
CN109408798B (en) * | 2018-07-27 | 2021-09-14 | 昆明理工大学 | Word emotional tendency judgment method |
CN109284499A (en) * | 2018-08-01 | 2019-01-29 | 数据地平线(广州)科技有限公司 | A kind of industry text emotion acquisition methods, device and storage medium |
CN109740044B (en) * | 2018-12-24 | 2023-03-21 | 东华大学 | Enterprise transaction early warning method based on time series intelligent prediction |
CN109597999B (en) * | 2018-12-26 | 2021-09-07 | 青海大学 | Extraction modeling method and device for behavior semantic relation of emotional words |
CN111538834A (en) * | 2020-01-21 | 2020-08-14 | 中国银联股份有限公司 | Emotion dictionary construction method and system, emotion recognition method and system and storage medium |
CN112199956B (en) * | 2020-11-02 | 2023-03-24 | 天津大学 | Entity emotion analysis method based on deep representation learning |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104090864A (en) * | 2014-06-09 | 2014-10-08 | 合肥工业大学 | Emotion dictionary building and emotion calculation method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150213002A1 (en) * | 2014-01-24 | 2015-07-30 | International Business Machines Corporation | Personal emotion state monitoring from social media |
Also Published As
Publication number | Publication date |
---|---|
CN105956095A (en) | 2016-09-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||