CN105956095B - Method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary - Google Patents

Method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary

Info

Publication number
CN105956095B
Authority
CN
China
Prior art keywords
word
dictionary
sentiment
emotional
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610286515.1A
Other languages
Chinese (zh)
Other versions
CN105956095A (en)
Inventor
于瑞国
林榆旺
王建荣
于健
喻梅
刘江月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201610286515.1A
Publication of CN105956095A
Application granted
Publication of CN105956095B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary. The method comprises: step (1), obtaining the Chinese dictionary corresponding to the ANEW dictionary by translation; step (2), vocabulary screening, in which words unsuitable for sentiment analysis are deleted from the Chinese dictionary obtained in step (1); step (3), normalizing the emotional values so that the emotional value of each word lies between -1 and 1; step (4), expanding the sentiment dictionary based on the extended edition of the Tongyici Cilin synonym thesaurus; step (5), expanding the dictionary based on an improved SO-PMI algorithm; step (6), performing rule-based sentiment orientation analysis on microblog text; step (7), executing a sentiment analysis algorithm based on weight factors. Compared with the prior art, the invention is not limited by the size of the corpus, can run fully unsupervised, and is well suited to the large volume of unlabeled data found on microblogs.

Description

Method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary
Technical field
The invention belongs to the field of data mining and information retrieval, and more particularly relates to a psychological early-warning model based on a fine-grained sentiment dictionary.
Background art
At present, most prior-art research on text analysis addresses sentiment analysis of English text, including polarity dictionaries, contextual shifters, and so on. However, because Chinese has a large vocabulary, annotating Chinese affective resources by hand requires a huge amount of work. Research on how to rapidly build a Chinese emotion lexicon from existing English resources is therefore of great significance.
Building a sentiment dictionary requires a way to measure words. The three-dimensional emotion model PAD (Pleasure-Displeasure, Arousal-Nonarousal, Dominance-Submissiveness), proposed by Mehrabian, is the most widely applied emotion model. P denotes pleasure, A denotes arousal, and D denotes dominance. The emotional category represented by a word can be measured with the PAD model, as shown in Table 1:
Table 1. Examples of the emotion types corresponding to each PAD dimension
Professors Margaret M. Bradley and Peter J. Lang, researchers at a psychology research center of the University of Florida, proposed a lexicon for standardizing the emotion ratings of English words: the Affective Norms for English Words (ANEW). The ANEW emotion lexicon takes PAD as its prototype and rates written material on the three PAD dimensions. Researchers in many countries have also built on ANEW to produce ratings for their own languages.
Summary of the invention
In view of the prior art described above and its problems, the invention proposes a method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary. It covers the construction and expansion of a Chinese sentiment dictionary, the detection of sentiment orientation in microblog text, and psychological early warning. It contributes basic groundwork to research directions such as Chinese text analysis, sentiment analysis, and online public-opinion analysis, and supports further research on text, so as to improve the efficiency of work on Chinese text and to provide a psychological early-warning method that detects the psychological state expressed in text and tracks online public opinion and users' sentiment orientation in a timely manner.
The invention proposes a method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary, comprising the following steps:
Step 1: obtain the Chinese dictionary corresponding to the ANEW dictionary by translation;
Step 2: vocabulary screening, deleting from the Chinese dictionary obtained in step 1 the words that are unsuitable for sentiment analysis;
Step 3: normalize the emotional values so that the emotional value of each word lies between -1 and 1. In the normalization formula, Avevalue denotes the average emotional intensity, Maxvalue the maximum intensity of the category to which the emotion word belongs, Minvalue the minimum intensity of that category, X the emotional intensity of the word, and Y the normalized emotional intensity;
Step 4: expand the sentiment dictionary based on the extended edition of the Tongyici Cilin synonym thesaurus;
Step 5: expand the dictionary based on an improved SO-PMI algorithm, according to the following formula:
SO(word) = max_{w_ij} (W_i(word))
where γ is an adjustment coefficient, w_ij denotes the j-th benchmark word in the i-th emotional category, and W_i(word) denotes the SO-PMI value of the new word `word` with respect to the emotion words of the i-th category;
Among the SO-PMI values computed for the new word `word` in the different categories, the maximum is selected;
Step 6: perform rule-based sentiment orientation analysis on microblog text. This includes word segmentation, expanding the text extraction rules, shifting the polarity of words, handling degree adverbs, and analyzing the two structures "negation word + degree adverb + emotion word" and "degree adverb + negation word + emotion word", which are assigned different weights;
Step 7: execute the sentiment analysis algorithm based on weight factors. In the formula of this algorithm, SO(S) is the sentiment orientation value of sentence S, W_ij denotes the emotional value of the i-th emotion word W, which belongs to emotional category j, C_i is the weight factor that modifies that emotion word, and α is an adjustment coefficient.
Compared with the prior art, the above technical solution has the following advantages: it is not limited by the size of the corpus, can run fully unsupervised, and is well suited to the large volume of unlabeled data found on microblogs.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall flow of the method of the invention for constructing a psychological early-warning model based on a fine-grained sentiment dictionary.
Detailed description
The technical solution of the invention is described in further detail below with reference to the drawing and specific embodiments.
As shown in Fig. 1, the method of the invention for constructing a psychological early-warning model based on a fine-grained sentiment dictionary comprises the following steps:
Step 1: translation. The Chinese dictionary corresponding to the ANEW dictionary is obtained by combining manual and machine translation. This step comprises the following processing:
Processing 1: collate and merge all lexical information in the ANEW dictionary, removing past-tense English words, which cannot be expressed in Chinese. Processing 2: obtain a bilingual table by machine translation, and during translation check the entries word by word to confirm their accuracy and avoid serious ambiguity. Processing 3: correct the entries that are inconsistent between the Chinese and English dictionaries, selecting the option best suited to sentiment analysis and adding it to the dictionary. The result is a Chinese vocabulary of a certain scale;
Step 2: vocabulary screening. Words that are unsuitable for sentiment analysis are deleted. Specifically, words whose expression differs between Chinese and English are deleted, since such words would distort the results of sentiment analysis;
Step 3: normalize the emotional values so that the emotional value of each word lies between -1 and 1. Specifically: the rating scale for emotion words in ANEW ranges from 1 to 8, and increasing values represent the range from negative emotional intensity to positive emotional intensity. Taking into account the polar opposition of emotion words and the rating standard of the PAD dimension values, the intensity value of each word is normalized using normalization formula (1),
where Avevalue denotes the average emotional intensity, Maxvalue the maximum intensity of the category to which the emotion word belongs, Minvalue the minimum intensity of that category, X the emotional intensity of the word, and Y the normalized emotional intensity;
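Formula (1) itself is not reproduced in this text, so the following Python sketch only illustrates one plausible reading of the variable definitions above: a piecewise-linear rescaling around the category average that sends Minvalue to -1, Avevalue to 0, and Maxvalue to 1. The function name `normalize_intensity` and the piecewise form are assumptions made for illustration, not the patented formula.

```python
def normalize_intensity(x, ave_value, max_value, min_value):
    """Map a raw emotional intensity x into [-1, 1].

    ASSUMPTION: piecewise-linear rescaling around the category average.
    The patent's formula (1) is not available in this text; this is only
    one mapping that matches the stated output range and variables.
    """
    if not (min_value < ave_value < max_value):
        raise ValueError("expected min_value < ave_value < max_value")
    if x >= ave_value:
        # Positive side: the average maps to 0, the maximum to +1.
        y = (x - ave_value) / (max_value - ave_value)
    else:
        # Negative side: the average maps to 0, the minimum to -1.
        y = (x - ave_value) / (ave_value - min_value)
    # Clamp in case x lies outside the stated rating range.
    return max(-1.0, min(1.0, y))
```

With the 1 to 8 rating range mentioned above and an assumed category average of 4.5, a raw score of 8 maps to 1.0 and a raw score of 1 maps to -1.0.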
Step 4: expand the sentiment dictionary based on the extended edition of the Tongyici Cilin synonym thesaurus. Specifically: Processing 1: take microblog data as the corpus and filter out words whose frequency of occurrence is extremely low, so as to build a more efficient sentiment dictionary. Processing 2: compute the semantic similarity of words using the word similarity algorithm of the Harbin Institute of Technology Tongyici Cilin thesaurus, combined with an existing semantic dictionary;
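A minimal Python sketch of this kind of thesaurus-based expansion is given below. It is a simplification under stated assumptions: membership in the same synonym group stands in for the thesaurus similarity computation, a new word inherits the mean value of the seed words in its group, and `expand_by_synonyms`, `synonym_groups`, and `min_freq` are hypothetical names that do not come from the patent.

```python
from collections import Counter


def expand_by_synonyms(seed_dict, synonym_groups, corpus_tokens, min_freq=2):
    """Expand a sentiment dictionary using synonym groups.

    seed_dict      : dict mapping word -> normalized emotional value
    synonym_groups : list of lists of words (hypothetical stand-in for
                     thesaurus entries)
    corpus_tokens  : list of tokens from the corpus, used for filtering
    min_freq       : words seen fewer times than this are skipped
                     (assumed threshold for "extremely low frequency")
    """
    freq = Counter(corpus_tokens)
    expanded = dict(seed_dict)
    for group in synonym_groups:
        seeds = [w for w in group if w in seed_dict]
        if not seeds:
            continue  # no known emotional value to propagate
        group_value = sum(seed_dict[w] for w in seeds) / len(seeds)
        for w in group:
            # Add only unseen words that are frequent enough in the corpus.
            if w not in expanded and freq[w] >= min_freq:
                expanded[w] = group_value
    return expanded
```

A real implementation would replace the group-membership test with the similarity score computed from the thesaurus and would use segmented Chinese tokens rather than placeholder words.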
Step 5: expand the dictionary based on an improved SO-PMI algorithm. Specifically: Processing 1: expand the sentiment dictionary with new Internet words, and select benchmark words for positive emotion and negative emotion, denoted PS and NS respectively. Processing 2: compute the PMI values of the new word `word` with the PS set and the NS set, denoted WP and WN respectively. The PMI value between two words is calculated as in formula (2),
where N denotes the total number of words in the corpus, f(word1, word2) the frequency with which word1 and word2 occur together in the corpus, f(word1) the frequency with which word1 occurs in the corpus, f(word2) the frequency with which word2 occurs in the corpus, and log2() the base-2 logarithm. In formula (2), for example, word1 may be taken as the new word and word2 as a word in the PS or NS set (or vice versa).
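Formula (2) is not reproduced in this text. The sketch below uses the standard definition of pointwise mutual information expressed with the symbols defined above, PMI(word1, word2) = log2(N · f(word1, word2) / (f(word1) · f(word2))). Treating this as the content of formula (2) is an assumption, as are the function and parameter names.

```python
import math


def pmi(n_total, f_joint, f_word1, f_word2):
    """Pointwise mutual information from raw corpus counts.

    n_total : total number of words in the corpus (N)
    f_joint : number of times word1 and word2 occur together
    f_word1 : number of occurrences of word1
    f_word2 : number of occurrences of word2

    ASSUMPTION: standard PMI definition; formula (2) of the patent is
    not available in this text.
    """
    if f_joint == 0:
        # Words that never co-occur have PMI of minus infinity.
        return float("-inf")
    return math.log2(n_total * f_joint / (f_word1 * f_word2))
```

For example, with N = 100, two words that each occur 10 times and co-occur 4 times have a PMI of log2(4) = 2.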
The PMI value between a word and a word set is calculated as in formula (3),
where WordSet denotes a set of words and word' is a word in WordSet;
Processing 3: the SO-PMI value is calculated as in formula (4),
SO(word) = PMI(word, PS) - PMI(word, NS) (4)
where SO(word) denotes the SO-PMI value of the word `word`. If the value is greater than 0, the word is added to the positive dictionary; if the value is less than 0, it is added to the negative dictionary; otherwise it is not added to any dictionary.
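The decision rule of formula (4) can be sketched in Python as follows. Two points are assumptions rather than statements from the patent: the PMI between a word and a word set is taken to be the sum of the pairwise PMI values (formula (3) is not reproduced in this text), and pairs that never co-occur are skipped instead of contributing minus infinity. All function names and the `cofreq` data layout are hypothetical.

```python
import math


def set_pmi(word, word_set, n_total, freq, cofreq):
    """PMI between a word and a set of benchmark words.

    ASSUMPTION: sum of pairwise PMI values; pairs with zero
    co-occurrence are skipped (a simple smoothing choice).
    freq   : dict word -> occurrence count
    cofreq : dict frozenset({a, b}) -> co-occurrence count
    """
    total = 0.0
    for other in word_set:
        joint = cofreq.get(frozenset((word, other)), 0)
        if joint == 0:
            continue
        total += math.log2(n_total * joint / (freq[word] * freq[other]))
    return total


def so_pmi(word, ps, ns, n_total, freq, cofreq):
    """Formula (4): SO(word) = PMI(word, PS) - PMI(word, NS)."""
    return (set_pmi(word, ps, n_total, freq, cofreq)
            - set_pmi(word, ns, n_total, freq, cofreq))


def assign_dictionary(so_value):
    """Return the target dictionary for a word, or None if SO is 0."""
    if so_value > 0:
        return "positive"
    if so_value < 0:
        return "negative"
    return None
```

A word that co-occurs mainly with positive benchmark words therefore receives a positive SO value and is added to the positive dictionary.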
Processing 4: the improved method of the invention is given by formula (5) and formula (6),
SO(word) = max_{w_ij} (W_i(word)) (6)
where γ is an adjustment coefficient, w_ij denotes the j-th benchmark word in the i-th emotional category, and W_i(word) denotes the SO-PMI value of the new word `word` with respect to the emotion words of the i-th category. Formula (6) states that, among the SO-PMI values computed for the new word `word` in the different categories, the maximum is selected, which also yields the category to which the word belongs.
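The selection described by formula (6) amounts to taking the category with the largest per-category score. The sketch below assumes the values W_i(word) have already been computed (formula (5), which involves γ, is not reproduced in this text) and only shows the selection step; `classify_new_word` and the category labels are hypothetical.

```python
def classify_new_word(class_scores):
    """Pick the emotional category with the largest SO-PMI value.

    class_scores : dict mapping category name -> W_i(word), assumed to
                   be precomputed elsewhere.
    Returns (best_category, best_score).
    """
    if not class_scores:
        raise ValueError("class_scores must not be empty")
    best_category = max(class_scores, key=class_scores.get)
    return best_category, class_scores[best_category]
```

For instance, a new word scored as `{"happiness": 1.2, "anger": -0.3, "sadness": 0.4}` would be assigned to the category "happiness" with SO(word) = 1.2.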
Step 6: perform rule-based sentiment orientation analysis on microblog text. Specifically: Processing 1: perform word segmentation using ICTCLAS, the Chinese lexical analyzer developed by the Institute of Computing Technology, Chinese Academy of Sciences. Processing 2: expand the text extraction rules to obtain the rules used by the invention, which are shown in Table 3. Processing 3: shift the polarity of words: the emotion of a word modified by a negation word (such as "not" or "may not") is multiplied by a coefficient of -1; for sentences in which an adversative (such as "although") occurs, sentiment analysis is performed only on the latter half of the sentence. Processing 4: handle degree adverbs, which are divided into five levels according to emotional intensity, with values between 0.5 and 3. Processing 5: analyze the structures "negation word + degree adverb + emotion word" and "degree adverb + negation word + emotion word" and assign them different weights.
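The rules in step 6 can be sketched on pre-segmented tokens as follows. Everything concrete in this sketch is an assumption for illustration: the tiny English word lists, the five degree values, and the weight `NEG_DEGREE_WEIGHT` used for the "negation + degree + emotion" structure are hypothetical, since Table 3 and the actual weights are not reproduced in this text. Segmentation with ICTCLAS is outside the sketch.

```python
# Hypothetical toy lexicons; real ones would hold Chinese words.
EMOTION = {"good": 0.6, "bad": -0.6}
NEGATIONS = {"not"}
ADVERSATIVES = {"but"}
# Five assumed degree levels with values between 0.5 and 3.
DEGREES = {"slightly": 0.5, "fairly": 1.5, "very": 2.0,
           "extremely": 2.5, "most": 3.0}
# Assumed weight for "negation + degree + emotion" (e.g. "not very good").
NEG_DEGREE_WEIGHT = 0.5


def apply_modifiers(value, modifiers):
    """Apply the negation and degree words that precede an emotion word."""
    kinds = ["neg" if m in NEGATIONS else "deg" for m in modifiers]
    if kinds == ["neg", "deg"]:
        # Partial negation: weaker than a plain polarity flip.
        return -NEG_DEGREE_WEIGHT * value
    if kinds == ["deg", "neg"]:
        # Intensified negation, e.g. "very not good".
        return -DEGREES[modifiers[0]] * value
    for m in modifiers:
        value *= -1 if m in NEGATIONS else DEGREES[m]
    return value


def score_tokens(tokens):
    """Return the modified emotional value of each emotion word."""
    # Adversative rule: analyse only the part after the last adversative.
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in ADVERSATIVES:
            tokens = tokens[i + 1:]
            break
    scores, modifiers = [], []
    for tok in tokens:
        if tok in NEGATIONS or tok in DEGREES:
            modifiers.append(tok)
        elif tok in EMOTION:
            scores.append(apply_modifiers(EMOTION[tok], modifiers))
            modifiers = []
        else:
            modifiers = []  # an unrelated word breaks the modifier chain
    return scores
```

Under these assumed weights, `["bad", "but", "very", "good"]` yields a single positive score, because only the clause after the adversative is analyzed and the degree adverb doubles the value of "good".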
Step 7: execute the sentiment analysis algorithm based on weight factors. The weight-factor-based sentiment analysis algorithm of the invention (Text sentiment orientation classification algorithm based on weighting factor, WF-SO) is given by formula (7),
where SO(S) is the sentiment orientation value of sentence S, W_ij denotes the emotional value of the i-th emotion word W, which belongs to emotional category j, C_i is the weight factor that modifies that emotion word, and α is an adjustment coefficient. When α is 1, the orientation of the text is the category of the emotion words that occur most frequently in the sentence; as α tends to infinity, the orientation of the text is the category of the word with the greatest emotional intensity in the sentence.
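Formula (7) is not reproduced in this text. The following Python sketch uses a power-sum aggregation, score_j = Σ_i |C_i · W_ij|^α, chosen only because it behaves like the two limiting cases described above: at α = 1 the score of a category grows with the number of its weighted emotion words, and for large α the single strongest word dominates. This form, the function name `wf_so`, and the input layout are all assumptions, not the patented formula.

```python
def wf_so(emotion_hits, alpha=1.0):
    """Choose the emotional category of a sentence.

    emotion_hits : list of (category, emotional_value, weight_factor)
                   tuples, one per emotion word found in the sentence.
    alpha        : adjustment coefficient (>= 1).

    ASSUMPTION: power-sum aggregation per category; the actual
    formula (7) is not available in this text.
    Returns (best_category or None, per-category scores).
    """
    scores = {}
    for category, value, weight in emotion_hits:
        contribution = abs(weight * value) ** alpha
        scores[category] = scores.get(category, 0.0) + contribution
    if not scores:
        return None, scores
    best = max(scores, key=scores.get)
    return best, scores
```

With three mild "happiness" words (value 0.3 each) and one strong "anger" word (value 0.8), this sketch selects "happiness" at α = 1 and "anger" at α = 10, mirroring the behavior the description attributes to α.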
Table 2. Comparison of macro-average experimental results
Table 3. Text extraction rules
The invention uses the data provided by the Chinese microblog sentiment orientation analysis evaluation of NLP&CC (Natural Language Processing & Chinese Computing) 2013. Following the requirements of NLP&CC, emotional sentences are identified and classified. In the experiment the precision is 0.3420, the recall is 0.8873, and the F value is 0.4935. Although the precision of the invention is relatively low, its method of building the sentiment dictionary has the advantage that it is not limited by the size of the corpus, can run fully unsupervised, and is well suited to the large volume of unlabeled data found on microblogs.
The macro average is the arithmetic mean of the performance metrics of each emotion class, and the micro average is the arithmetic mean of the performance metrics of each instance document. In the microblog emotion classification experiment (seven classes: like, happiness, anger, sadness, fear, disgust, surprise), the invention obtains a micro-average precision, recall, and F value of 0.3332, 0.2959, and 0.3134 respectively, and a macro-average precision, recall, and F value of 0.3411, 0.2232, and 0.2698 respectively. Overall, the results are fairly satisfactory. Table 2 compares the macro-average metrics of the method of the invention with those of other methods.
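The reported micro-average and macro-average F values are consistent with the usual harmonic mean of precision and recall. The short Python sketch below assumes that definition (the text does not state how the F value is computed) and adds the macro average as a plain arithmetic mean over classes; both function names are hypothetical.

```python
def f_value(precision, recall):
    """Harmonic mean of precision and recall (the usual F1 measure).

    ASSUMPTION: the text does not define the F value; this is the
    standard definition.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(per_class_values):
    """Arithmetic mean of a metric over the emotion classes."""
    if not per_class_values:
        raise ValueError("per_class_values must not be empty")
    return sum(per_class_values) / len(per_class_values)
```

Under this definition, `f_value(0.3332, 0.2959)` is approximately 0.3134 and `f_value(0.3411, 0.2232)` is approximately 0.2698, matching the micro-average and macro-average F values reported above.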

Claims (1)

1. A method for constructing a psychological early-warning model based on a fine-grained sentiment dictionary, characterized in that the method comprises the following steps:
Step (1): obtain the Chinese dictionary corresponding to the ANEW dictionary by translation;
Step (2): vocabulary screening, deleting from the Chinese dictionary obtained in step (1) the words that are unsuitable for sentiment analysis;
Step (3): normalize the emotional values so that the emotional value of each word lies between -1 and 1. In the normalization formula, Avevalue denotes the average emotional intensity, Maxvalue the maximum intensity of the category to which the emotion word belongs, Minvalue the minimum intensity of that category, X the emotional intensity of the word, and Y the normalized emotional intensity;
Step (4): expand the sentiment dictionary based on the extended edition of the Tongyici Cilin synonym thesaurus: filter out the words whose frequency of occurrence in the corpus is extremely low, so as to build a more efficient sentiment dictionary; compute the semantic similarity of words using the word similarity algorithm of the Harbin Institute of Technology Tongyici Cilin thesaurus, combined with an existing semantic dictionary;
Step (5): expand the dictionary based on an improved SO-PMI algorithm, according to the following formula:
SO(word) = max_{w_ij} (W_i(word))
where γ is an adjustment coefficient, w_ij denotes the j-th benchmark word in the i-th emotional category, and W_i(word) denotes the SO-PMI value of the new word `word` with respect to the emotion words of the i-th category;
Among the SO-PMI values computed for the new word `word` in the different categories, the maximum is selected;
Step (6): perform rule-based sentiment orientation analysis on microblog text, including word segmentation, expanding the text extraction rules, shifting the polarity of words, handling degree adverbs, and analyzing the structures "negation word + degree adverb + emotion word" and "degree adverb + negation word + emotion word", which are assigned different weights;
Step (7): execute the sentiment analysis algorithm based on weight factors. In the formula of this algorithm, SO(S) is the sentiment orientation value of sentence S, W_ij denotes the emotional value of the i-th emotion word W, which belongs to emotional category j, C_i is the weight factor that modifies that emotion word, and α is an adjustment coefficient.
CN201610286515.1A 2016-04-29 2016-04-29 A kind of psychological Early-warning Model construction method based on fine granularity sentiment dictionary Active CN105956095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610286515.1A CN105956095B (en) 2016-04-29 2016-04-29 A kind of psychological Early-warning Model construction method based on fine granularity sentiment dictionary

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610286515.1A CN105956095B (en) 2016-04-29 2016-04-29 A kind of psychological Early-warning Model construction method based on fine granularity sentiment dictionary

Publications (2)

Publication Number Publication Date
CN105956095A CN105956095A (en) 2016-09-21
CN105956095B true CN105956095B (en) 2019-11-05

Family

ID=56914867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610286515.1A Active CN105956095B (en) 2016-04-29 2016-04-29 A kind of psychological Early-warning Model construction method based on fine granularity sentiment dictionary

Country Status (1)

Country Link
CN (1) CN105956095B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106708805A (en) * 2016-12-30 2017-05-24 深圳天珑无线科技有限公司 Text statistics-based psychoanalysis method and device
CN109408798B (en) * 2018-07-27 2021-09-14 昆明理工大学 Word emotional tendency judgment method
CN109284499A (en) * 2018-08-01 2019-01-29 数据地平线(广州)科技有限公司 A kind of industry text emotion acquisition methods, device and storage medium
CN109740044B (en) * 2018-12-24 2023-03-21 东华大学 Enterprise transaction early warning method based on time series intelligent prediction
CN109597999B (en) * 2018-12-26 2021-09-07 青海大学 Extraction modeling method and device for behavior semantic relation of emotional words
CN111538834A (en) * 2020-01-21 2020-08-14 中国银联股份有限公司 Emotion dictionary construction method and system, emotion recognition method and system and storage medium
CN112199956B (en) * 2020-11-02 2023-03-24 天津大学 Entity emotion analysis method based on deep representation learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090864A (en) * 2014-06-09 2014-10-08 合肥工业大学 Emotion dictionary building and emotion calculation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150213002A1 (en) * 2014-01-24 2015-07-30 International Business Machines Corporation Personal emotion state monitoring from social media

Also Published As

Publication number Publication date
CN105956095A (en) 2016-09-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant