CN106503220A - Method for computing microblog emoticon emotion values based on pointwise mutual information - Google Patents
- Publication number
- CN106503220A (application CN201610961250.0A)
- Authority
- CN
- China
- Prior art keywords
- emoticon
- emotion
- word
- occurrence
- mutual information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention discloses a method for computing the emotion values of microblog emoticons based on pointwise mutual information (PMI). The steps are as follows: (1) crawl Sina Weibo microblogs at large scale and filter them, retaining only microblogs that contain both an emoticon and an emotion word; (2) preprocess the microblogs, combining each emotion word with any negation word or degree word that precedes it and computing the emotion value of the combination; (3) extract "emoticon-emotion word" co-occurrence pairs from the preprocessed microblogs to form a co-occurrence pair set; (4) compute the pointwise mutual information between each emoticon and each of its co-occurring emotion words over the co-occurrence pair set; (5) compute the initial emotion value of each emoticon; (6) standardize the initial emotion values of the emoticons. The method uses the pointwise mutual information between co-occurring emotion words and emoticons to compute and standardize emoticon emotion values; it is simple and intuitive, and its results are accurate.
Description
Technical field
The present invention relates to a method for computing the emotion of microblog emoticons, and in particular to a method for computing microblog emoticon emotion values based on pointwise mutual information.
Background art
With the rapid development of Internet technology, microblogging has quickly become part of everyday life as a new medium for spreading information. People log in to microblog platforms from all kinds of terminals and post text, pictures, videos, emoticons and other content on the spur of the moment to express their thoughts and feelings. Emoticons are concise and vivid, and convey emotion intuitively through facial expressions and gestures, so people increasingly insert emoticons into microblog text to express emotion.
At present, the mainstream approaches to microblog sentiment analysis still consider only the text and ignore emoticons. There are two main approaches: methods based on emotion knowledge and methods based on machine learning. Knowledge-based methods judge the polarity of a text mainly with a general-purpose or domain-specific emotion dictionary; machine-learning methods treat sentiment analysis as a conventional classification task, extracting features and then classifying. Because emoticons are discarded, sentiment analysis that relies on words alone loses a large amount of emotional information, which makes the analysis less accurate.
The existing way of handling microblog emoticons is to compute the emotion value of the microblog text from its emotion words and then infer the emotion value of the emoticon from it. First, for every microblog that contains both emotion words and emoticons, the emotion words are located and their emotion values are simply added together to give the emotion value of the microblog. Then, for each emoticon, the total emotion value of the positive microblogs containing it and the total emotion value of the negative microblogs containing it are computed separately, and the total with the larger absolute value is taken as the emotion value of the emoticon.
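The existing approach described above can be sketched in Python as follows. This is a minimal illustration, not code from the patent: the function name `baseline_emoticon_values`, the input layout (one `(emoticons, emotion_words)` tuple per microblog) and the choice to count an emoticon once per microblog are all assumptions.

```python
from collections import defaultdict


def baseline_emoticon_values(microblogs, lexicon):
    """Prior-art baseline: sum word values per microblog, then keep the
    positive or negative total with the larger absolute value.

    microblogs: list of (emoticons, emotion_words) tuples (assumed layout).
    lexicon: dict mapping an emotion word to its emotion value.
    """
    pos_total = defaultdict(float)  # totals over positive microblogs
    neg_total = defaultdict(float)  # totals over negative microblogs

    for emoticons, words in microblogs:
        # Emotion value of the microblog: plain sum of its word values.
        post_value = sum(lexicon.get(w, 0.0) for w in words)
        for e in set(emoticons):  # assumption: count once per microblog
            if post_value > 0:
                pos_total[e] += post_value
            elif post_value < 0:
                neg_total[e] += post_value

    result = {}
    for e in set(pos_total) | set(neg_total):
        p = pos_total.get(e, 0.0)
        n = neg_total.get(e, 0.0)
        # The total with the smaller absolute value is simply discarded.
        result[e] = p if abs(p) >= abs(n) else n
    return result
```

The last loop makes the information loss visible: whichever total is smaller in absolute value contributes nothing to the final emoticon value.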
Computing emoticon emotion values by this simple summation over emotion words and microblog text has the following shortcomings:
(1) the emotion value is obtained by simple addition only, without considering how strongly each emotion word is correlated with the emoticon;
(2) the emotion value of an emoticon is decided by comparing the absolute values of the positive and negative totals, and the total with the smaller absolute value is discarded outright, so the contribution of that part of the emotion to the emoticon's value is lost;
(3) the resulting emoticon emotion values are not standardized, so they cannot be compared with one another on a common scale.
Summary of the invention
The object of the present invention is to address the shortcomings of current emotion computation for microblog emoticons by providing a method for computing microblog emoticon emotion values based on pointwise mutual information. The method computes the pointwise mutual information between emotion words and emoticons from their co-occurrence in microblogs, uses it to determine how much each emotion word influences the emotion value of an emoticon, and then, from the emotion values of the emotion words, computes the emotion value of the emoticon and standardizes it.
To achieve this object, the idea of the invention is as follows: crawl microblog data at large scale as the base corpus, then filter and preprocess it; compute the pointwise mutual information between emoticons and emotion words from their co-occurrence in the corpus microblogs, and use it to determine the influence of each emotion word on the emotion value of an emoticon; compute the emotion value of the emoticon as a weighted combination of the emotion values of the emotion words; finally, standardize the emoticon emotion values.
Based on this idea, the invention adopts the following technical solution:
A method for computing microblog emoticon emotion values based on pointwise mutual information, comprising the following steps:
1) crawl microblog data at large scale and filter it, retaining only microblogs that contain both an emoticon and an emotion word;
2) preprocess each microblog by word segmentation and stop-word filtering, combine each emotion word with any negation word or degree word that precedes it, and compute the emotion value of the combination;
3) from the preprocessed microblog data, extract "emoticon-emotion word" co-occurrence pairs to form a co-occurrence pair set;
4) for each emoticon, compute its pointwise mutual information with each co-occurring emotion word over the "emoticon-emotion word" co-occurrence pair set;
5) compute the initial emotion value of each emoticon from its pointwise mutual information with its co-occurring emotion words and the emotion values of those words;
6) standardize the initial emotion values of all emoticons so that they are normalized to the interval [-1, 1].
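Steps 2) and 3) can be sketched in Python as follows. The word lists, the multiplicative treatment of degree words and the sign flip for negation words are assumptions made for illustration; the source only states that a combined value is computed and gives 1.8 × 5 = 9.0 as an example in the embodiment. All names here are hypothetical.

```python
NEGATIONS = {"not"}      # hypothetical negation words
DEGREES = {"so": 1.8}    # hypothetical degree word -> multiplier
LEXICON = {"cute": 5.0}  # hypothetical emotion word -> emotion value


def combine_emotion_words(tokens):
    """Merge negation/degree words that directly precede an emotion word
    into one combined emotion word and compute its emotion value."""
    combined = []
    prefix = []  # pending negation/degree words
    for tok in tokens:
        if tok in NEGATIONS or tok in DEGREES:
            prefix.append(tok)
        elif tok in LEXICON:
            value = LEXICON[tok]
            for p in prefix:
                # Assumption: negation flips the sign, degree words scale.
                value *= -1.0 if p in NEGATIONS else DEGREES[p]
            combined.append((" ".join(prefix + [tok]), value))
            prefix = []
        else:
            prefix = []  # modifiers must directly precede the emotion word
    return combined


def cooccurrence_pairs(emoticons, combined_words):
    """One pair per emoticon occurrence and combined emotion word."""
    return [(e, w) for e in emoticons for w, _ in combined_words]
```

Under these assumptions, a microblog with two occurrences of the same emoticon and one combined emotion word yields two co-occurrence pairs, which matches the count given in the embodiment.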
In step 4), the pointwise mutual information between an emoticon and a co-occurring emotion word is computed as:
PMI(e; we) = log( p(e, we) / (p(e) · p(we)) )
where e is an emoticon, we is an emotion word that co-occurs with e, PMI(e; we) is the pointwise mutual information of e and we, and p(e, we), p(e) and p(we) are the probabilities that the "e-we" co-occurrence pair, e, and we, respectively, occur in the "emoticon-emotion word" co-occurrence pair set.
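A minimal Python sketch of the pointwise mutual information computation over a co-occurrence pair set is given below. The function name `pmi_table` and the default logarithm base of 2 are assumptions; the text does not state which base is used.

```python
import math
from collections import Counter


def pmi_table(pairs, base=2.0):
    """Compute PMI(e; we) for every distinct (emoticon, emotion word) pair.

    pairs: list of (emoticon, emotion_word) co-occurrence pairs; repeated
    pairs are kept, since probabilities are relative frequencies in this set.
    base: logarithm base (assumed; not specified in the source).
    """
    total = len(pairs)
    pair_counts = Counter(pairs)
    emoticon_counts = Counter(e for e, _ in pairs)
    word_counts = Counter(w for _, w in pairs)

    table = {}
    for (e, w), n in pair_counts.items():
        p_pair = n / total
        p_e = emoticon_counts[e] / total
        p_w = word_counts[w] / total
        table[(e, w)] = math.log(p_pair / (p_e * p_w), base)
    return table
```

A positive value means the emoticon and the word co-occur more often than independence would predict; a negative value means they co-occur less often.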
In step 5), the initial emotion value of an emoticon is computed as follows:
where e is an emoticon, REV(e) is the initial emotion value of e, we_i is an emotion word that co-occurs with e, PMI(e; we_i) is the pointwise mutual information of e and we_i, and EV(we_i) is the emotion value of we_i. EV(we_i) is obtained from the emotion vocabulary ontology of the Information Retrieval Laboratory of Dalian University of Technology, together with the computed emotion values of combinations of negation words, degree words and emotion words.
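The exact formula for REV(e) is not reproduced in this text, so the following Python sketch rests on an explicit assumption: REV(e) is taken to be the plain PMI-weighted sum of the emotion values of the co-occurring emotion words. The patent's actual formula may differ, for example by normalizing the weights. The function name `initial_emotion_value` is hypothetical.

```python
def initial_emotion_value(emoticon, pmi, ev):
    """Assumed form: REV(e) = sum over i of PMI(e; we_i) * EV(we_i).

    pmi: dict mapping (emoticon, emotion_word) -> PMI value.
    ev: dict mapping emotion_word -> emotion value EV.
    """
    return sum(
        weight * ev[word]
        for (e, word), weight in pmi.items()
        if e == emoticon  # only words that co-occur with this emoticon
    )
```

Whatever the exact form, the point made in the text is that every co-occurring emotion word contributes, with a weight that depends on its pointwise mutual information with the emoticon.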
In step 6), the emotion value of an emoticon is standardized as:
NEV(e) = arctan(REV(e)) × 2/π
where e is an emoticon, NEV(e) is the standardized emotion value of e, and REV(e) is the initial emotion value of e. With this standardization method, standardizing new data does not affect data that have already been standardized.
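The standardization formula translates directly into Python. The function name `standardize` is an assumption; the formula itself is the one stated above.

```python
import math


def standardize(rev):
    """NEV(e) = arctan(REV(e)) * 2 / pi."""
    return math.atan(rev) * 2.0 / math.pi


# Each value is mapped on its own, so adding new emoticons later does not
# change values that were standardized earlier (unlike min-max scaling,
# which depends on the extremes of the whole data set).
example = standardize(2.90218751753)  # initial value used in the embodiment
```

Because arctan is monotonic and odd, the sign and the ordering of the initial emotion values are preserved.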
Compared with the prior art, the present invention has the following distinguishing features and advantages:
First, it holds that an emotion word in a microblog influences the emotion value of the emoticons it co-occurs with, in proportion to their pointwise mutual information, which makes full use of the correlation structure within microblogs. Second, it ensures that every emotion word co-occurring with an emoticon takes part in computing the emoticon's emotion value according to its correlation, so the result is more accurate. Third, the initial emotion values of emoticons are standardized, which makes them easy to compare and to use later.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention for computing microblog emoticon emotion values based on pointwise mutual information.
Detailed description of the embodiments
Embodiments of the invention are further described below with reference to the accompanying drawing.
As shown in Fig. 1, the method for computing microblog emoticon emotion values based on pointwise mutual information comprises steps 1) to 6) set out above, using the formulas given above for steps 4), 5) and 6).
Embodiment
In this embodiment, about 5 million microblogs are crawled from the Sina Weibo website as the base corpus.
A method for computing microblog emoticon emotion values based on pointwise mutual information, with the following steps:
S1. Crawl about 5 million microblogs from the Sina Weibo website as the base corpus and filter the corpus, retaining only microblogs that contain both an emoticon and an emotion word, for example "[haha][haha] This mascot is so cute!". About 520,000 microblogs remain after filtering;
S2. Preprocess each microblog by word segmentation, stop-word filtering and similar operations, combine each emotion word with any negation word or degree word that precedes it, and compute the emotion value of the combination. For example, after processing, the microblog "[haha][haha] This mascot is so cute!" becomes "[haha] [haha] mascot so cute", and the emotion value of the combined emotion word "so cute" is 1.8 × 5 = 9.0;
S3. From the preprocessed microblog data, extract "emoticon-emotion word" co-occurrence pairs to form a co-occurrence pair set. For example, "[haha] [haha] mascot so cute" contains 2 "[haha]-so cute" co-occurrence pairs, and the corpus contains 234 "[haha]-so cute" co-occurrence pairs in total;
S4. For each emoticon, compute its pointwise mutual information with each co-occurring emotion word over the "emoticon-emotion word" co-occurrence pair set. For example, the corpus yields 1,898,770 "emoticon-emotion word" co-occurrence pairs; the emoticon "[haha]" and the combined emotion word "so cute" occur 38,329 and 4,691 times respectively, and there are 234 "[haha]-so cute" co-occurrence pairs. The pointwise mutual information of "[haha]" and "so cute" is then computed as:
PMI([haha]; so cute) = log( p([haha], so cute) / (p([haha]) · p(so cute)) ) = log( (234/1898770) / ((38329/1898770) × (4691/1898770)) )
where PMI([haha]; so cute) is the pointwise mutual information of "[haha]" and "so cute", and p([haha], so cute), p([haha]) and p(so cute) are the probabilities that the "[haha]-so cute" co-occurrence pair, "[haha]", and "so cute", respectively, occur in the "emoticon-emotion word" co-occurrence pair set;
S5. Compute the initial emotion value of each emoticon from its pointwise mutual information with its co-occurring emotion words and the emotion values of those words. For example, using the pointwise mutual information between "[haha]" and all of its co-occurring words obtained in S4, the initial emotion value of "[haha]" is computed as:
where REV([haha]) is the initial emotion value of "[haha]", we_i is an emotion word that co-occurs with "[haha]", PMI([haha]; we_i) is the pointwise mutual information of "[haha]" and we_i, and EV(we_i) is the emotion value of we_i;
S6. Standardize the initial emotion values of all emoticons so that they are normalized to the interval [-1, 1]. For example, the standardized emotion value of the emoticon "[haha]" is computed as:
NEV([haha]) = arctan(REV([haha])) × 2/π = arctan(2.90218751753) × 2/π = 0.78875227095
where NEV([haha]) is the standardized emotion value of "[haha]" and REV([haha]) is the initial emotion value of "[haha]".
Standardized emoticon emotion values lie in the interval [-1, 1]. The emotion value of "[haha]" therefore falls in the strongly positive range, which indicates that in practice "[haha]" is mainly used to express actively positive emotion.
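The counts given in step S4 of the embodiment can be plugged into the pointwise mutual information definition as a quick check. In this Python sketch the variable and function names are hypothetical, and the logarithm base is an assumption because the text does not state it; both base 2 and the natural logarithm are shown.

```python
import math

# Counts taken from step S4 of the embodiment.
TOTAL_PAIRS = 1898770   # all "emoticon-emotion word" co-occurrence pairs
COUNT_EMOTICON = 38329  # occurrences of the example emoticon
COUNT_WORD = 4691       # occurrences of the example combined emotion word
COUNT_PAIR = 234        # co-occurrence pairs of the two


def pmi_from_counts(n_pair, n_e, n_w, total, base=2.0):
    """PMI from raw counts, with probabilities taken as relative
    frequencies in the co-occurrence pair set. The base is an assumption."""
    p_pair = n_pair / total
    p_e = n_e / total
    p_w = n_w / total
    return math.log(p_pair / (p_e * p_w), base)


# Ratio p(e, we) / (p(e) * p(we)): about 2.47, so the pair co-occurs
# roughly two and a half times as often as independence would predict.
ratio = (COUNT_PAIR * TOTAL_PAIRS) / (COUNT_EMOTICON * COUNT_WORD)

pmi_base2 = pmi_from_counts(COUNT_PAIR, COUNT_EMOTICON, COUNT_WORD, TOTAL_PAIRS)
pmi_natural = pmi_from_counts(
    COUNT_PAIR, COUNT_EMOTICON, COUNT_WORD, TOTAL_PAIRS, base=math.e
)
```

Whichever base is used, the value is positive, which is consistent with the positive standardized emotion value reported for the example emoticon.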
Claims (4)
1. A method for computing microblog emoticon emotion values based on pointwise mutual information, characterized in that: large-scale microblog data are used as the base corpus; based on the emotion words in microblogs, an emotion word is considered to have a certain influence on the emotion value of the emoticons it co-occurs with, and the pointwise mutual information between co-occurring emotion words and emoticons is used to determine that influence, from which the emotion values of the emoticons are computed and standardized; the specific steps are as follows:
1) crawl microblog data at large scale and filter it, retaining only microblogs that contain both an emoticon and an emotion word;
2) preprocess each microblog by word segmentation and stop-word filtering, combine each emotion word with any negation word or degree word that precedes it, and compute the emotion value of the combination;
3) from the preprocessed microblog data, extract "emoticon-emotion word" co-occurrence pairs to form a co-occurrence pair set;
4) for each emoticon, compute its pointwise mutual information with each co-occurring emotion word over the "emoticon-emotion word" co-occurrence pair set;
5) compute the initial emotion value of each emoticon from its pointwise mutual information with its co-occurring emotion words and the emotion values of those words;
6) standardize the initial emotion values of all emoticons so that they are normalized to the interval [-1, 1].
2. The method for computing microblog emoticon emotion values based on pointwise mutual information according to claim 1, characterized in that the pointwise mutual information between an emoticon and a co-occurring emotion word in step 4) is computed as:
PMI(e; we) = log( p(e, we) / (p(e) · p(we)) )
where e is an emoticon, we is an emotion word that co-occurs with e, PMI(e; we) is the pointwise mutual information of e and we, and p(e, we), p(e) and p(we) are the probabilities that the "e-we" co-occurrence pair, e, and we, respectively, occur in the "emoticon-emotion word" co-occurrence pair set.
3. The method for computing microblog emoticon emotion values based on pointwise mutual information according to claim 1, characterized in that the initial emotion value of an emoticon in step 5) is computed as follows:
where e is an emoticon, REV(e) is the initial emotion value of e, we_i is an emotion word that co-occurs with e, PMI(e; we_i) is the pointwise mutual information of e and we_i, and EV(we_i) is the emotion value of we_i, obtained from the emotion vocabulary ontology of the Information Retrieval Laboratory of Dalian University of Technology together with the computed emotion values of combinations of negation words, degree words and emotion words.
4. The method for computing microblog emoticon emotion values based on pointwise mutual information according to claim 1, characterized in that the emotion value of an emoticon in step 6) is standardized as:
NEV(e) = arctan(REV(e)) × 2/π
where e is an emoticon, NEV(e) is the standardized emotion value of e, and REV(e) is the initial emotion value of e; with this standardization method, standardizing new data does not affect data that have already been standardized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610961250.0A CN106503220A (en) | 2016-10-28 | 2016-10-28 | Method for computing microblog emoticon emotion values based on pointwise mutual information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610961250.0A CN106503220A (en) | 2016-10-28 | 2016-10-28 | Method for computing microblog emoticon emotion values based on pointwise mutual information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106503220A true CN106503220A (en) | 2017-03-15 |
Family
ID=58322544
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610961250.0A Pending CN106503220A (en) | 2016-10-28 | 2016-10-28 | Method for computing microblog emoticon emotion values based on pointwise mutual information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106503220A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107943789A (en) * | 2017-11-17 | 2018-04-20 | 新华网股份有限公司 | Mood analysis method, device and the server of topic information |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103077207A (en) * | 2012-12-28 | 2013-05-01 | 深圳先进技术研究院 | Method and system for analyzing microblog happiness index |
CN103646088A (en) * | 2013-12-13 | 2014-03-19 | 合肥工业大学 | Product comment fine-grained emotional element extraction method based on CRFs and SVM |
CN103699626A (en) * | 2013-12-20 | 2014-04-02 | 华南理工大学 | Method and system for analysing individual emotion tendency of microblog user |
CN105740228A (en) * | 2016-01-25 | 2016-07-06 | 云南大学 | Internet public opinion analysis method |
CN105843796A (en) * | 2016-03-28 | 2016-08-10 | 北京邮电大学 | Microblog emotional tendency analysis method and device |
Non-Patent Citations (2)
Title |
---|
SHI FENG ET AL.: "A word-emoticon mutual reinforcement ranking model for building sentiment lexicon from massive collection of microblogs", 《WORLD WIDE WEB》 * |
王文远: "面向情感倾向分析的微博表情情感词典构建及应用", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103150367B (en) | A kind of Sentiment orientation analytical approach of Chinese microblogging | |
CN104615593B (en) | Hot microblog topic automatic testing method and device | |
Jiang et al. | Target-dependent twitter sentiment classification | |
CN103761239B (en) | A kind of method utilizing emoticon that microblogging is carried out Sentiment orientation classification | |
CN103123618B (en) | Text similarity acquisition methods and device | |
CN104298714B (en) | A kind of mass text automatic marking method based on abnormality processing | |
CN104102681B (en) | Microblog key event acquiring method and device | |
Tago et al. | Influence analysis of emotional behaviors and user relationships based on Twitter data | |
CN104679738B (en) | Internet hot words mining method and device | |
WO2020108430A1 (en) | Weibo sentiment analysis method and system | |
CN105045857A (en) | Social network rumor recognition method and system | |
CN104268160A (en) | Evaluation object extraction method based on domain dictionary and semantic roles | |
Chatzakou et al. | Detecting variation of emotions in online activities | |
CN105389389B (en) | A kind of network public-opinion propagation situation medium control analysis method | |
CN103246644B (en) | Method and device for processing Internet public opinion information | |
CN103034626A (en) | Emotion analyzing system and method | |
CN109783614B (en) | Differential privacy disclosure detection method and system for to-be-published text of social network | |
CN106484829B (en) | A kind of foundation and microblogging diversity search method of microblogging order models | |
CN105893582A (en) | Social network user emotion distinguishing method | |
CN103440235A (en) | Method and device for identifying text emotion types based on cognitive structure model | |
CN105183717A (en) | OSN user emotion analysis method based on random forest and user relationship | |
CN105843796A (en) | Microblog emotional tendency analysis method and device | |
CN106126502A (en) | A kind of emotional semantic classification system and method based on support vector machine | |
CN107229689A (en) | A kind of method that microblogging public sentiment risk is studied and judged | |
CN107305545A (en) | A kind of recognition methods of the network opinion leader based on text tendency analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170315 |