CN106503220A - A microblog emoticon sentiment computation method based on pointwise mutual information - Google Patents

A microblog emoticon sentiment computation method based on pointwise mutual information

Info

Publication number
CN106503220A
Authority
CN
China
Prior art keywords
emoticon
emotion
word
occurrence
mutual information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610961250.0A
Other languages
Chinese (zh)
Inventor
陈雪
郭峻材
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN201610961250.0A
Publication of CN106503220A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention discloses a method for computing the emotion value of microblog emoticons based on pointwise mutual information (PMI). The steps are as follows: (1) crawl a large number of Sina Weibo posts and filter them, retaining only posts that contain both an emoticon and an emotion word; (2) preprocess the posts, combining each emotion word with any negation word or degree word that precedes it and computing the emotion value of the combination; (3) extract "emoticon-emotion word" co-occurrence pairs from the preprocessed posts to form a co-occurrence pair set; (4) compute the PMI between each emoticon and each of its co-occurring emotion words over the co-occurrence pair set; (5) compute the initial emotion value of each emoticon; (6) standardize the initial emotion values of the emoticons. The method uses the PMI between emoticons and their co-occurring emotion words to compute and standardize emoticon emotion values; it is simple and intuitive, and its results are accurate.

Description

A microblog emoticon sentiment computation method based on pointwise mutual information
Technical field
The present invention relates to a method for computing the emotion value of microblog emoticons, and in particular to a microblog emoticon sentiment computation method based on pointwise mutual information (PMI).
Background
With the rapid development of Internet technology, microblogging has quickly become part of people's lives as a new medium for spreading information. People log in to microblog platforms from various terminals and spontaneously post text, pictures, videos, emoticons and other content to express their thoughts and emotions. Emoticons are concise and vivid and convey emotion intuitively through facial expressions and gestures, so people increasingly intersperse emoticons in microblog text to express emotion.
At present, mainstream approaches to microblog sentiment analysis still consider only the text and ignore emoticons. There are two main lines of work: methods based on emotion knowledge and methods based on machine learning. Emotion-knowledge methods judge the sentiment polarity of a text mainly through a general or domain-specific sentiment dictionary; machine-learning methods treat sentiment analysis as a conventional classification task, extracting features and then classifying. Because emoticons are discarded, sentiment analysis that relies on words alone loses a large amount of emotion information, which makes the analysis insufficiently accurate.
Existing methods for handling microblog emoticons mainly compute the emotion value of the microblog text from its emotion words and then infer the emotion value of the emoticon. First, for each post that contains both emotion words and emoticons, the emotion words are located and their emotion values are simply summed to give the emotion value of the post. Then, for each emoticon, the total emotion value of the positive posts containing it and the total emotion value of the negative posts containing it are computed separately. Finally, whichever total has the larger absolute value is taken as the emotion value of the emoticon.
Computing emoticon emotion values by simply summing over emotion words and microblog text in this way has the following shortcomings:
(1) the emotion value is computed by simple addition only, and the association between emotion words and emoticons is not taken into account;
(2) the emotion value of an emoticon is decided by comparing the absolute values of the positive and negative totals, and the total with the smaller absolute value is discarded outright, so the contribution of that part of the emotion to the emoticon's emotion value is lost;
(3) the resulting emoticon emotion values are not standardized, so they cannot be compared across emoticons.
Summary of the invention
The object of the invention is to address the shortcomings of current microblog emoticon emotion computation by providing a microblog emoticon sentiment computation method based on pointwise mutual information (PMI). The method uses the co-occurrence of emotion words and emoticons in microblog posts to compute the PMI between them, uses the PMI to determine how strongly each emotion word influences the emotion value of the emoticon, and then computes the emoticon's emotion value from the emotion values of the emotion words and standardizes it.
To achieve this object, the concept of the invention is as follows: crawl large-scale microblog data as the base corpus, then filter and preprocess it; compute the PMI between emoticons and emotion words from their co-occurrence in the corpus posts, and from it determine the influence of each emotion word on the emoticon's emotion value; compute the emoticon's emotion value as a weighted combination of the emotion values of the emotion words; finally, standardize the emoticon's emotion value.
Based on this concept, the invention adopts the following technical solution:
A microblog emoticon sentiment computation method based on pointwise mutual information, comprising the following steps:
1) crawl large-scale microblog data and filter it, retaining only posts that contain both an emoticon and an emotion word;
2) preprocess each post by word segmentation and stop-word filtering, combine each emotion word with any negation word or degree word that precedes it, and compute the emotion value of the combination;
3) extract "emoticon-emotion word" co-occurrence pairs from the preprocessed microblog data to form a co-occurrence pair set;
4) for each emoticon, compute its PMI with each co-occurring emotion word over the "emoticon-emotion word" co-occurrence pair set;
5) compute the initial emotion value of each emoticon from its PMI with its co-occurring emotion words and the emotion values of those emotion words;
6) standardize the initial emotion values of all emoticons so that they are normalized to the interval [-1, 1].
The PMI between an emoticon and a co-occurring emotion word in step 4) is computed as follows:
$$\mathrm{PMI}(e; we) = \log \frac{p(e, we)}{p(e) \cdot p(we)}$$
where e is an emoticon, we is an emotion word that co-occurs with e, PMI(e; we) is the pointwise mutual information of e and we, and p(e, we), p(e) and p(we) are the probabilities that the "e-we" co-occurrence pair, e, and we, respectively, occur in the "emoticon-emotion word" co-occurrence pair set.
The initial emotion value of an emoticon in step 5) is computed as follows:
$$\mathrm{REV}(e) = \frac{\sum_i \left( \mathrm{PMI}(e; we_i) \cdot \mathrm{EV}(we_i) \right)}{\sum_i \mathrm{PMI}(e; we_i)}$$
where e is an emoticon, REV(e) is the initial emotion value of e, we_i is an emotion word that co-occurs with e, PMI(e; we_i) is the pointwise mutual information of e and we_i, and EV(we_i) is the emotion value of we_i. EV(we_i) comes from the sentiment lexicon ontology of the Information Retrieval Laboratory of Dalian University of Technology, or, for a combination of negation word, degree word and emotion word, from the emotion value computed for that combination.
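The PMI-weighted average that defines REV(e) can be sketched in a few lines of Python. This is a minimal illustration rather than the applicants' implementation: the function name, the dictionary layout, the sample words and numbers, and the handling of a zero weight sum are all assumptions that do not appear in the source.

```python
from typing import Dict


def initial_emotion_value(pmi: Dict[str, float], ev: Dict[str, float]) -> float:
    """PMI-weighted average of the emotion values of co-occurring emotion words."""
    weight_sum = sum(pmi.values())
    if weight_sum == 0:
        # The source does not say how to handle this case; raising is an assumption.
        raise ValueError("PMI weights sum to zero; REV(e) is undefined")
    weighted = sum(pmi[word] * ev[word] for word in pmi)
    return weighted / weight_sum


# Hypothetical PMI weights and emotion values for a single emoticon.
pmi_weights = {"good lovely": 1.0, "happy": 3.0}
emotion_values = {"good lovely": 9.0, "happy": 5.0}
rev = initial_emotion_value(pmi_weights, emotion_values)  # (1*9 + 3*5) / (1 + 3)
```

Because the PMI values appear in both the numerator and the denominator, any constant factor introduced by the choice of logarithm base cancels in REV(e).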
The standardization of the emoticon emotion value in step 6) is computed as follows:
$$\mathrm{NEV}(e) = \arctan(\mathrm{REV}(e)) \cdot \frac{2}{\pi}$$
where e is an emoticon, NEV(e) is the standardized emotion value of e, and REV(e) is the initial emotion value of e. With this standardization method, standardizing new data does not affect data that has already been standardized.
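The arctangent standardization can be checked with a short Python sketch. The function name and the small `scores` dictionary are illustrative assumptions; the input 2.90218751753 is the initial emotion value quoted in the worked example of this document.

```python
import math


def standardize(rev: float) -> float:
    """Map an initial emotion value into (-1, 1) with arctan(rev) * 2 / pi."""
    return math.atan(rev) * 2 / math.pi


# Initial emotion value reported for the example emoticon in this document.
nev = standardize(2.90218751753)

# Each value is mapped on its own, so standardizing a new emoticon later
# leaves previously standardized values unchanged.
scores = {name: standardize(v) for name, v in {"a": -4.0, "b": 0.0}.items()}
```

Unlike min-max scaling, this mapping needs no corpus-wide minimum or maximum, which is why newly added data does not disturb values that were standardized earlier.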
Compared with the prior art, the invention has the following notable features and advantages:
First, it holds that an emotion word in a microblog post has a certain influence on the emotion value of the emoticon it co-occurs with, and that this influence is proportional to their PMI, which makes full use of the association structure within microblog posts. Second, every emotion word that co-occurs with an emoticon contributes to the emoticon's emotion value according to its degree of association, which makes the result more accurate. Third, the initial emotion values of emoticons are standardized, which facilitates comparison across emoticons and later use.
Brief description of the drawings
Fig. 1 is a flow chart of the microblog emoticon sentiment computation method based on pointwise mutual information according to the invention.
Detailed description
Embodiments of the invention are further described below with reference to the accompanying drawing.
As shown in Fig. 1, the microblog emoticon sentiment computation method based on pointwise mutual information comprises the six steps 1) to 6) already set out above, with the same formulas for the pointwise mutual information PMI(e; we) of step 4), the initial emotion value REV(e) of step 5), and the standardized emotion value NEV(e) of step 6).
Example
In this example, about 5,000,000 microblog posts are crawled from the Sina Weibo website as the base corpus.
The microblog emoticon sentiment computation method based on pointwise mutual information proceeds as follows:
S1. About 5,000,000 microblog posts are crawled from the Sina Weibo website as the base corpus, and the corpus is filtered so that only posts containing both an emoticon and an emotion word are retained, for example "[heartily] [heartily], this mascot was good lovely!". About 520,000 posts remain after filtering;
S2. Each post is preprocessed by word segmentation, stop-word filtering and similar operations, and each emotion word preceded by a negation word or degree word is combined with it and the emotion value of the combination is computed. For example, the post "[heartily] [heartily], this mascot was good lovely!" becomes "[heartily] [heartily] mascot is good lovely" after processing, and the emotion value of the combined emotion word "good lovely" is 1.8*5=9.0;
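The combination rule in S2 can be illustrated with a small Python sketch. It reads the example 1.8*5=9.0 as a degree-word weight (1.8) multiplied by the base emotion value of the emotion word (5); that reading, the lexicon entries, the function name, and in particular the sign flip for negation words are assumptions, since the source gives only the single worked value.

```python
from typing import Dict, Iterable, Set

# Hypothetical lexicon entries; real weights would come from the dictionaries in use.
DEGREE_WEIGHTS: Dict[str, float] = {"good": 1.8}
NEGATION_WORDS: Set[str] = {"not"}


def combined_emotion_value(prefixes: Iterable[str], base_value: float) -> float:
    """Scale an emotion word's value by the degree/negation words that precede it."""
    value = base_value
    for word in prefixes:
        if word in DEGREE_WEIGHTS:
            value *= DEGREE_WEIGHTS[word]
        elif word in NEGATION_WORDS:
            # Assumption: a negation word flips the sign; the source gives no rule.
            value *= -1.0
    return value


value = combined_emotion_value(["good"], 5.0)  # degree word "good" + "lovely" (5)
```

Under these assumptions, `value` reproduces the 9.0 quoted for "good lovely".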
S3. "Emoticon-emotion word" co-occurrence pairs are extracted from the preprocessed microblog data to form the co-occurrence pair set. For example, "[heartily] [heartily] mascot is good lovely" contains 2 "[heartily]-good lovely" co-occurrence pairs, and the corpus contains 234 "[heartily]-good lovely" co-occurrence pairs in total;
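Pair extraction as described in S3 can be sketched as a Cartesian product of the emoticon occurrences and the emotion-word occurrences within one post, which yields the 2 pairs quoted for the example post. The bracket-based emoticon pattern, the token list and the function name are illustrative assumptions, not details given in the source.

```python
import re
from collections import Counter
from itertools import product
from typing import List, Set, Tuple

# Assumption: emoticons appear as bracketed tokens such as "[heartily]".
EMOTICON_RE = re.compile(r"^\[[^\[\]]+\]$")


def cooccurrence_pairs(tokens: List[str], emotion_words: Set[str]) -> List[Tuple[str, str]]:
    """Pair every emoticon occurrence with every emotion word occurrence in one post."""
    emoticons = [t for t in tokens if EMOTICON_RE.match(t)]
    words = [t for t in tokens if t in emotion_words]
    return list(product(emoticons, words))


# Tokens of the preprocessed example post, with "good lovely" already combined.
post = ["[heartily]", "[heartily]", "mascot", "good lovely"]
pairs = cooccurrence_pairs(post, {"good lovely"})
pair_counts = Counter(pairs)
```

Summing such counters over all posts would give the corpus-level pair counts that the PMI computation needs.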
S4. For each emoticon, its PMI with each co-occurring emotion word is computed over the "emoticon-emotion word" co-occurrence pair set. For example, the corpus contains 1898770 "emoticon-emotion word" co-occurrence pairs; the emoticon "[heartily]" and the combined emotion word "good lovely" occur 38329 and 4691 times respectively, and there are 234 "[heartily]-good lovely" co-occurrence pairs. The PMI of "[heartily]" and "good lovely" is then computed as:
$$\mathrm{PMI}(\text{[heartily]}; \text{good lovely}) = \log \frac{p(\text{[heartily]}, \text{good lovely})}{p(\text{[heartily]}) \cdot p(\text{good lovely})} = \log \frac{234/1898770}{(38329/1898770) \cdot (4691/1898770)}$$
where PMI([heartily]; good lovely) is the pointwise mutual information of "[heartily]" and "good lovely", and p([heartily], good lovely), p([heartily]) and p(good lovely) are the probabilities that the "[heartily]-good lovely" co-occurrence pair, "[heartily]", and "good lovely", respectively, occur in the "emoticon-emotion word" co-occurrence pair set;
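The counts quoted in S4 can be plugged into a small Python function to see how the PMI is obtained. This is a sketch under two assumptions that the source does not state: each probability is estimated as a count divided by the total number of co-occurrence pairs, and the logarithm is the natural logarithm. The function and parameter names are hypothetical.

```python
import math


def pmi(pair_count: int, e_count: int, we_count: int, total_pairs: int) -> float:
    """PMI(e; we) = log(p(e, we) / (p(e) * p(we))), with relative-frequency estimates."""
    p_pair = pair_count / total_pairs
    p_e = e_count / total_pairs
    p_we = we_count / total_pairs
    # Assumption: natural logarithm; the source does not state the base.
    return math.log(p_pair / (p_e * p_we))


# Counts quoted in step S4 of the example.
value = pmi(pair_count=234, e_count=38329, we_count=4691, total_pairs=1898770)
```

A positive result means the pair co-occurs more often than it would if the emoticon and the emotion word were independent, so the emotion word receives a positive weight in the next step.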
S5. The initial emotion value of each emoticon is computed from its PMI with its co-occurring emotion words and the emotion values of those emotion words. For example, using the PMI between the emoticon "[heartily]" and each of its co-occurring emotion words computed as in S4, the initial emotion value of "[heartily]" is computed as:
$$\mathrm{REV}(\text{[heartily]}) = \frac{\sum_i \left( \mathrm{PMI}(\text{[heartily]}; we_i) \cdot \mathrm{EV}(we_i) \right)}{\sum_i \mathrm{PMI}(\text{[heartily]}; we_i)} = 2.90218751753$$
where REV([heartily]) is the initial emotion value of "[heartily]", we_i is an emotion word that co-occurs with "[heartily]", PMI([heartily]; we_i) is the pointwise mutual information of "[heartily]" and we_i, and EV(we_i) is the emotion value of we_i;
S6. The initial emotion values of all emoticons are standardized so that they are normalized to the interval [-1, 1]. For example, the standardized emotion value of the emoticon "[heartily]" is computed as:
$$\mathrm{NEV}(\text{[heartily]}) = \arctan(\mathrm{REV}(\text{[heartily]})) \cdot \frac{2}{\pi} = \arctan(2.90218751753) \times \frac{2}{\pi} = 0.78875227095$$
where NEV([heartily]) is the standardized emotion value of "[heartily]" and REV([heartily]) is the initial emotion value of "[heartily]".
Standardized emoticon emotion values lie in the range [-1, 1]. The emotion value of "[heartily]" therefore falls in the strongly positive part of that range, which indicates that in practice the emoticon "[heartily]" is mainly used to express positive emotions.

Claims (4)

1. A microblog emoticon sentiment computation method based on pointwise mutual information (PMI), characterized in that: large-scale microblog data is used as the base corpus; based on the emotion words in microblog posts, an emotion word is considered to have a certain influence on the emotion value of the emoticon it co-occurs with; the PMI between co-occurring emotion words and emoticons is used to determine that influence; and the emotion value of each emoticon is then computed and standardized; the method comprises the following steps:
1) crawl large-scale microblog data and filter it, retaining only posts that contain both an emoticon and an emotion word;
2) preprocess each post by word segmentation and stop-word filtering, combine each emotion word with any negation word or degree word that precedes it, and compute the emotion value of the combination;
3) extract "emoticon-emotion word" co-occurrence pairs from the preprocessed microblog data to form a co-occurrence pair set;
4) for each emoticon, compute its PMI with each co-occurring emotion word over the "emoticon-emotion word" co-occurrence pair set;
5) compute the initial emotion value of each emoticon from its PMI with its co-occurring emotion words and the emotion values of those emotion words;
6) standardize the initial emotion values of all emoticons so that they are normalized to the interval [-1, 1].
2. The microblog emoticon sentiment computation method based on pointwise mutual information according to claim 1, characterized in that the PMI between an emoticon and a co-occurring emotion word in step 4) is computed as follows:
$$\mathrm{PMI}(e; we) = \log \frac{p(e, we)}{p(e) \cdot p(we)}$$
where e is an emoticon, we is an emotion word that co-occurs with e, PMI(e; we) is the pointwise mutual information of e and we, and p(e, we), p(e) and p(we) are the probabilities that the "e-we" co-occurrence pair, e, and we, respectively, occur in the "emoticon-emotion word" co-occurrence pair set.
3. The microblog emoticon sentiment computation method based on pointwise mutual information according to claim 1, characterized in that the initial emotion value of an emoticon in step 5) is computed as follows:
$$\mathrm{REV}(e) = \frac{\sum_i \left( \mathrm{PMI}(e; we_i) \cdot \mathrm{EV}(we_i) \right)}{\sum_i \mathrm{PMI}(e; we_i)}$$
where e is an emoticon, REV(e) is the initial emotion value of e, we_i is an emotion word that co-occurs with e, PMI(e; we_i) is the pointwise mutual information of e and we_i, and EV(we_i) is the emotion value of we_i. EV(we_i) comes from the sentiment lexicon ontology of the Information Retrieval Laboratory of Dalian University of Technology, or, for a combination of negation word, degree word and emotion word, from the emotion value computed for that combination.
4. The microblog emoticon sentiment computation method based on pointwise mutual information according to claim 1, characterized in that the standardization of the emoticon emotion value in step 6) is computed as follows:
$$\mathrm{NEV}(e) = \arctan(\mathrm{REV}(e)) \cdot \frac{2}{\pi}$$
where e is an emoticon, NEV(e) is the standardized emotion value of e, and REV(e) is the initial emotion value of e. With this standardization method, standardizing new data does not affect data that has already been standardized.
CN201610961250.0A 2016-10-28 2016-10-28 A microblog emoticon sentiment computation method based on pointwise mutual information Pending CN106503220A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610961250.0A CN106503220A (en) 2016-10-28 2016-10-28 A microblog emoticon sentiment computation method based on pointwise mutual information

Publications (1)

Publication Number Publication Date
CN106503220A (en) 2017-03-15

Family

ID=58322544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610961250.0A Pending CN106503220A (en) 2016-10-28 2016-10-28 A microblog emoticon sentiment computation method based on pointwise mutual information

Country Status (1)

Country Link
CN (1) CN106503220A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943789A (en) * 2017-11-17 2018-04-20 新华网股份有限公司 Mood analysis method, device and the server of topic information

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077207A (en) * 2012-12-28 2013-05-01 深圳先进技术研究院 Method and system for analyzing microblog happiness index
CN103646088A (en) * 2013-12-13 2014-03-19 合肥工业大学 Product comment fine-grained emotional element extraction method based on CRFs and SVM
CN103699626A (en) * 2013-12-20 2014-04-02 华南理工大学 Method and system for analysing individual emotion tendency of microblog user
CN105740228A (en) * 2016-01-25 2016-07-06 云南大学 Internet public opinion analysis method
CN105843796A (en) * 2016-03-28 2016-08-10 北京邮电大学 Microblog emotional tendency analysis method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHI FENG ET AL.: "A word-emoticon mutual reinforcement ranking model for building sentiment lexicon from massive collection of microblogs", 《WORLD WIDE WEB》 *
王文远: "面向情感倾向分析的微博表情情感词典构建及应用", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Similar Documents

Publication Publication Date Title
CN103150367B (en) A kind of Sentiment orientation analytical approach of Chinese microblogging
CN104615593B (en) Hot microblog topic automatic testing method and device
Jiang et al. Target-dependent twitter sentiment classification
CN103761239B (en) A kind of method utilizing emoticon that microblogging is carried out Sentiment orientation classification
CN103123618B (en) Text similarity acquisition methods and device
CN104298714B (en) A kind of mass text automatic marking method based on abnormality processing
CN104102681B (en) Microblog key event acquiring method and device
Tago et al. Influence analysis of emotional behaviors and user relationships based on Twitter data
CN104679738B (en) Internet hot words mining method and device
WO2020108430A1 (en) Weibo sentiment analysis method and system
CN105045857A (en) Social network rumor recognition method and system
CN104268160A (en) Evaluation object extraction method based on domain dictionary and semantic roles
Chatzakou et al. Detecting variation of emotions in online activities
CN105389389B (en) A kind of network public-opinion propagation situation medium control analysis method
CN103246644B (en) Method and device for processing Internet public opinion information
CN103034626A (en) Emotion analyzing system and method
CN109783614B (en) Differential privacy disclosure detection method and system for to-be-published text of social network
CN106484829B (en) A kind of foundation and microblogging diversity search method of microblogging order models
CN105893582A (en) Social network user emotion distinguishing method
CN103440235A (en) Method and device for identifying text emotion types based on cognitive structure model
CN105183717A (en) OSN user emotion analysis method based on random forest and user relationship
CN105843796A (en) Microblog emotional tendency analysis method and device
CN106126502A (en) A kind of emotional semantic classification system and method based on support vector machine
CN107229689A (en) A kind of method that microblogging public sentiment risk is studied and judged
CN107305545A (en) A kind of recognition methods of the network opinion leader based on text tendency analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170315
