CN103530360A - Network society influence maximization algorithm based on microblog text affective computing - Google Patents

Info

Publication number
CN103530360A
Authority
CN
China
Prior art keywords
microblogging
word
network
emotion
microblog
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310475440.8A
Other languages
Chinese (zh)
Inventor
覃晓
元昌安
唐涛
元建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Teachers College
Original Assignee
Guangxi Teachers College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Teachers College
Priority to CN201310475440.8A
Publication of CN103530360A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms

Abstract

The invention discloses a network society influence maximization algorithm based on microblog text affective computing. It relates mainly to the fields of text affective computing, network society computing, and influence maximization, and in particular to a method for calculating microblog emotional tendency and a method for extracting the relationship structure of a network community. First, a microblog-specific dictionary is constructed from the substitute words, neologisms, and new meanings (such as 'watch man') that appear in microblogs, and the emotional tendency of microblog texts is analyzed using this dictionary together with the HowNet dictionary. Then, a network community user relationship tree is constructed from the various interactions between users. Finally, the network emotional influence maximization calculation is carried out using the emotional tendency of the microblog texts and the network community user relationship tree. The invention addresses two problems: the user relationship in the network community structure is based on a single kind of interaction, and the influence maximization calculation is incomplete. As a result, microblog emotional tendency can be calculated more accurately, and a set of users with maximum network emotional influence that better matches reality can be obtained.

Description

Network society influence maximization algorithm based on microblog text affective computing
Technical field
The present invention relates to the fields of text affective computing, network society computing, and influence maximization, and specifically to a network society influence maximization algorithm based on microblog text affective computing.
Background technology
Network influence maximization identifies the most influential members of a network so that information or product samples can be given to them in the hope that, through their recommendations, the information or product spreads to the other members of the network. With the emergence and popularity of Web 2.0, the membership of many large online social networking sites has risen sharply. This poses great challenges to traditional influence maximization algorithms and to research on propagation models, and influence maximization in social networks has again become a research hotspot.
Current research in this field mainly follows two approaches. 1) Using user interaction data to extend the reach of network influence. This mainly means computing influence probabilities between users from their interaction data, or compiling statistics over users' historical behavior logs, in order to estimate the degree of influence between users. However, this approach does not consider the effect that the content of the interaction data has on the reach of influence. 2) Improving algorithms based on information propagation models in order to reduce their time complexity.
In summary, most current research on network influence maximization is based on network structure, and it has two main problems:
1) The role of the content of user interaction data in the network influence maximization model has not been fully exploited. In emerging systems such as microblogs, comments, and status updates, the content that users submit and the text of the replies and exchanges between users reflect the relationships and influence between users more faithfully. Using these data therefore makes network influence maximization calculations more convincing.
2) Network influence maximization calculations do not fully take into account the emotional information contained in the content that users publish. Emotional information has significant practical value for understanding relationships in a network society. For example, in online public opinion monitoring, if the users with the greatest negative influence can be closely monitored, rumors can be contained in time, which provides strong technical support for controlling the spread of false information and harmful social events.
For this reason, introducing affective computing into network influence maximization will improve both the accuracy of the calculation and the persuasiveness of its results.
Few domestic reports introduce affective computing into influence maximization models; the only one found is "Sentiment influence maximization model for microblogs". To address the shortcoming that current influence maximization algorithms rely solely on the social network graph, that article uses the microblog forwarding relation tree, the emotional tendency of microblog content, and the users' social network graph to propose an influence maximization model that captures users' emotional influence: the sentiment influence distribution (SID) model. Fig. 1 shows the implementation scheme of the SID model.
(1) When calculating the emotional tendency of a microblog, the SID model uses a sentiment-dictionary method. The dictionary is the HowNet sentiment analysis word set, and the score is computed with formula 1 of that scheme. However, word usage in microblogs is very rich. Some words are used with the opposite of their literal meaning, such as "happiness" placed in quotation marks, and some are new internet buzzwords (such as "cousin") that are absent from the HowNet sentiment analysis word set. The prior-art method of calculating emotional tendency is therefore too simple, and its analysis results can be improved further.
(2) When calculating emotional influence maximization, the SID model uses the forwarding relation tree to represent the relationship structure of the network community. In real microblog systems, however, many users do not forward the microblogs of the users they follow when expressing an opinion; some habitually comment on or reply to those microblogs instead. The forwarding relation tree therefore cannot fully reflect the relationships in a community, and so cannot properly identify the group with maximum influence.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a network society influence maximization calculation method based on microblog text affective computing.
The technical solution by which the present invention solves the above technical problems is as follows:
A network society influence maximization algorithm based on microblog text affective computing, with the following operation steps:
1. Microblog-specific lexicon construction. The method is as follows:
Because microblogs contain many substitute words and new buzzwords, Chinese information processing techniques (word segmentation, feature extraction, etc.) are used to obtain these substitute words and new words, which are then annotated. Their part of speech, word meaning, appositive words, hypernyms and hyponyms, emotional tendency, and index structure are analyzed and stored in dictionary form.
2. Microblog emotional tendency analysis. The method is as follows:
Emotional tendency analysis is performed on each microblog using the HowNet sentiment analysis word set together with the microblog-specific lexicon. For each word in the microblog, first check whether it appears in HowNet; if not, look it up in the microblog-specific lexicon. Then output the tendency of the word, tally the tendencies of all words, and finally obtain the emotional tendency result for the microblog.
The emotional tendency score EScore of a microblog d is calculated as:
$$\mathrm{EScore}(d) = \frac{\mathrm{PosC}(d) - \mathrm{NegC}(d)}{\mathrm{PosC}(d) + \mathrm{NegC}(d) + 1} \tag{1}$$
In the formula:
PosC(d) is the number of positive words in microblog d;
NegC(d) is the number of negative words in d. Rule: if EScore(d) > 0.05, the emotional tendency of d is positive; if EScore(d) < -0.05, the emotional tendency of d is negative.
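As an illustration of formula (1) and its thresholds, the following is a minimal Python sketch. The function names `escore` and `tendency` are hypothetical, and the "neutral" label for scores between -0.05 and 0.05 is an assumption, since the text defines only the positive and negative cases.

```python
def escore(pos_count: int, neg_count: int) -> float:
    """Formula (1): (PosC - NegC) / (PosC + NegC + 1).

    The +1 in the denominator keeps the score defined when a
    microblog contains no sentiment words at all.
    """
    return (pos_count - neg_count) / (pos_count + neg_count + 1)


def tendency(score: float, threshold: float = 0.05) -> str:
    """Map a score to a label using the 0.05 thresholds from the text.

    "neutral" is an assumed label for scores in [-0.05, 0.05];
    the source text does not name this case.
    """
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

For example, a microblog with three positive words and one negative word scores (3 - 1) / (3 + 1 + 1) = 0.4 and is labeled positive.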
3. Community structure analysis. The method is as follows:
When analyzing the relationship between two users in a network community (call them A and B), consider not only whether A has forwarded B's microblogs, but also whether A has commented on or replied to them. Treat users as nodes. Within a given time window T, if node A has forwarded, commented on, or replied to a microblog of node B, draw a directed edge from B to A; if a directed edge already exists between B and A, increase the weight of that edge by 1. This yields a number of microblog user relation trees within T. Merging these relation trees gives the relation tree set of the network community.
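The edge rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout `(actor, author, kind)`, the kind labels, and the function name `build_relation_edges` are hypothetical, a new edge is assumed to start at weight 1, and self-interactions are assumed to be ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed labels for the three interaction kinds named in the text.
INTERACTION_KINDS = {"forward", "comment", "reply"}


def build_relation_edges(
    interactions: Iterable[Tuple[str, str, str]],
) -> Dict[Tuple[str, str], int]:
    """Accumulate weighted directed edges for one time window T.

    Each record is (actor, author, kind): `actor` forwarded, commented
    on, or replied to a microblog written by `author`. The edge runs
    from the author (B) to the actor (A), as described in the text.
    """
    weights: Dict[Tuple[str, str], int] = defaultdict(int)
    for actor, author, kind in interactions:
        if kind not in INTERACTION_KINDS:
            continue  # other interaction kinds are not part of the method
        if actor == author:
            continue  # assumption: a user's interaction with their own post is skipped
        weights[(author, actor)] += 1
    return dict(weights)
```

With the records `("A", "B", "forward")`, `("A", "B", "comment")`, and `("C", "B", "reply")`, the sketch yields an edge B→A of weight 2 and an edge B→C of weight 1.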
4. Calculate the emotional influence degree of the whole network:
Based on the relation tree set of the network community obtained in steps 1, 2, and 3 above and the emotional tendency result of each microblog, calculate the emotional influence degree of the whole network. The calculation itself follows the prior art.
Compared with the prior art, the present invention has the following advantages:
1. Adding the microblog-specific lexicon makes the tallying of the emotional tendency of terms in microblog text more rigorous, so the emotional tendency analysis of microblogs is more accurate.
2. Considering more kinds of interaction between microblog users describes network community relationships in finer detail, makes the network emotional influence maximization calculation more accurate, and gives the results greater practical value.
Brief description of the drawings
Fig. 1 is a diagram of the prior-art implementation scheme related to the present invention.
Fig. 2 is the architecture diagram of the computation model of the present invention.
Fig. 3 is the flowchart of the microblog-specific lexicon constructor of the present invention.
Fig. 4 is the flowchart of the microblog sentiment analyzer of the present invention.
Fig. 5 is the flowchart of the first scan of the network community relation analyzer of the present invention.
Fig. 6 is the flowchart of the i-th scan of the network community relation analyzer of the present invention.
Embodiment
The invention is further described below with reference to the accompanying drawings.
The network society influence maximization algorithm based on microblog text affective computing is implemented as shown in Fig. 2 and comprises four main parts. The network community emotion maximization analyzer follows the prior-art scheme; the present invention concerns only the other three modules: the microblog-specific lexicon constructor, the microblog emotional tendency analyzer, and the community structure analyzer. Each module is described below:
(1) Microblog-specific lexicon constructor: its flowchart is shown in Fig. 3, and it operates as follows. A web crawler collects microblog text from each community of the network to form a microblog text corpus. The microblogs in the corpus are processed in turn: each microblog is first segmented into words to obtain a feature word vector, and each feature word is then checked against the HowNet sentiment dictionary. If a feature word does not appear there, it is treated as a new word, inserted into the microblog-specific lexicon, and manually annotated with its meaning, part of speech, appositive words, emotional tendency, and so on. When all microblogs in the corpus have been processed, the microblog-specific lexicon W is obtained.
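The constructor loop can be sketched in Python as below. This is a simplified illustration, not the patented implementation: `collect_new_words` and the entry fields are hypothetical names, the HowNet word set is assumed to be available as a plain Python set, and word segmentation is passed in as a callable because the text does not name a segmenter.

```python
from typing import Callable, Dict, Iterable, List, Set


def collect_new_words(
    corpus: Iterable[str],
    hownet_words: Set[str],
    segment: Callable[[str], List[str]],
) -> Dict[str, dict]:
    """Collect words that are absent from HowNet as lexicon candidates.

    Every candidate gets an empty entry whose fields (meaning, part of
    speech, appositive words, tendency) are to be filled in manually,
    mirroring the manual annotation step in the text.
    """
    lexicon: Dict[str, dict] = {}
    for post in corpus:
        for word in segment(post):
            if word in hownet_words or word in lexicon:
                continue  # known to HowNet, or already queued for annotation
            lexicon[word] = {
                "meaning": None,
                "part_of_speech": None,
                "appositive_words": [],
                "tendency": None,
            }
    return lexicon
```

In the sketch a whitespace splitter such as `str.split` can stand in for `segment` when experimenting; real microblog text would need a Chinese word segmenter.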
(2) Microblog emotional tendency analyzer: its flowchart is shown in Fig. 4, and it operates as follows. Select a suitable time window and determine the emotional tendency of the community's microblogs within it. The microblogs to be analyzed are processed one at a time. Feature words are first extracted from the microblog to obtain its feature vector, and each feature word is checked against HowNet. If it appears there, its emotional tendency is output directly. If not, check whether the feature word appears in the microblog-specific lexicon W. If it does, output the tendency of the word; if it does not, insert the word into W, analyze and annotate it manually, and output its emotional tendency. The tendencies of all words are tallied to obtain the final emotional tendency result for the microblog.
The emotional tendency score EScore of a microblog d is calculated as:
$$\mathrm{EScore}(d) = \frac{\mathrm{PosC}(d) - \mathrm{NegC}(d)}{\mathrm{PosC}(d) + \mathrm{NegC}(d) + 1} \tag{1}$$
Here, PosC(d) is the number of positive words in microblog d, and NegC(d) is the number of negative words in d. Rule: if EScore(d) > 0.05, the emotional tendency of d is positive; if EScore(d) < -0.05, the emotional tendency of d is negative.
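The lookup order described for the analyzer (HowNet first, then the lexicon W, then manual annotation) might look like the following Python sketch. It rests on several assumptions: both dictionaries are modeled as plain mappings from a word to +1, -1, or 0, unknown words are returned in a list for later manual annotation rather than annotated on the spot, and the "neutral" label and the names `count_tendencies` and `classify_post` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def count_tendencies(
    words: Iterable[str],
    hownet: Dict[str, int],
    lexicon_w: Dict[str, int],
) -> Tuple[int, int, List[str]]:
    """Return (PosC, NegC, unknown words) for one microblog."""
    pos = neg = 0
    unknown: List[str] = []
    for word in words:
        if word in hownet:            # HowNet is consulted first
            polarity = hownet[word]
        elif word in lexicon_w:       # then the microblog-specific lexicon W
            polarity = lexicon_w[word]
        else:                         # neither: queue for manual annotation
            unknown.append(word)
            continue
        if polarity > 0:
            pos += 1
        elif polarity < 0:
            neg += 1
    return pos, neg, unknown


def classify_post(
    words: Iterable[str],
    hownet: Dict[str, int],
    lexicon_w: Dict[str, int],
) -> Tuple[str, float, List[str]]:
    """Apply formula (1) and the 0.05 thresholds to one microblog."""
    pos, neg, unknown = count_tendencies(words, hownet, lexicon_w)
    score = (pos - neg) / (pos + neg + 1)
    if score > 0.05:
        label = "positive"
    elif score < -0.05:
        label = "negative"
    else:
        label = "neutral"  # assumed label; the text does not name this case
    return label, score, unknown
```

Returning the unknown words lets the caller extend W after annotation and re-run the analysis, which is one possible way to realize the manual step.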
(3) Community structure analyzer: the community structure is represented as a relation tree G = (V, E, W), where u and v (u, v ∈ V) denote two users in the community, <u, v> (<u, v> ∈ E) denotes the relationship between users in the network community, and W_uv denotes the weight on <u, v>. The flowchart of the first scan of the network community analysis is shown in Fig. 5; its steps are as follows:
STEP1: Create the microblog user table Ta and initialize it as an empty table.
STEP2: Set the time window T. If T > 0, go to STEP3; otherwise, go to STEP10.
STEP3: Perform the first scan of the microblogs in time window T. If the scanned user u is not in Ta, register u in Ta. Take u as the tree root and examine the relationships between subsequent users and user u.
STEP4: Scan the next user v. If v is not in Ta, go to STEP5; otherwise, go to STEP6.
STEP5: Register v in Ta.
STEP6: Check whether user v has forwarded, commented on, or replied to a microblog of user u. If any of these interactions exists, check whether a directed edge exists between nodes u and v; if not, go to STEP7; otherwise, go to STEP8.
STEP7: Create the directed edge <u, v>.
STEP8: W_uv++.
STEP9: T--, go to STEP4.
STEP10: Output the relation tree rooted at u.
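One possible reading of STEP1 to STEP10 is sketched below in Python. It is an interpretation rather than the patent's exact procedure: the countdown on T is modeled as iterating over the microblogs scanned in the window, each microblog is assumed to be a pair `(author, target)` where `target` is the user whose microblog was forwarded, commented on, or replied to (or `None`), and the name `first_scan` is hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Post = Tuple[str, Optional[str]]  # (author, user interacted with or None)


def first_scan(posts: Sequence[Post]) -> Tuple[List[str], Dict[Tuple[str, str], int]]:
    """Build the user table Ta and the relation tree rooted at the first user."""
    user_table: List[str] = []                  # STEP1: Ta starts empty
    weights: Dict[Tuple[str, str], int] = {}
    if not posts:                               # STEP2: empty window, nothing to scan
        return user_table, weights
    root = posts[0][0]                          # STEP3: first scanned user u is the root
    user_table.append(root)
    for author, target in posts[1:]:            # STEP4 / STEP9: next user v
        if author not in user_table:
            user_table.append(author)           # STEP5: register v in Ta
        if author != root and target == root:   # STEP6: v interacted with u
            if (root, author) not in weights:
                weights[(root, author)] = 0     # STEP7: create edge <u, v>
            weights[(root, author)] += 1        # STEP8: W_uv++
    return user_table, weights                  # STEP10: tree rooted at u
```

Under this reading a newly created edge ends up with weight 1, because STEP7 is followed by STEP8.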
To obtain the relationships among all users of the network community within time window T, the community must be scanned with each of the first n-1 users in user table Ta as the root. The flowchart of the i-th scan of the network community (1 < i < n, where n is the total number of users in the community) is shown in Fig. 6. The process is essentially the same as the first scan, except that the step of registering users in the user table Ta is omitted, so it is not described in detail here.
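The repeated scans and the final merge can be illustrated with the Python sketch below. It makes simplifying assumptions that are not in the text: microblogs are pairs `(author, target)` as in a forward, comment, or reply record, each scan looks at all microblogs in the window rather than only those after the root, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Post = Tuple[str, Optional[str]]  # (author, user interacted with or None)
Edges = Dict[Tuple[str, str], int]


def scan_from_root(root: str, posts: Sequence[Post]) -> Edges:
    """One scan: weighted edges <root, v> for users v who interacted with root."""
    edges: Edges = {}
    for author, target in posts:
        if author != root and target == root:
            edges[(root, author)] = edges.get((root, author), 0) + 1
    return edges


def community_relation_trees(posts: Sequence[Post]) -> Dict[str, Edges]:
    """Scan once per root, using the first n-1 users of the user table."""
    user_table: List[str] = []
    for author, _ in posts:
        if author not in user_table:
            user_table.append(author)
    return {root: scan_from_root(root, posts) for root in user_table[:-1]}


def merge_trees(forest: Dict[str, Edges]) -> Edges:
    """Union of the per-root trees; weights on identical edges are summed."""
    merged: Edges = {}
    for edges in forest.values():
        for edge, weight in edges.items():
            merged[edge] = merged.get(edge, 0) + weight
    return merged
```

Because every edge in a per-root tree starts at that root, the trees have disjoint edge sets in this sketch, so the union never actually has to reconcile two weights for the same edge.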

Claims (1)

1. A network society influence maximization algorithm based on microblog text affective computing, with the following operation steps:
1) Microblog-specific lexicon construction, as follows:
Because microblogs contain many substitute words and new buzzwords, Chinese information processing techniques such as word segmentation and feature extraction are used to obtain these substitute words and new words, which are then annotated; their part of speech, word meaning, appositive words, hypernyms and hyponyms, emotional tendency, and index structure are analyzed and stored in dictionary form;
2) Microblog emotional tendency analysis, as follows:
Emotional tendency analysis is performed on each microblog using the HowNet sentiment analysis word set together with the microblog-specific lexicon: for each word in the microblog, first check whether it appears in HowNet; if not, look it up in the microblog-specific lexicon; then output the tendency of the word, tally the tendencies of all words, and finally obtain the emotional tendency result for the microblog;
The emotional tendency score EScore of a microblog d is calculated as:
$$\mathrm{EScore}(d) = \frac{\mathrm{PosC}(d) - \mathrm{NegC}(d)}{\mathrm{PosC}(d) + \mathrm{NegC}(d) + 1} \tag{1}$$
In the formula:
PosC(d) is the number of positive words in microblog d;
NegC(d) is the number of negative words in d; rule: if EScore(d) > 0.05, the emotional tendency of d is positive; if EScore(d) < -0.05, the emotional tendency of d is negative;
3) Community structure analysis, as follows:
When analyzing the relationship between two users in a network community, called A and B, consider not only whether A has forwarded B's microblogs, but also whether A has commented on or replied to them; treat users as nodes, and within a given time window T, if node A has forwarded, commented on, or replied to a microblog of node B, draw a directed edge from B to A; if a directed edge already exists between B and A, increase the weight of that edge by 1, thereby obtaining a number of microblog user relation trees within T; merge these relation trees to obtain the relation tree set of the network community;
4) Calculate the emotional influence degree of the whole network:
Based on the relation tree set of the network community obtained in steps 1), 2), and 3) above and the emotional tendency result of each microblog, calculate the emotional influence degree of the whole network; the calculation itself follows the prior art.
CN201310475440.8A 2013-10-12 2013-10-12 Network society influence maximization algorithm based on microblog text affective computing Pending CN103530360A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310475440.8A CN103530360A (en) 2013-10-12 2013-10-12 Network society influence maximization algorithm based on microblog text affective computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310475440.8A CN103530360A (en) 2013-10-12 2013-10-12 Network society influence maximization algorithm based on microblog text affective computing

Publications (1)

Publication Number Publication Date
CN103530360A true CN103530360A (en) 2014-01-22

Family

ID=49932369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310475440.8A Pending CN103530360A (en) 2013-10-12 2013-10-12 Network society influence maximization algorithm based on microblog text affective computing

Country Status (1)

Country Link
CN (1) CN103530360A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834638A (en) * 2014-02-10 2015-08-12 腾讯科技(深圳)有限公司 Hot word presentation method and device and electronic equipment
CN106598944A (en) * 2016-11-25 2017-04-26 中国民航大学 Civil aviation security public opinion emotion analysis method
CN106780073A (en) * 2017-01-11 2017-05-31 中南大学 A kind of community network maximizing influence start node choosing method for considering user behavior and emotion
CN107688630A (en) * 2017-08-21 2018-02-13 北京工业大学 A kind of more sentiment dictionary extending methods of Weakly supervised microblogging based on semanteme
CN108549632A (en) * 2018-04-03 2018-09-18 重庆邮电大学 A kind of social network influence power propagation model construction method based on sentiment analysis

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
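欧高炎 et al.: "Sentiment influence maximization model for microblogs" (面向微博的情感影响最大化模型), Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》) *
田家堂 et al.: "A new social network influence maximization algorithm" (一种新型的社会网络影响最大化算法), Chinese Journal of Computers (《计算机学报》) *
谢丽星: "Hierarchical-structure-based multi-strategy sentiment analysis and feature extraction for Chinese microblogs" (基于层次结构的多策略中文微博情感分析和特征抽取), Journal of Chinese Information Processing (《中文信息学报》) *
陈晓东: "Research on sentiment orientation analysis of Chinese microblogs based on a sentiment lexicon" (基于情感词典的中文微博情感倾向分析研究), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *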

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834638A (en) * 2014-02-10 2015-08-12 腾讯科技(深圳)有限公司 Hot word presentation method and device and electronic equipment
CN104834638B (en) * 2014-02-10 2019-07-05 腾讯科技(深圳)有限公司 A kind of hot word methods of exhibiting, device and electronic equipment
CN106598944A (en) * 2016-11-25 2017-04-26 中国民航大学 Civil aviation security public opinion emotion analysis method
CN106598944B (en) * 2016-11-25 2019-03-19 中国民航大学 A kind of civil aviaton's security public sentiment sentiment analysis method
CN106780073A (en) * 2017-01-11 2017-05-31 中南大学 A kind of community network maximizing influence start node choosing method for considering user behavior and emotion
CN106780073B (en) * 2017-01-11 2021-05-25 中南大学 Social network influence maximization initial node selection method considering user behaviors and emotions
CN107688630A (en) * 2017-08-21 2018-02-13 北京工业大学 A kind of more sentiment dictionary extending methods of Weakly supervised microblogging based on semanteme
CN107688630B (en) * 2017-08-21 2020-05-22 北京工业大学 Semantic-based weakly supervised microbo multi-emotion dictionary expansion method
CN108549632A (en) * 2018-04-03 2018-09-18 重庆邮电大学 A kind of social network influence power propagation model construction method based on sentiment analysis
CN108549632B (en) * 2018-04-03 2022-02-11 重庆邮电大学 Social network influence propagation model construction method based on emotion analysis

Similar Documents

Publication Publication Date Title
CN103324637B (en) A kind of hot information method for digging and system
CN104391942B (en) Short essay eigen extended method based on semantic collection of illustrative plates
CN102254038B (en) System and method for analyzing network comment relevance
CN107220386A (en) Information-pushing method and device
CN103530360A (en) Network society influence maximization algorithm based on microblog text affective computing
CN110162636A (en) Text mood reason recognition methods based on D-LSTM
CN105843897A (en) Vertical domain-oriented intelligent question and answer system
CN104268200A (en) Unsupervised named entity semantic disambiguation method based on deep learning
CN104133897B (en) A kind of microblog topic source tracing method based on topic influence
CN101127042A (en) Sensibility classification method based on language model
CN106202584A (en) A kind of microblog emotional based on standard dictionary and semantic rule analyzes method
CN103886501B (en) Post-loan risk early warning system based on semantic sentiment analysis
CN105975455A (en) Information analysis system based on bidirectional recursive neural network
CN104572877A (en) Detection method and detection system of game public opinion
CN104199845A (en) On-line comment sentiment classification method based on agent model
CN103095849B (en) A method and a system of spervised web service finding based on attribution forecast and error correction of quality of service (QoS)
Qi et al. DuReadervis: A Chinese dataset for open-domain document visual question answering
Li et al. Event extraction for criminal legal text
CN104750676B (en) Machine translation processing method and processing device
CN105912720A (en) Method for analyzing emotion-involved text data in computer
CN103810170A (en) Communication platform text classification method and device
JP2016181062A (en) Poster analysis device, program, and method for analyzing profile item of poster from posted sentence
CN105183806A (en) Method and system for identifying same user among different platforms
CN103530421A (en) Micro-blog based event similarity measuring method and system
CN110188352A (en) A kind of text subject determines method, apparatus, calculates equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140122
