CN106547875A - A microblog online burst event detection method based on sentiment analysis and labels

A microblog online burst event detection method based on sentiment analysis and labels

Info

Publication number
CN106547875A
Authority
CN
China
Prior art keywords
label
microblogging
word
emotion
sentiment analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610945406.6A
Other languages
Chinese (zh)
Other versions
CN106547875B (en)
Inventor
邹晓梅
杨静
张健沛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University
Priority to CN201610945406.6A
Publication of CN106547875A
Application granted
Publication of CN106547875B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Human Resources & Organizations (AREA)
  • Computing Systems (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of network detection and specifically relates to an online burst event detection method for microblogs based on sentiment analysis and labels (hashtags). The method comprises: (1) constructing a sentiment analysis model, the emotion co-occurrence graph, from the emotion wheel sentiment classification model; (2) using the sentiment analysis model constructed in step (1) to classify the microblogs in a microblog stream by emotion, and detecting the burst periods of the stream with the Kleinberg algorithm; (3) extracting the microblog labels within each burst period, filtering out junk labels, and segmenting the remaining labels into words to form the initial keywords of the event; (4) using the keywords generated in step (3) to extract the related words from the microblogs and form the final description of the event. Because the emotion co-occurrence graph is built on the emotion wheel, emotion classification is finer-grained and the emotions are easier to understand and interpret, and detection accuracy is higher than that of event detection based on emotion symbols.

Description

A microblog online burst event detection method based on sentiment analysis and labels
Technical field
The invention belongs to the field of network detection and specifically relates to an online burst event detection method for microblogs based on sentiment analysis and labels.
Background
With the rapid development of Web 2.0 technologies in recent years, a series of social networks has emerged. Social networks such as Sina Weibo and Twitter attract large numbers of users. Users are active on these networks and post large volumes of microblog messages containing their views or opinions on events. Mining these messages yields deeper information, such as user sentiment. This information can serve governments and enterprises: for example, a government can use it to judge whether people support a bill or what views they hold on a social event, and then manage and guide public opinion; an enterprise can mine users' microblog messages to learn their behavioral habits and preferences, and then recommend the goods they are most likely to be interested in or buy.
There are two conventional approaches to burst event detection: document-based and feature-based. Document-based detection represents each document as a word vector or a named-entity vector, computes the similarity between documents, and clusters the documents to form events. Feature-based detection is one of the effective ways to mine burst events in a data stream; its main idea is to first extract feature words from the documents, detect bursts by analyzing how the feature words change over time, and then aggregate feature words with the same burst trajectory to form a burst event. However, neither approach suits short microblog texts. First, the volume of microblog data is large, so extracting feature words for every microblog and building a TF-IDF matrix takes a great deal of time. Second, microblog language is irregular and varied in form and may contain many new words, so the resulting matrix is sparse, which hampers similarity computation and makes identification harder. In addition, conventional methods only extract the burst event and do not analyze it further, for example with sentiment analysis.
Summary of the invention
The purpose of the invention is to provide an online burst event detection model for short texts in microblog data streams: a microblog online burst event detection method based on sentiment analysis and labels that can extract the burst events in a data stream accurately and quickly.
The purpose of the invention is achieved as follows:
A microblog online burst event detection method based on sentiment analysis and labels, comprising the following steps:
(1) Construct a sentiment analysis model, the emotion co-occurrence graph, from the emotion wheel sentiment classification model;
(2) Using the sentiment analysis model constructed in step (1), classify the microblogs in the microblog stream by emotion, and detect the burst periods of the stream with the Kleinberg algorithm;
(3) Extract the microblog labels within the burst period, filter out junk labels, and segment the remaining labels into words to form the initial keywords of the event;
(4) Using the keywords generated in step (3), extract the words in the microblogs that are related to these keywords to form the final description of the event.
In step (1), the emotion co-occurrence graph is constructed as follows:
(1.1) Using the emotion wheel model, manually assign appropriate words to the emotion symbols;
(1.2) Segment the raw microblog data into words to form the microblog corpus;
(1.3) Using the HowNet dictionary, compute the similarity between the words of the microblog corpus and the emotion symbol words with a distance-based word similarity measure;
The word similarity in (1.3) is computed with the following formula:
In the formula, W1 and W2 denote words; word W1 has k senses {n11, n12, …, n1k} and word W2 has p senses {n21, n22, …, n2p}; p1 and p2 denote two sememes; d is the path length between p1 and p2 in the sememe hierarchy and is a positive integer; α is an adjustable parameter;
(1.4) Create an edge between every pair of words whose similarity exceeds the given threshold λ, which completes the construction of the emotion co-occurrence graph; λ is set to 0.6.
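The formula referred to in (1.3) is not reproduced in this text. A form consistent with the symbols defined above, assuming the widely used HowNet distance-based similarity (an assumption, not confirmed by the source), would be:

```latex
\mathrm{Sim}(p_1, p_2) = \frac{\alpha}{d + \alpha},
\qquad
\mathrm{Sim}(W_1, W_2) = \max_{1 \le i \le k,\; 1 \le j \le p} \mathrm{Sim}(n_{1i}, n_{2j})
```

Under this reading, two sememes one step apart with α = 1.6 would score 1.6 / 2.6 ≈ 0.62, just above the threshold λ = 0.6, while sememes two steps apart would score 1.6 / 3.6 ≈ 0.44 and produce no edge.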
Step (3) comprises the following steps:
(3.1) Tag the extracted labels with parts of speech, and remove labels that consist only of verbs or of a single noun;
(3.2) Remove labels that contain special characters;
(3.3) Remove labels that contain a standard date format or consist only of digits and punctuation marks;
Step (4) comprises the following steps:
(4.1) Segment the remaining labels in the burst period into words;
(4.2) Compute the frequent patterns in the burst period that involve the microblog label keywords;
(4.3) Extract the 2-itemsets from the frequent patterns and compute the mutual information between the words in each 2-itemset;
(4.4) Retain the words whose mutual information exceeds the given threshold γ to form the final event description; γ is set to 1.5;
In step (4.4), the mutual information is computed with the following formula:
C(W1) and C(W2) denote the number of microblogs in the corpus that contain W1 and W2 respectively; C(W1, W2) denotes the number of microblogs that contain both W1 and W2; R is the size of the corpus, i.e. the total number of microblogs.
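The mutual information formula is likewise not reproduced in this text. Assuming the usual pointwise mutual information over document counts, with a base-2 logarithm (the base is an assumption), the definitions above correspond to:

```latex
I(W_1, W_2) = \log_2 \frac{C(W_1, W_2) / R}{\bigl(C(W_1)/R\bigr)\,\bigl(C(W_2)/R\bigr)}
            = \log_2 \frac{R \cdot C(W_1, W_2)}{C(W_1)\, C(W_2)}
```

As a worked example under that assumption: with R = 8, C(W1) = C(W2) = 2 and C(W1, W2) = 2, the value is log2(8 · 2 / 4) = 2, which exceeds γ = 1.5, so the pair would be retained.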
The beneficial effects of the invention are:
The invention constructs an emotion co-occurrence graph based on the emotion wheel, so emotion classification is finer-grained, the emotions are easier to understand and interpret, and event detection accuracy is higher than with methods based on emotion symbols. Sentiment analysis with the constructed graph filters out a large number of useless microblogs, and the sentiment analysis results are used to detect the burst state of the microblog data stream, which is efficient. Using microblog labels to guide burst event discovery gives higher accuracy and faster detection than clustering-based event discovery.
Brief description of the drawings
Fig. 1 shows the framework of the online burst event detection model based on the emotion co-occurrence graph.
Detailed description
The implementation of the invention is described in further detail below with reference to the drawing and specific embodiments.
Step 1: Construct a sentiment analysis model, the emotion co-occurrence graph, from the emotion wheel sentiment classification model. This specifically comprises the following steps:
Step 1.1: Using the emotion wheel model, manually assign appropriate words to the emotion symbols;
Step 1.2: Segment the raw microblog data into words to form the microblog corpus;
Step 1.3: Using the HowNet dictionary, compute the similarity between the words of the microblog corpus and the emotion symbol words with a distance-based word similarity measure.
The word similarity in step 1.3 is computed with the following formula:
In the formula, W1 and W2 denote words; word W1 has k senses (concepts) {n11, n12, …, n1k} and word W2 has p senses (concepts) {n21, n22, …, n2p}; p1 and p2 denote two sememes; d is the path length between p1 and p2 in the sememe hierarchy and is a positive integer. α is an adjustable parameter, set to 1.6 in the invention.
Step 1.4: Create an edge between every pair of words whose similarity exceeds the given threshold λ, which completes the construction of the emotion co-occurrence graph. λ is set to 0.6 in the invention.
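Steps 1.3 and 1.4 can be sketched as follows. This is a minimal illustration, not the patented implementation: the sememe tree `PARENT` and the word-to-sense table `SENSES` are hypothetical toy data standing in for HowNet, each sense is reduced to a single sememe, and the similarity α / (d + α) is an assumed form consistent with the symbols defined above.

```python
from itertools import product

ALPHA = 1.6   # adjustable parameter alpha (value given in the description)
LAMBDA = 0.6  # edge threshold lambda from step 1.4

# Hypothetical toy sememe hierarchy (child -> parent); not real HowNet data.
PARENT = {
    "emotion": None,
    "joy": "emotion",
    "sadness": "emotion",
    "delight": "joy",
    "grief": "sadness",
}

# Hypothetical word -> senses table; each sense is represented by one sememe.
SENSES = {
    "happy": ["joy"],
    "glad": ["joy", "delight"],
    "sad": ["sadness"],
    "mourn": ["grief"],
}


def ancestors(sememe):
    """Return the chain from a sememe up to the root, inclusive."""
    chain = []
    while sememe is not None:
        chain.append(sememe)
        sememe = PARENT[sememe]
    return chain


def path_length(p1, p2):
    """Number of edges between two sememes in the tree."""
    chain1, chain2 = ancestors(p1), ancestors(p2)
    depth_in_chain2 = {s: i for i, s in enumerate(chain2)}
    for i, s in enumerate(chain1):
        if s in depth_in_chain2:
            return i + depth_in_chain2[s]
    raise ValueError("sememes do not share a root")


def sememe_sim(p1, p2, alpha=ALPHA):
    """Assumed distance-based similarity: alpha / (d + alpha)."""
    return alpha / (path_length(p1, p2) + alpha)


def word_sim(w1, w2):
    """Word similarity as the best match over all sense pairs."""
    return max(sememe_sim(a, b) for a, b in product(SENSES[w1], SENSES[w2]))


def build_graph(corpus_words, emotion_words, threshold=LAMBDA):
    """Connect corpus words to emotion words whose similarity exceeds the threshold."""
    edges = {}
    for c in corpus_words:
        for e in emotion_words:
            s = word_sim(c, e)
            if s > threshold:
                edges[(c, e)] = s
    return edges
```

Calling `build_graph(["glad", "mourn"], ["happy", "sad"])` on this toy data links each corpus word only to the emotion word in its own branch of the tree; a real system would replace the two tables with lookups into HowNet.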
Step 2: Using the sentiment analysis model constructed in step 1, classify the microblogs in the microblog stream by emotion, and detect the burst periods of the stream with the Kleinberg algorithm.
Step 2.1: Segment each microblog in the stream into words.
Step 2.2: For each segmented microblog, build its emotion vector Sd using the emotion co-occurrence graph model constructed above.
Step 2.3: Set the flag flag = true. If the emotion flag σ sk corresponding to the vector Sd is 1, add the microblog to the emotion document set Ds Tk and set flag to false.
Step 2.4: Repeat steps 2.2 and 2.3 until all microblogs have been classified.
Step 2.5: For each emotion class of microblogs, detect the burst periods with the Kleinberg algorithm.
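The description names the Kleinberg algorithm without giving its configuration. A minimal sketch of one common variant, the two-state batched burst model solved with dynamic programming, is shown below. The choice of this variant, the function name `kleinberg_bursts`, and the parameters `s` and `gamma` are assumptions for illustration; `relevant[t]` is taken to be the number of microblogs of one emotion class in time window t and `total[t]` the number of all microblogs in that window.

```python
import math


def kleinberg_bursts(relevant, total, s=2.0, gamma=1.0):
    """Label each time window 0 (normal) or 1 (burst).

    Two-state batched model: state 0 expects the overall proportion p0 of
    relevant documents, state 1 expects s * p0. Entering the burst state
    costs gamma * ln(n); leaving it is free.
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(total)
    if not 0.0 < p0 < 1.0:
        return [0] * n  # degenerate stream: nothing to contrast against
    probs = (p0, min(s * p0, 0.9999))
    up_cost = gamma * math.log(n)

    def fit_cost(state, r, d):
        # Negative log-likelihood of r relevant out of d documents; the
        # binomial coefficient is the same for both states and is dropped.
        p = probs[state]
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    cost = [0.0, math.inf]  # the automaton starts in the normal state
    back = []
    for r, d in zip(relevant, total):
        new_cost, pointers = [], []
        for state in (0, 1):
            candidates = [
                cost[prev] + (up_cost if state > prev else 0.0)
                for prev in (0, 1)
            ]
            best_prev = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best_prev] + fit_cost(state, r, d))
            pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest state sequence backwards.
    state = 0 if cost[0] <= cost[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]
    return states
```

Maximal runs of 1 in the returned list would be treated as burst periods for that emotion class. Larger `s` demands a sharper rise before a window counts as bursty, and larger `gamma` suppresses short, isolated bursts.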
Step 3: Extract the microblog labels within the burst period, filter out junk labels, and segment the remaining labels into words to form the initial keywords of the event.
Step 3.1: Tag the extracted labels with parts of speech, and remove labels that consist only of verbs or of a single noun, such as "#good morning#", "#good night#", "#sing#", "#Jiuzhaigou#" and "#journey#".
Step 3.2: Remove labels that contain special characters ("《", "+", "-", "-"), such as "#funny+video#", "#good morning * love shop#" and "#Weico+#".
Step 3.3: Remove labels that contain a standard date format or consist only of digits and punctuation marks, such as "#365#" and "#4.01#".
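The three filtering rules can be sketched as a single predicate. Several details here are assumptions rather than part of the description: the exact special-character set, the date pattern, and the `toy_pos_tagger` lookup table, which is a hypothetical stand-in for a real Chinese part-of-speech tagger (tags starting with "v" are treated as verbs and tags starting with "n" as nouns).

```python
import re

# Assumed character set, drawn from the examples quoted in step 3.2.
SPECIAL_CHARS = set("《》+-*")
# Assumed notion of a "standard date format": year, month, day with common separators.
DATE_LIKE = re.compile(r"\d{4}[-/.年]\d{1,2}[-/.月]\d{1,2}日?")
# Labels made up solely of digits and punctuation, such as "365" or "4.01".
DIGITS_PUNCT_ONLY = re.compile(r"^[\d\W_]+$")

# Hypothetical tagger output: label -> list of (word, part-of-speech tag).
TOY_TAGS = {
    "sing": [("sing", "v")],
    "journey": [("journey", "n")],
    "Tianjin explosion": [("Tianjin", "ns"), ("explosion", "n")],
}


def toy_pos_tagger(label):
    """Stand-in tagger; unknown labels get a single 'x' (other) tag."""
    return TOY_TAGS.get(label, [(label, "x")])


def is_junk(label, pos_tagger=toy_pos_tagger):
    """Apply rules 3.1 to 3.3 to one label (without the surrounding '#')."""
    tags = [pos for _, pos in pos_tagger(label)]
    # Rule 3.1: only verbs, or exactly one noun.
    if tags and all(t.startswith("v") for t in tags):
        return True
    if len(tags) == 1 and tags[0].startswith("n"):
        return True
    # Rule 3.2: contains a special character.
    if any(ch in SPECIAL_CHARS for ch in label):
        return True
    # Rule 3.3: contains a date, or is only digits and punctuation.
    if DATE_LIKE.search(label) or DIGITS_PUNCT_ONLY.match(label):
        return True
    return False


def filter_labels(labels, pos_tagger=toy_pos_tagger):
    """Keep the labels that survive all three rules."""
    cleaned = [label.strip("#") for label in labels]
    return [label for label in cleaned if not is_junk(label, pos_tagger)]
```

Passing the tagger in as a parameter keeps the rules independent of any particular segmentation library; only the convention that verb tags start with "v" and noun tags with "n" is relied on.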
Step 4: Using the keywords generated in step 3, extract the words in the microblogs that are related to these keywords to form the final description of the event.
Step 4.1: Segment the remaining labels in the burst period into words.
Step 4.2: Compute the frequent patterns in the burst period that involve the microblog label keywords.
Step 4.3: Extract the 2-itemsets from the frequent patterns and compute the mutual information between the words in each 2-itemset.
Step 4.4: Retain the words whose mutual information exceeds the given threshold γ, sort them by word frequency, and form the final event description. In the invention, γ is set to 1.5.
In step 4.4, the mutual information is computed with the following formula:
C(W1) and C(W2) denote the number of microblogs in the corpus that contain W1 and W2 respectively; C(W1, W2) denotes the number of microblogs that contain both W1 and W2. R is the size of the corpus, i.e. the total number of microblogs.
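Steps 4.2 to 4.4 can be sketched with plain co-occurrence counting. The sketch assumes that the mutual information is pointwise mutual information with a base-2 logarithm, log2(R · C(W1, W2) / (C(W1) · C(W2))), and that a 2-itemset is "frequent" when it occurs in at least `min_support` microblogs; the formula's exact form, the support threshold, and the function name `related_pairs` are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations

GAMMA = 1.5  # mutual-information threshold from step 4.4


def related_pairs(microblogs, keywords, min_support=2, threshold=GAMMA):
    """Return {(w1, w2): mutual information} for frequent keyword pairs.

    microblogs: list of token lists (already segmented)
    keywords:   set of label keywords produced by step 3
    """
    total = len(microblogs)  # R, the corpus size
    word_count = Counter()   # C(W): microblogs containing W
    pair_count = Counter()   # C(W1, W2): microblogs containing both
    for tokens in microblogs:
        unique = sorted(set(tokens))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))

    result = {}
    for (w1, w2), both in pair_count.items():
        if both < min_support:
            continue  # not a frequent 2-itemset
        if w1 not in keywords and w2 not in keywords:
            continue  # the pair must involve a label keyword
        mi = math.log2(total * both / (word_count[w1] * word_count[w2]))
        if mi > threshold:
            result[(w1, w2)] = mi
    return result
```

The words appearing in the returned pairs would then be sorted by frequency to form the event description. Counting each word at most once per microblog matches the definition of C(W) as a number of microblogs rather than a number of occurrences.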

Claims (4)

1. A microblog online burst event detection method based on sentiment analysis and labels, characterized by comprising the following steps:
(1) constructing a sentiment analysis model, the emotion co-occurrence graph, from the emotion wheel sentiment classification model;
(2) using the sentiment analysis model constructed in step (1), classifying the microblogs in the microblog stream by emotion, and detecting the burst periods of the stream with the Kleinberg algorithm;
(3) extracting the microblog labels within the burst period, filtering out junk labels, and segmenting the remaining labels into words to form the initial keywords of the event;
(4) using the keywords generated in step (3), extracting the words in the microblogs that are related to these keywords to form the final description of the event.
2. The microblog online burst event detection method based on sentiment analysis and labels according to claim 1, characterized in that in step (1) the emotion co-occurrence graph is constructed as follows:
(1.1) using the emotion wheel model, manually assigning appropriate words to the emotion symbols;
(1.2) segmenting the raw microblog data into words to form the microblog corpus;
(1.3) using the HowNet dictionary, computing the similarity between the words of the microblog corpus and the emotion symbol words with a distance-based word similarity measure;
the word similarity in (1.3) being computed with the following formula:
in the formula, W1 and W2 denote words; word W1 has k senses {n11, n12, …, n1k} and word W2 has p senses {n21, n22, …, n2p}; p1 and p2 denote two sememes; d is the path length between p1 and p2 in the sememe hierarchy and is a positive integer; α is an adjustable parameter;
(1.4) creating an edge between every pair of words whose similarity exceeds the given threshold λ, which completes the construction of the emotion co-occurrence graph; λ is set to 0.6.
3. The microblog online burst event detection method based on sentiment analysis and labels according to claim 1, characterized in that step (3) comprises the following steps:
(3.1) tagging the extracted labels with parts of speech, and removing labels that consist only of verbs or of a single noun;
(3.2) removing labels that contain special characters;
(3.3) removing labels that contain a standard date format or consist only of digits and punctuation marks.
4. The microblog online burst event detection method based on sentiment analysis and labels according to claim 1, characterized in that step (4) comprises the following steps:
(4.1) segmenting the remaining labels in the burst period into words;
(4.2) computing the frequent patterns in the burst period that involve the microblog label keywords;
(4.3) extracting the 2-itemsets from the frequent patterns and computing the mutual information between the words in each 2-itemset;
(4.4) retaining the words whose mutual information exceeds the given threshold γ to form the final event description; γ is set to 1.5;
in step (4.4), the mutual information being computed with the following formula:
C(W1) and C(W2) denote the number of microblogs in the corpus that contain W1 and W2 respectively; C(W1, W2) denotes the number of microblogs that contain both W1 and W2; R is the size of the corpus, i.e. the total number of microblogs.
CN201610945406.6A 2016-11-02 2016-11-02 Microblog online emergency detection method based on emotion analysis and label Active CN106547875B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610945406.6A CN106547875B (en) 2016-11-02 2016-11-02 Microblog online emergency detection method based on emotion analysis and label

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610945406.6A CN106547875B (en) 2016-11-02 2016-11-02 Microblog online emergency detection method based on emotion analysis and label

Publications (2)

Publication Number Publication Date
CN106547875A true CN106547875A (en) 2017-03-29
CN106547875B CN106547875B (en) 2020-05-15

Family

ID=58393729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610945406.6A Active CN106547875B (en) 2016-11-02 2016-11-02 Microblog online emergency detection method based on emotion analysis and label

Country Status (1)

Country Link
CN (1) CN106547875B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559233A (en) * 2012-10-29 2014-02-05 中国人民解放军国防科学技术大学 Extraction method for network new words in microblogs and microblog emotion analysis method and system
CN103246728A (en) * 2013-05-10 2013-08-14 北京大学 Emergency detection method based on document lexical feature variations
CN104573031A (en) * 2015-01-14 2015-04-29 哈尔滨工业大学深圳研究生院 Micro blog emergency detection method
CN105224604A (en) * 2015-09-01 2016-01-06 天津大学 A kind of microblogging incident detection method based on heap optimization and pick-up unit thereof
CN105718598A (en) * 2016-03-07 2016-06-29 天津大学 AT based time model construction method and network emergency early warning method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Lumin et al.: "An online burst event detection method based on emotion symbols", Chinese Journal of Computers (计算机学报) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886442A (en) * 2017-11-28 2018-04-06 合肥工业大学 Public's emotion distribution modeling method and device based on microblogging text
JP7091700B2 (en) 2018-02-21 2022-06-28 富士通株式会社 Information processing program, message analysis program, information processing device and information processing method
JP2019144905A (en) * 2018-02-21 2019-08-29 富士通株式会社 Information processing program, message analysis program, information processor, and information processing method
CN109189910A (en) * 2018-09-18 2019-01-11 哈尔滨工程大学 A kind of label auto recommending method towards mobile application problem report
CN109189910B (en) * 2018-09-18 2019-09-10 哈尔滨工程大学 A kind of label auto recommending method towards mobile application problem report
CN109783800A (en) * 2018-12-13 2019-05-21 北京百度网讯科技有限公司 Acquisition methods, device, equipment and the storage medium of emotion keyword
CN109783800B (en) * 2018-12-13 2024-04-12 北京百度网讯科技有限公司 Emotion keyword acquisition method, device, equipment and storage medium
CN109977231A (en) * 2019-04-10 2019-07-05 上海海事大学 A kind of depressive emotion analysis method based on emotion decay factor
CN110990592A (en) * 2019-11-07 2020-04-10 北京科技大学 Microblog burst topic online detection method and detection device
CN110990592B (en) * 2019-11-07 2023-06-23 北京科技大学 Online microblog burst topic detection method and detection device
CN111950273B (en) * 2020-07-31 2023-09-01 南京莱斯网信技术研究院有限公司 Automatic network public opinion emergency identification method based on emotion information extraction analysis
CN111950273A (en) * 2020-07-31 2020-11-17 南京莱斯网信技术研究院有限公司 Network public opinion emergency automatic identification method based on emotion information extraction analysis
CN112084333B (en) * 2020-08-31 2022-04-22 杭州电子科技大学 Social user generation method based on emotional tendency analysis
CN112084333A (en) * 2020-08-31 2020-12-15 杭州电子科技大学 Social user generation method based on emotional tendency analysis

Also Published As

Publication number Publication date
CN106547875B (en) 2020-05-15

Similar Documents

Publication Publication Date Title
CN106547875A (en) A kind of online incident detection method of the microblogging based on sentiment analysis and label
CN109800310B (en) Electric power operation and maintenance text analysis method based on structured expression
CN104199972B (en) A kind of name entity relation extraction and construction method based on deep learning
CN104778209B (en) A kind of opining mining method for millions scale news analysis
Akaichi et al. Text mining facebook status updates for sentiment classification
CN104008091B (en) A kind of network text sentiment analysis method based on emotion value
CN108363725B (en) Method for extracting user comment opinions and generating opinion labels
CN103500175B (en) A kind of method based on sentiment analysis on-line checking microblog hot event
CN104268160A (en) Evaluation object extraction method based on domain dictionary and semantic roles
CN105045857A (en) Social network rumor recognition method and system
CN104281653A (en) Viewpoint mining method for ten million microblog texts
CN103218444A (en) Method of Tibetan language webpage text classification based on semanteme
CN105183717A (en) OSN user emotion analysis method based on random forest and user relationship
CN114579833B (en) Microblog public opinion visual analysis method based on topic mining and emotion analysis
CN102436480B (en) Incidence relation excavation method for text-oriented knowledge unit
CN104199845B (en) Line Evaluation based on agent model discusses sensibility classification method
CN106126502A (en) A kind of emotional semantic classification system and method based on support vector machine
CN105843796A (en) Microblog emotional tendency analysis method and device
CN115017303A (en) Method, computing device and medium for enterprise risk assessment based on news text
CN107463703A (en) English social media account number classification method based on information gain
CN102073646A (en) Blog group-oriented subject propensity processing method and system
CN103455639A (en) Method and device for recognizing microblog burst hotspot events
Bouchlaghem et al. A machine learning approach for classifying sentiments in Arabic tweets
CN109857869A (en) A kind of hot topic prediction technique based on Ap increment cluster and network primitive
CN110019820A (en) Main suit and present illness history symptom Timing Coincidence Detection method in a kind of case history

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant