CN106547875B - Microblog online emergency detection method based on emotion analysis and label - Google Patents


Info

Publication number
CN106547875B
Authority
CN
China
Prior art keywords
emotion
microblog
labels
words
word
Prior art date
Legal status
Active
Application number
CN201610945406.6A
Other languages
Chinese (zh)
Other versions
CN106547875A
Inventor
邹晓梅
杨静
张健沛
Current Assignee
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date
2016-11-02
Filing date
2016-11-02
Publication date
2020-05-15
Application filed by Harbin Engineering University
Priority to CN201610945406.6A
Publication of CN106547875A
Application granted
Publication of CN106547875B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Human Resources & Organizations (AREA)
  • Computing Systems (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of network detection and relates in particular to a microblog online emergency detection method based on emotion analysis and labels. The method comprises the following steps: (1) constructing an emotion analysis model, namely an emotion co-occurrence graph, using the emotion wheel as the emotion classification model; (2) classifying the emotions of the microblogs in the microblog stream with the model constructed in step (1), and detecting burst periods of the stream using Kleinberg's algorithm; (3) extracting the microblog labels (hashtags) posted during a burst period, filtering out junk labels, and performing word segmentation on the remaining labels to form the initial keywords of an event; (4) using the keywords generated in step (3) to extract related words from the microblogs, forming the final description of the event. Because the emotion co-occurrence graph is built on the emotion wheel, emotions are classified at a finer granularity and are easier to understand and interpret, and event detection is more accurate than with methods based on emotion symbols.

Description

Microblog online emergency detection method based on emotion analysis and label
Technical Field
The invention belongs to the field of network detection, and particularly relates to a microblog online emergency detection method based on emotion analysis and labels.
Background
With the rapid development of Web 2.0 technology in recent years, a series of social networks such as Facebook and Twitter have emerged and attracted large numbers of users. Users are active on these networks and post large numbers of microblog messages expressing opinions or views about particular events. Mining these messages yields deeper information, such as user emotion, that can serve governments and enterprises. For example, governments can use it to judge whether the public supports laws and regulations and what opinions people hold on a given social event, in order to monitor and guide public opinion; enterprises can mine users' microblog messages to learn their behavioral habits and preferences and then recommend the products each user is most likely to be interested in or to buy.
There are two conventional approaches to emergency (burst event) detection: document-based detection and feature-based detection. Document-based detection represents documents as word vectors or named-entity vectors, computes the similarity between documents, and clusters the documents to form events. Feature-based burst detection is an effective way to mine burst events in a data stream: it first extracts the feature words of the documents, detects bursts by analyzing how each feature word changes over time, and then aggregates feature words with the same burst trajectory into burst events. However, neither method suits short microblog texts. First, microblog data volumes are large, so extracting feature words and building a TF-IDF matrix for every microblog takes a long time. Second, microblogs are written irregularly and in varied forms and are likely to contain many new words, so the resulting matrix is sparse, similarity is hard to compute, and identification becomes more difficult. Moreover, traditional methods stop at extracting the emergency and do not analyze it further, for example through sentiment analysis.
Disclosure of Invention
The invention aims to provide an online emergency detection model for short texts in microblog data streams: an online emergency detection method based on emotion analysis and labels that extracts emergencies from the data stream accurately and quickly.
The purpose of the invention is realized as follows:
A microblog online emergency detection method based on emotion analysis and labels comprises the following steps:
(1) constructing an emotion analysis model, namely an emotion co-occurrence graph, using the emotion wheel as the emotion classification model;
(2) classifying the emotions of the microblogs in the microblog stream with the emotion analysis model constructed in step (1), and detecting burst periods of the microblog stream using Kleinberg's algorithm;
(3) extracting the microblog labels posted during a burst period, filtering out junk labels, and performing word segmentation on the remaining labels to form the initial keywords of an event;
(4) extracting words related to these keywords from the microblogs, using the keywords generated in step (3), to form the final description of the event.
In step (1), the emotion co-occurrence graph is constructed as follows:
(1.1) using the emotion wheel model, manually assigning appropriate words to the emotion symbols;
(1.2) performing word segmentation on the original microblog data to form a microblog corpus;
(1.3) computing the similarity between the words of the microblog corpus and the words of the emotion symbols, using the HowNet dictionary and a distance-based word similarity measure;
In (1.3), word similarity is calculated with the following formulas:
$$\mathrm{Sim}(W_1, W_2) = \max_{1 \le i \le k,\; 1 \le j \le p} \mathrm{Sim}(n_{1i}, n_{2j})$$
$$\mathrm{Sim}(p_1, p_2) = \frac{\alpha}{d + \alpha}$$
where $W_1$ and $W_2$ are words; $W_1$ has k senses $\{n_{11}, n_{12}, \dots, n_{1k}\}$ and $W_2$ has p senses $\{n_{21}, n_{22}, \dots, n_{2p}\}$; $p_1$ and $p_2$ are two sememes; d, a positive integer, is the length of the path between $p_1$ and $p_2$ in the sememe hierarchy; and α is an adjustable parameter;
(1.4) connecting the words whose similarity exceeds a given threshold λ to complete the construction of the emotion co-occurrence graph; λ is set to 0.6.
Step (3) comprises the following steps:
(3.1) performing part-of-speech tagging on the extracted labels and removing labels that consist only of a verb or only of a noun;
(3.2) removing labels that contain special symbols;
(3.3) removing labels that are in a standard date format or consist only of digits and punctuation marks;
Step (4) comprises the following steps:
(4.1) performing word segmentation on the labels remaining in the burst period;
(4.2) mining the frequent patterns of the related microblog label keywords in the burst period;
(4.3) extracting the 2-itemsets from the frequent patterns and calculating the mutual information between the words in each 2-itemset;
(4.4) keeping the words whose mutual information exceeds a given threshold γ to form the final event description; γ is set to 1.5.
The mutual information in step (4.4) is calculated as:
$$MI(W_1, W_2) = \log \frac{P(W_1, W_2)}{P(W_1)\,P(W_2)}$$
$$P(W_1) = \frac{C(W_1)}{R}, \qquad P(W_2) = \frac{C(W_2)}{R}, \qquad P(W_1, W_2) = \frac{C(W_1, W_2)}{R}$$
where $C(W_1)$ and $C(W_2)$ are the numbers of microblogs in the corpus that contain $W_1$ and $W_2$ respectively, $C(W_1, W_2)$ is the number of microblogs that contain both $W_1$ and $W_2$, and R is the size of the corpus, i.e., the total number of microblogs.
The beneficial effects of the invention are as follows:
Because the emotion co-occurrence graph is built on the emotion wheel, emotions are classified at a finer granularity and are easier to understand and interpret, and event detection is more accurate than with methods based on emotion symbols. Performing emotion analysis with the constructed graph filters out a large number of useless microblogs, and using the emotion analysis results to detect the burst state of the microblog data stream makes detection efficient. Using microblog labels as a guide for discovering emergencies gives higher accuracy than clustering-based event discovery, with a short detection time.
Drawings
FIG. 1 shows the framework of the online emergency detection model based on the emotion co-occurrence graph.
Detailed Description
The following describes the implementation of the present invention in further detail with reference to the accompanying drawings and the detailed description.
Step 1: Construct an emotion analysis model, namely an emotion co-occurrence graph, using the emotion wheel as the emotion classification model. Specifically:
Step 1.1: Using the emotion wheel model, manually assign appropriate words to the emotion symbols.
Step 1.2: Perform word segmentation on the original microblog data to form a microblog corpus.
Step 1.3: Compute the similarity between the words of the microblog corpus and the words of the emotion symbols, using the HowNet dictionary and a distance-based word similarity measure.
In step 1.3, word similarity is calculated with the following formulas:
$$\mathrm{Sim}(W_1, W_2) = \max_{1 \le i \le k,\; 1 \le j \le p} \mathrm{Sim}(n_{1i}, n_{2j})$$
$$\mathrm{Sim}(p_1, p_2) = \frac{\alpha}{d + \alpha}$$
where $W_1$ and $W_2$ are words; $W_1$ has k senses (concepts) $\{n_{11}, n_{12}, \dots, n_{1k}\}$ and $W_2$ has p senses (concepts) $\{n_{21}, n_{22}, \dots, n_{2p}\}$; $p_1$ and $p_2$ are two sememes; d, a positive integer, is the length of the path between $p_1$ and $p_2$ in the sememe hierarchy; and α is an adjustable parameter, set to 1.6 in the present invention.
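As a minimal sketch of the distance-based similarity in step 1.3, the Python below computes Sim(p1, p2) = α / (d + α) over a sememe tree and, assuming the usual HowNet formulation, takes the maximum over all sense pairs of two words. It simplifies each sense to a single sememe, and the toy tree `TOY_TREE` and all function names are hypothetical; a real implementation would load the HowNet sememe hierarchy instead.

```python
from typing import Dict, List, Optional, Sequence

Tree = Dict[str, Optional[str]]  # child sememe -> parent sememe (None for the root)

ALPHA = 1.6  # alpha value given in the description

# Hypothetical toy sememe tree; a real system would load HowNet.
TOY_TREE: Tree = {
    "entity": None,
    "emotion": "entity",
    "joy": "emotion",
    "sadness": "emotion",
    "event": "entity",
    "festival": "event",
}


def _ancestors(node: str, tree: Tree) -> List[str]:
    """Return the node followed by its ancestors up to the root."""
    chain = [node]
    while tree.get(node) is not None:
        node = tree[node]
        chain.append(node)
    return chain


def path_length(p1: str, p2: str, tree: Tree) -> int:
    """Number of edges between two sememes via their lowest common ancestor."""
    a1, a2 = _ancestors(p1, tree), _ancestors(p2, tree)
    depth_in_a2 = {node: i for i, node in enumerate(a2)}
    for i, node in enumerate(a1):
        if node in depth_in_a2:
            return i + depth_in_a2[node]
    raise ValueError(f"{p1!r} and {p2!r} are not in the same tree")


def sememe_similarity(p1: str, p2: str, tree: Tree, alpha: float = ALPHA) -> float:
    """Sim(p1, p2) = alpha / (d + alpha)."""
    return alpha / (path_length(p1, p2, tree) + alpha)


def word_similarity(
    senses1: Sequence[str], senses2: Sequence[str], tree: Tree, alpha: float = ALPHA
) -> float:
    """Maximum similarity over all sense pairs (each sense simplified to one sememe)."""
    return max(sememe_similarity(a, b, tree, alpha) for a in senses1 for b in senses2)


print(round(sememe_similarity("joy", "sadness", TOY_TREE), 3))  # d = 2 -> 0.444
```

With α = 1.6, identical sememes score 1.0 and the score decays smoothly as the path length d grows.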
Step 1.4: Connect the words whose similarity exceeds a given threshold λ to complete the construction of the emotion co-occurrence graph. In the present invention, λ is set to 0.6.
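Step 1.4 can be sketched as follows, under the assumption that each corpus word is linked to every emotion-symbol word whose similarity to it exceeds λ = 0.6. The function name `build_emotion_graph`, the seed-word dictionary, and the toy similarity table are hypothetical; any word-similarity function, such as the step 1.3 measure, can be passed in.

```python
from typing import Callable, Dict, Iterable, Set

LAMBDA = 0.6  # threshold given in the description


def build_emotion_graph(
    corpus_words: Iterable[str],
    seed_words: Dict[str, Iterable[str]],
    similarity: Callable[[str, str], float],
    threshold: float = LAMBDA,
) -> Dict[str, Set[str]]:
    """Link corpus words to emotion seed words whose similarity exceeds the threshold.

    Returns an undirected adjacency map: word -> set of linked words.
    """
    corpus = list(corpus_words)  # allow iterating more than once
    graph: Dict[str, Set[str]] = {}
    for seeds in seed_words.values():
        for seed in seeds:
            for word in corpus:
                if word != seed and similarity(word, seed) > threshold:
                    graph.setdefault(word, set()).add(seed)
                    graph.setdefault(seed, set()).add(word)
    return graph


# Hypothetical similarity scores standing in for the step 1.3 measure.
TOY_SCORES = {("glad", "happy"): 0.8, ("rain", "sad"): 0.3, ("sad", "happy"): 0.1}


def toy_similarity(a: str, b: str) -> float:
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), 0.0))


graph = build_emotion_graph(
    ["glad", "rain", "sad"], {"joy": ["happy"], "sadness": ["sad"]}, toy_similarity
)
print(graph)  # {'glad': {'happy'}, 'happy': {'glad'}}
```

The comparison is strict (greater than λ), matching "similarity larger than a given threshold" in the text.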
Step 2: Classify the emotions of the microblogs in the microblog stream with the emotion analysis model constructed in step 1, and detect burst periods of the microblog stream using Kleinberg's algorithm.
Step 2.1: Perform word segmentation on each microblog in the microblog stream.
Step 2.2: For each segmented microblog, build its emotion vector $S_d$ using the constructed emotion co-occurrence graph.
Step 2.3: Set a flag to true; if the emotion mark $\sigma_{s_k}$ of the vector $S_d$ equals 1, add the microblog to the emotion document set $D_s^{T_k}$ and set the flag to false.
Step 2.4: Repeat steps 2.2 and 2.3 until all microblogs are classified.
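Steps 2.2 to 2.4 might look like the following sketch. It assumes that the emotion wheel is Plutchik's eight basic emotions, that a lexicon mapping each word to one emotion has already been derived from the co-occurrence graph, and that $S_d$ is a 0/1 vector over those emotions; the names `emotion_vector`, `classify_stream`, and `lexicon` are hypothetical. Microblogs whose flag stays true express no emotion and are set aside, which is one reading of the filtering described under the beneficial effects.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: the emotion wheel is Plutchik's eight basic emotions.
EMOTIONS = ["joy", "trust", "fear", "surprise", "sadness", "disgust", "anger", "anticipation"]


def emotion_vector(
    tokens: Sequence[str], lexicon: Dict[str, str], emotions: Sequence[str] = EMOTIONS
) -> List[int]:
    """S_d: 1 at position k if any token maps to emotion k, else 0."""
    hits = {lexicon[t] for t in tokens if t in lexicon}
    return [1 if e in hits else 0 for e in emotions]


def classify_stream(
    posts: Sequence[Tuple[int, Sequence[str]]],
    lexicon: Dict[str, str],
    emotions: Sequence[str] = EMOTIONS,
) -> Tuple[Dict[str, List[int]], List[int]]:
    """Add each segmented microblog to every emotion document set it expresses."""
    doc_sets: Dict[str, List[int]] = {e: [] for e in emotions}
    discarded: List[int] = []
    for post_id, tokens in posts:
        flag = True  # stays True if the post expresses no emotion
        s_d = emotion_vector(tokens, lexicon, emotions)
        for k, emotion in enumerate(emotions):
            if s_d[k] == 1:
                doc_sets[emotion].append(post_id)
                flag = False
        if flag:
            discarded.append(post_id)
    return doc_sets, discarded


# Hypothetical lexicon derived from the co-occurrence graph.
lexicon = {"glad": "joy", "happy": "joy", "scared": "fear"}
posts = [(1, ["so", "glad"]), (2, ["rain", "today"]), (3, ["scared", "but", "happy"])]
sets, dropped = classify_stream(posts, lexicon)
print(sets["joy"], sets["fear"], dropped)  # [1, 3] [3] [2]
```

A microblog can land in several emotion sets, so each emotion category gets its own stream for the burst detection that follows.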
Step 2.5: For each emotion category of microblogs, detect burst periods using Kleinberg's algorithm.
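For step 2.5, the sketch below implements a two-state version of Kleinberg's burst-detection automaton for batched data. In each time window, r of d microblogs belong to one emotion category. The base state emits at the overall rate p0, and the burst state emits at s·p0. Entering the burst state costs `transition_weight` · ln n, and a Viterbi pass finds the cheapest state sequence. The window granularity, s = 2, and `transition_weight` = 1 are assumptions, not values given in the patent; `transition_weight` is unrelated to the mutual-information threshold γ.

```python
import math
from typing import List, Sequence


def _neg_log_binom(r: int, d: int, p: float) -> float:
    """-ln of the binomial probability of r relevant posts out of d at rate p."""
    log_choose = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
    return -(log_choose + r * math.log(p) + (d - r) * math.log(1 - p))


def kleinberg_bursts(
    relevant: Sequence[int],
    totals: Sequence[int],
    s: float = 2.0,
    transition_weight: float = 1.0,
) -> List[int]:
    """Two-state Kleinberg automaton; returns 0 (base) or 1 (burst) per time window."""
    n = len(relevant)
    total = sum(totals)
    if n == 0 or total == 0:
        return [0] * n
    p0 = sum(relevant) / total
    if not 0 < p0 < 1:
        return [0] * n  # a constant all-or-nothing rate has no burst to find
    probs = (p0, min(s * p0, 0.9999))  # base rate and burst rate
    up_cost = transition_weight * math.log(n)  # cost of entering the burst state
    prev = [0.0, math.inf]  # start in the base state
    back: List[List[int]] = []
    for r, d in zip(relevant, totals):
        cur, ptr = [0.0, 0.0], [0, 0]
        for j in (0, 1):
            # Cost of reaching state j from each previous state; leaving a burst is free.
            options = [prev[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if options[0] <= options[1] else 1
            cur[j] = options[best] + _neg_log_binom(r, d, probs[j])
            ptr[j] = best
        back.append(ptr)
        prev = cur
    # Backtrack the cheapest state sequence (Viterbi).
    state = 0 if prev[0] <= prev[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]
    return states


# Hypothetical counts: posts of one emotion per window, out of 100 posts per window.
print(kleinberg_bursts([2, 2, 2, 20, 22, 2, 2], [100] * 7))  # [0, 0, 0, 1, 1, 0, 0]
```

Raising `transition_weight` makes the detector more reluctant to open short bursts.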
Step 3: Extract the microblog labels posted during the burst period, filter out junk labels, and perform word segmentation on the remaining labels to form the initial keywords of the event.
Step 3.1: Perform part-of-speech tagging on the extracted labels and remove labels that consist only of a verb or only of a noun, such as "#Good morning#", "#Good night#", "#sing bar#", "#nine village#", and "#journey#".
Step 3.2: Remove labels that contain special symbols such as "+", for example "#laugh+video#", "#early love house#", and "#Weico+#".
Step 3.3: Remove labels that are in a standard date format or consist only of digits and punctuation, such as "#365#" and "#4.01#".
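Steps 3.1 to 3.3 can be approximated with simple rules, as in this sketch. It assumes a pluggable part-of-speech tagger that returns (word, tag) pairs with ICTCLAS-style tags ("n…" for nouns, "v…" for verbs). It reads step 3.1 as removing single-word labels tagged as a noun or a verb. The `SPECIAL_SYMBOLS` set, the date pattern, and the whitespace `toy_tagger` are hypothetical stand-ins for the patent's unspecified symbol list and a real Chinese tagger.

```python
import re
from typing import Callable, Iterable, List, Tuple

Tagger = Callable[[str], List[Tuple[str, str]]]

SPECIAL_SYMBOLS = set("+【】[]《》@&|")  # hypothetical set of "special symbols"
DATE_RE = re.compile(r"\d{2,4}[-/.年]\d{1,2}(?:[-/.月]\d{1,2}日?)?")  # e.g. 2016-11-02
DIGITS_PUNCT_RE = re.compile(r"[\d\W_]+")  # only digits, punctuation, and spaces


def is_junk(label: str, tagger: Tagger) -> bool:
    text = label.strip("#").strip()
    if not text:
        return True
    # Step 3.1: a single word that is only a noun or only a verb.
    tags = [pos for _, pos in tagger(text)]
    if len(tags) == 1 and tags[0][:1] in ("n", "v"):
        return True
    # Step 3.2: contains a special symbol.
    if any(ch in SPECIAL_SYMBOLS for ch in text):
        return True
    # Step 3.3: a date, or nothing but digits and punctuation.
    return bool(DATE_RE.fullmatch(text) or DIGITS_PUNCT_RE.fullmatch(text))


def filter_labels(labels: Iterable[str], tagger: Tagger) -> List[str]:
    return [label for label in labels if not is_junk(label, tagger)]


# Hypothetical whitespace tagger standing in for a real Chinese POS tagger.
TOY_POS = {"journey": "n", "sing": "v", "Sichuan": "ns", "earthquake": "n"}


def toy_tagger(text: str) -> List[Tuple[str, str]]:
    return [(w, TOY_POS.get(w, "x")) for w in text.split()]


labels = ["#journey#", "#sing#", "#laugh+video#", "#4.01#", "#2016-11-02#", "#Sichuan earthquake#"]
print(filter_labels(labels, toy_tagger))  # ['#Sichuan earthquake#']
```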
Step 4: Using the keywords generated in step 3, extract the related words from the microblogs to form the final description of the event.
Step 4.1: Perform word segmentation on the labels remaining in the burst period.
Step 4.2: Mine the frequent patterns of the microblog label keywords in the burst period.
Step 4.3: Extract the 2-itemsets from the frequent patterns and calculate the mutual information between the words in each 2-itemset.
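Steps 4.2 and 4.3 only need the frequent 2-itemsets, so the following sketch counts them directly in an Apriori-like pass instead of mining the full frequent-pattern set. It keeps the words that appear in at least `min_support` transactions and then counts co-occurring pairs of those words. Each transaction is assumed to be the word list of one segmented label or microblog; `frequent_pairs` and `min_support` are hypothetical, since the patent does not state a support threshold.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def frequent_pairs(
    transactions: Iterable[List[str]], min_support: int = 2
) -> Dict[Tuple[str, str], int]:
    """Return 2-itemsets that occur in at least min_support transactions."""
    baskets = [set(t) for t in transactions]  # one set of words per transaction
    item_counts = Counter(w for basket in baskets for w in basket)
    frequent_items = {w for w, c in item_counts.items() if c >= min_support}
    pair_counts: Counter = Counter()
    for basket in baskets:
        # Apriori property: a frequent pair can only contain frequent items.
        items = sorted(basket & frequent_items)
        pair_counts.update(combinations(items, 2))
    return {pair: c for pair, c in pair_counts.items() if c >= min_support}


docs = [
    ["sichuan", "earthquake", "rescue"],
    ["sichuan", "earthquake"],
    ["earthquake", "rescue"],
    ["weather"],
]
print(frequent_pairs(docs))  # {('earthquake', 'rescue'): 2, ('earthquake', 'sichuan'): 2}
```

Sorting each basket makes every pair appear in one canonical order, so ("a", "b") and ("b", "a") are counted together.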
Step 4.4: Keep the words whose mutual information exceeds a given threshold γ and sort them by word frequency to form the final event description. In the present invention, γ is set to 1.5.
The mutual information in step 4.4 is calculated as:
$$MI(W_1, W_2) = \log \frac{P(W_1, W_2)}{P(W_1)\,P(W_2)}$$
$$P(W_1) = \frac{C(W_1)}{R}, \qquad P(W_2) = \frac{C(W_2)}{R}, \qquad P(W_1, W_2) = \frac{C(W_1, W_2)}{R}$$
where $C(W_1)$ and $C(W_2)$ are the numbers of microblogs in the corpus that contain $W_1$ and $W_2$ respectively, $C(W_1, W_2)$ is the number of microblogs that contain both $W_1$ and $W_2$, and R is the size of the corpus, i.e., the total number of microblogs.
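Step 4.4 can be sketched with pointwise mutual information computed from the microblog counts C(W1), C(W2), C(W1, W2), and R described for step 4.4. The code keeps every word of a 2-itemset whose mutual information exceeds γ = 1.5 and sorts the kept words by the number of microblogs containing them. The base-2 logarithm and the function names are assumptions; the patent text does not state the log base, which affects how γ should be read.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

GAMMA = 1.5  # threshold given in the description


def pointwise_mi(c1: int, c2: int, c12: int, total: int) -> float:
    """log2((c12/R) / ((c1/R) * (c2/R))), simplified to log2(R * c12 / (c1 * c2))."""
    return math.log2(total * c12 / (c1 * c2))


def event_description(
    posts: List[Set[str]], pairs: Iterable[Tuple[str, str]], gamma: float = GAMMA
) -> List[str]:
    """Keep words from pairs whose MI exceeds gamma, sorted by microblog frequency."""
    total = len(posts)  # R
    doc_freq = Counter(w for post in posts for w in post)  # C(W) per word
    kept: Set[str] = set()
    for w1, w2 in pairs:
        c12 = sum(1 for post in posts if w1 in post and w2 in post)  # C(W1, W2)
        if c12 and pointwise_mi(doc_freq[w1], doc_freq[w2], c12, total) > gamma:
            kept.update((w1, w2))
    return sorted(kept, key=lambda w: (-doc_freq[w], w))


posts = [{"sichuan", "earthquake"}] * 2 + [{"earthquake", "rescue"}] + [{"weather"}] * 9
pairs = [("earthquake", "sichuan"), ("earthquake", "rescue"), ("earthquake", "weather")]
print(event_description(posts, pairs))  # ['earthquake', 'sichuan', 'rescue']
```

Pairs that never co-occur are skipped before the logarithm is taken, which avoids log2(0).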

Claims (1)

1. A microblog online emergency detection method based on emotion analysis and labels, characterized by comprising the following steps:
(1) constructing an emotion analysis model, namely an emotion co-occurrence graph, using the emotion wheel as the emotion classification model;
(2) classifying the emotions of the microblogs in the microblog stream with the emotion analysis model constructed in step (1), and detecting burst periods of the microblog stream using Kleinberg's algorithm;
(3) extracting the microblog labels posted during a burst period, filtering out junk labels, and performing word segmentation on the remaining labels to form the initial keywords of an event;
(4) extracting words related to these keywords from the microblogs, using the keywords generated in step (3), to form the final description of the event;
in step (1), the emotion co-occurrence graph is constructed as follows:
(1.1) using the emotion wheel model, manually assigning appropriate words to the emotion symbols;
(1.2) performing word segmentation on the original microblog data to form a microblog corpus;
(1.3) computing the similarity between the words of the microblog corpus and the words of the emotion symbols, using the HowNet dictionary and a distance-based word similarity measure;
in (1.3), word similarity is calculated with the following formulas:
$$\mathrm{Sim}(W_1, W_2) = \max_{1 \le i \le k,\; 1 \le j \le p} \mathrm{Sim}(n_{1i}, n_{2j})$$
$$\mathrm{Sim}(p_1, p_2) = \frac{\alpha}{d + \alpha}$$
where $W_1$ and $W_2$ are words; $W_1$ has k senses $\{n_{11}, n_{12}, \dots, n_{1k}\}$ and $W_2$ has p senses $\{n_{21}, n_{22}, \dots, n_{2p}\}$; $p_1$ and $p_2$ are two sememes; d, a positive integer, is the length of the path between $p_1$ and $p_2$ in the sememe hierarchy; and α is an adjustable parameter;
(1.4) connecting the words whose similarity exceeds a given threshold λ to complete the construction of the emotion co-occurrence graph; λ is set to 0.6;
step (3) comprises the following steps:
(3.1) performing part-of-speech tagging on the extracted labels and removing labels that consist only of a verb or only of a noun;
(3.2) removing labels that contain special symbols;
(3.3) removing labels that are in a standard date format or consist only of digits and punctuation marks;
step (4) comprises the following steps:
(4.1) performing word segmentation on the labels remaining in the burst period;
(4.2) mining the frequent patterns of the related microblog label keywords in the burst period;
(4.3) extracting the 2-itemsets from the frequent patterns and calculating the mutual information between the words in each 2-itemset;
(4.4) keeping the words whose mutual information exceeds a given threshold γ to form the final event description; γ is set to 1.5;
the mutual information in step (4.4) is calculated as:
$$MI(W_1, W_2) = \log \frac{P(W_1, W_2)}{P(W_1)\,P(W_2)}$$
$$P(W_1) = \frac{C(W_1)}{R}, \qquad P(W_2) = \frac{C(W_2)}{R}, \qquad P(W_1, W_2) = \frac{C(W_1, W_2)}{R}$$
where $C(W_1)$ and $C(W_2)$ are the numbers of microblogs in the corpus that contain $W_1$ and $W_2$ respectively, $C(W_1, W_2)$ is the number of microblogs that contain both $W_1$ and $W_2$, and R is the size of the corpus, i.e., the total number of microblogs.
CN201610945406.6A 2016-11-02 2016-11-02 Microblog online emergency detection method based on emotion analysis and label Active CN106547875B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610945406.6A CN106547875B (en) 2016-11-02 2016-11-02 Microblog online emergency detection method based on emotion analysis and label

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610945406.6A CN106547875B (en) 2016-11-02 2016-11-02 Microblog online emergency detection method based on emotion analysis and label

Publications (2)

Publication Number Publication Date
CN106547875A 2017-03-29
CN106547875B 2020-05-15

Family

ID=58393729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610945406.6A Active CN106547875B (en) 2016-11-02 2016-11-02 Microblog online emergency detection method based on emotion analysis and label

Country Status (1)

Country Link
CN (1) CN106547875B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886442A (en) * 2017-11-28 2018-04-06 合肥工业大学 Public's emotion distribution modeling method and device based on microblogging text
JP7091700B2 (en) * 2018-02-21 2022-06-28 富士通株式会社 Information processing program, message analysis program, information processing device and information processing method
CN109189910B (en) * 2018-09-18 2019-09-10 哈尔滨工程大学 A kind of label auto recommending method towards mobile application problem report
CN109783800B (en) * 2018-12-13 2024-04-12 北京百度网讯科技有限公司 Emotion keyword acquisition method, device, equipment and storage medium
CN109977231B (en) * 2019-04-10 2021-04-02 上海海事大学 Depressed mood analysis method based on emotional decay factor
CN110990592B (en) * 2019-11-07 2023-06-23 北京科技大学 Online microblog burst topic detection method and detection device
CN111950273B (en) * 2020-07-31 2023-09-01 南京莱斯网信技术研究院有限公司 Automatic network public opinion emergency identification method based on emotion information extraction analysis
CN112084333B (en) * 2020-08-31 2022-04-22 杭州电子科技大学 Social user generation method based on emotional tendency analysis

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103246728A (en) * 2013-05-10 2013-08-14 北京大学 Emergency detection method based on document lexical feature variations
CN103559233A (en) * 2012-10-29 2014-02-05 中国人民解放军国防科学技术大学 Extraction method for network new words in microblogs and microblog emotion analysis method and system
CN104573031A (en) * 2015-01-14 2015-04-29 哈尔滨工业大学深圳研究生院 Micro blog emergency detection method
CN105224604A (en) * 2015-09-01 2016-01-06 天津大学 A kind of microblogging incident detection method based on heap optimization and pick-up unit thereof
CN105718598A (en) * 2016-03-07 2016-06-29 天津大学 AT based time model construction method and network emergency early warning method


Non-Patent Citations (1)

Title
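"一种基于情感符号的在线突发事件检测方法" ("An online emergency detection method based on emotion symbols"); Zhang Lumin et al.; Chinese Journal of Computers (计算机学报); 2013-08-15, No. 8; main text pp. 1660-1666, Fig. 2 *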

Also Published As

Publication number Publication date
CN106547875A (en) 2017-03-29

Similar Documents

Publication Publication Date Title
CN106547875B (en) Microblog online emergency detection method based on emotion analysis and label
Kumar et al. Sentiment analysis of multimodal twitter data
Gokulakrishnan et al. Opinion mining and sentiment analysis on a twitter data stream
WO2012096388A1 (en) Unexpectedness determination system, unexpectedness determination method, and program
CN105183717A (en) OSN user emotion analysis method based on random forest and user relationship
JP2010211594A (en) Text analysis device and method, and program
Anoop et al. Leveraging heterogeneous data for fake news detection
Stavrianou et al. NLP-based feature extraction for automated tweet classification
Manke et al. A review on: opinion mining and sentiment analysis based on natural language processing
CN109857869A (en) A kind of hot topic prediction technique based on Ap increment cluster and network primitive
CN104484437B (en) A kind of network short commentary emotion method for digging
CN104123336B (en) Depth Boltzmann machine model and short text subject classification system and method
KR102185733B1 (en) Server and method for automatically generating profile
Subramani et al. Text mining and real-time analytics of twitter data: A case study of australian hay fever prediction
Kameswari et al. Predicting Election Results using NLTK
CN107729509A (en) The chapter similarity decision method represented based on recessive higher-dimension distributed nature
Vanetik et al. Propaganda Detection in Russian Telegram Posts in the Scope of the Russian Invasion of Ukraine
Saqib et al. Automatic classification of product reviews into interrogative and noninterrogative: Generating real time answer
Shirahatti et al. Sentiment analysis on Twitter data using Hadoop
Mapa et al. Text normalization in social media by using spell correction and dictionary based approach
Kotevska et al. Automatic Categorization of Social Sensor Data
Jawale et al. Design of automated sentiment or opinion discovery system to enhance its performance
CN110837740B (en) Comment aspect opinion level mining method based on dictionary improvement LDA model
Singh et al. Sentiment analysis of twitter data set: survey
Liu et al. Discovering Opinion Changes in Online Reviews via Learning Fine-Grained Sentiments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant