CN106547875B - Microblog online emergency detection method based on emotion analysis and label - Google Patents
- Publication number
- CN106547875B (application CN201610945406.6A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- microblog
- labels
- words
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Human Resources & Organizations (AREA)
- Computing Systems (AREA)
- Economics (AREA)
- Data Mining & Analysis (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the field of network detection, and particularly relates to a microblog online emergency detection method based on emotion analysis and labels. The method comprises the following steps: (1) constructing an emotion analysis model, namely an emotion concurrence graph, by using the emotion wheel emotion classification model; (2) performing emotion classification on the microblogs in the microblog stream by using the emotion analysis model constructed in step (1), and detecting the burst periods of the microblog stream with the Kleinberg algorithm; (3) extracting the microblog labels in a burst period, filtering out junk labels, and performing word segmentation on the remaining labels to form the initial keywords of an event; and (4) using the keywords generated in step (3) to extract the related words in the microblogs to form the final description of the event. Because the emotion concurrence graph is constructed on the basis of the emotion wheel, the emotion classification is finer-grained and the emotions are easier to understand and explain, and the method achieves higher accuracy than event detection based on emotion symbols.
Description
Technical Field
The invention belongs to the field of network detection, and particularly relates to a microblog online emergency detection method based on emotion analysis and labels.
Background
With the vigorous development of Web 2.0 technology in recent years, a series of social networks has emerged. These social networks, such as newsbook and Twitter, attract a large number of users. Users are active on social networks and post a large number of microblog messages containing their opinions or views on certain events. By mining these microblog messages, a large amount of deeper information, such as user emotion, can be obtained. This information can be used to provide services for governments or enterprises. For example, a government can use it to judge whether the public supports a law or regulation and what opinions people hold on a certain social event, so as to monitor and guide public opinion; an enterprise can learn the behavior habits and preferences of users by mining their microblog messages, and then recommend to each user the commodities that the user is most likely to be interested in or to buy.
For emergency detection, there are two conventional approaches: document-based detection and feature-based detection. Document-based detection represents documents as word vectors or named-entity vectors, calculates the similarity between documents, and clusters the documents to form events. Feature-based detection, which relies on bursty features, is an effective way to mine burst events in a data stream: it first extracts the feature words of the documents, detects bursts by analyzing how each feature word varies over time, and then aggregates the feature words with the same burst trajectory to form burst events. However, neither method is suitable for short microblog texts. First, the volume of microblog data is large, and extracting feature words and building a TF-IDF matrix over every microblog takes a large amount of time. Second, microblog language is irregular and variable in form and is likely to contain a large number of new words, so the resulting matrix is sparse, the similarity is hard to calculate, and identification becomes more difficult. In addition, the traditional methods stop at extracting the emergency and do not analyze it further, for example through sentiment analysis.
Disclosure of Invention
The invention aims to provide an online emergency detection model for the short texts of microblog data streams. The online emergency detection method based on emotion analysis and labels can extract the emergencies in the data stream accurately and quickly.
The purpose of the invention is realized as follows:
A microblog online emergency detection method based on emotion analysis and labels comprises the following steps:
(1) constructing an emotion analysis model, namely an emotion concurrence graph, by using the emotion wheel emotion classification model;
(2) performing emotion classification on the microblogs in the microblog stream by using the emotion analysis model constructed in step (1), and detecting the burst periods of the microblog stream with the Kleinberg algorithm;
(3) extracting the microblog labels in a burst period, filtering out junk labels, and performing word segmentation on the remaining labels to form the initial keywords of an event;
(4) using the keywords generated in step (3) to extract the related words in the microblogs to form the final description of the event.
In step (1), the emotion concurrence graph is constructed as follows:
(1.1) using the emotion wheel model, manually assigning reasonable words to the emotion symbols;
(1.2) performing word segmentation on the original microblog data to form a microblog corpus;
(1.3) calculating the similarity between the words of the microblog corpus and the words of the emotion symbols by using the HowNet dictionary and a distance-based word similarity;
in step (1.3), the word similarity is calculated using the following formula:
where $W_1$ and $W_2$ denote words; word $W_1$ has $k$ sense items $\{n_{11}, n_{12}, \ldots, n_{1k}\}$ and word $W_2$ has $p$ sense items $\{n_{21}, n_{22}, \ldots, n_{2p}\}$; $p_1$ and $p_2$ denote two sememes; $d$ is the path length between $p_1$ and $p_2$ in the sememe hierarchy and is a positive integer; $\alpha$ is an adjustable parameter;
(1.4) establishing connections among the words whose similarity is larger than a given threshold $\lambda$ to finish the construction of the emotion concurrence graph; $\lambda$ is selected to be 0.6.
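The formula referred to in step (1.3) is not reproduced in this text. The symbols defined there are consistent with the commonly used HowNet distance-based similarity, so the following is offered only as a reconstruction under that assumption, not as the patent's verbatim formula. The names $\mathrm{Sim}$ and the indices $i$, $j$ are introduced here for illustration, and how sememe similarities are combined into a sense-item similarity is not stated in the source.

```latex
\mathrm{Sim}(p_1, p_2) = \frac{\alpha}{d + \alpha},
\qquad
\mathrm{Sim}(W_1, W_2) = \max_{1 \le i \le k,\ 1 \le j \le p} \mathrm{Sim}(n_{1i}, n_{2j})
```

Under this reading, a shorter path $d$ gives a similarity closer to 1, and the similarity of two words is that of their closest pair of sense items.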
Step (3) comprises the following steps:
(3.1) performing part-of-speech tagging on the extracted labels, and removing labels that consist only of a verb or only of a noun;
(3.2) removing labels that contain special symbols;
(3.3) removing labels that are in a standard date format or contain only numbers and punctuation marks.
Step (4) comprises the following steps:
(4.1) performing word segmentation on the labels remaining in the burst period;
(4.2) calculating the frequent patterns of the related microblog label keywords in the burst period;
(4.3) extracting the 2-itemsets from the frequent patterns, and calculating the mutual information between the words in each 2-itemset;
(4.4) keeping the words whose mutual information is larger than a given threshold $\gamma$ to form the final event description; the value of $\gamma$ is selected to be 1.5;
the mutual information in step (4.4) is calculated using the following formula:
where $C(W_1)$ and $C(W_2)$ denote the number of microblogs in the corpus that contain $W_1$ and $W_2$, respectively; $C(W_1, W_2)$ denotes the number of microblogs that contain both $W_1$ and $W_2$; and $R$ is the size of the corpus, namely the total number of microblogs.
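The mutual information formula itself is not reproduced in this text. The quantities defined above match the usual pointwise mutual information estimated from document counts, so a plausible reconstruction is given below. The name $\mathrm{MI}$ and the base of the logarithm are assumptions, since the source states neither.

```latex
\mathrm{MI}(W_1, W_2) = \log_2 \frac{R \cdot C(W_1, W_2)}{C(W_1)\, C(W_2)}
```

For example, with $R = 8$, $C(W_1) = C(W_2) = 2$ and $C(W_1, W_2) = 2$, this gives $\log_2 4 = 2$, which would exceed the threshold of 1.5.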
The invention has the beneficial effects that:
The invention constructs the emotion concurrence graph on the basis of the emotion wheel, so the emotion classification is finer-grained and the emotions are easier to understand and explain, and its accuracy is higher than that of event detection based on emotion symbols. Emotion analysis with the constructed emotion concurrence graph filters out a large number of useless microblogs, and the result of the emotion analysis is used to detect the burst state of the microblog data stream, which makes the method efficient. Using microblog labels as a guide to discover emergencies gives higher accuracy than clustering-based event discovery and a shorter detection time.
Drawings
FIG. 1 shows the framework of the online emergency detection model based on the emotion concurrence graph.
Detailed Description
The implementation of the invention is described in further detail below with reference to the accompanying drawing and specific embodiments.
Step 1: Construct an emotion analysis model, namely an emotion concurrence graph, by using the emotion wheel emotion classification model. This specifically comprises the following steps:
Step 1.1: Using the emotion wheel model, manually assign reasonable words to the emotion symbols.
Step 1.2: Perform word segmentation on the original microblog data to form a microblog corpus.
Step 1.3: Calculate the similarity between the words of the microblog corpus and the words of the emotion symbols by using the HowNet dictionary and a distance-based word similarity.
In step 1.3, the word similarity is calculated using the following formula:
where $W_1$ and $W_2$ denote words; word $W_1$ has $k$ sense items (concepts) $\{n_{11}, n_{12}, \ldots, n_{1k}\}$ and word $W_2$ has $p$ sense items (concepts) $\{n_{21}, n_{22}, \ldots, n_{2p}\}$; $p_1$ and $p_2$ denote two sememes; $d$ is the path length between $p_1$ and $p_2$ in the sememe hierarchy and is a positive integer; $\alpha$ is an adjustable parameter, which is taken to be 1.6 in the present invention.
Step 1.4: Establish connections among the words whose similarity is larger than a given threshold $\lambda$ to finish the construction of the emotion concurrence graph. In the present invention $\lambda$ is chosen to be 0.6.
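Steps 1.3 and 1.4 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the distance-based similarity has the form alpha / (d + alpha) and that word similarity is the maximum over sense-item pairs, neither of which is spelled out in this text. The function names, the `path_length` callback and the toy data are hypothetical, and the graph is simplified to a mapping from each emotion to its connected corpus words.

```python
from itertools import product

ALPHA = 1.6   # adjustable parameter, value taken from the description
LAMBDA = 0.6  # similarity threshold, value taken from step 1.4


def sememe_similarity(path_length, alpha=ALPHA):
    """Assumed distance-based similarity: alpha / (d + alpha)."""
    return alpha / (path_length + alpha)


def word_similarity(senses_a, senses_b, path_length):
    """Assumed word similarity: the best score over all sense-item pairs."""
    return max(
        sememe_similarity(path_length(a, b))
        for a, b in product(senses_a, senses_b)
    )


def build_emotion_graph(seed_words, corpus_words, senses, path_length,
                        threshold=LAMBDA):
    """Connect each emotion to the corpus words similar to its seed words."""
    graph = {emotion: set() for emotion in seed_words}
    for emotion, seeds in seed_words.items():
        for seed, word in product(seeds, corpus_words):
            score = word_similarity(senses[seed], senses[word], path_length)
            if score > threshold:
                graph[emotion].add(word)
    return graph


# Hypothetical toy data standing in for HowNet sense items and path lengths.
toy_senses = {"happy": ["s_happy"], "glad": ["s_glad"], "rain": ["s_rain"]}
toy_paths = {
    frozenset({"s_happy", "s_glad"}): 1,
    frozenset({"s_happy", "s_rain"}): 6,
}


def toy_path_length(a, b):
    return toy_paths[frozenset((a, b))]


graph = build_emotion_graph(
    seed_words={"joy": ["happy"]},
    corpus_words=["glad", "rain"],
    senses=toy_senses,
    path_length=toy_path_length,
)
print(graph)
```

Under these assumptions, with alpha = 1.6 and lambda = 0.6 only pairs at path length 1 pass the threshold (1.6 / 2.6 is about 0.615, while 1.6 / 3.6 is about 0.444), so the graph stays fairly strict.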
Step 2: Perform emotion classification on the microblogs in the microblog stream by using the emotion analysis model constructed in step 1, and detect the burst periods of the microblog stream with the Kleinberg algorithm.
Step 2.1: Perform word segmentation on each microblog in the microblog stream.
Step 2.2: For each segmented microblog, build the emotion vector Sd of the microblog by using the constructed emotion concurrence graph model.
Step 2.3: Set a flag to true. If the emotion mark σsk of the vector Sd for an emotion is 1, add the microblog to the emotion document set Ds Tk and set the flag to false.
Step 2.4: Repeat steps 2.2 and 2.3 until all the microblogs are classified.
Step 2.5: For each emotion class of microblogs, use the Kleinberg algorithm to detect the burst periods.
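The source does not say which variant of the Kleinberg algorithm is used in step 2.5. The sketch below assumes the two-state batched form, where each time window has a count of microblogs of one emotion class out of a total, and the most likely state sequence is found by dynamic programming. The function name and the parameters `s` and `gamma` are illustrative choices, and the input must contain at least one relevant and one non-relevant microblog overall.

```python
import math


def kleinberg_two_state(relevant, total, s=2.0, gamma=1.0):
    """Return one state per window: 0 = normal, 1 = burst."""
    n = len(relevant)
    p0 = sum(relevant) / sum(total)   # baseline proportion
    p1 = min(s * p0, 0.9999)          # elevated proportion in the burst state
    up_cost = gamma * math.log(n)     # cost of moving from state 0 to state 1

    def fit_cost(p, r, d):
        # Negative log binomial likelihood of r relevant items out of d.
        log_binom = (math.lgamma(d + 1) - math.lgamma(r + 1)
                     - math.lgamma(d - r + 1))
        return -(log_binom + r * math.log(p) + (d - r) * math.log(1 - p))

    cost = [0.0, math.inf]            # the automaton starts in state 0
    back = []
    for r, d in zip(relevant, total):
        new_cost, pointers = [], []
        for state, p in enumerate((p0, p1)):
            candidates = [
                cost[prev] + (up_cost if state > prev else 0.0)
                for prev in (0, 1)
            ]
            best_prev = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best_prev] + fit_cost(p, r, d))
            pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path backwards from the last window.
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for pointers in reversed(back):
        states.append(state)
        state = pointers[state]
    states.reverse()
    return states


# Hypothetical counts of "anger" microblogs per window, out of 10 each.
states = kleinberg_two_state([1, 1, 1, 9, 9, 9, 1, 1, 1], [10] * 9)
print(states)
```

Consecutive windows labelled 1 form a burst period; the labels extracted in step 3 would then be taken from the microblogs inside those windows.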
Step 3: Extract the microblog labels in the burst period, filter out junk labels, and perform word segmentation on the remaining labels to form the initial keywords of the event.
Step 3.1: Perform part-of-speech tagging on the extracted labels, and remove labels that consist only of a verb or only of a noun, such as "#early-safe#", "#late-safe#", "#sing bar#", "#nine village#" and "#journey#".
Step 3.2: Remove labels that contain special symbols (for example "+"), such as "#laugh + video#", "#early love house#" and "#Weico+#".
Step 3.3: Remove labels that are in a standard date format or contain only numbers and punctuation marks, such as "#365#" and "#4.01#".
Step 4: Use the keywords generated in step 3 to extract the related words in the microblogs to form the final description of the event.
Step 4.1: Perform word segmentation on the labels remaining in the burst period.
Step 4.2: Calculate the frequent patterns of the microblog label keywords in the burst period.
Step 4.3: Extract the 2-itemsets from the frequent patterns, and calculate the mutual information between the words in each 2-itemset.
Step 4.4: Keep the words whose mutual information is larger than a given threshold $\gamma$, and sort them by word frequency to form the final event description. In the present invention, the value of $\gamma$ is selected to be 1.5.
The mutual information in step 4.4 is calculated using the following formula:
where $C(W_1)$ and $C(W_2)$ denote the number of microblogs in the corpus that contain $W_1$ and $W_2$, respectively; $C(W_1, W_2)$ denotes the number of microblogs that contain both $W_1$ and $W_2$; and $R$ is the size of the corpus, namely the total number of microblogs.
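Steps 4.2 to 4.4 can be sketched as below. Several points are assumptions rather than facts from the source: the mutual information is taken to be the base-2 pointwise form computed from the counts defined above, frequent 2-itemsets are found by direct pair counting with a hypothetical `min_support` parameter instead of a dedicated frequent-pattern miner, and the function name is illustrative.

```python
import math
from collections import Counter
from itertools import combinations

GAMMA = 1.5  # mutual information threshold, value taken from step 4.4


def describe_event(posts, min_support=2, threshold=GAMMA):
    """posts: one list of tokens per microblog in the burst period."""
    total = len(posts)                      # R, the corpus size
    word_count = Counter()                  # C(W)
    pair_count = Counter()                  # C(W1, W2)
    for tokens in posts:
        unique = sorted(set(tokens))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))

    kept = set()
    for (w1, w2), joint in pair_count.items():
        if joint < min_support:
            continue                        # not a frequent 2-itemset
        mi = math.log2(total * joint / (word_count[w1] * word_count[w2]))
        if mi > threshold:
            kept.update((w1, w2))
    # Most frequent words first; ties are broken alphabetically.
    return sorted(kept, key=lambda w: (-word_count[w], w))


# Hypothetical segmented microblogs.
posts = [["quake", "rescue"]] * 2 + [["weather", "today"]] * 6
description = describe_event(posts)
print(description)
```

In this toy input the pair ("quake", "rescue") scores log2(8 * 2 / (2 * 2)) = 2 and is kept, while ("today", "weather") co-occurs often but scores about 0.42, because both words are common across the corpus.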
Claims (1)
1. A microblog online emergency detection method based on emotion analysis and labels, characterized by comprising the following steps:
(1) constructing an emotion analysis model, namely an emotion concurrence graph, by using the emotion wheel emotion classification model;
(2) performing emotion classification on the microblogs in the microblog stream by using the emotion analysis model constructed in step (1), and detecting the burst periods of the microblog stream with the Kleinberg algorithm;
(3) extracting the microblog labels in a burst period, filtering out junk labels, and performing word segmentation on the remaining labels to form the initial keywords of an event;
(4) using the keywords generated in step (3) to extract the related words in the microblogs to form the final description of the event;
in step (1), the emotion concurrence graph is constructed as follows:
(1.1) using the emotion wheel model, manually assigning reasonable words to the emotion symbols;
(1.2) performing word segmentation on the original microblog data to form a microblog corpus;
(1.3) calculating the similarity between the words of the microblog corpus and the words of the emotion symbols by using the HowNet dictionary and a distance-based word similarity;
in step (1.3), the word similarity is calculated using the following formula:
where $W_1$ and $W_2$ denote words; word $W_1$ has $k$ sense items $\{n_{11}, n_{12}, \ldots, n_{1k}\}$ and word $W_2$ has $p$ sense items $\{n_{21}, n_{22}, \ldots, n_{2p}\}$; $p_1$ and $p_2$ denote two sememes; $d$ is the path length between $p_1$ and $p_2$ in the sememe hierarchy and is a positive integer; $\alpha$ is an adjustable parameter;
(1.4) establishing connections among the words whose similarity is larger than a given threshold $\lambda$ to finish the construction of the emotion concurrence graph; $\lambda$ is selected to be 0.6;
step (3) comprises the following steps:
(3.1) performing part-of-speech tagging on the extracted labels, and removing labels that consist only of a verb or only of a noun;
(3.2) removing labels that contain special symbols;
(3.3) removing labels that are in a standard date format or contain only numbers and punctuation marks;
step (4) comprises the following steps:
(4.1) performing word segmentation on the labels remaining in the burst period;
(4.2) calculating the frequent patterns of the related microblog label keywords in the burst period;
(4.3) extracting the 2-itemsets from the frequent patterns, and calculating the mutual information between the words in each 2-itemset;
(4.4) keeping the words whose mutual information is larger than a given threshold $\gamma$ to form the final event description; the value of $\gamma$ is selected to be 1.5;
the mutual information in step (4.4) is calculated using the following formula:
where $C(W_1)$ and $C(W_2)$ denote the number of microblogs in the corpus that contain $W_1$ and $W_2$, respectively; $C(W_1, W_2)$ denotes the number of microblogs that contain both $W_1$ and $W_2$; and $R$ is the size of the corpus, namely the total number of microblogs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610945406.6A CN106547875B (en) | 2016-11-02 | 2016-11-02 | Microblog online emergency detection method based on emotion analysis and label |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610945406.6A CN106547875B (en) | 2016-11-02 | 2016-11-02 | Microblog online emergency detection method based on emotion analysis and label |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106547875A (en) | 2017-03-29 |
CN106547875B (en) | 2020-05-15 |
Family
ID=58393729
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610945406.6A Active CN106547875B (en) | 2016-11-02 | 2016-11-02 | Microblog online emergency detection method based on emotion analysis and label |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106547875B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107886442A (en) * | 2017-11-28 | 2018-04-06 | 合肥工业大学 | Public's emotion distribution modeling method and device based on microblogging text |
JP7091700B2 (en) * | 2018-02-21 | 2022-06-28 | 富士通株式会社 | Information processing program, message analysis program, information processing device and information processing method |
CN109189910B (en) * | 2018-09-18 | 2019-09-10 | 哈尔滨工程大学 | A kind of label auto recommending method towards mobile application problem report |
CN109783800B (en) * | 2018-12-13 | 2024-04-12 | 北京百度网讯科技有限公司 | Emotion keyword acquisition method, device, equipment and storage medium |
CN109977231B (en) * | 2019-04-10 | 2021-04-02 | 上海海事大学 | Depressed mood analysis method based on emotional decay factor |
CN110990592B (en) * | 2019-11-07 | 2023-06-23 | 北京科技大学 | Online microblog burst topic detection method and detection device |
CN111950273B (en) * | 2020-07-31 | 2023-09-01 | 南京莱斯网信技术研究院有限公司 | Automatic network public opinion emergency identification method based on emotion information extraction analysis |
CN112084333B (en) * | 2020-08-31 | 2022-04-22 | 杭州电子科技大学 | Social user generation method based on emotional tendency analysis |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103246728A (en) * | 2013-05-10 | 2013-08-14 | 北京大学 | Emergency detection method based on document lexical feature variations |
CN103559233A (en) * | 2012-10-29 | 2014-02-05 | 中国人民解放军国防科学技术大学 | Extraction method for network new words in microblogs and microblog emotion analysis method and system |
CN104573031A (en) * | 2015-01-14 | 2015-04-29 | 哈尔滨工业大学深圳研究生院 | Micro blog emergency detection method |
CN105224604A (en) * | 2015-09-01 | 2016-01-06 | 天津大学 | A kind of microblogging incident detection method based on heap optimization and pick-up unit thereof |
CN105718598A (en) * | 2016-03-07 | 2016-06-29 | 天津大学 | AT based time model construction method and network emergency early warning method |
Non-Patent Citations (1)
Title |
---|
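"An online emergency detection method based on emotion symbols"; Zhang Lumin et al.; Chinese Journal of Computers (《计算机学报》); 2013-08-15 (No. 8); main text pp. 1660-1666, Fig. 2 * |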
Also Published As
Publication number | Publication date |
---|---|
CN106547875A (en) | 2017-03-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106547875B (en) | Microblog online emergency detection method based on emotion analysis and label | |
Kumar et al. | Sentiment analysis of multimodal twitter data | |
Gokulakrishnan et al. | Opinion mining and sentiment analysis on a twitter data stream | |
WO2012096388A1 (en) | Unexpectedness determination system, unexpectedness determination method, and program | |
CN105183717A (en) | OSN user emotion analysis method based on random forest and user relationship | |
JP2010211594A (en) | Text analysis device and method, and program | |
Anoop et al. | Leveraging heterogeneous data for fake news detection | |
Stavrianou et al. | NLP-based feature extraction for automated tweet classification | |
Manke et al. | A review on: opinion mining and sentiment analysis based on natural language processing | |
CN109857869A (en) | A kind of hot topic prediction technique based on Ap increment cluster and network primitive | |
CN104484437B (en) | A kind of network short commentary emotion method for digging | |
CN104123336B (en) | Depth Boltzmann machine model and short text subject classification system and method | |
KR102185733B1 (en) | Server and method for automatically generating profile | |
Subramani et al. | Text mining and real-time analytics of twitter data: A case study of australian hay fever prediction | |
Kameswari et al. | Predicting Election Results using NLTK | |
CN107729509A (en) | The chapter similarity decision method represented based on recessive higher-dimension distributed nature | |
Vanetik et al. | Propaganda Detection in Russian Telegram Posts in the Scope of the Russian Invasion of Ukraine | |
Saqib et al. | Automatic classification of product reviews into interrogative and noninterrogative: Generating real time answer | |
Shirahatti et al. | Sentiment analysis on Twitter data using Hadoop | |
Mapa et al. | Text normalization in social media by using spell correction and dictionary based approach | |
Kotevska et al. | Automatic Categorization of Social Sensor Data | |
Jawale et al. | Design of automated sentiment or opinion discovery system to enhance its performance | |
CN110837740B (en) | Comment aspect opinion level mining method based on dictionary improvement LDA model | |
Singh et al. | Sentiment analysis of twitter data set: survey | |
Liu et al. | Discovering Opinion Changes in Online Reviews via Learning Fine-Grained Sentiments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||