CN107273496B - Method for detecting microblog network region emergency - Google Patents

Publication number
CN107273496B
Authority
CN
China
Prior art keywords
word
microblog
equal
burst
ewc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710455550.6A
Other languages
Chinese (zh)
Other versions
CN107273496A (en)
Inventor
仲兆满
管燕
李存华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Jinge Network Technology Co ltd
Jiangsu Ocean University
Original Assignee
Huaihai Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaihai Institute of Technology
Priority to CN201710455550.6A
Publication of CN107273496A
Application granted
Publication of CN107273496B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Human Resources & Organizations (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for detecting regional emergencies in a microblog network, comprising the steps of: (1) collecting regional microblogs from the microblog network to obtain a microblog set PLMB, and preprocessing the microblogs to obtain a microblog set LMB; (2) extracting burst words from the microblog set LMB to obtain a burst word set EW; and (3) clustering the burst words in EW to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, assuming there are q word clusters. The method calculates the burst value of a word from four types of indexes (word frequency, word-associated users, word distribution regions, and word social behaviors), which makes more reasonable use of the burst characteristics of microblog words and is better suited to detecting regional emergencies in a microblog network.

Description

Method for detecting microblog network region emergency
Technical Field
The invention relates to an information mining technology, in particular to a microblog network region emergency detection method.
Background
Microblogs are social media with strong real-time performance and interactivity. They give users a platform for freely publishing content and exchanging information, and have become a preferred medium for breaking news, voicing opinions, and sharing experiences. In practice, many events are first disclosed on microblogs and only then reported by traditional mainstream media, such as the 2013 Boston bombing and the death of Margaret Thatcher. Microblog-oriented event detection has therefore become a research hotspot in the event detection field in recent years.
Microblog-oriented local event detection has become an emerging research direction because much microblog content carries location information, including the places mentioned in a post, the registered location of the user who published it, and the geotags attached to it. Local event detection rests on a basic assumption: users rarely discuss an event when nothing has happened in a local region, but once an event occurs there (such as a fire, explosion, flood, traffic accident, pollution incident, or disease outbreak), a great deal of discussion follows.
Conference proceedings published in the United States in 2010: the 19th International World Wide Web Conference; title: Earthquake shakes Twitter users: real-time event detection by social sensors; authors: Takeshi Sakaki, Makoto Okazaki, Yutaka Matsuo. The paper models each Twitter user as a node in a wireless sensor network, abstracts a user's publishing of an earthquake-related post as a sensor node reporting the information it has collected, and confirms whether an earthquake has occurred through temporal and spatial models of the posts and subsequent filtering. However, the method needs manually designed query terms and is difficult to apply to the detection of unconventional emergencies.
Journal published in China in 2016: Modern Library and Information Technology; title: Microblog event detection and analysis based on geographic coordinates; authors: Li further, an Zhongjie. The article uses 5 indexes (the number of published microblogs, the number of reposts, the number of comments, user activity, and moving strength) to construct microblog features. When used to detect microblog emergencies, the method does not comprehensively consider the characteristics of microblogs as social media, such as the frequency of burst words and regional burstiness, and it gives no specific calculation method (such as formal formulas) for each index.
Conference proceedings published in the United States in 2016: the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval; title: GeoBurst: Real-Time Local Event Detection in Geo-Tagged Tweet Streams; authors: Zhang Chao, Zhou Guangyu, Yuan Quan, Zhuang Honglei, Zheng Yu, Kaplan Lance, Wang Shaowen, Han Jiawei. The method first identifies some important microblogs within a query window as pivots, and then obtains burst events by comparison with historical data in both space and time. The method relies on text information, and it is difficult to extract features from some very short texts.
Disclosure of Invention
The invention aims to solve the technical problem of providing a novel method for detecting the microblog network region emergency, which more reasonably utilizes the emergency characteristics of microblog network words and is more suitable for detecting the microblog network region emergency.
The technical problem to be solved by the present invention is achieved by the following technical means. The invention provides a method for detecting a regional emergency of a microblog network, which is characterized by comprising the following specific steps of:
A. collecting regional microblogs from a microblog network to obtain a microblog set PLMB, and preprocessing the microblogs to obtain a microblog set LMB;
B. extracting burst words from the microblog set LMB to obtain a burst word set EW;
C. clustering the burst words in EW to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, assuming there are q word clusters.
In step A of the method of the present invention, regional microblogs are collected from the microblog network and the microblog set LMB is obtained after preprocessing, preferably through the following specific steps:
A1, using a collection tool to acquire the microblog set PLMB = {plmb_1, plmb_2, …, plmb_m} of a region L, where plmb_i (1 ≤ i ≤ m) is each regional microblog and m is the number of regional microblogs;
A2, preprocessing the microblog set PLMB by removing link addresses and emoticons from the microblogs and discarding microblogs shorter than 5 words, to obtain the preprocessed microblog set LMB = {lmb_1, lmb_2, …, lmb_n}, where lmb_i (1 ≤ i ≤ n) is each regional microblog.
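The preprocessing in step A2 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patent's implementation: the function name `preprocess`, the URL pattern, the convention that emoticons appear as short bracketed codes such as `[cry]`, and the whitespace tokenizer are all assumptions introduced here. Chinese text would need a real word segmenter passed in as `tokenize`.

```python
import re

# Assumption: link addresses look like http(s) URLs.
URL_RE = re.compile(r"https?://\S+")
# Assumption: emoticons are exported as short bracketed codes, e.g. "[cry]".
EMOTICON_RE = re.compile(r"\[[^\[\]]{1,10}\]")


def preprocess(plmb, min_words=5, tokenize=str.split):
    """Return LMB: cleaned microblogs from PLMB that have at least `min_words` words.

    `tokenize` is a hypothetical hook; the default splits on whitespace,
    which is only adequate for space-delimited languages.
    """
    lmb = []
    for text in plmb:
        cleaned = URL_RE.sub(" ", text)           # remove link addresses
        cleaned = EMOTICON_RE.sub(" ", cleaned)   # remove emoticon codes
        cleaned = " ".join(cleaned.split())       # normalize whitespace
        if len(tokenize(cleaned)) >= min_words:   # drop posts shorter than 5 words
            lmb.append(cleaned)
    return lmb


if __name__ == "__main__":
    sample = [
        "fire near the old bridge now http://t.cn/abc [cry]",
        "too short http://t.cn/x",
    ]
    print(preprocess(sample))
```

Passing the tokenizer as a parameter keeps the length filter independent of the language of the posts.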
In step B of the method of the present invention, burst words are extracted from the microblog set LMB to obtain the burst word set EW, preferably through the following specific steps:
B1, segmenting each microblog lmb_i (1 ≤ i ≤ n) in LMB into words, where n is the number of microblogs, removing stop words, and retaining nouns, verbs, place names, personal names, and proper nouns, to obtain the final word set LMBW = {w_1, w_2, …, w_r}, assuming there are r words;
B2, calculating the frequency burstiness of each word w_i (1 ≤ i ≤ r): assuming the time point of the current emergency detection is k, historical data from the previous p time points are selected as reference, and the frequency burstiness of word w_i at time point k is defined as:
Figure GDA0002544731740000031
wherein the numerator
Figure GDA0002544731740000032
is the frequency of occurrence of word w_i at time point k, and in the denominator
Figure GDA0002544731740000033
Figure GDA0002544731740000034
B3, calculating the associated-user burstiness of each word w_i (1 ≤ i ≤ r): assuming the time point of the current emergency detection is k, historical data from the previous p time points are selected as reference, and the associated-user burstiness of word w_i at time point k is defined as:
Figure GDA0002544731740000035
wherein the numerator
Figure GDA0002544731740000036
is the number of different users who mention word w_i at time point k, and in the denominator
Figure GDA0002544731740000037
Figure GDA0002544731740000038
B4, calculating the distribution-region burstiness of each word w_i (1 ≤ i ≤ r); the distribution-region burstiness of word w_i at time point k is defined as:
Figure GDA0002544731740000039
wherein the numerator
Figure GDA00025447317400000310
is the number of different geotags of the microblogs that mention word w_i at time point k, and in the denominator
Figure GDA00025447317400000311
Figure GDA00025447317400000312
B5, calculating the social behavior burstiness of each word w_i (1 ≤ i ≤ r); the social behavior burstiness of word w_i at time point k is defined as:
Figure GDA00025447317400000313
wherein the numerator
Figure GDA00025447317400000314
is the sum of the repost count, comment count, and read count of the microblogs that mention word w_i at time point k, and in the denominator
Figure GDA00025447317400000315
Figure GDA00025447317400000316
B6, integrating the four burstiness values of steps B2, B3, B4, and B5 to obtain the burst value of word w_i at time point k: BurstyScore(w_i) = α*F(w_i) + β*U(u|w_i) + χ*GT(gt|w_i) + δ*SB(sb|w_i), where α, β, χ, and δ are adjustment coefficients that set the weights of the four indexes, with α + β + χ + δ = 1, α ≥ 0, β ≥ 0, χ ≥ 0, and δ ≥ 0;
B7, after the burst value of each word is calculated, selecting n burst words by using the interquartile range to form the burst word set EW. The interquartile range is calculated as: IQS(EW) = Q3(EW) - Q1(EW). A word is taken as a burst word when its burst value is greater than a threshold, which is calculated as: Maximum(EW) = Q3(EW) + 1.5 × IQS(EW).
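A rough Python sketch of steps B2 to B6 follows. It is a sketch under stated assumptions, not the formulas of the patent: the burstiness formulas themselves are given only in the equation figures, so the ratio used in `burstiness` below (current value divided by one plus the mean of the previous p values) is an assumption chosen purely for illustration. The post fields (`words`, `user`, `geotag`, `reposts`, `comments`, `reads`) and all function names are hypothetical.

```python
from statistics import mean


def indicator_counts(posts, word):
    """Raw values of the four indexes for `word` among the posts of one time point."""
    hits = [p for p in posts if word in p["words"]]
    return {
        "F": sum(p["words"].count(word) for p in hits),   # frequency of occurrence
        "U": len({p["user"] for p in hits}),              # different users
        "GT": len({p["geotag"] for p in hits}),           # different geotags
        "SB": sum(p["reposts"] + p["comments"] + p["reads"] for p in hits),
    }


def burstiness(current, history):
    """ASSUMED formula (the patent's is in a figure): current / (1 + historical mean)."""
    if not history:
        return float(current)
    return current / (1.0 + mean(history))


def bursty_score(b, alpha=0.25, beta=0.25, chi=0.25, delta=0.25):
    """Weighted sum of the four burstiness values, as in step B6."""
    weights = (alpha, beta, chi, delta)
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return alpha * b["F"] + beta * b["U"] + chi * b["GT"] + delta * b["SB"]


if __name__ == "__main__":
    posts_k = [
        {"words": ["fire", "fire", "bridge"], "user": "u1", "geotag": "g1",
         "reposts": 2, "comments": 1, "reads": 10},
        {"words": ["fire"], "user": "u1", "geotag": "g2",
         "reposts": 0, "comments": 0, "reads": 5},
    ]
    now = indicator_counts(posts_k, "fire")
    # Hypothetical history of each index over the previous p = 3 time points.
    history = {"F": [1, 0, 2], "U": [1, 0, 1], "GT": [1, 0, 1], "SB": [3, 0, 6]}
    b = {name: burstiness(now[name], history[name]) for name in now}
    print(bursty_score(b))
```

Equal weights of 0.25 are used as defaults only because they satisfy the stated constraints; any non-negative weights summing to 1 are accepted.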
In step C of the method of the present invention, the burst words in EW are clustered to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, preferably through the following specific steps:
C1, constructing a burst word association network EWN = (V, E) based on the burst word set EW obtained in step B, where V is the burst word set EW and E represents the association strength between burst words. The association strength of burst words ew_i and ew_j is the number of times the two words co-occur in the same microblog text;
C2, after the burst word association network EWN is constructed, clustering EWN with the open-source CLUTO toolkit to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, assuming there are q word clusters.
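The construction of the association network in step C1 can be illustrated with a short Python sketch. It is a minimal sketch, assuming each microblog has already been segmented into a list of words and that a pair of burst words is counted once per microblog in which both appear; the function name `build_ewn` and the edge representation (a counter keyed by sorted word pairs) are choices made here for illustration. The clustering of step C2 is not shown, since it is delegated to an external toolkit.

```python
from collections import Counter
from itertools import combinations


def build_ewn(lmb_words, ew):
    """Build EWN = (V, E): V is the burst word set, E maps word pairs to co-occurrence counts."""
    vertices = set(ew)
    edges = Counter()
    for words in lmb_words:
        # Burst words present in this microblog, sorted so each pair has one canonical key.
        present = sorted(vertices.intersection(words))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1   # one co-occurrence per microblog
    return vertices, edges


if __name__ == "__main__":
    lmb_words = [
        ["fire", "bridge", "smoke"],
        ["fire", "smoke"],
        ["rain", "bridge"],
    ]
    v, e = build_ewn(lmb_words, {"fire", "smoke", "bridge"})
    print(sorted(e.items()))
```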
Compared with the prior art, the invention makes comprehensive use of the characteristics of the microblog network for event detection: it calculates the burst value of a word from 4 indexes (word frequency, word-associated users, word distribution regions, and word social behaviors), which makes more reasonable use of the burst characteristics of microblog words and is better suited to detecting regional emergencies in a microblog network. It also provides a specific calculation method for each index, which gives it considerable practical value.
Drawings
FIG. 1 is a flowchart of the method for detecting regional emergencies in a microblog network according to the invention;
FIG. 2 is a flowchart of step 101 in FIG. 1, collecting regional microblogs from the microblog network to obtain the microblog set PLMB and preprocessing them to obtain the microblog set LMB;
FIG. 3 is a flowchart of step 102 in FIG. 1, extracting burst words from the microblog set LMB to obtain the burst word set EW;
FIG. 4 is a flowchart of step 103 in FIG. 1, clustering the burst words in EW to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}.
Detailed Description
The following describes the implementation of the present invention in further detail with reference to the accompanying drawings and the detailed description.
Referring to fig. 1, a method for detecting a microblog network regional emergency includes the following steps:
Step 101, collecting regional microblogs from the microblog network to obtain the microblog set PLMB, and preprocessing the microblogs to obtain the microblog set LMB; referring to FIG. 2, the specific steps are as follows:
Step 201, using a collection tool to acquire the microblog set PLMB = {plmb_1, plmb_2, …, plmb_m} of a region L, where plmb_i (1 ≤ i ≤ m) is each regional microblog. After applying to the microblog platform for developer permission, different interfaces of the API can be called to obtain real-time microblog information around a given location. Calling the location service interface returns the microblog content, repost count, comment count, like count, user information, check-in place, and so on.
Step 202, preprocessing the microblog set PLMB by removing link addresses and emoticons from the microblogs and discarding microblogs shorter than 5 words, to obtain the preprocessed microblog set LMB = {lmb_1, lmb_2, …, lmb_n}, where lmb_i (1 ≤ i ≤ n) is each regional microblog. Although the collected regional microblogs have already been screened from the mass of microblogs in a targeted way, some interfering information remains and needs to be filtered out to reduce the complexity of later computation.
Step 102, extracting burst words from the microblog set LMB to obtain the burst word set EW; referring to FIG. 3, the specific steps are as follows:
Step 301, segmenting each microblog lmb_i (1 ≤ i ≤ n) in LMB into words, removing stop words, and retaining nouns, verbs, place names, personal names, and proper nouns, to obtain the final word set LMBW = {w_1, w_2, …, w_r}, assuming there are r words. Because some verbs carry no practical meaning, such as "hold", "go", "develop", and "meet", these stop verbs are further removed;
Step 302, calculating the frequency burstiness of each word w_i (1 ≤ i ≤ r): assuming the time point of the current emergency detection is k, historical data from the previous p time points are selected as reference, and the frequency burstiness of word w_i at time point k is defined as:
Figure GDA0002544731740000061
wherein the numerator
Figure GDA0002544731740000062
is the frequency of occurrence of word w_i at time point k, and in the denominator
Figure GDA0002544731740000063
Figure GDA0002544731740000064
The larger F(w_i) is, the larger the increase in the frequency of word w_i at the current time point k, and the more likely w_i is a burst word;
Step 303, calculating the associated-user burstiness of each word w_i (1 ≤ i ≤ r): assuming the time point of the current emergency detection is k, historical data from the previous p time points are selected as reference, and the associated-user burstiness of word w_i at time point k is defined as:
Figure GDA0002544731740000065
wherein the numerator
Figure GDA0002544731740000066
is the number of different users who mention word w_i at time point k, and in the denominator
Figure GDA0002544731740000067
Figure GDA0002544731740000068
The larger U(w_i) is, the larger the increase in the number of users mentioning word w_i at time point k, and the more likely w_i is a burst word;
Step 304, calculating the distribution-region burstiness of each word w_i (1 ≤ i ≤ r); the distribution-region burstiness of word w_i at time point k is defined as:
Figure GDA0002544731740000069
wherein the numerator
Figure GDA0002544731740000071
is the number of different geotags of the microblogs that mention word w_i at time point k, and in the denominator
Figure GDA0002544731740000072
Figure GDA0002544731740000073
The larger GT(w_i) is, the larger the increase in the number of geotags associated with word w_i at time point k, and the more likely w_i is a burst word;
Step 305, calculating the social behavior burstiness of each word w_i (1 ≤ i ≤ r); the social behavior burstiness of word w_i at time point k is defined as:
Figure GDA0002544731740000074
wherein the numerator
Figure GDA0002544731740000075
is the sum of the repost count, comment count, and read count of the microblogs that mention word w_i at time point k, and in the denominator
Figure GDA0002544731740000076
Figure GDA0002544731740000077
The larger SB(w_i) is, the larger the increase in the number of social behaviors associated with word w_i at time point k, and the more likely w_i is a burst word;
Step 306, integrating the four burstiness values of the word to obtain the burst value of word w_i at time point k: BurstyScore(w_i) = α*F(w_i) + β*U(u|w_i) + χ*GT(gt|w_i) + δ*SB(sb|w_i), where α, β, χ, and δ are adjustment coefficients that set the weights of the four indexes, with α + β + χ + δ = 1, α ≥ 0, β ≥ 0, χ ≥ 0, and δ ≥ 0. The larger BurstyScore(w_i) is, the greater the burstiness of word w_i at time point k, and the more likely w_i is a burst word;
Step 307, after the burst value of each word is calculated, selecting n burst words by using the interquartile range to form the burst word set EW. The interquartile range is calculated as: IQS(EW) = Q3(EW) - Q1(EW). A word is taken as a burst word when its burst value is greater than a threshold, which is calculated as: Maximum(EW) = Q3(EW) + 1.5 × IQS(EW).
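The threshold rule of step 307 can be sketched in Python as follows. This is a minimal sketch, assuming the quartiles Q1 and Q3 are taken over the burst values of all candidate words and computed with the "inclusive" convention of the standard library; the text does not say which quartile convention is intended, and the function name `select_burst_words` is hypothetical. At least two scores are needed for the quartiles to be defined.

```python
from statistics import quantiles


def select_burst_words(scores):
    """Return (burst words, threshold) with threshold = Q3 + 1.5 * (Q3 - Q1).

    `scores` maps each word to its burst value. The quartile method is an
    assumption; other conventions give slightly different cut points.
    """
    q1, _median, q3 = quantiles(scores.values(), n=4, method="inclusive")
    iqs = q3 - q1                       # interquartile range
    threshold = q3 + 1.5 * iqs          # upper outlier fence
    burst = {w for w, s in scores.items() if s > threshold}
    return burst, threshold


if __name__ == "__main__":
    scores = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5,
              "f": 6, "g": 7, "h": 8, "fire": 100}
    print(select_burst_words(scores))
```

The rule is the familiar upper fence used for outlier detection in box plots, so only words whose burst value is unusually high relative to the bulk of the vocabulary are kept.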
Step 103, clustering the burst words in EW to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}; referring to FIG. 4, the specific steps are as follows:
Step 401, constructing a burst word association network EWN = (V, E) based on the burst word set EW, where V is the burst word set EW and E represents the association strength between burst words. The association strength of burst words ew_i and ew_j is the number of times the two words co-occur in the same microblog text;
Step 402, after the burst word association network EWN is constructed, clustering EWN with the open-source CLUTO toolkit to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}. CLUTO provides three clustering algorithms, which can cluster either directly in the feature space of the objects or in the similarity space of the objects.
Comparative example: the effectiveness of three different methods for detecting regional emergencies in a microblog network is compared. The three methods are as follows:
(1) Method 1, HBED: selects the hashtags contained in microblogs, represents each hashtag as a vector, calculates word weights with TF-IDF, and considers the change in the number of microblogs contained in a cluster when calculating the heat of the cluster.
(2) Method 2, GeoBurst: first identifies some important microblogs within a query window as pivots, and then obtains burst events by comparison with historical data in both space and time.
(3) Method 3, LocTBED (the method of the invention): clustering is performed with the clustering method baglo provided by CLUTO, with the number of clusters set to 10 and the clustering similarity function set to the cosine function cos. When the burst value of a word is calculated, the historical reference period of the word is set to one week (7 days), and when the four types of indexes are combined, the adjustment parameters are set to α = β = χ = δ = 0.25.
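For readers unfamiliar with the cosine similarity function mentioned for method 3, the following Python sketch shows the computation on sparse vectors. It does not call CLUTO and is not the toolkit's implementation; representing each burst word as a dictionary of co-occurrence counts with other words is an assumption made here for illustration, as is the function name `cosine`.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {dimension: weight} dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0   # convention chosen here for empty or all-zero vectors
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical co-occurrence rows of two burst words.
    fire = {"smoke": 2, "bridge": 1}
    smoke = {"fire": 2, "bridge": 1}
    print(cosine(fire, smoke))
```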
Taking a real social media platform, Sina Weibo, as an example, geotagged microblogs were collected for two cities: Beijing and Lianyungang, Jiangsu Province. For Beijing, data were collected from January 1 to December 30, 2016 (described as one month of data), yielding 346,863 geotagged microblogs in total. For Lianyungang, data were collected from May 1 to October 31, 2016 (half a year of data), yielding 63,744 geotagged microblogs in total. The effectiveness of each event detection method is verified on a daily basis, that is, by detecting the regional emergencies of a specified day.
Because the regional emergencies of each city on each day are not known in advance, the precision P@n is adopted as the evaluation index, following current mainstream research practice. For the Top-k emergencies detected each day, whether each detected event is a regional emergency is judged manually; because the number of Top-k detected events is small, the workload of manual evaluation is manageable.
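The evaluation index P@n can be computed as in the Python sketch below. It is a minimal sketch, assuming the manual judgements for one day are available as a ranked list of booleans (True when the detected event is a regional emergency) and that daily values are averaged over the evaluated days; the function names and the averaging step are assumptions made here for illustration.

```python
def precision_at_n(judgements, n):
    """P@n: fraction of the top-n detected events judged to be regional emergencies."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(1 for ok in judgements[:n] if ok) / n


def mean_precision_at_n(daily_judgements, n):
    """Average P@n over several days (one ranked judgement list per day)."""
    values = [precision_at_n(day, n) for day in daily_judgements]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical manual judgements for the Top-5 events of two days.
    day1 = [True, False, True, True, False]
    day2 = [True, True, False, False, False]
    for n in range(1, 6):
        print(n, mean_precision_at_n([day1, day2], n))
```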
The results obtained by the 3 methods on the 5 evaluation indexes are shown in Table 1.
Table 1. Test results of the 3 methods on the 5 evaluation indexes
Methods    P@1    P@2    P@3    P@4    P@5    Average
HBED       0.20   0.30   0.20   0.30   0.24   0.24
GeoBurst   0.80   0.70   0.80   0.75   0.72   0.72
LocTBED    0.80   0.80   0.87   0.80   0.76   0.76
Of the 3 methods, the proposed LocTBED performs best, with an average of 0.76 over the 5 evaluation indexes; GeoBurst ranks second, with an average of 0.72. Although the values obtained by the two methods are relatively close, the two methods differ considerably in how they rank the emergencies in the detection results. LocTBED considers the number of regional words contained in a cluster when calculating the heat of an emergency cluster, which greatly helps in detecting regional emergencies.
The poor performance of HBED is mainly because few of the collected geotagged microblogs contain hashtags, and those that do mostly concern wide-area events, so the method is not suitable for detecting regional events.
The method of the present invention is not limited to the examples described in the specific embodiments; other embodiments derived by those skilled in the art from the technical solutions of the present invention also fall within the scope of technical innovation of the present invention.

Claims (3)

1. A method for detecting regional emergencies in a microblog network, characterized by comprising the following specific steps:
A. collecting regional microblogs from a microblog network to obtain a microblog set PLMB, and preprocessing the microblogs to obtain a microblog set LMB;
B. extracting burst words from the microblog set LMB to obtain a burst word set EW;
C. clustering the burst words in EW and, assuming there are q word clusters, obtaining the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q};
The specific steps of step B are as follows:
B1, segmenting each microblog lmb_i (1 ≤ i ≤ n) in LMB into words, where n is the number of microblogs, removing stop words, and retaining nouns, verbs, place names, personal names, and proper nouns, to obtain the final word set LMBW = {w_1, w_2, …, w_r}, assuming there are r words;
B2, calculating the frequency burstiness of each word w_i (1 ≤ i ≤ r): assuming the time point of the current emergency detection is k, historical data from the previous p time points are selected as reference, and the frequency burstiness of word w_i at time point k is defined as:
Figure FDA0002503500210000011
wherein the numerator
Figure FDA0002503500210000012
is the frequency of occurrence of word w_i at time point k, and in the denominator
Figure FDA0002503500210000013
B3, calculating the associated-user burstiness of each word w_i (1 ≤ i ≤ r): assuming the time point of the current emergency detection is k, historical data from the previous p time points are selected as reference, and the associated-user burstiness of word w_i at time point k is defined as:
Figure FDA0002503500210000014
wherein the numerator
Figure FDA0002503500210000015
is the number of different users who mention word w_i at time point k, and in the denominator
Figure FDA0002503500210000016
B4, calculating the distribution-region burstiness of each word w_i (1 ≤ i ≤ r); the distribution-region burstiness of word w_i at time point k is defined as:
Figure FDA0002503500210000017
wherein the numerator
Figure FDA0002503500210000018
is the number of different geotags of the microblogs that mention word w_i at time point k, and in the denominator
Figure FDA0002503500210000021
B5, calculating the social behavior burstiness of each word w_i (1 ≤ i ≤ r); the social behavior burstiness of word w_i at time point k is defined as:
Figure FDA0002503500210000022
wherein the numerator
Figure FDA0002503500210000025
is the sum of the repost count, comment count, and read count of the microblogs that mention word w_i at time point k, and in the denominator
Figure FDA0002503500210000023
Figure FDA0002503500210000024
B6, integrating the four burstiness values of steps B2, B3, B4, and B5 to obtain the burst value of word w_i at time point k: BurstyScore(w_i) = α*F(w_i) + β*U(u|w_i) + χ*GT(gt|w_i) + δ*SB(sb|w_i), where α, β, χ, and δ are adjustment coefficients that set the weights of the four indexes, with α + β + χ + δ = 1, α ≥ 0, β ≥ 0, χ ≥ 0, and δ ≥ 0;
B7, after the burst value of each word is calculated, selecting n burst words by using the interquartile range to form the burst word set EW; the interquartile range is calculated as: IQS(EW) = Q3(EW) - Q1(EW); a word is taken as a burst word when its burst value is greater than a threshold, which is calculated as: Maximum(EW) = Q3(EW) + 1.5 × IQS(EW).
2. The method for detecting regional emergencies in a microblog network according to claim 1, characterized in that the specific steps of step A are as follows:
A1, using a collection tool to acquire the microblog set PLMB = {plmb_1, plmb_2, …, plmb_m} of a region L, where plmb_i (1 ≤ i ≤ m) is each regional microblog and m is the number of regional microblogs;
A2, preprocessing the microblog set PLMB by removing link addresses and emoticons from the microblogs and discarding microblogs shorter than 5 words, to obtain the preprocessed microblog set LMB = {lmb_1, lmb_2, …, lmb_n}, where lmb_i (1 ≤ i ≤ n) is each regional microblog.
3. The method for detecting regional emergencies in a microblog network according to claim 1, characterized in that the specific steps of step C are as follows:
C1, constructing a burst word association network EWN = (V, E) based on the burst word set EW obtained in step B, where V is the burst word set EW and E represents the association strength between burst words; the association strength of burst words ew_i and ew_j is the number of times the two words co-occur in the same microblog text;
C2, after the burst word association network EWN is constructed, clustering EWN with the open-source CLUTO toolkit to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, assuming there are q word clusters.
CN201710455550.6A 2017-06-15 2017-06-15 Method for detecting microblog network region emergency Active CN107273496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710455550.6A CN107273496B (en) 2017-06-15 2017-06-15 Method for detecting microblog network region emergency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710455550.6A CN107273496B (en) 2017-06-15 2017-06-15 Method for detecting microblog network region emergency

Publications (2)

Publication Number Publication Date
CN107273496A (en) 2017-10-20
CN107273496B (en) 2020-07-28

Family

ID=60067208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710455550.6A Active CN107273496B (en) 2017-06-15 2017-06-15 Method for detecting microblog network region emergency

Country Status (1)

Country Link
CN (1) CN107273496B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108733791B (en) * 2018-05-11 2020-11-20 北京科技大学 Network event detection method
CN109509110B (en) * 2018-07-27 2021-08-31 福州大学 Microblog hot topic discovery method based on improved BBTM model
CN110502703A (en) * 2019-07-12 2019-11-26 北京邮电大学 Social networks incident detection method based on character string dictionary building
CN111475732B (en) * 2020-04-13 2023-07-14 深圳市雅阅科技有限公司 Information processing method and device
CN112257429B (en) * 2020-10-16 2024-04-16 北京工商大学 Microblog emergency detection method based on BERT-BTM network
CN112528024B (en) * 2020-12-15 2022-11-18 哈尔滨工程大学 Microblog emergency detection method based on multi-feature fusion
CN112527960A (en) * 2020-12-17 2021-03-19 华东师范大学 Emergency detection method based on keyword clustering
CN112948587A (en) * 2021-03-30 2021-06-11 杭州叙简科技股份有限公司 Microblog public opinion analysis method and device based on earthquake industry and electronic equipment
CN114461763B (en) * 2022-04-13 2022-07-15 南京众智维信息科技有限公司 Network security event extraction method based on burst word clustering

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281608A (en) * 2013-07-08 2015-01-14 上海锐英软件技术有限公司 Emergency analyzing method based on microblogs
US9397904B2 (en) * 2013-12-30 2016-07-19 International Business Machines Corporation System for identifying, monitoring and ranking incidents from social media
CN104216954B (en) * 2014-08-20 2017-07-14 北京邮电大学 The prediction meanss and Forecasting Methodology of accident topic state
CN106294333B (en) * 2015-05-11 2019-10-29 国家计算机网络与信息安全管理中心 A kind of microblogging burst topic detection method and device
US20170024412A1 (en) * 2015-07-17 2017-01-26 Environmental Systems Research Institute (ESRI) Geo-event processor

Also Published As

Publication number Publication date
CN107273496A (en) 2017-10-20

Similar Documents

Publication Publication Date Title
CN107273496B (en) Method for detecting microblog network region emergency
Dahal et al. Topic modeling and sentiment analysis of global climate change tweets
Resch et al. Combining machine-learning topic models and spatiotemporal analysis of social media data for disaster footprint and damage assessment
Hou et al. Survey on data analysis in social media: A practical application aspect
Yang et al. Automatic detection of rumor on sina weibo
Xu et al. Participatory sensing-based semantic and spatial analysis of urban emergency events using mobile social media
CN105488092B (en) A kind of time-sensitive and adaptive sub-topic online test method and system
CN109783614B (en) Differential privacy disclosure detection method and system for to-be-published text of social network
CN105630884B (en) A kind of geographical location discovery method of microblog hot event
Farseev et al. bbridge: A big data platform for social multimedia analytics
Wang et al. Urban crisis detection technique: A spatial and data driven approach based on latent Dirichlet allocation (LDA) topic modeling
Ebrahimi et al. Twitter user geolocation by filtering of highly mentioned users
Manaskasemsak et al. Graph clustering-based emerging event detection from twitter data stream
Apostol et al. ContCommRTD: A distributed content-based misinformation-aware community detection system for real-time disaster reporting
Fuchs et al. Extracting personal behavioral patterns from geo-referenced tweets
Roedler et al. Profile matching across online social networks based on geo-tags
Hou et al. Understanding social media beyond text: a reliable practice on Twitter
Wu et al. Mining typhoon victim information based on multi-source data fusion using social media data in China: a case study of the 2019 Super Typhoon Lekima
Lei et al. Can we monitor the natural environment analyzing online social network posts? A literature review
Qian et al. Quantifying urban linguistic diversity related to rainfall and flood across China with social media data
Kim et al. Mining based urban climate disaster index service according to potential risk
Zhou et al. Classification of microblogs for support emergency responses: Case study Yushu earthquake in China
Stojanovski et al. Social networks VGI: Twitter sentiment analysis of social hotspots
Bayer et al. Information overload in crisis management: Bilingual evaluation of embedding models for clustering social media posts in emergencies
Ma et al. “Hello, Fellow Villager!”: Perceptions and Impact of Displaying Users’ Locations on Weibo

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 222000 zhongzhaoman transfer of computer school of Huaihai Institute of technology, No. 59 Cangwu Road, Haizhou District, Lianyungang City, Jiangsu Province

Patentee after: Jiangsu Ocean University

Country or region after: China

Address before: 222000 zhongzhaoman transfer of computer school of Huaihai Institute of technology, No. 59 Cangwu Road, Haizhou District, Lianyungang City, Jiangsu Province

Patentee before: HUAIHAI INSTITUTE OF TECHNOLOGY

Country or region before: China

TR01 Transfer of patent right

Effective date of registration: 20241010

Address after: Floor 17-2-12, Huaguoshan Avenue, Haizhou District, Lianyungang City, Jiangsu Province, 222000

Patentee after: JIANGSU JINGE NETWORK TECHNOLOGY Co.,Ltd.

Country or region after: China

Address before: 222000 zhongzhaoman transfer of computer school of Huaihai Institute of technology, No. 59 Cangwu Road, Haizhou District, Lianyungang City, Jiangsu Province

Patentee before: Jiangsu Ocean University

Country or region before: China