CN107273496B - A detection method for regional emergencies in Weibo network - Google Patents

A detection method for regional emergencies in Weibo network

Info

Publication number
CN107273496B
Authority
CN
China
Prior art keywords
word
microblog
burst
regional
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710455550.6A
Other languages
Chinese (zh)
Other versions
CN107273496A (en
Inventor
仲兆满
管燕
李存华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Jinge Network Technology Co ltd
Jiangsu Ocean University
Original Assignee
Jiangsu Ocean University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Ocean University
Priority to CN201710455550.6A
Publication of CN107273496A
Application granted
Publication of CN107273496B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Human Resources & Organizations (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for detecting regional emergencies in a microblog network, comprising the steps of: (1) collecting regional microblogs from the microblog network to obtain a microblog set PLMB, and preprocessing them to obtain a microblog set LMB; (2) extracting burst words from the microblog set LMB to obtain a burst word set EW; and (3) clustering the burst words in EW to obtain emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, where q is the number of word clusters. The method computes the burst value of a word from four types of indicators: word frequency, word-associated users, word distribution regions, and word social behaviors. It thereby makes more reasonable use of the burst characteristics of microblog words and is better suited to detecting regional emergencies in a microblog network.

Description

A detection method for regional emergencies in Weibo network

Technical field

The invention relates to information mining technology, and in particular to a method for detecting regional emergencies in a microblog network.

Background

As a highly real-time and interactive social medium, the microblog gives users a platform for freely publishing content and exchanging information, and it has become the preferred medium for breaking news, expressing opinions, and sharing experience. Many real-world events are first revealed on microblogs and only later reported by the traditional mainstream media, for example the Boston bombing in 2013 and the death of Margaret Thatcher. Microblog-oriented event detection has recently become a research hotspot in the field of event detection.

Because much microblog content carries regional information, including the locations mentioned in a post, the registered location of the posting user, and the geotags attached to the post, localized event detection for microblogs has become an emerging research direction. This kind of detection rests on a basic assumption: when nothing happens in a region, users rarely discuss such events, but once something does happen, a large volume of discussion follows, for example about a local fire, explosion, flood, traffic accident, pollution incident, or disease outbreak. This differs considerably from global event detection on social media. Global event detection ignores regional characteristics and processes the entire information stream of the medium, which both demands a large amount of analysis and may overlook hot events in a local region. Existing event detection methods are therefore difficult to apply directly to regional event detection.

The paper "Earthquake shakes Twitter users: real-time event detection by social sensors" by Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo, published in the proceedings of the 19th International World Wide Web Conference (United States, 2010), models each Twitter user as a node in a wireless sensor network. A user posting an earthquake-related message is abstracted as a node publishing the information it has collected. Temporal and spatial models of the posts, followed by filtering, are then used to confirm whether an earthquake has occurred. However, the method requires manually designed query terms and is difficult to apply to detecting unconventional emergencies.

The paper "Detection and Analysis of Weibo Events Based on Geographic Coordinates" by Li Jinhua and An Zhongjie, published in 2016 in the Chinese journal Modern Library and Information Technology (现代图书情报技术), builds microblog features from five indicators: the number of posts, the number of reposts, the number of comments, user activity, and movement intensity. When detecting microblog emergencies, this method does not comprehensively consider the characteristics of microblog-type social media, such as the frequency of burst words and regional burstiness, and it gives no specific calculation method (such as formal formulas) for its indicators.

The paper "GeoBurst: Real-Time Local Event Detection in Geo-Tagged Tweet Streams" by Zhang Chao, Zhou Guangyu, Yuan Quan, Zhuang Honglei, Zheng Yu, Kaplan Lance, Wang Shaowen, and Han Jiawei, published in the proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (United States, 2016), first identifies some important microblogs within the query window as pivots, and then obtains emergencies by comparing them with historical data in time and space. This method works from the text of individual microblogs. Because microblogs are short and use non-standard language, it is difficult to extract effective features directly from short single posts.

Summary of the invention

The technical problem to be solved by the invention is to address the deficiencies of the prior art by providing a new method for detecting regional emergencies in a microblog network. The method makes more reasonable use of the burst characteristics of microblog words and is better suited to detecting regional emergencies in a microblog network.

The technical problem is solved by the following technical solution. The invention provides a method for detecting regional emergencies in a microblog network, characterized by the following steps:

A. Collect regional microblogs from the microblog network to obtain the microblog set PLMB, and preprocess them to obtain the microblog set LMB;

B. Extract burst words from the microblog set LMB to obtain the burst word set EW;

C. Cluster the burst words in EW to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, where q is the number of word clusters.

In step A, collecting regional microblogs from the microblog network and preprocessing them to obtain the microblog set LMB preferably uses the following specific steps:

A1. Use a collection tool to obtain the localized microblog set PLMB = {plmb_1, plmb_2, …, plmb_m}, where each plmb_i (1 ≤ i ≤ m) is one regional microblog and m is the number of regional microblogs;

A2. Preprocess the set PLMB: remove link URLs and emoticon information from each microblog, and discard microblogs shorter than 5 characters, to obtain the preprocessed set LMB = {lmb_1, lmb_2, …, lmb_n}, where each lmb_i (1 ≤ i ≤ n) is one regional microblog.
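The preprocessing in step A2 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the text does not say how links or emoticons are recognized, so the two regular expressions, the bracketed emoticon convention, and the helper name `preprocess` are all assumptions, and length is measured in characters.

```python
import re

# ASSUMPTION: links are ASCII http(s) URLs; the patent does not give a pattern.
URL_RE = re.compile(r"https?://[A-Za-z0-9./?=&%_#:~+\-]+")
# ASSUMPTION: emoticons appear as short bracketed names such as [泪].
EMOTICON_RE = re.compile(r"\[[^\[\]]{1,8}\]")
MIN_LENGTH = 5  # step A2 discards microblogs shorter than 5 characters


def preprocess(plmb):
    """Turn the raw microblog set PLMB into the cleaned set LMB."""
    lmb = []
    for post in plmb:
        text = URL_RE.sub("", post)          # remove link URLs
        text = EMOTICON_RE.sub("", text)     # remove emoticon markers
        text = " ".join(text.split())        # collapse leftover whitespace
        if len(text) >= MIN_LENGTH:          # drop very short posts
            lmb.append(text)
    return lmb
```

Filtering before word segmentation keeps the later per-word statistics from being skewed by link fragments and emoticon names.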

In step B, extracting burst words from the microblog set LMB to obtain the burst word set EW preferably uses the following specific steps:

B1. Segment each microblog lmb_i (1 ≤ i ≤ n) in LMB into words, where n is the number of microblogs; remove stop words and keep nouns, verbs, place names, person names, and proper nouns, giving the final word set LMBW = {w_1, w_2, …, w_r}, where r is the number of words;

B2. Compute the frequency burstiness of each word w_i (1 ≤ i ≤ r). Let k be the time point of the current emergency detection, and take the historical data of the previous p time points as the reference. The frequency burstiness F(w_i) of word w_i at time point k is defined as:

[Formula images GDA0002544731740000031 to GDA0002544731740000034 are not reproduced in this text. The numerator is the frequency with which word w_i occurs at time point k; the denominator term is defined in the formula images.]

B3. Compute the associated-user burstiness of each word w_i (1 ≤ i ≤ r). Let k be the time point of the current emergency detection, and take the historical data of the previous p time points as the reference. The associated-user burstiness U(u|w_i) of word w_i at time point k is defined as:

[Formula images GDA0002544731740000035 to GDA0002544731740000038 are not reproduced in this text. The numerator is the number of distinct users who mention word w_i at time point k; the denominator term is defined in the formula images.]

B4. Compute the regional burstiness of each word w_i (1 ≤ i ≤ r). The regional distribution burstiness GT(gt|w_i) of word w_i at time point k is defined as:

[Formula images GDA0002544731740000039 to GDA00025447317400000312 are not reproduced in this text. The numerator is the number of distinct geotags attached to mentions of word w_i at time point k; the denominator term is defined in the formula images.]

B5. Compute the social-behavior burstiness of each word w_i (1 ≤ i ≤ r). The social-behavior burstiness SB(sb|w_i) of word w_i at time point k is defined as:

[Formula images GDA00025447317400000313 to GDA00025447317400000316 are not reproduced in this text. The numerator is the sum of the numbers of reposts, comments, and reads of the microblogs that mention word w_i at time point k; the denominator term is defined in the formula images.]

B6. Combine the four burstiness values from steps B2, B3, B4, and B5 to obtain the burst value of word w_i at time point k: BurstyScore(w_i) = α*F(w_i) + β*U(u|w_i) + χ*GT(gt|w_i) + δ*SB(sb|w_i), where α, β, χ, and δ are adjustment coefficients that set the weights of the four types of indicators, with α + β + χ + δ = 1 and α ≥ 0, β ≥ 0, χ ≥ 0, δ ≥ 0;

B7. After the burst value of every word has been computed, use the interquartile range to select n burst words, which form the burst word set EW. The interquartile range is computed as IQS(EW) = Q_3(EW) - Q_1(EW). A word is taken as a burst word when its burst value exceeds a threshold, which is computed as maxima(EW) = Q_3(EW) + 1.5 × IQS(EW).
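Steps B6 and B7 can be illustrated with a short Python sketch. It assumes that the four burstiness values of each word are already available, and that the quartiles are taken over the burst values of all candidate words; the text does not state the quartile interpolation method, so the `inclusive` method of `statistics.quantiles` is an assumption, as are the function names. The default weights of 0.25 are the ones the comparative example reports.

```python
from statistics import quantiles


def bursty_score(f, u, gt, sb, alpha=0.25, beta=0.25, chi=0.25, delta=0.25):
    """Weighted sum of the four burstiness values of one word (step B6)."""
    weights = (alpha, beta, chi, delta)
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return alpha * f + beta * u + chi * gt + delta * sb


def select_burst_words(scores):
    """Keep words whose burst value exceeds Q3 + 1.5 * (Q3 - Q1) (step B7).

    scores maps each candidate word to its burst value; at least two
    words are needed to compute quartiles.
    ASSUMPTION: quartiles use the 'inclusive' interpolation method.
    """
    q1, _, q3 = quantiles(list(scores.values()), n=4, method="inclusive")
    threshold = q3 + 1.5 * (q3 - q1)
    return {word for word, s in scores.items() if s > threshold}, threshold
```

The threshold is the usual upper outlier fence of a box plot, so only words whose burst value is unusually high relative to the other candidates are kept.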

In step C, clustering the burst words in EW to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q} preferably uses the following specific steps:

C1. Based on the burst feature set EW obtained in step B, construct the burst word association network EWN = (V, E), where V is the burst word set EW and E represents the association strength between burst words. The association strength of burst words ew_i and ew_j is the number of times the two words co-occur in the same microblog post;

C2. After the burst word association network EWN has been constructed, cluster it with the open-source CLUTO toolkit to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, where q is the number of word clusters.
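The association network of step C1 can be sketched as a table of edge weights. The following Python example is illustrative only: it assumes the microblogs have already been segmented into token lists, and it counts a pair at most once per post, which is one reading of "co-occur in the same microblog post". The function name `build_ewn` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_ewn(tokenized_posts, burst_words):
    """Return edge weights {(ew_i, ew_j): co-occurrence count}, with ew_i < ew_j.

    ASSUMPTION: a pair of burst words is counted once per post,
    however many times each word appears in that post.
    """
    vocabulary = set(burst_words)
    edges = Counter()
    for tokens in tokenized_posts:
        present = sorted(set(tokens) & vocabulary)  # burst words in this post
        for a, b in combinations(present, 2):       # every unordered pair
            edges[(a, b)] += 1
    return edges
```

Storing each pair with its words in sorted order gives every undirected edge a single key, so the weights can be passed directly to a clustering step.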

Compared with the prior art, the invention proposes indicators for event detection that make comprehensive use of the characteristics of the microblog network. It computes the burst value of a word from four types of indicators (word frequency, word-associated users, word distribution regions, and word social behaviors), which makes more reasonable use of the burst characteristics of microblog words and is better suited to detecting regional emergencies in a microblog network. It also gives specific calculation methods, which is of considerable practical value.

Description of the drawings

FIG. 1 is a flowchart of the method of the invention for detecting regional emergencies in a microblog network;

FIG. 2 is a flowchart of step 101 in FIG. 1, in which regional microblogs are collected from the microblog network to obtain the microblog set PLMB and preprocessed to obtain the microblog set LMB;

FIG. 3 is a flowchart of step 102 in FIG. 1, in which burst words are extracted from the microblog set LMB to obtain the burst word set EW;

FIG. 4 is a flowchart of step 103 in FIG. 1, in which the burst words in EW are clustered to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}.

Detailed description

The implementation of the invention is described in further detail below with reference to the drawings and specific embodiments.

Referring to FIG. 1, a method for detecting regional emergencies in a microblog network comprises the following steps:

Step 101. Collect regional microblogs from the microblog network to obtain the microblog set PLMB, and preprocess them to obtain the microblog set LMB. Referring to FIG. 2, the specific steps are as follows:

Step 201. Use a collection tool to obtain the localized microblog set PLMB = {plmb_1, plmb_2, …, plmb_m}, where each plmb_i (1 ≤ i ≤ m) is one regional microblog. After applying for developer permission on Weibo, different interfaces of the API can be called to obtain the dynamic microblog information around a given location. Calling the location service interface returns the microblog content, number of reposts, number of comments, number of likes, user information, check-in location, and so on.

Step 202. Preprocess the set PLMB: remove link URLs and emoticon information from each microblog, and discard microblogs shorter than 5 characters, to obtain the preprocessed set LMB = {lmb_1, lmb_2, …, lmb_n}, where each lmb_i (1 ≤ i ≤ n) is one regional microblog. Although the collected regional microblogs have already been selected in a targeted way from a massive volume of microblogs, they still contain some interfering information, which must be filtered out to reduce the complexity of later computation.

Step 102. Extract burst words from the microblog set LMB to obtain the burst word set EW. Referring to FIG. 3, the specific steps are as follows:

Step 301. Segment each microblog lmb_i (1 ≤ i ≤ n) in LMB into words, remove stop words, and keep nouns, verbs, place names, person names, and proper nouns, giving the final word set LMBW = {w_1, w_2, …, w_r}, where r is the number of words. Because some verbs carry no real meaning, such as "举行" (hold), "进行" (carry out), "开展" (launch), and "会" (will), these stop verbs are also removed;

Step 302. Compute the frequency burstiness of each word w_i (1 ≤ i ≤ r). Let k be the time point of the current emergency detection, and take the historical data of the previous p time points as the reference. The frequency burstiness F(w_i) of word w_i at time point k is defined as:

[Formula images GDA0002544731740000061 to GDA0002544731740000064 are not reproduced in this text. The numerator is the frequency with which word w_i occurs at time point k; the denominator term is defined in the formula images.]

The larger F(w_i) is, the faster the frequency of word w_i is growing at the current time point k, and the more likely w_i is to be a burst word;

Step 303. Compute the associated-user burstiness of each word w_i (1 ≤ i ≤ r). Let k be the time point of the current emergency detection, and take the historical data of the previous p time points as the reference. The associated-user burstiness of word w_i at time point k is defined as:

[Formula images GDA0002544731740000065 to GDA0002544731740000068 are not reproduced in this text. The numerator is the number of distinct users who mention word w_i at time point k; the denominator term is defined in the formula images.]

The larger this value is, the faster the number of users mentioning word w_i is growing at time point k, and the more likely w_i is to be a burst word;

Step 304. Compute the regional burstiness of each word w_i (1 ≤ i ≤ r). The regional distribution burstiness of word w_i at time point k is defined as:

[Formula images GDA0002544731740000069 to GDA0002544731740000073 are not reproduced in this text. The numerator is the number of distinct geotags attached to mentions of word w_i at time point k; the denominator term is defined in the formula images.]

The larger GT(w_i) is, the faster the number of geotags associated with word w_i is growing at time point k, and the more likely w_i is to be a burst word;

Step 305. Compute the social-behavior burstiness of each word w_i (1 ≤ i ≤ r). The social-behavior burstiness of word w_i at time point k is defined as:

[Formula images GDA0002544731740000074 to GDA0002544731740000077 are not reproduced in this text. The numerator is the sum of the numbers of reposts, comments, and reads of the microblogs that mention word w_i at time point k; the denominator term is defined in the formula images.]

The larger SB(w_i) is, the faster the volume of social behavior around word w_i is growing at time point k, and the more likely w_i is to be a burst word;

Step 306. Combine the four burstiness values above to obtain the burst value of word w_i at time point k: BurstyScore(w_i) = α*F(w_i) + β*U(u|w_i) + χ*GT(gt|w_i) + δ*SB(sb|w_i), where α, β, χ, and δ are adjustment coefficients that set the weights of the four types of indicators, with α + β + χ + δ = 1 and α ≥ 0, β ≥ 0, χ ≥ 0, δ ≥ 0. The larger BurstyScore(w_i) is, the burstier word w_i is at time point k, and the more likely it is to be a burst word;

Step 307. After the burst value of every word has been computed, use the interquartile range to select n burst words, which form the burst word set EW. The interquartile range is computed as IQS(EW) = Q_3(EW) - Q_1(EW). A word is taken as a burst word when its burst value exceeds a threshold, which is computed as maxima(EW) = Q_3(EW) + 1.5 × IQS(EW).
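Steps 302 to 305 each compare a quantity observed at time point k (word frequency, distinct users, distinct geotags, or social-behavior counts) with the same quantity over the previous p time points. The defining formulas are not reproduced in this text, so the Python sketch below uses an assumed form: the current value divided by the historical mean plus a smoothing constant. Both the ratio form and the smoothing term are illustrative assumptions, not the patent's formula, and the function name `burstiness` is hypothetical.

```python
def burstiness(history, current, smoothing=1.0):
    """Ratio of the value at time k to the mean over the previous p time points.

    ASSUMPTION: the ratio form and the additive smoothing constant are
    illustrative; the patent's own formula is not reproduced in the text.
    history holds the p historical values, current is the value at time k.
    """
    if not history:
        raise ValueError("need at least one historical time point")
    baseline = sum(history) / len(history)   # mean over the p reference points
    return current / (baseline + smoothing)  # smoothing avoids division by zero
```

Under this assumption the same function could serve all four indicators, with `history` and `current` filled from the corresponding counts, for example the seven daily counts of the week-long window used in the comparative example.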

Step 103. Cluster the burst words in EW to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}. Referring to FIG. 4, the specific steps are as follows:

Step 401. Based on the burst feature set EW, construct the burst word association network EWN = (V, E), where V is the burst word set EW and E represents the association strength between burst words. The association strength of burst words ew_i and ew_j is the number of times the two words co-occur in the same microblog post;

Step 402. After the burst word association network EWN has been constructed, cluster it with the open-source CLUTO toolkit to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, where q is the number of word clusters. CLUTO provides three classes of clustering algorithms, which can cluster either directly in the feature space of the objects or according to the similarity space of the objects. These algorithms are partitional, agglomerative, and graph-partitioning based. In practice, agglomerative hierarchical clustering is used most often, so the invention adopts agglomerative hierarchical clustering.
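To make the agglomerative idea in step 402 concrete, here is a small pure-Python sketch. It is not CLUTO and does not reproduce CLUTO's `bagglo` method or its criterion functions: it is a plain average-link agglomerative procedure that repeatedly merges the two clusters with the highest mean co-occurrence weight until q clusters remain. The edge format (a dictionary keyed by sorted word pairs) and the function name `agglomerate` are assumptions made for illustration.

```python
def agglomerate(words, edges, q):
    """Average-link agglomerative clustering of burst words into q clusters.

    edges maps (a, b) with a < b to a co-occurrence count; missing pairs
    count as 0. ASSUMPTION: average link on raw counts stands in for the
    CLUTO configuration described in the text.
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    clusters = [[w] for w in sorted(words)]  # start with singleton clusters

    def link(c1, c2):
        total = sum(edges.get(tuple(sorted((a, b))), 0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))   # mean weight across the two clusters

    while len(clusters) > q:
        # Pick the pair of clusters with the strongest average association.
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: link(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]                      # j > i, so index i stays valid
    return clusters
```

Each resulting cluster plays the role of one ewc_j, a group of burst words that tend to appear in the same posts and are read together as a candidate event.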

Comparative example: three different methods for detecting regional emergencies in a microblog network are used to compare the effectiveness of regional emergency detection. The three methods are as follows:

(1) Method 1, HBED: select the hashtags contained in the microblogs and represent each hashtag as a vector. Word weights are computed with TF-IDF, and the change in the number of microblogs contained in a cluster is taken into account when computing the cluster's popularity.

(2) Method 2, GeoBurst: first identify some important microblogs within the query window as pivots, then obtain emergencies by comparing them with historical data in time and space. Emergencies are ranked by the temporal and spatial burstiness of the words in each word cluster. The four main parameters are set as follows: kernel bandwidth h = 0.01, restart probability α = 0.2, random-walk similarity threshold δ = 0.02, and the parameter balancing temporal and spatial burstiness η = 0.5.

(3) Method 3, LocTBED: the method proposed by the invention, whose main contribution is the word burstiness computation. Clustering uses the agglomerative method bagglo provided by CLUTO, with the number of clusters set to 10 and the cosine function (Cos) as the similarity function. When computing word burst values, the historical observation period is set to one week (7 days), and the adjustment parameters for combining the four types of indicators are α = β = χ = δ = 0.25.

Taking a real social medium, Sina Weibo, as an example, geotagged microblogs were collected for two cities: Beijing and Lianyungang, Jiangsu Province. The Beijing data were collected from December 1 to December 30, 2016 (one month of data), yielding 346,863 geotagged microblogs. The Lianyungang data were collected from May 1 to October 31, 2016 (half a year of data), yielding 63,744 geotagged microblogs. The effectiveness of each event detection method is verified on a daily basis, that is, by detecting the regional emergencies of a specified day.

Because the regional emergencies that occur in each city each day are unknown, the precision P@n is used as the evaluation metric, following current mainstream research practice. For the top-k emergencies detected each day, human judges decide whether each detected event is a regional emergency. Because the number of top-k events is small, the manual evaluation workload is modest.
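The P@n metric can be written down in a few lines. The Python sketch below assumes the manual judgments for one day are given as a ranked list of booleans (True when the detected event was judged to be a real regional emergency); that input format and the function name are illustrative assumptions.

```python
def precision_at_n(judgments, n):
    """Fraction of the top-n detected events judged to be regional emergencies.

    judgments is ordered by detection rank, best first. If fewer than n
    events were detected, the missing ranks count as misses.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(1 for ok in judgments[:n] if ok) / n
```

For example, if the judges mark the five top-ranked events of a day as True, False, True, True, False, then P@1 is 1.0 and P@5 is 0.6.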

The results obtained by the three methods on the five evaluation metrics are shown in Table 1.

Table 1. Detection results of the three methods on the five evaluation metrics

| Methods | P@1 | P@2 | P@3 | P@4 | P@5 | Average |
|---|---|---|---|---|---|---|
| HBED | 0.20 | 0.30 | 0.20 | 0.30 | 0.24 | 0.24 |
| GeoBurst | 0.80 | 0.70 | 0.80 | 0.75 | 0.72 | 0.72 |
| LocTBED | 0.80 | 0.80 | 0.87 | 0.80 | 0.76 | 0.76 |

Of the three methods, LocTBED, the method proposed here, performs best, with an average of 0.76 over the five evaluation metrics. GeoBurst follows with an average of 0.72. Although the two methods score similarly, the ranking of the emergencies in their detection results differs considerably. When computing the hotness of an emergency cluster, LocTBED takes into account the number of regional words the cluster contains, which is of significant help in detecting regional emergencies.

HBED performs poorly, mainly because few of the collected geotagged microblogs carry a hashtag, and those that do mostly concern wide-area events, so the method is ill-suited to detecting regional events.

The method of the present invention is not limited to the embodiments described in the detailed description; other embodiments derived by those skilled in the art from the technical solution of the present invention likewise fall within the scope of technical innovation of the present invention.

Claims (3)

1. A method for detecting regional emergencies in a microblog network, characterized in that its specific steps are as follows:
A. Collect regional microblogs from the microblog network to obtain the microblog set PLMB, and preprocess the microblogs to obtain the microblog set LMB;
B. Extract burst words from the microblog set LMB to obtain the burst word set EW;
C. Cluster the burst words in EW; assuming there are q word clusters, obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q};
The specific steps of step B are as follows:
B1. Segment each microblog lmb_i (1 ≤ i ≤ n) in LMB into words, where n is the number of microblogs; remove stop words and retain nouns, verbs, place names, person names, and proper nouns, giving the final word set LMBW = {w_1, w_2, …, w_r}, assuming there are r words;
B2. Compute the frequency burstiness of word w_i (1 ≤ i ≤ r). Let k be the time point of the current emergency detection, and take the historical data of the preceding p time points as reference. The frequency burstiness of word w_i at time point k is defined as:
[Formula image: Figure FDA0002503500210000011]
where the numerator [Figure FDA0002503500210000012] is the frequency with which word w_i occurs at time point k, and, in the denominator, [Figure FDA0002503500210000013];
B3. Compute the associated-user burstiness of word w_i (1 ≤ i ≤ r). Let k be the time point of the current emergency detection, and take the historical data of the preceding p time points as reference. The associated-user burstiness of word w_i at time point k is defined as:
[Formula image: Figure FDA0002503500210000014]
where the numerator [Figure FDA0002503500210000015] is the number of distinct users who mention word w_i at time point k, and, in the denominator, [Figure FDA0002503500210000016];
B4. Compute the regional burstiness of word w_i (1 ≤ i ≤ r). The regional-distribution burstiness of word w_i at time point k is defined as:
[Formula image: Figure FDA0002503500210000017]
where the numerator [Figure FDA0002503500210000018] is the number of distinct geotags of the microblogs that mention word w_i at time point k, and, in the denominator, [Figure FDA0002503500210000021];
B5. Compute the social-behavior burstiness of word w_i (1 ≤ i ≤ r). The social-behavior burstiness of word w_i at time point k is defined as:
[Formula image: Figure FDA0002503500210000022]
where the numerator [Figure FDA0002503500210000025] is the sum of the repost count, comment count, and read count of the microblogs that mention word w_i at time point k, and, in the denominator, [Figure FDA0002503500210000023] [Figure FDA0002503500210000024];
B6. Combine the four burstiness measures of steps B2, B3, B4, and B5 to obtain the burst value of word w_i at time point k: BurstyScore(w_i) = α * F(w_i) + β * U(u|w_i) + χ * GT(gt|w_i) + δ * SB(sb|w_i), where α, β, χ, and δ are adjustment coefficients that set the weights of the four indicators, with α + β + χ + δ = 1, α ≥ 0, β ≥ 0, χ ≥ 0, δ ≥ 0;
B7. After the burst value of each word has been computed, use the interquartile range to select n burst words, forming the burst word set EW. The interquartile distance is computed as IQS(EW) = Q_3(EW) − Q_1(EW). A word is taken as a burst word when its burst value exceeds a threshold, which is computed as maxima(EW) = Q_3(EW) + 1.5 × IQS(EW).
2. The method for detecting regional emergencies in a microblog network according to claim 1, characterized in that the specific steps of step A are as follows:
A1. Use a collection tool to obtain the set of localized microblogs PLMB = {plmb_1, plmb_2, …, plmb_m}, where each plmb_i (1 ≤ i ≤ m) is a regional microblog and m is the number of regional microblogs;
A2. Preprocess the microblog set PLMB: remove link URLs and emoticon information from the microblogs and discard microblogs shorter than 5 characters, giving the preprocessed microblog set LMB = {lmb_1, lmb_2, …, lmb_n}, where each lmb_i (1 ≤ i ≤ n) is a regional microblog.
3. The method for detecting regional emergencies in a microblog network according to claim 1, characterized in that the specific steps of step C are as follows:
C1. Based on the burst feature set EW obtained in step B, construct the burst word association network EWN = (V, E), where V is the burst word set EW and E represents the association strength between burst words; the association strength of burst words ew_i and ew_j is the number of times the two words co-occur in the same microblog post;
C2. Once the burst word association network EWN has been constructed, cluster EWN with the open-source CLUTO toolkit to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, assuming there are q word clusters.
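The quartile-based selection of step B7 and the co-occurrence counting of step C1 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the dictionary of burst values, the pre-tokenized posts, and the quartile interpolation method are all hypothetical choices, and the clustering of step C2 with CLUTO is not shown.

```python
from collections import Counter
from itertools import combinations
from statistics import quantiles


def select_burst_words(scores):
    """Keep words whose burst value exceeds Q3 + 1.5 * IQS (step B7).

    `scores` maps each word to its burst value. The "inclusive" quartile
    interpolation is an assumption; the patent does not specify one.
    """
    q1, _, q3 = quantiles(scores.values(), n=4, method="inclusive")
    threshold = q3 + 1.5 * (q3 - q1)
    return {w for w, s in scores.items() if s > threshold}, threshold


def cooccurrence_edges(posts, burst_words):
    """Count, per pair of burst words, the posts that contain both (step C1).

    `posts` is an iterable of token lists, one list per microblog.
    """
    edges = Counter()
    for tokens in posts:
        present = sorted(set(tokens) & burst_words)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical burst values: one word stands far above the rest.
scores = {"w1": 1, "w2": 2, "w3": 3, "w4": 4, "w5": 5,
          "w6": 6, "w7": 7, "w8": 8, "fire": 100}
burst, threshold = select_burst_words(scores)

# Hypothetical tokenized posts and burst word set.
posts = [["fire", "mall", "smoke"], ["fire", "mall"], ["rain"]]
edges = cooccurrence_edges(posts, {"fire", "mall", "smoke"})
```

The resulting edge weights could serve as the association strengths E of the network EWN before it is handed to a clustering tool.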

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710455550.6A CN107273496B (en) 2017-06-15 2017-06-15 A detection method for regional emergencies in Weibo network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710455550.6A CN107273496B (en) 2017-06-15 2017-06-15 A detection method for regional emergencies in Weibo network

Publications (2)

Publication Number Publication Date
CN107273496A CN107273496A (en) 2017-10-20
CN107273496B true CN107273496B (en) 2020-07-28

Family

ID=60067208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710455550.6A Active CN107273496B (en) 2017-06-15 2017-06-15 A detection method for regional emergencies in Weibo network

Country Status (1)

Country Link
CN (1) CN107273496B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108733791B (en) * 2018-05-11 2020-11-20 北京科技大学 Network event detection method
CN109509110B (en) * 2018-07-27 2021-08-31 福州大学 Weibo hot topic discovery method based on improved BBTM model
CN110502703A (en) * 2019-07-12 2019-11-26 北京邮电大学 A Social Network Emergencies Detection Method Based on String Dictionary Construction
CN111475732B (en) * 2020-04-13 2023-07-14 深圳市雅阅科技有限公司 Information processing method and device
CN112257429B (en) * 2020-10-16 2024-04-16 北京工商大学 Microblog emergency detection method based on BERT-BTM network
CN112528024B (en) * 2020-12-15 2022-11-18 哈尔滨工程大学 Microblog emergency detection method based on multi-feature fusion
CN112527960A (en) * 2020-12-17 2021-03-19 华东师范大学 Emergency detection method based on keyword clustering
CN112948587A (en) * 2021-03-30 2021-06-11 杭州叙简科技股份有限公司 Microblog public opinion analysis method and device based on earthquake industry and electronic equipment
CN114461763B (en) * 2022-04-13 2022-07-15 南京众智维信息科技有限公司 Network security event extraction method based on burst word clustering

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281608A (en) * 2013-07-08 2015-01-14 上海锐英软件技术有限公司 Emergency analyzing method based on microblogs
US9397904B2 (en) * 2013-12-30 2016-07-19 International Business Machines Corporation System for identifying, monitoring and ranking incidents from social media
CN104216954B (en) * 2014-08-20 2017-07-14 北京邮电大学 The prediction meanss and Forecasting Methodology of accident topic state
CN106294333B (en) * 2015-05-11 2019-10-29 国家计算机网络与信息安全管理中心 A kind of microblogging burst topic detection method and device
US20170024412A1 (en) * 2015-07-17 2017-01-26 Environmental Systems Research Institute (ESRI) Geo-event processor

Also Published As

Publication number Publication date
CN107273496A (en) 2017-10-20

Similar Documents

Publication Publication Date Title
CN107273496B (en) A detection method for regional emergencies in Weibo network
Aphiwongsophon et al. Detecting fake news with machine learning method
Cortés et al. Stream processing of healthcare sensor data: studying user traces to identify challenges from a big data perspective
CN104216954B (en) The prediction meanss and Forecasting Methodology of accident topic state
CN103345524A (en) Method and system for detecting microblog hot topics
CN103150374B (en) Method and system for identifying abnormal microblog users
CN103745000B (en) Hot topic detection method of Chinese micro-blogs
CN103823888B (en) Node-closeness-based social network site friend recommendation method
Du et al. Predicting the hand, foot, and mouth disease incidence using search engine query data and climate variables: an ecological study in Guangdong, China
CN106296422A (en) A kind of social networks junk user detection method merging many algorithms
CN106570144A (en) Method and apparatus for recommending information
CN106021508A (en) Sudden event emergency information mining method based on social media
CN109242553A (en) A kind of user behavior data recommended method, server and computer-readable medium
CN102982157A (en) Device and method used for mining microblog hot topics
Farseev et al. bbridge: A big data platform for social multimedia analytics
Sangameswar et al. An algorithm for identification of natural disaster affected area
CN105335476B (en) Method and device for classifying hot events
Fan et al. Effects of population co-location reduction on cross-county transmission risk of COVID-19 in the United States
Gao et al. A novel method for geographical social event detection in social media
Rachunok et al. Is the data suitable? the comparison of keyword versus location filters in crisis informatics using twitter data
Kumbalaparambi et al. Assessment of urban air quality from Twitter communication using self-attention network and a multilayer classification model
CN109977324A (en) A kind of point of interest method for digging and system
CN106294621B (en) A method and system for calculating event similarity based on complex network node similarity
CN110083701B (en) An early warning system for mass incidents in cyberspace based on average influence
CN108694247B (en) A typhoon disaster analysis method based on Weibo topic popularity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: c/o Zhong Zhaoman, School of Computer, Huaihai Institute of Technology, No. 59 Cangwu Road, Haizhou District, Lianyungang City, Jiangsu Province, 222000

Patentee after: Jiangsu Ocean University

Country or region after: China

Address before: c/o Zhong Zhaoman, School of Computer, Huaihai Institute of Technology, No. 59 Cangwu Road, Haizhou District, Lianyungang City, Jiangsu Province, 222000

Patentee before: HUAIHAI INSTITUTE OF TECHNOLOGY

Country or region before: China

TR01 Transfer of patent right

Effective date of registration: 20241010

Address after: Floor 17-2-12, Huaguoshan Avenue, Haizhou District, Lianyungang City, Jiangsu Province, 222000

Patentee after: JIANGSU JINGE NETWORK TECHNOLOGY Co.,Ltd.

Country or region after: China

Address before: c/o Zhong Zhaoman, School of Computer, Huaihai Institute of Technology, No. 59 Cangwu Road, Haizhou District, Lianyungang City, Jiangsu Province, 222000

Patentee before: Jiangsu Ocean University

Country or region before: China
