CN107273496A - Method for detecting regional emergencies in a microblog network - Google Patents

Method for detecting regional emergencies in a microblog network

Info

Publication number
CN107273496A
Authority
CN
China
Prior art keywords
word
microblogging
burst
lmb
ewc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710455550.6A
Other languages
Chinese (zh)
Other versions
CN107273496B (en)
Inventor
仲兆满
管燕
李存华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaihai Institute of Technology
Original Assignee
Huaihai Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaihai Institute of Technology
Priority to CN201710455550.6A
Publication of CN107273496A
Application granted
Publication of CN107273496B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Human Resources & Organizations (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for detecting regional emergencies in a microblog network, comprising the following steps: (1) collect regional microblogs from the microblog network to obtain the microblog set PLMB, and preprocess them to obtain the microblog set LMB; (2) extract burst words from LMB to obtain the burst word set EW; (3) cluster the burst words in EW to obtain the emergency word clusters $EWC=\{ewc_1, ewc_2, \ldots, ewc_q\}$, where q is the number of word clusters. The method computes the burst value of a word from four classes of indices: word frequency, associated users, geographic distribution, and social behavior. It thereby makes more reasonable use of the burst characteristics of microblog words and is better suited to detecting regional emergencies in a microblog network.

Description

Method for detecting regional emergencies in a microblog network
Technical field
The present invention relates to information services, and specifically to a method for detecting regional emergencies in a microblog network.
Background
As a highly real-time and interactive form of social media, microblogging gives users a platform for freely publishing content and exchanging information, and has become the preferred medium for disclosing events, expressing opinions, and sharing experiences. Many real-world events are disclosed on microblogs first and only later reported by traditional mainstream media, for example the 2013 Boston bombing and the death of Margaret Thatcher. Microblog-oriented event detection has become a research hotspot in the event detection field in recent years.
Because much microblog content carries regional information, including the places mentioned in a post, the registered location of the posting user, and the geotags attached to a post, localized event detection on microblogs has become an emerging research direction. This kind of detection rests on a basic assumption: when no event occurs in a local area, users rarely discuss such events, but once one occurs there is a large volume of discussion, as with fires, explosions, floods, traffic accidents, pollution, or the spread of disease in a region. This differs greatly from global event detection in social media, which ignores regional characteristics and works on the entire information stream of the medium. Global detection not only involves a heavy analysis workload but may also miss hot events in a local area, and existing event detection methods are difficult to apply directly to regional event detection.
In the proceedings of the 19th International World Wide Web Conference, published in the United States in 2010, the paper "Earthquake shakes Twitter users: real-time event detection by social sensors" by Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo models each Twitter user as a node in a wireless sensor network. A user posting an earthquake-related message is abstracted as a sensor node reporting the information it has collected; temporal and spatial models of the posts, followed by filtering, are then used to confirm whether an earthquake has occurred. However, this method requires some query terms to be designed manually and is difficult to apply to the detection of unconventional emergencies.
In a Chinese periodical published in 2016 (Modem long jump skill intelligence technology), the article "Microblog event detection and analysis based on geographic coordinates" by Li Jinhua and An Zhongjie builds microblog features from five indices: number of posts, number of reposts, number of comments, user activity, and movement intensity. When detecting microblog emergencies, this method does not comprehensively consider the features of microblog-type social media, such as the burst frequency of words and regional burstiness, and it gives no specific computation methods (such as formal formulas) for the indices.
In the proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, published in the United States in 2016, the paper "GeoBurst: Real-Time Local Event Detection in Geo-Tagged Tweet Streams" by Chao Zhang, Guangyu Zhou, Quan Yuan, Honglei Zhuang, Yu Zheng, Lance Kaplan, Shaowen Wang, and Jiawei Han first identifies some important microblogs in the query window as pivots, and then obtains emergencies by comparing them with historical data in spatial and temporal terms. This method works from the perspective of microblog text. Because microblogs are short and their wording is non-standard, it is difficult to extract effective features directly from individual short microblog texts.
Summary of the invention
The technical problem to be solved by the invention is to provide, in view of the shortcomings of the prior art, a new method for detecting regional emergencies in a microblog network. The method makes more reasonable use of the burst characteristics of microblog words and is better suited to detecting regional emergencies in a microblog network.
The technical problem is solved by the following technical solution. The invention provides a method for detecting regional emergencies in a microblog network, characterized by the following steps:
A. Collect regional microblogs from the microblog network to obtain the microblog set PLMB, and preprocess them to obtain the microblog set LMB;
B. Extract burst words from the microblog set LMB to obtain the burst word set EW;
C. Cluster the burst words in EW to obtain the emergency word clusters $EWC=\{ewc_1, ewc_2, \ldots, ewc_q\}$, where q is the number of word clusters.
In step A, collecting regional microblogs from the microblog network and obtaining the microblog set LMB after preprocessing preferably uses the following specific steps:
A1. Use a collection tool to obtain the microblog set of the region Localized, $PLMB=\{plmb_1, plmb_2, \ldots, plmb_m\}$, where each $plmb_i$ ($1 \le i \le m$) is a regional microblog;
A2. Preprocess the microblog set PLMB: remove link URLs and emoticon information from the microblogs, and remove microblogs shorter than 5 characters, to obtain the preprocessed microblog set $LMB=\{lmb_1, lmb_2, \ldots, lmb_n\}$, where each $lmb_i$ ($1 \le i \le n$) is a regional microblog.
In step B, extracting burst words from the microblog set LMB to obtain the burst word set EW preferably uses the following specific steps:
B1. Segment every microblog $lmb_i$ ($1 \le i \le n$) in LMB into words, remove stop words, and retain nouns, verbs, place names, person names, and proper nouns, to obtain the final word set $LMBW=\{w_1, w_2, \ldots, w_r\}$, where r is the number of words;
B2. Compute the frequency burstiness of word $w_i$ ($1 \le i \le r$). Let k be the time point of the current emergency detection, and take the historical data of the preceding p time points as reference. The frequency burstiness of $w_i$ at time point k is defined as a ratio $F(w_i)$ whose numerator is the frequency with which $w_i$ occurs at time point k; the expression for the denominator is not reproduced in this text;
B3. Compute the associated-user burstiness of word $w_i$ ($1 \le i \le r$). Let k be the time point of the current emergency detection, and take the historical data of the preceding p time points as reference. The associated-user burstiness of $w_i$ at time point k is defined as a ratio $U(u|w_i)$ whose numerator is the number of distinct users mentioning $w_i$ at time point k; the expression for the denominator is not reproduced in this text;
B4. Compute the regional burstiness of word $w_i$ ($1 \le i \le r$). The geographic-distribution burstiness of $w_i$ at time point k is defined as a ratio $GT(gt|w_i)$ whose numerator is the number of distinct geotags of microblogs mentioning $w_i$ at time point k; the expression for the denominator is not reproduced in this text;
B5. Compute the social-behavior burstiness of word $w_i$ ($1 \le i \le r$). The social-behavior burstiness of $w_i$ at time point k is defined as a ratio $SB(sb|w_i)$ whose numerator is the sum of the repost, comment, and read counts of the microblogs mentioning $w_i$ at time point k; the expression for the denominator is not reproduced in this text;
B6. Combine the four burstiness values of steps B2, B3, B4, and B5 to obtain the burst value of word $w_i$ at time point k: $BurstyScore(w_i)=\alpha \cdot F(w_i)+\beta \cdot U(u|w_i)+\chi \cdot GT(gt|w_i)+\delta \cdot SB(sb|w_i)$, where $\alpha, \beta, \chi, \delta$ are adjustment coefficients that weight the four classes of indices, with $\alpha+\beta+\chi+\delta=1$ and $\alpha \ge 0$, $\beta \ge 0$, $\chi \ge 0$, $\delta \ge 0$;
B7. After the burst value of each word has been computed, select n burst words using the quartile deviation to form the burst word set EW. With $Q_1$ and $Q_3$ the first and third quartiles of the burst values, the quartile deviation is $IQS(EW)=Q_3(EW)-Q_1(EW)$. A word whose burst value exceeds a threshold is taken as a burst word, where the threshold is $Maxima(EW)=Q_3(EW)+1.5 \times IQS(EW)$.
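As an illustration of steps B2 to B6, the following Python sketch computes each index as the ratio of the current value to a historical baseline and then forms the weighted sum. The baseline used here (the mean of the preceding p values) is an assumption, because the denominators are not reproduced in this text. The function names and the `eps` smoothing term are also hypothetical.

```python
from statistics import mean


def burst_ratio(history, current, eps=1e-9):
    """Ratio of the value at time point k to a historical baseline.

    ASSUMPTION: the baseline is the mean of the previous p values;
    the source text does not reproduce the actual denominator.
    eps is a hypothetical smoothing term that avoids division by zero.
    """
    baseline = mean(history) if history else 0.0
    return current / (baseline + eps)


def bursty_score(freq, users, geotags, social,
                 weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four burst indices for one word at time point k.

    Each of freq, users, geotags, social is a (history, current) pair.
    weights is (alpha, beta, chi, delta): non-negative and summing to 1.
    """
    alpha, beta, chi, delta = weights
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return (alpha * burst_ratio(*freq)
            + beta * burst_ratio(*users)
            + chi * burst_ratio(*geotags)
            + delta * burst_ratio(*social))
```

With equal weights, a word whose four indices each triple relative to their baseline receives a burst value of about 3.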
In step C, clustering the burst words in EW to obtain the emergency word clusters $EWC=\{ewc_1, ewc_2, \ldots, ewc_q\}$ preferably uses the following specific steps:
C1. Based on the burst word set EW obtained in step B, build the burst word association network $EWN=(V, E)$, where V is the burst word set EW and E represents the association strength between burst words. The association strength of burst words $ew_i$ and $ew_j$ is the number of times the two words co-occur in the same microblog post;
C2. After the burst word association network EWN has been built, cluster EWN using the open-source CLUTO toolkit to obtain the emergency word clusters $EWC=\{ewc_1, ewc_2, \ldots, ewc_q\}$, where q is the number of word clusters.
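The association network of step C1 can be sketched in Python as a dictionary of weighted edges. This is a minimal illustration, not the patented implementation: the function name is hypothetical, posts are assumed to be already segmented into token lists, and a pair is counted at most once per post.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(posts, burst_words):
    """Return {(word_a, word_b): count} for burst words sharing a post.

    posts: iterable of token lists, one list per microblog post.
    burst_words: the burst word set EW.
    Edge keys are alphabetically ordered pairs, so each edge is stored once.
    """
    burst_words = set(burst_words)
    edges = Counter()
    for tokens in posts:
        # Burst words present in this post, deduplicated and ordered.
        present = sorted(burst_words.intersection(tokens))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges
```

The vertex set V is simply `burst_words`, and the counter plays the role of the weighted edge set E.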
Compared with the prior art, the invention proposes indices that make comprehensive use of microblog network features for event detection. It computes the burst value of a word from four classes of indices (word frequency, associated users, geographic distribution, and social behavior), makes more reasonable use of the burst characteristics of microblog words, and is better suited to detecting regional emergencies in a microblog network. It also gives specific computation methods and therefore has considerable practical value.
Brief description of the drawings
Fig. 1 is a flow chart of the method for detecting regional emergencies in a microblog network according to the invention;
Fig. 2 is a flow chart of step 101 in Fig. 1: collecting regional microblogs from the microblog network to obtain the microblog set PLMB, and preprocessing them to obtain the microblog set LMB;
Fig. 3 is a flow chart of step 102 in Fig. 1: extracting burst words from the microblog set LMB to obtain the burst word set EW;
Fig. 4 is a flow chart of step 103 in Fig. 1: clustering the burst words in EW to obtain the emergency word clusters $EWC=\{ewc_1, ewc_2, \ldots, ewc_q\}$.
Detailed description
The implementation of the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a method for detecting regional emergencies in a microblog network comprises the following steps:
Step 101. Collect regional microblogs from the microblog network to obtain the microblog set PLMB, and preprocess them to obtain the microblog set LMB. Referring to Fig. 2, the specific steps are as follows:
Step 201. Use a collection tool to obtain the microblog set of the region Localized, $PLMB=\{plmb_1, plmb_2, \ldots, plmb_m\}$, where each $plmb_i$ ($1 \le i \le m$) is a regional microblog. After microblog application developer permission has been obtained, different API interfaces can be called to get the dynamic microblog information around a given location. Calling the location service interface returns the microblog content, number of reposts, number of comments, number of likes, user information, check-in location, and so on.
Step 202. Preprocess the microblog set PLMB: remove link URLs and emoticon information from the microblogs, and remove microblogs shorter than 5 characters, to obtain the preprocessed microblog set $LMB=\{lmb_1, lmb_2, \ldots, lmb_n\}$, where each $lmb_i$ ($1 \le i \le n$) is a regional microblog. Although the collected regional microblogs have already been screened in a targeted way from the massive volume of microblogs, some noise remains, and it must be filtered out to reduce the complexity of later computation.
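The preprocessing of step 202 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: emoticons are assumed to appear as short bracketed codes such as "[泪]", length is assumed to be measured in characters, and the regular expressions and function name are hypothetical rather than taken from the patent.

```python
import re

# Matches http/https links embedded in a post.
URL_RE = re.compile(r"https?://\S+")
# ASSUMPTION: emoticons appear as short bracketed codes such as "[泪]".
EMOTICON_RE = re.compile(r"\[[^\[\]]{1,8}\]")
# ASSUMPTION: the minimum length of 5 is measured in characters.
MIN_LENGTH = 5


def preprocess(posts):
    """Strip URLs and emoticon codes, then drop posts that are too short."""
    cleaned = []
    for text in posts:
        text = URL_RE.sub("", text)
        text = EMOTICON_RE.sub("", text)
        text = text.strip()
        if len(text) >= MIN_LENGTH:
            cleaned.append(text)
    return cleaned
```

Applied to the raw set PLMB, the returned list corresponds to the preprocessed set LMB.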
Step 102. Extract burst words from the microblog set LMB to obtain the burst word set EW. Referring to Fig. 3, the specific steps are as follows:
Step 301. Segment every microblog $lmb_i$ ($1 \le i \le n$) in LMB into words, remove stop words, and retain nouns, verbs, place names, person names, and proper nouns, to obtain the final word set $LMBW=\{w_1, w_2, \ldots, w_r\}$, where r is the number of words. Because some verbs carry no practical meaning, such as "hold", "carry out", and "meet", these stop verbs are further removed;
Step 302. Compute the frequency burstiness of word $w_i$ ($1 \le i \le r$). Let k be the time point of the current emergency detection, and take the historical data of the preceding p time points as reference. The frequency burstiness of $w_i$ at time point k is defined as a ratio $F(w_i)$ whose numerator is the frequency with which $w_i$ occurs at time point k; the expression for the denominator is not reproduced in this text. The larger $F(w_i)$ is, the greater the growth trend of the frequency of $w_i$ at the current time point k, and the more likely $w_i$ is a burst word;
Step 303. Compute the associated-user burstiness of word $w_i$ ($1 \le i \le r$). Let k be the time point of the current emergency detection, and take the historical data of the preceding p time points as reference. The associated-user burstiness of $w_i$ at time point k is defined as a ratio $U(u|w_i)$ whose numerator is the number of distinct users mentioning $w_i$ at time point k; the expression for the denominator is not reproduced in this text. The larger $U(u|w_i)$ is, the greater the growth trend of the number of users mentioning $w_i$ at time point k, and the more likely $w_i$ is a burst word;
Step 304. Compute the regional burstiness of word $w_i$ ($1 \le i \le r$). The geographic-distribution burstiness of $w_i$ at time point k is defined as a ratio $GT(gt|w_i)$ whose numerator is the number of distinct geotags of microblogs mentioning $w_i$ at time point k; the expression for the denominator is not reproduced in this text. The larger $GT(gt|w_i)$ is, the greater the growth trend of the number of geotags mentioning $w_i$ at time point k, and the more likely $w_i$ is a burst word;
Step 305. Compute the social-behavior burstiness of word $w_i$ ($1 \le i \le r$). The social-behavior burstiness of $w_i$ at time point k is defined as a ratio $SB(sb|w_i)$ whose numerator is the sum of the repost, comment, and read counts of the microblogs mentioning $w_i$ at time point k; the expression for the denominator is not reproduced in this text. The larger $SB(sb|w_i)$ is, the greater the growth trend of the social behavior around $w_i$ at time point k, and the more likely $w_i$ is a burst word;
Step 306. Combine the four burstiness values of the word to obtain the burst value of word $w_i$ at time point k: $BurstyScore(w_i)=\alpha \cdot F(w_i)+\beta \cdot U(u|w_i)+\chi \cdot GT(gt|w_i)+\delta \cdot SB(sb|w_i)$, where $\alpha, \beta, \chi, \delta$ are adjustment coefficients that weight the four classes of indices, with $\alpha+\beta+\chi+\delta=1$ and $\alpha \ge 0$, $\beta \ge 0$, $\chi \ge 0$, $\delta \ge 0$. The larger $BurstyScore(w_i)$ is, the more bursty $w_i$ is at time point k, and the more likely $w_i$ is a burst word;
Step 307. After the burst value of each word has been computed, select n burst words using the quartile deviation to form the burst word set EW. With $Q_1$ and $Q_3$ the first and third quartiles of the burst values, the quartile deviation is $IQS(EW)=Q_3(EW)-Q_1(EW)$. A word whose burst value exceeds a threshold is taken as a burst word, where the threshold is $Maxima(EW)=Q_3(EW)+1.5 \times IQS(EW)$.
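The quartile rule of step 307 can be sketched in Python with the standard library. The function name is hypothetical, and the quartile convention (the "inclusive" method of `statistics.quantiles`) is an assumption, since the text does not say how the quartiles are interpolated.

```python
from statistics import quantiles


def select_burst_words(scores):
    """Return the words whose burst value exceeds Q3 + 1.5 * (Q3 - Q1).

    scores: dict mapping each word to its burst value.
    ASSUMPTION: quartiles use the "inclusive" interpolation method.
    """
    values = list(scores.values())
    if len(values) < 2:
        return set()  # quartiles are undefined for fewer than two values
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqs = q3 - q1                 # quartile deviation IQS(EW)
    threshold = q3 + 1.5 * iqs    # Maxima(EW)
    return {word for word, score in scores.items() if score > threshold}
```

Because the threshold is derived from the score distribution itself, the number n of selected burst words varies from one time point to another.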
Step 103. Cluster the burst words in EW to obtain the emergency word clusters $EWC=\{ewc_1, ewc_2, \ldots, ewc_q\}$. Referring to Fig. 4, the specific steps are as follows:
Step 401. Based on the burst word set EW, build the burst word association network $EWN=(V, E)$, where V is the burst word set EW and E represents the association strength between burst words. The association strength of burst words $ew_i$ and $ew_j$ is the number of times the two words co-occur in the same microblog post;
Step 402. After the burst word association network EWN has been built, cluster EWN using the open-source CLUTO toolkit to obtain the emergency word clusters $EWC=\{ewc_1, ewc_2, \ldots, ewc_q\}$, where q is the number of word clusters. CLUTO provides three kinds of clustering algorithms, which can cluster either directly in the feature space of the objects or in the similarity space of the objects. These algorithms are partitional, agglomerative, and graph-partitioning based. In practice, agglomerative hierarchical clustering is the more commonly used, so the invention selects the agglomerative hierarchical clustering method.
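To make the agglomerative idea of step 402 concrete, the following Python sketch merges burst words bottom-up using average-link similarity over co-occurrence counts. It is a stand-in written for illustration and does not call or reproduce CLUTO. The function name, the average-link criterion, and the use of raw co-occurrence counts as similarity are all assumptions.

```python
from itertools import combinations


def agglomerative_clusters(words, weights, num_clusters):
    """Average-link agglomerative clustering on a co-occurrence graph.

    words: iterable of burst words (the vertex set V).
    weights: dict {(a, b): co-occurrence count}; missing pairs count as 0.
    num_clusters: the number of clusters q at which merging stops.
    """
    def sim(a, b):
        # Edges may be stored in either orientation.
        return weights.get((a, b), weights.get((b, a), 0))

    def linkage(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    # Start with one singleton cluster per word.
    clusters = [frozenset([w]) for w in sorted(set(words))]
    while len(clusters) > max(num_clusters, 1):
        # Merge the pair of clusters with the highest average similarity.
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return [set(c) for c in clusters]
```

Each returned set corresponds to one emergency word cluster $ewc_j$. The loop is cubic in the number of words, which is acceptable only because the burst word set is small.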
Comparative example: three different methods for detecting regional emergencies in a microblog network are used to compare the effectiveness of regional emergency detection. The three methods are as follows:
(1) Method 1, HBED: select the hashtags contained in microblogs and represent each hashtag as a vector. Word weights are computed with TF-IDF, and the change in the number of microblogs a cluster contains is considered when computing the heat of the cluster.
(2) Method 2, GeoBurst: first identify some important microblogs in the query window as pivots, then obtain emergencies by comparing them with historical data in spatial and temporal terms. Emergencies are ranked by the temporal and spatial burstiness of the words in each word cluster. Four main parameters are set: kernel bandwidth h = 0.01, restart probability α = 0.2, random-walk similarity threshold δ = 0.02, and the parameter balancing temporal and spatial burstiness η = 0.5.
(3) Method 3, LocTBED: the method proposed by the invention. Word burstiness is computed as described above, and clustering uses the agglomerative method bagglo provided by CLUTO, with the number of clusters set to 10 and the cluster similarity function set to the cosine function. When word burst values are computed, the historical time window of a word is set to one week (7 days), and the weights used when summing the four classes of indices are α = β = χ = δ = 0.25.
Taking a real social medium, Sina Weibo, as an example, the invention collected geotagged microblogs from two cities, Beijing and Lianyungang in Jiangsu Province. The Beijing data were collected from December 1 to December 30, 2016 (one month of data), giving 346,863 geotagged microblogs in total. The Lianyungang data were collected from May 1 to October 31, 2016 (half a year of data), giving 63,744 geotagged microblogs in total. The effectiveness of the event detection methods is verified on a daily basis, that is, by detecting the regional emergencies of a specified day.
Because the daily regional emergencies of each city are unknown, the precision P@n is used as the evaluation metric, following current mainstream research practice. For the Top-k emergencies detected each day, human judges decide whether each detected item is a regional emergency. Since the number of Top-k events is small, the manual evaluation workload is modest.
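The P@n metric can be sketched in Python as follows. This is the standard definition of precision at n, shown for illustration; the function name and the representation of the manual judgments as a list of booleans are assumptions.

```python
def precision_at_n(judgments, n):
    """Fraction of the top-n ranked detections judged to be real emergencies.

    judgments: manual labels for the ranked detections of one day, in rank
    order, where True means the item is a regional emergency.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    top = judgments[:n]
    # Missing ranks (fewer than n detections) count as misses.
    return sum(top) / n
```

For a day whose five ranked detections were judged `[True, False, True, True, False]`, P@5 is 3/5.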
The results of the three methods on the five evaluation metrics are shown in Table 1.
Table 1. Detection results of the three methods on the five evaluation metrics
Method      P@1    P@2    P@3    P@4    P@5    Average
HBED        0.20   0.30   0.20   0.30   0.24   0.24
GeoBurst    0.80   0.70   0.80   0.75   0.72   0.72
LocTBED     0.80   0.80   0.87   0.80   0.76   0.76
Comparing the three methods, the LocTBED method proposed here performs best, with an average of 0.76 over the five evaluation metrics. GeoBurst is next, with an average of 0.72. Although the values obtained by the two methods are close, the rankings of emergencies in their detection results differ considerably. When computing the heat of an emergency cluster, LocTBED takes into account the number of regional words the cluster contains, which helps significantly in detecting regional emergencies.
HBED performs poorly, mainly because few of the collected geotagged microblogs contain hashtags, and those that do mostly concern events with wide geographic coverage, so the method is not suited to detecting regional events.
The method of the invention is not limited to the embodiments described in the detailed description. Other embodiments derived by those skilled in the art from the technical solution of the invention also fall within the scope of the technical innovation of the invention.

Claims (4)

1. A method for detecting regional emergencies in a microblog network, characterized by the following steps:
A. Collect regional microblogs from the microblog network to obtain the microblog set PLMB, and preprocess them to obtain the microblog set LMB;
B. Extract burst words from the microblog set LMB to obtain the burst word set EW;
C. Cluster the burst words in EW to obtain the emergency word clusters $EWC=\{ewc_1, ewc_2, \ldots, ewc_q\}$, where q is the number of word clusters.
2. The method for detecting regional emergencies in a microblog network according to claim 1, characterized in that step A comprises the following specific steps:
A1. Use a collection tool to obtain the microblog set of the region Localized, $PLMB=\{plmb_1, plmb_2, \ldots, plmb_m\}$, where each $plmb_i$ ($1 \le i \le m$) is a regional microblog;
A2. Preprocess the microblog set PLMB: remove link URLs and emoticon information from the microblogs, and remove microblogs shorter than 5 characters, to obtain the preprocessed microblog set $LMB=\{lmb_1, lmb_2, \ldots, lmb_n\}$, where each $lmb_i$ ($1 \le i \le n$) is a regional microblog.
3. The method for detecting regional emergencies in a microblog network according to claim 1, characterized in that step B comprises the following specific steps:
B1. Segment every microblog $lmb_i$ ($1 \le i \le n$) in LMB into words, remove stop words, and retain nouns, verbs, place names, person names, and proper nouns, to obtain the final word set $LMBW=\{w_1, w_2, \ldots, w_r\}$, where r is the number of words;
B2. Compute the frequency burstiness of word $w_i$ ($1 \le i \le r$). Let k be the time point of the current emergency detection, and take the historical data of the preceding p time points as reference. The frequency burstiness of $w_i$ at time point k is defined as a ratio $F(w_i)$ whose numerator is the frequency with which $w_i$ occurs at time point k; the expression for the denominator is not reproduced in this text;
B3. Compute the associated-user burstiness of word $w_i$ ($1 \le i \le r$). Let k be the time point of the current emergency detection, and take the historical data of the preceding p time points as reference. The associated-user burstiness of $w_i$ at time point k is defined as a ratio $U(u|w_i)$ whose numerator is the number of distinct users mentioning $w_i$ at time point k; the expression for the denominator is not reproduced in this text;
B4. Compute the regional burstiness of word $w_i$ ($1 \le i \le r$). The geographic-distribution burstiness of $w_i$ at time point k is defined as a ratio $GT(gt|w_i)$ whose numerator is the number of distinct geotags of microblogs mentioning $w_i$ at time point k; the expression for the denominator is not reproduced in this text;
B5. Compute the social-behavior burstiness of word $w_i$ ($1 \le i \le r$). The social-behavior burstiness of $w_i$ at time point k is defined as a ratio $SB(sb|w_i)$ whose numerator is the sum of the repost, comment, and read counts of the microblogs mentioning $w_i$ at time point k; the expression for the denominator is not reproduced in this text;
B6. Combine the four burstiness values of steps B2, B3, B4, and B5 to obtain the burst value of word $w_i$ at time point k: $BurstyScore(w_i)=\alpha \cdot F(w_i)+\beta \cdot U(u|w_i)+\chi \cdot GT(gt|w_i)+\delta \cdot SB(sb|w_i)$, where $\alpha, \beta, \chi, \delta$ are adjustment coefficients that weight the four classes of indices, with $\alpha+\beta+\chi+\delta=1$ and $\alpha \ge 0$, $\beta \ge 0$, $\chi \ge 0$, $\delta \ge 0$;
B7. After the burst value of each word has been computed, select n burst words using the quartile deviation to form the burst word set EW; the quartile deviation is computed as $IQS(EW)=Q_3(EW)-Q_1(EW)$; a word whose burst value exceeds a threshold is taken as a burst word, where the threshold is $Maxima(EW)=Q_3(EW)+1.5 \times IQS(EW)$.
4. The method for detecting regional emergencies in a microblog network according to claim 1, characterized in that step C comprises the following specific steps:
C1. Based on the burst word set EW obtained in step B, build the burst word association network $EWN=(V, E)$, where V is the burst word set EW and E represents the association strength between burst words; the association strength of burst words $ew_i$ and $ew_j$ is the number of times the two words co-occur in the same microblog post;
C2. After the burst word association network EWN has been built, cluster EWN using the open-source CLUTO toolkit to obtain the emergency word clusters $EWC=\{ewc_1, ewc_2, \ldots, ewc_q\}$, where q is the number of word clusters.
CN201710455550.6A 2017-06-15 2017-06-15 Method for detecting microblog network region emergency Active CN107273496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710455550.6A CN107273496B (en) 2017-06-15 2017-06-15 Method for detecting microblog network region emergency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710455550.6A CN107273496B (en) 2017-06-15 2017-06-15 Method for detecting microblog network region emergency

Publications (2)

Publication Number Publication Date
CN107273496A true CN107273496A (en) 2017-10-20
CN107273496B CN107273496B (en) 2020-07-28

Family

ID=60067208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710455550.6A Active CN107273496B (en) 2017-06-15 2017-06-15 Method for detecting microblog network region emergency

Country Status (1)

Country Link
CN (1) CN107273496B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108733791A (en) * 2018-05-11 2018-11-02 北京科技大学 network event detection method
CN109509110A (en) * 2018-07-27 2019-03-22 福州大学 Method is found based on the hot microblog topic for improving BBTM model
CN110502703A (en) * 2019-07-12 2019-11-26 北京邮电大学 Social networks incident detection method based on character string dictionary building
CN111475732A (en) * 2020-04-13 2020-07-31 腾讯科技(深圳)有限公司 Information processing method and device
CN112257429A (en) * 2020-10-16 2021-01-22 北京工商大学 BERT-BTM network-based microblog emergency detection method
CN112528024A (en) * 2020-12-15 2021-03-19 哈尔滨工程大学 Microblog emergency detection method based on multi-feature fusion
CN112527960A (en) * 2020-12-17 2021-03-19 华东师范大学 Emergency detection method based on keyword clustering
CN112948587A (en) * 2021-03-30 2021-06-11 杭州叙简科技股份有限公司 Microblog public opinion analysis method and device based on earthquake industry and electronic equipment
CN114461763A (en) * 2022-04-13 2022-05-10 南京众智维信息科技有限公司 Network security event extraction method based on burst word clustering

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216954A (en) * 2014-08-20 2014-12-17 北京邮电大学 Prediction device and prediction method for state of emergency topic
CN104281608A (en) * 2013-07-08 2015-01-14 上海锐英软件技术有限公司 Emergency analyzing method based on microblogs
US20150186378A1 (en) * 2013-12-30 2015-07-02 International Business Machines Corporation System for identifying, monitoring and ranking incidents from social media
CN106294333A (en) * 2015-05-11 2017-01-04 国家计算机网络与信息安全管理中心 A kind of microblogging burst topic detection method and device
US20170024412A1 (en) * 2015-07-17 2017-01-26 Environmental Systems Research Institute (ESRI) Geo-event processor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张雄宝等: "基于突发词地域分析的微博突发事件检测方法", 《情报杂志》 *
郭跇秀等: "基于突发词聚类的微博突发事件检测方法", 《计算机应用》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108733791A (en) * 2018-05-11 2018-11-02 北京科技大学 network event detection method
CN108733791B (en) * 2018-05-11 2020-11-20 北京科技大学 Network event detection method
CN109509110A (en) * 2018-07-27 2019-03-22 福州大学 Method is found based on the hot microblog topic for improving BBTM model
CN109509110B (en) * 2018-07-27 2021-08-31 福州大学 Microblog hot topic discovery method based on improved BBTM model
CN110502703A (en) * 2019-07-12 2019-11-26 北京邮电大学 Social networks incident detection method based on character string dictionary building
CN111475732A (en) * 2020-04-13 2020-07-31 腾讯科技(深圳)有限公司 Information processing method and device
CN112257429A (en) * 2020-10-16 2021-01-22 北京工商大学 BERT-BTM network-based microblog emergency detection method
CN112257429B (en) * 2020-10-16 2024-04-16 北京工商大学 Microblog emergency detection method based on BERT-BTM network
CN112528024A (en) * 2020-12-15 2021-03-19 哈尔滨工程大学 Microblog emergency detection method based on multi-feature fusion
CN112527960A (en) * 2020-12-17 2021-03-19 华东师范大学 Emergency detection method based on keyword clustering
CN112948587A (en) * 2021-03-30 2021-06-11 杭州叙简科技股份有限公司 Microblog public opinion analysis method and device based on earthquake industry and electronic equipment
CN114461763A (en) * 2022-04-13 2022-05-10 南京众智维信息科技有限公司 Network security event extraction method based on burst word clustering

Also Published As

Publication number Publication date
CN107273496B (en) 2020-07-28

Similar Documents

Publication Publication Date Title
CN107273496A (en) A kind of detection method of micro blog network region accident
Xu et al. Understanding mobile traffic patterns of large scale cellular towers in urban environment
Lv et al. Social media based transportation research: The state of the work and the networking
Gao et al. A comparative study of users’ microblogging behavior on Sina Weibo and Twitter
De Choudhury et al. How does the data sampling strategy impact the discovery of information diffusion in social media?
Sankaranarayanan et al. Twitterstand: news in tweets
Lee et al. A novel approach for event detection by mining spatio-temporal information on microblogs
CN103617169B (en) A kind of hot microblog topic extracting method based on Hadoop
US20130304818A1 (en) Systems and methods for discovery of related terms for social media content collection over social networks
US20130297581A1 (en) Systems and methods for customized filtering and analysis of social media content collected over social networks
CN105630884B (en) A kind of geographical location discovery method of microblog hot event
CN103345524B (en) Method and system for detecting microblog hot topics
CN105224593B (en) Frequent co-occurrence account method for digging in the of short duration online affairs of one kind
CN113454954A (en) Real-time event detection on social data streams
CN104166726B (en) A kind of burst keyword detection method towards microblogging text flow
Cacho et al. Social smart destination: a platform to analyze user generated content in smart tourism destinations
Williams et al. Improving geolocation of social media posts
Farseev et al. bbridge: A big data platform for social multimedia analytics
Gao et al. A novel method for geographical social event detection in social media
Jendryke et al. Big location‐based social media messages from China's Sina Weibo network: Collection, storage, visualization, and potential ways of analysis
CN104281646B (en) Urban waterlogging detection method based on microblog data
Chong et al. Fine-grained geolocation of tweets in temporal proximity
Kim et al. TwitterTrends: a spatio-temporal trend detection and related keywords recommendation scheme
Stojanovski et al. Social networks VGI: Twitter sentiment analysis of social hotspots
Ruhela et al. Towards the use of online social networks for efficient internet content distribution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant