CN107273496A - Method for detecting regional emergencies in a microblog network - Google Patents
- Publication number: CN107273496A
- Application number: CN201710455550.6A
- Authority: CN (China)
- Prior art keywords: word, microblog, burst, lmb, ewc
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention discloses a method for detecting regional emergencies in a microblog network, comprising the following steps: (1) collect regional microblogs from the microblog network to obtain the microblog set PLMB, and preprocess them to obtain the microblog set LMB; (2) extract burst words from LMB to obtain the burst word set EW; (3) cluster the burst words in EW to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, where q is the number of word clusters. The method computes the burst value of a word from four classes of indicators: word frequency, associated users, geographic distribution, and social behavior. It thereby makes more reasonable use of the burst characteristics of words in a microblog network and is better suited to detecting regional emergencies.
Description
Technical field
The present invention relates to an information service, and specifically to a method for detecting regional emergencies in a microblog network.
Background
Microblogs, as a highly real-time and interactive form of social media, give users a platform for freely publishing content and exchanging information, and have become a preferred medium for disclosing events, expressing opinions, and sharing experiences. Many real-world events are first disclosed on microblogs and only later reported by traditional mainstream media, for example the 2013 Boston bombing and the death of Mrs. Thatcher. Microblog-oriented event detection has therefore become a research hotspot in the field of event detection.
Much microblog content carries location information, including the places mentioned in a post, the registered location of the posting user, and the geotags attached to the post. Localized event detection for microblogs has therefore become an emerging research direction. This kind of detection rests on one basic assumption: when no event is occurring in a local area, users rarely discuss such events, but once one occurs a large volume of discussion follows. Examples include local fires, explosions, floods, traffic accidents, pollution, and the spread of disease. This is very different from global event detection in social media, which ignores regional characteristics and faces the entire information stream of the medium. Global detection not only requires a large amount of analysis but may also miss hot events in a local area, so existing event detection methods are difficult to apply directly to regional event detection.
A paper in the proceedings of the 19th International World Wide Web Conference, published in the United States in 2010, titled "Earthquake shakes Twitter users: real-time event detection by social sensors" by Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo, models each Twitter user as a node in a wireless sensor network. A user posting an earthquake-related message is abstracted as a sensor node reporting the information it has collected. Temporal and spatial models of the posts, together with subsequent filtering, are then used to confirm whether an earthquake has occurred. However, this method requires some query terms to be designed by hand, which makes it difficult to apply to the detection of unconventional emergencies.
An article published in China in 2016 in the journal New Technology of Library and Information Service, titled "Microblog event detection and analysis based on geographic coordinates" by Li Jinhua and An Zhongjie, builds microblog features from five indicators: number of posts, number of reposts, number of comments, user activity, and movement intensity. When detecting microblog emergencies, this method does not consider the characteristics of microblog-type social media comprehensively (it omits, for example, the burst frequency of words and their regional burstiness), and it gives no concrete computation, such as formal formulas, for each indicator.
A paper in the proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, published in the United States in 2016, titled "GeoBurst: Real-Time Local Event Detection in Geo-Tagged Tweet Streams" by Chao Zhang, Guangyu Zhou, Quan Yuan, Honglei Zhuang, Yu Zheng, Lance Kaplan, Shaowen Wang, and Jiawei Han, first identifies some important microblogs in the query window as pivots, and then obtains emergencies by comparing them with historical data in both space and time. This method works from the text of the microblogs. Because microblogs are short and their wording is non-standard, it is difficult to extract effective features directly from a few short posts.
Summary of the invention
The technical problem to be solved by the invention is to provide, in view of the shortcomings of the prior art, a new method for detecting regional emergencies in a microblog network. The method makes more reasonable use of the burst characteristics of words in a microblog network and is better suited to detecting regional emergencies.
This technical problem is solved by the following technical solution. The invention provides a method for detecting regional emergencies in a microblog network, characterized by the following steps:
A. Collect regional microblogs from the microblog network to obtain the microblog set PLMB, and preprocess them to obtain the microblog set LMB;
B. Extract burst words from the microblog set LMB to obtain the burst word set EW;
C. Cluster the burst words in EW to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, where q is the number of word clusters.
In step A of the method, the regional microblogs are preferably collected and preprocessed as follows:
A1. Use a collection tool to obtain the set of microblogs for the region Localized, PLMB = {plmb_1, plmb_2, …, plmb_m}, where each plmb_i (1 ≤ i ≤ m) is a regional microblog;
A2. Preprocess the microblog set PLMB: remove links (URLs) and emoticons from the microblogs and discard microblogs shorter than 5 words, giving the preprocessed set LMB = {lmb_1, lmb_2, …, lmb_n}, where each lmb_i (1 ≤ i ≤ n) is a regional microblog.
In step B of the method, burst words are preferably extracted from the microblog set LMB to obtain the burst word set EW as follows:
B1. Segment every microblog lmb_i (1 ≤ i ≤ n) in LMB into words, remove stop words, and keep nouns, verbs, place names, personal names, and proper nouns, giving the final word set LMBW = {w_1, w_2, …, w_r}, where r is the number of words;
B2. Compute the frequency burstiness of each word w_i (1 ≤ i ≤ r). Let k be the time point of the current emergency detection and take the historical data of the preceding p time points as the reference. The frequency burstiness F(w_i) of w_i at time point k is defined as a ratio [formula not reproduced in this text] whose numerator is the frequency with which w_i occurs at time point k and whose denominator [expression not reproduced] is derived from the historical reference data;
B3. Compute the associated-user burstiness of each word w_i (1 ≤ i ≤ r). Let k be the time point of the current emergency detection and take the historical data of the preceding p time points as the reference. The associated-user burstiness U(u|w_i) of w_i at time point k is defined as a ratio [formula not reproduced in this text] whose numerator is the number of distinct users who mention w_i at time point k and whose denominator [expression not reproduced] is derived from the historical reference data;
B4. Compute the regional burstiness of each word w_i (1 ≤ i ≤ r). The geographic-distribution burstiness GT(gt|w_i) of w_i at time point k is defined as a ratio [formula not reproduced in this text] whose numerator is the number of distinct geotags on microblogs that mention w_i at time point k and whose denominator [expression not reproduced] is derived from the historical reference data;
B5. Compute the social-behavior burstiness of each word w_i (1 ≤ i ≤ r). The social-behavior burstiness SB(sb|w_i) of w_i at time point k is defined as a ratio [formula not reproduced in this text] whose numerator is the sum of the numbers of reposts, comments, and reads of the microblogs that mention w_i at time point k and whose denominator [expression not reproduced] is derived from the historical reference data;
B6. Combine the four burstiness values of steps B2, B3, B4, and B5 to obtain the burst value of word w_i at time point k: BurstyScore(w_i) = α·F(w_i) + β·U(u|w_i) + χ·GT(gt|w_i) + δ·SB(sb|w_i), where α, β, χ, and δ are adjustment coefficients that weight the four classes of indicators, with α + β + χ + δ = 1, α ≥ 0, β ≥ 0, χ ≥ 0, δ ≥ 0;
B7. After the burst value of every word has been computed, select n burst words using the interquartile range to form the burst word set EW. The interquartile range is computed as IQS(EW) = Q_3(EW) - Q_1(EW). A word is taken as a burst word when its burst value exceeds a threshold, computed as Maxima(EW) = Q_3(EW) + 1.5 × IQS(EW).
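The quartile rule of step B7 can be sketched in a few lines of Python. This is only an illustration: the function name `select_burst_words` and the dictionary input are hypothetical, and the choice of the "inclusive" quartile interpolation is an assumption, since the text does not say how Q_1 and Q_3 are interpolated.

```python
import statistics


def select_burst_words(burst_scores):
    """Return the words whose burst value exceeds Q3 + 1.5 * IQS.

    burst_scores: dict mapping each word to its BurstyScore (hypothetical input).
    """
    values = list(burst_scores.values())
    # ASSUMPTION: "inclusive" interpolation; the patent does not specify one.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqs = q3 - q1                      # IQS(EW) = Q3(EW) - Q1(EW)
    threshold = q3 + 1.5 * iqs         # Maxima(EW) = Q3(EW) + 1.5 * IQS(EW)
    return {word for word, score in burst_scores.items() if score > threshold}
```

With this rule the number n of selected burst words is not fixed in advance; it is whatever number of words lies above the threshold.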
In step C of the method, the burst words in EW are preferably clustered to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q} as follows:
C1. Based on the burst word set EW obtained in step B, build the burst word association network EWN = (V, E), where V is the burst word set EW and E represents the association strength between burst words. The association strength of two burst words ew_i and ew_j is the number of times the two words co-occur in the same microblog post;
C2. Once the burst word association network EWN has been built, cluster it with the open-source CLUTO toolkit to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, where q is the number of word clusters.
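As a minimal sketch of the association network of step C1, the co-occurrence counts can be accumulated as below. The function name `build_ewn` and the input layout (one list of segmented words per microblog) are assumptions made for illustration; the patent does not prescribe a data structure.

```python
from collections import Counter
from itertools import combinations


def build_ewn(microblog_words, burst_words):
    """Build EWN = (V, E) from segmented microblogs.

    microblog_words: list of word lists, one per microblog (hypothetical layout).
    burst_words: the burst word set EW.
    Returns (V, E), where E maps an alphabetically ordered word pair to the
    number of microblogs in which both words occur.
    """
    vertices = set(burst_words)
    edges = Counter()
    for words in microblog_words:
        # Count each pair at most once per microblog.
        present = sorted(vertices.intersection(words))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return vertices, edges
```

Pairs that never co-occur are simply absent from `edges`, which a `Counter` reports as strength 0.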
Compared with the prior art, the invention proposes indicators for event detection that use the features of a microblog network comprehensively. It computes the burst value of a word from four classes of indicators (word frequency, associated users, geographic distribution, and social behavior), which makes more reasonable use of the burst characteristics of microblog words and is better suited to detecting regional emergencies. It also gives concrete computation methods and therefore has considerable practical value.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention for detecting regional emergencies in a microblog network;
Fig. 2 is a flow chart of step 101 in Fig. 1: collecting regional microblogs from the microblog network to obtain the microblog set PLMB, and preprocessing them to obtain the microblog set LMB;
Fig. 3 is a flow chart of step 102 in Fig. 1: extracting burst words from the microblog set LMB to obtain the burst word set EW;
Fig. 4 is a flow chart of step 103 in Fig. 1: clustering the burst words in EW to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}.
Detailed description
The implementation of the invention is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, a method for detecting regional emergencies in a microblog network comprises the following steps:
Step 101. Collect regional microblogs from the microblog network to obtain the microblog set PLMB, and preprocess them to obtain the microblog set LMB. Referring to Fig. 2, the specific steps are as follows:
Step 201. Use a collection tool to obtain the set of microblogs for the region Localized, PLMB = {plmb_1, plmb_2, …, plmb_m}, where each plmb_i (1 ≤ i ≤ m) is a regional microblog. After obtaining microblog application developer permissions, the different interfaces of the API can be called to retrieve the live microblogs around a given location. Calling the location service interface returns the microblog content, number of reposts, number of comments, number of likes, user information, check-in place, and so on.
Step 202. Preprocess the microblog set PLMB: remove links (URLs) and emoticons from the microblogs and discard microblogs shorter than 5 words, giving the preprocessed set LMB = {lmb_1, lmb_2, …, lmb_n}, where each lmb_i (1 ≤ i ≤ n) is a regional microblog. Although the collected regional microblogs have already been selected in a targeted way from the mass of microblogs, they still contain some noise, which must be filtered out to reduce the complexity of later computation.
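The preprocessing of step 202 can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the regular expressions are assumptions (the text does not say how links or emoticons are encoded; bracketed codes such as "[哈哈]" are assumed here), and length is measured in characters, which is also an assumption.

```python
import re

# ASSUMPTION: links are plain http(s) URLs.
URL_RE = re.compile(r"https?://\S+")
# ASSUMPTION: emoticons appear as short bracketed codes such as "[哈哈]".
EMOTICON_RE = re.compile(r"\[[^\[\]\s]{1,8}\]")
MIN_LENGTH = 5  # shorter microblogs are discarded; counted in characters here


def preprocess(plmb):
    """Turn the raw microblog set PLMB into the cleaned set LMB."""
    lmb = []
    for text in plmb:
        cleaned = URL_RE.sub("", text)
        cleaned = EMOTICON_RE.sub("", cleaned).strip()
        if len(cleaned) >= MIN_LENGTH:
            lmb.append(cleaned)
    return lmb
```

Filtering on the cleaned text rather than the raw text means a post consisting only of a link and an emoticon is dropped.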
Step 102. Extract burst words from the microblog set LMB to obtain the burst word set EW. Referring to Fig. 3, the specific steps are as follows:
Step 301. Segment every microblog lmb_i (1 ≤ i ≤ n) in LMB into words, remove stop words, and keep nouns, verbs, place names, personal names, and proper nouns, giving the final word set LMBW = {w_1, w_2, …, w_r}, where r is the number of words. Because some verbs carry no practical meaning, such as "hold", "carry out", and "meeting", these stop verbs are also removed;
Step 302. Compute the frequency burstiness of each word w_i (1 ≤ i ≤ r). Let k be the time point of the current emergency detection and take the historical data of the preceding p time points as the reference. The frequency burstiness F(w_i) of w_i at time point k is defined as a ratio [formula not reproduced in this text] whose numerator is the frequency with which w_i occurs at time point k and whose denominator [expression not reproduced] is derived from the historical reference data. The larger F(w_i) is, the stronger the growth trend of the frequency of w_i at the current time point k, and the more likely w_i is a burst word;
Step 303. Compute the associated-user burstiness of each word w_i (1 ≤ i ≤ r). Let k be the time point of the current emergency detection and take the historical data of the preceding p time points as the reference. The associated-user burstiness U(u|w_i) of w_i at time point k is defined as a ratio [formula not reproduced in this text] whose numerator is the number of distinct users who mention w_i at time point k and whose denominator [expression not reproduced] is derived from the historical reference data. The larger U(u|w_i) is, the stronger the growth trend of the number of users mentioning w_i at time point k, and the more likely w_i is a burst word;
Step 304. Compute the regional burstiness of each word w_i (1 ≤ i ≤ r). The geographic-distribution burstiness GT(gt|w_i) of w_i at time point k is defined as a ratio [formula not reproduced in this text] whose numerator is the number of distinct geotags on microblogs that mention w_i at time point k and whose denominator [expression not reproduced] is derived from the historical reference data. The larger GT(gt|w_i) is, the stronger the growth trend of the number of geotags mentioning w_i at time point k, and the more likely w_i is a burst word;
Step 305. Compute the social-behavior burstiness of each word w_i (1 ≤ i ≤ r). The social-behavior burstiness SB(sb|w_i) of w_i at time point k is defined as a ratio [formula not reproduced in this text] whose numerator is the sum of the numbers of reposts, comments, and reads of the microblogs that mention w_i at time point k and whose denominator [expression not reproduced] is derived from the historical reference data. The larger SB(sb|w_i) is, the stronger the growth trend of social behavior around w_i at time point k, and the more likely w_i is a burst word;
Step 306. Combine the four burstiness values above to obtain the burst value of word w_i at time point k: BurstyScore(w_i) = α·F(w_i) + β·U(u|w_i) + χ·GT(gt|w_i) + δ·SB(sb|w_i), where α, β, χ, and δ are adjustment coefficients that weight the four classes of indicators, with α + β + χ + δ = 1, α ≥ 0, β ≥ 0, χ ≥ 0, δ ≥ 0. The larger BurstyScore(w_i) is, the greater the burstiness of w_i at time point k, and the more likely w_i is a burst word;
Step 307. After the burst value of every word has been computed, select n burst words using the interquartile range to form the burst word set EW. The interquartile range is computed as IQS(EW) = Q_3(EW) - Q_1(EW). A word is taken as a burst word when its burst value exceeds a threshold, computed as Maxima(EW) = Q_3(EW) + 1.5 × IQS(EW).
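Steps 302 to 306 can be illustrated with the sketch below. The exact denominators of the four burstiness ratios are not available in this text, so the sketch assumes the mean of the indicator over the p reference time points, plus one for smoothing; both choices are assumptions, as are the function names `burst_ratio` and `bursty_score`. The weighted sum follows the BurstyScore formula of step 306.

```python
from statistics import mean


def burst_ratio(series):
    """Burstiness of one indicator for one word.

    series: the indicator's values at the p reference time points, followed by
    its value at the current time point k (the last element).
    ASSUMPTION: the denominator is the historical mean plus one (smoothing);
    the patent's own denominator is not reproduced in the text.
    """
    *history, current = series
    baseline = mean(history) if history else 0.0
    return current / (baseline + 1.0)


def bursty_score(freq, users, geotags, social, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four burstiness values (step 306)."""
    alpha, beta, chi, delta = weights
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return (alpha * burst_ratio(freq)
            + beta * burst_ratio(users)
            + chi * burst_ratio(geotags)
            + delta * burst_ratio(social))
```

For a word whose four indicators were all 1 on each of the previous seven days and jump to 10, 8, 4, and 40 today, each ratio is the current value divided by 2, and equal weights give a score of 0.25 × (5 + 4 + 2 + 20) = 7.75.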
Step 103. Cluster the burst words in EW to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}. Referring to Fig. 4, the specific steps are as follows:
Step 401. Based on the burst word set EW, build the burst word association network EWN = (V, E), where V is the burst word set EW and E represents the association strength between burst words. The association strength of two burst words ew_i and ew_j is the number of times the two words co-occur in the same microblog post;
Step 402. Once the burst word association network EWN has been built, cluster it with the open-source CLUTO toolkit to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, where q is the number of word clusters. CLUTO provides three kinds of clustering algorithms, which can cluster either directly in the feature space of the objects or in their similarity space. These algorithms are partition-based, agglomeration-based, and graph-partitioning-based. In practice, agglomerative hierarchical clustering is used more often, so the invention selects the agglomerative hierarchical clustering method.
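To make the agglomerative idea of step 402 concrete, the following sketch merges the two most strongly associated clusters until q clusters remain. It is a plain average-linkage stand-in written for illustration, not CLUTO's algorithm or API; the function name `agglomerative` and the input layout are hypothetical.

```python
from itertools import combinations


def agglomerative(words, strength, q):
    """Average-linkage agglomerative clustering of burst words.

    words: iterable of burst words (the vertex set V).
    strength: dict mapping frozenset({a, b}) to the association strength of
        a and b; missing pairs count as 0 (hypothetical layout).
    q: desired number of clusters (at least 1).
    """
    q = max(q, 1)
    clusters = [frozenset([w]) for w in sorted(words)]

    def linkage(c1, c2):
        # Mean pairwise association strength between the two clusters.
        total = sum(strength.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > q:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters
```

The pairwise search makes each merge quadratic in the number of clusters, which is acceptable for the small burst word sets that survive the threshold but would need a priority queue for larger vocabularies.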
Comparative example: three different methods for detecting regional emergencies in a microblog network are compared for detection effectiveness. The three methods are as follows:
(1) Method 1, HBED: select the hashtags contained in the microblogs and represent each hashtag as a vector. Word weights are computed with TF-IDF, and the change in the number of microblogs a cluster contains is taken into account when computing the heat of the cluster.
(2) Method 2, GeoBurst: first identify some important microblogs in the query window as pivots, then obtain emergencies by comparing them with historical data in both space and time. Emergencies are ranked by the temporal and spatial burstiness of the words in each word cluster. The four main parameters are set as follows: kernel bandwidth h = 0.01, restart probability α = 0.2, random-walk similarity threshold δ = 0.02, and the parameter balancing spatial and temporal burstiness η = 0.5.
(3) Method 3, LocTBED: the method proposed by the invention. It uses the proposed word burstiness computation and clusters with the agglomerative clustering method bagglo provided by CLUTO. The number of clusters is set to 10 and the cluster similarity function is the cosine function Cos. When computing word burst values, the history window for a word is set to one week (7 days), and the adjustment parameters used when summing the four classes of indicators are α = β = χ = δ = 0.25.
Taking a real social media platform, Sina Weibo, as an example, the invention collected geotagged microblogs from two cities: Beijing, and Lianyungang in Jiangsu Province. The Beijing data were collected from December 1 to December 30, 2016 (one month of data), giving 346,863 geotagged microblogs in total. The Lianyungang data were collected from May 1 to October 31, 2016 (half a year of data), giving 63,744 geotagged microblogs in total. The effectiveness of each event detection method is verified in units of days, that is, by detecting the regional emergencies of a specified day.
Because the regional emergencies that occur in each city on each day are unknown, precision P@n is used as the evaluation metric, following current mainstream research practice. For the top-k emergencies detected each day, a human judges whether each detected event is a regional emergency. Because the number of top-k events is small, the manual evaluation workload is modest.
The results obtained by the 3 methods on the 5 evaluation metrics are shown in Table 1.
Table 1. Detection results of the 3 methods on the 5 evaluation metrics
Methods | P@1 | P@2 | P@3 | P@4 | P@5 | Average |
---|---|---|---|---|---|---|
HBED | 0.20 | 0.30 | 0.20 | 0.30 | 0.24 | 0.24 |
GeoBurst | 0.80 | 0.70 | 0.80 | 0.75 | 0.72 | 0.72 |
LocTBED | 0.80 | 0.80 | 0.87 | 0.80 | 0.76 | 0.76 |
Comparing the 3 methods, the proposed method LocTBED performs best, with an average of 0.76 over the 5 evaluation metrics. GeoBurst is next, with an average of 0.72. Although the values obtained by the two methods are close, the rankings of the emergencies in their detection results differ considerably. When computing the heat of an emergency cluster, LocTBED takes into account the number of regional words the cluster contains, which helps considerably in detecting regional emergencies.
HBED performs poorly. The main reason is that, among the collected geotagged microblogs, few carry hashtags, and those that do mostly concern events of wide regional coverage, so the method is not suited to detecting regional events.
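The P@n metric used in the comparison above can be sketched as below. The function name `precision_at_n` and the representation of the manual judgements as a ranked list of booleans are assumptions made for illustration.

```python
def precision_at_n(judgements, n):
    """Precision of the top-n detected events.

    judgements: ranked list of booleans, True if a human judged the detected
        event to be a real regional emergency (hypothetical representation).
    n: cut-off rank, a positive integer.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Events missing from a short ranking count as incorrect.
    return sum(judgements[:n]) / n
```

For example, a ranking judged as correct, incorrect, correct, correct, incorrect gives P@2 = 0.5 and P@4 = 0.75.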
The method of the invention is not limited to the embodiments described above. Other embodiments derived by those skilled in the art from the technical solution of the invention also fall within the scope of the invention.
Claims (4)
1. A method for detecting regional emergencies in a microblog network, characterized in that it comprises the following steps:
A. collecting regional microblogs from the microblog network to obtain the microblog set PLMB, and preprocessing them to obtain the microblog set LMB;
B. extracting burst words from the microblog set LMB to obtain the burst word set EW;
C. clustering the burst words in EW to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, where q is the number of word clusters.
2. The method for detecting regional emergencies in a microblog network according to claim 1, characterized in that step A comprises the following steps:
A1. using a collection tool to obtain the set of microblogs for the region Localized, PLMB = {plmb_1, plmb_2, …, plmb_m}, where each plmb_i (1 ≤ i ≤ m) is a regional microblog;
A2. preprocessing the microblog set PLMB by removing links (URLs) and emoticons from the microblogs and discarding microblogs shorter than 5 words, giving the preprocessed set LMB = {lmb_1, lmb_2, …, lmb_n}, where each lmb_i (1 ≤ i ≤ n) is a regional microblog.
3. The method for detecting regional emergencies in a microblog network according to claim 1, characterized in that step B comprises the following steps:
B1. segmenting every microblog lmb_i (1 ≤ i ≤ n) in LMB into words, removing stop words, and keeping nouns, verbs, place names, personal names, and proper nouns, giving the final word set LMBW = {w_1, w_2, …, w_r}, where r is the number of words;
B2. computing the frequency burstiness of each word w_i (1 ≤ i ≤ r): with k the time point of the current emergency detection and the historical data of the preceding p time points as the reference, the frequency burstiness F(w_i) of w_i at time point k is defined as a ratio [formula not reproduced in this text] whose numerator is the frequency with which w_i occurs at time point k and whose denominator [expression not reproduced] is derived from the historical reference data;
B3. computing the associated-user burstiness of each word w_i (1 ≤ i ≤ r): with k the time point of the current emergency detection and the historical data of the preceding p time points as the reference, the associated-user burstiness U(u|w_i) of w_i at time point k is defined as a ratio [formula not reproduced in this text] whose numerator is the number of distinct users who mention w_i at time point k and whose denominator [expression not reproduced] is derived from the historical reference data;
B4. computing the regional burstiness of each word w_i (1 ≤ i ≤ r): the geographic-distribution burstiness GT(gt|w_i) of w_i at time point k is defined as a ratio [formula not reproduced in this text] whose numerator is the number of distinct geotags on microblogs that mention w_i at time point k and whose denominator [expression not reproduced] is derived from the historical reference data;
B5. computing the social-behavior burstiness of each word w_i (1 ≤ i ≤ r): the social-behavior burstiness SB(sb|w_i) of w_i at time point k is defined as a ratio [formula not reproduced in this text] whose numerator is the sum of the numbers of reposts, comments, and reads of the microblogs that mention w_i at time point k and whose denominator [expression not reproduced] is derived from the historical reference data;
B6. combining the four burstiness values of steps B2, B3, B4, and B5 to obtain the burst value of word w_i at time point k: BurstyScore(w_i) = α·F(w_i) + β·U(u|w_i) + χ·GT(gt|w_i) + δ·SB(sb|w_i), where α, β, χ, and δ are adjustment coefficients that weight the four classes of indicators, with α + β + χ + δ = 1, α ≥ 0, β ≥ 0, χ ≥ 0, δ ≥ 0;
B7. after the burst value of every word has been computed, selecting n burst words using the interquartile range to form the burst word set EW; the interquartile range is computed as IQS(EW) = Q_3(EW) - Q_1(EW); a word is taken as a burst word when its burst value exceeds a threshold, computed as Maxima(EW) = Q_3(EW) + 1.5 × IQS(EW).
4. The method for detecting regional emergencies in a microblog network according to claim 1, characterized in that step C comprises the following steps:
C1. based on the burst word set EW obtained in step B, building the burst word association network EWN = (V, E), where V is the burst word set EW and E represents the association strength between burst words; the association strength of two burst words ew_i and ew_j is the number of times the two words co-occur in the same microblog post;
C2. once the burst word association network EWN has been built, clustering it with the open-source CLUTO toolkit to obtain the emergency word clusters EWC = {ewc_1, ewc_2, …, ewc_q}, where q is the number of word clusters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710455550.6A CN107273496B (en) | 2017-06-15 | 2017-06-15 | Method for detecting microblog network region emergency |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710455550.6A CN107273496B (en) | 2017-06-15 | 2017-06-15 | Method for detecting microblog network region emergency |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107273496A true CN107273496A (en) | 2017-10-20 |
CN107273496B CN107273496B (en) | 2020-07-28 |
Family
ID=60067208
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710455550.6A Active CN107273496B (en) | 2017-06-15 | 2017-06-15 | Method for detecting microblog network region emergency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107273496B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104281608A (en) * | 2013-07-08 | 2015-01-14 | 上海锐英软件技术有限公司 | Emergency analyzing method based on microblogs |
US20150186378A1 (en) * | 2013-12-30 | 2015-07-02 | International Business Machines Corporation | System for identifying, monitoring and ranking incidents from social media |
CN104216954A (en) * | 2014-08-20 | 2014-12-17 | 北京邮电大学 | Prediction device and prediction method for state of emergency topic |
CN106294333A (en) * | 2015-05-11 | 2017-01-04 | 国家计算机网络与信息安全管理中心 | A kind of microblogging burst topic detection method and device |
US20170024412A1 (en) * | 2015-07-17 | 2017-01-26 | Environmental Systems Research Institute (ESRI) | Geo-event processor |
Non-Patent Citations (2)
Title |
---|
张雄宝等: "基于突发词地域分析的微博突发事件检测方法", 《情报杂志》 * |
郭跇秀等: "基于突发词聚类的微博突发事件检测方法", 《计算机应用》 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108733791A (en) * | 2018-05-11 | 2018-11-02 | 北京科技大学 | network event detection method |
CN108733791B (en) * | 2018-05-11 | 2020-11-20 | 北京科技大学 | Network event detection method |
CN109509110A (en) * | 2018-07-27 | 2019-03-22 | 福州大学 | Method is found based on the hot microblog topic for improving BBTM model |
CN109509110B (en) * | 2018-07-27 | 2021-08-31 | 福州大学 | Microblog hot topic discovery method based on improved BBTM model |
CN110502703A (en) * | 2019-07-12 | 2019-11-26 | 北京邮电大学 | Social networks incident detection method based on character string dictionary building |
CN111475732A (en) * | 2020-04-13 | 2020-07-31 | 腾讯科技(深圳)有限公司 | Information processing method and device |
CN112257429A (en) * | 2020-10-16 | 2021-01-22 | 北京工商大学 | BERT-BTM network-based microblog emergency detection method |
CN112257429B (en) * | 2020-10-16 | 2024-04-16 | 北京工商大学 | Microblog emergency detection method based on BERT-BTM network |
CN112528024A (en) * | 2020-12-15 | 2021-03-19 | 哈尔滨工程大学 | Microblog emergency detection method based on multi-feature fusion |
CN112527960A (en) * | 2020-12-17 | 2021-03-19 | 华东师范大学 | Emergency detection method based on keyword clustering |
CN112948587A (en) * | 2021-03-30 | 2021-06-11 | 杭州叙简科技股份有限公司 | Microblog public opinion analysis method and device based on earthquake industry and electronic equipment |
CN114461763A (en) * | 2022-04-13 | 2022-05-10 | 南京众智维信息科技有限公司 | Network security event extraction method based on burst word clustering |
Also Published As
Publication number | Publication date |
---|---|
CN107273496B (en) | 2020-07-28 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107273496A (en) | A kind of detection method of micro blog network region accident | |
Xu et al. | Understanding mobile traffic patterns of large scale cellular towers in urban environment | |
Lv et al. | Social media based transportation research: The state of the work and the networking | |
Gao et al. | A comparative study of users’ microblogging behavior on Sina Weibo and Twitter | |
De Choudhury et al. | How does the data sampling strategy impact the discovery of information diffusion in social media? | |
Sankaranarayanan et al. | Twitterstand: news in tweets | |
Lee et al. | A novel approach for event detection by mining spatio-temporal information on microblogs | |
CN103617169B (en) | A kind of hot microblog topic extracting method based on Hadoop | |
US20130304818A1 (en) | Systems and methods for discovery of related terms for social media content collection over social networks | |
US20130297581A1 (en) | Systems and methods for customized filtering and analysis of social media content collected over social networks | |
CN105630884B (en) | A kind of geographical location discovery method of microblog hot event | |
CN103345524B (en) | Method and system for detecting microblog hot topics | |
CN105224593B (en) | Frequent co-occurrence account method for digging in the of short duration online affairs of one kind | |
CN113454954A (en) | Real-time event detection on social data streams | |
CN104166726B (en) | A kind of burst keyword detection method towards microblogging text flow | |
Cacho et al. | Social smart destination: a platform to analyze user generated content in smart tourism destinations | |
Williams et al. | Improving geolocation of social media posts | |
Farseev et al. | bbridge: A big data platform for social multimedia analytics | |
Gao et al. | A novel method for geographical social event detection in social media | |
Jendryke et al. | Big location‐based social media messages from China's Sina Weibo network: Collection, storage, visualization, and potential ways of analysis | |
CN104281646B (en) | Urban waterlogging detection method based on microblog data | |
Chong et al. | Fine-grained geolocation of tweets in temporal proximity | |
Kim et al. | TwitterTrends: a spatio-temporal trend detection and related keywords recommendation scheme | |
Stojanovski et al. | Social networks VGI: Twitter sentiment analysis of social hotspots | |
Ruhela et al. | Towards the use of online social networks for efficient internet content distribution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||