CN104281670B - A real-time incremental detection method and system for social network events - Google Patents
A real-time incremental detection method and system for social network events
- Publication number: CN104281670B
- Application number: CN201410509359.1A
- Authority
- CN
- China
- Prior art keywords
- probability
- theme
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Machine Translation (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Abstract
The present invention provides a real-time incremental detection method and system for social network events. A probabilistic graphical model is used to perform model learning on short texts according to their time, document, and topic label, yielding a likelihood function; the likelihood function is solved with the EM algorithm to obtain parameters; the obtained parameters are updated incrementally and iteratively until they converge; and the E step and M step of the EM algorithm are executed in a distributed manner according to the converged parameters to compute the content of the short documents. This solves the technical problem that event detection in the prior art cannot simultaneously accommodate the real-time, social, and fragmented nature of short texts in social networks, which makes its detection results inaccurate. The invention further proposes a supervised short-text event detection model, an algorithm combining incremental learning with prediction, and an event detection model based on an in-memory computing platform.
Description
Technical field
The present invention relates to information technology, and in particular to a real-time incremental detection method and system for social network events.
Background art
Short texts in social networks, such as microblog posts, typically have the following characteristics: their length is strictly limited to 140 characters; users can interact with other users via the @ symbol while publishing a short text; and users can indicate the topic a short text belongs to with the # symbol.
As a highly interactive tool with strong propagation, short texts in social networks often grow explosively around news events, so real-time information is replaced frequently on social networks. At the same time, the length limit on short texts in social networks makes the text more fragmented. Overall, the real-time, social, and fragmented nature of short texts in social networks poses a huge challenge to event detection.
In the prior art, event detection methods include burst-word-based detection and topic-model-based detection. In burst-word-based detection, an event is represented as a set of associated burst words. First, burst words are detected by methods such as word-frequency acceleration or wavelet analysis; next, the similarity between burst words is computed; then the burst words are clustered by methods such as graph partitioning or K-means. Burst-word-based detection, however, has the following problems: first, it lacks background modeling and a probabilistic interpretation; second, it cannot track how the topic of an event changes over time; finally, it cannot distinguish occurrences of an event at different times in the data set.
In topic-model-based detection, an event is represented as a topic. Topic models are widely used to mine latent variables in text data sets. In classical topic models, such as Latent Dirichlet Allocation (LDA), topics are identified from the co-occurrence relations between words in documents. Detection based on topic models also has problems: first, classical topic models are designed for long-text data sets, while word co-occurrences in short texts are too sparse and the computation is hard to converge; second, occurrences of an event at different times in the data set cannot be distinguished; finally, the related techniques target offline processing, and the related algorithms compute sequentially and lack parallelism.
Therefore, event detection in the prior art cannot simultaneously accommodate the real-time, social, and fragmented nature of short texts in social networks, which makes its detection results inaccurate.
Summary of the invention
The present invention provides a real-time incremental detection method and system for social network events, to solve the technical problem that event detection in the prior art cannot simultaneously accommodate the real-time, social, and fragmented nature of short texts in social networks, which makes its detection results inaccurate.
A first aspect of the invention provides a real-time incremental detection method for social network events, including:
using a probabilistic graphical model, performing model learning on short texts according to their time, document, and topic label, to obtain a likelihood function;
solving the likelihood function with the EM algorithm to obtain parameters;
iteratively updating the obtained parameters in an incremental manner until the parameters converge;
executing the E step and M step of the EM algorithm in a distributed manner according to the converged parameters, to compute the content of the short documents.
Another aspect of the invention provides a real-time incremental detection system for social network events, including:
a model learning module, configured to use a probabilistic graphical model to perform model learning on short texts according to their time, document, and topic label, to obtain a likelihood function;
a likelihood function module, configured to solve the likelihood function with the EM algorithm to obtain parameters;
an incremental update module, configured to iteratively update the obtained parameters in an incremental manner until the parameters converge;
a distributed computation module, configured to execute the E step and M step of the EM algorithm in a distributed manner according to the converged parameters, to compute the content of the short documents.
In the real-time incremental detection method and system for social network events provided by the invention, a probabilistic graphical model is used to perform model learning on short texts according to their time, document, and topic label, yielding a likelihood function; the likelihood function is solved with the EM algorithm to obtain parameters; the obtained parameters are updated incrementally and iteratively until they converge; and the E step and M step of the EM algorithm are executed in a distributed manner according to the converged parameters to compute the content of the short documents. This solves the technical problem that event detection in the prior art cannot simultaneously accommodate the real-time, social, and fragmented nature of short texts in social networks, which makes its detection results inaccurate. The invention further proposes a supervised short-text event detection model, an algorithm combining incremental learning with prediction, and an event detection model based on an in-memory computing platform.
Brief description of the drawings
Fig. 1 is a flowchart of a real-time incremental detection method for social network events provided by an embodiment of the invention;
Fig. 2 is a flowchart of a real-time incremental detection method for social network events provided by another embodiment of the invention;
Fig. 3 is the probabilistic graphical model;
Fig. 4 is a structural diagram of a real-time incremental detection system for social network events provided by an embodiment of the invention.
Detailed description of embodiments
Fig. 1 is a flowchart of a real-time incremental detection method for social network events provided by an embodiment of the invention. As shown in Fig. 1, the method includes:
101. Using a probabilistic graphical model, perform model learning on short texts according to their time, document, and topic label, to obtain a likelihood function.
102. Solve the likelihood function with the Expectation Maximization (EM) algorithm to obtain parameters.
The parameters include p(z|d), p(t_d|z), p(h|z), and p(w|z), as well as p(z|w,d,t_d,h) and p(z|w,d,t_d). Here p(z|d) is the probability of topic z in document d; p(t_d|z) is the probability of topic z at time t_d; p(h|z) is the probability that topic label h appears in topic z; p(w|z) is the probability of word w in topic z; p(z|w,d,t_d,h) is the probability that topic z is associated with word w, document d, time t_d, and topic label h; and p(z|w,d,t_d) is the probability that topic z is associated with word w, document d, and time t_d.
103. Iteratively update the obtained parameters in an incremental manner until the parameters converge.
104. Execute the E step and M step of the EM algorithm in a distributed manner according to the converged parameters, to compute the content of the short documents.
In this embodiment, a probabilistic graphical model is used to perform model learning on short texts according to their time, document, and topic label, yielding a likelihood function; the likelihood function is solved with the EM algorithm to obtain parameters; the obtained parameters are updated incrementally and iteratively until they converge; and the E step and M step of the EM algorithm are executed in a distributed manner according to the converged parameters to compute the content of the short documents. This solves the technical problem that event detection in the prior art cannot simultaneously accommodate the real-time, social, and fragmented nature of short texts in social networks, which makes its detection results inaccurate. The embodiment further proposes a supervised short-text event detection model, an algorithm combining incremental learning with prediction, and an event detection model based on an in-memory computing platform.
Fig. 2 is a flowchart of a real-time incremental detection method for social network events provided by another embodiment of the invention. As shown in Fig. 2, the method includes:
201. Model learning.
Fig. 3 is the probabilistic graphical model, i.e., the graphical-model representation of the event detection method provided by this embodiment. As shown in Fig. 3, a topic z is generated by the time t_d, the document d, and the topic label h, and the content W of a short document is generated by a topic z.
The likelihood function is as follows:

$$\log l = \sum_d p(d)\,\big\{\lambda_{\theta_B}\, p(t_d \mid \theta_B)\, p(h \mid \theta_B)\, p(w \mid \theta_B)\big\} + (1-\lambda_{\theta_B})\,\Big\{\sum_{d \in H} p(d) \sum_z \big\{p(z \mid d)\, p(t_d \mid z)\, p(h \mid z) \prod_w p(w \mid z)^{n(w,d)}\big\}\Big\} + (1-\lambda_{\theta_B})\,\Big\{\sum_{d \in \tilde{H}} p(d) \sum_z \big\{p(z \mid d)\, p(t_d \mid z) \prod_w p(w \mid z)^{n(w,d)}\big\}\Big\}$$

Here H denotes the set of texts that carry a topic label and H̃ the set of texts that do not; θ_B denotes the background topic of words without practical meaning; n(w,d) denotes the frequency of word w in document d; λ_{θ_B} denotes the probability that a document has no practical meaning; p(d) denotes the probability that document d occurs, which is the same for every document in the model, namely the reciprocal of the total number of documents; p(t_d|θ_B) denotes the probability that the background topic θ_B appears at time t_d; p(h|θ_B) denotes the probability that topic label h appears in the background topic θ_B; p(w|θ_B) denotes the probability of word w in the background topic θ_B; p(z|d) denotes the probability of topic z in document d; p(t_d|z) denotes the probability of topic z at time t_d; p(h|z) denotes the probability that topic label h appears in topic z; and p(w|z) denotes the probability of word w in topic z.
Solving it with the EM algorithm proceeds as follows.
E step:

$$p(z \mid w, d, t_d, h) = \frac{p(w \mid z)\, p(z \mid d)\, p(t_d \mid z)\, p(h \mid z)}{\sum_z p(w \mid z)\, p(z \mid d)\, p(t_d \mid z)} \quad (d \in H)$$

$$p(z \mid w, d, t_d) = \frac{p(w \mid z)\, p(z \mid d)\, p(t_d \mid z)}{\sum_z p(w \mid z)\, p(z \mid d)\, p(t_d \mid z)} \quad (d \in \tilde{H})$$

The results of the E step are intermediate results of the algorithm: p(z|w,d,t_d,h) is the probability that topic z is associated with word w, document d, time t_d, and topic label h; p(z|w,d,t_d) is the probability that topic z is associated with word w, document d, and time t_d.
M step: in the results of the M step, p(w|z) is the probability of word w in topic z, p(z|d) is the probability of topic z in document d, and p(t_d|z) is the probability of topic z at time t_d.
It should be noted that, overall, the flow of the EM algorithm is as follows.
E step: estimate the expected values of the unknown variables, given the current parameter estimate.
M step: re-estimate the distribution parameters so as to maximize the likelihood of the data, given the expected estimates of the unknown variables.
The E and M steps are iterated until convergence.
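As an illustration, the E and M steps described above can be sketched for a simplified version of the model without topic labels (only p(w|z), p(z|d), and p(t_d|z)). This is a minimal sketch, not the patented implementation; the toy corpus sizes, the random initialization, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed): V words, D documents, T time slots, K topics.
V, D, T, K = 6, 4, 3, 2
n_wd = rng.integers(1, 4, size=(V, D)).astype(float)  # word frequencies n(w, d)
t_of_d = rng.integers(0, T, size=D)                   # time slot t_d of each document

def norm(m):
    # Normalize the columns of a parameter matrix to sum to 1.
    return m / m.sum(axis=0, keepdims=True)

# Random initialization of p(w|z), p(z|d), p(t_d|z).
p_w_z = norm(rng.random((V, K)))
p_z_d = norm(rng.random((K, D)))
p_t_z = norm(rng.random((T, K)))

for _ in range(50):
    # E step: posterior p(z | w, d, t_d) proportional to p(w|z) p(z|d) p(t_d|z).
    post = p_w_z[:, None, :] * p_z_d.T[None, :, :] * p_t_z[t_of_d][None, :, :]
    post /= post.sum(axis=2, keepdims=True)           # shape (V, D, K)
    # M step: re-estimate the parameters from expected counts n(w,d) * p(z|w,d,t_d).
    c = n_wd[:, :, None] * post
    p_w_z = norm(c.sum(axis=1))                       # new p(w|z)
    p_z_d = norm(c.sum(axis=0).T)                     # new p(z|d)
    p_t_z = np.zeros((T, K))
    for d in range(D):
        p_t_z[t_of_d[d]] += c[:, d, :].sum(axis=0)
    p_t_z = norm(p_t_z)                               # new p(t_d|z)
```

With the topic-label factor p(h|z) multiplied into the posterior and re-estimated the same way, the loop extends to the full model with labeled documents.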
202. Parameter update.
In order to maintain continuity of event detection and to quickly detect a large volume of event content, the parameters can be updated incrementally. The parameter update process is divided into the following two parts.
First, the computation of the algorithm is iterative, and the variables are initially set at random. Then step (a) is executed: taking p(w|z) as the value obtained from initialization or from the previous iteration, the posterior probabilities p(z|w,d,t_d,h) and p(z|w,d,t_d) are computed. Next, step (b) is executed: p(z|d) and p(t_d|z) are determined by substituting the p(z|w,d,t_d,h) and p(z|w,d,t_d) computed above into the corresponding M-step formulas, and p(w|z) is solved. Finally, after the required results are normalized, steps (a) and (b) are iterated again until convergence.
203. Distributed implementation.
In one possible implementation, the foregoing EM algorithm can be implemented in a distributed manner.
Specifically, the distributed computation is realized with MapReduce. MapReduce is a programming model for parallel operations on large-scale data sets that makes it easy for programmers to run their programs on a distributed system. Current software implementations specify a map function, which maps a set of key-value pairs to a new set of key-value pairs, and a concurrent reduce function, which ensures that all mapped key-value pairs sharing the same key are grouped together.
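The map/reduce contract just described, a map function that emits key-value pairs and a reduce function that combines all values sharing a key, can be emulated in a few lines of Python. This is a generic illustration of the pattern (word counting), not the patent's algorithms; the function names mirror the pseudocode below.

```python
from itertools import chain

def flat_map_to_pair(records, fn):
    # Map stage: fn turns each record into zero or more (key, value) pairs.
    return chain.from_iterable(fn(r) for r in records)

def reduce_by_key(pairs, fn):
    # Reduce stage: combine all values that share a key.
    acc = {}
    for k, v in pairs:
        acc[k] = fn(acc[k], v) if k in acc else v
    return acc

docs = ["event detection", "event stream"]
pairs = flat_map_to_pair(docs, lambda d: ((w, 1) for w in d.split()))
counts = reduce_by_key(pairs, lambda a, b: a + b)
# counts == {"event": 2, "detection": 1, "stream": 1}
```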
For the E step, the distributed implementation can specifically use the following code.
Algorithm 1: E-Step
Input: key: z; value: map<w,val>; key: z; value: map<t,val>; the pairs from M-Step
Output: key: z; value: map<(w,d,td),val>
1: flatMapToPair:
2: ArrayList M_map = new ArrayList;
3: foreach d contains w
4: M_map.add(key = d, value = map<z,val>)
5: endforeach
6: foreach d contains td
7: M_map.add(key = d, value = map<z,val>)
8: endforeach
9: return M_map;
10: M_map union p(z|d);
11: reduceByKey(arg0, arg1):
12: foreach key in arg1.map.keySets
13: if arg0.map contains key
14: arg0.map.get(key) *= arg1.map.get(key)
15: else
16: arg0.map.put(key, arg1.map.get(key))
17: endif
18: return arg0
19: normalize() /* normalize the results. */
It should be noted that the above code executes as follows.
Lines 1-9 are the map stage, which splits the required computation into small tasks:
line 1 gives the name of the map function;
line 2 declares the storage for the computed results;
lines 3-5 split p(w|z) into small tasks;
lines 6-8 split p(t_d|z) into small tasks.
Lines 10-18 are the reduce stage, which gathers the results of the small tasks:
line 10 allocates the storage for the computed results;
line 11 gives the name of the reduce function;
lines 13-16 gather the p(w|z) and p(t_d|z) results respectively and multiply them to obtain the result referred to in the E-step formula;
line 19 normalizes the result.
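Lines 13-16 multiply, key by key, the per-topic values coming from p(w|z) and p(t_d|z), and line 19 renormalizes them into a posterior over topics. A rough Python rendering of that reduce stage follows; the dictionary shapes and sample values are assumptions read off the pseudocode.

```python
def multiply_merge(arg0, arg1):
    # Counterpart of lines 12-18: multiply values for keys present in both
    # maps; keep the single value otherwise.
    out = dict(arg0)
    for key, val in arg1.items():
        out[key] = out[key] * val if key in out else val
    return out

def normalize(m):
    # Counterpart of line 19: scale the merged values to sum to 1.
    s = sum(m.values())
    return {k: v / s for k, v in m.items()}

p_w_z = {"z0": 0.2, "z1": 0.8}  # p(w|z) contributions for one (w, d) pair
p_t_z = {"z0": 0.5, "z1": 0.5}  # p(t_d|z) contributions for the same pair
post = normalize(multiply_merge(p_w_z, p_t_z))
# post renormalizes the products 0.1 and 0.4 back to 0.2 and 0.8
```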
For the M step, the distributed implementation can specifically use the following code.
Algorithm 2: M-Step
Input: key: d; value: map<z,val>; the pair from E-Step
Output: key: z; value: map<map1<w,val>, map2<t,val>>
1: flatMapToPair:
2: ArrayList E_map1;
3: ArrayList E_map2;
4: for w in d
5: for z in map1.keysets
6: E_map1.add(key = z, value = map<w,val>)
7: endfor
8: endfor
9: E_map2.add(key = z, value = map<t,val>)
10: return map<E_map1, E_map2>
11: reduceByKey(arg0, arg1):
12: foreach key in arg1.map1.keySets
13: if arg0.map1 contains key
14: arg0.map1.get(key) += arg1.map1.get(key) * N(w,d)
15: else
16: arg0.map1.put(key, arg1.map1.get(key) * N(w,d))
17: endif
18: foreach key in arg1.map2.keySets
19: if arg0.map2 contains key
20: arg0.map2.get(key) += arg1.map2.get(key)
21: else
22: arg0.map2.put(key, arg1.map2.get(key))
23: endif
24: return arg0
25: normalize() /* normalize the results. */
It should be noted that the above code executes as follows:
line 1 is the function name of the map stage, which splits the required computation into small tasks;
lines 2-3 allocate the storage for the results;
lines 4-8 split p(z|w,d,t_d,h) and p(z|w,d,t_d) into small tasks;
lines 9-10 return the results stored by the small tasks;
line 11 is the function name of the reduce stage, which gathers the results computed by the small tasks;
lines 13-17 compute the value of p(w|z);
lines 18-23 compute the value of p(t_d|z);
line 24 returns the computed result of the algorithm;
line 25 normalizes the result.
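Lines 13-17 accumulate posterior weights scaled by the word frequency N(w,d) into an unnormalized p(w|z), which line 25 then normalizes per topic. A rough Python rendering of that aggregation follows; the input layout, a map from (w, d) to per-topic weights, and the sample numbers are assumptions read off the pseudocode.

```python
def mstep_accumulate(posteriors, n_wd):
    # Counterpart of lines 12-17: sum p(z|w,d,t_d) * N(w,d) into per-topic
    # word weights, then (line 25) normalize each topic's word distribution.
    acc = {}
    for (w, d), zmap in posteriors.items():
        for z, val in zmap.items():
            acc.setdefault(z, {})
            acc[z][w] = acc[z].get(w, 0.0) + val * n_wd[(w, d)]
    return {z: {w: v / sum(ws.values()) for w, v in ws.items()}
            for z, ws in acc.items()}

posteriors = {("news", "d0"): {"z0": 0.6, "z1": 0.4},
              ("news", "d1"): {"z0": 1.0},
              ("event", "d0"): {"z0": 0.4, "z1": 0.6}}
n_wd = {("news", "d0"): 2, ("news", "d1"): 1, ("event", "d0"): 1}
p_w_z = mstep_accumulate(posteriors, n_wd)  # p(w|z) for topics z0 and z1
```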
In this embodiment, the real-time incremental detection method for social network events uses a probabilistic graphical model to perform model learning on short texts according to their time, document, and topic label, yielding a likelihood function; solves the likelihood function with the EM algorithm to obtain parameters; updates the obtained parameters incrementally and iteratively until they converge; and executes the E step and M step of the EM algorithm in a distributed manner according to the converged parameters to compute the content of the short documents. This solves the technical problem that event detection in the prior art cannot simultaneously accommodate the real-time, social, and fragmented nature of short texts in social networks, which makes its detection results inaccurate. The embodiment further proposes a supervised short-text event detection model, an algorithm combining incremental learning with prediction, and an event detection model based on an in-memory computing platform.
Fig. 4 is a structural diagram of a real-time incremental detection system for social network events provided by an embodiment of the invention. The system includes a model learning module 41, a likelihood function module 42, an incremental update module 43, and a distributed computation module 44.
The model learning module 41 is configured to use a probabilistic graphical model to perform model learning on short texts according to their time, document, and topic label, to obtain a likelihood function.
The likelihood function module 42, connected to the model learning module 41, is configured to solve the likelihood function with the EM algorithm to obtain parameters.
The incremental update module 43, connected to the likelihood function module 42, is configured to iteratively update the obtained parameters in an incremental manner until the parameters converge.
The distributed computation module 44, connected to the incremental update module 43, is configured to execute the E step and M step of the EM algorithm in a distributed manner according to the converged parameters, to compute the content of the short documents.
In this embodiment, the real-time incremental detection system for social network events uses a probabilistic graphical model to perform model learning on short texts according to their time, document, and topic label, yielding a likelihood function; solves the likelihood function with the EM algorithm to obtain parameters; updates the obtained parameters incrementally and iteratively until they converge; and executes the E step and M step of the EM algorithm in a distributed manner according to the converged parameters to compute the content of the short documents. This solves the technical problem that event detection in the prior art cannot simultaneously accommodate the real-time, social, and fragmented nature of short texts in social networks, which makes its detection results inaccurate. The embodiment further proposes a supervised short-text event detection model, an algorithm combining incremental learning with prediction, and an event detection model based on an in-memory computing platform.
To explain this embodiment clearly, a possible implementation is also provided. In this possible implementation:
The model learning module 41 is specifically configured to use the probabilistic graphical model to perform model learning on short texts according to their time t_d, document d, and topic label h, to obtain the likelihood function, wherein H denotes the set of texts that carry a topic label and H̃ the set of texts that do not; θ_B denotes the background topic of words without practical meaning; n(w,d) denotes the frequency of word w in document d; λ_{θ_B} denotes the probability that a document has no practical meaning; p(d) denotes the probability that document d occurs, which is the same for every document in the model, namely the reciprocal of the total number of documents; p(t_d|θ_B) denotes the probability that the background topic θ_B appears at time t_d; p(h|θ_B) denotes the probability that topic label h appears in the background topic θ_B; p(w|θ_B) denotes the probability of word w in the background topic θ_B; p(z|d) denotes the probability of topic z in document d; p(t_d|z) denotes the probability of topic z at time t_d; p(h|z) denotes the probability that topic label h appears in topic z; and p(w|z) denotes the probability of word w in topic z.
The likelihood function module 42 is specifically configured to solve the likelihood function log l with the EM algorithm, obtaining in the E step the parameters p(z|w,d,t_d,h) and p(z|w,d,t_d), and in the M step the parameters p(w|z), p(z|d), and p(t_d|z).
The incremental update module 43 is specifically configured to: compute the posterior probabilities p(z|w,d,t_d,h) and p(z|w,d,t_d) from the previously obtained value of the parameter p(w|z); compute the values of the parameters p(z|d) and p(t_d|z) from p(z|w,d,t_d,h) and p(z|w,d,t_d), and solve for p(w|z); normalize the obtained p(w|z), p(z|d), and p(t_d|z); and iterate this process until p(w|z), p(z|d), and p(t_d|z) converge.
The system provided by this embodiment performs the methods described in Fig. 1 and Fig. 2; the principles of the functional modules in the system are not repeated here, and reference is made to the corresponding method embodiments.
In this embodiment, social network event detection based on short texts uses a probabilistic graphical model to perform model learning on short texts according to their time, document, and topic label, yielding a likelihood function; solves the likelihood function with the EM algorithm to obtain parameters; updates the obtained parameters incrementally and iteratively until they converge; and executes the E step and M step of the EM algorithm in a distributed manner according to the converged parameters to compute the content of the short documents. This solves the technical problem that event detection in the prior art cannot simultaneously accommodate the real-time, social, and fragmented nature of short texts in social networks, which makes its detection results inaccurate. The embodiment further proposes a supervised short-text event detection model, an algorithm combining incremental learning with prediction, and an event detection model based on an in-memory computing platform.
One of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions. The foregoing program can be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The foregoing storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.
Finally, it should be noted that the above embodiments merely illustrate, rather than limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, without such modifications or replacements causing the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (6)
- 1. A real-time incremental detection method for social network events, characterized by comprising: using a probabilistic graphical model to perform model learning on short texts according to their time, document and topic label, obtaining a likelihood function; solving the likelihood function with the expectation-maximization (EM) algorithm to obtain parameters; iteratively updating the obtained parameters in an incremental manner until the parameters converge; and executing the E-step and the M-step of the EM algorithm in a distributed manner according to the converged parameters, thereby computing the content of the short documents; wherein said using a probabilistic graphical model to perform model learning on short texts according to their time, document and topic label, obtaining a likelihood function, comprises: using the probabilistic graphical model, performing model learning on a short text according to its time $t_d$, document $d$ and topic label $h$, obtaining the likelihood function

$$\begin{aligned}\log l ={}& \sum_d p(d)\bigl\{\lambda_{\theta_B}\,p(t_d\mid\theta_B)\,p(h\mid\theta_B)\,p(w\mid\theta_B)\bigr\}\\ &+ (1-\lambda_{\theta_B})\Bigl\{\sum_{d\in H} p(d)\sum_z\bigl\{p(z\mid d)\,p(t_d\mid z)\,p(h\mid z)\prod_w p(w\mid z)^{n(w,d)}\bigr\}\Bigr\}\\ &+ (1-\lambda_{\theta_B})\Bigl\{\sum_{d\in\tilde H} p(d)\sum_z\bigl\{p(z\mid d)\,p(t_d\mid z)\prod_w p(w\mid z)^{n(w,d)}\bigr\}\Bigr\};\end{aligned}$$

wherein $H$ denotes the set of texts that carry a topic label, $\tilde H$ denotes the set of texts that carry no topic label, $\theta_B$ denotes the background topic of words with no practical meaning, and $n(w,d)$ denotes the frequency with which word $w$ appears in document $d$; $\lambda_{\theta_B}$ denotes the probability that a document has no practical meaning; $p(d)$ denotes the probability that document $d$ occurs, which is identical for every document in the model, namely the reciprocal of the total number of documents; $p(t_d\mid\theta_B)$ denotes the probability that the background topic $\theta_B$ appears at time $t_d$; $p(h\mid\theta_B)$ denotes the probability that topic label $h$ appears in the background topic $\theta_B$; $p(w\mid\theta_B)$ denotes the probability of word $w$ in the background topic $\theta_B$; $p(z\mid d)$ denotes the probability of topic $z$ in document $d$; $p(t_d\mid z)$ denotes the probability of topic $z$ at time $t_d$; $p(h\mid z)$ denotes the probability that topic label $h$ appears in topic $z$; and $p(w\mid z)$ denotes the probability of word $w$ in topic $z$.
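As an illustration of how the likelihood above could be evaluated, the following Python sketch scores toy data under randomly initialized distributions. It is not the patented implementation: all dimensions, array names, the uniform label assignment, and the per-document mixture reading of the background term (here simplified to a product over the document's words) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
W, K, T, L, D = 5, 3, 4, 2, 6     # vocabulary, topics, time slots, labels, documents
lam_B = 0.2                        # lambda_{theta_B}: weight of the background topic

def norm(a):
    return a / a.sum(axis=-1, keepdims=True)

# Topic parameters: each row is a probability distribution
p_w_z = norm(rng.random((K, W)))   # p(w|z)
p_z_d = norm(rng.random((D, K)))   # p(z|d)
p_t_z = norm(rng.random((K, T)))   # p(t_d|z)
p_h_z = norm(rng.random((K, L)))   # p(h|z)
# Background topic theta_B
p_w_B, p_t_B, p_h_B = norm(rng.random(W)), norm(rng.random(T)), norm(rng.random(L))

n_wd = rng.integers(0, 3, size=(D, W))                    # word counts n(w, d)
t_d = rng.integers(0, T, size=D)                          # timestamp index per document
h_d = rng.integers(0, L, size=D)                          # topic-label index (used when d is in H)
in_H = np.array([True, True, True, False, False, False])  # H vs H-tilde membership
p_d = 1.0 / D                                             # p(d): reciprocal of the document count

def log_likelihood():
    total = 0.0
    for d in range(D):
        # prod_w p(w|z)^n(w,d), computed in log space for numerical stability
        log_words = n_wd[d] @ np.log(p_w_z).T             # shape (K,)
        mix = p_z_d[d] * p_t_z[:, t_d[d]] * np.exp(log_words)
        if in_H[d]:                                       # labeled documents also use p(h|z)
            mix = mix * p_h_z[:, h_d[d]]
        # Background term, simplified here to a product over the document's words
        bg = p_t_B[t_d[d]] * p_h_B[h_d[d]] * np.exp(n_wd[d] @ np.log(p_w_B))
        total += p_d * np.log(lam_B * bg + (1 - lam_B) * mix.sum())
    return total

print(log_likelihood())
```

Because every factor is a probability below one, the returned value is a finite negative number; the same loop structure extends directly to real short-text corpora.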
- 2. The real-time incremental detection method for social network events according to claim 1, characterized in that said solving the likelihood function with the EM algorithm to obtain parameters comprises: using the EM algorithm to solve the likelihood function $\log l$, obtaining in the E-step the parameters

$$p(z\mid w,d,t_d,h) = \frac{p(w\mid z)\,p(z\mid d)\,p(t_d\mid z)\,p(h\mid z)}{\sum_z p(w\mid z)\,p(z\mid d)\,p(t_d\mid z)\,p(h\mid z)} \quad (d\in H)$$

$$p(z\mid w,d,t_d) = \frac{p(w\mid z)\,p(z\mid d)\,p(t_d\mid z)}{\sum_z p(w\mid z)\,p(z\mid d)\,p(t_d\mid z)} \quad (d\in\tilde H);$$

wherein $p(z\mid w,d,t_d,h)$ denotes the probability that topic $z$ relates to word $w$, document $d$, time $t_d$ and topic label $h$, and $p(z\mid w,d,t_d)$ denotes the probability that topic $z$ relates to word $w$, document $d$ and time $t_d$; and obtaining in the M-step the parameters

$$p(w\mid z) = \frac{\sum_{d\in H} n(w,d)\,p(z\mid w,d,t_d,h) + \sum_{d\in\tilde H} n(w,d)\,p(z\mid w,d,t_d)}{\sum_w\Bigl[\sum_{d\in H} n(w,d)\,p(z\mid w,d,t_d,h) + \sum_{d\in\tilde H} n(w,d)\,p(z\mid w,d,t_d)\Bigr]}$$

$$p(z\mid d) = \frac{p(z\mid w,d,t_d,h)}{\sum_z p(z\mid w,d,t_d,h)} \quad (d\in H)$$

$$p(z\mid d) = \frac{p(z\mid w,d,t_d)}{\sum_z p(z\mid w,d,t_d)} \quad (d\in\tilde H)$$

$$p(t_d\mid z) = \frac{\sum_{d\in H} p(z\mid w,d,t_d,h) + \sum_{d\in\tilde H} p(z\mid w,d,t_d)}{\sum_t\Bigl[\sum_{d\in H} p(z\mid w,d,t_d,h) + \sum_{d\in\tilde H} p(z\mid w,d,t_d)\Bigr]}.$$
- 3. The real-time incremental detection method for social network events according to claim 2, characterized in that said iteratively updating the obtained parameters in an incremental manner until the parameters converge comprises: according to a previously obtained value of the parameter $p(w\mid z)$, computing the posterior probabilities $p(z\mid w,d,t_d,h)$ and $p(z\mid w,d,t_d)$; computing from $p(z\mid w,d,t_d,h)$ and $p(z\mid w,d,t_d)$ the values of the parameters $p(z\mid d)$ and $p(t_d\mid z)$, and solving for $p(w\mid z)$; normalizing the obtained $p(w\mid z)$, $p(z\mid d)$ and $p(t_d\mid z)$; and iterating this process of computing the posterior probabilities from the previously obtained $p(w\mid z)$, updating $p(z\mid d)$, $p(t_d\mid z)$ and $p(w\mid z)$, and normalizing, until $p(w\mid z)$, $p(z\mid d)$ and $p(t_d\mid z)$ converge.
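The iterate-normalize-until-convergence control flow of this claim can be sketched independently of the model. In the sketch below, `update_once` is a deliberately simple stand-in for one E-step/M-step pass (a damped square-root smoothing followed by re-normalization): only the convergence harness reflects the claim, not the update rule, and all parameter-table names are hypothetical.

```python
import numpy as np

def norm(a):
    return a / a.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
K, W, D, T = 3, 5, 6, 4
params = {                                  # hypothetical parameter tables, rows normalized
    "p_w_z": norm(rng.random((K, W))),
    "p_z_d": norm(rng.random((D, K))),
    "p_t_z": norm(rng.random((K, T))),
}

def update_once(params):
    """Stand-in for one E+M pass: damped square-root smoothing plus re-normalization.
    It is NOT the patent's update rule; it merely produces a convergent sequence."""
    return {k: norm(0.5 * v + 0.5 * norm(np.sqrt(v))) for k, v in params.items()}

def run_until_converged(params, tol=1e-8, max_iter=500):
    for it in range(1, max_iter + 1):
        new = update_once(params)
        # Convergence test: largest absolute change across all parameter tables
        delta = max(np.abs(new[k] - params[k]).max() for k in params)
        params = new
        if delta < tol:
            return params, it
    return params, max_iter

params, iters = run_until_converged(params)
print(f"converged after {iters} iterations")
```

Each pass re-normalizes every table, so the invariant that $p(w\mid z)$, $p(z\mid d)$ and $p(t_d\mid z)$ remain probability distributions is preserved across iterations, mirroring the normalization step of the claim.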
- 4. A real-time incremental detection system for social network events, characterized by comprising: a model learning module, configured to use a probabilistic graphical model to perform model learning on short texts according to their time, document and topic label, obtaining a likelihood function; a likelihood function module, configured to solve the likelihood function with the expectation-maximization (EM) algorithm to obtain parameters; an incremental update module, configured to iteratively update the obtained parameters in an incremental manner until the parameters converge; and a distributed computing module, configured to execute the E-step and the M-step of the EM algorithm in a distributed manner according to the converged parameters, thereby computing the content of the short documents; wherein the model learning module is specifically configured to use the probabilistic graphical model to perform model learning on a short text according to its time $t_d$, document $d$ and topic label $h$, obtaining the likelihood function

$$\begin{aligned}\log l ={}& \sum_d p(d)\bigl\{\lambda_{\theta_B}\,p(t_d\mid\theta_B)\,p(h\mid\theta_B)\,p(w\mid\theta_B)\bigr\}\\ &+ (1-\lambda_{\theta_B})\Bigl\{\sum_{d\in H} p(d)\sum_z\bigl\{p(z\mid d)\,p(t_d\mid z)\,p(h\mid z)\prod_w p(w\mid z)^{n(w,d)}\bigr\}\Bigr\}\\ &+ (1-\lambda_{\theta_B})\Bigl\{\sum_{d\in\tilde H} p(d)\sum_z\bigl\{p(z\mid d)\,p(t_d\mid z)\prod_w p(w\mid z)^{n(w,d)}\bigr\}\Bigr\};\end{aligned}$$

wherein $H$ denotes the set of texts that carry a topic label, $\tilde H$ denotes the set of texts that carry no topic label, $\theta_B$ denotes the background topic of words with no practical meaning, and $n(w,d)$ denotes the frequency with which word $w$ appears in document $d$; $\lambda_{\theta_B}$ denotes the probability that a document has no practical meaning; $p(d)$ denotes the probability that document $d$ occurs, which is identical for every document in the model, namely the reciprocal of the total number of documents; $p(t_d\mid\theta_B)$ denotes the probability that the background topic $\theta_B$ appears at time $t_d$; $p(h\mid\theta_B)$ denotes the probability that topic label $h$ appears in the background topic $\theta_B$; $p(w\mid\theta_B)$ denotes the probability of word $w$ in the background topic $\theta_B$; $p(z\mid d)$ denotes the probability of topic $z$ in document $d$; $p(t_d\mid z)$ denotes the probability of topic $z$ at time $t_d$; $p(h\mid z)$ denotes the probability that topic label $h$ appears in topic $z$; and $p(w\mid z)$ denotes the probability of word $w$ in topic $z$.
- 5. The real-time incremental detection system for social network events according to claim 4, characterized in that the likelihood function module is specifically configured to use the EM algorithm to solve the likelihood function $\log l$, obtaining in the E-step the parameters

$$p(z\mid w,d,t_d,h) = \frac{p(w\mid z)\,p(z\mid d)\,p(t_d\mid z)\,p(h\mid z)}{\sum_z p(w\mid z)\,p(z\mid d)\,p(t_d\mid z)\,p(h\mid z)} \quad (d\in H)$$

$$p(z\mid w,d,t_d) = \frac{p(w\mid z)\,p(z\mid d)\,p(t_d\mid z)}{\sum_z p(w\mid z)\,p(z\mid d)\,p(t_d\mid z)} \quad (d\in\tilde H);$$

wherein $p(z\mid w,d,t_d,h)$ denotes the probability that topic $z$ relates to word $w$, document $d$, time $t_d$ and topic label $h$, and $p(z\mid w,d,t_d)$ denotes the probability that topic $z$ relates to word $w$, document $d$ and time $t_d$; and obtaining in the M-step the parameters

$$p(w\mid z) = \frac{\sum_{d\in H} n(w,d)\,p(z\mid w,d,t_d,h) + \sum_{d\in\tilde H} n(w,d)\,p(z\mid w,d,t_d)}{\sum_w\Bigl[\sum_{d\in H} n(w,d)\,p(z\mid w,d,t_d,h) + \sum_{d\in\tilde H} n(w,d)\,p(z\mid w,d,t_d)\Bigr]}$$

$$p(z\mid d) = \frac{p(z\mid w,d,t_d,h)}{\sum_z p(z\mid w,d,t_d,h)} \quad (d\in H)$$

$$p(z\mid d) = \frac{p(z\mid w,d,t_d)}{\sum_z p(z\mid w,d,t_d)} \quad (d\in\tilde H)$$

$$p(t_d\mid z) = \frac{\sum_{d\in H} p(z\mid w,d,t_d,h) + \sum_{d\in\tilde H} p(z\mid w,d,t_d)}{\sum_t\Bigl[\sum_{d\in H} p(z\mid w,d,t_d,h) + \sum_{d\in\tilde H} p(z\mid w,d,t_d)\Bigr]}.$$
- 6. The real-time incremental detection system for social network events according to claim 5, characterized in that the incremental update module is specifically configured to: compute, according to a previously obtained value of the parameter $p(w\mid z)$, the posterior probabilities $p(z\mid w,d,t_d,h)$ and $p(z\mid w,d,t_d)$; compute from $p(z\mid w,d,t_d,h)$ and $p(z\mid w,d,t_d)$ the values of the parameters $p(z\mid d)$ and $p(t_d\mid z)$, and solve for $p(w\mid z)$; normalize the obtained $p(w\mid z)$, $p(z\mid d)$ and $p(t_d\mid z)$; and iterate this process of computing the posterior probabilities from the previously obtained $p(w\mid z)$, updating $p(z\mid d)$, $p(t_d\mid z)$ and $p(w\mid z)$, and normalizing, until $p(w\mid z)$, $p(z\mid d)$ and $p(t_d\mid z)$ converge.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410509359.1A CN104281670B (en) | 2014-09-28 | 2014-09-28 | The real-time incremental formula detection method and system of a kind of social networks event |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104281670A CN104281670A (en) | 2015-01-14 |
CN104281670B true CN104281670B (en) | 2017-12-15 |
Family
ID=52256543
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410509359.1A Active CN104281670B (en) | 2014-09-28 | 2014-09-28 | The real-time incremental formula detection method and system of a kind of social networks event |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104281670B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106909541A (en) * | 2015-12-23 | 2017-06-30 | 神州数码信息系统有限公司 | A system for automatic identification, classification and reporting of cross-domain public opinion |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102289487A (en) * | 2011-08-09 | 2011-12-21 | Zhejiang University | Network bursty hotspot event detection method based on topic model |
Non-Patent Citations (2)
Title |
---|
Tong Wei et al.; "EDM: An Efficient Algorithm for Event Detection in Microblogs"; Journal of Frontiers of Computer Science and Technology; 2012-10-19; pp. 1076-1086 *
Tzu-Chuan Chou et al.; "Using Incremental PLSI for Threshold-Resilient Online Event Analysis"; IEEE Transactions on Knowledge and Data Engineering; 2008-03-31; vol. 20, no. 3; pp. 289-299 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhao et al. | Data mining applications with R | |
CN106934012A (en) | A natural language question answering method and system based on knowledge graphs | |
Mazumdar et al. | Query complexity of clustering with side information | |
Despalatović et al. | Community structure in networks: Girvan-Newman algorithm improvement | |
CN103150383B (en) | An event evolution analysis method for short text data | |
Dowe et al. | Bayes not Bust! Why Simplicity is no Problem for Bayesians | |
CN106599194A (en) | Label determining method and device | |
Akgun et al. | Automated symmetry breaking and model selection in Conjure | |
CN105825269B (en) | A feature learning method and system based on parallel autoencoders | |
CN107609141A (en) | A fast probabilistic modeling method for large-scale renewable energy data | |
CN104484380A (en) | Personalized search method and personalized search device | |
CN114357117A (en) | Transaction information query method and device, computer equipment and storage medium | |
Blanken et al. | Estimating network structures using model selection | |
CN101266660A (en) | Reality inconsistency analysis method based on description logic | |
Gürünlü Alma et al. | On the estimation of the extreme value and normal distribution parameters based on progressive type-II hybrid-censored data | |
Rahman et al. | kDMI: A novel method for missing values imputation using two levels of horizontal partitioning in a data set | |
CN110046344A (en) | Method and terminal device for adding separators | |
CN104281670B (en) | Real-time incremental detection method and system for social network events | |
Chis | Sliding hidden markov model for evaluating discrete data | |
Roos et al. | Analysis of textual variation by latent tree structures | |
Moreno et al. | Scalable and exact sampling method for probabilistic generative graph models | |
CN109871889A (en) | Mass psychology appraisal procedure under emergency event | |
Riabi et al. | β-entropy for Pareto-type distributions and related weighted distributions | |
CN107430600A (en) | Scalable web data extraction | |
Patra et al. | Motif discovery in biological network using expansion tree |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||