CN103345581B - Dynamic network analysis system and method based on an online egocentric model
- Publication number
- CN103345581B
- Authority
- CN
- China
- Prior art keywords
- topic
- parameter
- beta
- event
- objective function
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a dynamic network analysis system and method based on an online egocentric model. The system comprises at least: an objective function establishing module, which builds an objective function on the basis of the dynamic egocentric model, taking the parameter β to be learned and the topic proportions ω_k as variables; and an objective function minimization module, which, after a new event or a series of new events occurs, uses an alternating projection algorithm to alternately update the parameter vector β to be learned and the topic proportions ω_k, thereby obtaining the optimal solution of the objective function. By modeling the time-varying topic features and model parameters, the invention keeps the prediction accuracy of the model from declining over time.
Description
Technical field
The present invention relates to a dynamic network analysis system and method, and in particular to a dynamic network analysis system and method based on an online egocentric model.
Background art
Network analysis, and dynamic network analysis (DNA) in particular, has become increasingly important in many fields, including social science and biology. Although there is already a large body of work on dynamic network analysis, most of it either handles large-scale data only at a coarse granularity or performs fine-grained analysis only on very small networks. Recently, the dynamic egocentric model (DEM) was proposed. It is based on a multivariate counting process and has successfully modeled large-scale, time-varying citation networks at a fine granularity. The original DEM paper describes two variants: one models only the link features, and the other models the link features together with the topic features (text information). Because the latter is far more accurate than the former, and because the text of an article is easy to obtain, DEM in this document refers to the latter unless otherwise stated. DEM is briefly introduced below.
Let n be the total number of nodes (articles). DEM models a dynamic network by placing a counting process N_i(t) on each node i (i = 1, 2, ..., n), where N_i(t) is the cumulative number of "events" at node i up to time t. The definition of an "event" depends on the context. In a citation network, for example, an event can correspond to one citation.
Although a continuous-time model could be obtained by maximizing the full likelihood of these counting processes, for citation networks it is clearly more practical to estimate the parameters associated with the time-varying statistics by maximizing the partial likelihood. DEM therefore maximizes the following likelihood of the whole network:
$$L(\beta) = \prod_{e=1}^{m} \frac{\exp\{\beta^{T} s_{i_e}(t_e)\}}{\sum_{i=1}^{n} Y_i(t_e)\,\exp\{\beta^{T} s_i(t_e)\}} \qquad (1)$$
where m is the total number of citation events, e is the index of a citation event, i_e is the article cited in event e, t_e is the time at which event e occurs, Y_i(t) is 1 if node i exists at time t and 0 otherwise, s_i(t_e) is the feature vector of node i at time t_e, and β is the parameter vector to be learned.
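As a rough illustration of how such a likelihood can be evaluated, the sketch below computes its logarithm. It assumes the partial-likelihood form in which each event contributes exp(β·s) for the cited node divided by the sum of exp(β·s) over all nodes with Y_i(t_e) = 1. The function name and the data layout (per-event lists of feature vectors and presence flags) are hypothetical and chosen only for this example.

```python
import math

def dem_log_likelihood(events, beta):
    """Log partial likelihood of a list of citation events (illustrative layout).

    Each event is a tuple (cited, features, present):
      cited    - index i_e of the cited node
      features - features[i] is the feature vector s_i(t_e) of node i
      present  - present[i] is Y_i(t_e): 1 if node i exists at t_e, else 0
    """
    total = 0.0
    for cited, features, present in events:
        # Linear score beta^T s_i(t_e) for every node.
        scores = [sum(b * s for b, s in zip(beta, f)) for f in features]
        alive = [sc for sc, y in zip(scores, present) if y]
        # Log-sum-exp keeps the denominator numerically stable.
        top = max(alive)
        log_denominator = top + math.log(sum(math.exp(sc - top) for sc in alive))
        total += scores[cited] - log_denominator
    return total
```

With β = 0 every existing node is equally likely to be cited, so one event with two existing nodes yields log(1/2).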
The entries of s_i(t_e) fall into two classes: "link features" (statistics) and "topic features". DEM has 8 link features: three preferential attachment statistics, three triangle statistics and two out-path statistics. In addition, 50 topic features are extracted for each article by running LDA on its abstract. More specifically, if the article arriving at time t_e is i, the topic features of any existing article j are computed as the element-wise product θ_i ∘ θ_j, where θ_i is the topic proportion of article i and ∘ denotes the element-wise product of two vectors.
Thus s_i(t_e) is a vector of 58 features, of which the first 8 are link features and the last 50 are topic features. Correspondingly, β is a parameter vector of length 58.
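The layout of the 58-dimensional feature vector can be sketched as follows. This is a minimal example that assumes the 8 link statistics have already been computed elsewhere; the function names are hypothetical.

```python
def topic_features(theta_new, theta_existing):
    """Element-wise product of two topic-proportion vectors."""
    if len(theta_new) != len(theta_existing):
        raise ValueError("topic vectors must have the same length")
    return [a * b for a, b in zip(theta_new, theta_existing)]

def feature_vector(link_stats, theta_new, theta_existing):
    """Concatenate link statistics with topic features (8 + 50 = 58 in DEM)."""
    return list(link_stats) + topic_features(theta_new, theta_existing)
```

The link statistics come first, so the first 8 entries of β weight the link features and the last 50 weight the topic features.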
However, although DEM can dynamically update the link features of the nodes (which represent articles) while predicting a dynamic network, the learned parameter β and the topic features θ_i stay fixed during prediction. As a result, the prediction accuracy of DEM drops severely over time, because in reality both the topic features and the parameters should change with time. For example, one of the link features is the in-degree of a node (the number of times an article has been cited) at a given time point. As time goes on, an article accumulates more and more citations, so the distribution of citation counts over the whole data set also changes with time. The parameter corresponding to this feature, and even the other parameters, should therefore change as well. As for the topic features, it may at first seem strange that the topic features of an article could change over time, since the words of a published article normally never change. However, the set of articles citing it keeps changing. It is therefore more reasonable to determine the topic features of an article from the citation information together with the text content. For example, an article about neural networks might have been regarded in the 1950s as highly related to psychology or biology, but today it is more likely to be classified as a machine learning article, because more and more published articles have cited it over the decades. Hence the topic features of an article do change over time, only with varying magnitude. Because DEM cannot model time-varying parameters and topic features, it cannot model a dynamic network accurately, and its prediction accuracy declines over time.
Summary of the invention
To overcome the above deficiencies of the prior art, the object of the present invention is to provide a dynamic network analysis system and method based on an online egocentric model, which models the time-varying topic features and model parameters so that the prediction accuracy of the model does not decline over time.
To achieve the above and other objects, the present invention proposes a dynamic network analysis system based on an online egocentric model, comprising at least:
an objective function establishing module, which builds an objective function on the basis of the dynamic egocentric model, taking the parameter β to be learned and the topic proportions ω_k as variables;
an objective function minimization module, which, after a new event or a series of new events occurs, uses an alternating projection algorithm to alternately update the parameter vector β to be learned and the topic proportions ω_k, thereby obtaining the optimal solution of the objective function.
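Further, the objective function is:
$$\min_{\beta,\,\omega}\; -\log L(\beta,\omega) + \lambda \sum_{k=1}^{n} \lVert \omega_k - \theta_k \rVert_2^2 \quad \text{subject to}\quad \omega_k \succeq 0,\; \mathbf{1}^{T}\omega_k = 1$$
where ω_k is the new topic proportion of node k to be learned, θ_k is the current topic proportion of node k, ω_k ⪰ 0 means that every element of ω_k is non-negative, and 1 is a vector whose elements are all 1. These constraints ensure that all elements of ω_k are non-negative and sum to 1. λ is a hyperparameter that controls the weight between the two terms.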
Further, the objective function minimization module comprises:
a β parameter update module, which, with the topic proportions ω fixed, updates the parameter β to be learned using Newton's method;
a topic proportion update module, which, with β fixed, minimizes the objective function on the basis of the current topic proportion θ_k to obtain the updated topic proportion ω_k.
Further, the β parameter update module and the topic proportion update module perform one update after every q citation events.
With ω fixed, the objective function of the β parameter update module for the parameter β to be learned is as follows:
where x is the first event in the mini-batch, q is the number of events in the mini-batch, and the mini-batch is the set of accumulated events.
Further, the topic proportion update module updates the topic proportion ω_k of only one article at a time; while ω_k is being updated, the topic proportions {ω_i | i ≠ k} of the other articles are kept unchanged.
Further, the objective function to be optimized by the topic proportion update module is:
where
Further, the topic proportion update module obtains an approximate gradient from the partial derivatives of the objective function to be optimized, and obtains an approximate objective function from the approximate gradient.
To achieve the above and other objects, the present invention also provides a dynamic network analysis method based on an online egocentric model, comprising the following steps:
Step 1: on the basis of the dynamic egocentric model, build an objective function taking the parameter vector β to be learned and the topic proportions ω_k as variables;
Step 2: after a new event or a series of new events occurs, use an alternating projection algorithm to alternately update the parameter vector to be learned and the topic proportions, thereby obtaining the optimal solution of the objective function.
Further, the objective function is:
$$\min_{\beta,\,\omega}\; -\log L(\beta,\omega) + \lambda \sum_{k=1}^{n} \lVert \omega_k - \theta_k \rVert_2^2 \quad \text{subject to}\quad \omega_k \succeq 0,\; \mathbf{1}^{T}\omega_k = 1$$
where ω_k is the new topic proportion of node k to be learned, θ_k is the current topic proportion of node k, ω_k ⪰ 0 means that every element of ω_k is non-negative, and 1 is a vector whose elements are all 1. These constraints ensure that all elements of ω_k are non-negative and sum to 1. λ is a hyperparameter that controls the weight between the two terms.
Further, step 2 comprises the following steps:
Step 1.1: with the topic proportions ω fixed, update the parameter β to be learned using Newton's method;
Step 1.2: with β fixed, minimize the objective function on the basis of the current topic proportion θ_k to obtain the updated topic proportion ω_k;
Repeat step 1.1 and step 1.2 until the termination condition is met.
Further, step 2 performs one update after every q citation events.
Further, in step 1.1, with ω fixed, the objective function for the parameter β to be learned is as follows:
where x is the first event in the mini-batch, q is the number of events in the mini-batch, and the mini-batch is the set of accumulated events.
Further, step 1.2 updates the topic proportion ω_k of only one article at a time; while ω_k is being updated, the topic proportions {ω_i | i ≠ k} of the other articles are kept unchanged.
Further, in step 1.2, the objective function to be optimized is:
where
Further, in step 1.2, an approximate gradient is obtained from the partial derivatives of the objective function to be optimized, and an approximate objective function is obtained from the approximate gradient.
Compared with the prior art, the dynamic network analysis system and method based on an online egocentric model of the present invention model a time-varying dynamic network by adaptively learning the model parameters and topic features over time. The invention thereby overcomes the shortcoming of DEM and avoids the degradation of accuracy over time from which DEM suffers.
Brief description of the drawings
Fig. 1 is a system architecture diagram of the dynamic network analysis system based on an online egocentric model of the present invention;
Fig. 2 is a flow chart of the steps of the dynamic network analysis method based on an online egocentric model of the present invention;
Fig. 3 is a detailed flow chart of step 202 of Fig. 2;
Fig. 4 is a schematic comparison of the experimental results of the present invention.
Detailed description of the invention
Embodiments of the present invention are described below through specific examples with reference to the drawings. Those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other specific examples, and the details in this specification can be modified and changed in various ways based on different viewpoints and applications without departing from the spirit of the present invention.
Fig. 1 is a system architecture diagram of the dynamic network analysis system based on an online egocentric model of the present invention. As shown in Fig. 1, the dynamic network analysis system based on an online egocentric model (OEM) of the present invention comprises at least an objective function establishing module 10 and an objective function minimization module 11.
The objective function establishing module 10 builds an objective function on the basis of the dynamic egocentric model, taking the parameter vector β to be learned and the topic proportions ω_k as variables.
Although the topics could be learned from the whole collection of articles with LDA (latent Dirichlet allocation, a three-layer Bayesian probabilistic model), running LDA directly online would clearly be very time-consuming. Therefore, the present invention first fixes the topics and then learns the topic proportions θ. This is reasonable because, in a citation network, the main topics remain relatively stable even if the topic proportions of some articles change over time.
It should be noted that, in embodiments of the present invention, all the topics need to be updated only once every long period of time. The experiments show that this still achieves good accuracy.
Therefore, in the preferred embodiment of the present invention, the objective function is:
$$\min_{\beta,\,\omega}\; -\log L(\beta,\omega) + \lambda \sum_{k=1}^{n} \lVert \omega_k - \theta_k \rVert_2^2 \quad \text{subject to}\quad \omega_k \succeq 0,\; \mathbf{1}^{T}\omega_k = 1 \qquad (2)$$
where ω_k is the new topic proportion of node k to be learned and θ_k is the current topic proportion of node k. L(β, ω) is defined in the same way as L(β) in formula (1) of DEM, except that here both β and the topic proportions are variables (in L(β), only β is a variable and ω is a constant). ω_k ⪰ 0 means that every element of ω_k is non-negative, and 1 is a vector whose elements are all 1. These constraints ensure that all elements of ω_k are non-negative and sum to 1. λ is a hyperparameter that controls the weight between the two terms.
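The constraints confine each ω_k to the probability simplex, and the second term penalizes moving away from θ_k. The sketch below shows one common way to handle both numerically: a sort-based Euclidean projection onto the simplex, and a penalty that is assumed here to be the squared Euclidean distance weighted by λ. The source does not say which solver it uses, so both function names and the choice of projection are assumptions made for illustration.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    sorted_desc = sorted(v, reverse=True)
    running_sum = 0.0
    shift = 0.0
    for j, value in enumerate(sorted_desc, start=1):
        running_sum += value
        candidate = (running_sum - 1.0) / j
        # Keep the shift from the last index that still leaves a positive entry.
        if value - candidate > 0:
            shift = candidate
    return [max(x - shift, 0.0) for x in v]

def proximity_penalty(omega_k, theta_k, lam):
    """lam * ||omega_k - theta_k||^2, the assumed form of the second term."""
    return lam * sum((w - t) ** 2 for w, t in zip(omega_k, theta_k))
```

A vector that already lies on the simplex is returned unchanged, so the projection only acts when an update step leaves the feasible set.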
The objective function minimization module 11, after a new event or a series of new events occurs, uses an alternating projection algorithm to alternately update the parameter vector β to be learned and the topic proportions, thereby obtaining the optimal solution of the objective function.
When a new event or a series of new events is observed, the second term in formula (2) ensures that the updated topic proportion ω_k does not move too far from the current topic proportion θ_k. In addition, the present invention uses the old β as the initial value when updating β.
Clearly, the optimization problem in formula (2) is not jointly convex in (β, ω). However, it can be shown that the objective function is convex in either variable when the other is fixed. The present invention therefore uses an alternating projection algorithm to find the optimal solution of the objective function. Specifically, the objective function minimization module 11 further comprises a β parameter update module 110 and a topic proportion update module 111. The β parameter update module 110, with the topic proportions ω fixed, updates the parameter β to be learned using Newton's method, initialized with the current β. The topic proportion update module 111, with β fixed, minimizes the objective function on the basis of the current topic proportion θ_k to obtain the updated topic proportion ω_k. The β parameter update module 110 and the topic proportion update module 111 generally need to be run repeatedly until the termination condition is met.
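The alternation between the two modules can be sketched as a generic loop that fixes one variable, minimizes over the other, and stops when the objective no longer decreases. The toy objective and its closed-form updates below are stand-ins invented for this example (it is convex in each variable separately but not jointly); they are not the patent's objective.

```python
def alternating_minimize(objective, update_beta, update_omega,
                         beta, omega, tol=1e-9, max_iter=100):
    """Alternate exact per-variable minimization until the decrease is below tol."""
    history = [objective(beta, omega)]
    for _ in range(max_iter):
        beta = update_beta(beta, omega)    # omega fixed, minimize over beta
        omega = update_omega(beta, omega)  # beta fixed, minimize over omega
        history.append(objective(beta, omega))
        if history[-2] - history[-1] < tol:
            break
    return beta, omega, history

# Toy biconvex objective used only to exercise the loop.
def toy_objective(b, w):
    return (b * w - 2.0) ** 2 + (w - 1.0) ** 2

def toy_update_beta(b, w):
    return 2.0 / w  # minimizer over b for fixed w (w stays positive here)

def toy_update_omega(b, w):
    return (2.0 * b + 1.0) / (b * b + 1.0)  # minimizer over w for fixed b
```

Because each step minimizes the objective exactly in one variable, the recorded objective values never increase.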
It should be noted that, each time a new article i appears, it could be added to the citation network immediately and the β parameter update module 110 and topic proportion update module 111 could then be run until convergence. For large-scale citation networks, however, this is very time-consuming. In the present invention, the update can therefore be deferred until a certain number of new articles have accumulated. This mini-batch technique not only saves computation time but also reduces the effect of noise. In the preferred embodiment of the present invention, the β parameter update module 110 and the topic proportion update module 111 therefore perform one update after every q citation events rather than after every event. In the experiments, q is set to about 1500.
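A minimal sketch of this accumulation, assuming events arrive as an ordered list and that an incomplete batch simply waits for more events (the function name and return layout are hypothetical):

```python
def split_into_mini_batches(events, q):
    """Return (full_batches, pending): batches of exactly q events plus leftovers."""
    if q < 1:
        raise ValueError("q must be at least 1")
    usable = len(events) - len(events) % q
    full_batches = [events[i:i + q] for i in range(0, usable, q)]
    pending = events[usable:]  # not enough events yet to trigger an update
    return full_batches, pending
```

Each full batch would trigger one round of β and topic updates; the pending events are carried over to the next call.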
Specifically, with ω fixed, the objective function of the β parameter update module 110 for the parameter β to be learned is as follows:
where x is the first event in the mini-batch and q is the number of events in the mini-batch.
To avoid traversing all previous citation events when updating β, the present invention uses a training window, so that only a small subset of the citation events needs to be considered when training β. If the width of the training window is W_t (1 ≤ W_t ≤ q), β can be learned by optimizing the following formula:
The present invention can also cache the link features of each node to further reduce the computational burden, as DEM does.
Because updating all topic proportions in ω at once would be extremely time-consuming, the topic proportion update module 111 updates the topic proportion ω_k of only one article at a time; while ω_k is being updated, the topic proportions {ω_i | i ≠ k} of the other articles are kept unchanged. Suppose that, in a mini-batch of size q, node k is cited in citation events e_1, e_2, ..., e_p and is not cited in events e_{p+1}, e_{p+2}, ..., e_q (note that e_2 does not necessarily occur before e_{p+2}, although its subscript is smaller).
The objective function f(ω_k) to be optimized, formula (3), is then:
where
Here β_l contains the first 8 elements of the parameter β (corresponding to the link features), β_t contains the last 50 elements of β (corresponding to the topic features), θ_i is the topic proportion of the citer in citation event e_i, the definitions also use the link features (the first 8 features) of node k at citation event e_i, and C_u is a constant that does not depend on ω_k.
The first-order and second-order partial derivatives of formula (3), formula (4), are as follows:
where I is the identity matrix.
The formulas above show that the Hessian matrix is positive definite (PD), so the function in (3) is convex. A solver can therefore be used directly to find the global optimum.
Preferably, in formula (4), A_i is much larger than [...] and [...], and p is relatively small in each batch. Likewise, B_u is much larger than [...] and [...], and (q - p) is relatively small. The second and third terms in (4) are therefore much smaller than the other two. This means the two smaller terms can be dropped to obtain an approximate gradient:
From this approximate gradient, an approximate objective function of (2), formula (5), can be recovered:
The present invention calls the OEM variant of formula (5) "approximate OEM" and the original OEM "full OEM". The experiments show that approximate OEM achieves accuracy close to that of full OEM in much less time.
Fig. 2 is a flow chart of the steps of the dynamic network analysis method based on an online egocentric model of the present invention. As shown in Fig. 2, the dynamic network analysis method based on an online egocentric model of the present invention comprises the following steps:
Step 201: on the basis of the dynamic egocentric model, build an objective function taking the parameter vector β to be learned and the topic proportions ω_k as variables.
In step 201, the objective function that is built is:
$$\min_{\beta,\,\omega}\; -\log L(\beta,\omega) + \lambda \sum_{k=1}^{n} \lVert \omega_k - \theta_k \rVert_2^2 \quad \text{subject to}\quad \omega_k \succeq 0,\; \mathbf{1}^{T}\omega_k = 1 \qquad (2)$$
where ω_k is the new topic proportion of node k to be learned and θ_k is the current topic proportion of node k. L(β, ω) is defined in the same way as L(β) in formula (1) of DEM, except that here both β and the topic proportions are variables (in L(β), only β is a variable and ω is a constant). ω_k ⪰ 0 means that every element of ω_k is non-negative, and 1 is a vector whose elements are all 1. These constraints ensure that all elements of ω_k are non-negative and sum to 1. λ is a hyperparameter that controls the weight between the two terms.
Step 202: after a new event or a series of new events occurs, use an alternating projection algorithm to alternately update the parameter vector β to be learned and the topic proportions, thereby obtaining the optimal solution of the objective function.
When a new event or a series of new events is observed, the second term in formula (2) ensures that the updated topic proportion ω_k does not move too far from the current topic proportion θ_k. In addition, the present invention uses the old β as the initial value when updating β.
Clearly, the optimization problem in formula (2) is not jointly convex in (β, ω). However, it can be shown that the objective function is convex in either variable when the other is fixed. The present invention therefore uses an alternating projection algorithm to find the optimal solution of the objective function. More specifically, each iteration fixes one of the two variables and updates the other. Step 202 further comprises the following steps (as shown in Fig. 3):
Step 301, online β step: with ω fixed, update the parameter β using Newton's method, initialized with the current β;
Step 302, online topic step: with β fixed, minimize formula (2) on the basis of the current topic proportion θ_k to obtain the updated topic proportion ω_k.
The above process is repeated until the termination condition is met.
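To make the online β step concrete, the sketch below runs Newton's method on a log partial likelihood with a single scalar feature. It assumes the partial-likelihood form in which the cited candidate's exp(β·s) is divided by the sum over all candidates; the scalar simplification, the function name and the data layout are assumptions made to keep the example short, whereas the source works with a 58-dimensional β.

```python
import math

def newton_beta_1d(events, beta=0.0, max_iter=20, tol=1e-10):
    """Maximize a one-parameter log partial likelihood by Newton's method.

    events: list of (cited, s) where s[i] is the scalar feature of candidate i
    and cited is the index of the candidate that was actually cited.
    """
    for _ in range(max_iter):
        gradient, hessian = 0.0, 0.0
        for cited, s in events:
            top = max(beta * x for x in s)
            weights = [math.exp(beta * x - top) for x in s]  # stable softmax weights
            norm = sum(weights)
            mean = sum(w * x for w, x in zip(weights, s)) / norm
            second = sum(w * x * x for w, x in zip(weights, s)) / norm
            gradient += s[cited] - mean         # observed minus expected feature
            hessian -= second - mean * mean     # minus the feature variance
        if hessian == 0.0:
            break  # all candidates identical: beta is not identifiable
        step = gradient / hessian
        beta -= step
        if abs(step) < tol:
            break
    return beta
```

Starting from the previous β, as the text describes, corresponds to passing it as the `beta` argument instead of the default 0.0.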
It should be noted that, each time a new article i appears, it could be added to the citation network immediately and the β parameter update module 110 and topic proportion update module 111 could then be run until convergence. For large-scale citation networks, however, this is very time-consuming. In the present invention, the update can therefore be deferred until a certain number of new articles have accumulated. This mini-batch technique not only saves computation time but also reduces the effect of noise. In the preferred embodiment of the present invention, one update is therefore performed after every q citation events rather than after every event. In the experiments, q is set to about 1500.
In the online β step, with ω fixed, the objective function for the parameter β to be learned is as follows:
where x is the first event in the mini-batch and q is the number of events in the mini-batch.
To avoid traversing all previous citation events when updating β, the present invention uses a training window, so that only a small subset of the citation events needs to be considered when training β. If the width of the training window is W_t (1 ≤ W_t ≤ q), β can be learned by optimizing the following formula:
The present invention can also cache the link features of each node to further reduce the computational burden, as DEM does.
Because updating all topic proportions in ω at once would be extremely time-consuming, the online topic step uses an algorithm that updates ω alternately. More specifically, the topic proportion ω_k of only one article is updated at a time; while ω_k is being updated, the topic proportions {ω_i | i ≠ k} of the other articles are kept unchanged. Suppose that, in a mini-batch of size q, node k is cited in citation events e_1, e_2, ..., e_p and is not cited in events e_{p+1}, e_{p+2}, ..., e_q (note that e_2 does not necessarily occur before e_{p+2}, although its subscript is smaller).
The objective function f(ω_k) to be optimized, formula (3), is then:
where
Here β_l contains the first 8 elements of the parameter β (corresponding to the link features), β_t contains the last 50 elements of β (corresponding to the topic features), θ_i is the topic proportion of the citer in citation event e_i, the definitions also use the link features (the first 8 features) of node k at citation event e_i, and C_u is a constant that does not depend on ω_k.
The first-order and second-order partial derivatives of formula (3), formula (4), are as follows:
where I is the identity matrix.
The formulas above show that the Hessian matrix is positive definite (PD), so the function in (3) is convex. A solver can therefore be used directly to find the global optimum.
Preferably, in formula (4), A_i is much larger than [...] and [...], and p is relatively small in each batch. Likewise, B_u is much larger than [...] and [...], and (q - p) is relatively small. The second and third terms in (4) are therefore much smaller than the other two. This means the two smaller terms can be dropped to obtain an approximate gradient:
From this approximate gradient, an approximate objective function of (2), formula (5), can be recovered:
The present invention calls the OEM variant of formula (5) "approximate OEM" and the original OEM "full OEM". The experiments show that approximate OEM achieves accuracy close to that of full OEM in much less time.
Because the learning algorithm guarantees that the value of the objective function decreases in every iteration, and the objective function value is always greater than or equal to 0, the algorithm of the present invention converges.
The advance made by the present invention is illustrated below by applying the prior-art DEM and the OEM of the present invention to two citation networks and comparing the experimental results of the two models. The evolution of the topic proportions of articles is also analyzed.
1. Data sets
Citation network analysis is one of the most important applications of dynamic network analysis. The experiments of the present invention use two citation network data sets, arXiv-TH and arXiv-PH. Both data sets were crawled from arXiv (http://snap.stanford.edu/data). The main information about the data sets is shown in Table 1.
Table 1: Data set information
The arXiv-TH data set is a collection of articles on high-energy physics theory. It spans 1993 to 1997 and has a very high time resolution (accurate to the millisecond). The arXiv-PH data set is a collection of articles on high-energy physics phenomenology. It spans 1993 to 1997 and its time is accurate to the day. Because the time resolution of the data sets is high, it can be assumed that every new article joins the network at a different time and that more than one citation event may occur at the same time. As mentioned in the previous section, the topic proportions and parameters are updated batch by batch. More specifically, the present invention splits each data set into mini-batches, each containing the citation events that occur within a period of time. Each mini-batch contains 100 timestamps for arXiv-TH and 20 for arXiv-PH. The corresponding number of events per mini-batch is about 1500.
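Grouping by a fixed number of distinct timestamps, rather than by a fixed number of events, can be sketched as follows. The event layout (timestamp, citer, cited) and the function name are hypothetical, and the events are assumed to be sorted by timestamp.

```python
def batches_by_timestamp(events, stamps_per_batch):
    """Group time-sorted events so each batch covers a fixed number of timestamps."""
    batches, current, seen = [], [], set()
    for event in events:
        timestamp = event[0]
        # A new timestamp arriving when the batch is full starts the next batch.
        if timestamp not in seen and len(seen) == stamps_per_batch:
            batches.append(current)
            current, seen = [], set()
        seen.add(timestamp)
        current.append(event)
    if current:
        batches.append(current)  # last batch may cover fewer timestamps
    return batches
```

All events sharing a timestamp stay in the same batch, so the number of events per batch varies while the number of timestamps does not.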
2. Baselines
The experiments compare the performance of the following 4 models:
(1) DEM: the original DEM with 8 link features and 50 topic features. Note that the original DEM is not online: its parameters and topic features are fixed after training.
(2) OEM-β: OEM with only the online β step. In this model β is updated over time but the topic features are not.
(3) OEM-full: full OEM with both the online β step and the online topic step. Both the topic features and the parameters change over time, and objective function (2) is used.
(4) OEM-appr: OEM with the online β step and the approximate topic step. Both the topic features and the parameters change over time, and objective function (5) is used.
3. Evaluation criteria
As with DEM, the present invention evaluates the above models with the following three criteria:
(1) Average held-out log-likelihood: taking the logarithm of the likelihood L(β) in formula (1) for each test citation event gives the held-out log-likelihood. Dividing the sum of the held-out log-likelihoods of all test events by the number of events in the batch gives the average held-out log-likelihood. The higher this value, the higher the test accuracy.
(2) Recall@K (recall of the top-K recommendation list): recall here is defined as the proportion of citation events that actually occur among the K most likely citation events, where K is a cut-off point.
(3) Average held-out normalized rank: the rank of a citation event is the position of that event in the sorted recommendation list. Dividing this rank by the number of possible citation events gives the normalized rank. The lower this value, the better the predictive performance.
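The two ranking-based criteria can be sketched for a single test batch as follows. The score dictionary, the function names and the tie handling are assumptions made for this example. Recall is computed in the conventional way (true events found in the top K divided by the number of true events); the wording of the definition could also be read as dividing by K.

```python
def recall_at_k(scores, true_items, k):
    """Fraction of the true items that appear among the k highest-scored items."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = set(ranked[:k]) & set(true_items)
    return len(hits) / len(true_items)

def normalized_rank(scores, true_item):
    """1-based position of true_item in the sorted list, divided by list length."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return (ranked.index(true_item) + 1) / len(ranked)
```

Averaging `normalized_rank` over all test events of a batch gives the average held-out normalized rank for that batch.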
4. Results and analysis
As in DEM, the present invention divides each data set into three parts: a building phase, a training phase and a testing phase. The building phase is mainly used to build up the statistics of the citation network. Its time range is usually long, in order to mitigate truncation effects (citation events before 1993 do not appear in the data set) and to avoid bias. In the training phase, the initial model parameters and topic features are trained. To show and compare the predictive performance of the models more thoroughly, the testing phase here is relatively long and is divided into 24 batches. Note that the statistics (link features) change dynamically in both the training phase and the testing phase. The data size of each phase (measured in number of citation events) is shown in Table 2.
Table 2: Building, training and testing phases of the data sets
To further reduce the training and testing time of OEM, only a randomly selected part of the citation events in each batch is used to optimize the topic proportions of the articles. For example, when optimizing the topic proportion of article i after the first batch arrives, 10% of the citers are selected at random (this 10% is called the citer percentage, here and below) rather than all citers. This speeds up the computation to some extent. In OEM, the hyperparameter λ is set to 0.1 and the citer percentage to 10% unless otherwise stated. The influence of the hyperparameters citer percentage and λ on the model is illustrated in the subsequent experiments.
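Citer subsampling can be sketched with the standard library. The rounding rule and the choice to keep at least one citer are assumptions, since the text only gives the percentage.

```python
import random

def sample_citers(citers, fraction, rng):
    """Pick a random subset of citers of roughly the given fraction."""
    if not citers:
        return []
    # Assumption: round to the nearest count and keep at least one citer.
    count = min(len(citers), max(1, round(len(citers) * fraction)))
    return rng.sample(citers, count)
```

Passing a seeded `random.Random` instance makes the subsampling reproducible across runs.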
The test procedure for OEM is as follows. First, an initial OEM is trained on the data of the building phase and the training phase. At this point the initial OEM is in fact equivalent to DEM. The predictive performance of this model is then evaluated on Batch 1 (note that the data of Batch 1 is not used during training). Next, the data of Batch 1 is absorbed as additional training data, and the parameters and features of OEM are updated. The updated OEM is then used to predict Batch 2, and so on. It follows that the data of a batch is never used for training before that batch is tested. The test results therefore faithfully reflect the generalization and predictive ability of OEM.
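The test-then-absorb procedure above can be sketched as a simple loop. The names `evaluate_online`, `score`, and `update` are hypothetical placeholders for this sketch; the model and the scoring and update routines are supplied by the caller and are not defined here.

```python
def evaluate_online(model, batches, score, update):
    """Test on each batch before training on it; return one score per batch."""
    results = []
    for batch in batches:
        # The batch is still unseen at this point, so the score measures generalization.
        results.append(score(model, batch))
        # Only after scoring is the batch absorbed as additional training data.
        model = update(model, batch)
    return results
```

Because scoring always precedes the update, no batch influences the model that is evaluated on it.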
Fig. 4(a) and (b) show the average test log-likelihood of all models. Since the initial OEM is equivalent to DEM, all models perform identically when tested on Batch 1. As time goes on, however, the predictive performance of DEM degrades severely, while that of the OEM variants does not. For example, Fig. 4(a) shows that the log-likelihood of DEM declines markedly over time, whereas that of OEM-β only drops from -8.24 to -8.97. OEM-full predicts better than both of these models, with a log-likelihood ranging from -7.89 to -8.38. OEM-appr drops from -8.24 to -8.56.
Fig. 4: (a) and (b) show the average test log-likelihood of the test citation events. (c) and (d) show the recall of the top-K recommendation list. (e) and (f) show the average test normalized rank. Since all models have the same initial parameters after the building and training phases, they perform identically on the first test batch, as can be seen in (a) to (f). (g) and (h) show the topic evolution of the two article sets cited at time points 8001 and 8005. To keep the plots uncluttered, only the few topics with the highest proportions are drawn.
Fig. 4(c) and (d) show the recall of the top-K recommendation list, with K = 250. The performance of DEM, OEM-β, and OEM-appr declines over time, whereas that of OEM-full does not. Although the predictive performance of OEM-appr also declines over time, it remains clearly better than that of DEM. The performance of OEM-β is similar to that of DEM and is unsatisfactory. This means that the topic features carry a large amount of information, and that updating β alone is far from sufficient. Similar results are obtained for other values of K; they are not discussed here for lack of space.
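For illustration, recall in the top-K recommendation list can be sketched as follows. This sketch assumes recall is measured as the fraction of test citation events whose actually cited article appears among the K highest-scoring candidates; the function name and the dictionary-based input format are assumptions of the sketch.

```python
def recall_at_k(events, k=250):
    """Fraction of test events whose true cited article is among the top-k candidates."""
    hits = 0
    for scores, true_id in events:
        # scores maps each candidate article id to its predicted score (higher is better).
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += true_id in top_k
    return hits / len(events)
```

Unlike the normalized rank, a higher value here indicates better predictive performance.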
Fig. 4(e) and (f) show the average test normalized rank. The performance of DEM and OEM-β does not improve over time, whereas that of OEM-full and OEM-appr does. Note that a lower rank value means better predictive ability. As above, the unsatisfactory results of OEM-β further demonstrate how important updating the topic features is for this evaluation metric. Because later batches contain more candidate citation events, the performance of DEM measured by absolute rank actually declines over time. OEM-full, however, prevents this decline even from the perspective of absolute rank. This is consistent with the results in Fig. 4(a), (b), (c), and (d).
Table 3 compares the computational cost of OEM and approximate OEM. As the table shows, although approximate OEM predicts slightly worse than full OEM, it saves 50% of the computation time.
Table 3: Computation time (seconds) of OEM-full and OEM-appr with λ = 0.1
Table 4: Average test log-likelihood with a citer percentage of 10%
Table 5: Average test log-likelihood with λ = 0.1
To study the effect of the hyperparameters (citer percentage and λ) on predictive performance, the present invention uses the arXiv-TH data set and computes the average test log-likelihood over all test batches for different values of the citer percentage and λ. The results are given in Table 4 and Table 5. Table 4 shows that 0.1 is the optimal value of λ. Table 5 shows that once the citer percentage exceeds 10%, predictive performance improves only slightly as the citer percentage increases, while the time cost increases greatly. This means that choosing 10% as the citer percentage is reasonable.
In summary, OEM is not sensitive to these hyperparameters.
The present invention selects two article sets from the arXiv-TH data set to illustrate the topic evolution of articles. To avoid clutter, the topic proportions within each article set are averaged, and only the average topic proportions are plotted. Since there are 50 topics in total, only the topics with the largest proportions are shown. Specifically, let $S_t = \{r_1, r_2, \dots, r_l\}$ denote the set of articles cited at time t (the articles in one set are cited by the same article). The average topic vector of the article set $S_t$ is then computed over these $l$ articles. $S_{8001}$ and $S_{8005}$ are chosen here as examples, as shown in Fig. 4(g) and (h).
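The averaging and topic selection used for the plots can be sketched as follows. This is an illustrative sketch only: it assumes each article's topic proportions are available as a list of equal length, and the function names and the default number of plotted topics are hypothetical.

```python
def average_topic_vector(topic_vectors):
    """Element-wise mean of the topic proportion vectors of one article set."""
    count = len(topic_vectors)
    num_topics = len(topic_vectors[0])
    return [sum(v[j] for v in topic_vectors) / count for j in range(num_topics)]


def top_topics(avg_vector, how_many=4):
    """Indices of the topics with the largest average proportion, largest first."""
    order = sorted(range(len(avg_vector)), key=lambda j: avg_vector[j], reverse=True)
    return order[:how_many]
```

Since each input vector sums to 1, the averaged vector also sums to 1 and can be read as a topic proportion vector for the whole set.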
Fig. 4(g) shows that the proportions of topic 7 and topic 46 decrease over time, whereas the proportions of topic 15 and topic 44 do the opposite. One explanation is that the article set cited at time point 8001 was originally about a certain subfield of physics, but as time went on, researchers in other subfields discovered the value of these articles. After being cited enough times by articles from other subfields, the topics of this article set begin to shift from the original topics (topic 7 and topic 46) to new topics (topic 15 and topic 44). The same thing can happen between fields such as statistics and psychology (the original fields) and fields such as machine learning (the new field). The topic evolution of the article set cited at time point 8005 ($S_{8005}$) is similar to that at time point 8001, as shown in Fig. 4(h).
In summary, the dynamic network analysis system and method based on an online egocentric model of the present invention models time-varying dynamic networks. By adaptively learning the model parameters and topic features over time, the present invention overcomes the shortcoming of DEM and avoids the degradation of accuracy over time from which DEM suffers. Experimental results on two real data sets show that the present invention achieves very good predictive performance in practical applications.
Although the experiments of the present invention are limited to article citation networks, the present invention, like DEM, is also applicable to other types of networks, and is not limited in this respect.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Any person skilled in the art may modify and vary the above embodiments without departing from the spirit and scope of the present invention. The scope of protection of the present invention shall therefore be as set out in the claims.
Claims (10)
1. A dynamic network analysis system based on an online egocentric model, comprising at least:
an objective function building module, which, on the basis of the dynamic egocentric model, builds an objective function with the parameter vector β to be learned and the topic proportions $\omega_k$ as variables;
an objective function minimization module, which, after a new event or a series of new events occurs, uses an alternating projection algorithm to alternately update the parameter vector β to be learned and the topic proportions $\omega_k$, thereby obtaining the optimal solution of the objective function; wherein the objective function is:
subject to: $\omega_k \succeq 0$, $\mathbf{1}^T \omega_k = 1$,
where $\omega_k$ is the new topic proportion of node k to be learned, $\theta_k$ is the current topic proportion of node k, $\omega_k \succeq 0$ means that every element of $\omega_k$ is non-negative, $\mathbf{1}$ is a vector whose elements are all 1, these constraints ensure that all elements of $\omega_k$ are non-negative and sum to 1, λ is a hyperparameter controlling the weight between the two terms, and n is the total number of nodes in the network.
2. The dynamic network analysis system based on an online egocentric model, characterized in that the objective function minimization module comprises:
a β parameter update module, which, with the topic proportions ω fixed, updates the parameter β to be learned using Newton's method;
a topic proportion update module, which, with β fixed, minimizes the objective function on the basis of the current topic proportion $\theta_k$ to obtain the updated topic proportion $\omega_k$.
3. The dynamic network analysis system based on an online egocentric model, characterized in that the β parameter update module and the topic proportion update module perform one update after every q citation events.
4. The dynamic network analysis system based on an online egocentric model, characterized in that, in the β parameter update module, with ω fixed, the objective function for the parameter β to be learned is as follows:
where x is the first event in the mini-batch, q is the number of events in the mini-batch, the mini-batch is the set of accumulated events, e is the index of each citation event, $i_e$ denotes the node at which event e is formed, $t_e$ denotes the time at which event e occurs, $Y_i(t_e)$ indicates whether node i is present in the network when the e-th citation event occurs, with $Y_i(t_e) = 1$ if it is present and $Y_i(t_e) = 0$ otherwise, $S_i(t_e)$ denotes the feature vector of node i at time $t_e$, β is the parameter vector to be learned, and $\beta^T$ is the transpose of the parameter vector.
5. The dynamic network analysis system based on an online egocentric model, characterized in that the topic proportion update module updates the topic proportion $\omega_k$ of only one article at a time; while $\omega_k$ is being updated, the topic proportions of the other articles $\{\omega_i \mid i \neq k\}$ remain unchanged.
6. A dynamic network analysis method based on an online egocentric model, comprising the following steps:
Step 1: on the basis of the dynamic egocentric model, building an objective function with the parameter vector β to be learned and the topic proportions $\omega_k$ as variables;
Step 2: after a new event or a series of new events occurs, using an alternating projection algorithm to alternately update the parameter vector to be learned and the topic proportions, thereby obtaining the optimal solution of the objective function; wherein the objective function is:
subject to: $\omega_k \succeq 0$, $\mathbf{1}^T \omega_k = 1$,
where $\omega_k$ is the new topic proportion of node k to be learned, $\theta_k$ is the current topic proportion of node k, $\omega_k \succeq 0$ means that every element of $\omega_k$ is non-negative, $\mathbf{1}$ is a vector whose elements are all 1, these constraints ensure that all elements of $\omega_k$ are non-negative and sum to 1, λ is a hyperparameter controlling the weight between the two terms, and n is the total number of nodes in the network.
7. The dynamic network analysis method based on an online egocentric model, characterized in that Step 2 comprises the following steps:
Step 1.1: with the topic proportions ω fixed, updating the parameter β to be learned using Newton's method;
Step 1.2: with β fixed, minimizing the objective function on the basis of the current topic proportion $\theta_k$ to obtain the updated topic proportion $\omega_k$;
repeating Step 1.1 and Step 1.2 until a termination condition is met.
8. The dynamic network analysis method based on an online egocentric model, characterized in that Step 2 performs one update after every q citation events.
9. The dynamic network analysis method based on an online egocentric model, characterized in that, in Step 1.1, with ω fixed, the objective function for the parameter β to be learned is as follows:
where x is the first event in the mini-batch, q is the number of events in the mini-batch, the mini-batch is the set of accumulated events, e is the index of each citation event, $i_e$ denotes the node at which event e is formed, $t_e$ denotes the time at which event e occurs, $Y_i(t_e)$ indicates whether node i is present in the network when the e-th citation event occurs, with $Y_i(t_e) = 1$ if it is present and $Y_i(t_e) = 0$ otherwise, $S_i(t_e)$ denotes the feature vector of node i at time $t_e$, β is the parameter vector to be learned, and $\beta^T$ is the transpose of the parameter vector.
10. The dynamic network analysis method based on an online egocentric model, characterized in that Step 1.2 updates the topic proportion $\omega_k$ of only one article at a time; while $\omega_k$ is being updated, the topic proportions of the other articles $\{\omega_i \mid i \neq k\}$ remain unchanged.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310280241.1A CN103345581B (en) | 2013-07-04 | 2013-07-04 | Dynamic network analysis system and method based on an online egocentric model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103345581A CN103345581A (en) | 2013-10-09 |
CN103345581B true CN103345581B (en) | 2016-12-28 |
Family
ID=49280376
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310280241.1A Active CN103345581B (en) | Dynamic network analysis system and method based on an online egocentric model | 2013-07-04 | 2013-07-04 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103345581B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105629728B (en) * | 2015-12-23 | 2018-07-17 | 辽宁石油化工大学 | The modeling method of complex dynamic network and the design method of model controller |
US20180129937A1 (en) * | 2016-11-04 | 2018-05-10 | Salesforce.Com, Inc. | Quasi-recurrent neural network |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103034626A (en) * | 2012-12-26 | 2013-04-10 | 上海交通大学 | Emotion analyzing system and method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013086048A1 (en) * | 2011-12-05 | 2013-06-13 | Visa International Service Association | Dynamic network analytic system |
Non-Patent Citations (2)
Title |
---|
Dynamic Egocentric Models for Citation Networks; David Hunter, Padhraic Smyth, Duy Q. Vu and Arthur U. Asuncion; Proceedings of the 28th International Conference on Machine Learning; 20111231; pp. 857-864, abstract, sections 2.1 and 2.2 * |
On-line LDA Adaptive Topic Models for Mining Text Streams with Applications to Topic; Loulwah AlSumait, Daniel Barbará, Carlotta Domeniconi; Proceeding ICDM 08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining; 20081215; pp. 3-12, abstract, sections 3.1 and 3.3 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |