CN104899657A - Method for predicting association fusion events - Google Patents


Info

Publication number
CN104899657A
Authority
CN
China
Prior art keywords
corporations
delta
index
data
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510314273.8A
Other languages
Chinese (zh)
Inventor
唐晓晟
李巍
胡铮
张诗悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN201510314273.8A priority Critical patent/CN104899657A/en
Publication of CN104899657A publication Critical patent/CN104899657A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for predicting association fusion events, comprising: step one, dividing the raw network data according to a preset time slice and selecting the data of several time slices as training data; step two, dividing the training data into static associations and dynamic associations; step three, extracting key factor indices between any two associations based on the training data; and step four, performing supervised training on the key factor indices and determining, from the learning result of the supervised training, whether the two associations will fuse. The method predicts fusion events between any two associations or among multiple associations with high reliability, and is applicable to the great majority of weighted or unweighted networks.

Description

Method for predicting community fusion events
Technical field
The present invention relates to the field of data mining, and in particular to a method for predicting community fusion events.
Background art
Complex networks are ubiquitous in everyday life; their common features are large scale and complex structure. A social network, for example, is a typical complex network formed by human relationships: its nodes represent network users or people in real life, and the connections between nodes represent friend relationships between network users or real-world interpersonal relationships. The structure formed by the nodes of a social network and the connections between them is called the network topology, and it exhibits different characteristics in social networks of different types and at different stages.
A community is a kind of subnetwork that reflects important characteristics of a complex network. Communities likewise have a network topology, and community structure exhibits different characteristics as communities evolve and key events occur. Key events in a complex network reflect the behavioral tendencies of particular groups. For example, communities in a social network represent various social circles, such as circles of friends, relatives, or colleagues. The emergence of these communities may signal the formation of certain interest factors or social factors. Predicting key events helps to uncover and exploit these factors in advance and to further guide network behavior. The prediction of key events in community evolution is therefore significant for both research and applications.
Key events in community evolution include the extinction, birth, contraction, expansion, splitting, and fusion of communities. Some research exists on predicting such events, but it is limited to predicting the evolution trend of a single community. A community fusion event involves multiple communities; previous work only predicts whether a single community tends to fuse, and offers no explicit method for predicting which communities will fuse with one another within a future period.
In summary, a method is urgently needed for predicting in more detail which communities will fuse.
Summary of the invention
One of the technical problems to be solved by the present invention is to provide a method for predicting in more detail which communities will fuse.
To solve the above technical problem, an embodiment of the present application provides a method for predicting community fusion events, comprising: step one, splitting the raw network data according to a set time slice and selecting the data of multiple time slices as training data; step two, performing static community and dynamic community division on the training data; step three, extracting key factor indices between any two communities based on the training data; step four, performing supervised training on the key factor indices, and judging from the learning result of the supervised training whether any two communities will fuse.
Preferably, the key factor indices comprise the inner structure index between the two communities, the first-order change index and second-order change index of the inner structure index, and the external structure similarity index of the two communities.
Preferably, the inner structure index between the two communities is extracted using the following expression:
$$B_d(i,j) = \frac{E_{i,j}}{\left(E_i/N_i + E_j/N_j\right)/2}$$
where $B_d(i,j)$ is the inner structure index between community i and community j, $E_{i,j}$ is the number of connections between community i and community j, $E_i$ and $E_j$ are the numbers of connections inside community i and community j respectively, and $N_i$ and $N_j$ are the numbers of nodes inside community i and community j respectively.
Preferably, the external structure similarity index of the two communities is extracted using the following expression:
$$\mathrm{Sim}(i,j) = \frac{\sum_{k=1,\,k\neq i,j}^{m} \left(w_{i,k} \times w_{j,k}\right)}{\sum_{k=1,\,k\neq i,j}^{m} w_{i,k}^{2} + \sum_{k=1,\,k\neq i,j}^{m} w_{j,k}^{2} - \sum_{k=1,\,k\neq i,j}^{m} \left(w_{i,k} \times w_{j,k}\right)}$$
where $\mathrm{Sim}(i,j)$ is the external structure similarity index of community i and community j; $w_{i,k}$ and $w_{j,k}$ are the weights between community i and community k and between community j and community k respectively, with $w_{i,k} = E_{i,k}^{2}/(N_i \times N_k)$ and $w_{j,k} = E_{j,k}^{2}/(N_j \times N_k)$; $E_{i,k}$ and $E_{j,k}$ are the numbers of connections between community i and community k and between community j and community k respectively; $N_i$, $N_j$ and $N_k$ are the numbers of nodes inside community i, community j and community k respectively; and m is the number of communities.
Preferably, step four comprises: building a prediction model from the key factor indices obtained from the training data and determining the boundary value at which communities fuse; substituting the key factor indices obtained from the data of the time slice nearest to the time point to be predicted into the prediction model, and comparing the prediction result with the boundary value to judge whether the communities will fuse.
Preferably, the prediction model is built using the following expression:
$$Td^{\,t=t_0} = P_B\!\left(R(i,j)^{t=t_0}=1 \,\middle|\, B_d(i,j)^{t=t_0-\Delta t}\right) \times \left(1+\log_{1+\max\left|\Delta B_d(i,j)^{t=t_0-\Delta t}\right|}\left(1+\Delta B_d(i,j)^{t=t_0-\Delta t}\right)\right) \times \left(1+\log_{1+\max\left|\Delta\Delta B_d(i,j)^{t=t_0-\Delta t}\right|}\left(1+\Delta\Delta B_d(i,j)^{t=t_0-\Delta t}\right)\right) + \mathrm{Sim}(i,j)^{t=t_0-\Delta t}$$
where $Td^{\,t=t_0}$ is the tendency degree for fusion between community i and community j; $P_B\!\left(R(i,j)^{t=t_0}=1 \mid B_d(i,j)^{t=t_0-\Delta t}\right)$ is the probability simulation function; $\Delta B_d(i,j)^{t=t_0-\Delta t}$, $\Delta\Delta B_d(i,j)^{t=t_0-\Delta t}$ and $\mathrm{Sim}(i,j)^{t=t_0-\Delta t}$ are respectively the first-order change index and second-order change index of the inner structure index between community i and community j, and their external structure similarity index; $t_0$ and $t_0-\Delta t$ denote different time points, and $\Delta t$ is the time interval.
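As a non-limiting illustration, the tendency degree above can be evaluated as sketched below. The probability term is passed in as a plain number because its fitting is not specified at this point; the function names, the handling of a zero maximum, and the sample values are assumptions made only for this sketch.

```python
import math


def log_term(change, max_abs_change):
    """1 + log_{1 + max|change|}(1 + change); requires change > -1."""
    if max_abs_change == 0:
        # No change observed for any pair: treat the factor as neutral (assumption).
        return 1.0
    return 1.0 + math.log(1.0 + change, 1.0 + max_abs_change)


def tendency(p_fuse, d1, d2, sim, max_abs_d1, max_abs_d2):
    """Tendency degree Td for one community pair.

    p_fuse: value of the probability term P_B for this pair
    d1, d2: first- and second-order change of the inner structure index
    sim: external structure similarity index
    max_abs_d1, max_abs_d2: largest |d1| and |d2| over all pairs
    """
    return p_fuse * log_term(d1, max_abs_d1) * log_term(d2, max_abs_d2) + sim


# Hypothetical values for a single pair.
td = tendency(p_fuse=0.5, d1=3.0, d2=0.0, sim=0.1, max_abs_d1=3.0, max_abs_d2=1.0)
```

Each logarithmic factor equals 1 when the corresponding change is 0 and reaches 2 when the change equals its maximum, so the change indices scale the probability term rather than replace it.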
Preferably, determining the boundary value at which communities fuse comprises: normalizing the tendency values predicted by the prediction model; building a reference function from the normalized tendency values and the community fusion outcomes extracted from the training data; and taking the tendency value at which the reference function reaches its maximum as the boundary value for community fusion.
Preferably, the reference function is built according to the following expression:
$$F = \frac{2\alpha\beta}{\alpha+\beta}$$
where F is the reference function, α and β are parameters, and $TD_0$ is the tendency value corresponding to the community pairs that fused, as extracted from the training data.
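As a non-limiting illustration, the selection of the boundary value can be sketched as follows. The text does not define α and β at this point, so the sketch assumes they are the precision and recall of the "fuse" prediction at a candidate boundary; the min-max normalization and all function names are likewise assumptions.

```python
def normalize(values):
    """Min-max normalize tendency values to [0, 1] (an assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def f_score(alpha, beta):
    """Reference function F = 2 * alpha * beta / (alpha + beta)."""
    return 0.0 if alpha + beta == 0 else 2 * alpha * beta / (alpha + beta)


def best_boundary(tendencies, fused):
    """Return (boundary, F) for the candidate boundary that maximizes F.

    tendencies: normalized tendency values, one per community pair
    fused: observed outcomes from the training data (True if the pair fused)
    alpha and beta are taken to be precision and recall (assumption).
    """
    best = (None, -1.0)
    for th in sorted(set(tendencies)):
        predicted = [t >= th for t in tendencies]
        tp = sum(p and f for p, f in zip(predicted, fused))
        fp = sum(p and not f for p, f in zip(predicted, fused))
        fn = sum((not p) and f for p, f in zip(predicted, fused))
        alpha = tp / (tp + fp) if tp + fp else 0.0
        beta = tp / (tp + fn) if tp + fn else 0.0
        score = f_score(alpha, beta)
        if score > best[1]:
            best = (th, score)
    return best


# Hypothetical tendency values and fusion outcomes for five community pairs.
td_values = normalize([0.2, 0.5, 3.0, 4.0, 0.1])
boundary, score = best_boundary(td_values, [False, False, True, True, False])
```

A pair whose normalized tendency value is at or above `boundary` would then be predicted to fuse.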
Preferably, step four comprises: feeding the vectors formed from the key factor indices obtained from the training data into an SVM prediction model for training, so as to determine a classifier for community fusion; feeding the vector of key factor indices obtained from the data of the time slice nearest to the time point to be predicted into the SVM prediction model, and judging from the resulting classification whether the communities will fuse.
Preferably, before step three the method further comprises: predicting each community separately based on the static communities and the dynamic communities to obtain the set of communities that will take part in fusion; in step three, the key factor indices between any two communities in that set are extracted based on the training data.
Compared with the prior art, one or more embodiments of the above scheme may have the following advantages or beneficial effects:
By extracting key factor indices between two communities, the method predicts whether any two communities, or multiple communities, will fuse. Its predictions are highly reliable, and it is generally applicable to the analysis of the great majority of weighted or unweighted complex networks.
Other advantages, objects, and features of the present invention will be set forth in part in the description that follows, and in part will become apparent to those skilled in the art upon examination of the following, or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by the structure particularly pointed out in the description, the claims, and the accompanying drawings.
Brief description of the drawings
The accompanying drawings provide a further understanding of the technical scheme of the present application or of the prior art, and form part of the description. The drawings illustrating embodiments of the present application serve, together with the embodiments, to explain the technical scheme of the application, but do not limit it.
Fig. 1 is a schematic flowchart of the method for predicting community fusion events according to an embodiment of the present application;
Fig. 2 shows the cumulative distribution curves of the inner structure index;
Fig. 3 shows the cumulative distribution curves of the external structure similarity index;
Fig. 4 is a schematic flowchart of supervised training using the key factor indices according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of the method for predicting community fusion events according to an embodiment of the present application.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below with reference to the drawings and examples, so that how the invention applies technical means to solve technical problems and achieve the corresponding technical effects can be fully understood and implemented. The embodiments of the present application and the features in them may be combined with one another provided they do not conflict, and the resulting technical schemes all fall within the protection scope of the present invention.
In addition, the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
Some terms of the field used in this application are explained below:
Network topology: the structure formed by the nodes in a network and the connections between them.
Network community: complex networks are studied with mathematical tools such as graph theory. Given a graph G, a network community is a subgraph G' whose nodes are closely connected. The most intuitive quantitative characterization of community structure is that the density of connections inside a community is greater than the density of its connections to the outside.
Time slice: a snapshot of the network taken at a fixed time point, i.e., a slice of the continuously evolving network at that time point.
Static community: a community obtained by partitioning the network within a single time slice.
Dynamic community: the evolution path formed by linking, in chronological order, the static communities obtained in a series of time slices.
Community fusion event: the nodes of two or more communities are detected as belonging to a single community at some future time.
Supervised training: iterative computation on the input and output data of a training set to obtain a model for classification, prediction, and the like.
Training data: the historical data used to train the model.
Fig. 1 is a schematic flowchart of the method for predicting community fusion events according to an embodiment of the present application. The method comprises: step S110, splitting the raw network data according to a set time slice and selecting the data of multiple time slices as training data; step S120, performing static community and dynamic community division on the training data; step S130, extracting key factor indices between any two communities based on the training data; step S140, performing supervised training on the key factor indices, and judging from the learning result of the supervised training whether any two communities will fuse.
Based on the idea of node-level link prediction, the present invention proposes a community-level link prediction method for judging community fusion events. Node-level link prediction uses information such as the known network structure to predict the likelihood that a connection will form between any two unconnected nodes of the network within some future period. Specifically, a moment before and close to the time point to be predicted is chosen and defined as the current time t0; a time point t' before t0 (t' < t0) is chosen; the raw data in the time range [t', t0], together with the index data extracted from the raw data in the time range [t', t0), are used as training data; and the raw data in that time range are split into time slices using the selected time interval.
In an embodiment of the present application, the raw data are screened so that historical data close to the time point to be predicted are chosen as far as possible. This is because real network behavior usually shows short-term regularity: the state a network presents at some future moment is mainly influenced by factors in the period close to that moment. Therefore, the raw data of the 4 to 7 time slices preceding the time point to be predicted are generally taken as training data. Limiting the range of the raw data in this way avoids introducing large amounts of invalid data that would degrade the prediction, and ensures the validity of the data. For example, to predict the establishment of mutual-follow relationships on Sina Weibo, the time interval used for splitting can be chosen freely within the range of 3 to 20 days. Specifically, the time point to be predicted is May 15, 2013, and the raw data comprise mutual-follow relationship data covering 33,000 user IDs and 3.65 million user-to-user relationships, namely the formation times of Sina Weibo mutual-follow relationships. The raw data span November 2010 to November 2013; the data from March 15, 2013 to May 1, 2013 are selected from the raw data as training data, with the time interval set to Δt = 15 days. May 1, 2013, one time interval before the time point to be predicted, is set as the current time t0. Splitting the training data in the above time range with Δt yields the time slices May 1, 2013, April 15, 2013, April 1, 2013 and March 15, 2013.
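As a non-limiting illustration, the time slice segmentation can be sketched as follows. The sketch assumes each raw record is a pair of users plus the date the relationship formed, and that the snapshot at a time point contains every relationship formed up to that date; the function names and sample records are hypothetical. With exact 15-day steps back from May 1, 2013 the computed slice dates are April 16, April 1 and March 17, which differ by a day or two from the half-month dates listed above.

```python
from datetime import date, timedelta


def slice_points(t0, delta_days, count):
    """Return `count` snapshot dates t0, t0 - dt, t0 - 2*dt, ... (newest first)."""
    step = timedelta(days=delta_days)
    return [t0 - k * step for k in range(count)]


def snapshot(edges, when):
    """Relationships (u, v) that already exist at snapshot date `when`."""
    return [(u, v) for u, v, formed_on in edges if formed_on <= when]


# Hypothetical mutual-follow records: (user, user, date the relationship formed).
edges = [
    ("a", "b", date(2013, 3, 10)),
    ("b", "c", date(2013, 4, 5)),
    ("c", "d", date(2013, 4, 28)),
]
points = slice_points(date(2013, 5, 1), delta_days=15, count=4)
snapshots = {p: snapshot(edges, p) for p in points}
```

Each entry of `snapshots` is then the network on which static communities are divided for that time slice.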
Next, static communities and dynamic communities are mined from the selected training data. The purpose of static community division is to determine the community state of each time slice. The static division is used for the subsequent extraction of dynamic communities and also facilitates extracting the parameter relationships between communities.
In an embodiment of the present application, the Fast-Unfolding algorithm is used to mine static communities. Fast-Unfolding is a modularity-based static community mining algorithm. It exploits the self-similarity of complex networks and treats the communities produced in each pass hierarchically, so that a complete hierarchical community structure can be obtained in a very short time. Its output contains the community result after each iteration, so that the result after a particular iteration can be selected as needed. The algorithm proceeds as follows. First, each node in the network is regarded as an independent community and numbered. Then, for each node, each of its neighbor nodes is considered: the modularity obtained by removing the node from its own community and placing it in the community of the neighbor is calculated, and by comparing these values the node is moved into the community that yields the largest gain in modularity. This process is repeated over the nodes in sequence until modularity can no longer be improved. Next, the communities found are regarded as the nodes of a new network, and the iteration above continues on it. The weight between two new nodes is the sum of the weights of the connections between the communities they represent, and connections between nodes within the same community form a self-loop on the corresponding node of the new network. The iteration stops when modularity no longer increases, that is, when the community structure of the network no longer changes. Fast-Unfolding is easy to apply, and because its complexity is linear, it runs very fast.
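As a non-limiting illustration, the quantity that the algorithm tries to increase at each move can be computed as sketched below. This is only the standard Newman modularity of a given partition, not the full Fast-Unfolding procedure; the function name and the toy graph are assumptions.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight); community_of: dict node -> community id.
    Q = sum over communities c of  inside_c / m - (degree_c / (2 * m)) ** 2,
    where m is the total edge weight.
    """
    m = 0.0
    inside = defaultdict(float)  # weight of edges inside each community
    degree = defaultdict(float)  # summed node degrees of each community
    for u, v, w in edges:
        m += w
        cu, cv = community_of[u], community_of[v]
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            inside[cu] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single edge.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {n: "A" for n in range(6)}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

Here `q_split` exceeds `q_merged`, so a modularity-based method keeps the two triangles as separate communities.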
It should be noted that other embodiments of the application may use other modularity-based static community mining methods, such as the Newman greedy algorithm or the Newman fast algorithm. Static community division yields the set of static communities corresponding to each time slice.
The occurrence of key evolution events gives the structure of a community different states in different time slices. A dynamic community represents the set of states of a given community over a series of time slices, with the states ordered along the evolution path of the dynamic community. Dynamic communities are generally mined from the results of static community division. Specifically, for any two adjacent time slices t and t+1, the static communities $C_t$ of time slice t are matched against the static communities $C_{t+1}$ of time slice t+1, and any $C_{t+1}$ that meets a given similarity condition is added to the dynamic community evolution sequence containing $C_t$; repeating this process extracts the evolution paths of the dynamic communities.
In an embodiment of the present application, the Louvain algorithm is used to mine dynamic communities. When initializing the set of dynamic communities, the algorithm creates, for each community in the static community set of the earliest time slice in the training data, a dynamic community evolution sequence with that community as its initial community. When two static communities are matched, the Jaccard similarity coefficient is used to judge whether they match. When the Jaccard similarity coefficient is greater than the selected matching threshold (for example 0.3), the two communities are considered to match, and the matched community is appended to the end of the dynamic community evolution sequence that contains its matching community. When the Jaccard similarity coefficient is less than or equal to the selected matching threshold, the two communities are considered not to match, and a new evolution sequence is created for the unmatched community. If two or more communities match the same evolution sequence, a new evolution sequence is generated whose members before the matching moment are identical to those of the existing sequence.
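As a non-limiting illustration, one matching step between adjacent time slices can be sketched as follows. Communities are represented as sets of node IDs and evolution sequences as lists of such sets; the function names and this data layout are assumptions made for the sketch.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient |a & b| / |a | b| of two node sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def extend_timelines(timelines, next_communities, threshold=0.3):
    """Match the communities of the next time slice against existing sequences.

    timelines: list of evolution sequences, each a list of node sets.
    A community that matches no sequence starts a new one. If a sequence
    has already been extended in this step, a further match creates a
    branch that shares the same earlier history.
    """
    updated = [list(t) for t in timelines]
    for community in next_communities:
        matched = False
        for index, old in enumerate(timelines):
            if jaccard(old[-1], community) > threshold:
                matched = True
                if len(updated[index]) == len(old):
                    updated[index].append(set(community))
                else:
                    updated.append(list(old) + [set(community)])
        if not matched:
            updated.append([set(community)])
    return updated


# Two sequences whose latest communities both match one larger community.
result = extend_timelines(
    [[{1, 2, 3, 4}], [{5, 6, 7}]],
    [{1, 2, 3, 4, 5, 6, 7}, {8, 9}],
)
```

In this example both existing sequences end in the same community after the step, which is the situation later treated as a fusion, and the unmatched community {8, 9} starts a third sequence.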
For example, community mining on the Sina Weibo training data yields static community data files named "merge_yyyymmdd.comm", with the data of each time slice stored in its own data file. Each line of such a file represents one community and consists of the IDs of the nodes belonging to that community. It also yields dynamic community data files named "merge_yyyyddmm_4/15.timeline", where 4 indicates that the timeline with t0 as the current time covers 4 time slices in total, and 15 indicates that the time slice interval Δt is 15 days. The data in this file take the form "dynamic community ID 1: time slice ID 1 = static community ID 1, time slice ID 2 = static community ID 2, ...".
It should be noted that other embodiments of the application may use different dynamic community extraction methods, such as the Louvain algorithm or the FEDN algorithm. From the mined static and dynamic communities, the community fusion events at time t0 can be extracted easily. Specifically, if two different dynamic communities $D_i$ and $D_j$ at time slice t0 - Δt are matched to the same community at time slice t0, the two dynamic communities are said to have fused. Prediction indices can likewise be extracted easily from the training data based on the mined communities.
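As a non-limiting illustration, fusion events can be read off the dynamic community records as sketched below. The in-memory layout (dynamic community ID mapped to a dictionary from time slice label to static community ID) mirrors the timeline format described above but is an assumption, as are the function name and the sample IDs.

```python
from itertools import combinations


def fused_pairs(timelines, previous_slice, current_slice):
    """Pairs of dynamic communities that were different static communities at
    `previous_slice` and map to the same static community at `current_slice`."""
    pairs = []
    for (di, ti), (dj, tj) in combinations(sorted(timelines.items()), 2):
        if previous_slice not in ti or previous_slice not in tj:
            continue
        if current_slice not in ti or current_slice not in tj:
            continue
        was_distinct = ti[previous_slice] != tj[previous_slice]
        now_same = ti[current_slice] == tj[current_slice]
        if was_distinct and now_same:
            pairs.append((di, dj))
    return pairs


# Hypothetical timelines: dynamic community ID -> {time slice: static community ID}.
timelines = {
    1: {"t0-dt": 10, "t0": 20},
    2: {"t0-dt": 11, "t0": 20},
    3: {"t0-dt": 12, "t0": 21},
}
fused = fused_pairs(timelines, "t0-dt", "t0")
```

The resulting pairs supply the fusion labels against which the prediction indices are later trained.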
Prediction indices should capture as much as possible of the regularities contained in the training data. In the prior art, prediction indices are extracted for single communities, so a prediction based on them can only say whether a single community will take part in a fusion event, not which communities will fuse with each other. To solve this technical problem, the present invention extracts prediction indices for every pair of communities in the network; the extracted indices relate to the network topology of the pair, and the prediction result is whether the pair will fuse.
Specifically, in the embodiments of the present application, several prediction indices that affect whether two communities will fuse are extracted and collectively defined as key factor indices. According to how they influence fusion, they are divided into direct factor indices and indirect factor indices. The direct factor indices comprise the inner structure index between two communities and the first-order and second-order change indices of the inner structure index; the indirect factor index is the external structure similarity index of the two communities. They are introduced in turn below.
The inner structure index $B_d$ between community i and community j is defined as:
$$B_d(i,j) = \frac{E_{i,j}}{\left(E_i/N_i + E_j/N_j\right)/2} \tag{1}$$
where $E_{i,j}$ is the number of connections between community i and community j, $E_i$ and $E_j$ are the numbers of internal connections of community i and community j respectively, and $N_i$ and $N_j$ are the numbers of internal nodes of community i and community j respectively.
Specifically, a preliminary judgment follows from the definition of a community and its network structure: for two given communities, the more connections there are between them, the more likely they are to fuse; and the lower the internal density of the communities, the more likely they are to fuse.
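As a non-limiting illustration, the inner structure index of expression (1) can be computed for an unweighted network as sketched below. The function name, the edge-list representation, and the handling of communities with no internal connections are assumptions.

```python
def inner_structure_index(edges, community_i, community_j):
    """B_d(i, j) = E_ij / ((E_i / N_i + E_j / N_j) / 2) for an unweighted graph.

    edges: iterable of (u, v); community_i, community_j: disjoint node collections.
    """
    ci, cj = set(community_i), set(community_j)
    e_i = e_j = e_ij = 0
    for u, v in edges:
        if u in ci and v in ci:
            e_i += 1
        elif u in cj and v in cj:
            e_j += 1
        elif (u in ci and v in cj) or (u in cj and v in ci):
            e_ij += 1
    density = (e_i / len(ci) + e_j / len(cj)) / 2
    if density == 0:
        # Neither community has internal connections (edge case, assumed handling).
        return float("inf") if e_ij else 0.0
    return e_ij / density


# Toy graph: two triangles joined by one edge, then by two edges.
triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
b_one = inner_structure_index(triangles + [(2, 3)], {0, 1, 2}, {3, 4, 5})
b_two = inner_structure_index(triangles + [(2, 3), (1, 4)], {0, 1, 2}, {3, 4, 5})
```

Adding a second connection between the two triangles doubles the index, in line with the judgment that more connections between communities indicate a higher likelihood of fusion.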
Fig. 2 shows the cumulative distribution curves of the inner structure index. At any chosen time point t0 - Δt, the inner structure index between every pair of communities in that time slice is extracted, and the fusion outcome of each pair at time point t0 is observed. The cumulative distribution function (CDF) curve of the inner structure index is then drawn separately for the community pairs that fused and for those that did not. In the figure, curve 1 is the cumulative distribution curve of the inner structure index of community pairs that did not fuse, and curve 2 is that of community pairs that fused. As Fig. 2 shows, the values of the inner structure index for fusing pairs are concentrated in the range 1 to 15, while those for non-fusing pairs are concentrated in the range 0 to 0.5. This shows that the inner structure index can effectively distinguish fusing communities from non-fusing communities.
Further, a larger value of $B_d$ indicates a higher likelihood that the communities will fuse. The first-order change $\Delta B_d$ represents the rate at which $B_d$ grows over time: the faster the growth, the more evident the trend toward fusion. The same reasoning applies to the second-order change $\Delta\Delta B_d$. The first-order change index $\Delta B_d$ and the second-order change index $\Delta\Delta B_d$ are therefore defined from $B_d$, as shown in expressions (2) and (3):
$$\Delta B_d(i,j)^{t_0-\Delta t} = B_d(i,j)^{t_0-\Delta t} - B_d(i,j)^{t_0-2\Delta t} \tag{2}$$
$$\Delta\Delta B_d(i,j)^{t_0-\Delta t} = \Delta B_d(i,j)^{t_0-\Delta t} - \Delta B_d(i,j)^{t_0-2\Delta t} \tag{3}$$
where $B_d(i,j)^{t_0-\Delta t}$, $\Delta B_d(i,j)^{t_0-\Delta t}$ and $\Delta\Delta B_d(i,j)^{t_0-\Delta t}$ denote, respectively, the inner structure index between community i and community j at time $t_0-\Delta t$ and its first-order and second-order change indices. The second-order change index can be further written in the form of expression (4):
$$\Delta\Delta B_d(i,j)^{t_0-\Delta t} = \Delta B_d(i,j)^{t_0-\Delta t} - \Delta B_d(i,j)^{t_0-2\Delta t} = B_d(i,j)^{t_0-\Delta t} - 2B_d(i,j)^{t_0-2\Delta t} + B_d(i,j)^{t_0-3\Delta t} \tag{4}$$
Therefore, in the embodiments of the present application, using the direct factor indices above for prediction requires the data of four time slices as training data: $t_0$, $t_0-\Delta t$, $t_0-2\Delta t$ and $t_0-3\Delta t$.
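As a non-limiting illustration, the two change indices can be computed from three consecutive values of the inner structure index as sketched below; the function name and the sample values are hypothetical.

```python
def change_indices(b_t1, b_t2, b_t3):
    """First- and second-order change of B_d at t0 - dt.

    b_t1, b_t2, b_t3: B_d(i, j) at t0 - dt, t0 - 2*dt and t0 - 3*dt.
    """
    first = b_t1 - b_t2          # expression (2) at t0 - dt
    first_prev = b_t2 - b_t3     # the same difference one slice earlier
    second = first - first_prev  # expression (3)
    return first, second


# Hypothetical values of B_d for one community pair over three time slices.
first, second = change_indices(2.0, 1.0, 0.5)
expanded = 2.0 - 2 * 1.0 + 0.5   # expression (4) written out
```

The value `second` equals `expanded`, which is the identity stated in expression (4). The fourth time slice, t0, supplies the observed fusion outcome rather than an index value.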
The indirect factor index is extracted by applying a method for measuring the local structural similarity between nodes. The index for predicting the likelihood that a connection will form between two nodes $n_i$ and $n_j$ is the structural similarity between them, as shown in expression (5):
$$\mathrm{Sim}(n_i,n_j) = \frac{\vec{n}_i \cdot \vec{n}_j}{|\vec{n}_i|^{2} + |\vec{n}_j|^{2} - \vec{n}_i \cdot \vec{n}_j} = \frac{\sum_{k=1}^{m} \left(w_{ik} \times w_{jk}\right)}{\sum_{k=1}^{m} w_{ik}^{2} + \sum_{k=1}^{m} w_{jk}^{2} - \sum_{k=1}^{m} \left(w_{ik} \times w_{jk}\right)} \tag{5}$$
where $w_{ij}$ is the weight of the connection between node $n_i$ and node $n_j$; in an unweighted network this weight is 1. If there is no connection between node $n_i$ and node $n_j$, the value of $w_{ij}$ is 0.
In the embodiments of the present application, the external structure similarity index between two communities is extracted by defining weights between communities. Specifically, the weight $w_{i,j}$ between community i and community j is given by expression (6):
$$w_{i,j} = \frac{E_{i,j}^{2}}{N_i \times N_j} \tag{6}$$
where $E_{i,j}$ is the number of connections between community i and community j, and $N_i$ and $N_j$ are the numbers of nodes inside community i and community j respectively.
The weight $w_{i,j}$ measures the strength of the relationship between two communities while ignoring their specific internal structure. The more connections there are between two communities, the larger the weight; for a fixed number of connections between them, the fewer nodes the communities themselves have, the larger the weight.
The resulting external structure similarity index between community i and community j is shown in expression (7), where m denotes the number of communities:
$$\mathrm{Sim}(i,j) = \frac{\sum_{k=1,\,k\neq i,j}^{m} \left(w_{i,k} \times w_{j,k}\right)}{\sum_{k=1,\,k\neq i,j}^{m} w_{i,k}^{2} + \sum_{k=1,\,k\neq i,j}^{m} w_{j,k}^{2} - \sum_{k=1,\,k\neq i,j}^{m} \left(w_{i,k} \times w_{j,k}\right)}, \qquad w_{i,k} = \frac{E_{i,k}^{2}}{N_i \times N_k}, \quad w_{j,k} = \frac{E_{j,k}^{2}}{N_j \times N_k} \tag{7}$$
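As a non-limiting illustration, the external structure similarity index can be computed as sketched below. The sketch assumes the community weight is E squared divided by the product of the two node counts, which is one reading of the flattened weight expression; the function names and the dictionary layout are also assumptions.

```python
def community_weight(e_ik, n_i, n_k):
    """w_{i,k} = E_{i,k}^2 / (N_i * N_k) (assumed reading of the weight expression)."""
    return e_ik ** 2 / (n_i * n_k)


def external_similarity(i, j, sizes, links):
    """Similarity of communities i and j over all other communities k.

    sizes: dict community -> number of nodes
    links: dict frozenset({a, b}) -> number of connections between a and b
    """
    dot = sq_i = sq_j = 0.0
    for k in sizes:
        if k in (i, j):
            continue
        w_ik = community_weight(links.get(frozenset((i, k)), 0), sizes[i], sizes[k])
        w_jk = community_weight(links.get(frozenset((j, k)), 0), sizes[j], sizes[k])
        dot += w_ik * w_jk
        sq_i += w_ik ** 2
        sq_j += w_jk ** 2
    denominator = sq_i + sq_j - dot
    return dot / denominator if denominator else 0.0


# Hypothetical network of three communities with two nodes each.
sizes = {1: 2, 2: 2, 3: 2}
sim_shared = external_similarity(1, 2, sizes, {frozenset((1, 3)): 2, frozenset((2, 3)): 2})
sim_unshared = external_similarity(1, 2, sizes, {frozenset((1, 3)): 2})
```

Communities 1 and 2 are maximally similar when both are linked to community 3 in the same way, and have similarity 0 when only one of them is linked to it.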
Fig. 3 shows the cumulative distribution curves of the external structure similarity index. The CDF curves of the external structure similarity index are built in the same way as those of the inner structure index. In the figure, curve 1 is the cumulative distribution curve of the external structure similarity index of community pairs that did not fuse, and curve 2 is that of community pairs that fused. As Fig. 3 shows, the values of the external structure similarity index for fusing pairs are concentrated between 0.00015 and 0.006, while most values for non-fusing pairs are 0. That is, the larger the external structure similarity index, the higher the probability that the communities fuse, so this index can effectively distinguish fusing communities from non-fusing communities.
Each of the direct factor and indirect factor indices above was obtained through experimental verification and is consistent with both theoretical analysis and practical operation. They take the major and minor influencing factors between communities into account and are also easy to compute and implement. The direct factor indices are defined entirely on the structural properties of links, so they are general and can be applied directly to the analysis of complex networks. The indirect factor index is built on the friendship attribute of social networks: if two strangers have many common friends, they are more likely to become friends. It therefore works well for predicting fusion events in social networks.
Next, supervised training is performed on the above key factor indices to judge whether any two communities will merge. In one embodiment of the application, the supervised training process using the key factor indices is shown in Fig. 4 and comprises: step S410, building a prediction model from the key factor indices obtained from the training data, and determining the threshold value at which communities merge; step S420, substituting the key factor indices obtained from the data of the time slice nearest to the time point to be predicted into the prediction model, and comparing the resulting prediction with the threshold value to judge whether the communities will merge.
Specifically, a probability fitting function is first established between the internal structure index and the probability that communities merge. The merge status of two communities is represented by the function R(i, j): R(i, j) takes the value 1 when community i and community j merge at time $t_0$, and 0 when they do not, as expressed in (8):
$$R(i,j)_{t=t_0} = \begin{cases} 1, & \text{communities } i \text{ and } j \text{ merge at time } t_0 \\ 0, & \text{communities } i \text{ and } j \text{ do not merge at time } t_0 \end{cases} \tag{8}$$
Then, when the internal structure index at time $t_0 - \Delta t$ equals $BD_0$, the probability that community i and community j will merge at time $t_0$ is given by expression (9):
$$P\left(R(i,j)_{t=t_0} = 1 \,\middle|\, B_d(i,j)_{t=t_0-\Delta t} = BD_0\right) = \left.\frac{\sum_{\substack{i,j \in m \\ i \neq j}} R(i,j)_{t=t_0}}{N(i,j)_{t=t_0}}\right|_{B_d(i,j)_{t=t_0-\Delta t} = BD_0} \tag{9}$$
In the formula, $\left.\sum_{i,j \in m,\, i \neq j} R(i,j)_{t=t_0}\right|_{B_d(i,j)_{t=t_0-\Delta t} = BD_0}$ is the number of community pairs that merge at time $t_0$ among the pairs whose $B_d(i,j)_{t=t_0-\Delta t}$ equals $BD_0$, and $\left.N(i,j)_{t=t_0}\right|_{B_d(i,j)_{t=t_0-\Delta t} = BD_0}$ is the total number of community pairs at time $t_0$ whose $B_d(i,j)_{t=t_0-\Delta t}$ equals $BD_0$. A series of $BD_0$ values then yields a series of conditional probability values, to which a probability function is fitted, giving the probability fitting function $P_B\left(R(i,j)_{t=t_0} = 1 \mid B_d(i,j)_{t=t_0-\Delta t}\right)$. For example, when the function is fitted on historical data from Sina Weibo, the resulting probability fitting function has the form shown in expression (10):
$$P_B\left(R(i,j)_{t=t_0} = 1 \,\middle|\, B_d(i,j)_{t=t_0-\Delta t}\right) = b \times \ln\left[B_d(i,j)_{t=t_0-\Delta t} - a\right] + T_0 \tag{10}$$
Here a, b and $T_0$ are the parameters of the probability function produced by fitting to the training data, with a = -0.5, b = 0.038 and $T_0$ = 0.03.
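The two steps of expressions (9) and (10) can be sketched as follows. The parameter values are the ones reported above for the Sina Weibo data. Grouping $B_d$ values into bins of a fixed width is an assumption made here so that the empirical ratio can be estimated from a finite sample, and the function names are hypothetical.

```python
import math
from collections import defaultdict

# Fitted parameters reported in the text for the Sina Weibo example.
A, B, T0 = -0.5, 0.038, 0.03


def fitted_merge_probability(bd, a=A, b=B, t0=T0):
    """Expression (10): b * ln(bd - a) + T0; requires bd > a."""
    return b * math.log(bd - a) + t0


def empirical_merge_probability(samples, bin_width=0.1):
    """Expression (9) estimated per bin of B_d (binning is an assumption).

    samples: iterable of (bd_at_previous_slice, merged) with merged in {0, 1}
    Returns {bin_index: merged_pairs / all_pairs} for every non-empty bin.
    """
    merged = defaultdict(int)
    total = defaultdict(int)
    for bd, r in samples:
        key = int(bd // bin_width)
        total[key] += 1
        merged[key] += r
    return {key: merged[key] / total[key] for key in total}


print(empirical_merge_probability([(0.2, 0), (0.7, 1), (1.5, 1)], bin_width=1.0))
print(fitted_merge_probability(0.5))
```

A curve of the form in expression (10) would then be fitted to the per-bin ratios; the fitting step itself is not shown here.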
Further, the obtained probability fitting function is combined with the first-order change index and second-order change index of the internal structure index and with the external structure similarity index of the two communities to build the prediction model, as shown in expression (11):
$$Td_{t=t_0} = P_B\left(R(i,j)_{t=t_0} = 1 \,\middle|\, B_d(i,j)_{t=t_0-\Delta t}\right) \times \left(1 + \log_{1+\max\left|\Delta B_d(i,j)_{t=t_0-\Delta t}\right|}\left(1 + \Delta B_d(i,j)_{t=t_0-\Delta t}\right)\right) \times \left(1 + \log_{1+\max\left|\Delta\Delta B_d(i,j)_{t=t_0-\Delta t}\right|}\left(1 + \Delta\Delta B_d(i,j)_{t=t_0-\Delta t}\right)\right) + \mathrm{Sim}(i,j)_{t=t_0-\Delta t} \tag{11}$$
In the formula, $Td_{t=t_0}$ denotes the tendency degree of a merge between community i and community j at time $t_0$. The probability function relating the internal structure index to merging, established above, together with the earlier analysis that larger $\Delta B_d$ and $\Delta\Delta B_d$ make a merge more likely, roughly determines how the prediction model depends on $B_d$, $\Delta B_d$ and $\Delta\Delta B_d$. Furthermore, for a given internal structure index, the larger the external structure similarity index, the more likely the communities are to merge. The prediction model for measuring the merge tendency degree of communities is constructed on this basis. It should also be noted that the prediction model is not unique.
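Expression (11) can be sketched in a few lines of Python. The helper names are hypothetical. Two edge cases are not addressed in the text and are handled here by assumption: a zero maximum change (logarithm base 1) is treated as a neutral factor of 1, and a change of -1 or less, for which the logarithm is undefined, raises an error.

```python
import math


def change_factor(delta, max_abs_delta):
    """One bracket of expression (11): 1 + log base (1 + max|delta|) of (1 + delta)."""
    if max_abs_delta <= 0:
        return 1.0  # assumption: no change anywhere in the data gives a neutral factor
    if 1.0 + delta <= 0:
        raise ValueError("1 + delta must be positive for the logarithm to exist")
    return 1.0 + math.log(1.0 + delta, 1.0 + max_abs_delta)


def tendency_degree(p_merge, d_bd, max_abs_d_bd, dd_bd, max_abs_dd_bd, sim):
    """Expression (11): P_B * first-order factor * second-order factor + Sim."""
    return (p_merge
            * change_factor(d_bd, max_abs_d_bd)
            * change_factor(dd_bd, max_abs_dd_bd)
            + sim)


# A pair whose internal structure index is not changing keeps the fitted probability,
# while a pair with the largest observed changes has it multiplied by 2 * 2.
print(tendency_degree(0.03, 0.0, 2.0, 0.0, 2.0, 0.001))
print(tendency_degree(0.03, 2.0, 2.0, 2.0, 2.0, 0.0))
```

Each bracket equals 1 for an unchanged index and 2 for the largest observed increase, so the two change indices can at most quadruple the fitted probability before the similarity term is added.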
Determining the threshold value at which communities merge, using the prediction model and the training data, comprises: normalizing the tendency degree values predicted by the prediction model; building a reference function from the normalized tendency degree values and the community merge outcomes extracted from the training data; and taking the tendency degree value at which the reference function reaches its maximum as the threshold value at which communities merge.
Specifically, normalizing the tendency degree of expression (11) gives expression (12):
$$\left.Td_{merge}\right|_{t=t_0} = \frac{Td_{t=t_0} - \min\left(Td_{t=t_0}\right)}{\max\left(Td_{t=t_0}\right) - \min\left(Td_{t=t_0}\right)} \tag{12}$$
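The normalization of expression (12) is ordinary min-max scaling over all community pairs of one time slice. A minimal sketch follows; the function name is hypothetical, and mapping every value to 0 when all tendency degrees are equal is an assumption, since the text does not cover that case.

```python
def normalize_tendency(values):
    """Expression (12): scale tendency degrees of one time slice to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: identical values carry no ranking information
    return [(v - lo) / (hi - lo) for v in values]


print(normalize_tendency([2.0, 4.0, 6.0]))
```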
The prediction results obtained for time slice $t_0$ are sorted in descending order of normalized tendency degree and compared with the actual community merges at time $t_0$ to build the reference function. For the tendency degree value $TD_0$ of each community pair correctly predicted as merging, α and β are calculated as follows:
In the formula, $TD_0$ is the tendency degree value of a community pair that, according to the training data, actually merges. A reference function is then built according to expression (13), and the $TD_0$ value at which expression (13) reaches its maximum is taken as the threshold value $div_0$ for judging whether communities will merge.
$$F = \frac{2\alpha\beta}{\alpha + \beta} \tag{13}$$
When choosing $div_0$, the values of α and β corresponding to the chosen value should be relatively balanced, so the reference function F is computed as the harmonic mean of α and β.
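A possible way to search for $div_0$ is sketched below. The definitions of α and β are not reproduced in this text, so the sketch assumes that α is the precision and β the recall of the rule "predict a merge when the normalized tendency degree is greater than the candidate $TD_0$". That reading, and the function names, are assumptions for illustration only.

```python
def f_score(alpha, beta):
    """Expression (13): harmonic mean of alpha and beta (0 when both are 0)."""
    return 2 * alpha * beta / (alpha + beta) if alpha + beta > 0 else 0.0


def choose_threshold(scores, merged):
    """Return (div0, best_F) over candidate thresholds TD_0.

    scores: normalized tendency degrees, one per community pair
    merged: parallel list of 0/1 ground-truth merge outcomes
    Candidates are the scores of pairs that actually merged.
    """
    positives = sum(merged)
    best_td0, best_f = None, -1.0
    candidates = sorted({s for s, m in zip(scores, merged) if m}, reverse=True)
    for td0 in candidates:
        predicted = [s > td0 for s in scores]
        true_pos = sum(1 for p, m in zip(predicted, merged) if p and m)
        n_pred = sum(predicted)
        alpha = true_pos / n_pred if n_pred else 0.0       # assumed: precision
        beta = true_pos / positives if positives else 0.0  # assumed: recall
        f = f_score(alpha, beta)
        if f > best_f:
            best_td0, best_f = td0, f
    return best_td0, best_f


print(choose_threshold([1.0, 0.8, 0.6, 0.2, 0.0], [1, 1, 1, 0, 0]))
```

Using the harmonic mean penalizes a threshold that makes one of the two quantities very small, which matches the stated goal of keeping α and β balanced.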
Once the prediction model and the threshold value for judging whether communities will merge have been obtained, the merge status of communities at time $t_0 + \Delta t$ can be predicted from the data at the current time $t_0$. Specifically, the key factor indices are extracted from the data at time $t_0$ and substituted into the prediction model of expression (11) as its input; the result is normalized according to expression (12) and compared with the community merge threshold value $div_0$ to judge whether the communities will merge. When $Td_{merge} > div_0$, the corresponding community pair is predicted to merge; otherwise it is predicted not to merge.
According to an embodiment of the application, the community pairs predicted to merge can further be used to infer which communities make up a specific fusion event. For example, if community i and community j are predicted to merge, community j and community k are predicted to merge, and community k and community i are predicted to merge, then communities i, j and k constitute one fusion event. Predicting the merging behavior between any two communities effectively narrows the scope of observation and reduces the difficulty of community fusion analysis on big data.
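One way to turn predicted pairs into fusion events is to group communities that are connected through predicted pairs. The sketch below uses a union-find structure for this. Treating every connected group as one event is an assumption: the example in the text has all three pairs predicted, and the text does not say how a partially connected group should be handled. The function name is hypothetical.

```python
def fusion_events(predicted_pairs):
    """Group communities linked by predicted merging pairs into fusion events."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in predicted_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    # Return each event as a sorted list, ordered by its first member.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


print(fusion_events([("i", "j"), ("j", "k"), ("k", "i"), ("p", "q")]))
# [['i', 'j', 'k'], ['p', 'q']]
```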
For example, in the Sina Weibo prediction example above, calculating α, β and F shows that F reaches its maximum of 0.13 at a $TD_0$ value of 0.1247, so 0.1247 is taken as the threshold value for judging whether communities will merge. When the $Td_{merge}$ obtained from the prediction model is greater than 0.1247, the corresponding community pair is predicted to merge; when $Td_{merge}$ is less than or equal to 0.1247, the corresponding community pair is predicted not to merge.
Predicting the merging behavior of communities with a prediction model built from training data is a supervised training method. Experimental data also show that the method achieves good accuracy and recall, which can support related practical applications and facilitates further research on community evolution.
Of course, when supervised training is used to judge whether any two communities will merge, an existing prediction model can also be used for training and prediction. In other embodiments of the application, an SVM model is used, and the process is as follows: first, the vectors formed from the key factor indices obtained from the training data are fed into the SVM prediction model for training, to determine a classifier for community merging; then the vector formed from the key factor indices obtained from the data of the time slice nearest to the time point to be predicted is fed into the SVM prediction model, and whether the communities will merge is judged from the resulting classification. Specifically, for communities $C_i$ and $C_j$ in time slice $t_0 - \Delta t$, the direct factor indices are the internal structure index $B_d(C_i, C_j)_{t_0-\Delta t}$ together with its first-order change $\Delta B_d(C_i, C_j)_{t_0-\Delta t}$ and second-order change $\Delta\Delta B_d(C_i, C_j)_{t_0-\Delta t}$, and the indirect factor index is the external structure similarity of the communities, $\mathrm{Sim}(C_i, C_j)_{t_0-\Delta t}$. After each of these four parameters is normalized, the input variable of the n-th training sample used by this module is
$$X_n = \left(B_d'(C_i, C_j)_{t_0-\Delta t},\ \Delta B_d'(C_i, C_j)_{t_0-\Delta t},\ \Delta\Delta B_d'(C_i, C_j)_{t_0-\Delta t},\ \mathrm{Sim}'(C_i, C_j)_{t_0-\Delta t}\right)^{T}$$
The output variable of the n-th training sample is the class label of that community pair (+1 or -1).
All training inputs and outputs are then fed into the classifier, and training yields the classification model and its hyperplane. The normalized prediction input data of time slice $t_0$ are then substituted, where the input variable of the n-th prediction sample is
$$X_n = \left(B_d'(C_i, C_j)_{t_0},\ \Delta B_d'(C_i, C_j)_{t_0},\ \Delta\Delta B_d'(C_i, C_j)_{t_0},\ \mathrm{Sim}'(C_i, C_j)_{t_0}\right)^{T}.$$
Substituting it into the classification model gives a result of +1 or -1, which shows on which side of the hyperplane the sample lies, that is, the prediction of whether the communities will merge at time $t_0 + \Delta t$.
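The construction of the SVM inputs and outputs can be sketched as below. The text only says that the four indices are normalized separately, so min-max scaling per index is an assumption, as are the dictionary keys and function names. Training the classifier itself would be done with an SVM library and is not shown.

```python
def normalize_columns(rows):
    """Min-max normalize each of the four indices independently (assumed method)."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(vec) for vec in zip(*scaled_columns)]


def training_set(pairs):
    """Build (X, y) from community-pair records.

    pairs: list of dicts with keys bd, d_bd, dd_bd, sim (indices at t0 - dt)
           and merged (whether the pair merged at t0).
    """
    raw = [(p["bd"], p["d_bd"], p["dd_bd"], p["sim"]) for p in pairs]
    X = normalize_columns(raw)
    y = [1 if p["merged"] else -1 for p in pairs]  # +1 merge, -1 no merge
    return X, y


pairs = [
    {"bd": 0.0, "d_bd": -1.0, "dd_bd": 0.0, "sim": 0.0, "merged": False},
    {"bd": 2.0, "d_bd": 1.0, "dd_bd": 0.0, "sim": 0.004, "merged": True},
]
print(training_set(pairs))
```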
It should be noted that in real social networks, non-fusion events far outnumber fusion events, so the initial training set contains far more non-merging samples than merging samples. The tendency degree model of the embodiment of the application yields a merge tendency degree value for any key factor index values, and these values discriminate clearly between merging and non-merging pairs. The predictions of the SVM model, by contrast, are biased toward the class with more samples, and the number of predicted merging communities can even be zero. If the samples are resampled to balance the two classes, the quality of the sample data drops sharply and the trained model is severely distorted.
In addition, the prediction methods above all directly predict whether any two communities will merge. Clearly, however, predicting which two communities, or which several communities, will merge presupposes that each community itself may take part in a merge. Therefore, in another embodiment of the application, the scope of prediction is narrowed further by first judging whether each single community is likely to participate in a merge. As shown in Fig. 5, the method comprises: step S510, splitting the raw network data according to a set time slice, and selecting the data of multiple time slices as training data; step S520, dividing the training data into static communities and dynamic communities; step S530, predicting each community separately based on the static communities and the dynamic communities, to obtain the set of communities that will participate in merging; step S540, extracting the key factor indices between any two communities in that set based on the training data; step S550, performing supervised training on the key factor indices, and judging from the learning result of the supervised training whether any two communities will merge.
Specifically, the step of predicting each community separately based on the static communities and the dynamic communities can be implemented with any prior-art method for predicting whether a single community will participate in a merge. For example, using an existing SVM (support vector machine) classification algorithm, three basic indices are extracted for each community in time slice $t_0 - \Delta t$: the community size; the ratio of the community's internal edges to the sum of the degrees of its nodes (In-Degree); and the Jaccard similarity coefficient between the community in time slice $t_0 - \Delta t$ and in time slice $t_0 - 2\Delta t$; together with the "first-order change index" and "second-order change index" of these three indices. Whether the community merges with other communities in time slice $t_0$ is then observed (a merge is recorded as 1, no merge as -1). These indices and merge outcomes are fed into an SVM classifier for training to obtain a prediction model. Likewise, the indices of each community in time slice $t_0$ are substituted into the prediction model: if the classification result is "1", the community is predicted to be about to merge with other communities; if the classification result is "-1", the community is predicted not to merge with other communities. This determines the set of communities that will participate in merging, so that building the key factor indices on this set greatly reduces the workload and improves prediction efficiency.
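Two of the three basic indices can be sketched directly from their descriptions. The function names are hypothetical, and two conventions are assumptions: each internal edge is counted once in the numerator of the ratio, and the edge list is undirected.

```python
def jaccard(members_a, members_b):
    """Jaccard similarity coefficient between two membership sets."""
    a, b = set(members_a), set(members_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def internal_ratio(members, edges):
    """Ratio of internal edges to the sum of the degrees of the community's nodes.

    edges: iterable of undirected (u, v) pairs covering the whole network
    """
    members = set(members)
    internal = 0
    degree_sum = 0
    for u, v in edges:
        if u in members and v in members:
            internal += 1  # assumption: an internal edge is counted once
        degree_sum += (u in members) + (v in members)
    return internal / degree_sum if degree_sum else 0.0


# The same community observed in two consecutive time slices.
previous_slice = {1, 2, 3}
current_slice = {2, 3, 4}
print(len(current_slice))                      # community size
print(jaccard(previous_slice, current_slice))  # 0.5
print(internal_ratio({1, 2, 3}, [(1, 2), (2, 3), (3, 4)]))  # 0.4
```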
The community fusion prediction method of the embodiment of the application can also be applied to weighted networks and therefore has a broad range of application. By defining key factor indices between any two communities, it achieves effective prediction of the merging behavior between two communities or among multiple communities.
Although the embodiments of the present invention are disclosed above, they are described only to facilitate understanding of the invention and are not intended to limit it. Any person skilled in the art to which the invention pertains may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention shall still be defined by the appended claims.

Claims (10)

1. A method for predicting community fusion events, comprising:
Step 1: splitting the raw network data according to a set time slice, and selecting the data of multiple time slices as training data;
Step 2: dividing the training data into static communities and dynamic communities;
Step 3: extracting the key factor indices between any two communities based on the training data;
Step 4: performing supervised training on the key factor indices, and judging from the learning result of the supervised training whether any two communities will merge.
2. The method according to claim 1, characterized in that
the key factor indices comprise the internal structure index between two communities, the first-order change index and second-order change index of the internal structure index, and the external structure similarity index of the two communities.
3. The method according to claim 2, characterized in that the internal structure index between the two communities is extracted using the following expression:
$$B_d(i,j) = \frac{E_{i,j}}{\left(E_i / N_i + E_j / N_j\right) / 2}$$
where $B_d(i,j)$ is the internal structure index between community i and community j, $E_{i,j}$ is the number of links between community i and community j, $E_i$ and $E_j$ are the numbers of links inside community i and community j, respectively, and $N_i$ and $N_j$ are the numbers of nodes inside community i and community j, respectively.
4. The method according to claim 2, characterized in that the external structure similarity index of the two communities is extracted using the following expression:
$$\mathrm{Sim}(i,j) = \frac{\sum_{\substack{k=1 \\ k \neq i,j}}^{m} (w_{i,k} \times w_{j,k})}{\sum_{\substack{k=1 \\ k \neq i,j}}^{m} w_{i,k}^{2} + \sum_{\substack{k=1 \\ k \neq i,j}}^{m} w_{j,k}^{2} - \sum_{\substack{k=1 \\ k \neq i,j}}^{m} (w_{i,k} \times w_{j,k})}$$
where Sim(i, j) is the external structure similarity index of community i and community j; $w_{i,k}$ and $w_{j,k}$ are the weights between community i and community k and between community j and community k, respectively, with $w_{i,k} = \frac{E_{i,k}^{2}}{N_i \times N_k}$ and $w_{j,k} = \frac{E_{j,k}^{2}}{N_j \times N_k}$; $E_{i,k}$ and $E_{j,k}$ are the numbers of links between community i and community k and between community j and community k, respectively; $N_i$, $N_j$ and $N_k$ are the numbers of nodes inside community i, community j and community k, respectively; and m is the number of communities.
5. The method according to claim 2, characterized in that step 4 comprises the following steps:
building a prediction model from the key factor indices obtained from the training data, and determining the threshold value at which communities merge;
substituting the key factor indices obtained from the data of the time slice nearest to the time point to be predicted into the prediction model, and comparing the resulting prediction with the threshold value to judge whether the communities will merge.
6. The method according to claim 5, characterized in that the prediction model is built using the following expression:
$$Td_{t=t_0} = P_B\left(R(i,j)_{t=t_0} = 1 \,\middle|\, B_d(i,j)_{t=t_0-\Delta t}\right) \times \left(1 + \log_{1+\max\left|\Delta B_d(i,j)_{t=t_0-\Delta t}\right|}\left(1 + \Delta B_d(i,j)_{t=t_0-\Delta t}\right)\right) \times \left(1 + \log_{1+\max\left|\Delta\Delta B_d(i,j)_{t=t_0-\Delta t}\right|}\left(1 + \Delta\Delta B_d(i,j)_{t=t_0-\Delta t}\right)\right) + \mathrm{Sim}(i,j)_{t=t_0-\Delta t}$$
where $Td_{t=t_0}$ is the tendency degree of a merge between community i and community j; $P_B\left(R(i,j)_{t=t_0} = 1 \mid B_d(i,j)_{t=t_0-\Delta t}\right)$ is the probability fitting function; $\Delta B_d(i,j)_{t=t_0-\Delta t}$, $\Delta\Delta B_d(i,j)_{t=t_0-\Delta t}$ and $\mathrm{Sim}(i,j)_{t=t_0-\Delta t}$ are, respectively, the first-order change index and second-order change index of the internal structure index between community i and community j and their external structure similarity index; $t_0$ and $t_0 - \Delta t$ denote different time points, and $\Delta t$ is the time interval.
7. The method according to claim 5 or 6, characterized in that the step of determining the threshold value at which communities merge comprises:
normalizing the tendency degree values predicted by the prediction model;
building a reference function from the normalized tendency degree values and the community merge outcomes extracted from the training data;
taking the tendency degree value at which the reference function reaches its maximum as the threshold value at which communities merge.
8. The method according to claim 7, characterized in that the reference function is built according to the following expression:
$$F = \frac{2\alpha\beta}{\alpha + \beta}$$
where F is the reference function, α and β are parameters, and $TD_0$ is the tendency degree value of a community pair that, according to the training data, actually merges.
9. The method according to claim 1, characterized in that step 4 comprises the following steps:
feeding the vectors formed from the key factor indices obtained from the training data into an SVM prediction model for training, to determine a classifier for community merging;
feeding the vector formed from the key factor indices obtained from the data of the time slice nearest to the time point to be predicted into the SVM prediction model, and judging from the resulting classification whether the communities will merge.
10. The method according to claim 1, characterized in that, before step 3, the method further comprises:
predicting each community separately based on the static communities and the dynamic communities, to obtain the set of communities that will participate in merging;
and in that, in step 3, the key factor indices between any two communities in that set are extracted based on the training data.
CN201510314273.8A 2015-06-09 2015-06-09 Method for predicting association fusion events Pending CN104899657A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510314273.8A CN104899657A (en) 2015-06-09 2015-06-09 Method for predicting association fusion events

Publications (1)

Publication Number Publication Date
CN104899657A true CN104899657A (en) 2015-09-09

Family

ID=54032310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510314273.8A Pending CN104899657A (en) 2015-06-09 2015-06-09 Method for predicting association fusion events

Country Status (1)

Country Link
CN (1) CN104899657A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100076913A1 (en) * 2008-09-24 2010-03-25 Nec Laboratories America, Inc. Finding communities and their evolutions in dynamic social network
CN102117325A (en) * 2011-02-24 2011-07-06 清华大学 Method for predicting dynamic social network user behaviors

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
于乐: "社会网络中社团发现及网络演化分析", 《中国优秀博士学位论文全文数据库基础科学辑》 *
祝明睿: "多维异构网络上的边和社团的预测与演化的研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105490858A (en) * 2015-12-15 2016-04-13 北京理工大学 Dynamic link predication method of network structure
CN105490858B (en) * 2015-12-15 2018-08-03 北京理工大学 A kind of dynamic link prediction technique of network structure
CN106341258A (en) * 2016-08-23 2017-01-18 浙江工业大学 Method for predicting unknown network connection edges based on second-order local community and seed node structure information
CN106341258B (en) * 2016-08-23 2019-01-22 浙江工业大学 Method for predicting unknown network connection edges based on second-order local community and seed node structure information
CN106533780A (en) * 2016-11-30 2017-03-22 大连大学 Method for establishing evolution model of weighting command and control network based on local area world
CN108600076A (en) * 2017-03-07 2018-09-28 中移(杭州)信息技术有限公司 A kind of social networks method for building up and system
CN107122455A (en) * 2017-04-26 2017-09-01 中国人民解放军国防科学技术大学 A kind of network user's enhancing method for expressing based on microblogging
CN107122455B (en) * 2017-04-26 2019-12-31 中国人民解放军国防科学技术大学 Network user enhanced representation method based on microblog
CN112085104A (en) * 2020-09-10 2020-12-15 杭州中奥科技有限公司 Event feature extraction method and device, storage medium and electronic equipment
CN112085104B (en) * 2020-09-10 2024-04-12 杭州中奥科技有限公司 Event feature extraction method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20150909