CN108270608A

CN108270608A - Method for establishing a link prediction model, and link prediction method

Info

Publication number: CN108270608A
Application number: CN201710004638.6A
Authority: CN
Inventors: 颜永红; 李太松; 张艳
Original assignee: Institute of Acoustics CAS; Beijing Kexin Technology Co Ltd
Current assignee: Institute of Acoustics CAS; Beijing Kexin Technology Co Ltd
Priority date: 2017-01-04
Filing date: 2017-01-04
Publication date: 2018-07-10
Anticipated expiration: 2037-01-04
Also published as: CN108270608B

Abstract

The present invention provides a method for establishing a link prediction model. The link prediction model comprises a temporal restricted Boltzmann machine (TRBM) model and a gradient boosting decision tree (GBDT) model. The method comprises: capturing a large amount of network data from the Internet or other media and preprocessing it; dividing the network data into historical data and current data, inputting them into the temporal restricted Boltzmann machine model and training its parameters; and extracting the network topology features of node pairs in the network data, forming a feature set, inputting it into the gradient boosting decision tree model and training its parameters. The link prediction model consists of the trained temporal restricted Boltzmann machine model and the trained gradient boosting decision tree model. Based on the link prediction model established by this method, the present invention also provides a link prediction method that can predict all links of the network's next state.

Description

Method for establishing a link prediction model, and link prediction method

Technical field

The present invention relates to the field of Internet technology, and in particular to a method for establishing a link prediction model and to a link prediction method. The methods use the topological features of a network together with a deep learning model to perform link prediction on large-scale networks.

Background art

With the rapid development of the Internet and mobile communication technology, connections between people have become ever closer. Through the Internet and communication networks, people form a huge complex network, and the interaction, communication and influence between people in this network have permeated every aspect of life. Research on social networks has gradually attracted attention and has become one of the current research hotspots. Many people hope, by analysing the structure and evolution of social networks, to discover the principles that govern contact between nodes, to uncover the rules hidden beneath common phenomena and the relationships among the topological features of a social network, node attribute features and node behaviour trends, and thereby to understand how social networks evolve. This information can help people allocate resources and process information more effectively, and can guide management, interpretation and decision-making in commodity production, daily life, population management, natural planning and similar areas. One important research topic concerning node behaviour trends is link prediction.

Link prediction methods describe how a network will develop in the future, down to the prediction of connections between individual nodes; they can also find missing or hidden edges in an existing incomplete network. Traditional link prediction methods generally use network topology features and node attributes and make predictions with machine learning methods. However, these methods all work from a microscopic perspective, taking node pairs as the object of prediction. This does not lend itself to modelling the evolution of the network's macroscopic structure, so their prediction performance hits a bottleneck.

Summary of the invention

The object of the present invention is to overcome the above drawbacks of current link prediction methods by proposing a link prediction method based on deep learning. The method uses a temporal restricted Boltzmann machine model to model the adjacency matrices of the network over macroscopic time steps, and then uses the trained model as a generative model to make a macroscopic prediction of the network's link state at the next time step. In addition, local topological features of the network are extracted from a microscopic perspective, and a machine learning model (a gradient boosting decision tree learning model) is used to predict the link state of the network structure. Finally, the two prediction results are fused by weighting to obtain the final link prediction result for the network. The method describes the evolution of the network from both the macroscopic and the microscopic perspective; it is based on a generative deep learning model and fuses in a machine learning model to improve link prediction performance.

To achieve the above object, the present invention provides a deep-learning-based method for establishing a link prediction model. The link prediction model comprises a temporal restricted Boltzmann machine model and a gradient boosting decision tree model. The method comprises: capturing a large amount of network data from the Internet or other media and preprocessing it; dividing the network data into historical data and current data, inputting them into the temporal restricted Boltzmann machine model and training its parameters; and extracting the network topology features of node pairs in the network data, forming a feature set, inputting it into the gradient boosting decision tree model and training its parameters. The link prediction model consists of the trained temporal restricted Boltzmann machine model and the trained gradient boosting decision tree model.

In the above technical solution, the method specifically comprises:

Step S1) Capture a large amount of network data from the Internet or other media and preprocess it so that the network data contains no isolated nodes or isolated node pairs;

Step S2) Divide the network data into time slices of a given duration and construct a network graph for each time slice, G = {G_K, G_{K-1}, …, G₁}; represent G by the temporal adjacency matrices A = {A_K, A_{K-1}, …, A₁}; then set the time window to N, N < K, where {A_N, A_{N-1}, …, A₂} is the historical data and {A₁} is the current data;

Step S3) Input the historical data and the current data into the temporal restricted Boltzmann machine model and train its parameters;

Step S4) Merge {G_K, …, G₂} into a base network G′; taking G₁ as the reference set, select from G′ node pairs that are one hop apart to form positive and negative samples, and make the numbers of positive and negative samples equal; extract the network topology features of the node pairs, form a feature set, input it into the gradient boosting decision tree model and train its parameters;

Step S5) Training of the link prediction model is complete; the link prediction model consists of the trained temporal restricted Boltzmann machine model and the trained gradient boosting decision tree model.
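The data preparation in step S2 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `build_snapshots` and `split_window`, the `(u, v, t)` edge format, and the treatment of edges as undirected are all assumptions made for the example.

```python
from collections import defaultdict


def build_snapshots(edges, slice_length):
    """Group timestamped edges (u, v, t) into time slices.

    Returns the node ordering and a list of adjacency matrices,
    oldest slice first, so the last matrix plays the role of A_1.
    """
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    index = {n: i for i, n in enumerate(nodes)}
    t0 = min(t for _, _, t in edges)

    buckets = defaultdict(list)
    for u, v, t in edges:
        buckets[int((t - t0) // slice_length)].append((u, v))

    matrices = []
    for k in range(max(buckets) + 1):
        a = [[0] * len(nodes) for _ in nodes]
        for u, v in buckets.get(k, []):
            a[index[u]][index[v]] = 1
            a[index[v]][index[u]] = 1  # assumption: undirected network
        matrices.append(a)
    return nodes, matrices


def split_window(matrices, n):
    """Split the newest n matrices into history (A_N..A_2) and current (A_1)."""
    if not 2 <= n < len(matrices):
        raise ValueError("window N must satisfy 2 <= N < K")
    window = matrices[-n:]
    return window[:-1], window[-1]
```

With `K` slices in `matrices`, `split_window(matrices, N)` returns the `N - 1` historical matrices and the single current matrix that step S3 feeds to the temporal restricted Boltzmann machine.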

In the above technical solution, the network topology features of step S4) include neighbor-based features and random-walk-based features.
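The sample construction in step S4 can be illustrated with a short sketch. It assumes the candidate node pairs have already been selected from G′ and only shows the labelling against G₁ and the balancing of the two classes. The name `build_balanced_samples` is hypothetical, and random downsampling of the larger class is one possible way to equalise the counts, not necessarily the one used in the patent.

```python
import random


def build_balanced_samples(candidate_pairs, current_edges, seed=0):
    """Label candidate pairs by their state in G_1 and balance the classes.

    A pair is positive (label 1) if it is linked in G_1, otherwise
    negative (label 0). The larger class is randomly downsampled.
    """
    truth = {frozenset(edge) for edge in current_edges}
    positives = [p for p in candidate_pairs if frozenset(p) in truth]
    negatives = [p for p in candidate_pairs if frozenset(p) not in truth]

    rng = random.Random(seed)
    size = min(len(positives), len(negatives))
    positives = rng.sample(positives, size)
    negatives = rng.sample(negatives, size)
    return [(p, 1) for p in positives] + [(p, 0) for p in negatives]
```

The resulting `(pair, label)` list is what the feature extraction would consume before the gradient boosting decision tree model is trained.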

Based on the link prediction model established by the above method, the present invention also provides a link prediction method, the method comprising:

Step T1) Capture the network data to be predicted and preprocess it so that the network data contains no isolated nodes or isolated node pairs;

Step T2) Divide the network data to be predicted into time slices of a given duration and construct a network graph for each time slice, G = {G_K, G_{K-1}, …, G₁, G₀}, where {G_N, G_{N-1}, …, G₂} are the historical network graphs, {G₁} is the current network graph and {G₀} is the network to be predicted; represent G by the temporal adjacency matrices A = {A_K, A_{K-1}, …, A₁, A₀}; the time window is N, N < K; move the time window forward by one unit so that the historical data becomes {A_{N-1}, A_{N-2}, …, A₁}; randomly initialize the data to be predicted with binary values as {A₀} and treat {A₀} as the current data; input these into the temporal restricted Boltzmann machine model and obtain the prediction result R1 after multiple iterations;

Step T3) Construct the base network from {G_K, G_{K-1}, …, G₁}, extract the feature set as in step S4), input it into the gradient boosting decision tree model and predict the node connection states under {G₀} to obtain the prediction result R2;

Step T4) Fuse the results R1 and R2 by weighting to obtain the final fused prediction result R.

In the above technical solution, step T4) is implemented as follows:

If a node pair is present in both R1 and R2, the results are merged by weighting to obtain R = αR1 + (1 − α)R2, where α takes a value between 0.5 and 0.7; if a node pair is present in R1 but not in R2, the prediction result is R = R1.
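The fusion rule of step T4 can be written out as a small sketch. It assumes that R1 and R2 are dictionaries mapping a node pair to a link score; the function name `fuse_predictions` and the default `alpha=0.6` (one value inside the stated 0.5 to 0.7 range) are choices made for the illustration. The text does not say what happens to pairs that appear only in R2, so this sketch simply leaves them out.

```python
def fuse_predictions(r1, r2, alpha=0.6):
    """Weighted fusion R = alpha * R1 + (1 - alpha) * R2.

    r1, r2: dicts mapping a node pair to a score (assumed format).
    Pairs found only in r1 keep their r1 score. Pairs found only in
    r2 are omitted, since the source text does not cover that case.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    fused = {}
    for pair, score1 in r1.items():
        if pair in r2:
            fused[pair] = alpha * score1 + (1 - alpha) * r2[pair]
        else:
            fused[pair] = score1
    return fused
```

Choosing α above 0.5 gives the macroscopic TRBM result more weight than the microscopic GBDT result, which matches the stated range.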

The advantages of the invention are:

1. The link prediction method of the invention fuses deep learning and machine learning methods and describes the change of the network from two perspectives, overcoming the deficiencies of a single model; moreover, it predicts all links of the network's next state, so the prediction is more comprehensive and more accurate;

2. The link prediction method of the invention is not only generally applicable to different networks but also robust to networks with different characteristics and of different sizes.

Description of the drawings

Fig. 1 is the sequence diagram of the link prediction method of the present invention.

Specific embodiment

The present invention will be further described in detail in the following with reference to the drawings and specific embodiments.

A method for establishing a link prediction model, the link prediction model comprising a temporal restricted Boltzmann machine (Temporal Restricted Boltzmann Machine, TRBM) model and a gradient boosting decision tree (Gradient Boosting Decision Trees, GBDT) model; the method comprises the following steps:

Step S1) Capture a large amount of network data from the Internet or other media and preprocess the network data;

The network data includes the time information of each edge. If the captured network data contains no isolated nodes or isolated node pairs, it can be used directly; otherwise the captured network data must be preprocessed to delete the isolated nodes and isolated node pairs;
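The preprocessing described above might look like the following sketch. It reads "isolated node pair" as two nodes that are connected only to each other, which is an interpretation rather than something the text states, and it assumes an undirected network without self-loops stored as a dictionary from node to neighbor set. The name `remove_isolated` is hypothetical.

```python
def remove_isolated(adjacency):
    """Drop isolated nodes and isolated node pairs from an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours.
    Assumption: an "isolated node pair" is two nodes linked only to
    each other; self-loops are assumed to be absent.
    """
    drop = set()
    for node, neighbours in adjacency.items():
        if not neighbours:
            drop.add(node)  # isolated node
        elif len(neighbours) == 1:
            (other,) = neighbours
            if adjacency[other] == {node}:
                drop.update((node, other))  # isolated pair
    return {
        node: neighbours - drop
        for node, neighbours in adjacency.items()
        if node not in drop
    }
```

Removed nodes never neighbour a retained node under this definition, so a single pass is enough.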

Step S2) Divide the network data into time slices (snapshots) of a given duration and construct a network graph for each time slice, G = {G_K, G_{K-1}, …, G₁}; represent G by the temporal adjacency matrices A = {A_K, A_{K-1}, …, A₁}; then set the time window to N (N < K), where {A_N, A_{N-1}, …, A₂} is the historical data and {A₁} is the current data;

Step S3) Input the historical data and the current data into the TRBM model and train the model parameters;

Step S4) Merge {G_K, …, G₂} into a base network G′; taking G₁ as the reference set, select from G′ node pairs that are one hop apart to form positive and negative samples; because the positive and negative samples are imbalanced, the node pairs must be sampled so that the numbers of positive and negative samples are equal; extract the network topology features of the node pairs, form a feature set, input it into the GBDT model and train the model parameters;

In this embodiment, the network topology features include neighbor-based features and random-walk-based features; the neighbor-based feature is Adamic-Adar and the random-walk-based feature is Rooted PageRank.
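The two features named in this embodiment can be computed as in the sketch below. It uses the commonly cited definitions: Adamic-Adar sums 1/log(degree) over the common neighbors of a pair, and Rooted PageRank is the stationary distribution of a random walk that jumps back to the root with a fixed probability. The restart probability of 0.15, the iteration count, the handling of dangling nodes and the function names are assumptions, since the text does not give these details.

```python
import math


def adamic_adar(adj, x, y):
    """Adamic-Adar score: sum of 1 / log(degree) over common neighbours."""
    return sum(
        1.0 / math.log(len(adj[z]))
        for z in adj[x] & adj[y]
        if len(adj[z]) > 1  # guard against log(1) == 0
    )


def rooted_pagerank(adj, root, restart=0.15, iterations=100):
    """Random walk with restart to `root`, solved by power iteration.

    Returns a dict node -> visiting probability; the entry for a node y
    serves as the Rooted PageRank feature of the pair (root, y).
    """
    rank = {n: 0.0 for n in adj}
    rank[root] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in adj}
        nxt[root] += restart  # total probability mass is 1
        for node, mass in rank.items():
            if not adj[node]:
                # assumption: a walker on a dangling node returns to the root
                nxt[root] += (1 - restart) * mass
                continue
            share = (1 - restart) * mass / len(adj[node])
            for neighbour in adj[node]:
                nxt[neighbour] += share
        rank = nxt
    return rank
```

For a node pair (x, y), the feature vector in this sketch would be `[adamic_adar(adj, x, y), rooted_pagerank(adj, x)[y]]`, computed on the base network G′.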

Step S5) Training of the link prediction model is complete; the link prediction model consists of the trained TRBM model and the trained GBDT model.

As shown in Fig. 1, based on the link prediction model established by the above method, the present invention also provides a link prediction method, the method comprising:

Step T2) Divide the network data to be predicted into time slices of a given duration and construct a network graph for each time slice, G = {G_K, G_{K-1}, …, G₁, G₀}, where {G_N, G_{N-1}, …, G₂} are the historical network graphs, {G₁} is the current network graph and {G₀} is the network to be predicted; represent G by the temporal adjacency matrices A = {A_K, A_{K-1}, …, A₁, A₀}; the time window is N (N < K); move the time window forward by one unit so that the historical data becomes {A_{N-1}, A_{N-2}, …, A₁}; randomly initialize the data to be predicted with binary values as {A₀} and treat {A₀} as the current data; input these into the TRBM model and obtain the prediction result R1 after multiple iterations;

Step T3) Construct the base network from {G_K, G_{K-1}, …, G₁}, extract the feature set as in step S4), input it into the GBDT model and predict the node connection states under {G₀} to obtain the prediction result R2;

If a node pair is present in both R1 and R2, the results are merged by weighting as R = αR1 + (1 − α)R2, where α takes a value between 0.5 and 0.7; if a node pair is present in R1 but not in R2, the result of R1 is taken as the final result, R = R1.
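The input preparation for the TRBM in step T2 (shifting the window by one unit and randomly initialising A₀) can be sketched as below. The TRBM sampling itself is not shown. The function name `prepare_prediction_input`, the symmetric initialisation and the zero diagonal are assumptions for an undirected network without self-loops.

```python
import random


def prepare_prediction_input(matrices, n, seed=0):
    """Build the TRBM input for predicting A_0.

    matrices: observed adjacency matrices, oldest first, last one is A_1.
    Returns (history, a0) where history = A_{N-1}..A_1 after the window
    has moved forward one unit, and a0 is a random binary matrix.
    """
    if not 2 <= n <= len(matrices):
        raise ValueError("window N must satisfy 2 <= N <= number of slices")
    history = matrices[-(n - 1):]

    size = len(matrices[-1])
    rng = random.Random(seed)
    a0 = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            # assumption: undirected network, so A_0 is kept symmetric
            a0[i][j] = a0[j][i] = rng.randint(0, 1)
    return history, a0
```

The trained model would then refine `a0` over several sampling iterations, conditioned on `history`, to produce R1.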

Claims

1. A method for establishing a link prediction model, the link prediction model comprising a temporal restricted Boltzmann machine model and a gradient boosting decision tree model; the method comprises: capturing a large amount of network data from the Internet or other media and preprocessing the network data; dividing the network data into historical data and current data, inputting them into the temporal restricted Boltzmann machine model and training its parameters; extracting the network topology features of node pairs in the network data, forming a feature set, inputting it into the gradient boosting decision tree model and training its parameters; the link prediction model consists of the trained temporal restricted Boltzmann machine model and the trained gradient boosting decision tree model.

2. The method for establishing a link prediction model according to claim 1, characterized in that the method specifically comprises:

Step S1) Capture a large amount of network data from the Internet or other media and preprocess it so that the network data contains no isolated nodes or isolated node pairs;

Step S2) Divide the network data into time slices of a given duration and construct a network graph for each time slice, G = {G_K, G_{K-1}, …, G₁}; represent G by the temporal adjacency matrices A = {A_K, A_{K-1}, …, A₁}; then set the time window to N, N < K, where {A_N, A_{N-1}, …, A₂} is the historical data and {A₁} is the current data;

Step S4) by { G_K,…,G₂It is merged into basic network G '；With G₁For standard set, selected from G ' at a distance of the section jumped for one Point pair, forms positive negative sample；And make positive and negative sample size consistent；The network topology characteristic of node pair is extracted, forms feature set simultaneously Input gradient promotes decision-tree model, trains model parameter；

Step S5) the link prediction model training finishes, and the link prediction model is limited Bohr including trained sequential Hereby graceful machine model and gradient promote decision-tree model.

3. the method for building up of link prediction model according to claim 2, which is characterized in that the step S4) network Topological characteristic includes the feature based on neighbours and the feature based on network wandering.

4. A link prediction method, implemented on the basis of the link prediction model established by the method of any one of claims 2-3, the method comprising:

Step T1) Capture the network data to be predicted and preprocess it so that the network data contains no isolated nodes or isolated node pairs;

Step T2) Divide the network data to be predicted into time slices of a given duration and construct a network graph for each time slice, G = {G_K, G_{K-1}, …, G₁, G₀}, where {G_N, G_{N-1}, …, G₂} are the historical network graphs, {G₁} is the current network graph and {G₀} is the network to be predicted; represent G by the temporal adjacency matrices A = {A_K, A_{K-1}, …, A₁, A₀}; the time window is N, N < K; move the time window forward by one unit so that the historical data becomes {A_{N-1}, A_{N-2}, …, A₁}; randomly initialize the data to be predicted with binary values as {A₀} and treat {A₀} as the current data; input these into the temporal restricted Boltzmann machine model and obtain the prediction result R1 after multiple iterations;

Step T3) Construct the base network from {G_K, G_{K-1}, …, G₁}, extract the feature set as in step S4), input it into the gradient boosting decision tree model and predict the node connection states under {G₀} to obtain the prediction result R2;

5. The link prediction method according to claim 1, characterized in that step T4) is implemented as follows:

If a node pair is present in both R1 and R2, the results are merged by weighting to obtain R = αR1 + (1 − α)R2, where α takes a value between 0.5 and 0.7; if a node pair is present in R1 but not in R2, the prediction result is R = R1.