CN106960251A

CN106960251A - Undirected network connection edge weight prediction method based on node similarity

Info

Publication number: CN106960251A
Application number: CN201710136070.3A
Authority: CN
Inventors: 宣琦; 赵明浩; 虞烨炜; 周鸣鸣; 傅晨波
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2017-03-09
Filing date: 2017-03-09
Publication date: 2017-07-18

Abstract

An undirected network connection edge weight prediction method based on node similarity comprises the following steps: 1) building an undirected network graph from an existing data set that includes nodes and connection edge weights; 2) calculating the following three classes of similarity indices for the network built in 1): local similarity indices, global similarity indices and semi-local similarity indices; 3) using the three classes of similarity indices obtained in 2) in a multiple linear regression model to predict the connection edge weights in the test set, and then testing the average performance of the model by ten-fold cross-validation, measured by the Pearson correlation coefficient and the root mean square value. The invention uses node similarity and a multiple linear regression model to predict missing connection edge weights; the model is simple and the prediction results are good.

Description

Undirected network connection edge weight prediction method based on node similarity

Technical Field

The invention relates to the field of link prediction and data mining, in particular to a method for predicting a connection edge weight based on network node similarity.

Background

In reality, many systems can be abstracted into a model of a complex network, individual objects in the system are abstracted into nodes, and relationships between individuals are abstracted into connecting edges, such as a social network, a protein interaction network, a power network and the like. The network connection edge is used as a bridge for connecting individual objects and plays an important role in revealing a network structure. In reality, the edges of many networks are weighted, and the weights of the edges have definite physical meanings. For various reasons, some network link weights may be missing, and especially when the missing weights contain important network structure information, the prediction of these weights is very critical.

Disclosure of Invention

To overcome the poor model prediction results caused by missing connection edge weights in existing networks, the invention uses the similarity of network nodes and a multiple linear regression model to predict the missing connection edge weights, and provides a connection edge weight prediction method based on undirected network node similarity that gives better prediction results.

The technical scheme adopted by the invention for solving the technical problems is as follows:

A method for predicting connection edge weights based on network node similarity comprises the following steps:

S1: constructing an undirected network graph G = (V, E) from an existing undirected network structure data set, which comprises the network nodes and the connection edge weights between nodes;

S2: according to the graph G = (V, E), using the node similarity theory of link prediction, calculating the following three types of features: local similarity indices, global similarity indices and semi-local similarity indices, wherein the local similarity indices comprise the common neighbors index CN, the Salton index, the Jaccard index, the Sørensen index, the hub promoted index HPI, the hub depressed index HDI, the LHN-I index, the preferential attachment index PA, the Adamic-Adar index AA and the resource allocation index RA; the global similarity indices comprise the Katz index, the LHN-II index, the average commute time ACT, the random-walk-based cosine similarity cos⁺, the random walk with restart RWR, the SimRank index SimR and the matrix forest index MFI; the semi-local similarity indices comprise the local path index LP, the local random walk index LRW and the superposed local random walk index SRW;

S3: according to the ten-fold cross-validation method, dividing the network connection edge weights in the data set into ten equal parts, of which nine are used as the training set and the remaining one as the test set; performing multiple linear regression analysis in the R language on the features calculated in step S2, and finally obtaining the following evaluation indices from the fitting result and the original data: the Pearson correlation coefficient and the root mean square value.

The beneficial effects of the invention are as follows: by using node similarity and a multiple linear regression model to predict missing connection edge weights, the model is simple and the prediction results are good.

Drawings

Fig. 1 is a flowchart of an undirected network connection edge weight prediction method based on node similarity in an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to Fig. 1, an undirected network connection edge weight prediction method based on node similarity includes the following steps:

S1: constructing an undirected network graph G = (V, E) from an existing nematode (C. elegans) neural network data set, wherein nodes represent neurons of the nematode and edges represent synapses or gap junctions between neurons;

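S2: the adjacency matrix of the graph G is $A = (a_{ij})_{n \times n}$, $i, j \in \{1, 2, \dots, n\}$,

wherein:

$$a_{ij} = \begin{cases} 1, & \text{if nodes } i \text{ and } j \text{ are connected by an edge} \\ 0, & \text{otherwise} \end{cases}$$

according to the adjacency matrix $A$, the following similarity indices are respectively calculated: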

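1) Common neighbors index CN:

$$s_{xy}^{\mathrm{CN}} = |\Gamma(x) \cap \Gamma(y)|$$

where $|Q|$ denotes the number of elements of the set $Q$, $\Gamma(x)$ is the set of neighbor nodes of node $x$, and $s_{xy}^{\mathrm{CN}}$ is the CN index value between node $x$ and node $y$; the same notation is used for the indices below;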

2) Salton index:

$$s_{xy}^{\mathrm{Salton}} = \frac{|\Gamma(x) \cap \Gamma(y)|}{\sqrt{k_x k_y}}$$

where $k_x$ denotes the degree of node $x$;

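3) Jaccard index:

$$s_{xy}^{\mathrm{Jaccard}} = \frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}$$

4) Sørensen index:

$$s_{xy}^{\text{Sørensen}} = \frac{2\,|\Gamma(x) \cap \Gamma(y)|}{k_x + k_y}$$

5) Hub promoted index HPI:

$$s_{xy}^{\mathrm{HPI}} = \frac{|\Gamma(x) \cap \Gamma(y)|}{\min\{k_x, k_y\}}$$

6) Hub depressed index HDI:

$$s_{xy}^{\mathrm{HDI}} = \frac{|\Gamma(x) \cap \Gamma(y)|}{\max\{k_x, k_y\}}$$

7) LHN-I index:

$$s_{xy}^{\mathrm{LHN\text{-}I}} = \frac{|\Gamma(x) \cap \Gamma(y)|}{k_x k_y}$$

8) Preferential attachment index PA:

$$s_{xy}^{\mathrm{PA}} = k_x k_y$$

9) Adamic-Adar index AA:

$$s_{xy}^{\mathrm{AA}} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log k_z}$$

10) Resource allocation index RA:

$$s_{xy}^{\mathrm{RA}} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z}$$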
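
The ten local indices depend only on the neighbor sets and degrees of the two end nodes, so they can be computed together. The following is a minimal Python sketch, assuming the standard definitions of these indices from the link prediction literature; the function name `local_indices`, the dictionary-of-sets input format and the toy graph are illustrative assumptions, not part of the source.

```python
import math

def local_indices(neighbors, x, y):
    """Return the ten local similarity indices for the node pair (x, y).

    `neighbors` maps each node to the set of its neighbours (a hypothetical
    input format chosen for this sketch). Both nodes are assumed to have
    degree >= 1, otherwise several ratios are undefined.
    """
    gx, gy = neighbors[x], neighbors[y]
    kx, ky = len(gx), len(gy)
    common = gx & gy          # shared neighbours of x and y
    cn = len(common)
    return {
        "CN": cn,
        "Salton": cn / math.sqrt(kx * ky),
        "Jaccard": cn / len(gx | gy),
        "Sorensen": 2 * cn / (kx + ky),
        "HPI": cn / min(kx, ky),
        "HDI": cn / max(kx, ky),
        "LHN-I": cn / (kx * ky),
        "PA": kx * ky,
        # A common neighbour has degree >= 2, so log(k_z) > 0.
        "AA": sum(1 / math.log(len(neighbors[z])) for z in common),
        "RA": sum(1 / len(neighbors[z]) for z in common),
    }

# Toy graph with edges 0-1, 0-2, 1-2, 2-3 (illustrative data, not from the source).
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
scores = local_indices(toy, 0, 1)
```

In this toy graph nodes 0 and 1 share exactly one neighbor (node 2, of degree 3), so CN is 1 and RA is 1/3.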

11) Katz index:

$$S^{\mathrm{Katz}} = (I - \beta A)^{-1} - I$$

where $I$ is the identity matrix, and the parameter $\beta$ must be less than the reciprocal of the largest eigenvalue $\lambda_1$ of the adjacency matrix $A$ to ensure convergence;
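
The closed form of the Katz index equals the power series βA + β²A² + β³A³ + ..., which converges when β is below the reciprocal of the largest eigenvalue of A. The sketch below approximates the index by truncating that series instead of inverting a matrix; the function names, the number of terms and the two-node example are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def katz_series(adj, beta, terms=200):
    """Approximate (I - beta*A)^-1 - I by the sum of beta^l * A^l, l = 1..terms.

    The truncation length `terms` is an assumption of this sketch; the series
    only converges when beta < 1 / lambda_1.
    """
    n = len(adj)
    term = [[beta * v for v in row] for row in adj]   # beta^1 * A^1
    total = [row[:] for row in term]
    for _ in range(terms - 1):
        # Multiply the previous term by beta * A to get the next power.
        term = [[beta * v for v in row] for row in matmul(term, adj)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

# Two nodes joined by one edge: lambda_1 = 1, so beta = 0.5 is admissible.
pair = [[0, 1], [1, 0]]
katz = katz_series(pair, beta=0.5)
```

For this two-node graph the closed form gives β/(1 − β²) off the diagonal and β²/(1 − β²) on it, which the truncated series reproduces.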

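12) LHN-II index:

$$s_{xy}^{\mathrm{LHN\text{-}II}} = \left[1 - \frac{2M\lambda_1}{k_x k_y}\right]\delta_{xy} + 2M\lambda_1\left[D^{-1}\left(I - \frac{\phi}{\lambda_1}A\right)^{-1}D^{-1}\right]_{xy}$$

where $\delta_{xy}$ is the Kronecker delta, with $\delta_{xy} = 1$ when $x = y$ and $\delta_{xy} = 0$ otherwise; $D$ is the degree matrix of the undirected network graph G, i.e. $D_{ij} = k_i \delta_{ij}$; $k_x$ denotes the degree of node $x$; $\phi$ is an adjustable parameter with value range (0, 1); $\lambda_1$ is the largest eigenvalue of the adjacency matrix $A$; and $M$ is the total number of edges of the network;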

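13) Average commute time ACT:

$$s_{xy}^{\mathrm{ACT}} = \frac{1}{l_{xx}^{+} + l_{yy}^{+} - 2\,l_{xy}^{+}}$$

where $L^{+}$ is the pseudo-inverse of the Laplacian matrix $L = D - A$ of the network G, and $l_{xy}^{+}$ denotes the elements of the matrix $L^{+}$;

14) Random-walk-based cosine similarity cos⁺:

$$s_{xy}^{\cos^{+}} = \frac{l_{xy}^{+}}{\sqrt{l_{xx}^{+}\, l_{yy}^{+}}}$$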
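
Both ACT and cos⁺ are read off the pseudo-inverse L⁺ of the Laplacian. The following sketch assumes a connected graph, for which L⁺ = (L + J/n)⁻¹ − J/n with J the all-ones matrix, and assumes the standard definitions s_xy = 1/(l⁺_xx + l⁺_yy − 2l⁺_xy) for ACT and s_xy = l⁺_xy / sqrt(l⁺_xx · l⁺_yy) for cos⁺. The helper names and the small test graphs are illustrative, not taken from the source.

```python
import math

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def laplacian_pinv(adj):
    """Pseudo-inverse of L = D - A, assuming the graph is connected."""
    n = len(adj)
    lap = [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
           for i in range(n)]
    shifted = [[lap[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
    inv = invert(shifted)
    return [[inv[i][j] - 1.0 / n for j in range(n)] for i in range(n)]

def act(lp, x, y):
    """Average commute time similarity from the entries of L+."""
    return 1.0 / (lp[x][x] + lp[y][y] - 2.0 * lp[x][y])

def cos_plus(lp, x, y):
    """Random-walk-based cosine similarity from the entries of L+."""
    return lp[x][y] / math.sqrt(lp[x][x] * lp[y][y])

# Path graph 0-1-2 and a single edge 0-1 (illustrative data).
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
lp_path = laplacian_pinv(path)
lp_pair = laplacian_pinv([[0, 1], [1, 0]])
```

On the path graph the denominator of ACT is the effective resistance between the two nodes, so the two end nodes (two unit edges in series) score 1/2 while adjacent nodes score 1.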

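15) Random walk with restart RWR:

$$s_{xy}^{\mathrm{RWR}} = \pi_{xy} + \pi_{yx}$$

wherein

$$\vec{\pi}_x = c\,P^{T}\vec{\pi}_x + (1 - c)\,\vec{e}_x, \quad \text{i.e.} \quad \vec{\pi}_x = (1 - c)\left(I - cP^{T}\right)^{-1}\vec{e}_x$$

the element $\pi_{xy}$ is the probability that a particle starting from node $x$ is eventually located at node $y$, $(1 - c)$ is the return probability of the particle, $\vec{e}_x$ is the $n \times 1$ vector whose $x$-th element is 1 and whose other elements are 0, and $P$ is the Markov transition probability matrix of the network, whose element $P_{xy}$ is the probability that a particle at node $x$ moves to node $y$ in the next step;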
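
The stationary vector of the random walk with restart can be obtained by repeating the update π ← c·Pᵀπ + (1 − c)·e_x until it stops changing. The sketch below assumes the usual transition matrix P_xy = a_xy / k_x and the score s_xy = π_xy + π_yx; the function names, the tolerance and the default value of c are assumptions of this example, and every node is assumed to have at least one neighbor.

```python
def rwr_vector(adj, x, c=0.85, tol=1e-12, max_iter=10000):
    """Stationary distribution of a walker that restarts at x with probability 1 - c."""
    n = len(adj)
    deg = [sum(row) for row in adj]          # k_i; must be non-zero for every node
    pi = [1.0 if i == x else 0.0 for i in range(n)]
    for _ in range(max_iter):
        # (P^T pi)_j = sum_i P_ij * pi_i, with P_ij = a_ij / k_i (assumed form of P).
        new = [c * sum(adj[i][j] / deg[i] * pi[i] for i in range(n))
               + ((1.0 - c) if j == x else 0.0)
               for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

def rwr(adj, x, y, c=0.85):
    """Symmetric RWR similarity: pi_xy + pi_yx."""
    return rwr_vector(adj, x, c)[y] + rwr_vector(adj, y, c)[x]

# Two nodes joined by one edge (illustrative data).
pair = [[0, 1], [1, 0]]
score = rwr(pair, 0, 1, c=0.5)
```

For two nodes joined by a single edge the fixed point is π_x = (1/(1 + c), c/(1 + c)), so the similarity is 2c/(1 + c).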

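16) SimRank index SimR:

$$s_{xy}^{\mathrm{SimR}} = C \cdot \frac{\sum_{z \in \Gamma(x)} \sum_{z' \in \Gamma(y)} s_{zz'}^{\mathrm{SimR}}}{k_x k_y}$$

where $s_{xx} = 1$, and $C \in [0, 1]$ is the decay parameter for similarity propagation;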
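
SimRank is defined recursively, so in practice it is computed by starting from the identity matrix and repeating the update a fixed number of times. The sketch below assumes the standard recursion s_xy = C · Σ s_zz' / (k_x k_y) over neighbors z of x and z' of y, with s_xx = 1; the function name, the iteration count and the value of C are illustrative assumptions.

```python
def simrank(adj, C=0.8, iterations=20):
    """Iterate the SimRank recursion, starting from s = I."""
    n = len(adj)
    nbrs = [[j for j in range(n) if adj[i][j]] for i in range(n)]
    s = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        # The diagonal stays fixed at 1; only off-diagonal pairs are updated.
        new = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
        for x in range(n):
            for y in range(n):
                if x != y and nbrs[x] and nbrs[y]:
                    total = sum(s[z][w] for z in nbrs[x] for w in nbrs[y])
                    new[x][y] = C * total / (len(nbrs[x]) * len(nbrs[y]))
        s = new
    return s

# Path graph 0-1-2 (illustrative data).
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
sim = simrank(path, C=0.8)
```

In the path graph the two end nodes share their only neighbor, so their similarity equals C, while each end node and the middle node have similarity 0.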

17) Matrix forest index MFI:

$$S^{\mathrm{MFI}} = (I + \alpha L)^{-1}, \quad \alpha > 0$$

where $L = D - A$ is the Laplacian matrix of the network G and $I$ is the identity matrix;

18) Local path index LP:

$$S^{\mathrm{LP}} = A^{2} + \epsilon A^{3}$$

where $\epsilon$ is a free parameter; when $\epsilon = 0$, LP is equivalent to CN;
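
The local path index counts paths of length two plus a down-weighted count of paths of length three. The sketch below assumes the standard form A² + εA³, where ε is the free parameter mentioned in the text; the function names, the default ε and the toy graph are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def local_path(adj, eps=0.01):
    """Local path similarity matrix A^2 + eps * A^3."""
    n = len(adj)
    a2 = matmul(adj, adj)      # (A^2)_xy = number of common neighbours of x and y
    a3 = matmul(a2, adj)       # (A^3)_xy = number of walks of length 3 from x to y
    return [[a2[i][j] + eps * a3[i][j] for j in range(n)] for i in range(n)]

# Toy graph with edges 0-1, 0-2, 1-2, 2-3 (illustrative data).
adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
lp = local_path(adj, eps=0.01)
```

With eps set to 0 the off-diagonal entries reduce to the common neighbor counts, which matches the statement that LP is then equivalent to CN.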

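19) Local random walk index LRW:

$$s_{xy}^{\mathrm{LRW}}(t) = q_x\,\pi_{xy}(t) + q_y\,\pi_{yx}(t)$$

where $q_x$ is the initial resource distribution of node $x$, and $\vec{\pi}_x(0)$ is an $n \times 1$ vector in which only the $x$-th element is 1 and the other elements are 0, i.e. $\vec{\pi}_x(0) = \vec{e}_x$, with $\vec{\pi}_x(t+1) = P^{T}\vec{\pi}_x(t)$, $t \ge 0$;

20) Superposed local random walk index SRW:

$$s_{xy}^{\mathrm{SRW}}(t) = \sum_{\tau=1}^{t} s_{xy}^{\mathrm{LRW}}(\tau) = \sum_{\tau=1}^{t}\left[q_x\,\pi_{xy}(\tau) + q_y\,\pi_{yx}(\tau)\right]$$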
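
LRW follows a random walker for a fixed number of steps t, and SRW sums the LRW scores over steps 1 to t. The sketch below assumes the transition matrix P_xy = a_xy / k_x and, since the text only calls q_x the initial resource distribution, assumes the degree-based choice q_x = k_x / M used in the link prediction literature (M being the number of edges). The function names and the path graph are illustrative.

```python
def walk_distribution(adj, x, t):
    """Distribution pi_x(t) of a walker that starts at x and takes t steps."""
    n = len(adj)
    deg = [sum(row) for row in adj]          # every node is assumed to have degree >= 1
    pi = [1.0 if i == x else 0.0 for i in range(n)]
    for _ in range(t):
        # pi(t+1) = P^T pi(t), with P_ij = a_ij / k_i (assumed form of P).
        pi = [sum(adj[i][j] / deg[i] * pi[i] for i in range(n)) for j in range(n)]
    return pi

def lrw(adj, x, y, t):
    """Local random walk score q_x * pi_xy(t) + q_y * pi_yx(t)."""
    m = sum(sum(row) for row in adj) / 2     # number of edges M
    qx, qy = sum(adj[x]) / m, sum(adj[y]) / m   # assumption: q_x = k_x / M
    return (qx * walk_distribution(adj, x, t)[y]
            + qy * walk_distribution(adj, y, t)[x])

def srw(adj, x, y, t):
    """Superposed score: the sum of the LRW scores for steps 1..t."""
    return sum(lrw(adj, x, y, tau) for tau in range(1, t + 1))

# Path graph 0-1-2 (illustrative data).
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
one_step = lrw(path, 0, 1, 1)
two_step_sum = srw(path, 0, 1, 2)
```

Any other choice of q_x can be substituted in `lrw` without changing the rest of the sketch.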

S3: according to the ten-fold cross-validation method, dividing the network connection edge weights in the data set into ten equal parts, of which nine are used as the training set and the remaining one as the test set; performing multiple linear regression analysis in the R language on the features calculated in step S2 to obtain the fitted values for the test set, and comparing them with the original data to obtain the following evaluation indices: the Pearson correlation coefficient and the root mean square value. The model of the invention is simple and obtains good prediction results.
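
Step S3 can be sketched as follows. The source performs the regression in R; this example swaps in plain Python so that it is self-contained, fits ordinary least squares through the normal equations, and interprets the root mean square value as the root mean square error between predicted and true weights, which is an assumption. All function names and the synthetic data are illustrative; with real similarity features, some columns may be nearly collinear, in which case the normal equations become ill-conditioned and a more robust solver would be needed.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]

def pearson(u, v):
    """Pearson correlation; undefined if either sequence is constant."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u)
                           * sum((b - mv) ** 2 for b in v))

def rmse(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))

def ten_fold_cv(X, y, folds=10, seed=0):
    """Return the mean Pearson coefficient and mean RMSE over the folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    scores = []
    for k in range(folds):
        test = idx[k::folds]                 # every folds-th sample forms one fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        coef = fit_ols([X[i] for i in train], [y[i] for i in train])
        pred = predict(coef, [X[i] for i in test])
        truth = [y[i] for i in test]
        scores.append((pearson(pred, truth), rmse(pred, truth)))
    return (sum(s[0] for s in scores) / folds,
            sum(s[1] for s in scores) / folds)

# Synthetic, exactly linear data standing in for (features, edge weight) pairs.
X = [[float(i), float((i * i) % 7)] for i in range(50)]
y = [2.0 + 3.0 * a - b for a, b in X]
mean_r, mean_rmse = ten_fold_cv(X, y)
```

Because the synthetic weights are an exact linear function of the two features, the held-out predictions match the true values up to rounding; real edge weights would give a lower correlation and a non-zero error.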

As described above, the method for predicting connection edge weights in an undirected network graph combines the similarity of network nodes with analysis by a multiple linear regression model; the final prediction results are good and meet the requirements of practical use. The present invention is to be considered as illustrative and not restrictive. It will be understood by those skilled in the art that various changes, modifications and equivalents may be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. An undirected network connection edge weight prediction method based on node similarity, characterized in that the method comprises the following steps:

S2: according to the graph G = (V, E), using the node similarity theory of link prediction, calculating the following three types of features: local similarity indices, global similarity indices and semi-local similarity indices, wherein the local similarity indices comprise the common neighbors index CN, the Salton index, the Jaccard index, the Sørensen index, the hub promoted index HPI, the hub depressed index HDI, the LHN-I index, the preferential attachment index PA, the Adamic-Adar index AA and the resource allocation index RA; the global similarity indices comprise the Katz index, the LHN-II index, the average commute time ACT, the random-walk-based cosine similarity cos⁺, the random walk with restart RWR, the SimRank index SimR and the matrix forest index MFI; the semi-local similarity indices comprise the local path index LP, the local random walk index LRW and the superposed local random walk index SRW;

2. The undirected network connection edge weight prediction method based on node similarity as claimed in claim 1, wherein: in step S2, the adjacency matrix of the graph G is $A = (a_{ij})_{n \times n}$, $i, j \in \{1, 2, \dots, n\}$,