CN112966156B

CN112966156B - Directed network link prediction method based on structural disturbance and linear optimization

Info

Publication number: CN112966156B
Application number: CN202110309745.6A
Authority: CN
Inventors: 李小丽; 郭天娇; 刘波; 苏海龙
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2021-03-23
Filing date: 2021-03-23
Publication date: 2023-03-21
Anticipated expiration: 2041-03-23
Also published as: CN112966156A

Abstract

The invention discloses a directed network link prediction method based on structural disturbance and linear optimization, which mainly addresses the low accuracy of link prediction in directed networks. The scheme is as follows: 1) downloading a real directed network data set to obtain the adjacency matrix of the directed network; 2) decomposing the network adjacency matrix into a symmetric matrix and an asymmetric matrix; 3) dividing the symmetric matrix into a residual set and a disturbance set, and perturbing the residual set with the disturbance set to obtain a current initial disturbance matrix; 4) performing step 3) 10 times in total, averaging the results, and adding the asymmetric matrix to obtain a final disturbance matrix; 5) taking the final disturbance matrix as the input of the linear optimization (LO) algorithm and calculating a similarity matrix S; 6) sorting the similarities of the unconnected node pairs in S in descending order and taking the top P links as the predicted directed network links. The invention improves link prediction accuracy and can be used in various recommendation systems and traffic systems.

Description

Directed network link prediction method based on structural disturbance and linear optimization

Technical Field

The invention belongs to the technical field of data mining, and particularly relates to a directed network link prediction method which can be applied to various recommendation systems, traffic systems, biological research and criminal event analysis.

Background

In the current data era, accurately capturing and processing data is very important, and link prediction in complex networks is a basic data mining method. Link prediction can be used to address incomplete and unreliable data, and it is widely applied in recommendation systems, traffic systems, biological research, and the analysis of criminal and terrorist events.

Over the years, more and more researchers have studied the link prediction problem and proposed many link prediction algorithms. Link prediction for complex networks aims to predict missing links, and links that may appear in the future, from the information available in the network. However, most link prediction methods target only undirected complex networks, whereas most real-world networks are directed. For example, in a food web the predator-prey relationship is unidirectional, and such a relationship can only be represented by a directed edge. Link prediction in directed networks has therefore become both a research hotspot and a research difficulty. In a directed network, a method must predict not only which links are missing but also their direction. Link prediction algorithms for directed networks therefore have greater practical value.

In recent years, researchers have proposed several link prediction algorithms for directed networks. For example, the structural perturbation method, which makes good use of the global structural information of a network to predict missing links, was extended to directed networks through matrix decomposition in 2018. Its basic idea is to use a small fraction of the edges of the original network to perturb the network formed by the remaining edges, and thereby make predictions. The algorithm keeps the eigenvectors of the corresponding adjacency matrix unchanged and modifies its eigenvalues in order to recover the missing edge information of the original network. In 2019, Ratha Pech, Zhou Tao et al. proposed the linear optimization (LO) method, which can be applied directly to directed network link prediction. Its basic idea is to model the probability that a link exists between two nodes as a linear sum of the contributions of neighboring nodes, which converts the link prediction problem into an optimization problem over a likelihood matrix. The LO method mainly uses edge information along odd-length paths of the original network. However, the structural perturbation method and the LO method each use relatively little of the network's edge information, so their prediction accuracy is somewhat poor.
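The odd-path remark can be checked against the closed-form LO solution given in step 6.4 of the detailed description, written here with the adjacency matrix A as input: $Z^* = \alpha(\alpha A^T A + I)^{-1} A^T A$ and $S = A Z^*$. As a sketch only, and assuming that α is small enough for the spectral radius of $\alpha A^T A$ to be below 1 (an assumption not stated in the source), a Neumann series expansion gives:

$$
\begin{aligned}
Z^* &= (I + \alpha A^T A)^{-1}\,\alpha A^T A = \sum_{k=0}^{\infty} (-1)^k \alpha^{k+1} (A^T A)^{k+1}, \\
S = A Z^* &= \alpha\, A A^T A \;-\; \alpha^2\, A A^T A A^T A \;+\; \alpha^3\, A (A^T A)^3 \;-\; \cdots
\end{aligned}
$$

Each term is a product of an odd number (3, 5, 7, and so on) of factors A or $A^T$, so every contribution to $s_{xy}$ comes from a walk with an odd number of steps between x and y.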

Disclosure of Invention

The invention aims to overcome the above shortcomings of the prior art by providing a directed network link prediction method based on structural disturbance and linear optimization, which fuses the two techniques so that more of the network's edge information is used and prediction accuracy is improved.

To achieve this purpose, the technical scheme of the invention comprises the following steps:

(1) Downloading a real directed network data set, and obtaining the adjacency matrix A of the directed network from the node and link information in the data set;

(2) Decomposing the network adjacency matrix A into a symmetric matrix $A_s = \frac{A + A^T}{2}$ and an asymmetric matrix $A_a = \frac{A - A^T}{2}$, where $A^T$ is the transpose of A;

(3) Dividing the symmetric matrix $A_s$ into a residual set R and a disturbance set D at a ratio of 9:1, and perturbing the residual set R with the disturbance set D to obtain a current initial disturbance matrix M;

(4) Performing step (3) 10 times in total, summing the initial disturbance matrices M obtained each time, and averaging to obtain the average initial disturbance matrix $\bar{M}$;

(5) Adding the asymmetric matrix $A_a$ to the average initial disturbance matrix $\bar{M}$ to obtain the final disturbance matrix $F = \bar{M} + A_a$;

(6) Taking the final disturbance matrix F as the input of the linear optimization (LO) algorithm and calculating a similarity matrix S, in which the element $s_{xy}$ represents the probability that a link exists from node x to node y; S contains these probabilities for both connected and unconnected node pairs;

(7) Sorting the probabilities of the unconnected node pairs in the similarity matrix S in descending order; the top P links are the predicted directed network links.

Compared with the prior art, the invention has the following advantages:

First, the invention divides the symmetric matrix $A_s$ into a residual set R and a disturbance set D, perturbs the residual set R with the disturbance set D several times and averages the results, and then adds the asymmetric matrix $A_a$ to the average initial disturbance matrix $\bar{M}$. This increases the number of edges in the final disturbance matrix, yielding a final disturbance matrix F that carries more edge information.

Second, the final disturbance matrix F, which carries more edge information, is used as the input of the linear optimization (LO) algorithm to calculate the similarity matrix S, so the predicted directed links are more accurate and prediction accuracy improves over the conventional LO algorithm.

Drawings

FIG. 1 is a flow chart of an implementation of the present invention.

Detailed Description

Specific embodiments and effects of the present invention will be described in further detail below with reference to the accompanying drawings.

Referring to FIG. 1, the directed network link prediction method based on structural disturbance and linear optimization of the present invention includes the following steps:

Step 1. Acquiring a directed network data set to obtain the adjacency matrix A.

Download a real directed network data set from the website http://vlado.fmf.uni-lj.si/pub/networks/data;

Obtain the adjacency matrix A of the directed network from the node and link information in the data set, where the element $a_{xy}$ of A indicates whether there is a directed link from node x to node y: $a_{xy} \neq 0$ means that there is a directed link from node x to node y, and $a_{xy} = 0$ means that there is none.
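As a minimal sketch of step 1, the adjacency matrix can be filled in from a list of directed links. The function name `adjacency_from_edges`, the 0-based node indices, and the 4-node example network are assumptions made for illustration; they are not taken from the data sets used in the experiments.

```python
import numpy as np

def adjacency_from_edges(edges, n):
    """Build the n x n adjacency matrix A with A[x, y] = 1 for each directed link x -> y."""
    A = np.zeros((n, n))
    for x, y in edges:
        A[x, y] = 1.0
    return A

# Hypothetical 4-node directed network (illustration only).
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
A = adjacency_from_edges(edges, n=4)
```

Because the links are directed, `A[0, 1]` is 1 while `A[1, 0]` stays 0, so A is generally not symmetric.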

Step 2. Decomposing the adjacency matrix A.

Decompose the adjacency matrix A into a symmetric matrix $A_s = \frac{A + A^T}{2}$ and an asymmetric matrix $A_a = \frac{A - A^T}{2}$, where $A^T$ is the transpose of A.
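The decomposition in step 2 can be sketched as follows. The names `A_s` and `A_a` for the symmetric and asymmetric parts, and the 3-node cycle used as input, are assumed here for illustration.

```python
import numpy as np

def decompose(A):
    """Split A into a symmetric part (A + A^T) / 2 and an asymmetric part (A - A^T) / 2."""
    A_s = (A + A.T) / 2.0
    A_a = (A - A.T) / 2.0
    return A_s, A_a

# Hypothetical directed 3-cycle 0 -> 1 -> 2 -> 0.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
A_s, A_a = decompose(A)
```

The two parts sum back to A, which is why adding the asymmetric part again in step 5 restores the direction information that the symmetric part discards.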

Step 3. Performing the perturbation to obtain the current initial disturbance matrix M.

3.1) Randomly divide the symmetric matrix $A_s$ into a residual set R and a disturbance set D at a ratio of 9:1;

3.2) Express the residual set R as follows:

$$R = \sum_{k=1}^{n} \lambda_k x_k x_k^T$$

where $\lambda_k$ and $x_k$ are the eigenvalues and eigenvectors of the residual set R, respectively, with $\lambda_k \in \mathbb{R}$ and $x_k \in \mathbb{R}^n$;

3.3) Perturb the residual set R with the disturbance set D, which gives the following expression:

$$(R + D)(x_k + \Delta x_k) = (\lambda_k + \Delta\lambda_k)(x_k + \Delta x_k)$$

where $\lambda_k + \Delta\lambda_k$ and $x_k + \Delta x_k$ are the eigenvalues and eigenvectors of R + D, respectively, and $\Delta\lambda_k$ and $\Delta x_k$ are the changes in the eigenvalue and eigenvector caused by the disturbance set D;

3.4) Left-multiply the expression in 3.3) by $x_k^T$ to obtain:

$$x_k^T (R + D)(x_k + \Delta x_k) = (\lambda_k + \Delta\lambda_k)\, x_k^T (x_k + \Delta x_k)$$

Since R is symmetric, $x_k^T R = \lambda_k x_k^T$, so the terms containing R cancel against the terms containing $\lambda_k$. Ignoring the second-order terms $x_k^T D \Delta x_k$ and $\Delta\lambda_k x_k^T \Delta x_k$ then gives $\Delta\lambda_k$:

$$\Delta\lambda_k \approx \frac{x_k^T D x_k}{x_k^T x_k}$$

where $x_k^T$ is the transpose of $x_k$;

3.5) Keep the eigenvectors $x_k$ of the residual set R from 3.2) unchanged, change the eigenvalues of R to $\lambda_k + \Delta\lambda_k$, and obtain the current initial disturbance matrix M:

$$M = \sum_{k=1}^{n} (\lambda_k + \Delta\lambda_k)\, x_k x_k^T$$
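One perturbation round (steps 3.1 to 3.5) can be sketched as below. Several details are assumptions rather than something the source specifies: the 9:1 split is done over the nonzero unordered node pairs of the symmetric matrix so that R and D stay symmetric, repeated eigenvalues get no special treatment, and the function names and the random test network are hypothetical.

```python
import numpy as np

def split_symmetric(A_s, frac=0.1, rng=None):
    """Move a random fraction `frac` of the nonzero node pairs of A_s into D; the rest form R."""
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = np.nonzero(np.triu(A_s))          # each unordered pair listed once
    n_pert = int(round(frac * len(rows)))
    picked = rng.choice(len(rows), size=n_pert, replace=False)
    D = np.zeros_like(A_s)
    D[rows[picked], cols[picked]] = A_s[rows[picked], cols[picked]]
    D[cols[picked], rows[picked]] = A_s[rows[picked], cols[picked]]
    return A_s - D, D

def perturb(R, D):
    """Keep the eigenvectors of R and shift each eigenvalue by x_k^T D x_k (first order)."""
    lam, X = np.linalg.eigh(R)                     # columns of X are orthonormal eigenvectors
    d_lam = np.einsum("ik,ij,jk->k", X, D, X)      # x_k^T D x_k, with x_k^T x_k = 1
    return (X * (lam + d_lam)) @ X.T               # sum_k (lam_k + d_lam_k) x_k x_k^T

# Hypothetical random directed network with 20 nodes.
rng = np.random.default_rng(0)
A = (rng.random((20, 20)) < 0.2).astype(float)
A_s = (A + A.T) / 2.0
R, D = split_symmetric(A_s, frac=0.1, rng=rng)
M = perturb(R, D)
```

`np.linalg.eigh` is used because R is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors, matching the conditions stated in 3.2.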

Step 4. Calculating the average initial disturbance matrix $\bar{M}$.

Perform step 3 10 times in total, sum the initial disturbance matrices M obtained each time, and average them to obtain the average initial disturbance matrix $\bar{M} = \frac{1}{10}\sum_{i=1}^{10} M_i$, where $M_i$ is the matrix obtained in the i-th run.

Step 5. Calculating the final disturbance matrix F.

Expand the average initial disturbance matrix $\bar{M}$ to increase the number of edges in the final disturbance matrix F; that is, add the asymmetric matrix $A_a$ to $\bar{M}$ to obtain the final disturbance matrix $F = \bar{M} + A_a$.
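Steps 4 and 5 can be sketched as an average over repeated perturbation rounds followed by adding back the asymmetric part. The parameter `perturb_once` is a hypothetical callable standing in for one round of step 3; the identity stand-in used in the example performs no real perturbation and only illustrates the data flow.

```python
import numpy as np

def final_disturbance_matrix(A, perturb_once, runs=10):
    """Average `runs` perturbed versions of the symmetric part, then add the asymmetric part."""
    A_s = (A + A.T) / 2.0                          # symmetric part
    A_a = (A - A.T) / 2.0                          # asymmetric part
    M_bar = sum(perturb_once(A_s) for _ in range(runs)) / runs
    return M_bar + A_a

# Hypothetical 3-node network; the identity stand-in leaves the symmetric part untouched.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
F = final_disturbance_matrix(A, perturb_once=lambda A_s: A_s)
```

With the identity stand-in, F equals A exactly. With a real perturbation round, F differs from A by the averaged correction to the symmetric part.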

Step 6. Obtaining the similarity matrix S with the linear optimization (LO) algorithm.

6.1) Take the final disturbance matrix F as input and solve the following optimization problem:

$$\min_{Z} \; \alpha \|F - FZ\|_F^2 + \|Z\|_F^2$$

where $\|Z\|_F^2 = \mathrm{Tr}(Z^T Z)$ is the squared Frobenius norm of Z, $\|F - FZ\|_F^2 = \mathrm{Tr}[(F - FZ)^T (F - FZ)]$ is the squared Frobenius norm of F - FZ, Tr denotes the trace of a matrix, α is a free parameter that balances the two terms, Z is the node contribution matrix, and $Z^T$ is the transpose of Z;

6.2) Expand the expression in 6.1) as follows:

$$\min_{Z} \; \alpha\, \mathrm{Tr}[(F - FZ)^T (F - FZ)] + \mathrm{Tr}(Z^T Z)$$

6.3) Let $f(F, Z) = \alpha\, \mathrm{Tr}[(F - FZ)^T (F - FZ)] + \mathrm{Tr}(Z^T Z)$; the partial derivative of the function $f(F, Z)$ with respect to Z is:

$$\frac{\partial f}{\partial Z} = \alpha(-2F^T F + 2F^T F Z) + 2Z$$

6.4) Set $\alpha(-2F^T F + 2F^T F Z) + 2Z = 0$ and solve this equation for the matrix Z to obtain the optimal solution $Z^*$:

$$Z^* = \alpha(\alpha F^T F + I)^{-1} F^T F$$

where $F^T$ is the transpose of F and I is the identity matrix;

6.5) Calculate the similarity matrix S from the input disturbance matrix F and the optimal solution $Z^*$ obtained in 6.4):

$$S = F Z^*$$

where the element $s_{xy}$ of the similarity matrix S represents the probability that a link exists from node x to node y; S contains these probabilities for both connected and unconnected node pairs.
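The closed-form solution in 6.4 and the similarity matrix in 6.5 can be sketched as follows. The function names, the random input matrix, and the value `alpha = 0.1` are assumptions for illustration; the source leaves α as a free parameter.

```python
import numpy as np

def lo_contribution(F, alpha):
    """Closed-form minimizer Z* = alpha * (alpha * F^T F + I)^{-1} F^T F."""
    G = F.T @ F
    # Solve (alpha * G + I) Z = alpha * G rather than forming the inverse explicitly.
    return np.linalg.solve(alpha * G + np.eye(G.shape[0]), alpha * G)

def lo_similarity(F, alpha):
    """Similarity matrix S = F Z*."""
    return F @ lo_contribution(F, alpha)

# Hypothetical 5 x 5 input matrix and an assumed value of the free parameter.
rng = np.random.default_rng(0)
F = rng.random((5, 5))
alpha = 0.1
Z = lo_contribution(F, alpha)
S = lo_similarity(F, alpha)
```

For α > 0 the matrix $\alpha F^T F + I$ is positive definite, so the linear system always has a unique solution. Substituting Z back into the gradient expression from 6.3 gives a matrix that is zero up to rounding error.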

Step 7. Obtaining the predicted directed network links from the similarity matrix S.

Sort the probabilities of the unconnected node pairs in the similarity matrix S in descending order; the top P links are the predicted directed network links.
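The ranking in step 7 can be sketched as below. Excluding self-pairs (x = y) from the candidates is an assumption of this sketch, and the function name and the small example matrices are hypothetical.

```python
import numpy as np

def top_p_links(S, A, P):
    """Return the P unconnected ordered pairs (x, y), x != y, with the highest scores S[x, y]."""
    candidates = (A == 0) & ~np.eye(A.shape[0], dtype=bool)
    scores = np.where(candidates, S, -np.inf)      # mask existing links and self-pairs
    order = np.argsort(scores, axis=None)[::-1][:P]
    xs, ys = np.unravel_index(order, S.shape)
    return [(int(x), int(y)) for x, y in zip(xs, ys) if np.isfinite(scores[x, y])]

# Hypothetical 3-node network with links 0 -> 1 and 1 -> 2, and made-up similarity scores.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
S = np.array([[0.0, 0.9, 0.4],
              [0.7, 0.0, 0.8],
              [0.6, 0.2, 0.0]])
predicted = top_p_links(S, A, P=2)                 # [(1, 0), (2, 0)]
```

The pairs (0, 1) and (1, 2) have the highest scores overall but are already linked, so they are masked out before ranking.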

The effect of the invention is further illustrated by the following simulation experiments:

1. Simulation conditions:

The operating system used in the simulation experiments is Windows 10, and the software used is MATLAB.

2. Simulation content:

Link prediction was performed on 15 directed networks using both the existing linear optimization (LO) algorithm and the method of the invention. The average prediction accuracy of the two methods was computed, and the results are shown in Table 1.

Table 1. Comparison of the average prediction accuracy of the two methods

Network name         Existing LO algorithm    The invention
CrystalC             0.4993                   0.5036
Japanese macaques    0.2701                   0.2863
Everglades           0.6107                   0.6175
gramdry              0.6144                   0.6239
gramwet              0.6107                   0.6189
crpdry               0.5581                   0.5748
crpwet               0.5474                   0.5610
World trade          0.4628                   0.4724
mangdry              0.5271                   0.5332
mangwet              0.5323                   0.5485
baydry               0.5771                   0.5848
baywet               0.5723                   0.5789
Little Rock Lake     0.8084                   0.8154
USAir                0.4218                   0.4449
SmaGri               0.2012                   0.2021

As can be seen from Table 1, the invention achieves a higher average prediction accuracy than the existing linear optimization (LO) algorithm on all 15 directed networks, with gains ranging from 0.0009 (SmaGri) to 0.0231 (USAir).

Claims

1. A directed network link prediction method based on structural disturbance and linear optimization, characterized by comprising the following steps:

(2) Decomposing the network adjacency matrix A into a symmetric matrix $A_s = \frac{A + A^T}{2}$ and an asymmetric matrix $A_a = \frac{A - A^T}{2}$, where $A^T$ is the transpose of A;

(3) Dividing the symmetric matrix $A_s$ into a residual set R and a disturbance set D at a ratio of 9:1, and perturbing the residual set R with the disturbance set D to obtain the current initial disturbance matrix M;

(5) Adding the asymmetric matrix $A_a$ to the average initial disturbance matrix $\bar{M}$ to obtain the final disturbance matrix $F = \bar{M} + A_a$;

(6) Taking the final disturbance matrix F as the input of the linear optimization (LO) algorithm and calculating a similarity matrix S, in which the element $s_{xy}$ represents the probability that a link exists from node x to node y; S contains these probabilities for both connected and unconnected node pairs;

2. The method of claim 1, wherein perturbing the residual set R with the disturbance set D in (3) to obtain the initial disturbance matrix M is implemented as follows:

(3a) Expressing the residual set R as follows:

$$R = \sum_{k=1}^{n} \lambda_k x_k x_k^T$$

where $\lambda_k$ and $x_k$ are the eigenvalues and eigenvectors of the residual set R, respectively;

(3b) Perturbing the residual set R with the disturbance set D to obtain the following expression:

$$(R + D)(x_k + \Delta x_k) = (\lambda_k + \Delta\lambda_k)(x_k + \Delta x_k)$$

where $\lambda_k + \Delta\lambda_k$ and $x_k + \Delta x_k$ are the eigenvalues and eigenvectors of R + D, respectively, and $\Delta\lambda_k$ and $\Delta x_k$ are the changes in the eigenvalue and eigenvector caused by the disturbance set D;

(3c) Left-multiplying the expression in (3b) by $x_k^T$ to obtain:

$$x_k^T (R + D)(x_k + \Delta x_k) = (\lambda_k + \Delta\lambda_k)\, x_k^T (x_k + \Delta x_k)$$

and ignoring the second-order terms $x_k^T D \Delta x_k$ and $\Delta\lambda_k x_k^T \Delta x_k$ to obtain $\Delta\lambda_k$:

$$\Delta\lambda_k \approx \frac{x_k^T D x_k}{x_k^T x_k}$$

where $x_k^T$ is the transpose of $x_k$;

(3d) Keeping the eigenvectors $x_k$ of the residual set R in (3a) unchanged, changing the eigenvalues of R to $\lambda_k + \Delta\lambda_k$, and obtaining the current initial disturbance matrix M:

$$M = \sum_{k=1}^{n} (\lambda_k + \Delta\lambda_k)\, x_k x_k^T$$

3. The method of claim 1, wherein the similarity matrix S in (6) is calculated as follows:

(6a) Taking the disturbance matrix F as input and solving the following optimization problem:

$$\min_{Z} \; \alpha \|F - FZ\|_F^2 + \|Z\|_F^2$$

where $\|Z\|_F^2 = \mathrm{Tr}(Z^T Z)$ is the squared Frobenius norm of Z, $\|F - FZ\|_F^2 = \mathrm{Tr}[(F - FZ)^T (F - FZ)]$ is the squared Frobenius norm of F - FZ, Tr denotes the trace of a matrix, α is a free parameter that balances the two terms, Z is the node contribution matrix, and $Z^T$ is the transpose of Z;

(6b) Expanding the expression in (6a) as follows:

$$\min_{Z} \; \alpha\, \mathrm{Tr}[(F - FZ)^T (F - FZ)] + \mathrm{Tr}(Z^T Z)$$

(6c) Letting $f(F, Z) = \alpha\, \mathrm{Tr}[(F - FZ)^T (F - FZ)] + \mathrm{Tr}(Z^T Z)$, the partial derivative of the function $f(F, Z)$ with respect to Z being:

$$\frac{\partial f}{\partial Z} = \alpha(-2F^T F + 2F^T F Z) + 2Z$$

(6d) Setting $\alpha(-2F^T F + 2F^T F Z) + 2Z = 0$ to obtain the optimal solution $Z^*$ of the matrix Z:

$$Z^* = \alpha(\alpha F^T F + I)^{-1} F^T F$$

where $F^T$ is the transpose of F and I is the identity matrix;

(6e) Calculating the similarity matrix S from the input disturbance matrix F and the optimal solution $Z^*$ obtained in (6d):

$$S = F Z^*$$