CN112966156B - Directed network link prediction method based on structural disturbance and linear optimization - Google Patents
Directed network link prediction method based on structural disturbance and linear optimization
- Publication number
- CN112966156B
- Authority
- CN
- China
- Prior art keywords
- matrix
- disturbance
- network
- directed network
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Algebra (AREA)
- Computing Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a directed network link prediction method based on structural disturbance and linear optimization, which mainly addresses the low accuracy of link prediction in directed networks. The scheme is as follows: 1) download a real directed network data set and obtain the adjacency matrix of the directed network; 2) decompose the adjacency matrix into a symmetric matrix and an asymmetric matrix; 3) divide the symmetric matrix into a residual set and a disturbance set, and perturb the residual set with the disturbance set to obtain a current initial disturbance matrix; 4) repeat step 3) 10 times in total, average the results, and add the asymmetric matrix to obtain a final disturbance matrix; 5) take the final disturbance matrix as the input of the linear optimization (LO) algorithm and calculate a similarity matrix S; 6) sort the similarities of the unconnected node pairs in S in descending order and take the top P links as the predicted directed network links. The invention improves link prediction accuracy and can be used in various recommendation systems and traffic systems.
Description
Technical Field
The invention belongs to the technical field of data mining, and particularly relates to a directed network link prediction method which can be applied to various recommendation systems, traffic systems, biological research and criminal event analysis.
Background
In the current data era, accurately capturing and processing data is very important, and link prediction in complex networks is a basic data mining method. Link prediction can be used not only to address incomplete and unreliable data, but is also widely applied in recommendation systems, traffic systems, biological research, and the analysis of criminal and terrorist events.
Over the years, more and more researchers have studied the link prediction problem and proposed many link prediction algorithms. Link prediction in complex networks aims to predict missing links, and links that may appear in the future, from the information available in the network. However, most link prediction methods target only undirected complex networks, whereas most real-world networks are directed. For example, in a food web, predator-prey relationships are unidirectional, and such relationships can only be represented by directed edges. Link prediction in directed networks has therefore gradually become both a research hotspot and a research difficulty. In a directed network, it is necessary to predict not only which links are missing but also the direction of each missing link. Consequently, link prediction algorithms designed only for undirected networks have limited practical value.
In recent years, researchers have proposed several link prediction algorithms for directed networks. For example, the structural perturbation method is an algorithm that makes good use of the global structural information of a network to predict missing links, and it was extended to directed networks through matrix decomposition in 2018. The basic idea of the structural perturbation method is to use a small fraction of the edges of the original complex network to perturb the network formed by the remaining edges, thereby achieving prediction. In this algorithm, the eigenvectors of the adjacency matrix of the complex network are kept unchanged while its eigenvalues are modified, so as to recover the missing edge information of the original network. In 2019, Ratha Pech, Zhou Tao et al. proposed the linear optimization (LO) method, an algorithm that can be applied directly to directed network link prediction. Its basic idea is to express the probability that a link exists between two nodes as a linear sum of the contributions of their neighboring nodes, which converts the link prediction problem into an optimization problem over a likelihood matrix. The LO method mainly uses the odd-length path information of the original complex network for prediction. However, both the structural perturbation method and the LO method use relatively little of the network's edge information, which results in somewhat lower prediction accuracy.
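To make the odd-path remark concrete, the following is a short derivation under two assumptions that are not stated in the source: the network is undirected, so its adjacency matrix A is symmetric, and the free parameter α is small enough that $\alpha\,\rho(A^2)<1$, where ρ denotes the spectral radius. Starting from the closed-form LO solution $Z^*=\alpha(\alpha A^TA+I)^{-1}A^TA$ and $S=AZ^*$, with $A^TA=A^2$:

$$S=\alpha A^3(I+\alpha A^2)^{-1}=\sum_{k=0}^{\infty}(-1)^k\alpha^{k+1}A^{2k+3}=\alpha A^3-\alpha^2A^5+\alpha^3A^7-\cdots$$

Only odd powers of A appear, and entry (x, y) of $A^m$ counts the walks of length m from x to y, which is one way to read the statement that the LO method relies on odd-length paths.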
Disclosure of Invention
In view of the deficiencies of the prior art, the object of the invention is to provide a directed network link prediction method based on structural disturbance and linear optimization that fuses the two techniques so as to use more network edge information and improve prediction accuracy.
In order to achieve the purpose, the technical scheme of the invention comprises the following steps:
(1) Download a real directed network data set, and obtain the adjacency matrix A of the directed network from the node and link information in the data set;
(2) Decompose the adjacency matrix A into a symmetric matrix and an asymmetric matrix;
(3) Divide the symmetric matrix into a residual set R and a disturbance set D at a ratio of 9:1, and perturb the residual set R with the disturbance set D to obtain a current initial disturbance matrix M;
(4) Repeat step (3) 10 times, sum the initial disturbance matrices M obtained each time, and average them to obtain an average initial disturbance matrix;
(5) Add the asymmetric matrix to the average initial disturbance matrix to obtain the final disturbance matrix F;
(6) Take the final disturbance matrix F as the input of the linear optimization (LO) algorithm and calculate a similarity matrix S, where the element $s_{xy}$ of the similarity matrix represents the probability that a link exists from node x to node y in the network, covering both connected and unconnected node pairs;
(7) Sort the probabilities of the unconnected node pairs in the similarity matrix S in descending order; the top P links are the predicted directed network links.
Compared with the prior art, the invention has the following advantages:
First, the invention divides the symmetric matrix into a residual set R and a disturbance set D, perturbs the residual set R with the disturbance set D several times and averages the results, and then adds the asymmetric matrix to the average initial disturbance matrix. This increases the number of edges in the final disturbance matrix and yields a final disturbance matrix F that carries more edge information.
Second, the final disturbance matrix F, which carries more edge information, is used as the input of the linear optimization (LO) algorithm to calculate the similarity matrix S, so the predicted directed links are more accurate and the prediction accuracy is improved compared with the existing LO algorithm.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
Detailed Description
Specific embodiments and effects of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the method for predicting the directed network link based on the structural disturbance and the linear optimization of the present invention includes the following steps:
Step 1, acquire a directed network data set to obtain an adjacency matrix A.
Download a real directed network data set from the website http://vlado.fmf.uni-lj.si/pub/networks/data;
Obtain the adjacency matrix A of the directed network from the node and link information in the data set, where the element $a_{xy}$ of A indicates whether there is a directed link from node x to node y: if $a_{xy} \neq 0$, there is a directed link from node x to node y; if $a_{xy} = 0$, there is no directed link from node x to node y.
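As an illustration of Step 1, here is a minimal Python sketch that builds an adjacency matrix from a list of directed links. The function name `adjacency_from_edges`, the zero-based node ids, and the toy edge list are assumptions made for this example; the patent does not specify a file format or a loader.

```python
import numpy as np

def adjacency_from_edges(edges, n):
    """Build an n x n adjacency matrix; A[x, y] = 1 means a directed link x -> y."""
    A = np.zeros((n, n))
    for x, y in edges:
        A[x, y] = 1.0
    return A

# Hypothetical toy network with 4 nodes (not one of the patent's data sets).
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
A = adjacency_from_edges(edges, n=4)
```

Because links are directed, `A[0, 1]` is nonzero here while `A[1, 0]` stays zero.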
Step 2, decompose the adjacency matrix A into a symmetric matrix and an asymmetric matrix, where $A^T$ is the transpose of A.
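The sketch below shows one way to carry out such a decomposition. It assumes the standard split into the symmetric part (A + A^T)/2 and the skew-symmetric part (A - A^T)/2; the exact formula used by the invention is not reproduced in this text, so this choice and the names `decompose`, `A_sym`, and `A_asym` are assumptions.

```python
import numpy as np

def decompose(A):
    """Split A into a symmetric part and a skew-symmetric part (assumed formula)."""
    A_sym = (A + A.T) / 2.0    # satisfies A_sym == A_sym.T
    A_asym = (A - A.T) / 2.0   # satisfies A_asym == -A_asym.T
    return A_sym, A_asym

# Hypothetical 3-node directed cycle.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
A_sym, A_asym = decompose(A)
```

With this choice the two parts sum back to A, so adding the asymmetric part in Step 5 restores the direction information that the symmetric part discards.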
Step 3, perform the disturbance to obtain the current initial disturbance matrix M.
3.1) Randomly divide the symmetric matrix into a residual set R and a disturbance set D at a ratio of 9:1;
3.2) Express the residual set R as follows:
$$R=\sum_{k=1}^{n}\lambda_k x_k x_k^T$$
where $\lambda_k$ and $x_k$ are the eigenvalues and eigenvectors of the residual set R, respectively, with $\lambda_k \in \mathbb{R}$ and $x_k \in \mathbb{R}^n$;
3.3) Perturb the residual set R with the disturbance set D, which gives the following expression:
$$(R+D)(x_k+\Delta x_k)=(\lambda_k+\Delta\lambda_k)(x_k+\Delta x_k)$$
where $\lambda_k+\Delta\lambda_k$ and $x_k+\Delta x_k$ are the eigenvalues and eigenvectors of R + D, respectively, and $\Delta\lambda_k$ and $\Delta x_k$ are the changes in the eigenvalue and eigenvector caused by the disturbance set D;
3.4) Left-multiply the expression $(R+D)(x_k+\Delta x_k)=(\lambda_k+\Delta\lambda_k)(x_k+\Delta x_k)$ by $x_k^T$ and neglect second-order terms to obtain:
$$\Delta\lambda_k \approx \frac{x_k^T D x_k}{x_k^T x_k}$$
3.5) Keep the eigenvectors $x_k$ of the residual set R in 3.2) unchanged, change the eigenvalues of the residual set R to $\lambda_k+\Delta\lambda_k$, and obtain the current initial disturbance matrix M:
$$M=\sum_{k=1}^{n}(\lambda_k+\Delta\lambda_k)x_k x_k^T$$
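Steps 3.1 to 3.5 can be sketched in Python as follows. This is an illustrative reading, not the patent's MATLAB code: the function name `perturb_once`, the treatment of each above-diagonal nonzero entry as one edge, the rounding of the 10% share, and the toy matrix are all assumptions, and no special handling of repeated eigenvalues is attempted.

```python
import numpy as np

def perturb_once(A_sym, rng, frac=0.1):
    """One structural-disturbance round on a symmetric matrix A_sym."""
    # Treat each nonzero entry above the diagonal as one undirected edge.
    rows, cols = np.nonzero(np.triu(A_sym, k=1))
    m = len(rows)
    # Move about `frac` of the edges (at least one) into the disturbance set D.
    picked = rng.choice(m, size=max(1, int(round(frac * m))), replace=False)
    D = np.zeros_like(A_sym)
    D[rows[picked], cols[picked]] = A_sym[rows[picked], cols[picked]]
    D = D + D.T                          # keep D symmetric
    R = A_sym - D                        # residual set
    lam, X = np.linalg.eigh(R)           # columns of X are orthonormal eigenvectors
    d_lam = np.sum(X * (D @ X), axis=0)  # x_k^T D x_k, since x_k^T x_k = 1
    return (X * (lam + d_lam)) @ X.T     # sum_k (lam_k + d_lam_k) x_k x_k^T

# Hypothetical 6-node symmetric matrix, used only to exercise the function.
A_sym = np.zeros((6, 6))
for x, y in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]:
    A_sym[x, y] = A_sym[y, x] = 1.0
M = perturb_once(A_sym, np.random.default_rng(0))
```

`np.linalg.eigh` is used because R is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors, matching the conditions stated in 3.2.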
Step 4, repeat Step 3 a total of 10 times, sum the initial disturbance matrices M obtained each time, and average them to obtain the average initial disturbance matrix.
Step 5, calculate the final disturbance matrix F.
Expand the average initial disturbance matrix to increase the number of edges in the final disturbance matrix F; that is, add the asymmetric matrix to the average initial disturbance matrix to obtain the final disturbance matrix F.
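The averaging and the addition of the asymmetric matrix can be sketched as below. The function name `final_disturbance_matrix` is hypothetical, and the ten input matrices are random symmetric stand-ins rather than real outputs of the disturbance step.

```python
import numpy as np

def final_disturbance_matrix(initial_matrices, A_asym):
    """Average the initial disturbance matrices, then add the asymmetric matrix."""
    M_avg = np.mean(initial_matrices, axis=0)
    return M_avg + A_asym

rng = np.random.default_rng(0)
# Stand-ins for 10 initial disturbance matrices M (random symmetric matrices).
Ms = []
for _ in range(10):
    B = rng.random((4, 4))
    Ms.append((B + B.T) / 2.0)
# Hypothetical skew-symmetric matrix playing the role of the asymmetric part.
A_asym = np.array([[0.0, 0.5, 0.0, 0.0],
                   [-0.5, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.5],
                   [0.0, 0.0, -0.5, 0.0]])
F = final_disturbance_matrix(Ms, A_asym)
```

Since the averaged matrix is symmetric, all direction information in F comes from the added asymmetric matrix.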
Step 6, obtain the similarity matrix S using the linear optimization (LO) algorithm.
6.1) Input the final disturbance matrix F and solve the following optimization problem:
$$\min_{Z}\ \alpha\|F-FZ\|_F^2+\|Z\|_F^2$$
where $\|Z\|_F^2=\mathrm{Tr}(Z^TZ)$ is the squared Frobenius norm of Z, $\|F-FZ\|_F^2=\mathrm{Tr}[(F-FZ)^T(F-FZ)]$ is the squared Frobenius norm of F - FZ, Tr denotes the trace of a matrix, α is a free parameter that balances the two terms, Z is the node contribution matrix, and $Z^T$ is the transpose of Z;
6.2) Expand the expression in 6.1) as follows:
$$\min_{Z}\ \alpha\,\mathrm{Tr}[(F-FZ)^T(F-FZ)]+\mathrm{Tr}(Z^TZ)$$
6.3) Let $f(F,Z)=\alpha\,\mathrm{Tr}[(F-FZ)^T(F-FZ)]+\mathrm{Tr}(Z^TZ)$; the partial derivative of the function f(F, Z) with respect to Z is:
$$\frac{\partial f}{\partial Z}=\alpha(-2F^TF+2F^TFZ)+2Z$$
6.4) Set $\alpha(-2F^TF+2F^TFZ)+2Z=0$ and solve this equation for Z to obtain the optimal solution $Z^*$:
$$Z^*=\alpha(\alpha F^TF+I)^{-1}F^TF$$
where $F^T$ is the transpose of F and I is the identity matrix;
6.5) Calculate the similarity matrix S from the input disturbance matrix F and the optimal solution $Z^*$ obtained in 6.4):
$$S=FZ^*$$
where the element $s_{xy}$ of the similarity matrix S represents the probability that a link exists from node x to node y in the network, covering both connected and unconnected node pairs.
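The closed-form solution in 6.4 and 6.5 can be evaluated with a few lines of NumPy. In this sketch the function name `lo_similarity`, the random input matrix, and the value `alpha = 0.1` are assumptions chosen for illustration; the source leaves α as a free parameter. A linear solve is used in place of an explicit matrix inverse, which gives the same result.

```python
import numpy as np

def lo_similarity(F, alpha):
    """Return S = F Z* and Z* = alpha (alpha F^T F + I)^{-1} F^T F."""
    n = F.shape[0]
    G = F.T @ F
    # Solve (alpha G + I) Y = G for Y instead of forming the inverse explicitly.
    Z_star = alpha * np.linalg.solve(alpha * G + np.eye(n), G)
    return F @ Z_star, Z_star

# Hypothetical input: a random 5 x 5 matrix standing in for F.
F = np.random.default_rng(0).random((5, 5))
alpha = 0.1  # assumed value of the free parameter
S, Z_star = lo_similarity(F, alpha)

# The gradient from 6.3 should vanish at Z*.
grad = alpha * (-2 * F.T @ F + 2 * F.T @ F @ Z_star) + 2 * Z_star
```

The matrix αF^TF + I is symmetric positive definite for any α > 0, so the solve is always well posed.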
Step 7, obtain the predicted directed network links using the similarity matrix S.
Sort the probabilities of the unconnected node pairs in the similarity matrix S in descending order; the top P links are the predicted directed network links.
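The ranking in Step 7 might look like the following sketch. The function name `top_p_links` and the toy matrices are hypothetical, and excluding self-loops (x = y) from the candidates is an assumption that the source does not state.

```python
import numpy as np

def top_p_links(S, A, P):
    """Return the P unconnected ordered pairs (x, y) with the highest scores in S."""
    n = A.shape[0]
    # Candidates: pairs with no existing link x -> y; self-loops excluded (assumption).
    candidates = (A == 0) & ~np.eye(n, dtype=bool)
    xs, ys = np.nonzero(candidates)
    order = np.argsort(-S[xs, ys], kind="stable")[:P]
    return [(int(xs[i]), int(ys[i])) for i in order]

# Hypothetical 3-node example.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
S = np.array([[0.0, 0.9, 0.2],
              [0.7, 0.0, 0.8],
              [0.4, 0.6, 0.0]])
predicted = top_p_links(S, A, P=2)  # [(1, 0), (2, 1)]
```

Pairs (0, 1) and (1, 2) have the highest scores overall but are already connected, so they are skipped and the two best unconnected pairs are returned.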
The effect of the invention is further illustrated by the following simulation experiments:
1. Simulation conditions:
The simulation experiments were run on Windows 10 using MATLAB.
2. Simulation content:
Link prediction was performed on 15 directed networks using the existing linear optimization (LO) algorithm and the method of the invention, respectively. The average prediction accuracy of directed network link prediction was computed for both methods; the results are shown in Table 1.
Table 1. Comparison of the average prediction accuracy of the two methods
Network name | Existing LO algorithm | The invention |
---|---|---|
CrystalC | 0.4993 | 0.5036 |
Japanese macaques | 0.2701 | 0.2863 |
Everglades | 0.6107 | 0.6175 |
gramdry | 0.6144 | 0.6239 |
gramwet | 0.6107 | 0.6189 |
crpdry | 0.5581 | 0.5748 |
crpwet | 0.5474 | 0.5610 |
World trade | 0.4628 | 0.4724 |
mangdry | 0.5271 | 0.5332 |
mangwet | 0.5323 | 0.5485 |
baydry | 0.5771 | 0.5848 |
baywet | 0.5723 | 0.5789 |
Little Rock Lake | 0.8084 | 0.8154 |
USAir | 0.4218 | 0.4449 |
SmaGri | 0.2012 | 0.2021 |
As can be seen from Table 1, on all 15 directed networks the average prediction accuracy of the invention is higher than that of the existing linear optimization (LO) algorithm, with gains ranging from 0.0009 (SmaGri) to 0.0231 (USAir).
Claims (3)
1. A directed network link prediction method based on structural disturbance and linear optimization, characterized by comprising the following steps:
(1) Download a real directed network data set, and obtain the adjacency matrix A of the directed network from the node and link information in the data set;
(2) Decompose the adjacency matrix A into a symmetric matrix and an asymmetric matrix;
(3) Divide the symmetric matrix into a residual set R and a disturbance set D at a ratio of 9:1, and perturb the residual set R with the disturbance set D to obtain a current initial disturbance matrix M;
(4) Repeat step (3) 10 times, sum the initial disturbance matrices M obtained each time, and average them to obtain an average initial disturbance matrix;
(5) Add the asymmetric matrix to the average initial disturbance matrix to obtain the final disturbance matrix F;
(6) Take the final disturbance matrix F as the input of the linear optimization (LO) algorithm and calculate a similarity matrix S, where the element $s_{xy}$ of the similarity matrix represents the probability that a link exists from node x to node y in the network, covering both connected and unconnected node pairs;
(7) Sort the probabilities of the unconnected node pairs in the similarity matrix S in descending order; the top P links are the predicted directed network links.
2. The method of claim 1, wherein perturbing the residual set R with the disturbance set D in (3) to obtain the initial disturbance matrix M is implemented as follows:
(3a) Express the residual set R as follows:
$$R=\sum_{k=1}^{n}\lambda_k x_k x_k^T$$
where $\lambda_k$ and $x_k$ are the eigenvalues and eigenvectors of the residual set R, respectively, with $\lambda_k \in \mathbb{R}$ and $x_k \in \mathbb{R}^n$;
(3b) Perturb the residual set R with the disturbance set D to obtain the following expression:
$$(R+D)(x_k+\Delta x_k)=(\lambda_k+\Delta\lambda_k)(x_k+\Delta x_k)$$
where $\lambda_k+\Delta\lambda_k$ and $x_k+\Delta x_k$ are the eigenvalues and eigenvectors of R + D, respectively, and $\Delta\lambda_k$ and $\Delta x_k$ are the changes in the eigenvalue and eigenvector caused by the disturbance set D;
(3c) Left-multiply the expression $(R+D)(x_k+\Delta x_k)=(\lambda_k+\Delta\lambda_k)(x_k+\Delta x_k)$ by $x_k^T$ and neglect second-order terms to obtain:
$$\Delta\lambda_k \approx \frac{x_k^T D x_k}{x_k^T x_k}$$
(3d) Keep the eigenvectors $x_k$ of the residual set R in (3a) unchanged, change the eigenvalues of the residual set R to $\lambda_k+\Delta\lambda_k$, and obtain the current initial disturbance matrix M:
$$M=\sum_{k=1}^{n}(\lambda_k+\Delta\lambda_k)x_k x_k^T$$
3. The method of claim 1, wherein the similarity matrix S in (6) is calculated as follows:
(6a) Input the disturbance matrix F and solve the following optimization problem:
$$\min_{Z}\ \alpha\|F-FZ\|_F^2+\|Z\|_F^2$$
where $\|Z\|_F^2=\mathrm{Tr}(Z^TZ)$ is the squared Frobenius norm of Z, $\|F-FZ\|_F^2=\mathrm{Tr}[(F-FZ)^T(F-FZ)]$ is the squared Frobenius norm of F - FZ, Tr denotes the trace of a matrix, α is a free parameter that balances the two terms, Z is the node contribution matrix, and $Z^T$ is the transpose of Z;
(6b) Expand the expression in (6a) as follows:
$$\min_{Z}\ \alpha\,\mathrm{Tr}[(F-FZ)^T(F-FZ)]+\mathrm{Tr}(Z^TZ)$$
(6c) Let $f(F,Z)=\alpha\,\mathrm{Tr}[(F-FZ)^T(F-FZ)]+\mathrm{Tr}(Z^TZ)$; the partial derivative of the function f(F, Z) with respect to Z is:
$$\frac{\partial f}{\partial Z}=\alpha(-2F^TF+2F^TFZ)+2Z$$
(6d) Set $\alpha(-2F^TF+2F^TFZ)+2Z=0$ to obtain the optimal solution $Z^*$ of the matrix Z:
$$Z^*=\alpha(\alpha F^TF+I)^{-1}F^TF$$
where $F^T$ is the transpose of F and I is the identity matrix;
(6e) Calculate the similarity matrix S from the input disturbance matrix F and the optimal solution $Z^*$ obtained in (6d):
$$S=FZ^*.$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110309745.6A CN112966156B (en) | 2021-03-23 | 2021-03-23 | Directed network link prediction method based on structural disturbance and linear optimization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110309745.6A CN112966156B (en) | 2021-03-23 | 2021-03-23 | Directed network link prediction method based on structural disturbance and linear optimization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112966156A (en) | 2021-06-15 |
CN112966156B (en) | 2023-03-21 |
Family
ID=76279532
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110309745.6A Active CN112966156B (en) | 2021-03-23 | 2021-03-23 | Directed network link prediction method based on structural disturbance and linear optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112966156B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115037630B (en) * | 2022-04-29 | 2023-10-20 | 电子科技大学长三角研究院(湖州) | Weighted network link prediction method based on structure disturbance model |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105306042A (en) * | 2015-11-26 | 2016-02-03 | 东北大学 | Device for parallel computing of broadband chaotic laser reserving pool |
CN107248923A (en) * | 2017-04-20 | 2017-10-13 | 西安电子科技大学 | A kind of link prediction method based on local topology information and corporations' correlation |
CN107909217A (en) * | 2017-11-30 | 2018-04-13 | 武汉大学 | One kind is used to judge to community network Evolution abnormal nodes and impact evaluation method |
CN109120462A (en) * | 2018-09-30 | 2019-01-01 | 南昌航空大学 | Prediction technique, device and the readable storage medium storing program for executing of opportunistic network link |
CN110858311A (en) * | 2018-08-23 | 2020-03-03 | 山东建筑大学 | Deep nonnegative matrix factorization-based link prediction method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200065668A1 (en) * | 2018-08-27 | 2020-02-27 | NEC Laboratories Europe GmbH | Method and system for learning sequence encoders for temporal knowledge graph completion |
Non-Patent Citations (2)
Title |
---|
Inductive perturbation tolerance improvement of hybrid networks using an integrated expert system; Brook W. Abegaz et al.; 2017 IEEE Power & Energy Society General Meeting; 20180201; pp. 1-5 * |
Link predictability of complex networks: from the perspective of the eigen-spectrum; Tan Suoyi et al.; Acta Physica Sinica; 20200830 (No. 08); pp. 188-197 * |
Also Published As
Publication number | Publication date |
---|---|
CN112966156A (en) | 2021-06-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||