CN115208774A - Link prediction method based on node influence - Google Patents

Link prediction method based on node influence

Info

Publication number
CN115208774A
Authority
CN
China
Prior art keywords
node
psor
algorithm
index
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110391814.2A
Other languages
Chinese (zh)
Inventor
张月霞
邹列
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Information Science and Technology University
Priority to CN202110391814.2A
Publication of CN115208774A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14: Network analysis or design
    • H04L41/147: Network analysis or design for predicting network behaviour
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12: Discovery or management of network topologies
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14: Network analysis or design
    • H04L41/142: Network analysis or design using statistical or mathematical methods

Abstract

The invention provides a link prediction method based on node influence, which comprises the following steps: first, the degree of a node and the degrees of its neighbor nodes are combined to define the Psor index of the node, which measures the influence of the node better than its degree alone; second, the local structure information of the complex network is taken into account and a Psor similarity index is defined to predict links; finally, the effectiveness of the proposed algorithm is verified on five real network data sets. By considering both the influence of a node and the contribution of the links of its neighbor nodes to the prediction, the method predicts links in complex networks with higher accuracy.

Description

Link prediction method based on node influence
Technical Field
The invention relates to the field of link prediction, in particular to a link prediction method based on node influence.
Background
In real life, many complex systems can be represented as complex networks of nodes and edges, and studying these networks scientifically gives a better understanding of the internal structure of the underlying systems. Link prediction is one way of studying a complex network: it uses known information (network topology, node attributes, etc.) to estimate the likelihood of a connection forming between two nodes that are not currently joined by an edge. Such prediction covers both unknown links (links that exist but have not been observed) and future links. Link prediction is now widely used in research on network evolution and has great practical value. For example, in social networks such as microblogs and blogs, the likelihood of interaction between two users who are not yet acquainted can be predicted from the users' social relationship information on the platform; in e-commerce systems and in news, music and live-streaming entertainment systems, the information a user needs can be predicted from the user's personal needs and behavior, so that suitable goods and content can be recommended. Studying the link prediction problem in complex networks is therefore of practical significance.
Link prediction methods fall mainly into three classes: methods based on machine learning, methods based on maximum likelihood, and methods based on network topology. Methods based on machine learning and maximum likelihood can achieve high accuracy, but their complexity is high and their range of application is limited. Methods based on network topology make link prediction easier to implement and have attracted the attention of most researchers. Existing topology-based similarity algorithms are divided into local, global and quasi-local similarity algorithms. Local similarity algorithms compute the similarity score mainly from the number of common neighbors, the degrees of the nodes, or the degrees of the common neighbors. Examples include the common neighbors (CN) index, which counts common neighbors; the preferential attachment (PA) index, which uses node degree; the Salton, Sørensen, HPI, HDI and Leicht-Holme-Newman (LHN) indices, which combine the number of common neighbors with the degrees of the nodes themselves; and the Adamic-Adar (AA) and resource allocation (RA) indices, which use the degrees of the common neighbors. Global similarity algorithms consider all of the network topology, for example the Katz index and the LHN-II index, which take all paths in the network into account. Quasi-local similarity algorithms trade off the high accuracy of global algorithms against their complexity; examples are the local random walk (LRW) index, the superposed random walk (SRW) index, and the simple hybrid influence (SHI) index of degree and H-index recently proposed by Zhu et al.
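As an illustration of the local indices named above, the following minimal sketch computes CN, PA, AA and RA using their standard textbook definitions. The function name `local_indices`, the adjacency-dictionary representation and the toy graph are assumptions made for this example; they do not come from the patent.

```python
import math


def local_indices(adj, x, y):
    """Return CN, PA, AA and RA scores for the node pair (x, y).

    adj maps each node to the set of its neighbors (undirected graph).
    """
    common = adj[x] & adj[y]
    cn = len(common)                    # CN: number of common neighbors
    pa = len(adj[x]) * len(adj[y])      # PA: product of the two degrees
    # A common neighbor is adjacent to both x and y, so its degree is >= 2
    # and log(degree) is strictly positive.
    aa = sum(1.0 / math.log(len(adj[z])) for z in common)  # Adamic-Adar
    ra = sum(1.0 / len(adj[z]) for z in common)            # resource allocation
    return {"CN": cn, "PA": pa, "AA": aa, "RA": ra}


# Hypothetical toy graph (not taken from the patent).
toy = {
    "x": {"a", "b"},
    "y": {"a", "b", "c"},
    "a": {"x", "y"},
    "b": {"x", "y", "c"},
    "c": {"y", "b"},
}
scores = local_indices(toy, "x", "y")
```

All four scores depend only on the two predicted nodes and their common neighbors, which is what makes these indices local and cheap to compute.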
Topology-based similarity algorithms focus on the number of common neighbors, the degrees of the common neighbors and the path-based transmission capacity between nodes, but ignore the influence of the degrees of neighbor nodes, so they do not capture the topology of the network comprehensively and their link prediction accuracy is limited. The link prediction method based on node influence takes the degrees of neighbor nodes into account, can improve the accuracy of link prediction, and is suitable for predicting node connections in complex networks.
Disclosure of Invention
Aiming at the low prediction accuracy of existing topology-based link prediction methods, the invention provides a link prediction method based on node influence. The method combines the degree of a node with the degrees of its neighbor nodes to define the Psor index of the node, which measures the influence of a node better than the node's degree alone. The method then takes the local structure information of the complex network into account and defines the Psor similarity index to predict links, further improving the accuracy of link prediction. Simulation results show that the Psor algorithm has higher prediction accuracy and can effectively predict the network structure.
The link prediction method based on the node influence comprises the following steps:
1) Integrating degrees of the node and neighbor nodes, and defining a Psor index of the node;
2) Comprehensively considering local structure information of a complex network, and establishing a Psor similarity index;
3) Comparing the algorithm with seven existing link prediction algorithms on five real network data sets to verify its effectiveness.
In step 1), the method for combining the degree of a node with the degrees of its neighbor nodes to define the Psor index of the node is as follows:
in order to better measure the influence of a node in a complex network, the invention combines the degree of the node with the degrees of its neighbor nodes to define the Psor index of the node, which is the cube root of the ratio of the squared sum of the neighbor degrees to the degree of the node. The calculation formula is
$$\mathrm{Psor}_x = \left( \frac{\left( \sum_{i \in \Gamma(x)} k_i \right)^{2}}{k_x} \right)^{1/3} \qquad (1)$$
where i ranges over the neighbors Γ(x) of node x, and k_x and k_i denote the degrees of node x and node i, respectively. Formula (1) shows that the Psor index of a node is determined by the degree of the node and the degrees of its neighbor nodes. For a fixed node degree, the larger the degrees of the neighbor nodes, the larger the Psor index of the node, i.e. the greater its influence. Compared with existing link prediction indices, which treat only the node degree as the influence resource of a node, the Psor index quantifies the influence of a node more comprehensively.
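The Psor index of formula (1), which claim 2 writes out as Psor_x = ((∑k_i)^2 / k_x)^(1/3), can be sketched in a few lines. The helper names `build_adj` and `psor_index`, the example graph, and the choice to return 0 for an isolated node are assumptions of this sketch; the patent does not say how nodes of degree zero are handled.

```python
def build_adj(edges):
    """Build an undirected adjacency dictionary {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def psor_index(adj, x):
    """Psor index of node x: ((sum of neighbor degrees)^2 / k_x)^(1/3)."""
    k_x = len(adj[x])
    if k_x == 0:
        return 0.0  # assumption: an isolated node has no influence
    neighbor_degree_sum = sum(len(adj[i]) for i in adj[x])
    return (neighbor_degree_sum ** 2 / k_x) ** (1.0 / 3.0)


# Hypothetical graph: hub "h" with three neighbors, one of which ("r")
# has an extra neighbor "s".
adj = build_adj([("h", "p"), ("h", "q"), ("h", "r"), ("r", "s")])
hub_psor = psor_index(adj, "h")  # neighbor degrees 1 + 1 + 2 = 4, k_h = 3
```

Raising the degree of any neighbor of "h" increases `hub_psor` while the degree of "h" stays at 3, which is the behavior the text describes.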
In the step 2), the method for establishing the Psor similarity index by comprehensively considering the local structure information of the complex network comprises the following steps:
the invention provides a Psor similarity index for predicting network similarity by comprehensively considering local structure information of a complex network, which is defined as follows:
Figure BSA0000239229870000022
where
Figure BSA0000239229870000023
denotes the Psor similarity index of the node pair (x, y), |Γ(x) ∩ Γ(y)| is the number of common neighbors of node x and node y, k_x and k_y are the degrees of node x and node y, and Psor_x and Psor_y are the Psor indices of node x and node y, respectively.
It can be seen that the Psor similarity index in formula (2),
Figure BSA0000239229870000031
is jointly determined by three factors: the number of common neighbors of the predicted node pair (x, y), the degrees of the two nodes, and the Psor indices of the two nodes. When the number of common neighbors and the degrees of the predicted nodes are held constant, the larger the Psor indices of the two predicted nodes, the larger the similarity score
Figure BSA0000239229870000032
and the greater the likelihood that the two unconnected nodes in the network will become connected.
In step 3), the method for verifying the effectiveness of the proposed algorithm by comparing it with seven existing link prediction algorithms on five real network data sets is as follows:
five real networks are selected for simulation experiments. Each data set is divided into a training set (90%) and a test set (10%), and the prediction performance, measured by AUC, of the Psor algorithm is compared with that of seven commonly used link prediction algorithms on the five networks. As shown in FIG. 2, in four of the networks (FINC, USAir, PB and Metabolite) the AUC of the Psor algorithm is higher than that of the other seven classical similarity algorithms, that is, the Psor algorithm achieves better prediction accuracy. In the Yeast network, the LHN-II algorithm has the highest AUC, the Psor algorithm ranks second, and all other algorithms have a lower AUC than the Psor algorithm. The results also show that the ACT algorithm performs worst in the FINC network. In the USAir, PB and Metabolite networks the LHN-II algorithm has the lowest prediction accuracy; in the Metabolite network in particular its AUC is below 0.5, which indicates that the LHN-II algorithm is not suitable for that network. Overall, therefore, the link prediction performance of the Psor algorithm is better than that of the other seven algorithms.
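The 90%/10% division of a data set into training and test edges can be sketched as a random split of the edge list. The patent does not describe how the split is drawn, so the uniform shuffle, the fixed seed, and the names `split_edges` and `test_fraction` are assumptions of this sketch; it also does not check that the training graph stays connected, which some experimental protocols require.

```python
import random


def split_edges(edges, test_fraction=0.1, seed=0):
    """Randomly split an edge list into (training edges, test edges)."""
    rng = random.Random(seed)       # fixed seed so the split is repeatable
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: the 10 edges of a complete graph on 5 nodes.
edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
train_edges, test_edges = split_edges(edges)
```

Similarity scores are then computed on the graph formed by `train_edges` only, and `test_edges` play the role of the missing links to be recovered.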
Drawings
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings, in which:
FIG. 1 is a diagram of two complex networks;
FIG. 2 is a schematic diagram of the link prediction results of the similarity algorithms in different networks.
Detailed Description
The following describes in further detail embodiments of the present invention with reference to the accompanying drawings.
FIG. 1 shows two complex networks. In FIG. 1(a) and FIG. 1(b), the predicted nodes are x and y, the degrees of node x and node y are both 4, and their common neighbors are node A and node G. The degrees of the other neighbors of the predicted nodes x and y differ between the two figures: in FIG. 1(a), k_B = k_E = k_C = k_F = 1, whereas in FIG. 1(b), k_B = k_E = k_C = k_F = 2.
Local similarity algorithms such as Salton, Sørensen, HDI, LHN and PA compute the similarity of the node pair (x, y) from the degrees of the two nodes and the number of common neighbors only, so they give the node pair (x, y) the same similarity score in FIG. 1(a) and FIG. 1(b). For example, the Salton algorithm gives a similarity score of 0.5 for the node pair (x, y) in both figures. These algorithms therefore cannot tell in which of the two networks nodes x and y are more likely to become connected.
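The Salton score quoted above can be checked with a short sketch that uses the standard definition of the Salton index, |Γ(x) ∩ Γ(y)| / sqrt(k_x · k_y). The edge lists are a reconstruction from the text, not a copy of the drawing: it is assumed that B and E are the non-common neighbors of x, that C and F are those of y, and that in FIG. 1(b) each of B, E, C and F has one extra neighbor (the hypothetical nodes B1, E1, C1 and F1).

```python
import math


def build_adj(edges):
    """Build an undirected adjacency dictionary {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def salton(adj, x, y):
    """Salton index: common neighbors divided by sqrt(k_x * k_y)."""
    common = adj[x] & adj[y]
    return len(common) / math.sqrt(len(adj[x]) * len(adj[y]))


# Assumed core topology shared by both panels of FIG. 1.
core = [("x", "A"), ("x", "G"), ("x", "B"), ("x", "E"),
        ("y", "A"), ("y", "G"), ("y", "C"), ("y", "F")]
fig_a = build_adj(core)
# Assumed extra edges that raise k_B, k_E, k_C and k_F from 1 to 2.
fig_b = build_adj(core + [("B", "B1"), ("E", "E1"), ("C", "C1"), ("F", "F1")])

salton_a = salton(fig_a, "x", "y")  # 2 / sqrt(4 * 4)
salton_b = salton(fig_b, "x", "y")  # unchanged: k_x, k_y and the common set are the same
```

Because the extra edges change neither the degrees of x and y nor their common neighbors, the two scores coincide, which is the limitation the text points out.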
However, according to the defined Psor index of a node, i.e. formula (1), the Psor indices of the predicted nodes x and y in FIG. 1(a) are Psor_x = Psor_y = 2.08, and those in FIG. 1(b) are Psor_x = Psor_y = 2.52. The Psor indices of the predicted nodes in FIG. 1(b) are thus larger than those in FIG. 1(a), which suggests that the node pair (x, y) in FIG. 1(b) is more likely to be connected than the node pair (x, y) in FIG. 1(a).
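The two values can be reproduced by hand from formula (1). The working below assumes that the non-common neighbors of x are B and E, and that the common neighbors A and G are adjacent only to x and y, so that k_A = k_G = 2; the drawing is not available here, but these assumptions give exactly the values stated in the text. By symmetry the same arithmetic applies to node y with neighbors C and F.

```latex
\begin{align*}
\text{FIG. 1(a):}\quad
\mathrm{Psor}_x &= \left(\frac{(k_A + k_G + k_B + k_E)^2}{k_x}\right)^{1/3}
 = \left(\frac{(2 + 2 + 1 + 1)^2}{4}\right)^{1/3}
 = 9^{1/3} \approx 2.08 \\
\text{FIG. 1(b):}\quad
\mathrm{Psor}_x &= \left(\frac{(k_A + k_G + k_B + k_E)^2}{k_x}\right)^{1/3}
 = \left(\frac{(2 + 2 + 2 + 2)^2}{4}\right)^{1/3}
 = 16^{1/3} \approx 2.52
\end{align*}
```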
The feasibility of the Psor algorithm is verified on the two complex networks shown in FIG. 1. The similarity score between node x and node y in FIG. 1(a) and in FIG. 1(b) is calculated according to the defined Psor similarity index, i.e. formula (2).
In FIG. 1(a), k_x = k_y = 4, |Γ(x) ∩ Γ(y)| = 2 and Psor_x = Psor_y = 2.08, so formula (2) gives
Figure BSA0000239229870000041
In FIG. 1(b), k_x = k_y = 4, |Γ(x) ∩ Γ(y)| = 2 and Psor_x = Psor_y = 2.52, so formula (2) gives
Figure BSA0000239229870000042
Therefore, according to the Psor algorithm provided by the invention, the similarity score of the node pair (x, y) in FIG. 1(b) is higher than that in FIG. 1(a), so node x and node y in FIG. 1(b) are more likely to be connected. This agrees with the analysis above and shows that the proposed algorithm is feasible.
The effect of the invention is further illustrated by simulation experiments. The data set for the simulation experiments consists of five networks from different fields: FINC, USAir, Yeast, PB and Metabolite. FINC is a financial market network, USAir is a US air transportation network, Yeast is a protein interaction network, PB is a political blog network, and Metabolite is a nematode metabolic network.
FIG. 2 is a schematic diagram of the link prediction results of the similarity algorithms in the five networks when each data set is divided into a training set of 90% and a test set of 10%. In four of the networks (FINC, USAir, PB and Metabolite), the AUC of the Psor algorithm is higher than that of the other seven classical similarity algorithms, that is, the Psor algorithm achieves better prediction accuracy. In the Yeast network, the LHN-II algorithm has the highest AUC, the Psor algorithm ranks second, and all other algorithms have a lower AUC than the Psor algorithm. The results also show that the ACT algorithm performs worst in the FINC network. In the USAir, PB and Metabolite networks the LHN-II algorithm has the lowest prediction accuracy; in the Metabolite network in particular its AUC is below 0.5, which indicates that the LHN-II algorithm is not suitable for that network. Overall, therefore, the link prediction performance of the Psor algorithm is better than that of the other seven algorithms.
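The patent does not spell out how AUC is computed. A convention that is common in the link prediction literature, assumed here, compares the score of each test edge with the score of each nonexistent edge: AUC is the fraction of comparisons in which the test edge scores higher, with ties counted as one half. The function name `link_prediction_auc` and the example scores are hypothetical.

```python
def link_prediction_auc(test_scores, nonexistent_scores):
    """AUC over all (test edge, nonexistent edge) score comparisons."""
    higher = 0
    equal = 0
    for s_test in test_scores:
        for s_non in nonexistent_scores:
            if s_test > s_non:
                higher += 1
            elif s_test == s_non:
                equal += 1
    total = len(test_scores) * len(nonexistent_scores)
    return (higher + 0.5 * equal) / total


# Hypothetical similarity scores for three test edges and two non-edges.
auc = link_prediction_auc([0.9, 0.5, 0.5], [0.5, 0.1])
```

Under this convention a scorer that assigns random scores has an expected AUC of 0.5, so an AUC below 0.5, as reported for LHN-II on the Metabolite network, means the ranking is worse than chance. On large networks the comparisons are usually sampled rather than enumerated exhaustively.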
To illustrate the effectiveness of the Psor algorithm more clearly, Table 1 lists the AUC values obtained by each similarity algorithm in the different networks. Compared with the other seven algorithms, the Psor algorithm has the highest AUC in four networks: FINC, USAir, PB and Metabolite. Its AUC exceeds that of the other seven algorithms by at least 0.1% in the FINC network, at least 1.96% in the USAir network, at least 0.47% in the PB network and at least 2.09% in the Metabolite network. Overall, therefore, the Psor algorithm predicts better than the seven classical link prediction algorithms Salton, Sørensen, HDI, LHN, PA, LHN-II and ACT.
TABLE 1 AUC values of the different algorithms on the five real networks with a 90% training set and a 10% test set
Figure BSA0000239229870000051
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and it is obvious that those skilled in the art can make various changes and modifications of the present invention without departing from the spirit and scope of the present invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (4)

1. A link prediction method based on node influence, characterized by comprising the following steps:
1) Combining the degree of a node with the degrees of its neighbor nodes, and defining a Psor index of the node;
2) Taking the local structure information of a complex network into account, and establishing a Psor similarity index;
3) Comparing the algorithm with seven existing link prediction algorithms on five real network data sets to verify its effectiveness.
2. The link prediction method based on node influence according to claim 1, wherein in step 1) the method for combining the degree of a node with the degrees of its neighbor nodes to define the Psor index of the node is as follows:
in order to better measure the influence of a node in a complex network, the invention combines the degree of the node with the degrees of its neighbor nodes to define the Psor index of the node, which is the cube root of the ratio of the squared sum of the neighbor degrees to the degree of the node. The calculation formula is
$$\mathrm{Psor}_x = \left( \frac{\left( \sum_{i \in \Gamma(x)} k_i \right)^{2}}{k_x} \right)^{1/3} \qquad (1)$$
where i ranges over the neighbors Γ(x) of node x, and k_x and k_i denote the degrees of node x and node i, respectively. Formula (1) shows that the Psor index of a node is determined by the degree of the node and the degrees of its neighbor nodes. For a fixed node degree, the larger the degrees of the neighbor nodes, the larger the Psor index of the node, i.e. the greater its influence. Compared with existing link prediction indices, which treat only the node degree as the influence resource of a node, the Psor index quantifies the influence of a node more comprehensively.
3. The link prediction method based on node influence according to claim 1, wherein in step 2) the method for establishing the Psor similarity index by taking the local structure information of the complex network into account is as follows:
the invention takes the local structure information of the complex network into account and provides a Psor similarity index for predicting the similarity of node pairs in the network, which is defined as follows:
Figure FSA0000239229860000011
where
Figure FSA0000239229860000012
denotes the Psor similarity index of the node pair (x, y), |Γ(x) ∩ Γ(y)| is the number of common neighbors of node x and node y, k_x and k_y are the degrees of node x and node y, and Psor_x and Psor_y are the Psor indices of node x and node y, respectively.
It can be seen that the Psor similarity index in formula (2),
Figure FSA0000239229860000013
is jointly determined by three factors: the number of common neighbors of the predicted node pair (x, y), the degrees of the two nodes, and the Psor indices of the two nodes. When the number of common neighbors and the degrees of the predicted nodes are held constant, the larger the Psor indices of the two predicted nodes, the larger the similarity score
Figure FSA0000239229860000014
and the greater the likelihood that the two unconnected nodes in the network will become connected.
4. The link prediction method based on node influence according to claim 1, wherein in step 3) the effectiveness of the proposed algorithm is verified by comparison with seven existing link prediction algorithms on five real network data sets as follows:
five real networks are selected for simulation experiments. Each data set is divided into a training set of 90% and a test set of 10%, and the prediction performance, measured by AUC, of the Psor algorithm is compared with that of seven commonly used link prediction algorithms on the five networks. Simulation results show that in four of the networks (FINC, USAir, PB and Metabolite) the AUC of the Psor algorithm is higher than that of the seven classical link prediction algorithms Salton, Sørensen, HDI, LHN, PA, LHN-II and ACT, that is, the Psor algorithm achieves better prediction accuracy. The AUC of the Psor algorithm exceeds that of the other seven algorithms by at least 0.1% in the FINC network, at least 1.96% in the USAir network, at least 0.47% in the PB network and at least 2.09% in the Metabolite network. In the Yeast network, the LHN-II algorithm has the highest AUC, the Psor algorithm ranks second, and all other algorithms have a lower AUC than the Psor algorithm. Overall, therefore, the link prediction performance of the Psor algorithm is better than that of the other seven algorithms.
CN202110391814.2A 2021-04-13 2021-04-13 Link prediction method based on node influence Pending CN115208774A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110391814.2A CN115208774A (en) 2021-04-13 2021-04-13 Link prediction method based on node influence

Publications (1)

Publication Number Publication Date
CN115208774A true CN115208774A (en) 2022-10-18

Family

ID=83570287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110391814.2A Pending CN115208774A (en) 2021-04-13 2021-04-13 Link prediction method based on node influence

Country Status (1)

Country Link
CN (1) CN115208774A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221018