CN115208774A - Link prediction method based on node influence

Info

Publication number: CN115208774A
Application number: CN202110391814.2A
Authority: CN
Inventors: 张月霞; 邹列
Original assignee: Beijing Information Science and Technology University
Current assignee: Beijing Information Science and Technology University
Priority date: 2021-04-13
Filing date: 2021-04-13
Publication date: 2022-10-18

Abstract

The invention provides a link prediction method based on node influence, comprising the following steps. First, the degree of a node and the degrees of its neighbor nodes are integrated to define the Psor index of the node, which measures the influence of the node better than the degree alone. Second, the local structure information of the complex network is comprehensively considered to define a Psor similarity index, which is used to predict links. Finally, the effectiveness of the proposed algorithm is verified on five real network data sets. By considering both the influence of a node and the contribution of the neighbor nodes' links to the prediction, the method predicts links in complex networks with higher accuracy.

Description

Link prediction method based on node influence

Technical Field

The invention relates to the field of link prediction, in particular to a link prediction method based on node influence.

Background

In real life, many complex systems can be represented as complex networks of nodes and edges, and studying these networks scientifically helps reveal the internal structure of the underlying systems. Link prediction is one way to study a complex network: it uses known information (network topology, node attributes, etc.) to estimate the likelihood of a link between two nodes that are not yet connected by an edge. Such prediction covers both unknown (missing) links and future links. Link prediction is now widely used in research on network evolution and has great practical value. For example, on social networks such as microblogs and blogs, the likelihood that two users who do not yet know each other will interact can be predicted from the users' social relationships on the platform; in e-commerce systems and in news, music and live-streaming entertainment systems, the information a user needs can be predicted from the user's personal needs and behavior, so that matching products and content can be recommended. Studying the link prediction problem in complex networks is therefore of practical significance.

Link prediction methods fall mainly into three classes: methods based on machine learning, methods based on maximum likelihood, and methods based on network topology. Although the machine learning and maximum likelihood methods can achieve high accuracy, their complexity is high and their range of application is limited. Methods based on network topology exploit the topology alone, which makes link prediction easier to implement, and have attracted the attention of most researchers. Existing topology-based similarity algorithms are divided into local, global and quasi-local similarity algorithms. Local similarity algorithms compute the similarity score mainly from the number of common neighbors, the degrees of the nodes, or the degrees of the common neighbors. Examples are the common neighbors (CN) index, which counts common neighbors; the preferential attachment (PA) index, which uses the node degrees; the Salton, Sorensen, HPI, HDI and Leicht-Holme-Newman (LHN) indices, which combine the number of common neighbors with the degrees of the two nodes; and the Adamic-Adar (AA) index and the resource allocation (RA) index, which use the degrees of the common neighbors. Global similarity algorithms consider all the information in the network topology, for example the Katz index and the LHN-II index, which take all paths in the network into account. Quasi-local similarity algorithms strike a compromise between the high accuracy and the high complexity of the global algorithms; examples are the local random walk (LRW) index, the superposed random walk (SRW) index, and the simple mixed influence (SHI) index based on degree and H-index recently proposed by Zhu et al.

Topology-based similarity algorithms focus on the number of common neighbors, the degrees of the common neighbors and the path transmission capability between nodes, but they ignore the influence of the degrees of the neighbor nodes and therefore capture the topology information of the network only partially, so the accuracy of link prediction is not high enough. The link prediction method based on node influence takes the degrees of the neighbor nodes into account, can improve the accuracy of link prediction, and is suitable for predicting node connections in complex networks.

Disclosure of Invention

Aiming at the low prediction accuracy of existing link prediction methods based on network topology, the invention provides a link prediction method based on node influence. The method integrates the degree of a node with the degrees of its neighbor nodes to define the Psor index of the node, which measures the influence of a node better than the node degree alone. The method then takes the local structure information of the complex network into account and defines the Psor similarity index to predict links, further improving the accuracy of link prediction. Simulation results show that the Psor algorithm has higher prediction accuracy and can effectively predict the network structure.

The link prediction method based on the node influence comprises the following steps:

1) Integrating the degrees of the node and its neighbor nodes, and defining a Psor index of the node;

2) Comprehensively considering the local structure information of the complex network, and establishing a Psor similarity index;

3) Comparing the proposed algorithm with seven existing link prediction algorithms on five real network data sets to verify its effectiveness.

In step 1), the degrees of the node and of its neighbor nodes are integrated to define the Psor index of the node as follows:

To better measure the influence of a node in a complex network, the invention integrates the degree of the node with the degrees of its neighbor nodes to define the Psor index of the node: the cube root of the square of the sum of the neighbor nodes' degrees divided by the node's own degree (its number of neighbors). The calculation formula is

$$\mathrm{Psor}_x = \left( \frac{\left( \sum_{i \in \Gamma(x)} k_i \right)^2}{k_x} \right)^{1/3} \qquad (1)$$

where i is a neighbor of node x, and k_x and k_i are the degrees of node x and node i, respectively. It can be seen that the Psor index of a node in formula (1) is determined jointly by the degree of the node and the degrees of its neighbor nodes. When the degree of a node is fixed, the larger the degrees of its neighbor nodes, the larger its Psor index, that is, the greater its influence. Compared with existing link prediction indices, which treat only the node degree as the influence resource of the node, the Psor index quantifies the influence of a node more comprehensively.
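As a minimal sketch of formula (1), i.e. Psor_x = ((sum of the degrees of x's neighbors)^2 / k_x)^(1/3), the Psor index can be computed on an undirected graph stored as a dictionary of neighbor sets. The helper names `build_graph` and `psor_index`, the toy star graph, and the handling of isolated nodes are illustrative assumptions and are not part of the source.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected graph as a dict mapping each node to its neighbor set."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def psor_index(graph: Graph, x: Hashable) -> float:
    """Psor index of node x: ((sum of neighbor degrees)^2 / k_x)^(1/3)."""
    k_x = len(graph[x])
    if k_x == 0:
        # Assumption: the source does not cover isolated nodes; return 0 here.
        return 0.0
    neighbor_degree_sum = sum(len(graph[i]) for i in graph[x])
    return (neighbor_degree_sum ** 2 / k_x) ** (1.0 / 3.0)


# Hypothetical toy data: a hub "h" with three leaves.
star_edges = [("h", "a"), ("h", "b"), ("h", "c")]
star = build_graph(star_edges)

# Same hub degree, but every leaf gains one extra neighbor.
denser = build_graph(star_edges + [("a", "a1"), ("b", "b1"), ("c", "c1")])

print(round(psor_index(star, "h"), 3))    # 1.442  (= 3 ** (1/3))
print(round(psor_index(denser, "h"), 3))  # 2.289  (= 12 ** (1/3))
```

In both toy graphs the hub has degree 3; only the degrees of its neighbors change, and the Psor index grows accordingly, which matches the monotonic behavior described for a fixed node degree.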

In step 2), the local structure information of the complex network is comprehensively considered to establish the Psor similarity index as follows:

By comprehensively considering the local structure information of a complex network, the invention provides a Psor similarity index for predicting links in the network, which is defined as follows:

where the left-hand side of formula (2) is the Psor similarity index of the node pair (x, y), |Γ(x) ∩ Γ(y)| is the number of common neighbors of node x and node y, k_x and k_y are the degrees of node x and node y, respectively, and Psor_x and Psor_y are the Psor indices of node x and node y, respectively.

It can be seen that the Psor similarity index in formula (2) is jointly determined by three factors: the number of common neighbors of the predicted node pair (x, y), the degrees of the two nodes, and the Psor indices of the two nodes. When the number of common neighbors and the degrees of the two nodes are fixed, the larger the Psor indices of the two predicted nodes, the higher the similarity score and the more likely it is that the two unconnected nodes in the network become connected.

In step 3), the effectiveness of the proposed algorithm is verified by comparing it with seven existing link prediction algorithms on five real network data sets, as follows:

Five real network data sets are selected for the simulation experiment. Each data set is divided into a training set (90%) and a test set (10%), and the prediction performance, measured by AUC, of the Psor algorithm and of seven commonly used link prediction algorithms is compared on the five networks. As shown in Fig. 2, on four networks (FINC, USAir, PB and Metabolite) the AUC of the Psor algorithm is clearly higher than that of the other seven classical similarity-based link prediction algorithms, that is, the Psor algorithm achieves better prediction accuracy. On the Yeast network, the LHN-II algorithm has the highest AUC, the Psor algorithm ranks second, and all other algorithms score lower than the Psor algorithm. The results also show that the ACT algorithm performs worst on the FINC network. On the USAir, PB and Metabolite networks, the LHN-II algorithm has the lowest prediction accuracy; on the Metabolite network in particular its AUC is below 0.5, i.e. worse than random guessing, which indicates that the LHN-II algorithm is not suitable for this network. Overall, therefore, the link prediction performance of the Psor algorithm is better than that of the other seven algorithms.
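The 90%/10% division can be sketched as a random split of the edge list. The source does not state how the split is drawn, so uniform random sampling of edges, the function name `split_edges`, the fixed seed and the toy edge list are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Edge = Tuple[str, str]


def split_edges(edges: Sequence[Edge], test_fraction: float = 0.1,
                seed: int = 0) -> Tuple[List[Edge], List[Edge]]:
    """Randomly split an edge list into (training edges, test edges)."""
    rng = random.Random(seed)      # local generator keeps the split reproducible
    shuffled = list(edges)
    rng.shuffle(shuffled)
    # Keep at least one test edge so an evaluation is always possible.
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy network: a ring of 20 nodes (20 edges).
ring = [(f"n{i}", f"n{(i + 1) % 20}") for i in range(20)]
train_edges, test_edges = split_edges(ring, test_fraction=0.1)
print(len(train_edges), len(test_edges))  # 18 2
```

Similarity scores are then computed on the training edges only, and the held-out test edges play the role of the missing links to be recovered. This sketch does not check that the training graph stays connected, which some experimental protocols additionally require.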

Drawings

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings, in which:

Fig. 1 is a diagram of two complex networks;

Fig. 2 is a schematic diagram of the link prediction results of the similarity algorithms in different networks.

Detailed Description

The following describes in further detail embodiments of the present invention with reference to the accompanying drawings.

Fig. 1 shows two complex networks. In Fig. 1(a) and Fig. 1(b), the predicted nodes are x and y, the degrees of node x and node y are both 4, and their common neighbors are node A and node G. The degrees of the other neighbor nodes of the predicted nodes x and y differ between the two networks: in Fig. 1(a), k_B = k_E = k_C = k_F = 1; in Fig. 1(b), k_B = k_E = k_C = k_F = 2.

Local similarity algorithms such as Salton, Sorensen, HDI, LHN and PA compute the similarity of the node pair (x, y) from the degrees of the two nodes and the number of their common neighbors only, so they give the node pair (x, y) the same similarity score in Fig. 1(a) and Fig. 1(b). For example, the Salton algorithm gives a similarity score of 0.5 for the node pair (x, y) in both networks. These algorithms therefore cannot tell in which of the two networks nodes x and y are more likely to be connected.
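The point can be checked with a short sketch that rebuilds the two networks and evaluates the five local indices. The index formulas used are the standard textbook definitions (Salton = |CN|/sqrt(k_x k_y), Sorensen = 2|CN|/(k_x + k_y), HDI = |CN|/max(k_x, k_y), LHN = |CN|/(k_x k_y), PA = k_x k_y); they are not spelled out in the source. The topology is also an assumption reconstructed from the stated degrees: x is linked to A, G, B and C, y is linked to A, G, E and F, and in Fig. 1(b) each of B, C, E and F has one additional neighbor, here given the hypothetical names b1, c1, e1 and f1.

```python
import math


def build_graph(edges):
    """Build an undirected graph as a dict mapping each node to its neighbor set."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def local_scores(graph, x, y):
    """Standard local similarity indices for the node pair (x, y)."""
    cn = len(graph[x] & graph[y])             # number of common neighbors
    k_x, k_y = len(graph[x]), len(graph[y])   # node degrees
    return {
        "Salton": cn / math.sqrt(k_x * k_y),
        "Sorensen": 2 * cn / (k_x + k_y),
        "HDI": cn / max(k_x, k_y),
        "LHN": cn / (k_x * k_y),
        "PA": k_x * k_y,
    }


# Assumed topology shared by both panels of Fig. 1.
core = [("x", "A"), ("x", "G"), ("x", "B"), ("x", "C"),
        ("y", "A"), ("y", "G"), ("y", "E"), ("y", "F")]
# Extra edges that raise k_B, k_C, k_E, k_F from 1 to 2 (hypothetical node names).
pendants = [("B", "b1"), ("C", "c1"), ("E", "e1"), ("F", "f1")]

fig_1a = build_graph(core)
fig_1b = build_graph(core + pendants)

scores_a = local_scores(fig_1a, "x", "y")
scores_b = local_scores(fig_1b, "x", "y")
print(scores_a["Salton"])    # 0.5
print(scores_a == scores_b)  # True
```

Because none of these indices looks at the degrees of the non-common neighbors, adding the pendant edges leaves every score unchanged.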

However, according to the defined Psor index of the node, i.e. formula (1), the Psor indices of the predicted nodes x and y in Fig. 1(a) are Psor_x = Psor_y = 2.08, and the Psor indices of nodes x and y in Fig. 1(b) are Psor_x = Psor_y = 2.52. Thus, the Psor indices of the predicted nodes x and y in Fig. 1(b) are larger than those in Fig. 1(a). It can be seen that the node pair (x, y) in Fig. 1(b) is more likely to be connected than the node pair (x, y) in Fig. 1(a).
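The two values can be reproduced by hand from formula (1), Psor_x = ((sum of neighbor degrees)^2 / k_x)^(1/3). The arithmetic below assumes that each predicted node has the two common neighbors A and G, each of degree 2 (linked only to x and y), plus two of the nodes B, C, E, F; this is inferred from the stated degrees and is not shown explicitly in the text. By symmetry the same numbers hold for node y.

```latex
\begin{align*}
\text{Fig. 1(a):}\quad \sum_{i \in \Gamma(x)} k_i &= 2 + 2 + 1 + 1 = 6, &
\mathrm{Psor}_x &= \left(\frac{6^2}{4}\right)^{1/3} = 9^{1/3} \approx 2.08,\\
\text{Fig. 1(b):}\quad \sum_{i \in \Gamma(x)} k_i &= 2 + 2 + 2 + 2 = 8, &
\mathrm{Psor}_x &= \left(\frac{8^2}{4}\right)^{1/3} = 16^{1/3} \approx 2.52.
\end{align*}
```

Both results agree with the stated values, which supports the assumed degrees of A and G.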

The feasibility of the Psor algorithm is verified on the two complex networks shown in Fig. 1. The similarity scores between node x and node y in Fig. 1(a) and Fig. 1(b) are calculated according to the defined Psor similarity index, i.e. formula (2).

In Fig. 1(a), k_x = k_y = 4, |Γ(x) ∩ Γ(y)| = 2 and Psor_x = Psor_y = 2.08, from which the similarity score of the node pair (x, y) is obtained by formula (2).

In Fig. 1(b), k_x = k_y = 4, |Γ(x) ∩ Γ(y)| = 2 and Psor_x = Psor_y = 2.52, from which the similarity score of the node pair (x, y) is obtained by formula (2).

Therefore, according to the Psor algorithm provided by the invention, the similarity score of the node pair (x, y) in Fig. 1(b) is higher than that of the node pair (x, y) in Fig. 1(a), so node x and node y in Fig. 1(b) are more likely to be connected. This agrees with the analysis of the Psor indices above and shows that the algorithm provided by the invention is feasible.

The effect of the invention is further illustrated by simulation experiments. The data sets for the simulation experiments are five networks from different fields: FINC, USAir, Yeast, PB and Metabolite. FINC is a financial market network, USAir is a US air transportation network, Yeast is a protein interaction network, PB is a political blog network, and Metabolite is a nematode metabolic network.

Fig. 2 shows the link prediction results of the similarity algorithms on the five networks when each data set is divided into a training set (90%) and a test set (10%). On four networks (FINC, USAir, PB and Metabolite) the AUC of the Psor algorithm is clearly higher than that of the other seven classical similarity-based link prediction algorithms, that is, the Psor algorithm achieves better prediction accuracy. On the Yeast network, the LHN-II algorithm has the highest AUC, the Psor algorithm ranks second, and all other algorithms score lower than the Psor algorithm. The results also show that the ACT algorithm performs worst on the FINC network. On the USAir, PB and Metabolite networks, the LHN-II algorithm has the lowest prediction accuracy; on the Metabolite network in particular its AUC is below 0.5, i.e. worse than random guessing, which indicates that the LHN-II algorithm is not suitable for this network. Overall, therefore, the link prediction performance of the Psor algorithm is better than that of the other seven algorithms.
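The source does not spell out how AUC is computed. A common convention in the link prediction literature, assumed here, is to repeatedly draw one held-out (test) link and one non-existent link, compare their similarity scores, and report AUC = (n' + 0.5 n'') / n, where n' counts the draws in which the test link scores higher and n'' counts ties. In the sketch below the function `auc`, the toy graph and the common-neighbor scorer are illustrative stand-ins; the common-neighbor count is used only because it is simple and is not the Psor similarity index of formula (2).

```python
import random


def auc(score, test_edges, non_edges, n_samples=1000, seed=0):
    """Sampling-based AUC: (higher + 0.5 * ties) / n_samples."""
    rng = random.Random(seed)
    higher = ties = 0
    for _ in range(n_samples):
        s_missing = score(*rng.choice(test_edges))  # a held-out true link
        s_absent = score(*rng.choice(non_edges))    # a link that never existed
        if s_missing > s_absent:
            higher += 1
        elif s_missing == s_absent:
            ties += 1
    return (higher + 0.5 * ties) / n_samples


# Hypothetical training graph (neighbor sets); the link b-d was held out.
train = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}
held_out = [("b", "d")]
absent = [("a", "d"), ("a", "e"), ("b", "e"), ("c", "e"), ("d", "e")]


def common_neighbors(u, v):
    """Stand-in scorer: number of common neighbors in the training graph."""
    return len(train[u] & train[v])


print(auc(common_neighbors, held_out, absent))
```

A scorer that cannot separate test links from non-existent links yields an AUC of about 0.5, which is why a value below 0.5 indicates that an algorithm is unsuitable for the network in question.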

To illustrate the effectiveness of the Psor algorithm more clearly, Table 1 lists the AUC values of each similarity algorithm on the different networks. On four networks (FINC, USAir, PB and Metabolite) the Psor algorithm has the highest AUC of the eight algorithms. On the FINC network, the AUC of the Psor algorithm is at least 0.1% higher than that of the other seven algorithms; on the USAir network, at least 1.96% higher; on the PB network, at least 0.47% higher; and on the Metabolite network, at least 2.09% higher. Overall, therefore, the Psor algorithm predicts better than the seven classical link prediction algorithms Salton, Sorensen, HDI, LHN, PA, LHN-II and ACT.

Table 1. AUC values of the different algorithms on the five real networks with a 90% training set and a 10% test set

The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and it is obvious that those skilled in the art can make various changes and modifications of the present invention without departing from the spirit and scope of the present invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

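1. A link prediction method based on node influence, characterized by comprising the following steps:

1) integrating the degrees of the node and its neighbor nodes, and defining a Psor index of the node;

2) comprehensively considering the local structure information of the complex network, and establishing a Psor similarity index;

3) comparing the proposed algorithm with seven existing link prediction algorithms on five real network data sets to verify its effectiveness.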

2. The link prediction method based on node influence according to claim 1, wherein in step 1) the degrees of the node itself and of its neighbor nodes are integrated to define the Psor index of the node as follows:

$$\mathrm{Psor}_x = \left( \frac{\left( \sum_{i \in \Gamma(x)} k_i \right)^2}{k_x} \right)^{1/3} \qquad (1)$$

where i is a neighbor of node x, and k_x and k_i are the degrees of node x and node i, respectively. It can be seen that the Psor index of a node in formula (1) is determined jointly by the degree of the node and the degrees of its neighbor nodes. When the degree of a node is fixed, the larger the degrees of its neighbor nodes, the larger its Psor index, that is, the greater its influence. Compared with existing link prediction indices, which treat only the node degree as the influence resource of the node, the Psor index quantifies the influence of a node more comprehensively.

3. The link prediction method based on node influence according to claim 1, wherein in step 2) the local structure information of the complex network is comprehensively considered to establish the Psor similarity index as follows:

where the left-hand side of formula (2) is the Psor similarity index of the node pair (x, y), |Γ(x) ∩ Γ(y)| is the number of common neighbors of node x and node y, k_x and k_y are the degrees of node x and node y, respectively, and Psor_x and Psor_y are the Psor indices of node x and node y, respectively.

It can be seen that the Psor similarity index in formula (2) is jointly determined by three factors: the number of common neighbors of the predicted node pair (x, y), the degrees of the two nodes, and the Psor indices of the two nodes. When the number of common neighbors and the degrees of the two nodes are fixed, the larger the Psor indices of the two predicted nodes, the higher the similarity score and the more likely it is that the two unconnected nodes in the network become connected.

4. The link prediction method based on node influence according to claim 1, wherein in step 3) the effectiveness of the proposed algorithm is verified by comparing it with seven existing link prediction algorithms on five real network data sets, as follows:

Five real network data sets are selected for the simulation experiment. Each data set is divided into a training set (90%) and a test set (10%), and the prediction performance, measured by AUC, of the Psor algorithm and of seven commonly used link prediction algorithms is compared on the five networks. Simulation results show that on four networks (FINC, USAir, PB and Metabolite) the AUC of the Psor algorithm is clearly higher than that of the seven classical link prediction algorithms Salton, Sorensen, HDI, LHN, PA, LHN-II and ACT, that is, the Psor algorithm achieves better prediction accuracy. On the FINC network, the AUC of the Psor algorithm is at least 0.1% higher than that of the other seven algorithms; on the USAir network, at least 1.96% higher; on the PB network, at least 0.47% higher; and on the Metabolite network, at least 2.09% higher. On the Yeast network, the LHN-II algorithm has the highest AUC, the Psor algorithm ranks second, and all other algorithms score lower than the Psor algorithm. Overall, therefore, the link prediction performance of the Psor algorithm is better than that of the other seven algorithms.