CN107231252B - Link prediction method based on Bayesian estimation and seed node neighbor set - Google Patents


Info

Publication number
CN107231252B
Authority
CN
China
Legal status
Active
Application number
CN201710366159.9A
Other languages
Chinese (zh)
Other versions
CN107231252A (en)
Inventor
杨旭华
项旗立
张海丰
肖杰
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
2017-05-23
Filing date
2017-05-23
Publication date
2020-05-05
2017-05-23: Application filed by Zhejiang University of Technology ZJUT
2017-05-23: Priority to CN201710366159.9A
2017-10-03: Publication of CN107231252A
Application granted
2020-05-05: Publication of CN107231252B
Status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A link prediction method based on Bayesian estimation and a seed node neighbor set. A network model is established, and two nodes that are not directly connected are selected as seed nodes. The probabilities that an edge exists and that no edge exists between the two nodes are calculated. From the degree information of the intermediate nodes on paths of length 2 or 3 between the two nodes, the probabilities that an edge is or is not generated between them are calculated. The likelihood value of each intermediate node on the length-2 and length-3 paths is then calculated by Bayesian estimation combined with the seed node neighbor set, and the similarity score is the sum of the likelihood values of all intermediate nodes. The network is traversed to obtain the similarity score of every pair of seed nodes in this way, all seed node pairs are sorted in descending order of score, and the node pairs corresponding to the top B scores are taken as the predicted links. By combining Bayesian estimation with the seed node neighbor set, the method distinguishes the different intermediate nodes on the local paths between two nodes and assigns them different importance, which gives good prediction performance.

Description

Link prediction method based on Bayesian estimation and seed node neighbor set
Technical Field
The invention relates to the fields of network science and link prediction, and in particular to a link prediction method based on Bayesian estimation and a seed node neighbor set.
Background
Complex systems in real life can be studied as complex networks, in which nodes represent the individuals of the system and edges represent the relationships between them. Link prediction is an important research area of complex networks: it can predict the links likely to form between nodes as the network evolves, so the evolution trend of the network can be anticipated, and it can identify spurious edges recorded in the network that do not actually exist. Both help researchers study the underlying laws of the network.
Link prediction has attracted wide interest from researchers. Link prediction algorithms based on network structure are generally more reliable and accurate than those based on node attribute information. The Common Neighbor (CN) algorithm is a classical structure-based link prediction algorithm, also called the structural equivalence algorithm: the more common neighbors two nodes share, the more similar they are. Algorithms derived from CN include the Salton, Jaccard, Sørensen, Hub Promoted Index (HPI), Hub Depressed Index (HDI), LHN-I, Adamic-Adar (AA) and Resource Allocation (RA) indices. The Salton index is also known as cosine similarity; the Sørensen index is often used to study ecological data; the HPI index is often used to analyze the topological similarity of metabolic networks; the AA index rests on the idea that a common neighbor with a small degree contributes more than one with a large degree; and the RA index, inspired by the resource allocation process, was proposed on the basis of AA. Path-based similarity algorithms mainly include the Local Path (LP) index, the Katz index and the LHN-II index. They overcome the CN algorithm's drawback of using too little of the network's information and exploit that information from a more global perspective, which improves link prediction accuracy to a certain extent.
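For reference, the classical common-neighbor indices named above can be sketched in a few lines of Python. This is a minimal illustration using their textbook definitions, not part of the claimed method; the dict-of-sets graph representation and all function names are assumptions made for the example.

```python
import math
from typing import Dict, Hashable, Set

# Assumed representation: undirected simple graph as node -> set of neighbors.
Graph = Dict[Hashable, Set[Hashable]]


def common_neighbors(g: Graph, x: Hashable, y: Hashable) -> int:
    """CN: number of neighbors shared by x and y."""
    return len(g[x] & g[y])


def jaccard(g: Graph, x: Hashable, y: Hashable) -> float:
    """Jaccard: shared neighbors divided by the union of both neighborhoods."""
    union = g[x] | g[y]
    return len(g[x] & g[y]) / len(union) if union else 0.0


def adamic_adar(g: Graph, x: Hashable, y: Hashable) -> float:
    """AA: each common neighbor z contributes 1 / log(k_z), so low-degree z count more."""
    # For x != y every common neighbor has degree >= 2, hence log(k_z) > 0.
    return sum(1.0 / math.log(len(g[z])) for z in g[x] & g[y])


def resource_allocation(g: Graph, x: Hashable, y: Hashable) -> float:
    """RA: each common neighbor z passes 1 / k_z units of resource from x to y."""
    return sum(1.0 / len(g[z]) for z in g[x] & g[y])


# Usage: a 4-cycle 1-2-4-3-1 with chord 2-3; nodes 1 and 4 share neighbors 2 and 3.
example = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
print(common_neighbors(example, 1, 4), round(resource_allocation(example, 1, 4), 3))  # 2 0.667
```

AA and RA differ only in how strongly they penalize high-degree common neighbors (logarithmically versus linearly).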
The classical algorithms above mainly consider topological features of the network: the more similar the network characteristics of two nodes, the more likely they are to form a link, and simulations on many networks have shown these methods to be effective. However, most of these traditional algorithms consider only the degree information of the intermediate nodes on paths of length 2 between node pairs that are not directly connected, and ignore the attributes of intermediate nodes on longer paths, even though these attributes strongly influence link formation between node pairs. The traditional link prediction algorithm based on the seed node neighbor set likewise considers only the intermediate nodes on paths of length 2 between the seed nodes; it merely counts those nodes without distinguishing between them, so it cannot tell their relative importance.
Disclosure of Invention
To overcome the low prediction accuracy of existing link prediction methods based on the seed node neighbor set, which consider only the intermediate nodes on paths of length 2 and only the degrees of those nodes, the invention provides a more accurate link prediction method based on Bayesian estimation and a seed node neighbor set.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A link prediction method based on Bayesian estimation and a seed node neighbor set comprises the following steps:
Step one: establish a network model G(V, E), where V is the set of nodes in the network, E is the set of edges, N is the total number of nodes, U is the set of node pairs in the network, and |U| = N(N-1)/2 is the total number of node pairs;
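Step one can be sketched as follows, assuming an undirected simple graph stored as a dictionary that maps each node to its set of neighbors. The helper names `build_network` and `network_sizes` are hypothetical; the sketch only shows how N, |E| and |U| = N(N-1)/2 follow from an edge list.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

# Assumed representation: undirected simple graph as node -> set of neighbors.
Graph = Dict[Hashable, Set[Hashable]]


def build_network(edges: Iterable[Tuple[Hashable, Hashable]],
                  nodes: Iterable[Hashable] = ()) -> Graph:
    """Build G(V, E); `nodes` lets isolated vertices be included in V."""
    g: Graph = {v: set() for v in nodes}
    for u, v in edges:
        if u == v:
            continue  # skip self-loops: the model is a simple graph
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def network_sizes(g: Graph) -> Tuple[int, int, int]:
    """Return (N, |E|, |U|), where |U| = N(N-1)/2 is the number of node pairs."""
    n = len(g)
    num_edges = sum(len(nbrs) for nbrs in g.values()) // 2  # each edge is stored twice
    return n, num_edges, n * (n - 1) // 2


g = build_network([(1, 2), (2, 3), (3, 1), (3, 4)], nodes=[5])
print(network_sizes(g))  # (5, 4, 10)
```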
Step two: arbitrarily select two nodes x and y in the network as seed nodes and calculate the probability that a direct edge exists between them:
$P(A_1) = \frac{|E|}{|U|} = \frac{2|E|}{N(N-1)}$
where |E| is the total number of edges actually present in the network and $A_1$ denotes the event that a direct edge exists between nodes x and y;
Step three: calculate the probability that no direct edge exists between any two nodes x and y in the network:
$P(A_0) = 1 - P(A_1) = 1 - \frac{|E|}{|U|}$
where $A_0$ denotes the event that no direct edge exists between nodes x and y;
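A minimal sketch of steps two and three, assuming the prior is the edge density, P(A_1) = |E|/|U|, with P(A_0) = 1 - P(A_1). Exact fractions are used so that the two probabilities sum to 1 without rounding error; the function name `edge_priors` is illustrative.

```python
from fractions import Fraction
from typing import Tuple


def edge_priors(num_nodes: int, num_edges: int) -> Tuple[Fraction, Fraction]:
    """Return (P(A1), P(A0)), assuming P(A1) = |E| / |U| and P(A0) = 1 - P(A1)."""
    num_pairs = num_nodes * (num_nodes - 1) // 2  # |U| = N(N-1)/2
    if num_pairs == 0 or not 0 <= num_edges <= num_pairs:
        raise ValueError("need N >= 2 and 0 <= |E| <= N(N-1)/2")
    p_a1 = Fraction(num_edges, num_pairs)
    return p_a1, 1 - p_a1


print(edge_priors(5, 4))  # (Fraction(2, 5), Fraction(3, 5))
```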
Step four: for an intermediate node $V_w$ on a path of length 2 or 3 between nodes x and y, calculate the probability that an edge is generated between x and y:
$P(A_1 \mid V_w) = C_w$
where $C_w = \frac{2E_w}{k_w(k_w-1)}$ is the local clustering coefficient of $V_w$, $k_w$ is the degree of $V_w$, and $E_w$ is the number of edges that actually exist among the $k_w$ neighbors of $V_w$;
Step five: for an intermediate node $V_w$ on a path of length 2 or 3 between nodes x and y, calculate the probability that no edge is generated between x and y:
$P(A_0 \mid V_w) = 1 - C_w$
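Steps four and five reduce to the local clustering coefficient of the intermediate node. The sketch below computes C_w by checking every pair of neighbors of V_w; returning 0 when k_w < 2 (where the formula becomes 0/0) is an assumption the text does not address, and the function names are illustrative.

```python
from itertools import combinations
from typing import Dict, Hashable, Set, Tuple

# Assumed representation: undirected simple graph as node -> set of neighbors.
Graph = Dict[Hashable, Set[Hashable]]


def clustering_coefficient(g: Graph, w: Hashable) -> float:
    """C_w = 2 E_w / (k_w (k_w - 1)), E_w = edges among the k_w neighbors of w."""
    k_w = len(g[w])
    if k_w < 2:
        return 0.0  # assumption: C_w is undefined (0/0) here, so treat it as 0
    e_w = sum(1 for a, b in combinations(g[w], 2) if b in g[a])
    return 2.0 * e_w / (k_w * (k_w - 1))


def conditional_edge_probs(g: Graph, w: Hashable) -> Tuple[float, float]:
    """Steps four and five: (P(A1 | V_w), P(A0 | V_w)) = (C_w, 1 - C_w)."""
    c_w = clustering_coefficient(g, w)
    return c_w, 1.0 - c_w


# Node 1 has neighbors {2, 3, 4}; only the pair (2, 3) is linked, so C_1 = 1/3.
star = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
print(conditional_edge_probs(star, 1))
```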
Step six: using Bayesian estimation, calculate the likelihood value of any intermediate node $V_w$ on the paths of length 2 and 3 between nodes x and y:
[Formulas given as images in the original publication.]
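The step-six formulas appear only as images, so the following is an assumption rather than the patent's own expression: it uses the likelihood ratio that Bayes' rule yields from the quantities of steps two to five, I_w = P(V_w|A_1) / P(V_w|A_0) = P(A_1|V_w) P(A_0) / (P(A_0|V_w) P(A_1)), in which P(V_w) cancels. The `eps` clamping is likewise a hypothetical guard for C_w = 0 or C_w = 1.

```python
import math


def likelihood_value(p_a1_given_w: float, p_a1: float, eps: float = 1e-12) -> float:
    """Hypothetical likelihood value of intermediate node V_w.

    Assumption: I_w = P(V_w | A1) / P(V_w | A0). By Bayes' rule
    P(V_w | A) = P(A | V_w) P(V_w) / P(A), so P(V_w) cancels and
    I_w = P(A1 | V_w) P(A0) / (P(A0 | V_w) P(A1)).
    """
    if not 0.0 < p_a1 < 1.0:
        raise ValueError("prior P(A1) must lie strictly between 0 and 1")
    p_a0 = 1.0 - p_a1
    p_a0_given_w = max(1.0 - p_a1_given_w, eps)  # assumption: guard C_w == 1
    return (p_a1_given_w * p_a0) / (p_a0_given_w * p_a1)


def log_likelihood_value(p_a1_given_w: float, p_a1: float, eps: float = 1e-12) -> float:
    """Log form of the same ratio; clamps at eps so C_w == 0 stays finite (assumption)."""
    return math.log(max(likelihood_value(p_a1_given_w, p_a1, eps), eps))


# With C_w = 0.5 and P(A1) = 0.25 the ratio is (0.5 * 0.75) / (0.5 * 0.25) = 3.
print(likelihood_value(0.5, 0.25))  # 3.0
```

Intuitively, an intermediate node whose neighborhood is denser than the network as a whole (C_w > P(A_1)) gets a ratio above 1 and therefore raises the pair's score.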
Step seven: repeat steps four to six for every intermediate node on the paths of length 2 and 3 between nodes x and y to obtain the likelihood value of each intermediate node:
[Formula given as an image in the original publication.]
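Step seven needs the intermediate nodes of all length-2 and length-3 paths between x and y. The sketch below assumes that paths are simple and that each distinct node is collected once; the text does not say whether a node lying on several paths is counted repeatedly, and `intermediate_nodes` is a hypothetical name.

```python
from typing import Dict, Hashable, Set

# Assumed representation: undirected simple graph as node -> set of neighbors.
Graph = Dict[Hashable, Set[Hashable]]


def intermediate_nodes(g: Graph, x: Hashable, y: Hashable) -> Set[Hashable]:
    """Distinct intermediate nodes on simple paths of length 2 or 3 between x and y."""
    found: Set[Hashable] = set(g[x] & g[y])  # length 2: x - w - y
    for a in g[x]:  # length 3: x - a - b - y
        if a == y:
            continue
        for b in g[a] & g[y]:
            if b != x:
                found.update((a, b))
    return found


# Paths 1-2-5 (length 2) and 1-3-4-5 (length 3); node 6 hangs off node 1.
g = {1: {2, 3, 6}, 2: {1, 5}, 3: {1, 4}, 4: {3, 5}, 5: {2, 4}, 6: {1}}
print(sorted(intermediate_nodes(g, 1, 5)))  # [2, 3, 4]
```

Each node returned here would then be scored with steps four to six.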
Step eight: calculate the similarity score of nodes x and y:
[Formula given as an image in the original publication.]
where Q is the number of intermediate nodes on all paths of length 2 and 3 between nodes x and y; $M_x$ is the sum of the numbers of first-order and second-order neighbors of node x, first-order neighbors being the nodes at distance 1 from x and second-order neighbors the nodes at distance 2 from x; and $M_y$ is the corresponding sum for node y;
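The quantities defined in step eight can be computed as below. The score formula itself is given only as an image, so this hypothetical helper stops short of combining them: it returns the plain sum of likelihood values (the abstract describes the score as that sum), Q, M_x and M_y separately, leaving open how M_x and M_y enter the final score.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

# Assumed representation: undirected simple graph as node -> set of neighbors.
Graph = Dict[Hashable, Set[Hashable]]


def first_and_second_order_count(g: Graph, x: Hashable) -> int:
    """M_x: number of nodes at distance 1 plus number of nodes at distance 2 from x."""
    first = set(g[x])
    second: Set[Hashable] = set()
    for a in first:
        second |= g[a]
    second -= first | {x}  # keep only nodes at distance exactly 2
    return len(first) + len(second)


def score_ingredients(g: Graph, x: Hashable, y: Hashable,
                      likelihoods: Iterable[float]) -> Tuple[float, int, int, int]:
    """Return (sum of likelihood values, Q, M_x, M_y) for the pair (x, y).

    The patent's exact way of combining these into the score is not reproduced
    here, so they are returned separately.
    """
    values = list(likelihoods)
    return (sum(values), len(values),
            first_and_second_order_count(g, x), first_and_second_order_count(g, y))


path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(first_and_second_order_count(path, 3))  # 4
```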
Step nine: traverse the whole network and repeat steps two to eight for every pair of unconnected nodes to obtain the similarity scores of all unconnected node pairs; sort the scores in descending order and take the node pairs corresponding to the top B scores as the predicted links, where B is a preset positive integer, B ≤ D, and D is the number of unconnected node pairs in the network.
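Step nine is a ranking over all unconnected pairs. In this sketch the scoring function is passed in as a parameter; the common-neighbor count `cn` is only a stand-in for the step-eight score, and `predict_links` is a hypothetical name.

```python
import heapq
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Set, Tuple

# Assumed representation: undirected simple graph as node -> set of neighbors.
Graph = Dict[Hashable, Set[Hashable]]
Pair = Tuple[Hashable, Hashable]


def predict_links(g: Graph, score: Callable[[Graph, Hashable, Hashable], float],
                  b: int) -> List[Tuple[Pair, float]]:
    """Score all D unconnected pairs and return the top-B (pair, score) items."""
    unconnected = [(x, y) for x, y in combinations(g, 2) if y not in g[x]]
    if not 1 <= b <= len(unconnected):
        raise ValueError("B must satisfy 1 <= B <= D")
    scored = ((pair, score(g, *pair)) for pair in unconnected)
    # nlargest returns the highest scores first; ties keep their enumeration order.
    return heapq.nlargest(b, scored, key=lambda item: item[1])


def cn(g: Graph, x: Hashable, y: Hashable) -> float:
    """Stand-in score: number of common neighbors."""
    return float(len(g[x] & g[y]))


path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(predict_links(path, cn, 2))
```

Using `heapq.nlargest` avoids sorting all D pairs when B is much smaller than D.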
The beneficial effects of the invention are as follows: by considering the local paths of length 2 or 3 between two unconnected nodes in the network and distinguishing how the degree information of the intermediate nodes contributes to link formation, the invention provides a link prediction method based on Bayesian estimation and a seed node neighbor set with higher link prediction accuracy.
Drawings
Fig. 1 illustrates how the different intermediate nodes between a pair of nodes that are not directly connected affect the formation of a link between that pair.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1, a link prediction method based on Bayesian estimation and a seed node neighbor set comprises the following steps:
Step one: establish a network model G(V, E), where V is the set of nodes in the network, E is the set of edges, N is the total number of nodes, U is the set of node pairs in the network, and |U| = N(N-1)/2 is the total number of node pairs;
Step two: arbitrarily select two nodes x and y in the network as seed nodes (the black dots in Fig. 1) and calculate the probability that a direct edge exists between them:
$P(A_1) = \frac{|E|}{|U|} = \frac{2|E|}{N(N-1)}$
where |E| is the total number of edges actually present in the network and $A_1$ denotes the event that a direct edge exists between nodes x and y;
Step three: calculate the probability that no direct edge exists between any two nodes x and y in the network, as shown in Fig. 1:
$P(A_0) = 1 - P(A_1) = 1 - \frac{|E|}{|U|}$
where $A_0$ denotes the event that no direct edge exists between nodes x and y;
Step four: using the degree information of an intermediate node $V_w$ on a path of length 2 or 3 between nodes x and y (shown in Fig. 1), calculate the probability that an edge is generated between x and y:
$P(A_1 \mid V_w) = C_w$
where $C_w = \frac{2E_w}{k_w(k_w-1)}$ is the local clustering coefficient of $V_w$, $k_w$ is the degree of $V_w$, and $E_w$ is the number of edges that actually exist among the $k_w$ neighbors of $V_w$;
Step five: using the degree information of an intermediate node $V_w$ on a path of length 2 or 3 between nodes x and y (shown in Fig. 1), calculate the probability that no edge is generated between x and y:
$P(A_0 \mid V_w) = 1 - C_w$
Step six: using Bayesian estimation, calculate the likelihood value of any intermediate node $V_w$ on the paths of length 2 and 3 between nodes x and y:
[Formulas given as images in the original publication.]
Step seven: repeat steps four to six for every intermediate node on the paths of length 2 and 3 between nodes x and y to obtain the likelihood value of each intermediate node:
[Formula given as an image in the original publication.]
Step eight: calculate the similarity score of nodes x and y:
[Formula given as an image in the original publication.]
where Q is the number of intermediate nodes on all paths of length 2 and 3 between nodes x and y; $M_x$ is the sum of the numbers of first-order and second-order neighbors of node x, first-order neighbors being the nodes at distance 1 from x and second-order neighbors the nodes at distance 2 from x; and $M_y$ is the corresponding sum for node y;
Step nine: traverse the whole network and repeat steps two to eight for every pair of unconnected nodes to obtain the similarity scores of all unconnected node pairs; sort the scores in descending order and take the node pairs corresponding to the top B scores as the predicted links, where B is a preset positive integer, B ≤ D, and D is the number of unconnected node pairs in the network.
The specific implementation steps above serve to make the invention clearer. Any modification or variation of the invention made within its spirit and the scope of the claims falls within the scope of protection of the invention.

Claims (1)

1. A link prediction method based on Bayesian estimation and a seed node neighbor set, characterized in that the method comprises the following steps:
Step one: establish a network model G(V, E), where V is the set of nodes in the network, E is the set of edges, N is the total number of nodes, U is the set of node pairs in the network, and |U| = N(N-1)/2 is the total number of node pairs;
Step two: arbitrarily select two unconnected nodes x and y in the network as seed nodes and calculate the probability that a direct edge exists between them:
$P(A_1) = \frac{|E|}{|U|} = \frac{2|E|}{N(N-1)}$
where |E| is the total number of edges actually present in the network and $A_1$ denotes the event that a direct edge exists between nodes x and y;
Step three: calculate the probability that no direct edge exists between any two unconnected nodes x and y in the network:
$P(A_0) = 1 - P(A_1) = 1 - \frac{|E|}{|U|}$
where $A_0$ denotes the event that no direct edge exists between nodes x and y;
Step four: for an intermediate node $V_w$ on a path of length 2 or 3 between nodes x and y, calculate the probability that an edge is generated between x and y:
$P(A_1 \mid V_w) = C_w$
where $C_w = \frac{2E_w}{k_w(k_w-1)}$ is the local clustering coefficient of $V_w$, $k_w$ is the degree of $V_w$, and $E_w$ is the number of edges that actually exist among the $k_w$ neighbors of $V_w$;
Step five: for an intermediate node $V_w$ on a path of length 2 or 3 between nodes x and y, calculate the probability that no edge is generated between x and y:
$P(A_0 \mid V_w) = 1 - C_w$
Step six: using Bayesian estimation, calculate the likelihood value of any intermediate node $V_w$ on the paths of length 2 and 3 between nodes x and y:
[Formulas given as images in the original publication.]
Step seven: repeat steps four to six for every intermediate node on the paths of length 2 and 3 between nodes x and y to obtain the likelihood value of each intermediate node:
[Formula given as an image in the original publication.]
Step eight: calculate the similarity score of nodes x and y:
[Formula given as an image in the original publication.]
where Q is the number of intermediate nodes on all paths of length 2 and 3 between nodes x and y; $M_x$ is the sum of the numbers of first-order and second-order neighbors of node x, first-order neighbors being the nodes at distance 1 from x and second-order neighbors the nodes at distance 2 from x; and $M_y$ is the corresponding sum for node y;
Step nine: traverse the whole network and repeat steps two to eight for every pair of unconnected nodes to obtain the similarity scores of all unconnected node pairs; sort the scores in descending order and take the node pairs corresponding to the top B scores as the predicted links, where B is a preset positive integer, B ≤ D, and D is the number of unconnected node pairs in the network.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710366159.9A CN107231252B (en) 2017-05-23 2017-05-23 Link prediction method based on Bayesian estimation and seed node neighbor set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710366159.9A CN107231252B (en) 2017-05-23 2017-05-23 Link prediction method based on Bayesian estimation and seed node neighbor set

Publications (2)

Publication Number Publication Date
CN107231252A (en) 2017-10-03
CN107231252B (en) 2020-05-05

Family

ID=59933376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710366159.9A Active CN107231252B (en) 2017-05-23 2017-05-23 Link prediction method based on Bayesian estimation and seed node neighbor set

Country Status (1)

Country Link
CN (1) CN107231252B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108847993B (en) * 2018-07-20 2021-01-15 中电科新型智慧城市研究院有限公司 Link prediction method based on multi-order path intermediate node resource allocation

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103905246A (en) * 2014-03-06 2014-07-02 西安电子科技大学 Link prediction method based on grouping genetic algorithm
CN104765825A (en) * 2015-04-10 2015-07-08 清华大学 Method and device for predicting social network links based on cooperative fusion theory
CN105376243A (en) * 2015-11-27 2016-03-02 中国人民解放军国防科学技术大学 Differential privacy protection method for online social network based on stratified random graph
CN106326637A (en) * 2016-08-10 2017-01-11 浙江工业大学 Link prediction method based on local effective path degree

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8463895B2 (en) * 2007-11-29 2013-06-11 International Business Machines Corporation System and computer program product to predict edges in a non-cumulative graph

Non-Patent Citations (2)

Title
Weiyu Zhang et al., "Accurate and fast link prediction in complex networks", 2014 10th International Conference on Natural Computation (ICNC), 2014-12-08; full text *
吕琳媛 (Lü Linyuan), "复杂网络链路预测" (Link prediction in complex networks), 电子科技大学学报 (Journal of University of Electronic Science and Technology of China), 2010-09-30, Vol. 39, No. 5; full text *

Also Published As

Publication number Publication date
CN107231252A (en) 2017-10-03

Similar Documents

Publication Publication Date Title
CN110532436B (en) Cross-social network user identity recognition method based on community structure
CN110677284B (en) Heterogeneous network link prediction method based on meta path
CN106326637A (en) Link prediction method based on local effective path degree
CN108734223A (en) The social networks friend recommendation method divided based on community
CN113518007B (en) Multi-internet-of-things equipment heterogeneous model efficient mutual learning method based on federal learning
Wu et al. Graph summarization for attributed graphs
CN107784327A (en) A kind of personalized community discovery method based on GN
CN115358487A (en) Federal learning aggregation optimization system and method for power data sharing
CN115270007B (en) POI recommendation method and system based on mixed graph neural network
CN113422695A (en) Optimization method for improving robustness of topological structure of Internet of things
CN107018027B (en) Link prediction method based on Bayesian estimation and common neighbor node degree
CN114254093A (en) Multi-space knowledge enhanced knowledge graph question-answering method and system
CN107332687B (en) Link prediction method based on Bayesian estimation and common neighbor
CN109740722A (en) A kind of network representation learning method based on Memetic algorithm
CN107231252B (en) Link prediction method based on Bayesian estimation and seed node neighbor set
CN106815653B (en) Distance game-based social network relationship prediction method and system
CN107135107B (en) Bayesian estimation and major node-based unfavorable link prediction method
CN109255433B (en) Community detection method based on similarity
Liu et al. Similarity-based common neighbor and sign influence model for link prediction in signed social networks
CN112035545B (en) Competition influence maximization method considering non-active node and community boundary
Zhang et al. Imbalanced networked multi-label classification with active learning
CN115587187A (en) Knowledge graph complementing method based on small sample
CN111709846A (en) Local community discovery algorithm based on line graph
CN109711478A (en) A kind of large-scale data group searching method based on timing Density Clustering
CN116842199B (en) Knowledge graph completion method based on multi-granularity hierarchy and dynamic embedding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant