CN107332687B - Link prediction method based on Bayesian estimation and common neighbor - Google Patents

Link prediction method based on Bayesian estimation and common neighbor

Publication number
CN107332687B
CN107332687B CN201710378145.9A CN201710378145A CN107332687B CN 107332687 B CN107332687 B CN 107332687B CN 201710378145 A CN201710378145 A CN 201710378145A CN 107332687 B CN107332687 B CN 107332687B
Authority
CN
China
Prior art keywords
nodes
network
node
calculating
path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710378145.9A
Other languages
Chinese (zh)
Other versions
CN107332687A (en)
Inventor
杨旭华
徐恩平
张海丰
肖杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201710378145.9A
Publication of CN107332687A
Application granted
Publication of CN107332687B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A link prediction method based on Bayesian estimation and common neighbors. A network model is established and two nodes that are not directly connected are selected as seed nodes. The probabilities that a connecting edge does and does not exist between two nodes are calculated. For each intermediate node on a path of length 2 or 3 between the two seed nodes, the probabilities that a connecting edge is and is not generated between the seed nodes are calculated from that intermediate node's degree information, and a likelihood value for the intermediate node is then calculated from Bayesian estimation and the common-neighbor information. The similarity score of the seed pair is the sum of the likelihood values of all its intermediate nodes. The network is traversed to obtain the similarity score of every seed node pair in this way, all seed node pairs are sorted in descending order of similarity score, and the node pairs with the B highest scores are taken as the predicted edges. By combining Bayesian estimation with common-neighbor information, the invention distinguishes the different importance of the intermediate nodes on local paths between two nodes, and the algorithm achieves a good prediction effect.

Description

Link prediction method based on Bayesian estimation and common neighbor
Technical Field
The invention relates to the field of network science and link prediction, in particular to a link prediction method based on Bayesian estimation and common neighbors.
Background
Complex systems in real life can be studied as complex networks, in which nodes represent the individuals in the system and connecting edges represent the relationships between them. Link prediction is an important research area in complex networks: it can predict links likely to form between nodes as the network evolves, so the network's evolutionary trend can be anticipated, and it can identify "ghost edges" that do not actually exist in the network, helping researchers study the network's internal regularities.
The link prediction problem has attracted great interest from researchers. Link prediction algorithms based on network structure are generally more reliable and accurate than those based on node attribute information. The Common Neighbor (CN) algorithm is a classical structure-based link prediction algorithm, also called a structural equivalence algorithm: the more common neighbors two nodes share, the more similar the two nodes are. Link prediction algorithms derived from CN include the Salton algorithm, the Jaccard algorithm, the Sørensen algorithm, the Hub Promoted Index (HPI), the Hub Depressed Index (HDI), the LHN-I algorithm, the AA algorithm, and the RA algorithm. The Salton algorithm is also called the cosine similarity algorithm; the Sørensen algorithm is often used to study ecological data; the HPI algorithm is often used to analyze the topological similarity of metabolic networks; the idea of the AA algorithm is that a common neighbor with small degree contributes more than a common neighbor with large degree; and the RA algorithm builds on the AA algorithm and is inspired by a resource allocation process. Path-based similarity algorithms mainly include the Local Path (LP) index, the Katz algorithm, and the LHN-II algorithm. They overcome the CN algorithm's drawback of using too little of the network's information and exploit network information from a global perspective, improving link prediction accuracy to some extent.
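For reference, three of the common-neighbor indices named in the background can be sketched in a few lines of Python. The definitions used are the standard ones from the literature (CN counts shared neighbors, AA sums 1/log k_z and RA sums 1/k_z over the shared neighbors z). The adjacency-dictionary representation, the function names, and the toy graph are assumptions made purely for illustration and do not come from the patent.

```python
import math

def common_neighbors(adj, x, y):
    """CN index: number of neighbors shared by x and y."""
    return len(adj[x] & adj[y])

def adamic_adar(adj, x, y):
    """AA index: shared neighbors weighted by 1 / log(degree)."""
    # A shared neighbor is adjacent to both x and y, so its degree is >= 2
    # and the logarithm is strictly positive.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])

def resource_allocation(adj, x, y):
    """RA index: shared neighbors weighted by 1 / degree."""
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])

# Hypothetical toy graph: undirected, stored as node -> set of neighbors.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

print(common_neighbors(adj, "a", "d"))
print(adamic_adar(adj, "a", "d"))
print(resource_allocation(adj, "a", "d"))
```

All three indices look only at the shared neighbors themselves. AA and RA weight them by degree, but none of them uses how the neighbors of a shared neighbor are connected to each other.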
The classical algorithms above mainly consider the topological characteristics of the network, i.e., the more similar the network characteristics of two nodes, the more likely the two nodes are to form a link. These methods have proved effective in many networks, but they simply count the intermediate nodes between a node pair and do not distinguish the role of each intermediate node. In fact, in many networks the intermediate nodes between two nodes differ greatly in their role in, and contribution to, generating a link between the pair. Traditional CN-based algorithms do not effectively distinguish between different intermediate nodes.
Disclosure of Invention
Existing common-neighbor-based link prediction methods do not adequately distinguish the different roles of the different common neighbors between two unconnected nodes in a network, which lowers prediction accuracy. To overcome this defect, the invention provides a more accurate link prediction method based on Bayesian estimation and common neighbors.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A link prediction method based on Bayesian estimation and common neighbors comprises the following steps:
Step one: establish a network model G(V, E), where V is the set of nodes in the network and E is the set of connecting edges; the total number of nodes is denoted N, U is the set of node pairs in the network, and |U| = N(N-1)/2 is the total number of node pairs;
Step two: arbitrarily select two nodes x and y in the network as seed nodes and calculate the probability that a direct connecting edge exists between them:
Figure BDA0001301608170000021
where |E| is the total number of edges actually present in the network and A_1 denotes that a direct connecting edge exists between nodes x and y;
Step three: calculate the probability that no direct connecting edge exists between any two nodes x and y in the network:
Figure BDA0001301608170000031
where A_0 denotes that no direct connecting edge exists between nodes x and y;
Step four: from an intermediate node V_w on a path of length 2 or 3 between nodes x and y, calculate the probability that a connecting edge is generated between x and y:
P(A_1 | V_w) = C_w
where C_w = 2E_w / (k_w (k_w - 1)), k_w is the degree of node V_w, and E_w is the number of edges actually existing among the k_w neighbors of V_w;
Step five: from an intermediate node V_w on a path of length 2 or 3 between nodes x and y, calculate the probability that no connecting edge is generated between x and y:
P(A_0 | V_w) = 1 - C_w
Step six: using Bayesian estimation, calculate, for any intermediate node V_w on a path of length 2 or 3 between nodes x and y, the likelihood value
Figure BDA0001301608170000032
Figure BDA0001301608170000033
Step seven: repeat steps four to six for every intermediate node on the paths of length 2 and 3 between nodes x and y, calculating the likelihood value of each intermediate node
Figure BDA0001301608170000034
Step eight: compute the similarity score S_xy of nodes x and y
Figure BDA0001301608170000035
Figure BDA0001301608170000036
represents the sum of the likelihood values of all nodes in all paths of length 2 and 3 between nodes x and y;
Step nine: traverse the whole network and repeat steps two to eight for every pair of unconnected nodes to calculate the similarity scores of all unconnected node pairs; sort the pairs by similarity score from high to low and take the node pairs with the B highest similarity scores as the predicted edges, where B is a preset positive integer, B ≤ D, and D is the number of unconnected node pairs in the network.
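The quantities in steps two to five can be sketched in Python as below. The formula images for steps two and three are not reproduced in this text, so the prior used here, P(A_1) = |E| / |U| with P(A_0) = 1 - P(A_1), is an assumption inferred from the definitions of |E| and |U| and not a confirmed transcription. The expression for C_w follows the text of step four. The function names, the toy graph, and the convention of returning 0 for nodes of degree below 2 are hypothetical.

```python
from itertools import combinations

def edge_priors(num_nodes, num_edges):
    """Return (P(A_1), P(A_0)) under the ASSUMED prior P(A_1) = |E| / |U|."""
    num_pairs = num_nodes * (num_nodes - 1) // 2  # |U| = N(N-1)/2
    p_a1 = num_edges / num_pairs
    return p_a1, 1.0 - p_a1

def clustering_coefficient(adj, w):
    """C_w = 2 E_w / (k_w (k_w - 1)) for node w, as in step four."""
    k_w = len(adj[w])
    if k_w < 2:
        return 0.0  # convention chosen for this sketch, not from the source
    # E_w: number of edges that actually exist among the neighbors of w.
    e_w = sum(1 for u, v in combinations(adj[w], 2) if v in adj[u])
    return 2.0 * e_w / (k_w * (k_w - 1))

# Hypothetical toy graph with N = 4 nodes and |E| = 5 edges.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2

p_a1, p_a0 = edge_priors(len(adj), num_edges)
c_b = clustering_coefficient(adj, "b")
print(p_a1, p_a0)      # assumed priors for any node pair
print(c_b, 1.0 - c_b)  # P(A_1 | V_w) and P(A_0 | V_w) for w = "b"
```

The likelihood value of step six is deliberately left out of this sketch, because its formula is only available as an image in this text.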
The beneficial effects of the invention are as follows: by considering the local paths of length 2 or 3 between two unconnected nodes in the network and distinguishing the contributions of the intermediate nodes to link generation, the link prediction method based on Bayesian estimation and common neighbors achieves high link prediction accuracy.
Drawings
Fig. 1 shows how the different intermediate nodes between a pair of nodes with no direct connecting edge affect the link between that pair.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1, a link prediction method based on Bayesian estimation and common neighbors comprises the following steps:
Step one: establish a network model G(V, E), where V is the set of nodes in the network and E is the set of connecting edges; the total number of nodes is denoted N, U is the set of node pairs in the network, and |U| = N(N-1)/2 is the total number of node pairs;
Step two: arbitrarily select two nodes x and y in the network as seed nodes (the black dots in Fig. 1) and calculate the probability that a direct connecting edge exists between them:
Figure BDA0001301608170000041
where |E| is the total number of edges actually present in the network and A_1 denotes that a direct connecting edge exists between nodes x and y;
Step three: calculate the probability that no direct connecting edge exists between any two nodes x and y in the network, as shown in Fig. 1:
Figure BDA0001301608170000042
where A_0 denotes that no direct connecting edge exists between nodes x and y;
Step four: from the degree information of an intermediate node V_w on a path of length 2 or 3 between nodes x and y (shown in Fig. 1), calculate the probability that a connecting edge is generated between x and y:
P(A_1 | V_w) = C_w
where C_w = 2E_w / (k_w (k_w - 1)), k_w is the degree of node V_w, and E_w is the number of edges actually existing among the k_w neighbors of V_w;
Step five: from the degree information of an intermediate node V_w on a path of length 2 or 3 between nodes x and y (shown in Fig. 1), calculate the probability that no connecting edge is generated between x and y:
P(A_0 | V_w) = 1 - C_w
Step six: using Bayesian estimation, calculate, for any intermediate node V_w on a path of length 2 or 3 between nodes x and y, the likelihood value
Figure BDA0001301608170000051
Figure BDA0001301608170000052
Step seven: repeat steps four to six for every intermediate node on the paths of length 2 and 3 between nodes x and y, calculating the likelihood value of each intermediate node
Figure BDA0001301608170000053
Step eight: compute the similarity score S_xy of nodes x and y
Figure BDA0001301608170000054
Figure BDA0001301608170000055
represents the sum of the likelihood values of all nodes in all paths of length 2 and 3 between nodes x and y;
Step nine: traverse the whole network and repeat steps two to eight for every pair of unconnected nodes to calculate the similarity scores of all unconnected node pairs; sort the pairs by similarity score from high to low and take the node pairs with the B highest similarity scores as the predicted edges, where B is a preset positive integer, B ≤ D, and D is the number of unconnected node pairs in the network.
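The ranking in step nine can be sketched in Python as follows. The scoring function is passed in as a parameter because the likelihood formula of step six is only available as an image in this text. The `common_neighbor_score` used in the demonstration is a plain stand-in and is not the patented similarity score. The function names and the toy graph are hypothetical.

```python
from itertools import combinations

def predict_links(adj, score, b):
    """Rank all unconnected node pairs by `score` and return the top b."""
    # D candidate pairs: every pair of nodes without a direct edge.
    candidates = [
        (x, y) for x, y in combinations(sorted(adj), 2) if y not in adj[x]
    ]
    if not 1 <= b <= len(candidates):
        raise ValueError("B must be a positive integer not exceeding D")
    # Sort from high to low score; ties keep their original (sorted) order.
    ranked = sorted(candidates, key=lambda pair: score(adj, *pair), reverse=True)
    return ranked[:b]

def common_neighbor_score(adj, x, y):
    """Stand-in score for the demo only: number of shared neighbors."""
    return len(adj[x] & adj[y])

# Hypothetical toy graph with D = 4 unconnected pairs.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

print(predict_links(adj, common_neighbor_score, 1))
```

Keeping the score as a parameter means the same ranking loop can be reused once a per-pair similarity S_xy has been implemented from the original formulas.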
As described above, the specific implementation steps make the invention clearer. Any modification or variation of the invention within its spirit and the scope of the claims falls within the scope of protection of the invention.

Claims (1)

1. A link prediction method based on Bayesian estimation and common neighbors, characterized in that the method comprises the following steps:
Step one: establish a network model G(V, E), where V is the set of nodes in the network and E is the set of connecting edges; the total number of nodes is denoted N, U is the set of node pairs in the network, and |U| = N(N-1)/2 is the total number of node pairs;
Step two: arbitrarily select two unconnected nodes x and y in the network as seed nodes and calculate the probability that a direct connecting edge exists between the two unconnected nodes x and y:
Figure FDA0002227401610000011
where |E| is the total number of edges actually present in the network and A_1 denotes that a direct connecting edge exists between nodes x and y;
Step three: calculate the probability that no direct connecting edge exists between any two unconnected nodes x and y in the network:
Figure FDA0002227401610000012
where A_0 denotes that no direct connecting edge exists between nodes x and y;
Step four: from an intermediate node V_w on a path of length 2 or 3 between nodes x and y, calculate the probability that a connecting edge is generated between x and y:
P(A_1 | V_w) = C_w
where C_w = 2E_w / (k_w (k_w - 1)), k_w is the degree of node V_w, and E_w is the number of edges actually existing among the k_w neighbors of V_w;
Step five: from an intermediate node V_w on a path of length 2 or 3 between nodes x and y, calculate the probability that no connecting edge is generated between x and y:
P(A_0 | V_w) = 1 - C_w
Step six: using Bayesian estimation, calculate, for any intermediate node V_w on a path of length 2 or 3 between nodes x and y, the likelihood value
Figure FDA0002227401610000021
Figure FDA0002227401610000022
Step seven: repeat steps four to six for every intermediate node on the paths of length 2 and 3 between nodes x and y, calculating the likelihood value of each intermediate node
Figure FDA0002227401610000023
Step eight: compute the similarity score S_xy of nodes x and y
Figure FDA0002227401610000024
Figure FDA0002227401610000025
represents the sum of the likelihood values of all intermediate nodes in all paths of length 2 and 3 between nodes x and y;
Step nine: traverse the whole network and repeat steps two to eight for every pair of unconnected nodes to calculate the similarity scores of all unconnected node pairs; sort the pairs by similarity score from high to low and take the node pairs with the B highest similarity scores as the predicted edges, where B is a preset positive integer, B ≤ D, and D is the number of unconnected node pairs in the network.
CN201710378145.9A 2017-05-23 2017-05-23 Link prediction method based on Bayesian estimation and common neighbor Active CN107332687B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710378145.9A CN107332687B (en) 2017-05-23 2017-05-23 Link prediction method based on Bayesian estimation and common neighbor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710378145.9A CN107332687B (en) 2017-05-23 2017-05-23 Link prediction method based on Bayesian estimation and common neighbor

Publications (2)

Publication Number Publication Date
CN107332687A (en) 2017-11-07
CN107332687B (en) 2020-05-05

Family

ID=60193049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710378145.9A Active CN107332687B (en) 2017-05-23 2017-05-23 Link prediction method based on Bayesian estimation and common neighbor

Country Status (1)

Country Link
CN (1) CN107332687B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109039722B (en) * 2018-07-20 2021-05-28 中电科新型智慧城市研究院有限公司 Link prediction method based on common neighbor node resource allocation and naive Bayes
CN110083778A (en) * 2019-04-08 2019-08-02 清华大学 The figure convolutional neural networks construction method and device of study separation characterization

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103905246A (en) * 2014-03-06 2014-07-02 西安电子科技大学 Link prediction method based on grouping genetic algorithm
CN104765825A (en) * 2015-04-10 2015-07-08 清华大学 Method and device for predicting social network links based on cooperative fusion theory
CN105376243A (en) * 2015-11-27 2016-03-02 中国人民解放军国防科学技术大学 Differential privacy protection method for online social network based on stratified random graph
CN106326637A (en) * 2016-08-10 2017-01-11 浙江工业大学 Link predicting method based on local effective path degree

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8463895B2 (en) * 2007-11-29 2013-06-11 International Business Machines Corporation System and computer program product to predict edges in a non-cumulative graph

Non-Patent Citations (2)

Title
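"Accurate and fast link prediction in complex networks"; Weiyu Zhang, et al.; 2014 10th International Conference on Natural Computation (ICNC); 2014-12-08; full text *
"Link prediction on complex networks" (《复杂网络链路预测》); Lü Linyuan (吕琳媛); Journal of University of Electronic Science and Technology of China (《电子科技大学学报》); 2010-09-30; Vol. 39, No. 5; full text *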

Also Published As

Publication number Publication date
CN107332687A (en) 2017-11-07

Similar Documents

Publication Publication Date Title
CN110532436B (en) Cross-social network user identity recognition method based on community structure
Wu et al. Attribute weighting via differential evolution algorithm for attribute weighted naive bayes (wnb)
CN106326637A (en) Link predicting method based on local effective path degree
Ma et al. Decomposition-based multiobjective evolutionary algorithm for community detection in dynamic social networks
CN113422695B (en) Optimization method for improving robustness of topological structure of Internet of things
CN108734223A (en) The social networks friend recommendation method divided based on community
CN105574541A (en) Compactness sorting based network community discovery method
CN109740106A (en) Large-scale network betweenness approximation method based on graph convolution neural network, storage device and storage medium
CN107332687B (en) Link prediction method based on Bayesian estimation and common neighbor
CN107784327A (en) A kind of personalized community discovery method based on GN
Gupte et al. Role discovery in graphs using global features: Algorithms, applications and a novel evaluation strategy
CN107018027B (en) Link prediction method based on Bayesian estimation and common neighbor node degree
Lu et al. Measuring and improving communication robustness of networks
CN106844445B (en) Resource description framework RDF graph partitioning method based on semantics
CN115051929A (en) Network fault prediction method and device based on self-supervision target perception neural network
Yang et al. Multi-attribute ranking method for identifying key nodes in complex networks based on GRA
CN107231252B (en) Link prediction method based on Bayesian estimation and seed node neighbor set
CN114254093A (en) Multi-space knowledge enhanced knowledge graph question-answering method and system
CN107135107B (en) Bayesian estimation and major node-based unfavorable link prediction method
CN117272195A (en) Block chain abnormal node detection method and system based on graph convolution attention network
CN112035545B (en) Competition influence maximization method considering non-active node and community boundary
Liu et al. Similarity-based common neighbor and sign influence model for link prediction in signed social networks
CN109948001B (en) Minimum community discovery method for sub-linear time distributed computing girth
Liu et al. An entropy-based gravity model for influential spreaders identification in complex networks
CN111709846A (en) Local community discovery algorithm based on line graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant