CN110569885A - multi-order motif directed network link prediction method based on naive Bayes - Google Patents


Info

Publication number
CN110569885A
Authority
CN
China
Prior art keywords
node
network
motif
edge
calculating
Prior art date
Legal status
Pending
Application number
CN201910764249.2A
Other languages
Chinese (zh)
Inventor
刘亚芳
许小可
肖婧
Current Assignee
Dalian Minzu University
Original Assignee
Dalian Nationalities University
Priority date
2019-08-19
Filing date
2019-08-19
Publication date
2019-12-13
Application filed by Dalian Nationalities University
Priority to CN201910764249.2A
Publication of CN110569885A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-order motif directed network link prediction method based on naive Bayes, comprising the following steps: S1, counting the whole structures in which two nodes form a motif together with a test edge; S2, calculating the number of closed motif structures that can be formed by all edges and other nodes in the network; S3, calculating the number of unclosed motif structures that can be formed by all edges and other nodes in the network; S4, calculating the role function value of each edge for the motif to be formed; and S5, calculating the total role value of each edge. The application provides a naive Bayes-based link prediction algorithm for four-node motifs in directed networks, which makes full use of the structural characteristics of directed networks and greatly improves the accuracy of link prediction.

Description

Multi-order motif directed network link prediction method based on naive Bayes
Technical Field
The invention relates to a link prediction method, in particular to a multi-order motif directed network link prediction method based on naive Bayes.
Background
Link prediction is an important research direction in the field of complex networks. Its basic problem is to estimate the likelihood of a link between any two nodes from known information such as the network's nodes and structure. Link prediction can not only estimate the likelihood that edges absent from the network will appear in the future, but also detect spurious edges in the observed network and recover missing ones.
Among link prediction methods based on network structure, common-neighbor similarity is the most widely used, and in the comparison by Liben-Nowell and Kleinberg the common-neighbors index is among the most accurate predictors. However, common-neighbor indices ignore the direction of links between nodes and therefore cannot be applied directly to directed networks. A predicted edge and a common neighbor together form a closed triangle; taking link directions into account on this triangle yields the local structures (motifs) of a directed network.
A common link prediction algorithm is based on the common-neighbors index. It assumes that the more common neighbors two nodes share, the more likely an edge is to form between them, and that every common neighbor contributes equally to forming that link. In many real networks this assumption does not hold. For example, people often form new friendships through mutual friends, but two people who both follow the same celebrity usually do not know each other: how many celebrities they follow in common has little effect on whether they are connected. Such a common neighbor says little about whether two people are linked, whereas a common neighbor who is a friend of both makes a link between them likely. The influence of each common neighbor must therefore be taken into account when performing link prediction on a directed network.
Disclosure of Invention
The application provides a multi-order motif directed network link prediction method based on naive Bayes, and the prediction accuracy is improved by adding a role function.
In order to achieve the purpose, the technical scheme of the application is as follows: a multi-order motif directed network link prediction method based on naive Bayes comprises the following steps:
S1, counting the whole structures in which two nodes form a motif together with a test edge;
S2, calculating the number $N_{\Delta w}$ of closed motif structures that can be formed by all edges and other nodes in the network;
S3, calculating the number $N_{\Lambda w}$ of unclosed motif structures that can be formed by all edges and other nodes in the network;
S4, calculating the role function value $R_w$ of each edge for the motif to be formed;
S5, calculating the total role value of each edge, where w denotes any node (for a three-node motif) or edge (for a four-node motif) that can be combined with the edge (x, y) to form the given motif shape, and $O_{xy}$ is the amount of overlap between the neighbors of node x in the specified direction and the neighbors of node y in the specified direction.
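Steps S2 to S4 can be sketched for the three-node case as below. This is a minimal illustration under assumptions the text does not state: the motif is taken to be the hypothetical feed-forward shape u→w→v closed by u→v, $N_{\Delta w}$ and $N_{\Lambda w}$ count the closed and unclosed (u, v) pairs around node w, and the role function is assumed to be the smoothed ratio $R_w = (N_{\Delta w}+1)/(N_{\Lambda w}+1)$ used in local naive Bayes link prediction. The patent's own S4 and S5 formulas are not reproduced here, and the helper names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Out- and in-neighbor sets of a directed, unweighted edge list."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            out_nb[u].add(v)
            in_nb[v].add(u)
    return out_nb, in_nb


def role_function(w, out_nb, in_nb):
    """Return (N_closed, N_unclosed, R_w) for node w.

    Assumed motif: u -> w -> v, closed when u -> v also exists.
    Assumed role function: R_w = (N_closed + 1) / (N_unclosed + 1).
    """
    closed = unclosed = 0
    for u in in_nb.get(w, ()):
        for v in out_nb.get(w, ()):
            if u == v:
                continue
            if v in out_nb.get(u, ()):
                closed += 1    # counts toward N_Δw
            else:
                unclosed += 1  # counts toward N_Λw
    return closed, unclosed, (closed + 1) / (unclosed + 1)


out_nb, in_nb = build_adjacency([(1, 0), (0, 2), (1, 2), (3, 0), (0, 4)])
print(role_function(0, out_nb, in_nb))  # (1, 3, 0.5)
```

Only the pair (1, 2) around node 0 is already closed, so node 0 gets a role value below 1 and would lower the score of candidate edges that pass through it.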
Further, for a directed unweighted network, Γ(x) denotes the neighbors of node x in the specified direction and Γ(y) denotes the neighbors of node y in the specified direction;
$O_{xy} = |\Gamma(x) \cap \Gamma(y)|$
that is, the amount of overlap between the neighbors of node x in the specified direction and the neighbors of node y in the specified direction.
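The overlap $O_{xy}$ can be sketched as below, assuming Γ(x) is the set of out-neighbors of x and Γ(y) the set of in-neighbors of y, so that every shared neighbor w lies on a path x→w→y. Other direction choices give other motifs; the function name and the `x_dir`/`y_dir` parameters are hypothetical.

```python
from collections import defaultdict


def directed_overlap(edges, x, y, x_dir="out", y_dir="in"):
    """Return (set of shared neighbors, O_xy) for nodes x and y.

    x_dir / y_dir choose which neighbor set plays the role of Γ(x) / Γ(y);
    the default (out-neighbors of x, in-neighbors of y) is an assumption.
    """
    neighbors = {"out": defaultdict(set), "in": defaultdict(set)}
    for u, v in edges:
        neighbors["out"][u].add(v)
        neighbors["in"][v].add(u)
    common = neighbors[x_dir][x] & neighbors[y_dir][y]
    return common, len(common)


edges = [(0, 2), (2, 1), (0, 3), (3, 1), (0, 4), (1, 5)]
print(directed_overlap(edges, 0, 1))               # ({2, 3}, 2)
print(directed_overlap(edges, 0, 1, y_dir="out"))  # (set(), 0)
```

The second call shows why direction matters: with both sets taken as out-neighbors, nodes 0 and 1 share no neighbor at all, although an undirected common-neighbor count would report two.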
Further, the link prediction score of each test edge is calculated as follows:
A. Counting, for each edge, the number $|O_{xy}|$ of motifs it can form;
B. Calculating the number $M_F$ of possible edges in the network;
C. Counting the number M of edges that actually exist in the network;
D. Calculating the probability that a node pair in the network is connected;
E. Calculating the probability that a node pair in the network is not connected;
F. Calculating the link prediction score of each edge.
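Steps B to F can be sketched as follows, assuming the score takes the local naive Bayes form $r_{xy} = |O_{xy}|\log s + \sum_{w \in O_{xy}} \log R_w$ with $s = p_0/p_1 = M_F/M - 1$, where $p_1 = M/M_F$ is the probability of connection (step D) and $p_0 = 1 - p_1$ the probability of non-connection (step E). The formulas for D to F are not reproduced in this text, so this form and the function name are assumptions; $M_F$ follows the text's $|V|(|V|-1)/2$.

```python
import math


def lnb_score(num_nodes, num_train_edges, role_values):
    """Assumed local naive Bayes score for one candidate edge (x, y).

    role_values holds R_w for every w in the overlap, so its length is |O_xy| (step A).
    """
    m_f = num_nodes * (num_nodes - 1) / 2  # step B: M_F as given in the text
    m = num_train_edges                     # step C: M = |E_T|
    p_connect = m / m_f                     # step D: prior probability of a link
    p_disconnect = 1 - p_connect            # step E: prior probability of no link
    s = p_disconnect / p_connect            # equals M_F / M - 1
    # Step F: each w contributes log s + log R_w (the constant -log s is dropped).
    return sum(math.log(s) + math.log(r) for r in role_values)


print(lnb_score(5, 2, [1.0, 1.0]))  # 2 * log 4, about 2.7726
```

Each overlapping neighbor adds log s + log R_w, so in a sparse network (s > 1) a neighbor raises the score unless its role value is small enough to offset log s.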
Owing to the technical scheme above, the invention achieves the following technical effect: the application provides a naive Bayes-based link prediction algorithm for four-node motifs in directed networks, which makes full use of the structural characteristics of directed networks and greatly improves the accuracy of link prediction.
Drawings
FIG. 1 is a graph comparing the accuracy of link prediction based on motif counts with that of single-motif link prediction based on naive Bayes;
FIG. 2 is a structural diagram of the role function calculation for the third-order (three-node) motif and the fourth-order (four-node) motif.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described examples are only some of the embodiments of the present invention, not all of them. All other embodiments that a person skilled in the art can derive from these embodiments without creative effort fall within the protection scope of the present invention.
Example 1
The application provides a multi-order motif directed network link prediction method based on naive Bayes, which comprises the following steps:
S1, counting the whole structures in which two nodes form a motif together with a test edge;
S2, calculating the number $N_{\Delta w}$ of closed motif structures that can be formed by all edges and other nodes in the network;
S3, calculating the number $N_{\Lambda w}$ of unclosed motif structures that can be formed by all edges and other nodes in the network;
S4, calculating the role function value $R_w$ of each edge for the motif to be formed;
S5, calculating the total role value of each edge, where w denotes any node or edge that can be combined with the edge (x, y) to form the given motif shape, and $O_{xy}$ is the amount of overlap between the neighbors of node x in the specified direction and the neighbors of node y in the specified direction.
The link prediction score of each test edge is calculated as follows:
A. Counting, for each edge, the number $|O_{xy}|$ of motifs it can form;
B. Calculating the number $M_F$ of possible edges in the network;
C. Counting the number M of edges that actually exist in the network;
D. Calculating the probability that a node pair in the network is connected;
E. Calculating the probability that a node pair in the network is not connected;
F. Calculating the link prediction score of each edge.
During link prediction, accuracy is evaluated with AUC. The scores of the positive and negative test-set samples are compared pairwise: 1 point is added when the score of the positive sample is greater than that of the negative sample, 0.5 points when the two scores are equal, and 0 points when the positive score is smaller. The accuracy of link prediction is evaluated from the final total.
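The pairwise scoring rule above can be sketched as a function that compares every positive test score with every negative one and divides the points by the number of pairs. The normalization and the function name are assumptions, since the text only says accuracy is evaluated from the final score.

```python
def auc_pairwise(pos_scores, neg_scores):
    """AUC from all positive/negative test-score pairs (1, 0.5 or 0 points each)."""
    points = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                points += 1.0  # positive edge ranked higher
            elif p == n:
                points += 0.5  # tie
            # p < n adds nothing
    return points / (len(pos_scores) * len(neg_scores))


print(auc_pairwise([3, 2], [1, 2]))  # 0.875
```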
Experiments were carried out with the algorithm on several networks, and the results are shown in FIG. 1, where I-shaped markers denote results of the naive Bayes-based link prediction method and circles denote results of a traditional link prediction method.
Example 2
This embodiment applies the naive Bayes-based multi-order motif directed network link prediction method of Example 1. A network is represented as G(V, E), where V is the set of nodes and E is the set of edges. E is divided into two parts, a training set $E_T$ and a test set $E_P$, with $E_T \cap E_P = \emptyset$ and $E_T \cup E_P = E$. 10% of the edges are randomly selected as the positive test sample $E_P$, the remaining 90% form the training set $E_T$, and a set of non-existent edges of the same size as the positive test sample is selected as the negative test sample. The specific steps are as follows:
S1, acquiring the original directed network data, constructing the initial network, and obtaining the list of node pairs without an edge;
S2, randomly selecting 10% of the edges in the network data as the positive test sample, using the remaining 90% as the training set, and selecting from the list of unconnected node pairs a set of the same size as the positive test sample as the negative test sample;
S3, obtaining the role function value of each individual in the network with the naive Bayes model algorithm (as described in Example 1);
S4, obtaining the $r'_{xy}$ list of node pairs from the number of common neighbors of each node pair and the role functions of those common neighbors;
S5, combining the $r'_{xy}$ lists obtained from different predictors into a new score list with the machine learning method XGBoost.
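Steps S1 and S2 can be sketched as below: a random 90%/10% split of the directed edge list, plus a negative test sample of the same size drawn from ordered node pairs that have no edge. The seed, the rounding of the test size, and the function name are assumptions for illustration.

```python
import random


def split_edges(edges, nodes, test_ratio=0.1, seed=0):
    """Return (train, test_pos, test_neg) for a directed edge list."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = max(1, round(test_ratio * len(shuffled)))
    test_pos, train = shuffled[:n_test], shuffled[n_test:]
    existing = set(shuffled)
    # Ordered node pairs with no edge in the original network (negative candidates).
    non_edges = [(u, v) for u in nodes for v in nodes
                 if u != v and (u, v) not in existing]
    test_neg = rng.sample(non_edges, n_test)
    return train, test_pos, test_neg


ring = [(i, (i + 1) % 10) for i in range(10)]
train, test_pos, test_neg = split_edges(ring, range(10))
print(len(train), len(test_pos), len(test_neg))  # 9 1 1
```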
A similarity index is calculated for nodes x and y to measure the likelihood that the link between them exists; a higher score means a higher likelihood of connection. All non-existent edges are sorted by score in descending order, and the top-ranked edges are the most likely to exist.
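The ranking step can be sketched as sorting candidate node pairs by score in descending order. The dictionary input format and the hypothetical `top_k` parameter are assumptions.

```python
def rank_candidates(scores, top_k=None):
    """Sort candidate (non-existent) edges by score, highest first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


scores = {(0, 1): 0.2, (1, 2): 0.9, (2, 0): 0.5}
print(rank_candidates(scores, top_k=2))  # [((1, 2), 0.9), ((2, 0), 0.5)]
```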
The $r'_{xy}$ value is calculated as follows: a role function value is obtained for each node or edge; then, from the set of nodes or edges with which each edge can form the given motif, the total role function value of each edge is obtained. This total is added on top of the number of common neighbors of the edge in the specified direction, and the result is taken as the $r'_{xy}$ value of that edge.
Here, $M_F = |V|(|V|-1)/2$ is the number of all possible edges in the network, $M = |E_T|$ is the number of edges that actually exist in the network, |V| is the total number of nodes, and E is the set of all edges in the network.
The $R_{xy}$ value of a double motif is calculated as follows:
The formula has two parts: $r'_{xy}$ is first obtained from each of the two single motifs, and the two are then combined into a double-motif result, which equals the direct sum of the two single-motif results. Here $|O^{1}_{xy}|$ denotes the number of overlapping neighbors of nodes x and y in the directions specified by the first motif, $|O^{2}_{xy}|$ denotes the corresponding number for the second motif, $R_v$ is the role function value for one motif, and $R_w$ is the role function value for the other.
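Because the text states that the double-motif result equals the sum of the two single-motif results, it can be sketched as below. Each single-motif score is assumed to take the local naive Bayes form $|O_{xy}|\log s + \sum \log R$, which is an assumption since the formula is not reproduced here; `role_values_1` holds the $R_v$ values over $O^{1}_{xy}$ and `role_values_2` the $R_w$ values over $O^{2}_{xy}$.

```python
import math


def single_motif_score(role_values, s):
    """Assumed single-motif score: |O_xy| * log s + sum of log R over O_xy."""
    return len(role_values) * math.log(s) + sum(math.log(r) for r in role_values)


def double_motif_score(role_values_1, role_values_2, s):
    """Double-motif score as the direct sum of the two single-motif scores."""
    return single_motif_score(role_values_1, s) + single_motif_score(role_values_2, s)


# R_v over O1_xy and R_w over O2_xy for one hypothetical candidate edge, with s = 4.
print(double_motif_score([2.0], [0.5], s=4.0))  # 2 * log 4, about 2.7726
```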
Multi-motif prediction scores are calculated with machine learning as follows:
Using XGBoost, the $r'_{xy}$ lists already computed on the test set for each single motif and the $r'_{xy}$ lists computed on the training set are fed into a machine learning framework; the model learns from the training-set $r'_{xy}$ lists and produces a new test-set score list.
At the same time, the XGBoost model yields the correlations among different motifs, so different multi-motif combinations can be selected according to these correlations and a score list can be obtained for each combination.
To explore the influence of motif nodes in a directed network on link prediction, a naive Bayes model is used to calculate the role function value of each node, and this role function is added to a traditional link prediction algorithm. The traditional node role function was proposed for three-node motifs and does not cover four-node motifs. Diagram (a) of FIG. 2 shows a three-node motif, in which the influence of node C, the only node besides the predicted edge AB, on the formation of the motif must be considered. The method is extended here to calculate the role function of four-node motifs. As shown in diagram (b) of FIG. 2, a four-node motif contains two nodes, C and D, besides the predicted edge AB, so there are three possible ways to define its role function: considering only the role of node C; considering only the role of node D; or treating nodes C and D together with the edge CD between them as a whole. The first two options consider only part of the structure outside the predicted edge, whereas the third captures the whole of it. The third method is therefore used in the four-node role function calculation, i.e., the influence of the structure inside the frame of the four-node motif in diagram (b) on the formation of the motif is considered.
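The third option, treating C, D and the edge CD as one unit, can be sketched as below. The motif shape is a hypothetical choice (a path x→c→d→y closed by the predicted edge x→y), and the role of the edge w = (c, d) is assumed to be the smoothed ratio of closed to unclosed (u, v) pairs with u→c and d→v; neither the shape nor the formula is given in the text, and the function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Out- and in-neighbor sets of a directed edge list."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_nb[u].add(v)
        in_nb[v].add(u)
    return out_nb, in_nb


def four_node_overlap(x, y, out_nb, in_nb):
    """Edges w = (c, d) forming the assumed path x -> c -> d -> y with (x, y)."""
    return {(c, d)
            for c in out_nb.get(x, ())
            for d in out_nb.get(c, ())
            if d in in_nb.get(y, ()) and len({x, y, c, d}) == 4}


def edge_role(c, d, out_nb, in_nb):
    """Assumed role of the unit (C, D, CD): smoothed closed/unclosed ratio.

    A pair (u, v) with u -> c and d -> v is closed when u -> v exists.
    """
    closed = unclosed = 0
    for u in in_nb.get(c, ()):
        for v in out_nb.get(d, ()):
            if len({u, v, c, d}) < 4:
                continue  # the four motif nodes must be distinct
            if v in out_nb.get(u, ()):
                closed += 1
            else:
                unclosed += 1
    return (closed + 1) / (unclosed + 1)


out_nb, in_nb = build_adjacency([(0, 1), (1, 2), (2, 3), (4, 1), (2, 5), (4, 5)])
print(four_node_overlap(0, 3, out_nb, in_nb))  # {(1, 2)}
print(edge_role(1, 2, out_nb, in_nb))          # 0.5
```

Here w is an edge rather than a node, matching the text's statement that w ranges over edges for a four-node motif.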
AUC stands for the area under the receiver operating characteristic (ROC) curve.
AUC measures the accuracy of link prediction as a whole. It is the probability that the score of an edge randomly chosen from the positive test sample is higher than the score of an edge randomly chosen from the negative test sample. In each comparison, one edge is chosen at random from $E_P$ and one from the negative test sample: if the score of the $E_P$ edge is greater, 1 point is added; if the two scores are equal, 0.5 points are added; otherwise nothing is added. This comparison is repeated independently n times. If the $E_P$ edge scores higher Y times and the scores are equal Z times, the AUC is defined as:
$\mathrm{AUC} = \frac{Y + 0.5Z}{n}$
An AUC of 0.5 means the scores are no better than randomly generated ones, and an AUC of 1 means the algorithm ranks every positive test edge above every negative one. The larger the AUC, the more accurate the prediction; the extent to which the AUC exceeds 0.5 reflects how much better the algorithm performs than a random algorithm.
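The sampling definition of AUC can be sketched directly, counting Y (the positive edge scores higher) and Z (the scores are equal) over n independent random comparisons. The default n, the seed, and the function name are arbitrary assumptions.

```python
import random


def auc_sampled(pos_scores, neg_scores, n=10000, seed=0):
    """AUC = (Y + 0.5 * Z) / n from n independent random comparisons."""
    rng = random.Random(seed)
    y = z = 0  # Y: positive edge scores higher, Z: scores are equal
    for _ in range(n):
        p = rng.choice(pos_scores)
        q = rng.choice(neg_scores)
        if p > q:
            y += 1
        elif p == q:
            z += 1
    return (y + 0.5 * z) / n


print(auc_sampled([5, 6], [1, 2]))  # 1.0
```

As n grows, this sampled estimate approaches the value obtained by comparing every positive score with every negative score.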
The above description is only a preferred embodiment of the present invention and is not intended to limit it. Those skilled in the art can make many modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (3)

1. A multi-order motif directed network link prediction method based on naive Bayes, characterized by comprising the following steps:
S1, counting the whole structures in which two nodes form a motif together with a test edge;
S2, calculating the number $N_{\Delta w}$ of closed motif structures that can be formed by all edges and other nodes in the network;
S3, calculating the number $N_{\Lambda w}$ of unclosed motif structures that can be formed by all edges and other nodes in the network;
S4, calculating the role function value $R_w$ of each edge for the motif to be formed;
S5, calculating the total role value of each edge, where w denotes any node or edge that can be combined with the edge (x, y) to form the given motif shape, and $O_{xy}$ is the amount of overlap between the neighbors of node x in the specified direction and the neighbors of node y in the specified direction.
2. The naive Bayes-based multi-order motif directed network link prediction method of claim 1, wherein, for a directed unweighted network, Γ(x) denotes the neighbors of node x in the specified direction and Γ(y) denotes the neighbors of node y in the specified direction;
$O_{xy} = |\Gamma(x) \cap \Gamma(y)|$
that is, the amount of overlap between the neighbors of node x in the specified direction and the neighbors of node y in the specified direction.
3. The naive Bayes-based multi-order motif directed network link prediction method of claim 1, wherein the link prediction score of each test edge is calculated as follows:
A. Counting, for each edge, the number $|O_{xy}|$ of motifs it can form;
B. Calculating the number $M_F$ of possible edges in the network;
C. Counting the number M of edges that actually exist in the network;
D. Calculating the probability that a node pair in the network is connected;
E. Calculating the probability that a node pair in the network is not connected;
F. Calculating the link prediction score of each edge.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910764249.2A CN110569885A (en) 2019-08-19 2019-08-19 multi-order motif directed network link prediction method based on naive Bayes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910764249.2A CN110569885A (en) 2019-08-19 2019-08-19 multi-order motif directed network link prediction method based on naive Bayes

Publications (1)

Publication Number Publication Date
CN110569885A (en) 2019-12-13

Family

ID=68774012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910764249.2A Pending CN110569885A (en) 2019-08-19 2019-08-19 multi-order motif directed network link prediction method based on naive Bayes

Country Status (1)

Country Link
CN (1) CN110569885A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111669288A (en) * 2020-05-25 2020-09-15 中国人民解放军战略支援部队信息工程大学 Directional network link prediction method and device based on directional heterogeneous neighbor

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111669288A (en) * 2020-05-25 2020-09-15 中国人民解放军战略支援部队信息工程大学 Directional network link prediction method and device based on directional heterogeneous neighbor
CN111669288B (en) * 2020-05-25 2023-02-14 中国人民解放军战略支援部队信息工程大学 Directional network link prediction method and device based on directional heterogeneous neighbor

Similar Documents

Publication Publication Date Title
CN111309824B (en) Entity relationship graph display method and system
CN103810288B (en) Method for carrying out community detection on heterogeneous social network on basis of clustering algorithm
Hoffman et al. A note on using the adjusted Rand index for link prediction in networks
CN110532436A (en) Across social network user personal identification method based on community structure
Rintyarna et al. Mapping acceptance of Indonesian organic food consumption under Covid-19 pandemic using Sentiment Analysis of Twitter dataset
CN107729993A (en) Utilize training sample and the 3D convolutional neural networks construction methods of compromise measurement
Cui et al. Learning global pairwise interactions with Bayesian neural networks
CN106327345A (en) Social group discovering method based on multi-network modularity
CN112381179A (en) Heterogeneous graph classification method based on double-layer attention mechanism
Coskun et al. Link prediction in large networks by comparing the global view of nodes in the network
CN110705045A (en) Link prediction method for constructing weighting network by using network topological characteristics
CN110704694A (en) Organization hierarchy dividing method based on network representation learning and application thereof
Amelio et al. An evolutionary and local refinement approach for community detection in signed networks
CN114416824A (en) Method for mining key nodes of complex network based on motif information
Zhang et al. Normalized modularity optimization method for community identification with degree adjustment
CN107679539A (en) A kind of single convolutional neural networks local message wild based on local sensing and global information integration method
CN104035978B (en) Combo discovering method and system
CN109948242A (en) Network representation learning method based on feature Hash
CN106251230A (en) A kind of community discovery method propagated based on election label
CN113240017B (en) Multispectral and panchromatic image classification method based on attention mechanism
Ruderman et al. Uncovering surprising behaviors in reinforcement learning via worst-case analysis
CN114254738A (en) Double-layer evolvable dynamic graph convolution neural network model construction method and application
CN110569885A (en) multi-order motif directed network link prediction method based on naive Bayes
CN110084423A (en) A kind of link prediction method based on local similarity
CN112685272A (en) Interpretable user behavior abnormity detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination