CN104579790B

CN104579790B - A method for determining the number of edges to restore in link prediction

Info

Publication number: CN104579790B
Application number: CN201510037313.9A
Authority: CN
Inventors: 张维明; 周游; 修保新; 程光权; 谢福利; 朱先强
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2015-01-26
Filing date: 2015-01-26
Publication date: 2016-01-20
Anticipated expiration: 2035-01-26
Also published as: CN104579790A

Abstract

The invention belongs to the field of network reconstruction and link prediction, and specifically relates to a method for determining the number of edges to restore in link prediction. The invention combines link prediction with edge clustering and uses the partition density index from edge clustering to determine the number of edges to restore during link prediction. A link prediction technique is used to calculate the connection probability of every node pair that is not joined by an edge, and the pairs are sorted by connection probability. Each candidate edge is then tentatively added to the network in sorted order. After each tentative addition, edge clustering is re-run on the network and the partition density index is computed during clustering. Whether the edge is kept is decided by the change in the network's maximum partition density before and after adding it. The invention solves the problem of determining how many edges to restore after link prediction. Its principle is simple, its procedure is clear, it is easy to implement, and it can provide decision support and improve decision-making efficiency for decision makers.

Description

A method for determining the number of edges to restore in link prediction

Technical field

The invention belongs to the field of network reconstruction and link prediction, and specifically relates to a method for determining the number of edges to restore in link prediction.

Background art

The problem link prediction addresses is the restoration and prediction of missing edges in complex networks. For reasons of computational speed and complexity, practical applications use similarity-based link prediction. Existing link prediction techniques cannot determine how many edges to restore. With existing methods, a decision maker can only calculate the connection probabilities of unconnected node pairs (pairs not directly joined by an edge in the observed network) and rank them, without knowing how many edges to restore. Yet this is a problem that must be solved in network reconstruction.

Several existing similarity-based link prediction indices are introduced below.

For a node v_x in the node set of network G, denote its set of neighbour nodes by Γ(x) and its number of neighbours (its degree) by k_x. Similarity-based link prediction holds that the greater the similarity s_xy between two nodes v_x and v_y, the more likely an edge is to form between them.

(1) Common neighbours index

Common-neighbour similarity is also known as structural equivalence: if two nodes have many common neighbours, the two nodes are similar. Structural equivalence thus concerns whether two nodes are in the same environment. The more common neighbours two nodes share, the more they tend to be connected by an edge.

The common neighbours index is defined mathematically as follows: the similarity of two nodes v_x and v_y is the number of their common neighbours, that is:

s_{xy} = | Γ (x) \cap Γ (y) |

Considering, from different perspectives, the effect of the degrees of the two end nodes on their similarity yields the following six similarity indices.
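As a minimal sketch of the common neighbours index, assume the network is stored as a dictionary mapping each node to the set of its neighbours (this representation, the function name `common_neighbours` and the small example graph are illustrative assumptions, not part of the patent):

```python
def common_neighbours(adj, x, y):
    """Return |Γ(x) ∩ Γ(y)| for an undirected graph given as {node: set_of_neighbours}."""
    return len(adj[x] & adj[y])


# Hypothetical 4-node graph with edges 1-2, 1-3, 2-3 and 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

# Nodes 1 and 4 are not adjacent and share exactly one neighbour (node 3).
score_1_4 = common_neighbours(adj, 1, 4)
```

Set intersection keeps the sketch short; for large sparse networks an adjacency-set representation like this is usually sufficient for local indices.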

1) Salton index ^[1]

The Salton index, also known as cosine similarity, is the ratio of the number of common neighbours to the geometric mean of the two endpoint degrees, and is defined as

s_{xy} = \frac{| Γ (x) \cap Γ (y) |}{\sqrt{k_{x} k_{y}}}

2) Jaccard index ^[2]

This index was proposed more than 100 years ago. It is the ratio of the number of common neighbours of two nodes to the total number of their neighbours, and is defined as

s_{xy} = \frac{| Γ (x) \cap Γ (y) |}{| Γ (x) \cup Γ (y) |}

3) Sorensen index ^[3]

This index is commonly used in the study of ecological data. It is the ratio of the number of common neighbours to the arithmetic mean of the two endpoint degrees, and is defined as

s_{xy} = \frac{2 \times | Γ (x) \cap Γ (y) |}{k_{x} + k_{y}}

4) Hub promoted index (HPI) ^[4]

This index holds that nodes of larger degree show more pronounced centrality in the network and form edges with other nodes more easily. It is defined as

s_{xy} = \frac{| Γ (x) \cap Γ (y) |}{\min \{k_{x}, k_{y}\}}

Because the denominator is determined by the node of smaller degree, under this definition high similarity arises more readily between a high-degree node and other nodes.

5) Hub depressed index (HDI)

This index is the opposite of HPI: it holds that high-degree nodes in the network are disadvantaged in connecting with other nodes. Accordingly, the denominator of the similarity formula takes the maximum of the two node degrees, and the index is defined as

s_{xy} = \frac{| Γ (x) \cap Γ (y) |}{\max \{k_{x}, k_{y}\}}

6) LHN-I index ^[5]

This index is proposed by Leicht, Holme and Newman, is defined as

s_{xy} = \frac{| Γ (x) \cap Γ (y) |}{k_{x} k_{y}}

where the denominator k_x k_y is proportional to the expected number of common neighbours of nodes v_x and v_y, i.e. E(|Γ(x) ∩ Γ(y)|).
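The six degree-normalised indices above differ only in the denominator, which the following sketch makes explicit. It assumes an adjacency-set dictionary and that both nodes have degree at least 1; the function name `degree_normalised_indices` and the example graph are hypothetical.

```python
import math


def degree_normalised_indices(adj, x, y):
    """Return the six indices for nodes x and y of a graph {node: set_of_neighbours}."""
    cn = len(adj[x] & adj[y])            # |Γ(x) ∩ Γ(y)|
    kx, ky = len(adj[x]), len(adj[y])    # degrees k_x and k_y (assumed >= 1)
    return {
        "salton": cn / math.sqrt(kx * ky),
        "jaccard": cn / len(adj[x] | adj[y]),
        "sorensen": 2 * cn / (kx + ky),
        "hpi": cn / min(kx, ky),
        "hdi": cn / max(kx, ky),
        "lhn1": cn / (kx * ky),
    }


# Hypothetical 4-node graph with edges 1-2, 1-3, 2-3 and 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
scores = degree_normalised_indices(adj, 1, 4)
```

For the pair (1, 4) the common neighbour count is 1 with degrees 2 and 1, so HPI divides by 1 and HDI by 2, which illustrates how the two indices treat the higher-degree endpoint differently.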

(2) AA index ^[6]

The Adamic-Adar index (AA index) takes into account the degrees of the common neighbours of two nodes. Its idea is that common neighbours of small degree contribute more to similarity. This is easy to understand in social networks: on a microblog, for example, two unrelated people may both follow the same celebrity (a high-degree node), whereas two people who both follow the same ordinary user (a low-degree node) are likely to be friends. The AA index assigns a weight to each common neighbour and is defined as

s_{xy} = \sum_{z \in Γ (x) \cap Γ (y)} \frac{1}{\log k_{z}}

(3) Resource allocation (RA) index ^[7]

Consider any two non-adjacent nodes v_x and v_y in network G. Since the network is connected, v_x can be regarded as passing some information to v_y, with their common neighbours acting as the transmission medium. Suppose each common neighbour holds one unit of resource and distributes it equally among its neighbours. The amount of resource that can be passed from v_x to v_y is then defined as the similarity of v_x and v_y, that is:

s_{xy} = \sum_{z \in Γ (x) \cap Γ (y)} \frac{1}{k_{z}}
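The two neighbour-weighted indices can be sketched as below, again assuming an adjacency-set dictionary; the function names and the star-shaped example are illustrative. A common neighbour of two distinct nodes always has degree at least 2, so the logarithm in the AA index is strictly positive.

```python
import math


def adamic_adar(adj, x, y):
    """AA index: each common neighbour z contributes 1 / log(k_z)."""
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])


def resource_allocation(adj, x, y):
    """RA index: each common neighbour z contributes 1 / k_z."""
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])


# Hypothetical star: hub 0 joined to leaves 1, 2 and 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
aa_12 = adamic_adar(star, 1, 2)          # one common neighbour of degree 3
ra_12 = resource_allocation(star, 1, 2)
```

Both indices penalise high-degree common neighbours; RA does so more strongly because it divides by the degree itself rather than by its logarithm.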

[1] Salton G, McGill M J. Introduction to modern information retrieval [J]. 1983.

[2] Jaccard P. Etude comparative de la distribution florale dans une portion des Alpes et du Jura [M]. Impr. Corbaz, 1901.

[3] Sorensen T. A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons [J]. Biologiske Skrifter, 1948, 5(4): 1-34.

[4] Ravasz E, Somera A L, Mongru D A, et al. Hierarchical organization of modularity in metabolic networks [J]. Science, 2002, 297(5586): 1551-1555.

[5] Leicht E A, Holme P, Newman M E J. Vertex similarity in networks [J]. Physical Review E, 2006, 73(2): 026120.

[6] Adamic L A, Adar E. Friends and neighbors on the web [J]. Social Networks, 2003, 25(3): 211-230.

[7] Zhou T, Lü L, Zhang Y C. Predicting missing links via local information [J]. The European Physical Journal B, 2009, 71(4): 623-630.

The present invention aims to solve the problem of determining the number of edges to restore in link prediction. Existing link prediction methods only give the connection probabilities of unconnected node pairs in the network and provide no way to decide how many edges to restore. A method that can guide this choice is therefore of great value to decision makers.

Summary of the invention

To solve the problem of determining the number of edges to restore in the link restoration process described above, the invention provides a method for determining that number. The technical scheme is as follows:

A method for determining the number of edges to restore in link prediction, comprising the following steps:

(1) Initialization: set the number of restored edges n to 0 and select a link prediction similarity index; calculate the connection probability of every node pair in the network that is not joined by an edge, and sort the node pairs by connection probability in descending order to form a sequence set;

(2) Determine whether the sequence set is empty. If it is empty, go to step (5). Otherwise, select from the sequence set the node pair with the highest current connection probability and determine whether several node pairs share that highest connection probability. If so, process these tied node pairs as follows: for each of them in turn, tentatively add an edge between the pair in the current network and calculate the maximum partition density after the addition; sort the tied pairs in descending order of this tentative maximum partition density to form a local sequence set; then go to step (3). If not, go directly to step (4);

(3) Select node pairs from the local sequence set in order. For each, calculate and compare the maximum partition density of the network before and after adding an edge between the pair. If the maximum partition density after adding the edge is greater than or equal to that before, add the edge to the network and increase n by 1; if the corresponding node pair is the last one in the local sequence set, delete all node pairs of the local sequence set from the sequence set, update the sequence set, and go to step (2). If the maximum partition density after adding the edge is less than that before, do not add the edge, and go to step (5);

(4) Calculate and compare the maximum partition density of the network before and after adding an edge between the node pair. If the maximum partition density after adding the edge is greater than or equal to that before, add the edge to the network, increase n by 1, delete the corresponding node pair from the sequence set, update the sequence set, and go to step (2). If the maximum partition density after adding the edge is less than that before, do not add the edge, and go to step (5);

(5) End the loop and output the value of n.
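Steps (1) to (5) can be sketched as a single loop. This is an illustrative outline under stated assumptions, not the patented implementation: the network is an adjacency-set dictionary, `score` is any similarity index with the hypothetical signature `score(adj, x, y)`, and `max_density` is a caller-supplied function returning the maximum partition density of a network. Ties are detected by exact equality of scores, which is a simplification.

```python
from itertools import combinations, groupby


def with_edge(adj, x, y):
    """Return a copy of the graph with edge {x, y} added."""
    new = {v: set(nb) for v, nb in adj.items()}
    new[x].add(y)
    new[y].add(x)
    return new


def count_restored_edges(adj, score, max_density):
    """Return (n, added_edges) following steps (1) to (5)."""
    adj = {v: set(nb) for v, nb in adj.items()}   # do not mutate the input
    n, added = 0, []
    # Step (1): score every unconnected pair once and sort in descending order.
    ranked = sorted(
        ((score(adj, x, y), (x, y))
         for x, y in combinations(sorted(adj), 2) if y not in adj[x]),
        key=lambda item: -item[0],
    )
    # Step (2): walk through runs of equal connection probability.
    for _, run in groupby(ranked, key=lambda item: item[0]):
        pairs = [pair for _, pair in run]
        if len(pairs) > 1:
            # Tied pairs: local ordering by tentative maximum partition density.
            pairs.sort(key=lambda p: -max_density(with_edge(adj, *p)))
        # Steps (3) and (4): keep an edge only if the density does not drop.
        for x, y in pairs:
            trial = with_edge(adj, x, y)
            if max_density(trial) >= max_density(adj):
                adj = trial
                n += 1
                added.append((x, y))
            else:
                return n, added                   # Step (5): stop at first drop
    return n, added                               # sequence set exhausted
```

Passing the density function as a parameter keeps the stopping rule separate from the edge clustering, so the loop can be exercised with a simple stand-in density before a full clustering routine is plugged in.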

Technical effect of the invention: the invention solves the problem of determining how many edges to restore after link prediction. Its principle is simple, its procedure is clear, it is easy to implement, and it can provide decision support and improve decision-making efficiency for decision makers.

Brief description of the drawings

Fig. 1 is a flow chart of the method of the invention for determining the number of edges to restore;

Fig. 2 is an example diagram of edge similarity in the invention;

Fig. 3 is a topology diagram of the karate club network.

Embodiment

The invention is further described below with reference to the drawings and specific embodiments.

As shown in Fig. 1, the invention combines link prediction with edge clustering and uses the partition density index from edge clustering to determine the number of edges to restore during link prediction. Before the method is applied, a link prediction technique is used to calculate the connection probability of every node pair that is not joined by an edge, and the pairs are sorted by connection probability. Each candidate edge is then tentatively added to the network in sorted order. After each tentative addition, edge clustering is re-run on the network and the partition density index is computed during clustering. Whether the edge is kept is decided by the change in the network's maximum partition density before and after adding it.

The method for calculating the maximum partition density of a network is introduced below.

First, the edge similarity formula is determined by the index used for link prediction. As shown in Fig. 2, edge similarity is defined for adjacent edges (two edges that share a vertex). In this embodiment, link prediction uses the Jaccard index, whose similarity formula is:

s_{xy} = \frac{| Γ (x) \cap Γ (y) |}{| Γ (x) \cup Γ (y) |}

so in the edge clustering process the edge similarity is defined as

S(e_{ik}, e_{jk}) = \frac{| Γ (i) \cap Γ (j) |}{| Γ (i) \cup Γ (j) |}

where e_ik and e_jk are two edges with a common node, and Γ(i) and Γ(j) are the neighbour sets of nodes v_i and v_j.

Once the edge similarity formula is determined, the edge similarity of all pairs of adjacent edges in the network is calculated. Before edge clustering, each edge forms a class of its own. Each clustering step merges into one class the two most similar edges that are not yet in the same class, until all edges in the network belong to a single class.

After each clustering step, first calculate the internal connection density D_C of each class:

D_{C} = \frac{m_{C} - (n_{C} - 1)}{n_{C} (n_{C} - 1) / 2 - (n_{C} - 1)}

where C denotes a class obtained by clustering, m_C is the number of edges inside the class, and n_C is the number of nodes the class contains (see Ahn Y-Y, Bagrow J P, Lehmann S. Link communities reveal multiscale complexity in networks. Nature, 2010, 466: 761-764). After each clustering step, the partition density D of the whole network is the average of the internal connection densities of all classes, weighted by the number of edges in each class, and is calculated by:

D = \frac{2}{M} \sum_{C} m_{C} \frac{m_{C} - (n_{C} - 1)}{(n_{C} - 2) (n_{C} - 1)}

where M is the total number of edges in the network. Starting from the state in which every edge is a class of its own and continuing until all edges are merged into one class, the partition density is calculated once after each clustering step. The maximum of all these partition densities is the maximum partition density of the network.
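The clustering and density calculation described above can be sketched as follows. The code assumes a simple undirected graph stored as an adjacency-set dictionary and Jaccard-based edge similarity. Two conventions are assumptions of this sketch: a class with only two nodes contributes 0 to D, and tied similarities are merged one pair at a time in a fixed order. The function names are hypothetical.

```python
from itertools import combinations


def partition_density(classes, total_edges):
    """D = (2/M) * sum over classes of m_C (m_C - (n_C - 1)) / ((n_C - 2)(n_C - 1))."""
    total = 0.0
    for edges in classes:
        m = len(edges)
        n = len({v for e in edges for v in e})
        if n > 2:                       # assumed convention: n_C = 2 contributes 0
            total += m * (m - (n - 1)) / ((n - 2) * (n - 1))
    return 2.0 * total / total_edges


def max_partition_density(adj):
    """Merge the most similar adjacent edges step by step and track the largest D."""
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    M = len(edges)
    if M == 0:
        return 0.0
    scored = []
    for a, b in combinations(range(M), 2):
        shared = set(edges[a]) & set(edges[b])
        if len(shared) == 1:            # similarity exists only for adjacent edges
            (i,) = set(edges[a]) - shared
            (j,) = set(edges[b]) - shared
            scored.append((len(adj[i] & adj[j]) / len(adj[i] | adj[j]), a, b))
    scored.sort(key=lambda t: -t[0])    # most similar pairs first
    label = list(range(M))              # class label of each edge
    classes = {k: {edges[k]} for k in range(M)}
    best = partition_density(classes.values(), M)
    for _, a, b in scored:
        la, lb = label[a], label[b]
        if la == lb:
            continue                    # already in the same class
        classes[la] |= classes.pop(lb)
        label = [la if lab == lb else lab for lab in label]
        best = max(best, partition_density(classes.values(), M))
    return best


# Hypothetical triangle: one class of 3 edges on 3 nodes gives D = 1.
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
d_triangle = max_partition_density(triangle)
```

A tree-like class has m_C = n_C - 1 and therefore density 0, while a clique reaches density 1, so the maximum over the merge sequence picks out the clustering level with the densest edge classes.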

Take the karate club network as an example. Its topology is shown in Fig. 3, where the numbers 1-33 in the figure denote nodes. Perform step 1: initialize n=0, use the Jaccard index for link prediction, calculate the connection probability of each node pair, and sort in descending order. The resulting sequence set is as follows:

Node pair	Connection probability
{7,12}	0.571429
{4,5}	0.5
{6,9}	0.5
{13,15}	0.5
{24,25}	0.5
{24,26}	0.5
{25,26}	0.5
{24,27}	0.5
{25,27}	0.5
{26,27}	0.5
{24,28}	0.5
{25,28}	0.5
{26,28}	0.5
{27,28}	0.5
{12,14}	0.428571
{13,14}	0.4
…	…
{3,23}	0.041667

Perform step 2: the sequence set is not empty. Select the node pair with the highest connection probability, {7,12}. No other node pair has the same connection probability as {7,12}, so go directly to step 4.

Perform step 4: the maximum partition density of the network after adding edge {7,12} is greater than that before adding it, so edge {7,12} is added and n is increased by 1 (now n=1). Node pair {7,12} is deleted from the sequence set, which is now as follows:
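The tie check in step 2 can be illustrated on the rows listed in the sequence set above. The sketch below uses only the rows that appear explicitly (the elided rows are left out), and the helper name `tie_groups` is hypothetical.

```python
from itertools import groupby

# Rows listed in the sequence set, in order; rows hidden behind "…" are not included.
ranked = [
    ((7, 12), 0.571429),
    ((4, 5), 0.5), ((6, 9), 0.5), ((13, 15), 0.5),
    ((24, 25), 0.5), ((24, 26), 0.5), ((25, 26), 0.5),
    ((24, 27), 0.5), ((25, 27), 0.5), ((26, 27), 0.5),
    ((24, 28), 0.5), ((25, 28), 0.5), ((26, 28), 0.5), ((27, 28), 0.5),
    ((12, 14), 0.428571), ((13, 14), 0.4),
]


def tie_groups(rows):
    """Split a descending ranking into runs of equal connection probability."""
    return [[pair for pair, _ in run] for _, run in groupby(rows, key=lambda r: r[1])]


groups = tie_groups(ranked)
top = groups[0]
# A single top pair goes straight to step 4; several tied pairs go to step 3.
next_step = 3 if len(top) > 1 else 4
```

Here the first run contains only {7,12}, so the procedure moves to step 4, while the second run holds the thirteen pairs with probability 0.5 that later require a local ordering.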

Node pair	Connection probability
{4,5}	0.5
{6,9}	0.5
{13,15}	0.5
{24,25}	0.5
{24,26}	0.5
{25,26}	0.5
{24,27}	0.5
{25,27}	0.5
{26,27}	0.5
{24,28}	0.5
{25,28}	0.5
{26,28}	0.5
{27,28}	0.5
{12,14}	0.428571
{13,14}	0.4
…	…
{3,23}	0.041667

Return to step 2.

Perform step 2: the sequence set is not empty. Select the node pair with the highest connection probability, {4,5}. Node pairs {6,9}, {13,15}, {24,25}, {24,26}, {25,26}, {24,27}, {25,27}, {26,27}, {24,28}, {25,28}, {26,28} and {27,28} have the same connection probability as {4,5}. Calculate the maximum partition density of the network after tentatively adding each of these edges in turn, and sort in descending order to form the local sequence set, as follows:

Go to step 3.

Perform step 3: tentatively add the edges in the local sequence set in order. The maximum partition density after adding edge {4,5} is greater than that before, so edge {4,5} is added, n is increased by 1 (now n=2), and node pair {4,5} is deleted from the local sequence set. The local sequence set is now:

The maximum partition density after adding edge {6,9} is greater than that before, so edge {6,9} is added, n is increased by 1 (now n=3), and node pair {6,9} is deleted from the local sequence set. The local sequence set is now:

The maximum partition density after adding edge {13,15} is greater than that before, so edge {13,15} is added, n is increased by 1 (now n=4), and node pair {13,15} is deleted from the local sequence set. The local sequence set is now:

The maximum partition density after adding edge {24,25} is less than that before, so edge {24,25} is not added. Go to step 5.

Perform step 5: delete all sequence sets, exit the loop, and terminate. The result is n=4, and the added edges are {7,12}, {4,5}, {6,9} and {13,15}.

The edge similarity index depends on the similarity index used in the link prediction stage. In practical applications, when different similarity indices are used for link prediction, correspondingly different edge similarity indices are used to determine the number of edges to restore. This specification gives the edge similarity formula for the case where link prediction uses the Jaccard index, but the method is not limited to it. When link prediction uses other similarity indices, the corresponding edge similarity formulas are as follows:

In the table, e_xk and e_yk are two edges with a common node, and Γ(x) and Γ(y) are the neighbour sets of nodes v_x and v_y.

The content described in the embodiments of this specification merely enumerates forms in which the inventive concept may be realized. The scope of protection of the invention should not be regarded as limited to the specific forms set out in the embodiments; it also extends to equivalent technical means that those skilled in the art could conceive on the basis of the inventive concept.

Claims

1. A method for determining the number of edges to restore in link prediction, characterized by comprising the following steps:

(5) End the loop and output the value of n.

2. The method for determining the number of edges to restore in link prediction according to claim 1, characterized in that the link prediction similarity index is any one of the Salton index, the Jaccard index, the Sorensen index, HPI, HDI, the LHN-I index and the AA index, where HPI denotes the hub promoted index, HDI denotes the hub depressed index, LHN-I denotes the index proposed by Leicht, Holme and Newman, and AA denotes the Adamic-Adar index.