WO2019014894A1 - Network link prediction method and device - Google Patents

Network link prediction method and device

Info

Publication number
WO2019014894A1
Authority
WO
WIPO (PCT)
Prior art keywords
existing
edges
edge
similarity
similarity value
Prior art date
Application number
PCT/CN2017/093676
Other languages
English (en)
French (fr)
Inventor
周明洋
熊文漫
廖好
沈婧
吴向阳
陆克中
毛睿
Original Assignee
深圳大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳大学
Priority to PCT/CN2017/093676
Publication of WO2019014894A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12: Discovery or management of network topologies
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/02: Topology update or discovery

Definitions

  • The invention belongs to the field of data mining, and in particular relates to a network link prediction method and device.
  • Link prediction in a network refers to predicting, from the known network nodes and network topology, the likelihood of a link between two nodes that are not yet connected.
  • Link prediction includes the prediction of existing but unknown links and the prediction of future links.
  • A similarity algorithm is generally used to predict network links.
  • Similarity-based link prediction algorithms include the common neighbor (CN) algorithm, the Jaccard coefficient algorithm, the resource allocation (RA) algorithm, the local path (LP) index algorithm, and the structural perturbation method (SPM).
  • These similarity algorithms calculate the similarity between two nodes to obtain a similarity value, and an edge with a high similarity value is taken as a predicted edge.
  • However, an edge formed by two nodes with low similarity can be a more stable link than an edge with high similarity.
  • The existing similarity algorithms can only calculate the similarity between two nodes. If the calculated similarity is low, they cannot determine whether the edge formed by the two low-similarity nodes is a predicted edge.
  • The invention provides a network link prediction method and device, which aim to solve the problem that, in existing link prediction methods, the similarity algorithm cannot predict an edge formed by two nodes with low similarity.
  • a network link prediction method provided by the present invention includes: acquiring a network topology of a network
  • the edges for which similarity has been calculated are divided into multiple sets, and probability statistics are performed on the numbers of existing edges and non-existing edges in each set, to obtain the probability distribution of the existing edges and the probability distribution of the non-existing edges in each set;
  • from the adjusted similarity values, a similarity value greater than or equal to the preset value is selected, and the existing edge corresponding to the selected similarity value is used as the predicted edge in the network topology.
  • a network link prediction apparatus provided by the present invention includes:
  • a setting module configured to divide an edge formed between nodes in the network topology into an existing edge and a non-existing edge, and select a plurality of edges from the existing edge to form a training set
  • a calculation processing module configured to calculate the similarity of the two nodes of each existing edge of the training set to obtain a similarity value corresponding to each existing edge in the training set, and to calculate the similarity of the two nodes of each non-existing edge to obtain a similarity value corresponding to each non-existing edge;
  • the edges for which similarity has been calculated are divided into a plurality of sets, and probability statistics are performed on the numbers of existing edges and non-existing edges in each set, to obtain the probability distribution of the existing edges and the probability distribution of the non-existing edges in each set;
  • the prediction module is configured to select, in the adjusted similarity value, a similarity value that is greater than or equal to the preset value, and use the existing edge corresponding to the selected similarity value as the predicted edge in the network topology.
  • The network link prediction method and apparatus provided by the present invention re-adjust the similarity value corresponding to each edge of the training set by using the calculated probability distribution of the existing edges and the probability distribution of the non-existing edges, thereby taking into account the similarity of the non-existing edges. Possible edges are then predicted from the adjusted similarity values, so that not only edges between two nodes with high similarity but also edges between two nodes with low similarity can be predicted, which in turn improves the accuracy of link prediction.
  • FIG. 1 is a schematic flowchart of an implementation process of a network link prediction method according to a first embodiment of the present invention
  • FIG. 2 is a schematic diagram of a star network topology of a network
  • FIG. 3 is a schematic flowchart of an implementation process of a network link prediction method according to a second embodiment of the present invention.
  • FIG. 4 is a schematic diagram showing a comparison result of an accuracy of a predicted edge of a conventional link prediction method and a network link prediction method according to an embodiment of the present invention
  • FIG. 5 is a schematic structural diagram of a network link prediction apparatus according to third and fourth embodiments of the present invention.
  • FIG. 1 is a schematic flowchart of a network link prediction method according to a first embodiment of the present invention, which can be applied to a computer.
  • the network link prediction method shown in FIG. 1 mainly includes the following steps:
  • the network can be an existing network such as a social network, a protein network, or a neural network.
  • the network topology is the topology of the network, that is, the physical arrangement of the nodes formed by the network's computers or devices and of the transmission media (lines) that connect them.
  • the nodes in the network topology are divided into two categories: one is a transit node that converts and exchanges information, such as switches, hubs, and terminal controllers; the other is an access node, such as a computer host or terminal.
  • Network topologies come in a variety of shapes, such as bus, star, ring, tree, and mesh. As shown in FIG. 2, FIG. 2 is a schematic diagram of a star network topology of a network. Each hollow figure in Figure 2 is a node, and the connection between the two nodes is a link, which may also be called the edge of the network topology.
  • it is predefined which edges in the network topology are existing edges and which are non-existing edges, and the edges in the network topology are then divided according to this predefined division.
  • the link formed by the two nodes having the connection relationship is called an existing edge; otherwise, it is a non-existing edge.
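
The division into existing and non-existing edges can be illustrated with a minimal sketch. It assumes an undirected network without self-loops, and the function name `split_edges` and the five-node star topology are hypothetical, chosen only to mirror the star shape mentioned for FIG. 2.

```python
from itertools import combinations


def split_edges(nodes, links):
    """Partition all unordered node pairs into existing and non-existing edges.

    Assumption: the network is undirected and has no self-loops.
    """
    existing = {frozenset(link) for link in links}
    all_pairs = {frozenset(pair) for pair in combinations(nodes, 2)}
    non_existing = all_pairs - existing
    return existing, non_existing


# Hypothetical star topology: hub 0 connected to leaves 1..4.
nodes = [0, 1, 2, 3, 4]
links = [(0, 1), (0, 2), (0, 3), (0, 4)]
existing, non_existing = split_edges(nodes, links)
print(len(existing), len(non_existing))  # 4 existing edges, 6 non-existing edges
```

Every pair of leaves is a non-existing edge here, which is the kind of candidate pair the similarity calculation is later applied to.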
  • the manner of selecting a plurality of edges from the existing edges to form a training set may be random selection, and is not limited.
  • the method for calculating the similarity between two nodes is not limited, and may be any one of the common neighbor algorithm, the Jaccard coefficient algorithm, the resource allocation algorithm, the local path index, and the structural perturbation method, or another similarity algorithm suitable for link prediction. The greater the similarity value, the higher the similarity between the two nodes.
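
Two of the similarity indices named above can be sketched as follows. The formulas are the standard textbook definitions of the Jaccard coefficient and the resource allocation index, not text taken from this document, and the adjacency dictionary `adj` is a hypothetical four-node example.

```python
def jaccard(adj, i, j):
    """Jaccard coefficient: shared neighbors divided by all neighbors of i or j."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0


def resource_allocation(adj, i, j):
    """Resource allocation index: sum of 1/degree over the common neighbors."""
    return sum(1.0 / len(adj[z]) for z in adj[i] & adj[j])


# Hypothetical adjacency sets of a small undirected network.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
print(jaccard(adj, 1, 4))
print(resource_allocation(adj, 1, 4))
```

Either function could stand in for the similarity calculation, since the text does not restrict which index is used.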
  • S104: Divide the edges for which similarity has been calculated into multiple sets, and perform probability statistics on the numbers of existing edges and non-existing edges in each set, to obtain the probability distribution of the existing edges and the probability distribution of the non-existing edges in each set.
  • the division into sets is a random division, and the number of elements included in each set is greater than or equal to zero.
  • the edges of the training set are the existing edges in each set.
  • the similarity values corresponding to the existing edges included in each set are adjusted according to the probability distribution of the edges existing in each set and the probability distribution of the non-existing edges.
  • the adjusted similarity values corresponding to the edges existing in each set are the same.
  • the value of the preset value is not limited, and the preset value may be equal to the adjusted similarity value, or may not be equal to the adjusted similarity value.
  • when the preset value is equal to an adjusted similarity value, either a similarity value greater than the preset value or a similarity value equal to the preset value may be selected;
  • when the preset value is not equal to any adjusted similarity value, a similarity value greater than the preset value may be selected.
  • for example, suppose the adjusted similarity values are 1 and 2.
  • the edges corresponding to similarity value 1 are edge a and edge b, and the edge corresponding to similarity value 2 is edge c.
  • if the preset value is 1, then edge a, edge b, and edge c, whose similarity values are greater than or equal to 1, are used as predicted edges, or only edge c, whose similarity value is greater than 1, is used as the predicted edge.
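
The example with edges a, b, and c can be reproduced with a short sketch. The function name `select_predicted` and its `inclusive` flag are hypothetical; they only illustrate the choice between "greater than or equal to" and "greater than" the preset value.

```python
def select_predicted(adjusted, preset, inclusive=True):
    """Return the edges whose adjusted similarity passes the preset value."""
    if inclusive:
        return sorted(e for e, s in adjusted.items() if s >= preset)
    return sorted(e for e, s in adjusted.items() if s > preset)


# Adjusted similarity values from the example: a and b have 1, c has 2.
adjusted = {"a": 1, "b": 1, "c": 2}
print(select_predicted(adjusted, preset=1))                   # ['a', 'b', 'c']
print(select_predicted(adjusted, preset=1, inclusive=False))  # ['c']
```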
  • the similarity value corresponding to each edge of the training set is re-adjusted by using the calculated probability distribution of the existing edges and the probability distribution of the non-existing edges, thereby taking into account the similarity of the non-existing edges. Possible edges are then predicted according to the adjusted similarity values, so that not only edges between two nodes with high similarity but also edges between two nodes with low similarity can be predicted, thereby improving the accuracy of link prediction.
  • FIG. 3 is a schematic flowchart of a network link prediction method according to a second embodiment of the present invention, which can be applied to a computer.
  • the network link prediction method shown in FIG. 3 mainly includes the following steps:
  • the network can be an existing network such as a social network, a protein network, or a neural network.
  • the network topology is the topology of the network, that is, the physical arrangement of the nodes formed by the network's computers or devices and of the transmission media (lines) that connect them.
  • the nodes in the network topology are divided into two categories: one is a transit node that converts and exchanges information, such as switches, hubs, and terminal controllers; the other is an access node, such as a computer host or terminal.
  • Network topologies come in a variety of shapes, such as bus, star, ring, tree, and mesh.
  • Each hollow figure in Figure 2 is a node, and the connection between the two nodes is a link, which may also be called the edge of the network topology.
  • it is predefined which edges in the network topology are existing edges and which are non-existing edges, and the edges in the network topology are then divided according to this predefined division.
  • the link formed by the two nodes having the connection relationship is called an existing edge; otherwise, it is a non-existing edge.
  • the manner of selecting a plurality of edges from the existing edges to form a training set may be random selection, and is not limited. The more edges are selected for the training set, the more accurate the prediction result.
  • dividing the edges formed between nodes in the network topology into existing edges and non-existing edges, and selecting a plurality of edges from the existing edges to form a training set, is specifically:
  • the method further includes:
  • the number of edges in the training set accounts for 80%-90% of the number of existing edges in the entire network topology, and the number of edges in the test set accounts for 20%-10% of the number of existing edges in the entire network topology.
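
A random split in the stated proportions might look like the sketch below. It assumes that the existing edges not selected for the training set form the test set, which the surviving text implies but does not state outright. The function name, the fixed seed, and the 20-edge chain are hypothetical.

```python
import random


def split_train_test(existing_edges, train_fraction=0.9, seed=0):
    """Randomly split existing edges into a training set and a test set.

    Assumption: the edges left out of the training set form the test set.
    """
    edges = sorted(existing_edges)
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    rng.shuffle(edges)
    cut = round(train_fraction * len(edges))
    return edges[:cut], edges[cut:]


# Hypothetical network with 20 existing edges.
existing_edges = [(i, i + 1) for i in range(20)]
train, test = split_train_test(existing_edges, train_fraction=0.9)
print(len(train), len(test))  # 18 training edges, 2 test edges
```

Any `train_fraction` between 0.8 and 0.9 stays within the range given in the text.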
  • the method for calculating the similarity between two nodes is not limited, and may be any one of the common neighbor algorithm, the Jaccard coefficient algorithm, the resource allocation algorithm, the local path index, and the structural perturbation method, or another similarity algorithm suitable for link prediction.
  • the expression for calculating the similarity between two nodes i and j in the common neighbor algorithm is: $s_{ij} = |\Gamma(i) \cap \Gamma(j)|$
  • $\Gamma(i)$ represents the set of neighbor nodes of node i
  • $\Gamma(j)$ represents the set of neighbor nodes of node j
  • $|\cdot|$ represents the number of nodes in the set
  • nodes i and j are the nodes at the two ends of an edge in the training set, or the nodes at the two ends of a non-existing edge.
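
The common neighbor index counts the neighbors shared by nodes i and j. A minimal sketch, assuming the neighbor sets are built from the training-set edges only; the helper names and the five-edge training set are hypothetical.

```python
def neighbors(edges):
    """Build the neighbor set of every node from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, i, j):
    """Number of nodes adjacent to both i and j."""
    return len(adj.get(i, set()) & adj.get(j, set()))


# Hypothetical training-set edges.
train_edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
adj = neighbors(train_edges)
print(common_neighbors(adj, 1, 4))  # nodes 2 and 3 are shared, so 2
print(common_neighbors(adj, 1, 2))  # only node 3 is shared, so 1
```

The pair (1, 4) is a non-existing edge and the pair (1, 2) is a training-set edge, so the same function covers both cases named in the text.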
  • S304: Divide the edges for which similarity has been calculated into multiple sets, and perform probability statistics on the numbers of existing edges and non-existing edges in each set, to obtain the probability distribution of the existing edges and the probability distribution of the non-existing edges in each set.
  • the obtained similarity values are divided into a plurality of groups, and the edges corresponding to the similarity values in each group are placed in one set.
  • Each group may include one similarity value or multiple similarity values. Therefore, the number of elements contained in each set is greater than or equal to zero.
  • for example, the edges for which similarity has been calculated are edge A, edge B, edge C, edge D, and edge F, where the similarity value of edge A is 1, the similarity value of edge B and edge C is 4, the similarity value of edge D is 6, and the similarity value of edge F is 9.
  • if group 1 contains 1, group 2 contains 4 and 9, and group 3 contains 6, then edge A belongs to set 1, edge B, edge C, and edge F belong to set 2, and edge D belongs to set 3.
  • if edges A, B, and D are existing edges, and edges C and F are non-existing edges, then:
  • the probability distribution of existing edges in set 1 is 1, and the probability distribution of non-existing edges is 0;
  • the probability distribution of existing edges in set 2 is 1, and the probability distribution of non-existing edges is 2;
  • the probability distribution of existing edges in set 3 is 1, and the probability distribution of non-existing edges is 0.
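
The figures in the worked example above (1 and 0, 1 and 2, 1 and 0) can be reproduced by tallying edges per set. The sketch below assumes those figures are raw counts of existing and non-existing edges; the function name `count_per_set` and the dictionary layout are hypothetical.

```python
from collections import defaultdict

# Edge name -> (similarity value, whether the edge exists), from the example.
edges = {"A": (1, True), "B": (4, True), "C": (4, False),
         "D": (6, True), "F": (9, False)}
# Group number -> similarity values assigned to that group.
groups = {1: {1}, 2: {4, 9}, 3: {6}}


def count_per_set(edges, groups):
    """Tally (existing, non-existing) edges for the set of each group."""
    value_to_set = {s: g for g, values in groups.items() for s in values}
    counts = defaultdict(lambda: [0, 0])
    for sim, exists in edges.values():
        counts[value_to_set[sim]][0 if exists else 1] += 1
    return {g: tuple(c) for g, c in counts.items()}


counts = count_per_set(edges, groups)
print(counts)  # {1: (1, 0), 2: (1, 2), 3: (1, 0)}
```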
  • the edges of the training set are the existing edges in each set.
  • the similarity values corresponding to the existing edges included in each set are adjusted according to the probability distribution of the edges existing in each set and the probability distribution of the non-existing edges.
  • the similarity values corresponding to the obtained existing edges are adjusted as follows:
  • the preset conversion function is: $\mathrm{PNR}(s) = p_r(s) / p_n(s)$, where $p_r(s)$ is the probability distribution of the existing edges, $p_n(s)$ is the probability distribution of the non-existing edges, and $s$ is a similarity value contained in the divided groups;
  • the ratio PNR(s) calculated for each set is used as the adjusted similarity value corresponding to the existing edges in that set.
  • the adjusted similarity values corresponding to the existing edges in the same set are the same. For example, if the PNR(s) calculated for set A is 1 and the PNR(s) calculated for set B is 3, then the adjusted similarity value of each existing edge in set A is 1, and the adjusted similarity value of each existing edge in set B is 3.
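
A hedged sketch of the adjustment step follows. It rests on three assumptions that the surviving text does not confirm: that the conversion function is the ratio of p_r(s) to p_n(s), as the phrase "the ratio PNR(s)" suggests; that each probability is the per-set count divided by the total number of edges of that kind; and that a set with no non-existing edges gets an infinite ratio. The function name and the per-set counts (taken from the A to F example) are illustrative.

```python
import math


def pnr_per_set(counts, total_existing, total_non_existing):
    """Adjusted similarity per set, assumed to be p_r / p_n.

    Assumptions: p_r and p_n are per-set counts normalized by the totals,
    and a zero p_n yields infinity when p_r is positive, else 0.0.
    """
    adjusted = {}
    for set_id, (n_exist, n_non) in counts.items():
        p_r = n_exist / total_existing
        p_n = n_non / total_non_existing
        if p_n == 0:
            adjusted[set_id] = math.inf if p_r > 0 else 0.0
        else:
            adjusted[set_id] = p_r / p_n
    return adjusted


# (existing, non-existing) counts per set from the A..F example.
counts = {1: (1, 0), 2: (1, 2), 3: (1, 0)}
adjusted = pnr_per_set(counts, total_existing=3, total_non_existing=2)
print(adjusted)
```

Under these assumptions, set 2 scores lowest because it holds both non-existing edges, even though it contains the highest raw similarity value (9). A practical implementation would probably smooth the zero-denominator case rather than return infinity.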
  • the value of the preset value is not limited, and the preset value may be equal to the adjusted similarity value, or may not be equal to the adjusted similarity value.
  • when the preset value is equal to an adjusted similarity value, either a similarity value greater than the preset value or a similarity value equal to the preset value may be selected; when the preset value is not equal to any adjusted similarity value, a similarity value greater than the preset value may be selected.
  • for example, suppose the adjusted similarity values are 1 and 2.
  • the edges corresponding to similarity value 1 are edge a and edge b, and the edge corresponding to similarity value 2 is edge c.
  • if the preset value is 1, then edge a, edge b, and edge c, whose similarity values are greater than or equal to 1, are used as predicted edges, or only edge c, whose similarity value is greater than 1, is used as the predicted edge.
  • the adjusted similarity values are first sorted from large to small or from small to large, and the existing edges corresponding to the top L similarity values are then selected as the predicted edges.
  • L is an integer greater than or equal to 1. If the number of predicted edges is limited, L can be the number of edges in the test set.
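
Ranking by adjusted similarity can be sketched as below, assuming the intent is to keep the L edges with the largest adjusted values. The function name `top_l_edges` and the four scored edges are hypothetical.

```python
def top_l_edges(adjusted, L):
    """Return the L edges with the largest adjusted similarity values."""
    ranked = sorted(adjusted.items(), key=lambda item: item[1], reverse=True)
    return [edge for edge, _ in ranked[:L]]


# Hypothetical adjusted similarity values keyed by edge.
adjusted = {("u", "v"): 3.0, ("u", "w"): 1.0, ("v", "x"): 2.5, ("w", "x"): 0.2}
print(top_l_edges(adjusted, L=2))  # [('u', 'v'), ('v', 'x')]
```

Setting L to the size of the test set, as the text suggests, makes the number of predictions equal to the number of held-out edges.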
  • is a constant
  • is the set of edges in the test set
  • is the set of non-existing edges and edges of the test set
  • $S = (s_1, s_2) \cup (s_3, s_4) \cup \dots \cup (s_{2m-1}, s_{2m})$, satisfying the conditions $s_1 \le s_2$, $s_3 \le s_4$, ..., $s_{2m-1} \le s_{2m}$.
  • are a collection of unknown edges.
  • An unknown edge indicates that the connection between the two nodes is unclear and needs to be verified.
  • $c_0$ is a constant. If $s > c_0$ and $p_r(s) \ll p_n(s)$, the prediction accuracy $p_F$ will be low; if $s \le c_0$ and $p_r(s) \gg p_n(s)$, the prediction accuracy will be high.
  • FIG. 4 is a schematic diagram showing the comparison result of the accuracy of the predicted edge of the existing link prediction method and the network link prediction method provided by the embodiment of the present invention.
  • Two real networks are selected: US Power Grid and Neural Network.
  • the existing link prediction methods used for comparison are: the CN algorithm for the first group of bars, the Jaccard algorithm for the second group, the RA algorithm for the third group, the LP algorithm for the fourth group, and SPM for the fifth group.
  • the black bar graph is the prediction accuracy of the network link prediction method provided by the embodiment of the present invention
  • the vertical axis coordinate indicates the prediction accuracy.
  • the accuracy of the network link prediction method provided by the embodiment of the present invention is greater than the accuracy of the existing link prediction method. Therefore, compared with the existing link prediction method, the network link prediction method provided by the embodiment of the present invention has better prediction effect and better accuracy.
  • the similarity value corresponding to each edge of the training set is re-adjusted by using the calculated probability distribution of the existing edges and the probability distribution of the non-existing edges, thereby taking into account the similarity of the non-existing edges. Possible edges are then predicted according to the adjusted similarity values, so that not only edges between two nodes with high similarity but also edges between two nodes with low similarity can be predicted, thereby improving the accuracy of link prediction.
  • FIG. 5 is a schematic structural diagram of a network link prediction apparatus according to a third embodiment of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown.
  • the network link prediction apparatus illustrated in FIG. 5 may be an execution body of the network link prediction method provided by the foregoing embodiment shown in FIG. 1.
  • the network link prediction apparatus illustrated in FIG. 5 mainly includes an acquisition module 501, a setting module 502, a calculation processing module 503, and a prediction module 504.
  • the above functional modules are described in detail as follows:
  • the obtaining module 501 is configured to acquire a network topology of the network.
  • the setting module 502 is configured to divide the edge formed between the nodes in the network topology into an existing edge and a non-existing edge, and select a plurality of edges from the existing edge to form a training set;
  • the calculation processing module 503 is configured to calculate the similarity of the two nodes of each existing edge in the training set to obtain the similarity value corresponding to each existing edge of the training set, and to calculate the similarity of the two nodes of each non-existing edge to obtain the similarity value corresponding to each non-existing edge;
  • the edges for which similarity has been calculated are divided into a plurality of sets, and probability statistics are performed on the numbers of existing edges and non-existing edges in each set, to obtain the probability distribution of the existing edges and the probability distribution of the non-existing edges in each set;
  • the prediction module 504 is configured to select, in the adjusted similarity value, a similarity value that is greater than or equal to the preset value, and use the existing edge corresponding to the selected similarity value as the predicted edge in the network topology.
  • the manner of selecting a plurality of edges from the existing edges to form a training set may be random selection, and is not limited.
  • the method for calculating the similarity between two nodes is not limited, and may be any one of the common neighbor algorithm, the Jaccard coefficient algorithm, the resource allocation algorithm, the local path index, and the structural perturbation method, or another similarity algorithm suitable for link prediction. The greater the similarity value, the higher the similarity between the two nodes.
  • the division into sets is a random division, and the number of elements included in each set is greater than or equal to zero.
  • the edges of the training set are the existing edges in each set.
  • the similarity values corresponding to the existing edges included in each set are adjusted according to the probability distribution of the edges existing in each set and the probability distribution of the non-existing edges.
  • the adjusted similarity values corresponding to the edges existing in each set are the same.
  • the division into the functional modules described above is merely an example. In practical applications, the functions may be allocated to different functional modules as required, for example according to the configuration requirements of the corresponding hardware or the convenience of software implementation; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
  • the corresponding functional modules in this embodiment may be implemented by corresponding hardware, or by corresponding hardware executing corresponding software.
  • the above description principles may be applied to various embodiments provided in this specification, and are not described herein again.
  • the calculation processing module 503 re-adjusts the similarity value corresponding to each edge of the training set by using the calculated probability distribution of the existing edges and the probability distribution of the non-existing edges, thereby taking into account the similarity of the non-existing edges. Possible edges are then predicted according to the adjusted similarity values, so that not only edges between two nodes with high similarity but also edges between two nodes with low similarity can be predicted. This improves the accuracy of link prediction.
  • the network link prediction apparatus is an execution body of the network link prediction method provided by the foregoing embodiments shown in FIG. 1 and FIG. 3. For convenience of description, only parts related to the embodiment of the present invention are shown.
  • the network link prediction apparatus provided by the fourth embodiment of the present invention mainly includes: an obtaining module 501, a setting module 502, a calculation processing module 503, and a prediction module 504.
  • the above functional modules are described in detail as follows:
  • the obtaining module 501 is configured to acquire a network topology of the network.
  • the network can be an existing network such as a social network, a protein network, or a neural network.
  • the network topology is the topology of the network, that is, the physical arrangement of the nodes formed by the network's computers or devices and of the transmission media (lines) that connect them.
  • the nodes in the network topology are divided into two categories: one is a transit node that converts and exchanges information, such as switches, hubs, and terminal controllers; the other is an access node, such as a computer host or terminal.
  • Network topologies come in a variety of shapes, such as bus, star, ring, tree, and mesh.
  • Each hollow figure in Figure 2 is a node, and the connection between the two nodes is a link, which may also be called the edge of the network topology.
  • the setting module 502 is configured to divide the edge formed between the nodes in the network topology into existing edges and non-existing edges, and select a plurality of edges from the existing edges to form a training set.
  • it is predefined which edges in the network topology are existing edges and which are non-existing edges, and the edges in the network topology are then divided according to this predefined division.
  • the manner of selecting a plurality of edges from the existing edges to form a training set may be random selection, and is not limited. The more edges are selected for the training set, the more accurate the prediction result.
  • setting module 502 is further configured to perform the following steps:
  • the number of edges in the training set accounts for 80%-90% of the number of edges existing in the entire network topology
  • the number of edges in the test set accounts for 20%-10% of the number of edges existing in the entire network topology
  • the calculation processing module 503 is configured to calculate the similarity of the two nodes of each existing edge in the training set to obtain the similarity value corresponding to each existing edge of the training set, and to calculate the similarity of the two nodes of each non-existing edge to obtain the similarity value corresponding to each non-existing edge.
  • the method for calculating the similarity between two nodes is not limited, and may be any one of the common neighbor algorithm, the Jaccard coefficient algorithm, the resource allocation algorithm, the local path index, and the structural perturbation method, or another similarity algorithm suitable for link prediction.
  • the expression for calculating the similarity between two nodes i and j in the common neighbor algorithm is: $s_{ij} = |\Gamma(i) \cap \Gamma(j)|$
  • $\Gamma(i)$ represents the set of neighbor nodes of node i
  • $\Gamma(j)$ represents the set of neighbor nodes of node j
  • $|\cdot|$ represents the number of nodes in the set
  • nodes i and j are the nodes at the two ends of an edge in the training set, or the nodes at the two ends of a non-existing edge.
  • the calculation processing module 503 is further configured to divide the edges for which similarity has been calculated into multiple sets, and to perform probability statistics on the numbers of existing edges and non-existing edges in each set, to obtain the probability distribution of the existing edges and the probability distribution of the non-existing edges in each set.
  • the calculation processing module 503 is further configured to divide the obtained similarity values into a plurality of groups, and to place the edges corresponding to the similarity values in each group in one set.
  • Each group may include one similarity value or multiple similarity values. Therefore, the number of elements contained in each set is greater than or equal to zero.
  • for example, the edges for which similarity has been calculated are edge A, edge B, edge C, edge D, and edge F, where the similarity value of edge A is 1, the similarity value of edge B and edge C is 4, the similarity value of edge D is 6, and the similarity value of edge F is 9.
  • if group 1 contains 1, group 2 contains 4 and 9, and group 3 contains 6, then edge A belongs to set 1, edge B, edge C, and edge F belong to set 2, and edge D belongs to set 3.
  • if edges A, B, and D are existing edges, and edges C and F are non-existing edges, then:
  • the probability distribution of existing edges in set 1 is 1, and the probability distribution of non-existing edges is 0;
  • the probability distribution of existing edges in set 2 is 1, and the probability distribution of non-existing edges is 2;
  • the probability distribution of existing edges in set 3 is 1, and the probability distribution of non-existing edges is 0.
  • the calculation processing module 503 is further configured to adjust the similarity value corresponding to the obtained existing edges according to the preset conversion function, the probability distribution of the existing edge, and the probability distribution of the non-existing edge.
  • the edges of the training set are the existing edges in each set.
  • the similarity values corresponding to the existing edges included in each set are adjusted according to the probability distribution of the edges existing in each set and the probability distribution of the non-existing edges.
  • calculation processing module 503 is further configured to perform the following steps:
  • the preset conversion function is: $\mathrm{PNR}(s) = p_r(s) / p_n(s)$, where $p_r(s)$ is the probability distribution of the existing edges, $p_n(s)$ is the probability distribution of the non-existing edges, and $s$ is a similarity value contained in the divided groups;
  • the ratio PNR(s) calculated for each set is used as the adjusted similarity value corresponding to the existing edges in that set.
  • the adjusted similarity values corresponding to the existing edges in the same set are the same. For example, if the PNR(s) calculated for set A is 1 and the PNR(s) calculated for set B is 3, then the adjusted similarity value of each existing edge in set A is 1, and the adjusted similarity value of each existing edge in set B is 3.
  • the prediction module 504 is configured to select, in the adjusted similarity value, a similarity value that is greater than or equal to the preset value, and use the existing edge corresponding to the selected similarity value as the predicted edge in the network topology.
  • the preset value is not limited; it may or may not be equal to one of the adjusted similarity values.
  • when the preset value equals an adjusted similarity value, either the similarity values greater than the preset value or those equal to it may be selected; when the preset value does not equal any adjusted similarity value, the similarity values greater than the preset value may be selected.
  • for example, suppose the adjusted similarity values are 1 and 2, edges a and b correspond to the similarity value 1, and edge c corresponds to the similarity value 2.
  • if the preset value is 1, then either edges a, b, and c (similarity value greater than or equal to 1) are used as the predicted edges, or only edge c (similarity value greater than 1) is used as the predicted edge.
  • the prediction module 504 is further configured to first sort the adjusted similarity values in descending or ascending order and then, counting from the largest similarity value, select the existing edges corresponding to the similarity value at the L-th position as the predicted edges.
  • L is an integer greater than or equal to 1. If the number of predicted edges is limited, L may be the number of edges in the test set.
  • the calculation processing module 503 is further configured to calculate the accuracy of the network link prediction according to the adjusted similarity values of the selected predicted edges, obtaining an accuracy rate p_F.
  • in the accuracy formula (its formula images are missing from this extraction), α is a constant, |E^P| is the size of the set of test-set edges, |U-E^T| is the size of the set made up of the non-existing edges and the test-set edges, and S = (s_1, s_2) ∪ (s_3, s_4) ∪ ... ∪ (s_{2m-1}, s_{2m}), subject to s_1 < s_2, s_3 < s_4, ..., s_{2m-1} < s_{2m}.
  • the non-existing edges together with the test-set edges, U-E^T, form the set of unknown edges.
  • the calculation processing module 503 re-adjusts the similarity value of each edge in the training set using the calculated probability distributions of the existing and non-existing edges, so that the similarity of non-existing edges is also taken into account. Possible edges are then predicted from the adjusted similarity values, so that edges between two nodes with high similarity and edges between two nodes with low similarity can both be predicted, which improves the accuracy of link prediction.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the modules is only a logical function division.
  • in actual implementation there may be other division manners; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication link shown or discussed may be an indirect coupling or communication link through some interface, device or module, and may be in an electrical, mechanical or other form.
  • the modules described as separate components may or may not be physically separated.
  • the components displayed as modules may or may not be physical modules, that is, may be located in one place, or may be distributed to multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional module in each embodiment of the present invention may be integrated into one processing module. It is also possible that each module physically exists separately, or two or more modules may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated modules if implemented in the form of software functional modules and sold or used as separate products, may be stored in a computer readable storage medium.
  • the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • the software product includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

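The invention discloses a network link prediction method and device. The method uses the calculated probability distribution of existing edges and the probability distribution of non-existing edges to re-adjust the similarity value of each edge in the training set, thereby taking into account the similarity of non-existing edges, and then predicts possible edges from the adjusted similarity values. In this way, not only edges between two nodes with high similarity but also edges between two nodes with low similarity can be predicted, which improves the accuracy of link prediction.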

Description

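Network Link Prediction Method and Device
Technical Field
The invention belongs to the field of data mining, and in particular relates to a network link prediction method and device.
Background
Link prediction in a network refers to using the known network nodes and network topology to predict the likelihood of a link between two nodes that are not yet connected by an edge. Link prediction covers both the prediction of links that exist but are unknown and the prediction of future links. There are many conventional link prediction methods, and they generally use a similarity algorithm to predict network links. Similarity-based link prediction includes the common neighbors (CN) algorithm, the Jaccard coefficient algorithm, the resource allocation (RA) algorithm, the local path (LP) index algorithm, and the structural perturbation method (SPM).
All of the above similarity algorithms compute the similarity of two nodes to obtain a similarity value and take the edges with high similarity values as the predicted edges. In a complex network, however, edges formed by two nodes with low similarity can have more stable links than edges with high similarity. Existing similarity algorithms can only compute the similarity of two nodes; when the computed similarity is low, they cannot determine whether the edge formed by those two nodes should be predicted to exist.
Summary of the Invention
The invention provides a network link prediction method and device, aiming to solve the problem that the similarity algorithms in existing link prediction methods cannot predict edges formed by two nodes with low similarity.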
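The network link prediction method provided by the invention includes: obtaining the network topology of a network;
dividing the edges formed between the nodes in the network topology into existing edges and non-existing edges, and selecting a plurality of the existing edges to form a training set;
calculating the similarity of the two nodes of each existing edge in the training set to obtain the similarity value of each existing edge in the training set, and calculating the similarity of the two nodes of each non-existing edge to obtain the similarity value of each non-existing edge;
dividing the edges whose similarity has been calculated into a plurality of sets, and performing probability statistics on the numbers of existing and non-existing edges in each set to obtain, for each set, the probability distribution of existing edges and the probability distribution of non-existing edges;
adjusting the obtained similarity values of the existing edges according to a preset conversion function, the probability distribution of the existing edges, and the probability distribution of the non-existing edges;
among the adjusted similarity values, selecting the similarity values greater than or equal to a preset value, and taking the existing edges corresponding to the selected similarity values as the predicted edges in the network topology.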
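The network link prediction device provided by the invention includes:
an acquisition module, configured to obtain the network topology of a network;
a setting module, configured to divide the edges formed between the nodes in the network topology into existing edges and non-existing edges, and to select a plurality of the existing edges to form a training set;
a calculation processing module, configured to calculate the similarity of the two nodes of each existing edge in the training set to obtain the similarity value of each existing edge in the training set, and to calculate the similarity of the two nodes of each non-existing edge to obtain the similarity value of each non-existing edge;
and to divide the edges whose similarity has been calculated into a plurality of sets, and perform probability statistics on the numbers of existing and non-existing edges in each set to obtain, for each set, the probability distribution of existing edges and the probability distribution of non-existing edges;
and to adjust the obtained similarity values of the existing edges according to a preset conversion function, the probability distribution of the existing edges, and the probability distribution of the non-existing edges;
a prediction module, configured to select, among the adjusted similarity values, the similarity values greater than or equal to a preset value, and to take the existing edges corresponding to the selected similarity values as the predicted edges in the network topology.
The network link prediction method and device provided by the invention use the calculated probability distribution of existing edges and probability distribution of non-existing edges to re-adjust the similarity value of each edge in the training set, so that the similarity of non-existing edges is taken into account. Possible edges are then predicted from the adjusted similarity values, so that not only edges between two nodes with high similarity but also edges between two nodes with low similarity can be predicted, which improves the accuracy of link prediction.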
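Brief Description of the Drawings
To describe the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention.
Figure 1 is a schematic flowchart of the network link prediction method provided by the first embodiment of the invention;
Figure 2 is a schematic diagram of a star network topology of a network;
Figure 3 is a schematic flowchart of the network link prediction method provided by the second embodiment of the invention;
Figure 4 is a schematic diagram comparing the edge-prediction accuracy of existing link prediction methods with that of the network link prediction method provided by the embodiments of the invention;
Figure 5 is a schematic structural diagram of the network link prediction device provided by the third and fourth embodiments of the invention.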
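Detailed Description
To make the objectives, features, and advantages of the invention clearer and easier to understand, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without creative effort fall within the scope of protection of the invention.
Refer to Figure 1, which is a schematic flowchart of the network link prediction method provided by the first embodiment of the invention. The method can be applied to a computer and mainly includes the following steps:
S101. Obtain the network topology of a network.
The network may be an existing network such as a social network, a protein network, or a neural network. The network topology is the topological structure of the network, that is, the physical layout of the nodes, formed by networked computers or devices and the transmission medium, and the lines between them. Nodes in a network topology fall into two categories: transit nodes that convert and exchange information, such as switches, hubs, and terminal controllers; and access nodes, such as host computers and terminals. Network topologies come in various shapes, such as bus, star, ring, tree, and mesh. Figure 2 is a schematic diagram of a star network topology of a network. Each hollow shape in Figure 2 is a node, and the line connecting two nodes is a link, which may also be called an edge of the network topology.
S102. Divide the edges formed between the nodes in the network topology into existing edges and non-existing edges, and select a plurality of the existing edges to form a training set.
Which edges of the network topology count as existing edges and which as non-existing edges is defined in advance, and the edges of the network topology are then divided according to this predefined scheme. In the obtained network topology, if any two nodes are connected, the link formed by those two nodes is called an existing edge; otherwise, it is a non-existing edge.
The edges forming the training set are selected from the existing edges at random, and their number is not limited.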
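S103. Calculate the similarity of the two nodes of each existing edge in the training set to obtain the similarity value of each existing edge in the training set, and calculate the similarity of the two nodes of each non-existing edge to obtain the similarity value of each non-existing edge.
The way of calculating the similarity of two nodes is not limited; it may be any one of the common neighbors algorithm, the Jaccard coefficient algorithm, the resource allocation algorithm, the local path index, and the structural perturbation method, or any other similarity algorithm suitable for link prediction. A larger similarity value indicates a higher similarity between the two nodes.
S104. Divide the edges whose similarity has been calculated into a plurality of sets, and perform probability statistics on the numbers of existing and non-existing edges in each set to obtain, for each set, the probability distribution of existing edges and the probability distribution of non-existing edges.
The sets are divided randomly, and the number of elements in each set is greater than or equal to 0.
S105. Adjust the obtained similarity values of the existing edges according to a preset conversion function, the probability distribution of the existing edges, and the probability distribution of the non-existing edges.
The edges in the training set are the existing edges in the sets. The similarity values of the existing edges contained in each set are adjusted according to the probability distribution of existing edges and the probability distribution of non-existing edges in that set. The adjusted similarity values of the existing edges within one set are all the same.
S106. Among the adjusted similarity values, select the similarity values greater than or equal to a preset value, and take the existing edges corresponding to the selected similarity values as the predicted edges in the network topology.
The preset value is not limited; it may or may not be equal to one of the adjusted similarity values. When the preset value equals an adjusted similarity value, either the similarity values greater than the preset value or those equal to it may be selected; when it does not, the similarity values greater than the preset value may be selected. For example, suppose the adjusted similarity values are 1 and 2, edges a and b correspond to similarity value 1, and edge c corresponds to similarity value 2. If the preset value is 1, then either edges a, b, and c (similarity value greater than or equal to 1) are taken as the predicted edges, or only edge c (similarity value greater than 1) is.
In this embodiment of the invention, the calculated probability distribution of existing edges and probability distribution of non-existing edges are used to re-adjust the similarity value of each edge in the training set, so that the similarity of non-existing edges is taken into account. Possible edges are then predicted from the adjusted similarity values, so that not only edges between two nodes with high similarity but also edges between two nodes with low similarity can be predicted, which improves the accuracy of link prediction.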
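Refer to Figure 3, which is a schematic flowchart of the network link prediction method provided by the second embodiment of the invention. The method can be applied to a computer and mainly includes the following steps:
S301. Obtain the network topology of a network.
The network may be an existing network such as a social network, a protein network, or a neural network. The network topology is the topological structure of the network, that is, the physical layout of the nodes, formed by networked computers or devices and the transmission medium, and the lines between them. Nodes in a network topology fall into two categories: transit nodes that convert and exchange information, such as switches, hubs, and terminal controllers; and access nodes, such as host computers and terminals. Network topologies come in various shapes, such as bus, star, ring, tree, and mesh. Each hollow shape in Figure 2 is a node, and the line connecting two nodes is a link, which may also be called an edge of the network topology.
S302. Divide the edges formed between the nodes in the network topology into existing edges and non-existing edges, and select a plurality of the existing edges to form a training set.
Which edges of the network topology count as existing edges and which as non-existing edges is defined in advance, and the edges of the network topology are then divided according to this predefined scheme. In the obtained network topology, if any two nodes are connected, the link formed by those two nodes is called an existing edge; otherwise, it is a non-existing edge.
The edges forming the training set are selected from the existing edges at random, and their number is not limited. The more edges the training set contains, the more accurate the prediction result.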
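Further, dividing the edges formed between the nodes in the network topology into existing edges and non-existing edges, and selecting a plurality of the existing edges to form a training set, specifically comprises:
setting an adjacency matrix A = (a_ij)_{N×N} for the network topology, where a_ij denotes the weight of the edge between node i and node j, N is the number of nodes in the network topology, edges with a_ij = 1 are taken as the existing edges, edges with a_ij = 0 are taken as the non-existing edges, and nodes i and j are nodes of the network topology;
selecting a plurality of the existing edges to form the training set E^T, where E = {(i, j) | a_ij ≠ 0} is the set of existing edges.
Further, after the plurality of existing edges are selected to form the training set, the method also includes:
in the network topology, selecting the existing edges other than those in the training set to form the test set E^P, where
Figure PCTCN2017093676-appb-000001
E^T ∪ E^P = E.
Typically, the training set contains 80%-90% of the existing edges in the whole network topology, and the test set accordingly contains the remaining 20%-10%.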
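The split described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes an undirected network stored as a nested list, and the names `split_edges`, `train_fraction`, and `seed` are hypothetical.

```python
import itertools
import random


def split_edges(adjacency, train_fraction=0.9, seed=0):
    """Return (training edges, test edges, non-existing edges).

    Assumes an undirected graph, so each node pair (i, j) with i < j
    is visited once. These names are illustrative, not from the patent.
    """
    n = len(adjacency)
    existing, non_existing = [], []
    for i, j in itertools.combinations(range(n), 2):
        # a_ij != 0 marks an existing edge, a_ij == 0 a non-existing one.
        if adjacency[i][j] != 0:
            existing.append((i, j))
        else:
            non_existing.append((i, j))
    # Random selection of the training set, as the text describes.
    shuffled = existing[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:], non_existing


# A five-node star network: node 0 is the hub.
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
train, test, missing = split_edges(star, train_fraction=0.75)
```

The training and test sets returned here are disjoint and together cover every existing edge, matching E^T ∪ E^P = E.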
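S303. Calculate the similarity of the two nodes of each existing edge in the training set to obtain the similarity value of each existing edge in the training set, and calculate the similarity of the two nodes of each non-existing edge to obtain the similarity value of each non-existing edge.
The way of calculating the similarity of two nodes is not limited; it may be any one of the common neighbors algorithm, the Jaccard coefficient algorithm, the resource allocation algorithm, the local path index, and the structural perturbation method, or any other similarity algorithm suitable for link prediction.
Taking the common neighbors algorithm as an example, the expression for the similarity of two nodes i and j is:
Figure PCTCN2017093676-appb-000002
where Γ(i) denotes the set of neighbors of node i, Γ(j) denotes the set of neighbors of node j, |...| denotes the number of nodes in a set, and nodes i and j are the two endpoints of an edge in the training set or of a non-existing edge.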
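As a rough illustration of the common neighbors score, the sketch below uses the standard definition, the size of Γ(i) ∩ Γ(j). The formula image itself is not available in this text, so the standard definition is an assumption, and the helpers `build_neighbors` and `common_neighbors` are hypothetical names.

```python
from collections import defaultdict


def build_neighbors(edges):
    """Map each node to the set of its neighbors in an undirected edge list."""
    neighbors = defaultdict(set)
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    return neighbors


def common_neighbors(neighbors, i, j):
    """Standard CN score: the number of neighbors shared by nodes i and j."""
    return len(neighbors[i] & neighbors[j])


# Neighbor sets are built from the known edges only.
known_edges = [(0, 1), (0, 2), (1, 2), (0, 3)]
gamma = build_neighbors(known_edges)
score_existing = common_neighbors(gamma, 1, 2)      # nodes 1 and 2 share node 0
score_non_existing = common_neighbors(gamma, 1, 3)  # nodes 1 and 3 share node 0
```

The same function scores both existing and non-existing node pairs, which is what the later grouping step needs.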
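S304. Divide the edges whose similarity has been calculated into a plurality of sets, and perform probability statistics on the numbers of existing and non-existing edges in each set to obtain, for each set, the probability distribution of existing edges and the probability distribution of non-existing edges.
Further, dividing the edges whose similarity has been calculated into a plurality of sets specifically comprises:
dividing the obtained similarity values into a plurality of groups, and placing the edges corresponding to the similarity values in each group into one set.
Each group may contain one similarity value or several. Accordingly, the number of elements in each set is greater than or equal to 0.
The probability statistics on the number of edges are illustrated with a concrete example:
The edges whose similarity has been calculated are edges A, B, C, D, and F, where edge A has similarity value 1, edges B and C have similarity value 4, edge D has similarity value 6, and edge F has similarity value 9.
Suppose the values are divided into three groups: group 1 contains 1, group 2 contains 4 and 9, and group 3 contains 6. Then edge A belongs to set 1; edges B, C, and F belong to set 2; and edge D belongs to set 3. Assuming edges A, B, and D are existing edges and edges C and F are non-existing edges, the probability distribution of existing edges in set 1 (given here as a count) is 1 and that of non-existing edges is 0; in set 2, the probability distribution of existing edges is 1 and that of non-existing edges is 2; in set 3, the probability distribution of existing edges is 1 and that of non-existing edges is 0.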
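The worked example above can be reproduced with a short Python sketch. The function name `count_by_group` and the dictionary layout are assumptions made for illustration; the edge labels, similarity values, and grouping come from the example.

```python
def count_by_group(similarity, existing, group_of):
    """Count existing and non-existing edges in each set.

    similarity: edge label -> similarity value
    existing:   labels of the existing edges
    group_of:   similarity value -> group (set) number
    Returns group number -> (existing count, non-existing count).
    """
    counts = {}
    for edge, value in similarity.items():
        group = group_of[value]
        exist_count, non_count = counts.get(group, (0, 0))
        if edge in existing:
            exist_count += 1
        else:
            non_count += 1
        counts[group] = (exist_count, non_count)
    return counts


similarity = {"A": 1, "B": 4, "C": 4, "D": 6, "F": 9}
existing = {"A", "B", "D"}            # C and F are non-existing edges
group_of = {1: 1, 4: 2, 9: 2, 6: 3}   # group 1: {1}, group 2: {4, 9}, group 3: {6}
counts = count_by_group(similarity, existing, group_of)
# counts == {1: (1, 0), 2: (1, 2), 3: (1, 0)}
```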
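S305. Adjust the obtained similarity values of the existing edges according to a preset conversion function, the probability distribution of the existing edges, and the probability distribution of the non-existing edges.
The edges in the training set are the existing edges in the sets. The similarity values of the existing edges contained in each set are adjusted according to the probability distribution of existing edges and the probability distribution of non-existing edges in that set.
Further, adjusting the obtained similarity values of the existing edges according to the preset conversion function, the probability distribution of the existing edges, and the probability distribution of the non-existing edges specifically comprises:
calculating, according to the preset conversion function, the ratio between the probability distribution of existing edges and the probability distribution of non-existing edges in each set, where
the preset conversion function is:
Figure PCTCN2017093676-appb-000003
p_r(s) is the probability distribution of the existing edges, p_n(s) is the probability distribution of the non-existing edges, and s is a similarity value contained in the divided groups;
taking the ratio PNR(s) calculated for each set as the adjusted similarity value of the existing edges in that set.
When the similarity value calculated in step S303 is 0, PNR(s) = 0. Denoting the adjusted similarity value by s′, s′ = PNR(s).
The adjusted similarity values of the existing edges within one set are all the same. For example, if PNR(s) calculated for set A is 1 and PNR(s) calculated for set B is 3, then every existing edge in set A has an adjusted similarity value of 1, and every existing edge in set B has an adjusted similarity value of 3.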
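One way to read the conversion step is sketched below in Python. The exact form of PNR(s) sits in a formula image that is not reproduced here, so this sketch rests on stated assumptions: PNR is taken to be p_r(s) / p_n(s), both distributions are obtained by normalizing the per-set counts, and a set with no non-existing edges is given an infinite ratio. The name `pnr_by_group` is hypothetical.

```python
import math


def pnr_by_group(counts):
    """Compute an assumed PNR value for each set.

    counts: group number -> (existing count, non-existing count)
    Assumption: p_r and p_n are the counts normalized over all sets.
    """
    total_exist = sum(e for e, _ in counts.values())
    total_non = sum(n for _, n in counts.values())
    pnr = {}
    for group, (exist_count, non_count) in counts.items():
        p_r = exist_count / total_exist if total_exist else 0.0
        p_n = non_count / total_non if total_non else 0.0
        if p_r == 0.0:
            pnr[group] = 0.0
        elif p_n == 0.0:
            # Assumption: no non-existing edges in this set, so the
            # ratio is treated as unbounded rather than undefined.
            pnr[group] = math.inf
        else:
            pnr[group] = p_r / p_n
    return pnr


counts = {1: (1, 0), 2: (1, 2), 3: (1, 0)}
adjusted = pnr_by_group(counts)
```

Every existing edge in a set then receives that set's PNR value as its adjusted similarity, so edges in one set always share the same adjusted value.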
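S306. Among the adjusted similarity values, select the similarity values greater than or equal to a preset value, and take the existing edges corresponding to the selected similarity values as the predicted edges in the network topology.
The preset value is not limited; it may or may not be equal to one of the adjusted similarity values. When the preset value equals an adjusted similarity value, either the similarity values greater than the preset value or those equal to it may be selected; when it does not, the similarity values greater than the preset value may be selected. For example, suppose the adjusted similarity values are 1 and 2, edges a and b correspond to similarity value 1, and edge c corresponds to similarity value 2. If the preset value is 1, then either edges a, b, and c (similarity value greater than or equal to 1) are taken as the predicted edges, or only edge c (similarity value greater than 1) is.
Further, the adjusted similarity values are first sorted in descending or ascending order, and then, counting from the largest similarity value, the existing edges corresponding to the similarity value at the L-th position are selected as the predicted edges. L is an integer greater than or equal to 1. If the number of predicted edges is limited, L may be the number of edges in the test set.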
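The two selection rules can be sketched in Python as follows. The function names are hypothetical, and `select_top` takes the L highest-ranked edges, which is one possible reading of the ranking rule rather than a confirmed one.

```python
def select_by_threshold(adjusted, preset, inclusive=True):
    """Return edges whose adjusted similarity is >= preset (or > preset)."""
    if inclusive:
        return sorted(e for e, s in adjusted.items() if s >= preset)
    return sorted(e for e, s in adjusted.items() if s > preset)


def select_top(adjusted, limit):
    """Rank edges by adjusted similarity, largest first, and keep `limit` of them.

    Assumption: "the L-th position" is read as keeping the top L edges;
    ties are broken by edge label so the result is deterministic.
    """
    ranked = sorted(adjusted, key=lambda e: (-adjusted[e], e))
    return ranked[:limit]


adjusted = {"a": 1, "b": 1, "c": 2}
with_ties = select_by_threshold(adjusted, 1)                # ['a', 'b', 'c']
strict = select_by_threshold(adjusted, 1, inclusive=False)  # ['c']
best = select_top(adjusted, 1)                              # ['c']
```

With a preset value of 1 this reproduces the example in the text: the inclusive rule returns edges a, b, and c, and the strict rule returns only edge c.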
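S307. Calculate the accuracy of the network link prediction from the adjusted similarity values of the selected predicted edges, obtaining the accuracy p_F.
The formula for the accuracy of the network link prediction is:
Figure PCTCN2017093676-appb-000004
where α is a constant,
Figure PCTCN2017093676-appb-000005
|E^P| is the size of the set of test-set edges, |U-E^T| is the size of the set made up of the non-existing edges and the test-set edges, and S = (s_1, s_2) ∪ (s_3, s_4) ∪ ... ∪ (s_{2m-1}, s_{2m}), subject to s_1 < s_2, s_3 < s_4, ..., s_{2m-1} < s_{2m}.
If the number of predicted edges is limited, then
Figure PCTCN2017093676-appb-000006
The set U-E^T, made up of the non-existing edges and the test-set edges, serves as the set of unknown edges.
An unknown edge is one for which the connection between the two nodes is unclear and remains to be verified.
The derivation of the formula for the accuracy p_F is as follows:
Assuming the test set and the training set have exactly the same probability distributions, the basic accuracy formula is: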
Figure PCTCN2017093676-appb-000007
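where c_0 is a constant. If s > c_0 and p_r(s) << p_n(s), the prediction accuracy p_F will be very low; if s < c_0 and p_r(s) >> p_n(s), the prediction accuracy will be very high.
Because the basic accuracy formula above can only select edges whose similarity value calculated in step S303 is high, the formula is transformed as follows:
Define the probability that the similarity value of an unknown edge (that is, an edge in the set made up of the non-existing edges and the test-set edges) belongs to the test set E^P as:
Figure PCTCN2017093676-appb-000008
which simplifies to p_F = αPNR(s).
Further, from the basic accuracy, the formula for the accuracy of the network link prediction is obtained:
Figure PCTCN2017093676-appb-000009
Figure 4 compares the edge-prediction accuracy of existing link prediction methods with that of the network link prediction method provided by the embodiments of the invention. Two real networks were selected: the US power grid (US PowerGrid) and a neural network (Neural network). The existing methods used for comparison are the CN algorithm in bar group 1, the Jaccard algorithm in bar group 2, the RA algorithm in bar group 3, the LP algorithm in bar group 4, and SPM in bar group 5. In Figure 4, the hollow hatched bars show the prediction accuracy of the existing methods, the black bars show that of the method provided by the embodiments of the invention, and the vertical axis is the prediction accuracy. Figure 4 shows that the method provided by the embodiments achieves higher accuracy than each of the existing methods. Compared with existing link prediction methods, the method provided by the embodiments therefore predicts better and more accurately.
In this embodiment of the invention, the calculated probability distribution of existing edges and probability distribution of non-existing edges are used to re-adjust the similarity value of each edge in the training set, so that the similarity of non-existing edges is taken into account. Possible edges are then predicted from the adjusted similarity values, so that not only edges between two nodes with high similarity but also edges between two nodes with low similarity can be predicted, which improves the accuracy of link prediction.
Refer to Figure 5, which is a schematic structural diagram of the network link prediction device provided by the third embodiment of the invention. For ease of description, only the parts relevant to the embodiment are shown. The device illustrated in Figure 5 may be the executing entity of the network link prediction method of the embodiment shown in Figure 1. It mainly includes an acquisition module 501, a setting module 502, a calculation processing module 503, and a prediction module 504. The functional modules are described in detail below: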
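The acquisition module 501 is configured to obtain the network topology of a network;
the setting module 502 is configured to divide the edges formed between the nodes in the network topology into existing edges and non-existing edges, and to select a plurality of the existing edges to form a training set;
the calculation processing module 503 is configured to calculate the similarity of the two nodes of each existing edge in the training set to obtain the similarity value of each existing edge in the training set, and to calculate the similarity of the two nodes of each non-existing edge to obtain the similarity value of each non-existing edge;
and to divide the edges whose similarity has been calculated into a plurality of sets, and perform probability statistics on the numbers of existing and non-existing edges in each set to obtain, for each set, the probability distribution of existing edges and the probability distribution of non-existing edges;
and to adjust the obtained similarity values of the existing edges according to a preset conversion function, the probability distribution of the existing edges, and the probability distribution of the non-existing edges;
the prediction module 504 is configured to select, among the adjusted similarity values, the similarity values greater than or equal to a preset value, and to take the existing edges corresponding to the selected similarity values as the predicted edges in the network topology.
The edges forming the training set are selected from the existing edges at random, and their number is not limited.
The way of calculating the similarity of two nodes is not limited; it may be any one of the common neighbors algorithm, the Jaccard coefficient algorithm, the resource allocation algorithm, the local path index, and the structural perturbation method, or any other similarity algorithm suitable for link prediction. A larger similarity value indicates a higher similarity between the two nodes.
The sets are divided randomly, and the number of elements in each set is greater than or equal to 0.
The edges in the training set are the existing edges in the sets. The similarity values of the existing edges contained in each set are adjusted according to the probability distribution of existing edges and the probability distribution of non-existing edges in that set. The adjusted similarity values of the existing edges within one set are all the same.
For details not covered in this embodiment, refer to the description of the embodiment shown in Figure 1; they are not repeated here.
It should be noted that, in the implementation of the network link prediction device illustrated in Figure 5, the division into functional modules is only an example. In practice, the functions above may be assigned to different functional modules as needed, for example to suit the configuration of the hardware or for convenience of software implementation; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. Moreover, in practice, each functional module of this embodiment may be implemented by corresponding hardware, or by corresponding hardware executing corresponding software. This principle applies to every embodiment in this specification and is not repeated below.
In this embodiment of the invention, the calculation processing module 503 uses the calculated probability distribution of existing edges and probability distribution of non-existing edges to re-adjust the similarity value of each edge in the training set, so that the similarity of non-existing edges is taken into account. Possible edges are then predicted from the adjusted similarity values, so that not only edges between two nodes with high similarity but also edges between two nodes with low similarity can be predicted, which improves the accuracy of link prediction.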
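Refer again to Figure 5. The network link prediction device provided by the fourth embodiment of the invention is the executing entity of the network link prediction methods of the embodiments shown in Figures 1 and 3. For ease of description, only the parts relevant to the embodiment are shown. The device mainly includes an acquisition module 501, a setting module 502, a calculation processing module 503, and a prediction module 504. The functional modules are described in detail below:
The acquisition module 501 is configured to obtain the network topology of a network.
The network may be an existing network such as a social network, a protein network, or a neural network. The network topology is the topological structure of the network, that is, the physical layout of the nodes, formed by networked computers or devices and the transmission medium, and the lines between them. Nodes in a network topology fall into two categories: transit nodes that convert and exchange information, such as switches, hubs, and terminal controllers; and access nodes, such as host computers and terminals. Network topologies come in various shapes, such as bus, star, ring, tree, and mesh. Each hollow shape in Figure 2 is a node, and the line connecting two nodes is a link, which may also be called an edge of the network topology.
The setting module 502 is configured to divide the edges formed between the nodes in the network topology into existing edges and non-existing edges, and to select a plurality of the existing edges to form a training set.
Which edges of the network topology count as existing edges and which as non-existing edges is defined in advance, and the edges of the network topology are then divided according to this predefined scheme.
The edges forming the training set are selected from the existing edges at random, and their number is not limited. The more edges the training set contains, the more accurate the prediction result.
Further, the setting module 502 is also configured to perform the following steps:
setting an adjacency matrix A = (a_ij)_{N×N} for the network topology, where a_ij denotes the weight of the edge between node i and node j, N is the number of nodes in the network topology, edges with a_ij = 1 are taken as the existing edges, edges with a_ij = 0 are taken as the non-existing edges, and nodes i and j are nodes of the network topology;
selecting a plurality of the existing edges to form the training set E^T, where E = {(i, j) | a_ij ≠ 0} is the set of existing edges;
in the network topology, selecting the existing edges other than those in the training set to form the test set E^P, where
Figure PCTCN2017093676-appb-000010
E^T ∪ E^P = E.
Typically, the training set contains 80%-90% of the existing edges in the whole network topology, and the test set accordingly contains the remaining 20%-10%.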
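The calculation processing module 503 is configured to calculate the similarity of the two nodes of each existing edge in the training set to obtain the similarity value of each existing edge in the training set, and to calculate the similarity of the two nodes of each non-existing edge to obtain the similarity value of each non-existing edge.
The way of calculating the similarity of two nodes is not limited; it may be any one of the common neighbors algorithm, the Jaccard coefficient algorithm, the resource allocation algorithm, the local path index, and the structural perturbation method, or any other similarity algorithm suitable for link prediction.
Taking the common neighbors algorithm as an example, the expression for the similarity of two nodes i and j is:
Figure PCTCN2017093676-appb-000011
where Γ(i) denotes the set of neighbors of node i, Γ(j) denotes the set of neighbors of node j, |...| denotes the number of nodes in a set, and nodes i and j are the two endpoints of an edge in the training set or of a non-existing edge.
The calculation processing module 503 is further configured to divide the edges whose similarity has been calculated into a plurality of sets, and to perform probability statistics on the numbers of existing and non-existing edges in each set to obtain, for each set, the probability distribution of existing edges and the probability distribution of non-existing edges.
Further, the calculation processing module 503 is also configured to divide the obtained similarity values into a plurality of groups, and to place the edges corresponding to the similarity values in each group into one set.
Each group may contain one similarity value or several. Accordingly, the number of elements in each set is greater than or equal to 0.
The probability statistics on the number of edges are illustrated with a concrete example:
The edges whose similarity has been calculated are edges A, B, C, D, and F, where edge A has similarity value 1, edges B and C have similarity value 4, edge D has similarity value 6, and edge F has similarity value 9.
Suppose the values are divided into three groups: group 1 contains 1, group 2 contains 4 and 9, and group 3 contains 6. Then edge A belongs to set 1; edges B, C, and F belong to set 2; and edge D belongs to set 3. Assuming edges A, B, and D are existing edges and edges C and F are non-existing edges, the probability distribution of existing edges in set 1 (given here as a count) is 1 and that of non-existing edges is 0; in set 2, the probability distribution of existing edges is 1 and that of non-existing edges is 2; in set 3, the probability distribution of existing edges is 1 and that of non-existing edges is 0.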
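The calculation processing module 503 is further configured to adjust the obtained similarity values of the existing edges according to a preset conversion function, the probability distribution of the existing edges, and the probability distribution of the non-existing edges.
The edges in the training set are the existing edges in the sets. The similarity values of the existing edges contained in each set are adjusted according to the probability distribution of existing edges and the probability distribution of non-existing edges in that set.
Further, the calculation processing module 503 is also configured to perform the following steps:
calculating, according to the preset conversion function, the ratio between the probability distribution of existing edges and the probability distribution of non-existing edges in each set, where
the preset conversion function is:
Figure PCTCN2017093676-appb-000012
p_r(s) is the probability distribution of the existing edges, p_n(s) is the probability distribution of the non-existing edges, and s is a similarity value contained in the divided groups;
taking the ratio PNR(s) calculated for each set as the adjusted similarity value of the existing edges in that set.
When the similarity value calculated by the calculation processing module 503 is 0, PNR(s) = 0. Denoting the adjusted similarity value by s′, s′ = PNR(s).
The adjusted similarity values of the existing edges within one set are all the same. For example, if PNR(s) calculated for set A is 1 and PNR(s) calculated for set B is 3, then every existing edge in set A has an adjusted similarity value of 1, and every existing edge in set B has an adjusted similarity value of 3.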
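The prediction module 504 is configured to select, among the adjusted similarity values, the similarity values greater than or equal to a preset value, and to take the existing edges corresponding to the selected similarity values as the predicted edges in the network topology.
The preset value is not limited; it may or may not be equal to one of the adjusted similarity values. When the preset value equals an adjusted similarity value, either the similarity values greater than the preset value or those equal to it may be selected; when it does not, the similarity values greater than the preset value may be selected. For example, suppose the adjusted similarity values are 1 and 2, edges a and b correspond to similarity value 1, and edge c corresponds to similarity value 2. If the preset value is 1, then either edges a, b, and c (similarity value greater than or equal to 1) are taken as the predicted edges, or only edge c (similarity value greater than 1) is.
Further, the prediction module 504 is also configured to first sort the adjusted similarity values in descending or ascending order and then, counting from the largest similarity value, select the existing edges corresponding to the similarity value at the L-th position as the predicted edges. L is an integer greater than or equal to 1. If the number of predicted edges is limited, L may be the number of edges in the test set.
The calculation processing module 503 is further configured to calculate the accuracy of the network link prediction from the adjusted similarity values of the selected predicted edges, obtaining the accuracy p_F.
The formula for the accuracy of the network link prediction is:
Figure PCTCN2017093676-appb-000013
where α is a constant,
Figure PCTCN2017093676-appb-000014
|E^P| is the size of the set of test-set edges, |U-E^T| is the size of the set made up of the non-existing edges and the test-set edges, and S = (s_1, s_2) ∪ (s_3, s_4) ∪ ... ∪ (s_{2m-1}, s_{2m}), subject to s_1 < s_2, s_3 < s_4, ..., s_{2m-1} < s_{2m}.
If the number of predicted edges is limited, then
Figure PCTCN2017093676-appb-000015
The set U-E^T, made up of the non-existing edges and the test-set edges, serves as the set of unknown edges.
For details not covered in this embodiment, refer to the descriptions of the embodiments shown in Figures 1 and 3; they are not repeated here.
In this embodiment of the invention, the calculation processing module 503 uses the calculated probability distribution of existing edges and probability distribution of non-existing edges to re-adjust the similarity value of each edge in the training set, so that the similarity of non-existing edges is taken into account. Possible edges are then predicted from the adjusted similarity values, so that not only edges between two nodes with high similarity but also edges between two nodes with low similarity can be predicted, which improves the accuracy of link prediction.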
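In the embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into modules is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication links shown or discussed may be indirect couplings or communication links through some interfaces, devices, or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional modules in the embodiments of the invention may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
If the integrated module is implemented as a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of actions. Those skilled in the art should understand, however, that the invention is not limited by the described order of actions, because according to the invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily all required by the invention.
The descriptions of the embodiments above each have their own emphasis. For parts not detailed in one embodiment, refer to the relevant descriptions of the other embodiments.
The above is a description of the network link prediction method and device provided by the invention. Those skilled in the art may, following the ideas of the embodiments of the invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.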

Claims (10)

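  1. A network link prediction method, characterized by comprising:
    obtaining the network topology of a network;
    dividing the edges formed between the nodes in the network topology into existing edges and non-existing edges, and selecting a plurality of the existing edges to form a training set;
    calculating the similarity of the two nodes of each existing edge in the training set to obtain the similarity value of each existing edge in the training set, and calculating the similarity of the two nodes of each non-existing edge to obtain the similarity value of each non-existing edge;
    dividing the edges whose similarity has been calculated into a plurality of sets, and performing probability statistics on the numbers of existing and non-existing edges in each set to obtain, for each set, the probability distribution of existing edges and the probability distribution of non-existing edges;
    adjusting the obtained similarity values of the existing edges according to a preset conversion function, the probability distribution of the existing edges, and the probability distribution of the non-existing edges;
    among the adjusted similarity values, selecting the similarity values greater than or equal to a preset value, and taking the existing edges corresponding to the selected similarity values as the predicted edges in the network topology.
  2. The method according to claim 1, characterized in that dividing the edges formed between the nodes in the network topology into existing edges and non-existing edges, and selecting a plurality of the existing edges to form a training set, comprises:
    setting an adjacency matrix A = (a_ij)_{N×N} for the network topology, where a_ij denotes the weight of the edge between node i and node j, N is the number of nodes in the network topology, edges with a_ij = 1 are taken as the existing edges, edges with a_ij = 0 are taken as the non-existing edges, and nodes i and j are nodes of the network topology;
    selecting a plurality of the existing edges to form the training set E^T, where E = {(i, j) | a_ij ≠ 0};
    and, after the plurality of existing edges are selected to form the training set, the method further comprises:
    in the network topology, selecting the existing edges other than those in the training set to form the test set E^P, where
    Figure PCTCN2017093676-appb-100001
    E^T ∪ E^P = E.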
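  3. The method according to claim 2, characterized in that dividing the edges whose similarity has been calculated into a plurality of sets comprises:
    dividing the obtained similarity values into a plurality of groups, and placing the edges corresponding to the similarity values in each group into one set.
  4. The method according to claim 3, characterized in that adjusting the obtained similarity values of the existing edges according to the preset conversion function, the probability distribution of the existing edges, and the probability distribution of the non-existing edges comprises:
    calculating, according to the preset conversion function, the ratio between the probability distribution of existing edges and the probability distribution of non-existing edges in each set, where
    the preset conversion function is:
    Figure PCTCN2017093676-appb-100002
    p_r(s) is the probability distribution of the existing edges, p_n(s) is the probability distribution of the non-existing edges, and s is a similarity value contained in the divided groups;
    taking the ratio PNR(s) calculated for each set as the adjusted similarity value of the existing edges in that set.
  5. The method according to claim 4, characterized in that, after selecting among the adjusted similarity values the similarity values greater than or equal to the preset value and taking the existing edges corresponding to the selected similarity values as the predicted edges in the network topology, the method further comprises:
    calculating the accuracy of the network link prediction from the adjusted similarity values of the selected predicted edges, obtaining the accuracy p_F;
    the formula for the accuracy of the network link prediction is:
    Figure PCTCN2017093676-appb-100003
    where α is a constant,
    Figure PCTCN2017093676-appb-100004
    |E^P| is the size of the set of test-set edges, |U-E^T| is the size of the set made up of the non-existing edges and the test-set edges, and s = (s_1, s_2) ∪ (s_3, s_4) ∪ ... ∪ (s_{2m-1}, s_{2m}), subject to s_1 < s_2, s_3 < s_4, ..., s_{2m-1} < s_{2m}.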
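  6. A network link prediction device, characterized in that the device comprises:
    an acquisition module, configured to obtain the network topology of a network;
    a setting module, configured to divide the edges formed between the nodes in the network topology into existing edges and non-existing edges, and to select a plurality of the existing edges to form a training set;
    a calculation processing module, configured to calculate the similarity of the two nodes of each existing edge in the training set to obtain the similarity value of each existing edge in the training set, and to calculate the similarity of the two nodes of each non-existing edge to obtain the similarity value of each non-existing edge;
    and to divide the edges whose similarity has been calculated into a plurality of sets, and perform probability statistics on the numbers of existing and non-existing edges in each set to obtain, for each set, the probability distribution of existing edges and the probability distribution of non-existing edges;
    and to adjust the obtained similarity values of the existing edges according to a preset conversion function, the probability distribution of the existing edges, and the probability distribution of the non-existing edges;
    a prediction module, configured to select, among the adjusted similarity values, the similarity values greater than or equal to a preset value, and to take the existing edges corresponding to the selected similarity values as the predicted edges in the network topology.
  7. The device according to claim 6, characterized in that
    the setting module is further configured to set an adjacency matrix A = (a_ij)_{N×N} for the network topology, where a_ij denotes the weight of the edge between node i and node j, N is the number of nodes in the network topology, edges with a_ij = 1 are taken as the existing edges, edges with a_ij = 0 are taken as the non-existing edges, and nodes i and j are nodes of the network topology;
    and to select a plurality of the existing edges to form the training set E^T, where E = {(i, j) | a_ij ≠ 0};
    and, in the network topology, to select the existing edges other than those in the training set to form the test set E^P, where
    Figure PCTCN2017093676-appb-100005
    E^T ∪ E^P = E.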
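  8. The device according to claim 7, characterized in that
    the calculation processing module is further configured to divide the obtained similarity values into a plurality of groups, and to place the edges corresponding to the similarity values in each group into one set.
  9. The device according to claim 8, characterized in that
    the calculation processing module is further configured to calculate, according to the preset conversion function, the ratio between the probability distribution of existing edges and the probability distribution of non-existing edges in each set, where
    the preset conversion function is:
    Figure PCTCN2017093676-appb-100006
    p_r(s) is the probability distribution of the existing edges, p_n(s) is the probability distribution of the non-existing edges, and s is a similarity value contained in the divided groups;
    and to take the ratio PNR(s) calculated for each set as the adjusted similarity value of the existing edges in that set.
  10. The device according to claim 9, characterized in that
    the calculation processing module is further configured to calculate the accuracy of the network link prediction from the adjusted similarity values of the selected predicted edges, obtaining the accuracy p_F;
    the formula for the accuracy of the network link prediction is:
    Figure PCTCN2017093676-appb-100007
    where α is a constant,
    Figure PCTCN2017093676-appb-100008
    |E^P| is the size of the set of test-set edges, |U-E^T| is the size of the set made up of the non-existing edges and the test-set edges, and s = (s_1, s_2) ∪ (s_3, s_4) ∪ ... ∪ (s_{2m-1}, s_{2m}), subject to s_1 < s_2, s_3 < s_4, ..., s_{2m-1} < s_{2m}.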
PCT/CN2017/093676 2017-07-20 2017-07-20 网络链路预测方法及装置 WO2019014894A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/093676 WO2019014894A1 (zh) 2017-07-20 2017-07-20 网络链路预测方法及装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/093676 WO2019014894A1 (zh) 2017-07-20 2017-07-20 网络链路预测方法及装置

Publications (1)

Publication Number Publication Date
WO2019014894A1 true WO2019014894A1 (zh) 2019-01-24

Family

ID=65016557

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/093676 WO2019014894A1 (zh) 2017-07-20 2017-07-20 网络链路预测方法及装置

Country Status (1)

Country Link
WO (1) WO2019014894A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140207385A1 (en) * 2011-08-26 2014-07-24 Philip Morris Products Sa Systems and methods for characterizing topological network perturbations
CN103678531A (zh) * 2013-12-02 2014-03-26 三星电子(中国)研发中心 好友推荐方法和装置
CN103905246A (zh) * 2014-03-06 2014-07-02 西安电子科技大学 基于分组遗传算法的链路预测方法
CN106817251A (zh) * 2016-12-23 2017-06-09 烟台中科网络技术研究所 一种基于节点相似度的链路预测方法及装置
CN107623586A (zh) * 2017-07-20 2018-01-23 深圳大学 网络链路预测方法及装置

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062486A (zh) * 2019-11-27 2020-04-24 北京国腾联信科技有限公司 一种评价数据的特征分布和置信度的方法及装置
CN111062486B (zh) * 2019-11-27 2023-12-08 北京国腾联信科技有限公司 一种评价数据的特征分布和置信度的方法及装置
CN112347369A (zh) * 2020-10-12 2021-02-09 中国电子科技集团公司电子科学研究院 基于网络表征的集成学习动态社会网络链路预测方法
CN112347369B (zh) * 2020-10-12 2023-09-08 中国电子科技集团公司电子科学研究院 基于网络表征的集成学习动态社会网络链路预测方法
CN112508085A (zh) * 2020-12-05 2021-03-16 西安电子科技大学 基于感知神经网络的社交网络链路预测方法
CN112508085B (zh) * 2020-12-05 2023-04-07 西安电子科技大学 基于感知神经网络的社交网络链路预测方法
CN114660997A (zh) * 2020-12-22 2022-06-24 中国科学院沈阳自动化研究所 一种基于链路预测的安全一体化中两安冲突预测方法
CN114660997B (zh) * 2020-12-22 2024-05-10 中国科学院沈阳自动化研究所 一种基于链路预测的安全一体化中两安冲突预测方法
CN112700056A (zh) * 2021-01-06 2021-04-23 中国互联网络信息中心 复杂网络链路预测方法、装置、电子设备及介质
CN112700056B (zh) * 2021-01-06 2023-09-15 中国互联网络信息中心 复杂网络链路预测方法、装置、电子设备及介质

Similar Documents

Publication Publication Date Title
WO2019014894A1 (zh) 网络链路预测方法及装置
Wang et al. Resource-efficient federated learning with hierarchical aggregation in edge computing
CN107682195B (zh) 基于复杂网络与大数据结合的通信网络鲁棒性评估方法
CN115358487A (zh) 面向电力数据共享的联邦学习聚合优化系统及方法
CN109064348A (zh) 一种在社交网络中封锁谣言社区并抑制谣言传播的方法
EP4131871A1 (en) Method and apparatus for generating network topology
CN113190939B (zh) 基于多边形系数的大型稀疏复杂网络拓扑分析和简化方法
CN114071582A (zh) 面向云边协同物联网的服务链部署方法及装置
CN111181792A (zh) 基于网络拓扑的sdn控制器部署方法、装置及电子设备
CN112487658A (zh) 一种电网关键节点的识别方法、装置及系统
Kalinin et al. Security evaluation of a wireless ad-hoc network with dynamic topology
CN114519306B (zh) 一种去中心化的终端节点网络模型训练方法及系统
WO2018166249A1 (zh) 一种网络业务传输的方法及系统
CN108965287B (zh) 一种基于有限临时删边的病毒传播控制方法
CN111160661A (zh) 一种电力通信网可靠性优化方法、系统以及设备
CN115277115A (zh) 一种用于解决网络上鲁棒信息传播问题的方法及系统
Miccichè et al. A primer on statistically validated networks
CN111600752B (zh) 一种电力通信业务可靠性优化方法及相关装置
CN107623586B (zh) 网络链路预测方法及装置
Otokura et al. Evolutionary core-periphery structure and its application to network function virtualization
CN107332687B (zh) 一种基于贝叶斯估计和共同邻居的链路预测方法
CN115759289A (zh) 基于用户分组协同的联邦学习方法、系统及装置
CN113722554A (zh) 数据分类方法、装置及计算设备
CN105608173A (zh) 一种基于自适应代理的渐进式社区发现方法
Wang et al. Automated allocation of detention rooms based on inverse graph partitioning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17918019

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 22/06/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17918019

Country of ref document: EP

Kind code of ref document: A1