WO2018151619A1 - Network analysis tool testing - Google Patents
- Publication number
- WO2018151619A1 (PCT/RU2017/000085)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- graph
- vectors
- nodes
- edges
- node
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/50—Testing arrangements
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Provided is a process and a device for testing a functionality of a network analysis tool. The process involves receiving an input network dataset, the input network dataset defining a first graph comprising nodes and edges, wherein the edges represent connections between the nodes. The process further involves mapping the nodes to a first set of vectors, wherein the mapping is based on a similarity function assigning connection scores to vector pairs, determining, based on the first set of vectors, a second set of vectors, wherein each vector of the second set of vectors represents a node of a second graph, and determining edges connecting nodes of the second graph, based on the similarity function.
Description
NETWORK ANALYSIS TOOL TESTING
FIELD
The present disclosure relates to testing network analysis tools. In particular, the present disclosure relates to testing a functionality of a network analysis tool by analyzing artificially generated network data with the network analysis tool.
BACKGROUND
In order to test a network analysis tool such as a network mining tool, the network analysis tool may be fed with graphs representing different genuine or artificial networks and the analysis results may be evaluated to verify the functionality of the tool. In this regard, it may be beneficial to use a plurality of different graphs with similar properties to achieve statistically meaningful results.
SUMMARY
According to a first aspect of the present invention, there is provided a method for testing a functionality of a network analysis tool, the method comprising receiving an input network dataset, the input network dataset defining a first graph, the first graph comprising nodes and edges, the edges representing connections between the nodes, mapping the nodes to a first set of vectors, wherein the mapping determines a similarity function assigning connection scores to vector pairs, determining, based on the first set of vectors, a second set of vectors, wherein each vector of the second set of vectors represents a node of a second graph, and determining edges connecting nodes of the second graph, based on the similarity function.
In this regard, it is noted that the term "network analysis tool" as used throughout the description and claims may equally refer to software, hardware, or a combination of software and hardware. For instance, the network analysis tool may be a combination of hardware and software, e.g., a computing device storing computer-readable instructions, which receives the input network dataset.
As the input network dataset describes a network topology, e.g., computing devices and communication links, the network topology may be analyzed to derive patterns that provide enhanced insight into the network topology. Furthermore, effects of changes in the network topology may be analyzed, e.g., by randomly or systematically modifying the network topology. For instance, the impact of different changes on the network topology may be simulated, or critical changes may be extracted. Similarly, the network topology may be modified to increase its robustness to adverse events such as node or communication link malfunctions.
However, it is clear to persons of skill in the art that communication networks are just one example of a network that could be analyzed using the network analysis tool. In principle, the network analysis tool may be used to analyze a wide range of different networks. For example, the network analysis tool may be used to analyze a transport network, one or more linked webpages, biological systems, the syntax of a (natural) language, a retail network, an advertising network, or a social network, and in fact any kind of network whose topology can be described by a graph.
Moreover, the term "similarity function" as used throughout the description and claims in particular refers to a function that quantifies the similarity between nodes by a connection score, wherein a higher similarity, which may be represented by a higher connection score, may indicate a higher likelihood of an edge connecting the nodes. For example, the connection score may be a real number, wherein higher numbers indicate a higher likelihood of the respective nodes being connected by an edge.
In a first possible implementation form of the first aspect, the method further comprises using the network analysis tool to analyze a network comprising the nodes and the edges of the second graph.
Hence, the network analysis tool may receive the second graph as input and derive one or more patterns from the second graph. As the second graph is derived from the first graph, the second graph may differ in size from the first graph, e.g., the second graph may comprise less than half, less than one-fifth of, less than one-tenth of, less than one-hundredth of, etc., or more than two times, more than five times, more than ten times, more than one-hundred times, etc., the number of nodes of the first graph, but still exhibit the same or similar properties/patterns as the first graph. For example, an analysis result of the first graph and an analysis result of the second graph may be compared and if the results are not consistent with each other, the network analysis tool may be adapted/corrected or discarded. Optionally, further tests may be performed to analyze a statistical meaning and/or the basis of observed deviations.
In a second possible implementation form of the first aspect, the first graph is a directed graph and determining edges connecting nodes of the second graph comprises determining, for each ordered node pair of the second graph, whether an edge connects a first node of the node pair to a second node of the node pair, based on a first connection score, and whether the edge connects the second node of the node pair to the first node of the node pair, based on a second connection score.
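The two-score decision for an ordered node pair can be sketched as follows. This is a minimal illustration, not the patent's prescribed model: it assumes each node carries a separate "source" vector (for outgoing edges) and "target" vector (for incoming edges), that the connection score is their dot product, and that a fixed threshold decides each direction. The vectors, the threshold and the function names are hypothetical.

```python
# Hypothetical 2-dimensional embeddings: a "source" vector per node for
# outgoing edges and a "target" vector per node for incoming edges.
SOURCE = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
TARGET = {"a": [0.0, 1.0], "b": [0.0, 0.0]}


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def connection_score(src, dst):
    """Asymmetric score for the ordered pair src -> dst."""
    return dot(SOURCE[src], TARGET[dst])


def directed_edges(u, v, threshold):
    """Decide both directions of the pair {u, v} from two separate scores."""
    edges = []
    if connection_score(u, v) > threshold:  # first connection score: u -> v
        edges.append((u, v))
    if connection_score(v, u) > threshold:  # second connection score: v -> u
        edges.append((v, u))
    return edges


print(directed_edges("a", "b", 0.5))  # [('b', 'a')]
```

Because the two scores are computed from different vectors, an edge can exist in one direction only, which is what a directed graph requires.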
Accordingly, the presented method can be used to generate graphs having directed connections such as, for example, graphs representing data traffic such as graphs representing a (wireless) communication network, the distribution of goods, etc.
In a third possible implementation form of the first aspect, all vectors of the second set of vectors are determined based on randomly or pseudo-randomly drawing vectors from a multidimensional probability distribution approximated from the first set of vectors.
For instance, a multidimensional probability distribution may be fitted to the first set of vectors. Accordingly, a structure of the original graph may be preserved while the artificial graphs may be up-scaled or down-scaled. Moreover, the original graph may not be recoverable from the artificial graphs, thereby allowing features of a network to be made public while keeping the detailed network structure private/confidential.
In a fourth possible implementation form of the first aspect, all vectors of the second set of vectors are determined by selecting vectors from the first set of vectors.
For example, the second set may comprise a subset of the vectors of the first set and/or the vectors of the first set may be duplicated to generate a down-scaled or up-scaled artificial graph having similar properties.
In a fifth possible implementation form of the first aspect, all vectors of the second set of vectors are determined by selecting a vector of the first set of vectors and adding a noise vector to the selected vector.
Hence, besides generating a down-scaled or up-scaled artificial graph, graph properties may be randomly modified to provide the artificial graph with similar yet randomly modified properties compared to the original graph, thereby allowing systematic testing of the significance of the network analysis tool results.
In a sixth possible implementation form of the first aspect, the noise vector is randomly or pseudo-randomly drawn from a multidimensional Gaussian probability distribution.
Hence, a multitude of statistically similar yet different artificial graphs may be generated that provide for graph samples within a region around the original graph.
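The three ways of obtaining the second set of vectors described in the third to sixth implementation forms can be sketched in plain Python. This is a minimal sketch under stated assumptions: the fitted distribution is taken to be a Gaussian with diagonal covariance (the text does not fix the distribution family), the noise level `sigma` is a hypothetical value, and all function names are illustrative.

```python
import random
import statistics


def fit_diagonal_gaussian(vectors):
    """Per-dimension mean and standard deviation of the first set of vectors."""
    columns = list(zip(*vectors))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return means, stds


def draw_from_fit(vectors, m, rng):
    """Third form: draw m vectors from a distribution fitted to the first set."""
    means, stds = fit_diagonal_gaussian(vectors)
    return [[rng.gauss(mu, sd) for mu, sd in zip(means, stds)] for _ in range(m)]


def resample(vectors, m, rng):
    """Fourth form: select m vectors (with repetition) from the first set."""
    return [list(rng.choice(vectors)) for _ in range(m)]


def resample_with_noise(vectors, m, rng, sigma=0.05):
    """Fifth and sixth forms: a selected vector plus isotropic Gaussian noise."""
    return [[x + rng.gauss(0.0, sigma) for x in v] for v in resample(vectors, m, rng)]


rng = random.Random(0)
first_set = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
second_set = resample_with_noise(first_set, m=6, rng=rng)  # up-scaled: 6 > 3 nodes
print(len(second_set), len(second_set[0]))  # 6 2
```

Choosing `m` smaller or larger than the number of input vectors yields a down-scaled or up-scaled second graph; only the resampling variants keep a one-to-one link to input nodes, which the inheritance forms rely on.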
In a seventh possible implementation form of the first aspect, the nodes of the first graph are assigned to communities and a node of the second graph corresponding to a selected vector of the first set inherits a respective community assignment of the node corresponding to the selected vector.
The communities may, for example, represent sets of nodes that are densely connected among themselves but more sparsely connected to the rest of the network.
Hence, artificial graphs with communities having a similar (in a statistical sense) yet different structure compared to the original graph may be generated.
In an eighth possible implementation form of the first aspect, the edges of the first graph are assigned weights and an edge of the second graph connecting nodes corresponding to selected vectors of the first set inherits a respective weight of an edge of the first graph connecting the nodes corresponding to the selected vectors.
Hence, additional features of the network modelled by the graph may be encoded in the edge weights and at least partially preserved in the generated artificial graphs. For example, the weights may correspond to the bandwidth of a communication link, a transport capacity, etc.
In a ninth possible implementation form of the first aspect, if the nodes corresponding to the selected vectors are not connected in the first graph, said edge of the second graph is assigned a minimal weight among all edges of the first graph.
Hence, a weight structure of the original graph may be maintained while generating artificial graphs having characteristics similar to those of the original graph.
In a tenth possible implementation form of the first aspect, determining edges connecting nodes of the second graph based on the similarity function further comprises comparing connection scores of pairs of nodes of the second graph with a threshold.
For example, edges between nodes of the second graph may be added, if the connection scores of the respective node pairs are above the threshold.
In an eleventh possible implementation form of the first aspect, the threshold is determined to discriminate, based on the similarity function, top-E node pairs of the first graph with relatively higher connection scores from the rest of node pairs, where E is a number of the edges in the first graph.
According to a second aspect of the present invention, there is provided a computer-readable medium, storing instructions which if executed by a computer cause the computer to load an input network dataset, the input network dataset defining a first graph, the first graph comprising nodes and edges, the edges representing connections between the nodes, map the nodes to a first set of vectors, wherein the mapping is based on a similarity function assigning connection scores to vector pairs, determine, based on the first set of vectors, a second set of vectors, wherein each vector of the second set of vectors represents a node of a second graph, and determine edges connecting nodes of the second graph, based on the similarity function.
For example, the computer may be provided with a storage storing the instructions and the input network data set, or the computer may retrieve the input data set via a network connection. Moreover, the computer may be caused, by executing instructions stored on the computer-readable medium, to analyze network data and generate the input network dataset. For instance, the instructions may cause the computer to request data on computing entities and data connections between the computing entities of a network and to map the computing entities to nodes of the first graph and the data connections to edges of the first graph.
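A minimal sketch of the mapping from computing entities and data connections to the first graph, assuming connection records of the form (source entity, target entity, weight) and an optional mapping of entities to community labels. The `Graph` class, the `build_input_graph` function and the entity names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Directed weighted graph G = (N, E) with optional community labels."""
    nodes: list = field(default_factory=list)        # node index -> entity name
    weights: dict = field(default_factory=dict)      # (i, j) -> w_ij
    communities: dict = field(default_factory=dict)  # i -> set of labels C_i


def build_input_graph(connections, labels=None):
    """Map computing entities to nodes and data connections to edges."""
    g = Graph()
    index = {}

    def node_id(entity):
        # Assign consecutive node indices in order of first appearance.
        if entity not in index:
            index[entity] = len(g.nodes)
            g.nodes.append(entity)
        return index[entity]

    for src, dst, weight in connections:
        g.weights[(node_id(src), node_id(dst))] = weight
    for entity, comms in (labels or {}).items():
        if entity in index:
            g.communities[index[entity]] = set(comms)
    return g


records = [("router-a", "server-1", 10.0), ("server-1", "router-a", 10.0),
           ("router-a", "router-b", 0.1)]
g = build_input_graph(records, labels={"router-a": {"a", "b"}, "server-1": {"a"}})
print(len(g.nodes), len(g.weights))  # 3 3
```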
In a first possible implementation form of the second aspect, the computer-readable medium further stores instructions which if executed by the computer cause the computer to execute a network analysis tool and analyze a network comprising the nodes and the edges of the second graph. Hence, a modified, e.g., down-scaled or up-scaled, second graph can be derived from the first graph, wherein the derived graph has properties similar to those of the first graph. Thus, a comparison between the results of an analysis of a network corresponding to the first graph and networks corresponding to derived second graphs can be used to verify that the network analysis tool derives similar patterns when analyzing networks having similar properties.
According to a third aspect of the present invention, there is provided a network analysis tool testing apparatus, comprising a processor and persistently stored instructions which, if executed by the processor, cause the processor to load an input network dataset, the input network dataset defining a first graph, the first graph comprising nodes and edges, the edges representing connections between the nodes, map the nodes to a first set of vectors, wherein the mapping is based on a similarity function assigning connection scores to vector pairs, determine, based on the first set of vectors, a second set of vectors, wherein each vector of the second set of vectors represents a node of a second graph, determine edges connecting nodes of the second graph, based on the similarity function, and store an output network dataset, the output network dataset defining the second graph.
For instance, the apparatus may implement the method according to the first aspect and the implementation forms of the first aspect and use the second graph to test the network analysis tool.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 shows a flow-chart of a process for generating an output graph from an input graph;
Fig. 2 shows exemplary input and output graphs used/generated by the process of Fig. 1;
Fig. 3 illustrates the application of the process of Fig. 1 for use in relation to a network mining tool; and
Fig. 4 shows a block diagram of a network mining tool testing apparatus.
DETAILED DESCRIPTION
Fig. 1 and Fig. 2 illustrate a process 10 for generating an output graph 12 from an input graph 14. As indicated at step 16, the process 10 starts with receiving an input network dataset defining the input graph 14. In this regard, the following notation is used in the remainder:
For instance, the input graph 14 illustrated in Fig. 2 may be a directed weighted graph G = (N, E) with a community structure, as indicated by the circles around nodes (0,1,2,3) and (3,4,5,6,7), respectively. Thus, each node $n_i$ may have an assigned community label $C_i$. However, it is to be noted that, depending on the graph, no community label or a set of community labels may be assigned to a node. Moreover, each edge $n_i \to n_j$ may have an assigned weight $w_{ij}$.
As indicated at step 18, the input graph 14 is mapped to vectors. In particular, the graph G = (N, E) may be embedded by mapping the nodes to real-valued vectors. For example, the directed weighted graph G may be embedded based on a bilinear link model, BLM, or using Large-scale Information Network Embedding, LINE, although other techniques such as DeepWalk (cf. B. Perozzi, R. Al-Rfou, and S. Skiena, "DeepWalk: Online Learning of Social Representations," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2014) or node2vec (cf. A. Grover and J. Leskovec, "node2vec: Scalable Feature Learning for Networks," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2016) may be used instead.
For instance, a BLM designed specifically for directed graphs has been suggested by O. U. Ivanov and S. O. Bartunov in "Learning Representations in Directed Networks," presented at the International Conference on Analysis of Images, Social Networks and Texts, and published on pages 196-207 in Volume 542 of the series "Communications in Computer and Information Science" by Springer in 2015, and is given by:
hence allowing the embedding of a directed graph. Having regard to the joint link probability
the objective function of the embedding is a log-likelihood of the whole graph:
For the softmax approximation, a technique called noise contrastive estimation, NCE, presented by M. Gutmann and A. Hyvärinen in "Noise-Contrastive Estimation: A New Estimation Principle for Unnormalized Statistical Models," in AISTATS, Volume 1, 2010, page 6, may be used. This technique is directed at the estimation of unnormalized probabilistic models, treating the normalizing constant as an additional parameter. The key idea is to reduce the task of probability density learning to a binary classification problem, namely, distinguishing the data distribution $p_d(x)$ from a noise distribution $p_n(x)$. Applied to graph embedding and assuming that noise samples are k times more frequent than data samples, the mixture distribution takes the form:
Moreover, the posterior probability that a sample x is from the data distribution is:
If a model $p_\theta(x)$ with parameter set $\theta$ aims at fitting the data distribution, the posterior probability becomes a function of $\theta$:
This approximates $p_d(x)$ without a normalization requirement with regard to $p_\theta(x)$. In the BLM setup, the normalization constant becomes a new parameter, resulting in a new parameter set and the following probabilistic model:
Taking into account that $p_d$ corresponds to the graph edges and choosing $p_n$ as
the new objective becomes:
Thus, the initial objective in BLM may be replaced with the NCE objective, which can be optimized efficiently.
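For reference, the NCE quantities mentioned above are usually written as follows. This uses the standard notation of the NCE literature with k noise samples per data sample; the patent's own equations are not preserved in this text, so the exact symbols here are an assumption.

```latex
% Mixture of data and noise samples (noise k times more frequent than data)
p(x) = \frac{1}{k+1}\, p_d(x) + \frac{k}{k+1}\, p_n(x)

% Posterior probability that x stems from the data distribution
P(D = 1 \mid x) = \frac{p_d(x)}{p_d(x) + k\, p_n(x)}

% Same posterior with the unnormalized model p_\theta in place of p_d
P(D = 1 \mid x; \theta) = \frac{p_\theta(x)}{p_\theta(x) + k\, p_n(x)}

% NCE objective: log-likelihood of correctly telling data from noise
J(\theta) = \mathbb{E}_{x \sim p_d}\big[\log P(D = 1 \mid x; \theta)\big]
          + k\, \mathbb{E}_{\tilde{x} \sim p_n}\big[\log\big(1 - P(D = 1 \mid \tilde{x}; \theta)\big)\big]
```

The posterior follows from the mixture by Bayes' rule: the common factor $1/(k+1)$ cancels, leaving $p_d(x)$ over $p_d(x) + k\,p_n(x)$.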
The LINE approach for embedding a directed weighted graph, suggested by J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei in "LINE: Large-scale Information Network Embedding," published in the Proceedings of the 24th International Conference on World Wide Web, ACM, 2015, pages 1067-1077, is based on so-called first-order and second-order proximities between nodes. The first-order proximity of the nodes $n_i$ and $n_j$ is indicated by the edge weight $w_{ij}$ and characterizes the strength of the relation between the nodes:
The second-order proximity characterizes the relationship of a node $n_j$ with its context $n_i$:
This in fact coincides with the softmax approximation in the BLM.
Furthermore, it is suggested to optimize the two corresponding objectives separately:
The embedding vectors $u_i$ of both models may then be concatenated.
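The two LINE proximities can be illustrated as below, using the model probabilities from the LINE paper: a logistic function of $u_i \cdot u_j$ for the first order and a softmax over all nodes of context-vector scores for the second order. This is a sketch with hypothetical toy vectors, not the patent's implementation; the full softmax denominator makes each evaluation cost O(|N|), which motivates the approximations discussed next.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def first_order_prob(u, i, j):
    """LINE first-order model: p1(n_i, n_j) = sigmoid(u_i . u_j)."""
    return sigmoid(dot(u[i], u[j]))


def second_order_prob(u, ctx, i, j):
    """LINE second-order model: p2(n_j | n_i) = softmax_j(ctx_j . u_i).

    The denominator runs over every node, i.e. O(|N|) work per pair.
    """
    logits = [dot(c, u[i]) for c in ctx]
    top = max(logits)  # shift by the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    return exps[j] / sum(exps)


u = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # node vectors (toy values)
ctx = [[0.0, 0.0]] * 3                      # context vectors (toy values)
print(round(second_order_prob(u, ctx, 0, 1), 4))  # 0.3333
print(first_order_prob(u, 0, 1))                  # 0.5
```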
In order to reduce the summation complexity of the denominator, negative sampling, NEG, a technique known from language modelling and described by T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean in "Distributed Representations of Words and Phrases and their Compositionality," published in the Proceedings of the Advances in Neural Information Processing Systems, 2013, pages 3111-3119, may be used. NEG is a simplification of NCE which does not approximate the softmax but nevertheless retains the quality of the embedding vectors. This is achieved by replacing the term $k\,p_n(x)$ with 1 and ignoring the normalizing constant, which results in:
For LINE, this has the form:
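As an illustration of the negative-sampling objective discussed above, the following sketch trains separate source and target vectors for a generic dot-product (bilinear) link score. It is not the exact BLM or LINE implementation: the uniform noise distribution, the hyperparameters (`dim`, `k`, `epochs`, `lr`) and the toy graph are assumptions. Each observed edge raises $\log\sigma(s_{ij})$ and each noise pair lowers the score through $\log\sigma(-s_{il})$, so no softmax denominator is ever computed.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def train_neg(edges, num_nodes, dim=16, k=3, epochs=200, lr=0.05, seed=0):
    """Learn source vectors S and target vectors T by negative sampling."""
    rng = random.Random(seed)
    S = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_nodes)]
    T = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_nodes)]
    for _ in range(epochs):
        for i, j in edges:
            # One positive target plus k uniformly drawn noise targets (assumption).
            samples = [(j, 1.0)] + [(rng.randrange(num_nodes), 0.0) for _ in range(k)]
            for l, label in samples:
                # Derivative of the NEG log-likelihood w.r.t. the score s_il.
                g = lr * (label - sigmoid(dot(S[i], T[l])))
                for d in range(dim):
                    # Simultaneous gradient-ascent update of both vectors.
                    S[i][d], T[l][d] = S[i][d] + g * T[l][d], T[l][d] + g * S[i][d]
    return S, T


edges = [(i, (i + 1) % 10) for i in range(10)]  # a directed 10-cycle (toy data)
S, T = train_neg(edges, num_nodes=10)
edge_scores = [dot(S[i], T[j]) for i, j in edges]
other = [dot(S[i], T[j]) for i in range(10) for j in range(10) if (i, j) not in edges]
print(sum(edge_scores) / len(edge_scores) > sum(other) / len(other))
```

After training, edges should on average score higher than non-edge pairs, which is the property the thresholding step below relies on.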
Other similarity functions may be used instead. Continuing with the example of Fig. 2, the input graph 14 G has |N| = 8 nodes and |E| = 28 edges, wherein two edges, (0,5) and (2,6), have weight 0.1 and all other edges have weight 10. As indicated above, the overlapping communities of the input graph 14 G are a = (0,1,2,3) and b = (3,4,5,6,7), with modularity $Q_G = 0.2955$ (cf. M. Drobyshevskiy, A. Korshunov, and D. Turdakov, "Parallel Modularity Computation for Directed Weighted Graphs with Overlapping Communities," in Proceedings of the Institute for System Programming, Volume 28(6), 2016, pages 153-170). The input graph 14 G = (N, E) with N = {0,1,2,3,4,5,6,7}, E = {(0,1), (1,0), (0,2), ..., (7,6)}, weight labels {$w_{01} = 10$, $w_{10} = 10$, ..., $w_{05} = 0.1$, ..., $w_{76} = 10$}, and community labels may be embedded using BLM with the following parameters:
- epochs=600: number of epochs,
This may result in two vectors and a normalization parameter $Z_i$ per node:
The node pairs $(n_i, n_j) \in N \times N$ may then be ranked by their similarity score $s_{ij}$. If the embedding is successful, (almost) all edges should have higher scores than non-edge pairs, and a threshold $t_G$ in $s_{ij}$ may be set which has rank $|E|$. Said threshold $t_G$ (approximately) separates edges from non-edges.
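Selecting the threshold of rank |E| amounts to sorting all pair scores in descending order and taking the |E|-th value. The sketch below does this and also counts how many edges fall below and how many non-edges rise above the threshold, as a check of how well the embedding separates them. The score values are hypothetical, and ties at the threshold are not treated specially.

```python
def rank_threshold(scores, num_edges):
    """t_G: the score ranked |E|-th when all node pairs are sorted descending.

    `scores` maps ordered node pairs (i, j) to similarity scores s_ij.
    """
    ranked = sorted(scores.values(), reverse=True)
    return ranked[num_edges - 1]


def separation_errors(scores, edges, t_g):
    """Count edges scored below t_G and non-edges scored above it."""
    missed = sum(1 for p in edges if scores[p] < t_g)
    spurious = sum(1 for p, s in scores.items() if p not in edges and s > t_g)
    return missed, spurious


scores = {(0, 1): 0.9, (1, 0): 0.8, (0, 2): 0.1, (2, 0): -0.3, (1, 2): 0.7, (2, 1): 0.2}
edges = {(0, 1), (1, 0), (1, 2)}
t_g = rank_threshold(scores, len(edges))
print(t_g, separation_errors(scores, edges, t_g))  # 0.7 (0, 0)
```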
Then, the two vectors of each node may be concatenated into a corresponding embedding vector.
Continuing with the above example, ranking all node pairs in $N \times N$ using the BLM similarity score may result in the following ordered list:
The threshold $t_G$ in $s_{ij}$ having rank $|E|$ may thus be computed as $t_G = 0.639097846078$. Moreover, concatenating the two vectors of each node into an embedding vector leads to:
For example, in order to approximate the distribution of the embedding vectors when sampling/drawing the vectors representing the output graph 12:
- the embedding vectors may be randomly sampled, or
- a multi-dimensional probability distribution may be fitted to the embedding vectors and the vectors representing the output graph 12 may be randomly drawn from said distribution.
As the number of vectors representing the output graph 12 corresponds to the size $m = |M|$ of the output graph 12 H = (M, F), 16 vectors may be sampled/drawn in relation to the example of Fig. 2.
For example, $m$ vectors may be randomly picked (with repetitions) from the set of embedding vectors. E.g., for each $j = 1, \dots, m$, an index $i$ may be drawn uniformly from $[1, |N|]$ and the assignment $q_j = r_i$ may be made. As the vectors $q_j$ correspond to the nodes of the output graph 12 H, each vector $q_j$ may then be de-concatenated into two vectors of equal length. The selected vectors may then be assigned to the nodes of the output graph 12 H:
In this regard, it should however be noted that instead of concatenating and de-concatenating vectors, the selecting and assigning may also be performed on the basis of vector sets, where each set contains a first vector in relation to outgoing edges and a second vector in relation to ingoing edges.
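The concatenate, resample and de-concatenate steps can be sketched as follows. This is a minimal sketch under stated assumptions: indices are 0-based (the text draws $i$ from $[1, |N|]$), and the function names and toy vectors are hypothetical. The list `origin` records, for every output node $m_j$, the input node whose vector was sampled, which is exactly what weight and community inheritance need later.

```python
import random


def concatenate(source_vecs, target_vecs):
    """r_i = [source_i | target_i] for every input node n_i."""
    return [s + t for s, t in zip(source_vecs, target_vecs)]


def sample_output_nodes(r, m, seed=0):
    """Pick m embedding vectors with repetition and remember their origin.

    Returns (origin, source_vecs, target_vecs), where origin[j] is the index i
    of the input node whose vector r_i was assigned to output node m_j.
    """
    rng = random.Random(seed)
    half = len(r[0]) // 2
    origin = [rng.randrange(len(r)) for _ in range(m)]  # i drawn uniformly from N
    q = [r[i] for i in origin]                          # assignment q_j = r_i
    # De-concatenate each q_j into two vectors of equal length.
    return origin, [v[:half] for v in q], [v[half:] for v in q]


r = concatenate([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.2, 0.8]])
origin, out_src, out_tgt = sample_output_nodes(r, m=5)
print(len(origin), all(out_src[j] + out_tgt[j] == r[origin[j]] for j in range(5)))  # 5 True
```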
As indicated at step 22, edges between nodes of the output graph 12 H, whose edge set F is yet to be determined, may then be determined based on the similarity function by connecting the top |F| pairs of nodes ranked by the similarity function. For instance, the similarity score $z_{kl}$ may be computed as a softmax for all pairs $(m_k, m_l) \in M \times M$. For pairs with score $z_{kl} > t_G$, node $m_k$ may be connected to node $m_l$ with an edge. The output graph 12 H = (M, F) may thus have a set of directed edges F.
Continuing with the above example, the similarity scores $z_{kl}$ may be:
As a result, the edges F = {(0,2), (0,4), ..., (15,14)} between the nodes $m_k$ and $m_l$ of the output graph 12 may be determined.
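Edge determination on the output graph can be sketched by scoring every ordered pair of output nodes and keeping those above $t_G$. As assumptions, this sketch uses the plain dot product of the out-vector of $m_k$ and the in-vector of $m_l$ as $z_{kl}$ (rather than a softmax), skips self-loops, and uses hypothetical vectors; $t_G$ must come from the same score function on the input graph for the comparison to be meaningful.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def output_edges(src, tgt, t_g):
    """Directed edges m_k -> m_l of H for every ordered pair with z_kl > t_G."""
    m = len(src)
    return [(k, l) for k in range(m) for l in range(m)
            if k != l and dot(src[k], tgt[l]) > t_g]


src = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # out-vectors of m_0, m_1, m_2
tgt = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]   # in-vectors of m_0, m_1, m_2
print(output_edges(src, tgt, t_g=0.5))  # [(0, 1), (1, 0), (2, 1)]
```

The double loop is O(m²); for large output graphs an approximate nearest-neighbour search over the vectors would be a natural replacement.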
Moreover, if the edges of the input graph 14 G have different weights, weights may be assigned to each of the |F| edges of the output graph 12 H by inheriting the edge weights of the corresponding edges of the input graph 14 G. Furthermore, for edges of the output graph 12 H which have no corresponding edges in the input graph 14 G, a minimal edge weight may be assigned, e.g., the minimal weight of all edges of the input graph 14 G. I.e., for each edge $(k, l) \in F$, the edge weight $w_{kl}$ may be set to the weight $w_{ij}$ of the edge $(n_i, n_j)$ of the input graph 14 G if the node vectors $q_k$ and $q_l$ were sampled from the node vectors $r_i$ and $r_j$ and said edge exists; otherwise, a minimal weight may be assigned.
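Weight inheritance reduces to a dictionary lookup once the origin of every output node is known. In this sketch, `origin[k]` is assumed to be the 0-based index of the input node whose vector was sampled for output node $m_k$; the data and the function name are hypothetical.

```python
def inherit_weights(out_edges, origin, in_weights):
    """Weight for each output edge (k, l).

    The edge inherits w_ij of the input edge (n_i, n_j) with i = origin[k] and
    j = origin[l] if that edge exists; otherwise it gets the minimal weight
    over all input edges.
    """
    w_min = min(in_weights.values())
    return {(k, l): in_weights.get((origin[k], origin[l]), w_min)
            for k, l in out_edges}


in_weights = {(0, 1): 10.0, (1, 0): 10.0, (0, 2): 0.1}
origin = [0, 1, 2, 0]
print(inherit_weights([(0, 1), (3, 1), (1, 2), (2, 0)], origin, in_weights))
# {(0, 1): 10.0, (3, 1): 10.0, (1, 2): 0.1, (2, 0): 0.1}
```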
Continuing with the above example:
Moreover, community labels may be assigned to each of the |M| nodes of graph H by inheriting the corresponding community labels of the input graph 14 G, if the input graph 14 G has a community structure, i.e., community labels $C_i$. I.e., for each node $m_j \in M$, its community label set $C'_j$ may be determined by $C'_j = C_i$ if the corresponding node vector $q_j$ is sampled from node vector $r_i$.
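Community inheritance follows the same pattern: each output node copies the label set of the input node its vector was sampled from. The sketch assumes the same hypothetical `origin` list of 0-based input indices and treats unlabeled input nodes as having an empty label set.

```python
def inherit_communities(origin, in_communities):
    """C'_j = C_i whenever q_j was sampled from r_i (empty if n_i is unlabeled)."""
    return {j: set(in_communities.get(i, ())) for j, i in enumerate(origin)}


in_communities = {0: {"a"}, 3: {"a", "b"}, 5: {"b"}}
print(inherit_communities([3, 5, 0, 7], in_communities)[1])  # {'b'}
```

Because several output nodes may be sampled from the same input node, overlapping memberships such as node 3's {a, b} are reproduced wherever that vector is reused.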
Continuing with the above example:
As a result, the output graph 12 H = (M, F) with |M| = 16 nodes and |F| = 99 edges and two overlapping communities a = (0,5,6,14,15) and b = (1,2,3,4,7,8,9,10,11,12,13,14) is determined, wherein the edges within the communities have high weights while the edges between the communities have low weights. Moreover, the output graph 12 H = (M, F) has a degree distribution and a distribution of 3-node subgraphs similar to those of the input graph 14 G = (N, E), and a relatively high modularity ($Q_H = 0.2374$) with respect to the communities.
In summary, the above process 10 of generating random output graphs 12 which have properties similar to those of a given input graph 14 provides the following benefits:
- automatic learning of the degree distribution, subgraph distribution, and community structure from a given graph and reproducing them in synthetic graphs,
- enabling synthetic graphs of arbitrary size, and
- supporting directed weighted graphs with communities.
However, it is clear to the skilled person that the above process 10 is not limited to directed weighted graphs with communities but that the process 10 may also be applied to graphs which are not directed, graphs which have edges without weight, and/or graphs without community structure. Moreover, the (directed) (weighted) input graph 14 (with communities) may - in principle - be from any graph domain (social, mobile, biological, etc.).
As shown in Fig. 3, the output graph 12 may be used for the development and significance testing of network mining tools, e.g., with regard to community detection. Furthermore, since the output graph 12 can be made arbitrarily large, the scalability of a network mining tool can be evaluated by testing it with multiple output graphs 12 which are all generated from the same input graph 14 but differ in size. For instance, the network mining tool may be tested with output graphs 12 having half, one-fifth, one-tenth, one-hundredth, etc., and/or two times, five times, ten times, one-hundred times, etc., the number of nodes of the input graph 14. The analysis results gained by analyzing such output graphs 12 may be compared and, if the results are consistent with each other (for a sufficiently large number of output graphs 12), the scalability of the network mining tool may be verified.
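A consistency check of this kind can be sketched as a small harness that runs a tool on the input graph and on several output graphs and compares the results. Here the "tool" is a hypothetical stand-in (the share of edges whose endpoints share a community label), graphs are given as hypothetical (edges, communities) pairs, and the relative tolerance `rel_tol` is an assumed acceptance criterion; a real test would plug in the actual network mining tool and many generated graphs.

```python
def intra_community_fraction(edges, communities):
    """Toy stand-in for a network mining tool: the share of edges whose two
    endpoints have at least one community label in common."""
    if not edges:
        return 0.0
    inside = sum(1 for k, l in edges
                 if communities.get(k, set()) & communities.get(l, set()))
    return inside / len(edges)


def scalability_check(reference, outputs, tool, rel_tol=0.2):
    """True if the tool's result on every output graph stays within rel_tol
    (relative) of its result on the reference (input) graph."""
    base = tool(*reference)
    return all(abs(tool(*graph) - base) <= rel_tol * abs(base) for graph in outputs)


# Hypothetical (edges, communities) pairs for an input and a larger output graph.
ref = ([(0, 1), (1, 0), (0, 2)], {0: {"a"}, 1: {"a"}, 2: {"b"}})
bigger = ([(0, 1), (1, 0), (3, 4), (4, 3), (0, 3), (1, 4)],
          {0: {"a"}, 1: {"a"}, 3: {"b"}, 4: {"b"}})
print(scalability_check(ref, [bigger], intra_community_fraction))  # True
```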
Moreover, data anonymization can be achieved, making it possible to publish network features without making the exact structure of the network public. Finally, if a large network is difficult to analyze due to its size, the process 10 may be applied to create a representative sample of such a network, i.e., a smaller output graph 12 with similar properties.
Fig. 4 shows a block diagram of a network mining tool testing apparatus 24. The apparatus 24 comprises a processor 26 and a computer-readable medium 28 persistently storing instructions which, if executed by the processor 26, implement some or all steps of the above-described process 10.
Claims
1. A method for testing a functionality of a network analysis tool, the method comprising: receiving an input network dataset, the input network dataset defining a first graph, the first graph comprising nodes and edges, the edges representing connections between the nodes; mapping the nodes to a first set of vectors, wherein the mapping determines a similarity function assigning connection scores to vector pairs; determining, based on the first set of vectors, a second set of vectors, wherein each vector of the second set of vectors represents a node of a second graph; and determining edges connecting nodes of the second graph, based on the similarity function.
2. The method of claim 1, further comprising: using the network analysis tool to analyze a network comprising the nodes and the edges of the second graph.
3. The method of claim 1 or 2, wherein the first graph is a directed graph and determining edges connecting nodes of the second graph comprises determining, for each ordered node pair of the second graph: whether an edge connects a first node of the node pair to a second node of the node pair based on a first connection score; and whether the edge connects the second node of the node pair to the first node of the node pair based on a second connection score.
4. The method of any one of claims 1 to 3, wherein all vectors of the second set of vectors are determined based on randomly or pseudo-randomly drawing vectors from a multidimensional probability distribution approximated from the first set of vectors.
5. The method of claim 4, wherein all vectors of the second set of vectors are determined by selecting vectors from the first set of vectors.
6. The method of claim 4, wherein all vectors of the second set of vectors are determined by selecting vectors from the first set of vectors and adding a noise vector to the selected vector.
7. The method of claim 6, wherein the noise vector is randomly or pseudo-randomly drawn from a multidimensional Gaussian probability distribution.
8. The method of any one of claims 4 to 7, wherein the nodes of the first graph are assigned to communities and a node of the second graph corresponding to a selected vector of the first set inherits a respective community assignment of the node corresponding to the selected vector.
9. The method of any one of claims 4 to 8, wherein the edges of the first graph are assigned weights and an edge of the second graph connecting nodes corresponding to selected vectors of the first set inherits a respective weight of an edge of the first graph connecting the nodes corresponding to the selected vectors.
10. The method of claim 9, wherein if the nodes corresponding to the selected vectors are not connected in the first graph, said edge of the second graph is assigned a minimal weight among all edges of the first graph.
11. The method of any one of claims 1 to 10, wherein determining edges connecting nodes of the second graph based on the similarity function further comprises comparing connection scores of pairs of nodes of the second graph with a threshold.
12. The method of claim 11, wherein the threshold is determined to discriminate, based on the similarity function, top-E node pairs of the first graph with relatively higher connection scores from the rest of node pairs, where E is a number of the edges in the first graph.
13. A computer-readable medium, storing instructions which if executed by a computer cause the computer to: load an input network dataset, the input network dataset defining a first graph, the first graph comprising nodes and edges, the edges representing connections between the nodes; map the nodes to a first set of vectors, wherein the mapping is based on a similarity function assigning connection scores to vector pairs; determine, based on the first set of vectors, a second set of vectors, wherein each vector of the second set of vectors represents a node of a second graph; and determine edges connecting nodes of the second graph, based on the similarity function.
14. The computer-readable medium of claim 13, further storing instructions which if executed by the computer cause the computer to: execute a network analysis tool; and analyze a network comprising the nodes and the edges of the second graph.
15. A network analysis tool testing apparatus, comprising: a processor; and persistently stored instructions which if executed by the processor cause the processor to: load an input network dataset, the input network dataset defining a first graph, the first graph comprising nodes and edges, the edges representing connections between the nodes; map the nodes to a first set of vectors, wherein the mapping is based on a similarity function assigning connection scores to vector pairs; determine, based on the first set of vectors, a second set of vectors, wherein each vector of the second set of vectors represents a node of a second graph; determine edges connecting nodes of the second graph, based on the similarity function; and store an output network dataset, the output network dataset defining the second graph.
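Claims 11 and 12 can be illustrated with a short sketch of the threshold step. The claims do not fix the similarity function, so this sketch assumes, purely for illustration, a logistic sigmoid of the dot product. It also assumes the threshold is placed midway between the E-th and (E+1)-th highest first-graph scores. The names `similarity`, `top_e_threshold` and `threshold_edges` are hypothetical.

```python
import math
from itertools import combinations


def similarity(u, v):
    """Hypothetical similarity function: logistic sigmoid of the dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-dot))


def top_e_threshold(vectors, num_edges):
    """Choose a threshold that separates the num_edges highest-scoring node
    pairs of the first graph from all remaining pairs (cf. claim 12)."""
    scores = sorted(
        (similarity(vectors[i], vectors[j])
         for i, j in combinations(range(len(vectors)), 2)),
        reverse=True,
    )
    if not 0 < num_edges < len(scores):
        raise ValueError("num_edges must be at least 1 and less than the number of pairs")
    # Assumed placement: midway between the E-th and (E+1)-th highest scores.
    return (scores[num_edges - 1] + scores[num_edges]) / 2.0


def threshold_edges(vectors, threshold):
    """Connect every pair of nodes whose connection score exceeds the
    threshold (cf. claim 11). Returns (i, j) pairs with i < j."""
    return [
        (i, j)
        for i, j in combinations(range(len(vectors)), 2)
        if similarity(vectors[i], vectors[j]) > threshold
    ]


if __name__ == "__main__":
    # Toy first-set vectors for a first graph with E = 1 edge.
    first_vectors = [(2.0, 0.0), (2.0, 0.1), (0.0, 2.0), (-2.0, 0.0)]
    t = top_e_threshold(first_vectors, num_edges=1)
    # Toy second-set vectors; edges of the second graph come from the same threshold.
    second_vectors = [(2.0, 0.0), (1.9, 0.2), (-2.0, 0.0)]
    print(threshold_edges(second_vectors, t))  # → [(0, 1)]
```

Reusing the first-graph threshold for the second graph means that second-graph vectors distributed like the first-set vectors tend to yield a similar edge density. The sketch enumerates all pairs, which is quadratic in the node count and only suited to small examples.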
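Claims 8 to 10 describe how a generated graph inherits attributes from the input graph. The sketch below assumes that each second-graph node `k` was produced from a selected first-set vector, recorded in a hypothetical list `source[k]` holding the corresponding first-graph node. That selection step is defined in earlier claims not reproduced here. Edge weights are keyed by unordered node pairs. The function names are illustrative only.

```python
from collections.abc import Hashable
from typing import Dict, FrozenSet, List, Tuple


def inherit_communities(
    source: List[Hashable], communities: Dict[Hashable, int]
) -> Dict[int, int]:
    """Claim 8 (sketch): second-graph node k takes the community of the
    first-graph node whose vector was selected to produce it."""
    return {k: communities[src] for k, src in enumerate(source)}


def inherit_weights(
    source: List[Hashable],
    second_edges: List[Tuple[int, int]],
    first_weights: Dict[FrozenSet[Hashable], float],
) -> Dict[Tuple[int, int], float]:
    """Claims 9 and 10 (sketch): copy the weight of the corresponding
    first-graph edge, or fall back to the minimal first-graph weight when
    the corresponding first-graph nodes are not connected."""
    min_weight = min(first_weights.values())
    weights = {}
    for a, b in second_edges:
        key = frozenset((source[a], source[b]))
        weights[(a, b)] = first_weights.get(key, min_weight)
    return weights


if __name__ == "__main__":
    first_weights = {frozenset(("x", "y")): 3.0, frozenset(("y", "z")): 1.5}
    communities = {"x": 0, "y": 0, "z": 1}
    source = ["x", "y", "z", "x"]  # node 3 reuses the vector of "x"
    print(inherit_communities(source, communities))
    print(inherit_weights(source, [(0, 1), (1, 2), (2, 3)], first_weights))
```

In this example, the second-graph edge (2, 3) maps to the pair z and x, which is not connected in the first graph. Under claim 10, it therefore receives the minimal first-graph weight, 1.5.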
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/RU2017/000085 WO2018151619A1 (en) | 2017-02-20 | 2017-02-20 | Network analysis tool testing |
CN201780086994.5A CN110313150B (en) | 2017-02-20 | 2017-02-20 | Network analysis tool testing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/RU2017/000085 WO2018151619A1 (en) | 2017-02-20 | 2017-02-20 | Network analysis tool testing |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018151619A1 (en) | 2018-08-23 |
Family ID: 58699234
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/RU2017/000085 WO2018151619A1 (en) | 2017-02-20 | 2017-02-20 | Network analysis tool testing |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110313150B (en) |
WO (1) | WO2018151619A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110969685A (en) * | 2018-09-28 | 2020-04-07 | Apple Inc. | Customizable rendering pipeline using rendering maps |
US20210390461A1 (en) * | 2017-06-30 | 2021-12-16 | Visa International Service Association | Graph model build and scoring engine |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7821966B2 (en) * | 2007-03-19 | 2010-10-26 | International Business Machines Corporation | Method and apparatus for network topology discovery using closure approach |
CN101877711B (en) * | 2009-04-28 | 2013-08-28 | Huawei Technologies Co., Ltd. | Social network establishment method and device, and community discovery method and device |
CN101894123A (en) * | 2010-05-11 | 2010-11-24 | Tsinghua University | Subgraph based link similarity quick approximate calculation system and method thereof |
CN103034687B (en) * | 2012-11-29 | 2017-03-08 | Institute of Automation, Chinese Academy of Sciences | A kind of relating module recognition methods based on 2 class heterogeneous networks |
CN104102745B (en) * | 2014-07-31 | 2017-12-29 | Shanghai Jiao Tong University | Complex network community method for digging based on Local Minimum side |
2017
- 2017-02-20: CN application CN201780086994.5A (CN110313150B), Active
- 2017-02-20: WO application PCT/RU2017/000085 (WO2018151619A1), Application Filing
Non-Patent Citations (12)
Title |
---|
A. GROVER; J. LESKOVEC: "Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining", 2016, ACM, article "node2vec: Scalable Feature Learning for Networks" |
ADITYA GROVER ET AL: "node2vec: Scalable Feature Learning for Networks", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 3 July 2016 (2016-07-03), XP080711771 * |
AVIN CHEN ET AL: "On social networks of program committees", SOCIAL NETWORK ANALYSIS AND MINING, SPRINGER VIENNA, VIENNA, vol. 6, no. 1, 8 April 2016 (2016-04-08), pages 1 - 20, XP036119936, ISSN: 1869-5450, [retrieved on 20160408], DOI: 10.1007/S13278-016-0328-Y * |
B. PEROZZI; R. AL-RFOU; S. SKIENA: "Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining", 2014, ACM, article "DeepWalk: Online Learning of Social Representations" |
BRYAN PEROZZI ET AL: "DeepWalk", KNOWLEDGE DISCOVERY AND DATA MINING, ACM, 2 PENN PLAZA, SUITE 701 NEW YORK NY 10121-0701 USA, 24 August 2014 (2014-08-24), pages 701 - 710, XP058053805, ISBN: 978-1-4503-2956-9, DOI: 10.1145/2623330.2623732 * |
J. TANG; M. QU; M. WANG; M. ZHANG; J. YAN; Q. MEI: "Proceedings of the 24th International Conference on World Wide Web", 2015, ACM, article "LINE: Large-scale Information Network Embedding", pages: 1067 - 1077 |
JIAN WU ET AL: "Internet routing resilience to failures", CONEXT 2007, ACM, 2 PENN PLAZA, SUITE 701 NEW YORK NY 10121-0701 USA, 10 December 2007 (2007-12-10), pages 1 - 12, XP058272722, ISBN: 978-1-59593-770-4, DOI: 10.1145/1364654.1364687 * |
KARASUYAMA MASAYUKI ET AL: "Adaptive edge weighting for graph-based learning algorithms", MACHINE LEARNING, KLUWER ACADEMIC PUBLISHERS, BOSTON, US, vol. 106, no. 2, 18 November 2016 (2016-11-18), pages 307 - 335, XP036133778, ISSN: 0885-6125, [retrieved on 20161118], DOI: 10.1007/S10994-016-5607-3 * |
M. DROBYSHEVSKIY; A. KORSHUNOV; D. TURDAKOV: "Parallel Modularity Computation for Directed Weighted Graphs with Overlapping Communities", PROCEEDINGS OF THE INSTITUTE FOR SYSTEM PROGRAMMING, vol. 28, no. 6, 2016, pages 153 - 170 |
M. GUTMANN; A. HYVARINEN: "Noise-Contrastive Estimation: A New Estimation Principle for Unnormalized Statistical Models", AISTATS, vol. 1, 2010, pages 6 |
O. U. IVANOV; S. O. BARTUNOV: "International Conference on Analysis of Images, Social Networks and Texts", vol. 542, 2015, SPRINGER, article "Learning Representations in Directed Networks", pages: 196 - 207 |
T. MIKOLOV; I. SUTSKEVER; K. CHEN; G. S. CORRADO; J. DEAN: "Distributed Representations of Words and Phrases and their Compositionality", PROCEEDINGS OF THE ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2013, pages 3111 - 3119 |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210390461A1 (en) * | 2017-06-30 | 2021-12-16 | Visa International Service Association | Graph model build and scoring engine |
US11847540B2 (en) * | 2017-06-30 | 2023-12-19 | Visa International Service Association | Graph model build and scoring engine |
CN110969685A (en) * | 2018-09-28 | 2020-04-07 | Apple Inc. | Customizable rendering pipeline using rendering maps |
CN110969685B (en) * | 2018-09-28 | 2024-03-12 | Apple Inc. | Customizable rendering pipeline using rendering graphs |
Also Published As
Publication number | Publication date |
---|---|
CN110313150B (en) | 2021-02-05 |
CN110313150A (en) | 2019-10-08 |
Similar Documents
Publication | Title |
---|---|
Dias et al. | Concept lattices reduction: Definition, analysis and classification | |
Jensen et al. | Linkage and autocorrelation cause feature selection bias in relational learning | |
Parker et al. | Accelerating fuzzy-c means using an estimated subsample size | |
O'Neill et al. | Common subtrees in related problems: A novel transfer learning approach for genetic programming | |
Ivanov et al. | Understanding isomorphism bias in graph data sets | |
Nunes et al. | GraphHD: Efficient graph classification using hyperdimensional computing | |
Yang et al. | Metamorphic exploration of an unsupervised clustering program | |
Pelikan et al. | Transfer learning, soft distance-based bias, and the hierarchical boa | |
WO2018151619A1 (en) | Network analysis tool testing | |
Brown et al. | LSHPlace: fast phylogenetic placement using locality-sensitive hashing | |
JP2010272004A (en) | Discriminating apparatus, discrimination method, and computer program | |
Terziev | Feature Generation using Ontologies during Induction of Decision Trees on Linked Data. | |
CN116361788A (en) | Binary software vulnerability prediction method based on machine learning | |
CN113392086B (en) | Medical database construction method, device and equipment based on Internet of things | |
Saha et al. | FLIP: active learning for relational network classification | |
Krész et al. | Economic network analysis based on infection models | |
Carmona et al. | Non-dominated multi-objective evolutionary algorithm based on fuzzy rules extraction for subgroup discovery | |
Kaedi et al. | Holographic memory-based Bayesian optimization algorithm (HM-BOA) in dynamic environments | |
Khoshgoftaar et al. | A novel feature selection technique for highly imbalanced data | |
Guerreiro et al. | Recovering network topology and dynamics via sequence characterization | |
CN116049700B (en) | Multi-mode-based operation and inspection team portrait generation method and device | |
Miao et al. | Informative core identification in complex networks | |
Santana et al. | Network measures for re-using problem information in EDAs | |
Wang et al. | Ensemble clustering based on evidence theory | |
Ramos-Jiménez et al. | Induction of decision trees using an internal control of induction |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 17722900; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 17722900; Country of ref document: EP; Kind code of ref document: A1 |