WO2012150107A1 - Network analysis tool - Google Patents
- Publication number
- WO2012150107A1 (application PCT/EP2012/056303)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- network
- node
- nodes
- motif
- motifs
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/2866—Architectures; Arrangements
- H04L67/30—Profiles
- H04L67/306—User profiles
Definitions
- VON LANDESBERGER T., GORNER M., SCHRECK T.: Visual Analysis of Graphs with Multiple Connected Components, Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST'09) (2009), IEEE Computer Society, pp. 155-162 use a self-organizing map (SOM) to cluster networks into a grid of prototypical networks. They compute a variety of topological features for the networks, including reciprocity features, distance features, clustering features and degree distribution features. The user weights the features appropriately and the system produces a SOM layout. Each cell represents a subset of similar networks; the background color indicates the number.
- E-NET is a tool developed primarily by the social scientist Steve Borgatti, for analyzing networks. It aids with data collection, data analysis and visualization. For data collection, it produces appropriate questionnaires to elicit attribute and relationship data from people. For data analysis, it measures size, composition (e.g. homogeneity and homophily), structure (e.g.
- LI C., LIN S.: Egocentric Information Abstraction for Heterogeneous Social Networks, ASONAM'09 (2009), IEEE Computer Society, pp. 255-260 summarize egocentric networks by combining the surrounding relational structures with the statistical dependencies between attribute values to form feature vectors.
- the features describing the relational structures are based on the various types of paths of fixed length (say, two) that can emanate from an ego (node). They use frequency-based measures (local frequency, local rarity and relative frequency) to determine whether or not a feature is relevant. They then construct representative egocentric networks using only the relevant features.
- a computer-implemented method for analyzing a network of information comprising a plurality of interconnected nodes, the method comprising the steps of:
- each network motif comprising a respective pattern of connections between a node and at least its neighbouring nodes
- determining a network motif profile comprising for each network motif of the set of network motifs, a count of the instances of the network motif at said node;
- said dimensionality reduction comprises performing principal component analysis (PCA) on said normalized network motif profile information in said high dimensionality space.
- the present invention provides a system that analyzes and clusters nodes based on the relationship structure of their network connections; and presents the results as a node based spatialization.
- Embodiments of the invention use a form of network motif analysis and dimensionality reduction to cluster nodes so that two nodes are in the same cluster if their respective network connections are structurally similar. This view of a network discriminates between the various classes of typical and exceptional nodes.
- Embodiments of the present invention combine network motif analysis at the node level and dimensionality reduction using PCA to produce an aggregated node based view of a network.
- Embodiments allow a user to visually inspect networks, network ratio profiles, and a spatialization of the nodes based on the structural similarity of the node networks.
- the various views are coordinated allowing a user to select a node in one view and examine its properties in another.
- a user can also compare, for example, network ratio profiles through selecting multiple nodes to help identify the distinguishing features of a collection of node networks.
- Embodiments of the invention use network motif analysis to exhaustively count the number of network motifs up to a certain size in a network.
- a node's network connections can include connections between a node and its immediate neighbours as well as connections between a node's neighbours and possibly their neighbours.
- Embodiments of the invention are particularly useful for identifying rogue behaviour without a priori knowledge of the form of this behaviour.
- if a personal bank account in a financial transaction network is typical, its network connections should be structurally similar to the network connections of other typical accounts. At the very least, there should be a small number of classes of typical accounts.
- if a bank account is involved in smurfing (the splitting of large financial transactions into multiple smaller transactions, each of which is below the limit above which financial institutions must report), then, assuming the incidence of smurfing is relatively low, the bank account's network connections should be relatively exceptional. The only input required for the present system to analyze such a network would be a list of account transactions.
- the structure of a node's network is defined by the longest shortest-path distance k from a node to every other node in the node's network (the radius) as well as the various network motifs to be counted.
- the counts for each node's network are adjusted for scale to produce network ratio profiles.
- the network ratio profiles can be interpreted as points in a high-dimensional space.
- they are projected onto a 2-dimensional spatialization using principal component analysis (PCA). This projection removes the correlations between the counts.
- Figure 1 shows a user interface including a view generated according to an embodiment of the present invention for browsing a single 1,000-node random network from the ER dataset.
- Figure 2 shows a summary for a selection of nodes in Figure 1.
- Figure 3 shows a view of a single 1,000-node network from the WS dataset generated according to an embodiment of the present invention.
- Figure 4 shows a view of the activity in the Prosper Marketplace dataset during April 2010 generated according to an embodiment of the present invention.
- Figure 5 shows a comparison of a view generated according to an embodiment of the present invention and a global view of the MIT Reality Mining dataset.
- Figure 6 is a flow diagram illustrating an embodiment of the invention.
- Description of the Preferred Embodiments
- Embodiments of the present invention comprise a network analysis tool which produces a spatialization for a network of nodes (egos) that clusters nodes so that two nodes are in the same cluster if their node networks are structurally similar.
- One of the potential applications for the tool is in identifying and visualizing nodes exhibiting potentially fraudulent behaviour without specifying the behaviour of such nodes a priori.
- FIG. 1 shows a user interface 10 for a network analysis tool according to an embodiment of the present invention.
- the interface comprises multiple coordinated views 12, 14, 16.
- Each of the three views 12, 14, 16 illustrates a specific aspect of selected node(s) and each view is coordinated with the others, so that selecting nodes in one window causes appropriate updates in the other views.
- the views are coordinated, allowing a user to select a node in one view and examine its properties in the others.
- the system includes a view 14 of the topology of the selected node networks and a view 16 for comparing their network ratio profiles.
- the spatialization 12 is the central view in the system. A user may pan and zoom within this view. At the top left, the view 12 includes bar indicators 20.
- each bar 20 shows the percentages of variability captured by each axis of the view; as such, these can be interpreted as a measure of the significance of each axis of the spatialization.
- the view 12 further includes a slider control 22 that can automatically color the nodes based on a k-means clustering.
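The k-means coloring offered by the slider control could be sketched as below. This is a minimal illustration rather than the tool's implementation: the function name `k_means`, the farthest-point initialization and the iteration cap are all assumptions, and the input is taken to be the 2-dimensional coordinates of the nodes in the spatialization.

```python
def _sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def k_means(points, k, iters=100):
    """Assign each 2-D point a cluster label in range(k) (Lloyd's algorithm)."""
    # Assumed initialization: start from the first point, then repeatedly
    # add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(k_means(pts, 2))  # prints [0, 0, 0, 1, 1, 1]
```

Each label can then be mapped to a color, so that moving the slider simply reruns the clustering with a different `k`.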
- the node based spatialization in the view 12 is computed through network motif analysis and dimensionality reduction, described in more detail below.
- the tool begins by calculating a node network for each node in turn, step 60.
- the node network, or k-neighborhood subnetwork, of a node u is the subnetwork induced by the set of vertices that have shortest-path distance at most k hops from u.
- k = 2
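The node network definition above can be sketched with a breadth-first search. The adjacency-dictionary representation and the function name `node_network` are assumptions made for illustration; the network is treated as undirected.

```python
from collections import deque


def node_network(adj, u, k=2):
    """Return the k-neighborhood subnetwork of u as an adjacency dict."""
    # Breadth-first search records the hop distance of every reached vertex.
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue  # do not expand beyond k hops
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    # Induced subnetwork: keep each edge whose endpoints are both within k hops.
    return {v: {w for w in adj[v] if w in dist} for v in dist}


# A path a-b-c-d-e with an extra edge b-d.
adj = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
sub = node_network(adj, "a", k=2)
print(sorted(sub))  # prints ['a', 'b', 'c', 'd']
```

Vertex `e` is three hops from `a`, so it is dropped, and with it the edge d-e.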
- a network motif profile is calculated, step 62.
- a profile comprises a 30-element vector where each entry is based on a count of the number of instances of the corresponding network motif in an ordered list that are incident with the node.
- the ordered list comprises all network motifs with at most I vertices, up to isomorphism
- I = 5, i.e. the maximum number of nodes in any given network motif is 5.
- the ordered list can be generated using, for example, geng from the nauty package disclosed in MCKAY B.: Isomorph-Free Exhaustive Generation. Journal of Algorithms 26, 2 (1998), 306-324.
- the counts for each element of the network profile vector can be calculated using GraphGrepSX disclosed in GIUGNO R., SHASHA D.: GraphGrep: A Fast and Universal Method for Querying Graphs, Proceedings of the 16th International
- GraphGrepSX is a tool that solves the subgraph isomorphism problem using enumerated paths as index features. This can be a time-consuming process, but for large datasets, node networks can be processed in parallel and/or both k and I can be reduced.
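To make the idea of a per-node motif count concrete, the sketch below counts only the three smallest connected motifs (an edge, a three-vertex path and a triangle) that contain a given node. It is a brute-force illustration under assumed names (`motif_profile`, an undirected adjacency dictionary) and counts induced subgraphs; it is not the GraphGrepSX procedure and does not cover the full 30-motif list.

```python
from itertools import combinations


def motif_profile(adj, u):
    """Count induced motifs on 2 and 3 vertices that contain node u.

    Returns [edges, three-vertex paths, triangles].
    """
    others = [v for v in adj if v != u]
    edges = len(adj[u])  # each incident edge is one instance of the 2-vertex motif
    paths = triangles = 0
    for a, b in combinations(others, 2):
        # Number of edges in the subgraph induced by {u, a, b}.
        n_edges = (a in adj[u]) + (b in adj[u]) + (b in adj[a])
        if n_edges == 3:
            triangles += 1
        elif n_edges == 2:
            paths += 1  # two edges on three vertices always form a path
    return [edges, paths, triangles]


# A triangle a-b-c with a pendant vertex d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(motif_profile(adj, "c"))  # prints [3, 2, 1]
```

In practice the function would be applied to each node network in turn, and the per-node calls are independent, which is what makes the parallel processing mentioned above straightforward.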
- each entry of the 30-element vector comprises a normalized ratio of the corresponding entry in the network motif profile.
- the ratio profile rp of a node network is computed from the following quantities:
- $nmp_i$ is the ith entry of the network motif profile
- $\overline{nmp}_i$ is the average of the ith entry of all of the network motif profiles
- $\varepsilon$ is a small integer that ensures that the ratio is not misleadingly large when the network motif appears very few times in all of the node networks.
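A minimal sketch of the ratio computation follows. It assumes the ratio takes the form used for subgraph ratio profiles by Milo et al. (2004), cited in the Background, namely $rp_i = (nmp_i - \overline{nmp}_i) / (nmp_i + \overline{nmp}_i + \varepsilon)$, followed by scaling each vector to unit length. That form, the default value of the small constant (`eps=4`) and the function name `ratio_profiles` are assumptions, not details confirmed by the text above.

```python
import math


def ratio_profiles(profiles, eps=4):
    """Turn raw motif-count profiles into normalized ratio profiles."""
    n = len(profiles)
    m = len(profiles[0])
    # Average of each entry over all node networks.
    means = [sum(p[i] for p in profiles) / n for i in range(m)]
    result = []
    for p in profiles:
        # Assumed ratio: deviation from the mean, damped by eps when counts are small.
        rp = [(p[i] - means[i]) / (p[i] + means[i] + eps) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in rp))
        # Assumed normalization: scale to unit length (leave all-zero vectors alone).
        result.append([x / norm for x in rp] if norm > 0 else rp)
    return result


# Hypothetical 3-entry profiles for four node networks.
profiles = [[3, 2, 1], [1, 2, 0], [2, 1, 1], [2, 1, 1]]
nrp = ratio_profiles(profiles)
for row in nrp:
    print([round(x, 3) for x in row])
```

A positive entry means the motif is more abundant in that node network than on average; a negative entry means it is rarer.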
- a normalized ratio measures the abundance of a network motif in each individual node network relative to all node networks; it is similar to a z-score. It is noted that there are correlations between the elements of a network ratio profile. Thus, in the embodiment, to adjust for these, a dimensionality reduction is performed, step 66.
- Principal Component Analysis is an exemplary dimensionality reduction technique that calculates the eigenvectors of a covariance matrix
- each eigenvector, or principal component, corresponds to an orthogonal direction of variation within the data. Often, a small subset of eigenvectors can account for much of the variability.
- PCA identifies underlying structures such as clusters and outliers that are difficult to perceive in the original set of vectors.
- the first two eigenvectors are calculated and these are used as the x and y axes for the spatialization shown in the window 12 of Figure 1. It will of course be appreciated that if more than two eigenvectors were calculated, the window 12 could be implemented as a three dimensional display of nodes; or other techniques for enabling browsing of higher dimension spaces could be employed. It will also be noted that in Figure 1, the PCA has been directed to determine 5 clusters of nodes, whereas it can be seen that any number of clusters can be chosen.
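The projection onto the first two eigenvectors can be sketched without external libraries using power iteration on the covariance matrix. This is an illustrative stand-in, not the embodiment's code: the name `pca_2d`, the fixed start vector and the iteration count are assumptions, and the sketch assumes the data has non-zero variance and that the start vector is not parallel to an axis already found.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _orthonormalise(vec, basis):
    # Gram-Schmidt: remove the parts of vec lying along earlier axes, then rescale.
    for prev in basis:
        d = _dot(vec, prev)
        vec = [x - d * p for x, p in zip(vec, prev)]
    norm = math.sqrt(_dot(vec, vec))
    return [x / norm for x in vec]


def pca_2d(rows, iters=500):
    """Project rows onto their first two principal components.

    Returns (points, explained); explained holds the fraction of the total
    variance captured by each of the two axes.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix (m x m).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(m)]
           for i in range(m)]
    total = sum(cov[i][i] for i in range(m))

    axes, explained = [], []
    for _ in range(2):
        # Fixed start vector, kept orthogonal to the axes already found.
        vec = _orthonormalise([float(j + 1) for j in range(m)], axes)
        for _ in range(iters):
            nxt = [_dot(row, vec) for row in cov]
            if math.sqrt(_dot(nxt, nxt)) < 1e-12:
                break  # no variance left in the remaining directions
            vec = _orthonormalise(nxt, axes)
        axes.append(vec)
        explained.append(_dot(vec, [_dot(row, vec) for row in cov]) / total)

    points = [(_dot(c, axes[0]), _dot(c, axes[1])) for c in centred]
    return points, explained


# Four points lying on one line, so all variance falls on the first axis.
rows = [(2, 0), (0, 2), (4, -2), (-2, 4)]
points, explained = pca_2d(rows)
print([round(e, 3) for e in explained])
```

The two `explained` values play the role of the bar indicators 20: they report how much of the variability each axis of the spatialization captures.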
- Figure 1 is based on an analysis of the ER dataset. This includes 50 random networks generated using the Erdos-Renyi model disclosed in: ERDOS P., RENYI A.: On Random Graphs. Publicationes Mathematicae Debrecen 6 (1959), 290-297; and ERDOS P., RENYI A.: On the Evolution of Random Graphs. Institute of
- Figure 2 shows an updated window 16 from Figure 1, if all five vertices of the clique indicated by the circle 18 in the network in Fig. 1 are selected and displayed in radar chart style.
- Figure 2 shows the five nodes v0... v4 have broadly identical network ratio profiles making them structurally similar. All elements in the network ratio profiles are relatively high, especially for the higher-order network motifs and this is what makes the corresponding nodes exceptional.
- the WS dataset also includes 50 random networks generated using the Watts-Strogatz model disclosed in WATTS D., STROGATZ S.: Collective Dynamics of 'Small-World' Networks. Nature 393 (1998), 440-442. Again, each network contains 1,000 vertices and 5,000 edges. Again, five vertices (nodes) were chosen at random and augmented with additional edges to create a clique.
- Figure 3, which is based on a single 1,000-node network from the WS dataset, is less convincing.
- the five nodes belonging to the five node clique are surrounded by a black circle 33.
- the clique is not easily identifiable through the spatialization shown. This is due to the increased clustering coefficient (indicated on the axis 32) found in networks from the WS dataset compared to those from the ER dataset.
- the nodes in the clique are no longer considered exceptional. (Again, the same is also true for all 50 networks in the WS dataset).
- the networks in both Fig. 1 and Fig. 3 have the same number of vertices and edges. However, their differing structure means that a node centric network considered exceptional in one is typical in another.
- the Prosper Marketplace dataset (www.prosper.com) is derived from a peer-to-peer lending or social lending service where borrowers ask for money in the form of listings and lenders bid on listings specifying repayment terms including interest rates. If enough lenders fund a listing, the listing becomes a loan. Prosper rates prospective borrowers according to their creditworthiness. It also maintains borrower and lender groups, endorsements, past listings, bids and loans.
- the social structure of the service is evident from the data: a node represents a borrower or lender and an edge represents a fraction of a loan agreed upon between a borrower and a lender. It should be noted that lenders can also be borrowers and vice versa and therefore the network is not necessarily bipartite.
- Figure 4 includes a view 42 showing the activity in the Prosper Marketplace dataset during April 2010. 462 borrowers and lenders agreed upon new loans which were divided into 1,246 fractions. 453 of the borrowers and lenders are in a single connected component.
- the first and second principal components of the network ratio profiles (the x- and y-axes of the spatialization 42) account for 54% and 16% of the variability in the original dataset (see the bar indicators 44). It can be seen that the node networks of nodes to the left of the spatialization have more vertices and edges than those of the nodes to the right. However, the difference between the nodes along the y-axis of the spatialization is more interesting. Two representative nodes 46 are selected in Fig. 4.
- the radar chart view 48 of their network ratio profiles reveals that the node to the top, when compared to the node to the bottom, has a node network with relatively fewer lower-order network motifs but relatively more higher-order network motifs. This is corroborated by the small multiples
- the node to the bottom of the spatialization (to the left of the small multiples representation 49) has just two neighbours, both of whom are connected to many others.
- the vertices at the center of the two circles 51 represent the two neighbours.
- the node to the top of the spatialization (to the right of the small multiples representation) has many more neighbours.
- the vertices in the circle surrounding the node represent these.
- the differences between, say, the top and bottom egos in Fig. 4 can be computed more easily and directly using, say, node degrees and clustering coefficients.
- the MIT Reality Mining dataset disclosed in EAGLE N., PENTLAND A., LAZER D.: Inferring Friendship Network Structure by Using Mobile Phone Data, Proceedings of the National Academy of Sciences (PNAS) 106, 36 (2009), 15274-15278 comprises mobile phone call and SMS records over a 296-day period between 100 unique mobile phones.
- the dataset is a subset of a much larger dataset comprising communication, proximity, location, and activity information involving 100 subjects at MIT over the course of the 2004-2005 academic year.
- a node represents a user, or more specifically a mobile phone
- an edge represents a mobile phone call or SMS between two mobile phones.
- Figure 5 shows a node based view 12' produced according to an embodiment of the present invention and a global view 26 produced using a force-directed algorithm of the network indicating all calls between all users.
- the global view 26 identifies two large communities, a known artifact of the dataset, being mobile phone users with dense communication within each group and sparse communication between the groups.
- the view 12' also identifies two communities 30', 30" but these do not correspond to the two communities in the global view 26. Instead, they correspond to core mobile phone users 30" and peripheral mobile phone users 30'.
- the peripheral mobile phone users can be further divided into an inner periphery (the nodes 30A below the divider) and an outer periphery (the nodes 30B above the divider).
- the selected nodes within the circle 30' in the view 12' correspond to the selected nodes in the two circles 28', 28" in the global view 26.
- specialized algorithms could be employed to enumerate more complex network motifs, for example, stars and triangles.
Abstract
A network analysis tool for analyzing a network of information comprising a plurality of interconnected nodes is disclosed. The tool determines, for the network, a set of network motifs, each network motif comprising a respective pattern of connections between a node and at least its neighbouring nodes. For each node, a network motif profile is determined, the profile comprising for each network motif of the set of network motifs, a count of the instances of the network motif at the node. The network motif profile for each node is normalized relative to the network motif profiles of other nodes in the network. The normalized network motif profiles for the network are projected from a high-dimensional space corresponding to the number of motifs in the set of network motifs, onto a lower dimensional space based on maximizing the variability of normalized network motif profiles within the space. At least some of the nodes of the network are displayed in the lower dimensional space.
Description
Network Analysis Tool
Field of the Invention
The present invention relates to a method and tool for analyzing a network of information.
Background
There are many examples of networked information in data processing. For example, in social networks, network nodes correspond with people and connections between nodes can represent relationships between friends and family. In financial networks, the affairs of payees and payers, or borrowers and lenders can be represented by account nodes interconnected by transactions between those nodes. Similarly in telephone or communications networks, nodes corresponding to telephone or e-mail accounts can be related by calls, messages or correspondence between those accounts.
The network of connections around a node of a network can provide characteristic information about the node itself. It is appreciated in the art that the structural similarity of two nodes within a network can be measured in two ways. LORRAIN F., WHITE H.: Structural Equivalence of Individuals in Social Networks, Journal of Mathematical Sociology 1 (1971), 49-80 disclose that two nodes are structurally equivalent if they share many of the same neighbors. LEICHT E., HOLME P., NEWMAN M.: Vertex Similarity in Networks, Physical Review E 73, 026120 (2006) formulate a measure based on two nodes being regularly equivalent if they are connected to other nodes that are themselves structurally similar.
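As a small illustration of the first notion, the degree to which two nodes share neighbors can be scored with a Jaccard index over their neighbor sets. The choice of the Jaccard index and the name `neighbour_overlap` are assumptions made here for illustration; the cited works define their own measures.

```python
def neighbour_overlap(adj, u, v):
    """Jaccard similarity of the neighbour sets of u and v (excluding u and v)."""
    nu = adj[u] - {v}
    nv = adj[v] - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}
print(neighbour_overlap(adj, "a", "b"))  # prints 1.0
print(neighbour_overlap(adj, "a", "e"))  # prints 0.5
```

Nodes `a` and `b` have identical neighbor sets and score 1.0, whereas `a` and `e` share only one of two distinct neighbors.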
Structural similarity can also be used to classify entire networks. BRANDES U., LERNER J., NAGEL U., NICK B.: Structural Trends in Network Ensembles, Proceedings of the 1st International Workshop on Complex Networks (CompleNet'09) (2009), Springer, pp. 83-97 disclose an effective heuristic for partitioning a set of (planted partition) networks such that two networks are in the same part if and only if they are generated using the same random network model. The heuristic relies on the fact that the adjacency matrices of two networks generated using the same random planted partition network model have (with high probability) similar spectra.
SHEN-ORR S., MILO R., MANGAN S., ALON U.: Network Motifs in the Transcriptional Regulation Network of Escherichia Coli. Nature Genetics 31 (2002), 64-68; and MILO R., SHEN-ORR S., ITZKOVITZ S., KASHTAN N., CHKLOVSKII D., ALON U.: Network Motifs: Simple Building Blocks of Complex Networks, Science 298, 5594 (2002), 824-827 disclose forms of network motif analysis. This represents the structural or topological properties of a network by counting all occurrences of each network motif in a network.
MILO R., ITZKOVITZ S., KASHTAN N., LEVITT R., SHEN-ORR S., AYZENSHTAT I., SHEFFER M., ALON U.: Superfamilies of Evolved and Designed Networks, Science 303, 5663 (2004), 1538-1542 use network ratio profiles to compare the counts of network motifs in networks of varying sizes. Using a correlation coefficient matrix, they identify several families of networks that have similar network ratio profiles, such as biological information-processing networks, Internet networks, social networks and autonomous systems networks.
KOSCHUTZKI D., SCHWOBBERMEYER H., SCHREIBER F.: Ranking of Network Elements Based on Functional Substructures, Journal of Theoretical Biology 248, 3 (2007), 471-479 formulate a number of network motif-based centrality measures. They rank the vertices of the E. coli transcriptional network using each centrality measure. They claim that network motif-based centrality measures identify genes that are important regulators but are overlooked by local (e.g. out-degree) and global (e.g. betweenness) centrality measures.
Network motif analysis is generally concerned with entire networks and global counts.
There are also several visualization systems that aid with network motif analysis, such as disclosed in SCHREIBER F., SCHWOBBERMEYER H.: MAVisto: A Tool for the Exploration of Network Motifs, Bioinformatics 21, 17 (2005), 3572-3574; and MA'AYAN A., JENKINS S., WEBB R., BERGER S., PURUSHOTHAMAN S., ABUL-HUSN N., POSNER J., FLORES T., IYENGAR R.: SNAVI: Desktop Application for Analysis and Visualization of Large-Scale Signaling Networks, BMC Systems Biology 3, 10 (2009). However, these are concerned with network motif analysis on entire networks.
WHITE H., BOORMAN S., BREIGER R.: Social Structure from Multiple Networks - Blockmodels of Roles and Positions, American Journal of Sociology 81 (1976), 730- 780; BORGATTI S., EVERETT M.: The Class of All Regular Equivalences: Algebraic Structure and Computation, Social Networks 1 1 (1989), 65-88; and WELLMAN B.: An Egocentric Network Tale: Comment on Bien et al, Social Networks 15 (1993), 423-436 each disclose node based analysis within the field of social network analysis.
LUBBERS M., MOLINA J., LERNER J., BRANDES U., AVILA J., MCCARTY C: Longitudinal Analysis of Personal Networks: The Case of Argentinean Migrants in Spain, Social Networks 32, 1 (2010), 91-104 describe a dynamic node based network analysis of Argentinean immigrants in Spain. The analysis comprises qualitative interviews and a quantitative analysis at three distinct levels (node- neighbour pairs, neighbour-neighbour pairs and networks). The quantitative analysis investigated the characteristics of the nodes, the structural characteristics of the networks, the characteristics of the node-neighbour pairs and neighbours, the structural positions of the neighbours, and the characteristics of the neighbour- neighbour pairs.
In BRANDES U., LERNER J., LUBBERS M., MCCARTY C., MOLINA J.: Visual Statistics for Collections of Clustered Graphs, In Proceedings of the IEEE VGTC Pacific Visualization Symposium (PacificVis'08) (2008), IEEE, pp. 47-54, the composition of networks was visualized using clustered networks where the size of each of four nodes encodes the number of people in each of four groups (origin, fellows, host and transnationals) and the thickness of the edges quantifies the amount of communication between the groups.
WELSER H., GLEAVE E., FISHER D., SMITH M.: Visualizing the Signatures of Social Roles in Online Discussion Groups, Journal of Social Structure 8 (2007) present an analysis of roles in an online discussion group. They visualize posting habits within networks and, through visual inspection, identify three types of poster: answer people, discussion people, and disruptors.
Similarly, STOICA A., PRIEUR C.: Structure of Neighborhoods in a Large Social Network, Proceedings of the International Conference on Computational Science and Engineering (CSE'09) (2009), pp. 26-33 analyze a large mobile phone call network, partition nodes according to roles, and validate their results using network attributes.
ANTIQUEIRA L., DA FONTOURA COSTA L.: Characterization of Subgraph Relationships and Distribution in Complex Networks, New Journal of Physics 11, 013058 (2009) present a methodology for analyzing non-overlapping subnetworks, their interrelationships, and their distribution in a network. Given a network and a set of non-overlapping subnetworks, they generate histograms of the subnetwork sizes and the shortest distances between the subnetworks. They then dilate each subnetwork, merging subnetworks when necessary, until the entire network is covered. They consider the rate at which the subnetworks merge and the rate at which vertices are covered by the dilations. They analyze four random network models and five real-world networks and show, for example, that the real-world networks have similarities with combinations of the random network models.
The analyses of Welser et al., Stoica and Prieur, and Antiqueira and da Fontoura Costa each choose a number of network statistics (for example, degree, clustering coefficient and local triangle count) when characterizing subnetworks, and the choice of network statistics is often specific to the task at hand.
Although not directly related to node based visualization, VON LANDESBERGER T., GORNER M., SCHRECK T.: Visual Analysis of Graphs with Multiple Connected Components, Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST'09) (2009), IEEE Computer Society, pp. 155-162 use a self-organizing map (SOM) to cluster networks into a grid of prototypical networks. They compute a variety of topological features for the networks, including reciprocity features, distance features, clustering features and degree distribution features. The user weights the features appropriately and the system produces a SOM layout. Each cell represents a subset of similar networks; the background color indicates the number. Also, JEONG D., ZIEMKIEWICZ C., FISHER B., RIBARSKY W., CHANG R.: iPCA: An Interactive System for PCA-Based Visual Analytics, Proceedings of the 11th Eurographics/IEEE Symposium on Visualization (EuroVis'09) (2009), pp. 767-774 disclose an interactive interface that opens up the black box of Principal Component Analysis (PCA).
Given the above, it will be appreciated that current network analysis and visualization tools focus on analyzing and visualizing either the entire network or individual node networks but fail to visually summarize a collection of node networks.
Other approaches include E-NET, which is a tool developed primarily by the social scientist Steve Borgatti, for analyzing networks. It aids with data collection, data analysis and visualization. For data collection, it produces appropriate questionnaires to elicit attribute and relationship data from people. For data analysis, it measures size, composition (e.g. homogeneity and homophily), structure (e.g. connectedness, density and structural holes), etc. For visualization, it produces tabular representations of the results.
LI C., LIN S.: Egocentric Information Abstraction for Heterogeneous Social Networks. In Proceedings of the 1st International Conference on Social Networks Analysis and Mining (ASONAM'09) (2009), IEEE Computer Society, pp. 255-260 summarize egocentric networks by combining the surrounding relational structures with the statistical dependencies between attribute values to form feature vectors. The features describing the relational structures are based on the various types of paths of fixed length (say, two) that can emanate from an ego (node). They use frequency-based measures (local frequency, local rarity and relative frequency) to determine whether or not a feature is relevant. They then construct representative egocentric networks using only the relevant features.
In spite of these approaches, there remains a need for tools and techniques for effectively analyzing and visually summarizing networks.
Summary
According to the present invention there is provided a computer-implemented method for analyzing a network of information comprising a plurality of interconnected nodes, the method comprising the steps of:
for the network, determining a set of network motifs, each network motif comprising a respective pattern of connections between a node and at least its neighbouring nodes;
for each node, determining a network motif profile comprising for each network motif of the set of network motifs, a count of the instances of the network motif at said node;
for each node, normalizing the network motif profile relative to the network motif profiles of other nodes in the network;
for the network, projecting the normalized network motif profiles from a high-dimensional space corresponding to the number of motifs in said set of network motifs, onto a lower dimensional space based on maximizing the variability of normalized network motif profiles with said space; and
displaying at least some of the nodes of said network in said lower dimensional space.
Preferably, said dimensionality reduction comprises performing principal component analysis (PCA) on said normalized network motif profile information in said high dimensionality space. The present invention provides a system that analyzes and clusters nodes based on the relationship structure of their network connections; and presents the results as a node based spatialization.
Embodiments of the invention use a form of network motif analysis and dimensionality reduction to cluster nodes so that two nodes are in the same cluster if their respective network connections are structurally similar. This view of a network discriminates between the various classes of typical and exceptional nodes.
Embodiments of the present invention combine network motif analysis at the node level and dimensionality reduction using PCA to produce an aggregated node based view of a network. Embodiments allow a user to visually inspect networks, network ratio profiles, and a spatialization of the nodes based on the structural similarity of the node networks. The various views are coordinated allowing a user to select a node in one view and examine its properties in another. A user can also compare, for example, network ratio profiles through selecting multiple nodes to help identify the distinguishing features of a collection of node networks. Embodiments of the invention use network motif analysis to exhaustively count the number of network motifs up to a certain size in a network. For large networks, this could be prohibitive, but for a collection of node networks, the computation can be divided and parallelized. A node's network connections can include connections between a node and its immediate neighbours as well as connections between a node's neighbours and possibly their neighbours.
Embodiments of the invention are particularly useful for identifying rogue behaviour without a priori knowledge of the form of this behaviour. For example, if a personal bank account in a financial transaction network is typical, its network connections should be structurally similar to network connections of other typical accounts. At the very least, there should be a small number of classes of typical accounts. On the other hand, if a bank account is involved in smurfing (the splitting of large financial transactions into multiple smaller transactions, each of which is below a limit above which financial institutions must report), assuming the incidence of smurfing is relatively low, the bank account's network connections should be relatively exceptional. The only input required for the present system to analyze such a network would be a list of account transactions.
In embodiments of the invention, the structure of a node's network is defined by the longest shortest-path distance k from a node to every other node in the node's network (the radius) as well as the various network motifs to be counted. Preferably, the counts for each node's network are adjusted for scale to produce network ratio profiles. The network ratio profiles can be interpreted as points in a high-dimensional space. Preferably, they are projected onto a 2-dimensional spatialization using principal component analysis (PCA). This projection removes the correlations between the counts. The spatialization encodes the similarities and differences amongst the node networks. Furthermore, clusters of nodes represent broad classes of nodes with structurally similar node networks.
Brief Description of the Drawings
Embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
Figure 1 shows a user interface including a view generated according to an embodiment of the present invention for browsing a single 1,000-node random network from the ER dataset.
Figure 2 shows a summary for a selection of nodes in Figure 1.
Figure 3 shows a view of a single 1,000-node network from the WS dataset generated according to an embodiment of the present invention.
Figure 4 shows a view of the activity in the Prosper Marketplace dataset during April 2010 generated according to an embodiment of the present invention.
Figure 5 shows a comparison of a view generated according to an embodiment of the present invention and a global view of the MIT Reality Mining dataset.
Figure 6 is a flow diagram illustrating an embodiment of the invention.
Description of the Preferred Embodiments
Embodiments of the present invention comprise a network analysis tool which produces a spatialization for a network of nodes (egos) that clusters nodes so that two nodes are in the same cluster if their node networks are structurally similar. One of the potential applications for the tool is in identifying and visualizing nodes exhibiting potentially fraudulent behaviour without specifying the behaviour of such nodes a priori.
Figure 1 shows a user interface 10 for a network analysis tool according to an embodiment of the present invention. The interface comprises multiple coordinated views 12, 14, 16. Each of the three views 12, 14, 16 illustrates a specific aspect of the selected node(s), and the views are coordinated so that selecting nodes in one view causes appropriate updates in the others, allowing a user to select a node in one view and examine its properties in the others. For example, the system includes a view 14 of the topology of the selected node networks and a view 16 for comparing their network ratio profiles. The spatialization 12 is the central view in the system. A user may pan and zoom within this view. At the top left, the view 12 includes bar indicators 20. The length of each bar 20 shows the percentage of variability captured by the corresponding axis of the view; as such, the bars can be interpreted as a measure of the significance of each axis of the spatialization. The view 12 further includes a slider control 22 that can automatically color the nodes based on a k-means clustering.
The node based spatialization in the view 12 is computed through network motif analysis and dimensionality reduction, described in more detail below.
Referring now to Figure 6, the tool begins by calculating a node network for each node in turn, step 60. The node network, or k-neighborhood subnetwork, of a node u is the subnetwork induced by the set of vertices that have shortest-path distance at most k hops from u. In the illustrated embodiment, k = 2, i.e. a node's network extends to its neighbours and its neighbours' neighbours.
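By way of illustration only, the node network of step 60 could be obtained with a breadth-first search limited to k hops. The following Python sketch is not taken from the embodiment; the adjacency-dictionary representation and the function name node_network are assumptions made for the example.

```python
from collections import deque

def node_network(adj, u, k=2):
    """Return the subnetwork induced by all vertices within k hops of u.

    adj maps each vertex to the set of its neighbours (undirected network).
    The result uses the same representation, restricted to the kept vertices.
    """
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        if dist[v] == k:          # do not expand beyond k hops
            continue
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    keep = set(dist)
    # Induced subnetwork: keep every edge whose endpoints are both retained.
    return {v: adj[v] & keep for v in keep}

# Path a-b-c-d-e: the 2-neighborhood of "a" contains a, b and c only.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
sub = node_network(adj, "a", k=2)
print(sorted(sub))  # ['a', 'b', 'c']
```

Because each node network is computed independently of the others, calls of this kind could be distributed across processes for large datasets.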
For each node network, a network motif profile is calculated, step 62. In the embodiment, a profile comprises a 30-element vector where each entry is based on a count of the number of instances of the corresponding network motif in an ordered list that are incident with the node.
The ordered list comprises all network motifs with at most I vertices up to isomorphism connected to a node. In the illustrated embodiment, I = 5, i.e. the maximum number of nodes in any given network motif is 5. The ordered list can be generated using, for example, geng from the nauty package disclosed in MCKAY B.: Isomorph-Free Exhaustive Generation. Journal of Algorithms 26, 2 (1998), 306-324.
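One reading consistent with the 30-element profile is that the list holds the connected graphs on two to five vertices, of which there are 1 + 2 + 6 + 21 = 30 up to isomorphism. The Python sketch below checks that count by brute force. It is not geng and does not use the nauty package; the function names and the canonical form (the smallest relabelled edge list) are assumptions chosen for brevity, and the approach is only practical for very small motifs.

```python
from itertools import combinations, permutations

def is_connected(n, edges):
    """Depth-first search over vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n

def canonical(n, edges):
    """Smallest relabelled edge list over all vertex permutations."""
    return min(
        tuple(sorted(tuple(sorted((p[a], p[b]))) for a, b in edges))
        for p in permutations(range(n))
    )

def connected_motifs(n):
    """All connected graphs on n vertices, one per isomorphism class."""
    pairs = list(combinations(range(n), 2))
    classes = set()
    for m in range(n - 1, len(pairs) + 1):   # a connected graph needs at least n-1 edges
        for edges in combinations(pairs, m):
            if is_connected(n, edges):
                classes.add(canonical(n, edges))
    return sorted(classes)

counts = {n: len(connected_motifs(n)) for n in range(2, 6)}
print(counts, sum(counts.values()))  # {2: 1, 3: 2, 4: 6, 5: 21} 30
```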
The counts for each element of the network profile vector can be calculated using GraphGrepSX disclosed in GIUGNO R., SHASHA D.: GraphGrep: A Fast and Universal Method for Querying Graphs, Proceedings of the 16th International Conference on Pattern Recognition (ICPR'02) (2002), pp. 112-115. GraphGrepSX is a tool that solves the subgraph isomorphism problem using enumerated paths as index features. This can be a time-consuming process, but for large datasets, node networks can be processed in parallel and/or both k and I can be reduced.
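To make the notion of a network motif profile concrete, the following Python sketch counts, for a given node, the connected subnetworks that contain the node, grouped by isomorphism class. It is a brute-force illustration and not GraphGrepSX. It assumes that instances are induced subnetworks, which the description does not state, and the names canonical and motif_profile, as well as the limit of three vertices in the example, are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations

def canonical(vertices, adj):
    """Canonical edge list of the subnetwork induced by vertices, or None if disconnected."""
    vs = list(vertices)
    index = {v: i for i, v in enumerate(vs)}
    edges = [(index[a], index[b]) for a, b in combinations(vs, 2) if b in adj[a]]
    # Connectivity check by depth-first search from the first vertex.
    seen, stack = {0}, [0]
    while stack:
        x = stack.pop()
        for a, b in edges:
            for y in ((b,) if a == x else (a,) if b == x else ()):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
    if len(seen) != len(vs):
        return None
    return min(
        tuple(sorted(tuple(sorted((p[a], p[b]))) for a, b in edges))
        for p in permutations(range(len(vs)))
    )

def motif_profile(adj, u, max_size=3):
    """Count connected induced subnetworks containing u, grouped by isomorphism class."""
    others = [v for v in adj if v != u]
    profile = Counter()
    for size in range(2, max_size + 1):
        for rest in combinations(others, size - 1):
            key = canonical((u,) + rest, adj)
            if key is not None:
                profile[key] += 1
    return profile

# A triangle a-b-c with a pendant vertex d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
profile = motif_profile(adj, "c")
EDGE = ((0, 1),)
PATH = ((0, 1), (0, 2))              # canonical form of the 3-vertex path
TRIANGLE = ((0, 1), (0, 2), (1, 2))
print(profile[EDGE], profile[PATH], profile[TRIANGLE])  # 3 2 1
```

The number of vertex subsets grows quickly with the size of the node network, which is one reason an indexed subgraph matcher and parallel processing are attractive in practice.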
For each network motif profile, a network ratio profile is computed, step 64. In the network ratio profile, each entry of the 30-element vector comprises a normalized ratio of the corresponding entry in the network motif profile. The ratio profile rp of a node network is computed using:
$$rp_i = \frac{nmp_i - \overline{nmp}_i}{nmp_i + \overline{nmp}_i + \varepsilon}$$
where $nmp_i$ is the ith entry of the network motif profile, $\overline{nmp}_i$ is the average of the ith entry of all of the network motif profiles, and $\varepsilon$ is a small integer that ensures that the ratio is not misleadingly large when the network motif appears very few times in all of the node networks. To adjust for scaling, the normalized ratio profile nrp of a node network is computed using:
$$nrp_i = \frac{rp_i}{\sqrt{\sum_j rp_j^2}}$$
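The computation of step 64 can be sketched in Python as follows. The sketch assumes that the ratio takes a difference-over-sum form (the count minus the average, divided by the count plus the average plus ε) and that the scaling divides each ratio profile by its Euclidean length. The function name ratio_profiles and the value ε = 4 are arbitrary choices for the example and are not taken from the embodiment.

```python
import math

def ratio_profiles(profiles, eps=4):
    """Turn raw motif counts into normalized ratio profiles.

    profiles is a list of equal-length count vectors, one per node network.
    eps is the small integer from the text; the value 4 is an arbitrary choice.
    """
    n, m = len(profiles), len(profiles[0])
    mean = [sum(p[i] for p in profiles) / n for i in range(m)]
    result = []
    for p in profiles:
        rp = [(p[i] - mean[i]) / (p[i] + mean[i] + eps) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in rp))
        # A profile equal to the average in every entry has zero length; leave it at zero.
        result.append([x / norm for x in rp] if norm > 0 else rp)
    return result

counts = [[3, 2, 1], [1, 0, 0], [2, 1, 0], [2, 1, 3]]
for nrp in ratio_profiles(counts):
    print([round(x, 3) for x in nrp])
```

Under these assumptions every non-zero normalized ratio profile has unit length, so each entry lies between -1 and 1, with positive entries marking motifs that are more abundant than average in that node network.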
A normalized ratio measures the abundance of a network motif in each individual node network relative to all node networks; it is similar to a z-score. It is noted that there are correlations between the elements of a network ratio profile. Thus, in the embodiment, to adjust for these, a dimensionality reduction is performed, step 66. Principal Component Analysis (PCA) is an exemplary dimensionality reduction technique that calculates the eigenvectors of a covariance matrix generated from a set of vectors, in this case, the network ratio profiles. Each eigenvector, or principal component, corresponds to an orthogonal direction of variation within the data. Often, a small subset of eigenvectors can account for much of the variability. PCA identifies underlying structures such as clusters and outliers that are difficult to perceive in the original set of vectors. In the present embodiment, the first two eigenvectors are calculated and these are used as the x and y axes for the spatialization shown in the window 12 of Figure 1. It will of course be appreciated that if more than two eigenvectors were calculated, the window 12 could be implemented as a three dimensional display of nodes; or other techniques for enabling browsing of higher dimension spaces could be employed. It will also be noted that in Figure 1, the PCA analysis has been directed to determine 5 clusters of nodes, although any number of clusters can be chosen.
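The projection of step 66 can be illustrated with the following self-contained Python sketch. It finds the two leading eigenvectors of the covariance matrix by power iteration with deflation, a technique substituted here so that the example needs no numerical library; the embodiment does not say how the eigenvectors are computed. The function name pca_2d, the iteration count and the sample data are assumptions.

```python
import math

def pca_2d(rows, iterations=200):
    """Project rows onto their first two principal components.

    Returns the 2-D points and the fraction of variability captured by each axis.
    """
    n, m = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - mean[j] for j in range(m)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(m)]
           for i in range(m)]
    total = sum(cov[i][i] for i in range(m))
    work = [row[:] for row in cov]
    axes, explained = [], []
    for k in range(2):
        vec = [1.0 / (i + k + 1) for i in range(m)]   # arbitrary non-zero start vector
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(m)) for i in range(m)]
            length = math.sqrt(sum(x * x for x in nxt))
            if length < 1e-15:                        # no variability left
                break
            vec = [x / length for x in nxt]
        length = math.sqrt(sum(x * x for x in vec))
        vec = [x / length for x in vec]
        # Rayleigh quotient: the eigenvalue associated with vec.
        value = sum(vec[i] * work[i][j] * vec[j] for i in range(m) for j in range(m))
        axes.append(vec)
        explained.append(value / total if total > 0 else 0.0)
        # Deflation: remove the component just found before looking for the next.
        for i in range(m):
            for j in range(m):
                work[i][j] -= value * vec[i] * vec[j]
    points = [[sum(c[j] * a[j] for j in range(m)) for a in axes] for c in centred]
    return points, explained

rows = [[0, 0, 0.0], [1, 1, 0.1], [2, 2, -0.1], [3, 3, 0.1], [4, 4, -0.1]]
points, explained = pca_2d(rows)
print([round(e, 3) for e in explained])
```

The two values in explained play the role of the bar indicators 20: they give the share of the total variability captured by the x and y axes of the spatialization.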
Figure 1 is based on an analysis of the ER dataset. This includes 50 random networks generated using the Erdos-Renyi model disclosed in: ERDOS P., RENYI A.: On Random Graphs. Publicationes Mathematicae Debrecen 6 (1959), 290-297; and ERDOS P., RENYI A.: On the Evolution of Random Graphs. Institute of Mathematics, Hungarian Academy of Sciences 5 (1960), 17-61. [GM10] GINOZA R., MUGLER A.: Network Motifs Come in Sets: Correlations in the Randomization Process, Physical Review E 82, 1 (2010). Each network contains 1,000 vertices and 5,000 edges. Furthermore, five nodes were chosen at random and augmented with additional edges to create a clique.
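A network of the kind described could be generated along the following lines. This Python sketch assumes the fixed-edge-count variant of the Erdos-Renyi model and assumes that the clique edges are added on top of the 5,000 random edges; neither detail is stated in the description. The function name er_with_clique and the seed are hypothetical.

```python
import random
from itertools import combinations

def er_with_clique(n=1000, m=5000, clique_size=5, seed=0):
    """Random network with n vertices and m edges, plus a clique on random vertices."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        a, b = rng.sample(range(n), 2)        # two distinct endpoints
        edges.add((min(a, b), max(a, b)))     # store each undirected edge once
    clique = sorted(rng.sample(range(n), clique_size))
    # Augment the chosen vertices with whatever edges are still missing between them.
    edges.update(combinations(clique, 2))
    return edges, clique

edges, clique = er_with_clique()
print(len(edges), clique)
```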
In Figure 1, a black circle 18 indicates the five nodes belonging to the five-node clique. The exceptionality of these egos is apparent from their position within the window 12. When one of the nodes is selected, the node network for the selected node is displayed in the window 14, while at the same time the network ratio profile for the selected node is displayed in the window 16. The normalized ratio lies in the range [-1, 1] and so it will be seen that at least for the first six elements of the vector the profile is relatively close to average.
Figure 2 shows the window 16 of Figure 1 as updated when all five vertices of the clique indicated by the circle 18 in the network of Figure 1 are selected and displayed in radar chart style. Figure 2 shows that the five nodes v0...v4 have broadly identical network ratio profiles, making them structurally similar. All elements in the network ratio profiles are relatively high, especially for the higher-order network motifs, and this is what makes the corresponding nodes exceptional.
The WS dataset also includes 50 random networks generated using the Watts-Strogatz model disclosed in WATTS D., STROGATZ S.: Collective Dynamics of 'Small-World' Networks. Nature 393 (1998), 440-442. Again, each network contains 1,000 vertices and 5,000 edges. Again, five vertices (nodes) were chosen at random and augmented with additional edges to create a clique.
The ER and WS datasets reveal both the strength and limitation of the present approach. In Figure 1 based on the ER dataset, the clique comprising nodes v0...v4 is easily identifiable through the spatialization 12. (The same is true for the other networks in the dataset.) These cliques are not easily identifiable in the corresponding topological views produced by a force-directed algorithm, for example, the view 26 shown in Figure 5.
However, Figure 3, which is based on a single 1,000-node network from the WS dataset, is less convincing. The five nodes belonging to the five-node clique are surrounded by a black circle 33. The clique is not easily identifiable through the spatialization shown. This is due to the increased clustering coefficient (indicated on the axis 32) found in networks from the WS dataset compared to those from the ER dataset. The nodes in the clique are no longer considered exceptional. (Again, the same is also true for all 50 networks in the WS dataset.) The networks in both Fig. 1 and Fig. 3 have the same number of vertices and edges. However, their differing structure means that a node centric network considered exceptional in one is typical in another.
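The difference in clustering can be illustrated with a short Python sketch. The Watts-Strogatz model starts from a ring lattice and rewires some of its edges; the rewiring probability used for the WS dataset is not given, so the sketch shows only the unrewired lattice with 1,000 vertices and 5,000 edges as a reference point. The function names are assumptions.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = adj[v]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

def ring_lattice(n, k):
    """Each vertex joined to its k nearest neighbours on a ring (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

adj = ring_lattice(1000, 10)                 # 1,000 vertices and 5,000 edges
print(round(local_clustering(adj, 0), 3))    # 0.667
```

For comparison, in a random network with the same number of vertices and edges the expected clustering coefficient equals the edge probability, about 5,000 / 499,500, or roughly 0.01. Densely interconnected neighbourhoods are therefore common in lattice-derived networks, which is consistent with a planted clique standing out far less.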
The Prosper Marketplace dataset (www.prosper.com) is derived from a peer-to-peer lending or social lending service where borrowers ask for money in the form of listings and lenders bid on listings specifying repayment terms including interest rates. If enough lenders fund a listing, the listing becomes a loan. Prosper rates prospective borrowers according to their creditworthiness. It also maintains borrower and lender groups, endorsements, past listings, bids and loans. The social structure of the service is evident from the data: a node represents a borrower or lender and an edge represents a fraction of a loan agreed upon between a borrower and a lender. It should be noted that lenders can also be borrowers and vice versa and therefore the network is not necessarily bipartite.
Figure 4 includes a view 42 showing the activity in the Prosper Marketplace dataset during April 2010. 462 borrowers and lenders agreed upon new loans which were divided into 1,246 fractions. 453 of the borrowers and lenders are in a single connected component. The first and second principal components of the network ratio profiles (the x- and y-axes of the spatialization 42) account for 54% and 16% of the variability in the original dataset (see the bar indicators 44). It can be seen that the node networks of nodes to the left of the spatialization have more vertices and edges than those of the nodes to the right. However, the difference between the nodes along the y-axis of the spatialization is more interesting. Two representative nodes 46 are selected in Fig. 4. The radar chart view 48 of their network ratio profiles reveals that the node to the top, when compared to the node to the bottom, has a node network with relatively fewer lower-order network motifs but relatively more higher-order network motifs. This is corroborated by the small multiples representation 49 of the two corresponding node networks. The node to the bottom of the spatialization (to the left of the small multiples representation 49) has just two neighbours, both of whom are connected to many others. The vertices at the center of the two circles 51 represent the two neighbours. However, the node to the top of the spatialization (to the right of the small multiples representation) has many more neighbours. The vertices in the circle surrounding the node represent these. The differences between, say, the top and bottom egos in Fig. 4 can be computed more easily and directly using, say, node degrees and clustering coefficients.
However, the importance and flexibility of the above approach lies in the fact that the nature or the distinguishing feature(s) of the differences was not input a priori. Thus, through the exploration of the spatialization 42 and the network ratio profiles 48 we can deduce the distinguishing feature(s) of nodes in cliques.
The MIT Reality Mining dataset disclosed in EAGLE N., PENTLAND A., LAZER D.: Inferring Friendship Network Structure by Using Mobile Phone Data, Proceedings of the National Academy of Sciences (PNAS) 106, 36 (2009), 15274-15278 comprises mobile phone call and SMS records over a 296-day period between 100 unique mobile phones. The dataset is a subset of a much larger dataset comprising communication, proximity, location, and activity information involving 100 subjects at MIT over the course of the 2004-2005 academic year. In this case, a node represents a user, or more specifically a mobile phone, and an edge represents a mobile phone call or SMS between two mobile phones. Figure 5 shows a node based view 12' produced according to an embodiment of the present invention and a global view 26 produced using a force-directed algorithm of the network indicating all calls between all users. The global view 26 identifies two large communities, a known artifact of the dataset, being mobile phone users with dense communication within each group and sparse communication between the groups.
The view 12' also identifies two communities 30', 30" but these do not correspond to the two communities in the global view 26. Instead, they correspond to core mobile phone users 30" and peripheral mobile phone users 30'. The peripheral mobile phone users can be further divided into an inner periphery (the nodes 30A below the divider) and an outer periphery (the nodes 30B above the divider). The selected nodes within the circle 30' in the view 12' correspond to the selected nodes in the two circles 28', 28" in the global view 26.
In extensions of the embodiments described above, specialized algorithms could be employed to enumerate more complex network motifs, for example, stars and triangles.
It could also be possible to allow for weighting individual elements of the network ratio profile vector. This would affect the projection of the network ratio profiles onto points in the spatialization. For example, a user could choose to ignore the contribution of one network motif entirely or emphasize the contribution of another.
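One simple way such weighting might be realized is to scale each element of every network ratio profile by a user-chosen factor before the dimensionality reduction is applied. The Python sketch below is an assumption about how this could be done and is not described in the embodiments; the function name apply_weights and the sample values are hypothetical.

```python
def apply_weights(profiles, weights):
    """Scale each profile element by a user-chosen weight before projection."""
    return [[w * x for w, x in zip(weights, p)] for p in profiles]

profiles = [[0.5, 0.5, 0.7], [0.1, 0.9, -0.4]]
weights = [1.0, 0.0, 2.0]      # ignore the second motif, emphasize the third
print(apply_weights(profiles, weights))  # [[0.5, 0.0, 1.4], [0.1, 0.0, -0.8]]
```

A weight of zero removes a motif's contribution to the variability entirely, while a weight above one increases its influence on the principal components.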
It will be seen that in some cases, using PCA, the percentage of variability accounted for by either of the axes in the view 12 could be relatively small, as would be indicated by short bar indicators, and so the positioning of the nodes along the axes would not be significant. Using other dimensionality reduction techniques such as disclosed in ROWEIS S., SAUL L.: Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science 290, 5500 (2000), 2323-2326 could preserve locality and allow for better presentation of data in these cases.
Claims
1. A computer-implemented method for analyzing a network of information comprising a plurality of interconnected nodes, the method comprising the steps of:
for the network, determining a set of network motifs, each network motif comprising a respective pattern of connections between a node and at least its neighbouring nodes;
for each node, determining a network motif profile comprising for each network motif of the set of network motifs, a count of the instances of the network motif at said node;
for each node, normalizing the network motif profile relative to the network motif profiles of other nodes in the network;
for the network, projecting the normalized network motif profiles from a high-dimensional space corresponding to the number of motifs in said set of network motifs, onto a lower dimensional space based on maximizing the variability of normalized network motif profiles with said space; and
displaying at least some of the nodes of said network in said lower dimensional space.
2. The method of claim 1 wherein said projecting comprises performing principal component analysis (PCA) on said normalized network motif profile for said nodes.
3. The method of claim 1 further comprising the step of: responsive to a user selecting one or more nodes from said lower dimensional space display, simultaneously displaying network motif profile information for said selected nodes.
4. The method of claim 1 further comprising the step of: responsive to a user selecting a node from said lower dimensional space display, simultaneously displaying a node network for said node.
5. The method of claim 1 wherein one or both of said determining and normalizing steps are performed in parallel for each node.
6. The method of claim 1 wherein a node's network connections includes connections between a node and its immediate neighbours and connections between a node's neighbours and their neighbours.
7. The method of claim 1 wherein said nodes correspond with bank accounts and said connections correspond with transactions between said bank accounts, said displaying enabling the identification a posteriori of potentially fraudulent behaviour between bank accounts.
8. The method of claim 1 wherein said nodes correspond with phone accounts and said connections correspond with connections between said phone accounts, said displaying enabling the identification a posteriori of irregular behaviour between users of said accounts.
9. The method of claim 1 wherein a node's network motifs extend a maximum number k of interconnections from a node to every other node in the node's network.
10. The method of claim 9 wherein k=2.
11. The method of claim 2, wherein said PCA analysis reduces said normalized network motif profile information to a 2-dimensional space.
12. The method of claim 1 further comprising the step of weighting one or more individual network motifs within said network motif profiles to either emphasize or de-emphasize variations between nodes in said network in respect of specific network motifs.
13. The method of claim 1 further comprising applying a second clustering to said network of nodes to divide said nodes into a specified number of clusters, and wherein said displaying comprises displaying said nodes in said lower dimensional space according to their designated cluster.
14. The method of claim 13 wherein said second clustering comprises k-means clustering.
15. A computer program product comprising computer readable instructions stored on a computer readable medium which when executed in a computing device are arranged to perform the steps of any previous claim.
16. A network analysis tool for analyzing a network of information comprising a plurality of interconnected nodes, the tool being arranged to:
for the network, determine a set of network motifs, each network motif comprising a respective pattern of connections between a node and at least its neighbouring nodes;
for each node, determine a network motif profile comprising for each network motif of the set of network motifs, a count of the instances of the network motif at said node;
for each node, normalize the network motif profile relative to the network motif profiles of other nodes in the network;
for the network, project the normalized network motif profiles from a high-dimensional space corresponding to the number of motifs in said set of network motifs, onto a lower dimensional space based on maximizing the variability of normalized network motif profiles with said space; and
display at least some of the nodes of said network in said lower dimensional space.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GBGB1107251.9A GB201107251D0 (en) | 2011-05-03 | 2011-05-03 | Netowrk analysis tool |
GB1107251.9 | 2011-05-03 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012150107A1 true WO2012150107A1 (en) | 2012-11-08 |
Family
ID=44203005
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2012/056303 WO2012150107A1 (en) | 2011-05-03 | 2012-04-05 | Network analysis tool |
Country Status (2)
Country | Link |
---|---|
GB (1) | GB201107251D0 (en) |
WO (1) | WO2012150107A1 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015047431A1 (en) * | 2013-09-30 | 2015-04-02 | Mcafee, Inc. | Visualization and analysis of complex security information |
GB2523237A (en) * | 2013-12-19 | 2015-08-19 | Bae Systems Plc | Data communications performance monitoring |
US20170257291A1 (en) * | 2016-03-07 | 2017-09-07 | Autodesk, Inc. | Node-centric analysis of dynamic networks |
US10153950B2 (en) | 2013-12-19 | 2018-12-11 | Bae Systems Plc | Data communications performance monitoring |
CN110224847A (en) * | 2018-05-02 | 2019-09-10 | 腾讯科技(深圳)有限公司 | Group dividing method, device, storage medium and equipment based on social networks |
US10601688B2 (en) | 2013-12-19 | 2020-03-24 | Bae Systems Plc | Method and apparatus for detecting fault conditions in a network |
US10728105B2 (en) * | 2018-11-29 | 2020-07-28 | Adobe Inc. | Higher-order network embedding |
CN113961712A (en) * | 2021-09-08 | 2022-01-21 | 武汉众智数字技术有限公司 | Knowledge graph-based fraud telephone analysis method |
CN114826278A (en) * | 2022-04-25 | 2022-07-29 | 电子科技大学 | Image data compression method based on Boolean matrix decomposition |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003062943A2 (en) * | 2002-01-22 | 2003-07-31 | Yeda Research And Development Co. Ltd. | Method for analyzing data to identify network motifs |
EP1587012A2 (en) * | 2004-04-15 | 2005-10-19 | Microsoft Corporation | Reinforced clustering of multi-type data objects for search term suggestion |
US20070094066A1 (en) * | 2005-10-21 | 2007-04-26 | Shailesh Kumar | Method and apparatus for recommendation engine using pair-wise co-occurrence consistency |
WO2011022660A1 (en) * | 2009-08-21 | 2011-02-24 | Puretech Ventures, Llc | Methods of diagnosing and treating microbiome-associated disease using interaction network parameters |
- 2011-05-03: GB application GBGB1107251.9A, patent GB201107251D0 (en), not active (Ceased)
- 2012-04-05: WO application PCT/EP2012/056303, patent WO2012150107A1 (en), active (Application Filing)
Non-Patent Citations (29)
Title |
---|
"Longitudinal Analysis of Personal Networks: The Case of Argentinean Migrants in Spain", SOCIAL NETWORKS, vol. 32, no. 1, 2010, pages 91 - 104 |
ANTIQUEIRA L.; DA FONTOURA COSTA L.: "Characterization of Subgraph Relationships and Distribution in Complex Networks", NEW JOURNAL OF PHYSICS, vol. 11, 2009, pages 013058 |
BORGATTI S.; EVERETT M.: "The Class of All Regular Equivalences: Algebraic Structure and Computation", SOCIAL NETWORKS, vol. 11, 1989, pages 65 - 88 |
BRANDES U.; LERNER J.; LUBBERS M.; MCCARTY C.; MOLINA J.: "Visual Statistics for Collections of Clustered Graphs", PROCEEDINGS OF THE IEEE VGTC PACIFIC VISUALIZATION SYMPOSIUM (PACIFICVIS'08), 2008, pages 47 - 54, XP031238926 |
BRANDES U.; LERNER J.; NAGEL U.; NICK B.: "Proceedings of the 1st International Workshop on Complex Networks (CompleNet'09)", 2009, SPRINGER, article "Structural Trends in Network Ensembles", pages: 83 - 97 |
BUNKE H ET AL: "Recent advances in graph-based pattern recognition with applications in document analysis", PATTERN RECOGNITION, ELSEVIER, GB, vol. 44, no. 5, 1 May 2011 (2011-05-01), pages 1057 - 1067, XP027595411, ISSN: 0031-3203, [retrieved on 20110111], DOI: 10.1016/J.PATCOG.2010.11.015 * |
EAGLE N.; PENTLAND A.; LAZER D.: "Inferring Friendship Network Structure by Using Mobile Phone Data", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES (PNAS), vol. 106, no. 36, 2009, pages 15274 - 15278, XP003027942, DOI: doi:10.1073/PNAS.0900282106 |
ERDOS P.; RENYI A.: "On Random Graphs", PUBLICATIONES MATHEMATICAE DEBRECEN, vol. 6, 1959, pages 290 - 297 |
ERDOS P.; RENYI A.: "On the Evolution of Random Graphs", INSTITUTE OF MATHEMATICS, HUNGARIAN ACADEMY OF SCIENCES, vol. 5, 1960, pages 17 - 61 |
GINOZA R.; MUGLER A.: "Network Motifs Come in Sets: Correlations in the Randomization Process", PHYSICAL REVIEW E, vol. 82, 2010, pages 1 |
GIUGNO R.; SHASHA D.: "GraphGrep: A Fast and Universal Method for Querying Graphs", PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR'02), 2002, pages 112 - 115, XP010613833, DOI: doi:10.1109/ICPR.2002.1048250 |
JEONG D.; ZIEMKIEWICZ C.; FISHER B.; RIBARSKY W.; CHANG R.: "iPCA: An Interactive System for PCA-Based Visual Analytics", PROCEEDINGS OF THE 11TH EUROGRAPHICS/IEEE SYMPOSIUM ON VISUALIZATION (EUROVIS'09), 2009, pages 767 - 774 |
KOSCHUTZKI D.; SCHWOBBERMEYER H.; SCHREIBER F.: "Ranking of Network Elements Based on Functional Substructures", JOURNAL OF THEORETICAL BIOLOGY, vol. 248, no. 3, 2007, pages 471 - 479, XP022235465, DOI: doi:10.1016/j.jtbi.2007.05.038 |
LEICHT E.; HOLME P.; NEWMAN M.: "Vertex Similarity in Networks", PHYSICAL REVIEW E, vol. 73, 2006, pages 026120 |
LI C.; LIN S.: "Proceedings of the 1st International Conference on Social Networks Analysis and Mining (ASONAM'09)", 2009, IEEE COMPUTER SOCIETY, article "Egocentric Information Abstraction for Heterogeneous Social Networks", pages: 255 - 260 |
LORRAIN F.; WHITE H.: "Structural Equivalence of Individuals in Social Networks", JOURNAL OF MATHEMATICAL SOCIOLOGY, vol. 1, 1971, pages 49 - 80 |
MA'AYAN A.; JENKINS S.; WEBB R.; BERGER S.; PURUSHOTHAMAN S.; ABUL-HUSN N.; POSNER J.; FLORES T.; IYENGAR R.: "SNAVI: Desktop Application for Analysis and Visualization of Large-Scale Signaling Networks", BMC SYSTEMS BIOLOGY, vol. 3, 2009, pages 10, XP021052425, DOI: doi:10.1186/1752-0509-3-10 |
MCKAY B.: "Isomorph-Free Exhaustive Generation", JOURNAL OF ALGORITHMS, vol. 26, no. 2, 1998, pages 306 - 324 |
MILO R.; ITZKOVITZ S.; KASHTAN N.; LEVITT R.; SHEN-ORR S.; AYZENSHTAT I.; SHEFFER M.; ALON U.: "Superfamilies of Evolved and Designed Networks", SCIENCE, vol. 303, no. 5663, 2004, pages 1538 - 1542 |
MILO R.; SHEN-ORR S.; ITZKOVITZ S.; KASHTAN N.; CHKLOVSKII D.; ALON U.: "Network Motifs: Simple Building Blocks of Complex Networks", SCIENCE, vol. 298, no. 5594, 2002, pages 824 - 827, XP002496255, DOI: doi:10.1126/science.298.5594.824 |
ROWEIS S.; SAUL L.: "Nonlinear Dimensionality Reduction by Locally Linear Embedding", SCIENCE, vol. 290, no. 5500, 2000, pages 2323 - 2326, XP002971560, DOI: doi:10.1126/science.290.5500.2323 |
SCHREIBER F.; SCHWOBBERMEYER H.: "MAVisto: A Tool for the Exploration of Network Motifs", BIOINFORMATICS, vol. 21, no. 17, 2005, pages 3572 - 3574, XP008077898, DOI: doi:10.1093/bioinformatics/bti556 |
SHEN-ORR S.; MILO R.; MANGAN S.; ALON U.: "Network Motifs in the Transcriptional Regulation Network of Escherichia Coli", NATURE GENETICS, vol. 31, 2002, pages 64 - 68, XP008077904, DOI: doi:10.1038/ng881 |
STOICA A.; PRIEUR C.: "Structure of Neighborhoods in a Large Social Network", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE'09), 2009, pages 26 - 33, XP031544340 |
VON LANDESBERGER T.; GORNER M.; SCHRECK T.: "Visual Analysis of Graphs with Multiple Connected Components", PROCEEDINGS OF THE IEEE SYMPOSIUM ON VISUAL ANALYTICS SCIENCE AND TECHNOLOGY (VAST'09), 2009, pages 155 - 162, XP031566631 |
WATTS D.; STROGATZ S.: "Collective Dynamics of 'Small-World' Networks", NATURE, vol. 393, 1998, pages 440 - 442, XP007918559 |
WELLMAN B. ET AL.: "An Egocentric Network Tale: Comment on Bien et al", SOCIAL NETWORKS, vol. 15, 1993, pages 423 - 436 |
WELSER H.; GLEAVE E.; FISHER D.; SMITH M.: "Visualizing the Signatures of Social Roles in Online Discussion Groups", JOURNAL OF SOCIAL STRUCTURE, vol. 8, 2007 |
WHITE H.; BOORMAN S.; BREIGER R.: "Social Structure from Multiple Networks - Blockmodels of Roles and Positions", AMERICAN JOURNAL OF SOCIOLOGY, vol. 81, 1976, pages 730 - 780 |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9591028B2 (en) | 2013-09-30 | 2017-03-07 | Mcafee, Inc. | Visualization and analysis of complex security information |
WO2015047431A1 (en) * | 2013-09-30 | 2015-04-02 | Mcafee, Inc. | Visualization and analysis of complex security information |
US10601688B2 (en) | 2013-12-19 | 2020-03-24 | Bae Systems Plc | Method and apparatus for detecting fault conditions in a network |
GB2523237A (en) * | 2013-12-19 | 2015-08-19 | Bae Systems Plc | Data communications performance monitoring |
GB2523237B (en) * | 2013-12-19 | 2016-06-01 | Bae Systems Plc | Data communications performance monitoring |
US10153950B2 (en) | 2013-12-19 | 2018-12-11 | Bae Systems Plc | Data communications performance monitoring |
US20170257291A1 (en) * | 2016-03-07 | 2017-09-07 | Autodesk, Inc. | Node-centric analysis of dynamic networks |
WO2017155585A1 (en) * | 2016-03-07 | 2017-09-14 | Autodesk, Inc. | Node-centric analysis of dynamic networks |
US10142198B2 (en) | 2016-03-07 | 2018-11-27 | Autodesk, Inc. | Node-centric analysis of dynamic networks |
CN110224847A (en) * | 2018-05-02 | 2019-09-10 | 腾讯科技(深圳)有限公司 | Group dividing method, device, storage medium and equipment based on social networks |
CN110224847B (en) * | 2018-05-02 | 2021-12-14 | 腾讯科技(深圳)有限公司 | Social network-based community division method and device, storage medium and equipment |
US10728105B2 (en) * | 2018-11-29 | 2020-07-28 | Adobe Inc. | Higher-order network embedding |
CN113961712A (en) * | 2021-09-08 | 2022-01-21 | 武汉众智数字技术有限公司 | Knowledge graph-based fraud telephone analysis method |
CN113961712B (en) * | 2021-09-08 | 2024-04-26 | 武汉众智数字技术有限公司 | Knowledge-graph-based fraud telephone analysis method |
CN114826278A (en) * | 2022-04-25 | 2022-07-29 | 电子科技大学 | Image data compression method based on Boolean matrix decomposition |
CN114826278B (en) * | 2022-04-25 | 2023-04-28 | 电子科技大学 | Graph data compression method based on Boolean matrix decomposition |
Also Published As
Publication number | Publication date |
---|---|
GB201107251D0 (en) | 2011-06-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2012150107A1 (en) | Network analysis tool | |
Costa et al. | Characterization of complex networks: A survey of measurements | |
US11195312B1 (en) | Gragnostics rendering | |
Harenberg et al. | Community detection in large‐scale networks: a survey and empirical evaluation | |
Dianati | Unwinding the hairball graph: Pruning algorithms for weighted complex networks | |
Liu et al. | Multicriterion market segmentation: a new model, implementation, and evaluation | |
Taha et al. | SIIMCO: A forensic investigation tool for identifying the influential members of a criminal organization | |
WO2019149268A1 (en) | Method and system for marketing internet-based insurance products | |
Zhou et al. | A robust clustering algorithm based on the identification of core points and KNN kernel density estimation | |
Nagar et al. | Visualization and analysis of Pareto-optimal fronts using interpretable self-organizing map (iSOM) | |
Chen et al. | Same stats, different graphs: Exploring the space of graphs in terms of graph properties | |
US11810001B1 (en) | Systems and methods for generating and implementing knowledge graphs for knowledge representation and analysis | |
CN104035978B (en) | Combo discovering method and system | |
Hatami | A new approach for analyzing financial markets using correlation networks and population analysis | |
Lin et al. | A novel centrality-based method for visual analytics of small-world networks | |
Lee et al. | A network structural approach to the link prediction problem | |
Shah et al. | A Three-Way Clustering Mechanism to Handle Overlapping Regions | |
Huang et al. | NGD: Filtering graphs for visual analysis | |
Huang et al. | Eigenedge: A measure of edge centrality for big graph exploration | |
Voges et al. | A rough cluster analysis of shopping orientation data | |
Gogoglou et al. | The fractal dimension of a citation curve: quantifying an individual’s scientific output using the geometry of the entire curve | |
Li et al. | An Optimization-Based Order-and-Cut Approach for Fair Clustering of Data Sets | |
Zhang et al. | Community detection based on structural balance in signed networks | |
US11928123B2 (en) | Systems and methods for network explainability | |
Goodrich et al. | The stabilizing effect of noise on the dynamics of a Boolean network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 12715347; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 12715347; Country of ref document: EP; Kind code of ref document: A1 |