US10657686B2 - Gragnostics rendering - Google Patents

Gragnostics rendering

Info

Publication number
US10657686B2
Authority
US
United States
Prior art keywords
graphs
graph
feature
vertices
edges
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US15/935,657
Other versions
US20190295296A1 (en)
Inventor
Robert P. Gove, JR.
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Two Six Labs LLC
Original Assignee
Two Six Labs LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Two Six Labs LLC filed Critical Two Six Labs LLC
Priority to US15/935,657 priority Critical patent/US10657686B2/en
Assigned to Two Six Labs, LLC reassignment Two Six Labs, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GOVE, ROBERT P., JR
Publication of US20190295296A1 publication Critical patent/US20190295296A1/en
Assigned to COMERICA BANK reassignment COMERICA BANK SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: Two Six Labs, LLC
Priority to US16/865,946 priority patent/US11195312B1/en
Application granted granted Critical
Publication of US10657686B2 publication Critical patent/US10657686B2/en
Assigned to Two Six Labs, LLC reassignment Two Six Labs, LLC RELEASE OF SECURITY INTEREST RECORDED AT R/F 051683/0437 Assignors: COMERICA BANK
Assigned to ANNALY MIDDLE MARKET LENDING LLC, AS COLLATERAL AGENT reassignment ANNALY MIDDLE MARKET LENDING LLC, AS COLLATERAL AGENT SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: TWO SIX LABS HOLDINGS, INC., Two Six Labs, LLC
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/20 Drawing from basic elements, e.g. lines or circles
    • G06T11/206 Drawing of charts or graphs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G06F16/287 Visualization; Browsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/358 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H04L43/045 Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data

Definitions

  • Graphs are often employed for representing associations or relations between entities. Graphs include a set of nodes, or vertices connected by edges, or lines. Mathematical and computational representations defined by various expressions and data structures are often used in practice, however graphs can also be visually expressed to facilitate human recognition of the information or trends contained therein. Visual recognition of trends can often be made by observing “clusters” of nodes, meaning a set of nodes that share similar attributes, features or values, and the corresponding computational representation is available for automated processing or interpretation of the graph in areas such as machine learning, analytics, and statistics.
  • a graph processing system, method and apparatus classifies graphs based on a linearly computable set of features defined as a feature vector adapted for comparison with the feature vectors of other graphs.
  • the features result from graph statistics (“gragnostics”) computable from the edges and vertices of a set of graphs.
  • Graphs are classified based on a multidimensional distance of the resulting feature vectors, and similar graphs are denoted according to a distance, or nearest neighbor, of the feature vector corresponding to each graph. Projection of the feature vector onto a two dimensional Cartesian plane allows visualization of the classification, as similar graphs appear as clusters or groups separated by a relatively shorter distance. Different types or classifications of graphs also appear as other, more distant, clusters.
  • An initial training set defines the classification types, and sampled graphs are evaluated and classified based on the feature vector and nearest neighbors in the training set.
  • Configurations herein are based, in part, on the observation that graphs are often employed for storing information that can also be rendered or visualized to facilitate human recognition of the information contained in the graph.
  • conventional approaches to graph processing and rendering suffer from the shortcoming that large graphs can be unwieldy in both processing and recognition.
  • Existing approaches to comparing graphs are slow and not very expressive in explaining how the graphs are similar or dissimilar.
  • Large graphs, such as those that may be derived from analytics-focused data sources, can result in a number of nodes and edges that are cumbersome to process, and difficult to visually recognize or interpret, due to scaling or geometric issues, e.g. a large number of nodes or edges merge into an amorphous visual image.
  • configurations herein substantially overcome the above-described shortcomings by providing a linearly computable feature vector based on quantifiable features of a graph, and a comparison metric that determines a classification of a graph to designate graphs sharing similar features, and hence, likely depict related information types or forms.
  • classification of a graph results from comparison with other graphs to answer questions such as “Which graphs or graph types does my graph most resemble?”
  • correlations between the data sources may be inferred.
  • Configurations disclosed herein are operable for, in an analytics environment having graph data responsive to rendering for visual recognition, comparison of statistical trends defined in a plurality of graphs.
  • the disclosed scalable method of rendering visualized graph data includes receiving a plurality of graphs, such that each graph defines associations between data entities and is renderable in a visual form having vertices connected by edges, and computing, for each graph, a plurality of features based on the edges and vertices.
  • the computed features, detailed below, provide a scalar quantity that may then be normalized into a predetermined range, such as 0-1.
  • An application operable on a gragnostics processor arranges, for each graph, each of the normalized features into a feature vector having ordered values for each of the features.
  • the example herein coalesces 10 features into an ordered vector of normalized values in the 0.0-1.0 range, providing a multidimensional vector coalescing the features of each graph.
  • the gragnostics processor determines similarity of the graphs based on a distance between the corresponding visualized feature vectors, and a user display renders a visualization of the feature vectors for depicting the relative distance from other graphs.
  • FIG. 1 is a context diagram of a computing environment suitable for use with configurations herein;
  • FIG. 2 shows an example of graph features employed in assessing graphs in the computing environment of FIG. 1 ;
  • FIG. 3 is an example correlation of the graph features of FIG. 2 derived from a set of graphs;
  • FIGS. 4A-4D show a gragnostic plot for comparing feature vectors aggregated from the graph features of FIG. 3 ; and
  • FIGS. 5A1-5C denote a flowchart depicting graph processing resulting in the gragnostic plot of FIGS. 4A-4D.
  • Configurations below implement a gragnostic processing approach for an example set of graphs, and illustrate comparison and determination of a graph class based on a baseline or control set of other graphs.
  • An example set of graphs is employed for training and for classification; however, any suitable graph representations may be employed.
  • a graph as employed herein is a mathematical structure defining a set of nodes—also known as vertices—and the links—or edges—that connect them. Speed and efficiency are significant because graph comparisons can be used to determine similarity or distances between a large number of graphs, which in turn can then be used to cluster large graph datasets, or to query a database to identify similar graphs. For example, in social media networks, clustering users' ego networks may identify fake accounts vs. real accounts.
  • the disclosed approach demonstrates several advantages over the prior art: 1) Computational performance that enables highly scalable graph comparisons, 2) comparisons that can be meaningfully interpreted by humans, and 3) fewer constraints on input graphs. Determining similarity of two graphs is related to graph isomorphism, which belongs to the class of NP problems, i.e. computationally intractable in practice.
  • Conventional approaches include so-called Graph kernels which are more computationally tractable by using algorithms whose running time is a polynomial function of the number of nodes in the graph, but these kernels cannot be meaningfully interpreted or understood by humans.
  • polynomial time functions become computationally infeasible for very large graphs.
  • linear computability metrics are scalable because the complexity (number of computing operations) does not vary exponentially with the number of inputs, which becomes prohibitive with a sufficiently large number of inputs.
  • Other conventional approaches use techniques such as singular value decomposition of adjacency matrices or multi-dimensional scaling on Euclidean distances between adjacency matrices, however these are also polynomial time functions that yield unintelligible results and require that input graphs all be of the same size.
  • the gragnostics approach disclosed herein emphasizes several advantages.
  • gragnostics should scale to large graphs, thus the disclosed gragnostics can be computed in O(|V|+|E|) time (vertices and edges).
  • the features are comprehensible by analysts who may not be experts in graph theory, as the rendered gragnostics correspond to topological characteristics described in visual renderings. This enables broad audiences to easily understand gragnostics.
  • the disclosed approach imposes few constraints, so there are few restrictions on size or components, which also complements the linear computability provided by O(|V|+|E|).
  • FIG. 1 is a context diagram of a computing environment 100 suitable for use with configurations herein.
  • a gragnostics processor 110 is responsive to a user computing device 112 .
  • a repository 120 stores a plurality of graphs, which may include a training set 122 of graphs depicting particular types, or previous graphs 124 employed for classification, which are then added to the set of graphs employed for classification.
  • Configurations herein are particularly beneficial in an analytics environment having graph data responsive to rendering for visual recognition and comparison.
  • Visually recognizable aspects of the rendered graphs can denote statistical trends defined in a plurality of graphs.
  • the gragnostics processor 110 receives a plurality of graphs, such that each graph defines associations between data entities and is renderable in a visual form having vertices connected by edges.
  • the graphs are defined by a data structure indicative of, for each vertex, an association to a set of connected vertices, in which each vertex is renderable as a node and the set of associations defined by a line (edge) to each connected vertex.
  • a variety of data structures may be employed for storing the graph representations as utilized herein.
  • the user device 112 receives a graph for classification 150 from any suitable source, such as the Internet 130 or other source, such as a set of graphs stored on media to be classified according to configurations herein.
  • the classification processor 110 which may simply be a desktop PC in conjunction with the user device 112 , performs the classification and renders a graph classification 152 indicative of graphs in the repository 120 that the graph for classification most resembles. As will be discussed further below, similar graphs produce distinct clusters when classified according to a robust set of graphs, therefore defining a distinct group that the graph for classification most resembles.
  • the graph may also be stored in the repository 120 as a previous graph 124 and used for subsequent classifications.
  • FIG. 2 shows an example of graph features employed in assessing graphs in the computing environment of FIG. 1 .
  • the gragnostics processor 110 computes, for each graph, a plurality of features based on the edges and vertices.
  • Each of the features has a linear computability time such that the feature is computable in a time that varies linearly with the number of nodes and/or edges, providing for scalability to large accumulations of data.
  • Each feature corresponds to a set of traversal steps for processing the graph, in which the traversal steps define a finite sequence of operations that varies linearly based on the number of vertices or edges in the graph.
  • the gragnostics processor 110 computes each of the features in a numerical manner, and normalizes each of the features to a range between 0.0 and 1.0 to facilitate intra-graph comparisons.
  • the features relate to visual attributes of the graphs, and include density, bridge, disconnectivity, isolation, constriction, linearity, tree and star, and also the number of vertices (nodes) and a number of edges.
  • FIG. 2 shows a visual appearance of a minimal 201 depiction of a feature ranging to a maximum depiction 203 of a feature, as well as a moderate depiction 202 of the feature.
  • Gragnostics processor 110 normalizes the computed features into a predetermined range, such that for each feature, the value is scaled to a range of between 0 and 1. Alternate feature ranges may be employed; however, normalizing to the same range allows comparison between different graphs in the plurality of graphs.
  • the gragnostics processor 110 employs linear-time, interpretable metrics to create a graph feature vector that can be used to compute distance between graphs using techniques such as Euclidean distance, or compute clusters of graphs using techniques such as k-nearest neighbors or k-means clustering. Other multidimensional distance approaches may also be employed.
  • the gragnostics processor 110 arranges each of the normalized features into a feature vector having ordered values for each of the features.
  • the feature vector includes a value for each feature normalized to a common range for the 10 features, 0.0-1.0 in this case.
  • FIG. 3 is an example correlation of the graph features of FIG. 2 derived from a set of graphs. Referring to FIGS. 2 and 3 , computation of each of the 10 feature metrics results in the intermediate step shown in FIG. 3 .
  • a plurality of classes 310 are defined based on at least the training set 122 to produce a correlation 300 of features.
  • the classes 310 , each denoted by a different color, define the classifications, or groups of graphs, discussed further below in FIG. 4 .
  • FIG. 3 illustrates the degree to which certain features correlate with, or predict, other features.
  • a vertical axis 320 lists each of the 10 features, and a horizontal axis 322 lists the same set of features.
  • each feature depicts an array of subgraphs 350 (3 upper left subgraphs labeled for clarity).
  • FIG. 3 also illustrates the benefit of normalizing each of the features to the range of 0.0-1.0 to allow comparison to other features, as each subgraph 350 has a horizontal axis 352 and vertical axis 354 .
  • the horizontal axis 352 defines the value of the feature defined on the axis 322
  • the vertical axis 354 defines the value of the feature defined on the axis 320 .
  • each graph in a training set 122 is plotted based on the normalized value of the feature. This illustrates the intermediate step of rendering a graphing of each feature against the other features for each graph.
  • the color of the plot point indicates the group from which the graph was derived.
  • a grouping 360 denotes that subway graphs (green dots showing graphs derived from an inner city subway layout) tend to be less constricted (constricted feature value near 0.0).
  • Group 362 shows that a high value in the lines feature tends to correlate with the constriction feature.
  • Group 364 demonstrates that star and bridge features distinguish the ego graphs (pink dots showing social media connections).
  • group 366 distinguishes geometric graphs (blue dots derived from graphs of regular geometric shapes).
  • Group 368 demonstrates a correlation between tree and density features
  • group 370 shows correlation between bridge and constricted features.
  • FIGS. 4A-4D coalesce the aggregate features of each graph for comparison on a broader scale.
  • FIGS. 4A-4D show a gragnostic plot for comparing feature vectors aggregated from the graph features of FIG. 3 .
  • the computed set of features ( 10 , in the example shown) defines an ordered vector, which can be represented in a multidimensional space.
  • FIGS. 4A-4D build on the feature vector generated from the metrics of FIG. 3 by computing a two dimensional (2D) plot depicting each of the feature vectors presented for comparison.
  • the preexisting graphs of the training set 122 and previous graphs 124 are already plotted to define graph types, one of which the graphs for classification 150 will fall into.
  • the feature vector includes the ten 0.0-1.0 magnitudes of each of the normalized features for each graph.
  • the user device 112 is operable for rendering a visualization of the feature vectors, and the gragnostics processor 110 determines a similarity of the graphs based on a distance between the corresponding visualized feature vectors. It should be noted that the visual rendering of FIG. 3 , depicting individual features, is an intermediate step not required for generating the feature vector.
  • the feature vector is generated from the 10 computed, normalized features.
  • the feature vector, when computed as an ordered set of normalized values, therefore defines a multidimensional vector, or a reference to a multidimensional space.
  • the gragnostics processor 110 is configured to compute a multidimensional distance between each of the feature vectors for determining a similarity between the graphs corresponding to the feature vectors.
  • a multidimensional distance is computable between vectors of different graphs, offering a coalesced metric of similarity to other graphs.
  • the feature vector may be projected or reduced onto a two-dimensional (2D) plane depicting the computed distance between the feature vectors of different graphs. Similar graphs appear as a “cluster” or closely located set of points, as depicted in FIGS. 4A-4D .
  • Graph classification occurs based on the computed distance between the feature vectors of each of the graphs, and is rendered visually by these clusters of points.
  • the graph rendering of FIG. 4A depicts a comparison of the computed distance between each of the graphs, and classification of each of the graphs based on the comparison by determining a nearest neighbor, shown as different colors in FIG. 4A .
  • in FIGS. 4A-4D , distinct groups of values (points) are shown for a sample set of graphs.
  • a two dimensional rendering region 400 depicts a metric multi-dimensional scaling (MDS) plot of graphs projected onto 2 dimensions from the 10-dimensional gragnostics feature space (e.g. a multidimensional space encompassing the 10-value feature vector).
  • FIG. 4A illustrates notable class separation in groups 410 , 430 , 450 , 470 and 490 , indicating that different classes of graphs have different feature vector values.
  • a cluster 410 of green plots is based on graphs of subway maps.
  • Graph plots 411 - 1 , 411 - 2 and 411 - 3 are based on graph data of graphs 421 - 1 , 421 - 2 and 421 - 3 , respectively, depicting the Tokyo, Shanghai and London subway networks.
  • Tokyo 421 - 1 represents the graph for classification, and graphs 421 - 2 and 421 - 3 are the computed nearest neighbors in the training set.
  • Feature vector 423 - 1 shows the values of the features computed for the graph 421 - 1
  • feature vectors 423 - 2 and 423 - 3 show the respective training set values.
  • a nearest neighbors value 425 shows London and Shanghai as the closest valued vectors, a characteristic visually apparent on the rendering region 400 . It is apparent that, based on the classification of the Tokyo subway graph 421 - 1 , its distance to the London subway graph 421 - 3 is very short; its features are nearly identical, and its force-directed node-link diagram shares the same visual structure. Meanwhile, the Tokyo graph's second nearest neighbor is the Shanghai subway graph 421 - 2 , which is farther away than London. Shanghai has higher bridge, constricted, and line features. Furthermore, we can visually confirm this dissimilarity by looking at Shanghai's force-directed node-link diagram and noting that it has more bridge edges, it is more constricted, and it is more line-like because more vertices have only two edges
  • the feature vectors 443 - 1 , 443 - 2 and 443 - 3 indicate that the star feature is most pronounced, followed by the tree feature value.
  • the nearest neighbor values 445 likewise designate “686” and “348” as the closest.
  • one of the nearest character graphs is Star Wars 2 (441-4) having feature vector 443 - 4 .
  • the Star Wars 2 and Storm of Swords graphs ( FIG. 4D ) are more typical of non-autobiographical character graphs. We also see that the software and the character classes overlap in the MDS plot 400 .
  • Another grouping 450 is based on software graphs, defined by code structure and plotted as color purple.
  • a plot for classification 451 - 1 is shown in FIG. 4D as graph 461 - 1 (sjbullet), with nearest neighbors of 451 - 2 and 451 - 3 , corresponding to graphs 461 - 2 (Storm of Swords) and Javax 461 - 3 .
  • the next nearest neighbor is Javax 461 - 3 , shown by nearest neighbor values 465 , and each has highest feature values of tree, with the star feature running a distant second.
  • Computation of the feature vector includes computation of each of the following 10 features in linear computability time, meaning that the number of computing instructions, and hence time, varies linearly with the number of nodes and/or edges.
  • Number of nodes: This counts the number of nodes in the graph. This runs in O(|V|) time.
  • Number of links: This counts the number of links in the graph. This runs in O(|E|) time.
  • Density: This determines the link density in the graph, or the probability that any two randomly chosen nodes will be connected via a link. This is calculated by $\frac{2 \cdot |E|}{|V| \cdot (|V| - 1)}$.
  • Isolation: This describes the fraction of nodes in the graph that are not connected to any other node. This is calculated by $\frac{|\{ v \in V : d(v) = 0 \}|}{|V|}$.
  • Star: v* represents the node with the highest degree in the graph. This requires finding the node with the highest degree and summing the degree of each node, so this runs in O(|V|+|E|) time.
  • Bridge: This is calculated by $\frac{\mathrm{bridge}(G)}{|V| - 1}$, where bridge(G) is the number of bridge links in graph G.
  • Constricted: cut(G) is the number of nodes whose removal will disconnect the graph. This can also be found using a breadth-first search in O(|V|+|E|) time.
  • Disconnected: This describes the fraction of connected components in a graph out of the maximum possible number of connected components, i.e. a fraction denoting the degree that clusters are unreachable by other clusters.
  • Tree: This describes how close a graph is to being a tree, i.e. how close it is to a graph with no cycles. This is calculated by $1 - \frac{|E| - (|V| - 1)}{|V| \cdot (|V| - 1)/2 - (|V| - 1)}$.
  • Line: This is calculated by $\frac{\sum_{i=1}^{|V|} l(i)}{|V|}$, where D is a vector of length |V| in which each element is the degree d(v) of a vertex in V such that if d(v) = 1 then it is at the beginning of the vector.
  • FIGS. 5A1-5C denote a flowchart 500 depicting graph processing resulting in the gragnostic plot of FIGS. 4A-4D.
  • the gragnostics processor 110 loads each graph from a media source or user data store.
  • a classification is performed for one graph against a preexisting training set 122 or previous graphs 124 presented for classification.
  • the following gragnostics steps are performed for all graphs, and a classification distinction made against previously processed graphs. This includes receiving a plurality of graphs in a suitable data structure, such that each graph defines associations between data entities and is renderable in a visual form having vertices connected by edges, as depicted at step 502 .
  • the gragnostics processor 110 preprocesses each graph to arrange for feature computation, as disclosed at step 503 .
  • Each of the graphs is defined by a data structure indicative of, for each vertex, an association to a set of connected vertices, such that each vertex is renderable as a node and the set of associations defined by a line to each connected vertex, as depicted at step 504 .
  • Such an arrangement corresponds to a typical visualization of a graph which has the appearance of circles with lines emanating from each circle and terminating in other circles to which the lines connect.
  • the gragnostics processor 110 extracts the 10 features, as shown at step 505 .
  • the set of 10 features provides an illustrative set for permitting optimal computation of the graphs as defined herein, however similar configurations may consider a greater or lesser number of features.
  • the gragnostics processor 110 computes, for each graph, a plurality of features based on the edges and vertices, as depicted at step 506 .
  • each of the features has a linear computability time such that the feature is computable in a time that varies linearly with at least one of the number of nodes or edges, as depicted at step 507.
  • the gragnostics processor computes the tree feature at step 508 , depicted in FIG. 5B , and the linearity feature at step 509 , depicted in FIG. 5C .
  • the gragnostics processor 110 creates a feature vector using the 10 features for each graph, as disclosed at step 510 . This includes, at step 511 , normalizing the computed features into a predetermined range, and for each graph, arranging each of the normalized features into a feature vector, such that the feature vector has ordered values for each of the features, depicted at step 512 . Normalizing, in the example configuration, scales each feature to a range of 0.0-1.0, facilitating comparison and rendering as a multidimensional vector.
  • Comparison and visualization includes computing distances between graphs using the feature vectors, as shown at step 512 .
  • Multidimensional values such as the feature vector may be compared using Euclidean distance or similar metrics, such that the distance between feature vectors is indicative of the similarity of the corresponding graphs.
  • the gragnostics processor 110 computes a two dimensional position based on each of the feature vectors, as depicted at step 514 , and projects the position of each vector onto a visualized two dimensional rendering, as depicted at step 515 and rendered in FIG. 4A .
  • the gragnostics processor 110 launches the desired comparison analytic (e.g. clustering or nearest neighbor search), and invokes the interpretability of the features to understand differences between graphs and clusters of graphs (e.g. these two graphs are very similar, except one is more star-like than the other), as disclosed at step 516 .
  • Groupings, or classifications of graphs can therefore be determined by observing a cluster of graphs separated by relatively small distances.
  • the gragnostics processor 110 classifies, based on a distance on the visualized two dimensional graph, groups of graphs, such that the classification is defined by visual clusters of the positions on the two dimensional rendering, as depicted at step 517 .
  • the result is a determination of whether the graph for classification corresponds to one of the classes of graphs, disclosed at step 518 .
  • FIG. 5B is a flowchart of computation of the tree feature.
  • computing the features includes, at step 550, determining the tree feature in linear time by traversing each of the vertices in the graph and accumulating, based on the traversal, a number of edges, as depicted at step 551.
  • the gragnostics processor 110 determines a number of edges whose removal would result in a tree by removing cyclic paths, as disclosed at step 552, thus providing a measure of how “close” the graph is to a model tree, and compares the determined number of edges with a number of the traversed vertices, as disclosed at step 553. A sketch of the tree and linearity computations appears after this list.
  • FIG. 5C details computation of the linearity feature.
  • computing the linearity feature in linear time includes traversing each of the vertices in the graph, as shown at step 570 , and determining, at each vertex, if a number of edges emanating from the vertex is consistent with a linear graph, as depicted at step 571 .
  • the gragnostics processor 110 accumulates the number of vertices consistent with a linear graph, at step 572 , and compares the accumulated vertices with the number of traversed vertices, as depicted at step 573 .
  • the result is a measure of a relative number of total vertices that satisfy the criteria for a linear graph, meaning two edges touch each vertex with the exception of two terminal vertices touching only one edge.
  • programs and methods defined herein are deliverable to a user processing and rendering device in many forms, including but not limited to a) information permanently stored on non-writeable storage media such as ROM devices, b) information alterably stored on writeable non-transitory storage media such as floppy disks, magnetic tapes, CDs, RAM devices, and other magnetic and optical media, or c) information conveyed to a computer through communication media, as in an electronic network such as the Internet or telephone modem lines.
  • the operations and methods may be implemented in a software executable object or as a set of encoded instructions for execution by a processor responsive to the instructions.
  • Alternatively, the operations and methods may be embodied in whole or in part using hardware components such as Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), state machines, controllers or other hardware components or devices, or a combination of hardware, software, and firmware components.
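Returning to the flowcharts of FIGS. 5B and 5C, the following is a minimal sketch of the tree (steps 550-553) and linearity (steps 570-573) computations. It is illustrative only, not the patent's code; it assumes a plain adjacency map from each vertex to its set of neighboring vertices, and the function names are hypothetical.

```python
from typing import Dict, Hashable, Set

def tree_feature(adj: Dict[Hashable, Set[Hashable]]) -> float:
    """Steps 550-553: compare the edges that must be removed to leave a tree against the vertex count."""
    V = len(adj)
    E = sum(len(n) for n in adj.values()) // 2     # accumulate edges while traversing vertices
    extra = E - (V - 1)                            # edges whose removal would yield a tree (cycle edges)
    max_extra = V * (V - 1) / 2 - (V - 1)          # worst case: a complete graph
    return 1.0 if max_extra <= 0 else 1.0 - extra / max_extra

def linearity_feature(adj: Dict[Hashable, Set[Hashable]]) -> float:
    """Steps 570-573: count vertices whose edge count is consistent with a linear graph."""
    V = len(adj)
    endpoints = sum(1 for n in adj.values() if len(n) == 1)   # terminal vertices touch one edge
    interior = sum(1 for n in adj.values() if len(n) == 2)    # interior vertices touch two edges
    consistent = min(endpoints, 2) + interior                 # at most two terminal vertices count
    return consistent / V if V else 0.0

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}                 # a 4-vertex line graph
print(tree_feature(path), linearity_feature(path))            # 1.0 1.0
```

Both functions make a single pass over the adjacency map, consistent with the linear computability emphasized above.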

Abstract

A graph processing system, method and apparatus classifies graphs based on a linearly computable set of features defined as a feature vector adapted for comparison with the feature vectors of other graphs. The features result from graph statistics (“gragnostics”) computable from the edges and vertices of a set of graphs. Graphs are classified based on a multidimensional distance of the resulting feature vectors, and similar graphs are classified according to a distance, or nearest neighbor, of the feature vector corresponding to each graph. Projection of the feature vector onto two dimensions allows visualization of the classification, as similar graphs appear as clusters or groups separated by a relatively shorter distance. Different types or classifications of graphs also appear as other, more distant, clusters. An initial training set defines the classification types, and sampled graphs are evaluated and classified based on the feature vector and nearest neighbors in the training set.

Description

BACKGROUND
Graphs are often employed for representing associations or relations between entities. Graphs include a set of nodes, or vertices connected by edges, or lines. Mathematical and computational representations defined by various expressions and data structures are often used in practice, however graphs can also be visually expressed to facilitate human recognition of the information or trends contained therein. Visual recognition of trends can often be made by observing “clusters” of nodes, meaning a set of nodes that share similar attributes, features or values, and the corresponding computational representation is available for automated processing or interpretation of the graph in areas such as machine learning, analytics, and statistics.
SUMMARY
A graph processing system, method and apparatus classifies graphs based on a linearly computable set of features defined as a feature vector adapted for comparison with the feature vectors of other graphs. The features result from graph statistics (“gragnostics”) computable from the edges and vertices of a set of graphs. Graphs are classified based on a multidimensional distance of the resulting feature vectors, and similar graphs are denoted according to a distance, or nearest neighbor, of the feature vector corresponding to each graph. Projection of the feature vector onto a two dimensional Cartesian plane allows visualization of the classification, as similar graphs appear as clusters or groups separated by a relatively shorter distance. Different types or classifications of graphs also appear as other, more distant, clusters. An initial training set defines the classification types, and sampled graphs are evaluated and classified based on the feature vector and nearest neighbors in the training set.
Configurations herein are based, in part, on the observation that graphs are often employed for storing information that can also be rendered or visualized to facilitate human recognition of the information contained in the graph. Unfortunately, conventional approaches to graph processing and rendering suffer from the shortcoming that large graphs can be unwieldy in both processing and recognition. Existing approaches to comparing graphs are slow and not very expressive in explaining how the graphs are similar or dissimilar. Large graphs, such as those that may be derived from analytics-focused data sources, can result in a number of nodes and edges that are cumbersome to process, and difficult to visually recognize or interpret, due to scaling or geometric issues, e.g. a large number of nodes or edges merge into an amorphous visual image. Accordingly, configurations herein substantially overcome the above-described shortcomings by providing a linearly computable feature vector based on quantifiable features of a graph, and a comparison metric that determines a classification of a graph to designate graphs sharing similar features, and hence, likely depict related information types or forms. In other words, classification of a graph results from comparison with other graphs to answer questions such as “Which graphs or graph types does my graph most resemble?” Depending on the data source used to define the graphs, correlations between the data sources may be inferred.
Configurations disclosed herein are operable for, in an analytics environment having graph data responsive to rendering for visual recognition, comparison of statistical trends defined in a plurality of graphs. The disclosed scalable method of rendering visualized graph data includes receiving a plurality of graphs, such that each graph defines associations between data entities and is renderable in a visual form having vertices connected by edges, and computing, for each graph, a plurality of features based on the edges and vertices. The computed features, detailed below, provide a scalar quantity that may then be normalized into a predetermined range, such as 0-1. An application operable on a gragnostics (a portmanteau of “graph” and “diagnostics”) processor arranges, for each graph, each of the normalized features into a feature vector having ordered values for each of the features. The example herein coalesces 10 features into an ordered vector of normalized values in the 0.0-1.0 range, providing a multidimensional vector coalescing the features of each graph. The gragnostics processor determines similarity of the graphs based on a distance between the corresponding visualized feature vectors, and a user display renders a visualization of the feature vectors for depicting the relative distance from other graphs.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other objects, features and advantages of the invention will be apparent from the following description of particular embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
FIG. 1 is a context diagram of a computing environment suitable for use with configurations herein;
FIG. 2 shows an example of graph features employed in assessing graphs in the computing environment of FIG. 1;
FIG. 3 is an example correlation of the graph features of FIG. 2 derived from a set of graphs;
FIGS. 4A-4D show a gragnostic plot for comparing feature vectors aggregated from the graph features of FIG. 3; and
FIGS. 5A1-5C denote a flowchart depicting graph processing resulting in the gragnostic plot of FIGS. 4A-4D.
DETAILED DESCRIPTION
Configurations below implement a gragnostic processing approach for an example set of graphs, and illustrate comparison and determination of a graph class based on a baseline or control set of other graphs. An example set of graphs is employed for training and for classification; however, any suitable graph representations may be employed. A graph as employed herein is a mathematical structure defining a set of nodes—also known as vertices—and the links—or edges—that connect them. Speed and efficiency are significant because graph comparisons can be used to determine similarity or distances between a large number of graphs, which in turn can then be used to cluster large graph datasets, or to query a database to identify similar graphs. For example, in social media networks, clustering users' ego networks may identify fake accounts vs. real accounts.
The disclosed approach demonstrates several advantages over the prior art: 1) Computational performance that enables highly scalable graph comparisons, 2) comparisons that can be meaningfully interpreted by humans, and 3) fewer constraints on input graphs. Determining similarity of two graphs is related to graph isomorphism, which belongs to the class of NP problems, i.e. computationally intractable in practice. Conventional approaches include so-called Graph kernels which are more computationally tractable by using algorithms whose running time is a polynomial function of the number of nodes in the graph, but these kernels cannot be meaningfully interpreted or understood by humans. Moreover, polynomial time functions become computationally infeasible for very large graphs. In contrast, linear computability metrics are scalable because the complexity (number of computing operations) does not vary exponentially with the number of inputs, which becomes prohibitive with a sufficiently large number of inputs. Other conventional approaches use techniques such as singular value decomposition of adjacency matrices or multi-dimensional scaling on Euclidean distances between adjacency matrices, however these are also polynomial time functions that yield unintelligible results and require that input graphs all be of the same size. Other conventional approaches include Degree Distribution Quantification and Comparison or the Kolmogorov-Smirnov test on degree distributions, which offer an improved running time (feature extraction in O(|V|) time), but the features are only marginally more interpretable and do not offer substantive insight into how the topology differs between two or more graphs.
The gragnostics approach disclosed herein emphasizes several advantages. First, gragnostics should scale to large graphs, thus the disclosed gragnostics can be computed in O(|V|+|E|) time (vertices and edges). Second, the features are comprehensible by analysts who may not be experts in graph theory, as the rendered gragnostics correspond to topological characteristics described in visual renderings. This enables broad audiences to easily understand gragnostics. Third, the disclosed approach imposes few constraints, so there are few restrictions on size or components, which also complements the linear computability provided by O(|V|+|E|).
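As an illustration of this linear bound, the sketch below (hypothetical helper code, not taken from the patent) tallies vertex degrees in one pass over the vertex list and one pass over the edge list; degrees underpin several of the features described later, and the work grows as O(|V|+|E|).

```python
from typing import Dict, Hashable, Iterable, Tuple

def degree_table(vertices: Iterable[Hashable],
                 edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, int]:
    """Tally the degree of every vertex: one entry per vertex, two increments per edge."""
    degree = {v: 0 for v in vertices}   # O(|V|)
    for u, w in edges:                  # O(|E|)
        degree[u] += 1
        degree[w] += 1
    return degree

# Example: a four-vertex path a-b-c-d
print(degree_table("abcd", [("a", "b"), ("b", "c"), ("c", "d")]))
# {'a': 1, 'b': 2, 'c': 2, 'd': 1}
```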
FIG. 1 is a context diagram of a computing environment 100 suitable for use with configurations herein. Referring to FIG. 1, a gragnostics processor 110 is responsive to a user computing device 112. A repository 120 stores a plurality of graphs, which may include a training set 122 of graphs depicting particular types, or previous graphs 124 employed for classification, which are then added to the set of graphs employed for classification.
Configurations herein are particularly beneficial in an analytics environment having graph data responsive to rendering for visual recognition and comparison. Visually recognizable aspects of the rendered graphs can denote statistical trends defined in a plurality of graphs. The gragnostics processor 110 receives a plurality of graphs, such that each graph defines associations between data entities and is renderable in a visual form having vertices connected by edges. Generally, the graphs are defined by a data structure indicative of, for each vertex, an association to a set of connected vertices, in which each vertex is renderable as a node and the set of associations defined by a line (edge) to each connected vertex. A variety of data structures may be employed for storing the graph representations as utilized herein.
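One plausible realization of such a data structure, shown purely for illustration (the patent does not prescribe a particular encoding, and the class below is an assumption), is an adjacency map from each vertex to the set of vertices it connects to:

```python
from typing import Dict, Hashable, Set

class Graph:
    """Minimal undirected graph: each vertex maps to the set of connected vertices."""
    def __init__(self) -> None:
        self.adj: Dict[Hashable, Set[Hashable]] = {}

    def add_vertex(self, v: Hashable) -> None:
        self.adj.setdefault(v, set())

    def add_edge(self, u: Hashable, v: Hashable) -> None:
        self.add_vertex(u)
        self.add_vertex(v)
        self.adj[u].add(v)      # each edge is renderable as a line between two nodes
        self.adj[v].add(u)

    def num_vertices(self) -> int:
        return len(self.adj)

    def num_edges(self) -> int:
        return sum(len(n) for n in self.adj.values()) // 2

# A toy star graph: hub 0 connected to leaves 1-4, plus one isolated vertex
g = Graph()
for leaf in range(1, 5):
    g.add_edge(0, leaf)
g.add_vertex(9)
print(g.num_vertices(), g.num_edges())   # 6 4
```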
The user device 112 receives a graph for classification 150 from any suitable source, such as the Internet 130 or other source, such as a set of graphs stored on media to be classified according to configurations herein. The classification processor 110, which may simply be a desktop PC in conjunction with the user device 112, performs the classification and renders a graph classification 152 indicative of graphs in the repository 120 that the graph for classification most resembles. As will be discussed further below, similar graphs produce distinct clusters when classified according to a robust set of graphs, therefore defining a distinct group that the graph for classification most resembles. Once processed for classification, the graph may also be stored in the repository 120 as a previous graph 124 and used for subsequent classifications.
FIG. 2 shows an example of graph features employed in assessing graphs in the computing environment of FIG. 1. Referring to FIGS. 1 and 2, the gragnostics processor 110 computes, for each graph, a plurality of features based on the edges and vertices. Each of the features has a linear computability time such that the feature is computable in a time that varies linearly with the number of nodes and/or edges, providing for scalability to large accumulations of data. Each feature corresponds to a set of traversal steps for processing the graph, in which the traversal steps define a finite sequence of operations that varies linearly based on the number of vertices or edges in the graph.
The gragnostics processor 110 computes each of the features in a numerical manner, and normalizes each of the features to a range between 0.0 and 1.0 to facilitate intra-graph comparisons. The features relate to visual attributes of the graphs, and include density, bridge, disconnectivity, isolation, constriction, linearity, tree and star, and also the number of vertices (nodes) and a number of edges. FIG. 2 shows a visual appearance of a minimal 201 depiction of a feature ranging to a maximum depiction 203 of a feature, as well as a moderate depiction 202 of the feature. Gragnostics processor 110 normalizes the computed features into a predetermined range, such that for each feature, the value is scaled to a range of between 0 and 1. Alternate feature ranges may be employed; however, normalizing to the same range allows comparison between different graphs in the plurality of graphs.
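A minimal normalization sketch follows. It assumes min-max scaling of each feature column across the collection of graphs; the patent specifies only that every feature is scaled into a common 0-1 range, not this particular rule, so treat the choice as illustrative.

```python
from typing import List

def normalize_features(raw: List[List[float]]) -> List[List[float]]:
    """Min-max scale each column of a graphs-by-features matrix into the range 0.0-1.0."""
    n_features = len(raw[0])
    lo = [min(row[j] for row in raw) for j in range(n_features)]
    hi = [max(row[j] for row in raw) for j in range(n_features)]
    return [
        [0.0 if hi[j] == lo[j] else (row[j] - lo[j]) / (hi[j] - lo[j])
         for j in range(n_features)]
        for row in raw
    ]

# Three graphs, two raw features (e.g. node count and link count)
print(normalize_features([[10, 9], [100, 250], [55, 130]]))
# [[0.0, 0.0], [1.0, 1.0], [0.5, 0.502...]]
```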
The gragnostics processor 110 employs linear-time, interpretable metrics to create a graph feature vector that can be used to compute distance between graphs using techniques such as Euclidean distance, or compute clusters of graphs using techniques such as k-nearest neighbors or k-means clustering. Other multidimensional distance approaches may also be employed.
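A short sketch of the distance comparison, assuming Euclidean distance over the normalized feature vectors and a simple nearest-neighbor query; the graph names and vectors below are invented for illustration.

```python
import math
from typing import Dict, List, Tuple

def euclidean(a: List[float], b: List[float]) -> float:
    """Multidimensional distance between two normalized feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query: List[float],
                      library: Dict[str, List[float]],
                      k: int = 3) -> List[Tuple[str, float]]:
    """Return the k library graphs whose feature vectors lie closest to the query vector."""
    ranked = sorted(((name, euclidean(query, vec)) for name, vec in library.items()),
                    key=lambda pair: pair[1])
    return ranked[:k]

# Hypothetical 3-feature vectors (density, star, tree) for three reference graphs
library = {"subway_A": [0.1, 0.2, 0.9], "ego_B": [0.3, 0.9, 0.6], "geom_C": [0.8, 0.1, 0.2]}
print(nearest_neighbors([0.15, 0.25, 0.85], library, k=2))
# [('subway_A', 0.086...), ('ego_B', 0.712...)]
```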
For each graph, the gragnostics processor 110 arranges each of the normalized features into a feature vector having ordered values for each of the features. In the example configuration using 10 features, the feature vector includes a value for each feature normalized to a common range for the 10 features, 0.0-1.0 in this case. A more technical discussion of the linear computation of each of the 10 metrics is disclosed below with respect to FIGS. 4A-4D.
FIG. 3 is an example correlation of the graph features of FIG. 2 derived from a set of graphs. Referring to FIGS. 2 and 3, computation of each of the 10 feature metrics results in the intermediate step shown in FIG. 3. A plurality of classes 310 are defined based on at least the training set 122 to produce a correlation 300 of features. The classes 310, each denoted by a different color, define the classifications, or groups of graphs, discussed further below in FIG. 4. FIG. 3 illustrates the degree to which certain features correlate with, or predict, other features. A vertical axis 320 lists each of the 10 features, and a horizontal axis 322 lists the same set of features. The intersection of each feature depicts an array of subgraphs 350 (3 upper left subgraphs labeled for clarity). FIG. 3 also illustrates the benefit of normalizing each of the features to the range of 0.0-1.0 to allow comparison to other features, as each subgraph 350 has a horizontal axis 352 and vertical axis 354. The horizontal axis 352 defines the value of the feature defined on the axis 322, and the vertical axis 354 defines the value of the feature defined on the axis 320. Within each subgraph 350, each graph in a training set 122 is plotted based on the normalized value of the feature. This illustrates the intermediate step of rendering a plot of each feature against the other features for each graph. The color of the plot point indicates the group from which the graph was derived. Several of the conclusions that may be drawn at this stage are labeled in FIG. 3.
For example, a grouping 360 denotes that subway graphs (green dots showing graphs derived from an inner city subway layout) tend to be less constricted (constricted feature value near 0.0). Group 362 shows that a high value in the lines feature tends to correlate with the constriction feature. Group 364 demonstrates that star and bridge features distinguish the ego graphs (pink dots showing social media connections). Similarly, group 366 distinguishes geometric graphs (blue dots derived from graphs of regular geometric shapes). Group 368 demonstrates a correlation between tree and density features, and group 370 shows correlation between bridge and constricted features. Other conclusions regarding classification will be discussed below with respect to FIG. 4. Not surprisingly, the plots of the same feature along the correlation 340 each define a diagonal line (e.g. star to star, etc.).
While the graphical depictions of FIG. 3 can define relations between certain graphs, FIGS. 4A-4D coalesce the aggregate features of each graph for comparison on a broader scale. FIGS. 4A-4D show a gragnostic plot for comparing feature vectors aggregated from the graph features of FIG. 3. Referring to FIGS. 2 and 4, the computed set of features (10, in the example shown) defines an ordered vector, which can be represented in a multidimensional space. FIGS. 4A-4D build on the feature vector generated from the metrics of FIG. 3 by computing a two dimensional (2D) plot depicting each of the feature vectors presented for comparison. In a particular example, the preexisting graphs of the training set 122 and previous graphs 124 are already plotted to define graph types, one of which the graphs for classification 150 will fall into.
The feature vector includes the ten 0.0-1.0 magnitudes of each of the normalized features for each graph. The user device 112 is operable for rendering a visualization of the feature vectors, and the gragnostics processor 110 determines a similarity of the graphs based on a distance between the corresponding visualized feature vectors. It should be noted that the visual rendering of FIG. 3, depicting individual features, is an intermediate step not required for generating the feature vector. The feature vector is generated from the 10 computed, normalized features. The feature vector, when computed as an ordered set of normalized values, therefore defines a multidimensional vector, or a reference to a multidimensional space. The gragnostics processor 110 is configured to compute a multidimensional distance between each of the feature vectors for determining a similarity between the graphs corresponding to the feature vectors. A multidimensional distance is computable between vectors of different graphs, offering a coalesced metric of similarity to other graphs. Also, the feature vector may be projected or reduced onto a two-dimensional (2D) plane depicting the computed distance between the feature vectors of different graphs. Similar graphs appear as a “cluster” or closely located set of points, as depicted in FIGS. 4A-4D. Graph classification occurs based on the computed distance between the feature vectors of each of the graphs, and is rendered visually by these clusters of points.
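One way to realize the projection onto two dimensions is metric multi-dimensional scaling over the feature vectors. The sketch below assumes scikit-learn and NumPy are available and uses toy 10-value vectors; the patent does not name a particular MDS implementation.

```python
import numpy as np
from sklearn.manifold import MDS

# Rows are graphs, columns are the 10 normalized gragnostics features (toy values).
feature_vectors = np.array([
    [0.10, 0.20, 0.05, 0.0, 0.1, 0.30, 0.0, 0.0, 0.90, 0.70],
    [0.12, 0.22, 0.04, 0.0, 0.1, 0.28, 0.0, 0.0, 0.92, 0.72],
    [0.60, 0.50, 0.40, 0.0, 0.8, 0.10, 0.0, 0.0, 0.50, 0.10],
])

# Metric MDS places each 10-dimensional vector at a 2D position while approximately
# preserving the pairwise Euclidean distances, so similar graphs land close together.
xy = MDS(n_components=2, random_state=0).fit_transform(feature_vectors)
print(xy)   # one (x, y) position per graph, ready to plot as in FIG. 4A
```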
The graph rendering of FIG. 4A depicts a comparison of the computed distance between each of the graphs, and classification of each of the graphs based on the comparison by determining a nearest neighbor, shown as different colors in FIG. 4A. Referring to FIGS. 4A-4D, distinct groups of values (points) are shown for a sample set of graphs. A two dimensional rendering region 400 depicts a metric multi-dimensional scaling (MDS) plot of graphs projected onto 2 dimensions from the 10-dimensional gragnostics feature space (e.g. a multidimensional space encompassing the 10-value feature vector). In general, FIG. 4A illustrates notable class separation in groups 410, 430, 450, 470 and 490, indicating that different classes of graphs have different feature vector values. A cluster 410 of green plots is based on graphs of subway maps. Graph plots 411-1, 411-2 and 411-3 are based on graph data of graphs 421-1, 421-2 and 421-3, respectively, depicting the Tokyo, Shanghai and London subway networks. In this example, Tokyo 421-1 represents the graph for classification and graphs 421-2 and 421-3 are the computed nearest neighbors in the training set. Feature vector 423-1 shows the values of the features computed for the graph 421-1, and feature vectors 423-2 and 423-3 show the respective training set values. It is apparent that the tree feature value is just short of 1.0 for each graph, and the line feature is the next highest value at around 0.75. Comparable values for constricted and bridge features exist also. A nearest neighbors value 425 shows London and Shanghai as the closest valued vectors, a characteristic visually apparent on the rendering region 400. It is apparent that, based on the classification of the Tokyo subway graph 421-1, its distance to the London subway graph 421-3 is very short; its features are nearly identical, and its force-directed node-link diagram shares the same visual structure. Meanwhile, the Tokyo graph's second nearest neighbor is the Shanghai subway graph 421-2, which is farther away than London. Shanghai has higher bridge, constricted, and line features. Furthermore, we can visually confirm this dissimilarity by looking at Shanghai's force-directed node-link diagram and noting that it has more bridge edges, it is more constricted, and it is more line-like because more vertices have only two edges.
A similar analysis follows for a cluster 430 (pink) depicting ego networks, as derived from FACEBOOK® connections. Referring to FIGS. 4A and 4C, it is interesting to note that the David Copperfield graph's (431-1) nearest neighbors are all Facebook ego networks. This classification makes sense if we consider that David Copperfield is often considered to be a semi-autobiographical novel about Charles Dickens, in essence making it an ego graph of the central character. Examining the respective graphs 441-1, 441-2, 441-3 and 441-4 based on plots 431-1, 431-2, 431-3 and 431-4, the feature vectors 443-1, 443-2 and 443-3 indicate that the star feature is most pronounced, followed by the tree feature value. The nearest neighbor values 445 likewise designate “686” and “348” as the closest. For comparison, one of the nearest character graphs (turquoise) is Star Wars 2 (441-4) having feature vector 443-4. The Star Wars 2 and Storm of Swords graphs (FIG. 4D) are more typical of non-autobiographical character graphs. We also see that the software and the character classes overlap in the MDS plot 400.
Another grouping 450 is based on software graphs, defined by code structure and plotted as color purple. A plot for classification 451-1 is shown in FIG. 4D as graph 461-1 (sjbullet), with nearest neighbors of 451-2 and 451-3, corresponding to graphs 461-2 (Storm of Swords) and Javax 461-3. Based on the sjbullet software graph 461-1, it can be seen that its features are similar to the Storm of Swords character graph 461-2, illustrating overlap, although it is a little smaller and a little less dense. The next nearest neighbor is Javax 461-3, shown by nearest neighbor values 465, and each has highest feature values of tree, with the star feature running a distant second.
Computation of the feature vector, in the disclosed configuration, includes computation of each of the following 10 features in linear computability time, meaning that the number of computing instructions, and hence time, varies linearly with the number of nodes and/or edges.
Number of nodes: This counts the number of nodes in the graph. This runs in O(|V|) time.
Number of links: This counts the number of links in the graph. This runs in O(|E|) time.
Density. This determines the link density in the graph, or the probability that any two randomly chosen nodes will be connected via a link. This is calculated by:
$$\frac{2 \cdot |E|}{|V| \cdot (|V| - 1)}$$
If |V| and |E| are already calculated, then this runs in O(1) time; otherwise it runs in O(|V|+|E|) time.
Isolation. This describes the fraction of nodes in the graph that are not connected to any other node. This is calculated by:
\frac{\left|\{\, v \in V : d(v) = 0 \,\}\right|}{|V|}
Where d(v) is the degree of node v. This requires counting the number of nodes that have degree 0, so this runs in O(|V|+|E|) time.
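A comparable sketch of the isolation feature, again assuming a vertex list and an edge list (illustrative names only):

```python
from collections import Counter

def isolation(vertices, edges):
    """Fraction of nodes with degree 0: |{v in V : d(v) = 0}| / |V|."""
    degree = Counter()
    for u, v in edges:              # O(|E|): accumulate degrees
        degree[u] += 1
        degree[v] += 1
    isolated = sum(1 for v in vertices if degree[v] == 0)   # O(|V|)
    return isolated / len(vertices) if vertices else 0.0

# Vertices 3 and 4 have no links, so half the nodes are isolated
print(isolation([1, 2, 3, 4], [(1, 2)]))  # 0.5
```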
Star. This describes the degree to which a single node is more central than the other nodes in a graph. This is calculated by:
\frac{\sum_{v \in V} \left( d(v^{*}) - d(v) \right)}{(|V|-1)\,(|V|-2)}
Where v* represents the node with the highest degree in the graph. This requires finding the node with the highest degree and summing the degree of each node, so this runs in O(|V|+|E|) time.
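The star feature can likewise be sketched directly from the formula above; the helper below is hypothetical, not the patented implementation.

```python
from collections import Counter

def star(vertices, edges):
    """Degree centralization: sum of (d(v*) - d(v)) over all v,
    divided by (|V| - 1) * (|V| - 2); equals 1.0 for a perfect star."""
    n = len(vertices)
    if n < 3:
        return 0.0                  # denominator is undefined below three nodes
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    d_max = max(degree[v] for v in vertices)
    return sum(d_max - degree[v] for v in vertices) / ((n - 1) * (n - 2))

# A 4-node star with hub 0 scores exactly 1.0
print(star([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3)]))  # 1.0
```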
Bridge. This describes the fraction of links in the graph whose removal will disconnect the graph. This is calculated by:
\frac{\mathrm{bridge}(G)}{|V|-1}
Where bridge(G) is the number of bridge links in graph G. Using a breadth-first search, all bridges can be found in O(|V|+|E|) time.
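One way to realize the bridge feature is a low-link depth-first pass (Tarjan's bridge finding), which also runs in O(|V|+|E|); the sketch below uses that approach rather than the breadth-first search mentioned above, and its names are illustrative.

```python
import sys
from collections import defaultdict

def bridge_feature(vertices, edges):
    """Fraction bridge(G) / (|V| - 1), with bridges found by a DFS low-link pass."""
    sys.setrecursionlimit(max(1000, 2 * len(vertices) + 10))
    adj = defaultdict(list)
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    disc, low = {}, {}
    bridges, timer = [0], [0]

    def dfs(v, parent_edge):
        disc[v] = low[v] = timer[0]; timer[0] += 1
        for w, eid in adj[v]:
            if eid == parent_edge:
                continue                       # do not walk back over the tree edge
            if w not in disc:
                dfs(w, eid)
                low[v] = min(low[v], low[w])
                if low[w] > disc[v]:
                    bridges[0] += 1            # edge (v, w) is a bridge
            else:
                low[v] = min(low[v], disc[w])  # back edge

    for v in vertices:
        if v not in disc:
            dfs(v, -1)
    return bridges[0] / (len(vertices) - 1) if len(vertices) > 1 else 0.0

# In a path on 4 vertices every edge is a bridge: 3 / (4 - 1) = 1.0
print(bridge_feature([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]))
```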
Constricted. This describes the fraction of nodes in the graph whose removal will disconnect the graph. This is calculated by:
\frac{\mathrm{cut}(G)}{|V|-2}
Where cut(G) is the number of nodes whose removal will disconnect the graph. This can also be found using a breadth-first search in O(|V|+|E|) time.
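The constricted feature can be checked against a brute-force sketch that removes each node and recounts components; this is quadratic rather than linear, so it only illustrates the formula, not the linear-time search described above (all names are illustrative).

```python
def _components(vertices, edges):
    """Count connected components with an iterative depth-first sweep."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, count = set(), 0
    for s in vertices:
        if s in seen:
            continue
        count += 1
        stack = [s]
        seen.add(s)
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
    return count

def constricted(vertices, edges):
    """Fraction cut(G) / (|V| - 2): cut vertices over the maximum possible."""
    n = len(vertices)
    if n < 3:
        return 0.0
    base = _components(vertices, edges)
    cut = 0
    for v in vertices:
        rest_v = [u for u in vertices if u != v]
        rest_e = [(a, b) for a, b in edges if v not in (a, b)]
        if _components(rest_v, rest_e) > base:   # removal disconnects the graph
            cut += 1
    return cut / (n - 2)

# In a path 1-2-3-4, vertices 2 and 3 are cut vertices: 2 / (4 - 2) = 1.0
print(constricted([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]))
```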
Disconnected: This describes the fraction of connected components in a graph out of the maximum possible number of connected components, i.e. a fraction denoting the degree to which clusters are unreachable from other clusters. This is calculated by:
\frac{|C|-1}{|V|-1}
Where |C| is the number of connected components in the graph, which can be found in O(|V|+|E|) time.
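A short sketch of the disconnected feature using a small union-find structure, which is effectively linear in |V|+|E|; the names are illustrative.

```python
def disconnected(vertices, edges):
    """Fraction (|C| - 1) / (|V| - 1), with components counted via union-find."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    n = len(vertices)
    if n < 2:
        return 0.0
    components = len({find(v) for v in vertices})
    return (components - 1) / (n - 1)

# Two separate edges over 4 vertices give 2 components: (2 - 1) / (4 - 1)
print(disconnected([1, 2, 3, 4], [(1, 2), (3, 4)]))  # 0.333...
```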
Tree: This describes how close a graph is to being a tree, i.e. how close it is to a graph with no cycles. This is calculated by:
1 - \frac{|E| - (|V|-1)}{|V|\,(|V|-1)/2 - (|V|-1)}
Colloquially, this is the number of links that would need to be removed to make the graph a tree, divided by the maximum possible number of links that could need to be removed to make the graph a tree, all subtracted from 1. If |V| and |E| are already calculated, then this runs in O(1) time; otherwise it runs in O(|V|+|E|) time.
Line: This calculates how close a graph is to being a line, i.e. a tree with exactly two leaves where there is only one path from one leaf to the other. This is calculated by:
\frac{\sum_{i=1}^{|V|} l(i)}{|V|}, \quad \text{where } l(i) = \begin{cases} 1, & \text{if } D_i = 1 \text{ and } i \le 2, \text{ or if } D_i = 2 \text{ and } i > 2 \\ 0, & \text{otherwise} \end{cases}
Here, D is a vector of length |V| in which each element is the degree d(v) of a vertex in V, ordered such that if d(v)=1 then it is at the beginning of the vector. This requires iterating over each edge to calculate the degree of each vertex, and then iterating over each vertex to ensure that if d(v)=1 then it is at the front of the vector, so this runs in O(|V|+|E|) time.
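The tree and line features follow the two formulas above; the sketch below builds the degree vector D with degree-1 vertices first and applies l(i) literally (vertex and edge lists are assumed as input, and the names are illustrative).

```python
from collections import Counter

def tree_feature(vertices, edges):
    """1 - (|E| - (|V| - 1)) / (|V|(|V| - 1)/2 - (|V| - 1)).
    Note: the raw value can exceed 1.0 for sparse forests; the later
    normalization step scales features into the 0.0-1.0 range."""
    n, m = len(vertices), len(edges)
    denom = n * (n - 1) / 2 - (n - 1)
    if denom <= 0:
        return 1.0                      # graphs too small to contain a cycle
    return 1.0 - (m - (n - 1)) / denom

def line_feature(vertices, edges):
    """Sum of l(i) / |V|, with D listing degrees and degree-1 vertices first."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    D = sorted((degree[v] for v in vertices), key=lambda d: d != 1)
    score = sum(1 for i, d in enumerate(D, start=1)
                if (d == 1 and i <= 2) or (d == 2 and i > 2))
    return score / len(vertices) if vertices else 0.0

# A 5-vertex path is both a perfect tree and a perfect line
path_v = [1, 2, 3, 4, 5]
path_e = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(tree_feature(path_v, path_e), line_feature(path_v, path_e))  # 1.0 1.0
```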
FIGS. 5A1-5C denote a flowchart 500 depicting graph processing resulting in the gragnostic plot of FIGS. 4A-4D. Referring to FIGS. 1, 5A1 and 5A2, at step 501 the gragnostics processor 110 loads each graph from a media source or user data store. As indicated previously, a classification is performed for one graph against a preexisting training set 122 or previous graphs 124 presented for classification. In practice, the following gragnostics steps are performed for all graphs, and a classification distinction is made against previously processed graphs. This includes receiving a plurality of graphs in a suitable data structure, such that each graph defines associations between data entities and is renderable in a visual form having vertices connected by edges, as depicted at step 502.
The gragnostics processor 110 preprocesses each graph to arrange for feature computation, as disclosed at step 503. Each of the graphs is defined by a data structure indicative of, for each vertex, an association to a set of connected vertices, such that each vertex is renderable as a node and the set of associations defined by a line to each connected vertex, as depicted at step 504. Such an arrangement corresponds to a typical visualization of a graph which has the appearance of circles with lines emanating from each circle and terminating in other circles to which the lines connect.
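A minimal sketch of one data structure consistent with this description, in which each vertex maps to its set of connected vertices; the class and method names are assumptions, not the patent's API.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List

@dataclass
class Graph:
    """Each vertex maps to the vertices it connects to; a renderer can draw
    each vertex as a node and each association as a line to the connected vertex."""
    adjacency: Dict[Hashable, List[Hashable]] = field(default_factory=dict)

    def add_edge(self, u, v):
        self.adjacency.setdefault(u, []).append(v)
        self.adjacency.setdefault(v, []).append(u)

    def vertices(self):
        return list(self.adjacency)

    def edges(self):
        # report each undirected edge once
        return [(u, v) for u in self.adjacency for v in self.adjacency[u] if u <= v]

g = Graph()
g.add_edge("Tokyo", "Shinjuku")
g.add_edge("Shinjuku", "Shibuya")
print(g.vertices(), g.edges())
```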
The gragnostics processor 110 extracts the 10 features, as shown at step 505. The set of 10 features provides an illustrative set permitting optimal computation of the graphs as defined herein; however, similar configurations may consider a greater or lesser number of features. The gragnostics processor 110 computes, for each graph, a plurality of features based on the edges and vertices, as depicted at step 506. In the example configuration, each of the features has a linear computability time, such that the feature is computable in a time that varies linearly with at least one of the number of vertices or edges, as depicted at step 507.
In particular, the gragnostics processor computes the tree feature at step 508, depicted in FIG. 5B, and the linearity feature at step 509, depicted in FIG. 5C.
From the computed features, the gragnostics processor 110 creates a feature vector using the 10 features for each graph, as disclosed at step 510. This includes, at step 511, normalizing the computed features into a predetermined range, and for each graph, arranging each of the normalized features into a feature vector, such that the feature vector has ordered values for each of the features, depicted at step 512. Normalizing, in the example configuration, scales each feature to a range of 0.0-1.0, facilitating comparison and rendering as a multidimensional vector.
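Normalization to the 0.0-1.0 range can be sketched as a per-feature min-max scaling across the graphs being compared; the patent specifies the range but not the scaling rule, so the rule below is an assumption.

```python
def normalize_features(feature_rows):
    """Min-max scale each feature column to 0.0-1.0 so the ten features
    are directly comparable within one feature vector per graph."""
    columns = list(zip(*feature_rows))
    lo = [min(c) for c in columns]
    hi = [max(c) for c in columns]
    return [
        [(x - l) / (h - l) if h > l else 0.0
         for x, l, h in zip(row, lo, hi)]
        for row in feature_rows
    ]

# Three graphs, three raw features (e.g. node count, link count, density)
vectors = normalize_features([[34, 42, 0.07], [13, 12, 0.15], [80, 120, 0.04]])
print(vectors[0])  # each value now falls in [0, 1]
```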
Comparison and visualization includes computing distances between graphs using the feature vectors, as shown at step 513. Multidimensional values such as the feature vector may be compared using Euclidean distance or similar metrics, such that the distance between feature vectors is indicative of the similarity of the corresponding graphs. The gragnostics processor 110 computes a two dimensional position based on each of the feature vectors, as depicted at step 514, and projects the position of each vector onto a visualized two dimensional rendering, as depicted at step 515 and rendered in FIG. 4A.
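For the projection, the sketch below pairs Euclidean distances with classical (Torgerson) MDS using NumPy; the disclosure calls for metric MDS without fixing a variant, so the specific algorithm and names here are assumptions.

```python
import numpy as np

def euclidean_distances(vectors):
    """Pairwise Euclidean distance between feature vectors; small distances
    correspond to similar graphs."""
    X = np.asarray(vectors, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(dist, dims=2):
    """Project a distance matrix to 2-D: double-center the squared distances
    and keep the top eigenvectors scaled by the square roots of their eigenvalues."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (dist ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dims]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

vectors = [[0.9, 0.7, 0.1], [0.85, 0.75, 0.15], [0.1, 0.2, 0.9]]
xy = classical_mds(euclidean_distances(vectors))
print(xy.round(3))   # two nearby points and one distant point
```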
The gragnostics processor 110 launches the desired comparison analytic (e.g. clustering or nearest neighbor search), and invokes the interpretability of the features to understand differences between graphs and clusters of graphs (e.g. these two graphs are very similar, except one is more star-like than the other), as disclosed at step 516. Groupings, or classifications of graphs, can therefore be determined by observing a cluster of graphs separated by relatively small distances. The gragnostics processor 110 classifies, based on a distance on the visualized two dimensional graph, groups of graphs, such that the classification is defined by visual clusters of the positions on the two dimensional rendering, as depicted at step 517. The result is a determination of whether the graph for classification corresponds to one of the classes of graphs, disclosed at step 518.
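A nearest-neighbor comparison over the feature vectors can be sketched as below; the feature values are made up purely to mirror the Tokyo/London/Shanghai example of FIG. 4B, and the function name is illustrative.

```python
import math

def nearest_neighbors(target, training, k=2):
    """Rank training-set feature vectors by Euclidean distance to the target vector."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    ranked = sorted(training.items(), key=lambda kv: dist(target, kv[1]))
    return [(name, round(dist(target, vec), 3)) for name, vec in ranked[:k]]

# Hypothetical feature values (e.g. tree, line, bridge, constricted)
tokyo = [0.98, 0.75, 0.30, 0.28]
training = {
    "London":   [0.97, 0.74, 0.29, 0.27],
    "Shanghai": [0.96, 0.80, 0.38, 0.35],
    "686":      [0.60, 0.10, 0.05, 0.04],
}
print(nearest_neighbors(tokyo, training))  # London first, then Shanghai
```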
FIG. 5B is a flowchart of computation of the tree feature. Referring to FIG. 5B, computing the features includes, at step 550, determining the tree feature in linear time by traversing each of the vertices in the graph and accumulating, based on the traversal, a number of edges, as depicted at step 551. The gragnostics processor 110 determines a number of edges the removal of which would result in a tree by removing cyclic paths, as disclosed at step 552, thus providing a measure of how “close” the graph is to a model tree, and compares the determined number of edges with a number of the traversed vertices, as disclosed at step 553.
FIG. 5C details computation of the linearity feature. Referring to FIG. 5C, computing the linearity feature in linear time includes traversing each of the vertices in the graph, as shown at step 570, and determining, at each vertex, whether a number of edges emanating from the vertex is consistent with a linear graph, as depicted at step 571. The gragnostics processor 110 accumulates the number of vertices consistent with a linear graph, at step 572, and compares the accumulated vertices with the number of traversed vertices, as depicted at step 573. The result is a measure of the relative number of total vertices that satisfy the criteria for a linear graph, meaning that two edges touch each vertex, with the exception of two terminal vertices that touch only one edge.
Those skilled in the art should readily appreciate that the programs and methods defined herein are deliverable to a user processing and rendering device in many forms, including but not limited to a) information permanently stored on non-writeable storage media such as ROM devices, b) information alterably stored on writeable non-transitory storage media such as floppy disks, magnetic tapes, CDs, RAM devices, and other magnetic and optical media, or c) information conveyed to a computer through communication media, as in an electronic network such as the Internet or telephone modem lines. The operations and methods may be implemented in a software executable object or as a set of encoded instructions for execution by a processor responsive to the instructions. Alternatively, the operations and methods disclosed herein may be embodied in whole or in part using hardware components, such as Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), state machines, controllers or other hardware components or devices, or a combination of hardware, software, and firmware components.
While the system and methods defined herein have been particularly shown and described with references to embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

Claims (16)

What is claimed is:
1. In an analytics environment having graph data responsive to rendering for visual recognition and comparison of statistical trends defined in a plurality of graphs, a scalable method of visualizing graph data, comprising:
receiving a plurality of graphs, each graph defining associations between data entities and renderable in a visual form having a plurality of vertices connected by one or more edges;
computing, for each graph, a plurality of features based on the edges and vertices interconnected by the edges, including computing a feature value for each of the features in a linear computability time such that the feature value is computable in a time that varies linearly with at least one of a number of nodes or a number of vertices;
normalizing the computed features into a predetermined range;
for each graph, arranging each of the normalized features into a feature vector, the feature vector having ordered values for each of the features, the ordered values in the feature vector including a tree feature and a linearity feature for each graph, further comprising:
determining the tree feature in linear time by:
traversing each of the vertices in the graph;
accumulating, based on the traversal, a number of edges; and
determining a number of edges the removal of which would result in a tree by removing cyclic paths; and
comparing the determined number of edges with a number of the traversed vertices; and
determining the linearity feature in linear time by:
traversing each of the vertices in the graph;
determining, at each vertex, if a number of edges emanating from the vertex is consistent with a linear graph;
accumulating the number of vertices consistent with a linear graph; and
comparing the accumulated vertices with the number of traversed vertices;
computing a multidimensional distance between each of the feature vectors for determining a similarity between the graphs corresponding to the feature vectors;
computing a two dimensional position corresponding to each of the feature vectors based on a projection of the computed multidimensional distance;
displaying the position of each vector onto a visualized two dimensional rendering;
rendering a visualization of the feature vectors; and
determining similarity of the graphs based on a distance between the corresponding visualized feature vectors by classifying, based on a distance on the visualized two dimensional rendering, groups of graphs, the classification defined by visual clusters of the positions on the two dimensional rendering.
2. The method of claim 1 wherein the graphs are defined by a data structure indicative of, for each vertex, an association to a set of connected vertices, each vertex renderable as a node and the set of associations defined by a line to each connected vertex.
3. The method of claim 1 further comprising classifying the graphs based on the computed distance between the feature vectors of each of the graphs.
4. The method of claim 1 further comprising:
comparing the computed distance between each of the graphs, and
classifying each of the graphs based on the comparison by determining a nearest neighbor.
5. The method of claim 1 further comprising:
computing the feature vectors for a set of training graphs, the training graphs defining at least one class of graphs;
receiving a graph for classification;
computing the feature vector corresponding to the graph for classification; and
determining if the graph for classification corresponds to one of the classes of graphs.
6. The method of claim 1 wherein the feature vector includes a value for each feature normalized to a common range for all features.
7. The method of claim 6 wherein the feature vector includes a value for each of at least 10 features including a number of vertices, a number of edges, density, bridge, disconnectivity, isolation, constriction, linearity, tree and star, each value scaled to a range of 0 to 1.
8. The method of claim 6 further comprising an intermediate step of rendering a graphing of each feature against the other features for each graph.
9. The method of claim 1 wherein each feature corresponds to a set of traversal steps, the traversal steps defining a finite sequence of operations that varies linearly based on the number of vertices or edges in the graph.
10. The method of claim 1 wherein the visualization occurs on a two dimensional Cartesian plane, the similar graphs disposed as clusters or groups separated by a relatively shorter distance based on the multidimensional distance, and different classifications of graphs appear as distinct, distant clusters of points, each point defining a multidimensional value of a feature vector based on a graph.
11. The method of claim 1 further comprising rendering the visualization of each feature vector in a color group, the color group based on a relative distance of each feature vector to a feature vector of other graphs, the feature vectors of each respective color group defined by a shorter distance on the visualization to each feature vector in the same color group and a longer distance on the visualization to each feature vector in a different color group.
12. A computerized device for visualizing graph data comprising:
a network interface adapted for receiving a plurality of graphs, each graph defining associations between data entities and renderable in a visual form having a plurality of vertices connected by one or more edges;
a user accessible display;
a gragnostics processor, configured to:
compute, for each graph, a plurality of features based on the edges and vertices, the vertices interconnected by the edges, for computing a feature value for each of the features in a linear computability time such that the feature value is computable in a time that varies linearly with at least one of a number of nodes or number of vertices, the gragnostics processor further operable to:
determine the tree feature in linear time by:
traversing each of the vertices in the graph;
accumulating, based on the traversal, a number of edges; and
determining a number of edges the removal of which would result in a tree by removing cyclic paths; and
comparing the determined number of edges with a number of the traversed vertices; and
determine the linearity feature in linear time by:
traversing each of the vertices in the graph;
determining, at each vertex, if a number of edges emanating from the vertex is consistent with a linear graph;
accumulating the number of vertices consistent with a linear graph; and
comparing the accumulated vertices with the number of traversed vertices;
normalize the computed features into a predetermined range;
for each graph, arrange each of the normalized features into a feature vector, the feature vector having ordered values for each of the features;
determine similarity of the graphs based on a distance between the corresponding visualized feature vectors;
compute a multidimensional distance between each of the feature vectors for determining a similarity between the graphs corresponding to the feature vectors;
compute a two dimensional position corresponding to each of the feature vectors based on a projection of the computed multidimensional distance;
display the position of each vector onto a visualized two dimensional rendering; and
render a visualization of the feature vectors on the user accessible display, the display indicative of, based on a distance on the visualized two dimensional rendering, groups of graphs, the classification defined by visual clusters of the positions on the two dimensional rendering.
13. The device of claim 12 wherein the gragnostics processor is further operable to:
compute a multidimensional distance between each of the feature vectors for determining a similarity between the graphs corresponding to the feature vectors; and
classify the graphs based on the computed distance between the feature vectors of each of the graphs.
14. The device of claim 13 wherein the gragnostics processor is further operable to:
compare the computed distance between each of the graphs, and
classify each of the graphs based on the comparison by determining a nearest neighbor.
15. The device of claim 13 wherein the gragnostics processor is further operable to:
compute the feature vectors for a set of training graphs, the training graphs defining at least one class of graphs;
receive a graph for classification;
compute the feature vector corresponding to the graph for classification; and
determine if the graph for classification corresponds to one of the classes of graphs.
16. A computer program product on a non-transitory computer readable storage medium having instructions that, when executed by a processor, perform a method of rendering visualized graph data, the method comprising:
receiving a plurality of graphs, each graph defining associations between data entities and renderable in a visual form having a plurality of vertices connected by one or more edges;
computing, for each graph, a plurality of features based on the edges and vertices interconnected by the edges, including computing a feature value for each of the features in a linear computability time such that the feature value is computable in a time that varies linearly with at least one of a number of nodes or number of vertices;
normalizing the computed features into a predetermined range;
for each graph, arranging each of the normalized features into a feature vector, the feature vector having ordered values for each of the features, the ordered values in the feature vector including a tree feature and a linearity feature for each graph, further comprising:
determining the tree feature in linear time by:
traversing each of the vertices in the graph;
accumulating, based on the traversal, a number of edges; and
determining a number of edges the removal of which would result in a tree by removing cyclic paths; and
comparing the determined number of edges with a number of the traversed vertices; and
determining the linearity feature in linear time by:
traversing each of the vertices in the graph;
determining, at each vertex, if a number of edges emanating from the vertex is consistent with a linear graph;
accumulating the number of vertices consistent with a linear graph; and
comparing the accumulated vertices with the number of traversed vertices;
computing a multidimensional distance between each of the feature vectors for determining a similarity between the graphs corresponding to the feature vectors;
computing a two dimensional position corresponding to each of the feature vectors based on a projection of the computed multidimensional distance;
displaying the position of each vector onto a visualized two dimensional rendering;
rendering a visualization of the feature vectors; and
determining similarity of the graphs based on a distance between the corresponding visualized feature vectors by classifying, based on a distance on the visualized two dimensional rendering, groups of graphs, the classification defined by visual clusters of the positions on the two dimensional rendering.
US15/935,657 2018-03-26 2018-03-26 Gragnostics rendering Active 2038-04-13 US10657686B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US15/935,657 US10657686B2 (en) 2018-03-26 2018-03-26 Gragnostics rendering
US16/865,946 US11195312B1 (en) 2018-03-26 2020-05-04 Gragnostics rendering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US15/935,657 US10657686B2 (en) 2018-03-26 2018-03-26 Gragnostics rendering

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/865,946 Continuation US11195312B1 (en) 2018-03-26 2020-05-04 Gragnostics rendering

Publications (2)

Publication Number Publication Date
US20190295296A1 US20190295296A1 (en) 2019-09-26
US10657686B2 true US10657686B2 (en) 2020-05-19

Family

ID=67985465

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/935,657 Active 2038-04-13 US10657686B2 (en) 2018-03-26 2018-03-26 Gragnostics rendering
US16/865,946 Active US11195312B1 (en) 2018-03-26 2020-05-04 Gragnostics rendering

Family Applications After (1)

Application Number Title Priority Date Filing Date
US16/865,946 Active US11195312B1 (en) 2018-03-26 2020-05-04 Gragnostics rendering

Country Status (1)

Country Link
US (2) US10657686B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11195312B1 (en) * 2018-03-26 2021-12-07 Two Six Labs, LLC Gragnostics rendering

Families Citing this family (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9396283B2 (en) 2010-10-22 2016-07-19 Daniel Paul Miranker System for accessing a relational database using semantic queries
US10452677B2 (en) 2016-06-19 2019-10-22 Data.World, Inc. Dataset analysis and dataset attribute inferencing to form collaborative datasets
US11068847B2 (en) 2016-06-19 2021-07-20 Data.World, Inc. Computerized tools to facilitate data project development via data access layering logic in a networked computing platform including collaborative datasets
US10438013B2 (en) 2016-06-19 2019-10-08 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US11755602B2 (en) 2016-06-19 2023-09-12 Data.World, Inc. Correlating parallelized data from disparate data sources to aggregate graph data portions to predictively identify entity data
US10324925B2 (en) 2016-06-19 2019-06-18 Data.World, Inc. Query generation for collaborative datasets
US11468049B2 (en) 2016-06-19 2022-10-11 Data.World, Inc. Data ingestion to generate layered dataset interrelations to form a system of networked collaborative datasets
US11036697B2 (en) 2016-06-19 2021-06-15 Data.World, Inc. Transmuting data associations among data arrangements to facilitate data operations in a system of networked collaborative datasets
US10353911B2 (en) 2016-06-19 2019-07-16 Data.World, Inc. Computerized tools to discover, form, and analyze dataset interrelations among a system of networked collaborative datasets
US11023104B2 (en) 2016-06-19 2021-06-01 data.world,Inc. Interactive interfaces as computerized tools to present summarization data of dataset attributes for collaborative datasets
US11042560B2 (en) 2016-06-19 2021-06-22 data. world, Inc. Extended computerized query language syntax for analyzing multiple tabular data arrangements in data-driven collaborative projects
US11675808B2 (en) 2016-06-19 2023-06-13 Data.World, Inc. Dataset analysis and dataset attribute inferencing to form collaborative datasets
US10747774B2 (en) 2016-06-19 2020-08-18 Data.World, Inc. Interactive interfaces to present data arrangement overviews and summarized dataset attributes for collaborative datasets
US11941140B2 (en) 2016-06-19 2024-03-26 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US10645548B2 (en) 2016-06-19 2020-05-05 Data.World, Inc. Computerized tool implementation of layered data files to discover, form, or analyze dataset interrelations of networked collaborative datasets
US11334625B2 (en) 2016-06-19 2022-05-17 Data.World, Inc. Loading collaborative datasets into data stores for queries via distributed computer networks
US11042548B2 (en) 2016-06-19 2021-06-22 Data World, Inc. Aggregation of ancillary data associated with source data in a system of networked collaborative datasets
US10853376B2 (en) 2016-06-19 2020-12-01 Data.World, Inc. Collaborative dataset consolidation via distributed computer networks
US11042556B2 (en) 2016-06-19 2021-06-22 Data.World, Inc. Localized link formation to perform implicitly federated queries using extended computerized query language syntax
US11947554B2 (en) 2016-06-19 2024-04-02 Data.World, Inc. Loading collaborative datasets into data stores for queries via distributed computer networks
US10452975B2 (en) 2016-06-19 2019-10-22 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US11036716B2 (en) 2016-06-19 2021-06-15 Data World, Inc. Layered data generation and data remediation to facilitate formation of interrelated data in a system of networked collaborative datasets
US11086896B2 (en) 2016-06-19 2021-08-10 Data.World, Inc. Dynamic composite data dictionary to facilitate data operations via computerized tools configured to access collaborative datasets in a networked computing platform
US11042537B2 (en) 2016-06-19 2021-06-22 Data.World, Inc. Link-formative auxiliary queries applied at data ingestion to facilitate data operations in a system of networked collaborative datasets
US10824637B2 (en) 2017-03-09 2020-11-03 Data.World, Inc. Matching subsets of tabular data arrangements to subsets of graphical data arrangements at ingestion into data driven collaborative datasets
US11238109B2 (en) 2017-03-09 2022-02-01 Data.World, Inc. Computerized tools configured to determine subsets of graph data arrangements for linking relevant data to enrich datasets associated with a data-driven collaborative dataset platform
US11068453B2 (en) 2017-03-09 2021-07-20 data.world, Inc Determining a degree of similarity of a subset of tabular data arrangements to subsets of graph data arrangements at ingestion into a data-driven collaborative dataset platform
US10922308B2 (en) * 2018-03-20 2021-02-16 Data.World, Inc. Predictive determination of constraint data for application with linked data in graph-based datasets associated with a data-driven collaborative dataset platform
US11243960B2 (en) 2018-03-20 2022-02-08 Data.World, Inc. Content addressable caching and federation in linked data projects in a data-driven collaborative dataset platform using disparate database architectures
US11212299B2 (en) * 2018-05-01 2021-12-28 Royal Bank Of Canada System and method for monitoring security attack chains
USD940732S1 (en) 2018-05-22 2022-01-11 Data.World, Inc. Display screen or portion thereof with a graphical user interface
USD940169S1 (en) 2018-05-22 2022-01-04 Data.World, Inc. Display screen or portion thereof with a graphical user interface
US11947529B2 (en) 2018-05-22 2024-04-02 Data.World, Inc. Generating and analyzing a data model to identify relevant data catalog data derived from graph-based data arrangements to perform an action
US11442988B2 (en) 2018-06-07 2022-09-13 Data.World, Inc. Method and system for editing and maintaining a graph schema
WO2022072895A1 (en) 2020-10-01 2022-04-07 Crowdsmart, Inc. Managing and measuring semantic coverage in knowledge discovery processes
US11947600B2 (en) 2021-11-30 2024-04-02 Data.World, Inc. Content addressable caching and federation in linked data projects in a data-driven collaborative dataset platform using disparate database architectures

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050114802A1 (en) * 2003-08-29 2005-05-26 Joerg Beringer Methods and systems for providing a visualization graph

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10657686B2 (en) * 2018-03-26 2020-05-19 Two Six Labs, LLC Gragnostics rendering

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050114802A1 (en) * 2003-08-29 2005-05-26 Joerg Beringer Methods and systems for providing a visualization graph

Non-Patent Citations (19)

* Cited by examiner, † Cited by third party
Title
A. Buja, et al., "Data Visualization with Multidimensional Scaling", Sep. 18, 2007, pp. 1-30; at http://www.stat.yale.edu/˜lc436/papers/JCGS-mds.pdf (Year: 2007). *
Bonner et al., "Deep Topology Classification: A New Approach for Massive Graph Classification", https://github.com/sbonner0/DeepTopologyClassification, 2016 IEEE International Conf. on Big Data, 2016, pp. 1-8 (Year: 2016). *
Bonner, et al., "Efficient Comparison of Massive Graphs Through the Use of Graph Fingerprints", Twelfth Workshop on Mining and Learning with Graphs (MLG), 2016, pp. 1-8.
Bonner, et al., "Efficient Comparison of Massive Graphs Through the Use of Graph Fingerprints", Twelfth Workshop on Mining and Learning with Graphs (MLG), 2016, pp. 1-8. (Year: 2016). *
Chen, et al., "A Generic Framework for Interesting Subspace Cluster Detection in Multi-attributed Networks", 2017 IEEE International Conference on Data Mining, 2017, pp. 41-50.
Christian Pich, "Applications of Multidimensional Scaling to Graph Drawing", PhD Dissertation, Jul. 2009, pp. 1-172, at https://pdfs.sennanticscholar.org/8f02/53749c83779c04b8f44a30140f217a9676cc.pdf (Year: 2009). *
Dhifli, et al, "Mining Topological Representative Substructures from Molecular Networks", BioKDD' 14 New York City NY, pp. 1-10.
Dhifli, et al., "ProtNN: Fast and Accurate Nearest Neighbor Protein Function Prediction based on Graph Embedding in Structural and Topological Space", Department of Computer Science, University of Quebec at Montreal, Jan. 25, 2016.
Dhifli, et al., "Towards an Efficient Discovery of the Topological Representative Subgraphs", Aug. 16, 2013, pp. 1-21.
J. Gibert, et al., "Graph Embedding in Vector Spaces", GbR'2011, Mini-tutorial, https://iapr-tc15.greyc.fr/download/03MT.pdf, 2011, pp. 1-66 (hereinafter Gibed) (Year: 2011). *
Kantarci, et al., "Classification of Complex Networks Based on Topological Properties", Cloud and Green Computing (CGC), 2013 Third International Conference, IEEE Xplore Digital Library, Sep.-Oct. 2013, pp. 1-9 (Year: 2013). *
Kantarci, et al., "Classification of Complex Networks Based on Topological Properties", IEEE Xplore Digital Library, Sep. 30-Oct. 2013, pp. 1-9, Published in: Cloud and Green Computing (CGC), 2013 Third International Conference.
Kaspar Riesen, et al., "Graph Classification and Clustering based on Vector Space Embedding", Series in Machine Perception and Artificial Intelligence, vol. 77, 2010 by World Scientific Publishing Col., pp. 1-331 (Year: 2010). *
Keneshloo, et al., "A Relative Feature Selection Algorithm for Graph Classification", AISC 186, pp. 137-148, Springer-Verlag Berlin Heidelberg 2013.
Li, et al., "Effective Graph Classification Based on Topological and Label Attributes", 2012 Wiley Periodicals, Inc., pp. 265-283, Apr. 25, 2012, Published on Jun. 12, 2012 in Wiley Online Library (Year: 2012). *
Li, et al., "Effective Graph Classification Based on Topological and Label Attributes", 2012 Wiley Periodicals, Inc., pp. 265-283, Apr. 25, 2012, Published on Jun. 12, 2012 in Wiley Online Library.
N. Cao, et al., "g-Miner: Interactive Visual Group Mining on Multivariate Graphs", http://team-net-work.org/pdfs/CaoLLT_CHI15.pdf, Chi 2015, Apr. 18-23, 2015, Seoul, Korea, pp. 1-10 (Year: 2015). *
Q. Liao, A. Striegel, and N. Chawla, "Visualizing Graph Dynamics and Similarity for Enterprise Network Security and Management," Proc. Seventh Int'l Symp. Visualization for Cyber Security (VizSec '10), pp. 34-45, 2010 (Year: 2010). *
Y. Keselman, "Many-to-Many Graph Matching via Metric Embedding", retrieved from, http://www.cs.toronto.edu/˜sven/Papers/cvpr2003.pdf, 2003, pp. 1-8 (Year: 2003). *


Also Published As

Publication number Publication date
US20190295296A1 (en) 2019-09-26
US11195312B1 (en) 2021-12-07

Similar Documents

Publication Publication Date Title
US11195312B1 (en) Gragnostics rendering
Chen et al. PME: projected metric embedding on heterogeneous networks for link prediction
Xie et al. Graph convolutional networks with multi-level coarsening for graph classification
Charalambous et al. A data‐driven framework for visual crowd analysis
CN113452548B (en) Index evaluation method and system for network node classification and link prediction
CN111325237A (en) Image identification method based on attention interaction mechanism
Kadavankandy et al. The power of side-information in subgraph detection
Chen et al. Towards better caption supervision for object detection
Min et al. Can hybrid geometric scattering networks help solve the maximum clique problem?
CN110502669B (en) Social media data classification method and device based on N-edge DFS subgraph lightweight unsupervised graph representation learning
Czech Invariants of distance k-graphs for graph embedding
Marasca et al. Assessing classification complexity of datasets using fractals
Singh et al. Dimensionality Reduction for Classification and Clustering
Amorim et al. Supervised learning using local analysis in an optimal-path forest
Bernard et al. Multiscale visual quality assessment for cluster analysis with Self-Organizing Maps
Morshed et al. LeL-GNN: Learnable edge sampling and line based graph neural network for link prediction
Marcílio et al. An approach to perform local analysis on multidimensional projection
Taranto et al. Uncertain Graphs meet Collaborative Filtering.
Zheng et al. Embeddingtree: Hierarchical exploration of entity features in embedding
CN113515519A (en) Method, device and equipment for training graph structure estimation model and storage medium
Al-Furas et al. Deep attributed network embedding via weisfeiler-lehman and autoencoder
Zhang et al. Personalized web page ranking based graph convolutional network for community detection in attribute networks
Sarlin et al. Visual conjoint analysis (VCA): a topology of preferences in multi-attribute decision making
Hoti et al. Spectral Analysis, Agglomerative, Mean Shift and Affinity Propagation Algorithms, Use on the Content from Social Media for Low-Resource Languages
Παναγιωτόπουλος Clustering algorithm selection by meta-learning

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: TWO SIX LABS, LLC, VIRGINIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GOVE, ROBERT P., JR;REEL/FRAME:045407/0767

Effective date: 20180321

FEPP Fee payment procedure

Free format text: PETITION RELATED TO MAINTENANCE FEES GRANTED (ORIGINAL EVENT CODE: PTGR); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

AS Assignment

Owner name: COMERICA BANK, MICHIGAN

Free format text: SECURITY INTEREST;ASSIGNOR:TWO SIX LABS, LLC;REEL/FRAME:051683/0437

Effective date: 20180122

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: TWO SIX LABS, LLC, VIRGINIA

Free format text: RELEASE OF SECURITY INTEREST RECORDED AT R/F 051683/0437;ASSIGNOR:COMERICA BANK;REEL/FRAME:055273/0424

Effective date: 20210201

AS Assignment

Owner name: ANNALY MIDDLE MARKET LENDING LLC, AS COLLATERAL AGENT, NEW YORK

Free format text: SECURITY INTEREST;ASSIGNORS:TWO SIX LABS HOLDINGS, INC.;TWO SIX LABS, LLC;REEL/FRAME:057266/0363

Effective date: 20210820

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4