US10657686B2 - Gragnostics rendering - Google Patents
- Publication number
- US10657686B2 (application US 15/935,657)
- Authority
- US
- United States
- Prior art keywords
- graphs
- graph
- feature
- vertices
- edges
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/20—Drawing from basic elements, e.g. lines or circles
- G06T11/206—Drawing of charts or graphs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
- G06F16/287—Visualization; Browsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/358—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
- H04L43/045—Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data
Definitions
- Graphs are often employed for representing associations or relations between entities. A graph includes a set of nodes, or vertices, connected by edges, or lines. Mathematical and computational representations defined by various expressions and data structures are often used in practice; however, graphs can also be visually expressed to facilitate human recognition of the information or trends contained therein. Visual recognition of trends can often be made by observing “clusters” of nodes, meaning a set of nodes that share similar attributes, features or values, and the corresponding computational representation is available for automated processing or interpretation of the graph in areas such as machine learning, analytics, and statistics.
- A graph processing system, method and apparatus classify graphs based on a linearly computable set of features defined as a feature vector adapted for comparison with the feature vectors of other graphs.
- The features result from graph statistics (“gragnostics”) computable from the edges and vertices of a set of graphs.
- Graphs are classified based on a multidimensional distance of the resulting feature vectors, and similar graphs are denoted according to a distance, or nearest neighbor, of the feature vector corresponding to each graph. Projection of the feature vector onto a two dimensional Cartesian plane allows visualization of the classification, as similar graphs appear as clusters or groups separated by a relatively shorter distance. Different types or classifications of graphs also appear as other, more distant, clusters.
- An initial training set defines the classification types, and sampled graphs are evaluated and classified based on the feature vector and nearest neighbors in the training set.
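The nearest-neighbor classification described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `euclidean` and `classify`, the class labels, and the three-value vectors are all hypothetical (the example configuration uses 10 features), and Euclidean distance is assumed as the multidimensional distance.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(sample, training, k=1):
    """Return the majority class among the k training vectors nearest to sample.

    training is a list of (feature_vector, class_label) pairs.
    """
    nearest = sorted(training, key=lambda item: euclidean(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical, already-normalized feature vectors for a tiny training set.
training_set = [
    ([0.10, 0.80, 0.05], "subway"),
    ([0.15, 0.75, 0.10], "subway"),
    ([0.90, 0.10, 0.70], "ego"),
    ([0.85, 0.20, 0.65], "ego"),
]
label = classify([0.12, 0.78, 0.08], training_set, k=3)
print(label)
```

Because every feature is normalized to the same range, no single feature dominates the distance in this sketch.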
- Configurations herein are based, in part, on the observation that graphs are often employed for storing information that can also be rendered or visualized to facilitate human recognition of the information contained in the graph.
- Conventional approaches to graph processing and rendering suffer from the shortcoming that large graphs can be unwieldy in both processing and recognition.
- Existing approaches to comparing graphs are slow and not very expressive in explaining how the graphs are similar or dissimilar.
- Large graphs, such as those that may be derived from analytics-focused data sources, can result in a number of vertices and edges that is cumbersome to process and difficult to visually recognize or interpret due to scaling or geometric issues, e.g. a large number of vertices or edges merging into an amorphous visual image.
- Configurations herein substantially overcome the above-described shortcomings by providing a linearly computable feature vector based on quantifiable features of a graph, and a comparison metric that determines a classification of a graph to designate graphs that share similar features and hence likely depict related information types or forms.
- Classification of a graph results from comparison with other graphs to answer questions such as “Which graphs or graph types does my graph most resemble?”
- Correlations between the data sources may be inferred.
- Configurations disclosed herein are operable for, in an analytics environment having graph data responsive to rendering for visual recognition, comparison of statistical trends defined in a plurality of graphs.
- The disclosed scalable method of rendering visualized graph data includes receiving a plurality of graphs, such that each graph defines associations between data entities and is renderable in a visual form having vertices connected by edges, and computing, for each graph, a plurality of features based on the edges and vertices.
- The computed features, detailed below, each provide a scalar quantity that may then be normalized into a predetermined range, such as 0-1.
- An application operable on a gragnostics processor arranges, for each graph, each of the normalized features into a feature vector having ordered values for each of the features.
- The example herein coalesces 10 features into an ordered vector of normalized values in the 0.0-1.0 range, providing a multidimensional vector for each graph.
- The gragnostics processor determines similarity of the graphs based on a distance between the corresponding visualized feature vectors, and a user display renders a visualization of the feature vectors for depicting the relative distance from other graphs.
- FIG. 1 is a context diagram of a computing environment suitable for use with configurations herein;
- FIG. 2 shows an example of graph features employed in assessing graphs in the computing environment of FIG. 1;
- FIG. 3 is an example correlation of the graph features of FIG. 2 derived from a set of graphs;
- FIGS. 4A-4D show a gragnostic plot for comparing feature vectors aggregated from the graph features of FIG. 3;
- FIGS. 5A1-5C denote a flowchart depicting graph processing resulting in the gragnostic plot of FIGS. 4A-4D.
- Configurations below implement a gragnostic processing approach for an example set of graphs, and illustrate comparison and determination of a graph class based on a baseline or control set of other graphs.
- An example set of graphs is employed for training and for classification; however, any suitable graph representations may be employed.
- A graph as employed herein is a mathematical structure defining a set of nodes (also known as vertices) and the links (or edges) that connect them. Speed and efficiency are significant because graph comparisons can be used to determine similarity or distances between a large number of graphs, which in turn can then be used to cluster large graph datasets, or to query a database to identify similar graphs. For example, in social media networks, clustering users' ego networks may identify fake accounts vs. real accounts.
- The disclosed approach demonstrates several advantages over the prior art: 1) computational performance that enables highly scalable graph comparisons, 2) comparisons that can be meaningfully interpreted by humans, and 3) fewer constraints on input graphs. Determining similarity of two graphs is related to graph isomorphism, which belongs to the class NP and has no known polynomial-time algorithm, making exact comparison impractical for large graphs.
- Conventional approaches include so-called graph kernels, which are more computationally tractable because they use algorithms whose running time is a polynomial function of the number of nodes in the graph, but these kernels cannot be meaningfully interpreted or understood by humans.
- Polynomial time functions also become computationally infeasible for very large graphs.
- Linear computability metrics are scalable because the complexity (number of computing operations) grows only in proportion to the number of inputs, rather than polynomially or exponentially, either of which becomes prohibitive with a sufficiently large number of inputs.
- Other conventional approaches use techniques such as singular value decomposition of adjacency matrices or multi-dimensional scaling on Euclidean distances between adjacency matrices; however, these are also polynomial time functions that yield unintelligible results and require that input graphs all be of the same size.
- The gragnostics approach disclosed herein emphasizes several advantages.
- Gragnostics should scale to large graphs; thus the disclosed gragnostics can be computed in O(|V|+|E|) time.
- The features are comprehensible by analysts who may not be experts in graph theory, as the rendered gragnostics correspond to topological characteristics described in visual renderings. This enables broad audiences to easily understand gragnostics.
- The disclosed approach imposes few constraints, so there are few restrictions on size or components, which also complements the linear computability provided by O(|V|+|E|) computation.
- FIG. 1 is a context diagram of a computing environment 100 suitable for use with configurations herein.
- A gragnostics processor 110 is responsive to a user computing device 112.
- A repository 120 stores a plurality of graphs, which may include a training set 122 of graphs depicting particular types, or previous graphs 124 employed for classification, which are then added to the set of graphs employed for classification.
- Configurations herein are particularly beneficial in an analytics environment having graph data responsive to rendering for visual recognition and comparison.
- Visually recognizable aspects of the rendered graphs can denote statistical trends defined in a plurality of graphs.
- The gragnostics processor 110 receives a plurality of graphs, such that each graph defines associations between data entities and is renderable in a visual form having vertices connected by edges.
- The graphs are defined by a data structure indicative of, for each vertex, an association to a set of connected vertices, in which each vertex is renderable as a node and the set of associations is defined by a line (edge) to each connected vertex.
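One data structure that matches this description is an adjacency mapping from each vertex to the set of vertices it connects to. The sketch below is an assumption for illustration only; the source does not prescribe a particular structure, and the name `build_adjacency` and the edge-list input format are hypothetical.

```python
from collections import defaultdict

def build_adjacency(edges, vertices=()):
    """Map each vertex to the set of vertices it is connected to (undirected)."""
    adjacency = defaultdict(set)
    for v in vertices:
        adjacency[v]  # touch the key so vertices without edges are kept
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)

# Hypothetical graph: a path a-b-c plus an unconnected vertex d.
graph = build_adjacency([("a", "b"), ("b", "c")], vertices=["a", "b", "c", "d"])
print(graph)
```

Building the mapping touches each vertex and each edge once, which is consistent with the linear-time goal stated for the features.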
- A variety of data structures may be employed for storing the graph representations as utilized herein.
- The user device 112 receives a graph for classification 150 from any suitable source, such as the Internet 130 or a set of graphs stored on media, to be classified according to configurations herein.
- The gragnostics processor 110, which may simply be a desktop PC in conjunction with the user device 112, performs the classification and renders a graph classification 152 indicative of graphs in the repository 120 that the graph for classification most resembles. As will be discussed further below, similar graphs produce distinct clusters when classified according to a robust set of graphs, therefore defining a distinct group that the graph for classification most resembles.
- The graph may also be stored in the repository 120 as a previous graph 124 and used for subsequent classifications.
- FIG. 2 shows an example of graph features employed in assessing graphs in the computing environment of FIG. 1 .
- The gragnostics processor 110 computes, for each graph, a plurality of features based on the edges and vertices.
- Each of the features has a linear computability time such that the feature is computable in a time that varies linearly with the number of vertices and/or edges, providing for scalability to large accumulations of data.
- Each feature corresponds to a set of traversal steps for processing the graph, in which the traversal steps define a finite sequence of operations that varies linearly based on the number of vertices or edges in the graph.
- The gragnostics processor 110 computes each of the features in a numerical manner, and normalizes each of the features to a range between 0.0 and 1.0 to facilitate comparisons between graphs.
- The features relate to visual attributes of the graphs, and include density, bridge, disconnectivity, isolation, constriction, linearity, tree and star, and also the number of vertices (nodes) and the number of edges.
- FIG. 2 shows the visual appearance of each feature, ranging from a minimal depiction 201 through a moderate depiction 202 to a maximum depiction 203.
- The gragnostics processor 110 normalizes the computed features into a predetermined range, such that for each feature, the value is scaled to a range of between 0 and 1. Alternate feature ranges may be used; however, normalizing to the same range allows comparison between different graphs in the plurality of graphs.
- The gragnostics processor 110 employs linear-time interpretable metrics to create a graph feature vector that can be used to compute distance between graphs using techniques such as Euclidean distance, or to compute clusters of graphs using techniques such as k-nearest neighbors or k-means clustering. Other multidimensional distance approaches may also be employed.
- The gragnostics processor 110 arranges each of the normalized features into a feature vector having ordered values for each of the features.
- The feature vector includes a value for each of the 10 features, normalized to a common range, 0.0-1.0 in this case.
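A minimal sketch of arranging normalized features into an ordered vector follows. The source states the target range but not the scaling rule, so min-max scaling across the graph set is an assumption here, as are the feature ordering in `FEATURE_ORDER`, the function name `min_max_normalize`, and the sample values.

```python
# Assumed ordering of the 10 features; the source does not fix an order.
FEATURE_ORDER = ["nodes", "links", "density", "bridge", "disconnected",
                 "isolation", "constricted", "line", "tree", "star"]

def min_max_normalize(raw_by_graph, order=FEATURE_ORDER):
    """Scale each feature to 0.0-1.0 across the graph set; return ordered vectors."""
    vectors = {name: [] for name in raw_by_graph}
    for feature in order:
        values = [raw[feature] for raw in raw_by_graph.values()]
        low, high = min(values), max(values)
        span = high - low
        for name, raw in raw_by_graph.items():
            # A feature that is constant across the set is mapped to 0.0.
            scaled = (raw[feature] - low) / span if span else 0.0
            vectors[name].append(scaled)
    return vectors

# Hypothetical raw values for two graphs, using only three features for brevity.
raw = {
    "g1": {"nodes": 10, "links": 20, "density": 0.4},
    "g2": {"nodes": 30, "links": 20, "density": 0.1},
}
vectors = min_max_normalize(raw, order=["nodes", "links", "density"])
print(vectors)
```

Features that are already fractions, such as density, could instead be left unscaled; which choice the example configuration makes is not stated in the source.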
- FIG. 3 is an example correlation of the graph features of FIG. 2 derived from a set of graphs. Referring to FIGS. 2 and 3, computation of each of the 10 feature metrics results in the intermediate step shown in FIG. 3.
- A plurality of classes 310 are defined based on at least the training set 122 to produce a correlation 300 of features.
- The classes 310, each denoted by a different color, define the classifications, or groups of graphs, discussed further below in FIG. 4.
- FIG. 3 illustrates the degree to which certain features correlate with, or predict, other features.
- A vertical axis 320 lists each of the 10 features, and a horizontal axis 322 lists the same set of features.
- Each pairing of features is depicted in an array of subgraphs 350 (the three upper-left subgraphs are labeled for clarity).
- FIG. 3 also illustrates the benefit of normalizing each of the features to the range of 0.0-1.0 to allow comparison to other features, as each subgraph 350 has a horizontal axis 352 and vertical axis 354.
- The horizontal axis 352 defines the value of the feature defined on the axis 322, and the vertical axis 354 defines the value of the feature defined on the axis 320.
- Each graph in a training set 122 is plotted based on the normalized value of the feature. This illustrates the intermediate step of plotting each feature against the other features for each graph.
- The color of the plot point indicates the group from which the graph was derived.
- A grouping 360 denotes that subway graphs (green dots showing graphs derived from an inner city subway layout) tend to be less constricted (constricted feature value near 0.0).
- Group 362 shows that a high value in the line feature tends to correlate with the constriction feature.
- Group 364 demonstrates that star and bridge features distinguish the ego graphs (pink dots showing social media connections).
- Group 366 distinguishes geometric graphs (blue dots derived from graphs of regular geometric shapes).
- Group 368 demonstrates a correlation between tree and density features, and group 370 shows a correlation between bridge and constricted features.
- FIGS. 4A-4D coalesce the aggregate features of each graph for comparison on a broader scale.
- FIGS. 4A-4D show a gragnostic plot for comparing feature vectors aggregated from the graph features of FIG. 3 .
- The computed set of features (10, in the example shown) defines an ordered vector, which can be represented in a multidimensional space.
- FIGS. 4A-4D build on the feature vector generated from the metrics of FIG. 3 by computing a two dimensional (2D) plot depicting each of the feature vectors presented for comparison.
- The preexisting graphs of the training set 122 and previous graphs 124 are already plotted to define graph types, one of which the graphs for classification 150 will fall into.
- The feature vector includes the ten 0.0-1.0 magnitudes of each of the normalized features for each graph.
- The user device 112 is operable for rendering a visualization of the feature vectors, and the gragnostics processor 110 determines a similarity of the graphs based on a distance between the corresponding visualized feature vectors. It should be noted that the visual rendering of FIG. 3, depicting individual features, is an intermediate step not required for generating the feature vector.
- The feature vector is generated from the 10 computed, normalized features.
- The feature vector, when computed as an ordered set of normalized values, therefore defines a multidimensional vector, or a reference to a multidimensional space.
- The gragnostics processor 110 is configured to compute a multidimensional distance between each of the feature vectors for determining a similarity between the graphs corresponding to the feature vectors.
- A multidimensional distance is computable between vectors of different graphs, offering a coalesced metric of similarity to other graphs.
- The feature vector may be projected or reduced onto a two-dimensional (2D) plane depicting the computed distance between the feature vectors of different graphs. Similar graphs appear as a “cluster” or closely located set of points, as depicted in FIGS. 4A-4D.
- Graph classification occurs based on the computed distance between the feature vectors of each of the graphs, and is rendered visually by these clusters of points.
- The graph rendering of FIG. 4A depicts a comparison of the computed distance between each of the graphs, and classification of each of the graphs based on the comparison by determining a nearest neighbor, shown as different colors in FIG. 4A.
- In FIGS. 4A-4D, distinct groups of values (points) are shown for a sample set of graphs.
- A two dimensional rendering region 400 depicts a metric multi-dimensional scaling (MDS) plot of graphs projected onto 2 dimensions from the 10-dimensional gragnostics feature space (e.g. a multidimensional space encompassing the 10-value feature vector).
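As background, metric MDS is commonly formulated as choosing 2D positions that minimize a stress function over pairwise distances. The notation below is an assumption for illustration: δ_ij stands for the distance between the 10-dimensional feature vectors of graphs i and j, and x_i for the plotted position of graph i. The source does not state which MDS variant or solver is used.

```latex
\mathrm{stress}(x_1, \dots, x_n) = \sum_{i < j} \left( \delta_{ij} - \lVert x_i - x_j \rVert \right)^2, \qquad x_i \in \mathbb{R}^2
```

Under this formulation, graphs whose feature vectors are close in the 10-dimensional space are placed close together in the rendering region, which is what makes clusters visible.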
- FIG. 4A illustrates notable class separation in groups 410, 430, 450, 470 and 490, indicating that different classes of graphs have different feature vector values.
- A cluster 410 of green points is based on graphs of subway maps.
- Graph plots 411-1, 411-2 and 411-3 are based on graph data of graphs 421-1, 421-2 and 421-3, respectively, depicting the Tokyo, Shanghai and London subway networks.
- Tokyo 421-1 represents the graph for classification, and graphs 421-2 and 421-3 are the computed nearest neighbors in the training set.
- Feature vector 423-1 shows the values of the features computed for the graph 421-1, and feature vectors 423-2 and 423-3 show the respective training set values.
- A nearest neighbors value 425 shows London and Shanghai as the closest valued vectors, a characteristic visually apparent on the rendering region 400. Based on the classification of the Tokyo subway graph 421-1, its distance to the London subway graph 421-3 is very short: the features are nearly identical, and the force-directed node-link diagrams share the same visual structure. Meanwhile, the Tokyo graph's second nearest neighbor is the Shanghai subway graph 421-2, which is farther away than London. Shanghai has higher bridge, constricted, and line features. Furthermore, we can visually confirm this dissimilarity by looking at Shanghai's force-directed node-link diagram and noting that it has more bridge edges, it is more constricted, and it is more line-like because more vertices have only two edges.
- The feature vectors 443-1, 443-2 and 443-3 indicate that the star feature is most pronounced, followed by the tree feature value.
- The nearest neighbor values 445 likewise designate “686” and “348” as the closest.
- One of the nearest character graphs is Star Wars 2 (441-4), having feature vector 443-4.
- The Star Wars 2 and Storm of Swords graphs (FIG. 4D) are more typical of non-autobiographical character graphs. We also see that the software and the character classes overlap in the MDS plot 400.
- Another grouping 450 is based on software graphs, defined by code structure and plotted in purple.
- A plot for classification 451-1 is shown in FIG. 4D as graph 461-1 (sjbullet), with nearest neighbors 451-2 and 451-3, corresponding to graphs 461-2 (Storm of Swords) and 461-3 (Javax).
- The next nearest neighbor is Javax 461-3, as shown by nearest neighbor values 465; each has its highest feature value for tree, with the star feature a distant second.
- Computation of the feature vector includes computation of each of the following 10 features in linear computability time, meaning that the number of computing instructions, and hence time, varies linearly with the number of nodes and/or links.
- Number of nodes: This counts the number of nodes in the graph, |V|. This runs in O(|V|) time.
- Number of links: This counts the number of links in the graph, |E|. This runs in O(|E|) time.
- Density: This determines the link density in the graph, or the probability that any two randomly chosen nodes will be connected via a link. This is calculated by: $\mathrm{density}(G) = \frac{2|E|}{|V|(|V|-1)}$
- Isolation: This describes the fraction of nodes in the graph that are not connected to any other node. This is calculated by: $\mathrm{isolation}(G) = \frac{|\{v \in V : d(v) = 0\}|}{|V|}$, where d(v) is the degree of node v.
- Star: Here v* represents the node with the highest degree in the graph. Computing this feature requires finding the node with the highest degree and summing the degree of each node, so this runs in O(|V|+|E|) time.
- Bridge: This is calculated by: $\frac{\mathrm{bridge}(G)}{|V| - 1}$, where bridge(G) is the number of bridge links in graph G.
- Constricted: Here cut(G) is the number of nodes whose removal will disconnect the graph. This can also be found using a breadth-first search in O(|V|+|E|) time.
- Disconnected: This describes the fraction of connected components in a graph out of the maximum possible number of connected components, i.e. a fraction denoting the degree to which clusters are unreachable from other clusters.
- Tree: This describes how close a graph is to being a tree, i.e. how close it is to a graph with no cycles. This is calculated by: $\mathrm{tree}(G) = 1 - \frac{|E| - (|V| - 1)}{\frac{|V|(|V|-1)}{2} - (|V| - 1)}$
- Line: This is calculated by: $\frac{\sum_{i=1}^{|V|} l(i)}{|V|}$, which is evaluated over a vector D of length |V| in which each element is the degree d(v) of a vertex in V, ordered such that if d(v) = 1 then it is at the beginning of the vector.
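Of the features listed above, density and isolation follow most directly from their prose definitions. The sketch below assumes an undirected simple graph (no self-loops or parallel links) stored as a vertex-to-neighbor-set mapping; the function names and the sample graph are hypothetical, and the handling of graphs with fewer than two vertices is an added assumption.

```python
def density(num_vertices, num_edges):
    """Fraction of possible vertex pairs that are joined by a link."""
    if num_vertices < 2:
        return 0.0  # assumption: no pairs exist, so report zero density
    return 2 * num_edges / (num_vertices * (num_vertices - 1))

def isolation(adjacency):
    """Fraction of vertices that have no links (degree 0)."""
    if not adjacency:
        return 0.0
    isolated = sum(1 for neighbors in adjacency.values() if not neighbors)
    return isolated / len(adjacency)

# Hypothetical graph: path a-b-c plus an isolated vertex d.
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
num_vertices = len(adjacency)
num_edges = sum(len(n) for n in adjacency.values()) // 2  # each link is seen twice

print(density(num_vertices, num_edges), isolation(adjacency))
```

Both functions make a single pass over the vertices, and density is constant-time once the two counts are known.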
- FIGS. 5A1-5C denote a flowchart 500 depicting graph processing resulting in the gragnostic plot of FIGS. 4A-4D.
- The gragnostics processor 110 loads each graph from a media source or user data store.
- A classification is performed for one graph against a preexisting training set 122 or previous graphs 124 presented for classification.
- The following gragnostics steps are performed for all graphs, and a classification distinction is made against previously processed graphs. This includes receiving a plurality of graphs in a suitable data structure, such that each graph defines associations between data entities and is renderable in a visual form having vertices connected by edges, as depicted at step 502.
- The gragnostics processor 110 preprocesses each graph to arrange for feature computation, as disclosed at step 503.
- Each of the graphs is defined by a data structure indicative of, for each vertex, an association to a set of connected vertices, such that each vertex is renderable as a node and the set of associations is defined by a line to each connected vertex, as depicted at step 504.
- Such an arrangement corresponds to a typical visualization of a graph, which has the appearance of circles with lines emanating from each circle and terminating in other circles to which the lines connect.
- The gragnostics processor 110 extracts the 10 features, as shown at step 505.
- The set of 10 features provides an illustrative set for permitting optimal computation of the graphs as defined herein; however, similar configurations may consider a greater or lesser number of features.
- The gragnostics processor 110 computes, for each graph, a plurality of features based on the edges and vertices, as depicted at step 506.
- Each of the features has a linear computability time such that the feature is computable in a time that varies linearly with at least one of the number of vertices or edges, as depicted at step 507.
- The gragnostics processor computes the tree feature at step 508, depicted in FIG. 5B, and the linearity feature at step 509, depicted in FIG. 5C.
- The gragnostics processor 110 creates a feature vector using the 10 features for each graph, as disclosed at step 510. This includes, at step 511, normalizing the computed features into a predetermined range, and for each graph, arranging each of the normalized features into a feature vector, such that the feature vector has ordered values for each of the features, depicted at step 512. Normalizing, in the example configuration, scales each feature to a range of 0.0-1.0, facilitating comparison and rendering as a multidimensional vector.
- Comparison and visualization includes computing distances between graphs using the feature vectors, as shown at step 513.
- Multidimensional values such as the feature vector may be compared using Euclidean distance or similar metrics, such that the distance between feature vectors is indicative of the similarity of the corresponding graphs.
- The gragnostics processor 110 computes a two dimensional position based on each of the feature vectors, as depicted at step 514, and projects the position of each vector onto a visualized two dimensional rendering, as depicted at step 515 and rendered in FIG. 4A.
- The gragnostics processor 110 launches the desired comparison analytic (e.g. clustering or nearest neighbor search), and invokes the interpretability of the features to understand differences between graphs and clusters of graphs (e.g. these two graphs are very similar, except one is more star-like than the other), as disclosed at step 516.
- Groupings, or classifications of graphs, can therefore be determined by observing a cluster of graphs separated by relatively small distances.
- The gragnostics processor 110 classifies, based on a distance on the visualized two dimensional graph, groups of graphs, such that the classification is defined by visual clusters of the positions on the two dimensional rendering, as depicted at step 517.
- The result is a determination of whether the graph for classification corresponds to one of the classes of graphs, disclosed at step 518.
- FIG. 5B is a flowchart of computation of the tree feature.
- Computing the features includes, at step 550, determining the tree feature in linear time by traversing each of the vertices in the graph and accumulating, based on the traversal, a number of edges, as depicted at step 551.
- The gragnostics processor 110 determines the number of edges whose removal would result in a tree by removing cyclic paths, as disclosed at step 552, thus providing a measure of how “close” the graph is to a model tree, and compares the determined number of edges with a number of the traversed vertices, as disclosed at step 553.
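The tree computation of FIG. 5B can be sketched as below, following the description of the tree feature as one minus the ratio of links that must be removed to the maximum number that could need removing. The sketch assumes a connected, undirected simple graph stored as a vertex-to-neighbor-set mapping; the name `tree_feature`, the sample graphs, and the value returned for graphs with fewer than three vertices are assumptions.

```python
def tree_feature(adjacency):
    """Closeness of a connected simple graph to a tree, in the range 0.0-1.0."""
    num_vertices = len(adjacency)
    # Traverse every vertex and accumulate its links; each link is counted twice.
    num_edges = sum(len(neighbors) for neighbors in adjacency.values()) // 2

    spanning = num_vertices - 1                          # links in any spanning tree
    max_edges = num_vertices * (num_vertices - 1) // 2   # links in a complete graph
    max_removable = max_edges - spanning
    if max_removable <= 0:
        return 1.0  # assumption: fewer than three vertices cannot form a cycle
    removable = num_edges - spanning                     # links beyond a spanning tree
    return 1.0 - removable / max_removable

# Hypothetical four-vertex graphs: a path, a cycle, and a complete graph.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
complete = {v: {u for u in range(4) if u != v} for v in range(4)}

print(tree_feature(path), tree_feature(cycle), tree_feature(complete))
```

For a disconnected graph the link count can fall below that of a spanning tree, so a fuller implementation would need to decide how to treat that case; the source does not say.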
- FIG. 5C details computation of the linearity feature.
- Computing the linearity feature in linear time includes traversing each of the vertices in the graph, as shown at step 570, and determining, at each vertex, if a number of edges emanating from the vertex is consistent with a linear graph, as depicted at step 571.
- The gragnostics processor 110 accumulates the number of vertices consistent with a linear graph, at step 572, and compares the accumulated vertices with the number of traversed vertices, as depicted at step 573.
- The result is a measure of the relative number of total vertices that satisfy the criteria for a linear graph, meaning two edges touch each vertex with the exception of two terminal vertices touching only one edge.
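One reading of the linearity computation is sketched below. The per-vertex test is an assumption, since the source does not spell it out: degrees are ordered with degree-1 vertices first, the first two positions count only if they hold terminal vertices (degree 1), and every later position counts only if it holds an interior vertex (degree 2). The name `line_feature` and the sample graphs are hypothetical.

```python
def line_feature(adjacency):
    """Fraction of vertices whose degree is consistent with a line graph."""
    if not adjacency:
        return 0.0
    degrees = [len(neighbors) for neighbors in adjacency.values()]
    # Stable sort that moves degree-1 vertices to the front of the vector.
    ordered = sorted(degrees, key=lambda d: d != 1)
    consistent = 0
    for position, degree in enumerate(ordered):
        if position < 2 and degree == 1:      # the two terminal slots
            consistent += 1
        elif position >= 2 and degree == 2:   # interior slots
            consistent += 1
    return consistent / len(ordered)

# Hypothetical four-vertex graphs: a path, a star, and a cycle.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}

print(line_feature(path), line_feature(star), line_feature(cycle))
```

Under this reading a cycle scores below 1.0 because it has no terminal vertices, even though every vertex has two edges.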
- programs and methods defined herein are deliverable to a user processing and rendering device in many forms, including but not limited to a) information permanently stored on non-writeable storage media such as ROM devices, b) information alterably stored on writeable non-transitory storage media such as floppy disks, magnetic tapes, CDs, RAM devices, and other magnetic and optical media, or c) information conveyed to a computer through communication media, as in an electronic network such as the Internet or telephone modem lines.
- the operations and methods may be implemented in a software executable object or as a set of encoded instructions for execution by a processor responsive to the instructions.
- alternatively, the operations and methods may be embodied using hardware components such as Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), state machines, controllers or other hardware components or devices, or a combination of hardware, software, and firmware components.
Abstract
Description
If |V| and |E| are already calculated, then this runs in O(1) time; otherwise it runs in O(|V|+|E|) time.
Where d(v) is the degree of node v. This requires counting the number of nodes that have
Where v* represents the node with the highest degree in the graph. This requires finding the node with the highest degree and summing the degree of each node, so this runs in O(|V|+|E|) time.
Where bridge(G) is the number of bridge links in graph G. Using a breadth-first search, all bridges can be found in O(|V|+|E|) time.
Where cut(G) is the number of nodes whose removal will disconnect the graph. This can also be found using a breadth-first search in O(|V|+|E|) time.
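As an illustration of how bridge(G) and cut(G) can be obtained in one linear-time pass, the sketch below uses the well-known depth-first low-link technique instead of the breadth-first search named in the text; both run in O(|V|+|E|) time. The function name `bridges_and_cut_vertices` and the adjacency-dictionary input are assumptions, and the graph is assumed to be undirected and simple.

```python
def bridges_and_cut_vertices(adj):
    """Find bridge links and cut vertices with one depth-first pass.

    adj: dict mapping each vertex to a list of its neighbours
    (assumed undirected and simple; vertex labels must not be None).
    """
    disc, low = {}, {}          # discovery order and low-link values
    bridges, cut_vertices = [], set()
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        children = 0
        for w in adj[u]:
            if w == parent:
                continue
            if w in disc:
                # Back edge: u can reach an earlier vertex.
                low[u] = min(low[u], disc[w])
            else:
                children += 1
                dfs(w, u)
                low[u] = min(low[u], low[w])
                # Nothing below w reaches u or above: (u, w) is a bridge.
                if low[w] > disc[u]:
                    bridges.append((u, w))
                # Nothing below w reaches above u: u is a cut vertex.
                if parent is not None and low[w] >= disc[u]:
                    cut_vertices.add(u)
        # A DFS root is a cut vertex only if it has several subtrees.
        if parent is None and children > 1:
            cut_vertices.add(u)

    for start in adj:
        if start not in disc:
            dfs(start, None)
    return bridges, cut_vertices
```

The counts used by the features would then be `len(bridges)` and `len(cut_vertices)`. The recursion is kept for readability; a large graph would need an explicit stack to stay within Python's recursion limit.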
Where |C| is the number of connected components in the graph, which can be found in O(|V|+|E|) time.
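A minimal sketch of counting |C| with a breadth-first search is shown below. The function name `count_components` and the adjacency-dictionary input are assumptions made for illustration.

```python
from collections import deque


def count_components(adj):
    """Count connected components of an undirected graph.

    adj: dict mapping each vertex to a list of its neighbours.
    Each vertex is enqueued once and each adjacency entry is scanned
    once, so the running time is O(|V| + |E|).
    """
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        # An unvisited vertex starts a new component.
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return components
```

An isolated vertex (an empty neighbour list) counts as its own component.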
Colloquially, this is the number of links that must be removed to make the graph a tree, divided by the maximum possible number of links that would need to be removed to make a graph a tree, all subtracted from 1. If |V| and |E| are already calculated, then this runs in O(1) time; otherwise it runs in O(|V|+|E|) time.
Line: This calculates how close a graph is to being a line, i.e., a tree with exactly two leaves where there is only one path from one leaf to the other. This is calculated by:
Here, D is a vector of length |V| where each element is the degree d(v) of a vertex in V, ordered so that every vertex with d(v)=1 appears at the beginning of the vector. This requires iterating over each edge to calculate the degree of each vertex, and then iterating over the vertices twice to ensure that every vertex with d(v)=1 is at the front of the list, so this runs in O(|V|+|E|) time.
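The construction of D can be sketched as below. The equation itself is not reproduced in this text, so the scoring rule is an assumption based on the description of FIG. 5C: the first two entries of D are expected to have degree 1 (the two terminal vertices), every later entry is expected to have degree 2, and the score is the fraction of entries that match. The function name `line_feature` and the integer vertex labels 0 to |V|-1 are also assumptions.

```python
def line_feature(num_vertices, edges):
    """Fraction of vertices whose degree is consistent with a line graph.

    Assumptions: vertices are labelled 0 .. num_vertices - 1, the graph
    is undirected and simple, and the scoring rule is one reading of
    the description rather than the patent's exact equation.
    """
    if num_vertices == 0:
        raise ValueError("graph has no vertices")
    # One pass over the edges to compute every degree.
    degree = [0] * num_vertices
    for u, w in edges:
        degree[u] += 1
        degree[w] += 1
    # Two passes over the vertices to build D with degree-1 entries first.
    d_vec = [d for d in degree if d == 1] + [d for d in degree if d != 1]
    consistent = 0
    for i, d in enumerate(d_vec):
        expected = 1 if i < 2 else 2   # two endpoints, then interior vertices
        if d == expected:
            consistent += 1
    return consistent / num_vertices
```

Under this reading a four-vertex path scores 1.0, while a four-vertex cycle and a three-leaf star each score 0.5.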
Claims (16)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/935,657 US10657686B2 (en) | 2018-03-26 | 2018-03-26 | Gragnostics rendering |
US16/865,946 US11195312B1 (en) | 2018-03-26 | 2020-05-04 | Gragnostics rendering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/935,657 US10657686B2 (en) | 2018-03-26 | 2018-03-26 | Gragnostics rendering |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/865,946 Continuation US11195312B1 (en) | 2018-03-26 | 2020-05-04 | Gragnostics rendering |
Publications (2)
Publication Number | Publication Date |
---|---|
US20190295296A1 (en) | 2019-09-26 |
US10657686B2 (en) | 2020-05-19 |
Family
ID=67985465
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/935,657 Active 2038-04-13 US10657686B2 (en) | 2018-03-26 | 2018-03-26 | Gragnostics rendering |
US16/865,946 Active US11195312B1 (en) | 2018-03-26 | 2020-05-04 | Gragnostics rendering |
Country Status (1)
Country | Link |
---|---|
US (2) | US10657686B2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11195312B1 (en) * | 2018-03-26 | 2021-12-07 | Two Six Labs, LLC | Gragnostics rendering |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050114802A1 (en) * | 2003-08-29 | 2005-05-26 | Joerg Beringer | Methods and systems for providing a visualization graph |
Also Published As
Publication number | Publication date |
---|---|
US20190295296A1 (en) | 2019-09-26 |
US11195312B1 (en) | 2021-12-07 |
Legal Events
Code | Title | Description |
---|---|---|
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
AS | Assignment | Owner name: TWO SIX LABS, LLC, VIRGINIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GOVE, ROBERT P., JR;REEL/FRAME:045407/0767; Effective date: 20180321 |
FEPP | Fee payment procedure | Free format text: PETITION RELATED TO MAINTENANCE FEES GRANTED (ORIGINAL EVENT CODE: PTGR); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
AS | Assignment | Owner name: COMERICA BANK, MICHIGAN; Free format text: SECURITY INTEREST;ASSIGNOR:TWO SIX LABS, LLC;REEL/FRAME:051683/0437; Effective date: 20180122 |
STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
AS | Assignment | Owner name: TWO SIX LABS, LLC, VIRGINIA; Free format text: RELEASE OF SECURITY INTEREST RECORDED AT R/F 051683/0437;ASSIGNOR:COMERICA BANK;REEL/FRAME:055273/0424; Effective date: 20210201 |
AS | Assignment | Owner name: ANNALY MIDDLE MARKET LENDING LLC, AS COLLATERAL AGENT, NEW YORK; Free format text: SECURITY INTEREST;ASSIGNORS:TWO SIX LABS HOLDINGS, INC.;TWO SIX LABS, LLC;REEL/FRAME:057266/0363; Effective date: 20210820 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 4 |