WO2022066156A1 - Node embedding via hash-based projection of transformed personalized pagerank
 Publication number
 WO2022066156A1 (PCT/US2020/052461)
 Authority
 WO
 WIPO (PCT)
 Prior art keywords
 vector
 node
 personal
 pagerank
 given node
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
 G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
 G06F16/25—Integrating or interfacing systems involving database management systems

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
 G06F16/90—Details of database functions independent of the retrieved data types
 G06F16/901—Indexing; Data structures therefor; Storage structures
 G06F16/9024—Graphs; Linked lists

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
 G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
 G06F16/28—Databases characterised by their database models, e.g. relational or object models
 G06F16/289—Object oriented databases

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
 G06F16/90—Details of database functions independent of the retrieved data types
 G06F16/95—Retrieval from the web
Abstract
Systems and methods for generating single-node representations in graphs comprised of linked nodes. The present technology enables generation of individual node embeddings on the fly in sublinear time (less than O(n), where n is the number of nodes in graph G) using only a PPR vector for the node, and random projection to reduce the dimensionality of the node's PPR vector. In one example, the present technology includes a computer-implemented method comprising obtaining a graph having a plurality of nodes from a database, generating a personal pagerank vector for a given node of the plurality of nodes, and producing an embedding vector for the given node by randomly projecting the personal pagerank vector, wherein the embedding vector has lower dimensionality than the personal pagerank vector.
Description
NODE EMBEDDING VIA HASH-BASED PROJECTION OF TRANSFORMED PERSONALIZED PAGERANK
BACKGROUND
[0001] Graphs may be used to model a wide variety of interesting problems where data can be represented as objects connected to each other, such as in social networks, computer networks, chemical molecules, and knowledge graphs. In many cases, it is beneficial to generate embedded representations of graphs in which a d-dimensional embedding vector is assigned to each node in a given graph G. Such node embeddings may be used for downstream machine learning tasks, such as visualization (e.g., where a high-dimensional graph is reduced to a lower dimension), node classification (e.g., where missing information in one node is predicted using features of adjacent nodes), anomaly detection (e.g., where anomalous groups of nodes are highlighted), and link predictions (e.g., where new links between nodes are predicted, such as suggesting new connections in a social network).
[0002] Existing approaches for generating graph embeddings typically assume that graph data easily fits in memory and is stable. However, in many cases, graph data may in fact be large, making it difficult or infeasible to store and/or process on certain devices (e.g., personal computers, mobile devices). Likewise, in many cases, graph data may be volatile, and thus may become too stale to rely upon for certain tasks (e.g., social networks are constantly changing with new users joining and new relationships forming). Given that network embedding generally must be consistent across all nodes in the graph data, a standard approach to dealing with this changing behavior is to rerun the embedding algorithm on a regular (e.g., weekly) basis, in order to balance the time necessary to generate new graph representations with the need for representations that are as up-to-date as possible. At the same time, many of the common uses for graph embeddings, such as node classification, may only require current representations for a single node or a small set of nodes, making it particularly inefficient to recompute an entire graph embedding on an as-needed basis.
[0003] In response, the present technology proposes systems and methods in which the embedding for a node is restricted to using only local structural information, and cannot access the representations of other nodes in the graph or rely on trained global model state. In addition, the present technology can produce embeddings which are consistent with the representations of the other nodes in the graph, so that the new node embeddings can be incorporated with the rest of the graph embedding and used for downstream tasks. To accomplish this, the present technology proposes systems and methods which leverage a high-order ranking matrix based on global Personalized PageRank (“PPR”) as foundations on which local node embeddings are computed with local PPR Hashing. These systems and methods can produce node embeddings that are comparable to state-of-the-art methods in terms of quality, but with efficiency several orders of magnitude better in terms of clock time and short-term memory consumption. For example, the systems and methods can be configured to produce node embeddings that fit into the volatile memory of a desktop and/or mobile computing device. Moreover, these systems and methods make it possible to update different node embeddings in parallel, for example in a server-farm system and/or a multiprocessor or multicore processor-based system, making it possible to field multiple simultaneous queries, and to base each response on locally updated embeddings specific to each query. Finally, these systems and methods make it possible to tailor processing so as to provide embeddings within a preset amount of time, which enables the present technology to be applied in contexts such as fraud detection where embeddings must be generated in a guaranteed amount of time (e.g., 200 ms).
BRIEF SUMMARY
[0004] The present technology concerns improved systems and methods for generating single-node representations in graphs comprised of linked nodes. In that regard, the present technology provides systems and methods for generating individual node embeddings on the fly in sublinear time (less than O(n), where n is the number of nodes in graph G) using only a PPR vector for the node, and random projection to reduce the dimensionality of the node's PPR vector.
[0005] In one aspect, the disclosure describes a processing system, comprising a memory, and one or more processors coupled to the memory and configured to perform the following operations: obtain a graph having a plurality of nodes from a database; generate a personal pagerank vector for a given node of the plurality of nodes; and produce an embedding vector for the given node by randomly projecting the personal pagerank vector, wherein the embedding vector has lower dimensionality than the personal pagerank vector. In some aspects, the one or more processors are further configured to perform the following operations, and to perform one or more of the following operations in parallel with one or more of the operations of claim 1: generate an additional personal pagerank vector for an additional node of the plurality of nodes, the additional node being different from the given node; and produce an additional embedding vector for the additional node by randomly projecting the additional personal pagerank vector, wherein the additional embedding vector has lower dimensionality than the additional personal pagerank vector. In some aspects, the one or more processors are further configured to generate the personal pagerank vector for the given node based at least in part on a precision value. In some aspects, the one or more processors are further configured to generate the personal pagerank vector for the given node based at least in part on a return probability. In some aspects, the one or more processors are further configured to generate the personal pagerank vector as a sparse vector. In some aspects, the one or more processors are further configured to produce the embedding vector for the given node by randomly projecting the personal pagerank vector based at least in part on a preselected dimensionality for the embedding vector. In some aspects, the one or more processors are further configured to produce the embedding vector for the given node by randomly projecting the personal pagerank vector based at least in part on one or more hashing functions. In some aspects, the one or more processors are further configured to update an embedding for the graph based on the embedding vector for the given node. In some aspects, the one or more processors are further configured to produce a link prediction based at least in part on the embedding vector for the given node, wherein the link prediction represents a prediction of a new link between the given node and another of the plurality of nodes. In some aspects, the one or more processors are further configured to produce a node classification based at least in part on the embedding vector for the given node, wherein the node classification represents a prediction of information to be associated with the given node based on one or more features of other nodes of the plurality of nodes that are adjacent to the given node.
[0006] In another aspect, the disclosure describes a computer-implemented method, comprising steps of: obtaining, with one or more processors of a processing system, a graph having a plurality of nodes from a database; generating, with the one or more processors, a personal pagerank vector for a given node of the plurality of nodes; and producing, with the one or more processors, an embedding vector for the given node by randomly projecting the personal pagerank vector, wherein the embedding vector has lower dimensionality than the personal pagerank vector. In some aspects, the method further comprises the following steps, one or more of which are performed in parallel with one or more of the steps of claim 11: generating, with the one or more processors, an additional personal pagerank vector for an additional node of the plurality of nodes, the additional node being different from the given node; and producing, with the one or more processors, an additional embedding vector for the additional node by randomly projecting the additional personal pagerank vector, wherein the additional embedding vector has lower dimensionality than the additional personal pagerank vector. In some aspects, generating the personal pagerank vector for the given node is based at least in part on a precision value. In some aspects, generating the personal pagerank vector for the given node is based at least in part on a return probability. In some aspects, the personal pagerank vector is a sparse vector. In some aspects, producing the embedding vector for the given node by randomly projecting the personal pagerank vector is based at least in part on a preselected dimensionality for the embedding vector. In some aspects, producing the embedding vector for the given node by randomly projecting the personal pagerank vector is based at least in part on one or more hashing functions. In some aspects, the method further comprises updating the embedding for the graph based on the embedding vector for the given node. In some aspects, the method further comprises producing a link prediction based at least in part on the embedding vector for the given node, wherein the link prediction represents a prediction of a new link between the given node and another of the plurality of nodes. In some aspects, the method further comprises producing a node classification based at least in part on the embedding vector for the given node, wherein the node classification represents a prediction of information to be associated with the given node based on one or more features of other nodes of the plurality of nodes that are adjacent to the given node.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] FIG. 1 is a functional diagram of an example system in accordance with aspects of the disclosure.
[0008] FIG. 2 is a functional diagram of an example system in accordance with aspects of the disclosure.
[0009] FIG. 3 is a flow diagram showing an exemplary method for generating a local node embedding for a selected node v in a graph G with n total nodes, in accordance with aspects of the disclosure.
[0010] FIG. 4 is a flow diagram showing an exemplary method for generating a PPR vector for a selected node v in a graph G with n total nodes, in accordance with aspects of the disclosure.
[0011] FIG. 5 is a flow diagram showing an exemplary method for performing random projection of a PPR vector to generate a local node embedding for a selected node v, in accordance with aspects of the disclosure.
DETAILED DESCRIPTION
[0012] The present technology will now be described with respect to the following exemplary systems and methods.
Example Systems
[0013] A high-level system diagram 100 of an exemplary processing system for performing the methods described herein is shown in FIG. 1. The processing system 102 may include one or more processors 104 and memory 106 storing instructions and data. The instructions and data may include the graph, the node embeddings, and the routines described herein. Processing system 102 may be resident on a single computing device. For example, processing system 102 may be a server, personal computer, or mobile device, and the graph, node embeddings, and routines may thus be local to that single computing device. Similarly, processing system 102 may be resident on a cloud computing system or other distributed system, such that the graph, node embeddings, and routines may reside on one or more different physical computing devices.
[0014] In this regard, FIG. 2 shows an additional high-level system diagram 200 in which an exemplary processing system 202 for performing the methods described herein is shown as a set of n servers 202a-202n, each of which includes one or more processors 204 and memory 206 storing instructions 208 and data 210. In addition, in the example of FIG. 2, the processing system 202 is shown in communication with one or more networks 212, through which it may communicate with one or more other computing devices. For example, the one or more networks 212 may allow a user to interact with processing system 202 using a personal computing device 214, which is shown as a laptop computer, but may take any known form including a desktop computer, tablet, smart phone, etc. Likewise, the one or more networks 212 may allow processing system 202 to communicate with one or more remote databases such as database 216. In this regard, in some aspects of the technology, database 216 may store the graph, node embeddings, and/or routines described herein, and thus may (along with processing system 202) form a distributed processing system for practicing the methods described below.
[0015] The processing systems described herein may be implemented on any type of computing device(s), such as any type of general computing device, server, or set thereof, and may further include other components typically present in general purpose computing devices or servers. Memory 106, 206 stores information accessible by the one or more processors 104, 204, including instructions 108, 208 and data 110, 210 that may be executed or otherwise used by the processor(s) 104, 204. Memory 106, 206 may be of any non-transitory type capable of storing information accessible by the processor(s) 104, 204. For instance, memory 106, 206 may include a non-transitory medium such as a hard-drive, memory card, optical disk, solid-state, tape memory, or the like. Computing devices suitable for the roles described herein may include different combinations of the foregoing, whereby different portions of the instructions and data are stored on different types of media.
[0016] In all cases, the computing devices described herein may further include any other components normally used in connection with a computing device such as a user interface subsystem. The user interface subsystem may include one or more user inputs (e.g., a mouse, keyboard, touch screen and/or microphone) and one or more electronic displays (e.g., a monitor having a screen or any other electrical device that is operable to display information). Output devices besides an electronic display, such as speakers, lights, and vibrating, pulsing, or haptic elements, may also be included in the computing devices described herein.
[0017] The one or more processors included in each computing device may be any conventional processors, such as commercially available central processing units (“CPUs”), graphics processing units (“GPUs”), tensor processing units (“TPUs”), etc. Alternatively, the one or more processors may be a dedicated device such as an ASIC or other hardware-based processor. Each processor may have multiple cores that are able to operate in parallel. The processor(s), memory, and other elements of a single computing device may be stored within a single physical housing, or may be distributed between two or more housings. Similarly, the memory of a computing device may include a hard drive or other storage media located in a housing different from that of the processor(s), such as in an external database or networked storage device. Accordingly, references to a processor or computing device will be understood to include references to a collection of processors or computing devices or memories that may or may not operate in parallel, as well as one or more servers of a load-balanced server farm or cloud-based system.
[0018] The computing devices described herein may store instructions capable of being executed directly (such as machine code) or indirectly (such as scripts) by the processor(s). The computing devices may also store data, which may be retrieved, stored, or modified by one or more processors in accordance with the instructions. Instructions may be stored as computing device code on a computing device-readable medium. In that regard, the terms “instructions” and “programs” may be used interchangeably herein. Instructions may also be stored in object code format for direct processing by the processor(s), or in any other computing device language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance. By way of example, the programming language may be C#, C++, JAVA or another computer programming language. Similarly, any components of the instructions or programs may be implemented in a computer scripting language, such as JavaScript, PHP, ASP, or any other computer scripting language. Furthermore, any one of these components may be implemented using a combination of computer programming languages and computer scripting languages.
Example Methods
[0019] FIG. 3 depicts an exemplary method 300 showing how a processing system (e.g., processing system 102 or 202) may generate a local node embedding for a selected node v in a graph G with n total nodes, in accordance with aspects of the disclosure.
[0020] In step 302, the processing system receives as input the selected node v, a desired dimension d for the node embedding, a desired precision ε and return probability α to be used in calculating the personalized pagerank (“PPR”) vector, and random hashing functions h_{d} and h_{sgn}.
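The two hashing functions received in step 302 could be realized in many ways. The following is a minimal Python sketch, assuming a salted BLAKE2b digest as a practical stand-in for sampling from a universal hash family; the helper names `make_hashes` and `_digest` and the `seed` parameter are hypothetical and do not come from the disclosure. Here h_d maps a node identifier to a bucket index in the range 0 to d - 1, and h_sgn maps it to either -1 or +1.

```python
import hashlib


def _digest(node_id, salt: bytes) -> int:
    """Map any node identifier with a stable repr() to a 64-bit integer.

    Assumption: a salted BLAKE2b digest stands in for a function drawn
    at random from a universal hash family.
    """
    data = repr(node_id).encode("utf-8")
    raw = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
    return int.from_bytes(raw, "big")


def make_hashes(d: int, seed: int = 0):
    """Return (h_d, h_sgn) for embedding dimension d.

    The same (d, seed) pair always yields the same two functions, so
    every node of the graph can be hashed consistently (a "global" hash).
    """
    base = seed.to_bytes(8, "big")      # BLAKE2b salts hold at most 16 bytes
    salt_dim, salt_sgn = base + b"dim", base + b"sgn"

    def h_d(node_id) -> int:
        # Bucket index in 0 .. d - 1.
        return _digest(node_id, salt_dim) % d

    def h_sgn(node_id) -> int:
        # Sign in {-1, +1}, taken from the lowest bit of an independent digest.
        return 1 if _digest(node_id, salt_sgn) & 1 else -1

    return h_d, h_sgn
```

Because the functions depend only on `d` and `seed`, embeddings computed at different times or on different machines land in the same coordinate system, which is what allows a freshly computed node embedding to be compared with previously stored ones.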
[0021] Functions h_{d} and h_{sgn} are global hash functions. In the example methods of FIGS. 3 and 5, h_{d} is a function randomly sampled from a universal hash family U_{d} that returns a natural number between 0 and (d - 1), and h_{sgn} is a function randomly sampled from a universal hash family U_{-1,1} that returns either -1 or 1. However, any suitable random-projection-based hashing strategy for reducing the dimensionality of the PPR vector may be used, so long as it provides an unbiased estimator for the inner-product value calculated in step 512 of FIG. 5 (below), requires less than O(n) memory, and provides a bounded variance. For example, in some aspects of the technology, the variance of the inner product calculated in step 512 may be O(log(n^{2}/d)).
[0022] Precision ε is a value representing the error factor of the PPR approximation. This precision value ε, together with the local topology of the graph, effectively determines how large of a neighborhood surrounding node v will need to be stored in short-term memory and processed in order to estimate the PPR vector for node v. In that regard, as the PushFlow routine described in the example methods of FIGS. 3 and 4 estimates the true PPR values up to a factor of ε for each node, a smaller ε value gives a better overall approximation, at the expense of an increased number of iterations and short-term memory required. The precision value ε may be “tuned” by testing different values of ε on the dataset until suitable results are achieved, and then using that value for future PPR estimates. For example, the value ε may be tuned such that the size of the PPR approximation does not exceed some predefined memory bound, e.g., an amount of memory available to a computing device, a memory cache size of a processor of a computing device, or the like.
[0023] Return probability α is a value representing a probability of whether a given “random walk” from node v will end up returning (or “teleporting”) back to node v before reaching the end of the neighborhood (defined by precision value ε). This return probability value α, together with the local topology of the graph, effectively determines how the PPR vector will spread out from node v. The return probability α may be a measured or assumed value. For example, if graph G represents a group of webpages, return probability α could be calculated based on how often a set of actual users surfing those webpages who start from a given webpage end up back at that same webpage. However, in some aspects of the technology, the return probability α can simply be a selected value. In that regard, like the precision value ε, the return probability α may also be “tuned” by testing different values of α on the dataset until suitable results are achieved, and then using that value for future PPR estimates.
[0024] In step 304, the processing system calculates a PPR vector for node v based on graph G, node v, precision value ε, and return probability α, and stores that PPR vector to π_{v}. For the purposes of illustrating the exemplary methods of FIGS. 3-5, we will assume that π_{v} is a vector with z components [c_{1}, c_{2}, c_{3}, . . . , c_{z}]. Each component c of vector π_{v} is an index-value pair, such that c_{j} = (j, r_{j}). Node identifier j can be an integer, or any other unique, hashable identifier such as a string. Using index-value pairs for each component of π_{v} allows the PPR vector to store only nonzero elements. Thus, while a PPR vector will have n values for a graph with n total nodes, using index-value pairs allows π_{v} to store only the nonzero values, resulting in a smaller number of only z total components.
[0025] In the example of FIGS. 3 and 4, the processing system will calculate the PPR vector for node v using the Sparse Personalized PageRank routine known as PushFlow, which is described in Andersen et al., Using pagerank to locally partition a graph, Internet Mathematics 4.1 (2007), pp. 35-64. However, the present technology may utilize any routine for computing PPR that employs a heuristic that guarantees its locality, such as the PPR routines described in: Bahmani et al., Fast Incremental and Personalized PageRank, Proceedings of the VLDB Endowment, vol. 4, No. 3 (2011), pp. 173-184; Lofgren et al., Personalized PageRank to a Target Node, arXiv:1304.4658v2, April 11, 2014; or Yang et al., p-Norm Flow Diffusion for Local Graph Clustering, SIAM Workshop on Network Science 2020, available at https://ns20.cs.cornell.edu/abstracts/SIAMNS_2020_paper_12.pdf. In addition, in some aspects of the technology, an adjacency matrix representing all connections between all nodes within graph G may be used instead of a PPR vector, and that adjacency matrix may then be randomly projected (as described below). Further, in some aspects of the technology the adjacency matrix may be raised to a power and then randomly projected (again, as described below).
[0026] In step 306, the processing system performs random projection on PPR vector π_{v} based on random hashing functions h_{d} and h_{sgn}, which results in a final vector w of dimension d representing the updated local node embedding for node v. As noted above, this vector w may be used for downstream tasks specific to node v such as classifying node v, or generating link predictions for node v. In that regard, in addition to creating an updated vector for node v, the method of FIG. 3 may be repeated for one or more additional nodes adjacent to node v so as to ensure that any such classifications or node predictions for node v will also take into account any updated attributes of its adjacent nodes. Likewise, for applications in which additional updated representations are needed for other nodes elsewhere in the graph (e.g., nodes that are not adjacent to node v), the method of FIG. 3 may be repeated for each of those remote nodes.
[0027] In addition, as the methods described herein create updated representations for node v that are consistent with the representations of the other nodes in graph G, the processing system may generate updated node representations on the fly whenever a node is modified. As such, vector w may be integrated with existing node embeddings for graph G so that downstream tasks that rely upon an entire graph embedding (e.g., visualization tasks) may be performed on a fully updated graph embedding.
[0028] FIG. 4 depicts an exemplary method 400 showing how a processing system (e.g., processing system 102 or 202) may generate a PPR vector for a selected node v in a graph G with n total nodes, in accordance with aspects of the disclosure. In that regard, in some aspects of the technology, method 400 may be used to calculate the PPR vector as described above with respect to step 304 of FIG. 3.
[0029] In step 402, the processing system receives as input the selected node v, and the precision ε and return probability α to be used in calculating the PPR vector (each of which has been described above). The processing system will also have access to graph G. However, graph G need not be stored in short-term memory for the purposes of method 400, thus reducing short-term memory consumption.
[0030] In step 404, the processing system initializes residual vector r as an empty sparse vector with dimension n. In other words, residual vector r is initialized as a sparse vector with n possible components, each of which is initially empty. Again, n is a number representing the number of total nodes in graph G.
[0031] In step 406, the processing system initializes PPR vector π as an empty sparse vector with dimension n. Thus, PPR vector π is also initialized as a sparse vector with n possible components, each of which is initially empty.
[0032] In step 408, the element of residual vector r corresponding to selected node v, or r[v], is assigned an initial value of 1.
[0033] In step 410, a loop begins which will repeat steps 412-418 while there exists any node w in graph G for which that node's residual value r[w] is greater than that node's degree multiplied by the selected precision value ε. In that regard, the degree of node w, or deg(w), represents the number of nodes that node w is connected to. Thus, on the first pass, because r[v] has been initialized to 1, the condition may be satisfied with respect to node v (assuming reasonable values for ε and deg(w)), and the loop will begin as shown by the “Yes” arrow pointing to step 412.
[0034] In step 412, the processing system copies the existing value of r[w] to a temporary variable. For the purposes of illustrating example method 400, that temporary variable will be referred to as r'.
[0035] In step 414, the processing system increments the existing value of π[w] by (α * r'). This results in that incremented value being stored in the component of π associated with node w, implicitly creating an index-value pair between node w and the incremented value. For example, on the first step where π is initially empty, step 414 will result in (α * r') being stored to π[w], which will implicitly create an index-value pair within π of (w, (α * r')).
[0036] In step 416, the processing system assigns r[w] a new value according to Equation 1 below. As Equation 1 multiplies the stored value of r[w], or r', by the fraction ((1 - α)/2), this results in r[w] being reduced in value.
r[w] = ((1 - α) * r') / 2    (1)
[0037] In step 418, for each node u connected to node w, the processing system increments that node's residual value r[u] according to Equation 2 below.
r[u] = r[u] + ((1 - α) * r') / (2 * deg(w))    (2)
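Steps 404 through 418 can be sketched in Python as follows. This is an illustrative sketch rather than the disclosed implementation: the adjacency-list `graph` dictionary, the function name `push_flow`, and the work queue used to find nodes that satisfy the step 410 test are all assumptions, and the sketch assumes every node has at least one neighbor. The two update rules are taken in the form the text describes, with half of the non-teleported residual kept at w and the other half shared equally among its deg(w) neighbors.

```python
from collections import defaultdict, deque


def push_flow(graph, v, eps, alpha):
    """Approximate the sparse PPR vector of node v.

    graph: dict mapping each node to a list of its neighbors
           (assumption: undirected, no isolated nodes).
    Returns a dict {node: ppr_value} holding only nonzero entries.
    """
    r = defaultdict(float)    # step 404: sparse residual vector
    pi = defaultdict(float)   # step 406: sparse PPR vector
    r[v] = 1.0                # step 408

    # Assumption: a FIFO queue tracks candidates for the step 410 test,
    # so the whole graph never has to be scanned.
    queue, queued = deque([v]), {v}
    while queue:
        w = queue.popleft()
        queued.discard(w)
        deg_w = len(graph[w])
        if r[w] <= eps * deg_w:           # step 410: nothing to push here
            continue

        r_prime = r[w]                    # step 412
        pi[w] += alpha * r_prime          # step 414
        r[w] = (1 - alpha) * r_prime / 2  # step 416, Equation 1
        share = (1 - alpha) * r_prime / (2 * deg_w)
        for u in graph[w]:                # step 418, Equation 2
            r[u] += share

        # Re-check w and its neighbors; only their residuals changed.
        for x in [w, *graph[w]]:
            if x not in queued and r[x] > eps * len(graph[x]):
                queue.append(x)
                queued.add(x)

    return dict(pi)
```

Each push conserves mass, since α * r' + (1 - α) * r' / 2 + (1 - α) * r' / 2 = r', so the entries of π and r together always sum to 1. When the loop ends, every residual is at most ε * deg(w), which bounds the total mass missing from the returned vector by ε times the sum of all node degrees.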
[0038] In this case, as deg(w) will return the number of nodes connected to node w, Equation 2 results in the residual value of each node u being increased by an equal share of node w's original residual value. In all, node w's original residual value r' will thus be split up as follows during one pass through steps 412-418:
• (α * r') will be allocated to π[w] as described in step 414;
• [((1 - α)r')/2] will remain in r[w] as described in step 416; and
• [((1 - α)r')/2] will be split equally among each r[u] as described in step 418.
[0039] Steps 410-418 thus result in a node w with “too much” residual value (as determined by the test in step 410) having that residual value flow away from r[w], and into node w's PPR value, and the residuals of its neighboring nodes u.
[0040] After each pass through steps 410-418, the loop will return to step 410 (as shown by the arrow connecting step 418 back to step 410) for another determination of whether there are any nodes with “too much” residual value. In that regard, as a result of how residual value gets redistributed in steps 410-418, each pass has the potential to create additional nodes with “too much” residual value. Accordingly, the loop of steps 410-418 will repeat until, at step 410, the processing system determines that there are no remaining nodes with “too much” residual value. At this point, the existing form of the π vector will be the final PPR vector for node v, and the method will proceed to step 420 as shown by the “No” arrow.
[0041] The π vector produced at the conclusion of steps 410-418 will be a sparse PPR vector for node v containing only the nonzero values (and their associated index value) that were stored to π[w] in each pass through steps 410-418. Accordingly, in step 420, the processing system will return the sparse PPR vector as the final PPR vector π_{v}.
[0042] While the resulting PPR vector π_{v} may have a far lower dimensionality than it would if it were not sparse (and thus also had to store zero values for any nodes not updated in the passes through steps 410-418), even π_{v} may nevertheless have a dimensionality that is too high for it to be used for certain tasks and/or on certain hardware platforms. In that regard, the relatively high dimensionality of π_{v} may make it impractical or impossible to use as input to other models, as a large input vector increases the size (and reduces the speed) of the model that uses it. For example, a π_{v} vector with entries for 1 million nodes will require the model to have at least 1 million * k parameters, where k is the output size of the first hidden layer. A model of that size may thus become too big to fit within the memory of a given computing device. Likewise, larger models take longer to train and evaluate.
[0043] Thus, to produce a more usable local node embedding, the present technology relies upon random projection to reduce the dimensionality of π_{v}. This enables π_{v} to be converted into a low-dimensional embedding that models can learn to generalize on with only a small number of training examples.
The smaller dimensionality of the embedding also allows models to be much smaller, and requires less computing power, so that the embedding can be used on computing devices such as mobile phones, tablets, and personal computers as opposed to larger and more powerful computing devices such as enterpriselevel hardware. In addition, smaller individual node embeddings will yield a proportionally smaller graph embedding, allowing fullgraph representations to be used in situations where instantiating a full PPR matrix would simply not be feasible. [0044] FIG. 5 depicts an exemplary method 500 showing how a processing system (e.g., processing system 102 or 202) may perform random projection of a PPR vector to generate a
local node embedding for a selected node v, in accordance with aspects of the disclosure. In that regard, in some aspects of the technology, method 500 may be used to perform the random projection described above with respect to step 306 of FIG.3. [0045] In step 502, the processing system receives as input the PPR vector πv to be randomly projected, a desired dimension d for the node embedding, and the random hashing functions h_{d} and h_{sgn} (each of which has been described above). [0046] In step 504, the processing system initializes a null vector w with dimension d. In other words, w is initialized as a vector with d components, each of which is 0. [0047] In step 506, the processing system initializes a variable j with a value of 1. [0048] In step 508, a loop begins in which, for each component cj in πv, steps 510514 are performed. Again, as described above, πv is composed of the nonzero values of the PPR vector for node v, and each component cj is an indexvalue pair such that cj = (j, rj). [0049] In step 510, the processing system calculates h_{d}(j) and h_{sgn}(j) using the global hash functions described above. [0050] In step 512, the processing system uses the random natural number returned by hashing function h_{d}(j) to select a component of vector w to modify (represented herein as and
increments that selected component of vector w according to Equation 3, below.
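The per-component update of steps 508-512, repeated for every component of πv, can be sketched as below. Equation 3 is not reproduced in this text, so the sketch assumes a plain signed update in which the selected component of w is incremented by h_{sgn}(j) * rj, as in a standard hashing-based (count-sketch style) projection; any transform that the disclosure applies to rj is omitted. The SHA-256-based stand-ins for h_{d} and h_{sgn}, the zero-based component indices, and all function names are likewise hypothetical.

```python
import hashlib


def _hash_int(key, salt):
    """Deterministic integer hash of (salt, key); a stand-in for a global hash."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def h_d(j, d):
    """Assumed stand-in for h_d: maps node index j to a component in [0, d)."""
    return _hash_int(j, "dim") % d


def h_sgn(j):
    """Assumed stand-in for h_sgn: maps node index j to +1 or -1."""
    return 1 if _hash_int(j, "sgn") % 2 == 0 else -1


def project(ppr, d):
    """Project a sparse PPR vector {node index: value} down to d dimensions."""
    w = [0.0] * d                        # step 504: d components, each 0
    for j, r_j in ppr.items():           # steps 508-516: visit each component
        # Steps 510-512 under the assumed update (not the source's Equation 3).
        w[h_d(j, d)] += h_sgn(j) * r_j
    return w                             # step 518: return the embedding


# Hypothetical sparse PPR vector with three nonzero entries.
ppr = {3: 0.5, 17: 0.25, 42: 0.25}
embedding = project(ppr, d=8)
```

Because the same hash functions are used for every node, two nodes whose PPR vectors place weight on the same indices will update the same components of their embeddings. Distinct indices may collide in one component, which is the cost of the reduced dimensionality.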
[0051] In step 514, the processing system determines whether the current value of j is less than z, the number of components in the PPR vector πv. If so, the processing system will follow the “Yes” arrow to step 516. At step 516, the processing system will increment j by one, and then follow the arrow back to step 508 so that steps 510-514 may be repeated for the next component of πv.
[0052] This loop will continue to repeat for each next value of j until, at step 514, the processing system determines that j is not less than z, at which point the processing system will follow the “No” arrow to step 518. At step 518, the processing system will return vector w, which represents the updated local node embedding for node v.
[0053] Unless otherwise stated, the foregoing alternative examples are not mutually exclusive, but may be implemented in various combinations to achieve unique advantages. As these and other variations and combinations of the features discussed above can be utilized without departing from the subject matter defined by the claims, the foregoing description of exemplary systems and methods should be taken by way of illustration rather than by way of limitation of the subject matter defined by the claims. In addition, the provision of the examples described herein, as well as clauses phrased as “such as,” “including,” “comprising,” and the like, should not be interpreted as limiting the subject matter of the claims to the specific examples; rather, the examples are intended to illustrate only some of the many possible embodiments. Further, the same reference numbers in different drawings can identify the same or similar elements.
Claims
1. A processing system, comprising: a memory; and one or more processors coupled to the memory and configured to perform the following operations: obtain a graph having a plurality of nodes from a database; generate a personal pagerank vector for a given node of the plurality of nodes; and produce an embedding vector for the given node by randomly projecting the personal pagerank vector, wherein the embedding vector has lower dimensionality than the personal pagerank vector.
2. The system of claim 1, wherein the one or more processors are further configured to perform the following operations, and to perform one or more of the following operations in parallel with one or more of the operations of claim 1: generate an additional personal pagerank vector for an additional node of the plurality of nodes, the additional node being different from the given node; and produce an additional embedding vector for the additional node by randomly projecting the additional personal pagerank vector, wherein the additional embedding vector has lower dimensionality than the additional personal pagerank vector.
3. The system of any of claims 1 or 2, wherein the one or more processors are further configured to generate the personal pagerank vector for the given node based at least in part on a precision value.
4. The system of any preceding claim, wherein the one or more processors are further configured to generate the personal pagerank vector for the given node based at least in part on a return probability.
5. The system of any preceding claim, wherein the one or more processors are further configured to generate the personal pagerank vector as a sparse vector.
6. The system of any preceding claim, wherein the one or more processors are further configured to produce the embedding vector for the given node by randomly projecting the personal pagerank vector based at least in part on a preselected dimensionality for the embedding vector.
7. The system of any preceding claim, wherein the one or more processors are further configured to produce the embedding vector for the given node by randomly projecting the personal pagerank vector based at least in part on one or more hashing functions.
8. The system of any preceding claim, wherein the one or more processors are further configured to update an embedding for the graph based on the embedding vector for the given node.
9. The system of any preceding claim, wherein the one or more processors are further configured to produce a link prediction based at least in part on the embedding vector for the given node, wherein the link prediction represents a prediction of a new link between the given node and another of the plurality of nodes.
10. The system of any preceding claim, wherein the one or more processors are further configured to produce a node classification based at least in part on the embedding vector for the given node, wherein the node classification represents a prediction of information to be associated with the given node based on one or more features of other nodes of the plurality of nodes that are adjacent to the given node.
11. A computer-implemented method, comprising steps of: obtaining, with one or more processors of a processing system, a graph having a plurality of nodes from a database; generating, with the one or more processors, a personal pagerank vector for a given node of the plurality of nodes; and producing, with the one or more processors, an embedding vector for the given node by randomly projecting the personal pagerank vector, wherein the embedding vector has lower dimensionality than the personal pagerank vector.
12. The method of claim 11, further comprising the following steps, one or more of which are performed in parallel with one or more of the steps of claim 11: generating, with the one or more processors, an additional personal pagerank vector for an additional node of the plurality of nodes, the additional node being different from the given node; and producing, with the one or more processors, an additional embedding vector for the additional node by randomly projecting the additional personal pagerank vector, wherein the additional embedding vector has lower dimensionality than the additional personal pagerank vector.
13. The method of any of claims 11 or 12, wherein generating the personal pagerank vector for the given node is based at least in part on a precision value.
14. The method of any of claims 11 to 13, wherein generating the personal pagerank vector for the given node is based at least in part on a return probability.
15. The method of any of claims 11 to 14, wherein the personal pagerank vector is a sparse vector.
16. The method of any of claims 11 to 15, wherein producing the embedding vector for the given node by randomly projecting the personal pagerank vector is based at least in part on a preselected dimensionality for the embedding vector.
17. The method of any of claims 11 to 16, wherein producing the embedding vector for the given node by randomly projecting the personal pagerank vector is based at least in part on one or more hashing functions.
18. The method of any of claims 11 to 17, further comprising updating the embedding for the graph based on the embedding vector for the given node.
19. The method of any of claims 11 to 18, further comprising producing a link prediction based at least in part on the embedding vector for the given node, wherein the link prediction represents a prediction of a new link between the given node and another of the plurality of nodes.
20. The method of any of claims 11 to 19, further comprising producing a node classification based at least in part on the embedding vector for the given node, wherein the node classification represents a prediction of information to be associated with the given node based on one or more features of other nodes of the plurality of nodes that are adjacent to the given node.
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title
PCT/US2020/052461 WO2022066156A1 (en)  2020-09-24  2020-09-24  Node embedding via hash-based projection of transformed personalized pagerank
Applications Claiming Priority (1)
Application Number  Priority Date  Filing Date  Title
PCT/US2020/052461 WO2022066156A1 (en)  2020-09-24  2020-09-24  Node embedding via hash-based projection of transformed personalized pagerank
Publications (1)
Publication Number  Publication Date
WO2022066156A1 (en)  2022-03-31
Family
ID=72826991
Family Applications (1)
Application Number  Title  Priority Date  Filing Date
PCT/US2020/052461 WO2022066156A1 (en)  Node embedding via hash-based projection of transformed personalized pagerank  2020-09-24  2020-09-24
Country Status (1)
Country  Link
WO (1)  WO2022066156A1 (en)
2020-09-24  WO  PCT/US2020/052461  patent/WO2022066156A1/en  unknown
Legal Events
Date  Code  Title  Description 

121  EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 20789755 Country of ref document: EP Kind code of ref document: A1

ENP  Entry into the national phase
Ref document number: 2020789755 Country of ref document: EP Effective date: 2022-11-25