WO2017039595A1 - Graph inference calculator - Google Patents


Info

Publication number
WO2017039595A1
Authority
WO
WIPO (PCT)
Prior art keywords
edge
graph
memory
edges
nodes
Prior art date
Application number
PCT/US2015/047565
Other languages
French (fr)
Inventor
Manish Marwah
Mijung Kim
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development Lp filed Critical Hewlett Packard Enterprise Development Lp
Priority to PCT/US2015/047565 priority Critical patent/WO2017039595A1/en
Publication of WO2017039595A1 publication Critical patent/WO2017039595A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • Probabilistic graphical models combine graphs (representations of data) and probability theory.
  • Nodes of the graphical model may represent random variables, and edges (connections, links, etc.) between or connecting the nodes may represent relationships between the nodes (e.g., dependency).
  • The relationships may be specified in terms of factors, edge potential functions, messages, etc. between the nodes.
  • FIG. 1 illustrates a schematic diagram of an example data system including a graph manager with a graph inference calculator that may be implemented in accordance with an aspect of this disclosure.
  • FIG. 2 is a block diagram of an example graph inference calculator that may be used to implement the graph inference calculator of FIG. 1.
  • FIG. 3 is a flowchart representative of example machine readable instructions that may be executed to implement the graph inference calculator of FIG. 2.
  • FIG. 4 is another flowchart representative of example machine readable instructions that may be executed to implement the graph inference calculator of FIG. 2.
  • FIG. 5 is a block diagram of an example processor platform capable of executing the instructions of FIGS. 3 and/or 4 to implement the graph inference calculator of FIG. 2.
  • Examples disclosed herein involve determining a graph inference (also known as a graphical model inference) of a graph.
  • A graph inference of a graph computes/indicates conditional or marginal probability(ies) of variable(s), value(s), etc. of node(s) of the graph.
  • the marginal probabilities of the nodes correspond to probabilities that the nodes have corresponding value(s) (e.g., a random variable or any other parameter) assigned to the nodes.
  • Conditional probabilities correspond to probabilities that the nodes have corresponding value(s) (e.g., a random variable or any other parameter) assigned to the nodes, conditioned on specific values assigned to other nodes, etc.
  • Graphs are useful in representing relatively large data sets, which may be referred to as "Big Data", on a smaller scale so that the data sets may be easier to understand or so that the content of the data set is displayed in a simpler/more effective manner.
  • massive sizes of data sets cause computing problems/issues when seeking to analyze the data set as computers may lack the resources to process/analyze the data in a desirable amount of time.
  • a data set, or a graph of a data set, may be inaccessible because the size of the graph exceeds the capacity of a memory or memory system of the device.
  • multiple memory devices may be utilized to process the data (e.g., by loading portions of the graph data into both a dynamic random access memory (DRAM) and a non-volatile random access memory (NVRAM)).
  • the size of the data set or the size of the graph of the data set may cause a computing system to utilize resources for an excessive period of time, thus, causing the computing system to be unable to process or analyze other data, systems, processes, etc.
  • calculating a graph inference involves a node-centric approach of retrieving node data for a node, calculating probability of the node based on all edges connected to the node, computing the marginal or conditional probability of the node, and then updating the messages of the node.
  • node data and/or edge data may be stored in a volatile memory, such as a DRAM, or a non-volatile memory, such as an NVRAM.
  • Because an NVRAM may have a slower access rate than a DRAM, randomly accessing the NVRAM is relatively time consuming and an inefficient use of computer resources.
  • previous node-centric techniques that randomly access NVRAM may encounter computing problems/issues due to the slower access speeds of randomly accessing the nodes and subsequently randomly accessing an increased number of edges of the nodes (graphs may include relatively many more edges than nodes) that have edge data stored at various locations throughout a memory device. Accordingly, relatively large amounts of computer resources are used to retrieve the node, then retrieve messages of the node from edges from random locations of the memory storing the edge data.
  • Examples involve retrieving edge data sequentially from an NVRAM or other storage device. Sequentially retrieving data (e.g., in an access sequence from a first physical location to a last physical location of the memory) from an NVRAM allows faster access to the data versus randomly accessing the data from random locations of the NVRAM, because a controller/processor sequentially accesses locations that are located next to, or near, a previously accessed location.
  • a graph inference calculator sequentially retrieves edge data corresponding to edges from an NVRAM and iteratively processes the edge data to update the node data in a DRAM, or a memory with a faster access rate than the memory storing the edges.
  • examples herein provide for increased graph inference processing/calculation speeds and efficiency.
  • An example method includes retrieving edge data from a first memory of a device, the edge data corresponding to edges of a graph and the graph representative of data of a database, the graph comprising a system of nodes connected by the edges.
  • a graph inference calculator calculates respective probabilities of nodes based on the corresponding edge data of each edge, the nodes being identified in the edge data of the respective edge, and updates node data of the respective nodes to include the respective probabilities, the node data stored in a second memory of the device. Furthermore, messages of the edge data for the edges of the graph are updated based on the calculated probabilities.
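As a concrete illustration of the method just described, the sketch below (Python; the function names, two-state beliefs, and dictionary layout are hypothetical, not taken from this disclosure) performs one edge-centric pass: edges are scanned in storage order, each edge's messages are folded into the beliefs of the two nodes it links, and a second scan writes refreshed messages back to the edges.

```python
def normalize(v):
    """Rescale a list of non-negative weights into a probability distribution."""
    s = sum(v)
    return [x / s for x in v]

def edge_centric_pass(edges, beliefs):
    """One pass of the edge-centric approach (illustrative sketch).

    edges   -- list of dicts, each naming two nodes and holding one message
               per direction (stands in for edge data in the slower first memory)
    beliefs -- dict of per-node probability lists (stands in for node data
               in the faster second memory)
    """
    # Scan edges sequentially, updating both endpoint nodes per edge.
    for e in edges:
        a, b = e["nodes"]
        beliefs[a] = normalize([p * m for p, m in zip(beliefs[a], e["msg_to_a"])])
        beliefs[b] = normalize([p * m for p, m in zip(beliefs[b], e["msg_to_b"])])
    # Second sequential scan: refresh each edge's messages from the
    # updated beliefs of the nodes it links.
    for e in edges:
        a, b = e["nodes"]
        e["msg_to_b"] = list(beliefs[a])
        e["msg_to_a"] = list(beliefs[b])
    return beliefs
```

Repeating such passes until the messages stop changing corresponds to the convergence determination described below.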
  • FIG. 1 is a block diagram of a data system 100 including a graph manager 110 with a graph inference calculator 112 constructed in accordance with the teachings of this disclosure.
  • the example data system 100 of FIG. 1 includes the graph manager 110 and graph inference calculator 112, a storage database 120, and a memory system 130.
  • the example memory system 130 includes a first memory 132 and a second memory 134.
  • the graph manager 110 and graph inference calculator 112 analyze data (e.g., data in the form of graphs) from the storage database 120 utilizing the memory system 130.
  • the example storage database 120 may be any database in communication with the memory system 130 and the graph manager 1 10.
  • the storage database 120 may be a storage database of a computer, a server, a network, etc.
  • the storage database 120 stores data and/or corresponding graphs of the data
  • the storage database 120 may be a non-volatile disk storage device, a flash storage device, etc.
  • the storage database 120 may include at least one of the first memory 132 or the second memory 134.
  • the first memory 132 may be a non-volatile random access memory (NVRAM) and the second memory 134 may be a dynamic random access memory (DRAM).
  • the first memory 132 may be accessible by the graph manager 110 at a relatively slower rate than the second memory 134. Accordingly, the graph manager 110 may access the second memory 134 at a faster rate than the first memory 132 (or vice versa).
  • the first memory 132 may have a larger capacity than the second memory 134 (e.g., because the second memory 134 is faster and thus may be more expensive at larger capacities). In some examples, the memory system 130 may be considered an intermediate memory within the architecture of the data system 100. Accordingly, the graph manager 110 may load data or graphs from the storage database 120 into the memory system 130 for processing or analysis (e.g., because the memory system 130 may be accessed more easily, more efficiently, more quickly, etc. than the storage database 120).
  • the graph inference calculator 112 may utilize both the first memory 132 and the second memory 134 (rather than one of the first memory or the second memory) to analyze a graph because data of the graph may not fit within the first memory 132 or the second memory 134 alone (i.e., the data size of the graph is too large for the capacity of the first memory 132 alone or the second memory 134 alone). More specifically, in some examples, the graph manager 110 may load first objects (e.g., edges) of a graph into the first memory 132 and second objects (e.g., nodes) of the same graph into the second memory 134.
  • the graph manager 110 may load edge data of edges of a graph into an NVRAM and node data of nodes of the graph into a DRAM of the memory system 130 to determine a graph inference of the graph.
  • the graph manager 110 of FIG. 1 includes a graph inference calculator 112 to calculate/determine a graph inference of graphs in the storage database 120 in accordance with the teachings of this disclosure.
  • FIG. 2 is a block diagram of an example graph inference calculator 112 that may be used to implement the graph inference calculator 112 of FIG. 1 in accordance with the teachings of this disclosure.
  • the example graph inference calculator 112 of FIG. 2 includes an edge retriever 210, a node probability calculator 220, a node probability updater 230, an edge message manager 240, and a convergence analyzer 250.
  • the example graph inference calculator 112 facilitates calculating a graph inference of a graph (e.g., a graph from the storage database 120 of FIG. 1) using an edge-centric approach.
  • the edge retriever 210 may sequentially access edge data from the first memory 132 (e.g., an NVRAM), the node probability calculator 220 may calculate a probability of the nodes (e.g., a node belief) based on the edge data, and the node probability updater 230 may update the node data based on the calculated probability in the second memory 134.
  • the edge message manager 240 may update messages of the edge data based on the calculated probabilities of the nodes.
  • the example convergence analyzer 250 determines whether messages of all edges or a threshold percentage of the edges (e.g., a subset of the edges) have been updated to determine whether the graph inference calculation for the graph is complete (i.e., the messages of all edges or a percentage of the edges and the probabilities of all nodes or a percentage of the nodes have been updated accordingly).
  • the example edge retriever 210 retrieves edge data for an edge (or edges) from the first memory 132 and buffers (e.g., temporarily stores for processing) the retrieved edge data for the graph inference calculator 112. In examples herein, the edge retriever 210 sequentially retrieves the edge data of edges from the first memory 132.
  • the edge retriever 210 iteratively retrieves edge data for the edges by retrieving edge data for a first edge and providing the edge data to the node probability calculator 220 for probability calculation of nodes linked by the first edge, then subsequently retrieving edge data of a second edge from a sequential location of the first memory 132, and so on, until edge data for all edges (or at least a threshold percentage of the edges) has been retrieved for processing by the graph inference calculator 112.
  • the edge retriever 210 may sequentially retrieve edge data in order of a physical address or physical location of the first memory 132 (e.g., from one physical end of the first memory 132 to an opposite physical end of the first memory 132).
  • the edge retriever 210 may determine that edge data for all (or a threshold percentage/subset) of the edges has been retrieved when the edge retriever 210 retrieves edge data from a last location (e.g., a last physical location, a last address location) of an access sequence. Sequentially retrieving the edge data from the first memory 132 (e.g., by physical address) may allow the edge retriever 210 to retrieve the edge data at a relatively faster rate than randomly retrieving edge data from any location of the first memory 132, because the edge retriever 210 previously accessed a location near a present access location and therefore may be pointed toward that current access location (rather than needing to process a location/address farther away from a previous location).
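The physical-order access pattern can be pictured with fixed-size edge records packed back to back; the sketch below (Python; the record format is purely illustrative, not from this disclosure) reads them from the first offset to the last instead of seeking to arbitrary offsets.

```python
import struct

# Hypothetical fixed-size edge record: two 32-bit node ids followed by one
# float32 message per direction, packed contiguously the way edge data might
# be laid out in a slower, larger memory.
EDGE_FMT = "<iiff"
EDGE_SIZE = struct.calcsize(EDGE_FMT)

def pack_edges(edges):
    """Serialize (node_a, node_b, msg_ab, msg_ba) tuples back to back."""
    return b"".join(struct.pack(EDGE_FMT, *e) for e in edges)

def scan_edges(buf):
    """Yield edge records in physical-address order -- a sequential scan
    from the first location of the buffer to the last."""
    for offset in range(0, len(buf), EDGE_SIZE):
        yield struct.unpack_from(EDGE_FMT, buf, offset)
```

Reaching the end of the buffer plays the role of retrieving edge data from the last location of the access sequence.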
  • In some examples, a threshold percentage of the edges (e.g., 90%, 95%, etc.) or a selectable number of the edges may be specified (e.g., via default settings, a user input, etc.). Accordingly, a subset of the edges of the graph may be retrieved by the edge retriever 210 for processing.
  • the example edge data may include node information that identifies the nodes linked by the edge. The example edge retriever 210 and node manager 220 of FIG. 2 retrieve edge data and node data, respectively, from the memory system 130.
  • the edge retriever 210 may sequentially retrieve edge data corresponding to edges of the graph from a non-volatile memory (e.g., the first memory 132), and the node manager 220 may retrieve node data from a DRAM of the memory system 130 (e.g., the second memory 134).
  • the example node manager 220 may retrieve node data corresponding to a node or nodes identified in the edge data (e.g., nodes linked by the edge).
  • the node probability calculator 220 receives edge data for edges from the edge retriever 210.
  • the node probability calculator 220 identifies nodes in the edge data (e.g., a pair of nodes linked by the edge).
  • the node probability calculator 220 calculates a marginal or conditional probability of the nodes (or of a value associated with the node) based on messages in the edge data.
  • the marginal probability of a node refers to a probability that a random variable, parameter, etc. of the node may correspond to a designated value or result if or when the node is processed/executed.
  • the messages of the edge data may be messages sent between the nodes of the edge data indicating dependency or relationships of the nodes.
  • a conditional probability refers to a probability that the node has a corresponding value (e.g., a random variable or any other parameter) assigned to the nodes, conditioned on specific values assigned to other nodes, etc.
  • the messages may indicate previously calculated probabilities of the nodes, expected probabilities of the nodes, or factors contributing to probabilities of the nodes based on the relationship and current probabilities of those nodes. Accordingly, edge data for each edge may include at least two messages that are sent between the pair of nodes.
  • edge data for edge E may at least: identify Nodes A and B; include a message from Node A to Node B indicating characteristics of Node A (e.g., a probability, message data from other edges linked to Node A, a relationship between Node A and Node B, etc.); and include a message from Node B to Node A (e.g., a message indicating a probability, message data from other edges linked to Node B, a relationship between Node B and Node A, etc.).
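One possible concrete shape for such an edge data entry (the field names, types, and convergence flag are illustrative, not from this disclosure) is:

```python
from dataclasses import dataclass, field

@dataclass
class EdgeRecord:
    """Edge data for a single edge: the pair of nodes it links, one
    message per direction, and a convergence flag (illustrative)."""
    node_a: str
    node_b: str
    # Message from Node A toward Node B (characteristics of Node A).
    msg_a_to_b: list = field(default_factory=list)
    # Message from Node B toward Node A (characteristics of Node B).
    msg_b_to_a: list = field(default_factory=list)
    converged: bool = False  # set once old and new messages match

# Edge E identifies Nodes A and B and carries a message in each direction.
edge_e = EdgeRecord("A", "B", msg_a_to_b=[0.3, 0.7], msg_b_to_a=[0.6, 0.4])
```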
  • the node probability calculator 220 identifies the nodes of an edge (e.g., Nodes A, B) from the edge data, and determines marginal or conditional probabilities of the nodes based on messages or other data within the edge data and the Nodes A, B.
  • the marginal or conditional probabilities may be partial probabilities of the nodes until convergence of the graph inference calculation.
  • the example probability calculator 220 may determine/calculate the probabilities of the nodes linked by an edge using any suitable techniques (e.g., belief propagation, Gibbs sampling, etc.).
  • Node A and/or Node B of the edge E may have already been processed for another edge X (e.g., a different edge linked to Node A or Node B was previously retrieved/processed).
  • the node probability calculator 220 may calculate the probability based on the previously updated probability of Node A or Node B in processing the edge X. Accordingly, in examples herein, the node probability calculator 220 calculates an incremental probability for each of the nodes of the graph as each edge is retrieved and processed by the graph inference calculator 112.
  • the probabilities of Nodes A, B may be incrementally updated multiple times based on the number of edges respectively linked to Nodes A, B.
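The incremental updating described above can be shown with a node linked by two edges; its belief is refined once per edge as each edge is retrieved (a binary variable with hypothetical message values, purely for illustration):

```python
def normalize(v):
    """Rescale non-negative weights into a probability distribution."""
    s = sum(v)
    return [x / s for x in v]

def fold_in(belief, incoming_msg):
    """Fold one edge's incoming message into a node's current belief."""
    return normalize([p * m for p, m in zip(belief, incoming_msg)])

# Node A starts with a uniform belief and is linked by two edges (E and X);
# its probability is incrementally updated once per processed edge.
belief_a = [0.5, 0.5]
for msg in ([0.8, 0.2], [0.6, 0.4]):
    belief_a = fold_in(belief_a, msg)
```

After both edges are processed, the belief equals what a single combined update over all of Node A's edges would have produced, which is why processing edges one at a time yields the same result.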
  • the example node probability updater 230 updates node data for nodes of edges based on the probabilities calculated by the node probability calculator 220.
  • the node probability updater 230 may update a probability of the node (e.g., a value of a variable, parameter, etc. represented by the node) to reflect probability calculated based on the edge messages in edge data retrieved by the edge retriever 210.
  • the example node data may have been loaded into the second memory 134 (e.g., a DRAM) that has a relatively fast access rate (e.g., compared to the first memory 132) by the graph manager 110.
  • the node probability updater 230 may quickly access the node data from the second memory 134 and update the node probability based on the calculations of the node probability calculator 220. Accordingly, referring to the example above, for edge E, the node probability updater 230 may update node data of Nodes A, B in the second memory 134 to reflect an updated probability calculated from edge data (e.g., messages) of the edge E. After the node probability updater 230 updates the node probabilities of the respective nodes of an edge, the node probability updater 230 may notify the edge retriever 210, such that the edge retriever 210 may retrieve or provide edge data for another edge (e.g., an edge stored physically nearest to edge E in the first memory 132).
  • the edge retriever 210, node probability calculator 220, and node probability updater 230 may iteratively process edges by retrieving the edges from the first memory 132, calculating probabilities of nodes of the respective edges, and updating the probabilities of the respective nodes in the second memory 134 until all edges (or a threshold percentage (e.g., 90%, 95%, 99%, etc.) of the edges or a subset of the edges) in the first memory 132 are processed. Once the edges of the first memory 132 have been processed, the edge message manager 240 updates the messages of the edges in the first memory 132 based on the updated calculated probabilities of the nodes stored in the second memory 134.
  • the edge message manager 240 may sequentially retrieve the edge data from the first memory 132, generate new edge messages based on updated probabilities of the nodes identified in the edge data and the old messages of the edge data, and store the new messages in the edge data.
  • the edge message manager 240 may compare the old messages to the new messages to determine whether the messages match or match to within a threshold percentage (e.g., using any suitable technique, such as calculating a hash of the messages).
  • if the old and new messages match (or match to within the threshold), the edge message manager 240 may indicate (e.g., using a flag, pointer, binary identifier (0, 1), etc.) that the edge (or its messages) has converged, and if the old and new edge messages do not match, the edge message manager 240 may indicate that the edge messages of that edge have not converged.
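A minimal version of that comparison (the tolerance value, dictionary keys, and flag name are illustrative, not from this disclosure) might be:

```python
def messages_match(old_msgs, new_msgs, tol=1e-6):
    """True when every component of every message matches within tol."""
    return all(abs(o - n) <= tol
               for old, new in zip(old_msgs, new_msgs)
               for o, n in zip(old, new))

def store_new_messages(edge, new_a_to_b, new_b_to_a):
    """Compare old and new messages, set the per-edge convergence flag,
    then store the new messages in the edge data."""
    edge["converged"] = messages_match(
        [edge["msg_a_to_b"], edge["msg_b_to_a"]],
        [new_a_to_b, new_b_to_a])
    edge["msg_a_to_b"] = new_a_to_b
    edge["msg_b_to_a"] = new_b_to_a
```

A hash comparison, as the text also suggests, trades the tolerance check for exact equality of the stored message bytes.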
  • the example edge message manager 240 may iteratively update the edge messages of the edges stored in the first memory 132 based on the updated probabilities of the nodes stored in the second memory 134 until all messages of all of the edges (or a threshold percentage of the edges) in the first memory 132 have been updated.
  • the example edge message manager 240 may notify the convergence analyzer 250 after all of the messages (or a threshold percentage of the edges) in the first memory 132 have been updated.
  • the example convergence analyzer 250 checks that all messages or a threshold percentage of the messages have been updated in the edge data entries of the first memory 132.
  • the convergence analyzer 250 determines whether edge data for each of the edges (or for a threshold percentage of the edges) includes an identifier (e.g., a flag, a pointer, a binary identifier (e.g., 0 or 1), etc.) indicating that the messages have converged. If all messages have converged, the convergence analyzer 250 may identify convergence of the graph, and therefore the graph inference calculator 112 has completed calculating the graph inference of the graph. In such examples, the graph manager 110 may provide or store the updated graph inference (e.g., the updated edge data and node data from the first memory 132 and the second memory 134) in the storage database 120 of FIG. 1.
  • If the messages have not converged, the convergence analyzer 250 may notify the edge retriever 210 to begin retrieving edge data for processing of the edges (i.e., processing of the edges starts from the beginning).
  • edges stored in the first memory 132 may be iteratively processed by the edge retriever 210, the node probability calculator 220, and the node probability updater 230, and messages of the edge data may then be iteratively updated by the edge message manager 240 and analyzed for convergence by the convergence analyzer 250, until all messages (or a threshold percentage of the messages) have converged in order to complete calculating the graph inference of the graph.
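The overall control flow just described (process edges, update messages, test for convergence, repeat) can be sketched as a driver loop; all names are illustrative stand-ins for the components described above, and the round limit is an assumption, not part of this disclosure:

```python
def calculate_graph_inference(edges, beliefs, process_edges, update_messages,
                              max_rounds=100):
    """Illustrative driver: alternate edge processing and message updates
    until every edge's messages have converged (or a round limit is hit)."""
    for _ in range(max_rounds):
        process_edges(edges, beliefs)    # edge retriever + probability updater
        update_messages(edges, beliefs)  # edge message manager sets flags
        if all(e["converged"] for e in edges):
            return True                  # convergence analyzer: inference done
    return False                         # stopped without convergence
```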
  • While an example manner of implementing the graph inference calculator 112 of FIG. 1 is illustrated in FIG. 2, at least one of the elements, processes and/or devices illustrated in FIG. 2 may be combined, divided, rearranged, omitted, eliminated and/or implemented in any other way. Further, the edge retriever 210, the node probability calculator 220, the node probability updater 230, the edge message manager 240, the convergence analyzer 250, and/or, more generally, the example graph inference calculator 112 of FIG. 2 may be implemented by hardware and/or any combination of hardware and executable instructions (e.g., software and/or firmware).
  • any of the edge retriever 210, the node probability calculator 220, the node probability updater 230, the edge message manager 240, the convergence analyzer 250, and/or, more generally, the example graph inference calculator 112 may be implemented by at least one of an analog or digital circuit, a logic circuit, a programmable processor, an application specific integrated circuit (ASIC), a programmable logic device (PLD) and/or a field programmable logic device (FPLD).
  • At least one of the edge retriever 210, the node probability calculator 220, the node probability updater 230, the edge message manager 240, and/or the convergence analyzer 250 is/are hereby expressly defined to include a tangible machine readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc. storing the executable instructions.
  • example graph inference calculator 112 of FIG. 2 may include at least one element, process, and/or device in addition to, or instead of, those illustrated in FIG. 2, and/or may include more than one of any or all of the illustrated elements, processes and devices.
  • Flowcharts representative of example machine readable instructions for implementing the graph inference calculator 112 of FIG. 2 are shown in FIGS. 3 and/or 4. In examples herein, the machine readable instructions comprise a program(s)/process(es) for execution by a processor or processors, such as the processor 512 shown in the example processor platform 500 discussed below in connection with FIG. 5.
  • the program/process may be embodied in executable instructions (e.g., software) stored on a tangible machine readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a digital versatile disk (DVD), a Blu-ray disk, or a memory associated with the processor 512, but the entire program/process and/or parts thereof could alternatively be executed by a device other than the processor 512 and/or embodied in firmware or dedicated hardware.
  • the example process 300 of FIG. 3 begins with an initiation of the graph inference calculator 112 (e.g., upon startup, upon instructions from a user, upon startup of a device implementing the graph inference calculator 112 (e.g., the graph manager 110, a processor, etc.), etc.).
  • the example process 300 of FIG. 3 may be executed to process edges to determine a graph inference of a graph.
  • At block 310, the edge retriever 210 retrieves edge data for edges of a graph from the first memory device 132 (e.g., an NVRAM).
  • the edge retriever 210 may sequentially retrieve the edge data by retrieving edge data for a first edge in a first execution of block 310 and retrieving edge data for a second edge in a second execution of the block 310.
  • the edge data for the first edge and the edge data for the second edge may be stored physically next to or near one another, such that the edge retriever 210 retrieves edge data at block 310 in a sequential manner (e.g., based on a physical address/location order of the first memory 132).
  • At block 320, the node probability calculator 220 calculates (e.g., incrementally calculates based on previous node probability calculations and current node probabilities) a probability of the nodes based on the edge data of each edge. In other words, the node probability calculator 220, at block 320, calculates a node probability for the nodes connected by an edge of the edge data.
  • the node probability updater 230 updates (e.g., incrementally updates) node data of the respective nodes of the edges to include the calculated probabilities.
  • the example node data may be updated in the second memory device 134 (e.g., a DRAM).
  • At block 340, the edge message manager 240 updates messages of the edges based on the probabilities of the nodes to facilitate providing a graph inference of the graph. In examples herein, at block 340, the edge message manager 240 may iteratively or sequentially access messages of the edges from the first memory device 132 until messages for all edges (or a threshold percentage of the edges) are updated (e.g., the sequence of edges in the first memory 132 is completed). After block 340, the example process 300 ends.
  • the example process 400 of FIG. 4 begins with an initiation of the graph inference calculator 112.
  • the process 400 of FIG. 4 may be executed to calculate the graph inference by iteratively retrieving edges, iteratively updating messages, and iteratively determining convergence (e.g., completion) of the graph inference calculation.
  • At block 410, the edge retriever 210 retrieves a next edge from the first memory 132 (e.g., an NVRAM).
  • the next edge may be a first edge with edge data stored in a first location of a sequence of physical storage locations of the first memory 132 or the next edge may be a subsequent edge with edge data stored at a subsequent physical location (e.g., next to or near the previous edge) of the first memory 132.
  • the node probability calculator 220 calculates a probability of the nodes of the edge (i.e., the nodes linked by the edge) based on the edge data retrieved at block 410.
  • the node probability updater 230 logs the updated probability in the second memory 134.
  • the edge retriever 210 determines whether all (or a threshold percentage) of the edges have been retrieved (e.g., if the processed edge was the last edge of the sequence of edges in the first memory 132). If all (or a threshold percentage) of the edges have not been retrieved, control returns to block 410.
  • the edge message manager 240 updates messages of a next edge in the first memory based on updated probabilities of the nodes of the graph (block 450). For example, similar to the edge retriever 210 retrieving edges in block 410, the edge message manager 240, at block 450, may sequentially update messages of the edges based on the updated node probabilities. In some examples at block 450, the edge message manager 240 may indicate convergence of messages of an edge by comparing the new edge messages to old edge messages of the edge. At block 460, the edge message manager 240 determines whether all (or a threshold percentage) of the edge messages have been updated.
  • the convergence analyzer 250 determines whether the graph inference has converged (block 470), indicating that all messages reflect all updates to the probabilities of the nodes (i.e., all old messages match (or match within a threshold percentage) all new messages of all or a threshold percentage of the edges). For example, at block 470, the convergence analyzer 250 may check convergence indicators or identifiers (e.g., flags, pointers, binary identifiers, etc.) stored in edge data of the edges.
  • the example processes of FIGS. 3 and/or 4 may be implemented using coded instructions (e.g., computer and/or machine readable instructions) stored on a tangible machine readable storage medium such as a hard disk drive, a flash memory, a read-only memory (ROM), a compact disk (CD), a digital versatile disk (DVD), a cache, a random-access memory (RAM) and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information).
  • As used herein, the term "tangible machine readable storage medium" is expressly defined to include any type of machine readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media.
  • As used herein, "tangible computer readable storage medium" and "tangible machine readable storage medium" are used interchangeably. Additionally or alternatively, the example processes of FIGS. 3 and/or 4 may be implemented using coded instructions (e.g., computer and/or machine readable instructions) stored on a non-transitory computer and/or machine readable medium such as a hard disk drive, a flash memory, a read-only memory, a compact disk, a digital versatile disk, a cache, a random-access memory and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information).
  • FIG. 5 is a block diagram of an example processor platform 500 capable of executing the instructions of FIGS. 3 and/or 4 to implement the graph inference calculator 112 of FIG. 2.
  • the example processor platform 500 may be or may be included in any type of apparatus, such as a server, a personal computer, a mobile device (e.g., a cell phone, a smart phone, a tablet, etc.), a personal digital assistant (PDA), an Internet appliance, or any other type of computing device.
  • the processor platform 500 of the illustrated example of FIG. 5 includes a processor 512.
  • the processor 512 of the illustrated example is hardware.
  • the processor 512 can be implemented by at least one integrated circuit, logic circuit, microprocessor or controller from any desired family or manufacturer.
  • the processor 512 of the illustrated example includes a local memory 513 (e.g., a cache).
  • the processor 512 of the illustrated example is in communication with a main memory including a volatile memory 514 and a non-volatile memory 516 via a bus 518.
  • the volatile memory 514 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM) and/or any other type of random access memory device.
  • the non-volatile memory 516 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 514, 516 is controlled by a memory controller.
  • the processor platform 500 of the illustrated example also includes an interface circuit 520.
  • the interface circuit 520 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), and/or a peripheral component interconnect (PCI) express interface.
  • At least one input device 522 is connected to the interface circuit 520.
  • the input device(s) 522 permit(s) a user to enter data and commands into the processor 512.
  • the input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.
  • At least one output device 524 is also connected to the interface circuit 520 of the illustrated example.
  • the output device(s) 524 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display, a cathode ray tube (CRT) display, a touchscreen, a tactile output device, a printer and/or speakers).
  • the interface circuit 520 of the illustrated example, thus, may include a graphics driver card, a graphics driver chip or a graphics driver processor.
  • the interface circuit 520 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem and/or network interface card to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 526 (e.g., an Ethernet connection, a digital subscriber line (DSL), a telephone line, coaxial cable, a cellular telephone system, etc.).
  • the processor platform 500 of the illustrated example also includes at least one mass storage device 528 for storing executable instructions (e.g., software) and/or data.
  • examples of such mass storage device(s) 528 include floppy disk drives, hard disk drives, compact disk drives, Blu-ray disk drives, RAID systems, and digital versatile disk (DVD) drives.
  • the coded instructions 532 of FIGS. 3 and/or 4 may be stored in the mass storage device 528, in the local memory 513 in the volatile memory 514, in the non-volatile memory 516, and/or on a removable tangible machine readable storage medium such as a CD or DVD.
  • the above disclosed methods, apparatus and articles of manufacture provide a graph inference calculator that utilizes an edge centric approach to determine a graph inference of a graph.
  • the examples herein enhance speed, computer resource utilization, efficiency, etc., by sequentially retrieving edge data from a first memory (e.g., a NVRAM), and accessing node data in a second memory (e.g., a DRAM).
  • the examples herein may enable improved utilization of computer resources by quickly and efficiently calculating a graph inference of a graph relative to prior techniques.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Examples herein involve retrieving edge data from a first memory of a device, the edge data corresponding to edges of a graph and the graph representative of data of a database, the graph comprising a system of nodes connected by the edges, calculating respective probabilities of nodes based on corresponding edge data of the edge, the nodes identified in the edge data of the respective edge, updating node data of the respective nodes to include the respective probabilities, the node data stored in a second memory of the device, and updating messages of the edge data for the edges of the graph based on the calculated probabilities, the messages representative of probabilities of the nodes connected by the respective edge, the messages and the probabilities to facilitate providing a graph inference of the graph.

Description

GRAPH INFERENCE CALCULATOR
BACKGROUND
[0001] Probabilistic graphical models combine graphs (representations of data) and probability theory. In a graphical model, nodes of the graphical model may represent random variables and edges (connections, links, etc.) between or connecting the nodes represent relationships between the nodes (e.g., dependency). The relationships may be specified in terms of factors, edge potential functions, messages, etc. between the nodes.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] FIG. 1 illustrates a schematic diagram of an example data system including a graph manager with a graph inference calculator that may be implemented in accordance with an aspect of this disclosure.
[0003] FIG. 2 is a block diagram of an example graph inference calculator that may be used to implement the graph inference calculator of FIG. 1.
[0004] FIG. 3 is a flowchart representative of example machine readable instructions that may be executed to implement the graph inference calculator of FIG. 2.
[0005] FIG. 4 is another flowchart representative of example machine readable instructions that may be executed to implement the graph inference calculator of FIG. 2.
[0006] FIG. 5 is a block diagram of an example processor platform capable of executing the instructions of FIGS. 3 and/or 4 to implement the graph inference calculator of FIG. 2.
[0007] Wherever possible, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts.
DETAILED DESCRIPTION
[0008] Examples disclosed herein involve determining a graph inference (also known as a graphical model inference) of a graph. A graph inference of a graph computes/indicates conditional or marginal probability(ies) of variable(s), value(s), etc. of node(s) of the graph. The marginal probabilities of the nodes correspond to probabilities that the nodes have corresponding value(s) (e.g., a random variable or any other parameter) assigned to the nodes. Conditional probabilities correspond to probabilities that the nodes have corresponding value(s) (e.g., a random variable or any other parameter) assigned to the nodes, conditioned on specific values assigned to other nodes, etc.
[0009] Graphs are useful in representing relatively large data sets, which may be referred to as "Big Data", on a smaller scale so that the data sets may be easier to understand or so that the content of the data set is displayed in a simpler/more effective manner. However, massive sizes of data sets cause computing problems/issues when seeking to analyze the data set as computers may lack the resources to process/analyze the data in a desirable amount of time. For example, a data set, or a graph of a data set, may be inaccessible because the size of the graph exceeds the capacity of a memory or memory system of the device. Accordingly, multiple memory devices may be utilized to process the data (e.g., by loading portions of the graph data into both a dynamic random access memory (DRAM) and a non-volatile random access memory (NVRAM)). Furthermore, the size of the data set or the size of the graph of the data set may cause a computing system to utilize resources for an excessive period of time, thus, causing the computing system to be unable to process or analyze other data, systems, processes, etc.
[0010] In many systems, calculating a graph inference involves a node-centric approach of retrieving node data for a node, calculating probability of the node based on all edges connected to the node, computing the marginal or conditional probability of the node, and then updating the messages of the node. In some instances, node data and/or edge data may be stored in a volatile memory, such as a DRAM, or a non-volatile memory, such as a NVRAM. Considering that NVRAM may have a slower access rate than a DRAM, randomly accessing the NVRAM is relatively time consuming and an inefficient use of computer resources. Accordingly, previous node-centric techniques that randomly access NVRAM may encounter computing problems/issues due to the slower access speeds of randomly accessing the nodes and subsequently randomly accessing an increased number of edges of the nodes (graphs may include relatively many more edges than nodes) that have edge data stored at various locations throughout a memory device. Accordingly, relatively large amounts of computer resources are used to retrieve the node, then retrieve messages of the node from edges at random locations of the memory storing the edge data.
[0011] In examples herein, an edge centric approach is employed.
Examples involve retrieving edge data sequentially from a NVRAM or other storage device. Sequentially retrieving data (e.g., in an access sequence from a first physical location to a last physical location of the memory) from a NVRAM allows faster access to the data versus randomly accessing the data from random locations of the NVRAM because a controller/processor sequentially accesses locations that are located next to, or near, a previously accessed location. More specifically, in examples herein, a graph inference calculator sequentially retrieves edge data corresponding to edges from a NVRAM and iteratively processes the edge data to update the node data in a DRAM, or a memory with a faster access rate than the memory storing the edges. In examples herein, the edge data from a first memory, such as an NVRAM, is processed by identifying the nodes and randomly accessing node data of the nodes from a second memory, such as a DRAM, that has a faster access rate than the first memory. Accordingly, examples herein provide for increased graph inference processing/calculation speeds and efficiency.
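The two-memory access pattern described above can be sketched in a few lines. This is an illustrative sketch only, not the patented implementation: a Python list stands in for the sequentially scanned first memory (e.g., NVRAM) holding edge records, and a dict stands in for the randomly accessed second memory (e.g., DRAM) holding node data.

```python
# Illustrative sketch of the edge-centric access pattern: edges are read
# strictly in order from the slow memory, while node data is touched at
# random, but only in the fast memory.

# Stand-in for the first memory (e.g., NVRAM): edge records laid out
# contiguously so they can be scanned in address order.
SLOW_EDGES = [("a", "b"), ("b", "c"), ("a", "c")]

# Stand-in for the second memory (e.g., DRAM): per-node data, accessed
# at random as each edge names its endpoints.
fast_nodes = {"a": 0, "b": 0, "c": 0}

# One sequential sweep over the edges.
for u, v in SLOW_EDGES:      # sequential reads from the slow memory
    fast_nodes[u] += 1       # random accesses land only in the fast memory
    fast_nodes[v] += 1       # (here each node simply accumulates its degree)
```

In a real graph inference the per-node update would compute a probability rather than a degree count; the point of the sketch is that the slow memory is never accessed randomly.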
[0012] An example method includes retrieving edge data from a first memory of a device, the edge data corresponding to edges of a graph and the graph representative of data of a database, the graph comprising a system of nodes connected by the edges. In examples herein, for each edge of the graph, iteratively, a graph inference calculator calculates respective probabilities of nodes based on corresponding edge data of the edge, the nodes identified in the edge data of the respective edge and updates node data of the respective nodes to include the respective probabilities, the node data stored in a second memory of the device. Furthermore, messages of the edge data for the edges of the graph are updated based on the calculated probabilities.
[0013] FIG. 1 is a block diagram of a data system 100 including a graph manager 110 with a graph inference calculator 112 constructed in accordance with the teachings of this disclosure. The example data system 100 of FIG. 1 includes the graph manager 110 and graph inference calculator 112, a storage database 120, and a memory system 130. The example memory system 130 includes a first memory 132 and a second memory 134. In examples herein, the graph manager 110 and graph inference calculator 112 analyze data (e.g., data in the form of graphs) from the storage database 120 utilizing the memory system 130.
[0014] The example storage database 120 may be any database in communication with the memory system 130 and the graph manager 110. For example, the storage database 120 may be a storage database of a computer, a server, a network, etc. In examples herein, the storage database 120 stores data and/or corresponding graphs of the data. In some examples, the storage database 120 may be a non-volatile disk storage device, a flash storage device, etc. In some examples, the storage database 120 may include at least one of the first memory 132 or the second memory 134.
[0015] The example memory system 130 includes the first memory 132 and the second memory 134. In some examples, the first memory 132 may be a non-volatile random access memory (NVRAM) and the second memory 134 may be a dynamic random access memory (DRAM). Additionally or
alternatively, other types of memory devices may be included within the memory system 130. In the illustrated example of FIG. 1, the first memory 132 may be accessible by the graph manager 110 at a relatively slower rate than the second memory 134. Accordingly, the graph manager 110 may access the second memory 134 at a faster rate than the first memory 132 (or vice versa).
Furthermore, the first memory 132 may have a larger capacity than the second memory 134 (e.g., because the second memory 134 is faster and thus may be more expensive at larger capacities). In some examples, the memory system 130 may be considered an intermediate memory within the architecture of the data system 100. Accordingly, the graph manager 110 may load data or graphs from the storage database 120 into the memory system 130 for processing or analysis (e.g., because the memory system 130 may be accessed more easily, more efficiently, more quickly, etc. than the storage database 120). In some examples, the graph inference calculator 112 may utilize both the first memory 132 and the second memory 134 (rather than one of the first memory or the second memory) to analyze a graph because data of the graph may not fit within the first memory 132 or second memory 134 alone (i.e., the data size of the graph is too large for the capacity of the first memory 132 alone or the second memory 134 alone). More specifically, in some examples, the graph manager 110 may load first objects (e.g., edges) of a graph into the first memory 132 and second objects (e.g., nodes) of the same graph into the second memory 134. Accordingly, in some examples, the graph manager 110 may load edge data of edges of a graph into an NVRAM and node data of nodes of the graph into a DRAM of the memory system 130 to determine a graph inference of the graph. The graph manager 110 of FIG. 1 includes a graph inference calculator 112 to calculate/determine a graph inference of graphs in the storage database 120 in accordance with the teachings of this disclosure.
[0016] FIG. 2 is a block diagram of an example graph inference calculator 112 that may be used to implement the graph inference calculator 112 of FIG. 1 in accordance with the teachings of this disclosure. The example graph inference calculator 112 of FIG. 2 includes an edge retriever 210, a node probability calculator 220, a node probability updater 230, an edge message manager 240, and a convergence analyzer 250. The example graph inference calculator 112 facilitates calculating a graph inference of a graph (e.g., a graph from the storage database 120 of FIG. 1) using an edge-centric approach.
Accordingly, in examples herein, the edge retriever 210 may sequentially access edge data from the first memory 132 (e.g., an NVRAM), the node probability calculator 220 may calculate a probability of the nodes (e.g., a node belief) based on the edge data, and the node probability updater 230 may update the node data based on the calculated probability in the second memory 134. In examples herein, after edge data for all or a threshold percentage of the edges is retrieved from the first memory 132, the edge message manager 240 may update messages of the edge data based on the calculated probabilities of the nodes. The example convergence analyzer 250 determines whether messages of all edges or a threshold percentage of the edges (e.g., a subset of the edges) have been updated to determine whether the graph inference calculation for the graph is complete (i.e., the messages of all edges or a percentage of the edges and probabilities of all nodes or a percentage of the nodes have been updated accordingly).
[0017] The example edge retriever 210 retrieves edge data for an edge (or edges) from the first memory 132 and buffers (e.g., temporarily stores for processing) the retrieved edge data for the graph inference calculator 112. In examples herein, the edge retriever 210 sequentially retrieves the edge data of edges from the first memory 132. Accordingly, the edge retriever 210 iteratively retrieves edge data for the edges by retrieving edge data for a first edge and providing the edge data to the node probability calculator 220 for probability calculation of nodes linked by the first edge, then subsequently retrieving edge data of a second edge from a sequential location of the first memory 132, and so on, until edge data for all edges (or at least a threshold percent of the edges) have been retrieved for processing by the graph inference calculator 112. In some examples, the edge retriever 210 may sequentially retrieve edge data in order of a physical address or physical location of the first memory 132 (e.g., from one physical end of the first memory 132 to an opposite physical end of the first memory 132). Accordingly, the edge retriever 210 may determine that edge data for all (or a threshold percentage/subset) of the edges has been retrieved when the edge retriever 210 retrieves edge data from a last location (e.g., a last physical location, a last address location) of an access sequence. Sequentially retrieving the edge data from the first memory 132 (e.g., by physical address) may allow the edge retriever 210 to retrieve the edge data at a relatively faster rate than by randomly retrieving edge data from any location of the first memory 132 because the edge retriever 210 previously accessed a location near a present access location and therefore may be pointed toward that current access location (rather than needing to process a location/address further away from a previous location).
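The sequential, threshold-bounded retrieval described above can be sketched as a generator. The function name `retrieve_edges` and the `threshold` parameter are hypothetical illustrations, assuming edge records are already laid out in physical-address order.

```python
# Illustrative sketch (not the patented implementation) of sequential edge
# retrieval with an optional threshold: edge records are assumed to be laid
# out in storage order, and retrieval stops once the requested fraction of
# the edges has been yielded.

def retrieve_edges(edge_store, threshold=1.0):
    """Yield edge records in storage order; `threshold` is the fraction of
    all edges to retrieve (1.0 retrieves every edge)."""
    limit = int(len(edge_store) * threshold)
    for i, edge in enumerate(edge_store):
        if i >= limit:
            break  # a threshold subset of the edges suffices
        yield edge

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
subset = list(retrieve_edges(edges, threshold=0.5))  # first 50%, in order
```

Because the generator walks the store front to back, a real backing store (e.g., a memory-mapped region) would be read in address order rather than at random.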
In some examples, a threshold percentage of the edges (e.g., 90%, 95%, etc.) or a selectable number of the edges may be specified (e.g., via default settings, a user input, etc.). Accordingly, a subset of the edges of the graph may be retrieved by the edge retriever 210 for processing.
[0018] In examples herein, the example edge data may include node information that identifies the nodes linked by the edge. The example edge retriever 210 and the example node probability calculator 220 of FIG. 2 retrieve edge data and node data, respectively, from the memory system 130. For example, the edge retriever 210 may sequentially retrieve edge data from a non-volatile memory (e.g., the first memory 132) corresponding to edges of the graph, and the node probability calculator 220 may retrieve node data from a DRAM of the memory system 130 (e.g., the second memory 134). The example node probability calculator 220 may retrieve node data corresponding to a node or nodes identified in the edge data (e.g., nodes linked by the edge).
[0019] The node probability calculator 220 receives edge data for edges from the edge retriever 210. The node probability calculator 220 identifies nodes in the edge data (e.g., a pair of nodes linked by the edge). The node probability calculator 220 calculates a marginal or conditional probability of the nodes (or of a value associated with the node) based on messages in the edge data. As used herein, the marginal probability of a node refers to a probability that a random variable, parameter, etc. of the node may correspond to a designated value or result if or when the node is processed/executed. In examples herein, the messages of the edge data may be messages sent between the nodes of the edge data indicating dependency or relationships of the nodes. A conditional probability refers to a probability that the node has a corresponding value (e.g., a random variable or any other parameter) assigned to the node, conditioned on specific values assigned to other nodes, etc.
[0020] The messages may indicate previously calculated probabilities of the nodes, expected probabilities of the nodes, or factors contributing to probabilities of the nodes based on the relationship and current probabilities of those nodes. Accordingly, edge data for each edge may include at least two messages that are sent between the pair of nodes.
[0021] As an example, given two nodes A, B linked by edge E, edge data for edge E may at least: identify nodes A, B; include a message from Node A to Node B indicating characteristics of Node A (e.g., a probability, message data from other edges linked to Node A, a relationship between Node A and Node B, etc.); and include a message from Node B to Node A (e.g., a message indicating a probability, message data from other edges linked to Node B, a relationship between Node B and Node A, etc.). Accordingly, the node probability calculator 220 identifies the nodes of an edge (e.g., Nodes A, B) from the edge data, and determines marginal or conditional probabilities of the nodes based on messages or other data within the edge data and the Nodes A, B. The marginal or conditional probabilities may be partial probabilities of the nodes until convergence of the graph inference calculation. The example node probability calculator 220 may determine/calculate the probabilities of the nodes linked by an edge using any suitable technique (e.g., belief propagation, Gibbs sampling, etc.).
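The Node A/Node B example above can be made concrete. The record layout (`edge_E`) and the product-of-messages rule in `belief` are a hedged sketch in the style of belief propagation, which the text names as one suitable technique; they are not the only representation the disclosure admits.

```python
# Hypothetical edge record for edge E linking Nodes A and B: it identifies
# both endpoints and carries one message in each direction. Messages here
# are discrete distributions over two states of the receiving node.
edge_E = {
    "nodes": ("A", "B"),
    "msg_A_to_B": [0.7, 0.3],   # what A tells B about B's states
    "msg_B_to_A": [0.4, 0.6],   # what B tells A about A's states
}

def belief(prior, incoming_messages):
    """Normalized product of a node's prior and its incoming messages --
    one common way (belief propagation) to form the marginal estimate the
    text calls a node probability."""
    b = list(prior)
    for msg in incoming_messages:
        b = [bi * mi for bi, mi in zip(b, msg)]
    total = sum(b)
    return [bi / total for bi in b]

# Node B's updated probability given a uniform prior and A's message:
b_B = belief([0.5, 0.5], [edge_E["msg_A_to_B"]])
```

With more edges linked to Node B, their messages would simply be appended to `incoming_messages`, which matches the incremental per-edge updates described below.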
[0022] In some examples, Node A and/or Node B of the edge E may have already been processed for another edge X (e.g., a different edge linked to Node A or Node B was previously retrieved/processed). In such an example, the node probability calculator 220 may calculate the probability based on the previously updated probability of Node A or Node B in processing the edge X. Accordingly, in examples herein, the node probability calculator 220 calculates an incremental probability for each of the nodes of the graph as each edge is retrieved and processed by the graph inference calculator 112. Furthermore, the probabilities of Nodes A, B may be incrementally updated multiple times based on the number of edges respectively linked to Nodes A, B.
[0023] The example node probability updater 230 updates node data for nodes of edges based on the probabilities calculated by the node probability calculator 220. For example, the node probability updater 230 may update a probability of the node (e.g., a value of a variable, parameter, etc. represented by the node) to reflect the probability calculated based on the edge messages in edge data retrieved by the edge retriever 210. The example node data may have been loaded into the second memory 134 (e.g., a DRAM) that has a relatively fast access rate (e.g., compared to the first memory 132) by the graph manager 110. Accordingly, the node probability updater 230 may quickly access the node data from the second memory 134 and update the node probability based on the calculations of the node probability calculator 220. Accordingly, referring to the example above, for edge E, the node probability updater 230 may update node data of Nodes A, B in the second memory 134 to reflect an updated probability calculated from edge data (e.g., messages) of the edge E. After the node probability updater 230 updates the node probabilities of the respective nodes of an edge, the node probability updater 230 may notify the edge retriever 210, such that the edge retriever 210 may retrieve or provide edge data for another edge (e.g., an edge stored physically nearest to edge E in the first memory 132).
[0024] In examples herein, the edge retriever 210, the node probability calculator 220, and the node probability updater 230 may iteratively process edges by retrieving the edges from the first memory 132, calculating probabilities of nodes of the respective edges, and updating the probabilities of the respective nodes in the second memory 134 until all edges (or a threshold percentage (e.g., 90%, 95%, 99%, etc.) of the edges or subset of the edges) in the first memory 132 are processed. Once the edges of the first memory 132 have been processed, the edge message manager 240 updates the messages of the edges in the first memory 132 based on the updated calculated probabilities of the nodes stored in the second memory 134. For example, the edge message manager 240 may sequentially retrieve the edge data from the first memory 132, generate new edge messages based on updated probabilities of the nodes identified in the edge data and the old messages of the edge data, and store the new messages in the edge data. In examples herein, the edge message manager 240 may compare the old messages to the new messages to determine whether the messages match or match to within a threshold percentage (e.g., using any suitable technique, such as calculating a hash of the messages). If the new messages relatively match the old messages, then the edge message manager 240 may indicate (e.g., using a flag, pointer, binary identifier (0, 1), etc.) that the edge (or messages) has converged, and if the old and new edge messages do not match, the edge message manager 240 may indicate that the edge messages of that edge have not converged. The example edge message manager 240 may iteratively update the edge messages of the edges stored in the first memory 132 based on the updated probabilities of the nodes stored in the second memory 134 until all messages of all of the edges (or a threshold percentage of the edges) in the first memory 132 have been updated.
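The message-update pass described above can be sketched as follows. The record shape, the function name `update_edge_messages`, and the matching tolerance `tol` are assumptions for illustration; the text itself leaves the comparison technique open (e.g., hashing).

```python
# Illustrative sketch of the message-update pass: new messages replace the
# old ones, and a per-edge convergence flag records whether they matched
# (here, element-wise within an assumed tolerance).

def update_edge_messages(edge, new_messages, tol=1e-3):
    """Store `new_messages` on the edge; flag convergence when every entry
    of the new messages matches the old within `tol`."""
    old = edge["messages"]
    converged = all(
        abs(o - n) <= tol
        for old_msg, new_msg in zip(old, new_messages)
        for o, n in zip(old_msg, new_msg)
    )
    edge["messages"] = new_messages
    edge["converged"] = converged   # the flag the convergence analyzer reads
    return converged

edge = {"nodes": ("A", "B"), "messages": [[0.7, 0.3], [0.4, 0.6]]}
```

A first call with unchanged messages sets the flag; a later call with materially different messages clears it, prompting another processing round.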
[0025] The example edge message manager 240 may notify the convergence analyzer 250 after all of the messages (or a threshold percentage of the edges) in the first memory 132 have been updated. The example convergence analyzer 250 checks that all messages or a threshold percentage of the messages have been updated in the edge data entries of the first memory 132. For example, the convergence analyzer 250 determines whether edge data for each of the edges (or for a threshold percentage of the edges) includes an identifier (e.g., a flag, a pointer, a binary identifier (e.g., 0 or 1), etc.) indicating that the messages have converged. If all messages have converged, the convergence analyzer 250 may identify convergence of the graph, and therefore, the graph inference calculator 112 has completed calculating the graph inference of the graph. In such examples, the graph manager 110 may provide or store the updated graph inference (e.g., the updated edge data and node data from the first memory 132 and the second memory 134) in the storage database 120 of FIG. 1. If the convergence analyzer 250 determines that not all messages (or a threshold percentage of the messages) have converged, then the convergence analyzer 250 may notify the edge retriever 210 to begin retrieving edge data for processing of the edges (i.e., processing of the edges starts from the beginning).
[0026] Accordingly, in examples herein, edges stored in the first memory 132 may be iteratively processed by the edge retriever 210, the node probability calculator 220, and the node probability updater 230, and messages of the edge data may then be iteratively updated by the edge message manager 240 and analyzed for convergence by the convergence analyzer 250 until all messages (or a threshold percentage of the messages) have converged in order to complete calculating the graph inference of the graph.
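The overall control flow summarized in this paragraph can be condensed into a single loop. In this sketch, `update_nodes` and `refresh_messages` are hypothetical placeholders for the per-edge behavior of the node probability calculator/updater and the edge message manager, and the `max_rounds` bound is an added safeguard (the disclosure itself iterates until convergence).

```python
# Condensed sketch of the overall loop: sweep the edges to update node
# probabilities, sweep them again to refresh messages and convergence
# flags, and repeat until all (or a threshold fraction of) edges converge.

def infer(edges, nodes, update_nodes, refresh_messages,
          threshold=1.0, max_rounds=100):
    """Return True once at least `threshold` of the edges report converged
    messages; `refresh_messages` returns the per-edge convergence flag."""
    for _ in range(max_rounds):
        for edge in edges:                  # pass 1: update node probabilities
            update_nodes(edge, nodes)
        converged = sum(                    # pass 2: refresh edge messages
            refresh_messages(edge, nodes) for edge in edges
        )
        if converged >= threshold * len(edges):
            return True                     # graph inference has converged
    return False
```

When not enough edges converge, control falls through to another full round, mirroring the convergence analyzer notifying the edge retriever to start over.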
[0027] While an example manner of implementing the graph inference calculator 1 12 of FIG. 1 is illustrated in FIG. 2, at least one of the elements, processes and/or devices illustrated in FIG. 2 may be combined, divided, rearranged, omitted, eliminated and/or implemented in any other way. Further, the edge retriever 210, the node probability calculator 220, the node probability updater 230, the edge message manager 240, the convergence analyzer 250, and/or, more generally, the example graph inference calculator 1 12 of FIG. 2 may be implemented by hardware and/or any combination of hardware and executable instructions (e.g., software and/or firmware). Thus, for example, any of the edge retriever 210, the node probability calculator 220, the node probability updater 230, the edge message manager 240, the convergence analyzer 250, and/or, more generally, the example graph inference calculator 1 12 be implemented by at least one of an analog or digital circuit, a logic circuit, a programmable processor, an application specific integrated circuit (ASIC), a programmable logic device (PLD) and/or a field programmable logic device (FPLD). When reading any of the apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the edge retriever 210, the node probability calculator 220, the node probability updater 230, the edge message manager 240, and/or the convergence analyzer 250 is/are hereby expressly defined to include a tangible machine readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc. storing the executable
instructions. Further still, the example graph inference calculator 112 of FIG. 2 may include at least one element, process, and/or device in addition to, or instead of, those illustrated in FIG. 2, and/or may include more than one of any or all of the illustrated elements, processes and devices.
[0028] Flowcharts representative of example machine readable instructions for implementing the graph inference calculator 112 of FIG. 2 are shown in FIGS. 3 and/or 4. In examples herein, the machine readable instructions comprise a program(s)/process(es) for execution by a processor or processors, such as the processor 512 shown in the example processor platform 500 discussed below in connection with FIG. 5. The program/process may be embodied in executable instructions (e.g., software) stored on a tangible machine readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a digital versatile disk (DVD), a Blu-ray disk, or a memory associated with the processor 512, but the entire program/process and/or parts thereof could alternatively be executed by a device other than the processor 512 and/or embodied in firmware or dedicated hardware. Further, although the example program is described with reference to the flowcharts illustrated in FIGS. 3 and/or 4, many other methods of implementing the example graph inference calculator 112 may alternatively be used. For example, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, or combined.
[0029] The example process 300 of FIG. 3 begins with an initiation of the graph inference calculator 112 (e.g., upon startup, upon instructions from a user, upon startup of a device implementing the graph inference calculator 112 (e.g., the graph manager 110, a processor, etc.), etc.). The example process 300 of FIG. 3 may be executed to process edges to determine a graph inference of a graph. At block 310 of FIG. 3, the edge retriever 210 retrieves edge data for edges of a graph from the first memory device 132 (e.g., an NVRAM). For example, at block 310, the edge retriever 210 may sequentially retrieve the edge data by retrieving edge data for a first edge in a first execution of block 310 and retrieving edge data for a second edge in a second execution of block 310. In such an example, the edge data for the first edge and the edge data for the second edge may be stored physically next to or near one another, such that the edge retriever 210 retrieves edge data at block 310 in a sequential manner (e.g., based on a physical address/location order of the first memory 132).
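The address-ordered layout described above can be illustrated by packing edge records back-to-back in a flat buffer and walking them in physical order. The record format below (two node indices plus one message value) is a hypothetical choice made for this sketch, not a detail stated in the patent.

```python
import struct

# Hypothetical packed layout: each edge record is (src, dst, message),
# stored contiguously so reads walk the buffer in address order.
EDGE_FMT = "iif"                       # two int32 node ids + one float32 message
EDGE_SIZE = struct.calcsize(EDGE_FMT)  # bytes per edge record

def pack_edges(edge_tuples):
    """Lay edge records out contiguously, as the first memory might."""
    return b"".join(struct.pack(EDGE_FMT, s, d, m) for s, d, m in edge_tuples)

def iter_edges(buf):
    """Retrieve edge data sequentially, one record after another."""
    for off in range(0, len(buf), EDGE_SIZE):
        yield struct.unpack_from(EDGE_FMT, buf, off)
```

Because each retrieval reads the record immediately following the previous one, the access pattern matches the sequential-read strengths of an NVRAM-like device.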
[0030] At block 320, the node probability calculator 220 calculates (e.g., incrementally calculates, based on previous node probability calculations and current node probabilities) a probability of nodes based on the edge data of each node. In other words, at block 320, the node probability calculator 220 calculates a node probability for nodes connected by an edge of the edge data. At block 330, the node probability updater 230 updates (e.g., incrementally updates) node data of the respective nodes of the edges to include the calculated probabilities. The example node data may be updated in the second memory device 134 (e.g., a DRAM). At block 340, the edge message manager 240 updates messages of the edges based on the probabilities of the nodes to facilitate providing a graph inference of the graph. In examples herein, at block 340, the edge message manager 240 may iteratively or sequentially access messages of the edges from the first memory device 132 until messages for all edges (or a threshold percentage of the edges) are updated (e.g., the sequence of edges in the first memory 132 is completed). After block 340, the example process 300 ends.
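One way to read "incrementally updates" at blocks 320–330 is that each edge's contribution is folded into a node's running belief as that edge arrives, rather than recomputing from all incident edges at once. The two-state product-and-normalize rule below is a common choice in probabilistic graphical models, offered here as an assumption for illustration rather than as the rule the patent specifies.

```python
def incremental_update(prev_belief, edge_message):
    """Fold one edge's message into a node's running belief and
    renormalize, so node data can be updated edge by edge."""
    unnorm = [p * m for p, m in zip(prev_belief, edge_message)]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

Starting from a uniform belief, a strongly informative message dominates, while an uninformative (uniform) message leaves the belief unchanged — exactly the behavior an incremental per-edge update needs.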
[0031] The example process 400 of FIG. 4 begins with an initiation of the graph inference calculator 112. The process 400 of FIG. 4 may be executed to calculate the graph inference by iteratively retrieving edges, iteratively updating messages, and iteratively determining convergence (e.g., completion) of the graph inference calculation. At block 410, the edge retriever 210 retrieves a next edge from the first memory 132 (e.g., an NVRAM). For example, the next edge may be a first edge with edge data stored in a first location of a sequence of physical storage locations of the first memory 132, or the next edge may be a subsequent edge with edge data stored at a subsequent physical location (e.g., next to or near the previous edge) of the first memory 132. At block 420, the node probability calculator 220 calculates a probability of the nodes of the edge (i.e., the nodes linked by the edge) based on the edge data retrieved at block 410. At block 430, the node probability updater 230 logs the updated probability in the second memory 134. At block 440, the edge retriever 210 determines whether all (or a threshold percentage) of the edges have been retrieved (e.g., whether the processed edge was the last edge of the sequence of edges in the first memory 132). If all (or a threshold percentage) of the edges have not been retrieved, control returns to block 410.
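Blocks 410–440 amount to a cursor over the stored edge sequence plus an "all retrieved?" test. A minimal hypothetical retriever (class and method names are illustrative, not from the patent):

```python
class EdgeRetriever:
    """Walks edge records in storage order and reports when the
    last edge of the sequence has been read (the block 440 test)."""

    def __init__(self, edges):
        self._edges = edges  # records in physical storage order
        self._pos = 0        # cursor into the sequence

    def next_edge(self):
        """Return the next edge in storage order (block 410)."""
        e = self._edges[self._pos]
        self._pos += 1
        return e

    def exhausted(self):
        """True once every edge in the sequence has been retrieved."""
        return self._pos >= len(self._edges)
```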
[0032] If, at block 440, all (or a threshold percentage) of the edges were retrieved, then the edge message manager 240 updates messages of a next edge in the first memory based on the updated probabilities of the nodes of the graph (block 450). For example, similar to the edge retriever 210 retrieving edges at block 410, the edge message manager 240, at block 450, may sequentially update messages of the edges based on the updated node probabilities. In some examples, at block 450, the edge message manager 240 may indicate convergence of messages of an edge by comparing the new edge messages to old edge messages of the edge. At block 460, the edge message manager 240 determines whether all (or a threshold percentage) of the edge messages have been updated. If all (or a threshold percentage) of the messages have not been updated, then control returns to block 450 (and the next message is updated). If, at block 460, the messages of all (or a threshold percentage) of the edges have been updated, then the convergence analyzer 250 determines whether the graph inference has converged (block 470), indicating that all messages reflect all updates to the probabilities of the nodes (i.e., all old messages match (or match within a threshold percentage) all new messages of all, or a threshold percentage, of the edges). For example, at block 470, the convergence analyzer 250 may check convergence indicators or identifiers (e.g., flags, pointers, binary identifiers, etc.) stored in the edge data of the edges. If all edges do not indicate convergence of the edges of the graph, then control returns to block 410 (and the process 400 is repeated). If, at block 470, the convergence analyzer 250 determines that the messages have all converged, then the example process 400 ends, and the graph inference of the graph has been determined/completed (based on the updated edge messages and node probabilities).
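The per-edge convergence indicators checked at block 470 can be sketched as a flag stored with each edge record: the flag is set when an edge's new message matches its old message within a tolerance, and the graph has converged when every flag is set. The field names and the numeric tolerance below are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class EdgeRecord:
    old_message: float
    new_message: float
    converged: bool = False  # per-edge indicator, as checked at block 470

def mark_and_check(edges, tol=1e-6):
    """Set each edge's convergence flag by comparing its old and new
    messages, then report whether the whole graph has converged."""
    for e in edges:
        e.converged = abs(e.new_message - e.old_message) <= tol
    return all(e.converged for e in edges)
```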
[0033] As mentioned above, the example processes of FIGS. 3 and/or 4 may be implemented using coded instructions (e.g., computer and/or machine readable instructions) stored on a tangible machine readable storage medium such as a hard disk drive, a flash memory, a read-only memory (ROM), a compact disk (CD), a digital versatile disk (DVD), a cache, a random-access memory (RAM) and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information). As used herein, the term tangible machine readable storage medium is expressly defined to include any type of machine readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media. As used herein, "computer readable storage medium" and "machine readable storage medium" are used interchangeably. Additionally or alternatively, the example processes of FIGS. 3 and/or 4 may be implemented using coded instructions (e.g., computer and/or machine readable instructions) stored on a non-transitory computer and/or machine readable medium such as a hard disk drive, a flash memory, a read-only memory, a compact disk, a digital versatile disk, a cache, a random-access memory and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information). As used herein, the term non-transitory machine readable medium is expressly defined to include any type of machine readable storage device and/or storage disk and to exclude
propagating signals and to exclude transmission media. As used herein, when the phrase "at least" is used as the transition term in a preamble of a claim, it is open-ended in the same manner as the term "comprising" is open-ended. As used herein, the term "a" or "an" may mean "at least one," and, therefore, "a" or "an" does not necessarily limit a particular element to a single element when used to describe the element. As used herein, when the term "or" is used in a series, it is not, unless otherwise indicated, considered an "exclusive or."
[0034] FIG. 5 is a block diagram of an example processor platform 500 capable of executing the instructions of FIGS. 3 and/or 4 to implement the graph inference calculator 112 of FIG. 2. The example processor platform 500 may be or may be included in any type of apparatus, such as a server, a personal computer, a mobile device (e.g., a cell phone, a smart phone, a tablet, etc.), a personal digital assistant (PDA), an Internet appliance, or any other type of computing device.
[0035] The processor platform 500 of the illustrated example of FIG. 5 includes a processor 512. The processor 512 of the illustrated example is hardware. For example, the processor 512 can be implemented by at least one integrated circuit, logic circuit, microprocessor or controller from any desired family or manufacturer.
[0036] The processor 512 of the illustrated example includes a local memory 513 (e.g., a cache). The processor 512 of the illustrated example is in communication with a main memory including a volatile memory 514 and a non-volatile memory 516 via a bus 518. The volatile memory 514 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM) and/or any other type of random access memory device. The non-volatile memory 516 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 514, 516 is controlled by a memory controller.
[0037] The processor platform 500 of the illustrated example also includes an interface circuit 520. The interface circuit 520 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), and/or a peripheral component interconnect (PCI) express interface.
[0038] In the illustrated example, at least one input device 522 is connected to the interface circuit 520. The input device(s) 522 permit(s) a user to enter data and commands into the processor 512. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.
[0039] At least one output device 524 is also connected to the interface circuit 520 of the illustrated example. The output device(s) 524 can be implemented, for example, by display devices (e.g., a light emitting diode (LED) display, an organic light emitting diode (OLED) display, a liquid crystal display (LCD), a cathode ray tube (CRT) display, a touchscreen, a tactile output device, a printer and/or speakers). The interface circuit 520 of the illustrated example, thus, may include a graphics driver card, a graphics driver chip or a graphics driver processor.

[0040] The interface circuit 520 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem and/or a network interface card to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 526 (e.g., an Ethernet connection, a digital subscriber line (DSL), a telephone line, a coaxial cable, a cellular telephone system, etc.).
[0041]The processor platform 500 of the illustrated example also includes at least one mass storage device 528 for storing executable
instructions (e.g., software) and/or data. Examples of such mass storage device(s) 528 include floppy disk drives, hard disk drives, compact disk drives, Blu-ray disk drives, RAID systems, and digital versatile disk (DVD) drives.
[0042] The coded instructions 532 of FIGS. 3 and/or 4 may be stored in the mass storage device 528, in the local memory 513, in the volatile memory 514, in the non-volatile memory 516, and/or on a removable tangible machine readable storage medium such as a CD or DVD.
[0043] From the foregoing, it will be appreciated that the above disclosed methods, apparatus and articles of manufacture provide a graph inference calculator that utilizes an edge-centric approach to determine a graph inference of a graph. The examples herein enhance speed, computer resource utilization, efficiency, etc., by sequentially retrieving edge data from a first memory (e.g., an NVRAM) and accessing node data in a second memory (e.g., a DRAM).
Accordingly, the examples herein may enable efficient utilization of computer resources by quickly and efficiently calculating a graph inference of a graph relative to prior techniques.
[0044] Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.

Claims

What is claimed is:
1. A method comprising:
retrieving edge data from a first memory of a device, the edge data corresponding to edges of a graph and the graph representative of data of a database, the graph comprising a system of nodes connected by the edges;
for each edge of the graph:
calculating respective probabilities of nodes based on corresponding edge data of the edge, the nodes identified in the edge data of the respective edge; and
incrementally updating node data of the respective nodes to include the respective probabilities, the node data stored in a second memory of the device; and
updating messages of the edge data for the edges of the graph based on the calculated probabilities, the messages representative of probabilities of the nodes connected by the respective edge, the messages and the probabilities to facilitate providing a graph inference of the graph.
2. The method as defined in claim 1, wherein retrieving the edge data from the first memory comprises sequentially retrieving edge data in a sequential order based on physical addresses of the first memory, the first memory comprising a non-volatile memory.
3. The method as defined in claim 1, further comprising updating the messages of the edge data for the edges until convergence of the graph, such that the probabilities of the nodes are incrementally updated.
4. The method as defined in claim 3, further comprising identifying convergence of the graph based on a comparison of the updated messages to old messages of the edge data, wherein convergence of the graph occurs when the old messages match the new messages.
5. The method as defined in claim 1, further comprising retrieving edge data from the first memory at a slower rate than updating the node data of the respective nodes in the second memory due to the first memory having a slower access rate than the second memory.
6. The method as defined in claim 1, further comprising:
loading the edge data of the edges into the first memory, the first memory comprising a non-volatile memory; and
loading the node data of the nodes into the second memory, the second memory comprising a dynamic random access memory,
wherein the nodes of the system of nodes are representative of variables of the data, wherein each edge of the edges represents a relationship between a pair of variables of a respective pair of nodes, the respective pair of nodes connected by the edge.
7. An apparatus comprising:
an edge retriever to retrieve edge data for edges of a graph from a nonvolatile memory;
a node probability calculator to calculate marginal probabilities of values of nodes linked by respective edges of the graph based on the edge data;
a node probability updater to update probabilities of the nodes to include the marginal probabilities of the values of the nodes;
an edge message manager to update messages of the edges of the graph based on the updated probabilities of the nodes; and
a convergence analyzer to determine that messages of a threshold percentage of the edges have been updated to identify convergence of the graph inference.
8. The apparatus as defined in claim 7, wherein the edge message manager is further to:
compare old messages of the edges to the updated messages of the edges; and
indicate, via identifiers in the edge data of the edges, that the old messages of the edges match the updated messages of the edges to within a threshold percentage to indicate convergence of the edges.
9. The apparatus as defined in claim 8, wherein the convergence analyzer is to analyze the identifiers in the edge data to determine that the messages of the threshold percentage of the edges have been updated.
10. The apparatus as defined in claim 7, wherein the edge retriever, prior to the edge message manager updating the messages of the edges of the graph, is further to:
determine that all of the edges have been retrieved from the non-volatile memory after the edge retriever retrieves a last edge from a last physical location of a sequence of physical locations of the non-volatile memory.
11. A non-transitory machine readable storage medium comprising instructions that, when executed, cause a machine to at least:
retrieve edge data for an edge of a graph from a non-volatile memory of a system based on an access sequence of the non-volatile memory;
calculate a probability associated with each node linked by the edge based on the edge data;
log the probability associated with each of the nodes in a dynamic random access memory;
update messages of the edge data based on a probability associated with each of the nodes calculated after all edges of the graph are retrieved and probabilities of all nodes of the graph are logged; and
determine that the graph inference has converged after messages of the edges of the graph have been updated.
12. The non-transitory machine readable storage medium of claim 11, wherein the instructions are further to cause the machine to:
determine the graph inference has converged after all updated messages of the edges of the graph match old messages of the respective edges of the graph.
13. The non-transitory machine readable storage medium of claim 11, wherein the edge is a first edge and the instructions are further to cause the machine to:
retrieve edge data for a second edge of the graph from a location of the non-volatile memory subsequently ordered in the access sequence of the non-volatile memory.
14. The non-transitory machine readable storage medium of claim 13, wherein the probability associated with each node comprises at least a marginal probability or a conditional probability.
15. The non-transitory machine readable storage medium of claim 11, wherein the graph is representative of data stored in a database, the nodes represent random variables of the data, and the edges represent relationships between the random variables of the respective nodes of the edge.
PCT/US2015/047565 2015-08-28 2015-08-28 Graph inference calculator WO2017039595A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2015/047565 WO2017039595A1 (en) 2015-08-28 2015-08-28 Graph inference calculator

Publications (1)

Publication Number Publication Date
WO2017039595A1 true WO2017039595A1 (en) 2017-03-09

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130179974A1 (en) * 2012-01-11 2013-07-11 Pratyusa Kumar Manadhata Inferring a state of behavior through marginal probability estimation
US20140108322A1 (en) * 2012-10-12 2014-04-17 International Business Machines Corporation Text-based inference chaining
US20140250046A1 (en) * 2013-03-03 2014-09-04 Microsoft Corporation Probabilistic parsing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MICHAEL I. JORDAN ET AL.: "Graphical models: Probabilistic inference", THE HANDBOOK OF BRAIN THEORY AND NEURAL NETWORKS, 2002, pages 1 - 17, Retrieved from the Internet <URL:http://web.cs.iastate.edu/-honavar/jordan2.pdf> *
TAMIR HAZAN ET AL.: "Convergent Message-Passing Algorithms for Inference over General Graphs with Convex Free Energies", THE 24TH CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, ARXIV PREPRINT ARXIV:1206.3262, 2012, XP055372835, Retrieved from the Internet <URL:http://arxiv.org/ftp/arxiv/papers/1206/1206.3262.pdf> *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15903186

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15903186

Country of ref document: EP

Kind code of ref document: A1