WO2017127106A1 - Graph partitioning - Google Patents

Graph partitioning Download PDF

Info

Publication number
WO2017127106A1
WO2017127106A1 PCT/US2016/014512 US2016014512W WO2017127106A1 WO 2017127106 A1 WO2017127106 A1 WO 2017127106A1 US 2016014512 W US2016014512 W US 2016014512W WO 2017127106 A1 WO2017127106 A1 WO 2017127106A1
Authority
WO
WIPO (PCT)
Prior art keywords
partitions
edge
node
nodes
partition
Prior art date
Application number
PCT/US2016/014512
Other languages
French (fr)
Inventor
Roshan DATHATHRI
Manish Marwah
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development Lp filed Critical Hewlett Packard Enterprise Development Lp
Priority to PCT/US2016/014512 priority Critical patent/WO2017127106A1/en
Publication of WO2017127106A1 publication Critical patent/WO2017127106A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists

Definitions

  • a graph is a representation of a set of data (e.g., Big Data).
  • An example graph may include a plurality of nodes (or vertices) and edges connecting the plurality of edges.
  • the graph may be processed by executing the nodes in accordance with characteristics of edges linked to the nodes and related nodes linked by the edges.
  • FIG. 1 is a schematic diagram of an example graph partitioning system including an example graph partitioner constructed in accordance with examples herein.
  • FIG. 2 is a block diagram of an example graph partitioner that may be used to implement the graph partitioner of FIG. 1.
  • FIG. 3 is a block diagram of an example graph distribution monitor that may be implemented by the graph partitioner of FIGS. 1 and/or 2.
  • FIG. 4 is a flowchart representative of example machine readable instructions that may be executed to implement the graph partitioner of FIG. 2 to assign an edge to a partition in accordance with an aspect of this disclosure.
  • FIG. 5 is another flowchart representative of example machine readable instructions that may be executed to implement the graph partitioner of FIG. 2 to partition a graph in accordance with an aspect of this disclosure.
  • FIG. 8 is a block diagram of an example processor platform capable of executing the instructions of FIGS. 4 and/or 5 to implement the graph partitioner of FIG. 2, [0008] Wherever possible, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts,
  • Examples disclosed herein involve partitioning a graph by distributing the edges of the graph across partitions of a memory based on degrees of nodes (number of edges linked to the nodes) linked by the edges and the loads of the partitions (amount of data or edge data in each partition). Partitioning of a graph is tracked to monitor the locations of the nodes across the partitions and the loads of the partitions storing the graph data. Accordingly, examples herein allow for edges to be assigned to particular partitions that may limit or minimize a number of replicated nodes across the partitions and evenly distribute the loads of the partitions.
  • edge data for edges and node data for corresponding nodes is distributed across a plurality of partitions of a memory.
  • Node-cut (or vertex-cut) partitioning involves distributing the edges across the partitions such that no same edge is included within a same partition.
  • nodes may be replicated across the plurality of partitions.
  • each partition is processed separately by respective graph processors of the respective partitions. Accordingly, to ensure accurate processing of a partitioned graph, replicated nodes may need to be reprocessed multiple times as the nodes may be updated by other graph processors from other partitions. Thus, to more quickly and effectively reach convergence of the graph, it may be advantageous to lessen the number of replicated nodes across the partitions.
  • Examples herein involve graph partitioning that seeks to lessen (e.g., minimize) the amount of replicated nodes across the partitions and seeks to equally distribute the loads of the partitions. Accordingly, these benefits may enhance future processing of the partitioned graph as reprocessing of replicated nodes may be lessened, thus decreasing a number of iterations to process the graph to reach convergence. Furthermore, even distribution across the partitions may achieve less wait time among the graph processors, as respective graph processors of the partitions may avoid waiting for graph processor(s) processing graph data of a partition that is overloaded relative to the other graph partitions.
  • An example method includes retrieving edge data for an edge of a graph to assign the edge data to a partition of a memory, the edge linking a pair of nodes of the graph; determining that node data corresponding to at least one node of the pair of nodes linked by the edge exists in a candidate set of partitions of the memory; and assigning the edge data for the edge to a selected one of the partitions of the candidate set of partitions based on respective loads of each partition of the candidate set of partitions
  • a partition "including" a particular edge or a particular node it may be read that the partition includes a copy of edge data or a copy of node data for that particular edge or node, respectively.
  • edge data for the edge and/or node data for the node has been written (e.g., loaded, copied, stored) to that particular partition.
  • FIG. 1 is a schematic diagram of an example graph partitioning system 100 including an example graph partitioner 110 constructed in accordance with examples herein.
  • the example graph partitioning system may be implemented by any computing device(s), such as server(s), personal computer(s), etc.
  • the graph partitioning system 100 in the example of FIG. 1 includes the graph partitioner 1 10, a memory fabric 120 with a plurality of graph partitions 122 (which may be referred to herein as the partitions 122), a graph database 130, and a user interface 140.
  • the graph partitioner 110 partitions graphs from the graph database 130 into the graph partitions 122 of the memory fabric 120 (e.g., based on settings or instructions received from the user interface 140).
  • the example memory fabric 120 may be any suitable memory, such as a dynamic random access memory (DRAM) and/or a non-volatile memory (e.g., a memristor, a phase-change memory, etc.).
  • the memory fabric 120 may include or be a part of a shared-memory that is accessible to other systems (e.g., other graph partitioning systems similar to the graph partitioning system 00 of FIG. 1 , computing devices (e.g., computers, servers, etc), etc.).
  • the example graph partitions 122 of the memory fabric 120 store graph data (e.g., edge data for edges and node data for nodes) for a graph distributed across the graph partitions 122.
  • some or ail of the graph partitions 122 may store graph data for a same graph (e.g., edge data for edges of the graph and node data for nodes of the graph linked by the edges of the graph).
  • the graph partitioner 110 may partition a graph from the graph database 130 such that edge data for edges of a graph is distributed across different graph partitions of the plurality of graph partitions 122. Accordingly, edge data for a same edge may not be located in more than one partition of the plurality of partitions 122. Furthermore, node data corresponding to nodes linked by edges of the edge data is included in the corresponding partitions 22. Thus for example, if edge data for a first edge is located in a first partition of the plurality of partitions 122, node data
  • the graph partitioner 110 may distribute graph data among the plurality of partitions 122 with efforts to minimize or limit a number of replicated nodes across the plurality of partitions 122.
  • the example user interface 140 facilitates user interaction with the graph partitioner 1 10.
  • the user interface 140 may include user input(s) (e.g., a keyboard, a mouse, a trackball, a touchscreen, etc.) and/or user output(s) (e.g., a display, a touchscreen, speakers, indicators, etc.).
  • the user interface 140 enables a user to access and control settings of the graph partitioner 1 10.
  • the user interface 40 may request partitioning of a particular graph from the graph database 130.
  • the user interface 140 enables a user to indicate or set partitioning settings (e.g., a number of the partitions 122 to use, threshold number of edges associated with a node used to differentiate low-degree nodes from high-degree nodes, etc.) to partition the selected graph or particular types of graphs/graph data in accordance with examples herein.
  • partitioning settings e.g., a number of the partitions 122 to use, threshold number of edges associated with a node used to differentiate low-degree nodes from high-degree nodes, etc.
  • the example graph database 130 stores graph data for graphs.
  • the example graph database 130 may be implemented by at least a storage device, a computing device, a network device, etc, that may store or provide graph data for graphs that are to be partitioned in accordance with examples herein.
  • the example graph partitioner 1 10 may receive/retrieve graph data for graphs from the graph database 130. Accordingly, the graph partitioner 110 may load graph data into respective partitions of the plurality of partitions 122 in accordance with the examples herein.
  • FIG. 2 is a block diagram of an example graph partitioner 1 10 that may be used to implement the graph partitioner 110 of FIG. 1.
  • the example graph partitioner 110 of FIG. 2 includes a graph receiver 210, an edge analyzer 220, a graph distribution monitor 230, and a partition assigner 240.
  • An example communication bus 250 in FIG. 2 may facilitate communication between the graph receiver 210, the edge analyzer 220, the graph distribution monitor 230, and the partition assigner 240.
  • the graph receiver 210 receives/retrieves a graph (e.g., from the graph database 130), the edge analyzer 220 analyzes edges of the graph, the graph distribution monitor 220 tracks a status of distribution of graph data for the received graph into partitions (e.g.
  • the partition assigner 240 assigns graph data (e.g., edge data for edges and/or node data for corresponding nodes to respective partitions) to the partitions 122.
  • graph data e.g., edge data for edges and/or node data for corresponding nodes to respective partitions
  • the example graph receiver 210 of FIG. 2 may serve as a buffer for receiving (or temporarily storing retrieved) graph data from the graph database 130.
  • the graph receiver 210 may include a queue that receives and/or provides graph data in a specific order.
  • the queue may organize or sort the graph data by edges, such that edge data for each edge is provided one-by-one to the edge analyzer 220,
  • the graph receiver 210 may also organize the graph data such that node data for nodes linked by each edge is stored with the corresponding edge data,
  • the edge analyzer 220 of FIG. 2 analyzes edges (and
  • the edge analyzer 220 may retrieve/receive edge data for an edge (e.g., one-by-one) and node data for nodes linked by the edge from the graph receiver 210.
  • the edge analyzer 220 may analyze the edge data (and/or node data) to identify the nodes linked by the edge.
  • the edge analyzer 220 determines an appropriate partition (e.g., an optimal partition in accordance with examples herein, etc.) or candidate set of partitions for assignment of edge data of the edge and, in some examples, the node data corresponding to the nodes of the edge.
  • the edge analyzer 220 may compute degrees of the nodes of the edge.
  • a degree of a node represents the number of nodes (or number of edges) linked to another node.
  • the edge analyzer 220 may analyze graph data (e.g., during a pre-processing analysis) for the graph by scanning all edges and nodes to determine the degrees of the nodes of the graph.
  • the edge analyzer 220 may determine degrees of the nodes from the node data which may include a characteristic indicating the degree of the nodes.
  • the edge analyzer 220 may determine whether the nodes are low- degree nodes or high degree nodes.
  • a threshold number of links to the nodes may determine whether the nodes are low-degree nodes (less than the threshold number of links) or high-degree nodes (greater than the threshold number of links).
  • the example threshold may be adjustable (e.g., using a user input). Accordingly, different settings may be used for partitioning different graphs.
  • the edge analyzer 220 may set or adjust the example threshold (e.g., without user input) based on characteristics of the graph data (e.g., using an analysis on average degree of all nodes of the graph).
  • the edge analyzer 220 refers to the graph distribution monitor 230 to determine appropriate partitions (e.g., an optimal) for assignment of edge data for each of the edges of the graph (e.g., an optimal partition to minimize a number of replicated nodes and/or evenly distribute loads of the partitions).
  • the graph distribution monitor 230 may track the number of partitions that include node data (and/or edge data) for each node of the graph (i.e., a list of partitions on which each node is replicated, if at ail) (see FIG. 3).
  • the edge analyzer 220 performs a comparative analysis of the partitions to determine an appropriate partition or a candidate set of partitions for assignment of the edge (by storing the edge data/node data of nodes of the edge).
  • the example comparative analysis may minimize replications of nodes in the partitions and evenly distribute loads of the partitions.
  • the edge analyzer 220 may compute a score for each of the plurality of partitions 122 to determine the appropriate partition for assignment of the edge data of an edge (and potential corresponding node data if the node data does not exist in the partition).
  • the example score for each of the partitions may represent an amount of replication that may be avoided and is based on the degree (e.g., high-degree, low-degree, etc) of the nodes of the edge.
  • the edge analyzer 220 may implement or execute the following pseudocode to determine scores for each of the partitions for an edge:
  • vertexl and vertex2 be the vertices on which edge incident
  • / score represents the amount of replication that be avoided
  • vertexl is a low-degree vertex
  • vertex2 is a low-degree vertex :
  • the edge analyzer 220 may determine a candidate set of partitions.
  • the example candidate set of partitions may be those partitions that share a same highest score using the pseudocode above (or any other type of comparative analysis of the partitions).
  • a candidate set may include one partition (e.g., only one of the partitions 122 had a highest score among the other analyzed partitions 122), or a candidate set may include multiple partitions (e.g., multiple partitions received a same highest score).
  • the partition assigner 240 of FIG. 2 assigns edges to one of the candidate set of partitions.
  • assignment of an edge to a partition refers to loading or storing edge data and/or node data for nodes of the edge in the assigned partition, in accordance with examples herein, the partition assigner 240 selects an appropriate partition from the candidate set of partitions and stores the edge data (and, in some examples, corresponding node data) into the appropriate partition.
  • the partition assigner 240 may refer to the graph distribution monitor 230 to determine loads of the partitions. For example, the graph distribution monitor 230 may track loads of each of the plurality of partitions 122 (see FIG. 3).
  • the load of a partition refers to a number of edges and/or an amount of data stored in the partition.
  • the partition assigner 240 may refer to the graph distribution monitor 230 to find a least loaded partition of the candidate set of partitions (i.e., the partition storing the least amount of edges and/or data).
  • the partition assigner 240 may assign the edge data for an edge (and
  • FIG. 2 While an example manner of implementing the graph partitioner 110 of FIG. 1 is illustrated in FIG, 2, at least one of the elements, processes and/or devices illustrated in FIG. 2 may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way. Further the graph receiver 210, the edge analyzer 220, the graph distribution monitor 230, the partition assigner 240 and/or, more generally, the example graph partitioner 1 10 of FIG. 2 may be implemented by hardware and/or any combination of hardware and executable instructions (e.g., software and/or firmware).
  • hardware and executable instructions e.g., software and/or firmware
  • any of the graph receiver 210, the edge analyzer 220, the graph distribution monitor 230, the partition assigner 240 and/or, more generally, the example graph partitioner 110 could be implemented by at least one of an analog or digital circuit, a logic circuit, a programmable processor, an application specific integrated circuit (ASIC), a programmable logic device (PLD) and/or a field programmable logic device (FPLD).
  • ASIC application specific integrated circuit
  • PLD programmable logic device
  • FPLD field programmable logic device
  • At least one of the graph receiver 210, the edge analyzer 220, the graph distribution monitor 230, and/or the partition assigner 240 is/are hereby expressly defined to include a tangible machine readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc. storing the executable instructions.
  • the graph partitioner 1 10 of FIG. 2 may include at least one element, process, and/or device in addition to, or instead of, those illustrated in FIG. 2, and/or may include more than one of any or ail of the illustrated elements, processes and devices.
  • FIG. 3 is a block diagram of an example implementation a graph distribution monitor 230 that may be used to implement the graph distribution monitor 230 of FIG. 2.
  • the graph distribution monitor 230 of FIG. 3 includes a node tracker 310 and a partition load tracker 320.
  • the example node tracker 310 and the partition load tracker 320 track distribution of graph data throughout the partitions 122 of the memory fabric 120 (e.g., in real time during partitioning of a graph) in accordance with examples herein.
  • the node tracker 310 may thus provide information to the edge analyzer 220 and/or the partition assigner 240 for distribution of edges/edge data across the partitions 122.
  • the example node tracker 310 tracks a runtime state of the distribution of nodes throughout the partitions 122 of FIG. .
  • the node tracker 3 0 includes a node table 312, which may be implemented by any suitable data structure (e.g., a table, an index, a hash table, etc.), that indicates the partitions 122 in which a particular node has been assigned.
  • Node (which may be represented by an identifier in the node table 312) of FIG. 3 may be located within the set of Partitionsi of the plurality of partitions 122. Partitions!
  • the node tracker 310 of FIG. 3 updates the node table 312. Accordingly, the node tracker 310 tracks/indicates which nodes are located in which partitions as graph data is partitioned.
  • the edge analyzer 220 may refer to the node tracker 310 of the graph distribution monitor 230 to determine that nodes (or copies of node data) of an edge are or are not located within any of the partitions 122. in other words, the edge analyzer 220 may refer to the node tracker 310 to determine whether an edge of a graph was previously processed that is linked to at least one of the nodes of another edge of the graph that is under analysis.
  • the example partition load tracker 320 tracks a runtime state of loads of the partitions 122 of FIG. 1 based on the amount of data (e.g., edge data, node data, or any other data) that is stored within each of the partitions 122.
  • the partition load tracker 320 includes a partition load table 322, which may be implemented by any suitable data structure (e.g., a table).
  • Partitioni includes Ni data, where Ni refers to an amount of data in the partition.
  • An amount of data may be representative of or include a number of edges (and corresponding edge data and node data of nodes of the edge) currently stored in the corresponding partition.
  • the partition load tracker 320 of FIG. 3 updates the partition load table 322. Accordingly, the partition load tracker 322
  • the partition assigner 240 may refer to the partition load tracker 320 of the graph distribution monitor 230 to determine the loads of the partitions 122, in particular, to identify a least loaded partition of a candidate set of partitions (e.g., a partition of a candidate set of partitions that includes a lowest amount of data).
  • a priority may be given to one partition over another for receiving the edge data (e.g., based on address, physical location (proximity) within the memory fabric 120, etc.).
  • the graph distribution monitor 230 of FIG. 3 tracks whether nodes (or node data) exist within the partitions 122 and within which partitions the nodes are located and the loads of the partitions 122. Using this information, the graph partitioner 110 may assign edges to the partitions to minimize a number of replicated nodes across the partitions as well as evenly distribute the loads the partitions.
  • FIG. 3 While an example manner of implementing the graph distribution monitor 230 of FIG. 2 is illustrated in FIG. 3, at least one of the elements, processes and/or devices illustrated in FIG. 3 may be combined, divided, rearranged, omitted, eliminated and/or implemented in any other way. Further, the node tracker 310, the node table 312, the partition load tracker 320, the partition load table 322, and/or, more generally, the example graph distribution monitor 230 of FIG, 3 may be implemented by hardware and/or any combination of hardware and executable instructions (e.g., software and/or firmware).
  • hardware and executable instructions e.g., software and/or firmware
  • any of the node tracker 310, the node fable 312, the partition load tracker 320, the partition load table 322, and/or, more generally, the example graph distribution monitor 230 could be implemented by at least one of an analog or digital circuit, a logic circuit, a programmable processor, an application specific integrated circuit (ASIC), a programmable logic device (PLD) and/or a field programmable logic device (FPLD).
  • ASIC application specific integrated circuit
  • PLD programmable logic device
  • FPLD field programmable logic device
  • At least one of the node tracker 310, the node table 312, the partition load tracker 320, and/or the partition load table 322 is/are hereby expressly defined to include a tangible machine readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Biu-ray disk, etc. storing the executable instructions.
  • the example graph distribution monitor 230 of FIG. 3 may include at least one element, process, and/or device in addition to, or instead of, those illustrated in FIG. 3, and/or may include more than one of any or ail of the illustrated elements, processes and devices.
  • FIGS. 4 and 5 Flowcharts representative of example machine readable instructions for implementing the graph partitioner 1 10 of FIGS. 2 are shown in FIGS. 4 and 5.
  • the machine readable instructions may comprise program(s)/process(es) for execution by a processor such as the processor 612 shown in the example processor platform 600 discussed below in connection with FIG. 6.
  • the program(s)/process(es) may be embodied in executable instructions (e.g., software) stored on a tangible machine readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a digital versatile disk (DVD), a Blu-ray disk, or a memory associated with the processor 612, but the entirety of the program(s)/process(es) and/or parts thereof could alternatively be executed by a device other than the processor 612 and/or embodied in firmware or dedicated hardware.
  • a tangible machine readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a digital versatile disk (DVD), a Blu-ray disk, or a memory associated with the processor 612
  • a tangible machine readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a digital versatile disk (DVD), a Blu-ray disk, or a memory associated with the processor 612
  • a tangible machine readable storage medium
  • the example process 400 of FIG. 4 begins with an initiation of the graph partitioner 110 (e.g., upon startup, upon instructions from a user, upon startup of a device graph partitioner 110 (e.g., the graph partitioning system 100), etc.).
  • the example process 400 of FIG. 4 may be executed to assign an edge of a graph to a partition during a partitioning of the graph to the plurality of partitions 122 of the memory fabric 120.
  • the edge analyzer 220 retrieves (or receives) edge data (e.g., from the edge receiver 210) for an edge of a graph to assign the edge data to a partition 122 of the memory fabric 120.
  • the edge analyzer 220 may retrieve an edge and identify nodes associated with the edge.
  • the edge analyzer determines that node data corresponding to at least one node of the pair of nodes linked by the edge exists in a candidate set of partitions of the memory fabric 120.
  • the edge analyzer 220 may refer to node tracker 3 0 of the graph distribution monitor 230 to determine whether data for one or both of the nodes linked by an edge is located in a partition (or multiple partitions) of the plurality of partitions 122. If so, the candidate set of partitions may be any of the partitions that includes node data or an edge that is associated with at least one of the nodes.
  • the partition assigner 240 assigns the edge data for the edge to one of the partitions of the candidate set of partitions based on respective loads of each partition of the candidate set of partitions. For example, at block 430, the partition assigner 240 may compare the loads of the candidate set of partitions (e.g., by referring to the partition load tracker 320 of the graph distribution monitor 230), identify a least loaded partition, and assign the edge data to the least loaded partition. In some examples, if, at block 430, the candidate set of partitions includes a single partition, then the partition assigner 240 may assign the edge data to that single partition assuming the load of that single partition is not at maximum capacity (i.e., that the partition is full). In some examples, if a selected partition is at capacity, a secondary candidate partition may be selected, or data within the selected candidate partition may be redistributed to other appropriate partitions. After block 430, the example process 400 ends.
  • the example process 500 of FIG. 5 begins with an initiation of the graph partitioner 110.
  • the example process 500 may be executed to partition a graph into the plurality of partitions 122 of the memory fabric 120.
  • the edge analyzer 220 (and/or the graph receiver 220) selects a next edge 505 for assignment to a partition, in some examples, if the next edge 505 is a first edge of a graph to be assigned, the first edge may be assigned to any one of the partitions 122 (e.g., a first listed partition, a nearest partition based on a physical location, etc.),
  • the edge analyzer 220 determines whether a candidate set of partitions exist that includes one of the nodes linked by the selected edge. For example, at block 510, the edge analyzer 220 may refer to the node tracker 310 to determine whether a partition (or multiple partitions) exist that include (e.g., is/are storing, is/are assigned, etc.) node data corresponding to at least one of the nodes linked by the edge. If, at block 510, a candidate set of partitions exist that includes one of the nodes linked by the selected edge, control advances to block 520. If no such candidate set exists that includes one of the nodes linked by the selected edge (block 510), then, at block 5 5, the partition assigner 240 may assign the edge to any one of the partitions 122 that is least loaded,
  • the edge analyzer 220 determines whether a candidate set of partitions exist that includes both of the nodes linked by the selected edge. For example, at block 510, the edge analyzer 220 may refer to the node tracker 310 to determine whether a partition (or multiple partitions) exist that include node data corresponding to both of the nodes linked by the edge. If, at block 510, no such candidate set of partitions exists that includes both of the nodes linked by the selected edge (i.e., only one of the nodes is found within the partitions 122), then control advances to block 530. if a candidate set of partitions does exist that includes both of the nodes linked by the selected edge (block 520), then, at block 525, the partition assigner 240 assigns the edge to least loaded partition of the set that includes both nodes of the edge.
  • the edge analyzer 220 determines a degree of each of the nodes of the selected edge.
  • the edge analyzer 220 may refer to node characteristics or an analysis performed identifying the number of edge linked to each of the nodes (i.e., the degree). It is noted that from the analysis of blocks 510 and 520, at block 530, only one of the nodes linked by the edge selected at block 505 is located within at least one of the partitions 122.
  • the edge analyzer 220 determines that both of the nodes are either high-degree or that both of the nodes are low-degree (i.e., one of the nodes is a high-degree node and one of the nodes is a low-degree node), then control advances to block 545. If the edge analyzer 220 determines that both nodes of the selected edge are a high- degree node or both nodes of the edge are a low-degree node (block 535), then, at block 540, the partition assigner 240 assigns the edge to the least loaded partition of the set of partitions that includes the node.
  • the edge analyzer 220 determines whether the candidate set of partitions includes the low-degree node (or a copy of the node data for the low degree node). If, at block 545, the candidate set of partitions includes partitions with the low-degree node, then the partition assigner 240 assigns the edge to the least loaded partition that includes the low-degree node (block 550).
  • the partition assigner 240 assigns the edge to the least loaded partition that includes the high-degree node.
  • the graph partitioner 1 10 determines whether more edges of the graph are to be assigned to the partitions 122 of the memory fabric 120. If, at block 580, the graph partitioner 110 determines that more edges are to be assigned to the partitions (i.e., the graph partitioner 110 has not completed partitioning the graph), then control returns to block 505 to assign the next edge, if the graph partitioner 110 determines that there are no more edges to be assigned to the partitions 122 (i.e., the graph partitioning is complete), then the example process 500 ends, in the example process 500 of FIG.
  • the partition assigner 240 may refer to the partition load tracker 320 to determine the least loaded partition of the partitions 122 or of the candidate set of partitions.
  • FIGS. 4 and 5 may be implemented using coded instructions (e.g., computer and/or machine readable instructions) stored on a tangible machine readable storage medium such as a hard disk drive, a flash memory, a read-only memory (ROM), a compact disk (CD), a digital versatile disk (DVD), a cache, a random-access memory (RAM) and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information).
  • coded instructions e.g., computer and/or machine readable instructions
  • a tangible machine readable storage medium such as a hard disk drive, a flash memory, a read-only memory (ROM), a compact disk (CD), a digital versatile disk (DVD), a cache, a random-access memory (RAM) and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances,
  • tangible machine readable storage medium is expressly defined to include any type of machine readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media.
  • computer readable storage medium and “machine readable storage medium” are used interchangeably. Additionally or alternatively, the example processes of FIGS.
  • non-transitory computer and/or machine readable medium such as a hard disk drive, a flash memory, a read-only memory, a compact disk, a digital versatile disk, a cache, a random-access memory and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information).
  • a non-transitory computer and/or machine readable medium such as a hard disk drive, a flash memory, a read-only memory, a compact disk, a digital versatile disk, a cache, a random-access memory and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information).
  • a non-transitory computer and/or machine readable medium such as a hard disk drive, a flash memory, a read-only memory, a compact disk,
  • FIG. 8 is a block diagram of an example processor platform 600 capable of executing the instructions of FIGS. 4 and 5 to implement the graph partitioner of FIG. 2,
  • the example processor platform 600 may be or may be included in any type of apparatus, such as a server, a personal computer, a mobile device (e.g., a cell phone, a smart phone, a tablet, etc.), any other type of computing device.
  • the processor platform 600 of the illustrated example of FIG, 6 includes a processor 612.
  • the processor 612 of the illustrated example is hardware.
  • the processor 612 can be implemented by at least one integrated circuit, logic circuit, microprocessor or controller from any desired family or manufacturer.
  • the processor 612 of the illustrated example includes a local memory 613 (e.g., a cache).
  • the processor 612 of the illustrated example is in communication with a main memory including a volatile memory 614 and a nonvolatile memory 616 via a bus 618.
  • the volatile memory 614 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM) and/or any other type of random access memory device.
  • the non-volatile memory 616 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 614, 616 is controlled by a memory controller.
  • the processor platform 600 of the illustrated example also includes an interface circuit 620.
  • the interface circuit 620 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), and/or a peripheral component interconnect (PCI) express interface.
  • At least one input device 622 is connected to the interface circuit 620.
  • the input device(s) 622 permit(s) a user to enter data and commands into the processor 612.
  • the input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, and/or a voice recognition system.
  • At least one output device 624 is also connected to the interface circuit 620 of the illustrated example.
  • the output device(s) 624 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display, a cathode ray tube display (CRT), a touchscreen, a tactile output device, a light emitting diode (LED), a printer and/or speakers).
  • the interface circuit 620 of the illustrated example thus, may include a graphics driver card, a graphics driver chip or a graphics driver processor.
  • the interface circuit 620 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem and/or network interface card to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 626 (e.g., an Ethernet connection, a digital subscriber line (DSL), a telephone line, coaxial cable, a cellular telephone system, etc.).
  • a communication device such as a transmitter, a receiver, a transceiver, a modem and/or network interface card to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 626 (e.g., an Ethernet connection, a digital subscriber line (DSL), a telephone line, coaxial cable, a cellular telephone system, etc.).
  • DSL digital subscriber line
  • the processor platform 600 of the illustrated example also includes at least one mass storage device 628 for storing executable
  • mass storage device(s) 628 examples include floppy disk drives, hard drive disks, compact disk drives, Biu-ray disk drives, RAID systems, and digital versatile disk (DVD) drives,
  • the coded instructions 632 of FIGS. 4 and/or 5 may be stored in the mass storage device 628, in the local memory 613 in the volatile memory 614, in the non-volatile memory 616, and/or on a removable tangible machine readable storage medium such as a CD or DVD.
  • examples herein partition a graph to limit or minimize a number of replicated nodes across partitions of a graph as well as attempt to evenly distribute edges across the partitions of the graph such that the loads of the partitions are relatively even (e.g., within a threshold distribution). Accordingly, examples herein provide a partitioned graph that may be processed with relatively increased efficiency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Examples herein involve partitioning a graph into partitions of a memory based on degrees of nodes and loads of partitions. Examples involve retrieving edge data for an edge, determining node data for nodes of the edge exists in partitions of a memory, and assigning the edge data for the edge to a selected one of the partitions based on loads of the partitions.

Description

^ uKDAA QrUn Q PA A DKT 1 I 1 T I ! l iU**¾¾,lS ίl Sil!iUIWk
BACKGROUND
[0001] A graph is a representation of a set of data (e.g., Big Data). An example graph may include a plurality of nodes (or vertices) and edges connecting the plurality of edges. The graph may be processed by executing the nodes in accordance with characteristics of edges linked to the nodes and related nodes linked by the edges.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] FIG. 1 is a schematic diagram of an example graph partitioning system including an example graph partitioner constructed in accordance with examples herein.
[0003] FIG. 2 is a block diagram of an example graph partitioner that may be used to implement the graph partitioner of FIG. 1.
[0004] FIG. 3 is a block diagram of an example graph distribution monitor that may be implemented by the graph partitioner of FIGS. 1 and/or 2.
[0005] FIG. 4 is a flowchart representative of example machine readable instructions that may be executed to implement the graph partitioner of FIG. 2 to assign an edge to a partition in accordance with an aspect of this disclosure.
[0006] FIG. 5 is another flowchart representative of example machine readable instructions that may be executed to implement the graph partitioner of FIG. 2 to partition a graph in accordance with an aspect of this disclosure.
[0007] FIG. 8 is a block diagram of an example processor platform capable of executing the instructions of FIGS. 4 and/or 5 to implement the graph partitioner of FIG. 2, [0008] Wherever possible, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts,
DETAILED DESCRIPTION
[0009] Examples disclosed herein involve partitioning a graph by distributing the edges of the graph across partitions of a memory based on degrees of nodes (number of edges linked to the nodes) linked by the edges and the loads of the partitions (amount of data or edge data in each partition). Partitioning of a graph is tracked to monitor the locations of the nodes across the partitions and the loads of the partitions storing the graph data. Accordingly, examples herein allow for edges to be assigned to particular partitions that may limit or minimize a number of replicated nodes across the partitions and evenly distribute the loads of the partitions.
[0010] When partitioning graphs, edge data for edges and node data for corresponding nodes is distributed across a plurality of partitions of a memory. Node-cut (or vertex-cut) partitioning involves distributing the edges across the partitions such that no same edge is included within a same partition.
Accordingly, because multiple edges may be linked to a single node, nodes may be replicated across the plurality of partitions. When a partitioned graph is processed, each partition is processed separately by respective graph processors of the respective partitions. Accordingly, to ensure accurate processing of a partitioned graph, replicated nodes may need to be reprocessed multiple times as the nodes may be updated by other graph processors from other partitions. Thus, to more quickly and effectively reach convergence of the graph, it may be advantageous to lessen the number of replicated nodes across the partitions.
[0011] Examples herein involve graph partitioning that seeks to lessen (e.g., minimize) the amount of replicated nodes across the partitions and seeks to equally distribute the loads of the partitions. Accordingly, these benefits may enhance future processing of the partitioned graph as reprocessing of replicated nodes may be lessened, thus decreasing a number of iterations to process the graph to reach convergence. Furthermore, even distribution across the partitions may achieve less wait time among the graph processors, as respective graph processors of the partitions may avoid waiting for graph processor(s) processing graph data of a partition that is overloaded relative to the other graph partitions.
[0012]An example method includes retrieving edge data for an edge of a graph to assign the edge data to a partition of a memory, the edge linking a pair of nodes of the graph; determining that node data corresponding to at least one node of the pair of nodes linked by the edge exists in a candidate set of partitions of the memory; and assigning the edge data for the edge to a selected one of the partitions of the candidate set of partitions based on respective loads of each partition of the candidate set of partitions
[0013] In examples herein, when referring to a partition "including" a particular edge or a particular node, it may be read that the partition includes a copy of edge data or a copy of node data for that particular edge or node, respectively. Furthermore, in examples herein, when an edge (and/or a node) is assigned to a particular partition, it may be read that edge data for the edge (and/or node data for the node) has been written (e.g., loaded, copied, stored) to that particular partition.
[0014] FIG. 1 is a schematic diagram of an example graph partitioning system 100 including an example graph partitioner 110 constructed in accordance with examples herein. The example graph partitioning system may be implemented by any computing device(s), such as server(s), personal computer(s), etc. The graph partitioning system 100 in the example of FIG. 1 includes the graph partitioner 1 10, a memory fabric 120 with a plurality of graph partitions 122 (which may be referred to herein as the partitions 122), a graph database 130, and a user interface 140. in the illustrated example of FIG. 1 , the graph partitioner 110 partitions graphs from the graph database 130 into the graph partitions 122 of the memory fabric 120 (e.g., based on settings or instructions received from the user interface 140).
[0015] The example memory fabric 120 may be any suitable memory, such as a dynamic random access memory (DRAM) and/or a non-volatile memory (e.g., a memristor, a phase-change memory, etc.). The memory fabric 120 may include or be a part of a shared-memory that is accessible to other systems (e.g., other graph partitioning systems similar to the graph partitioning system 00 of FIG. 1 , computing devices (e.g., computers, servers, etc), etc.). The example graph partitions 122 of the memory fabric 120 store graph data (e.g., edge data for edges and node data for nodes) for a graph distributed across the graph partitions 122. For example, some or ail of the graph partitions 122 may store graph data for a same graph (e.g., edge data for edges of the graph and node data for nodes of the graph linked by the edges of the graph).
[0016] In examples herein, the graph partitioner 110 may partition a graph from the graph database 130 such that edge data for edges of a graph is distributed across different graph partitions of the plurality of graph partitions 122. Accordingly, edge data for a same edge may not be located in more than one partition of the plurality of partitions 122. Furthermore, node data corresponding to nodes linked by edges of the edge data is included in the corresponding partitions 22. Thus for example, if edge data for a first edge is located in a first partition of the plurality of partitions 122, node data
corresponding to the nodes linked by the first edge is also stored in the first partition. Thus, in examples herein, because the nodes linked by the first edge may also be linked to other nodes via additional edges, node data for the same nodes may be replicated (i.e., copied, duplicated, etc.) across multiple partitions of the plurality of partitions 122, Such nodes may be referred to herein as replicate nodes, in examples herein, the graph partitioner 110 may distribute graph data among the plurality of partitions 122 with efforts to minimize or limit a number of replicated nodes across the plurality of partitions 122.
[0017] The example user interface 140 facilitates user interaction with the graph partitioner 1 10. For example, the user interface 140 may include user input(s) (e.g., a keyboard, a mouse, a trackball, a touchscreen, etc.) and/or user output(s) (e.g., a display, a touchscreen, speakers, indicators, etc.). In examples herein, the user interface 140 enables a user to access and control settings of the graph partitioner 1 10. For example, the user interface 40 may request partitioning of a particular graph from the graph database 130. Furthermore, the user interface 140 enables a user to indicate or set partitioning settings (e.g., a number of the partitions 122 to use, threshold number of edges associated with a node used to differentiate low-degree nodes from high-degree nodes, etc.) to partition the selected graph or particular types of graphs/graph data in accordance with examples herein.
[0018] The example graph database 130 stores graph data for graphs. The example graph database 130 may be implemented by at least a storage device, a computing device, a network device, etc, that may store or provide graph data for graphs that are to be partitioned in accordance with examples herein. The example graph partitioner 1 10 may receive/retrieve graph data for graphs from the graph database 130. Accordingly, the graph partitioner 110 may load graph data into respective partitions of the plurality of partitions 122 in accordance with the examples herein.
[0019] FIG. 2 is a block diagram of an example graph partitioner 1 10 that may be used to implement the graph partitioner 110 of FIG. 1. The example graph partitioner 110 of FIG. 2 includes a graph receiver 210, an edge analyzer 220, a graph distribution monitor 230, and a partition assigner 240. An example communication bus 250 in FIG. 2 may facilitate communication between the graph receiver 210, the edge analyzer 220, the graph distribution monitor 230, and the partition assigner 240. In examples herein, the graph receiver 210 receives/retrieves a graph (e.g., from the graph database 130), the edge analyzer 220 analyzes edges of the graph, the graph distribution monitor 220 tracks a status of distribution of graph data for the received graph into partitions (e.g. by tracking runtime states of the distribution, such as node locations and loads of the partitions), and the partition assigner 240 assigns graph data (e.g., edge data for edges and/or node data for corresponding nodes to respective partitions) to the partitions 122.
[0020] The example graph receiver 210 of FIG. 2 may serve as a buffer for receiving (or temporarily storing retrieved) graph data from the graph database 130. In some examples, the graph receiver 210 may include a queue that receives and/or provides graph data in a specific order. For example, the queue may organize or sort the graph data by edges, such that edge data for each edge is provided one-by-one to the edge analyzer 220, In some examples, the graph receiver 210 may also organize the graph data such that node data for nodes linked by each edge is stored with the corresponding edge data,
[0021]The edge analyzer 220 of FIG. 2 analyzes edges (and
corresponding edge data and node data of nodes linked by the edges) of a graph to be distributed across the plurality of partitions 122 of the memory fabric 120, For example, the edge analyzer 220 may retrieve/receive edge data for an edge (e.g., one-by-one) and node data for nodes linked by the edge from the graph receiver 210. The edge analyzer 220 may analyze the edge data (and/or node data) to identify the nodes linked by the edge. For each edge, the edge analyzer 220 determines an appropriate partition (e.g., an optimal partition in accordance with examples herein, etc.) or candidate set of partitions for assignment of edge data of the edge and, in some examples, the node data corresponding to the nodes of the edge.
[0022] In examples herein, for each edge that is to be assigned to one of the partitions 122, the edge analyzer 220 may compute degrees of the nodes of the edge. As used herein, a degree of a node represents the number of nodes (or number of edges) linked to another node. In some examples, the edge analyzer 220 may analyze graph data (e.g., during a pre-processing analysis) for the graph by scanning all edges and nodes to determine the degrees of the nodes of the graph. In some examples, the edge analyzer 220 may determine degrees of the nodes from the node data which may include a characteristic indicating the degree of the nodes. Based on the degree of each of the nodes of an edge, the edge analyzer 220 may determine whether the nodes are low- degree nodes or high degree nodes. A threshold number of links to the nodes may determine whether the nodes are low-degree nodes (less than the threshold number of links) or high-degree nodes (greater than the threshold number of links). The example threshold may be adjustable (e.g., using a user input). Accordingly, different settings may be used for partitioning different graphs. In some examples, the edge analyzer 220 may set or adjust the example threshold (e.g., without user input) based on characteristics of the graph data (e.g., using an analysis on average degree of all nodes of the graph).
[0023] In examples herein, the edge analyzer 220 refers to the graph distribution monitor 230 to determine appropriate partitions (e.g., an optimal) for assignment of edge data for each of the edges of the graph (e.g., an optimal partition to minimize a number of replicated nodes and/or evenly distribute loads of the partitions). For example, the graph distribution monitor 230 may track the number of partitions that include node data (and/or edge data) for each node of the graph (i.e., a list of partitions on which each node is replicated, if at ail) (see FIG. 3).
[0024] In examples herein, for each node, the edge analyzer 220 performs a comparative analysis of the partitions to determine an appropriate partition or a candidate set of partitions for assignment of the edge (by storing the edge data/node data of nodes of the edge). The example comparative analysis may minimize replications of nodes in the partitions and evenly distribute loads of the partitions. For example, the edge analyzer 220 may compute a score for each of the plurality of partitions 122 to determine the appropriate partition for assignment of the edge data of an edge (and potential corresponding node data if the node data does not exist in the partition). The example score for each of the partitions may represent an amount of replication that may be avoided and is based on the degree (e.g., high-degree, low-degree, etc) of the nodes of the edge. For example, the edge analyzer 220 may implement or execute the following pseudocode to determine scores for each of the partitions for an edge:
assign(edge) :
let vertexl and vertex2 be the vertices on which edge incident
for each partition p :
score [ρ] == 0
/ score represents the amount of replication that be avoided
// weight/priority for low degree vertices : L
/ weight/priority for high degree vertices : H
// higher weight is given to low-degree-vertices ,
L. > H
// e . g . , L = 2 and H = 1
if vertexl is already replicated in p :
if vertexl is a low-degree vertex:
score [p] += L
else :
score[p] += H
if vertex2 is already replicated in p :
if vertex2 is a low-degree vertex :
score[p] += L
else
score[p] += H
[0025] After computing the scores for each of the partitions for an edge, the edge analyzer 220 may determine a candidate set of partitions. The example candidate set of partitions may be those partitions that share a same highest score using the pseudocode above (or any other type of comparative analysis of the partitions). In some examples, no candidate set of partitions may exist for an edge (e.g., none of the partitions 122 include at least one node of an edge under analysis, thus ail partitions 122 have a score = 0), a candidate set may include one partition (e.g., only one of the partitions 122 had a highest score among the other analyzed partitions 122), or a candidate set may include multiple partitions (e.g., multiple partitions received a same highest score). In the example pseudocode above priority is given to low-degree nodes (referred to as vertices or a "vertex" in the pseudocode above) as they receive a higher score, thus, those partitions that include the low-degree node of the edge may be assigned the edge data before assigning the edge data to a partition with the high-degree node of the edge. The edge analyzer 220 may then provide the candidate set of partitions to the partition assigner 240 to select and assign the edge data to the appropriate partition.
[0026] In examples herein, the partition assigner 240 of FIG. 2 assigns edges to one of the candidate set of partitions. In examples herein, assignment of an edge to a partition refers to loading or storing edge data and/or node data for nodes of the edge in the assigned partition, in accordance with examples herein, the partition assigner 240 selects an appropriate partition from the candidate set of partitions and stores the edge data (and, in some examples, corresponding node data) into the appropriate partition. The partition assigner 240 may refer to the graph distribution monitor 230 to determine loads of the partitions. For example, the graph distribution monitor 230 may track loads of each of the plurality of partitions 122 (see FIG. 3). As used herein, the load of a partition refers to a number of edges and/or an amount of data stored in the partition. Accordingly, the partition assigner 240 may refer to the graph distribution monitor 230 to find a least loaded partition of the candidate set of partitions (i.e., the partition storing the least amount of edges and/or data). The partition assigner 240 may assign the edge data for an edge (and
corresponding node data) to the least loaded partition in order to evenly distribute the edge data throughout the partitions 122 of the memory fabric 120, and, thus, avoid overloading partition(s) with edge data relative to other partitions. In such examples, retrieving edge data from the partitions 122 when processing the graph data distributed across the partitions 122 may be relatively more efficient.
[0027] While an example manner of implementing the graph partitioner 110 of FIG. 1 is illustrated in FIG, 2, at least one of the elements, processes and/or devices illustrated in FIG. 2 may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way. Further the graph receiver 210, the edge analyzer 220, the graph distribution monitor 230, the partition assigner 240 and/or, more generally, the example graph partitioner 1 10 of FIG. 2 may be implemented by hardware and/or any combination of hardware and executable instructions (e.g., software and/or firmware). Thus, for example, any of the graph receiver 210, the edge analyzer 220, the graph distribution monitor 230, the partition assigner 240 and/or, more generally, the example graph partitioner 110 could be implemented by at least one of an analog or digital circuit, a logic circuit, a programmable processor, an application specific integrated circuit (ASIC), a programmable logic device (PLD) and/or a field programmable logic device (FPLD). When reading any of the apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the graph receiver 210, the edge analyzer 220, the graph distribution monitor 230, and/or the partition assigner 240 is/are hereby expressly defined to include a tangible machine readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc. storing the executable instructions. Further still, the example the graph partitioner 1 10 of FIG. 2 may include at least one element, process, and/or device in addition to, or instead of, those illustrated in FIG. 2, and/or may include more than one of any or ail of the illustrated elements, processes and devices.
[0028] FIG. 3 is a block diagram of an example implementation a graph distribution monitor 230 that may be used to implement the graph distribution monitor 230 of FIG. 2. The graph distribution monitor 230 of FIG. 3 includes a node tracker 310 and a partition load tracker 320. The example node tracker 310 and the partition load tracker 320 track distribution of graph data throughout the partitions 122 of the memory fabric 120 (e.g., in real time during partitioning of a graph) in accordance with examples herein. The node tracker 310 may thus provide information to the edge analyzer 220 and/or the partition assigner 240 for distribution of edges/edge data across the partitions 122.
[0029] The example node tracker 310 tracks a runtime state of the distribution of nodes throughout the partitions 122 of FIG. . in the example of FIG. 3, the node tracker 3 0 includes a node table 312, which may be implemented by any suitable data structure (e.g., a table, an index, a hash table, etc.), that indicates the partitions 122 in which a particular node has been assigned. For example, Node; (which may be represented by an identifier in the node table 312) of FIG. 3 may be located within the set of Partitionsi of the plurality of partitions 122. Partitions! may be a list of identifiers (e.g., numbers, addresses, locations, etc) identifying the partitions 122 of the memory fabric 120. As edges/edge data and corresponding nodes/node data are distributed (partitioned) across the partitions 122 of FIG. 1 , the node tracker 310 of FIG. 3 updates the node table 312. Accordingly, the node tracker 310 tracks/indicates which nodes are located in which partitions as graph data is partitioned.
Therefore, the edge analyzer 220 may refer to the node tracker 310 of the graph distribution monitor 230 to determine that nodes (or copies of node data) of an edge are or are not located within any of the partitions 122. in other words, the edge analyzer 220 may refer to the node tracker 310 to determine whether an edge of a graph was previously processed that is linked to at least one of the nodes of another edge of the graph that is under analysis.
[0030] The example partition load tracker 320 tracks a runtime state of loads of the partitions 122 of FIG. 1 based on the amount of data (e.g., edge data, node data, or any other data) that is stored within each of the partitions 122. In the example of FIG. 3, the partition load tracker 320 includes a partition load table 322, which may be implemented by any suitable data structure (e.g., a table). For example, Partitioni includes Ni data, where Ni refers to an amount of data in the partition. An amount of data may be representative of or include a number of edges (and corresponding edge data and node data of nodes of the edge) currently stored in the corresponding partition. As edges/edge data and corresponding nodes/node data are distributed (partitioned) across the partitions 122 of FIG. 1 , the partition load tracker 320 of FIG. 3 updates the partition load table 322. Accordingly, the partition load tracker 322
tracks/indicates loads of the partitions 122 that is representative of an amount of data (e.g., an amount of data for edges and corresponding nodes) that is stored within the partitions 122. Therefore, the partition assigner 240 may refer to the partition load tracker 320 of the graph distribution monitor 230 to determine the loads of the partitions 122, in particular, to identify a least loaded partition of a candidate set of partitions (e.g., a partition of a candidate set of partitions that includes a lowest amount of data). In some examples, if the partition load tracker 320 indicates that multiple partitions of a candidate set of partitions have a same load, a priority may be given to one partition over another for receiving the edge data (e.g., based on address, physical location (proximity) within the memory fabric 120, etc.).
[0031] Accordingly, the graph distribution monitor 230 of FIG. 3 tracks whether nodes (or node data) exist within the partitions 122 and within which partitions the nodes are located and the loads of the partitions 122. Using this information, the graph partitioner 110 may assign edges to the partitions to minimize a number of replicated nodes across the partitions as well as evenly distribute the loads the partitions.
[0032] While an example manner of implementing the graph distribution monitor 230 of FIG. 2 is illustrated in FIG. 3, at least one of the elements, processes and/or devices illustrated in FIG. 3 may be combined, divided, rearranged, omitted, eliminated and/or implemented in any other way. Further, the node tracker 310, the node table 312, the partition load tracker 320, the partition load table 322, and/or, more generally, the example graph distribution monitor 230 of FIG, 3 may be implemented by hardware and/or any combination of hardware and executable instructions (e.g., software and/or firmware). Thus, for example, any of the node tracker 310, the node fable 312, the partition load tracker 320, the partition load table 322, and/or, more generally, the example graph distribution monitor 230 could be implemented by at least one of an analog or digital circuit, a logic circuit, a programmable processor, an application specific integrated circuit (ASIC), a programmable logic device (PLD) and/or a field programmable logic device (FPLD). When reading any of the apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the node tracker 310, the node table 312, the partition load tracker 320, and/or the partition load table 322 is/are hereby expressly defined to include a tangible machine readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Biu-ray disk, etc. storing the executable instructions. Further still, the example graph distribution monitor 230 of FIG. 3 may include at least one element, process, and/or device in addition to, or instead of, those illustrated in FIG. 3, and/or may include more than one of any or ail of the illustrated elements, processes and devices. [0033] Flowcharts representative of example machine readable instructions for implementing the graph partitioner 1 10 of FIGS. 2 are shown in FIGS. 4 and 5. In these examples, the machine readable instructions may comprise program(s)/process(es) for execution by a processor such as the processor 612 shown in the example processor platform 600 discussed below in connection with FIG. 6. The program(s)/process(es) may be embodied in executable instructions (e.g., software) stored on a tangible machine readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a digital versatile disk (DVD), a Blu-ray disk, or a memory associated with the processor 612, but the entirety of the program(s)/process(es) and/or parts thereof could alternatively be executed by a device other than the processor 612 and/or embodied in firmware or dedicated hardware. Further, although the example program(s)/process(es) is/are described with reference to the flowcharts illustrated in FIG. 4 and/or 5, many other methods of implementing the example graph partitioner 10 may alternatively be used. For example, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, or combined.
[0034] The example process 400 of FIG. 4 begins with an initiation of the graph partitioner 110 (e.g., upon startup, upon instructions from a user, upon startup of a device graph partitioner 110 (e.g., the graph partitioning system 100), etc.). The example process 400 of FIG. 4, may be executed to assign an edge of a graph to a partition during a partitioning of the graph to the plurality of partitions 122 of the memory fabric 120. At block 410 of FIG. 4, the edge analyzer 220 retrieves (or receives) edge data (e.g., from the edge receiver 210) for an edge of a graph to assign the edge data to a partition 122 of the memory fabric 120. For example, at block 410, the edge analyzer 220 may retrieve an edge and identify nodes associated with the edge. At block 420, the edge analyzer determines that node data corresponding to at least one node of the pair of nodes linked by the edge exists in a candidate set of partitions of the memory fabric 120. For example, at block 420, the edge analyzer 220 may refer to node tracker 3 0 of the graph distribution monitor 230 to determine whether data for one or both of the nodes linked by an edge is located in a partition (or multiple partitions) of the plurality of partitions 122. If so, the candidate set of partitions may be any of the partitions that includes node data or an edge that is associated with at least one of the nodes.
[0035] At block 430 of FIG, 4, the partition assigner 240 assigns the edge data for the edge to one of the partitions of the candidate set of partitions based on respective loads of each partition of the candidate set of partitions. For example, at block 430, the partition assigner 240 may compare the loads of the candidate set of partitions (e.g., by referring to the partition load tracker 320 of the graph distribution monitor 230), identify a least loaded partition, and assign the edge data to the least loaded partition. In some examples, if, at block 430, the candidate set of partitions includes a single partition, then the partition assigner 240 may assign the edge data to that single partition assuming the load of that single partition is not at maximum capacity (i.e., that the partition is full). In some examples, if a selected partition is at capacity, a secondary candidate partition may be selected, or data within the selected candidate partition may be redistributed to other appropriate partitions. After block 430, the example process 400 ends.
[0038] The example process 500 of FIG. 5 begins with an initiation of the graph partitioner 110. The example process 500 may be executed to partition a graph into the plurality of partitions 122 of the memory fabric 120. At block 505, the edge analyzer 220 (and/or the graph receiver 220) selects a next edge 505 for assignment to a partition, in some examples, if the next edge 505 is a first edge of a graph to be assigned, the first edge may be assigned to any one of the partitions 122 (e.g., a first listed partition, a nearest partition based on a physical location, etc.),
[0037]At block 510 of FIG. 5, the edge analyzer 220 determines whether a candidate set of partitions exist that includes one of the nodes linked by the selected edge. For example, at block 510, the edge analyzer 220 may refer to the node tracker 310 to determine whether a partition (or multiple partitions) exist that include (e.g., is/are storing, is/are assigned, etc.) node data corresponding to at least one of the nodes linked by the edge. If, at block 510, a candidate set of partitions exist that includes one of the nodes linked by the selected edge, control advances to block 520. If no such candidate set exists that includes one of the nodes linked by the selected edge (block 510), then, at block 5 5, the partition assigner 240 may assign the edge to any one of the partitions 122 that is least loaded,
[0038] At block 520 of FIG. 5, the edge analyzer 220 determines whether a candidate set of partitions exist that includes both of the nodes linked by the selected edge. For example, at block 510, the edge analyzer 220 may refer to the node tracker 310 to determine whether a partition (or multiple partitions) exist that include node data corresponding to both of the nodes linked by the edge. If, at block 510, no such candidate set of partitions exists that includes both of the nodes linked by the selected edge (i.e., only one of the nodes is found within the partitions 122), then control advances to block 530. if a candidate set of partitions does exist that includes both of the nodes linked by the selected edge (block 520), then, at block 525, the partition assigner 240 assigns the edge to least loaded partition of the set that includes both nodes of the edge.
[0039] At block 530 of FIG. 5, the edge analyzer 220 determines a degree of each of the nodes of the selected edge. For example, at block 530, the edge analyzer 220 may refer to node characteristics or an analysis performed identifying the number of edge linked to each of the nodes (i.e., the degree). It is noted that from the analysis of blocks 510 and 520, at block 530, only one of the nodes linked by the edge selected at block 505 is located within at least one of the partitions 122. if, at block 535, the edge analyzer 220 does not determine that both of the nodes are either high-degree or that both of the nodes are low-degree (i.e., one of the nodes is a high-degree node and one of the nodes is a low-degree node), then control advances to block 545. If the edge analyzer 220 determines that both nodes of the selected edge are a high- degree node or both nodes of the edge are a low-degree node (block 535), then, at block 540, the partition assigner 240 assigns the edge to the least loaded partition of the set of partitions that includes the node.
[0040] It is noted, that following the analysis of blocks 510, 520, and 535, one of the two nodes linked by the edge is included in a partition (or multiple partitions) of the partitions 122, and the selected edge from block 505 links a low-degree node and a high-degree node. At block 545 of FIG. 5, the edge analyzer 220 determines whether the candidate set of partitions includes the low-degree node (or a copy of the node data for the low degree node). If, at block 545, the candidate set of partitions includes partitions with the low-degree node, then the partition assigner 240 assigns the edge to the least loaded partition that includes the low-degree node (block 550). If the candidate set of partitions does not include partitions with the low-degree node (i.e., the candidate set of partitions includes partition(s) with the high-degree node) (block 545), then, at block 555, the partition assigner 240 assigns the edge to the least loaded partition that includes the high-degree node.
[0041] At block 560, the graph partitioner 1 10 determines whether more edges of the graph are to be assigned to the partitions 122 of the memory fabric 120. If, at block 580, the graph partitioner 110 determines that more edges are to be assigned to the partitions (i.e., the graph partitioner 110 has not completed partitioning the graph), then control returns to block 505 to assign the next edge, if the graph partitioner 110 determines that there are no more edges to be assigned to the partitions 122 (i.e., the graph partitioning is complete), then the example process 500 ends, in the example process 500 of FIG. 5, when assigning the edge to a least loaded partition (e.g., at blocks 515, 525, 540, 550), the partition assigner 240 may refer to the partition load tracker 320 to determine the least loaded partition of the partitions 122 or of the candidate set of partitions.
[0042] As mentioned above, the example processes of FIGS. 4 and 5 may be implemented using coded instructions (e.g., computer and/or machine readable instructions) stored on a tangible machine readable storage medium such as a hard disk drive, a flash memory, a read-only memory (ROM), a compact disk (CD), a digital versatile disk (DVD), a cache, a random-access memory (RAM) and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information). As used herein, the term tangible machine readable storage medium is expressly defined to include any type of machine readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media. As used herein, "computer readable storage medium" and "machine readable storage medium" are used interchangeably. Additionally or alternatively, the example processes of FIGS. 4 and 5may be implemented using coded instructions (e.g., computer and/or machine readable instructions) stored on a non-transitory computer and/or machine readable medium such as a hard disk drive, a flash memory, a read-only memory, a compact disk, a digital versatile disk, a cache, a random-access memory and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information). As used herein, the term non- transitory machine readable medium is expressly defined to include any type of machine readable storage device and/or storage disk and to exclude
propagating signals and to exclude transmission media.
[0043] As used herein, when the phrase "at least" is used as the transition term in a preamble of a claim, it is open-ended in the same manner as the term "comprising" is open ended. As used herein the term "a" or "an" may mean "at least one," and therefore, "a" or "an" do not necessarily limit a particular element to a single element when used to describe the element. As used herein, when the term "or" is used in a series, it is not, unless otherwise indicated, considered an "exclusive or."
[0044] FIG. 8 is a block diagram of an example processor platform 600 capable of executing the instructions of FIGS. 4 and 5 to implement the graph partitioner of FIG. 2, The example processor platform 600 may be or may be included in any type of apparatus, such as a server, a personal computer, a mobile device (e.g., a cell phone, a smart phone, a tablet, etc.), any other type of computing device.
[0045] The processor platform 600 of the illustrated example of FIG, 6 includes a processor 612. The processor 612 of the illustrated example is hardware. For example, the processor 612 can be implemented by at least one integrated circuit, logic circuit, microprocessor or controller from any desired family or manufacturer.
[0046] The processor 612 of the illustrated example includes a local memory 613 (e.g., a cache). The processor 612 of the illustrated example is in communication with a main memory including a volatile memory 614 and a nonvolatile memory 616 via a bus 618. The volatile memory 614 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM) and/or any other type of random access memory device. The non-volatile memory 616 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 614, 616 is controlled by a memory controller.
[0047] The processor platform 600 of the illustrated example also includes an interface circuit 620. The interface circuit 620 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), and/or a peripheral component interconnect (PCI) express interface.
[0048] In the illustrated example, at least one input device 622 is connected to the interface circuit 620. The input device(s) 622 permit(s) a user to enter data and commands into the processor 612. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, and/or a voice recognition system.
[0049] At least one output device 624 is also connected to the interface circuit 620 of the illustrated example. The output device(s) 624 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display, a cathode ray tube display (CRT), a touchscreen, a tactile output device, a light emitting diode (LED), a printer and/or speakers). The interface circuit 620 of the illustrated example, thus, may include a graphics driver card, a graphics driver chip or a graphics driver processor. [0050]The interface circuit 620 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem and/or network interface card to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 626 (e.g., an Ethernet connection, a digital subscriber line (DSL), a telephone line, coaxial cable, a cellular telephone system, etc.).
[0051]The processor platform 600 of the illustrated example also includes at least one mass storage device 628 for storing executable
instructions (e.g., software) and/or data. Examples of such mass storage device(s) 628 include floppy disk drives, hard drive disks, compact disk drives, Biu-ray disk drives, RAID systems, and digital versatile disk (DVD) drives,
[0052] The coded instructions 632 of FIGS. 4 and/or 5 may be stored in the mass storage device 628, in the local memory 613 in the volatile memory 614, in the non-volatile memory 616, and/or on a removable tangible machine readable storage medium such as a CD or DVD.
[0053] From the foregoing, it will be appreciated that the above disclosed methods, apparatus and articles of manufacture enable partitioning of a graph. The examples disclosed herein partition a graph to limit or minimize a number of replicated nodes across partitions of a graph as well as attempt to evenly distribute edges across the partitions of the graph such that the loads of the partitions are relatively even (e.g., within a threshold distribution). Accordingly, examples herein provide a partitioned graph that may be processed with relatively increased efficiency.
[0054] Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers ail methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.

Claims

CLAIMS What is claimed is:
1. A method comprising:
retrieving, via a processor, edge data for an edge of a graph to assign the edge data to a partition of a memory, the edge linking a pair of nodes of the graph;
determining, via the processor, that node data corresponding to at least one node of the pair of nodes linked by the edge exists in a candidate set of partitions of the memory; and
assigning, via the processor, the edge data for the edge to a selected one of the partitions of the candidate set of partitions based on respective loads of each partition of the candidate set of partitions.
2. The method as defined in claim 1 , further comprising:
determining that node data corresponding to both of the nodes of the edge does not exist in a same partition of the candidate set of partitions;
determining the degree of each of the nodes, the degree of each of the nodes representing a number of edges linked with the corresponding node; and assigning edge data for each of the edges to the selected one of the partitions of the candidate partitions based on the degree of each of the nodes.
3. The method as defined in claim 2, further comprising: determining that each partition of the candidate set of partitions comprises node data corresponding to both nodes of the edge; and
assigning the edge data for the edge to the selected one of the partitions, the selected one of the partitions comprising a least loaded partition of the candidate set of partitions.
4. The method as defined in claim 2, further comprising:
determining the degree of the nodes by determining whether each of the nodes of the edge comprises a low-degree node or a high-degree node based on the degree of each of the nodes, the low-degree node comprising a node associated with less than a threshold number of edges and the high-degree node comprising a node associated with more than the threshold number of edges.
5. The method as defined in claim 4, further comprising:
determining that both nodes of the edge are low-degree nodes or that both nodes of the first edge are high-degree nodes; and
assigning edge data for the edge to a least loaded partition of the candidate set of partitions.
8. The method as defined in claim 4, further comprising: determining that a first node from the nodes of the edge is a low-degree node and that a second node from the nodes of the edge is a high-degree node; determining whether node data of the first node exists in a first set of partitions of the candidate set of partitions and whether node data
corresponding to the second node exists in the second set of partitions of the candidate set of partitions; and
when the node data of the first node is located in the first set of partitions, assigning edge data for the edge and node data for the first node and the second node to a least loaded partition of the first set of partitions,
when the node data of the low-degree node is not located in the set of partitions, assigning edge data for the edge and node data for the first node and the second node to a least loaded partition of the second set of partitions that include node data corresponding to the high-degree node, and
when node data corresponding to the low-degree node is not located in the first set of partitions and node data from the high-degree node is not located in the second set of partitions, assigning edge data for the edge and node data for the first node and the second node to a least loaded partition of the plurality of partitions.
7. The method as defined in claim 1 , further comprising:
analyzing edge data for a second edge of the graph, the second edge of the graph linking a second set of two nodes of the graph; determining that node data corresponding to either of the nodes of the second edge does not exist in any partition of the memory; and
assigning the edge data for the second edge and node data for the second set of the two nodes to the a least loaded partition of the memory.
8. An apparatus comprising:
a graph receiver to receive a graph to be partitioned, the graph comprising edges and nodes linked by the edges, the edges to be distributed across a plurality of partitions of a memory;
a graph distribution monitor to track which partitions include which nodes and loads of the partitions;
an edge analyzer to determine a candidate set of partitions from the plurality of partitions for each edge based on which partitions include the nodes linked by the corresponding edge; and
a partition assigner to assign the edges to one of the candidate set of partitions based on the load of each partition of the candidate set of partitions.
9. The apparatus of claim 8, wherein the partition assigner is to assign the edges to a least loaded partition of the candidate set of partitions.
10. The apparatus of claim 8, wherein the graph distribution monitor comprises:
a node fable to list, for each node, the partitions that include a copy of node data of the node; and a partition load table to indicate, for each partition, the load of the partition.
1. The apparatus of claim 8, wherein, for each edge, the edge analyzer performs a comparative analysis of the plurality of partitions to determine the candidate set of partitions, each of the candidate set of partitions including at least one node of the edge.
12. The apparatus of claim 1 1 , wherein the comparative analysis give priority to low-degree nodes of the edges, the low-degree nodes comprising nodes linked to less than a threshold number of edges.
13. The apparatus of claim 8, wherein,
the edge analyzer is further to determine that none of the plurality of partitions includes either of the nodes of an edge; and
when the edge analyzer determines that none of the plurality of partitions includes either of the nodes of the edge, the partition assigner is to assign the edge to a least loaded partition of the plurality of partitions.
14. A non-transitory machine readable medium comprising instructions that, when executed, cause a machine to at least:
retrieve edge data for an edge of a graph to be partitioned in a plurality of partitions of a memory, the edge linking a pair of nodes of the graph;
determine a candidate set of partitions from the plurality of partitions that include at least one node linked by the edge of the graph; and assign the edge to the least loaded partition of the candidate set of partitions by writing the edge data to the least loaded partition.
5. The non-transitory machine readable medium of claim 14, wherein the instructions, when executed, further cause the machine to:
determine the candidate set of partitions by:
determining a degree of each of the pair of nodes, the degree representative of a number of edges linked to each of the pair of nodes; and for each partition of the plurality of partitions, performing a comparative analysis that minimizes an amount of replication of the node across the partitions based on the degrees of each of the pair of nodes.
PCT/US2016/014512 2016-01-22 2016-01-22 Graph partitioning WO2017127106A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2016/014512 WO2017127106A1 (en) 2016-01-22 2016-01-22 Graph partitioning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2016/014512 WO2017127106A1 (en) 2016-01-22 2016-01-22 Graph partitioning

Publications (1)

Publication Number Publication Date
WO2017127106A1 true WO2017127106A1 (en) 2017-07-27

Family

ID=59362589

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/014512 WO2017127106A1 (en) 2016-01-22 2016-01-22 Graph partitioning

Country Status (1)

Country Link
WO (1) WO2017127106A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110276649A1 (en) * 2010-05-10 2011-11-10 Telefonica, S.A. Method for Efficient Partition and Replication of Social-Based Applications
US20120317142A1 (en) * 2009-09-11 2012-12-13 University Of Maryland, College Park Systmen and method for data management in large data networks
US20140280360A1 (en) * 2013-03-15 2014-09-18 James Webber Graph database devices and methods for partitioning graphs
US20150095348A1 (en) * 2013-10-01 2015-04-02 Facebook, Inc. Systems and methods for dynamic mapping for locality and balance
EP2884453A1 (en) * 2013-12-12 2015-06-17 Telefonica Digital España, S.L.U. A computer implemented method, a system and computer program product for partitioning a graph representative of a communication network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120317142A1 (en) * 2009-09-11 2012-12-13 University Of Maryland, College Park Systmen and method for data management in large data networks
US20110276649A1 (en) * 2010-05-10 2011-11-10 Telefonica, S.A. Method for Efficient Partition and Replication of Social-Based Applications
US20140280360A1 (en) * 2013-03-15 2014-09-18 James Webber Graph database devices and methods for partitioning graphs
US20150095348A1 (en) * 2013-10-01 2015-04-02 Facebook, Inc. Systems and methods for dynamic mapping for locality and balance
EP2884453A1 (en) * 2013-12-12 2015-06-17 Telefonica Digital España, S.L.U. A computer implemented method, a system and computer program product for partitioning a graph representative of a communication network

Similar Documents

Publication Publication Date Title
US8886781B2 (en) Load balancing in cluster storage systems
US10089761B2 (en) Graph processing using a shared memory
US9020892B2 (en) Efficient metadata storage
US9015441B2 (en) Memory usage scanning
CN106407207B (en) Real-time newly-added data updating method and device
US10356150B1 (en) Automated repartitioning of streaming data
US11429636B2 (en) Smart elastic scaling based on application scenarios
US10120588B2 (en) Sliding-window multi-class striping
US10664458B2 (en) Database rebalancing method
US20170270145A1 (en) Index updates using parallel and hybrid execution
US20180196610A1 (en) Database Memory Monitoring and Defragmentation of Database Indexes
US10120860B2 (en) Methods and apparatus to identify a count of n-grams appearing in a corpus
US20240061712A1 (en) Method, apparatus, and system for creating training task on ai training platform, and medium
CN115129782A (en) Partition level connection method and device for distributed database
CN110737717A (en) database migration method and device
US20210349918A1 (en) Methods and apparatus to partition a database
US20170344607A1 (en) Apparatus and method for controlling skew in distributed etl job
CN110928941B (en) Data fragment extraction method and device
US20230132117A1 (en) Handling system-characteristics drift in machine learning applications
CN107102898B (en) Memory management and data structure construction method and device based on NUMA (non Uniform memory Access) architecture
US20170308574A1 (en) Method and apparatus for reducing query processing time by dynamically changing algorithms and computer readable medium therefor
WO2017127106A1 (en) Graph partitioning
US9898504B1 (en) System, method, and computer program for accessing data on a big data platform
US10452492B2 (en) Method, apparatus, and computer program stored in computer readable medium for recovering block in database system
US20150347409A1 (en) Convert Command Into a BULK Load Operation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16886743

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16886743

Country of ref document: EP

Kind code of ref document: A1