CN111538867A - Method and system for dividing bounded incremental graph

Info

Publication number: CN111538867A
Application number: CN202010294991.4A
Authority: CN
Inventors: 樊文飞; 田超; 许瑞琦
Original assignee: Shenzhen Institute of Computing Sciences
Current assignee: Shenzhen Institute of Computing Sciences
Priority date: 2020-04-15
Filing date: 2020-04-15
Publication date: 2020-08-14
Anticipated expiration: 2040-04-15
Also published as: CN111538867B; WO2021208147A1

Abstract

The invention discloses a method and a system for dividing a bounded incremental graph. The method comprises the following steps: the coordinator divides the initial graph structure into a plurality of first sub-graphs, correspondingly obtains a plurality of first sub-divisions, and distributes the first sub-divisions to a plurality of services; each service carries out iterative expansion on the acquired first sub-division, judges whether the first sub-division reaches a preset equilibrium upper bound or not in the iterative expansion process, and stops expanding the first sub-division if the first sub-division reaches the preset equilibrium upper bound; the coordinator confirms whether the updated data exists; if the updated data exists, the updated data is merged with the initial graph structure to obtain an updated partial graph structure, then the partial graph structure is divided into a plurality of second subgraphs and corresponding second subdivisions, the second subdivisions are distributed to the service, and the service receiving the second subdivisions carries out iterative expansion. The invention can reduce the calculation cost in the division of the distributed graph and make the division result more balanced.

Description

Method and system for dividing bounded incremental graph

Technical Field

The invention relates to the field of distributed graph partitioning, in particular to a bounded incremental graph partitioning method and a bounded incremental graph partitioning system.

Background

A graph is a network of vertices and the edges between them. Graph partitioning divides a graph into subgraphs such that the subgraphs are of approximately equal size and the resulting partitioning cost (the number of cut edges or cut vertices) is as small as possible. Graph partitioning comes in two forms: vertex partitioning, which divides the vertex set of the graph, and edge partitioning, which divides its edge set. Graph partitioning problems arise in many areas of computer science and technology, such as image segmentation, data clustering, large-scale integrated circuit design, and distributed parallel computing systems. Many practical problems can also be modeled as graphs, for example knowledge graphs.

In recent years, with the development of the internet, graph data has grown explosively, which poses great challenges for traditional graph computation, such as the computation and storage of large-scale graph data. Graph data at this scale cannot fit in the memory of a single machine, so the graph must be partitioned and stored on different computing nodes for distributed computing. A distributed system is a computing system consisting of a number of independent computers and a communication network connecting them; each computing node has its own CPU, memory address space, and storage. Distributed graph computation divides large-scale graph data into a number of subgraphs, stores the subgraphs in the memory or on the disks of different nodes, has all nodes compute simultaneously, and coordinates the computation through network communication to complete the computing task. Whether a distributed computing system operates efficiently depends on the computational performance of each node, the system bandwidth, and the quality of the graph partition. One important indicator of efficiency is the response time of the distributed system, i.e., the total time from submitting a computing task to obtaining the result.

Two metrics need to be considered when partitioning a graph. The first is load balance: when the load is unevenly distributed, the computing node with the highest load becomes a computing bottleneck and seriously delays the response time. Assuming that all computing nodes have equal computing resources, the more balanced the graph partition, the shorter the total response time. One metric of graph partitioning is therefore balance. The second is communication overhead: communication between nodes over the network also increases the response time. Communication is caused by the boundaries of the partition and occurs whenever a computation needs to cross a partition boundary. Therefore, the sparser the boundary of the partition, the smaller the total amount of communication and the less time it takes.
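As a rough illustration of these two metrics for an edge partition, the sketch below computes a load-balance ratio (largest part size over the ideal size |E|/k) and a vertex replication factor. The function name `partition_quality` and the use of the replication factor as a proxy for communication overhead are assumptions made for illustration; the text above does not prescribe a specific measure.

```python
from collections import defaultdict

def partition_quality(assignment, k):
    """assignment maps each edge (u, v) to a partition index in range(k)."""
    sizes = [0] * k
    copies = defaultdict(set)  # vertex -> partitions that hold a copy of it
    for (u, v), p in assignment.items():
        sizes[p] += 1
        copies[u].add(p)
        copies[v].add(p)
    ideal = len(assignment) / k
    balance = max(sizes) / ideal  # 1.0 means perfectly balanced
    # Average number of partitions per vertex; higher means more boundary traffic.
    replication = sum(len(ps) for ps in copies.values()) / len(copies)
    return balance, replication

# A 4-cycle split into two parts of two edges each.
assignment = {(1, 2): 0, (2, 3): 0, (3, 4): 1, (4, 1): 1}
balance, replication = partition_quality(assignment, k=2)
```

In this toy case the partition is perfectly balanced, and the two vertices on the boundary are each stored on both parts.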

Widely used graph partitioning systems include METIS (a software package for serial graph partitioning), XtraPuLP (a graph partitioning tool), and others, which can generate a partition of graph data on a static graph. In practical applications, however, most graph data is dynamic and frequently updated, and the updated part is often only a small proportion of the whole graph. Static graph partitioning methods and systems need to recompute the partition of the whole graph, which incurs huge computational overhead and takes a long time. For example, partitioning static graph data of about 20 GB with XtraPuLP takes 10 minutes or more. This calls for incremental partitioning, i.e., dynamically computing a new graph partition from the updated portion of the graph data and the existing partitioning result. When the update is small, the change to the partitioning result is generally also small, so incremental partitioning can return a new partitioning result quickly.

Existing graph partitioning methods all have certain disadvantages. Non-incremental vertex partitioning and edge partitioning must be completely recomputed even for a small update, which increases the computational overhead. Unbounded incremental vertex partitioning produces unbalanced partitions and has a large computational cost even for a small update. Unbounded incremental edge partitioning also has a relatively large computational overhead for a small update. Bounded incremental vertex partitioning cannot achieve a balanced result when used to partition a graph. That is, each of these distributed graph partitioning methods fails, to some degree, to satisfy the two metrics that need to be considered when partitioning a graph.

Disclosure of Invention

The embodiment of the invention provides a method and a system for dividing a bounded incremental graph, aiming at reducing the calculation overhead of graph division and enabling the graph division result to be more balanced.

In a first aspect, an embodiment of the present invention provides a bounded increment graph partitioning method, where the method includes:

the coordinator divides the initial graph structure into a plurality of first sub-graphs, correspondingly obtains a plurality of first sub-divisions, and distributes the plurality of first sub-graphs and the corresponding first sub-divisions to a plurality of services;

each service acquires a first sub-partition corresponding to a respective first sub-graph, performs iterative expansion on the first sub-partition, and judges whether the first sub-partition reaches a preset first equilibrium upper bound or not in the iterative expansion process, if the first sub-partition reaches the preset first equilibrium upper bound, stops expanding the first sub-partition, and if the first sub-partition does not reach the preset first equilibrium upper bound, continues expanding the first sub-partition;

when all the services complete respective corresponding expansion, feeding back information of the current iteration completion to the coordinator;

after receiving the information of the current iteration completion, the coordinator determines whether an unallocated edge exists or not;

if the unallocated edges exist, the coordinator informs the service of performing iterative expansion on the unallocated edges until all the edges are allocated;

if the unallocated edge does not exist, determining whether the update data exists;

if the updated data exists, the coordinator firstly merges the updated data and the initial graph structure to obtain a partial graph structure corresponding to the updated data, then divides the partial graph structure into a plurality of second subgraphs, obtains a second subdivision corresponding to the second subgraph, distributes the plurality of second subdivisions to the plurality of services, and carries out iterative expansion by the service receiving the second subdivision;

if there is no update data, the graph division processing is ended.

Further, the obtaining, by each service, a first sub-partition corresponding to the respective first sub-graph, and iteratively expanding the first sub-partition includes:

the service acquires a derivative vertex set in the first subdivision, acquires a core vertex set in the initial graph structure, selects a vertex with a priority greater than a preset level threshold value from a difference set of the derivative vertex set and the core vertex set as an expansion vertex set, and then expands all expansion vertices in the expansion vertex set.

Further, the expanding all the expansion vertexes in the expansion vertex set includes:

acquiring unallocated adjacent edges corresponding to all the expansion vertexes, and allocating the adjacent edges to the first subdivision;

updating the derived vertex set according to the newly allocated edges;

judging whether the updated derived vertex set has the condition that two endpoints corresponding to adjacent edges are both in the derived vertex set or not;

and if the two endpoints corresponding to the adjacent edges are both in the derived vertex set, distributing the corresponding adjacent edges to the first subdivision.

Further, the determining whether the first sub-partition reaches a preset first upper balance limit includes:

first, calculating a first balance upper bound according to a formula in k and |E| [formula not reproduced in this text] and counting the number of edges in the first sub-division; then judging whether the number of edges reaches the first balance upper bound, and if so, judging that the first sub-division reaches the preset first balance upper bound;

in the formula, k is the total number of all the first subdivisions, and |E| is the total number of edges of the initial graph structure.

Further, the expanding all the expansion vertexes in the expansion vertex set further includes:

and when all adjacent edges in the derived vertex set are distributed, randomly selecting a core vertex from the core vertex set, and distributing the core vertex to the derived vertex set.

Further, if there is update data, the coordinator first merges the update data with the initial graph structure to obtain a partial graph structure corresponding to the update data, then divides the partial graph structure into a plurality of second subgraphs, obtains a second subdivision corresponding to the second subgraph, distributes the plurality of second subdivisions to the plurality of services, and performs iterative expansion by the service receiving the second subdivision, including:

merging the updated data and the initial graph structure to obtain an updated graph structure and a partial graph structure corresponding to the updated data;

dividing the partial graph structure to obtain a plurality of second subgraphs and second subdivisions corresponding to the second subgraphs;

calculating the total number of edges of the updated graph structure, and calculating a second balance upper bound according to the total number of edges of the updated graph structure and the total number of all the first sub-partitions;

moving out partial edges in a second sub-partition that reaches the second upper equilibrium bound such that the second sub-partition satisfies the second upper equilibrium bound;

and acquiring and removing redundant vertexes in the second subdivision and derivative vertexes of which the number of corresponding adjacent edges is smaller than a second preset edge number value.
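The rebalancing part of these steps can be sketched as follows. This is a minimal single-process illustration under several assumptions: sub-partitions are plain Python sets of edges, the inserted edges arrive with a target partition index already chosen, and the second balance upper bound is taken to be ⌈|E'|/k⌉, where |E'| is the edge count of the updated graph. The exact formula is not reproduced in the text, and the name `apply_update_and_trim` is hypothetical.

```python
import math

def apply_update_and_trim(parts, inserted, deleted):
    """parts: list of edge sets; inserted: {edge: part index}; deleted: set of edges."""
    for p in parts:
        p -= deleted                        # drop deleted edges in place
    for edge, i in inserted.items():
        parts[i].add(edge)                  # place inserted edges (placement is assumed given)
    total = sum(len(p) for p in parts)      # total number of edges of the updated graph
    bound = math.ceil(total / len(parts))   # ASSUMED form of the second balance upper bound
    evicted = set()
    for p in parts:
        while len(p) > bound:               # move edges out of parts that exceed the bound
            evicted.add(p.pop())
    return bound, evicted                   # evicted edges are left for later re-expansion

parts = [{(1, 2), (2, 3), (3, 4)}, {(4, 5)}]
bound, evicted = apply_update_and_trim(
    parts, inserted={(5, 6): 0, (6, 7): 0}, deleted={(4, 5)}
)
```

Which edges are evicted is arbitrary in this sketch. The removal of redundant vertices and low-degree derived vertices described in the last step is not modeled.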

Further, if there is update data, the coordinator first merges the update data with the initial graph structure, and obtains a partial graph structure corresponding to the update data, then divides the partial graph structure into a plurality of second subgraphs, and obtains a second subdivision corresponding to the second subgraph, and distributes the plurality of second subdivisions to the plurality of services, and performs iterative expansion by the service receiving the second subdivision, and the method further includes:

and the coordinator distributes the plurality of second sub-partitions in a broadcast distribution mode.

In a second aspect, an embodiment of the present invention further provides a bounded incremental graph partitioning system, including a coordinator and a plurality of services;

the coordinator includes:

a first dividing unit, configured to divide an initial graph structure into a plurality of subgraphs, correspondingly obtain a plurality of first sub-divisions, and distribute the subgraphs and the corresponding first sub-divisions to a plurality of services;

the first confirming unit is used for confirming whether an unallocated edge exists or not after receiving the information that the current iteration is completed;

a notification unit, configured to notify the service to perform iterative expansion on the unallocated edges until all the edges are allocated;

a second confirming unit configured to confirm whether there is update data if there is no unallocated edge;

a second dividing unit, configured to, if there is update data, merge the update data with the initial graph structure, obtain a partial graph structure corresponding to the update data, divide the partial graph structure into multiple second subgraphs, obtain second subdivisions corresponding to the second subgraphs, distribute the multiple second subdivisions to the multiple services, and perform iterative expansion by the services receiving the second subdivisions;

and an ending unit configured to end the graph division processing if there is no update data.

Each service comprises:

the iterative expansion unit is used for acquiring first sub-partitions corresponding to respective sub-graphs, performing iterative expansion on the first sub-partitions, judging whether the first sub-partitions reach a preset first equilibrium upper bound or not in the iterative expansion process, stopping the expansion of the first sub-partitions if the first sub-partitions reach the preset first equilibrium upper bound, and continuing to expand the first sub-partitions if the first sub-partitions do not reach the preset first equilibrium upper bound;

and the feedback unit is used for feeding back information of finishing current iteration to the coordinator when all the services finish respective corresponding expansion.

Further, the bounded incremental graph partitioning system further comprises an IO controller;

the IO controller is used for receiving external update data of the initial graph structure and forwarding the update data to the coordinator.

Further, each service further includes:

the first distribution unit is used for acquiring the unallocated adjacent edges corresponding to all the expansion vertexes and distributing the adjacent edges to the first subdivision;

an updating unit, configured to update the derived vertex set according to the newly allocated edge;

the judging unit is used for judging whether the updated derived vertex set has the condition that two end points corresponding to the adjacent edges are both in the derived vertex set;

and the second distribution unit is used for distributing the corresponding adjacent edges to the first subdivision if the two end points corresponding to the adjacent edges are both in the derived vertex set.

The embodiment of the invention provides a method and a system for dividing a bounded incremental graph. The method comprises the steps that a coordinator divides an initial graph structure into a plurality of first sub-graphs, correspondingly obtains a plurality of first sub-divisions, and distributes the plurality of first sub-graphs and the corresponding first sub-divisions to a plurality of services; each service acquires a first sub-partition corresponding to a respective first sub-graph, performs iterative expansion on the first sub-partition, and judges whether the first sub-partition reaches a preset first equilibrium upper bound or not in the iterative expansion process, if the first sub-partition reaches the preset first equilibrium upper bound, stops expanding the first sub-partition, and if the first sub-partition does not reach the preset first equilibrium upper bound, continues expanding the first sub-partition; when all the services complete respective corresponding expansion, feeding back information of the current iteration completion to the coordinator; after receiving the information of the current iteration completion, the coordinator determines whether an unallocated edge exists or not; if the unallocated edges exist, the coordinator informs the service of performing iterative expansion on the unallocated edges until all the edges are allocated; if the unallocated edge does not exist, determining whether the update data exists; if the updated data exists, the coordinator firstly merges the updated data and the initial graph structure to obtain a partial graph structure corresponding to the updated data, then divides the partial graph structure into a plurality of second subgraphs, obtains a second subdivision corresponding to the second subgraph, distributes the plurality of second subdivisions to the plurality of services, and carries out iterative expansion by the service receiving the second subdivision; if there is no update data, the graph division processing is ended. The embodiment of the invention can reduce the calculation cost in the division of the distributed graph and make the division result more balanced.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a flowchart illustrating a bounded delta graph partitioning method according to an embodiment of the present invention;

FIG. 2 is a sub-flow diagram illustrating a bounded delta graph partitioning method according to an embodiment of the present invention;

FIG. 3 is an exemplary diagram of a bounded delta graph partitioning method according to an embodiment of the present invention;

FIG. 4 is a schematic view of another sub-flow of a bounded delta graph partitioning method according to an embodiment of the present invention;

FIG. 5 is a basic architecture diagram of a bounded incremental graph partitioning method according to an embodiment of the present invention;

FIG. 6 is a basic flowchart of a bounded delta graph partitioning method according to an embodiment of the present invention;

FIG. 7 is a flowchart illustrating a method for partitioning a bounded delta graph according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

Referring to FIG. 1, FIG. 1 is a schematic flowchart of a bounded incremental graph partitioning method according to an embodiment of the present invention, which specifically includes steps S101 to S108.

S101, the coordinator divides the initial graph structure G into a plurality of first sub-graphs G_i, correspondingly obtains a plurality of first sub-divisions P_i, and distributes the plurality of first sub-graphs G_i and the corresponding first sub-divisions P_i to a plurality of services;

S102, each service acquires the first sub-division P_i corresponding to its first sub-graph G_i and performs iterative expansion on the first sub-division P_i; during the iterative expansion, the service judges whether the first sub-division P_i reaches a preset first balance upper bound; if the first sub-division P_i reaches the preset first balance upper bound, the service stops expanding the first sub-division P_i; if the first sub-division P_i does not reach the preset first balance upper bound, the service continues expanding the first sub-division P_i;

S103, when all the services complete their respective expansion, information that the current iteration is complete is fed back to the coordinator;

S104, after receiving the information that the current iteration is complete, the coordinator confirms whether an unallocated edge exists; if an unallocated edge exists, the process proceeds to step S105; if no unallocated edge exists, the process proceeds to step S106;

S105, the coordinator notifies the services to perform iterative expansion on the unallocated edges until all the edges are allocated;

S106, the coordinator confirms whether update data ΔG exists; if the update data ΔG exists, the process proceeds to step S107; if the update data ΔG does not exist, the process proceeds to step S108;

S107, the coordinator first merges the update data ΔG with the initial graph structure G to obtain a partial graph structure ΔG' corresponding to the update data, then divides the partial graph structure ΔG' into a plurality of second sub-graphs G_i', obtains the second sub-divisions P_i' corresponding to the second sub-graphs, and distributes the plurality of second sub-divisions P_i' to the plurality of services, and the services receiving the second sub-divisions P_i' perform iterative expansion;

S108, the graph partitioning processing ends.
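The round structure of steps S101 to S105 can be sketched in a single process as follows. This is an illustrative simplification, not the patented implementation: the services are simulated by loop indices rather than separate machines, the balance upper bound is assumed to be ⌈|E|/k⌉ (the formula is not reproduced in the text), a service with no adjacent unallocated edge simply takes the smallest remaining edge as a new seed, and the update handling of S106 and S107 is omitted. The name `partition_edges` is hypothetical.

```python
import math

def partition_edges(edges, k):
    """Round-based edge partitioning sketch; edges are tuples of comparable vertices."""
    bound = math.ceil(len(edges) / k)        # ASSUMED first balance upper bound
    unassigned = set(edges)
    parts = [set() for _ in range(k)]        # P_i: edges held by simulated service i
    seen = [set() for _ in range(k)]         # vertices touched by the edges of P_i
    while unassigned:                        # S104: coordinator checks for unallocated edges
        for i in range(k):                   # S102: every service expands once per round
            if len(parts[i]) >= bound:       # stop expanding once the bound is reached
                continue
            frontier = [e for e in sorted(unassigned) if seen[i] & set(e)]
            if not frontier and unassigned:  # nothing adjacent: take a fresh seed edge
                frontier = [min(unassigned)]
            for e in frontier[: bound - len(parts[i])]:
                parts[i].add(e)
                seen[i].update(e)
                unassigned.discard(e)
    return parts

ring = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 6)]
parts = partition_edges(ring, k=3)
```

Because the total capacity k·⌈|E|/k⌉ is at least |E|, some simulated service always has room while edges remain, so every round allocates at least one edge and the loop terminates.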

In this embodiment, when distributed computation is performed on the initial graph structure G, on the one hand, a first balance upper bound is preset on each service, so that when a service iteratively expands its first sub-division P_i, the expansion is not unlimited but stops once the first sub-division P_i reaches the first balance upper bound, which keeps the sizes of the sub-divisions balanced. On the other hand, when update data ΔG exists for the initial graph structure G, the updated graph structure G ⊕ ΔG does not need to be expanded in full; only the updated part ΔG' needs to be iteratively expanded, which achieves incremental partitioning.

In this embodiment, the coordinator distributes the sub-divisions, and multiple services perform the iterative expansion of the different sub-divisions. The coordinator does not store any data structure related to the graph structure; it stores only a small number of temporary variables. Besides distributing the sub-divisions to the services, the coordinator monitors the progress of the whole graph partitioning through a global variable, decides the next operation to execute according to the progress of each service, and forwards the communication among the services. The communicated content is generally used to synchronize the data corresponding to each sub-division and can in theory be any serialized data structure. Each service has its own CPU and memory space, stores a sub-graph G_i and the first sub-division P_i on that sub-graph, and performs iterative expansion independently. In addition, upon receiving the update data ΔG distributed by the coordinator (i.e., the second sub-division P_i'), each service still performs the iterative expansion independently, i.e., the second sub-division is computed locally. It should be noted that the update data ΔG in this embodiment includes insertion data Δ⁺G and deletion data Δ⁻G of the graph structure.

Existing incremental graph partitioning systems, such as ParMETIS (an MPI-based parallel library with many built-in algorithms for partitioning unstructured graphs and meshes) and Hermes (a large data real-time multidimensional analysis platform), can accelerate the computation by reusing the previous partitioning result, but they are not sensitive to the size of the update and still take a long time to compute when the update is small. The reason is that their algorithms are not incrementally bounded, i.e., the cost of the incremental computation cannot be bounded by an expression in the size of the update. This calls for a bounded incremental algorithm, so that the graph partitioning system can return a new partitioning result quickly when the update is small. Moreover, existing incremental graph partitioning systems are designed on the vertex partitioning model. Compared with vertex partitioning, edge partitioning can divide the graph data more evenly and yield a better partitioning result, yet no incremental graph partitioning algorithm for the edge partitioning model exists so far.

This embodiment provides a distributed bounded incremental graph partitioning method under the edge partitioning model. Importantly, it is the first bounded incremental method in the field of distributed graph partitioning; it realizes bounded incremental computation on a graph structure and fills a gap in this technical field. The advantage of a bounded incremental method over other methods is that its computational overhead is determined solely by the update data, so the overhead is at most a constant when the update data ΔG is empty. When the update data ΔG is very small, e.g., only 1% of the full graph, the overhead of computing the new graph partition is correspondingly small, so the method is suitable for frequent small updates.

In an embodiment, each service acquiring the first sub-division P_i corresponding to its first sub-graph G_i and performing iterative expansion on the first sub-division P_i comprises:

the service acquires the derived vertex set S_i in the first sub-division P_i and the core vertex set C in the initial graph structure G, selects from the difference set S_i \ C of the derived vertex set S_i and the core vertex set C the vertices whose priority is greater than a preset level threshold as expansion vertices to form an expansion vertex set X_i, and then expands all the expansion vertices in the expansion vertex set X_i.

In this embodiment, the core vertex set C is the set of vertices all of whose adjacent edges have already been allocated to the first sub-divisions P_i. The derived vertex set S_i is the set of vertices that have at least one adjacent edge allocated to the first sub-division P_i to which S_i belongs; both endpoints of every edge in the first sub-division P_i belong to the derived vertex set S_i corresponding to that sub-division.

Note that the difference set S_i \ C of the derived vertex set S_i and the core vertex set C is the set of vertices that remain after the vertices of the core vertex set C are removed from the derived vertex set S_i; that is, a vertex selected from S_i \ C belongs to the derived vertex set S_i but not to the core vertex set C. The difference set S_i \ C can therefore be understood as follows: every vertex in S_i \ C has at least one adjacent edge allocated to the first sub-division P_i and at least one adjacent edge that is not yet allocated. That is, each selected expansion vertex has at least one adjacent edge that has not been allocated to the first sub-division P_i corresponding to S_i, so these expansion vertices can be iteratively expanded in the next step, i.e., their unallocated adjacent edges are allocated to the corresponding first sub-division P_i.
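The difference set can be computed directly from these definitions. The sketch below is a minimal illustration that assumes edges are tuples and that the set of already allocated edges is globally known; the helper name `boundary_vertices` is hypothetical.

```python
def boundary_vertices(part_edges, all_edges, assigned):
    """Return the difference set (S_i minus C) for one sub-division P_i."""
    derived = {v for e in part_edges for v in e}       # S_i: endpoints of edges in P_i
    core = set()
    for v in {x for e in all_edges for x in e}:
        # A vertex is a core vertex when every adjacent edge is already allocated.
        if all(e in assigned for e in all_edges if v in e):
            core.add(v)
    return derived - core

# Path 1-2-3-4 with only edge (1, 2) allocated so far.
all_edges = [(1, 2), (2, 3), (3, 4)]
candidates = boundary_vertices({(1, 2)}, all_edges, assigned={(1, 2)})
```

Here vertex 1 is a core vertex because its only adjacent edge is allocated, while vertex 2 still has the unallocated edge (2, 3) and therefore remains a candidate for expansion.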

In this embodiment, the priority of a vertex is determined by the number of its unallocated adjacent edges: the fewer unallocated adjacent edges a vertex has, the higher its priority, and vice versa. That is, vertices with fewer unallocated adjacent edges are preferentially selected as expansion vertices. The advantage is that, as far as possible, all adjacent edges of a vertex are allocated to the same sub-division. In a specific application scenario, the priorities of all vertices are calculated, and the vertices whose priority ranks in the top 10% are taken as the expansion vertices.
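A minimal sketch of this priority rule follows. It assumes the count of unallocated adjacent edges per vertex is available as a dictionary; the function name, the tie-break on vertex identifier, and the choice to always return at least one vertex when candidates exist are assumptions added for illustration.

```python
def select_expansion_vertices(candidates, unassigned_degree, fraction=0.10):
    """Pick the top `fraction` of candidates, fewest unallocated adjacent edges first."""
    # Fewer unallocated adjacent edges means higher priority; ties broken by vertex id.
    ranked = sorted(candidates, key=lambda v: (unassigned_degree[v], v))
    if not ranked:
        return []
    count = max(1, int(len(ranked) * fraction))  # assumption: select at least one vertex
    return ranked[:count]

# Twenty candidates; vertex v has 20 - v unallocated adjacent edges.
degree = {v: 20 - v for v in range(20)}
chosen = select_expansion_vertices(range(20), degree)
```

With twenty candidates and the 10% threshold, the two vertices with the fewest unallocated adjacent edges are selected.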

In addition, in distributed computing, a vertex of the graph may exist in different sub-graphs, which reside on different services, so after a service selects its expansion vertex set X_i, the expansion vertex set needs to be synchronized with the other services. In other words, if any service selects a vertex as an expansion vertex and adds it to its expansion vertex set X_i, the vertex must also be added to the corresponding expansion vertex sets on all other services that hold the vertex.
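This synchronization amounts to taking the union of all locally selected vertices and restricting it to the vertices each service holds. The sketch below models it in one process with plain sets; in a real deployment the union would be exchanged through the coordinator, and the function name `synchronize_expansion_sets` is hypothetical.

```python
def synchronize_expansion_sets(local_vertices, selected):
    """local_vertices[i]: vertices stored on service i; selected[i]: X_i chosen locally."""
    chosen_anywhere = set().union(*selected)  # every vertex selected by some service
    # Each service expands the selected vertices that it also holds a copy of.
    return [chosen_anywhere & local for local in local_vertices]

local_vertices = [{1, 2, 3}, {3, 4}, {5}]
selected = [{3}, {4}, set()]
synced = synchronize_expansion_sets(local_vertices, selected)
```

Vertex 3 is selected only by the first service but is also stored on the second, so after synchronization it appears in both expansion sets.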

In one embodiment, as shown in FIG. 2, expanding all the expansion vertices in the expansion vertex set X_i comprises steps S201 to S204.

S201, obtaining the unallocated adjacent edges corresponding to all the expansion vertices, and allocating these adjacent edges to the first subdivision P_i;

S202, updating the derived vertex set S_i according to the newly allocated edges;

S203, judging whether there is an adjacent edge whose two endpoints are both in the updated derived vertex set S_i;

S204, if there is an adjacent edge whose two endpoints are both in the derived vertex set S_i, allocating that adjacent edge to the first subdivision P_i.

In this embodiment, when an adjacent edge is assigned to the corresponding first subdivision, the other endpoint of the adjacent edge (i.e., the non-expansion vertex) also satisfies the condition of the derived vertex set S_i (i.e., the vertex has at least one adjacent edge assigned to the first subdivision P_i to which the derived vertex set S_i belongs). It is therefore necessary to update the derived vertex set S_i after the assignment of the adjacent edges is completed. All adjacent edges of the vertices newly added to the derived vertex set S_i are then checked, and every edge that can be assigned without introducing a new vertex into the derived vertex set S_i is assigned. That is, when the updated derived vertex set S_i contains both endpoints of an adjacent edge, that adjacent edge is allocated to the corresponding first subdivision P_i.

In addition, when an adjacent edge is claimed by several different first subdivisions P_i during expansion, the adjacent edge is randomly assigned to one of these first subdivisions P_i.

For example, as shown in FIG. 3, in the iterative expansion of the first subdivision P₀, a boundary vertex u is selected as the expansion vertex, and every unallocated adjacent edge (u, v) of the expansion vertex u is allocated to the first subdivision P₀. Then, for each vertex newly added to the derived vertex set S₀, all its adjacent edges are checked, and every edge that can be assigned without introducing a new vertex into S₀ is assigned.
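The following Python sketch walks through steps S201 to S204 for a single subdivision. It is an illustration under simplifying assumptions: the graph is undirected and held in one process, edges are stored as sorted tuples, and the names `expand`, `adj`, `assigned` and `partition` are hypothetical. Conflicts between subdivisions and the balance bound are not modelled.

```python
def expand(adj, expansion_vertices, derived, assigned, partition):
    """One expansion round for one subdivision (steps S201 to S204).

    adj: vertex -> set of neighbouring vertices
    derived: derived vertex set S_i (updated in place)
    assigned: edges already allocated to any subdivision (updated in place)
    partition: edge set of the subdivision P_i (updated in place)
    """
    def edge(u, v):
        # Canonical form so that (u, v) and (v, u) denote the same edge.
        return (u, v) if u <= v else (v, u)

    added = set()
    # S201: allocate every unallocated edge adjacent to an expansion vertex.
    for u in expansion_vertices:
        for v in adj[u]:
            e = edge(u, v)
            if e not in assigned:
                assigned.add(e)
                partition.add(e)
                # S202: the other endpoint now touches P_i, so it joins S_i.
                if v not in derived:
                    derived.add(v)
                    added.add(v)
    # S203 and S204: allocate unallocated edges whose two endpoints are both
    # in S_i; doing so introduces no new vertex into S_i.
    for v in added:
        for w in adj[v]:
            e = edge(v, w)
            if w in derived and e not in assigned:
                assigned.add(e)
                partition.add(e)
    return partition
```

In a small graph where u is adjacent to v and w, v is adjacent to w and x, and u is the only expansion vertex, the round allocates (u, v), (u, w) and then (v, w), while (v, x) stays unallocated because x is not in the derived vertex set.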

In one embodiment, determining whether the first subdivision P_i reaches the preset first upper balance bound comprises the following steps:

first, calculating the first upper balance bound according to the formula

$$(1+\varepsilon)\,\frac{|E|}{k}$$

and calculating the number of edges of the first subdivision P_i; then judging whether the calculated number of edges reaches the first upper balance bound, and if so, determining that the first subdivision P_i has reached the preset first upper balance bound;

wherein k is a preset number equal to the number of all first subdivisions P_i, |E| is the total number of edges of the existing graph structure G, and ε is a preset small amount greater than 0.

In this embodiment, when the first subdivisions P_i are iteratively expanded, a first upper balance bound is preset for all first subdivisions P_i in order to avoid an unbalanced final partition. When the number of edges in a first subdivision P_i reaches the first upper balance bound, no further adjacent edges are assigned to that first subdivision P_i, that is, its iterative expansion stops. Other first subdivisions P_i that have not yet reached the first upper balance bound may continue their iterative expansion.

It is to be noted that ε in the formula is a preset small amount greater than 0, and the formula can be understood as follows: the size of the largest first subdivision in the graph structure may not exceed (1+ε) times the size of a perfectly uniform subdivision (|E|/k, i.e., 1/k of all edges).
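As a small worked example of this check, the Python sketch below assumes that the bound has the form (1 + ε)·|E|/k suggested by the description above. The function names, the parameter name `epsilon` and the rounding down to a whole number of edges are assumptions made for the example.

```python
import math

def upper_balance_bound(total_edges, k, epsilon):
    """Largest number of edges a single subdivision may hold."""
    # Rounding down to an integer edge count is an assumption of this sketch.
    return math.floor((1 + epsilon) * total_edges / k)

def reached_bound(partition_size, total_edges, k, epsilon):
    """True when the subdivision must stop expanding."""
    return partition_size >= upper_balance_bound(total_edges, k, epsilon)
```

For instance, with 1000 edges, k = 4 and ε = 0.25, a perfectly uniform subdivision would hold 250 edges and the bound allows 312, so a subdivision holding 311 edges may still expand while one holding 312 may not.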

In one embodiment, expanding all the expansion vertices in the expansion vertex set X_i further comprises:

when all the adjacent edges of the vertices in the derived vertex set S_i have been allocated, randomly selecting a core vertex from the core vertex set C and adding the core vertex to the derived vertex set S_i.

In this embodiment, when all the adjacent edges of the vertices in the derived vertex set S_i have been allocated, the expansion of the corresponding first subdivision P_i stops. If, at this time, the first subdivision P_i has not yet reached the preset first upper balance bound while other first subdivisions P_i have reached it, the final partition may be unbalanced. Therefore, to ensure that the partitioning result has better locality (balance), the derived vertex set S_i needs to be actively expanded, that is, a vertex in the core vertex set C is added to the derived vertex set S_i, so that the first subdivision P_i can continue to be expanded.

In one embodiment, as shown in fig. 4, the step S107 includes: steps S401 to S405.

S401, merging the update data ΔG with the initial graph structure G to obtain an updated graph structure G ⊕ ΔG and a partial graph structure ΔG' corresponding to the update data;

S402, dividing the partial graph structure ΔG' to obtain a plurality of second subgraphs G_i' and the second subdivisions P_i' corresponding to the second subgraphs;

S403, calculating the total number of edges of the updated graph structure G ⊕ ΔG, and calculating a second upper balance bound from this total number of edges and the number of all first subdivisions P_i;

S404, for a second subdivision P_i' that reaches the second upper balance bound, removing part of its edges so that the second subdivision P_i' satisfies the second upper balance bound;

S405, obtaining and removing the redundant vertices in the second subdivision P_i' and the derived vertices whose number of corresponding adjacent edges is less than a preset second number of edges.

In this embodiment, when update data ΔG exists for the initial graph structure G, merging the update data ΔG with the initial graph structure G (i.e., G ⊕ ΔG) yields the updated graph structure G ⊕ ΔG and a partial graph structure ΔG' corresponding to the update data ΔG; dividing ΔG' yields a plurality of second subgraphs G_i' and the corresponding second subdivisions P_i'. Since the total number of edges may change in the updated graph structure, the total number of edges |E'| of the updated graph structure needs to be recalculated, and the second upper balance bound needs to be recalculated from |E'|. After the second upper balance bound is obtained, the second subdivisions P_i' need to be processed accordingly so that they satisfy the second upper balance bound.

A redundant vertex in this embodiment refers to a vertex that has no adjacent edge in the corresponding subdivision. After part of the edges are removed from a second subdivision P_i', some redundant vertices may be generated in it. Promptly removing these redundant vertices, together with the derived vertices that have few corresponding adjacent edges, gives the second subdivision P_i' a lower communication overhead.

In addition, after the above operations are completed, the core vertex set C and all derived vertex sets S_i' also need to be updated.
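Steps S403 to S405 can be illustrated with the Python sketch below. It is only a sketch under stated assumptions: the bound is taken to have the form (1 + ε)·|E'|/k, the edges to remove are simply those that come last in sorted order (the description does not state which edges are chosen), and the threshold-based removal of derived vertices with few adjacent edges is not modelled. All function and parameter names are hypothetical.

```python
import math

def second_upper_bound(total_edges, k, epsilon):
    """S403: recompute the bound from the edge count |E'| of the updated graph."""
    return math.floor((1 + epsilon) * total_edges / k)

def rebalance(partition, bound):
    """S404 and S405 for one second subdivision.

    partition: set of edges (u, v) held by the subdivision
    bound: second upper balance bound (maximum number of edges)
    Returns (kept_edges, removed_edges, redundant_vertices).
    """
    edges = sorted(partition)        # deterministic order for this sketch
    kept = set(edges[:bound])        # S404: trim down to the bound
    removed = set(edges[bound:])
    before = {v for e in partition for v in e}
    after = {v for e in kept for v in e}
    # S405: vertices left without any adjacent edge in the subdivision.
    redundant = before - after
    return kept, removed, redundant
```

The removed edges become unallocated again and can be picked up by a later round of iterative expansion in another subdivision.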

In an embodiment, the step S107 further includes:

the coordinator distributes the plurality of second subdivisions P_i' by broadcast.

In this embodiment, during the partitioning of the initial graph structure G, some first subdivisions P_i may already have reached the preset first upper balance bound and need not be expanded further. Therefore, when update data ΔG exists, the coordinator distributes the second subdivisions P_i' by broadcast, and each service decides whether to receive the second subdivision P_i' distributed by the coordinator. In this way, the final partition can be guaranteed to be more uniform.

In a specific embodiment, as shown in FIG. 5, the embodiment of the present invention specifically includes an IO controller, a coordinator and a plurality of services (workers). When the graph structure is updated from outside, the IO controller receives the external update data ΔG for the graph structure (including insertion data Δ⁺G and deletion data Δ⁻G) and sends the update data ΔG to the coordinator. The coordinator divides the received update data ΔG to obtain a plurality of subdivisions P_i' and distributes these subdivisions P_i' to the services; each service that receives a subdivision P_i' then expands it iteratively and independently.

In one embodiment, as shown in FIG. 6, the embodiment of the present invention mainly consists of two stages: a partial allocation stage and a rebalancing stage. The partial allocation (Partial Allocation) stage expands the graph partition that exists so far until all edges have been allocated. If the IO controller then detects that update data ΔG exists for the graph structure, the rebalancing (ReBalance) stage is performed to process the update data ΔG, so that the partial result of the current graph partition can be processed again by the Partial Allocation stage.

Specifically, in the Partial Allocation stage, the input includes: the graph structure G, a subgraph G', and the subdivisions P_i on the subgraph G'.

The output includes: the subgraph G' and the expanded subdivisions P_i.

At this stage, a partition on the sub-graph G' is expanded to a larger sub-graph G ″, and the specific partitioning steps can refer to steps S201 to S204.

In the rebalancing stage, the input includes: the graph G, the subdivisions P_i and the update data ΔG; the output includes: the updated graph G ⊕ ΔG and the subdivisions P_i.

At this stage, the graph update data Δ G is incorporated into the existing graph structure G and a new partial graph partition P (G') is generated such that this graph partition meets the updated balance constraint, i.e. the number of edges in the largest subdivision does not exceed the updated balance constraint

For the specific steps of rebalancing, reference may be made to steps S401 to S405.

In another embodiment, as shown in FIG. 7, the processes with the prefix PMIC_ are executed on the coordinator, and the remaining processes are executed in parallel on the services. The input of the update data ΔG is handled by the IO controller. Partial Allocation corresponds to the main loop from Filter to PMIC_ExpandS, in which the services allocate a portion of the edges in each iteration until all edges have been allocated. After the update data ΔG is received, it is preprocessed by the PMIC_Update and ReBalance processes, and control then returns to the main loop for iterative expansion. It should be noted that the services run in parallel in BSP (bulk synchronous parallel) mode: each phase is executed independently on a service or on the coordinator without communicating with other nodes (other services or the coordinator), and after the local computation of all services has finished, global communication is performed to exchange information.

Specifically, in the Filter stage, the input includes: the difference set S_i \ C of the derived vertex set S_i and the core vertex set C, and the priority f(v) of each vertex v in the difference set S_i \ C; the output includes: the expansion vertex set X_i and the communication messages used to synchronize each of its vertices to the services that own the vertex. At this stage, the vertices v of the local difference set S_i \ C are sorted from high to low according to f(v), and the vertices with higher priority are selected as the expansion vertex set X_i (e.g., the top 10% by priority). Finally, by communicating with the other services through the coordinator, each vertex v in X_i is synchronized into the expansion vertex sets of the other services.

In the PMIC_Filter stage, the coordinator passes the communication messages and calculates the total number of edges that would be assigned to P_i by expanding X_i in the subsequent ExpandC stage. If this total exceeds the preset upper balance bound, some of the vertices are removed from X_i.
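One possible way to perform this trimming is sketched below in Python. The description does not specify which vertices are dropped, so keeping the longest priority-ordered prefix that fits is an assumption, as are the names `trim_expansion_set`, `unassigned_degree`, `current_size` and `bound`. The projected size is an upper estimate, because an unallocated edge shared by two expansion vertices is counted twice.

```python
def trim_expansion_set(expansion_vertices, unassigned_degree, current_size, bound):
    """Keep the longest priority-ordered prefix of X_i that fits under the bound.

    expansion_vertices: vertices of X_i, highest priority first
    unassigned_degree: vertex -> number of unallocated adjacent edges
    current_size: number of edges already allocated to P_i
    bound: preset upper balance bound for P_i
    """
    kept = []
    projected = current_size
    for v in expansion_vertices:
        cost = unassigned_degree[v]
        if projected + cost > bound:
            break  # expanding v would push P_i past the bound
        kept.append(v)
        projected += cost
    return kept
```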

In the ExpandC stage, the input includes: the subgraph G_i, the partial subdivision P_i and the expansion vertex set X_i, i = 1, 2, ..., k; the output includes: the subdivision P_i after ExpandC, the updated core vertex set C, derived vertex set S_i and expansion vertex set X_i, i = 1, 2, ..., k, and the communication messages used to synchronize the corresponding sets on the other services.

Specifically, in the ExpandC stage, each subdivision P_i expands all local vertices in its expansion vertex set X_i, that is, all locally visible adjacent edges of these vertices are assigned to the subdivision P_i. For example, if the vertex v is in X₀, all its adjacent edges are allocated to P₀. Assuming that one of the allocated edges is (v, u), the vertex u is added to the derived vertex set S₀ (S₀ is updated). When one edge is expanded by several subdivisions at the same time, one of these subdivisions is randomly selected to receive it.

In the PMIC_ExpandC stage, the coordinator passes the communication messages and updates the globally allocated load. Here, the allocated load specifically means the number of edges that have already been allocated to each subdivision P_i.

In the ExpandS stage, the input includes: the subgraph G_i, the subdivision P_i and the newly added part ΔS_i of S_i, i = 1, 2, ..., k; the output includes: the subdivision P_i after ExpandS, the updated local core vertex set C, derived vertex set S_i and expansion vertex set X_i, i = 1, 2, ..., k, and the synchronization communication messages.

Specifically, in the ExpandS stage, each subdivision P_i scans all the not-yet-allocated adjacent edges of the vertices in the updated derived vertex set S_i; if the other endpoint of such an unallocated adjacent edge is also in the derived vertex set S_i, the adjacent edge is allocated to the subdivision P_i. At the same time, the vertex information in the updated derived vertex set is synchronized with the other services.

In the PMIC_ExpandS stage, the coordinator passes the communication messages and judges whether any unallocated edge exists. If an unallocated edge exists, the next iteration is entered. If no unallocated edge exists, the coordinator judges whether update data ΔG exists: if so, the PMIC_Update stage is entered, and if not, the process ends.
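The decision made by the coordinator at the end of each iteration can be summarized in a few lines of Python. The function name `next_phase` and the returned labels are illustrative only; "Filter" stands for the start of the next iteration of the main loop.

```python
def next_phase(unallocated_edges, has_update_data):
    """Decide what follows PMIC_ExpandS.

    unallocated_edges: number of edges not yet allocated to any subdivision
    has_update_data: True when update data for the graph is pending
    """
    if unallocated_edges > 0:
        return "Filter"        # start the next iteration of the main loop
    if has_update_data:
        return "PMIC_Update"   # process the pending update, then rebalance
    return "Done"              # every edge allocated and nothing to update
```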

In the PMIC_Update stage, the coordinator divides the update data ΔG into a plurality of subdivisions P_i', distributes the subdivisions P_i' to the individual services, and computes the updated upper balance bound.

The ReBalance stage is the same as the rebalancing stage described above and is not described again here.

The bounded incremental graph partitioning method provided by this embodiment can partition graphs efficiently and quickly. In a specific application scenario, when the update data accounts for 10% of the full graph, the method provided by this embodiment achieves a speedup of 7.9 times over repartitioning with the static method, and when the update data accounts for 50% of the full graph, the speedup is still 3.9 times.

The bounded incremental graph partitioning method provided by the embodiment can also achieve the same or even better partitioning quality as static graph partitioning. In a specific application scenario, the graph partitioning communication overhead corresponding to the method provided by the embodiment is about 10% lower than that of other static edge partitioning methods.

The bounded incremental graph partitioning method provided by this embodiment has very strong parallel scalability. In a specific application scenario, when 128 services are used, partitioning a graph structure with 5.8 billion edges into 128 parts takes only 51 seconds.

The bounded incremental graph partitioning method provided by this embodiment takes less time than other existing incremental graph partitioning methods. In a specific application scenario, the response time of the method provided by this embodiment is at least 6.4 times shorter than that of ParMETIS and at least 2.2 times shorter than that of Hermes.

The embodiment of the invention also provides a bounded incremental graph partitioning system, which comprises a coordinator and a plurality of services;

the coordinator includes:

Each service comprises:

In an embodiment, the bounded incremental graph partitioning system further comprises an IO controller;

In an embodiment, each of the services further comprises:

a second distribution unit, configured to allocate an adjacent edge to the first subdivision if both endpoints of the adjacent edge are in the derived vertex set S_i.

Since the embodiment of the system part corresponds to the embodiment of the method part, the embodiment of the system part is described with reference to the embodiment of the method part, and is not repeated here.

Embodiments of the present invention also provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed, the steps provided by the above embodiments can be implemented. The storage medium may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The embodiment of the present invention further provides a computer device, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided in the above embodiments when calling the computer program in the memory. Of course, the computer device may also include various network interfaces, power supplies, and the like.

The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.

It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims

1. A bounded incremental graph partitioning method, comprising:

the coordinator divides the initial graph structure into a plurality of first sub-graphs, correspondingly obtains a plurality of first sub-divisions, and distributes the plurality of sub-graphs and the corresponding first sub-divisions to a plurality of services;

if there is no update data, the graph division processing is ended.

2. The method according to claim 1, wherein the obtaining a first sub-partition corresponding to a first sub-graph of each service and iteratively expanding the first sub-partition comprises:

3. The bounded incremental graph partitioning method according to claim 2, wherein said expanding all the expansion vertices in the expansion vertex set comprises:

updating the derived vertex set according to the newly allocated edges;

4. The method according to claim 1, wherein said determining whether the first sub-division reaches a preset first equilibrium upper bound comprises:

firstly according to the formula

5. The bounded incremental graph partitioning method according to claim 3, wherein said expanding all the expansion vertices in the expansion vertex set further comprises:

6. The method according to claim 4, wherein if there is update data, the coordinator merges the update data with the initial graph structure to obtain a partial graph structure corresponding to the update data, then divides the partial graph structure into a plurality of second subgraphs to obtain second sub-partitions corresponding to the second subgraphs, and distributes the second sub-partitions to the services, and performs iterative expansion by the service that receives the second sub-partitions, and the method includes:

7. The method according to claim 6, wherein if there is update data, the coordinator merges the update data with the initial graph structure to obtain a partial graph structure corresponding to the update data, then divides the partial graph structure into a plurality of second subgraphs to obtain second sub-partitions corresponding to the second subgraphs, distributes the second sub-partitions to the services, and performs iterative expansion by the services receiving the second sub-partitions; if there is no update data, ending the graph partitioning process, further comprising:

8. A bounded incremental graph partitioning system, comprising a coordinator and a plurality of services:

the coordinator includes:

Each service comprises:

9. The bounded incremental graph partitioning system according to claim 8, further comprising an IO controller;

10. The bounded incremental graph partitioning system according to claim 8, wherein each service further comprises:

a second distribution unit, configured to allocate an adjacent edge to the first subdivision if both endpoints of the adjacent edge are in the derived vertex set S_i.