WO2021000435A1

WO2021000435A1 - Large-scale dynamic graph division method based on sliding window

Info

Publication number: WO2021000435A1
Application number: PCT/CN2019/108136
Authority: WO
Inventors: 崔焕庆; 荣炫宇; 贾瑞生; 魏永山; 张峰; 徐强
Original assignee: 山东科技大学
Priority date: 2019-07-01
Filing date: 2019-09-26
Publication date: 2021-01-07
Also published as: CN110309371A

Abstract

A large-scale dynamic graph division method based on a sliding window, which method belongs to the technical field of computers. According to the method, when vertexes are added, the vertexes with higher degrees are preferentially selected from the sliding window for division, so that the vertexes with lower degrees can be gathered to the vertexes with higher degrees, and as many vertexes as possible can also be divided into appropriate partitions at each instance of division, thereby realizing load balancing and reducing the number of cut edges, and thus greatly reducing the communication cost during a graph calculation process; and when edges are added, the vertexes with the most adjacent edges are preferentially selected from the sliding window for division, so that the frequent migration of the vertexes can be effectively avoided, and as many adjacent vertexes as possible can also be divided into appropriate partitions at each instance of division, thereby greatly reducing the number of instances of migrations of the vertexes, improving the division efficiency, and realizing load balancing and minimizing the number of cut edges.

Description

A large-scale dynamic graph partition method based on sliding window

Technical field

The invention belongs to the field of computer technology, and specifically relates to a large-scale dynamic graph division method based on a sliding window.

Background technique

As an abstract data structure, graphs can express complex structures and rich semantics, and have been widely used in many fields such as social networks, communications, and scientific computing. In recent years, with the continuous growth of data scale, it has become necessary to use distributed graph computing systems to analyze and process graph data.

Graph partitioning is a technology for distributing large-scale graph structure data into a distributed computing system composed of a large number of computing nodes, and is the basis for realizing distributed graph computing. In graph partitioning, if the two vertices of an edge are divided into different computing nodes, the edge is called a cut edge. Graph partitioning should minimize the number of cut edges and achieve load balancing among computing nodes.
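The two objectives named above, few cut edges and balanced load, can be measured directly from a vertex-to-partition assignment. The following is a minimal sketch; the function names and the toy graph are assumptions made for illustration and do not come from the invention.

```python
from collections import Counter

def count_cut_edges(edges, partition_of):
    """Count edges whose two endpoints are assigned to different partitions."""
    return sum(1 for u, v in edges if partition_of[u] != partition_of[v])

def partition_loads(partition_of):
    """Number of vertices held by each partition."""
    return Counter(partition_of.values())

# Hypothetical toy graph: a 4-cycle split across two partitions.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
partition_of = {0: "P1", 1: "P1", 2: "P2", 3: "P2"}

cut = count_cut_edges(edges, partition_of)
loads = partition_loads(partition_of)
```

In this toy case the edges (1, 2) and (3, 0) are cut edges, and both partitions carry the same load.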

At present, the graph data in many application scenarios changes frequently, for example through the addition or deletion of users and their relationships in social networks. Such graphs are called dynamic graphs. Most existing graph partitioning algorithms target static graphs: all graph data must be loaded into memory before it is divided. Such algorithms tend to incur huge computational overhead when used for dynamic graph division.

Summary of the invention

In view of the above-mentioned technical problems in the prior art, the present invention proposes a large-scale dynamic graph partition method based on a sliding window, which is reasonable in design, overcomes the shortcomings of the prior art, and has good effects.

In order to achieve the above objectives, the present invention adopts the following technical solutions:

A large-scale dynamic graph partition method based on a sliding window includes the following steps:

Step 1: Add vertices; this specifically includes the following steps:

Input: the set of vertices to be added, $S_{vertex}$, and the vertex sets of the $K$ current partitions, $P_i$ ($i = 1, 2, \ldots, K$);

Step 1.1: Set $W_{vertex} = \varnothing$ and specify the upper limit of $|W_{vertex}|$ as $L_{vertex}$;

Among them, $W_{vertex}$ is the set of candidate vertices to be divided, and its vertices come from $S_{vertex}$;

Step 1.2: Take $N = \min\{|S_{vertex}|, L_{vertex} - |W_{vertex}|\}$, that is, the minimum of $|S_{vertex}|$ and $L_{vertex} - |W_{vertex}|$; add the first $N$ vertices in $S_{vertex}$ to $W_{vertex}$ and delete these vertices from $S_{vertex}$;

Step 1.3: If $W_{vertex} = \varnothing$, output the division result and end the division process; otherwise, go to step 1.4;

Step 1.4: Take $v = \arg\max\{d_u \mid u \in W_{vertex}\}$, where $d_u$ is the degree of vertex $u$, i.e., the number of vertices adjacent to $u$; that is, take the vertex $v$ with the largest degree in $W_{vertex}$. If multiple vertices share the largest degree, choose any one of them;

Step 1.5: Take $V = Q = R = \{v\}$; $V$ is the set of vertices selected to be divided into a certain partition, $Q$ is the vertex queue, and $R$ is the set of all vertices adjacent to the vertices in $V$;

Step 1.6: If $Q = \varnothing$, go to step 1.8; otherwise, take the first vertex $u$ from $Q$ and delete it from $Q$;

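Step 1.7: Take $Q = Q \cup \{w \mid (u, w)$ is an edge of the graph, $w \in W_{vertex}$ and $w \notin R\}$ and $R = R \cup \{w \mid (u, w)$ is an edge of the graph$\}$, then go to step 1.6;

Step 1.8: Take $V = V \cup \{w \mid w \in R$ and $w \in W_{vertex}\}$;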

Step 1.9: For each partition $P_i$ ($i = 1, 2, \ldots, K$), calculate the cost $C_i$;

Among them, $C_i$ is the cost of dividing the vertex or edge into the $i$-th partition; $\max_{j=1,2,\ldots,K}\{|P_j|\}$ is the number of vertices in the partition with the most vertices; $\alpha$ is the weight coefficient of the partition load and the number of cut edges when calculating $C_i$, $0 < \alpha < 1$; $C_i$ combines a quantity that measures the partition load with a quantity that measures the number of cut edges, weighted by $\alpha$;

Step 1.10: Take $m = \arg\min\{C_i \mid i = 1, 2, \ldots, K\}$, that is, $C_m$ is the minimum value among all $\{C_i \mid i = 1, 2, \ldots, K\}$;

Step 1.11: Divide all vertices in $V$ into partition $P_m$; $P_m$ is the partition corresponding to the minimum value $C_m$;

Step 1.12: Take $W_{vertex} = W_{vertex} - V$, then go to step 1.2;
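Steps 1.1 to 1.12 can be sketched as a single loop. The formula for $C_i$ is not reproduced in this text, so the cost is passed in as a function; `assumed_cost` is a hypothetical stand-in (a weighted sum of relative load and newly cut edges) and should not be read as the invention's formula. The rule that only unvisited window vertices enter the queue is likewise an assumption.

```python
from collections import deque

def assumed_cost(i, V, partitions, adj, alpha=0.5):
    """Hypothetical stand-in for C_i: weighted relative load plus edges cut by placing V in P_i."""
    max_load = max(len(p) for p in partitions) or 1
    load = len(partitions[i]) / max_load
    placed = set().union(*partitions)
    cut = sum(1 for v in V for w in adj.get(v, ())
              if w in placed and w not in partitions[i])
    return alpha * load + (1 - alpha) * cut

def add_vertices(stream, partitions, adj, window_limit, cost=assumed_cost):
    """partitions: list of vertex sets; adj: dict mapping a vertex to its neighbours."""
    S = deque(stream)   # S_vertex: vertices still to be added
    W = []              # W_vertex: the sliding window (step 1.1)
    while True:
        # Step 1.2: refill the window from the stream.
        for _ in range(min(len(S), window_limit - len(W))):
            W.append(S.popleft())
        # Step 1.3: an empty window ends the division.
        if not W:
            return partitions
        # Step 1.4: pick the window vertex with the largest degree.
        v = max(W, key=lambda u: len(adj.get(u, ())))
        # Steps 1.5 to 1.8: breadth-first collection of connected window vertices.
        V, R, Q = {v}, {v}, deque([v])
        while Q:
            u = Q.popleft()
            for w in adj.get(u, ()):
                if w in W and w not in R:   # assumed visit rule
                    Q.append(w)
                R.add(w)
        V |= {w for w in R if w in W}
        # Steps 1.9 to 1.11: place V in the partition with the smallest cost.
        m = min(range(len(partitions)), key=lambda i: cost(i, V, partitions, adj))
        partitions[m] |= V
        # Step 1.12: remove the divided vertices from the window.
        W = [w for w in W if w not in V]

# Hypothetical toy input: vertices 3 and 4 arrive; 3 is adjacent to 0, 1 and 4.
adj = {3: [0, 1, 4], 4: [3], 0: [3], 1: [3]}
result = add_vertices([3, 4], [{0, 1}, {2}], adj, window_limit=2)
```

With this input, vertices 3 and 4 are collected into one group and placed together in the partition that already holds 0 and 1.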

Step 2: Add edges; this specifically includes the following steps:

Input: the set of edges to be added, $S_{edge}$, and the vertex sets of the $K$ current partitions, $P_i$ ($i = 1, 2, \ldots, K$);

Prerequisite: All vertices of the edges in $S_{edge}$ have been divided;

Step 2.1: Set $W_{edge} = \varnothing$ and specify the upper limit of $|W_{edge}|$ as $L_{edge}$;

Step 2.2: Take $N = \min\{|S_{edge}|, L_{edge} - |W_{edge}|\}$, that is, the minimum of $|S_{edge}|$ and $L_{edge} - |W_{edge}|$; add the first $N$ edges in $S_{edge}$ to $W_{edge}$ and delete these edges from $S_{edge}$;

Step 2.3: If $W_{edge} = \varnothing$, output the division result and end the division process; otherwise, go to step 2.4;

Step 2.4: For each vertex $v$ in $W_{edge}$, take $E_v = \{u \mid (u, v) \in W_{edge}\}$; $E_v$ is the set of vertices adjacent to vertex $v$ and belonging to $W_{edge}$;

Step 2.5: Take $v = \arg\max\{|E_v|\}$, that is, take the vertex $v$ with the largest number of adjacent vertices in $W_{edge}$. If multiple vertices meet this condition, choose any one of them;

Step 2.6: Take $T = \{w \mid (v, w)$ is an edge of the graph$\}$; $T$ is the set of all vertices adjacent to the vertex $v$ selected from $W_{edge}$;

Step 2.7: For each partition $P_i$ ($i = 1, 2, \ldots, K$), calculate the cost $C_i$ of transferring $v$ to $P_i$, using one expression if $v \in P_i$ and another otherwise;

Where $\max_{j=1,2,\ldots,K}\{|P_j|\}$ is the number of vertices in the partition with the most vertices;

Step 2.8: Take $m = \arg\min\{C_i \mid i = 1, 2, \ldots, K\}$, that is, $C_m$ is the smallest value among all $\{C_i \mid i = 1, 2, \ldots, K\}$;

Step 2.9: Transfer $v$ to the partition $P_m$;

Step 2.10: For each edge $(u, v)$ given by $E_v$, with $u \in P_i$ and $v \in P_j$: if $i \neq j$, divide $(u, v)$ into both $P_i$ and $P_j$; otherwise, divide $(u, v)$ into $P_i$;

Step 2.11: Take $W_{edge} = W_{edge} - E_v$, then go to step 2.2.
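Steps 2.1 to 2.11 can be sketched in the same way. The two cost expressions of step 2.7 are not reproduced in this text, so `assumed_cost` below is a hypothetical stand-in; the helper names, the returned values and the sample partition contents (including the placement of vertices 0, 2 and 5) are also assumptions made for illustration.

```python
from collections import defaultdict, deque

def assumed_cost(i, v, T, partitions, alpha=0.5):
    """Hypothetical stand-in for C_i: weighted load plus neighbours of v left outside P_i."""
    max_load = max(len(p) for p in partitions) or 1
    load = (len(partitions[i]) + (0 if v in partitions[i] else 1)) / max_load
    cut = sum(1 for w in T if w not in partitions[i])
    return alpha * load + (1 - alpha) * cut

def index_of(x, partitions):
    """Index of the partition that currently holds vertex x."""
    return next(i for i, p in enumerate(partitions) if x in p)

def add_edges(stream, partitions, adj, window_limit, cost=assumed_cost):
    """adj maps a vertex to the set of its neighbours already in the graph; it is updated in place."""
    S, W = deque(stream), []          # S_edge and W_edge (step 2.1)
    edge_parts = {}                   # edge -> indices of the partitions that store it
    migrations = 0
    while True:
        for _ in range(min(len(S), window_limit - len(W))):      # step 2.2
            W.append(S.popleft())
        if not W:                                                # step 2.3
            return edge_parts, migrations
        E = defaultdict(set)                                     # step 2.4
        for a, b in W:
            E[a].add(b)
            E[b].add(a)
        v = max(E, key=lambda x: len(E[x]))                      # step 2.5
        T = set(adj.get(v, ())) | E[v]                           # step 2.6
        m = min(range(len(partitions)),
                key=lambda i: cost(i, v, T, partitions))         # steps 2.7 and 2.8
        if v not in partitions[m]:                               # step 2.9
            for p in partitions:
                p.discard(v)
            partitions[m].add(v)
            migrations += 1
        for u in E[v]:                                           # step 2.10
            edge_parts[(u, v)] = {index_of(u, partitions), index_of(v, partitions)}
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        W = [e for e in W if v not in e]                         # step 2.11

# Hypothetical input: vertex 1 already has neighbours 2 and 5; three new edges arrive.
partitions = [{0, 2, 3}, {1, 4, 5, 6}]
adj = {1: {2, 5}, 2: {1}, 5: {1}}
edge_parts, migrations = add_edges([(1, 3), (1, 4), (1, 6)], partitions, adj, window_limit=3)
```

Here vertex 1 stays where it is, the edge to vertex 3 is stored in both partitions, and the other two edges are stored only in the second partition.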

The beneficial technical effects brought by the present invention:

When adding vertices, the present invention preferentially selects vertices with a higher degree from the sliding window for division. This not only gathers low-degree vertices around high-degree vertices, but also divides as many vertices as possible into a suitable partition at each division, so that the number of cut edges is reduced while load balancing is achieved, thereby greatly reducing the communication cost in the graph calculation process.

When adding edges, the present invention preferentially selects the vertices with the most adjacent edges from the sliding window for division. This effectively avoids frequent vertex migration and divides as many adjacent vertices as possible into suitable partitions at each division, thereby greatly reducing the number of vertex migrations, improving the efficiency of division, achieving load balancing, and minimizing the number of cut edges.

Description of the drawings

Figure 1 is a flowchart for adding vertices.

Figure 2 is a flowchart of adding edges.

Figure 3 is a schematic diagram of the window structure when vertices are added.

Figure 4 is a schematic diagram of the window structure when edges are added.

Figure 5 is a schematic diagram of the sliding window model for adding vertices.

Figure 6 shows an example of adding vertices.

Figures 6(a)-(d) respectively show the information in the vertex window corresponding to state A in Figure 5, the partition state of the graph structure data divided before the vertices are added, the division result after $v_8$ and $v_9$ are added using the streaming graph partition algorithm, and the division result after $v_8$ and $v_9$ are added using the algorithm proposed by the present invention.

Figure 7 shows an example of adding edges.

Figures 7(a)-(d) respectively show the window information when adding edges, the partition state of the graph structure data divided before the edges are added, the division result after $(v_1, v_3)$ is added using the streaming graph partition algorithm, and the division result after $v_1$ and its associated edges are added using the algorithm proposed by the present invention.

Detailed description

The present invention will be further described in detail below in conjunction with the drawings and specific embodiments:

Step 1: Add vertices. The process is shown in Figure 1 and follows steps 1.1 to 1.12 set out in the summary above.

Step 2: Add edges. The process is shown in Figure 2 and follows steps 2.1 to 2.11 set out in the summary above.

In the above method, the symbols and meanings involved are shown in Table 1.

Table 1 Main symbols and their meanings

This method needs to build a sliding window. The window structure when adding vertices is shown in Figure 3. The sliding window when adding vertices is composed of $L_{vertex}$ vertices, which are sorted by degree, and each vertex includes 3 fields:

(1) Primary Key: Each vertex to be divided corresponds to a primary key in the sliding window.

(2) Divided adjacent vertices (Secondary Key): A list of vertices that are adjacent to the primary key and have been divided into a certain partition.

(3) Undivided adjacent vertices (Unassigned Key): A list of vertices in the sliding window that are adjacent to the primary key and have not been divided.
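The three fields can be sketched as a small record type. The class and function names are hypothetical, the degree is approximated here as the number of known neighbours in the two lists, and the adjacency of the sample vertices is an assumption made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class VertexEntry:
    primary_key: int                                      # the vertex to be divided
    secondary_keys: list = field(default_factory=list)    # adjacent vertices already divided
    unassigned_keys: list = field(default_factory=list)   # adjacent undivided vertices in the window

    @property
    def degree(self):
        # Approximation: only neighbours recorded in the two lists are counted.
        return len(self.secondary_keys) + len(self.unassigned_keys)

def build_vertex_window(window_vertices, adj, assigned):
    """Build window entries sorted by degree, largest first."""
    in_window = set(window_vertices)
    entries = []
    for v in window_vertices:
        nbrs = adj.get(v, ())
        entries.append(VertexEntry(
            v,
            [w for w in nbrs if w in assigned],
            [w for w in nbrs if w in in_window],
        ))
    entries.sort(key=lambda e: e.degree, reverse=True)
    return entries

# Assumed adjacency for a window holding vertices 8 to 11; vertices 0 to 7 are already divided.
adj = {8: [1, 3, 5, 9], 9: [8, 0, 7]}
window = build_vertex_window([8, 9, 10, 11], adj, assigned=set(range(8)))
```

Sorting by degree puts the entry for vertex 8 at the head of the window, which is the vertex that step 1.4 would select first.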

The window structure when adding edges is shown in Figure 4. The sliding window when adding edges is composed of $L_{edge}$ edges, organized as an adjacency list whose head vertices are sorted by their number of adjacent vertices in $W_{edge}$. Each entry includes:

(1) Primary Key: Each vertex in the sliding window corresponds to a primary key.

(2) Adjacent vertices (Secondary Key): other vertices associated with the primary key and corresponding to the edges in the sliding window.
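A minimal sketch of this adjacency-list window follows, assuming edges are given as vertex pairs; the function name is hypothetical.

```python
from collections import defaultdict

def build_edge_window(window_edges):
    """Group window edges into an adjacency list sorted by number of adjacent vertices."""
    adjacency = defaultdict(list)
    for u, v in window_edges:
        adjacency[u].append(v)   # v is a secondary key of primary key u
        adjacency[v].append(u)   # and u is a secondary key of primary key v
    # Head vertices with the most adjacent vertices in the window come first.
    return sorted(adjacency.items(), key=lambda kv: len(kv[1]), reverse=True)

window = build_edge_window([(1, 3), (1, 4), (1, 6)])
```

For these three edges, vertex 1 becomes the first primary key with secondary keys 3, 4 and 6, so it is the vertex that step 2.5 would select.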

The graphs of many application scenarios are dynamically changing. The specific implementation of adding vertices and adding edges of the present invention will be further described below with reference to the drawings and specific examples.

The first case: adding vertices.

The sliding window model for adding vertices is shown in Figure 5, where $S_{vertex}$ is the vertex stream, and A and B represent the states after $W_{vertex}$ is filled with vertices (by sliding the window to the right).

According to step 1.1, first initialize the window $W_{vertex}$ and specify $L_{vertex}$ as 4.

According to step 1.2, take $N = \min\{|S_{vertex}|, L_{vertex} - |W_{vertex}|\}$ and add the first $N$ vertices in $S_{vertex}$ to $W_{vertex}$ to reach state A. At this time the vertices in $W_{vertex}$ are $\{v_8, v_9, v_{10}, v_{11}\}$, and the window information is shown in Figure 6(a). Figure 6(b) shows the partition state of the graph structure data divided before the vertices are added. The dotted circles $P_1$ and $P_2$ are two partitions, the hollow circles $v_0$, $v_1$, $v_2$, $v_3$, $v_4$, $v_5$, $v_6$ and $v_7$ represent 8 vertices of the graph structure data, and the solid lines between the vertices represent edges in the graph structure data.

According to steps 1.3 to 1.5, take the vertex with the largest degree from the window. Here $v_8$ has a degree of 4, so select $v = v_8$ and add it to $V$, $Q$, and $R$.

According to steps 1.6 to 1.8, $Q = \{v_8\}$ is not empty, so take the first vertex $u = v_8$ from $Q$ and delete $v_8$ from $Q$; traverse the neighbor vertices $v_1$, $v_3$, $v_5$, $v_9$ of $u$, add $v_9$ to $Q$, and add $v_1$, $v_3$, $v_5$, and $v_9$ to $R$. Then continue to perform the above steps, and end when $Q$ is empty. Finally, the undivided vertices in $R$ are added to $V$. At this time, $V = \{v_8, v_9\}$ and $R = \{v_0, v_1, v_3, v_5, v_7\}$.
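The traversal in this example can be replayed with a short sketch. The neighbours of $v_8$ come from the text; the neighbours of $v_9$ ($v_8$, $v_0$, $v_7$) are an assumption inferred from the final contents of $R$, and the function name is hypothetical. The second return value is the set of already divided neighbours, which the text lists as $R$.

```python
from collections import deque

def collect_group(v, window, adj):
    """Steps 1.5 to 1.8: gather window vertices reachable from v and their divided neighbours."""
    V, R, Q = {v}, {v}, deque([v])
    while Q:
        u = Q.popleft()
        for w in adj[u]:
            if w in window and w not in R:   # only unvisited window vertices are queued
                Q.append(w)
            R.add(w)
    V |= {w for w in R if w in window}       # undivided vertices in R join V
    return V, R - V

adj = {8: [1, 3, 5, 9], 9: [8, 0, 7]}        # adjacency of v9 is assumed
V, divided_neighbours = collect_group(8, {8, 9, 10, 11}, adj)
```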

According to steps 1.9 to 1.11, calculate the costs $C_1$ and $C_2$ incurred by adding the vertices $v_8$ and $v_9$ in $V$ to $P_1$ and $P_2$ respectively. Since $C_1 > C_2$, the target partition is $P_m = P_2$; $v_8$ and $v_9$ are added to $P_2$, and the result is shown in Figure 6(d). The number of cut edges after division is 3, which is 2 fewer than the result obtained by the streaming graph partitioning algorithm (Figure 6(c)).

According to step 1.12, delete vertices $v_8$ and $v_9$ from $W_{vertex}$. The division of $v_8$ and $v_9$ in state A shown in Figure 5 is then complete, and the process returns to step 1.2. At this time $N$ is 2, so the vertices $v_{12}$ and $v_{13}$ in $S_{vertex}$ are added to $W_{vertex}$ to reach state B in Figure 5. The above steps are repeated until all vertices in $S_{vertex}$ have been divided into the graph; the division result is then output and the process ends.
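The refill from state A to state B is an application of step 1.2 with $L_{vertex} = 4$ and two vertices left in the window. A minimal sketch follows; the function name is hypothetical, and the trailing vertex `v14` in the stream is an assumption added so that the stream is longer than the free space.

```python
from collections import deque

def refill(window, stream, limit):
    """Step 1.2: move N = min(|S|, L - |W|) vertices from the stream into the window."""
    n = min(len(stream), limit - len(window))
    for _ in range(n):
        window.append(stream.popleft())
    return n

stream = deque(["v12", "v13", "v14"])   # v14 is hypothetical
window = ["v10", "v11"]                 # window after v8 and v9 were removed
n = refill(window, stream, 4)
```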

The second case: adding edges.

In the case of adding edges, steps 2.1 to 2.3 and 2.11 are similar to those for adding vertices: edges are moved from $S_{edge}$ into $W_{edge}$; when the number of edges in $W_{edge}$ reaches $L_{edge}$, or $W_{edge}$ is not empty, a vertex is selected for transfer and the related edges are divided into the corresponding partitions; when $W_{edge}$ is empty, the division result is output and the process ends. Here we mainly describe the process of selecting a vertex for transfer and adding the related edges (step 2.4 to step 2.10) within the window $W_{edge}$, and specify $L_{edge}$ as 3.

According to steps 2.4 to 2.5, select the vertex $v$ with the largest number of adjacent vertices from $W_{edge}$; from the window information in Figure 7(a), $v = v_1$.

According to step 2.6, put all adjacent vertices of $v_1$ into $T$. Figure 7(b) is a schematic diagram of the partition state of the graph structure data divided before the edges are added. Therefore, all vertices adjacent to $v_1$ in Figure 7(a) and Figure 7(b), namely $v_2$, $v_3$, $v_4$, $v_5$, and $v_6$, are put into $T$. At this time, $T = \{v_2, v_3, v_4, v_5, v_6\}$.

According to steps 2.7 to 2.10, calculate the costs $C_1$ and $C_2$ incurred by transferring $v_1$ to $P_1$ and $P_2$ respectively. Since $C_1 > C_2$, the target partition is $P_m = P_2$; because $v_1$ is already in partition $P_2$, it is not moved. The edges in $E_v = \{(v_1, v_3), (v_1, v_4), (v_1, v_6)\}$ are then divided into partitions, and the result of the division is shown in Figure 7(d).

In existing streaming graph division methods, the number of cut edges is mostly used as the main basis for deciding vertex transfers. After $(v_1, v_3)$ is added in Figure 7(b), $v_1$ is transferred to $P_1$ in order to reduce the number of cut edges (Figure 7(c)). After $(v_1, v_4)$ and $(v_1, v_6)$ are then added, $v_1$ is transferred back to the original partition $P_2$, reaching the same division result as the present invention at the cost of two additional vertex transfers, which reduces the division efficiency.
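The difference in vertex transfers can be illustrated with a small simulation. It uses a hypothetical transfer rule (move the vertex to the partition holding most of its neighbours, staying put on ties) rather than the cost $C_i$, and it assumes that $v_2$ and $v_3$ lie in $P_1$ while $v_4$, $v_5$ and $v_6$ lie in $P_2$; the text only states where $v_1$ is.

```python
def best_partition(neighbours, partition_of, current):
    """Hypothetical rule: the partition holding most neighbours wins; ties keep the current one."""
    counts = {}
    for w in neighbours:
        counts[partition_of[w]] = counts.get(partition_of[w], 0) + 1
    if not counts or counts.get(current, 0) == max(counts.values()):
        return current
    return max(counts, key=counts.get)

def count_migrations(v, old_neighbours, new_edges, partition_of, batch):
    """Count transfers of v when edges (v, w) are handled one by one or as a single window."""
    partition_of = dict(partition_of)        # work on a copy
    neighbours = set(old_neighbours)
    migrations = 0
    groups = [new_edges] if batch else [[e] for e in new_edges]
    for group in groups:
        neighbours |= {w for _, w in group}
        target = best_partition(neighbours, partition_of, partition_of[v])
        if target != partition_of[v]:
            partition_of[v] = target
            migrations += 1
    return migrations

# Assumed placement of the neighbours of v1.
placement = {1: "P2", 2: "P1", 3: "P1", 4: "P2", 5: "P2", 6: "P2"}
new_edges = [(1, 3), (1, 4), (1, 6)]
streaming = count_migrations(1, {2, 5}, new_edges, placement, batch=False)
windowed = count_migrations(1, {2, 5}, new_edges, placement, batch=True)
```

Under these assumptions, handling the edges one at a time moves $v_1$ to $P_1$ and back again, while handling them as one window moves it not at all.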

Of course, the above description is not a limitation of the present invention, and the present invention is not limited to the above examples. Changes, modifications, additions or substitutions made by those skilled in the art within the essential scope of the present invention shall also fall within the scope of protection of the present invention.

Claims

A method for dividing a large-scale dynamic graph based on a sliding window, characterized in that it includes the following steps:

Step 1: Add vertices; this specifically includes the following steps:

Input: the set of vertices to be added, $S_{vertex}$, and the vertex sets of the $K$ current partitions, $P_i$ ($i = 1, 2, \ldots, K$);

Step 1.1: Set $W_{vertex} = \varnothing$ and specify the upper limit of $|W_{vertex}|$ as $L_{vertex}$;

Among them, $W_{vertex}$ is the set of candidate vertices to be divided, and its vertices come from $S_{vertex}$;

Step 1.2: Take $N = \min\{|S_{vertex}|, L_{vertex} - |W_{vertex}|\}$, that is, the minimum of $|S_{vertex}|$ and $L_{vertex} - |W_{vertex}|$; add the first $N$ vertices in $S_{vertex}$ to $W_{vertex}$ and delete these vertices from $S_{vertex}$;

Step 1.3: If $W_{vertex} = \varnothing$, output the division result and end the division process; otherwise, go to step 1.4;

Step 1.4: Take $v = \arg\max\{d_u \mid u \in W_{vertex}\}$, where $d_u$ is the degree of vertex $u$, i.e., the number of vertices adjacent to $u$; that is, take the vertex $v$ with the largest degree in $W_{vertex}$. If multiple vertices share the largest degree, choose any one of them;

Step 1.5: Take $V = Q = R = \{v\}$; $V$ is the set of vertices selected to be divided into a certain partition, $Q$ is the vertex queue, and $R$ is the set of all vertices adjacent to the vertices in $V$;

Step 1.6: If $Q = \varnothing$, go to step 1.8; otherwise, take the first vertex $u$ from $Q$ and delete it from $Q$;

Step 1.7: Take $Q = Q \cup \{w \mid (u, w)$ is an edge of the graph, $w \in W_{vertex}$ and $w \notin R\}$ and $R = R \cup \{w \mid (u, w)$ is an edge of the graph$\}$, then go to step 1.6;

Step 1.8: Take $V = V \cup \{w \mid w \in R$ and $w \in W_{vertex}\}$;

Step 1.9: For each partition $P_i$ ($i = 1, 2, \ldots, K$), calculate the cost $C_i$;

Among them, $C_i$ is the cost of dividing the vertex or edge into the $i$-th partition; $\max_{j=1,2,\ldots,K}\{|P_j|\}$ is the number of vertices in the partition with the most vertices; $\alpha$ is the weight coefficient of the partition load and the number of cut edges when calculating $C_i$, $0 < \alpha < 1$; $C_i$ combines a quantity that measures the partition load with a quantity that measures the number of cut edges, weighted by $\alpha$;

Step 1.10: Take $m = \arg\min\{C_i \mid i = 1, 2, \ldots, K\}$, that is, $C_m$ is the minimum value among all $\{C_i \mid i = 1, 2, \ldots, K\}$;

Step 1.11: Divide all vertices in $V$ into partition $P_m$; $P_m$ is the partition corresponding to the minimum value $C_m$;

Step 1.12: Take $W_{vertex} = W_{vertex} - V$, then go to step 1.2;

Step 2: Add edges; this specifically includes the following steps:

Input: the set of edges to be added, $S_{edge}$, and the vertex sets of the $K$ current partitions, $P_i$ ($i = 1, 2, \ldots, K$);

Prerequisite: All vertices of the edges in $S_{edge}$ have been divided;

Step 2.1: Set $W_{edge} = \varnothing$ and specify the upper limit of $|W_{edge}|$ as $L_{edge}$;

Step 2.2: Take $N = \min\{|S_{edge}|, L_{edge} - |W_{edge}|\}$, that is, the minimum of $|S_{edge}|$ and $L_{edge} - |W_{edge}|$; add the first $N$ edges in $S_{edge}$ to $W_{edge}$ and delete these edges from $S_{edge}$;

Step 2.3: If $W_{edge} = \varnothing$, output the division result and end the division process; otherwise, go to step 2.4;

Step 2.4: For each vertex $v$ in $W_{edge}$, take $E_v = \{u \mid (u, v) \in W_{edge}\}$; $E_v$ is the set of vertices adjacent to vertex $v$ and belonging to $W_{edge}$;

Step 2.5: Take $v = \arg\max\{|E_v|\}$, that is, take the vertex $v$ with the largest number of adjacent vertices in $W_{edge}$. If multiple vertices meet this condition, choose any one of them;

Step 2.6: Take $T = \{w \mid (v, w)$ is an edge of the graph$\}$; $T$ is the set of all vertices adjacent to the vertex $v$ selected from $W_{edge}$;

Step 2.7: For each partition $P_i$ ($i = 1, 2, \ldots, K$), calculate the cost $C_i$ of transferring $v$ to $P_i$, using one expression if $v \in P_i$ and another otherwise;

Where $\max_{j=1,2,\ldots,K}\{|P_j|\}$ is the number of vertices in the partition with the most vertices;

Step 2.8: Take $m = \arg\min\{C_i \mid i = 1, 2, \ldots, K\}$, that is, $C_m$ is the smallest value among all $\{C_i \mid i = 1, 2, \ldots, K\}$;

Step 2.9: Transfer $v$ to the partition $P_m$;

Step 2.10: For each edge $(u, v)$ given by $E_v$, with $u \in P_i$ and $v \in P_j$: if $i \neq j$, divide $(u, v)$ into both $P_i$ and $P_j$; otherwise, divide $(u, v)$ into $P_i$;

Step 2.11: Take $W_{edge} = W_{edge} - E_v$, then go to step 2.2.