WO2015169029A1

WO2015169029A1 - Graph data partitioning method and device

Info

Publication number: WO2015169029A1
Application number: PCT/CN2014/087091
Authority: WO
Inventors: 罗圣美; 曲文武; 刘丽霞
Original assignee: 中兴通讯股份有限公司
Priority date: 2014-05-05
Filing date: 2014-09-22
Publication date: 2015-11-12
Also published as: CN105096297A

Abstract

A graph data partitioning method and device. The method comprises: converting original graph data into a locally dense weighted hypergraph through a parallel label propagation algorithm; successively partitioning the weighted hypergraph into balanced weighted hypergraph subgraphs through a partitioning algorithm; and restoring the weighted hypergraph subgraphs into data corresponding to the original graph.

Description

Method and device for dividing graph data

Technical field

The present invention relates to the field of cloud computing technologies and graph data analysis technologies, and in particular, to a method and apparatus for graph data segmentation.

Background art

BSP (Bulk Synchronous Parallel) is a design model for parallel algorithms. In this model, an algorithm is divided into several supersteps, each of which consists of three phases: local computation, mutual communication, and barrier synchronization. The BSP parallel model is well suited to highly iterative calculations. Graph data segmentation operates on graph data, that is, data stored in a graph structure. The graph is one of the most commonly used abstract data structures in computer science. It consists of a finite number of vertices and the edges that connect them, and it is a more general representation than linear tables or trees.

Because real-world applications are often described by graphs, graph data reaches a massive scale as information grows. Owing to the inherent connectivity of graph data and the strong coupling of graph computation, efficient parallel processing requires that a logically complete large graph be decoupled into several parts, that these parts be placed on the working nodes of a distributed storage system, and that parallel distributed processing then be performed.

Existing graph data segmentation schemes mainly fall into the following categories. Heuristic methods, represented by the Kernighan-Lin algorithm: the graph data is first divided into two sets A and B; the influence on the set weight of exchanging each vertex in A with each vertex in B is then calculated, and in each round the two vertices whose exchange has the greatest influence on the set weight are swapped, until the end condition is reached. Spectral segmentation methods: the eigenvectors of the Laplacian matrix of the graph are calculated, the first k eigenvalues and their corresponding eigenvectors are extracted to obtain a representation of each vertex in a low-dimensional space, and k-means clustering is then performed to obtain the division of the graph. These existing graph data segmentation schemes have the following disadvantages:

High time complexity: the Kernighan-Lin algorithm, for example, must compare the vertices of the two sets with one another and calculate the influence of each exchange on the set weight, so its time complexity is O(n³). Moreover, big-data segmentation requires dividing the data into multiple parts; the Kernighan-Lin algorithm must then be run again on each of the two results of the first bisection, which consumes even more time. The spectral method must solve the eigenvalue decomposition of an n-th order square matrix, whose time complexity is also O(n³), so the matrix calculation for large-scale graph data is expensive.

High space complexity: the spectral method, for example, must construct the adjacency matrix of the vertices in the graph data, decompose the corresponding Laplacian matrix, and only then perform the segmentation calculation. The adjacency matrix of the graph is n × n, where n is the number of vertices in the graph. Because large graph data contains a large number of vertices, the matrix is also very large, which hinders calculation and caching.

Parallelization is difficult: because these algorithms were not designed for parallel execution, parallelizing them to improve efficiency is problematic. For example, the Kernighan-Lin algorithm exchanges only one pair of vertices at a time, and the spectral method raises the question of how to perform a large-scale matrix decomposition in parallel.

Summary of the invention

The embodiments of the invention provide a method and a device for segmenting graph data, so as to overcome the high time complexity, high space complexity, and difficulty of parallelization of existing graph data segmentation techniques.

An embodiment of the invention provides a method for segmenting graph data, comprising:

converting the original graph data into a locally dense weighted hypergraph by a parallel label propagation algorithm;

successively dividing the weighted hypergraph into balanced weighted hypergraph subgraphs by a partitioning algorithm; and

restoring the weighted hypergraph subgraphs to data corresponding to the original graph.

Preferably, the above method further has the following feature: converting the original graph data into a locally dense weighted hypergraph by a parallel label propagation algorithm includes:

aggregating the vertices having the same label in the original graph data into a super point by the parallel label propagation algorithm, the weight of the super point being the number of vertices included in the super point;

taking the connecting edge between super points as a super edge, the weight of the super edge being determined by the edges in the original graph; and

representing the weighted hypergraph by the super points and the super edges.

Preferably, the above method further has the following feature: determining the weight of the super edge by the edges in the original graph includes:

if the two endpoints of an edge in the original graph belong to different super points in the weighted hypergraph, a super edge exists between the two super points, and the weight of that super edge is increased by 1;

if the two endpoints of an edge in the original graph belong to the same super point in the weighted hypergraph, no super edge is generated.

Preferably, the above method also has the following feature: successively dividing the weighted hypergraph into balanced weighted hypergraph subgraphs by a partitioning algorithm includes:

calculating, in turn, a minimized partial ratio cut value with each super point in the weighted hypergraph as the starting point; and

dividing the weighted hypergraph into the specified number of weighted hypergraph subgraphs according to the minimized partial ratio cut value.

An embodiment of the invention further provides an apparatus for segmenting graph data, comprising:

a conversion module configured to convert the original graph data into a locally dense weighted hypergraph by a parallel label propagation algorithm;

a dividing module configured to successively divide the weighted hypergraph into balanced weighted hypergraph subgraphs by a partitioning algorithm; and

a restoration module configured to restore the weighted hypergraph subgraphs to data corresponding to the original graph.

Preferably, the above device also has the following feature:

The conversion module is configured to aggregate the vertices having the same label in the original graph data into a super point by the parallel label propagation algorithm, the weight of the super point being the number of vertices included in the super point; the connecting edge between super points is a super edge, whose weight is determined by the edges in the original graph; and the weighted hypergraph is constructed from the super points and the super edges.

Preferably, the above device also has the following feature:

Determining the weight of the super edge from the edges in the original graph includes: if the two endpoints of an edge in the original graph belong to different super points in the weighted hypergraph, a super edge exists between the two super points and the weight of that super edge is increased by 1; if the two endpoints of an edge in the original graph belong to the same super point in the weighted hypergraph, no super edge is generated.

Preferably, the above device also has the following feature:

The dividing module is configured to calculate, in turn, a minimized partial ratio cut value with each super point in the weighted hypergraph as the starting point, and to divide the weighted hypergraph into the specified number of weighted hypergraph subgraphs according to the minimized partial ratio cut value.

The embodiment of the invention further provides a computer program, comprising program instructions, which when executed by the data segmentation device, enable the device to perform the above method.

Embodiments of the present invention also provide a carrier carrying the above computer program.

In summary, the embodiments of the present invention provide a method and a device for segmenting graph data that segment faster, handle larger data sizes, and produce data blocks with less coupling between them, thereby effectively reducing the data communication between working nodes in a parallel computing platform that uses the BSP model and improving processing efficiency.

Brief description of the drawings

FIG. 1 is a flowchart of a method for segmenting graph data according to an embodiment of the present invention;

FIG. 2 is a flowchart of a parallel label propagation algorithm according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of an apparatus for graph data segmentation according to an embodiment of the present invention;

FIG. 4 is the original data graph of Embodiment 1 of the present invention;

FIG. 5 shows the result of label propagation in Embodiment 1 of the present invention;

FIG. 6 is the weighted hypergraph of Embodiment 1 of the present invention;

FIG. 7 is the original data graph of Embodiment 2 of the present invention;

FIG. 8 shows the result of label propagation in Embodiment 2 of the present invention;

FIG. 9 is the weighted hypergraph of Embodiment 2 of the present invention;

FIG. 10 is the original data graph of Embodiment 3 of the present invention;

FIG. 11 shows the result of label propagation in Embodiment 3 of the present invention;

FIG. 12 is the weighted hypergraph of Embodiment 3 of the present invention.

Preferred embodiment of the invention

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments in the present application and the features in the embodiments may be arbitrarily combined with each other.

In practical applications, graphs are not random. Locally dense subgraphs occur widely in many networks. The vertices inside such a subgraph are closely connected to each other and have few connections to external vertices. The segmentation method of this embodiment holds that such a dense subgraph should not be split across two or more partitions but should be treated as an indivisible "atom", so that the division of the graph becomes the division of these indivisible "atoms".

This embodiment provides a bottom-up segmentation method. The method is divided into two phases, an aggregation phase and a segmentation phase. In the aggregation phase, each locally dense subgraph is aggregated into one unit by a distributed label propagation algorithm. These locally dense subgraphs form the most basic segmentation units, referred to herein as super points (Super Vertex), and together they form a hypergraph (Super Graph). In the segmentation phase, the hypergraph generated in the aggregation phase is divided by a greedy successive graph partitioning algorithm. Each step extracts from the hypergraph the set whose cut against the remaining subgraph is smallest, so that the hypergraph is divided successively. Finally, each super point is restored to its original locally dense subgraph, which completes the segmentation of the graph.

The symbols in this embodiment are defined as follows:

Let $G = (V, E)$ denote a graph, where $V$ is the set of vertices in the graph and $E \subseteq V \times V$ is the set of edges. The adjacency matrix of the graph is $M = (\omega_{i,j})_{i,j = 1, 2, \ldots, n}$, where $\omega_{i,j} > 0$ means that the vertices $v_i$ and $v_j$ are connected by an edge of weight $\omega_{i,j}$. When $G$ is an unweighted graph, $\omega_{i,j} = 1$ if $v_i$ is adjacent to $v_j$ and $\omega_{i,j} = 0$ otherwise. $N(S)$ denotes the neighborhood of the vertex set $S$, that is, the vertices adjacent to the vertices in $S$.

For any set of vertices $V_i \subseteq V$, its complement $V \setminus V_i$ is written $\bar{V}_i$.

For any two sets of vertices $A$ and $B$, define:

$$W(A, B) = \sum_{v_i \in A,\ v_j \in B} \omega_{i,j}$$

The cut set of the set $V_i$ is

$$C_i = \{ (u, v) \in E : u \in V_i,\ v \in \bar{V}_i \}$$

and the edge cut of the cut set $C_i$ is $W(V_i, \bar{V}_i)$.

$P = \{V_1, V_2, \ldots, V_k\}$ is a k-way partition of the graph $G$ if and only if (1) $\bigcup_i V_i = V$ and (2) $V_i \cap V_j = \emptyset$ for all $i \neq j$.

The most straightforward way to construct a partition is to solve the minimum edge cut problem: choose a partition $P = \{V_1, V_2, \ldots, V_k\}$ that minimizes

$$\sum_{i=1}^{k} W(V_i, \bar{V}_i)$$

However, the solution of the minimum edge cut problem usually does not produce a balanced partition of the graph. The objective function of the balanced partition is defined as follows:

$$\mathrm{RatioCut}(P) = \sum_{i=1}^{k} \frac{W(V_i, \bar{V}_i)}{|V_i|}$$
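To make the difference between the two objectives concrete, the following Python sketch evaluates both on a small made-up graph. It assumes that the edge-cut objective is the sum over blocks of the weight of edges leaving each block, and that the balanced objective divides each term by the block size (the usual ratio cut). The function names `block_cut`, `edge_cut`, and `ratio_cut` and the example graph are illustrative assumptions and do not come from the source.

```python
def block_cut(block, edges):
    """Total weight of edges with exactly one endpoint inside `block`."""
    return sum(w for u, v, w in edges if (u in block) != (v in block))


def edge_cut(partition, edges):
    """Edge-cut objective: sum of the cut weight of every block."""
    return sum(block_cut(block, edges) for block in partition)


def ratio_cut(partition, edges):
    """Balanced objective: each block's cut weight divided by its size."""
    return sum(block_cut(block, edges) / len(block) for block in partition)


# Hypothetical graph: two triangles joined by edge (3, 4), plus a pendant vertex 7.
edges = [
    (1, 2, 1), (1, 3, 1), (2, 3, 1),
    (4, 5, 1), (4, 6, 1), (5, 6, 1),
    (3, 4, 1), (6, 7, 1),
]
lopsided = [{7}, {1, 2, 3, 4, 5, 6}]      # cuts only edge (6, 7)
balanced = [{1, 2, 3}, {4, 5, 6, 7}]      # cuts only edge (3, 4)

print(edge_cut(lopsided, edges), edge_cut(balanced, edges))
print(ratio_cut(lopsided, edges), ratio_cut(balanced, edges))
```

Both partitions cut a single edge, so the edge-cut objective cannot tell them apart, while the ratio cut is lower for the balanced partition.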

As shown in FIG. 1, the method for segmenting graph data of this embodiment includes the following steps:

S1, converting the original graph data into a locally dense weighted hypergraph by a parallel label transfer algorithm;

In the parallel label propagation method, a vertex only needs to send its label information to its neighboring vertices and does not need to acquire information about any other vertex. The algorithm has linear time complexity and is suitable for parallel computing. As shown in FIG. 2, it includes the following steps:

Step 11. Initialize the label of each vertex in the graph. For a given vertex v, the label $L_v(0) = v$ indicates that at the start of iteration 0 the label of the vertex is the ID of the vertex. Set an iteration counter t = 0;

Step 12. Each vertex v sends its own label to its adjacent vertices;

Step 13. Each vertex calculates its own new label from the labels received from its adjacent vertices, as follows:

$$L_u(t) = f(L_{u_1}(t-1), \ldots, L_{u_i}(t-1))$$

where $L_{u_i}(t-1)$ is the label of the i-th adjacent vertex of the vertex u in iteration (t-1), and f returns the label that occurs most often. When more than one label occurs the maximum number of times, one of them is selected at random as the return value of f;

Step 14. Set t = t + 1 and jump to step 12. The algorithm stops when no vertex label changes any more or when t reaches the set upper limit.
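Steps 11 to 14 can be sketched in Python as a single-process simulation of the synchronous update. This is a minimal sketch under stated assumptions, not the distributed implementation: the function name `label_propagation`, the adjacency-dictionary input, and the `seed` parameter used for reproducible tie-breaking are all hypothetical.

```python
import random
from collections import Counter


def label_propagation(adj, max_iter=60, seed=0):
    """Synchronous label propagation.

    adj maps each vertex ID to a list of its adjacent vertex IDs.
    Returns a dict mapping each vertex ID to its final label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}              # step 11: L_v(0) = v
    for _ in range(max_iter):                 # upper limit on the counter t
        new_labels = {}
        for v, neighbours in adj.items():
            # steps 12 and 13: read the labels the neighbours sent last round
            counts = Counter(labels[u] for u in neighbours)
            if not counts:                    # isolated vertex keeps its label
                new_labels[v] = labels[v]
                continue
            top = max(counts.values())
            tied = [label for label, c in counts.items() if c == top]
            new_labels[v] = rng.choice(tied)  # random choice among tied labels
        if new_labels == labels:              # step 14: no label changed
            break
        labels = new_labels
    return labels


# Hypothetical input: two disjoint triangles and one isolated vertex.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1],
       3: [4, 5], 4: [3, 5], 5: [3, 4],
       6: []}
labels = label_propagation(adj)
print(labels)
```

Because labels only travel along edges, a vertex can only ever adopt a label that originated in its own connected component, whichever way the random ties fall.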

After the calculation is completed, the results are collated to obtain a weighted hypergraph $G_m(V_m, E)$, where $V_m$ is the set of super points and the subscript m is the number of distinct labels carried by the vertices of the graph when the aggregation algorithm ends. The vertices with the same label are aggregated into one super point. The weight of a super point is the number of vertices it contains, that is, the number of vertices with that label. E is the set of super edges between the super points of the hypergraph, and the weight of a super edge is determined by the edges in the original graph: 1) if the two endpoints of an edge in the original graph belong to different super points in the hypergraph, a super edge exists between those two super points and its weight is increased by 1; 2) if the two endpoints of an edge in the original graph belong to the same super point in the hypergraph, no super edge is generated, because the hypergraph of this scheme does not allow self-loops. Consequently, the weight of every super edge in the hypergraph is at least 1.
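The construction rules above can be sketched as follows. The function name `build_weighted_hypergraph` and the input formats (an edge list plus a vertex-to-label dictionary, with the label used as the super point ID) are assumptions made for illustration.

```python
from collections import Counter


def build_weighted_hypergraph(edges, labels):
    """Aggregate a labelled graph into super points and super edges.

    edges  : iterable of (u, v) pairs from the original graph
    labels : dict mapping each original vertex to its final label
    Returns (super_point_weight, super_edge_weight).
    """
    # Weight of a super point = number of original vertices carrying its label.
    super_point_weight = Counter(labels.values())

    super_edge_weight = Counter()
    for u, v in edges:
        a, b = labels[u], labels[v]
        if a != b:
            # Rule 1: endpoints in different super points add 1 to the super edge.
            super_edge_weight[frozenset((a, b))] += 1
        # Rule 2: endpoints in the same super point create no super edge.
    return dict(super_point_weight), dict(super_edge_weight)


# Hypothetical input: vertices 0 and 1 carry label 0; vertices 2, 3, 4 carry label 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]
labels = {0: 0, 1: 0, 2: 2, 3: 2, 4: 2}
points, super_edges = build_weighted_hypergraph(edges, labels)
print(points, super_edges)
```

Using a `frozenset` as the key makes the super edge undirected, so an edge counted from either side lands on the same entry.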

S2, successively dividing the weighted hypergraph into balanced weighted hypergraph subgraphs by a partitioning algorithm;

The label propagation method yields a weighted graph $G_m(V_m, E)$. To obtain a balanced partition of this weighted graph, the present embodiment proposes an algorithm that successively minimizes the Ratio-Cut. The algorithm runs in multiple steps; each step looks for a subset $V_i$ that minimizes the objective function PRC (Partial-Ratio Cut), removes that subset from the graph, and then looks for the next subset in the remaining graph in the same way. The PRC is defined as follows:

$$\mathrm{PRC}(V_i) = \frac{W(V_i, \bar{V}_i)}{|V_i|}$$

where $V_i$ is the i-th block of the partition of the graph, $|V_i|$ is the number of vertices in the block, and $W(V_i, \bar{V}_i)$ is the number of edges between $V_i$ and its complement $\bar{V}_i$. $\mathrm{PRC}(V_i)$ can be understood as the average number of edges cut per vertex of the block $V_i$.
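As a rough illustration of the PRC objective, the sketch below computes the cut weight of a block divided by its size. Several points are assumptions rather than statements from the source: the block size is taken as the number of super points in the block, an empty block is given an infinite PRC so that any real candidate beats it, and the 12-super-point hypergraph is invented for the example.

```python
import math


def prc(block, super_edges):
    """Partial ratio cut of one block.

    super_edges maps (a, b) pairs of super point IDs to a super edge weight.
    """
    block = set(block)
    if not block:
        return math.inf  # assumption: an empty block never wins a comparison
    cut = sum(w for (a, b), w in super_edges.items()
              if (a in block) != (b in block))
    return cut / len(block)


# Hypothetical hypergraph: two chains of six super points each,
# joined by two super edges of weight 1.
super_edges = {(i, i + 1): 2 for i in range(1, 6)}
super_edges.update({(i, i + 1): 2 for i in range(7, 12)})
super_edges[(3, 12)] = 1
super_edges[(4, 8)] = 1

print(prc({1, 2, 3, 4, 5, 6}, super_edges))
print(prc({1}, super_edges))
```

A block that follows the natural structure of the graph cuts few edges per member and so scores a low PRC, while a single super point torn out of its chain scores much higher.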

The steps of the successive partitioning algorithm include:

Step 21: Initialize the super point sets setList and bestSet as empty sets, set the number of blocks to be divided to k, and set the block counter i = 0;

Step 22: For each super point $v \in V_m$ in the weighted hypergraph $G_m(V_m, E)$, calculate a super point set:

$$S = \mathrm{minimizePRC}(G_m, n, v)$$

This formula means that a block is grown from the hypergraph $G_m$ with the super point v as the starting point, where the number of blocks is k and n is the number of vertices that each block should roughly contain so that the k blocks are balanced. The calculation method of minimizePRC (minimized PRC) is shown below.

Step 23: Calculate the PRC value of the set S and of the set bestSet. If PRC(S) < PRC(bestSet), let bestSet be S. After every super point has been tried as a starting point, add the elements of bestSet to setList, remove the elements of bestSet from the super point set $V_m$, increment the counter i by 1, and empty bestSet.

Step 24: If the counter i is smaller than k, jump to step 22; if the counter i is greater than or equal to k, the algorithm stops and returns setList.
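The outer loop can be sketched as below. This is a simplified sketch under several assumptions that are not taken from the source: block size is counted in super points, the growth routine `grow` is a plain greedy stand-in for the minimized-PRC calculation, an empty bestSet has infinite PRC, and, following Example 1, the super points left after k - 1 extractions form the last block. All function names are hypothetical.

```python
import math


def cut_weight(block, adj):
    """Weight of super edges leaving `block`; adj[v] maps neighbour -> weight."""
    return sum(w for v in block for u, w in adj[v].items() if u not in block)


def prc(block, adj):
    return cut_weight(block, adj) / len(block) if block else math.inf


def grow(adj, remaining, n, start):
    """Greedy stand-in: grow a block of up to n super points from `start`."""
    block = {start}
    while len(block) < n:
        candidates = sorted(remaining - block)
        if not candidates:
            break
        block.add(min(candidates, key=lambda v: prc(block | {v}, adj)))
    return block


def successive_partition(adj, k):
    remaining = set(adj)
    n = -(-len(adj) // k)                       # ceiling of |V_m| / k
    set_list = []
    for _ in range(k - 1):
        # Restrict the hypergraph to the super points not yet assigned.
        sub = {v: {u: w for u, w in adj[v].items() if u in remaining}
               for v in remaining}
        best_set = set()
        for v in sorted(remaining):             # try every starting point
            s = grow(sub, remaining, n, v)
            if prc(s, sub) < prc(best_set, sub):
                best_set = s
        set_list.append(best_set)
        remaining -= best_set                   # remove the block from V_m
    set_list.append(remaining)                  # what is left is the last block
    return set_list


# Hypothetical hypergraph: two tightly linked triples joined by one weak super edge.
adj = {1: {2: 2, 3: 2}, 2: {1: 2, 3: 2}, 3: {1: 2, 2: 2, 4: 1},
       4: {3: 1, 5: 2, 6: 2}, 5: {4: 2, 6: 2}, 6: {4: 2, 5: 2}}
blocks = successive_partition(adj, 2)
print(blocks)
```

Trying every super point as a starting point costs one growth run per super point per block, which is affordable here because the aggregation phase has already shrunk the graph to its super points.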

S3. Restore the weighted hypergraph subgraph to data corresponding to the original graph.

The super point set is restored to the corresponding original vertex set according to the super point information.
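A minimal sketch of the restoration step, assuming each super point is identified by the label its member vertices share; the function name `restore_partitions` and the input formats are illustrative assumptions.

```python
def restore_partitions(set_list, labels):
    """Expand blocks of super points back into blocks of original vertices.

    set_list : list of sets of super point IDs (one set per block)
    labels   : dict mapping each original vertex to its super point ID
    """
    members = {}
    for vertex, super_point in labels.items():
        members.setdefault(super_point, set()).add(vertex)
    return [set().union(*(members[s] for s in block)) for block in set_list]


# Hypothetical input: three super points (0, 2, 4) split into two blocks.
labels = {0: 0, 1: 0, 2: 2, 3: 2, 4: 4}
restored = restore_partitions([{0, 2}, {4}], labels)
print(restored)
```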

In the successive partitioning algorithm of this embodiment, the steps of the minimum PRC algorithm include:

Step 31: Initialize the super point result set $V_i$ as an empty set. Set the maximum number of elements in each $V_i$ to n, where n is determined from k, the number of blocks that need to be divided.

Step 32: Take the super points v that have not yet been added to any block, in ascending order of super point ID, substitute each into the formula below, and select the super point that minimizes its value as the point to be added, that is, calculate

$$v^{*} = \arg\min_{v} \frac{W(V_i, \bar{V}_i) + d_v - 2k_{v,V_i}}{|V_i| + |v|}$$

where $W(V_i, \bar{V}_i)$ is the weight of the super edges currently cut by $V_i$, $k_{v,V_i} = \sum_{j \in V_i} W(v, j)$, $d_v$ is the number of super edges, counted by weight, that have the super point v as an endpoint, and $V_i$ is the super point result set.

Step 33: When $|V_i| + |v| \le n$, the super point v is added to the set $V_i$; when $|V_i| + |v| > n$, the super point v is not added to the set $V_i$, where $|v|$ is the number of elements in the super point v.

Step 34: When $|V_i| < n$, jump to step 32.

Step 35: Return the result set V _i .
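Steps 31 to 35 can be read as a greedy growth procedure. The sketch below is one plausible reading, not the source's confirmed formula: it assumes that the value minimized in step 32 is the PRC the block would have after adding v, computed incrementally as (current cut + d_v - 2k) divided by the new size, where k is the weight of the super edges between v and the block. It also stops when no remaining super point fits, which the listed steps do not state. The names `minimize_prc`, `adj`, and `weight` are hypothetical.

```python
def minimize_prc(adj, weight, n, start):
    """Grow a block of total weight at most n from the super point `start`.

    adj    : dict, adj[v] maps each neighbouring super point to the super edge weight
    weight : dict, weight[v] is the number of original vertices in super point v
    """
    block = {start}
    size = weight[start]
    cut = sum(adj[start].values())        # every super edge of `start` is cut
    while size < n:                       # step 34
        best = None                       # (value, super point, cut after adding)
        for v in sorted(adj):             # step 32: ascending super point ID
            if v in block or size + weight[v] > n:
                continue                  # step 33: v would overflow the block
            d_v = sum(adj[v].values())
            k_v = sum(w for u, w in adj[v].items() if u in block)
            new_cut = cut + d_v - 2 * k_v
            value = new_cut / (size + weight[v])
            if best is None or value < best[0]:
                best = (value, v, new_cut)
        if best is None:
            break                         # assumption: stop when nothing fits
        _, v, cut = best
        block.add(v)
        size += weight[v]
    return block                          # step 35


# Hypothetical chain of four super points: 1 =3= 2 -1- 3 =3= 4.
adj = {1: {2: 3}, 2: {1: 3, 3: 1}, 3: {2: 1, 4: 3}, 4: {3: 3}}
weight = {v: 1 for v in adj}
print(minimize_prc(adj, weight, 2, 1))
print(minimize_prc(adj, weight, 2, 3))
```

Updating the cut incrementally avoids rescanning every super edge of the block for each candidate: adding v turns its edges into the block from cut to internal (subtracting k once for each side) and exposes its remaining edges.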

The method of the embodiment of the invention converts the original graph data into a locally dense weighted hypergraph by a label propagation algorithm, uses a purpose-designed successive partitioning algorithm to obtain a balanced partition of the weighted graph, and finally restores the partitioned weighted graph to the corresponding vertices and edges of the original graph. Compared with the related art, the segmentation is faster, the data scale that can be processed is larger, and the coupling between the segmented data blocks is smaller, which effectively reduces the data communication between working nodes in a parallel computing platform that uses the BSP model and thereby improves processing efficiency.

The method of the embodiments of the present invention can be widely used in the field where parallel processing of graph data is required:

1. In the parallel graph data analysis platform, it can be used as a data loading algorithm, and the corresponding working nodes are reasonably allocated according to the topology structure of the graph data, thereby reducing the communication between the working nodes in the parallel computing process;

2. In the parallel graph data analysis algorithm, it can be used as part of data preprocessing to divide the original data, improve the efficiency of the algorithm, and shorten the running time;

3. It can be used as a supervised graph analysis algorithm to cluster graph data. The desired clustering result is calculated by manually setting the number of types to be aggregated.

FIG. 3 is a schematic diagram of an apparatus for graph data segmentation according to an embodiment of the present invention. As shown in FIG. 3, the apparatus of this embodiment may include:

a conversion module configured to convert the original graph data into a locally dense weighted hypergraph by a parallel label propagation algorithm;

a dividing module configured to successively divide the weighted hypergraph into balanced weighted hypergraph subgraphs by a partitioning algorithm; and

a restoration module configured to restore the weighted hypergraph subgraphs to data corresponding to the original graph.

In a preferred embodiment, the conversion module is configured to aggregate the vertices having the same label in the original graph data into a super point by the parallel label propagation algorithm, the weight of the super point being the number of vertices included in the super point; the connecting edge between super points is a super edge, whose weight is determined by the edges in the original graph; and the weighted hypergraph is constructed from the super points and the super edges.

Determining the weight of the super edge by the edges in the original graph includes: if the two endpoints of an edge in the original graph belong to different super points in the weighted hypergraph, a super edge exists between the two super points and the weight of that super edge is increased by 1; if the two endpoints of an edge in the original graph belong to the same super point in the weighted hypergraph, no super edge is generated.

In a preferred embodiment, the dividing module is configured to calculate, in turn, a minimized PRC value with each super point in the weighted hypergraph as the starting point, and to divide the weighted hypergraph into the specified number of weighted hypergraph subgraphs according to the minimized PRC value.

The method of the embodiment of the present invention is described in detail below through several application examples.

Example 1:

Referring to FIG. 4, assuming that the number of required partitioning blocks is two, the dividing steps of this embodiment are as follows:

Step 101: Aggregate the hypergraph by parallel label propagation;

1) Initialize the label of each vertex in the graph to its own ID; set the counter t = 0 and set the upper limit of the counter t to 60. Taking the vertex set {0, 1, 2, 3, 4} in the figure as an example, the label value of each vertex is its own ID, that is, L₀(0) = 0, L₁(0) = 1, L₂(0) = 2, L₃(0) = 3, and L₄(0) = 4.

2) Each vertex v sends its own label to its adjacent vertices. According to the adjacency relationship, vertex 0 sends label 0 to adjacent vertices {1, 2, 3, 4, 5}, vertex 1 sends label 1 to adjacent vertices {0, 2, 3, 4}, vertex 2 sends label 2 to adjacent vertices {0, 1, 3, 4, 8}, vertex 3 sends label 3 to adjacent vertices {0, 1, 2, 4}, and vertex 4 sends label 4 to adjacent vertices {0, 1, 2, 3, 11, 14};

3) Each vertex v calculates its own new label from the labels received from other vertices by returning the label with the most occurrences; when more than one label has the most occurrences, one of them is selected at random. In this first iteration each vertex receives every label exactly once, so all received labels are tied. To simplify the calculation, assume that the randomly selected label is 1, which gives L₀(1) = 1, L₁(1) = 2, L₂(1) = 1, L₃(1) = 1, L₄(1) = 1; because the labels received by vertex 1 do not include 1, its random selection result is 2;

4) Since the iteration has not reached the set upper limit, each vertex continues by sending its current label: vertex 0 sends label 1 to adjacent vertices {1, 2, 3, 4, 5}, vertex 1 sends label 2 to adjacent vertices {0, 2, 3, 4}, vertex 2 sends label 1 to adjacent vertices {0, 1, 3, 4, 8}, vertex 3 sends label 1 to adjacent vertices {0, 1, 2, 4}, and vertex 4 sends label 1 to adjacent vertices {0, 1, 2, 3, 11, 14};

5) Each vertex receives the label information and calculates its own new label. Vertex 0 receives L₁(1) = 2, L₂(1) = 1, L₃(1) = 1, L₄(1) = 1, and L₅(1) = label5, where label5 denotes the current label of vertex 5; since 1 is the majority, L₀(2) = 1. Vertex 1 receives L₀(1) = 1, L₂(1) = 1, L₃(1) = 1, L₄(1) = 1; since 1 is the majority, L₁(2) = 1. The calculation for the remaining vertices is similar to that for vertex 1, and in subsequent iterations the labels remain unchanged because most of the labels of the adjacent vertices are 1.

6) Set t = t + 1 and jump to 2); repeat the above process until the vertex labels no longer change or the counter t reaches the set upper limit, at which point the algorithm stops.
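The majority rule used in sub-steps 3) and 5) can be checked with a few lines of Python. The helper name `most_frequent_labels` is hypothetical, and the string "label5" stands in for the unspecified label of vertex 5.

```python
from collections import Counter


def most_frequent_labels(received):
    """Return every label tied for the highest count, in order of first arrival."""
    counts = Counter(received)
    top = max(counts.values())
    return [label for label, c in counts.items() if c == top]


# Iteration 1, vertex 0: labels 1 to 5 arrive once each, so all five are tied
# and the new label is a random pick among them.
print(most_frequent_labels([1, 2, 3, 4, 5]))

# Iteration 2, vertex 0: label 1 arrives three times and wins outright.
print(most_frequent_labels([2, 1, 1, 1, "label5"]))

# Iteration 2, vertex 1: every neighbour sends label 1.
print(most_frequent_labels([1, 1, 1, 1]))
```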

The result of the calculation in step 101 is shown in FIG. 5, where the same shaded background indicates vertices that carry the same label at the end of the aggregation. The corresponding weighted hypergraph is then generated, as shown in FIG. 6, where the original graph vertices corresponding to super point 1 are {5, 6, 7, 8, 9}, those corresponding to super point 2 are {0, 1, 2, 3, 4}, those corresponding to super point 3 are {10, 11, 12, 13, 14}, those corresponding to super point 4 are {15, 16, 17, 18, 19}, those corresponding to super point 5 are {25, 26, 27, 28, 29}, those corresponding to super point 6 are {20, 21, 22, 23, 24}, those corresponding to super point 7 are {50, 51, 52, 53, 54}, those corresponding to super point 8 are {55, 56, 57, 58, 59}, those corresponding to super point 9 are {40, 41, 42, 43, 44}, those corresponding to super point 10 are {35, 36, 37, 38, 39}, those corresponding to super point 11 are {30, 31, 32, 33, 34}, and those corresponding to super point 12 are {45, 46, 47, 48, 49}. The super edge weights are marked in the figure, and the super point weights are all 5.

Step 102: Successive partitioning;

1) Initialize the super point sets setList and bestSet as empty sets, set the number of blocks to be divided to 2, and set the block counter i = 0;

2) Calculate the minimized PRC value in turn with each super point in the weighted hypergraph as the starting point. According to the data shown in FIG. 5, S = minimizePRC(G_m, 6, 1) is calculated first, that is, a set of 6 elements is extracted from the hypergraph G_m with super point 1 as the starting point. According to the minimizePRC algorithm, the following calculations are performed:

21) Set the result set S to an empty set and add the starting super point 1 as an element;

22) At this point the set S contains 1 element, fewer than the set value of 6, so super points must continue to be added: the selection value is calculated for each remaining super point, and the super point with the smallest value is added to the set. Taking super point 2 as an example, $d_2$ is the sum of the weights of all super edges of super point 2, which is 2 + 2 = 4, and $2k_{2,V} = 2\sum_{j \in V} W(2, j) = 2 \times W(2, 1) = 2 \times 1 = 2$. The corresponding values of super points 3 through 12 are calculated in the same way. Since the set S currently contains 1 element, fewer than the required 6, super point 2 is added as an element of the set S;

23) Repeat 22) until the number of elements in the set S reaches 6, and return the set S as the result; the set S is {1, 2, 3, 4, 5, 6};

24) Calculate the PRC of the result set S to obtain PRC(S) = 0.33333. Since bestSet is an empty set at this time, PRC(S) < PRC(bestSet), so let bestSet be S. Then return to 2) and calculate the set S = minimizePRC(G_m, 6, 2) with super point 2 as the starting point, together with the PRC(S) corresponding to that S. After all 12 super points have been used as the starting point, the set S with the smallest PRC(S) (that is, bestSet) is one of the data partitions; the elements of bestSet are added to setList and removed from the super point set V_m, and the counter i is increased by 1;

25) Since k = 2, once one block has been extracted the remaining super points form the other block, and the two blocks are returned as the result.

The calculation result is two super point sets, {7, 8, 9, 10, 11, 12} and {1, 2, 3, 4, 5, 6}. As can be seen in the corresponding figure, the division cuts the super edges of weight 1 between super points 12 and 3 and between super points 4 and 8.

26) Restore the super points to the corresponding original vertices, which completes the segmentation.

Referring to FIG. 7, assuming that the number of blocks required for segmentation is four, the embodiment includes the following steps:

Step 201: The parallel label passes the aggregate hypergraph, and includes the following steps:

Step 2011, the label of each vertex in the initialization graph is its own ID; setting the counter t=0, setting the upper limit of the counter t to 60; taking the vertex set {18, 19, 20, 21, 22, 23} in the figure as an example for description The label value of each vertex is its own ID, that is, L ₁₈ (0)=18, L ₁₉ (0)=19, L ₂₀ (0)=20, L ₂₁ (0)=21, L ₂₂ (0)= 22, L ₂₃ (0) = 23;

Step 2012, each vertex v, sends its own label to its adjacent vertex. According to the adjacency relationship, the vertex 18 sends the label 18 to the adjacent point {12, 19, 20, 21, 22, 23, 24}, and the vertex 19 sends Label 19 to adjacent vertices {17, 18, 20, 21, 22, 23}, vertex 20 sends label 20 to adjacent vertices {18, 19, 21, 22, 23}, vertex 21 sends label 21 to adjacent vertices {18,19,20,22,23}, vertex 22 sends label 22 to adjacent vertices {18, 19, 20, 21, 23, 28}, vertex 23 sends label 23 to adjacent vertices {18, 19, 20 , 21, 22};

Step 2013: Each vertex v calculates a label of the highest number of occurrences according to the received label sent by other vertices, and randomly selects one of the labels when the number of times has a maximum number of labels; The label information is received once, and the labels received by each vertex are all once. In order to simplify the calculation process, the random selection label result is 18, and the result L ₁₉ (1)=18, L ₂₀ (1)=18, L ₂₁ (1)=18, L ₂₂ (1)=18, L ₂₃ (1)=18, since the tag information received by the vertex 18 does not include 18, the random selection result is 12;

Step 2014, since the iteration does not reach the set upper limit, each vertex continues to send its own current tag information, and the vertex 18 sends the tag 12 to the adjacent vertex {12, 19, 20, 21, 22, 23, 24}, vertex 19 Send label 18 to adjacent points {17, 18, 20, 21, 22, 23}, vertex 20 sends label 18 to adjacent vertices {18, 19, 21, 22, 23}, vertex 21 sends label 18 to adjacent Vertices {18, 19, 20, 22, 23}, vertex 22 sends label 18 to adjacent vertices {18, 19, 20, 21, 23, 28}, vertex 23 sends label 18 to adjacent vertices {18, 19, 20,21,22};

Step 2015: Each vertex receives the label information and calculates its own new label. The information received by vertex 18 is L₁₂(1)=label12, L₁₉(1)=18, L₂₀(1)=18, L₂₁(1)=18, L₂₂(1)=18, L₂₃(1)=18, L₂₄(1)=label24; since 18 is the majority, L₁₈(2)=18. The information received by vertex 19 is L₁₇(1)=label17, L₁₈(1)=12, L₂₀(1)=18, L₂₁(1)=18, L₂₂(1)=18, L₂₃(1)=18; since 18 is the majority, L₁₉(2)=18. The calculation for the remaining vertices is similar to that for vertex 18. In the subsequent iteration steps, since most of the labels among the adjacent vertices are 18, the labels remain unchanged. In the above process, label12, label17 and label24 denote the labels that vertices 12, 17 and 24 have assigned themselves in the same process; since these labels are a minority within the vertex set discussed, they have no influence on the calculation result.

Set t=t+1 and jump to step 2012; the above process is repeated until the vertex labels no longer change or the counter t reaches the set upper limit, at which point the algorithm stops.
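The loop of steps 2011 to 2015 can be sketched in Python as follows. This is a minimal single-process illustration, not the parallel implementation of the embodiment; the function names `most_frequent_label` and `label_propagation`, the adjacency-dictionary input and the `seed` parameter are assumptions made for the example. The adjacency dictionary is assumed to be symmetric and to contain every vertex as a key.

```python
import random
from collections import Counter


def most_frequent_label(received, rng):
    """Return the label that occurs most often; ties are broken at random."""
    counts = Counter(received)
    top = max(counts.values())
    candidates = sorted(label for label, n in counts.items() if n == top)
    return rng.choice(candidates)


def label_propagation(adjacency, max_iter=60, seed=0):
    """Synchronous label propagation over an undirected adjacency dict."""
    rng = random.Random(seed)
    labels = {v: v for v in adjacency}            # initial label = own ID
    for _ in range(max_iter):                     # counter t with upper limit
        new_labels = {}
        for v, neighbours in adjacency.items():
            received = [labels[u] for u in neighbours]
            # A vertex without neighbours receives nothing and keeps its label.
            new_labels[v] = (
                most_frequent_label(received, rng) if received else labels[v]
            )
        if new_labels == labels:                  # labels no longer change
            break
        labels = new_labels
    return labels


# Labels received by vertex 18 in step 2015 of the worked example.
received_by_18 = [12, 18, 18, 18, 18, 18, 24]
print(most_frequent_label(received_by_18, random.Random(0)))  # 18
```

Because every vertex updates from the labels of the previous round, two vertices can in principle keep swapping labels; the upper limit on t guarantees termination in that case.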

The result of the calculation in step 201 is shown in Fig. 8, in which vertices that end the aggregation with the same label share the same shaded background. The corresponding weighted hypergraph is then generated, see Fig. 9, where the original-graph vertices corresponding to super point 1 are {42, 43, 44, 45, 46, 47}, those corresponding to super point 2 are {12, 13, 14, 15, 16, 17}, those corresponding to super point 3 are {24, 25, 26, 27, 28}, those corresponding to super point 4 are {0, 1, 2, 3, 4, 5}, those corresponding to super point 5 are {18, 19, 20, 21, 22, 23}, those corresponding to super point 6 are {6, 7, 8, 9, 10, 11}, those corresponding to super point 7 are {30, 31, 32, 33, 34, 35}, and those corresponding to super point 8 are {36, 37, 38, 39, 40, 41}. The super edge weights are as marked in the figure, and the super point weight is 6.
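The aggregation into a weighted hypergraph can be illustrated with a short Python sketch. It follows the rule stated in the claims: a super point's weight is the number of vertices it contains, and each original edge whose endpoints fall in different super points adds 1 to the weight of the super edge between them. The function name `build_weighted_hypergraph` and the input layout (an edge list plus a vertex-to-label dictionary) are assumptions for illustration.

```python
from collections import Counter


def build_weighted_hypergraph(edges, labels):
    """Aggregate same-label vertices into super points.

    edges  : iterable of (u, v) pairs, each undirected edge listed once
    labels : dict mapping vertex -> final label after label propagation
    """
    point_weight = Counter(labels.values())       # vertices per super point
    edge_weight = Counter()
    for u, v in edges:
        a, b = labels[u], labels[v]
        if a != b:                                # endpoints in different super points
            edge_weight[frozenset((a, b))] += 1   # +1 per original edge
        # endpoints in the same super point: no super edge is generated
    return dict(point_weight), dict(edge_weight)


# Toy input (not taken from the figures): two triangles joined by two edges.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (1, 4)]
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
points, super_edges = build_weighted_hypergraph(edges, labels)
print(points)       # {'A': 3, 'B': 3}
```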

Step 202: Successive partitioning, including the following steps:

Step 2021: Initialize the super point sets setList and bestSet as empty sets, set the number of blocks to be divided to 4, and set the block counter i=0;

Step 2022: Calculate the minimum PRC value sequentially, starting from each super point in the weighted hypergraph. According to the data shown in Fig. 8, S=minimizePRC(G_m, 2, 1) is calculated starting from super point 1; that is, a set of no more than 2 elements is selected from the hypergraph G_m with super point 1 as the starting point. According to the minimizePRC algorithm, the following calculations are performed:

1) Set the result set S to the empty set and add the starting super point 1 as an element;

2) The set S now has 1 element, fewer than the target of 2, so another super point must be added. A value is calculated for each super point not yet in S, and the super point with the smallest value is added to the set. Taking super point 2 as an example: $d_2$ is the sum of the weights of all edges of super point 2, which is 3+1=4, and $2k_{2,S} = 0$. The corresponding values for super points 4, 5, 6, 7 and 8 are calculated in the same way.

Following this rule, super point 8 is added as an element of the set S;

3) Since the size of the set S is now 2, S satisfies the requirement and is returned as the result; the set S is {1, 8}.
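The greedy growth of S in items 1) to 3) can be sketched as below. The quantities d_v and k_{v,S} are the ones named in the text; the exact value that the embodiment minimizes is not reproduced here, so the sketch takes the scoring rule as a caller-supplied function. The names `degree`, `links_into`, `grow_set` and `assumed_score`, the nested-dictionary layout of the weights, and the toy weights are all assumptions for illustration, and `assumed_score` is only a stand-in, not the criterion of the embodiment.

```python
def degree(W, v):
    """d_v: sum of the weights of all super edges incident to v."""
    return sum(W[v].values())


def links_into(W, v, S):
    """k_{v,S}: sum of the weights of super edges between v and members of S."""
    return sum(w for u, w in W[v].items() if u in S)


def grow_set(W, start, target_size, score):
    """Greedily grow S from `start` until it holds `target_size` super points."""
    S = {start}
    while len(S) < target_size:
        candidates = [v for v in W if v not in S]
        if not candidates:
            break
        # Add the candidate with the smallest score; ties go to the smaller ID.
        best = min(candidates, key=lambda v: (score(W, v, S), v))
        S.add(best)
    return S


def assumed_score(W, v, S):
    # ASSUMPTION: stand-in criterion built from d_v and 2*k_{v,S};
    # the source does not reproduce the actual expression.
    return degree(W, v) - 2 * links_into(W, v, S)


# Hypothetical symmetric weights W[a][b]; not the weights of Fig. 9.
W = {1: {2: 3, 3: 1}, 2: {1: 3, 3: 1}, 3: {1: 1, 2: 1, 4: 2}, 4: {3: 2}}
print(grow_set(W, 1, 2, assumed_score))  # {1, 2}
```

Passing the score as a parameter keeps the growth loop independent of the particular cut-rate expression used.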

Step 2023: Perform a PRC calculation on the result set S of step 2022 to obtain PRC(S)=1.0. Since bestSet is an empty set at this time, PRC(S)<PRC(bestSet), so bestSet is set to S and the elements of bestSet are added to setList. The process then jumps back to step 2022 to calculate the set S=minimizePRC(G_m, 2, 2) starting from super point 2, together with the PRC(S) corresponding to S. After all 8 super points have been used as the starting point, the set S with the smallest PRC(S) (that is, bestSet) is one of the data partitions; the elements of bestSet are removed from the super point set V_m, and the counter i is increased by 1.

Step 2024: Steps 2022 and 2023 are repeated until the block counter equals 4.

Step 2025: Since k=4, after three blocks have been divided off, the remaining super points form the last block, and finally four data blocks are returned.
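The outer loop of steps 2021 to 2025 can be sketched as follows. The sketch only shows the control flow: try every remaining super point as a starting point, keep the candidate set with the smallest value, remove it, and treat what is left after k-1 rounds as the last block. The callables `grow` and `prc` are hypothetical placeholders for the minimizePRC set construction and the PRC evaluation, which the caller must supply; the toy versions used in the example are not the patent's formulas.

```python
def successive_partition(points, k, grow, prc):
    """Split `points` into k blocks by repeatedly extracting the best set.

    grow(remaining, start) -> candidate set S grown from `start`
    prc(S, remaining)      -> value to minimise (smaller is better)
    """
    remaining = set(points)            # plays the role of V_m
    blocks = []                        # plays the role of setList
    for _ in range(k - 1):             # the k-th block is whatever is left
        if not remaining:
            break
        best_set, best_val = None, None
        for start in sorted(remaining):
            S = grow(remaining, start)
            val = prc(S, remaining)
            # An empty bestSet is always replaced by the first candidate.
            if best_val is None or val < best_val:
                best_set, best_val = S, val
        blocks.append(best_set)
        remaining -= best_set
    blocks.append(remaining)
    return blocks


# Toy placeholders: pair each start with its nearest remaining ID,
# and score a set by the sum of its IDs.
def toy_grow(remaining, start):
    return set(sorted(remaining, key=lambda v: (abs(v - start), v))[:2])


def toy_prc(S, remaining):
    return sum(S)


print(successive_partition(range(1, 9), 4, toy_grow, toy_prc))
# [{1, 2}, {3, 4}, {5, 6}, {7, 8}]
```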

The calculation yields four super point sets: {1, 8}, {2, 5}, {3, 7} and {4, 6}. As can be seen from the corresponding figure, the first division cuts the super edges at super points 7, 8, with weights 1 and 2; the second division cuts the super edges at super points 2, 6 and 3, with weights 1 and 2; and the third division cuts the super edges of weight 1 between super points 4, 7 and 6, 7.

Step 2026: Restore the super points to the corresponding original vertices to complete the segmentation.
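The restoration step amounts to replacing each super point in a block with the original vertices it represents. A minimal Python sketch, using the super point membership listed for Fig. 9 and the four super point sets given above; the function name `restore_blocks` and the dictionary layout are assumptions for illustration.

```python
def restore_blocks(blocks, members):
    """Map each block of super points back to the original vertices it covers."""
    return [sorted(v for p in block for v in members[p]) for block in blocks]


# Super point -> original vertices, as listed in the text for Fig. 9.
members = {
    1: range(42, 48), 2: range(12, 18), 3: range(24, 29), 4: range(0, 6),
    5: range(18, 24), 6: range(6, 12), 7: range(30, 36), 8: range(36, 42),
}
blocks = [{1, 8}, {2, 5}, {3, 7}, {4, 6}]
restored = restore_blocks(blocks, members)
print(restored[0])  # [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]
```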

Referring to FIG. 10, it is assumed that the number of blocks required for segmentation is 2. This embodiment includes the following steps:

Step 301: Aggregate the graph into a hypergraph by parallel label propagation, including the following steps:

Step 3011: Initialize the label of each vertex in the graph to its own ID; set the counter t=0 and set the upper limit of t to 60. Taking the vertex set {28, 29, 30, 31, 32, 33, 34} in the figure as an example, the label value of each vertex is its own ID, that is, L₂₈(0)=28, L₂₉(0)=29, L₃₀(0)=30, L₃₁(0)=31, L₃₂(0)=32, L₃₃(0)=33, L₃₄(0)=34.

Step 3012: Each vertex v sends its own label to its adjacent vertices. According to the adjacency relationship, vertex 28 sends label 28 to adjacent vertices {21, 29, 30, 31, 32, 33, 34, 35}, vertex 29 sends label 29 to adjacent vertices {28, 30, 31, 32, 33, 34}, vertex 30 sends label 30 to adjacent vertices {28, 29, 31, 32, 33, 34}, vertex 31 sends label 31 to adjacent vertices {28, 29, 30, 32, 33, 34}, vertex 32 sends label 32 to adjacent vertices {28, 29, 30, 31, 33, 34}, vertex 33 sends label 33 to adjacent vertices {27, 28, 29, 30, 31, 32, 34}, and vertex 34 sends label 34 to adjacent vertices {5, 28, 29, 30, 31, 32, 33};

Step 3013: Each vertex v calculates its own new label from the labels received from other vertices by taking the label with the most occurrences; when several labels tie for the highest count, one of them is selected at random. In this first round every label is received exactly once, so all labels received by a vertex tie. To simplify the calculation, assume that the random selection yields 28, giving L₂₉(1)=28, L₃₀(1)=28, L₃₁(1)=28, L₃₂(1)=28, L₃₃(1)=28, L₃₄(1)=28; since the labels received by vertex 28 do not include 28, its random selection is taken to be 29, that is, L₂₈(1)=29;

Step 3014: Since the iteration has not reached the set upper limit, each vertex continues to send its current label. Vertex 28 sends label 29 to adjacent vertices {21, 29, 30, 31, 32, 33, 34, 35}, vertex 29 sends label 28 to adjacent vertices {28, 30, 31, 32, 33, 34}, vertex 30 sends label 28 to adjacent vertices {28, 29, 31, 32, 33, 34}, vertex 31 sends label 28 to adjacent vertices {28, 29, 30, 32, 33, 34}, vertex 32 sends label 28 to adjacent vertices {28, 29, 30, 31, 33, 34}, vertex 33 sends label 28 to adjacent vertices {27, 28, 29, 30, 31, 32, 34}, and vertex 34 sends label 28 to adjacent vertices {5, 28, 29, 30, 31, 32, 33};

Step 3015: Each vertex receives the label information and calculates its own new label. The information received by vertex 28 is L₂₁(1)=label21, L₂₉(1)=28, L₃₀(1)=28, L₃₁(1)=28, L₃₂(1)=28, L₃₃(1)=28, L₃₄(1)=28, L₃₅(1)=label35; since 28 is the majority, L₂₈(2)=28. The information received by vertex 29 is L₂₈(1)=29, L₃₀(1)=28, L₃₁(1)=28, L₃₂(1)=28, L₃₃(1)=28, L₃₄(1)=28; since 28 is the majority, L₂₉(2)=28. The calculation for the remaining vertices is similar to that for vertex 28. In the subsequent iteration steps, since most of the labels among the adjacent vertices are 28, the labels remain unchanged. In the above process, label21 and label35 denote the labels that vertices 21 and 35 have assigned themselves in the same process; since these labels are a minority within the vertex set discussed, they have no influence on the calculation result.

Set t=t+1 and jump to step 3012; the above process is repeated until the vertex labels no longer change or the counter t reaches the set upper limit, at which point the algorithm stops.

The result of the calculation in step 301 is shown in Fig. 11, in which vertices that end the aggregation with the same label share the same shaded background. The corresponding weighted hypergraph is then generated, as shown in Fig. 12, where the original-graph vertices corresponding to super point 1 are {35, 36, 37, 38, 39, 40, 41}, those corresponding to super point 2 are {0, 1, 2, 3, 4, 5, 6}, those corresponding to super point 3 are {7, 8, 9, 10, 11, 12, 13}, those corresponding to super point 4 are {14, 15, 16, 17, 18, 19, 20}, those corresponding to super point 5 are {49, 50, 51, 52, 53, 54, 55}, those corresponding to super point 6 are {28, 29, 30, 31, 32, 33, 34}, those corresponding to super point 7 are {21, 22, 23, 24, 25, 26, 27}, and those corresponding to super point 8 are {42, 43, 44, 45, 46, 47, 48}. The super edge weights are as marked in the figure, and the super point weights are all 7.

Step 302: Successive partitioning, including the following steps:

Step 3022: Initialize the super point sets setList and bestSet as empty sets, set the number of blocks to be divided to 2, and set the block counter i=0;

Step 3023: Calculate the minimum PRC value sequentially, starting from each super point in the weighted hypergraph. According to the data shown in FIG. 5, S=minimizePRC(G_m, 4, 1) is calculated starting from super point 1; that is, a set with the required number of elements is selected from the hypergraph G_m with super point 1 as the starting point. According to the minimizePRC algorithm, the following calculations are performed:

2) The set S now has 1 element, fewer than the target of 4, so another super point must be added. A value is calculated for each super point not yet in S, and the super point with the smallest value is added to the set. Taking super point 2 as an example: $d_2$ is the sum of the weights of all edges of super point 2, which is 2+1+1=4, and $2k_{2,S} = 2\sum_{j \in S} W(2,j) = 2 \cdot W(2,1) = 2 \cdot 0 = 0$. The corresponding values for super points 4, 5, 6, 7 and 8 are calculated in the same way.

Following this rule, super point 2 is added as an element of the set S;

3) Repeat 2) until the number of elements of the set S reaches 4, then return the set S as the result; the set S is {1, 2, 5, 8};

Step 3024: Perform a PRC calculation on the result set S of step 3023 to obtain PRC(S)=1.5. Since bestSet is an empty set at this time, PRC(S)<PRC(bestSet), so bestSet is set to S and the elements of bestSet are added to setList. The process then jumps back to step 3023 to calculate the set S=minimizePRC(G_m, 4, 2) starting from super point 2, together with the PRC(S) corresponding to S. After all 8 super points have been used as the starting point, the set S with the smallest PRC(S) (that is, bestSet) is one of the data partitions. The elements of bestSet are removed from the super point set V_m, and the counter i is incremented by 1.

Step 3025: Since k=2, after one block has been divided off, the remaining super points form the other block, and finally two data blocks are returned.

The calculation yields two super point sets: {1, 2, 5, 8} and {3, 4, 6, 7}. As can be seen from the corresponding figure, the division cuts the super edges of weight 1 between super points 1, 3 and 1, 6 and 7, 8 and 5, and the super edge of weight 2 between super points 2 and 3.

Restore the super points to the corresponding original vertices to complete the segmentation.
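The total weight of the super edges that a division cuts, as enumerated in both embodiments, can be computed directly from the super edge weights and the resulting blocks. The sketch below is illustrative only: the function name `cut_weight`, the use of two-element keys for super edges and the sample weights are assumptions, and the weights are not those of Fig. 9 or Fig. 12.

```python
def cut_weight(edge_weight, blocks):
    """Total weight of super edges whose two super points lie in different blocks."""
    block_of = {p: i for i, block in enumerate(blocks) for p in block}
    total = 0
    for pair, w in edge_weight.items():
        a, b = tuple(pair)             # each key holds exactly two super points
        if block_of[a] != block_of[b]:
            total += w
    return total


# Hypothetical super edge weights for four super points.
edge_weight = {
    frozenset((1, 2)): 3,
    frozenset((2, 3)): 2,
    frozenset((3, 4)): 5,
    frozenset((1, 4)): 1,
}
print(cut_weight(edge_weight, [{1, 2}, {3, 4}]))  # 3
```

A smaller cut weight means fewer original edges cross block boundaries, which corresponds to less communication between working nodes.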

One of ordinary skill in the art will appreciate that all or a portion of the steps described above can be accomplished by a program instructing the associated hardware, and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disk. Alternatively, all or part of the steps of the above embodiments may also be implemented using one or more integrated circuits. Correspondingly, each module/unit in the foregoing embodiments may be implemented in the form of hardware or in the form of a software function module. The invention is not limited to any specific form of combination of hardware and software.

The above is only a preferred embodiment of the present invention, and the present invention may of course have various other embodiments. A person skilled in the art can make various corresponding changes and modifications in accordance with the present invention without departing from the spirit and scope of the invention, but all such changes and modifications shall fall within the scope of protection of the appended claims.

Industrial applicability

The embodiments of the invention provide a method and a device for segmenting graph data that segment faster, handle larger data sizes and yield less coupling between the segmented data blocks, thereby effectively reducing data communication between working nodes on parallel computing platforms that use the BSP model and improving processing efficiency.

Claims

1. A method for segmenting graph data, comprising:

converting original graph data into a locally dense weighted hypergraph by a parallel label transfer algorithm;

successively dividing the weighted hypergraph, in a balanced manner, into weighted hypergraph subgraphs by a partitioning algorithm; and

restoring the weighted hypergraph subgraphs to data corresponding to the original graph.

2. The method of claim 1, wherein converting the original graph data into a locally dense weighted hypergraph by a parallel label transfer algorithm comprises:

aggregating the vertices having the same label in the original graph data into a super point by the parallel label transfer algorithm, the weight of the super point being the number of vertices included in the super point;

taking the connecting edge between super points as a super edge, the weight of the super edge being determined by the edges in the original graph; and

representing the weighted hypergraph by the super points and the super edges.

3. The method of claim 2, wherein the weight of the super edge being determined by the edges in the original graph comprises:

if the two endpoints of an edge in the original graph belong to different super points in the weighted hypergraph, there is a super edge between the two super points, and the weight of the super edge is increased by 1;

if the two endpoints of an edge in the original graph belong to the same super point in the weighted hypergraph, no super edge is generated.

4. The method according to any one of claims 1-3, wherein successively dividing the weighted hypergraph, in a balanced manner, into weighted hypergraph subgraphs by a partitioning algorithm comprises:

sequentially calculating a minimized partial cut rate value starting from each super point in the weighted hypergraph; and

dividing the weighted hypergraph into weighted hypergraph subgraphs of a specified number of blocks according to the minimized partial cut rate value.

5. A device for segmenting graph data, comprising:

a conversion module configured to convert original graph data into a locally dense weighted hypergraph by a parallel label transfer algorithm;

a dividing module configured to successively divide the weighted hypergraph, in a balanced manner, into weighted hypergraph subgraphs by a partitioning algorithm; and

a restoration module configured to restore the weighted hypergraph subgraphs to data corresponding to the original graph.

6. The device of claim 5, wherein:

the conversion module is configured to: aggregate the vertices having the same label in the original graph data into a super point by the parallel label transfer algorithm, the weight of the super point being the number of vertices included in the super point; take the connecting edge between super points as a super edge, the weight of the super edge being determined by the edges in the original graph; and construct the weighted hypergraph from the super points and the super edges.

7. The device of claim 6, wherein:

determining the weight of the super edge from the edges in the original graph includes: if the two endpoints of an edge in the original graph belong to different super points in the weighted hypergraph, there is a super edge between the two super points, and the weight of the super edge is increased by 1; if the two endpoints of an edge in the original graph belong to the same super point in the weighted hypergraph, no super edge is generated.

8. The device according to any one of claims 5-7, wherein:

the dividing module is configured to: sequentially calculate a minimized partial cut rate value starting from each super point in the weighted hypergraph, and divide the weighted hypergraph into weighted hypergraph subgraphs of a specified number of blocks according to the minimized partial cut rate value.

9. A computer program comprising program instructions which, when executed by a graph data segmentation device, cause the device to perform the method of any one of claims 1-4.

10. A carrier carrying the computer program of claim 9.