WO2015169029A1 - Graph data partitioning method and device - Google Patents

Graph data partitioning method and device

Info

Publication number
WO2015169029A1
Authority
WO
WIPO (PCT)
Prior art keywords
super
edge
point
vertex
hypergraph
Prior art date
Application number
PCT/CN2014/087091
Other languages
French (fr)
Chinese (zh)
Inventor
罗圣美
曲文武
刘丽霞
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 filed Critical 中兴通讯股份有限公司
Publication of WO2015169029A1 publication Critical patent/WO2015169029A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to the field of cloud computing technologies and graph data analysis technologies, and in particular, to a method and apparatus for graph data segmentation.
  • BSP (Bulk Synchronous Parallel) is a design model for parallel algorithms.
  • In this model, the algorithm is divided into several supersteps, each consisting of three processes: local computation, mutual communication, and phase synchronization.
  • The BSP parallel model is suited to computations with a large number of iterations.
  • Graph data segmentation: graph data is data stored in a graph structure.
  • The graph structure is the most commonly used class of abstract data structures in computer science. It consists of a finite number of vertices and the edges connecting them, and it has more general representational power than linear list structures and tree structures.
  • Existing graph data segmentation schemes mainly fall into the following categories. The first is the heuristic method, represented by the Kernighan-Lin algorithm.
  • In this algorithm, the graph data is first divided into two sets, A and B. The effect on the set weights of exchanging each vertex in A with each vertex in B is then calculated, and the pair of vertices whose exchange has the greatest effect is swapped each time, until the end condition is reached.
  • The second is the spectral segmentation method, which computes the eigenvectors of the graph's Laplacian matrix, extracts the first k eigenvalues and their corresponding eigenvectors to obtain a representation of each vertex in a low-dimensional space, and then performs k-means clustering to obtain the partition of the graph.
  • High computational time complexity: in the Kernighan-Lin algorithm, for example, the vertices in the two sets must be compared pairwise to calculate the effect of each exchange on the set weights, so the time complexity is O(n^3).
  • In big data segmentation applications, the data needs to be divided into multiple parts.
  • The Kernighan-Lin algorithm then has to be run repeatedly on the two segmentation results of the first step, which takes even more time.
  • The spectral analysis method needs to solve the eigenvalue decomposition problem of an n-order square matrix, and its time complexity is O(n^3).
  • The matrix computation for large-scale graph data is therefore expensive.
  • High computational space complexity: in the spectral analysis method, for example, an adjacency matrix must be constructed for the vertices in the graph data, followed by the Laplacian decomposition and then the segmentation calculation.
  • The adjacency matrix of the graph is n × n, where n is the number of vertices in the graph. Because large graph data contains a very large number of vertices, the matrix is also very large, which is not conducive to computation and caching.
  • The embodiment of the invention provides a method and a device for segmenting graph data, so as to overcome the high time and space complexity and the difficulty of parallelization found in existing graph data segmentation technology.
  • The embodiment of the invention provides a method for segmenting graph data, comprising:
  • the weighted hypergraph is successively divided, in a balanced manner, into weighted hypergraph subgraphs by a partitioning algorithm;
  • the weighted hypergraph subgraphs are restored to the data corresponding to the original graph.
  • The above method further has the following feature: converting the original graph data into a locally dense weighted hypergraph by a parallel label propagation algorithm includes:
  • the vertices with the same label in the original graph data are aggregated into a super point by the parallel label propagation algorithm, and the weight of the super point is the number of vertices it contains;
  • the connecting edge between super points is a super edge, and the weight of the super edge is determined by the edges in the original graph;
  • the weighted hypergraph is constructed from the super points and the super edges.
  • The above method further has the following feature: determining the weight of the super edge from the edges in the original graph includes:
  • the above method also has the following features:
  • the embodiment of the invention further provides an apparatus for segmenting data of a graph, comprising:
  • a conversion module configured to convert the original graph data into a locally dense weighted hypergraph by a parallel label propagation algorithm;
  • a dividing module configured to successively divide the weighted hypergraph, in a balanced manner, into weighted hypergraph subgraphs by a partitioning algorithm;
  • a restoration module configured to restore the weighted hypergraph subgraphs to data corresponding to the original graph.
  • The above device also has the following features:
  • the conversion module is configured to aggregate the vertices having the same label in the original graph data into a super point by the parallel label propagation algorithm, the weight of the super point being the number of vertices it contains;
  • the connecting edge between super points is a super edge, and the weight of the super edge is determined by the edges in the original graph;
  • the weighted hypergraph is constructed from the super points and the super edges.
  • The above device also has the following features:
  • Determining the weight of the super edge from the edges in the original graph includes: if the two endpoints of an edge in the original graph belong to different super points in the weighted hypergraph, a super edge exists between the two super points and its weight is increased by 1; if the two endpoints of an edge in the original graph belong to the same super point in the weighted hypergraph, no super edge is generated.
  • The above device also has the following features:
  • the dividing module is configured to compute, starting from each super point in the weighted hypergraph in turn, the minimized partial ratio cut (PRC) value, and to divide the weighted hypergraph into the specified number of weighted hypergraph subgraphs according to the minimized PRC value.
  • the embodiment of the invention further provides a computer program, comprising program instructions, which when executed by the data segmentation device, enable the device to perform the above method.
  • Embodiments of the present invention also provide a carrier carrying the above computer program.
  • The embodiments of the present invention provide a method and a device for segmenting graph data that segment faster, handle larger data sizes, and leave less coupling between the segmented data blocks, thereby effectively reducing data communication between the working nodes of a parallel computing platform that uses the BSP model and improving processing efficiency.
  • FIG. 1 is a flowchart of a graph data segmentation method according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of the parallel label propagation algorithm according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a graph data segmentation apparatus according to an embodiment of the present invention.
  • FIG. 4 is the original data graph of Embodiment 1 of the present invention.
  • FIG. 5 shows the result of label propagation in Embodiment 1 of the present invention.
  • FIG. 6 is the weighted hypergraph of Embodiment 1 of the present invention.
  • FIG. 7 is the original data graph of Embodiment 2 of the present invention.
  • FIG. 8 shows the result of label propagation in Embodiment 2 of the present invention.
  • FIG. 9 is the weighted hypergraph of Embodiment 2 of the present invention.
  • FIG. 10 is the original data graph of Embodiment 3 of the present invention.
  • FIG. 11 shows the result of label propagation in Embodiment 3 of the present invention.
  • FIG. 12 is the weighted hypergraph of Embodiment 3 of the present invention.
  • The graph is not random.
  • Locally dense subgraphs are widespread in many networks.
  • The vertices inside these subgraphs are closely connected to each other and have few connections to external vertices.
  • The segmentation method of this embodiment holds that such a dense subgraph should not be split across two or more partitions but should be treated as an indivisible "atom", so that partitioning the graph becomes partitioning these indivisible "atoms".
  • This embodiment provides a bottom-up segmentation method.
  • The method is divided into two phases, an aggregation phase and a segmentation phase.
  • Aggregation phase: locally dense subgraphs are aggregated into single units by a distributed label propagation algorithm. These locally dense subgraphs form the most basic segmentation units, referred to herein as super points (Super Vertex), and together they form a hypergraph (Super Graph).
  • Segmentation phase: the hypergraph generated in the aggregation phase is divided by a greedy successive graph segmentation algorithm. Each time, the set with the smallest cut to the remaining subgraph is extracted from the hypergraph, so that the hypergraph is divided successively; finally, the super points are restored to the original locally dense subgraphs, completing the segmentation of the graph.
  • Use $G(V, E)$ to represent a graph, where $V$ is the set of vertices in the graph and $E$ is the set of edges.
  • $\omega_{i,j} > 0$ means that the vertices $v_i$ and $v_j$ are connected, with weight $\omega_{i,j}$.
  • N(S) represents the neighborhood of the vertex S,
  • V ⁇ V i For any two sets A and B, define:
  • a method for segmenting data of the embodiment includes the following steps:
  • In the parallel label propagation method, a vertex only needs to send label information to its adjacent vertices, without acquiring information about any other vertices.
  • The algorithm has linear complexity and is suitable for parallel computing. As shown in FIG. 2, it includes the following steps:
  • Step 11: Initialize the label of each vertex in the graph.
  • Step 12: Each vertex v sends its own label to its adjacent vertices.
  • Step 13: Each vertex v calculates its own new label from the label information received from other vertices. Calculated as follows:
  • $u_i(t-1)$ represents the label of the i-th adjacent vertex of the vertex u in iteration (t-1).
  • f is the function that returns the label with the most occurrences.
  • $V_m$ is the super point set.
  • The subscript m is the number of distinct labels among all vertices in the graph at the end of the aggregation algorithm.
  • The vertices with the same label are aggregated into a super point.
  • The weight of a super point is the number of vertices it contains, that is, the number of vertices with the same label.
  • E is the set of super edges between the super points in the hypergraph, and the weight of a super edge is determined by the edges in the original graph: 1) if the two endpoints of an edge in the original graph belong to different super points in the hypergraph, a super edge exists between the two super points and its weight is increased by 1; 2) if the two endpoints of an edge in the original graph belong to the same super point in the hypergraph, no super edge is generated, because the hypergraph of this scheme does not allow loops. In summary, the weight of every super edge in the hypergraph is at least one.
  • A weighted hypergraph $G_m(V_m, E)$ can thus be obtained by the label propagation method.
  • The present embodiment proposes an algorithm that minimizes the Ratio-Cut (cut rate) step by step.
  • The algorithm is executed in multiple steps. Each step looks for a subset $V_i$ that minimizes the objective function PRC (Partial-Ratio Cut), removes that subset from the graph, and then looks for the next subset in the remaining graph in the same way.
  • PRC Partial-Ratio Cut
  • $V_i$ represents the i-th partition block of the graph
  • represents the number of vertices in the partition
  • $PRC(V_i)$ can be understood as the number of edges cut off per vertex for the partition $V_i$ of the graph.
  • The steps of the successive partitioning algorithm include:
  • Step 22 Calculate a super point set for each super point $v \in V_m$ in the weighted hypergraph $G_m(V_m, E)$:
  • the formula indicates that the supergraph G m is divided by the vertex v as the starting point, and the number of blocks is k, and each partition should be roughly included.
  • minimumPRC minimum PRC
  • Step 22 Calculate the PRC value for the set S and for the set bestSet. If PRC(S) < PRC(bestSet), let bestSet be S, and add the elements of bestSet to setList. The elements of bestSet are then removed from the super point set $V_m$, the counter i is incremented by 1, and bestSet is emptied.
  • Step 23 If the counter i is smaller than k, jump to step 202; if the counter i is greater than or equal to k, the algorithm stops and returns setList.
  • The super point sets are restored to the corresponding original vertex sets according to the super point information.
  • the steps of the minimum PRC algorithm include:
  • Step 31 Initialize the super point result set V i and set it as an empty set. Set the maximum number of elements in each V i to n, where k is the number of blocks that need to be divided.
  • Step 32 Select the super points v that have not yet been added to any partition, in ascending order of super point id, substitute each into the formula to compute its value, and select the super point that minimizes the formula value as the point to be added. Here $d_v$ is the number of super edges that have the super point v as an endpoint, and $V_i$ is the super point result set.
  • Step 33 When
  • Step 34 when
  • Step 35 Return the result set V i .
  • The method of the embodiment of the invention converts the original graph data into a locally dense weighted hypergraph by a label propagation algorithm; a successive segmentation algorithm is designed to achieve a balanced partition of the weighted hypergraph; finally, the divided weighted hypergraph is restored back to the original graph data.
  • The segmentation is faster, the data scale handled is larger, and the coupling between the segmented data blocks is lower, which effectively reduces data communication between the working nodes of a parallel computing platform using the BSP model and thus improves processing efficiency.
  • In a parallel graph data analysis platform, the method can be used as a data loading algorithm that assigns graph data to the corresponding working nodes according to the topology of the graph data, thereby reducing communication between working nodes during parallel computation;
  • the desired clustering result is calculated by manually setting the number of types to be aggregated.
  • FIG. 3 is a schematic diagram of an apparatus for data segmentation according to an embodiment of the present invention. As shown in FIG. 3, the apparatus of this embodiment may include:
  • a conversion module configured to convert the original graph data into a locally dense weighted hypergraph by a parallel label propagation algorithm;
  • a dividing module configured to divide the weighted hypergraph, in a balanced manner, into weighted hypergraph subgraphs by a partitioning algorithm;
  • a restoration module configured to restore the weighted hypergraph subgraphs to data corresponding to the original graph.
  • The conversion module is configured to aggregate the vertices having the same label in the original graph data into a super point by the parallel label propagation algorithm, the weight of the super point being the number of vertices it contains; the connecting edge between super points is a super edge, and the weight of the super edge is determined by the edges in the original graph; the weighted hypergraph is constructed from the super points and the super edges.
  • Determining the weight of the super edge from the edges in the original graph includes: if the two endpoints of an edge in the original graph belong to different super points in the weighted hypergraph, a super edge exists between the two super points and its weight is increased by 1; if the two endpoints of an edge in the original graph belong to the same super point in the weighted hypergraph, no super edge is generated.
  • The dividing module is configured to compute, starting from each super point in the weighted hypergraph in turn, the minimized PRC value, and to divide the weighted hypergraph into the specified number of weighted hypergraph subgraphs according to the minimized PRC value.
  • Step 101: Aggregate the hypergraph by parallel label propagation.
  • The label of each vertex in the graph is initialized to its own ID.
  • Each vertex v sends its own label to its adjacent vertices.
  • Vertex 0 sends label 0 to adjacent vertices {1, 2, 3, 4, 5}
  • vertex 1 sends label 1 to adjacent vertices {0, 2, 3, 4}
  • vertex 2 sends label 2 to adjacent vertices {0, 1, 3, 4, 8}
  • vertex 3 sends label 3 to adjacent vertices {0, 1, 2, 4}
  • vertex 4 sends label 4 to adjacent vertices {0, 1, 2, 3, 11, 14};
  • Each vertex v calculates its own new label from the labels received from other vertices, taking the label with the most occurrences; when more than one label has the most occurrences, one of them is selected at random. When label information is received for the first time, every label received by a vertex occurs exactly once.
  • Since the iteration has not reached the set upper limit, each vertex continues to send its current label: vertex 0 sends label 1 to adjacent vertices {1, 2, 3, 4, 5}, vertex 1 sends label 2 to adjacent vertices {0, 2, 3, 4}, vertex 2 sends label 1 to adjacent vertices {0, 1, 3, 4, 8}, vertex 3 sends label 1 to adjacent vertices {0, 1, 2, 4}, and vertex 4 sends label 1 to adjacent vertices {0, 1, 2, 3, 11, 14};
  • Each vertex receives the label information and calculates its own new label.
  • The result of the calculation in step 101 is shown in Fig. 5.
  • The same shaded background indicates vertices that carry the same label at the end of the aggregation.
  • Generate the corresponding weighted hypergraph, as shown in Figure 6, where the original-graph vertices corresponding to super point 1 are {5, 6, 7, 8, 9}, those corresponding to super point 2 are {0, 1, 2, 3, 4}, those corresponding to super point 3 are {10, 11, 12, 13, 14}, those corresponding to super point 4 are {15, 16, 17, 18, 19}, those corresponding to super point 5 are {25, 26, 27, 28, 29}, those corresponding to super point 6 are {20, 21, 22, 23, 24}, and those corresponding to super point 7 are {50, 51, 52, 53, 54}
  • Step 102: Successive partitioning
  • The set S is {1, 2, 3, 4, 5, 6};
  • The calculation result is a division into two super point sets, {7, 8, 9, 10, 11, 12} and {1, 2, 3, 4, 5, 6}. As can be seen in the corresponding figure, this cuts the super edges of weight 1 between super points 12 and 3, 4 and 8.
  • the embodiment includes the following steps:
  • Step 201: Aggregate the hypergraph by parallel label propagation, including the following steps:
  • Each vertex v sends its own label to its adjacent vertices.
  • Vertex 18 sends label 18 to adjacent vertices {12, 19, 20, 21, 22, 23, 24}
  • vertex 19 sends label 19 to adjacent vertices {17, 18, 20, 21, 22, 23}
  • vertex 20 sends label 20 to adjacent vertices {18, 19, 21, 22, 23}
  • vertex 21 sends label 21 to adjacent vertices {18, 19, 20, 22, 23}
  • vertex 22 sends label 22 to adjacent vertices {18, 19, 20, 21, 23, 28}
  • vertex 23 sends label 23 to adjacent vertices {18, 19, 20, 21, 22};
  • Step 2013: Each vertex v takes as its new label the label with the most occurrences among the labels received from other vertices; when more than one label has the most occurrences, one of them is selected at random;
  • When label information is received for the first time, every label received by a vertex occurs exactly once.
  • Step 2014: Since the iteration has not reached the set upper limit, each vertex continues to send its current label: vertex 18 sends label 12 to adjacent vertices {12, 19, 20, 21, 22, 23, 24}, vertex 19 sends label 18 to adjacent vertices {17, 18, 20, 21, 22, 23}, vertex 20 sends label 18 to adjacent vertices {18, 19, 21, 22, 23}, vertex 21 sends label 18 to adjacent vertices {18, 19, 20, 22, 23}, vertex 22 sends label 18 to adjacent vertices {18, 19, 20, 21, 23, 28}, and vertex 23 sends label 18 to adjacent vertices {18, 19, 20, 21, 22};
  • Each vertex receives the label information and calculates its own new label.
  • In the subsequent iteration steps, since most of the labels among the adjacent vertices are 18, they remain unchanged.
  • label12 and label17 denote the labels held by the vertices numbered 12 and 17 in the same process; since their labels are a minority in the set of vertices discussed, they have no influence on the calculation result.
  • Jump to step 2012 and repeat the above process until the vertex labels no longer change or the counter t reaches the set upper limit, at which point the algorithm stops.
  • The result of the calculation in step 201 is shown in Fig. 8.
  • The same shaded background indicates vertices that carry the same label at the end of the aggregation.
  • Generate the corresponding weighted hypergraph, see Figure 9, where the original-graph vertices corresponding to super point 1 are {42, 43, 44, 45, 46, 47}, those corresponding to super point 2 are {12, 13, 14, 15, 16, 17}, those corresponding to super point 3 are {24, 25, 26, 27, 28}, those corresponding to super point 4 are {0, 1, 2, 3, 4, 5}, those corresponding to super point 5 are {18, 19, 20, 21, 22, 23}, those corresponding to super point 6 are {6, 7, 8, 9, 10, 11},
  • the vertices of the original graph corresponding to super point 7 are {30, 31, 32, 33, 34
  • Step 202 Divide successively, including the following steps:
  • Step 2022: Calculate the minimum PRC value starting from each super point in the weighted hypergraph in turn.
  • In the minimumPRC algorithm, the following calculations are performed:
  • The set size is 2, so the set S satisfies the requirement and is returned as the result; the set S is {1, 8}.
  • Steps 2022 and 2023 are repeated until the block counter equals 4.
  • The calculation result is a division into four super point sets: {1, 8}, {2, 5}, {3, 7}, {4, 6}. It can be seen from the corresponding figure that the first time cut off the super point 7, 8 The first side has a weight of 1 and a super edge of 2, and the second time cuts the super point 2, 6 and 3, the weight between 1 and 2 is the super edge of 1 and 2, and the third time cuts the super point 4. The weight between 7 and 6, 7 is 1 for the super edge.
  • Step 2026: Restore the super points to the corresponding original vertices to complete the segmentation.
  • Step 301: Aggregate the hypergraph by parallel label propagation, including the following steps:
  • Step 3012: Each vertex v sends its own label to its adjacent vertices.
  • Vertex 28 sends label 28 to adjacent vertices {21, 29, 30, 31, 32, 33, 34, 35}
  • vertex 29 sends label 29 to adjacent vertices {28, 30, 31, 32, 33, 34}
  • vertex 30 sends label 30 to adjacent vertices {28, 29, 31, 32, 33, 34}
  • vertex 31 sends label 31 to adjacent vertices {28, 29, 30, 32, 33, 34}
  • vertex 32 sends label 32 to adjacent vertices {28, 29, 30, 31, 33, 34}
  • vertex 33 sends label 33 to adjacent vertices {27, 28, 29, 30, 31, 32, 34}
  • vertex 34 sends label 34 to adjacent vertices {5, 28, 29, 30, 31, 32, 33};
  • Step 3013: Each vertex v calculates its own new label from the labels received from other vertices, taking the label with the most occurrences; when more than one label has the most occurrences, one of them is selected at random. When label information is received for the first time, every label received by a vertex occurs exactly once.
  • Step 3014: Since the iteration has not reached the set upper limit, each vertex continues to send its current label: vertex 28 sends label 29 to adjacent vertices {21, 29, 30, 31, 32, 33, 34, 35}, vertex 29 sends label 28 to adjacent vertices {28, 30, 31, 32, 33, 34}, vertex 30 sends label 28 to adjacent vertices {28, 29, 31, 32, 33, 34}, vertex 31 sends label 28 to adjacent vertices {28, 29, 30, 32, 33, 34}, vertex 32 sends label 28 to adjacent vertices {28, 29, 30, 31, 33, 34}, vertex 33 sends label 28 to adjacent vertices {27, 28, 29, 30, 31, 32, 34}, and vertex 34 sends label 28 to adjacent vertices {5, 28, 29, 30, 31, 32, 33};
  • Each vertex receives the label information and calculates its own new label.
  • In the subsequent iteration steps, since most of the labels among the adjacent vertices are 28, they remain unchanged.
  • label21 and label35 denote the labels held by the vertices numbered 21 and 35 in the same process; since their labels are a minority in the set of vertices discussed, they have no influence on the calculation result.
  • The result of the calculation in step 301 is shown in Fig. 11, in which the same shaded background indicates vertices that carry the same label at the end of the aggregation.
  • Generate the corresponding weighted hypergraph, as shown in Figure 12, where the original-graph vertices corresponding to super point 1 are {35, 36, 37, 38, 39, 40, 41}, those corresponding to super point 2 are {0, 1, 2, 3, 4, 5, 6}, those corresponding to super point 3 are {7, 8, 9, 10, 11, 12, 13}, those corresponding to super point 4 are {14, 15, 16, 17, 18, 19, 20}, those corresponding to super point 5 are {49, 50, 51, 52, 53, 54, 55}, those corresponding to super point 6 are {28, 29, 30, 31, 32, 33, 34}, the vertex of the original graph corresponding to the super point
  • Step 302: Successive partitioning
  • Step 3023: The minimum PRC value is calculated starting from each super point in the weighted hypergraph in turn.
  • In the minimumPRC algorithm, the following calculations are performed:
  • The calculation result is a division into two super point sets, {1, 2, 5, 8} and {3, 4, 6, 7}. It can be seen from the corresponding figure that the super points 1, 3 and 1, 6 and 7 are cut off.
  • The embodiment of the invention provides a method and a device for segmenting graph data that segment faster, handle larger data sizes, leave less coupling between the segmented data blocks, and effectively reduce data communication between the working nodes of a parallel computing platform using the BSP model, improving processing efficiency.
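
The aggregation phase described above (Steps 11 to 13 followed by the super point and super edge weight rules) can be sketched in Python roughly as follows. This is a minimal single-process illustration, not the patent's parallel implementation. The function names, the `max_iters` limit, and the sample graph are assumptions, and ties are broken by taking the smallest label so the run is repeatable, whereas the text selects one of the tied labels at random.

```python
from collections import Counter, defaultdict


def propagate_labels(adjacency, max_iters=20):
    """Synchronous label propagation over an undirected graph."""
    labels = {v: v for v in adjacency}  # Step 11: each label starts as the vertex ID
    for _ in range(max_iters):
        new_labels = {}
        for v, neighbours in adjacency.items():
            if not neighbours:
                new_labels[v] = labels[v]  # an isolated vertex keeps its label
                continue
            # Steps 12-13: count the labels sent by the neighbours
            counts = Counter(labels[u] for u in neighbours)
            top = max(counts.values())
            # Assumption: ties go to the smallest label (the text picks at random)
            new_labels[v] = min(lab for lab, c in counts.items() if c == top)
        if new_labels == labels:  # labels no longer change
            break
        labels = new_labels
    return labels


def build_weighted_hypergraph(adjacency, labels):
    """Return (super point weights, super edge weights)."""
    point_weight = Counter(labels.values())  # number of vertices per super point
    edge_weight = defaultdict(int)
    for v, neighbours in adjacency.items():
        for u in neighbours:
            # Visit each undirected edge once; edges inside a super point add nothing
            if v < u and labels[v] != labels[u]:
                edge_weight[tuple(sorted((labels[v], labels[u])))] += 1
    return dict(point_weight), dict(edge_weight)


# Illustrative data: two triangles joined by the single edge 2-3
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = propagate_labels(graph)
points, edges = build_weighted_hypergraph(graph, labels)
print(points, edges)
```

On this sample graph each triangle collapses into one super point of weight 3, and the two super points are joined by a single super edge of weight 1.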

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A graph data partitioning method and device. The method comprises: converting original graph data into a locally intensive weighted hypergraph through a parallel label propagation algorithm; uniformly partitioning the weighted hypergraph into weighted hypergraph subgraphs gradually through a partitioning algorithm; and restoring the weighted hypergraph subgraphs into data corresponding to an original graph.
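
The partitioning and restoring steps named in the abstract can be illustrated with a small greedy sketch. The exact PRC and minimumPRC formulas are not recoverable from this page, so the code assumes PRC(block) is the total weight of super edges leaving the block divided by the number of super points in it, and it sizes blocks by super point count. All function names and the sample data are hypothetical.

```python
from math import ceil


def prc(block, edges, remaining):
    """Assumed PRC: super edge weight leaving `block` per super point in it."""
    cut = sum(w for (a, b), w in edges.items()
              if a in remaining and b in remaining and (a in block) != (b in block))
    return cut / len(block)


def grow_block(seed, edges, remaining, size):
    """Greedily grow a block from `seed`, adding the point that keeps PRC lowest."""
    block = {seed}
    while len(block) < min(size, len(remaining)):
        candidates = sorted(remaining - block)
        block.add(min(candidates, key=lambda c: prc(block | {c}, edges, remaining)))
    return block


def successive_partition(super_points, edges, k):
    """Peel off k blocks one at a time (requires k <= number of super points)."""
    remaining, blocks = set(super_points), []
    for i in range(k - 1):
        size = ceil(len(remaining) / (k - i))  # balanced target size for this block
        grown = [grow_block(s, edges, remaining, size) for s in sorted(remaining)]
        best = min(grown, key=lambda b: prc(b, edges, remaining))
        blocks.append(best)
        remaining = remaining - best  # continue on the remaining hypergraph
    blocks.append(remaining)
    return blocks


def restore(blocks, members):
    """Map each block of super points back to its original vertices."""
    return [sorted(v for p in sorted(block) for v in members[p]) for block in blocks]


# Illustrative data: a path of four super points with one weak middle edge
edges = {(1, 2): 5, (2, 3): 1, (3, 4): 5}
members = {1: [0, 1], 2: [2, 3], 3: [4, 5], 4: [6, 7]}
blocks = successive_partition([1, 2, 3, 4], edges, k=2)
print(blocks, restore(blocks, members))
```

With the sample data the weak edge (2, 3) is the one that gets cut, so the strongly connected pairs stay together.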

Description

Method and device for graph data segmentation
Technical field
The present invention relates to the field of cloud computing technologies and graph data analysis technologies, and in particular to a method and apparatus for graph data segmentation.
Background
BSP (Bulk Synchronous Parallel) is a design model for parallel algorithms. In this model, the algorithm is divided into several supersteps, each consisting of three processes: local computation, mutual communication, and phase synchronization. The BSP parallel model is suited to computations with a large number of iterations. Graph data segmentation: graph data is data stored in a graph structure. The graph structure is the most commonly used class of abstract data structures in computer science. It consists of a finite number of vertices and the edges connecting them, and it has more general representational power than linear list structures and tree structures.
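As a rough illustration of the three processes in a superstep, the following single-process sketch simulates BSP on a small graph. The task (every vertex converging on the maximum value in its connected component) and the names `bsp_max` and `max_supersteps` are assumptions chosen for the example; a real BSP platform would run the per-vertex work on separate working nodes.

```python
def bsp_max(adjacency, values, max_supersteps=10):
    """Simulate BSP supersteps in which each vertex adopts the largest value seen."""
    values = dict(values)  # do not modify the caller's data
    inbox = {v: [] for v in adjacency}
    for step in range(max_supersteps):
        # 1) Local computation: combine the vertex's own value with received messages
        changed = False
        for v in adjacency:
            new_value = max([values[v]] + inbox[v])
            if new_value != values[v]:
                values[v], changed = new_value, True
        if step > 0 and not changed:
            break  # nothing changed in this superstep, so the computation has converged
        # 2) Mutual communication: send the current value to every neighbour
        outbox = {v: [] for v in adjacency}
        for v, neighbours in adjacency.items():
            for u in neighbours:
                outbox[u].append(values[v])
        # 3) Phase synchronization: messages become visible only in the next superstep
        inbox = outbox
    return values


# Illustrative data: a path 0 - 1 - 2
print(bsp_max({0: [1], 1: [0, 2], 2: [1]}, {0: 3, 1: 1, 2: 2}))
```

Every message sent in step 2 crosses a partition boundary whenever the two vertices live on different working nodes, which is why a partition with few cut edges reduces communication.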
Because real-world applications are often described with graphs, graph data has reached massive scale as information grows. Owing to the inherent connectivity of graph data and the strongly coupled nature of graph computation, efficient parallel processing requires that a logically complete large graph be split, by decoupling and similar means, into several parts that are placed on the working nodes of a distributed storage system and then processed in parallel in a distributed manner.
Existing graph data segmentation schemes mainly fall into the following categories. The first is the heuristic method, represented by the Kernighan-Lin algorithm. In this algorithm, the graph data is first divided into two sets, A and B. The effect on the set weights of exchanging each vertex in A with each vertex in B is then calculated, and the pair of vertices whose exchange has the greatest effect is swapped each time, until the end condition is reached. The second is the spectral segmentation method, which computes the eigenvectors of the graph's Laplacian matrix, extracts the first k eigenvalues and their corresponding eigenvectors to obtain a representation of each vertex in a low-dimensional space, and then performs k-means clustering to obtain the partition of the graph. As the above technical solutions show, existing graph data segmentation schemes have the following disadvantages:
High time complexity: the Kernighan-Lin algorithm, for example, must compare the vertices of the two sets pairwise and calculate the effect of each exchange on the set weights, so its time complexity is O(n^3). Moreover, big data partitioning applications need to divide the data into many parts, so the Kernighan-Lin algorithm must be run repeatedly on the two parts produced by the first step, which takes even more time. The spectral method must solve the eigenvalue decomposition of an n-th order square matrix, whose time complexity is O(n^3); for matrices built from large-scale graph data this computation is expensive.
High space complexity: the spectral method, for example, must construct an adjacency matrix over the vertices of the graph data, then perform the Laplacian decomposition, and then carry out the partitioning calculation. The adjacency matrix of a graph has size n × n, where n is the number of vertices in the graph. Because a large graph contains a great many vertices, the matrix is also very large, which makes it unfavorable for computation and caching.
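To make the n × n space cost concrete, the following back-of-the-envelope sketch may help. It assumes a dense matrix with 8-byte entries; the function name and the vertex counts are illustrative and are not taken from this document.

```python
def dense_adjacency_bytes(n, bytes_per_entry=8):
    """Memory needed to hold a dense n x n adjacency matrix."""
    return n * n * bytes_per_entry

# Illustrative vertex counts (assumed, not from the source).
for n in (10_000, 1_000_000):
    gib = dense_adjacency_bytes(n) / 2**30
    print(f"n = {n:>9,}: {gib:,.1f} GiB")
```

Under these assumptions the storage grows quadratically with the number of vertices, which is the difficulty the paragraph above describes.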
Difficult to parallelize: because the algorithms were not designed with parallelization in mind, problems arise when parallelizing them to improve efficiency. For example, the Kernighan-Lin algorithm exchanges only one pair of vertices at a time, and the spectral method raises the question of how to decompose a large-scale matrix in parallel.
Summary of the invention
Embodiments of the present invention provide a method and an apparatus for graph data partitioning, so as to overcome the high time and space complexity and the difficulty of parallelization found in existing graph data partitioning techniques.
An embodiment of the present invention provides a method for graph data partitioning, comprising:
converting original graph data into a locally dense weighted hypergraph by means of a parallel label propagation algorithm;
successively partitioning the weighted hypergraph, in a balanced manner, into weighted hypergraph subgraphs by means of a partitioning algorithm; and
restoring the weighted hypergraph subgraphs to the corresponding data of the original graph.
Preferably, the above method further has the following feature: converting the original graph data into a locally dense weighted hypergraph by means of a parallel label propagation algorithm comprises:
aggregating vertices having the same label in the original graph data into one super point by means of the parallel label propagation algorithm, the weight of the super point being the number of vertices it contains;
taking the connecting edges between super points as super edges, the weight of each super edge being determined by the edges in the original graph; and
forming the weighted hypergraph from the super points and the super edges.
Preferably, the above method further has the following feature: the weight of the super edge being determined by the edges in the original graph comprises:
if the two endpoints of an edge in the original graph belong to different super points in the weighted hypergraph, a super edge exists between the two super points and its weight is increased by 1;
if the two endpoints of an edge in the original graph belong to the same super point in the weighted hypergraph, no super edge is generated.
Preferably, the above method further has the following feature:
successively partitioning the weighted hypergraph, in a balanced manner, into weighted hypergraph subgraphs by means of a partitioning algorithm comprises:
calculating minimized partial ratio cut values in turn, taking the super points in the weighted hypergraph as starting points; and
partitioning the weighted hypergraph into a specified number of weighted hypergraph subgraphs according to the minimized partial ratio cut values.
An embodiment of the present invention further provides an apparatus for graph data partitioning, comprising:
a conversion module, configured to convert original graph data into a locally dense weighted hypergraph by means of a parallel label propagation algorithm;
a partitioning module, configured to successively partition the weighted hypergraph, in a balanced manner, into weighted hypergraph subgraphs by means of a partitioning algorithm; and
a restoration module, configured to restore the weighted hypergraph subgraphs to the corresponding data of the original graph.
Preferably, the above apparatus further has the following feature:
the conversion module is configured to aggregate vertices having the same label in the original graph data into one super point by means of the parallel label propagation algorithm, the weight of the super point being the number of vertices it contains; the connecting edges between super points are super edges, whose weights are determined by the edges in the original graph; and the weighted hypergraph is formed from the super points and the super edges.
Preferably, the above apparatus further has the following feature:
determining the weight of the super edge from the edges in the original graph comprises: if the two endpoints of an edge in the original graph belong to different super points in the weighted hypergraph, a super edge exists between the two super points and its weight is increased by 1; if the two endpoints of an edge in the original graph belong to the same super point in the weighted hypergraph, no super edge is generated.
Preferably, the above apparatus further has the following feature:
the partitioning module is configured to calculate minimized partial ratio cut values in turn, taking the super points in the weighted hypergraph as starting points, and to partition the weighted hypergraph into a specified number of weighted hypergraph subgraphs according to the minimized partial ratio cut values.
An embodiment of the present invention further provides a computer program comprising program instructions which, when executed by a graph data partitioning apparatus, enable the apparatus to perform the above method.
An embodiment of the present invention further provides a carrier carrying the above computer program.
In summary, embodiments of the present invention provide a method and an apparatus for graph data partitioning that partition faster, handle larger data, and produce data blocks with lower coupling between them, which effectively reduces data communication between the worker nodes of a parallel computing platform using the BSP model and improves processing efficiency.
Brief description of the drawings
FIG. 1 is a flowchart of a method for graph data partitioning according to an embodiment of the present invention;
FIG. 2 is a flowchart of the parallel label propagation algorithm according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an apparatus for graph data partitioning according to an embodiment of the present invention;
FIG. 4 is the original data graph of Embodiment 1 of the present invention;
FIG. 5 shows the result after label propagation in Embodiment 1 of the present invention;
FIG. 6 is the weighted hypergraph of Embodiment 1 of the present invention;
FIG. 7 is the original data graph of Embodiment 2 of the present invention;
FIG. 8 shows the result after label propagation in Embodiment 2 of the present invention;
FIG. 9 is the weighted hypergraph of Embodiment 2 of the present invention;
FIG. 10 is the original data graph of Embodiment 3 of the present invention;
FIG. 11 shows the result after label propagation in Embodiment 3 of the present invention;
FIG. 12 is the weighted hypergraph of Embodiment 3 of the present invention.
Preferred embodiments of the invention
Embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, provided there is no conflict, the embodiments in the present application and the features in the embodiments may be combined with one another arbitrarily.
In practical applications, graphs are not random. Locally dense subgraphs are widespread in many networks: the vertices inside such a subgraph are tightly connected to one another and have comparatively few connections to vertices outside it. The partitioning method of this embodiment holds that such a dense subgraph should not be split across two or more blocks, but should be treated as an independent, indivisible "atom", so that partitioning the graph becomes partitioning these indivisible "atoms".
This embodiment provides a bottom-up partitioning method. The method has two phases, an aggregation phase and a partitioning phase. In the aggregation phase, a distributed label propagation algorithm aggregates locally dense subgraphs together. These locally dense subgraphs form the basic units of partitioning, referred to here as super points (Super Vertex), and the super points make up a hypergraph (Super Graph). In the partitioning phase, a greedy successive graph partitioning algorithm partitions the hypergraph produced in the aggregation phase. Each time, it extracts from the hypergraph the set with the smallest edge cut against the remaining subgraph, thereby obtaining the partition of the hypergraph step by step. Finally, the super points are restored to the original locally dense subgraphs, completing the partitioning of the graph.
The symbols used in this embodiment are defined as follows:
G = (V, E) denotes a graph, where V is the set of vertices in the graph and
Figure PCTCN2014087091-appb-000001
is the set of edges. The adjacency matrix of the graph is M = (ω_{i,j}), i, j = 1, 2, ..., n. ω_{i,j} > 0 means that vertices v_i and v_j are connected with weight ω_{i,j}. When G is an unweighted graph, ω_{i,j} = 1 if v_i and v_j are adjacent and ω_{i,j} = 0 if they are not. N(S) denotes the neighborhood of S,
Figure PCTCN2014087091-appb-000002
For any vertex set
Figure PCTCN2014087091-appb-000003
its complement V\V_i is denoted
Figure PCTCN2014087091-appb-000004
For any two sets A and B, define:
Figure PCTCN2014087091-appb-000005
The cut set of the set V_i is
Figure PCTCN2014087091-appb-000006
The edge cut of the cut set C_i is
Figure PCTCN2014087091-appb-000007
P = V_1, V_2, ..., V_k is a k-way partition of the graph G if and only if (1) ∪_i V_i = V and (2)
Figure PCTCN2014087091-appb-000008
The most direct way to construct a partition of the graph is to solve the minimum edge cut problem: choose a partition P = V_1, V_2, ..., V_k that minimizes the following expression:
Figure PCTCN2014087091-appb-000009
However, the solution to the minimum edge cut problem usually does not yield a balanced partition of the graph. The objective function for balanced partitioning is therefore defined as follows:
Figure PCTCN2014087091-appb-000010
As shown in FIG. 1, the method for graph data partitioning of this embodiment comprises the following steps:
S1. Convert the original graph data into a locally dense weighted hypergraph by means of a parallel label propagation algorithm.
In the parallel label propagation method, a vertex only needs to send label information to its adjacent vertices and does not need to obtain any other vertex information. The algorithm has linear complexity and is suitable for parallel computation. As shown in FIG. 2, it comprises the following steps:
Step 11. Initialize the label of every vertex in the graph. For a given vertex v, the label L_v(0) = v, meaning that at the start of iteration 0 the label of the vertex is its own ID. Set the iteration counter t = 0.
Step 12. Each vertex v sends its own label to its adjacent vertices.
Step 13. Each vertex v calculates its new label from the label information received from other vertices, using the following formula:
L_u(t) = f(L_{u1}(t-1), ..., L_{ui}(t-1))
where L_{ui}(t-1) denotes the label of the i-th adjacent vertex of vertex u at iteration (t-1), and f returns the label that occurs most often; when several labels are tied for the most occurrences, one of them is chosen at random as the result of f.
Step 14. Set t = t + 1 and jump to step 12. The algorithm stops when no vertex label changes any more or when t reaches the configured upper limit.
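Steps 11 to 14 can be illustrated with a minimal single-process sketch. It is only an approximation of the idea under stated assumptions: the message passing between vertices is simulated by reading the neighbors' labels from the previous iteration, and the function name, the `adj` dictionary layout, the `seed` parameter, and the default limit of 60 iterations are choices made for this illustration rather than details fixed by the method.

```python
import random
from collections import Counter

def label_propagation(adj, max_iter=60, seed=0):
    """Synchronous label propagation on an undirected graph.

    adj maps each vertex id to a list of its adjacent vertex ids.
    Returns a dict mapping each vertex to its final label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}                  # step 11: L_v(0) = v
    for _ in range(max_iter):                     # upper limit on t (step 14)
        new_labels = {}
        for v, nbrs in adj.items():
            # steps 12-13: labels "received" from neighbors at iteration t-1
            received = [labels[u] for u in nbrs]
            if not received:
                new_labels[v] = labels[v]         # isolated vertex keeps its label
                continue
            counts = Counter(received)
            top = max(counts.values())
            tied = sorted(l for l, c in counts.items() if c == top)
            new_labels[v] = rng.choice(tied)      # f: most frequent, random tie-break
        if new_labels == labels:                  # step 14: no label changed
            break
        labels = new_labels
    return labels

# Two triangles joined by the single edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(label_propagation(adj))
```

Because ties are broken at random, the resulting labels depend on the seed, and a synchronous update may oscillate on some graphs, which is why the iteration limit is kept as a stopping condition.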
After the computation ends, the results are collated to obtain a weighted hypergraph G_m(V_m, E), where V_m is the set of super points and the subscript m is the number of distinct labels over all vertices of the graph when the aggregation algorithm ends. Vertices with the same label are aggregated into one super point, and the weight of the super point is the number of vertices it contains, that is, the number of vertices with that label. E is the set of super edges between super points in the hypergraph, whose weights are determined by the edges of the original graph: 1) if the two endpoints of an edge in the original graph belong to different super points in the hypergraph, a super edge exists between the two super points and its weight is increased by 1; 2) if the two endpoints of an edge in the original graph belong to the same super point in the hypergraph, no super edge is generated, because the hypergraph in this scheme does not allow self-loops. Consequently, the weight of every super edge in the hypergraph is at least 1.
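The aggregation rules above can be sketched as follows. The function name and the data layout (an edge list plus a vertex-to-label dictionary, with super points identified by their label) are assumptions made for this illustration.

```python
from collections import Counter

def build_supergraph(edges, labels):
    """Aggregate vertices by label into super points.

    edges  : iterable of (u, v) pairs of the original undirected graph
    labels : dict mapping each vertex to its label after propagation
    Returns (vertex_weight, edge_weight) where
      vertex_weight[label]  = number of original vertices with that label
      edge_weight[(a, b)]   = number of original edges between super points
                              a and b, with the pair stored in sorted order
    """
    vertex_weight = Counter(labels.values())
    edge_weight = Counter()
    for u, v in edges:
        a, b = labels[u], labels[v]
        if a == b:
            continue                       # same super point: no self-loop
        edge_weight[tuple(sorted((a, b)))] += 1
    return vertex_weight, edge_weight

# Two triangles with labels "A" and "B", connected by two edges.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (0, 3)]
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
vertex_weight, edge_weight = build_supergraph(edges, labels)
print(dict(vertex_weight), dict(edge_weight))
```

Since a super edge is created only when at least one original edge crosses between two super points, every stored weight is at least 1, matching the statement above.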
S2. Successively partition the weighted hypergraph, in a balanced manner, into weighted hypergraph subgraphs by means of a partitioning algorithm.
The label propagation method yields a weighted graph G_m(V_m, E). To obtain a balanced partition of the weighted graph, this embodiment proposes an algorithm that minimizes the Ratio-Cut step by step. The algorithm runs in multiple steps. At each step it looks for a subset V_i that minimizes the objective function PRC (Partial-Ratio Cut), removes that subset from the graph, and then looks for the next subset in the remaining graph in the same way. PRC is defined as follows:
Figure PCTCN2014087091-appb-000011
where V_i denotes the i-th block of the partition of the graph, |V_i| denotes the number of vertices in that block,
Figure PCTCN2014087091-appb-000012
denotes the number of edges between V_i and its complement
Figure PCTCN2014087091-appb-000013
PRC(V_i) can be understood as the average number of cut edges per vertex for the block V_i of the graph.
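Following the verbal description above (cut edges per vertex of the block), PRC can be sketched for a plain unweighted graph as shown below. The defining formula itself appears only as an equation image, so this sketch relies on the textual interpretation; the function name and the edge-list representation are assumptions.

```python
def prc(block, edges):
    """Partial-Ratio Cut of `block`: cut edges divided by block size.

    block : set of vertices (V_i)
    edges : iterable of (u, v) pairs of an undirected graph
    """
    if not block:
        raise ValueError("PRC is undefined for an empty block")
    # An edge is cut when exactly one of its endpoints lies in the block.
    cut = sum(1 for u, v in edges if (u in block) != (v in block))
    return cut / len(block)

# Two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
print(prc({0, 1, 2}, edges))   # one cut edge shared by three vertices
print(prc({0, 1}, edges))      # two cut edges shared by two vertices
```

A dense block with few outgoing edges, such as one whole triangle, scores lower than a block that splits a triangle, which is the behavior the objective is meant to reward.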
The steps of the successive partitioning algorithm are:
Step 21. Initialize the super point sets setList and bestSet as empty sets, set the number of blocks to be produced to k, and set the block counter i = 0.
Step 22. For each super point v ∈ V_m in the weighted hypergraph G_m(V_m, E), compute the super point set:
Figure PCTCN2014087091-appb-000014
This formula denotes partitioning the hypergraph G_m starting from vertex v, with k blocks, where each block should contain approximately
Figure PCTCN2014087091-appb-000015
vertices. The computation of minimizePRC (minimize PRC) is described below.
Step 23. Compute the PRC values of the set S and of the set bestSet. If PRC(S) < PRC(bestSet), let bestSet be S. Add the elements of bestSet to setList, and remove the elements of bestSet from the super point set V_m. Increase the counter i by 1 and clear bestSet.
Step 24. If the counter i is less than k, jump to step 22; if the counter i is greater than or equal to k, the algorithm stops and returns setList.
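The outer loop of the successive partitioning algorithm can be sketched as below. Because the block-growing routine and the objective are described separately, they are passed in here as two hypothetical callables, `grow` and `score`; their names and signatures, and the choice to evaluate every starting point before fixing a block, are assumptions of this sketch.

```python
def successive_partition(points, k, grow, score):
    """Extract k blocks one after another from a set of super points.

    points : iterable of super point ids
    grow(start, remaining) -> set
        hypothetical callable returning a candidate block that contains
        `start` and only uses super points from `remaining`
    score(block, remaining) -> float
        hypothetical objective to minimize, e.g. a PRC value
    """
    remaining = set(points)
    blocks = []                                   # plays the role of setList
    for _ in range(k):
        if not remaining:
            break
        # Try every remaining super point as a starting point and keep the
        # candidate block with the smallest objective (the role of bestSet).
        candidates = [grow(v, remaining) for v in sorted(remaining)]
        best = min(candidates, key=lambda b: score(b, remaining))
        blocks.append(best)
        remaining -= best                         # remove the block from V_m
    return blocks

# Toy callables, for illustration only.
toy_grow = lambda v, rem: {v} | ({v + 1} & rem)
toy_score = lambda block, rem: min(block) / len(block)
print(successive_partition({1, 2, 3, 4}, 2, toy_grow, toy_score))
```

In a real setting `grow` would be the minimizePRC routine and `score` the PRC value computed on the remaining hypergraph.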
S3. Restore the weighted hypergraph subgraphs to the corresponding data of the original graph.
Each set of super points is restored to the corresponding set of original vertices according to the super point information.
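The restoration step amounts to expanding each super point back into the original vertices it aggregates. A minimal sketch follows, assuming the super point information is kept as a dictionary from super point id to the set of original vertex ids; the function name and the sample data are hypothetical.

```python
def restore_partition(super_blocks, members):
    """Expand blocks of super points into blocks of original vertices.

    super_blocks : list of sets of super point ids
    members      : dict mapping a super point id to the set of original
                   vertex ids aggregated into it
    """
    return [set().union(*(members[s] for s in block))
            for block in super_blocks]

# Hypothetical super point information for illustration.
members = {1: {5, 6}, 2: {0, 1}, 3: {10}}
print(restore_partition([{1, 2}, {3}], members))
```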
In the successive partitioning algorithm of this embodiment, the steps of the minimizePRC algorithm are:
Step 31. Initialize the super point result set V_i as an empty set. Set the upper limit on the number of elements in each V_i to n, where
Figure PCTCN2014087091-appb-000016
and k is the number of blocks to be produced.
Step 32. In ascending order of super point id, take each super point v that has not yet been added to any block, substitute it into the formula
Figure PCTCN2014087091-appb-000017
and select the super point that minimizes the value of the formula as the super point to be added, that is, compute
Figure PCTCN2014087091-appb-000018
where
Figure PCTCN2014087091-appb-000019
d_v is the number of super edges that have super point v as an endpoint, and V_i is the super point result set.
Step 33. When |V_i| + |v| ≤ n, add super point v to the set V_i; when |V_i| + |v| > n, do not add super point v to V_i. Here | · | denotes the number of elements of a set.
Step 34. When |V_i| < n, jump to step 32.
Step 35. Return the result set V_i.
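The greedy growth in steps 31 to 35 can be sketched as follows. The selection formula of step 32 is given only as an equation image, so this sketch substitutes a stand-in score, namely the weight of super edges leaving the candidate block divided by the block weight; that score, the function name, the use of super point weights for the size limit, and the early exit when no candidate fits are all assumptions of the illustration rather than the method's own definitions.

```python
def grow_block(vertex_weight, edge_weight, start, cap, available):
    """Greedily grow one block of super points from `start`.

    vertex_weight : dict super point -> weight
    edge_weight   : dict (a, b) -> weight of the super edge between a and b
    cap           : upper limit n on the total weight of the block
    available     : set of super points not yet assigned to any block
    """
    def cut_per_weight(block):
        # Stand-in score (assumed): outgoing super edge weight / block weight.
        cut = sum(w for (a, b), w in edge_weight.items()
                  if (a in block) != (b in block))
        return cut / sum(vertex_weight[s] for s in block)

    block = {start}                               # step 31, seeded with start
    size = vertex_weight[start]
    while size < cap:                             # step 34
        # Step 32: unassigned candidates in ascending id order that still
        # fit under the cap (step 33).
        fits = [s for s in sorted(available - block)
                if size + vertex_weight[s] <= cap]
        if not fits:
            break                                 # nothing fits: stop growing
        best = min(fits, key=lambda s: cut_per_weight(block | {s}))
        block.add(best)
        size += vertex_weight[best]
    return block                                  # step 35

# A chain of four super points 1-2-3-4 with a weak link between 2 and 3.
vertex_weight = {1: 5, 2: 5, 3: 5, 4: 5}
edge_weight = {(1, 2): 3, (2, 3): 1, (3, 4): 3}
print(grow_block(vertex_weight, edge_weight, 1, 10, {1, 2, 3, 4}))
```

With these sample weights the block grown from super point 1 absorbs super point 2, since that choice leaves only the weak link cut.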
With the method of the embodiments of the present invention, a label propagation algorithm converts the original graph data into a locally dense weighted hypergraph; a successive partitioning algorithm is designed to achieve a balanced partition of the weighted graph; and finally the resulting weighted subgraphs are restored to the corresponding vertices and edges of the original graph. Compared with the related art, partitioning is faster, larger data can be handled, and the coupling between the resulting data blocks is lower, which effectively reduces data communication between the worker nodes of a parallel computing platform using the BSP model and improves processing efficiency.
The method of the embodiments of the present invention can be widely used in fields that require parallel processing of graph data:
1. In a parallel graph data analysis platform, it can serve as the data loading algorithm, assigning graph data to worker nodes sensibly according to the topology of the graph and reducing the communication between worker nodes during parallel computation.
2. In a parallel graph data analysis algorithm, it can serve as part of data preprocessing, partitioning the original data to improve the efficiency of the algorithm and shorten its running time.
3. It can serve as a supervised graph analysis algorithm for clustering graph data: the required clustering result is computed after the number of clusters is set manually.
FIG. 3 is a schematic diagram of an apparatus for graph data partitioning according to an embodiment of the present invention. As shown in FIG. 3, the apparatus of this embodiment may comprise:
a conversion module, configured to convert original graph data into a locally dense weighted hypergraph by means of a parallel label propagation algorithm;
a partitioning module, configured to successively partition the weighted hypergraph, in a balanced manner, into weighted hypergraph subgraphs by means of a partitioning algorithm; and
a restoration module, configured to restore the weighted hypergraph subgraphs to the corresponding data of the original graph.
In a preferred embodiment, the conversion module is configured to aggregate vertices having the same label in the original graph data into one super point by means of the parallel label propagation algorithm, the weight of the super point being the number of vertices it contains; the connecting edges between super points are super edges, whose weights are determined by the edges in the original graph; and the weighted hypergraph is formed from the super points and the super edges.
Determining the weight of the super edge from the edges in the original graph comprises: if the two endpoints of an edge in the original graph belong to different super points in the weighted hypergraph, a super edge exists between the two super points and its weight is increased by 1; if the two endpoints of an edge in the original graph belong to the same super point in the weighted hypergraph, no super edge is generated.
In a preferred embodiment, the partitioning module is configured to calculate minimized PRC values in turn, taking the super points in the weighted hypergraph as starting points, and to partition the weighted hypergraph into a specified number of weighted hypergraph subgraphs according to the minimized PRC values.
The method of the embodiments of the present invention is described in detail below through several application examples.
Example 1:
Referring to FIG. 4, and assuming that the required number of blocks is 2, the partitioning steps of this embodiment are as follows:
Step 101: aggregate the hypergraph by parallel label propagation.
1) Initialize the label of every vertex in the graph to its own ID; set the counter t = 0 and the upper limit of t to 60. Taking the vertex set {0, 1, 2, 3, 4} in the figure as an example, the label of each vertex is its own ID, that is, L_0(0) = 0, L_1(0) = 1, L_2(0) = 2, L_3(0) = 3, L_4(0) = 4.
2) Each vertex v sends its own label to its adjacent vertices. According to the adjacency relations, vertex 0 sends label 0 to adjacent vertices {1, 2, 3, 4, 5}, vertex 1 sends label 1 to adjacent vertices {0, 2, 3, 4}, vertex 2 sends label 2 to adjacent vertices {0, 1, 3, 4, 8}, vertex 3 sends label 3 to adjacent vertices {0, 1, 2, 4}, and vertex 4 sends label 4 to adjacent vertices {0, 1, 2, 3, 11, 14}.
3) Each vertex v calculates its new label from the labels received from other vertices, taking the label that occurs most often and choosing one at random when several labels are tied. Because this is the first time label information is received, every label received by a vertex occurs exactly once. To simplify the calculation, assume that the random choice yields 1 wherever possible, giving L_0(1) = 1, L_1(1) = 2, L_2(1) = 1, L_3(1) = 1, L_4(1) = 1; because the labels received by vertex 1 do not include 1, its random choice is taken to be 2.
4) Because the iteration count has not reached the upper limit, each vertex continues by sending its current label: vertex 0 sends label 1 to adjacent vertices {1, 2, 3, 4, 5}, vertex 1 sends label 2 to adjacent vertices {0, 2, 3, 4}, vertex 2 sends label 1 to adjacent vertices {0, 1, 3, 4, 8}, vertex 3 sends label 1 to adjacent vertices {0, 1, 2, 4}, and vertex 4 sends label 1 to adjacent vertices {0, 1, 2, 3, 11, 14}.
5) Each vertex receives the label information and calculates its new label. Vertex 0 receives L_1(1) = 2, L_2(1) = 1, L_3(1) = 1, L_4(1) = 1, L_5(1) = label5; since 1 is the majority, L_0(2) = 1. Vertex 1 receives L_0(1) = 1, L_2(1) = 1, L_3(1) = 1, L_4(1) = 1; since 1 is the majority, L_1(2) = 1. The remaining vertices are computed in the same way as vertex 1, and in subsequent iterations their labels remain unchanged because most labels among their adjacent vertices are 1.
6) Set t = t + 1 and jump to 2). Repeat the above process; the algorithm stops when no vertex label changes any more or when the counter t reaches the configured upper limit.
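The majority vote in sub-step 5) can be checked with a few lines of code. This is only a small illustration: the helper name is hypothetical, and the string "label5" stands in for the label of vertex 5, which the example leaves unspecified.

```python
from collections import Counter

def majority_label(received):
    """Return the most frequent label among the received labels."""
    return Counter(received).most_common(1)[0][0]

# Labels received in the second iteration of Example 1.
vertex0 = majority_label([2, 1, 1, 1, "label5"])   # from vertices 1, 2, 3, 4, 5
vertex1 = majority_label([1, 1, 1, 1])             # from vertices 0, 2, 3, 4
print(vertex0, vertex1)   # prints: 1 1
```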
The result of step 101 is shown in FIG. 5, in which vertices with the same label at the end of aggregation are drawn on the same shaded background. The corresponding weighted hypergraph is then generated, as shown in FIG. 6. Super point 1 corresponds to the original vertices {5, 6, 7, 8, 9}, super point 2 to {0, 1, 2, 3, 4}, super point 3 to {10, 11, 12, 13, 14}, super point 4 to {15, 16, 17, 18, 19}, super point 5 to {25, 26, 27, 28, 29}, super point 6 to {20, 21, 22, 23, 24}, super point 7 to {50, 51, 52, 53, 54}, super point 8 to {55, 56, 57, 58, 59}, super point 9 to {40, 41, 42, 43, 44}, super point 10 to {35, 36, 37, 38, 39}, super point 11 to {30, 31, 32, 33, 34}, and super point 12 to {45, 46, 47, 48, 49}. The super edge weights are as marked in the figure, and every super point has weight 5.
Step 102: successive partitioning.
1) Initialize the super point sets setList and bestSet as empty sets, set the number of blocks to be produced to 2, and set the block counter i = 0.
2) Compute the minimizePRC value in turn, taking each super point in the weighted hypergraph as the starting point. According to the data shown in FIG. 5, compute S = minimizePRC(G_m, 6, 1) starting from super point 1, which means selecting a set of 6 elements from the hypergraph G_m with super point 1 as the starting point. Following the minimizePRC algorithm, the calculation proceeds as follows:
21) Set the result set S to the empty set and add the starting super point 1 as an element.
22) At this point the set S contains 1 element, fewer than the target of 6, so further super points must be added. For each remaining super point, compute
Figure PCTCN2014087091-appb-000020
and add the super point that minimizes this value to the set. Take super point 2 as an example: d_2 is the sum of the weights of all edges incident to super point 2, which is 2+2=4, and 2k_{2,V} = 2Σ_{j∈V} W(2,j) = 2·W(2,1) = 2*1 = 2, so
Figure PCTCN2014087091-appb-000021
The other super points are computed in the same way:
Super point 3 corresponding value
Figure PCTCN2014087091-appb-000022
Super point 4 corresponding value
Figure PCTCN2014087091-appb-000023
Super point 5 corresponding value
Figure PCTCN2014087091-appb-000024
Super point 6 corresponding value
Figure PCTCN2014087091-appb-000025
Super point 7 corresponding value
Figure PCTCN2014087091-appb-000026
Super point 8 corresponding value
Figure PCTCN2014087091-appb-000027
Super point 9 corresponding value
Figure PCTCN2014087091-appb-000028
Super point 10 corresponding value
Figure PCTCN2014087091-appb-000029
Super point 11 corresponding value
Figure PCTCN2014087091-appb-000030
Super point 12 corresponding value
Figure PCTCN2014087091-appb-000031
Since the set S currently contains 1 element, fewer than the 6 required, super point 2 is added to S;
23) Repeat 22) until the set S contains 6 elements, then return S as the result; the set S is {1,2,3,4,5,6};
24) Compute the PRC of the result set S from step 22), giving PRC(S)=0.33333. Since bestSet is empty at this point, PRC(S)<PRC(bestSet), so set bestSet to S and add the elements of bestSet to setList. Then jump back to step 22) and compute the set S=minimizePRC(G_m,6,2) that starts from super point 2, together with its PRC(S). Once the sets for all 12 starting super points have been computed, the set S with the smallest PRC(S) (that is, bestSet) is one block of the partition; remove the elements of bestSet from the super point set V_m and increase the counter i by 1;
25) Since k=2, once 1 block has been split off, the remaining super points form the other block; finally, return the two blocks.
The result consists of two super point sets, {7,8,9,10,11,12} and {1,2,3,4,5,6}. As the corresponding figure shows, the partition cuts the weight-1 super edges between super points 12 and 3 and between super points 4 and 8.
26) Restore the super points to their corresponding original vertices to complete the partitioning.
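The greedy growth of the set S in steps 21) to 23) can be sketched as below. The selection criterion appears in the source only as a formula image, so the sketch takes it as a `score` parameter. The default, d_i - 2k_{i,S}, is an assumption assembled from the two quantities the text defines and should not be read as the patent's formula. The names `grow_set`, `degree`, `k_to_set`, and the pair-keyed weight dictionary `W` are likewise hypothetical.

```python
def weight(W, a, b):
    """Super edge weight between super points a and b (0 if they are not adjacent)."""
    return W.get((a, b), W.get((b, a), 0))

def degree(W, i):
    """d_i: sum of the weights of all super edges incident to super point i."""
    return sum(w for (a, b), w in W.items() if i in (a, b))

def k_to_set(W, i, S):
    """k_{i,S}: sum of W(i, j) over the super points j already in S."""
    return sum(weight(W, i, j) for j in S)

def default_score(W, i, S):
    # Assumed criterion for illustration only; the source gives it as an image.
    return degree(W, i) - 2 * k_to_set(W, i, S)

def grow_set(W, nodes, start, size, score=default_score):
    """Start from `start` and add the lowest-scoring super point until |S| == size."""
    S = [start]                                   # step 21): S holds only the start
    remaining = sorted(n for n in nodes if n != start)
    while len(S) < size and remaining:            # steps 22) and 23)
        best = min(remaining, key=lambda i: score(W, i, S))  # ties go to the smallest id
        S.append(best)
        remaining.remove(best)
    return S
```

On a four-point ring with weights `{(1, 2): 3, (2, 3): 1, (3, 4): 3, (1, 4): 1}`, growing from super point 1 to size 2 selects super point 2, the neighbour most strongly tied to the current set.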
Referring to Fig. 7, and assuming that the required number of blocks is 4, this embodiment includes the following steps:
Step 201: aggregate the graph into a hypergraph by parallel label propagation, including the following steps:
Step 2011: Initialize the label of every vertex in the graph to its own ID; set the counter t=0 and set the upper limit of t to 60. Taking the vertex set {18,19,20,21,22,23} in the figure as an example, the label of each vertex is its own ID, that is, L_18(0)=18, L_19(0)=19, L_20(0)=20, L_21(0)=21, L_22(0)=22, L_23(0)=23;
Step 2012: Each vertex v sends its own label to its adjacent vertices. According to the adjacency relations, vertex 18 sends label 18 to adjacent vertices {12,19,20,21,22,23,24}, vertex 19 sends label 19 to adjacent vertices {17,18,20,21,22,23}, vertex 20 sends label 20 to adjacent vertices {18,19,21,22,23}, vertex 21 sends label 21 to adjacent vertices {18,19,20,22,23}, vertex 22 sends label 22 to adjacent vertices {18,19,20,21,23,28}, and vertex 23 sends label 23 to adjacent vertices {18,19,20,21,22};
Step 2013: From the labels received from other vertices, each vertex v takes as its new label the one that occurs most often; when several labels tie for the most occurrences, one of them is chosen at random. Because this is the first round of label exchange, every label a vertex receives occurs exactly once. To simplify the calculation, suppose the random choice yields 18 in each case, which gives L_19(1)=18, L_20(1)=18, L_21(1)=18, L_22(1)=18, L_23(1)=18. Because the labels received by vertex 18 do not include 18, suppose its random choice is 12;
Step 2014: Because the iteration has not reached the upper limit, each vertex again sends its current label: vertex 18 sends label 12 to adjacent vertices {12,19,20,21,22,23,24}, vertex 19 sends label 18 to adjacent vertices {17,18,20,21,22,23}, vertex 20 sends label 18 to adjacent vertices {18,19,21,22,23}, vertex 21 sends label 18 to adjacent vertices {18,19,20,22,23}, vertex 22 sends label 18 to adjacent vertices {18,19,20,21,23,28}, and vertex 23 sends label 18 to adjacent vertices {18,19,20,21,22};
Step 2015: Each vertex receives the labels and computes its new label. Vertex 18 receives L_12(1)=label12, L_19(1)=18, L_20(1)=18, L_21(1)=18, L_22(1)=18, L_23(1)=18, L_24(1)=label24; since 18 is in the majority, L_18(2)=18. Vertex 19 receives L_17(1)=label17, L_18(1)=12, L_20(1)=18, L_21(1)=18, L_22(1)=18, L_23(1)=18; since 18 is in the majority, L_19(2)=18. The remaining vertices are computed in the same way as vertex 18. In subsequent iterations the labels stay unchanged, because most labels among the adjacent vertices are 18. Here label12 and label17 denote the labels that vertices 12 and 17 determine for themselves by the same process; because these labels are in the minority within the vertex set under discussion, they do not affect the result.
Set t=t+1 and jump to step 2012; repeat the above process until no vertex label changes or the counter t reaches its upper limit, at which point the algorithm stops.
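Steps 2011 to 2015 can be sketched as a synchronous label propagation loop. This is a single-process illustration of the per-vertex rule rather than the parallel BSP implementation. The function names, the adjacency-dictionary input, and the seeded random generator are assumptions; `label_propagation` further assumes that every vertex appears as a key of `adj`.

```python
import random
from collections import Counter

def pick_label(received, rng):
    """Return the most frequent received label; ties are broken at random."""
    counts = Counter(received)
    top = max(counts.values())
    tied = sorted(label for label, c in counts.items() if c == top)
    return rng.choice(tied)

def propagate_round(adj, labels, rng):
    """One round: each vertex in adj relabels itself from its neighbours' current labels."""
    new_labels = dict(labels)
    for v, neighbours in adj.items():
        if neighbours:
            new_labels[v] = pick_label([labels[u] for u in neighbours], rng)
    return new_labels

def label_propagation(adj, max_rounds=60, seed=0):
    """Iterate until no label changes or the round counter reaches its upper limit."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}            # step 2011: each label starts as the vertex ID
    for _ in range(max_rounds):             # counter t with its upper limit
        new_labels = propagate_round(adj, labels, rng)
        if new_labels == labels:            # no vertex label changed
            break
        labels = new_labels
    return labels
```

Feeding `propagate_round` the adjacency lists and the round-1 labels of the worked example (vertex 18 holding label 12, vertices 19 to 23 holding label 18) reproduces the text's outcome: every vertex in {18,...,23} ends the round with label 18, whatever labels the outside vertices 12, 17, 24, and 28 hold.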
The result of step 201 is shown in Fig. 8, where vertices that carry the same label at the end of aggregation share the same shaded background. The corresponding weighted hypergraph is then generated, as shown in Fig. 9. The super points map to the original vertices as follows: super point 1 to {42,43,44,45,46,47}, super point 2 to {12,13,14,15,16,17}, super point 3 to {24,25,26,27,28}, super point 4 to {0,1,2,3,4,5}, super point 5 to {18,19,20,21,22,23}, super point 6 to {6,7,8,9,10,11}, super point 7 to {30,31,32,33,34,35}, and super point 8 to {36,37,38,39,40,41}. The super edge weights are marked in the figure, and every super point has weight 6.
Step 202: successive partitioning, including the following steps:
Step 2021: Initialize the super point sets setList and bestList as empty sets, set the number of blocks to be partitioned to 4, and set the block counter i=0;
Step 2022: Compute the minimizePRC value in turn, taking each super point of the weighted hypergraph as the starting point. Using the data shown in Fig. 8 and taking super point 1 as the starting point, compute S=minimizePRC(G_m,2,1), which selects from the hypergraph G_m a set of at most 2 elements starting from super point 1. The minimizePRC algorithm proceeds as follows:
1) Set the result set S to the empty set and add the starting super point 1 as an element;
2) At this point the set S contains 1 element, fewer than the target of 2, so further super points must be added. For each remaining super point, compute
Figure PCTCN2014087091-appb-000032
and add the super point that minimizes this value to the set. Take super point 2 as an example: d_2 is the sum of the weights of all edges incident to super point 2, which is 3+1=4, and 2k_{2,S}=0, so
Figure PCTCN2014087091-appb-000033
The other super points are computed in the same way:
Super point 3 corresponding value
Figure PCTCN2014087091-appb-000034
Super point 4 corresponding value
Figure PCTCN2014087091-appb-000035
Super point 5 corresponding value
Figure PCTCN2014087091-appb-000036
Super point 6 corresponding value
Figure PCTCN2014087091-appb-000037
Super point 7 corresponding value
Figure PCTCN2014087091-appb-000038
Super point 8 corresponding value
Figure PCTCN2014087091-appb-000039
Since the set S currently contains 1 element, fewer than the 2 required, super point 8 is added to S;
3) Since the target set size is 2, the set S now meets the requirement and is returned as the result; the set S is {1,8}.
Step 2023: Compute the PRC of the result set S from step 2022, giving PRC(S)=1.0. Since bestSet is empty at this point, PRC(S)<PRC(bestSet), so set bestSet to S and add the elements of bestSet to setList. Then jump back to step 2022 and compute the set S=minimizePRC(G_m,2,2) that starts from super point 2, together with its PRC(S). Once the sets for all 8 starting super points have been computed, the set S with the smallest PRC(S) (that is, bestSet) is one block of the partition; remove the elements of bestSet from the super point set V_m and increase the counter i by 1.
Step 2024: Continue executing steps 2022 and 2023, and stop when the block counter equals 4.
Step 2025: Since k=4, once 3 blocks have been split off, the remaining super points form the last block; finally, return the four blocks.
The result consists of four super point sets, {1,8}, {2,5}, {3,7}, and {4,6}. As the corresponding figure shows, the first cut removes the weight-1 super edges between super points 7 and 8 and between 1 and 4; the second cut removes the super edges between super points 2 and 6 and between 3 and 5, with weights 1 and 2 respectively; the third cut removes the weight-1 super edges between super points 4 and 7 and between 6 and 7.
Step 2026: Restore the super points to their corresponding original vertices to complete the partitioning.
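The outer loop of steps 2022 to 2026 can be sketched as follows. The set-growing routine and the PRC evaluation are passed in as the callables `grow` and `prc`, because this passage gives their formulas only as images; both parameters, the function names, and the `members` dictionary are assumptions made for illustration. An empty bestSet is modelled as having infinite PRC, which matches the text's statement that the first candidate always satisfies PRC(S)<PRC(bestSet).

```python
def successive_partition(nodes, k, block_size, grow, prc):
    """Split `nodes` into k blocks by repeatedly removing the lowest-PRC candidate set.

    grow(remaining, start, block_size) -> candidate set grown from `start`
    prc(candidate, remaining)          -> value to minimize
    Assumes enough super points remain to form each block.
    """
    remaining = set(nodes)
    blocks = []
    for _ in range(k - 1):                        # block counter i
        best_set, best_prc = None, float("inf")   # empty bestSet compares as +infinity
        for start in sorted(remaining):           # try every super point as the start
            candidate = grow(remaining, start, block_size)
            value = prc(candidate, remaining)
            if value < best_prc:
                best_set, best_prc = candidate, value
        blocks.append(sorted(best_set))
        remaining -= set(best_set)                # remove bestSet from V_m
    blocks.append(sorted(remaining))              # leftover super points form the last block
    return blocks

def restore_blocks(blocks, members):
    """Replace every super point by the original vertices it aggregates."""
    return [sorted(v for sp in block for v in members[sp]) for block in blocks]
```

With the mapping listed for this embodiment, restoring the block {1,8} yields original vertices 36 to 47, and restoring {2,5} yields vertices 12 to 23.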
Referring to Fig. 10, and assuming that the required number of blocks is 2, this embodiment includes the following steps:
Step 301: aggregate the graph into a hypergraph by parallel label propagation, including the following steps:
Step 3011: Initialize the label of every vertex in the graph to its own ID; set the counter t=0 and set the upper limit of t to 60. Taking the vertex set {28,29,30,31,32,33,34} in the figure as an example, the label of each vertex is its own ID, that is, L_28(0)=28, L_29(0)=29, L_30(0)=30, L_31(0)=31, L_32(0)=32, L_33(0)=33, L_34(0)=34.
Step 3012: Each vertex v sends its own label to its adjacent vertices. According to the adjacency relations, vertex 28 sends label 28 to adjacent vertices {21,29,30,31,32,33,34,35}, vertex 29 sends label 29 to adjacent vertices {28,30,31,32,33,34}, vertex 30 sends label 30 to adjacent vertices {28,29,31,32,33,34}, vertex 31 sends label 31 to adjacent vertices {28,29,30,32,33,34}, vertex 32 sends label 32 to adjacent vertices {28,29,30,31,33,34}, vertex 33 sends label 33 to adjacent vertices {27,28,29,30,31,32,34}, and vertex 34 sends label 34 to adjacent vertices {5,28,29,30,31,32,33};
Step 3013: From the labels received from other vertices, each vertex v takes as its new label the one that occurs most often; when several labels tie for the most occurrences, one of them is chosen at random. Because this is the first round of label exchange, every label a vertex receives occurs exactly once. To simplify the calculation, suppose the random choice yields 28 in each case, which gives L_29(1)=28, L_30(1)=28, L_31(1)=28, L_32(1)=28, L_33(1)=28, L_34(1)=28. Because the labels received by vertex 28 do not include 28, suppose its random choice is 29, so that L_28(1)=29;
Step 3014: Because the iteration has not reached the upper limit, each vertex again sends its current label: vertex 28 sends label 29 to adjacent vertices {21,29,30,31,32,33,34,35}, vertex 29 sends label 28 to adjacent vertices {28,30,31,32,33,34}, vertex 30 sends label 28 to adjacent vertices {28,29,31,32,33,34}, vertex 31 sends label 28 to adjacent vertices {28,29,30,32,33,34}, vertex 32 sends label 28 to adjacent vertices {28,29,30,31,33,34}, vertex 33 sends label 28 to adjacent vertices {27,28,29,30,31,32,34}, and vertex 34 sends label 28 to adjacent vertices {5,28,29,30,31,32,33};
Step 3015: Each vertex receives the labels and computes its new label. Vertex 28 receives L_21(1)=label21, L_29(1)=28, L_30(1)=28, L_31(1)=28, L_32(1)=28, L_33(1)=28, L_34(1)=28, L_35(1)=label35; since 28 is in the majority, L_28(2)=28. Vertex 29 receives L_28(1)=29, L_30(1)=28, L_31(1)=28, L_32(1)=28, L_33(1)=28, L_34(1)=28; since 28 is in the majority, L_29(2)=28. The remaining vertices are computed in the same way as vertex 28. In subsequent iterations the labels stay unchanged, because most labels among the adjacent vertices are 28. Here label21 and label35 denote the labels that vertices 21 and 35 determine for themselves by the same process; because these labels are in the minority within the vertex set under discussion, they do not affect the result.
Set t=t+1 and jump to step 3012; repeat the above process until no vertex label changes or the counter t reaches its upper limit, at which point the algorithm stops.
The result of step 301 is shown in Fig. 11, where vertices that carry the same label at the end of aggregation share the same shaded background. The corresponding weighted hypergraph is then generated, as shown in Fig. 12. The super points map to the original vertices as follows: super point 1 to {35,36,37,38,39,40,41}, super point 2 to {0,1,2,3,4,5,6}, super point 3 to {7,8,9,10,11,12,13}, super point 4 to {14,15,16,17,18,19,20}, super point 5 to {49,50,51,52,53,54,55}, super point 6 to {28,29,30,31,32,33,34}, super point 7 to {21,22,23,24,25,26,27}, and super point 8 to {42,43,44,45,46,47,48}. The super edge weights are marked in the figure, and every super point has weight 7.
Step 302: successive partitioning:
Step 3022: Initialize the super point sets setList and bestList as empty sets, set the number of blocks to be partitioned to 2, and set the block counter i=0;
Step 3023: Compute the minimizePRC value in turn, taking each super point of the weighted hypergraph as the starting point. Using the data shown in Fig. 5 and taking super point 1 as the starting point, compute S=minimizePRC(G_m,4,1), which selects from the hypergraph G_m a set of 4 elements starting from super point 1. The minimizePRC algorithm proceeds as follows:
1) Set the result set S to the empty set and add the starting super point 1 as an element;
2) At this point the set S contains 1 element, fewer than the target of 4, so further super points must be added. For each remaining super point, compute
Figure PCTCN2014087091-appb-000040
and add the super point that minimizes this value to the set. Take super point 2 as an example: d_2 is the sum of the weights of all edges incident to super point 2, which is 2+1+1=4, and 2k_{2,S} = 2Σ_{j∈S} W(2,j) = 2·W(2,1) = 2*0 = 0, so
Figure PCTCN2014087091-appb-000041
The other super points are computed in the same way:
Super point 3 corresponding value
Figure PCTCN2014087091-appb-000042
Super point 4 corresponding value
Figure PCTCN2014087091-appb-000043
Super point 5 corresponding value
Figure PCTCN2014087091-appb-000044
Super point 6 corresponding value
Figure PCTCN2014087091-appb-000045
Super point 7 corresponding value
Figure PCTCN2014087091-appb-000046
Super point 8 corresponding value
Figure PCTCN2014087091-appb-000047
Since the set S currently contains 1 element, fewer than the 4 required, and since
Figure PCTCN2014087091-appb-000048
super point 2 is added to S;
3) Repeat 2) until the set S contains 4 elements, then return S as the result; the set S is {1,2,5,8};
Step 3024: Compute the PRC of the result set S from step 3023, giving PRC(S)=1.5. Since bestSet is empty at this point, PRC(S)<PRC(bestSet), so set bestSet to S and add the elements of bestSet to setList. Then jump back to step 3023 and compute the set S=minimizePRC(G_m,4,2) that starts from super point 2, together with its PRC(S). Once the sets for all 8 starting super points have been computed, the set S with the smallest PRC(S) (that is, bestSet) is one block of the partition. Remove the elements of bestSet from the super point set V_m and increase the counter i by 1.
Step 3025: Since k=2, once 1 block has been split off, the remaining super points form the other block; finally, return the two blocks.
The result consists of two super point sets, {1,2,5,8} and {3,4,6,7}. As the corresponding figure shows, the partition cuts the weight-1 super edges between super points 1 and 3, 1 and 6, 7 and 8, and 5 and 7, as well as the weight-2 super edge between super points 2 and 3.
Restore the super points to their corresponding original vertices to complete the partitioning.
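The three PRC values reported in the embodiments (0.33333, 1.0, and 1.5) are each consistent with dividing the total weight of the cut super edges by the number of super points in S: 2/6, 2/2, and 6/4. The sketch below checks that reading. It is an inference from the reported numbers, not a definition taken from this passage, and the function name `prc` and the pair-keyed weight dictionaries are assumptions. Only the cut super edges named in the text are listed, since the full hypergraphs appear only in the figures.

```python
def prc(W, S):
    """Cut weight of S divided by the number of super points in S (inferred reading)."""
    S = set(S)
    # A super edge is cut when exactly one of its endpoints lies in S.
    cut = sum(w for (a, b), w in W.items() if (a in S) != (b in S))
    return cut / len(S)

# Cut super edges named in the three embodiments, keyed by endpoint pair.
cut_1 = {(12, 3): 1, (4, 8): 1}
cut_2 = {(7, 8): 1, (1, 4): 1}
cut_3 = {(1, 3): 1, (1, 6): 1, (7, 8): 1, (5, 7): 1, (2, 3): 2}

print(round(prc(cut_1, {1, 2, 3, 4, 5, 6}), 5))  # 0.33333
print(prc(cut_2, {1, 8}))                        # 1.0
print(prc(cut_3, {1, 2, 5, 8}))                  # 1.5
```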
Those of ordinary skill in the art will understand that all or some of the steps of the above method may be carried out by a program that instructs the relevant hardware, and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented with one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in hardware or as a software function module. The present invention is not limited to any particular combination of hardware and software.
The above are merely preferred embodiments of the present invention. The invention may of course have various other embodiments, and those skilled in the art may make corresponding changes and modifications in accordance with the invention without departing from its spirit and essence; all such changes and modifications shall fall within the scope of protection of the claims appended to the invention.
Industrial applicability
The embodiments of the present invention provide a method and device for graph data partitioning that partition faster, handle larger data sets, and produce data blocks with lower coupling between them. This effectively reduces data communication between the worker vertices of a parallel computing platform that uses the BSP model and improves processing efficiency.

Claims (10)

  1. A graph data partitioning method, comprising:
    converting original graph data into a locally dense weighted hypergraph by a parallel label propagation algorithm;
    successively partitioning the weighted hypergraph, in a balanced manner, into weighted hypergraph subgraphs by a partitioning algorithm; and
    restoring the weighted hypergraph subgraphs to the data corresponding to the original graph.
  2. The method of claim 1, wherein converting the original graph data into a locally dense weighted hypergraph by a parallel label propagation algorithm comprises:
    aggregating, by the parallel label propagation algorithm, the vertices of the original graph data that have the same label into one super point, the weight of the super point being the number of vertices it contains;
    wherein an edge connecting two super points is a super edge, and the weight of the super edge is determined by the edges in the original graph; and
    wherein the super points and the super edges constitute the weighted hypergraph.
  3. The method of claim 2, wherein the weight of the super edge being determined by the edges in the original graph comprises:
    if the two endpoints of an edge in the original graph belong to different super points in the weighted hypergraph, a super edge exists between the two super points, and the weight of that super edge is increased by 1;
    if the two endpoints of an edge in the original graph belong to the same super point in the weighted hypergraph, no super edge is generated.
  4. The method of any one of claims 1-3, wherein successively partitioning the weighted hypergraph, in a balanced manner, into weighted hypergraph subgraphs by a partitioning algorithm comprises:
    computing a minimized local cut ratio value in turn, taking each super point in the weighted hypergraph as a starting point; and
    partitioning the weighted hypergraph into a specified number of weighted hypergraph subgraphs according to the minimized local cut ratio values.
  5. A graph data partitioning device, comprising:
    a conversion module configured to convert original graph data into a locally dense weighted hypergraph by a parallel label propagation algorithm;
    a partitioning module configured to successively partition the weighted hypergraph, in a balanced manner, into weighted hypergraph subgraphs by a partitioning algorithm; and
    a restoration module configured to restore the weighted hypergraph subgraphs to the data corresponding to the original graph.
  6. The device of claim 5, wherein:
    the conversion module is configured to aggregate, by the parallel label propagation algorithm, the vertices of the original graph data that have the same label into one super point, the weight of the super point being the number of vertices it contains; an edge connecting two super points is a super edge, and the weight of the super edge is determined by the edges in the original graph; and the super points and the super edges constitute the weighted hypergraph.
  7. The device of claim 6, wherein:
    the weight of the super edge being determined by the edges in the original graph comprises: if the two endpoints of an edge in the original graph belong to different super points in the weighted hypergraph, a super edge exists between the two super points, and the weight of that super edge is increased by 1; if the two endpoints of an edge in the original graph belong to the same super point in the weighted hypergraph, no super edge is generated.
  8. The apparatus of any one of claims 5 to 7, wherein:
    the partitioning module is configured to: compute minimized local cut rate values in turn, taking the super points in the weighted hypergraph as starting points, and partition the weighted hypergraph into the specified number of weighted hypergraph subgraphs according to the minimized local cut rate values.
  9. A computer program comprising program instructions which, when executed by a graph data partitioning apparatus, cause the apparatus to perform the method of any one of claims 1 to 4.
  10. A carrier carrying the computer program of claim 9.
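
The super-point and super-edge construction recited in claims 6 and 7 can be illustrated with a minimal, single-machine sketch. It assumes that the label propagation step has already assigned a label to every vertex, that the original graph is undirected, and that labels are mutually comparable so a super edge can be keyed by a sorted label pair. The function name `build_weighted_hypergraph` and the sample data are hypothetical and are not taken from the application; the claimed method runs in parallel, which this sketch does not attempt.

```python
from collections import Counter


def build_weighted_hypergraph(labels, edges):
    """Collapse a labeled graph into weighted super points and super edges.

    labels: dict mapping each vertex to its label (assumed already computed).
    edges:  iterable of (u, v) pairs from the original, undirected graph.
    """
    # Weight of a super point = number of vertices that share its label.
    super_point_weight = Counter(labels.values())

    super_edge_weight = Counter()
    for u, v in edges:
        lu, lv = labels[u], labels[v]
        if lu == lv:
            # Both endpoints fall in the same super point: no super edge.
            continue
        # Endpoints fall in different super points: add 1 to that super edge.
        key = (lu, lv) if lu <= lv else (lv, lu)
        super_edge_weight[key] += 1

    return dict(super_point_weight), dict(super_edge_weight)


# Hypothetical example: five vertices already grouped under three labels.
labels = {1: "a", 2: "a", 3: "b", 4: "b", 5: "c"}
edges = [(1, 2), (2, 3), (1, 4), (4, 5), (3, 4)]
points, hyperedges = build_weighted_hypergraph(labels, edges)
print(points)
print(hyperedges)
```

In this example the edges (1, 2) and (3, 4) stay inside a single super point and therefore contribute nothing, while the two edges joining labels "a" and "b" accumulate into one super edge of weight 2.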
PCT/CN2014/087091 2014-05-05 2014-09-22 Graph data partitioning method and device WO2015169029A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410187377.2 2014-05-05
CN201410187377.2A CN105096297A (en) 2014-05-05 2014-05-05 Graph data partitioning method and device

Publications (1)

Publication Number Publication Date
WO2015169029A1 (en) 2015-11-12

Family

ID=54392070

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/087091 WO2015169029A1 (en) 2014-05-05 2014-09-22 Graph data partitioning method and device

Country Status (2)

Country Link
CN (1) CN105096297A (en)
WO (1) WO2015169029A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10120956B2 (en) * 2014-08-29 2018-11-06 GraphSQL, Inc. Methods and systems for distributed computation of graph data
CN105550765B (en) * 2015-11-30 2020-02-07 中国科学技术大学 Method for selecting representative elements in road network distance calculation
CN112994916B (en) * 2019-12-17 2024-05-24 中兴通讯股份有限公司 Service state analysis method, server and storage medium
CN113191405B (en) * 2021-04-16 2023-04-18 上海思尔芯技术股份有限公司 Integrated circuit-based multilevel clustering method with weight hypergraph and storage medium
CN113792170B (en) * 2021-11-15 2022-03-15 支付宝(杭州)信息技术有限公司 Graph data dividing method and device and computer equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102810113A (en) * 2012-06-06 2012-12-05 北京航空航天大学 Hybrid clustering method aiming at complicated network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8411952B2 (en) * 2007-04-04 2013-04-02 Siemens Aktiengesellschaft Method for segmenting an image using constrained graph partitioning of watershed adjacency graphs
US8428363B2 (en) * 2011-04-29 2013-04-23 Mitsubishi Electric Research Laboratories, Inc. Method for segmenting images using superpixels and entropy rate clustering
CN102663108B (en) * 2012-04-16 2013-11-13 南京大学 Medicine corporation finding method based on parallelization label propagation algorithm for complex network model
CN103699606B (en) * 2013-12-16 2017-03-01 华中科技大学 A kind of large-scale graphical partition method assembled with community based on summit cutting

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LUO, SHENGMEI ET AL.: "Implementation of a parallel graph partitioning algorithm to speed up BSP computing.", 2014 11TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY., 21 August 2014 (2014-08-21), pages 740 - 744, XP032701347 *
ZENG, ZENGFENG ET AL.: "A parallel graph partitioning algorithm to speed up the large-scale distributed graph mining.", BIGMINE, vol. 12, 12 August 2012 (2012-08-12), pages 61 - 68, XP058009568 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106557581A (en) * 2016-11-29 2017-04-05 佛山科学技术学院 A kind of hypergraph division methods migrated based on multi-level framework and super side
CN106557581B (en) * 2016-11-29 2021-02-12 佛山科学技术学院 Hypergraph division method based on multi-level framework and hyperedge migration
CN115601565A (en) * 2022-12-15 2023-01-13 安徽大学(Cn) Large-span steel structure fixed feature extraction method based on minimum valley distance
CN116894097A (en) * 2023-09-04 2023-10-17 中南大学 Knowledge graph label prediction method based on hypergraph modeling
CN116894097B (en) * 2023-09-04 2023-12-22 中南大学 Knowledge graph label prediction method based on hypergraph modeling

Also Published As

Publication number Publication date
CN105096297A (en) 2015-11-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14891521

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14891521

Country of ref document: EP

Kind code of ref document: A1