CN112883241A

CN112883241A - Supercomputer benchmark test acceleration method based on connected component generation optimization

Info

Publication number: CN112883241A
Application number: CN202110293568.7A
Authority: CN
Inventors: 白皓; 甘新标; 张一鸣; 李东升; 贾孟涵; 谭雯; 司嘉奇; 来宪龙; 李海莉; 来乐; 宣栋梁; 苏鸿宇; 王庆坤; 徐云鹏
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2021-03-19
Filing date: 2021-03-19
Publication date: 2021-06-01
Anticipated expiration: 2041-03-19
Also published as: CN112883241B

Abstract

The invention discloses a supercomputer benchmark test acceleration method based on connected component generation optimization. It aims to minimize communication paths, maximize the utilization of memory access bandwidth, and accelerate the big data benchmark test of a supercomputer. The technical scheme exploits the fact that a graph generated by Graph500 comprises a plurality of connected components: the connected components are found quickly in the graph and stored in a two-dimensional vector, path compression is applied to the parent-child relations of the vertices within each connected component, connected components with different root vertices are merged, and the vertices of the same connected component are assigned to physical nodes of the supercomputer that have short communication paths between them, so that communication overhead is small and execution is fast when the graph is traversed. The method stores all connected components in the graph effectively and quickly, speeds up merging as far as possible, accelerates root vertex queries, reduces stack overhead in memory, and increases the speed of testing the big data processing capacity of the supercomputer.

Description

Supercomputer benchmark test acceleration method based on connected component generation optimization

Technical Field

The invention relates to a method for accelerating the big data benchmark test of a supercomputer, and in particular to a method for accelerating the benchmark test by optimizing connected component generation based on two-dimensional vectors and path compression.

Background

A graph is a common data structure that can abstract and express complex relationships among real-world entities. For example, social networks and the World Wide Web can both be represented as graphs. Graph computation, the processing of and calculation on graph data, plays an important role in many real-life scenarios. In recent years the scale of graph data has grown continuously. According to related reports, in the third quarter of 2020 Facebook had 1.82 billion daily active users and Tencent WeChat had 1.21 billion active users; if the users and the relationships between them are abstracted into the vertices and edges of a graph, the number of vertices reaches the billions and the number of edges reaches the hundreds of billions. This places higher demands on data storage and computing power. Supercomputers have mainly been used for numerical computation; in the big data era, with data-intensive applications on the rise, Graph500 is an important benchmark program for testing a supercomputer's capacity to process data. Graph500 measures this big data processing capacity in TEPS (Traversed Edges Per Second), the number of edges traversed per second in the graph.

The Graph500 benchmark program consists of four steps: graph generation, graph construction, BFS search and validation, and result output, as shown in FIG. 1.

(1) Graph generation: a random graph structure G = (V, E) is generated by a Kronecker graph generator, where V is the vertex set and E is the edge set. The scale of the graph is determined by the user-supplied parameters scale and edgefactor, where scale sets the number of vertices and edgefactor is the average number of edges per vertex. N = 2^scale is the number of vertices of the input graph, i.e., the number of elements of V, and M = edgefactor × N is the number of edges of the input graph, i.e., the number of elements of E. v_i denotes the vertex numbered i in the graph, and the vertex pair (v_i, v_j) denotes the edge from vertex v_i to vertex v_j, with (v_i, v_j) ∈ E, 0 ≤ i ≤ N-1, and 0 ≤ j ≤ N-1.
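As a worked instance of the two size formulas above, the following minimal Python sketch computes N and M from scale and edgefactor. The function name graph_size is an illustrative assumption and does not come from the Graph500 source code.

```python
def graph_size(scale, edgefactor):
    """Return (N, M) for a Graph500-style graph.

    N = 2^scale is the vertex count, M = edgefactor * N is the edge count.
    The function name is hypothetical, chosen only for this illustration.
    """
    n = 1 << scale          # 2 ** scale
    m = edgefactor * n
    return n, m


if __name__ == "__main__":
    n, m = graph_size(scale=10, edgefactor=16)
    print(n, m)
```

For scale = 10 and edgefactor = 16 this gives N = 1024 vertices and M = 16384 edges.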

(2) Graph construction: the vertex and edge information generated in the first step is converted into a data structure representing the graph. Any such structure may be used; the standard Graph500 stores the graph information in the adjacency matrix of the graph.

(3) BFS (Breadth-First Search) search and validation: a root vertex is generated at random and a BFS over the whole graph is run with that vertex as the source. The resulting spanning tree is output as the search result, the Graph500 effective timing t is recorded, and the BFS spanning tree is validated against the original graph information. This process is repeated 64 times, and the BFS search portion of each iteration is timed separately.

(4) Result output: Graph500 measures program performance in traversed edges per second (TEPS), defined as the number of edges M of the generated graph divided by the BFS search time t, i.e., TEPS = M/t. A TEPS value is computed for each of the 64 iterations, and the average of the 64 TEPS values is the basis for the final Graph500 test result and ranking.
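The TEPS calculation described in step (4) can be sketched as follows. This is a minimal illustration that follows the text literally (TEPS = M/t per run, then the arithmetic mean over the runs); the function names are assumptions, and the averaging used by any particular Graph500 release is not restated here.

```python
def teps_per_run(num_edges, bfs_times):
    """Compute TEPS = M / t for each timed BFS run."""
    return [num_edges / t for t in bfs_times]


def mean_teps(num_edges, bfs_times):
    """Arithmetic mean of the per-run TEPS values, as described in the text."""
    values = teps_per_run(num_edges, bfs_times)
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical timings (seconds) for three runs instead of 64.
    print(mean_teps(100, [1.0, 2.0, 4.0]))
```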

In summary, the BFS search in Graph500 must traverse all vertices of the graph in turn, and the vertices are distributed across the many physical nodes of the supercomputer, so memory access bandwidth is a key factor in Graph500 performance. Moreover, Graph500 mainly uses the BFS search time as the measured time and places no restriction on the preparation of graph data before BFS. Graph500 is mainly applied to benchmark the big data computing capacity of supercomputer systems. A supercomputer system generally consists of computing nodes, storage nodes, and an interconnection network that connects the computing nodes and storage nodes through switches. Messages exchanged between nodes are forwarded by the switches and reach the target node through the interconnection network. The number of switches that must be traversed between two communicating nodes is called the hop count of the path between them; the larger the hop count, the longer the path. The storage space of the storage nodes is generally mapped uniformly onto the computing nodes, i.e., several computing nodes may share one storage node, but when a computing node accesses storage space that is not mapped to it, interconnect communication is required. Graph500 test performance is mainly limited by memory size and memory access bandwidth: the higher the bandwidth, the better the performance. Therefore, if two computing nodes that need to interact share storage space, the communication path is shortened and memory access is faster.

The patent application 'Big data processing capability test method based on vertex reordering and priority caching' (patent application number: 202010748396.3) exploits the fact that, during BFS traversal, high-degree vertices are very likely to share an edge with the root vertex, and proposes a test method based on vertex reordering and priority caching that reduces the number of memory accesses. That method, however, does not take into account the connectivity of connected components in a large graph.

A connected component of an undirected graph, also called a maximal connected subgraph, is a subgraph in which every pair of vertices is connected by a path. Connected component algorithms are often an important step in large-scale graph processing. A disconnected graph can be decomposed into several connected components; each connected component has at least one spanning tree (a tree containing all vertices of the component), and the spanning trees of the connected components together form a spanning forest. Suppose that, when vertices are partitioned, the vertices of the same connected component are assigned to physical nodes with short communication paths between them; that is, during graph construction in the Graph500 benchmark program (the second step), vertices related by edges are placed as far as possible within the routing range of switches on the same layer, according to the network topology of the computer system. If BFS is then run on this basis, the bandwidth utilization of the supercomputer system can be improved, load balance achieved, communication overhead reduced, and spanning tree generation accelerated.

Many researchers have optimized connected component algorithms on single and multiple processors, but these algorithms are currently applied only to scenarios such as reachability queries and consistency checking, and no published document applies a connected component algorithm to accelerating the big data benchmark test of a supercomputer.

The graph generated by Graph500 comprises a plurality of connected components, so if the implementation of the connected component algorithm can be optimized, the big data benchmark test of the supercomputer can be accelerated. The union-find (disjoint-set) algorithm is one way to compute connected components. The traditional union-find algorithm comprises the following steps:

In the first step, the root vertices of the two endpoints of the currently visited edge are found (find). FIG. 2(a) illustrates the find operation, in which the root vertex is found by repeatedly tracing back through parent vertices in the tree.

In the second step, if the root vertices differ, the sets containing the two vertices are merged (union). FIG. 2(b) illustrates the union operation, in which two trees with different root vertices are merged into one tree.

The union-find algorithm represents each set as a tree. Every vertex has a parent vertex, and parents are followed repeatedly until the root of the tree is reached, which is the root vertex of all vertices in the set. In the traditional union-find algorithm, a union changes the parent of only some vertices, so the find stage must call the parent lookup function many times (including recursively), and the cost of this recursion is very high in large-scale graph computation. The traditional union-find algorithm therefore cannot be used directly to accelerate the big data benchmark test of a supercomputer.
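For reference, the traditional union-find scheme described above can be sketched in a few lines of Python. This is a textbook-style illustration, not code from the patent or from Graph500; the names make_sets, find and union are assumptions. Note that find recurses once per tree level, which is the overhead the text refers to.

```python
def make_sets(n):
    """Each vertex starts as the root of its own one-vertex tree."""
    return list(range(n))


def find(parent, v):
    """Trace parent links recursively until the root is reached (no compression)."""
    if parent[v] == v:
        return v
    return find(parent, parent[v])


def union(parent, a, b):
    """Merge the trees containing a and b by re-parenting a single root."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_b] = root_a


if __name__ == "__main__":
    parent = make_sets(5)
    union(parent, 0, 1)
    union(parent, 1, 2)
    union(parent, 3, 4)
    print(find(parent, 2) == find(parent, 0))  # same component
    print(find(parent, 3) == find(parent, 0))  # different components
```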

Path compression has been proposed to optimize the union-find algorithm. For example, the 1985 article "On the expected performance of path compression algorithms" in the SIAM Journal on Computing (SICOMP) analyzes the complexity of path compression, which avoids repeated calls to the parent lookup function during find and reduces recursion. However, existing path compression optimizations do not consider how tightly they are coupled to the underlying data structures, so their optimization effect is only moderate. The path-compressed union-find algorithm therefore cannot be used directly to accelerate the big data benchmark test of a supercomputer either.
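A common form of path compression can be sketched as below. This is a generic two-pass, iterative variant shown only to illustrate the idea; it is not the specific scheme analyzed in the cited article, and the function name find_compressed is an assumption.

```python
def find_compressed(parent, v):
    """Find the root of v and re-parent every vertex on the path to that root."""
    # First pass: walk up to the root without recursion.
    root = v
    while parent[root] != root:
        root = parent[root]
    # Second pass: point each vertex on the path directly at the root.
    while parent[v] != root:
        next_v = parent[v]
        parent[v] = root
        v = next_v
    return root


if __name__ == "__main__":
    parent = [0, 0, 1, 2, 3]   # a chain 4 -> 3 -> 2 -> 1 -> 0
    print(find_compressed(parent, 4))
    print(parent)              # every vertex now points directly at the root
```

After one call on the deepest vertex of a chain, later lookups on any vertex of that chain need a single step.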

Therefore, how to minimize communication paths, maximize memory access bandwidth utilization, and accelerate the big data benchmark test of a supercomputer remains a technical problem that those skilled in the art urgently need to solve.

Disclosure of Invention

The technical problem to be solved by the invention is to minimize communication paths, maximize memory access bandwidth utilization, and accelerate the big data benchmark test of a supercomputer.

The technical scheme exploits the fact that a graph generated by Graph500 comprises a plurality of connected components: the connected components are found quickly in the graph and stored in a two-dimensional vector, path compression is applied to the parent-child relations of the vertices within each connected component, connected components with different root vertices are merged, and the vertices of the same connected component are assigned to physical nodes of the supercomputer that have short communication paths between them. This reduces communication overhead and increases execution speed when the graph is traversed, and thereby increases the speed of testing the big data processing capacity of the supercomputer.

The specific technical scheme is as follows:

In the first step, the graph is generated. A graph structure G = (V, E) is generated by a Kronecker graph generator, where V is the vertex set and E is the edge set. The scale of the graph is determined by the user-supplied parameters scale and edgefactor, where scale sets the number of vertices and edgefactor is the average number of edges per vertex. N = 2^scale is the number of vertices of G, i.e., the number of elements of V, and M = edgefactor × N is the number of edges of G, i.e., the number of elements of E. v_i denotes the vertex numbered i in G, and the vertex pair (v_i, v_j) denotes the edge from vertex v_i to vertex v_j, where (v_i, v_j) ∈ E, i and j are integers, 0 ≤ i ≤ N-1, and 0 ≤ j ≤ N-1.

In the second step, an adjacency matrix A is constructed to store the graph G. A_ij = 0 denotes that there is no edge between vertex v_i and vertex v_j, and A_ij = 1 denotes that there is an edge between vertex v_i and vertex v_j.
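The second step can be illustrated with the following minimal Python sketch. It uses a dense list-of-lists matrix, which is only practical for small examples, and it assumes the graph is treated as undirected (so A is symmetric); both choices, and the name build_adjacency, are assumptions made for this illustration.

```python
def build_adjacency(n, edges):
    """Build an n x n 0/1 adjacency matrix from a list of (i, j) vertex pairs."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1  # assumption: edges are undirected, so A is symmetric
    return a


if __name__ == "__main__":
    for row in build_adjacency(4, [(0, 1), (1, 2)]):
        print(row)
```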

In the third step, the data structures are initialized: the root vertices and child vertices of all vertices in V are set to their initial values. The edge set E is then traversed, self-loop edges (edges connecting a vertex to itself) are removed to eliminate their interference, and each remaining edge is classified according to the state of its two vertices to facilitate subsequent processing. The method comprises the following steps:

3.1. According to the data scale of the graph G, the root vertex vector root and the two-dimensional child vertex vector son are initialized for all vertices in V. root contains N elements; root[v_i] denotes the root vertex of vertex v_i and is initialized to -1. son is a two-dimensional vector containing N elements, each of which is itself a vector and is initialized to an empty vector. son[v_i] denotes the child vertex vector of vertex v_i and stores the set of vertices whose root vertex is v_i, i.e., the vertex information of the connected component rooted at v_i. A connected component is thus the content stored in the son vector of a vertex whose root value is the vertex itself. The variable e is initialized to 1.

3.2. A structure packed_edge is created, consistent with the edge data storage format in the Graph500 source code. packed_edge comprises three int-type integer variables: the first variable v0_low is the ID of the first vertex of an edge, the second variable v1_low is the ID of the second vertex of the edge, which is connected to v0_low, and the third variable high is reserved for later functional extension and has no direct bearing on the acceleration method. For simplicity and clarity, the first vertex of an edge e_b (the one with ID v0_low) is denoted e_b.v0_low, and the second vertex (the one with ID v1_low) is denoted e_b.v1_low.

3.3. An edge e_b of type packed_edge is created to store the edge information read from E.

3.4. If e > M, the edge set E has been fully processed; go to the sixth step. Otherwise, read the e-th edge from the edge set E in order and set e_b to the e-th edge, where e_b.v0_low and e_b.v1_low are the two vertices that constitute e_b; set e = e + 1 and go to 3.5.

3.5. If e_b.v0_low ≠ e_b.v1_low, the edge connecting e_b.v0_low and e_b.v1_low is not a self-loop edge; go to 3.6. Otherwise, the edge is a self-loop edge; go directly to 3.4.

3.6. Determine whether the root vertices of e_b.v0_low and e_b.v1_low are the same: if root[e_b.v0_low] = root[e_b.v1_low], go to the fourth step; if root[e_b.v0_low] ≠ root[e_b.v1_low], go to the fifth step. Looking up the root vertices of e_b.v0_low and e_b.v1_low in this step is the find stage of the union-find algorithm.
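Steps 3.1 to 3.6 can be sketched as follows. This is a minimal Python illustration under stated assumptions: PackedEdge is a stand-in for the packed_edge structure (not the actual Graph500 definition), vertex IDs are used directly as 0-based list indices, and the helper names init_structures and classify_edge are hypothetical.

```python
from typing import NamedTuple


class PackedEdge(NamedTuple):
    """Illustrative stand-in for packed_edge: two vertex IDs and a reserved field."""
    v0_low: int
    v1_low: int
    high: int = 0


def init_structures(n):
    """Step 3.1: root[v] = -1 (unvisited) and son[v] = empty vector for every vertex."""
    root = [-1] * n
    son = [[] for _ in range(n)]
    return root, son


def classify_edge(root, e_b):
    """Steps 3.5 and 3.6: decide how an edge should be handled."""
    if e_b.v0_low == e_b.v1_low:
        return "self_loop"        # skipped, back to 3.4
    if root[e_b.v0_low] == root[e_b.v1_low]:
        return "same_root"        # handled by the fourth step
    return "different_root"       # handled by the fifth step


if __name__ == "__main__":
    root, son = init_structures(4)
    print(classify_edge(root, PackedEdge(1, 1)))
    print(classify_edge(root, PackedEdge(0, 1)))
```

Because root is a flat vector, the find stage in 3.6 is a pair of direct lookups rather than a chain of parent-following calls.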

In the fourth step, the case of identical root vertices is handled. If neither e_b.v0_low nor e_b.v1_low has been visited, e_b.v0_low and e_b.v1_low are merged into the connected component son[e_b.v0_low] with e_b.v0_low as the root vertex. If both e_b.v0_low and e_b.v1_low have been visited, they are already in the same connected component, so e_b is skipped and the next edge in E is accessed. The method comprises the following steps:

4.1. If root[e_b.v0_low] = -1, neither e_b.v0_low nor e_b.v1_low has been visited and the edge formed by e_b.v0_low and e_b.v1_low is being visited for the first time; go to 4.2. Otherwise, both e_b.v0_low and e_b.v1_low have been visited and already belong to the same connected component son[root[e_b.v0_low]] (i.e., they are elements of the same vector in son), so no merge is needed; go to 3.4.

4.2. Merge e_b.v1_low into the connected component rooted at e_b.v0_low: make e_b.v0_low the root vertex of both e_b.v0_low and e_b.v1_low by setting the corresponding elements of the root vector, i.e., set root[e_b.v0_low] = e_b.v0_low and root[e_b.v1_low] = e_b.v0_low.

4.3. Insert the IDs of e_b.v0_low and e_b.v1_low into the son vector of e_b.v0_low, i.e., add the new vertex information to the connected component son[e_b.v0_low] by inserting e_b.v0_low into son[e_b.v0_low] and inserting e_b.v1_low into son[e_b.v0_low]. The changes to the root and son vectors in 4.2 and 4.3 correspond to the merge (union) stage of the union-find algorithm. Go to 3.4.
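The fourth step can be sketched as a single function. This is a minimal illustration, assuming root and son are Python lists indexed by 0-based vertex ID and that the caller has already established root[v0] == root[v1] with v0 ≠ v1; the function name handle_same_root is hypothetical.

```python
def handle_same_root(root, son, v0, v1):
    """Fourth step: v0 and v1 have equal root values.

    Returns True if a new connected component rooted at v0 was created,
    and False if both vertices were already in the same component.
    """
    if root[v0] != -1:
        return False              # 4.1: already merged, nothing to do
    root[v0] = v0                 # 4.2: v0 becomes the root of both vertices
    root[v1] = v0
    son[v0].append(v0)            # 4.3: record both vertices in son[v0]
    son[v0].append(v1)
    return True


if __name__ == "__main__":
    root = [-1, -1, -1]
    son = [[], [], []]
    print(handle_same_root(root, son, 0, 1), root, son)
    print(handle_same_root(root, son, 1, 0), root, son)
```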

In the fifth step, the case of different root vertices is handled: path compression is applied to the parent-child relations of the vertices, and the two connected components with different root vertices are merged. As the edges are traversed, the vertex information of the connected components stored in the son vectors is progressively completed. In the end, every connected component contains exactly one vertex whose root value is itself; that vertex is the root vertex of the connected component, and its son vector stores all the vertex information of the connected component. The method comprises the following steps:

5.1. If root[e_b.v0_low] = -1, go to 5.2; otherwise go to 5.3.

5.2. At this point root[e_b.v0_low] = -1 and root[e_b.v1_low] ≠ -1, which indicates that vertex e_b.v1_low has been visited and vertex e_b.v0_low has not. Insert e_b.v0_low into the connected component son[root[e_b.v1_low]] of the root vertex of e_b.v1_low, i.e., merge e_b.v0_low into the connected component whose root vertex is root[e_b.v1_low], and change the root vertex of e_b.v0_low to the root vertex of e_b.v1_low, i.e., set root[e_b.v0_low] = root[e_b.v1_low]. The insertion and the root change together complete the merge, adding new vertex information to the connected component rooted at root[e_b.v1_low]. Go to 3.4.

5.3. If root[e_b.v1_low] = -1, go to 5.4; otherwise go to 5.5.

5.4. At this point root[e_b.v1_low] = -1 and root[e_b.v0_low] ≠ -1, which indicates that vertex e_b.v0_low has been visited and vertex e_b.v1_low has not. Insert e_b.v1_low into the connected component son[root[e_b.v0_low]] of the root vertex of e_b.v0_low, i.e., merge e_b.v1_low into the connected component whose root vertex is root[e_b.v0_low], and change the root vertex of e_b.v1_low to the root vertex of e_b.v0_low, i.e., set root[e_b.v1_low] = root[e_b.v0_low]. The insertion and the root change together complete the merge, adding new vertex information to the connected component rooted at root[e_b.v0_low]. Go to 3.4.

5.5. At this point, compare the numbers of vertices of the connected components of e_b.v0_low and e_b.v1_low, i.e., the numbers of elements of son[root[e_b.v0_low]] and son[root[e_b.v1_low]]. If son[root[e_b.v0_low]].size < son[root[e_b.v1_low]].size (size denotes the number of elements of a vector, so son[root[e_b.v0_low]].size is the number of child vertices of the root vertex of e_b.v0_low and son[root[e_b.v1_low]].size is the number of child vertices of the root vertex of e_b.v1_low), swap the IDs of the two vertices so that son[root[e_b.v0_low]] has no fewer elements than son[root[e_b.v1_low]]. This ensures that, when the connected component rooted at root[e_b.v0_low] and the connected component rooted at root[e_b.v1_low] are merged, the component with fewer vertices is merged into the other. Go to 5.6.

5.6. Let the root vertex of e_b.v1_low be root_v2, i.e., set root_v2 = root[e_b.v1_low]. Set a loop variable i and initialize i to 1.

5.7. If i is equal to son[root_v2].size, all vertices of the connected component stored in son[root_v2] have been traversed and all vertex elements of son[root_v2] have been inserted into the target connected component; the loop ends; go to 3.4. Otherwise, the traversal is not complete; go to 5.8.

5.8. Apply path compression to the parent-child relations of the vertices being merged into the connected component son[root[e_b.v0_low]] rooted at root[e_b.v0_low]: each element of son[root_v2] is made a direct child vertex of root[e_b.v0_low], so that no intermediate vertex lies between any element of son[root_v2] and the root vertex of e_b.v0_low. The path compression method is: set the root vertex of the i-th element of son[root_v2], i.e., the vertex son[root_v2][i], to root[e_b.v0_low], that is, set root[son[root_v2][i]] = root[e_b.v0_low], and insert the vertex son[root_v2][i] into the connected component son[root[e_b.v0_low]].

5.9. Set i = i + 1 and go to 5.7.
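The fifth step can be sketched as follows. This is a minimal Python illustration under stated assumptions: root and son are lists indexed by 0-based vertex ID, the caller has already established root[v0] ≠ root[v1], and the loop of 5.6 to 5.9 is written as a plain iteration over every element of son[root_v2] rather than with the explicit counter i. The function name handle_different_root is hypothetical.

```python
def handle_different_root(root, son, v0, v1):
    """Fifth step: merge the components of v0 and v1, whose root values differ."""
    if root[v0] == -1:                    # 5.2: only v1 has been visited
        r = root[v1]
        son[r].append(v0)
        root[v0] = r
        return
    if root[v1] == -1:                    # 5.4: only v0 has been visited
        r = root[v0]
        son[r].append(v1)
        root[v1] = r
        return
    # 5.5: make v0 the vertex whose component is not the smaller one.
    if len(son[root[v0]]) < len(son[root[v1]]):
        v0, v1 = v1, v0
    new_root = root[v0]
    root_v2 = root[v1]                    # 5.6
    # 5.7 to 5.9: every vertex of the smaller component becomes a direct
    # child of new_root (path compression) and is added to son[new_root].
    for child in son[root_v2]:
        root[child] = new_root
        son[new_root].append(child)
    # son[root_v2] is left as it was; root_v2 is no longer a root because
    # root_v2 itself is an element of son[root_v2].


if __name__ == "__main__":
    root = [0, 0, 2, 2, 2, -1]
    son = [[0, 1], [], [2, 3, 4], [], [], []]
    handle_different_root(root, son, 1, 3)   # merges {0, 1} into {2, 3, 4}
    handle_different_root(root, son, 5, 0)   # attaches unvisited vertex 5
    print(root, son[2])
```

Merging the smaller component into the larger one bounds how often any single vertex has its root entry rewritten, while every lookup of a root stays a single vector access.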

In the sixth step, with the information of all connected components in the graph G now found, the graph G is partitioned by connected component. The partitioned graph is distributed to the processing nodes of the supercomputer using the scatter operation (a standard function of MPI, the distributed parallel programming standard). The method comprises the following steps:

6.1. Set a vertex index j and a connected component counter k, and initialize j = 0 and k = 1.

6.2. If j is equal to N-1, all vertices have been traversed and k is now the number of connected components in the graph G; go to the seventh step. Otherwise go to 6.3.

6.3. Determine whether the root vertex of the vertex v_j with index j is v_j itself. If root[v_j] ≠ v_j, vertex v_j is not the root vertex of a connected component; go to 6.5. Otherwise, vertex v_j is the root vertex of the k-th connected component and son[v_j] stores all vertices of the k-th connected component, so all vertices of the k-th connected component are obtained. Set k = k + 1 and go to 6.4.

6.4. Distribute the vertices in son[v_j] to the same or nearby physical nodes of the supercomputer using the scatter operation.

6.5. Set j = j + 1 and go to 6.2.
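The component extraction of step 6.3 and a possible vertex-to-node assignment can be sketched as below. This illustration does not use MPI and knows nothing about network topology: the greedy rule that places the largest remaining component on the least loaded node is an assumption made purely for the example, as are the names extract_components and assign_to_nodes. It only shows that all vertices of one component end up with the same owner.

```python
def extract_components(root, son):
    """Step 6.3: a vertex v with root[v] == v is a component root; son[v] lists its vertices."""
    return [list(son[v]) for v in range(len(root)) if root[v] == v]


def assign_to_nodes(components, num_nodes):
    """Map each vertex to a node index, keeping every component on a single node.

    Assumption: a simple greedy balance (largest component first, least loaded
    node first) stands in for the topology-aware scatter described in the text.
    """
    load = [0] * num_nodes
    owner = {}
    for comp in sorted(components, key=len, reverse=True):
        node = load.index(min(load))
        for v in comp:
            owner[v] = node
        load[node] += len(comp)
    return owner


if __name__ == "__main__":
    root = [0, 0, 0, 0, 4, 4, -1]
    son = [[0, 1, 2, 3], [], [2, 3], [], [4, 5], [], []]
    comps = extract_components(root, son)
    print(comps)
    print(assign_to_nodes(comps, num_nodes=2))
```

Vertex 6 in the usage example has root value -1 (it appears in no non-self-loop edge), so it belongs to no stored component and is not assigned by this sketch.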

In the seventh step, BFS search and validation are performed. A root vertex v is generated at random, and a BFS with v as the source is run on the graph G partitioned by connected component in the sixth step, using the edge information stored in the adjacency matrix A constructed in the second step. The spanning tree is output as the search result, the Graph500 effective timing t is recorded, and the BFS spanning tree is validated against the original graph information. This process is repeated 64 times, and the BFS search portion of each iteration is timed separately.
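A single-process sketch of the BFS and a simplified validation is given below. It reads edges from a dense adjacency matrix, as the text describes, and returns the spanning tree as a parent array. The validation shown checks only that the source is its own parent and that every tree edge exists in the matrix; it is an assumed, reduced check and not the full validation procedure of Graph500. The function names are hypothetical.

```python
from collections import deque


def bfs_parents(adj, source):
    """Breadth-first search over an adjacency matrix; returns parent[v] or -1 if unreached."""
    n = len(adj)
    parent = [-1] * n
    parent[source] = source
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if adj[u][v] == 1 and parent[v] == -1:
                parent[v] = u
                queue.append(v)
    return parent


def validate_tree(adj, parent, source):
    """Simplified check: the source is its own parent and each tree edge is a graph edge."""
    if parent[source] != source:
        return False
    for v, p in enumerate(parent):
        if v == source or p == -1:
            continue
        if adj[p][v] != 1:
            return False
    return True


if __name__ == "__main__":
    # Path 0 - 1 - 2 plus an isolated vertex 3.
    adj = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    parent = bfs_parents(adj, 0)
    print(parent, validate_tree(adj, parent, 0))
```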

In the eighth step, the evaluation value of the graph test performance is computed, namely the average of the TEPS values measured over the BFS traversals of the 64 spanning trees, and the test result is obtained and output. A higher TEPS value indicates that the supercomputer has stronger large-scale graph processing capacity, ranks higher on Graph500, and is better suited to processing big data.

In the ninth step, the method ends.

The invention can achieve the following technical effects:

1. In the third step of the invention, the two-dimensional vector data structure son is established to store the connected components, so that all connected components in the graph G can be stored effectively and quickly, the speed of merging child vertices during the merge operation in the fifth step is maximized, and the efficiency of accessing and traversing child vertices is optimized.

2. The invention uses the root vector to implement the find operation in step 3.6: the parent lookup function need not be called repeatedly, and the root vertex is obtained directly by reading an element of the root vector. Combined with the path compression performed during merging in the fifth step, this accelerates root vertex queries.

3. In the fifth step, path compression is applied to the parent-child relations of the vertices in the connected components. When two connected components with different root vertices are merged, not only is the root of the root vertex with fewer child vertices changed, but the roots of all its child vertices are changed to the new root vertex as well. This reduces the call depth of the find operation, reduces stack overhead in memory, and increases speed.

4. In the sixth step, the path-compressed connected components are used to partition the vertices of the graph among the physical nodes of the supercomputer, and the vertices of the same connected component are assigned to physical nodes with short communication paths between them, which shortens communication paths and increases test speed. Because the vertices of the same connected component are merely assigned to physical nodes with short communication paths, and the edge information used by the BFS in the seventh step is still obtained from the adjacency matrix A constructed in the second step, the correctness of the Graph500 test is preserved.

Drawings

FIG. 1 is a flowchart of the Graph500 benchmark program described in the Background.

FIG. 2 shows the main steps of the union-find algorithm described in the Background, in which FIG. 2(a) is the find operation and FIG. 2(b) is the merge (union) operation.

FIG. 3 is an overall flow chart of the present invention.

FIG. 4 is a schematic diagram of the path compression in the fifth step of the invention.

FIG. 5 is a schematic diagram of the storage of the two-dimensional vector son.

FIG. 6 is a schematic diagram of graph partitioning by connected component in the sixth step of the invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings. As shown in FIG. 3, the invention comprises the following steps:

In the first step, the graph is generated. A random graph structure G = (V, E) is generated by a Kronecker graph generator, where V is the vertex set and E is the edge set. The scale of the graph is determined by the user-supplied parameters scale and edgefactor, where scale sets the number of vertices and edgefactor is the average number of edges per vertex. N = 2^scale is the number of vertices of G, i.e., the number of elements of V, and M = edgefactor × N is the number of edges of G, i.e., the number of elements of E. v_i denotes the vertex numbered i in G, and the vertex pair (v_i, v_j) denotes the edge from vertex v_i to vertex v_j, where (v_i, v_j) ∈ E, i and j are integers, 0 ≤ i ≤ N-1, and 0 ≤ j ≤ N-1.

In the second step, an adjacency matrix A is constructed to store the graph G. A_ij = 0 denotes that there is no edge between vertex v_i and vertex v_j, and A_ij = 1 denotes that there is an edge between vertex v_i and vertex v_j.

Thirdly, initializing a data structure, setting root vertexes and sub vertexes of all vertexes in the V to be corresponding values, traversing the edge set E, removing a self-loop edge in the edge set E to eliminate interference of the self-loop edge, classifying according to different conditions of the two vertexes of the edge, and facilitating next processing, wherein the method comprises the following steps:

3.2. And creating a structural body, namely, a packed _ edge consistent with an edge data storage format in the Graph500 source code, wherein the packed _ edge comprises three int-type integer variables, a first variable v0_ low is the ID of a first vertex forming an edge, a second variable v1_ low is the ID of a second vertex forming an edge connected with the first variable v0_ low, and a third variable high is reserved for later function extension. The first vertex of the edge e _ b with ID v0_ low is denoted by e _ b.v0_ low, and the second vertex of the edge e _ b with ID v1_ low is denoted by e _ b.v1_ low.

3.6. Determine whether e_b.v0_low and e_b.v1_low have the same root vertex: if root[e_b.v0_low] = root[e_b.v1_low], go to the fourth step; if root[e_b.v0_low] ≠ root[e_b.v1_low], go to the fifth step.
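The edge record and the branching of the third step can be sketched in Python as follows. The names PackedEdge and classify_edge are hypothetical and only mirror the three fields named in the text; the sketch does not reproduce the actual Graph500 definition.

```python
from dataclasses import dataclass


@dataclass
class PackedEdge:
    v0_low: int    # ID of the first endpoint of the edge
    v1_low: int    # ID of the second endpoint of the edge
    high: int = 0  # reserved field, unused in this sketch


def classify_edge(e_b: PackedEdge, root: list[int]) -> str:
    """Decide which branch of the procedure handles edge e_b."""
    if e_b.v0_low == e_b.v1_low:
        return "self_loop"       # dropped; the next edge is read
    if root[e_b.v0_low] == root[e_b.v1_low]:
        return "same_root"       # handled by the fourth step
    return "different_root"      # handled by the fifth step
```

Note that two unvisited endpoints both carry the initial root value -1, so they fall into the "same_root" branch.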

4.1. If root[e_b.v0_low] = -1, neither e_b.v0_low nor e_b.v1_low has been visited and the edge they form is being visited for the first time; go to 4.2. Otherwise, both e_b.v0_low and e_b.v1_low have been visited and are already in the same connected component son[e_b.v0_low], so no merge is needed; go to 3.4.

4.3. Insert the IDs of e_b.v0_low and e_b.v1_low into the son vector of e_b.v0_low, i.e. add the new vertex information to the connected component son[e_b.v0_low]: insert e_b.v0_low into son[e_b.v0_low] and insert e_b.v1_low into son[e_b.v0_low]. Go to 3.4.
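The fourth step can be sketched as a single function. This is an illustrative sketch, assuming root is a list initialised to -1 and son is a list of empty lists, one per vertex; the function name handle_same_root is hypothetical.

```python
def handle_same_root(v0, v1, root, son):
    """Fourth step: both endpoints currently carry the same root value."""
    if root[v0] == -1:
        # Neither endpoint has been visited: start a new component rooted at v0.
        root[v0] = v0
        root[v1] = v0
        son[v0].append(v0)
        son[v0].append(v1)
    # Otherwise both endpoints already share a component; nothing to merge.
```

After the first call for an edge (0, 1), vertex 0 is its own root and son[0] holds both endpoints; a second call for the same edge leaves the structures unchanged.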

The fifth step handles the case where the root vertices differ: compress the paths of the parent-child relationships of the vertices and merge the two connected components whose root vertices differ. The method is as follows:

5.1. If root[e_b.v0_low] = -1, go to 5.2; otherwise go to 5.3.

5.3. If root[e_b.v1_low] = -1, go to 5.4; otherwise go to 5.5.

5.4. At this point vertex e_b.v0_low has been visited and vertex e_b.v1_low has not. Insert e_b.v1_low into the connected component son[root[e_b.v0_low]] that belongs to the root vertex of e_b.v0_low, i.e. merge e_b.v1_low into the connected component rooted at root[e_b.v0_low], and change the root vertex of e_b.v1_low to the root vertex of e_b.v0_low, i.e. let root[e_b.v1_low] = root[e_b.v0_low]. The insertion and the root update together complete the merge, and the connected component rooted at root[e_b.v0_low] gains the new vertex information. Go to 3.4.

5.8. Compress the paths of the parent-child relationships of the vertices in the connected component son[root[e_b.v0_low]] rooted at root[e_b.v0_low]: make every element of son[root_v2] a direct child vertex of root[e_b.v0_low], so that no intermediate-level vertex lies between any element of son[root_v2] and the root vertex of e_b.v0_low. The path compression is done as follows: set the root vertex of the i-th element of son[root_v2], i.e. of vertex son[root_v2][i], to root[e_b.v0_low], that is, let root[son[root_v2][i]] = root[e_b.v0_low], and insert vertex son[root_v2][i] into the connected component son[root[e_b.v0_low]].

FIG. 5 is a diagram of the son vector storage of the invention; a son vector stores all child vertices that belong to a root vertex. In FIG. 5, vector son[v₀] has fewer elements than vector son[v₂], so its elements are copied into son[v₂], i.e. the elements of son[v₀] are appended in order to the end of son[v₂]. son[v₀][0], son[v₀][1], son[v₀][2], ... are the IDs of the vertices whose root vertex was v₀ before the merge; after the merge they are copied to son[v₂][m], son[v₂][m+1], son[v₂][m+2], ... respectively (m is the starting index of the newly added elements in son[v₂]).

FIG. 4 is a schematic diagram of the fifth step of the invention, in which, for the case of different root vertices, the paths of the parent-child relationships of the vertices are compressed and two connected components with different root vertices are merged; it corresponds to operations 5.5 to 5.8. It shows that when the edge formed by v₀ and v₂ is visited, the connected component stored in son[v₀] is merged with the connected component stored in son[v₂]. Because son[v₀] contains fewer vertices, its vertices are attached directly under the root vertex of the connected component stored in son[v₂]: vertices son[v₀][0], son[v₀][1], son[v₀][2], ... become child vertices of son[v₂][0] and are copied to son[v₂][m], son[v₂][m+1], son[v₂][m+2], ... respectively, which reduces the depth of the tree.

5.9. Let i = i + 1 and go to 5.7.
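The fifth step as a whole can be sketched in Python as follows. This is a sketch under stated assumptions rather than the reference implementation: the function name handle_different_root is hypothetical, the loop relabels every element of the smaller component (one reading of the loop bounds in 5.6 and 5.7), and clearing the emptied vector afterwards is an added assumption that the text does not state.

```python
def handle_different_root(v0, v1, root, son):
    """Fifth step: the endpoints carry different root values."""
    if root[v0] == -1:
        # v1 is visited, v0 is not: attach v0 to the component of v1.
        root[v0] = root[v1]
        son[root[v1]].append(v0)
        return
    if root[v1] == -1:
        # v0 is visited, v1 is not: attach v1 to the component of v0.
        root[v1] = root[v0]
        son[root[v0]].append(v1)
        return
    # Both endpoints are visited: keep the larger component on the v0 side.
    if len(son[root[v0]]) < len(son[root[v1]]):
        v0, v1 = v1, v0
    root_v0 = root[v0]
    root_v2 = root[v1]
    if root_v0 == root_v2:
        return  # guard: the precondition (different roots) does not hold
    # Make every vertex of the smaller component a direct child of root_v0.
    for vertex in son[root_v2]:
        root[vertex] = root_v0
        son[root_v0].append(vertex)
    son[root_v2].clear()  # assumption: the emptied vector is not kept
```

Moving the smaller vector into the larger one bounds the number of copied elements per merge by the size of the smaller component, and every moved vertex ends up exactly one level below its root.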

The sixth step is to find the information of all connected components in graph G and to partition G by connected component. The graph G partitioned by connected component is distributed to the processing nodes of the supercomputer with a scatter operation.

The method comprises the following steps:

6.1. Set a vertex index j and a connected component index k, and initialize j = 0 and k = 1.

6.2. If j = N-1, all vertices have been traversed and k is the number of connected components in graph G; go to the seventh step. Otherwise go to 6.3.

6.3. Determine whether the root vertex of the vertex v_j with index j is v_j itself. If root[v_j] ≠ v_j, v_j is not the root vertex of a connected component; go to 6.5. Otherwise v_j is the root vertex of the k-th connected component and son[v_j] stores all vertices of the k-th connected component, which yields all vertices of that component. Let k = k + 1 and go to 6.4.

6.4. Distribute the vertices in son[v_j] to the same or nearby physical nodes of the supercomputer with a scatter operation.

6.5. Let j = j + 1 and go to 6.2.
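The traversal of the sixth step can be sketched as follows. This is an illustration only: the function name partition_by_component is hypothetical, the round-robin choice of a node index merely stands in for the scatter operation and the topology-aware placement on a real machine, and the sketch visits all N vertices and counts components from zero, which departs slightly from the literal initial values given in the steps.

```python
def partition_by_component(root, son, n_nodes):
    """Collect every connected component and map it to one node index."""
    placement = {}  # vertex ID -> node index
    k = 0           # number of components found so far
    for v in range(len(root)):
        if root[v] != v:
            continue  # v is not a component root (or was never visited)
        for member in son[v]:
            # assumption: whole components are assigned round-robin to nodes
            placement[member] = k % n_nodes
        k += 1
    return k, placement
```

All vertices of one component receive the same node index, which is the property the partitioning relies on; vertices that no edge touched keep root value -1 and are not placed by this sketch.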

FIG. 6 is a schematic diagram of the graph partitioning by connected component in the sixth step of the invention. Boxes represent storage nodes of the supercomputer system, and CCk (k = 1, 2, 3, ...) in each box denotes a connected component of graph G. The sequence numbers on the connecting lines give the actual access order, and the "..." before and after a number indicates the complexity of the path and the number of network hops to be traversed: the more "..." there are, the longer the communication path and the more hops are required. When the graph is partitioned by connected component, take a vertex v_i and suppose that the connected component containing v_i is CC1 (k = 1). The root vertex of v_i is root[v_i], and all vertices of the connected component containing v_i can be obtained from son[root[v_i]]. The other vertices of CC1 are then placed on nodes close to the node holding v_i, so that they are accessed preferentially.
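The lookup mentioned above, obtaining every vertex of the component that contains v_i from son[root[v_i]], can be sketched as follows. The function name component_of is hypothetical, and treating an unvisited vertex as a component of its own is an assumption of this sketch.

```python
def component_of(v, root, son):
    """Return all vertices of the connected component that contains v."""
    if root[v] == -1:
        return [v]  # assumption: a vertex no edge touched stands alone
    return son[root[v]]
```

Because every vertex stores its root directly, the lookup needs one read of root and one read of son, with no walk up a parent chain.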

The ninth step: end.

Claims

1. A supercomputer benchmark test acceleration method based on connected component generation optimization, characterized by comprising the following steps:

first step, graph generation: generating a random graph structure G = (V, E) with the Kronecker graph generator, wherein V is the vertex set, E is the edge set, and the size of the graph is determined by the user-supplied parameters scale and edgefactor, scale setting the number of vertices and edgefactor being the average number of edges per vertex; N = 2^scale is the number of vertices of G, i.e. the number of elements of V, and M = edgefactor × N is the number of edges of G, i.e. the number of elements of E; v_i denotes the vertex numbered i in G, and the vertex pair (v_i, v_j) denotes the edge from vertex v_i to vertex v_j; (v_i, v_j) ∈ E, i and j are integers, 0 ≤ i ≤ N-1 and 0 ≤ j ≤ N-1;

second step, constructing the adjacency matrix A that stores graph G, wherein A_ij = 0 denotes that there is no edge between vertex v_i and vertex v_j, and A_ij = 1 denotes that there is an edge between vertex v_i and vertex v_j;

third step, initializing the data structures, setting the root vertex and child vertices of every vertex in V to their initial values, traversing the edge set E, removing the self-loop edges in E, i.e. edges that connect a vertex to itself, and classifying each edge according to the state of its two endpoints, the method being:

3.1. according to the data size of graph G, initializing the root vertex vector root and the two-dimensional child vertex vector son for all vertices in V, wherein root contains N elements, root[v_i] denotes the root vertex of vertex v_i, and every root[v_i] is initialized to -1; son is a two-dimensional vector containing N elements, each of which is itself a vector and is initialized to an empty vector; son[v_i] denotes the child vertex vector of vertex v_i and stores the set of vertices whose root vertex is v_i, i.e. the vertex information of the connected component rooted at v_i, a connected component being the content stored in the son vector of a vertex whose root value is itself; initializing a variable e to 1;

3.2. creating a structure packed_edge whose layout matches the edge storage format in the Graph500 source code, wherein packed_edge contains three int variables: the first, v0_low, is the ID of the first vertex of an edge; the second, v1_low, is the ID of the second vertex of the edge, the one connected to v0_low; the third, high, is reserved for functional extension; for an edge e_b, e_b.v0_low denotes its first vertex, whose ID is v0_low, and e_b.v1_low denotes its second vertex, whose ID is v1_low;

3.3. creating an edge e_b of structure type packed_edge for storing the edge information read from E;

3.4. if e > M, the edge set E has been fully processed, go to the sixth step; otherwise, read the e-th edge from the edge set E in order, let e_b be the e-th edge, where e_b.v0_low and e_b.v1_low are the two vertices that form e_b, let e = e + 1, and go to 3.5;

3.5. if e_b.v0_low ≠ e_b.v1_low, the edge connecting e_b.v0_low and e_b.v1_low is not a self-loop edge, go to 3.6; otherwise, the edge is a self-loop edge, go directly to 3.4;

3.6. determining whether e_b.v0_low and e_b.v1_low have the same root vertex: if root[e_b.v0_low] = root[e_b.v1_low], go to the fourth step; if root[e_b.v0_low] ≠ root[e_b.v1_low], go to the fifth step;

fourth step, handling the case where the root vertices are the same: if neither e_b.v0_low nor e_b.v1_low has been visited, merging e_b.v0_low and e_b.v1_low into a connected component son[e_b.v0_low] rooted at e_b.v0_low; if both e_b.v0_low and e_b.v1_low have been visited, they are already in the same connected component, so skipping e_b and visiting the next edge in E, the method being:

4.1. if root[e_b.v0_low] = -1, neither e_b.v0_low nor e_b.v1_low has been visited and the edge they form is being visited for the first time, go to 4.2; otherwise, both e_b.v0_low and e_b.v1_low have been visited and are already in the same connected component son[e_b.v0_low], no merge is needed, go to 3.4;

4.2. merging e_b.v1_low into the connected component rooted at e_b.v0_low by setting e_b.v0_low as the root vertex of both vertices, i.e. letting root[e_b.v0_low] = e_b.v0_low and root[e_b.v1_low] = e_b.v0_low;

4.3. inserting the IDs of e_b.v0_low and e_b.v1_low into the son vector of e_b.v0_low, i.e. adding the new vertex information to the connected component son[e_b.v0_low]: inserting e_b.v0_low into son[e_b.v0_low] and inserting e_b.v1_low into son[e_b.v0_low]; go to 3.4;

fifth step, handling the case where the root vertices differ: compressing the paths of the parent-child relationships of the vertices and merging the two connected components whose root vertices differ, the method being:

5.1. if root[e_b.v0_low] = -1, go to 5.2; otherwise, go to 5.3;

5.2. at this point vertex e_b.v1_low has been visited and vertex e_b.v0_low has not; inserting e_b.v0_low into the connected component son[root[e_b.v1_low]] that belongs to the root vertex of e_b.v1_low, i.e. merging e_b.v0_low into the connected component rooted at root[e_b.v1_low], and changing the root vertex of e_b.v0_low to the root vertex of e_b.v1_low, i.e. letting root[e_b.v0_low] = root[e_b.v1_low]; the insertion and the root update together complete the merge, and the connected component rooted at root[e_b.v1_low] gains the new vertex information; go to 3.4;

5.3. if root[e_b.v1_low] = -1, go to 5.4; otherwise, go to 5.5;

5.4. at this point vertex e_b.v0_low has been visited and vertex e_b.v1_low has not; inserting e_b.v1_low into the connected component son[root[e_b.v0_low]] that belongs to the root vertex of e_b.v0_low, i.e. merging e_b.v1_low into the connected component rooted at root[e_b.v0_low], and changing the root vertex of e_b.v1_low to the root vertex of e_b.v0_low, i.e. letting root[e_b.v1_low] = root[e_b.v0_low]; the insertion and the root update together complete the merge, and the connected component rooted at root[e_b.v0_low] gains the new vertex information; go to 3.4;

5.5. at this point root[e_b.v0_low] ≠ -1 and root[e_b.v1_low] ≠ -1; comparing the numbers of vertices of the connected components of e_b.v0_low and e_b.v1_low, i.e. comparing the numbers of elements of son[root[e_b.v0_low]] and son[root[e_b.v1_low]]; if son[root[e_b.v0_low]].size < son[root[e_b.v1_low]].size, where size denotes the number of elements of a vector, son[root[e_b.v0_low]].size denotes the number of child nodes of the root vertex of e_b.v0_low, and son[root[e_b.v1_low]].size denotes the number of child nodes of the root vertex of e_b.v1_low, exchanging the IDs of e_b.v0_low and e_b.v1_low so that the root vertex of e_b.v0_low has more child nodes than the root vertex of e_b.v1_low; go to 5.6;

5.6. denoting the root vertex of e_b.v1_low by root_v2, i.e. letting root_v2 = root[e_b.v1_low]; setting a loop variable i and initializing i = 1;

5.7. if i = son[root_v2].size, go to 3.4; otherwise, go to 5.8;

5.8. compressing the paths of the parent-child relationships of the vertices in the connected component son[root[e_b.v0_low]] rooted at root[e_b.v0_low]: making every element of son[root_v2] a direct child vertex of root[e_b.v0_low], so that no intermediate-level vertex lies between any element of son[root_v2] and the root vertex of e_b.v0_low; the path compression being done as follows: setting the root vertex of the i-th element of son[root_v2], i.e. of vertex son[root_v2][i], to root[e_b.v0_low], that is, letting root[son[root_v2][i]] = root[e_b.v0_low], and inserting vertex son[root_v2][i] into the connected component son[root[e_b.v0_low]];

5.9. let i = i + 1, go to 5.7;

sixth step, finding the information of all connected components in graph G, partitioning G by connected component, and distributing the graph G partitioned by connected component to the processing nodes of the supercomputer with a scatter operation;

seventh step, BFS search and verification: randomly generating a root vertex v; with v as the source, performing a BFS search on the graph G partitioned by connected component, using the edge information stored in the adjacency matrix A, and outputting a spanning tree as the search result; recording the Graph500 effective timing t; verifying whether the BFS spanning tree obtained by the search matches the original graph information; this process is repeated 64 times, and the BFS search part of each run is timed;

eighth step, computing the evaluation value of the graph test performance, i.e. the average of the performance values measured over the BFS traversals of the 64 spanning trees, to obtain and output the test result;

ninth step, end.

2. The supercomputer benchmark test acceleration method based on connected component generation optimization according to claim 1, characterized in that, in the sixth step, the method for finding the information of all connected components in graph G, partitioning G by connected component, and distributing the graph G partitioned by connected component to the processing nodes of the supercomputer with a scatter operation is:

6.1. setting a vertex index j and a connected component index k, and initializing j = 0 and k = 1;

6.2. if j = N-1, all vertices have been traversed and k is the number of connected components in graph G, go to the seventh step; otherwise, go to 6.3;

6.3. determining whether the root vertex of the vertex v_j with index j is v_j itself: if root[v_j] ≠ v_j, v_j is not the root vertex of a connected component, go to 6.5; otherwise, v_j is the root vertex of the k-th connected component and son[v_j] stores all vertices of the k-th connected component, which yields all vertices of that component; let k = k + 1; go to 6.4;

6.4. distributing the vertices in son[v_j] to the same or nearby physical nodes of the supercomputer with a scatter operation;

6.5. let j = j + 1, go to 6.2.