CN112883241B - Supercomputer benchmark test acceleration method based on connected component generation optimization - Google Patents
- Publication number
- CN112883241B (application number CN202110293568.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/9024: Graphs; Linked lists
- G06F16/9027: Trees
- G06F16/90348: Query processing by searching ordered data, e.g. alpha-numerically ordered data
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a supercomputer benchmark test acceleration method based on connected component generation optimization, which aims to minimize communication paths, maximize memory-access bandwidth utilization, and accelerate the supercomputer big-data benchmark test. The technical solution exploits the fact that the graph generated by Graph500 contains multiple connected components: the connected components are found quickly and stored in a two-dimensional vector, the parent-child relationships of the vertices in each connected component are path-compressed, two connected components with different root vertices are merged, and the vertices of the same connected component are assigned to physical nodes of the supercomputer with short communication paths, so that traversing the graph incurs little communication cost and runs fast. The invention stores all connected components of the graph efficiently, maximizes merging speed, accelerates root-vertex queries, reduces the stack's memory footprint, and speeds up testing of the supercomputer's big-data processing capability.
Description
Technical Field
The invention relates to a method for accelerating the big-data benchmark test of a supercomputer, and in particular to a benchmark acceleration method based on connected-component generation optimized with a two-dimensional vector and path compression.
Background
A graph is a common data structure that can abstract and express many kinds of complex associations among real-world entities; social networks and the World Wide Web, for example, can both be represented as graphs. Graph computing processes and computes on graph data and plays an important role in many real-life scenarios. In recent years the scale of graph data has kept growing: according to published reports, in the third quarter of 2020 Facebook had 1.82 billion daily active users and Tencent's WeChat had 1.21 billion active users. When these users and the relationships among them are abstracted into the vertices and edges of a graph, the number of vertices reaches billions and the number of edges reaches hundreds of billions, which places higher demands on data storage and computing power. Supercomputers have mainly been used for numerical computation; in the big-data era, with data-intensive applications emerging widely, Graph500 is an important benchmark program for testing a supercomputer's data processing capability. Graph500 measures the big-data processing capability of a supercomputer by the number of traversed edges per second (TEPS) during graph traversal.
The Graph500 benchmark program consists of four steps: graph generation, graph construction, BFS search and verification, and result output, as shown in FIG. 1.
(1) Graph generation: a random graph G(V, E) is generated by a Kronecker graph generator, where V is the vertex set and E is the edge set. The size of the graph is determined by two user-supplied parameters, scale and edgefactor: scale determines the number of vertices, and edgefactor is the average number of edges per vertex. $N = 2^{\mathit{scale}}$ is the number of vertices of the input graph, i.e., the number of elements of V, and $M = \mathit{edgefactor} \times N$ is the number of edges of the input graph, i.e., the number of elements of E. The vertex numbered i is commonly denoted $v_i$, and the vertex pair $(v_i, v_j)$ denotes the edge from vertex $v_i$ to vertex $v_j$, where $(v_i, v_j) \in E$, $0 \le i \le N-1$ and $0 \le j \le N-1$.
(2) Graph construction: the vertex and edge information generated in the first step is converted into any data structure that represents the graph; in the standard Graph500, the graph information is stored using the adjacency matrix of the graph.
(3) BFS (Breadth-First Search) search and verification: a root vertex is generated at random, a BFS is run over the whole graph with this root as the source, the resulting spanning tree is output, the Graph500 effective timing time t is recorded, and the BFS spanning tree is checked against the original graph. This process is repeated 64 times, and each BFS search is timed separately.
(4) Result output: Graph500 measures program performance in traversed edges per second (TEPS), defined as the number of edges M of the generated graph divided by the BFS search time t. A TEPS value M/t is computed for each of the 64 traversals, and the average of the 64 TEPS values is used as the basis for the final Graph500 result and ranking.
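The TEPS computation in step (4) is simple arithmetic. The Python sketch below is illustrative only (the function names are hypothetical): it computes one TEPS value per BFS run and then aggregates them. The text above specifies the average; the Graph500 reference implementation also reports a harmonic mean, so both are shown.

```python
from statistics import harmonic_mean, mean


def teps_values(num_edges, bfs_times):
    """One TEPS value per BFS run: M / t (edges of the generated graph over BFS time)."""
    return [num_edges / t for t in bfs_times]


def summarize_teps(num_edges, bfs_times):
    """Aggregate the per-run TEPS values (64 runs in a full Graph500 test)."""
    teps = teps_values(num_edges, bfs_times)
    return {
        "mean_teps": mean(teps),                    # average, as described in the text
        "harmonic_mean_teps": harmonic_mean(teps),  # also reported by the reference code
    }


# Example with two runs on a graph of M = 100 edges.
print(summarize_teps(100, [1.0, 2.0]))
```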
In summary, BFS search in Graph500 must traverse the vertices of the graph in turn, and these vertices are distributed across the many physical nodes of the supercomputer, so memory-access bandwidth is a key factor in Graph500 performance. Moreover, Graph500 uses only the BFS search time as the measured time; the preparation of the graph data before BFS is not restricted.
Graph500 is mainly used to benchmark the big-data computing capability of supercomputer systems. A supercomputer system generally consists of compute nodes, storage nodes, and an interconnection network that connects the compute and storage nodes through switches. Messages between nodes are forwarded by the switches and reach the target node through the interconnection network; the number of switches a message passes through between 2 communicating nodes is called the hop count of the path between them, and the larger the hop count, the longer the path. Normally the storage space of the storage nodes is mapped evenly onto the compute nodes, i.e., several compute nodes may share one storage node, but when a compute node accesses storage space that is not mapped to it (i.e., not its own corresponding storage space), inter-node communication is required. Graph500 test performance is limited mainly by memory size and memory bandwidth: the higher the bandwidth, the better the performance. Therefore, if 2 compute nodes that need to interact share storage space, the communication path is shortened and memory access becomes faster.
The patent application "Big data processing capability test method based on vertex reordering and priority caching" (application number 202010748396.3) exploits the fact that, during BFS, vertices with high betweenness have a high probability of sharing an edge with the root node, and proposes a test method based on vertex reordering and priority caching that reduces the number of memory accesses; however, that method does not consider the connectivity of connected components in a large graph.
A connected component of an undirected graph, also called a maximal connected subgraph, is a maximal subgraph in which every pair of vertices is joined by a path. Connected-component computation is often an important step in large-scale graph processing. A disconnected graph can be decomposed into several connected components; each connected component has at least 1 spanning tree (a tree containing all vertices of the component), and the set of spanning trees of all components forms a spanning forest. Suppose that, when vertices are partitioned, the vertices of the same connected component are assigned to physical nodes with shorter communication paths; that is, during graph construction in the Graph500 benchmark (the second step), vertices joined by edges are placed as far as possible within the routing range of switches at the same level of the system's network topology. If BFS is then run on this basis, the bandwidth utilization of the supercomputer system can be improved, load balance achieved, communication overhead reduced, and spanning-tree generation accelerated.
Many researchers have optimized connected-component algorithms on single processors and on multiprocessors, but so far these algorithms have been applied only in scenarios such as reachability queries and consistency detection; no published literature applies connected-component algorithms to accelerating the big-data benchmark test of a supercomputer.
The graph generated by Graph500 contains multiple connected components, so optimizing the connected-component algorithm can accelerate the supercomputer big-data benchmark test. The union-find (disjoint-set) algorithm is one way to compute connected components; the traditional union-find algorithm consists of the following steps:
First, find the root vertices of the two endpoints of the currently visited edge (find). As shown in FIG. 2(a), the find operation locates a vertex's root by tracing its parent vertices up the tree.
Second, if the root vertices differ, merge the sets containing the two vertices (union). As shown in FIG. 2(b), the union operation merges 2 trees with different root vertices into 1 tree.
The union-find algorithm represents each set by 1 tree. Each vertex has a parent vertex, and parents are followed repeatedly until the root of the tree is reached; this root is the root vertex of every vertex in the tree. In the conventional union-find algorithm, a union changes the parent of only some vertices, so the find stage calls the parent-finding function many times (including recursive calls), and in large-scale graph computation the cost of this recursion can be very high. Therefore, the plain union-find algorithm cannot be used directly to accelerate the supercomputer big-data benchmark test.
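A minimal Python sketch of the traditional union-find described above, with a recursive find and a union that re-parents only the old root (class and method names are illustrative, not from the patent). The demo builds a chain so that a single find must recurse through every level, which is the cost the text points out.

```python
class NaiveUnionFind:
    """Traditional union-find: recursive find, no path compression."""

    def __init__(self, n):
        self.parent = list(range(n))  # every vertex starts as the root of its own tree

    def find(self, x):
        # Trace parent vertices up the tree; one recursive call per level.
        if self.parent[x] == x:
            return x
        return self.find(self.parent[x])

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False          # already in the same tree
        self.parent[rb] = ra      # only the old root's parent changes
        return True

    def depth(self, x):
        """Number of parent links between x and its root (iterative, for inspection)."""
        steps = 0
        while self.parent[x] != x:
            x = self.parent[x]
            steps += 1
        return steps


uf = NaiveUnionFind(5)
for i in range(4):
    uf.union(i + 1, i)   # each union hangs the existing tree below a new root
print(uf.find(0), uf.depth(0))  # vertex 0 ends up at the bottom of a chain
```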
Path-compression optimizations have also been proposed for union-find. For example, the 1985 article "On the expected performance of path compression algorithms" in the SIAM Journal on Computing (SICOMP) analyzes the complexity of path compression, which avoids repeated calls to the parent-finding function during find and reduces recursion. However, existing path-compression optimizations do not consider how closely they are coupled to the underlying data structure, so their optimization effect is limited. Therefore, the path-compressed union-find algorithm also cannot be used directly to accelerate the supercomputer big-data benchmark test.
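Path compression can be sketched in Python as follows. This is an iterative two-pass variant, one common textbook form, and not the scheme of this invention (the function name is illustrative). After one find, every vertex on the traversed path points directly at the root, so later finds take a single step.

```python
def find_with_compression(parent, x):
    """Return the root of x and re-point every vertex on the path at that root."""
    # First pass: walk up to the root.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: make each visited vertex a direct child of the root.
    while parent[x] != root:
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root


parent = [1, 2, 3, 4, 4]                 # chain 0 -> 1 -> 2 -> 3 -> 4
print(find_with_compression(parent, 0))  # root of vertex 0
print(parent)                            # every vertex now points at the root
```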
Therefore, how to minimize communication paths, maximize memory-access bandwidth utilization, and accelerate the supercomputer big-data benchmark test remains an urgent technical problem for those skilled in the art.
Disclosure of Invention
The technical problem solved by the invention is to minimize communication paths, maximize memory-access bandwidth utilization, and accelerate the supercomputer big-data benchmark test.
The technical solution exploits the fact that the graph generated by Graph500 contains multiple connected components: the connected components are found quickly and stored in a two-dimensional vector, the parent-child relationships of the vertices in each component are path-compressed, two connected components with different root vertices are merged, and the vertices of the same connected component are assigned to physical nodes of the supercomputer with short communication paths. This reduces communication overhead during graph traversal, increases computation speed, and speeds up testing of the supercomputer's big-data processing capability.
The specific technical scheme is as follows:
First step: graph generation. A random graph G(V, E) is generated by a Kronecker graph generator, where V is the vertex set and E is the edge set. The size of the graph is determined by two user-supplied parameters, scale and edgefactor: scale determines the number of vertices, and edgefactor is the average number of edges per vertex. $N = 2^{\mathit{scale}}$ is the number of vertices of G, i.e., the number of vertices in V, and $M = \mathit{edgefactor} \times N$ is the number of edges of G, i.e., the number of elements of E. $v_i$ denotes the vertex numbered i in G, and the vertex pair $(v_i, v_j)$ denotes the edge from vertex $v_i$ to vertex $v_j$, where $(v_i, v_j) \in E$, i and j are non-negative integers, $0 \le i \le N-1$ and $0 \le j \le N-1$.
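The Kronecker generator can be approximated by R-MAT-style recursive quadrant sampling. The Python sketch below is an illustration, not the Graph500 reference generator: it assumes the Graph500 initiator probabilities A = 0.57, B = 0.19, C = 0.19 (D = 0.05) and omits the vertex-label and edge-order permutation performed by the reference code. The function name kronecker_edges is hypothetical.

```python
import random


def kronecker_edges(scale, edgefactor, a=0.57, b=0.19, c=0.19, seed=None):
    """Generate M = edgefactor * 2**scale edges on N = 2**scale vertices."""
    rng = random.Random(seed)
    n = 1 << scale               # N = 2^scale vertices
    m = edgefactor * n           # M = edgefactor * N edges
    edges = []
    for _ in range(m):
        u = v = 0
        # Pick one quadrant of the adjacency matrix per level; each level fixes one bit.
        for _ in range(scale):
            r = rng.random()
            if r < a:
                bu, bv = 0, 0
            elif r < a + b:
                bu, bv = 0, 1
            elif r < a + b + c:
                bu, bv = 1, 0
            else:
                bu, bv = 1, 1    # remaining probability d = 1 - a - b - c
            u = (u << 1) | bu
            v = (v << 1) | bv
        edges.append((u, v))
    return n, edges


n, edges = kronecker_edges(scale=4, edgefactor=16, seed=42)
print(n, len(edges), edges[:3])
```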
Second step: construct an adjacency matrix A to store the graph G. $A_{ij} = 0$ means there is no edge between vertex $v_i$ and vertex $v_j$; $A_{ij} = 1$ means there is an edge between vertex $v_i$ and vertex $v_j$.
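A dense adjacency matrix as described in the second step can be sketched in Python as below (illustrative; the function name is hypothetical). Edges are stored symmetrically because connected components are defined on the undirected graph. A dense N × N matrix needs O(N²) memory, so this form is only practical for small scale values.

```python
def build_adjacency_matrix(n, edges):
    """A[i][j] = 1 if there is an edge between v_i and v_j, else 0."""
    A = [[0] * n for _ in range(n)]   # separate row lists, not n references to one row
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1                    # undirected: store both directions
    return A


A = build_adjacency_matrix(4, [(0, 1), (1, 2)])
for row in A:
    print(row)
```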
Third step: initialize the data structures, setting the root and child vertices of all vertices in V to their initial values; then traverse the edge set E, skipping self-loop edges (an edge that connects a vertex to itself) to eliminate their interference, and classify each edge according to the state of its two vertices for subsequent processing. The steps are as follows:
3.1. According to the size of G, initialize the root-vertex vector root and the two-dimensional child-vertex vector son for all vertices in V. root contains N elements; root[v_i] denotes the root vertex of v_i and is initialized to -1. son is a two-dimensional vector of N elements, each of which is itself a vector, and each element of son is initialized to an empty vector. son[v_i], the child-vertex vector of v_i, stores the set of vertices whose root vertex is v_i, i.e., the vertices of the connected component rooted at v_i; a connected component is thus the content of the son vector of a vertex whose root value is the vertex itself. Initialize the edge counter e to 1.
3.2. Create a structure packed_edge consistent with the edge storage format in the Graph500 source code. packed_edge contains three int variables: the first, v0_low, is the ID of the first vertex of an edge; the second, v1_low, is the ID of the second vertex of the edge, which is connected to v0_low; the third, high, is reserved for future extension and is not directly related to this acceleration method. For brevity, for an edge e_b, e_b.v0_low denotes its first vertex (with ID v0_low) and e_b.v1_low denotes its second vertex (with ID v1_low).
3.3. Create an edge e_b of type packed_edge to hold the edge information read from E.
3.4. If e > M, the edge set E has been fully processed; go to the sixth step. Otherwise, read the e-th edge of E in order and let e_b be this edge, where e_b.v0_low and e_b.v1_low are its two vertices; let e = e + 1 and go to 3.5.
3.5. If e_b.v0_low ≠ e_b.v1_low, the edge connecting e_b.v0_low and e_b.v1_low is not a self-loop; go to 3.6. Otherwise it is a self-loop; go directly to 3.4.
3.6. Determine whether e_b.v0_low and e_b.v1_low have the same root vertex: if root[e_b.v0_low] = root[e_b.v1_low], go to the fourth step; if root[e_b.v0_low] ≠ root[e_b.v1_low], go to the fifth step. Looking up the root vertices of e_b.v0_low and e_b.v1_low in this step corresponds to the find stage of the union-find algorithm.
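Steps 3.1 to 3.6 can be sketched in Python as follows. The dataclass PackedEdge mirrors the three fields of packed_edge from 3.2, and the helper names init_structures and scan_edges are hypothetical. Because scan_edges is a generator, each root comparison of 3.6 is made only when the next edge is requested, so updates made by the fourth and fifth steps between edges are seen.

```python
from dataclasses import dataclass


@dataclass
class PackedEdge:
    v0_low: int    # ID of the first vertex of the edge
    v1_low: int    # ID of the second vertex of the edge
    high: int = 0  # reserved field; unused by the method (step 3.2)


def init_structures(n):
    """Step 3.1: root[v] = -1 (not yet visited); son[v] = empty vector."""
    root = [-1] * n
    son = [[] for _ in range(n)]
    return root, son


def scan_edges(edges, root):
    """Steps 3.4-3.6: read edges in order, skip self-loops, classify by root."""
    for e_b in edges:                     # 3.4: the e-th edge of E
        if e_b.v0_low == e_b.v1_low:      # 3.5: self-loop, back to 3.4
            continue
        # 3.6: the "find" is a single lookup in root, no parent chasing
        same_root = root[e_b.v0_low] == root[e_b.v1_low]
        yield e_b, same_root              # True -> fourth step, False -> fifth step


root, son = init_structures(4)
edges = [PackedEdge(0, 0), PackedEdge(0, 1), PackedEdge(2, 3)]
for e_b, same_root in scan_edges(edges, root):
    print(e_b, "fourth step" if same_root else "fifth step")
```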
Fourth step: handle the case where the root vertices are the same. If neither e_b.v0_low nor e_b.v1_low has been visited, merge both into the connected component son[e_b.v0_low] rooted at e_b.v0_low; if both have been visited, e_b.v0_low and e_b.v1_low are already in the same connected component, so skip e_b and visit the next edge in E. The steps are as follows:
4.1. If root[e_b.v0_low] = -1, neither e_b.v0_low nor e_b.v1_low has been visited and this is the first visit to the edge they form; go to 4.2. Otherwise both have been visited and are already in the same connected component son[root[e_b.v0_low]] (i.e., they are elements of the same vector in son), so no merge is needed; go to 3.4.
4.2. Merge e_b.v1_low into the connected component rooted at e_b.v0_low by making e_b.v0_low the root vertex of both vertices, i.e., set the corresponding elements of root to e_b.v0_low: root[e_b.v0_low] = e_b.v0_low and root[e_b.v1_low] = e_b.v0_low.
4.3. Insert the IDs of e_b.v0_low and e_b.v1_low into the son vector of e_b.v0_low, i.e., add the new vertex information to the connected component son[e_b.v0_low]: insert e_b.v0_low into son[e_b.v0_low], then insert e_b.v1_low into son[e_b.v0_low]. Steps 4.2 and 4.3 modify the root and son vectors and correspond to the union stage of the union-find algorithm. Go to 3.4.
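A Python sketch of the fourth step (the function name is hypothetical), operating on the root and son lists initialized in 3.1. It is meant to be called only when root[v0] equals root[v1].

```python
def handle_same_root(root, son, v0, v1):
    """Fourth step for an edge (v0, v1) with root[v0] == root[v1].

    Returns True if a new component was created, False if the edge was skipped.
    """
    if root[v0] != -1:
        # 4.1: both vertices already belong to the same component; skip the edge.
        return False
    # 4.2: v0 becomes the root vertex of both endpoints.
    root[v0] = v0
    root[v1] = v0
    # 4.3: record both vertices in the component stored at son[v0].
    son[v0].append(v0)
    son[v0].append(v1)
    return True


root = [-1] * 4
son = [[] for _ in range(4)]
print(handle_same_root(root, son, 0, 1), root, son[0])
print(handle_same_root(root, son, 1, 0))  # second visit: nothing to merge
```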
Fifth step: path-compress the parent-child relationships of vertices according to the state of their root vertices, and merge two connected components with different root vertices. As the edges are traversed, the vertex information of the connected components stored in son is progressively completed. In the end, each connected component has exactly one vertex whose root value is itself; this vertex is the root vertex of the component, and the son vector of this vertex stores all vertices of the component. The steps are as follows:
5.1. If root[e_b.v0_low] = -1, go to 5.2; otherwise go to 5.3.
5.2. Here root[e_b.v0_low] = -1 and root[e_b.v1_low] ≠ -1, which means vertex e_b.v1_low has been visited and vertex e_b.v0_low has not. Insert e_b.v0_low into the connected component son[root[e_b.v1_low]], i.e., merge e_b.v0_low into the connected component whose root vertex is root[e_b.v1_low] (the root vertex of e_b.v1_low), and set the root vertex of e_b.v0_low to the root vertex of e_b.v1_low, i.e., root[e_b.v0_low] = root[e_b.v1_low]. Go to 3.4.
5.3. If root[e_b.v1_low] = -1, go to 5.4; otherwise go to 5.5.
5.4. Here root[e_b.v1_low] = -1 and root[e_b.v0_low] ≠ -1, which means vertex e_b.v0_low has been visited and vertex e_b.v1_low has not. Insert e_b.v1_low into the connected component son[root[e_b.v0_low]], i.e., merge e_b.v1_low into the connected component whose root vertex is root[e_b.v0_low] (the root vertex of e_b.v0_low), and set the root vertex of e_b.v1_low to the root vertex of e_b.v0_low, i.e., root[e_b.v1_low] = root[e_b.v0_low]. Go to 3.4.
5.5. Compare the numbers of vertices in the connected components of e_b.v0_low and e_b.v1_low, i.e., the numbers of elements of son[root[e_b.v0_low]] and son[root[e_b.v1_low]]. If son[root[e_b.v0_low]].size < son[root[e_b.v1_low]].size (size denotes the number of elements of a vector, so these are the numbers of vertices in the components rooted at the root vertex of e_b.v0_low and at the root vertex of e_b.v1_low, respectively), swap the IDs of the two vertices so that son[root[e_b.v0_low]] has at least as many elements as son[root[e_b.v1_low]]. This ensures that, when the connected component rooted at root[e_b.v0_low] and the one rooted at root[e_b.v1_low] are merged, the component with fewer vertices is merged into the other. Go to 5.6.
5.6. Let root_v2 be the root vertex of e_b.v1_low, i.e., root_v2 = root[e_b.v1_low]; set a loop variable i and initialize it to 0.
5.7. If i = son[root_v2].size, all vertices of the connected component stored in son[root_v2] have been traversed and inserted into the component being merged into, so the loop ends; go to 3.4. Otherwise the traversal is not complete; go to 5.8.
5.8. Path-compress the parent-child relationships of the vertices in the connected component son[root[e_b.v0_low]] rooted at root[e_b.v0_low]: make each element of son[root_v2] a direct child of root[e_b.v0_low], so that no intermediate vertex lies between any element of son[root_v2] and the root vertex of e_b.v0_low. Path compression is done as follows: set the root vertex of the i-th element of son[root_v2], i.e., of vertex son[root_v2][i], to root[e_b.v0_low], that is, root[son[root_v2][i]] = root[e_b.v0_low], and insert vertex son[root_v2][i] into the connected component son[root[e_b.v0_low]].
5.9. Let i = i + 1 and go to 5.7.
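The fifth step, together with a compact driver for the third and fourth steps, can be sketched in Python as below (names are hypothetical; edges are plain (v0, v1) tuples instead of packed_edge). The loop over son[root_v2] starts at index 0, so the old root itself, which is always the first element of its own son vector, is also re-rooted. As in the text, the stale vector son[root_v2] is left in place; it is never read again because root[root_v2] no longer equals root_v2.

```python
def handle_different_roots(root, son, v0, v1):
    """Fifth step for an edge (v0, v1) with root[v0] != root[v1]."""
    if root[v0] == -1:
        # 5.2: v0 unvisited, so it joins the component of v1.
        r = root[v1]
        son[r].append(v0)
        root[v0] = r
        return
    if root[v1] == -1:
        # 5.4: v1 unvisited, so it joins the component of v0.
        r = root[v0]
        son[r].append(v1)
        root[v1] = r
        return
    # 5.5: swap so that v0 lies in the component with at least as many vertices.
    if len(son[root[v0]]) < len(son[root[v1]]):
        v0, v1 = v1, v0
    new_root = root[v0]
    root_v2 = root[v1]                      # 5.6
    # 5.7-5.9: path compression; every vertex of the smaller component
    # becomes a direct child of new_root and joins son[new_root].
    for w in son[root_v2]:
        root[w] = new_root
        son[new_root].append(w)


def connected_components(n, edges):
    """Third to fifth steps: build root and son for all edges of the graph."""
    root = [-1] * n
    son = [[] for _ in range(n)]
    for v0, v1 in edges:
        if v0 == v1:                        # 3.5: skip self-loops
            continue
        if root[v0] == root[v1]:            # 3.6 -> fourth step
            if root[v0] == -1:              # 4.1-4.3: start a new component at v0
                root[v0] = v0
                root[v1] = v0
                son[v0] += [v0, v1]
        else:                               # 3.6 -> fifth step
            handle_different_roots(root, son, v0, v1)
    return root, son


root, son = connected_components(8, [(0, 1), (2, 3), (1, 2), (4, 4), (5, 6), (7, 5)])
print(root)
print([son[v] for v in range(8) if root[v] == v])
```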
Sixth step: having found all connected components of G, partition G by connected component, and distribute the partitioned graph to the processing nodes of the supercomputer with the scatter operation (a standard collective operation of MPI, the distributed parallel programming standard). The steps are as follows:
6.1. Set a vertex index j and a connected-component counter k, and initialize j = 0 and k = 1.
6.2. If j = N, all vertices have been traversed and k - 1 is the number of connected components in G; go to the seventh step. Otherwise go to 6.3.
6.3. Check whether the root vertex of vertex v_j (index j) is v_j itself. If root[v_j] ≠ v_j, v_j is not the root vertex of a connected component; go to 6.5. Otherwise v_j is the root vertex of the k-th connected component, and son[v_j] stores all vertices of the k-th connected component, which yields all of its vertices. Let k = k + 1 and go to 6.4.
6.4. Distribute the vertices in son[v_j] to the same or nearby physical nodes of the supercomputer using the scatter operation.
6.5. Let j = j + 1 and go to 6.2.
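The component extraction of 6.1 to 6.5 and a placement of whole components onto nodes can be sketched in Python as follows. The text does not specify how components are mapped to nodes, so the largest-first, least-loaded policy here and the function name are hypothetical; in a real run each bucket would then be sent to its MPI rank with a scatter (for example MPI_Scatterv in C, or comm.scatter in mpi4py). Vertices whose root is still -1 (isolated vertices, or vertices with only self-loops) belong to no component and are not placed by this sketch.

```python
def partition_by_components(root, son, num_nodes):
    """Return (number of components, list of vertex buckets, one per node)."""
    components = []
    for j in range(len(root)):          # 6.2-6.5: visit every vertex v_0 .. v_{N-1}
        if root[j] == j:                # 6.3: v_j is the root of a component
            components.append(son[j])   # son[v_j] holds all vertices of that component
    buckets = [[] for _ in range(num_nodes)]
    # Hypothetical placement: keep each component whole, largest first,
    # on the node that currently holds the fewest vertices.
    for comp in sorted(components, key=len, reverse=True):
        target = min(range(num_nodes), key=lambda p: len(buckets[p]))
        buckets[target].extend(comp)
    return len(components), buckets


root = [0, 0, 0, 0, -1, 5, 5, 5]
son = [[0, 1, 2, 3], [], [], [], [], [5, 6, 7], [], []]
print(partition_by_components(root, son, num_nodes=2))
```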
Seventh step: BFS search and verification. Randomly generate a root vertex v and, using the edge information stored in the adjacency matrix A constructed in the second step, run a BFS with v as the source over the graph G partitioned by connected component in the sixth step. Output the resulting spanning tree, record the Graph500 effective timing time t, and verify whether the BFS spanning tree matches the original graph. This process is repeated 64 times, and each BFS search is timed separately.
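A minimal Python sketch of one BFS from the seventh step, reading edges from the adjacency matrix A as the text specifies (function names are hypothetical). It returns a parent array in which the source is its own parent. The check shown is only a small part of verification; the Graph500 validation performs further checks, for example on BFS levels.

```python
from collections import deque


def bfs_parents(A, source):
    """BFS over adjacency matrix A; parent[v] = -1 means v was not reached."""
    n = len(A)
    parent = [-1] * n
    parent[source] = source            # the source is its own parent
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):              # scan row u of A for neighbours
            if A[u][v] == 1 and parent[v] == -1:
                parent[v] = u
                queue.append(v)
    return parent


def tree_edges_exist(A, parent, source):
    """Partial check: every tree edge (parent[v], v) must be an edge of A."""
    return all(
        v == source or p == -1 or A[p][v] == 1
        for v, p in enumerate(parent)
    )


A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
parents = bfs_parents(A, 0)
print(parents, tree_edges_exist(A, parents, 0))
```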
Eighth step: compute the evaluation value of the graph test performance, i.e., the average of the TEPS values of the 64 BFS spanning-tree traversals, and output the test result. The higher the TEPS value, the stronger the supercomputer's large-scale graph processing capability, the higher its Graph500 ranking, and the better suited it is to big-data processing.
Ninth step: end.
The invention can achieve the following technical effects:
1. In the third step, the two-dimensional vector data structure son is built to store the connected components, so all connected components of G can be stored efficiently, the merging of child vertices during the merge operation of the fifth step is as fast as possible, and visiting and traversing child vertices is more efficient.
2. The find operation in step 3.6 is implemented with the root vector: there is no need to call a parent-finding function repeatedly, because the root vertex is obtained by directly reading an element of root. Combined with the path compression performed during merging in the fifth step, this speeds up root-vertex queries.
3. In the fifth step, the parent-child relationships of the vertices in the connected components are path-compressed: when two connected components with different root vertices are merged, not only is the root of the root vertex of the component with fewer vertices changed, but the roots of all its child vertices are also changed to the new root vertex. This reduces the call depth of the find operation and the stack's memory footprint, and increases speed.
4. In the sixth step, the path-compressed connected components are used to partition the vertices of the graph among the physical nodes of the supercomputer, and the vertices of the same connected component are assigned to physical nodes with shorter communication paths, which shortens communication paths and speeds up testing. Moreover, only the assignment of vertices to physical nodes is changed, and the BFS in the seventh step still obtains the edge information from the adjacency matrix A constructed in the second step, so the accuracy of the Graph500 test is preserved.
Drawings
Fig. 1 is a flowchart of the Graph500 benchmark program described in the background.
Fig. 2 shows the main steps of the prior-art union-find algorithm: FIG. 2(a) is the find operation and FIG. 2(b) is the union operation.
FIG. 3 is an overall flow chart of the present invention.
Fig. 4 is a schematic diagram of path compression in the fifth step of the invention.
FIG. 5 is a schematic diagram of the storage of the two-dimensional vector son.
FIG. 6 is a schematic diagram of graph partitioning by connected components in the sixth step of the invention.
Detailed Description of the Embodiments:
The invention is further described below with reference to the accompanying drawings. As shown in Fig. 3, the method comprises the following steps:
First step, graph generation. A random graph G(V, E) is generated by a Kronecker graph generator, where V is the vertex set and E is the edge set. The size of the graph is determined by the user-supplied parameters scale and edgefactor: scale sets the vertex scale of the graph, and edgefactor is the average number of edges per vertex. $N = 2^{\text{scale}}$ is the number of vertices of G, i.e., the number of elements of V, and $M = \text{edgefactor} \times N$ is the number of edges of G, i.e., the number of elements of E. $v_i$ denotes the vertex numbered i in G, and the vertex pair $(v_i, v_j)$ denotes the edge from vertex $v_i$ to vertex $v_j$, with $(v_i, v_j) \in E$, where i and j are integers satisfying $0 \le i \le N-1$ and $0 \le j \le N-1$.
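The relationship between scale, edgefactor, N and M can be illustrated with a small R-MAT-style sampler, a common simplified form of a Kronecker generator. This is a sketch under stated assumptions, not the Graph500 reference generator: the function name kronecker_edges is hypothetical, the quadrant probabilities a = 0.57, b = 0.19, c = 0.19 are the values commonly quoted for Graph500, and the vertex relabeling and noise used by production generators are omitted.

```python
import random


def kronecker_edges(scale, edgefactor, seed=1, a=0.57, b=0.19, c=0.19):
    """Sample M = edgefactor * 2**scale edges by recursively choosing one of
    four adjacency-matrix quadrants per bit (R-MAT style, simplified)."""
    n = 2 ** scale                # N = 2^scale vertices
    m = edgefactor * n            # M = edgefactor * N edges
    rng = random.Random(seed)     # seeded so the output is reproducible
    edges = []
    for _ in range(m):
        i = j = 0
        for _ in range(scale):    # one bit of i and one bit of j per level
            r = rng.random()
            if r < a:             # top-left quadrant
                bi, bj = 0, 0
            elif r < a + b:       # top-right quadrant
                bi, bj = 0, 1
            elif r < a + b + c:   # bottom-left quadrant
                bi, bj = 1, 0
            else:                 # bottom-right quadrant, probability 1 - a - b - c
                bi, bj = 1, 1
            i = (i << 1) | bi
            j = (j << 1) | bj
        edges.append((i, j))      # edge (v_i, v_j)
    return n, m, edges


n, m, edges = kronecker_edges(scale=4, edgefactor=16)
print(n, m, len(edges))  # 16 256 256
```

Because every vertex ID is built from exactly scale bits, all sampled endpoints fall in the range 0 to N-1 required by the first step.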
Second step, construct an adjacency matrix A to store graph G. $A_{ij} = 0$ means there is no edge between vertex $v_i$ and vertex $v_j$; $A_{ij} = 1$ means there is an edge between them.
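A minimal sketch of the second step, assuming the graph is treated as undirected so that A is symmetric (the text does not state this explicitly). A dense N × N matrix is used only because it mirrors the description; it needs O(N^2) memory and is practical only for small graphs. The function name build_adjacency is hypothetical.

```python
def build_adjacency(n, edges):
    """Return an n x n 0/1 matrix with A[i][j] = 1 iff an edge joins v_i and v_j."""
    A = [[0] * n for _ in range(n)]   # one independent row per vertex
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1                   # assumption: undirected graph, symmetric A
    return A


A = build_adjacency(4, [(0, 1), (2, 3), (3, 3)])
print(A[1][0], A[0][2], A[3][3])  # 1 0 1
```

The self-loop (3, 3) is still recorded in A here; self-loops are only discarded later, when the edge list is scanned in the third step.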
Third step, initialize the data structures: set the root vertex and the child vertices of every vertex in V to their initial values, traverse the edge set E, remove the self-loop edges from E to eliminate their interference, and classify each remaining edge according to the state of its two vertices to prepare for further processing. This comprises the following sub-steps:
3.1. According to the size of graph G, initialize the root-vertex vector root and the two-dimensional child-vertex vector son for all vertices in V. root contains N elements; root[$v_i$] denotes the root vertex of vertex $v_i$ and is initialized to -1. son is a two-dimensional vector of N elements, each of which is itself a vector, and every element of son is initialized to an empty vector. son[$v_i$], the child-vertex vector of $v_i$, stores the set of vertices whose root vertex is $v_i$, i.e., the vertex information of the connected component rooted at $v_i$; a connected component is thus the content stored in son by a vertex whose root value is itself. Initialize the variable e to 1.
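Step 3.1 can be sketched with plain Python lists; the function name init_union_structures is hypothetical. Each vertex gets its own empty list, because writing [[]] * n would make every entry of son alias one shared list.

```python
def init_union_structures(n):
    """Step 3.1: every vertex starts unvisited (root = -1) with an empty son list."""
    root = [-1] * n                  # root[v_i] = -1 means v_i has not been visited
    son = [[] for _ in range(n)]     # a separate empty vector for each vertex
    e = 1                            # 1-based index of the next edge to read (3.4)
    return root, son, e


root, son, e = init_union_structures(5)
print(root, son, e)  # [-1, -1, -1, -1, -1] [[], [], [], [], []] 1
```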
3.2. Create a structure packed_edge consistent with the edge storage format in the Graph500 source code. packed_edge contains three int variables: the first, v0_low, is the ID of the first vertex of an edge; the second, v1_low, is the ID of the second vertex of that edge; and the third, high, is reserved for later functional extension. For an edge e_b, e_b.v0_low denotes its first vertex and e_b.v1_low denotes its second vertex.
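In Python, the three-field record of 3.2 can be mirrored with a dataclass. The class name PackedEdge is hypothetical; the field types and the exact meaning of high in the actual Graph500 C source should be checked there before relying on a matching binary layout.

```python
from dataclasses import dataclass


@dataclass
class PackedEdge:
    """Python mirror of the three-field packed_edge record described in 3.2."""
    v0_low: int       # ID of the first vertex of the edge
    v1_low: int       # ID of the second vertex of the edge
    high: int = 0     # reserved for later extension, unused here


e_b = PackedEdge(v0_low=3, v1_low=7)
print(e_b.v0_low, e_b.v1_low, e_b.high)  # 3 7 0
```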
3.3. Create an edge variable e_b of type packed_edge to store the edge information read from E.
3.4. If e > M, the edge set E has been fully processed; go to the sixth step. Otherwise, read the e-th edge of E in order and let e_b be that edge, so that e_b.v0_low and e_b.v1_low are its two vertices; set e = e + 1 and go to 3.5.
3.5. If e_b.v0_low ≠ e_b.v1_low, the edge joining e_b.v0_low and e_b.v1_low is not a self-loop; go to 3.6. Otherwise it is a self-loop; go directly to 3.4.
3.6. Determine whether e_b.v0_low and e_b.v1_low have the same root vertex: if root[e_b.v0_low] = root[e_b.v1_low], go to the fourth step; if root[e_b.v0_low] ≠ root[e_b.v1_low], go to the fifth step.
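The dispatch in 3.4 to 3.6 can be sketched as follows. To keep the example self-contained, the root vector is held fixed, whereas in the method each processed edge may update root before the next edge is read; edges are plain (v0_low, v1_low) tuples and classify_edges is a hypothetical name. Two unvisited vertices both have root value -1 and therefore go to the fourth step.

```python
def classify_edges(edges, root):
    """Steps 3.4-3.6 with the root vector held fixed: skip self-loops and tag each
    remaining edge with the step (4 or 5) that would process it."""
    tagged = []
    for v0_low, v1_low in edges:           # 3.4: read the edges in order
        if v0_low == v1_low:               # 3.5: self-loop, go straight to the next edge
            continue
        if root[v0_low] == root[v1_low]:   # 3.6: same root value (both -1 included)
            tagged.append(((v0_low, v1_low), 4))
        else:                              # 3.6: different root values
            tagged.append(((v0_low, v1_low), 5))
    return tagged


root = [0, 0, -1, 3, 3]
print(classify_edges([(0, 1), (2, 2), (1, 3), (2, 4)], root))
# [((0, 1), 4), ((1, 3), 5), ((2, 4), 5)]
```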
Fourth step, handle the case where the root vertices are the same. If neither e_b.v0_low nor e_b.v1_low has been visited, merge both into the connected component son[e_b.v0_low] rooted at e_b.v0_low. If both have been visited, e_b.v0_low and e_b.v1_low are already in the same connected component, so e_b is skipped and the next edge in E is visited. This comprises the following sub-steps:
4.1. If root[e_b.v0_low] = -1, neither e_b.v0_low nor e_b.v1_low has been visited and the edge they form is being visited for the first time; go to 4.2. Otherwise both have been visited and are already in the same connected component son[root[e_b.v0_low]], so no merge is needed; go to 3.4.
4.2. Merge e_b.v1_low into the connected component rooted at e_b.v0_low by making e_b.v0_low the root vertex of both vertices, i.e., set the corresponding elements of the root vector to e_b.v0_low: root[e_b.v0_low] = e_b.v0_low and root[e_b.v1_low] = e_b.v0_low.
4.3. Insert the IDs of e_b.v0_low and e_b.v1_low into the son vector of e_b.v0_low, i.e., add the new vertex information to the connected component son[e_b.v0_low]: insert e_b.v0_low into son[e_b.v0_low], then insert e_b.v1_low into son[e_b.v0_low]. Go to 3.4.
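The fourth step can be sketched as a small function that updates the root and son lists in place. The function name handle_same_root is hypothetical, and the caller is assumed to have already established, as in 3.5 and 3.6, that the edge is not a self-loop and that root[v0_low] == root[v1_low].

```python
def handle_same_root(v0_low, v1_low, root, son):
    """Fourth step for an edge whose endpoints have equal root values.
    Returns True if a new two-vertex component was created, False if skipped."""
    if root[v0_low] == -1:            # 4.1: both endpoints are unvisited
        root[v0_low] = v0_low         # 4.2: v0_low becomes the root of both
        root[v1_low] = v0_low
        son[v0_low].append(v0_low)    # 4.3: record both IDs in son[v0_low]
        son[v0_low].append(v1_low)
        return True
    return False                      # already in one component: skip, back to 3.4


root = [-1] * 4
son = [[] for _ in range(4)]
print(handle_same_root(2, 3, root, son), root, son[2])  # True [-1, -1, 2, 2] [2, 3]
print(handle_same_root(3, 2, root, son))                # False
```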
Fifth step, according to the state of the root vertices, apply path compression to the parent-child relationships of the vertices and merge the two connected components with different root vertices. This comprises the following sub-steps:
5.1. If root[e_b.v0_low] = -1, go to 5.2; otherwise go to 5.3.
5.2. In this case root[e_b.v0_low] = -1 and root[e_b.v1_low] ≠ -1, which means vertex e_b.v1_low has been visited and vertex e_b.v0_low has not. Insert e_b.v0_low into the connected component son[root[e_b.v1_low]] belonging to the root vertex of e_b.v1_low, i.e., merge e_b.v0_low into the connected component whose root vertex is root[e_b.v1_low], and change the root of e_b.v0_low to the root vertex of e_b.v1_low, i.e., set root[e_b.v0_low] = root[e_b.v1_low]. Together, the insertion and the root change complete the merge, and the connected component rooted at root[e_b.v1_low] gains new vertex information. Go to 3.4.
5.3. If root[e_b.v1_low] = -1, go to 5.4; otherwise go to 5.5.
5.4. In this case vertex e_b.v0_low has been visited and vertex e_b.v1_low has not. Insert e_b.v1_low into the connected component son[root[e_b.v0_low]] belonging to the root vertex of e_b.v0_low, i.e., merge e_b.v1_low into the connected component rooted at root[e_b.v0_low], and change the root vertex of e_b.v1_low to the root vertex of e_b.v0_low, i.e., set root[e_b.v1_low] = root[e_b.v0_low]. Together, the insertion and the root change complete the merge, and the connected component rooted at root[e_b.v0_low] gains new vertex information. Go to 3.4.
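Steps 5.1 to 5.4, where one endpoint already belongs to a component and the other does not, can be sketched the same way. The function name attach_unvisited is hypothetical, and the caller is assumed to have established in 3.6 that the two root values differ; under that precondition at most one endpoint can have root -1.

```python
def attach_unvisited(v0_low, v1_low, root, son):
    """Steps 5.1-5.4 for an edge whose endpoints have different root values.
    Returns True if an unvisited endpoint was attached, False if 5.5 is needed."""
    if root[v0_low] == -1:          # 5.1 -> 5.2: v1_low visited, v0_low not
        r = root[v1_low]
        son[r].append(v0_low)       # insert v0_low into v1_low's component
        root[v0_low] = r            # v0_low hangs directly under that root
        return True
    if root[v1_low] == -1:          # 5.3 -> 5.4: v0_low visited, v1_low not
        r = root[v0_low]
        son[r].append(v1_low)
        root[v1_low] = r
        return True
    return False                    # both visited, different roots: go to 5.5


root = [0, 0, -1, -1]
son = [[0, 1], [], [], []]
attach_unvisited(2, 1, root, son)   # case 5.2
attach_unvisited(1, 3, root, son)   # case 5.4
print(root, son[0])  # [0, 0, 0, 0] [0, 1, 2, 3]
```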
5.5. In this case both vertices have been visited and belong to different connected components. Compare the numbers of vertices in the connected components of e_b.v0_low and e_b.v1_low, i.e., the numbers of elements of son[root[e_b.v0_low]] and son[root[e_b.v1_low]]. Here size denotes the number of elements of a vector, so son[root[e_b.v0_low]].size is the number of child vertices of the root vertex of e_b.v0_low and son[root[e_b.v1_low]].size is that of the root vertex of e_b.v1_low. If son[root[e_b.v0_low]].size < son[root[e_b.v1_low]].size, swap the IDs of the two vertices so that son[root[e_b.v0_low]] has at least as many elements as son[root[e_b.v1_low]]. This ensures that, when the connected component rooted at root[e_b.v0_low] and the one rooted at root[e_b.v1_low] are merged, the component with fewer vertices is merged into the other. Go to 5.6.
5.6. Denote the root vertex of e_b.v1_low by root_v2, i.e., let root_v2 = root[e_b.v1_low]; set a loop variable i and initialize it to 1.
5.7. If i = son[root_v2].size, all vertices of the connected component stored in son[root_v2] have been traversed and all of them have been inserted into the component being merged into; the loop ends, go to 3.4. Otherwise the traversal is not finished; go to 5.8.
5.8. Apply path compression to the parent-child relationships of the vertices in the connected component son[root[e_b.v0_low]] rooted at root[e_b.v0_low]: make every element of son[root_v2] a direct child of root[e_b.v0_low], so that no intermediate-level vertex lies between any element of son[root_v2] and the root vertex of e_b.v0_low. Path compression is performed as follows: set the root vertex of the i-th element of son[root_v2], i.e., of vertex son[root_v2][i], to root[e_b.v0_low], that is, root[son[root_v2][i]] = root[e_b.v0_low], and insert vertex son[root_v2][i] into the connected component son[root[e_b.v0_low]]. Fig. 5 shows how the son vector of the present invention stores all child vertices hanging under a root vertex. In Fig. 5, vector son[$v_0$] has fewer elements than vector son[$v_2$], so its elements must be copied into son[$v_2$]: the elements of son[$v_0$] are appended one by one to the end of son[$v_2$]. son[$v_0$][0], son[$v_0$][1], son[$v_0$][2], ..., which before merging are the IDs of the vertices whose root vertex is $v_0$, become son[$v_2$][m], son[$v_2$][m+1], son[$v_2$][m+2], ... after merging, where m is the starting index of the newly added elements in son[$v_2$].
Fig. 4 illustrates the fifth step, in which path compression is applied to the parent-child relationships of the vertices according to their root vertices and two connected components with different root vertices are merged; it corresponds to operations 5.5 to 5.8 of the present invention. When the visited edge joins nodes $v_0$ and $v_2$, the connected component stored in son[$v_0$] is merged with the one stored in son[$v_2$]. Because son[$v_0$] contains fewer vertices, its vertices are hung directly under the root vertex of the component stored in son[$v_2$]: vertices son[$v_0$][0], son[$v_0$][1], son[$v_0$][2], ... become children of son[$v_2$][0] and are copied as son[$v_2$][m], son[$v_2$][m+1], son[$v_2$][m+2], ..., which reduces the depth of the tree.
5.9. Set i = i + 1 and go to 5.7.
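Putting steps 3 to 5 together, the following sketch builds root and son for an edge list. Assumptions: edges are (v0_low, v1_low) tuples, the name connected_components is hypothetical, the loop over son[root_v2] visits every element starting from index 0 (the 0-based indexing used in the Fig. 5 description), and the absorbed list son[root_v2] is cleared after merging, which the text does not state. After every merge each visited vertex points directly at its component root, so a find is the single lookup root[v].

```python
def connected_components(n, edges):
    """Steps 3-5: build root/son so that every visited vertex points directly
    at its component root (one level, no intermediate parents)."""
    root = [-1] * n                             # 3.1
    son = [[] for _ in range(n)]
    for v0_low, v1_low in edges:                # 3.4: edges in order
        if v0_low == v1_low:                    # 3.5: skip self-loops
            continue
        if root[v0_low] == root[v1_low]:        # 3.6 -> fourth step
            if root[v0_low] == -1:              # 4.1: both unvisited
                root[v0_low] = v0_low           # 4.2
                root[v1_low] = v0_low
                son[v0_low] += [v0_low, v1_low] # 4.3
            continue                            # otherwise already joined
        if root[v0_low] == -1:                  # 5.1 -> 5.2
            son[root[v1_low]].append(v0_low)
            root[v0_low] = root[v1_low]
            continue
        if root[v1_low] == -1:                  # 5.3 -> 5.4
            son[root[v0_low]].append(v1_low)
            root[v1_low] = root[v0_low]
            continue
        # 5.5: make v0_low the endpoint in the larger component
        if len(son[root[v0_low]]) < len(son[root[v1_low]]):
            v0_low, v1_low = v1_low, v0_low
        keep, root_v2 = root[v0_low], root[v1_low]  # 5.6
        for w in son[root_v2]:                  # 5.7-5.9: every element, index 0 upward
            root[w] = keep                      # 5.8: point directly at the new root
            son[keep].append(w)                 # append to the end of son[keep]
        son[root_v2] = []                       # assumption: drop the absorbed list
    return root, son


root, son = connected_components(8, [(0, 1), (2, 3), (1, 1), (4, 2), (3, 0), (5, 6)])
print(root)            # [2, 2, 2, 2, 2, 5, 5, -1]
print(son[2], son[5])  # [2, 3, 4, 0, 1] [5, 6]
```

In the example, the edge (3, 0) joins a three-vertex component rooted at 2 with a two-vertex component rooted at 0, so the smaller list [0, 1] is appended to son[2], matching the copy pattern described for Fig. 5. Vertex 7 touches no edge and keeps root -1.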
Sixth step, find the information of all connected components in graph G, partition G by connected components, and distribute the partitioned graph G to the processing nodes of the supercomputer using a scatter operation. This comprises the following sub-steps:
6.1. Set the vertex number j and the connected-component ordinal k, initializing j = 0 and k = 1.
6.2. If j equals N-1, all vertices have been traversed and k is the number of connected components in graph G; go to the seventh step. Otherwise go to 6.3.
6.3. Determine whether the root vertex of vertex $v_j$ (sequence number j) is $v_j$ itself. If root[$v_j$] ≠ $v_j$, vertex $v_j$ is not the root vertex of a connected component; go to 6.5. Otherwise $v_j$ is the root vertex of the k-th connected component, and son[$v_j$] stores, and thus yields, all vertices of the k-th connected component. Set k = k + 1 and go to 6.4.
6.4. Distribute the vertices in son[$v_j$] to the same or nearby physical nodes of the supercomputer using a scatter operation.
6.5. Set j = j + 1 and go to 6.2.
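A sketch of 6.1 to 6.5, assuming root and son have the form produced by the third to fifth steps; the example values below are made up. The scatter operation and the topology-aware choice of "same or nearby" physical nodes are replaced by a hypothetical round-robin assignment of whole components to node IDs, and the name partition_by_component is hypothetical. Here k counts from 0 so that it equals the number of components at the end, and the loop examines every vertex from 0 to N-1.

```python
def partition_by_component(root, son, num_nodes):
    """Sixth step sketch: each vertex j with root[j] == j roots a component, and
    the whole of son[j] is assigned to one node (hypothetical round-robin choice)."""
    placement = {}                   # vertex ID -> node ID
    k = 0                            # number of components found so far
    for j in range(len(root)):       # examine every vertex 0 .. N-1
        if root[j] != j:             # 6.3: not a component root, go to 6.5
            continue
        node = k % num_nodes         # stand-in for choosing nearby nodes in 6.4
        for w in son[j]:             # 6.4: keep the whole component together
            placement[w] = node
        k += 1
    return k, placement


root = [2, 2, 2, 2, 2, 5, 5, -1]
son = [[], [], [2, 3, 4, 0, 1], [], [], [5, 6], [], []]
k, placement = partition_by_component(root, son, num_nodes=2)
print(k, placement)
# 2 {2: 0, 3: 0, 4: 0, 0: 0, 1: 0, 5: 1, 6: 1}
```

Vertices that never appear in a non-self-loop edge keep root -1 and are not placed by this sketch; a real implementation would still need to assign them to some node.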
Fig. 6 is a schematic diagram of graph partitioning by connected components in the sixth step of the present invention. Boxes represent the storage nodes of the supercomputer system, and CCk (k = 1, 2, 3, ...) in each box denotes a connected component of graph G. The numbers on the connecting lines give the actual access order, and the ellipses before and after the numbers indicate the complexity of the path and the number of network hops to be traversed: the more ellipses, the longer the communication path and the more hops required. When the graph is partitioned by connected components, suppose that the connected component containing vertex $v_i$ is CC1 (k = 1). The root vertex of $v_i$ is root[$v_i$], and all vertices of that component can be obtained from son[root[$v_i$]]. The other vertices of CC1 are then placed on nodes close to the node holding $v_i$, and these nodes are accessed preferentially.
Seventh step, BFS search and verification: randomly choose a root vertex v and, with v as the source, perform a BFS search on graph G partitioned by connected components in the sixth step, using the edge information stored in the adjacency matrix A constructed in the second step. Output the resulting spanning tree as the search result, record the Graph500 effective timing t, and verify whether the BFS spanning tree obtained matches the original graph information. This process is repeated 64 times, and each BFS search is timed separately.
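The search and a reduced form of the verification in the seventh step can be sketched as below, reading edges from the adjacency matrix A as the text specifies. The names bfs_parents and tree_edges_exist are hypothetical, the random choice of the source, the 64-run loop and the timing are omitted, and the check shown covers only one of the properties that the official Graph500 validation examines.

```python
from collections import deque


def bfs_parents(A, source):
    """BFS over adjacency matrix A from source; returns the spanning-tree parent
    array with parent[source] == source and -1 for unreached vertices."""
    n = len(A)
    parent = [-1] * n
    parent[source] = source
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):                 # scan row u of A for neighbours
            if A[u][v] and parent[v] == -1:
                parent[v] = u              # tree edge (u, v)
                queue.append(v)
    return parent


def tree_edges_exist(A, parent, source):
    """Simplified check: the source is its own parent and every tree edge
    (parent[v], v) is an edge of the original graph."""
    if parent[source] != source:
        return False
    return all(p == -1 or v == source or A[p][v] == 1
               for v, p in enumerate(parent))


A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
parent = bfs_parents(A, 0)
print(parent, tree_edges_exist(A, parent, 0))  # [0, 0, 1, -1] True
```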
Eighth step, compute the evaluation value of graph test performance: average the performance values (TEPS, traversed edges per second) of the 64 BFS spanning-tree traversals to obtain the test result, and output it. A higher TEPS value indicates a stronger large-scale graph processing capability of the supercomputer, a higher Graph500 ranking, and better suitability for big-data processing.
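As a worked illustration of the eighth step, TEPS for one run is the number of traversed edges divided by the BFS time t, and the text averages these values over the runs. The helper names and the two runs below are invented for illustration. The Graph500 reference output also reports a harmonic mean of TEPS, which weights slow runs more heavily, so it is computed alongside for comparison.

```python
import statistics


def teps(edges_traversed, seconds):
    """Traversed edges per second for one timed BFS run."""
    return edges_traversed / seconds


def mean_teps(runs):
    """Arithmetic mean of TEPS over (edges_traversed, seconds) pairs, as in the text."""
    values = [teps(m, t) for m, t in runs]
    return sum(values) / len(values)


runs = [(1000, 0.5), (1000, 0.25)]      # made-up example runs
print(mean_teps(runs))                   # 3000.0
harmonic = statistics.harmonic_mean([teps(m, t) for m, t in runs])
print(round(harmonic, 2))                # 2666.67
```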
Ninth step, end.
Claims (2)
1. A supercomputer benchmark test acceleration method based on connected component generation optimization, characterized by comprising the following steps:
first step, graph generation: generating a random graph G(V, E) with a Kronecker graph generator, where V is the vertex set, E is the edge set, and the size of the graph is determined by the user-supplied parameters scale and edgefactor, scale indicating the vertex scale of the graph and edgefactor the average number of edges per vertex; $N = 2^{\text{scale}}$ is the number of vertices of G, i.e., the number of elements of V, and $M = \text{edgefactor} \times N$ is the number of edges of G, i.e., the number of elements of E; $v_i$ denotes the vertex numbered i in G, and the vertex pair $(v_i, v_j)$ denotes the edge from vertex $v_i$ to vertex $v_j$; $(v_i, v_j) \in E$, where i and j are integers with $0 \le i \le N-1$ and $0 \le j \le N-1$;
second step, constructing an adjacency matrix A storing graph G, where $A_{ij} = 0$ indicates that there is no edge between vertex $v_i$ and vertex $v_j$, and $A_{ij} = 1$ indicates that there is an edge between them;
third step, initializing the data structures: setting the root vertex and the child vertices of every vertex in V to their initial values, traversing the edge set E, removing the self-loop edges in E, i.e., edges connecting a vertex to itself, and classifying each remaining edge according to the state of its two vertices, comprising:
3.1. according to the size of graph G, initializing the root-vertex vector root and the two-dimensional child-vertex vector son of all vertices in V, where root contains N elements, root[$v_i$] denotes the root vertex of vertex $v_i$ and is initialized to -1, and son is a two-dimensional vector of N elements, each element being a vector initialized to an empty vector; son[$v_i$], the child-vertex vector of $v_i$, stores the set of vertices whose root vertex is $v_i$, i.e., the vertex information of the connected component rooted at $v_i$, a connected component being the content stored in son by a vertex whose root value is itself; initializing a variable e to 1;
3.2. creating a structure packed_edge consistent with the edge storage format in the Graph500 source code, packed_edge comprising three int variables: a first variable v0_low, the ID of the first vertex of an edge; a second variable v1_low, the ID of the second vertex of that edge; and a third variable high, reserved for functional extension; the first vertex of an edge e_b is denoted e_b.v0_low and its second vertex e_b.v1_low;
3.3. creating an edge e_b of type packed_edge to store the edge information read from E;
3.4. if e > M, the edge set E has been processed, going to the sixth step; otherwise, reading the e-th edge of E in order, letting e_b be the e-th edge, where e_b.v0_low and e_b.v1_low are the two vertices of e_b, setting e = e + 1, and going to 3.5;
3.5. if e_b.v0_low ≠ e_b.v1_low, the edge joining e_b.v0_low and e_b.v1_low is not a self-loop, going to 3.6; otherwise it is a self-loop, going directly to 3.4;
3.6. judging whether e_b.v0_low and e_b.v1_low have the same root vertex: if root[e_b.v0_low] = root[e_b.v1_low], going to the fourth step, and if root[e_b.v0_low] ≠ root[e_b.v1_low], going to the fifth step;
fourth step, handling the case where the root vertices are the same: if neither e_b.v0_low nor e_b.v1_low has been visited, merging e_b.v0_low and e_b.v1_low into the connected component son[e_b.v0_low] rooted at e_b.v0_low; if both have been visited, e_b.v0_low and e_b.v1_low are already in the same connected component, so skipping e_b and visiting the next edge in E, comprising:
4.1. if root[e_b.v0_low] = -1, neither e_b.v0_low nor e_b.v1_low has been visited and the edge they form is visited for the first time, going to 4.2; otherwise both have been visited and are already in the same connected component son[root[e_b.v0_low]], no merge is required, going to 3.4;
4.2. merging e_b.v1_low into the connected component rooted at e_b.v0_low by setting e_b.v0_low as the root vertex of both vertices, i.e., setting the root-vector elements of e_b.v0_low and e_b.v1_low to e_b.v0_low: root[e_b.v0_low] = e_b.v0_low and root[e_b.v1_low] = e_b.v0_low;
4.3. inserting the IDs of e_b.v0_low and e_b.v1_low into the son vector of e_b.v0_low, i.e., adding new vertex information to the connected component son[e_b.v0_low] by inserting e_b.v0_low into son[e_b.v0_low] and inserting e_b.v1_low into son[e_b.v0_low], and going to 3.4;
fifth step, performing path compression on the parent-child relationships of the vertices according to the state of the root vertices, and merging two connected components with different root vertices, comprising:
5.1. if root[e_b.v0_low] = -1, going to 5.2; otherwise, going to 5.3;
5.2. in this case vertex e_b.v1_low has been visited and vertex e_b.v0_low has not: inserting e_b.v0_low into the connected component son[root[e_b.v1_low]] belonging to the root vertex of e_b.v1_low, i.e., merging e_b.v0_low into the connected component rooted at root[e_b.v1_low], and changing the root vertex of e_b.v0_low to the root vertex of e_b.v1_low, i.e., root[e_b.v0_low] = root[e_b.v1_low]; the insertion and the root change together complete the merge, the connected component rooted at root[e_b.v1_low] gains new vertex information, and going to 3.4;
5.3. if root[e_b.v1_low] = -1, going to 5.4; otherwise, going to 5.5;
5.4. in this case vertex e_b.v0_low has been visited and vertex e_b.v1_low has not: inserting e_b.v1_low into the connected component son[root[e_b.v0_low]] belonging to the root vertex of e_b.v0_low, i.e., merging e_b.v1_low into the connected component rooted at root[e_b.v0_low], and changing the root vertex of e_b.v1_low to the root vertex of e_b.v0_low, i.e., root[e_b.v1_low] = root[e_b.v0_low]; the insertion and the root change together complete the merge, the connected component rooted at root[e_b.v0_low] gains new vertex information, and going to 3.4;
5.5. when root[e_b.v0_low] ≠ -1 and root[e_b.v1_low] ≠ -1, comparing the numbers of vertices of the connected components of e_b.v0_low and e_b.v1_low, i.e., the numbers of elements of son[root[e_b.v0_low]] and son[root[e_b.v1_low]], where size denotes the number of elements of a vector, son[root[e_b.v0_low]].size being the number of child vertices of the root vertex of e_b.v0_low and son[root[e_b.v1_low]].size the number of child vertices of the root vertex of e_b.v1_low; if son[root[e_b.v0_low]].size < son[root[e_b.v1_low]].size, swapping the IDs of the two vertices so that son[root[e_b.v0_low]] has at least as many elements as son[root[e_b.v1_low]]; going to 5.6;
5.6. denoting the root vertex of e_b.v1_low as root_v2, i.e., root_v2 = root[e_b.v1_low], setting a loop variable i and initializing i to 1;
5.7. if i = son[root_v2].size, going to 3.4; otherwise, going to 5.8;
5.8. performing path compression on the parent-child relationships of the vertices in the connected component son[root[e_b.v0_low]] rooted at root[e_b.v0_low], adjusting each element of son[root_v2] to be a direct child vertex of root[e_b.v0_low] so that no intermediate-level vertex lies between any element of son[root_v2] and the root vertex of e_b.v0_low; the path compression comprises: setting the root vertex of the i-th element of son[root_v2], i.e., of vertex son[root_v2][i], to root[e_b.v0_low], i.e., root[son[root_v2][i]] = root[e_b.v0_low], and inserting vertex son[root_v2][i] into the connected component son[root[e_b.v0_low]];
5.9. setting i = i + 1 and going to 5.7;
sixth step, finding the information of all connected components in graph G, partitioning G by connected components, and distributing the partitioned graph G to the processing nodes of the supercomputer using a scatter operation;
seventh step, BFS search and verification: randomly choosing a root vertex v and, with v as the source, performing a BFS search on graph G partitioned by connected components using the edge information stored in the adjacency matrix A, outputting the spanning tree as the search result, recording the Graph500 effective timing t, and verifying whether the BFS spanning tree obtained matches the original graph information; repeating this process 64 times, each BFS search being timed separately;
eighth step, computing the evaluation value of graph test performance, i.e., averaging the performance values over the 64 BFS spanning-tree traversals to obtain the test result, and outputting the test result;
ninth step, ending.
2. The supercomputer benchmark test acceleration method based on connected component generation optimization according to claim 1, characterized in that the sixth step, finding the information of all connected components in graph G, partitioning G by connected components, and distributing the partitioned graph G to the processing nodes of the supercomputer using a scatter operation, comprises:
6.1. setting a vertex number j and a connected-component ordinal k, initializing j to 0 and k to 1;
6.2. if j equals N-1, all vertices have been traversed and k is the number of connected components in graph G, going to the seventh step; otherwise, going to 6.3;
6.3. determining whether the root vertex of vertex $v_j$ with sequence number j is $v_j$ itself: if root[$v_j$] ≠ $v_j$, vertex $v_j$ is not the root vertex of a connected component, going to 6.5; otherwise vertex $v_j$ is the root vertex of the k-th connected component, son[$v_j$] stores all vertices of the k-th connected component, and all of them are obtained from it; setting k = k + 1; going to 6.4;
6.4. distributing the vertices in son[$v_j$] to the same or nearby physical nodes of the supercomputer using a scatter operation;
6.5. setting j = j + 1 and going to 6.2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110293568.7A CN112883241B (en) | 2021-03-19 | 2021-03-19 | Supercomputer benchmark test acceleration method based on connected component generation optimization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112883241A CN112883241A (en) | 2021-06-01 |
CN112883241B true CN112883241B (en) | 2022-09-09 |
Family ID: 76041241
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110293568.7A Active CN112883241B (en) | 2021-03-19 | 2021-03-19 | Supercomputer benchmark test acceleration method based on connected component generation optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112883241B (en) |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |