CN112883241A - Supercomputer benchmark test acceleration method based on connected component generation optimization - Google Patents

Info

Publication number
CN112883241A
CN112883241A CN202110293568.7A CN202110293568A CN112883241A CN 112883241 A CN112883241 A CN 112883241A CN 202110293568 A CN202110293568 A CN 202110293568A CN 112883241 A CN112883241 A CN 112883241A
Authority
CN
China
Prior art keywords
low
root
vertex
son
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110293568.7A
Other languages
Chinese (zh)
Other versions
CN112883241B (en
Inventor
白皓
甘新标
张一鸣
李东升
贾孟涵
谭雯
司嘉奇
来宪龙
李海莉
来乐
宣栋梁
苏鸿宇
王庆坤
徐云鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202110293568.7A priority Critical patent/CN112883241B/en
Publication of CN112883241A publication Critical patent/CN112883241A/en
Application granted granted Critical
Publication of CN112883241B publication Critical patent/CN112883241B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9027Trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • G06F16/90348Query processing by searching ordered data, e.g. alpha-numerically ordered data
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a supercomputer benchmark test acceleration method based on connected component generation optimization. It aims to minimize communication paths, maximize memory access bandwidth utilization, and accelerate supercomputer big data benchmark tests. The technical scheme exploits the fact that a graph generated by Graph500 contains multiple connected components: the connected components are found quickly in the graph and stored in a two-dimensional vector, path compression is applied to the parent-child relationships of the vertices within each connected component, connected components with different root vertices are merged, and the vertices of the same connected component are assigned to physical nodes of the supercomputer that have short communication paths between them, so that communication overhead is low and execution is fast when the graph is traversed. The method stores all connected components in the graph efficiently, maximizes merging speed, accelerates root vertex queries, reduces stack overhead in memory, and increases the speed of testing the supercomputer's big data processing capability.

Description

Supercomputer benchmark test acceleration method based on connected component generation optimization
Technical Field
The invention relates to a method for accelerating big data benchmark tests of a supercomputer, and in particular to a method that accelerates the benchmark test by optimizing connected component generation using a two-dimensional vector and path compression.
Background
A graph is a common data structure that can abstract and express complex relationships among real-world entities. For example, social networks and the World Wide Web can both be represented as graphs. Graph computing, the processing of and computation on graph data, plays an important role in many real-life scenarios. In recent years the scale of graph data has grown continuously. According to related reports, in the third quarter of 2020 Facebook had 1.82 billion daily active users and Tencent WeChat had 1.21 billion active users; when the users and the relationships between them are abstracted into vertices and edges of a graph, the number of vertices reaches the billions and the number of edges reaches the hundreds of billions. This places higher demands on data storage and computing power. Supercomputers have mainly been used for numerical computation, and in the big data era, with data-intensive applications on the rise, Graph500 is an important benchmark program for testing a supercomputer's data processing capability. Graph500 measures that capability in TEPS (Traversed Edges Per Second), the number of graph edges traversed per second.
The Graph500 benchmark program consists of four steps: graph generation, graph construction, BFS search and validation, and result output, as shown in FIG. 1.
(1) Graph generation: a Kronecker graph generator generates a random graph structure $G = (V, E)$, where $V$ is the vertex set and $E$ is the edge set. The size of the graph is determined by the user-supplied parameters scale and edgefactor, where scale sets the number of vertices and edgefactor is the average number of edges per vertex. $N = 2^{\mathrm{scale}}$ is the number of vertices of the input graph, i.e., the number of elements of $V$, and $M = \mathrm{edgefactor} \times N$ is the number of edges of the input graph, i.e., the number of elements of $E$. $v_i$ denotes the vertex numbered $i$, and the vertex pair $(v_i, v_j)$ denotes the edge from vertex $i$ to vertex $j$, with $(v_i, v_j) \in E$, $0 \le i \le N-1$, $0 \le j \le N-1$.
(2) Graph construction: the vertex and edge information generated in the first step is converted into any data structure that represents the graph; standard Graph500 stores the graph information in the adjacency matrix of the graph.
(3) BFS (Breadth-First Search) search and validation: a root vertex is generated at random and a BFS search of the whole graph is performed with it as the source. The spanning tree is output as the search result, the Graph500 effective timing time t is recorded, and the BFS spanning tree is verified against the original graph information. This process is repeated 64 times, and the BFS search portion of each iteration is timed separately.
(4) Result output: Graph500 measures program performance in traversed edges per second (TEPS), computed as the number of edges M of the generated graph divided by the BFS search time t, i.e., TEPS = M/t. A TEPS value is computed for each of the 64 iterations, and the average of the 64 TEPS values is used as the basis for the final Graph500 test result and ranking.
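As a rough illustration of the quantities defined in the four steps above, the following Python sketch computes N and M from scale and edgefactor and averages per-run TEPS values. The function names and the sample timings are assumptions made for this example; they are not taken from the Graph500 source code, and the plain arithmetic mean simply follows the description given here.

```python
def graph_size(scale, edgefactor):
    """Return (N, M): vertex and edge counts as defined in the text."""
    n = 2 ** scale          # N = 2^scale
    m = edgefactor * n      # M = edgefactor * N
    return n, m


def mean_teps(m, bfs_times):
    """Average of the per-run TEPS values M / t (arithmetic mean, per the text)."""
    teps = [m / t for t in bfs_times]
    return sum(teps) / len(teps)


n, m = graph_size(scale=4, edgefactor=16)
print(n, m)                         # 16 256
print(mean_teps(m, [0.5, 0.25]))    # 768.0
```

In a real run bfs_times would hold the 64 measured BFS times; two made-up values are used here only to keep the arithmetic easy to check.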
In summary, the BFS search in Graph500 must traverse all vertices of the graph, and the vertices are distributed over many physical nodes of the supercomputer, so memory access bandwidth is a key factor in Graph500 performance. Moreover, Graph500 mainly uses the BFS search time as the measured time and places no limit on the graph data preparation that precedes BFS. Graph500 is mainly used to benchmark the big data computing capability of supercomputer systems. A supercomputer system generally consists of computing nodes, storage nodes, and an interconnection network that connects the computing and storage nodes through switches. Information exchanged between nodes is forwarded by the switches and reaches the target node through the interconnection network. The number of switches that must be passed between two communicating nodes is called the hop count of the path between them; the larger the hop count, the longer the path. The storage space of the storage nodes is generally mapped uniformly to the computing nodes, so several computing nodes may share one storage node, but when a computing node accesses storage space that is not mapped to it, communication over the interconnect is required. Graph500 test performance is mainly limited by memory size and memory access bandwidth: the higher the bandwidth, the better the performance. Therefore, if two computing nodes that need to interact share storage space, the communication path is shortened and memory access is faster.
The 'vertex reordering and priority caching based big data processing capability test method' (patent application number: 202010748396.3) exploits the fact that, during BFS traversal, high-degree vertices have a high probability of having an edge to the root node, and uses vertex reordering and priority caching to reduce the number of memory accesses. However, that method does not consider the connected components of a large graph.
A connected component of an undirected graph, also called a maximal connected subgraph, is a subgraph in which every pair of vertices is connected by a path. Connected component algorithms are often an important step in large-scale graph processing. A disconnected graph can be decomposed into several connected components; each connected component has at least one spanning tree (a tree containing all vertices of the component), and the set of spanning trees of the connected components forms a spanning forest. Suppose that, when vertices are partitioned during the graph construction process of the Graph500 benchmark (the second step above), the vertices of the same connected component are assigned to physical nodes with short communication paths between them, that is, vertices joined by edges are placed as far as possible within the routing range of switches on the same layer according to the network topology of the computer system. Running BFS on such a partition can improve the bandwidth utilization of the supercomputer system, balance load, reduce communication overhead, and speed up spanning tree generation.
Many researchers have optimized connected component algorithms on single and multiple processors, but such algorithms are currently applied only in scenarios such as reachability queries and consistency detection, and no published document applies a connected component algorithm to accelerating supercomputer big data benchmark tests.
The graph generated by Graph500 contains multiple connected components, so optimizing the algorithm that computes them could accelerate the supercomputer big data benchmark test. The union-find (disjoint-set) algorithm is one way to compute connected components. The traditional union-find algorithm has the following steps:
First, find the root vertices of the two vertices of the currently visited edge (find). FIG. 2(a) shows the find operation, which locates the root vertex by repeatedly tracing back through parent vertices in the tree.
Second, if the root vertices differ, merge the sets containing the two vertices (union). FIG. 2(b) shows the union operation, which merges two trees with different root vertices into one tree.
The union-find algorithm represents each set as a tree. Each vertex has a parent vertex, and parents are followed repeatedly until the root of the tree is reached, which is the root vertex of all vertices in the tree. In the traditional union-find algorithm, a union changes the parent of only some vertices, so the find stage must call the parent lookup function many times (including recursively), and the cost of this recursion is very large in large-scale graph computation. Therefore, the traditional union-find algorithm cannot be used directly to accelerate supercomputer big data benchmark tests.
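The traditional find and union operations described above can be sketched in Python as follows. This is a textbook-style illustration rather than code from the patent; the array name parent and the sample edges are assumptions made for the example.

```python
def find(parent, v):
    """Trace parent pointers back to the root, one recursive call per level."""
    if parent[v] == v:
        return v
    return find(parent, parent[v])


def union(parent, a, b):
    """Merge the trees containing a and b when their roots differ."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra  # only the root of one tree gets a new parent


parent = list(range(6))  # every vertex starts as its own root
for a, b in [(0, 1), (2, 3), (1, 3), (4, 5)]:
    union(parent, a, b)

print(parent)            # [0, 0, 0, 2, 4, 4]
print(find(parent, 3))   # 0, reached through vertex 2
```

Vertex 3 still points at vertex 2 after the merges, so every later find on it walks two levels. This is the repeated parent lookup cost that the text identifies.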
Path compression has been proposed to optimize the union-find algorithm. For example, the 1985 article "On the expected performance of path compression algorithms" in the SIAM Journal on Computing (SICOMP) analyzes the complexity of path compression, which avoids repeated calls to the parent lookup function during find and reduces recursion. However, existing path compression optimizations do not consider how closely they are coupled to the underlying data structures, so their effect is limited. Therefore, the path-compressed union-find algorithm cannot be used directly to accelerate supercomputer big data benchmark tests either.
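For reference, a common iterative form of find with path compression looks roughly like the Python sketch below. It is a generic textbook variant, not the scheme claimed in this patent, and the names find_compress and parent are assumptions made for the example.

```python
def find_compress(parent, v):
    """Return the root of v and point every vertex on the path directly at it."""
    root = v
    while parent[root] != root:      # first pass: locate the root
        root = parent[root]
    while parent[v] != root:         # second pass: re-parent the path
        parent[v], v = root, parent[v]
    return root


parent = [0, 0, 1, 2, 3]             # a chain 4 -> 3 -> 2 -> 1 -> 0
print(find_compress(parent, 4))      # 0
print(parent)                        # [0, 0, 0, 0, 0]
```

After one call, every vertex on the traversed path is a direct child of the root, so later lookups on those vertices take a single step.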
How to minimize communication paths, maximize memory access bandwidth utilization, and accelerate supercomputer big data benchmark tests therefore remains a technical problem that those skilled in the art urgently need to solve.
Disclosure of Invention
The technical problem to be solved by the invention is to minimize communication paths, maximize memory access bandwidth utilization, and accelerate supercomputer big data benchmark tests.
The technical scheme exploits the fact that a graph generated by Graph500 contains multiple connected components. The connected components are found quickly in the graph and stored in a two-dimensional vector, path compression is applied to the parent-child relationships of the vertices within each connected component, connected components with different root vertices are merged, and the vertices of the same connected component are assigned to physical nodes of the supercomputer that have short communication paths between them. This reduces communication overhead during graph traversal, increases execution speed, and speeds up testing of the supercomputer's big data processing capability.
The specific technical scheme is as follows:
First step: graph generation. A Kronecker graph generator generates a random graph structure $G = (V, E)$, where $V$ is the vertex set and $E$ is the edge set. The size of the graph is determined by the user-supplied parameters scale and edgefactor, where scale sets the number of vertices and edgefactor is the average number of edges per vertex. $N = 2^{\mathrm{scale}}$ is the number of vertices of $G$, i.e., the number of vertices in $V$, and $M = \mathrm{edgefactor} \times N$ is the number of edges of $G$, i.e., the number of elements of $E$. $v_i$ denotes the vertex numbered $i$ in $G$, and the vertex pair $(v_i, v_j)$ denotes the edge from vertex $v_i$ to vertex $v_j$. $(v_i, v_j) \in E$, where $i$ and $j$ are non-negative integers with $0 \le i \le N-1$ and $0 \le j \le N-1$.
Second step: construct an adjacency matrix $A$ to store the graph $G$. $A_{ij} = 0$ denotes that there is no edge between vertex $v_i$ and vertex $v_j$; $A_{ij} = 1$ denotes that there is an edge between vertex $v_i$ and vertex $v_j$.
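A minimal Python sketch of the adjacency matrix in the second step is shown below. It assumes the graph is treated as undirected, which the text does not state explicitly for the matrix, and uses a dense list of lists purely for illustration; the function name build_adjacency is hypothetical.

```python
def build_adjacency(n, edges):
    """Build an n x n 0/1 matrix A with A[i][j] = 1 when an edge joins v_i and v_j."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1   # assumption: edges are stored symmetrically
    return a


A = build_adjacency(4, [(0, 1), (1, 3)])
for row in A:
    print(row)
```

A dense matrix needs N x N entries, so at Graph500 scales a real implementation would use a compact or distributed representation; the sketch only mirrors the definition of A.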
Third step: initialize the data structures, set the root vertices and child vertices of all vertices in V to their initial values, and traverse the edge set E. Self-loop edges (edges that connect a vertex to itself) are discarded to eliminate their interference, and the remaining edges are classified by the state of their two vertices for subsequent processing. The method is:
3.1. According to the data scale of the graph G, initialize the root vertex vector root and the two-dimensional child vertex vector son for all vertices in V. root contains N elements; root[v_i] is the root vertex of vertex v_i and is initialized to -1. son is a two-dimensional vector containing N elements, each of which is itself a vector initialized to the empty vector. son[v_i] is the child vertex vector of vertex v_i and stores the set of vertices whose root vertex is v_i, i.e., the vertex information of the connected component rooted at v_i; a connected component is the content of the son vector of a vertex whose root value is itself. Initialize the variable e = 1.
3.2. Create a structure packed_edge consistent with the edge storage format in the Graph500 source code. packed_edge contains three int variables: the first, v0_low, is the ID of the first vertex of an edge; the second, v1_low, is the ID of the second vertex of the edge, which is connected to v0_low; the third, high, is reserved for later extension and has no direct bearing on the acceleration method. For brevity, the first vertex of edge e_b (with ID v0_low) is written e_b.v0_low, and the second vertex of edge e_b (with ID v1_low) is written e_b.v1_low.
3.3. Create a variable e_b of type packed_edge to hold the edge information read from E.
3.4. If e > M, the edge set E has been fully processed; go to the sixth step. Otherwise, read the e-th edge from E in order and set e_b to that edge, where e_b.v0_low and e_b.v1_low are the two vertices of e_b; let e = e + 1 and go to 3.5.
3.5. If e_b.v0_low ≠ e_b.v1_low, the edge joining e_b.v0_low and e_b.v1_low is not a self-loop; go to 3.6. Otherwise the edge is a self-loop; go directly to 3.4.
3.6. Determine whether the root vertices of e_b.v0_low and e_b.v1_low are the same: if root[e_b.v0_low] = root[e_b.v1_low], go to the fourth step; if root[e_b.v0_low] ≠ root[e_b.v1_low], go to the fifth step. Looking up the root vertices of e_b.v0_low and e_b.v1_low in this step is the find stage of the union-find algorithm.
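The initialization in 3.1 and the edge classification in 3.5 and 3.6 might look like the following Python sketch. The names root, son, v0_low, v1_low and high come from the text; the PackedEdge namedtuple and the helper functions init_structures and classify are assumptions introduced for illustration.

```python
from collections import namedtuple

# Stand-in for the packed_edge structure described in 3.2.
PackedEdge = namedtuple("PackedEdge", ["v0_low", "v1_low", "high"])


def init_structures(n):
    """3.1: root[v] = -1 for every vertex, son[v] = empty vector."""
    root = [-1] * n
    son = [[] for _ in range(n)]
    return root, son


def classify(root, e_b):
    """3.5 and 3.6: decide how an edge should be handled."""
    if e_b.v0_low == e_b.v1_low:
        return "self_loop"           # skip, read the next edge
    if root[e_b.v0_low] == root[e_b.v1_low]:
        return "same_root"           # handled by the fourth step
    return "different_root"          # handled by the fifth step


root, son = init_structures(4)
print(classify(root, PackedEdge(1, 1, 0)))   # self_loop
print(classify(root, PackedEdge(0, 1, 0)))   # same_root (both still -1)
root[0] = 0
print(classify(root, PackedEdge(0, 1, 0)))   # different_root
```

Note that the find here is a single array read, root[v], rather than a walk up a tree.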
Fourth step: handle the case where the root vertices are the same. If neither e_b.v0_low nor e_b.v1_low has been visited, merge both into the connected component son[e_b.v0_low] with e_b.v0_low as its root vertex. If both e_b.v0_low and e_b.v1_low have already been visited, they are already in the same connected component, so skip e_b and visit the next edge in E. The method is:
4.1. If root[e_b.v0_low] = -1, neither e_b.v0_low nor e_b.v1_low has been visited and the edge between them is being visited for the first time; go to 4.2. Otherwise both e_b.v0_low and e_b.v1_low have been visited and already belong to the same connected component son[root[e_b.v0_low]] (i.e., they are elements of the same vector in son), so no merge is needed; go to 3.4.
4.2. Merge e_b.v1_low into the connected component rooted at e_b.v0_low: set e_b.v0_low as the root vertex of both vertices, i.e., let root[e_b.v0_low] = e_b.v0_low and root[e_b.v1_low] = e_b.v0_low.
4.3. Insert the IDs of e_b.v0_low and e_b.v1_low into the son vector of e_b.v0_low, adding the new vertex information to the connected component son[e_b.v0_low]; that is, insert e_b.v0_low into son[e_b.v0_low] and insert e_b.v1_low into son[e_b.v0_low]. The changes to root and son in 4.2 and 4.3 correspond to the union stage of the union-find algorithm. Go to 3.4.
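Steps 4.1 to 4.3 can be sketched in Python as below. The function name handle_same_root and its boolean return value are assumptions made for the example; root and son follow the definitions in 3.1.

```python
def handle_same_root(root, son, v0, v1):
    """Fourth step; assumes root[v0] == root[v1] was already checked in 3.6.

    Returns True when a new connected component was created.
    """
    if root[v0] != -1:
        return False          # 4.1: both visited, already in one component
    root[v0] = v0             # 4.2: v0 becomes the root of both vertices
    root[v1] = v0
    son[v0].append(v0)        # 4.3: record both vertices in the component
    son[v0].append(v1)
    return True


root = [-1] * 4
son = [[] for _ in range(4)]
print(handle_same_root(root, son, 2, 3))   # True
print(root)                                # [-1, -1, 2, 2]
print(son[2])                              # [2, 3]
print(handle_same_root(root, son, 2, 3))   # False, nothing changes
```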
Fifth step: handle the case where the root vertices differ by applying path compression to the parent-child relationships of the vertices and merging the two connected components with different root vertices. As edges are traversed, the vertex information of the connected components stored in son is progressively completed. In the end, every connected component contains one vertex whose root value is itself; that vertex is the root vertex of the connected component, and its son vector stores all vertices of the connected component. The method is:
5.1. If root[e_b.v0_low] = -1, go to 5.2; otherwise go to 5.3.
5.2. At this point root[e_b.v0_low] = -1 and root[e_b.v1_low] ≠ -1, meaning that vertex e_b.v1_low has been visited and vertex e_b.v0_low has not. Insert e_b.v0_low into the connected component son[root[e_b.v1_low]] whose root vertex is the root vertex of e_b.v1_low (that is, merge e_b.v0_low into the connected component rooted at root[e_b.v1_low]), and set the root vertex of e_b.v0_low to the root vertex of e_b.v1_low, i.e., let root[e_b.v0_low] = root[e_b.v1_low]. Go to 3.4.
5.3. If root[e_b.v1_low] = -1, go to 5.4; otherwise go to 5.5.
5.4. At this point root[e_b.v1_low] = -1 and root[e_b.v0_low] ≠ -1, meaning that vertex e_b.v0_low has been visited and vertex e_b.v1_low has not. Insert e_b.v1_low into the connected component son[root[e_b.v0_low]] whose root vertex is the root vertex of e_b.v0_low (that is, merge e_b.v1_low into the connected component rooted at root[e_b.v0_low]), and set the root vertex of e_b.v1_low to the root vertex of e_b.v0_low, i.e., let root[e_b.v1_low] = root[e_b.v0_low]. Go to 3.4.
5.5. At this point both vertices have been visited. Compare the numbers of vertices of the connected components of e_b.v0_low and e_b.v1_low, i.e., the numbers of elements of son[root[e_b.v0_low]] and son[root[e_b.v1_low]]. If son[root[e_b.v0_low]].size < son[root[e_b.v1_low]].size (size is the number of elements of a vector; son[root[e_b.v0_low]].size is the number of child vertices of the root vertex of e_b.v0_low, and son[root[e_b.v1_low]].size is the number of child vertices of the root vertex of e_b.v1_low), swap the IDs of the two vertices so that son[root[e_b.v0_low]] has at least as many elements as son[root[e_b.v1_low]]. This ensures that, when the connected component rooted at root[e_b.v0_low] and the connected component rooted at root[e_b.v1_low] are merged, the component with fewer vertices is merged into the other. Go to 5.6.
5.6. Let root_v2 be the root vertex of e_b.v1_low, i.e., let root_v2 = root[e_b.v1_low]. Set a loop variable i and initialize i to 1.
5.7. If i is equal to son[root_v2].size, all vertices of the connected component stored in son[root_v2] have been traversed and all vertex elements of son[root_v2] have been inserted into the connected component being merged into; the loop ends, go to 3.4. Otherwise the traversal is not complete; go to 5.8.
5.8. Apply path compression to the parent-child relationships of the vertices in the connected component son[root[e_b.v0_low]] rooted at root[e_b.v0_low]: make each element of son[root_v2] a direct child vertex of root[e_b.v0_low], so that no intermediate vertex lies between any element of son[root_v2] and the root vertex of e_b.v0_low. The path compression method is: set the root vertex of the i-th element of son[root_v2], i.e., of the vertex son[root_v2][i], to root[e_b.v0_low], that is, let root[son[root_v2][i]] = root[e_b.v0_low], and insert the vertex son[root_v2][i] into the connected component son[root[e_b.v0_low]].
5.9. Let i = i + 1; go to 5.7.
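The three cases of the fifth step can be sketched in Python as follows. The function name handle_different_root is an assumption, and the loop of 5.6 to 5.9 is written as a plain iteration over every element of son[root_v2] instead of an explicit counter, which is a simplification made for this example.

```python
def handle_different_root(root, son, v0, v1):
    """Fifth step; assumes root[v0] != root[v1] was already checked in 3.6."""
    if root[v0] == -1:                       # 5.2: only v1 visited so far
        son[root[v1]].append(v0)
        root[v0] = root[v1]
    elif root[v1] == -1:                     # 5.4: only v0 visited so far
        son[root[v0]].append(v1)
        root[v1] = root[v0]
    else:                                    # 5.5: both visited, merge
        if len(son[root[v0]]) < len(son[root[v1]]):
            v0, v1 = v1, v0                  # keep the larger component as target
        new_root = root[v0]
        root_v2 = root[v1]                   # 5.6
        for child in son[root_v2]:           # 5.7 to 5.9
            root[child] = new_root           # 5.8: every vertex points at the root
            son[new_root].append(child)


# Two existing components {0, 1} and {2, 3, 4}; vertex 5 is unvisited.
root = [0, 0, 2, 2, 2, -1]
son = [[0, 1], [], [2, 3, 4], [], [], []]
handle_different_root(root, son, 1, 3)       # merges {0, 1} into {2, 3, 4}
handle_different_root(root, son, 5, 0)       # attaches vertex 5
print(root)                                  # [2, 2, 2, 2, 2, 2]
print(son[2])                                # [2, 3, 4, 0, 1, 5]
```

Because every moved vertex has its root entry rewritten during the merge, a later lookup of root[v] returns the component root directly. The sketch leaves the old son[root_v2] vector in place, since the text does not say whether it is cleared.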
Sixth step: the information of all connected components in the graph G has now been found; partition G by connected component. The graph G partitioned by connected component is distributed to the processing nodes of the supercomputer with the scatter operation (a standard operation of MPI, the distributed parallel programming standard). The method is:
6.1. Set a vertex index j and a connected component index k, and initialize j = 0 and k = 1.
6.2. If j is equal to N-1, the vertices have been traversed and k is now the number of connected components in G; go to the seventh step. Otherwise go to 6.3.
6.3. Determine whether the root vertex of the vertex v_j with index j is itself. If root[v_j] ≠ v_j, v_j is not the root vertex of a connected component; go to 6.5. Otherwise v_j is the root vertex of the k-th connected component and son[v_j] stores all vertices of the k-th connected component, so all vertices of the k-th connected component are obtained. Let k = k + 1 and go to 6.4.
6.4. Distribute the vertices in son[v_j] to the same or nearby physical nodes of the supercomputer using the scatter operation.
6.5. Let j = j + 1; go to 6.2.
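The component enumeration of the sixth step, followed by a placement of whole components onto nodes, might be sketched in Python as below. No MPI call is made here: the scatter itself is replaced by a plain dictionary that maps each vertex to a node index, and the greedy least-loaded placement policy is an assumption made for this example, since the text only requires that vertices of one component land on the same or nearby physical nodes.

```python
def connected_components(root, son):
    """6.2 and 6.3: a vertex whose root is itself owns one component in son."""
    return [son[v] for v in range(len(root)) if root[v] == v]


def assign_to_nodes(components, num_nodes):
    """Place each component as a whole on the currently least-loaded node.

    Hypothetical policy standing in for the topology-aware scatter of 6.4.
    """
    load = [0] * num_nodes
    placement = {}
    for comp in sorted(components, key=len, reverse=True):
        node = load.index(min(load))
        for v in comp:
            placement[v] = node
        load[node] += len(comp)
    return placement


root = [0, 0, 2, 2, 2, -1]                   # vertex 5 was never visited
son = [[0, 1], [], [2, 3, 4], [], [], []]
components = connected_components(root, son)
print(components)                            # [[0, 1], [2, 3, 4]]
print(assign_to_nodes(components, 2))        # {2: 0, 3: 0, 4: 0, 0: 1, 1: 1}
```

On a real system the node index would be chosen from the interconnect topology, for example by grouping nodes under the same switch, and the data would then be sent with an MPI scatter.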
Seventh step: BFS search and validation. A root vertex v is generated at random, and a BFS search with v as the source is performed on the graph G partitioned by connected component in the sixth step, using the edge information stored in the adjacency matrix A constructed in the second step. The spanning tree is output as the search result, the Graph500 effective timing time t is recorded, and the BFS spanning tree obtained by the search is verified against the original graph information. This process is repeated 64 times, and the BFS search portion of each iteration is timed separately.
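A single-process Python sketch of the BFS over the adjacency matrix A is given below. It is serial, omits timing, and its validate function performs only a simplified check (every tree edge must exist in A), which is an assumption for this example and much weaker than the validation in the actual Graph500 benchmark.

```python
from collections import deque


def bfs_tree(a, source):
    """Return parent[] of a BFS spanning tree; -1 marks unreachable vertices."""
    n = len(a)
    parent = [-1] * n
    parent[source] = source
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if a[u][v] == 1 and parent[v] == -1:
                parent[v] = u
                queue.append(v)
    return parent


def validate(a, parent, source):
    """Simplified check: the source is its own parent and tree edges exist in A."""
    return parent[source] == source and all(
        p == -1 or v == source or a[p][v] == 1
        for v, p in enumerate(parent)
    )


# Edges (0,1), (1,2) and (3,4): two connected components.
A = [[0, 1, 0, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 1, 0]]
tree = bfs_tree(A, 0)
print(tree)                  # [0, 0, 1, -1, -1]
print(validate(A, tree, 0))  # True
```

The BFS never leaves the connected component of the source, which is why placing each component on nearby nodes keeps the traversal traffic local.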
Eighth step: compute the evaluation value of graph test performance, i.e., the average of the performance values (TEPS) of the 64 BFS spanning tree traversals, and output it as the test result. The higher the TEPS value, the stronger the supercomputer's large-scale graph processing capability, the higher its Graph500 ranking, and the better suited it is to processing big data.
Ninth step: end.
The invention can achieve the following technical effects:
1. The third step establishes the two-dimensional vector data structure son to store connected components, so that all connected components in the graph G are stored efficiently, child vertices are merged as fast as possible during the merge operation of the fifth step, and visiting and traversing child vertices is more efficient.
2. The root vector implements the find operation of step 3.6: the root vertex is obtained by directly reading an element of root, without repeated calls to a parent lookup function. Combined with the path compression performed during merging in the fifth step, this speeds up root vertex queries.
3. The fifth step applies path compression to the parent-child relationships of the vertices in a connected component. When two connected components with different root vertices are merged, the root is updated not only for the root vertex of the component with fewer child vertices but for all of its child vertices, which are all reassigned to the new root vertex. This reduces the call depth of the find operation, reduces stack overhead in memory, and increases speed.
4. The sixth step uses the path-compressed connected components to partition the vertices of the graph among the physical nodes of the supercomputer. Vertices of the same connected component are assigned to physical nodes with short communication paths, which shortens communication paths and speeds up the test. Only the placement of vertices changes: during the BFS of the seventh step, edge information is still taken from the adjacency matrix A constructed in the second step, so the accuracy of the Graph500 test is preserved.
Drawings
FIG. 1 is a flowchart of the Graph500 benchmark program described in the background art.
FIG. 2 shows the main steps of the union-find algorithm of the related art, where FIG. 2(a) is the find operation and FIG. 2(b) is the union (merge) operation.
FIG. 3 is the overall flowchart of the present invention.
FIG. 4 is a schematic diagram of the path compression in the fifth step of the present invention.
FIG. 5 is a schematic diagram of the storage of the two-dimensional vector son.
FIG. 6 is a schematic diagram of graph partitioning using connected components in the sixth step of the present invention.
Detailed description:
The invention is further described below with reference to the accompanying drawings. As shown in FIG. 3, the invention comprises the following steps:
First step, graph generation. A random graph structure G = (V, E) is generated by the Kronecker graph generator, where V is the vertex set and E is the edge set. The size of the graph is determined by the user-supplied parameters scale and edgefactor: scale specifies the vertex scale of the graph, and edgefactor specifies the average number of edges per vertex. N = 2^scale is the number of vertices of G, i.e., the number of elements of V, and M = edgefactor × N is the number of edges of G, i.e., the number of elements of E. v_i denotes the vertex numbered i in G, and the vertex pair (v_i, v_j) denotes the edge from vertex v_i to vertex v_j, where (v_i, v_j) ∈ E and i and j are integers with 0 ≤ i ≤ N-1 and 0 ≤ j ≤ N-1.
Second step, construct the adjacency matrix A that stores the graph G. A_ij = 0 means that there is no edge between vertex v_i and vertex v_j; A_ij = 1 means that there is an edge between vertex v_i and vertex v_j.
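The first two steps can be illustrated with a minimal Python sketch. It is not the Graph500 generator: uniformly random vertex pairs stand in for the Kronecker generator, the graph is assumed to be undirected, and the function names graph_size and build_adjacency are hypothetical. A dense matrix is only practical for very small values of scale.

```python
import random

def graph_size(scale, edgefactor):
    """Return (N, M) with N = 2**scale vertices and M = edgefactor * N edges."""
    n = 2 ** scale
    return n, edgefactor * n

def build_adjacency(n, edges):
    """Build a dense N x N 0/1 matrix A; A[i][j] = 1 when an edge joins v_i and v_j."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1  # assumption: edges are treated as undirected
    return a

n, m = graph_size(scale=3, edgefactor=2)
rng = random.Random(0)
# Assumption: uniform random pairs replace the Kronecker generator here.
edges = [(rng.randrange(n), rng.randrange(n)) for _ in range(m)]
A = build_adjacency(n, edges)
```

With scale = 3 and edgefactor = 2 this gives N = 8 vertices and M = 16 edges.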
Third step, initialize the data structures: set the root vertex and child vertices of every vertex in V to their initial values, traverse the edge set E, discard self-loop edges in E to eliminate their interference, and classify each remaining edge according to the state of its two vertices for subsequent processing. The method is as follows:
3.1. According to the data scale of the graph G, initialize the root vertex vector root and the two-dimensional child vertex vector son for all vertices in V. root contains N elements; root[v_i] denotes the root vertex of vertex v_i and is initialized to -1. son is a two-dimensional vector containing N elements, each of which is a vector; every element of son is initialized to an empty vector. son[v_i] is the child vertex vector of vertex v_i and stores the set of vertices whose root vertex is v_i, i.e., the vertex information of the connected component rooted at v_i. A connected component is the content stored in the son vector of a vertex whose root value is the vertex itself. Initialize the variable e = 1.
3.2. Create a struct packed_edge consistent with the edge data storage format in the Graph500 source code. packed_edge contains three int variables: the first variable v0_low is the ID of the first vertex of an edge, the second variable v1_low is the ID of the second vertex of the edge, which is connected to the first vertex, and the third variable high is reserved for later functional extension. e_b.v0_low denotes the first vertex of edge e_b, whose ID is v0_low, and e_b.v1_low denotes the second vertex of edge e_b, whose ID is v1_low.
3.3. Create an edge e_b of type packed_edge to store the edge information read from E.
3.4. If e > M, the edge set E has been fully processed; go to the sixth step. Otherwise, read the e-th edge from E in order, let e_b be the e-th edge, where e_b.v0_low and e_b.v1_low are the two vertices that form e_b, let e = e + 1, and go to 3.5.
3.5. If e_b.v0_low ≠ e_b.v1_low, the edge connecting e_b.v0_low and e_b.v1_low is not a self-loop edge; go to 3.6. Otherwise, the edge is a self-loop edge; go directly to 3.4.
3.6. Determine whether the root vertices of e_b.v0_low and e_b.v1_low are the same. If root[e_b.v0_low] = root[e_b.v1_low], go to the fourth step; if root[e_b.v0_low] ≠ root[e_b.v1_low], go to the fifth step.
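Steps 3.1 to 3.6 can be sketched in Python as follows. This is an illustrative sketch only: PackedEdge is a hypothetical stand-in for the Graph500 packed_edge struct, and the function names init_structures and classify, as well as the returned labels, are assumptions that do not come from the patent text.

```python
from collections import namedtuple

# Hypothetical stand-in for the packed_edge struct described in 3.2.
PackedEdge = namedtuple("PackedEdge", ["v0_low", "v1_low", "high"])

def init_structures(n):
    """3.1: root[v] = -1 marks an unvisited vertex; son[v] starts empty."""
    root = [-1] * n
    son = [[] for _ in range(n)]
    return root, son

def classify(edge, root):
    """3.5-3.6: decide how an edge is handled.

    Returns 'self_loop' (skip), 'same_root' (fourth step)
    or 'diff_root' (fifth step).
    """
    if edge.v0_low == edge.v1_low:
        return "self_loop"
    if root[edge.v0_low] == root[edge.v1_low]:
        return "same_root"
    return "diff_root"

root, son = init_structures(4)
root[2] = 2  # pretend vertex 2 has already been visited as its own root
```

Note that two unvisited vertices both have root value -1, so an edge between them is classified as 'same_root'; the fourth step distinguishes this case from two vertices that already share a component.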
Fourth step, handle the case in which the root vertices are the same. If neither e_b.v0_low nor e_b.v1_low has been visited, merge e_b.v0_low and e_b.v1_low into the connected component son[e_b.v0_low] with e_b.v0_low as the root vertex. If both e_b.v0_low and e_b.v1_low have been visited, they are already in the same connected component, so e_b is skipped and the next edge in E is accessed. The method is as follows:
4.1. If root[e_b.v0_low] = -1, neither e_b.v0_low nor e_b.v1_low has been visited and the edge formed by e_b.v0_low and e_b.v1_low is accessed for the first time; go to 4.2. Otherwise, both e_b.v0_low and e_b.v1_low have been visited and are already in the same connected component, so no merging operation is required; go to 3.4.
4.2. Merge e_b.v1_low into the connected component with e_b.v0_low as the root vertex: set e_b.v0_low as the root vertex of both e_b.v0_low and e_b.v1_low by setting the corresponding elements of the root vector, i.e., let root[e_b.v0_low] = e_b.v0_low and root[e_b.v1_low] = e_b.v0_low.
4.3. Insert the IDs of e_b.v0_low and e_b.v1_low into the son vector corresponding to e_b.v0_low, thereby adding new vertex information to the connected component son[e_b.v0_low]; that is, insert e_b.v0_low into son[e_b.v0_low] and insert e_b.v1_low into son[e_b.v0_low]. Go to 3.4.
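A minimal Python sketch of the fourth step is given below. The function name handle_same_root and its boolean return value are assumptions added for illustration; root and son follow the definitions in 3.1.

```python
def handle_same_root(v0, v1, root, son):
    """Fourth step: v0 and v1 have equal root values.

    Returns True if a new connected component was created,
    False if the edge was skipped.
    """
    if root[v0] != -1:
        # 4.1: both vertices are already in the same component.
        return False
    # 4.2: v0 becomes the root vertex of both vertices.
    root[v0] = v0
    root[v1] = v0
    # 4.3: record both vertices in the component stored at son[v0].
    son[v0].append(v0)
    son[v0].append(v1)
    return True

root = [-1] * 4
son = [[] for _ in range(4)]
created = handle_same_root(0, 1, root, son)   # first access of edge (0, 1)
repeated = handle_same_root(0, 1, root, son)  # same edge seen again
```

The root vertex is stored in its own son vector, so son[v0] always holds every vertex of the component, including the root.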
Fifth step, handle the case in which the root vertices are different: apply path compression to the parent-child relationships of the vertices and merge the two connected components with different root vertices. The method is as follows:
5.1. If root[e_b.v0_low] = -1, go to 5.2; otherwise, go to 5.3.
5.2. At this point root[e_b.v0_low] = -1 and root[e_b.v1_low] ≠ -1, which means that vertex e_b.v1_low has been visited and vertex e_b.v0_low has not. Insert e_b.v0_low into the connected component son[root[e_b.v1_low]] corresponding to the root vertex of e_b.v1_low, i.e., merge e_b.v0_low into the connected component whose root vertex is root[e_b.v1_low], and change the root vertex of e_b.v0_low to the root vertex of e_b.v1_low, i.e., let root[e_b.v0_low] = root[e_b.v1_low]. The insertion and the root change together complete the merging operation, and new vertex information is added to the connected component rooted at root[e_b.v1_low]. Go to 3.4.
5.3. If root[e_b.v1_low] = -1, go to 5.4; otherwise, go to 5.5.
5.4. At this point vertex e_b.v0_low has been visited and vertex e_b.v1_low has not. Insert e_b.v1_low into the connected component son[root[e_b.v0_low]] corresponding to the root vertex of e_b.v0_low, i.e., merge e_b.v1_low into the connected component whose root vertex is root[e_b.v0_low], and change the root vertex of e_b.v1_low to the root vertex of e_b.v0_low, i.e., let root[e_b.v1_low] = root[e_b.v0_low]. The insertion and the root change together complete the merging operation, and new vertex information is added to the connected component rooted at root[e_b.v0_low]. Go to 3.4.
5.5. At this point both vertices have been visited. Compare the numbers of vertices of the connected components containing e_b.v0_low and e_b.v1_low, i.e., the numbers of elements of son[root[e_b.v0_low]] and son[root[e_b.v1_low]]. If son[root[e_b.v0_low]].size < son[root[e_b.v1_low]].size (size denotes the number of elements of a vector; son[root[e_b.v0_low]].size is the number of child vertices of the root vertex of e_b.v0_low, and son[root[e_b.v1_low]].size is the number of child vertices of the root vertex of e_b.v1_low), swap the IDs of the two vertices so that the number of elements of son[root[e_b.v0_low]] is not smaller than the number of elements of son[root[e_b.v1_low]]. The purpose is that, when the connected component rooted at root[e_b.v0_low] and the connected component rooted at root[e_b.v1_low] are merged, the component with fewer vertices is merged into the other. Go to 5.6.
5.6. Denote the root vertex of e_b.v1_low as root_v2, i.e., let root_v2 = root[e_b.v1_low]; set the loop variable i and initialize i to 1.
5.7. If i is equal to son[root_v2].size, all vertices of the connected component stored in son[root_v2] have been traversed and all vertex elements of son[root_v2] have been inserted into the target connected component; the loop ends; go to 3.4. Otherwise, the traversal is not complete; go to 5.8.
5.8. Apply path compression to the parent-child relationships of the vertices in the connected component son[root[e_b.v0_low]], whose root vertex is root[e_b.v0_low]: make each element of son[root_v2] a direct child vertex of root[e_b.v0_low], so that no intermediate-level vertex exists between any element of son[root_v2] and the root vertex of e_b.v0_low. The path compression method is: set the root vertex of the i-th element of son[root_v2], i.e., of vertex son[root_v2][i], to root[e_b.v0_low], i.e., let root[son[root_v2][i]] = root[e_b.v0_low], and insert vertex son[root_v2][i] into the connected component son[root[e_b.v0_low]]. FIG. 5 is a schematic diagram of son vector storage according to the present invention, in which each son vector stores all child vertices attached to a root vertex. In FIG. 5, because vector son[v_0] has fewer elements than vector son[v_2], its elements must be copied into son[v_2]: the elements of son[v_0] are appended in order to the end of son[v_2]. son[v_0][0], son[v_0][1], son[v_0][2], ... are the IDs of the vertices whose root vertex was v_0 before the merge; after the merge they are copied to son[v_2][m], son[v_2][m+1], son[v_2][m+2], ... (m is the starting index of the newly added elements in son[v_2]).
FIG. 4 is a schematic diagram of the fifth step, in which, for the case of different root vertices, path compression is applied to the parent-child relationships of the vertices and two connected components with different root vertices are merged; it corresponds to operations 5.5-5.8 of the present invention. It shows that when the edge formed by v_0 and v_2 is accessed, the connected component stored in son[v_0] and the connected component stored in son[v_2] are merged. Because son[v_0] contains fewer vertices, its vertices are attached directly under the root vertex of the connected component stored in son[v_2]; that is, vertices son[v_0][0], son[v_0][1], son[v_0][2], ... become child vertices of son[v_2][0] and are copied to son[v_2][m], son[v_2][m+1], son[v_2][m+2], ..., which reduces the number of levels in the tree.
5.9. Let i = i + 1; go to 5.7.
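The third to fifth steps can be combined into one illustrative Python sketch. It is a sketch under stated assumptions, not the patented implementation: the function names merge_diff_root and connected_components are hypothetical, edges are plain (v0, v1) tuples rather than packed_edge structs, the lists are indexed from 0 as in FIG. 5, and the loop of 5.6-5.9 is written so that it visits every element of son[root_v2].

```python
def merge_diff_root(v0, v1, root, son):
    """Fifth step: root[v0] != root[v1]."""
    if root[v0] == -1:
        # 5.2: only v1 has been visited; attach v0 to v1's component.
        root[v0] = root[v1]
        son[root[v1]].append(v0)
        return
    if root[v1] == -1:
        # 5.4: only v0 has been visited; attach v1 to v0's component.
        root[v1] = root[v0]
        son[root[v0]].append(v1)
        return
    # 5.5: make v0 the vertex whose component is not the smaller one.
    if len(son[root[v0]]) < len(son[root[v1]]):
        v0, v1 = v1, v0
    new_root, root_v2 = root[v0], root[v1]  # 5.6
    # 5.7-5.9: re-parent every vertex of the smaller component directly
    # under new_root (path compression) and append it to son[new_root].
    for child in son[root_v2]:
        root[child] = new_root
        son[new_root].append(child)

def connected_components(n, edges):
    """Third to fifth steps over a list of (v0, v1) pairs."""
    root = [-1] * n
    son = [[] for _ in range(n)]
    for v0, v1 in edges:
        if v0 == v1:
            continue  # 3.5: skip self-loop edges
        if root[v0] == root[v1]:
            if root[v0] == -1:  # fourth step, 4.1-4.3
                root[v0] = root[v1] = v0
                son[v0] += [v0, v1]
        else:
            merge_diff_root(v0, v1, root, son)
    return root, son

root, son = connected_components(
    7, [(0, 1), (2, 3), (3, 4), (1, 1), (4, 0), (5, 5)])
```

In this example the edge (4, 0) merges the two-vertex component {0, 1} into the three-vertex component rooted at vertex 2, after which every vertex of the merged component has root value 2. Vertices 5 and 6 never appear in a non-self-loop edge and keep the root value -1. The absorbed list son[0] is left in place; it is no longer reachable as a component because root[0] is no longer 0.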
Sixth step, find the information of all connected components in the graph G and partition G by connected components. The graph G partitioned by connected components is distributed to the processing nodes of the supercomputer using the scatter operation.
The method is as follows:
6.1. Set the vertex number j and the connected component number k; initialize j = 0 and k = 1.
6.2. If j is equal to N-1, all vertices have been traversed and k is now the number of connected components in the graph G; go to the seventh step. Otherwise, go to 6.3.
6.3. Determine whether the root vertex of vertex v_j is v_j itself. If root[v_j] ≠ v_j, vertex v_j is not the root vertex of a connected component; go to 6.5. Otherwise, vertex v_j is the root vertex of the k-th connected component and son[v_j] stores all vertices of the k-th connected component, so all vertices of the k-th connected component are obtained. Let k = k + 1. Go to 6.4.
6.4. Distribute the vertices in son[v_j] to the same or nearby physical nodes of the supercomputer using the scatter operation.
6.5. Let j = j + 1; go to 6.2.
FIG. 6 is a schematic diagram of graph partitioning using connected components in the sixth step of the present invention. The storage nodes of the supercomputer system are represented by boxes, and CCk (k = 1, 2, 3, ...) in the boxes denotes the connected components of the graph G. The numbers on the connecting lines identify the actual access order, and the "..." before and after the numbers indicates the complexity of the path and the number of network hops required: the more "...", the longer the communication path and the more hops required. When the graph is partitioned by connected components, for a vertex v_i, assume that the connected component containing v_i is CC1 (k = 1) and that the root vertex of v_i is root[v_i]; all vertices of the connected component containing v_i can be obtained from son[root[v_i]]. The other vertices of CC1 are then placed on nodes close to the node holding vertex v_i, and these nodes are accessed preferentially.
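The enumeration of connected components in the sixth step can be sketched in Python as below. The sketch only computes which node each vertex would be assigned to; it performs no actual scatter communication. The function name partition_by_component, the 0-based component counter, and the round-robin choice of node are assumptions for illustration, since the patent text only requires that vertices of one component land on the same or nearby physical nodes.

```python
def partition_by_component(root, son, num_nodes):
    """Return (owner, k): owner maps vertex -> node id, k counts components."""
    owner = {}
    k = 0
    for v in range(len(root)):
        if root[v] != v:
            continue  # 6.3: v is not the root vertex of a component
        node = k % num_nodes  # assumption: round-robin node selection
        for u in son[v]:      # 6.4: keep one component on one node
            owner[u] = node
        k += 1
    return owner, k

# Example state: component {0,1,2,3,4} rooted at 2, component {6,7}
# rooted at 6, and vertex 5 never visited (root value -1).
root = [2, 2, 2, 2, 2, -1, 6, 6]
son = [[], [], [2, 3, 4, 0, 1], [], [], [], [6, 7], []]
owner, k = partition_by_component(root, son, num_nodes=2)
```

Vertices whose root value is still -1 belong to no component found by the earlier steps and are left unassigned in this sketch; how they are placed is not specified here.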
Seventh step, BFS search and verification: randomly generate a root vertex v and, taking v as the source, perform a BFS search on the graph G partitioned by connected components in the sixth step, using the edge information stored in the adjacency matrix A constructed in the second step. Output the spanning tree as the search result, record the Graph500 effective timing time t, and verify that the BFS spanning tree obtained by the search matches the original graph information. This process is repeated 64 times, and the BFS search portion of each repetition is timed separately.
Eighth step, compute the graph test performance evaluation value: take the average of the performance values (TEPS) measured over the BFS traversals of the 64 spanning trees as the test result and output it. A higher TEPS value indicates that the supercomputer has stronger large-scale graph processing capability, ranks higher in Graph500, and is better suited to big-data processing.
Ninth step: end.
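The seventh and eighth steps can be illustrated with a small single-process Python sketch. It is not the Graph500 reference code: the function names bfs_parents, tree_matches_graph and mean_teps are hypothetical, the check covers only one property (every tree edge exists in A), and the arithmetic mean is an assumption based on the wording "average" in the eighth step.

```python
from collections import deque

def bfs_parents(adj, source):
    """BFS over a dense 0/1 adjacency matrix; parent[v] = -1 if unreached."""
    n = len(adj)
    parent = [-1] * n
    parent[source] = source
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in range(n):
            if adj[u][w] and parent[w] == -1:
                parent[w] = u
                queue.append(w)
    return parent

def tree_matches_graph(adj, parent, source):
    """Check that every spanning-tree edge (parent[v], v) exists in adj."""
    return all(p == -1 or v == source or adj[p][v] == 1
               for v, p in enumerate(parent))

def mean_teps(edge_counts, times):
    """Arithmetic mean of traversed edges per second over all BFS runs."""
    rates = [m / t for m, t in zip(edge_counts, times)]
    return sum(rates) / len(rates)

# Path 0 - 1 - 2 plus an isolated vertex 3.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 0]]
parent = bfs_parents(adj, source=0)
```

In a full run, bfs_parents would be called for each of the 64 randomly chosen root vertices, each call would be timed, and mean_teps would combine the per-run rates.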

Claims (2)

1. A supercomputer benchmark test acceleration method based on connected component generation optimization, characterized by comprising the following steps:
first step, graph generation: generating a random graph structure G = (V, E) by the Kronecker graph generator, wherein V is the vertex set and E is the edge set; the size of the graph is determined by the user-supplied parameters scale and edgefactor, wherein scale specifies the vertex scale of the graph and edgefactor specifies the average number of edges per vertex; N = 2^scale is the number of vertices of G, i.e., the number of elements of V, and M = edgefactor × N is the number of edges of G, i.e., the number of elements of E; v_i denotes the vertex numbered i in G, and the vertex pair (v_i, v_j) denotes the edge from vertex v_i to vertex v_j; (v_i, v_j) ∈ E, and i and j are integers with 0 ≤ i ≤ N-1 and 0 ≤ j ≤ N-1;
second step, constructing the adjacency matrix A that stores the graph G, wherein A_ij = 0 means that there is no edge between vertex v_i and vertex v_j, and A_ij = 1 means that there is an edge between vertex v_i and vertex v_j;
third step, initializing the data structures: setting the root vertex and child vertices of every vertex in V to their initial values, traversing the edge set E, discarding self-loop edges in E, i.e., edges connecting a vertex to itself, and classifying each remaining edge according to the state of its two vertices, the method being as follows:
3.1. according to the data scale of the graph G, initializing the root vertex vector root and the two-dimensional child vertex vector son for all vertices in V, wherein root contains N elements, and root[v_i] denotes the root vertex of vertex v_i and is initialized to -1; son is a two-dimensional vector containing N elements, each of which is a vector, and every element of son is initialized to an empty vector; son[v_i] is the child vertex vector of vertex v_i and stores the set of vertices whose root vertex is v_i, i.e., the vertex information of the connected component rooted at v_i, a connected component being the content stored in the son vector of a vertex whose root value is the vertex itself; initializing the variable e = 1;
3.2. creating a struct packed_edge consistent with the edge data storage format in the Graph500 source code, wherein packed_edge contains three int variables: the first variable v0_low is the ID of the first vertex of an edge, the second variable v1_low is the ID of the second vertex of the edge, which is connected to the first vertex, and the third variable high is reserved for functional extension; e_b.v0_low denotes the first vertex of edge e_b, whose ID is v0_low, and e_b.v1_low denotes the second vertex of edge e_b, whose ID is v1_low;
3.3. creating an edge e_b of type packed_edge to store the edge information read from E;
3.4. if e > M, the edge set E has been fully processed, going to the sixth step; otherwise, reading the e-th edge from E in order, letting e_b be the e-th edge, wherein e_b.v0_low and e_b.v1_low are the two vertices that form e_b, letting e = e + 1, and going to 3.5;
3.5. if e_b.v0_low ≠ e_b.v1_low, the edge connecting e_b.v0_low and e_b.v1_low is not a self-loop edge, going to 3.6; otherwise, the edge is a self-loop edge, going directly to 3.4;
3.6. determining whether the root vertices of e_b.v0_low and e_b.v1_low are the same: if root[e_b.v0_low] = root[e_b.v1_low], going to the fourth step; if root[e_b.v0_low] ≠ root[e_b.v1_low], going to the fifth step;
fourth step, handling the case in which the root vertices are the same: if neither e_b.v0_low nor e_b.v1_low has been visited, merging e_b.v0_low and e_b.v1_low into the connected component son[e_b.v0_low] with e_b.v0_low as the root vertex; if both e_b.v0_low and e_b.v1_low have been visited, they are already in the same connected component, so e_b is skipped and the next edge in E is accessed, the method being as follows:
4.1. if root[e_b.v0_low] = -1, neither e_b.v0_low nor e_b.v1_low has been visited and the edge formed by e_b.v0_low and e_b.v1_low is accessed for the first time, going to 4.2; otherwise, both e_b.v0_low and e_b.v1_low have been visited and are already in the same connected component, no merging operation is required, going to 3.4;
4.2. merging e_b.v1_low into the connected component with e_b.v0_low as the root vertex: setting e_b.v0_low as the root vertex of both e_b.v0_low and e_b.v1_low by setting the corresponding elements of the root vector, i.e., letting root[e_b.v0_low] = e_b.v0_low and root[e_b.v1_low] = e_b.v0_low;
4.3. inserting the IDs of e_b.v0_low and e_b.v1_low into the son vector corresponding to e_b.v0_low, thereby adding new vertex information to the connected component son[e_b.v0_low], i.e., inserting e_b.v0_low into son[e_b.v0_low] and inserting e_b.v1_low into son[e_b.v0_low]; going to 3.4;
fifth step, handling the case in which the root vertices are different: applying path compression to the parent-child relationships of the vertices and merging the two connected components with different root vertices, the method being as follows:
5.1. if root[e_b.v0_low] = -1, going to 5.2; otherwise, going to 5.3;
5.2. at this point vertex e_b.v1_low has been visited and vertex e_b.v0_low has not; inserting e_b.v0_low into the connected component son[root[e_b.v1_low]] corresponding to the root vertex of e_b.v1_low, i.e., merging e_b.v0_low into the connected component whose root vertex is root[e_b.v1_low], and changing the root vertex of e_b.v0_low to the root vertex of e_b.v1_low, i.e., letting root[e_b.v0_low] = root[e_b.v1_low]; the insertion and the root change together complete the merging operation, and new vertex information is added to the connected component rooted at root[e_b.v1_low]; going to 3.4;
5.3. if root[e_b.v1_low] = -1, going to 5.4; otherwise, going to 5.5;
5.4. at this point vertex e_b.v0_low has been visited and vertex e_b.v1_low has not; inserting e_b.v1_low into the connected component son[root[e_b.v0_low]] corresponding to the root vertex of e_b.v0_low, i.e., merging e_b.v1_low into the connected component whose root vertex is root[e_b.v0_low], and changing the root vertex of e_b.v1_low to the root vertex of e_b.v0_low, i.e., letting root[e_b.v1_low] = root[e_b.v0_low]; the insertion and the root change together complete the merging operation, and new vertex information is added to the connected component rooted at root[e_b.v0_low]; going to 3.4;
5.5. at this point root[e_b.v0_low] ≠ -1 and root[e_b.v1_low] ≠ -1; comparing the numbers of vertices of the connected components containing e_b.v0_low and e_b.v1_low, i.e., the numbers of elements of son[root[e_b.v0_low]] and son[root[e_b.v1_low]]; if son[root[e_b.v0_low]].size < son[root[e_b.v1_low]].size, wherein size denotes the number of elements of a vector, son[root[e_b.v0_low]].size is the number of child vertices of the root vertex of e_b.v0_low, and son[root[e_b.v1_low]].size is the number of child vertices of the root vertex of e_b.v1_low, swapping the IDs of the two vertices so that the number of elements of son[root[e_b.v0_low]] is not smaller than the number of elements of son[root[e_b.v1_low]]; going to 5.6;
5.6. denoting the root vertex of e_b.v1_low as root_v2, i.e., letting root_v2 = root[e_b.v1_low], setting the loop variable i, and initializing i to 1;
5.7. if i is equal to son[root_v2].size, going to 3.4; otherwise, going to 5.8;
5.8. applying path compression to the parent-child relationships of the vertices in the connected component son[root[e_b.v0_low]], whose root vertex is root[e_b.v0_low]: making each element of son[root_v2] a direct child vertex of root[e_b.v0_low], so that no intermediate-level vertex exists between any element of son[root_v2] and the root vertex of e_b.v0_low; the path compression method being: setting the root vertex of the i-th element of son[root_v2], i.e., of vertex son[root_v2][i], to root[e_b.v0_low], i.e., letting root[son[root_v2][i]] = root[e_b.v0_low], and inserting vertex son[root_v2][i] into the connected component son[root[e_b.v0_low]];
5.9. letting i = i + 1, going to 5.7;
sixth step, finding the information of all connected components in the graph G, partitioning G by connected components, and distributing the graph G partitioned by connected components to the processing nodes of the supercomputer using the scatter operation;
seventh step, BFS search and verification: randomly generating a root vertex v and, taking v as the source, performing a BFS search on the graph G partitioned by connected components, using the edge information stored in the adjacency matrix A; outputting the spanning tree as the search result, recording the Graph500 effective timing time t, and verifying that the BFS spanning tree obtained by the search matches the original graph information; repeating this process 64 times and timing the BFS search portion of each repetition;
eighth step, computing the graph test performance evaluation value: taking the average of the performance values measured over the BFS traversals of the 64 spanning trees as the test result and outputting it;
ninth step, ending.
2. The supercomputer benchmark test acceleration method based on connected component generation optimization according to claim 1, characterized in that, in the sixth step, the method of finding the information of all connected components in the graph G, partitioning G by connected components, and distributing the graph G partitioned by connected components to the processing nodes of the supercomputer using the scatter operation is as follows:
6.1. setting the vertex number j and the connected component number k, initializing j = 0 and k = 1;
6.2. if j is equal to N-1, all vertices have been traversed and k is now the number of connected components in the graph G, going to the seventh step; otherwise, going to 6.3;
6.3. determining whether the root vertex of vertex v_j is v_j itself: if root[v_j] ≠ v_j, vertex v_j is not the root vertex of a connected component, going to 6.5; otherwise, vertex v_j is the root vertex of the k-th connected component and son[v_j] stores all vertices of the k-th connected component, so that all vertices of the k-th connected component are obtained; letting k = k + 1; going to 6.4;
6.4. distributing the vertices in son[v_j] to the same or nearby physical nodes of the supercomputer using the scatter operation;
6.5. letting j = j + 1, going to 6.2.
CN202110293568.7A 2021-03-19 2021-03-19 Supercomputer benchmark test acceleration method based on connected component generation optimization Active CN112883241B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110293568.7A CN112883241B (en) 2021-03-19 2021-03-19 Supercomputer benchmark test acceleration method based on connected component generation optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110293568.7A CN112883241B (en) 2021-03-19 2021-03-19 Supercomputer benchmark test acceleration method based on connected component generation optimization

Publications (2)

Publication Number Publication Date
CN112883241A true CN112883241A (en) 2021-06-01
CN112883241B CN112883241B (en) 2022-09-09

Family

ID=76041241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110293568.7A Active CN112883241B (en) 2021-03-19 2021-03-19 Supercomputer benchmark test acceleration method based on connected component generation optimization

Country Status (1)

Country Link
CN (1) CN112883241B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115051936A (en) * 2022-03-31 2022-09-13 中国电子科技集团公司第十五研究所 Multi-graph-based connected component increment calculation method
CN116069603A (en) * 2021-09-28 2023-05-05 华为技术有限公司 Performance test method of application, method and device for establishing performance test model
CN117056978A (en) * 2023-08-30 2023-11-14 西安电子科技大学 Security union checking method based on arithmetic sharing and operation method thereof

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115858766B (en) * 2023-03-01 2023-05-05 中国人民解放军国防科技大学 Interest propagation recommendation method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107193899A (en) * 2017-05-10 2017-09-22 华中科技大学 A kind of friendly strongly connected graph division methods of nomography
US20180203897A1 (en) * 2017-01-18 2018-07-19 Oracle International Corporation Fast graph query engine optimized for typical real-world graph instances whose small portion of vertices have extremely large degree
CN109656798A (en) * 2018-12-26 2019-04-19 中国人民解放军国防科技大学 Vertex reordering-based big data processing capability test method for supercomputer
CN109684185A (en) * 2018-12-26 2019-04-26 中国人民解放军国防科技大学 Heuristic traversal-based big data processing capacity test method for supercomputer
CN111881327A (en) * 2020-07-30 2020-11-03 中国人民解放军国防科技大学 Big data processing capacity testing method based on vertex reordering and priority caching

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
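Heng Dongdong et al.: "Design, implementation, and test analysis of the BFS algorithm on a parallel prototype system", Computer Engineering & Science *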

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116069603A (en) * 2021-09-28 2023-05-05 Huawei Technologies Co., Ltd. Performance test method of application, method and device for establishing performance test model
CN116069603B (en) * 2021-09-28 2023-12-08 Huawei Technologies Co., Ltd. Performance test method of application, method and device for establishing performance test model
CN115051936A (en) * 2022-03-31 2022-09-13 15th Research Institute of China Electronics Technology Group Corporation Multi-graph-based incremental connected component calculation method
CN117056978A (en) * 2023-08-30 2023-11-14 Xidian University Secure union-find method based on arithmetic sharing and operation method thereof

Also Published As

Publication number Publication date
CN112883241B (en) 2022-09-09

Similar Documents

Publication Publication Date Title
CN112883241B (en) Supercomputer benchmark test acceleration method based on connected component generation optimization
Chen et al. Topological properties, communication, and computation on WK‐recursive networks
CN112165405B (en) Method for testing big data processing capacity of supercomputer based on network topological structure
CN109522428B (en) External memory access method of graph computing system based on index positioning
CN109656798B (en) Vertex reordering-based big data processing capability test method for supercomputer
CN112100450A (en) Graph calculation data segmentation method, terminal device and storage medium
CN113449153B (en) Index construction method, apparatus, computer device and storage medium
CN104731925A (en) MapReduce-based FP-Growth load balance parallel computing method
CN114567634B (en) Method, system, storage medium and electronic device for post-exascale graph computing
CN111369052B (en) Simplified road network KSP optimization algorithm
Stergiou et al. Multiple-value exclusive-or sum-of-products minimization algorithms
CN111881327A (en) Big data processing capacity testing method based on vertex reordering and priority caching
Considine Cluster-based optimizations for distributed hash tables
CN112765409B (en) Distributed community discovery method based on modularity
Lai et al. Exploiting and evaluating MapReduce for large-scale graph mining
CN113726342B (en) Segmented difference compression and lazy decompression method for large-scale graph iterative computation
CN116613892B (en) Device incremental topology analysis method, device, computer device and storage medium
de Alencar Vasconcellos et al. A new efficient parallel algorithm for minimum spanning tree
CN114281830B (en) Rule mapping table construction method, rule matching method and device for multi-attribute conditions
Fantozzi et al. A general PRAM simulation scheme for clustered machines
Dou et al. Distributed Construction of Near-Optimal Compact Routing Schemes for Planar Graphs
CN115310613A (en) Quantum computing platform adaptation method and device and quantum computer operating system
Herley et al. Implementing shared memory on mesh-connected computers and on the fat-tree
Pietracaprina et al. Constructive, deterministic implementation of shared memory on meshes
CN118151861A (en) Partition determination method and device for distributed storage of graphs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant