CN109656798A

CN109656798A - Vertex reordering-based big data processing capability test method for supercomputer

Info

Publication number: CN109656798A
Application number: CN201811600894.2A
Authority: CN
Inventors: 甘新标; 曾瑞庚; 吴涛; 杨志辉; 孙泽文; 刘杰; 龚春叶; 李胜国; 杨博; 徐涵; 晏益慧
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2018-12-26
Filing date: 2018-12-26
Publication date: 2019-04-19
Anticipated expiration: 2038-12-26
Also published as: CN109656798B

Abstract

The invention discloses a vertex-reordering-based method for testing the big data processing capability of a supercomputer, with the aim of speeding up such tests. The technical solution is as follows. A graph is generated and its adjacency matrix is constructed, the vertices of the graph are sorted by vertex degree, and a BFS search is performed on the graph using the sorted vertex set. During same-layer traversal, the method exploits the property that high-degree vertices are more likely to share an edge with a given vertex: when searching for the child nodes of each vertex in the current-layer vertex set, high-degree vertices are checked first, which minimizes invalid access traversal. The invention raises the hit rate when probing for edges between vertices, reduces the number of invalid accesses, avoids unnecessary memory accesses as far as possible, speeds up graph traversal, and thereby increases the speed of testing the big data processing capability of a supercomputer.

Description

Vertex-reordering-based method for testing the big data processing capability of supercomputers

Technical field

The present invention relates to methods for testing the big data processing capability of supercomputers, and in particular to a vertex-reordering-based method for testing the big data processing capability of supercomputers.

Background art

Graphs are one of the most important data structures in big data applications and are widely used in fields such as social media, bioinformatics, astrophysics, artificial intelligence and data mining. These applications share a large data volume and a complex structure, often reaching billions of edges and trillions of vertices, which places heavy demands on data storage and computing power. Supercomputers are mainly used for numerical computation, and most HPC benchmarks take computing power as their metric, such as the HPL benchmark used by the Top500. In the big data era, with data-intensive applications on the rise, Graph500 has become an important complement to the Top500 and a new benchmark for testing the computing capability of supercomputers. Graph500 measures the big data processing capability of a supercomputer by TEPS (Traversed Edges Per Second), the number of edges traversed per second, and the preprocessing performed before traversal is not included in the timing.

The Graph500 benchmark consists of four parts: graph generation, graph construction, BFS search and verification, and result output, as shown in Fig. 1.

(1) Graph generation: the program generates a series of edge tuples with a Kronecker graph generator. The size of the graph is determined by the user-supplied parameters SCALE and edgefactor, where SCALE sets the vertex scale of the graph and edgefactor is the average number of edges per vertex: N = 2^SCALE is the number of vertices of the input graph, and M = edgefactor × N is the number of edges of the input graph.
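The graph-generation step can be illustrated with a minimal, simplified Kronecker (R-MAT style) edge sampler. This is a sketch, not the Graph500 reference generator: the function name `kronecker_edges` is hypothetical, the initiator probabilities A = 0.57, B = 0.19, C = 0.19 (so D = 0.05) are the values given in the Graph500 specification and are used here as defaults, and the vertex-label permutation and edge shuffling of the reference code are omitted.

```python
import random


def kronecker_edges(scale, edgefactor, a=0.57, b=0.19, c=0.19, seed=1):
    """Generate M = edgefactor * N directed edge tuples over N = 2**scale vertices.

    Hypothetical helper: a simplified R-MAT/Kronecker sampler that picks one
    quadrant of the adjacency matrix for each bit of the vertex index.
    """
    rng = random.Random(seed)
    n = 1 << scale              # N = 2^SCALE vertices
    m = edgefactor * n          # M = edgefactor * N edges
    edges = []
    for _ in range(m):
        src = dst = 0
        for bit in range(scale):
            r = rng.random()
            if r < a:               # quadrant A: both bits stay 0
                continue
            if r < a + b:           # quadrant B: destination bit set
                dst |= 1 << bit
            elif r < a + b + c:     # quadrant C: source bit set
                src |= 1 << bit
            else:                   # quadrant D (probability 1 - a - b - c): both bits set
                src |= 1 << bit
                dst |= 1 << bit
        edges.append((src, dst))
    return n, edges


n, edges = kronecker_edges(scale=4, edgefactor=16)
print(n, len(edges))  # 16 256
```

Self-loops and duplicate edges can occur with this kind of sampler, as they can with the Graph500 generator; they are left in place here.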

(2) Graph construction: this process converts the vertex and edge information produced in the graph-generation stage into a data structure representing the graph (any representation may be used); standard Graph500, as described here, stores the graph information using the adjacency matrix of the graph.

(3) BFS (Breadth-First Search) search and verification: a root vertex is chosen at random and used as the source for a BFS over the whole graph. The predecessor of each vertex is recorded and the resulting spanning tree is output as the search result; the BFS search time t is recorded, and the BFS spanning tree obtained by the search is checked against the original graph information. This process is repeated 64 times, and each BFS search is timed separately.

(4) Result output: Graph500 measures the execution performance of the program by TEPS, the number of edges traversed per second. TEPS equals the number of edges M of the generated graph divided by the BFS search time t; TEPS = M/t is computed separately for each of the 64 search runs, and the average of the 64 TEPS values is used as the final Graph500 result and the basis for ranking.
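The result-output arithmetic can be written out as follows. This sketch follows the text literally (per-run TEPS = M / t, then the arithmetic mean over the runs); the function names and the timings are hypothetical. The official Graph500 specification, by contrast, summarizes the runs with the harmonic mean of TEPS.

```python
def teps(num_edges, bfs_seconds):
    """Traversed edges per second for one BFS run: TEPS = M / t."""
    if bfs_seconds <= 0:
        raise ValueError("BFS time must be positive")
    return num_edges / bfs_seconds


def average_teps(num_edges, bfs_times):
    """Arithmetic mean of the per-run TEPS values (64 runs in Graph500)."""
    values = [teps(num_edges, t) for t in bfs_times]
    return sum(values) / len(values)


# Hypothetical timings for illustration only (two runs instead of 64).
print(average_teps(1024, [0.5, 0.25]))  # 3072.0
```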

A graph G = (V, E) consists of a vertex set V and an edge set E. The vertex numbered i is usually written v_i, and the vertex pair (v_i, v_j) denotes the edge from vertex i to vertex j, where (v_i, v_j) ∈ E, 0 ≤ i ≤ N_V − 1, 0 ≤ j ≤ N_V − 1, and N_V is the number of vertices in V. G is usually represented by an adjacency matrix A, whose i-th row A_i is the adjacency list of v_i. As shown in Fig. 2, a graph G such as the one in Fig. 2(a) can be represented by the adjacency matrix A in Fig. 2(b), where the element A_ij in row i, column j represents the edge (v_i, v_j): 1 means that the edge exists and 0 means that it does not.
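A minimal sketch of this adjacency-matrix representation, assuming a directed, unweighted graph with vertices numbered 0 to N_V − 1. The function name and the four-vertex example graph are hypothetical; the example is not the graph of Fig. 2.

```python
def build_adjacency_matrix(num_vertices, edges):
    """Return A with A[i][j] = 1 if edge (v_i, v_j) exists, otherwise 0."""
    A = [[0] * num_vertices for _ in range(num_vertices)]
    for i, j in edges:
        A[i][j] = 1
    return A


# Hypothetical directed graph: v0->v1, v0->v2, v1->v2, v2->v3
A = build_adjacency_matrix(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
for row in A:
    print(row)
# [0, 1, 1, 0]
# [0, 0, 1, 0]
# [0, 0, 0, 1]
# [0, 0, 0, 0]
```

A dense N_V × N_V matrix holds N_V² entries, so this representation is only practical for small graphs; row A[i] is the adjacency list of v_i in 0/1 form.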

In summary, the BFS search in Graph500 must traverse every vertex in V in turn and examine its relationship to the root node. Memory bandwidth is therefore the key factor in Graph500 performance. Moreover, Graph500 counts only the BFS search time in its measurement and places no limit on the graph data preparation and preprocessing time before the BFS. Experiments show that Graph500 performance is mainly limited by memory capacity and memory bandwidth: the higher the bandwidth, the better the performance. How to reduce unnecessary memory accesses and raise the proportion of useful accesses at a fixed memory bandwidth, and so improve Graph500 test performance, has become a technical problem that those skilled in the art urgently need to solve.

Summary of the invention

The technical problem to be solved by the present invention is as follows. Exploiting the facts that the preprocessing before BFS traversal is not counted in the timing and that high-degree vertices are more likely to share an edge with the root node, the invention proposes a vertex-reordering-based method for testing the big data processing capability of supercomputers. Compared with Graph500, it reduces the number of memory accesses, avoids invalid memory accesses as far as possible, speeds up graph traversal, and increases the speed of testing the big data processing capability of supercomputers.

The specific technical solution is as follows:

The first step, graph generation: a random graph G = (V, E) is generated with a Kronecker graph generator, where V is the vertex set containing N_V vertices, N_V being a positive integer, and E is the edge set containing N_E edges, N_E being a positive integer;

The second step: construct the adjacency matrix A that stores graph G. A_ij = 0 means there is no edge between vertex i and vertex j, and A_ij = 1 means there is an edge between vertex i and vertex j, where 0 ≤ i ≤ N_V − 1, 0 ≤ j ≤ N_V − 1, and i and j are non-negative integers;

The third step: preprocess V based on vertex degree, as follows:

3.1. Traverse every vertex in V and record its degree, obtaining the vertex-degree set D. The i-th element of D, deg(v_i), is the degree of vertex v_i, i.e. deg(v_i) vertices share an edge with v_i;

3.2. Sort the vertices in V: sort the elements of D in descending order using the bubble sort algorithm to obtain the sorted set of vertex-degree pairs $D2 = \{\langle v_0, \deg(v_0)\rangle, \langle v_1, \deg(v_1)\rangle, \ldots, \langle v_{N_V-1}, \deg(v_{N_V-1})\rangle\}$, where the i-th element $\langle v_i, \deg(v_i)\rangle$ of D2 states that vertex v_i has degree deg(v_i), and $\deg(v_0) \ge \deg(v_1) \ge \cdots \ge \deg(v_{N_V-1})$. Then sort the vertices in V according to D2 to obtain the sorted vertex set $Deg = \{v_0, v_1, \ldots, v_{N_V-1}\}$. The first element v₀ of Deg is the vertex with the largest degree; the second element v₁ is the vertex whose degree is second only to, or equal to, that of v₀; vertices of equal degree are listed one after another; and the last element $v_{N_V-1}$ is the vertex with the smallest degree;
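Steps 3.1 and 3.2 can be sketched together as below, assuming the adjacency matrix is a list of 0/1 rows and that deg(v_i) is the number of 1s in row i. The function name `degree_sort` and the example graph are hypothetical; D2 and Deg are returned as Python lists so that their order is explicit.

```python
def degree_sort(A):
    """Steps 3.1-3.2 sketch: return (D2, Deg) ordered by non-increasing degree."""
    n = len(A)
    # 3.1: deg(v_i) = number of vertices v_j with A[i][j] == 1 (row sum).
    D2 = [(i, sum(A[i])) for i in range(n)]
    # 3.2: bubble sort on the degree field, largest first.
    for end in range(n - 1, 0, -1):
        swapped = False
        for k in range(end):
            # Strict '<' swaps only when the right-hand degree is larger,
            # so vertices of equal degree keep their original relative order.
            if D2[k][1] < D2[k + 1][1]:
                D2[k], D2[k + 1] = D2[k + 1], D2[k]
                swapped = True
        if not swapped:
            break  # already sorted; skip the remaining passes
    Deg = [v for v, _ in D2]
    return D2, Deg


# Hypothetical undirected 4-vertex graph (symmetric matrix).
A = [[0, 1, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 0]]
D2, Deg = degree_sort(A)
print(D2)   # [(1, 3), (2, 2), (3, 2), (0, 1)]
print(Deg)  # [1, 2, 3, 0]
```

Bubble sort needs O(N_V²) comparisons, which is tolerable here only because the benchmark does not time preprocessing. A library call such as `sorted(D2, key=lambda p: p[1], reverse=True)` yields the same stable ordering.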

The fourth step: perform a BFS search on graph G using the sorted vertex set Deg, as follows:

4.1. Define the data structures, as follows:

4.1.1. Define the set of not-yet-visited vertices V_ns = V;

4.1.2. Define the intermediate vertex-degree set for the BFS search, D-tmp = D2;

4.1.3. Define the intermediate sorted vertex set Deg-tmp = Deg;

4.1.4. Define the set of visited vertices V_s = ∅;

4.1.5. Define the current-layer vertex set Cur = ∅;

4.1.6. Define the current-layer child-node set L-Son = ∅;

4.1.7. Define the child-node set Son_i = ∅, which denotes the set of child nodes of vertex v_i;

4.1.8. Choose a vertex v_r at random from V as the root vertex, i.e. the source vertex, r ∈ {0, 1, …, N_V − 1};

4.1.9. Let Son_r = ∅ be the set of child-node sets of the root vertex; the elements of Son_r are sets;

4.1.10. Add vertex v_r to the visited vertex set, i.e. V_s = V_s + {v_r};

4.1.11. Add vertex v_r to the current-layer vertex set, i.e. Cur = Cur + {v_r};
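The data-structure setup of step 4.1 can be sketched as a single state dictionary. The function `init_bfs_state` and the dict layout are hypothetical choices; Python lists stand in for the ordered sets D2 and Deg, and Son_r from 4.1.9 is stored under the key "tree" so that it does not clash with Son[r] from 4.1.7.

```python
import random


def init_bfs_state(D2, Deg, seed=None):
    """Sketch of step 4.1: create the working sets for one degree-ordered BFS.

    The keys mirror the names in the text; the dict layout is a hypothetical
    choice. Vertices are the integers 0 .. N_V - 1.
    """
    n = len(Deg)
    r = random.Random(seed).randrange(n)   # 4.1.8: random root v_r, 0 <= r <= N_V - 1
    state = {
        "V_ns": set(range(n)),             # 4.1.1: not-yet-visited vertices (V_ns = V)
        "D_tmp": list(D2),                 # 4.1.2: working copy of D2
        "Deg_tmp": list(Deg),              # 4.1.3: working copy of Deg
        "V_s": set(),                      # 4.1.4: visited vertices
        "Cur": set(),                      # 4.1.5: current-layer vertices
        "L_Son": set(),                    # 4.1.6: children found in the current layer
        "Son": {i: set() for i in range(n)},  # 4.1.7: Son_i, children of v_i
        "root": r,
        "tree": [],                        # 4.1.9: Son_r, a collection of child sets
    }
    state["V_s"].add(r)                    # 4.1.10: V_s = V_s + {v_r}
    state["Cur"].add(r)                    # 4.1.11: Cur = Cur + {v_r}
    return state


state = init_bfs_state([(1, 3), (2, 2), (3, 2), (0, 1)], [1, 2, 3, 0], seed=42)
print(state["Cur"] == state["V_s"] == {state["root"]})  # True
```

As in the text, the root is not removed from V_ns here; that happens when it is expanded in 4.3.7.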

4.2. Loop: each iteration outputs one spanning tree; the loop runs 64 times and outputs 64 spanning trees, as follows:

4.2.1. Define the loop variable k = 0;

4.2.2. Obtain the system time t₁;

4.2.3. If k < 64, go to 4.3; otherwise, go to the fifth step;

4.3. Same-layer traversal: exploiting the property that high-degree vertices are more likely to share an edge, search for the child nodes of the vertices in Cur, as follows:

4.3.1. Let …;

4.3.2. If Cur ≠ ∅, execute 4.3.3; otherwise, the search of this layer is complete, so go to 4.4 to traverse the next layer;

4.3.3. Take any vertex v_i in Cur and denote it as the current root node v_cs, cs ∈ {0, 1, …, N_V − 1};

4.3.4. Delete v_cs from Cur, i.e. Cur = Cur − {v_cs};

4.3.5. If …, execute 4.3.6, which first traverses and checks the high-degree vertices; otherwise, the search of the current root node is finished, so go to 4.3.20;

4.3.6. Query the set D2 for the pair $\langle v_{cs}, \deg(v_{cs})\rangle$ to confirm that the degree of v_cs is deg(v_cs);

4.3.7. Delete the current root node v_cs from the not-yet-visited vertex set V_ns, i.e. V_ns = V_ns − {v_cs};

4.3.8. Define the loop variable m = 0;

4.3.9. If m < deg(v_cs), execute 4.3.10; otherwise, all edges of the current vertex have been found, so go to 4.3.16 and then move on to the vertices adjacent to v_cs, i.e. the other vertices in Cur;

4.3.10. Select the first vertex of Deg-tmp, i.e. the vertex with the currently highest degree, and denote it v_j;

4.3.11. Query the adjacency matrix A: if A_ij = 1, there is an edge between vertex v_i and vertex v_j, so execute 4.3.12; otherwise, go to 4.3.14;

4.3.12. Delete the vertex that shares the edge from Deg-tmp, i.e. Deg-tmp = Deg-tmp − {v_j};

4.3.13. If v_j ∈ V_ns, delete v_j from V_ns, i.e. V_ns = V_ns − {v_j}, and go directly to 4.3.14; otherwise, v_j has already been visited and need not be deleted from V_ns, so go to 4.3.14;

4.3.14. Update the child-node set of the current root node v_cs, i.e. of vertex v_i: Son_i = Son_i + {v_j};

4.3.15. Update the child-node set of the current layer: L-Son = L-Son + {v_j};

4.3.16. Add the child-node set of the current root node to Son_r as a set element, i.e. Son_r = Son_r + {Son_i};

4.3.17. m = m + 1;

4.3.18. Delete vertex v_j from V_ns, i.e. V_ns = V_ns − {v_j};

4.3.19. If V_ns ≠ ∅, go to 4.3.9; otherwise, all not-yet-visited vertices have been traversed, so go to 4.3.20;

4.3.20. Delete the previous root node v_cs from the current-layer vertex set Cur, i.e. Cur = Cur − {v_cs};

4.3.21. Go to 4.3.1;
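The same-layer traversal can be sketched as below. This is a simplified reading of steps 4.3.2 to 4.3.21, not a literal transcription: the function `traverse_layer` and its arguments are hypothetical, m is interpreted as the number of edges of v_cs found so far, the deletions from Deg-tmp in 4.3.12 are not reproduced (candidates are always scanned in the full Deg order), and a probe counter is added only to make the effect visible. The idea kept from the text is that candidates are probed in descending-degree order and the scan of v_cs stops once deg(v_cs) edges have been found or no unvisited vertex remains.

```python
def traverse_layer(A, degree, Deg, Cur, V_ns):
    """Simplified sketch of step 4.3 (same-layer traversal).

    Expands every vertex of the current layer Cur. Candidates are probed in
    descending-degree order (Deg), and the scan for v_cs stops once deg(v_cs)
    edges have been found or no unvisited vertex remains. V_ns is updated in
    place and Cur is emptied. Returns (L_Son, Son, probes).
    """
    L_Son, Son, probes = set(), {}, 0
    for cs in sorted(Cur):             # 4.3.3 allows any order; sorted for determinism
        V_ns.discard(cs)               # 4.3.7: v_cs is now visited
        Son[cs] = set()
        found = 0                      # m, read here as "edges of v_cs found so far"
        for j in Deg:                  # highest-degree candidates first
            if found >= degree[cs] or not V_ns:
                break                  # all edges found, or nothing left to visit
            probes += 1                # one read of the adjacency matrix
            if A[cs][j] == 1:          # 4.3.11: edge (v_cs, v_j) exists
                found += 1
                if j in V_ns:          # 4.3.13: only unvisited vertices become children
                    V_ns.discard(j)
                    Son[cs].add(j)     # 4.3.14: Son_cs = Son_cs + {v_j}
                    L_Son.add(j)       # 4.3.15: L-Son = L-Son + {v_j}
    Cur.clear()                        # every v_cs has been removed from Cur (4.3.20)
    return L_Son, Son, probes


A = [[0, 1, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 0]]
degree = [sum(row) for row in A]      # [1, 3, 2, 2]
Deg = [1, 2, 3, 0]                    # vertices by non-increasing degree
V_ns = {0, 1, 2, 3}
L_Son, Son, probes = traverse_layer(A, degree, Deg, {0}, V_ns)
print(L_Son, probes)  # {1} 1
```

In this small example the root v₀ has degree 1 and its only neighbour v₁ is also the highest-degree vertex, so one adjacency-matrix read suffices instead of the four needed to scan row A_0. When neighbours are not concentrated among high-degree vertices the scan can still read the whole row, so the saving depends on the graph.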

4.4. Inter-layer traversal, as follows:

4.4.1. Empty the current-layer vertex set, i.e. reset Cur = ∅;

4.4.2. Assign the current-layer child-node set L-Son to the current-layer vertex set, i.e. Cur = L-Son;

4.4.3. Obtain the system time t₂;

4.4.4. Record the time of the k-th heuristic traversal search, t = t₂ − t₁;

4.4.5. Go to 4.3.2;
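Combining the same-layer traversal with the inter-layer step (Cur = L-Son) gives one complete timed BFS run. This is again a hypothetical, simplified sketch under the same interpretation as the text's step 4.3 (m counts edges found; Deg-tmp deletions omitted): `degree_ordered_bfs` is not named in the text, `time.perf_counter` stands in for "obtain the system time", and the returned Son maps each expanded vertex to the children it discovered, which together form the spanning tree of step 4.5.

```python
import time


def degree_ordered_bfs(A, Deg, root):
    """Sketch of steps 4.3-4.4 for one run; returns (Son, elapsed_seconds).

    Son maps each expanded vertex to the set of children it discovered;
    together these sets form the BFS spanning tree rooted at `root`.
    Deg lists the vertices by non-increasing degree (output of step 3).
    """
    degree = [sum(row) for row in A]   # deg(v_i) from the adjacency matrix
    V_ns = set(range(len(A)))          # not-yet-visited vertices
    Cur = [root]                       # current-layer vertices
    Son = {}
    t1 = time.perf_counter()           # 4.2.2: start time
    while Cur:                         # 4.3.2: an empty layer ends the search
        L_Son = []                     # children found in this layer
        for cs in Cur:                 # 4.3: same-layer traversal
            V_ns.discard(cs)
            Son[cs] = set()
            found = 0                  # edges of v_cs found so far
            for j in Deg:              # probe high-degree candidates first
                if found >= degree[cs] or not V_ns:
                    break
                if A[cs][j] == 1:
                    found += 1
                    if j in V_ns:      # unvisited neighbour becomes a child
                        V_ns.discard(j)
                        Son[cs].add(j)
                        L_Son.append(j)
        Cur = L_Son                    # 4.4.1-4.4.2: next layer becomes current
    t2 = time.perf_counter()           # 4.4.3: end time
    return Son, t2 - t1                # 4.4.4: t = t2 - t1


A = [[0, 1, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 0]]
Son, t = degree_ordered_bfs(A, Deg=[1, 2, 3, 0], root=0)
print(Son)  # {0: {1}, 1: {2, 3}, 2: set(), 3: set()}
```

Repeating such a run 64 times from random roots and computing TEPS_k = N_E / t for each run gives the values that the fifth step averages.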

4.5. Output the root vertex set Son_r; Son_r is the BFS spanning tree root_{k-r} of the k-th loop iteration with vertex v_r as the root vertex;

4.6. Calculate the test performance: compute the BFS traversal performance value of the current spanning tree, $TEPS_k = N_E / t$;

4.7. Go to 4.2.2;

The fifth step: compute the evaluation value of the graph test performance, i.e. the average BFS traversal performance over the 64 spanning trees, $\overline{TEPS} = \frac{1}{64}\sum_{k=0}^{63} TEPS_k$, to obtain the test result. The higher the TEPS value, the stronger the supercomputer's large-scale graph processing capability and the higher its Graph500 ranking, which also indicates that the supercomputer is better suited to big data processing.

The sixth step: end.

The present invention achieves the following technical effects:

1. The third step of the present invention sorts the vertices by degree as a preprocessing step for BFS. This avoids invalid BFS memory accesses as far as possible, raises the hit rate of useful BFS traversal, and improves BFS traversal efficiency;

2. The fourth step of the present invention performs a BFS search based on the reordered vertices. This raises the hit rate when probing for edges between vertices, reduces the number of invalid memory accesses, avoids unnecessary memory accesses as far as possible, and speeds up graph traversal, thereby increasing the speed of testing the big data processing capability of supercomputers.

Description of the drawings

Fig. 1 is the flow chart of the Graph500 benchmark program;

Fig. 2 is a schematic diagram of the adjacency-matrix representation of a graph: Fig. 2(a) is a directed unweighted graph, and Fig. 2(b) is the adjacency matrix of Fig. 2(a).

Fig. 3 is the overall flow chart of the present invention.

Specific embodiment

Fig. 3 is the overall flow chart of the present invention. The steps of the invention are as follows:

The first step, graph generation: a random graph G = (V, E) is generated with a Kronecker graph generator, where V is the vertex set containing N_V vertices, N_V being a positive integer, and E is the edge set;

3.1. Traverse every vertex in V and record its degree, obtaining the vertex-degree set D. The i-th element of D, deg(v_i), is the degree of vertex v_i, i.e. deg(v_i) vertices share an edge with v_i;

3.2. Sort the vertices in V: sort the elements of D in descending order using the bubble sort algorithm to obtain the sorted set of vertex-degree pairs $D2 = \{\langle v_0, \deg(v_0)\rangle, \langle v_1, \deg(v_1)\rangle, \ldots, \langle v_{N_V-1}, \deg(v_{N_V-1})\rangle\}$, where the i-th element $\langle v_i, \deg(v_i)\rangle$ of D2 states that vertex v_i has degree deg(v_i), and $\deg(v_0) \ge \deg(v_1) \ge \cdots \ge \deg(v_{N_V-1})$. Then sort the vertices in V according to D2 to obtain the sorted vertex set $Deg = \{v_0, v_1, \ldots, v_{N_V-1}\}$. The first element v₀ of Deg is the vertex with the largest degree; the second element v₁ is the vertex whose degree is second only to, or equal to, that of v₀; vertices of equal degree are listed one after another; and the last element $v_{N_V-1}$ is the vertex with the smallest degree;

4.1. Define the data structures, as follows:

4.1.1. Define the set of not-yet-visited vertices V_ns = V;

4.1.2. Define the intermediate vertex-degree set for the BFS search, D-tmp = D2;

4.1.3. Define the intermediate sorted vertex set Deg-tmp = Deg;

4.1.4. Define the set of visited vertices V_s = ∅;

4.1.5. Define the current-layer vertex set Cur = ∅;

4.1.6. Define the current-layer child-node set L-Son = ∅;

4.1.7. Define the child-node set Son_i = ∅, which denotes the set of child nodes of vertex v_i;

4.1.8. Choose a vertex v_r at random from V as the root vertex, i.e. the source vertex, r ∈ {0, 1, …, N_V − 1};

4.1.9. Let Son_r = ∅ be the set of child-node sets of the root vertex; the elements of Son_r are sets;

4.1.10. Add vertex v_r to the visited vertex set, i.e. V_s = V_s + {v_r};

4.1.11. Add vertex v_r to the current-layer vertex set, i.e. Cur = Cur + {v_r};

4.2.1. Define the loop variable k = 0;

4.2.2. Obtain the system time t₁;

4.2.3. If k < 64, go to 4.3; otherwise, go to the fifth step;

4.3.1. Let …;

4.3.4. Delete v_cs from Cur, i.e. Cur = Cur − {v_cs};

4.3.6. Query the set D2 for the pair $\langle v_{cs}, \deg(v_{cs})\rangle$ to confirm that the degree of v_cs is deg(v_cs);

4.3.8. Define the loop variable m = 0;

4.3.17. m = m + 1;

4.3.18. Delete vertex v_j from V_ns, i.e. V_ns = V_ns − {v_j};

4.3.21. Go to 4.3.1;

4.4. Inter-layer traversal, as follows:

4.4.1. Empty the current-layer vertex set, i.e. reset Cur = ∅;

4.4.3. Obtain the system time t₂;

4.4.5. Go to 4.3.2;

4.7. Go to 4.2.2;

The sixth step: end.

Claims

1. A vertex-reordering-based method for testing the big data processing capability of a supercomputer, characterized by comprising the following steps:

The first step: generate a random graph G = (V, E), where V is the vertex set containing N_V vertices, N_V being a positive integer, and E is the edge set containing N_E edges, N_E being a positive integer;

The second step: construct the adjacency matrix A that stores graph G, where A_ij = 0 means there is no edge between vertex i and vertex j and A_ij = 1 means there is an edge between vertex i and vertex j, 0 ≤ i ≤ N_V − 1, 0 ≤ j ≤ N_V − 1, and i and j are non-negative integers;

3.1. Traverse every vertex in V and record its degree, obtaining the vertex-degree set D. The i-th element of D, deg(v_i), is the degree of vertex v_i, i.e. deg(v_i) vertices share an edge with v_i;

3.2. Sort the vertices in V: sort the elements of D in descending order to obtain the sorted set of vertex-degree pairs $D2 = \{\langle v_0, \deg(v_0)\rangle, \langle v_1, \deg(v_1)\rangle, \ldots, \langle v_{N_V-1}, \deg(v_{N_V-1})\rangle\}$, where the i-th element $\langle v_i, \deg(v_i)\rangle$ of D2 states that vertex v_i has degree deg(v_i), and $\deg(v_0) \ge \deg(v_1) \ge \cdots \ge \deg(v_{N_V-1})$. Then sort the vertices in V according to D2 to obtain the sorted vertex set $Deg = \{v_0, v_1, \ldots, v_{N_V-1}\}$. The first element v₀ of Deg is the vertex with the largest degree; the second element v₁ is the vertex whose degree is second only to, or equal to, that of v₀; vertices of equal degree are listed one after another; and the last element $v_{N_V-1}$ is the vertex with the smallest degree;

4.1. Define the data structures, as follows:

4.1.1. Define the set of not-yet-visited vertices V_ns = V;

4.1.2. Define the intermediate vertex-degree set for the BFS search, D-tmp = D2;

4.1.3. Define the intermediate sorted vertex set Deg-tmp = Deg;

4.1.4. Define the set of visited vertices V_s = ∅;

4.1.5. Define the current-layer vertex set Cur = ∅;

4.1.6. Define the current-layer child-node set L-Son = ∅;

4.1.7. Define the child-node set Son_i = ∅, which denotes the set of child nodes of vertex v_i;

4.1.8. Choose a vertex v_r at random from V as the root vertex, i.e. the source vertex, r ∈ {0, 1, …, N_V − 1};

4.1.11. Add vertex v_r to the current-layer vertex set, i.e. Cur = Cur + {v_r};

4.2. Loop: each iteration outputs one spanning tree; the loop runs 64 times and outputs 64 spanning trees, as follows:

4.2.1. Define the loop variable k = 0;

4.2.2. Obtain the system time t₁;

4.2.3. If k < 64, go to 4.3; otherwise, go to the fifth step;

4.3. Same-layer traversal: exploiting the property that high-degree vertices are more likely to share an edge, search for the child nodes of the vertices in Cur, as follows:

4.3.1. Let …;

4.3.4. Delete v_cs from Cur, i.e. Cur = Cur − {v_cs};

4.3.5. If …, execute 4.3.6, which first traverses and checks the high-degree vertices; otherwise, the search of the current root node is finished, so go to 4.3.20;

4.3.6. Query the set D2 for the pair $\langle v_{cs}, \deg(v_{cs})\rangle$ to confirm that the degree of v_cs is deg(v_cs);

4.3.8. Define the loop variable m = 0;

4.3.9. If m < deg(v_cs), execute 4.3.10; otherwise, all edges of the current vertex have been found, so go to 4.3.16 and then move on to the vertices adjacent to v_cs, i.e. the other vertices in Cur;

4.3.11. Query the adjacency matrix A: if A_ij = 1, there is an edge between vertex v_i and vertex v_j, so execute 4.3.12; otherwise, go to 4.3.14;

4.3.13. If v_j ∈ V_ns, delete v_j from V_ns, i.e. V_ns = V_ns − {v_j}, and go directly to 4.3.14; otherwise, v_j has already been visited and need not be deleted from V_ns, so go to 4.3.14;

4.3.16. Add the child-node set of the current root node to Son_r as a set element, i.e. Son_r = Son_r + {Son_i};

4.3.17. m = m + 1;

4.3.18. Delete vertex v_j from V_ns, i.e. V_ns = V_ns − {v_j};

4.3.21. Go to 4.3.1;

4.4. Inter-layer traversal, as follows:

4.4.1. Empty the current-layer vertex set, i.e. reset Cur = ∅;

4.4.3. Obtain the system time t₂;

4.4.5. Go to 4.3.2;

4.5. Output the root vertex set Son_r; Son_r is the BFS spanning tree root_{k-r} of the k-th loop iteration with vertex v_r as the root vertex;

4.6. Calculate the BFS traversal performance value of the current spanning tree, $TEPS_k = N_E / t$;

4.7. Go to 4.2.2;

The fifth step: compute the evaluation value of the graph test performance, i.e. the average BFS traversal performance over the 64 spanning trees, $\overline{TEPS} = \frac{1}{64}\sum_{k=0}^{63} TEPS_k$, to obtain the test result;

The sixth step: end.

2. The vertex-reordering-based method for testing the big data processing capability of a supercomputer according to claim 1, characterized in that graph G is generated with a Kronecker graph generator.

3. The vertex-reordering-based method for testing the big data processing capability of a supercomputer according to claim 1, characterized in that the bubble sort algorithm is used when sorting the elements of D in descending order.