CN102521119A - Method for rapidly detecting cluster parallel efficiency

Info

Publication number: CN102521119A
Application number: CN2011103601872A
Authority: CN
Inventors: 郑辉; 陈良华
Original assignee: Inspur Electronic Information Industry Co Ltd
Current assignee: Inspur Electronic Information Industry Co Ltd
Priority date: 2011-11-15
Filing date: 2011-11-15
Publication date: 2012-06-27

Abstract

The invention provides a method for rapidly determining cluster parallel efficiency. On the management node, the Intel MKL LinPack tool and an N_calculator.sh script written in shell are used to rapidly determine the parameters required for a LinPack test on a Linux-based cluster system. The theoretical floating-point peak and the measured floating-point peak are then calculated and the parallel efficiency of the system is determined, so that the cluster system can be evaluated rapidly.

Description

A method for rapidly determining cluster parallel efficiency

Technical field

The present invention relates to using shell scripts and the LinPack benchmark to rapidly determine the performance indicators and parameters of an HPC cluster running the Linux operating system, and is generally applicable to large-scale multi-node Linux high-performance computing cluster environments. It relates in particular to determining the performance of a cluster in a parallel environment based on an InfiniBand network and the Lustre file system.

Background technology

High-performance computing (HPC) typically uses many processors connected by a high-speed interconnect and runs parallel software in a parallel computing environment such as MPI to accelerate certain mathematical operations. As HPC spreads through universities and research institutions, building large cluster systems with high parallel efficiency is of great significance to scientific research, education, and national defense. Evaluating the performance of a cluster system quickly and effectively is therefore vital to building such systems.

An important indicator of computing power is the computing peak, for example the floating-point peak: the maximum number of floating-point operations a computer can complete per second. It comes in two forms, the theoretical floating-point peak and the measured floating-point peak. The theoretical floating-point peak is the maximum number of floating-point operations per second the computer can reach in theory. It is determined mainly by the CPU clock frequency: theoretical floating-point peak = CPU clock frequency × floating-point operations executed per clock cycle × number of CPU cores in the system.
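The formula above can be sketched as a small shell helper. This is a minimal illustration, not part of N_calculator.sh; the function name theoretical_peak_gflops is hypothetical, and the sample arguments are the values used in the worked example later in this document (2.66 GHz, 4 floating-point operations per cycle, 84 cores).

```shell
# Hypothetical helper: theoretical peak in Gflops.
# $1: CPU clock frequency in GHz
# $2: floating-point operations per clock cycle per core
# $3: total number of CPU cores in the system
theoretical_peak_gflops() {
    # awk is used because POSIX shell arithmetic is integer-only
    awk -v f="$1" -v c="$2" -v n="$3" 'BEGIN { printf "%.2f\n", f * c * n }'
}

# 7 nodes x 12 cores = 84 cores at 2.66 GHz, 4 flops per cycle
theoretical_peak_gflops 2.66 4 84
```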

The measured floating-point peak is the Linpack test value, that is, the best result obtained by running the Linpack test program on the machine after applying various tuning methods. Real applications almost never reach the measured floating-point peak, let alone the theoretical one. Both values serve only as indicators of machine performance, showing the scale and potential of the machine's processing capability.

In a traditional LinPack test, some test parameters, most importantly N (the problem size), must be determined from the specific cluster environment. People unfamiliar with the underlying theory easily compute wrong parameters, which causes the whole test to fail. Moreover, once the measured peak has been obtained, evaluating the performance of the whole system from the result is also quite complicated.

The present invention uses a shell-script-based method that provides an interactive environment. The user only needs to enter the cluster configuration parameters when prompted in order to obtain, simply and conveniently, the test parameters LinPack requires and a performance reference table. Linpack is then used to obtain the theoretical floating-point peak and the measured floating-point peak, from which the parallel efficiency of the system is determined, so that the cluster system can be evaluated rapidly.

Summary of the invention

The purpose of the invention is to provide a method for rapidly determining cluster parallel efficiency.

The purpose is achieved as follows. On the management node, the Intel MKL LinPack tool and an N_calculator.sh script written in shell are used to rapidly determine the parameters required for a LinPack test on a Linux-based cluster system. The theoretical floating-point peak and the measured floating-point peak are calculated, and the parallel efficiency of the system is then determined, so that the cluster system can be evaluated rapidly.

The specific steps are as follows:

1) install Linux on the management node m1, and install the operating systems of the compute nodes and Lustre nodes over the network;

2) configure passwordless ssh and rsh access so that the nodes can reach one another without passwords;

3) configure the NIS and NTP services on the management node to share user accounts and synchronize time across the nodes;

4) install the Intel compiler, MKL, and MPI, deploy the application software to the shared Lustre directory /opt, and use InfiniBand network communication between the nodes to meet the network bandwidth requirements;

5) in the ../intel/mkl/10.X.X.XXX/benchmarks/mp_linpack directory, build the test executable xhpl and the parameter configuration file HPL.dat;

6) in the interactive environment of N_calculator.sh, supply the cluster configuration parameters to N_calculator.sh;

7) use the N_calculator.sh script to calculate the N value required in HPL.dat, and modify the HPL.dat file accordingly;

8) run xhpl with Intel MPI to measure the floating-point peak of the system, and, with reference to the system performance table generated by the N_calculator.sh script, obtain the parallel efficiency of the system.

The beneficial effects of the invention are:

1) for large-scale multi-node Linux high-performance computing cluster environments, the LinPack (Linear System Package) test is run to obtain whole-system performance parameters;

2) through a shell script, the cluster configuration parameters are supplied in an interactive environment, and the N (problem size) required by the LinPack calculation and a system capability reference table are rapidly determined for various system environments;

3) through the shell script, the whole-system parallel efficiency is rapidly determined for various system environments.

Description of drawings

Fig. 1 is a diagram of the network environment architecture.

Embodiment

The network architecture of the invention is divided into two parts, a storage network and a compute network, wherein:

The storage network uses an 8Gb FC switch and storage devices in an FC SAN architecture. Separate LUN spaces are partitioned and mounted on the Lustre file system servers ls1, ls2, ..., lsn. Of the n servers ls1 through lsn, ls1 acts as the MDS server and the others act as OSS servers; the storage LUN partitions serve as MDT and OST devices respectively. Together they form the Lustre distributed file system, which substantially improves file read/write performance.

The compute network uses an InfiniBand switch. IB cables and the servers' HCA cards connect the Lustre nodes (ls1 to lsn), the management node (m1), and the compute nodes (c1 to cn). The IP over IB (IPoIB) communication mechanism provides high-speed network communication between the nodes. The bandwidth of the IB switch can reach 40 Gb/s, which Ethernet cannot match.

The management node m1 and the compute nodes mount the corresponding shared directory of the Lustre parallel file system.

The purpose of the invention is to use, on the management node, the Intel MKL LinPack tool and an N_calculator.sh script written in shell to rapidly determine the parameters required for a LinPack test on a Linux-based cluster system, to calculate the theoretical floating-point peak and the measured floating-point peak, and then to determine the parallel efficiency of the system, so that the cluster system can be evaluated rapidly.

The specific steps of the invention are as follows:

Step 1: install Linux on the management node m1, and install the operating systems of the compute nodes and Lustre nodes over the network.

Step 2: configure passwordless ssh and rsh access so that the nodes can reach one another without passwords.

Step 3: configure the NIS and NTP services on the management node to share user accounts and synchronize time across the nodes.

Step 4: install the Intel compiler, MKL, and MPI, deploy the application software to the shared Lustre directory /opt, and use InfiniBand network communication between the nodes to meet the network bandwidth requirements.

Step 5: in the ../intel/mkl/10.X.X.XXX/benchmarks/mp_linpack directory, build the test executable xhpl and the parameter configuration file HPL.dat.

Step 6: in the interactive environment of N_calculator.sh, supply the cluster configuration parameters to N_calculator.sh.

Step 7: use the N_calculator.sh script to calculate the N value required in HPL.dat, and modify the HPL.dat file accordingly.

Step 8: run xhpl with Intel MPI to measure the floating-point peak of the system, and, with reference to the system performance table generated by the N_calculator.sh script, obtain the parallel efficiency of the system.
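The HPL.dat modification in step 7 might be automated along the following lines. This is only a sketch: the source does not show how N_calculator.sh edits the file, and the function name set_hpl_n is hypothetical. It assumes the standard HPL.dat layout, in which the problem sizes sit on a line whose trailing label is "Ns", and it assumes the preceding "# of problems sizes (N)" count is already 1.

```shell
# Hypothetical helper: write a single problem size N into HPL.dat.
# $1: path to HPL.dat
# $2: new problem size N
set_hpl_n() {
    tmp="$1.tmp.$$"
    # Replace only the line labelled "Ns"; copy every other line unchanged.
    awk -v n="$2" '
        $NF == "Ns" { printf "%-12s Ns\n", n; next }
        { print }
    ' "$1" > "$tmp" && mv "$tmp" "$1"
}
```

A call such as `set_hpl_n HPL.dat 75000` would then leave the rest of the configuration (NB, P, Q, and so on) untouched.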

Example:

Hardware environment:

Compute nodes: 7

Memory per node: 8 GB

Cores per node: 12

CPU frequency: 2.66 GHz

Network interconnect: InfiniBand QDR

In the interactive environment of N_calculator.sh, supply the cluster configuration parameters to N_calculator.sh, then use N_calculator.sh to calculate N (usage help for N_calculator.sh is available through the -h option).

The test results are as follows:
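The source does not disclose the formula N_calculator.sh uses. As an illustration only, a widely used HPL rule of thumb sizes the N × N matrix of 8-byte doubles to fill a fraction of total cluster memory and rounds N down to a multiple of the block size NB. The sketch below assumes that rule; the function name estimate_n, the 80% memory fraction, and the reading of "8 GB" as 8 GiB are all assumptions, so its result need not match the N reported in this example.

```shell
# Hypothetical helper: estimate the HPL problem size N.
# $1: number of compute nodes
# $2: memory per node in GiB
# $3: fraction of total memory the matrix may occupy (e.g. 0.8)
# $4: block size NB (N is rounded down to a multiple of NB)
estimate_n() {
    awk -v nodes="$1" -v mem="$2" -v frac="$3" -v nb="$4" 'BEGIN {
        bytes = nodes * mem * 1024 * 1024 * 1024   # total memory in bytes
        n = int(sqrt(frac * bytes / 8))            # 8 bytes per double
        n = int(n / nb) * nb                       # round down to a multiple of NB
        printf "%d\n", n
    }'
}

# 7 nodes, 8 GiB each, 80% of memory, NB = 192
estimate_n 7 8 0.8 192
```

Leaving some memory for the operating system and MPI buffers is the reason for using a fraction below 1; the right fraction for a given cluster is a tuning decision.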

    The following parameter values will be used:

    N      : 75000
    NB     : 192 128
    PMAP   : Row-major process mapping
    P      : 7
    Q      : 12
    PFACT  : Left
    NBMIN  : 4
    NDIV   : 2
    RFACT  : Crout
    BCAST  : 1ring
    DEPTH  : 0
    SWAP   : Mix (threshold = 256)
    L1     : no-transposed form
    U      : no-transposed form
    EQUIL  : no
    ALIGN  : 8 double precision words

    --------------------------------------------------------------------------------

    - The matrix A is randomly generated for each test.
    - The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
    - The relative machine precision (eps) is taken to be 1.110223e-16
    - Computational tests pass if scaled residuals are less than 16.0

    ================================================================================
    T/V                N    NB     P     Q               Time                 Gflops
    WR00C2L4       75000   192     7    12             383.36              7.337e+02
    --------------------------------------------------------------------------------
    ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0023895 ...... PASSED
    T/V                N    NB     P     Q               Time                 Gflops
    WR00C2L4       75000   128     7    12             379.70              7.407e+02
    --------------------------------------------------------------------------------
    ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0021663 ...... PASSED
    ============================================================================

    Finished 2 tests with the following results:
    2 tests completed and passed residual checks,
    0 tests completed and failed residual checks,
    0 tests skipped because of illegal input values.
    ----------------------------------------------------------------------------

    End of Tests.

The results above show that the measured peak speed of the cluster is 740.7 Gflops.
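Reading the best figure out of an HPL log can also be scripted. The sketch below is not taken from N_calculator.sh; the function name best_gflops is hypothetical. It assumes result lines in the usual HPL form, where the first field is the test variant (beginning with "WR") and the seventh field is the Gflops value.

```shell
# Hypothetical helper: print the highest Gflops value found in HPL output.
# Reads the file(s) given as arguments, or standard input if none are given.
best_gflops() {
    awk '
        # Result lines look like: WR00C2L4 75000 128 7 12 379.70 7.407e+02
        $1 ~ /^WR/ && NF >= 7 {
            g = $7 + 0                      # converts 7.407e+02 to 740.7
            if (!found || g > best) { best = g; found = 1 }
        }
        END {
            if (!found) exit 1              # no result line seen
            printf "%.1f\n", best
        }
    ' "$@"
}
```

Saving the xhpl output to a file and running `best_gflops` on that file would print the larger of the two results in this example.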

The theoretical peak of the cluster is:

2.66 (clock frequency in GHz) × 4 (floating-point operations per clock cycle) × 84 (7 nodes × 12 cores per node) = 893.76 Gflops

The parallel efficiency of the cluster is therefore 740.7 / 893.76 × 100% ≈ 83%.
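The final division can be expressed as a one-line helper. This is a minimal sketch; the function name parallel_efficiency is hypothetical, and the arguments are the measured and theoretical peaks from this example.

```shell
# Hypothetical helper: parallel efficiency as a percentage.
# $1: measured floating-point peak (Gflops)
# $2: theoretical floating-point peak (Gflops)
parallel_efficiency() {
    awk -v a="$1" -v t="$2" 'BEGIN {
        if (t <= 0) exit 1                  # guard against division by zero
        printf "%.1f\n", a / t * 100
    }'
}

# Measured 740.7 Gflops against a theoretical 893.76 Gflops
parallel_efficiency 740.7 893.76
```

With one decimal place the result is 82.9, which rounds to the 83% stated above.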

Apart from the technical features described in the specification, everything else is known to those skilled in the art.

Claims

1. A method for rapidly determining cluster parallel efficiency, characterized in that, on a management node, the Intel MKL LinPack tool and an N_calculator.sh script written in shell are used to rapidly determine the parameters required for a LinPack test on a Linux-based cluster system; the theoretical floating-point peak and the measured floating-point peak are calculated, and the parallel efficiency of the system is then determined, so that the cluster system can be evaluated rapidly;

The specific steps are as follows:

7) use the N_calculator.sh script to calculate the N value required in HPL.dat, and modify the HPL.dat file accordingly;