CN102521119A - Method for rapidly detecting cluster parallel efficiency - Google Patents

Info

Publication number
CN102521119A
Authority
CN
China
Prior art keywords
calculator
node
peak value
script
parallel efficiency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011103601872A
Other languages
Chinese (zh)
Inventor
郑辉
陈良华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd

Abstract

The invention provides a method for rapidly determining the parallel efficiency of a cluster. On the management node, the Intel MKL LinPack tool and an N_calculator.sh script written in shell script are used to quickly determine the parameters required for a LinPack test of a Linux-based cluster system. The theoretical floating-point peak and the measured floating-point peak are then calculated, and from them the parallel efficiency of the system is determined, so that the cluster system can be evaluated rapidly.

Description

A method for rapidly determining cluster parallel efficiency
Technical field
The present invention relates to using shell scripts and the LinPack benchmark to rapidly determine the performance indices and various parameters of an HPC cluster running the Linux operating system. The method is generally applicable to large-scale, multi-node Linux high-performance cluster computing environments. Specifically, it concerns determining the performance of a cluster in a parallel environment based on an InfiniBand network and the Lustre file system.
Background art
High-performance computing (HPC) typically uses many processors connected by a high-speed interconnect, runs parallel software in a parallel computing environment such as MPI, and thereby accelerates certain mathematical operations. As HPC becomes widespread in universities and research institutions, building large-scale cluster systems with high parallel efficiency is of great significance to scientific research, education, national defense and other fields. Evaluating the performance of a cluster system quickly and effectively is therefore vital to building large-scale cluster systems with high parallel efficiency.
An important index of computing capability is peak performance, for example the floating-point peak, which is the maximum number of floating-point operations a computer can complete per second. It comprises the theoretical floating-point peak and the measured floating-point peak. The theoretical floating-point peak is the maximum number of floating-point operations per second that the computer can reach in theory. It is determined mainly by the CPU clock frequency: theoretical floating-point peak = CPU clock frequency × floating-point operations executed per clock cycle by each core × number of CPU cores in the system.
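The formula above can be sketched as a small helper. This is only an illustration of the arithmetic; the function name, the parameter names and the example values are assumptions and do not come from the patent.

```python
def theoretical_peak_gflops(freq_ghz: float, flops_per_cycle: int, total_cores: int) -> float:
    """Theoretical floating-point peak in Gflops.

    freq_ghz        -- CPU clock frequency in GHz
    flops_per_cycle -- floating-point operations per clock cycle per core
    total_cores     -- number of CPU cores in the whole system
    """
    if freq_ghz <= 0 or flops_per_cycle <= 0 or total_cores <= 0:
        raise ValueError("all inputs must be positive")
    return freq_ghz * flops_per_cycle * total_cores


# Illustrative (hypothetical) system: 2 nodes with 8 cores each, 3.0 GHz,
# 4 floating-point operations per cycle per core.
print(theoretical_peak_gflops(3.0, 4, 2 * 8), "Gflops")
```

Because the frequency is given in GHz, the product is directly in Gflops; no further unit conversion is needed.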
The measured floating-point peak is the Linpack test value, that is, the best result obtained by running the Linpack test program on the machine after applying various tuning methods. In practice, real programs almost never reach the measured floating-point peak, let alone the theoretical floating-point peak. The two values serve only as indices of machine performance, indicating the scale and potential of the machine's processing capability.
In a traditional LinPack test, some test parameters, most importantly N (the problem size), must be determined according to the specific cluster environment. People unfamiliar with the underlying theory easily compute wrong parameters, which causes the whole test to fail. In addition, after the maximum measured performance has been obtained, evaluating the performance of the whole system from the result is also rather complicated.
The present invention uses a method based on shell script that provides an interactive environment. The user only needs to enter the cluster configuration parameters when prompted in order to obtain, simply and conveniently, the test parameters required by LinPack and a performance reference table. Linpack is then used to obtain the theoretical floating-point peak and the measured floating-point peak, from which the parallel efficiency of the system is determined, so that the cluster system can be evaluated rapidly.
Summary of the invention
The purpose of the invention is to provide a method for rapidly determining cluster parallel efficiency.
This purpose is achieved as follows. On the management node, the Intel MKL LinPack tool and an N_calculator.sh script written in shell script are used to quickly determine the parameters required for a LinPack test of a Linux-based cluster system; the theoretical floating-point peak and the measured floating-point peak are calculated; and the parallel efficiency of the system is then determined, so that the cluster system can be evaluated rapidly.
The specific steps are as follows:
1) Install the Linux system on the management node m1, and install the operating systems of the compute nodes and Lustre nodes over the network;
2) Configure a passwordless ssh and rsh access environment so that the nodes can access one another without passwords;
3) On the management node, configure the NIS and NTP services to provide shared user accounts and time synchronization across the nodes;
4) Install the Intel compiler, MKL and MPI, deploying the application software to the shared Lustre directory /opt, and use InfiniBand network communication between the nodes to satisfy the network bandwidth requirements;
5) In the ../intel/mkl/10.X.X.XXX/benchmarks/mp_linpack directory, build the executable file xhpl and the parameter configuration file HPL.dat used for the test;
6) In the interactive environment of N_calculator.sh, supply the cluster configuration parameters to N_calculator;
7) Use the N_calculator.sh script to calculate the N value required in HPL.dat, and modify the HPL.dat file;
8) Run xhpl with Intel MPI to measure the actual floating-point peak of the system, and obtain the parallel efficiency of the system by referring to the system performance table generated by the N_calculator.sh script.
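The patent does not disclose the formula that N_calculator.sh uses to derive N. As a hedged illustration of step 7, the sketch below applies a widely used rule of thumb for HPL: let the N × N double-precision matrix occupy a fixed fraction of the total cluster memory, then round N down to a multiple of the block size NB. The function name, the 80% fraction and the default NB are assumptions, not the script's actual logic.

```python
import math


def estimate_n(nodes: int, mem_gib_per_node: float,
               mem_fraction: float = 0.8, nb: int = 192) -> int:
    """Estimate the HPL problem size N (rule of thumb, not the patent's formula).

    The N x N matrix of 8-byte doubles should fit into `mem_fraction`
    of the total memory; N is rounded down to a multiple of `nb`.
    """
    if nodes <= 0 or mem_gib_per_node <= 0 or nb <= 0:
        raise ValueError("nodes, memory and nb must be positive")
    if not 0 < mem_fraction <= 1:
        raise ValueError("mem_fraction must be in (0, 1]")
    total_bytes = nodes * mem_gib_per_node * 1024 ** 3
    max_elements = int(total_bytes * mem_fraction) // 8  # 8 bytes per double
    n = math.isqrt(max_elements)                         # largest N with N*N <= max_elements
    return (n // nb) * nb                                # align N to the block size


# 7 nodes with 8 GiB each, NB = 192
print(estimate_n(7, 8))
```

For 7 nodes with 8 GiB each this rule gives N = 77376. The value actually used in the test output of this document is N = 75000, so the script presumably applies a different or more conservative rule.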
The beneficial effects of the invention are:
1) For large-scale, multi-node Linux high-performance cluster computing environments, the LinPack (Linear System Package) test is run to obtain the performance parameters of the whole machine;
2) Through a shell script, the cluster configuration parameters are supplied in an interactive environment, and the N (problem size) required by the LinPack calculation and a system capability reference table are quickly determined for various system environments;
3) Based on the shell script, the parallel running efficiency of the whole machine is quickly determined for various system environments.
Description of drawings
Fig. 1 is a diagram of the network environment architecture.
Detailed description
The network architecture of the invention is divided into two parts, a storage network and a compute network, wherein:
The storage network uses 8GB FC fibre switches and storage devices in an FC SAN architecture. Different LUN spaces are partitioned and mounted on the Lustre file system servers ls1, ls2, ..., lsn. Of the n servers ls1 to lsn, ls1 serves as the MDS server and the others serve as OSS servers; the LUN partitions of the storage serve as the MDT and OST devices respectively. Together they form the Lustre distributed file system, which substantially improves file read/write performance.
The compute network uses an InfiniBand switch. IB cables, together with the HCA cards of the servers, connect the Lustre nodes (ls1 to lsn), the management node (m1) and the compute nodes (c1 to cn). The IP over IB communication mechanism is used to achieve high-speed network communication between the nodes. The bandwidth of the IB switch can reach 40 Gb/s, which Ethernet cannot provide.
The m1 management node and the compute nodes mount the corresponding shared directory of the Lustre parallel file system.
The purpose of the invention is to use, on the management node, the Intel MKL LinPack tool and a self-written shell script, N_calculator.sh, to quickly determine the parameters required for a LinPack test of a Linux-based cluster system, to calculate the theoretical floating-point peak and the measured floating-point peak, and then to determine the parallel efficiency of the system, so that the cluster system can be evaluated rapidly.
The specific steps of the invention are as follows:
Step 1: install the Linux system on the management node m1, and install the operating systems of the compute nodes and Lustre nodes over the network.
Step 2: configure a passwordless ssh and rsh access environment so that the nodes can access one another without passwords.
Step 3: on the management node, configure the NIS and NTP services to provide shared user accounts and time synchronization across the nodes.
Step 4: install the Intel compiler, MKL and MPI, deploying the application software to the shared Lustre directory /opt, and use InfiniBand network communication between the nodes to satisfy the network bandwidth requirements.
Step 5: in the ../intel/mkl/10.X.X.XXX/benchmarks/mp_linpack directory, build the executable file xhpl and the parameter configuration file HPL.dat used for the test.
Step 6: in the interactive environment of N_calculator.sh, supply the cluster configuration parameters to N_calculator.
Step 7: use the N_calculator.sh script to calculate the N value required in HPL.dat, and modify the HPL.dat file.
Step 8: run xhpl with Intel MPI to measure the actual floating-point peak of the system, and obtain the parallel efficiency of the system by referring to the system performance table generated by the N_calculator.sh script.
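Step 7 ends with modifying the HPL.dat file. The patent does not show how this is done; the sketch below is one possible way, assuming the standard HPL.dat layout in which the problem sizes sit on a line ending in the label `Ns` and the preceding line holds their count. The function name is hypothetical.

```python
from typing import Sequence


def set_problem_sizes(hpl_text: str, sizes: Sequence[int]) -> str:
    """Return HPL.dat content with the 'Ns' line (and its count line) replaced.

    Assumes the standard layout:
        <count>      # of problems sizes (N)
        <n1 n2 ...>  Ns
    """
    if not sizes:
        raise ValueError("at least one problem size is required")
    lines = hpl_text.splitlines()
    for i, line in enumerate(lines):
        tokens = line.split()
        if tokens and tokens[-1] == "Ns":
            if i == 0:
                raise ValueError("no count line before the Ns line")
            # Keep the descriptive comment of the count line, replace only the number.
            parts = lines[i - 1].split(maxsplit=1)
            comment = parts[1] if len(parts) > 1 else ""
            lines[i - 1] = f"{len(sizes):<12} {comment}"
            lines[i] = f"{' '.join(str(s) for s in sizes):<12} Ns"
            return "\n".join(lines) + "\n"
    raise ValueError("no line with label 'Ns' found")


sample = (
    "HPLinpack benchmark input file\n"
    "Innovative Computing Laboratory, University of Tennessee\n"
    "HPL.out      output file name (if any)\n"
    "6            device out (6=stdout,7=stderr,file)\n"
    "4            # of problems sizes (N)\n"
    "29 30 34 35  Ns\n"
    "4            # of NBs\n"
    "1 2 3 4      NBs\n"
)
print(set_problem_sizes(sample, [75000]))
```

Only the two lines that describe N are touched; all other settings (NB, P, Q and so on) are passed through unchanged.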
Example:
Hardware environment:
Compute nodes: 7
Memory per node: 8GB
Cores per node: 12
CPU frequency: 2.66GHz
Network connection: InfiniBand QDR
In the interactive environment of N_calculator.sh, supply the cluster configuration parameters to N_calculator.sh, and use N_calculator.sh to calculate N (usage hints for N_calculator.sh can be obtained with the -h parameter).
The test results are as follows:
The following parameter values will be used:
N : 75000
NB : 192 128
PMAP : Row-major process mapping
P : 7
Q : 12
PFACT : Left
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ring
DEPTH : 0
SWAP : Mix (threshold = 256)
L1 : no-transposed form
U : no-transposed form
EQUIL : no
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00C2L4 75000 192 7 12 383.36 7.337e+02
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0023895 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00C2L4 75000 128 7 12 379.70 7.407e+02
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0021663 ...... PASSED
============================================================================
Finished 2 tests with the following results:
2 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
----------------------------------------------------------------------------
End of Tests.
The above results show that the maximum measured performance of the cluster is 740.7 Gflops.
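Reading the best value out of the xhpl output can be automated. The following sketch assumes that each HPL result line starts with a test identifier beginning with `WR` and carries the Gflops figure in its seventh column, as in the output above; the function name is hypothetical and the patent does not describe such a parser.

```python
def best_gflops(hpl_output: str) -> float:
    """Return the highest Gflops value among the HPL result lines."""
    best = None
    for line in hpl_output.splitlines():
        tokens = line.split()
        # Result line columns: T/V, N, NB, P, Q, Time, Gflops
        if len(tokens) >= 7 and tokens[0].startswith("WR"):
            try:
                gflops = float(tokens[6])
            except ValueError:
                continue  # not a numeric result line
            best = gflops if best is None else max(best, gflops)
    if best is None:
        raise ValueError("no HPL result lines found")
    return best


output = """\
T/V N NB P Q Time Gflops
WR00C2L4 75000 192 7 12 383.36 7.337e+02
WR00C2L4 75000 128 7 12 379.70 7.407e+02
"""
print(best_gflops(output))
```

Taking the maximum over all runs matches the way the measured peak is defined in this document: the best result obtained across the tested parameter combinations.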
The theoretical peak of the cluster is:
2.66 (clock frequency in GHz) × 4 (floating-point operations per clock cycle) × 84 (7 nodes × 12 cores per node) = 893.76 Gflops
The parallel efficiency of the cluster is therefore 740.7 / 893.76 × 100% ≈ 83%
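The calculation above can be reproduced with a few lines of code. This is a minimal sketch of the arithmetic only; the function name is an assumption and does not appear in the patent.

```python
def parallel_efficiency(measured_gflops: float, theoretical_gflops: float) -> float:
    """Ratio of the measured floating-point peak to the theoretical peak."""
    if theoretical_gflops <= 0:
        raise ValueError("theoretical peak must be positive")
    return measured_gflops / theoretical_gflops


# Values from the example: 7 nodes x 12 cores, 2.66 GHz, 4 flops per cycle.
theoretical = 2.66 * 4 * (7 * 12)   # Gflops
measured = 740.7                    # Gflops, best xhpl result
print(f"{parallel_efficiency(measured, theoretical):.1%}")
```

The unrounded ratio is about 82.9%, which the text rounds to 83%.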
Technical features other than those described in the specification are known to those skilled in the art.

Claims (1)

1. A method for rapidly determining cluster parallel efficiency, characterized in that, on the management node, the Intel MKL LinPack tool and an N_calculator.sh script written in shell script are used to quickly determine the parameters required for a LinPack test of a Linux-based cluster system, the theoretical floating-point peak and the measured floating-point peak are calculated, and the parallel efficiency of the system is then determined, so that the cluster system can be evaluated rapidly;
The specific steps are as follows:
1) Install the Linux system on the management node m1, and install the operating systems of the compute nodes and Lustre nodes over the network;
2) Configure a passwordless ssh and rsh access environment so that the nodes can access one another without passwords;
3) On the management node, configure the NIS and NTP services to provide shared user accounts and time synchronization across the nodes;
4) Install the Intel compiler, MKL and MPI, deploying the application software to the shared Lustre directory /opt, and use InfiniBand network communication between the nodes to satisfy the network bandwidth requirements;
5) In the ../intel/mkl/10.X.X.XXX/benchmarks/mp_linpack directory, build the executable file xhpl and the parameter configuration file HPL.dat used for the test;
6) In the interactive environment of N_calculator.sh, supply the cluster configuration parameters to N_calculator;
7) Use the N_calculator.sh script to calculate the N value required in HPL.dat, and modify the HPL.dat file;
8) Run xhpl with Intel MPI to measure the actual floating-point peak of the system, and obtain the parallel efficiency of the system by referring to the system performance table generated by the N_calculator.sh script.
CN2011103601872A 2011-11-15 2011-11-15 Method for rapidly detecting cluster parallel efficiency Pending CN102521119A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103601872A CN102521119A (en) 2011-11-15 2011-11-15 Method for rapidly detecting cluster parallel efficiency

Publications (1)

Publication Number Publication Date
CN102521119A true CN102521119A (en) 2012-06-27

Family

ID=46292050

Country Status (1)

Country Link
CN (1) CN102521119A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070078641A1 (en) * 2005-09-30 2007-04-05 International Business Machines Corporation Method for performing dynamic simulations within virtualized environment
CN101674492A (en) * 2008-09-09 2010-03-17 中兴通讯股份有限公司 Method and device for testing performance of stream media server

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MOHAMAD SINDI: "Evaluating MPI Implementations Using HPL on an Infiniband Nehalem Linux Cluster", 《INFORMATION TECHNOLOGY: NEW GENERATIONS (ITNG), 2010 SEVENTH INTERNATIONAL CONFERENCE ON》 *
张磊等: "计算机集群的搭建、测试与应用", 《水利水电科技进展》 *
王勇超: "高性能计算集群技术应用研究", 《万方学位论文数据库》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473140A (en) * 2013-10-09 2013-12-25 浪潮(北京)电子信息产业有限公司 Cluster distribution method of life science applications, and software setup method and device applied of life science applications
CN103793268A (en) * 2014-02-27 2014-05-14 北京并行科技有限公司 Method and device for recognizing dead/low-efficiency process
CN103984613A (en) * 2014-06-10 2014-08-13 浪潮电子信息产业股份有限公司 Method for automatically testing floating point calculation performance of CPU (Central Processing Unit)
CN104035876A (en) * 2014-07-02 2014-09-10 浪潮电子信息产业股份有限公司 Method for implementing LINPACK cluster test in IB network environment based on PXE, SHELL and EXPECT
CN104035876B (en) * 2014-07-02 2017-05-03 浪潮电子信息产业股份有限公司 Method for implementing LINPACK cluster test in IB network environment based on PXE, SHELL and EXPECT
CN105589717A (en) * 2015-12-10 2016-05-18 浪潮电子信息产业股份有限公司 Batch BIOS (Basic Input Output System) refreshing method based on SSH (Spring Struts Hibernate) service
CN107451022A (en) * 2017-08-11 2017-12-08 郑州云海信息技术有限公司 A kind of method and system for automatically adjusting linpack performance tests
CN107451022B (en) * 2017-08-11 2019-07-30 郑州云海信息技术有限公司 A kind of method and system automatically adjusting linpack performance test
CN111343047A (en) * 2020-02-23 2020-06-26 苏州浪潮智能科技有限公司 Method and system for monitoring IB network flow

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120627