CN102945198A - Method for characterizing application characteristics of high performance computing - Google Patents

Method for characterizing application characteristics of high performance computing

Publication number
CN102945198A
Authority
CN
China
Prior art keywords
application
data
node
monitoring
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210398976XA
Other languages
Chinese (zh)
Other versions
CN102945198B (en)
Inventor
刘羽
金莲
吕文静
于涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201210398976.XA
Publication of CN102945198A
Application granted
Publication of CN102945198B
Legal status: Active
Anticipated expiration

Abstract

The invention provides a method for characterizing the runtime characteristics of application software from various industries in the field of high-performance computing. The method comprehensively examines the load an application places on five stages (input, storage, processing, transmission, and output) and accordingly classifies applications into four broad types: compute-intensive, memory-bound, I/O-intensive, and network-intensive. By quantifying these four aspects, the method captures the application's resource demands in terms of CPU usage, memory capacity, memory throughput, input/output, and network data exchange, and thus reflects the runtime characteristics of the application software as fully as possible. The method is simple, practical, reliable, and effective, and directly reflects how much high-performance hardware resource an application requires, so that the application can be run on a suitable high-performance platform and reach its best performance. Based on these characteristics, the application's performance bottlenecks can be addressed in a targeted way and its scalability improved.
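As a rough illustration of the classification described in the abstract, the sketch below labels an application by whichever resource ratio is largest. The function name `classify_application`, the dictionary keys, and the rule of picking the single largest ratio are assumptions made for illustration; the patent does not specify how the four ratios are combined into a type.

```python
# Hypothetical sketch: the keys and the "largest ratio wins" rule are
# assumptions, not part of the patent text.
CATEGORY_BY_RESOURCE = {
    "cpu": "compute-intensive",
    "memory": "memory-bound",
    "io": "I/O-intensive",
    "network": "network-intensive",
}


def classify_application(ratios):
    """Return the category of the resource with the highest usage ratio.

    `ratios` maps each resource name to measured usage divided by the
    platform's reference value (0.0 means idle, 1.0 means saturated).
    """
    missing = set(CATEGORY_BY_RESOURCE) - set(ratios)
    if missing:
        raise ValueError(f"missing ratios for: {sorted(missing)}")
    dominant = max(CATEGORY_BY_RESOURCE, key=lambda name: ratios[name])
    return CATEGORY_BY_RESOURCE[dominant]


# Made-up ratios for one application run.
example = {"cpu": 0.35, "memory": 0.40, "io": 0.10, "network": 0.82}
print(classify_application(example))
```

A real classifier might instead report all four ratios side by side, since an application can stress more than one resource at once.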

Description

A method for characterizing application characteristics in high-performance computing
Technical field
The present invention relates to the characterization of application software performance in the field of high-performance computing, and specifically to a method that monitors and extracts suitable performance parameters in order to reflect, as reasonably as possible, the demand of large-scale application software for computing resources.
Background art
With the continuous progress of human society and the development of science and technology, our understanding of nature has grown ever broader, and the demand to explore beyond what is known has grown ever more urgent. As a result, the volume of information held by humanity is growing rapidly, and at the same time this massive data must be analyzed and processed in a timely manner. For example, a large astronomical radio telescope array can produce more than 100 GB of cosmic microwave data per second, all of which must be analyzed promptly; in particle physics research, the data from a single LHC collision is measured in TB; and fields such as the human genome project, petroleum exploration, and weather forecasting place ever higher demands on computing power. Against this background, numerical computation has become a third, extremely important means of scientific exploration alongside experiment and theoretical analysis. This reality has driven the world's technological powers to develop supercomputers vigorously. For example, in the TOP500 list released in June 2012, the top-ranked IBM "Sequoia" reached a peak speed of 20 PFlops, while a new generation of still faster supercomputers was already being researched and planned. In general, roughly every ten years the speed of supercomputers increases by three orders of magnitude (1000 times), so the ability to build supercomputers has become a strong indicator of a nation's scientific and technological level and overall national strength.
Although the pace of supercomputer development is astonishing and gratifying, the matching software technology has regrettably lagged behind, which seriously restricts how well supercomputers can be put to use. The basic principles and mathematical algorithms underlying most application software today were proposed and developed in the 1950s and 1960s. These algorithms were well matched to the mainframes of that time, which ran mainly serial code or only a small degree of parallelism between processes. After more than 50 years of development, however, supercomputer architecture has changed dramatically: machines routinely have hundreds of thousands or even millions of CPU cores, and a considerable share use hybrid heterogeneous architectures (CPU+GPU/MIC, etc.), so the early physical models and mathematical algorithms can no longer cope. This is the main reason why most present application software is inefficient and scales poorly.
To resolve these difficulties, on the one hand, we should vigorously research and develop new physical models and mathematical algorithms matched to today's supercomputer architectures. This is the fundamental way to break through the existing bottleneck, but it is an extremely difficult problem and will not yield results or large-scale application in the short term. On the other hand, we should study the vast body of inherited application software, characterize its runtime behavior reasonably, and find its performance bottlenecks, so that these applications perform as well as possible on existing platforms; this also provides a solid basis for improving application performance and breaking through its limits. How to characterize application features scientifically and reasonably is therefore the main problem addressed by the present invention.
Summary of the invention
The purpose of this invention is to provide a method for characterizing application characteristics in high-performance computing.
This purpose is achieved as follows. The technical problem to be solved is to design a method for quickly and efficiently characterizing the runtime characteristics of application programs in high-performance computing, so as to quickly and accurately determine an application's demand for computing resources and maximize its performance.
In view of the characteristics of existing high-performance computer architectures and computing applications, the process of characterizing an application's runtime behavior is divided into two main steps: 1) monitoring the running application and extracting data on its use of computing resources, and 2) analyzing and post-processing the collected data. For the former, a monitor is run in the background according to the structure of the high-performance computing platform; it monitors the application's use of computing resources in real time and extracts the data. The monitor must not only work on hardware platforms of different architectures but, more importantly, consume very few resources itself so that it does not affect the normal operation of the monitored program. For the latter, reasonable reference quantities are set according to the characteristics of the hardware platform, and suitable reference quantities are selected from the large volume of monitoring data for analysis, in order to determine the level of the application's demand for computing resources; this requires the ability to analyze and process massive data under a unified standard. The specific analysis and organization flow is as follows:
1) Determine the software and hardware platform: according to the application software to be characterized, select a suitable hardware platform and deploy the corresponding software environment (operating system, math libraries, monitor, and so on). The hardware platform should be as balanced in performance as possible and should leave some resource headroom;
2) Deploy and run the monitor: on the hardware platform prepared in step 1), run the resource monitor on the master node and on the slave nodes. The resource monitor should monitor, in real time, the application's resource usage on all compute nodes, including but not limited to CPU usage, memory usage, I/O bandwidth, and real-time network throughput; the monitoring should cover all resources of the hardware platform that the application uses;
3) Send monitoring data from the slave nodes to the master node: for monitoring purposes, "master node" and "slave node" are relative concepts, and the master node may simultaneously act as a slave node. The master node mainly receives monitoring data, while each slave node monitors its own resource usage and sends the monitoring data to the master node. If the master node does not receive data normally, return to step 1) and re-confirm that the software and hardware platform is usable;
4) Run the application to be characterized on the monitored nodes;
5) Real-time monitoring: ensure that the monitored data is authentic and valid; if the data is distorted, return to step 2);
6) Determine the analysis standard: determine reference values according to the hardware characteristics of the platform on which the application runs. If the application runs over a Gigabit network, the bandwidth upper limit of the Gigabit network, 125 MB/s, should be taken as the reference value; if it uses an InfiniBand network, the upper limit of the HCA card in use is taken as the reference value. Reference values for the other metrics are determined in the same way;
7) Generate the characterization: from the monitoring data, compute the mean or select the maximum, and compare it with the reference standard; the resulting ratio is the characterization result, and different applications yield different characterization results. For the network characterization of an application, compute the mean or maximum network traffic during the application's run and compare it with the reference value; the ratio is the characterization result for that application.
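Steps 6) and 7) amount to a simple ratio. It is written out below with symbols introduced here purely for illustration (the patent itself gives no notation): $x_i$ is the $i$-th monitored sample of a metric, $N$ the number of samples, $x_{\mathrm{ref}}$ the hardware reference value, and $R$ the characterization result.

```latex
\[
  R_{\mathrm{mean}} = \frac{1}{x_{\mathrm{ref}}} \cdot \frac{1}{N} \sum_{i=1}^{N} x_i ,
  \qquad
  R_{\max} = \frac{\max_{1 \le i \le N} x_i}{x_{\mathrm{ref}}}
\]
% Reference value for a Gigabit network, as used in step 6):
\[
  x_{\mathrm{ref}} = \frac{10^{9}\ \mathrm{bit/s}}{8\ \mathrm{bit/byte}}
                   = 1.25 \times 10^{8}\ \mathrm{byte/s}
                   = 125\ \mathrm{MB/s}
\]
```

As a made-up example, an application that averages 50 MB/s of network traffic on a Gigabit network would have $R_{\mathrm{mean}} = 50 / 125 = 0.4$.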
The beneficial effects of the invention are as follows: the invention makes full use of the characteristics of high-performance computing to explore in depth an application's demand for computing resources, while consuming so few resources that the normal operation of the analyzed application is unaffected. The method can characterize application features quickly and helps the large body of existing applications realize far more of their computing performance.
Brief description of the drawings
Fig. 1 is a flowchart of the analysis method.
Detailed description
The method of the invention is explained below with reference to the accompanying drawing.
In view of the characteristics of existing high-performance computer architectures and computing applications, the invention divides the process of characterizing an application's runtime behavior into two main steps: first, monitoring the running application and extracting data on its use of computing resources; second, analyzing and post-processing the collected data. For the former, a monitor is run in the background according to the structure of the high-performance computing platform; it monitors the application's use of computing resources in real time and extracts the data. The monitor must not only work on hardware platforms of different architectures but, more importantly, consume very few resources itself so that it does not affect the normal operation of the monitored program. For the latter, reasonable reference quantities are set according to the characteristics of the hardware platform, and suitable quantities are selected from the large volume of monitoring data for analysis, in order to determine the level of the application's demand for computing resources; the main requirement here is the ability to analyze and process massive data under a unified standard. Specifically, the process can be divided into the following steps:
1. Deploy the monitor according to the computing platform;
2. Start the monitor from the master node and collect the characteristic parameters of each compute node in turn;
3. Run the specified application;
4. Finish monitoring the application and exit the monitoring environment;
5. Set the reference standard according to the hardware platform;
6. Analyze and process the monitoring data.
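The patent does not name a specific monitor. As one possible illustration of lightweight sampling on a Linux compute node, the sketch below derives CPU utilization from two snapshots of the aggregate `cpu` line in `/proc/stat`. The function names and the choice of `/proc/stat` as the data source are assumptions; idle time is taken here as the sum of the `idle` and `iowait` fields.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into (idle, total) ticks."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Use the first eight counters:
    # user nice system idle iowait irq softirq steal.
    # guest and guest_nice are skipped because the kernel already
    # accounts for them inside user and nice.
    ticks = [int(value) for value in fields[1:9]]
    idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
    return idle, sum(ticks)


def cpu_utilization(before, after):
    """Return the busy fraction (0.0 to 1.0) between two 'cpu' lines."""
    idle_before, total_before = parse_cpu_line(before)
    idle_after, total_after = parse_cpu_line(after)
    total_delta = total_after - total_before
    if total_delta <= 0:
        raise ValueError("snapshots must be taken at increasing times")
    return 1.0 - (idle_after - idle_before) / total_delta


# Two made-up snapshots taken one sampling interval apart.
first = "cpu  1000 0 500 8000 500 0 0 0 0 0"
second = "cpu  1600 0 700 8150 550 0 0 0 0 0"
print(round(cpu_utilization(first, second), 2))
```

On a real node the two lines would be read from `/proc/stat` a fixed interval apart. Reading a small text file keeps the monitor's own overhead low, in line with the requirement that it not disturb the monitored program.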
To make the purpose, technical solution, and advantages of the invention clearer, the key steps are described in detail below with reference to the drawing, taking as an example the characterization of an application running on two CPU nodes. The characterization procedure is the same for applications that use more compute nodes or heterogeneous systems such as GPU/MIC.
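For the two-node case, the division of labor between slave and master nodes could look like the sketch below. The JSON line format, the field names, and the `MasterCollector` class are all hypothetical; the patent specifies only that slave nodes send their own monitoring data and that the master node receives it. The transport (sockets, shared files, a message queue) is deliberately left out.

```python
import json
from collections import defaultdict


def encode_sample(node, timestamp, metrics):
    """Slave side: serialize one monitoring sample as a JSON line."""
    return json.dumps({"node": node, "t": timestamp, "metrics": metrics})


class MasterCollector:
    """Master side: receive samples and keep them grouped by node."""

    def __init__(self):
        self.samples = defaultdict(list)

    def receive(self, line):
        record = json.loads(line)
        self.samples[record["node"]].append((record["t"], record["metrics"]))

    def silent_nodes(self, expected_nodes):
        """Nodes that have sent nothing yet, which signals a platform problem."""
        return [node for node in expected_nodes if not self.samples[node]]


collector = MasterCollector()
nodes = ["node0", "node1"]

# The master (node0) also acts as a slave and reports its own usage.
collector.receive(encode_sample("node0", 0.0, {"cpu": 0.91, "net_mb_s": 40.0}))
print(collector.silent_nodes(nodes))  # node1 has not reported yet

collector.receive(encode_sample("node1", 0.0, {"cpu": 0.88, "net_mb_s": 42.5}))
print(collector.silent_nodes(nodes))  # every node has reported
```

A non-empty result from `silent_nodes` corresponds to the case where the master node is not receiving data normally and the platform setup has to be revisited.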
Example
Fig. 1 shows a schematic diagram of the analysis flow of the invention. The basic analysis and organization flow is as follows:
1. Determine the software and hardware platform. According to the application software to be characterized, select a suitable hardware platform and deploy the corresponding software environment (operating system, math libraries, monitor, and so on). The hardware platform should be as balanced in performance as possible and should leave some resource headroom;
2. Run the monitor. On the hardware platform prepared in step 1, run the resource monitor on the master node and on the slave nodes. The resource monitor should monitor, in real time, the application's resource usage on all compute nodes, including but not limited to CPU usage, memory usage, I/O bandwidth, and real-time network throughput; the monitoring should cover all resources of the hardware platform that the application uses;
3. Send monitoring data from the slave nodes to the master node. For monitoring purposes, "master node" and "slave node" are relative concepts, and the master node may simultaneously act as a slave node. The master node mainly receives monitoring data, while each slave node monitors its own resource usage and sends the monitoring data to the master node. In this step, if the master node does not receive data normally, return to step 1 and re-confirm that the software and hardware platform is usable;
4. Run the application to be characterized on the monitored nodes;
5. Real-time monitoring. The main concern here is to ensure that the monitored data is authentic and valid; if the data is distorted, return to step 2;
6. Determine the analysis standard. Reference values are determined according to the hardware characteristics of the platform on which the application runs. For example, if the application runs over a Gigabit network, the bandwidth upper limit of the Gigabit network (125 MB/s) should be taken as the reference value; if it uses an InfiniBand network, the upper limit of the HCA card in use is taken as the reference value. Reference values for the other metrics are determined in the same way;
7. Generate the characterization. From the monitoring data, compute the mean (or select the maximum, depending on the situation) and compare it with the reference standard; the resulting ratio is the characterization result, and different applications yield different results. For example, for the network characterization of an application, compute the mean (or maximum) network traffic during the application's run and compare it with the reference value; the ratio is the characterization result for that application.
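Steps 5 to 7 can be sketched as follows. The function names, the validity rule (samples must be non-empty, finite, and non-negative), and the sample values are assumptions for illustration; only the 125 MB/s Gigabit reference value and the mean-or-maximum ratio come from the text.

```python
import math
from statistics import mean

GIGABIT_REFERENCE_MB_S = 125.0  # bandwidth upper limit of a Gigabit network


def validate_samples(samples):
    """Step 5 (assumed rule): reject empty, negative, or non-finite data."""
    if not samples:
        raise ValueError("no monitoring data; redeploy the monitor")
    for value in samples:
        if not math.isfinite(value) or value < 0:
            raise ValueError(f"distorted sample: {value!r}")


def characterize(samples, reference, use_max=False):
    """Step 7: ratio of the mean (or maximum) sample to the reference value."""
    if reference <= 0:
        raise ValueError("reference value must be positive")
    validate_samples(samples)
    observed = max(samples) if use_max else mean(samples)
    return observed / reference


# Made-up network throughput samples in MB/s collected during one run.
network_mb_s = [40.0, 55.0, 62.5, 50.0, 42.5]
print(characterize(network_mb_s, GIGABIT_REFERENCE_MB_S))
print(characterize(network_mb_s, GIGABIT_REFERENCE_MB_S, use_max=True))
```

The same function applies to the other metrics once a reference value has been chosen for each, for example the upper limit of the HCA card on an InfiniBand network.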
The analysis method of the invention reflects an application's runtime characteristics to a great extent, making it possible to quickly grasp the application's use of computing resources and its scalability, and thus to accurately identify the hardware platform on which the application performs best, so as to maximize application performance.
Apart from the technical features described in the specification, everything else is known to those skilled in the art.

Claims (1)

1. A method for characterizing application characteristics in high-performance computing, characterized in that, in view of the characteristics of existing high-performance computer architectures and computing applications, the process of characterizing an application's runtime behavior is divided into two main steps: 1) monitoring the running application and extracting data on its use of computing resources, and 2) analyzing and post-processing the collected data; for the former, a monitor is run in the background according to the structure of the high-performance computing platform, monitors the application's use of computing resources in real time, and extracts the data, the monitor being required not only to work on hardware platforms of different architectures but also to consume very few resources itself so that it does not affect the normal operation of the monitored program; for the latter, reasonable reference quantities are set according to the characteristics of the hardware platform, and suitable reference quantities are selected from the large volume of monitoring data for analysis, in order to determine the level of the application's demand for computing resources, which requires the ability to analyze and process massive data under a unified standard; the specific analysis and organization flow is as follows:
1) Determine the software and hardware platform: according to the application software to be characterized, select a suitable hardware platform and deploy the corresponding software environment (operating system, math libraries, monitor, and so on), the hardware platform being as balanced in performance as possible and leaving some resource headroom;
2) Deploy and run the monitor: on the hardware platform prepared in step 1), run the resource monitor on the master node and on the slave nodes; the resource monitor should monitor, in real time, the application's resource usage on all compute nodes, including but not limited to CPU usage, memory usage, I/O bandwidth, and real-time network throughput, and the monitoring should cover all resources of the hardware platform that the application uses;
3) Send monitoring data from the slave nodes to the master node: for monitoring purposes, "master node" and "slave node" are relative concepts, and the master node may simultaneously act as a slave node; the master node mainly receives monitoring data, while each slave node monitors its own resource usage and sends the monitoring data to the master node; if the master node does not receive data normally, return to step 1) and re-confirm that the software and hardware platform is usable;
4) Run the application to be characterized on the monitored nodes;
5) Real-time monitoring: ensure that the monitored data is authentic and valid; if the data is distorted, return to step 2);
6) Determine the analysis standard: determine reference values according to the hardware characteristics of the platform on which the application runs; if the application runs over a Gigabit network, the bandwidth upper limit of the Gigabit network, 125 MB/s, should be taken as the reference value; if it uses an InfiniBand network, the upper limit of the HCA card in use is taken as the reference value; reference values for the other metrics are determined in the same way;
7) Generate the characterization: from the monitoring data, compute the mean or select the maximum, and compare it with the reference standard; the resulting ratio is the characterization result, and different applications yield different characterization results; for the network characterization of an application, compute the mean or maximum network traffic during the application's run and compare it with the reference value, the ratio being the characterization result for that application.
CN201210398976.XA 2012-10-19 2012-10-19 Method for characterizing application characteristics of high performance computing Active CN102945198B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210398976.XA CN102945198B (en) 2012-10-19 2012-10-19 Method for characterizing application characteristics of high performance computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210398976.XA CN102945198B (en) 2012-10-19 2012-10-19 Method for characterizing application characteristics of high performance computing

Publications (2)

Publication Number Publication Date
CN102945198A true CN102945198A (en) 2013-02-27
CN102945198B CN102945198B (en) 2016-03-02

Family

ID=47728146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210398976.XA Active CN102945198B (en) 2012-10-19 2012-10-19 Method for characterizing application characteristics of high performance computing

Country Status (1)

Country Link
CN (1) CN102945198B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1026592A2 (en) * 1999-02-04 2000-08-09 Sun Microsystems, Inc. Method for analyzing the performance of application programs
WO2003083731A2 (en) * 2002-03-28 2003-10-09 Siemens Aktiengesellschaft Pc-arrangement for visualisation, diagnosis and expert systems for monitoring, controlling and regulating high voltage supply units of electric filters
US20050013705A1 (en) * 2003-07-16 2005-01-20 Keith Farkas Heterogeneous processor core systems for improved throughput
CN102591921A (en) * 2010-12-20 2012-07-18 微软公司 Scheduling and management in a personal datacenter

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246569A (en) * 2013-05-20 2013-08-14 浪潮(北京)电子信息产业有限公司 Method and device for representing high-performance calculation application characteristics
CN103501253A (en) * 2013-10-18 2014-01-08 浪潮电子信息产业股份有限公司 Monitoring organization method for high-performance computing application characteristics
CN103716210A (en) * 2014-01-07 2014-04-09 浪潮(北京)电子信息产业有限公司 System, device and method for monitoring operation efficiency of calculation application software
CN103716210B (en) * 2014-01-07 2017-05-24 浪潮(北京)电子信息产业有限公司 System, device and method for monitoring operation efficiency of calculation application software
CN104156296A (en) * 2014-08-01 2014-11-19 浪潮(北京)电子信息产业有限公司 System and method for intelligently monitoring large-scale data center cluster computing nodes
CN104156296B (en) * 2014-08-01 2017-06-30 浪潮(北京)电子信息产业有限公司 The system and method for intelligent monitoring large-scale data center cluster calculate node
CN106201691A (en) * 2016-07-11 2016-12-07 浪潮(北京)电子信息产业有限公司 The dispatching method of a kind of network I/O intensive task and device
CN115129541A (en) * 2022-06-20 2022-09-30 北京计算机技术及应用研究所 High-performance computing resource monitoring implementation method based on Feiteng platform
CN115129541B (en) * 2022-06-20 2024-03-26 北京计算机技术及应用研究所 High-performance computing resource monitoring implementation method based on Feiteng platform
CN117407250A (en) * 2023-12-15 2024-01-16 上海飞斯信息科技有限公司 Computer performance control system based on real-time processing of running environment
CN117407250B (en) * 2023-12-15 2024-03-12 上海飞斯信息科技有限公司 Computer performance control system based on real-time processing of running environment

Also Published As

Publication number Publication date
CN102945198B (en) 2016-03-02

Similar Documents

Publication Publication Date Title
CN102945198A (en) Method for characterizing application characteristics of high performance computing
CN103235974B (en) Method for improving massive spatial data processing efficiency
Tikir et al. PSINS: An open source event tracer and execution simulator for MPI applications
CN106547882A (en) Real-time processing method and system for marketing big data in a smart grid
CN106339351B (en) SGD algorithm optimization system and method
US20130191052A1 (en) Real-time simulation of power grid disruption
CN104156296B (en) System and method for intelligently monitoring large-scale data center cluster computing nodes
CN103501253A (en) Monitoring organization method for high-performance computing application characteristics
CN103246569A (en) Method and device for representing high-performance calculation application characteristics
CN104519112A (en) Intelligent selecting framework for staged cloud manufacturing services
CN106408126A (en) Three-stage optimization method oriented to concurrent acquisition of energy consumption data
CN103885867A (en) Online evaluation method of performance of analog circuit
CN107679133B (en) Mining method applicable to massive real-time PMU data
Rizvandi et al. On modeling dependency between mapreduce configuration parameters and total execution time
CN103593534A (en) Shield tunneling machine intelligent model selection method and device based on engineering geology factor relevance
CN104090813A (en) Analysis modeling method for CPU (central processing unit) usage of virtual machines in cloud data center
Ventura et al. Efficient performance‐based design using parallel and cloud computing
Dad et al. Synthesis and feedback on the distribution and parallelization of FMI-CS-based co-simulations with the DACCOSIM platform
Alekseev et al. Scientific Data Lake for High Luminosity LHC project and other data-intensive particle and astro-particle physics experiments
CN109800506B (en) Performance evaluation method and system of aircraft
Tang et al. A job scheduling algorithm based on parallel workload prediction on computational grid
Li et al. Construction of a Smart Supply Chain for Sand Factory Using the Edge-Computing-Based Deep Learning Algorithm
Becciani et al. Cosmological simulations and data exploration: a testcase on the usage of grid infrastructure
Ziganurova et al. Local Virtual Times Analysis in PCS Model
CN105787180A (en) Large-scale crowd behavior evolution analysis method based on Map-Reduce and multi-agent models

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant