CN108132872A - GRAPES system optimization method based on a parallel supercomputing grid cloud platform - Google Patents

Info

Publication number
CN108132872A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810021292.5A
Other languages
Chinese (zh)
Other versions
CN108132872B (en)
Inventor
张禹涵
吴涛
吴锡
王铁军
黄敏
杨昊
赵长名
陈海宁
谢磊
肖丹
杨晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology
Priority to CN201810021292.5A
Publication of CN108132872A
Application granted
Publication of CN108132872B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452: Performance evaluation by statistical analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409: Recording or statistical evaluation of computer activity or of user activity, for performance assessment
    • G06F11/3419: Recording or statistical evaluation of computer activity or of user activity, for performance assessment by assessing time
    • G06F11/3423: Recording or statistical evaluation of computer activity or of user activity, for performance assessment by assessing time, where the assessed time is active or idle time
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Preventing errors by testing or debugging software
    • G06F11/3668: Software testing
    • G06F11/3672: Test management
    • G06F11/3688: Test management for test execution, e.g. scheduling of test suites
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Preventing errors by testing or debugging software
    • G06F11/3668: Software testing
    • G06F11/3672: Test management
    • G06F11/3692: Test management for test results analysis

Abstract

The present invention relates to a GRAPES system optimization method based on a parallel supercomputing grid cloud platform, comprising: S1) loading a test data set and running the system, and performing system-level testing, communication-level testing and function-level testing respectively, including: S1.1) system-level testing; S1.2) communication-level testing; S1.3) function-level testing, in which the called functions are monitored to obtain their runtime characteristics. S2) analyzing the test results according to the exported system feature files, including: S2.1) system-level test result analysis; S2.2) MPI communication-level test result analysis; S2.3) function-level test result analysis. S3) optimizing according to the analysis results, the optimization including vectorization, load balancing, and replacing functions in GRAPES_GFS with library functions. The present invention solves the problem of optimizing GRAPES on a parallel supercomputing grid platform and improves the running efficiency of the system.

Description

GRAPES system optimization method based on a parallel supercomputing grid cloud platform
Technical field
The present invention relates to the field of meteorological data processing, and in particular to a GRAPES system optimization method based on a parallel supercomputing grid cloud platform.
Background art
After long-term development, the scientific foundations and technical methods of numerical weather prediction are relatively mature. It is the most important scientific approach to weather forecasting and has a unique advantage in forecasting extreme weather events. Numerical weather prediction must account comprehensively for atmospheric motions of all scales and also for the interaction between the atmosphere and the other spheres of the Earth, so the volume of data involved in the computation is very large. At the same time, the lead time of credible medium-range numerical forecasts keeps extending and the real-time requirements on numerical forecasting keep rising, so improving the computing speed of GRAPES is increasingly important.
Research and development of the GRAPES system proceeds in four directions: data assimilation, the dynamical framework of the forecast model, atmospheric dynamics, and computer support. The results in these four areas are integrated into two classes of forecast system: a regional mesoscale assimilation and forecast system, and a global assimilation and medium-range forecast system. The global medium-range numerical weather prediction system GRAPES-GFS is the core of the operational numerical forecasting system: it provides boundary conditions and background information for the regional mesoscale numerical forecast system (GRAPES-MESO) and is the basis of the global ensemble prediction system. At present, GRAPES-GFS can forecast weather patterns and precipitation events worldwide over a 10-day range.
The GRAPES system currently runs on the Sunway 4000A, which is based on the IBM POWER processor architecture. Hardware installation of the Sunway 4000A was completed in September 2009 and a partial expansion was completed in 2011; the GRAPES system has run on this parallel platform since then. The peak computing capability of the Sunway 4000A computing subsystem is 15.75 TFLOPS. It contains 42 processing units, each with 4 sub-nodes; each sub-node is configured with 2 four-core CPUs sharing 36 GB of memory. There are 16 storage nodes providing 128 TB of global storage capacity. The Sunway 4000A system uses mainstream low-power Intel processors: each compute node contains two 2.93 GHz four-core Xeon processors (Intel Xeon Nehalem X5570/2.93GHz), with 4.5 GB of memory configured per CPU core. Nodes are interconnected by an InfiniBand network, and the bidirectional bandwidth of the high-speed network between nodes is 80 Gbps.
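The hardware figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes 4 double-precision floating-point operations per core per cycle, a value that is not stated in the source; the constant and function names are likewise illustrative.

```python
# Figures quoted in the hardware description.
UNITS = 42                # processing units in the computing subsystem
SUBNODES_PER_UNIT = 4     # sub-nodes per unit
CPUS_PER_SUBNODE = 2      # four-core CPUs per sub-node
CORES_PER_CPU = 4
CLOCK_GHZ = 2.93
MEM_PER_CORE_GB = 4.5

# Assumption, not stated in the source:
# 4 double-precision FLOPs per core per cycle.
FLOPS_PER_CYCLE = 4


def total_cores() -> int:
    """Number of CPU cores in the computing subsystem."""
    return UNITS * SUBNODES_PER_UNIT * CPUS_PER_SUBNODE * CORES_PER_CPU


def node_memory_gb() -> float:
    """Memory shared by the cores of one sub-node."""
    return CPUS_PER_SUBNODE * CORES_PER_CPU * MEM_PER_CORE_GB


def peak_tflops() -> float:
    """Theoretical peak: cores x clock (GHz) x FLOPs per cycle, in TFLOPS."""
    return total_cores() * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0


if __name__ == "__main__":
    print(total_cores(), node_memory_gb(), round(peak_tflops(), 2))
```

Under these assumptions the core count, the 36 GB of memory per sub-node and the 15.75 TFLOPS peak are mutually consistent.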
However, as numerical weather prediction models keep improving, resolution keeps rising, and forecasting moves toward extended-range and short-term forecasts, the demands that the GRAPES system places on the computing capability and resolution of the Sunway 4000A platform keep growing. Running GRAPES in the existing environment therefore leads to low computational efficiency, such as slower computation and longer computation cycles, because the computational load increases sharply while computing capability is limited. On the other hand, the GRAPES model is a grid-point model whose computational domain is decomposed for parallel execution using a general horizontal grid partition. To improve forecast accuracy the computational grid is divided ever more finely, and the amount of computation increases accordingly. Because the computing capability of the Sunway 4000A platform is limited, this in turn leads to a decline in computational accuracy. Low efficiency and reduced accuracy directly mean that the accuracy and timeliness of GRAPES numerical forecasts cannot be guaranteed, and timeliness and accuracy are the most important requirements of meteorological applications. In addition, as the GRAPES system has continued to run on this parallel platform, the number of users has grown. The resource management system of the original environment cannot dynamically track and reflect in real time how users consume computing resources, cannot apply resource control policies in time, and cannot describe the quantity of resources with a unified quantitative measure or accurately record and control users' resource usage.
In summary, the Sunway 4000A cannot meet the requirements of the GRAPES numerical forecasting system for computing capability, computational accuracy and reasonable resource allocation. To solve the problems of the existing platform, the system must be ported to a new-generation high-performance computing cluster platform built around the Intel x86 architecture. Tianhe-2, located at the National Supercomputing Center in Guangzhou, can meet the requirements for computing capability, computational accuracy and reasonable allocation of related resources that arise from the GRAPES system upgrade and model changes. A corresponding optimization method is therefore needed to solve the problem of optimizing GRAPES on a parallel supercomputing grid platform.
Summary of the invention
To address the deficiencies of the prior art, the present invention proposes a GRAPES system optimization method based on a parallel supercomputing grid cloud platform, comprising the following steps:
S1) loading a test data set and running the system, and performing system-level testing, communication-level testing and function-level testing respectively, including:
S1.1) system-level testing: on GRAPES_GFS, a 0.1° resolution case is used to forecast 24 hours of weather, writing one modvar output every 6 hours, and performance scaling tests are run with 2048, 4096, 8192 and 12000 processes respectively;
S1.2) communication-level testing: communication during program execution is monitored, using a 0.1° resolution case to forecast 24 hours of weather, writing one modvar output every 6 hours, tested with 2048 processes;
S1.3) function-level testing: the functions called in the model are monitored to obtain their runtime characteristics, using a 0.1° resolution case to forecast 24 hours of weather, writing one modvar output every 6 hours, tested with 8192 processes;
S2) analyzing the test results according to the exported system feature files, including:
S2.1) system-level test result analysis: analyzing the runtime characteristics of the GRAPES_GFS model at the 2048, 4096, 8192 and 12000 process scales, the runtime characteristics including CPU usage, network usage, memory usage and disk usage;
S2.2) MPI communication-level test result analysis: analyzing the share of time spent in MPI communication and the share spent in GRAPES logic processing;
S2.3) function-level test result analysis: performing hot-spot analysis on the functions in GRAPES with software to identify the functions that consume the most time;
S3) optimizing according to the analysis results of step S2, the optimization including vectorization, load balancing, and replacing functions in GRAPES_GFS with library functions.
According to a preferred embodiment, step S2.2 further includes analyzing the distribution of communication time and the share of time spent in operations of global scope.
According to a preferred embodiment, in step S3 the ff2 function, which has the highest time share, is optimized by replacing the loops in the source program that compute maximum and minimum values with library functions.
The beneficial effects of the invention are as follows:
The present invention solves the problem of optimizing GRAPES on a parallel supercomputing grid platform and improves the running efficiency of the system. The very large operational and accuracy demands of the GRAPES system match the powerful computing capability of Tianhe-2. Implementing the functions of GRAPES on Tianhe-2 can raise the running speed and computational accuracy of the GRAPES system, substantially improve the accuracy of numerical weather forecasts, and enable real-time weather forecasting over large areas.
Description of the drawings
Fig. 1 shows the flow chart of the present invention;
Fig. 2 shows the runtime characteristics of GRAPES_GFS at different process scales;
Fig. 3 shows a statistical diagram of the MPI communication-level test results;
Fig. 4 shows the variation in the time share of the calls within the integrate function.
Detailed description of the embodiments
The invention is described in detail below with reference to the accompanying drawings.
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings. These descriptions are merely exemplary and are not intended to limit the scope of the invention. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the invention.
Test platform and monitoring tools: this testing was carried out on Tianhe-2 at the National Supercomputing Center in Guangzhou. The details of the test environment are shown in the table below:
To support an overall analysis of the GRAPES system, the operation of the whole system was monitored and its operating metrics were collected. The following tools were mainly used:
VTune
Collects runtime data from the system for monitoring. The collected data directly show how each part of the program runs and give the run time of the system and of each function, so that the run time of every function can be identified and analyzed in a targeted way.
Paramon
An application runtime feature collector. By monitoring in real time the processor, memory, network and storage performance data of servers such as cluster management/login nodes, compute nodes and I/O nodes, it provides the time-varying runtime characteristics of application software on a cluster system.
ParaTune
An application runtime feature analyzer. It analyzes the .para application runtime feature files generated by Paramon and displays the processor, memory, network and disk performance data of each node while the application runs, reconstructing the application's execution on the cluster and describing its runtime characteristics efficiently and accurately.
Step S1: load the test data set and run the system, and perform system-level testing, communication-level testing and function-level testing respectively.
Step S1.1: system-level testing.
System-level testing examines the operation of the system as a whole. On GRAPES_GFS, a 0.1° resolution case is used to forecast 24 hours of weather, writing one modvar output every 6 hours, and performance scaling tests are run with 2048, 4096, 8192 and 12000 processes respectively.
Step S1.2: communication-level testing.
Communication-level testing monitors communication during program execution. A 0.1° resolution case is used to forecast 24 hours of weather, writing one modvar output every 6 hours, and the test is run with 2048 processes.
Step S1.3: function-level testing.
Function-level testing monitors the functions called in the model to obtain their runtime characteristics. A 0.1° resolution case is used to forecast 24 hours of weather, writing one modvar output every 6 hours, and the test is run with 8192 processes.
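The three test levels share one forecast case and differ only in what is monitored and in the process counts used. A minimal sketch of that plan as data follows; the class, field and function names are illustrative and do not come from the GRAPES code base.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LevelSpec:
    """One test level of step S1 (names are hypothetical)."""
    name: str
    process_counts: Tuple[int, ...]
    resolution_deg: float = 0.1      # 0.1 degree resolution case
    forecast_hours: int = 24         # 24-hour forecast
    output_interval_hours: int = 6   # one modvar output every 6 hours

    def output_intervals(self) -> int:
        """Number of whole output intervals in one forecast run."""
        return self.forecast_hours // self.output_interval_hours


TEST_PLAN = (
    LevelSpec("system", (2048, 4096, 8192, 12000)),
    LevelSpec("communication", (2048,)),
    LevelSpec("function", (8192,)),
)


def total_runs(plan) -> int:
    """Total number of model runs implied by the plan."""
    return sum(len(level.process_counts) for level in plan)
```

Keeping the plan as immutable data makes it easy to check that every level uses the same forecast case before any run is submitted.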
Step S2: test result analysis.
Step S2.1: system-level test result analysis.
The overall runtime characteristics of the GRAPES_GFS model are observed at the 2048, 4096, 8192 and 12000 process scales. These characteristics include CPU usage, network usage, memory usage and disk usage.
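One common way to summarise a scaling test of this kind is to compute speedup and parallel efficiency relative to the smallest run. The sketch below shows the calculation only; the function name is hypothetical and the source reports no wall-clock times, so any input values are placeholders.

```python
def scaling_table(times_by_procs):
    """Return (processes, speedup, efficiency) rows relative to the smallest run.

    times_by_procs maps a process count to a wall-clock time.
    Speedup is base_time / time; efficiency divides speedup by the
    ratio of process counts, so 1.0 means ideal scaling.
    """
    if not times_by_procs:
        raise ValueError("no timings given")
    base_procs = min(times_by_procs)
    base_time = times_by_procs[base_procs]
    rows = []
    for procs in sorted(times_by_procs):
        speedup = base_time / times_by_procs[procs]
        efficiency = speedup / (procs / base_procs)
        rows.append((procs, speedup, efficiency))
    return rows
```

For example, a placeholder input of `{2048: 100.0, 4096: 80.0}` yields a speedup of 1.25 and an efficiency of 0.625 for the 4096-process run.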
Step S2.2: MPI communication-level test result analysis.
The run time distribution of the whole program is analyzed, observing the share of time spent in MPI communication and the share spent in GRAPES logic processing. The distribution of communication time and the share of time spent in operations of global scope are then analyzed further.
Step S2.3: function-level test result analysis.
Hot-spot analysis is performed on the functions in GRAPES with software to identify the functions that consume the most time. The specific behavior of each hot-spot function is then analyzed to find possible optimizations.
Step S3: optimize according to the analysis results of step S2.
The preceding steps carried out tests at different levels in order to find the weak points of the GRAPES_GFS model, optimize them with targeted methods, and improve running efficiency. Based on the test results, three main optimization techniques are used: vectorization, load balancing, and replacing functions in GRAPES_GFS with library functions.
Because the GRAPES system itself is complex, optimization is an extremely involved process. As noted above, the hot spots in GRAPES_GFS are spread over several functions, so the ff2 function, which has the highest time share, is optimized here as an example. The hot spot in ff2 is concentrated in computing the maximum and minimum values, so the optimization replaces the loop-based computation of the maximum and minimum in the source program with library functions.
For a better understanding of the present invention, steps S2 and S3 are described further below with reference to a specific embodiment. The embodiment is provided to aid understanding and does not limit the scope of the invention.
Step S2.1: system-level test result analysis.
Fig. 2 shows the overall runtime characteristics of the GRAPES_GFS model at different process counts, namely 2048, 4096, 8192 and 12000 processes, including CPU usage, network usage, memory usage and disk usage.
It can be seen that CPU utilization is close to 100% and the share of CPU time spent in system mode (sys%) is small, which shows that most CPU time is spent executing the user program. CPI (cycles per instruction) has a theoretical value of 0.25 and a measured value of 0.6, indicating relatively high instruction execution efficiency. The GFLOPS percentage for a local run is 6%, whereas in the test results GFLOPS is only about 1%; this low value shows that floating-point efficiency is low. Tianhe-2 supports AVX instructions, but for GRAPES this metric is 0% and the VEC percentage is only about 3%, which indicates that one of the main reasons for the low floating-point efficiency is a low degree of vectorization.
The peak information shows that system resources are not saturated. Comparing the runs with 2048 and 12000 processes, the peak high-speed network bandwidth decreases as the number of processes increases, so network bandwidth is not saturated. Likewise, as the number of processes increases, memory usage is only one third of its peak, so memory is not saturated either. The figure also shows that the GFLOPS peak reaches only 7.74%, about half of the peak value, which leaves considerable room for improvement.
Step S2.2: MPI communication-level test result analysis.
Fig. 3 shows the run time distribution of the whole program: MPI communication accounts for 68.1% of the time and GRAPES logic processing for 31.8%, and about one third of the time is consumed in communication. Further analysis of the distribution of communication time shows that the global synchronization operations MPI_Barrier and MPI_Allgather dominate, accounting for about 27% of total program run time.
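The percentages above are shares of total run time. A minimal sketch of how such shares could be derived from per-category timings follows. The category names mirror those in the text, the function names are hypothetical, and the input format is an assumption rather than the output format of any specific profiler.

```python
# MPI calls of global scope named in the text.
GLOBAL_SYNC_CALLS = {"MPI_Barrier", "MPI_Allgather"}


def time_shares(seconds_by_category):
    """Convert per-category times into percentages of the total."""
    total = sum(seconds_by_category.values())
    if total <= 0:
        raise ValueError("total time must be positive")
    return {name: 100.0 * t / total for name, t in seconds_by_category.items()}


def mpi_share(shares):
    """Percentage of time spent in any category whose name starts with MPI_."""
    return sum(pct for name, pct in shares.items() if name.startswith("MPI_"))


def global_sync_share(shares):
    """Percentage of time spent in global synchronization calls."""
    return sum(pct for name, pct in shares.items() if name in GLOBAL_SYNC_CALLS)
```

With placeholder timings of 50 s in `User_Code`, 20 s in `MPI_Barrier`, 5 s in `MPI_Allgather` and 25 s in `MPI_Sendrecv`, the MPI share is 50% and the global synchronization share is 25%.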
The time shares of individual processes are analyzed further. Overall, the time shares are fairly uniform across processes. The processes numbered 1416 to 1439 spend nearly twice as long computing as the other processes, and their MPI_Sendrecv communication share is lower. For the other processes, the time spent in the computing module User_Code, in MPI_Barrier and in MPI_Sendrecv varies periodically in a wave-like pattern. Within the 1600-process interval, User_Code fluctuates by about 33%, which indicates that the program needs corresponding adjustments to its load balancing.
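Load imbalance of this kind is often quantified by comparing each process's compute time with the median. The sketch below flags outlier ranks and reports a max-over-mean imbalance factor; the threshold of 1.5 and the function names are assumptions made for illustration.

```python
from statistics import mean, median


def imbalance_factor(compute_times):
    """Max-over-mean compute time; 1.0 means perfectly balanced."""
    return max(compute_times) / mean(compute_times)


def slow_ranks(compute_times, threshold=1.5):
    """Ranks whose compute time exceeds threshold x the median time."""
    mid = median(compute_times)
    return [rank for rank, t in enumerate(compute_times) if t > threshold * mid]
```

A rank that computes for twice as long as its peers, as described for processes 1416 to 1439, would be reported by `slow_ranks` with the default threshold.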
Step S2.3: function-level test result analysis.
The integrate function call is the main part of GRAPES logic processing. For the 8192-process run, 6 processes are randomly selected at uniform intervals, and the variation in the time share of the calls within integrate is compared across these processes.
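Randomly selecting processes at uniform intervals can be read as stratified sampling: split the rank range into equal strata and draw one rank from each. The sketch below follows that reading, which is an interpretation of the text rather than the authors' stated procedure; the function name and the seed parameter are illustrative.

```python
import random


def sample_ranks(num_procs, num_samples, seed=None):
    """Pick one random rank from each of num_samples equal-width strata."""
    if not 0 < num_samples <= num_procs:
        raise ValueError("need 0 < num_samples <= num_procs")
    rng = random.Random(seed)
    picks = []
    for k in range(num_samples):
        lo = k * num_procs // num_samples          # first rank of stratum k
        hi = (k + 1) * num_procs // num_samples    # one past the last rank
        picks.append(rng.randrange(lo, hi))
    return picks
```

Calling `sample_ranks(8192, 6)` returns six ranks in increasing order, one from each sixth of the rank range, so both low-numbered and high-numbered processes are always represented.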
As shown in Fig. 4, the time proportions of the calls within integrate differ between processes. In processes with larger process numbers, the time share of colm_init drops sharply and the share of solver_grapes increases. In addition, the shares of colm_init, med_before_solve_io and MPI_BARRIER vary relatively steadily between processes, whereas the time share of the solver_grapes call changes markedly between processes. The solver_grapes call is analyzed further: for the 8192-process run, 6 processes are again randomly selected at uniform intervals. The shares of the main hot spots solve_helmholts and prm_3d alternate, while radiation_driver, upstream_interp_jin and others vary more steadily.
Next, the hot-spot functions of the GRAPES program are ranked by their own time. The most time-consuming is the ff2 function, which accounts for about 3% of total time. Further analysis shows that ff2 is called repeatedly inside loops of its calling functions, and that its hot spot is the computation of maximum and minimum values.
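Hot-spot ranking amounts to sorting functions by the time they consume and expressing each as a share of the total. A minimal sketch over a profile given as a name-to-seconds mapping follows; the function name and the mapping format are assumptions, not the output format of VTune.

```python
def rank_hotspots(self_seconds, top=5):
    """Return the top functions as (name, seconds, percent of total) tuples."""
    total = sum(self_seconds.values())
    if total <= 0:
        raise ValueError("total time must be positive")
    ranked = sorted(self_seconds.items(), key=lambda item: item[1], reverse=True)
    return [(name, secs, 100.0 * secs / total) for name, secs in ranked[:top]]
```

The first entry of the result is the function to examine first, which in the analysis above is ff2.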
Step S3: optimization.
As noted above, the hot spots in GRAPES_GFS are spread over several functions, so the ff2 function, which has the highest time share, is optimized here as an example. The hot spot in ff2 is concentrated in computing the maximum and minimum values, so the optimization replaces the loop-based computation of the maximum and minimum in the source program with library functions. The code before and after optimization is as follows:
Before optimization:
a1 = f(1)
b1 = f(1)
do i = 2, 32
  a1 = amax1(a1, f(i))
  b1 = amin1(b1, f(i))
end do
After optimization:
do i = 2, 32
  a1m = maxval(f)
  b1m = minval(f)
end do
Run output:
Before optimization
A1=98.94819
B1=2.946837
RES_max=98.9489059448242
Total_time=59.90730
After optimization
A1m=98.94819
B1m=2.946837
RES_max=98.9489059448242
Total_time=24.00490
The loop was executed 1000*1000*1000 times on top of the original ff2 function. The time before optimization was 59.9073 and the time after optimization was 24.0049, a speedup of 2.4956. The optimization effect is therefore substantial and improves running efficiency.
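The same idea carries over to other languages: replace a hand-written running max/min loop with built-in reductions, then confirm that the results agree. The Python sketch below mirrors the Fortran change and recomputes the speedup from the two Total_time values in the run output. It is an illustration only, uses hypothetical function names, and does not reproduce the patent's benchmark.

```python
def loop_max_min(values):
    """Running maximum and minimum, element by element (the 'before' form)."""
    hi = lo = values[0]
    for v in values[1:]:
        hi = max(hi, v)
        lo = min(lo, v)
    return hi, lo


def library_max_min(values):
    """Maximum and minimum via built-in reductions (the 'after' form)."""
    return max(values), min(values)


def speedup(time_before, time_after):
    """Ratio of run time before optimization to run time after."""
    return time_before / time_after


if __name__ == "__main__":
    sample = [float((7 * i) % 32) for i in range(32)]
    assert loop_max_min(sample) == library_max_min(sample)
    print(round(speedup(59.9073, 24.0049), 4))
```

Checking that both forms return identical values before timing them guards against an optimization that changes results.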
Based on GRAPES test results, the present invention proposes an optimization method for hot-spot functions in the GRAPES_GFS model. The method can further be extended to the whole GRAPES system to improve running efficiency and solve the problem of optimizing GRAPES on a parallel supercomputing grid platform.
It should be noted that the above specific embodiments are exemplary. Those skilled in the art may devise various solutions inspired by the present disclosure, and such solutions also fall within the scope of the disclosure and the protection scope of the invention. Those skilled in the art will understand that the description and drawings are illustrative and do not limit the claims. The protection scope of the invention is defined by the claims and their equivalents.

Claims (3)

  1. A GRAPES system optimization method based on a parallel supercomputing grid cloud platform, characterized by comprising the following steps:
    S1) loading a test data set and running the system, and performing system-level testing, communication-level testing and function-level testing respectively, including:
    S1.1) system-level testing: on GRAPES_GFS, a 0.1° resolution case is used to forecast 24 hours of weather, writing one modvar output every 6 hours, and performance scaling tests are run with 2048, 4096, 8192 and 12000 processes respectively;
    S1.2) communication-level testing: communication during program execution is monitored, using a 0.1° resolution case to forecast 24 hours of weather, writing one modvar output every 6 hours, tested with 2048 processes;
    S1.3) function-level testing: the functions called in the model are monitored to obtain their runtime characteristics, using a 0.1° resolution case to forecast 24 hours of weather, writing one modvar output every 6 hours, tested with 8192 processes;
    S2) analyzing the test results according to the exported system feature files, including:
    S2.1) system-level test result analysis: analyzing the runtime characteristics of the GRAPES_GFS model at the 2048, 4096, 8192 and 12000 process scales, the runtime characteristics including CPU usage, network usage, memory usage and disk usage;
    S2.2) MPI communication-level test result analysis: analyzing the share of time spent in MPI communication and the share spent in GRAPES logic processing;
    S2.3) function-level test result analysis: performing hot-spot analysis on the functions in GRAPES with software to identify the functions that consume the most time;
    S3) optimizing according to the analysis results of step S2, the optimization including vectorization, load balancing, and replacing functions in GRAPES_GFS with library functions.
  2. The method of claim 1, characterized in that step S2.2 further includes analyzing the distribution of communication time and the share of time spent in operations of global scope.
  3. The method of claim 2, characterized in that in step S3 the ff2 function, which has the highest time share, is optimized by replacing the loops in the source program that compute maximum and minimum values with library functions.
CN201810021292.5A 2018-01-10 2018-01-10 GRAPES (GRAPES) system optimization method based on parallel supercomputing grid cloud platform Active CN108132872B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810021292.5A CN108132872B (en) 2018-01-10 2018-01-10 GRAPES (GRAPES) system optimization method based on parallel supercomputing grid cloud platform

Publications (2)

Publication Number Publication Date
CN108132872A true CN108132872A (en) 2018-06-08
CN108132872B CN108132872B (en) 2020-04-03

Family

ID=62399637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810021292.5A Active CN108132872B (en) 2018-01-10 2018-01-10 GRAPES (GRAPES) system optimization method based on parallel supercomputing grid cloud platform

Country Status (1)

Country Link
CN (1) CN108132872B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant