CN108563510A

CN108563510A - Architecture-aware optimization method for exascale computing

Info

Publication number: CN108563510A
Application number: CN201810418522.1A
Authority: CN
Inventors: 刘彦; 刘尧; 黄智�; 黄一智; 李仁发
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2018-05-04
Filing date: 2018-05-04
Publication date: 2018-09-21
Anticipated expiration: 2038-05-04
Also published as: CN108563510B

Abstract

The invention discloses an architecture-aware optimization method for exascale computing, comprising a task-flow-aware optimization step, a program-code-aware optimization step, a program-algorithm-aware optimization step, and a vector-aware optimization step. By combining several awareness-based optimization algorithms, the method can reasonably adjust and allocate hardware resources when faced with the massive computing resources of exascale computing, thereby improving computing efficiency.

Description

An Architecture-Aware Optimization Method for Exascale Computing

Technical Field

The invention relates specifically to an architecture-aware optimization method for exascale computing.

Background

With the development of the economy and technology, rising living standards, and the arrival of the information age, the era of exascale supercomputers (on the order of 10^18 operations per second) has arrived. Exascale computing offers exceptional computing power and can bring revolutionary changes to computational science and scientific research.

As science advances, numerical models of all kinds keep growing in scale and come closer to real conditions. A more accurate model, however, is also more complex to solve. At present, research funding at scientific research institutes is relatively tight and cannot cover the purchase of a large number of servers. As a result, even when an institute succeeds in building a more accurate numerical model, it cannot solve that model on its own.

If a research institute can port the code of a more accurate numerical model to a supercomputing platform for exascale computing, it no longer needs to buy a large number of high-performance servers. It only needs to port the computation code to the supercomputing platform, check the progress and results of the computation remotely, and pay the supercomputing fee. That fee is undoubtedly far lower than the cost of a large number of high-performance servers.

However, no research has yet addressed how to port the computation code of a numerical model effectively to a supercomputing platform for exascale computing so as to make the fullest use of the platform's computing power, and this has to some extent restricted the development of the field.

Summary of the Invention

The purpose of the present invention is to provide an architecture-aware optimization method for exascale computing that can perceive the task architecture and optimize programs accordingly.

The architecture-aware optimization method for exascale computing provided by the present invention includes the following steps:

a task-flow-aware optimization step, used to perceive and optimize all tasks that need to be executed and to adjust and determine the processing order of the tasks, thereby improving the processing efficiency of the task flow;

a program-code-aware optimization step, used to distribute the program code to be executed to the most suitable system for distributed computation and to aggregate the results, thereby improving the processing efficiency of the program code;

a program-algorithm-aware optimization step, used to analyze the program algorithms and allocate the corresponding computing resources, thereby improving the processing efficiency of the program algorithms;

a vector-aware optimization step, used to select the most highly compressed storage format for the vectorized computations in the computation process, thereby improving the processing efficiency of vectorized computation.
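As a rough illustration of the vector-aware step, the sketch below estimates the stored size of a vector under a few candidate formats and picks the smallest. The three candidate formats (dense, index/value pairs, run-length) and the size model of one unit per stored number are assumptions made for illustration; the source does not name specific storage formats or a selection rule.

```python
from typing import Dict, Sequence, Tuple


def dense_size(vec: Sequence[float]) -> int:
    """Dense storage keeps every element."""
    return len(vec)


def sparse_size(vec: Sequence[float]) -> int:
    """Index/value storage keeps two numbers per nonzero element."""
    return 2 * sum(1 for x in vec if x != 0)


def run_length_size(vec: Sequence[float]) -> int:
    """Run-length storage keeps a (value, count) pair per run of equal values."""
    runs = 0
    previous = object()  # sentinel that compares unequal to every element
    for x in vec:
        if x != previous:
            runs += 1
            previous = x
    return 2 * runs


def choose_format(vec: Sequence[float]) -> Tuple[str, Dict[str, int]]:
    """Return the name of the smallest format and all estimated sizes.

    Ties are resolved in favor of the format listed first (dense).
    """
    sizes = {
        "dense": dense_size(vec),
        "sparse": sparse_size(vec),
        "run_length": run_length_size(vec),
    }
    best = min(sizes, key=sizes.get)
    return best, sizes


if __name__ == "__main__":
    print(choose_format([0, 0, 0, 5, 0, 0, 0, 0]))
    print(choose_format([7] * 10))
```

A real selector would probably also weigh how well each format suits the hardware's vector units, which this size-only model ignores.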

The task-flow-aware optimization specifically includes the following steps:

(1) Obtain all tasks that need to be executed, and confirm that no interdependence exists among them;

(2) Identify the subtasks of all tasks obtained in step (1), and allocate computing resources to the subtasks together within a single time period;

(3) Use the shortest-job optimization algorithm to find the subtask with the smallest amount of computation among the subtasks, and process that subtask first.
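Steps (1) to (3) can be sketched as follows, assuming each subtask carries an estimated amount of computation. The `Task` and `Subtask` classes, the `depends_on` field, and the cost values are hypothetical names introduced for this example, and the resource allocation of step (2) is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Subtask:
    name: str
    cost: float  # estimated amount of computation (hypothetical unit)


@dataclass
class Task:
    name: str
    subtasks: List[Subtask]
    depends_on: Set[str] = field(default_factory=set)


def check_independent(tasks: List[Task]) -> None:
    """Step (1): raise if any task depends on another task in the batch."""
    names = {t.name for t in tasks}
    for t in tasks:
        clash = t.depends_on & names
        if clash:
            raise ValueError(f"{t.name} depends on {sorted(clash)}")


def schedule(tasks: List[Task]) -> List[str]:
    """Steps (2)-(3): pool all subtasks, then order them smallest-first."""
    check_independent(tasks)
    pool = [(s.cost, t.name, s.name) for t in tasks for s in t.subtasks]
    pool.sort()  # smallest cost first; ties broken by task name, then subtask name
    return [f"{task}/{sub}" for _, task, sub in pool]


if __name__ == "__main__":
    batch = [
        Task("A", [Subtask("a1", 5.0), Subtask("a2", 1.0)]),
        Task("B", [Subtask("b1", 3.0)]),
    ]
    print(schedule(batch))
```

Sorting the whole pool once corresponds to repeatedly picking the smallest remaining subtask, which is how shortest-job-first ordering is usually described.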

The program-code-aware optimization specifically includes the following steps:

1) Run test code to find the best match between each typical independent code segment and the system hardware;

2) Analyze the program code to find its independent code segments;

3) Match the independent code segments obtained in step 2) against the typical independent code segments, thereby obtaining the correspondence between each independent code segment and a typical independent code segment;

4) According to the correspondence obtained in step 3) and the best match between each typical independent code segment and the system hardware obtained in step 1), assign the independent code segments to the corresponding system hardware, thereby improving the processing efficiency of the program code.
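A minimal sketch of steps 1), 3) and 4), assuming that step 2) has already produced a feature profile for each independent code segment. The feature names, the nearest-profile matching rule, and the benchmark timings are all assumptions for illustration; the source does not specify how segments are characterized or matched.

```python
from typing import Dict

Profile = Dict[str, float]


def best_hardware(benchmarks: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Step 1): for each typical segment, pick the hardware with the lowest runtime."""
    return {kind: min(times, key=times.get) for kind, times in benchmarks.items()}


def classify(segment: Profile, typical: Dict[str, Profile]) -> str:
    """Step 3): match a segment to the typical segment with the nearest profile."""

    def distance(profile: Profile) -> float:
        keys = set(profile) | set(segment)
        return sum((profile.get(k, 0.0) - segment.get(k, 0.0)) ** 2 for k in keys)

    return min(typical, key=lambda kind: distance(typical[kind]))


def assign(
    segments: Dict[str, Profile],
    typical: Dict[str, Profile],
    benchmarks: Dict[str, Dict[str, float]],
) -> Dict[str, str]:
    """Step 4): send each segment to the hardware that suits its typical segment."""
    table = best_hardware(benchmarks)
    return {name: table[classify(profile, typical)] for name, profile in segments.items()}


if __name__ == "__main__":
    # Hypothetical measured runtimes (seconds) of typical segments per hardware type.
    benchmarks = {
        "dense_loop": {"cpu": 4.0, "gpu": 1.0},
        "branchy": {"cpu": 1.5, "gpu": 6.0},
    }
    # Hypothetical feature profiles of the typical segments.
    typical = {
        "dense_loop": {"flops_per_byte": 8.0, "branch_ratio": 0.05},
        "branchy": {"flops_per_byte": 0.5, "branch_ratio": 0.4},
    }
    segments = {
        "stencil": {"flops_per_byte": 6.0, "branch_ratio": 0.1},
        "parser": {"flops_per_byte": 0.3, "branch_ratio": 0.5},
    }
    print(assign(segments, typical, benchmarks))
```

Keeping the benchmark table separate from the matching step means the hardware measurements of step 1) need to be taken only once per platform and can then be reused for every program.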

The program-algorithm-aware optimization specifically includes the following steps:

A. Before the program algorithms run, allocate the computing resources evenly among the different program algorithms;

B. While the program algorithms are running, if a computation has to wait, re-analyze the resource requirements of the program algorithms;

C. According to the analysis result obtained in step B, reallocate the computing resources using the dynamic resource balancing (DRF) algorithm.
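Steps A and B might look like the sketch below, which treats the computing resource as a pool of cores and uses observed waiting time as the trigger for re-analysis. The function names, the core-count model, and the waiting-time threshold are assumptions for this example and do not come from the source.

```python
from typing import Dict, List


def even_allocation(algorithms: List[str], total_cores: int) -> Dict[str, int]:
    """Step A: split cores evenly; leftover cores go to the first algorithms."""
    if not algorithms:
        return {}
    base, extra = divmod(total_cores, len(algorithms))
    return {name: base + (1 if i < extra else 0) for i, name in enumerate(algorithms)}


def needs_reanalysis(wait_seconds: Dict[str, float], threshold: float = 0.0) -> List[str]:
    """Step B: list the algorithms whose observed waiting time exceeds the threshold."""
    return [name for name, waited in wait_seconds.items() if waited > threshold]


if __name__ == "__main__":
    print(even_allocation(["fft", "solver", "io"], 10))
    print(needs_reanalysis({"fft": 0.0, "solver": 2.5, "io": 0.0}))
```

Any algorithm returned by `needs_reanalysis` would then have its resource requirements re-estimated before the reallocation of step C.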

The reallocation of computing resources in step C specifically follows these principles:

R1. No user can obtain more resources than other users;

R2. No user can obtain more resources by misreporting its resource requirements;

R3. All available resources are allocated, without displacing existing resource allocations;

R4. No user prefers the resource allocation of another user.
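The sketch below shows one way step C could work, assuming that "DRF" refers to Dominant Resource Fairness, the widely known allocation scheme with that abbreviation whose fairness properties resemble principles R1 to R4. That reading is an assumption, as are the user names, resource names, and numbers; the source does not give the algorithm's details.

```python
from typing import Dict


def drf_allocate(
    capacity: Dict[str, float],
    demands: Dict[str, Dict[str, float]],
) -> Dict[str, int]:
    """Progressive filling: repeatedly give one task to the user whose
    dominant share (largest fraction of any single resource) is lowest."""
    used = {r: 0.0 for r in capacity}
    tasks = {u: 0 for u in demands}

    def dominant_share(user: str) -> float:
        return max(tasks[user] * demands[user][r] / capacity[r] for r in capacity)

    while demands:
        # sorted() makes tie-breaking deterministic (alphabetical by user).
        user = min(sorted(demands), key=dominant_share)
        need = demands[user]
        if any(used[r] + need[r] > capacity[r] for r in capacity):
            break  # the next task of the lowest-share user no longer fits
        for r in capacity:
            used[r] += need[r]
        tasks[user] += 1
    return tasks


if __name__ == "__main__":
    capacity = {"cpu": 9.0, "mem_gb": 18.0}
    demands = {
        "A": {"cpu": 1.0, "mem_gb": 4.0},  # memory-heavy tasks
        "B": {"cpu": 3.0, "mem_gb": 1.0},  # CPU-heavy tasks
    }
    print(drf_allocate(capacity, demands))
```

With these numbers, user A ends up with three tasks and user B with two, so each holds two thirds of its dominant resource (memory for A, CPU for B).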

By adopting multiple awareness-based optimization algorithms, the architecture-aware optimization method for exascale computing provided by the present invention can reasonably adjust and allocate hardware resources when faced with the massive computing resources of exascale computing, thereby improving computing efficiency.

Brief Description of the Drawings

Fig. 1 is a schematic diagram of the method of the present invention.

Detailed Description

Fig. 1 shows a schematic flow chart of the method of the present invention. The architecture-aware optimization method for exascale computing provided by the present invention includes the following steps:

a task-flow-aware optimization step, used to perceive and optimize all tasks that need to be executed and to adjust and determine the processing order of the tasks, thereby improving the processing efficiency of the task flow; specifically, it includes the following steps:

(3) Use the shortest-job optimization algorithm to find the subtask with the smallest amount of computation among the subtasks, and process that subtask first;

a program-code-aware optimization step, used to distribute the program code to be executed to the most suitable system for distributed computation and to aggregate the results, thereby improving the processing efficiency of the program code; specifically, it includes the following steps:

4) According to the correspondence between the independent code segments and the typical independent code segments obtained in step 3), and the best match between each typical independent code segment and the system hardware obtained in step 1), assign the independent code segments to the corresponding system hardware, thereby improving the processing efficiency of the program code;

a program-algorithm-aware optimization step, used to analyze the program algorithms and allocate the corresponding computing resources, thereby improving the processing efficiency of the program algorithms; specifically, it includes the following steps:

C. According to the analysis result obtained in step B, reallocate the computing resources using the dynamic resource balancing (DRF) algorithm;

a vector-aware optimization step, used to select the most highly compressed storage format for the vectorized computations in the computation process, thereby improving the processing efficiency of vectorized computation; specifically, the reallocation follows these principles:

Claims

1. An architecture-aware optimization method for exascale computing, comprising the following steps:

a task-flow-aware optimization step, used to perceive and optimize all tasks that need to be executed and to adjust and determine the processing order of the tasks, thereby improving the processing efficiency of the task flow;

a program-code-aware optimization step, used to distribute the program code to be executed to the most suitable system for distributed computation and to aggregate the results, thereby improving the processing efficiency of the program code;

a program-algorithm-aware optimization step, used to analyze the program algorithms and allocate the corresponding computing resources, thereby improving the processing efficiency of the program algorithms;

a vector-aware optimization step, used to select the most highly compressed storage format for the vectorized computations in the computation process, thereby improving the processing efficiency of vectorized computation.

2. The architecture-aware optimization method for exascale computing according to claim 1, characterized in that the task-flow-aware optimization specifically includes the following steps:

(1) obtaining all tasks that need to be executed, and confirming that no interdependence exists among them;

(2) identifying the subtasks of all tasks obtained in step (1), and allocating computing resources to the subtasks together within a single time period;

(3) using the shortest-job optimization algorithm to find the subtask with the smallest amount of computation among the subtasks, and processing that subtask first.

3. The architecture-aware optimization method for exascale computing according to claim 1 or 2, characterized in that the program-code-aware optimization specifically includes the following steps:

1) running test code to find the best match between each typical independent code segment and the system hardware;

2) analyzing the program code to find its independent code segments;

3) matching the independent code segments obtained in step 2) against the typical independent code segments, thereby obtaining the correspondence between each independent code segment and a typical independent code segment;

4) according to the correspondence obtained in step 3) and the best match between each typical independent code segment and the system hardware obtained in step 1), assigning the independent code segments to the corresponding system hardware, thereby improving the processing efficiency of the program code.

4. The architecture-aware optimization method for exascale computing according to claim 1 or 2, characterized in that the program-algorithm-aware optimization specifically includes the following steps:

A. before the program algorithms run, allocating the computing resources evenly among the different program algorithms;

B. while the program algorithms are running, if a computation has to wait, re-analyzing the resource requirements of the program algorithms;

C. according to the analysis result obtained in step B, reallocating the computing resources using the dynamic resource balancing (DRF) algorithm.

5. The architecture-aware optimization method for exascale computing according to claim 1 or 2, characterized in that the reallocation of computing resources in step C specifically follows these principles:

R1. no user can obtain more resources than other users;

R2. no user can obtain more resources by misreporting its resource requirements;

R3. all available resources are allocated, without displacing existing resource allocations;

R4. no user prefers the resource allocation of another user.