CN108897619B - Multi-level resource flexible configuration method for super computer

Info

Publication number: CN108897619B
Application number: CN201810680674.9A
Authority: CN
Inventors: 孟祥飞; 康波; 李健增; 刘光明; 菅晓东; 雷秀丽; 孙华文; 马庆珍
Original assignee: National Supercomputer Center In Tianjin
Current assignee: National Supercomputer Center In Tianjin
Priority date: 2018-06-27
Filing date: 2018-06-27
Publication date: 2020-05-05
Anticipated expiration: 2038-06-27
Also published as: CN111475297A; CN111475297B; CN108897619A

Abstract

The invention relates to a multi-level resource flexible configuration method for a supercomputer, comprising the following steps: assigning P⁰ nodes of the supercomputer to a job; calculating the initial expected times corresponding to reaching the M breakpoints {B₁, B₂, ..., B_M} in the course of executing the N tasks {T₁, T₂, ..., T_N}; when, after executing task T_j, the job reaches breakpoint B_i between task T_j and task T_{j+1}, calculating the difference between the actual time and the initial expected time; and, when that difference meets the set threshold condition, allocating P¹ compute nodes to the remaining N−j unexecuted tasks {T_{j+1}, T_{j+2}, ..., T_N} and recalculating the first corrected expected times corresponding to the remaining M−i breakpoints {B_{i+1}, B_{i+2}, ..., B_M}.

Description

Multi-level resource flexible configuration method for super computer

Technical Field

The invention relates to a multi-level resource flexible configuration method for a super computer.

Background

A supercomputer is a computer built by combining a large number of computing nodes so that it can perform large-scale computation or data processing in parallel; it is also called a parallel computer. Supercomputers offer the greatest capability, the highest speed and the largest storage capacity of any class of computer. They are used mainly in national high-technology fields and cutting-edge research, and are an important indicator of a country's scientific and technological development and overall national strength.

At present, a user who submits a job to a supercomputer must specify the resources the job requires, such as storage space, number of nodes and number of cores. Users generally estimate these resources from experience or from trial runs on a small amount of data, so the estimate is often far off. If too few resources are requested, the job may be terminated because of a timeout, an overflow or a similar failure, and the desired result is not obtained; if too many are requested, the user pays additional cost and valuable supercomputing capacity is wasted. How to specify an appropriate amount of resources for a job when it is submitted and run is therefore an urgent problem.

Disclosure of Invention

In order to solve the technical problem, the invention provides a multi-level resource flexible configuration method for a supercomputer, which comprises the following steps:

Step S100, obtaining a job, wherein the job comprises N tasks {T₁, T₂, ..., T_N} and M breakpoints {B₁, B₂, ..., B_M} respectively arranged between the tasks; assigning P⁰ nodes of the supercomputer to the job; and calculating the initial expected times corresponding to reaching the M breakpoints {B₁, B₂, ..., B_M} in the course of executing the N tasks {T₁, T₂, ..., T_N}, wherein N, M and P⁰ are all natural numbers and M > N;

Step S200, when, after executing task T_j, the job reaches breakpoint B_i between task T_j and task T_{j+1}, calculating the difference Δt₁ between the actual time and the initial expected time;

Step S300, when the threshold condition on |Δt₁| and TH₁ is met, allocating P¹ compute nodes to the remaining N−j unexecuted tasks {T_{j+1}, T_{j+2}, ..., T_N} and recalculating the first corrected expected times corresponding to the remaining M−i breakpoints {B_{i+1}, B_{i+2}, ..., B_M}, where |Δt₁| is the absolute value of Δt₁ and TH₁ is a set threshold (preferably not exceeding 5).

Detailed Description

The present invention is described in further detail below in order to make its objects, technical solutions and advantages more apparent. The description is given by way of example, and not limitation, of specific embodiments consistent with the principles of the invention, and is sufficiently detailed to enable those skilled in the art to practice the invention. Other embodiments may be utilized, and the structure of various elements may be changed and/or substituted, without departing from the scope and spirit of the invention. The following detailed description is, therefore, not to be taken in a limiting sense.

One embodiment of the invention provides a multi-level resource flexible configuration method for a supercomputer. The supercomputer is a Tianhe supercomputer, in particular one of the Tianhe series such as TH-1, TH-1A or TH-2. Supercomputers of this series generally receive and execute a job in the form of a script file, which provides at least the job submission mode, the compute partition, the number of nodes, the number of cores and the absolute path of the task script file. The reference submission form is "yhbatch -N N1 -p p1 -n n1 xxx.bat", wherein N1 is the number of nodes (integer); p1 is the partition name (string); n1 is the number of cores (integer); and xxx.bat is the task script file name (string). Specifically, the configuration method includes the following steps:
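The submission form described above can be sketched as a small helper that assembles the argument list. This is only an illustration: the function name `build_yhbatch_command`, the partition name and the script path are hypothetical, and the option layout (`-N`, `-p`, `-n` followed by the script name) is assumed from the reference form quoted in the text.

```python
import shlex


def build_yhbatch_command(nodes: int, partition: str, cores: int, script: str) -> list[str]:
    """Assemble 'yhbatch -N <nodes> -p <partition> -n <cores> <script>'.

    Assumption: the option layout follows the reference form in the text.
    """
    if nodes < 1 or cores < 1:
        raise ValueError("node and core counts must be positive integers")
    return ["yhbatch", "-N", str(nodes), "-p", partition, "-n", str(cores), script]


# Hypothetical values, for illustration only.
command = build_yhbatch_command(4, "debug", 96, "/home/user/job.bat")
print(shlex.join(command))
```

Returning a list rather than a single string keeps each argument separate, which avoids quoting problems if the command is later passed to a process launcher.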

Step S100, obtaining a job through a script file, wherein the job comprises N tasks {T₁, T₂, ..., T_N} and M breakpoints {B₁, B₂, ..., B_M} respectively arranged between the tasks. A task can be any suitable software or program that performs specific processing. Each task has an interface for input data and produces the result of processing that input as output data; the output of one task serves as the input of the next, and the result of the job is obtained once the last task has executed, at which point execution of the job is finished. The breakpoints are placed after one or more of the first N−1 tasks; at a breakpoint the job is temporarily suspended, its execution progress is evaluated, and the next task then continues. After the script file is obtained, P⁰ nodes of the supercomputer are assigned to the job. Each node generally comprises several computing cores, for example 4 to 28, and on a Tianhe supercomputer computing resources are generally allocated in units of nodes. Then, before the job is executed, the initial expected times corresponding to reaching the M breakpoints {B₁, B₂, ..., B_M} in the course of executing the N tasks {T₁, T₂, ..., T_N} are calculated, together with the initial expected job run time required to complete the job, i.e., all N tasks, wherein N, M and P⁰ are all natural numbers and M > N. The calculated initial expected times and initial expected job run time are stored and used in the following steps to evaluate the execution progress of the job;
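The information stored in step S100 can be pictured as a small record. The sketch below is an assumption-laden illustration, not the patented data layout: the class names `Breakpoint` and `Job`, their fields, and the consistency checks are all hypothetical. It only encodes what the text states, namely that each breakpoint follows one of the first N−1 tasks and carries an initial expected time, and that the job has P⁰ nodes and an initial expected run time.

```python
from dataclasses import dataclass


@dataclass
class Breakpoint:
    after_task: int        # 1-based index j: the breakpoint follows task T_j
    expected_time: float   # initial expected time to reach this breakpoint


@dataclass
class Job:
    tasks: list[str]               # names of the N tasks, in execution order
    breakpoints: list[Breakpoint]  # breakpoints in the order they are reached
    nodes: int                     # P0, the initially assigned node count
    expected_runtime: float        # initial expected job run time

    def validate(self) -> None:
        """Check the stored plan for internal consistency (hypothetical rules)."""
        n = len(self.tasks)
        previous = 0.0
        for bp in self.breakpoints:
            if not 1 <= bp.after_task <= n - 1:
                raise ValueError("breakpoints must follow one of the first N-1 tasks")
            if bp.expected_time < previous:
                raise ValueError("expected times must not decrease")
            previous = bp.expected_time
        if self.expected_runtime < previous:
            raise ValueError("job run time must cover the last breakpoint")


job = Job(["prepare", "solve", "report"], [Breakpoint(1, 10.0), Breakpoint(2, 25.0)], 8, 40.0)
job.validate()
```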

Step S200, after the job is submitted to the supercomputer for execution, interrupting processing at breakpoint B_i between task T_j and task T_{j+1}, obtaining the actual running time of the job from the start of execution to breakpoint B_i, and calculating the difference Δt₁ between that actual time and the corresponding initial expected time;

Step S300, when the threshold condition on |Δt₁| and TH₁ is met, allocating P¹ compute nodes to the remaining N−j unexecuted tasks {T_{j+1}, T_{j+2}, ..., T_N} and recalculating the first corrected expected times corresponding to the remaining M−i breakpoints {B_{i+1}, B_{i+2}, ..., B_M}, where |Δt₁| is the absolute value of Δt₁ and TH₁ is a set threshold. TH₁ may be any suitable value, typically not more than 10, preferably not more than 5, for example not more than 4, 3, 2, 1, 0.5, 0.3, 0.2 or 0.1.
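The progress check of steps S200 and S300 can be sketched as follows. Two points are assumptions rather than statements of the source: Δt₁ is taken as actual time minus initial expected time (so a positive value means the job is behind schedule), and the trigger is taken as the deviation exceeding TH₁. Whether the deviation is compared in absolute or relative terms is not recoverable from the text, so the hypothetical `relative` flag offers both readings.

```python
def deviation(actual: float, expected: float) -> float:
    """Return delta_t, assumed here to be actual minus initial expected time."""
    return actual - expected


def needs_reallocation(actual: float, expected: float, threshold: float,
                       relative: bool = False) -> bool:
    """Decide whether the deviation at a breakpoint should trigger step S300.

    Assumption: the trigger is |delta_t| > threshold, or
    |delta_t| / expected > threshold when `relative` is True.
    """
    if expected <= 0:
        raise ValueError("expected time must be positive")
    dt = deviation(actual, expected)
    measure = abs(dt) / expected if relative else abs(dt)
    return measure > threshold


# Hypothetical numbers: expected 10.0 time units, reached after 12.0.
print(needs_reallocation(12.0, 10.0, threshold=1.0))
```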

In a preferred embodiment, in step S300, when Δt₁ > 0, P¹ = (1+w)×P⁰, and the first corrected expected times are determined for the breakpoint indices m with i+1 ≤ m ≤ M. Thus, as i increases, w approaches A₁, thereby providing more resources so that the job completes as soon as possible within the expected time.

In a preferred embodiment, in step S300, when Δt₁ ≤ 0, P¹ = (1−w)×P⁰, and the first corrected expected times are determined for the breakpoint indices m with i+1 ≤ m ≤ M. This releases excess resources as early as possible, so that the job still completes on time while cost is kept as low as possible.
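The two preferred embodiments amount to scaling the node count up or down by a factor w. In the sketch below, w is simply passed in as a parameter, because its defining formula is not reproduced in this text. The rounding rules (round up when adding nodes, round down but never below one node when releasing them) and the function name `corrected_nodes` are assumptions made for illustration.

```python
import math


def corrected_nodes(p0: int, delta_t: float, w: float) -> int:
    """Return P1 from P0 according to the sign of delta_t.

    delta_t > 0  (behind schedule):  P1 = (1 + w) * P0
    delta_t <= 0 (ahead of schedule): P1 = (1 - w) * P0
    Assumption: results are rounded to whole nodes, with a floor of 1.
    """
    if p0 < 1:
        raise ValueError("p0 must be at least 1")
    if not 0 <= w < 1:
        raise ValueError("w is assumed to lie in [0, 1)")
    if delta_t > 0:
        return math.ceil((1 + w) * p0)
    return max(1, math.floor((1 - w) * p0))


# Hypothetical values: 8 nodes, w = 0.25.
print(corrected_nodes(8, delta_t=2.0, w=0.25), corrected_nodes(8, delta_t=-2.0, w=0.25))
```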

Similarly, when, after executing task T_{j+y}, the job reaches breakpoint B_{i+x} between task T_{j+y} and task T_{j+y+1}, where 1 ≤ y < N−j and 1 ≤ x < M−i, the difference Δt_x between the actual time and the initial expected time is calculated. When the threshold condition on |Δt_x| and TH_x is met, P^x compute nodes are allocated to the remaining N−j−y tasks {T_{j+y+1}, T_{j+y+2}, ..., T_N} and the x-th corrected expected times corresponding to the remaining M−i−x breakpoints {B_{i+x+1}, B_{i+x+2}, ..., B_M} are recalculated, where |Δt_x| is the absolute value of Δt_x and TH_x is a set threshold. TH_x can be the same as or different from TH₁. Because the later in the job a deviation occurs, the more resources are needed to correct it, TH_x is preferably less than TH₁, so that the resource allocation correction starts sensitively and promptly and the job completes on time.
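The repeated correction at successive breakpoints, with thresholds that tighten as the job advances, can be sketched as a loop. Several simplifications are assumed here and are not taken from the source: the actual times are supplied as recorded values, each deviation is compared with the initial expected time using an absolute threshold, every correction scales the current node count by the same fixed w, and the function name `run_corrections` is hypothetical.

```python
import math


def run_corrections(p0: int, expected: list[float], actual: list[float],
                    thresholds: list[float], w: float) -> list[int]:
    """Return the node count in force after each breakpoint.

    Assumptions: trigger is |delta_t| > threshold; the current node count is
    scaled by (1 + w) when late and by (1 - w) when early, rounded to whole
    nodes with a floor of 1.
    """
    if not (len(expected) == len(actual) == len(thresholds)):
        raise ValueError("one expected time, actual time and threshold per breakpoint")
    nodes = p0
    history = []
    for exp_t, act_t, th in zip(expected, actual, thresholds):
        dt = act_t - exp_t
        if abs(dt) > th:
            if dt > 0:
                nodes = math.ceil((1 + w) * nodes)
            else:
                nodes = max(1, math.floor((1 - w) * nodes))
        history.append(nodes)
    return history


# Hypothetical run with thresholds that shrink at later breakpoints.
print(run_corrections(8, [10.0, 20.0, 30.0], [10.5, 23.0, 29.0], [2.0, 1.0, 0.5], 0.25))
```

With decreasing thresholds, a deviation that would be tolerated early in the job triggers a correction later on, which matches the stated preference for TH_x smaller than TH₁.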

In some cases, in step S200, the number of nodes P⁰ to be allocated for the current run of the job, and the initial expected times corresponding to reaching the M breakpoints {B₁, B₂, ..., B_M} in the course of executing the N tasks {T₁, T₂, ..., T_N}, are calculated either from the algorithmic complexity of the job and the data amount of the current run, or from the historical run results of the job and the data amount of the current run.

For example, the historical run results include the run times of the job under different data amounts {D₁, D₂, ..., D_L} and different numbers of nodes {P₁, P₂, ..., P_K}, together with the corresponding times at which each run reached the M breakpoints {B₁, B₂, ..., B_M}; each record gives the run time, and the time of reaching breakpoint B_i, for data amount D_a and number of nodes P_b. Accordingly, given the data amount D_c of the current run and the desired run time t_c, the historical run results are searched and a record is selected; its number of nodes P_b is taken as the number of nodes P⁰ to be allocated for the current run, and its breakpoint times are taken as the initial expected times corresponding to reaching the M breakpoints {B₁, B₂, ..., B_M}. The selected record is the one whose D_a is closest to, and equal to or greater than, D_c and whose run time is closest to, and not more than, t_c.
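The selection rule can be sketched as a two-stage lookup. The dictionary layout (keys are pairs of data amount and node count, values are pairs of run time and breakpoint times), the function name `select_from_history` and the sample numbers are hypothetical; the rule itself follows the text: take the smallest recorded data amount not below D_c, then, among its records, the longest run time not above t_c.

```python
from typing import Optional

# (data amount D_a, node count P_b) -> (run time, times of reaching each breakpoint)
History = dict[tuple[float, int], tuple[float, list[float]]]


def select_from_history(history: History, d_c: float,
                        t_c: float) -> Optional[tuple[int, list[float]]]:
    """Return (P0, initial expected breakpoint times), or None if no record fits."""
    eligible_amounts = [d for (d, _p) in history if d >= d_c]
    if not eligible_amounts:
        return None
    d_a = min(eligible_amounts)  # closest to D_c while not below it
    feasible = [
        (run_time, p, bp_times)
        for (d, p), (run_time, bp_times) in history.items()
        if d == d_a and run_time <= t_c
    ]
    if not feasible:
        return None
    # Longest run time still within t_c, i.e. closest to t_c from below.
    _run_time, p_b, bp_times = max(feasible, key=lambda record: record[0])
    return p_b, bp_times


# Hypothetical history records.
history: History = {
    (100.0, 4): (50.0, [10.0, 30.0]),
    (100.0, 8): (28.0, [6.0, 17.0]),
    (200.0, 8): (60.0, [12.0, 35.0]),
    (200.0, 16): (33.0, [7.0, 20.0]),
}
print(select_from_history(history, d_c=150.0, t_c=40.0))
```

Returning `None` when nothing fits leaves room for the fallback the description mentions next, where interpolation is used instead.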

In other cases, when the number of nodes P⁰ to be allocated for the current run and the initial expected times corresponding to the M breakpoints {B₁, B₂, ..., B_M} cannot be found in the historical run results from the data amount and desired run time of the current run, any of various known interpolation methods can be used to obtain the number of nodes P⁰ and the initial expected times corresponding to the M breakpoints {B₁, B₂, ..., B_M}.
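The text leaves the interpolation method open. As one possible instance, the sketch below uses plain linear interpolation between two historical records that bracket the current data amount at a fixed node count. The choice of linear interpolation, the function names and the sample values are assumptions for illustration only.

```python
def lerp(x: float, x0: float, y0: float, x1: float, y1: float) -> float:
    """Linearly interpolate y at x between the points (x0, y0) and (x1, y1)."""
    if x1 == x0:
        raise ValueError("the two sample points must differ in x")
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def interpolate_breakpoint_times(d_c: float,
                                 d_lo: float, times_lo: list[float],
                                 d_hi: float, times_hi: list[float]) -> list[float]:
    """Estimate breakpoint times for data amount d_c from two bracketing records.

    Assumption: both records were measured with the same node count.
    """
    if len(times_lo) != len(times_hi):
        raise ValueError("both records must cover the same breakpoints")
    return [lerp(d_c, d_lo, lo, d_hi, hi) for lo, hi in zip(times_lo, times_hi)]


# Hypothetical records at data amounts 100 and 200, queried at 150.
print(interpolate_breakpoint_times(150.0, 100.0, [10.0, 30.0], 200.0, [12.0, 35.0]))
```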

In some cases, memory and storage space are allocated at the same time as compute nodes: when the P⁰ nodes are allocated, according to the requirements of the N tasks {T₁, T₂, ..., T_N}; when the P¹ compute nodes are allocated, according to the requirements of the remaining N−j unexecuted tasks {T_{j+1}, T_{j+2}, ..., T_N}; and when the P^x compute nodes are allocated, according to the requirements of the remaining N−j−y tasks {T_{j+y+1}, T_{j+y+2}, ..., T_N}. Any known method may be employed to allocate memory and storage space. Preferably, the storage space is increased when occupancy of the allocated storage space exceeds 85%, preferably 75%, more preferably 70%.
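The occupancy rule for storage can be sketched as a single check. The 85% default comes from the text; the growth factor of 1.5 and the function name `grow_storage_if_needed` are hypothetical, since the text does not say by how much the space is increased.

```python
def grow_storage_if_needed(used: float, allocated: float,
                           occupancy_limit: float = 0.85,
                           growth_factor: float = 1.5) -> float:
    """Return the new allocation, enlarged when occupancy exceeds the limit.

    Assumption: growth is multiplicative by `growth_factor`.
    """
    if allocated <= 0:
        raise ValueError("allocated space must be positive")
    if used / allocated > occupancy_limit:
        return allocated * growth_factor
    return allocated


# Hypothetical values: 90 units used out of 100 allocated.
print(grow_storage_if_needed(90.0, 100.0))
```

Lowering `occupancy_limit` to 0.75 or 0.70 gives the more cautious variants mentioned in the text.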

With the method of the invention, appropriate resources can be allocated to a job as accurately as possible before it is executed, on the basis of historical data. During execution, resources can be dynamically added or reclaimed according to the actual progress of the job, that is, whether it is ahead of or behind the expected progress. This achieves flexible, or elastic, configuration of supercomputer resources at multiple levels, and keeps the job's occupation of supercomputer resources as low as possible while fully ensuring that the job completes on time.

Moreover, other implementations of the invention will be apparent to those skilled in the art from consideration of the specification of the invention disclosed herein. The embodiments and/or aspects of the embodiments can be used in the systems and methods of the present invention alone or in any combination. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims

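1. A method for flexible configuration of multi-tier resources for a supercomputer, comprising the steps of:

Step S100, obtaining a job, wherein the job comprises N tasks {T₁, T₂, ..., T_N} and M breakpoints {B₁, B₂, ..., B_M} respectively arranged between the tasks; assigning P⁰ nodes of the supercomputer to the job; and calculating the initial expected times corresponding to reaching the M breakpoints {B₁, B₂, ..., B_M} in the course of executing the N tasks {T₁, T₂, ..., T_N}, wherein N, M and P⁰ are all natural numbers and M > N;

Step S200, when, after executing task T_j, the job reaches breakpoint B_i between task T_j and task T_{j+1}, calculating the difference Δt₁ between the actual time and the initial expected time;

wherein the number of nodes P⁰ to be allocated for the current run of the job, and the initial expected times corresponding to reaching the M breakpoints {B₁, B₂, ..., B_M} in the course of executing the N tasks {T₁, T₂, ..., T_N}, are calculated from the algorithmic complexity of the job and the data amount of the current run, or from the historical run results of the job and the data amount of the current run;

wherein the historical run results comprise the run times of the job under different data amounts {D₁, D₂, ..., D_L} and different numbers of nodes {P₁, P₂, ..., P_K}, together with the corresponding times at which each run reached the M breakpoints {B₁, B₂, ..., B_M}; and the record selected from the historical run results is the one whose data amount D_a is closest to, and equal to or greater than, the data amount D_c of the current run and whose run time is closest to, and not more than, the desired run time t_c;

Step S300, when the threshold condition on |Δt₁| and TH₁ is met, allocating P¹ compute nodes to the remaining N−j unexecuted tasks {T_{j+1}, T_{j+2}, ..., T_N} and recalculating the first corrected expected times corresponding to the remaining M−i breakpoints {B_{i+1}, B_{i+2}, ..., B_M}, where |Δt₁| is the absolute value of Δt₁ and TH₁ is a set threshold.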

2. The multi-tier resource flexible configuration method for a supercomputer according to claim 1, wherein, in step S300, TH₁ is a set threshold not exceeding 5.

3. The multi-level resource flexible configuration method for a supercomputer according to claim 1, wherein, when the number of nodes P⁰ to be allocated for the current run of the job and the initial expected times corresponding to the M breakpoints {B₁, B₂, ..., B_M} cannot be found in the historical run results from the data amount and desired run time of the current run, an interpolation method is used to obtain the number of nodes P⁰ and the initial expected times corresponding to the M breakpoints {B₁, B₂, ..., B_M}.

4. The multi-tier resource flexible configuration method for a supercomputer according to any one of claims 1 to 3, wherein, in step S300, when Δt₁ > 0, P¹ = (1+w)×P⁰, and the first corrected expected times are determined for the breakpoint indices m with i+1 ≤ m ≤ M.

5. The multi-tier resource flexible configuration method for a supercomputer according to any one of claims 1 to 3, wherein, in step S300, when Δt₁ ≤ 0, P¹ = (1−w)×P⁰, and the first corrected expected times are determined for the breakpoint indices m with i+1 ≤ m ≤ M.