CN111444025A

CN111444025A - Resource allocation method, system and medium for improving energy efficiency of computing subsystem

Info

Publication number: CN111444025A
Application number: CN202010290699.5A
Authority: CN
Inventors: 陈娟; 齐新新; 董勇; 袁远; 吴菲豪; 孙晓乐; 欧祉辛; 张云放
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2020-04-14
Filing date: 2020-04-14
Publication date: 2020-07-24
Anticipated expiration: 2040-04-14
Also published as: CN111444025B

Abstract

The invention discloses a resource allocation method, system and medium for improving the energy efficiency of a computing subsystem. The method comprises: determining the optimal number of added nodes ΔN*, the processor frequency f*, and the power consumption limit value P_target; setting the power consumption limit value to be satisfied to P_target; and scheduling the parallel program to run on N + ΔN* compute nodes (ΔN* > 0), with the initial processor frequency of each compute node set to f*, where N is the minimum number of compute nodes required to run the parallel program (one process allocated to each processor core). For a memory-access-limited (memory-bound) parallel program running on the system, the invention reduces both program execution time and energy consumption while satisfying the power consumption constraint, thereby improving the energy efficiency of the system.

Description

Resource allocation method, system and medium for improving energy efficiency of computing subsystem

Technical Field

The invention relates to a resource allocation technology of a high-performance computing cluster, in particular to a resource allocation method, a resource allocation system and a resource allocation medium for improving the energy efficiency of a computing subsystem.

Background

The computing capability of high-performance computing (HPC) systems is increasingly constrained by power consumption. Although the energy consumption of HPC centers is rising rapidly, users still demand higher performance in order to run more complex models on larger data sets. A method that improves the performance of HPC programs while satisfying a power consumption constraint is therefore urgently needed. Current research pursues several ways of improving the energy efficiency of HPC systems, such as designing new computer architectures and scheduling resources for HPC programs in software. Software-based resource scheduling improves program performance under a power constraint by carefully choosing the computing resource settings, such as the number of compute nodes and the processor frequency. One advantage of this approach is that it requires no hardware modification and can therefore be deployed easily on existing hardware. At present, the resource allocation strategy of most HPC centers aims to maximize system utilization, that is, to allocate as few compute nodes as possible. This strategy ignores the relationship between the optimal performance of a memory-access-limited (memory-bound) parallel program and the number of compute nodes allocated to it: maximizing processor utilization can cause severe memory contention for such a program and thereby degrade parallel performance.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: in view of the above problems in the prior art, the invention provides a resource allocation method, system and medium for improving the energy efficiency of a computing subsystem. For a memory-access-limited (memory-bound) parallel program running on the system, the invention reduces program execution time, keeps total power consumption unchanged, and reduces energy consumption while satisfying the power consumption constraint, thereby improving the energy efficiency of the system.

In order to solve the technical problems, the invention adopts the technical scheme that:

a resource allocation method for improving energy efficiency of a computing subsystem comprises the following implementation steps:

1) determining the optimal number of added nodes ΔN*, the processor frequency f*, and the power consumption limit value P_target;

2) using a dynamic processor frequency adjustment tool, setting the power consumption limit value to P_target, and scheduling the parallel program to run on N + ΔN* compute nodes, where the initial processor frequency of each compute node is f*, and N is the minimum number of compute nodes required to run the parallel program (under the default resource allocation, each processor core runs one process).
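The outcome of steps 1) and 2) can be pictured as a small allocation record. The following is only a minimal sketch: the names `AllocationPlan`, `min_nodes` and `make_plan` are hypothetical, and it assumes that N follows from the process count n and the core count c per node as the smallest node count giving every process its own core. Applying the power cap and launching the job depend on site-specific tools and are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocationPlan:
    """Hypothetical container for the three values fixed in step 1)."""
    nodes: int              # N + ΔN*: compute nodes the job is scheduled on
    init_freq_ghz: float    # f*: initial processor frequency on every node
    power_cap_watts: float  # P_target: limit handed to the frequency tool


def min_nodes(n_procs: int, cores_per_node: int) -> int:
    """N: fewest nodes that give each process its own core (assumption)."""
    if n_procs <= 0 or cores_per_node <= 0:
        raise ValueError("process and core counts must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (n_procs + cores_per_node - 1) // cores_per_node


def make_plan(n_procs: int, cores_per_node: int, delta_n_opt: int,
              f_opt_ghz: float, p_target_watts: float) -> AllocationPlan:
    """Combine N with ΔN*, f* and P_target into one plan."""
    if delta_n_opt < 0:
        raise ValueError("ΔN* must be non-negative")
    n = min_nodes(n_procs, cores_per_node)
    return AllocationPlan(n + delta_n_opt, f_opt_ghz, p_target_watts)
```

For example, 96 processes on 24-core nodes give N = 4, so ΔN* = 2 leads to a plan with 6 nodes.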

Optionally, step 1) is preceded by a step of calculating the optimal number of added nodes ΔN*, the detailed steps of which include: calculating a first added-node-count interval [0, ΔN^pref] using the total memory bandwidth; calculating a second added-node-count interval [0, ΔN^power] using the power consumption constraint; and finding the intersection of the first interval [0, ΔN^pref] and the second interval [0, ΔN^power], then selecting the maximum value in the intersection as the optimal number of added nodes ΔN*.
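Because both intervals start at 0, their intersection is [0, min(ΔN^pref, ΔN^power)] and its maximum is simply the smaller of the two upper bounds. A minimal sketch of that selection, with the hypothetical function name `optimal_added_nodes`:

```python
def optimal_added_nodes(dn_pref: int, dn_power: int) -> int:
    """Return ΔN*: the largest value in [0, ΔN^pref] ∩ [0, ΔN^power]."""
    if dn_pref < 0 or dn_power < 0:
        raise ValueError("interval upper bounds must be non-negative")
    # [0, a] ∩ [0, b] = [0, min(a, b)], so the maximum is min(a, b).
    return min(dn_pref, dn_power)
```

If either bound is 0, for instance because the program is not memory-bound, no nodes are added.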

Optionally, the detailed steps of calculating the first added-node-count interval [0, ΔN^pref] using the total memory bandwidth include:

S1) acquiring the actual memory access bandwidths b₁(t), b₂(t), ..., b_N(t) of the compute nodes at each recorded time t, calculating the average actual memory access bandwidth B(t) of a single node during the run of the parallel program, and taking the maximum value of B(t) as the actual memory access bandwidth B_N of the parallel program;

S2) calculating the ratio bound of the actual memory access bandwidth B_N to the physical memory bandwidth B of a single node, and judging whether the parallel program is memory-bound according to whether bound reaches a threshold α; if the program is not memory-bound, jumping to step S3); if it is memory-bound, jumping to step S4);

S3) determining that no nodes need to be added and setting ΔN^pref to 0, so that the first added-node-count interval [0, ΔN^pref] is [0, 0]; ending and returning;

S4) according to the principle that the total memory bandwidth remains unchanged, N · ((bound/α) · B_N) = (N + ΔN^pref) · α · B, solving for the number of nodes to add ΔN^pref, thereby obtaining the first added-node-count interval [0, ΔN^pref]; ending and returning.
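Steps S1) to S4) can be sketched as follows. This is an illustrative reading rather than the patented implementation: the function name and the `samples[t][i]` layout are hypothetical, and rounding the solved value down to a whole number of nodes is an assumption, since the text does not say how a fractional result is handled.

```python
import math
from typing import Sequence


def added_nodes_from_bandwidth(samples: Sequence[Sequence[float]],
                               phys_bw: float, alpha: float) -> int:
    """Return ΔN^pref from per-node bandwidth samples.

    samples[t][i] is b_i(t), the measured bandwidth of node i at time t;
    phys_bw is B, the physical memory bandwidth of one node;
    alpha is the threshold α on the ratio bound = B_N / B.
    """
    if not samples or not samples[0]:
        raise ValueError("need at least one sample with one node")
    if phys_bw <= 0 or alpha <= 0:
        raise ValueError("B and α must be positive")
    n_nodes = len(samples[0])

    # S1: B(t) is the per-node average at time t; B_N is its maximum.
    b_n = max(sum(row) / n_nodes for row in samples)

    # S2: the program counts as memory-bound once bound reaches α.
    bound = b_n / phys_bw
    if bound < alpha:
        return 0  # S3: not memory-bound, interval is [0, 0]

    # S4: solve N*(bound/α)*B_N = (N + ΔN)*α*B for ΔN.
    exact = n_nodes * (bound / alpha) * b_n / (alpha * phys_bw) - n_nodes
    # Assumption: round down so the bandwidth equation is not overshot.
    return max(0, math.floor(exact))
```

With four nodes each sustaining 90 GB/s against B = 100 GB/s and α = 0.75, the solved value is 1.76, which this sketch rounds down to one added node.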

Optionally, calculating the second added-node-count interval [0, ΔN^power] using the power consumption constraint specifically comprises: solving for the maximum number of nodes ΔN that satisfies the following power consumption constraint function, and taking the result as the number of nodes to add ΔN^power, thereby obtaining the second added-node-count interval [0, ΔN^power];

In the above formula, n is the number of processes of the parallel program (under the default resource allocation, each processor core runs one process), P^cpu(f_max) is the processor power consumption of a single processor core running at the maximum frequency f_max, P^cpu(f_mid) is the processor power consumption of a single processor core running at f_mid, and c is the number of processor cores on each compute node. The formula also contains the processor power consumption of a single processor core in the idle state; P^mem is the memory power consumption, and P^other is the power consumption of a single compute node other than processor and memory. Adding ΔN compute nodes increases total power consumption accordingly, by the memory power consumption of the ΔN compute nodes and the power consumption of the idle processor cores introduced by the added nodes. To ensure that the total multi-node power consumption does not increase, the frequency of all processor cores must be lowered from the maximum frequency f_max to the intermediate frequency f_mid, where f_mid is the average of the maximum frequency f_max and the minimum frequency f_min.
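The constraint formula itself does not survive in this text, so the sketch below rests on an assumed form built only from the symbols defined above: the power saved by running n cores at f_mid instead of f_max must cover the power added by ΔN extra nodes, that is, n · P^cpu(f_max) ≥ n · P^cpu(f_mid) + ΔN · (c · P_idle + P^mem + P^other). The name `p_idle` for the idle-core power and the inclusion of P^other in the per-node cost are assumptions, as are the function names.

```python
import math


def mid_frequency(f_min_ghz: float, f_max_ghz: float) -> float:
    """f_mid: the average of the minimum and maximum frequency."""
    return (f_min_ghz + f_max_ghz) / 2.0


def max_added_nodes_by_power(n_procs: int, cores_per_node: int,
                             p_cpu_fmax: float, p_cpu_fmid: float,
                             p_idle: float, p_mem: float,
                             p_other: float) -> int:
    """Return ΔN^power under the ASSUMED constraint form described above."""
    # Power freed by lowering all n active cores from f_max to f_mid.
    saved = n_procs * (p_cpu_fmax - p_cpu_fmid)
    # Assumed power added per extra node: idle cores, memory and the rest.
    per_node_cost = cores_per_node * p_idle + p_mem + p_other
    if per_node_cost <= 0:
        raise ValueError("per-node power cost must be positive")
    # Largest whole ΔN with ΔN * per_node_cost <= saved.
    return max(0, math.floor(saved / per_node_cost))
```

With 96 processes, 24 cores per node, 6 W and 4 W per core at f_max and f_mid, 0.5 W per idle core, and 20 W plus 28 W for memory and other components, the saving is 192 W against 60 W per added node, so at most three nodes fit.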

Optionally, step 1) is preceded by a step of calculating the processor frequency f*, the detailed steps of which include: substituting the optimal number of added nodes ΔN* into the power consumption constraint function expressed by the following formula, setting ΔN = ΔN* and P^cpu(f_mid) = P^cpu(f_i), to obtain the power consumption value P^cpu(f_i); then, according to the relationship between the processor frequency levels and the processor core power consumption values, taking the processor frequency f_i that satisfies the condition as the processor frequency f* determined in step 1);

In the above formula, n is the number of processes of the parallel program (under the default resource allocation, each processor core runs one process), P^cpu(f_max) is the processor power consumption of a single processor core running at the maximum frequency f_max, P^cpu(f_i) is the processor power consumption of a single processor core running at f_i, and c is the number of processor cores on each compute node. The formula also contains the processor power consumption of a single processor core in the idle state; P^mem is the memory power consumption, and P^other is the power consumption of a single compute node other than processor and memory.
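The frequency choice can be sketched as a table lookup. The formula is not preserved here, so the sketch assumes the constraint n · P^cpu(f_max) ≥ n · P^cpu(f_i) + ΔN* · (c · P_idle + P^mem + P^other), where `p_idle` is a hypothetical name for the idle-core power. It also assumes that "the frequency satisfying the condition" means the highest level whose per-core power stays within the resulting budget.

```python
from typing import Mapping


def select_frequency(power_table: Mapping[float, float], n_procs: int,
                     cores_per_node: int, delta_n_opt: int,
                     p_idle: float, p_mem: float, p_other: float) -> float:
    """Pick f* from a {frequency_ghz: per_core_watts} table.

    The table is assumed to include the maximum frequency f_max.
    """
    if not power_table or n_procs <= 0:
        raise ValueError("need a non-empty table and a positive process count")
    f_max = max(power_table)
    # Assumed extra power drawn by the ΔN* added nodes.
    extra = delta_n_opt * (cores_per_node * p_idle + p_mem + p_other)
    # Per-core budget P^cpu(f_i) obtained by treating the constraint as equality.
    budget = power_table[f_max] - extra / n_procs
    feasible = [f for f, watts in power_table.items() if watts <= budget]
    if not feasible:
        raise ValueError("no frequency level fits the power budget")
    # Assumption: prefer the fastest level that still fits.
    return max(feasible)
```

With ΔN* = 0 the budget equals P^cpu(f_max), so the sketch returns f_max, which matches the default allocation.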

Optionally, step 1) is preceded by calculating the power consumption limit value P_target, with the calculation function expression as follows:

In the above formula, P^cpu(f_max) is the processor power consumption of a single processor core running at the maximum frequency f_max.

In addition, the invention also provides a resource allocation system for improving the energy efficiency of the computing subsystem, which comprises the following components:

a parameter initialization program unit for determining the optimal number of added nodes ΔN*, the processor frequency f*, and the power consumption limit value P_target;

a resource allocation program unit for setting the power consumption limit value of the dynamic processor frequency adjustment tool to P_target, and scheduling the parallel program to run on N + ΔN* compute nodes, where the initial processor frequency of each compute node is f*, N is the minimum number of compute nodes required to run the parallel program (under the default resource allocation, each processor core runs one process), and ΔN* is the optimal number of added nodes.

In addition, the present invention also provides a resource allocation system for improving energy efficiency of a computing subsystem, which includes a computer device programmed or configured to perform the steps of the aforementioned resource allocation method for improving energy efficiency of a computing subsystem, or a computer program programmed or configured to perform the aforementioned resource allocation method for improving energy efficiency of a computing subsystem is stored in a memory of the computer device.

Furthermore, the present invention also provides a computer-readable storage medium having stored thereon a computer program programmed or configured to perform the aforementioned resource allocation method for improving energy efficiency of a computing subsystem.

Compared with the prior art, the invention has the following advantage: for a memory-access-limited (memory-bound) parallel program running on the system, it reduces program execution time, keeps total power consumption unchanged, and reduces energy consumption while satisfying the power consumption constraint, thereby improving the energy efficiency of the system.

Drawings

FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.

FIG. 2 is a detailed flow chart of the method according to the embodiment of the present invention.

Detailed Description

As shown in FIG. 1, the implementation steps of the resource allocation method for improving the energy efficiency of the computing subsystem in this embodiment include:

1) determining the optimal number of added nodes ΔN*, the processor frequency f*, and the power consumption limit value P_target;

2) setting the power consumption limit value of the dynamic processor frequency adjustment tool to P_target, and scheduling the parallel program to run on N + ΔN* compute nodes, where the initial processor frequency of each compute node is f*, and N is the minimum number of compute nodes required to run the parallel program (under the default resource allocation, each processor core runs one process).

As shown in FIG. 2, this embodiment requires measuring in advance the relationship between the processor frequency levels and the processor core power consumption; the function P^cpu(·) used above denotes this relationship. In this embodiment, the frequency levels are spaced 0.1 GHz apart, so the adjustable frequency range [f_min, f_max] of the processor is divided into M levels. The processor power consumption measured at each frequency level is used to build a lookup table relating processor frequency level to processor core power consumption; the table contains M entries. Each entry consists of two parts: the processor frequency f_i and the corresponding power consumption P^cpu(f_i) of a single processor core.

In this embodiment, the program is first run under the default resource allocation policy; Intel VTune™ Amplifier is used to collect performance analysis data, and Intel RAPL is used to measure power-related data. The performance analysis data comprise the actual memory access bandwidth of the program at each time t and the bound value that reflects the degree of memory-access limitation (the ratio of the actual memory access bandwidth B_N to the physical memory bandwidth B of a single node). The power-related data comprise: the processor power consumption of a single processor core in the idle state, the memory power consumption P^mem, and the power consumption P^other of a single compute node other than processor and memory.
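The frequency-to-power table described above can be sketched as follows. The function names are hypothetical, and `measure` stands in for whatever procedure yields the per-core power at a given frequency (for instance a run instrumented with RAPL readings); no particular measurement API is implied.

```python
from typing import Callable, Dict, List


def frequency_levels(f_min_ghz: float, f_max_ghz: float,
                     step_ghz: float = 0.1) -> List[float]:
    """Enumerate the M frequency levels in [f_min, f_max] at the given step."""
    if step_ghz <= 0 or f_max_ghz < f_min_ghz:
        raise ValueError("need a positive step and f_max >= f_min")
    # Count in whole steps so that float drift cannot drop or add a level.
    lo = round(f_min_ghz / step_ghz)
    hi = round(f_max_ghz / step_ghz)
    return [round(k * step_ghz, 3) for k in range(lo, hi + 1)]


def build_power_table(levels: List[float],
                      measure: Callable[[float], float]) -> Dict[float, float]:
    """Map each frequency f_i to its measured per-core power P^cpu(f_i)."""
    return {f: measure(f) for f in levels}
```

A range of 1.2 GHz to 1.5 GHz yields M = 4 levels; the resulting dictionary plays the role of P^cpu(·) in the later calculations.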

As shown in FIG. 2, before step 1), this embodiment further includes a step of calculating the optimal number of added nodes ΔN*, the detailed steps of which include: calculating a first added-node-count interval [0, ΔN^pref] using the total memory bandwidth; calculating a second added-node-count interval [0, ΔN^power] using the power consumption constraint; and finding the intersection of the first interval [0, ΔN^pref] and the second interval [0, ΔN^power], then selecting the maximum value in the intersection as the optimal number of added nodes ΔN*.

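In this embodiment, the detailed steps of calculating the first added-node-count interval [0, ΔN^pref] using the total memory bandwidth include:

S1) acquiring the actual memory access bandwidths b₁(t), b₂(t), ..., b_N(t) of the compute nodes at each recorded time t, calculating the average actual memory access bandwidth B(t) of a single node during the run of the parallel program, and taking the maximum value of B(t) as the actual memory access bandwidth B_N of the parallel program;

S2) calculating the ratio bound of the actual memory access bandwidth B_N to the physical memory bandwidth B of a single node, and judging whether the parallel program is memory-bound according to whether bound exceeds a threshold α; if the program is not memory-bound, jumping to step S3); if it is memory-bound, jumping to step S4);

S3) determining that no nodes need to be added and setting ΔN^pref to 0, so that the first added-node-count interval [0, ΔN^pref] is [0, 0]; ending and returning;

S4) according to the principle that the total memory bandwidth remains unchanged, N · ((bound/α) · B_N) = (N + ΔN^pref) · α · B, solving for the number of nodes to add ΔN^pref, thereby obtaining the first added-node-count interval [0, ΔN^pref]; ending and returning.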

The average actual memory access bandwidth B(t) of a single node is calculated as:

$$B(t) = \frac{1}{N}\sum_{i=1}^{N} b_i(t)$$

In the above formula, N is the minimum number of compute nodes required to run the parallel program (under the default resource allocation, each processor core runs one process), and b_i(t) is the actual memory access bandwidth on the i-th compute node.

Because of the physical memory bandwidth limit, the actual memory access bandwidth B_N during program execution is bounded by the physical memory bandwidth B of a single node. For a program that is not memory-bound, the total actual memory access bandwidth over all nodes can be considered constant when different numbers of compute nodes are used. When the program is memory-bound, adding compute nodes reduces the amount of computation the parallel program performs on each node and the number of memory accesses on each node, which relieves the memory-access limitation on a single node; the total actual memory access bandwidth of the multiple nodes then increases as the number of nodes grows. In the total-bandwidth relation, the per-node term is (bound/α) · B_N: the higher the degree of memory-access limitation, the more this value exceeds B_N, where bound reflects the degree of memory-access limitation.

Therefore, according to the principle that the total memory bandwidth remains unchanged, N · ((bound/α) · B_N) = (N + ΔN^pref) · α · B, the number of nodes to add ΔN^pref is solved, giving the first added-node-count interval [0, ΔN^pref].
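Since bound is defined as the ratio of B_N to B, the relation above can be solved in closed form. The derivation below is a worked rearrangement under that definition only; it adds no new assumption about the model.

```latex
\begin{align*}
\mathit{bound} &= \frac{B_N}{B}
  \quad\Longrightarrow\quad B_N = \mathit{bound}\cdot B \\[4pt]
N\cdot\frac{\mathit{bound}}{\alpha}\cdot B_N
  &= \left(N+\Delta N^{pref}\right)\cdot\alpha\cdot B \\[4pt]
N\cdot\frac{\mathit{bound}^{2}}{\alpha}\cdot B
  &= \left(N+\Delta N^{pref}\right)\cdot\alpha\cdot B \\[4pt]
\Delta N^{pref}
  &= N\left[\left(\frac{\mathit{bound}}{\alpha}\right)^{2}-1\right]
\end{align*}
```

For example, N = 4, bound = 0.9 and α = 0.75 give ΔN^pref = 4 · (1.2² − 1) = 1.76. The text does not state how a fractional result is turned into a node count; rounding down to 1 would be one option. The closed form is positive exactly when bound exceeds α, which agrees with the memory-bound test in step S2).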

In this embodiment, calculating the second added-node-count interval [0, ΔN^power] using the power consumption constraint specifically comprises: solving for the maximum number of nodes ΔN that satisfies the following power consumption constraint function, and taking the result as the number of nodes to add ΔN^power, thereby obtaining the second added-node-count interval [0, ΔN^power];

In the above formula, n is the number of processes, P^cpu(f_max) is the power consumption of a single processor core at the maximum frequency f_max, P^cpu(f_mid) is the power consumption of a single processor core at the intermediate frequency f_mid, and c is the number of processor cores of a single compute node. The formula also contains the processor power consumption of a single processor core in the idle state; P^mem is the memory power consumption, and P^other is the power consumption of a single compute node other than processor and memory. Adding ΔN compute nodes increases total power consumption accordingly, by the memory power consumption of the ΔN compute nodes and the power consumption of the idle processor cores introduced by the added nodes. To ensure that the total multi-node power consumption does not increase, the frequency of all processor cores must be lowered from the maximum frequency f_max to the intermediate frequency f_mid, where f_mid is the average of the maximum frequency f_max and the minimum frequency f_min.

In this embodiment, step 1) further includes a step of calculating the processor frequency f*, the detailed steps of which include: substituting the optimal number of added nodes ΔN* into formula (1), setting ΔN = ΔN* and P^cpu(f_mid) = P^cpu(f_i), to obtain the power consumption value P^cpu(f_i); then, according to the relationship between the processor frequency levels and the processor core power consumption values, taking the processor frequency f_i that satisfies the condition as the processor frequency f* determined in step 1).

In this embodiment, before step 1), the method further includes calculating the power consumption limit value P_target, with the calculation function expression as follows:

In summary, the resource allocation method for improving the energy efficiency of the computing subsystem targets memory-access-limited (memory-bound) programs running on a cluster system. While satisfying the power consumption constraint, it reduces program execution time, keeps total power consumption unchanged, and reduces energy consumption, thereby improving the energy efficiency of the system.

In addition, this embodiment further provides a resource allocation system for improving energy efficiency of a computing subsystem, including:

a resource allocation program unit for setting the power consumption limit value of the dynamic processor frequency adjustment tool to P_target, and scheduling the parallel program to run on N + ΔN* compute nodes, where the initial processor frequency of each compute node is f*, N is the minimum number of compute nodes required to run the parallel program (under the default resource allocation, each processor core runs one process), and ΔN* is the optimal number of added nodes.

In addition, the embodiment also provides a resource allocation system for improving energy efficiency of a computing subsystem, which includes a computer device programmed or configured to perform the steps of the foregoing resource allocation method for improving energy efficiency of a computing subsystem, or a computer program programmed or configured to perform the foregoing resource allocation method for improving energy efficiency of a computing subsystem is stored in a memory of the computer device.

Furthermore, the present embodiment also provides a computer-readable storage medium, on which a computer program is stored, the computer program being programmed or configured to execute the foregoing resource allocation method for improving energy efficiency of the computing subsystem.

The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiment; all technical solutions within the idea of the present invention fall within its protection scope. It should be noted that, for those skilled in the art, improvements and modifications made without departing from the principle of the present invention should also be regarded as falling within the protection scope of the present invention.

Claims

1. A resource allocation method for improving energy efficiency of a computing subsystem is characterized by comprising the following implementation steps:

2. The resource allocation method for improving energy efficiency of a computing subsystem according to claim 1, wherein step 1) is preceded by a step of calculating the optimal number of added nodes ΔN*, the detailed steps of which include: calculating a first added-node-count interval [0, ΔN^pref] using the total memory bandwidth; calculating a second added-node-count interval [0, ΔN^power] using the power consumption constraint; and finding the intersection of the first interval [0, ΔN^pref] and the second interval [0, ΔN^power], then selecting the maximum value in the intersection as the optimal number of added nodes ΔN*.

3. The resource allocation method for improving energy efficiency of a computing subsystem according to claim 2, wherein the detailed steps of calculating the first added-node-count interval [0, ΔN^pref] using the total memory bandwidth include:

S1) acquiring the actual memory access bandwidths b₁(t), b₂(t), ..., b_N(t) of the compute nodes at each recorded time t, calculating the average actual memory access bandwidth B(t) of a single node during the run of the parallel program, and taking the maximum value of B(t) as the actual memory access bandwidth B_N of the parallel program, where b_i(t) is the actual memory access bandwidth on the i-th compute node;

S2) calculating the ratio bound of the actual memory access bandwidth B_N to the physical memory bandwidth B of a single node, and judging whether the parallel program is memory-bound according to whether bound reaches a threshold α; if the program is not memory-bound, jumping to step S3);

4. The resource allocation method for improving energy efficiency of a computing subsystem according to claim 2, wherein calculating the second added-node-count interval [0, ΔN^power] using the power consumption constraint specifically comprises: solving for the maximum number of nodes ΔN that satisfies the following power consumption constraint function, and taking the result as the number of nodes to add ΔN^power, thereby obtaining the second added-node-count interval [0, ΔN^power];

wherein the constraint involves the processor power consumption of a single processor core in the idle state, the memory power consumption P^mem, and the power consumption P^other of a single compute node other than processor and memory; adding ΔN compute nodes increases total power consumption accordingly, by the memory power consumption of the ΔN compute nodes and the power consumption of the idle processor cores introduced by the added nodes; and, to ensure that the total multi-node power consumption does not increase, the frequency of all processor cores must be lowered from the maximum frequency f_max to the intermediate frequency f_mid, where f_mid is the average of the maximum frequency f_max and the minimum frequency f_min.

5. The resource allocation method for improving energy efficiency of a computing subsystem according to claim 1, wherein step 1) is preceded by a step of calculating the processor frequency f*, the detailed steps of which include: substituting the optimal number of added nodes ΔN* into the power consumption constraint function expressed by the following formula, setting ΔN = ΔN* and P^cpu(f_mid) = P^cpu(f_i), to obtain the power consumption value P^cpu(f_i); then, according to the relationship between the processor frequency levels and the processor core power consumption values, taking the processor frequency f_i that satisfies the condition as the processor frequency f* determined in step 1);

6. The resource allocation method for improving energy efficiency of a computing subsystem according to claim 5, wherein step 1) is preceded by calculating the power consumption limit value P_target, with the calculation function expression as follows:

7. A resource allocation system for improving energy efficiency of a computing subsystem, comprising:

a resource allocation program unit for setting the power consumption limit value to P_target using the dynamic processor frequency adjustment tool, and scheduling the parallel program to run on N + ΔN* compute nodes, where the initial processor frequency of each compute node is f*, and N is the minimum number of compute nodes required to run the parallel program (under the default resource allocation, each processor core runs one process).

8. A resource allocation system for improving energy efficiency of a computing subsystem, comprising a computer device, wherein the computer device is programmed or configured to perform the steps of the resource allocation method for improving energy efficiency of a computing subsystem according to any one of claims 1 to 6, or wherein a memory of the computer device has stored thereon a computer program programmed or configured to perform the resource allocation method for improving energy efficiency of a computing subsystem according to any one of claims 1 to 6.

9. A computer-readable storage medium having stored thereon a computer program programmed or configured to perform the resource allocation method for improving energy efficiency of a computing subsystem according to any one of claims 1 to 6.