CN113419827A - High-performance computing resource scheduling fair sharing method - Google Patents


Info

Publication number: CN113419827A
Authority: CN (China)
Prior art keywords: leaf, task, quota, tasks, scheduling
Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion)
Application number: CN202110509596.8A
Other languages: Chinese (zh)
Inventor: 陆伟钊
Current assignee: Beijing Skycloud Rongchuang Software Technology Co ltd
Original assignee: Beijing Skycloud Rongchuang Software Technology Co ltd
Application filed by Beijing Skycloud Rongchuang Software Technology Co ltd
Priority: CN202110509596.8A
Publication: CN113419827A (withdrawn after publication)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The invention provides a fair-sharing method for scheduling high-performance computing resources. It achieves fair resource sharing by weighting historical resource usage into each unit's dynamic quota, improving the quality of service the high-performance computing system delivers to users. The leaf a task belongs to is located through a hash table, and after each successfully scheduled task only a local reordering of the share-tree nodes is performed, which speeds up scheduling and gives the system high throughput.

Description

High-performance computing resource scheduling fair sharing method
Technical Field
The invention relates to high-performance computing resource and task scheduling, and in particular to a fair-sharing method, based on distributed computing, for scheduling high-performance computing resources.
Background
The following introduces the related art.
First, high-performance computing and big data task scheduling systems
High-performance computing and big data systems are distributed computing systems: the whole system is a cluster formed from many servers, and computing and data tasks are distributed across those servers to run.
The resource and task scheduling system is a key technology of a distributed computing system. Users' computing tasks are run through the resource and task scheduler rather than by logging into a server directly.
A task refers to a complete unit of computation. Users submit multiple tasks into a queue. The scheduler reads task definitions from the queue and allocates resources to tasks based on resource availability (i.e., which hosts are working properly), what has already been allocated, and the defined scheduling policy.
In a high-performance computing and big data environment, the sum of the resources required by all tasks is typically greater than the resources the system has available. Automatically and reasonably allocating these resources is the main role of the resource and task scheduler.
Second, the fair sharing principle
Fair sharing is a scheduling policy commonly used in resource and task scheduling: when limited resources are allocated to multiple users, the amount of resources each user gets is allocated automatically and reasonably according to a predefined scheduling policy. A typical example: tasks from two different users wait in a queue to be scheduled onto 100 CPU cores (distributed across the hosts of a cluster), and the scheduling policy requires:
1. When the total number of CPU cores needed by all tasks in the queue exceeds the number of CPU cores in the cluster, the cores occupied by the two users' tasks are allocated in a 3:2 ratio (i.e., 60:40).
2. When the first user has too few tasks to use its full share, the remaining resources of the cluster may all be allocated to the second user. If the first user currently needs only 20 cores and the second user has enough tasks waiting in the queue, the second user can use up to 80 cores.
3. When the first user adds more tasks to the queue, the resources the second user occupies beyond 40 cores are released for the first user to use, until the ratio of resources used by the two users returns to 3:2.
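The three rules above amount to weighted water-filling: split the free cores by the share ratio, cap each user at what it can actually use, and recycle the surplus to the users that still have demand. A minimal illustrative sketch (not from the patent; the function and its names are hypothetical):

```python
def fair_share(total, weights, demands):
    """Water-filling fair share: split `total` cores by weight ratio,
    cap each user at its demand, and recycle the surplus to the rest."""
    alloc = dict.fromkeys(weights, 0)
    remaining = total
    active = {u for u in weights if demands[u] > 0}
    while remaining > 0 and active:
        wsum = sum(weights[u] for u in active)
        # tentative proportional grant for each user that still wants cores
        grants = {u: min(demands[u] - alloc[u],
                         remaining * weights[u] // wsum) for u in active}
        if all(g == 0 for g in grants.values()):
            # remainder smaller than one weighted slice: give it to the
            # highest-weight active user to guarantee progress
            u = max(active, key=lambda x: weights[x])
            grants = {u: min(demands[u] - alloc[u], remaining)}
        for u, g in grants.items():
            alloc[u] += g
            remaining -= g
            if alloc[u] >= demands[u]:
                active.discard(u)   # user saturated, drop from next round
    return alloc

# Rule 1: both users over-demand -> 3:2 split of 100 cores
print(fair_share(100, {"A": 3, "B": 2}, {"A": 200, "B": 200}))  # {'A': 60, 'B': 40}
# Rule 2: first user needs only 20 cores -> second user may take 80
print(fair_share(100, {"A": 3, "B": 2}, {"A": 20, "B": 200}))   # {'A': 20, 'B': 80}
```

Running the two cases reproduces the 60:40 and 20:80 splits from the example; rule 3 is the same computation re-run when the first user's demand grows again.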
In practice, a fair-sharing scheduling policy may define multiple layers, as shown in fig. 1.
A cluster is used by multiple users, organized by department and project. The value in parentheses is a unit's resource share; the value itself has no meaning (it may be any number), and only the ratios between sibling unit values determine each unit's allocation of the whole cluster (resource pool). The 3:2 split of the first example could equally be written as 30 and 20, or as 60 and 40.
The cluster is shared by department 1 and department 2 in a ratio of 20:10, i.e., 2:1.
The two users in department 1 use the resources assigned to that department in a ratio of 1:2.
The resources allocated to department 2 are subdivided into two projects, with project 1 weighted 2 and project 2 weighted 3.
The resources of project 1 are distributed in equal proportion to all users belonging to that project.
The resources of project 2 are allocated to user 1 and user 2 in a ratio of 1:2.
Third, existing fair sharing algorithms
Commercial and open-source resource and task scheduling software generally provides a fair-sharing scheduling policy. Although each basically determines the priority of every unit (user, department, project, etc.) by sorting the leaves of a tree data structure, each implementation uses its own code logic; no public library providing the fair-sharing function is available.
Description of fair sharing in the open-source software SLURM: Slurm Workload Manager, Fair Tree Fairshare Algorithm (schedmd.com)
Description of fair sharing in the commercial software PBS Pro: Fairshare Management for Altair PBS Professional (PDF)
The commercial software IBM Spectrum LSF also has a fair-sharing function: Fairshare scheduling (ibm.com)
Fourth, the scheduling performance of fair sharing
Fair-sharing scheduling determines task priority by comparing the dynamic quota of each unit on the share tree with the quotas of the other units. A unit's dynamic quota changes after each of its tasks is scheduled, so each time a task is scheduled the priority of all units must be re-evaluated to determine which unit the next task comes from; the waiting task belonging to that unit is then fetched from the queue.
Therefore, throughout the scheduling process, the share-tree units must be re-ranked after every scheduled task and the waiting task of the highest-priority unit looked up, and the scheduling of multiple tasks cannot be parallelized, so scheduling speed is inherently limited.
How to quickly schedule a large number of tasks in a large environment (over a hundred thousand CPU cores, hundreds of thousands of waiting tasks), or how to use free resources in time in a high-throughput environment (task run times under 30 seconds, more than ten thousand tasks in total), is a critical issue.
Fifth, fair sharing that considers historical usage
Many fair-sharing scheduling algorithms (e.g., SLURM's) consider only the current resource usage at the nodes of the share tree, and such algorithms are not truly "fair". For example, users A and B share one resource pool with a quota ratio of 1:1, i.e., 50% each. At the start, user A has 1000 tasks and user B has none, so user A can use the entire resource pool. When 900 of user A's tasks have completed, user B submits 1000 tasks; from then on the scheduler schedules A's and B's tasks at a 1:1 ratio, so when user A's 1000 tasks are done, user B still has 900 tasks left. Seen over the whole run, the scheduler has effectively followed a first-in first-out policy.
A more reasonable fair-sharing schedule would have the two users' tasks complete at the same time.
This requires the algorithm to consider historical resource usage: when user B starts submitting tasks, user A gets a very low priority, since it has used excess resources for a long time, and resources go preferentially to user B. Once user B's total resource usage has grown to equal user A's, the two users' quotas become the same, and in the end all tasks of both users finish simultaneously.
Disclosure of Invention
The invention provides a fair-sharing method for scheduling high-performance computing resources that solves the two problems of the fair-sharing scheduling algorithms mentioned above: (1) scheduling speed, guaranteeing high throughput; and (2) fair sharing that accounts for historical resource usage. The technical scheme is as follows:
A fair-sharing method for scheduling high-performance computing resources comprises the following steps:
S1: initialize the data structure: convert the configured fair-sharing structure into a tree data structure, calculate the static quota of each leaf, set each dynamic quota equal to the static quota, and set up a sub-queue for each leaf;
S2: place tasks into the leaf sub-queues according to the user each task belongs to;
S3: sort the leaves in descending order of dynamic quota to generate a sorted leaf list;
S4: take the first task from the sub-queue of the leaf with the highest dynamic quota, schedule it, and adjust that leaf's dynamic quota;
S5: compare against the dynamic quota of the next leaf in the sorted list; if the leaf whose task was just scheduled now has a lower dynamic quota than the next leaf, move it behind that leaf, repeating until its dynamic quota is higher than the next leaf's;
S6: loop steps S4-S5 until the scheduling cycle ends or the tasks in all sub-queues have been scheduled;
S7: end the scheduling cycle;
S8: adjust the dynamic quotas of the share-tree leaves according to the running and completion states of tasks;
S9: return to step S2 and proceed to the next scheduling cycle.
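As a rough illustration, the cycle of steps S3-S7 can be sketched as follows. This is not the patented implementation: the quota charge per task is simplified to a fixed `cost_per_task` (the patent's quota formula is given only as an image), and leaf selection is shown with a plain `max()` rather than the sorted-list optimization of S5; the `Leaf` class and all names are hypothetical.

```python
from collections import deque

class Leaf:
    def __init__(self, name, quota):
        self.name = name
        self.static_quota = quota
        self.dynamic_quota = quota   # S1: dynamic quota starts equal to static
        self.queue = deque()         # S1: per-leaf sub-queue of waiting tasks

def scheduling_cycle(leaves, free_cores, cost_per_task=0.1):
    """One cycle of S3-S7: repeatedly schedule the head task of the leaf
    with the highest dynamic quota, charging that leaf's quota each time."""
    scheduled = []
    while free_cores > 0:
        candidates = [l for l in leaves if l.queue]
        if not candidates:
            break                                              # S6: queues drained
        leaf = max(candidates, key=lambda l: l.dynamic_quota)  # S4: pick leaf
        scheduled.append(leaf.queue.popleft())                 # S4: schedule task
        leaf.dynamic_quota -= cost_per_task                    # S4: charge quota
        free_cores -= 1
    return scheduled                                           # S7: cycle ends

a, b = Leaf("A", 0.6), Leaf("B", 0.4)
a.queue.extend(["a1", "a2"])
b.queue.extend(["b1", "b2"])
print(scheduling_cycle([a, b], 4, cost_per_task=0.3))  # → ['a1', 'b1', 'a2', 'b2']
```

Note how charging the quota after each scheduled task makes the two leaves alternate instead of draining the higher-share leaf first.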
Further, in step S1, the global share of each leaf of the fair-share tree is calculated: the cluster-level share is defined as 1, and each leaf's share is calculated from the top down.
Further, in step S1, each leaf unit has a dynamic quota whose initial value equals the static quota.
Further, in step S1, setting up the leaf sub-queues means building a hash table keyed by leaf unit name, so that when a task is later placed into a leaf sub-queue, the leaf unit can be found quickly from the task's user name.
Further, in step S2, tasks numbering 2-3 times the total amount of resources are taken from the task queue in order of task submission.
Further, in step S4, after the first task is scheduled, if the task starts successfully, the unit's quota is decremented according to the resources allocated to the task.
Further, in step S4, the dynamic quota is calculated by a formula that, together with the definitions of its terms, appears only as images in the original publication and is not reproduced here.
the invention has the following advantages:
(1) and the traditional method for sequencing all the nodes by using the shared tree is not used, and the sequencing of the shared tree after no task is scheduled is optimized.
(2) When the tasks in the waiting queue are evaluated, certain optimization is also carried out, and all the tasks in the waiting queue are not traversed.
(3) And after the task is scheduled, the computation node quota considers the use of historical resources.
Drawings
FIG. 1 is a schematic diagram of a multi-layer fair-sharing scheduling policy definition;
FIG. 2 is a schematic flow diagram of the method provided by the present invention;
FIG. 3 is a schematic diagram of the sorted leaf list;
FIG. 4 is a schematic diagram of the fair-sharing configuration used in testing the method;
FIG. 5 is a diagram of the per-user statistical results during testing;
FIG. 6 is a diagram of the per-project statistical results during testing.
Detailed Description
As shown in fig. 2, the fair-sharing method for scheduling high-performance computing resources provided by the present invention includes the following steps:
S1: initialize the data structure: convert the configured fair-sharing structure into a tree data structure, calculate the static quota of each leaf, set each dynamic quota equal to the static quota, and set up a sub-queue for each leaf.
The fair-sharing structure is shown in fig. 1; all of the following description uses the structure in this figure as an example.
S11: calculate the global share of each leaf of the fair-share tree: define the cluster-level share as 1 and calculate each leaf's share from the top down. In the example of fig. 1, department 1 is 0.6667, department 2 is 0.3333, user 3 is 0.2222, user 4 is 0.4445, project 1 is 0.1333, project 2 is 0.2, user 1 is 0.0667, and user 2 is 0.0333. The final leaf unit values are shown in Table 1. These are the static quotas, i.e., each leaf's configured share of a total of 1.000.
Table 1: Configured static quotas of the leaf units

Leaf unit   Static quota
User 1      0.0667
User 2      0.0333
Project 1   0.1333
User 3      0.2222
User 4      0.4445
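The top-down share calculation of S11 can be sketched as a recursive walk over the weight tree. This is an illustrative reconstruction, not the patent's code; the tree is only expanded to the leaf level shown in Table 1, and the last digit of user 4 comes out 0.4444 rather than the 0.4445 printed in the table, depending on rounding.

```python
def static_quotas(node, share=1.0):
    """Top-down walk of the share tree: a child's global quota is the
    parent's quota times its weight divided by the sum of sibling weights."""
    name, children = node              # children: list of (weight, subtree)
    if not children:                   # leaf unit: record its global share
        return {name: round(share, 4)}
    total = sum(w for w, _ in children)
    quotas = {}
    for w, child in children:
        quotas.update(static_quotas(child, share * w / total))
    return quotas

# Share tree of fig. 1 down to the level of Table 1: cluster share is 1,
# departments weighted 20:10, dept-1 users 1:2, dept-2 projects 2:3.
tree = ("cluster", [
    (20, ("dept 1", [(1, ("user 3", [])), (2, ("user 4", []))])),
    (10, ("dept 2", [(2, ("project 1", [])), (3, ("project 2", []))])),
])
print(static_quotas(tree))
```

The printed quotas match the department, user 3/4, and project 1/2 values derived in S11 (up to rounding of the last digit).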
S12: each leaf unit is also given a dynamic quota whose initial value equals the static quota. The leaves are sorted in descending order of quota; the order after sorting Table 1 is: user 4, user 3, project 1, user 1, user 2.
S13: then establish a sub-queue for each leaf unit:
Build a hash table keyed by leaf unit name, so that when a task is later placed into a leaf sub-queue, the leaf unit can be found quickly from the task's user name.
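A sketch of the S13 hash table, assuming a direct mapping from a task's user name to its leaf unit (the dictionary layout, task fields, and `enqueue` helper are illustrative, not from the patent):

```python
from collections import deque

# Hash table keyed by leaf-unit name: each entry holds the leaf's dynamic
# quota and its sub-queue, so a task is routed to its leaf in O(1).
leaf_index = {name: {"dynamic_quota": quota, "queue": deque()}
              for name, quota in [("user 4", 0.4445), ("user 3", 0.2222),
                                  ("project 1", 0.1333), ("user 1", 0.0667),
                                  ("user 2", 0.0333)]}

def enqueue(task):
    """Look up the leaf by the task's user name and append the task."""
    leaf_index[task["user"]]["queue"].append(task)

enqueue({"id": 1, "user": "user 3", "cores": 4})
enqueue({"id": 2, "user": "user 3", "cores": 2})
print(len(leaf_index["user 3"]["queue"]))  # → 2
```

The point of the hash table is that distributing the prefetched tasks in S2 costs one lookup per task instead of a search over the share tree.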
S2: place tasks into the leaf sub-queues according to the user each task belongs to. The number of tasks fetched is 2-3 times the size of the resource pool.
A scheduling cycle then starts, as follows:
Take tasks numbering 2-3 times the total amount of resources (the resource pool) from the task queue, in order of task submission (first submitted, first fetched).
This is done to (a) avoid traversing all waiting tasks (perhaps hundreds of thousands to a million), and (b) ensure that free resources can be fully used.
Each task is then placed into the sub-queue of the leaf it belongs to.
S3: sort the leaves in descending order of dynamic quota to generate the sorted leaf list:
The sorted list is shown in fig. 3; 12 tasks have been sorted and wait to be placed into the leaf sub-queues.
The leaf units are sorted from high to low by dynamic quota; the sorted result is shown in Table 2.
Table 2: Ordering of the leaf units

Leaf unit   Dynamic quota   Sub-queue tasks
User 4      0.4445          4
User 3      0.2222          2
Project 1   0.1333          2
User 1      0.0667          2
User 2      0.0333          2
S4: take the first task from the sub-queue of the leaf with the highest dynamic quota, schedule it, and adjust that leaf's dynamic quota:
Take a task from the sub-queue of the unit with the highest quota, allocate resources to it, and start it. If the start succeeds, reduce the unit's quota according to the resources allocated to the task (see the description after S9 for how the dynamic quota is calculated).
S5: compare against the dynamic quota of the next leaf in the sorted list; if the leaf whose task was just scheduled now has a lower dynamic quota than the next leaf, move it back until its dynamic quota is higher than the next leaf's:
Re-sorting all the leaves in descending order of the updated quotas would be time-consuming if the share tree has many leaf units (e.g., many distinct users).
To increase speed, only the dynamic quota of the leaf whose task was just scheduled is compared with the next unit; if it is smaller, the two are swapped, and this repeats until it is larger than the next unit.
In this way a leaf very probably does not need to move at all, or moves only 1-2 positions, which is faster than re-sorting the whole leaf list. Since this step runs once per scheduled task, the speed-up is substantial when scheduling a large number of tasks (hundreds of thousands).
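The local reordering described here can be sketched as a single bubbling pass. Because scheduling a task lowers only one leaf's quota, restoring descending order needs at most a few adjacent swaps instead of a full sort (the `reposition` helper and its leaf objects are illustrative):

```python
from types import SimpleNamespace

def reposition(order, i):
    """After charging the quota of order[i], swap it toward the tail until
    the descending dynamic-quota order is restored; returns its new index.
    Usually 0-2 swaps, versus re-sorting the whole list after every task."""
    while (i + 1 < len(order)
           and order[i].dynamic_quota < order[i + 1].dynamic_quota):
        order[i], order[i + 1] = order[i + 1], order[i]
        i += 1
    return i

# The leaf at the head just dropped to 0.1 after its task was scheduled
order = [SimpleNamespace(dynamic_quota=q) for q in (0.1, 0.4, 0.3)]
reposition(order, 0)
print([l.dynamic_quota for l in order])  # → [0.4, 0.3, 0.1]
```

This is the step that keeps per-task overhead near O(1) and makes the high-throughput numbers in the embodiment plausible.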
S6: loop until the scheduling cycle ends or the tasks in all sub-queues have been scheduled; otherwise jump back to step S4:
Repeat step S4 until (a) no CPU core is available in the cluster or (b) all tasks in the leaf sub-queues have been scheduled.
S7: the scheduling cycle is ended;
S8: adjust the dynamic quotas of the share-tree leaves according to task state (waiting, running, or finished):
Check the running state and run time of all running tasks, then adjust the dynamic quota of each leaf unit.
S9: enter the next scheduling cycle;
The process returns to step S2 and the next scheduling cycle proceeds.
Taking into account the influence of the resource occupation of historical tasks on the dynamic quota, the unit's dynamic quota is calculated using a formula whose expression and term definitions appear only as images in the original publication and are not reproduced here.
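Since the published formula survives only as images, the following is merely a plausible reconstruction of a history-aware dynamic quota: exponential half-life decay of past usage is a common choice in fair-share schedulers. The function, its parameters (`half_life`, `weight`), and the core-seconds charge model are assumptions, not the patent's actual formula.

```python
import math

def dynamic_quota(static_quota, usage_history, now,
                  half_life=3600.0, weight=1e-5):
    """Hypothetical reconstruction: subtract an exponentially decayed sum
    of past usage (core-seconds) from the static quota, so units that used
    excess resources for a long time sink in priority until others catch up.
    usage_history is a list of (timestamp, core_seconds) pairs."""
    decay = math.log(2) / half_life
    charged = sum(core_seconds * math.exp(-decay * (now - t))
                  for t, core_seconds in usage_history)
    return static_quota - weight * charged

# A user who just consumed 10,000 core-seconds loses about 0.1 of quota;
# the same usage one half-life (1 h) earlier would cost only about 0.05.
print(dynamic_quota(0.5, [(0.0, 10_000)], now=0.0))     # about 0.4
print(dynamic_quota(0.5, [(0.0, 10_000)], now=3600.0))  # about 0.45
```

Under this kind of charge, the user A / user B scenario from the background plays out as described: A's accumulated usage suppresses its quota until B's total usage catches up, after which both quotas equalize.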
In an embodiment, the method was tested in the SkyForm AIP task scheduler; the test results are as follows:
1. Test environment:
(1) Resources: AWS (Amazon public cloud). The main scheduling server is a c4.xlarge (16 CPU cores, 30 GB memory); the 1000 compute servers are t2.micro instances (1 CPU core, 1 GB memory), each simulating 50 CPU cores, giving the scheduler a cluster of 50,000 cores in total.
(2) Tasks: 1 million tasks with run times of around 3 minutes.
(3) Configuration:
5 projects and 8 users, with each user belonging to all projects simultaneously, as shown in fig. 4.
2. Task submission:
Users u001-u008 each submitted 125,000 tasks. These tasks were divided equally among five project groups (idle, owner, priority, normal, short), i.e., 25,000 tasks per project group per user. The submission order was random.
3. Final results:
(1) The per-user statistics are shown in fig. 5. Task scheduling achieved fair sharing among users.
(2) The per-project statistics are shown in fig. 6. Task scheduling achieved fair sharing among projects (one queue per project).
The measured scheduling throughput was 792,952 tasks/hour. In the same test, the open-source software SLURM achieved a scheduling throughput of 126,849 tasks/hour.
The advantages of the technical scheme are:
(1) By weighting historical resource usage into the dynamic quota, fair resource sharing is achieved, improving the quality of service the high-performance computing system delivers to users.
(2) The leaf a task belongs to is found through a hash table, and after each successfully scheduled task only a local reordering of the share-tree nodes is performed, which speeds up scheduling and gives the system high throughput.

Claims (7)

1. A fair-sharing method for scheduling high-performance computing resources, comprising the following steps:
S1: initialize the data structure: convert the configured fair-sharing structure into a tree data structure, calculate the static quota of each leaf, set each dynamic quota equal to the static quota, and set up a sub-queue for each leaf;
S2: place tasks into the leaf sub-queues according to the user each task belongs to;
S3: sort the leaves in descending order of dynamic quota to generate a sorted leaf list;
S4: take the first task from the sub-queue of the leaf with the highest dynamic quota, schedule it, and adjust that leaf's dynamic quota;
S5: compare against the dynamic quota of the next leaf in the sorted list; if the leaf whose task was just scheduled now has a lower dynamic quota than the next leaf, move it behind that leaf, repeating until its dynamic quota is higher than the next leaf's;
S6: loop steps S4-S5 until the scheduling cycle ends or the tasks in all sub-queues have been scheduled;
S7: end the scheduling cycle;
S8: adjust the dynamic quotas of the share-tree leaves according to the running and completion states of tasks;
S9: return to step S2 and proceed to the next scheduling cycle.
2. The method according to claim 1, wherein in step S1, the global share of each leaf of the fair-share tree is calculated: the cluster-level share is defined as 1, and each leaf's share is calculated from the top down.
3. The method according to claim 1, wherein in step S1, each leaf unit has a dynamic quota whose initial value equals the static quota.
4. The method according to claim 1, wherein in step S1, setting up the leaf sub-queues means building a hash table keyed by leaf unit name, so that when a task is later placed into a leaf sub-queue, the leaf unit can be found quickly from the task's user name.
5. The method according to claim 1, wherein in step S2, tasks numbering 2-3 times the total amount of resources are taken from the task queue in order of task submission.
6. The method according to claim 1, wherein in step S4, after the first task is scheduled, if the task starts successfully, the unit's quota is decremented according to the resources allocated to the task.
7. The method according to claim 1, wherein in step S4, the dynamic quota is calculated by a formula that, together with the definitions of its terms, appears only as images in the original publication and is not reproduced here.
CN202110509596.8A 2021-05-11 2021-05-11 High-performance computing resource scheduling fair sharing method Withdrawn CN113419827A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110509596.8A CN113419827A (en) 2021-05-11 2021-05-11 High-performance computing resource scheduling fair sharing method


Publications (1)

Publication Number Publication Date
CN113419827A true CN113419827A (en) 2021-09-21

Family

ID=77712194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110509596.8A Withdrawn CN113419827A (en) 2021-05-11 2021-05-11 High-performance computing resource scheduling fair sharing method

Country Status (1)

Country Link
CN (1) CN113419827A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070256077A1 (en) * 2006-04-27 2007-11-01 International Business Machines Corporation Fair share scheduling based on an individual user's resource usage and the tracking of that usage
CN103380608A (en) * 2011-03-09 2013-10-30 中国科学院计算机网络信息中心 Method for gathering queue information and job information in computation environment
CN105302650A (en) * 2015-12-10 2016-02-03 云南大学 Dynamic multi-resource equitable distribution method oriented to cloud computing environment
CN107291545A (en) * 2017-08-07 2017-10-24 星环信息科技(上海)有限公司 The method for scheduling task and equipment of multi-user in computing cluster


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DENG JINGWEN: "User-oriented fair job scheduling algorithm for cluster systems", China Masters' Theses Full-text Database, Information Science and Technology series, no. 11, pages 1-64 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20210921)