CN113703952B - Resource allocation method for queue resource scheduling based on supercomputer - Google Patents
- Publication number
- CN113703952B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
Abstract
The invention relates to a resource allocation method for queue resource scheduling based on a supercomputer, comprising the following steps: (1) a user submits a job, specifying the number of computing resources and the name of a private queue; (2) the submitted parameters, namely the number of computing resources and the private queue name specified by the user, are sent to the system for evaluation; if the private queue has sufficient resources, that is, the number of resources in the private queue is larger than the number of computing resources requested, the user's job runs normally and the method ends; otherwise, the system judges whether the conditions for allocating temporary nodes are met; (3) if the conditions are met, the required temporary nodes are moved from the resource pool into the private queue identified by the private queue name, and the user's job runs to completion normally; otherwise, the reason the conditions are not met is printed; (4) the system returns the temporary nodes to the resource pool and the method ends. The invention optimizes the allocation of computing resources and improves efficiency. A large reserve queue of resources can be maintained for emergency resource requests.
Description
Technical Field
The invention relates to a resource allocation method for queue resource scheduling based on a supercomputer, and belongs to the technical field of dynamic resource scheduling algorithms for high-performance computing on supercomputers.
Background
Supercomputers serve research in national high-technology and frontier fields. They embody a country's scientific research strength, are of great significance to national security and to economic and social development, and are an important indicator of a country's level of scientific and technological development and its comprehensive national strength. A country's supercomputers are generally operated and maintained by national supercomputing centers. As of the end of May 2020, seven national supercomputing centers had been built or were under construction in China: the National Supercomputing Centers in Tianjin, Changsha, Jinan, Guangzhou, Shenzhen, Wuxi, and Zhengzhou.
Currently, queue resources on national supercomputers are allocated along essentially two lines: commercial versus domestic computing resources, and queues of shared computing nodes versus queues of exclusive computing nodes. In the supercomputing field, computing node resources have uniform properties, and no dynamic scheduling algorithm that schedules at the logical level is available. Computing resources are therefore mostly allocated manually according to users' purchase applications, and both flexibility and timeliness leave room for improvement.
In the initial stage of supercomputer construction, the overall performance of a supercomputer is generally evaluated using all of its computing node resources. Once a supercomputing center goes into operation and its computing nodes are gradually leased out, it becomes difficult to assemble a relatively large pool of computing resources to support important scientific computation. The following problems arise during the operation of a supercomputing center: (1) Users frequently occupy most of the computing resources of the shared queue, and bursts of computation make resources tight at certain moments, placing excessive pressure on the system. (2) An exclusive queue is owned by one user or one class of users; its computing resources are reserved but are often idle, so the supercomputer cannot provide a large amount of concentrated computing power. (3) Some large scientific computing tasks cannot obtain sufficient computing resources at short notice.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a resource allocation method for queue resource scheduling based on a supercomputer.
Existing approaches have no dynamic allocation capability; to achieve dynamic resource scheduling at the logical level, the scheduling algorithm must be modified and re-wrapped at the user scheduling system layer. It is therefore necessary to analyze the existing resources and the user's own resources under dynamically triggered monitoring, and to design a dynamic scheduling mechanism for the supercomputing center at the resource logic layer, so as to improve resource utilization during scheduling and thereby alleviate the shortage of computing resources at the supercomputing center.
The technical scheme of the invention is as follows:
A resource allocation method for queue resource scheduling based on a supercomputer comprises the following steps:
(1) A user submits a job and specifies the number of required computing resources and the private queue name; for example, the required computing resources comprise the number of nodes and the number of cores required on each node, together with the private queue name of the task to be submitted;
(2) The submitted parameters, namely the number of computing resources and the private queue name specified by the user in step (1), are evaluated by the queue resource scheduling resource allocation method designed by the invention. If the private queue has sufficient resources, that is, the number of resources in the private queue is larger than the number of computing resources in step (1), the user's job runs normally and the method ends; otherwise, the method judges whether the existing computing resources satisfy the required number of computing resources, and proceeds to step (3);
(3) If the existing computing resources satisfy the required number, the required temporary nodes are moved from the resource pool into the private queue identified by the private queue name in step (1), the user's job runs to completion normally, and the method proceeds to step (4); otherwise, the reason the condition is not met is printed, for example: the number of compute nodes requested by the submitted job exceeds the total actually purchased;
(4) The queue resource scheduling resource allocation method returns the temporary nodes to the resource pool, and the method ends.
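Steps (1) to (4) can be pictured as nodes moving from the resource pool into a private queue and back again. The following is a minimal sketch under stated assumptions, not the patented implementation: the Cluster class, its lend and reclaim methods, and run_job are hypothetical names; every node in a private queue is assumed to be idle; an exact fit is treated as sufficient; and the full requested node count is borrowed, following the wording of the preferred embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model: node names in the shared pool and in each private queue."""
    pool: set
    queues: dict = field(default_factory=dict)

    def lend(self, queue, count):
        """Step (3): move `count` temporary nodes from the pool into `queue`."""
        if count > len(self.pool):
            raise RuntimeError("resource pool cannot supply the requested nodes")
        lent = sorted(self.pool)[:count]
        self.pool.difference_update(lent)
        self.queues.setdefault(queue, set()).update(lent)
        return lent

    def reclaim(self, queue, lent):
        """Step (4): move the temporary nodes back into the pool."""
        self.queues[queue].difference_update(lent)
        self.pool.update(lent)


def run_job(cluster, queue, nodes_needed, job):
    """Steps (1)-(4) for one job; `job` is any callable that takes a node list."""
    own = cluster.queues.setdefault(queue, set())
    # Step (2): the private queue alone is sufficient (assumption: all its nodes are idle).
    if len(own) >= nodes_needed:
        return job(sorted(own)[:nodes_needed])
    # Step (3): borrow temporary nodes, then run the job on them.
    lent = cluster.lend(queue, nodes_needed)
    result = job(lent)
    # Step (4): reached only when the job ends normally, as in the source text.
    cluster.reclaim(queue, lent)
    return result
```

In this sketch the pool and the private queue end up exactly as they started once a job has finished normally, which is the invariant that step (4) is meant to restore.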
According to a preferred embodiment of the present invention, steps (2) to (4) comprise the following steps:
A. Judge whether the number of resources in the private queue meets the number of resources required by the user, that is, the number of computing resources; if so, pass the bsub1 parameters to the system's bsub2 and go to step F; otherwise, go to step B. The bsub1 parameters are all the parameters configured and submitted by the user, including the number of nodes, the number of cores required on each node, and the private queue name of the task to be submitted. bsub2 is the wrapped bsub command whose handling extends to the end of step (4); namely, it acquires the job node number of the job, monitors the job state, and, after the job ends normally, executes a system command that returns the corresponding number of temporary resources from the user queue to the resource pool queue.
B. Compute the sum of the number of nodes used by the user's already-submitted jobs and the number of nodes the job being submitted expects to use; if the sum is larger than the total number of nodes purchased by the user, print a prompt to the user and return; otherwise, go to step C.
C. The system computes the remaining usable computing resources in the resource pool at that moment; if they are fewer than the number of nodes the job being submitted expects to use, go to step D; otherwise, go to step E.
D. After t minutes (min), the system again computes the remaining usable computing resources in the resource pool; if they are still fewer than the number of nodes the job being submitted expects to use, print a prompt to the user stating that the system has insufficient computing resources and asking the user to contact the system administrator, and return; otherwise, go to step E.
E. Execute a scheduling system command that transfers the number of nodes the submitted job expects to use from the resource pool to the private queue identified by the user's private queue name, and then pass the bsub1 parameters to bsub2.
F. Execute bsub2, acquire the job node number of the submitted job, and run the submitted job.
G. After the submitted job ends normally, execute a system command that moves the nodes allocated from the resource pool for the submitted job out of the user's private queue and back into the resource pool.
Further preferably, t = 1.
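The decision logic of steps A to E can be sketched as a single function that returns what should happen next. This is an illustrative sketch only: the Request fields, the decide function, and the action strings are hypothetical; pool_free stands for whatever query reports the free node count of the resource pool; and the sleep parameter is injectable so that the t-minute wait of step D can be replaced in testing. As in step E of the text, the full requested node count is borrowed.

```python
import time
from dataclasses import dataclass


@dataclass
class Request:
    queue_idle: int  # idle nodes currently in the user's private queue
    in_use: int      # nodes held by the user's already-submitted jobs
    requested: int   # nodes the job being submitted expects to use
    purchased: int   # total number of nodes the user has purchased


def decide(req, pool_free, t_minutes=1, sleep=time.sleep):
    """Return (action, nodes_to_borrow) for steps A-E.

    `pool_free` is a callable returning the pool's free node count.
    """
    # Step A: the private queue already meets the request.
    if req.queue_idle >= req.requested:
        return ("submit", 0)
    # Step B: running nodes plus requested nodes must not exceed the purchase.
    if req.in_use + req.requested > req.purchased:
        return ("reject: exceeds purchased node total", 0)
    # Step C: check the pool now.
    if pool_free() < req.requested:
        # Step D: wait t minutes and check once more.
        sleep(t_minutes * 60)
        if pool_free() < req.requested:
            return ("reject: insufficient resources, contact the administrator", 0)
    # Step E: borrow the requested nodes, then submit.
    return ("borrow_then_submit", req.requested)
```

For example, a user who has purchased 8 nodes, already has 2 in use, and requests 4 passes step B (2 + 4 does not exceed 8); if the pool has only 2 free nodes at first but 6 after the wait, the function returns ("borrow_then_submit", 4). Steps F and G (running the job and returning the nodes) are left to the caller.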
According to a preferred embodiment of the present invention, in the above resource allocation method for queue resource scheduling based on a supercomputer, the number of resources in the private queue resources is greater than the number of resources in the resource pool. Generally, the annual utilization of a supercomputing center's x86 architecture clusters fluctuates around 75%.
The beneficial effects of the invention are as follows:
1. The invention optimizes the allocation of computing resources and improves efficiency. Even where computing resources cannot be unified, the algorithm remains effective for cluster resource management, and the larger the resource base it manages, the greater its benefit. A large reserve queue of resources can be maintained for emergency resource requests.
2. The invention removes the need to modify each user's attributes during initial setup; afterwards the system can be maintained and run automatically, saving labor costs.
Drawings
FIG. 1 is a flow chart of a method for allocating resources for queue resource scheduling based on a supercomputer.
Detailed Description
The invention is further described below with reference to the drawings and examples, but is not limited thereto.
Example 1
A resource allocation method for queue resource scheduling based on a supercomputer, as shown in FIG. 1, comprises the following steps:
(1) A user submits a job and specifies the number of required computing resources and the private queue name; for example, the required computing resources comprise the number of nodes and the number of cores required on each node, together with the private queue name of the task to be submitted;
(2) The submitted parameters, namely the number of computing resources and the private queue name specified by the user in step (1), are evaluated by the queue resource scheduling resource allocation method designed by the invention. If the private queue has sufficient resources, that is, the number of resources in the private queue is larger than the number of computing resources in step (1), the user's job runs normally and the method ends; otherwise, the method judges whether the existing computing resources satisfy the required number of computing resources, and proceeds to step (3);
(3) If the existing computing resources satisfy the required number, the required temporary nodes are moved from the resource pool into the private queue identified by the private queue name in step (1), the user's job runs to completion normally, and the method proceeds to step (4); otherwise, the reason the condition is not met is printed, for example: the number of compute nodes requested by the submitted job exceeds the total actually purchased;
(4) The queue resource scheduling resource allocation method returns the temporary nodes to the resource pool, and the method ends.
Example 2
The resource allocation method for queue resource scheduling based on a supercomputer according to Example 1, wherein steps (2) to (4) comprise the following steps:
A. Judge whether the number of resources in the private queue meets the number of resources required by the user, that is, the number of computing resources; if so, pass the bsub1 parameters to the system's bsub2 and go to step F; otherwise, go to step B. The bsub1 parameters are all the parameters configured and submitted by the user, including the number of nodes, the number of cores required on each node, and the private queue name of the task to be submitted. bsub2 is the wrapped bsub command whose handling extends to the end of step (4); namely, it acquires the job node number of the job, monitors the job state, and, after the job ends normally, executes a system command that returns the corresponding number of temporary resources from the user queue to the resource pool queue.
B. Compute the sum of the number of nodes used by the user's already-submitted jobs and the number of nodes the job being submitted expects to use; if the sum is larger than the total number of nodes purchased by the user, print a prompt to the user and return; otherwise, go to step C.
C. The system computes the remaining usable computing resources in the resource pool at that moment; if they are fewer than the number of nodes the job being submitted expects to use, go to step D; otherwise, go to step E.
D. After t minutes (min), the system again computes the remaining usable computing resources in the resource pool; if they are still fewer than the number of nodes the job being submitted expects to use, print a prompt to the user stating that the system has insufficient computing resources and asking the user to contact the system administrator, and return; otherwise, go to step E.
E. Execute a scheduling system command that transfers the number of nodes the submitted job expects to use from the resource pool to the private queue identified by the user's private queue name, and then pass the bsub1 parameters to bsub2.
F. Execute bsub2, acquire the job node number of the submitted job, and run the submitted job.
G. After the submitted job ends normally, execute a system command that moves the nodes allocated from the resource pool for the submitted job out of the user's private queue and back into the resource pool.
In this example, t = 1.
In the resource allocation method for queue resource scheduling based on a supercomputer, the number of resources in the private queue resources is greater than the number of resources in the resource pool. Generally, the annual utilization of a supercomputing center's x86 architecture clusters fluctuates around 75%.
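Step A lists the bsub1 parameters as the number of nodes, the number of cores required on each node, and the private queue name. As a rough illustration, assuming the scheduler is IBM Spectrum LSF (whose job submission command is bsub) and that one job slot corresponds to one core, those three values could be mapped onto standard bsub options as sketched below. The function name bsub_argv, the queue name userq, and the solver command are hypothetical; the source does not show how its bsub1 and bsub2 wrappers are actually written.

```python
import shlex


def bsub_argv(queue, nodes, cores_per_node, command):
    """Build an LSF `bsub` argument list from the three bsub1 parameters.

    Assumption: one LSF slot per core, so the total slot count is
    nodes * cores_per_node, placed cores_per_node per host.
    """
    total_slots = nodes * cores_per_node
    return [
        "bsub",
        "-q", queue,                              # target (private) queue
        "-n", str(total_slots),                   # total slots requested
        "-R", f"span[ptile={cores_per_node}]",    # slots per host
        *shlex.split(command),                    # the job's own command line
    ]


# Hypothetical usage: 4 nodes with 24 cores each in a private queue named "userq".
argv = bsub_argv("userq", 4, 24, "./solver input.dat")
print(" ".join(shlex.quote(a) for a in argv))
```

The sketch only builds the argument list and does not run it; whether a slot really equals a core depends on how the cluster is configured.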
Claims (3)
1. A resource allocation method for queue resource scheduling based on a supercomputer is characterized by comprising the following steps:
(1) A user submits a job and specifies the number of required computing resources and the private queue name;
(2) If the private queue has sufficient resources, that is, the number of resources in the private queue is larger than the number of computing resources in step (1), the user's job runs normally and the method ends; otherwise, it is judged whether the existing computing resources satisfy the required number of computing resources, and the method proceeds to step (3); the submitted parameters are the number of computing resources and the private queue name specified by the user in step (1);
(3) If the existing computing resources satisfy the required number, the required temporary nodes are moved from the resource pool into the private queue identified by the private queue name in step (1), the user's job runs to completion normally, and the method proceeds to step (4); otherwise, the reason the condition is not met is printed;
(4) The temporary nodes are returned to the resource pool, and the method ends;
wherein steps (2) to (4) comprise the following steps:
A. Judging whether the number of resources in the private queue meets the number of resources required by the user, that is, the number of computing resources; if so, passing the bsub1 parameters to the system's bsub2 and going to step F; otherwise, going to step B; the bsub1 parameters are all the parameters configured and submitted by the user, including the number of nodes, the number of cores required on each node, and the private queue name of the task to be submitted, and bsub2 is the wrapped bsub command whose handling extends to the end of step (4);
B. Computing the sum of the number of nodes used by the user's already-submitted jobs and the number of nodes the job being submitted expects to use; if the sum is larger than the total number of nodes purchased by the user, printing a prompt to the user and returning; otherwise, going to step C;
C. The system computing the remaining usable computing resources in the resource pool at that moment; if they are fewer than the number of nodes the job being submitted expects to use, going to step D; otherwise, going to step E;
D. After t minutes, the system again computing the remaining usable computing resources in the resource pool; if they are still fewer than the number of nodes the job being submitted expects to use, printing a prompt to the user stating that the system has insufficient computing resources and asking the user to contact the system administrator, and returning; otherwise, going to step E;
E. Executing a scheduling system command that transfers the number of nodes the submitted job expects to use from the resource pool to the private queue identified by the user's private queue name, and then passing the bsub1 parameters to bsub2;
F. Executing bsub2, acquiring the job node number of the submitted job, and running the submitted job;
G. After the submitted job ends normally, executing a system command that moves the nodes allocated from the resource pool for the submitted job out of the user's private queue and back into the resource pool.
2. The resource allocation method for queue resource scheduling based on a supercomputer according to claim 1, wherein t = 1.
3. The resource allocation method for queue resource scheduling based on a supercomputer according to claim 1 or 2, wherein the number of resources in the private queue resources is greater than the number of resources in the resource pool.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010429029.7A CN113703952B (en) | 2020-05-20 | 2020-05-20 | Resource allocation method for queue resource scheduling based on supercomputer |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113703952A (en) | 2021-11-26 |
CN113703952B (en) | 2023-10-10 |
Family
ID=78645441
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010429029.7A Active CN113703952B (en) | 2020-05-20 | 2020-05-20 | Resource allocation method for queue resource scheduling based on supercomputer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113703952B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114020443B (en) * | 2022-01-05 | 2022-03-18 | 国家超级计算天津中心 | Supercomputer resource scheduling method, electronic device and medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102902592A (en) * | 2012-09-10 | 2013-01-30 | 曙光信息产业(北京)有限公司 | Zoning scheduling management method of cluster computing resources |
CN105320565A (en) * | 2014-07-31 | 2016-02-10 | 中国石油化工股份有限公司 | Computer resource scheduling method for various application software |
CN106708622A (en) * | 2016-07-18 | 2017-05-24 | 腾讯科技(深圳)有限公司 | Cluster resource processing method and system, and resource processing cluster |
CN106844056A (en) * | 2017-01-25 | 2017-06-13 | 北京百分点信息科技有限公司 | Hadoop big datas platform multi-tenant job management method and its system |
CN108304260A (en) * | 2017-12-15 | 2018-07-20 | 上海超算科技有限公司 | A kind of virtualization job scheduling system and its implementation based on high-performance cloud calculating |
CN109445919A (en) * | 2018-10-19 | 2019-03-08 | 曙光信息产业(北京)有限公司 | Online computing resource transaction system based on cloud service |
CN110806928A (en) * | 2019-10-16 | 2020-02-18 | 北京并行科技股份有限公司 | Job submitting method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8020161B2 (en) * | 2006-09-12 | 2011-09-13 | Oracle America, Inc. | Method and system for the dynamic scheduling of a stream of computing jobs based on priority and trigger threshold |
Non-Patent Citations (2)
Title |
---|
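Vineetha Kondameedi et al. Adaptive Hybrid Queue Configuration for Supercomputer Systems. 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). 2017, 90-99. * |
王硕 (Wang Shuo). Hama中满足公平性和负载均衡资源调度器的研究及实现 [Research and implementation of a resource scheduler satisfying fairness and load balancing in Hama]. 《中国优秀硕士学位论文全文数据库信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology]. 2017, I138-2267. * |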
Also Published As
Publication number | Publication date |
---|---|
CN113703952A (en) | 2021-11-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020206705A1 (en) | Cluster node load state prediction-based job scheduling method | |
CN103179048B (en) | Main frame qos policy transform method and the system of cloud data center | |
CN104598426B (en) | Method for scheduling task for heterogeneous multi-nucleus processor system | |
CN104111877A (en) | Thread dynamic deployment system and method based on thread deployment engine | |
CN102158513A (en) | Service cluster and energy-saving method and device thereof | |
CN105868004B (en) | Scheduling method and scheduling device of service system based on cloud computing | |
CN103716372A (en) | Digital library-as-a-service cloud computing platform construction method | |
WO2015100995A1 (en) | Intelligent service scheduling method | |
CN113703952B (en) | Resource allocation method for queue resource scheduling based on supercomputer | |
CN114816715B (en) | Cross-region-oriented flow calculation delay optimization method and device | |
CN109960591A (en) | A method of the cloud application resource dynamic dispatching occupied towards tenant's resource | |
CN108664116A (en) | Adaptive electricity saving method, device and the cpu controller of network function virtualization | |
CN106095581B (en) | Network storage virtualization scheduling method under private cloud condition | |
CN110850957B (en) | Scheduling method for reducing system power consumption through dormancy in edge computing scene | |
CN108388471A (en) | A kind of management method constraining empty machine migration based on double threshold | |
CN116360922A (en) | Cluster resource scheduling method, device, computer equipment and storage medium | |
Wang et al. | A hard real-time scheduler for Spark on YARN | |
CN114741200A (en) | Data center station-oriented computing resource allocation method and device and electronic equipment | |
WO2020244300A1 (en) | Method and apparatus for reducing power consumption of virtual machine cluster | |
CN111506407B (en) | Resource management and job scheduling method and system combining Pull mode and Push mode | |
CN110149341B (en) | Cloud system user access control method based on sleep mode | |
CN109960565A (en) | Cloud platform, dispatching method of virtual machine and device based on cloud platform | |
Gvozdetska et al. | Energy-efficient backfill-based scheduling approach for SLURM resource manager | |
CN106293000B (en) | A kind of virtual machine storage subsystem power-economizing method towards cloud environment | |
KR20190061241A (en) | Mesos process apparatus for unified management of resource and method for the same |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||