CN115543554A - Method and device for scheduling calculation jobs and computer readable storage medium - Google Patents

Publication number
CN115543554A
Authority
CN
China
Prior art keywords
computing
resource
amount
job
queuing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211035050.4A
Other languages
Chinese (zh)
Inventor
王江
黄毅
李发
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sany Group Co Ltd
Original Assignee
Sany Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sany Group Co Ltd
Priority to CN202211035050.4A
Publication of CN115543554A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The application discloses a method and a device for scheduling computing jobs and a computer-readable storage medium. A queuing reason of a queued job is acquired; when the queuing reason is that the remaining resource amount of the computing resource pool corresponding to the queued job is less than the required resource amount of the queued job, the remaining resource amount of other computing resource pools in the cluster is acquired; and when the remaining resource amount of the other computing resource pools is greater than or equal to the required resource amount and the other computing resource pools match the queued job, the queued job is allocated to the other computing resource pools for computation. This improves the effective utilization of computing resources in the cluster and reduces the queuing time of computing jobs, thereby improving the computing efficiency and the utilization rate of the cluster.

Description

Method and device for scheduling computing jobs and computer readable storage medium
Technical Field
The present application relates to the field of computing scheduling technologies, and in particular, to a method and an apparatus for scheduling a computing job, and a computer-readable storage medium.
Background
High Performance Computing (HPC) is a computer technology aimed at improving scientific computing power. HPC simulation is a parallel computing process, that is, an application is partitioned into multiple parts that can be executed in parallel and are dispatched to multiple processors for execution.
HPC simulation computing relies on scheduling software to manage the computing schedule of multiple applications (such as simulation software). However, general-purpose scheduling software can only schedule application computations within a cluster of a single environment (such as a single computing resource pool), and it struggles to meet computing resource scheduling needs in a complex environment.
Disclosure of Invention
The present application is proposed to solve the above-mentioned technical problems. Embodiments of the present application provide a method and an apparatus for scheduling a computing job, and a computer-readable storage medium, which solve the above technical problems.
According to an aspect of the present application, there is provided a method for scheduling a computing job, including: acquiring a queuing reason of a queued job, wherein the queued job represents a computing job that is queued and waiting for processing; when the queuing reason is that the remaining resource amount of the computing resource pool corresponding to the queued job is smaller than the required resource amount of the queued job, acquiring the remaining resource amount of other computing resource pools in the cluster, wherein the cluster comprises the computing resource pool corresponding to the queued job and the other computing resource pools; and when the remaining resource amount of the other computing resource pools is greater than or equal to the required resource amount and the other computing resource pools match the queued job, allocating the queued job to the other computing resource pools for computation.
In an embodiment, acquiring the remaining resource amount of other computing resource pools in the cluster when the queuing reason is that the remaining resource amount of the computing resource pool corresponding to the queued job is smaller than the required resource amount of the queued job includes:
when the queuing reason is that the remaining resource amount of the computing resource pool corresponding to the queued job minus the reserved resource amount is smaller than the required resource amount of the queued job, acquiring the remaining resource amount of other computing resource pools in the cluster; the reserved resource amount represents the amount of resources reserved in a single computing resource pool, and the reserved resource amount is positively correlated with the remaining resource amount of the corresponding computing resource pool.
In an embodiment, allocating the queued job to the other computing resource pools for computation when the remaining resource amount of the other computing resource pools is greater than or equal to the required resource amount and the other computing resource pools match the queued job includes:
when the remaining resource amount of the other computing resource pools minus the reserved resource amount is greater than or equal to the required resource amount and the other computing resource pools match the queued job, allocating the queued job to the other computing resource pools for computation; the reserved resource amount represents the amount of resources reserved in a single computing resource pool, and the reserved resource amount is positively correlated with the remaining resource amount of the corresponding computing resource pool.
In one embodiment, the method for scheduling a computing job further includes:
when the remaining resource amount of the other computing resource pools is less than the required resource amount, stopping scheduling the queued job.
In an embodiment, after the stopping of scheduling the queued job, the method for scheduling a computing job further includes:
scheduling the computing jobs that follow the queued job in the queue.
In an embodiment, the method for scheduling a computing job further includes:
when the sum of the required resource amounts of the computing jobs corresponding to a single user is greater than the single-user resource amount upper limit, stopping scheduling the computing jobs of the single user; wherein the single-user resource amount upper limit is positively correlated with the remaining resource amount of the corresponding computing resource pool.
In an embodiment, before the obtaining the queuing reason of the queued job, the method for scheduling a computing job further includes:
distributing the computing jobs to the respective matching computing resource pools according to the requirements of all the computing jobs and the computing characteristics of each computing resource pool in the cluster.
In an embodiment, the obtaining of the queuing reason of the queued job includes:
when a queued job that is waiting in the queue exists in the cluster, acquiring the queuing reason of the queued job.
According to another aspect of the present application, there is provided an apparatus for scheduling a computing job, including: a first acquisition module, configured to acquire a queuing reason of a queued job, wherein the queued job represents a computing job that is queued and waiting for processing; a second acquisition module, configured to acquire the remaining resource amount of other computing resource pools in the cluster when the queuing reason is that the remaining resource amount of the computing resource pool corresponding to the queued job is smaller than the required resource amount of the queued job, wherein the cluster comprises the computing resource pool corresponding to the queued job and the other computing resource pools; and a scheduling execution module, configured to allocate the queued job to the other computing resource pools for computation when the remaining resource amount of the other computing resource pools is greater than or equal to the required resource amount and the other computing resource pools match the queued job.
According to another aspect of the present application, there is provided a computer-readable storage medium storing a computer program for executing the method of scheduling a computing job as described in any one of the above.
According to the method and device for scheduling computing jobs and the computer-readable storage medium provided by the present application, when a queued job exists, its queuing reason is acquired. If the reason is that the remaining resource amount of the computing resource pool corresponding to the queued job is smaller than the required resource amount of the queued job, the remaining resource amount of other computing resource pools in the cluster is acquired. If the remaining resource amount of the other computing resource pools meets the resource requirement of the queued job and the other computing resource pools match the queued job, the queued job is allocated to the other computing resource pools for computation. In this way, the effective utilization of computing resources in the cluster is improved and the queuing time of computing jobs is reduced, thereby improving the computing efficiency and the utilization rate of the cluster.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a flowchart illustrating a scheduling method for a computing job according to an exemplary embodiment of the present application.
Fig. 2 is a flowchart illustrating a scheduling method for a computing job according to another exemplary embodiment of the present application.
Fig. 3 is a flowchart illustrating a scheduling method for a computing job according to another exemplary embodiment of the present application.
Fig. 4 is a flowchart illustrating a scheduling method for a computing job according to another exemplary embodiment of the present application.
Fig. 5 is a flowchart illustrating a scheduling method for a computing job according to another exemplary embodiment of the present application.
Fig. 6 is a flowchart illustrating a scheduling method for a computing job according to another exemplary embodiment of the present application.
Fig. 7 is a schematic structural diagram of a scheduling apparatus for computing jobs according to an exemplary embodiment of the present application.
Fig. 8 is a schematic structural diagram of a scheduling apparatus for computing jobs according to another exemplary embodiment of the present application.
Fig. 9 is a block diagram of an electronic device provided in an exemplary embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only a few embodiments of the present application, and not all embodiments of the present application, and it should be understood that the present application is not limited to the example embodiments described herein.
An HPC simulation application relies on HPC scheduling software to schedule its computing jobs. However, the job scheduling policy of ordinary HPC scheduling software can only schedule within a single computing cluster, and it cannot serve an environment that combines multiple computing clusters, including local computing clusters and cloud computing clusters.
In order to solve the problem of mutual scheduling among a plurality of computing clusters (including a local computing cluster and/or a cloud computing cluster), the application provides a method and a device for scheduling computing jobs, and a computer-readable storage medium, wherein all the clusters are scheduled and managed in the same scheduling manner, queued jobs in each cluster (or computing resource pool) are monitored, when queued jobs occur in one cluster, the remaining computing resource amount of other clusters is detected, and if the remaining computing resource amount of other clusters meets the requirement of the queued jobs, the queued jobs are scheduled to other clusters for computing processing, so that the computing efficiency of an HPC simulation computing program and the resource utilization rate of all the computing clusters are improved.
The following describes a specific scheme and an implementation manner of a method and an apparatus for scheduling a computing job and a computer-readable storage medium provided in an embodiment of the present application in detail with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating a scheduling method for a computing job according to an exemplary embodiment of the present application. As shown in fig. 1, the method for scheduling a computing job includes the following steps:
Step 110: acquiring the queuing reason of a queued job.
The queued job represents a computing job that is queued and waiting for processing. In an embodiment, step 110 may be implemented as follows: when a queued job that is waiting in the queue exists in the cluster, the queuing reason of the queued job is acquired.
Specifically, the present application applies to an HPC cluster scenario with multiple computing resource pools, for example 3 computing resource pools (a first local computing resource pool, a second local computing resource pool, and an on-cloud computing resource pool), where the first local computing resource pool includes 36 servers (CPUs), the second local computing resource pool includes 48 servers (CPUs), and the on-cloud computing resource pool includes 64 servers (CPUs). The software to be computed in the present application includes STAR-CCM+, Fluent, Abaqus, LS-DYNA, Mechanical APDL, OptiStruct, and so on. It should be understood that the number of computing resource pools and the number of servers in each pool are only exemplary and are not limiting. When a user submits a software computation, the computation may be determined to be a queued job, possibly because the resources of the computing resource pool for that software or that client are all in use, or because the remaining resources are insufficient to run it. When a queued job exists in the cluster, the scheduling program is activated, that is, it determines the queuing reason of the queued job so that resources can be scheduled according to that reason. Specifically, the scheduling program may check periodically (for example, every minute) whether queued jobs exist in the cluster, so that a job does not wait in the queue for a long time unnoticed.
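As a minimal sketch of step 110, the following Python fragment performs one polling pass over the jobs in a cluster and classifies why each queued job is waiting. The class names, the pool names and the `scan_queue` helper are illustrative assumptions, not part of the application; only the pool sizes (36, 48 and 64 servers) come from the example above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class QueueReason(Enum):
    INSUFFICIENT_POOL_RESOURCES = "insufficient pool resources"
    OTHER = "other"


@dataclass
class Pool:
    total: int  # servers in the pool
    used: int   # servers currently occupied

    @property
    def remaining(self) -> int:
        return self.total - self.used


@dataclass
class Job:
    name: str
    pool: str      # name of the pool the job was assigned to (hypothetical)
    required: int  # servers the job asks for
    queued: bool = True


def queue_reason(job: Job, pools: Dict[str, Pool]) -> QueueReason:
    """Classify why a queued job is waiting (step 110)."""
    if pools[job.pool].remaining < job.required:
        return QueueReason.INSUFFICIENT_POOL_RESOURCES
    return QueueReason.OTHER


def scan_queue(jobs: List[Job], pools: Dict[str, Pool]) -> List[Tuple[str, QueueReason]]:
    """One polling pass: report the queuing reason of every queued job."""
    return [(j.name, queue_reason(j, pools)) for j in jobs if j.queued]


pools = {
    "local-1": Pool(total=36, used=30),
    "local-2": Pool(total=48, used=10),
    "cloud": Pool(total=64, used=0),
}
jobs = [Job("cfd-run", "local-1", required=10), Job("done", "cloud", 4, queued=False)]
report = scan_queue(jobs, pools)
```

In a running scheduler, a pass such as `scan_queue` would be triggered on a timer (for example, once per minute, as described above); the timer itself is omitted here.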
Step 120: when the queuing reason is that the remaining resource amount of the computing resource pool corresponding to the queued job is smaller than the required resource amount of the queued job, acquiring the remaining resource amount of other computing resource pools in the cluster.
In the above example, if the first local computing resource pool is the computing resource pool corresponding to the queued job, the second local computing resource pool and the on-cloud computing resource pool are the other computing resource pools. When the queuing reason of the queued job is determined to be that the remaining resource amount of its computing resource pool is smaller than the required resource amount, the scheduling software acquires the remaining resource amount of the other computing resource pools in the cluster to determine whether the job can be scheduled elsewhere. For example, when a queued job exists in the first local computing resource pool and the queuing reason is that the remaining resource amount of the first local computing resource pool cannot meet the required resource amount of the queued job, the remaining resource amounts of the second local computing resource pool and the on-cloud computing resource pool are acquired. It should be understood that the remaining resource amounts of the other computing resource pools may be acquired in a preset order. For example, the remaining resource amount of the second local computing resource pool is acquired first, and the remaining resource amount of the on-cloud computing resource pool is acquired only if the second local computing resource pool does not meet the requirement, which avoids computing remaining resource amounts that are not needed.
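The preset-order lookup described for step 120 can be sketched as follows. This is an illustrative assumption of how such a lookup might be written: `remaining_of` stands in for whatever query the scheduling software uses to obtain a pool's remaining resource amount, and the pool names and numbers are hypothetical.

```python
from typing import Callable, List, Optional


def find_pool_with_capacity(
    home_pool: str,
    preset_order: List[str],
    remaining_of: Callable[[str], int],
    required: int,
) -> Optional[str]:
    """Query the other pools in a preset order and stop at the first one
    whose remaining resource amount covers the requirement."""
    for name in preset_order:
        if name == home_pool:
            continue  # the home pool is already known to be insufficient
        if remaining_of(name) >= required:
            return name
    return None


free = {"local-1": 6, "local-2": 4, "cloud": 64}
queried: List[str] = []


def remaining_of(name: str) -> int:
    queried.append(name)  # record which pools were actually queried
    return free[name]


order = ["local-1", "local-2", "cloud"]
target = find_pool_with_capacity("local-1", order, remaining_of, required=3)
```

Because the loop returns as soon as a pool qualifies, the on-cloud pool is never queried in this run, which is the saving the preset order is meant to provide.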
Step 130: when the remaining resource amount of the other computing resource pools is greater than or equal to the required resource amount and the other computing resource pools match the queued job, allocating the queued job to the other computing resource pools for computation.
Because different simulation computing software has different requirements, the computing resource pool that meets the requirements of the queued job is preferably selected as the target before the remaining resource amount of the other computing resource pools is acquired. For example, STAR-CCM+ is suited to multi-core servers but places no requirement on clock frequency, so the on-cloud computing resource pool is selected for it; OptiStruct is suited to a high clock frequency but places no requirement on the number of cores, so the first local computing resource pool is selected for it. After the remaining resource amount of the other computing resource pools is acquired, if it is greater than or equal to the required resource amount of the queued job (that is, the remaining resource amount of the other computing resource pools meets the requirement of the queued job) and the other computing resource pools match the queued job, the queued job can be allocated to the other computing resource pools for computation. This reduces the number of simulation computing jobs waiting in the queue, thereby improving the effective utilization rate and the computing efficiency of the cluster.
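A minimal sketch of the two conditions in step 130, assuming that "matching" can be reduced to a comparison of cores per server and clock frequency. The application does not specify the matching rule, so the fields, thresholds and hardware figures below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pool:
    name: str
    remaining: int          # free servers
    cores_per_server: int   # hypothetical computing characteristic
    clock_ghz: float        # hypothetical computing characteristic


@dataclass
class Job:
    name: str
    required: int                   # servers requested
    min_cores_per_server: int = 0   # 0 means "no requirement"
    min_clock_ghz: float = 0.0      # 0.0 means "no requirement"


def matches(pool: Pool, job: Job) -> bool:
    """A pool matches when it satisfies every stated requirement of the job."""
    return (pool.cores_per_server >= job.min_cores_per_server
            and pool.clock_ghz >= job.min_clock_ghz)


def allocate(job: Job, other_pools: List[Pool]) -> Optional[str]:
    """Step 130: place the job on the first other pool that has enough
    remaining servers AND matches the job; otherwise leave it queued."""
    for pool in other_pools:
        if pool.remaining >= job.required and matches(pool, job):
            pool.remaining -= job.required
            return pool.name
    return None


others = [
    Pool("local-2", remaining=20, cores_per_server=16, clock_ghz=3.6),
    Pool("cloud", remaining=40, cores_per_server=64, clock_ghz=2.4),
]
many_core_job = Job("many-core", required=12, min_cores_per_server=32)
high_clock_job = Job("high-clock", required=12, min_clock_ghz=3.0)
placed_a = allocate(many_core_job, others)
placed_b = allocate(high_clock_job, others)
```

Here the many-core job skips the second local pool even though it has enough free servers, because capacity alone is not sufficient; both conditions must hold.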
According to the method for scheduling a computing job provided by this embodiment, when a queued job exists, its queuing reason is acquired. If the reason is that the remaining resource amount of the computing resource pool corresponding to the queued job is smaller than the required resource amount of the queued job, the remaining resource amount of other computing resource pools in the cluster is acquired. If the remaining resource amount of the other computing resource pools meets the resource requirement of the queued job and the other computing resource pools match the queued job, the queued job is allocated to the other computing resource pools for computation. In this way, the effective utilization of computing resources in the cluster is improved and the queuing time of computing jobs is reduced, thereby improving the computing efficiency and the utilization rate of the cluster.
Fig. 2 is a flowchart illustrating a scheduling method for a computing job according to another exemplary embodiment of the present application. As shown in fig. 2, the step 120 may include:
Step 121: when the queuing reason is that the remaining resource amount of the computing resource pool corresponding to the queued job minus the reserved resource amount is smaller than the required resource amount of the queued job, acquiring the remaining resource amount of other computing resource pools in the cluster.
The reserved resource amount represents the amount of resources reserved in a single computing resource pool, and it is positively correlated with the remaining resource amount of that pool. To ensure the computing speed of a computing resource pool, a reserved resource amount can be set for each pool. On one hand, this avoids running the pool in an over-saturated state; on the other hand, it keeps some computing resources available for new users or more urgent computing jobs. Specifically, the more remaining resources a single computing resource pool has, the larger its reserved resource amount. For example, suppose a single computing resource pool has 50 servers. When the resource utilization rate is 0 (that is, the pool is completely idle), a single simulation computing job may be allowed at most 30 servers (that is, the reserved resource amount is 20 servers); when the resource utilization rate is 80% (that is, the remaining resource amount is 10 servers), a single simulation computing job may be allowed at most 5 servers (that is, the reserved resource amount is 5 servers). Preferably, a reservation ratio, that is, the reserved resource amount as a proportion of the remaining resource amount, may be preset for different resource utilization rates, for example 40% when the resource utilization rate is 0 and 50% when the resource utilization rate is 80%.
When the remaining resource amount of the computing resource pool minus the reserved resource amount is smaller than the required resource amount of the queued job, that is, when the resource amount the pool can currently offer to a single simulation computing job is smaller than the required resource amount, the remaining resource amount of other computing resource pools in the cluster is acquired so that the job can be scheduled elsewhere.
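The arithmetic of step 121 can be checked with a short sketch. It uses the two reservation ratios given above (40% at a utilization rate of 0, 50% at 80%); treating the ratio as a step function between those two points is an assumption, since the application gives no other values, and the function names are illustrative.

```python
from typing import List, Tuple

# (utilization threshold in percent, reservation ratio in percent of the
# remaining amount). Only these two points appear in the description; the
# step-wise lookup between them is an assumption.
RESERVE_TABLE: List[Tuple[int, int]] = [(0, 40), (80, 50)]


def reserve_ratio_pct(utilization_pct: int) -> int:
    """Return the ratio of the highest threshold not above the utilization."""
    ratio = RESERVE_TABLE[0][1]
    for threshold, pct in RESERVE_TABLE:
        if utilization_pct >= threshold:
            ratio = pct
    return ratio


def reserved_amount(total: int, used: int) -> int:
    """Servers held back in a pool, computed from its remaining amount."""
    remaining = total - used
    utilization_pct = used * 100 // total
    return remaining * reserve_ratio_pct(utilization_pct) // 100


def must_look_elsewhere(total: int, used: int, required: int) -> bool:
    """Step 121: True when remaining minus reserved cannot cover the job."""
    remaining = total - used
    return remaining - reserved_amount(total, used) < required
```

With a 50-server pool, `reserved_amount(50, 0)` gives 20 servers and `reserved_amount(50, 40)` gives 5 servers, which agrees with the worked example: an idle pool can offer a single job at most 30 servers, and a pool at 80% utilization at most 5.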
Fig. 3 is a flowchart illustrating a scheduling method for a computing job according to another exemplary embodiment of the present application. As shown in fig. 3, the step 130 may include:
Step 131: when the remaining resource amount of the other computing resource pools minus the reserved resource amount is greater than or equal to the required resource amount and the other computing resource pools match the queued job, allocating the queued job to the other computing resource pools for computation.
The reserved resource amount represents the amount of resources reserved in a single computing resource pool, and it is positively correlated with the remaining resource amount of that pool. To ensure the computing speed of the computing resource pools, a reserved resource amount is set for each pool, which avoids running the pool in an over-saturated state and keeps some computing resources available for new users or more urgent computing jobs. Therefore, part of the resources of another computing resource pool must also be held back when its available resource amount is determined. That is, the queued job is allocated to the other computing resource pool for computation only when the remaining resource amount of that pool, minus its reserved resource amount (which is positively correlated with its remaining resource amount), can still meet the required resource amount of the queued job and the pool matches the queued job. This ensures that the other computing resource pool continues to operate normally.
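The effect of the reservation on the choice of target pool in step 131 can be sketched as follows. The reservation and matching rules are passed in as callables because the application leaves their details open; the pool names and server counts are hypothetical.

```python
from typing import Callable, Dict, Optional


def pick_target(
    remaining: Dict[str, int],
    reserved: Callable[[str], int],
    is_match: Callable[[str], bool],
    required: int,
) -> Optional[str]:
    """Step 131: accept a pool only if what is left after holding back its
    reserved amount still covers the job, and the pool matches the job."""
    for pool, free in remaining.items():
        if free - reserved(pool) >= required and is_match(pool):
            return pool
    return None


remaining = {"local-2": 10, "cloud": 30}
reserved_servers = {"local-2": 5, "cloud": 12}  # hypothetical reserved amounts
target = pick_target(remaining, reserved_servers.__getitem__,
                     lambda pool: True, required=8)
```

In this run the second local pool has 10 free servers, which would cover the 8 requested, but only 5 remain after its reservation, so the job goes to the on-cloud pool instead.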
Fig. 4 is a flowchart illustrating a scheduling method for a computing job according to another exemplary embodiment of the present application. As shown in fig. 4, the method for scheduling a computing job may further include:
Step 140: when the remaining resource amount of the other computing resource pools is less than the required resource amount, stopping scheduling the queued job.
If the remaining resource amount of the other computing resource pools is less than the required resource amount of the queued job, that is, no computing resource pool in the cluster can currently meet the requirement of the queued job, the scheduling program can only exit: it stops scheduling the queued job, which keeps its current queued state. In the next period, the scheduling program checks again whether a queued job exists and whether another computing resource pool meets its requirements, and schedules the queued job if both checks succeed.
In an embodiment, as shown in fig. 4, after step 140, the method for scheduling a computing job may further include:
Step 150: scheduling the computing jobs that follow the queued job in the queue.
Different simulation computing software requires different resource amounts. Suppose a computing resource pool has several queued jobs and the required resource amount of the job at the front of the queue exceeds the remaining resource amount of every computing resource pool in the cluster. If a subsequent queued job requires fewer resources, a computing resource pool that can run it may exist, and that subsequent job can be scheduled (in the manner described in steps 110 to 130 above). This improves the resource utilization rate of the whole cluster and the computing efficiency of the simulation computing software as far as possible.
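Steps 140 and 150 together can be sketched as a single pass over the queue in which a job that cannot be placed stays queued and the pass moves on to the jobs behind it. This is a simplified illustration: it ignores the matching and reservation checks of the earlier steps, and the job names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def schedule_pass(
    queue: List[Tuple[str, int]],
    remaining: Dict[str, int],
) -> Tuple[Dict[str, str], List[str]]:
    """Walk the queue in order. A job that fits nowhere keeps its queued
    state (step 140); the pass then continues with later jobs (step 150)."""
    placed: Dict[str, str] = {}
    still_queued: List[str] = []
    for name, required in queue:
        target = next((p for p, free in remaining.items() if free >= required), None)
        if target is None:
            still_queued.append(name)  # stop scheduling this job for now
            continue                   # but keep scheduling the jobs behind it
        remaining[target] -= required
        placed[name] = target
    return placed, still_queued


remaining = {"local-2": 6, "cloud": 12}
queue = [("big", 40), ("small", 4), ("medium", 10)]
placed, still_queued = schedule_pass(queue, remaining)
```

The job at the head of the queue needs 40 servers and fits nowhere, yet the two smaller jobs behind it are still placed, so the free servers are not left idle while the large job waits for the next period.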
Fig. 5 is a flowchart illustrating a scheduling method for a computing job according to another exemplary embodiment of the present application. As shown in fig. 5, the method for scheduling a computing job may further include:
step 160: and when the sum of the required resource quantity of the computing job corresponding to the single user is greater than the upper limit of the resource quantity of the single user, stopping scheduling the computing job of the single user.
The computing resources of a computing resource pool are limited. If a single user submits too many simulation computing jobs, that user may occupy too many resources, so that other users cannot use the simulation computing software normally. The total resources of a single user can therefore be limited by setting a single-user resource amount upper limit. If the sum of the required resource amounts of the simulation computing software run simultaneously by a single user exceeds this upper limit, the computing jobs submitted by that user can only wait in the queue, and the scheduling software stops scheduling them.
The single-user resource amount upper limit is positively correlated with the remaining resource amount of the corresponding computing resource pool: the more resources remain in a single computing resource pool, the higher the upper limit. For example, suppose a single computing resource pool has 50 servers. When the resource utilization rate is 0 (that is, the pool is completely idle), the simulation computing jobs submitted by a single user may be allowed at most 30 servers in total; when the resource utilization rate is 80%, they may be allowed at most 5 servers in total (that is, 5 servers are reserved; the 5 reserved servers can be labeled and barred from running ordinary software, so as to avoid, as far as possible, oversaturation caused by their use). Preferably, the ratio of the single-user resource amount upper limit to the remaining resource amount may be preset for different resource utilization rates, for example, 40% when the resource utilization rate is 0 and 50% when the resource utilization rate is 80%.
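The check in step 160 can be illustrated with a short sketch. It assumes the cap is computed as a preset ratio of the pool's remaining servers, rounded down; the function names `single_user_cap` and `user_may_schedule` are hypothetical. The figures follow the 80% utilization case in the text: 10 of 50 servers remain and the ratio is 50%, giving a cap of 5 servers.

```python
import math
from typing import Iterable


def single_user_cap(remaining: int, ratio: float) -> int:
    """Upper limit for one user: a preset fraction of the remaining servers.
    Rounding down is an assumption of this sketch."""
    return math.floor(remaining * ratio)


def user_may_schedule(user_demands: Iterable[int], remaining: int, ratio: float) -> bool:
    """Step 160: return False when the sum of the user's required
    resources exceeds the single-user cap."""
    return sum(user_demands) <= single_user_cap(remaining, ratio)


# 80% utilization of a 50-server pool leaves 10 servers; ratio 50% -> cap 5.
cap = single_user_cap(remaining=10, ratio=0.5)
```

Because the cap is a fraction of the remaining amount, it rises and falls with the remaining resources, which is one simple way to obtain the positive correlation described above.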
Fig. 6 is a flowchart illustrating a scheduling method for a computing job according to another exemplary embodiment of the present application. As shown in fig. 6, before step 110, the method for scheduling a computing job may further include:
Step 170: allocating the computing jobs to the respective computing resource pools for matching, according to the requirements of all the computing jobs and the computing characteristics of each computing resource pool in the cluster.
Different simulation computing software has different requirements. For example, the STAR-CCM software is suited to multi-core servers but places no requirement on the main frequency, so a computing resource pool with a larger number of servers is selected for it; the Optistruct software is suited to a high main frequency but places no requirement on the number of cores, so a computing resource pool with a higher main frequency is selected for it. To achieve the highest possible computing efficiency and computing effect, the simulation computing software and the computing resource pools can be paired in advance according to the requirements of the software and the computing characteristics of the pools, so that, when no scheduling takes place, each piece of simulation computing software is computed in a well-suited computing resource pool and the simulation effect is guaranteed. In addition, to further improve the utilization and balance of computing resources in the cluster, each piece of simulation computing software may be distributed evenly across the computing resource pools according to its required resource amount, on the premise that the matching principle is satisfied; this reduces the risk of a computing resource pool becoming saturated and thus reduces scheduling. Furthermore, after a period of time, the simulation computing software and the computing resource pools can be matched again as a whole according to, for example, the frequency of use of each piece of software during that period, further reducing the risk of saturation and the amount of scheduling.
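A minimal sketch of the pre-allocation in step 170 follows. The trait labels (`"many_cores"`, `"high_frequency"`), the class and function names, and the greedy "largest first, least-loaded matching pool" rule are all assumptions chosen for illustration; the text only requires that pairing respect the matching principle and spread the load evenly.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pool:
    name: str
    trait: str         # computing characteristic of the pool (hypothetical label)
    assigned: int = 0  # total required servers already paired with this pool


@dataclass
class Software:
    name: str
    needs: str     # characteristic the software requires
    required: int  # required resource amount, in servers


def pre_allocate(software: List[Software], pools: List[Pool]) -> Dict[str, str]:
    """Pair each piece of software with a matching pool, balancing the load."""
    pairing: Dict[str, str] = {}
    # Place the largest consumers first so the load spreads more evenly.
    for sw in sorted(software, key=lambda s: s.required, reverse=True):
        candidates = [p for p in pools if p.trait == sw.needs]
        if not candidates:
            continue  # no matching pool; left unpaired in this sketch
        target = min(candidates, key=lambda p: p.assigned)
        target.assigned += sw.required
        pairing[sw.name] = target.name
    return pairing
```

With two many-core pools and one high-frequency pool, two many-core workloads end up in different pools while a high-frequency workload goes to the only pool that matches it.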
Fig. 7 is a schematic structural diagram of a scheduling apparatus for computing jobs according to an exemplary embodiment of the present application. As shown in fig. 7, the scheduling apparatus 70 for computing jobs includes: a first obtaining module 71, configured to obtain a queuing reason of a queued job, wherein the queued job represents a computing job that is being queued for processing; a second obtaining module 72, configured to obtain the remaining resource amounts of the other computing resource pools in the cluster when the queuing reason is that the remaining resource amount of the computing resource pool corresponding to the queued job is smaller than the required resource amount of the queued job, wherein the cluster comprises the computing resource pool corresponding to the queued job and the other computing resource pools; and a scheduling execution module 73, configured to allocate the queued job to another computing resource pool for calculation when the remaining resource amount of that other computing resource pool is greater than or equal to the required resource amount and that other computing resource pool matches the queued job.
In the scheduling apparatus for computing jobs provided by the present application, when a queued job exists, the first obtaining module 71 obtains the queuing reason of the queued job. When the queuing reason is that the remaining resource amount of the computing resource pool corresponding to the queued job is smaller than the required resource amount of the queued job, the second obtaining module 72 obtains the remaining resource amounts of the other computing resource pools in the cluster. When the remaining resource amount of another computing resource pool is greater than or equal to the required resource amount and that pool matches the queued job, the scheduling execution module 73 allocates the queued job to that pool for calculation. In other words, the apparatus first determines why a job is queued; if the cause is insufficient remaining resources in the job's own computing resource pool, it looks at the remaining resources of the other pools in the cluster and moves the job to a pool that both satisfies the job's resource requirement and matches the job. This makes more effective use of the computing resources in the cluster, reduces the queuing time of computing jobs, and improves the computing efficiency and utilization rate of the cluster.
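The cooperation of the three modules can be sketched as below. This is an illustrative reading under stated assumptions rather than the apparatus itself: the reason code `INSUFFICIENT_RESOURCES`, the `supported` set used to model "matching", and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

INSUFFICIENT_RESOURCES = "insufficient_resources"  # hypothetical reason code


@dataclass
class Pool:
    remaining: int  # remaining servers in the pool
    supported: Set[str] = field(default_factory=set)  # software the pool matches


@dataclass
class QueuedJob:
    software: str
    required: int
    home_pool: str  # pool the job was originally paired with


def queuing_reason(job: QueuedJob, pools: Dict[str, Pool]) -> str:
    """Role of the first obtaining module: why is the job waiting?"""
    if pools[job.home_pool].remaining < job.required:
        return INSUFFICIENT_RESOURCES
    return "other"


def reschedule(job: QueuedJob, pools: Dict[str, Pool]) -> Optional[str]:
    """Return the name of the pool the job is moved to, or None if it stays queued."""
    if queuing_reason(job, pools) != INSUFFICIENT_RESOURCES:
        return None
    # Role of the second obtaining module: look at the other pools.
    others = {name: p for name, p in pools.items() if name != job.home_pool}
    # Role of the scheduling execution module: enough resources AND a match.
    for name, pool in others.items():
        if pool.remaining >= job.required and job.software in pool.supported:
            pool.remaining -= job.required
            return name
    return None
```

In this sketch a pool with plenty of free servers is still skipped if it does not support the job's software, which mirrors the requirement that both conditions hold before a job is moved.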
In an embodiment, the first obtaining module 71 may be further configured to: obtain the queuing reason of a queued job when a queued job is waiting in the cluster.
In an embodiment, the second obtaining module 72 may be further configured to: obtain the remaining resource amounts of the other computing resource pools in the cluster when the queuing reason is that the remaining resource amount of the computing resource pool corresponding to the queued job, minus the reserved resource amount, is smaller than the required resource amount of the queued job. The reserved resource amount represents the amount of resources reserved in a single computing resource pool, and is positively correlated with the remaining resource amount of the corresponding computing resource pool.
In an embodiment, the scheduling execution module 73 may be further configured to: allocate the queued job to another computing resource pool for calculation when the remaining resource amount of that other computing resource pool, minus the reserved resource amount, is greater than or equal to the required resource amount and that other computing resource pool matches the queued job. The reserved resource amount represents the amount of resources reserved in a single computing resource pool, and is positively correlated with the remaining resource amount of the corresponding computing resource pool.
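The reserved-resource variant of the check can be sketched as follows. The proportional rule (reserved amount equals a fixed fraction of the remaining amount, rounded down) and the default ratio of 0.25 are assumptions made only for illustration; the text states only that the reserved amount is positively correlated with the remaining amount.

```python
import math


def reserved_amount(remaining: int, reserve_ratio: float = 0.25) -> int:
    """Servers held back in a pool; grows with the remaining amount (assumed rule)."""
    return math.floor(remaining * reserve_ratio)


def usable_amount(remaining: int, reserve_ratio: float = 0.25) -> int:
    """Remaining servers minus the reserved servers."""
    return remaining - reserved_amount(remaining, reserve_ratio)


def pool_can_accept(remaining: int, required: int, reserve_ratio: float = 0.25) -> bool:
    """True when remaining minus reserved is at least the required amount."""
    return usable_amount(remaining, reserve_ratio) >= required
```

Under these assumed figures, a pool with 20 remaining servers reserves 5, so it can accept a 15-server job but not a 16-server one.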
Fig. 8 is a schematic structural diagram of a scheduling apparatus for computing jobs according to another exemplary embodiment of the present application. As shown in fig. 8, the scheduling apparatus 70 for computing jobs may include: a scheduling termination module 74, configured to stop scheduling the queued job when the remaining resource amount of the other computing resource pools is less than the required resource amount. Correspondingly, the scheduling apparatus 70 for computing jobs may be further configured to: schedule the computing job that follows the queued job.
In an embodiment, the scheduling termination module 74 may be further configured to: stop scheduling the computing jobs of a single user when the sum of the required resource amounts of the computing jobs of that user is greater than the single-user resource amount upper limit.
In an embodiment, as shown in fig. 8, the scheduling apparatus 70 for computing jobs may include: a pre-allocation module 75, configured to allocate the computing jobs to the respective computing resource pools for matching, according to the requirements of all the computing jobs and the computing characteristics of each computing resource pool in the cluster.
Next, an electronic apparatus according to an embodiment of the present application is described with reference to fig. 9. The electronic device may be either or both of the first device and the second device, or a stand-alone device separate from them, which stand-alone device may communicate with the first device and the second device to receive the acquired input signals therefrom.
FIG. 9 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
As shown in fig. 9, the electronic device 10 includes one or more processors 11 and memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
Memory 12 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read-Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 11 to implement the scheduling methods for computing jobs of the various embodiments of the present application described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
When the electronic device is a stand-alone device, the input means 13 may be a communication network connector for receiving the acquired input signals from the first device and the second device.
The input device 13 may also include, for example, a keyboard, a mouse, and the like.
The output device 14 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 14 may include, for example, a display, speakers, a printer, and a communication network and its connected remote output devices, among others.
Of course, for simplicity, only some of the components of the electronic device 10 relevant to the present application are shown in fig. 9, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
The computer program product may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages, for carrying out operations according to embodiments of the present application. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
The computer readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A method for scheduling a computational job, comprising:
acquiring a queuing reason of a queued job; wherein the queued job represents a computing job that is being queued for processing;
when the queuing reason is that the remaining resource amount of the computing resource pool corresponding to the queued job is smaller than the required resource amount of the queued job, acquiring the remaining resource amount of other computing resource pools in a cluster; wherein the cluster comprises the computing resource pool corresponding to the queued job and the other computing resource pools; and
when the remaining resource amount of the other computing resource pools is greater than or equal to the required resource amount and the other computing resource pools match the queued job, allocating the queued job to the other computing resource pools for calculation.
2. The method according to claim 1, wherein when the queuing reason is that the remaining resource amount of the computing resource pool corresponding to the queued job is smaller than the required resource amount of the queued job, the obtaining of the remaining resource amount of other computing resource pools in the cluster includes:
when the queuing reason is that the remaining resource amount of the computing resource pool corresponding to the queued job, minus the reserved resource amount, is smaller than the required resource amount of the queued job, acquiring the remaining resource amount of other computing resource pools in the cluster; wherein the reserved resource amount represents the amount of resources reserved in a single computing resource pool, and the reserved resource amount is positively correlated with the remaining resource amount of the corresponding computing resource pool.
3. The method according to claim 1, wherein the allocating the queued job to the other computing resource pool for calculation when the remaining amount of resources of the other computing resource pool is greater than or equal to the required amount of resources and the other computing resource pool matches the queued job comprises:
when the remaining resource amount of the other computing resource pools, minus the reserved resource amount, is greater than or equal to the required resource amount and the other computing resource pools match the queued job, allocating the queued job to the other computing resource pools for calculation; wherein the reserved resource amount represents the amount of resources reserved in a single computing resource pool, and the reserved resource amount is positively correlated with the remaining resource amount of the corresponding computing resource pool.
4. The method of scheduling a computing job according to any one of claims 1 to 3, further comprising:
when the remaining resource amount of the other computing resource pools is less than the required resource amount, stopping scheduling the queued job.
5. The method of scheduling a computing job according to claim 4, wherein after the stopping of scheduling the queued job, the method of scheduling a computing job further comprises:
scheduling the computing job that follows the queued job.
6. The method according to any one of claims 1 to 3, wherein the method further comprises:
when the sum of the required resource amount of the computing job corresponding to a single user is greater than the upper limit of the resource amount of the single user, stopping scheduling the computing job of the single user; wherein the single-user resource amount upper limit is positively correlated with the remaining resource amount of the corresponding computing resource pool.
7. The method according to any one of claims 1 to 3, wherein before the obtaining of the queuing reason for the queued job, the method further comprises:
allocating the computing jobs to the respective computing resource pools for matching, according to the requirements of all the computing jobs and the computing characteristics of each computing resource pool in the cluster.
8. The method according to any one of claims 1 to 3, wherein the obtaining of the queuing reason for the queued job comprises:
when a queued job is waiting in the cluster, acquiring the queuing reason of the queued job.
9. A scheduling apparatus of a computing job, comprising:
a first obtaining module, configured to obtain a queuing reason of a queued job; wherein the queued job represents a computing job that is being queued for processing;
a second obtaining module, configured to obtain the remaining resource amount of other computing resource pools in a cluster when the queuing reason is that the remaining resource amount of the computing resource pool corresponding to the queued job is smaller than the required resource amount of the queued job; wherein the cluster comprises the computing resource pool corresponding to the queued job and the other computing resource pools; and
a scheduling execution module, configured to allocate the queued job to the other computing resource pools for calculation when the remaining resource amount of the other computing resource pools is greater than or equal to the required resource amount and the other computing resource pools match the queued job.
10. A computer-readable storage medium storing a computer program for executing the method of scheduling a computing job according to any one of claims 1 to 8.
CN202211035050.4A 2022-08-26 2022-08-26 Method and device for scheduling calculation jobs and computer readable storage medium Pending CN115543554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211035050.4A CN115543554A (en) 2022-08-26 2022-08-26 Method and device for scheduling calculation jobs and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211035050.4A CN115543554A (en) 2022-08-26 2022-08-26 Method and device for scheduling calculation jobs and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN115543554A true CN115543554A (en) 2022-12-30

Family

ID=84726457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211035050.4A Pending CN115543554A (en) 2022-08-26 2022-08-26 Method and device for scheduling calculation jobs and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115543554A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Lv Qinghai

Inventor after: Wang Jiang

Inventor after: Huang Yi

Inventor after: Li Fa

Inventor before: Wang Jiang

Inventor before: Huang Yi

Inventor before: Li Fa
