CN114691328A - Backfill scheduling parameter determination method, device, equipment, medium and program product - Google Patents

Publication number
CN114691328A
Authority
CN
China
Prior art keywords
backfill
scheduling
job
state
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210293938.1A
Other languages
Chinese (zh)
Inventor
朱文强
张涛
胡梦龙
吕灼恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Shuguang International Information Industry Co ltd
Original Assignee
Zhongke Shuguang International Information Industry Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Shuguang International Information Industry Co ltd filed Critical Zhongke Shuguang International Information Industry Co ltd
Priority to CN202210293938.1A priority Critical patent/CN114691328A/en
Publication of CN114691328A publication Critical patent/CN114691328A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to a backfill scheduling parameter determination method, a backfill scheduling parameter determination device, a backfill scheduling parameter determination storage medium and a program product. The method comprises the steps of obtaining an output log of the job scheduling system in the previous backfill scheduling process, determining the backfill state of the backfill scheduling process according to the output log, and adjusting the current backfill scheduling parameters according to a scheduling strategy corresponding to the backfill state. The method realizes the state monitoring of the job scheduling system in the backfill scheduling process, can acquire the backfill state of the backfill scheduling process in real time, further realizes the dynamic optimization of the current backfill scheduling parameters based on the backfill state, enables the scheduling effect of backfill scheduling to be optimal, and further improves the resource utilization rate of the job scheduling system.

Description

Backfill scheduling parameter determination method, device, equipment, medium and program product
Technical Field
The present application relates to the field of computer device technologies, and in particular, to a method, an apparatus, a device, a medium, and a program product for determining backfill scheduling parameters.
Background
SLURM (Simple Linux Utility for Resource Management) is a highly scalable, fault-tolerant cluster manager and job scheduling system for large clusters of computing nodes, and it is widely used by supercomputers and computing clusters worldwide. The SLURM scheduling system includes a backfill scheduler, which considers each running job, sorts the queued jobs by priority from high to low, and determines the expected start time of each job. If a job under consideration can be started immediately without affecting the expected start time of any higher-priority job, the backfill scheduler starts it directly, so that jobs submitted to the cluster are scheduled efficiently.
In practical use of a computing cluster, the SLURM scheduling system exposes many parameters that a user can adjust to limit and manage the scheduling system. The backfill scheduler in the SLURM scheduling system likewise has a number of parameters that set the backfill scheduling time, the scheduling queue depth, the scheduling time interval and so on, and adjusting them limits and manages backfill scheduling. At present there are many large domestic HPC (High Performance Computing) clusters, and they differ in scale and in number of users. A user can only choose relatively reasonable values for the backfill scheduling parameters from information such as the cluster scale and the workload. Because the parameters are numerous, such manually chosen values rarely achieve the optimal effect, which can lead to poor scheduling and low resource utilization.
Based on this, how to provide a method for optimizing the parameters of the backfill scheduler so that the SLURM scheduling system can achieve the optimal scheduling effect in the job scheduling process, thereby improving the resource utilization rate becomes a technical problem to be solved urgently at present.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, an apparatus, a device, a medium, and a program product for determining backfill scheduling parameters, which can improve scheduling effect and improve resource utilization.
In a first aspect, the present application provides a method for determining backfill scheduling parameters, where the method includes:
acquiring an output log of a job scheduling system in the previous backfill scheduling process;
determining the backfill state of the current backfill scheduling process according to the output log;
and adjusting the current backfill scheduling parameters according to the scheduling strategy corresponding to the backfill state.
According to the method and the device, the state monitoring of the job scheduling system in the backfill scheduling process is realized, the backfill state of the backfill scheduling process can be acquired in real time, the dynamic optimization of the current backfill scheduling parameters based on the backfill state is further realized, the scheduling effect of backfill scheduling is enabled to be optimal, and the resource utilization rate of the job scheduling system is improved.
In one embodiment, if the backfill state is a busy state, the adjusting the current backfill scheduling parameter according to the scheduling policy corresponding to the backfill state includes:
and detecting the operation exit reason of the previous backfill scheduling process, and adjusting the current backfill scheduling parameters according to the operation exit reason.
In one embodiment, the adjusting the current backfill scheduling parameter according to the job exit reason includes:
extracting a set queue depth, a backfill scheduling overtime state and a job scheduling state of each job from an output log of the previous backfill scheduling process;
if the job quit reason is that the set queue depth is smaller than the current queue actual depth and the job scheduling state of each job indicates that part of the jobs are not scheduled, the value of the job related parameter in the current backfill scheduling parameter is increased;
and if the reason for the operation quitting is that the overtime state of the backfill scheduling is the overtime quitting of the backfill scheduling, and the operation scheduling state of each operation indicates that part of the operation is not scheduled, increasing the backfill scheduling time in the current backfill scheduling parameter.
According to the method and the device, the backfill scheduling parameters are optimized by analyzing the reason for quitting the job, the probability of invalid job scheduling of the job scheduling system in the backfill scheduling process can be reduced, the scheduling efficiency of the job scheduling system on the job is further improved, and the optimal scheduling effect is achieved.
In one embodiment, if the backfill state is an idle state, the adjusting the current backfill scheduling parameter according to the scheduling policy corresponding to the backfill state includes:
and adjusting the backfill scheduling time and/or setting the queue depth in the current backfill scheduling parameters.
When the backfill state is the idle state, the method avoids the situation in which each backfill scheduling pass holds the lock even though the queue contains few jobs to backfill, so that the lock held by backfill scheduling no longer affects the running of other threads. This optimizes the operation of the backfill scheduler and lets it reach its optimal state.
In one embodiment, the method further comprises:
and if the number of the jobs in the current backfill scheduling parameter is smaller than a preset number threshold or no job meets the backfill requirement, increasing the backfill scheduling time interval in the current backfill scheduling parameter.
The method of the embodiment realizes dynamic optimization of backfill scheduling parameters under the condition that the number of the jobs is small or the jobs do not meet backfill requirements, so that a backfill scheduler can effectively save scheduling resources when the backfill scheduler does not need backfill scheduling.
In a second aspect, the present application further provides a job scheduling method. The method comprises the following steps:
acquiring jobs submitted by each job node in a job scheduling system;
adjusting the backfill scheduling parameter according to the backfill scheduling parameter determining method of the first aspect to obtain an adjusted backfill scheduling parameter;
and carrying out backfill scheduling on the jobs submitted by the job nodes according to the adjusted backfill scheduling parameters.
The job scheduling method provided by this embodiment obtains the backfill state of the current backfill scheduling process by detecting the log output during backfill scheduling, and judges whether the backfill state is optimal. A non-optimal backfill state falls into one of two types. One is the busy state, in which the backfill scheduler spends too long processing jobs and cannot traverse to the jobs deeper in the job queue, so that part of the jobs cannot be scheduled. The other is the idle state, in which the workload in the job queue is small; each backfill scheduling pass still holds the lock although the queue contains few jobs to backfill, so the lock held by backfill scheduling affects the running of other threads. By analyzing the state and adjusting the backfill scheduling parameters accordingly, the running condition of the backfill scheduler can be adjusted until the backfill scheduler reaches its optimal state. Therefore, when the management node performs backfill scheduling on jobs with the optimized backfill scheduling parameters, the job scheduling efficiency and the resource utilization rate of the job scheduling system are further improved.
In a third aspect, the application further provides a backfill scheduling parameter determining device. The device comprises:
the acquisition module is used for acquiring an output log of the job scheduling system in the backfilling scheduling process;
the determining module is used for determining the backfill state of the backfill scheduling process according to the output log;
and the adjusting module is used for adjusting the backfill scheduling parameters according to the scheduling strategy corresponding to the backfill state.
In a fourth aspect, the application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the method of the first or second aspect when executing the computer program.
In a fifth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of the first or second aspect described above.
In a sixth aspect, the present application further provides a computer program product. The computer program product comprising a computer program that when executed by a processor implements the method of the first or second aspect.
Drawings
FIG. 1 is a schematic diagram of a job scheduling system in one embodiment;
FIG. 2 is a schematic diagram of a backfill scheduling process in one embodiment;
FIG. 3 is a flow diagram illustrating a method for determining backfill scheduling parameters, according to one embodiment;
FIG. 4 is a flow diagram illustrating the manner in which backfill scheduling parameters are adjusted in one embodiment;
FIG. 5 is a flow chart illustrating a method for determining backfill scheduling parameters according to another embodiment;
FIG. 6 is a schematic diagram of an optimization process in one embodiment;
FIG. 7 is a flowchart illustrating a job scheduling method according to one embodiment;
FIG. 8 is a schematic illustration of a process flow for performing a job in one embodiment;
FIG. 9 is a block diagram of an apparatus for determining backfill scheduling parameters in one embodiment;
FIG. 10 is a block diagram of an apparatus for determining backfill scheduling parameters, according to one embodiment;
FIG. 11 is a block diagram showing the construction of a job scheduling apparatus according to an embodiment;
FIG. 12 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The method for determining backfill scheduling parameters according to the embodiment of the present application can be applied to a job scheduling system as shown in fig. 1, where the job scheduling system includes a plurality of computing nodes 102, a management node 104 and a plurality of user terminals 106, and the management node 104 is connected to each user terminal 106 and each computing node 102 respectively. Each user terminal 106 is responsible for submitting the job application program to the management node 104, the management node 104 schedules corresponding resources according to the requirements of the job application program, backfill scheduling of the job is achieved, an operation instruction is sent to each computing node 102, each computing node 102 starts to operate the job application program after receiving the operation instruction, and a calculation result is returned to the user terminal 106 through the management node 104 after the scheduling job is finished. The user terminal 106 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, and the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart car-mounted devices, and the like. The portable wearable device can be a smart watch, a smart bracelet, a head-mounted device, and the like. The computing nodes 102 may be implemented as individual servers or as a server cluster of multiple servers. Management node 104 may be implemented as a stand-alone server or as a server cluster comprised of multiple servers.
In the conventional SLURM scheduling system, the SLURM scheduling system maintains a queue of pending jobs and manages the overall resource utilization of the jobs. It manages available computing nodes in a shared or unshared manner for the user to perform work. The SLURM scheduling system will allocate resources for the task queue appropriately and monitor the job to its completion. The SLURM scheduling system provides three key functions, first, it allocates exclusive and/or non-exclusive access rights to resources for users over a period of time so that they can perform work. Second, it provides a framework for initiating, executing, and monitoring work on a set of distributed nodes. Finally, it arbitrates resource contention by managing the pending job queue.
The SLURM scheduler in the conventional SLURM scheduling system mainly comprises a main scheduler and a backfill scheduler. The backfill scheduler considers the running jobs and then sorts the queued jobs by priority from high to low to determine the expected start time of each job. If a low-priority job can be started immediately without affecting the expected start time of any higher-priority job, the backfill scheduler starts it directly. The effect of priority scheduling and backfill scheduling is shown in fig. 2, in which the horizontal axis is time and the vertical axis is the number of resources. It can be seen that the backfill scheduler starts jobs a and b earlier without affecting the expected start time of the preceding job.
Generally, during actual use of the SLURM scheduling system on a large-scale computing cluster, the number of jobs submitted by users varies over time. When submissions suddenly increase, if the scheduling time set for the backfill scheduler is short or the set queue depth is small, many jobs remain in the None state and cannot be scheduled, so that idle resources are not made available to them even though those resources meet the jobs' running requirements. An administrator must then manually adjust the relevant backfill scheduling parameters. If the administrator does not notice in time that jobs cannot be scheduled, the jobs stay in the None state for a long time and users' jobs are not scheduled, which wastes resources seriously and degrades user experience. In order to solve the above problems, the present application provides a method for determining backfill scheduling parameters, which dynamically optimizes the backfill scheduling parameters while the scheduling system performs backfill scheduling, so that the scheduling system achieves an optimal scheduling effect in the backfill scheduling process and the resource utilization rate is improved. The following embodiments illustrate the method for determining backfill scheduling parameters in detail.
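The backfill rule described above can be illustrated with a minimal sketch. It is a simplified EASY-style pass that protects only the first blocked job with a reservation, whereas the text above requires that no higher-priority job be delayed; it is not SLURM's implementation. The names `Job` and `backfill_pass` and the node/time model are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # nodes requested
    walltime: int   # requested run time, same unit as `now`


def backfill_pass(total_nodes, running, queue, now=0):
    """Return names of queued jobs that may start at `now`.

    running: list of (nodes, end_time) for jobs already executing.
    queue:   pending jobs, highest priority first.
    Simplification (assumption): only the first blocked job gets a reservation.
    """
    running = list(running)
    free = total_nodes - sum(n for n, _ in running)
    started = []
    shadow_time = None  # expected start time of the first blocked job
    extra = 0           # nodes still spare at shadow_time after its reservation

    for job in queue:
        if job.nodes > total_nodes:
            continue  # can never run on this cluster; ignored in the sketch
        if shadow_time is None:
            if job.nodes <= free:
                # Fits right away: ordinary priority-order start.
                free -= job.nodes
                running.append((job.nodes, now + job.walltime))
                started.append(job.name)
                continue
            # First blocked job: find when enough nodes will have been released.
            avail = free
            for nodes, end in sorted(running, key=lambda r: r[1]):
                avail += nodes
                if avail >= job.nodes:
                    shadow_time, extra = end, avail - job.nodes
                    break
            continue
        # Lower-priority job: start only if it cannot delay the reservation.
        if job.nodes > free:
            continue
        done_in_time = now + job.walltime <= shadow_time
        if done_in_time or job.nodes <= extra:
            free -= job.nodes
            if not done_in_time:
                extra -= job.nodes
            started.append(job.name)
    return started
```

With 10 nodes, 6 of them busy until time 100, and an 8-node job at the head of the queue, a 2-node job that outlives the reservation may still start if it fits in the spare nodes, and a short 2-node job may start because it ends before time 100.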
In one embodiment, as shown in fig. 3, a backfill scheduling parameter determining method is provided, which is described by taking the method as an example applied to the management node in fig. 1, and includes the following steps:
S101, obtaining an output log of the job scheduling system in the previous backfill scheduling process.
Parameters related to the backfill scheduling process or the backfill scheduler, such as the number of scheduled jobs, the time of the scheduled jobs, and the reason for exiting the backfill scheduling, can be contained in the output log. The job scheduling system may be the job scheduling system shown in fig. 1, in which the management node is responsible for performing backfill scheduling on jobs submitted by each user terminal.
In this embodiment, in the process of performing each round of backfill scheduling on jobs submitted by each user terminal, the management node records parameters related to the backfill scheduling process, such as the number of scheduled jobs, the time of scheduled jobs, the reason for backfill scheduling exit, and the state of backfill scheduling exit, in the output log in each round of backfill scheduling process. Therefore, when the management node carries out backfill scheduling of the current round on the jobs submitted by each user terminal, the output log of each previous round of backfill scheduling process can be detected in a polling mode. It should be noted that, a backfill scheduler may be installed on the management node, and when backfill scheduling is performed, the backfill scheduler may be started to perform backfill scheduling, and an output log of the backfill scheduler is detected.
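The log inspection in S101 might be sketched as follows. The source does not give the log format, so the `backfill_cycle key=value ...` line layout, the field names, and the names `CycleSummary`, `parse_cycle` and `latest_cycle` are all hypothetical; they do not reproduce any real slurmctld output.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CycleSummary:
    tested: int       # jobs examined in the cycle
    started: int      # jobs actually backfilled
    elapsed_s: float  # time spent in the cycle, in seconds
    exit_reason: str  # hypothetical values: "timeout", "depth_limit", "queue_end"
    queue_len: int    # pending jobs when the cycle began


_PAIR = re.compile(r"(\w+)=(\S+)")


def parse_cycle(line: str) -> Optional[CycleSummary]:
    """Parse one hypothetical summary line; return None if it does not match."""
    if "backfill_cycle" not in line:
        return None
    fields = dict(_PAIR.findall(line))
    try:
        return CycleSummary(
            tested=int(fields["tested"]),
            started=int(fields["started"]),
            elapsed_s=float(fields["elapsed_s"]),
            exit_reason=fields["exit"],
            queue_len=int(fields["queue_len"]),
        )
    except (KeyError, ValueError):
        return None  # incomplete or malformed line


def latest_cycle(lines: Iterable[str]) -> Optional[CycleSummary]:
    """Return the summary of the most recent well-formed cycle line, if any."""
    latest = None
    for line in lines:
        summary = parse_cycle(line)
        if summary is not None:
            latest = summary
    return latest
```

A polling loop on the management node could call `latest_cycle` on newly appended log lines before each round of backfill scheduling.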
S102, determining the backfill state of the current backfill scheduling process according to the output log.
Wherein the backfill state includes an optimal state, a busy state, and an idle state. The optimal state indicates that the management node can reasonably process the jobs in the job queue in the backfill scheduling process, so that the backfill scheduling efficiency and the backfill scheduling quality can be ensured. The busy state indicates that the management node spends too long processing each job and cannot traverse to the jobs deeper in the job queue in the backfill scheduling process, so that part of the jobs cannot be scheduled. The idle state indicates that the amount of work in the job queue is small: each backfill scheduling pass still holds a lock although the queue contains few jobs to backfill, so the lock held by backfill scheduling affects the running of other threads.
In this embodiment, when the management node detects an output log of a previous backfill scheduling process, parameters related to the backfill scheduling process, such as the number of scheduling jobs, the scheduling job time, and the backfill scheduling exit reason, may be further extracted from the output log, and the backfill state of the current backfill scheduling process may be determined by combining a plurality of parameters set in the current backfill scheduling parameter, such as the backfill scheduling time, the backfill scheduling time interval, and the like, and resources schedulable by the management node and each connected computing node.
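One way to turn the three state definitions into a decision rule is sketched below. The source does not state concrete criteria, so the inputs, the `idle_fill_ratio` threshold and the names `BackfillState` and `classify` are assumptions chosen only to make the distinction concrete.

```python
from enum import Enum


class BackfillState(Enum):
    OPTIMAL = "optimal"
    BUSY = "busy"
    IDLE = "idle"


def classify(queue_len, tested, timed_out, max_job_test, idle_fill_ratio=0.25):
    """Classify the last backfill cycle (thresholds are illustrative assumptions).

    queue_len:    pending jobs when the cycle began
    tested:       jobs the cycle actually examined
    timed_out:    True if the cycle stopped because its time budget ran out
    max_job_test: configured queue depth for backfill
    """
    unvisited = queue_len - tested
    # Busy: jobs were left unexamined because time or depth ran out.
    if unvisited > 0 and (timed_out or tested >= max_job_test):
        return BackfillState.BUSY
    # Idle: the queue is far shorter than the configured depth.
    if queue_len < idle_fill_ratio * max_job_test:
        return BackfillState.IDLE
    return BackfillState.OPTIMAL
```

In practice the thresholds would have to be tuned per cluster; the sketch only shows that the busy and idle states can be separated using quantities the output log is said to contain.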
S103, adjusting the current backfill scheduling parameters according to the scheduling strategy corresponding to the backfill state.
Wherein, different backfill states correspond to different scheduling strategies. The scheduling policy is used to adjust, that is, to optimize, the current backfill scheduling parameters. The backfill scheduling parameters include a plurality of parameters; those having a large influence on the backfill scheduling process are shown in table 1. A user can control the backfill scheduling process in the scheduling system, that is, control the backfill scheduler, by modifying the backfill scheduling parameters. The backfill scheduling parameters fall mainly into two types: one modifies the time of each function of the backfill scheduler, and the other modifies the number of jobs the backfill scheduler accesses in the queue.
TABLE 1
[Table 1 is provided only as images in the source (Figure BDA0003562511720000071, Figure BDA0003562511720000081); its contents are not recoverable here.]
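As an illustration of the two parameter types, the sketch below groups a few backfill options into time-type and job-count-type fields and renders them as a single configuration line. The option names (`bf_interval`, `bf_max_time`, `bf_max_job_test`, `bf_max_job_part`, `bf_max_job_user`) are SLURM `SchedulerParameters` options, but which options Table 1 actually lists is not known, so the selection, the grouping, the class name `BackfillParams` and all values are assumptions.

```python
from dataclasses import dataclass, asdict

# Assumed grouping into the two types described in the text.
TIME_PARAMS = ("bf_interval", "bf_max_time")
JOB_PARAMS = ("bf_max_job_test", "bf_max_job_part", "bf_max_job_user")


@dataclass
class BackfillParams:
    bf_interval: int      # seconds between backfill passes
    bf_max_time: int      # seconds a backfill pass may run
    bf_max_job_test: int  # queue depth examined by backfill
    bf_max_job_part: int  # jobs considered per partition
    bf_max_job_user: int  # jobs considered per user

    def to_conf_line(self) -> str:
        """Render the fields as one comma-separated SchedulerParameters line."""
        pairs = ",".join(f"{k}={v}" for k, v in asdict(self).items())
        return f"SchedulerParameters={pairs}"
```

For example, `BackfillParams(60, 45, 1000, 50, 20).to_conf_line()` yields a single line that could be placed in a scheduler configuration file; the numbers are arbitrary.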
In this embodiment, the management node may determine and store a correspondence between different backfill states and different scheduling policies in advance. When the management node determines the backfill state of the current backfill scheduling process based on the steps, the influence of each backfill scheduling parameter on the backfill scheduling process can be further analyzed, and the backfill scheduling parameters and the backfill state are bound, so that the current backfill scheduling parameters are adjusted according to the scheduling strategy corresponding to the backfill state, and the optimization of the current backfill scheduling parameters is realized. Specifically, when the backfill state is a busy state, a scheduling policy corresponding to the busy state can be determined according to the corresponding relationship between the backfill state and the scheduling policy, and the current backfill scheduling parameter is adjusted according to the scheduling policy corresponding to the busy state; when the backfill state is the idle state, the scheduling policy corresponding to the idle state can be determined according to the corresponding relation between the backfill state and the scheduling policy, and the current backfill scheduling parameter is adjusted according to the scheduling policy corresponding to the idle state.
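The stored correspondence between backfill states and scheduling policies could be as simple as a lookup table. The sketch below assumes each policy is a set of multipliers applied to named parameters; the table contents, the factor values and the name `apply_strategy` are illustrative assumptions, not values from the source.

```python
# Hypothetical correspondence table: state -> multiplier per parameter.
STRATEGY = {
    "optimal": {},
    "busy": {"bf_max_time": 1.5, "bf_max_job_test": 1.5},
    "idle": {"bf_max_time": 0.5, "bf_max_job_test": 0.5},
}


def apply_strategy(state, params):
    """Return a new parameter dict adjusted by the policy stored for `state`.

    Parameters not named by the policy are left unchanged; an unknown state
    raises KeyError so that a missing correspondence is noticed.
    """
    factors = STRATEGY[state]
    return {
        name: max(1, round(value * factors.get(name, 1.0)))
        for name, value in params.items()
    }
```

Keeping the correspondence as data rather than code means an administrator could change a policy without touching the adjustment logic.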
The method for determining the backfill scheduling parameter according to the embodiment includes acquiring an output log of the job scheduling system in a previous backfill scheduling process, determining a backfill state of the backfill scheduling process according to the output log, and adjusting a current backfill scheduling parameter according to a scheduling policy corresponding to the backfill state. The method realizes the state monitoring of the job scheduling system in the backfill scheduling process, can acquire the backfill state of the backfill scheduling process in real time, further realizes the dynamic optimization of the current backfill scheduling parameters based on the backfill state, enables the scheduling effect of backfill scheduling to be optimal, and further improves the resource utilization rate of the job scheduling system.
In the method according to the above embodiment, the backfill states detected by the management node in the current backfill scheduling process may be divided into an optimal state, a busy state, and an idle state, and when the backfill state is the busy state, the present application provides a method for optimizing the backfill scheduling parameters in the backfill state, that is, the step S103 "adjusting the current backfill scheduling parameters according to the scheduling policy corresponding to the backfill state" includes: and detecting the job exit reason of the previous backfill scheduling process, and adjusting the current backfill scheduling parameter according to the job exit reason.
Specifically, when the management node determines that the backfill state of the current backfill scheduling process is a busy state based on the foregoing steps, a job exit reason of the previous backfill scheduling process can be further extracted from an output log of the previous backfill scheduling process, a scheduling policy corresponding to the job exit reason is obtained by analyzing the job exit reason, and the current backfill scheduling parameter is adjusted according to the scheduling policy corresponding to the job exit reason. It should be noted that different job exit reasons may correspond to different scheduling policies, and the correspondence may be predetermined and stored by the management node.
Further, a specific manner for adjusting the backfill scheduling parameter according to the job exit reason of the backfill scheduling process is provided, as shown in fig. 4, the manner includes:
S201, extracting the set queue depth, the overtime state of backfill scheduling and the job scheduling state from the output log of the previous backfill scheduling process.
The overtime state of the backfill scheduling comprises a state that the backfill scheduling exits overtime or the backfill scheduling does not exit overtime. The job scheduling state includes a state in which a job is scheduled or a job is not scheduled. Specifically, when the management node needs to adjust the current backfill scheduling parameter according to the job exit reason of the backfill scheduling process, the job related parameters such as the set queue depth, the timeout state of the backfill scheduling, the job scheduling state of each job, and the like can be further extracted from the output log of the previous backfill scheduling process, so that the job related parameters can be used as references to adjust the current backfill scheduling parameter.
S202, if the reason for the job quit is that the set queue depth is smaller than the current actual queue depth, the value of the job related parameter in the current backfill scheduling parameter is increased.
When the job quit reason is that the set queue depth is smaller than the current queue actual depth, and the job scheduling state of each job indicates that some jobs are not scheduled, it indicates that the set queue depth is too small, and the values of the job-related parameters in the current backfill scheduling parameters need to be properly increased, for example, the set queue depth, the partition number, the number of users, the starting time, and the like, so that all jobs contained in the queue can be effectively scheduled, and further, resources are fully utilized. The values of the job-related parameters also include the values of the job-related parameters in table 1, such as the maximum number of jobs started per partition, the maximum number of jobs started per user, and the like.
S203, if the reason for the job quit is that the backfill scheduling overtime state is the backfill scheduling overtime quit, and the job scheduling state of each job indicates that part of jobs are not scheduled, increasing the backfill scheduling time in the current backfill scheduling parameters.
When the job quit reason is that the backfill scheduling overtime quits, and the job scheduling state of each job indicates that part of the job is not scheduled, the backfill scheduling time is short, and the backfill scheduling time can be properly increased. According to the method and the device, the backfill scheduling parameters are optimized by analyzing the reason for quitting the job, the probability of invalid job scheduling of the job scheduling system in the backfill scheduling process can be reduced, the scheduling efficiency of the job scheduling system on the job is further improved, and the optimal scheduling effect is achieved.
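Steps S201 to S203 can be sketched as a single function. The parameter keys reuse SLURM option names as an assumed mapping, the growth factor of 1.5 is arbitrary, and treating a value of 0 as "no limit" is a convention of this sketch; `adjust_busy` is a hypothetical name.

```python
import math


def adjust_busy(params, set_depth, actual_depth, timed_out, unscheduled, growth=1.5):
    """Adjust parameters for the busy state according to the exit reason.

    set_depth:    configured queue depth found in the previous cycle's log
    actual_depth: actual length of the job queue
    timed_out:    True if the previous cycle exited on timeout
    unscheduled:  number of jobs the previous cycle left unscheduled
    """
    new = dict(params)
    if unscheduled <= 0:
        return new  # every job was scheduled, so nothing needs to change
    if set_depth < actual_depth:
        # S202: the depth limit cut the pass short, so raise job-count parameters.
        for key in ("bf_max_job_test", "bf_max_job_part", "bf_max_job_user"):
            if new.get(key, 0) > 0:  # 0 is treated as "no limit" in this sketch
                new[key] = math.ceil(new[key] * growth)
    if timed_out:
        # S203: the time budget ran out, so raise the backfill scheduling time.
        new["bf_max_time"] = math.ceil(new["bf_max_time"] * growth)
    return new
```

Both branches may fire in the same round if the previous cycle was limited by depth and by time at once.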
In an embodiment, when the backfill state is the idle state, a method for optimizing the backfill scheduling parameters is further provided. That is, step S103, "adjusting the current backfill scheduling parameter according to the scheduling policy corresponding to the backfill state", includes: adjusting the backfill scheduling time and/or the set queue depth in the current backfill scheduling parameters.
Specifically, when the management node determines, based on the foregoing steps, that the backfill state of the current backfill scheduling process is the idle state, the workload in the job queue is small. The management node can appropriately reduce the backfill scheduling time in the current backfill scheduling parameters so that it matches the workload in the job queue and no backfill scheduling time is wasted. Optionally, it can appropriately reduce the set queue depth so that it matches the depth of the job queue and no queue resources are wasted. Optionally, it can reduce both at the same time, avoiding waste of both backfill scheduling time and queue resources. In the idle state, each backfill scheduling pass holds a lock even though the queue is not full of backfill jobs, and holding the lock affects the running of other threads. The method of this embodiment avoids this effect, thereby optimizing the operation of the backfill scheduler and bringing it to its optimal state.
Further, step S103, "adjusting the current backfill scheduling parameter according to the scheduling policy corresponding to the backfill state", further includes: if the number of jobs in the current backfill scheduling parameters is smaller than a preset number threshold, or no job meets the backfill requirement, increasing the time interval of backfill scheduling in the current backfill scheduling parameters.
Specifically, when the management node determines, based on the foregoing steps, that the backfill state of the current backfill scheduling process is the idle state, it may further determine whether the number of jobs in the current backfill scheduling parameters is smaller than a preset number threshold. If it is, the number of jobs to be scheduled in the job queue is small; in this case backfill scheduling can be performed less often, and the time interval of backfill scheduling in the current backfill scheduling parameters may be increased appropriately. Optionally, in the idle state the management node may also determine whether the jobs in the queue meet the backfill requirement. If no job meets the backfill requirement, backfill scheduling is not needed or can be performed less often, and the time interval of backfill scheduling may likewise be increased appropriately. The method of this embodiment dynamically optimizes the backfill scheduling parameters when the number of jobs is small or no job meets the backfill requirement, so that the backfill scheduler saves scheduling resources when backfill scheduling is not needed.
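The idle-state adjustments described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `BackfillParams`, `adjust_idle`, the `shrink` and `interval_growth` factors, and the default `job_threshold` are hypothetical, and the application leaves the actual step sizes and the preset number threshold unspecified. The sketch reduces both the scheduling time and the set queue depth, which is one of the three options the text allows.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BackfillParams:
    """Hypothetical container for the parameters discussed in the text."""
    queue_depth: int     # set queue depth
    sched_time_s: float  # backfill scheduling time (budget per pass)
    interval_s: float    # time interval between backfill passes


def adjust_idle(params: BackfillParams,
                pending_jobs: int,
                backfillable_jobs: int,
                job_threshold: int = 10,
                shrink: float = 0.5,
                interval_growth: float = 2.0) -> BackfillParams:
    """Idle-state policy: shrink the pass, and run it less often if
    there is little or nothing to backfill."""
    # Reduce the backfill scheduling time and the set queue depth so
    # that the lock is held for a shorter time on each pass.
    new = replace(
        params,
        queue_depth=max(1, int(params.queue_depth * shrink)),
        sched_time_s=params.sched_time_s * shrink,
    )
    # Fewer jobs than the threshold, or no job meets the backfill
    # requirement: lengthen the interval between passes.
    if pending_jobs < job_threshold or backfillable_jobs == 0:
        new = replace(new, interval_s=new.interval_s * interval_growth)
    return new
```

For example, with a depth of 100, a 30-second scheduling time, and a 30-second interval, three pending jobs yield a depth of 50, a 15-second scheduling time, and a 60-second interval, while fifty pending jobs of which five are backfillable leave the interval at 30 seconds.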
Combining all the above embodiments, a method for determining backfill scheduling parameters is further provided. As shown in fig. 5, the method includes:
S301, obtaining an output log of the job scheduling system in the previous backfill scheduling process.
S302, determining the backfill state of the current backfill scheduling process according to the output log.
S303, extracting the set queue depth, the timeout state of the backfill scheduling, and the job scheduling state of each job from the output log, and determining the job exit reason of the backfill scheduling process.
S304, if the backfill state is the busy state, determining the job exit reason of the backfill scheduling process; if the job exit reason is that the set queue depth is smaller than the current actual queue depth, increasing the values of the job-related parameters in the current backfill scheduling parameters; and if the job exit reason is that the backfill scheduling exited on timeout and the job scheduling state of each job indicates that some jobs are not scheduled, increasing the backfill scheduling time in the current backfill scheduling parameters.
S305, if the backfill state is the idle state, reducing the backfill scheduling time and/or the set queue depth in the current backfill scheduling parameters, and, when the number of jobs in the current backfill scheduling parameters is smaller than a preset number threshold or no job meets the backfill requirement, increasing the time interval of backfill scheduling in the current backfill scheduling parameters.
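Step S303 can be sketched as a small log parser followed by a rule that names the job exit reason. The log line format (`key=value` pairs), the field names, and the reason labels below are all hypothetical, since the application does not show the actual output log of the job scheduling system. The order in which the two conditions are checked when both hold is likewise an assumption.

```python
import re
from typing import Dict, Union

Summary = Dict[str, Union[int, bool]]


def parse_backfill_log(line: str) -> Summary:
    """Extract the fields named in S303 from one hypothetical log line,
    e.g. 'set_depth=100 actual_depth=250 timed_out=no unscheduled=150'."""
    fields = dict(re.findall(r"(\w+)=(\S+)", line))
    return {
        "set_depth": int(fields["set_depth"]),        # set queue depth
        "actual_depth": int(fields["actual_depth"]),  # actual queue depth
        "timed_out": fields["timed_out"] == "yes",    # timeout state
        "unscheduled": int(fields["unscheduled"]),    # jobs not scheduled
    }


def job_exit_reason(summary: Summary) -> str:
    """Name the job exit reason from the extracted fields."""
    if summary["unscheduled"] == 0:
        return "all_scheduled"
    if summary["set_depth"] < summary["actual_depth"]:
        return "queue_depth_too_small"
    if summary["timed_out"]:
        return "backfill_timeout"
    return "other"
```

Given the example line in the docstring, `job_exit_reason(parse_backfill_log(line))` returns `"queue_depth_too_small"`, which in S304 would lead to increasing the job-related parameters.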
The methods described in the embodiments of fig. 2 to fig. 5 are methods by which the management node adjusts the backfill scheduling parameters in order to optimize them; see the optimization flow diagram shown in fig. 6. Throughout the scheduling process, the backfill scheduler executes a loop: monitoring the log in real time, determining the backfill state from the log output, and adjusting the backfill scheduling parameters according to the backfill state. During this loop, the backfill scheduler may extract relevant parameters, such as backfill lock release, backfill interval, and backfill scheduling initiative, from the log output and determine the backfill state based on them. The cyclic optimization procedure shown in fig. 6 thus achieves dynamic optimization of the backfill scheduling parameters. Specifically, when the methods described in the embodiments of fig. 2 to fig. 6 are applied to the job scheduling system shown in fig. 1 to schedule jobs, both the scheduling efficiency of the jobs and the resource utilization during scheduling can be improved. The embodiment of fig. 7 below describes a job scheduling method applied to the application environment shown in fig. 1.
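The monitor, classify, and adjust loop of fig. 6 can be sketched as a generator that is independent of how the backfill state is actually derived. Everything named here (`tuning_loop`, the `classify` callable, the `policies` mapping, and the state labels) is a hypothetical structure for illustration; the application does not prescribe this decomposition.

```python
from typing import Callable, Dict, Iterable, Iterator, TypeVar

P = TypeVar("P")  # type of the backfill scheduling parameter object


def tuning_loop(logs: Iterable[str],
                classify: Callable[[str], str],
                policies: Dict[str, Callable[[P, str], P]],
                params: P) -> Iterator[P]:
    """For each log record: determine the backfill state, apply the
    scheduling policy registered for that state, and yield the
    parameters to use for the next backfill pass."""
    for log in logs:
        state = classify(log)         # e.g. "busy", "idle", "optimal"
        policy = policies.get(state)  # no policy means leave unchanged
        if policy is not None:
            params = policy(params, log)
        yield params
```

A caller would supply one policy per non-optimal state, for example a busy-state policy that increases the set queue depth and an idle-state policy that increases the interval. States without a registered policy, such as an optimal state, leave the parameters unchanged.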
In an embodiment, as shown in fig. 7, a job scheduling method is provided, which is described by taking the application of the method to the management node in fig. 1 as an example, and includes the following steps:
S401, acquiring the jobs submitted by each user terminal in the job scheduling system.
At present, applications in fields vital to the national economy and people's livelihood, such as weather forecasting, satellite image processing, biomedicine, and industrial simulation, all depend on high-performance computing, and every high-performance cluster depends on a scheduling system. A job scheduling system (such as the SLURM scheduling system) can allocate scheduling resources appropriately. Specifically, the management node may allocate or manage jobs using the job running flow shown in fig. 8: after a user terminal in the job scheduling system submits a job, the main scheduler or the backfill scheduler on the management node requests or allocates the corresponding resources according to the needs of the job to ensure that the job executes normally; the management node then sends a run instruction to the allocated computing nodes, each of which has a corresponding daemon process.
S402, adjusting the backfill scheduling parameters according to the backfill scheduling parameter determining method described in the embodiment of FIGS. 2-7, and obtaining adjusted backfill scheduling parameters.
The backfill scheduling parameters are adjusted according to the backfill scheduling parameter determining method described in the embodiments of fig. 2-7, so that the backfill scheduling parameters are optimized, and the optimized backfill scheduling parameters can be matched with the current scheduling process.
S403, performing backfill scheduling on the jobs submitted by each user terminal according to the adjusted backfill scheduling parameters.
Once the backfill scheduling parameters have been adjusted and backfill scheduling is performed on the jobs submitted by each user terminal with the adjusted parameters, the job scheduling system can achieve the best scheduling effect in the backfill scheduling process.
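To show why the set queue depth and the backfill scheduling time determine how a pass ends, the following Python sketch runs one deliberately simplified pass over a priority-ordered queue. It is not the scheduling algorithm of the application or of any real scheduler: it starts any job that fits the free nodes and ignores the start-time reservations that real backfill scheduling must respect. The function name, the tuple layout, and the exit labels are assumptions.

```python
import time
from typing import Callable, List, Tuple


def backfill_pass(queue: List[Tuple[str, int]],
                  free_nodes: int,
                  queue_depth: int,
                  time_budget_s: float,
                  clock: Callable[[], float] = time.monotonic
                  ) -> Tuple[List[str], str]:
    """Examine (job_id, nodes_needed) entries in priority order and
    start those that fit, stopping at the depth or time limit."""
    started: List[str] = []
    deadline = clock() + time_budget_s
    exit_reason = "queue_exhausted"
    for examined, (job_id, nodes) in enumerate(queue):
        if examined >= queue_depth:
            exit_reason = "depth_reached"  # deeper jobs are never seen
            break
        if clock() >= deadline:
            exit_reason = "timeout"        # time budget used up
            break
        if nodes <= free_nodes:
            free_nodes -= nodes
            started.append(job_id)
    return started, exit_reason
```

With four queued jobs and a depth of 3, the pass ends with `"depth_reached"` and the fourth job is never examined, which corresponds to the busy-state case in which the set queue depth is smaller than the actual queue depth. In Slurm, the `SchedulerParameters` options `bf_max_job_test`, `bf_max_time`, and `bf_interval` play broadly similar roles, although the application does not state which options its parameters map to.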
The job scheduling method provided by this embodiment obtains the backfill state of the current backfill scheduling process by inspecting the log output of the backfill scheduling process, and judges whether the backfill state is optimal. A non-optimal backfill state falls into one of two types. One is the busy state, in which the backfill scheduler spends too long processing jobs and cannot traverse to jobs deeper in the job queue, so some jobs cannot be scheduled. The other is the idle state, in which the workload in the job queue is small; because each backfill scheduling pass holds a lock even though the queue is not full of backfill jobs, holding the lock affects the running of other threads. By analyzing the state and adjusting the backfill scheduling parameters accordingly, the running condition of the backfill scheduler can be adjusted so that it reaches its optimal state. When the management node performs backfill scheduling on jobs using the optimized backfill scheduling parameters, the job scheduling efficiency and resource utilization of the job scheduling system can be further improved.
It should be understood that, although the steps in the flowcharts of the above embodiments are displayed in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to the order illustrated and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of the present application further provide a backfill scheduling parameter determining apparatus for implementing the above backfill scheduling parameter determining method. The solution provided by the apparatus is similar to that described in the above method, so for specific limitations in the one or more embodiments of the backfill scheduling parameter determining apparatus provided below, reference may be made to the limitations of the backfill scheduling parameter determining method above; details are not repeated here.
In one embodiment, as shown in fig. 9, there is provided a backfill scheduling parameter determining apparatus, including:
the obtaining module 10 is used for obtaining an output log of the job scheduling system in the backfill scheduling process;
the determining module 11 is configured to determine a backfill state of the backfill scheduling process according to the output log;
and the adjusting module 12 is configured to adjust the backfill scheduling parameter according to the scheduling policy corresponding to the backfill state.
In an embodiment, if the backfill state is a busy state, the adjusting module 12 is specifically configured to detect a job exit reason of the previous backfill scheduling process, and adjust the current backfill scheduling parameter according to the job exit reason.
In one embodiment, the adjusting module 12, as shown in fig. 10, includes:
an extracting unit 120, configured to extract a set queue depth, a timeout state of backfill scheduling, and a job scheduling state of each job from an output log of the previous backfill scheduling process;
a first adjusting unit 121, configured to, when the job quit reason is that the set queue depth is smaller than the current queue actual depth, and the job scheduling status of each job indicates that some jobs are not scheduled, increase the value of a job-related parameter in the current backfill scheduling parameter;
a second adjusting unit 122, configured to increase the backfill scheduling time in the current backfill scheduling parameter when the job exit reason is that the timeout state of the backfill scheduling is backfill scheduling timeout exit, and the job scheduling state of each job indicates that some jobs are not scheduled.
In an embodiment, if the backfill state is an idle state, the adjusting module 12 is specifically configured to reduce the backfill scheduling time and/or set the queue depth in the current backfill scheduling parameter.
In an embodiment, if the number of jobs in the current backfill scheduling parameter is smaller than a preset number threshold, or no job meets the backfill requirement, the adjusting module 12 is specifically configured to increase the time interval of the backfill scheduling in the current backfill scheduling parameter.
Based on the same inventive concept, the embodiment of the present application further provides a job scheduling apparatus for implementing the job scheduling method. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme recorded in the method, so specific limitations in one or more embodiments of the job scheduling device provided below can refer to the limitations on the job scheduling method in the foregoing, and details are not described herein again.
In one embodiment, as shown in fig. 11, there is provided a job scheduling apparatus including:
an obtaining module 20, configured to obtain jobs submitted by each user terminal in the job scheduling system;
the adjusting module 21 is configured to determine a backfill state of the backfill scheduling process according to the output log;
and the scheduling module 22 is configured to adjust the backfill scheduling parameter according to the scheduling policy corresponding to the backfill state.
Each module in the backfill scheduling parameter determining apparatus or the job scheduling apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 12. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing output log data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program when executed by a processor implements a method of determining backfill scheduling parameters.
It will be appreciated by those skilled in the art that the configuration shown in fig. 12 is a block diagram of only a portion of the configuration associated with the present application, and is not intended to limit the computing device to which the present application may be applied, and that a particular computing device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring an output log of a job scheduling system in the previous backfill scheduling process;
determining the backfill state of the current backfill scheduling process according to the output log;
and adjusting the current backfill scheduling parameters according to the scheduling strategy corresponding to the backfill state.
The implementation principle and technical effect of the computer device provided by the above embodiment are similar to those of the above method embodiment, and are not described herein again.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring an output log of a job scheduling system in the previous backfill scheduling process;
determining the backfill state of the current backfill scheduling process according to the output log;
and adjusting the current backfill scheduling parameters according to the scheduling strategy corresponding to the backfill state.
The implementation principle and technical effect of the computer-readable storage medium provided by the above embodiments are similar to those of the above method embodiments, and are not described herein again.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of:
acquiring an output log of a job scheduling system in the previous backfill scheduling process;
determining the backfill state of the current backfill scheduling process according to the output log;
and adjusting the current backfill scheduling parameters according to the scheduling strategy corresponding to the backfill state.
The foregoing embodiments provide a computer program product, which has similar implementation principles and technical effects to those of the foregoing method embodiments, and will not be described herein again.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring jobs submitted by user terminals in a job scheduling system;
adjusting the backfill scheduling parameters according to the backfill scheduling parameter determining method to obtain adjusted backfill scheduling parameters;
and carrying out backfill scheduling on the jobs submitted by the user terminals according to the adjusted backfill scheduling parameters.
The implementation principle and technical effect of the computer device provided by the above embodiment are similar to those of the above method embodiment, and are not described herein again.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring jobs submitted by user terminals in a job scheduling system;
adjusting the backfill scheduling parameter according to the backfill scheduling parameter determining method to obtain an adjusted backfill scheduling parameter;
and carrying out backfill scheduling on the jobs submitted by the user terminals according to the adjusted backfill scheduling parameters.
The implementation principle and technical effect of the computer-readable storage medium provided by the above embodiments are similar to those of the above method embodiments, and are not described herein again.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of:
acquiring jobs submitted by user terminals in a job scheduling system;
adjusting the backfill scheduling parameter according to the backfill scheduling parameter determining method to obtain an adjusted backfill scheduling parameter;
and carrying out backfill scheduling on the jobs submitted by the user terminals according to the adjusted backfill scheduling parameters.
The foregoing embodiments provide a computer program product, which has similar implementation principles and technical effects to those of the foregoing method embodiments, and will not be described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A method for determining backfill scheduling parameters, the method comprising:
acquiring an output log of a job scheduling system in the previous backfill scheduling process;
determining the backfill state of the current backfill scheduling process according to the output log;
and adjusting the current backfill scheduling parameters according to the scheduling strategy corresponding to the backfill state.
2. The method of claim 1, wherein if the backfill state is a busy state, the adjusting the current backfill scheduling parameter according to the scheduling policy corresponding to the backfill state comprises:
detecting the job exit reason of the previous backfill scheduling process, and adjusting the current backfill scheduling parameters according to the job exit reason.
3. The method of claim 2, wherein said adjusting a current backfill scheduling parameter based on said job exit cause comprises:
extracting a set queue depth, a backfill scheduling overtime state and a job scheduling state of each job from an output log of the previous backfill scheduling process;
if the job quit reason is that the set queue depth is smaller than the current queue actual depth and the job scheduling state of each job indicates that part of the jobs are not scheduled, the value of the job related parameter in the current backfill scheduling parameter is increased;
and if the job exit reason is that the backfill scheduling exited on timeout, and the job scheduling state of each job indicates that some jobs are not scheduled, increasing the backfill scheduling time in the current backfill scheduling parameters.
4. The method according to claim 1, wherein if the backfill state is an idle state, the adjusting the current backfill scheduling parameter according to the scheduling policy corresponding to the backfill state comprises:
adjusting the backfill scheduling time and/or the set queue depth in the current backfill scheduling parameters.
5. The method of claim 4, further comprising:
and if the number of the jobs in the current backfill scheduling parameter is smaller than a preset number threshold value or no job meets the backfill requirement, increasing the backfill scheduling time interval in the current backfill scheduling parameter.
6. A job scheduling method, comprising:
acquiring jobs submitted by user terminals in a job scheduling system;
adjusting the backfill scheduling parameter according to the backfill scheduling parameter determining method according to any one of claims 1-5, to obtain an adjusted backfill scheduling parameter;
and carrying out backfill scheduling on the jobs submitted by the user terminals according to the adjusted backfill scheduling parameters.
7. An apparatus for determining backfill scheduling parameters, the apparatus comprising:
the acquisition module is used for acquiring an output log of the job scheduling system in the backfill scheduling process;
the determining module is used for determining the backfill state of the backfill scheduling process according to the output log;
and the adjusting module is used for adjusting the backfill scheduling parameters according to the scheduling strategy corresponding to the backfill state.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 6 when executed by a processor.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210293938.1A CN114691328A (en) 2022-03-24 2022-03-24 Backfill scheduling parameter determination method, device, equipment, medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210293938.1A CN114691328A (en) 2022-03-24 2022-03-24 Backfill scheduling parameter determination method, device, equipment, medium and program product

Publications (1)

Publication Number Publication Date
CN114691328A true CN114691328A (en) 2022-07-01

Family

ID=82139304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210293938.1A Pending CN114691328A (en) 2022-03-24 2022-03-24 Backfill scheduling parameter determination method, device, equipment, medium and program product

Country Status (1)

Country Link
CN (1) CN114691328A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination