WO2024001851A1 - Resource scheduling method, apparatus and system - Google Patents

Resource scheduling method, apparatus and system

Publication number
WO2024001851A1
WO2024001851A1 PCT/CN2023/101172 CN2023101172W WO2024001851A1 WO 2024001851 A1 WO2024001851 A1 WO 2024001851A1 CN 2023101172 W CN2023101172 W CN 2023101172W WO 2024001851 A1 WO2024001851 A1 WO 2024001851A1
Authority
WO
WIPO (PCT)
Prior art keywords
job
resources
resource
loanable
type
Prior art date
Application number
PCT/CN2023/101172
Other languages
French (fr)
Chinese (zh)
Inventor
栾宏忠
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2024001851A1 publication Critical patent/WO2024001851A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the present application relates to the field of communication technology, and in particular, to a resource scheduling method, device and system.
  • a user can apply for computing resources from the cluster computing system to run a job submitted by the user, and the management node in the cluster computing system schedules computing resources for the job.
  • the computing resources of the cluster computing system are limited, and the management node uses reserved resources to allocate resources to jobs.
  • jobs are not continuously running, that is, the resources reserved for the job are often not in use all the time, which results in low resource utilization.
  • This application provides a resource scheduling method, device and system that can effectively improve resource utilization.
  • This application provides a resource scheduling method, which includes: a management node of a cluster computing system obtains a first job for which resources are to be scheduled; screens a job set with loanable resources in the cluster computing system, where the job set includes at least one job, and the at least one job is a job that has been allocated resources and whose loanable resources match the first job; and then determines the resource scheduling policy of the first job based on the loanable resources of the job set, where the resource scheduling policy indicates how the loanable resources are used to execute the first job during the resource borrowing stage.
  • The management node can determine the resource scheduling policy of the first job based on the job set whose loanable resources match the first job in the cluster computing system. In this way, the loanable resources of jobs that have already been allocated resources can be flexibly loaned to the first job, making full use of the resources of the cluster computing system and improving its resource utilization.
  • The above screening of the job set with loanable resources in the cluster computing system includes: selecting, from one or more jobs in the cluster computing system and according to the resource loan information of those jobs, a first job set that matches the job type of the first job; and selecting, from the first job set, a second job set whose loanable resources meet the resource requirements of the first job.
  • The resource loan information indicates the job types supported by a job's loanable resources when those resources are loaned to other jobs. The job types include at least one of a breakpoint-resume type, a termination type, or a continuous type. The resource requirements include resource type requirements and configuration requirements for those resource types, and the resource types include at least one of computing resources, storage resources, or network resources.
  • the above-mentioned determining the resource scheduling policy of the first job based on the loanable resources of the job set includes: determining the resource scheduling policy of the first job based on the loanable resources of the second job set.
  • The job set selected based on job type and resource requirements matches the first job; that is, the resources of the jobs in the job set are eligible to be loaned to the first job. Loaning the loanable resources of the job set to the first job allows the resources of the jobs in the job set to be fully utilized.
  • The loan duration of the loanable resources is less than or equal to the duration of the resource borrowing stage, so that the management node schedules resources for the first job based on the loan duration. Within the loan duration, the loanable resources can be loaned to the first job; once the loan duration expires, the resources can no longer be loaned to the first job.
  • The second job set includes a second job. When the job type of the first job is the breakpoint-resume type, determining the resource scheduling policy of the first job based on the loanable resources of the second job set includes: loaning the loanable resources of the second job to the first job during the resource borrowing stage of the second job; suspending the loan of the second job's loanable resources to the first job when the loan duration expires; and, when the second job ends, scheduling the loanable resources to the first job so that the first job continues running.
  • In this way, the resources of the second job can be loaned to the first job during the resource borrowing stage of the second job, which improves the utilization of the resources allocated to the second job. The first job is paused when the loan duration expires, so the running of the second job is not affected.
  • The second job set includes a second job. When the job type of the first job is the termination type, determining the resource scheduling policy of the first job based on the loanable resources of the second job set includes: loaning the loanable resources of the second job to the first job during the resource borrowing stage of the second job, and terminating the loan of the second job's loanable resources to the first job when the loan duration of the loanable resources expires.
  • In this way, the resources of the second job can be loaned to the first job during the resource borrowing stage, which improves the utilization of the resources allocated to the second job. The first job is terminated when the loan duration expires, so the running of the second job is not affected.
  • The second job set includes a second job. When the job type of the first job is the continuous type, determining the resource scheduling policy of the first job based on the loanable resources of the second job set includes: loaning the loanable resources of the second job to the first job during the resource borrowing stage of the second job, and continuing to loan the second job's resources to the first job when the loan duration of the loanable resources expires.
  • In this way, resources can be loaned to the first job during the resource borrowing stage of the second job, thereby improving the utilization of the resources allocated to the second job.
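  • The three policies described above differ only in what happens to the first job when the loan duration expires. The following is a minimal sketch of that decision, not the applicant's implementation; the enum, the function name and the returned action strings are hypothetical names introduced here for illustration.

```python
from enum import Enum


class JobType(Enum):
    # Job types named in the text; the enum itself is illustrative.
    RESUME = "breakpoint-resume"
    TERMINATION = "termination"
    CONTINUOUS = "continuous"


def action_on_loan_expiry(first_job_type: JobType) -> str:
    """Return what happens to the first job when the loan duration expires."""
    if first_job_type is JobType.RESUME:
        # The loan is suspended; the first job continues running on the
        # loanable resources once the second job has ended.
        return "suspend_until_second_job_ends"
    if first_job_type is JobType.TERMINATION:
        # The loan is terminated.
        return "terminate"
    if first_job_type is JobType.CONTINUOUS:
        # The second job's resources continue to be loaned to the first job.
        return "keep_running"
    raise ValueError(f"unknown job type: {first_job_type!r}")


for job_type in JobType:
    print(job_type.value, "->", action_on_loan_expiry(job_type))
```

  • Keeping the decision in one function makes the dependence on the first job's type explicit; a real scheduler would attach the actual suspend, terminate and reschedule operations to these outcomes.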
  • This application provides a management node, including an acquisition module, a first determination module, and a second determination module. The acquisition module is used to acquire a first job for which resources are to be scheduled. The first determination module is used to screen a job set with loanable resources in the cluster computing system, where the job set includes at least one job, and the at least one job is a job that has been allocated resources and whose loanable resources match the first job. The second determination module is used to determine the resource scheduling policy of the first job based on the loanable resources of the job set, where the resource scheduling policy indicates how the loanable resources are used to execute the first job during the resource borrowing stage.
  • The above first determination module is specifically configured to select, from one or more jobs in the cluster computing system and according to the resource loan information of those jobs, a first job set that matches the job type of the first job, and to select, from the first job set, a second job set whose loanable resources meet the resource requirements of the first job.
  • The resource loan information indicates the job types supported by a job's loanable resources when those resources are loaned to other jobs. The job types include at least one of a breakpoint-resume type, a termination type, or a continuous type. The resource requirements include resource type requirements and configuration requirements for those resource types, and the resource types include at least one of computing resources, storage resources, or network resources.
  • the above-mentioned second determination module is specifically configured to determine the resource scheduling policy of the first job according to the loanable resources of the second job set.
  • the loan duration of the loanable resource is less than or equal to the duration of the resource's loanable period.
  • The above second job set includes a second job. When the job type of the first job is the breakpoint-resume type, the second determination module is specifically configured to: loan the second job's loanable resources to the first job during the resource borrowing stage of the second job; suspend the loan of the second job's loanable resources to the first job when the loan duration expires; and, when the second job ends, schedule the loanable resources to the first job to continue running the first job.
  • The above second job set includes a second job. When the job type of the first job is the termination type, the second determination module is specifically configured to: loan the loanable resources of the second job to the first job during the resource borrowing stage of the second job, and terminate the loan of the second job's loanable resources to the first job when the loan duration of the loanable resources expires.
  • The above second job set includes a second job. When the job type of the first job is the continuous type, the second determination module is specifically configured to: loan the loanable resources of the second job to the first job during the resource borrowing stage of the second job, and continue to loan the resources of the second job to the first job when the loan duration of the loanable resources expires.
  • the present application provides a management node, including a memory and at least one processor connected to the memory.
  • the memory is used to store computer program code.
  • The computer program code includes computer instructions. When the computer instructions are executed by the at least one processor, the management node is caused to execute the method described in any one of the first aspect and its possible implementations.
  • the present application provides a computer-readable storage medium on which computer instructions are stored.
  • When the computer instructions are run on a computer, the method described in any one of the first aspect and its possible implementations is executed.
  • the present application provides a computer program product.
  • the computer program product contains computer instructions. When the computer instructions are run on a computer, the method described in any one of the first aspect and its possible implementations is executed.
  • this application provides a chip, including a memory and a processor.
  • the memory is used to store computer instructions.
  • The processor is used to call and run the computer instructions from the memory to execute the method described in any one of the first aspect and its possible implementations.
  • this application provides a cluster computing system, including a management node and at least one computing node.
  • The management node performs the method described in any one of the first aspect and its possible implementations.
  • Figure 1 is a schematic architectural diagram of a cluster computing system provided by this application.
  • Figure 2 is a schematic diagram of the stage division of a job provided by this application.
  • Figure 3 is a schematic diagram of the hardware structure of a management node provided by this application.
  • Figure 4 is a first schematic diagram of a resource scheduling method provided by this application.
  • Figure 5 is a first schematic diagram of a resource scheduling strategy provided by this application.
  • Figure 6 is a second schematic diagram of a resource scheduling strategy provided by this application.
  • Figure 7 is a third schematic diagram of a resource scheduling strategy provided by this application.
  • Figure 8 is a fourth schematic diagram of a resource scheduling strategy provided by this application.
  • Figure 9 is a first schematic structural diagram of a management node provided by this application.
  • Figure 10 is a second schematic structural diagram of a management node provided by this application.
  • this application provides a resource scheduling method that screens a set of jobs that can borrow resources in a cluster system, and then determines the resource scheduling strategy for the first job. In this way, the loanable resources of the job can be used to perform the processing of other jobs to achieve the purpose of making full use of resources.
  • the method provided by this application is applied to the cluster computing system.
  • the user can submit a job to the cluster computing system.
  • the management node in the cluster computing system allocates resources to the user's job and runs the job based on the allocated resources.
  • The application scenarios of cluster computing systems can include high performance computing (HPC), artificial intelligence (AI), and big data fusion.
  • FIG 1 is a schematic architectural diagram of a cluster computing system.
  • the cluster computing system includes a management node 101 and one or more computing nodes 102.
  • the management node 101 is mainly responsible for the management of the cluster computing system (including configuration management, resource management, etc.).
  • the management node 101 includes a scheduler.
  • The scheduler is used to allocate computing resources to jobs submitted by users and to schedule (or dispatch) the jobs to the corresponding computing node 102, so that the computing node 102 executes the job based on the allocated resources.
  • the process in which the above scheduler allocates resources to jobs and schedules jobs to corresponding computing nodes is resource scheduling.
  • The resource borrowing stage of a job refers to the stage in which the computing resources of the job are allowed to be borrowed. That is, during the resource borrowing stage of the job, the resources of the job (the resources allocated to the job) are allowed to be loaned to other jobs.
  • the resource unborrowable stage of a job refers to the stage when the computing resources of the job are not allowed to be borrowed. That is, during the resource unborrowable stage of the job, the resources of the job are not allowed to be loaned to other jobs.
  • The resource borrowing stage and the resource unborrowable stage of a job can be defined based on the resource utilization of the computing node while the job is running and/or the time period during which the job is run. Of course, the resource borrowing stage can also be defined based on other factors, which is not limited in the embodiments of this application.
  • While a job is running, the resource utilization of the computing node is not always at full utilization: in some stages the resource utilization is high, and in other stages it is low. Therefore, the job running stage in which the resource utilization is lower than the preset utilization can be regarded as the resource borrowable stage, and the stage in which the resource utilization is higher than the preset utilization can be regarded as the resource unborrowable stage.
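  • The threshold rule above can be sketched as follows. This is only an illustration: the stage names and utilization figures are made up, the 50% threshold is one example value, and treating a utilization exactly equal to the threshold as unborrowable is an assumption, since the text only covers the "lower than" and "higher than" cases.

```python
# Assumed example threshold; the preset utilization is configurable.
PRESET_UTILIZATION = 0.5


def classify_stages(stage_utilization, threshold=PRESET_UTILIZATION):
    """Label each stage as borrowable or unborrowable by its utilization."""
    return {
        name: "borrowable" if utilization < threshold else "unborrowable"
        for name, utilization in stage_utilization.items()
    }


# Hypothetical per-stage utilization of one job.
stages = {"pre-job": 0.10, "run": 0.85, "post-job": 0.05}
print(classify_stages(stages))
```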
  • Some jobs have not only a real computing stage (hereinafter referred to as the run stage) but also a job preparation stage (hereinafter referred to as the pre-job stage) and/or a job post-processing stage (hereinafter referred to as the post-job stage).
  • The pre-job stage occurs before the run stage, and the post-job stage occurs after the run stage.
  • In the pre-job stage, environment deployment and inspection for the job, or job data transmission (also called large file transfer), is performed. In the post-job stage, environment cleanup or job data deletion is performed.
  • For example, job 1 in Figure 2 includes a pre-job stage, a run stage and a post-job stage, in which environment deployment and inspection are performed in the pre-job stage and environment cleanup is performed in the post-job stage. Job 2 also includes a pre-job stage, a run stage and a post-job stage, in which job data transmission is performed in the pre-job stage and job data deletion is performed in the post-job stage.
  • The pre-job stage and the post-job stage usually consume only a small amount of resources, that is, their resource utilization is low. Therefore, in the embodiments of this application, the pre-job stage or the post-job stage of a job can be used as a resource borrowing stage. For example, job 1 in Figure 2 includes two resource borrowing stages, and job 2 also includes two resource borrowing stages.
  • In addition, the run stage of a job may contain one or more stages with low resource utilization, and these stages can also be used as resource borrowing stages.
  • other stages in the run stage are the stages where resources cannot be borrowed.
  • the idle phase of the job can be regarded as the resource-borrowable phase
  • the busy phase of the job can be regarded as the resource-unborrowable phase.
  • the device that executes the resource scheduling method is a management node of the cluster computing system.
  • The management node can be a desktop computer, a portable computer, a personal digital assistant (PDA), or another device.
  • The hardware structure of the management node provided by this application is described below with reference to Figure 3.
  • the various components shown in Figure 3 may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application specific integrated circuits.
  • the management node may include: a processor 301, a memory 302, and a communication interface 303.
  • the processor 301, the memory 302 and the communication interface 303 may be connected to each other through a bus 304, or in other ways.
  • the processor 301 is the control center of the communication device.
  • The processor 301 can be a general-purpose central processing unit (CPU) or another general-purpose processor.
  • The general-purpose processor can be a microprocessor or any conventional processor.
  • The processor 301 may include an application processor (AP), a graphics processing unit (GPU), an image signal processor (ISP), a controller, etc.
  • the controller in the processor 301 is the nerve center and command center of the communication device.
  • the controller can generate operation control signals based on the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • the processor 301 may also be provided with a memory for storing instructions and data.
  • the processor 301 may include one or more CPUs, such as CPU 0 and CPU 1 shown in FIG. 3 .
  • Memory 302 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, optical memory, disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the memory 302 may store computer instructions, which include instructions for executing the resource scheduling method provided by the present application.
  • the memory 302 may exist independently of the processor 301.
  • the memory 302 can be connected to the processor 301 through the bus 304 for storing data, instructions or program codes.
  • the processor 301 calls and executes instructions or program codes stored in the memory 302 to implement the function of a scheduler to schedule resources for jobs submitted by users.
  • the memory 302 can also be integrated with the processor 301 .
  • The communication interface 303 is used for the management node to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
  • The bus 304 may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, CXL, UB, etc.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in Figure 3, but it does not mean that there is only one bus or one type of bus.
  • It should be noted that the management node shown in Figure 3 is only an example of a management node. In practice, the management node may have more or fewer components than those shown in Figure 3, may combine two or more components, or may have a different configuration of components.
  • the resource scheduling method provided by the embodiment of the present application is described below with reference to the accompanying drawings.
  • the execution subject of the resource scheduling method is the management node of the cluster computing system.
  • the resource scheduling method provided by the embodiment of the present application may include the following steps:
  • the first job is a job currently submitted by the user, that is, the first job is a job waiting for the management node to schedule resources for it.
  • When submitting the first job to the management node, the user needs to indicate the resource requirements of the first job.
  • the resource requirements include resource type requirements and resource type configuration requirements.
  • the resource type may include at least one of computing resources, storage resources or network resources.
  • Computing resources include CPU or GPU, storage resources include memory and hard disk, and network resources include bandwidth.
  • If the resource type is computing resources, the corresponding resource configuration requirement may be the number of CPU and/or GPU cores; if the resource type is storage resources, the corresponding resource configuration requirement may be the size of memory and/or hard disk; if the resource type is network resources, the corresponding resource configuration requirement may be bandwidth. For example, when submitting the first job, the user may indicate that a two-core GPU must be scheduled when resources are allocated for the first job.
  • The job type includes at least one of a breakpoint-resume type, a termination type, or a continuous type.
  • Breakpoint-resume jobs refer to jobs that have checkpoints during the running process, that is, the intermediate calculation results need to be cached while the job is running.
  • For some jobs, intermediate calculation results need to be cached regularly (in other words, regular checkpoints are required).
  • For other jobs, the intermediate calculation results are not saved, and only the final calculation results are saved when the calculation process ends.
  • If the intermediate calculation results are not cached during the calculation process, the job needs to be restarted once those results are lost. Therefore, by establishing checkpoints, important intermediate calculation results can be stored in reliable storage space to ensure the smooth operation of the business.
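  • To make the checkpoint idea concrete, the sketch below caches the intermediate result of a toy computation after every step, so that a suspended run can resume from where it stopped instead of restarting. The function name, the JSON checkpoint file and the `stop_after` parameter (which simulates the job being suspended) are all hypothetical and chosen only for illustration.

```python
import json
import os
import tempfile


def run_with_checkpoints(total_steps, ckpt_path, stop_after=None):
    """Sum the integers 0..total_steps-1, checkpointing after every step."""
    state = {"next_step": 0, "partial_sum": 0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)  # resume from the last checkpoint
    steps_done = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and steps_done >= stop_after:
            return None  # simulates the job being suspended mid-run
        state["partial_sum"] += state["next_step"]
        state["next_step"] += 1
        steps_done += 1
        with open(ckpt_path, "w") as f:
            json.dump(state, f)  # cache the intermediate result
    return state["partial_sum"]


ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
first = run_with_checkpoints(10, ckpt, stop_after=4)  # suspended after 4 steps
second = run_with_checkpoints(10, ckpt)               # resumes at step 4
print(first, second)  # None 45
```

  • Because the second call picks up the cached state, no work from the first call is repeated; this is the property that lets a breakpoint-resume job be paused when a loan expires.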
  • A termination-type job refers to a job that can be terminated periodically. For example, for an application that saves its calculation results regularly, the application's job can be terminated after the results are saved, that is, the job is killed periodically.
  • Continuous jobs refer to jobs that are not allowed to be paused during the running process, that is, it is necessary to ensure that the job runs continuously until the end of the operation.
  • the parameters of the job include the above job type and the resource requirements of the job.
  • When the management node runs the command line, it can obtain the resource requirements and job type of the job.
  • the management node places the job into a job queue corresponding to the job type according to the job type.
  • The parameters of the job queues that borrow resources can be set in a text configuration file on the management node.
  • The queue parameters can include the queue name (QueueName) and the queue type (LoanType). Both the queue name and the queue type are divided according to the job type.
  • The queue name can be runlimitQueue, ckpntQueue, or waitQueue, where runlimitQueue represents the termination-type job queue, ckpntQueue represents the breakpoint-resume-type job queue, and waitQueue represents the continuous-type job queue.
  • The queue type can be runlimit, ckpnt, or wait, where the runlimit type corresponds to termination-type jobs, the ckpnt type corresponds to breakpoint-resume-type jobs, and the wait type corresponds to continuous-type jobs.
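  • The mapping from job type to queue described above can be sketched as a small lookup table. The QueueName and LoanType values are those named in the text; the dictionary layout, the job-type keys and the `enqueue` helper are assumptions made for illustration and do not reflect the actual configuration file format.

```python
from collections import defaultdict

# Queue name and queue type for each job type, as named in the text.
QUEUE_CONFIG = {
    "termination": {"QueueName": "runlimitQueue", "LoanType": "runlimit"},
    "resume": {"QueueName": "ckpntQueue", "LoanType": "ckpnt"},
    "continuous": {"QueueName": "waitQueue", "LoanType": "wait"},
}

# In-memory stand-in for the management node's job queues.
queues = defaultdict(list)


def enqueue(job_id, job_type):
    """Place a job into the queue that corresponds to its job type."""
    queue_name = QUEUE_CONFIG[job_type]["QueueName"]
    queues[queue_name].append(job_id)
    return queue_name


print(enqueue("job6", "resume"))       # ckpntQueue
print(enqueue("job7", "termination"))  # runlimitQueue
```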
  • S402. Screen the job set of loanable resources that match the first job in the cluster computing system.
  • the set of jobs that can be borrowed resources obtained through the above screening includes at least one job, and the at least one job is a job to which resources have been allocated.
  • the method for the management node to allocate resources to the one or more jobs is to allocate resources to the one or more jobs according to the resource requirements of the one or more jobs.
  • the above S402 can be implemented through S4021-S4022.
  • When submitting a job, the user also provides the resource loan information of the job, so that the management node can store it. The resource loan information of a job indicates the job types supported by the job's loanable resources when those resources are loaned to other jobs.
  • The job types supported by the loanable resources of at least one job can be understood as the types of jobs to which those loanable resources can be loaned.
  • The management node can learn which stage of the job is the resource borrowable stage and which stage is the resource unborrowable stage. For example, the resource borrowable stage of a job is a stage in which the resource utilization of the job is lower than the preset utilization.
  • The preset utilization can be determined according to the actual situation; for example, it can be set to 50%.
  • Some jobs may include multiple resource borrowing stages.
  • The loanable resources of the multiple resource borrowing stages of the same job may support the same job type or different job types, which is not limited in the embodiments of this application.
  • The following uses an example to illustrate how the first job set is obtained.
  • The jobs with allocated resources in the cluster computing system include job1, job2, job3, job4, and job5, and the first job is job6. The details of the jobs are shown in Table 1.
  • job6 is a breakpoint-resume-type job. The jobs selected from job1, job2, job3, job4, and job5 that match the job type of the first job (i.e., job6) are job1 and job5, so the first job set is represented as {job1, job5}.
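  • The job-type screening step can be sketched as a simple filter. Table 1 is not reproduced here, so the supported job types listed for each job below are invented for illustration; only the outcome, that job1 and job5 match a breakpoint-resume-type first job, comes from the text.

```python
# Job types supported by each job's loanable resources.
# Hypothetical entries: only the result {job1, job5} is taken from the text.
loan_info = {
    "job1": {"resume", "termination"},
    "job2": {"termination"},
    "job3": {"continuous"},
    "job4": {"termination", "continuous"},
    "job5": {"resume"},
}


def first_job_set(loan_info, first_job_type):
    """Keep the jobs whose loanable resources support the first job's type."""
    return [job for job, supported in loan_info.items()
            if first_job_type in supported]


print(first_job_set(loan_info, "resume"))  # ['job1', 'job5']
```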
  • The resource type includes at least one of computing resources, storage resources, or network resources. For details on computing resources, storage resources, and network resources, refer to the description of the above embodiments; they are not described again here.
  • the resource requirement of at least one job meeting the resource requirement of the first job means that the resource configuration of the resources corresponding to at least one job is higher than the resource configuration requirement of the first job.
  • The first job set includes two jobs, job1 and job5. Assume that the resources allocated for job1 are a 4-core CPU and 4M memory, the resources allocated for job5 are a 1-core CPU and 8M memory, and the resource requirement of the first job is 5M memory. The job selected from job1 and job5 that meets the resource requirement of the first job is then job5, so the second job set is expressed as {job5}.
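  • The resource screening step in this example can be sketched as follows, using the figures given above. The dictionary keys and the function name are hypothetical, and two simplifications are assumed: each job's whole allocation is treated as loanable, and "meets the requirement" is implemented as greater than or equal to.

```python
# Resources allocated to the jobs in the first job set (figures from the text).
allocated = {
    "job1": {"cpu_cores": 4, "memory_m": 4},
    "job5": {"cpu_cores": 1, "memory_m": 8},
}

# Resource requirement of the first job (from the text).
requirement = {"memory_m": 5}


def second_job_set(allocated, requirement):
    """Keep the jobs whose resources cover every required quantity."""
    return [
        job for job, resources in allocated.items()
        if all(resources.get(key, 0) >= needed
               for key, needed in requirement.items())
    ]


print(second_job_set(allocated, requirement))  # ['job5']
```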
  • In the embodiments of this application, the process of selecting the job set from one or more jobs with allocated resources in the cluster computing system is not limited to filtering based on both the resource requirements and the job types supported by the loanable resources.
  • The above S4021-S4022 screen out at least one job that matches the first job in both resource requirements and job type. That is to say, the resources of the at least one job meet the resource requirements of the first job, and the loanable resources of the at least one job support the job type of the first job.
  • Alternatively, screening the job set of loanable resources that match the first job in the cluster computing system may include: selecting, from one or more jobs that can loan resources in the cluster computing system, a job set that meets the resource requirements of the first job, or a job set that matches the job type of the first job.
  • S403. Determine the resource scheduling policy of the first job according to the loanable resources of the job set.
  • The resource scheduling policy of the first job indicates how the first job is executed using the loanable resources during the resource borrowing stage. It should be understood that the processing method differs according to the job type of the first job.
  • The job set includes at least one job, and determining the resource scheduling policy of the first job in S403 covers the following two cases:
  • Case 1: If both the resource requirements and the job types supported by the loanable resources of the at least one job match the first job, the resource scheduling policy of the first job is determined based on all of the loanable resources of the job set.
  • Case 2: If only the resource requirements or only the job types supported by the loanable resources of the at least one job match the first job, the resource scheduling policy of the first job is determined based on a subset of the loanable resources of the job set (that is, part of the loanable resources). For example, if the job types supported by the loanable resources of the at least one job match the first job, one or more jobs whose resources match the resource requirements of the first job must be further filtered out of the at least one job, and the resource scheduling policy of the first job is then determined based on the loanable resources of the one or more jobs. The subset of loanable resources mentioned above thus refers to the loanable resources of one or more jobs in the job set.
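  • The difference between Case 1 and Case 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the `Lender` record, the single memory figure, the `matched_on_both` flag and the jobs `jobA` and `jobB` are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lender:
    name: str
    supported_types: frozenset  # job types its loanable resources support
    memory_m: int               # loanable memory (one illustrative resource)


def usable_lenders(job_set, first_job_type, first_job_memory_m, matched_on_both):
    """Return the jobs whose loanable resources the policy is based on."""
    if matched_on_both:
        # Case 1: every job already matches on type and on resources,
        # so all loanable resources of the job set are used.
        return list(job_set)
    # Case 2: only one criterion was applied during screening, so both are
    # applied here; the result is a subset of the job set.
    return [j for j in job_set
            if first_job_type in j.supported_types
            and j.memory_m >= first_job_memory_m]


# Hypothetical job set that was screened by job type only.
job_set = [Lender("jobA", frozenset({"resume"}), 4),
           Lender("jobB", frozenset({"resume"}), 8)]
subset = usable_lenders(job_set, "resume", 5, matched_on_both=False)
print([j.name for j in subset])  # ['jobB']
```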
  • The user can also specify the loan duration of the job's loanable resources.
  • The loan duration of a job's loanable resources is the length of time for which the first job can use those resources after they are lent.
  • After the management node obtains the loan duration, it can schedule resources for the first job according to the loan duration. Note that once the loan duration expires, the job's resources can no longer be lent to the first job.
  • The loan duration of a job's loanable resources is less than or equal to the duration of the job's resource borrowing stage.
  • If the loan duration of the job's loanable resources is equal to the duration of the job's resource borrowing stage, the user does not need to specify the loan duration when submitting the job.
  • In that case, the default loan duration is the duration of the resource borrowing stage.
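  • The loan-duration rules above can be sketched as a small helper. The function name, the use of minutes and the rejection of non-positive values are assumptions made for illustration; the application only states the upper bound and the default.

```python
from typing import Optional


def effective_loan_duration(borrowing_stage_min: float,
                            requested_min: Optional[float] = None) -> float:
    """Return the loan duration, in minutes, of a job's loanable resources."""
    if requested_min is None:
        # Not specified by the user: default to the whole borrowing stage.
        return borrowing_stage_min
    if requested_min <= 0 or requested_min > borrowing_stage_min:
        # Assumed handling: reject durations outside (0, borrowing stage].
        raise ValueError("loan duration must be positive and must not "
                         "exceed the resource borrowing stage")
    return requested_min


print(effective_loan_duration(45))      # 45
print(effective_loan_duration(45, 30))  # 30
```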
  • The second job set includes a second job.
  • In this case, determining the resource scheduling policy of the first job based on the loanable resources of the second job set specifically means determining the resource scheduling policy of the first job based on the loanable resources of the second job.
  • The resource scheduling policy differs for first jobs of different job types.
  • The following takes the breakpoint-resume type, termination type and continuous type in the above embodiment as examples to introduce the scheduling policies for these three types of jobs respectively.
  • the jobs to which resources have been allocated in the cluster computing system include Job 1 to Job 7.
  • the details of the jobs are as shown in Table 2 below.
  • The first job, for which resources are to be scheduled, is denoted job 8. Assume that the job type of job 8 is the breakpoint-resume type.
  • S4021 is executed to filter jobs whose job type matches the first job, and the first job set is obtained as {job 1, job 3, job 6}. Assume that S4022 is then executed to filter jobs whose resource requirements match the first job.
  • The second job set is obtained as {job 1}.
  • For the running of job 1 and job 8, see (a) in Figure 5.
  • The second job is job 1.
  • The scheduler dispatches job 1 to a computing node. After job 1 enters the resource borrowing stage, the scheduler also dispatches job 8, submitted by the user, to the computing node, and instructs the computing node to use the resources of job 1 to run job 8 during the resource borrowing stage of job 1.
  • When the resource borrowing stage of job 1 expires, the management node notifies the computing node to stop running job 8 (that is, job 8 performs a checkpoint) and to start the computing phase of job 1. When the computing phase of job 1 ends, the computing node notifies the scheduler that job 1 has ended. The scheduler then notifies the computing node to continue running job 8, that is, to restart job 8.
  • In another example, S4021 is executed to filter jobs whose job type matches the first job, and the first job set is obtained as {job 1, job 3, job 6}.
  • S4022 is executed to filter jobs whose resource requirements match the first job.
  • The second job set is obtained as {job 6}, and the loan duration of the loanable resources of job 6 is 30 minutes (the resource borrowing stage of job 6 is longer than 30 minutes).
  • For the running of job 6 and job 8, see (b) in Figure 5. The computing node uses the resources of job 6 to run job 8 during the resource borrowing stage of job 6; when the loan duration of job 6 expires, it stops running job 8 (that is, job 8 performs a checkpoint); when job 6 ends, the computing node continues to run job 8, that is, restarts job 8.
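  • The breakpoint-resume behaviour described above can be sketched as a timeline calculation. Only the 30-minute loan duration comes from the job 6 example; the function name, the lender's end time and the amount of work are made-up values, and the sketch assumes the lender ends no earlier than the loan expires.

```python
def resume_type_timeline(loan_min, lender_end_min, work_min):
    """Return (start, end, label) segments for a breakpoint-resume first job.

    loan_min:       loan duration of the lender's loanable resources
    lender_end_min: time at which the lender job finishes running
    work_min:       total running time the first job needs
    All times are minutes measured from the start of the loan.
    """
    if work_min <= loan_min:
        return [(0, work_min, "run")]  # finishes within the loan
    remaining = work_min - loan_min
    return [
        (0, loan_min, "run"),                              # uses lent resources
        (loan_min, lender_end_min, "checkpointed"),        # lender computes
        (lender_end_min, lender_end_min + remaining, "resumed"),
    ]


# 30-minute loan as in the job 6 example; the other numbers are hypothetical.
timeline = resume_type_timeline(loan_min=30, lender_end_min=120, work_min=50)
for segment in timeline:
    print(segment)
```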
  • the second job set includes the second job, and the above S4031 specifically includes S4031b:
  • The first job, for which resources are to be scheduled, is denoted job 8. Assume that the job type of job 8 is the termination type.
  • S4021 is executed to filter jobs whose job type matches the first job, and the first job set is obtained as {job 2, job 5, job 7}.
  • S4022 is executed to filter jobs whose resource requirements match the first job.
  • The second job set is obtained as {job 2}.
  • For the running of job 2 and job 8, see (a) in Figure 6.
  • The computing node uses the resources of job 2 to run job 8 during the resource borrowing stage of job 2; when the resource borrowing stage of job 2 expires, it terminates job 8 (that is, job 8 is killed).
  • In another example, S4021 is executed to filter jobs whose job type matches the first job, and the first job set is obtained as {job 2, job 5, job 7}.
  • S4022 is executed to filter jobs whose resource requirements match the first job.
  • The second job set is obtained as {job 5}.
  • For the running of job 5 and job 8, see (b) in Figure 6.
  • The computing node uses the resources of job 5 to run job 8; when the resource borrowing stage of job 5 expires, it terminates job 8 (that is, job 8 is killed).
  • the second job set includes the second job, and the above S4031 specifically includes S4031c:
  • The first job, for which resources are to be scheduled, is denoted job 8.
  • Assume that the job type of job 8 is the continuous type.
  • S4021 is executed to filter jobs whose job type matches the first job, and the first job set is obtained as {job 4}.
  • S4022 is executed to determine that job 4 meets the resource requirements of the first job, so the second job set is obtained as {job 4}.
  • For the running of job 4 and job 8, see Figure 7.
  • The computing node uses the resources of job 4 to run job 8. When the resource borrowing stage of job 4 expires, job 4 is suspended and job 8 continues to run; when job 8 ends, job 4 continues to run. That is, job 4 must wait for job 8 to end before it can continue.
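  • The three job types differ only in what happens when the loan expires, which can be summarised in a small dispatch function. The enum, the function name and the action strings are hypothetical labels chosen for this sketch, not identifiers from the application.

```python
from enum import Enum


class JobType(Enum):
    RESUME = "breakpoint-resume"
    TERMINATE = "termination"
    CONTINUOUS = "continuous"


def on_loan_expiry(first_job_type: JobType) -> dict:
    """Describe what happens to borrower and lender when the loan expires."""
    if first_job_type is JobType.RESUME:
        # The borrower checkpoints and resumes after the lender finishes.
        return {"borrower": "checkpoint, resume after lender ends",
                "lender": "run"}
    if first_job_type is JobType.TERMINATE:
        # The borrower is killed and the lender proceeds.
        return {"borrower": "kill", "lender": "run"}
    # Continuous: the borrower keeps the resources and the lender waits.
    return {"borrower": "keep running",
            "lender": "suspend until borrower ends"}


for job_type in JobType:
    print(job_type.value, "->", on_loan_expiry(job_type))
```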
  • The job set matching the first job obtained by executing S4021-S4022 may include multiple jobs.
  • Each of the multiple jobs both meets the resource requirements of the first job and supports the job type of the first job.
  • The scheduler can select one of the multiple jobs as the target job (that is, the above-mentioned second job) and then lend the resources of the target job to the first job.
  • For example, the scheduler can select, on a first-in, first-out basis, the job that entered the job queue first as the target job; or the scheduler can randomly select one of the multiple jobs as the target job; or the scheduler can select the job with the highest priority as the target job based on the loan priorities of the multiple jobs.
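  • The three selection rules can be sketched as interchangeable functions. The `Candidate` record and its fields are hypothetical, and the convention that a larger number means a higher loan priority is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    queue_position: int  # smaller = entered the job queue earlier
    loan_priority: int   # larger = higher priority (assumed convention)


def pick_fifo(candidates):
    """First-in, first-out: the job that entered the queue first."""
    return min(candidates, key=lambda c: c.queue_position)


def pick_random(candidates, rng=random):
    """Uniformly random choice among the matching jobs."""
    return rng.choice(list(candidates))


def pick_by_priority(candidates):
    """The job with the highest loan priority."""
    return max(candidates, key=lambda c: c.loan_priority)


# Hypothetical candidates that all match the first job.
candidates = [Candidate("jobA", queue_position=2, loan_priority=1),
              Candidate("jobB", queue_position=1, loan_priority=3),
              Candidate("jobC", queue_position=3, loan_priority=5)]
print(pick_fifo(candidates).name)         # jobB
print(pick_by_priority(candidates).name)  # jobC
```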
  • During the resource borrowing stage of the second job, the resources of the second job can also be lent to one or more other jobs in addition to the first job.
  • In this way, the resources of the second job are fully utilized.
  • For example, the first job is a short job. Because a short job consumes very few resources, many resources remain during the resource borrowing stage of the second job, and the remaining resources can also be lent to other short jobs.
  • For example, if the second job is job 3 among the seven jobs mentioned above and the first job is job 8, then job 8, job 9 and job 10 can all be run during the resource borrowing stage of job 3.
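  • Lending one job's resources to several short jobs can be sketched as a greedy admission loop. The core counts below are invented for illustration, and greedy in-order admission is an assumed strategy; the application does not specify how the remaining resources are divided.

```python
def admit_short_jobs(loanable_cores, requests):
    """Greedily admit (name, cores) requests in order while capacity remains."""
    admitted, remaining = [], loanable_cores
    for name, cores in requests:
        if cores <= remaining:
            admitted.append(name)
            remaining -= cores
    return admitted, remaining


# Hypothetical figures: job 3 can lend 8 cores during its borrowing stage.
admitted, left = admit_short_jobs(8, [("job8", 2), ("job9", 3), ("job10", 2)])
print(admitted, left)  # ['job8', 'job9', 'job10'] 1
```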
  • In another case, the job set matching the first job obtained by executing S4021-S4022 includes multiple jobs, and the loanable resources of the multiple jobs support the job type of the first job. The resources of each single job cannot meet the resource requirements of the first job, but the sum of the resources of the multiple jobs can.
  • In this case, the multiple jobs are all used as target jobs (that is, the second job includes multiple jobs), and the scheduler lends the loanable resources of the multiple jobs to the first job.
  • The first job, for which resources are to be scheduled, is denoted job 8.
  • Assume that the job type of job 8 is the breakpoint-resume type. In one case, the job set matching the first job obtained by executing S4021-S4022 is {job 1, job 3}. Job 8 is then run using the resources of job 1 and job 3 during the resource borrowing stages of job 1 and job 3.
  • When the loan duration expires, job 8 is paused; when job 1 and job 3 have both finished running, job 8 continues to run.
  • In another case, the job set matching the first job obtained by executing S4021-S4022 is {job 2, job 7}. Job 8 is then run using the resources of job 2 and job 7 during the resource borrowing stages of job 2 and job 7. When the shorter of the loan durations of job 2 and job 7 expires, job 8 is terminated.
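  • Pooling the loanable resources of several jobs can be sketched as below. The tuple layout, the memory figures and the loan durations are hypothetical; the only rule taken from the text is that the shortest loan duration governs when the first job must stop.

```python
def combine_lenders(lenders, required_memory_m):
    """Pool several lenders' loanable memory for one first job.

    lenders: list of (name, loanable_memory_m, loan_minutes) tuples.
    Returns (names, usable_minutes) if the pooled memory meets the
    requirement, otherwise None.
    """
    total = sum(memory for _, memory, _ in lenders)
    if not lenders or total < required_memory_m:
        return None
    names = [name for name, _, _ in lenders]
    # The first job can only run until the shortest loan duration expires.
    usable_minutes = min(minutes for _, _, minutes in lenders)
    return names, usable_minutes


# Hypothetical figures: neither job alone covers 6M, but together they do.
print(combine_lenders([("job2", 4, 30), ("job7", 4, 45)], 6))
# (['job2', 'job7'], 30)
```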
  • The management node can screen, from the multiple jobs in the cluster computing system, at least one job that has been allocated resources, matches the first job and can lend resources, to obtain a job set, and then determine the resource scheduling policy of the first job based on the loanable resources of the job set.
  • In this way, the loanable resources of the job set to which resources have been allocated can be flexibly lent to the first job, which makes full use of the resources of the cluster computing system and improves its resource utilization.
  • An embodiment of the present application provides a management node.
  • The management node can be divided into functional modules according to the above method examples.
  • For example, each functional module can correspond to one function, or two or more functions can be integrated into one processing module.
  • The above integrated modules can be implemented in the form of hardware or in the form of software functional modules. It should be noted that the division of modules in the embodiments of the present application is schematic and is only a logical function division; in actual implementation, there may be other division methods.
  • FIG. 9 shows a possible structural diagram of the management node involved in the above embodiment.
  • the management node includes an acquisition module 901, a first determination module 902 and a second determination module 903.
  • The acquisition module 901 is used to acquire the first job for which resources are to be scheduled, for example, to execute S401 in the above method embodiment.
  • The first determination module 902 is configured to screen a job set that can lend resources in the cluster computing system.
  • The job set includes at least one job, and the at least one job is a job that has been allocated resources, matches the first job and can lend resources; for example, the module executes S402 in the above method embodiment.
  • The second determination module 903 is configured to determine the resource scheduling policy of the first job based on the loanable resources of the job set.
  • The resource scheduling policy indicates how the first job is executed using the loanable resources during the resource borrowing stage; for example, the module executes S403 in the above method embodiment.
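  • One way to picture the division into modules 901 to 903 is a class with one method per module. This is a sketch only: the class, the dictionary keys, the choice of the first matching job as lender and the sample jobs are all assumptions made for illustration.

```python
class ManagementNode:
    """Illustrative grouping of modules 901-903 as methods."""

    def __init__(self, allocated_jobs):
        # allocated_jobs: dicts with "name", "supported_types", "memory_m"
        self.allocated_jobs = list(allocated_jobs)

    def acquire(self, submission):
        """Acquisition module 901 (S401): take in the first job."""
        return dict(submission)

    def screen(self, first_job):
        """First determination module 902 (S402): matching lender jobs."""
        return [job for job in self.allocated_jobs
                if first_job["type"] in job["supported_types"]
                and job["memory_m"] >= first_job["memory_m"]]

    def determine_policy(self, first_job, job_set):
        """Second determination module 903 (S403): build a policy record."""
        if not job_set:
            return None
        return {"borrower": first_job["name"],
                "lender": job_set[0]["name"],  # assumed: first match lends
                "on_expiry": first_job["type"]}


# Hypothetical cluster state and submission.
node = ManagementNode([
    {"name": "jobA", "supported_types": {"resume"}, "memory_m": 4},
    {"name": "jobB", "supported_types": {"resume", "terminate"}, "memory_m": 8},
])
first_job = node.acquire({"name": "job8", "type": "resume", "memory_m": 5})
policy = node.determine_policy(first_job, node.screen(first_job))
print(policy)
```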
  • The management node in this embodiment of the present invention can be implemented by a central processing unit (CPU), an application-specific integrated circuit (ASIC), or a programmable logic device (PLD).
  • The above PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), a data processing unit (DPU), a system on chip (SoC), or any combination thereof.
  • The above-mentioned first determination module 902 is specifically configured to screen, according to the resource loan information of one or more jobs in the cluster computing system, a first job set that matches the job type of the first job from the one or more jobs, and to select from the first job set a second job set whose loanable resources meet the resource requirements of the first job.
  • The resource loan information indicates the job types supported by a job's loanable resources when those resources are lent to other jobs.
  • The job types include at least one of the breakpoint-resume type, the termination type or the continuous type; the resource requirements include requirements on resource types and configuration requirements for those resource types.
  • The resource types include at least one of computing resources, storage resources or network resources. For example, the first determination module 902 executes S4021-S4022 in the above method embodiment.
  • the above-mentioned second determination module 903 is specifically configured to determine the resource scheduling policy of the first job based on the loanable resources of the second job set, for example, executing S4031 in the above method embodiment.
  • the second job set includes a second job.
  • When the job type of the first job is the breakpoint-resume type, the second determination module 903 is specifically configured to: lend the loanable resources of the second job to the first job during the resource borrowing stage of the second job; suspend lending the loanable resources of the second job to the first job when the loan duration of the loanable resources expires; and, when the second job finishes running, schedule the loanable resources to the first job so that the first job continues to run, for example, to execute S4031a in the above method embodiment.
  • The above-mentioned second job set includes a second job.
  • When the job type of the first job is the termination type, the second determination module 903 is specifically configured to lend the loanable resources of the second job to the first job during the resource borrowing stage of the second job, and to terminate lending the loanable resources of the second job to the first job when the loan duration of the loanable resources expires, for example, to execute S4031b in the above method embodiment.
  • The above-mentioned second job set includes a second job.
  • When the job type of the first job is the continuous type, the second determination module 903 is specifically configured to lend the loanable resources of the second job to the first job during the resource borrowing stage of the second job, and to continue lending the resources of the second job to the first job when the loan duration of the loanable resources expires, for example, to execute S4031c in the above method embodiment.
  • Each module of the above-mentioned management node can also be used to perform other actions in the above-mentioned method embodiment. For all relevant content of the steps involved in the above-mentioned method embodiment, refer to the functional description of the corresponding functional module; details are not repeated here.
  • FIG. 10 shows another possible structural diagram of the management node involved in the above embodiment.
  • the management node provided by the embodiment of the present application may include: a processing module 1001 and a communication module 1002.
  • the processing module 1001 can be used to control and manage the actions of the management node.
  • For example, the processing module 1001 can be used to support the management node in performing S401, S402 (including S4021), S403 (including S4031, S4031a, S4031b and S4031c), and/or other processes for the techniques described herein.
  • the communication module 1002 may be used to support communication between the management node and other network entities, for example, to support communication between the management node and a computing node.
  • the management node may also include a storage module 1003 for storing computer instructions and data.
  • The processing module 1001 can be a processor or a controller (for example, the above-mentioned processor 301 shown in Figure 3). The processor can also be a combination that implements computing functions, such as a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
  • the communication module 1002 may be a communication interface (for example, it may be the above-mentioned communication interface 303 as shown in Figure 3).
  • The storage module 1003 may be a memory (for example, the above-mentioned memory 302 shown in Figure 3).
  • When the processing module 1001 is a processor, the communication module 1002 is a communication interface, and the storage module 1003 is a memory, the processor, the transceiver and the memory can be connected through a bus.
  • The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented using a software program, the embodiments may be implemented in whole or in part in the form of a computer program product.
  • The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (such as by infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • The available media may be magnetic media (e.g., floppy disks, magnetic disks, magnetic tapes), optical media (e.g., digital video discs (DVD)), or semiconductor media (e.g., solid state drives (SSD)).

Abstract

Provided are a resource scheduling method, apparatus and system, which relate to the technical field of communications and can increase the resource utilization rate. The resource scheduling method comprises: acquiring a first job for which resources are to be scheduled; screening a cluster computing system for a job set of loanable resources, wherein the job set comprises at least one job, and the at least one job is a job that has been allocated resources, matches the first job and can lend resources; and determining a resource scheduling policy for the first job according to the loanable resources of the job set, wherein the resource scheduling policy indicates a processing mode in which the first job is executed using the loanable resources of the job set during a resource borrowing stage.

Description

A resource scheduling method, apparatus and system
This application claims priority to the Chinese patent application No. 202210741724.6, filed with the China National Intellectual Property Administration on June 27, 2022 and entitled "A method of resource scheduling", and to the Chinese patent application No. 202211427018.0, filed with the China National Intellectual Property Administration on November 15, 2022 and entitled "A resource scheduling method, apparatus and system", both of which are incorporated herein by reference in their entirety.
Technical Field
The present application relates to the field of communication technology, and in particular to a resource scheduling method, apparatus and system.
Background
In a cluster computing scenario, a user can apply to the cluster computing system for computing resources to run a job submitted by the user, and the management node in the cluster computing system schedules computing resources for the job.
Usually, the computing resources of a cluster computing system are limited, and the management node allocates resources to jobs by reserving resources. However, a job is not continuously running, so the resources reserved for it are often not in use all the time, which results in low resource utilization.
Summary
The present application provides a resource scheduling method, apparatus and system that can effectively improve resource utilization.
To achieve the above purpose, the present application adopts the following technical solutions:
In a first aspect, the present application provides a resource scheduling method, including: a management node of a cluster computing system acquires a first job for which resources are to be scheduled; screens the cluster computing system for a job set of loanable resources, where the job set includes at least one job, and the at least one job is a job that has been allocated resources, matches the first job and can lend resources; and then determines a resource scheduling policy of the first job according to the loanable resources of the job set, where the resource scheduling policy indicates how the first job is executed using the loanable resources during the resource borrowing stage.
In the present application, for the first job for which resources are to be scheduled, the management node can determine the resource scheduling policy of the first job according to the job set of loanable resources in the cluster computing system that matches the first job. The loanable resources of the job set to which resources have been allocated can thus be flexibly lent to the first job, which makes full use of the resources of the cluster computing system and improves its resource utilization.
In a possible implementation, screening the cluster computing system for the job set of loanable resources includes: screening, according to resource loan information of one or more jobs in the cluster computing system, a first job set that matches the job type of the first job from the one or more jobs, and then screening, from the first job set, a second job set whose loanable resources meet the resource requirements of the first job. The resource loan information indicates the job types supported by a job's loanable resources when those resources are lent to other jobs, and the job types include at least one of the breakpoint-resume type, the termination type or the continuous type. The resource requirements include requirements on resource types and configuration requirements for those resource types, and the resource types include at least one of computing resources, storage resources or network resources.
In another possible implementation, determining the resource scheduling policy of the first job according to the loanable resources of the job set includes: determining the resource scheduling policy of the first job according to the loanable resources of the second job set.
In the present application, the job set screened according to job type and resource requirements (that is, the second job set) matches the first job; in other words, the resources of the jobs in the job set satisfy the conditions for being lent to the first job. On this basis, the loanable resources of the job set are lent to the first job, so that the resources of the jobs in the job set are fully utilized.
In another possible implementation, the loan duration of the loanable resources is less than or equal to the duration of the resource borrowing stage, so that the management node schedules resources for the first job according to the loan duration. Before the loan duration expires, the loanable resources are lent to the first job; once the loan duration expires, the resources can no longer be lent to the first job.
In another possible implementation, the second job set includes a second job, and when the job type of the first job is the breakpoint-resume type, determining the resource scheduling policy of the first job according to the loanable resources of the second job set includes: lending the loanable resources of the second job to the first job during the resource borrowing stage of the second job; suspending lending the loanable resources of the second job to the first job when the loan duration of the loanable resources expires; and, when the second job finishes running, scheduling the loanable resources to the first job so that the first job continues to run. In this way, resources can be lent to the first job during the resource borrowing stage of the second job, which improves the utilization of the resources allocated to the second job; and because the first job is paused when the loan duration expires, the running of the second job is not affected.
In another possible implementation, the second job set includes a second job, and when the job type of the first job is the termination type, determining the resource scheduling policy of the first job according to the loanable resources of the second job set includes: lending the loanable resources of the second job to the first job during the resource borrowing stage of the second job, and terminating lending the loanable resources of the second job to the first job when the loan duration of the loanable resources expires. In this way, resources can be lent to the first job during the resource borrowing stage of the second job, which improves the utilization of the resources allocated to the second job; and because the first job is terminated when the loan duration expires, the running of the second job is not affected.
In another possible implementation, the second job set includes a second job, and when the job type of the first job is the continuous type, determining the resource scheduling policy of the first job according to the loanable resources of the second job set includes: lending the loanable resources of the second job to the first job during the resource borrowing stage of the second job, and continuing to lend the resources of the second job to the first job when the loan duration of the loanable resources expires. In this way, resources can be lent to the first job during the resource borrowing stage of the second job, which improves the utilization of the resources allocated to the second job.
第二方面,本申请提供一种管理节点,包括:获取模块、第一确定模块以及第二确定模块,其中,获取模块用于获取待调度资源的第一作业;第一确定模块用于在集群计算系统中筛选可借调资源的作业集合,该作业集合包括至少一个作业,该至少一个作业为已分配资源,且与第一作业匹配的可借调资源的作业;第二确定模块用于根据作业集合的可借调资源确定第一作业的资源调度策略,该资源调度策略用于指示在资源可借阶段利用可借调资源执行第一作业的处理方式。In a second aspect, this application provides a management node, including: an acquisition module, a first determination module, and a second determination module, wherein the acquisition module is used to acquire the first job of resources to be scheduled; and the first determination module is used in the cluster. Screening a job set of loanable resources in the computing system, the job set includes at least one job, the at least one job is a job that has allocated resources and can be loaned to the resource that matches the first job; the second determination module is used to select the job set based on the job set. The loanable resources determine the resource scheduling policy of the first job, and the resource scheduling policy is used to indicate the processing method of using the loanable resources to execute the first job during the resource borrowing stage.
一种可能的实现方式中,上述第一确定模块具体用于根据集群计算系统中的一个或多个作业的资源借调信息,从一个或多个作业中筛选与第一作业的作业类型匹配的第一作业集合,并且从第一作业集合中筛选可借调资源满足第一作业的资源需求的第二作业集合。其中,资源借调信息用于指示将作业的可借调资源借调至其他作业时该可借调资源支持的作业类型,作业类型包括:断点续传型、终止型或连续型中的至少一种;资源需求包括资源类型的需求以及资源类型的配置需求,该资源类型包括计算资源、存储资源或网络资源中的至少一种。In a possible implementation manner, the above-mentioned first determination module is specifically configured to filter the first job that matches the job type of the first job from one or more jobs according to the resource secondment information of one or more jobs in the cluster computing system. A job set is provided, and a second job set that can borrow resources to meet the resource requirements of the first job is selected from the first job set. The resource loan information is used to indicate the job types supported by the loanable resources of the job when the job's loanable resources are seconded to other jobs. The job types include: at least one of breakpoint resume type, termination type or continuous type; resource The requirements include requirements of resource types and configuration requirements of the resource types, and the resource types include at least one of computing resources, storage resources, or network resources.
In another possible implementation, the second determination module is specifically configured to determine the resource scheduling policy of the first job according to the loanable resources of the second job set.
In another possible implementation, the loan duration of the loanable resources is less than or equal to the duration of the resource-loanable phase.
In another possible implementation, the second job set includes a second job. When the job type of the first job is the breakpoint-resume type, the second determination module is specifically configured to: lend the loanable resources of the second job to the first job during the resource-loanable phase of the second job; suspend the lending when the loan duration of the loanable resources expires; and, when the second job finishes running, schedule the loanable resources to the first job so that the first job continues to run.
In another possible implementation, the second job set includes a second job. When the job type of the first job is the termination type, the second determination module is specifically configured to: lend the loanable resources of the second job to the first job during the resource-loanable phase of the second job; and terminate the lending when the loan duration of the loanable resources expires.
In another possible implementation, the second job set includes a second job. When the job type of the first job is the continuous type, the second determination module is specifically configured to: lend the loanable resources of the second job to the first job during the resource-loanable phase of the second job; and continue to lend the resources of the second job to the first job when the loan duration of the loanable resources expires.
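The three type-dependent behaviors above can be summarized in a minimal sketch. The enum, the function name `on_loan_expiry`, and the returned action strings are hypothetical names chosen for illustration; they are not defined by this application.

```python
from enum import Enum


class JobType(Enum):
    BREAKPOINT_RESUME = "breakpoint_resume"
    TERMINATION = "termination"
    CONTINUOUS = "continuous"


def on_loan_expiry(first_job_type: JobType) -> str:
    """Return the (hypothetical) action taken when the loan duration expires."""
    if first_job_type is JobType.BREAKPOINT_RESUME:
        # Lending is suspended; the first job continues on these resources
        # once the second (lending) job has finished running.
        return "suspend_then_resume_after_lender_finishes"
    if first_job_type is JobType.TERMINATION:
        # Lending to the first job is terminated.
        return "terminate_loan"
    if first_job_type is JobType.CONTINUOUS:
        # The first job must not be paused, so lending continues.
        return "continue_loan"
    raise ValueError(f"unknown job type: {first_job_type}")
```

Only the decision at expiry is modeled here; how the resources are actually reclaimed or reassigned is outside the scope of this sketch.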
In a third aspect, this application provides a management node, including a memory and at least one processor connected to the memory. The memory is configured to store computer program code, and the computer program code includes computer instructions. When the computer instructions are executed by the at least one processor, the management node is caused to execute the method according to the first aspect or any one of its possible implementations.
In a fourth aspect, this application provides a computer-readable storage medium storing computer instructions. When the computer instructions run on a computer, the method according to the first aspect or any one of its possible implementations is executed.
In a fifth aspect, this application provides a computer program product containing computer instructions. When the computer instructions run on a computer, the method according to the first aspect or any one of its possible implementations is executed.
In a sixth aspect, this application provides a chip, including a memory and a processor. The memory is configured to store computer instructions, and the processor is configured to call the computer instructions from the memory and run them, so as to execute the method according to the first aspect or any one of its possible implementations.
In a seventh aspect, this application provides a cluster computing system, including a management node and at least one computing node, where the management node executes the method according to the first aspect or any one of its possible implementations.
It should be understood that, for the beneficial effects achieved by the technical solutions of the second to seventh aspects of this application and their corresponding possible implementations, reference may be made to the technical effects of the first aspect and its corresponding possible implementations described above. Details are not repeated here.
Description of Drawings
Figure 1 is a schematic architectural diagram of a cluster computing system provided by this application;
Figure 2 is a schematic diagram of the phase division of a job provided by this application;
Figure 3 is a schematic diagram of the hardware structure of a management node provided by this application;
Figure 4 is a first schematic diagram of a resource scheduling method provided by this application;
Figure 5 is a first schematic diagram of a resource scheduling policy provided by this application;
Figure 6 is a second schematic diagram of a resource scheduling policy provided by this application;
Figure 7 is a third schematic diagram of a resource scheduling policy provided by this application;
Figure 8 is a fourth schematic diagram of a resource scheduling policy provided by this application;
Figure 9 is a first schematic structural diagram of a management node provided by this application;
Figure 10 is a second schematic structural diagram of a management node provided by this application.
Detailed Description
To solve the problem of low resource utilization in the prior art, this application provides a resource scheduling method that screens a job set of loanable resources in a cluster computing system and then determines a resource scheduling policy of a first job. The loanable resources of a job are thereby used to process other jobs, so that resources are fully utilized.
The resource scheduling method provided by this application is described in detail below with reference to the accompanying drawings.
The method provided by this application is applied to a cluster computing system. A user may submit a job to the cluster computing system, and a management node in the cluster computing system allocates resources to the user's job and runs the job based on the allocated resources. Application scenarios of the cluster computing system may include high performance computing (HPC), artificial intelligence (AI), big data convergence, and the like.
Figure 1 is a schematic architectural diagram of a cluster computing system. Referring to Figure 1, the cluster computing system includes a management node 101 and one or more computing nodes 102. The management node 101 is mainly responsible for managing the cluster computing system (including configuration management, resource management, and the like). The management node 101 includes a scheduler, which allocates computing resources to jobs submitted by users and schedules (or dispatches) the jobs to the corresponding computing nodes 102, so that the computing nodes 102 process the jobs based on the allocated resources. The process in which the scheduler allocates resources to a job and schedules the job to the corresponding computing node is referred to as resource scheduling.
The resource-loanable phase of a job is a phase in which the computing resources of the job are allowed to be lent out. That is, during the resource-loanable phase of a job, the resources of the job (meaning the resources allocated to the job) may be lent to other jobs.
The resource-non-loanable phase of a job is a phase in which the computing resources of the job are not allowed to be lent out. That is, during the resource-non-loanable phase of a job, the resources of the job may not be lent to other jobs.
The resource-loanable phase and the resource-non-loanable phase of a job may be defined according to the resource utilization of the computing node while it runs the job and/or the time period in which the job runs. The resource-loanable phase may also be defined according to other factors, which is not limited in the embodiments of this application.
In one implementation, while a job submitted by a user runs on a computing node, the resources of the computing node are not fully utilized throughout: the resource utilization of the computing node is high in some phases and low in others. Therefore, a phase of the job in which the resource utilization is lower than a preset utilization may be used as a resource-loanable phase, and a phase in which the resource utilization is higher than the preset utilization may be used as a resource-non-loanable phase.
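As a minimal sketch of this threshold rule, the function below marks each phase as loanable or not. The function name, the phase names, and the utilization figures are hypothetical; the text does not say how a utilization exactly equal to the preset value is treated, so this sketch assumes it is non-loanable.

```python
def classify_phases(utilization, preset=0.5):
    """Map each phase name to True if it is resource-loanable.

    A phase is loanable when its utilization is strictly below the preset
    utilization; equality is treated as non-loanable (an assumption).
    """
    return {name: value < preset for name, value in utilization.items()}


# Hypothetical per-phase utilization of one job (fractions of full load).
observed = {"pre-job": 0.10, "run": 0.90, "post-job": 0.05}
phases = classify_phases(observed)
```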
While running, some jobs have, in addition to the actual computation phase (hereinafter the run phase), a job preparation phase (hereinafter the pre-job phase) and/or a job post-processing phase (hereinafter the post-job phase). The pre-job phase occurs before the run phase, and the post-job phase occurs after the run phase. In the pre-job phase, environment deployment and checking for the job, or job data transfer (also called large file transfer), is performed. In the post-job phase, environment cleanup or job data deletion is performed.
For example, referring to Figure 2, job 1 includes a pre-job phase, a run phase, and a post-job phase, where environment deployment and checking are performed in the pre-job phase and environment cleanup is performed in the post-job phase. Job 2 also includes a pre-job phase, a run phase, and a post-job phase, where job data transfer is performed in the pre-job phase and job data deletion is performed in the post-job phase. The pre-job phase and the post-job phase typically consume only a small amount of resources, that is, their resource utilization is low. Therefore, in the embodiments of this application, the pre-job phase or the post-job phase of a job may be used as a resource-loanable phase. For example, job 1 in Figure 2 includes two resource-loanable phases, and job 2 also includes two resource-loanable phases.
In addition, the run phase of a job may itself contain one or more sub-phases with low resource utilization, and these sub-phases may also be used as resource-loanable phases. The remaining sub-phases of the run phase are resource-non-loanable phases.
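The phase division of Figure 2 can be written down as plain data. The sketch below is illustrative only: the `Phase` class and its field names are hypothetical, and the whole run phase is marked non-loanable for simplicity, even though the text allows low-utilization sub-phases of the run phase to be loanable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str       # "pre-job", "run", or "post-job"
    activity: str   # what the job does in this phase
    loanable: bool  # whether the job's resources may be lent out


job1 = [
    Phase("pre-job", "environment deployment and checking", True),
    Phase("run", "computation", False),
    Phase("post-job", "environment cleanup", True),
]
job2 = [
    Phase("pre-job", "job data transfer", True),
    Phase("run", "computation", False),
    Phase("post-job", "job data deletion", True),
]


def loanable_phases(phases):
    """Return the names of the resource-loanable phases, in order."""
    return [p.name for p in phases if p.loanable]
```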
In another implementation, one or more time periods during the running of a job are idle periods of the job. Therefore, the idle phases of the job may be used as resource-loanable phases, and the busy phases of the job may be used as resource-non-loanable phases.
In the embodiments of this application, the apparatus that executes the resource scheduling method is the management node of the cluster computing system. The management node may be a desktop computer, a portable computer, a personal digital assistant (PDA), or a similar device. The hardware structure of the management node provided by this application is described below with reference to Figure 3. The components shown in Figure 3 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application-specific integrated circuits.
As shown in Figure 3, the management node may include a processor 301, a memory 302, and a communication interface 303. The processor 301, the memory 302, and the communication interface 303 may be connected through a bus 304 or connected to each other in other ways.
The processor 301 is the control center of the communication device. The processor 301 may be a general-purpose central processing unit (CPU) or another general-purpose processor, where the general-purpose processor may be a microprocessor or any conventional processor. For example, the processor 301 may include an application processor (AP), a graphics processing unit (GPU), an image signal processor (ISP), a controller, and the like.
The controller in the processor 301 is the nerve center and command center of the communication device. The controller can generate operation control signals according to instruction operation codes and timing signals, to control instruction fetching and instruction execution. Optionally, a memory may also be provided in the processor 301 to store instructions and data.
For example, the processor 301 may include one or more CPUs, such as CPU 0 and CPU 1 shown in Figure 3.
The memory 302 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, optical memory, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. In the embodiments of this application, the memory 302 may store computer instructions, including instructions for executing the resource scheduling method provided by this application.
In a possible implementation, the memory 302 may exist independently of the processor 301. The memory 302 may be connected to the processor 301 through the bus 304 and is used to store data, instructions, or program code. By calling and executing the instructions or program code stored in the memory 302, the processor 301 can implement the function of the scheduler, so as to perform resource scheduling for jobs submitted by users.
In another possible implementation, the memory 302 may also be integrated with the processor 301.
The communication interface 303 is used by the management node to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The bus 304 may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, CXL, UB, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in Figure 3, but this does not mean that there is only one bus or one type of bus.
It should be noted that the management node shown in Figure 3 is only one example of a management node. The management node may have more or fewer components than those shown in Figure 3, may combine two or more components, or may have a different component configuration.
The resource scheduling method provided by the embodiments of this application is described below with reference to the accompanying drawings. The method is executed by the management node of the cluster computing system.
On the basis of the content described in the above embodiments, as shown in Figure 4, the resource scheduling method provided by the embodiments of this application may include the following steps:
S401. Acquire a first job for which resources are to be scheduled.
The first job is a job currently submitted by a user, that is, a job waiting for the management node to schedule resources for it.
In the embodiments of this application, when submitting the first job, the user needs to indicate the resource requirement of the first job. The resource requirement includes a resource type requirement and a configuration requirement for the resource type. After obtaining the resource requirement of the first job, the management node allocates suitable resources to the first job according to that requirement when scheduling resources for it.
The resource type may include at least one of computing resources, storage resources, or network resources. Computing resources include CPUs or GPUs, storage resources include memory and hard disks, and network resources include bandwidth. For example, if the resource type is computing resources, the corresponding configuration requirement may be the number of CPU and/or GPU cores; if the resource type is storage resources, the corresponding configuration requirement may be the size of the memory and/or hard disk; if the resource type is network resources, the corresponding configuration requirement may be the bandwidth. For example, when submitting the first job, the user may indicate that a GPU with two cores must be scheduled for the first job.
When submitting the first job, the user also needs to carry the job type of the first job, so that the management node in the cluster computing system can learn the job type of the first job. In the embodiments of this application, the job types include at least one of a breakpoint-resume type, a termination type, or a continuous type.
A breakpoint-resume job is a job that has checkpoints while it runs, that is, intermediate computation results need to be cached while the job runs. Jobs of some applications, such as AI training jobs, need to cache intermediate computation results periodically (in other words, they need periodic checkpoints). Usually, intermediate results are not saved while a job runs, and only the final result is saved when the computation ends. However, for a job with a long computation flow (such as an AI training job), if intermediate results are not cached during the computation, the job has to be run again from the beginning once the intermediate results are lost. Establishing checkpoints therefore stores the more important intermediate results in reliable storage, ensuring that the service runs smoothly.
A termination job is a job that can be terminated periodically. For example, for some applications that save their computation results periodically, a job of the application can be terminated after saving its results, that is, the job is periodically killed.
A continuous job is a job that is not allowed to be paused while it runs, that is, the job must run continuously until it finishes.
The user submits a job through a command line that carries the parameters of the job, including the job type and the resource requirement of the job. After running the command line, the management node can obtain the resource requirement and the job type of the job.
After the user submits a job that borrows resources (for example, the first job described above), the management node places the job into the job queue corresponding to its job type. The parameters of the job queues for jobs that borrow resources can be set in a text configuration file on the management node. For example, the queue parameters may include a queue name (QueueName) and a queue type (LoanType), both of which are divided according to job type. For example, the queue name may be runlimitQueue, ckpntQueue, or waitQueue, where runlimitQueue denotes the queue of termination jobs, ckpntQueue denotes the queue of breakpoint-resume jobs, and waitQueue denotes the queue of continuous jobs. The queue type is runlimit, ckpnt, or wait, where runlimit corresponds to termination jobs, ckpnt corresponds to breakpoint-resume jobs, and wait corresponds to continuous jobs.
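The queue assignment described above can be sketched as a simple lookup. The queue names and queue types are taken from the text; the `enqueue` function and the in-memory dictionary are hypothetical and stand in for whatever the management node actually reads from its text configuration file, whose format is not given here.

```python
from collections import defaultdict

# Queue type (LoanType) -> queue name (QueueName), as listed in the text.
QUEUE_BY_LOAN_TYPE = {
    "runlimit": "runlimitQueue",  # termination jobs
    "ckpnt": "ckpntQueue",        # breakpoint-resume jobs
    "wait": "waitQueue",          # continuous jobs
}


def enqueue(queues, job_name, loan_type):
    """Append the job to the queue for its type and return the queue name."""
    if loan_type not in QUEUE_BY_LOAN_TYPE:
        raise ValueError(f"unknown queue type: {loan_type}")
    queue_name = QUEUE_BY_LOAN_TYPE[loan_type]
    queues[queue_name].append(job_name)
    return queue_name


queues = defaultdict(list)
assigned = enqueue(queues, "job6", "ckpnt")
```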
S402. Screen, in the cluster computing system, a job set of loanable resources that matches the first job.
The job set of loanable resources obtained by the screening includes at least one job, and the at least one job is a job that has been allocated resources.
When the user submits the first job, one or more jobs that have been allocated resources already exist in the cluster computing system. It should be understood that the management node allocated resources to the one or more jobs according to their resource requirements.
In a possible implementation, as shown in Figure 4, S402 may be implemented through S4021-S4022.
S4021. According to the resource loan information of one or more jobs with loanable resources in the cluster computing system, screen, from the one or more jobs, a first job set that matches the job type of the first job.
It should be understood that, for any one of the one or more jobs, the user also carries the resource loan information of the job when submitting it, so that the management node can store that information. The resource loan information of a job indicates the job types supported by the job's loanable resources when they are lent to other jobs. The job types supported by the loanable resources of at least one job can be understood as the types of jobs to which those loanable resources can be lent.
In addition, when submitting a job, the user also needs to carry information indicating the phases of the job and information indicating whether the resources of each phase can be lent. In this way, the management node can learn which phases of the job are resource-loanable phases and which are resource-non-loanable phases. For example, the resource-loanable phase of a job is a phase in which the resource utilization of the job is lower than a preset utilization; the preset utilization may be determined according to the actual situation, for example, set to 50%. For a description of the resource-loanable phase, refer to the related description in the above embodiments; details are not repeated here.
Optionally, among the one or more jobs that have been allocated resources, some jobs may include multiple resource-loanable phases. The job types supported by the loanable resources in the multiple resource-loanable phases of the same job may be the same or different, which is not limited in the embodiments of this application.
The following example illustrates how the first job set is obtained. Assume that the jobs that have been allocated resources in the cluster computing system include job1, job2, job3, job4, and job5, and that the first job is denoted job6. The details of the jobs are shown in Table 1.
Table 1
If job6 is a breakpoint-resume job, then by executing S4021, the jobs selected from job1, job2, job3, job4, and job5 that match the job type of the first job (that is, job6) are job1 and job5. The first job set is therefore {job1, job5}.
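The type-based screening of S4021 can be sketched as a set-membership filter. The contents of Table 1 are not reproduced in this text, so the supported-type entries below are hypothetical values chosen only so that the result agrees with the stated outcome {job1, job5}; the function and variable names are likewise illustrative.

```python
# Hypothetical resource loan information: job -> job types that the job's
# loanable resources support. These values are NOT taken from Table 1.
loan_info = {
    "job1": {"breakpoint_resume", "termination"},
    "job2": {"termination"},
    "job3": {"continuous"},
    "job4": {"termination", "continuous"},
    "job5": {"breakpoint_resume"},
}


def filter_by_job_type(loan_info, first_job_type):
    """S4021 sketch: keep jobs whose loanable resources support the type."""
    return sorted(
        name for name, supported in loan_info.items()
        if first_job_type in supported
    )


first_job_set = filter_by_job_type(loan_info, "breakpoint_resume")
```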
S4022. Screen, from the first job set, a second job set whose loanable resources meet the resource requirement of the first job.
The resource type includes at least one of computing resources, storage resources, or network resources. For details about computing resources, storage resources, and network resources, refer to the description in the above embodiments; details are not repeated here.
That the resources of at least one job meet the resource requirement of the first job means that the resource configuration of the resources corresponding to the at least one job is higher than the resource configuration required by the first job. Continuing with the first job set in S4021 as an example, the first job set includes two jobs, job1 and job5. Assume that the resources allocated to job1 are a 4-core CPU and 4M of memory, the resources allocated to job5 are a 1-core CPU and 8M of memory, and the resource requirement of the first job is 5M of memory. Then the job selected from job1 and job5 that meets the resource requirement of the first job is job5, and the second job set is therefore {job5}.
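The resource-based screening of S4022 can be sketched with the figures from this example. The field names `cpu_cores` and `memory_m` and both function names are hypothetical, and the comparison uses "greater than or equal to" as an assumption, since the text only says the configuration must be higher than what the first job requires.

```python
# Resources allocated to each candidate job, using the example's figures.
allocated = {
    "job1": {"cpu_cores": 4, "memory_m": 4},
    "job5": {"cpu_cores": 1, "memory_m": 8},
}


def satisfies(available, required):
    """True if every required quantity is covered (>= is an assumption)."""
    return all(available.get(key, 0) >= amount
               for key, amount in required.items())


def filter_by_resources(candidates, allocated, required):
    """S4022 sketch: keep candidates whose resources meet the requirement."""
    return [name for name in candidates
            if satisfies(allocated[name], required)]


second_job_set = filter_by_resources(["job1", "job5"], allocated,
                                     {"memory_m": 5})
```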
Optionally, in the embodiments of this application, when the job set is screened from the one or more jobs that have been allocated resources in the cluster computing system, the order of screening by resource requirement and by the job types supported by the loanable resources is not limited. That is, as in S4021-S4022, the jobs whose loanable resources support the job type of the first job may be screened first, and the jobs that match the resource requirement may then be screened from them to obtain the job set. Alternatively, the jobs that match the resource requirement may be screened first, and the jobs whose loanable resources support the job type of the first job may then be screened from them to obtain the job set. This is not limited in the embodiments of this application.
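Because both screening conditions are per-job tests, applying them in either order gives the same job set. The sketch below illustrates this; job1 and job5 use the memory figures from the example above, while job2, the supported-type sets, and all function names are hypothetical.

```python
jobs = [
    {"name": "job1", "types": {"breakpoint_resume"}, "memory_m": 4},
    {"name": "job2", "types": {"termination"}, "memory_m": 16},  # hypothetical
    {"name": "job5", "types": {"breakpoint_resume"}, "memory_m": 8},
]


def type_matches(job):
    """The job's loanable resources support the first job's type."""
    return "breakpoint_resume" in job["types"]


def resources_suffice(job):
    """The job's resources cover the first job's 5M memory requirement."""
    return job["memory_m"] >= 5


def apply_filters(jobs, filters):
    """Apply the filters one after another and return the surviving names."""
    selected = list(jobs)
    for keep in filters:
        selected = [job for job in selected if keep(job)]
    return [job["name"] for job in selected]


type_first = apply_filters(jobs, [type_matches, resources_suffice])
resource_first = apply_filters(jobs, [resources_suffice, type_matches])
```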
S4021-S4022 above select, according to job type, at least one job that matches the resource requirement of the first job. In other words, the resource requirement of the at least one job satisfies the resource requirement of the first job, and the loanable resources of the at least one job support the job type of the first job.
In another possible implementation, screening, in the cluster computing system, the job set of loanable resources that matches the first job may include: from the one or more jobs in the cluster computing system that can lend resources, selecting a job set that satisfies the resource requirement of the first job, or selecting a job set that matches the job type of the first job.
S403. Determine the resource scheduling policy of the first job according to the loanable resources of the job set.
The resource scheduling policy of the first job indicates how the first job is processed when it is executed with the loanable resources during the resource-loanable phase. It should be understood that the first job is processed differently depending on its job type.
With reference to S402, the job set includes at least one job, and determining the resource scheduling policy of the first job in S403 covers the following two cases:
Case 1: If both the resource requirement of the at least one job and the job type supported by its loanable resources match the first job, the resource scheduling policy of the first job is determined based on all of the loanable resources of the job set.
Case 2: If either the resource requirement of the at least one job or the job type supported by its loanable resources matches the first job, the resource scheduling policy of the first job is determined based on a subset of the loanable resources of the job set (a subset of the loanable resources means some of the loanable resources). For example, if the job type supported by the loanable resources of the at least one job matches the first job, one or more jobs whose resource requirements match that of the first job must further be selected from the at least one job, and the resource scheduling policy of the first job is then determined based on the loanable resources of those one or more jobs. That is, the subset of loanable resources mentioned above refers to the loanable resources of one or more jobs in the job set.
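The two cases can be sketched as follows. This is a hedged illustration only: the function name, the `matched_on` flag, and the dictionary layout are hypothetical. The text gives an example only for a set matched on job type, so the symmetric handling of a set matched on resource requirement is an assumption.

```python
def jobs_backing_the_policy(job_set, matched_on, first_job_type, required_memory_m):
    """Return the jobs whose loanable resources the scheduling policy is based on.

    matched_on says how job_set was selected:
      "both"      -> case 1: type and resource requirement already match
      "type"      -> case 2: only the job type was checked
      "resources" -> case 2: only the resource requirement was checked (assumed)
    """
    if matched_on == "both":
        return list(job_set)  # case 1: use all loanable resources of the set
    if matched_on == "type":
        # case 2: narrow the set by the resource requirement
        return [j for j in job_set if j["memory_m"] >= required_memory_m]
    if matched_on == "resources":
        # case 2: narrow the set by the supported job type
        return [j for j in job_set if first_job_type in j["types"]]
    raise ValueError(f"unknown matched_on value: {matched_on!r}")


job_set = [
    {"name": "job1", "types": {"breakpoint_resume"}, "memory_m": 4},
    {"name": "job5", "types": {"breakpoint_resume"}, "memory_m": 8},
]

case1 = jobs_backing_the_policy(job_set, "both", "breakpoint_resume", 5)
case2 = jobs_backing_the_policy(job_set, "type", "breakpoint_resume", 5)
print([j["name"] for j in case1])  # ['job1', 'job5']
print([j["name"] for j in case2])  # ['job5']
```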
In a possible implementation, for the one or more jobs with allocated resources, the user may also specify a loan duration for a job's loanable resources. The loan duration is the length of time for which the first job may use the loanable resources when they are lent to the first job. After obtaining the loan duration, the management node can schedule resources for the first job accordingly. Note that once the loan duration expires, the job's resources can no longer be lent to the first job.
It should be noted that the loan duration of a job's loanable resources is less than or equal to the duration of that job's resource-loanable phase. When the two are equal, the user need not specify a loan duration when submitting the job; in that case, the loan duration defaults to the duration of the resource-loanable phase.
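The default and the upper bound on the loan duration can be expressed in a few lines of Python. The function name and the use of minutes as the unit are assumptions made for this sketch, and rejecting an over-long duration with an exception is one possible design, not something the application prescribes.

```python
def effective_loan_duration(phase_minutes, requested_minutes=None):
    """Return the loan duration that applies to a lending job.

    phase_minutes:     length of the job's resource-loanable phase
    requested_minutes: loan duration given by the user, or None if omitted
    """
    if requested_minutes is None:
        # No duration specified: default to the whole resource-loanable phase.
        return phase_minutes
    if requested_minutes > phase_minutes:
        raise ValueError("loan duration cannot exceed the resource-loanable phase")
    return requested_minutes


print(effective_loan_duration(45))      # 45
print(effective_loan_duration(45, 30))  # 30
```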
Based on S4021-S4022 above and the related description, and still referring to Figure 4, S403 above may correspondingly be implemented through S4031.
S4031. Determine the resource scheduling policy of the first job according to the loanable resources of the second job set.
In this embodiment of the present application, the second job set includes a second job. Determining the resource scheduling policy of the first job according to the loanable resources of the second job set therefore specifically means determining it according to the loanable resources of the second job.
The resource scheduling policy differs for different types of jobs. Taking the breakpoint-resume type, the termination type, and the continuous type from the above embodiments as examples, the scheduling policies for these three types of jobs are described below.
When the job type of the first job is the breakpoint-resume type and the second job set includes the second job, S4031 above specifically includes S4031a:
S4031a. During the resource-loanable phase of the second job, lend the loanable resources of the second job to the first job; when the loan duration of the loanable resources expires, suspend lending the loanable resources of the second job to the first job; and when the second job finishes running, schedule the loanable resources to the first job so that the first job continues to run.
For example, assume that the jobs with allocated resources in the cluster computing system include job 1 to job 7, as detailed in Table 2 below.
Table 2
The first job, for which resources are to be scheduled, is denoted job 8. Assume that the job type of job 8 is the breakpoint-resume type.
In one case, S4021 above is performed to select the jobs whose job type matches the first job, giving the first job set {job 1, job 3, job 6}, and S4022 is then assumed to select the jobs whose resource requirements match the first job, giving the second job set {job 1}. The running of job 1 and job 8 is shown in (a) of Figure 5, where the second job is job 1. The scheduler dispatches job 1 to a computing node. After job 1 enters its resource-loanable phase, the scheduler also dispatches job 8, submitted by the user, to that computing node and instructs the computing node to run job 8 with the resources of job 1 during job 1's resource-loanable phase. When the resource-loanable phase of job 1 expires, the management node notifies the computing node to stop running job 8 (that is, job 8 takes a checkpoint) and to start the computation phase of job 1. When the computation phase of job 1 finishes, the computing node notifies the scheduler that job 1 has finished, and the scheduler then notifies the computing node to continue running job 8, that is, to restart job 8.
In another case, S4021 above is performed to select the jobs whose job type matches the first job, giving the first job set {job 1, job 3, job 6}, and S4022 is then assumed to select the jobs whose resource requirements match the first job, giving the second job set {job 6}. The loan duration of job 6's loanable resources is 30 minutes (the resource-loanable phase of job 6 is longer than 30 minutes). The running of job 6 and job 8 is shown in (b) of Figure 5. During the resource-loanable phase of job 6, the computing node runs job 8 with the resources of job 6. When the loan duration of job 6 expires, the computing node stops running job 8 (that is, job 8 takes a checkpoint). When job 6 finishes running, the computing node continues running job 8, that is, restarts job 8.
When the job type of the first job is the termination type and the second job set includes the second job, S4031 above specifically includes S4031b:
S4031b. During the resource-loanable phase of the second job, lend the loanable resources of the second job to the first job; when the loan duration of the loanable resources expires, terminate lending the loanable resources of the second job to the first job.
Still using job 1 to job 7 from S4031a as an example, the first job, for which resources are to be scheduled, is denoted job 8. Assume that the job type of job 8 is the termination type.
In one case, S4021 above is performed to select the jobs whose job type matches the first job, giving the first job set {job 2, job 5, job 7}, and S4022 is then assumed to select the jobs whose resource requirements match the first job, giving the second job set {job 2}. The running of job 2 and job 8 is shown in (a) of Figure 6. During the resource-loanable phase of job 2, the computing node runs job 8 with the resources of job 2. When the resource-loanable phase of job 2 expires, job 8 is terminated (that is, job 8 is killed).
In another case, S4021 above is performed to select the jobs whose job type matches the first job, giving the first job set {job 2, job 5, job 7}, and S4022 is then assumed to select the jobs whose resource requirements match the first job, giving the second job set {job 5}. The running of job 5 and job 8 is shown in (b) of Figure 6. During the resource-loanable phase of job 5, the computing node runs job 8 with the resources of job 5. When the resource-loanable phase of job 5 expires, job 8 is terminated (that is, job 8 is killed).
When the job type of the first job is the continuous type and the second job set includes the second job, S4031 above specifically includes S4031c:
S4031c. During the resource-loanable phase of the second job, lend the loanable resources of the second job to the first job; when the loan duration of the loanable resources expires, continue lending the resources of the second job to the first job.
Still using job 1 to job 7 from S4031a as an example, the first job, for which resources are to be scheduled, is denoted job 8. Assume that the job type of job 8 is the continuous type. In one case, S4021 above is performed to select the jobs whose job type matches the first job, giving the first job set {job 4}, and S4022 determines that job 4 satisfies the resource requirement of the first job, so the second job set is {job 4}. The running of job 4 and job 8 is shown in Figure 7. During the resource-loanable phase of job 4, the computing node runs job 8 with the resources of job 4. When the resource-loanable phase of job 4 expires, job 4 is suspended and job 8 continues to run. When job 8 finishes running, job 4 resumes; that is, job 4 must wait for job 8 to finish before it can continue.
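The three policies S4031a, S4031b, and S4031c differ only in what happens to the first job when the loan expires. A minimal Python sketch of that dispatch is shown below. The enum, the action strings, and the function name are hypothetical and chosen for illustration; a real scheduler would trigger checkpoint, kill, or suspend operations on the computing node instead of returning strings.

```python
from enum import Enum


class JobType(Enum):
    BREAKPOINT_RESUME = "breakpoint_resume"
    TERMINATION = "termination"
    CONTINUOUS = "continuous"


# What happens to the first (borrowing) job when the loan expires.
ON_LOAN_EXPIRY = {
    # S4031a: checkpoint and pause; resume once the second job has finished.
    JobType.BREAKPOINT_RESUME: "pause_first_job",
    # S4031b: the first job is killed.
    JobType.TERMINATION: "kill_first_job",
    # S4031c: the first job keeps the resources; the second job waits.
    JobType.CONTINUOUS: "keep_running_first_job",
}


def action_on_loan_expiry(first_job_type):
    """Return the action applied to the first job when the loan expires."""
    return ON_LOAN_EXPIRY[first_job_type]


def second_job_must_wait(first_job_type):
    """Only a continuous first job makes the lending job wait for it."""
    return first_job_type is JobType.CONTINUOUS


print(action_on_loan_expiry(JobType.BREAKPOINT_RESUME))  # pause_first_job
print(second_job_must_wait(JobType.CONTINUOUS))          # True
```

Keeping the mapping in a table rather than in branching code makes it straightforward to add further job types later, which the text leaves open.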
In a possible implementation, the job set matching the first job obtained through S4021-S4022 above may include multiple jobs, each of which both matches the resource requirement of the first job and supports the job type of the first job. In this case, the scheduler may select one of the multiple jobs as the target job (that is, the second job described above) and then lend the resources of the target job to the first job.
Optionally, the scheduler may follow the first-in, first-out principle and select, from the multiple jobs, the job that entered the job queue first as the target job. Alternatively, the scheduler may randomly select one of the multiple jobs as the target job and lend the resources of that target job to the first job. As a further alternative, the scheduler may select the job with the highest priority as the target job according to the loan priorities of the multiple jobs.
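The three selection strategies can be sketched in Python as follows. The `Lender` class and its fields are hypothetical, the sample values are invented, and treating a larger `loan_priority` number as a higher priority is an assumed convention.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Lender:
    """Hypothetical candidate job that matches the first job."""
    name: str
    queue_position: int  # lower value = entered the job queue earlier
    loan_priority: int   # higher value = preferred lender (assumed convention)


def pick_fifo(jobs):
    """First-in, first-out: the job that entered the queue first."""
    return min(jobs, key=lambda j: j.queue_position)


def pick_random(jobs, rng=random):
    """Random choice; pass a seeded random.Random for reproducible runs."""
    return rng.choice(jobs)


def pick_by_priority(jobs):
    """The job with the highest loan priority."""
    return max(jobs, key=lambda j: j.loan_priority)


lenders = [
    Lender("job1", queue_position=2, loan_priority=1),
    Lender("job3", queue_position=0, loan_priority=5),
    Lender("job6", queue_position=1, loan_priority=9),
]

print(pick_fifo(lenders).name)         # job3
print(pick_by_priority(lenders).name)  # job6
```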
In another possible implementation, during the resource-loanable phase of the second job, its resources may be lent not only to the first job but also to one or more other jobs, so that the resources of the second job are fully utilized. For example, the first job is a short job. Because a short job consumes few resources, many resources remain available during the resource-loanable phase of the second job, and the remaining resources can also be lent to other short jobs. For example, as shown in Figure 8, assume that the second job is job 3 among the seven jobs above and the first job is job 8. During the resource-loanable phase of job 3, job 8, job 9, and job 10 can be run.
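One way to picture lending one job's resources to several short jobs is a simple first-fit pass over the waiting jobs. The sketch below is an assumption: the application does not specify how the remaining resources are divided, and the function name, the core counts, and the extra job `job11` are invented for illustration.

```python
def admit_borrowers(loanable_cores, borrowers):
    """Admit waiting jobs, in queue order, while loanable cores remain.

    borrowers: list of (name, cores_needed) tuples in queue order.
    Returns the admitted job names and the cores left over.
    """
    admitted = []
    remaining = loanable_cores
    for name, cores_needed in borrowers:
        if cores_needed <= remaining:
            admitted.append(name)
            remaining -= cores_needed
    return admitted, remaining


# Assume job 3 can lend 8 cores during its resource-loanable phase.
admitted, remaining = admit_borrowers(
    8, [("job8", 2), ("job9", 3), ("job10", 2), ("job11", 4)]
)
print(admitted)   # ['job8', 'job9', 'job10']
print(remaining)  # 1
```

With these invented numbers, job 8, job 9, and job 10 all run on the resources of job 3, while the hypothetical job11 does not fit in the single core that remains.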
In yet another possible implementation, the job set matching the first job obtained through S4021-S4022 above includes multiple jobs that support the job type of the first job. The resources of each individual job cannot satisfy the resource requirement of the first job, but the combined resources of the multiple jobs can. In this case, all of the multiple jobs are used as target jobs (that is, the second job includes multiple jobs), and the scheduler lends the loanable resources of the multiple jobs to the first job.
For example, still using job 1 to job 7 from S4031a, the first job, for which resources are to be scheduled, is denoted job 8. If the job type of job 8 is the breakpoint-resume type, then in one case the job set matching the first job obtained through S4021-S4022 is {job 1, job 3}. During the resource-loanable phases of job 1 and job 3, job 8 runs with the resources of job 1 and job 3. When the shortest of the loan durations of job 1 and job 3 expires, job 8 is paused. When both job 1 and job 3 have finished running, job 8 continues to run.
If the job type of job 8 is the termination type, then in one case the job set matching the first job obtained through S4021-S4022 is {job 2, job 7}. During the resource-loanable phases of job 2 and job 7, job 8 runs with the resources of job 2 and job 7. When the shortest of the loan durations of job 2 and job 7 expires, job 8 is terminated.
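Pooling several lenders can be sketched as follows. The function name, the tuple layout, and the sample numbers are assumptions made for illustration. The sketch checks that the combined resources cover the requirement and reports the shortest loan duration, which is the point at which the first job is paused or terminated in the examples above.

```python
def pool_lenders(lenders, required_memory_m):
    """Combine the loanable resources of several jobs for one first job.

    lenders: list of (name, loanable_memory_m, loan_minutes) tuples.
    Returns None if the pooled resources do not cover the requirement.
    """
    if not lenders:
        return None
    total_memory_m = sum(memory for _, memory, _ in lenders)
    if total_memory_m < required_memory_m:
        return None
    return {
        "lenders": [name for name, _, _ in lenders],
        "total_memory_m": total_memory_m,
        # The loan as a whole ends when the shortest individual loan expires.
        "expires_after_minutes": min(minutes for _, _, minutes in lenders),
    }


# Neither job alone offers 6M, but together job 1 and job 3 do.
plan = pool_lenders([("job1", 4, 60), ("job3", 4, 30)], required_memory_m=6)
print(plan["lenders"])                # ['job1', 'job3']
print(plan["expires_after_minutes"])  # 30
```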
The above embodiments illustrate resource scheduling through only some examples. Further cases are possible in a specific implementation and are not listed one by one in the embodiments of the present application.
In summary, in the resource scheduling method provided in the embodiments of the present application, for the first job for which resources are to be scheduled, the management node can select, from the multiple jobs in the cluster computing system, at least one job that has been allocated resources, has loanable resources, and matches the first job, to obtain a job set. The management node then determines the resource scheduling policy of the first job according to the loanable resources of that job set. In this way, the loanable resources of a job set that already holds resources can be flexibly lent to the first job, which makes full use of the resources of the cluster computing system and improves its resource utilization.
Correspondingly, an embodiment of the present application provides a management node. In the embodiments of the present application, the management node may be divided into functional modules according to the above method examples. For example, one functional module may be defined for each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of the present application is illustrative and is merely a division by logical function; other divisions are possible in an actual implementation.
Where one functional module is defined for each function, Figure 9 shows a possible schematic structural diagram of the management node in the above embodiments. As shown in Figure 9, the management node includes an acquisition module 901, a first determination module 902, and a second determination module 903. The acquisition module 901 is used to obtain the first job for which resources are to be scheduled, for example by performing S401 in the above method embodiment. The first determination module 902 is used to screen, in the cluster computing system, a job set of loanable resources, where the job set includes at least one job, and the at least one job is a job that has been allocated resources, has loanable resources, and matches the first job, for example by performing S402 in the above method embodiment. The second determination module 903 is used to determine the resource scheduling policy of the first job according to the loanable resources of the job set, where the resource scheduling policy indicates how the first job is processed when it is executed with the loanable resources during the resource-loanable phase, for example by performing S403 in the above method embodiment.
It should be understood that the management node in the embodiments of the present application may be implemented by a central processing unit (CPU), by an application-specific integrated circuit (ASIC), or by a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), a data processing unit (DPU), a system on chip (SoC), or any combination thereof. When the resource scheduling method shown in Figure 4 is implemented in software, the management node and its modules may also be software modules.
Optionally, the first determination module 902 is specifically configured to select, according to resource loan information of one or more jobs in the cluster computing system, a first job set matching the job type of the first job from the one or more jobs, and to select, from the first job set, a second job set whose loanable resources satisfy the resource requirement of the first job, for example by performing S4021-S4022 in the above method embodiment. The resource loan information indicates the job types that a job's loanable resources support when they are lent to other jobs, and the job types include at least one of the breakpoint-resume type, the termination type, or the continuous type. The resource requirement includes a requirement for a resource type and a configuration requirement for that resource type, and the resource type includes at least one of computing resources, storage resources, or network resources.
Optionally, the second determination module 903 is specifically configured to determine the resource scheduling policy of the first job according to the loanable resources of the second job set, for example by performing S4031 in the above method embodiment.
Optionally, the second job set includes a second job. When the job type of the first job is the breakpoint-resume type, the second determination module 903 is specifically configured to: during the resource-loanable phase of the second job, lend the loanable resources of the second job to the first job; when the loan duration of the loanable resources expires, suspend lending the loanable resources of the second job to the first job; and when the second job finishes running, schedule the loanable resources to the first job so that the first job continues to run, for example by performing S4031a in the above method embodiment.
Optionally, the second job set includes a second job. When the job type of the first job is the termination type, the second determination module 903 is specifically configured to: during the resource-loanable phase of the second job, lend the loanable resources of the second job to the first job; and when the loan duration of the loanable resources expires, terminate lending the loanable resources of the second job to the first job, for example by performing S4031b in the above method embodiment.
Optionally, the second job set includes a second job. When the job type of the first job is the continuous type, the second determination module 903 is specifically configured to: during the resource-loanable phase of the second job, lend the loanable resources of the second job to the first job; and when the loan duration of the loanable resources expires, continue lending the resources of the second job to the first job, for example by performing S4031c in the above method embodiment.
The modules of the management node may also be used to perform the other actions in the above method embodiments. All related content of the steps in the above method embodiments may be cited in the functional descriptions of the corresponding functional modules and is not repeated here.
Where an integrated unit is used, Figure 10 shows another possible schematic structural diagram of the management node in the above embodiments. As shown in Figure 10, the management node provided in the embodiments of the present application may include a processing module 1001 and a communication module 1002. The processing module 1001 may be used to control and manage the actions of the management node. For example, the processing module 1001 may support the management node in performing S401, S402 (including S4021), and S403 (including S4031, S4031a, S4031b, and S4031c) in the above method embodiments, and/or other processes of the techniques described herein. The communication module 1002 may be used to support communication between the management node and other network entities, for example between the management node and a computing node. Optionally, as shown in Figure 10, the management node may further include a storage module 1003 for storing computer instructions and data.
The processing module 1001 may be a processor or a controller (for example, the processor 301 shown in Figure 3). The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 1002 may be a communication interface (for example, the communication interface 303 shown in Figure 3). The storage module 1003 may be a memory (for example, the memory 302 shown in Figure 3).
When the processing module 1001 is a processor, the communication module 1002 is a communication interface, and the storage module 1003 is a memory, the processor, the transceiver, and the memory may be connected through a bus.
For more details on how the modules of the management node implement the above functions, refer to the descriptions in the preceding method embodiments, which are not repeated here.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, wireless, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a magnetic disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state drive (SSD)).
The above are merely specific implementations of the present application, and the protection scope of the present application is not limited thereto. Any variation or replacement within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

  1. A resource scheduling method, characterized in that the method is applied to a management node in a cluster computing system and comprises:
    obtaining a first job for which resources are to be scheduled;
    screening, in the cluster computing system, a job set of loanable resources, wherein the job set comprises at least one job, and the at least one job is a job that has been allocated resources, has loanable resources, and matches the first job; and
    determining a resource scheduling policy of the first job according to the loanable resources of the job set, wherein the resource scheduling policy indicates how the first job is processed when it is executed with the loanable resources during a resource-loanable phase.
  2. The method according to claim 1, characterized in that the screening, in the cluster computing system, of a job set of loanable resources comprises:
    screening, from one or more jobs in the cluster computing system and according to resource loan information of the one or more jobs, a first job set matching the job type of the first job; wherein the resource loan information indicates the job types supported by a job's loanable resources when those resources are lent to other jobs, and the job types include at least one of a breakpoint-resume type, a termination type, or a continuous type;
    screening, from the first job set, a second job set whose loanable resources satisfy the resource requirements of the first job; wherein the resource requirements include a requirement on resource type and a configuration requirement for that resource type, and the resource type includes at least one of computing resources, storage resources, or network resources.
  3. The method according to claim 2, characterized in that the determining of the resource scheduling policy for the first job according to the loanable resources of the job set comprises:
    determining the resource scheduling policy for the first job according to the loanable resources of the second job set.
  4. The method according to any one of claims 1 to 3, characterized in that
    the loan duration of the loanable resources is less than or equal to the duration of the resource-loanable phase.
  5. The method according to claim 3 or 4, characterized in that the second job set includes a second job, and when the job type of the first job is the breakpoint-resume type, the determining of the resource scheduling policy for the first job according to the loanable resources of the second job set comprises:
    during the resource-loanable phase of the second job, lending the loanable resources of the second job to the first job, and when the loan duration of the loanable resources expires, suspending the lending of the loanable resources of the second job to the first job;
    when the second job finishes running, scheduling the loanable resources to the first job so as to continue running the first job.
  6. The method according to claim 3 or 4, characterized in that the second job set includes a second job, and when the job type of the first job is the termination type, the determining of the resource scheduling policy for the first job according to the loanable resources of the second job set comprises:
    during the resource-loanable phase of the second job, lending the loanable resources of the second job to the first job, and when the loan duration of the loanable resources expires, terminating the lending of the loanable resources of the second job to the first job.
  7. The method according to claim 3 or 4, characterized in that the second job set includes a second job, and when the job type of the first job is the continuous type, the determining of the resource scheduling policy for the first job according to the loanable resources of the second job set comprises:
    during the resource-loanable phase of the second job, lending the loanable resources of the second job to the first job, and when the loan duration of the loanable resources expires, continuing to lend the resources of the second job to the first job.
  8. A management node, characterized by comprising:
    an obtaining module, configured to obtain a first job for which resources are to be scheduled;
    a first determination module, configured to screen, in a cluster computing system, a job set of loanable resources, wherein the job set includes at least one job, and the at least one job is a job that has been allocated resources and has loanable resources matching the first job;
    a second determination module, configured to determine a resource scheduling policy for the first job according to the loanable resources of the job set, wherein the resource scheduling policy indicates how the first job is to be processed using the loanable resources during a resource-loanable phase.
  9. The management node according to claim 8, characterized in that
    the first determination module is specifically configured to screen, from one or more jobs in the cluster computing system and according to resource loan information of the one or more jobs, a first job set matching the job type of the first job; wherein the resource loan information indicates the job types supported by a job's loanable resources when those resources are lent to other jobs, and the job types include at least one of a breakpoint-resume type, a termination type, or a continuous type; and to screen, from the first job set, a second job set whose loanable resources satisfy the resource requirements of the first job; wherein the resource requirements include a requirement on resource type and a configuration requirement for that resource type, and the resource type includes at least one of computing resources, storage resources, or network resources.
  10. The management node according to claim 8 or 9, characterized in that
    the second determination module is specifically configured to determine the resource scheduling policy for the first job according to the loanable resources of the second job set.
  11. The management node according to any one of claims 8 to 10, characterized in that
    the loan duration of the loanable resources is less than or equal to the duration of the resource-loanable phase.
  12. The management node according to claim 10 or 11, characterized in that the second job set includes a second job, and when the job type of the first job is the breakpoint-resume type,
    the second determination module is specifically configured to: during the resource-loanable phase of the second job, lend the loanable resources of the second job to the first job, and when the loan duration of the loanable resources expires, suspend the lending of the loanable resources of the second job to the first job; and when the second job finishes running, schedule the loanable resources to the first job so as to continue running the first job.
  13. The management node according to claim 10 or 11, characterized in that the second job set includes a second job, and when the job type of the first job is the termination type,
    the second determination module is specifically configured to: during the resource-loanable phase of the second job, lend the loanable resources of the second job to the first job, and when the loan duration of the loanable resources expires, terminate the lending of the loanable resources of the second job to the first job.
  14. The management node according to claim 10 or 11, characterized in that the second job set includes a second job, and when the job type of the first job is the continuous type,
    the second determination module is specifically configured to: during the resource-loanable phase of the second job, lend the loanable resources of the second job to the first job, and when the loan duration of the loanable resources expires, continue to lend the resources of the second job to the first job.
  15. A management node, characterized by comprising a memory and at least one processor connected to the memory, wherein the memory is configured to store computer program code, the computer program code includes computer instructions, and when the computer instructions are executed by the at least one processor, the management node is caused to perform the method according to any one of claims 1 to 7.
  16. A cluster computing system, characterized by comprising a management node and at least one computing node, wherein the management node performs the method according to any one of claims 1 to 7.
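
The two-stage screening of claim 2 and the per-type handling of claims 4 to 7 can be sketched in Python as follows. This is a minimal illustration under assumptions, not the applicant's implementation: the names `RunningJob`, `PendingJob`, `screen_lenders`, and `make_policy`, the dictionary-based resource model, and the policy fields are all hypothetical and do not appear in the application.

```python
from dataclasses import dataclass
from enum import Enum


class JobType(Enum):
    BREAKPOINT_RESUME = "breakpoint_resume"
    TERMINATION = "termination"
    CONTINUOUS = "continuous"


@dataclass
class RunningJob:
    """A job that already holds resources and can lend some (hypothetical model)."""
    name: str
    supported_types: set      # job types its loanable resources can serve
    loanable: dict            # resource type -> loanable amount, e.g. {"cpu": 8}
    loanable_phase_s: float   # length of the resource-loanable phase, in seconds


@dataclass
class PendingJob:
    """The first job, still waiting for resources (hypothetical model)."""
    name: str
    job_type: JobType
    demand: dict              # resource type -> required amount


def screen_lenders(first, running):
    """Stage 1 filters by job type, stage 2 by resource demand (claim 2)."""
    first_set = [j for j in running if first.job_type in j.supported_types]
    second_set = [
        j for j in first_set
        if all(j.loanable.get(r, 0) >= need for r, need in first.demand.items())
    ]
    return second_set


# What happens to the loan when its duration expires (claims 5 to 7).
ON_EXPIRY = {
    JobType.BREAKPOINT_RESUME: "suspend",
    JobType.TERMINATION: "terminate",
    JobType.CONTINUOUS: "continue",
}


def make_policy(first, lender, loan_duration_s):
    """Build a policy record; the field names are assumptions for illustration."""
    # Claim 4: the loan duration may not exceed the resource-loanable phase.
    if loan_duration_s > lender.loanable_phase_s:
        raise ValueError("loan duration exceeds the resource-loanable phase")
    return {
        "borrower": first.name,
        "lender": lender.name,
        "loan_duration_s": loan_duration_s,
        "on_expiry": ON_EXPIRY[first.job_type],
        # Claim 5: a breakpoint-resume job continues once the lender finishes.
        "resume_when_lender_finishes": first.job_type is JobType.BREAKPOINT_RESUME,
    }


running = [
    RunningJob("train-a", {JobType.BREAKPOINT_RESUME, JobType.TERMINATION},
               {"cpu": 8, "memory_gb": 32}, 600.0),
    RunningJob("etl-b", {JobType.CONTINUOUS}, {"cpu": 16}, 300.0),
    RunningJob("sim-c", {JobType.BREAKPOINT_RESUME}, {"cpu": 2}, 900.0),
]
first = PendingJob("render-x", JobType.BREAKPOINT_RESUME,
                   {"cpu": 4, "memory_gb": 16})

lenders = screen_lenders(first, running)
policy = make_policy(first, lenders[0], loan_duration_s=600.0)
```

In this sketch, `etl-b` is dropped in the first stage because its loanable resources do not support the breakpoint-resume type, and `sim-c` is dropped in the second stage because it cannot cover the CPU demand. Picking `lenders[0]` is an arbitrary choice made here; the claims do not prescribe how a lender is selected from the second job set.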
PCT/CN2023/101172 2022-06-27 2023-06-19 Resource scheduling method, apparatus and system WO2024001851A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210741724 2022-06-27
CN202210741724.6 2022-06-27
CN202211427018.0A CN117311957A (en) 2022-06-27 2022-11-15 Resource scheduling method, device and system
CN202211427018.0 2022-11-15

Publications (1)

Publication Number Publication Date
WO2024001851A1 (en) 2024-01-04

Family

ID=89259118

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101172 WO2024001851A1 (en) 2022-06-27 2023-06-19 Resource scheduling method, apparatus and system

Country Status (2)

Country Link
CN (1) CN117311957A (en)
WO (1) WO2024001851A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150277982A1 (en) * 2014-03-31 2015-10-01 Fujitsu Limited Parallel computer system and method for allocating jobs to calculation nodes
US10754697B2 (en) * 2018-01-29 2020-08-25 Bank Of America Corporation System for allocating resources for use in data processing operations
CN111679900A (en) * 2020-06-15 2020-09-18 杭州海康威视数字技术股份有限公司 Task processing method and device
CN113806064A (en) * 2020-06-17 2021-12-17 华为技术有限公司 Job scheduling method, device and system and job dispatching device
US20220188158A1 (en) * 2020-12-11 2022-06-16 Liqid Inc. Execution job compute unit composition in computing clusters

Also Published As

Publication number Publication date
CN117311957A (en) 2023-12-29

Similar Documents

Publication Publication Date Title
WO2023082560A1 (en) Task processing method and apparatus, device, and medium
US9727372B2 (en) Scheduling computer jobs for execution
EP2893444B1 (en) Quota-based resource management
US8424007B1 (en) Prioritizing tasks from virtual machines
US8468251B1 (en) Dynamic throttling of access to computing resources in multi-tenant systems
US8056083B2 (en) Dividing a computer job into micro-jobs for execution
CN109564528B (en) System and method for computing resource allocation in distributed computing
WO2016078178A1 (en) Virtual cpu scheduling method
US9973512B2 (en) Determining variable wait time in an asynchronous call-back system based on calculated average sub-queue wait time
US20160162337A1 (en) Multiple core real-time task execution
WO2022247105A1 (en) Task scheduling method and apparatus, computer device and storage medium
WO2022068697A1 (en) Task scheduling method and apparatus
WO2024021489A1 (en) Task scheduling method and apparatus, and kubernetes scheduler
WO2023274278A1 (en) Resource scheduling method and device and computing node
US20080168125A1 (en) Method and system for prioritizing requests
CN114721818A (en) Kubernetes cluster-based GPU time-sharing method and system
US8977752B2 (en) Event-based dynamic resource provisioning
WO2024001851A1 (en) Resource scheduling method, apparatus and system
CN111708799A (en) Spark task processing method and device, electronic equipment and storage medium
CN116303132A (en) Data caching method, device, equipment and storage medium
EP2413240A1 (en) Computer micro-jobs
KR20150089665A (en) Appratus for workflow job scheduling
WO2024007922A1 (en) Task migration method and apparatus, and device, storage medium and product
WO2022057754A1 (en) Memory control method and device
WO2024087663A1 (en) Job scheduling method and apparatus, and chip

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23830031

Country of ref document: EP

Kind code of ref document: A1