WO2024087663A1 - Job scheduling method, apparatus and chip - Google Patents

Job scheduling method, apparatus and chip

Info

Publication number
WO2024087663A1
WO2024087663A1 (PCT/CN2023/101052)
Authority
WO
WIPO (PCT)
Prior art keywords
resource
job
scheduler
scheduling
type
Prior art date
Application number
PCT/CN2023/101052
Other languages
English (en)
French (fr)
Inventor
申鹏 (Shen Peng)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2024087663A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the present application relates to the field of computers, and in particular to a job scheduling method, device and chip.
  • a supercomputing center refers to a cluster of multiple servers, which can be used to perform large-scale jobs or computing tasks, etc.
  • the supercomputing center can be used to run high-performance computing (HPC) jobs or artificial intelligence (AI) jobs.
  • the scheduler in the supercomputing center allocates different hardware resources to different jobs (such as HPC jobs or AI jobs), such as computing resources, storage resources, and network resources.
  • In the related art, a cluster has only one type of scheduler, which can allocate hardware resources for only one type of job and cannot allocate hardware resources for other types of jobs, resulting in poor adaptability of the cluster to those other job types. Therefore, how to provide a highly adaptable job scheduling method has become an urgent problem to be solved.
  • the present application provides a job scheduling method, device and chip, which solves the problem of poor adaptability caused by a single type of scheduler in the same cluster in a supercomputing center being able to process only one type of job.
  • In a first aspect, the present application provides a job scheduling method applicable to a heterogeneous computing power system that includes a scheduling node. The scheduling node adopts a two-layer scheduling architecture, consisting of a unified scheduler and a set of schedulers managed by the unified scheduler, together with a peer architecture in which at least two resource schedulers simultaneously manage each computing node.
  • the job scheduling method includes: first, the unified scheduler obtains a job scheduling command. Secondly, the unified scheduler determines a first type of resource scheduler that matches the type of the first job from the set of schedulers according to the type of the first job indicated by the job scheduling command. Finally, the unified scheduler instructs the first type of resource scheduler to perform data processing of the first job according to the job scheduling command.
  • the job scheduling command is used to execute the scheduling process of the first job, and the scheduler set includes resource schedulers that can execute at least two different types of job scheduling processes.
  • the above data processing may include, but is not limited to: the first type of resource scheduler allocates corresponding resources (or computing nodes) managed by the first type of resource scheduler to the first job according to the resources required by the first job.
  • For a heterogeneous computing power system (or cluster), the scheduling node sets a scheduler set for the system. Since the scheduler set includes resource schedulers that can be used to perform various types of job scheduling processing, the same cluster implements the function of supporting various types of job scheduling processing. When the cluster obtains a job scheduling command, the unified scheduler in the scheduling node can assign a matching resource scheduler to the job according to the type of job indicated by the command, and instruct that resource scheduler to perform the data processing process of the job. The same cluster thus supports schedulers for different job types, which improves its adaptability to different types of jobs.
  • The unified scheduler assigns the job to the corresponding type of resource scheduler. This not only increases the variety of schedulers in the same cluster, and thereby the types of jobs that can be scheduled, but also avoids the development workload of modifying the original resource scheduler, so that the same cluster can support job processing for various job types. This is conducive to improving resource utilization in the cluster and the efficiency of job data processing.
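The two-layer dispatch described above can be sketched as follows. This is an illustrative sketch only, not the patent's implementation; all class and method names (`ResourceScheduler`, `UnifiedScheduler`, `handle`, `process`) and the mapping of LSF/K8S to job types are assumptions.

```python
class ResourceScheduler:
    """A scheduler that handles specific job types (e.g. LSF for HPC, K8S for AI)."""
    def __init__(self, name, job_types):
        self.name = name
        self.job_types = set(job_types)   # job types this scheduler supports

    def process(self, job):
        # Allocate the resources (computing nodes) this scheduler manages to the job.
        return f"{self.name} allocated nodes for job {job['id']}"

class UnifiedScheduler:
    """Top layer: routes each job to a matching resource scheduler in the set."""
    def __init__(self, scheduler_set):
        self.scheduler_set = scheduler_set

    def handle(self, job_scheduling_command):
        job = job_scheduling_command["job"]
        # Determine the resource scheduler whose supported types match the job type.
        for rs in self.scheduler_set:
            if job["type"] in rs.job_types:
                return rs.process(job)     # instruct the matching scheduler
        raise ValueError(f"no scheduler for job type {job['type']!r}")

schedulers = [ResourceScheduler("LSF", ["HPC"]),
              ResourceScheduler("K8S", ["AI", "big-data"])]
unified = UnifiedScheduler(schedulers)
print(unified.handle({"job": {"id": 1, "type": "AI"}}))  # prints "K8S allocated nodes for job 1"
```

Here the scheduler set is a flat list and matching is by declared job type, mirroring the "first type of resource scheduler that matches the type of the first job" step.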
  • the above job scheduling command may be sent from the terminal to the scheduling node.
  • the information carried in the job scheduling command is used to indicate the type of the first job.
  • the unified scheduler can quickly determine the type of the first job according to the job scheduling command, thereby improving the efficiency of the scheduling node in determining a matching resource scheduler for the first job.
  • the type of the first job is an HPC job, an AI job, or a big data job, etc.
  • a permission lock is maintained in the unified scheduler, and the permission lock is used to indicate the scheduling permission of the first type of resource scheduler to the resources managed by the first type of resource scheduler, and the managed resources include at least one of computing resources, network resources, and storage resources.
  • the unified scheduler can update the status of the permission lock according to the situation in the following example.
  • Example 1, the process of obtaining scheduling authority: the unified scheduler sends a resource scheduling command that matches the job scheduling command to LSF. According to the resource scheduling command, LSF requests from the unified scheduler the scheduling authority over the computing nodes managed by LSF. In response to this request, the unified scheduler updates the state of the permission lock to: LSF has the authority to schedule the resources managed by LSF.
  • Example 2, the process of releasing scheduling authority: after LSF allocates the corresponding resources to the first job, LSF sends a permission release command to the unified scheduler; the unified scheduler responds to the permission release command and updates the status of the permission lock to: no resource scheduler has scheduling authority at this time.
  • The permission lock can be used to control the scheduling permissions of multiple resource schedulers over the resources in the same cluster, so that only one resource scheduler at a time can allocate computing nodes to jobs within the cluster. While improving the cluster's adaptability to different types of jobs, this avoids the same resources being called by multiple resource schedulers at the same time, which would cause resource preemption and increase the waiting time for job execution. It thereby improves the accuracy of job scheduling and the adaptability of the same cluster to multiple different types of jobs.
  • the resource scheduler does not need to interact with other resource schedulers, which is conducive to improving the efficiency of job scheduling.
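The acquire/release flow of Examples 1 and 2 can be sketched with a condition-variable-based lock. This is an illustrative sketch under the assumption that the permission lock tracks a single holder; the class and method names are not from the patent.

```python
import threading

class PermissionLock:
    """Tracks which resource scheduler currently holds scheduling authority."""
    def __init__(self):
        self._cond = threading.Condition()
        self.holder = None          # None: no resource scheduler has authority

    def acquire(self, scheduler_name, timeout=None):
        # Example 1: a scheduler requests authority over its managed nodes.
        with self._cond:
            if not self._cond.wait_for(lambda: self.holder is None, timeout):
                return False        # another scheduler still holds authority
            self.holder = scheduler_name
            return True

    def release(self, scheduler_name):
        # Example 2: after allocating resources, the scheduler releases authority.
        with self._cond:
            if self.holder != scheduler_name:
                raise RuntimeError("release by non-holder")
            self.holder = None      # lock state: no scheduler has authority
            self._cond.notify_all()

lock = PermissionLock()
lock.acquire("LSF")                                # LSF obtains scheduling authority
assert lock.acquire("K8S", timeout=0.01) is False  # K8S cannot schedule concurrently
lock.release("LSF")                                # authority released
```

The point of the design is mutual exclusion: at any moment at most one resource scheduler can allocate computing nodes, which prevents two schedulers from claiming the same node simultaneously.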
  • In a possible design, before the unified scheduler instructs the first type of resource scheduler to perform data processing of the first job, the job scheduling method further includes: the unified scheduler obtains the status of the resources managed by each resource scheduler in the scheduler set, and determines the resource status table based on those statuses.
  • the above-mentioned state may include: the running state of the computing node, and the usage state of at least one of the computing resources, storage resources or network resources included in the computing node.
  • the resource state table may include the current state of the computing node, and the current state is used to indicate whether the computing node is available or unavailable.
  • the current state of the computing node may be determined based on the running state.
  • the resource status table can be used to indicate the usage of resources managed by each resource scheduler in the scheduler set.
  • multiple resource schedulers can determine the usage of resources based on the resource status table, and a unified scheduler can synchronize the resource status among multiple resource schedulers, thereby avoiding the situation where different resource schedulers manage the same resource with different statuses. Furthermore, it ensures that multiple resource schedulers will not preempt the same resources, which is beneficial to reducing the execution waiting time of jobs.
  • In a possible design, after the unified scheduler instructs the first type of resource scheduler to perform data processing of the first job, the job scheduling method further includes: when the unified scheduler receives the resource allocation result of the first job sent by the first type of resource scheduler, the unified scheduler updates the resource status table. The resource allocation result is used to indicate the callable resources allocated by the first type of resource scheduler to the first job.
  • After the scheduling node completes the allocation of computing nodes for a job, it promptly updates the resource status table to ensure that resources are allocated to other jobs based on the latest table. This prevents a resource scheduler from assigning a job to a computing node whose remaining resources are less than the job requires, which could cause resource preemption. It improves the rationality of allocating computing nodes to jobs, reduces the waiting time for job execution, and improves the efficiency with which the scheduling node allocates computing resources to jobs.
  • In a possible design, the job scheduling method further includes: the unified scheduler instructs the other resource schedulers in the scheduler set to synchronize the updated resource status table, and the other resource schedulers determine resource allocation results for other jobs according to the updated table.
  • the other resource scheduler is any resource scheduler in the scheduler set except the first type of resource scheduler.
  • the unified scheduler may send a resource synchronization command to other resource schedulers to instruct the other resource schedulers to synchronize the updated resource status table.
  • the other resource schedulers allocate resources for other jobs according to the updated resource status table and obtain corresponding resource allocation results.
  • The resource synchronization command is used to instruct the other resource schedulers to synchronize the updated resource status table.
  • the unified scheduler synchronizes the updated resource status table to other resource schedulers in the scheduler set, and synchronizes the resource status between multiple resource schedulers. This avoids the problem of the same resource being repeatedly called by multiple resource schedulers due to inconsistent resource status, resulting in resource preemption and increased waiting time for job execution caused by resource preemption, which is conducive to improving the efficiency of data processing for the job in the same cluster.
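The update-and-synchronize step can be sketched as follows; the table keeps only remaining CPU cores for brevity, and all names (`StatusTable`, `on_allocation_result`, the node names other than C01n01) are illustrative assumptions.

```python
class StatusTable:
    def __init__(self, nodes):
        # node name -> remaining CPU cores (storage/network omitted for brevity)
        self.free_cores = dict(nodes)

class Unified:
    def __init__(self, table, scheduler_set):
        self.table = table
        self.scheduler_set = scheduler_set   # scheduler name -> its local table copy

    def on_allocation_result(self, from_scheduler, node, cores):
        # Deduct the allocated resources from the master table ...
        self.table.free_cores[node] -= cores
        # ... then synchronize every *other* scheduler's copy, so no scheduler
        # sees stale state and preempts the same resources.
        for name, local in self.scheduler_set.items():
            if name != from_scheduler:
                local.free_cores = dict(self.table.free_cores)

table = StatusTable({"C01n01": 10, "C01n02": 16})
peers = {"LSF": StatusTable({}), "K8S": StatusTable({})}
u = Unified(table, peers)
u.on_allocation_result("LSF", "C01n01", 4)         # LSF reports a 4-core allocation
print(peers["K8S"].free_cores["C01n01"])           # prints 6
```

After the allocation result arrives, the other schedulers (here K8S) see the reduced remaining resources, matching the synchronization behavior described above.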
  • In a second aspect, the present application provides a job scheduling device, which is deployed at a scheduling node of a heterogeneous computing power system and includes the modules for executing the job scheduling method in the first aspect or any optional design of the first aspect.
  • the job scheduling device includes: an acquisition module, a selection module, and an indication module, wherein the acquisition module is used to obtain a job scheduling command; the selection module is used to select a first type of resource scheduler matching the first job in a scheduler set according to the type of the first job, and the scheduler set includes resource schedulers that can perform at least two different types of job scheduling processing; the indication module is used to instruct the first type of resource scheduler to perform data processing of the first job.
  • the job scheduling command is used to execute the scheduling process of the first job.
  • the above-mentioned job scheduling command may be sent from the terminal to the scheduling node.
  • The unified scheduler may send the job scheduling command to the first type of resource scheduler to instruct it to perform data processing of the first job.
  • the present application provides a chip, comprising: a control circuit and an interface circuit, wherein the interface circuit is used to obtain a job scheduling command, and the control circuit is used to execute the method in the first aspect and any possible implementation of the first aspect according to the job scheduling command.
  • the present application provides a scheduling node, comprising a processor and a memory; the memory is used to store computer instructions, and the processor executes the computer instructions to implement the method in the above-mentioned first aspect and any optional implementation method of the first aspect.
  • the present application provides a heterogeneous computing power system, which includes a scheduling node and a computing node; the scheduling node is used to allocate a computing node to a first job, so that the scheduling node executes the method in the above-mentioned first aspect and any optional implementation of the first aspect.
  • the present application provides a computer-readable storage medium, which stores a computer program or instruction.
  • When the computer program or instructions are executed by a processing device, the method in the above-mentioned first aspect and any optional implementation of the first aspect is implemented.
  • the present application provides a computer program product, which includes a computer program or instructions.
  • FIG1 is an application scenario diagram of a heterogeneous computing system provided by the present application.
  • FIG2 is a schematic flowchart of a method for initializing a scheduling node provided in an embodiment of the present application.
  • FIG3 is a schematic diagram of a process flow of a job scheduling method provided in an embodiment of the present application.
  • FIG4 is a flowchart of a method for updating a permission lock according to an embodiment of the present application.
  • FIG5 is a second flow chart of a method for updating a permission lock provided in an embodiment of the present application.
  • FIG6 is a flow chart of a state updating method provided in an embodiment of the present application.
  • FIG7A is a first structural diagram of a job scheduling device provided in an embodiment of the present application.
  • FIG7B is a second structural diagram of a job scheduling device provided in an embodiment of the present application.
  • FIG8 is a schematic diagram of the structure of a control node provided in an embodiment of the present application.
  • The present application provides a job scheduling method applicable to a heterogeneous computing system including scheduling nodes. The scheduling nodes adopt a two-layer scheduling architecture consisting of a unified scheduler and a set of schedulers managed by the unified scheduler; the scheduler set includes at least a plurality of different types of resource schedulers, each of which performs resource (or computing node) data processing for specific types of jobs. The scheduling nodes also adopt a peer architecture in which at least two resource schedulers simultaneously manage each computing node.
  • The above-mentioned specific type of job refers to a job type that the resource scheduler supports scheduling.
  • A non-specific type of job refers to a job type that the resource scheduler does not support scheduling, or schedules with low efficiency.
  • the heterogeneous computing power system includes multiple clusters, each cluster supports job processing of multiple job types, and the scheduling node sets a scheduler set for the heterogeneous computing power system. Since the scheduler set includes resource schedulers that can be used to perform multiple types of job scheduling processing, the function of supporting multiple types of job scheduling processing is realized in the same cluster.
  • the unified scheduler in the scheduling node can assign a matching resource scheduler to the job according to the type of job indicated by the job scheduling command, and instruct the resource scheduler matching the type of the job to perform the data processing process of the job, and the same cluster supports job processing of multiple different job types.
  • the same cluster supports schedulers of different job types, which improves the adaptability of the same cluster to different types of jobs.
  • The unified scheduler allocates jobs to resource schedulers of the corresponding type.
  • the heterogeneous computing power system is a computer cluster, which includes a scheduling node and a large number of computing nodes.
  • the scheduling node and the computing node can be connected by wire or wirelessly.
  • the scheduling node is used to allocate computing nodes to jobs, and the computing nodes provide computing power support for the jobs.
  • the above-mentioned job scheduling command is used to schedule the computing nodes that match the first job.
  • The unified scheduler can send a job scheduling command to the first type of resource scheduler to instruct it to allocate resources in the heterogeneous computing power system to the first job.
  • a scheduler can be software running on a scheduling node or a hardware device deployed in a scheduling node.
  • a resource scheduler refers to scheduling software that allocates various types of resources to jobs or applications.
  • the resource scheduler can implement functions such as computing resource management and job scheduling.
  • In some cases, the resource scheduler is a processor or controller deployed separately on a server or in a heterogeneous computing system (such as a cluster of multiple types of processors); in other cases, the resource scheduler is a virtual machine (VM), container, or other software unit deployed on the server, which is not limited in this application.
  • the resource scheduler can provide an access port, and the hardware or other software units in the server can send commands or instructions to the resource scheduler through the access port, or the resource scheduler can schedule resources provided by the server or heterogeneous computing system through the access port.
  • the scheduler set includes multiple different types of resource schedulers, and each type of resource scheduler may include one or more resource schedulers.
  • High-performance computing clusters are computer clusters that can process large amounts of data and perform high-speed calculations beyond the capability of ordinary personal computers.
  • The container orchestration platform Kubernetes (K8S) is a system that automates the operation and maintenance of containers (such as Docker containers).
  • The Load Sharing Facility (LSF) is a job scheduling and workload management system.
  • the cluster task management system (Sun Grid Engine, SGE) is a system used to queue tasks submitted by users and then assign the tasks to capable computing nodes for execution.
  • Agents are deployed on computing nodes to communicate with scheduling nodes and perform corresponding operations based on the content of the communication. Agents deployed on computing nodes include K8S agents and HPC agents.
  • Computing nodes refer to systems that provide computing power, storage, and network support for jobs; computing nodes may include: Central Processing Unit (CPU) computing nodes, Graphics Processing Unit (GPU) computing nodes, and Neural-Network Processing Units (NPU) computing nodes.
  • CPU computing nodes are equipped with a large number of CPUs.
  • GPU computing nodes are equipped with a large number of parallel accelerators such as GPUs, Field-Programmable Gate Arrays (FPGAs), and Application Specific Integrated Circuits (ASICs).
  • The memory in the computing node can be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.
  • the network configured by the communication interface of the computing node can be the Internet, or other networks (such as Ethernet).
  • the network can include one or more network devices, such as a router or a switch.
  • the resource scheduler does not support scheduling the non-specific types of jobs.
  • the present application adopts a two-layer scheduling architecture consisting of a unified scheduler and a scheduler set, and a peer-to-peer architecture in which at least two resource schedulers in the scheduler set manage each computing node at the same time.
  • Figure 1 is an application scenario diagram of a heterogeneous computing power system provided by the present application.
  • the heterogeneous computing power system 100 may include a scheduling node 110 and n computing nodes 120, where n is a positive integer.
  • The terminal 200 and the scheduling node 110, as well as the scheduling node 110 and the computing nodes 120, can communicate by wired means, such as Ethernet, optical fiber, and the various Peripheral Component Interconnect Express (PCIe) buses provided inside the heterogeneous computing power system 100 for connecting the scheduling node 110 and the computing nodes 120; they can also communicate by wireless means, such as the Internet, wireless fidelity (Wi-Fi), and ultra-wideband (UWB) technology.
  • the heterogeneous computing system 100 may also include a terminal 200.
  • the terminal 200 sends the job scheduling command to the scheduling node 110, and the unified scheduler 111 in the scheduling node 110 determines the first type of resource scheduler 1121 that matches the type of the first job from the scheduler set 112 according to the type of the first job indicated by the job scheduling command.
  • the unified scheduler 111 will instruct the first type of resource scheduler 1121 to perform data processing of the first job according to the job scheduling command, and the process of the data processing includes: the first type of resource scheduler allocates the corresponding computing node to the first job according to the resources required by the first job.
  • the scheduling node will be initialized.
  • a possible implementation method for initializing the scheduling node 110 in the heterogeneous computing power system 100 is provided, as shown in FIG2 , which is a flow chart of the method for initializing the scheduling node provided in the embodiment of the present application.
  • A unified scheduler and a scheduler set run in the scheduling node 110, and the scheduler set includes resource schedulers for executing various types of job scheduling processing, such as LSF and K8S.
  • the resource scheduler included in the scheduler set may be executed on a processor in the scheduling node 110 .
  • the scheduling node 110 also includes a memory. It is worth noting that FIG. 2 is only an example provided in the present application and should not be understood as a limitation of the present application.
  • the scheduler set also includes more resource schedulers, and the heterogeneous computing system 100 may also include more or fewer computing nodes.
  • the scheduling node initialization provided in this embodiment includes steps S210 and S220.
  • the unified scheduler 111 obtains the status of the computing node 120 managed by the resource scheduler in the scheduler set of the heterogeneous computing power system 100.
  • the status is used to indicate: the running status of the computing node, and the usage status of computing resources, storage resources, network resources, etc. included in the computing node.
  • The computing resources are used to indicate the number of floating-point operations and integer operations per unit of time; these figures describe the remaining processing capacity of the computing node.
  • The above computing resources may be provided by a processor in the computing node 120. The processor may be an integrated circuit chip with signal processing capability, such as the above-mentioned CPU, GPU, NPU, FPGA, or ASIC.
  • Storage resources are used to indicate storage capacity, data read and write speed, etc.
  • the data read and write speed of storage resources is used to describe the processing capacity of storage resources.
  • the above storage resources can be provided by the memory in the computing node 120 shown in FIG1.
  • For details of the memory, reference can be made to the description of the computing node's memory above, which will not be repeated here.
  • the network resource is used to indicate the transmission bandwidth, which refers to the maximum amount of data that can be transmitted in a unit of time; for example, the transmission bandwidth is used to describe the processing capacity of the network resource.
  • the above network resources can be provided by the transmission bandwidth of the communication interface of the computing node 120.
  • For details of the transmission bandwidth of the communication interface, reference can be made to the description of the computing node's communication interface above, which will not be repeated here.
  • the operating state of the computing node can be determined according to the fault condition of the computing node.
  • the fault condition may include but is not limited to: interruption of the communication link between the resource scheduler and the computing node, failure of a device (such as a memory, a processor core, a network device, etc.) in the computing node, etc.
  • When such a fault occurs, the operating status of the computing node is unavailable.
  • For example, a computing node configured with a 128-core CPU can provide only 64 cores of computing power because some (non-main) cores have failed. In this case, the running status of the node is still "available", but the available resources are 64 cores.
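The per-node status just described (running state plus remaining resources, where a partial fault reduces capacity without making the node unavailable) can be sketched as a small record type. The field and property names are illustrative assumptions, not from the patent.

```python
from dataclasses import dataclass

@dataclass
class NodeStatus:
    total_cores: int
    failed_cores: int = 0
    link_up: bool = True        # communication link between scheduler and node

    @property
    def running_state(self):
        # A node is unavailable on link interruption or total core failure;
        # partial core failure leaves it available with reduced capacity.
        if not self.link_up or self.failed_cores >= self.total_cores:
            return "unavailable"
        return "available"

    @property
    def usable_cores(self):
        return self.total_cores - self.failed_cores

# The 128-core example from the text: half the cores have failed.
node = NodeStatus(total_cores=128, failed_cores=64)
print(node.running_state, node.usable_cores)   # prints "available 64"
```

This mirrors the distinction the text draws between a node's running state and the usage state of its resources.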
  • Regarding how the unified scheduler obtains the status, this application provides the following two optional examples.
  • In a first example, the unified scheduler 111 sends a status collection command to LSF and K8S, and LSF and K8S each obtain the status of computing node 1, computing node 2, computing node 3, and computing node 4 according to the status collection command.
  • the unified scheduler 111 may send a bhosts command to LSF to check the status of all computing nodes; the unified scheduler 111 may send a top command to K8S to check the status of all computing nodes.
  • In a second example, LSF and K8S actively send the status of all computing nodes to the unified scheduler 111.
  • the unified scheduler 111 generates a resource status table according to the acquired status of the computing node.
  • the resource status table is used to track the usage of resources in the cluster.
  • the unified scheduler updates the resource status table in a timely manner according to the acquired status.
  • the resource status table can be stored in a database that communicates with the unified scheduler, or the resource status table can be stored in a scheduling node to which the unified scheduler belongs.
  • the resource status table may include the running status and current status of the computing node, etc. After obtaining the resource status table, the unified scheduler synchronizes the resource status table to all resource schedulers in the scheduler set.
  • the current status is determined based on the running status of the computing node obtained by each resource scheduler.
  • the current status of the computing node is available only when the running status of the computing node obtained by each resource scheduler is available.
  • a resource status table is shown in Table 1 below.
  • node_name represents the name of the computing node; each computing node has a unique number.
  • node_state represents the current state of the computing node: available or unavailable.
  • K8S_state represents the running state of the computing node as obtained by K8S: available or unavailable.
  • LSF_state represents the running state of the computing node as obtained by LSF: available or unavailable.
  • the resource status table will also save the usage status of at least one of the computing resources, storage resources, and network resources corresponding to each computing node. For example, for a computing node named C01n01, the remaining resources are 10 cores and 30% (500G) of storage space.
  • the resource status table can be used to indicate the usage of resources managed by each resource scheduler in the scheduler set. For example, multiple resource schedulers can determine the usage of resources (such as the status of multiple computing nodes shown in Table 1 above) based on the resource status table.
  • the unified scheduler can synchronize the resource status among multiple resource schedulers, thereby avoiding the situation where different resource schedulers manage the same resource with different status. Furthermore, it ensures that multiple resource schedulers will not preempt the same resources, which is beneficial to reducing the execution waiting time of jobs.
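A row of the resource status table described above can be sketched as a plain record, with the node's current state derived from the per-scheduler running states (a node is available only if every scheduler reports it available). The node name and remaining resources come from the example in the text; the dictionary layout and field values are otherwise illustrative assumptions.

```python
def node_state(per_scheduler_states):
    # Current state is "available" only if *every* resource scheduler
    # reports the node's running state as available.
    if all(s == "available" for s in per_scheduler_states.values()):
        return "available"
    return "unavailable"

row = {
    "node_name": "C01n01",
    "K8S_state": "available",       # running state obtained by K8S
    "LSF_state": "available",       # running state obtained by LSF
    "free_cores": 10,               # remaining resources from the text's example
    "free_storage_gb": 500,         # 30% of storage space
}
row["node_state"] = node_state({"K8S": row["K8S_state"],
                                "LSF": row["LSF_state"]})
print(row["node_state"])   # prints "available"
```

If either scheduler were to report the node as unavailable, `node_state` would return "unavailable", matching the rule stated above.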
  • the unified scheduler 111 and the resource schedulers included in the scheduler set may be located at different scheduling nodes, for example, the unified scheduler is located at the first scheduling node, and the scheduler set is located at the second scheduling node.
  • the unified scheduler 111 is configured with the ports of all resource schedulers in the scheduler set 112, so that the unified scheduler 111 can send job scheduling commands and status collection commands to the resource schedulers based on the ports of the resource schedulers; and all resource schedulers are also configured with the ports of the unified scheduler 111, so that the resource schedulers can send commands and data to the unified scheduler 111 through the ports of the unified scheduler 111.
  • the unified scheduler 111 maintains the IP-Ports of all resource schedulers, such as resource scheduler 1, 14.17.32.211:1024, resource scheduler 2, 14.17.32.211:1025. Accordingly, each resource scheduler maintains the IP-Port of the unified scheduler 111 (such as 14.17.33.211:1024).
  • FIG3 is a flow chart of a job scheduling method provided in an embodiment of the present application, and the job scheduling method can be applied to the heterogeneous computing power system 100 shown in FIG1 .
  • the job scheduling method is executed by the scheduling node 110 shown in FIG1 , and the scheduling node 110 is deployed with a unified scheduler 111 and a scheduler set 112, and the scheduler set includes a first-class resource scheduler 1121 and a second-class resource scheduler 1122, etc.
  • Each type of resource scheduler may include one or more resource schedulers.
  • the first type of resource scheduler 1121 includes resource scheduler 1121A and resource scheduler 1121B, both of which are K8S schedulers.
  • the second type of resource scheduler 1122 includes resource scheduler 1122A and resource scheduler 1122B, both of which are LSF schedulers.
  • the resource scheduler mentioned above refers to a system deployed in the scheduling node 110 for scheduling resources for jobs.
  • the resource scheduler refers to the content of the resource scheduler in the introduction to the above-mentioned related technologies.
  • the scheduling node 110 allocates computing nodes to jobs.
  • the job scheduling method provided in this embodiment includes the following steps S310 to S330 .
  • the unified scheduler 111 in the scheduling node 110 obtains a job scheduling command.
  • the user inputs a job scheduling command for the first job through the terminal 200, and then sends the job scheduling command to the scheduling node 110.
  • the job scheduling command is used to perform the scheduling process of the first job, and instruct the resource scheduler to schedule the computing nodes required for the first job.
  • the user can input the job scheduling command in the command line interface (CLI) on the terminal 200.
  • the type of the first job may be: HPC job, containerized job (such as AI job) or big data job, etc.
  • the first job may also refer to other types of jobs, which are not limited in this application.
  • the format of the job scheduling command can be implemented in a variety of different ways.
  • taking a scheduler set that includes only K8S and LSF as an example, this application gives two possible examples.
  • the job scheduling command is a native K8S command or an LSF command.
  • native LSF commands such as bjobs -r/-a/-p, bsub -J/-N/-R span, etc.
  • Native K8S commands such as Kubectl create/delete/get/run commands, etc.
  • the job scheduling command is a command that encapsulates a native K8S command or an LSF command.
  • the job scheduling command is shown in Table 2 below.
  • the "LSF commands" and "K8S commands" in Table 2 refer to native LSF commands and native K8S commands; for their contents, please refer to the description of K8S commands and LSF commands in the first possible example above, which will not be repeated here.
  • the job scheduling method provided in this embodiment further includes step S320 .
  • the unified scheduler 111 determines a first-type resource scheduler 1121 matching the first job from the scheduler set 112 according to the type of the first job indicated by the job scheduling command.
  • the unified scheduler 111 in the scheduling node 110 identifies the type of the first job indicated in the job scheduling command, and the unified scheduler 111 determines a resource scheduler that matches the type of the first job from the scheduler set 112 based on the type of the first job, wherein the resource schedulers in the scheduler set 112 are used to schedule computing nodes in the heterogeneous computing power system 100.
  • the unified scheduler determines the K8S used to allocate computing nodes to the AI job from the LSF and the K8S.
  • the unified scheduler 111 determines the type of the first job according to information carried in the job scheduling command.
  • this embodiment provides the following two possible examples for illustration.
  • the correspondence between the job scheduling command and the type of the first job can be determined according to a set mapping relationship table.
  • the mapping relationship table is used to indicate: the correspondence between the command and the job type.
  • the aforementioned command can indicate the command header corresponding to the command.
  • the unified scheduler 111 queries the mapping relationship table according to the bsub header of the above job scheduling command, and can determine that the type of the first job corresponding to the job scheduling command is an HPC job.
  • z represents the number of CPUs required by the submitted job
  • –q specifies the queue to which the job is submitted. If the –q option is not used, the system submits the job to the default job queue.
  • inputfile represents the file name that the program needs to read (such as namelist, etc.)
  • outputfile represents a file, and the standard output information after the job is submitted will be saved in this file.
  • COMMAND is the program that the user wants to run.
  • the unified scheduler 111 queries the mapping relationship table according to the kubectl of the above job scheduling command, and can determine that the type of the first job corresponding to the job scheduling command is an AI job or a containerized job.
  • the unified scheduler 111 can directly determine the type of the job corresponding to the job scheduling command based on the job type contained in the job scheduling command, such as HPC or AI.
  • the unified scheduler 111 can determine that the job type of the aforementioned job scheduling command is an HPC job based on the musb-HPC of the job scheduling command.
  • the unified scheduler 111 can determine that the job type of the aforementioned job scheduling command is an AI job based on the musb-AI of the job scheduling command.
  • the unified scheduler can quickly determine the type of the first job according to the job scheduling command, thereby improving the efficiency of the scheduling node in determining a matching resource scheduler for the first job.
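Both lookup styles above (querying a mapping table by the command header, or reading a type carried directly in the command such as musb-HPC/musb-AI) reduce to a dictionary lookup. The following Python sketch is illustrative; the table contents are assumptions drawn from the examples in this section.

```python
# Illustrative mapping-relationship table: command header -> job type.
JOB_TYPE_BY_HEADER = {
    "bsub": "HPC",       # native LSF-style submission -> HPC job
    "kubectl": "AI",     # native K8S command -> AI/containerized job
    "musb-HPC": "HPC",   # encapsulated command carrying the type directly
    "musb-AI": "AI",
}

def job_type_of(job_scheduling_command):
    """Return the job type indicated by a job scheduling command."""
    header = job_scheduling_command.split()[0]
    if header not in JOB_TYPE_BY_HEADER:
        raise ValueError(f"unknown command header: {header}")
    return JOB_TYPE_BY_HEADER[header]
```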
  • the job scheduling method provided in this embodiment further includes step S330 .
  • the unified scheduler 111 instructs the first-type resource scheduler 1121 to execute data processing of the first job.
  • the unified scheduler 111 may instruct the first-type resource scheduler 1121 to schedule the first job by sending a resource scheduling command matching the job scheduling command to the first-type resource scheduler.
  • the resource scheduling command is used to instruct the first-type resource scheduler 1121 to allocate a computing node in the cluster for the first job.
  • the resource scheduling command is intercepted from the job scheduling command.
  • the unified scheduler deletes the musb-AI header of the above job scheduling command and obtains the resource scheduling command bsub -n z -q QUEUENAME -i inputfile -o outputfile COMMAND.
  • when LSF receives the resource scheduling command corresponding to the first job, namely "bsub -n z -q QUEUENAME -i inputfile -o outputfile COMMAND", LSF allocates a computing node for the first job according to the number of CPUs indicated in the resource scheduling command, the queue to which the job is submitted, and the resource status table shown in Table 1.
  • the computing node runs the first job according to information such as the program to be run by the first job, the data to be read, and the address of the output data storage.
  • after determining the computing node assigned to the first job, the first-type resource scheduler 1121 sends the information of the first job to the agent on the computing node, such as the HPC agent or the K8S agent, to instruct the computing node to run the first job.
  • the information of the first job may include the program to be run and the input data.
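The "interception" step above, deleting the encapsulation header (e.g. musb-AI) and forwarding the remaining native command, can be sketched as below. The assumption that the header is the first whitespace-separated token is mine for illustration, not stated by the patent.

```python
def intercept_resource_command(job_scheduling_command):
    """Strip an encapsulation header (e.g. musb-AI) to recover the native command."""
    header, _, rest = job_scheduling_command.partition(" ")
    if header.startswith("musb-"):
        return rest                    # encapsulated: forward the native remainder
    return job_scheduling_command      # already a native K8S/LSF command

cmd = intercept_resource_command(
    "musb-AI bsub -n z -q QUEUENAME -i inputfile -o outputfile COMMAND")
```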
  • the scheduling node sets a scheduler set for the heterogeneous computing system.
  • the scheduler set includes K8S and LSF. Since the scheduler set includes resource schedulers that can be used to perform various types of job scheduling processing (such as HPC jobs and AI jobs), the function of supporting various types of job scheduling processing is implemented in the same cluster.
  • the unified scheduler in the scheduling node can assign a matching resource scheduler to the job according to the type of job indicated by the job scheduling command (such as the unified scheduler assigns a job of HPC type to LSF), and instructs the resource scheduler that matches the type of job to execute the data processing process of the job, thereby increasing the number of job types supported for processing in the same cluster and improving the adaptability of the same cluster to different types of jobs.
  • the unified scheduler assigns the job to the resource scheduler of the corresponding type, which is beneficial to improving the efficiency of the same cluster in performing data processing for the job.
  • the scheduler set 112 shown in the above embodiment includes LSF and K8S, which schedule HPC jobs and AI jobs respectively and belong to two categories of resource schedulers.
  • the first type of resource scheduler 1121 may include multiple resource schedulers, such as the first type of resource scheduler 1121 includes resource scheduler 1121A and resource scheduler 1121B, where LSF can be called resource scheduler 1121A, and SGE can be called resource scheduler 1121B.
  • for how the unified scheduler 111 obtains the job scheduling command, identifies the type of the first job indicated by it, and determines the first-type resource scheduler 1121 matching that type, refer to the contents of S310 and S320 above, which will not be repeated here.
  • after determining the first-type resource scheduler 1121 that matches the type of the first job, the unified scheduler 111 determines the job queue status of LSF and SGE in the first-type resource scheduler; based on the job queue status of LSF and SGE, the unified scheduler 111 sends the resource scheduling command to the resource scheduler with the shorter job queue.
  • the unified scheduler sends a resource scheduling command to LSF based on the job queue status.
  • for the process of assigning a computing node to the first job, refer to the foregoing description of LSF receiving the resource scheduling command corresponding to the first job, which will not be repeated here.
  • the process of instructing the first-type resource scheduler to perform data processing of the first job may include:
  • the unified scheduler 111 sends a resource scheduling command matching the job scheduling command to the LSF, and the LSF requests the unified scheduler 111 for the scheduling authority of the computing node managed by the LSF according to the resource scheduling command, and the LSF allocates resources for the first job according to the scheduling authority and the resource demand indicated by the resource scheduling command.
  • the resources include at least one of computing resources, network resources, and storage resources.
  • the LSF request for scheduling authority may be implemented through command interaction with the unified scheduler 111.
  • the interaction process between the resource scheduler and the unified scheduler before the job scheduling process is illustrated by way of example, as shown in FIG4 , which is a flowchart diagram 1 of the permission lock update method provided in an embodiment of the present application.
  • in response to the resource scheduling command of the unified scheduler 111, the LSF requests from the unified scheduler 111 the scheduling authority for the callable computing nodes indicated in the aforementioned resource status table.
  • the LSF may determine the computing node that can be called from the computing nodes managed by the LSF from the resource status table shown in Table 1.
  • the running status of the computing node that can be called is available and has remaining resources.
  • the LSF may send a first command to the unified scheduler, where the first command is used to request scheduling authority for the computing node 120 .
  • the unified scheduler 111 responds to the request of the LSF and updates the status of the permission lock maintained by the unified scheduler 111 .
  • the unified scheduler 111 updates the status of the permission lock to: LSF has the permission to schedule the computing node.
  • the unified scheduler 111 may update the permission lock state according to the configured scheduling policy. If the priority of LSF is higher than the priority of K8S, when the first command of LSF and K8S is received at the same time, the unified scheduler 111 updates the permission lock state to: LSF has the permission to schedule the computing node 120.
  • the second command is used to indicate that LSF can schedule the computing nodes 120 in the heterogeneous computing power system 100 according to the scheduling authority.
  • LSF allocates computing nodes to the first job.
  • for the description of LSF allocating a computing node 120 in the heterogeneous computing power system 100 to the first job, please refer to the example in S330 above, which will not be repeated here.
  • K8S sends a first command to the unified scheduler 111.
  • the unified scheduler 111 returns information about the call failure to K8S based on the status of the permission lock.
  • K8S will periodically send the first command until it obtains the third command sent by the unified scheduler 111, wherein the third command is used to indicate that K8S can schedule the computing nodes 120 in the heterogeneous computing power system 100 according to the scheduling authority.
  • the process of the first-type resource scheduler executing the data processing of the first job may further include:
  • FIG5 is a second flow chart of the permission lock updating method provided in an embodiment of the present application.
  • the LSF sends a permission release command to the unified scheduler 111 .
  • the authority release command is used to indicate that the LSF has released the above scheduling authority.
  • the unified scheduler 111 responds to the permission release command and updates the status of the permission lock.
  • the status of the permission lock can be updated to: no resource scheduler has scheduling authority at this time.
  • alternatively, if K8S is waiting for the scheduling authority, the unified scheduler 111 will update the status of the permission lock to: K8S has the permission to schedule the computing node.
  • This permission lock can be used to control the scheduling permissions of multiple resource schedulers for resources in the same cluster, so that in the same cluster, only one resource scheduler (such as only LSF in the above example) can allocate computing nodes for jobs at the same time. While improving the adaptability of different types of jobs in the same cluster, it avoids the problem of the same resources being called by multiple resource schedulers at the same time, resulting in resource preemption and increased waiting time for job execution caused by resource preemption, and improves the accuracy of job scheduling and the adaptability of the same cluster to multiple different types of job allocations.
  • the resource scheduler does not need to interact with other resource schedulers, which is conducive to improving the efficiency of job scheduling.
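The permission-lock behavior of FIG. 4 and FIG. 5 (one holder at a time, call failure plus periodic retry for every other requester, and hand-over on release) can be sketched as a small state machine. This is a hedged illustration; the class and method names are invented for this sketch, not taken from the patent.

```python
class PermissionLock:
    """One resource scheduler holds the cluster's scheduling authority at a time."""

    def __init__(self):
        self.holder = None   # resource scheduler currently holding authority
        self.waiting = []    # schedulers whose first command got "call failure"

    def request(self, scheduler):
        """First command: True grants authority (the second/third command),
        False means call failure and the caller retries periodically."""
        if self.holder is None:
            self.holder = scheduler
            return True
        if scheduler not in self.waiting:
            self.waiting.append(scheduler)
        return False

    def release(self, scheduler):
        """Permission release command: clear the lock or hand it to a waiter."""
        if self.holder != scheduler:
            raise ValueError(f"{scheduler} does not hold the lock")
        self.holder = self.waiting.pop(0) if self.waiting else None
        return self.holder
```

In the example above, LSF acquires the lock first, K8S receives call-failure information and retries, and on LSF's release the authority passes to K8S.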
  • the job scheduling method provided in this embodiment further includes the unified scheduler 111 updating the resource status table.
  • Figure 6 is a flow chart of the state update method provided in an embodiment of the present application. Take the first type of resource scheduler being LSF and the second type of resource scheduler being K8S as an example for explanation.
  • the LSF sends the resource allocation result of the first job to the unified scheduler 111 .
  • the resource allocation result is used to indicate the callable resources allocated by the LSF to the first job.
  • LSF allocates the first job to the computing node with the node name C01n01 in Table 1 above, and allocates 4 cores in C01n01 and 10% of the storage resources to the first job.
  • the unified scheduler 111 updates the resource status table according to the resource allocation result.
  • for example, according to the above resource allocation result, the unified scheduler marks 4 cores and 10% of the storage resources in C01n01 as allocated in the resource status table.
  • after the scheduling node completes the allocation of computing nodes for a job, it promptly updates the usage status and running status of the computing nodes in the resource status table shown in Table 1 above. This ensures that the scheduling node allocates resources to other jobs based on the latest resource status table, prevents the resource scheduler from assigning jobs to computing nodes whose remaining resources are less than the resources required by the job (which would cause resource preemption), improves the rationality of the resource scheduler's allocation of computing nodes to jobs, reduces the waiting time for job execution, and improves the efficiency of the scheduling node in allocating computing resources to jobs.
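The table update described above (deducting the allocated 4 cores and 10% of the storage on C01n01 from its remaining resources) might look like the following Python sketch; the field names are assumptions modeled on Table 1.

```python
# Remaining resources per node, modeled on Table 1 (C01n01: 10 cores, 30% storage).
status_table = {"C01n01": {"free_cores": 10, "free_storage_pct": 30}}

def apply_allocation(table, node, cores, storage_pct):
    """Deduct a job's resource allocation result from the node's remaining resources."""
    entry = table[node]
    if cores > entry["free_cores"] or storage_pct > entry["free_storage_pct"]:
        raise ValueError("allocation exceeds remaining resources")
    entry["free_cores"] -= cores
    entry["free_storage_pct"] -= storage_pct

# The first job's resource allocation result as reported by LSF.
apply_allocation(status_table, "C01n01", cores=4, storage_pct=10)
```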
  • this embodiment provides a possible implementation method: the unified scheduler 111 instructs other resource schedulers in the scheduler set to synchronize the above-mentioned updated resource status table, and determine the resource allocation results for other jobs based on the updated resource status table.
  • resource schedulers are any resource schedulers in the scheduler set except the first type of resource schedulers.
  • the unified scheduler 111 sends a resource synchronization command to other resource schedulers, such as K8S in Figure 6, to instruct other resource schedulers to synchronize the updated resource status table.
  • Other resource schedulers allocate resources for other jobs according to the updated resource status table and obtain corresponding resource allocation results.
  • the resource synchronization command is used to instruct other resource schedulers to synchronize updated resource status tables.
  • the resource synchronization command carries a complete updated resource status table, and other resource schedulers replace the existing saved resource status table after receiving the resource synchronization command.
  • the resource synchronization command carries a partial resource status table, and other resource schedulers update the existing saved resource status table based on the partial resource status table after receiving the resource synchronization command.
  • the partial resource status table only includes the resource allocation result of the first job.
  • in the above examples, the resource synchronization command carries the data to be updated, such as a complete resource status table or a partial resource status table; in other implementations of the present application, the unified scheduler 111 may send the resource synchronization command and the data to be updated to other resource schedulers separately.
  • the unified scheduler synchronizes the updated resource status table to other resource schedulers (such as K8S) in the scheduler set, and synchronizes the resource status between multiple resource schedulers. This avoids the problem of the same resource being repeatedly called by multiple resource schedulers due to inconsistent resource status between multiple resource schedulers, resulting in resource preemption and increased waiting time for job execution caused by resource preemption. This is conducive to improving the efficiency of data processing for the job in the same cluster.
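The two synchronization variants above, a full table that replaces a scheduler's saved copy and a partial table that merges only the changed entries, can be sketched as follows; the function names are illustrative assumptions.

```python
def sync_full(local_table, new_table):
    """Resource synchronization command carrying a complete updated table:
    the receiving scheduler replaces its saved resource status table."""
    local_table.clear()
    local_table.update(new_table)

def sync_partial(local_table, changed_entries):
    """Resource synchronization command carrying only the changed entries,
    e.g. the resource allocation result of the first job."""
    for node, fields in changed_entries.items():
        local_table.setdefault(node, {}).update(fields)
```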
  • Figure 7A is a structural schematic diagram 1 of a job scheduling device provided by the present application; the job scheduling device provided by the present application is described below with reference to it.
  • the job scheduling device 700 can be used to implement the function of the scheduling node in the above method embodiment, and thus can also achieve the beneficial effects possessed by the above method embodiment.
  • the job scheduling device 700 includes a command acquisition module 710, a selection module 720, and an indication module 730; the job scheduling device 700 is used to implement the function of the scheduling node in the method embodiments corresponding to the above-mentioned FIG2 to FIG6 .
  • the specific process of the job scheduling device 700 for implementing the above-mentioned job scheduling method includes the following process:
  • the acquisition module 710 is used to acquire a job scheduling command.
  • the terminal 200 receives the job scheduling command of the first job input by the user, and then sends the job scheduling command to the unified scheduler.
  • the job scheduling command is used to schedule resources matched with the first job.
  • the job scheduling command is used to execute the scheduling process of the first job.
  • the selection module 720 is used to determine a first type of resource scheduler matching the first job from a scheduler set according to the type of the first job indicated by the job scheduling command.
  • the unified scheduler in the scheduling node identifies the type of the first job indicated in the job scheduling command, and the unified scheduler determines a resource scheduler matching the type of the first job from a scheduler set according to the type of the first job, wherein the resource schedulers in the scheduler set are used to schedule computing nodes in the heterogeneous computing power system.
  • the instruction module 730 is used to instruct the first type of resource scheduler to perform data processing of the first job.
  • the unified scheduler can instruct the first type of resource scheduler to allocate a computing node in the heterogeneous computing system for the first job by sending a resource scheduling command matching the job scheduling command to the first type of resource scheduler.
  • the resource scheduling command is used to instruct: the first type of resource scheduler allocates a computing node in the cluster for the first job.
  • the resource scheduling command is intercepted from the job scheduling command.
  • the present application also provides a job scheduling device, as shown in Figure 7B, which is a structural schematic diagram 2 of a job scheduling device provided by the present application, and the job scheduling device 700 also includes a state update module 740, a table determination module 750, a table update module 760 and an indication synchronization module 770.
  • the status update module 740 is used to update the status of the permission lock, and the permission lock is used to indicate the scheduling authority of the first type of resource scheduler to the resources managed by the first type of resource scheduler, and the managed resources include at least one of computing resources, network resources, and storage resources.
  • the table determination module 750 is used to obtain the status of the resources managed by each resource scheduler in the scheduler set; and determine the resource status table according to the status of the resources, and the resource status table is used to indicate the usage of the resources managed by each resource scheduler in the scheduler set.
  • the table update module 760 is used to update the resource status table when receiving the resource allocation result of the first job sent by the first type of resource scheduler; wherein the resource allocation result is used to indicate: the callable resources allocated by the first type of resource scheduler to the first job.
  • the indication synchronization module 770 is used to instruct other resource schedulers in the scheduler set to synchronize the updated resource status table, and to determine the resource allocation results for other jobs according to the updated resource status table.
  • the scheduling node of the aforementioned embodiments may correspond to the job scheduling device 700 and to the corresponding execution subject of the methods in Figures 2 to 6 according to the embodiments of the present application; the operations and/or functions of each module in the job scheduling device 700 are respectively intended to implement the corresponding processes of the methods of the corresponding embodiments in Figures 2 to 6, and for the sake of brevity, they will not be repeated here.
  • the scheduling node 110 may include one or more pieces of hardware, as shown in Figure 8, which is a schematic structural diagram of a scheduling node provided by the present application.
  • the scheduling node 800 may be applied to the heterogeneous computing system 100 shown in Figure 1.
  • the scheduling node 800 may include a processor 810 , a memory 820 , a communication interface 830 , a bus 840 , a unified scheduler 850 , etc.
  • the processor 810 , the memory 820 , and the communication interface 830 are connected via the bus 840 .
  • the processor 810 is the computing core and control core of the scheduling node 800.
  • the processor 810 can be a very large scale integrated circuit. An operating system and other software programs are installed in the processor 810, so that the processor 810 can access the memory 820 and various PCIe devices.
  • the processor 810 includes one or more processor cores.
  • the processor core in the processor 810 is, for example, a CPU or another application-specific integrated circuit (ASIC).
  • the processor 810 can also be other general-purpose processors, digital signal processors (digital signal processing, DSP), FPGAs or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc.
  • the scheduling node 800 may also include multiple processors.
  • the above-mentioned unified scheduler can be software executed in the processor 810.
  • the memory 820 can be used to store computer executable program codes, which include instructions.
  • the processor 810 executes various functional applications and data processing of the scheduling node 800 by running the instructions stored in the internal memory 820.
  • the memory 820 may include a program storage area and a data storage area.
  • the program storage area may store an operating system, an application required for at least one function (such as identifying a job scheduling command, a sending function, etc.), etc.
  • the data storage area may store data created during the use of the processing device 800 (such as a resource status table), etc.
  • the internal memory 820 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device, a flash memory device, a universal flash storage (UFS), etc.
  • the communication interface 830 is used to implement communication between the scheduling node 800 and an external device or component. In this embodiment, the communication interface 830 is used to perform data exchange with the computing node 120 and the terminal 200.
  • the bus 840 may include a path for transmitting information between the above components (such as the processor 810, the memory 820, and the communication interface 830).
  • the bus 840 may also include a power bus, a control bus, and a status signal bus.
  • various buses are labeled as bus 840 in the figure.
  • the bus 840 may be a PCIe bus, or an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), etc.
  • the processor 810 is connected to the memory 820 through a double data rate (DDR) bus.
  • different memories 820 may use different data buses to communicate with the processor 810, so the DDR bus can also be replaced by other types of data buses, and the embodiment of the present application does not limit the bus type.
  • the present application also provides a unified scheduler, for example, the unified scheduler 111 in the aforementioned scheduling node 110.
  • the unified scheduler 111 may include one or more pieces of hardware.
  • the unified scheduler 850 is deployed on the scheduling node 800.
  • the unified scheduler 850 includes a processor 851, which includes one or more processor cores.
  • the processor 851 can execute the methods shown in Figures 2 to 6 according to the acquired job scheduling command.
  • the unified scheduler 850 can store the resource status table in the memory 820 in the scheduler node.
  • the unified scheduler 850 further includes a memory 852 , and the processor 851 may store the obtained resource status table in the memory 852 .
  • FIG8 only takes the scheduling node 800 including one processor 810 and one memory 820 as an example.
  • the processor 810 and the memory 820 are respectively used to indicate a type of device or component.
  • the number of each type of device or equipment can be determined according to business requirements.
  • the computer program product includes one or more computer programs or instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, a user device or other programmable device.
  • the computer program or instruction may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer program or instruction may be transmitted from one website site, computer, server or data center to another website site, computer, server or data center by wired or wireless means.
  • the computer-readable storage medium may be any available medium that a computer can access or a data storage device such as a server, data center, etc. that integrates one or more available media.
  • the available medium may be a magnetic medium, for example, a floppy disk, a hard disk, a tape; it may also be an optical medium, for example, a digital video disc (DVD); it may also be a semiconductor medium, for example, a solid state drive (SSD).


Abstract

公开了一种作业调度方法、装置和芯片,涉及计算机领域。针对于一个异构算力系统(或集群)而言,调度节点为该异构算力系统设定调度器集合,该调度器集合包括可用于执行多种类型的作业调度处理的资源调度器,同一集群内实现了支持多种类型的作业调度处理的功能。当该同一集群获取到作业调度命令的情况下,调度节点中的统一调度器可按照作业调度命令指示的作业的类型为该作业分配相匹配的资源调度器,并指示与该作业的类型匹配的资源调度器执行该作业的数据处理过程,增加了同一集群中支持处理作业类型的数量,提高了同一集群中对不同类型作业的适配性。

Description

作业调度方法、装置和芯片
本申请要求于2022年10月28日提交国家知识产权局、申请号为202211338117.1、申请名称为“作业调度方法、装置和芯片”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机领域,尤其涉及作业调度方法、装置和芯片。
背景技术
超算中心是指包括多个服务器的集群,该集群可用于执行大规模作业或运算任务等;如该超算中心可用于运行高性能计算(High performance computing,HPC)作业或者人工智能(Artificial Intelligence,AI)作业等。通常,超算中心中的调度器为不同的作业(如HPC作业或AI作业)分配不同的硬件资源,如该硬件资源包括:计算资源、存储资源和网络资源等。然而,在同一集群中,仅设有1种类型的调度器,该调度器仅能为一种类型的作业分配硬件资源,不能为其他类型的作业进行硬件资源的分配,导致该集群对其他类型的作业适配性较差。因此,如何提供一种适配性高的作业调度方法成为目前亟需解决的问题。
发明内容
本申请提供了作业调度方法、装置和芯片,解决了超算中心中同一集群中单一类型的调度器仅能处理一种类型的作业所导致的适配性差的问题。
第一方面,提供了一种作业调度方法,该方法适用于包括调度节点的异构算力系统,在调度节点中,采用统一调度器和由统一调度器管理的调度器集合构成的两层调度架构,以及至少两个资源调度器同时管理每个计算节点的对等架构。该作业调度方法包括:首先,统一调度器获取作业调度命令。其次,统一调度器根据作业调度命令指示的第一作业的类型,从调度器集合中确定与第一作业的类型匹配的第一类资源调度器。最后,统一调度器指示第一类资源调度器根据作业调度命令执行第一作业的数据处理。
其中,该作业调度命令用于执行第一作业的调度处理,该调度器集合包括可执行至少两种不同类型的作业调度处理的资源调度器。
示例性的,上述数据处理可以包括,但不限于:第一类资源调度器根据第一作业所需的资源,为该第一作业分配对应的第一类资源调度器管理的资源(或称为计算节点)。
针对于一个异构算力系统(或集群)而言,调度节点为该异构算力系统设定调度器集合,由于该调度器集合包括可用于执行多种类型的作业调度处理的资源调度器,因此,同一集群内实现了支持多种类型的作业调度处理的功能。当该同一集群获取到作业调度命令的情况下,调度节点中的统一调度器可按照作业调度命令指示的作业的类型为该作业分配相匹配的资源调度器,并指示与该作业的类型匹配的资源调度器执行该作业的数据处理过程,同一集群中支持面向不同作业类型的调度器,提高了同一集群中对不同类型作业的适配性。此外,由于采用统一调度器和资源调度器的二层调度架构,无需对资源调度器的调度方式进行改动,而是由统一调度器将作业分配至对应类型的资源调度器,既保证了同一集群中调度器的种类,进而增加了可支持被调度的作业类型,又避免了改造原有资源调度器所带来的开发工作量,使得同一集群中可支持多种作业类型的作业处理,有利于提高该同一集群中的资源利用率,以及作业执行数据处理的效率。
示例的,上述作业调度命令可由终端发送至调度节点。
在一种可能的实现方式中,作业调度命令携带的信息用于指示第一作业的类型。
由于第一作业的作业调度命令直接地体现了第一作业的类型,统一调度器根据该作业调度命令,能快速确定第一作业的类型,提高了调度节点为第一作业确定匹配的资源调度器的效率。
可选的,第一作业的类型为HPC作业、AI作业或大数据作业等。
在一种可能的实现方式中,统一调度器中维护有权限锁,该权限锁用于指示第一类资源调度器对第一类资源调度器管理的资源的调度权限,管理的资源包括计算资源、网络资源、存储资源中至少一种。统一调度器可根据以下示例中的情形,更新权限锁的状态。
下面以第一类资源调度器是LSF为例,对权限锁的状态更新过程进行说明。
示例一,调度权限的获取过程:统一调度器向LSF发送与作业调度命令匹配的资源调度命令,LSF 根据该资源调度命令向统一调度器请求LSF管理的计算节点的调度权限,统一调度器根据前述的请求更新该权限锁的状态为:LSF具有调度LSF管理的资源的权限。
示例二,调度权限的释放过程:LSF为第一作业分配对应的资源后,LSF向统一调度器发送权限释放命令;以及统一调度器响应权限释放命令,并更新权限锁的状态为:此时无资源调度器具有调度权限。
该权限锁可用于控制多个资源调度器对同一集群中资源的调度权限,使得在同一集群中同一时间有且仅有一个资源调度器可为作业进行计算节点的分配,在提高了同一集群中对不同类型作业的适配性的同时,避免了相同的资源同时被多个资源调度器调用,出现资源抢占以及由资源抢占导致的作业的执行等待时长增加的问题,提高了作业调度的准确性以及同一集群对多种不同类型作业的适配性。资源调度器无需与其他资源调度器进行交互,有利于提升作业调度的效率。
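上述权限锁的申请与释放流程可以用下面的Python示意代码概括(仅为帮助理解的示例草图,PermissionLock等名称为本文假设,并非本申请的规范实现):

```python
class PermissionLock:
    """统一调度器维护的权限锁:同一时间至多一个资源调度器持有调度权限。"""

    def __init__(self):
        self.holder = None  # 当前持有调度权限的资源调度器名称;None 表示无人持有

    def acquire(self, scheduler: str) -> bool:
        """资源调度器请求调度权限(对应示例一);失败时调用方可周期性重试。"""
        if self.holder is None:
            self.holder = scheduler
            return True
        return self.holder == scheduler  # 已持有者重复请求视为仍持有

    def release(self, scheduler: str) -> bool:
        """资源调度器完成资源分配后释放权限(对应示例二)。"""
        if self.holder == scheduler:
            self.holder = None
            return True
        return False


lock = PermissionLock()
assert lock.acquire("LSF")      # LSF 获得调度权限
assert not lock.acquire("K8S")  # K8S 请求失败,需等待并重试
assert lock.release("LSF")      # LSF 释放权限
assert lock.acquire("K8S")      # 此后 K8S 可获得权限
```

该草图体现了"同一集群中同一时间有且仅有一个资源调度器可分配计算节点"的约束。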
在一种可能的实现方式中,在统一调度器指示第一类资源调度器执行第一作业的数据处理之前,该作业调度方法还包括:统一调度器获取调度器集合中各个资源调度器管理的资源的状态;统一调度器根据前述的各个资源调度器管理的资源的状态,确定资源状态表。
示例的,上述状态可以包括:计算节点的运行状态,以及计算节点包括的计算资源、存储资源或网络资源中至少一种的使用状态。该资源状态表中可包括计算节点的当前状态,该当前状态用于指示计算节点可用或者不可用。该计算节点的当前状态可基于运行状态确定。
本示例中,资源状态表可用于指示调度器集合中各个资源调度器管理的资源的使用情况,如多个资源调度器可根据该资源状态表来确定资源的使用情况,由统一调度器实现多个资源调度器间资源状态的同步,避免了不同的资源调度器管理相同资源的状态出现差异的情况,进而,确保了多个资源调度器不会对相同的资源进行资源抢占,有利于减少作业的执行等待时间。
在一种可能的实现方式中,在指示第一调度器执行第一作业的数据处理之后,该作业调度方法还包括:当统一调度器接收到第一类资源调度器发送的第一作业的资源分配结果时,统一调度器更新资源状态表;其中,该资源分配结果用于指示:第一类资源调度器为第一作业分配的可被调用的资源。
调度节点在为一个作业分配计算节点完成后,及时更新资源状态表,确保调度节点根据最新的资源状态表来为其他作业分配资源,避免资源调度器将作业分配至使用状态为剩余资源小于作业所需资源的计算节点上,出现资源抢占的问题,提高了资源调度器为作业分配计算节点的合理性,减少作业执行的等待时间,提高了调度节点为作业分配计算资源的效率。
在一种可能的实现方式中,该作业调度方法还包括:指示调度器集合中其他资源调度器同步更新后的资源状态表,并根据更新后的资源状态表为其他作业确定资源分配结果。该其他资源调度器为调度器集合中除第一类资源调度器之外的任一资源调度器。
在一种可能的示例中,统一调度器可向其他资源调度器发送资源同步命令,以指示其他资源调度器同步更新后的资源状态表。其他资源调度器根据更新后的资源状态表为其他作业进行资源的分配,并得到对应的资源分配结果。其中,该资源同步命令用于指示:其他资源调度器同步更新后的资源状态表。
统一调度器将更新后的资源状态表同步至调度器集合中的其他资源调度器,由统一调度器实现多个资源调度器间资源状态的同步,避免了多个资源调度器间由于资源状态的不一致,导致同一资源可能被多个资源调度器重复调用,出现资源抢占以及由资源抢占导致的作业的执行等待时长增加的问题,有利于提高同一集群为该作业执行数据处理的效率。
第二方面,提供了一种作业调度装置,该作业调度装置应用于调度节点,并适用于包括调度节点的异构算力系统,该作业调度装置包括用于执行第一方面或第一方面任一种可选设计中的作业调度方法的各个模块。示例的,该作业调度装置包括:获取模块、选择模块、指示模块,其中,获取模块,用于获取作业调度命令;选择模块,用于根据第一作业的类型在调度器集合中选择与第一作业匹配的第一类资源调度器,调度器集合包括可执行至少两种不同类型的作业调度处理的资源调度器;指示模块,用于指示第一类资源调度器执行第一作业的数据处理。
其中,该作业调度命令用于执行第一作业的调度处理。
示例的,上述作业调度命令可由终端发送至调度节点。统一调度器可向第一类资源调度器发送作业调度命令,以指示第一类资源调度器执行第一作业的数据处理。
关于作业调度装置更多详细的实现内容可参照以上第一方面中任一实现方式的描述,以及下述具体实施方式的内容,在此不予赘述。
第三方面,本申请提供一种芯片,包括:控制电路和接口电路,所述接口电路用于获取作业调度命令,所述控制电路用于根据所述作业调度命令执行第一方面和第一方面中任一种可能实现方式中的方法。
第四方面,本申请提供一种调度节点,包括处理器和存储器;存储器用于存储计算机指令,处理器执行该计算机指令实现上述第一方面及其第一方面任意可选的实现方式中的方法。
第五方面,本申请提供一种异构算力系统,该异构算力系统包括调度节点和计算节点;该调度节点用于为第一作业分配计算节点,使得调度节点执行上述第一方面及其第一方面任意可选的实现方式中的方法。
第六方面,本申请提供一种计算机可读存储介质,该存储介质中存储有计算机程序或指令,当计算机程序或指令被处理设备执行时,实现上述第一方面和第一方面中任一种可选实现方式中的方法。
第七方面,本申请提供一种计算机程序产品,该计算机程序产品包括计算机程序或指令,当该计算机程序或指令被处理设备执行时,实现上述第一方面和第一方面中任一种可选实现方式中的方法。
以上第二方面至第七方面的有益效果可参照第一方面或第一方面中任一种实现方式,在此不予赘述。本申请在上述各方面提供的实现方式的基础上,还可以进行进一步组合以提供更多实现方式。
附图说明
图1为本申请提供的一种异构算力系统的应用场景图;
图2为本申请实施例提供的调度节点的初始化方法的流程示意图;
图3为本申请实施例提供的作业调度方法的流程示意图;
图4为本申请实施例提供的权限锁更新方法的流程示意图一;
图5为本申请实施例提供的权限锁更新方法的流程示意图二;
图6为本申请实施例提供的状态更新方法的流程示意图;
图7A为本申请实施例提供的一种作业调度装置的结构示意图一;
图7B为本申请实施例提供的一种作业调度装置的结构示意图二;
图8为本申请实施例提供的一种控制节点的结构示意图。
具体实施方式
本申请提供一种作业调度方法,该方法适用于包括调度节点的异构算力系统,其中,调度节点采用了统一调度器和由统一调度器管理的调度器集合构成的两层调度架构,以及至少两个资源调度器同时管理每个计算节点的对等架构,该调度器集合至少包括多种不同类型的资源调度器,多种类型的资源调度器分别为特定类型的作业进行资源(或称为计算节点)的数据处理。其中,上述特定类型的作业是指资源调度器支持调度的作业的类型,非特定类型的作业是指资源调度器不支持调度或调度效率较低的作业的类型。
针对于一个异构算力系统而言,异构算力系统包括多个集群,每个集群支持多种作业类型的作业处理,调度节点为该异构算力系统设定调度器集合,由于该调度器集合包括可用于执行多种类型的作业调度处理的资源调度器,因此,同一集群内实现了支持多种类型的作业调度处理的功能。当该同一集群获取到作业调度命令的情况下,调度节点中的统一调度器可按照作业调度命令指示的作业的类型为该作业分配相匹配的资源调度器,并指示与该作业的类型匹配的资源调度器执行该作业的数据处理过程,同一集群中支持多种不同作业类型的作业处理,提高了同一集群中对不同类型作业的适配性。此外,由于采用统一调度器和资源调度器的二层调度架构,无需对资源调度器的调度方式进行改动,而是由统一调度器将作业分配至对应类型的资源调度器,既保证了同一集群中调度器的种类,进而增加了可支持被调度的作业类型,又避免了改造原有资源调度器所带来的开发工作量,使得同一集群中可支持多种作业类型的作业处理,有利于提高该同一集群中的资源利用率,以及作业执行数据处理的效率。
示例的,异构算力系统为一种计算机集群,该计算机集群包括调度节点和大量的计算节点,上述的调度节点与计算节点可通过有线或无线的方式连接,该调度节点用于为作业分配计算节点,该计算节点为作业提供算力支撑。上述作业调度命令用于调度与第一作业所匹配的计算节点。统一调度器可向第一类资源调度器发送作业调度命令,以指示第一类资源调度器为第一作业分配异构算力系统中的资源。统一调度器可为运行在调度节点上的软件或者部署在调度节点中的硬件装置。
下面对本实施例提供的作业调度方法进行说明,首先给出相关技术的介绍。
资源调度器,是指为作业或应用分配多种类型的资源的调度软件,如该资源调度器可实现计算资源管理和作业调度等功能。在一些情形中,该资源调度器是指单独部署在服务器或异构算力系统(如包括多类处理器的集群等)的处理器或控制器等;在另一些情形中,该资源调度器是指服务器部署的虚拟机(Virtual Machine,VM)、容器(container)、或者其他软件单元等,本申请对此不予限定。以资源调度器是软件单元为例,该资源调度器可提供有访问端口,服务器中的硬件或其他软件单元通过该访问端口向资源调度器发送命令或指令,或者,该资源调度器通过该访问端口调度服务器或异构算力系统提供的资源等。
调度器集合,包括多种不同类型的资源调度器,且每种类型的资源调度器中可包括一个或多个资源调度器。
高性能计算集群(High-performance Computing,HPC),是指能够执行一般个人电脑无法处理的大数据量与高速运算的计算机集群。
容器编排平台(Kubernetes,K8S),是指自动化运维管理容器(Docker)的系统。
作业调度系统(Load Sharing Facility,LSF),是指用于计算资源的管理和批处理作业调度的系统。
集群任务管理系统(Sun Grid Engine,SGE),是指用于将用户投递的任务进行排队,然后将任务交给能够运行的计算节点执行的系统。
代理(Agent),部署在计算节点上,用于与调度节点进行通信,根据通信的内容执行相应的操作。部署在计算节点上的代理包括K8S代理和HPC代理等。
计算节点,是指为作业提供算力、存储和网络等支持的系统;计算节点可包括:中央处理器(Central Processing Unit,CPU)计算节点、图形处理器(Graphics Processing Unit,GPU)计算节点和神经网络处理器(Neural-Network Processing Unit,NPU)计算节点等。CPU计算节点配置了大量的CPU。GPU计算节点配置了大量的GPU、现场可编程门阵列(Field-Programmable Gate Array,FPGA)、专用集成电路(Application Specific Integrated Circuit,ASIC)等并行加速器。计算节点中的存储器可以是但不限于,随机存取存储器(Random Access Memory,RAM),只读存储器(Read Only Memory,ROM),可编程只读存储器(Programmable Read-Only Memory,PROM),可擦除只读存储器(Erasable Programmable Read-Only Memory,EPROM),电可擦除只读存储器(Electric Erasable Programmable Read-Only Memory,EEPROM)等。计算节点的通信接口配置的网络可以是因特网,或者其他网络(如以太网)。网络可以包括一个或多个网络设备,如网络设备可以是路由器或交换机等。
为避免资源调度器对非特定类型的作业进行资源调度时,资源调度器不支持调度该非特定类型的作业。本申请采用统一调度器和调度器集合构成的双层调度架构,以及调度器集合中至少两个资源调度器同时管理每个计算节点的对等架构。如图1所示,图1为本申请提供的一种异构算力系统的应用场景图。该异构算力系统100可以包括调度节点110和n个计算节点120,n为正整数。终端200和调度节点110以及调度节点110和计算节点120间可通过有线的方式通信,如以太网、光纤以及设置于异构算力系统100内部用于连接调度节点110和计算节点120的各种快捷外围部件互连标准(Peripheral Component Interconnect Express,PCIe)总线等;也可以通过无线的方式通信,如因特网、无线通信(WIFI)以及超宽带(Ultra Wide Band,UWB)技术等。在一种可能的情形中,上述异构算力系统100还可包括终端200。
示例性的,在如图1所示的异构算力系统100的应用场景图中,处理人员在终端200输入作业调度命令后,终端200将该作业调度命令发送至调度节点110,调度节点110中的统一调度器111根据作业调度命令指示的第一作业的类型,从调度器集合112中确定与第一作业的类型匹配的第一类资源调度器1121。统一调度器111将指示第一类资源调度器1121根据作业调度命令执行第一作业的数据处理,该数据处理的过程包括:第一类资源调度器根据第一作业所需的资源,为该第一作业分配对应的计算节点。
为保证计算节点的初始状态同步,调度节点将进行初始化。在图1所示的异构算力系统100的基础上,提供异构算力系统100中调度节点110初始化的一种可能实现方式,如图2所示,图2为本申请实施例提供的调度节点的初始化方法的流程示意图,调度节点110中运行有统一调度器和调度器集合,该调度器集合包括用于执行多种类型作业调度处理的资源调度器,如LSF和K8S,统一调度器和调度器集合包括的资源调度器可在调度节点110中的处理器上运行。
可选的,调度节点110还包括存储器。值得注意的是,图2仅为本申请提供的示例,不应理解为对本申请的限定,在一些情况中,调度器集合中还包括更多的资源调度器,异构算力系统100还可包括更多或更少的计算节点。
本实施例提供的调度节点初始化的内容包括步骤S210和S220。
S210、统一调度器111获取异构算力系统100调度器集合中资源调度器管理的计算节点120的状态。
在一种可能的情形中,该状态用于指示:计算节点的运行状态,以及计算节点包括的计算资源、存储资源和网络资源等的使用状态。
其中,计算资源用于指示在单位时间内的浮点运算次数和整数运算次数;如计算资源在单位时间内的浮点运算次数和整数运算次数用于描述剩余处理能力。示例性的,上述的计算资源可由计算节点120中的处理器提供,该处理器可以是一种集成电路芯片,具有信号处理能力,该处理器可以是通用处理器,包括上述的CPU、GPU、NPU、FPGA、ASIC等。
存储资源用于指示存储容量、数据读写速度等;如存储资源的数据读写速度用于描述存储资源的处理能力。上述存储资源可由图1所示出的计算节点120中存储器来提供,关于存储器的内容,可参考上述相关技术中对计算节点的存储器的表述,在此不予赘述。
网络资源用于指示传输带宽,该传输带宽是指在单位时间内所能传输的最大数据量;如传输带宽用于描述网络资源的处理能力。上述网络资源可由计算节点120的通信接口所具备的传输带宽提供,关于通信接口所具备的传输带宽的内容,可参考上述相关技术中对计算节点的通信接口的表述,在此不予赘述。
前述计算节点的运行状态可根据计算节点的故障情况来确定。例如,该故障情况可包括但不限于:资源调度器与计算节点间之间的通信链路中断、计算节点中的器件(如存储器、处理器核、网络设备等)故障等。
在一种可能的示例中,若计算节点与资源调度器的通信链路中断,则该计算节点的运行状态为不可用。
在另一种可能的示例中,若计算节点中部分硬件故障,该计算节点的运行状态仍为可用,但可用资源减少。
例如,配置有128核CPU的计算节点,由于部分核(非主核)故障,仅能提供64核的计算能力,此时,该节点的运行状态为“可用”,但可用资源为64核。
对于状态的获取,本申请给出了以下两种可选的示例。
在第一种可选的示例中,统一调度器111向LSF和K8S发送状态采集命令,LSF和K8S根据该状态采集命令分别获取计算节点1、计算节点2、计算节点3、计算节点4的状态。
示例性的,统一调度器111可向LSF发送bhosts命令查看所有计算节点的状态;统一调度器111可向K8S发送top命令查看所有计算节点的状态。
在第二种可选的示例中,LSF和K8S在获取到计算节点的状态后,主动向统一调度器111发送所有计算节点的状态。
S220、统一调度器111根据获取到的计算节点的状态,生成资源状态表。
上述资源状态表用于追踪集群中资源的使用情况,统一调度器根据获取到的状态,及时更新资源状态表。该资源状态表可保存在与统一调度器通信的数据库,或者,资源状态表可保存在统一调度器所属的调度节点。
该资源状态表可包括计算节点的运行状态以及当前状态等。统一调度器在得到资源状态表后,将该资源状态表同步至调度器集合中的所有资源调度器。
针对于计算节点的当前状态确定过程,以下给出了一种可能的情形:该当前状态根据各个资源调度器获取到的计算节点的运行状态确定,仅当各个资源调度器获取到的计算节点的运行状态都为可用时,该计算节点的当前状态为可用。
示例的,如下表1所示的资源状态表。
表1

node_name node_state K8S_state LSF_state
C01n01 可用 可用 可用

其中,node_name表示计算节点的名称,每个计算节点具有唯一的编号;node_state表示计算节点的当前状态,该当前状态可分为两种:可用/不可用;K8S_state表示由K8S获取到的计算节点的运行状态,该运行状态分为两种:可用/不可用;LSF_state表示由LSF获取到的计算节点的运行状态,该运行状态分为两种:可用/不可用。
在一种可能的示例中,资源状态表中还将保存各计算节点对应的计算资源、存储资源和网络资源等至少一种的使用状态,如节点名称为C01n01的计算节点,剩余资源为10个核心数、30%(500G)的存储空间。
资源状态表可用于指示调度器集合中各个资源调度器管理的资源的使用情况,如多个资源调度器可根据该资源状态表来确定资源的使用情况(如上述表1示出的多个计算节点的状态),由统一调度器实现多个资源调度器间资源状态的同步,避免了不同的资源调度器管理相同资源的状态出现差异的情况,进而,确保了多个资源调度器不会对相同的资源进行资源抢占,有利于减少作业的执行等待时间。
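上述"仅当各个资源调度器获取到的计算节点的运行状态都为可用时,该计算节点的当前状态才为可用"的汇总规则,可用如下Python示意草图表达(字段名沿用表1,函数名为本文假设,并非本申请的规范实现):

```python
def node_state(reported: dict) -> str:
    """根据各资源调度器上报的运行状态,汇总计算节点的当前状态:
    仅当全部上报为"可用"时,当前状态才为"可用"。"""
    return "可用" if all(s == "可用" for s in reported.values()) else "不可用"


# 仿照表1的资源状态表:每个计算节点记录各资源调度器上报的运行状态
resource_table = {
    "C01n01": {"K8S_state": "可用", "LSF_state": "可用"},
    "C01n02": {"K8S_state": "可用", "LSF_state": "不可用"},
}
current = {name: node_state(row) for name, row in resource_table.items()}
# C01n01 的当前状态为"可用";C01n02 因 LSF 上报不可用而为"不可用"
```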
在本申请的另一实施例中,统一调度器111和调度器集合包括的资源调度器可分别处于不同的调度节点。如统一调度器位于第一调度节点,调度器集合位于第二调度节点。
当统一调度器111和调度器集合包括的资源调度器可分别处于不同的调度节点时,统一调度器111配置了调度器集合112中所有资源调度器的端口,以实现统一调度器111基于资源调度器的端口向资源调度器发送作业调度命令以及状态采集命令等;且所有的资源调度器同样配置了统一调度器111的端口,以实现资源调度器通过统一调度器111的端口向统一调度器111发送命令和数据等。
示例的,统一调度器111维护了所有资源调度器的IP-Port,例如资源调度器1,14.17.32.211:1024,资源调度器2,14.17.32.211:1025。相应的,每个资源调度器维护了统一调度器111的IP-Port(如14.17.33.211:1024)。
在上述调度节点上的统一调度器111初始化完毕后,调度节点110将为作业进行资源调度,如针对于前述计算节点的分配等。如图3所示,图3为本申请实施例提供的作业调度方法的流程示意图,该作业调度方法可应用于图1所示出的异构算力系统100。该作业调度方法由图1示出的调度节点110执行,调度节点110中部署有统一调度器111、调度器集合112,该调度器集合包括第一类资源调度器1121和第二类资源调度器1122等。
每一类资源调度器可包括一个或多个资源调度器,如第一类资源调度器1121包括的资源调度器1121A、资源调度器1121B均为K8S调度器,又如第二类资源调度器1122包括的资源调度器1122A、资源调度器1122B均为LSF调度器。
以上的资源调度器是指部署在调度节点110中的为作业调度资源的系统,对于资源调度器的具体表述,可参考上述相关技术的介绍中资源调度器的内容。
在一种可能的示例中,给出了调度节点110为作业分配计算节点的方案,请参照图3,本实施例提供的作业调度方法包括以下步骤S310至S330。
S310、调度节点110中的统一调度器111获取作业调度命令。
其中,用户通过终端200输入对第一作业的作业调度命令,再将该作业调度命令发送至调度节点110。该作业调度命令用于执行第一作业的调度处理,指示资源调度器为第一作业调度所需的计算节点。示例的,用户可在终端200上的命令行界面(Command-Line Interface,CLI)输入作业调度命令。
可选的,第一作业的类型可为:HPC作业、容器化作业(如AI作业)或大数据作业等。该第一作业还可以是指其他类型的作业,本申请对此不予限定。
作业调度命令的格式可以采用多种不同的实现方式,在调度器集合仅包括K8S和LSF时,本申请给出了两种可能的示例。
在第一种可能的示例中,作业调度命令为原生的K8S命令或LSF命令。
示例的,原生的LSF命令,如bjobs -r/-a/-p、bsub -J/-N/-R span等。原生的K8S命令,如kubectl create/delete/get/run命令等。
在第二种可能的示例中,作业调度命令为对原生K8S命令或LSF命令进行封装后的命令。
示例的,如下表2所示的作业调度命令。
表2

作业调度命令格式 作业类型
musb-HPC“LSF命令” HPC作业
musb-AI“K8S命令” AI作业
其中,“LSF命令”和“K8S命令”是指原生的LSF命令和K8S命令,对于原生的LSF命令和K8S命令的内容,可参考上述第一种可能的示例中对K8S命令和LSF命令的表述,在此不予赘述。
请继续参见图3,本实施例提供的作业调度方法还包括步骤S320。
S320、统一调度器111根据作业调度命令指示的第一作业的类型,从调度器集合112中确定与第一作业匹配的第一类资源调度器1121。
调度节点110中的统一调度器111识别作业调度命令中指示的第一作业的类型,统一调度器111根据第一作业的类型从调度器集合112中确定与第一作业的类型匹配的资源调度器,其中,该调度器集合112中的资源调度器用于调度异构算力系统100中的计算节点。
示例的,当该第一作业的类型为AI作业,统一调度器从LSF和K8S中,确定用于为AI作业分配计算节点的K8S。
可选的,统一调度器111根据作业调度命令携带的信息确定第一作业的类型。
针对于统一调度器111根据作业调度命令确定第一作业的类型,本实施例提供了以下两种可能的示例进行说明。
在第一种可能的示例中,作业调度命令与第一作业的类型的对应关系,可根据设定的映射关系表确定。该映射关系表用于指示:命令与作业类型的对应关系。前述的命令可指示命令对应的命令头。
针对于上述的映射关系表,下表3给出了一种可能的示例。
表3

命令(命令头) 作业类型
bsub等LSF命令 HPC作业
kubectl等K8S命令 AI作业(容器化作业)
例如,当作业调度命令为原生“LSF命令”时,如bsub –n z –q QUEUENAME –i inputfile –o outputfile COMMAND,统一调度器111根据上述作业调度命令的bsub,查询映射关系表,可判断出该作业调度命令对应的第一作业的类型为HPC作业。
其中,z代表了提交作业需要的cpu数,–q指定作业提交到的队列,如果不采用–q选项,系统把作业提交到默认作业队列。inputfile代表程序需要读入的文件名(例如namelist等),outputfile代表一个文件,作业提交后标准输出的信息将会保存到这个文件中。COMMAND是用户要运行的程序。
当作业调度命令为原生“K8S命令”时,如kubectl run nginx--replicas=3--labels="app=example"--image=nginx:1.10--port=80,统一调度器111根据上述作业调度命令的kubectl,查询映射关系表,可判断出该作业调度命令对应的第一作业的类型为AI作业或容器化作业。
其中,上述作业调度命令表示运行一个名称为nginx,副本数为3,标签为app=example,镜像为nginx:1.10,端口为80的容器实例。
在第二种可能的示例中,统一调度器111根据作业调度命令中带有的作业类型,如HPC或AI,从而可直接确定作业调度命令对应的作业的类型。
例如,当作业调度命令为如上表2中的musb-HPC“LSF命令”时,统一调度器111根据该作业调度命令的musb-HPC,可确定前述作业调度命令的作业类型为HPC作业。
当作业调度命令为如上表2中的musb-AI“K8S命令”时,统一调度器111根据该作业调度命令的musb-AI,可确定前述作业调度命令的作业类型为AI作业。
以上几种示例仅为本实施例提供的确定第一作业的类型的可选方式,不应理解为对本申请的限定。
由于第一作业的作业调度命令直接地体现了第一作业的类型,统一调度器根据该作业调度命令,能快速确定第一作业的类型,提高了调度节点为第一作业确定匹配的资源调度器的效率。
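上述两种判定方式(按命令头查映射关系表,或按封装前缀直接判定)可以合并为如下Python示意草图(映射关系与前缀取自表2与表3的示例,job_type为本文假设的函数名,并非本申请的规范实现):

```python
def job_type(command: str) -> str:
    """根据作业调度命令确定第一作业的类型:
    先检查封装前缀(第二种示例),再按命令头查映射关系表(第一种示例)。"""
    head = command.split()[0]
    prefixes = {"musb-HPC": "HPC作业", "musb-AI": "AI作业"}  # 对应表2的封装命令
    mapping = {"bsub": "HPC作业", "kubectl": "AI作业"}       # 对应表3的映射关系表
    if head in prefixes:
        return prefixes[head]
    if head in mapping:
        return mapping[head]
    raise ValueError(f"无法识别的作业调度命令: {command}")


assert job_type("bsub -n 4 -q normal COMMAND") == "HPC作业"
assert job_type("kubectl run nginx --port=80") == "AI作业"
assert job_type("musb-AI kubectl run nginx") == "AI作业"
```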
请继续参见图3,本实施例提供的作业调度方法还包括步骤S330。
S330、统一调度器111指示第一类资源调度器1121执行第一作业的数据处理。
在一种可能的情形中,统一调度器111可通过向第一类资源调度器发送与作业调度命令匹配的资源调度命令,来指示第一类资源调度器1121为第一作业进行调度。
其中,该资源调度命令用于指示:第一类资源调度器1121为第一作业分配集群中的计算节点。该资源调度命令从作业调度命令中截取得到。
示例的,当作业调度命令为musb-HPC bsub –n z –q QUEUENAME –i inputfile –o outputfile COMMAND时,统一调度器将上述作业调度命令的musb-HPC删掉后,得到资源调度命令bsub –n z –q QUEUENAME –i inputfile –o outputfile COMMAND。
以上示例仅为本实施例提供的获取资源调度命令的可选方式,不应理解为对本申请的限定。
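从封装的作业调度命令中截取资源调度命令的过程,可用如下Python示意草图表达(仅演示"去掉musb-HPC/musb-AI前缀"这一步,函数名为本文假设):

```python
def to_resource_command(job_command: str) -> str:
    """截取资源调度命令:封装命令去掉前缀,原生命令原样返回。"""
    head, _, rest = job_command.partition(" ")
    if head in ("musb-HPC", "musb-AI"):
        return rest  # 截取得到原生的 LSF/K8S 命令
    return job_command


assert to_resource_command("musb-HPC bsub -n 4 COMMAND") == "bsub -n 4 COMMAND"
assert to_resource_command("bsub -n 4 COMMAND") == "bsub -n 4 COMMAND"
```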
针对于第一类资源调度器1121为第一作业分配集群中的计算节点,本申请提供一种可能的示例进行说明。
示例性的,当LSF接收到第一作业对应的资源调度命令为“bsub–n z–q QUEUENAME–i inputfile–o outputfile COMMAND”,LSF根据资源调度命令中指示的需要的cpu数、指定作业提交到的队列等信息以及上述表1所示的资源状态表,为第一作业分配计算节点。计算节点根据第一作业需要运行的程序、需读取的数据以及输出数据存储的地址等信息,运行该第一作业。
第一类资源调度器1121在确定为第一作业分配的计算节点后,将向该计算节点上的agent,如HPC agent或K8S agent等发送第一作业的信息,以指示计算节点运行该第一作业。该第一作业的信息可包括需运行的程序与输入数据等。
针对于一个异构算力系统(或集群)而言,调度节点为该异构算力系统设定调度器集合,如该调度器集合包括K8S和LSF,由于该调度器集合包括可用于执行多种类型(如HPC作业和AI作业)的作业调度处理的资源调度器,因此,同一集群内实现了支持多种类型的作业调度处理的功能。
当该同一集群获取到作业调度命令的情况下,调度节点中的统一调度器可按照作业调度命令指示的作业的类型为该作业分配相匹配的资源调度器(如统一调度器将作业类型为HPC的作业分配至LSF),并指示与该作业的类型匹配的资源调度器执行该作业的数据处理过程,增加了同一集群中支持处理作业类型的数量,提高了同一集群中对不同类型作业的适配性,而且,统一调度器将作业分配至对应类型的资源调度器,有利于提高该同一集群为该作业执行数据处理的效率。
上述实施例中示出的调度器集合112包括的LSF或K8S,分别对HPC作业或AI作业进行调度,属于两种类别的资源调度器,在本申请的另一实施例中,第一类资源调度器1121中可包括多个资源调度器,如第一类资源调度器1121包括资源调度器1121A和资源调度器1121B,其中LSF可称为资源调度器1121A,SGE可称为资源调度器1121B。
在本实施例中,统一调度器111获取作业调度命令和根据作业调度命令指示的第一作业的类型,确定与第一作业的类型匹配的第一类资源调度器1121的内容,可参考上述S310和S320的内容,在此不予赘述。
统一调度器111在确定与第一作业的类型匹配的第一类资源调度器1121后,确定第一类资源调度器中LSF和SGE的作业队列情况,统一调度器111基于LSF和SGE的作业队列情况,将资源调度命令发送至作业队列少的资源调度器。
示例的,当LSF的作业队列情况为有3个作业待分配,SGE的作业队列情况为有5个作业待分配,统一调度器基于上述作业队列情况,向LSF发送资源调度命令。LSF接收到资源调度命令后为第一作业分配计算节点的内容,可参考上述LSF接收到第一作业对应的资源调度命令的内容,在此不予赘述。
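上述"将资源调度命令发送至作业队列少的资源调度器"的选择逻辑,可用如下Python示意草图表达(pick_scheduler为本文假设的函数名,并非本申请的规范实现):

```python
def pick_scheduler(queue_len: dict) -> str:
    """在同一类资源调度器中,选择待分配作业数最少的资源调度器。"""
    return min(queue_len, key=queue_len.get)


# LSF 有 3 个作业待分配,SGE 有 5 个,故选择 LSF
assert pick_scheduler({"LSF": 3, "SGE": 5}) == "LSF"
```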
在一种可选的实现方式中,指示第一类资源调度器执行第一作业的数据处理的过程可以包括:
以第一类资源调度器是LSF为例进行说明,统一调度器111向LSF发送与作业调度命令匹配的资源调度命令,LSF根据该资源调度命令向统一调度器111请求LSF管理的计算节点的调度权限,以及,LSF根据该调度权限以及资源调度命令指示的资源需求,为第一作业分配资源。其中,计算节点包括计算资源、网络资源、存储资源中至少一种。对于计算资源、网络资源、存储资源的内容,可参见S210中对计算资源、网络资源、存储资源的表述,在此不予赘述。
S410、LSF响应于统一调度器111的资源调度命令,向统一调度器111请求前述资源状态表中指示的可被调用计算节点的调度权限。
具体的,LSF可从如表1所示的资源状态表,从LSF管理的计算节点中确定可被调用的计算节点。该可被调用的计算节点的运行状态为可用,且具有剩余资源。
示例的,LSF可向统一调度器发送第一命令,该第一命令用于请求对计算节点120的调度权限。
S420、统一调度器111响应LSF的请求,更新统一调度器111维护的权限锁的状态。
其中,统一调度器111更新权限锁的状态为:LSF具有调度计算节点的权限。
在一种可能的示例中,统一调度器111可根据配置的调度策略,进行权限锁状态的更新。如LSF的优先级高于K8S的优先级,当同时接收到LSF和K8S的第一命令时,统一调度器111更新权限锁的状态为:LSF具有调度计算节点120的权限。
S430、统一调度器111向LSF发送第二命令。
其中,第二命令用于指示:LSF能够根据调度权限调度异构算力系统100中的计算节点120。
LSF基于第二命令,对第一作业进行计算节点的分配,LSF为第一作业分配计算节点的表述,可参考上述S330中的示例LSF为第一作业分配异构算力系统100中的计算节点120的内容,在此不予赘述。
在一种可能的情形中,统一调度器111在如上述S420的步骤更新权限锁的状态后,K8S发送第一命令至统一调度器111。统一调度器111根据该权限锁的状态,向K8S返回调用失败的信息。在K8S接收到调用失败的信息后,K8S将周期性地发送第一命令,直至获取到统一调度器111发送的第三命令,其中,第三命令用于指示:K8S能够根据调度权限调度异构算力系统100中的计算节点120。
值得注意的是,第一类资源调度器执行第一作业的数据处理的过程还可以包括:
以第一类资源调度器是LSF为例进行说明,如图5所示,图5为本申请实施例提供的权限锁更新方法的流程示意图二。
S510、在LSF为第一作业分配对应的资源后,LSF向统一调度器111发送权限释放命令。
其中,权限释放命令用于指示LSF已释放上述调度权限。
S520、统一调度器111响应权限释放命令,并更新权限锁的状态。
示例的,可将权限锁的状态更新为:此时无资源调度器具有调度权限。此时其他资源调度器,如K8S向统一调度器111发送权限请求的命令时,统一调度器111将更新权限锁的状态为:K8S具有调度计算节点的权限。
该权限锁可用于控制多个资源调度器对同一集群中资源的调度权限,使得在同一集群中同一时间有且仅有一个资源调度器(如上述示例中仅LSF)可为作业进行计算节点的分配,在提高了同一集群中对不同类型作业的适配性的同时,避免了相同的资源同时被多个资源调度器调用,出现资源抢占以及由资源抢占导致的作业的执行等待时长增加的问题,提高了作业调度的准确性以及同一集群对多种不同类型作业分配时的适配性。资源调度器无需与其他资源调度器进行交互,有利于提升作业调度的效率。
在第一类资源调度器释放调度权限之后,本实施例提供的作业调度方法还包括统一调度器111更新资源状态表。
如图6所示,图6为本申请实施例提供的状态更新方法的流程示意图。以第一类资源调度器是LSF,第二类资源调度器是K8S为例进行说明。
S610、LSF向统一调度器111发送第一作业的资源分配结果。
其中,该资源分配结果用于指示LSF为第一作业分配的可被调用的资源。
示例的,LSF将第一作业分配至如上述表1中节点名称为C01n01的计算节点,并为第一作业分配了C01n01中的4个核心数,以及10%的存储资源。
S620、统一调度器111根据该资源分配结果,更新资源状态表。
示例的,统一调度器根据上述资源分配结果“C01n01中的4个核心数,以及10%的存储资源”,更新如上述表1所示的资源状态表中名称为C01n01对应的使用状态。
调度节点在为一个作业分配计算节点完成后,及时更新如上表1所示的资源状态表中的计算节点的使用状态和运行状态,确保调度节点根据最新的资源状态表来为其他作业分配资源,避免资源调度器将作业分配至使用状态为剩余资源小于作业所需资源的计算节点上,出现资源抢占的问题,提高了资源调度器为作业分配计算节点的合理性,减少作业执行的等待时间,提高了调度节点为作业分配计算资源的效率。
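根据资源分配结果更新资源状态表中剩余资源的过程,可用如下Python示意草图表达(剩余资源不足时拒绝分配,以避免资源抢占;字段名为本文假设,并非本申请的规范实现):

```python
def apply_allocation(row: dict, cores: int, storage_pct: int) -> dict:
    """按资源分配结果扣减某计算节点的剩余核心数与剩余存储比例。"""
    if cores > row["free_cores"] or storage_pct > row["free_storage_pct"]:
        raise ValueError("剩余资源不足,拒绝分配以避免资源抢占")
    row["free_cores"] -= cores
    row["free_storage_pct"] -= storage_pct
    return row


# C01n01 剩余 10 核、30% 存储;为第一作业分配 4 核与 10% 存储后剩余 6 核、20%
row = {"free_cores": 10, "free_storage_pct": 30}
apply_allocation(row, 4, 10)
assert row == {"free_cores": 6, "free_storage_pct": 20}
```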
为同步调度器集合中所有资源调度器所管理的计算节点的状态,本实施例给出一种可能的实现方式:统一调度器111指示调度器集合中其他资源调度器同步上述更新后的资源状态表,并根据更新后的资源状态表为其他作业确定资源分配结果。
其中,其他资源调度器为调度器集合中除第一类资源调度器之外的任一资源调度器。
在一种可能的情形中,统一调度器111通过向其他资源调度器,如图6中的K8S发送资源同步命令,以指示其他资源调度器同步更新后的资源状态表。其他资源调度器根据更新后的资源状态表为其他作业进行资源的分配,并得到对应的资源分配结果。
该资源同步命令用于指示:其他资源调度器同步更新后的资源状态表。
在一种可能的示例中,该资源同步命令带有完整的更新后的资源状态表,其他资源调度器在接收到该资源同步命令后,替换现有保存的资源状态表。
在另一种可能的示例中,该资源同步命令带有部分资源状态表,其他资源调度器在接收到资源同步命令后,基于该部分资源状态表更新现有保存的资源状态表。该部分资源状态表仅包括第一作业的资源分配结果。
以上几种示例仅为本实施例提供的同步资源状态表的可选方式,不应理解为对本申请的限定。
上述示例为资源同步命令带有待更新的数据,如完整的资源状态表,或部分资源状态表;在本申请的其他情形中,统一调度器111可将资源同步命令与待更新的数据分别发送至其他资源调度器。
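两种同步方式(携带完整资源状态表进行整表替换,或携带部分资源状态表进行合并更新)可用如下Python示意草图表达(sync_table为本文假设的函数名,并非本申请的规范实现):

```python
def sync_table(local: dict, received: dict, full: bool) -> dict:
    """其他资源调度器同步资源状态表:
    full=True:用完整的更新后资源状态表替换本地表(第一种示例);
    full=False:仅用部分资源状态表(如第一作业的分配结果)合并更新本地表(第二种示例)。"""
    if full:
        return dict(received)
    merged = dict(local)
    merged.update(received)
    return merged


local = {"C01n01": "可用", "C01n02": "可用"}
# 部分同步:仅更新 C01n01 对应的条目,其余条目保留
assert sync_table(local, {"C01n01": "不可用"}, full=False) == {
    "C01n01": "不可用", "C01n02": "可用"}
# 完整同步:整表替换
assert sync_table(local, {"C01n01": "不可用"}, full=True) == {"C01n01": "不可用"}
```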
统一调度器将更新后的资源状态表同步至调度器集合中的其他资源调度器(如K8S),由统一调度器实现多个资源调度器间资源状态的同步,避免了多个资源调度器间由于资源状态的不一致,导致同一资源可能被多个资源调度器重复调用,出现资源抢占以及由资源抢占导致的作业的执行等待时长增加的问题,有利于提高同一集群为该作业执行数据处理的效率。
上文中结合图1至图6,详细描述了根据本申请所提供的作业调度的方法,下面将结合图7A,图7A为本申请提供的一种作业调度装置的结构示意图一,描述根据本申请所提供的作业调度装置。作业调度装置700可以用于实现上述方法实施例中调度节点的功能,因此也能实现上述方法实施例所具备的有益效果。
如图7A所示,作业调度装置700包括命令获取模块710、选择模块720和指示模块730;该作业调度装置700用于实现上述图2~图6中所对应的方法实施例中调度节点的功能。在一种可能的示例中,该作业调度装置700用于实现上述作业调度方法的具体过程包括以下过程:
获取模块710,用于获取作业调度命令。其中,终端200接收用户输入的第一作业的作业调度命令,再将该作业调度命令发送至统一调度器。作业调度命令用于调度与第一作业所匹配的资源。该作业调度命令用于执行第一作业的调度处理。
选择模块720,用于根据作业调度命令指示的第一作业的类型,从调度器集合中确定与第一作业匹配的第一类资源调度器。调度节点中的统一调度器识别作业调度命令中指示的第一作业的类型,统一调度器根据第一作业的类型从调度器集合中确定与第一作业的类型匹配的资源调度器,其中,该调度器集合中的资源调度器用于调度异构算力系统中的计算节点。
指示模块730,用于指示第一类资源调度器执行第一作业的数据处理。示例的,统一调度器可通过向第一类资源调度器发送与作业调度命令匹配的资源调度命令,来指示第一类资源调度器为第一作业分配异构算力系统中的计算节点。其中,该资源调度命令用于指示:第一类资源调度器为第一作业分配集群中的计算节点。该资源调度命令从作业调度命令中截取得到。
为进一步实现上述图2至图6中所示的方法实施例中的功能。本申请还提供了一种作业调度装置,如图7B所示,图7B为本申请提供的一种作业调度装置的结构示意图二,该作业调度装置700还包括状态更新模块740、表确定模块750、表更新模块760和指示同步模块770。
其中,该状态更新模块740用于更新权限锁的状态,权限锁用于指示第一类资源调度器对第一类资源调度器管理的资源的调度权限,管理的资源包括计算资源、网络资源、存储资源中至少一种。表确定模块750,用于获取调度器集合中的各个资源调度器管理的资源的状态;以及根据资源的状态确定资源状态表,该资源状态表用于指示调度器集合中的各个资源调度器管理的资源的使用情况。表更新模块760,用于当接收到第一类资源调度器发送的第一作业的资源分配结果时,更新资源状态表;其中,资源分配结果用于指示:第一类资源调度器为第一作业分配的可被调用的资源。指示同步模块770,指示调度器集合中其他资源调度器同步更新后的资源状态表,并根据更新后的资源状态表为其他作业确定资源分配结果。
应理解,前述实施例的调度节点可对应于该作业调度装置700,并可以对应于执行根据本申请实施例的方法图2~图6对应的相应主体,并且作业调度装置700中的各个模块的操作和/或功能分别为了实现图2至图6中对应实施例的各个方法的相应流程,为了简洁,在此不再赘述。
示例性的,当作业调度装置700通过前述调度节点110来实现时,该调度节点110可包括一种或多种硬件,如图8所示,图8为本申请提供的一种调度节点的结构示意图。该调度节点800可应用于图1所示的异构算力系统100中。
如图8所示,调度节点800可以包括处理器810、存储器820、通信接口830、总线840和统一调度器850等,处理器810、存储器820、通信接口830通过总线840连接。
处理器810是调度节点800的运算核心和控制核心。处理器810可以是一块超大规模的集成电路。处理器810中安装有操作系统和其他软件程序,使得处理器810实现对存储器820及各种PCIe设备的访问。处理器810包括一个或多个处理器核(core)。处理器810中的处理器核例如是CPU或其他专用集成电路ASIC。处理器810还可以是其他通用处理器、数字信号处理器(digital signal processing,DSP)、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。实际应用中,调度节点800也可以包括多个处理器。上述统一调度器可以为在处理器810中执行的软件。
存储器820可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器810通过运行存储在存储器820的指令,从而执行调度节点800的各种功能应用以及数据处理。存储器820可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如识别作业调度命令,发送功能等)等。存储数据区可存储调度节点800使用过程中所创建的数据(比如资源状态表)等。此外,存储器820可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
通信接口830用于实现调度节点800与外部设备或器件的通信。在本实施例中,通信接口830用于与计算节点120和终端200进行数据交互。
总线840可以包括一通路,用于在上述组件(如处理器810、存储器820、通信接口830)之间传送信息。总线840除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都标为总线840。总线840可以是PCIe总线,或扩展工业标准结构(extended industry standard architecture,EISA)总线、统一总线(unified bus,Ubus或UB)、计算机快速链接(compute express link,CXL)、缓存一致互联协议(cache coherent interconnect for accelerators,CCIX)等。例如,处理器810可以通过PCIe总线访问这些I/O设备。处理器810通过双倍速率(double data rate,DDR)总线和存储器820相连。这里,不同的存储器820可能采用不同的数据总线与处理器810通信,因此,DDR总线也可以替换为其他类型的数据总线,本申请实施例不对总线类型进行限定。
作为一种可能的实施例,本申请还提供一种统一调度器,例如,前述调度节点110中的统一调度器111,该统一调度器111可包括一种或多种硬件,如图8所示,该统一调度器850部署在调度节点800上。
该统一调度器850包括处理器851,处理器851包括一个或多个处理器核(core)。该处理器851可根据获取到的作业调度命令执行如图2至图6所示的方法。统一调度器850可将资源状态表存储至调度器节点中的存储器820。
可选的,统一调度器850还包括存储器852,处理器851可将得到的资源状态表存储至存储器852。
关于统一调度器850更多详细的实现内容可参照上述作业调度方法的内容,在此不予赘述。
值得说明的是,图8中仅以调度节点800包括1个处理器810和1个存储器820为例,此处,处理器810和存储器820分别用于指示一类器件或设备,具体实施例中,可以根据业务需求确定每种类型的器件或设备的数量。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机程序或指令。在计算机上加载和执行所述计算机程序或指令时,全部或部分地执行本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、网络设备、用户设备或者其它可编程装置。所述计算机程序或指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机程序或指令可以从一个网站站点、计算机、服务器或数据中心通过有线或无线方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是集成一个或多个可用介质的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,例如,软盘、硬盘、磁带;也可以是光介质,例如,数字视频光盘(digital video disc,DVD);还可以是半导体介质,例如,固态硬盘(solid state drive,SSD)。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到各种等效的修改或替换,这些修改或替换都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以权利要求的保护范围为准。

Claims (11)

  1. 一种作业调度方法,其特征在于,所述方法包括:
    获取作业调度命令,所述作业调度命令用于执行第一作业的调度处理;
    根据所述第一作业的类型在调度器集合中选择与所述第一作业匹配的第一类资源调度器,所述调度器集合包括可执行至少两种不同类型的作业调度处理的资源调度器;
    指示所述第一类资源调度器执行所述第一作业的数据处理。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    更新权限锁的状态,所述权限锁用于指示所述第一类资源调度器对所述第一类资源调度器管理的资源的调度权限,所述管理的资源包括计算资源、网络资源、存储资源中至少一种。
  3. 根据权利要求1或2所述的方法,其特征在于,在所述指示所述第一类资源调度器执行所述第一作业的数据处理之前,所述方法还包括:
    获取所述调度器集合中的各个资源调度器管理的资源的状态;
    根据所述资源的状态确定资源状态表,所述资源状态表用于指示所述调度器集合中的各个资源调度器管理的资源的使用情况。
  4. 根据权利要求3所述的方法,其特征在于,在所述指示所述第一类资源调度器执行所述第一作业的数据处理之后,所述方法还包括:
    当接收到所述第一类资源调度器发送的所述第一作业的资源分配结果时,更新所述资源状态表;
    其中,所述资源分配结果用于指示:所述第一类资源调度器为所述第一作业分配的可被调用的资源。
  5. 根据权利要求4所述的方法,其特征在于,所述方法还包括:
    指示所述调度器集合中其他资源调度器同步更新后的资源状态表,并根据所述更新后的资源状态表为其他作业确定资源分配结果。
  6. 一种作业调度装置,其特征在于,所述装置包括:
    获取模块,用于获取作业调度命令,所述作业调度命令用于执行第一作业的调度处理;
    选择模块,用于根据所述第一作业的类型在调度器集合中选择与所述第一作业匹配的第一类资源调度器,所述调度器集合包括可执行至少两种不同类型的作业调度处理的资源调度器;
    指示模块,用于指示所述第一类资源调度器执行所述第一作业的数据处理。
  7. 根据权利要求6所述的装置,其特征在于,所述装置还包括:
    状态更新模块,用于指示更新权限锁的状态,所述权限锁用于指示所述第一类资源调度器对所述第一类资源调度器管理的资源的调度权限,所述管理的资源包括计算资源、网络资源、存储资源中至少一种。
  8. 根据权利要求6或7所述的装置,其特征在于,所述装置还包括:
    表确定模块,用于获取所述调度器集合中的各个资源调度器管理的资源的状态;根据所述资源的状态确定资源状态表,所述资源状态表用于指示所述调度器集合中的各个资源调度器管理的资源的使用情况。
  9. 根据权利要求8所述的装置,其特征在于,所述装置包括:
    表更新模块,用于当接收到所述第一类资源调度器发送的所述第一作业的资源分配结果时,更新所述资源状态表;其中,所述资源分配结果用于指示:所述第一类资源调度器为所述第一作业分配的可被调用的资源。
  10. 根据权利要求9所述的装置,其特征在于,所述装置包括:
    指示同步模块,用于指示所述调度器集合中其他资源调度器同步更新后的资源状态表,并根据所述更新后的资源状态表为其他作业确定资源分配结果。
  11. 一种芯片,其特征在于,包括控制电路和接口电路,所述接口电路用于获取作业调度命令,所述控制电路用于根据所述作业调度命令执行权利要求1至5中任一项所述的方法。
PCT/CN2023/101052 2022-10-28 2023-06-19 作业调度方法、装置和芯片 WO2024087663A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211338117.1A CN117950816A (zh) 2022-10-28 2022-10-28 作业调度方法、装置和芯片
CN202211338117.1 2022-10-28

Publications (1)

Publication Number Publication Date
WO2024087663A1 true WO2024087663A1 (zh) 2024-05-02

Family

ID=90800635

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101052 WO2024087663A1 (zh) 2022-10-28 2023-06-19 作业调度方法、装置和芯片

Country Status (2)

Country Link
CN (1) CN117950816A (zh)
WO (1) WO2024087663A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103744734A (zh) * 2013-12-24 2014-04-23 中国科学院深圳先进技术研究院 一种任务作业处理方法、装置及系统
US9229774B1 (en) * 2012-07-13 2016-01-05 Google Inc. Systems and methods for performing scheduling for a cluster
WO2017018978A1 (en) * 2015-07-24 2017-02-02 Hewlett Packard Enterprise Development Lp Scheduling jobs in a computing cluster
CN109564528A (zh) * 2017-07-06 2019-04-02 华为技术有限公司 分布式计算中计算资源分配的系统和方法
CN113918270A (zh) * 2020-07-08 2022-01-11 电科云(北京)科技有限公司 基于Kubernetes的云资源调度方法及系统


